Monday, July 28, 2014

PDQm - Patient Demographics Query as Mobile API

The IHE ITI committee last week approved the Patient Demographics Query for Mobile (PDQm) profile go forward to Trial Implementation. This puts it on track for testing at the USA Connectathon in Cleveland. This is the first Profile of the new FHIR core standard. Both are in a trail state, so it is possible things will change. But these both are based on very mature concepts.

The executive summary is that this is a very minimal profile of the HL7 FHIR Patient resource. Two items have been made mandatory, where the FHIR spec leaves them potentially optional. One extension was added to support the pediatrics use-case that is an option in PDQ. Otherwise it is a simple FHIR 'GET' transaction, aka Query, that returns a Bundle (aka Atom Feed) of the resulting matching entries.

Use-case for PDQm

 The Patient Demographics Query for Mobile (PDQm) Profile defines a lightweight RESTful interface to a patient demographics supplier leveraging technologies readily available to mobile applications and lightweight browser based applications.
  • The functionality is identical to the PDQ Profile described in the ITI TF-1:8. The differences are transport and messaging format of messages and queries. The profile leverages HTTP transport, and the JavaScript Object Notation (JSON), Simple-XML, and Representational State Transfer (REST). The payload format is defined by the HL7 Fast Health Interoperable Resources (FHIR) draft standard.
Using these patterns, the PDQm Profile exposes the functionality of a patient demographics supplier to mobile applications and lightweight browser applications.

The following list provides a few examples of how PDQm might be leveraged by implementers:
  • A health portal securely exposing demographics data to browser based plugins
  • Medical devices which need to access patient demographic information
  • Mobile devices used by physicians (example bedside eCharts) which need to establish patient context by scanning a bracelet
  • Web based EHR/EMR applications which wish to provide dynamic updates of patient demographic information such as a non-postback search, additional demographic detail, etc.
  • Any low resource application which exposes patient demographic search functionality
  • Any application using the MHD Profile to access documents may use PDQm to find an appropriate patient identifier
  • This supplement is intended to be fully compliant with the HL7 FHIR specification, providing only use-case driven constraints to aid with interoperability, deterministic results, and compatibility with existing PDQ and PDQv3 Profiles.
Currently the HL7 FHIR standard is in “Draft Standard for Test Use” (DSTU) and may experience a large amount of change during this phase. Readers are advised that, while the profiled components in this supplement may not accurately reflect the most recent version of the FHIR standard, implementations of PDQm will be tested as specified in this supplement. Changes to the FHIR DSTU will be integrated into this supplement via the formal IHE Change Proposal (CP) process.

Using the FHIR Patient resource, IHE needed only profile two elements. IHE sets the patient name and identifier as mandatory, whereas the core FHIR allows them to be empty. This is the FHIR Patient Resource as it is used by PDQm.

FHIR Patient Resource ++ 

This is the definition for the Patient Resource contained within the Query Patient Resource response message. The purpose of the definition is to describe the data elements relevant for this transaction. It is a restriction of the Patient Resource found in chapter 5.1.2 of the FHIR standard. For the complete FHIR definition of this Resource please see ITI TF-2x: Appendix W

Pediatrics Demographic Option

PDQm has usecases to support new-born that have not been fully identified, and thus require that the mother’s maiden name be included in their Patient resource. This was introduced to PDQ with a Pediatrics Option. This use-case is not currently supported by the FHIR DSTU, so IHE has extended the Patient resource to include this. IHE will be bringing the use-case to HL7 for proper handling (recognizing that it is most likely the mothers name, not the mother’s maiden name that is desired).

"extensionDefn" : [
       "code" : "mothersMaidenName",
       "contextType" : "resource",
       "context" : [ "Patient" ],
       "definition" : {
              "short" : "Patient's mother's maiden name",
              "formal" : "The name of the patient's mother",
              "min" : 0,
              "max" : 1,
              "type" : [
                           "code" : "HumanName"
              "isModifier" : false

PDQm as a RESTful API to a mature PDQ and PIX environment.

PDQm can be viewed as an API to a classic PDQ or PDQ v3 environment. Same PDQ use cases and interactions, using the FHIR encoding for the transaction from the mobile consumer. This diagram shows an implementation of a gateway that converts the PDQm into a normal PDQ transaction via a gateway.

PDQ attributes 

The following table shows the attribute mapping between PDQ, PDQ v3, and PDQm.

Abstract Field
Identifier List                                    
PID.3 and PID.18
PID.5 and PID.9
Date / Time of Birth
Telecommunications Address(es)
PID.13 and PID.14
Language(s) of communication
Marital Status
Non-Medical Identifiers
PID.19 and PID.20
Death Date/Time
Mother’s Maiden Name
See ITI TF-2c:
Patient Birth Order

PDQ Query Parameters

And the query parameters, mapped between the three

Abstract Parameter
Identifier List
@PID.3 and @PID.18
given and family
Date / Time of Birth
Domains to be Returned
See ITI TF-2c:
Mother’s Maiden Name
mothersMaidenName.given and
Patient Telecommunications Addresses

The result of a query is a “Bundle”, which for XML encoding is derived in FHIR from the standards based ATOM feed.


Here is a sample Query, in HTTP form, requesting JSON format be returned, all Male patients with the Given name “John” and the Family name “Smith”.

GET http://pdm-sample:8080/iti-y1/Patient?_format=application/json+fhir&gender=M&family=Smith&given=John&count=10
User-Agent: Fiddler
Host: pdq-sample:8080

Here is a sample result.  Amazingly there is only one John Smith.

HTTP/1.1 200 OK
Connection: close
Content-Type: application/json+fhir; charset=UTF-8
Content-Length: 2683
Date: Sun, 06 Apr 2014 20:38:23 GMT
Expires: Sat, 05 Apr 2014 20:38:20 GMT

  "resourceType" : "Bundle",
  "title" : "Search results for resource type Patient",
  "id" : "urn:uuid:c179d5bd-e81e-4fe0-981a-46f6c6588f",
  "link" : [
      "href" : "http://pdm-sample:8080/iti-y1/Patient?_format=application/json+fhir&gender=M&family=Smith&given=John&count=10",
      "rel" : "self"
  "updated" : "2014-04-06T20:38:23Z",
  "totalResults" : "1",
  "entry" : [
      "title" : "Patient \"1\"",
      "id" : "http://pdm-sample:8080/iti-y1/Patient/1",
      "link" : [
          "href" : "http://pdm-sample:8080/iti-y1/Patient/1",
          "rel" : "self"
      "updated" : "2014-03-11T20:34:55Z",
      "content" : {
        "resourceType" : "Patient",
        "text" : {
          "status" : "generated",
          "div" : "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\r\n<div xmlns=\"\">John Smith (Male) - 1974-12-25</div>"
        "identifier" : [
            "use" : "usual",
            "system" : "urn:oid:",
            "value" : "12345",
            "assigner" : {
              "display" : "Acme Healthcare"
        "name" : [
            "use" : "official",
            "family" : [
            "given" : [
            "use" : "usual",
            "given" : [
        "telecom" : [
            "use" : "home"
            "system" : "phone",
            "value" : "+1(202)555-6474",
            "use" : "work"
        "gender" : {
          "coding" : [
              "system" : "",
              "code" : "M",
              "display" : "Male"
        "birthDate" : "1974-12-25",
        "deceasedDateTime" : null,
        "address" : [
            "use" : "home",
            "line" : [
              "123 Main St. West Unit 33"
            "city" : "Chicago",
            "state" : "IL",
            "zip" : "00000"
        "managingOrganization" : {
          "display" : "ACME Medical Centres"
        "active" : true
      "summary" : "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\r\n<div xmlns=\"\">John Smith (Male) - 1974-12-25</div>"



IHE won't be the only one to profile FHIR. This supplement helps give synergy between the two PDQ flavors, providing a new flavor. The advantage of this new flavor is that applications can more easily process it. It doesn't really make the server job easier

Blog References on mHealth

Thursday, July 3, 2014

Book: The Lean Startup

I am sure that GE is not alone in adopting the concepts of ‘The Lean Startup’ by Eric Ries. As a GE employee I am surrounded by this, especially as a Design Engineer, especially one that works on leading-edge products and standards development. There is no internal meeting where the key-words of The Lean Startup are not spoken. So learning what these terms really mean is important.

The Lean Startup is a fantastic book. I was afraid that it would be unreasonable, too focused on the loose-and-carefree world of startups. It however is nothing of the kind. It very carefully distinguishes the methodology of The Lean Startup. It carefully shows how to use and when not to use the methodology.

What is very nice about the book is the narrative methodology. For each topic covered it includes an illustrative example or two. Often including a failure case in addition to a successful case. Adding to the credibility the author uses his failures most of the time.

I work in a Medical Device vendor, that is highly structured around ‘the waterfall’ design methodology. The clear apparent conflict between this and The Lean Startup methodology are covered. Turns out this has been thought through, and there are ways to get the best of both worlds.

I highly suggest reading the whole book. I learned stuff that is just not learnable through the Executive overview or common training. In reading the book I learned that many people are using the terminology wrong, while others are using it right. The most abused word is “Pivot” and “MVP”. I now can distinguish people just using the buzzwords from those that are using the proper concepts. I am now much more comfortable with the business and design organization changes; as they are clearly using the proper concepts.

Lucky for me GE has a book service where I can get these business purpose books free. The cool part is that most of these books come in audio form, or e-book form. I got this one in audio form, which is not as nice as Their audio form is a bunch of MP3 files. This would have been nice with my old MP3 player, but now days I am using an iPhone-5. I couldn’t figure out how to get simple MP3 files to the iPhone, I refuse to load iTunes. So I ended up with a web based hack. Ultimately this hack didn’t hurt too bad. The MP3 files were all about an hour and a half long, just perfect for my workout. Added benefit is the book is read by the author.

Wednesday, July 2, 2014

PCAST - Big Data: A Technological Perspective

The USA White-house "President Council of Advisors on Science and Technology (PCAST)" has produced an interesting paper on "Big Data: A Technological Perspective". It is a worthy read, and I think much of it is rather level-headed and good advice to the President. I highly recommend reading it. It is a nice layout for a college level course in big-data privacy.

USA Centric viewpoint is bad:

It however is extremely USA centric. This is to be expected, it is being written by PCAST. This is however that USA centric thinking that lead the NSA and FBI to do things that are not internationally friendly, especially privacy. Thus now the USA based businesses are not looked at by the international community as an appropriate place to store or to allow to manage their data. This short-sighted viewpoint is killing the USA market for big-data. I think this is the most important policy change that must happen. The USA government must behave better, especially regarding international perspectives. The USA can be both Friendly to Privacy, and Friendly to Business. These two are not in conflict.

Data Collection:

The paper argues
Recommendation 1. Policy Attention should focus more on the actual uses of big data and less on its collection and analysis.
That there should not be regulations on data collection, that it is improper use of data that should be regulated. I agree with the outcome of this statement, but not the means. The outcome is that regulation is needed to punish poor use of data. This follows the principle I have explained on regulations, regulations need to be written to the outcome not the technology. Note their Recommendation 2 is specifically on the topic of regulating outcomes, not the technology.

What I don't like about Recommendation 1 is that is it presumes that all data that is collected will be perfectly protected. I don't see how anyone can presume that any data is going to be perfectly protected. There are breaches of data all the time. The Recommendation totally ignores all the unintended-use caused by a breach. I ague this happens more than the other uses.

This Recommendation 1 is implicitly guided by the concept that although we might not have a use for the data we are collecting today, there might be a use for it in the future. If we don't collect it now, it won't be there in the future. Storage space for unnecessary data is cheap. I understand this business intent. I just think that the risks to exposure are higher than the future benefit of undefined use.

I would simply augment Recommendation 1 to guide for gathering the minimum data that is necessary for the intended use.

De-Identification is good when used right:

I have already commented on the topic of De-Identification and anonymization, even for free-text. This is a case where the report outright says that De-Identification should not be used as it is too easily defeated, especially where data fusion is applied.
Anonymization is increasingly easily defeated by the very techniques that are being developed for many legitimate applications of big data. In general, as the size and diversity of available data grows, the likelihood of being able to re‐identify individuals (that is, re‐associate their records with their names) grows substantially. While anonymization may remain somewhat useful as an added safeguard in some situations, approaches that deem it, by itself, a sufficient safeguard need updating.
I don't think that this concept is in conflict with what I say. In the case of the PCAST report, they are taking the perspective that data should be allowed to be gathered in full fidelity where it is highly protected. If one is going to highly protect it, then it doesn't benefit from the reduced risk that de-identification brings. To this point I agree. It is far better to protect the original data, than to expose the data that are poorly de-identified.

I do however think that there are 'uses' of data where de-identification is appropriate. Even if simply as a data reduction technique. That is to use the de-identification process to eliminate data elements that are not necessary for the intended use of the resulting data set. This elimination of data elements ends up with a smaller data-set, more concentrated on the needed elements. The lower risk is an added benefit.

In all cases where de-identification is used; one must consider the residual risk of the resulting data-set. Unless that data-set is empty, then there is some residual risk.


Overall I like this paper. It has some depth that is nice to have. I encourage a full read of the paper, as the executive overview doesn't cover everything . The above observations are not surprising given that there are three representatives from Google on the team that wrote this paper. I am surprised at the lack of many of the other big-data perspectives.

Sunday, June 29, 2014

De-Identifying free-text

Each time I blog on De-Identification, I get questions about free-text fields.  The focus on free-text fields is covered in ISO 25237 healthcare specification on De-Identification, in DICOM, and in the IHE Handbook on De-Identification. They all cover the more broad concept of “Non-Structured Data Variables”. This includes free-text fields, but also recognizes other data that are not well constrained upon input and storage. So this includes Voice Recordings, Images, and even calls out the medical imaging standards of DICOM. DICOM also addresses this problem space, pointing out that historically it is common for the Radiology image to have burned-into the image identifying and routing information.

Why is Free-Text a concern?

So the specific problem with non-struct
ured data elements/attributes/variables are that they could contain Direct Identifiers, Indirect Identifiers, or simply non-identifying data. It is the very fact that they are non-structured that results in this non-deterministic situation. Often times these fields are simple text-editing fields where the clinician can write anything they want. They might be prompted to enter relevant information, like description of the disposition of the patient. However without restrictions, the clinician could have put the patient name into the field.

Free-text is not safe

This is where the Intended Use-case of the resulting data-set comes into play. If there is no need that comes from the Intended Use-case for these non-structured data fields, then simply dropping them. My second rule of De-Identification is that by default you get ZERO data elements. The intended use-case needs to justify everything that is provided. Most of the time the value of a free-text field is of no value. Often times they are included simply to future-proof a workflow. That is there is a free-text field to handle ‘anything else’. So deleting the free-text field is really the most likely right way to handle it.

Useful Free-text

However there are times where for example a study is being done, and those clinicians participating are instructed to put critical study information into a normally unused free-text field. Thus the field content is critical to the Intended Use-case of the resulting data-set. In this case, we do know that the otherwise free-text field, really might contain some structured data. So the easy thing to do is pre-process the free-text fields to extract out the information that is needed by the Intended Use-case of the resulting data-set into a structured and coded entry. Throw away the rest of the free-text field. Treat the new coded entry according to the normal processing rules, which does mean it must be determined if it is a Direct Identifier, Indirect Identifier, or non-identifying data; and one must also look at the values resulting to determine if they might themselves identify an individual.

What is important here is to recognize that you have converted the free-text into structured and coded values. So, ultimately you are not passing free-text, the free-text field is destroyed.

Unstructured Image

It is much less likely that you can post-process images or voice into structured data, but I am not going to say it can’t be done. DICOM has a very comprehensive treatment of De-Identifying DICOM objects.  Specifically Chapter E of Part 15

5. The de-identifier should ensure that no identifying information that is burned in to the image pixel data either because the modality does not generate such burned in identification in the first place, or by removing it through the use of the Clean Pixel Data Option; see Section E.3. If non-pixel data graphics or overlays contain identification, the de-identifier is required to remove them, or clean them if the Clean Graphics option is supported. See Section E.3.3 The means by which burned in or graphic identifying information is located and removed is outside the scope of this standard.


I am a fan of De-Identification, when used properly and for the right reason. However De-Identification is not the only tool to be used, sometimes data simply should be properly managed, including Access Controls and Audit Controls. This same conclusion is true of data that are De-Identified, that is unless you end up with the null-set then you will have some risk that needs to be properly managed, including Access Controls and Audit Controls.

Free-text fields, all fields that you don't know have a specific structure to them, need to be treated carefully. Best case is to delete their content, but if you need part of the content then parse that information out into structured and coded values, discarding the original free-text.

Friday, June 27, 2014

De-Identification: process reduce risk of identification of entries in a data-set

It has been a very active De-Identification month. There have been many blog articles lately, some complaining about other blogs. Other blogs saying how over blown re-identification is. Many of these were inspired by the USA White-house "President Council of Advisors on Science and Technology (PCAST)" that produced an interesting paper on "Big Data: A Technological Perspective".

I would like to say first: YOU ALL ARE RIGHT, yet also sligntly twisted in your perspective.

Whenever the topic of De-Identification comes up, I am quick to remind the audience that "The only truly de-identified data are the null-set!". It is important that everyone understand that as long as there are any data, there is some risk. This is not unlike encryption, at the extremes, in that brute force can crack all encryption, but the 'key' (pun intended) is to make it so hard to brute force that the cost is simply too expensive (lengthy). Unlike encryption, the maturity and effectiveness of de-identification is much less. 

There are plenty of cases where someone thought they had done a good enough job of de-identifying, only to be proven wrong. These cases are really embarrasing to those of us that are trying to use de-identification. But these cases almost always fail due to poor execution of the 'de-identification process'.

De-Identification is a process to reduce risk.

I have been working on the revision of the ISO 25237 healthcare specification on De-Identification. We are making it even more clear that this is just a risk reduction, not an elimination of risk. Often times the result of a de-identification process is a data-set that still has some risk. Thus the de-identification process must consider the Security and Privacy controls that will manage the resulting data-set. It is rare to lower the risk so much that the data-set needs no ongoing security controls.

The following is a visualization of this  process. This shows that the top-most concept is de-identification, as a process.  This process utilizes sub-processes: Pseudonymization and/or Anonymization. These sub-processes use various tools that are specific to the type of data element they operate on, and the method of risk reduction.

The presumption is that zero data are allowed to pass through the system. Each element must be justified by the intended use of the resulting data-set. This intended use of the data-set greatly affects the de-identification process.


De-Identification might leverage Pseudonymization where longitudinal consistency is needed. This might be to keep a bunch of records together that should be associated with each other, where without this longitudinal consistency they might get disassociated. This is useful to keep all of the records for a patient together, under a pseudonym. This also can be used to assure that each time data are extracted into a de-identified set that new entries are also associated with the same pseudonym. In Pseudonymization the algorithm used might be intentionally reversible, or intentionally not-reversible. A reversible scheme might be a secret lookup-table that where authorized can be used to discover the original identity. In non-reversable is a temporary table might be used during the process, but is destroyed when the process completes.


Anonymization is the process and set of tools used where no longitudinal consistency is needed. The Anonymization process is also used where Pseudonymization has been used to address the remaining data attributes. Anonymization utilizes tools like Redaction, Removal, Blanking,  Substitution, Randomization, Shifting, Skewing, Truncation, Grouping, etc.

Each element allowed to pass must be justified. Each element must present the minimal risk, given the intended use of the resulting data-set. Thus where the intended use of the resulting data-set does not require fine-grain codes, a grouping of codes might be used.

Direct and Indirect Identifiers

De-Identification process identifies three kinds of data: Direct identifiers, that by themselves identify the patient; Indirect identifiers, that provide correlation when used with other indirect or external knowledge; and non-identifying data, the rest of the data. Some also refer to indirect identifiers as 'pseudo identifiers'.

Usually a de-identification process is applied to a data-set, made up of entries that have many attributes. For example a spreadsheet, made up of rows of data organized by column.

The de-identification process, including pseudonymization and anonymization, are applied to all the data. Pseudonymization generally are used against direct identifiers, but might be used against indirect identifiers, as appropriate to reduce risk while maintaining the longitudinal needs of the intended use of the resulting data-set. Anonymization tools are used against all forms of data, as appropriate to reduce risk.

IHE De-Identification Handbook

Books on De-Identification

I just finished reading, and highly recommend the book "Anonymizing Health Data: Case Studies and Methods to Get You Started", by Khaled El Emam. This is a good read for someone needing to understand the de-identification domain. It is not a reference, or deep instructional document. I presume his other books cover that. There are some really compelling examples in this book, real-world examples. There is also a very nicely done explanation, high-level explanation, of the quantitative mechanism to assess residual risk on a resulting data-set. Such as K-anonymity.

References to Blog articles

Friday, June 6, 2014

FW: IHE ITI Published: PDQm, SeR, and De-Id Handbook

 I will further explain these new supplements and handbook in later posts

IHE IT Infrastructure Technical Framework Supplements Published for Public Comment

The IHE IT Infrastructure Technical Committee has published the following supplements to the IHE IT Infrastructure Technical Framework for public comment in the period from June 6 through July 5, 2014:
  • Patient Demographics Query for Mobile (PDQm) 
  • Secure Retrieve (SeR)
The documents are available for download at Comments submitted by July 5, 2014 will be considered by the IHE IT Infrastructure Technical Committee in developing the trial implementation versions of the supplements. Comments can be submitted at

The committee has also published the following Handbook:
  • De-Identification
    • De-Identification Mapping (Excel file)
The documents are available for download at Comments on all documents are invited at any time and can be submitted at