Wednesday, December 7, 2011

Patient Identity Matching

IHE has Patient Identity Matching profiles: PIX for use inside and across a health information exchange (HIE), and XCPD for use across communities (e.g. NwHIN-Exchange). Patient Matching is also known as Multi-Patient-ID (eMPI), which is a system that matches many different patient identities (PID) in a cross-reference. This is where the name of the IHE Profile comes from: “Patient Identity cross-reference” (PIX). This is not to be confused with a Master-Patient-Identity, which is a concept where there is a master patient ID that everyone uses. The Cross-Community Patient Discovery (XCPD) Profile is better suited to matching patients across multiple communities.

The slide at the right is from an upcoming IHE educational webinar on Patient Identity. It shows ALL of the IHE profiles that are relevant as combined into a 'system'. This system happens to combine all the Actors simply for educational purposes, but very reasonable products could do sub-selections of this for specific purposes.

IHE does not define the internal workings of the eMPI system which might implement the IHE PIX manager and/or XCPD responder actors. There is much left not specifically defined by the PIX or XCPD profile. There are several items of interest in the requirements of a PIX manager, specifically demographic matching and the value of globally unique identifiers.

These concepts are often used in combination, such as the XDS concept of an Affinity Domain, which requires a Master Patient identity that is used in XDS as the patient ID. When other entities using a local patient identifier wish to communicate using XDS, a Cross-Reference system is used to map to this Master Patient identity. XCPD operates in environments where no eMPI system or Master Patient Identity exists. Entities use demographic matching to correlate patients with selected partners, as necessary. No central or authoritative eMPI system enables this process, so repeated matching requests are needed to keep correlations up to date.
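The cross-reference idea can be sketched as a small in-memory table. This is only an illustration of the concept, not how the PIX profile specifies anything; the class name `CrossReference`, the domain labels, and the identifier values are all made up for the example.

```python
from collections import defaultdict


class CrossReference:
    """Toy cross-reference: links (domain, id) pairs that refer to one person."""

    def __init__(self):
        self._person_of = {}               # (domain, id) -> internal person key
        self._ids_of = defaultdict(set)    # internal person key -> {(domain, id)}
        self._next_key = 0

    def feed(self, domain, patient_id, same_as=None):
        """Register an identifier, optionally linked to an already known one."""
        if same_as is not None and same_as in self._person_of:
            key = self._person_of[same_as]
        else:
            key = self._next_key
            self._next_key += 1
        self._person_of[(domain, patient_id)] = key
        self._ids_of[key].add((domain, patient_id))

    def query(self, domain, patient_id, target_domain):
        """Return the identifiers known for this person in target_domain."""
        key = self._person_of.get((domain, patient_id))
        if key is None:
            return []
        return sorted(i for d, i in self._ids_of[key] if d == target_domain)


xref = CrossReference()
xref.feed("AFFINITY-DOMAIN", "M-1001")    # hypothetical master ID used in XDS
xref.feed("HOSPITAL-A", "A-77", same_as=("AFFINITY-DOMAIN", "M-1001"))
xref.feed("CLINIC-B", "B-3", same_as=("AFFINITY-DOMAIN", "M-1001"))

print(xref.query("HOSPITAL-A", "A-77", "AFFINITY-DOMAIN"))   # ['M-1001']
```

A local system asks with its own identifier and gets back the identifier used in the Affinity Domain. In the XCPD case there is no such shared table, which is why the matching has to be repeated.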

Demographics Matching
Usually an eMPI operating environment tries to define the minimal attributes necessary to make a fuzzy-match algorithm function well enough. 'Good enough' is a subjective assessment, but it means that most of the time a positive match is found, and only very rarely does the algorithm produce a false match or a false non-match. This tuning of the matching algorithm is the primary function of an eMPI. The downside of using a centralized eMPI service is that it has a database of all of these demographics, and is thus a point of security/privacy threat.
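To make the tuning trade-off concrete, here is a minimal sketch of a weighted fuzzy score with two thresholds. The weights, the thresholds, and the use of Python's `difflib.SequenceMatcher` are my assumptions for illustration; real eMPI products use their own (often probabilistic) algorithms and tune them against their own data.

```python
from difflib import SequenceMatcher

# Hypothetical weights and thresholds, chosen only for this example.
WEIGHTS = {"first": 0.25, "last": 0.35, "dob": 0.30, "sex": 0.10}
MATCH_AT = 0.90     # at or above: treat as the same patient
REVIEW_AT = 0.75    # between the thresholds: send to a human for review


def similarity(a, b):
    """String similarity in [0, 1], ignoring case and surrounding spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def score(rec_a, rec_b):
    """Weighted similarity across the attributes listed in WEIGHTS."""
    return sum(w * similarity(rec_a[f], rec_b[f]) for f, w in WEIGHTS.items())


def classify(rec_a, rec_b):
    s = score(rec_a, rec_b)
    if s >= MATCH_AT:
        return "match"
    if s >= REVIEW_AT:
        return "review"
    return "non-match"


a = {"first": "John", "last": "Smith", "dob": "1970-01-01", "sex": "M"}
b = {"first": "Jon", "last": "Smith", "dob": "1970-01-01", "sex": "M"}
c = {"first": "Maria", "last": "Lopez", "dob": "1985-06-23", "sex": "F"}
print(classify(a, b))   # match
print(classify(a, c))   # non-match
```

Raising `MATCH_AT` reduces false matches at the cost of more false non-matches, and lowering it does the opposite. That is the tuning knob described above.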

The minimal attributes are often things like First-Name, Last-Name, Date-of-Birth, and Sex. These values are delivered to the Patient ID Manager in the Patient Identity Feed transaction (essentially a basic ADT message). They are then ‘normalized’ to handle things like uppercase vs lowercase, initials, and spelling differences (e.g. Rich, Richard, Dick). These 4 attributes have been used well beyond the healthcare industry. For example, they are used in the gambling world by the ‘house’ to detect repeat offenders. In fact, they use a system that doesn’t store the individual’s demographics, but rather a cryptographic value, lowering the risk of disclosure if their database were exposed. This is the trick that John Halamka referred to in #8 of his post on Freeing the Data. It is also used to keep one casino’s clientele list from the other casinos, so there is a strong business requirement.
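Here is a rough sketch of normalizing the four attributes and then storing only a keyed hash of them. I do not know the exact scheme the casino systems use; the nickname table, the `|` separator, and the choice of HMAC-SHA256 are assumptions made for the example.

```python
import hashlib
import hmac
import unicodedata

# Hypothetical nickname table; a real one would be far larger.
NICKNAMES = {"rich": "richard", "dick": "richard", "rick": "richard"}


def normalize_name(name):
    """Lowercase, strip accents and punctuation, and expand known nicknames."""
    decomposed = unicodedata.normalize("NFKD", name)
    letters = "".join(c for c in decomposed if c.isalpha()).lower()
    return NICKNAMES.get(letters, letters)


def demographic_token(secret_key, first, last, dob, sex):
    """Keyed hash over the normalized attributes, stored instead of raw values."""
    canonical = "|".join([
        normalize_name(first),
        normalize_name(last),
        dob.strip(),
        sex.strip().upper(),
    ])
    return hmac.new(secret_key, canonical.encode("utf-8"), hashlib.sha256).hexdigest()


key = b"example-only-secret"   # placeholder; a real key must be kept secret
t1 = demographic_token(key, "Dick", "Smith", "1970-01-01", "m")
t2 = demographic_token(key, "RICHARD ", "smith", "1970-01-01", "M")
print(t1 == t2)   # True
```

Note the trade-off: once the values are hashed, only exact equality after normalization can be tested, so all of the 'fuzziness' has to happen in the normalization step.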

Change over Time
Recognize that these values, and other values such as the patient's phone number or address, change over time. This is shown by the figure at the right, which comes from The HIT Standards Summer Camp Patient Matching report in August, 2011. This change over time can be detected, and when it is detected both the new and old values are remembered as equivalent. In this way one can match data that is submitted under the old or new demographics. This does require that the eMPI hold many generations of entries. Identities and demographics changing over time do add complexity, but reality must be recognized.
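One way to picture 'many generations of entries' is a record that keeps its earlier demographics alongside the current ones. This is a minimal sketch with made-up class names, and it uses exact comparison where a real eMPI would apply its fuzzy matching.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Demographics:
    first: str
    last: str
    dob: str
    sex: str


@dataclass
class IdentityRecord:
    """One person, with every generation of demographics seen so far."""
    current: Demographics
    history: list = field(default_factory=list)

    def update(self, new):
        """Record a change; the old values remain matchable."""
        if new != self.current:
            self.history.append(self.current)
            self.current = new

    def matches(self, candidate):
        """True if the candidate equals the current or any earlier generation."""
        return candidate == self.current or candidate in self.history


before = Demographics("Ann", "Jones", "1980-02-03", "F")
after = Demographics("Ann", "Miller", "1980-02-03", "F")   # name change

record = IdentityRecord(current=before)
record.update(after)
print(record.matches(before), record.matches(after))   # True True
```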

Because this information changes over time, there should also be a place where the most current set of demographics is kept. It might not be the 'authoritative' set, but it sure would be good to be using the First or Last name that the patient wants to be addressed by. This brings up the topic of the longitudinal record. The HIE and Community Exchange are longitudinal, meaning they ultimately will contain many decades of records on any one patient. Throughout this longitudinal record many of the factors will change, even those that are shown above way off the chart to the lower right (meaning they are highly stable). This means that when pulling a record from an HIE of any kind, one must not necessarily expect that the demographics inside that document represent the current or even local understanding of the patient's demographics. This doesn't mean the Document should not be interrogated, but when discrepancies are found they should be somewhat expected, with possibly only a warning message to the user.

Additional Identifiers
There are many different types of identifiers that a patient can use to uniquely identify themselves. If these identifiers are provided as input to the eMPI they help produce a better positive match. If these identifiers are treated as opaque and fully specified identifiers they don’t require special handling. This is to say that both the identifier and the identifier of the assigning authority are submitted. With a generic system like this the solution supports endless types of identifiers.

For instance, if the patient has a SSN, one enters the SSN with an ‘assigning authority’ for the SSN admin (i.e., 2.16.840.1.113883.4.1). If the patient has their insurance card, you enter that with the insurance admin as the assigning authority. If the patient carries a patient id from another facility, you enter that. It is always a <ID value> + <assigning authority value>; this is just another patient-ID in the context of an eMPI (even when the ID isn’t healthcare specific). It is very important that everyone uses the same values for ‘assigning authority’, so one does need a ‘value-set’. This is especially true when the assigning authority doesn’t have its own globally unique assigning authority value, or the value is hard to discover.
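The <ID value> + <assigning authority value> pairing can be sketched as a tiny data type. The SSN OID is the one given above; the insurer OID `1.2.3.4.5` and all identifier values are made up, and the `to_cx` rendering follows the HL7 v2 CX style (`value^^^&authority&ISO`) as I understand it to be used with PIX and XDS.

```python
from dataclasses import dataclass

SSN_AUTHORITY = "2.16.840.1.113883.4.1"   # OID for the SSN assigning authority
INSURER_AUTHORITY = "1.2.3.4.5"           # made-up OID, for illustration only


@dataclass(frozen=True)
class QualifiedId:
    value: str
    assigning_authority: str   # globally unique, e.g. an OID

    def to_cx(self):
        """Render in the HL7 v2 CX style: value^^^&authority&ISO."""
        return f"{self.value}^^^&{self.assigning_authority}&ISO"


def shared_identifiers(ids_a, ids_b):
    """Identifiers present in both sets; value AND authority must agree."""
    return set(ids_a) & set(ids_b)


seen_before = {QualifiedId("000-00-0000", SSN_AUTHORITY),
               QualifiedId("MEMBER-42", INSURER_AUTHORITY)}
presented = {QualifiedId("MEMBER-42", INSURER_AUTHORITY)}

print(bool(shared_identifiers(seen_before, presented)))   # True
print(QualifiedId("000-00-0000", SSN_AUTHORITY).to_cx())
```

Because the authority is part of the identifier, the same string issued by two different authorities never matches by accident, and no per-type handling is needed.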

Universal Health ID
What would be best is if there was some form of universal health ID. Such IDs are currently used in other countries, for example across Europe in the epSOS multi-country exchange. There are regulations forbidding the USA government from funding an effort to create a universal health ID. A unique approach to get around this is an effort to create a digital identity for Medicare beneficiaries. It is interesting how they get around the ‘can’t fund a universal ID’ problem by scoping it to Medicare beneficiaries.

A very visible example of a universal ID (that comes with a unique string encoding) is an e-mail address: One can see how this will work with PHRs to create a globally unique patient ID, for example my HealthVault id can be viewed as This is entered simply as another patient-ID, and if it has ever been submitted in the past, it will be there for a positive match. E-mail addresses come with a built in globally unique assigning authority, the second part of the address (e.g. “”). These are globally unique simply because of the internet domain name system.

Another approach to using identifiers to improve patient matching is the Voluntary Universal Healthcare Identifier, which supports creation and management of patient identifiers that are independent of a particular healthcare provider entity and so can be used to match patients in an eMPI.

Note that with any ID the biggest concern is to be sure you have an authoritative ID. We are used to looking at driver's licenses or passports to get an authoritative identifier. We somehow trust that a patient can tell us their SSN (really bad practice due to the well-known fraud and identity theft). When it comes to a patient presenting something like an e-mail address, there is a reasonable concern that this information is not authoritative. But clearly, it should be seen as just as authoritative as the SSN. Likely with an e-mail address we can work up mechanisms to prove it is authoritative before we use it, very much like the banking industry and e-mail distribution lists do.

Security Considerations
Clearly Security is important for any system that holds sensitive information. The eMPI is a form of a directory, a specific form. When queried using PDQ it looks more like a directory. One must recognize that the patient demographics and identifiers are sensitive (valuable). So the eMPI system must be protected against security risks: Risks to Confidentiality, Risks to Integrity, and Risks to Availability.

Clearly when accepting query requests or information, the eMPI needs to make sure the query request or information is authentic and authoritative. This is typically done with mutual authentication of the communications, as profiled in IHE with ATNA. That is, the requesting system can authenticate that it has connected to the correct and authentic eMPI, and the eMPI can be assured that the system that has connected to it is authentic and authorized. This system-level authentication is usually enough for an eMPI, especially PIX/PDQ in an HIE. The XCPD profile also supports user assertions using the XUA profile. This allows the XCPD interface to an eMPI to make more fine-grained decisions, but more importantly to record more complete information in the audit log. That said, recognize that data returned to a system like an EHR is usually totally available to everyone in that EHR.
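As a rough illustration of the mutual-authentication idea on the eMPI side, here is how a server-side TLS context that insists on a client certificate might be set up with Python's standard `ssl` module. This is a sketch, not an ATNA-conformant configuration; the function name and the file-path parameters are hypothetical.

```python
import ssl


def build_empi_server_context(cert_file=None, key_file=None, trusted_ca_file=None):
    """TLS context for the eMPI side that requires a client certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Refuse any connecting system that cannot present a valid certificate.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file:
        # The eMPI's own certificate, so that clients can authenticate the eMPI.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if trusted_ca_file:
        # Certificates (or CAs) of the systems that are allowed to connect.
        ctx.load_verify_locations(cafile=trusted_ca_file)
    return ctx


ctx = build_empi_server_context()   # no files loaded in this dry run
print(ctx.verify_mode == ssl.CERT_REQUIRED)   # True
```

The client side mirrors this: it loads its own certificate and verifies the eMPI's, so that each end knows who it is talking to.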

The eMPI should also be able to protect the different types of attributes that it holds. That is, it might consider some attributes more sensitive than others. For example, as I showed above, the eMPI can authenticate the system that is sending a Query. Some of these systems might be more highly trusted with all the attributes, where other systems would be allowed access only to the healthcare identifiers.
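A minimal sketch of that attribute-level filtering follows. The trust tier names and the attribute lists are assumptions for illustration; how an eMPI assigns a tier to an authenticated system is left out.

```python
# Hypothetical trust tiers and the attributes each tier may see.
VISIBLE_ATTRIBUTES = {
    "full": {"identifiers", "first", "last", "dob", "sex", "address", "phone"},
    "identifiers-only": {"identifiers"},
}


def filter_response(record, trust_tier):
    """Return only the attributes the authenticated system may see."""
    allowed = VISIBLE_ATTRIBUTES.get(trust_tier, set())   # unknown tier sees nothing
    return {k: v for k, v in record.items() if k in allowed}


record = {
    "identifiers": ["A-77^^^&1.2.3.4.5&ISO"],   # made-up identifier
    "first": "Ann",
    "last": "Miller",
    "dob": "1980-02-03",
    "sex": "F",
}

print(filter_response(record, "identifiers-only"))
# {'identifiers': ['A-77^^^&1.2.3.4.5&ISO']}
```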

Consent enforcement
There are use-cases where the very knowledge that a consumer has information at a healthcare providing location is considered controlled by privacy policy. This is true of the highly sensitive health topics (e.g. 42 CFR Part 2), but is also true in some states. In these cases there is a need to have the eMPI recognize the current state of patient consent to disclose. That is that the eMPI must not let others know that the patient has an identifier (or data) when the patient has not authorized it. In this case the eMPI acts as if the patient simply doesn't exist.
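The key point, that a refused lookup must be indistinguishable from an unknown patient, can be sketched as follows. The data structures and names are hypothetical; a real eMPI would consult an actual consent registry.

```python
def query_with_consent(index, consents, patient_key, requester):
    """Look up a patient, hiding the record unless disclosure is authorized."""
    record = index.get(patient_key)
    if record is None:
        return None
    if requester not in consents.get(patient_key, set()):
        return None   # deliberately the same answer as "unknown patient"
    return record


index = {"p1": {"id": "A-77"}, "p2": {"id": "A-78"}}
consents = {"p1": {"HOSPITAL-A"}}   # p2 has authorized nobody

print(query_with_consent(index, consents, "p1", "HOSPITAL-A"))   # {'id': 'A-77'}
print(query_with_consent(index, consents, "p2", "HOSPITAL-A"))   # None
print(query_with_consent(index, consents, "p9", "HOSPITAL-A"))   # None
```

The last two calls return the same thing, so the requester cannot tell a patient who has withheld consent from one who was never registered.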

The HIT Standards Summer Camp covered Patient Matching and produced their report in August, 2011. This report leverages the more detailed report from ONC on Privacy and Security Solutions for Interoperable Health Information Exchange - Perspectives on Patient Matching: Approaches, Findings, and Challenges.

I thank Karen Witting (IBM) for helping produce this article. Karen has extensive knowledge of the Patient Matching domain, acquired during her extensive research to produce the Cross-Community Patient Discovery (XCPD) profile.

Update: Umesh Madan, on his blog "Engineer by Day", does a fantastic job of explaining how Spell-Checkers work. This is very similar to how Patient Demographics are 'fuzzy' matched.

1 comment:

  1. The S&I Framework Consent group discussed letting data objects carry a link to the appropriate consent policy. This is indeed secure, and does not require pre-establishing a link to policy at the recipient. But it also has disadvantages, for both patient and government policies. First, next year one might want to use a different policy store. Second and more serious, we want to pass structured objects that can be shredded and stored in recipient's databases. Few such databases are capable of attaching a policy link to each cell in their database.

    An alternative that removes the first difficulty is to pass a consent ID for the patient, and provide a look-up service that determines which consent service has that patient's preferences. (This is akin to looking up my phone provider based on my phone#, a good use of late binding).

    The second difficulty is tougher. It seems to require having the patient authenticate and associate a consent ID (a sort of voluntary patient identifier) with each record holder's patient account. The consent service would retain links among all of a patient’s IDs, but not reveal them. The only legitimate use would be for validating a shipment of data.