Friday, November 24, 2017

HIE transition to Patient-Centered from Provider-Centered

The whole concept of  Health Information Exchanges, that I have been involved with, is there to improve Health outcomes for the Patient. So I often get frustrated when someone says that the HIE needs to become Patient Centered. I am a Patient in Wisconsin, and I feel the impact of WISHIN. There is no other purpose of an HIE besides the Patient. I have to take a few breaths and remind myself that better outcomes for the Patient is fantastic, but that the Patient doesn't 'feel' like they have any say or involvement. It is this that needs to improve.

In Wisconsin we do have Consent, specifically there is a state wide system for a Patient to choose to NOT allow their data to be shared over the exchange. This does not give them much other than ON vs OFF. But it is more than some.  So, this is usually first step in moving from a Provider-Centered to a Patient-Centered model.

This level are not fantastic, but it is far better than what we had a handful of years ago when there was no network. There is a bright future for the Health Information Exchange too. I want to expand upon the future transition from Provider-Centered to Patient-Centered, as a trend that is already underway.

Provider-Centered

First I need to address this statement. There is nothing in existing HIE that is "Provider-Centered". Today's system is highly "Manual", outright hard to use. But that is the last article. I truly feel sorry for Providers that are going out-of-their-way to use an HIE for the benefit of their Patient.

I will say though that this is exactly along the same lines as why "Patient-Centered" is not a good statement either. That is to say that today the Provider gets to interact with the HIE, where as the Patient is just a passenger. Thus Patient-Engaged vs Provider-Engaged might have been the better phrase.

Patient-Centered (aka Patient-Engaged)

So, let me say "Patient-Engaged" as the more clear two word statement. I think the goal of this initiative is to get the Patient more engaged. This might be more actual engagement, but might also be the feeling like being engaged. 

The pretty picture, Wisconsin is a pretty state, on the right shows how well WISHIN has included Patients. Odd choice of colors, as I would have chosen green to be the best case, and red to be the areas where more work was needed.  This graph is a current state, and included means that the given percentage of the population have their data accessible within WISHIN (and thus eHEX). 

The prime Patient need in Wisconsin is to support our tendency, especially the elderly, to fly-south in the winter (Arizona, Texas, Florida, etc). WISHIN has that priority covered through eHEX and direct HIE-to-HIE engagements.   They don't have good coverage with all states, but the southern ones where we tend to retreat to were clearly seen as a priority.  Also, some of the other states that have people that come up to Wisconsin in the summer and fall are also covered. 

The WISHIN system also includes support for Direct-Secure-Messaging, so the theory is that anyone that supports Direct is reachable. 

Here are some other ways to engage the patient more:
  1. Provide an Access Log. I would say Accounting of Disclosures, but there are simply too many exceptions that this results in a useless log. I want an Access Log, that is a log of every time my data was accessed (Direct or Exchange) using the WISHIN infrastructure. Who requested the access? What did they ask for? What did they get? When was this? Where was this? Why did they access (PurposeOfUse)?  There no network that I know of that provides a view of how the HIE was used to expose the Patient data. I recognize the concern that Covered Entity have that gives us "Normal Operations" exceptions. I don't like these exceptions, but I understand why they exist. I think that ALL accesses over an Exchange need to be reported to the Patient.
  2. Provide API access for applications the Patient chooses and authorizes. In the past this would be covered by a statement of "PHR", but that concept is too limited today. This item is inclusive of the older concept of a PHR, but is also inclusive of newer health Apps. Where a PHR was a system that would copy the patient data and give the patient the ability to connect apps to that copy of the data; now we are looking to use FHIR as a way to connect these Apps directly to the data.  These apps will tend to just need read-only access, but...
  3. Provide capability of the Patient to author data. Many patients, myself included, are using many home-care devices, personal-care devices, health-monitoring devices, and sports related devices. These are producing a wealth of information, much of it is just background measurements. These measurements are not accessible to Providers unless they can be contributed on-behalf of the Patient.
  4. Provide the Patient to challenge the validity of data. Once we can see the data, we will surely find some mistakes. Being able to challenge the validity of the data is essential. Even HIPAA acknowledged the need for the Patient to 'Amend" their data. I say challenge as to be closer to "Patient-Centered" or "Patient-Engaged". 
I'm sure there are others. I base these on the Privacy Principles

Summary

Caution. I have a long history in Patient-Safety, and Security. Thus I recognize the need to be careful with adding any capability. We must do "Risk Assessments" and mitigate the risks.  The new capabilities are clearly helpful for Patients that want to use them to better their health. However these new capabilities can be used maliciously too. Someone might use these new capabilities to gain advantage on someone else. Someone might use these new capabilities to gain themselves an advantage. We must consider someone who is Motivated, Capable, and Funded; along with mixtures of these. Engaging the Patient, means enabling abuse.

Benefit. Many patients can help with their own health if only they could get accurate data. Healthcare is about the Patient, it should be not just about the patient but with the patient.  Every living creature is a potential Patient, this should be obvious as an important priority.  The cynic would say that the law recognizes corporations as entities, well that kind of entity is never a Patient.  Humans are so much more important.

What is your ideas????? Please post comments



Wednesday, November 22, 2017

HIE Future is bright - Automated not Manual

I think it is important to celebrate what we have today in Health Information Exchanges. They are not fantastic yet, but they are far better than what we had a handful of years ago.  However there is a bright future for the Health Information Exchange too. I want to expand upon the future transition from Manual to Automated, as a trend that is already underway.

Manual HIE

Today we have Health Information Exchanges that enable Providers to send Directed Secure E-Mail messages to other Providers. This s a conscious thought, usually when referring a patient, or when a patient asks for this to be done. This is basic PUSH capability. The word basic should not be viewed as easy or trivial. There is a huge progress that gets us to the point of being able to say that this capability exists and is mostly ubiquitous.  We should celebrate getting to this point. But we should not stop here.

The Query model of Health Information Exchange also enables a Provider to publish a document they created under the hope that someone might find it useful. This is available, but who really has the extra time to publish something under the hope that someone might find it useful in the future.  But this Query model does have documents that are available, and those documents are found to be useful.

Most of these documents are not documents that are written and published. Most of these documents are documents that someone asked if they existed, and then the document was created. So even in a Query model, it is a manual step of someone asking if a document exists, followed by a document being assembled at that time.  This is either "On-Demand", or "Delayed-Document-Assembly".   In the case of "On-Demand", a prototypical document is published as available, which when Retrieved causes a fresh document to be created from current information. In "Delayed-Document-Assembly" a prototypical but specific document potential is published, which when Retrieved causes that specific document to be created. The big difference is that the "On-Demand" is made from current information, where "Delayed-Document-Assembly" is made from historic information.

The manual part in this case is that the recipient is manually choosing to Query the network to see if anyone has documents available for a specific patient.

Manual is not bad. Manual is the first step. We must go through Manual to get to Automated. It is time...

Future is Automated

In all of the Manual use of an HIE, there is a conscious effort to take action and use the HIE. This is really bad User Experience. What we really want is that the HIE capability, availability of patient data, is automatic. There should be no effort to consciously use the HIE. If the HIE can be useful, the 'system' should just use the HIE.

  1. When a Patient is scheduled for a appointment or procedure; the 'system' should gather all the data available. Not just gather it, but process it into useful information, perhaps using a new mXDE profile from IHE.
  2. When a Patient is at a registration desk; the 'system' should gather all the data....
  3. Inappropriate activities are detected and appropriate authorities engaged. Such as Providers using break-glass, unusual billing patterns, or drug seeking patients. These patterns would not be noticeable within any single organization, but can when viewed across a region.
  4. When a decision on a CarePlan is made for a Patient, the 'system' should look for potential experts in the local region, that are covered by the patient's health plan, and have characteristics that the patient has expressed.
  5. When a CarePlan progresses, the members of the CareTeam should be appropriately notified. What is appropriate notification will depend on their role. Some will see the Patient show up in their census. Some will have time on their schedule reserve to review the CarePlan activity...
  6. When a clinical research project is initiated, and where the Patient has expressed interest, that Patient's pre-arranged preferences (on the blockchain using smart-contracts) for participation will engage. That is the Patient will express interest, and conditions. This might mean that the Patient allows their fully-identified data to be used, partially-deIdentified data to be used, etc.. Including conditions of contact, conditions of notification. The preference might be to just notify the patient and nothing more.

I expect many more automated use-cases. These however are ones I know are actively being worked on. Like how about an automated analysis of treatment to discover better outcomes, and promote the plan of care that got that better outcome?

The Future is bright, but it must be appropriately designed

The point of Automated is that the HIE needs to move from something we think about, to something that is automatically used, but used in an appropriate way. Used in a way that is comprehensively more productive, more privacy respecting, more safety protecting, and more efficient. All of these must be considered. We can't automate and give away Privacy. We can't automate and give away Safety. We can't automate and give away better treatment outcomes. We can't automate and give away financial efficiency.

What is your ideas on how HIE can be made more Automated? Please post comments...

Tuesday, November 21, 2017

Future of HIE is bright

The Wisconsin HIE (WISHIN) held a summit last week. I signed up, but was unable to make it due to schedule conflicts. Their slide decks are now available and I am very sad I missed it.  The following diagram clearly shows Wisconsin leadership


This is such an exciting perspective of what the Wisconsin HIE delivers today, and where they are targeting for future support.

The other slide decks further elaborate on this plan. It is driven by delivering Value, not just Volume.  They had a segment that focused on Care Coordination as a driver of these changes.

They also have a comparison of nationwide HIEs, that is hard to disagree with the slide deck as there is so little detail present. There does seem to be a decidedly Wisconsin bent to the comparison. Focused on CareEverywhere (Epic solution), and on viewing WISHIN themselves as a solution. This is all expected, but not very helpful for use of the evaluation elsewhere.

I also found interesting the statistics slides. I don't have much to compare them to, so don't know if this is good, bad, or indifferent. It seems to me that the statistics show a really strong and healthy HIE. Widely implemented, with really good engagement.  But, then again I am a proud Wisconsinite. Also, I must note that I was on the Technical Advisory Committee in the early years. I think I am still listed, but have not heard about any meetings of that committee in a long time.

Note they are not ignorant of FHIR, I suspect it is going to be engaged in some of these future capabilities. I suspect it is more likely to be engaged in capabilities beyond this future.

Tuesday, November 14, 2017

Extra software/transaction details in FHIR AuditEvent / ATNA Audit Message

I have been in a few discussions lately where the question came up on how to add additional information to an ATNA Audit Message (aka FHIR AuditEvent). This additional information is not the kind of information that needs to go into an 'extension', as the Audit Message schema has support for what needs to be recorded. But there is a need to setup some things.

Here are the use-cases

  1. In a service oriented transaction there is a 'transaction identifier' that is specific to that transaction. When all audit events are recorded with their transaction identifier, then one can see all the audit events caused by one transaction. This transaction identifier might be in  SOAP header, or might be in an HTTP header, or might be elsewhere. For the purpose of this article it doesn't matter were it comes from, but rather that many different audit logging events can record the transaction identifier so that later the many different audit log entries can be correlated.
  2. In a layered software environment there are various software 'applications' (aka services) that might be involved in an 'event'. It would be good to record all of these applications so that an analysis of the audit log can show the involvement. For example: During the audit logging, it is possible to walk the call-stack to see all the layers involved. 

First thing to think about is if this additional information is appropriate. This is an important step because sometimes initial thinking is to put data into the audit log. One should keep data out of the audit log, and rely on metadata as linkage to the data that is managed more formally in a database. In the use-cases here these are both metadata, and not data. They explain what was involved, hence 'meta'.

Note that both of these use-cases are addressing similar realities, that when looking at an audit log one finds that it is useful to be able to understand how the event was processed. Realistically BOTH of these are likely to be needed. That is to say that every piece of software that touches a transaction could record their participation in that transaction, this would produce a large number of audit events, possibly too noisy.  The software "architecture" could be understood so well that only one event is recorded, with complete details. Most likely it is a combination of all of this.

IHE defined Audit Message is minimally transaction focused

Normally an Audit Message from IHE asks that the critical Who, What, Where, When, and Why (W5) are recorded. A specification like IHE will explain the most likely specifics about this for a given transaction. The unfortunate reality is that IHE is limited to the Interoperability layer of the software design, so they can only mention things that IHE manages; which is Actors. 

Such an example is where a XCPD Query on the server would record for this event: (Using IHE/DICOM Audit Message terms, similar map to FHIR AuditEvent)
  • Event Description: Defining this kind of an event
  • ActiveParticipant: the Initiating Gateway would be described by IP address, possibly the ReplyTo value if async web services are being used.
  • ActiveParticipant: the Responding Gateway by IP Address, and SOAP endpoint URI. The Process ID can be recorded to tie to local system logs.
  • AuditSource: identity of the system recording the audit event
  • ParticipantObject: The patient identity(ies) if known
  • ParticipantObject: The query parameters 
The closest thing to what this article is about is that the Responding Gateway can hold the local machine Process ID so that the event can be correlated with local audit logs (in non-IHE format)

You should also notice that this specification only mentions "Actors" (i.e. Initiating Gateway, and Responding Gateway). IHE can't specify a software architecture, so it doesn't know what the software architecture is. It is expected that the Software Architect can add these details.

Layered specification

I want to point out that IHE does augment the audit message expectation based on layered 'grouping' of actors. For example if the XCPD server above had received a XUA assertion, then there is the expectation that the above audit message would be augmented with an additional ActiveParticipant with the UserName element composed of the parts of the identity in the SAML assertion. See section ITI TF-2:3.40.4.2

    +  ActiveParticipant: 

           UserIsRequetor == TRUE  (this indicates the ActiveParticipant is describing the user)
           UserId ==  alias"<"user"@"issuer">" 
           RoleIdCode == roles from the SAML
           PurposeOfUse == from the SAML

Note that this is not a whole new Audit Message, but rather additional detail that gets added to any Audit Message when there is an XUA assertion. This additional information is expected to be added to ALL audit messages.

In looking at this. The specification should have been more clear that this should be done with a dedicated ActiveParticipant. The way the XUA specification describes it, one could just add the detailed UserName to any existing ActiveParticipant. I would rather it be a dedicated ActiveParticipant. Given that there is an infinite number of ActiveParticipant elements that could exist, and each one added only adds the elements populated, this doesn't make it much bigger of an Audit Message. Let me know what you think, should we CP this?

Where to put the transaction ID details?

In the first use-case the extra detail is is some kind of transaction identifier. In my example of XCPD there is a transaction identifier in the WS-Addressing MesssageID element in the query that goes into the WS-Addressing RelatesTo element on results returned. Recording this identifier might be useful to help relate responses with requests. This however is not part of the current IHE Technical Framework. I wonder if it should be?

In http REST, like FHIR, one could have an X-Request-Id that might also be good to record. See the FHIR chat discussion.

So the transaction identifier does not seem like an Active participant (AditEvent.agent) in the transaction, so it must be simply a ParticipantObject (AuditEvent.entity). The difference between these two is the "Active" part. That is if something/someone takes action then it is an ActiveParticipant (AditEvent.agent), else it is just some object that was part of the event being recorded.

Thus I would add a ParticipantObject that does nothing but hold the Transaction ID. By adding a ParticipantObject that only holds the Transaction ID it is easier to find all messages that have this ParticipantObject.

    + ParticipantObject: 
          ParticipantObjectTypeCode == 4 (Other)
          ParticipantObjectTypeCodeRole == 21 (Job Stream)
          ParticipantObjectIDTypeCode == transaction number type
          ParticipantObjectID == transaction identifier

Where to put Application Software Identifier details?

The next one is about where to place various layers of software. I would worry if this list gets long, especially if it is repetitious. It would not be helpful to list all the required layers that would be processing. But it does seem logical if there are different services that may or may-not be called upon, or where there are different instances that would need to be identified. 

In this case I think the application software is actively involved in the processing, so I would recommend this be recorded as a ActiveParticipant. So adding an additional ActiveParticipant would look like:

    + ActiveParticipant:
          UserId == the software identifier
          UserIsRequestor == FALSE 
          RoleIdCode == 110150 (Application)

There would be one of these for each layer of software actively involved. Thus care must be taken to not include all layers, just those that are useful. The other elements in the ActiveParticipant might also be used for some other purpose.

Conclusion

So the exiting ATNA Audit Message and FHIR AuditEvent can easily carry these additional details. This additional details can help resolve when there are failure-modes in a system, or to keep track of proper processing. Thus these additional details seem reasonable to incrementally add to existing Audit logs.

Note that there are some interpretations of what an audit log entry MUST or SHOULD contain. Sometimes these rules are considered too tight, and thus there have been some cases where an audit log entry has been declared non-conformant because it contained additional detail (or less detail). This is unfortunate, and wrong. An Audit Event log entry should record anything useful that is known at the time the audit log entry is recorded. This is a case where recording everything that is known is more important than recording everything theoretically possible, or failing to record because there is some data that is not known but is considered required by some tooling.

Friday, November 10, 2017

Healthcare use of Blockchain thru creative use of Smart-Contracts

I went to a blockchain conference yesterday. All the experts were clear that this early days. Caution, but excitement. They were all full of encouragement to try stuff out. All recognized that there is much misinformation and hype. All recognized anyone using blockchain is taking a big risk. None would state any prediction of the future. They also all recognized that those that have succeeded have reaped great rewards. They are all fully committed and excited...

No surprise, I have said this too:

What are THEY doing?

The experts that presented are focused on the financial flows. Not just money, like bitcoin. They are working on other financial flows including bank-to-bank money transfers, insurance payments,  payments based on contract terms, etc.  When pressed it was because these things can be made fully virtual, leverage the fact money is a concept in blockchain, and the biggest problem these flows have is the double-spend problem. The double-spend problem is well addressed in blockchain.  Also, when these kinds of things go wrong it results in simply loss of money. This loss of money can be huge, but it is still just a loss of money.

 keeps fidgeters occupied while not bothering others around them
That is to say they are not protecting an individual's Privacy, where a failure here is permanent loss of privacy. They are not protecting an individuals health, where a failure might be pain, loss of function, or death.

What should Healthcare do with blockchain?

This does not mean there is no room for blockchain and healthcare. just that we need to be careful about how we approach the topic. I have covered a set of Blockchain considerations for healthcare

Don't put medical data into the blockchain.
I am more convinced that putting healthcare data into a blockchain is a really bad idea.  Seems this is also a consensus that is coming together. Thus there will be many FHIR Servers that hold your data (might be others than FHIR, but why bother mentioning them).  For any specific use of data related to the blockchain, there might be one server or there might be many.That is to say that it is fully possible that the FHIR server associated with a blockchain project might have centralized the data prior to exposure through blockchain, or might proxy and make it appear as if there is only one. However probably best to presume many FHIR servers. In initial experimentation, experiment with one, but keep open as a gap expansion to many.

Don't use blockchain for direct Treatment.
I am equally convinced that using blockchain for direct Treatment use-cases is also a bad idea. Treatment has many expectations. The data must clearly be identified with a specific human, can't use pseudonyms. There must be no delay in getting to the data (urgency). There must be clear provenance of the data (where did this data come from, etc...). Treatment use-cases require that new events, observation, interactions are recorded; that any mistake detected is corrected. And there is also medical-emergency break-glass. etc.

Treatment related workflows
There are some Treatment like things that don't have these expectations. Such as participating in a clinical trial, where they can treat you as a pseudonym (strongly identity proofed). There are other Treatment scenarios where one also don't need actual identity, like a laboratory or pharmacy supply. Some of these are already given only the MRN, thus they don't have much more than a form of pseudonym.

Smart-Contract is the key

I think the biggest opportunity is focused on creative ways to use the smart-contract. Smart-contracts can exist elsewhere, indeed our FHIR Consent and OAuth (UMA) are two examples of smart-contracts. 

The difference being that various blockchains have specific smart-contract language, and mechanism to execute that specific language. These languages are usually very basic, like in bitcoin; but are getting more comprehensive. 

Legal contacts are not as easy as they seem
This is the space that needs to mature. It starting with putting a legal-will into a blockchain. Motivation is to unlock coins that an individual holds upon death in a way defined by that individual. This motivation is strong by that individual, and those benefactors who would receive the coin. It is also strongly motivated by the coin community as otherwise those coins go permanently out of circulation. It would seem this is not unlike a normal coin transaction smart-contract like is the foundation of bitcoin. But it is not that simple. Key improvements needed is that these contracts need to interact with sources-of-truth from outside the blockchain, the proof of death, the proof that the death was not caused by a benefactor. First these don't exist, but also these are external sources of truth. Blockchain wants to have the community 'be the source of truth', and not use external sources of truth... 

Patient data for sale for Clinical Research
 So a patient might offer access to their data (which is elsewhere) to anyone that can satisfy a smart-contract they put into a public chain. Unfortunately this best opportunity is what I described over a year ago given Grahame's original ask  Healthcare Blockchain - Big-Data Pseudonyms on FHIR

The smart-contract would include:
  • Upfront payment for the access (some micro-payment)
  • Requirement for escrow of coin to be unlocked to the Patient if other terms are violated
  • Terms of protection of the data
  • Kind of clinical trial allowed (heart conditions, but not brain)
  • Agreement to keep all research public
  • Agreement to contact patient if the patient could benefit from new treatment detected
  • Agreement to contact patient if some treatable medical condition not previously known is discovered
  • Agreement to not contact patient if terminal condition is detected
A clinical trail that can meet these, could satisfy the contract and gain access. If they violated any of the terms, the smart-contract would automatically transfer the escrow coin to the patient.  Based on some sunset term (like possibly the natural death of the patient), the escrow coin goes back to the research organization. So clearly that legal-will is important to this use-case...

Variants on smart-contract based on de-identification capability
It is possible that the patient publishes multiple flavors of the smart-contract. Each offering different types of pseudonym blinding: Some flavors would expose MORE information, and have higher contract requirements (like shown above). Some would expose very well de-identified data, and have less strict contract requirements.  

Highly de-identified data, where ALL direct and in-direct (Quasi identifiers) are removed. Including fuzzing completely dates, patient characteristics, location, etc. If the data is highly de-identified it is less valuable for clinical trials, but it also wold not need to be as strongly protected. So it is possible for this offering the smart-contract does not require an escrow of coin.

These variants would require that the authorized access to the data enforce these variations. Thus one would need some access method to the data where the de-identification can be accomplished. This might be done by different servers hosting the various flavors, confirmed by a human statistical analysis. This might be done by some automated de-identification service as I describe in
#FHIR and Bulk De-Identification

Healthcare Financial transactions
I any financial related transactions might certainly be good blockchain, even if it is healthcare related. Still privacy and safety concerns, but these are a step away. For example 

More reading:

Tuesday, October 10, 2017

Granularity of FormatCode

A Question was asked regarding FormatCode granularity, especially given that the IHE defined FormatCode vocabulary seem to be defining much smaller concept relative to HL7 defined FormatCode for C-CDA. Where the combined list is available in FHIR as a ValueSet of FormatCodes (updated in current build)

Important background :: Eating an Elephant -- How to approach IHE documentation on Health Information Exchange (HIE) and Healthcare Metadata

The FormatCode is there to differentiate 'technical format'. It is a further technical distinction more refined than the mime-type. So it is related to mime-type.

FormatCode is not a replacement for class or type. In fact it is very possible to have the exact same type of content available in Multiple-Formats. Including FHIR Documents in XDS

It is true that IHE defined FormatCodes tend to be one-per-Profile, where as all of C-CDA R2.1 is one FormatCode. This difference in scope seems like a very big difference, but is at the technical level not different. That is to say that the IHE XPHR profile defines a unique set of constraints on the format of the content, where as C-CDA R2.1 similarly defines a unique set of constrains on the format of the content.

This is a good time to explain that what IHE calls a "Profile" is commonly what HL7 would publish as an "Implementation Guide". Thus they are often very similar in purpose.

While it is true that XPHR has only one type (34133-9 Summary of Episode Note), where as there are a set of unique use-cases that are each unique clinical 'type' of document in C-CDA R2.1. This is a good example of why formatCode is not the same thing as 'type'. Type expresses the kind of clinical content, where as FormatCode expresses the technical encoding used.

So the FormatCode focuses on the technical distinction as a sub type on mime-type; and should be as specific as necessary to understand the Profile (or Implementation Guide) set of constraints.

HL7 could have defined a set of FormatCodes for the second-level deep technical format within their C-CDA. They didn't, I was not there to explain why they didn't. However this should, in theory, not be a problem. It simply means that at the technical encoding level there is only one FormatCode, rather than a set for each sub-format. I don't know how useful this would be, but I would be glad to help the committee understand the benefit.

I am the one that maintains these vocabulary. Not because I am a party to the creation of the vocabulary values, but simply because I am willing. If there are problems with any of these locations or values; please let me know. I hear rumor that there are problems, but until the problem is pointed out in specific detail, I can't fix it. This is especially true in FHIR where a Change Request (CR) must be filed.

Wednesday, September 27, 2017

#FHIR and Bulk De-Identification

Might be time to dust off a concept that hasn't been discussed for a while. But first let me explain two factors that bring me to that conclusion.

Bulk Data Access

The addition of a Bulk Data Access in #FHIR was a hot topic at the San Diego workgroup meeting. Grahame has explained it on his blog. What is not well explained is the use-cases that drive this API request. Without the use-cases, we have simply a "Solution" looking or a "Problem". When that happens one either has a useless solution, or one ends up with a solution that is used in appropriately. The only hint at what the Problem is a fragment of a sentence in the first paragraph of Grahame's blog article.
"... in support of provider-based exchange, analytics and other value-based services."
Which is not a definition of a use-case...  But one can imagine from this that one use-case is Public Health reporting, or Clinical Research data mining, or Insurance Fraud detection... All kinds of things that Privacy advocates get really worried about, with good reason.

De-Identification

There have been few, possibly unrelated, discussions of De-Identification (which is the high level term that is inclusive of pseudonymization and anonymization). These are also only presented as a "Solution", and when I try to uncover the "Problem" they are trying to solve I find no details. Thus again, I worry that the solution is either useless, or that the solution will get used inappropriately.

I have addressed this topic in FHIR before, FHIR does not need a deidentify=true parameter . Back then the solution was to add a parameter to any FHIR query asking that the results be deidentified. The only possible solution for this is to return an empty Bundle. As that is the only way to De-Identify when one as no use-case. 

De-Identification is a Process

De-Identification is a Process that takes a use-case 'need' (I need this data for my use-case, I want this data for my use-case, I need to re-identify targeted individuals, my use-case can handle X statistical uncertainty, etc). This De-Identification Process then determines how the data can be manipulated such that it lowers the Privacy RISK as much as possible, while satisfying the use-case need.  IHE has a handbook that walks one through the Process of De-Identification.

All De-Identification use-cases have different needs. This is mostly true, but some patterns can be determined once a well defined set of use-cases have been defined. DICOM (See Part 15, Chapter E) has done a good job taking their very mature data-model (FHIR is no where near mature enough for this pattern recognition). These patterns, even in DICOM, are only minimally re-usable. 

Most detailed use-cases need slightly different algorithm, with different variability acceptable.  For example IHE applied their De-Identification handbook to a Family Planning use-case and defined an Algorithm. Another example is Apple's use of Differential Privacy (a form of fuzzing). These are algorithms, but they are not general purpose algorithms. These examples show that each De-Identification algorithm is customized to enable a use-case needs, while limiting Privacy Risk. 

Lastly, Privacy Risk is never brought to zero... unless you have eliminated all data (the null set).

De-Identification as a Service

What I propose is that a Service (http RESTful like) could be defined. This service would be defined to be composable, thus it would be usable in a pipeline. That is the output of a #FHIR query (Normal, or Bulk) is a Bundle of FHIR Resources. This would be the input to the De-Identification Service, along with a de-identificationAlgorithm identifier. The result would be a Bundle of data that has been de-identified to that algorithm, which may be an empty Bundle where the algorithm can't be satisfied. One reason it can't be satisfied is that the algorithm requires that the output meet a de-identification quality measure. Such as K-Anonymity

The De-IdentificationAlgorithm would be defied using a De-Identification Process. The resulting Algorithm would be registered with this service (RESTful POST). Therefore the now registered algorithm can then be activated on a De-Identification request.

De-Identification Algorithm Consideration

So, what kind of Algorithms would need to be definable? Best is to look at the IHE Handbook which defines many data types, and algorithms that are logical.

For #FHIR. We would first need to an assessment of each FHIR Resource, element by element. Identify if this element is a Direct Identifier, Indirect Identifier (Quasi Identifier), Free Text, or Data.  This effort would best be done as the FHIR resources approach maturity. It could be represented within the FHIR specification, much like the _summary flag is represented...

Direct Identifiers -- Remove, Replace with fixed values, Replace with faked and random data, Replace with pseudonym replica, Replace with project specified identifiers, Replace with reversible pseudonym, etc...

Indirect Identifiers -- Remove, Fuzz, Replace, Generalize, Unmodify

Free Text -- Remove, Unmodify, interpret into codes removing unknown text

Quality Output -- Analysis of output to determine if it meets quality such as K-Anonymity

Just a hint of what it will take...

Conclusion

Plenty of good work to do, but the basis is well known.

Bulk Data -- De-Identification can be much more reliability (lowering Risk reliability) when the De-Identification process is executed on a Data-Set. Meaning the larger the number of Patients in the data-set, the more likely that one can protect Privacy. As this data-set approaches ONE, the risk approaches 100%.  This fact is usually the downfall of a de-identification result, they overlook how easy re-identification can be, especially where there is motivation and skill.