Wednesday, September 27, 2017

#FHIR and Bulk De-Identification

Might be time to dust off a concept that hasn't been discussed for a while. But first let me explain two factors that bring me to that conclusion.

Bulk Data Access

The addition of a Bulk Data Access in #FHIR was a hot topic at the San Diego workgroup meeting. Grahame has explained it on his blog. What is not well explained is the use-cases that drive this API request. Without the use-cases, we have simply a "Solution" looking or a "Problem". When that happens one either has a useless solution, or one ends up with a solution that is used in appropriately. The only hint at what the Problem is a fragment of a sentence in the first paragraph of Grahame's blog article.
"... in support of provider-based exchange, analytics and other value-based services."
Which is not a definition of a use-case...  But one can imagine from this that one use-case is Public Health reporting, or Clinical Research data mining, or Insurance Fraud detection... All kinds of things that Privacy advocates get really worried about, with good reason.

De-Identification

There have been few, possibly unrelated, discussions of De-Identification (which is the high level term that is inclusive of pseudonymization and anonymization). These are also only presented as a "Solution", and when I try to uncover the "Problem" they are trying to solve I find no details. Thus again, I worry that the solution is either useless, or that the solution will get used inappropriately.

I have addressed this topic in FHIR before, FHIR does not need a deidentify=true parameter . Back then the solution was to add a parameter to any FHIR query asking that the results be deidentified. The only possible solution for this is to return an empty Bundle. As that is the only way to De-Identify when one as no use-case. 

De-Identification is a Process

De-Identification is a Process that takes a use-case 'need' (I need this data for my use-case, I want this data for my use-case, I need to re-identify targeted individuals, my use-case can handle X statistical uncertainty, etc). This De-Identification Process then determines how the data can be manipulated such that it lowers the Privacy RISK as much as possible, while satisfying the use-case need.  IHE has a handbook that walks one through the Process of De-Identification.

All De-Identification use-cases have different needs. This is mostly true, but some patterns can be determined once a well defined set of use-cases have been defined. DICOM (See Part 15, Chapter E) has done a good job taking their very mature data-model (FHIR is no where near mature enough for this pattern recognition). These patterns, even in DICOM, are only minimally re-usable. 

Most detailed use-cases need slightly different algorithm, with different variability acceptable.  For example IHE applied their De-Identification handbook to a Family Planning use-case and defined an Algorithm. Another example is Apple's use of Differential Privacy (a form of fuzzing). These are algorithms, but they are not general purpose algorithms. These examples show that each De-Identification algorithm is customized to enable a use-case needs, while limiting Privacy Risk. 

Lastly, Privacy Risk is never brought to zero... unless you have eliminated all data (the null set).

De-Identification as a Service

What I propose is that a Service (http RESTful like) could be defined. This service would be defined to be composable, thus it would be usable in a pipeline. That is the output of a #FHIR query (Normal, or Bulk) is a Bundle of FHIR Resources. This would be the input to the De-Identification Service, along with a de-identificationAlgorithm identifier. The result would be a Bundle of data that has been de-identified to that algorithm, which may be an empty Bundle where the algorithm can't be satisfied. One reason it can't be satisfied is that the algorithm requires that the output meet a de-identification quality measure. Such as K-Anonymity

The De-IdentificationAlgorithm would be defied using a De-Identification Process. The resulting Algorithm would be registered with this service (RESTful POST). Therefore the now registered algorithm can then be activated on a De-Identification request.

De-Identification Algorithm Consideration

So, what kind of Algorithms would need to be definable? Best is to look at the IHE Handbook which defines many data types, and algorithms that are logical.

For #FHIR. We would first need to an assessment of each FHIR Resource, element by element. Identify if this element is a Direct Identifier, Indirect Identifier (Quasi Identifier), Free Text, or Data.  This effort would best be done as the FHIR resources approach maturity. It could be represented within the FHIR specification, much like the _summary flag is represented...

Direct Identifiers -- Remove, Replace with fixed values, Replace with faked and random data, Replace with pseudonym replica, Replace with project specified identifiers, Replace with reversible pseudonym, etc...

Indirect Identifiers -- Remove, Fuzz, Replace, Generalize, Unmodify

Free Text -- Remove, Unmodify, interpret into codes removing unknown text

Quality Output -- Analysis of output to determine if it meets quality such as K-Anonymity

Just a hint of what it will take...

Conclusion

Plenty of good work to do, but the basis is well known.

Bulk Data -- De-Identification can be much more reliability (lowering Risk reliability) when the De-Identification process is executed on a Data-Set. Meaning the larger the number of Patients in the data-set, the more likely that one can protect Privacy. As this data-set approaches ONE, the risk approaches 100%.  This fact is usually the downfall of a de-identification result, they overlook how easy re-identification can be, especially where there is motivation and skill.





Monday, September 11, 2017

FHIR Connectathon of the IHE-MHD Profile

There were three brave soles willing to try IHE-MHD Profile here at the FHIR Connectathon.
  • Diego Kaminker (ALMADOC project / Argentina)
  • Fernando Campos (ALMADOC project / Argentina)
  • Simon Knee (NHS Digital / UK)
They did all the hard work. I just answered questions and such...

They had a specific goal: They wanted to test the IHE-MHD Profile against general purpose FHIR Servers. Specifically FHIR Servers that have not declared IHE-MHD Profile, but which do declare support for the fundamentals that IHE-MHD needs. This task is important as it is a way of testing how well I did at writing the IHE-MHD profile. How well did I stick close to FHIR (STU3), and how well did I inform the audience of the deviations.

I tried really hard to write IHE-MHD in away that a general purpose FHIR Server would work. However this was not the intended scope for IHE-MHD. The intended scope or IHE-MHD was as an API to an XDS, XDR, or XCA environment. Thus when I wrote IHE-MHD, I did have to constrain it in ways that a general purpose FHIR Server might not support.

This said, there was some big surprises.

I will summarize the reportout, It is available, with the actual XML examples and evidence.  

Test Procedure

1. Document Source: New Document (1 binary attachment included in the bundle)
2. Document Source: No binary attachment in the bundle – binary file referenced from external server
3. Document Consumer: Search by Filter Combination
1. Subject  by identifier
2. Document Class / Type
3. Document Creation Date Range
4. Document Consumer: Direct Binary Document Retrieval 
Notes: For this connectathon, we had not specified security / authentication requirements, like token / RBAC / etc.
Testing: Servers: with HAPI, Vonk, Grahame's server
Clients: plain REST with Chrome Restlet Client (Almadoc), Postman/Eclipse (LDP)

 Results

1 - During a ITI-65 "Provide Document Bundle", that carries both a Binary and a DocumentReference that points at that Binary, MHD expects that the DocumentReference.content.attachment.url would be "fixed up" by the server. That is when the server persists the Binary, and thus fixes up the Binary id from a URI (GUID) to a URL on the server, that new URL would be placed into the DocumentReference.content.attachment.url, which previously held the same URI (GUID) from the Bundle. This behavior does work when the FHIR Reference datatype is used, and did work in prior versions of FHIR (Prior to the Reference becoming a complex datatype vs just a flavor of URL), and the example in the FHIR specification shows this fixup behavior.
  • I created a CP to fix this See GF#13823
  • Recommend that in a Transaction, that URL fixup is done just like it is with Reference.
  • some zulip chat recommended that DocumentReference.content.attachment.data be used, but there are many problems with that. Most of all that it is very counter to the IHE XD* use-case need.
  • If this isn't resolvable, then the next revision of MHD might need to define ITI-65 as a FHIR Operation, rather than using the Transaction. I don't think this should be done until FHIR STU4. But I would entertain someone wanting to define this as an option. However I would remind everyone that IHE-MHD is not trying to use general purpose FHIR Servers, it is as an API to XD*.
  • One cold just do two transactions. One to create the Binary, then create the DocumentReference with a client fixedup URL. This can be done with general purpose FHIR Servers, but would be counter to the target for IHE-MHD being an API to XD*.
  • I will write a CP to MHD to make this more clear as an MHD imposed requirement of MHD servers.
2 - Already known issue #MHD_048 -- This was not a new issue, since we knew of it. But this issue was not obvious in the MHD text, so it caused issues. The default behavior for a general purpose FHIR Server is that one can't search against "contained" resources. In IHE-MHD we just left this as an open issue, we didn't try to solve it completely.
  • Turns out that one CAN query on contained resources
  • The bad news is that one can't tell if this will work from a servers conformance resource. 
  • I will write a CP that uses this feature that I didn't notice.
3 - Search on Patient.identifier -- An issue I simply didn't expose as much as I could have. This is a chained search, and will only work when the server you are querying against also hosts the Patient resources. For example: Imagine a FHIR Server that holds all of the DocumentReferences (the MHD server), and a different FHIR Server that holds the Patient resources (likely available via PDQm or PIXm). You can do a PIXm or PDQm query against this second server, and use the resulting URL in your MHD queries against the first server. This is just URL matching. But if you try to query the MHD server using Patient.identifier, that server will not know how to find the Patient.identifier.
  • This might be an important problem, or might not. It depends on deployment needs and capabilities.
  • I will likely write a CP to make this more clear in MHD

Conclusion

Overall this was really cool to do this kind of testing of the MHD specification. Especially since it resulted in some CR to FHIR, and likely a CP to IHE.

Friday, September 8, 2017

Remedial FHIR Consent Enforcement

I have been working with some of the teams wanting to do something with Privacy Consent and are coming to the FHIR Connectathon in San Diego. Here is a few stepping stones that I think are educational to move from zero to a reasonable set of Consent possibilities. Basic Consent is a necessary and powerful step:

Pre-conditions:

  1. Some form of Business Access Control are already in place to support basics of separating the kinds of things that various users (roles) are allowed to do; like Clinicians, or administrative staff, or billing staff, or cleaning staff, or dietary staff, etc. For example: Role Based Access Control (RBAC), or Attribute Based Access Control (ABAC) . This layer is responsible for separating those users that are allowed to Order drugs or procedures, vs those that are allowed to update clinical information, vs those that are given access to billing information, vs etc... An example of this is -- SMART-on-FHIR. 
  2. There is some Privacy Policy written: I will give a few for example purposes only. These are not complete, or comprehensive. They are defined well enough to illuminate the various stepping stones that I will outline below.
    * Explicit Consent -- if no consent is on file, then no data is shared outside the organization.
    * OPT-IN -- Patient has positively given consent to share without restrictions
    * OPT-IN with named organizations allowed access
    * OPT-IN with identified time/date range within which no data authored can be shared
    So, any Consent in the system must be from one of these patterns. Mixing of patterns, or going beyond these patterns is simply not allowed by the consent gathering application.
    **** Sorry, I have to use "OPT-IN" and "OPT-OUT", against my rule... but I do have them explained.
  3.  There is some consent gathering application that presents these Privacy Policies to Patients, and records the Consent ceremony as a FHIR Consent resource. I am not going to try to describe this ceremony. The user experience is very critical. The Patient must be well educated on the ramifications of their choices. I like the idea in GDPR where the consent rules presented to the Individual must be in human language terms that are specific to that person's language abilities. I am not short-circuiting this, but rather wanting to focus on the enforcement side.
    *** Note that if the patient changes their mind, this application handles this. Leaving only ONE active Consent resource. Thus there is no conflicts to resolve in real-time. If the patient wants to OPT-OUT after being OPT-IN; then a few solutions could happen. The existing record be updated to become Consent.status=rejected. Better for record keeping if a new Consent record is created with an OPT-OUT policy...

YES, I know that we all want more complex consents, we will get there. But for now we must get started. Stepping stones are important to the journey. Stepping stones are useful, but one does not stand on them for very long.

Some past articles to this point

Consent Enforcement Models

There are two major models today, very different.

Privacy Access Control decision engine

Privacy Access Control decision engine using OAuth. In this model, one offloads the Access Control decision to an OAuth authorization service. This is the model that HEART is focused on. Another example has been shown as "Cascading OAuth". The idea is that there is some OAuth authority (or a set of them) that focus on giving Access Control decisions purely along Privacy Consent lines. If this is 'cascaded' with the business OAuth Access Control decisions; then one has the combination of both.. I am not going to further explain this model in this blog article.

One could have an Access Control engine that is based on XACML...

In both of these solutions, the rules of the Patient specific Consent are not in any FHIR 'Consent' resource form. These solutions take care of steps 1-4. They only expose a 'decision' endpoint.

As such, their decision endpoint must be able to return complex obligations. PERMIT, but not to organization ABC, not the data authored during XYZ, etc... These in OAuth would be done with "Scopes", in XACML would be done with 'Obligations'. Which ever one uses, this is complex... and I would argue this is the same complexity one sees in the FHIR Consent resource today.

Interaction between FHIR Consent resource and Privacy Access Control decision engine

In both of these solutions, there might be a FHIR Consent that is simply there to indicate the endpoint of the Privacy Access Control service. This might be quite empowering. The Consent would not indicate anything other than the specific service to use. Thus enabling the Patient to choose from various Privacy Access Control services to use. The FHIR Consent resource would not indicate in any way what their Privacy rules might be.

I should model this.. and I might do it this week... but I want to focus on a solution that leverages the Consent resource... and this Privacy Access Control decision service solution hardly uses the Consent resource at all.

The FHIR Consent way:

Reminder we are working on Stepping Stones, and starting simple
* Explicit Consent -- if no consent is on file, then no data is shared outside the organization.
* OPT-IN -- Patient has positively given consent to share without restrictions
* OPT-IN with named organizations allowed access
* OPT-IN with identified time/date range within which no data authored can be shared

Lets first just work out the first two: No active consent on file, vs opt-in.

In the FHIR specification we already have examples of these:

A system can using FHIR query (RESTful) to determine if there is any Consent on file for a given patient. For the example below we look for if there is a OPT-IN consent for patient 'f001'

GET [base]/Consent?patient=f001&status=active&category=http://hl7.org/fhir/v3/ActCode#OPTIN
(((Note that one can't query on policyRule, so I propose either that be added, or that the OPTIN code goes into category. I seem to recall committee discussions that put it into category)))

If this above query returns ZERO result, then there is no OPTIN consent on file. Thus no access is granted. DONE
If there is a Consent, then we need to look to see if there are any Consent.provisions. If there are none, then we grant Privacy PERMIT access. DONE

The second stepping stone is to enforce the provisions.

Each provision will have a provision.type of either PERMIT or DENY. This means that if the constraints in the provision are matched, then the provision.type (PERMIT vs DENY) is to be enforced.

We have an example which is an OPTIN but denying access to a named organization.

Well, looking at this example, I see it is missing some critical parts that I just said would be there. The one provision this example has in it is that it is specific to an Organization named 'f001'. From the name of the example, it is clear this provision should have a .type="deny".

This kind of a provision can be enforced on the REQUEST, as that organisation is denied any access because of a Privacy Consent directive.

Date Range:

The Date Range provision is something that is much harder to enforce, as it really must look at the results that are about to be returned, and filter out any data that was authored within the given date range. I am not going to go deeper right now, but am pointing out that sometimes provisions can be enforced on the REQUEST side, denying a request or allowing a request with no further action. Some provisions require further work be done on the RESPONSE side.

Other articles:

Friday, September 1, 2017

MDS2 -- Revision Comment Opportunity

Updated: They moved the deadline for signup to September 15th, 2017

I have been involved in the original creation of the MDS2 (Manufacture Disclosure Statement for Medical Device Security).  This was back in 2004, before I was blogging...  The first revision was a major change to align with IEC 80001 - Risk Assessment to be used when putting a Medical Device onto a Network in 2010. The second revision to improve usability Major upgrade to MDS2 to align with IEC-80001 in 2013

It is being revised again, and there is a call for participation. Hurry, there is only a week to signup -- September 8th deadline.



This is intended to be a Security Disclosure statement from a Medical Device (any health software too) to someone purchasing. It is intended to explain the security capability, environmental expectations, and residual risk that would need to be managed by the purchasing organization. Security is a critical collaboration between the manufacture and the user of any system. This form is intended to enable this collaboration.

The Medical Imaging & Technology Alliance (MITA) is announcing an effort to revise the existing HIMSS/NEMA Standard HN1-2013 and develop it as an American National Standard. This standard consists of the Manufacturer Disclosure Statement for Medical Device Security (MDS2) form and related instructions how to complete the form. The intent of the MDS2 form is to supply healthcare providers with important information to assist them in assessing the vulnerability and risks associated with protecting private data transmitted or maintained by medical devices and systems.

If you are interested in participating in the development of this standard, please fill out the information requested below and return to Peter Weems (pweems@medicalimaging.org) no later than August 31, 2017 September 8, 2017.
  • Name
  • Organization Represented
  • Brief Description of Organization
  • Email Address
  • Telephone Number
  • Interest Category
  • Interest Category Descriptions

General Interest—Organization or individual that has an interest in the use of equipment included in the scope of this Standard, but neither produces nor uses it directly.

Government—Government agency or department that has an interest in the use of equipment included in the scope of this Standard. Please note that a government agency or department that uses this equipment should select the USER category.

Insurance--Insurance agency or department that has an interest in the use of equipment included in the scope of this Standard. Please note that an insurance agency or department that uses this equipment should select the USER category.

Producer—Manufacturer of equipment included in the scope of this Standard.

Testing Laboratory—Organization that tests equipment included in the scope of this Standard to established specifications.

Trade Association—Trade Association or society that represents the interests of manufacturers or users of equipment included in the scope of this Standard.

User—Organization (company, association, government agency, individual) that uses equipment included in the scope of this Standard.

User-consumer—Where the standards activity in question deals with a consumer product, such as lawn mowers or aerosol sprays, an appropriate consumer participant’s view is considered to be synonymous with that of the individual user – a person using goods and services rather than producing or selling them.

User-industrial—Where the standards activity in question deals with an industrial product, such as steel or insulation used in transformers, an appropriate user participant is the industrial user of the product.

User-government—Where the standards activity in question is likely to result in a standard that may become the basis for government agency procurement, an appropriate user participant is the representative of that government agency.

User-labor—Where the standards activity in question deals with subjects of special interest to the American worker, such as products used in the workplace, an appropriate user participant is a representative of labor.