Thursday, February 28, 2019

IHE ITI Winter 2019

The IHE ITI, PCC, and QRPH workgroups met in Oak Brook, IL this week at the RSNA Headquarters. This is the usual place, which was a big disappointment, as we were all expecting to be meeting this time in Treviso, Italy, where it is +12°C. Instead we got stuck with dirty snow and -12°C temperatures.

I think we all had a good week, with an amazingly diverse set of participants.

I participate mostly in ITI. The week was very successful. We are trying out a new project development process that follows more of an Agile approach. Normally the IHE calendar is split into a 12-month development cycle, and small and large work items alike need to fit within that timeframe. Under continuous development we can complete a work item when it is done, as fast as it can be done, picking up new items as resources become available. Under this model we have harder requirements on quality, whereas the 12-month cycle puts the emphasis on date-based deadlines.

The ITI workgroup completed:
  • Patient Identity Management using FHIR -- We got started on this, mostly agreeing on the use-cases, scope, and actors. This targets supporting a regional or national patient identity management system, and will be tied to the XDS Registry to keep the registry up-to-date, so that XDS Consumers get accurate results when querying for a patient.
  • Updating mCSD -- We are about ready to send this out for Public Comment; look for it in a month. This is an effort to add support for "Facilities" and clarify how they are different from "Locations". The work item also aims to define how to create multiple hierarchies between facilities.
  • XCA Deferred Query -- We progressed to an almost complete specification; however, this will likely not be approved for Public Comment until our next face-to-face. This work item intends to support use-cases where an XCA Responding Community will take longer to respond to a query than is allowed by Asynchronous Web-Services: use-cases such as when a human must get involved in the response to evaluate patient privacy consent authorization, or where paper documents need to be scanned and indexed. These longer response times would be supported through a Deferred query mechanism similar to the one in XCPD.
  • FHIR R4 priority 1 group - finishing the update to FHIR R4 for the group of IHE Profiles using FHIR that are considered a priority for ONC: MHD, PDQm, QEDm, mCSD, and Appendix Z. We resolved public comments in addition to all outstanding Change Proposals (CPs). Compliance files (StructureDefinition, CapabilityStatement, etc.) have not yet been converted to R4; this will be done in the coming months when Forge can be used.
    • The FHIR specification changed in response to ballot comments and implementation experience. For a summary of the changes between STU3 and R4, please see http://build.fhir.org/history.html
    • The IHE profiles PDQm, MHD, QEDm, mCSD, and Appendix Z were updated to reference FHIR R4. The changes were predominantly simply references to FHIR R4 rather than FHIR STU3; thus the most important changes for implementations are found in the FHIR specification changes themselves. Some changes were also initiated by a Change Proposal that identified a mistake in the STU3 version of the profile. Some changes were open-issues in STU3 that are now fixed in the FHIR R4 core specification. Change Proposals and open-issues are noted in the closed-issues sections of these drafts. These versions of the supplements went through Public Comment, which resulted in 88 public comments that were then resolved. Note that the mXDE profile is independent of the FHIR revision, as it orchestrates MHD and QEDm.
    • The following are specific additional changes beyond the update to reference FHIR R4:
    • MHD - now requires that a Document Recipient declaring the XDS on FHIR Option must support all XDS association types
    • MHD - the canonical URIs used for profiles are listed, but the StructureDefinitions have not yet been created. These StructureDefinition XML files will be made available later, as they are not normative.
    • MHD - the canonical URI used for the Provide Document Bundle transaction has changed because the FHIR canonical URI encoding rules don't allow the "-" character. We could have just changed ITI-65 into ITI_65, but a breaking change is a breaking change, so we chose to replace it with an actual StructureDefinition in the same URI space as our other StructureDefinitions. This means that we would no longer use http://ihe.net/fhir/tag/iti-65, but rather http://ihe.net/fhir/StructureDefinition/IHE_MHD_Provide_Comprehensive_DocumentBundle or http://ihe.net/fhir/StructureDefinition/IHE_MHD_Provide_Minimal_DocumentBundle
    • MHD - many updates to the XDS-FHIR mapping, including recognizing the use of the .meta element to support minimal metadata
    • MHD - recognition of ProviderRole use
    • QEDm - fixed Provenance section that was hard to read and had some errors
We also worked with PCC and QRPH on their FHIR-based profiles, reminding them that they should be leveraging Appendix Z as much as they can, and to let ITI know if there is any opportunity to improve Appendix Z. We further discussed the guidance that exists on how to evaluate FHIR and profile FHIR. In addition, there is now a GitHub organization for IHE, and we are managing our FHIR compliance (XML) files there rather than on the FTP site.

ITI will be picking our next work item from the backlog. The backlog is maintained and prioritized on a monthly basis by the ITI Planning committee.

UPDATE: I predicted wrong. I expected that we would convert the remainder of our FHIR profiles to FHIR R4: NPFS (Non-Patient File Sharing), ATNA FHIR Query, mACM (Mobile Alert Communication Management), and PIXm (Patient Identifier Cross-reference for Mobile).

We actually decided to take on creating the ATNA Feed on FHIR transaction, so that applications that are purely FHIR-based have an easy way to log security/privacy/other events, rather than only having the DICOM ATNA encoding over SYSLOG. This work item will also update the ATNA FHIR Query transaction to FHIR R4. There is strong interest in updating the rest of our FHIR Profiles, but that will be picked up at our next meeting when we finish the mCSD Facility supplement.

Keith has a report from PCC: A Brief summary of my IHE ACDC Profile and A4R Whitepaper Proposals

Wednesday, February 20, 2019

Basic DS4P - How to set the confidentialityCode

I have covered the vision of the Data Segmentation for Privacy (DS4P) concept, and outlined how a Security Labeling Service (SLS) would enable this grandiose vision of DS4P.

However, there are stepping stones. The following is a slight update of an article I wrote in July 2015 on how to set the confidentialityCode. I have used bold/underbar to indicate where I enhanced the original text.

The problem then, as it is today, is that the confidentialityCode value that everyone uses is "N" (Normal confidentiality), which does not help with Data Segmentation or Privacy.

The recommendation I give here is restricted to the gross level: for Document Sharing, at the XDS/XCA/DocumentReference metadata level; for FHIR REST, at the returned Bundle.meta.security level, but not on each Resource in the Bundle; and for CDA, at the CDA header, but not on each element. Going deeper is possible, but not what I am trying to drive as the next step beyond "N".

Some background articles:

Recommendation for setting confidentialityCode

So I would continue to recommend that anyone or any system publishing health data, such as FHIR resources, FHIR documents, and CDA documents, should use "N" unless they have evidence that it is the wrong value. Meaning it should be a specific effort to choose one of the other values:
  • "R", because there is specifically sensitive content – HIV, alcohol/drug abuse, etc.
  • "V", because the content should be seen only when reader is individually authorized -- psychology-notes, usually also used on all VIP patients (Not a best practice, but reality).
  • "M", because the content is less sensitive than normal, but still medical, authorized for wide distribution – like an emergency data set, or for dietary use-cases
  • "L", because the content is not medical, or has been de-identified
  • "U", because the content is not specific to an individual and is public

This is right out of the definitions of the vocabulary values in 2.16.840.1.113883.5.25 for "_confidentiality", available from the FHIR specification for easy reading: https://www.hl7.org/fhir/v3/Confidentiality/index.html
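As a concrete sketch (not from the original article), setting this gross-level code on a resource might look like the following Python, with the FHIR resource represented as a plain dict in its JSON shape. The code system URI used here is the v3 Confidentiality system as published for FHIR R4; treat it as an assumption if you target a different FHIR version.

```python
# The HL7 v3 Confidentiality code system URI as published for FHIR R4
# (assumption: adjust for the FHIR version you target).
CONFIDENTIALITY_SYSTEM = "http://terminology.hl7.org/CodeSystem/v3-Confidentiality"

def set_confidentiality(resource: dict, code: str) -> dict:
    """Replace any existing confidentiality tag with the given _confidentiality code."""
    assert code in {"U", "L", "M", "N", "R", "V"}, "not a _confidentiality code"
    meta = resource.setdefault("meta", {})
    # Drop any prior confidentiality tag so only one remains.
    meta["security"] = [
        t for t in meta.get("security", [])
        if t.get("system") != CONFIDENTIALITY_SYSTEM
    ]
    meta["security"].append({"system": CONFIDENTIALITY_SYSTEM, "code": code})
    return resource

doc_ref = {"resourceType": "DocumentReference", "status": "current"}
set_confidentiality(doc_ref, "N")  # default unless there is evidence otherwise
```

The same helper works on a Bundle dict for the FHIR REST case, since it only touches the resource's meta.security element.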

How to determine what the value should be?

I don't disagree that this is a hard thing to determine.
  • It might be determined by workflow: psychology notes clearly come from a psychology workflow.
  • De-identification is clearly a specific workflow.
  • It might be an explicit act, where the user is specifically trying to make a less-sensitive document for broad use, such as an emergency data set or an export to the dietitian.
  • It might be a specific request, where the clinician decides that the data is very sensitive, or where the patient decides that the data is very sensitive.
This is different from a patient choice in consent regarding the rules applied to these different codes, meaning where a patient chooses a restrictive consent for their data accessibility. See http://healthcaresecprivacy.blogspot.com/p/topics.html#Privacy

The VHA has shown some success in demonstration projects with passing the data through a Security Labeling Service (SLS) that leverages Natural Language Processing and CDS (Clinical Decision Support) to tag sensitive clinical concepts. See FHIR Demonstration of DS4P (sorry, the video is lost). If none are found, the data is "N"; if some are found, the data is "R"; if specific types are found, the data is "V"… This automated method has me somewhat worried, as the social norms of what is sensitive change often. So using this automated form at publication time might produce a wrong evaluation over time. In the case of the VHA demonstration, they applied it upon 'use' of the data, so it was using the social-norm rules at the time of reading. Likely better social-norm rules, but I am not sure this is better behavior. Note that the intermediate step is a tagged sensitivity category, which might be given to the access control system as information to be used in the access control decision or enforcement.
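That escalation (none found means "N", some found means "R", specific types found means "V") can be sketched as a small Python function. This is my reading of the pattern, not the VHA code; the VERY_RESTRICTED set is an assumption for illustration only.

```python
# Illustrative only: which sensitivity categories warrant "V" is a local
# policy decision; "PSY" (psychiatry) is assumed here as an example.
VERY_RESTRICTED = {"PSY"}

def confidentiality_from_sensitivities(found: set) -> str:
    """None found -> N; some found -> R; specific types found -> V."""
    if found & VERY_RESTRICTED:
        return "V"
    if found:
        return "R"
    return "N"
```

The input set is the tagged sensitivity categories produced by the SLS; the output is the gross confidentialityCode to publish.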

Is there more?

All the other security tags are not likely to be set upon publication. IHE has brought in the whole of the "Healthcare Privacy/Security Classification System":
  • IHE specifically recommends against using the sensitivity category, as the value itself is sensitive. These values are useful for internal use, as in the VHA demonstration.
  • Compartment is mostly undefined, but would likely be a local value-set, unlikely to be understood at publication time. It is an interesting place to play, as it might be used to define compartments like Radiology, Cardiology, Psychology, etc., but it is untested ground.
    • More likely it is used to tag specific authorized Research projects by name
  • Integrity could equally be set by the publisher, although it is not well enough defined. This would be a way to tag data that was generated by the patient vs. data generated by a licensed clinician.
  • Handling caveats might be set on publication. The only cases I can think of are similar to the "V" cases, in that the author explicitly knows something about the data and thus needs to add a caveat.
    • One specific example is 42 CFR Part 2 – SAMHSA-covered treatment – which must be explicitly marked with a 'do not disclose without explicit authorization by the patient' caveat: NOAUTH
    • A second specific example is an obligation to delete after use, which specifically forbids persistence (including printing): DELAU

Conclusion

So, simple guidance: you need all of the _confidentiality vocabulary, plus two more from the handling caveats -- [U, L, M, N, V, R] + NOAUTH + DELAU

Blog articles by Topic

Segmenting Sensitive Health Topics

In my last article I outlined the need to recognize that health data have various kinds of sensitivity, which informs various types of Privacy rules of access, to support the goal of Privacy. Thus Data Segmentation for Privacy (DS4P). Here I am going to explain some current thinking of how an Access Control Enforcement engine can tell sensitive data from normal health data.

Access Control is broken into various parts. One part makes an access control decision, based on possibly many vectors. Please read this article on Vectors through Consent to Control Big-Data Feeding frenzy. It explains that some data are sensitive simply because of who authored them (Betty Ford Clinic), which is clear by looking at the author element.

The problem I pointed out in the last article is that differentiating sensitive data from normal data is not easy.

Back 20 years ago, there seemed to be an expectation that when a clinician recorded some fact, they would also tag that fact with a sensitivity tag. Thus when an access request was made, these tags could be inspected by the access control engine to determine if the data could be accessed by the individual requesting access. The reality is that this tagging at authoring time by the clinician was unreasonable and never done. It was also a simpler time.

Thus there are large databases of longitudinal data that have never been assessed as sensitive or not. How would one enforce Data Segmentation for Privacy (DS4P) if there is no way to identify which data need to be segmented?

Security Labeling Service


Thus the Security Labeling Service (SLS) was born. This service does what the name indicates: given a bunch of data, it applies security labels.

The capability might be gross or fine-grain:
  1. Only identify the overall Confidentiality Assessment. Is the data normal health data, or is it Restricted?
  2. Only identify the various sensitive kinds of data within the data. The data has indicators of sexually transmitted disease, substance abuse, etc..
  3. Identify which fragments of the data are sensitive. The data is not modified, but enough information is given to identify the fragments. For example, a FHIR Bundle might be assessed, and a list of Resources within the Bundle might be identified with specific tags.
  4. Tag fragments of the data with sensitivity. The data is modified with the tags, such as updating each FHIR resource's .meta.security value.
There are likely more, but this subset seems foundational.
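Capabilities 3 and 4 above can be sketched in a few lines of Python, assuming a Bundle as a plain dict and a caller-supplied classifier that returns sensitivity codes for a single resource. The toy classifier and its trigger term are invented for illustration.

```python
def identify_fragments(bundle: dict, classify) -> dict:
    """Capability 3: report which entries are sensitive; the data is not modified."""
    report = {}
    for i, entry in enumerate(bundle.get("entry", [])):
        codes = classify(entry["resource"])
        if codes:
            report[i] = codes
    return report

def tag_fragments(bundle: dict, classify) -> dict:
    """Capability 4: write the codes into each resource's .meta.security
    (code system URIs omitted for brevity)."""
    for entry in bundle.get("entry", []):
        for code in classify(entry["resource"]):
            meta = entry["resource"].setdefault("meta", {})
            meta.setdefault("security", []).append({"code": code})
    return bundle

def toy_classify(resource: dict) -> list:
    # Placeholder classifier: real SLSs use term lists, CDS, NLP, etc.
    return ["HIV"] if "hiv" in str(resource.get("code", "")).lower() else []

bundle = {"resourceType": "Bundle", "entry": [
    {"resource": {"resourceType": "Observation", "code": "HIV viral load"}},
    {"resource": {"resourceType": "Observation", "code": "body weight"}},
]}
report = identify_fragments(bundle, toy_classify)  # {0: ["HIV"]}
```

Capability 1 and 2 outputs (the overall assessment, or the set of sensitivity kinds) fall out of the same per-resource classification by aggregating over the report.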

The SLS might operate on a single observation, a Bundle of Resources, a CDA document, a FHIR Bulk Data blob, or a whole database.

How does the SLS work?

The reason to create the concept of the SLS was to isolate the hard work of determining the sensitivity from the Access Control Decision and Enforcement. Thus we privacy and security experts were explicitly invoking the Hitchhiker's Guide to the Galaxy "Somebody Else's Problem field", which means: I don't know how it works...

One idea is that the SLS just has a list of terms it looks for.

One idea is that the SLS leverages Clinical Decision Support.

One idea is that Natural Language Processing is used.

One idea is that Big Data and Machine Learning are used.

I am sure someone would indicate that Blockchain is used.

Most likely many methods are used. It depends on the needs of the organization, data, and patients.
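The first of those ideas, a simple term list, is easy to sketch in Python. The terms and the sensitivity codes they map to are invented for illustration; a real service would draw on curated value sets and handle synonyms, negation, and context.

```python
# Invented term-to-sensitivity mapping, for illustration only.
TERM_LIST = {
    "hiv": "HIV",
    "antiretroviral": "HIV",
    "ethanol": "ETH",         # alcohol-use sensitivity
    "psychotherapy": "PSY",   # psychiatry sensitivity
}

def term_list_sls(text: str) -> set:
    """Return the sensitivity codes whose trigger terms appear in the text."""
    lowered = text.lower()
    return {code for term, code in TERM_LIST.items() if term in lowered}
```

The weakness of this approach is exactly the derived-fact problem discussed later: a term list only sees what is literally written, not what a medically trained reader could infer.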

To modify the data or not

I tend not to want the Security Labeling Service (SLS) to modify the data, mostly because the kind of function I want out of the SLS is simply to identify the sensitivity kinds. These sensitivity kinds are not typically exposed to end users or recipient organizations. They are just used by Access Control Enforcement to determine if the data should be allowed, blocked, or modified. Thus any changes to the data would happen in Access Control Enforcement, not the SLS.

There is a camp that combines Access Control Enforcement and the SLS into one service. I think this is simply a combination: explicitly the combination of Access Control Enforcement and Security Labeling Service into one thing, not a new kind of Security Labeling Service (SLS).

When to Scan?

One model is to scan the data when it is created or updated, and save the assessment made at that time with the data. This model optimizes for doing the assessment as rarely as possible. But it can end up with an incorrect tag, as the concept of sensitivity changes over time.

This model could be enhanced by scanning the whole database again when sensitivity policies change. This likely can be done on a low-priority thread, so it would have minimal impact.

The advantage of predetermining the sensitivity is that one could then do queries that include these sensitivity tags. This might be useful, or it might be seen as an invasion of privacy.

I tend to place the SLS at the point of Access Control Enforcement. I prefer this because the nature of health data sensitivity is very contextual: the sensitive topics change over time, the nature of the sensitivity changes over time, and the context of the request might also affect the decision.

It is possible that the SLS is invoked by Access Control Enforcement and is intelligent enough to notice that the data is already pre-assessed, thus just returning that pre-assessment without doing any work.

This would benefit from knowing how old that pre-assessment is. The age might be encoded as a custom security tag, for example a tag that simply indicates when the assessment was done and likely the policy version that was used. Another method might be to look for the Provenance of the prior SLS update.
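One hypothetical encoding of that assessment-age tag follows; the system URI and code values are local conventions invented for this sketch, not anything standardized.

```python
def assessment_tag(policy_version: str, assessed_on: str) -> dict:
    """Build a custom security tag recording when and under which policy
    version the SLS assessed the data (local, non-standard convention)."""
    return {
        "system": "http://example.org/sls/assessment-policy",  # hypothetical URI
        "code": policy_version,
        "display": "SLS assessed " + assessed_on,
    }

tag = assessment_tag("privacy-policy-2019.1", "2019-02-20")
```

An enforcement point could then compare the tag's policy version against the current one to decide whether to trust the pre-assessment or re-run the SLS.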

Provenance of SLS update 

When an SLS is used to update a Resource, a Provenance record could be created. This Provenance record would indicate in .agent the SLS, in .policy the specific policy the SLS used, and the date of the update. When the SLS is used to do a batch inspection of a large body of Resources, only one Provenance record would be needed, with a very large .target element pointing at all those that were assessed. I think it should be all those assessed, not just those that were updated.
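A hedged sketch of that batch Provenance record, as an R4-shaped dict: the Device reference and policy URI here are illustrative placeholders, not defined identifiers.

```python
def sls_provenance(assessed_refs: list, policy_uri: str, recorded: str) -> dict:
    """Build one Provenance record covering a batch SLS assessment."""
    return {
        "resourceType": "Provenance",
        # .target points at all those assessed, not just those updated
        "target": [{"reference": ref} for ref in assessed_refs],
        "recorded": recorded,
        "policy": [policy_uri],
        "agent": [{
            "who": {"reference": "Device/example-sls"},  # hypothetical SLS device
        }],
    }

prov = sls_provenance(
    ["Observation/1", "Observation/2"],
    "http://example.org/sls/policy/2019.1",
    "2019-02-28T12:00:00Z",
)
```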

Conclusion

So the SLS role is to somehow tag the data with the kinds of sensitivity it represents, so that access control enforcement can support Data Segmentation for Privacy.

Here is a sample of how this is engaged:

  1. Some access request is made -- Client ID, User ID, Roles, PurposeOfUse
  2. Gross access control decision is made --> Permit with scopes
  3. Data is gathered from FHIR Server using normal FHIR query parameter processing --> Bundle of stuff
  4. Bundle of stuff is examined by the SLS. The SLS looks for sensitivity topics, tagging data with those sensitivity codes (e.g. HIV, ETH, etc.)
  5. Access Control Enforcement examines output of SLS relative to security token/scope to determine if whole result can be returned, or if some data needs to be removed.
  6. Access Control Enforcement sets each bundled Resource's .meta.security with a confidentialityCode (R vs N), removing the sensitivity codes.
  7. Access Control Enforcement determines the 'high water mark' confidentialityCode to tag the Bundle.meta
  8. Access Control Enforcement may set other Bundle.meta.security values such as Obligations based on the Access Control Decision (e.g. Do-Not-Print) 
  9. Bundle of stuff is returned to requester
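Steps 6 and 7 above can be sketched as follows: replace each resource's sensitivity codes with a gross confidentialityCode (R if the SLS found anything, else N) and compute the Bundle's high-water mark. Code system URIs are omitted for brevity, and the ordering list is my reading of the _confidentiality hierarchy.

```python
ORDER = ["U", "L", "M", "N", "R", "V"]  # least to most confidential

def tag_and_watermark(bundle: dict, sensitivity_by_entry: dict) -> dict:
    """sensitivity_by_entry maps entry index -> sensitivity codes the SLS found."""
    high = "N"
    for i, entry in enumerate(bundle.get("entry", [])):
        # Step 6: per-resource confidentialityCode; sensitivity codes are dropped.
        code = "R" if sensitivity_by_entry.get(i) else "N"
        entry["resource"].setdefault("meta", {})["security"] = [{"code": code}]
        # Step 7: track the high-water mark for the Bundle.
        if ORDER.index(code) > ORDER.index(high):
            high = code
    bundle.setdefault("meta", {})["security"] = [{"code": high}]
    return bundle
```

Step 8 (obligations such as Do-Not-Print) would append further coded values to Bundle.meta.security based on the access control decision.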

What is DS4P?

Privacy advocates continue to push for better support of Privacy. The goal is to have easy and implementable systems that enable a Patient to control where, when, and to whom their data is accessible. When the data are all considered equal in the eyes of Privacy, they can all be covered by simple rules of "Permit all access..." or "Deny all access...".
This would result in a much simpler world of yes or no. Or at least the consent rules would be more focused on the where, when, and to whom.

Note that we must start with the stepping stone of this simpler set of rules. It is not an end goal, but it is an important stepping stone. It enables data use for those without sensitive topics, while it does force those that have sensitive topics to either permit access and take on the ramifications, or deny access and take on the ramifications of the data not being available. That is unacceptable, which is why it is not the end goal.

Sensitive health topics


Healthcare is not simple. I am not going to say that other domains are simple, because many other domains are similarly complex (military, social, politics ...). So the next concepts are not all that special in healthcare.

Healthcare data contain some topics that have various sensitivities. Exposing these sensitivities to the wrong organization or person might damage the Patient. This damage might be social stigma. This damage might be financial (denied life insurance). This damage might manifest in physical violence.

Some data are themselves sensitive: Lab results showing positive tests for sexually transmitted disease, Genetic results showing higher likelihood for a hard to treat condition, Diagnosis of substance abuse.

Some episodes of care indicate sensitive topics even when there is no data recorded: Patient received psychotherapy treatment, patient was treated for substance abuse.

Some data are only sensitive in a specific context. The best example is that a Sickle Cell diagnosis has historically been used to exclude people from serving in the military. That is to say, volunteers that really wanted to serve in the military would be denied if they had a Sickle Cell diagnosis. I understand this is no longer the case, but you can understand how a medical diagnosis could limit what you are allowed to do.

Some data might be marked as less sensitive so that they can be made more widely available. An example might be a document specifically assembled as an "Emergency Data Set": a critical set of data with minimal facts useful in an emergency. Similar to a medical alert bracelet that announces to all that you are highly diabetic, this data would be anonymously accessible. The point of a medical alert bracelet is to address only the emergency portion of treatment, where stabilization of the emergency is the goal and where doing the wrong thing could make things worse. I expect most Emergency Data Set data are printed on a card carried by the patient, or available at a service the patient designates. But I bring up "less sensitive" as just as legitimate a use of DS4P as more sensitive topics.

Sensitivities are hard


Healthcare data tend to be scientific facts. The sensitivity of a fact may not be obvious or a one-to-one relationship. That is to say, a medically trained individual can look at three seemingly unrelated facts and draw a conclusion that is a new fact derived from those three. This was often the case with HIV, where the diagnosis of HIV was not recorded, but where some lab results combined with some specific prescription drugs would make it clear to a medically trained individual that the patient was HIV positive. None of the facts alone was a strong indicator of HIV-positive status, only the combination.

Sensitivities change over time. A specific lab result might not be considered sensitive, but months later medical knowledge realizes that that kind of lab result is an indication of a medical condition that is considered sensitive. Thus what was originally a normal lab result should now be treated as a sensitive health condition. It can also happen that a sensitive result becomes less sensitive, although I expect this to be rare.

How data are tagged with specific kinds of sensitivity labels is the topic of my next article...

Conclusion


So, this is why the health database can't simply be treated as "Permit all access..." or "Deny all access...". It is important that any organization that has health data start with gross Permit and Deny capability, which is what we have been stressing for the last 10 years. DS4P indicates the next step beyond that yes/no level of consent, to a more conditional level of consent.

The goal of DS4P is to enable privacy policies to have different Permit/Deny rules for these lesser or more sensitive health topics. Thus "Data Segmentation" is the concept of being able to differentiate one kind of sensitive health data from another kind of health data, segmenting one from the other. With the goal that the variously segmented data can have different Privacy rules applied.

Break-Glass is one example where sensitive health topics might be blocked, but available if the medical professional has determined they are in a treatment situation and have medical safety reasons to override the blocking rules.

Alternatives:

Some would indicate that if the patient is the only one communicating their data, then the patient can choose what data get exposed. This is not wrong, but it is not complete. There are data flows that are not supported by patient-mediated exchange. And even in these cases the Patient might need help deciding which of their data are an indicator of a sensitive health topic.

See my other articles on Privacy