Wednesday, March 13, 2019

Basic Provenance Use-cases

There is a project starting in HL7 to define an Implementation Guide for "Basic Provenance" for use with CDA and FHIR. The motivation for this project, as I understand it, is to move the Healthcare industry from providing very little Provenance to providing Provenance that delivers some value.

From W3C PROV we get a very clear definition of Provenance:
Provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness.

Why Provenance?

The reason to have Provenance is that the data you create will be used down-stream by someone, for some reason, and they need Provenance so that they can rely on your data. The creation of Provenance is work that must be done up-stream, yet the agent doing that work does not benefit from it. Emphasis: the Provenance that is created is only needed down-stream a fraction of the time.

So, what are the most valuable down-stream use-cases where Provenance is needed, so that we can assure that up-stream provides that Provenance? The use-case analysis thus needs to start from the down-stream use-cases to inspire the up-stream work to be done.

Up to now, most work on Provenance has focused on the up-stream use-cases, trying to inspire them to create perfect Provenance. This has failed because the up-stream actor gains nothing from recording or providing Provenance, and thus sees no apparent value in providing it. Where there is not a clear value-proposition, there will not be implementation.

Use-case for Basic Provenance:

We could start from an infinite set of use-cases where Provenance might be needed, but we would continue to fail to inspire up-stream implementation. So we need a sub-set of the use-cases that are clearly valuable.

The critical question I have today is: what is the "Basic" set of use-cases where data Provenance is needed but not available? The "Basic" set needs to be understood as a reasonable sub-set of the "Comprehensive" set of use-cases where Provenance could be used. This Basic set of use-cases is critical today to inspire the creation of reasonable guidance that is clearly justified to be imposed.

In Scope use-case
  • When I use data, I need to know the Provenance of that data so that I can assess the quality, reliability and trustworthiness.
    • Where data are created by my own organization, I can trust my organization's mechanisms (out-of-scope: managing Provenance within an Organization)
When data are not locally created:
  • When I import data, I need to know the Provenance of the data so that I can assess the quality, reliability and trustworthiness.
  • The data is asserted as authored by an Organization (In-Scope: assertions of Organization)
    • Identification of an individual or device is nice, but it is often impossible to assess the quality, reliability, or trustworthiness of another organization's individuals or devices. Thus this level of detail is nice, but not useful. (Out-of-scope: identification of agents more refined than Organization)
    • The data may be derived from, or copies of, data originally from a third Organization. (In-Scope: origin of data that is copied or used in derivations/transforms)
Thus In-Scope
  • When data are authored and exported to another Organization, there is a need to have Provenance of the authoring and exporting Organization
  • When exported data contain data that are derived, or copied, from another organization, then Provenance pointing to that original organization is given.
Provenance elements
  • What data 
  • Who Organization authored and exported
  • Accurate timestamps
  • Indication of Provenance action (DerivedFrom, Export, Import)
  • Reasonable mechanism to confirm data are authentic 
Not-In-Scope for Basic Provenance --> Advanced Provenance
  • Provenance use within an organization for their own data -- Organizations do need this for their own purposes, but would not need to conform to a standard.
  • Fine-grain Provenance on workflow transitions or data lifecycle events
  • Fine-grain Provenance actors more refined than Organization
  • Transform methods or algorithms 
  • Digital Signature proof of Integrity 
  • Provenance on De-Identification or Re-Identification
  • Provenance on Delete/Destroy/Deactivate activities

Blockchain Provenance Service

I am inspired by the idea of using a public Blockchain as a repository for Provenance; that is, the Provenance Service is implemented using Blockchain technology. The most intriguing part of this model is that everyone within a community submits Provenance records in real-time every time they do something worthy of Provenance. This Provenance Blockchain would be a Public, Permissioned chain: viewable (useable) by anyone, but only updated by a defined set of permissioned entities. The Provenance record can be sufficiently opaque, while still being effective:
  1. Rather than pointers (Provenance.target), there is simply the hash of the data.
  2. All records of 'who' are organizational only. Where the organization is expected to keep internal record of individual, device, service, agent.
  3. Activity is recorded (create, update, transform, export, import, destroy)
  4. Blockchain validates the Organization (who) and the timestamp (when)

So That: When data are used, the user of the data can hash the data and look into the Blockchain for records of Provenance on that data.
The big advantage of this model is that data transfers never need to worry about what level of Provenance must be carried, and the pathway that data follows can be multiple hops, even through hostile actors. If the data is intact, then Provenance will be found. If Provenance is found, then integrity and authenticity can be proven.
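As a rough illustration of that flow, here is a Python sketch assuming a hypothetical REST interface to the permissioned chain; the endpoint, record fields, and organization identifier are placeholders, not an existing API.

```python
import hashlib
import json
import requests  # client for a hypothetical chain API

CHAIN_URL = "https://provenance-chain.example.org"  # placeholder endpoint

def record_provenance(data: bytes, org_id: str, activity: str) -> None:
    """Submit an opaque provenance record: hash of the data, organization, and activity."""
    record = {
        "dataHash": hashlib.sha256(data).hexdigest(),  # no pointer to the data, only its hash
        "organization": org_id,                        # organizational 'who' only
        "activity": activity,                          # create, update, transform, export, import, destroy
    }
    requests.post(f"{CHAIN_URL}/records", json=record, timeout=10)

def lookup_provenance(data: bytes) -> list:
    """When data are used, hash them and ask the chain for matching records."""
    digest = hashlib.sha256(data).hexdigest()
    response = requests.get(f"{CHAIN_URL}/records", params={"dataHash": digest}, timeout=10)
    return response.json()  # e.g. a list of {organization, activity, timestamp} entries

# Example: an importing system checks data it just received
records = lookup_provenance(json.dumps({"resourceType": "Observation"}).encode())
if not records:
    print("No provenance found: either modified data or a non-participating custodian")
```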

Not finding Provenance may mean the data has been improperly modified, but it may also just indicate a custodian/author that is not participating in that Provenance Blockchain. These false-positive and false-negative cases do need to be addressed.

This leverages the integrity and public aspects of Blockchain, while taking careful steps to not put individually identifiable data into the Blockchain.

What is not clear is how the Patient themselves participates. They clearly can be given access to read from the Blockchain, and I would encourage this as it gives them some ability to track where their data goes. This is only true of data they know about, since you must have a hash of the data; there would not be a patient identifier in the blockchain, so you could not see all activity. The question is whether the Patient needs the ability to add Provenance evidence to the Provenance Blockchain. This is not to question the Patient's ability to create data; they can. Rather, it is to point out that opening this up to the Patient is opening it up to EVERYONE on the internet, and thus there is a risk of 'bad guys' filling your Provenance Blockchain with crud. Note that I have the blockchain validating the Organization, and being a Public but Permissioned chain.

State of Healthcare Provenance today

Today, there is some provenance built into the Healthcare standards that are used. Some of this Provenance is not obvious, so let me expose it.

HL7 v2 has the least provenance information built into common use of the specification. This is not to say that there isn't provenance, but there is not much. In theory, one knows the sender of the message, but because it is a message, this sender information is usually discarded.

HL7 CDA has a well-implemented header that holds Provenance. It isn't described as Provenance, but it is Provenance in that it describes: (a) who authored the document, (b) what organization is the custodian of the document, (c) when was this document authored, and (d) why was this document authored. Given that a CDA document is a document, and not a transport, it does not include to whom it is being sent, or from where it is being sent. These are gaps overall, but gaps that one should expect the transport to fill.

There is a CDA PROV specification, but it is not used today. This specification clarifies the basics, but also adds functionality for comprehensive Provenance within CDA.

IHE XDS/XCA/XDR/XDM/MHD is a document transport that is content agnostic. It can transport CDA, but it can also transport other content. With CDA, there is the above well-defined basic Provenance. With other formats, the document itself is not self-declarative. Thus the XDS transports have defined metadata that explicitly carries these Provenance elements, along with other elements for other purposes (descriptive, identity, life-cycle, privacy, security, and exchange).

This metadata model was inspired by the CDA Header, but abstracted so as to work with any content type (which now includes FHIR  Documents), and many deployment models.
One Metadata Model - Many Deployment Architectures

Future on FHIR

FHIR has the most mature Provenance model. Not only does it have a Provenance Resource that can be used with any FHIR resource, but many of the Provenance data elements are built into the core FHIR resource when that data element is fundamental to that FHIR resource. See the fiveWs page for details on this.

This model is inspired by W3C PROV and by a long history of Provenance work in HL7. Thus it is intended to be comprehensive in function and ability.
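As a minimal sketch (not a normative example), here is what a bare-bones R4 Provenance resource could look like, built as a Python dictionary; the resource ids and the agent.type coding are illustrative assumptions.

```python
import json
from datetime import datetime, timezone

# Minimal R4 Provenance: what (target), when (recorded), who (an Organization).
# Resource ids and the agent.type coding below are illustrative, not required values.
provenance = {
    "resourceType": "Provenance",
    "target": [{"reference": "Observation/example-bp"}],           # what data
    "recorded": datetime.now(timezone.utc).isoformat(),            # when
    "agent": [{
        "type": {"coding": [{
            "system": "http://terminology.hl7.org/CodeSystem/provenance-participant-type",
            "code": "author"
        }]},
        "who": {"reference": "Organization/example-clinic"}        # organizational who
    }]
}

print(json.dumps(provenance, indent=2))
```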


FHIR Provenance Profile

There is a Profile of the FHIR Provenance resource to cover the use-case of data elements extracted from Documents that are shared in a Document Sharing (XDS/XCA/MHD) exchange, where the data are made available via Query for Existing Data (e.g. Observations, Medications, Conditions, etc.). This use-case supports someone using the FHIR API to gain access to data who wants the Provenance of the data they were given, where the Provenance provides positive linkage to the Document(s) from which that data was extracted. See my webinars.
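A rough sketch of that linkage, with placeholder resource ids and without reproducing the exact profile constraints: the Provenance target is the extracted Observation, and the entity points back to the source DocumentReference.

```python
# Sketch only: shows the shape of the linkage, not the normative profile content.
extraction_provenance = {
    "resourceType": "Provenance",
    "target": [{"reference": "Observation/extracted-hba1c"}],       # element-level data served by the FHIR API
    "recorded": "2019-03-13T12:00:00Z",
    "agent": [{
        "who": {"reference": "Organization/data-element-extractor"} # placeholder for the extracting service's org
    }],
    "entity": [{
        "role": "source",                                            # the document the element was extracted from
        "what": {"reference": "DocumentReference/source-ccd"}
    }]
}
```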

Provenance Service vs Provenance with the Data

The CDA, XDS, and FHIR models described above are all ways to carry Provenance with the data. This model is historically what is done in Healthcare, when Provenance is done at all. The advantage of this model is that the Provenance is conveyed with the data, so it is available when the data is used.

However this is not the only model that could be done.

A Provenance Service is a service that has the responsibility to support Provenance use-cases. When data are created, a record of that Provenance is submitted to the Provenance Service. When data are used, this service can be queried for evidence of Provenance on that data.

The FHIR Provenance resource could be managed in a standalone Service. A degenerate form of this Provenance Service in FHIR is where the FHIR Server that holds the clinical information also holds the Provenance; that is just a logical merging of the data custodian service with the Provenance Service.
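For illustration, a consumer might ask a standalone Provenance Service about a resource it was handed using the standard Provenance target search parameter; the base URL and resource id below are placeholders.

```python
import requests

PROVENANCE_SERVICE = "https://provenance.example.org/fhir"  # placeholder standalone service

# Ask the Provenance Service what it knows about a specific resource we were given.
resp = requests.get(
    f"{PROVENANCE_SERVICE}/Provenance",
    params={"target": "Observation/extracted-hba1c"},
    headers={"Accept": "application/fhir+json"},
    timeout=10,
)
bundle = resp.json()
for entry in bundle.get("entry", []):
    prov = entry["resource"]
    print(prov["recorded"], [a["who"]["reference"] for a in prov["agent"]])
```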

The XDS model could be seen as a Provenance Service, much like FHIR. One can always look up a document that you have in XDS to find its DocumentEntry. In that DocumentEntry are a hash and size of the document that you can confirm; there might be a digital signature association too. If confirmed, then you can look at the other elements in the DocumentEntry to see what the Provenance of that document is. Further, one can see if the document has been replaced, transformed, or appended. This is not purely a Provenance Service, but the functionality does exist.
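As a small sketch of that confirmation step: the DocumentEntry hash is a SHA-1 of the document octets, so verifying it is a simple comparison (the values below are placeholders standing in for a real DocumentEntry).

```python
import hashlib

def matches_document_entry(document: bytes, entry_hash: str, entry_size: int) -> bool:
    """Confirm retrieved document octets against the DocumentEntry hash (SHA-1) and size."""
    return (hashlib.sha1(document).hexdigest().lower() == entry_hash.lower()
            and len(document) == entry_size)

# Placeholder values standing in for a real DocumentEntry
doc = b"...retrieved document octets..."
print(matches_document_entry(doc, hashlib.sha1(doc).hexdigest(), len(doc)))
```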

When is Provenance Created?

Simply whenever data are
  • created, 
  • updated, 
  • verified / authenticated, 
  • transformed / translated / derivedFrom, 
  • appended / amended, 
  • de-identified / re-identified,  
  • destroyed / deactivated / deleted,
  • exported / published / pushed,
  • imported
That is just about every action other than Query and Read. Note that all of these actions are also AuditEvent relevant. An Audit Log is not the same as Provenance: they are similar in what gets recorded and when a record is made, but the intended use and retention lifecycle of the AuditEvent is different than for Provenance.

Having all of these records of Provenance is not valuable unless it is useful to those needing it.

When is Provenance needed?

The most important point about Provenance is that it is needed and used. A key part of use-case analysis is to look at the use of the data you are creating. If everyone created exhaustive Provenance records, but only 1% of those records were ever used, this would be wasteful.

So, let's look at the Use-case for Basic Provenance.


Monday, March 11, 2019

IHE produces Profiles using FHIR R4 for core functionality

Released from IHE is an update of 5 Profiles that represent a basic API to health data: the subset of IHE profiles that leverage HL7® FHIR® Release 4. The remainder of the IHE profiles that leverage HL7® FHIR® are expected to be upgraded to FHIR® Release 4 later in 2019. IHE published the following updated supplements for trial implementation as of March 6, 2019:
  • IHE Appendix Z on HL7® FHIR® - Rev. 2.1
  • Mobile Access to Health Documents (MHD) with XDS on FHIR® - Rev. 3.1
    • DocumentReference (DocumentEntry)
    • DocumentManifest (SubmissionSet)
    • List (Folder)
    • Binary (the document)
  • Mobile Care Services Discovery (mCSD) - Rev. 2.1
    • Organization
    • Location
    • Practitioner
    • PractitionerRole
    • HealthcareService
  • Patient Demographics Query for Mobile (PDQm) - Rev. 2.1
    • Patient
  • Query for Existing Data for Mobile (QEDm) - Rev. 2.1
    • AllergyIntolerance
    • Condition
    • DiagnosticReport
    • Encounter
    • Immunization
    • Medication
    • MedicationRequest
    • MedicationStatement
    • Observation
    • Procedure
    • Provenance

This subset is consistent with the Argonaut and US-Core subset of FHIR, yet does not include US-specific constraints. The combination of these IHE profiles with US-Core could be a powerful focus of an IHE Connectathon "Projectathon". A Projectathon is when local constraints are added to IHE Connectathon testing.

Related to these 5 is the Mobile Cross-Enterprise Document Data Element Extraction (mXDE) profile, which works with these profiles to provide an added-value service.

The profiles contained within the above documents will be formally tested at the USA IHE Connectathon. The documents are available for download at https://www.ihe.net/resources/technical_frameworks.

Updated:
I understand the target HHS/ONC regulation is 170.315(g)(7), 170.315(g)(10), and 170.315(g)(11). What ONC now calls the ARCH.

Thursday, February 28, 2019

IHE ITI Winter 2019

The IHE ITI, PCC, and QRPH workgroups met in Oak Brook, IL this week at the RSNA Headquarters, the usual place. This was a big disappointment, as we all were expecting to be meeting at this time in Treviso, Italy, where it is +12°C. So we got stuck with dirty snow and -12°C temperatures.

I think we all had a good week. There was an amazingly diverse set of participants.

I participate mostly in ITI. The week was very successful. We are trying out a new project development process that follows more of an Agile development process. Normally the IHE calendar is split into a 12-month development cycle, and small and large work items need to fit within that timeframe. Under the Continuous process we can complete a work item when it is done, as fast as it can be done, picking up new items as resources become available. Under this model we have harder requirements on Quality, whereas the 12-month cycle puts emphasis on date-based deadlines.

The ITI workgroup completed:
  • Patient Identity Management using FHIR -- We got started on this, mostly agreeing on the use-cases, scope, and actors. This is targeting support for a regional or national level patient identity management system, and will be tied to the XDS Registry to keep the registry up-to-date, so that XDS Consumers get accurate results when querying for a patient.
  • Updating mCSD -- We are about ready to send this out for Public Comment; look for it in a month. This is an effort to add support for "Facilities" and clarify how they are different than "Locations". The work item also aims to define how to create multiple hierarchies between facilities.
  • XCA Deferred Query -- We progressed to almost a complete specification; however, this will likely not be approved for Public Comment until our next face-to-face. This work item intends to support use-cases where an XCA Responding Community will take longer to respond to a query than is allowed by Asynchronous Web-Services. Use-cases include when a human must get involved with the response to evaluate patient privacy consent authorization, or where paper documents would need to be scanned and indexed. These longer response times would be supported through a Deferred query mechanism similar to what is in XCPD.
  • FHIR R4 priority 1 group -- finishing the update to FHIR R4 for the group of IHE Profiles using FHIR that is considered a priority for ONC: MHD, PDQm, QEDm, mCSD, and Appendix Z. We resolved public comment in addition to all outstanding CPs. Compliance files (StructureDefinition, CapabilityStatement, etc.) have not yet been converted to R4; this will be done in the coming months when Forge can be used.
    • The FHIR specification changed in response to public comment based on ballot and experience. For details on the summary of changes of the FHIR specification between STU3 and R4 please see http://build.fhir.org/history.html
    • The IHE profiles PDQm, MHD, QEDm, mCSD, and Appendix Z were updated to reference FHIR R4. The changes were predominantly simply to reference FHIR R4 rather than FHIR STU3; thus the most important changes for implementations are found in the FHIR specification changes. Some changes were also initiated by a Change Proposal that identified a mistake in the STU3 version of the profile. Some changes were open-issues in STU3 that are now fixed in the FHIR R4 core specification. Change Proposals and open-issues are noted in the closed-issues in these drafts. These versions of the supplement went through Public Comment, which resulted in 88 public comments that were then resolved. Note that the mXDE profile is independent of the FHIR revision, as it orchestrates MHD and QEDm.
    • The following are specific additional changes beyond the update to reference FHIR R4:
    • MHD - now requires that a Document Recipient declaring the XDS on FHIR Option support all XDS association types
    • MHD - the canonical URIs used for profiles are defined, but the StructureDefinition files have not yet been created. These StructureDefinition xml files will be made available later, as they are not normative.
    • MHD - the canonical URI used for the Provide Document Bundle transaction has changed because the FHIR canonical URI encoding rules don't allow the "-" character. We could have just changed ITI-65 into ITI_65, but a breaking change is a breaking change. So we chose to replace it with an actual StructureDefinition based in the same URI space as our other StructureDefinitions. This means that we would no longer use http://ihe.net/fhir/tag/iti-65, but rather http://ihe.net/fhir/StructureDefinition/IHE_MHD_Provide_Comprehensive_DocumentBundle or http://ihe.net/fhir/StructureDefinition/IHE_MHD_Provide_Minimal_DocumentBundle
    • MHD - many updates to the XDS-FHIR mapping, including recognizing the use of the .meta element to support minimal metadata
    • MHD - recognition of PractitionerRole use
    • QEDm - fixed Provenance section that was hard to read and had some errors
We also worked with PCC and QRPH on their FHIR-based profiles, reminding them that they should be leveraging Appendix Z as much as they can, and letting ITI know if there is any opportunity to improve Appendix Z. There was further discussion that there is guidance on how to evaluate FHIR and how to profile FHIR. In addition, there is now a GitHub for IHE, and we are managing our FHIR compliance (XML) files there rather than managing them on the FTP site.

ITI will be picking our next work item from the backlog. The backlog is maintained and prioritized on a monthly basis by the ITI Planning committee.

UPDATE: I predicted wrong:  I expect that we will convert the remainder of our FHIR profiles to FHIR R4: NPFS (Non-Patient File Sharing), ATNA FHIR Query, mACM (Mobile Alert Communication Management), and PIXm (Patient Identity Cross-Reference for Mobile).

We actually decided to take on creating the ATNA Feed on FHIR transaction, so that applications that are just FHIR based have an easy way to log security/privacy/other events, rather than only having the DICOM ATNA encoding using SYSLOG. This work item will also update the FHIR ATNA Query transaction to FHIR R4. There is strong interest in updating the rest of our FHIR Profiles, but that will be picked up at our next meeting when we finish the mCSD Facility supplement.

Keith has a report from PCC: A Brief summary of my IHE ACDC Profile and A4R Whitepaper Proposals.

Wednesday, February 20, 2019

Basic DS4P - How to set the confidentialityCode

I have covered the vision of the Data Segmentation for Privacy (DS4P) concept, and outlined how a Security Labeling Service (SLS) would enable this grandiose vision of DS4P.

However, there are stepping stones. The following is a slight update of an article I wrote in July 2015 on how to set the confidentialityCode. I have used bold/underbar to indicate where I enhanced the original text.

The problem then, as it is today, is that the confidentialityCode value that everyone uses is "N" (Normal confidentiality), which does not help with Data Segmentation or Privacy.

The recommendation I give here is restricted to the gross level: for Document Sharing at the XDS/XCA/DocumentReference metadata level;  for FHIR REST at the returned Bundle.meta.security level, but not on each Resource in the Bundle; and for CDA at the CDA header, but not on each element. Going deeper is possible, but not what I am trying to drive as the next step beyond "N".

Some background articles:

Recommendation for setting confidentialityCode

So, I would continue to recommend that anyone or any system publishing health data such as FHIR resources, FHIR documents, and CDA documents should use "N", unless they have evidence to say that is the wrong value. Meaning it should take a specific effort to choose the other values:
  • "R", because there is specifically sensitive content – HIV, alcohol/drug abuse, etc.
  • "V", because the content should be seen only when reader is individually authorized -- psychology-notes, usually also used on all VIP patients (Not a best practice, but reality).
  • "M", because the content is less sensitive than normal, but still medical. authorized for wide distribution – like an emergency-data-set, or for dietary use-cases
  • "L", because the content is not medical, or has been de-identified
  • "U", because the content is not specific to an individual and is public

This comes right out of the definition of the vocabulary values in 2.16.840.1.113883.5.25 for "_confidentiality", available from the FHIR specification for easy reading: https://www.hl7.org/fhir/v3/Confidentiality/index.html
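As a sketch of that gross-level tagging in FHIR, a publishing system might stamp the returned Bundle's meta.security like this; the helper itself is illustrative, and the code system URI has varied across FHIR releases (the terminology.hl7.org form is shown).

```python
# Illustrative helper: tag a FHIR resource (or Bundle) at the gross level via meta.security.
# The code system URI has varied across FHIR releases; the terminology.hl7.org form is shown here.
CONFIDENTIALITY_SYSTEM = "http://terminology.hl7.org/CodeSystem/v3-Confidentiality"

def set_confidentiality(resource: dict, code: str) -> dict:
    """code is one of U, L, M, N, R, V per the _confidentiality value set."""
    meta = resource.setdefault("meta", {})
    # Replace any prior confidentiality tag; keep other security labels untouched.
    meta["security"] = [s for s in meta.get("security", [])
                        if s.get("system") != CONFIDENTIALITY_SYSTEM]
    meta["security"].append({"system": CONFIDENTIALITY_SYSTEM, "code": code})
    return resource

bundle = {"resourceType": "Bundle", "type": "searchset", "entry": []}
set_confidentiality(bundle, "N")   # default unless there is evidence otherwise
```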

How to determine what the value should be?

I don't disagree that this is a hard thing to determine.
  • It might be determined by workflow; psychology notes clearly come from a psychology workflow.
  • Clearly de-identification is a specific workflow.
  • It might be an explicit act, where the user is specifically trying to make a less-sensitive document for broad use, such as an emergency-dataset, or for export to the dietitian.
  • It might be a specific request, where the clinician decides that the data is very sensitive, or where the patient decides that the data is very sensitive.
This is different than a patient choice in consent regarding rules applied to these different codes, meaning where a patient chooses a restrictive consent for their data accessibility. See http://healthcaresecprivacy.blogspot.com/p/topics.html#Privacy

The VHA has shown some success in demonstration projects with passing the data through a Security Labeling Service (SLS) that leverages Natural Language Processing and CDS (Clinical Decision Support) to tag sensitive clinical concepts. See FHIR Demonstration of DS4P (sorry, the video is lost). If none are found then the data is "N", if some are found then the data is "R", if specific types are found the data is "V"… This automated method has me somewhat worried, as the social norms of what is sensitive change often. So using this automated form at publication time might produce a wrong evaluation over time. In the case of the VHA demonstration, they applied it upon 'use' of the data, so it was using the social-norm rules at the time of reading. Likely better social-norm rules, but I am not sure this is better behavior. Note that the intermediate step is a tagged sensitivity category, which might be given to the access control system as information to be used in the access control decision or enforcement.
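That decision rule can be sketched as a simple mapping from whatever sensitivity categories the SLS detects to a confidentialityCode; the category codes and the tiers below are assumptions about a local policy, not fixed standards.

```python
# Illustrative policy: which detected sensitivity categories drive which confidentiality code.
# The category codes and tiers are assumptions about a local policy, not a fixed standard.
VERY_RESTRICTED = {"PSY"}          # e.g. psychotherapy notes -> "V"
RESTRICTED = {"HIV", "ETH"}        # e.g. HIV, substance abuse -> "R"

def confidentiality_from_sensitivities(found: set) -> str:
    if found & VERY_RESTRICTED:
        return "V"
    if found & RESTRICTED:
        return "R"
    return "N"                      # nothing sensitive detected

print(confidentiality_from_sensitivities({"ETH"}))   # "R"
print(confidentiality_from_sensitivities(set()))     # "N"
```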

Is there more?

All the other security-tags are not likely to be set upon publication. IHE has brought in the whole of the "Healthcare Privacy/Security Classification System":
  • IHE specifically recommends against using the sensitivity category, as the value itself is sensitive. They are useful for internal use, like the VHA demonstration.
  • Compartment is mostly undefined, but would likely be a local value-set. Unlikely to be understood at publication time. Interesting place to play, as it might be used to define compartments like Radiology, Cardiology, Psychology, etc... but it is untested grounds.
    • More likely it is used to tag specific authorized Research projects by name
  • Integrity could be equally set by the publisher, although it is not well enough defined. But this would be a way to tag data that was generated by the patient, vs data generated by a licensed clinician.
  • Handling caveats might be set on publication. The only cases I can think of are similar to the "V" cases, in that the author explicitly knows something about the data and thus needs to add a caveat.
    • One specific example is 42 CFR Part 2 – SAMHSA-covered treatment – which must be explicitly marked with a 'do not disclose without explicit authorization by the patient': NOAUTH
    • A second specific example is an obligation to delete after use, which specifically forbids persistence (including printing): DELAU

Conclusion

So, simple guidance: you need all of the _confidentiality vocabulary, and two more from the handling caveats -- [U, L, M, N, V, R] + NOAUTH + DELAU.

Blog articles by Topic

Segmenting Sensitive Health Topics

In my last article I outlined the need to recognize that health data have various kinds of sensitivity, which inform various types of Privacy rules of access, to support the goal of Privacy; thus Data Segmentation for Privacy (DS4P). Here I am going to explain some current thinking on how an Access Control Enforcement engine can tell sensitive data from normal health data.

Access Control is broken into various parts. One part makes an access control decision. This is made based on possibly many vectors. Please read this article on Vectors through Consent to Control Big-Data Feeding frenzy. It explains that some data is sensitive simply because of who authored it (Betty Ford Clinic), which is clear by looking at the author element.

The problem I pointed out in the last article is that differentiating sensitive data from normal data is not easy.

Back 20 years ago, there seemed to be an expectation that when a clinician recorded some fact, they would also tag that fact with a sensitivity tag. Thus when an access request was made, these tags could be inspected by the access control engine to determine if the data could be accessed by the individual requesting access. The reality is that this tagging at authoring time by the clinician was unreasonable and never done. It was also a simpler time.

Thus there are large databases of longitudinal data that have never been assessed for sensitivity. How would one enforce Data Segmentation for Privacy (DS4P) if there is no way to identify what data needs to be segmented?

Security Labeling Service


Thus the Security Labeling Service (SLS) was born. This service does what the name indicates:  given a bunch of data, it applies security labeling. 

The capability might be gross or fine-grain:
  1. Only identify the overall Confidentiality Assessment. Is the data normal health data, or is it Restricted?
  2. Only identify the various sensitive kinds of data within the data. The data has indicators of sexually transmitted disease, substance abuse, etc..
  3. Identify which fragments of the data are sensitive. The data is not modified, but enough information is given to identify the fragments. For example a FHIR Bundle might be assessed, and a list of Resources within the bundle might be identified with specific tags. 
  4. Tag fragments of the data with sensitivity. The data is modified with the tags. Such as updating the FHIR resources .meta.security value. 
There are likely more, but this subset seems foundational.

The SLS might operate on a single observation, a Bundle of Resources, a CDA document, a FHIR Bulk Data blob, or a whole database.
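A minimal sketch of what the third capability might look like as an interface: given a Bundle, report which entries carry which sensitivity categories, without modifying the data. The keyword table is a crude stand-in for whatever NLP, CDS, terminology, or ML method is actually used.

```python
# Capability 3 sketch: identify sensitive fragments of a Bundle without modifying it.
# The keyword table is a crude stand-in for NLP, CDS, value-set lookups, or ML methods.
SENSITIVE_TERMS = {
    "hiv": "HIV",
    "methadone": "ETH",
    "psychotherapy": "PSY",
}

def label_bundle(bundle: dict) -> dict:
    """Return {resource reference: set of sensitivity categories} for each Bundle entry."""
    findings = {}
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        ref = f"{resource.get('resourceType')}/{resource.get('id')}"
        text = str(resource).lower()
        tags = {cat for term, cat in SENSITIVE_TERMS.items() if term in text}
        if tags:
            findings[ref] = tags
    return findings
```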

How does the SLS work?

The reason to create the concept of the SLS was to isolate the hard work of determining the sensitivity from the Access Control Decision and Enforcement. Thus we privacy and security experts were explicitly invoking the Hitchhiker's Guide to the Galaxy "Somebody Else's Problem field". Which means: I don't know how it works....

One idea is that the SLS just has a list of terms it looks for

One idea is that the SLS leverages Clinical Decision Support

One idea is that Natural Language Processing is used

One idea is that Big Data and Machine Learning are used

I am sure someone would indicate the Blockchain is used

Most likely many methods are used. It depends on the needs of the organization, data, and patients.

To modify the data or not

I tend not to want the Security Labeling Service (SLS) to modify the data, mostly because the kind of function I want out of the SLS is simply to identify the sensitivity kinds. These sensitivity kinds are not typically exposed to end users or recipient organizations; they are just used by the Access Control Enforcement to determine if the data should be allowed, blocked, or modified. Thus any changes to the data would happen in the Access Control Enforcement, not the SLS.

There is a camp that combines Access Control Enforcement and SLS into one service. I think this is simply a combination: explicitly the combination of Access Control Enforcement and Security Labeling Service into one thing, not a new kind of Security Labeling Service (SLS).

When to Scan?

A model is to scan the data when it is created/updated, and save the assessment made at that time with the data. This model optimizes for doing the assessment as rarely as possible. But this model can end up with an incorrect tag, as the concept of sensitivity changes over time.

This model could be enhanced by scanning the whole database again when sensitivity policies change. This likely can be done with a low priority thread, so would have minimal impact. 

The advantage of predetermining the sensitivity is that one could then do queries that include queries of these sensitivity tags. This might be useful, or might be seen as an invasion of privacy.

I tend to place the SLS at the point of Access Control Enforcement. I prefer this because the nature of health data sensitivity is very contextual: the sensitive topics change over time, the nature of the sensitivity changes over time, and the context of the request might also affect the decision.

It is possible that the SLS is invoked by the Access Control Enforcement, and it is intelligent enough to notice that the data is already pre-assessed, thus just returning that pre-assessment without doing any work. 

This would benefit from knowing how old that pre-assessment is. The age might be encoded as a custom security tag, for example a tag that simply indicates when the assessment was done, or the policy version that was used. Another method might be to look for Provenance of the prior SLS update.

Provenance of SLS update 

When an SLS is used to update a Resource, a Provenance record could be created. This Provenance record would indicate that the .agent is the SLS, the .policy is the specific policy the SLS used, and the date of the update. When the SLS is used to do a batch inspection of a large body of Resources, only one Provenance record would be needed, with a very large .target element pointing at all those that were assessed. I think it should be all those assessed, not just those that were updated.
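A sketch of what such a record might look like, with placeholder references; the policy URI and the Device reference standing in for the SLS are assumptions.

```python
# Sketch of a Provenance record for a batch SLS assessment; references and policy URI are placeholders.
sls_provenance = {
    "resourceType": "Provenance",
    "target": [                                    # everything that was assessed, not only what changed
        {"reference": "Observation/obs-1"},
        {"reference": "Condition/cond-7"},
    ],
    "recorded": "2019-02-20T03:00:00Z",
    "policy": ["https://example.org/policy/sensitivity-rules/v7"],  # which labeling policy version ran
    "agent": [{
        "who": {"reference": "Device/security-labeling-service"}    # the SLS itself as the acting agent
    }]
}
```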

Conclusion

So the SLS role is to somehow tag the data with kinds of sensitivity it represents, so that access control enforcement can support Data Segmentation for Privacy.

Here is a sample of how this is engaged (a rough code sketch of steps 4 through 8 follows the list):

  1. Some access request is made -- Client ID, User ID, Roles, PurposeOfUse
  2. Gross access control decision is made --> Permit with scopes
  3. Data is gathered from FHIR Server using normal FHIR query parameter processing --> Bundle of stuff
  4. Bundle of stuff is examined by the SLS. The SLS looks for sensitivity topics, tagging data with those sensitivity codes (e.g. HIV, ETH, etc.)
  5. Access Control Enforcement examines output of SLS relative to security token/scope to determine if whole result can be returned, or if some data needs to be removed.
  6. Access Control Enforcement sets each bundled Resource's .meta.security with a confidentialityCode (R vs N), removing the sensitivity codes.
  7. Access Control Enforcement determines the 'high water mark' confidentialityCode to tag the Bundle.meta
  8. Access Control Enforcement may set other Bundle.meta.security values such as Obligations based on the Access Control Decision (e.g. Do-Not-Print) 
  9. Bundle of stuff is returned to requester
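Below is a rough Python sketch of steps 4 through 8; the permission model and the sensitivity-to-confidentiality mapping are placeholders for real local policy, not a definitive implementation.

```python
# Rough sketch of steps 4-8: label, filter against the requester's permissions, tag, high-water mark.
# The permission model and sensitivity-to-code mapping are placeholders for real local policy.
CONF_SYSTEM = "http://terminology.hl7.org/CodeSystem/v3-Confidentiality"
ORDER = {"U": 0, "L": 1, "M": 2, "N": 3, "R": 4, "V": 5}

def conf_code(sensitivities: set) -> str:
    """Placeholder mapping from detected sensitivity categories to a confidentiality code."""
    if "PSY" in sensitivities:
        return "V"
    return "R" if sensitivities else "N"

def enforce(bundle: dict, permitted: set, sls_findings: dict) -> dict:
    """bundle: FHIR searchset; permitted: categories the token allows; sls_findings: step-4 output."""
    kept, high_water = [], "N"
    for entry in bundle.get("entry", []):
        res = entry["resource"]
        ref = f"{res.get('resourceType')}/{res.get('id')}"
        sensitivities = sls_findings.get(ref, set())
        if sensitivities and not sensitivities <= permitted:
            continue                                   # step 5: drop what the scopes do not allow
        code = conf_code(sensitivities)
        res.setdefault("meta", {})["security"] = [{"system": CONF_SYSTEM, "code": code}]  # step 6
        high_water = code if ORDER[code] > ORDER[high_water] else high_water              # step 7
        kept.append(entry)
    bundle["entry"] = kept
    bundle.setdefault("meta", {})["security"] = [{"system": CONF_SYSTEM, "code": high_water}]
    return bundle                                      # step 9: returned to the requester
```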