Wednesday, March 13, 2019

Basic Provenance Use-cases

There is a project starting in HL7 to define an Implementation Guide for "Basic Provenance" for use with CDA and FHIR. The motivation for this project, as I understand it, is to move the Healthcare industry from providing very little Provenance to providing Provenance that delivers some value.

From W3C PROV we get a very clear definition of Provenance:
Provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness.

Why Provenance?

The reason to have Provenance is that, down-stream, the data you create will be used by someone for some reason, and they need Provenance so that they can rely on your data. Creating Provenance is work that must be done up-stream, and the agent doing that work does not benefit from it. Note also that the Provenance created up-stream is needed down-stream only a fraction of the time.

So, what are the most valuable down-stream use-cases where Provenance is needed, so that we can ensure up-stream provides that Provenance? The use-case analysis thus needs to start from the down-stream use-cases, to inspire the up-stream work to be done.

Up to now, most work on Provenance has focused on the up-stream use-cases, trying to inspire them to create perfect Provenance. This has failed because the up-stream use-case gains nothing from recording or providing Provenance, so there is no apparent value in providing it. Where there is no clear value proposition, there will be no implementation.

Use-case for Basic Provenance:

We could start from an infinite set of use-cases where Provenance might be needed, but that would continue to fail to inspire up-stream implementation. So we need a sub-set of the use-cases that are clearly valuable.

The critical question I have today is: what is the "Basic" set of use-cases where data Provenance is needed, but not available? The "Basic" set needs to be understood as a reasonable sub-set of the "Comprehensive" set of use-cases where Provenance could be used. This Basic set of use-cases is critical today to inspire the creation of reasonable guidance that is clearly justified to impose.

In Scope use-case
  • When I use data, I need to know the Provenance of that data so that I can assess the quality, reliability and trustworthiness.
    • Where data are created by my own organization, I can trust my organization's mechanisms (out-of-scope is managing Provenance within an Organization)
When data are not locally created:
  • When I import data, I need to know the Provenance of the data so that I can assess the quality, reliability and trustworthiness.
  • The data is asserted as authored by an Organization (In-Scope, assertions of Organization)
    • Identifying an individual or device is nice, but it is often impossible to assess the quality, reliability, or trustworthiness of another organization's individuals or devices. Thus this level of detail is nice, but not useful. (Out-of-scope is identification of agents more refined than Organization)
    • Some of the data may be derived from, or be copies of, data originally from a third Organization. (In-Scope, origin of data that is copied or used in derivations/transforms)
Thus In-Scope
  • When data are authored and exported to another Organization, there is a need to have Provenance of the authoring and exporting Organization 
  • When exported data contain data that are derived, or copied, from another organization, then Provenance referencing that original organization is given.
Provenance elements
  • What data 
  • Which Organization authored and exported
  • Accurate timestamps
  • Indication of Provenance action (DerivedFrom, Export, Import)
  • Reasonable mechanism to confirm data are authentic 
Not-In-Scope for Basic Provenance --> Advanced Provenance
  • Provenance use within an organization for their own data -- Organizations do need this for their own purposes, but would not need to conform to a standard.
  • Fine-grained Provenance on workflow transitions or data lifecycle events
  • Fine-grained Provenance actors, more refined than Organization
  • Transform methods or algorithms 
  • Digital Signature proof of Integrity 
  • Provenance on De-Identification or Re-Identification
  • Provenance on Delete/Destroy/Deactivate activities
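To make the in-scope Provenance elements concrete, here is a minimal Python sketch of a Basic Provenance record. It is not taken from any HL7 specification: the class and function names are hypothetical, SHA-256 is an assumed way to identify "What data", and only the three listed actions are modeled. Note that the hash only confirms the data are unchanged; confirming who sent them is left open, since Digital Signature is out of scope above.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Action(Enum):
    """The Provenance actions listed for Basic Provenance."""
    DERIVED_FROM = "DerivedFrom"
    EXPORT = "Export"
    IMPORT = "Import"


@dataclass(frozen=True)
class BasicProvenance:
    what: str                     # What data: SHA-256 hex digest of the content (assumption)
    who: str                      # Who: an Organization identifier, never an individual or device
    when: datetime                # Accurate timestamp, always timezone-aware UTC
    action: Action                # Indication of Provenance action
    origin: Optional[str] = None  # Original Organization when data are derived or copied


def content_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def make_provenance(data: bytes, organization: str, action: Action,
                    origin: Optional[str] = None) -> BasicProvenance:
    """Build a record; a DerivedFrom record must name the originating Organization."""
    if action is Action.DERIVED_FROM and origin is None:
        raise ValueError("DerivedFrom requires the original Organization")
    return BasicProvenance(content_hash(data), organization,
                           datetime.now(timezone.utc), action, origin)


def matches_data(record: BasicProvenance, data: bytes) -> bool:
    """Check that the bytes in hand are the bytes the record describes.

    This proves integrity only; proving the record really came from
    record.who needs an additional mechanism, which is left open here.
    """
    return record.what == content_hash(data)
```

For example, an Organization exporting a summary derived from another Organization's lab result might record `make_provenance(summary, "Org-B", Action.DERIVED_FROM, origin="Org-A")` (hypothetical names), so the receiver learns both the exporting and the original Organization.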

Blockchain Provenance Service

I am inspired by the idea of using a public Blockchain as a repository for Provenance; that is, the Provenance Service is implemented using Blockchain technology. The most intriguing part of this model is that everyone within a community submits Provenance records in real time, every time they do something worthy of Provenance. This Provenance Blockchain would be a Public, Permissioned chain: viewable (usable) by anyone, but updated only by a defined set of permissioned entities. The Provenance record can be sufficiently opaque while still being effective:
  1. Rather than pointers, there is simply the hash of the data.
  2. All records of 'who' are organizational only; the organization is expected to keep internal records of the individual, device, service, or agent.
  3. Activity is recorded (create, update, transform, export, import, destroy)
  4. Blockchain validates the Organization (who) and the timestamp (when)

So That: When data are used, the user of the data can hash the data and look into the Blockchain for records of Provenance on that data.
A big advantage of this model is that data transfers never need to worry about what level of Provenance must be carried, and the pathway the data follow can span multiple hops, even through hostile actors. If the data are intact, then Provenance will be found. If Provenance is found, then integrity and authenticity can be proven.
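The four properties above, plus the hash-and-look-up step, can be illustrated with a toy, single-process Python sketch. It is only a stand-in: a real Public, Permissioned Blockchain would add distributed consensus and cryptographic identity for each Organization, none of which is modeled here. The class name and the SHA-256 choice are assumptions. Each entry holds only a data hash, an Organization, an activity, and a ledger-assigned timestamp, and links to the previous entry's hash so that editing history is detectable.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field
from typing import Dict, List, Set

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def entry_digest(body: Dict) -> str:
    # Canonical JSON so the same entry always hashes the same way
    return sha256_hex(json.dumps(body, sort_keys=True).encode("utf-8"))


@dataclass
class ProvenanceLedger:
    """Toy stand-in for a Public, Permissioned Provenance Blockchain (hypothetical)."""
    permissioned: Set[str]                         # Organizations allowed to write
    entries: List[Dict] = field(default_factory=list)

    def append(self, organization: str, activity: str, data: bytes) -> Dict:
        # Only permissioned Organizations may write; anyone may read
        if organization not in self.permissioned:
            raise PermissionError(f"{organization} is not permissioned")
        body = {
            "data_hash": sha256_hex(data),  # the hash, never a pointer or the data itself
            "who": organization,            # Organization only; individuals stay internal
            "activity": activity,           # create, update, transform, export, import, destroy
            "when": time.time(),            # ledger-assigned timestamp
            "prev": self.entries[-1]["entry_hash"] if self.entries else GENESIS,
        }
        body["entry_hash"] = entry_digest(body)
        self.entries.append(body)
        return body

    def lookup(self, data: bytes) -> List[Dict]:
        """Hash the data in hand and return every Provenance entry about it."""
        h = sha256_hex(data)
        return [e for e in self.entries if e["data_hash"] == h]

    def verify_chain(self) -> bool:
        """Recompute every link; any edited entry breaks the chain."""
        prev = GENESIS
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "entry_hash"}
            if e["prev"] != prev or entry_digest(body) != e["entry_hash"]:
                return False
            prev = e["entry_hash"]
        return True
```

An empty `lookup` result is ambiguous: the data may have been modified, or their author may simply not participate in the chain.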

Not finding Provenance may mean the data has been improperly modified, but may also just indicate a custodian/author that is not participating in that Provenance Blockchain. These false-positive and false-negative cases do need to be addressed.

This leverages the integrity and public aspects of Blockchain, while taking careful steps to not put individually identifiable data into the Blockchain.

What is not clear is how the Patient themselves participates. They can clearly be given read access to the Blockchain, and I would encourage this, as it gives them some ability to track where their data go. This only works for data they know about, since one must have the data to compute its hash. There would be no patient identifier in the Blockchain, so a patient could not see all activity. The open question is whether the Patient needs the ability to add Provenance evidence to the Provenance Blockchain. This does not question the Patient's ability to create data; they can. Rather, opening the chain to the Patient is opening it to EVERYONE on the internet, so there is a risk of 'bad guys' filling the Provenance Blockchain with crud. Note that I have the Blockchain validating the Organization, and being a Public but Permissioned chain.

State of Healthcare Provenance today

Today, some provenance is built into the Healthcare standards in use. Some of this Provenance is not obvious, so let me expose it.

HL7 v2 has the least provenance information built into common use of the specification. This is not to say there is no provenance, just not much. In theory, one knows the sender of the message, but because it is a message, this sender information is usually discarded.

HL7 CDA has a well-implemented header that holds Provenance. It isn't described as Provenance, but it is Provenance in that it describes: (a) who authored the document, (b) what organization is the custodian of the document, (c) when the document was authored, and (d) why the document was authored. Because a CDA document is a document, and not a transport, it does not include to whom it is being sent, or from where it is being sent. These are gaps overall, but gaps that one should expect the transport to fill.
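To show that this Provenance really sits in the header, here is a Python sketch using the standard library's `xml.etree.ElementTree` to pull the who, custodian, and when elements out of a CDA document. The sample is a trimmed, hypothetical header (not a schema-valid CDA document), the function name is illustrative, and real documents may identify the author by person or device rather than only by organization.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

NS = {"hl7": "urn:hl7-org:v3"}  # the CDA (HL7 v3) namespace

# Trimmed, hypothetical header; a real CDA document has many more required elements
SAMPLE = """<ClinicalDocument xmlns="urn:hl7-org:v3">
  <code code="34133-9" codeSystem="2.16.840.1.113883.6.1" displayName="Summary of episode note"/>
  <effectiveTime value="20190313120000-0500"/>
  <author>
    <time value="20190313115500-0500"/>
    <assignedAuthor>
      <id root="2.16.840.1.113883.19.5" extension="KP00017"/>
      <representedOrganization><name>Good Health Clinic</name></representedOrganization>
    </assignedAuthor>
  </author>
  <custodian>
    <assignedCustodian>
      <representedCustodianOrganization>
        <id root="2.16.840.1.113883.19.5"/>
        <name>Good Health Clinic</name>
      </representedCustodianOrganization>
    </assignedCustodian>
  </custodian>
</ClinicalDocument>"""


def header_provenance(xml_text: str) -> Dict[str, Optional[str]]:
    """Read who/custodian/when Provenance from a CDA header."""
    root = ET.fromstring(xml_text)

    def attr(path: str, name: str = "value") -> Optional[str]:
        el = root.find(path, NS)
        return None if el is None else el.get(name)

    def text(path: str) -> Optional[str]:
        el = root.find(path, NS)
        return None if el is None else el.text

    return {
        "document_time": attr("hl7:effectiveTime"),  # when the document was created
        "author_time": attr("hl7:author/hl7:time"),  # when it was authored
        "author_org": text("hl7:author/hl7:assignedAuthor/hl7:representedOrganization/hl7:name"),
        "custodian_org": text("hl7:custodian/hl7:assignedCustodian/"
                              "hl7:representedCustodianOrganization/hl7:name"),
        "document_type": attr("hl7:code", "displayName"),
    }
```

Here `header_provenance(SAMPLE)["custodian_org"]` returns `"Good Health Clinic"`. Nothing in the result says to whom or from where the document was sent; as noted, that is left to the transport.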

There is a CDA PROV specification, but it is not used today. This specification clarifies the basics, but also adds functionality for comprehensive Provenance within CDA.

IHE XDS/XCA/XDR/XDM/MHD is a family of document transports that are content agnostic. They can transport CDA, but also other content. With CDA, there is the well-defined basic Provenance described above. With other formats, the document itself is not self-declarative. Thus the XDS family defines metadata that explicitly carries these Provenance elements, along with elements for other purposes (descriptive, identity, life-cycle, privacy, security, and exchange).

This metadata model was inspired by the CDA Header, but abstracted so as to work with any content type (which now includes FHIR Documents), and many deployment models.
One Metadata Model - Many Deployment Architectures

Future on FHIR

FHIR has the most mature Provenance model. Not only does it have a Provenance Resource that can be used with any FHIR resource, but many Provenance data elements are also built into a core FHIR resource when that element is fundamental to that resource. See the fiveWs page for details on this.

This model is inspired by W3C PROV and by HL7's long history with Provenance. Thus it is intended to be comprehensive in function and ability.

FHIR Provenance Profile

There is a Profile of the FHIR Provenance resource covering the use-case where data elements are extracted from Documents shared in a Document Sharing (XDS/XCA/MHD) exchange, and those data are made available through Query for Existing Data (e.g. Observations, Medications, Conditions, etc.). This use-case supports someone who uses the FHIR API to access data and then wants the Provenance of the data they were given. That Provenance provides positive linkage to the Document(s) from which the data were extracted. See my webinars.
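A minimal Python sketch of the kind of resource this describes: an R4 Provenance whose `target` lists the extracted data and whose `entity` (role `source`) points back to the DocumentReference they came from. The resource ids and Organization are hypothetical, and this is not claimed to satisfy every constraint of the IHE profile; the profile itself defines the required elements and codes.

```python
import json
from datetime import datetime, timezone
from typing import Dict, List, Optional


def extraction_provenance(extracted: List[str], source_document: str,
                          organization: str,
                          recorded: Optional[datetime] = None) -> Dict:
    """Build a FHIR R4 Provenance linking extracted resources to their source document."""
    when = recorded or datetime.now(timezone.utc)
    return {
        "resourceType": "Provenance",
        # The extracted element-level resources (e.g. Observation/obs-1)
        "target": [{"reference": ref} for ref in extracted],
        # R4 'recorded' is an instant, so it must carry a timezone
        "recorded": when.isoformat(),
        # Who: at Basic Provenance granularity, an Organization
        "agent": [{"who": {"reference": organization}}],
        # Positive linkage back to the Document the data were extracted from
        "entity": [{"role": "source", "what": {"reference": source_document}}],
    }


prov = extraction_provenance(
    ["Observation/obs-1", "Condition/cond-7"],  # hypothetical ids
    "DocumentReference/doc-42",
    "Organization/org-a",
    datetime(2019, 3, 13, 17, 0, tzinfo=timezone.utc),
)
print(json.dumps(prov, indent=2))
```

One Provenance can target several extracted resources, so a single record covers everything pulled from the same Document.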

Provenance Service vs Provenance with the Data

The CDA, XDS, and FHIR models described above are all ways to carry Provenance with the data. This is historically what is done in Healthcare, when Provenance is done at all. The advantage of this model is that the Provenance is conveyed with the data, so it is available when the data are used.

However, this is not the only possible model.

A Provenance Service is a service responsible for supporting Provenance use-cases. When data are created, a record of that Provenance is submitted to the Provenance Service. When data are used, the service can be queried for evidence of Provenance on that data.

The FHIR Provenance resource could be managed in a standalone Service. A degenerate form of this Provenance Service in FHIR is where a FHIR Server that holds the clinical information also holds the Provenance. That is just a logical merging of the data custodian service with the Provenance Service.
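When one FHIR Server holds both the clinical data and the Provenance, a client can ask it for Provenance directly. The Python sketch below only builds the request URLs, using the standard R4 `target` search parameter on Provenance and the `_revinclude` parameter; the base URL and ids are hypothetical, and whether a given server supports `_revinclude` is up to that server.

```python
from urllib.parse import urlencode


def _url(base: str, path: str, params: dict) -> str:
    # Keep '/' and ':' unescaped for readability; servers decode either form
    return f"{base.rstrip('/')}/{path}?{urlencode(params, safe='/:')}"


def provenance_for(base: str, target: str) -> str:
    """Search for Provenance records whose target is the given resource."""
    return _url(base, "Provenance", {"target": target})


def with_provenance(base: str, resource_type: str, resource_id: str) -> str:
    """Fetch a resource plus, in the same Bundle, the Provenance that targets it."""
    return _url(base, resource_type,
                {"_id": resource_id, "_revinclude": "Provenance:target"})
```

For example, `provenance_for("https://fhir.example.org/r4", "Observation/obs-1")` yields `https://fhir.example.org/r4/Provenance?target=Observation/obs-1`. A standalone Provenance Service would answer the first form; the merged server can also answer the second.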

The XDS model could be seen as a Provenance Service, much like FHIR. One can always look up a document one holds in XDS to find its DocumentEntry. That DocumentEntry carries a hash and size of the document that one can confirm, and there might be a digital signature association too. If confirmed, one can look at the other elements in the DocumentEntry to see the Provenance of that document. Further, one can see whether the document has been replaced, transformed, or appended. This is not purely a Provenance Service, but the functionality does exist.
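The hash-and-size confirmation can be sketched in Python as follows. XDS defines DocumentEntry.hash as a hex-encoded SHA-1 of the document and DocumentEntry.size in bytes; the plain-dictionary form of the DocumentEntry and the function names are assumptions for illustration, standing in for whatever registry query API is actually used.

```python
import hashlib
from typing import Dict, Optional


def confirm_document(document: bytes, entry: Dict) -> bool:
    """Compare the document in hand against DocumentEntry.size and DocumentEntry.hash."""
    return (len(document) == int(entry["size"])
            and hashlib.sha1(document).hexdigest() == entry["hash"].lower())


def document_provenance(document: bytes, entry: Dict) -> Optional[Dict]:
    """Trust the DocumentEntry's Provenance elements only once the document is confirmed."""
    if not confirm_document(document, entry):
        return None
    # A few of the Provenance-bearing DocumentEntry attributes (not exhaustive)
    return {k: entry[k] for k in ("author", "creationTime") if k in entry}
```

The size check is cheap and catches truncation before hashing; the hash comparison is case-insensitive because hex encodings may differ in case.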

When is Provenance Created?

Simply whenever data are
  • created, 
  • updated, 
  • verified / authenticated, 
  • transformed / translated / derivedFrom, 
  • appended / amended, 
  • de-identified / re-identified,  
  • destroyed / deactivated / deleted,
  • exported / published / pushed,
  • imported
That is just about all actions other than Query and Read. Note that all actions, including Query and Read, are AuditEvent relevant. An Audit Log is not the same as Provenance: they are similar in what gets recorded and when a record is made, but the intended use and retention lifecycle of an AuditEvent differ from those of Provenance.
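The rule above (Provenance for everything except Query and Read, an AuditEvent for everything) can be written down as a small Python sketch. The enum names simply paraphrase the list; they are not taken from any code system, and the differing retention lifecycles are deliberately not modeled.

```python
from enum import Enum
from typing import Set


class DataAction(Enum):
    CREATE = "create"
    UPDATE = "update"
    VERIFY = "verify"            # verified / authenticated
    TRANSFORM = "transform"      # transformed / translated / derivedFrom
    AMEND = "amend"              # appended / amended
    DEIDENTIFY = "de-identify"   # de-identified / re-identified
    DESTROY = "destroy"          # destroyed / deactivated / deleted
    EXPORT = "export"            # exported / published / pushed
    IMPORT = "import"
    QUERY = "query"
    READ = "read"


NO_PROVENANCE = {DataAction.QUERY, DataAction.READ}


def records_for(action: DataAction) -> Set[str]:
    """Which records an action produces: every action is AuditEvent relevant,
    and every action except Query and Read also produces Provenance."""
    records = {"AuditEvent"}
    if action not in NO_PROVENANCE:
        records.add("Provenance")
    return records
```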

Having all of these records of Provenance is not valuable unless they are useful to those needing them.

When is Provenance needed?

The most important point about Provenance is that it is needed and used. A key part of use-case analysis is to look at how the data you are creating will be used. If everyone created exhaustive Provenance records, but no one used more than 1% of those records, that would be wasteful.

So, let's look at the Use-case for Basic Provenance.

Monday, March 11, 2019

IHE produces Profiles using FHIR R4 for core functionality

IHE has released updates of 5 Profiles that represent a basic API to health data. These are the subset of IHE profiles that leverage HL7® FHIR® Release 4. The remainder of the IHE profiles that leverage HL7® FHIR® are expected to be upgraded to FHIR® Release 4 later in 2019. IHE published the following updated supplements for trial implementation as of March 6, 2019:
  • IHE Appendix Z on HL7® FHIR® - Rev. 2.1
  • Mobile Access to Health Documents (MHD) with XDS on FHIR® - Rev. 3.1
    • DocumentReference (DocumentEntry)
    • DocumentManifest (SubmissionSet)
    • List (Folder)
    • Binary (the document)
  • Mobile Care Services Discovery (mCSD) - Rev. 2.1
    • Organization
    • Location
    • Practitioner
    • PractitionerRole
    • HealthcareService
  • Patient Demographics Query for Mobile (PDQm) - Rev. 2.1
    • Patient
  • Query for Existing Data for Mobile (QEDm) - Rev. 2.1
    • AllergyIntolerance
    • Condition
    • DiagnosticReport
    • Encounter
    • Immunization
    • Medication
    • MedicationRequest
    • MedicationStatement
    • Observation
    • Procedure
    • Provenance

This subset is consistent with the Argonaut and US-Core subset of FHIR, yet does not include US-specific constraints. The combination of these IHE profiles with US-Core could be a powerful focus of an IHE Connectathon "Projectathon". A Projectathon is when local constraints are added to IHE Connectathon testing.

Related to these 5 is the Mobile Cross-Enterprise Document Data Element Extraction (mXDE) profile, which works with these profiles to provide an added-value service.

The profile contained within the above document will be formally tested at the USA IHE Connectathon. The document is available for download at

I understand the target HHS/ONC regulations are 170.315(g)(7), 170.315(g)(10), and 170.315(g)(11), which ONC now calls the ARCH.