Friday, November 1, 2024

De-Identification as a Service

I have had some conversations lately around a De-Identification Service, specifically whether it is possible to define a general service that could be used like Actors within IHE. The problem that I have historically run into is that there is no standard for defining de-identification policy: the set of rules that would drive the de-identification process in a way that (a) protects against re-identification, and (b) provides sufficient detail in the resulting dataset for (c) a given purpose.

There are standards on the concept of De-Identification, and I have written articles on the process. Key to any discussion on De-Identification is to recognize that it is a process, it is not an algorithm. De-Identification is not like Encryption or Signatures, for which one can have a defined algorithm. This is because De-Identification is trying to balance opposing forces: the appropriate use of the data, which needs specific fidelity to the data, against the inappropriate re-identification of the subjects of the data, whose privacy must be protected.

IHE has defined a "De-Identification Handbook" that speaks to how to go about defining a De-Identification Policy, and addresses why this is something that is a process. This handbook helps you identify what parts of your data are direct identifiers and what are indirect identifiers. It identifies some common ways to change data during the de-identification process, such as redact, generalize, fuzz, replace, etc. The handbook also covers how to assess your dataset to see if your choice of policy is sufficient.
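To make those kinds of changes concrete, here is a minimal Python sketch of a per-field policy. Everything in it is an assumption for illustration: the field names, the action names, the shape of the POLICY dictionary, and the salted hash used for "replace" are all hypothetical, since no standard policy format exists.

```python
import hashlib
import random
from datetime import date

# Hypothetical policy: field name -> action. Not a standard format.
POLICY = {
    "name": "redact",
    "mrn": "replace",
    "birth_date": "generalize",
    "weight_kg": "fuzz",
    "diagnosis": "keep",
}

def deidentify(record: dict, policy: dict, secret: str, rng: random.Random) -> dict:
    """Apply the per-field action; fields not named in the policy are dropped."""
    out = {}
    for field, value in record.items():
        action = policy.get(field, "redact")
        if action == "keep":
            out[field] = value
        elif action == "generalize":
            out[field] = value.year  # date generalized to year only
        elif action == "fuzz":
            out[field] = round(value + rng.uniform(-2.0, 2.0), 1)  # small random offset
        elif action == "replace":
            # Salted hash as a stand-in pseudonym (an assumption, not a recommendation)
            digest = hashlib.sha256((secret + str(value)).encode("utf-8")).hexdigest()
            out[field] = digest[:12]
        # "redact" and any unrecognized action: the field is omitted
    return out

record = {
    "name": "Ann Example",
    "mrn": "123456",
    "birth_date": date(1980, 5, 17),
    "weight_kg": 70.0,
    "diagnosis": "J45",
    "phone": "555-0100",
}
result = deidentify(record, POLICY, "demo-secret", random.Random(0))
```

Defaulting unknown fields to redaction is a design choice in this sketch: a field nobody assessed is treated as an identifier until the policy says otherwise.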

I have a general orchestration diagram in my Security and Privacy Tutorial - http://bit.ly/FHIR-SecPriv 


This diagram is very abstract, presuming some kind of Query can be done by some Research Analytics App, and that it can be mediated by a De-Identification Service which, if the request is authorized and appropriate, can forward the request to a Resource Server. The Resource Server responds with the full fidelity data, and the De-Identification Service de-identifies the data before returning the results to the Research Analytics App. This generalization presumes a lot, including that the query can be mediated like this, and that the results can be de-identified in real time. Most De-Identification is done on a dataset, so that the resulting dataset can be analyzed to see that it has indeed met the goal of de-identification, often using a measure like K-Anonymity. The above could be done, but it is far more of a systems design task, and not as simple as shown.

I think it is more likely that De-Identification Service orchestration is on a PUSH or FEED of data. That is not to say that it might not be a Query, but rather that it is a BULK of data. So, for example, FHIR Bulk Data Access might work. So, for this let's take a generic push set of Actors and Transaction.
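As an illustration of why the assessment needs the whole dataset, here is a minimal Python sketch of a K-Anonymity check. The row layout and the choice of quasi-identifier columns (birth_year, zip3) are assumptions for the example.

```python
from collections import Counter

def k_anonymity(rows, quasi_identifiers):
    """Return k: the size of the smallest group of rows that share
    the same values for all of the quasi-identifier columns."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0

rows = [
    {"birth_year": 1980, "zip3": "532", "dx": "J45"},
    {"birth_year": 1980, "zip3": "532", "dx": "E11"},
    {"birth_year": 1975, "zip3": "537", "dx": "I10"},
]
k = k_anonymity(rows, ["birth_year", "zip3"])
```

Here the third row is alone in its group, so k is 1 and that subject is not hidden among others. A per-request, real-time service never sees the other rows, which is why this kind of check fits a bulk dataset better.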


In this diagram there is a data source and a data recipient and some standards-based transaction between them. 

We then insert our De-Identification Service in between by grouping a Data Recipient with our De-Identification Service and by also grouping a Data Source. Thus, the original two actors are still talking end-to-end, but they are talking to each other through an intermediary.



We then recognize that the de-identification policy needs to be available to the De-Identification Service and must be administered by some Policy Admin.


Unfortunately, I don't know of a Standard that exists for De-Identification Policy. So, these two actors can't really be defined. They need to be some functionality inside of the De-Identification Service.

So, this is the diagram I come up with. This is more than what I discussed above, as it starts with Document based sharing, and ends up with De-Identified FHIR REST queries. Thus, the data is fed into the De-Identification Service (MHD), but that De-Identification Service groups a bunch of other IHE profiles (mXDE) and ultimately provides access to the De-Identified data using FHIR REST (QEDm). This diagram does not abstract out the policy; it is part of the systems design.



I have used MHD and QEDm in this example. But because I simply grouped the peer Actors from those transactions within the De-Identification Server, the external view of the De-Identification Server is that it is using the MHD and QEDm standards; essentially, magic happens inside.

Something similar can be done with other standards. This is left as an exercise for my reader.


Wednesday, September 25, 2024

Is honoring a Patient's Consent a form of forbidden Information Blocking?

As I work hard to enable patients to express the privacy rules around how their health information can be used, by whom, and for what reasons, I hear a worry: an organization that honors those wishes by blocking the data for a given use may be seen as violating the regulations forbidding Information Blocking.

In HTI-2 there is some discussion around some sensitive data that has been expressed as being a special case. However, this is just one kind of data that is or might be considered sensitive by a patient.


My concern is wider than just the ONC HTI-2 and the USA Information Blocking regulations. There are other state level regulations that might force data to be shared in circumstances for which the patient does not want to share. This is not to say I am against some required reporting, but to recognize that there is a wider overlap between potential sensitive classes of data and unreasonable expectations to mandate data sharing.

I am a fan of defining classes of data that are sensitive, which are generally stigmatizing health topics. These defined classes need a specific and actionable definition, so that it is clear to all what is within that class and what is not. This is important to be sure policies work together when bridged. The reality is that these classes are not as distinct as we would like, but today the classes are hardly even given names.

One class that is discussed is sexual health topics; which seems clear but is not clear at the detail and technical level. 

The Patient should be empowered to define what is sensitive to them. The use of sensitive classes of data should be a starting point, but the patient should also be allowed to restrict data within a timeframe, or data associated with a specific treatment episode/encounter, or even to identify specific data by identifier.

When these complex Consents can be implemented by an organization, and that organization allows more refined Consent provisions; then these restrictions should not be seen as a forbidden Information Blocking. We should not be questioning the patient's choices.

Tuesday, September 24, 2024

Healthcare AI - Provenance of AI outputs

AI is the focus of the HL7 Workgroup Plus meeting this week. As I sit in on the presentations, I find that there are some efforts that the Security WG has already put in place that are not understood. So this article will expose some of the things that Security WG has already put in place to support AI.

AI Output Provenance

First up is a concern that any diagnosis, notes, observations, or other content that is created by AI, or assisted by AI, should be tagged as such. With this provenance, any downstream use of the data or decisions is informed that the data came from an AI output.

An important aspect of this is to understand the background of the data, the Provenance. This might be a positive aspect, or might be seen as a drawback. The Security WG is not trying to impugn or promote; we are just wanting to provide the way for the data or decision to be tagged appropriately.

There are two methods.

Provenance Tag

There is a data tag that can be applied to any data to indicate that it came from AI.

AIAST - Artificial Intelligence asserted  --- Security provenance metadata observation value used to indicate that an IT resource (data, or information object) was asserted by a Artificial Intelligence (e.g. Clinical Decision Support, Machine Learning, Algorithm).

This might appear at the top of the FHIR Resource in the .meta.security element:

         {
           "resourceType" : "Condition",
           "id" : "1",
           "meta" : {
              "security" : [{
                "system" : "http://terminology.hl7.org/CodeSystem/v3-ObservationValue",
                "code" : "AIAST" }
                ]
              },
           ... other content etc.....
         }
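A minimal Python sketch of applying that tag to a resource held as a plain dictionary. The helper name add_security_tag is hypothetical; only the system URL and the AIAST code come from the example above.

```python
AIAST = {
    "system": "http://terminology.hl7.org/CodeSystem/v3-ObservationValue",
    "code": "AIAST",
}

def add_security_tag(resource: dict, tag: dict) -> dict:
    """Add a tag to .meta.security, creating the elements if needed,
    and avoiding a duplicate if the tag is already present."""
    security = resource.setdefault("meta", {}).setdefault("security", [])
    already = any(
        t.get("system") == tag["system"] and t.get("code") == tag["code"]
        for t in security
    )
    if not already:
        security.append(dict(tag))
    return resource

condition = {"resourceType": "Condition", "id": "1"}
add_security_tag(condition, AIAST)
add_security_tag(condition, AIAST)  # second call changes nothing
```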
 

The code can also be applied at the element level, using the inline security labels defined in DS4P. This would cover a DiagnosticReport that has one .note element that is the output of an AI analysis of the data. The DiagnosticReport would indicate that there is an inline label, and just that one .note would be tagged as being AI Asserted.

Non-FHIR - The AIAST code is available for use elsewhere, such as in HL7 v2, CDA, DICOM, and IHE-XDS. As a code it is very portable. These other standards include ways of carrying security tags, and thus this AIAST code.

Provenance Resource


The Provenance resource would be used when more than the tag is needed. This Provenance would take advantage of the AIAST tag, to indicate that the purpose of this Provenance is to indicate details about the AI Assertion.

The above Provenance Tag might still be useful, with the Provenance Resource providing the details of the provenance of that assertion.

The Provenance Resource might also use the target element extension or target path extension to point at the specific elements of the target resource that came from AI Assertions.

The Provenance Resource can also indicate the specific AI algorithm using a Device resource. In this way one can understand the revision of the AI that was used. If a problem (such as bias) is later found with that version of the AI model, one can find all the decisions that were recorded from it. This might also include parameters and context around the use of the AI algorithm.

The Provenance Resource can indicate the data from the patient chart that were considered by the AI algorithm.

The Provenance can also indicate other traceability, such as what portion of the AI model was used.

As with any Provenance, the other elements can be filled out to provide details on when, why, where.
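A rough Python sketch of what such a Provenance might look like, built as a plain dictionary. It assumes the FHIR R4 element names (target, recorded, agent.who, entity.role, entity.what); the helper name, the example references, and the choice to carry AIAST in the Provenance .meta.security are assumptions of this sketch, not a defined profile. The target element and target path extensions are left out.

```python
import json

AIAST = {
    "system": "http://terminology.hl7.org/CodeSystem/v3-ObservationValue",
    "code": "AIAST",
}

def ai_provenance(target_ref, device_ref, source_refs, recorded):
    """Build a Provenance saying: target was asserted by the AI at device_ref,
    having considered the chart data listed in source_refs."""
    return {
        "resourceType": "Provenance",
        "meta": {"security": [dict(AIAST)]},  # assumption: tag carried here
        "target": [{"reference": target_ref}],
        "recorded": recorded,
        "agent": [{"who": {"reference": device_ref}}],
        "entity": [
            {"role": "source", "what": {"reference": ref}} for ref in source_refs
        ],
    }

provenance = ai_provenance(
    target_ref="Condition/1",
    device_ref="Device/example-ai-model-v2",  # hypothetical Device for the AI version
    source_refs=["Observation/10", "Observation/11"],
    recorded="2024-09-24T10:00:00Z",
)
provenance_json = json.dumps(provenance, indent=2)
```

Because the agent points at a Device, a search for all Provenance with that agent would find the decisions recorded from that version of the model.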

AI use of Provenance

AI will often look at a patient record to determine a NEW diagnosis or write a new note. These interactions by AI should be aware of data that has the AIAST tag, so that the AI can distinguish data that was newly entered from data that was derived by previous AI use. AI consuming its own prior outputs is often referred to as “model collapse” or a “feedback loop.” One possibility is that AI will ignore any data or data elements previously authored by AI.
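That last possibility can be sketched in Python as a filter over the record before it is handed to the AI. The function names are hypothetical, and this sketch only looks at resource-level tags in .meta.security; element-level (inline) labels would need additional handling.

```python
AIAST_SYSTEM = "http://terminology.hl7.org/CodeSystem/v3-ObservationValue"

def is_ai_asserted(resource: dict) -> bool:
    """True if the resource carries the AIAST tag in .meta.security."""
    return any(
        tag.get("system") == AIAST_SYSTEM and tag.get("code") == "AIAST"
        for tag in resource.get("meta", {}).get("security", [])
    )

def exclude_ai_asserted(resources: list) -> list:
    """Keep only resources that are not tagged as AI asserted."""
    return [r for r in resources if not is_ai_asserted(r)]

chart = [
    {"resourceType": "Observation", "id": "10"},
    {
        "resourceType": "Condition",
        "id": "1",
        "meta": {"security": [{"system": AIAST_SYSTEM, "code": "AIAST"}]},
    },
]
model_input = exclude_ai_asserted(chart)
```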

Tuesday, September 3, 2024

Speaking at free #HL7 #FHIR #HealthIT #Cybersecurity Event


Excited to announce that I'll be speaking at the HL7 FHIR Security Education Event on September 4-5! This virtual event is packed with insights and discussions tailored for everyone in the health IT community.

Two Tracks to Choose From:

  1. General Track: Perfect for those looking to deepen their understanding of FHIR security without getting too technical.
  2. Developer Track: Designed for health IT architects, developers and engineers who want to dive into the details.

Join me and other experts as we explore the latest in FHIR security. Don’t miss out on this opportunity to enhance your knowledge and network with fellow professionals!

Register free at: https://info.hl7.org/hl7-fhir-security-education-event-0

#FHIR #HL7 #HealthIT #Cybersecurity #FHIRSecurity

Friday, August 23, 2024

Simple definition of ABAC and #FHIR

ABAC: data has "attributes" (elements), that may be summarized into "sensitivity tags" (by an SLS) that are also attributes. Policies indicate "classifications" of data that indicate how data are to be protected based on some attributes (may be sensitivity tags, but can be any attributes). Policies indicate what "clearance" (aka roles) have access to each data "classification". Users are grouped into "clearances" (aka roles); this might be a FHIR PractitionerRole, CareTeam, RelatedPerson, or Group; but might be something non-FHIR (e.g. OAuth, LDAP, etc.).


Thus: 

  • users have one or more "clearance"
  • data have one or more "classification"
  • access is granted if "clearance" permits "classification" (often said to be clearance==classification)

Note key ABAC words are quoted above: "attributes", "classification", "clearance" are the most important.
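The three bullets above can be sketched in a few lines of Python. The clearance and classification names and the POLICY table are hypothetical, and requiring every classification on the data to be permitted is an assumption of this sketch (a conservative one).

```python
# Hypothetical policy: clearance -> classifications that clearance may access
POLICY = {
    "clinician": {"normal", "restricted"},
    "billing": {"normal"},
}

def permitted(user_clearances, data_classifications, policy) -> bool:
    """Grant access only if the user's clearances together permit
    every classification on the data."""
    allowed = set()
    for clearance in user_clearances:
        allowed |= policy.get(clearance, set())
    # Note: data with no classification passes this check; a real
    # policy would need to decide that case explicitly.
    return set(data_classifications) <= allowed

billing_sees_restricted = permitted({"billing"}, {"restricted"}, POLICY)
clinician_sees_restricted = permitted({"clinician"}, {"normal", "restricted"}, POLICY)
```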

Now, that is just formal.... adjustments can be made for complexity of policy or simplicity of policy or risk addressing policy...

Must security tags be used?

No, ABAC is based on Attributes. So any attribute can be used.

A good example is Observation.category code of 'vital-signs' -- indicates vital signs that are normal health information of no stigmatizing sensitivity. No real need to dig deeper (maybe).

Some ABAC rules can't be implemented with security tags. For example rules related to the author, or rules related to a timeframe, etc. These would address these attributes (elements) in the data.

Then why use security tags?

Using security tags, and a security labeling service, allows for the Access Control implementation to be less aware of the data structure. Meaning that the Security Labeling Service is where all the knowledge of the data model and information model exists. The SLS must understand FHIR. The SLS must understand medical knowledge, and the relationships between the complexity of medical knowledge. The SLS boils all that down to a set of codes and places those codes into a common place in all the FHIR resources, the .meta.security element.

Thus the Access Control decision and enforcement need only look at that one element. There is no need to understand that Observation.code is an important attribute.

Thus the above 'vital-signs' rule would be in the SLS, not needed to be implemented anywhere else.
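A minimal Python sketch of that split. The labeling function knows about Observation.category and Observation.code; the access check looks only at .meta.security. The sensitive code list is a placeholder, and using the v3-Confidentiality codes N (normal) and R (restricted) as the output of the SLS is an assumption of this sketch.

```python
CONFIDENTIALITY = "http://terminology.hl7.org/CodeSystem/v3-Confidentiality"
SENSITIVE_CODES = {"example-sensitive-code"}  # placeholder, not a real value set

def _codes(concepts):
    """Collect the codes from a list of CodeableConcept dictionaries."""
    return {c.get("code") for concept in concepts for c in concept.get("coding", [])}

def label(resource: dict) -> dict:
    """SLS side: all knowledge of the data model lives here."""
    if "vital-signs" in _codes(resource.get("category", [])):
        level = "N"  # vital signs: no need to dig deeper
    elif _codes([resource.get("code", {})]) & SENSITIVE_CODES:
        level = "R"
    else:
        level = "N"
    security = resource.setdefault("meta", {}).setdefault("security", [])
    security.append({"system": CONFIDENTIALITY, "code": level})
    return resource

def may_access(resource: dict, cleared_codes: set) -> bool:
    """Access control side: looks only at .meta.security."""
    tags = {
        t["code"]
        for t in resource.get("meta", {}).get("security", [])
        if t.get("system") == CONFIDENTIALITY
    }
    return bool(tags) and tags <= cleared_codes

vitals = label({
    "resourceType": "Observation",
    "category": [{"coding": [{"code": "vital-signs"}]}],
    "code": {"coding": [{"code": "8867-4"}]},
})
sensitive = label({
    "resourceType": "Observation",
    "code": {"coding": [{"code": "example-sensitive-code"}]},
})
```

In this sketch may_access denies unlabeled data, so anything the SLS has not assessed stays closed.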

Does the patient tag the data?

It is possible for the data to be tagged by the patient, however this is not all that popular of a way to implement the need for patients to be able to identify sensitive data. Better for the Patient's Consent to list out the identifier of those resources that they consider sensitive and thus an explicit rule would exist in the FHIR Consent.provision covering these data. This has the added benefit that the data do not get changed when the patient decides they are sensitive or decides later they are not sensitive. Thus the data are always only Created or Updated by the custodian of the data.
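A small Python sketch of that approach, assuming the FHIR R4 shape of Consent.provision (a base provision with nested provisions, each with .type and .data.reference). The resource ids and the function name are hypothetical.

```python
consent = {
    "resourceType": "Consent",
    "provision": {
        "type": "permit",
        "provision": [
            {
                "type": "deny",
                "data": [
                    {"meaning": "instance", "reference": {"reference": "Observation/42"}}
                ],
            }
        ],
    },
}

def denied_by_consent(consent: dict, resource_ref: str) -> bool:
    """True if a nested deny provision lists this resource explicitly."""
    for nested in consent.get("provision", {}).get("provision", []):
        if nested.get("type") != "deny":
            continue
        for item in nested.get("data", []):
            if item.get("reference", {}).get("reference") == resource_ref:
                return True
    return False

blocked = denied_by_consent(consent, "Observation/42")
not_blocked = denied_by_consent(consent, "Observation/43")
```

The Observation itself is never touched. If the patient later changes their mind, only the Consent is updated.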

Does the clinician tag the data?

It is possible for the data to be tagged by the clinician (Practitioner). This is typical in the Military Secret workflows, but has been shown to be not workable with clinicians. Thus this idea is generally not accepted as a way for the tags to get set.

Do data security tags change over time?

Anything is possible, but the assessment of the data should be purely about the data. That assessment should not be based on how the data are to be protected or made available. Thus a piece of data that is sensitive to "gender issues" will always be about gender issues and not change.

The one thing to consider is that medical knowledge does change. There was a time when a specific drug was prescribed only for its original, non-sensitive reasons; but we then learned that the drug is also helpful for addressing drug addiction. Thus getting that medication would now be sensitive when it was not before. So when medical knowledge changes, the data may need to be reassessed.


Here are some of my previous articles on Access Control


Wednesday, August 14, 2024

FHIR Security Labels and ABAC

I am rather excited that I have been asked about FHIR Security Labels lately by people getting started at implementing. I have tried to find out who has implemented this, but it is a security/privacy topic and thus everyone wants to be covert about it. Thus, I can't tell how widely it has been implemented. 

The concept is founded in Attribute Based Access Control (ABAC) that is a common IT access control standard that is especially important in data domains with sensitive information like healthcare, finance, military, etc. I would recommend looking at the generic ABAC details and implementations first. This is foundational to what we have put into FHIR.

The main useful publications are:

  • https://build.fhir.org/security-labels.html -- The FHIR Specification has the core of a security labeling and ABAC built into FHIR Resource model, and the vocabulary and explainer are on this page.
  • https://hl7.org/fhir/uv/security-label-ds4p/ -- The Data Segmentation for Privacy (DS4P) is an Implementation Guide that further explains how to use this, and adds some extra capabilities that are far more advanced than any system will need for a long time.
  • https://profiles.ihe.net/ITI/PCF/index.html -- The Privacy Consent on FHIR (PCF) is an Implementation Guide that explains Privacy Consent profiling, and has a section on Security Labeling (in Appendix P) and profiles of Consent for use with data labeling.
  • https://www.drummondgroup.com/shift/ -- An organization that I participate in, which is trying to advance the state of the art of Privacy protection using security labels. This group spans technology to policy, with a much larger focus on the policy part that HL7 and IHE can't specify.
The co-chair of CBCP - Mohammad Jafari - has been developing an open-source implementation. He has also worked on all the above with me, and demonstrated various implementation prototypes many times over the years.
I have a few blog articles, but most of that content has made it into the above publications.


Monday, August 12, 2024

FHIR Digital Signatures

There is a desire among FHIR leadership to have the FHIR Data Type "Signature" become normative in FHIR R6. The ballots leading to FHIR R6 will give us a chance to test whether the community considers this Data Type ready to be called Normative. However, to date it has not received much attention.

The FHIR Signature Datatype is less concerning than all of Digital Signatures. That is to say that what would be declared normative in the FHIR Signature Datatype is the FHIR structure. The actual digital signature is a blob that is ruled by other standards such as XML-Signature and JSON Signature. This makes the FHIR Signature Datatype not all that risky to make normative.

The FHIR Signature Datatype just exposes, in an easy-to-process FHIR structure, some of the important elements of a signature. These elements are expressed as copies for convenience, and thus if you must trust these values, you must process the digital signature blob and pull the values from within that signature blob. This is because the Signature Datatype is not cryptographically protected, but the Digital Signature blob is.

Electronic Signature

If you don't need the protection provided by a Digital Signature, but only need an Electronic Signature, then the FHIR Signature Datatype is all that you need. In this case you would not have a Digital Signature blob. You would be trusting your infrastructure, and the Signature datatype carries:
  • What does the Signature mean
  • When was the Signature applied
  • Who Signed
  • Who was the signer signing on behalf of (delegated signature)
An Electronic Signature can be considered a legal signature in many jurisdictions and for many purposes. An Electronic Signature trusts the infrastructure, but is still important as it provides for tracking the act of signing in a standardized way.

An addition to the above simplified Electronic Signature could be some kind of image of an ink-on-paper signature or equivalent (as is common on kiosks asking for a scribbled signature on the keypad). This would be recorded in the Signature.data (aka blob), but the mime-type would indicate that it is a JPEG or PDF. Thus it is not cryptographically proven, just a rendering.
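A minimal Python sketch of an Electronic Signature expressed with the Signature datatype, using the FHIR R4 element names (type, when, who, onBehalfOf). The function name and the example references are hypothetical; the type code shown is the "Author's Signature" code from the signature type code system.

```python
from datetime import datetime, timezone

def electronic_signature(who_ref: str, on_behalf_of_ref: str = None) -> dict:
    """Build a Signature with no .data blob: the infrastructure is trusted."""
    signature = {
        # What does the Signature mean
        "type": [{
            "system": "urn:iso-astm:E1762-95:2013",
            "code": "1.2.840.10065.1.12.1.1",
            "display": "Author's Signature",
        }],
        # When was the Signature applied
        "when": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        # Who Signed
        "who": {"reference": who_ref},
    }
    if on_behalf_of_ref is not None:
        # Who the signer was signing on behalf of (delegated signature)
        signature["onBehalfOf"] = {"reference": on_behalf_of_ref}
    return signature

sig = electronic_signature("Practitioner/example", "Organization/example")
```

For the scribbled-signature case one would add .sigFormat (for example image/jpeg) and the base64 image in .data; that is left out here.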

Digital Signature

Digital Signatures add a standards based cryptographic proof. Thus the technology does not need to be trusted, and does not need to be the same technology throughout the process. Cryptographic signatures use a Cryptographic Signature standard such as XML-Signature or JSON-Signature; to create a mathematical proof of the content at the time of signature, that can be validated at the time of use of the content.

Critical to a Digital Signature success:
  • Agreed Key Management
  • Agreed signature standard
  • Agreed timesource or timestamp signature use
  • Agreed encoding of the FHIR content that is signed (could be both forms if you need that)
  • Agreed elements of the FHIR content that must not change (and thus what elements are allowed to change) -- aka canonicalization (see later)
  • etc.
I'm not going to cover all of these. Just some of these that might be able to be nailed down by FHIR standard or by Implementation Guides that are purpose specific and/or regional specific.

Digital Signature Standard used

There are some profiles of XML-Signature and profiles of JSON Signature directly below the FHIR Signature Datatype. These are based on standards that are more broadly used than FHIR, so we have some confidence that they are good standards to recommend. These do emphasize the "long-term" need for the Digital Signature, specifically recognizing that there may be months or years between the signing event and when that signature will need to be validated. When there is a "long-term" need, there are more requirements. With short-term, one can presume that the validator has the same kind of environment (such as time, revocation checking, PKI access) as the signer. The use of short-term or long-term is a profiling possibility.

Canonicalization

Canonicalization is a very important part of Digital Signatures. The canonicalization algorithm assures that the validation is looking at the same elements in the same order with the same encoding as the signer used. The concept of canonicalization is more mature with XML, but is understood in JSON too.

Within the FHIR Signature Datatype section we do point at some canonicalization rules that have been defined. Among these there are canonicalizations for everything, for the mostly static content, for just the narrative, etc.

Use-case specific Canonicalization

An important part of selecting a canonicalization algorithm is tied to your use-case. Specifically, what should be allowed to change over time, while still proving that what the signer intended is preserved. An example given on a Zulip thread is Medication Prescription. What is prescribed is only a subset of the elements of the MedicationRequest resource, as the MedicationRequest will be embellished over time to follow the prescription path and workflow. For example, when the prescription is written, the prescriber is only intending it as a prescription, and thus the MedicationRequest.status is active, yet when the MedicationRequest is exhausted it is marked completed. This status is not important to the prescription signature proof, so it should be excluded. So, this is a good example of a need for an Implementation Guide to cover the prescription digital signature workflow, and define a canonicalization algorithm.

The signature blob would indicate the canonicalization algorithm used, so the validator can be checking properly. However, this means that the validator must agree with the use of that canonicalization algorithm, signature purpose, signing time, and signer.
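To illustrate the idea (not any of the defined canonicalization rules), here is a Python sketch of a use-case specific canonicalization for the prescription example. The list of excluded elements is hypothetical, and sorted-key compact JSON is only a stand-in for a real canonicalization algorithm. The digest is what a real signature would then be computed over.

```python
import hashlib
import json

# Hypothetical choice of elements that may change without breaking the signature
EXCLUDED = ("id", "meta", "status")

def canonical_bytes(resource: dict, excluded=EXCLUDED) -> bytes:
    """Drop the excluded top-level elements, then serialize with a fixed
    key order and no insignificant whitespace."""
    kept = {k: v for k, v in resource.items() if k not in excluded}
    text = json.dumps(kept, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")

def digest(resource: dict) -> str:
    return hashlib.sha256(canonical_bytes(resource)).hexdigest()

written = {
    "resourceType": "MedicationRequest",
    "status": "active",
    "intent": "order",
    "medicationCodeableConcept": {"text": "Example drug 10 mg"},
}
dispensed = dict(written, status="completed")  # workflow change only
altered = dict(written, medicationCodeableConcept={"text": "Example drug 100 mg"})
```

The written and dispensed versions produce the same digest, so a signature over it still validates after the status change, while the altered version does not.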

Note that the signer and the signature-validator do need to agree on what form (json/xml) will be signed, and what canonicalization is needed. We do have the Signature datatype able to carry many signatures, for those environments that want to force a signer to sign many ways.

Signature Chaining with Provenance

Any exclusion from the signature is a potential problem. The whole resource should be signed. This can be done with some infrastructure. First, your server would need to be preserving history (versioning), thus the original signed resource is known not just by the id, but also the version.

Later, when the medication status changes from "active" to "completed", a new version of the MedicationRequest is created, AND a new Provenance will be recorded for that change. This new Provenance expresses who/what/where/when/why that change was made. This new Provenance can state that prior to the change the signature was validated, and after the change was made this is the new signature.

How do you do this? You do it in the digital signature object itself so that there is cryptographic proof. In this way you are using digital signature standards to do what digital signature standards are designed to do. Thus, the Provenance.signature blob on an update covers both the original and the updated version.

You just need a policy for how the signature is derived when an UPDATE happens, vs when a CREATE happens. This is that policy that the signer and validator need to agree upon. The cryptographic proof is solid.

This method of using resource versioning, and Provenance signature transition proofs will work for any change. Even those pesky maintenance ones... provided the validator agrees that the maintenance signatures are acceptable... proving yet again that the validator must check everything. In this case, they must check all the Provenance.signature going back to the original, one by one.
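The one-by-one check can be sketched in Python. This is only an illustration of the chaining logic: HMAC-SHA256 with a shared key stands in for a real digital signature (which would use XML-Signature or JSON Signature with proper key management), and the function names and the rule for how an update signature is derived are assumptions.

```python
import hashlib
import hmac
import json

def _digest(resource: dict) -> bytes:
    text = json.dumps(resource, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(text.encode("utf-8")).digest()

def sign_transition(key: bytes, previous, updated: dict) -> str:
    """CREATE: covers the new version only (previous is None).
    UPDATE: covers both the previous version and the new version."""
    covered = (_digest(previous) if previous is not None else b"") + _digest(updated)
    return hmac.new(key, covered, hashlib.sha256).hexdigest()

def verify_chain(key: bytes, versions: list, signatures: list) -> bool:
    """Check every transition signature, from the original forward."""
    if len(versions) != len(signatures):
        return False
    previous = None
    for version, signature in zip(versions, signatures):
        expected = sign_transition(key, previous, version)
        if not hmac.compare_digest(expected, signature):
            return False
        previous = version
    return True

key = b"demo-key"  # stand-in for real key material
v1 = {"resourceType": "MedicationRequest", "status": "active", "intent": "order"}
v2 = dict(v1, status="completed")
signatures = [sign_transition(key, None, v1), sign_transition(key, v1, v2)]
chain_ok = verify_chain(key, [v1, v2], signatures)
```

Changing any earlier version breaks every signature from that point on, which is why the validator has to walk the whole history.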

Conclusion

The FHIR Signature Datatype is likely good enough to go into Normative when FHIR R6 happens. But I am sure there is still plenty of work to do on the Digital Signature front. What standard, what encoding, what canonicalization, what timestamp, etc. I think the important next steps are some high-value use-case specific Implementation Guides. I am not confident that there is any easy generic solution.
