Thursday, October 19, 2023

Teaching an AI/ML/LLM should be a distinct PurposeOfUse

I have been thinking about a specific need around AI/ML: when data are being requested/downloaded with the intent of feeding them to Machine Learning training, this action should be distinguished from a request for Treatment.

This came up on a TEFCA/QTE call this week, where a question was posed as to how a patient could express that they wanted to forbid their data from being used to teach Machine Learning.

This use-case requires the ability described above: recognizing when a data request could result in the data being used for Machine Learning. Note that data requests are encouraged to include ALL purposeOfUse values for which the data would be used. So in the USA, this would include Treatment, Payment, and Operations. (Note that in the existing nationwide health exchange many participants are known to be unable to handle more than one PurposeOfUse value, and thus in that exchange Treatment is presumed to mean TPO. I don't like this, but reality is often less than perfect.)
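A minimal sketch of a multi-valued request, assuming a request is a plain Python dict and PurposeOfUse values are HL7 v3 ActReason codes (TREAT, HPAYMT, HOPERAT). The `purpose_of_use` field and both helper functions are hypothetical; the `legacy_treat_means_tpo` flag only mimics the "Treatment is presumed to be TPO" convention of that exchange.

```python
# Sketch only: the request shape and helper names are hypothetical.
# PurposeOfUse codes come from the HL7 v3 ActReason code system.
ACT_REASON = "http://terminology.hl7.org/CodeSystem/v3-ActReason"
TPO = ("TREAT", "HPAYMT", "HOPERAT")


def build_request(patient_id: str, purposes: list[str]) -> dict:
    """Build a data request listing ALL intended PurposeOfUse values."""
    return {
        "patient": patient_id,
        "purpose_of_use": [{"system": ACT_REASON, "code": c} for c in purposes],
    }


def effective_purposes(request: dict, legacy_treat_means_tpo: bool = False) -> set[str]:
    """Return the purposes a responder should assume apply to the request.

    With legacy_treat_means_tpo=True, a request carrying only TREAT is
    read as Treatment, Payment, and Operations.
    """
    codes = {p["code"] for p in request["purpose_of_use"]}
    if legacy_treat_means_tpo and codes == {"TREAT"}:
        return set(TPO)
    return codes


full = build_request("example-patient", list(TPO))
legacy = build_request("example-patient", ["TREAT"])
print(sorted(effective_purposes(full)))
print(sorted(effective_purposes(legacy, legacy_treat_means_tpo=True)))
```

The design point is that the honest request carries all three values; the legacy expansion is a workaround, not something to build new systems around.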

Thus, I think we need a specific PurposeOfUse to indicate that the requested data is intended to be used for Machine Learning. I think that this PurposeOfUse would logically be a sub-concept of the existing Healthcare Operations. I argue this because it clearly is not about Treatment or Payment. That is not to say that the resulting algorithms won't be used for Treatment or Payment, but the reason to ask for and get data at this point in the data flow is to feed the Machine Learning. It might be argued that the Machine Learning Training PurposeOfUse should be a new top-level PurposeOfUse, but I don't think that is correct either, as much of the data captured today is already presumed to be available for Machine Learning (best practice is that it is consumed in de-identified form, but this topic is not about de-identification).
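To illustrate the sub-concept argument, here is a sketch of subsumption over a tiny PurposeOfUse hierarchy. `MLTRAIN` is a hypothetical placeholder code (no such code is assumed to exist; its name is exactly what needs consensus), and the parent table is a simplified fragment, not the real ActReason hierarchy.

```python
# Sketch only: "MLTRAIN" is a HYPOTHETICAL code, and this parent table is a
# simplified fragment rather than the actual ActReason hierarchy.
PARENT = {
    "TREAT": None,
    "HPAYMT": None,
    "HOPERAT": None,
    "MLTRAIN": "HOPERAT",  # proposed: ML training as a sub-concept of Operations
}


def is_a(code: str, ancestor: str) -> bool:
    """True if `code` equals `ancestor` or is a descendant of it."""
    current = code
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT.get(current)  # unknown codes have no parent
    return False


print(is_a("MLTRAIN", "HOPERAT"))
print(is_a("MLTRAIN", "TREAT"))
```

With this placement, a policy that permits HOPERAT also covers the Machine Learning sub-concept by subsumption unless the policy explicitly carves it out.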

It is possible that we might also need a new Obligation/Refrain code (thanks to Kathleen for pointing this out). Data could then be communicated with an attached Obligation not to use it for Machine Learning Training (this seems more like a refrain). I don't mind adding this code, but at this time Obligation/Refrain codes are not used, whereas PurposeOfUse is emerging into use.
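As a sketch of how such a refrain might travel with the data, assuming it is carried as a FHIR `meta.security` label: `NOMLTRAIN` is a hypothetical code invented for illustration, placed in the HL7 v3 ActCode system only because that is where the existing refrain codes are defined. The helper names are also hypothetical.

```python
# Sketch only: "NOMLTRAIN" is a HYPOTHETICAL refrain code.
ACT_CODE = "http://terminology.hl7.org/CodeSystem/v3-ActCode"
REFRAIN_CODE = "NOMLTRAIN"


def add_refrain(resource: dict) -> dict:
    """Sender side: attach the refrain to the resource's meta.security."""
    labels = resource.setdefault("meta", {}).setdefault("security", [])
    if usable_for_ml_training(resource):
        labels.append({"system": ACT_CODE, "code": REFRAIN_CODE})
    return resource


def usable_for_ml_training(resource: dict) -> bool:
    """Recipient side: check before feeding a resource into training."""
    labels = resource.get("meta", {}).get("security", [])
    return not any(
        label.get("system") == ACT_CODE and label.get("code") == REFRAIN_CODE
        for label in labels
    )


obs = {"resourceType": "Observation", "id": "example"}
print(usable_for_ml_training(obs))
add_refrain(obs)
print(usable_for_ml_training(obs))
```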

So a PurposeOfUse code specific to Machine Learning:
  1. can be used in a response (bundle) to indicate positively the intended purposeOfUse allowed
  2. can be used in a request to indicate desired purposeOfUse -- which could be rejected if the responder disagrees
  3. can be expressed in a security token to indicate authorized PurposeOfUse
  4. can be used in policy rules to indicate permit/deny of that specific policy. In this way a data-use-agreement could state that the high level operations purpose of use is intended to enable all sub-concepts; and it could be used to indicate that the high level operations purpose of use is intended to ONLY speak to some sub-concepts such as eliminating the Machine Learning as being allowed or requested.
  5. can be used in a Consent, where allowed, to allow an individual patient to express rules specific to that purposeOfUse.
  6. can be placed on a dataset that has been properly gathered with that purposeOfUse
  7. can be placed on a data item within a dataset to indicate that the data has been properly gathered with that purposeOfUse 
    • Note: tagging the dataset is more common, as replicating the tag millions of times over at the data resource level does not add value. I include this option because a dataset might be a mixture of some data that was collected with authorization and some that was not; this would require tagging each data resource.
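A sketch of items 6 and 7 together, assuming a FHIR Bundle as the dataset and PurposeOfUse carried in `meta.security`. `MLTRAIN` is again a hypothetical code, and the rule that an item with no PurposeOfUse tags inherits the dataset's tags is an assumption made for illustration.

```python
# Sketch only: "MLTRAIN" is HYPOTHETICAL, and the inheritance rule
# (untagged items take the dataset's tags) is an assumption.
ACT_REASON = "http://terminology.hl7.org/CodeSystem/v3-ActReason"


def purpose_tags(meta: dict) -> set[str]:
    """Collect PurposeOfUse codes from a meta.security list."""
    return {
        s["code"] for s in meta.get("security", []) if s.get("system") == ACT_REASON
    }


def items_gathered_for(bundle: dict, purpose: str) -> list[dict]:
    """Return the resources in a Bundle that were gathered with `purpose`."""
    dataset_purposes = purpose_tags(bundle.get("meta", {}))
    selected = []
    for entry in bundle.get("entry", []):
        resource = entry["resource"]
        # Item-level tags, when present, override the dataset-level tags
        effective = purpose_tags(resource.get("meta", {})) or dataset_purposes
        if purpose in effective:
            selected.append(resource)
    return selected


bundle = {
    "resourceType": "Bundle",
    "meta": {"security": [{"system": ACT_REASON, "code": "MLTRAIN"}]},
    "entry": [
        {"resource": {"resourceType": "Observation", "id": "a"}},
        {"resource": {"resourceType": "Observation", "id": "b",
                      "meta": {"security": [{"system": ACT_REASON, "code": "TREAT"}]}}},
    ],
}
print([r["id"] for r in items_gathered_for(bundle, "MLTRAIN")])
```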
I would like to get wider consensus on these concepts before we add a code. This consensus would also help inform what the code is called, how it is described, where it is placed, etc. I am confident that we have the healthcare standards infrastructure to support this new use-case.

Friday, October 13, 2023

Test Interactions in a Production Environment

I covered how to include Test data in Production Environments using the HTEST tag. That article explained how data that is not real patient data, that is to say 'test' data, would be tagged with HTEST. This clearly indicates which data in the Production Environment is test data and which is not, enabling clients to test while connected to the Production Environment rather than needing a second environment just for testing. A second environment may still be useful, but switching from the test server to the production server can introduce errors, usually configuration errors. So being able to do some form of testing in the Production Environment is useful, and testing with test data does not present Privacy concerns.

What that article did not cover is how a client indicates that it knows that it is testing. This has been part of some discussions lately.

Test Patients

First, there could simply be a list of well-known test patients. A request for a test patient is clearly a "test". However, agreeing on a list of well-known test patients is hard at a very large scale, like a nationwide exchange. So, this is possibly of limited use. It still should be attempted, even if other methods are also used.

Custom headers

Some have proposed that additional HTTP headers be created for this 'testing' purpose. My worry is that the intention of testing in production is to get as close as possible to exactly what would happen in production. Using well-known test patients is the least change. Adding headers seems very extreme. Further, everyone would need to add knowledge of these headers to their infrastructure.

Getting everyone to add custom headers to indicate that a request is 'different' is not likely to succeed. Further, with custom headers it is less obvious when someone forgets to turn off testing mode.

PurposeOfUse

The best case is that HTEST be used as a PurposeOfUse. HTEST does exist in the PurposeOfUse vocabulary, so it is ready to be used as a PurposeOfUse in a request. In the previous article I did not mention that HTEST is a PurposeOfUse, but that was to keep the explanation simple. Data should be tagged with the PurposeOfUse under which it was collected, and test data is collected for the PurposeOfUse of ... testing.

So, add HTEST to your requested PurposeOfUse values, and you are now signaling your testing purpose to the authorization and server environments. The first big benefit of this is that it is clearly part of the security infrastructure; the second is that it is just one more PurposeOfUse.
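A minimal sketch of both sides, assuming requests carry a plain list of ActReason codes and test data carries HTEST in `meta.security`. The helper names are hypothetical, and the server-side consistency check (rejecting an HTEST request that touches real data, warning on the reverse) is an assumed policy, not something a specification mandates. The first branch is what catches a client that forgot to turn testing mode off, or on.

```python
# Sketch only: helper names and the consistency policy are assumptions.
ACT_REASON = "http://terminology.hl7.org/CodeSystem/v3-ActReason"


def with_test_purpose(purposes: list[str]) -> list[str]:
    """Client side: add HTEST to the purposes that would normally be requested."""
    return purposes if "HTEST" in purposes else purposes + ["HTEST"]


def is_test_data(resource: dict) -> bool:
    """True if the resource is tagged as test data."""
    return any(
        s.get("system") == ACT_REASON and s.get("code") == "HTEST"
        for s in resource.get("meta", {}).get("security", [])
    )


def check_test_consistency(purposes: list[str], resource: dict) -> str:
    """Server side: flag requests whose test signal does not match the data."""
    testing = "HTEST" in purposes
    test_data = is_test_data(resource)
    if testing and not test_data:
        return "reject: testing request touched real patient data"
    if test_data and not testing:
        return "warn: production request touched test data"
    return "ok"


requested = with_test_purpose(["TREAT", "HPAYMT", "HOPERAT"])
test_patient = {"resourceType": "Patient", "id": "t1",
                "meta": {"security": [{"system": ACT_REASON, "code": "HTEST"}]}}
print(requested)
print(check_test_consistency(requested, test_patient))
```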

PurposeOfUse is also naturally recorded in AuditEvents, as it is part of the security layer.
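As a sketch, assuming FHIR R4 element names (later FHIR versions restructure AuditEvent, so check the version you target), a server could carry each requested PurposeOfUse, including HTEST, into `AuditEvent.purposeOfEvent`. Each purpose gets its own CodeableConcept, since multiple codings inside one CodeableConcept would mean alternative codes for a single concept. The `audit_event` helper and the references passed to it are hypothetical.

```python
# Sketch only: FHIR R4 element names; the helper and references are hypothetical.
import json
from datetime import datetime, timezone

ACT_REASON = "http://terminology.hl7.org/CodeSystem/v3-ActReason"


def audit_event(purposes: list[str], user_ref: str, observer_ref: str) -> dict:
    """Build a minimal R4 AuditEvent for a RESTful read with its purposes."""
    return {
        "resourceType": "AuditEvent",
        "type": {
            "system": "http://terminology.hl7.org/CodeSystem/audit-event-type",
            "code": "rest",
        },
        "action": "R",
        "recorded": datetime.now(timezone.utc).isoformat(),
        "outcome": "0",
        # One CodeableConcept per purpose, preserving every requested value
        "purposeOfEvent": [
            {"coding": [{"system": ACT_REASON, "code": c}]} for c in purposes
        ],
        "agent": [{"who": {"reference": user_ref}, "requestor": True}],
        "source": {"observer": {"reference": observer_ref}},
    }


event = audit_event(["TREAT", "HPAYMT", "HOPERAT", "HTEST"],
                    "Practitioner/example", "Device/example")
print(json.dumps(event["purposeOfEvent"], indent=2))
```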

Note that HTEST is ontologically within HOPERAT. This is because it is for the purposes of healthcare operations that you are testing. Testing is not for the purpose of treatment; it is explicitly not treating, as it is using test patients. Testing is clearly not payment; one would hope that the test patients will not be billed (testing end-to-end billing flows would be useful, but it is still testing).

HTEST can also be combined with the normal Treatment, Payment, and Operations purposes, so that one can test the flows that each of those would distinguish. It can even be combined with BTG (Break the Glass); yes, Break the Glass is a PurposeOfUse.

PurposeOfUse concerns

There are issues that have been brought forward on this use of PurposeOfUse:

1. Not everyone supports multiple PurposeOfUse values. It is understood that some systems were designed back when we thought that only "TREAT" was needed. These systems are really not using PurposeOfUse properly. One really should be asking for Treatment, Payment, and Operations, as it is very likely that any data returned will potentially be used for those other purposes.

2. SMART App Launch does not include PurposeOfUse. Well, this is a problem that I tried to point out during the first and second revisions of the SMART specification. Their failure to recognize the importance of PurposeOfUse is simply wrong. The argument I heard is that the PurposeOfUse is confirmed in the User Experience at application authorization, and from that point forward is just implied by the app client configuration. The SMART specification continues to need formal security modeling.

Note that UDAP, IHE-XUA, and IHE-IUA do include PurposeOfUse... and recognize that PurposeOfUse is multiple values.

Conclusion

Arguments that changing PurposeOfUse is hard really are not compelling to me when the alternative is to invent something totally new. If a change needs to be made, then make that change using standards. There is no good justification to ignore legitimate standards.

The PurposeOfUse concept has not been tested well; adding HTEST will help give PurposeOfUse the attention it needs. Testing should confirm that order does not matter (TPO, OPT, POT, etc.), and should include purpose of use codes that clearly should not be accepted, like "foobar".
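A sketch of the kind of test this suggests, assuming a server parses PurposeOfUse into a set and rejects codes it does not recognize. `KNOWN` is a small illustrative subset of ActReason codes, not the full PurposeOfUse value set, and `parse_purposes` is hypothetical.

```python
# Sketch only: KNOWN is an illustrative subset, not the full value set.
from itertools import permutations

KNOWN = {"TREAT", "HPAYMT", "HOPERAT", "HTEST", "BTG"}


def parse_purposes(requested: list[str]) -> frozenset[str]:
    """Treat PurposeOfUse as an unordered set; reject unknown codes."""
    unknown = [c for c in requested if c not in KNOWN]
    if unknown:
        raise ValueError(f"unsupported PurposeOfUse: {unknown}")
    return frozenset(requested)


# Every ordering of TPO (TPO, OPT, POT, ...) must parse to the same set
tpo = ["TREAT", "HPAYMT", "HOPERAT"]
assert len({parse_purposes(list(p)) for p in permutations(tpo)}) == 1

# Combinations such as HTEST with BTG are accepted
assert parse_purposes(["HTEST", "BTG", "TREAT"]) == {"TREAT", "HTEST", "BTG"}

# A nonsense code like "foobar" must be rejected
try:
    parse_purposes(["TREAT", "foobar"])
except ValueError as err:
    print(err)
```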

Thursday, October 5, 2023

California Bill 352 - aka sex and gender sensitivity

The following question(s) were asked today, and I figure my response is informative to a broader audience.
Has anyone implemented anything pertaining to this?


Prevent the disclosure, access, transfer, transmission, or processing of medical information related to gender affirming care, abortion and abortion-related services, and contraception to persons and entities outside of this state in accordance to this part.
https://leginfo.legislature.ca.gov/faces/billNavClient.xhtml?bill_id=202320240AB352

I'm trying to think through how to go about it and it seems challenging. These are some initial thoughts:
  • Resources should use security labels. Which ones?
  • There should exist valuesets that define codes which are considered sensitive; it doesn't seem like any exist.
  • Access by a user/practitioner might not be as challenging, because we may be able to determine the state in which the practitioner is practicing via an OIDC claim or a business identifier... not totally sure about that.
  • Access by a system is potentially more difficult... client registrations would need to contain some assurance about the locale of the application. Some may not segment application registrations in such a way.
  • Access by a patient seems like it would be straightforward, but perhaps not. The patient has the unfettered right to see the data. But what if a FHIR-based app they use has server/architecture outside of CA?
  • Perhaps a scope could be introduced: "i-am-outside-of-ca", or "i-am-inside-ca".
  • Must we perform partial hydration of resources? For example, if a portion of the resource is sensitive. If the Patient resource cannot be accessed, the rest of the ecosystem can lose meaning.
  • Documents, Binaries, e.g. CDAs: must we generate a version "for CA" and another for "not CA"?
  • HIEs, QHINs, ...


Mohammad Jafari and I have done work on this in DS4P and IHE-PCF; we continue to refine this as part of our technical advisory participation in SHIFT. -- https://www.drummondgroup.com/shift/

Grahame is correct that GENDER would be the most specific sensitivity policy code. This could be the code used to tag data that is sensitive to GENDER, although it is actually a policy code. A more general approach would be to use the SEX code, which is a more proper sensitivity indicator. Whichever of these codes you use to tag data would then be used in the Consent resource to indicate how those tagged data are to be protected.
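A rough sketch of that tag-then-consent pattern, assuming FHIR R4 shapes: data is tagged with the `SEX` sensitivity code from the HL7 v3 ActCode system in `meta.security`, and a Consent carries a nested deny provision keyed on the same `securityLabel`. The Consent is abbreviated (it omits required elements such as scope and category), and `denied_by_consent` is a hypothetical, much-simplified evaluator; the IHE-PCF advanced option linked below describes the real pattern.

```python
# Sketch only: abbreviated R4 Consent; the evaluator is hypothetical.
ACT_CODE = "http://terminology.hl7.org/CodeSystem/v3-ActCode"
SEX_LABEL = {"system": ACT_CODE, "code": "SEX"}

observation = {
    "resourceType": "Observation",
    "id": "example",
    "meta": {"security": [SEX_LABEL]},
}

consent = {
    "resourceType": "Consent",
    "status": "active",
    "provision": {
        "type": "permit",
        # Nested exception: deny anything carrying the SEX sensitivity label
        "provision": [{"type": "deny", "securityLabel": [SEX_LABEL]}],
    },
}


def _label_keys(codings: list[dict]) -> set[tuple]:
    return {(c.get("system"), c.get("code")) for c in codings}


def denied_by_consent(consent: dict, resource: dict) -> bool:
    """True if any nested deny provision matches a label on the resource."""
    labels = _label_keys(resource.get("meta", {}).get("security", []))
    for nested in consent.get("provision", {}).get("provision", []):
        if nested.get("type") == "deny" and labels & _label_keys(nested.get("securityLabel", [])):
            return True
    return False


print(denied_by_consent(consent, observation))
```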

In IHE-PCF I have a use-case that shows how this is done using the advanced option
https://profiles.ihe.net/ITI/PCF/content.html#3584-advanced

As to the question on "There should exist a valueset that define codes which are considered sensitive...",
I refer you to Appendix P in IHE-PCF:
https://profiles.ihe.net/ITI/PCF/ch-P.html#p5-security-labeling-service-models

First, there was one attempt at creating these value sets. It was done by SAMHSA; those value sets are quite old and are not currently maintained. The SHIFT project is looking for a proper organization and governance to take on the task of updating and maintaining them.

However, no matter how perfect a value set is, it will always need local tweaking to your organization's use of codes, and it will not address use-cases involving second- or third-order relationships.
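A minimal sketch of a Security Labeling Service that applies a shared value set plus local tweaks, assuming resources expose their clinical codes in `code.coding`. Every clinical code below is a HYPOTHETICAL placeholder, not real terminology; a real service would evaluate maintained value sets.

```python
# Sketch only: all clinical codes are HYPOTHETICAL placeholders.
ACT_CODE = "http://terminology.hl7.org/CodeSystem/v3-ActCode"

# Shared value set: "system|code" -> sensitivity label
SHARED_VALUE_SET = {"local|code-A": "SEX", "local|code-B": "SEX"}
# Local tweaks: add codes the shared set misses; drop ones that do not apply here
LOCAL_ADDITIONS = {"local|code-C": "SEX"}
LOCAL_REMOVALS = {"local|code-B"}


def sensitivity_labels(resource: dict) -> set[str]:
    """Derive sensitivity labels from the resource's own codes."""
    labels = set()
    for c in resource.get("code", {}).get("coding", []):
        key = f"{c.get('system')}|{c.get('code')}"
        if key in LOCAL_REMOVALS:
            continue
        label = LOCAL_ADDITIONS.get(key) or SHARED_VALUE_SET.get(key)
        if label:
            labels.add(label)
    return labels


def label_resource(resource: dict) -> dict:
    """Add any derived labels to meta.security, without duplicates."""
    labels = sensitivity_labels(resource)
    if labels:
        security = resource.setdefault("meta", {}).setdefault("security", [])
        for label in sorted(labels):
            entry = {"system": ACT_CODE, "code": label}
            if entry not in security:
                security.append(entry)
    return resource


cond = {"resourceType": "Condition", "code": {"coding": [{"system": "local", "code": "code-A"}]}}
print(label_resource(cond)["meta"]["security"])
```

Note the limitation the text points out: a lookup keyed on a resource's own codes cannot see second- or third-order relationships, such as a resource that is only sensitive because of what it implies about another one.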

On blinding part of a resource... you are correct that this is a dangerous area. It is an area that we chose in IHE to leave to later versions of IHE-PCF; we needed basic profiling done first. Partial resource redaction is likely to be very specific, and I expect that it will first appear in the Patient resource itself. That is to say, blinding the gender extensions may still be seen as useful for specific users (e.g. the food service doesn't need to see them). When blinding like this, there is also the policy question of how to indicate that the resource itself is not complete. We do have the SUBSETTED tag, used for use-cases like _summary; but in privacy blinding, indicating that data has been redacted itself sends a signal to the recipient... POLICY is going to be the hard part; the technical implementation can be achieved once a policy is defined.
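A sketch of Patient-level blinding, assuming the gender identity extension URL below is the one in use (it is an example; which extensions to blind is a policy decision). Whether to add the SUBSETTED tag is left as a flag, because signaling the redaction is itself the policy question raised above. SUBSETTED is placed in `meta.tag` here; check where your FHIR version expects it. The `redact_patient` helper is hypothetical.

```python
# Sketch only: extension choice and tag placement are assumptions.
import copy

GENDER_IDENTITY_EXT = "http://hl7.org/fhir/StructureDefinition/individual-genderIdentity"
SUBSETTED = {
    "system": "http://terminology.hl7.org/CodeSystem/v3-ObservationValue",
    "code": "SUBSETTED",
}


def redact_patient(patient: dict, signal_redaction: bool) -> dict:
    """Return a copy of the Patient without gender identity extensions."""
    redacted = copy.deepcopy(patient)  # never modify the stored resource
    original = redacted.get("extension", [])
    kept = [e for e in original if e.get("url") != GENDER_IDENTITY_EXT]
    if len(kept) != len(original):
        if kept:
            redacted["extension"] = kept
        else:
            redacted.pop("extension")
        if signal_redaction:  # policy choice: tell the recipient it is incomplete
            redacted.setdefault("meta", {}).setdefault("tag", []).append(dict(SUBSETTED))
    return redacted


patient = {
    "resourceType": "Patient",
    "id": "example",
    "extension": [
        {"url": GENDER_IDENTITY_EXT, "valueCodeableConcept": {"text": "example"}},
        {"url": "http://example.org/other", "valueString": "kept"},
    ],
}
print(redact_patient(patient, signal_redaction=False))
```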

On whether the client is within CA or not... WOW, very dangerous territory. I suspect that this is even riskier than sub-resource redaction.

Note that the Gender Harmony workgroup did have some discussion on this, and also chose to defer solving it at this time.
https://build.fhir.org/ig/HL7/fhir-gender-harmony/#out-of-scope

Conclusion

The fact that this is not well described has nothing to do with technical capability; we have that ready. The main problem is defining comprehensive policies and addressing the risks to privacy, security, safety, and effectiveness.

It is very likely that this bill could be used to get a group of people together to define policy and profile how to make it work. Forcing functions are always critical to change.

I think that the SHIFT workgroup is in the best position to address this. They have a very broad participation that can best see the whole picture... and I am on that project.