Tuesday, August 30, 2011

Proposal for confidentialityCode vocabulary

I have been complaining about the definitions of the confidentiality codes both in the active HL7 development and in my past posts: 
My main reason for not simply providing my own definitions was to allow for discussion on my concern that we have conflated the confidentialityCode meaning with consent status. My point is that the current consent status can affect all of the confidentialityCodes, not just the R or V.

I figured we should learn from the experience of military data classification, a system that deals with very sensitive data in a different way. (Note that we are already ahead of the military in that we have a global vocabulary; take a look at the mapping mess that is military data classification.) The military classifications use relative “harm to the country” as their measure. Yes, this is different from healthcare information, but I think we can see that “harm to the patient” is what we have been discussing, especially if we look at ‘harm’ in a broad sense that includes 
  • reputation damage, 
  • emotional damage, 
  • family relationship damage, 
  • financial damage, and 
  • physical damage (safety). 
(possibly more, I haven’t fully described patient harm in this context yet).

I think it is very legitimate to include in our definitions contemporary examples from well-known countries' policies, such as HIPAA vs. 42 CFR Part 2 in the USA.

So, here is a potential draft using the existing codes, just with new definitions:
  • U – Unrestricted – No specific patient is identified and thus there is no patient harm risk
  • L – Low – Data have been de-identified and there are mitigating circumstances that prevent re-identification such that there is remote harm risk to the patient if the data were exposed. The data, however, still require protection from exposure outside intended use.
  • M – Moderate – Data are identifiable but consist of modest clinical information that would present moderate harm risk to the patient if the data were exposed. Examples include an emergency-data-set made up of non-sensitive problems, allergies, and medications.
  • N – Normal – Data are identifiable and of typical health information that would present typical harm risk to the patient if the data are exposed. This code is used for the majority of clinical information. Examples include what HIPAA identifies as Protected Health Information.
  • R – Restricted – Data are identifiable and of an especially sensitive nature that would present a high risk to the patient if the data are exposed. Examples include the data topics identified in USA 42 CFR Part 2 – “CONFIDENTIALITY OF ALCOHOL AND DRUG ABUSE PATIENT RECORDS”.
  • V – Very Restricted – Data are identifiable and of an extremely sensitive nature that would present a very high risk to the patient if the data are exposed. Data classified as Very Restricted should be kept in the highest confidence.
Just a start; feel free to take, leave, or update.
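
To make the ordering in the draft concrete, here is a minimal Python sketch. It assumes the six codes can be ranked from least to most harm risk; the integer ranks, the `ConfidentialityCode` name, and the `most_restrictive` helper are illustrative assumptions, not part of any HL7 specification.

```python
from enum import IntEnum
from typing import Iterable


class ConfidentialityCode(IntEnum):
    """Draft codes ranked by relative harm risk (ranks are an assumption)."""
    U = 0  # Unrestricted: no specific patient identified
    L = 1  # Low: de-identified, remote harm risk
    M = 2  # Moderate: identifiable, modest clinical information
    N = 3  # Normal: identifiable, typical health information
    R = 4  # Restricted: identifiable, especially sensitive
    V = 5  # Very Restricted: identifiable, extremely sensitive


def most_restrictive(codes: Iterable[ConfidentialityCode]) -> ConfidentialityCode:
    """Return the highest-risk code among the given codes.

    Hypothetical helper: it assumes a composite document should carry the
    code of its most sensitive part. Raises ValueError on an empty input.
    """
    return max(codes)


sections = [ConfidentialityCode.N, ConfidentialityCode.R, ConfidentialityCode.L]
print(most_restrictive(sections).name)  # prints "R"
```

Ranking the codes this way only captures the "harm to the patient" ordering; it says nothing about consent status, which is a separate concern.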


  1. Glen and John,
    You guys, not me, are the experts. I can see benefit in both proposals, but wonder about counting on a null value, because that leaves the ambiguity about whether it really means "U" (unrestricted) vs. whether the system just failed to provide a value (which would not allow us to draw conclusions, and might suggest the opposite, i.e., assume "H" or "V" just to be safe). Wouldn't it be best to have an explicit code for unrestricted? Lots of discussion just occurred on a similar topic re CDA consolidation. If null is allowed, would it be like HL7 "null flavor" or just a "blank?"
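
The ambiguity raised here can be illustrated with a small Python sketch. It assumes a receiving system that treats a missing or unrecognized code as the most restrictive one ("fail closed"); the `effective_code` function and that default are hypothetical choices for illustration, not something either proposal specifies.

```python
from typing import Optional

# Codes from the draft vocabulary in the post.
KNOWN_CODES = ("U", "L", "M", "N", "R", "V")


def effective_code(raw: Optional[str]) -> str:
    """Return the code a receiver would enforce for a received value.

    Assumption: a null, blank, or unrecognized value is treated as "V"
    (fail closed), because the receiver cannot tell "unrestricted" apart
    from "the sender failed to provide a value".
    """
    if raw is None:
        return "V"
    code = raw.strip().upper()
    return code if code in KNOWN_CODES else "V"


print(effective_code("U"))   # explicit unrestricted is honored
print(effective_code(None))  # missing value is treated as Very Restricted
```

With an explicit "U" code the receiver never has to guess, which is the argument for not relying on null.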

  2. The military experience can help us understand the limitations on employing classification (sensitivity) levels.

    Assigning a label (Top Secret, Secret, …) describes both an information flow policy (sensitive data cannot flow to low-level users or computer systems) and a technical partitioning that makes hacking and exfiltration difficult (physically separating sensitive data from systems accessible to users with lower clearances). Neither of these treatments is suitable for health care (or most other conventional applications).

    • The information flow restrictions treat all users cleared for a particular level as the same. For example, an appointments clerk, a psychiatrist, and an obstetrician would all be authorized to see information that some patients consider highly sensitive (for an abuse victim: home phone, therapy notes, and reproductive history). Of course, each should really see some of the sensitive items, not others. Levels do not represent this well – one ideally should decide based on whatever is known about roles + treatment relationships.

    • The military partitions systems, with effectively separate computers and communication networks at each level. This makes it hard for either a hacker or a malicious insider to exfiltrate data. But such partitioning is costly and inconceivable for health care, where a single network holds all the data and files can be emailed anywhere. Thus, the exfiltration protection, a major motivation in the military IT systems, does not apply.

    The case where the detailed metadata is itself too sensitive to share will be discussed in a future post. Levels can help there, but are a blunt instrument.

  3. Arnon, Thanks, this is very helpful.

    Note that I am not proposing that the sensitivity classification be the only thing used to determine access control rules. I very much expect that typical Role-Based Access Control rules would be in place; that is, one would only be allowed to see data that is applicable to one's role. The sensitivity classification is another vector in addition to that.
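
A minimal Python sketch of that idea follows: access requires both a role-based check and a sensitivity check. The role names, data categories, per-role ceilings, and the `may_access` function are all hypothetical, chosen only to mirror the clerk, psychiatrist, and obstetrician example above; a real policy would also consider treatment relationships and consent.

```python
from dataclasses import dataclass

# Relative harm-risk rank of each draft code (the ranks are an assumption).
RANK = {"U": 0, "L": 1, "M": 2, "N": 3, "R": 4, "V": 5}

# Hypothetical policy: role -> (data categories allowed, highest code allowed).
ROLE_POLICY = {
    "appointments_clerk": ({"demographics", "scheduling"}, "N"),
    "psychiatrist": ({"demographics", "therapy_notes", "medications"}, "R"),
    "obstetrician": ({"demographics", "reproductive_history", "medications"}, "R"),
}


@dataclass(frozen=True)
class Document:
    category: str         # kind of data, e.g. "therapy_notes"
    confidentiality: str  # one of U, L, M, N, R, V


def may_access(role: str, doc: Document) -> bool:
    """Allow access only if BOTH checks pass.

    1. Role check: the document category is applicable to the role.
    2. Sensitivity check: the document code does not exceed the role's ceiling.
    Unknown roles are denied; unknown codes are ranked as "V" (fail closed).
    """
    policy = ROLE_POLICY.get(role)
    if policy is None:
        return False
    categories, ceiling = policy
    doc_rank = RANK.get(doc.confidentiality, RANK["V"])
    return doc.category in categories and doc_rank <= RANK[ceiling]


notes = Document(category="therapy_notes", confidentiality="R")
print(may_access("psychiatrist", notes))  # True: role and sensitivity both pass
print(may_access("obstetrician", notes))  # False: category not applicable to role
```

Here the level alone never grants access: two roles with the same "R" ceiling still see different data, which is the point of treating sensitivity as an additional vector rather than the only one.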