Tuesday, January 12, 2010

ONC to test re-identification of protected data

I am very interested in this effort by ONC, but I am not expecting much from it given the scope. It is a needed first step, and I hope the analysis of the topic continues beyond it.

There have been many efforts to define how to de-identify data, including the famed 18 identifiers in HIPAA for healthcare data. HITSP has identified a set of anonymization constructs (C25, C87, and C88) and a construct for creating/managing pseudonyms (T24). These were all developed using the model defined in ISO Health Informatics -- Pseudonymization, Technical Specification ISO/TS 25237. This is a globally defined standard that brings together much of the best thinking on the topic and many of the best practices. In all the work I have been involved with, I have tried to be very clear that de-identification can only lower the risk; it cannot remove the risk. The best use of de-identification is to have a very specific intended use and to remove all attributes that are not necessary for that intended use. I have outlined much of this problem in a prior blog post:
De-Identification is highly contextual
There have been many proofs that this kind of data is re-identifiable in some capacity through cross-correlation with other publicly available databases. Most of these have picked a well-known individual and found that person's data in the data set. They have not attempted to re-identify a complete data set. This isn't that bad a simplification, as the risk of re-identification usually stems from an attacker wanting to know something about a specific individual. Latanya Sweeney, Ph.D., is a well-known luminary on the topic, and the good news is that she has been brought into HIT-Policy.
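To make the cross-correlation idea concrete, here is a minimal sketch of a linkage attack in Python. Everything in it is an assumption for illustration: the function name `link_records`, the field names (`zip`, `birth_year`, `sex`, `diagnosis`), and the toy records are all made up, and a real attack would need fuzzy matching and far larger data sets. It only shows the basic mechanism: join a "de-identified" data set to a public one on the attributes they share, and treat a unique match as a re-identification.

```python
from collections import defaultdict


def link_records(deidentified, public, quasi_identifiers):
    """Return {public name: de-identified row} for every unique match.

    A match is "unique" when exactly one de-identified row shares the
    person's values for all of the given quasi-identifiers.
    """
    # Index the de-identified rows by their quasi-identifier values.
    index = defaultdict(list)
    for row in deidentified:
        key = tuple(row[q] for q in quasi_identifiers)
        index[key].append(row)

    matches = {}
    for person in public:
        key = tuple(person[q] for q in quasi_identifiers)
        candidates = index.get(key, [])
        # More than one candidate means the attacker cannot tell them apart.
        if len(candidates) == 1:
            matches[person["name"]] = candidates[0]
    return matches


# Toy, made-up data: names are removed from the first list, but the
# remaining attributes also appear in a hypothetical public list.
deidentified = [
    {"zip": "53202", "birth_year": 1950, "sex": "M", "diagnosis": "A"},
    {"zip": "53202", "birth_year": 1982, "sex": "F", "diagnosis": "B"},
    {"zip": "53202", "birth_year": 1982, "sex": "F", "diagnosis": "C"},
]
public = [
    {"name": "Well-Known Person", "zip": "53202", "birth_year": 1950, "sex": "M"},
    {"name": "Someone Else", "zip": "53202", "birth_year": 1982, "sex": "F"},
]

matches = link_records(deidentified, public, ["zip", "birth_year", "sex"])
print(matches)
```

In this toy case only the well-known person is re-identified, because two rows share the other person's attributes. That mirrors the point above: the attacker usually cares about one specific individual, not the whole data set.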

What has been missing is a quantitative analysis that would identify some scale for just how easy or hard this re-identification is, or how complete the re-identification is. We know just how long it takes to 'crack' encryption algorithms like DES and AES. Having a quantifiable rating for de-identification algorithms would be very helpful.
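As a rough illustration of what such a rating could look like, here is a small Python sketch of one candidate measure, k-anonymity, which comes out of Latanya Sweeney's work. This is my own assumption about a possible metric, not something the ONC effort is known to use, and it is far from a complete risk model. The function names, field names, and records are hypothetical. The idea: group records by their quasi-identifier values, then report the size of the smallest group (k) and the fraction of records that are alone in their group.

```python
from collections import Counter


def equivalence_class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def k_anonymity(records, quasi_identifiers):
    """Size of the smallest group; every record hides among at least k records."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return min(sizes.values()) if sizes else 0


def unique_fraction(records, quasi_identifiers):
    """Fraction of records that are the only member of their group."""
    if not records:
        return 0.0
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    unique = sum(1 for n in sizes.values() if n == 1)
    return unique / len(records)


# Toy, made-up records for illustration only.
records = [
    {"zip": "53202", "birth_year": 1950, "sex": "M"},
    {"zip": "53202", "birth_year": 1982, "sex": "F"},
    {"zip": "53202", "birth_year": 1982, "sex": "F"},
    {"zip": "53211", "birth_year": 1982, "sex": "F"},
]

for fields in (["zip", "birth_year", "sex"], ["birth_year", "sex"]):
    print(fields, k_anonymity(records, fields), unique_fraction(records, fields))
```

Dropping an attribute that the intended use does not need (here, the zip code) lowers the fraction of unique records, which is the kind of before-and-after number a rating scheme could report.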
