Monday, April 12, 2010

Anonymizing patient records for genomics

This article in the journal Nature points to a nice risk analysis and mitigation plan that gives researchers access to genetic information together with the diagnosis codes known for each patient. The authors have even added a mitigation, using thresholds and grouping, to ensure that no diagnosis code pool ends up with too small a population.
To solve this problem, the new method allows researchers to set two parameters: the minimum number of patients (k) that should have the same set of codes, and a 'utility policy' which specifies how codes should be linked in the anonymized data.
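To make the two parameters concrete, here is a minimal sketch of the idea, not the published algorithm. It assumes each patient record is simply a set of diagnosis codes and that the utility policy can be written as a mapping from a specific code to a group label. The function names, the sample codes, and the policy are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def generalize(codes, utility_policy):
    """Replace each code with its group label when the policy lists one."""
    return frozenset(utility_policy.get(code, code) for code in codes)


def small_groups(records, k, utility_policy=None):
    """Return the code sets that are shared by fewer than k patients."""
    policy = utility_policy or {}
    counts = Counter(generalize(codes, policy) for codes in records)
    return {codes: n for codes, n in counts.items() if n < k}


# Hypothetical records: one set of diagnosis codes per patient.
records = [
    {"250.00"}, {"250.01"}, {"250.02"},
    {"401.1"}, {"401.1"}, {"401.1"},
]

# Hypothetical utility policy: link the three specific codes into one group.
policy = {"250.00": "250.x", "250.01": "250.x", "250.02": "250.x"}

print(small_groups(records, k=3))          # three code sets fall below k
print(small_groups(records, k=3, utility_policy=policy))  # none fall below k
```

Without the policy, each of the three specific codes identifies a single patient. Once the policy links them into one group, every code set is shared by at least k patients, which is the trade between detail and privacy that the two parameters control.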

I really like the approach taken because it starts from the minimal information desired and then determines, through a risk assessment, how to achieve that goal. From my read, they realized that they simply needed the known diagnosis values; they didn't need demographics or other indirect identifiers. At least that is all they say they are taking in the article.

I like this approach because it nicely follows the one that I outlined in De-Identification is highly contextual. I hope that the ONC, when it tests re-identification of protected data, looks carefully at this output and at the process the researchers used to come to this conclusion. I do not expect their output to be reusable, because De-Identification is highly contextual.

Surely more investigation needs to be done, but I like that this group was willing to think critically about the minimal information that they needed for success.