Thursday, February 12, 2015

Is it really possible to anonymize data?

De-Identification is a Process, and one that can be done right or WRONG!

The argument 'for' or 'against' the use of de-identification is a Red Herring. What the arguments are actually about is that a Process can be done badly. There should be no doubt that any Process can be done badly. Even a simple process like filling a glass of water can be done badly, and can even result in human harm.

The big misunderstanding is that De-Identification is an absolute. It is not; it is a Process used to lower the 'risk' of re-identification. As a process, it can be done badly. As a domain of 'risk', it can't achieve zero risk, except by ending up at the null set.
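To make the "lower the risk, never zero it" point concrete, here is a minimal sketch in Python. It is not a method from any standard cited here: it assumes a simple k-anonymity-style measure, where the worst-case risk is 1 divided by the size of the smallest group of records sharing the same quasi-identifier values. The field names, the sample records, and the `generalize` helper are all hypothetical and exist only for illustration.

```python
from collections import Counter


def reidentification_risk(records, quasi_identifiers):
    """Worst-case risk: 1 / size of the smallest group of records
    that share identical quasi-identifier values (assumed measure)."""
    if not records:
        # The null set: nothing released, so nothing to re-identify.
        return 0.0
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return 1.0 / min(groups.values())


def generalize(record):
    """Hypothetical generalization step: keep a 3-digit ZIP prefix
    and bucket the age into decades."""
    return {
        "zip": record["zip"][:3] + "**",
        "age": f"{(record['age'] // 10) * 10}s",
        "diagnosis": record["diagnosis"],
    }


# Made-up sample data, for illustration only.
records = [
    {"zip": "53706", "age": 34, "diagnosis": "asthma"},
    {"zip": "53711", "age": 37, "diagnosis": "diabetes"},
    {"zip": "53715", "age": 31, "diagnosis": "asthma"},
    {"zip": "53590", "age": 62, "diagnosis": "hypertension"},
    {"zip": "53562", "age": 68, "diagnosis": "asthma"},
]
QUASI_IDENTIFIERS = ["zip", "age"]

raw_risk = reidentification_risk(records, QUASI_IDENTIFIERS)
generalized = [generalize(r) for r in records]
generalized_risk = reidentification_risk(generalized, QUASI_IDENTIFIERS)
empty_risk = reidentification_risk([], QUASI_IDENTIFIERS)

print(raw_risk, generalized_risk, empty_risk)
```

In this toy data every raw record is unique on ZIP and age, so the worst-case risk is 1.0. Generalizing lowers it to 0.5, because the smallest group now holds two records. The only input that reaches 0.0 is the empty data-set, which is the null-set case described above.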

The standards in this space are clear about this risk factor. It is the absolutists, who insist on viewing de-identification as an absolute, that are causing the argument. This oversimplification is just as alarmist.

As Yogi Berra is said to have said: "In theory, there is no difference between theory and practice. But, in practice, there is." The Practice of applying de-identification has occasional failures, like all 'risk' domains. No one hears about the times when de-identification is done successfully. All the failures are held up to the light and used to show that the solution fails.

This doesn't mean I am an absolutist who believes De-Identification is the solution. My perspective is that it is a "Tool". Like all tools and processes, it must be used properly.

It was pointed out to me, by the awesome Gila Pyke, that I failed to remind the reader that De-Identification is just ONE tool in a mature risk management process. As a risk management tool, and as stated above, it will not bring the risk to zero; as such, the resulting data-set might still require protection. It is true that too often one presumes that a data-set that has been de-identified can be globally published. That holds only if global publication was the target of the risk management, and the risk of re-identification has truly been reduced to the level necessary for global publication. This is one of the misunderstandings that also results in the outlined failures. This is also a fundamental misunderstanding, and a failing, of the HIPAA de-identification clause.

De-Identification, Anonymization, Pseudonymization