Sunday, October 18, 2009

How Private can Electronic Information Ever Be?

This article is yet another one discussing how university researchers took some de-identified data, from Netflix in this case, and claimed they had re-identified some of the customers.
By comparing the film preferences of some anonymous Netflix customers with personal profiles on imdb.com, the Internet movie database, the researchers said they easily re-identified some people because they had posted their e-mail addresses or other distinguishing information online.
Their claim is only that they were able to identify some customers, namely customers who had themselves posted public reviews of the movies they watched. So I am not sure this is really a good example of re-identification, as those customers had already identified themselves.
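To make the linkage idea concrete, here is a minimal sketch of matching "anonymous" ratings to public reviews by counting shared (movie, rating) pairs. It assumes entirely made-up toy data, and the function `link_records` and its `min_overlap` threshold are hypothetical. This is a simplified exact-match stand-in, not the scoring method the researchers actually used. Note that the only person it finds is the one who published their own reviews under an identifying name.

```python
# Hypothetical toy data (not real Netflix or IMDb records):
# an "anonymized" release keyed by random IDs, and public reviews keyed by a handle.
anonymous_ratings = {
    "user_8271": {("Brazil", 5), ("Alien", 4), ("Heat", 2)},
    "user_1934": {("Alien", 5), ("Up", 3)},
}
public_reviews = {
    "jdoe@example.com": {("Brazil", 5), ("Alien", 4)},
    "movie_fan_42": {("Up", 4)},
}


def link_records(anon, public, min_overlap=2):
    """Map each anonymous ID to the public identity sharing the most
    (movie, rating) pairs, but only if that overlap reaches min_overlap."""
    matches = {}
    for anon_id, anon_pairs in anon.items():
        best_name, best_score = None, 0
        for name, pub_pairs in public.items():
            # Set intersection counts identical (movie, rating) pairs.
            score = len(anon_pairs & pub_pairs)
            if score > best_score:
                best_name, best_score = name, score
        if best_score >= min_overlap:
            matches[anon_id] = best_name
    return matches


print(link_records(anonymous_ratings, public_reviews))
# prints {'user_8271': 'jdoe@example.com'}
```

Raising `min_overlap` makes false matches less likely but also finds fewer people, which mirrors the point that the effort needed depends on how much data is left in the release.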

But it does point out that de-identified data can sometimes be re-identified with some effort, something I have already blogged about in "De-Identification is highly contextual". Thus any de-identified data set must still be considered sensitive; it is just less sensitive than it was before. How much effort re-identification takes depends on how much data is left in the data set and how public the individuals of interest are.

The tie to healthcare is that the article then draws a line from movies watched to health information, and to some of the typical discussions around the sale of de-identified health information. There is some discussion of this, but this section of the article was clearly written to be Google-friendly, working in every keyword the author could come up with. The article concludes with a quote from Deborah Peel making a rather interesting point: there is a lack of laws against the act of re-identification itself. It might be useful to have such a law.