Some data elements are direct identifiers (e.g., Name, Address, Phone Number, SSN), and these are simply removed. Other data elements are identifiers (e.g., insurance, payment) that are completely unimportant to the specific secondary-use, and these are simply removed as well. Sometimes the secondary-use needs some way to link data over time (for example, to determine whether a treatment given one year is still effective years later). If this is needed, a pseudonym can be applied to the data, but applying a pseudonym brings in risk, so it must be done selectively and with purpose. A pseudonym is an identifier that is assigned consistently to the data but has no apparent relationship to the original patient. Sometimes these pseudonyms can be assigned in a non-reversible way; other times the secondary-use has enough potential benefit to the patient that the risk of a reversible pseudonym is acceptable (e.g., after a clinical trial the patient is often informed of their previously blinded treatment and given recommendations). The pseudonym is often a randomly assigned value that is kept in a secured lookup table (see HITSP T24) that is very carefully protected (e.g., only the direct-care provider has access).
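To make the pseudonym idea concrete, here is a minimal Python sketch, not a reference implementation. The field names (`patient_id`, `name`, `insurance`, and so on), the `PseudonymTable` class, and the `keyed_pseudonym` helper are all hypothetical names chosen for illustration. The table shows the reversible case (a random value kept in a protected lookup table); the keyed hash is one possible way to get a consistent pseudonym that cannot be looked up in reverse, assuming the key is kept secret or destroyed.

```python
import hashlib
import hmac
import secrets

# Hypothetical field names, chosen only for this illustration.
DIRECT_IDENTIFIERS = {"name", "address", "phone", "ssn"}
UNNEEDED_IDENTIFIERS = {"insurance", "payment"}


class PseudonymTable:
    """Reversible pseudonyms: random values held in a lookup table.

    In practice the table itself must be stored and access-controlled
    very carefully; this in-memory version only shows the mechanics.
    """

    def __init__(self):
        self._by_patient = {}
        self._by_pseudonym = {}

    def pseudonym_for(self, patient_id: str) -> str:
        # Same patient always gets the same pseudonym, so data links over time.
        if patient_id not in self._by_patient:
            pseudonym = secrets.token_hex(16)
            while pseudonym in self._by_pseudonym:  # vanishingly unlikely collision
                pseudonym = secrets.token_hex(16)
            self._by_patient[patient_id] = pseudonym
            self._by_pseudonym[pseudonym] = patient_id
        return self._by_patient[patient_id]

    def reidentify(self, pseudonym: str) -> str:
        # Only someone with access to the table can go back to the patient.
        return self._by_pseudonym[pseudonym]


def keyed_pseudonym(patient_id: str, secret_key: bytes) -> str:
    """Consistent pseudonym with no lookup table (assumed alternative).

    Anyone holding the key plus a list of candidate patient IDs could still
    re-link records, so the key needs protecting or destroying.
    """
    return hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify(record: dict, table: PseudonymTable) -> dict:
    """Drop identifiers and replace the patient ID with a pseudonym."""
    dropped = DIRECT_IDENTIFIERS | UNNEEDED_IDENTIFIERS | {"patient_id"}
    cleaned = {key: value for key, value in record.items() if key not in dropped}
    cleaned["pseudonym"] = table.pseudonym_for(record["patient_id"])
    return cleaned
```

Two visits by the same patient passed through `deidentify` with the same table come out with the same `pseudonym`, which is what allows linking over time; whether the table (or key) is retained at all is the selective, purposeful decision described above.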
Health data elements that are structured and coded are generally left in place if the secondary-use needs them. Even for structured and coded values there needs to be a reason why the secondary-use requires the data, as some structured and coded values can also be used to identify populations. The best case is to have the resulting multi-subject data-set statistically examined to identify whether any segment of the data set is too small. Data elements that are NOT structured or coded (e.g., text comments) are always simply removed, as there is no way to be sure the data doesn't include identifiers.
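The "too small a segment" check can be sketched roughly as follows, in the spirit of k-anonymity. This is an assumption-laden illustration: the function name `small_segments`, the `pseudonym` subject key, the sample field names, and the threshold `k` are all hypothetical, and a real review would choose the fields and threshold for the specific secondary-use.

```python
from collections import defaultdict


def small_segments(records, quasi_identifiers, subject_key="pseudonym", k=5):
    """Return segments (combinations of coded values) with fewer than k subjects.

    Counts distinct subjects rather than rows, so repeat visits by one
    subject do not make a segment look larger than it is.
    """
    subjects_by_segment = defaultdict(set)
    for record in records:
        segment = tuple(record.get(field) for field in quasi_identifiers)
        subjects_by_segment[segment].add(record[subject_key])
    return {
        segment: len(subjects)
        for segment, subjects in subjects_by_segment.items()
        if len(subjects) < k
    }


# Made-up sample data, for illustration only.
records = [
    {"pseudonym": "a", "diagnosis": "E11", "age_band": "30-39"},
    {"pseudonym": "b", "diagnosis": "E11", "age_band": "30-39"},
    {"pseudonym": "a", "diagnosis": "E11", "age_band": "30-39"},  # repeat visit
    {"pseudonym": "c", "diagnosis": "G35", "age_band": "80-89"},
]
risky = small_segments(records, ["diagnosis", "age_band"], k=2)
```

Here `risky` contains only the segment with a single subject; such a segment would need to be generalized (for example, wider age bands) or suppressed before release.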
- Latanya Sweeney, Ph.D.
January 2007: An interesting article by Bruce Schneier, Why Anonymity doesn't work - Anonymity and the Netflix Dataset, with some really good pointers to other articles.
ISO/TS 25237:2008 - Health informatics -- Pseudonymization
DICOM Supplement 55 is a great reference for DICOM objects. This text has been incorporated into the DICOM standard, but I reference the supplement because it is easier to point at and is self-contained.