To solve this problem, the new method allows researchers to set two parameters: the minimum number of patients (k) that should have the same set of codes, and a 'utility policy' which specifies how codes should be linked in the anonymized data.
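To make those two parameters concrete, here is a minimal sketch in Python. It is my own illustration, not the researchers' algorithm: the function names, the shape of the `utility_policy` dictionary, and the diagnosis codes are all made up, and I am assuming that "linking" codes means replacing related codes with a shared group label.

```python
from collections import Counter


def generalize(codes, utility_policy):
    """Replace each code with the label of the policy group that contains it.

    utility_policy is a hypothetical mapping of group label -> set of codes
    that are allowed to be linked together. Codes in no group are kept as-is.
    """
    code_to_group = {
        code: label
        for label, group in utility_policy.items()
        for code in group
    }
    return frozenset(code_to_group.get(code, code) for code in codes)


def violations(records, k):
    """Return every code set that is shared by fewer than k patients."""
    counts = Counter(records)
    return {codes: n for codes, n in counts.items() if n < k}


# Made-up patients, each reduced to nothing but a set of diagnosis codes.
patients = [
    {"D1"},
    {"D2"},
    {"D1", "H1"},
    {"D2", "H1"},
]

# Made-up policy: D1 and D2 may be linked and reported as "D*".
policy = {"D*": {"D1", "D2"}}

raw = [frozenset(p) for p in patients]
linked = [generalize(p, policy) for p in patients]

print(violations(raw, k=2))     # every raw code set is unique to one patient
print(violations(linked, k=2))  # no violations once D1 and D2 are linked
```

With k=2, all four raw code sets belong to a single patient each, so every one of them is a violation. Once the policy links D1 and D2, each remaining code set is shared by two patients and the check passes. A real utility policy would have to be chosen so that the linked codes are still useful for the research question.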
I really like the approach taken because it starts from the minimal information desired and determines, through a risk assessment, how to achieve that goal. From my read, they realized that they simply needed the known diagnosis values; they didn't need demographics or other indirect identifiers. At least, that is all the article says they are taking in.
I like this approach because it nicely follows the approach that I outlined in "De-Identification is highly contextual". I hope that the ONC, when they test re-identification of protected data, looks carefully at this output and at the process the researchers used to reach their conclusion. I do not expect their output to be reusable, because De-Identification is highly contextual.
Surely more investigation needs to be done, but I like that this group was willing to think critically about the minimal information they needed for success.