I am not an Apple fanboy; I think they get far too much credit and buzz. I don't think this is their fault: they are masters of marketing, and they never claim to have created something no one else has. What they do very well is take technology that is just slightly behind the bleeding edge, letting someone else take most of the cuts and blood, and, critically, use it in a way that provides really good value. In this respect I am very much an Apple fanboy: they know how to turn the work of others into greater value for their customers.
Differential Privacy is another case of this value-adding exceptional execution. Specifically...
From the information that is known, it appears they understand Privacy Principles and De-Identification. They have identified a small number of use-cases where they could add value if they could get data about how people are using their product and the internet, yet they want to respect Privacy. So they take the concept of Differential Privacy and apply it in a distributed way, so that they can gather trends without gathering specific actions.
Key here is that they have a very well defined set of use-cases. This is the most important step of any De-Identification (the broader process). If you don't have well defined use-cases, then you cannot make the risk tradeoffs. It is only through really understanding what your use-cases need that you can determine which tradeoffs are acceptable.
More specifically, you must understand what your use-cases DO NOT need. In the Apple case, they don't need the identity of the user or phone; they don't even want a pseudonym for them. They have made other very important tradeoff decisions on what they DO NOT need. They show great restraint in eliminating all data they simply don't need. This is driven by good process governance and a strong understanding of your use-cases.
Then, for the data they do need, they look at what kind of fuzziness their use-case can survive. This is where Differential Privacy comes in. They determine that they can tolerate some noise in their data. It is this noise that hides true identity in that data. Differential Privacy is a de-identification mechanism that adds random noise to some data. This random noise distorts each individual data point, but over the whole data-set it doesn't obscure the trends. It might hide small trends, but it doesn't hide large trends. That is, a large enough 'signal' (a trend across the whole population) will still be visible.
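To make the "noise hides the individual but not the trend" idea concrete, here is a minimal sketch using randomized response, a classic textbook technique for adding noise on the device before anything is reported. This is an illustration only. I am not claiming it is the mechanism Apple uses, and the function names, the 50/50 coin probabilities, and the simulated population are all my own assumptions.

```python
import random
from typing import List


def randomized_response(truth: bool, rng: random.Random) -> bool:
    """Noisy report for one user (hypothetical helper).

    Half the time the true answer is reported; the other half a fair
    coin flip is reported instead, so any single report is deniable.
    """
    if rng.random() < 0.5:
        return truth
    return rng.random() < 0.5


def estimate_true_rate(reports: List[bool]) -> float:
    """Undo the known noise across the whole data-set.

    With the probabilities above: observed_rate = 0.5 * true_rate + 0.25.
    """
    observed = sum(reports) / len(reports)
    return (observed - 0.25) / 0.5


def simulate(population: int, true_rate: float, seed: int = 0) -> float:
    """Simulate a population and return the estimated rate (assumed data)."""
    rng = random.Random(seed)
    truths = [rng.random() < true_rate for _ in range(population)]
    reports = [randomized_response(t, rng) for t in truths]
    return estimate_true_rate(reports)


if __name__ == "__main__":
    # A large population recovers the trend closely; a small one is much noisier.
    print("large population estimate:", simulate(100_000, 0.30))
    print("small population estimate:", simulate(50, 0.30))
```

No single report reveals what that user actually did, yet the estimate over a large population lands close to the true rate. With only a handful of users the estimate swings widely, which is the "small trends get hidden" effect described above.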
And just to prove that Apple understands all of the Principles of Privacy, they make this data gathering, with all the protections they have engineered into it, something the end-user (a lay person) gets to choose whether to take part in. This is not only a fantastic recognition of the complete picture of Privacy Principles, but it also addresses something that many big-data projects totally fail at. These big-data projects have probably done as good a 'technical' job, but they fail at being Transparent and at providing Choice; thus they fail at the "Perception" risk.
Many questions remain open. For example, will they purge the data on a regular basis, to prevent it from growing large enough to become identifiable? With their intended use-cases, this would also be a very useful risk-reduction without loss of function.
The BEST blog article on Differential Privacy and how Apple is likely applying it comes from Matthew Green - on his blog A Few Thoughts on Cryptographic Engineering - in the article What is Differential Privacy?
My articles on De-Identification, Anonymization, Pseudonymization
- De-Identification for Family Planning
- FHIR does not need a deidentify=true parameter
- NIST seeks comments on De-Identification
- Is it really possible to anonymize data?
- PCAST - Big Data: A Technological Perspective
- De-Identifying free-text
- De-Identification: process reduce risk of identification of entries in a data-set
- Fake it properly
- De-Identification - Data Chemistry
- Guidance Regarding Methods for De-identification of Health Information
- The Emperor has no clothes - De-Identification and User Provisioning
- De-Identification is highly contextual
- Redaction and Clinical Documentation