With the announcement of iOS 10 at WWDC on Monday, Apple mentioned its adoption of "Differential Privacy" – a mathematical technique that allows the company to collect user information that helps it enhance its apps and services while keeping the data of individual users private.
During the company's keynote address, Senior VP of software engineering Craig Federighi – a vocal advocate of personal privacy – summarized the concept in the following way:
We believe you should have great features and great privacy. Differential privacy is a research topic in the areas of statistics and data analytics that uses hashing, subsampling and noise injection to enable…crowdsourced learning while keeping the data of individual users completely private. Apple has been doing some super-important work in this area to enable differential privacy to be deployed at scale.
Wired has now published an article on the subject that lays out in clearer detail some of the practical implications and potential pitfalls of Apple's latest statistical data gathering technique.
Differential privacy, translated from Apple-speak, is the statistical science of trying to learn as much as possible about a group while learning as little as possible about any individual in it. With differential privacy, Apple can collect and store its users' data in a format that lets it glean useful notions about what people do, say, like and want. But it can't extract anything about a single, specific one of those people that might represent a privacy violation. And neither, in theory, could hackers or intelligence agencies.
Wired notes that the technique comes with a mathematically "provable guarantee" that the data sets it generates are impervious to outside attempts to de-anonymize the information. It does, however, caution that such complicated techniques rely on the rigor of their implementation to retain any guarantee of privacy during transmission.
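Apple has not detailed exactly how its system works, but the flavor of the "noise injection" Federighi alluded to is easy to illustrate. Below is a minimal sketch of randomized response, a classic local technique from the research literature; the function names and parameters are illustrative only, not Apple APIs, and this is not Apple's actual implementation.

```swift
import Foundation

// Illustrative only: classic randomized response, a simple local
// noise-injection scheme (not Apple's actual implementation).
// Before a sensitive yes/no answer ever leaves the device, flip a coin:
// heads, report the truth; tails, report a uniformly random bit.
// This particular scheme satisfies differential privacy with ε = ln 3 ≈ 1.1.
func randomizedResponse(truth: Bool) -> Bool {
    if Bool.random() {
        return truth            // report honestly half the time
    } else {
        return Bool.random()    // otherwise report pure noise
    }
}

// The collector can still estimate the true population rate:
// P(report = yes) = 0.5 * trueRate + 0.25, so trueRate ≈ 2 * observed - 0.5.
func estimateTrueRate(from reports: [Bool]) -> Double {
    let observed = Double(reports.filter { $0 }.count) / Double(reports.count)
    return 2 * observed - 0.5
}

// Simulate 10,000 users, 30% of whom truly have the sensitive trait.
let truths = (0..<10_000).map { _ in Double.random(in: 0..<1) < 0.3 }
let reports = truths.map { randomizedResponse(truth: $0) }
print(estimateTrueRate(from: reports))   // ≈ 0.3, yet every single report is deniable
```

Because any individual report might be noise, no single answer can be pinned on a user, yet the aggregate statistic can still be recovered – which is the trade-off differential privacy formalizes.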
You can read the full article on the subject of differential privacy here.
Note: Due to the political nature of the discussion regarding this topic, the discussion thread is located in our Politics, Religion, Social Issues forum. All forum members and site visitors are welcome to read and follow the thread, but posting is limited to forum members with at least 100 posts.
Top Rated Comments
Response to quoted text: while Apple is, without a doubt, anonymizing all identifiers in the data (i.e. your name, address, and other contact info is 100% certain to have been stripped), that is not what differential privacy does (rather, anonymizing data is a prerequisite for all practical data privacy methodology). Differential privacy provides a probabilistic guarantee on the data-masking algorithm: in layman's terms, if you have two datasets that differ in only one user, the algorithm's outputs on the two datasets are indistinguishable in some precise sense. There are various ways to construct this algorithm so that it is differentially private.
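In formal terms (this is the standard definition from the research literature, not anything Apple has published), a randomized algorithm \(M\) is \(\varepsilon\)-differentially private if, for every pair of datasets \(D\) and \(D'\) that differ in a single user's data, and for every set of possible outputs \(S\),

\[
\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[M(D') \in S].
\]

The smaller \(\varepsilon\) is, the less any one person's data can shift what the algorithm outputs, which is the precise sense in which the two datasets are indistinguishable.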
The take-away (and I'm addressing the other commenter here) is: no, even if you are absolutely unique in the dataset, differential privacy guarantees you remain effectively indistinguishable. In other words, it is a guarantee that an attacker will never be able to verify or determine the true value of any entry in the protected data (e.g. the value of any variable for any particular individual).
Many argue that this concept, although an interesting mathematical tool, is too strong for use in practice, in that it cannot be implemented in any real-world scenario without removing all useful signal from the data. I can't name any companies or even government agencies that claim their data are algorithmically protected with differentially private guarantees. What Apple has done here is truly revolutionary, and I sincerely doubt any of its competitors are close to being able to do what they're doing today. Maybe in a decade or two?
See my other reply for a more detailed response. In particular, differential privacy is a guarantee that no matter how any attacker aggregates the data, there is no way to pick out individual values for any of the variables collected, for any user.
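To make that concrete, here is a purely illustrative sketch (none of this is Apple's code, and the names and parameters are made up for the example) of the Laplace mechanism, the textbook way to release an aggregate statistic so that no individual entry can be inferred from it:

```swift
import Foundation

// Illustrative only (not Apple's code): the Laplace mechanism, the textbook
// way to release an aggregate statistic with an ε-differential-privacy guarantee.

// Draw a sample from Laplace(0, scale) by inverse-CDF sampling.
func laplaceNoise(scale: Double) -> Double {
    var u = Double.random(in: -0.5..<0.5)
    if u == -0.5 { u = 0 }                 // guard the measure-zero endpoint
    let sign: Double = u < 0 ? -1 : 1
    return -scale * sign * log(1 - 2 * abs(u))
}

// A counting query has sensitivity 1 (adding or removing one person changes
// the count by at most 1), so Laplace noise with scale 1/ε makes the
// released value ε-differentially private.
func privateCount(trueCount: Int, epsilon: Double) -> Double {
    return Double(trueCount) + laplaceNoise(scale: 1 / epsilon)
}

// Example: 1,234 users did something; the analyst only ever sees the noisy value.
print(privateCount(trueCount: 1_234, epsilon: 0.5))   // e.g. ~1236.4
```

Smaller ε means stronger privacy but noisier counts, which is exactly the privacy/utility tension mentioned above.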
Welcome to the Internet. I've yet to find a forum where it's just positive news...