The story of how Massachusetts Governor William Weld’s de-identified medical records were quickly re-identified in 1997 by then-graduate student Latanya Sweeney is now legendary in discussions of the risks of sharing “anonymized” or “de-identified” health records to foster research. In an article in Scientific American, Erica Klarreich describes a mathematical technique called “differential privacy” that could give researchers access to vast repositories of personal data while meeting a high standard for privacy protection:
A differentially private data release algorithm allows researchers to ask practically any question about a database of sensitive information and provides answers that have been “blurred” so that they reveal virtually nothing about any individual’s data — not even whether the individual was in the database in the first place.
“The idea is that if you allow your data to be used, you incur no additional risk,” said Cynthia Dwork of Microsoft Research Silicon Valley. Dwork introduced the concept of differential privacy in 2005, along with Frank McSherry, Kobbi Nissim of Israel’s Ben-Gurion University and Adam Smith of Pennsylvania State University.
Differential privacy preserves “plausible deniability,” as Avrim Blum of Carnegie Mellon University likes to put it. “If I want to pretend that my private information is different from what it really is, I can,” he said. “The output of a differentially private mechanism is going to be almost exactly the same whether it includes the real me or the pretend me, so I can plausibly deny anything I want.”
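The excerpt above describes the effect in words; one standard way to achieve it is the Laplace mechanism, which adds calibrated random noise to a query’s true answer. The Python sketch below is a minimal illustration, not code from the article: the function name private_count and the toy patient records are assumptions made for the example. It shows how a simple count query can be answered so that its output distribution is almost the same whether or not one person’s record is in the database, which is exactly the “plausible deniability” Blum describes.

```python
import numpy as np

def private_count(records, predicate, epsilon=0.1, rng=None):
    """Return a differentially private count of records matching `predicate`.

    A counting query has sensitivity 1 (adding or removing one person's
    record changes the true count by at most 1), so adding Laplace noise
    with scale 1/epsilon gives epsilon-differential privacy.
    """
    rng = rng or np.random.default_rng()
    true_count = sum(1 for r in records if predicate(r))
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Two "neighboring" databases: identical except that one includes Alice.
without_alice = [
    {"name": "Bob", "diagnosis": "flu"},
    {"name": "Carol", "diagnosis": "asthma"},
]
with_alice = without_alice + [{"name": "Alice", "diagnosis": "flu"}]

query = lambda r: r["diagnosis"] == "flu"

# The two noisy answers are statistically nearly indistinguishable, so the
# released number reveals virtually nothing about whether Alice is present.
print(private_count(with_alice, query))
print(private_count(without_alice, query))
```

Smaller values of epsilon mean more noise and stronger privacy; larger values give more accurate answers at the cost of weaker protection.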
Read more at Scientific American for a description of how this works and of the programs being developed to help researchers implement this approach.