The U.S. Defense Advanced Research Projects Agency (DARPA) this week named University of Massachusetts Amherst professor of computer science Gerome Miklau to lead a 4.5-year, $2.8 million grant to develop tools and techniques that enable the agency to build data management systems in which “private data may be used only for its intended purpose and no other.”
Miklau’s project is part of a national program that DARPA has dubbed “Brandeis” in recognition of Louis Brandeis, the U.S. Supreme Court Justice who expounded on the right to privacy in an 1890 essay. DARPA kicked it off this week with an official announcement during the researchers’ initial conference in Washington, D.C.
The work will be carried out in three 18-month phases, Miklau says. He estimates that UMass Amherst will receive about $1.2 million, while collaborators Ashwin Machanavajjhala at Duke University and Michael Hay at Colgate University will get about $1.1 million and approximately $470,000, respectively. At UMass Amherst, the project will support two doctoral students.
Miklau says, “Scientists and many others have a legitimate need to study data, but we worry about negative consequences for individuals. Medical epidemiology studies that look at group characteristics, for example, have high value, but we don’t want to reveal facts about individual medical records. Our team designs systems that operate between a trusted data collector, for example, a hospital or the Census Bureau, and a data analyst, so social and medical scientists and government agencies can use aggregate data without knowing all about each individual.”
The task is daunting but critically important, he adds. “It’s a difficult balance to negotiate. It’s not clear that balancing privacy and retaining the usefulness of data is completely achievable in all cases. There are many different types of data and goals for analysis.”
Methods for protecting private information fall into two broad categories: filtering data at the source or trusting the data user to diligently protect it. Both face serious challenges: edits that hide individual identities can be undone using public information, and trusting users to protect data fails regularly, as seen in recent breaches at stores, health insurance companies and government agencies.
Miklau and colleagues plan to follow differential privacy, a guideline established by cryptographers nearly a decade ago. It seeks to offer data analysts maximum accuracy in database queries while keeping the chance of identifying individual records to a minimum. It offers more reliable protection than data anonymization, he notes.
“Our goal is to move differential privacy forward by improving some very sophisticated algorithms,” the database privacy expert says. “All our work will satisfy privacy, and we’ll try to build in the most usefulness we can. We hope to offer information that is close to the truth but it doesn’t violate anyone’s privacy.”
To accomplish this, he and colleagues will add statistical “noise” to query outputs, so that the figures drawn from tables and spreadsheets are slightly distorted each time a user queries them.
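As a rough illustration of adding noise to a query output, the sketch below applies the Laplace mechanism, a standard textbook technique in differential privacy, to a simple counting query. It is not the team's system. The function names, the `epsilon` setting and the toy patient records are all hypothetical, chosen only for this example.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace distribution centered at 0.

    The difference of two independent exponential draws with mean
    `scale` follows a Laplace(0, scale) distribution.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(records, predicate, epsilon, rng=None):
    """Return a count of matching records with Laplace noise added.

    Adding or removing one person changes a count by at most 1, so
    noise with scale 1 / epsilon is the usual choice for this query.
    Smaller epsilon means more noise and stronger privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical toy data standing in for a hospital's records.
patients = [
    {"age": 34, "flu": True},
    {"age": 51, "flu": False},
    {"age": 29, "flu": True},
    {"age": 67, "flu": True},
    {"age": 45, "flu": False},
]

if __name__ == "__main__":
    # Each call returns a slightly different answer near the true count of 3.
    for _ in range(3):
        print(noisy_count(patients, lambda p: p["flu"], epsilon=0.5))
```

Repeating the query gives a different answer each time, yet the answers cluster around the true count, which is the trade-off between usefulness and privacy that the researchers describe.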
Miklau explains, “We are going to deliver answers to analysts that are statistically close to what would be delivered if one person has opted out of the database. It’s a random perturbation, like flipping a coin every time you ask a question. The answer then is statistically close, but there is a randomness that helps protect the individual.”
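The idea of answers being "statistically close" whether or not one person is in the database is usually written as the standard definition of differential privacy, shown below. This is the general textbook form, not necessarily the exact formulation the team will use. The symbols are assumptions introduced here: M is the randomized query mechanism, D and D' are any two databases that differ in one person's record, S is any set of possible answers, and ε is the privacy parameter.

```latex
\Pr[\,M(D) \in S\,] \;\le\; e^{\varepsilon} \cdot \Pr[\,M(D') \in S\,]
```

The smaller ε is, the closer the two probabilities must be, and the less any single answer can reveal about whether a particular individual's record was included.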
Further, he and colleagues hope to craft a privacy tool that non-specialists can use. Miklau says, “Right now you have to be an expert to deploy these sophisticated algorithms, even to just explore them to see if they will fit your needs. Writing privacy programs currently requires an advanced degree in computer science or statistics; it’s very complicated. We’d like to change that.”
“We want to automate the parts that can be automated to give analyst users a simpler tool,” he adds. His team will attempt to design and build an invisible, behind-the-scenes system that carries out the underlying engineering on its own and constructs the computational program each user asks for.