Re-identification is just too damned easy sometimes – and if your state is selling your “de-identified” health information, don’t be reassured – be worried.
Here’s the abstract of a study by Latanya Sweeney:
Alice goes to the hospital in the United States. Her doctor and health insurance company know the details ― and often, so does her state government. Thirty-three of the states that know those details do not keep the information to themselves or limit their sharing to researchers [1]. Instead, they give away or sell a version of this information, and often they’re legally required to do so. The states turn to you as a computer scientist, IT specialist, policy expert, consultant, or privacy officer and ask, are the data anonymous? Can anyone be identified? Chances are you have no idea whether real-world risks exist. Here is how I matched patient names to publicly available health data sold by Washington State, and how the state responded. Doing this kind of experiment helps improve data-sharing practices, reduce privacy risks, and encourage the development of better technological solutions.
Results summary: The State of Washington sells a patient-level health dataset for $50. This publicly available dataset contained virtually all hospitalizations occurring in the state in a given year, including patient demographics, diagnoses, procedures, attending physician, hospital, a summary of charges, and how the bill was paid. It did not contain patient names or addresses (only five-digit ZIPs, which are U.S. postal codes). Newspaper stories printed in the state for the same year that contain the word “hospitalized” often included a patient’s name and residential information and explained why the person was hospitalized, such as a vehicle accident or assault. A close analysis of four archival news sources focused on Washington State activities from a single searchable news repository studied uniquely and exactly matched medical records in the state database for 35 of the 81 news stories found in 2011 (or 43 percent), thereby putting names to patient records. An independent third party verified that all of the matches were correct. In response to the re-identification of patients in its data, Washington State changed its way of sharing these data to create three levels of access. Anyone can download tabular summaries. Anyone can pay $50 and complete a data-use agreement to receive a redacted version of the data. However, access to all the fields provided prior to this experiment is now limited to applicants who qualify through a review process.
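To make the "uniquely and exactly matched" idea concrete, here is a minimal Python sketch of that kind of linkage. It is not Sweeney's code or method as implemented in the paper. The field names (age, sex, zip, month, cause), the helper unique_exact_match, and every record and story below are hypothetical and synthetic, invented purely for illustration. The only point it demonstrates is that a name gets attached to a record when the details in a news story agree with exactly one row in a nameless dataset.

```python
from typing import Optional

# Synthetic stand-in for a "de-identified" hospital dataset: no names, no
# street addresses. All field names and values are invented for illustration.
records = [
    {"id": 1, "age": 61, "sex": "M", "zip": "98101", "month": "2011-03", "cause": "motor vehicle accident"},
    {"id": 2, "age": 61, "sex": "M", "zip": "98101", "month": "2011-03", "cause": "fall"},
    {"id": 3, "age": 34, "sex": "F", "zip": "99201", "month": "2011-07", "cause": "assault"},
    {"id": 4, "age": 34, "sex": "F", "zip": "99201", "month": "2011-07", "cause": "assault"},
]

# Synthetic stand-ins for details gleaned from news stories. The names are
# placeholders, not real people.
stories = [
    {"name": "Person A", "attrs": {"age": 61, "sex": "M", "zip": "98101", "month": "2011-03", "cause": "motor vehicle accident"}},
    {"name": "Person B", "attrs": {"age": 34, "sex": "F", "zip": "99201", "month": "2011-07", "cause": "assault"}},
    {"name": "Person C", "attrs": {"age": 50, "sex": "M", "zip": "98004", "month": "2011-09", "cause": "fall"}},
]


def unique_exact_match(attrs: dict, rows: list) -> Optional[dict]:
    """Return the one row that agrees with every attribute, or None.

    Zero candidates or more than one candidate both count as "no match",
    because only a single exact hit lets a name be attached to a record.
    """
    hits = [row for row in rows if all(row.get(k) == v for k, v in attrs.items())]
    return hits[0] if len(hits) == 1 else None


# Attempt to link each story to a record.
linked = {}
for story in stories:
    match = unique_exact_match(story["attrs"], records)
    if match is not None:
        linked[story["name"]] = match["id"]

match_rate = len(linked) / len(stories)

# Sanity check of the figure reported in the abstract: 35 of 81 stories.
reported_percent = round(100 * 35 / 81)
```

In this toy data, only the first story links to a record: the second is ambiguous (two identical rows) and the third has no candidate at all. Coarsening or withholding fields, as in Washington State's redacted tier, works by pushing more stories into the ambiguous case.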
Reference:
Sweeney L. Only You, Your Doctor, and Many Others May Know. Technology Science. 2015092903. September 29, 2015. http://techscience.org/a/2015092903
The full paper is available for free download at that URL.