Paul Ohm recently published an article in which he makes the dramatic claim that de-identification has failed (see http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1450006). I have heard that argument before, and its primary weakness is amplified in this article, so I feel compelled to comment.
Paul Ohm’s argument about the failure of anonymization is based on evidence that does not actually support his point. Therefore, his overall argument about de-identification is very questionable. Below I will explain why.
Ohm’s key point is that existing re-identification successes demonstrate that de-identification does not work. This, of course, assumes that the datasets that were re-identified had been properly anonymized. They had not. One example that Ohm uses to make his case is the insurance database released in Massachusetts more than a decade ago (pre-HIPAA). That database was not properly anonymized, and no professional working in this field would say that it was. The Group Insurance Commission did a lousy job. The second example is AOL, which again is a case of a database that was not properly anonymized. AOL did a lousy job of anonymizing its database. In fact, the examples he cites are all cases where the custodian did not use existing re-identification risk measurement techniques and did not use the de-identification techniques that are available in the literature. We know how to de-identify datasets properly (up to a pre-specified risk threshold), and in none of those examples was this done. There is no example of a properly de-identified database being re-identified.
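To make "measuring risk against a pre-specified threshold" concrete, here is a minimal sketch. It is not taken from Ohm's article or from any specific method named in this post. It assumes one common approach from the literature: group records by their quasi-identifiers and treat the worst-case re-identification risk as one divided by the size of the smallest group. The field names, the generalization rule, the sample records, and the 0.5 threshold are all hypothetical.

```python
from collections import Counter

def max_reidentification_risk(records, quasi_identifiers):
    """Worst-case risk: 1 / size of the smallest group of records
    that share the same quasi-identifier values (assumed risk model)."""
    if not records:
        raise ValueError("records must not be empty")
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return 1.0 / min(groups.values())

def generalize(record):
    """Hypothetical generalization: 3-digit ZIP prefix, decade of birth."""
    return {
        "zip": record["zip"][:3],
        "birth_year": record["birth_year"] // 10 * 10,
        "sex": record["sex"],
    }

# Illustrative records only, not real data.
raw = [
    {"zip": "02138", "birth_year": 1945, "sex": "M"},
    {"zip": "02139", "birth_year": 1947, "sex": "M"},
    {"zip": "02141", "birth_year": 1962, "sex": "F"},
    {"zip": "02142", "birth_year": 1968, "sex": "F"},
]
QUASI_IDENTIFIERS = ["zip", "birth_year", "sex"]
THRESHOLD = 0.5  # hypothetical pre-specified risk threshold

raw_risk = max_reidentification_risk(raw, QUASI_IDENTIFIERS)
generalized = [generalize(r) for r in raw]
generalized_risk = max_reidentification_risk(generalized, QUASI_IDENTIFIERS)

print(raw_risk, raw_risk <= THRESHOLD)
print(generalized_risk, generalized_risk <= THRESHOLD)
```

In this toy data every raw record is unique on ZIP code, birth year, and sex, so the measured risk is 1.0 and the release fails the threshold. After generalization each record shares its values with one other record, and the risk drops to 0.5. A custodian who never runs a measurement like this has no basis for claiming that a dataset is de-identified.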
Read more on EHIP