Mark D. Metrey of Hudson Cook LLP writes:
As artificial intelligence (AI) systems grow more advanced and data-driven, organizations are turning to synthetic and de-identified data to power model development while purportedly reducing legal risk. The assumption is simple: if the data no longer identifies a specific individual, it falls outside the reach of stringent privacy laws and related obligations. But this assumption is increasingly being tested, not only by evolving statutes and regulatory enforcement but also by advances in re-identification techniques and shifting societal expectations about data ethics.
This article examines the complex and evolving legal terrain surrounding the use of synthetic and de-identified data in AI training. It analyzes the viability of “privacy by de-identification” strategies in light of re-identification risk, explores how state privacy laws like the California Consumer Privacy Act (CCPA) and its successors address these techniques, and highlights underappreciated contractual, ethical, and regulatory risks that organizations should consider when using synthetic data.
Read more at The National Law Review.