
22 Stan. Tech. L. Rev. 1 (2019)
Privacy and Synthetic Datasets

Privacy and Synthetic Datasets

Steven M. Bellovin,* Preetam K. Dutta,† and Nathan Reitinger‡

22 STAN. TECH. L. REV. 1 (2019)

ABSTRACT

    Sharing is a virtue, instilled in us from childhood. Unfortunately, when
it comes to big data (i.e., databases possessing the potential to usher in a
whole new world of scientific progress), the legal landscape is either too
greedy or too laissez-faire. Either all identifiers must be stripped from the
data, rendering it useless, or one-step-removed personally identifiable
information may be shared freely, freely sharing secrets. In part this is a result
of the historic solution to database privacy, anonymization, a subtractive
technique incurring not only poor privacy results, but also lackluster utility.
In anonymization's stead, differential privacy arose; it provides better,
near-perfect privacy, but is nonetheless subtractive in terms of utility.
    Today, another solution is leaning into the fore: synthetic data. Using the
magic of machine learning, synthetic data offers a generative, additive
approach: the creation of almost-but-not-quite replica data. In fact, as we
recommend, synthetic data may be combined with differential privacy to
achieve a best-of-both-worlds scenario. After unpacking the technical
nuances of synthetic data, we analyze its legal implications, finding the familiar
ambiguity: privacy statutes either overweigh (i.e., inappropriately exclude
data sharing) or downplay (i.e., inappropriately permit data sharing) the
potential for synthetic data to leak secrets. We conclude by finding that
synthetic data is a valid, privacy-conscious alternative to raw data, but not a



* Steven M. Bellovin is the Percy K. and Vida L.W. Hudson Professor of Computer Science
at Columbia University, affiliate faculty at its law school, and a Visiting Scholar at the
Center for Law and Information Policy at Fordham University School of Law.
† Preetam Dutta is a doctoral student at the Department of Computer Science at Columbia
University.
‡ Nathan Reitinger is an attorney and a master's student at the Department of Computer
Science at Columbia University.


