
110 Cornell L. Rev. 1 (2025)

   SYNTHETIC DATA AND THE FUTURE OF AI


                             Peter Lee†



        The future of artificial intelligence (AI) is synthetic. Several
    of the most prominent technical and legal challenges of AI
    derive from the need to amass huge amounts of real-world data
    to train machine learning (ML) models. Collecting such
    real-world data can be highly difficult and can threaten privacy,
    introduce bias in automated decision making, and infringe
    copyrights on a massive scale. This Article explores the
    emergence of a seemingly paradoxical technical creation that can
    mitigate, though not completely eliminate, these concerns:
    synthetic data. Increasingly, data scientists are using
    simulated driving environments, fabricated medical records, fake
    images, and other forms of synthetic data to train ML models.
    Artificial data, in other words, is training artificial intelligence.
    Synthetic data offers a host of technical and legal benefits; it
    promises to radically decrease the cost of obtaining data,
    sidestep privacy issues, reduce automated discrimination, and
    avoid copyright infringement. Alongside such promises,
    however, synthetic data offers perils as well. Deficiencies in the
    development and deployment of synthetic data can exacerbate
    the dangers of AI and cause significant social harm.

        In light of the enormous value and importance of
    synthetic data, this Article sketches the contours of an innovation
    ecosystem to promote its robust and responsible
    development. It identifies three objectives that should guide legal
    and policy measures shaping the creation of synthetic data:
    provisioning, disclosure, and democratization. Ideally, such
    an ecosystem should incentivize the generation of high-quality


    †  Martin Luther King Jr. Professor of Law and Director, Center for
Innovation, Law, and Society, UC Davis School of Law. I would like to thank Elizabeth
Joh, Mark Lemley, Sarah Polcz, and workshop participants at the Lewis & Clark
Fall Forum, the UC Davis-Jindal Global Law School symposium, the
Transatlantic Tech Exchange Roundtable hosted by the German Marshall Fund, the UC
Davis School of Law Schmooze, the Intellectual Property Scholars Conference at
UC Berkeley School of Law, and the University of Texas School of Law for very
helpful comments. I would also like to thank Dean Kevin Johnson and Senior
Associate Dean Afra Afsharipour for providing generous institutional support for
this project. This research was supported by a grant from the UC Davis Academic
Senate Committee on Research. My thanks as well to McKenzie Deutsch and the
UC Davis School of Law Library staff for excellent research assistance. I would
also like to thank the outstanding editors of the Cornell Law Review.


