
1 [1] (March 23, 2023)






Congressional Research Service
Informing the legislative debate since 1914


March 23, 2023


Digital Biology: Implications of Genetic Sequencing


Deoxyribonucleic acid (DNA) is the molecule that carries
the genetic information of an organism. This genetic code is
composed of nucleotide bases (A, T, C, and G). The
sequence of these bases encodes information that can, for
example, make a protein. A genome is the complete set of
DNA in an organism. A gene sequencer reads DNA. Gene
synthesis technologies can write DNA. It is this ability to
both read and write DNA that researchers in the field of
engineering biology use to reprogram cellular systems at
the genetic level for a specific functional output. To do so,
researchers require data about what gene sequences to code
for, what functions those genes impact, and how those
genes are expressed in living organisms.
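
As a rough illustration of how a base sequence can encode a protein, the following Python sketch reads a DNA string three bases (one codon) at a time and looks each codon up in a small table. The table is only a subset of the standard genetic code, and the function name `translate` and the example sequence are hypothetical, chosen for illustration. Real analysis software handles all 64 codons, alternative reading frames, and both DNA strands.

```python
# Illustrative subset of the standard genetic code (DNA codon -> amino acid).
# Only a handful of the 64 codons are listed; "*" marks a stop codon.
CODON_TABLE = {
    "ATG": "M",                                      # methionine (usual start)
    "TTT": "F", "TTC": "F",                          # phenylalanine
    "GGT": "G", "GGC": "G", "GGA": "G", "GGG": "G",  # glycine
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",  # alanine
    "AAA": "K", "AAG": "K",                          # lysine
    "TGG": "W",                                      # tryptophan
    "TAA": "*", "TAG": "*", "TGA": "*",              # stop codons
}


def translate(dna):
    """Translate a DNA string into one-letter amino acid codes.

    Reads codons from the first base, stops at the first stop codon,
    and writes "?" for any codon missing from the partial table.
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3].upper()
        amino_acid = CODON_TABLE.get(codon, "?")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical sequence, not taken from any real gene.
example_sequence = "ATGTTTGGCAAATGGTAA"
print(translate(example_sequence))
```

In this sketch, "reading" DNA corresponds to obtaining the input string, and "writing" DNA would correspond to choosing a string whose codons produce a desired protein.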

Many emergent technologies, such as artificial intelligence
(AI), require large datasets, often referred to as big data.
Theoretically, as more data becomes available, the
capabilities of those technologies increase. This includes
applications that require the use of genetic sequence data.

Sequencing technologies have evolved rapidly, making it
possible to sequence entire genomes more efficiently and at
lower cost. Sequences are collected and stored in databases,
many of which are publicly funded and freely accessible,
while others are privately held. The volume of genetic
sequence information in databases has grown as sequencing
technology has evolved. See Figure 1 and Figure 2.

Figure 1. Cost of DNA Sequencing over Time

Source: CRS analysis of data from Kris Wetterstrand, DNA
Sequencing Costs: Data from the National Human Genome Research
Institute Genome Sequencing Program, National Institutes of Health,
at http://www.genome.gov/sequencingcostsdata.
Notes: A megabase (Mb) is a unit of measurement for DNA. One
megabase = 1 million bases. For generating the Cost per Genome,
the assumed genome size was 3,000 Mb.
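
The relationship between the two cost measures in the notes can be sketched as simple arithmetic. The Python snippet below multiplies a per-megabase cost by the assumed 3,000 Mb genome size. The input cost is a made-up placeholder, not a value from the figure, and the function names are hypothetical. The sketch also ignores factors such as sequencing each base multiple times for accuracy, so it is a simplified illustration rather than the method behind the published estimates.

```python
BASES_PER_MEGABASE = 1_000_000    # 1 Mb = 1 million bases (from the notes)
ASSUMED_GENOME_SIZE_MB = 3_000    # assumed genome size in Mb (from the notes)


def genome_size_in_bases():
    """Return the assumed genome size expressed in individual bases."""
    return ASSUMED_GENOME_SIZE_MB * BASES_PER_MEGABASE


def cost_per_genome(cost_per_mb):
    """Scale a per-megabase cost up to the assumed genome size.

    Simplification: treats the genome as sequenced exactly once.
    """
    return cost_per_mb * ASSUMED_GENOME_SIZE_MB


# Placeholder value for illustration only; not taken from the figure.
hypothetical_cost_per_mb = 0.01
print(genome_size_in_bases())
print(cost_per_genome(hypothetical_cost_per_mb))
```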


Figure 2. Growth of Sequences in the International
Nucleotide Sequence Database Collaboration

Source: CRS analysis of data from the International Nucleotide
Sequence Database Collaboration (INSDC).
Notes: INSDC includes sequence data from the DNA Data Bank of
Japan; the European Nucleotide Archive; and GenBank, the National
Institutes of Health genetic sequence database.

Sequencing Life on Earth
Private companies and public research groups produce large
amounts of genetic sequence data. For example, the Broad
Institute of MIT and Harvard claims to produce roughly
500 terabases (500 trillion bases) of genomic data per
month. There is great potential value in the aggregate
volume of genetic datasets that can be collectively mined to
discover and characterize relationships among genes.

In 2018, the National Institutes of Health launched the All
of Us precision medicine research program, which aims to
collect clinical, lifestyle, electronic health record, and
genomic data from at least 1 million people to advance the
development of precision medicine. Since its launch, the
program has made available about 100,000 whole genome
sequences. Genomic data, along with other information,
including data about the communities where participants
live, is available via a cloud-based platform. All direct
identifiers are removed from the data, and other privacy
requirements have been put in place for researchers seeking
access, in order to protect participants' privacy. This
combination of data may help researchers better understand
how genes can cause or influence diseases in the context of
other health determinants.

The Earth Microbiome Project (EMP) is a global research
project, funded by public and private entities, to sequence
global microbial life. Its goal was to sequence 200,000
samples from different biomes to produce a global Gene
