Technical Seminar on DNA Digital Data Storage
Prepared and Presented by
Neeraj Chowdhary
15B81A04N9
WHAT IS DNA DIGITAL DATA STORAGE?
• DNA digital data storage is the process of encoding and decoding
binary data to and from synthesized DNA strands.
• It uses artificial DNA produced by commercially available
oligonucleotide synthesis.
Figure 1: An artistic rendering of DNA storage
WHY IS IT NEEDED?
• New-generation computers and high-speed internet have gained
popularity in recent years.
• But when it comes to handling big data, whether the data of a corporation
or of the world as a whole, present data storage technology
comes nowhere near managing it efficiently.
• Current data storage technology cannot cope with our needs
and also generates a lot of e-waste.
• Current media also cannot store information for long periods of time.
Figure 2: Rise in e-waste
A COMPARISON
This picture shows why DNA is the obvious
option for the future.
From Magnetic to Genetic
Figure 3: A comparison of technologies
STRUCTURE OF DNA
• DNA contains four nucleobases: adenine (A),
guanine (G), cytosine (C) and
thymine (T).
• The bases pair up as A-T
and G-C.
• The backbone of each DNA strand is made
of alternating phosphate and sugar
residues.
• A single nucleotide can represent up to 2 bits
of information, because there are four possible bases.
Figure 4: Structure of DNA
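The A-T and G-C pairing rule means that one strand fully determines its partner. A minimal Python sketch of that rule follows; the function name `reverse_complement` is chosen here for illustration, and the strand is reversed because the two strands of a double helix run in opposite directions.

```python
# Watson-Crick pairing: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(strand: str) -> str:
    """Return the partner strand, read in its own 5'-to-3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))

if __name__ == "__main__":
    print(reverse_complement("TTTC"))
```

Because the partner strand carries no new information, the 2 bits per nucleotide quoted above counts one strand only.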
HOW DOES DNA WORK AS A STORAGE TECHNOLOGY?
• Source data in the form of binary digits (0 and 1) is first converted to a ternary
(base-3) code (0, 1 and 2) to reduce the chance of encoding errors.
• After the conversion, the digital data is encoded into the nucleobases of
DNA.
• By varying the order of the nucleobases A, T, G and C, the ternary code is
mapped onto nucleobase sequences, producing blocks of nucleobases
that encode the data.
• The encoded DNA can then be sequenced and read back to ternary and then to
binary data, using technologies similar to those used to map the human
genome.
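The pipeline above (binary, then base 3, then nucleobases, and back) can be sketched in Python. This is an illustration under stated assumptions, not the exact scheme used in any published experiment: each byte is written as a fixed six base-3 digits (since 3^6 = 729 covers all 256 byte values), and each digit picks one of the three bases that differ from the previous base, so the same base never appears twice in a row. The starting base "A" and all function names are hypothetical.

```python
BASES = "ACGT"
TRITS_PER_BYTE = 6  # assumption: fixed width, 3**6 = 729 >= 256

def bytes_to_trits(data: bytes) -> list:
    """Write each byte as six base-3 digits, most significant first."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(TRITS_PER_BYTE):
            digits.append(byte % 3)
            byte //= 3
        trits.extend(reversed(digits))
    return trits

def trits_to_dna(trits, prev="A") -> str:
    """Each trit selects one of the three bases that differ from the previous one."""
    out = []
    for t in trits:
        choices = [b for b in BASES if b != prev]
        prev = choices[t]
        out.append(prev)
    return "".join(out)

def dna_to_trits(dna: str, prev="A") -> list:
    """Invert trits_to_dna by replaying the same choice table."""
    trits = []
    for base in dna:
        choices = [b for b in BASES if b != prev]
        trits.append(choices.index(base))
        prev = base
    return trits

def trits_to_bytes(trits) -> bytes:
    """Rebuild each byte from its six base-3 digits."""
    out = bytearray()
    for i in range(0, len(trits), TRITS_PER_BYTE):
        value = 0
        for t in trits[i:i + TRITS_PER_BYTE]:
            value = value * 3 + t
        out.append(value)
    return bytes(out)

if __name__ == "__main__":
    dna = trits_to_dna(bytes_to_trits(b"DNA"))
    print(dna, trits_to_bytes(dna_to_trits(dna)))
```

Avoiding repeated bases is one plausible reading of "decrease chances of encoding errors": long runs of a single base are harder to read back reliably. The price is density, since each nucleotide now carries one base-3 digit instead of two bits.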
A PRACTICAL EXAMPLE
Figure 5: A view of the entire process
EXAMPLE
• Let's take “VVIT” as a sample string.
• First, represent each letter by its ASCII code.
• From the ASCII table: V = 86, V = 86, I = 73, T = 84
• Then convert each code to a quaternary (base-4) number: 86 = 1112, 86 = 1112,
73 = 1021, 84 = 1110
• Use A, T, C and G to represent the digits:
• 0 = A, 1 = T, 2 = C, 3 = G
• 1112111210211110 →
TTTCTTTCTACTTTTA
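The worked example above can be reproduced with a short Python sketch. The digit-to-base table is the one given on the slide; the function names and the fixed width of four quaternary digits per character (enough for any value up to 255) are assumptions made for this illustration.

```python
# Mapping from the slide: 0 = A, 1 = T, 2 = C, 3 = G
DIGIT_TO_BASE = {"0": "A", "1": "T", "2": "C", "3": "G"}
BASE_TO_DIGIT = {base: digit for digit, base in DIGIT_TO_BASE.items()}

def to_quaternary(value: int, width: int = 4) -> str:
    """Write a number in base 4, zero-padded to a fixed width."""
    digits = ""
    for _ in range(width):
        digits = str(value % 4) + digits
        value //= 4
    return digits

def encode(text: str) -> str:
    """ASCII text -> quaternary digits -> nucleobases."""
    return "".join(
        DIGIT_TO_BASE[digit]
        for code in text.encode("ascii")
        for digit in to_quaternary(code)
    )

def decode(dna: str) -> str:
    """Nucleobases -> quaternary digits -> ASCII text, four bases per character."""
    chars = []
    for i in range(0, len(dna), 4):
        digits = "".join(BASE_TO_DIGIT[base] for base in dna[i:i + 4])
        chars.append(chr(int(digits, 4)))
    return "".join(chars)

if __name__ == "__main__":
    print(encode("VVIT"))
    print(decode("TTTCTTTCTACTTTTA"))
```

Each character becomes exactly four bases, which matches the figure of 2 bits per nucleotide: 8 bits per ASCII byte divided by 2 bits per base.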
WHY DNA?
• A mere milligram of DNA
could encode the complete text of
every book in the Library of Congress.
• Very high data density.
• More compact than current magnetic
tape or hard drive storage.
• As the graph on the right shows, the cost to
manufacture a genome is falling.
• Hence the future looks bright.
Figure 6: Graph showing the cost to manufacture
one genome
DISADVANTAGES
• Cost: The cost of generating raw, unassembled sequence
(read) data is high. Synthesizing artificial sequences costs even more.
• Speed: Speed is low. The fastest current technology can sequence (read)
DNA at roughly 1 billion bases per hour. Synthesis (writing) is
slower still and more expensive. This is extremely slow compared
with modern storage media, but acceptable for long-term archival storage.
• Rewriting: This is essentially a write-once technology, but static data such as
government and historical records could benefit from this storage option.
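To put the read speed in familiar units, the quoted sequencing rate can be converted to bytes per second. The sketch below assumes the ideal case of 2 bits per base with no redundancy or error correction, so it is an upper bound rather than a measured figure; the function name is hypothetical.

```python
def read_throughput_bytes_per_second(bases_per_hour: float, bits_per_base: float = 2.0) -> float:
    """Convert a sequencing rate into bytes per second (ideal case, no overhead)."""
    bits_per_hour = bases_per_hour * bits_per_base
    return bits_per_hour / 8 / 3600

if __name__ == "__main__":
    rate = read_throughput_bytes_per_second(1_000_000_000)
    print(f"{rate / 1000:.1f} kB/s")
```

Under these assumptions, 1 billion bases per hour works out to roughly 69 kB/s.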
DEVELOPMENTS
• In 2016, research by Church and Technicolor Research and Innovation was
published in which 22 MB of an MPEG-compressed movie sequence was
stored in and recovered from DNA.
• In March 2017, Yaniv Erlich and Dina Zielinski of Columbia University and
the New York Genome Center published a method known as DNA Fountain
that stored data at a density of 215 petabytes per gram of DNA. The technique
approaches the Shannon capacity of DNA storage, achieving 85% of the
theoretical limit. The method was not ready for large-scale use, as it cost
$7000 to synthesize 2 megabytes of data and another $2000 to read it.
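The DNA Fountain figures quoted above translate into a cost per megabyte and a mass of DNA per unit of data. The sketch below uses only the numbers from the slide (cost to write and read 2 MB, and 215 petabytes per gram); the function names are hypothetical.

```python
SYNTHESIS_COST_USD = 7000   # to write 2 MB, from the slide
READ_COST_USD = 2000        # to read the same 2 MB, from the slide
MEGABYTES_STORED = 2
DENSITY_PB_PER_GRAM = 215   # from the slide

def cost_per_megabyte() -> float:
    """Combined write and read cost for one megabyte, in US dollars."""
    return (SYNTHESIS_COST_USD + READ_COST_USD) / MEGABYTES_STORED

def grams_of_dna_needed(petabytes: float) -> float:
    """Mass of DNA needed at the reported density."""
    return petabytes / DENSITY_PB_PER_GRAM

if __name__ == "__main__":
    print(f"${cost_per_megabyte():.0f} per MB")
    print(f"{grams_of_dna_needed(1000):.2f} g for 1000 PB")
```

The two results pull in opposite directions: about $4500 per megabyte explains why the method was not ready for large-scale use, while under 5 grams for 1000 petabytes shows why the density is attractive.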
DEVELOPMENTS (CONTINUED)
• In March 2018, the University of Washington and Microsoft published results
demonstrating storage and retrieval of approximately 200 MB of data. The
research also proposed and evaluated a method for random access to data
items stored in DNA.
• Research published by Eurecom and Imperial College in January 2019
demonstrated the ability to store structured data in synthetic DNA. The
research showed how to encode structured or, more specifically, relational
data in synthetic DNA, and also demonstrated how to perform data processing
operations (similar to SQL) directly on the DNA as chemical processes.
DNA FOUNTAIN
• DNA Fountain is a strategy for storing and
retrieving information in DNA that is robust and
approaches the theoretical maximum of
information that can be stored per nucleotide.
• The success of the strategy lies in the careful
adaptation of recent developments in coding
theory to the domain-specific constraints of
DNA storage.
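The coding-theory idea behind the name is the fountain code: the data is cut into chunks, and an open-ended stream of "droplets" is produced, each the XOR of a pseudo-random subset of chunks. Any sufficiently large set of droplets is enough to recover the data. The Python sketch below is a toy version under stated assumptions, not the published method: the degree distribution `DEGREES`, the chunk size and all function names are invented for illustration, and the step that adapts droplets to DNA constraints is omitted entirely.

```python
import random

DEGREES = [1, 1, 2, 2, 2, 3, 4]  # toy degree distribution (assumption)

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def chunk_indices(seed: int, k: int) -> set:
    """Re-derive which chunks a droplet mixes from its seed alone."""
    rng = random.Random(seed)
    degree = min(rng.choice(DEGREES), k)
    return set(rng.sample(range(k), degree))

def make_droplet(chunks, seed: int):
    """A droplet is (seed, XOR of the chunks selected by that seed)."""
    payload = bytes(len(chunks[0]))
    for i in chunk_indices(seed, len(chunks)):
        payload = xor_bytes(payload, chunks[i])
    return seed, payload

def peel(droplets, k: int):
    """Peeling decoder: return the k chunks, or None if more droplets are needed."""
    pending = [[chunk_indices(seed, k), payload] for seed, payload in droplets]
    solved = {}
    progress = True
    while progress and len(solved) < k:
        progress = False
        for entry in pending:
            # Strip already-recovered chunks out of this droplet.
            for i in list(entry[0]):
                if i in solved:
                    entry[1] = xor_bytes(entry[1], solved[i])
                    entry[0].discard(i)
            # A droplet reduced to a single chunk reveals that chunk.
            if len(entry[0]) == 1:
                (i,) = entry[0]
                solved[i] = entry[1]
                progress = True
    if len(solved) < k:
        return None
    return [solved[i] for i in range(k)]

def roundtrip(data: bytes, chunk_size: int = 4, max_droplets: int = 1000):
    """Generate droplets until the decoder succeeds; return (data, droplets used)."""
    assert len(data) % chunk_size == 0, "sketch assumes no padding is needed"
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    droplets = []
    for seed in range(max_droplets):
        droplets.append(make_droplet(chunks, seed))
        recovered = peel(droplets, len(chunks))
        if recovered is not None:
            return b"".join(recovered), len(droplets)
    raise RuntimeError("not enough droplets to decode")

if __name__ == "__main__":
    message = b"nature's oldest storage!"
    recovered, used = roundtrip(message)
    print(recovered == message, used)
```

In this sketch the seed travels with each payload, so the decoder can rebuild the chunk subset without any side channel. Losing individual droplets does not matter as long as enough others arrive, which is the robustness property the slide refers to.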
APPLICATIONS
• National security: information hiding
and data steganography.
• Safely preserving a person's own information, such as medical
records and family history, inside their own body.
• Storage of archival documents.
CONCLUSION
• At present, DNA storage is experimental. Before it becomes commonplace, it
needs to be completely automated, and the processes of both building DNA and
reading it must be improved.
• Both synthesis and sequencing are prone to error and relatively slow. For example, today’s DNA
synthesis lets us write a few hundred bytes per second, whereas a modern hard drive can
write hundreds of millions of bytes per second.
• These are significant challenges, but we are optimistic because all the relevant
technologies are improving rapidly.
• Further, DNA data storage doesn’t need the perfect accuracy that biology requires,
so researchers are likely to find even cheaper and faster ways to store information
in nature’s oldest data storage system.
A STORY TO END THE SEMINAR
• On January 21, 2015, Nick Goldman of the European Bioinformatics Institute (EBI)
announced the Davos Bitcoin Challenge at the World Economic Forum annual meeting
in Davos.
• During his presentation, tubes of DNA were handed out to the audience with the
message that each tube contained the private key of exactly one bitcoin, encoded in
DNA.
• The first person to sequence and decode the DNA could claim the bitcoin and win the
challenge. The challenge was set for three years and would close if nobody claimed the
prize before January 21, 2018.
A STORY TO END THE SEMINAR (CONTINUED)
• Almost three years later, on January 19, 2018, the EBI announced that a Belgian
PhD student, Sander Wuyts of the University of Antwerp, was the first to
complete the challenge.
• In addition to the instructions on how to claim the bitcoin (stored as a plain text
file and a PDF file), the logo of the EBI, the logo of the company that printed the
DNA (Custom Array) and a sketch of James Joyce were retrieved from the
DNA.