This document discusses DNA digital data storage. It explains that DNA can store vastly more data than current technologies in a smaller space and last much longer. However, writing and reading DNA data is currently much slower and more expensive than modern storage methods. The document outlines how binary data can be converted to DNA nucleotide sequences and provides examples. It also reviews recent developments that aim to improve DNA data storage methods and decrease costs.
2. WHAT IS DNA DIGITAL DATA STORAGE?
DNA digital data storage is defined as
the process of encoding and decoding
binary data to and from synthesized DNA
strands.
It uses artificial DNA produced by
commercially available oligonucleotide
synthesis.
Figure 1: An artistic rendering of DNA storage
3. WHY IS IT NEEDED?
• New-generation computers and high-speed Internet have gained
popularity in recent years.
• But when it comes to handling big data, whether the data of a corporation
or of the world as a whole, present data storage technology
comes nowhere near managing it efficiently.
• Current data storage technology cannot cope with our needs
and also produces a lot of e-waste.
• Current storage media also cannot hold information for long periods of time.
Figure 2: Rise in E-waste
4. A COMPARISON
This picture shows why DNA is the obvious
option for the future!
From Magnetic to Genetic
Figure 3: A Comparison of Technologies
5. STRUCTURE OF DNA
• DNA contains four nucleobases: adenine (A),
guanine (G), cytosine (C) and
thymine (T).
• The bases pair into nucleotide base pairs A-T
and G-C.
• The backbone of the DNA strand is made
from alternating phosphate and sugar
residues.
• A single nucleotide can represent 2 bits
of information, since each position holds one of four bases.
Figure 4: Structure of DNA
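The 2-bits-per-nucleotide figure follows from there being four possible bases at each position. A minimal sketch of that arithmetic is below. The helper name nucleotides_needed is hypothetical, and the count assumes an ideal encoding with no error correction or addressing overhead.

```python
import math

BASES = "ACGT"

# Four choices per position carry log2(4) = 2 bits.
bits_per_nucleotide = math.log2(len(BASES))


def nucleotides_needed(num_bytes: int) -> int:
    """Nucleotides required for num_bytes of data.

    Assumption: ideal 2 bits per nucleotide, with no redundancy,
    error correction or addressing overhead.
    """
    return math.ceil(num_bytes * 8 / bits_per_nucleotide)
```

Under these assumptions one byte needs four nucleotides, so a 4-byte string needs 16.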
6. HOW DOES DNA WORK AS A STORAGE TECHNOLOGY?
• Source data in the form of binary bits (0 and 1) is first converted to a ternary code
(0, 1 and 2) to decrease the chance of encoding errors.
• Following the conversion, the digital data is encoded into the nucleobases of
DNA.
• By varying the order of the nucleobases A, T, G and C, the ternary code is
mapped onto nucleobase codes, producing repeated blocks of nucleobases
that encode the data.
• The encoded DNA can then be sequenced and read back to ternary and then to
binary data using technologies similar to those used to map the human
genome.
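The bullets above do not spell out how a ternary digit becomes a nucleobase. One way to do it, shown here only as a sketch, is a rotating code: each ternary digit selects one of the three bases that differ from the previous base, so the same base never appears twice in a row. The fixed 6-digit ternary width per byte, the starting base "A" and all function names are assumptions made for illustration, not details taken from these slides.

```python
BASES = "ACGT"
TRITS_PER_BYTE = 6  # assumption: fixed width, since 3**6 = 729 covers 0-255


def byte_to_trits(value: int) -> list:
    """Write one byte as base-3 digits, most significant first."""
    trits = []
    for _ in range(TRITS_PER_BYTE):
        value, digit = divmod(value, 3)
        trits.append(digit)
    return trits[::-1]


def trits_to_byte(trits: list) -> int:
    """Inverse of byte_to_trits."""
    value = 0
    for digit in trits:
        value = value * 3 + digit
    return value


def encode_bytes(data: bytes, start: str = "A") -> str:
    """Each ternary digit picks one of the three bases unlike the previous one."""
    prev, out = start, []
    for byte in data:
        for digit in byte_to_trits(byte):
            choices = [b for b in BASES if b != prev]
            prev = choices[digit]
            out.append(prev)
    return "".join(out)


def decode_bytes(dna: str, start: str = "A") -> bytes:
    """Recover ternary digits from the base transitions, then rebuild bytes."""
    prev, trits = start, []
    for base in dna:
        choices = [b for b in BASES if b != prev]
        trits.append(choices.index(base))  # raises ValueError on a repeated base
        prev = base
    return bytes(
        trits_to_byte(trits[i:i + TRITS_PER_BYTE])
        for i in range(0, len(trits), TRITS_PER_BYTE)
    )
```

A repeated base in the input to decode_bytes cannot come from this encoder, so the resulting ValueError acts as a crude error signal.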
9. EXAMPLE
• Let's take “VVIT” as a sample string.
• First, represent each letter by its ASCII code.
• From the ASCII table: V = 86, V = 86, I = 73, T = 84
• Then convert to quaternary (base-4) numbers: 86 = 1112, 86 = 1112, 73 = 1021,
84 = 1110
• Use A, T, C and G to represent the digits:
• 0 = A, 1 = T, 2 = C, 3 = G
• 1112111210211110 becomes
TTTCTTTCTACTTTTA
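The steps of this example can be sketched in a few lines of Python. The digit-to-base table is the one given above. The function names and the fixed width of four quaternary digits per character (enough for any code from 0 to 255) are assumptions added for illustration.

```python
DIGIT_TO_BASE = {"0": "A", "1": "T", "2": "C", "3": "G"}
BASE_TO_DIGIT = {base: digit for digit, base in DIGIT_TO_BASE.items()}
DIGITS_PER_CHAR = 4  # assumption: 4 base-4 digits cover codes 0-255


def to_quaternary(value: int) -> str:
    """Write a character code as a fixed-width base-4 string."""
    digits = []
    for _ in range(DIGITS_PER_CHAR):
        value, digit = divmod(value, 4)
        digits.append(str(digit))
    return "".join(reversed(digits))


def encode_text(text: str) -> str:
    """ASCII text -> quaternary digits -> nucleobases."""
    quaternary = "".join(to_quaternary(ord(ch)) for ch in text)
    return "".join(DIGIT_TO_BASE[d] for d in quaternary)


def decode_dna(dna: str) -> str:
    """Nucleobases -> quaternary digits -> ASCII text."""
    digits = "".join(BASE_TO_DIGIT[base] for base in dna)
    return "".join(
        chr(int(digits[i:i + DIGITS_PER_CHAR], 4))
        for i in range(0, len(digits), DIGITS_PER_CHAR)
    )


print(encode_text("VVIT"))  # TTTCTTTCTACTTTTA
```

Calling decode_dna("TTTCTTTCTACTTTTA") returns the original string "VVIT".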
10. WHY DNA?
• A mere milligram of the molecule
could encode the complete text of
every book in the Library of Congress.
• Very high data density.
• More compact than current magnetic
tape or hard drive storage.
• As the graph on the right shows, the cost to
manufacture genomes is going down.
• Hence the future looks bright.
Figure 5: Graph showing the cost to manufacture one genome
11. DISADVANTAGES
Cost: The production cost of generating raw, unassembled sequence
(reading) data is high. Synthesizing artificial sequences (writing) is even costlier.
Speed: Speed is low. The fastest current technology can sequence (read)
DNA at about 1 billion bases per hour. Synthesis (write) is
even slower and more expensive. This is extremely slow compared
to modern storage media, but acceptable for long-term data storage.
Rewriting: This is essentially a write-once technology, but static data such as
government and historical records could benefit from this storage option.
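To put the read speed in familiar units, the sketch below converts 1 billion bases per hour into bytes per second. It assumes the ideal 2 bits per base with no redundancy, so a real system would achieve less. The function names are hypothetical.

```python
BASES_PER_HOUR = 1_000_000_000  # read rate quoted above
BITS_PER_BASE = 2               # assumption: ideal encoding, no redundancy


def read_rate_bytes_per_second(bases_per_hour: int = BASES_PER_HOUR) -> float:
    """Convert a sequencing rate in bases/hour to bytes/second."""
    return bases_per_hour * BITS_PER_BASE / 8 / 3600


def hours_to_read(num_bytes: int, bases_per_hour: int = BASES_PER_HOUR) -> float:
    """Hours needed to sequence num_bytes of stored data."""
    return num_bytes * 8 / BITS_PER_BASE / bases_per_hour
```

Under these assumptions the read rate is roughly 69 kilobytes per second, and reading back one terabyte (10**12 bytes) would take about 4000 hours.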
12. DEVELOPMENTS
• In 2016, research by George Church and Technicolor Research and Innovation was
published in which 22 MB of an MPEG-compressed movie sequence were
stored in and recovered from DNA.
• In March 2017, Yaniv Erlich and Dina Zielinski of Columbia University and
the New York Genome Center published a method known as DNA Fountain
that stored data at a density of 215 petabytes per gram of DNA. The technique
approaches the Shannon capacity of DNA storage, achieving 85% of the
theoretical limit. The method was not ready for large-scale use, as it cost
$7000 to synthesize 2 megabytes of data and another $2000 to read it.
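The cost figures for DNA Fountain can be turned into a per-megabyte price. The sketch below does that arithmetic and extrapolates to larger volumes. The linear scaling is an assumption made only for illustration, since real synthesis and sequencing prices need not scale this way, and the function names are hypothetical.

```python
SYNTHESIS_COST_USD = 7000  # cost to write 2 MB, as quoted above
READ_COST_USD = 2000       # cost to read the same 2 MB
DATA_MB = 2


def cost_per_mb(total_cost_usd: float, data_mb: float = DATA_MB) -> float:
    """Cost in US dollars per megabyte."""
    return total_cost_usd / data_mb


def projected_cost_usd(data_mb: float) -> float:
    """Write plus read cost for data_mb megabytes.

    Assumption: cost scales linearly with data volume.
    """
    per_mb = cost_per_mb(SYNTHESIS_COST_USD) + cost_per_mb(READ_COST_USD)
    return data_mb * per_mb
```

At these prices writing costs $3500 per megabyte and reading $1000 per megabyte, so storing and retrieving 1000 MB would come to about $4.5 million under the linear assumption.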
13. DEVELOPMENTS
• In March 2018, the University of Washington and Microsoft published results
demonstrating storage and retrieval of approximately 200 MB of data. The
research also proposed and evaluated a method for random access to data
items stored in DNA.
• Research published by Eurecom and Imperial College in January 2019
demonstrated the ability to store structured data in synthetic DNA. The
research showed how to encode structured or, more specifically, relational
data in synthetic DNA and also demonstrated how to perform data processing
operations (similar to SQL) directly on the DNA as chemical processes.
14. DNA FOUNTAIN
• DNA Fountain is a strategy to store and
retrieve DNA information that is robust and
approaches the theoretical maximum of
information that can be stored per nucleotide.
• The success of this strategy lies in the careful
adaptation of recent developments in coding
theory to the domain-specific constraints of
DNA storage.
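The slide does not list the domain-specific constraints. Two that are commonly cited for DNA synthesis and sequencing are a balanced GC content and the absence of long runs of a single base. The sketch below shows only such a screening step, not the fountain-code part of the method. The thresholds and function names are illustrative assumptions, not values taken from these slides.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of one repeated base."""
    longest = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest


def passes_screen(seq: str, gc_min: float = 0.45, gc_max: float = 0.55,
                  max_run: int = 3) -> bool:
    """Accept a candidate strand only if it satisfies both constraints.

    The default thresholds are illustrative assumptions.
    """
    if not seq:
        return False
    return (gc_min <= gc_content(seq) <= gc_max
            and longest_homopolymer(seq) <= max_run)
```

An encoder built this way would generate candidate strands and keep only those for which passes_screen returns True.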
15. APPLICATIONS
• National security, for information hiding
and data steganography.
• Safely preserving a person's personal information, such as medical
information and family history, in their own body.
• Storage of archival documents.
16. CONCLUSION
• At present, DNA storage is experimental. Before it becomes commonplace, it
needs to be completely automated, and the processes of both building DNA and
reading it must be improved.
• Both processes are prone to error and relatively slow. For example, today’s DNA
synthesis lets us write a few hundred bytes per second; a modern hard drive can
write hundreds of millions of bytes per second.
• These are significant challenges, but we are optimistic because all the relevant
technologies are improving rapidly.
• Further, DNA data storage doesn’t need the perfect accuracy that biology requires,
so researchers are likely to find even cheaper and faster ways to store information
in nature’s oldest data storage system.
17. A STORY TO END THE SEMINAR
• On January 21, 2015, Nick Goldman of the European Bioinformatics Institute (EBI)
announced the Davos Bitcoin Challenge at the World Economic Forum annual meeting
in Davos.
• During his presentation, tubes of DNA were handed out to the audience with the
message that each tube contained the private key of exactly one bitcoin, all coded in
DNA.
• The first person to sequence and decode the DNA could claim the bitcoin and win the
challenge. The challenge was set for three years and would close if nobody claimed the
prize before January 21, 2018.
18. A STORY TO END THE SEMINAR
• Almost three years later on January 19, 2018, the EBI announced that a Belgian
PhD student, Sander Wuyts of the University of Antwerp, was the first one to
complete the challenge.
• In addition to the instructions on how to claim the bitcoin (stored as a plain text
file and a PDF file), the logo of the EBI, the logo of the company that printed the
DNA (Custom Array) and a sketch of James Joyce were retrieved from the
DNA.