3. Synopsis
Introduction
History (Evolution of Memory Storage Devices)
Challenges of Big Data
What is DNA? Why DNA? (A biological perspective)
DNA data storage
How is data stored? (Algorithms, techniques, etc.)
Current research worldwide (case study by Microsoft)
Pros and cons
Applications and future scope
4. Introduction
Deoxyribonucleic acid (DNA) is a molecule that
carries the genetic (hereditary) instructions used
in the growth, development and functioning of all
known living organisms and many viruses.
Most DNA molecules consist of
two biopolymer strands coiled around each other
to form a double helix.
The information in DNA is stored as a code made
up of four nitrogenous bases: adenine (A), guanine
(G), cytosine (C), and thymine (T).
Nucleotide = nitrogenous base + sugar + phosphate.
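Because the code has four letters, each base can in principle carry two bits. The sketch below illustrates that idea only: the mapping (00 = A, 01 = C, 10 = G, 11 = T) and the names encode and decode are assumptions made for this example, not the scheme used in any of the studies this deck refers to.

```python
# Illustrative 2-bits-per-base mapping. This table is an assumption made for
# the example, not a published encoding scheme.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of bases, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Invert encode(): every four bases become one byte again."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

A practical scheme would also need error correction and would avoid long runs of the same base; this sketch ignores both.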
7. Earlier devices
In the mid-1700s – Punch card
It was used to input both programs and data.
Used as early as 1725 in the textile industry (for controlling
mechanized textile looms).
In 1946 – Selectron tube
Capacity - 32 to 512 bytes.
4096-bit Selectron was 10 inches long and 3 inches wide.
Cons - expensive, with production problems.
9. Earlier devices …
In 1932 – Magnetic drum memory
Memory capacity - 10 kB.
In 1951 – Magnetic tape
In 1956 – Hard disk drive
IBM Model 350 - It had 50 24-inch discs with a total storage
capacity of 5 million characters (just under 5 MB).
In 1971 – First Floppy drive (Diskette).
In 1978 – Compact disc
In 1980 – Hard disk drive (First 1 GB drive)
10. After 1990s …
DVD and flash storage (such as SD cards).
Microdrive
Holography.
Cloud storage.
11. History
The idea of recording, storing and retrieving
information on DNA molecules was first proposed
by Mikhail Neiman.
He published the idea in 1964–65 in the
Radiotekhnika journal, USSR (now Russia),
and the technology was referred to at the time
as MNeimONics (Mikhail Neiman
OligoNucleotides).
13. Introduction
What is big data?
Big data is a term for data sets that are so large or complex that
traditional data processing application software is inadequate to deal
with them.
Problem for existing DBMS…
Solutions:
1. Use a software framework
2. Use a new storage technology
14. Issues
1. Data Volume
2. Data Velocity
3. Data Variety
4. Data Value
5. Data Complexity
Example: Google Maps
16. Challenges
Privacy and security
Data access and sharing of information
Analytical challenges
Human resource and manpower
Technical – fault tolerance, scalability, quality of data
17. Solution 1: Framework/Software
Hadoop
Hadoop is an open-source framework (by Apache) for storing and processing big
data in a distributed environment across clusters of computers using simple
programming models. It is designed to scale up from single servers to thousands of
machines, each offering local computation and storage.
Let’s see how Hadoop works.
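The best-known of Hadoop's "simple programming models" is MapReduce. The sketch below imitates its three steps (map, shuffle, reduce) as a word count inside a single Python process. It uses no Hadoop API, and the function names map_phase, shuffle, reduce_phase and word_count are made up for this illustration; on a real cluster the map and reduce steps would run in parallel on many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data big storage", "DNA storage"])
print(counts)  # {'big': 2, 'data': 1, 'storage': 2, 'dna': 1}
```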
20. Why DNA?
1. Density of information that can be stored
- one gram of single-strand DNA could store as much as an exabyte
(10^18 bytes).
2. DNA storage is not rewritable
- good for archiving records
3. Preservation
- DNA can still be sequenced from dried mummies thousands of
years old, but such sequences are rarely complete.
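To put the density figure in perspective, here is a rough back-of-the-envelope calculation. It takes the one-exabyte-per-gram figure quoted above at face value and ignores any overhead for redundancy or indexing; the helper name grams_needed is an assumption made for this sketch.

```python
EXABYTE = 10**18          # bytes
BYTES_PER_GRAM = EXABYTE  # per-gram figure quoted on the slide, taken as given


def grams_needed(n_bytes):
    """Mass of DNA (in grams) needed to hold n_bytes at the quoted density."""
    return n_bytes / BYTES_PER_GRAM


ONE_TB = 10**12  # bytes in a 1 TB drive
print(grams_needed(ONE_TB))      # 1e-06, i.e. about one microgram
print(BYTES_PER_GRAM // ONE_TB)  # 1000000 one-terabyte drives per gram
```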
21. Polymerase Chain Reaction
PCR is a technique to make many copies of a specific DNA region in
vitro (in a test tube rather than an organism).
PCR relies on a thermostable DNA polymerase, Taq polymerase, and
requires DNA primers designed specifically for the DNA region of
interest.
In PCR, the reaction is repeatedly cycled through a series of
temperature changes, which allow many copies of the target region to
be produced.
PCR has many research and practical applications. It is routinely used
in DNA cloning, medical diagnostics, and forensic analysis of DNA.
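In the ideal case each temperature cycle doubles the number of copies of the target region, so the count grows exponentially with the number of cycles. The sketch below works through that arithmetic; the function name copies_after and the efficiency parameter (1.0 means perfect doubling) are assumptions added for illustration, and real reactions fall short of this ideal.

```python
def copies_after(cycles, start_copies=1, efficiency=1.0):
    """Idealised PCR yield.

    efficiency=1.0 means every template is duplicated in every cycle;
    lower values model an imperfect reaction.
    """
    return start_copies * (1 + efficiency) ** cycles


for n in (10, 20, 30):
    print(n, int(copies_after(n)))
# 10 1024
# 20 1048576
# 30 1073741824
```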
28. Advantages
The density of information that can be stored is very high: one gram of
single-strand DNA could store as much as an exabyte.
DNA storage is not rewritable, which makes it well suited to archiving records.
DNA can be preserved for a long time.
DNA can maintain its integrity without any power supply. Also, its
small size and weight make it easy to store and transport.
DNA is less susceptible to technical failures.
29. Disadvantages
High cost of DNA synthesis (around US$12,400 per
megabyte of data stored).
Data is read back at low speed.
DNA is not rewritable, i.e. the information it holds cannot be updated
without redoing the entire storage process.
DNA does not allow random access either: to access a
particular part of the stored data, all of the stored information has to
be decoded.
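The synthesis cost quoted above scales linearly with the amount of data, which the short calculation below makes concrete. It uses only the per-megabyte figure from this slide and assumes 1 MB = 10^6 bytes; the name synthesis_cost_usd is made up for this sketch.

```python
COST_PER_MB_USD = 12_400  # figure quoted on the slide
BYTES_PER_MB = 10**6      # assumption: decimal megabytes


def synthesis_cost_usd(n_bytes):
    """Estimated synthesis cost for n_bytes at the quoted per-megabyte price."""
    return n_bytes / BYTES_PER_MB * COST_PER_MB_USD


print(synthesis_cost_usd(10**9))  # 12400000.0, i.e. US$12.4 million for 1 GB
```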
30. References ...
www.google.co.in
Official website : University of Washington
Official website : Microsoft Inc.
Research paper by Siddhant Shrivastava and Rohan Badlani, International
Journal of Electrical Energy, Vol. 2, No. 2, June 2014
https://en.wikipedia.org/wiki/DNA
http://www.the-scientist.com/?articles.view/articleNo/32494/title/DNA-Data-Storage
https://www.khanacademy.org/science/biology/biotech-dna-technology/dna-sequencing-pcr-electrophoresis/a/polymerase-chain-reaction-pcr