Course name : Seminar(IT331)
Faculty Guide : Prof. Nikita Patel
Presented By : Ravi Vaniya (15IT036)
Sanat Dhobi (15IT027)
From Magnetic drive to Genomic drive
Synopsis
 Introduction
 History (Evolution of Memory Storage Devices)
 Challenges of Big Data
 What is DNA? Why DNA? (A biological perspective)
 DNA data storage
 How is data stored? (Algorithms, techniques, etc.)
 Current research worldwide (case study by Microsoft)
 Pros and Cons
 Application and Future scope
Introduction
 Deoxyribonucleic acid (DNA) is a molecule that
carries the genetic (hereditary) instructions used
in the growth, development and functioning of all
known living organisms and many viruses.
 Most DNA molecules consist of
two biopolymer strands coiled around each other
to form a double helix.
 The information in DNA is stored as a code made
up of four nitrogen bases: adenine (A), guanine
(G), cytosine (C), and thymine (T).
 Nucleotide = Nitrogen base + Sugar + Phosphate.
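Because there are four bases, each base can in principle carry two bits. The sketch below shows one way bytes could be written as a base sequence and read back. The mapping (A=00, C=01, G=10, T=11) and the function names `encode` and `decode` are illustrative assumptions, not the scheme used in any of the research discussed in these slides.

```python
# Illustrative mapping only: 2 bits per base (an assumption, not a published scheme).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of bases, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Reverse of encode: read two bits per base and regroup into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

Practical schemes usually add redundancy and avoid long runs of the same base, which this sketch does not attempt.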
History
(Evolution of Memory Storage Devices)
Earlier devices
 In the mid-1700s – Punch card
Used to input both programs and data.
Used as early as 1725 in the textile industry (for controlling
mechanized textile looms).
 In 1946 – Selectron tube
Capacity - 32 to 512 bytes.
4096-bit Selectron was 10 inches long and 3 inches wide.
Cons - expensive, with production problems.
Courtesy: Wikipedia
Earlier devices …
 In 1932 – Magnetic drum memory
Memory capacity - 10 kB.
 In 1951 – Magnetic tape
 In 1956 – Hard disk drive
 IBM Model 350 - It had fifty 24-inch discs with a total storage
capacity of 5 million 6-bit characters (about 3.75 MB).
 In 1971 – First Floppy drive (Diskette).
 In 1978 – Compact disc
 In 1980 – Hard disk drive (First 1 GB drive)
After 1990s …
 DVD and Flash storage (like SD card).
 Micro drive
 Holography.
 Cloud storage.
History
 The idea of recording, storing and retrieving
information on DNA molecules was first
proposed by Mikhail Neiman.
 He published his idea in 1964–65 in the
Radiotekhnika journal, USSR (now Russia),
and the technology was at that time
referred to as MNeimONics (Mikhail Neiman
OligoNucleotides).
Challenges of Big Data
Introduction
 What is big data?
Big data is a term for data sets that are so large or complex that
traditional data processing application software is inadequate to deal
with them.
 Problem for existing DBMS…
 Solutions:
1. Use software/framework
2. Some new technology
Issues
1. Data Volume
2. Data Velocity
3. Data Variety
4. Data Value
5. Data Complexity
Example: Google Maps
Challenges
 Privacy and security
 Data access and sharing of information
 Analytical challenges
 Human resource and manpower
 Technical – fault tolerance, scalability, quality of data
Solution – 1 : Framework/Software
 Hadoop
Hadoop is an open-source framework (by Apache) that stores and processes big
data in a distributed environment across clusters of computers using simple
programming models. It is designed to scale up from a single server to thousands of
machines, each offering local computation and storage.
 Let’s see how Hadoop works.
Figure: Traditional approach, Google’s solution, and Hadoop
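The "simple programming model" most associated with Hadoop is MapReduce. The sketch below imitates its three stages (map, shuffle, reduce) in plain Python on a word-count task. It does not use any Hadoop API; the function names and the sample `chunks` are illustrative assumptions, and everything runs in one process rather than across a cluster.

```python
from collections import defaultdict


def map_phase(chunk: str):
    """Emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group the emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


# Each string stands in for a block of a file stored on a different machine.
chunks = ["big data needs storage", "DNA storage is dense", "big storage"]
pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
word_counts = reduce_phase(shuffle(pairs))
print(word_counts["storage"])  # 3
```

In a real cluster the map calls would run where each block is stored, which is the "local computation and storage" idea described above.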
Why DNA?
1. Density of information that can be stored
- one gram of single-strand DNA could store as much as an exabyte
(10^18 bytes).
2. DNA storage is not re-writable
- good for archiving records
3. Preservation
- DNA can still be sequenced from dried mummies thousands of
years old, but such sequences are rarely complete.
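A rough back-of-the-envelope check of the density claim can be done in a few lines. The sketch assumes an ideal encoding of 2 bits per base with no redundancy and an average molar mass of about 330 g/mol per single-stranded nucleotide; both numbers are assumptions for illustration, not values taken from these slides.

```python
AVOGADRO = 6.022e23            # molecules per mole
NUCLEOTIDE_MOLAR_MASS = 330.0  # g/mol, assumed average for single-stranded DNA
BITS_PER_BASE = 2              # assumed ideal encoding, no redundancy

exabyte_bits = 1e18 * 8                        # one exabyte expressed in bits
bases_needed = exabyte_bits / BITS_PER_BASE    # nucleotides needed to hold it
grams_per_base = NUCLEOTIDE_MOLAR_MASS / AVOGADRO
grams_per_exabyte = bases_needed * grams_per_base

print(f"{grams_per_exabyte * 1000:.2f} mg of DNA per exabyte")  # 2.19 mg of DNA per exabyte
```

Under these assumptions an exabyte needs only a few milligrams of raw DNA, so a figure of one exabyte per gram leaves a wide margin for redundancy, addressing and error correction.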
Polymerase Chain Reaction
 PCR is a technique to make many copies of a specific DNA region in
vitro (in a test tube rather than an organism).
 PCR relies on a thermostable DNA polymerase, Taq polymerase, and
requires DNA primers designed specifically for the DNA region of
interest.
 In PCR, the reaction is repeatedly cycled through a series of
temperature changes, which allow many copies of the target region to
be produced.
 PCR has many research and practical applications. It is routinely used
in DNA cloning, medical diagnostics, and forensic analysis of DNA.
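The effect of repeated cycling can be sketched numerically: in the ideal case each cycle doubles the number of copies of the target region. The function name `pcr_copies` and the `efficiency` parameter are illustrative assumptions; real reactions fall short of perfect doubling and eventually plateau, which this simple model ignores.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Copies after `cycles` rounds; efficiency=1.0 means perfect doubling."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


print(pcr_copies(1, 30))  # 1073741824.0 with perfect doubling
```

A single molecule would become roughly a billion copies after 30 ideal cycles, which is why PCR makes a small target region easy to detect and sequence.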
Data Storage
Data put(Key,Value) process
Data get(Key) process
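The put and get processes are shown only as figures in the slides. As a toy, in-memory model of the idea, the sketch below tags each stored value with a base sequence derived from its key and finds it again by scanning the pool for that tag. Everything here is an assumption made for illustration: the class `DnaPool`, the hash-derived 20-base tag standing in for a primer, and the 2-bits-per-base mapping. It is not the method used by the researchers.

```python
import hashlib

BASES = "ACGT"        # index of each base is its 2-bit value (assumed mapping)
KEY_TAG_LENGTH = 20   # bases in the key tag (hypothetical choice)


def bytes_to_bases(data: bytes) -> str:
    """Two bits per base, most significant bits first."""
    return "".join(BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0))


def bases_to_bytes(strand: str) -> bytes:
    """Inverse of bytes_to_bases; expects a length that is a multiple of 4."""
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def key_tag(key: str) -> str:
    """Derive a fixed-length base sequence from the key (stands in for a primer)."""
    digest = hashlib.sha256(key.encode()).digest()
    return bytes_to_bases(digest)[:KEY_TAG_LENGTH]


class DnaPool:
    """A list of strands standing in for a tube of synthesized DNA."""

    def __init__(self):
        self.strands = []

    def put(self, key: str, value: bytes) -> None:
        """Encode the value and prefix it with the tag derived from the key."""
        self.strands.append(key_tag(key) + bytes_to_bases(value))

    def get(self, key: str) -> bytes:
        """Scan the pool for a strand carrying the key's tag and decode it."""
        tag = key_tag(key)
        for strand in self.strands:
            if strand.startswith(tag):
                return bases_to_bytes(strand[len(tag):])
        raise KeyError(key)


pool = DnaPool()
pool.put("photo", b"cat")
pool.put("note", b"hello")
print(pool.get("note"))  # b'hello'
```

A real system would split large values across many short strands and add error correction; the sketch keeps one strand per key to stay readable.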
Advantages
 The density of information that can be stored is very high, i.e. one gram of
single-strand DNA could store as much as an exabyte.
 DNA storage is not rewritable, which makes it good for archiving records.
 DNA can be preserved for a long time.
 DNA can maintain its integrity without any power supply. Also, its
small size and weight make it easy to store and transport.
 DNA is less susceptible to technical failures.
Disadvantages
 High cost of DNA synthesis (around US$12,400 per
megabyte of data stored).
 Data is read back at low speed.
 DNA is not rewritable, i.e. the information it holds can’t be updated
without redoing the entire storing process.
 DNA does not allow random access either: to access a
particular part of the stored data, the entire stored information must
be decoded.
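To put the synthesis cost in perspective, the quoted per-megabyte figure can be scaled up. The sketch assumes the cost grows linearly with size and takes 1 GB as 1000 MB; the function name `synthesis_cost_usd` is an illustrative assumption.

```python
COST_PER_MB_USD = 12_400  # per-megabyte synthesis cost quoted above


def synthesis_cost_usd(megabytes: float) -> float:
    """Estimated synthesis cost, assuming it scales linearly with data size."""
    return megabytes * COST_PER_MB_USD


print(f"${synthesis_cost_usd(1000):,.0f}")  # $12,400,000 for 1 GB
```

At this rate even a single gigabyte costs millions of dollars to write, which is why the technique currently suits only small, high-value archives.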
References ...
 www.google.co.in
 Official website : University of Washington
 Official website : Microsoft Inc.
 Research paper by Siddhant Shrivastava and Rohan Badlani, International
Journal of Electrical Energy, Vol. 2, No. 2, June 2014
 https://en.wikipedia.org/wiki/DNA
 http://www.the-scientist.com/?articles.view/articleNo/32494/title/DNA-Data-Storage
 https://www.khanacademy.org/science/biology/biotech-dna-technology/dna-sequencing-pcr-electrophoresis/a/polymerase-chain-reaction-pcr
DNA data storage
