Ahmad Ali
IBGE
TRENDS IN DNA SEQUENCING,
METHODS AND APPLICATIONS
CONTENTS:
• DNA
• DNA SEQUENCING
• PURPOSE
• DNA SEQUENCING METHODS
• HISTORICAL METHOD
• NEXT GENERATION METHODS
• APPLICATIONS
• Deoxyribonucleic acid (DNA) is a nucleic acid whose functions
include
 Storage of genetic information
 Self-duplication & inheritance
 Expression of the genetic message
• DNA’s major function is to code for proteins. Information is
encoded in the order of the nitrogenous bases.
Deoxyribonucleic Acid
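As a small illustration of information being encoded in base order, the sketch below reads a short coding-strand sequence three bases at a time and maps each codon to an amino acid. The codon table is deliberately partial (only the codons used here), and the function name `translate` and the example sequence are illustrative assumptions, not taken from the slides.

```python
# Partial standard genetic code: only the codons needed for this example.
CODON_TABLE = {
    "ATG": "M",  # methionine (start)
    "GCC": "A",  # alanine
    "AAA": "K",  # lysine
    "TGG": "W",  # tryptophan
    "TAA": "*",  # stop
}

def translate(dna):
    """Read the coding strand three bases at a time until a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "?")  # '?' = codon not in the partial table
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("ATGGCCAAATGGTAA"))  # MAKW
```

Swapping the order of two codons changes the protein, which is the sense in which the order of bases carries the message.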
DNA SEQUENCING
Determining the order of bases in a
section of DNA
To analyze gene structure and its
relation to gene expression as well as
protein conformation
PURPOSE
Deciphering “code of life”
Detecting mutations
Typing microorganisms
Identifying human haplotypes
Designating polymorphisms
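Once a sequence is known, detecting mutations reduces to comparing it with a reference. The sketch below is a minimal illustration that assumes the two sequences are already aligned and of equal length; real variant calling also has to handle insertions, deletions and sequencing errors. The function name and example sequences are hypothetical.

```python
def find_substitutions(reference, sample):
    """Return (1-based position, reference base, sample base) for each mismatch."""
    if len(reference) != len(sample):
        raise ValueError("this sketch only handles equal-length, pre-aligned sequences")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != alt_base
    ]

print(find_substitutions("ACGTACGT", "ACGTTCGA"))  # [(5, 'A', 'T'), (8, 'T', 'A')]
```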
DNA SEQUENCING METHODS
•Historically there are two main methods of DNA
sequencing
1. Maxam and Gilbert method
2. Sanger method
Modern sequencing equipment uses the principles of
the Sanger technique.
•A. M. Maxam and W. Gilbert, 1975
•Chemical sequencing
•Treatment of DNA with certain
chemicals → DNA is cut into
fragments → fragment sizes are read
to determine the sequence
MAXAM & GILBERT METHOD
A graphical demonstration
Principle
Maxam-Gilbert 1975
Chemical Sequencing
Issues:
• Need perfectly pure single species of DNA
• Nasty Chemicals
• Radioactive End-labeling
• 4-lanes/read
• Sequence only what you can purify
Advantages:
- First DNA sequencing method available
- 200-300 bp/read
Fragment
population
distribution
corresponds to
appearance of
base within
sequence
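The read-out logic can be sketched as follows, assuming the classic set of four base-specific cleavage reactions (G, A+G, C+T, C), which the slides do not list. Each reaction cuts wherever one of its target bases occurs, so the positions of the bands in each lane reveal where that base appears. This is an idealized model with one clean band per position; function names are illustrative.

```python
def cleavage_lanes(sequence):
    """Positions (1-based from the labeled end) where each reaction cuts."""
    reactions = {"G": "G", "A+G": "AG", "C+T": "CT", "C": "C"}
    return {
        lane: {i for i, base in enumerate(sequence, start=1) if base in targets}
        for lane, targets in reactions.items()
    }

def read_lanes(lanes, length):
    """Infer each base from which lanes show a band at that position."""
    calls = []
    for pos in range(1, length + 1):
        if pos in lanes["G"]:
            calls.append("G")
        elif pos in lanes["A+G"]:
            calls.append("A")  # band in A+G but not in G
        elif pos in lanes["C"]:
            calls.append("C")
        elif pos in lanes["C+T"]:
            calls.append("T")  # band in C+T but not in C
        else:
            calls.append("N")  # no band in any lane
    return "".join(calls)

seq = "GATTACA"
print(read_lanes(cleavage_lanes(seq), len(seq)))  # GATTACA
```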
• Most common approach used for DNA
sequencing.
• Invented by Frederick Sanger - 1977
• Nobel prize - 1980
• Also termed the chain termination or
dideoxy method
SANGER METHOD
Sanger sequencing: chain-terminating inhibitors
Sanger “Sequencing-by-Synthesis” 1977
Issues:
- Radioactive End-labeling
- 4-lanes/read
- Sequencing gels
Advantages:
- 400-500 bp/read
- Radioactive Incorporation
- Primer gives you control
dNTP ddNTP
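Chain termination can be illustrated with a toy model: in each of the four reactions, synthesis stops wherever the matching ddNTP happens to be incorporated, so the fragment lengths in a lane mark every position of that base. Reading the bands from shortest to longest across the four lanes gives the sequence. The sketch below is idealized (exactly one fragment per position) and all names and the example strand are hypothetical.

```python
def termination_lengths(synthesized, dd_base):
    """Lengths of fragments that end where the given ddNTP was incorporated."""
    return [i for i, base in enumerate(synthesized, start=1) if base == dd_base]

def read_gel(lanes):
    """Read bands from shortest to longest across the four lanes."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)

synthesized = "ACGTTGCA"  # strand built by the polymerase (hypothetical example)
lanes = {base: termination_lengths(synthesized, base) for base in "ACGT"}
print(lanes)            # {'A': [1, 8], 'C': [2, 7], 'G': [3, 6], 'T': [4, 5]}
print(read_gel(lanes))  # ACGTTGCA
```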
PCR Dye-Terminator 1990s
Issues:
- Sequencing gels
- 1 run/day
Advantages:
- 600-700 bp/read
- 96 reads/run
- Each terminator dye has a different color.
Lets you combine all 4 reactions in one
lane.
- Single lane/read
- Primer gives you control
A breakthrough: fluorescent chain-terminating inhibitors
Automated DNA sequencer
• Capillary electrophoresis
• Costs reduced by 90%
• Human operation 15 min/day/machine
• 1 million bp/day
3730xl DNA Analyzer
First generation DNA sequencer
• Manual preparation of acrylamide gels
• Manual loading of samples
• Contigs of 500-600 bp
• 2.4 million bp/year
(1000 years needed to sequence the human genome)
ABI PRISM 377
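The throughput figures above can be checked with simple arithmetic. The sketch assumes a human genome of roughly 3 billion bp (a figure not given on the slide) and a single pass with no redundancy, so the results are order-of-magnitude estimates in line with the "1000 years" quoted above.

```python
GENOME_BP = 3_000_000_000  # assumed human genome size, about 3 billion bp

gel_bp_per_year = 2_400_000               # slab-gel figure from the slide
capillary_bp_per_year = 1_000_000 * 365   # 1 million bp/day from the slide

def years_for_one_pass(bp_per_year, genome_bp=GENOME_BP):
    """Years for one machine to read the genome once (1x coverage)."""
    return genome_bp / bp_per_year

print(round(years_for_one_pass(gel_bp_per_year)))           # 1250
print(round(years_for_one_pass(capillary_bp_per_year), 1))  # 8.2
```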
Human Genome Project (15 years): Hierarchical
Shotgun Sequencing [started 1990]
- Randomly insert Human DNA into BAC clones (~150kbp each)
- Combine these BAC clones to create a scaffold of the human
genome. Each BAC clone will be mapped to a region on a Human
Chromosome
- Pass BAC clones to different Genome Centers throughout US
- At each center, each vector is sequenced using shotgun sequencing
- Wait 15 years for results.
Issues with Shotgun Sequencing
• Reads -> contigs -> scaffolds -> genome reconstruction
• Repeat regions can confuse contig assemblers.
• It was hoped that by focusing each shotgun run on a single 40-150 kb region, these issues
would be minimized.
• According to Venter, it simply multiplied the number of times one encountered the same
problem.
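The reads -> contigs step can be sketched with a greedy overlap merge: repeatedly join the two sequences that share the longest suffix/prefix overlap. This is a teaching sketch, not how production assemblers work, and the function names, the minimum overlap of 3 and the reads are all hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(reads)
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:
            return contigs  # nothing left to merge
        merged = contigs[i] + contigs[j][size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]

print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))  # ['ATGGCGTACCTGA']
```

When a repeat is longer than the reads, several reads share the same overlap and the merge order becomes ambiguous, which is the confusion referred to above.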
Shotgun Sequencing: Venter 1997
Same approach is used throughout NGS
Paired-end sequencing:
1. Randomly cut genomic DNA.
2. Use gel purification to make three
libraries of random DNA fragments:
2 kb, 10 kb, 50 kb
3. Sequence from both ends.
4. Use distance information to assemble
contigs into scaffolds.
Distance information allows you to ‘jump’
over repeat regions.
This approach allowed Venter to ‘jump’ over
the federal sequencing project
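The use of distance information can be sketched with one read pair: if the two ends of a fragment of known size land on different contigs, the part of the fragment not accounted for by either contig estimates the gap between them. All names and numbers below are hypothetical, and real scaffolders combine many pairs and allow for variation in fragment size.

```python
def estimate_gap(insert_size, read1_start, contig_a_len, read2_end):
    """Estimate the gap between contig A and contig B from one read pair.

    read1_start: 0-based position on contig A where the fragment begins.
    read2_end:   0-based position on contig B just past where the fragment ends.
    """
    covered_on_a = contig_a_len - read1_start  # fragment bases lying on contig A
    covered_on_b = read2_end                   # fragment bases lying on contig B
    return insert_size - covered_on_a - covered_on_b

# A 2 kb library fragment starts 600 bp before the end of contig A
# and ends 500 bp into contig B.
print(estimate_gap(insert_size=2000, read1_start=4400, contig_a_len=5000, read2_end=500))  # 900
```

A 900 bp gap here could hide a repeat that no single read spans, yet the pair still fixes the order and spacing of the two contigs.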
NEXT GENERATION
SEQUENCING
 Common Pipeline.
 Library Prep.
 Sequencing – Massively Parallel Sequencing.
 Bioinformatics - Data Analysis .

 Popular Platforms (commercially available):
 Roche 454
 AB SOLiD
 Illumina (HiSeq, MiSeq).
 Newer Platforms (Experimental):
 Ion Torrent
 PacBio RS
 Oxford Nanopore.
Roche 454 – Library Prep.
Emulsion PCR (emPCR)
1. DNA fragmentation and
adaptor ligation.
2. DNA fragments are added
to an oil mixture containing
millions of beads.
3. Emulsion PCR results in
multiple copies of the
fragment.
4. Beads are deposited on plate
wells ready for sequencing.
Roche 454 – Pyrosequencing
M.L. Metzker, Nature Reviews Genetics (2010)
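Pyrosequencing read-out can be sketched as a flowgram: one nucleotide type is flowed at a time, and the light signal scales with how many bases are incorporated in that flow. The sketch below assumes a repeating T, A, C, G flow order and a detector that saturates at 6 bases; both are illustrative assumptions, as are the function names. It shows why long homopolymer runs come out as deletions.

```python
def flowgram(synthesized, flow_order="TACG", cycles=4):
    """Signal per nucleotide flow: the number of bases incorporated in that flow."""
    pos, signals = 0, []
    for flow in range(cycles * len(flow_order)):
        base = flow_order[flow % len(flow_order)]
        run = 0
        while pos < len(synthesized) and synthesized[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))
    return signals

def call_bases(signals, max_detectable=6):
    """Turn signals back into sequence; long runs saturate at max_detectable."""
    return "".join(base * min(run, max_detectable) for base, run in signals)

true_sequence = "TTACCCCCCCCG"  # contains a run of eight C's
print(call_bases(flowgram(true_sequence)))  # TTACCCCCCG (two C's lost)
```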
Illumina – Library Prep.
Illumina – Sequencing by Synthesis
1. Add dye-labeled
nucleotides.
2. Scan and detect nucleotide
specific fluorescence.
3. Remove 3’ – blocking group
(Reversible termination).
4. Cleave fluorescent group.
5. Rinse and Repeat.
http://hmg.oxfordjournals.org
Illumina – Sequencing by Synthesis
Platform          Run Time    Yield (GB)   Error Type   Error Rate
GAIIx             14 days     96           Sub          0.1
HiSeq 2000/2500   10/2 days   600/120      Sub          0.03*
MiSeq             1 day       2            Sub          0.03*
Illumina
Advantages
 High throughput /low cost.
 Suitable for a wide range of applications, most notably
whole genome sequencing.
Disadvantages
 Substitution error rates (recently improved).
 Lagging strand dephasing causes sequence quality
deterioration towards the end of read.
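A simplified way to see why quality falls towards the end of a read: if in every cycle a small fraction of the strands in a cluster falls out of step, the in-phase fraction after n cycles is (1 - p)^n. The per-cycle rate used below (0.5%) is a made-up value for illustration, and the model ignores other effects such as signal decay.

```python
def in_phase_fraction(cycle, phasing_rate):
    """Fraction of strands in a cluster still in step after `cycle` cycles."""
    return (1.0 - phasing_rate) ** cycle

# Hypothetical per-cycle dephasing rate of 0.5%.
for cycle in (1, 50, 100, 150):
    print(cycle, round(in_phase_fraction(cycle, phasing_rate=0.005), 3))
```

The printed fraction shrinks with every cycle, so the signal from the correct base is increasingly mixed with signal from neighbouring positions.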
SOLiD (Life/AB)
 SOLiD: Sequencing by Oligonucleotide Ligation and Detection.
 Library Prep: emPCR
 “2-base encoding” – Instead of the typical single dNTP addition,
two-base matching probes are used (16 possible probes).
 Color Space – four color sequencing encoding further increases
accuracy.
SOLiD (Life/AB)
1. Anneal primer and hybridize
probe (8-mer).
2. Ligation and detection.
3. Cleave fluorescent tail (3-mer).
4. Repeat ligation cycle.
Repeat steps 1-4 with (n-1) primer.
SOLiD – Color Space
• The 16 possible two-base combinations are
represented by 4 colors.
• All possible sequence
combinations need to be decoded.
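Two-base encoding can be sketched compactly. The slides do not give the color table; the sketch below assumes the commonly published scheme, which is equivalent to XOR-ing 2-bit base codes (A=0, C=1, G=2, T=3), so that each of the 4 colors stands for 4 of the 16 dinucleotides. Colors are shown as the numbers 0-3, and the function names are illustrative.

```python
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}

def encode(sequence):
    """Keep the first base, then emit one color (0-3) per adjacent base pair."""
    colors = [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(sequence, sequence[1:])]
    return sequence[0], colors

def decode(first_base, colors):
    """Rebuild the bases by walking the colors from the known first base."""
    bases = [first_base]
    for color in colors:
        bases.append(BITS_TO_BASE[BASE_TO_BITS[bases[-1]] ^ color])
    return "".join(bases)

first, colors = encode("ATGGCA")
print(first, colors)          # A [3, 1, 0, 3, 1]
print(decode(first, colors))  # ATGGCA
```

Because every base is covered by two colors, a true substitution changes two adjacent colors, whereas a single miscalled color changes every base decoded after it; this difference is what lets the decoder tell real variants from measurement errors.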
SOLiD – SNP Detection
Summary –
Due to the 2-base encoding system you get a very low error rate (around 0.01).
It is still, however, limited by short reads (35 bp).
Ion Torrent (Life Technologies)
J. M. Rothberg, et al. Nature (2011) 475:348-352
• Similar to pyrosequencing but uses a
semiconductor chip to detect dNTP
incorporation.
• The chip measures differences in pH.
• Shown to have problems with
homopolymer reads and coverage bias
with GC-rich regions.
• Ion Proton™ promises higher output and
longer reads.
PacBio RS
 Single Molecule Sequencing – instead of
sequencing clonally amplified templates from beads
(pyrosequencing) or clusters (Illumina), DNA synthesis is
detected on a single DNA strand.
Zero-mode waveguide (ZMW)
• DNA polymerase is affixed to the bottom of a
tiny hole (~70 nm).
• Only the bottom portion of the hole is
illuminated allowing for detection of
incorporation of dye-labeled nucleotide.
PacBio RS
Advantages
 No amplification required.
 Extremely long read lengths.
 Average 2,500 bp. Longest 15,000 bp.
Disadvantages
 High error rates.
Oxford Nanopore
 Announced Feb. 2012 at
AGBT conference.
 Measure changes in ion
flow through nanopore.
 Potential for long read
lengths and short
sequencing times.
http://www2.technologyreview.com
Nanopore sequencing
 The system uses the Staphylococcus aureus toxin α-hemolysin, a robust heptameric
protein which normally forms holes in membranes.
 DNA and RNA can be electrophoretically driven through a nanopore of suitable
diameter (Kasianowicz J.J. et al 1996 PNAS)
Nanopore sequencing – how does it work?
 When a small voltage (~100 mV) is imposed across a
nanopore in a membrane separating two chambers
containing aqueous electrolytes, the ionic current
through the pore can be measured
 Molecules going through the nanopore cause
disruption in the ionic current, and by measuring the
disruption molecules can be identified.
Lipid bilayer with high electrical resistance
Ionic current
Hemolysin
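Measuring disruptions in the ionic current can be sketched as simple event detection: flag stretches where the current falls well below the open-pore level. The trace values, the 70% threshold and the function name below are all made up for illustration; real base calling works on much noisier signals with statistical models.

```python
def find_blockades(current_pa, open_pore_pa, threshold=0.7):
    """Return (start, end) sample indices where current drops below threshold * open-pore level."""
    events, start = [], None
    for i, value in enumerate(current_pa):
        blocked = value < threshold * open_pore_pa
        if blocked and start is None:
            start = i                    # a blockade begins
        elif not blocked and start is not None:
            events.append((start, i))    # the blockade ended at sample i
            start = None
    if start is not None:                # trace ended during a blockade
        events.append((start, len(current_pa)))
    return events

trace = [100, 101, 99, 42, 40, 41, 100, 98, 55, 54, 100]  # made-up values in pA
print(find_blockades(trace, open_pore_pa=100))  # [(3, 6), (8, 10)]
```

The two events have different depths (about 41 pA and 55 pA), which is the kind of difference used to tell molecules apart.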
Nanopore – strand sequencing
 The DNA polymer passes through the
nanopore itself
 The nanopore is engineered to allow
single-base resolution within the strand
 A DNA polymerase, coupled with an
α-hemolysin, synthesizes a new strand of
DNA using as a template the polymer
coming out of the pore
DNA Polymer
Nanopore sequencing
 Advantages
 minimal sample preparation
 no requirement for polymerase and ligase
 potential for very long read lengths (> 10,000 – 50,000 nt)
 it might well achieve the $1,000 per mammalian genome goal
 the instrument is inexpensive
EPIGENOME
Transcriptionally active
sites
Protein -DNA
interaction
Methylation analysis
METAGENOME
 Microbial
diversity
 FORENSICS
 MEDICINES
 AGRICULTURE
GENOME
 DE NOVO
 RESEQUENCING
 EXOME
SEQUENCING
 ANCIENT DNA
RNA-SEQ/WHOLE
TRANSCRIPTOME
 mRNA Expression and
Discovery
 Alternative Splicing
 microRNA Expression
and discovery
APPLICATIONS
References
Mardis, E. R. A decade’s perspective on DNA sequencing
technology. Nature (2011) 470:198-203.
Metzker, M. L. Sequencing technologies – the next
generation. Nature Reviews Genetics (2010) 11:31-46.
Loman, N. J., Constantinidou, C., Chan, J. Z. M.,
Halachev, M., Sergeant, M., Penn, C. W., Robinson, E. R. &
Pallen, M. J. High-throughput bacterial genome sequencing: an
embarrassment of choice, a world of opportunity. Nature
Reviews Microbiology (2012) 10:599-606.

Editor's Notes

  • #20 Emulsion PCR: Fragment the DNA and attach adaptors. PCR is carried out in an oil mixture containing beads and reagents. Each bead carries a short adaptor that is complementary to the adaptor attached to the DNA fragments. Once the DNA is added to the emulsion, ideally one fragment attaches to each bead. The PCR reaction then produces many clonal copies of the fragment on each bead. The beads are then deposited on plates containing millions of wells, each roughly the size of a bead.
  • #21 Additional beads carrying sulfurylase and luciferase are added to each well. Reagents are flowed across the top of the plate for the sequencing reaction to occur. A key step is that only one nucleotide type (e.g., cytosine) is added to the plate at a time. If the added nucleotide is complementary to the template strand, the polymerase incorporates it into the growing strand, releasing a pyrophosphate. The pyrophosphate is picked up by the enzymes, triggering a chain reaction that ultimately emits light. The light is detected by a CCD camera and recorded on a flowgram. The cycle then repeats, alternating between reagent flow and washing. A key limitation is that a run of identical nucleotides on the template leads to the addition of multiple nucleotides in a single reaction cycle. The light intensity is proportional to the number of nucleotides added, but the detection limit is around 6 or 7, so the camera cannot resolve homopolymer runs longer than 6 or 7 nucleotides. The most common errors in pyrosequencing are therefore indels, at a rate of around 1%. Substitution error rates, however, are very low, which makes this platform well suited to targeted resequencing and to studies looking for single-nucleotide substitutions.
  • #22 Illumina library prep is done through a process called bridge amplification. 1. PCR is done on glass plates called flow cells. 2. DNA is fragmented and adaptors are ligated to both ends, similar to emulsion PCR. 3. Single-stranded fragments are attached to the surface of the flow cell, which carries primers complementary to the fragment adaptors. 4. A PCR reaction is carried out, which extends the bridge. 5. Over repeated PCR cycles, multiple single-stranded copies are made, creating a dense cluster on the flow cell.
  • #24 Improved chemistry and detection for Illumina HiSeq/MiSeq
  • #31 Obvious advantage is no need for clonal amplification. Reduces amplification bias.
  • #32 Indel errors occur when multiple nucleotides are incorporated too fast for the camera to detect. Improvements to the technology have come primarily from advances in its chemistry, mainly its polymerase enzyme. Improvements include higher nucleotide incorporation accuracy and higher tolerance to heat (longer reads).