Paolo Dametto
30.08.2011

Sequencing technologies –
the next generation
1953: Discovery of the structure of the DNA double helix

Nobel Prize in Physiology or Medicine 1962
History of DNA sequencing


1953 Discovery of the structure of the DNA double helix



1972 Development of recombinant DNA technology, which permits isolation of defined fragments of DNA; prior to this, the
only accessible samples for sequencing were from bacteriophage or virus DNA.



1977 The first complete DNA genome to be sequenced is that of bacteriophage φX174



1977 Frederick Sanger publishes "DNA sequencing with chain-terminating inhibitors"



1984 Medical Research Council scientists decipher the complete DNA sequence of the Epstein-Barr virus, 170 kb.



1987 Applied Biosystems markets the first automated sequencing machine, the model ABI 370.



1990 The U.S. National Institutes of Health (NIH) begins large-scale sequencing trials on Mycoplasma capricolum,
Escherichia coli, Caenorhabditis elegans, and Saccharomyces cerevisiae



1995 Craig Venter, Hamilton Smith, and colleagues at The Institute for Genomic Research (TIGR) publish the first complete
genome of a free-living organism, the bacterium Haemophilus influenzae. The circular chromosome contains 1,830,137
bases and its publication in the journal Science marks the first use of whole-genome shotgun sequencing, eliminating the
need for initial mapping efforts.



1996 Pål Nyrén and his student Mostafa Ronaghi at the Royal Institute of Technology in Stockholm publish their method of
pyrosequencing



1998 Phil Green and Brent Ewing of the University of Washington publish "phred" for sequencer data analysis.



2001 A draft sequence of the human genome is published



2004 454 Life Sciences markets a parallelized version of pyrosequencing. The first version of their machine reduced
sequencing costs 6-fold compared to automated Sanger sequencing, and was the second of a new generation of
sequencing technologies, after MPSS.
Sanger sequencing: chain-terminating inhibitors
A breakthrough: fluorescent chain-terminating inhibitors
ABI PRISM 377

First generation DNA sequencer
• Manual preparation of acrylamide gels
• Manual loading of samples
• Contigs of 500-600 bp
• 2.4 million bp/year
(1000 years needed to sequence the human genome)

Automated DNA sequencer
• Capillary electrophoresis
• Costs reduced by 90%
• Human operation 15 min/day/machine
• 1 million bp/day

3730xl DNA Analyzer
Next-generation sequencing (NGS):
newer methods for DNA sequencing


The potential of NGS technologies is akin to the early days of PCR, with one’s
imagination being the primary limitation of its use (Metzker ML, 2010, Nature Reviews Genetics)



NGS platforms produce an enormous volume of data cheaply, so they expand the
realm of experimentation beyond just determining the order of bases:


gene-expression studies (RNA-seq)



identification of rare transcripts without prior knowledge of a particular gene
alternative splicing identification



large-scale comparative and evolutionary studies



re-sequencing of human genomes to enhance our understanding of how genetic
differences affect health and disease
NGS technologies overview


The variety of NGS features makes it likely that multiple platforms will coexist
in the marketplace, with some having clear advantages for particular
applications over others



NGS platforms differ in template preparation, sequencing and imaging, and data
analysis

Commercially available technologies:

Roche/454

Illumina/Solexa

Helicos BioSciences

Life/APG – SOLiD system

Pacific Biosciences

Ion Torrent technology
Experimental

Nanopore sequencing
Roche/454 - Pyrosequencing
1.

Emulsion-based sample preparation (emPCR)

Several thousand
copies of the same
template sequence
on each bead

approximately 1.6 million wells
Roche/454 - Pyrosequencing
2.

Pyrosequencing: a non-electrophoretic, bioluminescence method that
measures the release of inorganic pyrophosphate (PPi) by proportionally
converting it into visible light using a series of enzymatic reactions

$(\mathrm{DNA})_n + \mathrm{dNTP} \xrightarrow{\text{DNA polymerase}} (\mathrm{DNA})_{n+1} + \mathrm{PP_i}$

Nucleotide incorporation generates light seen as a peak
in the Pyrogram trace
Video http://www.youtube.com/watch?v=kYAGFrbGl6E
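Because the light in each flow is proportional to the PPi released, a homopolymer run of n identical bases gives a peak roughly n times the height of a single-base peak. The sketch below simulates an idealized, noise-free flowgram and decodes it back. It works directly with the synthesized strand, and the cyclic flow order TACG is an assumption made for illustration; the function names are hypothetical.

```python
from itertools import cycle

FLOW_ORDER = "TACG"  # assumed cyclic nucleotide flow order (illustrative)

def flowgram(read: str, flow_order: str = FLOW_ORDER) -> list[int]:
    """Ideal signal per flow = number of bases incorporated (light ~ PPi released)."""
    if set(read) - set(flow_order):
        raise ValueError("read contains bases absent from the flow order")
    signals, i = [], 0
    for nucleotide in cycle(flow_order):
        if i >= len(read):
            break
        run = 0
        # every consecutive identical base is incorporated during a single flow
        while i < len(read) and read[i] == nucleotide:
            run += 1
            i += 1
        signals.append(run)
    return signals

def decode(signals: list[int], flow_order: str = FLOW_ORDER) -> str:
    """Rebuild the read by repeating each flow's nucleotide signal-many times."""
    return "".join(flow_order[k % len(flow_order)] * n for k, n in enumerate(signals))

print(flowgram("TTACGGGA"))  # [2, 1, 1, 3, 0, 1]
```

In a real run the signal is a noisy intensity rather than an integer, so decoding means deciding whether a peak represents n or n+1 bases.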
Roche/454 - Pyrosequencing
3.

Imaging



Sequencing and de novo assembly of
the Mycoplasma genitalium genome




25 million bases in one four-hour run
96% coverage at 99.96% accuracy
100-fold increase in throughput over current
Sanger sequencing



Most errors result from a broadening of the
signal distribution, particularly for long
homopolymers (seven or more bases), leading
to ambiguous base calls



Future directions:




increasing throughput by miniaturization
of the fibre-optic reactors
reducing cross-talk
between adjacent wells
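Why long homopolymers are hard to call can be shown with a toy model. This is an assumption, not the published 454 error model: the measured signal for a run of n bases is Gaussian with mean n and a standard deviation that grows linearly with n, sigma = k·n. Rounding to the nearest integer is correct only if the noise stays within ±0.5, so the probability of a correct call is erf(0.5 / (sigma·√2)). The coefficient k = 0.05 is purely illustrative.

```python
import math

def p_correct_call(n: int, k: float = 0.05) -> float:
    """Probability that rounding a N(n, (k*n)^2) flow signal returns exactly n.

    Toy model: the spread of the signal grows in proportion to the
    homopolymer length n; k is a hypothetical broadening coefficient.
    """
    sigma = k * n
    return math.erf(0.5 / (sigma * math.sqrt(2)))

for n in (1, 3, 5, 7, 9):
    print(n, round(p_correct_call(n), 3))
```

Even with this mild broadening, the chance of calling a run correctly drops steeply once n reaches about seven.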
Roche/454 - Pyrosequencing


Applications





Whole genome sequencing
Targeted resequencing
Sequencing-based Transcriptome Analysis
Metagenomics



Over 1300 publications...
Illumina/Solexa
1.

Solid-phase amplification can produce 100-200 million spatially
separated clusters, providing free ends to which a universal sequencing
primer can be hybridized to initiate the NGS reaction
Illumina/Solexa
Sequencing by Cyclic Reversible Termination (CRT): CRT uses
reversible terminators in a cyclic method that comprises nucleotide
incorporation, fluorescence imaging and cleavage

1. A DNA polymerase, bound to the primed template, incorporates just one
fluorescently modified nucleotide
2. Unincorporated nucleotides are washed away and four-color imaging is
performed by total internal reflection fluorescence (TIRF) using two lasers
3. A cleavage step (TCEP, a reducing agent) removes the terminating group and
the fluorescent dye, restoring the 3’-OH group
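Each reversible terminator stops synthesis after a single base, so every cycle yields exactly one base call, even inside a homopolymer. The minimal sketch below runs the incorporate, image and cleave loop. The per-channel intensities and the noise level are hypothetical, and base calling is reduced to picking the brightest of the four channels.

```python
import random

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def crt_sequence(template: str, noise: float = 0.1, seed: int = 0) -> str:
    """Toy cyclic reversible termination run over a template strand.

    Each cycle: (1) the polymerase adds one dye-labelled, 3'-blocked nucleotide,
    (2) the four fluorescence channels are imaged, (3) dye and block are cleaved.
    Simulated intensities: 1.0 in the incorporated base's channel, plus noise.
    """
    rng = random.Random(seed)
    calls = []
    for template_base in template:
        incorporated = COMPLEMENT[template_base]            # step 1: one base only
        image = {b: (1.0 if b == incorporated else 0.0) + rng.gauss(0, noise)
                 for b in BASES}                            # step 2: four-color image
        calls.append(max(image, key=image.get))             # brightest channel wins
        # step 3: cleavage restores the 3'-OH, so the next cycle can proceed
    return "".join(calls)

print(crt_sequence("AAAATGC"))  # complementary strand, one base per cycle
```

Unlike the flow-based sketch of pyrosequencing, the homopolymer AAAA here needs four cycles, which is why CRT is less prone to homopolymer length errors.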
Illumina/Solexa
3.

Imaging
Illumina/Solexa


Paired reads are very powerful in all areas of the analysis because they
provide very accurate read alignment and thus improve the accuracy and
coverage of consensus sequence and SNP calling

Video http://www.youtube.com/watch?v=77r5p8IBwJk
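Paired reads constrain alignment because the two mates must map a roughly known distance apart and in a fixed orientation. Below is a minimal consistency check. It assumes forward/reverse (FR) orientation and an expected insert-size window; both the window and the `Alignment` record are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Alignment:
    """Hypothetical minimal alignment record for one mate."""
    chrom: str
    start: int      # 0-based leftmost mapped position
    end: int        # exclusive end position
    reverse: bool   # True if the mate aligned to the reverse strand

def is_proper_pair(m1: Alignment, m2: Alignment,
                   min_insert: int = 200, max_insert: int = 500) -> bool:
    """True if the mates face each other (FR) on one chromosome within the window."""
    if m1.chrom != m2.chrom or m1.reverse == m2.reverse:
        return False
    fwd, rev = (m1, m2) if not m1.reverse else (m2, m1)
    if fwd.start > rev.start:           # the forward mate must lie upstream
        return False
    insert_size = rev.end - fwd.start   # outer distance spanned by the pair
    return min_insert <= insert_size <= max_insert

a = Alignment("chr1", 1000, 1100, reverse=False)
b = Alignment("chr1", 1250, 1350, reverse=True)
print(is_proper_pair(a, b))  # True: FR orientation, insert size 350
```

Pairs that fail such a check point either to a misaligned mate or to a possible structural variant between the two mates.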
Illumina/Solexa





1861 publications...

Applications








DNA sequencing
Gene Regulation Analysis
Sequencing-based Transcriptome Analysis
SNP and SV discovery
Cytogenetic Analysis
ChIP-sequencing
Small RNA discovery analysis



A whole human genome sequence was determined in 8 weeks to an average depth
of ~40X, discovering ~4 million SNPs and ~400,000 SVs (with error rates
<1% for both over-calls and under-calls)



Whole human genome sequencing may become a clinical tool in the near
future, helping to unravel the complexities of human variation in cancer and other diseases
and paving the way for the use of personal genome sequences in medicine and
healthcare
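Average depth is simply total sequenced bases divided by genome size. Under the Lander-Waterman model, where reads are placed uniformly at random (an idealization), the fraction of the genome left uncovered at mean depth c is approximately e^-c. The sketch applies this to a ~40X human genome. The genome size of 3.1 Gb and the 100-bp read length are assumptions for illustration.

```python
import math

def bases_needed(genome_size: float, depth: float) -> float:
    """Total sequenced bases required for a given mean depth."""
    return genome_size * depth

def reads_needed(genome_size: float, depth: float, read_length: int) -> int:
    """Number of reads of a given length required for a given mean depth."""
    return math.ceil(genome_size * depth / read_length)

def fraction_uncovered(depth: float) -> float:
    """Lander-Waterman (Poisson) estimate of the fraction of bases with zero coverage."""
    return math.exp(-depth)

HUMAN_GENOME = 3.1e9  # approximate haploid size in bases (assumption)
print(bases_needed(HUMAN_GENOME, 40) / 1e9)    # 124.0 Gb of raw sequence
print(reads_needed(HUMAN_GENOME, 40, 100))     # with assumed 100-bp reads
print(fraction_uncovered(40))                  # negligible in this ideal model
```

Real coverage is not uniform because of repeats and GC bias, so the uncovered fraction in practice is larger than the Poisson estimate.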
Helicos BioSciences

The use of PCR is problematic for two reasons:

1. PCR introduces an uncontrolled bias in template representation because its
efficiencies vary as a function of template properties
2. PCR introduces errors (generating false-positive SNPs)

Single-molecule sequencing has been developed to circumvent these
problems
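The representation bias comes from compounding per-cycle efficiency. After n cycles, a template amplified with efficiency e yields about (1 + e)^n copies, so small differences in e become large differences in abundance. The efficiencies below, for example a GC-rich fragment amplifying slightly worse, are hypothetical.

```python
def copies_after(cycles: int, efficiency: float, start: float = 1.0) -> float:
    """Expected copies after `cycles` rounds when each round multiplies by (1 + e)."""
    return start * (1.0 + efficiency) ** cycles

def relative_representation(efficiencies: dict[str, float], cycles: int) -> dict[str, float]:
    """Share of the final pool for templates that started in equal amounts."""
    yields = {name: copies_after(cycles, e) for name, e in efficiencies.items()}
    total = sum(yields.values())
    return {name: y / total for name, y in yields.items()}

# Hypothetical per-cycle efficiencies for two initially equimolar templates
shares = relative_representation({"typical": 0.95, "GC-rich": 0.80}, cycles=25)
print({name: round(share, 3) for name, share in shares.items()})
```

A 15-point difference in efficiency turns a 50:50 starting mix into a strongly skewed library after 25 cycles, which is the kind of distortion a PCR-free, single-molecule approach avoids.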
Helicos BioSciences
1.

Template preparation: one-pass sequencing



The library preparation process is simple and fast and does not require the use of
PCR. It results in single-stranded poly(dA)-tailed templates



Poly(dT) oligonucleotides are covalently anchored to a glass cover slip at random
positions, and they are used to capture the template strands and as primers for
sequencing
Helicos BioSciences

2. Sequencing
Each cycle consists of:

1. adding the polymerase and one
of the labeled nucleotides
2. rinsing, then imaging of multiple
positions
3. cleavage of the dye labels

224 cycles were performed to
sequence the genome of the
M13 virus to an average depth
of >150X with 100% coverage
Helicos BioSciences
3. Imaging



The system showed higher error rates compared to the previous platforms, mostly
due to multiple incorporations in the presence of homopolymers



The two-pass sequencing improved the overall quality
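In single-molecule cyclic sequencing, only one labelled nucleotide species is present per cycle. A homopolymer run can therefore be extended by several bases in the same cycle. If the signal from a single molecule cannot distinguish one incorporation from several, the run is read too short, which is a deletion error. The toy model below makes that failure mode explicit. The addition order C, T, A, G and the "incorporated or not" detector are assumptions, not the published Helicos protocol.

```python
from itertools import cycle

def single_molecule_read(strand: str, order: str = "CTAG") -> tuple[str, int]:
    """Toy model of cyclic single-nucleotide addition on one molecule.

    Returns (observed read, cycles used). In each cycle only one nucleotide is
    present; the whole homopolymer run is extended, but the detector is assumed
    to report only "incorporated / not incorporated".
    """
    if set(strand) - set(order):
        raise ValueError("strand contains bases absent from the addition order")
    observed, i, cycles_used = [], 0, 0
    for nucleotide in cycle(order):
        if i >= len(strand):
            break
        cycles_used += 1
        if strand[i] == nucleotide:
            while i < len(strand) and strand[i] == nucleotide:
                i += 1                    # multiple incorporations in one cycle
            observed.append(nucleotide)   # ...but only one base is reported
    return "".join(observed), cycles_used

read, n_cycles = single_molecule_read("CCCTAAGT")
print(read, n_cycles)  # homopolymer runs collapse to single bases
```

The model also shows why many more cycles than bases are needed: a base is added only when its nucleotide comes round in the cycle.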
Helicos BioSciences


Template preparation: two-pass sequencing
Helicos BioSciences


ChIP-seq




Methy-seq




Pastor WA et al. (2011). Genome-wide mapping of
5-hydroxymethylcytosine in embryonic stem cells.
Nature 473, 394-397.

Direct RNA sequencing

Goren A et al. (2010). Chromatin profiling by directly
sequencing small quantities of immunoprecipitated
DNA. Nat Methods 7, 47-49.

Ozsolak F et al. (2010). Comprehensive
polyadenylation site maps in yeast and human
reveal pervasive alternative polyadenylation. Cell
143, 1018-1029.

cDNA-Based DGE, RNA-Seq and Small RNA
Sequencing

Ting DT et al. (2011). Aberrant overexpression of
satellite repeats in pancreatic and other epithelial
cancers. Science 331, 593-596.
Lipson D et al. (2009). Quantification of the yeast
transcriptome by single-molecule sequencing. Nat
Biotechnol 27, 652-658.

Video http://www.youtube.com/watch?v=TboL7wODBj4
Life/APG – SOLiD platform



Sequencing by ligation (SBL) uses another cyclic method that differs from
CRT in its use of DNA ligase and two-base-encoded probes



Life/APG has commercialized their SBL platform called support
oligonucleotide ligation detection (SOLiD)
Life/APG – SOLiD platform
SOLiD sequencing Chemistry


Two-base-encoded probes: an oligonucleotide
sequence in which two interrogation bases are
associated with a particular dye
(e.g. AA, CC, GG, TT are encoded with a blue dye)


There are 16 possible dinucleotide combinations and four dyes, so each dye is
associated with four combinations



1,2-probes indicates that the first and second
nucleotides are the interrogation bases. The
remaining bases consist of either degenerate or
universal bases



A phosphorothiolate linkage is present between the
fifth and sixth nucleotides of the probe sequence,
which is then cleaved with silver ions.
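Two-base encoding can be written compactly. Number A, C, G and T as 0 to 3; the widely described SOLiD color code then gives each overlapping dinucleotide the color (first XOR second). This puts AA, CC, GG and TT on color 0 (the blue dye in the example above) and gives every color four dinucleotides. Decoding needs one known starting base, which is taken here as given. The function names are hypothetical.

```python
BASE_TO_INT = {"A": 0, "C": 1, "G": 2, "T": 3}
INT_TO_BASE = "ACGT"

def encode_colors(seq: str) -> list[int]:
    """Color of each overlapping dinucleotide: color = base_i XOR base_(i+1)."""
    ints = [BASE_TO_INT[b] for b in seq]
    return [a ^ b for a, b in zip(ints, ints[1:])]

def decode_colors(first_base: str, colors: list[int]) -> str:
    """Recover bases from a known first base, since base_(i+1) = base_i XOR color."""
    current = BASE_TO_INT[first_base]
    bases = [first_base]
    for color in colors:
        current ^= color
        bases.append(INT_TO_BASE[current])
    return "".join(bases)

colors = encode_colors("TACGGA")
print(colors)                          # [3, 1, 3, 0, 2]
print(decode_colors("T", colors))      # TACGGA

# A true single-base change (G -> T at index 3) alters two adjacent colors...
snp_colors = encode_colors("TACTGA")
diffs = [i for i, (x, y) in enumerate(zip(colors, snp_colors)) if x != y]
print(diffs)                           # [2, 3]

# ...whereas a single color-call error shifts every downstream base on decoding
bad = list(colors)
bad[2] = 2
print(decode_colors("T", bad))         # TACTTC
```

This asymmetry is what lets color-space analysis tell a likely SNP (two adjacent, consistent color changes) from a likely measurement error (one isolated color change).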
Life/APG – SOLiD platform

1.

Emulsion-based sample preparation (emPCR)

2.

Chemical crosslinking to an amino-coated glass surface
Life/APG – SOLiD platform
3.

SBL protocol



Upon the annealing of a universal primer, a
library of 1,2-probes is added.
Ligation of complementary probes follows.



Four-color imaging



The ligated 1,2-probes are chemically
cleaved with silver ions to generate a 5’-PO4
group



The SOLiD cycle is repeated 9 more times (ten ligation cycles in all)
Life/APG – SOLiD platform
3.



SBL protocol
The extended primer is then stripped and four
more ligation rounds are performed, each with
ten ligation cycles
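Combining the two SBL slides: each primer round probes every fifth dinucleotide, and each primer reset shifts the start back by one base. Five rounds of ten cycles therefore tile the read so that every base is interrogated twice. The indexing below is a simplified model and the helper is hypothetical. Position 1 is taken as the first template base after the primer; the real offsets depend on primer design.

```python
from collections import Counter

def interrogations(rounds: int = 5, cycles: int = 10, probe_step: int = 5) -> Counter:
    """Count how often each template position is read by a 1,2-probe.

    Simplified model: in round r (primer shifted back by r bases), cycle c
    interrogates positions s and s + 1, with s = probe_step * c + 1 - r.
    Positions <= 0 fall inside the primer in this model.
    """
    counts = Counter()
    for r in range(rounds):
        for c in range(cycles):
            s = probe_step * c + 1 - r
            counts[s] += 1
            counts[s + 1] += 1
    return counts

counts = interrogations()
print({p: counts[p] for p in range(1, 11)})  # each position is read twice
```

Reading each base in two independent dinucleotide contexts is what makes the two-color consistency check of color space possible.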
Life/APG – SOLiD platform








ChIP-seq

Chromatin immunoprecipitation
sequencing (ChIP-Seq) on the SOLiD™
System
Publication: Nature Methods (2009)

Chromosome length influences replication-induced topological stress
Publication: Nature (2011)
Methy-seq

Increased methylation variation in
epigenetic domains across cancer types
Publication: Nature Genetics (2011)
Metagenomics

The carnivorous bladderwort (Utricularia,
Lentibulariaceae): a system inflates
Publication: Journal of Experimental
Botany (2010)
cDNA-Based DGE, RNA-Seq and Small RNA
Sequencing

Evolution of yeast noncoding RNAs
reveals an alternative mechanism for
widespread intron loss
Publication: Science (2010)

Video http://www.youtube.com/watch?v=nlvyF8bFDwM&feature=related
Pacific Biosciences


All the aforementioned methods use enzymatic activities and various
termination approaches, leading to short sequence reads (max. 350 bp)



Real-time DNA sequencing aims to exploit the high catalytic rate and the
high processivity of DNA polymerase, using the enzyme as a real-time
sequencing engine in order to obtain longer reads.
To fully harness the intrinsic speed, fidelity, and
processivity of DNA polymerase (DNApol), several technical challenges must be met
simultaneously:


The speed at which each polymerase synthesizes DNA fluctuates
stochastically, so polymerases must be observed individually



A high nucleotide concentration is required, so the observation volume
must be reduced to allow single-molecule detection



DNApol has to work with 100% fluorescently labeled dNTPs



A surface chemistry is required that retains the activity of DNApol and inhibits
nonspecific adsorption of labeled dNTPs
Pacific Biosciences


Single Molecule Real Time (SMRT) DNA sequencing



The zero-mode waveguide (ZMW) design reduces the observation volume down to the zeptolitre
range (10^-21 L), reducing the number of stray fluorescently labeled molecules that enter the
detection layer for a given period



The residence time of phospholinked nucleotides in the active site is usually on the millisecond
scale, and this corresponds to a recorded fluorescence pulse
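The benefit of a zeptolitre volume follows from Avogadro's number: the mean number of labelled nucleotides inside the observation volume is N = C·V·N_A. The 1 µM concentration and the roughly one-femtolitre comparison volume (the order of magnitude of a diffraction-limited confocal spot) are assumptions chosen for illustration.

```python
AVOGADRO = 6.02214076e23  # molecules per mole

def molecules_in_volume(concentration_molar: float, volume_litres: float) -> float:
    """Mean number of molecules in a volume: N = C * V * N_A."""
    return concentration_molar * volume_litres * AVOGADRO

conc = 1e-6  # 1 µM labelled dNTP (illustrative value)
for label, volume in (("confocal spot, ~1 fL", 1e-15), ("ZMW, ~1 zL", 1e-21)):
    print(f"{label}: {molecules_in_volume(conc, volume):.3g} molecules on average")
```

At the same concentration, a femtolitre spot holds hundreds of labelled molecules, while a zeptolitre ZMW is empty most of the time. A nucleotide held in the polymerase active site for milliseconds therefore stands out as a pulse against the background.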
Pacific Biosciences

Video

http://www.youtube.com/watch?v=_B_cUZ8hSYU
Pacific Biosciences


Single-pass (1X) read accuracy was
initially estimated at 83%. The most
common errors were insertions,
deletions and mismatches.




At 15X coverage, the authors showed
that consensus accuracy exceeds 99%

In 2009, Pacific Biosciences
reported improvements to their
platform. E. coli was sequenced at
38X, covering 99.3% of the genome,
with an accuracy of >99.999%


average read length: 964 bp
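Accuracy climbs quickly with coverage when errors in different reads are independent and random. This is the key assumption; systematic errors would not average out. Under it, a simple majority vote over n reads is correct whenever more than half of them are. The binomial sum below uses the 83% single-pass figure. Treating every error as a vote against the true base is a simplification.

```python
from math import comb

def majority_vote_accuracy(p: float, n: int) -> float:
    """P(more than half of n independent reads show the correct base)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n // 2 + 1, n + 1))

for depth in (1, 3, 5, 15):
    print(depth, round(majority_vote_accuracy(0.83, depth), 5))
```

In this model, 15 independent passes at 83% already give a consensus above 99%, in line with the coverage-dependent improvement on the slide.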
Comparison of next-generation sequencing platforms
NGS technologies and personal genomes


Human genome studies aim to catalogue SNPs and SVs and their
association with phenotypic differences, with the eventual goal of personalized
genomics for medical purposes, such as pharmacogenomics


Somatic mutations associated with acute myeloid leukemia have been identified
using Illumina/Solexa (Ley T.J. et al. 2008 Nature)



Elucidation of both allelic variants in a family with a recessive form of Charcot-Marie-Tooth disease using the SOLiD platform (Lupski J.R. et al. in press N.Engl.J.Med.)



The Cancer Genome Atlas aims to discover SNPs and SVs associated with
major cancers (The Cancer Genome Atlas Research Network, 2011 Nature)



Beijing Genomics Institute (BGI) is working on the "1000 Plant & Animal
Reference Genomes Project", which aims to generate reference genomes for 1,000
economically and scientifically important plant/animal species. It uses the
Illumina/Solexa and SOLiD platforms
Sequencing services and the $1,000 genome


Illumina announced a personal genome sequencing service that
provides 30-fold base coverage for the price of $48,000.



Complete Genomics offers a similar service with 40-fold coverage
priced at $5,000. Its business model relies on a very large customer
volume. It uses a newly optimized SBL protocol based on
combinatorial probe anchor ligation (cPAL). Reagent cost:
$4,400



The greatest challenge for current technology developers is
closing the gap between $10,000 and $1,000 for a single genome.
The timetable for the $1,000 draft genome is difficult to predict
Nanopore sequencing?
Nanopore sequencing


The system uses the Staphylococcus aureus toxin α-hemolysin, a robust
heptameric protein which normally forms holes in membranes.



DNA and RNA can be electrophoretically driven through a nanopore of
suitable diameter (Kasianowicz J.J. et al 1996 PNAS)
Nanopore sequencing – how does it work?
Hemolysin



When a small voltage (~100 mV) is imposed across
a nanopore in a membrane separating two
chambers containing aqueous electrolytes, the
ionic current through the pore can be measured



Molecules passing through the nanopore
disrupt the ionic current; by measuring
this disruption, the molecules can be identified.

Ionic current

Lipid bilayer with high electrical resistance
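In practice, "measuring the disruption" means finding intervals where the current drops well below the open-pore level and summarizing each one by its depth and duration. Below is a minimal threshold-based event detector. The open-pore current, threshold fraction, sample interval and trace are all hypothetical values, and real analysis pipelines use more robust segmentation.

```python
def detect_blockades(current: list[float], open_pore: float,
                     threshold: float = 0.8, dt_us: float = 10.0) -> list[dict]:
    """Find runs of samples below threshold * open_pore.

    Returns one dict per event with its start index, dwell time (µs) and the
    mean residual current as a fraction of the open-pore current.
    """
    events, start = [], None
    for i, value in enumerate(current + [open_pore]):  # sentinel closes a final event
        blocked = value < threshold * open_pore
        if blocked and start is None:
            start = i
        elif not blocked and start is not None:
            segment = current[start:i]
            events.append({
                "start": start,
                "dwell_us": len(segment) * dt_us,
                "residual": sum(segment) / len(segment) / open_pore,
            })
            start = None
    return events

# Synthetic trace in pA: open pore ~120, then a blockade to ~50 for 4 samples
trace = [120, 119, 121, 50, 48, 52, 50, 120, 118]
print(detect_blockades(trace, open_pore=120.0))
```

The residual-current level is the feature that would be compared against reference levels to identify what blocked the pore, and the dwell time reflects how fast it passed through.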
Nanopore – exonuclease sequencing
Exonuclease

DNA to be sequenced

Aminocyclodextrin adaptor
Nanopore – strand sequencing
DNA Polymerase


The DNA polymer passes through
the nanopore itself



The nanopore is engineered to
allow single-base resolution within
the strand



A DNA polymerase, coupled to the
α-hemolysin pore, synthesizes a new
strand of DNA, using the polymer
emerging from the pore as template

Video nanopore: http://www.youtube.com/watch?v=_rRrOT9gfpo&feature=related
Nanopore sequencing


Advantages








minimal sample preparation
no requirement for polymerase or ligase
potential of very long read lengths (> 10,000–50,000 nt)
it might well achieve the $1,000 per mammalian genome goal
the instrument is inexpensive

Challenges





to slow down DNA translocation from microseconds per base to milliseconds
to reduce stochastic motion of the DNA molecule in transit in order to increase
the signal-to-noise ratio
to provide a stable support for the hemolysin heptamer
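The first challenge is easy to quantify. The number of current samples recorded per base is the dwell time per base multiplied by the sampling rate. The 100 kHz acquisition rate below is a hypothetical value.

```python
def samples_per_base(dwell_s: float, sampling_hz: float) -> float:
    """Average number of current samples acquired while one base occupies the pore."""
    return dwell_s * sampling_hz

SAMPLING_HZ = 100e3  # hypothetical acquisition rate (100 kHz)
for label, dwell in (("free translocation, ~1 µs/base", 1e-6),
                     ("slowed translocation, ~1 ms/base", 1e-3)):
    print(f"{label}: {samples_per_base(dwell, SAMPLING_HZ):g} samples per base")
```

At microsecond dwell times most bases are never sampled at all, while at millisecond dwell times each base gets around a hundred samples. Averaging N samples reduces random noise by roughly √N, which links slowing translocation to improving the signal-to-noise ratio.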
Ion torrent technology



http://lifetech-it.hosted.jivesoftware.com/videos/1016
New Generation Sequencing Technologies: an overview


Editor's Notes

  • #3 Chargaff's rules (Erwin Chargaff): 1) the amount of guanine equals the amount of cytosine, and the same holds for A and T; 2) base composition differs among organisms. Chargaff met Crick and Watson in 1952