New Generation Sequencing Technologies: an overview


Adapted version of my technical Journal club presentation on the new sequencing technologies.

Published in: Technology

  1. 1. Paolo Dametto 30.08.2011 Sequencing technologies – the next generation
  2. 2. 1953: Discovery of the structure of the DNA double helix Nobel prize in Physiology or Medicine 1962
  3. 3. History of DNA sequencing
     • 1953: Discovery of the structure of the DNA double helix
     • 1972: Development of recombinant DNA technology, which permits isolation of defined fragments of DNA; prior to this, the only accessible samples for sequencing were from bacteriophage or virus DNA
     • 1977: The first complete DNA genome to be sequenced is that of bacteriophage φX174
     • 1977: Frederick Sanger publishes "DNA sequencing with chain-terminating inhibitors"
     • 1984: Medical Research Council scientists decipher the complete DNA sequence of the Epstein-Barr virus, 170 kb
     • 1987: Applied Biosystems markets the first automated sequencing machine, the model ABI 370
     • 1990: The U.S. National Institutes of Health (NIH) begins large-scale sequencing trials on Mycoplasma capricolum, Escherichia coli, Caenorhabditis elegans, and Saccharomyces cerevisiae
     • 1995: Craig Venter, Hamilton Smith, and colleagues at The Institute for Genomic Research (TIGR) publish the first complete genome of a free-living organism, the bacterium Haemophilus influenzae. The circular chromosome contains 1,830,137 bases, and its publication in the journal Science marks the first use of whole-genome shotgun sequencing, eliminating the need for initial mapping efforts
     • 1996: Pål Nyrén and his student Mostafa Ronaghi at the Royal Institute of Technology in Stockholm publish their method of pyrosequencing
     • 1998: Phil Green and Brent Ewing of the University of Washington publish "phred" for sequencer data analysis
     • 2001: A draft sequence of the human genome is published
     • 2004: 454 Life Sciences markets a parallelized version of pyrosequencing. The first version of their machine reduced sequencing costs 6-fold compared to automated Sanger sequencing, and was the second of a new generation of sequencing technologies, after MPSS
  4. 4. Sanger sequencing: chain-terminating inhibitors
  5. 5. A breakthrough: fluorescent chain-terminating inhibitors
     First-generation DNA sequencer (ABI PRISM 377)
     • Manual preparation of acrylamide gels
     • Manual loading of samples
     • Contigs of 500-600 bp
     • 2.4 million bp/year (about 1000 years needed to sequence the human genome)
     Automated DNA sequencer (3730xl DNA Analyzer)
     • Capillary electrophoresis
     • Costs reduced by 90%
     • Human operation 15 min/day/machine
     • 1 million bp/day
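
The throughput figures on this slide can be checked with a short calculation. This is only a sketch: the 3 billion bp genome size is an assumption that does not appear on the slide, and the estimate is for a single pass (1X) with no redundancy.

```python
# Rough time to read a human genome once (1X) at the throughputs quoted on the slide.
HUMAN_GENOME_BP = 3.0e9  # assumed haploid genome size; not stated on the slide


def years_to_sequence(genome_bp, bp_per_year):
    """Years needed to read genome_bp bases at a constant yearly throughput."""
    return genome_bp / bp_per_year


# Slab-gel instrument: 2.4 million bp per year (from the slide).
slab_gel_years = years_to_sequence(HUMAN_GENOME_BP, 2.4e6)

# Capillary instrument: 1 million bp per day (from the slide), run every day.
capillary_years = years_to_sequence(HUMAN_GENOME_BP, 1.0e6 * 365)

print(f"slab gel:  {slab_gel_years:.0f} years")
print(f"capillary: {capillary_years:.1f} years")
```

Under these assumptions the slab-gel estimate comes out at about 1250 years, the same order of magnitude as the "1000 years" quoted on the slide, while a single capillary machine would need roughly eight years.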
  6. 6. Next-generation sequencing (NGS): newer methods for DNA sequencing
     • The potential of NGS technologies is akin to the early days of PCR, with one's imagination being the primary limitation of its use (Metzker ML, 2010, Nat Rev Genet)
     • NGS platforms produce an enormous volume of data cheaply, which expands the realm of experimentation beyond just determining the order of bases:
       - gene-expression studies (RNA-seq): identification of rare transcripts without prior knowledge of a particular gene, and identification of alternative splicing
       - large-scale comparative and evolutionary studies
       - re-sequencing of human genomes to enhance our understanding of how genetic differences affect health and disease
  7. 7. NGS technologies overview
     • The variety of NGS features makes it likely that multiple platforms will coexist in the marketplace, with some having clear advantages over others for particular applications
     • NGS platforms differ in template preparation, sequencing and imaging, and data analysis
     Commercially available technologies:
     • Roche/454
     • Illumina/Solexa
     • Helicos BioSciences
     • Life/APG – SOLiD system
     • Pacific Biosciences
     • Ion Torrent technology
     Experimental:
     • Nanopore sequencing
  8. 8. Roche/454 - Pyrosequencing 1. Emulsion-based sample preparation (emPCR): several thousand copies of the same template sequence on each bead; on average 1.6 million wells
  9. 9. Roche/454 - Pyrosequencing 2. Pyrosequencing: a non-electrophoretic bioluminescence method that measures the release of inorganic pyrophosphate by proportionally converting it into visible light using a series of enzymatic reactions. Reaction catalysed by DNA polymerase: (DNA)n + dNTP → (DNA)n+1 + PPi. Nucleotide incorporation generates light seen as a peak in the Pyrogram trace
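
Because the light signal is proportional to the number of nucleotides incorporated in each flow, a Pyrogram can in principle be turned back into bases by rounding each peak to the nearest integer. The sketch below illustrates that idea only. The flow order `TACG`, the signal values and the function name `decode_flowgram` are assumptions made up for the example, and real base callers model the signal distribution rather than simply rounding.

```python
def decode_flowgram(flow_order, signals):
    """Convert per-flow light intensities into a base sequence.

    Each signal is assumed to be normalised so that 1.0 corresponds to a
    single incorporation; an intensity near 2.0 means a homopolymer of two.
    """
    bases = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]  # flows cycle through the order
        count = max(0, int(signal + 0.5))             # round to the nearest integer
        bases.append(nucleotide * count)
    return "".join(bases)


# Hypothetical normalised intensities for two rounds of the T, A, C, G flows.
example_signals = [1.02, 0.05, 2.10, 0.00, 0.03, 0.97, 0.04, 3.05]
decoded = decode_flowgram("TACG", example_signals)
print(decoded)  # TCCAGGG
```

The rounding step is also where homopolymer errors come from: the longer the run, the harder it is to tell a peak of, say, 7.4 from one of 7.6.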
  10. 10. Roche/454 - Pyrosequencing 3. Imaging
     • Sequencing and de novo assembly of the Mycoplasma genitalium genome:
       - 25 million bases in one four-hour run
       - 96% coverage at 99.96% accuracy
       - 100-fold increase in throughput over current Sanger sequencing
     • Most errors result from a broadening of the signal distribution, particularly for large homopolymers (seven or more), leading to ambiguous base calls
     • Future directions:
       - increasing throughput by miniaturization of the fibre-optic reactors
       - improvements to reduce cross-talk between adjacent wells
  11. 11. Roche/454 - Pyrosequencing
     • Applications: whole genome sequencing; targeted resequencing; sequencing-based transcriptome analysis; metagenomics
     • Over 1300 publications...
  12. 12. Illumina/Solexa 1. Solid-phase amplification can produce 100-200 million spatially separated clusters, providing free ends to which a universal sequencing primer can be hybridized to initiate the NGS reaction
  13. 13. Illumina/Solexa Sequencing by Cyclic Reversible Termination (CRT): CRT uses reversible terminators in a cyclic method that comprises nucleotide incorporation, fluorescence imaging and cleavage
     1. A DNA polymerase, bound to the primed template, incorporates just one fluorescently modified nucleotide
     2. Unincorporated nucleotides are washed away and a four-color image is acquired by total internal reflection fluorescence (TIRF) using two lasers
     3. A cleavage step (TCEP, a reducing agent) removes the terminating group and the fluorescent dye, restoring the 3'-OH group
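
The three-step CRT cycle can be mimicked with a toy loop that reads exactly one base per cycle. This is a minimal sketch and not a model of the real chemistry or image analysis: the function name `sequence_by_crt` is hypothetical, the template is given in the order the polymerase encounters it, and every step is assumed to work perfectly.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequence_by_crt(template, n_cycles):
    """Simulate cyclic reversible termination on a single error-free template."""
    read = []
    position = 0        # next template base to be copied
    terminated = False  # True while a 3' terminator blocks further extension
    for _ in range(n_cycles):
        if position >= len(template):
            break
        # Step 1: incorporate exactly one labelled, terminated nucleotide.
        assert not terminated, "extension is blocked until cleavage"
        incorporated = COMPLEMENT[template[position]]
        position += 1
        terminated = True
        # Step 2: wash and image; the dye colour identifies the base.
        read.append(incorporated)
        # Step 3: cleave dye and terminator, restoring the 3'-OH group.
        terminated = False
    return "".join(read)


crt_read = sequence_by_crt("TACGGA", 4)
print(crt_read)  # ATGC
```

The read length is set by the number of cycles, not by the template, which is why these platforms report fixed-length reads.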
  14. 14. Illumina/Solexa 3. Imaging
  15. 15. Illumina/Solexa: Paired reads are very powerful in all areas of the analysis because they provide very accurate read alignment and thus improve the accuracy and coverage of consensus sequence and SNP calling
  16. 16. Illumina/Solexa
     • 1861 publications...
     • Applications: DNA sequencing; gene regulation analysis; sequencing-based transcriptome analysis; SNP and SV discovery; cytogenetic analysis; ChIP-sequencing; small RNA discovery analysis
     • A whole human genome sequence was determined in 8 weeks to an average depth of ~40X, discovering ~4 million SNPs and ~400,000 SVs (with an error rate <1% for both over-calls and under-calls)
     • Whole human genome sequencing is being considered as a clinical tool for the near future: it could unravel the complexities of human variation in cancer and other diseases, paving the way for the use of personal genome sequences in medicine and healthcare
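
To give a feel for what "an average depth of ~40X" implies, average depth is simply (number of reads × read length) / genome size. The sketch below inverts that relation. The 3 billion bp genome size and the 100 bp read length are assumptions chosen for round numbers, not values from the slide, and the uncovered-fraction estimate uses the idealised Lander-Waterman model of uniformly random reads.

```python
import math


def reads_needed(depth, genome_bp, read_len):
    """Number of reads required to reach a given average depth."""
    return math.ceil(depth * genome_bp / read_len)


def fraction_uncovered(depth):
    """Expected fraction of bases with zero coverage (Lander-Waterman, idealised)."""
    return math.exp(-depth)


n_reads = reads_needed(40, 3_000_000_000, 100)  # assumed genome size and read length
print(f"reads needed for 40X: {n_reads:,}")
print(f"expected uncovered fraction at 40X: {fraction_uncovered(40):.1e}")
```

In practice coverage is far from uniform, so real gaps are much larger than this idealised estimate suggests.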
  17. 17. Helicos BioSciences The use of PCR is problematic for two reasons:
     1. PCR introduces an uncontrolled bias in template representation because its efficiency varies as a function of template properties
     2. PCR introduces errors (generating false-positive SNPs)
     Single-molecule sequencing has been developed to circumvent these problems
  18. 18. Helicos BioSciences 1. Template preparation: one-pass sequencing
     • The library preparation process is simple and fast and does not require the use of PCR. It results in single-stranded poly(dA)-tailed templates
     • Poly(dT) oligonucleotides are covalently anchored to a glass cover slip at random positions; they are used both to capture the template strands and as primers for sequencing
  19. 19. Helicos BioSciences 2. Sequencing. Each cycle consists of:
     1. adding the polymerase and one of the labeled nucleotides
     2. rinsing, then imaging of multiple positions
     3. cleavage of the dye labels
     224 cycles were performed to sequence the genome of the M13 virus to an average depth of >150X with 100% coverage
  20. 20. Helicos BioSciences 3. Imaging  The system showed higher error rates compared to the previous platforms, mostly due to multiple incorporations in the presence of homopolymers  The two-pass sequencing improved the overall quality
  21. 21. Helicos BioSciences Template preparation: two-pass sequencing
  22. 22. Helicos BioSciences
     • ChIP-seq: Goren, A et al. (2010). Chromatin profiling by directly sequencing small quantities of immunoprecipitated DNA. Nat Methods 7, 47-49.
     • Methyl-seq: Pastor, WA et al. (2011). Genome-wide mapping of 5-hydroxymethylcytosine in embryonic stem cells. Nature 473(7347), 394-7.
     • Direct RNA sequencing: Ozsolak, F et al. (2010). Comprehensive polyadenylation site maps in yeast and human reveal pervasive alternative polyadenylation. Cell 143, 1018-1029.
     • cDNA-based DGE, RNA-Seq and small RNA sequencing:
       - Ting, DT et al. (2011). Aberrant overexpression of satellite repeats in pancreatic and other epithelial cancers. Science 331, 593-6.
       - Lipson, D et al. (2009). Quantification of the yeast transcriptome by single-molecule sequencing. Nat Biotechnol 27, 652-658.
  23. 23. Life/APG – SOLiD platform
     • Sequencing by ligation (SBL) is another cyclic method; it differs from CRT in its use of DNA ligase and two-base-encoded probes
     • Life/APG has commercialized its SBL platform under the name support oligonucleotide ligation detection (SOLiD)
  24. 24. Life/APG – SOLiD platform SOLiD sequencing chemistry
     • Two-base-encoded probes: an oligonucleotide sequence in which two interrogation bases are associated with a particular dye (e.g. AA, CC, GG and TT are encoded with a blue dye)
     • There are 16 possible dinucleotide combinations, so each of the four dyes is associated with 4 of them
     • "1,2-probes" indicates that the first and second nucleotides are the interrogation bases. The remaining bases consist of either degenerate or universal bases
     • A phosphorothiolate linkage is present between the fifth and sixth nucleotides of the probe sequence; it is later cleaved with silver ions
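
The two-base encoding described above can be sketched in a few lines. The slide only states that AA, CC, GG and TT share one (blue) dye; the sketch assumes the commonly described scheme in which the colour is the XOR of 2-bit base codes (A=0, C=1, G=2, T=3), which reproduces that property and gives every colour exactly four dinucleotides. The function names and the numbering of the other three colours are illustrative assumptions.

```python
from collections import Counter
from itertools import product

BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode_colors(sequence):
    """Translate a base sequence into colour calls, one per adjacent base pair."""
    return [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(sequence, sequence[1:])]


def decode_colors(first_base, colors):
    """Rebuild the bases from the known first base and the colour calls."""
    bases = [first_base]
    for color in colors:
        bases.append(BITS_TO_BASE[BASE_TO_BITS[bases[-1]] ^ color])
    return "".join(bases)


colors = encode_colors("ATGGC")
print(colors)                      # [3, 1, 0, 3]
print(decode_colors("A", colors))  # ATGGC

# Each colour (0 = blue in this sketch) covers 4 of the 16 dinucleotides.
dinucleotides_per_color = Counter(
    BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in product("ACGT", repeat=2)
)
print(dinucleotides_per_color)
```

Since a colour alone is ambiguous among four dinucleotides, decoding needs one known base to start from; in this sketch that is the `first_base` argument.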
  25. 25. Life/APG – SOLiD platform 1. Emulsion-based sample preparation (emPCR) 2. Chemical crosslinking to an amino-coated glass surface
  26. 26. Life/APG – SOLiD platform 3. SBL protocol
     • Upon the annealing of a universal primer, a library of 1,2-probes is added. Ligation of complementary probes follows
     • Four-color imaging
     • The ligated 1,2-probes are chemically cleaved with silver ions to generate a 5'-PO4 group
     • The SOLiD cycle is repeated 9 times
  27. 27. Life/APG – SOLiD platform 3. SBL protocol: the extended primer is then stripped and four more ligation rounds are performed, each with ten ligation cycles
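
The bookkeeping of the ligation rounds can be sketched as follows. Two things are assumed here rather than taken from the slides: that each ligation cycle advances the read by five bases (cleavage happens after the fifth probe base), and that each new round uses a primer shifted by one base (n, n-1, ..., n-4). Position 0 is the first template base after primer n; negative positions fall in the adapter.

```python
from collections import Counter

PROBE_STEP = 5         # bases gained per ligation cycle (assumed from the cleavage site)
CYCLES_PER_ROUND = 10  # ten ligation cycles per round (from the slides)
ROUNDS = 5             # the first round plus four more (from the slides)


def interrogation_counts():
    """Count how often each template position is read by a 1,2-probe."""
    counts = Counter()
    for primer_offset in range(ROUNDS):  # assumed primers n, n-1, ..., n-4
        for cycle in range(CYCLES_PER_ROUND):
            first = cycle * PROBE_STEP - primer_offset
            counts[first] += 1      # first interrogation base
            counts[first + 1] += 1  # second interrogation base
    return counts


counts = interrogation_counts()
print([counts[p] for p in range(0, 10)])  # [2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
```

Under these assumptions every template position from 0 to 45 is interrogated exactly twice, by two different probes in two different rounds, which is the redundancy that two-base encoding relies on for error checking.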
  28. 28. Life/APG – SOLiD platform
     • ChIP-seq:
       - Chromatin immunoprecipitation sequencing (ChIP-Seq) on the SOLiD™ System. Publication: Nature Methods (2009)
       - Chromosome length influences replication-induced topological stress. Publication: Nature (2011)
     • Methyl-seq: Increased methylation variation in epigenetic domains across cancer types. Publication: Nature Genetics (2011)
     • Metagenomics: The carnivorous bladderwort (Utricularia, Lentibulariaceae): a system inflates. Publication: Journal of Experimental Botany (2010)
     • cDNA-based DGE, RNA-Seq and small RNA sequencing: Evolution of yeast noncoding RNAs reveals an alternative mechanism for widespread intron loss. Publication: Science (2010)
  29. 29. Pacific Biosciences
  30. 30. Pacific Biosciences
     • All the aforementioned methods use enzymatic activities and various termination approaches, leading to short sequence reads (max. 350 bp)
     • Real-time DNA sequencing aims to exploit the high catalytic rate and high processivity of DNA polymerase, using the enzyme as a real-time sequencing engine in order to obtain longer reads. To fully harness the intrinsic speed, fidelity and processivity of the polymerase, several technical challenges must be met simultaneously:
       - The speed at which each polymerase synthesizes DNA fluctuates stochastically, so polymerases must be observed individually
       - A high nucleotide concentration is required, so the observation volume must be reduced to allow single-molecule detection
       - The polymerase has to work with 100% fluorescently labeled dNTPs
       - A surface chemistry is required that retains the activity of the polymerase and inhibits nonspecific adsorption of labeled dNTPs
  31. 31. Pacific Biosciences: Single Molecule Real Time (SMRT) DNA sequencing
     • The zero-mode waveguide (ZMW) design reduces the observation volume down to the zeptolitre range (10^-21 l), reducing the number of stray fluorescently labeled molecules that enter the detection layer in a given period
     • The residence time of phospholinked nucleotides in the active site is usually on the millisecond scale, which corresponds to a recorded fluorescence pulse
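
Why the observation volume matters can be seen from the expected number of free labeled nucleotides inside it, N = c · V · N_A. The sketch below is illustrative only: the 1 µM nucleotide concentration and the 1 femtolitre comparison volume are assumptions, not figures from the slides; only the 10^-21 l volume is taken from the text above.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def molecules_in_volume(concentration_molar, volume_litres):
    """Expected number of molecules present in the observation volume."""
    return concentration_molar * volume_litres * AVOGADRO


ASSUMED_CONCENTRATION = 1e-6  # 1 micromolar labeled dNTPs (assumption)

zmw_count = molecules_in_volume(ASSUMED_CONCENTRATION, 1e-21)         # zeptolitre scale
femtolitre_count = molecules_in_volume(ASSUMED_CONCENTRATION, 1e-15)  # assumed comparison

print(f"zeptolitre volume: {zmw_count:.1e} molecules on average")
print(f"femtolitre volume: {femtolitre_count:.0f} molecules on average")
```

With these numbers the zeptolitre volume holds far less than one stray nucleotide on average, so a fluorescence pulse can be attributed to the nucleotide held in the polymerase active site, whereas a femtolitre volume would contain hundreds of background molecules.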
  32. 32. Pacific Biosciences Video
  33. 33. Pacific Biosciences
     • The initial read accuracy was estimated at 83% at 1X. Common errors were insertions, deletions and mismatches
     • At coverage of up to 15X, the authors demonstrated an accuracy of >99%
     • In 2009, Pacific Biosciences reported improvements to their platform. E. coli was sequenced at 38X, covering 99.3% of the genome with an accuracy of >99.999%
     • Average read length: 964 bp
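
The accuracy figures above can be put on the Phred quality scale used for sequencer data, Q = -10 · log10(error probability). The helper name `accuracy_to_phred` is made up for this sketch, and the accuracies are treated as per-base values.

```python
import math


def accuracy_to_phred(accuracy):
    """Convert a per-base accuracy (0 < accuracy < 1) into a Phred quality score."""
    error_probability = 1.0 - accuracy
    return -10.0 * math.log10(error_probability)


for label, accuracy in [("1X raw read", 0.83), ("15X consensus", 0.99), ("38X consensus", 0.99999)]:
    print(f"{label}: accuracy {accuracy} -> Q{accuracy_to_phred(accuracy):.1f}")
```

On this scale the raw 83% reads sit below Q8, 99% corresponds to Q20 and 99.999% to Q50, which shows how much of the final accuracy comes from consensus over repeated coverage rather than from the individual reads.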
  34. 34. Comparison of next-generation sequencing platforms
  35. 35. NGS technologies and personal genomes
     • Human genome studies aim to catalogue SNPs and SVs and their association with phenotypic differences, with the eventual goal of personalized genomics for medical purposes (pharmacogenomics)
     • Somatic mutations associated with acute myeloid leukemia have been identified using Illumina/Solexa (Ley T.J. et al. 2008 Nature)
     • Both allelic variants in a family with a recessive form of Charcot-Marie-Tooth disease were elucidated using the SOLiD platform (Lupski J.R. et al., in press, N. Engl. J. Med.)
     • The Cancer Genome Atlas aims at discovering SNPs and SVs associated with major cancers (The Cancer Genome Atlas Research Network, 2011 Nature)
     • Beijing Genomics Institute (BGI) is working on the "1000 Plant & Animal Reference Genomes Project", which aims to generate reference genomes for 1,000 economically and scientifically important plant and animal species. They use the Illumina/Solexa and SOLiD platforms
  36. 36. Sequencing services and the $1,000 genome
     • Illumina announced a personal genome sequencing service that provides 30-fold base coverage for the price of $48,000
     • Complete Genomics offers a similar service with 40-fold coverage priced at $5,000. Its business model relies on a very high volume of customers. They use a newly optimized SBL protocol based on combinatorial probe anchor ligation (cPAL). Reagents: $4,400
     • The greatest challenge for current technology developers is closing the gap between $10,000 and $1,000 for a single genome. The timetable for the $1,000 draft genome is difficult to predict
     Nanopore sequencing?
  37. 37. Nanopore sequencing
     • The system uses the Staphylococcus aureus toxin α-hemolysin, a robust heptameric protein which normally forms holes in membranes
     • DNA and RNA can be electrophoretically driven through a nanopore of suitable diameter (Kasianowicz J.J. et al. 1996 PNAS)
  38. 38. Nanopore sequencing – how does it work?
     • When a small voltage (~100 mV) is imposed across a nanopore in a membrane separating two chambers containing aqueous electrolytes, the ionic current through the pore can be measured
     • Molecules going through the nanopore disrupt the ionic current, and by measuring the disruption the molecules can be identified
     Figure labels: hemolysin; ionic current; lipid bilayer with high electrical resistance
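
The detection principle (a molecule in the pore shows up as a dip in the ionic current) can be sketched as simple threshold-based event detection. Everything specific here is an assumption for illustration: the function name `find_blockades`, the 70% threshold, and the current values, which are invented numbers in arbitrary units rather than measured data.

```python
def find_blockades(current_trace, open_pore_current, threshold_fraction=0.7):
    """Return (start, end) sample indices (end exclusive) of current blockades.

    A blockade is any run of samples below threshold_fraction times the
    open-pore current.
    """
    threshold = open_pore_current * threshold_fraction
    events = []
    start = None
    for i, value in enumerate(current_trace):
        if value < threshold and start is None:
            start = i                  # blockade begins
        elif value >= threshold and start is not None:
            events.append((start, i))  # blockade ends
            start = None
    if start is not None:              # trace ended during a blockade
        events.append((start, len(current_trace)))
    return events


# Invented trace: open pore near 120, two blockades of different depth.
trace = [120, 119, 121, 40, 42, 41, 120, 118, 55, 57, 121]
blockades = find_blockades(trace, open_pore_current=120)
print(blockades)  # [(3, 6), (8, 10)]
```

Identifying which molecule caused each event would then rely on the depth and duration of the blockade, which this sketch does not attempt.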
  39. 39. Nanopore – exonuclease sequencing. Figure labels: exonuclease; DNA to be sequenced; aminocyclodextrin adaptor
  40. 40. Nanopore – strand sequencing (figure label: DNA polymerase)
     • The DNA polymer passes through the nanopore itself
     • The nanopore is engineered to allow single-base resolution within the strand
     • A DNA polymerase, coupled to an α-hemolysin pore, synthesizes a new strand of DNA using the polymer coming out of the pore as a template
  41. 41. Nanopore sequencing
     • Advantages:
       - minimal sample preparation
       - no requirement for polymerase or ligase
       - potential for very long read lengths (>10,000-50,000 nt)
       - it might well achieve the goal of $1,000 per mammalian genome
       - the instrument is inexpensive
     • Challenges:
       - slowing down DNA translocation from microseconds per base to milliseconds
       - reducing the stochastic motion of the DNA molecule in transit in order to increase the signal-to-noise ratio
       - a stable support for the hemolysin heptamer
  42. 42. Ion Torrent technology