Ngs intro_v6_public



An Introduction to NGS (Next Generation Sequencing). Part 1: principles, machines and comparative analysis. By François Paillier.

Published in: Technology

  1. 1. An Introduction to NGS (Next Generation Sequencing). François Paillier, 22/02/2011
  2. 2. Plan: [Reminder about Sanger sequencing] • NGS definition • Overview of NGS technologies • NGS applications & examples • Conclusion. NOT discussed here: sequence accuracy, assembly and sampling; NGS data analysis & bioinformatics tools.
  3. 3. A word about Sanger sequencing (first-generation sequencing machine: 3730xl; video). Principle (only the tube G + dideoxyG is shown). From gel to capillary. Still a gold standard, but capillary sequencing has reached its technical limits (costs and performance will remain unchanged).
  4. 4. Short reminder about « classical » assembly projects. [Diagram] Sample → libraries. Target genome → cloning of sub-targets (BACs, cosmids, ...) → clone selection & sequencing (n sequencing sub-projects) → assembly → finishing: draft (Q40) → annotation → annotated genome. Other strategy: WGS.
  5. 5. Sequencing, what for? Assembly projects, for example. In bioinformatics, sequence assembly refers to aligning and merging fragments of a much longer DNA sequence in order to reconstruct the original sequence. This is needed because DNA sequencing technology cannot read whole genomes in one go, but only small pieces of between 20 and 1000 bases, depending on the technology used. Typically the short fragments, called reads, result from shotgun sequencing of genomic DNA or gene transcripts (ESTs). [Figure: target genome → sequencing reads → assembly → assembled reads with gaps, 4X local coverage, consensus, scaffold]
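The "aligning and merging" step described on this slide can be illustrated with a toy greedy assembler. This is only a sketch under strong assumptions (error-free reads, a single strand, no repeats); the function names `overlap` and `greedy_assemble` and the example reads are hypothetical and do not correspond to any real assembly program.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when the best overlap is shorter than `min_len`.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap.

    Stops when no pair overlaps by at least `min_len` bases; whatever is
    left is the list of contigs (more than one contig means gaps remain).
    """
    contigs = list(reads)
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


# Hypothetical reads sampled from the made-up target "ATGGCGTACCTGA".
reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
print(greedy_assemble(reads))  # ['ATGGCGTACCTGA']
```

When some reads share no sufficient overlap, the function returns several contigs, which corresponds to the gaps drawn in the slide's figure.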
  6. 6. Vocabulary that should be kept in mind in the sequencing field • Assembly: result of clustering sequences based on their local similarity • Contig: a set of overlapping DNA segments • Coverage (in sequencing): the mean number of times a nucleotide is sequenced in a genome (example: 10X coverage) • Scaffold: a series of contigs that are in the right order but not necessarily connected in one contiguous stretch • Mate pairs: sequences known to be at the 3′ and 5′ ends of a contig, from a single clone • WGS = whole genome shotgun sequencing strategy • ESS = environmental shotgun sequencing
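The coverage definition above reduces to simple arithmetic: total sequenced bases divided by genome size. The sketch below shows this; the function names and the 4 Mb genome are hypothetical values chosen for illustration, and it assumes reads are spread uniformly over the genome.

```python
import math


def mean_coverage(n_reads, read_length, genome_size):
    """Mean number of times each base is sequenced (e.g. 10.0 means 10X)."""
    return n_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Smallest number of reads that reaches the target mean coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


# Hypothetical example: a 4 Mb genome sequenced with 400 nt reads.
print(mean_coverage(100_000, 400, 4_000_000))  # 10.0
print(reads_needed(10, 400, 4_000_000))        # 100000
```

Mean coverage says nothing about local variation: individual regions can be covered more or less deeply than the mean, or not at all.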
  7. 7. NGS = Next Generation Sequencing. After PCR, THE new revolution in biology?
  8. 8. A synonym for NGS is High-Throughput Sequencing (HTS). Third generation: NGS = HTS, single-molecule sequencing. Second generation: NGS = massively parallel sequencing. First generation: Sanger sequencing.
  9. 9. Overview of current NGS technologies (second-generation sequencing machines). Year: 2005* Roche 454 GS-FLX Titanium; 2006 Illumina GA1, then GA2; 2007 Applied Biosystems SOLiD v3. Protocol: a must. Each machine differs in: - throughput - sequence accuracy - data formats (and programs). *The NGS "proof of principle" was done in 2000 by Lynx Therapeutics, which published and marketed "MPSS", a parallelized, adapter/ligation-mediated, bead-based sequencing technology, launching "next-generation" sequencing.
  10. 10. Throughput per Illumina channel
  11. 11. HOW is it possible?
  12. 12. NGS principle: building sequencing devices at nanoscale. Polony: discrete clonal amplification of a single DNA molecule, grown in a gel matrix. The clusters can then be individually sequenced, producing short reads. Polony-based sequencing is the basis of most second-generation sequencers. A typical NGS workflow is: 1) library construction; 2) template CLONAL amplification; 3) massively PARALLEL sequencing.
  13. 13. High parallelism is achieved in polony sequencing. [Figure: Sanger vs. polony]
  14. 14. Generation of polony array: DNA beads (454, SOLiD). DNA beads are generated using emulsion PCR.
  15. 15. Generation of polony array: DNA beads (454, SOLiD). DNA beads are placed in wells.
  16. 16. Sequencing: pyrosequencing (454). DNA polymerase. « Pyrogram » / « flowgram ».
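To make the « flowgram » idea concrete, the sketch below converts a list of light-signal intensities into bases. It assumes that nucleotides are flowed cyclically in the order T, A, C, G and that each signal is roughly proportional to the number of identical bases incorporated in that flow. The flow order, the signal values and the function name `call_bases` are illustrative assumptions, not taken from the slides or from 454 software.

```python
def call_bases(flow_values, flow_order="TACG"):
    """Turn flowgram intensities into a base sequence.

    Each flow adds one nucleotide species; the signal is rounded to the
    nearest whole number of incorporated bases (0 means no incorporation).
    """
    sequence = []
    for i, signal in enumerate(flow_values):
        base = flow_order[i % len(flow_order)]
        n_bases = int(signal + 0.5)  # round half up; signals are >= 0
        sequence.append(base * n_bases)
    return "".join(sequence)


# Hypothetical flowgram: two full T-A-C-G cycles.
flows = [1.02, 0.05, 0.98, 2.10, 0.00, 1.10, 0.03, 0.90]
print(call_bases(flows))   # TCGGAG

# A long homopolymer: a small change in signal changes the called length.
print(call_bases([4.45]))  # TTTT
print(call_bases([4.55]))  # TTTTT
```

The last two calls show why signals close to a rounding boundary in a homopolymer run yield an inserted or deleted base rather than a substitution.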
  17. 17. 454 process: emulsion PCR & pyrosequencing. Titanium = read lengths approx. 400 nt; 1 million reads / run → 400 Mb / day. Videos: about pyrosequencing (1’53’’); summary about GS FLX (4’34’’).
  18. 18. 454 GS FLX Titanium. Strengths: • No more cloning step • From purified DNA to sequencing • Fits the laboratory bench top / small • LONG sequences (400 nt) • GS Junior system not so expensive • Capabilities: multiplexing & paired-ends • Well fitted to: prokaryotic genome sequencing, RNA-seq. Limitations: - Sequence accuracy not so high (especially in the case of homopolymers) → main error type is indel - Cost: approx. 20K€ / Gb; cost per base is cheaper than Sanger but still high compared with other NextGen machines.
  19. 19. Illumina*: bridge PCR. GA2x version = read lengths approx. 100 nt; 240 million reads → 1500 Mb / day → 30000 Mb / run
  20. 20. Generation of polony array: bridge PCR (Solexa). DNA fragments are attached to the array and used as PCR templates. Video: Genome Analyzer workflow.
  21. 21. Illumina chemistry: 4-color DNA sequencing-by-synthesis using reversible terminators with removable fluorescent dyes. A flow cell: 8 lanes.
  22. 22. Illumina seq. Accuracy
  23. 23. Illumina Throughput
  24. 24. Illumina. Strengths: • No more cloning step • From purified DNA to sequencing • Fits the laboratory bench top / small • Good sequence accuracy • Capabilities: multiplexing & paired-ends • Cost: approx. 2K€ / Gb; cost per base is cheaper than 454 • Most widely used NGS platform • Requires least DNA • Well fitted to: prokaryotic genome sequencing, RNA-seq, ChIP-Seq, Methyl-Seq. Limitations: - Machine is very expensive - Main error type is mismatch - Read lengths are still too short - Not fitted to big genomes (repeats) - Poor coverage of AT-rich regions.
  25. 25. SOLiD system: 4-color DNA sequencing by ligation. SOLiD v3 = read lengths approx. 50 nt; 400 million reads → 1500 Mb / day → 20000 Mb / run → 1500 € / Gb. Video (4’46’’).
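The per-run figures quoted on the 454, Illumina and SOLiD slides can be cross-checked by multiplying read count by read length. The sketch below uses only numbers from those slides; the dictionary layout and the function name `yield_mb` are illustrative choices.

```python
# Read counts and approximate read lengths as quoted on the slides.
platforms = {
    "454 Titanium": {"reads": 1_000_000, "read_len_nt": 400},
    "Illumina GA2x": {"reads": 240_000_000, "read_len_nt": 100},
    "SOLiD v3": {"reads": 400_000_000, "read_len_nt": 50},
}


def yield_mb(reads, read_len_nt):
    """Total sequenced bases for one run, in megabases (Mb)."""
    return reads * read_len_nt / 1_000_000


for name, p in platforms.items():
    print(f"{name}: {yield_mb(p['reads'], p['read_len_nt']):.0f} Mb")
# 454 Titanium: 400 Mb
# Illumina GA2x: 24000 Mb
# SOLiD v3: 20000 Mb
```

The 454 and SOLiD products match the quoted 400 Mb and 20000 Mb. The Illumina product (24000 Mb) is lower than the 30000 Mb / run on the slide; the slides do not say which read count or read length accounts for the difference.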
  26. 26. Sequencing by ligation reaction: fluorescently labeled nucleotides (ABI SOLiD). Complementary strand elongation: DNA ligase.
  27. 27. Sequencing by ligation ABI SOLiD
  28. 28. Sequencing: fluorescently labeled nucleotides (ABI SOLiD). 5 reading frames; each position is read twice.
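"Each position is read twice" reflects the two-base (di-base) encoding used by SOLiD, in which every color reports on a pair of adjacent bases. One common way to describe this code, assumed here, is to number the bases A=0, C=1, G=2, T=3 and take the XOR of neighbouring bases as the color (0 to 3). The function names and sequences below are hypothetical; this is a sketch of the encoding idea, not the instrument's software.

```python
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def encode_colors(seq):
    """Return (first_base, colors); one color per pair of adjacent bases."""
    colors = [BASE_TO_CODE[a] ^ BASE_TO_CODE[b] for a, b in zip(seq, seq[1:])]
    return seq[0], colors


def decode_colors(first_base, colors):
    """Rebuild the base sequence from a known first base and the colors."""
    bases = [first_base]
    code = BASE_TO_CODE[first_base]
    for color in colors:
        code ^= color  # next base = previous base XOR color
        bases.append(CODE_TO_BASE[code])
    return "".join(bases)


first, colors = encode_colors("ATGGCA")
print(colors)                        # [3, 1, 0, 3, 1]
print(decode_colors(first, colors))  # ATGGCA

# A real single-base change (G -> T at the 4th base) alters TWO adjacent colors.
print(encode_colors("ATGTCA")[1])    # [3, 1, 1, 2, 1]

# A single wrong color, by contrast, corrupts every base decoded after it.
print(decode_colors("A", [3, 1, 1, 3, 1]))  # ATGTAC
```

Because a true variant changes two consecutive colors while a measurement error changes only one, the two can in principle be told apart when reads are compared with a reference in color space.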
  29. 29. Sequencing: Fluorescently Labeled Nucleotides (ABI SOLiD)
  30. 30. SOLiD. Strengths: • No more cloning step • From purified DNA or RNA to sequencing • Fits the laboratory bench top / small • Good sequence accuracy • Capabilities: multiplexing & paired-ends • Cost: approx. 1.5K€ / Gb; cost per base is cheaper than Illumina • Well fitted to: resequencing, RNA-seq, ChIP-Seq, Methyl-Seq. Limitations: - This technology is NOT intuitive - Machine is VERY expensive - HUGE amount of data produced (1500 Gb!!) - Long run times - It has been demonstrated that certain reads don’t match the reference!
  31. 31. Focusing NGS effort on predefined targets: « target enrichment » technology (capture array)
  32. 32. Focusing NGS effort on predefined targets: « target enrichment » technology (capture beads)
  33. 33. Summary: NGS workflows +/- target enrichment strategy. Source: BCG
  34. 34. Prokaryotic genome sequencing project as a mix of NGS technologies. Conclusion: - High-quality drafts can be produced for small genomes without any Sanger data input. - We found that 454 GS FLX and Solexa/Illumina show great complementarity in producing large contigs and supercontigs with a low error rate.
  35. 35. NGS applications: DEEPER insight into biological processes; BROADER sampling of populations (cells, viruses, ecosystems…) • In different fields… – Metagenomics – Genomics – Transcriptomics – Proteomics
  36. 36. Genome: * De novo sequencing * Targeted resequencing (SNP, indel, CNV) * Whole genome resequencing * Metagenome analyses. Transcriptome: * Gene expression profiling * Small RNA analysis * Whole transcriptome analysis. Epigenome: * Chromatin immunoprecipitation sequencing (ChIP-Seq) * Methylation analysis. …for different purposes… - Towards personalized medicine - Biodiversity assessment - De novo sequencing of prokaryotic or eukaryotic genomes (or re-sequencing) - RNA-Seq → annotation of eukaryotic genomes - SNP calling: identification of mutations - ChIP-Seq: identification of DNA/protein interactions
  37. 37. What is the current impact of NGS on biology? • Both transcriptomics and genomics can now be addressed using one technology with higher accuracy and robustness (instead of, e.g., Sanger sequencing + µarrays) (→ example of RNA-Seq) • SNP calling can rely on ultra-deep assemblies • Whole-genome overview of transcription factor binding sites • Biodiversity assessment (→ metagenomics projects) • And so much more…
  38. 38. About whole-exome sequencing: « For the First Time, DNA Sequencing Technology Saves A Child's Life » « Proponents of genetic medicine say DNA sequencing is the future of medicine and that soon every truly sick person will have his or her genome sequenced. Critics cite privacy concerns and note that genetic mutations and variations don’t necessarily lead to medical outcomes. Whatever the position, it’s hard to argue that this isn’t good news: the first child – plagued by undiagnosable illness – has been saved by DNA sequencing. That may be a bit of a strong statement – six-year-old Nicholas Volker is doing well, though complications could soon arise. But it’s highly likely that the sequencing of young Nicholas’s genome saved his life. » Article: Mayer et al., Genetics in Medicine, Volume xx, Number xx, 01 2011
  39. 39. What’s next? Ion Torrent, PacBio. Second generation: NGS = massively parallel sequencing (polony sequencing): Roche 454 GS-FLX Titanium; Illumina GA2; Applied Biosystems SOLiD v3. Third generation: - Single-molecule sequencing (no bias) - Faster - Cheaper (or not) - 1000 € human genome?
  40. 40. Conclusion: impact of NGS • Global shift to sequencing-based technologies • Great improvements ongoing: higher throughput, longer reads • Is it the end of µarrays? A sub-part of NGS workflows restricted to target enrichment? • Is it the end of forward genetics? Reverse genetics only? • Biologists' education should integrate NGS knowledge • Is it the end of « big sequencing centers »? A change in their mission? Next bottleneck: bioinformatics - Storing data is a problem (SRA soon down?) AND IT network speeds are FAR too low → very difficult to share NGS data → fridges instead of disks!? - Analyzing data is a problem → great improvements, but a lot of work still remains to be done
  41. 41. Thanks for your attention!
  42. 42. Technology summary
     Technology      | Read length | Sequencing technology | Throughput (per run) | Cost (1 Mbp)*
     Sanger          | ~800 bp     | Sanger                | 400 kbp              | $500
     454             | ~400 bp     | Polony                | 500 Mbp              | $60
     Solexa/Illumina | 75 bp       | Polony                | 20 Gbp               | $2
     SOLiD           | 75 bp       | Polony                | 60 Gbp               | $2
     Helicos         | 30-35 bp    | Single molecule       | 25 Gbp               | $1
     *Source: Shendure & Ji, Nat Biotech, 2008
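The summary table can be turned into a rough project estimate: total bases to sequence (genome size × coverage) give both the number of runs and the sequencing cost. The sketch below uses the throughput and cost-per-Mbp values from the table; the 5 Mbp genome, the 20X coverage and the function name `project_estimate` are hypothetical, and the estimate ignores library preparation, failed runs and instrument cost.

```python
import math

# Throughput per run (in bases) and cost per Mbp (in $), from the table above.
TECH = {
    "Sanger": {"throughput_bp": 400_000, "cost_per_mbp": 500},
    "454": {"throughput_bp": 500_000_000, "cost_per_mbp": 60},
    "Solexa/Illumina": {"throughput_bp": 20_000_000_000, "cost_per_mbp": 2},
    "SOLiD": {"throughput_bp": 60_000_000_000, "cost_per_mbp": 2},
    "Helicos": {"throughput_bp": 25_000_000_000, "cost_per_mbp": 1},
}


def project_estimate(genome_bp, coverage, tech):
    """Return (runs needed, sequencing cost in $) for a target coverage."""
    total_bp = genome_bp * coverage
    runs = math.ceil(total_bp / tech["throughput_bp"])
    cost = total_bp / 1_000_000 * tech["cost_per_mbp"]
    return runs, cost


# Hypothetical project: a 5 Mbp genome at 20X coverage (100 Mbp in total).
for name, tech in TECH.items():
    runs, cost = project_estimate(5_000_000, 20, tech)
    print(f"{name}: {runs} run(s), ${cost:,.0f}")
```

For a genome this small, a single run of any second-generation machine far exceeds the required data, which is one reason multiplexing several samples per run matters in practice.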
  43. 43. NGS technology comparison
     Criterion                | ABI SOLiD | Illumina GA | 454 Roche FLX
     Cost                     | SOLiD 4: $495k; SOLiD PI: $240k | IIe: $470k; IIx: $250k; HiSeq: $690k | Titanium: $500k
     Quantity of data per run | SOLiD 4: 100 Gb; SOLiD PI: 50 Gb | IIe: 20-38 Gb; IIx: 50-95 Gb; HiSeq: 200 Gb+ | 450 Mb
     Run time                 | 7 days | 4 days | 9 hours
     Pros                     | Low error rate due to dibase probes | Most widely used NGS platform. Requires least DNA | Short run time. Long reads better for de novo sequencing
     Cons                     | Long run times. It has been demonstrated that certain reads don’t match the reference | Least multiplexing capability of the 3. Poor coverage of AT-rich regions | Expensive reagent cost. Difficulty reading homopolymer regions
     Source: The University of Western Ontario