NGS Introduction and Technology Overview (UEB-UAT Bioinformatics Course - Session 2.1.1 - VHIR, Barcelona)


Course: Bioinformatics for Biomedical Research (2014).
Session: 2.1.1 - Next Generation Sequencing. Technologies and Applications. Part I: NGS Introduction and Technology Overview.
Statistics and Bioinformatics Unit (UEB) & High Technology Unit (UAT) from Vall d'Hebron Research Institute (VHIR), Barcelona.

Published in: Science, Technology, Business

  1. Vall d’Hebron Institut de Recerca (VHIR), health research institute accredited by the Instituto de Salud Carlos III (ISCIII). Rosa Prieto, Head of the High Tech Unit. 15/05/2014. Next Generation Sequencing Technologies and Applications. Course on Bioinformatics for Biomedical Research.
  3. Introduction: the personalized medicine era
     Biomarker identification:
       • Diagnostic
       • Susceptibility/risk (prevention)
       • Prognostic (indolent vs. aggressive)
       • Predictive (response)
     - The right therapeutic strategy for the right person at the right time
     - Predisposition to disease
     - Early and targeted prevention
  4. Introduction: “omics”
     Omics aims at the collective characterization and quantification of pools of biological molecules that translate into the structure, function, and dynamics of an organism or organisms (Wikipedia).
     High-throughput technologies: genomics, epigenomics, metagenomics, transcriptomics, proteomics, metabolomics, lipidomics.
  5. Next generation sequencing: the future is here, now? Everything can be sequenced…
  6. Introduction to NGS technologies
     Timeline: 1st generation, 2nd generation, 3rd generation
     - Automatic sequencer (ABI), 1987
     - (GS20)
     - Human genome: 3,234.83 Mb (haploid); $2.7 billion
  7. Sequencing technology milestones: first generation sequencing, second generation sequencing
  8. NGS increases capacity and reduces costs
     Moore’s Law (1965): the number of transistors in an integrated circuit doubles approximately every two years.
     Source: NHGRI

     Date     Cost per Mb   Cost per genome   % cost vs. Sep-01
     Sep-01   $5,292.39     $95,263,072       100%
     Sep-02   $3,413.80     $61,448,422       64.5039%
     Oct-03   $2,230.98     $40,157,554       42.1544%
     Oct-04   $1,028.85     $18,519,312       19.4402%
     Oct-05   $766.73       $13,801,124       14.4874%
     Oct-06   $581.92       $10,474,556       10.9954%
     Oct-07   $397.09       $7,147,571        7.5030%
     Oct-08   $3.81         $342,502          0.3595%
     Oct-09   $0.78         $70,333           0.0738%
     Oct-10   $0.32         $29,092           0.0305%
     Oct-11   $0.086        $7,743            0.0081%
     Oct-12   $0.074        $6,618            0.0069%
     Oct-13   $0.057        $5,096            0.0053%
     Jan-14   $0.045        $4,008            0.0042%
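The comparison with Moore's Law on this slide can be made concrete with a small calculation. The sketch below recomputes the "% cost vs. Sep-01" column from the cost-per-genome figures and estimates how often the cost halved in two periods. The choice of periods, the function names, and the use of the 1st of each month (the slide gives only month and year) are assumptions made for illustration.

```python
from datetime import date
from math import log2

# NHGRI cost-per-genome figures quoted on the slide (USD).
# The day of the month is not given on the slide; the 1st is assumed.
COST_PER_GENOME = [
    (date(2001, 9, 1), 95_263_072),
    (date(2007, 10, 1), 7_147_571),
    (date(2014, 1, 1), 4_008),
]


def percent_of_baseline(cost, baseline):
    """Cost expressed as a percentage of the baseline cost."""
    return 100.0 * cost / baseline


def halving_time_years(start, end):
    """Average time, in years, for the cost to halve between two points."""
    (d0, c0), (d1, c1) = start, end
    years = (d1 - d0).days / 365.25
    halvings = log2(c0 / c1)
    return years / halvings


baseline = COST_PER_GENOME[0][1]
for when, cost in COST_PER_GENOME:
    pct = percent_of_baseline(cost, baseline)
    print(f"{when.isoformat()}: {pct:.4f}% of the Sep-01 cost")

early = halving_time_years(COST_PER_GENOME[0], COST_PER_GENOME[1])
late = halving_time_years(COST_PER_GENOME[1], COST_PER_GENOME[2])
print(f"Sep-01 to Oct-07: cost halves about every {early:.2f} years")
print(f"Oct-07 to Jan-14: cost halves about every {late:.2f} years")
```

A halving time close to two years would track Moore's Law; a much shorter one means sequencing costs fell faster than that reference rate.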
  9. Sanger sequencing vs. NGS (2nd and 3rd generation)
     Sanger:
       1. DNA fragmentation
       2. Cloning into vectors; transformation of bacteria; growth and isolation of vector DNA
       3. Cycle sequencing (primer, polymerase, dNTPs, labelled ddNTPs; example sequence: CTATGCTCG), followed by electrophoresis (1 sequence per capillary)
       4. Image processing
     2nd generation NGS:
       1. DNA fragmentation
       2. In vitro adapter ligation and clonal amplification
       3. Massively parallel sequencing
       4. Image processing and data analysis
     3rd generation NGS:
       1. DNA fragmentation
       2 and 3. In vitro adapter ligation and massively parallel sequencing WITHOUT amplification
       4. Image processing and data analysis
  10. Comparison of different NGS platforms
     - Similarities (and differences vs. Sanger):
       • Library preparation: the starting material is short fragments of nucleic acids; adapter ligation; multiplexing (MID tags)
       • Clonal amplification (not for 3rd generation sequencing)
       • Massively parallel sequencing
       • The use of physical location to identify unique reads is a critical concept for all next generation sequencing systems. The density of the reads and the ability to record them without interfering noise are vital to the throughput of a given instrument.
       • The signal needs to be processed and post-treated to obtain the individual sequences
       • Complex data analysis due to the large amount of data
     - Differences:
       • Clonal amplification method/sequencing technology/signal detection
       • Throughput
       • Read length
       • Run time
       • Cost per base
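Multiplexing with MID tags means that reads from several samples are sequenced together and separated afterwards by the tag at the start of each read. A minimal sketch of that separation step follows; the tag sequences, sample names and reads are invented for the example, and only exact matches are accepted, whereas real demultiplexing tools usually tolerate sequencing errors in the tag.

```python
from collections import defaultdict

# Hypothetical MID tags and sample names, invented for this example.
MID_TAGS = {"AACCGG": "sample_A", "TTGGCC": "sample_B"}


def demultiplex(reads, mid_tags=MID_TAGS):
    """Group reads by the MID tag at their 5' end and trim the tag off."""
    by_sample = defaultdict(list)
    for read in reads:
        for tag, sample in mid_tags.items():
            if read.startswith(tag):
                by_sample[sample].append(read[len(tag):])
                break
        else:
            # No known tag matched exactly.
            by_sample["unassigned"].append(read)
    return dict(by_sample)


reads = ["AACCGGTTTACG", "TTGGCCGATTACA", "AACCGGCCCC", "NNNNNNACGT"]
for sample, sample_reads in demultiplex(reads).items():
    print(sample, sample_reads)
```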
  11. 2nd generation NGS platforms (includes benchtop instruments)
     - Illumina: HiSeq 2500, MiSeq, NextSeq 500, HiSeq X Ten (expected 2014)
     - Life Technologies: SOLiD 5500xl, Ion PGM, Ion Proton
     - Roche: GS Junior 454, GS FLX+ 454
  12. NGS general workflow
     1. Library preparation: DNA fragmentation and in vitro adaptor ligation. Different kinds of libraries (amplicons, shotgun, cDNA…)
     2. Clonal amplification: emulsion PCR (454, Ion Proton/PGM) or bridge PCR (Illumina)
     3. Cyclic array sequencing: pyrosequencing (454), semiconductor sequencing (Ion Proton/PGM), 4-colour fluorescent nucleotides (Illumina)
  13. Clonal amplification by emPCR (454, Ion). emPCR-based systems: Roche, SOLiD, Ion
     - 1 effective starting fragment per microreactor
     - ~10^6 microreactors per ml
     - All processed in parallel (clonal amplification)
     - High-speed shaker
  14. Clonal amplification by emPCR (454, Ion)
     Clonal amplification?
     - No empty beads
     - No beads containing more than one amplified fragment
     1) Bead vs. starting DNA quantity titration
     2) Optimal enrichment (5-20% OK): melt dsDNA; bind the biotin-labelled primer to the capture beads carrying ssDNA; add streptavidin-coated magnetic beads; melt
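The trade-off behind the bead vs. DNA titration can be illustrated with a simple model. The sketch below assumes that fragments land on beads independently, so the number of fragments per bead follows a Poisson distribution; this is a common modelling assumption and is not stated on the slide. The ratios tried are hypothetical titration points.

```python
from math import exp


def bead_loading(ratio):
    """Poisson fractions of beads for a mean number of fragments per bead.

    Returns (empty, clonal, mixed): beads with zero, exactly one, and
    more than one starting fragment.
    """
    empty = exp(-ratio)
    clonal = ratio * exp(-ratio)
    mixed = 1.0 - empty - clonal
    return empty, clonal, mixed


# Hypothetical titration points (fragments per bead); not values from the slide.
for ratio in (0.05, 0.1, 0.2, 0.5, 1.0, 2.0):
    empty, clonal, mixed = bead_loading(ratio)
    print(f"ratio {ratio:>4}: empty {empty:6.1%}  "
          f"clonal {clonal:6.1%}  mixed {mixed:6.1%}")
```

Under this model a low fragment-to-bead ratio keeps mixed beads rare but leaves most beads empty, which is one way to see why an enrichment step for DNA-carrying beads follows the amplification.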
  15. Bridge amplification (Illumina). Cluster generation: “bridge” PCR
     - 100-200 million clusters
     - HiSeq 2500: 2 flow cells, 8 lanes per cell
     - Binding of single strands to the adapters
     - Removal of the reverse strands
     - Blocking and addition of the sequencing primer
     - Clonal double-stranded clusters
  16. GS FLX 454 sequencing
     - Metal-coated PTP reduces crosstalk
     - 29 μm well diameter (20/bead)
     - 3,400,000 wells per PTP
  17. GS FLX 454 sequencing: pyrosequencing (sequencing by synthesis)
     - A CCD camera records a “flowgram” (signal intensity is proportional to the number of nucleotides incorporated in the sequence)
     - Throughput is limited by the number of wells in the PTP
     - Errors in homopolymers (454)
     - Long sequences (up to 1000 bp) are achieved
     - Low throughput, very expensive reagents
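Because the flowgram signal is proportional to the number of nucleotides incorporated, base calling amounts to rounding each flow's intensity to an integer count. The following sketch shows the idea and why long homopolymers are error-prone. The cyclic flow order, the function name and all intensities are assumptions made up for the example, not instrument output.

```python
# Flow order is an assumption for illustration; the slide does not state it.
FLOW_ORDER = "TACG"


def call_bases(flowgram, flow_order=FLOW_ORDER):
    """Turn per-flow signal intensities into a base sequence.

    Each intensity is rounded to the nearest integer and read as the
    number of identical nucleotides incorporated during that flow.
    """
    bases = []
    for i, signal in enumerate(flowgram):
        nucleotide = flow_order[i % len(flow_order)]
        count = max(0, int(signal + 0.5))  # round half up, never negative
        bases.append(nucleotide * count)
    return "".join(bases)


# Hypothetical intensities for flows T, A, C, G, T, A, C, G.
flowgram = [1.02, 0.05, 0.98, 0.03, 2.10, 1.04, 0.00, 0.96]
print(call_bases(flowgram))

# The same small absolute noise flips the call of a long homopolymer.
print(len(call_bases([6.45])), len(call_bases([6.55])))
```

A deviation of 0.1 signal units is harmless around an intensity of 1 but changes a 6-base call into a 7-base call, which matches the homopolymer weakness listed on the slide.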
  18. Illumina sequencing: reversible dye terminator nucleotides (sequencing by synthesis)
     Cycle: sequential release of 4 fluorescent nucleotides; incorporation; image capture; removal of the 3’ terminator
     - Limited by the fragment length that can “bridge”
     - Labelled nucleotides are not incorporated as efficiently as native ones
     - Short sequences
     - Strand-specific errors, substitutions towards the end of the read, base substitution errors (systematic error GGT > GGG)
     - High throughput, expensive machines, cost per Mb OK
  19. Ion Torrent sequencing (Life Technologies)
     Sample preparation: fragmentation and adapter sequences; clonal amplification (emPCR on beads); deposition of the beads + DNA into the wells of the chip
     1. Sequential release of unmodified nucleotides
     2. The incorporation of a nucleotide by the polymerase releases an H+
     3. Direct and simultaneous detection of a pH change in all wells
     • pH meter, no optical system: rapid output improvement based on chips
     • Fast runs (native nucleotides)
     • Inexpensive machine and reagents
     • Fails in homopolymer detection
  20. NGS data analysis: 454 sequencing (pyrosequencing)
  21. NGS platforms comparison

     Platform | Roche GS FLX+ 454 | Illumina HiSeq 2500 | Ion Proton
     Library preparation | emPCR | Bridge amplification | emPCR
     Sequencing chemistry | Pyrosequencing | Reversible dye terminators | pH change
     Read length | Up to 1000 bp | From 2x125 bp to 2x300 bp | Up to 200 bp
     Run time | 22 hrs | 7 hrs-6 days | From 2 to 4 hrs
     Throughput/run | Up to 700 Mb | 500-1000 Gb (1 Tb) | 10 Gb (PI), 100 Gb (PII)
     Equipment cost | $500,000 | $750,000 | $250,000
     Reagents cost/run | $8,000 | $5,500 | $1,000

     GOOD!
       • Roche GS FLX+ 454: longest read length
       • Illumina HiSeq 2500: high throughput/low cost per base/ease of use
       • Ion Proton: quick, easy to use and cheap
     BAD!
       • Roche GS FLX+ 454: high error rate in homopolymers (>6); very expensive; low throughput; not automated at all
       • Illumina HiSeq 2500: short sequences; strand-specific errors, substitutions towards the end of the read, base substitution errors (systematic error GGT > GGG)
       • Ion Proton: errors in homopolymers; higher bias than Illumina
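The "low cost per base" and "very expensive" labels can be checked against the table with a rough reagent cost per Mb. The sketch below divides the reagent cost per run by the upper throughput figure; it assumes 1 Gb = 1000 Mb, assumes the Ion Proton reagent cost refers to a PI chip run, and ignores equipment, labour and library preparation, so the results are only indicative.

```python
# Figures copied from the comparison table (upper throughput bounds, USD).
# Assumption: the Ion Proton reagent cost is for a PI chip run (10 Gb).
PLATFORMS = {
    "Roche GS FLX+ 454": {"reagents_usd": 8_000, "throughput_mb": 700},
    "Illumina HiSeq 2500": {"reagents_usd": 5_500, "throughput_mb": 1_000_000},
    "Ion Proton (PI chip)": {"reagents_usd": 1_000, "throughput_mb": 10_000},
}


def reagent_cost_per_mb(reagents_usd, throughput_mb):
    """Reagent cost per megabase for a single run, in USD."""
    return reagents_usd / throughput_mb


for name, spec in PLATFORMS.items():
    cost = reagent_cost_per_mb(spec["reagents_usd"], spec["throughput_mb"])
    print(f"{name}: ${cost:.4f} per Mb")
```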
  22. NGS high-throughput platforms comparison
     HiSeq 2500
       - Two modes: Rapid Run and High Output
       - Single/dual flow cells
       - PE 2 x 125 bp
       - 120 Gb in 27 hours (Rapid); 1 Tb in 6 days (High Output)
       - 20 exomes in a day; 1 human genome in a day; 30 RNAseq samples in 5 hours
       - Human exome, 30x: approx. 800-1000 €
       - Human RNAseq (30 M reads, 100 bp PE, strand specific): approx. 800-1000 €
       - Human whole genome, 30x: 4000 €
     HiSeq X Ten (10 HiSeq X)
       - Only High Output mode
       - Single/dual flow cells
       - PE 2 x 150 bp
       - 600 Gb in a day (dual flow cell); 1.8 Tb in 3 days (4x faster than HiSeq 2500)
       - HiSeq X Ten: 10,000 genomes at 30x per year
     Ion Proton
       - Ion PI chip: up to 20 Gb output (specification: 10 Gb); read length up to 200 bp; run time 2-4 hrs; 1 human exome (approx. 1000 €)
       - Ion PII chip: up to 100 Gb output (expected 2014), now reduced to 20-30 Gb at launch; run time 2-4 hrs; read length 100 bp; human whole genome (10x, ?)
       - Ion PIII chip (???): 200 Gb output per run
     Source: &
     All these costs are indicative as of May 2014 and are in no way binding for the UAT.
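Run yields and sequencing depths such as "30x" are linked by a simple ratio: mean depth is the number of sequenced bases divided by the genome size. The sketch below applies that ratio to some of the yields quoted above, using the haploid human genome size of 3,234.83 Mb from slide 6. It is an idealized upper bound: it assumes every base is usable and evenly spread, and it assumes the 1.8 Tb figure is for a single HiSeq X instrument, neither of which is stated on the slide.

```python
HAPLOID_GENOME_BP = 3_234.83e6  # haploid human genome size quoted in slide 6


def mean_coverage(yield_bp, genome_bp=HAPLOID_GENOME_BP):
    """Idealized mean depth: sequenced bases divided by genome size."""
    return yield_bp / genome_bp


def genomes_per_run(yield_bp, target_depth=30, genome_bp=HAPLOID_GENOME_BP):
    """Whole genomes that fit in one run at the target depth, rounded down."""
    return int(mean_coverage(yield_bp, genome_bp) // target_depth)


# Yields per run as quoted on the slide, in bases.
RUNS = {
    "HiSeq 2500, High Output (1 Tb)": 1e12,
    "HiSeq X, one instrument (1.8 Tb, assumed)": 1.8e12,
    "Ion PI chip (10 Gb)": 10e9,
}

for name, yield_bp in RUNS.items():
    depth = mean_coverage(yield_bp)
    print(f"{name}: {depth:.1f}x total, "
          f"{genomes_per_run(yield_bp)} genome(s) at 30x")
```

Real projects obtain fewer genomes per run than this bound, since duplicate, low-quality and unmappable reads reduce the usable yield.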
  23. NGS platforms specifications and applications: Ion PGM/Ion Proton, Illumina
  24. NGS platforms specifications and applications: Roche 454, PacBio RSII (3rd generation)
  25. NGS advantages and limitations. Source: Journal of Investigative Dermatology (2013) 133