
High Throughput Sequencing Technologies: What We Can Know


Presentation on the pitfalls of short-read sequencing and some solutions. Detailed slide notes loosely follow what I said.

Published in: Science


  1. High Throughput Sequencing Technologies: What We Can Know
     Brian Krueger, PhD
     Duke University Center for Human Genome Variation
  2. 2nd Generation Sequencing Overview
     [Figure: workflow diagram]
     • Genomic DNA
     • Fragmented DNA
     • Ligate adaptors
     • Bind library and create clusters
     • Sequencing cycle: add bases, wash, image, cleave, wash
     • Repeat hundreds of times on billions of clusters
     • Align reads to a reference genome
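The sequencing cycle on this slide can be sketched as a loop that reads one base per cycle. This is a toy model only: the function name and the example template are illustrative, and it ignores strand direction, signal noise, and the fact that billions of clusters are imaged in parallel.

```python
# Toy model of the sequencing cycle: one base is read per cycle.
# Simplification: strand direction and imaging noise are ignored.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequence_cluster(template: str, cycles: int) -> str:
    """Return the bases called from one cluster after `cycles` cycles."""
    read = []
    for position in range(min(cycles, len(template))):
        # Add bases: the labelled nucleotide complementary to the
        # template base is incorporated.
        incorporated = COMPLEMENT[template[position]]
        # Wash, then image: the fluorescent signal identifies the base.
        read.append(incorporated)
        # Cleave the label and terminator, wash, and start the next cycle.
    return "".join(read)


if __name__ == "__main__":
    print(sequence_cluster("ATCGGGTCATGTCA", cycles=10))
```

The read can never be longer than the number of cycles, which is why read length on these instruments is tied directly to how many times the cycle is repeated.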
  3. 2nd Generation Sequencing Advances
     • V3 System Chemistry
       – 300GB per Flowcell
       – 11 Days to Data
       – Genome: $4700, Exome: $790
     • V4 System Chemistry
       – 600GB per Flowcell
       – 6 Days to Data
       – Genome: $3000, Exome: $640
     • X System Chemistry
       – 1TB per Patterned Flowcell
       – 3 Days to Data
       – Genome: $1500, Exome: $500
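To put the flowcell yields in context, here is a rough sketch of how many human genomes fit on one flowcell. The genome size (3.2 gigabases), the 30x target depth, and reading the slide's "GB" as gigabases are all assumptions that are not stated on the slide.

```python
# Rough capacity arithmetic for the V3 and V4 figures on the slide.
HUMAN_GENOME_GB = 3.2  # assumption: approximate human genome size, gigabases
TARGET_COVERAGE = 30   # assumption: target whole-genome depth


def genomes_per_flowcell(yield_gb: float,
                         genome_gb: float = HUMAN_GENOME_GB,
                         coverage: float = TARGET_COVERAGE) -> float:
    """Number of genomes one flowcell can cover at the target depth."""
    return yield_gb / (genome_gb * coverage)


if __name__ == "__main__":
    # (chemistry, yield per flowcell in GB, days to data) from the slide
    for name, yield_gb, days in [("V3", 300, 11), ("V4", 600, 6)]:
        genomes = genomes_per_flowcell(yield_gb)
        per_day = yield_gb / days
        print(f"{name}: {genomes:.1f} genomes per flowcell, "
              f"{per_day:.0f} GB per day")
```

Under these assumptions the V4 chemistry roughly doubles the genomes per flowcell while almost halving the run time.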
  4. Techniques for Acquiring Data
     • Whole Genome Sequencing
       – Obtain whole blood or tissue sample
       – Create sequencing libraries of all DNA fragments
     • Whole Exome Sequencing
       – Utilizes a selection protocol to fish out ONLY coding DNA sequences
       – Create sequencing libraries from enriched DNA
       – Reduces cost and analysis time
     • Custom Capture
       – Same protocol as Exome sequencing
       – Only target desired DNA sequences
     • Amplicon Sequencing
       – Use PCR to amplify target DNA
       – Sequence amplified DNA (Amplicon)
     • RNA-Seq
       – Extract RNA, capture mRNA, convert to cDNA
       – Used for differential gene expression analyses, RNA isoform detection
  5. Common DNA Mutations
     [Figure; each sequence variant is drawn on chromosome A B C D]
     • Sequence variants
       – Reference: ATCGGGTCATGTCA
       – Single nucleotide variant: ATCGGGTCATATCA
       – Small insertion: ATCGGGTCATGACGTCA
       – Small deletion: ATCGGGTCAT
     • Structural variants
       – Reference: A B C D
       – Deletion: A C D
       – Duplication: A B C C D
       – Inversion: A B D C
       – Translocation: A B E F G
     Credit: Elizabeth Ruzzo, PhD, CHGV
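The sequence-variant classes on this slide can be told apart with a very small comparison against the reference. This is a minimal sketch using the slide's own example sequences. The function name is hypothetical, and real variant callers work from aligned reads rather than whole-sequence comparison.

```python
def classify_small_variant(reference: str, sample: str) -> str:
    """Classify a sample sequence against the reference (toy logic)."""
    if sample == reference:
        return "no variant"
    if len(sample) == len(reference):
        # Same length: count the positions that differ.
        mismatches = sum(r != s for r, s in zip(reference, sample))
        if mismatches == 1:
            return "single nucleotide variant"
        return "multiple substitutions"
    if len(sample) > len(reference):
        return "small insertion"
    return "small deletion"


if __name__ == "__main__":
    reference = "ATCGGGTCATGTCA"
    for sample in ["ATCGGGTCATATCA", "ATCGGGTCATGACGTCA", "ATCGGGTCAT"]:
        print(sample, "->", classify_small_variant(reference, sample))
```

Structural variants such as inversions and translocations cannot be detected this way, because they change the order of large segments rather than a few bases.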
  6. Disadvantages of Current Techniques
     • Amplification errors
       – All polymerases have an inherent error rate (10^-6 to 10^-7)
     • GC bias
       – PCR bias against GC rich sequences
       – Exome capture bias against GC rich sequences
     • Trouble detecting small insertions and deletions
       – Capture baits may not hybridize well
       – Capture cannot be used to reliably detect large CNVs
     • Cannot be used for de novo assembly
       – Read length too short to span long repeat regions
       – Not good for detecting trinucleotide repeat expansions
     • Miss large structural variations
       – Translocations and inversions will likely be missed
       – Require significant read depth at break points for these variations to be detected
     • Trouble with RNA-seq isoform detection
       – As with large structural variations, it is hard to accurately detect all splice isoforms using short read technology
     [Figure: repeat expansions (A B B B C D; A B B B B C D; A B B B B B B C D) and structural variants marked with an X (A C D; A E F G)]
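To get a feel for the polymerase error rates quoted above, the sketch below estimates the chance that a fragment picks up at least one error. It assumes errors are independent per base and per copying event. The fragment length and the number of copying events are illustrative inputs, not figures from the slide.

```python
def prob_at_least_one_error(error_rate: float,
                            fragment_length_bp: int,
                            copy_events: int = 1) -> float:
    """Chance that a fragment carries at least one polymerase error.

    Assumes independent errors at every base in every copying event.
    """
    bases_copied = fragment_length_bp * copy_events
    return 1.0 - (1.0 - error_rate) ** bases_copied


if __name__ == "__main__":
    # Error rates from the slide; the 300 bp fragment and the
    # 10 successive copying events are assumptions for illustration.
    for rate in (1e-6, 1e-7):
        p = prob_at_least_one_error(rate, fragment_length_bp=300,
                                    copy_events=10)
        print(f"error rate {rate:g}: P(at least one error) = {p:.5f}")
```

The per-fragment chance is small, but with billions of fragments per run it still produces many erroneous molecules, which is why amplification is listed as a source of artifacts.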
  7. Solutions!
     • Solutions for many of these problems exist
       – As always, come at a cost
     • Whole Genome Sequencing - $1500
       – Reduce Exome Artifacts
         • Better Indel Detection and higher coverage in high GC regions
         • Can be used to detect large copy number variations
     • PCR Free Whole Genome Sequencing
       – Reduces amplification bias and polymerase error artifacts
     • WGS will miss large structural variations (Inversions, Translocations, microsatellites)
       – Combine with long read technologies
       – Added cost of $1000-$10,000
       – Higher cost = better detection
  8. Long-ish Read Sequencing Technologies
     • Mate-Pair Sequencing
       – Insert size increased from 300bp to 3-8KB
       – Sequence ends of mate-pairs to pair reads over much longer distances
       – Use short reads to fill gaps
       – Adds $1000 to Genome cost
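Why the larger insert size helps can be shown with a simple geometric check: a pair bridges a repeat only if both reads can sit in unique sequence on either side of it. This is a sketch under stated assumptions. The 100 bp read length and the 2 kb repeat are illustrative values, and the function name is hypothetical.

```python
def can_bridge(repeat_length_bp: int,
               insert_size_bp: int,
               read_length_bp: int = 100) -> bool:
    """True if both reads of a pair can flank the repeat in unique sequence.

    The gap between the two reads is the insert size minus both reads;
    the repeat has to fit inside that gap.
    """
    inner_gap = insert_size_bp - 2 * read_length_bp
    return inner_gap >= repeat_length_bp


if __name__ == "__main__":
    repeat = 2000  # assumption: a 2 kb repeat region
    # Insert sizes from the slide: 300 bp standard, 3-8 kb mate-pair
    for insert in (300, 3000, 8000):
        print(f"insert {insert} bp bridges a {repeat} bp repeat: "
              f"{can_bridge(repeat, insert)}")
```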
  9. Long-ish Read Sequencing Technologies
     • Illumina Synthetic Long Reads
       – Fragment Genomic DNA to 10KB
       – Dilute across a 384 well plate
       – Fragment clonal 10KB fragments into 300bp fragments and barcode
       – Sequence fragments and use barcodes to re-create the long reads synthetically
       – Use as a short read scaffold to perform de novo sequencing
       – Has been used in HLA sequencing and de novo assembly of the Drosophila genome, including accurate mapping of 80% of the transposable elements
       – Adds $1800 to Genome cost
     [Figure: 10kb fragmentation, barcoding and clonal amp, Nextera prep, sequencing]
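The barcode step is what makes the synthetic reconstruction possible: short reads that share a barcode came from the same well, so they can be assembled separately from the rest of the genome. Below is a minimal sketch of the grouping step only. The barcode labels and read sequences are made up, and the per-barcode assembly that would follow is not shown.

```python
from collections import defaultdict


def group_reads_by_barcode(reads):
    """Group (barcode, sequence) pairs into one list of reads per barcode."""
    groups = defaultdict(list)
    for barcode, sequence in reads:
        groups[barcode].append(sequence)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical barcoded short reads from two wells
    reads = [
        ("well_001", "ATCGGGTCAT"),
        ("well_002", "GGCATTACGA"),
        ("well_001", "GTCATGTCAA"),
        ("well_002", "TACGATTGCC"),
    ]
    for barcode, sequences in group_reads_by_barcode(reads).items():
        print(barcode, len(sequences), "reads")
```

Each group would then be assembled on its own into roughly 10 kb synthetic reads, which is a much easier problem than assembling the whole genome from unlabelled short reads.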
  10. True Long Read Sequencing Technologies
     • Defined as single molecule sequencing
     • Less complex sample prep and much longer read length (1-100kb) compared to 200-400bp for 2nd Gen
     • Two categories
       – Sequencing by synthesis
         • Pioneered by Pacific Biosciences
         • Sequencer uses super microscopes and polymerase bound nanowells to WATCH DNA as it is sequenced in real time
         • Nanowells filled with DNA bases
         • Fluorescence of base only detected at the polymerase
       – Direct sequencing by passing DNA through a nanopore
         • Bases fed through a membrane bound nanopore
         • Ionic difference between both sides of the membrane
         • Detect how ion flow changes at the pore as each base passes through
         • Oxford Nanopore, Base4, Stratos Genomics, Genia
     • Bleeding edge technology
       – Many technical hurdles with very high error rates (10-40%)
       – Current best use is to create scaffolds for de novo assembly
       – Very expensive technology
         • Costs 3-10x as much as Illumina to do whole genome sequencing
     [Figure: PacBio and Oxford Nanopore instruments]
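The error rates and cost multipliers above translate into concrete numbers with simple arithmetic. This sketch assumes a 10 kb read (within the 1-100kb range on the slide) and uses the $1500 whole genome price from slide 7 as the Illumina baseline. Both function names are hypothetical.

```python
def expected_errors(read_length_bp: int, error_rate: float) -> float:
    """Expected number of erroneous bases in a single read."""
    return read_length_bp * error_rate


def long_read_genome_cost(short_read_cost: float, multiplier: float) -> float:
    """Cost of a long-read genome as a multiple of the short-read cost."""
    return short_read_cost * multiplier


if __name__ == "__main__":
    read_length = 10_000    # assumption: a 10 kb read
    illumina_genome = 1500  # whole genome price from slide 7, in dollars
    for rate in (0.10, 0.40):
        errors = expected_errors(read_length, rate)
        print(f"{rate:.0%} error rate: about {errors:.0f} errors "
              f"per {read_length} bp read")
    for multiplier in (3, 10):
        cost = long_read_genome_cost(illumina_genome, multiplier)
        print(f"{multiplier}x Illumina cost: ${cost:.0f}")
```

With a thousand or more errors per read, these reads are more useful as scaffolds that are polished with accurate short reads than as a standalone source of variant calls.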
  11. Questions??
     • Reading/Viewing Material:
       – Sequencing Methods Ecosystem - review.pdf
       – Illumina TruSeq synthetic long-reads empower de novo assembly and resolve complex, highly repetitive transposable elements
       – Characterization of the human ESC transcriptome by hybrid sequencing
       – Nanopore Sequencing Web Conference - v=UtXlr19xTh8