Next Gen Sequencing (NGS) Technology Overview



This is part of a talk I gave at MSFT a few years ago


1. Next Gen Sequencing [NGS]
   Presented by Dominic Suciu, Ph.D.
   • History of DNA Sequencing: Maxam-Gilbert, Sanger, ABI
   • NGS Technologies: 454, Illumina, PacBio, ABI, Helicos, Ion Torrent, Nanopores
   • Applications: Genomes, RNA-Seq, ChIP-Seq, CGH, Cancer Genome, Environmental
   Human Genome: 1990-2000
2. Preliminaries: Central Dogma
   Gene ~ Protein ~ Enzyme
   • Genome (DNA) [Hard drive]
   • Gene (DNA) [Program in directory]
   • Messenger RNA
   • Protein (Polypeptide) [Program in RAM]
   • Enzyme: functional agent
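To make the analogy concrete, here is a minimal Python sketch of the two copying steps: transcription (DNA coding strand to mRNA, T becomes U) and translation (mRNA read three bases at a time into amino acids). The codon table is a deliberately tiny subset of the standard genetic code, and the names `CODON_TABLE`, `transcribe`, and `translate` are hypothetical helpers for illustration only.

```python
# Tiny, hypothetical subset of the standard genetic code (RNA codon -> amino acid).
# "*" marks a stop codon. A real translator needs all 64 codons.
CODON_TABLE = {
    "AUG": "M",              # methionine, also the usual start codon
    "UUU": "F", "UUC": "F",  # phenylalanine
    "GGC": "G", "GGA": "G",  # glycine
    "AAA": "K",              # lysine
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a DNA coding strand (same sequence, T -> U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon from the first base until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]  # KeyError outside the toy table
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGTTTGGCAAATAA")  # "AUGUUUGGCAAAUAA"
print(translate(mrna))                # MFGK
```

A real translator would also search for the start codon instead of assuming the sequence begins on it.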
3. Preliminaries: Phages
   • Bacteriophages are viruses that infect bacteria.
   • Some bacteria are immune to certain phages [Hamilton O. Smith, early 1970s].
   • Restriction endonucleases: enzymes that specifically cleave certain DNA sequences. Bacterial cells use these as a crude anti-phage defense mechanism.
4. Preliminaries: Restriction Enzymes
   • Molecular scissors
   • Their discovery allowed researchers to physically map genomes.
   • A big confirmatory clue that the genome sequence determines species, and even individuals
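Because each enzyme cuts only at its recognition sequence, the fragment sizes from a digest form a fingerprint of the DNA, which is what made physical mapping possible. The following is a minimal sketch of an in-silico digest. It uses EcoRI (site GAATTC, cut after the G) as an example enzyme not named on the slide, ignores the staggered cut on the opposite strand, and treats the DNA as linear; `cut_positions` and `fragment_lengths` are hypothetical names.

```python
def cut_positions(dna: str, site: str, cut_offset: int) -> list[int]:
    """Top-strand cut positions for every occurrence of a recognition site.

    cut_offset is where the enzyme cuts inside the site, e.g. 1 for EcoRI (G^AATTC).
    """
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start + cut_offset)
        start = dna.find(site, start + 1)  # keep scanning, allowing overlaps
    return positions


def fragment_lengths(dna: str, site: str, cut_offset: int) -> list[int]:
    """Lengths of the linear fragments produced by a complete digest."""
    cuts = [0] + cut_positions(dna, site, cut_offset) + [len(dna)]
    return [end - begin for begin, end in zip(cuts, cuts[1:])]


dna = "AAGAATTCTTTTGAATTCCC"
print(cut_positions(dna, "GAATTC", 1))     # [3, 13]
print(fragment_lengths(dna, "GAATTC", 1))  # [3, 10, 7]
```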
5. Preliminaries: Cloning
   • Start with picograms of DNA; end up with micrograms of highly purified copies.
   • Each colony is highly enriched.
   • Each colony is endlessly amplifiable.
   • pBR322 is a vector, an engineered plasmid. It replicates inside a bacterial host and carries little beyond its replication origin and antibiotic-resistance markers.
6. Preliminaries: PCR [1985]
   As long as you know the sequences at the beginning and end of a region, you can amplify anything between them.
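The power of PCR comes from exponential growth. In an idealized model each cycle multiplies the template by (1 + E), where E is the per-cycle efficiency (1.0 means perfect doubling), so after n cycles N_n = N_0 (1 + E)^n. A million-fold gain, roughly picograms to micrograms as in cloning, needs only about 20 perfect cycles, since 2^20 is about 10^6. This sketch ignores the plateau that real reactions reach as reagents run out; `pcr_copies` is a hypothetical name.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized PCR yield: each cycle multiplies the template by (1 + efficiency)."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


print(pcr_copies(1, 30))       # 1073741824.0, about 1e9 copies from one molecule
print(pcr_copies(1, 30, 0.9))  # fewer copies at 90% efficiency
```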
7. Deconstructing Sequencing
   • DNA source: gel-purified fragment, cloning product, or random fragmentation.
   • DNA amplification: you need enough DNA to detect the signal given off during base interrogation.
   • DNA sequencing method: a base-interrogation method that uniquely detects G, A, T, and C.
   • Sequence positioning: you need an organizing principle to place these bases into a sequence.
   The methods presented here represent distinct ways to solve each of these issues.
8. Maxam-Gilbert 1975
   The distribution of the fragment population corresponds to where each base appears within the sequence.
9. Maxam-Gilbert 1975: Chemical Sequencing
   Issues:
   • Need a perfectly pure, single species of DNA
   • Nasty chemicals
   • Radioactive end-labeling
   • 4 lanes/read
   • Sequence only what you can purify
   Advantages:
   • First DNA sequencing method available
   • 200-300 bp/read
   The distribution of the fragment population corresponds to where each base appears within the sequence.
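The four-lane readout can be sketched in a few lines. This assumes the classic lane set G, A+G, C, and C+T (not listed on the slide), idealized bands with no background, and bands indexed by base position from the labeled end (real band mobilities are offset). `simulate_lanes` and `read_maxam_gilbert` are hypothetical names.

```python
def simulate_lanes(seq: str) -> dict[str, set[int]]:
    """Idealized bands: which lanes cleave at each 1-based base position."""
    lanes: dict[str, set[int]] = {"G": set(), "A+G": set(), "C": set(), "C+T": set()}
    for position, base in enumerate(seq, start=1):
        if base == "G":
            lanes["G"].add(position)
        if base in "AG":
            lanes["A+G"].add(position)
        if base == "C":
            lanes["C"].add(position)
        if base in "CT":
            lanes["C+T"].add(position)
    return lanes


def read_maxam_gilbert(lanes: dict[str, set[int]], length: int) -> str:
    """Infer each base from the combination of lanes showing a band at that position."""
    sequence = []
    for position in range(1, length + 1):
        if position in lanes["G"] and position in lanes["A+G"]:
            sequence.append("G")
        elif position in lanes["A+G"]:
            sequence.append("A")  # purine band without a G-only band
        elif position in lanes["C"] and position in lanes["C+T"]:
            sequence.append("C")
        elif position in lanes["C+T"]:
            sequence.append("T")  # pyrimidine band without a C-only band
        else:
            sequence.append("N")  # no band: unreadable position
    return "".join(sequence)


print(read_maxam_gilbert(simulate_lanes("GATCCTAG"), 8))  # GATCCTAG
```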
10. Sanger “Sequencing-by-Synthesis” 1977
   Issues:
   • Radioactive end-labeling
   • 4 lanes/read
   • Sequencing gels
   Advantages:
   • 400-500 bp/read
   • Radioactive incorporation
   • Primer gives you control
   (Diagram: dNTP vs. ddNTP)
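The dNTP/ddNTP diagram encodes the whole trick: a ddNTP lacks the 3'-OH needed to extend the chain, so in each reaction synthesis stops wherever that one base is added. Each reaction therefore holds fragments ending at every occurrence of its base, and sorting all fragments by length recovers the newly synthesized strand. A minimal sketch, assuming perfect termination and no gel artifacts; `sanger_reactions` and `read_gel` are hypothetical names. The same sorting idea carries over to the one-lane dye-terminator method, where dye color replaces the lane.

```python
def sanger_reactions(synthesized: str) -> dict[str, list[int]]:
    """Fragment lengths in each of the four ddNTP reactions (idealized).

    In the ddX reaction, some strands stop wherever X is added, so every
    occurrence of X in the newly synthesized strand yields one fragment length.
    """
    reactions: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for length, base in enumerate(synthesized, start=1):
        reactions[base].append(length)
    return reactions


def read_gel(reactions: dict[str, list[int]]) -> str:
    """Read bands from shortest (bottom of gel) to longest, noting each band's lane."""
    bands = sorted(
        (length, base) for base, lengths in reactions.items() for length in lengths
    )
    return "".join(base for _, base in bands)


lanes = sanger_reactions("ACGTTA")
print(lanes["T"])       # [4, 5]
print(read_gel(lanes))  # ACGTTA
```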
11. PCR Dye-Terminator 1990s
   Issues:
   • Sequencing gels
   • 1 run/day
   Advantages:
   • 600-700 bp/read
   • 96 reads/run
   • Each terminator dye has a different color, which lets you combine all 4 reactions in one lane.
   • Single lane/read
   • Primer gives you control
12. Human Genome Project (15 years)
   Hierarchical Shotgun Sequencing [start 1990]
   • Randomly insert human DNA into BAC clones (~150 kbp each).
   • Combine these BAC clones to create a scaffold of the human genome. Each BAC clone is mapped to a region on a human chromosome.
   • Distribute BAC clones to different genome centers throughout the US.
   • At each center, each BAC clone is sequenced using shotgun sequencing.
   • Wait 15 years for results.
13. Issues with Shotgun Sequencing
   • Reads -> contigs -> scaffolds -> genome reconstruction
   • Repeat regions can confuse contig assemblers.
   • It was hoped that focusing each shotgun run on a single 40-150 kb region would minimize these issues.
   • According to Venter, it simply multiplied the number of times one encountered the same problem.
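The reads-to-contigs step can be illustrated with a greedy overlap assembler, a simple textbook strategy and not the algorithm of any particular genome center: repeatedly merge the two sequences with the longest suffix/prefix overlap until no overlap of at least `min_len` remains. Greedy merging is also where repeats cause trouble, because reads from different copies of a repeat overlap perfectly and can be misjoined. The names and the toy reads are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b (>= min_len), else 0."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge the pair with the largest overlap until none is left; return contigs."""
    contigs = list(reads)
    while True:
        best = (0, -1, -1)  # (overlap length, index of left seq, index of right seq)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        if olen == 0:
            return contigs
        merged = contigs[i] + contigs[j][olen:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```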
14. Shotgun Sequencing: Venter 1997
   The same approach is used throughout NGS.
   Paired-end sequencing:
   1. Randomly cut genomic DNA.
   2. Use gel purification to make three libraries of random DNA fragments: 2 kb, 10 kb, 50 kb.
   3. Sequence from both ends.
   4. Use distance information to assemble contigs into scaffolds.
   Distance information allows you to 'jump' over repeat regions. This approach allowed Venter to 'jump' ahead of the federal sequencing project.
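The distance information works because every fragment in a library has roughly the same length. If one end of a fragment lands near the end of contig A and the other lands near the start of contig B, the known insert size tells you how large the unsequenced gap between them is. A minimal sketch, assuming a single nominal insert size, that A precedes B, and positions given as 0-based offsets within each contig; `estimate_gap` is a hypothetical name, and the median is used only to damp outlier pairs.

```python
from statistics import median


def estimate_gap(insert_size: int, contig_a_len: int, pairs: list[tuple[int, int]]) -> int:
    """Estimate the gap between contig A and the contig B that follows it.

    Each pair is (fragment start in A, fragment end in B), as 0-based offsets from
    the start of each contig. A negative result suggests the contigs overlap.
    """
    estimates = [
        insert_size - (contig_a_len - start_a) - end_b  # insert = part in A + gap + part in B
        for start_a, end_b in pairs
    ]
    return round(median(estimates))


# Three mate pairs from a hypothetical 2 kb library bridging a 5,000 bp contig A
print(estimate_gap(2000, 5000, [(4200, 880), (4500, 1220), (4000, 700)]))  # 300
```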
15. NGS Revolution: Roche / 454 -> [2005]
   ABI 3700, state of the art in 1997:
   • 1 sample per reaction (96 reactions) in 2 hrs
   • Each sample had to be individually manipulated
   454 solved both of these problems.
   • Emulsion PCR
   • Paired-end reads can be done by including both primers on each micro-bead.
   (Diagram: nucleotide incorporation releases PPi + H+)
16. Roche / 454 -> [2005]
   • emPCR: no need for cells
   • Each well is a single sequencing run.
   • Very fast reaction
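The PPi released on incorporation is what 454 measures: it is converted into a flash of light. Nucleotides are flowed one at a time in a fixed cycle, and a homopolymer run is incorporated in a single flow, so the light intensity scales with the run length. The sketch below builds an ideal flowgram and decodes a noisy one by rounding. The flow order "TACG" is an assumption, and `flowgram` and `call_bases` are hypothetical names. Rounding is also why long homopolymers are this platform's classic error source.

```python
FLOW_ORDER = "TACG"  # assumed cyclic order in which nucleotides are flowed


def flowgram(seq: str, n_flows: int, flow_order: str = FLOW_ORDER) -> list[int]:
    """Ideal signal per flow: length of the homopolymer incorporated in that flow."""
    signals, i = [], 0
    for flow in range(n_flows):
        base = flow_order[flow % len(flow_order)]
        run = 0
        while i < len(seq) and seq[i] == base:  # the whole run extends in one flow
            run += 1
            i += 1
        signals.append(run)
    return signals


def call_bases(signals: list[float], flow_order: str = FLOW_ORDER) -> str:
    """Decode a (possibly noisy) flowgram by rounding each signal to a run length."""
    return "".join(flow_order[k % len(flow_order)] * round(s) for k, s in enumerate(signals))


print(flowgram("TTACGGA", 8))                                 # [2, 1, 1, 2, 0, 1, 0, 0]
print(call_bases([1.9, 1.1, 0.8, 2.2, 0.1, 0.9, 0.0, 0.2]))   # TTACGGA
```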
17. Illumina [Solexa 2007]
   • No need for cell-based amplification
   • Bridge amplification: PCR on a surface
18. Illumina
   Advantages:
   • No need for cells
   • Each cluster of DNA molecules is a single reaction.
   • Enormous numbers of reads
   • Paired ends: sequence from both sides.
   Disadvantages:
   • Slow
   • Short reads
   • Reagent costs
19. Ion Torrent / Life Technologies [2010]
   Method:
   • Emulsion PCR
   • Each bead is placed in a single well.
   • Cheap/rugged
   Disadvantages:
   • Low density
   • Sample prep
   (Diagram: nucleotide incorporation releases PPi + H+)
20. ABI SOLiD
   Advantages:
   • Extremely accurate
   Disadvantages:
   • Takes a long time
   • Expensive reagent costs
   • 12 cycles/position
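SOLiD's accuracy is usually attributed to its two-base ("color-space") encoding, which the slide does not describe: each color reports a pair of adjacent bases, so every base is interrogated twice. The sketch assumes the common description in which, with A=0, C=1, G=2, T=3, the color equals the XOR of the two base codes. Decoding needs the first base, a true SNP changes two adjacent colors, and a lone changed color flags a likely measurement error. The names are hypothetical.

```python
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"


def encode_colors(seq: str) -> list[int]:
    """Two-base encoding: each color is the XOR of two adjacent base codes."""
    return [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(seq, seq[1:])]


def decode_colors(first_base: str, colors: list[int]) -> str:
    """Rebuild bases from colors; requires knowing the first base."""
    bases = [first_base]
    for color in colors:
        bases.append(CODE_BASE[BASE_CODE[bases[-1]] ^ color])
    return "".join(bases)


print(encode_colors("ATGGCA"))                # [3, 1, 0, 3, 1]
print(decode_colors("A", [3, 1, 0, 3, 1]))    # ATGGCA
print(encode_colors("ATGTCA"))                # [3, 1, 1, 2, 1]: a SNP changes 2 colors
```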
21. Complete Genomics
   Advantages:
   • Whole genome in 3 months
   • 40x coverage!
   Disadvantages:
   • Labor intensive
   • Takes a long time: 3 months sample prep
   • Expensive: $10-20k/genome
   • No instrument: CRO model
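"40x coverage" means each base is covered by about 40 reads on average: C = N * L / G for N reads of length L over a genome of size G. Under the Lander-Waterman (Poisson) model the fraction of bases with no reads at all is about e^(-C), which is negligible at 40x. The 100 bp read length and 3.1 Gb genome size below are illustrative assumptions, and the function names are hypothetical.

```python
import math


def coverage(n_reads: int, read_len: int, genome_size: int) -> float:
    """Mean coverage (depth) C = N * L / G."""
    return n_reads * read_len / genome_size


def reads_needed(target_coverage: float, read_len: int, genome_size: int) -> int:
    """Number of reads N needed to reach a target mean coverage."""
    return math.ceil(target_coverage * genome_size / read_len)


def fraction_uncovered(mean_coverage: float) -> float:
    """Lander-Waterman/Poisson estimate of the fraction of bases with zero reads."""
    return math.exp(-mean_coverage)


GENOME = 3_100_000_000  # assumed approximate haploid human genome size
print(reads_needed(40, 100, GENOME))  # 1240000000 reads of 100 bp for 40x
```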
22. Helicos
   Advantages:
   • No amplification
   • Single-molecule detection
   Disadvantages:
   • It doesn't work
   • 8-10 days
23. PacBio
   Key factors:
   • Zero-mode waveguide
   • Zeptoliter volume
   • Continuous process
   • Lariat sequencing
   • Low reagent costs
   Disadvantages:
   • Low number of reads
24. Next-Next Generation: Nanopores
   • Illumina / Oxford Nanopore
   • Roche / IBM: all-semiconductor
   • Stratos Genomics
   • NabSys (graphene monolayer)
25. Applications: Genome Sequencing
   Sequencing of whole genomes: bacterial, animal, human.
   De novo genome sequencing: even with a large number of reads, assembling a genome from raw sequence reads is still a non-trivial task, due to sample prep and inherent complexity.
   Re-sequencing: sequencing an individual with a genetic disease in order to find hereditary mutations. Read depth allows one to compute allele frequencies.
   • 454: due to its long reads, best suited for de novo assembly; useful for scaffolding.
   • SOLiD, Illumina: used for re-sequencing. SOLiD wins on accuracy but loses on complexity/cost.
   • Complete Genomics: CRO model, 40x depth.
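Read depth turns genotyping into counting: at a given position, the fraction of reads carrying each base estimates the allele frequency within the sample. Below is a toy sketch, assuming the reads at one position are summarized as a string of observed bases (a "pileup"), ignoring base qualities and sequencing errors, and using illustrative thresholds that do not come from any real variant caller. `allele_frequencies` and `naive_genotype` are hypothetical names.

```python
from collections import Counter


def allele_frequencies(pileup: str) -> dict[str, float]:
    """Fraction of reads supporting each base at a single genomic position."""
    counts = Counter(pileup.upper())
    depth = sum(counts.values())
    if depth == 0:
        raise ValueError("no reads cover this position")
    return {base: n / depth for base, n in counts.items()}


def naive_genotype(pileup: str, ref: str, het_min: float = 0.2, hom_min: float = 0.8) -> str:
    """Toy diploid call from the non-reference read fraction (illustrative thresholds)."""
    alt_fraction = 1.0 - allele_frequencies(pileup).get(ref, 0.0)
    if alt_fraction >= hom_min:
        return "hom-alt"
    if alt_fraction >= het_min:
        return "het"
    return "hom-ref"


print(allele_frequencies("AAAAAGGGGG"))       # {'A': 0.5, 'G': 0.5}
print(naive_genotype("AAAAAGGGGG", ref="A"))  # het
```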
26. Applications: Exon Sequencing
   Mutational screening: what are the mutations in the actual coding regions? Most heritable disease models have mutations in the coding regions. Use enrichment to focus sequencing on expressed space, then generate as many reads as possible in order to call mutations accurately.
   Illumina, 454, ABI
27. Enrichment: Microarrays Are Not Dead!
   Why? To focus a sequencing run on the region you are interested in, for example:
   • Expressed region of the genome (1%)
   • Genes of interest: mutational studies
   Four ways:
   • Micro-droplet PCR: each droplet has a unique set of amplification primers.
   • MIP-PCR (molecular inversion probes)
   • On-chip enrichment, using microarrays
   • On-bead enrichment: make oligo pools and use them to capture targets for sequencing.
28. Two Approaches for Finding the Causative Mutation Responsible for Miller Syndrome
   Sequence whole genome: Complete Genomics
   • Sequenced mother, father, and 2 kids (both affected): 1 kindred
   • Regions where the kids share both copies from their parents (22%)
   • Both diseases are rare: look for locations with low-prevalence SNPs (dbSNP)
   • Narrowed down to 4 genes
   • 2 of these were found to be the causative agent in the exome sequencing study
   Exome array: just sequence the expressed sequence space (1%): Illumina GAII
   • Sequenced exomes from 4 affected individuals in 3 kindreds
   • Found 4,600 variants
   • Ignored any previously discovered SNPs in dbSNP
   • Looked for mutations that appeared in all 3 kindreds
   • Focused on damaging mutations: non-synonymous, stop codon
   • Discovered the causative locus by elimination
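The exome-side logic is essentially a chain of set filters: drop known dbSNP variants, keep damaging effects, then intersect the surviving genes across kindreds. This is a minimal gene-level sketch, assuming each variant is a dict with hypothetical `"id"`, `"gene"`, and `"effect"` fields and that the effect labels in `DAMAGING` are hypothetical. The toy genes `GENE_A` through `GENE_C` are placeholders, not data from the study.

```python
DAMAGING = {"nonsynonymous", "stop_gained"}  # hypothetical effect labels


def candidate_genes(variants_by_kindred: dict[str, list[dict]], known_snps: set[str]) -> set[str]:
    """Genes carrying a novel, damaging variant in every kindred."""
    per_kindred = []
    for variants in variants_by_kindred.values():
        genes = {
            v["gene"]
            for v in variants
            if v["id"] not in known_snps and v["effect"] in DAMAGING  # novel and damaging
        }
        per_kindred.append(genes)
    return set.intersection(*per_kindred) if per_kindred else set()


kindreds = {
    "K1": [{"id": "v1", "gene": "GENE_A", "effect": "nonsynonymous"},
           {"id": "rs1", "gene": "GENE_B", "effect": "stop_gained"},
           {"id": "v2", "gene": "GENE_C", "effect": "synonymous"}],
    "K2": [{"id": "v3", "gene": "GENE_A", "effect": "stop_gained"},
           {"id": "v4", "gene": "GENE_B", "effect": "nonsynonymous"}],
    "K3": [{"id": "v5", "gene": "GENE_A", "effect": "nonsynonymous"},
           {"id": "v6", "gene": "GENE_C", "effect": "nonsynonymous"}],
}
print(candidate_genes(kindreds, known_snps={"rs1"}))  # {'GENE_A'}
```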
29. Applications: RNA-Seq
   Microarrays are dead! You don't have to design probes ahead of time; just sequence mRNA and count the number of reads for each gene. Read count ~ expression level.
   In environmental genomics, sequencing can be used to determine which genes are being expressed in a sample.
   Illumina: the only method with the read depth to give a useful spread between high- and low-expressed genes. Its dynamic range far surpasses that of microarrays, especially for smaller genomes.
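"Read count ~ expression level" holds only after normalizing for two biases: longer transcripts yield more reads, and deeper runs yield more reads overall. RPKM (reads per kilobase of transcript per million mapped reads), a common normalization not mentioned on the slide, corrects for both: RPKM = count * 10^9 / (length in bp * total mapped reads). A minimal sketch with hypothetical gene names and counts:

```python
def rpkm(read_counts: dict[str, int], gene_lengths: dict[str, int]) -> dict[str, float]:
    """Reads per kilobase of transcript per million mapped reads."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no mapped reads")
    return {
        gene: count * 1e9 / (gene_lengths[gene] * total)  # scale by length and depth
        for gene, count in read_counts.items()
    }


values = rpkm({"short": 1000, "long": 1000}, {"short": 1000, "long": 4000})
print(values["short"] / values["long"])  # 4.0: same count, but 4x shorter gene
```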
30. Applications: ChIP-Seq
   ChIP: Chromatin Immunoprecipitation
   Where does my DNA-binding transcription factor bind within the genome?
   Illumina, ABI SOLiD
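ChIP-Seq answers the binding question by looking for places where reads from the immunoprecipitated DNA pile up far more than in a control library. Below is a toy binned peak caller, much simpler than production tools. It assumes reads are given as mapped start positions on one chromosome, that both libraries have similar total depth, and that the bin size and thresholds are illustrative defaults; `call_peaks` is a hypothetical name.

```python
from collections import Counter


def call_peaks(chip_reads: list[int], control_reads: list[int], bin_size: int = 200,
               min_reads: int = 5, min_fold: float = 4.0) -> list[tuple[int, int]]:
    """Return (start, end) of bins where ChIP reads are enriched over control."""
    chip = Counter(pos // bin_size for pos in chip_reads)
    control = Counter(pos // bin_size for pos in control_reads)  # missing bins count as 0
    peaks = []
    for b in sorted(chip):
        n = chip[b]
        # pseudocount of 1 on the control avoids dividing by zero
        if n >= min_reads and n / (control[b] + 1) >= min_fold:
            peaks.append((b * bin_size, (b + 1) * bin_size))
    return peaks


chip = [1010, 1020, 1050, 1100, 1150, 1190, 5000, 5100]
print(call_peaks(chip, control_reads=[5050, 5060]))  # [(1000, 1200)]
```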
31. Environmental Genomics
   GAM: Genome Annotation Machine
   • Genome annotation
   • Gene identification
   • Comparative genomics
   • Functional characterization
   • Phylogenetic characterization
   • Protein structural characterization
   (Diagram labels: who, what)
32. Summary