High-Throughput Sequencing


Talk on High-Throughput Sequencing: Overview and Selected Applications for Masters students. Nov 9th 2011

Published in: Technology
  • Note this is a new version of the slide I gave you a few weeks ago – it has been updated; I’ve also taken out the data components which made it too busy
  • Are there others we should mention? Also need to flag up newly realised strengths e.g. power of paired end reads to generate single contig of a bacterial chromosome
  • Again your comments much valued here!
  • 95 & 63 strains
  • Ultimately, we detected three SNP loci that allowed us to distinguish between isolates in the outbreak. These are presented with reference to an unrelated Acinetobacter reference strain, AB0057, which we consider to have the “ancestral” state at these loci.
  • Referring back to the outbreak diagram, we can plot the resulting genotypes onto each isolate. C1 has a unique genotype compared with the others, which makes it hard to make a compelling case for transmission from any of the military patients. C2, however, shares the same genotype as M2 and M4. Given that M2 and C2 were in neighbouring beds around week 4, whereas M4 did not come into contact with C2 at any point, we believe we can make a strong case for transmission from M2 to C2.
  • You can read the full story of this study in the Journal of Hospital Infection, where it is available as an online pre-print. Our analyses support transmission of MDR-Aci from the wound of a military patient, M2, to the respiratory tract of a civilian patient, C2. However, as MDR-Aci was not isolated from C2 until several weeks after M2 left the adjacent bed, we cannot determine when and how transmission occurred. One possibility is that C2 became colonised when the two patients were nursed together, but that colonisation did not reach detectable levels in the sputum until much later. Another possibility is that M2 contaminated the local environment and C2 acquired the organism from the environment only after M2 had left the ward. This latter option would be consistent with a significant role for the environment.

    1. 1. High-throughput sequencing: Overview and selected applications
        • Professor Mark Pallen
        • University of Birmingham
    2. 2. Outline
        • What is high-throughput sequencing?
            • How it works
            • Key considerations
        • Applications
            • Clinical Microbiology
            • Cancer Biology
    3. 3. Conventional Sequencing
        • Sanger dideoxy chemistry 1970s
        • Bacterial genome sequencing 1990s
            • Whole-genome shotgun
            • Clonal populations of template molecules in vector/cloning host
            • Read lengths >500 bp
            • De novo assembly
        • Drawbacks
            • Time-consuming, expensive, onerous
                • Beyond average project grant
                • Out of reach of university infrastructure
            • Relies on colony propagation and picking
                • Some sequences cannot be cloned
    4. 4. High-throughput Sequencing
        • >100x faster, >100x cheaper!
            • A disruptive technology
        • Three “second-generation” technologies in the marketplace
            • 454 (Roche)
            • Solexa (Illumina)
            • SOLiD (ABI)
        • Fundamentally new approaches
            • Solid-phase amplification of clonal templates in “molecular colonies”
                • Massive increase in number of “clones” compensates for shorter read length
            • New chemistries for sequence reading
                • Pyrophosphate detection (PPi release upon base addition): 454
                • Reversible addition of fluorescent nucleotides: Solexa
                • Sequencing by ligation: SOLiD
    5. 5. Recent Developments
        • Single-molecule sequencing
            • Pacific Biosciences (PacBio)
            • Nanopore
        • Benchtop sequencers
            • Ion Torrent
            • MiSeq
    6. 6. Sequencing in Birmingham
    7. 7. 454 Life Sciences (Roche)
    8. 8. 454 Life Sciences (Roche)
    9. 9. Solexa/Illumina Sequencing
    10. 10. SOLiD Sequencing
        • Requires emPCR
        • Long run-times
        • Short read-lengths (stuck at 50 bp)
        • Sequences in colour space
    11. 11. Source: http://www.politigenomics.com/next-generation-sequencing-informatics
        Vendor              Roche                   Illumina                ABI
        Technology          454                     Solexa GA               SOLiD
        Platform            GS20    FLX     Ti      I       II      IIx     1       2       3
        Reads (M)           0.5     0.5     1.25    28      100     150     40      115     320
        Fragment
          Read length       100     200     400*    35      50      100     25      35      50
          Run time (d)      0.25    0.3     0.4     3       3       5       6       5       8
          Yield (Gb#)       0.05    0.1     0.5     1       5       15      1       4       16
          Rate (Gb/d)       0.2     0.33    1.25    0.33    1.67    3       0.34    1.6     2
          Images (TB#)      0.01    0.01    0.03    0.5     1.1     2.8     1.8     2.5     1.9
        Paired-end
          Read length       -       200     400     2×35    2×50    2×100   2×25    2×35    2×50
          Insert (kb)       -       3.5     3.5*    0.2     0.2     0.2     3       3       3
          Run time (d)      -       0.3     0.4     6       10      10      12      10      16
          Yield (Gb)        -       0.1     0.5     2       9       30      2       8       32
        * Now improved to 1 kb reads and choice of 3, 8 or 20 kb inserts
        # b = bases, B = bytes
        • Moore’s law applies!
        • The Sequencing Singularity!
        • Everything published is out of date!
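The "Rate" row on the slide above can be read as yield divided by run time. The following is a minimal sketch of that arithmetic, using the fragment-run figures from the slide; the dictionary layout and function names are assumptions made for illustration. Under this simple ratio the 454 and Illumina figures agree with the listed rates, while the listed rates for SOLiD 1 and 2 come out at roughly twice the ratio. The slide does not say why.

```python
# Fragment-run figures transcribed from the slide.
# Tuple layout (an assumption of this sketch):
# (yield in Gb, run time in days, rate in Gb/day as listed on the slide)
PLATFORMS = {
    "454 GS20": (0.05, 0.25, 0.2),
    "454 FLX": (0.1, 0.3, 0.33),
    "454 Ti": (0.5, 0.4, 1.25),
    "Illumina GA I": (1, 3, 0.33),
    "Illumina GA II": (5, 3, 1.67),
    "Illumina GA IIx": (15, 5, 3),
    "SOLiD 1": (1, 6, 0.34),
    "SOLiD 2": (4, 5, 1.6),
    "SOLiD 3": (16, 8, 2),
}


def rate_gb_per_day(yield_gb, run_days):
    """Throughput as a simple ratio of yield to run time."""
    return yield_gb / run_days


def check_rates(platforms, tolerance=0.01):
    """Return {platform: (computed rate, listed rate, agrees?)}."""
    report = {}
    for name, (yield_gb, run_days, listed) in platforms.items():
        computed = round(rate_gb_per_day(yield_gb, run_days), 2)
        report[name] = (computed, listed, abs(computed - listed) <= tolerance)
    return report


if __name__ == "__main__":
    for name, (computed, listed, agrees) in check_rates(PLATFORMS).items():
        flag = "" if agrees else "  <-- differs from listed rate"
        print(f"{name:16s} computed {computed:5.2f}  listed {listed:5.2f}{flag}")
```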
    12. 12. Modes and Applications
        • For some applications, 454 read length essential, e.g.
            • amplicon sequencing; otherwise assembly will create chimeras
            • differential splicing; translocations
        • For other applications read number is more important; read length less so
            • Transcriptomics where 35 b read will identify transcript
            • SNP discovery/screening
    13. 13. Modes and Applications
        • Modes
            • Basic shotgun ‘library’
            • Paired-end or mate-pair shotgun
            • Amplicon sequencing
        • Applications
            • Whole genome
            • Metagenome, phylogenetic profiling
            • Transcriptome
            • SNP analysis; Splice variants; Methylation
            • Targeted sequence capture by microarray; Small RNAs
    14. 14. Modes and Applications
        • Sequencing run is the basic unit
            • Basic cost of 454 or Illumina ~several £1000s per run in consumables & essential on-costs
            • Additions for consumables and/or staff time for
                • multiple library preparation
                • some modes, e.g. paired end
                • data analysis etc
        • Run can be subdivided
            • Plate-dividing gaskets (loss of wells)
            • Multiplex identifiers (MIDs or sequence barcodes)
        • So cost per sample may be ~£10s not £1000s!
            • But logistics of filling a plate may incur delays
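The step from "£1000s per run" to "£10s per sample" is simple division once a run is split by gaskets and barcodes. A rough sketch follows; the run cost, number of regions, number of barcodes and per-library preparation cost are all hypothetical values chosen for illustration, not figures from the talk.

```python
def cost_per_sample(run_cost, regions, barcodes_per_region,
                    prep_cost_per_library=0.0):
    """Share one run's cost across every barcoded sample on the plate.

    regions: number of gasket-divided plate regions (hypothetical value)
    barcodes_per_region: MIDs pooled in each region (hypothetical value)
    prep_cost_per_library: extra consumables/staff cost per library
    """
    samples = regions * barcodes_per_region
    if samples < 1:
        raise ValueError("need at least one sample per run")
    return run_cost / samples + prep_cost_per_library


if __name__ == "__main__":
    # One sample using the whole run: the full run cost applies.
    print(cost_per_sample(4800, regions=1, barcodes_per_region=1))
    # 8 regions x 12 barcodes = 96 samples sharing the same run.
    print(cost_per_sample(4800, regions=8, barcodes_per_region=12))
```

The sketch ignores the wells lost to gaskets and the delay in filling a plate, both of which the slide flags as practical costs of subdividing a run.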
    15. 15. “De novo assembly” versus “alignment against template” (aka “re-sequencing”)
    16. 16. Bacterial Genomic Epidemiology
        • Genome sequencing brings the advantages of
            • open-endedness (revealing the “unknown unknowns”)
            • universal applicability
            • ultimate in resolution
        • High-throughput platforms
            • 454, Illumina, PacBio
            • Expense and set-up puts them beyond average lab
        • Bench-top sequencing platforms
            • generate data sufficiently quickly and cheaply to have an impact on real-world clinical and epidemiological problems
    17. 17. The Birth of Genomic Epidemiology for Bacteria
    18. 18. The Birth of Genomic Epidemiology for Bacteria
    19. 19. Sequencing in Birmingham @mjpallen @pathogenomenick #AAMTHI
    20. 20. Case Study: Acinetobacter baumannii
        • Gram-negative bacillus
        • Multi-drug resistant
            • colistin and tigecycline as reserve agents
            • moving towards pan-resistance
        • Associated with
            • wound infections and ventilator-associated pneumonia
            • bloodstream infections
            • returning military personnel from Iraq and Afghanistan
            • transmission from military to civilian patients
    21. 21. Acinetobacter baumannii: problems
        • Hard to identify in clinical laboratory
            • Two related genomospecies, 3 and 13TU (now A. pittii and A. nosocomialis), impossible to distinguish phenotypically
        • Outbreak strains can be identified by PFGE, VNTR and gene-specific assays
            • BUT mode of spread and transmission chains often uncertain, hindering optimal management of outbreaks and rational design of policies
        • Mechanism of resistance hard to identify in individual cases
        • Poor understanding of pathogen biology
    22. 22. Applications and Questions
        • Epidemiology
            • Q1: Can whole-genome sequencing detect differences between isolates within an outbreak?
            • Q2: Can these differences be used to help determine chains of transmission?
        • Emergence of Resistance
            • Q3: Can it reveal how resistance emerges?
        • Taxonomy and Identification
            • Q4: Can it tell us what defines a species within a genus?
    23. 23. Acinetobacter Genomic Epidemiology
        • Outbreak in Birmingham Hospital in 2008
        • Isolates indistinguishable by current typing methods
    24. 24. Acinetobacter Genomic Epidemiology
        • 454 whole-genome sequencing of 6 isolates
        • SNP detection by mapping reads against draft reference assembly
        • SNP filtering for false positives
        • SNP validation with Sanger sequencing of PCR amplicons
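The "SNP filtering for false positives" step is not described further on the slide. As a purely illustrative sketch of what such a filter often looks like, the code below keeps a candidate SNP only if it has enough read depth, a high enough fraction of reads supporting the variant, and support on both strands. The record layout, the thresholds and the function names are all assumptions, not the criteria used in the study.

```python
from dataclasses import dataclass


@dataclass
class SnpCall:
    """A candidate SNP from read mapping (hypothetical record layout)."""
    position: int
    ref: str
    alt: str
    depth: int      # total reads covering the site
    alt_fwd: int    # variant-supporting reads on the forward strand
    alt_rev: int    # variant-supporting reads on the reverse strand


def passes_filters(call, min_depth=10, min_alt_fraction=0.9,
                   require_both_strands=True):
    """Apply illustrative depth, allele-fraction and strand filters."""
    if call.depth < min_depth:
        return False
    alt_reads = call.alt_fwd + call.alt_rev
    if alt_reads / call.depth < min_alt_fraction:
        return False
    if require_both_strands and (call.alt_fwd == 0 or call.alt_rev == 0):
        return False
    return True


def filter_calls(calls, **thresholds):
    """Return only the calls that survive every filter."""
    return [c for c in calls if passes_filters(c, **thresholds)]


if __name__ == "__main__":
    candidates = [
        SnpCall(1200, "C", "T", depth=30, alt_fwd=14, alt_rev=15),  # kept
        SnpCall(5400, "A", "G", depth=4, alt_fwd=2, alt_rev=2),     # low depth
        SnpCall(9100, "G", "T", depth=25, alt_fwd=24, alt_rev=0),   # one strand
    ]
    for call in filter_calls(candidates):
        print(call.position, call.ref, call.alt)
```

Calls that survive a filter like this would still go on to the Sanger validation step listed on the slide.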
    25. 25. Outbreak isolates distinguishable at only three loci
        Isolate   SNP 1   SNP 2   SNP 3
        AB0057    C       A       G
        M1        C       A       G
        M2        T       A       G
        M3        T       A       T
        M4        T       A       G
        C1        T       T       G
        C2        T       A       G
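The genotypes on the slide above can be grouped programmatically to see which isolates are indistinguishable at the three SNP loci. This is a small sketch using the alleles exactly as listed; the function names and data layout are assumptions made for illustration.

```python
from collections import defaultdict

# Alleles at SNP 1, SNP 2 and SNP 3, transcribed from the slide.
GENOTYPES = {
    "AB0057": ("C", "A", "G"),  # unrelated reference strain
    "M1": ("C", "A", "G"),
    "M2": ("T", "A", "G"),
    "M3": ("T", "A", "T"),
    "M4": ("T", "A", "G"),
    "C1": ("T", "T", "G"),
    "C2": ("T", "A", "G"),
}


def group_by_genotype(genotypes, exclude=("AB0057",)):
    """Map each three-locus genotype to the isolates that carry it."""
    groups = defaultdict(list)
    for isolate, alleles in genotypes.items():
        if isolate in exclude:
            continue
        groups["".join(alleles)].append(isolate)
    return dict(groups)


def snp_distance(a, b):
    """Number of loci at which two genotypes differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    for genotype, isolates in group_by_genotype(GENOTYPES).items():
        print(genotype, isolates)
    print("C1 vs M2:", snp_distance(GENOTYPES["C1"], GENOTYPES["M2"]))
```

Grouping this way puts C2 with M2 and M4, while C1 sits alone. Genotype alone cannot separate M2 from M4 as the source for C2; that requires the ward contact history as well.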
    26. 28. Before and after tigecycline therapy
        • Genomes of two Acinetobacter baumannii isolates from single patient sequenced
            • AB210 before tigecycline therapy (susceptible); 454-sequenced
            • AB211 after therapy (resistant); Illumina-sequenced
    27. 29. Before and after tigecycline therapy
        • Eighteen SNPs detected between AB210 and AB211
            • nine non-synonymous
            • including a SNP in adeS which accounts for resistance phenotype
        • Three contigs in AB210 not covered by reads in AB211, representing three deletions of ~15, 44 and 17 kb
            • mutS truncated; likely increase in mutation rate
    28. 31. Ion Torrent
        • Millions of wells reading sequences
        • Microchip detects release of protons
        • ~3 hour run-time
        • ~£500 cost per run
    29. 36. Applications: Cancer Biology
    30. 37. Malignant Darwinism
        • Mutational frequency heterogeneity analysis to become an integral component of molecular pathology
        • Cancer is an evolutionary process
    31. 38. Applications: Cancer Biology
        • Genome versus exome versus transcriptome
        • Even a transcriptome provides
            • abundance of RNAs
            • expressed mutations (point mutations, indels, inversions), alternative and novel splicing, gene fusions, RNA editing
    32. 39. Applications: Cancer Biology
        • Deep precision measurements of mutation frequency in a tissue can be made using next generation sequencing of PCR amplicons spanning the mutation
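Measuring mutation frequency from amplicon reads comes down to counting how many reads carry the variant base at the site. A minimal sketch follows, with a Wilson score interval to show how precision improves with depth. The read counts are hypothetical, and the choice of the Wilson interval is an assumption of this sketch rather than the method used in the work described.

```python
import math


def variant_allele_frequency(variant_reads, total_reads):
    """Fraction of reads spanning the site that carry the mutation."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return variant_reads / total_reads


def wilson_interval(variant_reads, total_reads, z=1.96):
    """Approximate 95% Wilson score interval for the variant frequency."""
    p = variant_allele_frequency(variant_reads, total_reads)
    n = total_reads
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # Same underlying frequency (0.5%) at two hypothetical read depths.
    for variant, total in [(5, 1000), (50, 10000)]:
        low, high = wilson_interval(variant, total)
        freq = variant_allele_frequency(variant, total)
        print(f"{variant}/{total}: {freq:.4f} ({low:.4f} to {high:.4f})")
```

The interval narrows as depth increases, which is why very deep amplicon sequencing suits low-frequency mutations.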
    33. 41. Challenges
        • In recent cancer genomes ~50% of predicted SNVs from the primary sequence data could not be revalidated.
        • Many private germline polymorphisms still exist in every individual, so additional qualification against germline DNA is always necessary to distinguish somatic variants
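Qualifying tumour variants against germline DNA can be pictured as a set subtraction: anything also seen in the patient's normal sample is treated as a private germline polymorphism rather than a somatic change. The sketch below uses made-up variant records; the tuple layout and function name are assumptions for illustration.

```python
def somatic_candidates(tumour_variants, germline_variants):
    """Variants seen in the tumour but absent from matched germline DNA.

    Each variant is a (chromosome, position, ref, alt) tuple; this layout
    is an assumption of the sketch.
    """
    return set(tumour_variants) - set(germline_variants)


if __name__ == "__main__":
    # Hypothetical calls, for illustration only.
    tumour = [
        ("chr1", 1000, "A", "G"),
        ("chr2", 2500, "C", "T"),
        ("chr7", 4200, "G", "A"),
    ]
    germline = [
        ("chr1", 1000, "A", "G"),  # private germline polymorphism
    ]
    for variant in sorted(somatic_candidates(tumour, germline)):
        print(variant)
```

Given the revalidation rate quoted on the slide, the surviving candidates would still need independent confirmation before being treated as real somatic mutations.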
    34. 42. Applications: Cancer Biology
        • Coding SNPs dominated by a few frequently mutated loci (oncogenes or tumour suppressors)
            • long tail of population-infrequent SNPs
            • driver/passenger distinction
            • regulatory sequence mutations yet to be explored
        • Hundreds of genomes for each cancer type required to make sense of the mutations seen?
        • BUT driver mutations in some cancer subtypes found with much smaller studies
            • C134Y FOXL2 mutation in adult type granulosa cell tumours from the transcriptomes of four granulosa cell cases
    35. 46. Multiple Displacement Amplification Single-cell Genomics (Or FACS or dilution or microfluidics)
    36. 48. What will you do when you can sequence everything?
    37. 49. Further Information
        • High-throughput sequencing technology
            • http://pathogenomics.bham.ac.uk/blog
            • http://www.nature.com/nrg/journal/v11/n1/pdf/nrg2626.pdf
            • http://onlinelibrary.wiley.com/doi/10.1002/smll.200900976/pdf
            • http://dx.doi.org/10.1016/j.tibtech.2008.07.003
        • Clinical Microbiology
            • Pallen, Loman, Penn. High-throughput sequencing and clinical microbiology: progress, opportunities and challenges. Current Opinion in Microbiology. http://www.sciencedirect.com/science/journal/13695274
    38. 50. Further Information
        • Cancer genomics
            • http://www.ncbi.nlm.nih.gov/pubmed/19921711,19918804,20016485,20164919,20016488,20200521,20371490
            • http://www.nature.com/nature/journal/v458/n7239/pdf/nature07943.pdf
            • http://www.nature.com/news/2010/100414/pdf/464972a.pdf
            • http://www.nature.com/nature/journal/v464/n7289/pdf/464678a.pdf
            • http://www.nature.com/nature/journal/v464/n7289/pdf/464679a.pdf
            • http://omicsomics.blogspot.com/2010/04/value-of-cancer-genomics.html
            • http://cancergenome.nih.gov/
            • http://www.sanger.ac.uk/genetics/CGP/
            • http://scienceonline.org/cgi/content/full/sci;327/5969/1074
            • http://app2.capitalreach.com/esp1204/servlet/tc?cn=aacr&c=10165&s=20435&e=12623&&m=1&br=80&audio=false
            • http://app2.capitalreach.com/esp1204/servlet/tc?cn=aacr&c=10165&s=20435&e=12624&&m=1&br=80&audio=false