Using field-based DNA sequencing to accelerate phylogenomics

Invited seminar at the Department of Zoology, Oxford University, 30th November 2016.

Summary of our field-based real-time phylogenomics (MinION DNA sequencing) experiments this year, and applicability to broad-scale tree-of-life phylogenomics and macroevolutionary biology.

Published in: Science

  1. 1. Using field-based DNA sequencing to accelerate phylogenomics. Joe Parker, Royal Botanic Gardens, Kew. Department of Zoology, Oxford University, 30th November 2016.
  2. 2. Outline: Intro - Real-time phylogenomics - Ubiquitous sequencing - Implications
  3. 3. Images – Wikimedia commons CC BY-SA (clockwise from top left: Jeroen Rouwkema, @aGastya, author’s own, @RE73)
  4. 4. Phylogenetics → Phylogenomics. Stewart et al. (1987): 5 species; 1 gene; 130 amino acids; 2 phylogenies (doi:10.1038/330401a0). Parker et al. (2013): 22 species; 2,326 genes; 600,000 amino acids (~700,000,000 simulated); 14 test phylogenies; 100 control phylogenies (doi:10.1038/nature12511).
  5. 5. Real-time phylogenomics Clockwise from top-right: Authors’ own; ONT Ltd; Quick et al. (2016) http://dx.doi.org/10.1038/nature16996; Loose et al. (2016) http://dx.doi.org/10.1038/nmeth.3930 Nanopores…
  6. 6. Field-based DNA sequencing
  7. 7. Field-based sequencing: design and questions. Arabidopsis sequencing in Snowdonia National Park. Congeneric species with genomes available: Arabidopsis thaliana and Arabidopsis lyrata ssp. petraea. Sample, extract and prepare libraries in the field. Sequence in the field with MinION ‘Rapid 1D’. Replicate lab-based library prep and MiSeq data.
  8. 8. Field-based sequencing: design and questions Questions: Will field-based DNA extraction, library preparation and sequencing work? Will the data produced be any good? In any quantity? Can we use it for congeneric species ID? By what method? What else might we do with the data?
  9. 9. Field-based sequencing: Real-time phylogenomics kit
  10. 10. Field-based sequencing: Real-time phylogenomics kit
  11. 11. Field-based sequencing: Yield and performance. Total field-sequenced yield: >300 Mbp (~1.9x); ~80% accuracy. Sequencer yield was affected by reagent conditions and sample quality as much as by field conditions.
  12. 12. Field-based sequencing: ID using BLAST. Assign ID using BLASTN versus reference genomes; difference statistic calculated, e.g.:

      | BLAST 1 database (= true pos) | A. thaliana | A. lyrata¹ | A. thaliana | A. lyrata¹ |
      |---|---|---|---|---|
      | BLAST 2 database (= true neg) | A. lyrata | A. thaliana | A. lyrata | A. thaliana |
      | Sequencing platform | ONT 1D | ONT 1D | MiSeq | MiSeq |
      | Total reads | 91,715 | 25,839 | 9,476,598 | 9,659,489 |
      | Total hits present in DB1 only (TP) | 10,322 | 76 | 1,078,986 | 1,491,474 |
      | Total hits present in DB2 only (FP) | 378 | 2 | 29,200 | 12,613 |
      | Total hits present in both @ e ≤ 0.001² | 22,386 | 101 | 4,251,153 | 3,829,870 |
      | Hit biases³: | | | | |
      | Cumulative length | 29,636,139 | 20,424 | 91,328,759 | 65,181,756 |
      | Cumulative % identities | 850,070 | 119 | 25,692,193 | 19,737,356 |
      | Cumulative e-values | 0.01 | (0.00) | 0.04 | 0.02 |
      | Mean length | 1,324 | 202.22 | 21.48 | 17.02 |
      | Mean % identities | 37.97 | 1.18 | 6.04 | 5.15 |
      | Mean e-values | 4.80E-07 | -3.89581E-11 | 1.06E-08 | 5.46E-09 |
  13. 13. Field-based sequencing: ID using BLAST Assign ID using BLASTN versus reference genomes; difference statistic calculated e.g.
  14. 14. Field-based sequencing: ID using BLAST. Assign ID using BLASTN versus reference genomes; difference statistic calculated, e.g. [figure panels: MinION; MiSeq; ~50% accuracy]
  15. 15. Field-based genomics: assembly and annotation

      | Species | Arabidopsis thaliana | Arabidopsis thaliana | A. lyrata ssp. petraea | A. lyrata ssp. petraea |
      |---|---|---|---|---|
      | Data | MiSeq, 300bp | MiSeq + MinION | MiSeq | Hybrid |
      | Assembler | Abyss | hybridSPAdes | Abyss | hybridSPAdes |
      | # contigs | 24,999 | 10,643 | 37,568 | 85,599 |
      | Largest contig | 89,717 | 413,462 | 101,114 | 38,313 |
      | Total length | 106,455,313 | 119,031,074 | 151,562,895 | 117,256,694 |
      | Reference length | 119,667,750 | 119,667,750 | 183,707,801 | 183,707,801 |
      | N50 | 7,853 | 48,730 | 9,605 | 1,686 |
      | Unaligned length | 7,121,882 | 6,737,059 | 36,669,847 | 35,287,390 |
      | Genome fraction (%) | 82.0 | 88.7 | 53.4 | 43.7 |
      | Duplication ratio | 1.01 | 1.06 | 1.17 | 1.02 |
      | # N's per 100 kbp | 1.72 | 5.41 | 0.22 | 7.09 |
      | # mismatches per 100 kbp | 518 | 588 | 1,297 | 1,097 |
      | # indels per 100 kbp | 120 | 130 | 334 | 271 |
      | Largest alignment | 76,935 | 264,039 | 44,515 | 17,201 |
      | Total aligned length | 98,382,255 | 108,085,473 | 100,502,092 | 80,814,492 |
  16. 16. Field-based sequencing: Phylogenomics. Coding sequences and proteins predicted from the hybridSPAdes MiSeq + MinION (lab- and field-sequenced) data assembly using CEGMA. Aligned to 248 CEGMA alignments (plus additional plants from JGI) with MUSCLE. Quick RAxML phylogenies; TreeAnnotator majority-rule consensus. Consensus support > 98%.
  17. 17. … and extreme phylogenomics
  18. 18. Field-based sequencing: Conclusions. Field-based DNA extraction, library preparation and sequencing are entirely feasible. With current techniques, comparable quantities of data can be produced to lab-based runs. Data is of sufficient quality for congeneric ID using simple, fast processes (BLAST) and genomics. Raw reads can even be used for informative phylogenomics with minimal processing.
  19. 19. More adventures in real-time ID: Kew Science Festival 2016. Fast ID-by-sequencing: Generate genomic data for BLAST identification: rapid-rough-reference (‘R3’). Field-sequence unknown (blinded) samples from panel. Use BLAST triggered by watch daemon to instantly identify new reads. Recompute sample ID in real-time using difference statistics.
  20. 20. More adventures in real-time ID: Kew Science Festival 2016. All six species identified correctly from blinded samples. Fastest time-to-ID: 20 minutes. R3 databases: 6 samples; avg yield 32K reads / 46 Mbp; N50 7.4 Kbp; 12-48 h sequencing. Science festival IDs: 6 samples, 3 days; avg yield 23K reads / 33 Mbp; N50 2.9 Kbp.
  21. 21. Implications of ubiquitous sequencing in real-time
  22. 22. App store informatics
  23. 23. Real-time phylogenomics. Informatics: field-based sequencing; real-time analyses; asynchronous computation. Phylogenomics: metrics on ‘tree space’; relaxing orthology; neutral models of genomic evolution.
  24. 24. Big genomic data for macroevolutionary questions: What is the tree of life? Is ‘sequence space’ constrained – is evolution reproducible (‘replay-the-tape’)? Are higher clades ‘real’? Why species? Genomes? Individuals?
  25. 25. Images – Wikimedia commons CC BY-SA (clockwise from top left: Jeroen Rouwkema, @Nelsonramirezdearellano, author’s own, @soerfm)
  26. 26. Thanks. RBG Kew: Alexander S.T. Papadopulos (@metallophyte), Andrew Helmstetter (@ajhelmstetter), Dion Devey, Robyn Cowan. ONT: Dan Turner, Richard Ronan, Gerrard Coyne.
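
The BLAST-based identification on slide 12 compares each read's hits against two reference databases and summarises the differences. The sketch below is one way such a difference statistic might be computed, assuming the best hit per read has already been parsed (for example from BLASTN tabular output) into a dictionary keyed by read ID. The `Hit` container, the function name `difference_statistics` and the sign convention (database 1 minus database 2) are illustrative assumptions, not the authors' implementation.

```python
from typing import Dict, NamedTuple


class Hit(NamedTuple):
    """Best BLASTN hit for one read (hypothetical container)."""
    length: int     # alignment length
    pident: float   # percent identity
    evalue: float   # expect value


def difference_statistics(db1: Dict[str, Hit], db2: Dict[str, Hit]) -> dict:
    """Summarise hits against two reference databases.

    db1 holds hits to the expected species, db2 hits to the congener.
    For reads hitting both, differences are taken as db1 minus db2
    (an assumed convention).
    """
    only1 = db1.keys() - db2.keys()   # reads hitting database 1 only
    only2 = db2.keys() - db1.keys()   # reads hitting database 2 only
    both = db1.keys() & db2.keys()    # reads hitting both databases

    cum_len = sum(db1[r].length - db2[r].length for r in both)
    cum_id = sum(db1[r].pident - db2[r].pident for r in both)
    cum_ev = sum(db1[r].evalue - db2[r].evalue for r in both)
    n = len(both)
    return {
        "db1_only": len(only1),
        "db2_only": len(only2),
        "both": n,
        "cumulative_length": cum_len,
        "cumulative_pident": cum_id,
        "cumulative_evalue": cum_ev,
        "mean_length": cum_len / n if n else 0.0,
        "mean_pident": cum_id / n if n else 0.0,
        "mean_evalue": cum_ev / n if n else 0.0,
    }


# Toy data, invented for illustration only.
db1_hits = {"r1": Hit(1200, 85.0, 1e-50), "r2": Hit(900, 80.0, 1e-30)}
db2_hits = {"r2": Hit(700, 78.0, 1e-20), "r3": Hit(300, 75.0, 1e-5)}
stats = difference_statistics(db1_hits, db2_hits)
```

Under this convention, positive length and identity differences favour database 1. The slide's figures are consistent with means taken over reads hitting both databases: in the first ONT 1D column, 29,636,139 / 22,386 ≈ 1,324.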

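The assembly comparison on slide 15 and the yield figures on slide 20 both report N50. As a reminder of what that number measures, here is a minimal sketch of the common definition: with lengths sorted longest first, N50 is the length at which the running total first reaches half of all bases. Assembly evaluation tools may apply extra filters (such as a minimum contig length), so values computed this way need not match the slide exactly; the function name `n50` is ours.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return N50: the length at which the running total of lengths,
    sorted longest first, reaches half of the overall total."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no lengths given")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return ordered[-1]  # not reached when all lengths are positive


# Toy contig lengths (total 32), invented for illustration only.
example = [2, 2, 2, 3, 3, 4, 8, 8]
result = n50(example)
```

The same function applies to read lengths, which is how an N50 for a sequencing run (rather than an assembly) would be obtained.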
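
Slide 19 describes BLAST being triggered by a watch daemon as new reads arrive. The talk does not give the implementation, so the following is only a rough polling sketch under stated assumptions: reads land as individual FASTA files in one directory, and all function names, the file pattern and the polling interval are hypothetical. The `blastn` options used (`-query`, `-db`, `-outfmt 6`, `-evalue`) are standard BLAST+ options, and the default e-value mirrors the 0.001 threshold shown on slide 12.

```python
import time
from pathlib import Path
from typing import Callable, List, Set


def blastn_command(query: Path, db: str, evalue: float = 0.001) -> List[str]:
    """Build (but do not run) a BLASTN call with tabular output."""
    return ["blastn", "-query", str(query), "-db", db,
            "-outfmt", "6", "-evalue", str(evalue)]


def new_read_files(directory: Path, seen: Set[Path],
                   pattern: str = "*.fasta") -> List[Path]:
    """Return files matching pattern that were not seen before,
    recording them in seen."""
    fresh = sorted(p for p in directory.glob(pattern) if p not in seen)
    seen.update(fresh)
    return fresh


def watch(directory: Path, handle: Callable[[Path], None],
          interval: float = 5.0, max_polls: int = 0) -> None:
    """Poll directory and call handle on each new read file.

    max_polls=0 means keep polling until interrupted.
    """
    seen: Set[Path] = set()
    polls = 0
    while max_polls == 0 or polls < max_polls:
        for path in new_read_files(directory, seen):
            handle(path)
        polls += 1
        if max_polls == 0 or polls < max_polls:
            time.sleep(interval)
```

In use, `handle` would typically pass `blastn_command(path, db)` to `subprocess.run` once per reference database and feed the parsed hits into whatever running difference statistic is being maintained, so the sample ID can be recomputed as each read arrives.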