Carleton Biology talk: March 2014


Transcript

  • 1. Enabling science with the tree of life. Karen Cranston, National Evolutionary Synthesis Center (NESCent). @kcranstn, http://slideshare.net/kcranstn
  • 2. The tree of life provides a means for organizing and explaining biodiversity data. Wiegmann et al. PNAS, 2011
  • 3. What do we want from a Tree of Life? ❖ complete = contains all of biodiversity ❖ dynamic = continuously updated with new data ❖ available digitally = browse, query, download. Image: http://evolution.berkeley.edu
  • 4. ❖ Create a complete tree of life by synthesizing published phylogenetic data ❖ Provide tools for managing, synthesizing & sharing phylogenetic data. http://opentreeoflife.org
  • 5. Synthetic science ❖ Novel methods & analysis tools ❖ Big data from existing data. Biodiversity Synthesis Center / Encyclopedia of Life; National Evolutionary Synthesis Center
  • 6. Challenges ❖ Incongruence: How do we detect and use conflict between trees? ❖ Availability: What data do we have to construct a tree of life? ❖ Synthesis: How do we combine data across the tree of life?
  • 7. What can we learn from conflict between trees?
  • 8. Gene tree uncertainty. [Figure: single gene alignment (aactgtcgcatgttgacg...) → phylogenetic inference → many likely trees]
  • 9. Bayesian phylogenetic inference Input: sequence data + evolutionary model Output = list of sampled phylogenies
  • 10. [Plot: probability of each sampled tree; x-axis: sampled trees (1 to 49), y-axis: probability (0 to 0.18)] Number of times sampled ∝ probability. Is there a stable backbone among the trees? What taxa have unstable placement?
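The statement on slide 10 that the number of times a tree is sampled is proportional to its probability can be sketched in a few lines. This is a minimal illustration, not code from the talk: the Newick strings and their counts are invented, and it assumes that identical topologies are always written as identical strings.

```python
from collections import Counter

def topology_probabilities(sampled_trees):
    """Estimate each topology's posterior probability as its sample frequency."""
    counts = Counter(sampled_trees)
    total = sum(counts.values())
    return {tree: n / total for tree, n in counts.most_common()}

# Hypothetical MCMC output (invented for illustration): one Newick string
# per sampled tree, 20 samples in total.
samples = (["((1,2),(3,(4,5)));"] * 8
           + ["(1,((2,3),(4,5)));"] * 5
           + ["((1,(2,3)),(4,5));"] * 4
           + ["((1,2),((3,4),5));"] * 3)

probs = topology_probabilities(samples)
for tree, p in probs.items():
    print(f"{p:.2f}  {tree}")
```

In practice a sampler may write the same topology with its clades in a different order, so real output would need to be canonicalized before counting.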
  • 11. Summarize with agreement subtrees. [Figure: four sampled trees on taxa 1 to 5 with Pr=0.40, 0.20, 0.15 and 0.25; the same trees restricted to taxa 1, 3, 4, 5, labelled Pr=1.00]
  • 12. [Figure: four sampled trees on taxa 1 to 5 with Pr=0.40, 0.20, 0.15 and 0.25, one of them with taxa 3 and 4 in a different order; the same trees restricted to taxa 1, 3, 4, 5, labelled Pr=0.85] Cranston, K.A. and B.H. Rannala. Summarizing a posterior distribution of phylogenies using agreement subtrees. Systematic Biology 2007: 56(4), pp. 578-590.
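The idea behind slides 11 and 12 can be sketched as follows: prune an unstable taxon from every sampled tree, then add up the probabilities of the trees that become identical. This is a simplified illustration for rooted trees written as nested tuples, not the algorithm of Cranston and Rannala (2007). The four toy trees are invented; only their probabilities are taken from the slide.

```python
from collections import defaultdict

def prune(tree, dropped):
    """Remove the taxa in `dropped` from a nested-tuple tree, collapsing unary nodes."""
    if not isinstance(tree, tuple):
        return None if tree in dropped else tree
    kept = [c for c in (prune(child, dropped) for child in tree) if c is not None]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]
    return tuple(kept)

def clades(tree):
    """Return the set of clades (frozensets of leaf labels) of a rooted tree."""
    found = set()

    def leaves(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    leaves(tree)
    return frozenset(found)

def agreement_probability(weighted_trees, dropped):
    """Largest total probability of sampled trees that agree once `dropped` is pruned."""
    totals = defaultdict(float)
    for tree, prob in weighted_trees:
        totals[clades(prune(tree, dropped))] += prob
    return max(totals.values())

# Invented posterior: taxon 2 moves around a fixed backbone ((1,3),(4,5)).
posterior = [
    ((((1, 2), 3), (4, 5)), 0.40),
    (((1, (2, 3)), (4, 5)), 0.20),
    (((1, 3), ((2, 4), 5)), 0.15),
    (((1, 3), (4, (2, 5))), 0.25),
]

print(agreement_probability(posterior, set()))  # no pruning: best single tree
print(agreement_probability(posterior, {2}))    # taxon 2 pruned: all trees agree
```

Comparing trees by their clade sets rather than by their tuples makes the result independent of the order in which children are written.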
  • 13. [Diagram: routes from multiple sequence alignments to a species tree. Labels: Concatenate, Supermatrix, Gene trees, Supertrees, Gene duplication, Coalescent]
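The "concatenate" route on slide 13 joins per-gene alignments into a single supermatrix. A minimal sketch, assuming each alignment is a dict from taxon name to aligned sequence and that taxa missing from a gene are padded with "?"; the taxon names and sequences are invented for illustration.

```python
def concatenate(alignments, missing="?"):
    """Join per-gene alignments into one supermatrix, padding absent taxa."""
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    supermatrix = {taxon: "" for taxon in taxa}
    for aln in alignments:
        # All sequences in one alignment are assumed to have the same length.
        length = len(next(iter(aln.values())))
        for taxon in taxa:
            supermatrix[taxon] += aln.get(taxon, missing * length)
    return supermatrix

# Hypothetical two-gene data set; sp_B has no sequence for the second gene.
gene1 = {"sp_A": "aactg", "sp_B": "aattg", "sp_C": "aac-g"}
gene2 = {"sp_A": "tcgc", "sp_C": "tcgc"}

matrix = concatenate([gene1, gene2])
for taxon, sequence in matrix.items():
    print(taxon, sequence)
```

The gene tree route on the same slide would instead infer one tree per alignment and combine the trees afterwards.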
  • 14. Phylogenomics of rice (Oryza). 820,000 BAC-end sequences for 9 diploid Oryza species; 1720 gene fragments; 2.4 million nucleotides. What are the biological causes of gene tree incongruence in rice? Do we need full genomes to answer these questions? Cranston, K.A., B. Hurwitz, M.J. Sanderson, D. Ware, R.A. Wing, L. Stein. Phylogenomic analysis from deep BAC-end sequence libraries of rice. Systematic Botany, 35:3, 2010
  • 15. Phylogenomics of rice (Oryza): concatenated analysis. Cranston, K.A., B. Hurwitz, M.J. Sanderson, D. Ware, R.A. Wing, L. Stein. Phylogenomic analysis from deep BAC-end sequence libraries of rice. Systematic Botany, 35:3, 2010
  • 16. Gene trees in Oryza ❖ Gene tree methods: recover every possible topology ❖ Species tree methods: many clades not statistically significant ❖ Suggests incomplete lineage sorting and hybridization / introgression in evolutionary history of rice. [Figure: supermatrix topology] Cranston, K.A., B. Hurwitz, D. Ware, L. Stein, R.A. Wing. Species trees from highly incongruent gene trees in rice. Systematic Biology. 2009: doi: 10.1093/sysbio/syp054
  • 17. What data do we have for creating a complete tree of life?
  • 18. Gene tree signal in GenBank How many trees can we build using all of the data in GenBank and how are those trees distributed across the tree of life?
  • 19. All-vs-all BLAST at each NCBI taxonomy node. [Figure: subtree clusters under Arachis: Arachis hypogaea, Arachis hypogaea subsp. fastigiata, Arachis hypogaea subsp. hypogaea, Arachis glabrata] Sanderson, M.J., D.T. Boss, D. Chen, K.A. Cranston, and A. Wehe. The PhyLoTA Browser: Processing GenBank for molecular phylogenetics research. Systematic Biology 2008: 57(3).
  • 20. All possible clusters, alignments and trees. [Figure: sequence alignment (aactgtcgcatgttgacg...)] ❖ ~90000 clusters, alignments, trees available for download ❖ data availability matrix at each NCBI node
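A data availability matrix like the one mentioned on slide 20 can be pictured as a taxon-by-cluster table of presence and absence. The sketch below is an assumption about the general shape of such a matrix, not the PhyLoTA Browser's actual format; the taxon names come from slide 19, while the cluster names and memberships are invented.

```python
def availability_matrix(clusters):
    """Return (cluster_ids, rows); rows[taxon][i] is 1 if the taxon has a sequence in cluster i."""
    cluster_ids = sorted(clusters)
    taxa = sorted({taxon for members in clusters.values() for taxon in members})
    rows = {taxon: [int(taxon in clusters[cid]) for cid in cluster_ids]
            for taxon in taxa}
    return cluster_ids, rows

# Hypothetical sequence clusters under the Arachis node (memberships invented).
clusters = {
    "cluster_1": {"Arachis hypogaea", "Arachis glabrata"},
    "cluster_2": {"Arachis hypogaea",
                  "Arachis hypogaea subsp. fastigiata",
                  "Arachis hypogaea subsp. hypogaea"},
}

cluster_ids, rows = availability_matrix(clusters)
print(" " * 40, cluster_ids)
for taxon, row in rows.items():
    print(f"{taxon:40s} {row}")
```

A matrix of this kind shows at a glance which clusters share enough taxa to be combined.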
  • 21. ❖ complete = contains all of biodiversity ❖ dynamic = continuously updated with new data ❖ available digitally = browse, query, download. http://opentreeoflife.org
  • 22. Gordon Burleigh, Keith Crandall, Karl Gude, David Hibbett, Mark Holder, Laura Katz, Rick Ree, Stephen Smith, Doug Soltis, Tiffani Williams. Computer science, systematics, evolutionary theory, computational biology, bioinformatics, journalism
  • 23. Even if there were phylogenies for all sequence clusters in GenBank, they would represent only a small fraction of biodiversity
  • 24. Two types of inputs. Phylogeny: highly resolved, computationally derived, limited coverage. Taxonomy: poorly resolved, manually curated, much more complete
  • 25. ~7000 trees from ~2600 studies Phylografter: Rick Ree, Field Museum of Natural History
  • 26. ~4% of all published phylogenetic trees (Stoltzfus et al 2012). Trees generally published as pictures in PDFs. [Figure, with caption: "Fig. 1. Combined molecular phylogenetic tree for Diptera. Partitioned ML analysis of combined taxon sets of tier 1 and tier 2 FLYTREE data samples (−lnL = 344155.6169) calculated in RAxML. Circles indicate bootstrap support >80% (black/bp = 95–100%, gray/bp = 88–94%, white/bp = 80–88%). Nodes with improved bootstrap values resulting from postanalysis pruning of unstable taxa are marked by stars (black/bp = 95–100%, gray/bp = 88–94%, white/bp = 80–88%). Colored squares on terminal branches indicate the presence, in at least one species of a family, of ecological traits as shown to lower left. The number ..."]
  • 27. OpenTree Reference Taxonomy: [source taxonomy logos joined by +] + patch files for manual edits. 3,133,028 nodes and 2,559,835 ‘species’. Jonathan Rees, NESCent
  • 28. How do we combine data to build and use a tree of life?
  • 29. Novel datastore for synthesis Treemachine: Stephen Smith, Cody Hinchliff, Joseph Brown, U Michigan
  • 30. Jim Allman, NESCent
  • 31. Manual synthesis based on all data Automated synthesis based on limited data
  • 32. Inputs: published phylogenies, taxonomies → synthetic tree of life. Cycle: • filter / weight input trees • re-synthesize • process feedback • input new trees
  • 33. Improving the synthetic tree ❖ Branch lengths & divergence times ❖ Better synthesis using tree metadata ❖ Community engagement ❖ data deposition & curation ❖ feedback & annotation
  • 34. Moving beyond a single tree ❖ Detecting conflict and coverage ❖ Visualization ❖ Enabling custom synthesis ❖ Building out to other tools & resources
  • 35. [Open Tree of Life logo] What can we do with a tree of life?
  • 36. [Figure: sequence alignment (aactgtcgcatgttgacg...) + image = tree labelled 10 million years and 24 million years] Image: Zephyris at the English language Wikipedia
  • 37. Acer macrophyllum, Betula lutea, Aesculus glabra, Tilia americana, Ulmus rubra. Leaf patterns image from Walls RL: American Journal of Botany 2011, 98(2):244-253. Acer macrophyllum, Betula alleghaniensis, Aesculus glabra, Tilia americana, Ulmus rubra
  • 38. Stoltzfus, A., Lapp, H., Matasci, N., … Cranston, K.A., ... & Jordan, G. (2013). Phylotastic! Making tree-of-life knowledge accessible, reusable and convenient. BMC bioinformatics, 14(1), 158.
  • 39. Collaborative data collection. Validation of datasets. Search & download across datasets
  • 40. Get tree
  • 41. [Open Tree of Life logo] What can we do with a tree of life?
  • 42. University of Alberta: Bruce Rannala. University of Arizona: Michael Sanderson. NESCent: Jonathan Rees, Jim Allman