iPlant Tree of Life


Given at PAG XIX (2011)
Overview of the ToL project

Published in: Technology
  • Left tree: Maple tree phylogeny from D. Ackerly. Left picture: Joe Felsenstein, ca. 1980. Right picture: Ranger cluster at TACC.
  • Exponential growth is fast, but the number of possible trees grows even faster (factorially). With exponential growth, you multiply by a constant factor at each step (e.g., 10 × 10 × 10). With trees, the factor keeps increasing with each added taxon: 3 × 5 × 7 × 9 × 11 … At around 50 taxa, there are more possible topologies than there are atoms in the universe. We want a tree of 500K taxa: how many possible trees are there with that many?
  • Distance matrix calculation compared to FastTree
  • NINJA extended abstract submitted
  • BIEN: Botanical Information and Ecology Network
  • Slide for illustrative purposes only.
  • BIEN: Botanical Information and Ecology Network. NCEAS: National Center for Ecological Analysis and Synthesis.
  • Tree Reconciliation allows us to infer the occurrence of these evolutionary events
  • N-J bootstraps; branch lengths: ML with the HKY model. TreeBeST: the resulting tree is bootstrapped 100 times, reconciled with the species tree, and rooted by minimizing the number of duplications and losses.
  • Provide the scientific community with a toolkit that will allow them to study the evolution of traits of interest, e.g., adaptation in response to past climate change, or co-evolution of pollinators and flowers or of hosts and parasites.
  • Contrast: Test for correlation of continuous traits, taking phylogeny into account. DACE: Estimating the state of a discrete trait (e.g., presence/absence of fruit, color) in the ancestors of a group of taxa. CACE: Estimating the value of a continuous trait (e.g., yield, height) in the ancestors of a group of taxa.
  • Tighter integration: e.g. launching analyses from visualizer
  • iPlant Tree of Life
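
The speaker note on how fast the number of trees grows can be made concrete with a short calculation. The sketch below assumes the note's product 3 × 5 × 7 × … counts unrooted, fully bifurcating, labeled trees, which gives (2n − 5)!! topologies for n taxa; the function names and the 10^80 figure for atoms in the universe are illustrative assumptions, not taken from the slides.

```python
import math

ATOMS_IN_UNIVERSE_LOG10 = 80  # assumption: commonly quoted rough estimate of 10^80 atoms


def num_unrooted_trees(n_taxa: int) -> int:
    """Exact count of unrooted, fully bifurcating, labeled trees: 3 * 5 * ... * (2n - 5)."""
    count = 1
    for factor in range(3, 2 * n_taxa - 4, 2):  # odd factors 3, 5, ..., 2n - 5
        count *= factor
    return count


def log10_num_unrooted_trees(n_taxa: int) -> float:
    """log10 of the same count via log-gamma, usable when the exact integer is huge.

    Uses the identity (2n - 5)!! = (2n - 4)! / (2^(n - 2) * (n - 2)!).
    """
    m = n_taxa - 2
    log_count = math.lgamma(2 * m + 1) - m * math.log(2) - math.lgamma(m + 1)
    return log_count / math.log(10)


if __name__ == "__main__":
    for n in (4, 5, 10, 50):
        print(n, "taxa:", num_unrooted_trees(n), "trees")
    print("log10(trees) for 500,000 taxa:", log10_num_unrooted_trees(500_000))
```

Under these assumptions the count passes 10^80 a little above 50 taxa, and for 500,000 taxa the number of topologies has well over a million decimal digits, which is why exhaustive search is out of the question and heuristics are needed.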

    1. iPlant Tree of Life. Naim Matasci, Plant and Animal Genome XIX Conference, Jan 15-19, 2011
    2. “Nothing in biology makes sense except in the light of evolution.” T. G. Dobzhansky
    3. Phylogenetic Insights
    4. Tree of Life: A metaphor for the phylogeny of all species or a large group of species
    5. Scalability (Ackerly, 2009; J. Felsenstein, ca. 1980; Ranger Cluster at TACC)
    6. iPToL Challenges
        • Large phylogenetic inference
            • Building a tree of life for up to 500,000 green plants
        • Tree Visualization
            • Scalable visualization for small to large trees
        • Data Assembly and Integration
            • Acquisition, organization, and processing of the data
        • Taxonomic Intelligence
            • Sorting out different names for the same species
        • Tree Reconciliation
            • Resolving discordant gene and species trees
        • Trait Evolution
            • Using trees to understand how traits evolved
    7. Big Trees
        • To optimize existing methods to construct phylogenetic trees on the order of 500K taxa.
    8. Tree Building
    9. [Figure: chart with labels “Factorial (trees)”, “E10”, “E2”, and “Number of atoms in the universe”]
    10. Big Trees
        • NINJA (Travis Wheeler)
            • Neighbor-Joining implementation that can analyze > 200K species
            • Software rewritten from Java to C with MPI
            • Six-day run time reduced 32-fold to 4.5 hours for the 220K-species data set
            • Two- to three-day run time reduced 1,800-fold to 2 minutes for the distance matrix calculation on the 220K set
        • RAxML (Alexandros Stamatakis)
        • Large-scale Maximum Likelihood implementation
            • Added check-pointing
            • Re-implementing the pthreads implementation in MPI
    11. In the works
        • Source code releases
        • Further improvements
        • Building the Tree of Life
    12. Tree Visualization
        • To develop an application for viewing, analyzing and exploring large phylogenetic trees.
    13. Tree Visualization
        • > 500K Taxa
        • Fast
        • Platform independent
        • Semantic zooming
        • Metadata-driven display of information
    14. iPlant Tree Viewer Prototype
    15. My-Plant.org
        • To easily share information and research, collaborate, and stay on top of the latest news in the field.
    16. Social Networking for the Plant Sciences
        • The usual suspects in social networking features
            • Image gallery
            • File sharing
            • Group/private messaging
            • Forums
            • Group posts
            • “Colleagues”
            • User profiles
            • Searchable content
    18. My Clades
    19. My Clades
    20. In the works (November 13, 2010)
        • Integration with other social networks/services
            • Twitter, Facebook
        • Expanding the number of active clades
    21. Sign up today!
        • Go to http://my-plant.org
        • Click on Registration
        • Try it out!
        • Check out poster (Software session)
    22. 1KP
        • Collaboration (1KP): To support the data analysis of the Thousand Plant Transcriptome Project
    23. 1KP [Figure: N(genes) versus N(species); labels: “completed genomes” (dozens of species), “PCR in 10^4 species” (dozens of genes), “unexplored territory”]
    24. Broad phylogenetic coverage: algae, non-flowering, flowering (angiosperm); on role of polyploidy in Darwin’s “abominable mystery”; phylogenomics of 1000 species across plant taxa
    25. Tuesday Afternoon, 18 January 2011, 3:50 pm to 6:00 pm: Gene Expression Analysis Workshop, Pacific Salon 3. Organizers: David Galbraith, University of Arizona, and Greg May, NCGR, Santa Fe. At 4:00 pm: Gane Ka-Shu Wong, University of Alberta, “1KP: an International Consortium Sequencing the Transcriptomes of 1000 Phylogenetically Diverse Plants from Angiosperms to Green Algae”
    26. Taxonomic Name Resolution
        • Collaboration (BIEN): To unify and resolve synonymous, erroneous, or other conflicting taxonomic names.
    27. Taxonomic Intelligence: Arabidopsis thaliana, Arabis thaliana
    28. Taxonomic Intelligence: Arabidopsis thaliana, Arabis thaliana, Mouse-ear cress, Thale cress
    29. Taxonomic Intelligence: Arabidopsis thaliana, Arabis thaliana, Mouse-ear cress, Thale cress, 1A12_ARATH, Q06402
    30. Taxonomic Intelligence: cress
    31. Taxonomic Intelligence: cress
    32. Taxonomic Name Resolution Service (image courtesy of Brad Boyle)
    33. Taxonomic Name Resolution Service
    34. Taxonomic Name Resolution Service (Demo)
    35. In the works
        • Higher and lower taxonomic orders
        • Synonymy
        • API
    36. Tree Reconciliation
        • To reconcile the evolutionary history of genes and species.
    37. Origins of incongruence
        • Lineage sorting and hybridization
        • Gene duplications
        • Horizontal gene transfer
        • Tree Reconciliation allows us to infer the occurrence of these evolutionary events
    38. Species Tree
    39. Gene Tree
    40. Tree Reconciliation (gene family data courtesy of John Bowers)
    41. Tree Reconciliation (Demo)
    42. In the works
        • Additional search capabilities
        • New visualization style
        • Interactive mode
    43. Trait Evolution
        • To develop an infrastructure for downstream analysis of large trees.
    44. Trait Evolution
        • Toolkit to study the evolution of traits of interest on very large phylogenies
            • Diversification
            • Biogeographic patterns
            • Adaptation
            • Co-evolution
            • …
    45. Current analyses
        • Phylogenetically Independent Contrasts (Felsenstein 1985)
        • Continuous Ancestral Character Estimation (Schluter et al. 1997, Paradis 2004)
        • Discrete Ancestral Character Estimation (Pagel 1994, Paradis 2004)
    46. Phylogenetically Independent Contrasts Video
    47. In the works
        • Various tree stretching models
        • Evolutionary model fitting
        • Contrasts for discrete traits (Pagel 1994)
        • Tighter integration with the Discovery Environment
        • More tools and methods from the community
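
The phylogenetically independent contrasts listed under Current analyses (Felsenstein 1985) can be illustrated with a small example. The following is a minimal sketch of the textbook pruning procedure on a fully bifurcating tree with branch lengths, not the code used in the iPlant tools; the `Node` class and the `contrasts` function are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Optional, Tuple


@dataclass
class Node:
    length: float                   # branch length leading to this node (0 for the root)
    value: Optional[float] = None   # observed trait value, tips only
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def contrasts(node: Node, out: List[float]) -> Tuple[float, float]:
    """Post-order pass that appends one standardized contrast per internal node.

    Returns (trait estimate at this node, branch length adjusted for the
    uncertainty of that estimate).
    """
    if node.left is None or node.right is None:
        if node.value is None:
            raise ValueError("every tip needs a trait value")
        return node.value, node.length
    x1, v1 = contrasts(node.left, out)
    x2, v2 = contrasts(node.right, out)
    # Difference between sister lineages, scaled by its expected standard deviation.
    out.append((x1 - x2) / sqrt(v1 + v2))
    # Ancestral estimate: average of the daughters weighted by inverse branch length.
    estimate = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)
    # Lengthen the branch below this node to account for estimation error.
    return estimate, node.length + v1 * v2 / (v1 + v2)


if __name__ == "__main__":
    # Toy tree ((A:1, B:1):1, C:2) with made-up trait values.
    inner = Node(1.0, left=Node(1.0, value=1.0), right=Node(1.0, value=3.0))
    root = Node(0.0, left=inner, right=Node(2.0, value=5.0))
    found: List[float] = []
    contrasts(root, found)
    print(found)
```

A tree with n tips yields n − 1 contrasts. To test for correlated evolution of two continuous traits, one would compute contrasts for each trait on the same tree and regress one set on the other through the origin.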