The iPlant Tree of Life Project and Toolkit


The iPlant Tree of Life Project and Toolkit: Building a Cyberinfrastructure for Plant Science Research
Given at Evolution 2011
An overview of the iPlant and iPToL project

Published in: Technology, Education

  • Bringing a culture of computing to the Plant Sciences.
  • World-class resources:
      • Rocinante: 128 cores; 16 nodes; 64 GB node; 300 TB storage
      • Corral: 1.7 PB storage + 20 PB archive
      • Lonestar4: 22,656 Intel Westmere cores; 40 GB QDR-IB; 1 PB storage; 44.3 TB RAM. Plus 1 TB RAM, GPU, and Cloud upgrades.
      • Longhorn: 2,048 Intel Nehalem cores; 512 NVIDIA Quadro FX 5800 GPUs; 14.5 TB RAM; 1 PB storage
      • Ranger: 62,976 AMD Opteron cores; 123 TB RAM; 32 GPUs; 1.7 PB storage
  • Large: >2 GB, where browsers fail
  • Highest level of abstraction
  • Distance matrix calculation compared to FastTree
  • BIEN: Botanical Information and Ecology Network
  • Provide the scientific community with a toolkit that will allow them to study the evolution of traits of interest, for example adaptation in response to past climate change, or co-evolution of pollinators and flowers or of hosts and parasites.
  • Contrast: Test for correlation of continuous traits, taking phylogeny into account.
  • DACE: Estimating the state of a discrete trait (e.g. presence/absence of fruit, color) in the ancestors of a group of taxa.
  • CACE: Estimating the value of a continuous trait (e.g. yield, height) in the ancestors of a group of taxa.
  • The iPlant Tree of Life Project and Toolkit

    1. 1. The iPlant Tree of Life Project and Toolkit: Building a Cyberinfrastructure for Plant Science Research. Naim Matasci, The iPlant Collaborative. Evolution 2011, Jun 17-21, 2011
    2. 2. What is iPlant?
    3. 3. Discovery Environment NEW RELEASE COMING SOON!
    4. 4.
    5. 5. Physical Infrastructure
        • Computation
          • 63K-core cluster
          • 20K-core cluster
          • 1 TB RAM
        • Storage
          • 2 PB
          • 20 PB archive
    6. 6. Cloud Storage
        • Store, access and share large datasets
        • Multiple points of entry: web interface, mounted FS, API
        • Free and secure
        AVAILABLE NOW!
    7. 7. Cloud Computing
        • Virtual Machines
          • Up to 4 cores, 32 GB RAM, 100 GB dedicated disk
          • Run any x86-compatible OS (even Windows)
          • Persistent or on-demand
          • Log in via SSH or secure VNC
        • Use Cases
          • Internet-enabled servers
          • Database management appliances
          • Virtual desktops
          • … The sky is the limit!
        AVAILABLE NOW!
    8. 8. Consumer Applications iPlant's CI
    9. 9. iPlant Tree of Life Grand Challenge
        • Large phylogenetic inference
          • Building a tree of life for up to 500,000 green plants
        • Tree Visualization
          • Scalable visualization for small to large trees
        • Data Assembly and Integration
          • Acquiring, organizing and processing the data
        • Taxonomic Intelligence
          • Sorting out different names for the same species
        • Tree Reconciliation
          • Resolving discordant gene and species trees
        • Trait Evolution
          • Using trees to understand how traits evolved
    10. 10. Big Trees: To optimize existing methods to construct phylogenetic trees on the order of 500K taxa.
    11. 11. Big Trees
        • NINJA/WINDJAMMER (Travis Wheeler)
          • Neighbor-Joining implementation that can analyze > 200K species
          • Six-day run time reduced 32-fold to 4.5 hours for a 220K-species data set
          • Two- to three-day run time reduced 1,800-fold to 2 minutes for distance matrix calculation on the 220K set
        • RAxML-Light (Alexandros Stamatakis)
          • Large-scale Maximum Likelihood implementation
          • 55K tree published (Stephen A. Smith et al., “Understanding angiosperm diversification using small and large phylogenetic trees,” American Journal of Botany 98, no. 3 (2011): 404-414)
        AVAILABLE NOW!
    12. 12. Tree Visualization: To develop an application for viewing, analyzing and exploring large phylogenetic trees.
    13. 13. Tree Visualization
        • > 500K taxa
        • Fast
        • Web based, platform independent
        • Semantic zooming
        • Metadata-driven display of information
    14. 14. iPlant Tree Viewer Prototype AVAILABLE NOW!
    15. 15. 1KP: Collaboration (1KP) to support the data analysis of the Thousand Plant Transcriptomes Project.
    16. 16. 1KP [Figure: N(genes) vs. N(species). Labels: completed genomes (dozens of species); PCR (dozens of genes in 10^4 species); unexplored territory.]
    17. 17. Broad phylogenetic coverage: phylogenomics of 1000 species across plant taxa (algae, non-flowering, flowering (angiosperm)); on role of polyploidy in Darwin's “abominable mystery”
    18. 18. Tree Reconciliation: To reconcile the evolutionary history of genes and species.
    19. 19. Tree Reconciliation Gene family data courtesy John Bowers
    20. 21. Taxonomic Name Resolution: Collaboration (BIEN) to unify and resolve synonymous, erroneous, or otherwise conflicting taxonomic names.
    21. 22. Taxonomic uncertainty
        • Non-existent names
          • Misspellings
          • Contamination
            • Annotations
            • Morphospecies
            • Digitization issues (frame shifts, character encoding)
            • Lexical variants (digitization conventions)
        • Synonymy
          • Nomenclatural synonyms
          • Taxonomic synonyms / concepts
        • Misidentifications, incomplete identifications
    23. 26. Taxonomic Name Resolution Service
        • Computer-assisted standardization of plant names
        • Corrects spelling errors and alternative spellings against a standard list of names
        • Converts out-of-date names to currently accepted names
    24. 27. Trait Evolution: To develop an infrastructure for downstream analysis of large trees.
    25. 28. Trait Evolution
        • Toolkit to study the evolution of traits of interest on very large phylogenies
          • Diversification
          • Biogeographic patterns
          • Adaptation
          • Co-evolution
          • …
    26. 29. Current analyses (Proof of concept)
        • Phylogenetically Independent Contrasts (Felsenstein 1985)
        • Continuous Ancestral Character Estimation (Schluter et al. 1997, Paradis 2004)
        • Discrete Ancestral Character Estimation (Pagel 1994, Paradis 2004)
    27. 30. Community Integrated (2½-Day Workshop)
    28. 31. To easily share information and research, collaborate, and stay on top of the latest news in the field.
    29. 32. Collaborative Tool AVAILABLE NOW! NEW AND IMPROVED!
    30. 34.
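
The Big Trees slide describes NINJA as a Neighbor-Joining implementation but gives no algorithmic detail. As a rough illustration of what one Neighbor-Joining step decides, the sketch below applies the textbook Q-criterion to pick the next pair of taxa to join. It is not NINJA's code, and it makes no attempt at the optimizations a 200K-species run would need. The function name `nj_pick_pair` and the five-taxon distance matrix are assumptions made up for this example.

```python
def nj_pick_pair(dist):
    """Return (i, j, q) for the pair minimizing the Neighbor-Joining Q-criterion.

    dist is a symmetric matrix (list of lists) of pairwise distances.
    Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)
    """
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            # Keep the first pair with the strictly smallest Q value.
            if best is None or q < best[2]:
                best = (i, j, q)
    return best


# Hypothetical distances between five taxa (illustrative values only).
EXAMPLE_DIST = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

if __name__ == "__main__":
    print(nj_pick_pair(EXAMPLE_DIST))
```

A full Neighbor-Joining run repeats this selection, replaces the joined pair with a new internal node, and recomputes distances until the tree is resolved. Scanning all pairs at every step is what becomes expensive for very large taxon sets.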
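
The Taxonomic Name Resolution Service slide lists two jobs: correcting spellings against a standard list, and converting out-of-date names to accepted ones. A minimal sketch of that idea, using Python's standard `difflib` for fuzzy matching, might look like the following. The name lists, the `resolve_name` function, and the 0.8 similarity cutoff are all assumptions for illustration; the slides do not say how the actual service matches names.

```python
import difflib

# Hypothetical reference data; a real service would use a curated taxonomy.
ACCEPTED = ["Zea mays", "Arabidopsis thaliana", "Solanum lycopersicum"]
SYNONYMS = {"Lycopersicon esculentum": "Solanum lycopersicum"}


def resolve_name(raw, cutoff=0.8):
    """Map a raw name to an accepted name, or return None if nothing is close."""
    # Normalize whitespace and capitalization (genus capitalized, rest lowercase).
    cleaned = " ".join(raw.split())
    if cleaned:
        cleaned = cleaned[0].upper() + cleaned[1:].lower()
    # Fuzzy-match against both accepted names and known synonyms.
    candidates = ACCEPTED + list(SYNONYMS)
    matches = difflib.get_close_matches(cleaned, candidates, n=1, cutoff=cutoff)
    if not matches:
        return None
    # Convert an out-of-date name to its accepted equivalent.
    return SYNONYMS.get(matches[0], matches[0])


if __name__ == "__main__":
    for name in ["Zea mais", "lycopersicon  esculentum", "Quercus alba"]:
        print(name, "->", resolve_name(name))
```

Returning `None` for unmatched names, rather than guessing, leaves the cases on the Taxonomic uncertainty slide that fuzzy matching cannot settle (misidentifications, morphospecies) for a person to review.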
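
The Current analyses slide names Phylogenetically Independent Contrasts (Felsenstein 1985) as one of the proof-of-concept analyses. The sketch below shows the standard recursion on a tiny, fully bifurcating tree: each internal node yields one standardized contrast, and its own trait value is the branch-length-weighted mean of its children. The tree encoding (a leaf is a number, an internal node is a `(left, left_length, right, right_length)` tuple), the function name `pic`, and the trait values are assumptions for this example, not the toolkit's data model.

```python
import math


def pic(node):
    """Return (node_value, extra_branch_length, contrasts) for a bifurcating tree.

    A leaf is a plain number (the trait value). An internal node is a tuple
    (left, left_length, right, right_length).
    """
    if isinstance(node, (int, float)):
        return float(node), 0.0, []
    left, v_left, right, v_right = node
    x_left, extra_left, c_left = pic(left)
    x_right, extra_right, c_right = pic(right)
    # Lengthen each branch to account for uncertainty in the estimated child value.
    v_left += extra_left
    v_right += extra_right
    # Standardized contrast between the two children.
    contrast = (x_left - x_right) / math.sqrt(v_left + v_right)
    # Weighted-average estimate of the trait at this node.
    value = (x_left / v_left + x_right / v_right) / (1 / v_left + 1 / v_right)
    extra = v_left * v_right / (v_left + v_right)
    return value, extra, c_left + c_right + [contrast]


# Hypothetical tree ((A:1,B:1):1,(C:1,D:1):1) with trait values 1, 3, 6, 10.
EXAMPLE_TREE = ((1.0, 1.0, 3.0, 1.0), 1.0, (6.0, 1.0, 10.0, 1.0), 1.0)

if __name__ == "__main__":
    root_value, _, contrasts = pic(EXAMPLE_TREE)
    print(root_value, contrasts)
```

To test for correlation between two continuous traits, one would compute contrasts for each trait on the same tree and regress one set on the other through the origin.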