
The iPlant Tree of Life Project and Toolkit


The iPlant Tree of Life Project and Toolkit: Building a Cyberinfrastructure for Plant Science Research
Given at Evolution 2011
An overview of the iPlant and iPToL project

Published in: Technology, Education

  1. The iPlant Tree of Life Project and Toolkit: Building a Cyberinfrastructure for Plant Science Research
     Naim Matasci, The iPlant Collaborative
     Evolution 2011, Jun 17-21, 2011
  2. What is iPlant?
  3. Discovery Environment
     NEW RELEASE COMING SOON! http://www.iplantcollaborative.org/discovery-environment-preview-access
  5. Physical Infrastructure
     • Computation
       • 63K-core cluster
       • 20K-core cluster
       • 1 TB RAM
     • Storage
       • 2 PB
       • 20 PB archive
  6. Cloud Storage
     • Store, access and share large datasets
     • Multiple points of entry: web interface, mounted FS, API
     • Free and secure
     AVAILABLE NOW! http://www.iplantcollaborative.org/about/policies/data-set-hosting
  7. Cloud Computing
     • Virtual Machines
       • Up to 4 cores, 32 GB RAM, 100 GB dedicated disk
       • Run any x86-compatible OS (even Windows)
       • Persistent or on-demand
       • Log in via SSH or secure VNC
     • Use Cases
       • Internet-enabled servers
       • Database management appliances
       • Virtual desktops
       • … The sky is the limit!
     AVAILABLE NOW! http://www.iplantcollaborative.org/atmosphere-preview
  8. Consumer Applications iPlant's CI
  9. iPlant Tree of Life Grand Challenge
     • Large phylogenetic inference
       • Building a tree of life for up to 500,000 green plants
     • Tree Visualization
       • Scalable visualization for small to large trees
     • Data Assembly and Integration
       • Acquiring, organizing and processing the data
     • Taxonomic Intelligence
       • Sorting out different names for the same species
     • Tree Reconciliation
       • Resolving discordant gene and species trees
     • Trait Evolution
       • Using trees to understand how traits evolved
  10. Big Trees
     • To optimize existing methods to construct phylogenetic trees on the order of 500K taxa.
  11. Big Trees
     • NINJA/WINDJAMMER (Travis Wheeler)
       • Neighbor-Joining implementation that can analyze > 200K species
       • Six-day run time reduced 32-fold to 4.5 hours for a 220K-species data set
       • Two- to three-day run time reduced 1,800-fold to 2 minutes for the distance matrix calculation on the 220K set
     • RAxML-Light (Alexandros Stamatakis)
       • Large-scale Maximum Likelihood implementation
       • 55K-taxon tree published (Stephen A. Smith et al., “Understanding angiosperm diversification using small and large phylogenetic trees,” American Journal of Botany 98, no. 3 (2011): 404-414)
     AVAILABLE NOW!
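For readers unfamiliar with Neighbor-Joining, the sketch below shows only the textbook pair-selection step: compute the Q criterion for every pair of taxa from a distance matrix and join the pair with the smallest value. It is an illustrative toy, not NINJA's code; the function name `pick_pair_to_join` and the five-taxon matrix are assumptions made for this example. The full scan over all pairs shown here is exactly the cost that large-scale implementations have to reduce.

```python
def pick_pair_to_join(dist):
    """Return ((i, j), q) for the pair minimizing the Neighbor-Joining Q criterion.

    dist is a symmetric matrix (list of lists) of pairwise distances.
    Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)
    """
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    best_pair, best_q = None, float("inf")
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if q < best_q:
                best_pair, best_q = (i, j), q
    return best_pair, best_q


# Hypothetical five-taxon distance matrix (taxa indexed 0..4).
example = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

pair, q_value = pick_pair_to_join(example)
print(pair, q_value)
```

Note that taxa 3 and 4 are the closest pair by raw distance, yet the criterion selects taxa 0 and 1; correcting for each taxon's average distance to all others is the point of the Q criterion.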
  12. Tree Visualization
     • To develop an application for viewing, analyzing and exploring large phylogenetic trees.
  13. Tree Visualization
     • > 500K taxa
     • Fast
     • Web based, platform independent
     • Semantic zooming
     • Metadata-driven display of information
  14. iPlant Tree Viewer Prototype
     AVAILABLE NOW! http://portnoy.iplantcollaborative.org/
  15. 1KP
     • Collaboration (1KP): To support the data analysis of the Thousand Plant Transcriptomes Project
  16. 1KP
     [Figure: N(genes) plotted against N(species). Labels: 1KP, unexplored territory; completed genomes, dozens of species; PCR, dozens of genes in 10^4 species.]
  17. Phylogenomics of 1000 species across plant taxa
     [Figure: broad phylogenetic coverage (algae, non-flowering, flowering (angiosperm)); partial label: "on role of polyploidy in Darwin’s “abominable mystery”".]
  18. Tree Reconciliation
     • To reconcile the evolutionary history of genes and species.
  19. Tree Reconciliation
     Gene family data courtesy of John Bowers
  21. Taxonomic Name Resolution
     • Collaboration (BIEN): To unify and resolve synonymous, erroneous, or otherwise conflicting taxonomic names.
  22. Taxonomic uncertainty
     • Non-existent names
       • Misspellings
       • Contamination
         • Annotations
         • Morphospecies
         • Digitization issues (frame shifts, character encoding)
         • Lexical variants (digitization conventions)
     • Synonymy
       • Nomenclatural synonyms
       • Taxonomic synonyms / concepts
     • Misidentifications, incomplete identifications
  24. AS SEEN IN NATURE! AVAILABLE NOW!
  26. Taxonomic Name Resolution Service
     • Computer-assisted standardization of plant names
     • Corrects spelling errors and alternative spellings against a standard list of names
     • Converts out-of-date names to currently accepted names
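The two steps listed for the Taxonomic Name Resolution Service (spelling correction against a standard list, then conversion of out-of-date names) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the TNRS implementation or its API: the function `resolve_name`, the tiny name list, the single synonym entry and the 0.8 similarity cutoff are all chosen for this example, and the fuzzy matching comes from Python's standard `difflib` module.

```python
import difflib

# Illustrative reference data; a real service would draw on curated taxonomic sources.
ACCEPTED_NAMES = ["Quercus alba", "Quercus rubra", "Solanum lycopersicum"]
SYNONYMS = {"Lycopersicon esculentum": "Solanum lycopersicum"}  # old name -> accepted name


def resolve_name(submitted, cutoff=0.8):
    """Return the accepted name for a submitted name, or None if nothing matches."""
    known = ACCEPTED_NAMES + list(SYNONYMS)
    # Step 1: correct spelling by finding the closest known name.
    matches = difflib.get_close_matches(submitted, known, n=1, cutoff=cutoff)
    if not matches:
        return None
    # Step 2: convert an out-of-date name to the currently accepted one.
    return SYNONYMS.get(matches[0], matches[0])


for name in ["Quercus albba", "Lycopersicon esculentun", "Zzzz"]:
    print(name, "->", resolve_name(name))
```

Returning `None` for unmatched input, rather than the nearest guess, leaves ambiguous names for a person to review, which fits the "computer-assisted" framing of the slide.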
  27. Trait Evolution
     • To develop an infrastructure for downstream analysis of large trees.
  28. Trait Evolution
     • Toolkit to study the evolution of traits of interest on very large phylogenies
       • Diversification
       • Biogeographic patterns
       • Adaptation
       • Co-evolution
       • …
  29. Current analyses (proof of concept)
     • Phylogenetically Independent Contrasts (Felsenstein 1985)
     • Continuous Ancestral Character Estimation (Schluter et al. 1997, Paradis 2004)
     • Discrete Ancestral Character Estimation (Pagel 1994, Paradis 2004)
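As a rough illustration of the first analysis listed, the sketch below computes phylogenetically independent contrasts on a small rooted binary tree, following the standard recursion: each pair of sister nodes yields one contrast standardized by the square root of the summed branch lengths, and the parent receives a weighted trait estimate plus a branch-length correction. The tree encoding (nested dicts), the function name `independent_contrasts` and the trait values are assumptions made for this example; it is not the iPlant toolkit's code.

```python
import math


def independent_contrasts(node, contrasts):
    """Post-order traversal; returns (trait estimate, extra branch length) for node.

    A tip is {"value": x}. An internal node is
    {"children": [(left_node, left_branch_length), (right_node, right_branch_length)]}.
    Standardized contrasts are appended to the `contrasts` list.
    """
    if "value" in node:  # tip: observed trait, no branch-length correction
        return node["value"], 0.0
    (left, len_left), (right, len_right) = node["children"]
    x_left, extra_left = independent_contrasts(left, contrasts)
    x_right, extra_right = independent_contrasts(right, contrasts)
    v_left = len_left + extra_left
    v_right = len_right + extra_right
    # Contrast between the two daughters, scaled by its expected standard deviation.
    contrasts.append((x_left - x_right) / math.sqrt(v_left + v_right))
    # Trait estimate at this node: average weighted by inverse branch length.
    estimate = (x_left / v_left + x_right / v_right) / (1 / v_left + 1 / v_right)
    # Extra length added to the branch above, reflecting uncertainty in the estimate.
    return estimate, v_left * v_right / (v_left + v_right)


# Hypothetical tree ((A:1, B:1):1, C:2) with trait values A=1, B=3, C=6.
tree = {"children": [
    ({"children": [({"value": 1.0}, 1.0), ({"value": 3.0}, 1.0)]}, 1.0),
    ({"value": 6.0}, 2.0),
]}

contrasts = []
root_estimate, _ = independent_contrasts(tree, contrasts)
print(contrasts, root_estimate)
```

A tree with n tips yields n - 1 contrasts; the recursion here is fine for a toy tree, but trees on the scale discussed in these slides would need an iterative traversal.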
  30. Community Integrated (2½-day workshop)
  31. My-Plant.org
     • To easily share information and research, collaborate, and stay on top of the latest news in the field.
  32. Collaborative Tool
     AVAILABLE NOW! NEW AND IMPROVED! http://my-plant.org/
  34. http://www.iplantcollaborative.org