The iPlant Tree of Life Project and Toolkit



The iPlant Tree of Life Project and Toolkit: Building a Cyberinfrastructure for Plant Science Research
Given at the National Museum of Natural History in 2011
An overview of iPlant and iPToL

  • Bringing a culture of computing to the Plant Sciences.
  • World-class resources: Rocinante (128 cores; 16 nodes; 64 GB per node; 300 TB storage); Corral (1.7 PB storage + 20 PB archive); Lonestar4 (22,656 Intel Westmere cores; 40 Gb/s QDR InfiniBand; 1 PB storage; 44.3 TB RAM), plus 1 TB RAM, GPU, and cloud upgrades; Longhorn (2,048 Intel Nehalem cores; 512 NVIDIA Quadro FX 5800 GPUs; 14.5 TB RAM; 1 PB storage); Ranger (62,976 AMD Opteron cores; 123 TB RAM; 32 GPUs; 1.7 PB storage).
  • Large: > 2 GB, the size at which browsers fail
  • Highest level of abstraction
  • Distance matrix calculation compared to FASTREE
  • BIEN: Botanical Information and Ecology Network
  • Parsing: GNI Parser by Dmitry Mozzherin; matching: Taxamatch by Tony Rees
  • Provide the scientific community with a toolkit that will allow them to study the evolution of traits of interest, e.g. adaptation in response to past climate change, or co-evolution of pollinators and flowers or of hosts and parasites
  • Contrast: test for correlation of continuous traits, taking phylogeny into account; DACE: estimating the state of a discrete trait (e.g. presence/absence of fruit, color) in the ancestors of a group of taxa; CACE: estimating the value of a continuous trait (e.g. yield, height) in the ancestors of a group of taxa
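
The parsing and matching note above (GNI Parser, Taxamatch) can be illustrated with a much simpler stand-in. The sketch below is neither tool. It assumes a naive whitespace parser that keeps the genus, the species epithet and an optional infraspecific rank. For matching it uses plain Levenshtein edit distance, whereas Taxamatch's actual algorithm is more elaborate. The function names, the `max_distance=2` threshold and the reference list (canonical forms of names from slide 35) are all illustrative assumptions.

```python
# Hypothetical sketch: NOT the GNI parser or the Taxamatch algorithm.
INFRA_RANKS = {"var.", "subsp.", "forma", "f."}


def canonical_name(raw: str) -> str:
    """Keep genus, species epithet and an optional rank + infraspecific epithet;
    drop authorship strings and years (naive whitespace tokenization)."""
    tokens = raw.split()
    parts = tokens[:2]  # genus and species epithet (if present)
    for i in range(2, len(tokens) - 1):
        if tokens[i] in INFRA_RANKS:
            parts += [tokens[i], tokens[i + 1]]
            break
    return " ".join(parts)


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insertions, deletions, substitutions), two-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = cur
    return prev[-1]


def match_name(raw: str, reference: list, max_distance: int = 2):
    """Return (closest reference name, distance), or (None, distance) if too far."""
    query = canonical_name(raw)
    best = min(reference, key=lambda ref: levenshtein(query, ref))
    distance = levenshtein(query, best)
    return (best if distance <= max_distance else None), distance


# Canonical forms of the Centaurium / Erythraea / Centaurodes names on slide 35.
REFERENCE = [
    "Centaurium curvistamineum", "Centaurium minimum", "Centaurium muhlenbergii",
    "Centaurodes muhlenbergii", "Erythraea curvistaminea", "Erythraea minima",
    "Erythraea muhlenbergii",
]

# Fused words, a typical digitization error
print(match_name("Centauriummuhlenbergii", REFERENCE))  # ('Centaurium muhlenbergii', 1)
```

Because the parser drops authorship, "Centaurium muhlenbergii (Griseb.) Wight ex Piper" and the bare binomial match the same entry. A real service must also weigh authors, ranks and homonyms.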
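
The Contrast note above refers to Felsenstein's (1985) phylogenetically independent contrasts. A minimal sketch of that algorithm follows. It assumes a rooted, fully bifurcating tree with strictly positive branch lengths. The `Node` class and function names are hypothetical and are not the iPlant toolkit's code.

- Each internal node yields one standardized contrast.
- Its value is the branch-length-weighted average of its children.
- Its branch is lengthened by v_l·v_r / (v_l + v_r) to account for the uncertainty in that average.

The correlation between two traits is then computed from their contrasts through the origin. This is done because the sign of each contrast depends only on the arbitrary order of the children.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    length: float = 0.0                # length of the branch leading to this node
    name: Optional[str] = None         # tip label (None for internal nodes)
    children: List["Node"] = field(default_factory=list)


def _pic(node: Node, traits: Dict[str, float], out: List[float]) -> Tuple[float, float]:
    """Post-order pass; returns (node value, extra length to add to the node's branch)."""
    if not node.children:
        return traits[node.name], 0.0
    if len(node.children) != 2:
        raise ValueError("this sketch assumes a fully bifurcating tree")
    left, right = node.children
    x_l, extra_l = _pic(left, traits, out)
    x_r, extra_r = _pic(right, traits, out)
    v_l, v_r = left.length + extra_l, right.length + extra_r   # must be > 0
    out.append((x_l - x_r) / math.sqrt(v_l + v_r))             # standardized contrast
    value = (x_l / v_l + x_r / v_r) / (1.0 / v_l + 1.0 / v_r)  # weighted average
    return value, v_l * v_r / (v_l + v_r)


def independent_contrasts(root: Node, traits: Dict[str, float]) -> Tuple[List[float], float]:
    """Return the n - 1 standardized contrasts and the value computed at the root."""
    contrasts: List[float] = []
    root_value, _ = _pic(root, traits, contrasts)
    return contrasts, root_value


def correlation_through_origin(cx: List[float], cy: List[float]) -> float:
    """Correlation of two traits' contrasts (same tree, same child order), through the origin."""
    sxy = sum(a * b for a, b in zip(cx, cy))
    return sxy / math.sqrt(sum(a * a for a in cx) * sum(b * b for b in cy))


# ((A:1,B:1):1,C:2) with one continuous trait
tree = Node(children=[
    Node(1.0, children=[Node(1.0, "A"), Node(1.0, "B")]),
    Node(2.0, "C"),
])
contrasts, root_value = independent_contrasts(tree, {"A": 1.0, "B": 3.0, "C": 5.0})
print(contrasts, root_value)
```

Under a Brownian-motion model, the value carried to the root is the maximum-likelihood estimate of the root state, which links this pass to CACE. Values at other internal nodes from this single pass use only their own descendants.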
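
For DACE, a discrete trait can be reconstructed at the root with Felsenstein's pruning algorithm under a Markov (Mk) model. The sketch below assumes a symmetric two-state model with a fixed rate `q` and equal root priors. An analysis in the style of Pagel (1994) would instead estimate the rate(s) by maximum likelihood, and reconstructing the other internal nodes needs an additional pass. All names are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    length: float = 0.0                # length of the branch leading to this node
    name: Optional[str] = None         # tip label (None for internal nodes)
    children: List["Node"] = field(default_factory=list)


STATES = (0, 1)  # e.g. 0 = fruit absent, 1 = fruit present


def transition_prob(i: int, j: int, t: float, q: float) -> float:
    """P(state j at the end of a branch of length t | state i at its start), symmetric Mk2."""
    decay = math.exp(-2.0 * q * t)
    return 0.5 + 0.5 * decay if i == j else 0.5 - 0.5 * decay


def conditional_likelihoods(node: Node, tips: Dict[str, int], q: float) -> List[float]:
    """Pruning: entry i = P(tip states below node | node is in state i)."""
    if not node.children:
        return [1.0 if s == tips[node.name] else 0.0 for s in STATES]
    result = [1.0 for _ in STATES]
    for child in node.children:
        child_lik = conditional_likelihoods(child, tips, q)
        for i in STATES:
            result[i] *= sum(transition_prob(i, j, child.length, q) * child_lik[j]
                             for j in STATES)
    return result


def root_state_probabilities(root: Node, tips: Dict[str, int], q: float) -> List[float]:
    """Marginal probability of each state at the root, assuming equal root priors."""
    weighted = [0.5 * lik for lik in conditional_likelihoods(root, tips, q)]
    total = sum(weighted)
    return [w / total for w in weighted]


# ((A:1,B:1):1,C:1) with a binary trait
tree = Node(children=[
    Node(1.0, children=[Node(1.0, "A"), Node(1.0, "B")]),
    Node(1.0, "C"),
])
print(root_state_probabilities(tree, {"A": 1, "B": 1, "C": 0}, q=0.5))
```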

    1. The iPlant Tree of Life Project and Toolkit: Building a Cyberinfrastructure for Plant Science Research
        Naim Matasci, The iPlant Collaborative (520 303 8623)
        National Museum of Natural History, Jul 14, 2011
    2. What is iPlant?
    3. Discovery Environment (NEW RELEASE COMING SOON!)
    5. Physical Infrastructure
        • Computation: 63K-core cluster; 20K-core cluster; 1 TB RAM; 512 GPUs
        • Storage: 2 PB; 20 PB archive; high-speed parallel data transfer
    11. Cloud Storage (AVAILABLE NOW!)
        • Store, access and share large datasets
        • Multiple points of entry: web interface, mounted file system, API
        • Free and secure
    14. Cloud Computing (AVAILABLE NOW!)
        • Virtual machines: up to 4 cores, 32 GB RAM, 100 GB dedicated disk
        • Run any x86-compatible OS (even Windows)
        • Persistent or on-demand
        • Log in via SSH or secure VNC
        • Use cases: Internet-enabled servers, database management appliances, virtual desktops… the sky is the limit!
    15. Consumer Applications / iPlant's CI
    16. iPlant Tree of Life Grand Challenge
        • Large phylogenetic inference: building a tree of life for up to 500,000 green plants
        • Tree visualization: scalable visualization for small to large trees
        • Data assembly and integration: acquiring, organizing and processing the data
        • Taxonomic intelligence: sorting out different names for the same species
        • Tree reconciliation: resolving discordant gene and species trees
        • Trait evolution: using trees to understand how traits evolved
    17. Big Trees: to optimize existing methods for constructing phylogenetic trees on the order of 500K taxa.
    18. Big Trees
        • NINJA/WINDJAMMER (Travis Wheeler): neighbor-joining implementation that can analyze > 200K species
        • Six-day run time reduced 32-fold to 4.5 hours for a 220K-species data set
        • Two- to three-day run time for the distance matrix calculation on the 220K set reduced 1,800-fold to 2 minutes
        • RAxML-Light (Alexandros Stamatakis): large-scale maximum likelihood implementation
        • 55K-taxon tree published (Stephen A. Smith et al., “Understanding angiosperm diversification using small and large phylogenetic trees,” American Journal of Botany 98, no. 3 (2011): 404-414)
        • AVAILABLE NOW!
    19. Tree Visualization: to develop an application for viewing, analyzing and exploring large phylogenetic trees.
    20. Tree Visualization
        • > 500K taxa
        • Fast
        • Web-based, platform-independent
        • Semantic zooming
        • Metadata-driven display of information
    21. iPlant Tree Viewer Prototype (AVAILABLE NOW!)
    22. 1KP: collaboration (1KP) to support the data analysis of the Thousand Plant Transcriptomes Project
    23. 1KP (figure: number of genes, N(genes), vs. number of species, N(species); completed genomes cover dozens of species, PCR studies cover dozens of genes in 10^4 species, and the space between is unexplored territory)
    24. Broad phylogenetic coverage
        • Algae; non-flowering plants; flowering plants (angiosperms)
        • On the role of polyploidy in Darwin’s “abominable mystery”
        • Phylogenomics of 1000 species across plant taxa
    25. Tree Reconciliation: to reconcile the evolutionary history of genes and species.
    26. Tree Reconciliation (gene family data courtesy of John Bowers)
    28. Taxonomic Name Resolution: collaboration (BIEN) to unify and resolve synonymous, erroneous, or otherwise conflicting taxonomic names.
    29. Taxonomic uncertainty
        • Non-existent names: misspellings; contamination; annotations; morphospecies; digitization issues (frame shifts, character encoding); lexical variants (digitization conventions)
        • Synonymy: nomenclatural synonyms; taxonomic synonyms / concepts
        • Misidentifications, incomplete identifications
    35. a) Centaurium curvistamineum (Wittr.) Abrams (1951)
        b) Centaurium minimum (Howell) Piper (1915)
        c) Centaurium muhlenbergii (Griseb.) Wight ex Piper (1906)
        d) Centaurium muhlenbergii (Griseb.) Wight ex Piper forma albiflorum (Suksd.) St. John (1937)
        e) Centaurium muhlenbergii (Griseb.) Wight ex Piper var. albiflorum Suksd. (1927)
        f) Centaurodes muhlenbergii (Griseb.) Kuntze (1891)
        g) Erythraea curvistaminea Wittr. (1886)
        h) Erythraea minima Howell (1901)
        i) Erythraea muhlenbergii Griseb. (1839)
        Image: Gordon Leppig & Andrea J. Pickart
    36. How to figure that out? …or ask around at
    37. Makemake at de.wikipedia
    38. Non-existent names: herbarium specimens
        * New World plant specimens, 34 herbaria; simple match against IPNI and TROPICOS, excluding authors
    39. Hans Hillewaert
    41. Taxonomic Name Resolution Service: computer-assisted standardization of plant names
        • Corrects spelling errors and alternative spellings to a standard list of names
        • Converts out-of-date names to currently accepted names
    45. Availability
        • Source code (3-clause BSD license)
        • Web + API instructions
    50. Trait Evolution: to develop an infrastructure for downstream analysis of large trees.
    51. Trait Evolution: a toolkit to study the evolution of traits of interest on very large phylogenies
        • Diversification
        • Biogeographic patterns
        • Adaptation
        • Co-evolution
        • …
    52. Current analyses (proof of concept)
        • Phylogenetically Independent Contrasts (Felsenstein 1985)
        • Continuous Ancestral Character Estimation (Schluter et al. 1997, Paradis 2004)
        • Discrete Ancestral Character Estimation (Pagel 1994, Paradis 2004)
    53. Community integrated (2½-day workshop): EUtils, Lopper, RAxML, Ninja, Phyml, Muscle, PHYLIP, VCF to GFF script, LRmaqqtl, FASTX quality stats, FASTX quality boxplot, FASTX nucleotide distribution, Cuffcompare, ERMINEJ, progressiveMauve, iPlantBorda (mlpy), iPlantCanberra (mlpy), vbay, MECPM, OUCH, Picante, Ontologize, BOWTIE, BWA, TopHat, SHRiMP, Cuffdiff, GNU Core Text utilities, GeneMania, SRA import, PARS, PL, DTT, BBC biclustering
    54. To easily share information and research, collaborate, and stay on top of the latest news in the field.
    55. Collaborative Tool (AVAILABLE NOW! NEW AND IMPROVED!)
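
Slide 18 singles out the distance matrix calculation. As an illustration of what that step computes, the sketch below builds Jukes-Cantor (1969) distances from aligned DNA sequences; this is an assumed example and not NINJA's code. The function names are hypothetical, and the sequences are assumed to be aligned and of equal length. The all-pairs loop makes the cost visible: for the 220K-species set on the slide, n(n - 1)/2 is about 2.4 × 10^10 pairwise comparisons.

```python
import math
from typing import Dict, List, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, skipping columns with a gap ('-') in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor (1969) corrected distance d = -3/4 * ln(1 - 4p/3)."""
    if p >= 0.75:
        return math.inf  # saturated: the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(seqs: Dict[str, str]) -> Tuple[List[str], List[List[float]]]:
    """All-pairs JC69 distances for aligned sequences: n(n - 1)/2 comparisons."""
    names = sorted(seqs)
    n = len(names)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = jukes_cantor(p_distance(seqs[names[i]], seqs[names[j]]))
    return names, d


names, d = distance_matrix({"x": "ACGTACGT", "y": "ACGTACGA", "z": "ACG-ACGT"})
for name, row in zip(names, d):
    print(name, [round(v, 4) for v in row])
```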
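
The neighbor-joining criterion behind NINJA (slide 18) can be sketched with the textbook O(n^3) algorithm of Saitou and Nei (1987). This is not NINJA's implementation, and its speedups for > 200K species are not reproduced here. The function name and the use of Newick text as labels for joined nodes are assumptions made for brevity.

```python
from typing import Dict, List


def neighbor_joining(names: List[str], matrix: List[List[float]]) -> str:
    """Textbook O(n^3) neighbor joining; returns an unrooted Newick string."""
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    # Distances keyed by node label; joined nodes are labelled by their Newick text.
    dist: Dict[str, Dict[str, float]] = {
        a: {b: float(matrix[i][j]) for j, b in enumerate(names)}
        for i, a in enumerate(names)
    }
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(dist[a][b] for b in nodes) for a in nodes}
        # Q-criterion: join the pair minimising (n - 2) d(a, b) - r(a) - r(b).
        a, b = min(((x, y) for k, x in enumerate(nodes) for y in nodes[k + 1:]),
                   key=lambda p: (n - 2) * dist[p[0]][p[1]] - r[p[0]] - r[p[1]])
        len_a = 0.5 * dist[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        len_b = dist[a][b] - len_a
        u = f"({a}:{len_a:g},{b}:{len_b:g})"
        dist[u] = {u: 0.0}
        for c in nodes:
            if c not in (a, b):
                # Distance from the new node to every remaining node.
                dist[u][c] = dist[c][u] = 0.5 * (dist[a][c] + dist[b][c] - dist[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    # Join the last three nodes at one central node.
    x, y, z = nodes
    len_x = 0.5 * (dist[x][y] + dist[x][z] - dist[y][z])
    return f"({x}:{len_x:g},{y}:{dist[x][y] - len_x:g},{z}:{dist[x][z] - len_x:g});"


taxa = ["A", "B", "C", "D"]
# Additive distances from the unrooted tree A:2, B:3 | internal edge 1 | C:4, D:5
matrix = [
    [0, 5, 7, 8],
    [5, 0, 8, 9],
    [7, 8, 0, 9],
    [8, 9, 9, 0],
]
print(neighbor_joining(taxa, matrix))  # (C:4,D:5,(A:2,B:3):1);
```

On an additive matrix such as this one, NJ recovers the generating tree and its branch lengths. Each iteration still scans every remaining pair, and that is the cost large-tree implementations have to reduce.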
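
Slides 25 and 26 describe reconciling gene and species trees. A common starting point is the lowest-common-ancestor (LCA) mapping sketched below. This illustrates the general technique and is not a description of iPlant's reconciliation tool.

- Each gene-tree node maps to the smallest species clade that contains the species of all its genes.
- A node that maps to the same species node as one of its children is labeled a duplication.

Trees are nested tuples, and all names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Union

Tree = Union[str, tuple]  # a leaf label, or a tuple of subtrees


def leaves(tree: Tree) -> FrozenSet[str]:
    """Set of leaf labels below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree: Tree) -> List[FrozenSet[str]]:
    """Leaf set of every node in a rooted species tree."""
    found = [leaves(tree)]
    if not isinstance(tree, str):
        for child in tree:
            found.extend(clades(child))
    return found


def lca(species_clades: List[FrozenSet[str]], taxa: FrozenSet[str]) -> FrozenSet[str]:
    """Smallest species clade containing all `taxa` (their lowest common ancestor)."""
    return min((c for c in species_clades if taxa <= c), key=len)


def reconcile(gene_tree: Tree, species_clades: List[FrozenSet[str]],
              gene_to_species: Dict[str, str], events: List[str]) -> FrozenSet[str]:
    """LCA-map a gene tree onto the species tree, recording one event per internal node."""
    if isinstance(gene_tree, str):
        return frozenset([gene_to_species[gene_tree]])
    child_maps = [reconcile(c, species_clades, gene_to_species, events) for c in gene_tree]
    mapped = lca(species_clades, frozenset().union(*child_maps))
    # Duplication if the node maps to the same species node as one of its children.
    events.append("duplication" if mapped in child_maps else "speciation")
    return mapped


species_tree = (("human", "chimp"), "mouse")
gene_tree = ((("h1", "c1"), "m1"), (("h2", "c2"), "m2"))
gene_to_species = {"h1": "human", "h2": "human", "c1": "chimp",
                   "c2": "chimp", "m1": "mouse", "m2": "mouse"}
events: List[str] = []
reconcile(gene_tree, clades(species_tree), gene_to_species, events)
print(events)  # ['speciation', 'speciation', 'speciation', 'speciation', 'duplication']
```

Here the gene family contains two full copies of the species tree, so the root of the gene tree is inferred to be a duplication that predates the human/chimp/mouse split.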
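
The second TNRS task on slide 41, converting out-of-date names to currently accepted ones, amounts to following synonym links until an accepted name is reached. The sketch assumes a hypothetical synonym table that uses placeholder names in the "Aus bus" style, not real taxa. A real service must also handle ambiguous (e.g. pro parte) synonyms, where one name points to several accepted names.

```python
from typing import Dict, Optional, Set

# Hypothetical synonym table with placeholder names (not real taxa):
# each key is an out-of-date name, each value the name that replaced it.
SYNONYMS: Dict[str, str] = {
    "Aus bus": "Xus bus",
    "Xus bus": "Zus bus",  # a second, later transfer to another genus
    "Aus cus": "Aus dus",
}
ACCEPTED: Set[str] = {"Zus bus", "Aus dus", "Aus eus"}


def resolve(name: str, synonyms: Dict[str, str], accepted: Set[str]) -> Optional[str]:
    """Follow synonym links until an accepted name is reached.

    Returns None for unknown names, dead ends and cycles, which a real
    service would flag for manual review instead of guessing.
    """
    seen: Set[str] = set()
    current = name
    while current not in accepted:
        if current in seen or current not in synonyms:
            return None
        seen.add(current)
        current = synonyms[current]
    return current


for query in ["Aus bus", "Aus cus", "Aus eus", "Qus rus"]:
    print(query, "->", resolve(query, SYNONYMS, ACCEPTED))
```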