Phylogenetic Workflows: Tree Building and Post-tree Analyses
Naim Matasci, The iPlant Collaborative
Plant Biology 2011, August 6-10, 2011
Why is the tree of life important? “Knowledge of evolutionary relationships is fundamental to biology, yielding new insights across the plant sciences, from comparative genomics and molecular evolution, to plant development, to the study of adaptation, speciation, community assembly, and ecosystem functioning.”
“Nothing in biology makes sense except in the light of evolution.” (T. G. Dobzhansky)
Scalability Ackerly, 2009; J. Felsenstein, ca. 1980; Ranger Cluster at TACC
iPlant Tree of Life Grand Challenge
  • Large phylogenetic inference: building a tree of life for up to 500,000 green plants
  • Tree visualization: scalable visualization for small to large trees
  • Data assembly and integration: acquiring, organizing, and processing the data
  • Taxonomic intelligence: sorting out different names for the same species
  • Tree reconciliation: resolving discordant gene and species trees
  • Trait evolution: using trees to understand how traits evolved
Ancestral state of Hawaiian lobelioids: Lobelia niihauensis (Image: David Eickhoff); Cyanea leptostegia (Image: Karl Magnacca)
Continuous Ancestral Character Estimation (Schluter et al. 1997, Paradis 2004)
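As a rough illustration of what a continuous ancestral character estimate computes, the sketch below estimates the trait value at the root of a small tree under a Brownian-motion model, using the pruning recursion from Felsenstein's independent contrasts. The tree encoding, the tip values, and the function name are assumptions made for this example; it is not the CACE tool's implementation, and it reports only the root value.

```python
def root_state(node):
    """Return (estimate, extra_branch_length) for a subtree.

    Assumed encoding (hypothetical, for illustration only):
      - a tip is a number: the observed trait value
      - an internal node is a tuple
        (left_child, left_branch_length, right_child, right_branch_length)
    Branch lengths must be positive.
    """
    if not isinstance(node, tuple):
        return float(node), 0.0
    left, bl_left, right, bl_right = node
    x1, extra1 = root_state(left)
    x2, extra2 = root_state(right)
    # Lengthen each branch by the uncertainty accumulated below it.
    v1 = bl_left + extra1
    v2 = bl_right + extra2
    # Weighted average: children on shorter branches count more.
    estimate = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)
    extra = v1 * v2 / (v1 + v2)
    return estimate, extra


# Toy tree: two sister tips (values 1.0 and 3.0, branch length 1.0 each)
# joined to a third tip (value 6.0) on a branch of length 2.0.
toy_tree = ((1.0, 1.0, 3.0, 1.0), 1.0, 6.0, 2.0)
root_estimate, _ = root_state(toy_tree)
```

The two sister tips are first averaged to 2.0; that value is then combined with the third tip, down-weighted by the extra uncertainty it carries.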
  • Obtain sequences: GetSeq
  • Align sequences: Muscle
  • Build tree: FastTree (aML), Ninja (NJ), PHYLIP (MP, NJ, ML), RAxML (ML)
  • Visualize tree: iPlant Tree Viewer
  • Integrate data: Lopper, TNRS
  • Run analysis: CACE, DACE, Contrast, OUch, Picante, Penalized likelihood
>gi|1835233|emb|Z83147.1| S.nepaulensis rbcL gene
TTATTATACTCCTGAATAYGAAACCAAAGATACTGATATCTTGGCAGCATTCCGAGTAACTGCTCAGCCT
GGAGTTCCACCCGAAGAAGCGGGGGCCGCGGTAGCTGCGGAATCTTCTACTGGTACATGGACAACTGTGT
GGACCGATGGACTTACTAACCTTGATCGTTACAAAGGGCGATGCTACAACATAGAGCCCGTTGCTGGAGA
AGAAAATCAATTTATTGCTTATGTAGCTTATCCTTTAGACCTTTTTGAAGAAGGTTCTGTTACTAACATG
TTTACTTCCATTGTGGGTAATGTATTTGGGTTCAAAGCCCTGCGCGCTCTACGTCTGGAAGATCTGCGAA
TCCCTACTGCGTATTGTAAAACTTTCCAAGGACCGCCTCATGGGATCCAAGTTGAAAGAGATAAATTGAA
CAAGTATGGTCGTCCCTTGCTGGGATGTACTATTAAACCTAAATTGGGGTTATCGGCTAAAAACTACGGT
AGAGCAGTTTATGAATGTCTACGCGGTGGGCTTGATTTTACCAAAGATGATGAGAACGTGAACTCCCAAC
CATTTATGCGTTGGAGAGACCGTTTCGTATTTTGTGCCGAAGCAATTTTTAAAGCACAGTCTGAAACAGG
TGAAATCAAAGGGCATTACTTGAATGCTACTGCAGGTACATGTGAAGAAATGATGAAAAGGGCTATATTT
>gi|1835227|emb|Z83136.1| S.foetidissimum rbcL gene
AAGTGTTGGATTCAAAGCGGGTGTTAAAGATTACAAATTGACTTATTATACTCCTGACTATGAAACCAAA
GATACTGATATCTTGGCAGCATTCCGAGTAACTCCTCAACCTGGAGTTCCACCTGAAGAAGCAGGGGCCG
CGGTAGCTGCCGAATCTTCTACTGGTACATGGACAACTGTGTGGACCGATGGACTTACTAGCCTTGATCG
TTACAAAGGGCGATGCTACCACATCGAGCCCGTNGCTGGAGAAGAAAATCAATATATTGCTTATGTAGCT
TATCCTTTAGACCTYTTTGAAGAAGGTTCTGTTACTAATATGTKNACTTCCATTGTGGGGAATGTATTTG
GGTTCAAAGCCCTGCGTGCTTTACGTCTGGAAGATCTGCGAATCCCTCCTGCGTATTCTAAAACTTTCCA
AGGACCGCCTCATGGCATCCAAGTTGAAAGAGATAAATTGAACAAGTACGGTCGTCCCCTGTTGGGATGT
ACTATTAAACCTAAATTGGGGTTATCTGCTAAAAACTACGGTAGAGCGGTTTATGAATGTCTCCGCGGTG
GACTTGATTTTACCAAAGATGATGAGAACGTGAACTCCCAACCATTTATGCGTTGGAGAGATCGTTTCTT
ATTTTGTGCCGAAGCACTTTATAAAGCACAGGCTGAAACAGGTGAAATCAAAGGGCATTACTTGAATGCT
>gi|1834456|emb|Z83132.1| G.urceolata rbcL gene
AACTAAAGCGGGTGTTGGATTCAAAGCGGGTGTTAAAGATTACAAATTAACTTATTATACTCCTGACTAT
GAAACCAAAGATACTGATATCTTGGCAGCATTCCGAGTAACTCCTCAACCTGGAGTTCCACCTGAAGAAG
CGGGGGCCGCCGTAGCTGCCGAATCCTCCACTGGTACATGGACAACTGTGTGGACCGACGGACTTACTAG
CCTTGATCGTTACAAAGGGCGATGCTACCACATCGAGCCCGTGGCTGGAGAAGAAAATCAATTTATTGCT
TATGTAGCTTACCCTTTAGACCTTTTTGAAGAAGGTTCTGTTACTAACATGTTTACTTCCATTGTGGGTA
ATGTATTTGGGTTCAAAGCCCTGCGCGCTCTACGTCTGGAAGATCTGCGAATCCCTGTTGCGTATGCTAA
AACTTTCCAAGGGCCGCCTCATGGCATCCAAGTTGAAAGAGATAAATTGAATAAGTATGGTCGTCCCCTG
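The records above are in FASTA format: a header line starting with ">" followed by sequence lines. A minimal sketch of reading such records in Python might look like the following. The function names are hypothetical, and the header parsing assumes the pipe-delimited "gi|...|emb|accession|" layout seen on this slide.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping header -> joined sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(chunks) for h, chunks in records.items()}


def accession(header):
    """Pull the accession out of a 'gi|number|db|accession| description' header."""
    fields = header.split("|")
    return fields[3] if len(fields) > 3 else header.split()[0]


# First 24 bases of the first record on the slide, split over two lines.
sample = (
    ">gi|1835233|emb|Z83147.1| S.nepaulensis rbcL gene\n"
    "TTATTATACTCC\n"
    "TGAATAYGAAAC\n"
)
parsed = parse_fasta(sample)
```

Letters other than A, C, G, and T in the sequences (such as Y, K, and N) are IUPAC ambiguity codes, so downstream code should not assume a four-letter alphabet.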
Get Sequences
  • Retrieves nucleotide and amino acid sequences from NCBI's GenBank
  • Automatically includes species name and taxon ID
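The slides do not say how Get Sequences talks to GenBank. For readers who want to retrieve the same kind of records by hand, the sketch below builds a request against NCBI's public E-utilities EFetch endpoint using only the standard library. Treating E-utilities as the access route is an assumption of this example, and the function names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accessions, db="nuccore"):
    """Build an EFetch URL returning FASTA text for the given accessions.

    Use db="nuccore" for nucleotide records and db="protein" for
    amino acid records.
    """
    params = {
        "db": db,
        "id": ",".join(accessions),
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


def fetch_fasta(accessions, db="nuccore"):
    """Download the records (requires network access)."""
    with urlopen(efetch_fasta_url(accessions, db)) as response:
        return response.read().decode("utf-8")


# Accessions taken from the FASTA headers shown on the slide.
url = efetch_fasta_url(["Z83147.1", "Z83136.1", "Z83132.1"])
```

Only the URL is built here; calling `fetch_fasta` performs the actual download.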
Get sequences DEMO
Muscle DEMO
Improved Tree Building Tools
  • NINJA/WINDJAMMER (Travis Wheeler): Neighbor-Joining implementation that can analyze > 200K species. On a 220K-species data set, the six-day run time was reduced 32-fold to 4.5 hours, and the two- to three-day distance matrix calculation was reduced 1,800-fold to 2 minutes.
  • RAxML-Light (Alexandros Stamatakis): large-scale Maximum Likelihood implementation. A 55K tree has been published (Stephen A. Smith et al., “Understanding angiosperm diversification using small and large phylogenetic trees,” American Journal of Botany 98, no. 3 (2011): 404-414).
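To make the two steps named on this slide concrete (distance matrix calculation, then neighbor joining), here is a naive textbook sketch in Python. It is not NINJA's algorithm, which exists precisely to avoid this cubic-time search; the function names and the five-taxon toy matrix are assumptions made for illustration.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for x, y in zip(seq_a, seq_b) if x != y)
    return diffs / len(seq_a)


def neighbor_joining(names, dist):
    """Naive O(n^3) neighbor joining.

    dist[a][b] is the distance between taxa a and b (symmetric).
    Returns (joins, last_edge): joins is a list of
    (child_a, child_b, parent, length_a, length_b); last_edge connects
    the final two nodes as (node_a, node_b, length).
    """
    d = {a: {b: dist[a][b] for b in names if b != a} for a in names}
    nodes = list(names)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # Row sums over the nodes still active.
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Pick the pair minimizing Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        a, b = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]],
        )
        len_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        parent = f"node{len(joins) + 1}"
        joins.append((a, b, parent, len_a, len_b))
        rest = [c for c in nodes if c not in (a, b)]
        # Distances from the new internal node to every remaining node.
        d[parent] = {}
        for c in rest:
            d[parent][c] = d[c][parent] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = rest + [parent]
    last_a, last_b = nodes
    return joins, (last_a, last_b, d[last_a][last_b])


# Toy distance matrix for five hypothetical taxa.
taxa = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
toy_dist = {x: {y: matrix[i][j] for j, y in enumerate(taxa)}
            for i, x in enumerate(taxa)}
toy_joins, toy_last_edge = neighbor_joining(taxa, toy_dist)
```

Each pass scans every pair of active nodes, which is what makes the naive version impractical at the 200K-species scale discussed on the slide.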
RAxML DEMO
Tree Visualization
  • > 500K taxa
  • Fast
  • Web based, platform independent
  • Semantic zooming
  • Metadata-driven display of information
iPlant Tree Viewer http://portnoy.iplantcollaborative.org/
Live tree view demo
Obstacles
Lopper DEMO
Lobelia kauaensis, Lobelia villosa, Galeatella gloria-montis, Trematolobelia kauaiensis, Trematolobelia macrostachys, Lobelia hypoleuca, Neowimmeria yuccoides, Lobelia niihauensis, Brighamia insignis, Brighamia rockii, Delissea rhytidosperma, Delissea subcordata, Cyanea acuminata, Cyanea hirtella, Cyanea coriacea, Delissea leptostegia, Clermontia kakeana, Clermontia parviflora, Clermontia arborescens, Clermontia fauriei
The TNRS: A Taxonomic Name Resolution Service for Plants. Tonight from 5:30 - 7:30 in Exhibit Hall A. Poster number P21011.
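One part of the name-resolution problem, catching misspelled or inconsistently formatted names, can be sketched with approximate string matching from Python's standard library. This is only an illustration and not how the TNRS works internally. The short reference list reuses names from the slide, and the function names and cutoff are assumptions. Mapping true synonyms onto an accepted name would need a curated taxonomic source, which this sketch does not attempt.

```python
import difflib

# Small reference list reusing names that appear on the slide.
REFERENCE_NAMES = [
    "Lobelia niihauensis",
    "Brighamia insignis",
    "Cyanea acuminata",
    "Clermontia fauriei",
]


def normalize(name):
    """Collapse whitespace; capitalize the genus, lowercase the rest."""
    parts = name.strip().split()
    if not parts:
        return ""
    return " ".join([parts[0].capitalize()] + [p.lower() for p in parts[1:]])


def resolve(name, reference=REFERENCE_NAMES, cutoff=0.85):
    """Return the closest reference name, or None if nothing is close enough."""
    cleaned = normalize(name)
    matches = difflib.get_close_matches(cleaned, reference, n=1, cutoff=cutoff)
    return matches[0] if matches else None


corrected = resolve("clermontia  faurei")   # misspelled, lower case, extra space
unknown = resolve("Quercus alba")           # not in the reference list
```

A higher cutoff reduces false matches between similarly named species at the cost of missing more heavily misspelled inputs.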
CACE DEMO


Editor's Notes

  • #3 Our understanding of the phylogeny of the half million known species of green plants has expanded dramatically over the past two decades. The task of assembling a comprehensive "tree of life" for them presents a Grand Challenge. Also part of the Grand Challenge is developing the infrastructure needed to view and use the tree of life and to put it into the hands of plant biologists.
  • #5 Left tree: maple tree phylogeny from D. Ackerly. Left picture: Joe Felsenstein, ca. 1980. Right picture: Ranger cluster at TACC.
  • #32 Distance matrix calculation compared to FastTree.