Building a Foundation to Enable Semantic Technologies for Phylogenetically-Based Comparative Analyses

Published in: Education, Technology

  1. Building a Foundation to Enable Semantic Technologies for Phylogenetically-Based Comparative Analyses
     Maryam Panahiazar (1), Arlin Stoltzfus (2), Rutger Vos (3), Enrico Pontelli (4) and Jim Leebens-Mack (1)
       • (1) University of Georgia, USA
       • (2) NIST & University of Maryland, USA
       • (3) University of Reading, UK
       • (4) New Mexico State University
     Phyloinformatics 06/24/11
  2. Motivation
     "Nothing in biology makes sense except in the light of evolution" (Theodosius Dobzhansky, 1973)… and nothing in evolution makes sense except in the light of phylogeny.
  3. For example: prediction of gene and protein function
       1. Choose gene of interest
       2. Identify homologs
       3. Align sequences
       4. Calculate gene tree
       5. Overlay known functions onto tree
       6. Hypothesize function for all genes
       7. Reconcile gene and species trees
     After Jonathan A. Eisen, 1998, Genome Research, 8:163-167
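Steps 5 and 6 of the workflow above can be illustrated with a small sketch. It is a deliberately simplified stand-in, not Eisen's procedure: it ignores duplication events and the gene/species tree reconciliation of step 7. The tree, gene names, and function labels are hypothetical. Each unannotated gene receives the function of the smallest clade that contains it and at least one annotated gene, provided those annotations agree.

```python
# Toy gene tree as nested tuples; leaves are gene names (all hypothetical).
tree = (("geneA", "geneB"), ("geneC", ("geneD", "geneE")))

# Step 5: known functions overlaid on some leaves (hypothetical annotations).
known = {"geneA": "kinase", "geneC": "transporter", "geneD": "transporter"}


def leaves(node):
    """Return the leaf names under a node, left to right."""
    if isinstance(node, str):
        return [node]
    names = []
    for child in node:
        names.extend(leaves(child))
    return names


def clades_containing(node, target, path=()):
    """Return the clades that contain `target`, ordered from the root down."""
    if isinstance(node, str):
        return list(path) if node == target else None
    for child in node:
        found = clades_containing(child, target, path + (node,))
        if found is not None:
            return found
    return None


def hypothesize(tree, known):
    """Step 6 (simplified): predict a function for each unannotated gene.

    A prediction is None when the nearest annotated relatives disagree
    or when no gene in the tree is annotated.
    """
    predictions = {}
    for gene in leaves(tree):
        if gene in known:
            continue
        predictions[gene] = None
        # Walk outward from the smallest enclosing clade.
        for clade in reversed(clades_containing(tree, gene)):
            functions = {known[g] for g in leaves(clade) if g in known}
            if functions:
                if len(functions) == 1:
                    predictions[gene] = functions.pop()
                break
    return predictions


print(hypothesize(tree, known))
```

Returning None for conflicting annotations keeps the sketch conservative: it declines to guess where a fuller analysis would be needed.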
  4. Example 2: testing congruence among phylogeographic analyses
       1. Compile results of phylogeographic analyses for multiple species from the same geographic region
       2. Apply demographic models to account for variation in generation times and substitution rates
     After Knowles 2009, Annu. Rev. Ecol. Evol. Syst. (Knowles 2009 after Avise 1992)
  5. Applying Semantics to Bioinformatics
     Integrative bioinformatics experimentation cycle:
       1. Problem definition
       2. Experimental design
       3. Data integration
       4. Data analysis
       5. Interpretation
     Products of the cycle, in order: biological hypothesis; protocol; raw integration result; analysis result; knowledge
     Semantic web approach to data integration:
       1. Import or create data and knowledge models
       2. Use data models to transform raw data to RDF data
       3. Link data models to knowledge models
       4. Select common domain
       5. Construct and run semantic query
     Product: raw integration result
     Lennart J.G. Post, Marco Roos, M. Scott Marshall, Roel van Driel and Timo M. Breit. A semantic web approach applied to integrative bioinformatics experimentation: a biological use case with genomics data. Bioinformatics, Vol. 23 no. 22, 2007, pages 3080–3087. doi:10.1093/bioinformatics/btm461
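The idea behind steps 2 and 5 of the semantic web approach (raw data becomes subject-predicate-object statements, which are then queried by pattern) can be sketched in plain Python. This is an illustration only: a real pipeline would presumably use an RDF library and a SPARQL endpoint, and the `ex:` names, records, and helper functions below are all hypothetical.

```python
# Raw tabular records (hypothetical gene annotations).
rows = [
    {"gene": "gene1", "species": "speciesA", "function": "kinase"},
    {"gene": "gene2", "species": "speciesB", "function": "kinase"},
    {"gene": "gene3", "species": "speciesA", "function": "transporter"},
]


def to_triples(rows):
    """Step 2: turn each row into (subject, predicate, object) triples."""
    triples = set()
    for row in rows:
        subject = "ex:" + row["gene"]
        triples.add((subject, "ex:foundIn", "ex:" + row["species"]))
        triples.add((subject, "ex:hasFunction", "ex:" + row["function"]))
    return triples


def match(triples, pattern):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if all(p is None or p == v for p, v in zip(pattern, t))
    )


def genes_with(triples, species, function):
    """Step 5: genes found in `species` that have `function` (a two-pattern join)."""
    in_species = {s for s, _, _ in match(triples, (None, "ex:foundIn", "ex:" + species))}
    with_function = {s for s, _, _ in match(triples, (None, "ex:hasFunction", "ex:" + function))}
    return sorted(in_species & with_function)


triples = to_triples(rows)
print(genes_with(triples, "speciesA", "kinase"))
```

Because every record is reduced to the same triple shape, data from different sources can be pooled into one set and queried together, which is the integration step the slide describes.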
  6. Requirements for data reuse in comparative analyses:
       • Easy access to machine-readable trees, data matrices and metadata (e.g. sample characteristics, including sample locality)
       • A minimum reporting standard for phylogenetic analyses (MIAPA)
       • A controlled vocabulary for describing components of phylogenetic workflows
  7. Bioinformatics and phylogeny 06/24/11
     Proposed components of a minimum reporting standard for phylogenetic analyses: Leebens-Mack et al. 2006, OMICS
  8. Developing an ontology for describing phylogenetic workflows:
       • Catalogue published methods of phylogenetic analysis (,
       • Develop an ontology that accommodates published phylogenetic workflows
       • Evaluate the utility of the ontology for describing published phylogenetic workflows
       • Use the ontology to construct NeXML files with annotated trees and data matrices
       • Elicit feedback from the systematics community
  9. PhyloWays entry:
       • Publication: Soltis DE, Smith SA, Cellinese N, Wurdack KJ, Tank DC, Brockington SF, Refulio-Rodriguez NF, Walker JB, Moore MJ, Carlsward BS, et al. 2011. Angiosperm phylogeny: 17 genes, 640 taxa. Am J Bot 2011: ajb.1000404.
       • Data: concatenated alignments for a superset of 14 loci/17 genes (nucleotide sequences) sampled from 640 species. Genes included 18S rDNA (nuc), 26S rDNA (nuc), atpB (cp), atp1 (mito), matK (cp), matR (mito), nad5 (mito), ndhF (cp), psbBTNH (cp 4 gene region), rbcL (cp), rpoC2 (cp), rps16 (cp), rps3 (mito), and rps4 (cp).
       • Alignment method: MAFFT used to align each of 14 loci; "adjustments were made by eye when there were obvious alignment errors due to particularly divergent or 'gappy' sequences". Sites (columns) with > 50% missing data (including gaps due to indels) were removed using Phyutility (Smith and Dunn, 2008). All or subsets of gene alignments were concatenated for phylogenetic analysis.
       • Tree estimation: ML analyses performed on the following data matrices: nuclear rDNA genes; cp genes; mito genes; nuclear+cp genes; all 17 genes. 10 independent runs for each data matrix. Program: RAxML (vers. 7.1; Stamatakis, 2006).
       • Model of sequence evolution: GTRGAMMA with parameters estimated separately (unlinked) for each gene partition.
       • Method for evaluating support: 100-300 bootstrap replicates
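The site-filtering step in the alignment method above (dropping columns with more than 50% missing data) can be sketched as follows. This is not Phyutility's implementation; it is a minimal illustration that assumes an alignment held as a dict of equal-length strings and treats `-` and `?` as missing. The taxon names and sequences are made up.

```python
# Assumption: gaps ('-') and unknowns ('?') both count as missing data.
MISSING = set("-?")


def drop_sparse_columns(alignment, max_missing=0.5):
    """Remove columns where more than `max_missing` of the sequences are missing."""
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    if not seqs:
        return {}
    if len({len(seq) for seq in seqs}) > 1:
        raise ValueError("all sequences must have the same aligned length")
    n_seqs = len(seqs)
    # Keep a column only if its missing fraction does not exceed the threshold.
    keep = [
        i for i in range(len(seqs[0]))
        if sum(seq[i] in MISSING for seq in seqs) / n_seqs <= max_missing
    ]
    return {name: "".join(seq[i] for i in keep) for name, seq in zip(names, seqs)}


alignment = {
    "taxon1": "AC-GT",
    "taxon2": "A--GT",
    "taxon3": "AT-G-",
    "taxon4": "A-?GT",
}
print(drop_sparse_columns(alignment))
```

A column with exactly 50% missing data is kept here, matching the "> 50%" wording of the entry; whether a given tool treats the boundary the same way is worth checking.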
  10. Current components of PhylOnt, an ontology for describing phylogenetics workflows:
        • Tree estimation program
        • Method of analysis
            • Construction of data matrix
                • Alignment…
            • Tree estimation
                • Optimality criterion…
                • Branch swapping…
                • Support assessment…
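The outline above can be held as a nested mapping so that each concept can be looked up with its full path. This is only a sketch of the slide's outline, not PhylOnt itself, which would be expressed in an ontology language; the function names are hypothetical, and branches that the slide leaves elided are left empty.

```python
# Concepts exactly as listed on the slide; elided sub-branches stay empty.
phylont_outline = {
    "Tree estimation program": {},
    "Method of analysis": {
        "Construction of data matrix": {
            "Alignment": {},
        },
        "Tree estimation": {
            "Optimality criterion": {},
            "Branch swapping": {},
            "Support assessment": {},
        },
    },
}


def concept_paths(outline, prefix=()):
    """Yield every concept as a root-to-node tuple, in outline order."""
    for name, children in outline.items():
        path = prefix + (name,)
        yield path
        yield from concept_paths(children, path)


def find(outline, name):
    """Return the full path of the first concept called `name`, or None."""
    for path in concept_paths(outline):
        if path[-1] == name:
            return path
    return None


for path in concept_paths(phylont_outline):
    print(" > ".join(path))
```

Full paths make it possible to tell apart concepts that share a name at different levels, which is one reason a controlled vocabulary is organised as a hierarchy.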
  11. Tree estimation program ontology
  12. Data analysis ontology diagram
  13. Models for character state transitions (e.g. nucleotide substitution model)
  14. Next steps:
        • Complete PhylOnt
        • Develop NeXML file builder that uses PhylOnt concepts
        • Formalize Minimum Information about Phylogenetic Analyses (MIAPA) reporting standard
        • Evaluate and refine PhylOnt for construction of MIAPA-compliant NeXML files