
Connecting the semantics of evolutionary morphology to comparative phylogenetics

Presentation of the software package RPhenoscape for R, the platform for statistical computing. The package bridges the ecosystem of R packages for comparative phylogenetics and the data content and computational semantics services provided by the API of the Phenoscape Knowledgebase. Presented at the 2016 Evolution Meetings in Austin, TX.

  1. RPhenoscape: Connecting the semantics of evolutionary morphology to comparative phylogenetics. Hilmar Lapp (Duke University), Hong Xu (Duke University), Jim Balhoff (RTI, Inc.). Evolution Meetings 2016, Austin, TX
  2. RPhenoscape: a package for accessing the Phenoscape Knowledgebase from within R programs, providing programmatic access to evolutionary character data with computable semantics and to machine reasoning with computable phenotype data.
  3. R features a rich ecosystem for comparative phylogenetics: the CRAN Task View on Phylogenetics and Comparative Methods listed 76 packages at last count.
  4. Comparative analysis needs comparative data. Magee et al. (2014), PLOS ONE; see also Drew et al. (2013), PLOS Biology; Stoltzfus et al. (2012), BMC Research Notes.
  5. The lack of reusable digital data is amplified for morphology. "Which matrix are we criticizing? RC07 published their character list (their appendix 2) and their matrix (appendix 3). We also have a hitherto unpublished NEXUS file (presented here as part of Data S3), most likely sent by [M.R.] to [D.G.] in late 2007 or early 2008, which purports to contain the same matrix. Surprisingly, the character list in the paper and that in the file do not agree on the identities of characters 132–134." Marjanović and Laurin (2015), Reevaluation of the largest published morphological data matrix for phylogenetic analysis of Paleozoic limbed vertebrates. PeerJ Preprints 3:e1596v1.
  6. Morphology is more complex than discrete, disjoint, independent characters.
  7. We know more about morphology than authors state.
  8. Implied knowledge can be substantial. [Figure: digit presence/absence across Sarcopterygii; legend: asserted, inferred, missing.]
  9. Make morphology computable, discoverable, and linked to genetic data.
  10. Phenoscape Knowledgebase: 4,399 taxa (vertebrates); 139 publications (matrices); 19,024 character states; 651,660 phenotype annotations. [Diagram: morphological matrices; annotation with ontologies (anatomy, quality, taxonomy) in Phenex software (Balhoff et al., 2010); Phenoscape Knowledgebase; machine reasoner (OWL).]
  11. KB interface for humans
  12. KB interface for machines
  13. KB interface for machines
  14. RPhenoscape: an R package for API access to the Phenoscape Knowledgebase, covering evolutionary character data with computable semantics and machine reasoning with computable phenotype data: supermatrix synthesis; semantics-based character and state filtering; semantic similarity-driven querying and synthesis.
  15. Use-case: Querying studies by morphology and taxonomy

      ```r
      library(rphenoscape)

      slist <- pk_get_study_list(taxon = "Ictaluridae", entity = "pectoral fin")
      slist[, "label"]
      ## Source: local data frame [10 x 1]
      ##
      ##    label
      ##    <chr>
      ## 1  Bockmann, F. A. (1998)
      ## 2  Chen, X. (1994)
      ## 3  De Pinna, M. C., Ferraris, C. J. J., & Vari, R. P. (2007)
      ## 4  Fink, S. V, & Fink, W. L. (1981); Fink, S. V, & Fink, W. L. (1996)
      ## 5  Kailola, P. J. (2004)
      ## 6  Lundberg, J. G. (1992)
      ## 7  Mo, T. (1991)
      ## 8  Royero, R. (1999)
      ## 9  Vigliotta, T. R. (2008)
      ## 10 Wiley, E.O., and Johnson, G.D. (2010)
      ```
  16. Use-case: Querying studies by morphology and taxonomy

      ```r
      nex_list <- pk_get_study_xml(as.matrix(slist[2:3, "id"]))
      ## ....This might take a while....
      ## ...+amblycipitid+catfishes+%28Teleostei%2C+Siluriformes%29+with+species+accounts&btnG=&hl=en&as_sdt=0%2C42
      ## Parse NeXML....
      ## Parse NeXML....
      nex_list[[1]]
      ## A nexml object representing:
      ##   0 phylogenetic tree blocks, where:
      ##   block 1 contains NULL phylogenetic trees
      ##   block 0 contains phylogenetic trees
      ##   155 meta elements
      ##   1 character matrices
      ##   53 taxonomic units
      ## Taxa: Pseudobagarius leucorhynchus, Liobagrus obesus, Hypsidoris farsonensis, Erethistes sp. (Chen 1994), Bunocephalus amaurus, Xyliphius sp. (Chen 1994) ...
      ```
  17. The heavy lifting of NeXML parsing is done by RNeXML.
  18. Use-case: Synthesize a presence-absence matrix

      ```r
      nex <- pk_get_ontotrace_xml(taxon = c("Ictalurus", "Ameiurus"), entity = "fin spine")
      m <- pk_get_ontotrace(nex)
      m[1:10, ]
      ## Source: local data frame [15 x 5]
      ##
      ##    taxa                   otu
      ##    (chr)                  (chr)
      ## 1  Ameiurus brunneus      VTO_0036273
      ## 2  Ameiurus catus         VTO_0036275
      ## 3  Ameiurus melas         VTO_0036272
      ## 4  Ameiurus natalis       VTO_0036274
      ## 5  Ameiurus nebulosus     VTO_0036278
      ## 6  Ameiurus platycephalus VTO_0036276
      ## 7  Ameiurus serracanthus  VTO_0036277
      ## 8  Ictalurus australis    VTO_0061495
      ## 9  Ictalurus balsanus     VTO_0036221
      ## 10 Ictalurus dugesii      VTO_0061497
      ## Variables not shown: otus (chr), anterior dentation of pectoral fin spine
      ##   (int), anterior distal serration of pectoral fin spine (dbl)
      ```
  19. Use-case: Filter a matrix using semantics

      ```r
      # Which taxa in the matrix are descendants of Ictalurus?
      is_desc <- pk_is_descendant("Ictalurus", m$taxa)
      is_desc
      ## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE
      ## [12] TRUE TRUE TRUE TRUE

      # pk_is_ancestor() is also available (as is pk_is_extinct())
      #
      # This is in development for characters, too:
      # pk_is_descendant("jaw skeleton", m$chars,
      #                  relationships = "part of")
      ```
  20. Current limitations: the package is not on CRAN (yet), so it needs to be installed from GitHub using devtools:

      ```r
      library("devtools")
      ```

      Data in Phenoscape are concentrated on vertebrates and on skeletal fin and limb characters. Semantics-driven matrix synthesis is currently limited to presence-absence characters.
  21. Summary: The Phenoscape KB has an API for machine access to computable morphology data and computational semantics services. RPhenoscape is a bridge between this API and the ecosystem of comparative phylogenetics packages in R. It translates between the R user (who works with labels and data matrices) and the Phenoscape KB API (which uses identifiers, ontology terms, NeXML, etc.). Code on GitHub:
  22. Reproducible data integration: the rOpenSci ecosystem
  23. Acknowledgements: U.S. National Science Foundation (DBI-1062404, DBI-1062542); National Evolutionary Synthesis Center (NESCent), NSF #EF-0905606; Phenoscape contributors, Advisory Board, and data sources (see: Acknowledgments); RNeXML developers (C. Boettiger, S. Chamberlain).
  24. Get in touch. Repo: GitHub: Twitter: @hlapp
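The "implied knowledge" slide (slide 8) rests on two inference rules over part-of relations: if a structure is asserted present, every structure it is part of must also be present; if a structure is asserted absent, every structure that is part of it must also be absent. The R sketch below illustrates those two rules on a hypothetical two-link part_of chain. The anatomy terms and the helper names (`wholes_of`, `parts_of`, `infer_presence`) are illustrative assumptions, not RPhenoscape functions; the Knowledgebase performs this kind of inference with an OWL reasoner over its ontologies, not with code like this.

```r
# Hypothetical part_of relations: each row states that `part` is part of `whole`.
# Illustrative terms only; not taken from the Phenoscape Knowledgebase.
part_of <- data.frame(
  part  = c("manual digit", "manus"),
  whole = c("manus", "forelimb"),
  stringsAsFactors = FALSE
)

# All structures that any of `x` is (transitively) part of.
wholes_of <- function(x, rel) {
  direct <- rel$whole[rel$part %in% x]
  if (length(direct) == 0) return(character(0))
  unique(c(direct, wholes_of(direct, rel)))
}

# All structures that are (transitively) part of any of `x`.
parts_of <- function(x, rel) {
  direct <- rel$part[rel$whole %in% x]
  if (length(direct) == 0) return(character(0))
  unique(c(direct, parts_of(direct, rel)))
}

# Presence propagates up the part_of hierarchy; absence propagates down.
# Returns only the newly inferred states, not the asserted ones.
infer_presence <- function(present, absent, rel) {
  list(
    present = setdiff(wholes_of(present, rel), present),
    absent  = setdiff(parts_of(absent, rel), absent)
  )
}

# A manual digit asserted present implies a manus and a forelimb are present.
infer_presence(present = "manual digit", absent = character(0), rel = part_of)$present
# returns c("manus", "forelimb")

# A forelimb asserted absent implies the manus and manual digits are absent.
infer_presence(present = character(0), absent = "forelimb", rel = part_of)$absent
# returns c("manus", "manual digit")
```

The recursion assumes the part_of relation is acyclic, which holds for anatomical partonomy.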
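The semantic filtering slide (slide 19) stops at the logical vector returned by `pk_is_descendant()`; applying it to the matrix is ordinary R row subsetting. To let the sketch run offline, the matrix `m` is replaced by four rows copied from the OntoTrace output on slide 18 (taxa and otu columns only), and `pk_is_descendant()` is replaced by a naive genus-prefix match. That stand-in is an assumption for illustration only: unlike a query against the Knowledgebase's taxonomy ontology, it cannot resolve membership in higher taxa.

```r
# Four rows copied from the OntoTrace output on slide 18 (taxa and otu columns only).
m <- data.frame(
  taxa = c("Ameiurus melas", "Ameiurus natalis", "Ictalurus australis", "Ictalurus balsanus"),
  otu  = c("VTO_0036272", "VTO_0036274", "VTO_0061495", "VTO_0036221"),
  stringsAsFactors = FALSE
)

# Naive offline stand-in for pk_is_descendant("Ictalurus", m$taxa):
# it only checks whether each binomial starts with the genus name.
is_desc <- startsWith(m$taxa, "Ictalurus ")

# Keep the rows for descendants of Ictalurus; drop = FALSE keeps a data frame.
m_ictalurus <- m[is_desc, , drop = FALSE]
m_ictalurus$taxa
# returns c("Ictalurus australis", "Ictalurus balsanus")
```

Because the logical vector has one entry per row of `m`, the same row-subsetting pattern applies to the full matrix.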