Rphenoscape: Connecting the semantics of evolutionary morphology to comparative phylogenetics

Hilmar Lapp (Duke University)
Hong Xu (Duke University)
Jim Balhoff (RTI, Inc.)
Evolution Meetings 2016, Austin, TX

RPhenoscape
• A package for accessing the
Phenoscape Knowledgebase from
within R programs
• Programmatic access to:
• Evolutionary character data with
computable semantics
• Machine-reasoning with
computable phenotype data

R features a rich ecosystem
for comparative phylogenetics
CRAN Task View on
Phylogenetics and
Comparative Methods at last
count lists 76 packages.

Comparative analysis needs comparative data
Magee et al (2014), PLOS ONE
See also Drew et al (2013), PLOS Biology; Stoltzfus et al (2012), BMC Research Notes

The lack of reusable digital data
is amplified for morphology
Which matrix are we criticizing?
RC07 published their character list (their appendix 2) and their
matrix (appendix 3). We also have a hitherto unpublished
NEXUS file (presented here as part of Data S3), most likely
sent by [M.R.] to [D.G.] in late 2007 or early 2008, which
purports to contain the same matrix. Surprisingly, the
character list in the paper and that in the file do not agree
on the identities of characters 132–134.
Marjanović and Laurin (2015) Reevaluation of the largest published morphological data matrix for
phylogenetic analysis of Paleozoic limbed vertebrates. PeerJ Preprints 3:e1596v1

Morphology is more
complex than discrete,
disjoint, independent
characters

We know more
about morphology
than authors state

Implied knowledge can be
substantial
Asserted
Inferred
Missing
Digit presence/absence; Sarcopterygii
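
The distinction between asserted, inferred, and missing data can be illustrated with a toy sketch. It assumes one simple rule, namely that the presence of a part implies the presence of every structure it is part of. The `part_of` hierarchy, the function names `wholes_of` and `presence_status`, and the inputs are all hypothetical and only meant to show the idea; the Phenoscape Knowledgebase itself relies on ontologies and an OWL reasoner rather than on code like this.

```r
# Hypothetical part-of hierarchy: each structure maps to the structure
# it is part of (invented for illustration, not taken from an ontology).
part_of <- c(digit = "autopod", autopod = "limb")

# All structures that `entity` is (transitively) part of.
wholes_of <- function(entity) {
  out <- character(0)
  while (entity %in% names(part_of)) {
    entity <- part_of[[entity]]
    out <- c(out, entity)
  }
  out
}

# Classify the presence of `entity` as "asserted", "inferred" or "missing".
# `asserted_present` lists the structures a (made-up) study states are present.
presence_status <- function(entity, asserted_present) {
  if (entity %in% asserted_present) return("asserted")
  # Presence of a part implies presence of every whole that contains it.
  implied <- unique(unlist(lapply(asserted_present, wholes_of)))
  if (entity %in% implied) return("inferred")
  "missing"
}

status_digit <- presence_status("digit", "digit")
status_limb  <- presence_status("limb", "digit")
status_none  <- presence_status("digit", character(0))
print(c(status_digit, status_limb, status_none))
```

In this sketch a study that only describes digits still yields an inferred presence for the limb, which is the kind of implied knowledge the slide refers to.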

kb.phenoscape.org
Make morphology
computable, discoverable, &
linked to genetic data

Phenoscape Knowledgebase
❖ 4,399 taxa (vertebrates)
❖ 139 publications (matrices)
❖ 19,024 character states
❖ 651,660 phenotype annotations
[Workflow figure: morphological matrices are annotated with ontologies (anatomy, quality, taxonomy) using Phenex software (Balhoff et al., 2010) and feed the Phenoscape Knowledgebase, which uses a machine reasoner (OWL).]

RPhenoscape
An R package for API access to the Phenoscape
Knowledgebase
• Evolutionary character data with computable
semantics
• Machine-reasoning with computable
phenotype data:
• Synthetic supermatrix synthesis
• Semantics-based character and state filtering
• Semantic similarity-driven querying and
synthesis

Use-case: Querying studies by
morphology and taxonomy
> slist <- pk_get_study_list(taxon = "Ictaluridae",
entity = "pectoral fin")
> slist[,"label"]
Source: local data frame [10 x 1]
label
<chr>
1 Bockmann, F. A. (1998)
2 Chen, X. (1994)
3 De Pinna, M. C., Ferraris, C. J. J., & Vari, R. P. (2007)
4 Fink, S. V, & Fink, W. L. (1981); Fink, S. V, & Fink, W. L. (1996)
5 Kailola, P. J. (2004)
6 Lundberg, J. G. (1992)
7 Mo, T. (1991)
8 Royero, R. (1999)
9 Vigliotta, T. R. (2008)
10 Wiley, E.O., and Johnson, G.D. (2010)
>

Use-case: Querying studies by
morphology and taxonomy
> nex_list <- pk_get_study_xml(as.matrix(slist[2:3, c("id")]))
....This might take a while....
https://scholar.google.com/scholar?q=hylogenetic+studies+of+the+amblycipitid+catfishes+%28Teleostei%2C+Siluriformes%29+with+species+accounts&btnG=&hl=en&as_sdt=0%2C42
Parse NeXML....
http://dx.doi.org/10.1111/j.1096-3642.2007.00306.x
Parse NeXML....
> nex_list[[1]]
A nexml object representing:
0 phylogenetic tree blocks, where:
block 1 contains NULL phylogenetic trees
block 0 contains phylogenetic trees
155 meta elements
1 character matrices
53 taxonomic units
Taxa: Pseudobagarius leucorhynchus, Liobagrus obesus,
Hypsidoris farsonensis, Erethistes sp. (Chen 1994), Bunocephalus
amaurus, Xyliphius sp. (Chen 1994) ...

Heavy-lifting of
NeXML parsing
is done by
RNeXML

Use-case: Synthesize presence-absence matrix
> nex <- pk_get_ontotrace_xml(taxon = c("Ictalurus",
"Ameiurus"), entity = "fin spine")
> m <- pk_get_ontotrace(nex)
> m[1:10,]
## Source: local data frame [15 x 5]
##
## taxa otu
## (chr) (chr)
## 1 Ameiurus brunneus VTO_0036273
## 2 Ameiurus catus VTO_0036275
## 3 Ameiurus melas VTO_0036272
## 4 Ameiurus natalis VTO_0036274
## 5 Ameiurus nebulosus VTO_0036278
## 6 Ameiurus platycephalus VTO_0036276
## 7 Ameiurus serracanthus VTO_0036277
## 8 Ictalurus australis VTO_0061495
## 9 Ictalurus balsanus VTO_0036221
## 10 Ictalurus dugesii VTO_0061497
## Variables not shown: otus (chr), anterior dentation of pectoral fin spine
## (int), anterior distal serration of pectoral fin spine (dbl)

Use-case: Filter matrix using
semantics
> is_desc <- pk_is_descendant('Ictalurus', m$taxa)
> is_desc
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE
## [12] TRUE TRUE TRUE TRUE
# pk_is_ancestor() also available (and pk_is_extinct())
#
# This is in development for characters, too:
# pk_is_descendant('jaw skeleton', m$chars,
#                  relationships = 'part of')
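
As a minimal sketch of how such a logical vector could then be used to filter the matrix, the base R example below subsets a small stand-in data frame. It runs without the package or a network connection: the taxon names come from the slides, while the `fin_spine` column, its values, and the hard-coded `is_desc` vector are assumptions standing in for real `pk_get_ontotrace()` and `pk_is_descendant()` results.

```r
# Toy stand-in for the ontotrace matrix `m`; the character column
# `fin_spine` and its values are invented for illustration.
m <- data.frame(
  taxa = c("Ameiurus brunneus", "Ameiurus catus",
           "Ictalurus australis", "Ictalurus balsanus"),
  fin_spine = c(1L, 1L, 1L, NA),
  stringsAsFactors = FALSE
)

# Stand-in for the result of pk_is_descendant('Ictalurus', m$taxa):
# one logical value per row of `m`.
is_desc <- c(FALSE, FALSE, TRUE, TRUE)

# Keep only the rows whose taxon is a descendant of Ictalurus.
m_ictalurus <- m[is_desc, , drop = FALSE]
print(m_ictalurus$taxa)
```

Because the vector is aligned with the rows of the matrix, ordinary logical indexing is enough; no package-specific filtering function is needed for this step.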

Current limitations
Package is not on CRAN (yet); it needs to be installed from
Github:
library("devtools")
install_github("xu-hong/rphenoscape")
Data in Phenoscape are concentrated on vertebrates, and
skeletal fin-limb characters.
Semantics-driven matrix synthesis is currently limited to
presence-absence characters.

Summary
Phenoscape KB has an API for machine access to
computable morphology data and computational
semantics services.
RPhenoscape is a bridge between this API and the
ecosystem of comparative phylogenetics packages in R.
Translates between R user (who uses labels, data
matrices) and Phenoscape KB API (which uses identifiers,
ontology terms, NeXML, etc).
Code on Github: https://github.com/xu-hong/rphenoscape
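
The translation between labels and identifiers mentioned in the summary can be pictured with a small lookup table. This is only a sketch: the three taxon labels and VTO identifiers are copied from the ontotrace output earlier in the deck, while the helper names `label_to_id` and `id_to_label` are hypothetical and are not functions of the package, which resolves terms through the Phenoscape KB API.

```r
# Label-to-identifier table built from three rows of the ontotrace output.
otu_ids <- c(
  "Ameiurus brunneus" = "VTO_0036273",
  "Ameiurus catus"    = "VTO_0036275",
  "Ameiurus melas"    = "VTO_0036272"
)

# Hypothetical helper: labels (what the R user types) to identifiers.
label_to_id <- function(labels) unname(otu_ids[labels])

# Hypothetical helper: identifiers (what the API returns) back to labels.
id_to_label <- function(ids) names(otu_ids)[match(ids, otu_ids)]

print(label_to_id(c("Ameiurus catus", "Ameiurus melas")))
print(id_to_label("VTO_0036273"))
```

Labels that are not in the table come back as `NA`, which mirrors the general need to handle terms that cannot be resolved.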

Reproducible data integration:
The rOpenSci ecosystem

Acknowledgements
U.S. National Science Foundation
DBI-1062404, DBI-1062542
National Evolutionary Synthesis Center
(NESCent), NSF #EF-0905606
Phenoscape contributors, Advisory Board, Data
sources (see: http://phenoscape.org/wiki/
Acknowledgments)
RNeXML developers (C. Boettiger, S. Chamberlain) 
http://github.com/rOpenSci/RNeXML

Get in touch
Repo: github.com/xu-hong/rphenoscape
Github: github.com/hlapp
Twitter: @hlapp
