Semantic tools for aggregation of morphological characters across studies
James Balhoff, Alex Dececchi, Paula Mabee,
Hilmar Lapp, & Phenoscape team
2. Rich body of morphological observations – mostly locked up
Zebrafish Model of Human Ectodermal Dysplasia
Figure 2. The dominant gene Nkt is phenotypically similar, however complements fls mutants. Nkt homozygotes show complete loss of scales, teeth and gill rakers resembling the fls phenotype (A–C). Heterozygous Nkt zebrafish show an intermediate phenotype of scale loss and patterning defect (arrows) while no effect on fin development is seen (D). Heterozygous Nkt also show a dominant effect on the number of teeth (arrows, E) and gill rakers (F), showing deficiencies along the posterior branchial arches and formation of rudimentary rakers along ceratobranchial 1 and 2 (arrows, F). Cb1-5, ceratobranchial bones.
doi:10.1371/journal.pgen.1000206.g002
Table 1. Quantitative effect of fls on scale number and shape and the effect of background modifiers in Danio rerio strains on fls^dt3Tpl.
"... and a cytoplasmic terminal death domain essential for protein interactions with signaling adaptor complexes. The fls^te370f mutation is an A to T transversion at a splice acceptor site, ..."
3. Free text is a barrier to machine-based integration
Phylogenetic systematics: Lundberg & Akama 2005
Human genetics: OMIM query (http://www.ncbi.nlm.nih.gov/omim)

OMIM query                  # of records
“large bone”                1083
“enlarged bone”             224
“big bones”                 21
“huge bones”                4
“massive bones”             41
“hyperplastic bones”        12
“hyperplastic bone”         45
“bone hyperplasia”          181
“increased bone growth”     879
7. How it works: shared ontologies, rich semantics, OWL reasoning
8. Phenoscape KB content
16,000 character states from >120 comparative
morphological datasets, linked to 4,000 vertebrate
taxa.
Imported genetic phenotype and expression data
from ZFIN, Xenbase, MGI, and Human Phenotype
project.
Shared semantics: Uberon (anatomy), PATO
(phenotypic qualities), Entity–Quality (EQ) OWL
axioms (phenotype observations)
Plus a dozen other ontologies ...
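An Entity–Quality annotation can be pictured as a small record that pairs an anatomy term with a phenotypic quality. The sketch below is only an illustration: the `EQ` class, the `inheres_in` property name, and the example labels are assumptions chosen for readability, not the Phenoscape KB's actual data model or identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EQ:
    """Toy Entity-Quality annotation (hypothetical structure, labels instead of IRIs)."""
    entity: str   # anatomy term, e.g. from Uberon
    quality: str  # phenotypic quality, e.g. from PATO

    def to_manchester(self) -> str:
        # Render as an OWL class expression in Manchester-like syntax.
        # 'inheres_in' is an assumed property name used for illustration.
        return f"{self.quality} and (inheres_in some {self.entity})"


annotation = EQ(entity="humerus", quality="elongated")
print(annotation.to_manchester())
```

Keeping entity and quality as separate fields is what later allows grouping along either axis.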
9. Integrative querying with the Phenoscape KB: scale, absent
Ictalurus punctatus: “body: naked” (Kailola, P. J. 2004. A phylogenetic exploration of the catfish family Ariidae (Otophysi; Siluriformes). The Beagle, Records of the Museums and Art Galleries of the Northern Territory 20:87-166)
eda gene in Danio rerio: eda^dt3S243X/dt3S243X (Harris, M.P., Rohner, N., Schwarz, H., Perathoner, S., Konstantinidis, P., and Nüsslein-Volhard, C. 2008. Zebrafish eda and edar mutants reveal conserved and ancestral roles of ectodysplasin signaling in vertebrates. PLoS Genetics 4(10):e1000206)
10. Integrating phylogenetic studies
Can we use reasoning to integrate character
matrices across studies?
This would make the wealth of single-study character analysis methods applicable to any integrated matrix,
including tree-based comparative phylogenetic methods.
11. Evolution of Sarcopterygian Limb/Fin
Combined matrix of any character states related to
presence/absence of limb/fin structures from
studies in Phenoscape KB
Clack, J. A. (2009). The Fin to Limb Transition: New Data, Interpretations, and Hypotheses from Paleontology and Developmental Biology. Annual
Review of Earth and Planetary Sciences, 37(1), 163-179
12. EQ supermatrix synthesis: workflow
1. Use OWL reasoner to group character states by
anatomy and quality axes, based on EQ annotations.
2. Export groupings as character matrix, with taxon
assignments to states from original data.
3. Supplement presence/absence character state
assertions with reasoner-inferred information.
4. Use Phenex data editor to manually consolidate
character states where appropriate
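Steps 1 and 2 of the workflow can be sketched in a few lines. This is a simplified stand-in, not the project's tooling: the grouping here is a plain dictionary lookup on (entity, quality attribute) pairs, whereas the workflow above relies on an OWL reasoner. All study names, taxa, and states are made-up toy values.

```python
from collections import defaultdict

# Toy annotated character states (all values hypothetical):
# (study, taxon, anatomical entity, quality attribute, state)
annotated_states = [
    ("study A", "taxon 1", "humerus", "shape", "slender"),
    ("study B", "taxon 1", "humerus", "shape", "slender"),
    ("study B", "taxon 2", "humerus", "shape", "robust"),
    ("study A", "taxon 2", "pelvic fin", "presence", "present"),
]


def synthesize_matrix(states):
    """Group states into characters keyed by (entity, attribute).

    Returns (characters, matrix), where characters maps each character to
    its set of states and matrix[taxon][character] is the set of states
    assigned to that taxon across all studies.
    """
    characters = defaultdict(set)
    matrix = defaultdict(lambda: defaultdict(set))
    for _study, taxon, entity, attribute, state in states:
        character = (entity, attribute)
        characters[character].add(state)
        matrix[taxon][character].add(state)
    return characters, matrix


characters, matrix = synthesize_matrix(annotated_states)
for character in sorted(characters):
    print(character, sorted(characters[character]))
```

Cells are kept as sets so that a taxon scored differently by two studies retains both states for later manual consolidation.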
13. EQ supermatrix synthesis: Results
Synthesized limb/fin character matrix
1055 Sarcopterygian taxa
494 characters
2-7 states per character
from 55 original studies
Developed several tools for automated character
matrix synthesis to make this happen.
14. Technology stack
Ontologies and phenotype observation data in
OWL
ELK, an OWL-EL reasoner
OWL-DL reasoners are too slow for this
OWL API (Java), programmed primarily using
Scala
Bigdata™ RDF triplestore (~ 25 million triples)
15. Using reasoning to group character states
For every pair of anatomical term X and quality
attribute Y, generate a “character expression” OWL
class: (involves some X and involves some Y)
Done programmatically via property chain axioms
and OWL reasoning (ELK)
Classify character states to most relevant character
expression
Done by OWL reasoner (ELK)
Inferred relationships materialized to triple store
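The pairing of anatomical terms with quality attributes can be illustrated as follows. This sketch only produces the class expressions as strings; the term lists are hypothetical, and the classification of character states under these expressions (done by ELK in the slides) is not shown.

```python
from itertools import product


def character_expressions(anatomy_terms, quality_attributes):
    """Build one 'character expression' string per (X, Y) pair."""
    return {
        (x, y): f"(involves some {x}) and (involves some {y})"
        for x, y in product(anatomy_terms, quality_attributes)
    }


# Hypothetical term labels, used only for illustration.
expressions = character_expressions(["humerus", "forelimb"], ["shape", "size"])
for pair in sorted(expressions):
    print(pair, "->", expressions[pair])
```

The number of expressions grows with the product of the two term lists, which is one reason the generation is done programmatically.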
16. Challenge: scalable reasoning
Anatomy ontologies and EQ annotations employ rich OWL semantics → best used with a DL reasoner
Classifying and querying over a large dataset (~25 million RDF triples) does not scale well with DL reasoners
Presently, the only feasible OWL reasoner is ELK:
constrained to the OWL EL profile → limits the kinds of expressions we can use
best performance over class axioms only → data must be modeled so as to avoid the need for classifying instances
17. Challenge: Querying complex expressions
Want to allow arbitrary selection of structures of
interest, using rich semantics:
(part_of some (limb/fin or girdle skeleton)) or
(connected_to some girdle skeleton)
RDF triplestores provide very limited reasoning
expressivity, and scale poorly with large ontologies.
However, ELK can answer class expression queries
within seconds.
18. Instead of something like this (*):
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ao: <http://purl.obolibrary.org/obo/my-anatomy-ontology/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT DISTINCT ?gene
WHERE
{
  ?gene ao:expressed_in ?structure .
  ?structure rdf:type ?structure_class .
  # Triple pattern selecting structure:
  ?structure_class rdfs:subClassOf ao:muscle .
  ?structure_class rdfs:subClassOf ?restriction .
  ?restriction owl:onProperty ao:part_of .
  ?restriction owl:someValuesFrom ao:head .
}
We would really like to do this:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ao: <http://purl.obolibrary.org/obo/my-anatomy-ontology/>
PREFIX ow: <http://purl.org/phenoscape/owlet/syntax#>
SELECT DISTINCT ?gene
WHERE
{
  ?gene ao:expressed_in ?structure .
  ?structure rdf:type ?structure_class .
  # Triple pattern containing an OWL expression:
  ?structure_class rdfs:subClassOf "ao:muscle and (ao:part_of some ao:head)"^^ow:omn .
}
19. owlet: SPARQL query expansion with in-memory OWL reasoner
owlet interprets OWL class expressions embedded within SPARQL queries
Uses any OWL API-based reasoner to preprocess the query.
We use ELK, which holds the terminology in memory.
Replaces each OWL expression with a FILTER statement listing the matching terms
https://github.com/phenoscape/owlet
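The idea of query expansion can be sketched with a toy preprocessor. This is not owlet's code or API: the regular expression, the `toy_reasoner` function, and its hard-coded answers are assumptions made for illustration, and a real reasoner such as ELK would compute the matching terms from the ontology.

```python
import re

# Matches: ?var rdfs:subClassOf "<class expression>"^^ow:omn .
OWL_PATTERN = re.compile(
    r'(\?\w+)\s+rdfs:subClassOf\s+"([^"]+)"\^\^ow:omn\s*\.'
)


def toy_reasoner(expression):
    """Stand-in for an OWL reasoner: returns hard-coded subclasses."""
    answers = {
        "ao:muscle and (ao:part_of some ao:head)": [
            "ao:adductor_mandibulae",
            "ao:constrictor_dorsalis",
        ],
    }
    normalized = " ".join(expression.split())  # collapse whitespace
    return answers.get(normalized, [])


def expand_query(sparql, reasoner=toy_reasoner):
    """Replace each embedded OWL expression with a FILTER over named terms."""
    def to_filter(match):
        variable, expression = match.group(1), match.group(2)
        terms = ", ".join(reasoner(expression))
        return f"FILTER({variable} IN ({terms}))"
    return OWL_PATTERN.sub(to_filter, sparql)


query = '''
SELECT DISTINCT ?gene
WHERE {
  ?gene ao:expressed_in ?structure .
  ?structure rdf:type ?structure_class .
  ?structure_class rdfs:subClassOf "ao:muscle and (ao:part_of some ao:head)"^^ow:omn .
}
'''
expanded = expand_query(query)
print(expanded)
```

After expansion the query contains only named terms, so it can be sent to a triplestore that has no OWL reasoning of its own.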
20.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ao: <http://purl.obolibrary.org/obo/my-anatomy-ontology/>
PREFIX ow: <http://purl.org/phenoscape/owlet/syntax#>
SELECT DISTINCT ?gene
WHERE
{
  ?gene ao:expressed_in ?structure .
  ?structure rdf:type ?structure_class .
  # Triple pattern containing an OWL expression:
  ?structure_class rdfs:subClassOf "ao:muscle and (ao:part_of some ao:head)"^^ow:omn .
}
➡︎ owlet ➡︎
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ao: <http://purl.obolibrary.org/obo/my-anatomy-ontology/>
PREFIX ow: <http://purl.org/phenoscape/owlet/syntax#>
SELECT DISTINCT ?gene
WHERE
{
  ?gene ao:expressed_in ?structure .
  ?structure rdf:type ?structure_class .
  # Filter constraining ?structure_class to the terms returned by the OWL query:
  FILTER(?structure_class IN (ao:adductor_mandibulae, ao:constrictor_dorsalis, ...))
}
21. Inferring presence/absence
Character states often do not directly assert, but rather imply, presence or absence.
Most phenotypic descriptions of some feature of a structure imply its presence or absence:
“Humerus slender and elongate: with length more than three
times the diameter of its distal end” → humerus must be
present
Partonomy axioms in the ontology allow inferring
presence or absence:
‘all humerus part_of some forelimb’ → forelimb must be
present if humerus is; humerus must be absent if forelimb is
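The two partonomy rules above (presence propagates from part to whole, absence from whole to part) can be sketched as follows. This is a toy illustration under stated assumptions: the `PART_OF` map is a hypothetical single-parent partonomy written by hand, whereas in the slides these inferences come from ontology axioms and an OWL reasoner.

```python
# Toy partonomy (hypothetical, single parent per structure): part -> whole
PART_OF = {
    "supinator process of humerus": "humerus",
    "humerus": "forelimb",
}


def infer_presence_absence(asserted, part_of=PART_OF):
    """Add implied presence/absence states to a taxon's asserted states.

    asserted maps structure -> "present" or "absent". Asserted values are
    never overwritten; conflicts are not detected in this sketch.
    """
    inferred = dict(asserted)
    # Presence propagates upward: a part can only exist if its whole does.
    for structure, state in asserted.items():
        if state == "present":
            whole = part_of.get(structure)
            while whole is not None:
                inferred.setdefault(whole, "present")
                whole = part_of.get(whole)
    # Absence propagates downward: if any containing whole is absent,
    # the part must be absent too.
    for structure in part_of:
        whole = part_of.get(structure)
        while whole is not None:
            if asserted.get(whole) == "absent":
                inferred.setdefault(structure, "absent")
                break
            whole = part_of.get(whole)
    return inferred


print(infer_presence_absence({"humerus": "present"}))
print(infer_presence_absence({"forelimb": "absent"}))
```

Note that a present humerus says nothing about its own parts, so the supinator process stays unscored in the first call.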
22. Challenge: absence reasoning with OWL EL
Absence is typically modeled using negation → not (has_part some forelimb)
Negation is not part of OWL EL (and thus not supported by the ELK reasoner)
Solution: programmatic assertion of an “absence hierarchy” via classification of negated expressions
Presence hierarchy (subclass → superclass):
A = has_part some forelimb → B = has_part some limb → C = has_part some appendage
Absence hierarchy (reverse):
absentC = not C → absentB = not B → absentA = not A
Requires precomputation, which constrains on-the-fly use
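The reversal of the hierarchy follows from contraposition: if A is a subclass of B, then not B is a subclass of not A. A minimal sketch, assuming the presence hierarchy has already been classified and is available as a list of (subclass, superclass) pairs; the function name and string representation are illustrative only.

```python
# Classified presence expressions as (subclass, superclass) pairs.
PRESENCE_EDGES = [
    ("has_part some forelimb", "has_part some limb"),
    ("has_part some limb", "has_part some appendage"),
]


def absence_edges(presence_edges):
    """Derive the absence hierarchy by reversing each subclass edge.

    For every 'sub SubClassOf sup' this yields
    'not (sup) SubClassOf not (sub)', again as (subclass, superclass).
    """
    return [(f"not ({sup})", f"not ({sub})") for sub, sup in presence_edges]


for sub, sup in absence_edges(PRESENCE_EDGES):
    print(f"{sub}  SubClassOf  {sup}")
```

Because the edges are asserted rather than inferred from negation, an EL reasoner can use them, at the cost of computing them ahead of time.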
25. Result: Reasoning fills in many missing character states
[Figure: Mesquite “birds-eye view” of the matrix, two panels: asserted presence/absence; with inference]
26. Unified matrix enables candidate gene view
Linking evolutionary phenotypes to genes through
ontologies, via Phenoscape KB or similarity
27. Integrated data highlight conflict and gaps
Conflicting interpretations in studies
supinator process of humerus: both absent & present in Strepsodus (Zhu et al. 1999 vs. Ruta 2011)
figure from Parker et al., 2005
Gaps in knowledge
acetabulum present or absent? (character: Acetabulum of pelvic girdle: present/absent)
Same term, different meaning?
Acanthostega: “radials, jointed” (Swartz 2012), but it doesn’t have radials...
Uneven taxon sampling
http://characterdesignnotes.blogspot.com/2011/04/proper-use-of-reference-and-anatomy-in.html
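Once cells from several studies sit in one matrix, conflicts and gaps can be listed mechanically. The sketch below assumes a toy matrix in which each cell holds the set of states gathered from all studies. The Strepsodus cell mirrors the conflict named on the slide; the second taxon and the empty acetabulum cells are invented for illustration.

```python
SUPINATOR = "supinator process of humerus"
ACETABULUM = "acetabulum of pelvic girdle"

# matrix[taxon][character] -> set of states gathered from all studies
matrix = {
    "Strepsodus": {SUPINATOR: {"present", "absent"}},
    "hypothetical taxon": {SUPINATOR: {"present"}},
}


def find_conflicts_and_gaps(matrix, characters):
    """Return (conflicts, gaps) as lists of (taxon, character) pairs.

    A cell with more than one state is reported as a conflict (it could
    also be genuine polymorphism); an empty cell is reported as a gap.
    """
    conflicts, gaps = [], []
    for taxon in sorted(matrix):
        for character in characters:
            states = matrix[taxon].get(character, set())
            if not states:
                gaps.append((taxon, character))
            elif len(states) > 1:
                conflicts.append((taxon, character))
    return conflicts, gaps


conflicts, gaps = find_conflicts_and_gaps(matrix, [SUPINATOR, ACETABULUM])
print("conflicts:", conflicts)
print("gaps:", gaps)
```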
29. Phenoscape project team
National Evolutionary Synthesis Center
(NESCent)
University of Oregon (Zebrafish Information
Network)
Todd Vision (also University of North
Carolina at Chapel Hill)
Monte Westerfield
Hilmar Lapp
Ceri Van Slyke
Jim Balhoff
Cincinnati Children's Hospital (Xenbase)
Prashanti Manda
University of South Dakota
Paula Mabee
David Blackburn
Paul Sereno
Nizar Ibrahim
Mouse Genome Informatics
Terry Hayamizu
Christina James-Zorn
California Academy of Sciences
Alex Dececchi
Judith Blake
Aaron Zorn
Virgilio Ponferrada
Wasila Dahdul
University of Chicago
Yvonne Bradford
University of Arizona
Hong Cui
Oregon Health & Science University
Melissa Haendel
Lawrence Berkeley National Labs
Chris Mungall