6. Linnaean names suffer from
fundamental shortfalls
• Names ≠ Identifiers
• Underlying taxon concepts (= semantics) shift
over time, and are computationally inaccessible
• At least 90-95% of estimated biodiversity doesn’t
have a name, and most will never receive one
• The better the Tree of Life is known, the more
groups of organisms there are without a name
7. The Phyloreferencing Project
• Funded by US National Science Foundation
since 2015
• Goal: Capture and make computable the
semantics of phylogenetic clade definitions for
computational integration of biological data.
• Major expected products:
• Ontology, specification, and tooling for authoring
phyloreferences.
• Ontology of phylogenetic clade definitions
• Online tools for using phyloreferences to retrieve data
phyloref.org
8. Anatomy of a phylogenetic
clade definition
Alligatoroidea =
Alligator mississippiensis and
all crocodylians closer to it than
to Crocodylus niloticus or
Gavialis gangeticus
Crocodylia =
Last common ancestor of
Gavialis gangeticus, Alligator
mississippiensis, and
Crocodylus niloticus and all of
its descendants
Brochu 2003
9. Anatomy of a phylogenetic
clade definition
Alligatoroidea =
Alligator mississippiensis and
all crocodylians closer to it than
to Crocodylus niloticus or
Gavialis gangeticus
Crocodylia =
Last common ancestor of
Gavialis gangeticus, Alligator
mississippiensis, and
Crocodylus niloticus and all of
its descendants
Brochu 2003
“Specifiers”
10. Anatomy of a phylogenetic
clade definition
Alligatoroidea =
Alligator mississippiensis and
all crocodylians closer to it than
to Crocodylus niloticus or
Gavialis gangeticus
Crocodylia =
Last common ancestor of
Gavialis gangeticus, Alligator
mississippiensis, and
Crocodylus niloticus and all of
its descendants
Brochu 2003
Type of definition
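The two parts of a clade definition highlighted above (its specifiers and its type) can be captured in a small record. The following is a minimal sketch, not the project's data model: the class and field names are hypothetical, and the "maximum clade" and "minimum clade" labels are the ones this talk uses for the two forms of definition.

```python
from dataclasses import dataclass
from enum import Enum


class DefinitionType(Enum):
    # Labels as used in this talk; the enum itself is hypothetical.
    MINIMUM_CLADE = "minimum clade definition"
    MAXIMUM_CLADE = "maximum clade definition"


@dataclass(frozen=True)
class CladeDefinition:
    """Hypothetical container for a phylogenetic clade definition."""
    name: str
    kind: DefinitionType
    internal_specifiers: tuple[str, ...]       # taxa the clade must include
    external_specifiers: tuple[str, ...] = ()  # taxa the clade must exclude
    source: str = ""


alligatoroidea = CladeDefinition(
    name="Alligatoroidea",
    kind=DefinitionType.MAXIMUM_CLADE,
    internal_specifiers=("Alligator mississippiensis",),
    external_specifiers=("Crocodylus niloticus", "Gavialis gangeticus"),
    source="Brochu 2003",
)

crocodylia = CladeDefinition(
    name="Crocodylia",
    kind=DefinitionType.MINIMUM_CLADE,
    internal_specifiers=(
        "Gavialis gangeticus",
        "Alligator mississippiensis",
        "Crocodylus niloticus",
    ),
    source="Brochu 2003",
)
```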
14. A phyloreference
Crocodylus niloticus and
all crocodylians
closer to it than to
Alligator mississippiensis
includes_TU some tco:Crocodylus niloticus and
excludes_TU some tco:Alligator mississippiensis
15. Resolving a phyloreference:
axiomatized tree + phyloreference + reasoner
Crocodylus niloticus and
all crocodylians
closer to it than to
Alligator mississippiensis
includes_TU some tco:Crocodylus niloticus and
excludes_TU some tco:Alligator mississippiensis
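To make the resolution step concrete, here is a minimal Python sketch that stands in for the reasoner. It is not how an OWL reasoner works internally. The toy tree and its node names are assumptions made for illustration, and so is the reading of the two properties: a node "includes" a taxonomic unit if that unit is among its descendant leaves, and "excludes" it otherwise.

```python
# Toy tree (an assumption for illustration, not a published topology):
# internal node -> list of children; anything that is not a key is a leaf.
CHILDREN = {
    "root": ["Gavialis gangeticus", "n1"],
    "n1": ["n2", "n3"],
    "n2": ["Alligator mississippiensis", "Caiman crocodilus"],
    "n3": ["Crocodylus niloticus", "Osteolaemus tetraspis"],
}


def leaves_under(node):
    """Return the set of taxonomic units (leaf labels) at or below node."""
    kids = CHILDREN.get(node, [])
    if not kids:
        return {node}
    return set().union(*(leaves_under(kid) for kid in kids))


def all_nodes():
    """Return every node name in the toy tree, internal nodes and leaves."""
    nodes = set(CHILDREN)
    for kids in CHILDREN.values():
        nodes.update(kids)
    return nodes


def resolve(includes, excludes):
    """Nodes whose leaf set contains all `includes` and none of `excludes`."""
    matches = []
    for node in all_nodes():
        tus = leaves_under(node)
        if set(includes) <= tus and tus.isdisjoint(excludes):
            matches.append(node)
    return sorted(matches)


matches = resolve(
    includes=["Crocodylus niloticus"],
    excludes=["Alligator mississippiensis"],
)
# Picking the most inclusive match needs a set-size comparison,
# which this sketch does in plain Python.
most_inclusive = max(matches, key=lambda n: len(leaves_under(n)))
print(matches)         # ['Crocodylus niloticus', 'n3']
print(most_inclusive)  # n3
```

In this sketch the class expression matches every node on the lineage leading to Crocodylus niloticus, and the clade itself is the most inclusive of those matches.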
18. Phyloreferences have major
advantages for data integration
• Unambiguous and computable semantics
• Portable between phylogenetic trees, and hence
competing or evolving phylogenetic hypotheses
• Computationally reproducible
• Can be constructed for any clade, and hence
enables unrestricted communication
19. Challenges with OWL-DL:
No set size or path length
Crocodylus niloticus and
all crocodylians
closer to it than to
Alligator mississippiensis
“Maximum clade definition”
20. Challenges with OWL-DL:
No set size or path length
Last common ancestor of
Alligator mississippiensis
and Crocodylus niloticus
and all of its descendants
“Minimum clade definition”
21. Challenges with OWL-DL:
No set size or path length
Last common ancestor of
Alligator mississippiensis
and Crocodylus niloticus
and all of its descendants
includes_TU some tco:Crocodylus niloticus and
includes_TU some tco:Alligator mississippiensis
24. Challenges with OWL-DL:
No set size or path length
Last common ancestor of
Alligator mississippiensis
and Crocodylus niloticus
and all of its descendants
has_Child some
(includes_TU some tco:Crocodylus niloticus and
excludes_TU some tco:Alligator mississippiensis)
• However, this will fail for the general case:
MaxClade(S1, S2) := S1 ~ S2
Parent(S) := has_Child some S
LCA(S1, S2) := Parent(MaxClade(S1, S2))
• If S2 has_Ancestor some (includes_TU some S1) then
MaxClade(S1, S2) == {}, but LCA should be S1
• I.e., in this approach SN cannot itself be a phyloreference
(LCA of a clade).
25. Challenge: Multiple specifiers
• Naïve recursive approach:
LCA(S1,…,SN) = LCA(LCA(S1,…,SN-1),SN)
• However, as shown for LCA(S1, S2), S1 or S2
cannot itself be an LCA
Last common ancestor
of Gavialis gangeticus,
Alligator mississippiensis,
and Crocodylus niloticus
and all of its descendants
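For contrast, the naïve recursion is straightforward in procedural code, where the result of one LCA step can be fed directly into the next. This is exactly what the OWL encoding above cannot do. The sketch below assumes a toy tree stored as a child-to-parent map; the tree and the function names are illustrative only.

```python
from functools import reduce

# Toy tree as a child -> parent map (an assumption for illustration).
PARENT = {
    "Gavialis gangeticus": "root",
    "n1": "root",
    "n2": "n1",
    "n3": "n1",
    "Alligator mississippiensis": "n2",
    "Caiman crocodilus": "n2",
    "Crocodylus niloticus": "n3",
    "Osteolaemus tetraspis": "n3",
}


def path_to_root(node):
    """Return [node, parent, grandparent, ..., root]."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def lca_pair(a, b):
    """Last common ancestor of two nodes (leaves or internal nodes)."""
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_of_a:
            return node
    raise ValueError("nodes are not in the same tree")


def lca(*specifiers):
    """LCA(S1, ..., SN) = LCA(LCA(S1, ..., SN-1), SN), folded left to right."""
    return reduce(lca_pair, specifiers)


print(lca("Alligator mississippiensis", "Crocodylus niloticus"))  # n1
print(lca("Gavialis gangeticus",
          "Alligator mississippiensis",
          "Crocodylus niloticus"))                                # root
```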
27. Challenge: Multiple specifiers
Last common ancestor
of Gavialis gangeticus,
Alligator mississippiensis,
and Crocodylus niloticus
and all of its descendants
Parent(LCA(Cn, Am) ~ Gg)
28. Challenge: Multiple specifiers
Last common ancestor
of Gavialis gangeticus,
Alligator mississippiensis,
and Crocodylus niloticus
and all of its descendants
Parent(LCA(Cn, Am) ~ Gg) OR
Parent(LCA(Cn, Gg) ~ Am) OR
Parent(LCA(Am, Gg) ~ Cn)
29. Challenge: Multiple specifiers
• The number of binary tree topologies for n
leaves grows very fast, and already exceeds 100 for n = 5
Last common ancestor
of Gavialis gangeticus,
Alligator mississippiensis,
and Crocodylus niloticus
and all of its descendants
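The growth can be checked with a short calculation. The sketch below assumes the slide refers to rooted, fully bifurcating trees with labeled leaves, for which the number of topologies is the double factorial (2n − 3)!! = 1 · 3 · 5 · … · (2n − 3). The function name is illustrative.

```python
def rooted_binary_topologies(n_leaves: int) -> int:
    """Number of rooted binary tree topologies on n labeled leaves: (2n - 3)!!"""
    if n_leaves < 1:
        raise ValueError("a tree needs at least one leaf")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3 (an empty product for n <= 2).
    for k in range(3, 2 * n_leaves - 2, 2):
        count *= k
    return count


for n in (3, 4, 5, 10):
    print(n, rooted_binary_topologies(n))
# 3 3
# 4 15
# 5 105
# 10 34459425
```

Under that assumption, an encoding that enumerates one alternative per topology becomes impractical after only a handful of specifiers.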
30. Challenge: Scalability
• Eventually we want to apply this to very large
trees, with >1M leaf nodes.
• Only OWL-EL reasoners (e.g., ELK) scale well to
this level.
• However, restricting to OWL-EL prevents the use of disjunction.
31. Challenge: Scalability
• As a kludge, we can use multiple equivalence axioms
instead of a disjunction.
• However, this in essence makes false assertions.
• Can result in unexpected subclass inferences for
other phyloreferences.
equivalentClass: Parent(LCA(Cn, Am) ~ Gg)
equivalentClass: Parent(LCA(Cn, Gg) ~ Am)
equivalentClass: Parent(LCA(Am, Gg) ~ Cn)
32. Challenge: “Qualifiers”
• “Qualifiers” are specifiers that are required to be
included or excluded by the clade.
• “Kill switches”: tests that do not alter the
semantics of the clade definition, but render it
invalid if any of the tests fails
• External qualifiers as property chain:
has_Ancestor o excludes_TU -> excludes_qualifying_TU
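The effect of this property chain can be sketched as a composition of two relations: whenever x has_Ancestor y and y excludes_TU z, infer x excludes_qualifying_TU z. The Python below is a minimal illustration. The function name and the example assertions are assumptions, not output from an actual ontology.

```python
def apply_property_chain(has_ancestor, excludes_tu):
    """Compose has_Ancestor with excludes_TU.

    Both arguments are sets of (subject, object) pairs. The result is the set
    of inferred (node, taxonomic unit) pairs for excludes_qualifying_TU.
    """
    inferred = set()
    for node, ancestor in has_ancestor:
        for subject, tu in excludes_tu:
            if subject == ancestor:
                inferred.add((node, tu))
    return inferred


# Hypothetical assertions on a toy tree: n3 sits below n1, which sits below root.
has_ancestor = {("n3", "n1"), ("n3", "root"), ("n1", "root")}
excludes_tu = {("n1", "Gavialis gangeticus")}

print(apply_property_chain(has_ancestor, excludes_tu))
# {('n3', 'Gavialis gangeticus')}
```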
33. Summary
• Reproducible large-scale comparative biology
requires taxon concepts with fully computable
semantics
• Phylogenetic clade definitions have well-defined
semantics in the form of necessary and sufficient
conditions for clade membership
• Clade semantics are well expressible in OWL, but
OWL lacks constructs needed for inferring last
common ancestors generically and scalably.
34. Acknowledgements
• Nico Cellinese, Gaurav Vaidya, Anna
Becker (U. Florida)
• Pascal Hitzler and DaSe Lab alumni &
collaborators (see Carral et al. WOP
2017, arXiv:1710.05096)
• Funded by the US National Science
Foundation (DBI-1458484, DBI-1458604)
35. How to find us
• Web: http://phyloref.org (includes link to full
grant proposal)
• GitHub: http://github.com/phyloref