Of Trees and Owl: 
The challenges of reasoning over the semantics of shared descent

  1. Of Trees and Owl: The challenges of reasoning over the semantics of shared descent
     Hilmar Lapp, Duke University. US2TS 2019, Durham, NC
  2. Image: https://commons.wikimedia.org/wiki/File:Dobzhansky_Evolution_Notre_Dame.jpg
  3-4. Phylogenetic trees express hypotheses of common descent (Brochu 2003)
  5. Most data use Linnaean taxonomy
  6. Linnaean names suffer from fundamental shortfalls
     • Names ≠ identifiers
     • The underlying taxon concepts (= semantics) shift over time and are computationally inaccessible
     • At least 90-95% of estimated biodiversity has no name, and most of it never will
     • The better the Tree of Life is known, the more groups of organisms are recognized that have no name
  7. The Phyloreferencing Project
     • Funded by the US National Science Foundation since 2015
     • Goal: capture the semantics of phylogenetic clade definitions and make them computable, for computational integration of biological data
     • Major expected products:
        • Ontology, specification, and tooling for authoring phyloreferences
        • Ontology of phylogenetic clade definitions
        • Online tools for using phyloreferences to retrieve data
     phyloref.org
  8-12. Anatomy of a phylogenetic clade definition (Brochu 2003)
     • Alligatoroidea = Alligator mississippiensis and all crocodylians closer to it than to Crocodylus niloticus or Gavialis gangeticus
     • Crocodylia = Last common ancestor of Gavialis gangeticus, Alligator mississippiensis, and Crocodylus niloticus and all of its descendants
     • The species named in each definition are its “specifiers”; the wording around them (“all ... closer to it than to ...” versus “last common ancestor of ... and all of its descendants”) determines the type of definition
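To make the two definition types concrete, the following Python sketch resolves a node-based (minimum-clade) definition and a branch-based (maximum-clade) definition directly on a tree, without OWL. The tree is a hypothetical toy topology, not Brochu's; tips are abbreviated Am (Alligator mississippiensis), Cn (Crocodylus niloticus), and Gg (Gavialis gangeticus), while X1, X2, and n1-n3 are made-up placeholder tips and internal nodes.

```python
# Hypothetical toy tree (not Brochu's topology), stored as child -> parent.
PARENT = {
    "Gg": "root", "n1": "root",
    "n2": "n1", "n3": "n1",
    "Am": "n2", "X1": "n2",
    "Cn": "n3", "X2": "n3",
}
NODES = set(PARENT) | set(PARENT.values())


def ancestors(node):
    """Return [node, parent, grandparent, ..., root]."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def clade(node):
    """All nodes in the clade rooted at `node`, including `node` itself."""
    return {n for n in NODES if node in ancestors(n)}


def node_based(*specifiers):
    """Minimum-clade definition: the LCA of the specifiers and all its descendants."""
    common = set.intersection(*(set(ancestors(s)) for s in specifiers))
    # Walking up from any specifier, the first common ancestor is the LCA.
    lca = next(a for a in ancestors(specifiers[0]) if a in common)
    return lca, clade(lca)


def branch_based(internal, *external):
    """Maximum-clade definition: the largest clade that contains `internal`
    but none of the `external` specifiers."""
    best = None
    for a in ancestors(internal):
        if any(e in clade(a) for e in external):
            break
        best = a
    return best, (clade(best) if best else set())


print(node_based("Gg", "Am", "Cn")[0])            # root (Crocodylia)
print(sorted(branch_based("Am", "Cn", "Gg")[1]))  # ['Am', 'X1', 'n2'] (Alligatoroidea)
```

The node-based definition walks down from nothing but the specifiers' shared ancestry, whereas the branch-based one walks up from the internal specifier until it would swallow an external one.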
  13. Axiomatization of a tree
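One way to picture the axiomatization is to emit, for each node, an OWL individual with its has_Child facts and includes_TU/excludes_TU type restrictions. The sketch below prints Manchester-syntax individual frames for a hypothetical three-taxon tree (tips abbreviated Am, Cn, Gg as in the slides). It assumes, hypothetically, that every node is asserted to include each tip beneath it and to exclude every other tip; the project's actual ontology may differ, and prefix declarations are omitted, so the tco: names are placeholders.

```python
# Hypothetical three-taxon tree, stored as parent -> children.
CHILDREN = {"root": ["Gg", "n1"], "n1": ["Am", "Cn"]}


def tips_below(node):
    """Set of tips (TUs) in the clade rooted at `node`."""
    kids = CHILDREN.get(node, [])
    if not kids:
        return {node}
    return set().union(*(tips_below(k) for k in kids))


def axiomatize(root="root"):
    """Emit one Manchester-syntax individual frame per node (sketch only)."""
    all_tips = tips_below(root)
    frames, stack = [], [root]
    while stack:
        node = stack.pop()
        below = tips_below(node)
        # Assumption: both inclusion and exclusion are asserted for every tip.
        types = [f"includes_TU some tco:{t}" for t in sorted(below)]
        types += [f"excludes_TU some tco:{t}" for t in sorted(all_tips - below)]
        lines = [f"Individual: {node}", "    Types: " + ", ".join(types)]
        kids = CHILDREN.get(node, [])
        if kids:
            lines.append("    Facts: " + ", ".join(f"has_Child {k}" for k in kids))
        frames.append("\n".join(lines))
        stack.extend(kids)
    return "\n\n".join(frames)


print(axiomatize())
```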
  14. A phyloreference: Crocodylus niloticus and all crocodylians closer to it than to Alligator mississippiensis
     includes_TU some tco:Crocodylus niloticus and excludes_TU some tco:Alligator mississippiensis
  15-17. Resolving a phyloreference: axiomatized tree + phyloreference + reasoner
     Crocodylus niloticus and all crocodylians closer to it than to Alligator mississippiensis
     includes_TU some tco:Crocodylus niloticus and excludes_TU some tco:Alligator mississippiensis
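A reasoner classifies each node of the axiomatized tree against the phyloreference's class expression. The sketch below imitates that step with plain set evaluation over hypothetical node facts (tree (Gg, (Am, (Cn, X))), with made-up tip X and nodes n1, n2); it is not an OWL reasoner. Because OWL reasoning is open-world, exclusions would have to be asserted explicitly; here they are computed from the complement as a stand-in for those assertions.

```python
# Hypothetical node facts: for every node, the set of tips (TUs) it includes.
INCLUDES = {
    "root": {"Gg", "Am", "Cn", "X"},
    "n1": {"Am", "Cn", "X"},
    "n2": {"Cn", "X"},
    "Gg": {"Gg"}, "Am": {"Am"}, "Cn": {"Cn"}, "X": {"X"},
}
ALL_TIPS = INCLUDES["root"]


def holds(node, expr):
    """Evaluate a tiny class-expression AST against one node."""
    op = expr[0]
    if op == "includes_TU":
        return expr[1] in INCLUDES[node]
    if op == "excludes_TU":
        # Stand-in for explicit excludes_TU assertions on the axiomatized tree.
        return expr[1] in ALL_TIPS - INCLUDES[node]
    if op == "and":
        return all(holds(node, sub) for sub in expr[1:])
    raise ValueError(f"unsupported operator: {op}")


# "Crocodylus niloticus and all crocodylians closer to it than to
#  Alligator mississippiensis"
PHYLOREF = ("and", ("includes_TU", "Cn"), ("excludes_TU", "Am"))

members = sorted(n for n in INCLUDES if holds(n, PHYLOREF))
print(members)  # ['Cn', 'n2']
```

On this tree the members are the tip Cn and node n2; the most inclusive of them, n2, roots the intended clade.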
  18. Phyloreferences have major advantages for data integration
     • Unambiguous and computable semantics
     • Portable between phylogenetic trees, and hence between competing or evolving phylogenetic hypotheses
     • Computationally reproducible
     • Can be constructed for any clade, and hence allow communication about any group of organisms, named or not
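Portability means the same definition can be applied unchanged to any tree that contains its specifiers. The sketch below resolves "Crocodylus niloticus and all crocodylians closer to it than to Alligator mississippiensis" (abbreviated Cn and Am) on two hypothetical, competing three-taxon topologies; whether Gg falls inside the resulting clade depends on the hypothesis, not on the definition.

```python
# Two hypothetical competing topologies for the same three taxa, as nested
# tuples: internal nodes are tuples, tips are strings.
TREE_1 = ("Gg", ("Am", "Cn"))
TREE_2 = ("Am", ("Gg", "Cn"))


def tips(tree):
    """Set of tip labels under `tree`."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(tips(child) for child in tree))


def clades(tree):
    """Yield the tip set of every clade in the tree, root first."""
    yield tips(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def resolve(tree, include, exclude):
    """Largest clade that includes `include` and excludes `exclude`."""
    matches = [c for c in clades(tree) if include in c and exclude not in c]
    return max(matches, key=len) if matches else set()


for tree in (TREE_1, TREE_2):
    print(tree, "->", sorted(resolve(tree, include="Cn", exclude="Am")))
```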
  19. Challenges with OWL-DL: No set size or path length
     Crocodylus niloticus and all crocodylians closer to it than to Alligator mississippiensis (“maximum clade definition”)
  20. Challenges with OWL-DL: No set size or path length
     Last common ancestor of Alligator mississippiensis and Crocodylus niloticus and all of its descendants (“minimum clade definition”)
  21-22. Challenges with OWL-DL: No set size or path length
     Last common ancestor of Alligator mississippiensis and Crocodylus niloticus and all of its descendants
     includes_TU some tco:Crocodylus niloticus and includes_TU some tco:Alligator mississippiensis
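The conjunction on this slide matches every node that includes both specifiers, that is, the last common ancestor and all of its ancestors. The sketch below (hypothetical tree (Gg, (X, (Am, Cn))) with made-up names X, n1, n2) shows this, then picks the LCA as the matching node with the fewest tips; that last step relies on set size, which an OWL class expression cannot state.

```python
# Hypothetical tree (Gg, (X, (Am, Cn))): node -> set of tips it includes.
INCLUDES = {
    "root": {"Gg", "X", "Am", "Cn"},
    "n1": {"X", "Am", "Cn"},
    "n2": {"Am", "Cn"},
    "Gg": {"Gg"}, "X": {"X"}, "Am": {"Am"}, "Cn": {"Cn"},
}

# includes_TU some tco:Cn and includes_TU some tco:Am
both = [n for n, tus in INCLUDES.items() if {"Am", "Cn"} <= tus]
print(sorted(both))  # ['n1', 'n2', 'root']: the LCA plus all of its ancestors

# Picking the LCA needs a notion of "smallest", here the number of tips.
lca = min(both, key=lambda n: len(INCLUDES[n]))
print(lca)  # n2
```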
  23. Challenges with OWL-DL: No set size or path length
     Last common ancestor of Alligator mississippiensis and Crocodylus niloticus and all of its descendants
     has_Child some (includes_TU some tco:Crocodylus niloticus and excludes_TU some tco:Alligator mississippiensis)
  24. Challenges with OWL-DL: No set size or path length
     Last common ancestor of Alligator mississippiensis and Crocodylus niloticus and all of its descendants
     has_Child some (includes_TU some tco:Crocodylus niloticus and excludes_TU some tco:Alligator mississippiensis)
     • However, this will fail for the general case:
         MaxClade(S1, S2) := S1 ~ S2   (i.e., includes_TU some S1 and excludes_TU some S2)
         Parent(S) := has_Child some S
         LCA(S1, S2) := Parent(MaxClade(S1, S2))
     • If S2 has_Ancestor some (includes_TU some S1), then MaxClade(S1, S2) == {}, but the LCA should be S1.
     • I.e., in this approach a specifier SN cannot itself be a phyloreference (the LCA of a clade).
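The failure case can be reproduced with plain sets. The sketch below implements the slide's MaxClade and Parent definitions on a hypothetical three-taxon tree (Gg, (Am, Cn)). It treats a specifier that is itself a node (for example, the result of another phyloreference) as included by that node and its ancestors; this treatment is an assumption of the sketch. For the tip specifiers Cn and Am the construction yields their LCA n1, but with S1 = n1 and S2 = Cn it yields the empty set, although the LCA should be n1.

```python
# Hypothetical three-taxon tree (Gg, (Am, Cn)), stored as child -> parent.
PARENT = {"Gg": "root", "n1": "root", "Am": "n1", "Cn": "n1"}
NODES = set(PARENT) | set(PARENT.values())


def ancestors_or_self(node):
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def includes(node, spec):
    """Assumption: a node includes a specifier node if it is that node or an ancestor."""
    return node in ancestors_or_self(spec)


def max_clade(s1, s2):
    """S1 ~ S2: nodes that include s1 and exclude s2."""
    return {n for n in NODES if includes(n, s1) and not includes(n, s2)}


def parent_of(nodes):
    """Parent(S) := has_Child some S."""
    return {PARENT[n] for n in nodes if n in PARENT}


def true_lca(s1, s2):
    """Reference answer computed directly on the tree."""
    common = set(ancestors_or_self(s2))
    return next(a for a in ancestors_or_self(s1) if a in common)


print(parent_of(max_clade("Cn", "Am")), true_lca("Cn", "Am"))  # {'n1'} n1
print(parent_of(max_clade("n1", "Cn")), true_lca("n1", "Cn"))  # set() n1
```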
  25. Challenge: Multiple specifiers
     Last common ancestor of Gavialis gangeticus, Alligator mississippiensis, and Crocodylus niloticus and all of its descendants
     • Naïve recursive approach: LCA(S1, …, SN) = LCA(LCA(S1, …, SN-1), SN)
     • However, as shown for LCA(S1, S2), neither S1 nor S2 can itself be an LCA, so the intermediate result LCA(S1, …, SN-1) cannot serve as a specifier
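Outside OWL the naïve recursion is unproblematic, because a pairwise LCA function accepts any node, including an internal node returned by an earlier call. The sketch below folds a pairwise LCA over the specifiers with `functools.reduce` on a hypothetical tree (Gg, (Am, Cn)); the intermediate result n1 is exactly the kind of specifier the OWL encoding above cannot take.

```python
from functools import reduce

# Hypothetical three-taxon tree (Gg, (Am, Cn)), stored as child -> parent.
PARENT = {"Gg": "root", "n1": "root", "Am": "n1", "Cn": "n1"}


def path_to_root(node):
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def lca2(a, b):
    """Pairwise LCA: first node on a's path to the root that is also on b's."""
    on_b_path = set(path_to_root(b))
    return next(x for x in path_to_root(a) if x in on_b_path)


def lca(*specifiers):
    # LCA(S1, ..., SN) = LCA(LCA(S1, ..., SN-1), SN)
    return reduce(lca2, specifiers)


print(lca("Cn", "Am"))        # n1
print(lca("Cn", "Am", "Gg"))  # root
```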
  26-28. Challenge: Multiple specifiers
     Last common ancestor of Gavialis gangeticus, Alligator mississippiensis, and Crocodylus niloticus and all of its descendants
     (Cn = Crocodylus niloticus, Am = Alligator mississippiensis, Gg = Gavialis gangeticus):
         Parent(LCA(Cn, Am) ~ Gg)
      OR Parent(LCA(Cn, Gg) ~ Am)
      OR Parent(LCA(Am, Gg) ~ Cn)
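The disjunction can be generated mechanically. The sketch below (a hypothetical helper limited to three specifiers) emits one disjunct per choice of inner pair; each disjunct can be non-empty only on the topology in which that pair is grouped apart from the remaining specifier.

```python
from itertools import combinations


def lca_disjuncts(specifiers):
    """One disjunct per choice of which pair is grouped inside the third."""
    if len(specifiers) != 3:
        raise ValueError("this sketch only handles three specifiers")
    terms = []
    for pair in combinations(specifiers, 2):
        (outgroup,) = set(specifiers) - set(pair)
        terms.append(f"Parent(LCA({pair[0]}, {pair[1]}) ~ {outgroup})")
    return " OR ".join(terms)


print(lca_disjuncts(["Cn", "Am", "Gg"]))
# Parent(LCA(Cn, Am) ~ Gg) OR Parent(LCA(Cn, Gg) ~ Am) OR Parent(LCA(Am, Gg) ~ Cn)
```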
  29. Challenge: Multiple specifiers
     • The number of rooted binary tree topologies grows very fast with the number of leaves n, and already exceeds 100 for n = 5
     Last common ancestor of Gavialis gangeticus, Alligator mississippiensis, and Crocodylus niloticus and all of its descendants
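The growth can be checked by brute force. The sketch below enumerates rooted binary topologies with labeled leaves by attaching each new leaf to every branch of every existing tree, including a new branch above the root. The counts follow the known closed form (2n - 3)!!, i.e. 1, 3, 15, 105, 945 for n = 2 to 6; the leaf labels t0, t1, ... are arbitrary placeholders.

```python
def add_leaf(tree, leaf):
    """Yield every rooted binary tree obtained by attaching `leaf` to one
    branch of `tree` (including a new branch above the root)."""
    yield (tree, leaf)
    if isinstance(tree, tuple):
        left, right = tree
        for t in add_leaf(left, leaf):
            yield (t, right)
        for t in add_leaf(right, leaf):
            yield (left, t)


def rooted_topologies(leaves):
    """All rooted binary topologies for the given labeled leaves."""
    trees = [leaves[0]]
    for leaf in leaves[1:]:
        trees = [t for tree in trees for t in add_leaf(tree, leaf)]
    return trees


for n in range(2, 7):
    print(n, len(rooted_topologies([f"t{i}" for i in range(n)])))
```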
  30. Challenge: Scalability
     • Eventually we want to apply this to very large trees, with more than 1M leaf nodes.
     • Only OWL 2 EL reasoners (e.g., ELK) scale well to this size.
     • However, the EL profile does not allow disjunction.
  31. Challenge: Scalability
     • As a kludge, multiple equivalence axioms can be used instead of a disjunction:
         equivalentClass: Parent(LCA(Cn, Am) ~ Gg)
         equivalentClass: Parent(LCA(Cn, Gg) ~ Am)
         equivalentClass: Parent(LCA(Am, Gg) ~ Cn)
     • However, this in essence makes false assertions.
     • It can result in unexpected subclass inferences for other phyloreferences.
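Why the kludge misleads can be seen by evaluating the three disjuncts separately. The intended meaning is their union; the equivalence axioms instead make each disjunct equal to the same class, and hence to each other. On the hypothetical tree ((Am, Cn), Gg) below, only the first disjunct is non-empty, so the asserted equivalences do not hold there. The sketch uses the slide's MaxClade (~) and Parent notation with plain sets and computes the inner LCA directly; both are assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical three-taxon tree ((Am, Cn), Gg), stored as child -> parent.
PARENT = {"Am": "n1", "Cn": "n1", "n1": "root", "Gg": "root"}
NODES = set(PARENT) | set(PARENT.values())


def up(node):
    """[node, parent, ..., root]: the nodes that include `node`."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def lca(a, b):
    on_b = set(up(b))
    return next(x for x in up(a) if x in on_b)


def disjunct(a, b, out):
    """Extension of Parent(LCA(a, b) ~ out) on this tree."""
    anchor = lca(a, b)
    # LCA(a, b) ~ out: nodes that include the anchor but not `out`.
    inner = {n for n in NODES if n in up(anchor) and n not in up(out)}
    # Parent(S) := has_Child some S
    return {PARENT[n] for n in inner if n in PARENT}


extensions = {}
for a, b in combinations(["Am", "Cn", "Gg"], 2):
    (out,) = {"Am", "Cn", "Gg"} - {a, b}
    extensions[f"Parent(LCA({a}, {b}) ~ {out})"] = disjunct(a, b, out)

for name, members in extensions.items():
    print(name, members)
# Equivalence axioms would force all three extensions to be equal.
print("all equal:", len({frozenset(m) for m in extensions.values()}) == 1)
```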
  32. Challenge: “Qualifiers”
     • “Qualifiers” are specifiers that the clade is required to include or exclude.
     • They act as “kill switches”: tests that do not alter the semantics of the clade definition, but render it invalid if any of them fails.
     • External qualifiers can be expressed as a property chain:
         has_Ancestor o excludes_TU -> excludes_qualifying_TU
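A property chain axiom lets a reasoner derive excludes_qualifying_TU(x, z) whenever has_Ancestor(x, y) and excludes_TU(y, z) hold. The sketch below computes that composition over hypothetical facts derived from a three-taxon tree (Gg, (Am, Cn)); since the chain axiom is a subproperty axiom, an OWL reasoner would infer at least these pairs.

```python
# Hypothetical three-taxon tree (Gg, (Am, Cn)), stored as child -> parent.
PARENT = {"Gg": "root", "n1": "root", "Am": "n1", "Cn": "n1"}
NODES = set(PARENT) | set(PARENT.values())
TIPS = set(PARENT) - set(PARENT.values())


def proper_ancestors(node):
    out = []
    while node in PARENT:
        node = PARENT[node]
        out.append(node)
    return out


def tips_under(node):
    return {t for t in TIPS if t == node or node in proper_ancestors(t)}


has_ancestor = {(n, a) for n in NODES for a in proper_ancestors(n)}
excludes_tu = {(n, t) for n in NODES for t in TIPS - tips_under(n)}


def chain(r1, r2):
    """Role-chain composition r1 o r2: (x, z) if r1(x, y) and r2(y, z)."""
    return {(x, z) for (x, y) in r1 for (y2, z) in r2 if y == y2}


excludes_qualifying_tu = chain(has_ancestor, excludes_tu)
print(sorted(excludes_qualifying_tu))  # [('Am', 'Gg'), ('Cn', 'Gg')]
```

Here Gg is derived as an excluded qualifying TU for Am and Cn because their ancestor n1 excludes it.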
  33. Summary
     • Reproducible large-scale comparative biology requires taxon concepts with fully computable semantics.
     • Phylogenetic clade definitions have well-defined semantics in the form of necessary and sufficient conditions for clade membership.
     • Clade semantics can be expressed well in OWL, but OWL lacks the constructs needed to infer last common ancestors generically and scalably.
  34. Acknowledgements
     • Nico Cellinese, Gaurav Vaidya, Anna Becker (U. Florida)
     • Pascal Hitzler and DaSe Lab alumni & collaborators (see Carral et al. WOP 2017, arXiv:1710.05096)
     • Funded by the US National Science Foundation (DBI-1458484, DBI-1458604)
  35. How to find us
     • Web: http://phyloref.org (includes link to full grant proposal)
     • GitHub: http://github.com/phyloref
