Integrating data with phylogenies, at scale


Invited presentation at the final Phenotype RCN Summit, held at Biosphere 2, AZ, Feb 26–28, 2016. Co-presented with N. Cellinese.

More information about the Phyloreferencing project can be found at http://phyloref.org.

Published in: Science

  1. Integrating data with phylogenies, at scale. Nico Cellinese (University of Florida) & Hilmar Lapp (Duke University)
  2. WHAT’S IN A NAME?
  3. What’s in a name? Chaos! • Names and concepts do not reconcile that easily • Names are text strings • Context is lacking or subjective • Meaning is not computable
  4. Linnean names point to concepts (Antoine Laurent de Jussieu, Genera Plantarum, 1789)
  5. Linnean names point to concepts (Antoine Laurent de Jussieu, Genera Plantarum, 1789)
  6. Linnean names point to concepts (Antoine Laurent de Jussieu, Genera Plantarum, 1789): I don’t understand any of those concepts, whether in Latin or English, but I can still link each one to its name, one object to one object.
  7. Linnean names point to concepts (Antoine Laurent de Jussieu, Genera Plantarum, 1789) …and 200+ …and 400+
  8. Idiosyncratic Russian dolls syndrome
  9. Idiosyncratic Russian dolls syndrome
  10. Idiosyncratic Russian dolls syndrome
  11. Idiosyncratic Russian dolls syndrome
  12. Idiosyncratic Russian dolls syndrome
  13. Idiosyncratic Russian dolls syndrome
  14. Idiosyncratic Russian dolls syndrome
  15. Idiosyncratic Russian dolls syndrome: from a human perspective, we lose track of concepts, and it is hard to reconcile all of them. We need help! Can we compute them?
  16. Linnean names point to concepts (Antoine Laurent de Jussieu, Genera Plantarum, 1789) …and 200+ …and 400+
  17. • We can unclutter concepts, and thereby nomenclature • How do we navigate along the Tree of Life while repurposing Linnean names, which are linked to traditional concepts?
  18. Dark taxa!
  19. Dark taxa! How do we integrate data with this tree?
  20. Tree-thinking: common descent → evolution at the center of taxonomy. Discovery. [Figure: tree with tips A, B, C, D; labels: Branches, Synapomorphies, Clades = taxa]
  21. Tree-thinking: common descent → evolution at the center of taxonomy. Discovery. Communication. How?? [Figure: density plot; axes: Density, Diversification rate]
  22. Tree-thinking: Berberidopsidaceae, Opiliones, Zingiberaceae, Hamamelidaceae, Sarcolaenaceae, Lingulidae, Hymenoptera, Mammalia, Apocynaceae, Galliformes, Rubiaceae, Anarthriaceae, Lineidae, Crocodylidae, Stylosiphonia, Andrenidae, Cracidae, Gavialis, Globba, Micrella, Rhodoleia, Phalangiidae, Tachyglossa, Lyginia, Mediusella, Chamaeclitandra
  23. Tree-thinking: Berberidopsidaceae, Opiliones, Zingiberaceae, Hamamelidaceae, Sarcolaenaceae, Lingulidae, Hymenoptera, Mammalia, Apocynaceae, Galliformes, Rubiaceae, Anarthriaceae, Lineidae, Crocodylidae, Stylosiphonia, Andrenidae, Cracidae, Gavialis, Globba, Micrella, Rhodoleia, Phalangiidae, Tachyglossa, Lyginia, Mediusella, Chamaeclitandra. These names were not generated in an evolution-based framework (groups defined by character similarity vs. common descent).
  24. Both the Encyclopedia of Life (EOL) and the Open Tree of Life suggest that Campanuloideae is a misspelling of Campaniloidea (marine gastropods!). GBIF does not currently have Campanuloideae in its backbone taxonomy.
  25. Are you kidding me? These are the Campanuloideae! (Wang et al. 2014)
  26. Life as a street map: how to navigate life as a machine
  27. Mapping data to phylogenetic knowledge space
  28. Street signs serve people, not machines
  29. Mapping data to phylogenetic knowledge space • How do we build a reliable GPS for phylogenies? • How do we reproducibly find the right nodes?
  30. Nomenclature ≠ Semantics. FEED textual definition: “The hyoglossus is a muscle that attaches to the hyoid and tongue and is innervated by Cranial Nerve XII.” Computable definition: ('attached to' some 'hyoid bone') and ('attached to' some tongue) and ('innervated by' some 'hypoglossal nerve') and spatially disjoint with 'intrinsic tongue muscle'. Druzinsky et al. (2015): logic definitions of mammalian feeding muscles by means of necessary and sufficient conditions true for all mammals.
  31. Phyloreference = logic definition of a clade, using the property common to all of life (common descent)
  32. Phyloreferences: statements formally expressing the patterns we discover (analogous to map coordinates). • Node-based: the clade originating with the last common ancestor of B and C. • Branch-based: the clade originating with the first ancestor of B that is not an ancestor of A. • Apomorphy-based: the clade originating with the first ancestor of C to evolve X. [Figure: one tree with tips A, B, C per definition type; X marks the apomorphy]
  33. Phyloreferences yield a coordinate system for the Tree of Life • Any node, branch, or subtree is referenceable • References are unambiguous • References are computable • References are portable • The system adapts to new and changing knowledge
  34. Many needed technologies already exist • OWL ontologies designed for phylogenetic knowledge (CDAO) and phenotypic knowledge (Uberon, PATO, …) • Efficient and expressive reasoners: FaCT++, HermiT, Racer, ELK
  35. [Figure: phylogeny with tips Campanula_rotundifolia, Pseudonemacladus_oppositifolius, Lobelia_cardinalis, Campanula_latifolia, Cyphocarpus_rigescens, Wahlenbergia_linifolia, Nemacladus_ramosissmus, Lobelia_coronopifolia, Cyphia_elata, Pentaphragma, Crysanthemum, Sphenoclea, Platycodon_grandiflorus, Cyphia_bulbosa; clades Campanula, Lobelia and Cyphia labeled] Class: Campanulaceae_1889_to_1980 EquivalentTo: cdao:has_Descendant value taxon:Campanula_latifolia and phyloref:excludes_lineage value taxon:Crysanthemum
  36. [Figure: the same phylogeny as on slide 35] Class: Campanulaceae_1980 EquivalentTo: cdao:has_Descendant value taxon:Campanula_latifolia and phyloref:excludes_lineage value taxon:Lobelia
  37. [Figure: the same phylogeny as on slide 35] Class: Campanulaceae_after_1995 EquivalentTo: cdao:has_Descendant value taxon:Campanula_latifolia and phyloref:excludes_lineage value taxon:Sphenoclea
  38. Phyloreferences as ontological expressions. Phyloreference expressions can be: • Easily generated by anyone • Used on any tree • Named and registered, to promote reuse and consistency and to improve usability and accessibility. Class: Campanulaceae Annotations: rdfs:label “Campanulaceae_after_1995” dc:description “the clade that includes Campanula latifolia but not Sphenoclea” EquivalentTo: cdao:has_Descendant value taxon:Campanula_latifolia and phyloref:excludes_lineage value taxon:Sphenoclea vs. Class: AGF4-SHRU-3560 EquivalentTo: cdao:has_Descendant value taxon:Campanula_latifolia and phyloref:excludes_lineage value taxon:Sphenoclea
  39. Challenges • An OWL-based data model that meets the needs of phylogenetic taxonomy, reasoning expressivity, and scalability • Conventions for data transformation, and the consequences of different choices • Least common ancestor reasoning for OWL data • Lack of a canonical specimen identifier system • Specifier mapping ontologies
  40. Tree of Life, ontologized: a universal coordinate system • The Tree of Life is itself an aggregation and integration of our phylogenetic knowledge. • Phyloreferencing is addressing into a knowledge universe. • Ontologies, reasoning, and other KR techniques are powerful tools for this.
  41. Acknowledgements • National Science Foundation (DBI-1458484) • Ken and Linda McGurn • Phenoscape • EvoIO
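
Slide 30 argues that a computable definition, unlike a name, can be checked mechanically. The sketch below illustrates that idea under loudly stated assumptions. It uses hypothetical facts about two muscles (`muscle_1` and `muscle_2`, which are not FEED data) and tests them against the three existential conjuncts of the hyoglossus definition under a closed-world reading. The disjointness clause is left out. A real OWL reasoner works under the open-world assumption and would treat missing facts differently.

```python
# Closed-world sketch of testing individuals against the conjunctive
# hyoglossus definition from slide 30. The individuals and their facts are
# HYPOTHETICAL; this is not OWL reasoning, which is open-world.

FACTS = {
    "muscle_1": {
        "attached to": {"hyoid bone", "tongue"},
        "innervated by": {"hypoglossal nerve"},
    },
    "muscle_2": {
        "attached to": {"hyoid bone"},
        "innervated by": {"hypoglossal nerve"},
    },
}

# One (property, filler) pair per "('property' some 'filler')" conjunct.
# The "spatially disjoint with" clause is omitted from this sketch.
HYOGLOSSUS = [
    ("attached to", "hyoid bone"),
    ("attached to", "tongue"),
    ("innervated by", "hypoglossal nerve"),
]


def satisfies(individual: str, definition: list[tuple[str, str]]) -> bool:
    """True if every conjunct is matched by a recorded fact (closed world)."""
    facts = FACTS.get(individual, {})
    return all(filler in facts.get(prop, set()) for prop, filler in definition)


for muscle in FACTS:
    print(muscle, satisfies(muscle, HYOGLOSSUS))
```

Under these assumptions, `muscle_1` is classified as a hyoglossus and `muscle_2` is not, because no recorded fact attaches it to the tongue. The classification follows from the definition, not from a name.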
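
The three phyloreference types on slide 32 can be resolved on a given tree by simple traversal. The following is a minimal sketch, not the Phyloref project's implementation. Several parts are hypothetical: the toy topology, the internal node names (`root`, `n1`–`n3`), the extra tips D and E, and the set of nodes carrying trait X. The sketch also assumes that "first ancestor" means the ancestor closest to the root.

```python
# Minimal sketch: resolving node-, branch- and apomorphy-based phyloreferences
# (slide 32) on a HYPOTHETICAL toy tree by plain traversal, not OWL reasoning.

# Toy tree (A, (E, (D, (B, C)n3)n2)n1)root, stored as child -> parent.
PARENT = {
    "A": "root", "n1": "root",
    "E": "n1", "n2": "n1",
    "D": "n2", "n3": "n2",
    "B": "n3", "C": "n3",
}

# Hypothetical: trait X arose on the branch leading to n2.
HAS_X = {"n2", "D", "n3", "B", "C"}


def path_from_root(node):
    """Return [root, ..., node]: the node preceded by all of its ancestors."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]


def descendant_tips(node):
    """All tips in the clade originating at `node`."""
    children = [c for c, p in PARENT.items() if p == node]
    if not children:
        return {node}
    return set().union(*(descendant_tips(c) for c in children))


def node_based(b, c):
    """Last common ancestor of b and c: the end of their shared root path."""
    lca = None
    for x, y in zip(path_from_root(b), path_from_root(c)):
        if x != y:
            break
        lca = x
    return lca


def branch_based(b, a):
    """First ancestor of b (from the root) that is not an ancestor of a."""
    lineage_of_a = set(path_from_root(a))
    for node in path_from_root(b):
        if node not in lineage_of_a:
            return node
    return None


def apomorphy_based(c, has_trait):
    """First ancestor of c (from the root) that carries the trait."""
    for node in path_from_root(c):
        if node in has_trait:
            return node
    return None


for label, origin in [
    ("node-based", node_based("B", "C")),
    ("branch-based", branch_based("B", "A")),
    ("apomorphy-based", apomorphy_based("C", HAS_X)),
]:
    print(label, origin, sorted(descendant_tips(origin)))
```

On this toy tree, the three definitions pick out three different clades: {B, C} (node-based), {B, C, D, E} (branch-based), and {B, C, D} (apomorphy-based).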
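
The three Campanulaceae classes on slides 35–37 share one pattern: the largest clade that has Campanula_latifolia as a descendant but excludes the lineage of a second specifier. The sketch below resolves all three definitions on a hypothetical, heavily simplified five-tip tree, treating Lobelia as a single tip. This is not the tree shown on the slides. The sketch also assumes that `excludes_lineage` means the excluded specifier must not descend from the clade's origin.

```python
# Minimal sketch: resolving the Campanulaceae definitions of slides 35-37
# on a HYPOTHETICAL, simplified tree (not the tree shown on the slides).

# (Crysanthemum, (Sphenoclea, ((Campanula_latifolia, Campanula_rotundifolia)n3,
#   Lobelia)n2)n1)root, stored as child -> parent.
PARENT = {
    "Crysanthemum": "root", "n1": "root",
    "Sphenoclea": "n1", "n2": "n1",
    "n3": "n2", "Lobelia": "n2",
    "Campanula_latifolia": "n3", "Campanula_rotundifolia": "n3",
}

# Specifiers copied from the slides:
# (cdao:has_Descendant value ..., phyloref:excludes_lineage value ...)
DEFINITIONS = {
    "Campanulaceae_1889_to_1980": ("Campanula_latifolia", "Crysanthemum"),
    "Campanulaceae_1980": ("Campanula_latifolia", "Lobelia"),
    "Campanulaceae_after_1995": ("Campanula_latifolia", "Sphenoclea"),
}


def path_from_root(node):
    """Return [root, ..., node]."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]


def tips_below(node):
    """All tips in the clade originating at `node`."""
    children = [c for c, p in PARENT.items() if p == node]
    if not children:
        return {node}
    return set().union(*(tips_below(c) for c in children))


def resolve(included, excluded):
    """Tips of the largest clade containing `included` but not `excluded`."""
    excluded_lineage = set(path_from_root(excluded))
    for node in path_from_root(included):  # walk from the root downward
        if node not in excluded_lineage:
            return tips_below(node)
    return set()  # no such clade: `included` sits inside the excluded lineage


for name, (inc, exc) in DEFINITIONS.items():
    print(name, sorted(resolve(inc, exc)))
```

On this toy tree, the three definitions resolve to nested clades. The historical circumscriptions of Campanulaceae therefore stay distinguishable even though they share a name.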
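
Slide 38 argues that phyloreference expressions can be generated by anyone and registered under either a readable name or an opaque identifier. The sketch below assumes a hypothetical `Phyloreference` record, which is not an existing Phyloref tool or API. From two specifiers, it renders Manchester-syntax class frames like those on the slide, and it checks that both registrations carry the same definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phyloreference:
    """HYPOTHETICAL record for a named, registrable phyloreference."""
    class_name: str        # readable name or opaque identifier
    included: str          # specifier for cdao:has_Descendant
    excluded: str          # specifier for phyloref:excludes_lineage
    label: str = ""        # optional rdfs:label
    description: str = ""  # optional dc:description

    def to_manchester(self) -> str:
        """Render the definition as a Manchester-syntax OWL class frame."""
        lines = [f"Class: {self.class_name}"]
        annotations = []
        if self.label:
            annotations.append(f'rdfs:label "{self.label}"')
        if self.description:
            annotations.append(f'dc:description "{self.description}"')
        if annotations:
            lines.append("    Annotations: " + ", ".join(annotations))
        lines.append(
            "    EquivalentTo: "
            f"cdao:has_Descendant value taxon:{self.included} "
            f"and phyloref:excludes_lineage value taxon:{self.excluded}"
        )
        return "\n".join(lines)

    def same_definition_as(self, other: "Phyloreference") -> bool:
        """True if both records use the same specifiers, whatever their names."""
        return (self.included, self.excluded) == (other.included, other.excluded)


named = Phyloreference(
    "Campanulaceae", "Campanula_latifolia", "Sphenoclea",
    label="Campanulaceae_after_1995",
    description="the clade that includes Campanula latifolia but not Sphenoclea",
)
opaque = Phyloreference("AGF4-SHRU-3560", "Campanula_latifolia", "Sphenoclea")

print(named.to_manchester())
print(opaque.to_manchester())
print(named.same_definition_as(opaque))  # True
```

Keeping the specifiers separate from the name is what allows both registrations to be compared. The readable label serves people, while the logical definition is what a reasoner resolves.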
