
Franz 2014 ESA Aligning Insect Phylogenies Perelleschus and Other Cases

Update on the Euler/X project at http://www.entsoc.org/entomology2014; see also: http://taxonbytes.org/prior-work-on-concept-taxonomy-2013/

Published in: Science

  1. 1. Aligning insect phylogenies: Perelleschus and other cases Nico M. Franz 1,2 Arizona State University http://taxonbytes.org/ 1 Concepts and tools developed jointly with members of the Ludäscher Lab (UC Davis & UIUC): Mingmin Chen, Parisa Kianmajd, Shizhuo Yu, Shawn Bowers & Bertram Ludäscher 2 Systematics, Evolution and Biodiversity Section, Ten Minute Papers Annual Meeting of the Entomological Society of America November 18, 2014 - Portland, Oregon On-line @ http://www.slideshare.net/taxonbytes/franz-2014-esa-aligning-insect-phylogenies-perelleschus-and-other-cases-41654235
  2. 2. Research motivation: 1 How can we represent, and reason over, taxonomic concept provenance, based on varying input classifications and differentially sampled phylogenies? 1 This presentation concentrates on the "how?"; though the "why?" is addressed in the References (listed at the end).
  3. 3. Definitional preliminaries, 1 Taxonomic concept: 1 The circumscription of a perceived (or, more accurately, hypothesized) taxonomic group, as advocated by a particular author and source. 1 Not the same as species concepts, which are theories about what species are, and/or how they are recognized.
  4. 4. Definitional preliminaries, 2 Provenance: 1 Information describing the origin, derivation, history, custody, or context of an entity (etc.). Provenance establishes the authenticity, integrity and trustworthiness of information about entities. 1 See, e.g.: http://www.w3.org/2005/Incubator/prov/wiki/What_Is_Provenance
  5. 5. Definitional preliminaries, 3 Alignment ("merge"): A comprehensive, logically consistent, and (where possible) well-specified reconciliation of shared and unique Euler regions that result from integrating two or more taxonomic concept hierarchies ("trees") with RCC-5 articulations. 1 1 RCC-5 = Region Connection Calculus, with five set-theoretic relationships: congruence (==), proper inclusion (>), inverse proper inclusion (<), overlap (><), and exclusion (|).
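A minimal Python sketch of the five RCC-5 articulations, assuming each concept is modelled as a non-empty set of members. This is an illustrative simplification, not the Euler/X implementation; the function name `rcc5` is hypothetical, and the symbols follow the articulation notation used later in this deck.

```python
def rcc5(a, b):
    """Return the RCC-5 relation between two non-empty sets (illustrative only)."""
    a, b = frozenset(a), frozenset(b)
    if not a or not b:
        raise ValueError("RCC-5 regions are assumed to be non-empty")
    if a == b:
        return "=="   # congruence
    if a < b:
        return "<"    # a is properly included in b
    if a > b:
        return ">"    # a properly includes b
    if a & b:
        return "><"   # partial overlap
    return "|"        # exclusion (disjoint regions)


examples = {
    "congruent": rcc5({1, 2}, {1, 2}),
    "included": rcc5({1}, {1, 2}),
    "includes": rcc5({1, 2}, {1}),
    "overlap": rcc5({1, 2}, {2, 3}),
    "excluded": rcc5({1}, {2}),
}
print(examples)
```

Exactly one of the five relations holds for any pair of non-empty regions, which is what makes them usable as articulations between concepts.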
  6. 6. Input for provenance reasoning: Perelleschus use case, 1936−2013
  7. 7. Perelleschus salpinflexus Cardona-Duque & Franz sec. Franz & Cardona-Duque (2013) • Habitus, mouthparts. Figure labels: female, habitus; labium; maxilla.
  8. 8. Perelleschus salpinflexus Cardona-Duque & Franz sec. Franz & Cardona-Duque (2013) • Habitus, mouthparts. One might call this string a Taxonomic Concept Label. Figure labels: female, habitus; labium; maxilla.
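As an illustration only, a label of this form can be split at the "sec." token into a name part and a source part. The helper `parse_concept_label` is hypothetical and assumes the label contains the " sec. " separator exactly as written on the slide.

```python
def parse_concept_label(label):
    """Split 'name sec. source' into (name, source); hypothetical helper."""
    name, separator, source = label.partition(" sec. ")
    if not separator:
        raise ValueError("label has no ' sec. ' separator")
    return name.strip(), source.strip()


label = ("Perelleschus salpinflexus Cardona-Duque & Franz "
         "sec. Franz & Cardona-Duque (2013)")
name, source = parse_concept_label(label)
print(name)
print(source)
```

Keeping the source alongside the name is what lets two usages of the same name be treated as distinct concepts.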
  9. 9. Perelleschus salpinflexus Cardona-Duque & Franz sec. Franz & Cardona-Duque (2013) • Male & female terminalia, showing putative synapomorphies. Synapomorphy (genus-level): Spermatheca with an acute, sclerotized appendix at insertion of the collum (character 17:1). Synapomorphy (subclade-level): Aedeagus with endophallic sclerites extending in apical half of aedeagus (character 11:1). Figure labels: "11", "17".
  10. 10. Phylogeny: Perelleschus sec. Franz & Cardona-Duque (2013) Spermathecal synapomorphy Aedeagal synapomorphy
  11. 11. Perelleschus concept history: • 6 classifications, • 54 taxonomic concepts, • 75 concept-to-concept RCC-5 articulations; → Suitable for provenance reasoning. 1 1 Franz et al. 2014. Reasoning over taxonomic change: Exploring alignments for the Perelleschus use case. PLoS ONE.
  12. 12. 1936: 1st species-level concept. 1954: Genus named, + 2 species. 1986: Validation of generic name. 2001: Revision & phylogeny, + 6 / - 1 species. 2006: Exemplar cladistic analysis; 3 species. 2013: Revision & phylogeny, + 2 species.
  18. 18. 1986: Validation of generic name. 2001: Revision & phylogeny, + 6 / - 1 species. 2006: Exemplar cladistic analysis; 3 species. 2013: Revision & phylogeny, + 2 species. Focal alignments (today) • 1986 versus 2001 • Classification / Phylogeny • 2001 versus 2006 • Phylogeny / Exemplar Analysis • 2001 versus 2013 (appended) • Phylogeny / Extended Phylogeny 2001 / 2013
  19. 19. Introducing the Euler/X software toolkit (Open Source) "A toolkit for consistently aligning sets of hierarchically arranged entities under (relaxable) logic constraints, and using RCC-5 articulations." Desktop tool @ https://bitbucket.org/eulerx Euler server @ http://euler.asu.edu
  20. 20. Euler/X toolkit − Please ask me (later) about a live demonstration!
  21. 21. Euler/X uses Answer Set Programming. The reasoner asks, and solves, the question: "Which possible worlds can be generated that satisfy (i.e., are consistent with) a given set of input constraints?" 1
  22. 22. Euler/X uses Answer Set Programming. The reasoner asks, and solves, the question: "Which possible worlds can be generated that satisfy (i.e., are consistent with) a given set of input constraints?" 1 1 Input constraints: • T1 − taxonomy 1 • T2 − taxonomy 2 • A − user-asserted articulations • C − additional 'tree' constraints
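The "possible worlds" question can be illustrated with a brute-force toy in Python. This is a sketch under stated assumptions, not Answer Set Programming and not the Euler/X code: concepts are modelled as non-empty sets over a few atomic regions, taxonomies are one level deep, and the constraints C are taken to be sibling disjointness and parent coverage. All names (`possible_worlds`, `T1.A`, `T2.B1`, etc.) are hypothetical, and a small `n_atoms` may not realize every world of a larger input.

```python
from itertools import combinations, product


def rcc5(a, b):
    """RCC-5 relation between two non-empty sets."""
    if a == b:
        return "=="
    if a < b:
        return "<"
    if a > b:
        return ">"
    return "><" if a & b else "|"


def concepts(tax):
    """All concept names in a one-level taxonomy {parent: [children]}."""
    return list(tax) + [c for kids in tax.values() for c in kids]


def possible_worlds(t1, t2, articulations, n_atoms=4):
    """Enumerate distinct relation assignments consistent with T1, T2, A, C."""
    regions = [frozenset(c) for r in range(1, n_atoms + 1)
               for c in combinations(range(n_atoms), r)]
    leaves = [c for tax in (t1, t2) for kids in tax.values() for c in kids]
    worlds = set()
    for choice in product(regions, repeat=len(leaves)):
        world = dict(zip(leaves, choice))
        consistent = True
        for tax in (t1, t2):
            for parent, kids in tax.items():
                # C (assumed): sibling concepts do not overlap
                if any(world[x] & world[y] for x, y in combinations(kids, 2)):
                    consistent = False
                # C (assumed): a parent is exactly the union of its children
                world[parent] = frozenset().union(*(world[k] for k in kids))
        # A: every user-asserted articulation must hold in this world
        if consistent and all(rcc5(world[x], world[y]) == rel
                              for x, rel, y in articulations):
            worlds.add(tuple((x, y, rcc5(world[x], world[y]))
                             for x in concepts(t1) for y in concepts(t2)))
    return sorted(worlds)


t1 = {"T1.A": ["T1.A1", "T1.A2"]}
t2 = {"T2.B": ["T2.B1", "T2.B2"]}
well_specified = possible_worlds(
    t1, t2, [("T1.A1", "==", "T2.B1"), ("T1.A2", "==", "T2.B2")])
ambiguous = possible_worlds(t1, t2, [("T1.A1", "==", "T2.B1")])
print(len(well_specified), len(ambiguous))
```

With both leaf articulations asserted, a single world remains and the congruence of the two parent concepts follows without being stated; dropping one articulation leaves several worlds, which signals an under-specified alignment.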
  23. 23. Alignment 1 - Perelleschus sec. WOB (1986) versus sec. FOB (2001) T1: Perelleschus sec. 1986 • Traditional classification • 1 genus-level concept • 3 species-level concepts
  24. 24. Alignment 1 - Perelleschus sec. WOB (1986) versus sec. FOB (2001) T1: Perelleschus sec. 1986 • Traditional classification • 1 genus-level concept • 3 species-level concepts T2: Perelleschus sec. 2001 • Phylogenetic revision • 2 genus-level concepts • 7 clade-level concepts • 9 species-level concepts
  25. 25. Format for alignment input file (constraints: T1, T2, A, C) Year Source T2 Parent concept Child concepts T1 T2 to T1 Articulations (as provided by the user)
  26. 26. Input visualization Six 1 user-asserted input articulations (pink lines) are sufficient to yield a single, well-specified alignment. 1 Actually, three (species-level) articulations are sufficient to achieve this for the 2001/1986 alignment.
  27. 27. Alignment (merge) visualization Reasoner infers 66 additional, logically implied articulations (MIR).1 2001.Perelleschus >< 1986.Perelleschus; provenance of overlapping articulation is explained in the merge taxonomy. 1 MIR = Maximally Informative Relations (among paired concepts of T1, T2). Legend
  28. 28. Alignment (merge) visualization Reasoner infers 66 additional, logically implied articulations (MIR).1 2001.Perelleschus >< 1986.Perelleschus; provenance of overlapping articulation is explained in the merge taxonomy. 3 congruent 2001/1986 species-level concepts. 1 MIR = Maximally Informative Relations (among paired concepts of T1, T2). Legend
  29. 29. Alignment (merge) visualization Reasoner infers 66 additional, logically implied articulations (MIR).1 2001.Perelleschus >< 1986.Perelleschus; provenance of overlapping articulation is explained in the merge taxonomy. 3 congruent 2001/1986 species-level concepts. 6 species-level concepts unique sec. FOB (2001). 1 MIR = Maximally Informative Relations (among paired concepts of T1, T2). Legend
  30. 30. Alignment (merge) visualization Reasoner infers 66 additional, logically implied articulations (MIR).1 2001.Perelleschus >< 1986.Perelleschus; provenance of overlapping articulation is explained in the merge taxonomy. 3 congruent 2001/1986 species-level concepts. 6 species-level concepts unique sec. FOB (2001). 6 clade-level concepts unique to FOB (2001). 1 MIR = Maximally Informative Relations (among paired concepts of T1, T2). Legend
  31. 31. Alignment (merge) visualization Reasoner infers 66 additional, logically implied articulations (MIR).1 2001.Perelleschus >< 1986.Perelleschus; provenance of overlapping articulation is explained in the merge taxonomy. 3 congruent 2001/1986 species-level concepts. 6 species-level concepts unique sec. FOB (2001). 6 clade-level concepts unique to FOB (2001). 2001.PER & 2001.PHY in overlap with 1986.PER. 1 MIR = Maximally Informative Relations (among paired concepts of T1, T2). Legend
  32. 32. Alignment (merge) visualization Reasoner infers 66 additional, logically implied articulations (MIR).1 2001.Perelleschus >< 1986.Perelleschus; provenance of overlapping articulation is explained in the merge taxonomy. We can 'zoom in' on the overlap and resolve the resulting subregions in the "merge concept view". 1 MIR = Maximally Informative Relations (among paired concepts of T1, T2). Legend
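The count of 66 inferred articulations can be cross-checked against the concept counts given for the two input taxonomies, assuming one Maximally Informative Relation per T1/T2 concept pair (as the footnote suggests). A short Python check, with hypothetical variable names:

```python
# Concept counts as listed on the Alignment 1 slides.
t1_concepts = 1 + 3        # 1986: genus-level + species-level concepts
t2_concepts = 2 + 7 + 9    # 2001: genus-, clade-, and species-level concepts

concept_pairs = t1_concepts * t2_concepts   # one MIR per T1/T2 pair (assumed)
asserted = 6                                # user-asserted input articulations
inferred = concept_pairs - asserted

print(concept_pairs, inferred)
```

The 4 x 18 = 72 concept pairs, less the six asserted articulations, leave the 66 relations attributed to the reasoner.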
  33. 33. Merge concept view (in part) "2001.PER and 1986.PER share a region (2001.PER * 1986.PER) constituted (at lower levels) by 2001/1986.P_rectirostris; this latter region is that which is entailed in 1986.PER and excluded from 2001.PHY (1986.PER \ 2001.PHY)." 2001 concepts 2001/1986 concepts
  34. 34. Merge concept view (in part) "2001.PHYsubcin/1986.Psubcin differentially 'participates' in 2001.PHY and 1986.PER; but not 2001.PER (or any of its children)." 2001 concepts 2001/1986 concepts
  35. 35. Alignment 2 - Perelleschus sec. FOB (2001) versus sec. F (2006) T1: Perelleschus sec. 2001 • Phylogenetic revision • 8 ingroup species concepts • 2 outgroup concepts • 18 concepts total
  36. 36. Alignment 2 - Perelleschus sec. FOB (2001) versus sec. F (2006) T1: Perelleschus sec. 2001 • Phylogenetic revision • 8 ingroup species concepts • 2 outgroup concepts • 18 concepts total T2: Perelleschus sec. 2006 • Exemplar analysis • 2 ingroup species concepts • 1 outgroup concept • 7 concepts total
  37. 37. Logic representation challenge: Perelleschus sec. 2001 & 2006 concepts have incongruent sets of subordinate members, yet each concept has congruent synapomorphies.
  38. 38. Definitional preliminaries, 4 1 Ostensive alignment: the congruence among higher-level concepts is assessed in relation to their entailed members. → Ostension: giving meaning through an act of pointing out. 1 See Bird & Tobin. 2012. Natural Kinds. URL: http://plato.stanford.edu/archives/win2012/entries/natural-kinds/
  39. 39. Definitional preliminaries, 4 1 Ostensive alignment: the congruence among higher-level concepts is assessed in relation to their entailed members. → Ostension: giving meaning through an act of pointing out. Intensional alignment: the congruence among higher-level concepts is assessed in relation to their properties. → Intension: giving meaning through the specification of properties. 1 See Bird & Tobin. 2012. Natural Kinds. URL: http://plato.stanford.edu/archives/win2012/entries/natural-kinds/
  40. 40. Ostensive alignment – members are all that counts Input constraints Challenge 1: Ostensive alignment Ostensive alignment 2001 & 2006
  41. 41. Ostensive alignment – members are all that counts Challenge 1: Ostensive alignment Solution: 11 ingroup concept articulations are coded ostensively – either as <, ><, or | – to represent non-congruence in the representation of child concepts Input constraints Ostensive alignment 2001 & 2006
  42. 42. Ostensive alignment – members are all that counts Challenge 1: Ostensive alignment Solution: 11 ingroup concept articulations are coded ostensively – either as <, ><, or | – to represent non-congruence in the representation of child concepts Result: 2006.PER < 2001.PER 2006.PER | 2001.[5 species concepts] etc. Input constraints Ostensive alignment 2001 & 2006 5 x | 2 x ><
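Read ostensively, the relation between two genus-level concepts depends only on which members each one contains. A minimal Python sketch, assuming concepts are plain member sets; the species labels are placeholders and the set sizes are illustrative rather than taken from the alignment files.

```python
def ostensive_relation(members_a, members_b):
    """Compare two concepts purely by their sampled members."""
    if members_a == members_b:
        return "=="
    if members_a < members_b:
        return "<"
    if members_a > members_b:
        return ">"
    return "><" if members_a & members_b else "|"


# Placeholder labels: two species sampled in both treatments,
# five sampled only in the more comprehensive one.
per_2006 = {"species_a", "species_b"}
per_2001 = per_2006 | {"species_c", "species_d", "species_e",
                       "species_f", "species_g"}

relation = ostensive_relation(per_2006, per_2001)
unsampled = sorted(per_2001 - per_2006)
print(relation, unsampled)
```

Under this reading the less densely sampled concept comes out as properly included in the other, and each unsampled species concept is disjoint from it.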
  43. 43. Intensional alignment – representation of congruent synapomorphies Input constraints Challenge 2: Intensional alignment Intensional alignment 2001 & 2006 "17" "11"
  44. 44. Intensional alignment – representation of congruent synapomorphies Input constraints Challenge 2: Intensional alignment Solution: An Implied Child (_IC) concept is added to the undersampled (2006) clade concept; and the (5) "missing" species-level concepts are included within this Implied Child Intensional alignment 2001 & 2006 "17" "11"
  45. 45. Intensional alignment – representation of congruent synapomorphies Input constraints Challenge 2: Intensional alignment Solution: An Implied Child (_IC) concept is added to the undersampled (2006) clade concept; and the (5) "missing" species-level concepts are included within this Implied Child 11 ingroup concept articulations are coded intensionally – as == or > – to reflect congruent synapomorphies (chars. 11, 17) of 2001 & 2006 Intensional alignment 2001 & 2006 "17" "11"
  46. 46. Intensional alignment – representation of congruent synapomorphies Input constraints Challenge 2: Intensional alignment Result: The genus- and ingroup clade-level concepts are inferred as congruent: 2006.PER == 2001.PER 2006.PcarPeve == 2001.PcarPsul etc. Intensional alignment 2001 & 2006
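One way to read the Implied Child device is as a placeholder that absorbs whatever the undersampled concept did not sample, so that the parent concepts can come out congruent. A hedged Python sketch, assuming concepts are member sets; `with_implied_child`, the `2006.PER_IC` label, and the species labels are hypothetical.

```python
def with_implied_child(children, reference_members, label):
    """Add an Implied Child holding the members the sampled children lack."""
    sampled = frozenset().union(*children.values())
    implied = frozenset(reference_members) - sampled
    extended = dict(children)
    if implied:
        extended[label] = implied
    return extended


# Placeholder members: two sampled species, five unsampled ones.
members_2001 = frozenset({"species_a", "species_b", "species_c", "species_d",
                          "species_e", "species_f", "species_g"})
children_2006 = {
    "2006.species_a": frozenset({"species_a"}),
    "2006.species_b": frozenset({"species_b"}),
}

extended_2006 = with_implied_child(children_2006, members_2001, "2006.PER_IC")
per_2006 = frozenset().union(*extended_2006.values())
congruent = per_2006 == members_2001
print(sorted(extended_2006), congruent)
```

With the Implied Child in place, the two genus-level concepts cover the same region, matching the congruent (==) result reported for the intensional alignment.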
  47. 47. Review – representing ostensive versus intensional alignments Ostensive alignment 2001.PER includes more species-level concepts than 2006.PER [>].
  48. 48. Review – representing ostensive versus intensional alignments Ostensive alignment 2001.PER includes more species-level concepts than 2006.PER [>]. Intensional alignment 2006.PER reconfirms the synapomorphies inferred in 2001.PER [==].
  49. 49. Is this approach scalable? Quite possibly yes.
  50. 50. Use case: Alternative phylogenetic schemes of higher-level weevils T1: Curculionoidea sec. Kuschel (1995) • Cladistic analysis • 41 concepts
  51. 51. Use case: Alternative phylogenetic schemes of higher-level weevils T1: Curculionoidea sec. Kuschel (1995) • Cladistic analysis • 41 concepts T2: Curculionoidea sec. Marvaldi & Morrone (2000) • Cladistic analysis • 25 concepts
  52. 52. Alignment: Curculionoidea sec. K (1995) versus sec. MM (2000) Initial visual impression: Lots of green rectangles, yellow octagons, and overlap (><). Much taxonomic concept incongruence.
  53. 53. Use case: Dwarf lemurs sec. 1993 & 2005 1 Chirogaleus furcifer sec. Mühel (1890) – Brehms Tierleben. Public Domain: http://books.google.com/books?id=sDgQAQAAMAAJ 1 Franz et al. 2014. Taxonomic provenance: Two influential primate classifications logically aligned. (in preparation)
  54. 54. The 2nd & 3rd Editions of the Mammal Species of the World (1993, 2005). Primates sec. Groves (1993): 317 taxonomic concepts, 233 at the species level. Primates sec. Groves (2005): 483 taxonomic concepts, 376 at the species level. Δ = 143 species-level concepts
  55. 55. Alignment of Primates sec. Groves 1993 / 2005. Primates: 800 concepts, 402 articulations, 153,111 MIR → ~380x information gain! Strepsirrhini sec. MSW3; Haplorrhini sec. MSW3; Catarrhini sec. MSW3
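The figures on this slide can be reproduced with simple arithmetic, assuming "information gain" means the ratio of inferred relations (MIR) to input articulations. That interpretation is an assumption; the variable names below are hypothetical.

```python
concepts_1993 = 317
concepts_2005 = 483
total_concepts = concepts_1993 + concepts_2005

input_articulations = 402
mir = 153_111

# Assumed definition: relations obtained per articulation supplied.
information_gain = mir / input_articulations

print(total_concepts, round(information_gain))
```

The two concept counts sum to the 800 concepts reported, and the ratio comes out at roughly 380, in line with the slide.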
  56. 56. Taxonomic provenance → quantify name/meaning dissociation. 'Dissociation' means that either non-identical names are paired with congruent concepts, or that identical names are paired with incongruent concepts. "Reliable names" "Unreliable names"
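The definition of dissociation lends itself to a simple tally. The Python sketch below assumes that a name pair counts as "reliable" when name identity and concept congruence agree, and "unreliable" otherwise; this mapping onto the slide's two labels is an assumption, and the names used are placeholders.

```python
from collections import Counter


def name_reliability(name_1, name_2, relation):
    """Classify a concept pair by whether names track meanings."""
    same_name = name_1 == name_2
    congruent = relation == "=="
    # Dissociation: same name but incongruent, or different names but congruent.
    return "reliable" if same_name == congruent else "unreliable"


# Placeholder pairs: (name in T1, name in T2, RCC-5 articulation).
pairs = [
    ("Name X", "Name X", "=="),   # same name, same meaning
    ("Name Y", "Name Y", "><"),   # same name, different meaning
    ("Name Z", "Name W", "=="),   # different names, same meaning
    ("Name U", "Name V", "|"),    # different names, different meanings
]

tally = Counter(name_reliability(*pair) for pair in pairs)
print(tally)
```

Applied to the full set of articulations in an alignment, such a tally gives the proportion of names that can be trusted to carry the same meaning across the two taxonomies.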
  57. 57. In summary (1) − What this approach can provide: So, given an input set of [T1, T2, A, C], one gains: (1) Logical consistency in the alignment; (2) Intended degree of alignment resolution; (3) Additional, logically implied articulations; (4) Visualizations of taxonomic provenance; (5) Quantifications of name/meaning relations.
  58. 58. In summary (2) − Representation and reasoning abilities • Compatibility with contemporary Linnaean nomenclature (and PhyloCode too); • Integration of many-to-many name/circumscription relationships across taxonomies; • Reconciliation of traditional classifications with fully bifurcated phylogenies; • Representation of monotypic concept lineages with congruent taxonomic extensions;
  59. 59. In summary (2) − Representation and reasoning abilities • Compatibility with contemporary Linnaean nomenclature (and PhyloCode too); • Integration of many-to-many name/circumscription relationships across taxonomies; • Reconciliation of traditional classifications with fully bifurcated phylogenies; • Representation of monotypic concept lineages with congruent taxonomic extensions; • Accounting for insufficiently specified higher-level entities: • Undersampled outgroup entities; • Differentially sampled ingroup entities; • Resolution of taxonomically overlapping entities and merge concepts; • Differentiation of ostensive versus intensional readings of concept articulations; • Representation of topologically localized resolution versus ambiguity in alignments.
  60. 60. In summary (2) − Representation and reasoning abilities • Compatibility with contemporary Linnaean nomenclature (and PhyloCode too); • Integration of many-to-many name/circumscription relationships across taxonomies; • Reconciliation of traditional classifications with fully bifurcated phylogenies; • Representation of monotypic concept lineages with congruent taxonomic extensions; • Accounting for insufficiently specified higher-level entities: • Undersampled outgroup entities; • Differentially sampled ingroup entities; • Resolution of taxonomically overlapping entities and merge concepts; • Differentiation of ostensive versus intensional readings of concept articulations; • Representation of topologically localized resolution versus ambiguity in alignments. • Next critical step(s): accessible, scalable, usable, integrated web instance of Euler/X
  61. 61. In summary (3) − Take-home message We can explain (much of) taxonomy's legacy to computers (e.g.) for superior name/meaning resolution. Well, then, should we? And at what cost?
  62. 62. And, in the near future..?
  63. 63. A future beyond concept-to-concept alignments Reasoning over the provenance / identity of: • Taxonomic concepts; • Concept-associated traits; • Vouchered specimens.
  64. 64. Acknowledgments • Euler/X team: Mingmin Chen, Parisa Kianmajd, Shizhuo Yu, Shawn Bowers & Bertram Ludäscher. • Juliana Cardona-Duque, Charles O'Brien (Perelleschus), Naomi Pier (primates) & Alan Weakley (Magnolia). • taxonbytes lab members: Andrew Johnston & Guanyang Zhang. • NSF DEB–1155984, DBI–1342595 (Franz); IIS–118088, DBI–1147273 (Ludäscher). • Information @ http://taxonbytes.org/tag/concept-taxonomy/ • Euler/X code @ https://bitbucket.org/eulerx • Euler server @ http://euler.asu.edu Franz Lab: http://taxonbytes.org/ https://sols.asu.edu/
  65. 65. Select references on concept taxonomy and the Euler/X toolkit • Franz et al. 2008. On the use of taxonomic concepts in support of biodiversity research and taxonomy. In: The New Taxonomy; pp. 63–86. • Franz & Peet. 2009. Towards a language for mapping relationships among taxonomic concepts. Systematics and Biodiversity 7: 5–20. • Franz & Thau. 2010. Biological taxonomy and ontology development: Scope and limitations. Biodiversity Informatics 7: 45–66. • Chen et al. 2014. Euler/X: a toolkit for logic-based taxonomy integration. WFLP 2013 – 22nd International Workshop on Functional and (Constraint) Logic Programming. • Chen et al. 2014. A hybrid diagnosis approach combining Black-Box and White-Box reasoning. Lecture Notes in Computer Science 8620: 127–141. • Franz et al. 2014. Names are not good enough: Reasoning over taxonomic change in the Andropogon complex. Semantic Web – Interoperability, Usability, Applicability – Special Issue on Semantics for Biodiversity. (in press) • Franz et al. 2014. Reasoning over taxonomic change: Exploring alignments for the Perelleschus use case. PLoS ONE. (in press) • Franz et al. 2015. Taxonomic provenance: Two influential primate classifications logically aligned. (in preparation)
  66. 66. Miscellaneous appended slides
  67. 67. User/reasoner interaction: achieving well-specified alignments T1 = Taxonomy 1 T2 = Taxonomy 2 A = Input articulations [==, >, <, ><, |] C = Taxonomic constraints → Articulations are asserted by taxonomic experts.
  68. 68. User/reasoner interaction: achieving well-specified alignments MIR = Maximally Informative Relations [==, >, <, ><, |] for each concept pair Yes Yes
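The flow chart on this slide survives only in part. One plausible reading, sketched in Python under that assumption, is a two-question loop: is the input consistent, and does it yield a single possible world? The function name and the returned messages are hypothetical.

```python
def next_step(n_possible_worlds):
    """Suggest the next user action from the number of possible worlds found."""
    if n_possible_worlds < 0:
        raise ValueError("number of possible worlds cannot be negative")
    if n_possible_worlds == 0:
        # No world satisfies T1, T2, A and C together.
        return "inconsistent: revise the input articulations"
    if n_possible_worlds == 1:
        # A single world: every concept pair has one definite relation.
        return "well-specified: report the MIR for each concept pair"
    # Several worlds remain: the input under-determines the alignment.
    return "ambiguous: add or refine input articulations"


for count in (0, 1, 3):
    print(count, "->", next_step(count))
```

In this reading the expert iterates on the articulations until the reasoner returns exactly one world, at which point the MIR can be read off.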
  69. 69. Euler/X toolkit − Desktop version downloadable on Bitbucket
  70. 70. Alan Weakley 2014 (UNC Herbarium) - Magnolia concept evolution
  71. 71. R32 lattice of RCC-5 articulations (lighter color = less certainty)
  72. 72. The other piece in the puzzle: Concept-to-voucher identifications Source: Baskauf & Webb. 2014. Darwin-SW. URL: http://www.semantic-web-journal.net/system/files/swj635.pdf
  73. 73. 1986: Validation of generic name. 2001: Revision & phylogeny, + 6 / - 1 species. 2006: Exemplar cladistic analysis; 3 species. 2013: Revision & phylogeny, + 2 species. Focal alignments (today) • 1986 versus 2001 • Classification / Phylogeny • 2001 versus 2006 • Phylogeny / Exemplar Analysis • 2001 versus 2013 • Phylogeny / Extended Phylogeny 2001 / 2013
  74. 74. Alignment 3 - Perelleschus sec. FOB (2001) versus sec. FCD (2013) Ostensive alignment 10 overlapping articulations Species-level congruence 'Cascading' clade concepts Intensional alignment Congruent synapomorphies reconfirmed across sub-clades; with minor low-level concept additions
