1. Representing phylogeny as
a logically tractable variable
@taxonbytes
Nico M. Franz1 , Guanyang Zhang1, Shizhuo Yu2 & Bertram Ludäscher3
1 School of Life Sciences, Arizona State University
2 Computer Science, University of California at Davis
3 iSchool, University of Illinois at Urbana-Champaign
Systematics / Bioinformatics Session – Evolution 2016 Meetings
June 20, 2016 - Austin, Texas (#Evol2016)
@ http://www.slideshare.net/taxonbytes/franz-et-al-evol-2016-representing-phylogeny-as-a-logically-tractable-variable
2. Source: Thomas, G.H. 2015. An avian explosion. Nature 526: 516–517. doi:10.1038/nature15638
Figure: two avian phylogenies, 2015 versus 2014
Phylogenetic inferences
can vary over time.
3. Querying phylogenetic advancement – premises & questions
• The term "tree of life" characterizes a goal that we strive to reach
eventually, more so than where we are now (for many perceived groups).
• Therefore it would be useful to have a "phylogenetic knowledge
advancement service". The service satisfies queries such as:
1. "Does this sequence of related phylogenetic inferences
have a stabilizing or destabilizing trend?"
2. "Are two or more phylogenies – each differentially sub-sampled
at lower levels – in congruence or in conflict?"
3. "How can an evolutionary study tied to one (earlier) phylogeny
be "updated" (integrated) with another (later) phylogeny?"
• What the service would deliver, per query:
1. We can prioritize research agendas accordingly.
2. Is sampling an issue? Or are the signals complementary?
3. Effects of the "phylogenetic variable" on conclusions can be controlled for.
10. Phylogenetic advancement service – design specifications
Example service queries:
1. Do succeeding inferences have stabilizing
or destabilizing trends?
2. Are multiple, differentially sub-sampled
phylogenies congruent or conflicting?
3. How to control for the "phylogenetic
variable" affecting evolutionary inferences?
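In a deliberately simplified setting, query 1 could be approximated by scoring each step in a series of inferences by the share of its clades that reappear congruently in the next inference. Rising scores would then suggest a stabilizing trend. The sketch below assumes that every inference samples the same terminals, so congruence reduces to set equality. The functions `clades`, `congruent_share` and `trend`, the scoring rule, and the toy trees are all hypothetical and are not part of the authors' service.

```python
from typing import FrozenSet, List, Set

Clade = FrozenSet[str]  # a clade, represented by the set of its terminals


def clades(*groups: str) -> Set[Clade]:
    """Build a set of clades from strings such as "AB" (one letter per terminal)."""
    return {frozenset(group) for group in groups}


def congruent_share(earlier: Set[Clade], later: Set[Clade]) -> float:
    """Fraction of clades in `earlier` that have a congruent clade in `later`.

    Assumes both inferences sample identical terminals, so congruence is set equality.
    """
    if not earlier:
        return 1.0
    return len(earlier & later) / len(earlier)


def trend(inferences: List[Set[Clade]]) -> List[float]:
    """Score each consecutive pair of inferences (hypothetical stability metric)."""
    return [congruent_share(a, b) for a, b in zip(inferences, inferences[1:])]


# Hypothetical series of three inferences over terminals A to D.
t1 = clades("AB", "CD", "ABCD")
t2 = clades("AC", "BD", "ABCD")
t3 = clades("AC", "BD", "ABCD")
print(trend([t1, t2, t3]))  # [0.3333333333333333, 1.0]
```

Once studies sample different terminals, the set-equality shortcut breaks down. That is the situation the RCC–5 articulations discussed in this talk are meant to handle.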
• The service needs more granular
identifiers than "just Linnaean names" to
account for study-specific phylogenetic
(sub-)tree concepts.
• The service needs identifier-to-identifier relationships that match the query
semantics, i.e., that represent congruence and conflict, and enable reconciliation,
across concepts associated with multiple inferences.
• Compatibility with both human cognitive constraints¹ and computational
logic is desirable to balance service quality (allowing expert interactions) with
scalability (semi-automated reasoning products).
¹ Atran, S. 1998. Folk biology and the anthropology of science: cognitive universals and cultural particulars. Behavioral and Brain Sciences 21: 547–569.
15. Source: Thau, D.M. 2010. Reasoning about taxonomies. Thesis, UC Davis. http://gradworks.proquest.com/3422778.pdf
Region Connection Calculus (RCC–5) articulations
• Two regions N, M are either:
• congruent (N == M)
• properly inclusive (N < M)
• inversely properly inclusive (N > M)
• overlapping (N >< M)
• exclusive of each other (N ! M)
• RCC–5 articulations match the query: "can we join regions N and M?"
• Phylogenies have multiple RCC–5 alignable components: nodes (parents,
children), node-associated traits, even node-anchoring specimens
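If each region is approximated extensionally, by the finite set of terminals, traits, or specimens it includes, the five articulations can be read off from plain set comparisons. Below is a minimal sketch under that assumption. The function name `rcc5` is ours, and the returned symbols follow the slide's notation.

```python
from typing import AbstractSet


def rcc5(n: AbstractSet[str], m: AbstractSet[str]) -> str:
    """Return the RCC-5 articulation of region N relative to region M.

    Regions are approximated extensionally, as non-empty sets of members.
    """
    if not n or not m:
        raise ValueError("regions must be non-empty")
    if n == m:
        return "=="   # congruent
    if n < m:
        return "<"    # N properly included in M
    if n > m:
        return ">"    # N properly includes M (inverse proper inclusion)
    if n & m:
        return "><"   # overlapping
    return "!"        # exclusive of each other


print(rcc5({"a", "b"}, {"b", "a"}))       # ==
print(rcc5({"a"}, {"a", "b"}))            # <
print(rcc5({"a", "b", "c"}, {"c", "d"}))  # ><
print(rcc5({"a"}, {"b"}))                 # !
```

This shortcut holds only where sampling is complete. With partial sampling, the articulations have to be asserted by experts and reasoned over, as in the use cases that follow.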
17. Overview of RCC–5 alignment use cases
1. Two primate classifications – MSW2 (1993) versus MSW3 (2005)
a. Microcebus + Mirza sec. MSW3 (2005)
b. Quantifying name (identifier) reliability
c. Reasoning achieves scalability (matrix)
2. Lorisiformes sec. MSW3 (2005) versus Springer et al. (2012) phylogeny
a. Phylogeny refines classification
b. Sampling causes non-congruence
3. Avian phylogenies sec. Prum et al. (2015) versus Jarvis et al. (2014)
a. Psittaciformes with & without coverage
b. Precise semiotics for the "avian explosion"
18. Use case 1:
Two primate classifications –
MSW2 (1993) versus MSW3 (2005)
19. Use case 1.a. Aligning Microcebus + Mirza sec. MSW3 (2005)
"Taxonomic concept labels"
identify input concept regions
RCC–5 articulations provided
for each species-level concept
• Input visualization: MSW3 (2005) versus MSW2 (1993)
Source: Franz et al. 2016. Two influential primate classifications logically aligned. Systematic Biology 65(4): 561–582.
20. • Alignment visualization: "grey means congruent"; labeled regions:
• One name & congruent region
• Many names & congruent region
• One name & non-congruent regions
• Many names & non-congruent regions
• New names & exclusive regions
• Application of the coverage constraint: parent-to-parent articulations (><) are
fully defined by the alignment signal propagated from their respective children.
This is sensible when complete sampling of children is intended.
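Under the coverage constraint, each parent concept is exactly the union of its children. Once the species-level articulations are fixed, the parent-to-parent articulation therefore follows mechanically. Below is a minimal sketch of that propagation. It assumes that child-level input is limited to congruence or exclusion, so splits and lumps are not modeled. The function `parent_articulation` and the species labels are hypothetical and are not MSW2/MSW3 data.

```python
from typing import Dict, List, Set


def rcc5(n: Set[str], m: Set[str]) -> str:
    """RCC-5 articulation of N relative to M, for non-empty extensional regions."""
    if n == m:
        return "=="
    if n < m:
        return "<"
    if n > m:
        return ">"
    return "><" if n & m else "!"


def parent_articulation(old_children: List[str], new_children: List[str],
                        congruent: Dict[str, str]) -> str:
    """Derive the old-parent vs new-parent articulation under the coverage constraint.

    Coverage: each parent is exactly the union of its children. Each child is
    treated as one atomic region. An old child listed in `congruent` shares the
    atom of the new child it is congruent with; all other children are exclusive.
    """
    new_atoms = {"new:" + c for c in new_children}
    old_atoms = {"new:" + congruent[c] if c in congruent else "old:" + c
                 for c in old_children}
    return rcc5(old_atoms, new_atoms)


# Hypothetical genus: the newer classification adds species s3.
print(parent_articulation(["s1", "s2"], ["s1", "s2", "s3"], {"s1": "s1", "s2": "s2"}))  # <
# Hypothetical genus: s4 is dropped and s3 is added, so the parents overlap.
print(parent_articulation(["s1", "s2", "s4"], ["s1", "s2", "s3"], {"s1": "s1", "s2": "s2"}))  # ><
```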
27. Use case 1.b.: Quantifying name (identifier) reliability
• Alignment visualization: RCC–5 as an identifier assessment tool [good / not]
• One name & congruent region
• Many names & congruent region
• One name & non-congruent regions
• Many names & non-congruent regions
• New names & exclusive regions
• Query services rendered: (1) MSW3 destabilizes MSW2; (2) non-congruence
is not only caused by differential low-level sampling; (3) alignment constitutes
a taxonomic meaning integration map to navigate across MSW3 & MSW2.
29. Source: Franz et al. 2016. Two influential primate classifications logically aligned. Systematic Biology 65(4): 561–582.
1 in 3 names are unreliable across MSW2/MSW3 classifications
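One way to operationalize name reliability is to call a name reliable when it labels congruent concepts in both classifications, and then report the share of shared names that fail this test. This definition is an assumption made here and is not taken from the paper. The names and articulations in the sketch are hypothetical, and the sketch does not reproduce the figure above.

```python
from typing import Dict, Set, Tuple

# (older concept name, newer concept name) -> RCC-5 articulation symbol
Articulations = Dict[Tuple[str, str], str]


def unreliable_names(arts: Articulations, older: Set[str], newer: Set[str]) -> Set[str]:
    """Names used in both classifications whose same-name concepts are not congruent.

    A shared name without an asserted articulation is also counted as unreliable.
    """
    return {name for name in older & newer if arts.get((name, name)) != "=="}


def unreliable_share(arts: Articulations, older: Set[str], newer: Set[str]) -> float:
    """Unreliable names as a fraction of all names shared by the two classifications."""
    shared = older & newer
    return len(unreliable_names(arts, older, newer)) / len(shared) if shared else 0.0


# Hypothetical example: the newer concept D was split off from C.
older = {"A", "B", "C", "E"}
newer = {"A", "B", "C", "D", "E"}
arts = {("A", "A"): "==", ("B", "B"): "==", ("E", "E"): "==",
        ("C", "C"): ">", ("C", "D"): ">"}
print(sorted(unreliable_names(arts, older, newer)), unreliable_share(arts, older, newer))
# ['C'] 0.25
```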
30. Use case 1.c.: Reasoning achieves scalability (MIR matrix)
Source: Dang et al. 2015. ProvenanceMatrix: a visualization tool for multi-taxonomy alignments. CEUR Workshop
Proceedings 1456: 13–24. http://ceur-ws.org/Vol-1456/paper2.pdf
• Input: 402 articulations. Output: 153,111 Maximally Informative Relations
• Salmon cells ↔ relations obtained by reasoning
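Euler/X derives the maximally informative relations by logical reasoning over the input articulations. The sketch below takes a much cruder shortcut and is not the Euler/X reasoner. It is valid only when every concept's extension is fully known, meaning coverage holds everywhere and congruent terminals are already identified as shared atoms. Under that assumption, the matrix holds exactly one RCC–5 relation per cross-taxonomy concept pair. All concept names are hypothetical.

```python
from itertools import product
from typing import Dict, List, Set, Tuple


def rcc5(n: Set[str], m: Set[str]) -> str:
    """RCC-5 articulation of N relative to M, for non-empty extensional regions."""
    if n == m:
        return "=="
    if n < m:
        return "<"
    if n > m:
        return ">"
    return "><" if n & m else "!"


def extensions(children: Dict[str, List[str]], root: str) -> Dict[str, Set[str]]:
    """Map every concept under `root` to its set of terminal atoms (coverage assumed)."""
    ext: Dict[str, Set[str]] = {}

    def visit(node: str) -> Set[str]:
        kids = children.get(node, [])
        ext[node] = set().union(*(visit(k) for k in kids)) if kids else {node}
        return ext[node]

    visit(root)
    return ext


def mir_matrix(ext1: Dict[str, Set[str]],
               ext2: Dict[str, Set[str]]) -> Dict[Tuple[str, str], str]:
    """One RCC-5 relation for every cross-taxonomy concept pair."""
    return {(a, b): rcc5(ext1[a], ext2[b]) for a, b in product(ext1, ext2)}


# Hypothetical taxonomies; shared terminal atoms a, b, c stand for congruent species.
t1 = {"1.R": ["1.X", "c"], "1.X": ["a", "b"]}
t2 = {"2.R": ["a", "2.Y"], "2.Y": ["b", "c"]}
matrix = mir_matrix(extensions(t1, "1.R"), extensions(t2, "2.R"))
print(len(matrix), matrix[("1.X", "2.Y")], matrix[("1.R", "2.R")])  # 25 >< ==
```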
36. Use case 2.a.: Phylogeny (2012) refines classification (2005)
• At low-to-mid levels, phylogeny often congruently refines classification.
• Alignment visualization; labeled regions: congruent refinement; unique species concepts
38. Use case 2.b.: Sampling alone causes high-level non-congruence
• While phylogeny often congruently refines classification at low-to-mid levels, differential low-level sampling causes 'excessive' non-congruence at higher levels that may not reflect the authors' intentions.
• Alignment visualization; labeled regions: congruent refinement; unique species concepts; higher-level conflict
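A toy example shows how sampling alone produces this effect under the coverage constraint. The species a to d, h and x, and the concepts G and Fam, are hypothetical. The two sources never disagree about where a shared species belongs. Even so, the genus comes out properly inclusive and the family comes out overlapping.

```python
from typing import Set


def rcc5(n: Set[str], m: Set[str]) -> str:
    """RCC-5 articulation of N relative to M, for non-empty extensional regions."""
    if n == m:
        return "=="
    if n < m:
        return "<"
    if n > m:
        return ">"
    return "><" if n & m else "!"


# Hypothetical circumscriptions under the coverage constraint. Both sources agree
# on where every shared species belongs (a, b, c in G; h in the family outside G).
# They differ only in sampling: d only in the classification, x only in the phylogeny.
classification = {"G": {"a", "b", "c", "d"}, "Fam": {"a", "b", "c", "d", "h"}}
phylogeny = {"G": {"a", "b", "c"}, "Fam": {"a", "b", "c", "h", "x"}}

for concept in ("G", "Fam"):
    print(concept, rcc5(classification[concept], phylogeny[concept]))
# G >
# Fam ><
```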
39. Use case 3:
Avian phylogenies sec. Prum et al. (2015)
versus Jarvis et al. (2014)
41. Use case 3: Aves sec. Prum et al. (2015) versus Jarvis et al. (2014)
• Sampling is highly differential: 198 versus 48 species-level entities
• Only 12 species-level concept pairs are congruent [green cells]
42. Use case 3.a.: Psittaciformes with & without coverage constraint
• Psittaciformes sec. 2015 – with global coverage constraint
• Input visualization: only disjoint articulations
• No low-level congruence ↔ no congruent alignment regions
• Alignment visualization: 108 MIR; all disjoint
44. • Psittaciformes sec. 2015 – with coverage locally relaxed
• No coverage constraint for 2014/2015.[Psittacidae, Nestor]
• Input visualization: allows for 3 congruent & 7 inclusive RCC–5 articulations
• Higher-level congruence despite low-level non-congruence
• Alignment visualization: 160 MIR, of which 10 congruent and 65 (inversely) properly inclusive; labeled region: additional 2015 low-level sampling
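The effect of relaxing coverage can be shown by brute force on a toy case. With coverage, a parent is exactly the union of its sampled children. Without coverage, the parent may also hold members that the study did not sample. The sketch enumerates such completions and collects every RCC–5 relation that remains possible. As an assumption, unsampled members are modeled as the other study's sampled members plus two placeholder atoms. The function names and species labels are hypothetical, and this enumeration is not how Euler/X implements the constraint.

```python
from itertools import chain, combinations, product
from typing import Iterable, Iterator, Set, Tuple


def rcc5(n: Set[str], m: Set[str]) -> str:
    """RCC-5 articulation of N relative to M, for non-empty extensional regions."""
    if n == m:
        return "=="
    if n < m:
        return "<"
    if n > m:
        return ">"
    return "><" if n & m else "!"


def subsets(items: Iterable[str]) -> Iterator[Tuple[str, ...]]:
    """Every subset of `items` (itertools powerset recipe), including the empty one."""
    pool = sorted(items)
    return chain.from_iterable(combinations(pool, r) for r in range(len(pool) + 1))


def possible_relations(sampled_a: Set[str], sampled_b: Set[str],
                       coverage_a: bool, coverage_b: bool) -> Set[str]:
    """All RCC-5 relations between parents A and B consistent with their sampled members.

    With coverage, a parent is exactly its sampled members. Without coverage it may
    also contain unsampled members, modeled (by assumption) as the other side's
    sampled members plus two placeholder atoms.
    """
    universe = sampled_a | sampled_b | {"_unsampled1", "_unsampled2"}
    extras_a = [()] if coverage_a else list(subsets(universe - sampled_a))
    extras_b = [()] if coverage_b else list(subsets(universe - sampled_b))
    return {rcc5(sampled_a | set(xa), sampled_b | set(xb))
            for xa, xb in product(extras_a, extras_b)}


# Hypothetical parents whose only sampled species, a and b, are disjoint.
print(sorted(possible_relations({"a"}, {"b"}, True, True)))    # ['!']
print(sorted(possible_relations({"a"}, {"b"}, False, False)))  # ['!', '<', '==', '>', '><']
```

With coverage on both sides, only exclusion remains. With coverage relaxed on both sides, congruence becomes possible even though no species-level pair is congruent. This mirrors, in miniature, the shift from "all disjoint" to congruent parent articulations on the slides.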
49. Use case 3.b.: Precise semiotics for the "avian explosion"
• Aves sec. 2015/2014, down to ordinal level – with coverage locally relaxed
• Labeled regions: non-congruence within 2015.Paleognathae; non-congruence within 2014.Pelecanimorphae; non-congruence within 2015/2014.Neoaves (see next slide)
52. • Neoaves sec. 2015/2014, and 3–4 less inclusive levels
• 26 overlapping articulations in the sub-Neoavian alignment region cannot be assigned to differential sampling: 'genuine' phylogenetic conflict
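One simple heuristic can separate sampling artifacts from 'genuine' conflict. Restrict both overlapping concepts to the terminals sampled in both studies. If the overlap disappears, attribute it to differential sampling; otherwise, flag it as conflict. This heuristic is an assumption made here and is not necessarily the authors' procedure. The concept labels and terminals are hypothetical.

```python
from typing import List, Set, Tuple

Pair = Tuple[str, Set[str], str, Set[str]]  # (label A, extension A, label B, extension B)


def rcc5(n: Set[str], m: Set[str]) -> str:
    """RCC-5 articulation of N relative to M, for non-empty extensional regions."""
    if n == m:
        return "=="
    if n < m:
        return "<"
    if n > m:
        return ">"
    return "><" if n & m else "!"


def explain_overlaps(pairs: List[Pair], sampled_a: Set[str],
                     sampled_b: Set[str]) -> List[Tuple[str, str, str]]:
    """Label each overlapping concept pair as a sampling artifact or genuine conflict."""
    shared = sampled_a & sampled_b
    verdicts = []
    for label_a, ext_a, label_b, ext_b in pairs:
        if rcc5(ext_a, ext_b) != "><":
            continue  # only overlapping articulations need explaining
        ra, rb = ext_a & shared, ext_b & shared
        genuine = bool(ra) and bool(rb) and rcc5(ra, rb) == "><"
        verdicts.append((label_a, label_b, "genuine conflict" if genuine else "sampling"))
    return verdicts


sampled_2015 = {"a", "b", "c", "d"}
sampled_2014 = {"a", "b", "c", "e"}
pairs = [
    ("2015.X", {"a", "b", "d"}, "2014.X", {"a", "b", "e"}),  # differ only in unshared terminals
    ("2015.Y", {"a", "b"}, "2014.Y", {"b", "c"}),            # disagree about shared terminals
]
for verdict in explain_overlaps(pairs, sampled_2015, sampled_2014):
    print(verdict)
# ('2015.X', '2014.X', 'sampling')
# ('2015.Y', '2014.Y', 'genuine conflict')
```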
53. Example service queries:
1. Do succeeding inferences have stabilizing
or destabilizing trends?
2. Are multiple, differentially sub-sampled
phylogenies congruent or conflicting?
3. How to control for the "phylogenetic
variable" affecting evolutionary inferences?
In conclusion:
How to build such a service?
54. Creating a Phylogenetic Knowledge Advancement Service
• Explicitly represent human theories, and only indirectly taxa (species, clades).
• Identify study-specific phylogenetic (sub-)tree concepts with precision.
• Utilize identifier-to-identifier relationships to represent the multi-component
nature of phylogenetic congruence & non-congruence:
• RCC–5 for nodes & terminals: with coverage relaxed where differential sampling warrants it
• RCC–5 for character concepts & specimen sets (not shown)
• Excel at representing and explaining conflict, as opposed to 'voting' (which is
external, a posteriori); enforce multi-theory alignment, not synthesis.
• Explicitly identify speaker intentions – by modeling phylogenetically
localized intentions of semantic precision (and in what specific sense?) versus
ambiguity – vis-à-vis their own products, and in relation to those of other
author teams. Otherwise, signal interpretation remains unreliable.
58. Acknowledgements & links to products and references
• Euler/X team: Shawn Bowers, Parisa Kianmajd, Timothy McPhillips.
• ProvenanceMatrix: Tuan Nhon Dang.
• NSF DEB–1155984, DBI–1342595 (PI Franz).
• NSF IIS–118088, DBI–1147273 (PI Ludäscher).
• Information @ http://taxonbytes.org/tag/concept-taxonomy/
• Euler/X code @ https://github.com/EulerProject/EulerX
• Franz et al. 2015. Reasoning over taxonomic change: exploring alignments for
the Perelleschus use case. PLoS ONE 10(2): e0118247.
• Franz et al. 2016. Two influential primate classifications logically aligned.
Systematic Biology 65(4): 561–582.
The more one looks, the more complicated it gets. Notice also the node labeling, or lack thereof.
Some of our trees are not necessarily "of life", but are best modeled as our more or less flawed approximations. (Most trees are false and also approximately true.)
Euler/X is available both as a desktop CLI toolkit, and as a web-based service of the "Explorer of Taxon Concepts" project. Links provided at the end.
The simple semantics of RCC-5 makes this a rather generic vocabulary for representing advancement in phylogenetic knowledge. At the same time, the onus is on phylogeneticists to apply the articulations in such ways that the desired query services are actually obtained.
The key issue here lies in translation; we will modulate some logic constraints here.
This table can be queried for any taxonomic concept pair, to inform data integration across the two aligned classifications.