Franz et al evol 2016 representing phylogeny as a logically tractable variable

Representing phylogeny as
a logically tractable variable
Please
@taxonbytes
Nico M. Franz¹, Guanyang Zhang¹, Shizhuo Yu² & Bertram Ludäscher³
¹ School of Life Sciences, Arizona State University
² Computer Science, University of California at Davis
³ iSchool, University of Illinois at Urbana-Champaign
Systematics / Bioinformatics Session – Evolution 2016 Meetings
June 20, 2016 - Austin, Texas (#Evol2016)
@ http://www.slideshare.net/taxonbytes/franz-et-al-evol-2016-representing-phylogeny-as-a-logically-tractable-variable

Source: Thomas, G.H. 2015. An avian explosion. Nature 526: 516–517. doi:10.1038/nature15638
[Figure: avian phylogenies, 2015 versus 2014]
Phylogenetic inferences
can vary over time.

Querying phylogenetic advancement – premises & questions
• The term "tree of life" characterizes a goal that we strive to reach
eventually, more so than where we are now (for many perceived groups).

• Therefore it would be useful to have a "phylogenetic knowledge
advancement service". The service satisfies queries such as:

1. "Does this sequence of related phylogenetic inferences
have a stabilizing or destabilizing trend?"

2. "Are two or more phylogenies – each differentially sub-sampled
at lower levels – in congruence or in conflict?"

3. "How can an evolutionary study tied to one (earlier) phylogeny
be "updated" (integrated) with another (later) phylogeny?"

Service → We can prioritize research agendas accordingly.
Service → Is sampling an issue? Or are signals complementary?
Service → Effects of the "phylogenetic variable" on conclusions can be controlled for.

Research question:
How to build such a service?

Phylogenetic advancement service – design specifications
Example service queries:
1. Do succeeding inferences have stabilizing
or destabilizing trends?
2. Are multiple, differentially sub-sampled
phylogenies congruent or conflicting?
3. How to control for the "phylogenetic
variable" affecting evolutionary inferences?

• The service needs more granular
identifiers than "just Linnaean names" to
account for study-specific phylogenetic
(sub-)tree concepts.

• The service needs identifier-to-identifier relationships that match the query
semantics by representing congruence, conflict, and achieving reconciliation
across concepts associated with multiple inferences.

• Compatibility with both human cognitive constraints¹ and computational
logic is desirable to balance service quality (allowing expert interactions) with
scalability (semi-automated reasoning products).
¹ Atran, S. 1998. Folk biology and the anthropology of science: cognitive universals and cultural particulars. Behavioral and Brain Sciences 21: 547–569.

Proposed solution: Euler/X – logically consistent RCC–5 alignments
• Input: multiple taxonomies and/or phylogenies; expert-provided articulations
• Output: logical consistency checking; Maximally Informative Relations (MIR);
alignment visualizations

Source: Thau, D.M. 2010. Reasoning about taxonomies. Thesis, UC Davis. http://gradworks.proquest.com/3422778.pdf
Region Connection Calculus (RCC–5) articulations
• Two regions N, M are either:
• congruent (N == M)
• properly inclusive (N < M)
• inversely properly inclusive (N > M)
• overlapping (N >< M)
• exclusive of each other (N ! M)

• RCC–5 articulations match the query: "can we join regions N and M?"
• Phylogenies have multiple RCC–5 alignable components: nodes (parents,
children), node-associated traits, even node-anchoring specimens
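
The five relations can be made concrete with a minimal Python sketch. It assumes, purely for illustration, that each region is a finite, non-empty set of member labels (for example specimen identifiers). The `RCC5` enum, the `rcc5` function, and the labels are hypothetical names and are not part of Euler/X.

```python
from enum import Enum


class RCC5(Enum):
    """The five RCC-5 relations, with the symbols used on the slide."""
    CONGRUENT = "=="
    INCLUDED_IN = "<"   # N is properly included in M
    INCLUDES = ">"      # N properly includes M
    OVERLAPS = "><"
    DISJOINT = "!"


def rcc5(n: frozenset, m: frozenset) -> RCC5:
    """Return the single RCC-5 relation holding between regions n and m."""
    if not n or not m:
        raise ValueError("regions must be non-empty")
    if n == m:
        return RCC5.CONGRUENT
    if n < m:
        return RCC5.INCLUDED_IN
    if n > m:
        return RCC5.INCLUDES
    if n & m:
        return RCC5.OVERLAPS
    return RCC5.DISJOINT


# Hypothetical regions, each modeled as a set of member labels.
a = frozenset({"s1", "s2"})
b = frozenset({"s1", "s2", "s3"})
c = frozenset({"s3", "s4"})

print(rcc5(a, b).value)  # <
print(rcc5(b, c).value)  # ><
print(rcc5(a, c).value)  # !
```

Because the five cases are mutually exclusive and jointly exhaustive for non-empty regions, every pair of concepts receives exactly one relation, which is what makes the articulations usable as query answers.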

Overview of RCC–5 alignment use cases
1. Two primate classifications – MSW2 (1993) versus MSW3 (2005)
a. Microcebus + Mirza sec. MSW3 (2005)
b. Quantifying name (identifier) reliability
c. Reasoning achieves scalability (matrix)
2. Lorisiformes sec. MSW3 (2005) versus Springer et al. (2012) phylogeny
a. Phylogeny refines classification
b. Sampling causes non-congruence
3. Avian phylogenies sec. Prum et al. (2015) versus Jarvis et al. (2014)
a. Psittaciformes with & without coverage
b. Precise semiotics for the "avian explosion"

Use case 1:
Two primate classifications –
MSW2 (1993) versus MSW3 (2005)

Use case 1.a. Aligning Microcebus + Mirza sec. MSW3 (2005)
"Taxonomic concept labels"
identify input concept regions
RCC–5 articulations provided
for each species-level concept
• Input visualization: MSW3 (2005) versus MSW2 (1993)
Source: Franz et al. 2016. Two influential primate classifications logically aligned. Systematic Biology 65: 561–582.

• Alignment visualization: "grey means congruent"

[Figure labels: one name & congruent region; many names & congruent region; one name & non-congruent regions; many names & (label truncated); new names & exclusive regions]
• Application of coverage constraint: parent-to-parent articulations (><) are
fully defined by alignment signal propagated from their respective children.
→ Sensible when complete sampling of children is intended.
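
A rough sketch of how coverage propagates child-level signal upward, under the assumption that every concept region can be modeled as a set of member labels. With coverage, a parent region is taken to be exactly the union of its children's regions, so the parent-to-parent relation follows from the children alone. The function names and toy labels are hypothetical and do not reproduce Euler/X's reasoning.

```python
def relation(n: frozenset, m: frozenset) -> str:
    """RCC-5 symbol for two non-empty regions modeled as sets."""
    if n == m:
        return "=="
    if n < m:
        return "<"
    if n > m:
        return ">"
    return "><" if n & m else "!"


def parent_under_coverage(children: dict[str, frozenset]) -> frozenset:
    """Coverage assumption: the parent is exactly the union of its children."""
    return frozenset().union(*children.values())


# Hypothetical child concepts of the same parent in two classifications.
earlier = {"sp_a": frozenset({"x1", "x2"}), "sp_b": frozenset({"x3"})}
later = {"sp_a": frozenset({"x1", "x2"}), "sp_c": frozenset({"x4"})}

parent_earlier = parent_under_coverage(earlier)
parent_later = parent_under_coverage(later)

# One shared child plus one unique child on each side yields overlap.
print(relation(parent_earlier, parent_later))  # ><
```

In this toy case the parents overlap only because each side holds one child the other lacks, which is the kind of signal that coverage pushes up the hierarchy.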

Use case 1.b.: Quantifying name (identifier) reliability
• Alignment visualization: RCC–5 as an identifier assessment tool [good / not]
[Figure labels: one name & congruent region; many names & congruent region; one name & non-congruent regions; many names & (label truncated); new names & exclusive regions]
• Query services rendered: (1) MSW3 destabilizes MSW2; (2) non-congruence
is not only caused by differential low-level sampling; (3) alignment constitutes
a taxonomic meaning integration map to navigate across MSW3 & MSW2.

Source: Franz et al. 2016. Two influential primate classifications logically aligned. Systematic Biology 65: 561–582.
1 in 3 names are unreliable across MSW2/MSW3 classifications
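
One way such a reliability figure could be computed is sketched below. The sketch assumes a name counts as reliable only if it appears in both classifications and its two regions are congruent; that criterion, the function names, and the toy data are illustrative assumptions, and the numbers are not the MSW2/MSW3 data.

```python
from fractions import Fraction


def name_reliability(earlier: dict[str, frozenset],
                     later: dict[str, frozenset]) -> dict[str, bool]:
    """Flag each name as reliable (True) or unreliable (False).

    Assumed criterion: the name is used in both classifications
    and refers to congruent regions in each.
    """
    flags = {}
    for name in sorted(set(earlier) | set(later)):
        in_both = name in earlier and name in later
        flags[name] = in_both and earlier[name] == later[name]
    return flags


def unreliable_fraction(flags: dict[str, bool]) -> Fraction:
    """Share of names whose meaning is not stable across classifications."""
    if not flags:
        raise ValueError("no names to assess")
    return Fraction(sum(not ok for ok in flags.values()), len(flags))


# Hypothetical names mapped to regions (sets of member labels).
earlier = {
    "Genus_a": frozenset({"s1", "s2", "s3"}),
    "Genus_b": frozenset({"s4"}),
}
later = {
    "Genus_a": frozenset({"s1", "s2"}),  # same name, narrower region
    "Genus_b": frozenset({"s4"}),        # same name, congruent region
    "Genus_c": frozenset({"s3"}),        # new name
}

flags = name_reliability(earlier, later)
print(flags)
print(unreliable_fraction(flags))  # 2/3
```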

Use case 1.c.: Reasoning achieves scalability (MIR matrix)
Source: Dang et al. 2015. ProvenanceMatrix: a visualization tool for multi-taxonomy alignments. CEUR Workshop
Proceedings 1456: 13–24. http://ceur-ws.org/Vol-1456/paper2.pdf
• Input: 402 articulations. Output: 153,111 Maximally Informative Relations
Salmon cells
↔ reasoning
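
The matrix idea can be illustrated with a toy sketch in which every cross-classification pair of concepts receives one RCC–5 relation. Here the relations are read off explicit sets, which stands in for the logic reasoning that Euler/X performs over articulations; the names `relation` and `mir_matrix` and the concept labels are hypothetical. As an aside, 153,111 factors as 483 × 317, which would fit one relation per concept pair, but the per-classification concept counts are not given on the slide, so treat that reading as an assumption.

```python
from collections import Counter
from itertools import product


def relation(n: frozenset, m: frozenset) -> str:
    """RCC-5 symbol for two non-empty regions modeled as sets."""
    if n == m:
        return "=="
    if n < m:
        return "<"
    if n > m:
        return ">"
    return "><" if n & m else "!"


def mir_matrix(t1: dict[str, frozenset],
               t2: dict[str, frozenset]) -> dict[tuple[str, str], str]:
    """One relation for every (concept in t1, concept in t2) pair."""
    return {(a, b): relation(t1[a], t2[b]) for a, b in product(t1, t2)}


# Hypothetical concepts in two classifications.
t1 = {
    "T1.G": frozenset({"x1", "x2", "x3"}),
    "T1.G_a": frozenset({"x1", "x2"}),
    "T1.G_b": frozenset({"x3"}),
}
t2 = {
    "T2.G": frozenset({"x1", "x2"}),
    "T2.H": frozenset({"x3"}),
}

mir = mir_matrix(t1, t2)
print(len(mir))               # 6
print(Counter(mir.values()))
```

The output grows with the product of the two concept counts, which is why a few hundred input articulations can yield a far larger set of inferred relations.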

Use case 2:
Lorisiformes sec. MSW3 versus
Springer et al. (2012) phylogeny

Use case 2.a.: Phylogeny (2012) refines classification (2005)
Springer et al. (2012) @ OpenTree
https://tree.opentreeoflife.org/curator/study/view/pg_2656

Convert OpenTree-supported NeXML files into Euler/X input (credit: Daisie Huang):
https://github.com/daisieh/phylogenomics/blob/master/converting/nexml_to_euler.py &
https://github.com/daisieh/phylogenomics/blob/master/converting/relabel_euler.py
Euler/X input
visualization

MSW3 (2005)

• At low-to-mid levels, phylogeny often congruently refines classification.
[Alignment visualization; highlighted: congruent refinement, unique species concepts]

Use case 2.b.: Sampling alone causes high-level non-congruence
• However, differential low-level sampling causes 'excessive' non-congruence
at higher levels that may not reflect the authors' intentions.
[Alignment visualization; highlighted: unique species concepts, higher-level conflict]

Use case 3:
Avian phylogenies sec. Prum et al. (2015)
versus Jarvis et al. (2014)

Source: Thomas, G.H. 2015. An avian explosion. Nature 526: 516–517. doi:10.1038/nature15638
[Figure: avian phylogenies, 2015 versus 2014]
Let's recall...

Use case 3: Aves sec. Prum et al. (2015) versus Jarvis et al. (2014)
• Sampling is highly differential: 198 versus 48 species-level entities
• Only 12 species-level concept pairs are congruent [green cells]

Use case 3.a.: Psittaciformes with & without coverage constraint
• Psittaciformes sec. 2015 – with global coverage constraint
• No low-level congruence ↔ no congruent alignment regions
Input visualization
→ Only disjoint articulations
Alignment visualization
→ 108 MIR; all disjoint

• Psittaciformes sec. 2015 – with coverage locally relaxed
• No coverage constraint for 2014/2015.[Psittacidae, Nestor]
• Allows for 3 congruent & 7 inclusive RCC–5 articulations
Input visualization
• Higher-level congruence despite low-level non-congruence
• 160 MIR: 10 congruent; 65 (inversely) properly inclusive
→ Additional 2015 low-level sampling
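
The contrast between global and locally relaxed coverage can be sketched as follows. The sketch assumes regions are sets of member labels and models relaxation by letting a parent region also contain members that its authors did not sample; this modeling device, the function names, and the labels are all hypothetical and are not Euler/X's mechanism.

```python
def relation(n: frozenset, m: frozenset) -> str:
    """RCC-5 symbol for two non-empty regions modeled as sets."""
    if n == m:
        return "=="
    if n < m:
        return "<"
    if n > m:
        return ">"
    return "><" if n & m else "!"


def parent_region(children: dict[str, frozenset],
                  unsampled: frozenset = frozenset()) -> frozenset:
    """Union of the sampled children, plus any unsampled members."""
    return frozenset().union(*children.values()) | unsampled


# Hypothetical: the two phylogenies sample entirely different species.
earlier = {"sp_p": frozenset({"p"})}
later = {"sp_q": frozenset({"q"}), "sp_r": frozenset({"r"})}

# Global coverage: parents are only what was sampled, so they are disjoint.
print(relation(parent_region(earlier), parent_region(later)))  # !

# Relaxed coverage: assume both author teams intend the same clade.
relaxed_earlier = parent_region(earlier, unsampled=frozenset({"q", "r"}))
relaxed_later = parent_region(later, unsampled=frozenset({"p"}))
print(relation(relaxed_earlier, relaxed_later))  # ==
print(relation(earlier["sp_p"], relaxed_later))  # <
```

Under these assumptions the parents become congruent while the terminals stay non-congruent, and a terminal sampled only in the earlier tree is properly included in the later parent.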

Use case 3.b.: Precise semiotics for the "avian explosion"
• Aves sec. 2015/2014, down to ordinal level – with coverage locally relaxed

• Non-congruence within 2015.Paleognathae
• Non-congruence within 2014.Pelecanimorphae
• Non-congruence within 2015/2014.Neoaves (see next slide)

• Neoaves sec. 2015/2014, and 3–4 less inclusive levels
→ 26 overlapping articulations in the sub-Neoavian alignment region cannot be
attributed to differential sampling
→ 'Genuine' phylogenetic conflict

In conclusion:
How to build such a service?

Creating a Phylogenetic Knowledge Advancement Service
• Explicitly represent human theories, and only indirectly taxa (species, clades).
• Identify study-specific phylogenetic (sub-)tree concepts with precision.

• Utilize identifier-to-identifier relationships to represent the multi-component
nature of phylogenetic congruence & non-congruence:
• RCC–5 for nodes & terminals: w/o coverage (→ sampling)
• RCC–5 for character concepts & specimen sets (not shown)

• Excel at representing and explaining conflict, as opposed to 'voting' (which is
external, a posteriori); enforce multi-theory alignment, not synthesis.

• Explicitly identify speaker intentions – by modeling phylogenetically
localized intentions of semantic precision (and in what specific sense?) versus
ambiguity – vis-à-vis their own products, and in relation to those of other
author teams. Otherwise, signal interpretation remains unreliable.

Acknowledgements & links to products and references
• Euler/X team: Shawn Bowers, Parisa Kianmajd, Timothy McPhillips.
• ProvenanceMatrix: Tuan Nhon Dang.
• NSF DEB–1155984, DBI–1342595 (PI Franz).
• NSF IIS–118088, DBI–1147273 (PI Ludäscher).
• Information @ http://taxonbytes.org/tag/concept-taxonomy/
• Euler/X code @ https://github.com/EulerProject/EulerX
• Franz et al. 2015. Reasoning over taxonomic change: exploring alignments for
the Perelleschus use case. PLoS ONE 10(2): e0118247.
• Franz et al. 2016. Two influential primate classifications logically aligned.
Systematic Biology 65(4): 561–582.

Interested in exploring
multi-phylogeny alignments?
Please contact me.
nico.franz@asu.edu
@taxonbytes
https://biokic.asu.edu/
