Franz 2016 Phenotype RCN Representing Taxonomy and Phylogeny as Logically Tractable Variables
The evolving synthesis chain
Our human-made taxonomies and phylogenies –
intended to hierarchically represent organismal
lineage identities and relationships – are not a
constant over time. At any time [1], the prospect of
a comprehensive tree of life synthesis is immensely
motivating to research communities advancing
comparative phenomic knowledge. But the terms
"tree of life" and "synthesis" are also capable of
deceiving us. They focus us psychologically on the
here and now, and may therefore obscure the need
to build information systems that are (1) capable
of representing multiple, incremental stages along
the complex path towards 'the synthetic tree of life'
(→ representation), and that (2) entail semantic
linkages among the frequently divergent signals
that each stage can emit (→ reasoning).
We may attempt to look deeper into the past and
future of systematic inference generation, and ask:
"how well is our present 'synthesis' aligned with
one that we will have 25 years from now?" This
question is relevant to knowledge integration and
reproducible science. If we look 25 years into the
past (ca. 1990), we can gauge its significance.
Designing for the failure to refer
Speaking provocatively, but also with an eye on
sustainable information management design, our
emerging environments are not necessarily 'of
taxa' or 'of phyla'. They are more immediately of
human theories about these purported evolutionary
entities. Even well into the 21st century, our
hierarchical theories are consistently expanded,
reconfirmed, or partly rejected and revised,
sometimes in dramatic fashion, to approximate the
tree of life better than before.
It is a personal and social challenge to counteract
the allure of an 'almost-within-reach' synthesis,
and instead recognize the ephemerality of
contemporary inferences, in order to create sustainable
environments. While we may want stable classes
and identifiers 'for taxa and phyla' in our systems
now, we should build for the possibility of these
entities failing to refer to natural entities. We
should not trust our 'inductive wants' vis-à-vis
current systematic knowledge – which are
cognitively constrained [2] – but instead design
knowledge transition systems. Building such
systems is also a logic representation challenge,
and more directly, a data service design challenge.
In summary, open-ended environments for
comparative phenomic data should be designed to
represent taxonomy and phylogeny as logically
tractable variables, whose influence on
particular comparative inferences can be assessed
and reassessed over time and across parallel or
succeeding phenomic analyses.
Representing taxonomy and phylogeny
as logically tractable variables
Nico M. Franz
School of Life Sciences, Arizona State University
URL: https://biokic.asu.edu; E-mail: nico.franz@asu.edu
Figs. 1 & 2. (1) Above: Toolkit workflow schema. T1, T2 =
input trees, A = RCC-5 articulations, C = additional tree
constraints, MIR = Maximally Informative Relations. (2)
Right: Abstract toolkit example, with user-provided input
and inferred output alignment visualizations. Source: [7].
Acknowledgments
Thanks to the Euler/X logic team: Shawn Bowers, Tuan Dang, Parisa Kianmajd, Bertram
Ludäscher (PI), Timothy McPhillips & Shizhuo Yu; and the ETC team: Hong Cui (PI), James
Macklin & Thomas Rodenhausen. This research is supported through the grants: NSF
DEB–1155984, DBI–1342595 (Franz); and IIS–118088, DBI–1147273 (Ludäscher).
References
[1] Haeckel. 1866. Generelle Morphologie der Organismen. doi: 10.5962/bhl.title.3953
[2] Atran. 1998. Folk biology and the anthropology of science: cognitive universals and
cultural particulars. doi: 10.1017/S0140525X98001277
[3] Euler/X project @ GitHub: https://github.com/EulerProject/EulerX
[4] Chen et al. 2014. Euler/X: a toolkit for logic-based taxonomy integration.
http://arxiv.org/abs/1402.1992
[5] Franz et al. 2016. Two influential primate classifications logically aligned. To appear in
Systematic Biology. http://arxiv.org/abs/1412.1025
[6] Franz et al. 2016. Names are not good enough: reasoning over taxonomic change in the
Andropogon complex. To appear in Semantic Web Journal. doi: 10.3233/SW-160220
[7] Franz et al. 2015. Reasoning over taxonomic change: exploring alignments for the
Perelleschus use case. PLoS ONE 10(2): e0118247. doi: 10.1371/journal.pone.0118247
[8] Jansen & Franz. 2015. Phylogenetic revision of Minyomerus Horn, 1876 sec. Jansen &
Franz, 2015 (Coleoptera, Curculionidae) using taxonomic concept annotations and
alignments. ZooKeys 528: 1–133. doi: 10.3897/zookeys.528.6001
[9] Dang et al. 2015. ProvenanceMatrix: a visualization tool for multi-taxonomy
alignments. CEUR Workshop Proceedings 1456: 14–23.
Aligning evolving syntheses
In the Euler/X project [3], we are developing
novel logic services and use cases that demonstrate
the feasibility of managing the taxonomic and
phylogenetic variables in open-ended systems. The
application is more fully described in [4]. We
represent taxonomic concepts and leverage Region
Connection Calculus (RCC-5) articulations, in
combination with off-the-shelf and custom Answer
Set Programming and RCC reasoners, to achieve
logically consistent and well-specified alignments
of semantically heterogeneous taxonomies and
phylogenies (Figs. 1 & 2).
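The five RCC-5 relations can be illustrated with a minimal sketch. This shows only what each relation means, not how Euler/X reasons: the toolkit works from user-provided articulations and tree constraints (Fig. 1), whereas the sketch assumes that each taxonomic concept is available as an explicit set of member identifiers. The names `RCC5`, `rcc5_relation`, and the specimen identifiers are hypothetical and introduced here only for illustration.

```python
from enum import Enum


class RCC5(Enum):
    """The five base relations of RCC-5 (labels are this sketch's own)."""
    CONGRUENT = "is congruent with"
    PROPER_PART = "is properly included in"
    INVERSE_PROPER_PART = "properly includes"
    OVERLAP = "overlaps with"
    DISJOINT = "is disjoint from"


def rcc5_relation(a: frozenset, b: frozenset) -> RCC5:
    """Return the single RCC-5 relation that holds between two non-empty sets."""
    if not a or not b:
        raise ValueError("each concept needs at least one member")
    if a == b:
        return RCC5.CONGRUENT
    if a < b:  # a is a proper subset of b
        return RCC5.PROPER_PART
    if a > b:  # a is a proper superset of b
        return RCC5.INVERSE_PROPER_PART
    if a & b:  # some shared members, but neither set contains the other
        return RCC5.OVERLAP
    return RCC5.DISJOINT


# Hypothetical concepts from two treatments, given as sets of specimen IDs.
concept_t1 = frozenset({"spec-1", "spec-2", "spec-3"})
concept_t2 = frozenset({"spec-2", "spec-3", "spec-4"})
relation = rcc5_relation(concept_t1, concept_t2)
print(f"T1 concept {relation.value} T2 concept")
```

Exactly one of the five relations holds for any pair of non-empty sets, which is what allows an articulation to be stated unambiguously or, where the expert is uncertain, as a disjunction of relations.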
The alignments are intentionally not 'objective',
but instead reflect one or more systematic experts'
subjective and purpose-driven, but logically
explicit, perspectives on how to integrate across
succeeding meaning hierarchies. The resulting
alignment visualizations are taxonomic and
phylogenetic meaning transition maps. One direct
benefit of inferring such maps is that the reliability
of taxonomic names and of phyloreferences can be
quantified through the semantics of RCC-5
articulations (Table 1; Fig. 4).
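One way to make such a quantification concrete is sketched below. The reliability criterion is an assumption made for this illustration and is not necessarily the definition used in [5]: a name pair counts as 'reliable' when name identity agrees with meaning identity, i.e., the two names are the same exactly when the two concepts are congruent. The function names and the placeholder taxon names are hypothetical.

```python
def is_reliable(name_t1: str, name_t2: str, relation: str) -> bool:
    """Assumed criterion: same name if and only if the concepts are congruent."""
    same_name = name_t1 == name_t2
    same_meaning = relation == "congruent"
    return same_name == same_meaning


def reliable_fraction(articulations) -> float:
    """Share of (name_t1, name_t2, relation) triples that meet the criterion."""
    if not articulations:
        raise ValueError("need at least one articulation")
    hits = sum(is_reliable(n1, n2, rel) for n1, n2, rel in articulations)
    return hits / len(articulations)


# Placeholder articulations between two hypothetical taxonomies T1 and T2.
articulations = [
    ("Aus", "Aus", "congruent"),  # same name, same meaning: reliable
    ("Bus", "Bus", "overlap"),    # same name, different meaning: unreliable
    ("Cus", "Dus", "congruent"),  # different names, same meaning: unreliable
    ("Eus", "Fus", "disjoint"),   # different names, different meaning: reliable
]
print(reliable_fraction(articulations))
```

Under this assumed criterion, both unreliable cases are informative: a shared name that hides a change in meaning, and a change of name that hides a stable meaning.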
Pathways to system implementation
We have successfully applied this approach to
align multiple classifications of primates (Fig. 3)
[5], alternative species-level taxonomies of grasses
[6], succeeding cladistic and revisionary inferences
of weevils [7,8], and competing avian order-level
phylogenomic hypotheses (in prep.).
While the logic optimization research remains
ongoing, the RCC-5 multi-taxonomy/phylogeny
alignment approach appears ready for implementation
into open-ended biodiversity or evolutionary
knowledge systems. Such implementation will provide
needed input on conceptual and practical challenges, and
on the value of the novel semantic integration
services afforded by this approach (Fig. 5).
If you are interested in using the Euler/X
toolkit and/or in collaborating with us, please
contact nico.franz@asu.edu
Annual Summit of the
Phenotype Research Coordination Network
– 'Complex Data Integration'
Biosphere2, February 26–28, 2016
Fig. 3. Visualization of the consistent, well-specified
Cheirogaleoidea sec. Groves (2005) (T2) / Cheirogaleidae
sec. Groves (1993) (T1) alignment. Source: [5].
Table 1. Name:meaning cardinality relations in the primate
use case – only 56.4% of name pairs are 'reliable'. Source: [5].
Fig. 4. ProvenanceMatrix visualization of Maximally Informative
Relations in the Minyomerus use case. Sources: [8,9].
Fig. 5. Semi-realistic example of
using Euler/X RCC-5 alignments to
represent evolving relationships of
specimen, phenomic, and
taxonomic concept information.
Two floristic treatments (1993,
1997) have overlapping sets of
examined herbarium specimens,
where each specimen is variously
assigned to treatment-specific
phenomic traits and taxonomic
concepts. These lower-level RCC-5
articulations logically 'propagate
up' to integrate at higher levels.