Taxonomic 'data' exchange as expression and synthesis of phylogenetic claims
Taxonomic 'data' exchange
expression and synthesis
Jonathan A. Rees
National Evolutionary Synthesis Center
IevoBio, 25 June 2014
CoL IRMNG NCBI GBIF EOL Union4
Finding inconsistencies = good
Collecting information is useful
'Data' – BAH!
'data' 'information' 'representation'
'format' 'nomenclature' - how bland.
Claims, not data. Consequential.
Taxon: a set determined by a membership rule.
Taxonomy: a collection of taxa that form a
Some taxonomies are phylogenetic (all clades).
Taxonomies are collections of claims
X includes A, B, and C
A, B, C are mutually disjoint
X, A, B, and C are clades - if phylogenetic.
The important claims are about
X includes Y
X1, X2, X3, … are mutually disjoint
X is a clade
X is a species
We have to designate taxa somehow, when we
express a claim
Many taxon names are polysemous
To be clear, always say 'in the sense of' some
static document (article or database snapshot)
X = Mammalia sensu
If used multiple ways in some document, give
Claims about taxa
Reasoning with claims
X includes Y and Y includes Z
→ X includes Z
X includes Y
→ X and Y are not disjoint
X and Y are clades →
one includes the other, or they are disjoint
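The first two rules above can be sketched in code. This is a minimal illustration, not anything shown in the talk: the function names `close_includes` and `find_conflicts` and the encoding of claims as pairs are assumptions made for the example, and taxa are assumed to be non-empty.

```python
from itertools import product

def close_includes(includes):
    """Transitive closure of (x, y) pairs, each meaning 'x includes y'."""
    closed = set(includes)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can grow inside the loop.
        for (a, b), (c, d) in product(list(closed), repeat=2):
            if b == c and (a, d) not in closed:
                closed.add((a, d))  # a includes b, b includes d -> a includes d
                changed = True
    return closed

def find_conflicts(includes, disjoint):
    """Return the disjointness claims contradicted by the includes claims.

    disjoint is a set of frozenset({x, y}) pairs claimed to be disjoint.
    If x includes y (directly or by transitivity), x and y cannot be disjoint.
    """
    closed = close_includes(includes)
    conflicts = set()
    for pair in disjoint:
        x, y = tuple(pair)
        if (x, y) in closed or (y, x) in closed:
            conflicts.add(pair)
    return conflicts
```

Given 'Mammalia includes Primates' and 'Primates includes Homo', the closure also contains 'Mammalia includes Homo', so a claim that Mammalia and Homo are disjoint would be reported as a conflict. The rule about clades is not covered by this sketch.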
Two ways to be wrong
Wrong about designation
Wrong about science
'Alignment' = estimating
X = Y (X and Y are the same taxon)
Heuristics based on properties and
relations (including names...)
Manual 'curation' if necessary
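One possible shape for such a heuristic is sketched below. It is a hypothetical example only: the function name `align`, the dictionary encoding of a taxonomy, the overlap score, and the 0.5 threshold are all assumptions, not the method used in the talk.

```python
def align(tax_a, tax_b, min_overlap=0.5):
    """Estimate which same-named taxa in two taxonomies are the same taxon.

    tax_a, tax_b: dict mapping a taxon name to the set of its child names.
    Returns (proposed, needs_curation): names shared by both taxonomies,
    split by whether their child sets overlap enough to propose X = Y.
    """
    proposed, needs_curation = [], []
    for name in sorted(tax_a.keys() & tax_b.keys()):
        kids_a, kids_b = tax_a[name], tax_b[name]
        union = kids_a | kids_b
        # Fraction of children shared; two childless taxa count as a full match.
        overlap = len(kids_a & kids_b) / len(union) if union else 1.0
        if overlap >= min_overlap:
            proposed.append(name)
        else:
            needs_curation.append(name)  # same name, different content
    return proposed, needs_curation
```

A shared name with little or no shared content is exactly the polysemous case, so the sketch hands it to manual curation rather than asserting X = Y.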
X is incertae sedis in A means
(1) A includes X
(2) it's not known which of A's non-incertae-sedis
'children' X belongs to, if any
(2) is not a claim about biology.
Logical content = (1).
'Rozella belongs in Fungi.'
'Rhodophyceae is the same as Rhodophyta.'
'SILVA's Morganella isn't the same as Index
'Anolis isn't a clade unless Norops is
merged into it.'
“Rozella is in Fungi.”
Rozella sensu SILVA115 and Fungi sensu
SILVA115 belong to a clade disjoint from the
other SILVA115 children of Nucletmycea.
How about let's apply the label 'Fungi' to
such a clade and not to Fungi sensu
Notation not so important,
but for example -
disjoint(A, B, C, …)
node(X, A, B, C, …) - abbreviation
same(X, Y) notSame(X, Y)
+ nomenclatural claims
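The slides call node(X, A, B, C, …) an abbreviation without spelling out its expansion. The sketch below assumes it stands for the claims listed earlier (X includes each child, the children are mutually disjoint, and, for a phylogenetic taxonomy, every taxon is a clade). The function name `expand_node` and the tuple encoding are illustrative assumptions.

```python
def expand_node(parent, *children, phylogenetic=False):
    """Expand node(parent, children...) into the claims it is assumed to abbreviate."""
    # parent includes each child
    claims = [("includes", parent, child) for child in children]
    # the children are mutually disjoint (only meaningful for two or more)
    if len(children) > 1:
        claims.append(("disjoint",) + tuple(children))
    # in a phylogenetic taxonomy, every taxon is additionally claimed to be a clade
    if phylogenetic:
        claims.extend(("clade", taxon) for taxon in (parent,) + tuple(children))
    return claims
```

For example, `expand_node("X", "A", "B", "C")` yields three includes claims followed by one `disjoint(A, B, C)` claim.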
On and on
Compare 'macrotaxonomy' and
Defense of scruffy
Compare Rod's github proposal
Philosophy of language
Separate science from nomenclature.
Use logic to do science.
Always use names with sensu.
Use heuristics to prevent paralysis.
Don't 'represent data' – express claims!
Nico Franz, David Thau, Rod Page
Open Tree: Karen Cranston, Stephen Smith,
Mark Holder, and legions of others
Gerald Jay Sussman
Jonathan A. Rees 2014
Copyright waived CC0 1.0