Building a names backbone


Nicky Nicolson, RBG Kew

A names backbone

== “an environment for the management of multiple overlapping classifications and tracking how these change over time”
Not a monolith:
• Built on a layered view of the domain – clearly separating names and taxonomy
• Names form the objective basis for higher layers

The current situation…
Many overlapping systems, few links

… and what we’re aiming for:
Authoritative data, reduced duplication, many more links

Names backbone: a layered environment

Name occurrence layer, AKA “Nomen-clutter”

== any attempt at the transcription of a name

Names layer

Holds objective published facts about a name:
- Orthography
- Authorship
- Protologue reference
- Type citation
- Objective synonymy

Concepts layer

Hypotheses draw names together to form concepts via heterotypic synonymy

The (current) problem:
Most people want to operate at concept level…
… but have to start right down at the lowest level

Solving the problem…

We need to provide ways to allow people to better navigate between the layers, and better focus their efforts – e.g. build classifications using the same objective bases.

We started with a blank sheet of paper – it’s hard to get existing systems to conform to the layering that we need.

Drawbacks of data models used to date
• conflate the storage of names and concepts
• store only a single classification
• store only the end product of a thought process, not work in progress
• are difficult to version
• are difficult to query effectively (for hierarchies etc.)

A new (graph) model

• Stores data as graphs – composed of nodes and directed relationships
• Both nodes and relationships can hold data as properties
• Supports highly interconnected data
• Supports self-referential data
• Optimised for queries on relationships
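
The properties listed above can be illustrated with a very small in-memory property graph. This is only a sketch: the `Node`, `Relationship` and `Graph` classes and the node ids are hypothetical and do not describe the system presented in these slides, which would normally sit on a dedicated graph database.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    rel_type: str
    source: str  # node_id of the start node
    target: str  # node_id of the end node
    properties: dict = field(default_factory=dict)


class Graph:
    """Nodes plus directed relationships; both can carry properties."""

    def __init__(self):
        self.nodes = {}
        self.relationships = []

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, properties)
        return self.nodes[node_id]

    def relate(self, source, rel_type, target, **properties):
        rel = Relationship(rel_type, source, target, properties)
        self.relationships.append(rel)
        return rel

    def outgoing(self, node_id, rel_type=None):
        # Query by relationship: all links leaving a node, optionally by type.
        return [
            r for r in self.relationships
            if r.source == node_id and (rel_type is None or r.rel_type == rel_type)
        ]


g = Graph()
g.add_node("n1", name="Aus bus", author="Jones")
g.add_node("n2", name="Xus yus", author="Smith")
g.relate("n1", "accepted_as", "n2", classification="WCS")
```

Here `g.outgoing("n1", "accepted_as")` returns the single link from "Aus bus" to "Xus yus", together with the classification stored on the link itself.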

Using a graph model to hold concept data: Attempt #1
Two nodes, with name + status properties, and an “accepted_as” link.
== a naïve use of the graph model: status is stored in 2 places (explicitly in the status property, implicitly by participation in the relationship)

More strict about the separation of the nomenclatural information (the nodes) and the taxonomic information (the relationships between nodes), but the link is still very sparse…

Add an attribute to indicate which classification asserts this subjective relationship:
Taxonomic status of a name is inferred from its participation in a subjective taxonomic relationship.
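
A minimal sketch of inferring status from participation, rather than storing it on the name node. The `AcceptedAs` tuple, the `taxonomic_status` function and the "unplaced" fallback label are assumptions made for illustration; only the “accepted_as” link and the classification attribute come from the slides.

```python
from typing import NamedTuple


class AcceptedAs(NamedTuple):
    synonym: str         # name at the start of the accepted_as link
    accepted: str        # name at the end of the link
    classification: str  # which classification asserts this opinion


def taxonomic_status(name, classification, links):
    """Infer a name's status purely from the links it takes part in."""
    relevant = [l for l in links if l.classification == classification]
    if any(l.synonym == name for l in relevant):
        return "synonym"
    if any(l.accepted == name for l in relevant):
        return "accepted"
    return "unplaced"  # hypothetical label: no opinion in this classification


links = [AcceptedAs("Aus bus Jones", "Xus yus Smith", "WCS")]
```

Because no status property exists on the name, the status cannot drift out of step with the relationship, which was the weakness of attempt #1.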

Links become more interesting than the nodes
Expand the data held on the subjective relationship to allow it to be computationally assessed

Multiple opinions – using the same name nodes
Reuse the name nodes to store multiple opinions using the same basic facts (name nodes)
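
As a rough sketch of how shared name nodes can carry several opinions, the example below stores two classifications over the same names and lists the names on which they disagree. The ids, the function names and the second classification ("ClassificationB") are hypothetical.

```python
# Shared name nodes: objective facts only, no taxonomic status stored here.
names = {
    "n1": "Xus yus Smith",
    "n2": "Aus bus Jones",
    "n3": "Xus zus White",
}

# Subjective links as (synonym_id, accepted_id, classification).
opinions = [
    ("n2", "n1", "WCS"),
    ("n1", "n2", "ClassificationB"),  # hypothetical second opinion
]


def accepted_name_by_classification(name_id, opinions):
    """Map each classification to the name it accepts for name_id."""
    result = {}
    for synonym, accepted, classification in opinions:
        if synonym == name_id:
            result[classification] = accepted
        elif accepted == name_id:
            # An accepted name resolves to itself unless also a synonym.
            result.setdefault(classification, name_id)
    return result


def disputed(names, opinions):
    """Names that different classifications resolve to different accepted names."""
    found = []
    for name_id in names:
        resolved = accepted_name_by_classification(name_id, opinions)
        if len(set(resolved.values())) > 1:
            found.append(name_id)
    return sorted(found)
```

With this data, `disputed(names, opinions)` reports "n1" and "n2", while "n3" takes part in no opinion and is not flagged.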

Relationships held

Objective, e.g.:
• Combination-basionym
• Later_homonym
• Alternative_name_for
• …
Subjective, e.g.:
• Parent_child (taxonomic placement)
• Synonym (heterotypic synonymy)
• …

Objective relationships are “stronger” than subjective ones
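
One possible way to encode the two kinds of relationship is sketched below. It assumes that a subjective relationship must always name the classification asserting it, while an objective one never does. The class, the lower-cased type names and the validation rule are illustrative assumptions, and the sketch does not try to model what “stronger” means in practice.

```python
from dataclasses import dataclass
from typing import Optional

# Type names taken from the lists above, lower-cased for use as identifiers.
OBJECTIVE = {"combination_basionym", "later_homonym", "alternative_name_for"}
SUBJECTIVE = {"parent_child", "synonym"}


@dataclass(frozen=True)
class Relationship:
    rel_type: str
    source: str
    target: str
    classification: Optional[str] = None

    def __post_init__(self):
        if self.rel_type in OBJECTIVE:
            # Objective facts hold regardless of any classification.
            if self.classification is not None:
                raise ValueError("objective relationships take no classification")
        elif self.rel_type in SUBJECTIVE:
            # Subjective opinions must say who asserts them.
            if self.classification is None:
                raise ValueError("subjective relationships need a classification")
        else:
            raise ValueError(f"unknown relationship type: {self.rel_type}")

    @property
    def is_objective(self):
        return self.rel_type in OBJECTIVE
```

For example, `Relationship("synonym", "n2", "n1", classification="WCS")` is accepted, whereas the same call without a classification raises `ValueError`.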

Supporting versioning

We keep all relationships; modifications to the data just mark relationships as no longer current.
We can always resurrect the state of the graph
== persistent identification of taxon concepts

Versioning = name id + classification + state

Versioning enables remote curation of the data

State1, according to WCS:
Xus yus Smith (A)
= Aus bus Jones (S)
State2, according to WCS:
Xus zus White (A)
= Xus yus Smith (S)
= Aus bus Jones (S)
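
The State1 / State2 example can be reproduced with an append-only store in which links are never deleted, only marked as retired. This is a sketch under assumed names (`VersionedLink`, `History`, `commit`, `snapshot`) and assumes states are simple integers; the slides do not say how states are numbered or stored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VersionedLink:
    synonym: str
    accepted: str
    classification: str
    added_in: int
    retired_in: Optional[int] = None  # None means still current

    def is_current_at(self, state):
        return self.added_in <= state and (
            self.retired_in is None or state < self.retired_in
        )


class History:
    def __init__(self):
        self.state = 0
        self.links = []

    def commit(self, classification, add=(), retire=()):
        """Open a new state: retire old links (never delete), add new ones."""
        self.state += 1
        for link in self.links:
            if (
                link.classification == classification
                and link.retired_in is None
                and (link.synonym, link.accepted) in retire
            ):
                link.retired_in = self.state
        for synonym, accepted in add:
            self.links.append(
                VersionedLink(synonym, accepted, classification, self.state)
            )
        return self.state

    def snapshot(self, classification, state):
        """Resurrect the links that were current in a given state."""
        return sorted(
            (l.synonym, l.accepted)
            for l in self.links
            if l.classification == classification and l.is_current_at(state)
        )


h = History()
s1 = h.commit("WCS", add=[("Aus bus Jones", "Xus yus Smith")])
s2 = h.commit(
    "WCS",
    add=[("Xus yus Smith", "Xus zus White"), ("Aus bus Jones", "Xus zus White")],
    retire=[("Aus bus Jones", "Xus yus Smith")],
)
```

`h.snapshot("WCS", s1)` still returns the State1 synonymy after State2 has been committed, so a client holding name id + classification + state keeps a stable reference to the concept it originally saw.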

What can be done with this kind of data model?
• Client systems can reliably connect to a version of a concept
• We can see how concepts change over time
• Researchers can query the data to compare classifications and identify areas of dispute
Longer term:
• Examine the “computed acceptance” rules used in TPL - could these be run on the relationships in the names backbone?

Building it: we first focussed on the top two layers…

… but we need a way to manage the name occurrences

Building the name occurrence layer:

Populating it:
• Seed it with an authoritative set of names
• Add the version history of these names – how were these names transcribed in the past?
Using it:
• Load candidate name occurrences and match them, storing metrics on the match.
Reviewing – a “data improvement” team to:
• Verify the matches, focussing on ambiguity (that which can’t be done computationally) == annotation
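
The load, match and review steps above might look roughly like this. The similarity measure (Python's standard `difflib.SequenceMatcher`), the `ambiguity_margin` parameter and the `needs_review` flag are assumptions chosen for illustration, not the matching method actually used.

```python
from difflib import SequenceMatcher


def normalise(text):
    # Lower-case and collapse whitespace before comparing transcriptions.
    return " ".join(text.lower().split())


def match_occurrence(occurrence, names, ambiguity_margin=0.05):
    """Match one name occurrence against the authoritative names,
    keeping the metrics so that a reviewer can assess the match."""
    if not names:
        raise ValueError("no authoritative names to match against")
    scored = sorted(
        (
            (SequenceMatcher(None, normalise(occurrence), normalise(n)).ratio(), n)
            for n in names
        ),
        reverse=True,
    )
    best_score, best_name = scored[0]
    runner_up = scored[1][0] if len(scored) > 1 else 0.0
    return {
        "occurrence": occurrence,
        "matched_name": best_name,
        "score": best_score,
        # Inexact matches with a close runner-up go to the review team.
        "needs_review": best_score < 1.0
        and (best_score - runner_up) < ambiguity_margin,
    }


authoritative = ["Xus yus", "Aus bus"]
exact = match_occurrence("Xus  yus", authoritative)
typo = match_occurrence("Xus yuss", authoritative)
```

Storing the score alongside the link lets the “data improvement” team sort their queue by ambiguity instead of re-checking every match.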

Services: name occurrence layer

- Data input / output: DwCA
- Linking and reviewing links
- RSS feeds to indicate activity

Services: names layer
- TCS
- Propose addition / edit of names
- RSS feeds to indicate activity

Services: concepts layer
- TCS
- Create classifications using names
- Propose addition / edit of names to names layer
- RSS feeds

The names backbone is an extensible environment:
• Links “name occurrences” to names
• Separates curation of names and concepts
• Supports building concepts on the same objective basis: enables sharing and reuse of foundation data
• Allows many relationships to form concepts – supports multiple overlapping classifications
• Allows distributed curation of the concepts
