A names backbone: a graph of taxonomy
Nicky Nicolson, RBG Kew
Informatics Horizons for the Natural History Museum
24th July 2013
Project and aim
• DEFRA funded
• Split into 3 phases:
• Names and taxonomy
• Taxon based information
• To create, curate and cite semantically-meaningful
• “Things” not “strings”
A 3-layered model
- Changes in the way we create, curate and cite data
and how we present it to systems and users
- Integration of data, when that data has been stored
using different classifications
- Analysis - why do differences of opinion occur?
- Synthesis - propose classifications by integrating
existing overlapping concept data
Not pulling data together for the hell of it.
It should increase efficiency, but also allow us to participate in analysis of existing data and synthesis of new data.
Participate in semantic web developments: uncovering the meaning in our data.
Attempts at transcribing an entity (“strings”) -> Recognised entities (“things”) -> Assertions about the inter-relationships between entities (a graph of “things”)
Lexical entities -> semantic entities
Name occurrence layer: any informal attempt at the transcription of a name.
We struggled about what to call these. We called them name occurrences for a bit. Then we looked at the data and we thought “nomenclutter” was probably a better term.
Some name occurrences are code-governed names, eligible to appear in the next layer: the names layer. This holds all the objective published facts about a name: its orthography, authorship, protologue reference, type citation and objective synonymy.
Hypotheses about how names inter-relate.
Concepts layer: hypotheses draw these names together to form concepts via heterotypic synonymy.
Projects such as World checklist (monographs) but also floras (regional checklists).
The questions we are asked tend to be about concepts: species, their characteristics and how they inter-relate. But the resources we have tend to be name occurrences.
So we want to answer scientific questions and operate at the concept level. But we too often have to start at the lowest level.
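As a rough illustration of the three layers described above, here is a minimal Python sketch. The class names, fields and the placeholder names "Aus bus" and "Aus cus" are all hypothetical, chosen for illustration only; this is not the project's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(frozen=True)
class Name:
    """Names layer: objective published facts about a code-governed name."""
    name_id: str
    orthography: str
    authorship: str
    protologue: str


@dataclass
class NameOccurrence:
    """Name occurrence layer: a raw transcription (a "string")."""
    text: str
    # Stays None while the occurrence has not been matched to a Name.
    resolved_to: Optional[Name] = None


@dataclass
class Concept:
    """Concepts layer: a hypothesis that draws names together."""
    source: str
    accepted: Name
    synonyms: List[Name] = field(default_factory=list)

    def name_ids(self) -> Set[str]:
        """All name ids that this concept groups together."""
        return {self.accepted.name_id, *(n.name_id for n in self.synonyms)}


# Hypothetical placeholder data, not real nomenclature.
aus_bus = Name("n1", "Aus bus", "L.", "hypothetical protologue A")
aus_cus = Name("n2", "Aus cus", "Sm.", "hypothetical protologue B")

matched = NameOccurrence("Aus buss L", resolved_to=aus_bus)  # misspelt string, resolved
unmatched = NameOccurrence("A. sp.")                         # still "nomenclutter"

concept = Concept("Hypothetical checklist", accepted=aus_bus, synonyms=[aus_cus])
```

The point of the sketch is that a question asked at the concept level can be answered by walking down to names, and from names to the occurrences that resolve to them, rather than starting from raw strings each time.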
We need to provide ways to allow people to better navigate between the layers, and better focus their efforts – e.g. build classifications using the same objective bases.
We’ve recognised we need this three-layer model. Conceptually it is a graph structure, and in implementation we’re using graph technology to store and process the data. Graph technology has moved from computer science research to the mainstream with the increasing use of social networks. This approach promotes the relations between items to “first-class citizens” in the model.
Populating the centre layer, agreeing on the facts, is key.
Changing our processes
Collaboratively working on an authoritative set of names, which we use to build the graph of concepts
Holding data on the relations means that we can more precisely model the nomenclatural / taxonomic domain. We can reuse names to form many different, overlapping, conflicting hypotheses. We can compare hypotheses by looking at how a name (as an object – a “thing” not a “string”) is treated in different classifications.
Contrived example, but shows that completely opposing hypotheses can be modelled using the same basic elements.
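The same kind of contrived example can be sketched in code. The Python below is a minimal illustration, assuming relations are stored as records in their own right; the `Relation` type, the predicate labels, the classification names and the placeholder names "Aus bus" and "Aus cus" are all hypothetical and do not reflect the project's actual graph store.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """A relation held as a first-class record, attributed to a source."""
    subject: str    # the name being treated
    predicate: str  # "accepted" or "synonym_of" (hypothetical labels)
    obj: str        # the accepted name it points to
    source: str     # the classification asserting this relation


def treatments(relations, name):
    """How each classification treats one name: {source: (predicate, obj)}."""
    return {r.source: (r.predicate, r.obj) for r in relations if r.subject == name}


def conflicts(relations):
    """Names that are treated differently by different classifications."""
    by_name = defaultdict(set)
    for r in relations:
        by_name[r.subject].add((r.predicate, r.obj))
    return sorted(name for name, seen in by_name.items() if len(seen) > 1)


# Two opposing hypotheses built from the same two names.
relations = [
    Relation("Aus bus", "accepted", "Aus bus", "Classification A"),
    Relation("Aus cus", "synonym_of", "Aus bus", "Classification A"),
    Relation("Aus cus", "accepted", "Aus cus", "Classification B"),
    Relation("Aus bus", "synonym_of", "Aus cus", "Classification B"),
]

print(treatments(relations, "Aus bus"))
print(conflicts(relations))
```

Because each name is a single object shared by both classifications, finding where they disagree is a lookup on the relations rather than a string-matching exercise.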