A names backbone - a graph of taxonomy



[Presented at the Informatics Horizons event at the Natural History Museum, London, July 2013]

Speaker notes:
  • Not pulling data together for the hell of it. It should increase efficiency, but also allow us to participate in the analysis of existing data and the synthesis of new data. Participate in semantic web developments: uncovering the meaning in our data.
  • Attempts at transcribing an entity (“strings”) -> recognised entities (“things”) -> assertions about the inter-relationships between entities (a graph of “things”). Lexical entities -> semantic entities.
  • Name occurrence layer: any informal attempt at the transcription of a name. We struggled over what to call these. We called them name occurrences for a while. Then we looked at the data and thought “nomenclutter” was probably a better term.
  • Some name occurrences are code-governed names, eligible to appear in the next layer: the names layer. This holds all the objective published facts about a name: its orthography, authorship, protologue reference, type citation and objective synonymy.
  • Hypotheses about how names inter-relate. Concepts layer: hypotheses draw these names together to form concepts via heterotypic synonymy. Sources include projects such as the World Checklist (monographs), but also floras (regional checklists). The questions we are asked tend to be about concepts: species, their characteristics and how they inter-relate. But the resources we have tend to be name occurrences. So we want to answer scientific questions and operate at the concept level, but too often we have to start at the lowest level.
  • We need to provide ways to allow people to navigate better between the layers and to focus their efforts better, for example by building classifications on the same objective bases. We’ve recognised that we need this three-layer model. Conceptually it is a graph structure, and in implementation we’re using graph technology to store and process the data. Graph technology has moved from computer science research into the mainstream with the increasing use of social networks. This approach promotes the relations between items to “first-class citizens” in the model.
  • Populating the centre layer (agreeing on the facts) is key. Changing our processes: collaboratively working on an authoritative set of names, which we use to build the graph of concepts.
  • Holding data on the relations means that we can model the nomenclatural/taxonomic domain more precisely. We can reuse names to form many different, overlapping and conflicting hypotheses. We can compare hypotheses by looking at how a name (as an object: a “thing”, not a “string”) is treated in different classifications.
  • A contrived example, but it shows that completely opposing hypotheses can be modelled using the same basic elements.
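
The following is a minimal sketch of the three-layer model as a graph. It is a plain in-memory Python structure, not the graph technology used in the project, which these notes do not name. The layer names follow the slides. The class names, the relation types (`resolves_to`, `accepted_name`, `synonym`) and the “Aus bus” data are hypothetical placeholders. The sketch shows two things: several name occurrences resolving to one name node, and two opposing treatments built from the same name nodes. Every relation is stored as an edge.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A graph node; `layer` is 'occurrence', 'name' or 'concept'."""
    layer: str
    label: str


class TaxonomyGraph:
    """Directed graph whose edges carry a relation type, so relations are
    stored as data in their own right (hypothetical, for illustration)."""

    def __init__(self):
        # source node -> list of (relation, target node) pairs
        self._edges = defaultdict(list)

    def add(self, source, relation, target):
        self._edges[source].append((relation, target))

    def targets(self, source, relation):
        """Nodes reached from `source` along edges of type `relation`."""
        return [t for r, t in self._edges.get(source, []) if r == relation]

    def incoming(self, target):
        """(source, relation) pairs for every edge pointing at `target`."""
        return [(s, r) for s, edges in self._edges.items()
                for r, t in edges if t == target]


# Hypothetical data: all names and treatments below are placeholders.
g = TaxonomyGraph()

# Names layer ("things"): each name is a single node.
bus = Node("name", "Aus bus L.")
cus = Node("name", "Aus cus Mill.")

# Name occurrence layer ("strings"): variants resolved to the name they transcribe.
for s in ["Aus bus", "A. bus L.", "Aus buss"]:
    g.add(Node("occurrence", s), "resolves_to", bus)
g.add(Node("occurrence", "Aus cus Miller"), "resolves_to", cus)

# Concepts layer: treatment X lumps the two names (heterotypic synonymy),
# treatment Y keeps them apart. Both reuse the same name nodes.
x = Node("concept", "Aus bus sensu X")
g.add(x, "accepted_name", bus)
g.add(x, "synonym", cus)

y_bus = Node("concept", "Aus bus sensu Y")
y_cus = Node("concept", "Aus cus sensu Y")
g.add(y_bus, "accepted_name", bus)
g.add(y_cus, "accepted_name", cus)


def treatments(graph, name):
    """How each concept treats `name`, as sorted (concept label, relation) pairs."""
    return sorted((s.label, r) for s, r in graph.incoming(name)
                  if s.layer == "concept")


print(treatments(g, cus))
```

Running it prints `[('Aus bus sensu X', 'synonym'), ('Aus cus sensu Y', 'accepted_name')]`. The same name node is a synonym in one treatment and an accepted name in the other.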

    1. A names backbone: a graph of taxonomy. Nicky Nicolson, RBG Kew. Informatics Horizons for the Natural History Museum, 24th July 2013
    2. Project and aim
         Project:
           • DEFRA funded
           • Split into 3 phases:
             • Names and taxonomy
             • Collections
             • Taxon-based information
         Aim:
           • To create, curate and cite semantically meaningful objects
           • “Things”, not “strings”
    3. A 3-layered model: “Strings”, “Things”, “Graph of things”
    4. Attempts at transcribing a name: “strings”
    5. Names in the nomenclatural sense: “things”
    6. Taxonomic concepts: “a graph of things”
    7. A 3-layered model: “Strings”, “Things”, “Graph of things”
    8. A 3-layered model: Name occurrences, Names, Concepts
    9. Relations become more interesting than the nodes
    10. Multiple opinions, using the same name nodes
    11. Summary
         Implications:
           - Changes in the way we create, curate and cite data, and how we present it to systems and users
         Applications:
           - Integration of data, when that data has been stored using different classifications
           - Analysis: why do differences of opinion occur?
           - Synthesis: propose classifications by integrating existing overlapping concept data
    12. Thanks. Nicky Nicolson, n.nicolson@kew.org
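
The “Analysis” application in the summary can be illustrated with one simple comparison technique. This technique was chosen for illustration and is not taken from the project. Each classification is reduced to the set of name pairs it lumps into a single concept, and the two sets are then compared with set differences. Here a classification is assumed to be a dict mapping a concept label to the names it includes. Plain strings stand in for name nodes. That shortcut only works because both treatments cite the same agreed names, which is the role of the centre layer. All names and function names are hypothetical.

```python
from itertools import combinations


def lumped_pairs(classification):
    """Unordered pairs of names that a classification places in one concept.

    `classification` maps a concept label to the set of names it includes.
    """
    pairs = set()
    for names in classification.values():
        # sorted() gives each pair a canonical order so the sets are comparable
        pairs.update(combinations(sorted(names), 2))
    return pairs


def compare(class_a, class_b):
    """Split lumped name pairs into those both classifications agree on
    and those only one of them lumps (where the opinions differ)."""
    a, b = lumped_pairs(class_a), lumped_pairs(class_b)
    return {"agreed": a & b, "only_a": a - b, "only_b": b - a}


# Hypothetical treatments built from the same names (placeholders).
treatment_a = {"Aus bus sensu A": {"Aus bus L.", "Aus cus Mill.", "Aus dus Sm."}}
treatment_b = {
    "Aus bus sensu B": {"Aus bus L.", "Aus dus Sm."},
    "Aus cus sensu B": {"Aus cus Mill."},
}

result = compare(treatment_a, treatment_b)
print(sorted(result["only_a"]))
```

The pairs in `only_a` identify the names whose placement the two treatments disagree on. In this example, both pairs involve “Aus cus Mill.”.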