names-backbone-graph-TDWG
  • We looked at the data held in the systems on the “smartie” diagram, and assessed what data is there and what business processes are supported. It's a map of information that can be accessed from many different points, but I will walk through it in one linear sequence:
  • Fieldwork...
  • Takes place in a geographic area...
  • Collects physical material...
  • Accessioned to multiple specialist collections...
  • (may be) dispatched to external organisations...
  • Handled by agents (people / teams / organisations)....
  • Collection objects are determined...
  • Which labels them with a concept....
  • Concepts form classifications
  • The core of a classification is a name.... (with a special link back to collections via the type)
  • Names are published in literature, accessed via bibliographic citations.
  • Concepts can be mapped to management classifications (for reporting purposes) and to phylogenies...
  • The final piece is “taxon based information” – assertions about species, evidenced by reference to literature or to a specimen.
  • Won’t go through this in detail – basically: we have a clever tool, and a dedicated team to work with the data. We will aim to share the tool, and to publish the configurations used to match the data.
  • This is a “digital age” name – includes a link out (using DOI) to the article containing the protologue.
  • This is a slightly older name, but has been linked to BHL – who include content past the 1923 copyright cut-off. A recent IPNI release added this functionality.
  • The name is a sp. nov., so the type citation is included in the protologue. We have (spirit) type material at Kew, so it will be included in the herbarium catalogue.
  • The type specimen is digitised in herbcat (from the spirit catalogue). We’ve built a way to permit persistent citation of specimen objects using an HTTP URI (following RBGE’s lead).
  • When working with older names, it's worth clicking through to the author details in IPNI...
  • ~5000 authors are linked through to the biographical details in TL-2
  • TL-2 gives a biography and details of where the author lodged their specimens.
  • We’ve also built some services on the names layer – data in and data out.
  • Data in: this snippet is from a PhytoKeys publication. It includes an IPNI ID: the data about the name is submitted to IPNI pre-publication, new name records are created, and the ID is sent back to the issuer. Our persistent identifier is therefore embedded in published literature. This is done via a machine-to-machine service.
  • PhytoKeys sends structured, formatted data to the names layer. We’ll directly reuse this approach to support classification editors sending (missing) names to the names layer. And we can support external classification systems doing the same thing.
  • Managing objects and the relationships between them is not a problem unique to systematics. The commercial world has caught up with us to some extent – what was a computer science research problem is now mainstream. There are many commercial applications – e.g. targeted advertising, network topology, fraud analysis.
  • At the highest level, we are using a graph model to represent the taxonomic domain. Ours is a very interconnected domain, and to some extent the commercial world has caught up with us: a “graph” of nodes and their network of interconnections can easily be built using commodity software. Social graphs are an interesting comparison: there is no one single “correct” social graph. In our domain there is no one single “correct” taxonomic hypothesis of everything.
  • We represent names as nodes, and allow them to be inter-related in many different ways using relationships between name nodes.We can store data on the relationship (the relationship is a “first class entity” – in some ways the relationships become more interesting than the nodes).
  • There is also the special relationship that we have in botany i.e. the combination-basionym relationship, which relates two names as they are based on the same type.
  • Not a backbone, more like a taxonomic network. We can query for areas of dispute, and apply rules – like taxonomists apply in their minds – to translate between classifications and resolve areas of dispute.
  • Quick recap of implementation of persistent identifiers: names, literature, specimens, concepts, agents... and tools – the tool that the data improvement team have used to match names from the World Checklist System against IPNI can also be used to match determinations against names from IPNI and/or concepts from the WCS. If we decided to break out collection event information, we could use the same tool to match these against the data held in separate collections systems.
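The graph model described in these notes (names as nodes, typed and directed relationships that carry their own data, and queries for areas of dispute) can be sketched in a few lines of Python. This is only an illustrative sketch, not the system built at Kew: the `NameGraph` class, its method names and the relationship type labels are assumptions, and "Classification B" is a made-up second opinion. The synonymy attributed to Cheek et al. (2004) follows the Duboscia quotation used in the slides.

```python
from collections import defaultdict


class NameGraph:
    """Hypothetical sketch: names are nodes, taxonomy lives on relationships."""

    def __init__(self):
        self.names = set()
        self.relationships = []  # one dict per taxonomic assertion

    def add_name(self, name):
        self.names.add(name)

    def relate(self, source, rel_type, target, according_to, **evidence):
        """Record a typed, directed relationship plus the data held on it."""
        for name in (source, target):
            if name not in self.names:
                raise KeyError(f"unknown name node: {name}")
        self.relationships.append({
            "source": source, "type": rel_type, "target": target,
            "according_to": according_to, **evidence,
        })

    def disputed_names(self):
        """Names whose outgoing relationship differs between classifications.

        Assumes each classification asserts at most one outgoing
        relationship per name.
        """
        opinions = defaultdict(set)
        for rel in self.relationships:
            opinions[rel["source"]].add((rel["type"], rel["target"]))
        return sorted(n for n, views in opinions.items() if len(views) > 1)


graph = NameGraph()
for name in ("Duboscia", "Duboscia macrocarpa", "Duboscia viridiflora"):
    graph.add_name(name)

# Opinion 1: one species in the genus, D. viridiflora in synonymy.
graph.relate("Duboscia macrocarpa", "CHILD_OF", "Duboscia",
             according_to="Cheek et al. 2004", year=2004)
graph.relate("Duboscia viridiflora", "SYNONYM_OF", "Duboscia macrocarpa",
             according_to="Cheek et al. 2004", year=2004)

# Opinion 2 (hypothetical): the same name nodes, reused with a different placement.
graph.relate("Duboscia macrocarpa", "CHILD_OF", "Duboscia",
             according_to="Classification B")
graph.relate("Duboscia viridiflora", "CHILD_OF", "Duboscia",
             according_to="Classification B")

print(graph.disputed_names())
```

The name nodes are created once and shared by both opinions; only the relationships differ, which is what lets a query pick out the names on which the classifications disagree.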

names-backbone-graph-TDWG Presentation Transcript

  • A names backbone: a graph of taxonomy. Nicky Nicolson, RBG Kew. Biodiversity Information Standards (TDWG) annual meeting, Florence, Italy, 31st October 2013
  • Three layer model: Concepts; Names; Name occurrences (e.g. collections)
  • Three layer model: Concepts; Names (e.g. IPNI & IF); Name occurrences
  • Three layer model: Concepts (monographs & regional floras); Names; Name occurrences
  • Names occurrence layer: data improvement team (4 people); names occurrence database; matching and de-duplication processes; data matching tool “MatchConf”: flexible configuration of matching logic, running matches as batch jobs; not dedicated to names data, reusable for any data type; import / export via the TCS data standard
  • Names layer: wrapped the existing names systems, IPNI and IF (work in progress); data import / export via TCS; workflows: submit new names for inclusion, request modifications to existing names; linking to evidence...
  • Linking names to evidence. Literature: digital age name citations linked via DOI; legacy (pre-digital) name citations linked to BHL (work in progress). Type specimens: persistent identifier to K specimen records: http://specimens.kew.org/herbarium/[barcodeID]
  • “Cite this specimen as”: http://specimens.kew.org/herbarium/60152.000
  • Names layer services. Data input: remote submission of names pre-publication. Publishing data out: current awareness feeds of new / changed names; data subsets suitable for use as the starting point to build classifications
  • Structured submission of names pre-publication PhytoKeys 26: 101–112, doi: 10.3897/phytokeys.26.5335
  • Phytokeys; External classification system; Concepts: WCS; Names: IPNI & IF; Name occurrences
  • Taxonomy deals with relationships between names: “Duboscia macrocarpa was considered as the only species in the genus by Cheek et al. (2004, 2011), Keay (1989), Lebrun & Stork (1997) and Hawthorne & Jongkind (2006), with D. viridiflora in synonymy. Lebrun & Stork (2003) subsequently changed their view of Duboscia to include two species, D. macrocarpa and D. polyantha, which they had previously placed in synonymy with D. macrocarpa. …” (continues)
  • Concepts layer: a graph
  • Concepts layer: taxonomy as a graph. Names are nodes; typed, directed relationships represent synonymy and taxonomic placement; evidence for taxonomic assertions is provided as references... and again, standards based import / export using TCS
  • Nomenclature: name nodes. Taxonomy: relations between nodes. Nodes shown: Duboscia; Duboscia macrocarpa; Duboscia viridiflora
  • Hold data on relationships to provide evidence. Duboscia | AccordingTo: WCS; Year: 2001; Ref: doi:10.1234/567.899; RefType: Monographic | Duboscia macrocarpa | AccordingTo: WCS; Year: 2001; Ref: doi:10.1234/567.899; RefType: Monographic | Duboscia viridiflora
  • Reuse the nodes to support different taxonomic opinions. Duboscia | AccordingTo: FWTA; Year: 1971; Ref: doi:10.6789/123.000; RefType: Floristic | Duboscia macrocarpa | Duboscia viridiflora
  • Persistent identification of concepts. We can re-create a sub-graph representing a concept at a particular point in time using: 1. Name ID; 2. Classification; 3. State. Users can link to a stable state of a concept. We can provide a feed of what has changed since.
  • Thanks n.nicolson@kew.org @nickynicolson
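The “persistent identification of concepts” slide says a concept sub-graph can be re-created from three parts (name ID, classification, state) and that a feed of later changes can be provided. The sketch below shows one way that could work, assuming each relationship records the state in which it was added and, if applicable, removed. Everything here is hypothetical: the `Edge` fields, the function names, the `name:1`-style identifiers, the classification label "X" and the integer state numbers are illustrative choices, not the identifiers or API used by Kew.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Edge:
    source: str                 # name ID the assertion is about
    rel_type: str               # e.g. "SYNONYM_OF" or "CHILD_OF"
    target: str                 # name ID it points to
    classification: str
    added_in: int               # state in which the assertion appeared
    removed_in: Optional[int] = None  # state in which it was withdrawn


def edges_at(edges, classification, state):
    """Relationships that were live in one classification at a given state."""
    return [e for e in edges
            if e.classification == classification
            and e.added_in <= state
            and (e.removed_in is None or e.removed_in > state)]


def concept_subgraph(edges, name_id, classification, state):
    """Walk inwards from name_id to collect its synonyms and children."""
    live = edges_at(edges, classification, state)
    found, seen, frontier = [], {name_id}, [name_id]
    while frontier:
        current = frontier.pop()
        for e in live:
            if e.target == current:
                found.append(e)
                if e.source not in seen:
                    seen.add(e.source)
                    frontier.append(e.source)
    return found


def changes_since(edges, classification, state):
    """Feed of relationships added or removed after the given state."""
    return [e for e in edges
            if e.classification == classification
            and (e.added_in > state
                 or (e.removed_in is not None and e.removed_in > state))]


EDGES = [
    Edge("name:2", "CHILD_OF", "name:1", "X", added_in=1),
    Edge("name:3", "SYNONYM_OF", "name:2", "X", added_in=1, removed_in=2),
    Edge("name:3", "CHILD_OF", "name:1", "X", added_in=2),
]

# The concept identified by (name:2, X, state 1) includes one synonym;
# at state 2 the synonym has been raised to a species of its own.
print(concept_subgraph(EDGES, "name:2", "X", 1))
print(concept_subgraph(EDGES, "name:2", "X", 2))
print(changes_since(EDGES, "X", 1))
```

Because relationships are never deleted, only marked with the state in which they were withdrawn, a link to an older state keeps resolving to the same sub-graph.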