Strings to things: a user-friendly
framework for data reconciliation
Nicky Nicolson, RBG Kew
@nickynicolson
Biodiversity Information Standards (TDWG) annual meeting
Nairobi, Kenya / 28th September – 1 October 2015
Reconciliation
• Turns a string representation of an entity into an actionable identifier.
e.g.:
Tahina spectabilis
Will reconcile to:
http://ipni.org/urn:lsid:ipni.org:names:77086615-1
Maximise reuse: a two-stage process
1. Standardise data
- Package of 40 plus “transformers”
- All accept a string input and produce a string output
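
A minimal sketch of the string-in, string-out idea, in Python for illustration only. The function names here (strip_accents, collapse_whitespace, lowercase, apply_transformers) are hypothetical and are not taken from the StringTransformers package; the point is only that transformers sharing one signature can be chained in any order.

```python
import re
import unicodedata
from typing import Callable, Iterable

# Assumption: a transformer is any function from string to string.
Transformer = Callable[[str], str]


def strip_accents(value: str) -> str:
    """Remove combining marks, e.g. 'Müller' becomes 'Muller'."""
    decomposed = unicodedata.normalize("NFKD", value)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def collapse_whitespace(value: str) -> str:
    """Trim the ends and reduce internal runs of whitespace to one space."""
    return re.sub(r"\s+", " ", value).strip()


def lowercase(value: str) -> str:
    """Fold the string to lower case."""
    return value.lower()


def apply_transformers(value: str, transformers: Iterable[Transformer]) -> str:
    """Apply each transformer in turn, feeding the output of one into the next."""
    for transform in transformers:
        value = transform(value)
    return value


# Hypothetical pipeline: the order of the transformers is part of the configuration.
pipeline = [strip_accents, collapse_whitespace, lowercase]
print(apply_transformers("  Tahina   spectabilis ", pipeline))
```

Because every step has the same shape, the same transformer can be reused across columns and across services.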
Examples of transformers
Open Refine screenshot
Open source
http://github.com/RBGKew/StringTransformers
Maximise reuse: a two-stage process
2. Match the data
- Package of 20 plus “matchers”
- All accept two inputs and return a flag indicating whether they match
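
A minimal sketch of the two-inputs, one-flag idea, in Python for illustration only. The names (exact_match, edit_distance, make_fuzzy_matcher) are hypothetical and not taken from the framework; the edit-distance matcher is just one plausible example of a tolerant matcher.

```python
from typing import Callable

# Assumption: a matcher is any function taking two strings and returning a flag.
Matcher = Callable[[str, str], bool]


def exact_match(a: str, b: str) -> bool:
    """Match only when the two strings are identical."""
    return a == b


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def make_fuzzy_matcher(max_distance: int) -> Matcher:
    """Build a matcher that tolerates up to max_distance single-character edits."""
    def matcher(a: str, b: str) -> bool:
        return edit_distance(a, b) <= max_distance
    return matcher


within_one = make_fuzzy_matcher(1)
print(exact_match("spectabilis", "spectablis"))
print(within_one("spectabilis", "spectablis"))
```

Keeping the tolerance as a parameter means the same matcher can be configured strictly for one column and loosely for another.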
Configuring a service
1) Read tabular data (file or DB)
2) Configure transformers
3) Configure matchers
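
The three steps above can be sketched as follows, in Python for illustration only. This is not the framework's configuration format: the CONFIG structure, the column names and the helper functions are assumptions, and the second data row uses a placeholder identifier.

```python
import csv
import io

# 1) Tabular reference data. A real service would read a file or database table;
#    the second row is a made-up placeholder record.
REFERENCE_CSV = """id,genus,species
urn:lsid:ipni.org:names:77086615-1,Tahina,spectabilis
example:placeholder-1,Cocos,nucifera
"""


def normalise(value):
    """Hypothetical transformer: collapse whitespace and lower-case."""
    return " ".join(value.split()).lower()


def exact(a, b):
    """Hypothetical matcher: flag identical strings."""
    return a == b


# 2) and 3) Per-column transformers and matcher (assumed structure).
CONFIG = {
    "genus": {"transformers": [normalise], "matcher": exact},
    "species": {"transformers": [normalise], "matcher": exact},
}


def transform(value, transformers):
    """Run a value through each configured transformer in order."""
    for t in transformers:
        value = t(value)
    return value


def reconcile(query, rows, config):
    """Return the ids of rows whose every configured column matches the query."""
    matches = []
    for row in rows:
        if all(
            rule["matcher"](
                transform(query[column], rule["transformers"]),
                transform(row[column], rule["transformers"]),
            )
            for column, rule in config.items()
        ):
            matches.append(row["id"])
    return matches


rows = list(csv.DictReader(io.StringIO(REFERENCE_CSV)))
print(reconcile({"genus": " tahina", "species": "Spectabilis "}, rows, CONFIG))
```

The same transformers are applied to both the query and the reference data, so the matcher always compares like with like.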
Run it…
1) Service description
2) Three service endpoints
3) JavaScript query interface
IPNI Reconciliation Service
3 service endpoints
IPNI Reconciliation Service
Flexible web service
• Open Refine compatible
• But underneath it’s JSON over HTTP
• … so call it from any programming language
Service metadata
Service call
Service response
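
A sketch of a service call and response handled from Python, assuming the service follows the Open Refine reconciliation convention of a "queries" parameter holding a JSON object and a response keyed by the same query ids. SERVICE_URL is a placeholder, not the real endpoint, and SAMPLE_RESPONSE is a hand-written illustration (the score is invented), not captured output.

```python
import json
from urllib.parse import urlencode

# Placeholder endpoint: substitute the URL of the service being called.
SERVICE_URL = "http://example.org/reconcile/names"


def build_request_url(service_url, names):
    """Encode a batch of names as the 'queries' parameter of a GET request."""
    queries = {f"q{i}": {"query": name} for i, name in enumerate(names)}
    return service_url + "?" + urlencode({"queries": json.dumps(queries)})


def best_matches(response_text):
    """Map each query id to the id of its highest-scoring candidate, or None."""
    response = json.loads(response_text)
    best = {}
    for key, payload in response.items():
        candidates = payload.get("result", [])
        if candidates:
            best[key] = max(candidates, key=lambda c: c.get("score", 0))["id"]
        else:
            best[key] = None
    return best


# Illustrative response only; a live call would look like:
#   from urllib.request import urlopen
#   response_text = urlopen(build_request_url(SERVICE_URL, names)).read().decode("utf-8")
SAMPLE_RESPONSE = json.dumps({
    "q0": {"result": [{"id": "urn:lsid:ipni.org:names:77086615-1",
                       "name": "Tahina spectabilis",
                       "score": 100,
                       "match": True}]},
    "q1": {"result": []},
})

print(build_request_url(SERVICE_URL, ["Tahina spectabilis"]))
print(best_matches(SAMPLE_RESPONSE))
```

Nothing here depends on Open Refine itself; any language with an HTTP client and a JSON parser can do the same.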
List of reconciliation services
https://github.com/OpenRefine/OpenRefine/wiki/Reconcilable-Data-Sources
Open source
https://github.com/RBGKew/Reconciliation-and-Matching-Framework
What we’ll work on in the future
Reconciliation services on different data types
• Specimens
– Add DwCA as a readable data store
– Collections-focussed transformers & matchers
– Resolve & link specimen duplicates
• People
• Trait glossaries
Integration with GitHub
Thanks to:
• Biodiversity Informatics team (Abigail Barker,
Matt Blissett, James Crowe, John Iacona, Rob
Turner, Alecs Gueder)
• Plant & fungal name curation team (Christine
Barker / Irina Belyaeva / Katherine Challis /
Rafael Govaerts / Paul Kirk / Heather Lindon /
Emma Williams)
• Data improvement team (Anna Lynch, Rachel
Witherow, Malin Rivers, Esther Wainwright-Deri)
@nickynicolson / n.nicolson@kew.org
http://bit.ly/k-names-service
http://github.com/RBGKew


Editor's Notes

  • #12 Configuration (transformers & matchers) displayed in a web page