Kew at pro-iBiosphere
data hackathon
Nicky Nicolson, Matt Blissett
RBG Kew Biodiversity Informatics team
A map + data + tools = links
Two minute background: what we’ve done, why we
should link up our data
What is needed?
- Persistent identifiers
- Tools – to turn “strings” into “things”
What we’ve brought along:
- Map
- Data
- ... Labelled with persistent identifiers
- A rules based matching / linking tool
specimens.kew.org/herbarium/K000525802
doi: 10.1007/s12225-010-9210-7
Cited in:
Rakotoarinivo M, Dransfield J. 2010
New species of Dypsis and Ravenea
(Arecaceae) from Madagascar. Kew
Bull. 65, 279–303.
doi:10.1007/s12225-010-9210-7
specimens.kew.org/herbarium/K000525802
Data linking tool
Rules-based
Armed with a tabular dataset, you:
- Define zero or more transformers for each field
- Define how fields must match
Together these make up a match configuration.
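
A minimal sketch of what a match configuration could look like, assuming each field carries a list of transformers and one matcher. The names `FieldRule`, `apply_transformers` and `records_match`, and the field names `genus` and `epithet`, are hypothetical illustrations and not the tool's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Transformer = Callable[[str], str]
Matcher = Callable[[str, str], bool]


def exact(a: str, b: str) -> bool:
    """Simplest possible matcher: the two values must be identical."""
    return a == b


@dataclass
class FieldRule:
    """Hypothetical per-field rule: zero or more transformers, then a matcher."""
    transformers: List[Transformer] = field(default_factory=list)
    matcher: Matcher = exact


def apply_transformers(value: str, transformers: List[Transformer]) -> str:
    """Run the value through each transformer in order."""
    for transform in transformers:
        value = transform(value)
    return value


def records_match(config: Dict[str, FieldRule],
                  query: Dict[str, str],
                  candidate: Dict[str, str]) -> bool:
    """Two records match only if every configured field matches."""
    for name, rule in config.items():
        q = apply_transformers(query.get(name, ""), rule.transformers)
        c = apply_transformers(candidate.get(name, ""), rule.transformers)
        if not rule.matcher(q, c):
            return False
    return True


# An illustrative configuration: genus must match exactly,
# epithet is trimmed and lower-cased before an exact comparison.
config = {
    "genus": FieldRule(),
    "epithet": FieldRule(transformers=[str.strip, str.lower]),
}
```

With this configuration, `{"genus": "Dypsis", "epithet": " Mediterranea "}` matches `{"genus": "Dypsis", "epithet": "mediterranea"}`, because the epithet is cleaned on both sides before comparison.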
Examples of transformers
- Epithet: mediterraneum → mediterranea
- NormaliseDiacrits: Déségl. → Desegl.
- RemoveBracketedText, RomanNumeral: cix (1892), 57 → 109 57
- CleanedPubAuthors: (L.) A.Gray in Hook.f. → A.Gray
- SurnameExtracter: (A.Gray) A.Heller → (Gray) Heller
- PageExtractor: 37(4): 412 (1977) → 412
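
Some of the transformers above can be sketched in a few lines of Python. This is an illustration of the idea only: the function names and the regular expressions are assumptions chosen to reproduce the examples on the slide, not the tool's own implementation.

```python
import re
import unicodedata

ROMAN = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}


def normalise_diacritics(text):
    """Decompose accented characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def remove_bracketed_text(text):
    """Drop anything in round brackets, plus the whitespace before it."""
    return re.sub(r"\s*\([^)]*\)", "", text)


def roman_to_int(numeral):
    """Convert a Roman numeral, reading right to left for subtractive pairs."""
    total, prev = 0, 0
    for ch in reversed(numeral.lower()):
        value = ROMAN[ch]
        if value < prev:
            total -= value
        else:
            total += value
            prev = value
    return total


def roman_numerals(text):
    """Assumed behaviour: split into alphanumeric tokens, convert Roman ones."""
    out = []
    for token in re.findall(r"[A-Za-z0-9]+", text):
        if re.fullmatch(r"[ivxlcdm]+", token, flags=re.IGNORECASE):
            out.append(str(roman_to_int(token)))
        else:
            out.append(token)
    return " ".join(out)


def extract_surnames(text):
    """Assumed behaviour: strip single-letter initials such as 'A.'."""
    return re.sub(r"\b[A-Z]\.\s*", "", text)


def extract_page(text):
    """Assumed behaviour: the page is the first number after a colon."""
    found = re.search(r":\s*(\d+)", text)
    return found.group(1) if found else ""
```

Chaining `remove_bracketed_text` and then `roman_numerals` on `cix (1892), 57` gives `109 57`, as in the slide example. The Roman numeral test is deliberately naive: any token made up only of the letters i, v, x, l, c, d, m would be converted, so a real transformer would need a stricter check.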
Examples of matchers
- Exact
- CommonTokens
- CapitalLetters: in Beitr. Aethiop. → B A; Beitr. Fl. Aethiop. → B F A; ratio = 0.67
- Number
- Integer
- Levenshtein
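
Two of the matchers listed above can be sketched as follows. The ratio used for CapitalLetters (shared capitals divided by the longer list of capitals) is an assumption that happens to reproduce the 0.67 figure on the slide; the tool may define it differently. The Levenshtein function is the standard dynamic-programming edit distance.

```python
import re
from collections import Counter


def capital_letters(text):
    """Reduce a string to its capital letters, e.g. an abbreviated title."""
    return re.findall(r"[A-Z]", text)


def capital_letters_ratio(a, b):
    """Assumed ratio: capitals in common / length of the longer capital list."""
    caps_a, caps_b = capital_letters(a), capital_letters(b)
    longest = max(len(caps_a), len(caps_b))
    if longest == 0:
        return 0.0
    common = sum((Counter(caps_a) & Counter(caps_b)).values())
    return common / longest


def levenshtein(a, b):
    """Edit distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]
```

For the slide's pair, `capital_letters_ratio("in Beitr. Aethiop.", "Beitr. Fl. Aethiop.")` is 2/3, which rounds to 0.67. A matcher would then accept or reject the pair by comparing that ratio, or an edit distance, against a threshold set in the match configuration.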
Using the matcher
A configured match can run against any tabular dataset.
Accessible as:
- JSON web service
- Google Refine reconciliation service (work in
progress)
Transformers can be dropped into Google Refine
Proposal – link names in floras to IPNI
We’ll set up the tool with IPNI as its backend dataset.
We run lists of taxa treated in floras against it and
distribute IPNI IDs for these names.
Short term gain: navigate via the IPNI ID to the
evidence about the name – protologues (Rod has
matched 120K to DOIs) and types.
Long term gain: GSPC target #1 – online world flora.
Simpler to integrate data if we’re talking about the
same name.
Proposal – link IPNI to types
We set up the tool with a botanical specimen catalogue
as its backend data-source.
We link up the IPNI cited type data with the specimens
themselves.
Proposal – link floras to
specimens
Floras use herbarium specimens as evidence for their
distribution statements.
We set up the tool with a botanical specimen catalogue
as its backend data-source.
We extract specimen references from floras and run
these against the tool to create links from flora
accounts to specimens themselves.
specimens.kew.org/herbarium/K000049118
Cited in: FZ volume:5 part:3 (2003) Rubiaceae by D.M.Bridson &
B.Verdcourt
specimens.kew.org/herbarium/K000049118
Proposal – link duplicates
between herbaria
We set up the tool with a botanical specimen catalogue
(e.g. K, the Kew herbarium) as its backend data-source.
We fire specimen data from another specimen
catalogue at it to look for duplicates.
Benefits:
- Geo-referencing
- Imaging
- Data capture efficiency
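
A rough sketch of the duplicate search, assuming that duplicates share a collector and a collector number, which is the usual case for herbarium duplicates. The record fields (`id`, `collector`, `number`), the helper names and the surname heuristic are hypothetical, and the records in the usage note are made up for illustration.

```python
import re
from collections import defaultdict


def duplicate_key(record):
    """Build a comparison key from collector surname and collector number."""
    words = re.findall(r"[A-Za-z]+", record["collector"])
    # Heuristic (an assumption): the longest word is the surname.
    surname = max(words, key=len, default="").lower()
    number = re.sub(r"\D", "", record["number"])
    return surname, number


def build_index(backend_records):
    """Index the backend catalogue by key for fast lookup."""
    index = defaultdict(list)
    for record in backend_records:
        index[duplicate_key(record)].append(record["id"])
    return index


def find_duplicates(backend_records, incoming_records):
    """Return (incoming id, backend id) pairs that share a key."""
    index = build_index(backend_records)
    pairs = []
    for record in incoming_records:
        for backend_id in index.get(duplicate_key(record), []):
            pairs.append((record["id"], backend_id))
    return pairs
```

For example, a backend record `{"id": "K-1", "collector": "Dransfield, J.", "number": "JD 1234"}` and an incoming record `{"id": "P-1", "collector": "J. Dransfield", "number": "1234"}` produce the pair `("P-1", "K-1")`. In practice the rules-based tool would replace this fixed key with configured transformers and matchers per field.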
n.nicolson@kew.org
@nickynicolson
m.blissett@kew.org
