This document discusses using a rules-based data linking tool to connect disparate biodiversity data sources. It proposes applying the tool to link (1) plant names in floras to the International Plant Names Index, (2) cited type specimens in IPNI to actual specimen records, (3) flora accounts to herbarium specimens, and (4) duplicate specimen records between herbaria collections. The tool transforms and matches fields in tabular datasets using customizable rules to identify relationships between entities from different sources.
Biothings APIs: high-performance bioentity-centric web services - Chunlei Wu
High-performance web service APIs for gene and genetic variant annotations: MyGene.info and MyVariant.info, and an SDK for building the same kind of high-performance API for other biomedical data types ("biothings").
Presented this slide deck to analytic and evaluation professionals at the Ohio Program Evaluators' Group's bi-annual conference. Discussed how to reduce large, complex datasets into smaller, manageable projects.
A keynote given on experiences in curating workflows and web services.
3rd International Digital Curation Conference: "Curating our Digital Scientific Heritage: a Global Collaborative Challenge"
11-13 December 2007
Renaissance Hotel
Washington DC, USA
ChemSpider was developed with the intention of aggregating and indexing available sources of chemical structures and their associated information into a single searchable repository, and making it available to everybody at no charge. There are many tens of chemical structure databases covering literature data, chemical vendor catalogs, molecular properties, environmental data, toxicity data, analytical data, etc., and no single way to search across them. Despite the diversity of databases available online, their inherent quality, accuracy and completeness are lacking in many regards. ChemSpider was established to provide a platform whereby the chemistry community could contribute to cleaning up the data, improving the quality of data online, and expanding the information available to include data such as reaction syntheses, analytical data and experimental properties. ChemSpider has now grown into a database of well over 20 million chemical substances integrated with over 300 disparate data sources, many of these directly supporting the Life Sciences. This presentation will provide an overview of our efforts to improve the quality of data online, to provide a foundation for the semantic web for chemistry, and to provide access to a set of online tools and services supporting access to these data. I will also discuss how ChemSpider is being used to enhance Semantic Publishing in Chemistry at RSC.
A brief presentation on current Big Data challenges in bioinformatics. A case study using one of the computational methods for drug discovery is presented. The cost of developing a new drug is increasing dramatically every year, along with the challenges associated with it. The big data approach is penetrating drug discovery slowly but steadily. We believe effective use of big data would be highly beneficial for taking several crucial decisions during the complete drug discovery process. Data management using Hadoop and analysis using the R programming package are also discussed.
DataTags, The Tags Toolset, and Dataverse Integration - Michael Bar-Sinai
This presentation describes the concept of DataTags, which simplifies handling of sensitive datasets. It then shows the Tags toolset, and how it is integrated with Dataverse, Harvard's popular dataset repository.
Reusable Software and Open Data To Optimize Agriculture - David LeBauer
Abstract:
Humans need a secure and sustainable food supply, and science can help. We have an opportunity to transform agriculture by combining knowledge of organisms and ecosystems to engineer ecosystems that sustainably produce food, fuel, and other services. The challenge is that the information we have is difficult to combine: measurements, theories, and laws are scattered across publications, notebooks, software, and human brains. We homogenize, encode, and automate the synthesis of data and mechanistic understanding in a way that links understanding at different scales and across domains. This allows extrapolation, prediction, and assessment. Reusable components allow automated construction of new knowledge that can be used to assess, predict, and optimize agro-ecosystems.
Developing reusable software and open-access databases is hard; examples will illustrate how we use the Predictive Ecosystem Analyzer (PEcAn, pecanproject.org), the Biofuel Ecophysiological Traits and Yields database (BETYdb, betydb.org), and ecophysiological crop models to predict crop yield, decide which crops to plant, and identify which traits can be selected for the next generation of data-driven crop improvement. A next step is to automate the use of sensors mounted on robots, drones, and tractors to assess plants in the field. The TERRA Reference Phenotyping Platform (TERRA-Ref, terraref.github.io) will provide an open-access database and computing platform on which researchers can use and develop tools that use sensor data to assess and manage agricultural and other terrestrial ecosystems.
TERRA-Ref will adopt existing standards and develop modular software components and common interfaces, in collaboration with researchers from iPlant, NEON, AgMIP, USDA, rOpenSci, ARPA-E, many scientists and industry partners. Our goal is to advance science by enabling efficient use, reuse, exchange, and creation of knowledge.
---
Invited talk for the "Informatics for Reproducibility in Earth and Environmental Science Research" session at the American Geophysical Union Fall Meeting, Dec 17 2015.
Building a Network of Interoperable and Independently Produced Linked and Ope... - Michel Dumontier
Over 15 years ago, Sir Tim Berners-Lee proclaimed the founding of an exciting new future involving intelligent agents operating over smarter data in order to perform complex tasks at the behest of their human controllers. At the heart of this vision lies an uneasy alliance between tedious formal knowledge representations and powerful analytics over big, but often messy data. Bio2RDF, our decade-old open source project to create Linked Data for the life sciences, has woven emergent Semantic Web technologies such as ontologies and Linked Data to generate FAIR - Findable, Accessible, Interoperable, and Reusable - data in the form of billions of machine-accessible statements for use in downstream biomedical discovery.
This revolution in data publication has been strengthened by action from global bioinformatics institutions such as the NCBI, NCBO, EBI, and DBCLS. Notably, NCBI's PubChem has successfully coupled large-scale data integration with community-based standards to offer a remarkable biochemical knowledge resource amenable to data-hungry discovery tools. Yet, in the face of increasing pressure from researchers, funders, and publishers, will these approaches be sufficient for growing and maintaining a comprehensive knowledge graph that is inclusive of all biomedical research?
A presentation to the New Year's Event for Maastricht University's Knowledge Engineering @ Work Program. https://www.maastrichtuniversity.nl/news/kework-first-10-students-academic-workstudy-track-graduate
Challenges in developing names services - RDA - nickyn
An explanation of how names data are gathered, structured, standardised and annotated, and how these data are mobilised using names services. Challenges centre on credit and attribution, and on usage metrics for services.
Presented at the Research Data Alliance plenary 5, 9-11 March 2015, San Diego.
Kaiso: Modeling Complex Class Hierarchies with Neo4j - David Szotten @ GraphC... - Neo4j
In this talk David will summarize business and technical use cases and introduce Kaiso. He will give a basic overview of how to use it, along with some examples of how one might use it to model complex class hierarchies. This will include some interactive code demonstrations. David will explore the main design goals of the project, the current state of the project, and take a look at what’s ahead on Kaiso’s roadmap.
IBC FAIR Data Prototype Implementation slideshow - Mark Wilkinson
Discussion about ways of achieving FAIRness of both metadata and data. Brute force approaches, and more elegant "projection" approaches are shown.
Relevant papers are at:
doi: 10.7717/peerj-cs.110 (https://peerj.com/articles/cs-110/)
doi: 10.3389/fpls.2016.00641 (https://doi.org/10.3389/fpls.2016.00641)
Spanish Ministerio de Economía y Competitividad grant number TIN2014-55993-R
BioThings SDK: a toolkit for building high-performance data APIs in biology - Chunlei Wu
This is from my talk at BOSC 2017.
What’s BioThings?
We use “BioThings” to refer to objects of any biomedical entity-type represented in the biological knowledge space, such as genes, genetic variants, drugs, chemicals, diseases, etc.
BioThings SDK
SDK stands for "Software Development Kit". BioThings SDK provides a Python-based toolkit to build high-performance data APIs (or web services) from a single data source or multiple data sources. It has a particular focus on building data APIs for biomedical entities, a.k.a. "BioThings", though it is not necessarily limited to the biomedical scope. For any given "BioThings" type, BioThings SDK helps developers aggregate annotations from multiple data sources and expose them as a clean, high-performance web API.
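To make the "data API" idea concrete, here is a minimal sketch of querying a BioThings-style API such as MyGene.info. The URL pattern follows the public MyGene.info v3 query endpoint; the response dict below is an illustrative shape, not live data, and the helper names are our own, not part of the SDK.

```python
# Sketch: build a MyGene.info-style query URL and pick the best hit from
# a BioThings query response. `sample` is an illustrative response shape.
from urllib.parse import urlencode

MYGENE_QUERY = "https://mygene.info/v3/query"

def build_query_url(q, species="human", fields="symbol,name,entrezgene"):
    """Build a MyGene.info-style query URL for a gene symbol or keyword."""
    return MYGENE_QUERY + "?" + urlencode(
        {"q": q, "species": species, "fields": fields})

def top_hit(response):
    """Return the best-scoring hit from a BioThings query response, if any."""
    hits = response.get("hits", [])
    return hits[0] if hits else None

# Illustrative response shape (what a real call would return as JSON):
sample = {"total": 1, "hits": [{"_id": "1017", "symbol": "CDK2",
                                "name": "cyclin dependent kinase 2"}]}
print(build_query_url("symbol:CDK2"))
print(top_hit(sample)["symbol"])  # CDK2
```

The same request/parse pattern applies to any BioThings API, since they share the query/annotation endpoint design.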
Bioinformatics databases: Current Trends and Future Perspectives - University of Malaya
Data is the most powerful resource in any field of study. In biology, data comes from scientists and their work, and any institution that can make sense of the data it collects will be at the forefront of its research field. At the start of any data collection endeavour, it is critical to find proper management techniques to store data and to maximise its utilisation. This presentation reflects on current trends and techniques in data modeling and architecture, with a highlight on the uses of databases, focusing on bioinformatics examples and case studies. Finally, the future of bioinformatics databases is discussed to give an overview of the modeling techniques needed to accommodate the escalating growth of biological data in the coming years.
Scott Edmunds talk on GigaScience Big-Data, Data Citation and future data handling at the International Conference of Genomics on the 15th November 2011.
This talk explores how principles derived from experimental design practice, data and computational models can greatly enhance data quality, data generation, data reporting, data publication and data review.
FAIR Data Prototype - Interoperability and FAIRness through a novel combinati... - Mark Wilkinson
This slide deck accompanies the manuscript "Interoperability and FAIRness through a novel combination of Web technologies", submitted to PeerJ Computer Science: https://doi.org/10.7287/peerj.preprints.2522v1
It describes the output of the "Skunkworks" FAIR implementation group, who were tasked with building a prototype infrastructure that would fulfill the FAIR Principles for scholarly data publishing. We show how a novel combination of the Linked Data Platform, RDF Mapping Language (RML) and Triple Pattern Fragments (TPF) can be combined to create a scholarly publishing infrastructure that is markedly interoperable, at both the metadata and the data level.
This slide deck (or something close) will be presented at the Dutch Techcenter for Life Sciences Partners Workshop, November 4, 2016.
Spanish Ministerio de Economía y Competitividad grant number TIN2014-55993-R
SciDataCon - How to increase accessibility and reuse for clinical and persona... - Fiona Nielsen
Presented in session 48 - Sharing of sensitive data - presented by Fiona Nielsen on September 12, 2016 at #SciDataCon http://scidatacon.org
We have addressed the most pressing problem for public genomic data, that of data discoverability, by indexing worldwide resources for genomic research data on an online platform (repositive.io) providing a single point of entry to find and access available genomic research data.
http://www.scidatacon.org/2016/sessions/48/paper/26/
http://www.scidatacon.org/2016/sessions/48/
International data week - #RDAPlenary #IDW2016
Advancing the International Plant Names Index (IPNI) - nickyn
The "names and taxa" information space is often thought of as being composed of three layers:
Taxonomic concepts
Code governed nomenclatural acts
Name occurrences
In many circumstances the distinction between these layers is blurred, leading to confusion and inefficiencies in information management. To date, IPNI has mainly been concerned with the middle layer, comprising ICBN-governed nomenclatural acts, and is formed of three key components: curated data, information services to expose these data, and dedicated editorial staff providing nomenclatural expertise.
IPNI will be advanced from its current state to better connect to the layers above (taxonomic concepts) and below (name occurrences). This will require the expansion of data holdings, improved linkages, and the development of information services and associated workflows. These will be offered to key actors including name authors, publishers, taxonomists and managers of biodiversity information.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality - Inflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Epistemic Interaction - tuning interfaces to provide information for AI support - Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
DevOps and Testing slides at DASA Connect - Kari Kakkonen
Slides by me and Rik Marselis at the DASA Connect conference, 30 May 2024. We discuss what testing is, what agile testing is, and finally what Testing in DevOps is. We also held a lovely workshop in which participants explored different ways to think about quality and testing in different parts of the DevOps infinity loop.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024 - Tobias Schneck
As AI technology pushes into IT, I found myself wondering, as an "infrastructure container Kubernetes guy", how this fancy AI technology gets managed from an infrastructure operations point of view. Is it possible to apply our lovely cloud-native principles as well? What benefits could the two technologies bring to each other?
Let me take these questions and guide you on a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need in order to apply AI to our own infrastructure and make it work from an enterprise perspective. I will give an overview of infrastructure requirements and technologies, and of what could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insight into the approaches I already have working for real.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova... - Ramesh Iyer
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. However, fostering a culture of innovation takes much work: it takes vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
UiPath Test Automation using UiPath Test Suite series, part 4 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimizing testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 - Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Key Trends Shaping the Future of Infrastructure.pdf - Cheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open source: exploring how these areas are likely to mature and develop over the short and long term, and considering how organisations can position themselves to adapt and thrive.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... - UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Generating a custom Ruby SDK for your web service or Rails API using Smithy - g2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
UiPath Test Automation using UiPath Test Suite series, part 3 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation introduction
UI automation Sample
Desktop automation flow
Speakers:
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
2. A map + data + tools = links
Two minute background: what we’ve done, why we should link up our data
What is needed?
- Persistent identifiers
- Tools – to turn “strings” into “things”
What we’ve brought along:
- Map
- Data
- ... Labelled with persistent identifiers
- A rules-based matching / linking tool
20. Cited in:
Rakotoarinivo M, Dransfield J. 2010. New species of Dypsis and Ravenea (Arecaceae) from Madagascar. Kew Bull. 65, 279–303. doi:10.1007/s12225-010-9210-7
specimens.kew.org/herbarium/K000525802
21. Data linking tool
Rules-based
Armed with a tabular dataset, you:
Define zero or more transformers for each field
Define how fields must match
This is a match configuration.
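The transformer-plus-rules model above can be sketched in a few lines of Python. Everything here is illustrative (the function and field names are not the tool's actual API): each field gets zero or more transformers, and two records match when every configured field is equal after transformation.

```python
# Sketch of a match configuration: field -> list of transformers applied
# in order; two records match when all configured fields agree afterwards.
# All names and sample records are illustrative.

def strip_author(name):
    """Keep genus + epithet, dropping a trailing author string,
    e.g. 'Dypsis lutea Jum.' -> 'Dypsis lutea'."""
    return " ".join(name.split()[:2])

def lowercase(value):
    return value.lower()

# The match configuration itself.
config = {
    "taxon_name": [strip_author, lowercase],
    "collector_number": [str.strip],
}

def _apply(transformers, value):
    for t in transformers:
        value = t(value)
    return value

def transform(record, config):
    return {f: _apply(ts, record[f]) for f, ts in config.items()}

def matches(a, b, config):
    """True when every configured field agrees after transformation."""
    return transform(a, config) == transform(b, config)

flora = {"taxon_name": "Dypsis lutea Jum.", "collector_number": " 1234 "}
ipni  = {"taxon_name": "Dypsis lutea", "collector_number": "1234"}
print(matches(flora, ipni, config))  # True
```

The point of separating transformers from match rules is that the same configuration can then be reused against any tabular dataset with the same fields.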
24. Using the matcher
A configured match can run against any tabular dataset.
Accessible as:
- JSON web service
- Google Refine reconciliation service (work in progress)
Transformers can be dropped into Google Refine
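For the Google Refine route, the exchange follows the Refine reconciliation API: the client posts a batch of queries as JSON and gets back scored candidates per query. The sketch below shows those request/response shapes; the dictionary-based candidate lookup and the IPNI-style identifier are stand-ins for the real matcher backend.

```python
# Sketch of a Refine-style reconciliation exchange: a batch of queries in,
# scored candidates out. The lookup table stands in for the real matcher.
import json

def reconcile(queries_json, lookup):
    """Answer a batch of reconciliation queries against a name->id lookup."""
    queries = json.loads(queries_json)
    response = {}
    for key, q in queries.items():
        hit = lookup.get(q["query"])
        response[key] = {"result": ([{"id": hit, "name": q["query"],
                                      "score": 100, "match": True}]
                                    if hit else [])}
    return json.dumps(response)

# Illustrative backend: name strings -> IPNI-style LSIDs.
ipni_ids = {"Dypsis lutea": "urn:lsid:ipni.org:names:12345-1"}

out = json.loads(reconcile('{"q0": {"query": "Dypsis lutea"}}', ipni_ids))
print(out["q0"]["result"][0]["id"])
```

A real service would sit behind HTTP and score fuzzy candidates rather than doing exact dictionary hits, but the payload shapes are what Refine expects.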
25. Proposal: link names in floras to IPNI
We’ll set up the tool with IPNI as its backend dataset.
We run lists of taxa treated in floras against it and distribute IPNI IDs for these names.
Short-term gain: navigate via the IPNI ID to the evidence about the name – protologues (Rod has matched 120K to DOIs) and types.
Long-term gain: GSPC target #1 – online world flora. Simpler to integrate data if we’re talking about the same name.
26. Proposal – link IPNI to types
We set up the tool with a botanical specimen catalogue as its backend data source.
We link up the IPNI cited type data with the specimens themselves.
27. Proposal – link floras to specimens
Floras use herbarium specimens as evidence for their distribution statements.
We set up the tool with a botanical specimen catalogue as its backend data source.
We extract specimen references from floras and run these against the tool to create links from flora accounts to the specimens themselves.
30. Proposal – link duplicates between herbaria
We set up the tool with a botanical specimen catalogue, e.g. K, as its backend data source.
We fire specimen data from another specimen catalogue at it to look for duplicates.
Benefits:
- Geo-referencing
- Imaging
- Data capture efficiency
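One plausible way to implement this duplicate matching is blocking on collection events: specimens from two catalogues are candidate duplicates when collector, collector number, and collection year agree after light normalisation. The field names and records below are illustrative, not the catalogues' actual schemas.

```python
# Sketch of duplicate detection between herbarium catalogues: build a
# normalised key per specimen, index the backend catalogue, then probe it
# with incoming records. Records and field names are illustrative.

def key(specimen):
    """Normalised blocking key: (collector, collector number, year)."""
    return (specimen["collector"].strip().lower(),
            specimen["number"].strip(),
            specimen["date"][:4])  # year only, from ISO dates

def find_duplicates(backend, incoming):
    """Pair each incoming specimen with backend specimens sharing its key."""
    index = {}
    for s in backend:
        index.setdefault(key(s), []).append(s)
    return [(s, match) for s in incoming for match in index.get(key(s), [])]

kew = [{"id": "K000525802", "collector": "Dransfield, J.",
        "number": "7654", "date": "2008-03-14"}]
other = [{"id": "P01234567", "collector": "dransfield, j.",
          "number": "7654", "date": "2008-03-14"}]
pairs = find_duplicates(kew, other)
print(pairs[0][1]["id"])  # K000525802
```

Candidate pairs found this way would still need review (or fuzzier scoring on locality and date), but even this coarse key delivers the benefits listed above: shared geo-referencing, shared imaging, and less repeated data capture.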