1. Linking Data to Literature:
A Conceptual and Practical Framework
Outcomes from the RDA/WDS Publishing Data Interest Group
Presented by Anita de Waard
Boston, MA, June 7, 2016
2. RDA/WDS Publishing Data Services WG
Part of the RDA/WDS Publishing Data IG
(together with Bibliometrics and Cost Recovery WGs)
Worked for 18 months, now at the end of its lifecycle:
WG goals to be rescoped at RDAP8
3. Linking Research Data and the Literature: Why?
Why link?
1. Increase visibility & discoverability of research data (and articles)
2. Place research data in the right context to enable proper re-use
3. Support credit attribution mechanisms
What is the problem?
1. Many disconnected sources (publishers, data centers, repositories, infrastructure providers, …)
2. Heterogeneity of practices, for example:
• Different PID systems (DOIs, accession numbers)
• Different ways of referencing data (formal citations, in-text references, …)
• Different moments of citing data (at publication, post-publication)
Objective: move from a plethora of (mostly) bilateral arrangements to a one-for-all service model infrastructure for the research data publication landscape, to:
1. Increase interoperability
2. Decrease systemic inefficiencies
3. Power new tools and functionalities to benefit researchers
4. Linking Research Data and the Literature: How?
• Universal: cross-disciplinary, global
• Inclusive and participatory: supported by all stakeholder groups
• Quality through meticulous provenance and metadata (not “filtering at the gate”)
• Open and non-discriminatory
• Standards-based
• Infrastructure & service layer
• Inclusive – new hubs welcome
The “multi-hub model”:
• Create sustainable infrastructure as extensions of existing systems
• “Follow the content”: use established processes as natural aggregation points (“hubs”) for different constituencies
• Interoperability between the hubs through common standards
5. Output #1: Prototype “Data-Literature Interlinking (DLI)”
[Figure: DLI service architecture. Labels: Data Sources (examples: pairs of DOIs, DataCite records, PANGAEA records); OAI-PMH; Links collection; Harmonizing, PID resolving, De-duplicating; Core Data Model; Information Space; intersection; access via Web Portal, OAI-PMH, and Search APIs.]
Over 2M links!
(Prototype) interlinking service developed with OpenAIRE, DNET and PANGAEA
Give it a spin: http://dliservice.research-infrastructures.eu
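The harmonizing and de-duplicating steps named on this slide can be illustrated with a minimal sketch. It is not the DLI implementation: the `Link` record, the `normalize_doi` and `deduplicate` helpers, and the provider names are all hypothetical, and only DOI-to-DOI pairs are handled. The one fact it relies on is that DOIs are case-insensitive, so lower-casing them is safe.

```python
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class Link(NamedTuple):
    """One article-to-data link as contributed by a provider (hypothetical shape)."""
    source_pid: str  # e.g. the article DOI
    target_pid: str  # e.g. the data set DOI
    provider: str    # who contributed the link


# Common ways the same DOI is written by different sources.
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(pid: str) -> str:
    """Harmonize a DOI: strip a resolver or 'doi:' prefix, then lower-case it."""
    pid = pid.strip()
    lowered = pid.lower()
    for prefix in DOI_PREFIXES:
        if lowered.startswith(prefix):
            pid = pid[len(prefix):]
            break
    return pid.lower()


def deduplicate(links: Iterable[Link]) -> Dict[Tuple[str, str], Set[str]]:
    """Merge links between the same pair of objects, keeping every provider."""
    merged: Dict[Tuple[str, str], Set[str]] = {}
    for link in links:
        key = (normalize_doi(link.source_pid), normalize_doi(link.target_pid))
        merged.setdefault(key, set()).add(link.provider)
    return merged
```

Two providers reporting the same pair with different DOI spellings would collapse into one entry that lists both providers, which keeps the provenance of each contribution instead of discarding it.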
6. DLI: Building the graph (there are now > 7M of these!)
[Figure: example nodes in the graph]
• Article 1 (DOI1): “Mining of protein…” (metadata, title etc., from CrossRef)
• Article 2 (DOI2): “The crystal structure..” (metadata, title etc., from CrossRef)
• Data set 1 (PDB: 1b57): “Class II fructose..” (metadata from Protein Data Bank, RCSB)
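The graph on this slide can be sketched as a small in-memory structure. This is an illustration only: the `LinkGraph` class and its methods are hypothetical, `DOI1` and `DOI2` are the slide's own placeholders, the titles are left truncated as they appear, and the sketch assumes that both articles link to the one data set.

```python
from collections import defaultdict
from typing import Dict, List, Set


class LinkGraph:
    """Articles and data sets as nodes, data-literature links as edges."""

    def __init__(self) -> None:
        # pid -> descriptive metadata and the source it was fetched from
        self.nodes: Dict[str, Dict[str, str]] = {}
        # pid -> set of pids it is linked to (stored in both directions)
        self.edges: Dict[str, Set[str]] = defaultdict(set)

    def add_node(self, pid: str, title: str, metadata_source: str) -> None:
        self.nodes[pid] = {"title": title, "metadata_source": metadata_source}

    def add_link(self, article_pid: str, dataset_pid: str) -> None:
        # Store the link both ways so lookups work from either end.
        self.edges[article_pid].add(dataset_pid)
        self.edges[dataset_pid].add(article_pid)

    def linked_to(self, pid: str) -> List[str]:
        return sorted(self.edges.get(pid, ()))


def build_example_graph() -> LinkGraph:
    """Recreate the slide's example; the link structure is an assumption."""
    graph = LinkGraph()
    graph.add_node("DOI1", "Mining of protein…", "CrossRef")
    graph.add_node("DOI2", "The crystal structure..", "CrossRef")
    graph.add_node("PDB:1b57", "Class II fructose..", "Protein Data Bank (RCSB)")
    graph.add_link("DOI1", "PDB:1b57")
    graph.add_link("DOI2", "PDB:1b57")
    return graph
```

With this structure, `build_example_graph().linked_to("PDB:1b57")` returns `['DOI1', 'DOI2']`, so a reader starting from the data set can find every article that links to it, and the reverse.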
7. DLI Standards: metadata and provenance
PIDs: DOIs, accession numbers, URLs -> URIs
Relationships: References, Supplements, Cites (DataCite schema)
Provenance:
• Data source, timestamp, completeness, provision
• Attached to objects and relationships
• Data sources: link providers, “resolvers”, “system”
For more detail, see DOI: 10.1007/978-3-319-24129-6_28
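One possible reading of this slide as a data model is sketched below. The class and field names are assumptions derived from the slide's wording, not the published DLI schema; the field types, the `pid_to_uri` helper, and the accession-number resolver table (which uses an identifiers.org-style URL purely for illustration) are likewise hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class RelationType(Enum):
    """The three relationship names listed on the slide."""
    REFERENCES = "References"
    SUPPLEMENTS = "Supplements"
    CITES = "Cites"


@dataclass(frozen=True)
class Provenance:
    """Provenance fields named on the slide; the types are guesses."""
    data_source: str     # a link provider, a "resolver", or the "system"
    timestamp: datetime
    completeness: str
    provision: str


@dataclass(frozen=True)
class ScholarlyObject:
    uri: str             # the PID expressed as a URI
    provenance: Provenance


@dataclass(frozen=True)
class Relationship:
    source: ScholarlyObject
    target: ScholarlyObject
    relation: RelationType
    provenance: Provenance  # provenance attaches to relationships as well


# Assumption: accession numbers need a per-database URL template.
ACCESSION_RESOLVERS = {"pdb": "https://identifiers.org/pdb:{}"}


def pid_to_uri(scheme: str, value: str) -> str:
    """Turn a DOI, URL, or accession number into a URI."""
    if scheme == "doi":
        return "https://doi.org/" + value
    if scheme == "url":
        return value
    if scheme in ACCESSION_RESOLVERS:
        return ACCESSION_RESOLVERS[scheme].format(value)
    raise ValueError(f"no resolver known for PID scheme {scheme!r}")
```

Attaching a `Provenance` record to both objects and relationships matches the slide's approach to quality: every link stays in the collection, and consumers can judge it by who supplied it and when.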
9. How can you participate?
Option 1: Feed data-literature link information to an existing Scholix hub using your existing community standards.
Option 2: Become a hub and share your data-literature link information using the Scholix standards.
Option 3: Help to expand and document the Scholix Guidelines.
Contact info@scholix.org or join the RDA-WDS Working Group on Data Publishing Services.
Come join us at RDAP8 in Denver!