DataCite – Bridging the gap and
helping to find, access and reuse data

Herbert Gruttemeier
OpenAIREplus workshop
February 8th, 2013
Braga
Publishers’ data policies

extract from
Nature Publishing Group,
Editorial Policies,
Availability of data and
materials

http://www.doi.org
At the infrastructure level, DOI names are handles.
http://www.handle.net
From KE workshop presentation, The Hague, June 2011 (L. Lannom)
From KE workshop presentation, The Hague, June 2011 (N. Paskin)
Rather: digital identifier of an object

« The objects identified by DOI names may be of any form - digital, physical, or abstract - as all these forms may be
necessary parts of a content management system. The DOI
system is an abstract framework which does not specify a
particular context of its application, but is designed with the
aim of working over the Internet. »
Norman Paskin, « Digital Object Identifier (DOI®) System »
DataCite

• Global consortium carried by local institutions
• Focused on improving the scholarly infrastructure around datasets and other non-textual information
• Focused on working with data centres and organisations that hold data
• Providing standards, workflows and best practice
• Initially, but not exclusively, based on the DOI system

• Memorandum of Understanding, Paris, February 2009
• Officially founded December 1st 2009 in London
DataCite Members
• Technische Informationsbibliothek (TIB), Germany
• Canada Institute for Scientific and Technical Information (CISTI)
• California Digital Library, USA
• Purdue University, USA
• Office of Scientific and Technical Information (OSTI), USA
• The British Library
• Technical Information Center of Denmark (DTU)
• Library of TU Delft, The Netherlands
• ZBMed, Germany
• ZBW, Germany
• GESIS, Germany
• Library of ETH Zürich, Switzerland
• Institut de l’Information Scientifique et Technique (INIST-CNRS), France
• Swedish National Data Service (SND)
• Australian National Data Service (ANDS)
• Conferenza dei Rettori delle Università Italiane (CRUI)
• National Research Council of Thailand (NRCT)

Affiliated members:
• Digital Curation Centre, UK
• Microsoft Research
• Inter-university Consortium for Political and Social Research (ICPSR), USA
• Korea Institute of Science and Technology Information (KISTI)
• Beijing Genomics Institute (BGI)
DataCite
The DataCite registration agency
– Maintains the resolution infrastructure
– Maintains a searchable database of metadata
– Manages the identifiers over the long term
– Establishes and shares best practice

Publishing agents (data centres, research institutes, data publishers) are responsible for
– Quality assurance
– Content storage and access
– Creating the identifiers
– Creating and updating metadata
What type of data are we talking about?
[Figure: profiles for PS1389-3, PS1390-3, PS1431-1, PS1640-1 and PS1648-1, each with panels IRD, Sand, CaCO3, TOC, Radio and Smect (units as printed: grav/10 cm3, %, %/sand, %/clay)]

• Earthquake events => doi:10.1594/GFZ.GEOFON.gfz2009kciu
• Climate models => doi:10.1594/WDCC/dphase_mpeps
• Sea bed photos => doi:10.1594/PANGAEA.757741
• Distributed samples => doi:10.1594/PANGAEA.51749
• Medical case studies => doi:10.1594/eaacinet2007/CR/5270407
• Computational model => doi:10.4225/02/4E9F69C011BC8
• Audio record => doi:10.1594/PANGAEA.339110
• Grey literature => doi:10.2314/GBV:489185967
• Videos => doi:10.3207/2959859860
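
The `doi:` names listed above become clickable once they are written in the proxy form used elsewhere in these slides (`http://dx.doi.org/...`). The following is a minimal sketch of that conversion; the function name `doi_to_url` and the simple validity check are illustrative assumptions, not part of any DataCite tooling.

```python
from urllib.parse import quote

# Proxy form as it appears in these slides.
DOI_PROXY = "http://dx.doi.org/"


def doi_to_url(doi_name: str) -> str:
    """Turn 'doi:10.1594/PANGAEA.757741' (or a bare DOI name) into a proxy URL."""
    name = doi_name.strip()
    if name.lower().startswith("doi:"):
        name = name[4:]
    # Very rough sanity check (an assumption of this sketch): prefix starts with '10.'
    # and a '/' separates prefix and suffix.
    if not name.startswith("10.") or "/" not in name:
        raise ValueError(f"not a DOI name: {doi_name!r}")
    # Keep '/' and ':' readable; percent-encode anything unsafe in a URL path.
    return DOI_PROXY + quote(name, safe="/:")


if __name__ == "__main__":
    for doi in ("doi:10.1594/PANGAEA.757741", "doi:10.2314/GBV:489185967"):
        print(doi_to_url(doi))
```

The sketch only builds the URL; whether a given name still resolves depends on the registered record.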

Anything that is the foundation of further research is research data

[Figure axis: Age (kyr), max. 233.55 kyr]

Data is evidence

[Map: 11° to 15°, 54° 0' to 55° 30', with label PS1389-3ff. Scale: 1:2695194 at Latitude 0°. Source: Baltic Sea Research Institute, Warnemünde. Legend: World vector shore line; Grain size class KOLP A; Grain size class KOLP B; Grain size class KOLP DIN; Grain size class KOEHN; Grain size class KOEHN2; Geochemistry]
DataCite Structure
[Diagram:
International DOI Foundation
  Member: DataCite
    Managing Agent (TIB)
    Member Institution, Member Institution, …
      Works with: Data Centre, Data Centre, Data Centre
    Associate Stakeholder]
Bridging the gap

[Diagram labels: Publishers, Data centres]

DOIs in Use: DataCite
« CrossRef has registered more than 51 million DOIs on behalf of scholarly publishers.
But CrossRef DOIs are not the only DOIs available in the scholarly community. DOIs
for datasets associated with scholarly research are being registered by institutions in
the DataCite network. DataCite and CrossRef have committed to the interoperability
of their DOIs. Ideally, scholarly content like journals will cite related data by the
appropriate DataCite DOI, and in return, the data record will cite the relevant article’s
CrossRef DOI. »
(from CrossRef Quarterly, January 2012)
Bridging the gap
Data citation
Connecting article and underlying data via DOI:
The dataset:
Storz, D et al. (2009):
Planktic foraminiferal flux and faunal composition of sediment trap
L1_K276 in the northeastern Atlantic.
http://dx.doi.org/10.1594/PANGAEA.724325
Is supplement to the article:
Storz, David; Schulz, Hartmut; Waniek, Joanna J; Schulz-Bull,
Detlef; Kucera, Michal (2009): Seasonal and interannual
variability of the planktic foraminiferal flux in the vicinity of the
Azores Current.
Deep-Sea Research Part I-Oceanographic Research Papers, 56(1),
107-124,
http://dx.doi.org/10.1016/j.dsr.2008.08.009
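
As a rough illustration of how such a dataset-to-article link can be carried in metadata, the sketch below builds a small XML record with Python's standard library. It assumes the DataCite kernel 2.2 namespace, the mandatory element names (identifier, creators, titles, publisher, publicationYear) and the `IsSupplementTo` relation type; check these against the published schema before relying on them. The helper `build_record` is hypothetical, and the publisher value is an assumption inferred from the DOI, not stated on the slide.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for the DataCite Metadata Schema 2.2 (verify against the schema).
NS = "http://datacite.org/schema/kernel-2.2"


def build_record(doi, creators, title, publisher, year, supplement_to=None):
    """Return a DataCite-style XML string; element names are assumptions of this sketch."""
    ET.register_namespace("", NS)  # serialize with a default namespace

    def el(parent, tag, text=None, **attrs):
        node = ET.SubElement(parent, f"{{{NS}}}{tag}", attrs)
        node.text = text
        return node

    root = ET.Element(f"{{{NS}}}resource")
    el(root, "identifier", doi, identifierType="DOI")
    creators_el = el(root, "creators")
    for name in creators:
        el(el(creators_el, "creator"), "creatorName", name)
    el(el(root, "titles"), "title", title)
    el(root, "publisher", publisher)
    el(root, "publicationYear", str(year))
    if supplement_to:
        related = el(root, "relatedIdentifiers")
        # The dataset "is supplement to" the article.
        el(related, "relatedIdentifier", supplement_to,
           relatedIdentifierType="DOI", relationType="IsSupplementTo")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_record(
        doi="10.1594/PANGAEA.724325",
        creators=["Storz, D"],  # the slide abbreviates the remaining creators as "et al."
        title=("Planktic foraminiferal flux and faunal composition of sediment trap "
               "L1_K276 in the northeastern Atlantic"),
        publisher="PANGAEA",  # assumption: inferred from the DOI, not given on the slide
        year=2009,
        supplement_to="10.1016/j.dsr.2008.08.009",
    ))
```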
Bridging the gap
• DataCite supports researchers by enabling them to locate, identify, and cite research datasets with confidence
• DataCite supports data centres by providing workflows and standards for data publication
• DataCite supports publishers by enabling linking from articles to the underlying data
http://www.datacite.org
http://schema.datacite.org
https://mds.datacite.org
http://search.datacite.org
http://oai.datacite.org
http://data.datacite.org
http://stats.datacite.org
Working Groups
• Business Practices
• Criteria for Data Centers
• Identifier Syntax
• Metadata
• Services
• Special Datasets
• Technical Infrastructure
MDS: Central portal allowing
access to the metadata from
all registered objects (OAI)
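
A harvester would typically reach such an OAI interface through OAI-PMH requests. The sketch below only builds a `ListRecords` request URL. The base path `http://oai.datacite.org/oai` is an assumption (the slides list only the host name), while `verb`, `metadataPrefix`, `set` and `from` are standard OAI-PMH parameters.

```python
from urllib.parse import urlencode

# Assumed endpoint: the slides give the host, the "/oai" path is a guess.
OAI_BASE = "http://oai.datacite.org/oai"


def list_records_url(metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords URL (nothing is sent over the network)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec          # optional: restrict to one set
    if from_date:
        params["from"] = from_date        # optional: selective harvesting by date
    return OAI_BASE + "?" + urlencode(params)


if __name__ == "__main__":
    print(list_records_url(from_date="2013-01-01"))
```

The response would be XML that a client pages through with resumption tokens; that part is left out here.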
DataCite Metadata 2.2 XML Schema
• Service for displaying DataCite metadata in different formats (BibTeX, RIS, RDF, etc.)
• Content negotiation (through MIME type)
– Access through the DOI proxy (http://dx.doi.org)
– First implemented by CNRI and CrossRef
• A particular representation of the metadata can be requested via content negotiation or by using the DOI proxy (the "http://dx.doi.org" formulation as a URL) and a MIME type
• Documentation: http://www.crosscite.org/cn/
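
Content negotiation as described above amounts to sending the usual proxy URL with an `Accept` header that names the wanted format. A minimal sketch follows; the MIME types in `FORMATS` are the ones I believe the CrossCite documentation lists, so treat them as assumptions and confirm them there. The request object is only constructed, not sent.

```python
from urllib.request import Request

# Assumed MIME types for the formats named on the slide (verify at crosscite.org/cn/).
FORMATS = {
    "bibtex": "application/x-bibtex",
    "ris": "application/x-research-info-systems",
    "rdf": "application/rdf+xml",
}


def metadata_request(doi: str, fmt: str) -> Request:
    """Build (but do not send) a request for one metadata representation of a DOI."""
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    return Request("http://dx.doi.org/" + doi, headers={"Accept": FORMATS[fmt]})


if __name__ == "__main__":
    req = metadata_request("10.1594/PANGAEA.724325", "bibtex")
    print(req.full_url, req.get_header("Accept"))
    # To actually fetch the metadata one would pass `req` to urllib.request.urlopen.
```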
Resolution - Current Status
[Diagram: a client (web browser) requests a persistent identifier (DOI, URN, …). The resolver (DataCite, …) uses its PID - URL mapping table to send the client to a landing page with catalog metadata (human-readable). Behind the landing page sit the data and the details on the data: rich metadata (human-readable) and rich structured metadata (machine-actionable).]

Problem: not machine-actionable
Content Negotiation - Based on the Solution
of CrossRef/DataCite
[Diagram: a client requests a persistent identifier (DOI, URN, …) from the resolver (DataCite, …), which holds the PID - URL mapping table. Different Accept headers, sent in addition to the URL, request different representations of the PID: the web page on the data with catalog metadata (human-readable), details on the data as rich metadata (human-readable), details on the data as rich structured metadata (machine-actionable), or the data itself.]
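
The resolution flow in the two diagrams can be mimicked with a toy resolver: a mapping table from PID to landing-page URL, plus a dispatch on the Accept header. This is a simplified sketch and not how the DOI proxy is implemented; the class `ToyResolver`, the `example.org` URLs and the metadata URL pattern are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyResolver:
    """Hypothetical resolver: PID -> URL table plus Accept-header dispatch."""
    landing_pages: dict                                   # mapping table PID -> URL
    metadata_base: str = "http://metadata.example.org/"   # hypothetical metadata service
    metadata_types: set = field(default_factory=lambda: {
        "application/x-bibtex", "application/rdf+xml",
    })

    def resolve(self, pid: str, accept: str = "text/html") -> str:
        if pid not in self.landing_pages:
            raise KeyError(f"unknown PID: {pid}")
        # Browsers get the human-readable landing page (the "current status" case).
        if accept in ("text/html", "*/*"):
            return self.landing_pages[pid]
        # Machine clients get a structured representation instead.
        if accept in self.metadata_types:
            return f"{self.metadata_base}{accept}/{pid}"
        raise ValueError(f"no representation available for {accept}")


if __name__ == "__main__":
    resolver = ToyResolver({"10.1594/PANGAEA.724325": "http://example.org/landing/724325"})
    print(resolver.resolve("10.1594/PANGAEA.724325"))
    print(resolver.resolve("10.1594/PANGAEA.724325", accept="application/x-bibtex"))
```

The same identifier thus serves both audiences: the browser is redirected to the landing page, while a script asking for BibTeX or RDF is sent to structured metadata.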
List of repositories for research data
Some recent related developments
• Thomson-Reuters Data Citation Index
• ORCID official launch
• ODIN European project
• CODATA/ICSTI Working Group on Data Citation
• Creation of the Research Data Alliance
ORCID and DataCite Interoperability Network

« ODIN will build on the ORCID and DataCite initiatives to uniquely identify scientists and data sets and connect this information across multiple services and infrastructures for scholarly communication.
It will address some of the critical open questions in the area: Referencing a data object; Tracking of use and reuse; Links between a data object, subsets, articles, rights statements and every person involved in its life-cycle. »
http://www.codata.org/taskgroups/TGdatacitation/index.html

Thank you
