How do DataCite and Crossref
support research data sharing?
Britta Dreyer, DataCite
Crossref LIVE Hannover, June 27, 2018
DataCite - Introduction
100 members in 24 countries, 1,500 data centers, 14 million DOIs
Connecting research, identifying knowledge.
Persistent identifiers:
connecting research
Institutions
Publications
Funders
Researchers & Contributors
Grants and Projects
Data and Software
Cruse, Patricia (2018): Connecting Research(ers): DataCite in 5 Minutes. ORCID.
Presentation. https://doi.org/10.23640/07243.6281405.v1
ORG ID WG
Joint Declaration of Data Citation
Principles (Force11, 2014)
2. Credit and Attribution
Data citations should facilitate giving scholarly credit and
normative and legal attribution to all contributors to the data,
recognizing that a single style or mechanism of attribution may not be
applicable to all data.
Data Citation Synthesis Group. (2014). Joint Declaration of Data Citation
Principles. Force11. https://doi.org/10.25490/a97f-egyk
Making Data Count (2014-2015)
Survey results (247 researchers responded)
Relation types = isSupplementedBy, references
Relation types = isSupplementTo, isReferencedBy
DOI reference from the reference list of the article
Metadata
Event Data (2015)
The staging (preview) instance of the Scholix endpoint:
http://api.eventdata.crossref.org/v1/events/scholix?rows=100
❏ Only the publisher of the DOI can add the relations to the
metadata
❏ Crossref and DataCite provide relations between their DOIs as
a service – but don't add them to the metadata
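The staging endpoint above returns Scholix-formatted link events. As a rough sketch, a client might build the query URL and pull the DOI pair out of each link. The record layout below is illustrative and simplified from the Scholix schema (for example, real records carry `Identifier` as a list), not copied from a live response:

```python
# Build the query URL for the staging Scholix endpoint and extract the
# source/target DOI pair from a Scholix-style link record. The sample
# record is invented for illustration and simplified from the schema.
from urllib.parse import urlencode

BASE = "http://api.eventdata.crossref.org/v1/events/scholix"

def scholix_url(rows=100):
    """Query URL for the first `rows` Scholix link events."""
    return BASE + "?" + urlencode({"rows": rows})

def link_pair(link):
    """(source DOI, target DOI, relation name) from one link record."""
    return (link["Source"]["Identifier"]["ID"],
            link["Target"]["Identifier"]["ID"],
            link["RelationshipType"]["Name"])

sample = {
    "Source": {"Identifier": {"ID": "10.5061/dryad.example",
                              "IDScheme": "doi"}},
    "Target": {"Identifier": {"ID": "10.1234/article.example",
                              "IDScheme": "doi"}},
    "RelationshipType": {"Name": "IsReferencedBy"},
}

print(scholix_url())
print(link_pair(sample))
```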
Information Model
Burton et al. (2017). The Scholix Framework for Interoperability in Data-Literature Information Exchange.
D-Lib Magazine. https://doi.org/10.1045/january2017-burton
Scholarly Link Exchange
→ RDA/WDS Scholarly Link Exchange Working Group
Scholix exchange framework for data-article information links
• A consensus among publishers, data centres, and global/domain service
providers to work collaboratively and systematically to improve the exchange of
data-literature link information
• Information model: conceptual definition of what a Scholix scholarly link is
• Link metadata schema: metadata representation of a Scholix link
• Options for exchange protocols (forthcoming)
Scholix (2016)
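The information-model and link-metadata-schema bullets above can be made concrete with a small sketch. The field names follow the published Scholix link metadata schema, but the record itself is invented and simplified for illustration:

```python
# Minimal sketch of a Scholix link record plus a completeness check for
# its core elements. Values are invented; field names follow the Scholix
# link metadata schema (simplified: real records use lists in places).

REQUIRED = ("Source", "Target", "RelationshipType", "LinkProvider")

def is_complete(link):
    """True if the record carries all core Scholix elements."""
    return all(key in link for key in REQUIRED)

link = {
    "LinkPublicationDate": "2018-06-27",
    "LinkProvider": [{"Name": "DataCite"}],
    "RelationshipType": {"Name": "References"},
    "Source": {"Identifier": {"ID": "10.1234/article.example",
                              "IDScheme": "doi"},
               "Type": {"Name": "literature"}},
    "Target": {"Identifier": {"ID": "10.5061/dryad.example",
                              "IDScheme": "doi"},
               "Type": {"Name": "dataset"}},
}

print(is_complete(link))
```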
Make Data Count (2017 – 2019)
Lowenberg, Daniella; Cruse, Patricia; Chodacki, John; Fenner, Martin (2018): PIDapalooza Make Data Count
Slides. figshare. Presentation. https://doi.org/10.6084/m9.figshare.5818002.v1
Example landing page with standardized metrics
Lowenberg, D., Budden, A., Cruse, T. (2018). It’s Time to Make Your Data Count!
https://doi.org/10.5438/pre3-2f25
MVP and future versions
Minimum viable product (MVP):
● Views and Downloads: internal logs are processed against the Code of
Practice, and standard-formatted usage logs are sent to a DataCite hub for
public use and, eventually, aggregation.
● Citations: Citation information is pulled from Crossref Event Data.
Future versions:
● details about where the data are being accessed
● volume of data being accessed
● citation details
● social media activity
Lowenberg, Daniella; Budden, Amber; Cruse, Patricia (2018): It’s Time to Make Your Data
Count! https://doi.org/10.5438/pre3-2f25
Snapshot March 2018
▪ 870,000 links between Crossref DOIs and DataCite DOIs
✓ 850,000 links originating from DataCite DOIs
✓ 22,000 links originating from Crossref DOIs
Garza, K., Fenner, M. (2018). Glad You Asked: A Snapshot of the Current State of Data Citation.
https://doi.org/10.5438/h16y-3d72
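A quick arithmetic check on the snapshot figures: the two origin counts sum to roughly the 870,000 total reported, and almost all links originate on the DataCite side.

```python
# Shares of data-literature links by origin, from the March 2018
# snapshot figures above.
datacite_origin = 850_000
crossref_origin = 22_000
total = datacite_origin + crossref_origin

print(total)                                 # sums to 872,000 (~870,000)
print(round(100 * datacite_origin / total))  # percent from DataCite DOIs
```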
There is improvement
Garza, K., Fenner, M. (2018). Glad You Asked: A Snapshot of the Current State of Data Citation.
https://doi.org/10.5438/h16y-3d72
FREYA (2017 – 2020)
• EU-funded, 3 years, €5 million, 11 partners
Goals:
• harmonize PID resolution and metadata;
• improve the discoverability of research outputs and
associated researchers;
• expose links between research outputs, and between
research outputs and associated researchers;
• integrate other identifiers;
• build a search interface for all DOI metadata
https://www.project-freya.eu/en
Conclusions
1. Many journals still have no stated policy on research data.
2. The majority of data citations appear within the text of the
publication, without being included in the reference list.
3. Many datasets are not using DOIs as a persistent identifier, in
particular in the life sciences, where accession numbers are more
common.
4. Many publishers integrate backlinks to their articles into their
publishing platforms instead of providing the links themselves.
5. Coordinating the publication of an article and associated data is
complicated, in particular if the dataset is not published before
submission of the manuscript but in parallel with the article.
Garza, K., Fenner, M. (2018). Glad You Asked: A Snapshot of the Current State of Data Citation.
https://doi.org/10.5438/h16y-3d72
Next Steps
1. To provide incentives to authors, data citation counts and actual data
citations should be displayed on repository landing pages for
datasets.
2. Look at other approaches to collect data citations, such as text-mining
for DOIs.
3. Article-data publishing workflows need to be simplified and
standardized. (Enabling FAIR Data project)
a) DataCite to better integrate with metadata lookup in publishing
tools, e.g. to validate reference lists, and
b) Coordination from DataCite and Crossref in early DOI registration for
article and data after manuscript acceptance, before publication.
4. Develop a repository selection tool for researchers based on the
information available in re3data (Enabling FAIR Data project)
Garza, K., Fenner, M. (2018). Glad You Asked: A Snapshot of the Current State of Data Citation.
https://doi.org/10.5438/h16y-3d72
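Text-mining for DOIs (step 2 above) can be sketched with a regular expression; the pattern below follows a commonly cited Crossref recommendation for matching modern DOIs, though real mining needs more care with trailing punctuation, markup, and older DOI forms:

```python
# Sketch of DOI text-mining with a regular expression. Trailing
# sentence punctuation is stripped naively; the input text is invented.
import re

DOI_RE = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")

text = ("Data are available at https://doi.org/10.5061/dryad.example "
        "and analysed as described in doi:10.5438/h16y-3d72.")

dois = [m.group(0).rstrip(".") for m in DOI_RE.finditer(text)]
print(dois)
```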
What can you do?
● Check out our How-To Guide, which describes the California Digital Library
implementation of Make Data Count. Tips and tools (e.g. a Python Log
Processor) are detailed in the guide and available on our public GitHub. Links
in the guide also point to the DataCite documentation necessary for
implementation.
● Join our project team for a webinar on how to implement Make Data Count
at your repository and learn more about the project on Tuesday, July 10th at
8am PST / 11am EST. Webinar link: http://bit.ly/2xJEA4n.
