This document discusses data citation and how to implement it for publishers and data repositories. It covers how publishers can include data citations in their Crossref metadata and how repositories can link datasets to publications. It also introduces the Crossref Event Data service, which captures these data citations and other relationships between DOIs and makes them openly available via APIs. This allows data citations to be more widely discovered and adopted.
Data citations unlock research insights
1. Data, data, everywhere? Not nearly enough!
Rachael Lammey, Crossref https://orcid.org/0000-0001-5800-1434
Mary Hirsch, DataCite https://orcid.org/0000-0002-6628-8225
UKSG Group C
2. ● Introducing data citation
● Data citation for publishers
○ How to
● Data citation for data repositories
○ How to
● Using data citation information (Event Data API)
● Summary/wrap up
● Q&A
We’ll cover
3. ● References to data, the same way researchers routinely provide a
bibliographic reference to other scholarly resources
● Data are often shared, but they are not often cited the same way as
journal articles or other publications.
● Let’s change that!
What is data citation?
HT to Patricia Cruse
4. ● Access
● Transparency and reproducibility
● Reuse
● Track, measure, and count
● Credit
● Mandates
○ Funders
○ Publishers
Why cite data?
HT to Patricia Cruse
5. Data citation for publishers
• Supports scholarship
• Extends research
• Data cited consistently provides
• Transparency
• Context
“eLife is committed to ensuring researchers get credit for all their outputs, and data
is a major component of this.” -- Melissa Harrison, eLife
6. Data citation for publishers
Step 1
• Develop a data policy that includes data citation - not too scary
○ Jones, L., Grant, R., & Hrynaszkiewicz, I. (2019). Implementing publisher policies that inform, support and
encourage authors to share data: two case studies. Insights, 32(1), 11. DOI: http://doi.org/10.1629/uksg.463
Step 2
• Explain to authors how they should be citing data - not too scary
Step 3
• Update internal workflows, including your DTD and instructions to suppliers - quite scary but others
have done it and are happy to help
Step 4
• Include these citations in your Crossref metadata - Crossref is making this as easy as possible!
Cousijn, H. et al. A data citation roadmap for scientific publishers. Sci. Data. 5:180259 doi: 10.1038/sdata.2018.259 (2018).
7. Data citation for publishers
eLife 2019;8:e43599 DOI: 10.7554/eLife.43599
8. Including data citations in Crossref metadata
1. References - include data citations in the citations you
register with Crossref
<citation key="ref2">
<doi>10.6084/m9.figshare.5981968</doi>
</citation>
2. Relations - include relationships between DOIs and other
items in your Crossref metadata records
<rel:related_item>
<rel:description>Sicard-2018-External-database-S1</rel:description>
<rel:inter_work_relation identifier-type="doi" relationship-type="references">10.6084/m9.figshare.5981968</rel:inter_work_relation>
</rel:related_item>
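The two options above can be sketched in Python as a minimal illustration of generating these fragments from a DOI. The helper names `data_citation` and `data_relation` are hypothetical, and the output is only a fragment: a real deposit would sit inside a full Crossref deposit file with the `rel` namespace declared on an enclosing element.

```python
import xml.etree.ElementTree as ET


def data_citation(key: str, doi: str) -> ET.Element:
    """Option 1: a reference-list entry that carries only the dataset DOI."""
    citation = ET.Element("citation", {"key": key})
    ET.SubElement(citation, "doi").text = doi
    return citation


def data_relation(doi: str, description: str,
                  relationship_type: str = "references") -> ET.Element:
    """Option 2: a relation between the article and the dataset DOI."""
    item = ET.Element("rel:related_item")
    ET.SubElement(item, "rel:description").text = description
    relation = ET.SubElement(
        item,
        "rel:inter_work_relation",
        {"identifier-type": "doi", "relationship-type": relationship_type},
    )
    relation.text = doi
    return item


# Values taken from the slide above.
DATASET_DOI = "10.6084/m9.figshare.5981968"

citation_xml = ET.tostring(data_citation("ref2", DATASET_DOI), encoding="unicode")
relation_xml = ET.tostring(
    data_relation(DATASET_DOI, "Sicard-2018-External-database-S1"),
    encoding="unicode",
)
print(citation_xml)
print(relation_xml)
```

Keeping the relationship type as a parameter makes it easy to switch between 'references' and 'isSupplementedBy' where that is the better description of the link.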
9. Future plans
Expanded citation markup with publication types
<citation key="ref3" publication_type="data">
<author>Morinha F</author>
<cYear>2017</cYear>
<institution>Dryad Digital Repository</institution>
<title>Data from: Extreme genetic structure in a social bird species despite high
dispersal capacity</title>
<doi>10.5061/dryad.684v0</doi>
<identifier type="accession">ABC123</identifier>
<unstructured_citation>Morinha F, Dávila JA, Estela B, Cabral JA, Frías Ó, González JL,
Travassos P, Carvalho D, Milá B, Blanco G (2017) Data from: Extreme genetic structure in a
social bird species despite high dispersal capacity. Dryad Digital Repository.
http://dx.doi.org/10.5061/dryad.684v0</unstructured_citation>
</citation>
10. References
● Good option for publishers who make
their reference metadata openly
available via Crossref
● Can be a better fit with their existing
workflows
● Data Citations made openly available
via Crossref’s APIs
● Citations with DataCite DOI are sent
to Event Data
Relations
● At the moment, these can give more
context about the data/article
relationship (can specify ‘references’
or ‘isSupplementedBy’ as relation
type)
● Good option for publishers who don’t
make their reference information
visible via Crossref
● Relations are available via metadata
APIs
● Future: will be events in Event Data
11. Data citation for repositories
Data repositories recognize the importance of establishing links between datasets
and articles
➢ ‘Data citation makes data visible to the research community. Without it, data
cannot be accessed for re-use or reproduced for transparency’ - ICPSR
➢ ‘Data publishers efforts often risk going unnoticed, and the true impact of sharing
data remains invisible’ - GBIF
https://doi.org/10.5438/tyey-k867
12. Data citation for repositories - how?
Step 1
● Researchers ensure associated publications are indicated somewhere in the dataset metadata
(many repositories do additional curation work to establish these links)
Step 2
● Include these links in your DataCite metadata (instructions on next slide)
Step 3
● Publishers can access this information and link back from the article to the dataset
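As a rough sketch of Step 2, the link from a dataset to an article can be expressed as a related identifier in the dataset's DataCite metadata. The speaker notes say three fields must all be populated for a link to reach Event Data; the sketch assumes these are `relatedIdentifier`, `relatedIdentifierType` and `relationType` from the DataCite metadata schema, which should be checked against the DataCite documentation. The helper names are hypothetical and the DOI is a placeholder borrowed from the slides.

```python
import json

# Assumption: these are the three fields that must all be populated.
REQUIRED_FIELDS = ("relatedIdentifier", "relatedIdentifierType", "relationType")


def article_link(article_doi: str, relation_type: str = "IsCitedBy") -> dict:
    """Build one related-identifier entry pointing from a dataset to an article."""
    return {
        "relatedIdentifier": article_doi,
        "relatedIdentifierType": "DOI",
        "relationType": relation_type,
    }


def is_complete(link: dict) -> bool:
    """True only if every required field is present and non-empty."""
    return all(link.get(field) for field in REQUIRED_FIELDS)


# Placeholder article DOI, used purely for illustration.
link = article_link("10.7554/eLife.43599")
print(json.dumps(link, indent=2))
print(is_complete(link))
```

A check like `is_complete` can run before metadata is registered, so that partially filled links are caught in the repository workflow rather than silently dropped downstream.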
16. Data Citation in Three Steps
1. Reference to dataset included in article metadata (Crossref) and/or reference to article included in dataset metadata (DataCite).
2. Crossref DOI <-> DataCite DOI extracted from metadata, stored in Event Data service and made available via APIs.
3. New services and updates of existing services, e.g. DataCite Search, that integrate Event Data information.
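To illustrate why the "and/or" in the first step matters, here is a small sketch of how the same article-dataset link can be derived from either side and then merged. It is not how Event Data is implemented; the `DataCitation` type, the helper functions and the DOIs are all made up for illustration.

```python
from typing import Iterable, NamedTuple, Set


class DataCitation(NamedTuple):
    """One article-to-dataset link, independent of which side reported it."""
    article_doi: str
    dataset_doi: str


def normalise(doi: str) -> str:
    # DOIs are case-insensitive, so compare them in lower case.
    return doi.strip().lower()


def from_article_reference(article_doi: str, cited_dataset_doi: str) -> DataCitation:
    """Link reported by the publisher in article metadata."""
    return DataCitation(normalise(article_doi), normalise(cited_dataset_doi))


def from_dataset_relation(dataset_doi: str, citing_article_doi: str) -> DataCitation:
    """Link reported by the repository in dataset metadata."""
    return DataCitation(normalise(citing_article_doi), normalise(dataset_doi))


def merge(links: Iterable[DataCitation]) -> Set[DataCitation]:
    """Collapse duplicate reports of the same link."""
    return set(links)


# Made-up DOIs: both sides report the same link, with different casing.
links = merge([
    from_article_reference("10.1234/example-article", "10.1234/EXAMPLE-DATASET"),
    from_dataset_relation("10.1234/example-dataset", "10.1234/Example-Article"),
])
print(len(links))  # prints 1
```

Either side alone is enough to establish the link; when both report it, the result is still a single data citation.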
17. Data Citation in Four Steps
4. Work with the bibliometrics community to understand the data citation data provided via Event Data, and develop data metrics.
18. Event Data
● Service jointly developed by Crossref and DataCite to capture references, mentions
and other events around DOIs that are not provided via DOI metadata.
● Also includes references between different DOI registration agencies (like data
citations!)
● "Events" is used as a broad term that also includes, for example, social media mentions
and usage statistics.
● Data citations are a small subset of the events captured by the Event Data service.
https://twitter.com/aarontay/status/1111600204858290176
19. Event Data Service is crucial for Data Citation
Before Event Data
● Data citations needed to be found in DOI metadata, separately for Crossref
and DataCite.
● Doesn’t scale well, limiting adoption of data citation.
After Event Data
● Data citations are extracted into a separate service and can easily be
found.
● Scales well, will lead to adoption of data citation.
22. Event Data Service is crucial for Data Citation
Additional Data Sources
Event Data can be extended to include other types of events outside the usual
article-to-dataset relations.
● Crossref is pulling social media information into Event Data (like Twitter
and Wikipedia)
● DataCite is sending data repository usage reports to Event Data (views
and downloads)
24. Event Data APIs
1. Crossref Event Data Query API
Query Event Data using a large number of parameters.
https://www.eventdata.crossref.org/guide/
2. DataCite REST API
Query Event Data that include a DataCite DOI. Integrated into the DataCite REST API, includes basic DOI metadata, and aggregations.
https://support.datacite.org/docs/eventdata-guide
3. Scholix-compatible API
Data citations in Event Data using the Scholix metadata standard.
25. Event Data APIs
● These are openly available, anyone can use them
● They’re free to use (and we encourage you to do so!)
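As a hedged sketch of what using the Crossref Event Data Query API might look like, the snippet below builds a query URL for events that point at a given dataset DOI. The endpoint and the parameter names (`obj-id`, `rows`, `mailto`) are assumptions based on our reading of the Event Data guide and should be verified there; the helper names and the email address are placeholders. Nothing is fetched unless `fetch_events` is called.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: base URL of the Query API; confirm against the Event Data guide.
EVENTS_ENDPOINT = "https://api.eventdata.crossref.org/v1/events"


def build_events_url(dataset_doi: str, mailto: str, rows: int = 100) -> str:
    """Build a query for events whose object is the given dataset DOI."""
    params = {
        "obj-id": dataset_doi,  # assumed filter name for the cited item
        "rows": rows,           # assumed page-size parameter
        "mailto": mailto,       # contact address, assumed to be requested by the API
    }
    return f"{EVENTS_ENDPOINT}?{urlencode(params)}"


def fetch_events(url: str) -> dict:
    """Fetch and decode the JSON response (performs a network request)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


# Dataset DOI from the earlier slide; the email address is a placeholder.
url = build_events_url("10.6084/m9.figshare.5981968", "name@example.org", rows=10)
print(url)
```

Calling `fetch_events(url)` would return the decoded response; its exact structure is described in the guide and is not assumed here.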
26. In summary
● There are lots of good reasons for data citation
● Collecting data citations from authors is a good place to start!
● Publishers can cite data in their Crossref metadata
● Data repositories can link their data to the articles that cite them
● Data citation information is then made openly available via Event
Data so that anyone interested can find and use this information.
● In future, we expect that lots of services will build off Event
Data!
28. ● Crossref Data & Software Citation Deposit Guide for Publishers:
https://support.crossref.org/hc/en-us/articles/215787303-Crossref-Data-Software-Citation-Deposit-Guide-for-Publishers
● Crossref Event Data: https://www.crossref.org/services/event-data/
● A Data Citation Roadmap for Scientific Publishers:
https://doi.org/10.1038/sdata.2018.259
● A Data Citation Roadmap for Scholarly Data Repositories
https://doi.org/10.1101/097196
● Make Data Count webinar:
https://www.youtube.com/watch?v=Lkysz0Mc7fo
● support@crossref.org/support@datacite.org
Useful links and contacts:
Editor's Notes
Rachael: The underlying data created and/or reused and remixed for research is becoming as crucial as the resulting text-based output. This is your opportunity to dig into the what, the why, and the how of data publication, data citation, and data sharing. Workshop hosts will cover this topic from a range of perspectives. Let’s review the best practices and case studies in data citation and data publishing, add to our collective understanding of why this is so important, and contribute to the next steps in building solutions to improving infrastructure for research data.
Both do intros
Rachael
Mary
Borrowing heavily from Patricia Cruse, Data Citations and why they matter, Crossref LIVE18 - I’ll provide a link to this in a later slide.
Precedence/context:
One of the first large-scale initiatives to establish data citation as a standard academic practice was the FORCE11 Joint Declaration of Data Citation Principles (JDDCP) in 2014. This declaration was endorsed by over 100 organizations in the scholarly community as well as many individuals.
Following this agreement on how data citation should be done, many projects followed. Within FORCE11, the Data Citation Implementation Pilot brought together publishers and repositories to put data citation into practice and work on the implementation of the JDDCP. Within the context of the Research Data Alliance, a data-literature linking group started under the name of Scholix to establish a framework for exchanging information about the relationships between articles and datasets. The infrastructure building blocks now feed into projects such as Make Data Count and Enabling FAIR Data.
Mary
Lots of reasons! If datasets are cited consistently and in a standard way, it will make it much easier for the research community to see links between different research outputs and work with these outputs. It also makes it much easier to count these citations, so that researchers can get credit for their data and the sharing of that data.
1) Transparency and reproducibility
Most scientific results that are shared today are just a summary of what researchers did and found. The underlying data are not available, making it difficult to verify and replicate results. If data were always made available with publications, the transparency of research would be greatly improved.
2) Reuse
The availability of raw data allows other researchers to reuse the data. Not just for replication purposes, but to answer new research questions.
3) Credit
When researchers cite the data they used, this forms the basis for a data credit system. Right now researchers are not really incentivized to share their data, because nobody is looking at data metrics and measuring their impact. Data citation is a first step towards changing that.
Rachael
We appreciate it’s not as easy as just putting extra info in your XML! T&F and Springer Nature explain their policies in this Insights article.
Let us know if we can help!
Published in the Data C
Here’s a nice example from eLife - explaining what data is available, how it is being made available and providing the persistent identifier for the data - in this case a DataCite DOI. The existence of the persistent identifier is key in taking the next steps for data citation.
From a Crossref perspective, it’s important that this information finds its way into the metadata that publishers register with us, in order for us to share that information. The simplest way is just the DOI, but you can put the full citation in there too.
References:
Data citations are citations / we currently support with DataCite DOIs / our limited citation markup support makes identifying other data citations difficult
Relations:
Citations are relations (‘references’)/ allows you to include a more granular type of relationship as well as a description / can append to existing record without touching citations
Mary
Mary
Scholix recommendation
IsCitedBy will go into Event Data
Fields refer to DataCite metadata (all 3 need to be populated to appear in event data)
DataCite metadata schema which is aligned with the Scholix metadata schema
There is no single relation type to describe citations, and relation types relevant for citations are sometimes used differently across organizations.
JSON
Here it is in REAL LIFE!
DataCite Search
Rachael
Rachael
Rachael - example of flow of information.
Example slide 2. Makes possible - citation information on data (Make Data Count is doing good work around metrics around data). Scholexplorer - data-literature links
MDC & Scholix work hand in hand to advocate for best data citation practices
Scholix is an information framework for submitting data citations
Mary
contributions > hubs
https://datascience.codata.org/articles/10.5334/dsj-2019-009/
Sneak peek
Third parties build services based on the information provided by Event Data
Mary
Scholix > recommendations article >< dataset provides guidance (openaire get links from DataCite scholix compatible APi)
Scholix specifies a format for information sharing that was agreed upon within scholix group (format!)
Rachael
Rachael (Mary, this is you and I being happy about lots of services building off event data). Yaaay
We need you to do this so that there is stuff to collect.
End slide: Rachael can moderate questions
Feel free to add!
Do a shout out to people - if they want to talk about the work they’re doing then get in touch!
Rachael/Mary presenting at UKSG.