Data, data, everywhere? Not nearly enough!
Rachael Lammey, Crossref https://orcid.org/0000-0001-5800-1434
Mary Hirsch, DataCite https://orcid.org/0000-0002-6628-8225
UKSG Group C
We’ll cover
● Introducing data citation
● Data citation for publishers
○ How to
● Data citation for data repositories
○ How to
● Using data citation information (Event Data API)
● Summary/wrap up
● Q&A
What is data citation?
● References to data, given the same way researchers routinely provide a
bibliographic reference to other scholarly resources
● Data are often shared, but they are not often cited the same way as
journal articles or other publications.
● Let’s change that!
HT to Patricia Cruse
Why cite data?
● Access
● Transparency and reproducibility
● Reuse
● Track, measure, and count
● Credit
● Mandates
○ Funders
○ Publishers
HT to Patricia Cruse
Data citation for publishers
• Supports scholarship
• Extends research
• Data cited consistently provides
• Transparency
• Context
“eLife is committed to ensuring researchers get credit for all their outputs, and data
is a major component of this.” -- Melissa Harrison, eLife
Data citation for publishers
Step 1
• Develop a data policy that includes data citation - not too scary
○ Jones, L., Grant, R., & Hrynaszkiewicz, I. (2019). Implementing publisher policies that inform, support and
encourage authors to share data: two case studies. Insights, 32(1), 11. DOI: http://doi.org/10.1629/uksg.463
Step 2
• Explain to authors how they should be citing data - not too scary
Step 3
• Update internal workflows, including your DTD and instructions to suppliers - quite scary but others
have done it and are happy to help
Step 4
• Include these citations in your Crossref metadata - Crossref is making this as easy as possible!
Cousijn, H. et al. A data citation roadmap for scientific publishers. Sci. Data. 5:180259 doi: 10.1038/sdata.2018.259 (2018).
Data citation for publishers
eLife 2019;8:e43599 DOI: 10.7554/eLife.43599
Including data citations in Crossref metadata
1. References - include data citations in the citations you
register with Crossref
<citation key="ref2">
<doi>10.6084/m9.figshare.5981968</doi>
</citation>
2. Relations - include relationships between DOIs and other
items in your Crossref metadata records
<rel:related_item>
<rel:description>Sicard-2018-External-database-S1</rel:description>
<rel:inter_work_relation identifier-type="doi" relationship-type="references">10.6084/m9.figshare.5981968</rel:inter_work_relation>
</rel:related_item>
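The two deposit routes above can be sketched in Python. This is a minimal illustration, not Crossref tooling: the helper names `data_citation` and `data_relation` are hypothetical, and the namespace URI bound to the `rel:` prefix is an assumption that should be checked against the Crossref deposit schema before use. It builds only the two fragments shown on the slide, not a complete deposit file.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI behind the rel: prefix in Crossref deposits.
REL_NS = "http://www.crossref.org/relations.xsd"
ET.register_namespace("rel", REL_NS)


def data_citation(key: str, doi: str) -> ET.Element:
    """Route 1 (References): a <citation> that carries only the dataset DOI."""
    citation = ET.Element("citation", {"key": key})
    ET.SubElement(citation, "doi").text = doi
    return citation


def data_relation(doi: str, description: str,
                  relationship: str = "references") -> ET.Element:
    """Route 2 (Relations): a <rel:related_item> pointing at the dataset DOI."""
    item = ET.Element(f"{{{REL_NS}}}related_item")
    ET.SubElement(item, f"{{{REL_NS}}}description").text = description
    relation = ET.SubElement(
        item,
        f"{{{REL_NS}}}inter_work_relation",
        {"identifier-type": "doi", "relationship-type": relationship},
    )
    relation.text = doi
    return item


# Values taken from the slide example.
DATASET_DOI = "10.6084/m9.figshare.5981968"
citation_xml = ET.tostring(data_citation("ref2", DATASET_DOI), encoding="unicode")
relation_xml = ET.tostring(
    data_relation(DATASET_DOI, "Sicard-2018-External-database-S1"),
    encoding="unicode",
)
print(citation_xml)
print(relation_xml)
```

Passing `relationship="isSupplementedBy"` to `data_relation` would give the other relation type mentioned in this talk.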
Future plans
Expanded citation markup with publication types
<citation key="ref3" publication_type="data">
<author>Morinha F</author>
<cYear>2017</cYear>
<institution>Dryad Digital Repository</institution>
<title>Data from: Extreme genetic structure in a social bird species despite high
dispersal capacity</title>
<doi>10.5061/dryad.684v0</doi>
<identifier type="accession">ABC123</identifier>
<unstructured_citation>Morinha F, Dávila JA, Estela B, Cabral JA, Frías Ó, González JL,
Travassos P, Carvalho D, Milá B, Blanco G (2017) Data from: Extreme genetic structure in a
social bird species despite high dispersal capacity. Dryad Digital Repository.
http://dx.doi.org/10.5061/dryad.684v0</unstructured_citation>
</citation>
References
● Good option for publishers who make
their reference metadata openly
available via Crossref
● Can be a better fit with their existing
workflows
● Data Citations made openly available
via Crossref’s APIs
● Citations with DataCite DOI are sent
to Event Data
Relations
● At the moment, these can give more
context about the data/article
relationship (can specify ‘references’
or ‘isSupplementedBy’ as relation
type)
● Good option for publishers who don’t
make their reference information
visible via Crossref
● Relations are available via metadata
APIs
● Future: will be events in Event Data
Data citation for repositories
Data repositories recognize the importance of establishing links between datasets
and articles
➢ ‘Data citation makes data visible to the research community. Without it, data
cannot be accessed for re-use or reproduced for transparency’ - ICPSR
➢ ‘Data publishers efforts often risk going unnoticed, and the true impact of sharing
data remains invisible’ - GBIF
https://doi.org/10.5438/tyey-k867
Data citation for repositories - how?
Step 1
● Researchers ensure associated publications are indicated somewhere in the dataset metadata
(many repositories do additional curation work to establish these links)
Step 2
● Include these links in your DataCite metadata (instructions on next slide)
Step 3
● Publishers can access this information and link back from the article to the dataset
Adding links to your metadata deposit
Example metadata
DataCite DOI
http://doi.org/10.17035/d.2018.0055749445
Cardiff University Research Portal
Crossref DOI
https://doi.org/10.3389/fmats.2018.00051
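As a rough sketch of what such a link looks like on the repository side, the snippet below builds a related-identifier entry for the dataset/article pair above. The three field names follow the DataCite metadata schema as we understand it, and treating exactly these three as the fields that must be populated is an assumption. The `IsCitedBy` relation type is illustrative and may not be what the actual record uses. The helpers `related_identifier` and `is_complete` are hypothetical names.

```python
# Assumption: the three DataCite fields that must all be populated
# for a link to be usable downstream.
REQUIRED_FIELDS = ("relatedIdentifier", "relatedIdentifierType", "relationType")


def related_identifier(doi: str, relation_type: str) -> dict:
    """Describe a link from a dataset to a related article DOI."""
    return {
        "relatedIdentifier": doi,
        "relatedIdentifierType": "DOI",
        "relationType": relation_type,
    }


def is_complete(entry: dict) -> bool:
    """True when every required field is present and non-empty."""
    return all(entry.get(field) for field in REQUIRED_FIELDS)


# Dataset and article DOIs from the example above; relation type is illustrative.
dataset_record = {
    "doi": "10.17035/d.2018.0055749445",
    "relatedIdentifiers": [
        related_identifier("10.3389/fmats.2018.00051", "IsCitedBy"),
    ],
}

incomplete = {"relatedIdentifier": "10.3389/fmats.2018.00051"}
print(all(is_complete(e) for e in dataset_record["relatedIdentifiers"]))
print(is_complete(incomplete))
```

A check like `is_complete` is cheap to run before deposit and catches entries that carry a DOI but no relation type.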
Data Citation in Three Steps
1. Reference to dataset included in article metadata (Crossref) and/or reference to article included in dataset metadata (DataCite).
2. Crossref DOI <-> DataCite DOI extracted from metadata, stored in Event Data service and made available via APIs.
3. New services and updates of existing services, e.g. DataCite Search, that integrate Event Data information.
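The middle step, pulling Crossref DOI <-> DataCite DOI pairs out of both sets of metadata, can be pictured with a toy example. This is not how the Event Data service is implemented; `Link`, `normalise` and `collect_links` are hypothetical names, and the inputs are hand-written pairs built from the example DOIs above. The point is only that a link asserted on the article side and the same link asserted on the dataset side should collapse into one record.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class Link(NamedTuple):
    article_doi: str
    dataset_doi: str


RESOLVER_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
)


def normalise(doi: str) -> str:
    """Lower-case a DOI and strip a resolver prefix so both sides compare equal."""
    doi = doi.strip().lower()
    for prefix in RESOLVER_PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def collect_links(
    article_side: Iterable[Tuple[str, str]],
    dataset_side: Iterable[Tuple[str, str]],
) -> Set[Link]:
    """Merge links asserted in article metadata and in dataset metadata.

    article_side: (article DOI, cited dataset DOI) pairs.
    dataset_side: (dataset DOI, related article DOI) pairs.
    """
    links = {Link(normalise(a), normalise(d)) for a, d in article_side}
    links |= {Link(normalise(a), normalise(d)) for d, a in dataset_side}
    return links


# Illustrative input: the same link seen from both directions.
from_articles = [("https://doi.org/10.3389/fmats.2018.00051", "10.17035/d.2018.0055749445")]
from_datasets = [("http://doi.org/10.17035/D.2018.0055749445", "10.3389/fmats.2018.00051")]

links = collect_links(from_articles, from_datasets)
print(links)
```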
Data Citation in Four Steps
4. Work with bibliometrics community to understand data citation data provided via Event Data, and develop data metrics.
Event Data
● Service jointly developed by Crossref and DataCite to capture references, mentions
and other events around DOIs that are not provided via DOI metadata.
● Also includes references between different DOI registration agencies (like data
citations!)
● “Events” is used as a broad term that also includes, for example, social media mentions
and usage statistics.
● Data citations are a small subset of the events captured by the Event Data service.
https://twitter.com/aarontay/status/1111600204858290176
Event Data Service is crucial for Data Citation
Before Event Data
● Data citations needed to be found in DOI metadata, separately for Crossref
and DataCite.
● Doesn’t scale well, limiting adoption of data citation.
After Event Data
● Data citations are extracted into a separate service and can easily be
found.
● Scales well, will lead to adoption of data citation.
Here’s an example
[Figure: flow of information between the article, the Crossref article metadata, and Crossref Event Data]
Event Data Service is crucial for Data Citation
Additional Data Sources
Event Data can be extended to include other types of events outside the usual
article-to-dataset relations.
● Crossref is pulling social media information into Event Data (like Twitter
and Wikipedia)
● DataCite is sending data repository usage reports to Event Data (views
and downloads)
Views and downloads
Event Data APIs
1. Crossref Event Data Query API
Query Event Data using a large number of parameters.
https://www.eventdata.crossref.org/guide/
2. DataCite REST API
Query Event Data that include a DataCite DOI. Integrated into the DataCite REST API, includes basic DOI metadata, and aggregations.
https://support.datacite.org/docs/eventdata-guide
3. Scholix-compatible API
Data citations in Event Data using the Scholix metadata standard.
Event Data APIs
● These are openly available, anyone can use them
● They’re free to use (and we encourage you to do so!)
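To give a feel for what using the Crossref Event Data Query API might look like, here is a small sketch that builds a query URL and reads a response. The endpoint, the parameter names (`obj-id`, `source`, `rows`, `mailto`) and the response field names reflect our reading of the Event Data guide and should be treated as assumptions to verify there. The sample payload is hand-written with a made-up citing DOI, not real API output, and `events_url` and `citing_works` are hypothetical helper names. No network call is made.

```python
from urllib.parse import urlencode

# Assumption: base URL of the Crossref Event Data Query API.
EVENTS_ENDPOINT = "https://api.eventdata.crossref.org/v1/events"


def events_url(obj_id: str, source: str = "crossref", rows: int = 100,
               mailto: str = "you@example.org") -> str:
    """Build a query for events whose object is the given DOI URL."""
    params = {"obj-id": obj_id, "source": source, "rows": rows, "mailto": mailto}
    return f"{EVENTS_ENDPOINT}?{urlencode(params)}"


def citing_works(payload: dict) -> list:
    """Extract (subject, relation type) pairs from an event-list response."""
    events = payload.get("message", {}).get("events", [])
    return [(event["subj_id"], event["relation_type_id"]) for event in events]


url = events_url("https://doi.org/10.6084/m9.figshare.5981968")
print(url)

# Hand-written stand-in for a response; the citing DOI is hypothetical.
sample_payload = {
    "status": "ok",
    "message": {
        "events": [
            {
                "subj_id": "https://doi.org/10.5555/example-article",
                "obj_id": "https://doi.org/10.6084/m9.figshare.5981968",
                "relation_type_id": "references",
                "source_id": "crossref",
            }
        ]
    },
}
print(citing_works(sample_payload))
```

In practice the URL would be fetched with any HTTP client and the decoded JSON passed to `citing_works`. Supplying a real contact address in `mailto` is a courtesy to the service operators.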
In summary
● There are lots of good reasons for data citation
● Collecting data citations from authors is a good place to start!
● Publishers can cite data in their Crossref metadata
● Data repositories can link their data to the articles that cite them
● Data citation information is then made openly available via Event
Data so that anyone interested can find and use this information.
● In future, we expect that lots of services will build off Event
Data!
Thank you!
Questions?
Useful links and contacts:
● Crossref Data & Software Citation Deposit Guide for Publishers:
https://support.crossref.org/hc/en-us/articles/215787303-Crossref-Data-Software-Citation-Deposit-Guide-for-Publishers
● Crossref Event Data: https://www.crossref.org/services/event-data/
● A Data Citation Roadmap for Scientific Publishers:
https://doi.org/10.1038/sdata.2018.259
● A Data Citation Roadmap for Scholarly Data Repositories:
https://doi.org/10.1101/097196
● Make Data Count webinar:
https://www.youtube.com/watch?v=Lkysz0Mc7fo
● support@crossref.org / support@datacite.org


Editor's Notes

  • #2 Rachael: The underlying data created and/or reused and remixed for research is becoming as crucial as the resulting text-based output. This is your opportunity to dig into the what, the why, and the how of data publication, data citation, and data sharing. Workshop hosts will cover this topic from a range of perspectives. Let’s review the best practices and case studies in data citation and data publishing, add to our collective understanding of why this is so important, and contribute to the next steps in building solutions to improving infrastructure for research data. Both do intros
  • #3 Rachael
  • #4 Mary Borrowing heavily from Patricia Cruse, Data Citations and why they matter, Crossref LIVE18 - I’ll provide a link to this in a later slide. Precedence/context: One of the first large-scale initiatives to establish data citation as a standard academic practice was the FORCE11 Joint Declaration of Data Citation Principles (JDDCP) in 2014. This declaration was endorsed by over 100 organizations in the scholarly community as well as many individuals. Following this agreement on how data citation should be done, many projects followed. Within FORCE11, the Data Citation Implementation Pilot brought together publishers and repositories to put data citation into practice and work on the implementation of the JDDCP. Within the context of the Research Data Alliance, a data-literature linking group started under the name of Scholix to establish a framework for exchanging information about the relationships between articles and datasets. The infrastructure building blocks now feed into projects such as Make Data Count and Enabling FAIR Data.
  • #5 Mary Lots of reasons! If datasets are cited consistently and in a standard way, it will make it much easier for the research community to see links between different research outputs and work with these outputs. It also makes it much easier to count these citations, so that researchers can get credit for their data and the sharing of that data. 1) Transparency and reproducibility Most scientific results that are shared today are just a summary of what researchers did and found. The underlying data are not available, making it difficult to verify and replicate results. If data would always be made available with publications, transparency of research would be greatly improved. 2) Reuse The availability of raw data allows other researchers to reuse the data. Not just for replication purposes, but to answer new research questions. 3) Credit When researchers cite the data they used, this forms the basis for a data credit system. Right now researchers are not really incentivized to share their data, because nobody is looking at data metrics and measuring their impact. Data citation is a first step towards changing that.
  • #6 Rachael
  • #7 We appreciate it’s not as easy as just putting extra info in your XML! T&F and Springer Nature explain their policies in this Insights article. Let us know if we can help! Published in the Data C
  • #8 Here’s a nice example from eLife - explaining what data is available, how it is being made available and providing the persistent identifier for the data - in this case a DataCite DOI. The existence of the persistent identifier is key in taking the next steps for data citation.
  • #9 From a Crossref perspective, it’s important that this information finds its way into the metadata that publishers register with us, in order for us to share that information. The simplest way is just the DOI, but you can put the full citation in there too. References: Data citations are citations / we currently support with DataCite DOIs / our limited citation markup support makes identifying other data citations difficult Relations: Citations are relations (‘references’) / allows you to include a more granular type of relationship as well as a description / can append to existing record without touching citations
  • #12  Mary
  • #14 Mary Scholix recommendation IsCitedBy will go into Event Data Fields refer to DataCite metadata (all 3 need to be populated to appear in event data) DataCite metadata schema which is aligned with the Scholix metadata schema There is no single relation type to describe citations, and relation types relevant for citations are sometimes used differently across organizations.
  • #15 JSON
  • #17 Here it is in REAL LIFE! DataCite Search
  • #19 Rachael
  • #20 Rachael
  • #21 Rachael - example of flow of information.
  • #22 Example slide 2. Makes possible - citation information on data (Make Data Count is doing good work around metrics around data). Scholexplorer - data-literature links MDC & Scholix work hand in hand to advocate for best data citation practices Scholix is an information framework for submitting data citations
  • #23 Mary contributions > hubs https://datascience.codata.org/articles/10.5334/dsj-2019-009/
  • #24 Sneak peek Third parties build services based on the information provided by Event Data
  • #25 Mary Scholix > recommendations article >< dataset provides guidance (OpenAIRE gets links from the DataCite Scholix-compatible API) Scholix specifies a format for information sharing that was agreed upon within the Scholix group (format!)
  • #26 Rachael
  • #27 Rachael (Mary, this is you and I being happy about lots of services building off event data). Yaaay We need you to do this so that there is stuff to collect.
  • #28 End slide: Rachael can moderate questions
  • #29 Feel free to add! Do a shout out to people - if they want to talk about the work they’re doing then get in touch! Rachael/Mary presenting at UKSG.