Researchers, academic institutes and funders increasingly recognize the importance of data sharing for reproducible science. However, it is not always clear to researchers how best to share data in a useful way. At Springer Nature we are working on several initiatives to facilitate the sharing of research data in a reusable way, with the overarching goal of publishing research that is robust and reproducible. I will talk about the effort that goes into our flagship data journal, Scientific Data, to facilitate best practices in the publication and sharing of research data, and share some of our experiences publishing Challenge datasets. I will also describe some of the newer Research Data Services that are now available to help all researchers (not only Springer Nature authors) share their data in a useful way.
The challenge of sharing data well, how publishers can help
1. The challenge of sharing data well and how publishers can help
Varsha Khodiyar, PhD
DREAM Challenges and Epidemium@RECOMB workshop, Paris
19.04.2018
2. What are the challenges to sharing data?
• Costs of sharing data: 12%
• Lack of time to deposit data: 16%
• Not knowing which repository to use: 20%
• Unsure about copyright and licensing: 23%
• Organising data in a presentable and useful way: 28%
Total respondents: 7719
Stuart, D. et al. Practical challenges for researchers in data sharing. Springer Nature whitepaper (2018). https://doi.org/10.6084/m9.figshare.5975011
4. The Data Descriptor
Sections
• Title
• Abstract
• Background & Summary
• Methods
• Data Records
• Technical Validation
• Usage Notes
• Figures & Tables
• References
• Data Citations
• Detailed descriptions of the methods and technical analyses supporting the quality of the measurements
• Does not contain tests of new scientific hypotheses
• Data must be archived in a repository at submission
• Peer reviewers are asked to comment on the reusability of the work as presented
5. Data peer review
nature.com/sdata/policies/for-referees
Experimental Rigor and Technical Data Quality
• Were data produced in a sound manner?
• Technical quality of data: appropriate statistical analyses?
• Experimental rigor: appropriate depth, coverage?
Completeness of the Description
• Sufficient detail to allow others to reproduce these steps?
• Sufficient detail to allow others to reuse this data?
• Consistent with relevant minimum reporting standards?
Integrity of the Data Files and Repository Record
• Do data files appear complete and match manuscript descriptions?
• Are data archived to the most appropriate repository?
6. Metadata curation and final data checking
We capture information about the dataset being described in each Data Descriptor.
During the metadata curation process:
• Manuscript re-read
• Data archive checked
• Minor issues with the data and/or manuscript often identified
• Metadata captured in ISA-Tab format
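To give a rough idea of what tab-delimited, ISA-Tab-style sample metadata can look like, here is a minimal Python sketch. It is an illustration only, not the journal's curation tooling: the column headers follow common ISA-Tab study-file conventions, while the sample values and the helper name `to_isa_style_table` are hypothetical.

```python
import csv
import io

# Hypothetical sample metadata; all values are illustrative only.
samples = [
    {
        "Source Name": "subject-01",
        "Characteristics[organism]": "Homo sapiens",
        "Protocol REF": "MRI acquisition",
        "Sample Name": "scan-01",
    },
    {
        "Source Name": "subject-02",
        "Characteristics[organism]": "Homo sapiens",
        "Protocol REF": "MRI acquisition",
        "Sample Name": "scan-02",
    },
]


def to_isa_style_table(rows):
    """Serialize rows as a tab-delimited table with one header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(rows[0].keys()),
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


table = to_isa_style_table(samples)
print(table)
```

A full ISA-Tab record also includes investigation and assay files; this sketch covers only a single sample table.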
8. Data Descriptors help researchers share their data in a reusable way
“The Data Descriptor made it easier to use the data, for me it was critical that everything was there…all the technical details like voxel size.”
Professor Daniele Marinazzo
9. Helping researchers know where to share their data
nature.com/sdata/data-policies/repositories
Browse our recommended data repositories online.
• We currently list more than 100 repositories, across biological, medical, physical and social sciences
• When requested we provide guidance to authors on the best place to store their data
13. How can we help Challenge participants?
Visit nature.com/scientificdata
Email scientificdata@nature.com
Tweet @ScientificData
Editor's Notes
Main findings:
Researchers do share and use one another’s data but lack places to put it.
They would value a high-quality data publication.
We do not expect reviewers to open every file or reuse the data in its entirety.
We often identify minor issues with the data, files, or archive as part of the review process, e.g.
• Typos in the data archive and/or manuscript tables
• File name differences between the archive and the manuscript
• Typos in accession IDs in the manuscript
• Final data citation checking (e.g. changes to the data DOI)
This essentially facilitates an additional final check that the data and manuscript are as accurate as possible prior to final publication, so that the data are maximally reusable.
The curation process adds another layer of checks to the manuscript, ensuring that the published article and data archive are as accurate as possible, maximising data reusability.
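One of the checks described in these notes, spotting file name differences between the archive and the manuscript, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the journal's actual procedure: the function name `compare_file_lists` and the example file names are hypothetical.

```python
def compare_file_lists(manuscript_files, archive_files):
    """Report file names that appear in only one of the two lists."""
    manuscript = set(manuscript_files)
    archive = set(archive_files)
    return {
        "missing_from_archive": sorted(manuscript - archive),
        "not_described_in_manuscript": sorted(archive - manuscript),
    }


# Hypothetical file names; note the case difference in the second file.
manuscript_files = ["sub-01_T1w.nii.gz", "sub-02_T1w.nii.gz", "participants.tsv"]
archive_files = ["sub-01_T1w.nii.gz", "sub-02_T1W.nii.gz", "participants.tsv"]

report = compare_file_lists(manuscript_files, archive_files)
for issue, names in report.items():
    print(issue, names)
```

The comparison is deliberately exact and case-sensitive, so a small typo shows up as one entry in each list, which a curator can then resolve by hand.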
Daniele knew about the dataset before Chris’ paper was published, as Chris had shared it in Torrent Exchange. However, he did not access the data from there. He saw on Twitter that the SciData paper had been published and then read the paper. Daniele said “I would never have collected this data myself, as it’s not my primary field of work”.
He said the Data Descriptor made it easier to use the data: “for me it was critical that everything was there [in the Data Descriptor]…all the technical details like voxel size.”