Show me the data!
Data peer review at Scientific Data
Varsha Khodiyar, Scientific Data
30.03.2017
Scientific Data, a Nature Research journal
• Data Descriptor: primary article type; describes sound science and facilitates data reuse
• Analysis: new analyses or meta-analyses of existing data
• Article: original reports on advances in data sharing & reuse
• Comment: announcements of broad interest; usually invited
www.nature.com/scientificdata
Under the hood of a Data Descriptor
• Context for data generation (background)
• How was data generated?
• How was data processed?
• Where is the data?
• Synthesis
• Analysis
• Conclusions
A key principle of publishing at Scientific Data
Wilkinson M.D., et al. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 3, 160018 (2016). doi:10.1038/sdata.2016.18
Findable – (meta)data is uniquely and persistently identifiable.
Accessible – data is reachable and accessible by humans and machines, using standard formats and protocols.
Interoperable – (meta)data is machine readable and annotated with resolvable vocabularies and ontologies.
Reusable – (meta)data is sufficiently well-described to allow integration with compatible data.
Data Descriptors have human- and machine-understandable components
• Human-readable representation of study, i.e. article (HTML & PDF)
• Machine-accessible representation of study, i.e. metadata
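As an illustration of what a machine-accessible representation can look like, here is a minimal sketch that builds a small metadata record as JSON and checks that a few FAIR-related fields are filled in. It rests on assumptions: the field names, the placeholder DOI and the repository URL are hypothetical and are not the metadata schema used by Scientific Data; only the NCBI Taxonomy term and the CC BY 4.0 licence URL are real identifiers.

```python
import json

# Hypothetical metadata record for an imaginary dataset. Field names and the
# placeholder DOI/URL are illustrative; this is not Scientific Data's schema.
record = {
    # Findable: a persistent identifier (placeholder DOI)
    "identifier": "https://doi.org/10.1234/example-dataset",
    "title": "Example blood gas measurements",
    # Accessible: where, and in what standard format, the data can be retrieved
    "access_url": "https://repository.example.org/datasets/example",
    "format": "text/csv",
    # Interoperable: a resolvable ontology term (NCBI Taxonomy, Homo sapiens)
    "organism": {
        "label": "Homo sapiens",
        "term": "http://purl.obolibrary.org/obo/NCBITaxon_9606",
    },
    # Reusable: explicit conditions for reuse
    "license": "https://creativecommons.org/licenses/by/4.0/",
}

# Fields this sketch treats as mandatory (an assumption, not a standard).
REQUIRED_FIELDS = ("identifier", "access_url", "format", "license")


def missing_fields(rec):
    """Return required fields that are absent or empty in a metadata record."""
    return [field for field in REQUIRED_FIELDS if not rec.get(field)]


serialized = json.dumps(record, indent=2, sort_keys=True)
print(serialized)
print("Missing fields:", missing_fields(record))
```

Because the record is plain JSON, the same file serves both a person skimming it and a program harvesting it, which is the point of the machine-accessible layer.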
What types of data can be published?
• Decades-old dataset
• Standalone dataset
• Data that has been used in an analysis article
• Large consortium dataset
• Data from a single experiment
• Data associated with a high-impact analysis article
Any data that the researcher finds valuable and that others might find useful too
When can a Data Descriptor be published?
• Before the analysis has been published
• Publication alongside the analysis article
• After the data analysis has been published
• Authors not intending to analyse the data
Data Descriptors can be submitted and published at any point in the research workflow, i.e. whenever it makes most sense for your data.
Researchers are sharing and reusing data
• Direct contact between researchers (on request) is the most common way of sharing data
• Repositories are the second most common method of sharing
Why might direct contact be the preferred method?
Figs 2A & 2C; Kratz and Strasser, PLOS ONE (2015), doi:10.1371/journal.pone.0117619
Researchers see peer review as a mark of data quality
• Respondents trust peer review above all else: 72% (n = 175) say peer review confers high or complete confidence in the data
Figure 6B; Kratz and Strasser, PLOS ONE (2015), doi:10.1371/journal.pone.0117619
Selection of Editorial Board members
• Experts in their discipline, AND
• With demonstrable experience of data standards, data reuse or data analysis in their discipline
www.nature.com/sdata/about/editorial-board#eb
Data peer review
www.nature.com/sdata/policies/for-referees
Experimental Rigor and Technical Data Quality
• Were data produced in a sound manner?
• Technical quality of data: appropriate statistical analyses?
• Experimental rigor: appropriate depth, coverage?
Completeness of the Description
• Sufficient detail to allow others to reproduce these steps?
• Sufficient detail to allow others to reuse this data?
• Consistent with relevant minimum reporting standards?
Integrity of the Data Files and Repository Record
• Do data files appear complete and match manuscript descriptions?
• Are data archived to the most appropriate repository?
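Part of the file-integrity check can be automated. This sketch assumes the authors supply a hypothetical manifest mapping each file name described in the manuscript to a SHA-256 checksum; it then reports files that are described but not archived, archived but not described, or whose contents differ from the manifest. The manifest format and function names are illustrative only, and the check does not replace a reviewer actually opening the files.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_archive(data_dir, manifest):
    """Compare archived files with a hypothetical {file name: SHA-256} manifest."""
    data_dir = Path(data_dir)
    archived = {p.name for p in data_dir.iterdir() if p.is_file()}
    described = set(manifest)
    mismatched = sorted(
        name for name in described & archived
        if sha256_of(data_dir / name) != manifest[name]
    )
    return {
        "missing": sorted(described - archived),      # described but not archived
        "undescribed": sorted(archived - described),  # archived but not described
        "mismatched": mismatched,                     # contents differ from manifest
    }


def demo():
    """Build a throwaway archive with one problem of each kind and check it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "samples.csv").write_bytes(b"id,co2_mmhg\n1,40\n")
        (root / "sizes.csv").write_bytes(b"id,length_cm\n1,12\n")
        (root / "notes_draft.txt").write_bytes(b"to do\n")
        manifest = {
            "samples.csv": hashlib.sha256(b"id,co2_mmhg\n1,40\n").hexdigest(),
            "sizes.csv": hashlib.sha256(b"different contents").hexdigest(),
            "controls.csv": hashlib.sha256(b"never archived").hexdigest(),
        }
        return check_archive(root, manifest)


if __name__ == "__main__":
    print(demo())
```

Checksums catch silent changes between what was described and what was deposited; whether the files are *complete* in a scientific sense still needs a reviewer.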
Metadata curation and final data checking
We capture metadata about the dataset described in each Data Descriptor. During the metadata curation process:
• The manuscript is re-read
• The data archive is checked
• Minor issues with the data and/or manuscript are often identified
Why a Data Descriptor may be rejected
Reject without review
• Out of scope or no data present
Reject after review
• Serious flaws in the study design, e.g. lack of crucial controls
• Serious issues identified in the data files by the peer reviewers
After rejection
• Address concerns and resubmit to Scientific Data
• Resubmit to another data journal
• Withdraw data from Scientific Data integrated repositories
Data should be technically reliable and suitable for use by others
Create a data management plan
• Can avoid problems later
• Increasingly required by funders
• Critically evaluate existing practices – you may be setting standards for your field
• Some aspects of best practice may incur costs
• Find people and resources that can help you
[Figure: Research paper, Datasets, Code, Metadata (Nature Genetics)]
Archive your data to the most appropriate repository
We currently list around 90 repositories across the biological, medical, physical and social sciences
www.nature.com/sdata/policies/repositories
Considerations:
1. Is there a discipline- or data-specific repository for your data?
2. If no discipline- or data-specific repository exists for your data, does your funder or institution mandate deposition to a particular repository?
Spot the mistakes
• Unhelpful document name
• Formatting used to convey information
• Special characters can cause text mining errors
• Meaningless column titles
• Undefined abbreviation
• No units are given
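Some of these mistakes can be screened for before submission. This sketch assumes a hypothetical column-naming convention, `<description>_<unit>` (for example `blood_co2_mmhg`), and an illustrative, incomplete list of units; it flags generic column titles, special characters or spaces, and missing units. Unhelpful document names, undefined abbreviations and formatting used to convey information are not covered and still need a human reader.

```python
import re

# Illustrative, incomplete list of unit suffixes (an assumption of this sketch).
KNOWN_UNITS = {"mmhg", "kg", "cm", "years", "celsius", "percent"}
# Column titles that say nothing about their content, e.g. "Column1", "var2".
GENERIC_NAME = re.compile(
    r"^(col|column|var|field|x|y|data|unnamed)[_ ]?\d*$", re.IGNORECASE
)
# ASCII letters, digits and underscores only: safe for text mining and most tools.
SAFE_NAME = re.compile(r"^[A-Za-z0-9_]+$")


def check_header(columns, unitless=()):
    """Return (column, problem) pairs for common header mistakes.

    Assumes the hypothetical "<description>_<unit>" convention. Columns listed
    in `unitless` (such as identifiers) are not checked for units.
    """
    problems = []
    for column in columns:
        if GENERIC_NAME.match(column):
            problems.append((column, "meaningless column title"))
        elif not SAFE_NAME.match(column):
            problems.append((column, "special characters or spaces"))
        elif (column not in unitless
              and column.rsplit("_", 1)[-1].lower() not in KNOWN_UNITS):
            problems.append((column, "no recognised unit"))
    return problems


header = ["sample_id", "Column1", "CO₂ (mmHg)", "age_years", "weight"]
for column, problem in check_header(header, unitless={"sample_id"}):
    print(f"{column!r}: {problem}")
```

Each column is reported once, with the first problem found; a stricter checker could collect every problem per column.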
Increasing reproducibility
• Include any additional information needed to understand the data, methods and parameters, e.g. which instrument (make and model) was used to measure blood carbon dioxide levels
• Include availability statements for any code that was used to view, parse or analyse the data in support of the conclusions
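One way to capture this information is to archive a small provenance file next to the data. The sketch is hypothetical: the record layout, the instrument make and model, and the code URL are placeholders. It uses only the Python standard library (`platform`, `importlib.metadata`) to record the software versions in use.

```python
import json
import platform
from importlib import metadata


def package_versions(names):
    """Look up installed package versions (None if a package is not installed)."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def provenance_record(instrument, code_url, packages=()):
    """Build a hypothetical provenance record to archive alongside the data."""
    return {
        "instrument": instrument,   # make and model of the measuring instrument
        "analysis_code": code_url,  # where the code used on the data is available
        "python_version": platform.python_version(),
        "packages": package_versions(packages),
    }


record = provenance_record(
    instrument={"make": "ExampleCorp", "model": "BGA-1000"},  # hypothetical
    code_url="https://code.example.org/blood-gas-analysis",   # placeholder URL
    packages=["numpy"],
)
print(json.dumps(record, indent=2))
```

Recording versions at the time of analysis is cheaper than reconstructing them later, when a reader asks how the numbers were produced.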
Data reuse by other researchers in the same field
“The Data Descriptor made it easier to use the data, for me it was critical that everything was there…all the technical details like voxel size.”
Professor Daniele Marinazzo
Data reuse by the non-research community
http://www.nytimes.com/interactive/2014/12/30/science/history-of-ebola-in-24-outbreaks.html
Data peer review at Scientific Data
Data Archive
• Checked multiple times
• Scientific reasoning underlying data reviewed by active researchers
• Technical validity reviewed by discipline experts
Data Citations
• Citation accuracy confirmed by specialist editor
• Citation format checked by editorial team
• Data linkage tested by production team
Data Peer Review
• Does not have to be onerous
• Can save overall reviewing time
• Results in data that is reusable and useful!