1. WHAT CONSTITUTES PEER
REVIEW OF RESEARCH DATA
– A STUDY OF CURRENT PRACTICE
TODD A. CARPENTER
EXECUTIVE DIRECTOR, NISO
CO-CHAIR, RDA DATA PUBLISHING IG
2. IT DOESN’T MAKE SENSE WITHOUT THE METADATA!*
XKCD (CONVINCING)
* With apologies for the sexism to my many math-minded female friends!
3. WE ALL KNOW WHAT PEER REVIEW OF A PAPER IS
Initial editorial review includes things like:
• Does the article fit within the journal’s scope and aims?
• Will it be of interest to the readership?
• Does it meet minimum quality standards for writing and content?
• Did the author follow the instructions?
• Did it meet some type of deadline?
Journal editors reject between 6% and 60% of papers at this stage.
(Per Schultz)
4. AND WHAT PEER REVIEWERS ASSESS IN PAPERS
• Study design and methodology
• Soundness of work and results
• Presentation and clarity of data
• Interpretation of results
• Whether research objectives have
been met
• Whether a study is incomplete or too
preliminary
• Novelty and significance
• Ethical issues
• Other journal-specific issues
5. BUT PEER REVIEW ISN’T PERFECT
• Causes delay in publication
• Bias against specific categories of papers/topics
• Social & cognitive biases
• Unreliability
• Inability to detect errors and fraud
• Lack of transparency—leading to unethical practices
• Lack of recognition for reviewers
(from Walker and Rocha da Silva)
6. PEER REVIEW OF DATA IS EVEN LESS PERFECT
• Overall complexity of data (data, metadata, methodology, IP)
• Not only the data is needed, but also the software to process it
• Provenance of data
• Post-gathering preprocessing of data
• Statistical/computational rigor
• Size and complexity of data sets
• Code/software must also be reviewed for proper analysis
• How to handle different versions of the same data set?
• When does a new version require additional review? Does the earlier review still apply?
• More work for already burdened reviewers
7. JUST LIKE ALL OTHER SCHOLARLY CONTENT
“It was generally agreed that
data should be peer reviewed.”
- 2014 Study of 4,000 researchers by David Nicholas et al.
8. HECK, WHO PUBLISHES THEIR DATA ANYWAY?
A 2011 study of 500 papers published in 2009 in
50 top-ranked research journals showed that only
47 (9%) of the papers reviewed had deposited
full primary raw data online.
(Source: Alsheikh-Ali et al.)
9. OH, HOW DATA PUBLISHING HAS CHANGED
Over the past decade the number of journals that accept data has increased.
The number of titles that explicitly require data sharing in some form
is also increasing rapidly.
A culture of data sharing is developing as researchers respond to data sharing
requirements, evidence of the efficacy of data sharing, and its growing acceptance
as a scientific norm in many fields.
10. WHOLE LOT OF DATA PUBLISHING GOING ON
Source: Assante et al.
11. LOTS OF JOURNALS PUBLISH DATA TODAY
Data policies are growing in number
• PLOS
• AGU
• SpringerNature
• American Economic Association
• COPDESS Statement of Commitment (43 Signatories)
• The International Committee of Medical Journal Editors has recommended that as a “condition
of … publication” of a trial report, journals will “require authors to share with others the
deidentified individual-patient data no later than 6 months after publication”.
• These policies result in part from the inclusion of data release requirements in the OSTP Memo,
the Wellcome Trust policy, the Gates Foundation policy, and many others.
12. BUT WHO IS RESPONSIBLE FOR DATA PEER
REVIEW?
• Repositories that accept data?
• Some repositories, such as Dryad, the NASA Planetary Data System, and the Qualitative
Data Repository, perform data review upon deposit.
• Journals that require data submission as part of publication process?
• Some journals that require data release with publication require at least cursory
peer review of associated data with their papers.
• Data journal publishers?
• Explicitly providing peer review and “publication” of data
13. WHAT IS A DATA PAPER?
• Data papers are “scholarly publication of a searchable metadata
document describing a particular on-line accessible dataset, or a
group of datasets, published in accordance to the standard
academic practices”
(Chavan & Penev, 2011)
• Phrased more succinctly: “Information on the what, where, why, how
and who of the data” (Callaghan et al., 2012)
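The "what, where, why, how and who" framing above can be sketched as a minimal metadata record. This is an illustrative sketch only: every field name, value, and URL below is an assumption made for the example, not a real journal or repository schema.

```python
# Hypothetical minimal data-paper metadata record answering the
# "what, where, why, how and who" of a dataset.
# All field names and values are illustrative assumptions.
dataset_record = {
    "title": "Hourly air temperature, Station X, 2015",        # what
    "repository_url": "https://repo.example.org/ds/1234",      # where (hypothetical)
    "rationale": "Baseline series for local climate studies",  # why
    "methods": "Calibrated sensor, 1-hour sampling, QC flags", # how
    "creators": ["A. Researcher", "B. Curator"],               # who
    "license": "CC-BY-4.0",
}

# Fields a (hypothetical) data journal might require of every submission.
REQUIRED = {"title", "repository_url", "rationale", "methods", "creators", "license"}

def missing_fields(record: dict) -> set:
    """Return the required fields absent from a metadata record."""
    return REQUIRED - record.keys()

assert missing_fields(dataset_record) == set()
```

A record-level completeness check like this is the kind of test a repository could run automatically at deposit time, before any human peer review begins.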
14. WHO PUBLISHES THEM?
• In a 2014 paper, Candela et al. identified 116 journals that publish data
papers: 7 "pure" data journals (publishing only data papers) and 109 that had
published at least one data paper, of which Springer/BioMedCentral contributed 94 titles.
• Subsequently, additional titles have launched and more detailed data access
policies are being advanced.
• SpringerNature/BioMedCentral now have a set of four data policies that
cover all of their titles. Only the 8 level-4 titles, where data sharing is
required, were included in this study.

Policy type | Num. of titles
4 | 8
3 | 343
2 | 178
1 | 217
15. STUDY METHODOLOGY
• Taking Candela et al. list of titles, I added newly released data journals to
the list (13 new titles), as well as SpringerNature’s Data Policy types. In total,
39 specific titles were reviewed.
• For each title, I located the available peer review criteria, instructions for
reviewers, or editorial policies.
• I extracted key criteria from each to compile and then compare similarities &
differences.
17. PEER REVIEW OF DATA
IS SIMILAR TO PEER REVIEW
OF A PAPER, BUT INCLUDES
A LOT MORE
18. RECOMMENDED PEER REVIEW CRITERIA
FROM LAWRENCE ET AL.
Suitability for Publication
Importance
Originality/Novelty - of Science
Metadata Quality
Metadata Presentation
Metadata Standards Conformance / Template Conformance
Dataset DOI Assignment
Metadata Rights Information
Provenance
Metadata Completeness
Data Logic & Consistency
Data Format Consistency
Data - Plausibility
Data - Worthy of Subselection / Broad Enough / Appropriate Selection
Data - Software Used to Process Described
Data Quality Methods
Data Anomalies Documented
19. FIVE BROAD CATEGORIES OF PEER REVIEW
PROCESSES FOR DATA
Editorial Review (9 Criteria)
Metadata Quality (9 Criteria)
Data Quality (15 Criteria)
Methodology (12 Criteria)
Other (11 Criteria)
20. EDITORIAL REVIEW OF THE DATASET
Editorial Review 36
Open Peer Review 3
Conflict of Interest Policy 17
Topical Appropriateness in Title 29
Suitability for Publication in Title 28
Importance of the Subject 12
Overall Quality of Research 31
Unpublished 13
Originality/Novelty - of Science 27
21. METADATA REVIEW OF THE DATASET
Metadata Quality 33
Metadata Presentation 24
Metadata Standards Conformance / Template Conformance 19
Title/Abstract/Writing Clarity 25
Keyword Selection 2
Dataset DOI Assignment 10
Metadata Rights Information 1
Provenance 0
Metadata Completeness 19
22. DATA REVIEW OF THE DATASET
Data Logic & Consistency 17
Data Format Consistency 17
Data - Non-Proprietary Formats 6
Data - Plausibility 1
Data - of High Quality 2
Data - Worthy of sub selection/broad enough/Appropriate selection 16
Data reuse 23
Data - Software Used to Process Described 15
Data Units of Measure 22
Data Quality Methods 17
Data Anonymization 5
Data Anomalies Documented 2
How are outliers identified & treated? 1
Any data errors introduced by transmissions (checksums)? 3
Any errors in technique, fact, calculation, interpretation, completeness? 18
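Two of the checks above, transmission-error detection via checksums and the identification of outliers, are concrete enough to sketch in code. This is a minimal illustration, not any journal's actual procedure; the 1.5×IQR threshold is one common convention, and all names here are assumptions.

```python
# Sketch of two data-review checks: checksums for transmission errors,
# and IQR-based outlier identification. Illustrative only.
import hashlib

def file_checksum(payload: bytes) -> str:
    """SHA-256 digest; a depositor publishes it alongside the data set and a
    reviewer recomputes it after download to detect transmission errors."""
    return hashlib.sha256(payload).hexdigest()

def iqr_outliers(values):
    """Flag points outside 1.5x the interquartile range (a common convention).
    In a data review these would be documented, not silently dropped."""
    xs = sorted(values)
    n = len(xs)
    q1, q3 = xs[n // 4], xs[(3 * n) // 4]  # rough quartiles for illustration
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in values if x < lo or x > hi]

# Recomputing the digest on identical bytes confirms nothing changed in transit.
data = b"temperature,12.1\ntemperature,12.3\n"
assert file_checksum(data) == file_checksum(data)

print(iqr_outliers([10, 11, 12, 11, 10, 98]))  # -> [98]
```

The point of the sketch is that some review criteria (checksums, outlier flagging) are mechanical and automatable, while others on these slides (plausibility, appropriateness of selection) require human judgment.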
23. METHODOLOGY REVIEW OF THE DATASET
Methodology - Appropriate 25
Methodology - Current 9
Methodology - Data Collection Methods 23
Methodology - High Technical Standard 18
Methodology - Equipment Description 2
Methodology - Replicable 23
Methodology - Quality Control 3
Study design - Overall 16
Methodology - Supporting Experiments 16
Study design - Replications Identified 15
Study design - Independencies or Co-Dependencies 16
Study/Data Preprocessing Described 16
24. OTHER REVIEW CRITERIA
Citations to Other Relevant Materials 29
Fairness 1
Anonymity of Reviewers (if desired) 15
Ethics of Experimentation 28
Public Data Sharing, Open License Requirement 32
Data Repository with Sustainability Model 13
Data Sharing - Platform Agnostic 1
Link to Public Repository 31
Descriptions of How to Access Data 30
Abbreviations Noted/Defined 15
All Contributors/Authors Credited 16
25. MOST-REFERENCED PEER REVIEW CRITERIA
Attribute | Included in Statements | Lawrence et al (x)
Editorial Review 36
Metadata Quality 33 x
Public Data Sharing, Open License Requirement 33
Overall Quality 31
Link to Public Repository 31
Descriptions of How to Access Data 30
Topical Appropriateness 29
Citations to Other Relevant Materials 29
Suitability for Publication 28 x
Ethics of Experimentation 28
Originality/Novelty - Of Science 27 x
Title/Abstract/Writing Clarity 25
Methodology - Appropriate 25 x
Metadata Presentation 24 x
Data Reuse 23
26. LEAST-REFERENCED PEER REVIEW CRITERIA
Attribute | Included in Statements | Lawrence et al (x)
Any data errors introduced by transmissions (checksums)? 3 x
Methodology - Quality Control 3
Keyword Selection 2
Data - Of High Quality 2
Data Anomalies Documented 2 x
Methodology - Equipment Description 2
Metadata Rights Information 1 x
Data - Plausibility 1 x
How are outliers identified & treated? 1 x
Fairness 1
Data sharing - Platform Agnostic 1
Provenance 0 x
27. WHAT CONSTITUTES ROBUST PEER REVIEW
• Editorial purpose and cohesion is important, even in data publication
• Significant push for openness of data & reuse, with recognition of
anonymization or access control if appropriate
• Links to public repositories and details on gaining access
• Ethical concerns regarding data collection need to be documented
• Overall quality of metadata, with specifics depending on domain
• Focus on replicability
28. COMPARISON OF THE LAWRENCE ET AL.
LIST WITH CURRENT PRACTICE
• Of the 23 criteria outlined by Lawrence et al., only 5 were in the top third of
criteria being applied by journal publishers.
• 10 of the 23 were in the middle third of criteria by number of titles applying
those criteria.
• And 8 of their 23 criteria are in use by fewer than one-third of reviewed policies.
This shows how much the process of data publishing has changed in the past 6
years.
30. WHAT IS “DATASET VALIDATION” IN REPOSITORIES?
Many journals request (or require) deposit of data in a trusted repository.
Few repositories conduct Dataset Validation upon deposit.
• “At the moment among the most debated and undefined data publishing
phases”
• “There are no shared established criteria on how to conduct such review and
on what data quality is.”
(Source: Assante et al.)
31. THE ENTIRE PEER REVIEW PROCESS ISN’T KNOWABLE
• Detailed peer review instructions are not always publicly available and
additional detail/instructions may be provided to peer reviewers.
• One journal’s catch-all instruction was “ALL Aspects of the data set are
reviewed”
• This is a study of policies, not of actual peer review behavior. (i.e., some peer
reviewers might not follow the exact letter of the policy[!])
32. CONCLUSION
• Over the past five years, there has been a significant
growth in the publication and sharing of scholarly
research data.
• With this rise, there has been an increase in the
demand for peer review of data sets.
• Each domain is different, and the expectations of
data review are distinct, as they are with article peer
review processes, but there are commonalities and
baseline standards can be propagated.
33. REFERENCES
• Alsheikh-Ali AA, Qureshi W, Al-Mallah MH, Ioannidis JP. Public availability of published research data in high-impact journals. PLoS One. 2011;6(9):e24357.
doi: 10.1371/journal.pone.0024357. Epub 2011 Sep 7.
• Assante, M. et al., (2016). Are Scientific Data Repositories Coping with Research Data Publishing?. Data Science Journal. 15, p.6. DOI:
http://doi.org/10.5334/dsj-2016-006
• Candela, Leonardo, et al. "Data journals: A survey." Journal of the Association for Information Science and Technology 66.9 (2015): 1747-1762.
• Chavan, V., & Penev, L. (2011). The data paper: a mechanism to incentivize data publishing in biodiversity science. BMC Bioinformatics, 12 (15).
doi:10.1186/1471-2105-12-S15-S2
• Lawrence, B., Jones, C., Matthews, B., Pepler, S., & Callaghan, S. (2011). Citation and peer review of data: Moving towards formal data publication.
International Journal of Digital Curation, 6(2), 4-37. https://doi.org/10.2218/ijdc.v6i2.205
• Nicholas, David, et al. "Peer review: still king in the digital age." Learned Publishing 28.1 (2015): 15-21. Also CICS/CIBER. Trust and Authority in Scholarly
Communications in the Digital Era (2013). Available at: http://ciber-research.eu/download/20140115-Trust_Final_Report.pdf
• Schultz DM (2010). Rejection rates for journals publishing in the atmospheric sciences. Bulletin of the American Meteorological Society, 91(2): 231-243. doi:
10.1175/2009BAMS2908.1.
• SpringerNature Research Data Policy Types, http://www.springernature.com/gp/group/data-policy/policy-types
• Taichman DB, Backus J, Baethge C, Bauchner H, de Leeuw PW, Drazen JM, et al. Sharing clinical trial data – A proposal from the International Committee of
Medical Journal Editors. N Engl J Med. 2016;374(4):384–6.
• Walker, Richard and Rocha da Silva, Pascal. Emerging trends in peer review—a survey. Front. Neurosci., 27 May 2015.
https://doi.org/10.3389/fnins.2015.00169
34. THANK YOU!
TODD A. CARPENTER
EXECUTIVE DIRECTOR, NISO
TCARPENTER@NISO.ORG
@TAC_NISO