Reproducibility of Published
Scientific and Medical Findings in
Top Journals in an Era of Big Data
Shannon Bohle, BA, MLIS, CDS (Cantab), FRAS, AHIP
.org 2014 Tech Conference
The Honourable Robert Boyle 1627–1691, Experimental Philosopher
(Image credit: Wellcome Library, London). Published with written permission.
First published in 1661, The Sceptical Chymist: or Chymico-Physical Doubts &
Paradoxes, Touching the Spagyrist's Principles Commonly call'd Hypostatical; As
they are wont to be Propos'd and Defended by the Generality of Alchymists.
Whereunto is præmis'd Part of another Discourse relating to the same
Subject was written by Robert Boyle and is the source of the name of the
modern field of 'chemistry.' Image credit: Project Gutenberg.
Yunda Huang and Raphael Gottardo. Comparability and reproducibility of biomedical data. Brief Bioinform. Jul 2013; 14(4): 391–401. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3713713/. doi: 10.1093/bib/bbs078.
“Stodden’s talk reminded us that the idea of open data is not a new one; indeed, when studying the history and philosophy of science, Robert Boyle is credited with stressing the concepts of skepticism, transparency, and reproducibility for independent verification in scholarly publishing in the 1660s. The scientific method later was divided into two major branches, deductive and empirical approaches, she noted. Today, Stodden advocated, a theoretical revision in the scientific method should include a new branch, the computational approach, in which, as with the other two methods, all of the computational steps by which scientists draw conclusions are revealed. This is because within the last 20 years, people have been grappling with how to handle changes in high-performance computing and simulation. What is often referred to as “big data” has revolutionized science. Examples include the Large Hadron Collider (LHC) at CERN, which generates around 780 terabytes per year; the Sloan Digital Sky Survey, which recently released 60 terabytes; and computational biology, bioinformatics, and genomics, which are also highly data-intensive modern fields of science.”
8 Different Standards for Rating the Quality of Open Data
“When working with government data it may be helpful to keep a few key guidelines in mind. The problem is, there are many
guidelines. A working group within OpenGovData.org developed ‘8 Principles of Open Government Data’ which are: ‘1. Data Must Be
Complete... 2. Data Must Be Primary... 3. Data Must Be Timely... 4. Data Must Be Accessible... 5. Data Must Be Machine processable...
6. Access Must Be Non-Discriminatory... 7. Data Formats Must Be Non-Proprietary... 8. Data Must Be License-free.’ This is very similar
to the Sunlight Foundation's ‘Ten Principles for Opening Up Government Information’— ‘1. Completeness... 2. Primacy... 3.
Timeliness... 4. Ease of Physical and Electronic Access... 5. Machine readability... 6. Non-discrimination... 7. Use of Commonly
Owned Standards... 8. Licensing... 9. Permanence... 10. Usage Costs.’ Open government data initiatives could also be held up to a
5-star rating method, which has been proposed by Tim Berners-Lee, the British computer scientist credited with inventing the
World Wide Web:
★ Available on the web (whatever format), but with an open licence to be Open Data
★★ Available as machine-readable structured data (e.g. Excel instead of image scan of a table)
★★★ As (2), plus non-proprietary format (e.g. CSV instead of Excel)
★★★★ All the above, plus use W3C open standards (RDF and SPARQL)
★★★★★ All the above, plus link your data to other people’s data to provide context
The Open Data Institute has created an ‘Open Data Certificate’ for data and rates it against a checklist. The certificates awarded grade data as ‘Raw: A great start at the basics of publishing open data; Pilot: Data users receive extra support from, and can provide feedback to, the publisher; Standard: Regularly published open data with robust support that people can rely on; and Expert: An exceptional example of information infrastructure.’ In 2009, the White House created a scorecard by which open data can be evaluated according to 10 criteria: ‘high value data, data integrity, open webpage, public consultation, overall plan, formulating the plan, transparency, participation, collaboration, and flagship initiative.’ The U.S. government's simple stoplight-like rating system was as follows: green for data that ‘meets expectations,’ yellow for data that demonstrates ‘progress toward expectations,’ and red for data that ‘fails to meet expectations.’ At the other end of the spectrum, there is an exceptionally complex checklist offered by OPQUAST. On May 9, 2013, President Obama issued an Executive Order, ‘Making Open and Machine Readable the New Default for Government Information,’ which established a new ‘Open Data Policy,’ now being implemented through ‘Project Open Data’ with seven key principles: ‘public, accessible, described, reusable, complete, timely, and managed post-release.’ There does not seem to be an associated rating system, however, to evaluate how well data complies with these principles. Finally, Nature has set up three criteria for data: firstly, ‘experimental rigor and technical data quality,’ secondly, ‘completeness of the description,’ and lastly, ‘integrity of the data files and repository record.’”
EVALUATING P VALUES, R², AND SAMPLE SIZES
IN TOP-TIER SCIENCE JOURNALS
Access to raw or curated data sets accompanying published articles adds greater transparency, and it may catch obvious errors or fraud. However, it will only show that the studies were done to the best of the researchers' ability given a small sample size.
In addition to replication of experiments and reproduction based on data sets and methodology, as we move into an era of “Big Data,” from small sample sizes to very large ones, some arguments in previously published articles, both in top-tier and lower-tier journals, will probably be disproved while others will be proved nearly conclusively.
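A minimal sketch (not from the talk; the effect size and sample sizes are illustrative assumptions) of why larger samples settle borderline findings: with a small, genuinely real effect, a one-sample z-test on 20 observations often returns an inconclusive p value, while the same test on 20,000 observations is decisive.

```python
import math
import random

random.seed(7)

def z_test_p(sample, mu0=0.0):
    """Two-sided one-sample z-test p value, assuming a known sd of 1."""
    n = len(sample)
    z = (sum(sample) / n - mu0) * math.sqrt(n)
    # Phi(x) = 0.5 * (1 + erf(x / sqrt(2))); p = 2 * (1 - Phi(|z|))
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

effect = 0.2  # a small but real true effect
small = [random.gauss(effect, 1.0) for _ in range(20)]
large = [random.gauss(effect, 1.0) for _ in range(20000)]

print(f"n=20     p = {z_test_p(small):.3f}")  # often hovers near 'significance'
print(f"n=20000  p = {z_test_p(large):.3g}")  # decisively small
```

The same logic underlies the Wood et al. finding cited below: a near-significant p value for a real effect tends to become more significant as data accumulate, because the test statistic grows with the square root of the sample size.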
Fraud and “gaming” of the p value (probability value), manipulation of sample size, and “cherry picking” of successful results are key ways that fraud happens. If 100 mice were tested and only 10 showed the expected results, for example, only those 10 might be reported on. Big data makes it less likely that this cherry picking will work. Is there a way to test for gaming of the data? Yes, there are at least two ways: 1) Examine lab notebooks to see how many experiments were done correctly but “thrown out” because the results did not fit the hypothesis, and 2) “Big Data” opens the door for much greater levels of certainty: predictable, reproducible p values.
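The mouse example above can be simulated to show how much cherry picking distorts a result. In this hypothetical sketch (the numbers are illustrative, not from any cited study), the treatment has no true effect at all, yet reporting only the 10 most favorable of 100 animals manufactures an apparently large one:

```python
import random
import statistics

random.seed(42)

def simulated_trial(n_mice, effect=0.0):
    """Simulate one measurement per mouse under a given true effect size."""
    return [random.gauss(effect, 1.0) for _ in range(n_mice)]

# True effect is zero: the treatment does nothing.
measurements = simulated_trial(100, effect=0.0)

# Honest report: the mean over all 100 mice stays close to zero.
honest_mean = statistics.mean(measurements)

# Cherry-picked report: keep only the 10 most favorable mice.
top10 = sorted(measurements, reverse=True)[:10]
picked_mean = statistics.mean(top10)

print(f"mean over all 100 mice:   {honest_mean:+.2f}")
print(f"mean over 10 'best' mice: {picked_mean:+.2f}")
```

As sample sizes grow into big-data territory, the honest mean converges ever more tightly on the true effect, so a cherry-picked subset becomes increasingly easy to detect against the full data set.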
Figure 1. Breakdown of journal policies for public deposition of certain data types, sharing of materials and/or
protocols, and whether this is a condition for publication and percentage of papers with fully deposited data.
Alsheikh-Ali AA, Qureshi W, Al-Mallah MH, Ioannidis JPA (2011) Public Availability of Published Research Data in High-
Impact Journals. PLoS ONE 6(9): e24357. doi:10.1371/journal.pone.0024357.
Table 1. Economic Terms and Analogies in Scientific Publication
Young NS, Ioannidis JPA, Al-Ubaydli O (2008) Why Current Publication Practices May
Distort Science. PLoS Med 5(10): e201. doi:10.1371/journal.pmed.0050201
King G (1995) Replication, replication. Political Science and Politics 28: 443–499. http://gking.harvard.edu/files/abs/
replication-abs.shtml (last accessed May 8, 2013).
Mesirov J (2010) Accessible reproducible research. Science 327: 415. http://www.sciencemag.org/content/327/5964/415 (last accessed May 8, 2013).
Alsheikh-Ali A, Qureshi W, Al-Mallah M, Ioannidis JPA (2011) Public availability of published research data in high-impact journals. PLoS ONE 6: 9. http://www.plosone.org/article/info:doi%2F10.1371%2Fjournal.pone.0024357 (last accessed May 8, 2013).
Victoria Stodden, Peixuan Guo, and Zhaokun Ma, Toward Reproducible Computational Research: An Empirical Analysis of
Data and Code Policy Adoption by Journals. PLoS One. 2013; 8(6): e67111. Published online Jun 21, 2013.
doi: 10.1371/journal.pone.0067111. PMCID: PMC3689732.
Reproducible Research. Special Issue, Computing in Science and Engineering 14(4): 11–56. http://ieeexplore.ieee.org/xpl/tocresult.jsp?reload=true&isnumber=6241356&punumber=5992 (last accessed May 8, 2013).
Stodden V, Mitchell I, LeVeque R (2012) Reproducible research for scientific computing: Tools and strategies for changing the culture. Computing in Science and Engineering 14(4): 13–17. http://www.computer.org/csdl/mags/cs/2012/04/mcs2012040013-abs.html (last accessed May 8, 2013).
Young NS, Ioannidis JP, Al-Ubaydli O (2008) Why current publication practices may distort science. PLoS Med 5: e201.
Ioannidis JPA, Greenland S, Hlatky MA, Khoury MJ, Macleod MR, Moher D, Schulz KF, Tibshirani R (2014) Increasing value and reducing waste in research design, conduct, and analysis. The Lancet 383(9912): 166–175. doi: 10.1016/S0140-6736(13)62227-8.
Wood J, Freemantle N, King M, Nazareth I (2014) Trap of trends to statistical significance: likelihood of near significant P value becoming more significant with extra data. BMJ 348: g2215. doi: http://dx.doi.org/10.1136/bmj.g2215 (Published 31 March 2014).