
Reproducibility and Scientific Research: why, what, where, when, who, how






  1. Reproducibility and Scientific Research: why, what, where, when, who, how. Professor Carole Goble CBE FREng FBCS, The University of Manchester, UK. Open Data Manchester, 27th January 2015.
  2. "Scientific publications have at least two goals: (i) to announce a result and (ii) to convince readers that the result is correct … papers in experimental [and computational science] should describe the results and provide a clear enough protocol [or algorithm] to allow successful repetition and extension." Jill Mesirov, Accessible Reproducible Research, Science 22 January 2010: Vol. 327 no. 5964, pp. 415-416, DOI: 10.1126/science.1179653. Virtual Witnessing / Minute Taking.
  3. [Pettifer, Attwood]
  4. Why smart parents often tend to have smart kids
  5. "An experiment is reproducible until another laboratory tries to repeat it." Alexander Kohn
  6. Design and reporting failures, empirical, statistical and computational [V. Stodden, IMS Bulletin (2013)]: cherry-picked data, unreported random seeds, non-independent bias, poor positive and negative controls, dodgy normalisation, arbitrary cut-offs, premature data triage, un-validated materials, improper statistical analysis, poor statistical power, stopping when you "get to the right answer", software misconfigurations, misapplied black-box software, incomplete reporting of software configurations, parameters and resource versions, missed steps, missing data, vague methods, missing software. John P. A. Ioannidis, Why Most Published Research Findings Are False, August 30, 2005, DOI: 10.1371/journal.pmed.0020124; Joppa et al., Troubling Trends in Scientific Software Use, Science 340, May 2013.
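Among the computational failures listed above is the unreported random seed. A minimal, hypothetical sketch (not code from the talk) of why recording the seed matters for any stochastic analysis:

```python
import random

def bootstrap_mean(data, n_samples, seed):
    """Mean of bootstrap-sample means; repeatable because the seed is fixed."""
    rng = random.Random(seed)  # reporting this seed makes the run repeatable
    means = []
    for _ in range(n_samples):
        sample = [rng.choice(data) for _ in data]
        means.append(sum(sample) / len(sample))
    return sum(means) / n_samples

data = [2.0, 4.0, 6.0, 8.0]
# Same data + same reported seed => the identical result on every run.
assert bootstrap_mean(data, 100, seed=42) == bootstrap_mean(data, 100, seed=42)
```

Without the recorded seed, every rerun of such an analysis yields a slightly different number and the published figure cannot be regenerated exactly.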
  7. Transparency / Availability Gap: out of 18 microarray papers, results from 10 could not be reproduced. 1. Ioannidis et al., 2009, Repeatability of published microarray gene expression analyses, Nature Genetics 41: 14; 2. Science publishing: The trouble with retractions; 3. Bjorn Brembs: Open Access and the looming crisis in science.
  8. WHERE? Sustainability. Researcher survey, 1202 respondents (PARSE.insight 2010). [Hylke Koers]
  9. Broken software, broken science. Geoffrey Chang, Scripps Institute: a homemade data-analysis program inherited from another lab flipped two columns of data, inverting the electron-density map used to derive a protein structure. He retracted 3 Science papers and 2 papers in other journals; one had been cited 364 times. The structures of MsbA (purple) and Sav1866 (green) overlap little (left) until MsbA is inverted (right). Miller, A Scientist's Nightmare: Software Problem Leads to Five Retractions, Science 22 December 2006: vol. 314 no. 5807, pp. 1856-1857.
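The Chang failure mode is easy to sketch. The code below is purely illustrative (it is not the actual program): a data-import step silently swaps two columns, and only a regression test on a known input exposes it.

```python
def load_map(rows, swap_bug=False):
    """Illustrative import step; with swap_bug=True it silently
    swaps the first two columns, as in the retraction story above."""
    out = []
    for row in rows:
        row = list(row)
        if swap_bug:
            row[0], row[1] = row[1], row[0]  # the silent error
        out.append(row)
    return out

raw = [[1.0, 2.0, 3.0],
       [4.0, 5.0, 6.0]]

assert load_map(raw) == raw                 # correct import preserves the data
assert load_map(raw, swap_bug=True) != raw  # the bug corrupts it, raising no error
```

The point of the anecdote, and of the sketch, is that nothing crashes: the corrupted output looks plausible, so only a check against known-good data catches it.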
  10. "An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment, [the complete data] and the complete set of instructions which generated the figures." David Donoho, "Wavelab and Reproducible Research," 1995. Self-contained codes?? Algorithms, configurations, tools and apps, codes, workflows, scripts, code libraries, third-party services, system software, infrastructure, compilers, hardware. Morin et al., Shining Light into Black Boxes, Science 13 April 2012: 336(6078), pp. 159-160; Ince et al., The case for open computer programs, Nature 482, 2012.
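In Donoho's spirit, a run can at least record its own environment alongside its outputs. A minimal sketch (the field names here are illustrative, not from the slide):

```python
import json
import platform
import sys

def environment_snapshot():
    """Record the interpreter, OS and invocation that produced a result."""
    return {
        "python": sys.version.split()[0],   # interpreter version
        "platform": platform.platform(),    # OS and architecture
        "argv": list(sys.argv),             # how the run was invoked
    }

snapshot = environment_snapshot()
print(json.dumps(snapshot, indent=2))  # store this next to the figures it generated
```

A fuller version would also capture package versions and the source-control revision of the analysis code; the point is that the "complete environment" is cheap to record at run time and very expensive to reconstruct afterwards.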
  11. WHY? 12+3 reasons research goes "wrong" (adapted): 1. Pressure to publish 2. Impact factor mania 3. Tainted resources 4. Bad maths 5. Sins of omission 6. Science is messy 7. Broken peer review 8. Some scientists don't share 9. Research never reported 10. Poor training -> sloppiness 11. Honest error 12. Fraud 13. Disorganisation & time pressures 14. Cost to prepare and curate materials 15. Inherently "unreplicable" (one-off data, specialist kit, stochastic)
  12. It's HARD to prepare and independently test: replication hostility; resource intensive; no funding, time, recognition, or place to publish; the complete environment? [Norman Morrison]
  13. Value People. Data. Method. Software.
  14. WHAT is reproducibility? This is a heated topic of debate: re-compute, replicate, rerun, repeat, re-examine, repurpose, recreate, reuse, restore, reconstruct, review, regenerate, revise, recycle, regenerate figure, redo; robustness, tolerance, verification, compliance, validation, assurance. Conceptual replication, "show A is true by doing B rather than doing A again", can verify but not falsify [Yong, Nature 485, 2012].
  15. WHEN? Can I repeat my method? DEFEND: same experiment, set up, lab; publish article. Can I replicate your method? CERTIFY: same experiment, set up, independent lab; submit article (and move on…) (a window before decay sets in…). Can I reproduce my results using your method, or your results using my method? COMPARE: variations on experiment, set up, lab. Can I reuse your results / method in my research? TRANSFER: different experiment. *Adapted from Mesirov, J., Accessible Reproducible Research, Science 327(5964), 415-416 (2010).
  16. WHO? The scientific ego-system and access: trust, reciprocity, and competition. Blame, scooping, no credit / credit drift, misinterpretation, scrutiny, trolling, cost of preparation, support, distraction, dependents on old news, loss of dowry, loss of special sauce, hugging, flirting, voyeurism, cautionary creeping. Tenopir et al., Data Sharing by Scientists: Practices and Perceptions, PLoS ONE 6(6), 2012; Borgman, The conundrum of sharing research data, JASIST 2012.
  17. HOW? John P. A. Ioannidis, How to Make More Published Research True, October 21, 2014, DOI: 10.1371/journal.pmed.1001747; Sandve GK, Nekrutenko A, Taylor J, Hovig E (2013), Ten Simple Rules for Reproducible Computational Research, PLoS Comput Biol 9(10): e1003285, doi:10.1371/journal.pcbi.1003285.
  18. HOW? Findable, Accessible, Intelligible, Reproducible. Available: preservation, packaging, versioning, access, sustained sites. Standards: common APIs, licence. Description: intelligible, standards, common metadata. Transparency: dependencies, steps, provenance, portability, robustness, tolerance. [Adapted Freire, 2013]
  19. [image-only slide]
  20. ELNs, Automation, Checklists, eLabs
  21. Gathering scattered research components
  22. Summary • Replicable science is hard work and poorly rewarded • Reproducible science => transparent science, but ideally it needs to be born that way • Collective responsibility
  23. Acknowledgements: Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Alan Williams, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza, Hylke Koers, Norman Morrison, Ian Fore, Jill Mesirov, Robert Stevens, Steve Pettifer
  24. Further Reading • research-best-way-find-truth • Drummond C, Replicability is not Reproducibility: Nor is it Good Science, online • Peng RD, Reproducible Research in Computational Science, Science 2 Dec 2011: 1226-1227.