This document discusses the issue of reproducibility in research. It begins by noting that 47/53 "landmark" publications could not be replicated, and lists some common causes of irreproducibility like cherry-picking data and improper statistical analysis. It then looks at reproducibility from the perspective of different stakeholders like researchers, funders, and the public. Next, it distinguishes between different levels of reproducibility like replicating a study with the same versus different data or software. The document advocates making data and code "first class citizens" in research and describes emerging tools and systems that can help improve reproducibility. It ends by asking questions about what more can be done by individual labs and the research community as a whole to enhance reproducibility.
Reproducibility and the Research Lifecycle
1. Reproducibility
Philip E. Bourne PhD, FACMI
Stephenson Chair of Data Science
Director, Data Science Institute
Professor of Biomedical Engineering
peb6a@virginia.edu
https://www.slideshare.net/pebourne
@pebourne
3Dsig Chicago July 10, 2018
2. This is a discussion. I am merely
providing some context …
The real work comes this afternoon
at 2pm
3. Collaborative structural biology using machine
learning and Jupyter notebook
Fergus Boyles and Fergus Imrie
Department of Statistics, University of Oxford
ISMB July 2018
- Live interactive demonstration
- Follow along during the presentation, or use as a
reference afterwards
- Materials:
http://opig.stats.ox.ac.uk/webapps/ISMB_2018.html
GitHub instructions: https://github.com/FBoyles/3dsig
6. Causality …
• Cherry picking data
• Misapplication of black box software
• Bias
• Poor positive and negative controls
• Improper statistical analysis
• Etc …
The review process, itself under threat, does not catch all of this
7. It's useful to look at the issue through
the eyes of different stakeholders
• Researchers – on one hand, reproducibility is like
broccoli: no one wants to eat it, but you know you
should. On the other, we all know we spend
too much time recreating the research of others.
• Funders – they are demanding it – what does that
mean?
• Publishers – they are demanding it too – what does
that mean?
• Public – just another attack on the value of science
9. It's more complex than that…
• Infrastructures (hardware, compilers, libraries,
languages etc. change)
• There is the process through which the research is
done…
• Different parameters
• Different protocols / workflows
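One way to make these moving parts visible is to record them alongside every result. The following is a minimal sketch, not a method from the talk: the function names `provenance_record` and `fingerprint`, the parameter names, and the package list are all hypothetical, and only Python standard-library calls are used.

```python
import hashlib
import json
import platform
from importlib import metadata


def provenance_record(parameters, packages=()):
    """Describe the environment and parameters of one analysis run.

    `parameters` and `packages` are illustrative inputs (assumptions),
    not names taken from the slides.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # record explicitly that it is not installed
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "packages": versions,
        "parameters": dict(parameters),
    }


def fingerprint(record):
    """Return a SHA-256 hex digest of the record in canonical JSON form."""
    canonical = json.dumps(record, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Hypothetical parameters for an imagined structure-analysis run.
    run = provenance_record({"cutoff": 4.5, "seed": 42}, packages=["numpy"])
    print(json.dumps(run, indent=2, sort_keys=True))
    print(fingerprint(run))
```

Storing such a record next to each output file means that two runs with different libraries, parameters, or platforms get different fingerprints, which at least flags that a later discrepancy may come from the infrastructure rather than the science.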
10. 3Dsigers do pretty well relative to
other disciplines, but we could do
better
• Major public data repositories
• Multiple declarations for depositing data
• Thriving open source community
• Data standardisation efforts
• Core facilities
• Heroic data campaigns
• International and national coordination
11. Data/code as first-class citizens
http://www.ncbi.nlm.nih.gov/pubmed/26207759
Only 12% of data from research is
preserved
[Adapted from Carole Goble]
12. For Labs - Incentives
“I can’t immediately reproduce the
research in my own laboratory. It
took an estimated 280 hours for an
average user to approximately
reproduce the paper.
Data/software versions. Workflows
are maturing and becoming
helpful”
Garijo et al. 2013 Quantifying Reproducibility in Computational Biology:
The Case of the Tuberculosis Drugome PLOS ONE 8(11): e80278.
14. The Neylon equation
Process = (Interest / Friction) x Number of people reached
Cameron Neylon, BOSC 2013, http://cameronneylon.net/
lower friction so born reproducible
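To see why lowering friction is the lever the slide emphasizes, the equation Process = (Interest / Friction) x Number of people reached can be evaluated for two scenarios. This is only an illustrative sketch: the function name `neylon_process`, the units, and every number below are assumptions made up for the example, not values from Neylon's talk.

```python
def neylon_process(interest, friction, people_reached):
    """Evaluate Process = (Interest / Friction) x Number of people reached.

    All three inputs are treated as unitless scores (an assumption for
    illustration only).
    """
    if friction <= 0:
        raise ValueError("friction must be positive")
    return interest / friction * people_reached


# Hypothetical scenario: same interest and audience, different friction.
baseline = neylon_process(interest=0.5, friction=10.0, people_reached=200)
low_friction = neylon_process(interest=0.5, friction=2.0, people_reached=200)

if __name__ == "__main__":
    print("baseline:", baseline)
    print("low friction:", low_friction)
    print("ratio:", low_friction / baseline)
```

Because friction sits in the denominator, cutting it by a factor of five raises the result by the same factor without needing any more interest or reach, which is the argument for tooling that makes work reproducible from the start.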
17. The Research Lifecycle
IDEAS – HYPOTHESES – EXPERIMENTS – DATA – ANALYSIS – COMPREHENSION – DISSEMINATION
Labels from the slide diagram:
• Authoring Tools
• Lab Notebooks
• Data Capture
• Software Repositories
• Analysis Tools
• Visualization
• Scholarly Communication
• Commercial & Public Tools
• Git-like Resources By Discipline
• Data Journals
• Discipline-Based Metadata Standards
• Community Portals
• Institutional Repositories
• New Reward Systems
• Commercial Repositories
• Training
18. Questions
• What is missing from this discussion?
• Where do you see the balance between the pain
and the gain?
• Is your lab doing anything to improve the situation,
if so what?
• Should we and could we do anything as a
community?