What is Reproducibility?
The R* Brouhaha
Professor Carole Goble
The University of Manchester, UK
Software Sustainability Institute UK
carole.goble@manchester.ac.uk
Alan Turing Institute Symposium: Reproducibility, Sustainability and Preservation, 6-7 April 2016, Oxford, UK
“When I use a word,” Humpty Dumpty said in rather a scornful tone, “it means just what I choose it to mean - neither more nor less.”
Carroll, Through the Looking Glass
re-compute
replicate
rerun
repeat
re-examine
repurpose
recreate
reuse
restore
reconstruct
review
regenerate
revise
recycle
redo
robustness
tolerance
verification
compliance
validation
assurance
remix
Reproducibility of Reproducibility Research, Even in Computer Science
http://www.dagstuhl.de/16041
24–29 January 2016, Dagstuhl Seminar 16041
Reproducibility of Data-Oriented Experiments in e-Science
Computational Workflow
Ten Simple Rules for Reproducible Computational Research
1. For Every Result, Keep Track of How It Was Produced
2. Avoid Manual Data Manipulation Steps
3. Archive the Exact Versions of All External Programs Used
4. Version Control All Custom Scripts
5. Record All Intermediate Results, When Possible in Standardized Formats
6. For Analyses That Include Randomness, Note Underlying Random Seeds
7. Always Store Raw Data behind Plots
8. Generate Hierarchical Analysis Output, Allowing Layers of Increasing Detail to Be Inspected
9. Connect Textual Statements to Underlying Results
10. Provide Public Access to Scripts, Runs, and Results
Sandve GK, Nekrutenko A, Taylor J, Hovig E (2013) Ten Simple Rules for Reproducible Computational Research. PLoS Comput Biol 9(10): e1003285. doi:10.1371/journal.pcbi.1003285
Record Everything
Automate Everything
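A minimal sketch of how several of these rules might look in practice; the file names and the analysis step are hypothetical, chosen only to illustrate rules 1, 2, 5 and 6:

# Hypothetical sketch illustrating Sandve et al.'s rules 1, 2, 5 and 6.
# File names and the analysis step are illustrative, not prescribed.
import hashlib
import json
import platform
import random
import sys
from datetime import datetime, timezone

SEED = 42            # rule 6: note the underlying random seed
random.seed(SEED)

def analyse(values):
    # rule 2: every data manipulation step is scripted, never manual
    sample = random.sample(values, k=min(5, len(values)))
    return sum(sample) / len(sample)

values = list(range(100))   # stand-in for real input data
result = analyse(values)

# rule 5: record results in a standardized format
with open("result.json", "w") as fh:
    json.dump({"mean_of_sample": result}, fh, indent=2)

# rule 1: for every result, keep track of how it was produced
provenance = {
    "script": sys.argv[0],
    "python": platform.python_version(),
    "seed": SEED,
    "input_sha256": hashlib.sha256(json.dumps(values).encode()).hexdigest(),
    "produced_at": datetime.now(timezone.utc).isoformat(),
}
with open("result.provenance.json", "w") as fh:
    json.dump(provenance, fh, indent=2)

Rerunning the script regenerates both the result and the record of how it was produced.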
Scientific publications have two goals: (i) to announce a result and (ii) to convince readers that the result is correct.
Papers in experimental science should describe the results and provide a clear enough protocol to allow successful repetition and extension.
Papers in computational science should describe the results and provide the complete software development environment, data, and set of instructions which generated the figures.
Virtual Witnessing*
*Leviathan and the Air-Pump: Hobbes, Boyle, and the Experimental Life (1985), Shapin and Schaffer.
Jill Mesirov, David Donoho
Datasets, Data collections
Standard operating procedures
Software, algorithms
Configurations
Tools and apps, services
Codes, code libraries
Workflows, scripts
System software
Infrastructure
Compilers, hardware
Analogy: The Lab
data science | data-driven science
1. Philosophy
2. Practice
Witnessing “Datascopes”
Input Data
Software
Output Data
Config
Parameters
Methods
techniques, algorithms,
spec. of the steps, models
Materials
datasets, parameters,
algorithm seeds
Instruments
codes, services, scripts,
underlying libraries,
workflows, ref resources
Laboratory
sw and hw infrastructure,
systems software,
integrative platforms
computational environment
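Read as a data structure, one run of a computational experiment could be described along these four axes; a hypothetical Python sketch, with illustrative values:

# Hypothetical sketch: one run described along the four "datascope" axes.
from dataclasses import dataclass

@dataclass
class Datascope:
    methods: list       # techniques, algorithms, spec. of the steps, models
    materials: dict     # datasets, parameters, algorithm seeds
    instruments: list   # codes, services, scripts, libraries, workflows
    laboratory: dict    # sw/hw infrastructure, systems software, platforms

run = Datascope(
    methods=["k-means clustering, k=3"],
    materials={"dataset": "samples-2016-04.csv", "seed": 42},
    instruments=["cluster.py", "numpy 1.10.4"],
    laboratory={"os": "Ubuntu 14.04", "python": "3.5.1"},
)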
Model-Driven Science: can I rerun my model?
Model sweeps: what are the sensitivities?
“Micro” Reproducibility
“Macro” Reproducibility
Repeat, Replicate, Robust, Reproduce
Why the differences?
[C. Titus Brown]
https://2016-oslo-repeatability.readthedocs.org/en/latest/repeatability-discussion.html
Computational analyses: productivity, track differences.
Computational analyses?
Repeatability: sameness. Same result, one lab, one experiment.
Reproducibility: similarity. Similar result, more than one lab, more than one experiment.
“An experiment is reproducible until another laboratory tries to repeat it.” (Alexander Kohn)
reviewers want additional work
statistician wants more runs
analysis needs to be repeated
post-doc leaves, student arrives
new data, revised data
updated versions of algorithms/codes
sample was contaminated
Measuring Information Gain from Reproducibility
Dimensions of an experiment: research goal; method/algorithm; implementation/code; platform/execution environment; data parameters; input data; actors.
Each dimension is marked as: no change, change, or don't care.
https://linkingresearch.wordpress.com/2016/02/21/dagstuhl-seminar-report-reproducibility-of-data-oriented-experiments-in-e-scienc/
http://www.dagstuhl.de/16041
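One way to operationalise the taxonomy; a hypothetical Python sketch (all values illustrative) that reports which dimensions changed between an original run and a rerun:

# Hypothetical sketch: which Dagstuhl dimensions changed between two runs?
DIMENSIONS = [
    "research_goal", "method_algorithm", "implementation_code",
    "platform_exec_env", "data_parameters", "input_data", "actors",
]

original = {
    "research_goal": "cluster samples", "method_algorithm": "k-means",
    "implementation_code": "cluster.py@v1", "platform_exec_env": "laptop",
    "data_parameters": "k=3", "input_data": "samples.csv", "actors": "alice",
}
rerun = dict(original, implementation_code="cluster.py@v2", actors="bob")

dont_care = {"actors"}   # dimensions we choose to ignore
changed = [d for d in DIMENSIONS
           if d not in dont_care and original[d] != rerun[d]]
print("changed:", changed)   # -> changed: ['implementation_code']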
Taxonomy of actions towards improving reproducibility in Computer Science.
https://linkingresearch.wordpress.com/2016/02/21/dagstuhl-seminar-report-reproducibility-of-data-oriented-experiments-in-e-scienc/
http://www.dagstuhl.de/16041
Practice
Methods
techniques, algorithms,
spec. of the steps, models
Materials
datasets, parameters,
algorithm seeds
Instruments
codes, services, scripts,
underlying libraries,
workflows, ref datasets
Laboratory
sw and hw infrastructure,
systems software,
integrative platforms
computational environment
Input Data
Software
Output Data
Config
Parameters
“Datascope” Entropy -> Preservation
“Replication / Reproducibility Window”
Change: science, methods, datasets. Questions don't change, answers do.
Materials unavailable: one-offs, streams, stochastics, sensitivities, licensing, scale, non-portable data.
Change: instruments break, labs decay. Active reference datasets and services.
Platforms & resources unavailable: supercomputer scale, non-portable software.
“Datascope” Entropy -> Preservation
“Replication / Reproducibility Window”
Archived vs Active Environment
Isolated vs Open Distributed Ecosystem
Evolving reference knowledge bases (e.g. UniProt) change between times T1 and T2.
Repeat harder than Reproduce?
Repeating the experiment or the setup?
• When the environment is active
“Datascope” Entropy -> Preservation
“Replication / Reproducibility Window”
Form?
Function?
Methods
techniques, algorithms,
spec. of the steps, models
Materials
datasets, parameters,
algorithm seeds
Instruments
codes, services, scripts,
underlying libraries,
workflows, ref datasets
Laboratory
sw and hw infrastructure,
systems software,
integrative platforms
computational environment
How? Preserve by Reporting, Reproduce by Reading
Provenance Traces, Notebooks, Rich Metadata
Archived Record
Methods
Materials
Instruments
Laboratory
How?
Preserve by Reporting, Reproduce by Reading
Provenance Traces, Notebooks, Rich Metadata
Archived Record
standards, common metadata
Provenance
Workflows, Scripts
ELNs (electronic lab notebooks)
Methods
Materials
Instruments
Laboratory
How? Preserve by Maintaining, Repairing, VMs
Reproduce by Running, Emulating, Reconstructing
Active Instrument
Methods
Materials
Instruments
Laboratory
How? Preserve by Maintaining, Repairing, VMs
Reproduce by Running, Emulating, Reconstructing
Active Instrument, Byte level
Methods
Materials
Instruments
Laboratory
Levels of Computational Reproducibility [Freire, 2014]
Coverage: how much of an experiment is reproducible (original experiment, similar experiment, different experiment).
Depth: how much of an experiment is available:
Figures + Data
Binaries + Data
Source Code / Workflow + Data
Binaries + Data + Dependencies
Source Code / Workflow + Data + Dependencies
Virtual Machine (Binaries + Data + Dependencies)
Virtual Machine (Source Code / Workflow + Data + Dependencies)
Portability
Minimum: data and source code available under terms that permit inspection and execution.
Repeatable Environments
[C. Titus Brown*]
*https://2016-oslo-repeatability.readthedocs.org/en/latest/overview-and-agenda.html
Metadata Objects: Reproducible Reporting, Exchange
Checklist
Provenance Tracking
Versioning
Dependencies
Container
provenance; dependencies; steps, features; transparency; portability; robustness; preservation; access, description; available; standards, common APIs; licensing; identifiers; standards, common metadata; change; variation, sensitivity; versioning; packaging
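A minimal sketch of capturing versioning and dependency metadata for such a metadata object; the output file name is illustrative:

# Hypothetical sketch: snapshot the environment behind a run so it can be
# reported and exchanged alongside the results.
import json
import platform
from importlib import metadata

snapshot = {
    "os": platform.platform(),
    "python": platform.python_version(),
    # pinned versions of every installed distribution (the dependency set)
    "dependencies": sorted(
        f"{dist.metadata['Name']}=={dist.version}"
        for dist in metadata.distributions()
    ),
}
with open("environment.snapshot.json", "w") as fh:
    json.dump(snapshot, fh, indent=2)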
So, What is Reproducibility? Being FAIR
What is Reproducibility?
Why, When, Where, Who for, Who by, How
Special thanks to
• C. Titus Brown
• Juliana Freire
• David De Roure
• Stian Soiland-Reyes
• Barend Mons
• Tim Clark
• Daniel Garijo
• Wf4Ever and Research Object teams
• Dagstuhl Seminar 16041
• Force11 http://www.force11.org
