Reproducibility (and the R*) of Science: motivations, challenges and trends
Professor Carole Goble
The University of Manchester, UK
Software Sustainability Institute, UK
Head of Node ELIXIR-UK
ELIXIR, IBISBA, FAIRDOM Association e.V., BioExcel Life Science Infrastructures
carole.goble@manchester.ac.uk
IRCDL Pisa 31 Jan – 1 Feb 2019
Beware.
Results may vary.
Reproducibility of Science…
A fundamental given of the Scientific Method….
https://xkcd.com/242/
Reproducibility on the Agenda
The famous Nature survey
1,576 researchers, 2016
https://www.nature.com/news/1-500-scientists-lift-the-lid-on-reproducibility-1.19970
Reporting and Availability
John P. A. Ioannidis, Why Most Published Research Findings Are False, August 30, 2005, DOI: 10.1371/journal.pmed.0020124
incomplete reporting of method, software configurations, resources,
parameters & resource versions, missed steps, missing data, vague
methods, missing software, unreproducible environments.
Joppa et al., Troubling Trends in Scientific Software Use, Science 340, May 2013
Better Training
Methodological Support
More robust designs
Independent accountability
Collaboration & team science
Diversifying peer review
Better Practices
Funding replication studies
Rewarding right behaviour
Design Flaws
HARKing (hypothesizing after the results are known), cherry picking data, random seed reporting, non-independent bias, poor positive and negative controls, poor normalisation, arbitrary cut-offs, premature data triage, un-validated materials, improper statistical analysis, poor statistical power, stopping when you “get to the right answer”, software misconfigurations, misapplied black-box software
Trend: Policy and advice proliferation
Findable (and be Citable)
Accessible (and be Trackable)
Interoperable (and be Intelligible)
Reusable (and be Reproducible)
Record, Automate,
Contain, Access
*Based on
Scientific publications
• announce a result
• convince readers to trust it.
Experimental science
• describe the results
• provide a clear enough description of the materials and protocol to allow successful repetition and extension. (Jill Mesirov 2010*)
Computational science
• describe the results
• provide the complete software
development environment,
data, instructions, techniques
(which generated the figures)
(David Donoho 1995*).
“Virtual Witnessing”
*Leviathan and the Air-Pump: Hobbes, Boyle, and the Experimental Life (1985), Shapin and Schaffer. Joseph Wright, Experiment with the Air Pump, c. 1768
“virtually witnessing” the “moist lab”
Experiment Setup (Wet vs Dry)
Methods: lab protocols, standard operating procedures… (wet); algorithms, spec. of the analysis steps, models… (dry)
Materials: chemicals, reagents, samples, strain of mouse… (wet); datasets, parameters, algorithm seeds… (dry)
Instruments: mass specs, sequencers, microscopes, calibrations… (wet); codes, services, scripts, workflows, reference datasets… (dry)
Laboratory: the physical lab… (wet); software and hardware infrastructure… (dry)
International Mouse Strain
Resource (IMSR)
Bramhall et al., Quality of Methods Reporting in Animal Models of Colitis, Inflammatory Bowel Diseases, 2015:
“Only one of the 58 papers
reported all essential criteria on
our checklist. Animal age,
gender, housing conditions and
mortality/morbidity were all
poorly reported…..”
The Materials
Turning FAIR into reality
Final report and action plan from the European Commission expert group on FAIR data, Nov 2018
The Materials
The Methods
Method Reproducibility
the provision of enough detail about
study procedures and data so the
same procedures could, in theory or
in actuality, be exactly repeated.
Result Reproducibility
obtaining the same results from the conduct of an independent study whose procedures are matched as closely as possible to the original experiment
Procedure = Software, SOP, Lab Protocol, Workflow, Script.
Tools, Technologies, Techniques. A whole bunch of them together.
Goodman et al., Science Translational Medicine 8 (341), 2016
The Methods
Computational
Workflows/Scripts
Experimental Standard
Operating Procedures
Lifecycle: Plan → Assemble methods & materials → Run the experiment (observe, simulate, analyse) → Results → Manage results → Publish / share results
“I can’t immediately reproduce the research in my own laboratory. It took an estimated 280 hours for an average user to approximately reproduce the paper.”
Garijo et al., Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome, PLOS ONE, 2013
The R* Brouhaha
re-compute, replicate, rerun, repeat, re-examine, repurpose, recreate, reuse, restore, reconstruct, review, regenerate, revise, recycle, redo, repair, robustness, tolerance, verify, validate, compliance, assurance, remix…
Conceptual replication (“show A is true by doing B rather than doing A again”) verifies but does not falsify. [Yong, Nature 485, 2012]
The R* Nautilus
with thanks to Nicola Ferro for the visualisation
Repeat
Replicate
Reproduce
Reuse /
Generalise
The R* Nautilus
with thanks to Nicola Ferro for the visualisation
Repeat
Same data, set up
Same task/goal
Same materials
Same methods
Same group/lab
My Research Environment
robust, defensible, productive
“Micro” Reproducibility
The R* Nautilus
with thanks to Nicola Ferro for the visualisation
Repeat
Same data, set up
Same task/goal
Same materials
Same methods
Same group/lab
Replicate
Same data, set up
Same task/goal
Same materials
Same methods
Different group
Our Research Environment
review, validate, certify
Publication Environment
review, validate, certify
“Sameness”
Accountability
Trust
The R* Nautilus
with thanks to Nicola Ferro for the visualisation
Repeat
Same data, set up
Same task/goal
Same materials
Same methods
Same group/lab
Replicate
Same data, set up
Same task/goal
Same materials
Same methods
Different group
Reproduce
Different data, set up
Same task/goal
Same/different materials
Same/different methods
Different group
Their Research Environment
review, compare, verify
“Similar”
Accountability
Trust
“Macro” Reproducibility
The R* Nautilus
with thanks to Nicola Ferro for the visualisation
Repeat
Same data, set up
Same task/goal
Same materials
Same methods
Same group/lab
Replicate
Same data, set up
Same task/goal
Same materials
Same methods
Different group/lab
Reproduce
Different data, set up
Same task/goal
Same/different materials
Same/different methods
Different group/lab
Reuse / Generalise
Different data, set up
Different task/goal
Same/different materials
Same/different methods
Different group/lab
Transferred
Repurposed
Trusted
Productivity
The R* Nautilus
with thanks to Nicola Ferro for the visualisation
Reused
Experimental outputs
Outputs retained
Outputs Used and Shared
Outputs Published
Not all outputs are worth the burden of metadata unless it is automagical and a side effect
Why does this matter?
Moving between different environments
Recreating / accessing common environments
Fragmented, decentralised, multifarious and complicated …
Research Infrastructure Services
Lifecycle: Assemble methods & materials → Experiment (observe, simulate, analyse) → Results → Manage results → Share results
Publishing Services: quality assessment, track and credit, disseminate, deposit & licence
Science 2.0 Repositories: Time for a Change in Scholarly Communication, Assante, Candela, Castelli, Manghi, Pagano, D-Lib 2015
Why does this matter?
Accuracy, Sameness, Change, Dependencies
What has been fixed, what must be fixed, and which variations are valid?
We snapshot publications, but science does not stay still.
Replication may be harder than reproducing, and will decay as the tools, methods, software, data … move on or are inherently unavailable.
What are the dependencies? What are the black-box steps?
Results
may vary
Why does this matter?
More than just “FAIR” data
Open Access to data, software and platforms
Rich descriptions of data, software, methods
• Transparent record of steps, dependencies,
provenance.
• Reporting robustness of methods, versions,
parameters, variation sensitivities
• Portability and preservation of the software
and the data
Should be embedded in Research Practice, not a burdensome afterthought at publication.
Keeping track should be a side effect of using research tools.
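To make that concrete (an illustrative sketch, not from the slides): a Python analysis script could write a small provenance record alongside its results as a side effect of running, capturing parameters, input checksums and package versions. The file name and parameters shown are hypothetical placeholders.

```python
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone
from importlib import metadata


def checksum(path):
    """SHA-256 of an input file, so the exact data used can be verified later."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def record_run(inputs, params, outdir="."):
    """Write a small provenance record next to the results (illustrative only)."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "python": sys.version,
        "platform": platform.platform(),
        "packages": {d.metadata["Name"]: d.version for d in metadata.distributions()},
        "parameters": params,
        "inputs": {p: checksum(p) for p in inputs},
    }
    with open(f"{outdir}/provenance.json", "w") as f:
        json.dump(record, f, indent=2)
    return record


# Hypothetical usage: file name and parameters are placeholders.
# record_run(["counts.tsv"], {"normalisation": "TMM", "cutoff": 0.05})
```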
Transparency
https://cos.io/our-services/top-guidelines/
Extreme example
Precision medicine HTS pipelines
Alterovitz, Dean, Goble, Crusoe, Soiland-Reyes et al., Enabling Precision Medicine via standard communication of NGS provenance, analysis, and results, bioRxiv, 2017, https://doi.org/10.1101/191783
Why does this matter?
• Reproducibility is a spectrum
• Strength and difficulty
depends on context and
purpose in the scholarly
workflow
• Beware reproducibility (and
FAIR) dogmatists.
Why does this matter?
forced fragmentation and decentralisation
distributed knowledge infrastructures
De-contextualised: static, fragmented, semantic linking lost, buried in a PDF figure.
Contextualised: active, unified, semantically linked.
Reading and writing scattered….
Trend: Research Commons & Hubs
DOI: 10.15490/seek.1.investigation.56
Snapshot preservation
http://fairdomhub.org
Trend: Research Objects
context, data, methods, models, provenance bundled together
Handling and embracing decentralisation and enabling portability
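As a hedged illustration of the bundling idea (deliberately simplified, not the actual Research Object or RO-Crate specifications, which define much richer JSON-LD metadata): a minimal sketch that packs data, method and provenance files into one archive with a manifest. The paths, roles and title are hypothetical placeholders.

```python
import json
import zipfile
from pathlib import Path


def bundle_research_object(title, files, out="research-object.zip"):
    """Pack data, methods, models and provenance into one archive with a manifest."""
    manifest = {
        "title": title,
        "aggregates": [
            {"file": Path(f["path"]).name, "role": f["role"]} for f in files
        ],
    }
    with zipfile.ZipFile(out, "w") as zf:
        zf.writestr("manifest.json", json.dumps(manifest, indent=2))
        for f in files:
            zf.write(f["path"], arcname=Path(f["path"]).name)
    return out


# Hypothetical usage: paths and roles are placeholders.
# bundle_research_object("Example analysis", [
#     {"path": "data/input.csv", "role": "data"},
#     {"path": "workflow.cwl", "role": "method"},
#     {"path": "provenance.json", "role": "provenance"},
# ])
```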
Trend: Tool/Environment Proliferation
built-in reproducibility as a side effect, reproducibility ramps, disguised as productivity. If only they worked together…
Standards and templates for reporting
methods, provenance, tracking
Tools and platforms for capturing, tracking,
structuring, organising assets throughout the
whole project research cycle.
Shared cloud-based analysis systems & collaboratories
Workflow/script automation
Containers for executable software dependencies & portability (see the sketch below)
Electronic lab notebooks
Open source software repositories
Models and methods archives
Research Commons
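A minimal sketch of the container idea, assuming Docker is available: one analysis step runs inside an image pinned by digest, so the software environment travels with the method. The image name, digest and tool command below are hypothetical placeholders, not a real pipeline.

```python
import subprocess
from pathlib import Path

# Hypothetical example: the image name, digest and tool command are placeholders.
# Pinning the image by digest (not just a tag) keeps the software environment stable.
IMAGE = "quay.io/example/aligner@sha256:<digest>"


def run_step(input_file, output_dir):
    """Run one analysis step inside a pinned container so dependencies are portable."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    cmd = [
        "docker", "run", "--rm",
        "-v", f"{Path(input_file).resolve().parent}:/data:ro",
        "-v", f"{Path(output_dir).resolve()}:/out",
        IMAGE,
        "align", f"/data/{Path(input_file).name}", "--out", "/out/result.bam",
    ]
    subprocess.run(cmd, check=True)


# run_step("sample.fastq", "results/")
```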
Trend: Publication Tool Proliferation
mostly as an additional step
eLife Reproducible Document Stack
publish computationally reproducible
research articles online.
Data2Paper
Challenges
Provocation:
why are we still publishing articles?
For Reproducible Research
Release Research Objects
Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012
Analogous to software products and
practices rather than data or articles or
library practices…
Treat ALL Products and
ALL Research Like Software
Times Higher Education Supplement, 14 May 2015
Acknowledgements
• Dagstuhl Seminar 16041, January 2016
– http://www.dagstuhl.de/en/program/calendar/semhp/?semnr=16041
• ATI Symposium Reproducibility, Sustainability and Preservation, April 2016
– https://turing.ac.uk/events/reproducibility-sustainability-and-preservation/
– https://osf.io/bcef5/files/
• Nicola Ferro
• C. Titus Brown
• Juliana Freire
• David De Roure
• Stian Soiland-Reyes
• Barend Mons
• Tim Clark
• Daniel Garijo
• Norman Morrison
• Matt Spritzer
• Scott Edmunds
• Paolo Manghi …
• Reproducibility rubric https://osf.io/zjvh2/
