Reproducibility (and the R*) of Science: motivations, challenges and trends
Professor Carole Goble
The University of Manchester, UK
Software Sustainability Institute, UK
Head of Node ELIXIR-UK
ELIXIR, IBISBA, FAIRDOM Association e.V., BioExcel Life Science Infrastructures
carole.goble@manchester.ac.uk
IRCDL Pisa 31 Jan – 1 Feb 2019
Beware.
Results may vary.
Reproducibility of Science…
A fundamental given of the
Scientific Method….
https://xkcd.com/242/
Reproducibility on the Agenda
The famous Nature survey
1576 researchers, 2016
https://www.nature.com/news/1-500-scientists-lift-the-lid-on-reproducibility-1.19970
Reporting and Availability
John P. A. Ioannidis, Why Most Published Research Findings Are False, August 30, 2005, DOI: 10.1371/journal.pmed.0020124
incomplete reporting of method, software configurations, resources,
parameters & resource versions, missed steps, missing data, vague
methods, missing software, unreproducible environments.
Joppa et al., Troubling Trends in Scientific Software Use, Science 340, May 2013
Better Training
Methodological Support
More robust designs
Independent
accountability
Collaboration & team science
Diversifying peer review
Better Practices
Funding replication studies
Rewarding right behaviour
Design Flaws
HARKing (hypothesizing after the results are
known), cherry picking data, random seed
reporting, non-independent bias, poor
positive and negative controls, poor
normalisation, arbitrary cut-offs,
premature data triage, un-validated materials,
improper statistical analysis, poor statistical
power, stop when “get to the right answer”,
software misconfigurations, misapplied black
box software
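One flaw in the list above, unreported random seeds, is cheap to avoid. A minimal sketch, assuming a toy stochastic analysis: the function name `run_simulation` and its fields are hypothetical, chosen only to show the seed travelling with the result.

```python
import random


def run_simulation(seed, n=5):
    """Toy stochastic analysis that reports the seed alongside its result."""
    rng = random.Random(seed)  # private generator, so global state is untouched
    draws = [rng.random() for _ in range(n)]
    # The seed and sample size are returned with the result, so they end up
    # in whatever report or log is built from it.
    return {"seed": seed, "n": n, "mean": sum(draws) / n}


if __name__ == "__main__":
    print(run_simulation(seed=42))
```

Rerunning with the reported seed gives the same numbers on the same Python version; across versions or platforms, results may vary.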
Trend: Policy and advice proliferation
Findable (and be Citable)
Accessible (and be Trackable)
Interoperable (and be Intelligible)
Reusable (and be Reproducible)
Record, Automate,
Contain, Access
*Based on
Scientific publications
• announce a result
• convince readers to trust it.
Experimental science
• describe the results
• provide a clear enough account of the
materials and protocol to allow
successful repetition and
extension. (Jill Mesirov 2010*)
Computational science
• describe the results
• provide the complete software
development environment,
data, instructions, techniques
(which generated the figures)
(David Donoho 1995*).
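A first step towards describing "the complete software development environment" is to record it automatically. The sketch below is one possible approach, not a method from the talk: `snapshot_environment` is a hypothetical helper and the package names passed to it are placeholders.

```python
import json
import platform
import sys
from importlib import metadata


def snapshot_environment(packages):
    """Describe the interpreter, OS and the versions of selected packages."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # record the absence rather than guess
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "executable": sys.executable,
        "packages": versions,
    }


if __name__ == "__main__":
    # "pip" and "numpy" are placeholder package names for illustration.
    print(json.dumps(snapshot_environment(["pip", "numpy"]), indent=2, sort_keys=True))
```

Saving this record next to the figures it belongs to covers only part of the environment; data, instructions and system libraries still need their own description.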
“Virtual Witnessing”
*Leviathan and the Air-Pump: Hobbes, Boyle, and the Experimental Life (1985), Shapin and Schaffer.
Joseph Wright, Experiment with the Air Pump, c. 1768
“virtually witnessing” the “moist lab”
Experiment Setup: Wet vs Dry
Methods
  Wet: Lab protocols, standard operating procedures…
  Dry: Algorithms, spec. of the analysis steps, models…
Materials
  Wet: Chemicals, reagents, samples, strain of mouse…
  Dry: Datasets, parameters, algorithm seeds…
Instruments
  Wet: Mass specs, sequencers, microscopes, calibrations…
  Dry: Codes, services, scripts, workflows, reference datasets…
Laboratory
  Wet: Physical Lab
  Dry: Software and hardware infrastructure…
International Mouse Strain
Resource (IMSR)
Bramhall et al., Quality of Methods Reporting in Animal Models of Colitis,
Inflammatory Bowel Diseases, 2015
“Only one of the 58 papers
reported all essential criteria on
our checklist. Animal age,
gender, housing conditions and
mortality/morbidity were all
poorly reported…..”
The Materials
Turning FAIR into reality
Final report and action plan from the European Commission expert group on FAIR data,
Nov 2018
The Materials
The Methods
Method Reproducibility
the provision of enough detail about
study procedures and data so the
same procedures could, in theory or
in actuality, be exactly repeated.
Result Reproducibility
the same results from the conduct of
an independent study whose
procedures are as closely matched
to the original experiment as possible
Procedure = Software, SOP, Lab Protocol, Workflow, Script.
Tools, Technologies, Techniques. A whole bunch of them together.
Goodman et al., Science Translational Medicine 8 (341), 2016
The Methods
Computational
Workflows/Scripts
Experimental Standard
Operating Procedures
Assemble
Methods, Materials Experiment
Observe
Simulate
Analyse
Results
Publish/
Share Results
Manage
Results
Plan
Run
“I can’t immediately reproduce the
research in my own laboratory. It
took an estimated 280 hours for an
average user to approximately
reproduce the paper.”
Garijo et al. 2013 Quantifying Reproducibility in
Computational Biology: The Case of the Tuberculosis
Drugome PLOS ONE.
re-compute
replicate
rerun
repeat
re-examine
repurpose
recreate
reuse
restore
reconstruct
review
regenerate
revise
recycle
redo
robustness
tolerance
verify
compliance
validate
assurance
remix
conceptually replicate
“show A is true by doing B rather
than doing A again”
verify but not falsify
[Yong, Nature 485, 2012]
The R* Brouhaha
repair
The R* Nautilus
with thanks to Nicola Ferro for the visualisation
Repeat
Replicate
Reproduce
Reuse /
Generalise
Repeat
Same data, set up
Same task/goal
Same materials
Same methods
Same group/lab
My Research Environment
robust, defensible, productive
“Micro” Reproducibility
Replicate
Same data, set up
Same task/goal
Same materials
Same methods
Different group
Our Research Environment
review, validate, certify
Publication Environment
review, validate, certify
“Sameness”
Accountability
Trust
Reproduce
Different data, set up
Same task/goal
Same/different materials
Same/different methods
Different group
Their Research Environment
review, compare, verify
“Similar”
Accountability
Trust
“Macro” Reproducibility
Reuse / Generalise
Different data, set up
Different task/goal
Same/different materials
Same/different methods
Different group/lab
Transferred
Repurposed
Trusted
Productivity
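The four R* levels of the Nautilus can be read as a small decision table. The sketch below encodes only the combinations listed on the slides; the names `Study` and `classify` are hypothetical, and any combination the slides do not list is returned as "unclassified" rather than guessed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Study:
    """How a new study relates to the original one (True means 'same')."""
    same_data: bool
    same_task: bool
    same_materials: bool
    same_methods: bool
    same_group: bool


def classify(study):
    """Map a study onto Repeat / Replicate / Reproduce / Reuse."""
    all_same_inputs = study.same_data and study.same_materials and study.same_methods
    if study.same_task and all_same_inputs:
        # Identical setup: only who runs it distinguishes the two levels.
        return "repeat" if study.same_group else "replicate"
    if not study.same_data and not study.same_group:
        # Materials and methods may be the same or different at these levels.
        return "reproduce" if study.same_task else "reuse/generalise"
    return "unclassified"  # combination not covered by the slides


if __name__ == "__main__":
    print(classify(Study(True, True, True, True, False)))
```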
Reused
Experimental
outputs
Outputs retained
Outputs Used and
Shared
Outputs
Published
Not all outputs are worth the
burden of metadata unless it's
automagical and a side effect
Why does this matter?
Moving between different environments
Recreating / accessing common environments
Fragmented, decentralised, multi-various and complicated …
Research Infrastructure
Services
Assemble
Methods, Materials Experiment
Observe
Simulate
Analyse
Results
Quality
Assessment
Track and Credit
Disseminate
Deposit &
Licence
Publishing Services
Share
Results
Manage
Results
Science 2.0 Repositories: Time for a Change in Scholarly Communication, Assante,
Candela, Castelli, Manghi, Pagano, D-Lib 2015
Why does this matter?
Accuracy, Sameness, Change, Dependencies
What has been fixed, must be
fixed, what variations are valid.
We snapshot publications but
science does not stay still.
Replication may be harder than
reproducing and will decay as the
tools, methods, software, data …
move on or are inherently
unavailable.
What are the dependencies?
What are the black box steps?
Results
may vary
Why does this matter?
More than just “FAIR” data
Open Access to data, software and platforms
Rich descriptions of data, software, methods
• Transparent record of steps, dependencies,
provenance.
• Reporting robustness of methods, versions,
parameters, variation sensitivities
• Portability and preservation of the software
and the data
Should be embedded in Research Practice, not a
burdensome afterthought at publication.
Keeping track should be a side effect of using research
tools.
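Record-keeping as a side effect can be sketched with a decorator that logs each analysis step while it runs. This is an illustrative assumption, not a tool named in the talk: `tracked`, `PROVENANCE_LOG` and `normalise` are hypothetical names, and a real system would persist the log rather than keep it in memory.

```python
import functools
import hashlib
import json
import time

PROVENANCE_LOG = []  # hypothetical in-memory log, for illustration only


def _digest(value):
    """Stable SHA-256 fingerprint of a value (non-JSON values fall back to repr)."""
    payload = json.dumps(value, sort_keys=True, default=repr).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def tracked(step):
    """Wrap an analysis step so every call leaves a provenance record."""
    @functools.wraps(step)
    def wrapper(*args, **kwargs):
        result = step(*args, **kwargs)
        PROVENANCE_LOG.append({
            "step": step.__name__,
            "parameters": repr((args, kwargs)),
            "inputs_sha256": _digest([args, kwargs]),
            "output_sha256": _digest(result),
            "finished_at": time.time(),
        })
        return result
    return wrapper


@tracked
def normalise(values, scale=1.0):
    """Example step: rescale values so they sum to `scale`."""
    total = sum(values)
    return [scale * v / total for v in values]
```

The researcher calls `normalise` as usual; the steps, parameters and fingerprints accumulate without an extra action at publication time.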
Transparency
https://cos.io/our-services/top-guidelines/
Extreme example
Precision medicine HTS pipelines
Alterovitz, Dean, Goble, Crusoe, Soiland-Reyes et al., Enabling
Precision Medicine via standard communication of NGS provenance,
analysis, and results, biorxiv.org, 2017, https://doi.org/10.1101/191783
parameters
Why does this matter?
• Reproducibility is a spectrum
• Strength and difficulty
depend on context and
purpose in the scholarly
workflow
• Beware reproducibility (and
FAIR) dogmatists.
Why does this matter?
forced fragmentation and decentralisation
distributed knowledge infrastructures
De-contextualised
Static, Fragmented
Lost Semantic linking
Contextualised
Active, Unified
Semantic linking
Buried in a
PDF
figure
Reading and Writing Scattered….
Trend: Research Commons & Hubs
DOI: 10.15490/seek.1.investigation.56
Snapshot preservation
http://fairdomhub.org
Trend: Research Objects
context, data, methods, models, provenance bundled together
Handling and embracing decentralisation and enabling portability
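The bundling idea can be illustrated with a plain manifest that lists every file in a folder with its size and checksum. This is a simplified sketch under stated assumptions, not the Research Object or RO-Crate format: the `manifest.json` name, its fields and the helper functions are all hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def build_manifest(root, description):
    """List every file under `root` with its size and SHA-256 checksum."""
    root = Path(root)
    entries = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        if path.name == "manifest.json":
            continue  # do not describe the manifest inside itself
        entries.append({
            "path": path.relative_to(root).as_posix(),
            "bytes": path.stat().st_size,
            "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
        })
    return {"description": description, "files": entries}


def write_manifest(root, description):
    """Write manifest.json into the bundle so it travels with the files."""
    manifest = build_manifest(root, description)
    target = Path(root) / "manifest.json"
    target.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest


def demo():
    """Bundle a toy dataset and a method note in a temporary folder."""
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "data.csv").write_bytes(b"a,b\n1,2\n")
        (Path(tmp) / "method.txt").write_text("step 1: load data.csv\n", encoding="utf-8")
        return write_manifest(tmp, "toy bundle")
```

Because the checksums sit next to the files, a copy moved to another environment can be checked against the manifest before it is rerun.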
Trend: Tool/Environment Proliferation
built in reproducibility by side effect, reproducibility ramps,
disguised as productivity. If only they worked together…
Standards and templates for reporting
methods, provenance, tracking
Tools and platforms for capturing, tracking,
structuring, organising assets throughout the
whole project research cycle.
Shared Cloud-based
analysis systems &
collaboratories
Workflow/Script Automation
Containers for
executable software
dependencies &
portability
Electronic
Lab note
books
Open source software
repositories
Models and methods archives
Research
Commons
Trend: Publication Tool Proliferation
mostly as an additional step
eLife Reproducible Document Stack
publish computationally reproducible
research articles online.
Data2Paper
Challenges
Provocation:
why are we still publishing articles?
For Reproducible Research
Release Research Objects
Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012
Analogous to software products and
practices rather than data or articles or
library practices…
Treat ALL Products and
ALL Research Like Software
Times Higher Education Supplement, 14 May 2015
Acknowledgements
• Dagstuhl Seminar 16041, January 2016
– http://www.dagstuhl.de/en/program/calendar/semhp/?semnr=16041
• ATI Symposium Reproducibility, Sustainability and Preservation, April 2016
– https://turing.ac.uk/events/reproducibility-sustainability-and-preservation/
– https://osf.io/bcef5/files/
• Nicola Ferro
• C. Titus Brown
• Juliana Freire
• David De Roure
• Stian Soiland-Reyes
• Barend Mons
• Tim Clark
• Daniel Garijo
• Norman Morrison
• Matt Spritzer
• Scott Edmunds
• Paolo Manghi …
• Reproducibility rubric https://osf.io/zjvh2/

More Related Content

What's hot

RNA-seq Data Analysis Overview
RNA-seq Data Analysis OverviewRNA-seq Data Analysis Overview
RNA-seq Data Analysis OverviewSean Davis
 
So you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression AnalysisSo you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression AnalysisUniversity of California, Davis
 
Introduction to RNA-seq
Introduction to RNA-seqIntroduction to RNA-seq
Introduction to RNA-seqPaul Gardner
 
Next Gen Sequencing (NGS) Technology Overview
Next Gen Sequencing (NGS) Technology OverviewNext Gen Sequencing (NGS) Technology Overview
Next Gen Sequencing (NGS) Technology OverviewDominic Suciu
 
Single-cell RNA-seq tutorial
Single-cell RNA-seq tutorialSingle-cell RNA-seq tutorial
Single-cell RNA-seq tutorialAaron Diaz
 
Next generation sequencing
Next generation sequencingNext generation sequencing
Next generation sequencingDayananda Salam
 
Combining de Bruijn graph, overlap graph and microassembly for de novo genome...
Combining de Bruijn graph, overlap graph and microassembly for de novo genome...Combining de Bruijn graph, overlap graph and microassembly for de novo genome...
Combining de Bruijn graph, overlap graph and microassembly for de novo genome...Anton Alexandrov
 
qPCR Design Strategies for Specific Applications
qPCR Design Strategies for Specific ApplicationsqPCR Design Strategies for Specific Applications
qPCR Design Strategies for Specific ApplicationsIntegrated DNA Technologies
 
Galaxy RNA-Seq Analysis: Tuxedo Protocol
Galaxy RNA-Seq Analysis: Tuxedo ProtocolGalaxy RNA-Seq Analysis: Tuxedo Protocol
Galaxy RNA-Seq Analysis: Tuxedo ProtocolHong ChangBum
 
Using accurate long reads to improve Genome in a Bottle Benchmarks 220923
Using accurate long reads to improve Genome in a Bottle Benchmarks 220923Using accurate long reads to improve Genome in a Bottle Benchmarks 220923
Using accurate long reads to improve Genome in a Bottle Benchmarks 220923GenomeInABottle
 
2009 Il Progetto Genoma Umano (Shared)
2009 Il Progetto Genoma Umano (Shared)2009 Il Progetto Genoma Umano (Shared)
2009 Il Progetto Genoma Umano (Shared)Luca Gianfranceschi
 
Spatial transcriptome profiling by MERFISH reveals sub-cellular RNA compartme...
Spatial transcriptome profiling by MERFISH reveals sub-cellular RNA compartme...Spatial transcriptome profiling by MERFISH reveals sub-cellular RNA compartme...
Spatial transcriptome profiling by MERFISH reveals sub-cellular RNA compartme...Jean Fan
 
Manteia non confidential-presentation 2003-09
Manteia non confidential-presentation 2003-09Manteia non confidential-presentation 2003-09
Manteia non confidential-presentation 2003-09Pascal Mayer
 
Single Nucleotide Polymorphism
Single Nucleotide PolymorphismSingle Nucleotide Polymorphism
Single Nucleotide PolymorphismFazeehaAmjad
 
Introduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-SeqIntroduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-SeqEnis Afgan
 

What's hot (20)

Snp genotyping
Snp genotypingSnp genotyping
Snp genotyping
 
Bioinformatics .pptx
Bioinformatics .pptxBioinformatics .pptx
Bioinformatics .pptx
 
RNA-seq Data Analysis Overview
RNA-seq Data Analysis OverviewRNA-seq Data Analysis Overview
RNA-seq Data Analysis Overview
 
So you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression AnalysisSo you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression Analysis
 
Introduction to RNA-seq
Introduction to RNA-seqIntroduction to RNA-seq
Introduction to RNA-seq
 
Statistical applications in GraphPad Prism
Statistical applications in GraphPad PrismStatistical applications in GraphPad Prism
Statistical applications in GraphPad Prism
 
Next Gen Sequencing (NGS) Technology Overview
Next Gen Sequencing (NGS) Technology OverviewNext Gen Sequencing (NGS) Technology Overview
Next Gen Sequencing (NGS) Technology Overview
 
Single-cell RNA-seq tutorial
Single-cell RNA-seq tutorialSingle-cell RNA-seq tutorial
Single-cell RNA-seq tutorial
 
Next generation sequencing
Next generation sequencingNext generation sequencing
Next generation sequencing
 
Combining de Bruijn graph, overlap graph and microassembly for de novo genome...
Combining de Bruijn graph, overlap graph and microassembly for de novo genome...Combining de Bruijn graph, overlap graph and microassembly for de novo genome...
Combining de Bruijn graph, overlap graph and microassembly for de novo genome...
 
NGS File formats
NGS File formatsNGS File formats
NGS File formats
 
qPCR Design Strategies for Specific Applications
qPCR Design Strategies for Specific ApplicationsqPCR Design Strategies for Specific Applications
qPCR Design Strategies for Specific Applications
 
Galaxy RNA-Seq Analysis: Tuxedo Protocol
Galaxy RNA-Seq Analysis: Tuxedo ProtocolGalaxy RNA-Seq Analysis: Tuxedo Protocol
Galaxy RNA-Seq Analysis: Tuxedo Protocol
 
Using accurate long reads to improve Genome in a Bottle Benchmarks 220923
Using accurate long reads to improve Genome in a Bottle Benchmarks 220923Using accurate long reads to improve Genome in a Bottle Benchmarks 220923
Using accurate long reads to improve Genome in a Bottle Benchmarks 220923
 
2009 Il Progetto Genoma Umano (Shared)
2009 Il Progetto Genoma Umano (Shared)2009 Il Progetto Genoma Umano (Shared)
2009 Il Progetto Genoma Umano (Shared)
 
Spatial transcriptome profiling by MERFISH reveals sub-cellular RNA compartme...
Spatial transcriptome profiling by MERFISH reveals sub-cellular RNA compartme...Spatial transcriptome profiling by MERFISH reveals sub-cellular RNA compartme...
Spatial transcriptome profiling by MERFISH reveals sub-cellular RNA compartme...
 
Manteia non confidential-presentation 2003-09
Manteia non confidential-presentation 2003-09Manteia non confidential-presentation 2003-09
Manteia non confidential-presentation 2003-09
 
Single Nucleotide Polymorphism
Single Nucleotide PolymorphismSingle Nucleotide Polymorphism
Single Nucleotide Polymorphism
 
Snp
SnpSnp
Snp
 
Introduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-SeqIntroduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-Seq
 

Similar to Reproducibility (and the R*) of Science: motivations, challenges and trends

Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017Carole Goble
 
Research Objects for FAIRer Science
Research Objects for FAIRer Science Research Objects for FAIRer Science
Research Objects for FAIRer Science Carole Goble
 
Reproducibility by Other Means: Transparent Research Objects
Reproducibility by Other Means: Transparent Research ObjectsReproducibility by Other Means: Transparent Research Objects
Reproducibility by Other Means: Transparent Research ObjectsTimothy McPhillips
 
RDA Scholarly Infrastructure 2015
RDA Scholarly Infrastructure 2015RDA Scholarly Infrastructure 2015
RDA Scholarly Infrastructure 2015William Gunn
 
Roche_open_science_NIOO_KNAW_workshop_NL
Roche_open_science_NIOO_KNAW_workshop_NLRoche_open_science_NIOO_KNAW_workshop_NL
Roche_open_science_NIOO_KNAW_workshop_NLDominique Roche
 
Results Vary: The Pragmatics of Reproducibility and Research Object Frameworks
Results Vary: The Pragmatics of Reproducibility and Research Object FrameworksResults Vary: The Pragmatics of Reproducibility and Research Object Frameworks
Results Vary: The Pragmatics of Reproducibility and Research Object FrameworksCarole Goble
 
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...Carole Goble
 
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer SchoolCarole Goble
 
Keynote speech - Carole Goble - Jisc Digital Festival 2015
Keynote speech - Carole Goble - Jisc Digital Festival 2015Keynote speech - Carole Goble - Jisc Digital Festival 2015
Keynote speech - Carole Goble - Jisc Digital Festival 2015Jisc
 
RARE and FAIR Science: Reproducibility and Research Objects
RARE and FAIR Science: Reproducibility and Research ObjectsRARE and FAIR Science: Reproducibility and Research Objects
RARE and FAIR Science: Reproducibility and Research ObjectsCarole Goble
 
The FAIRDOM Commons for Systems Biology
The FAIRDOM Commons for Systems BiologyThe FAIRDOM Commons for Systems Biology
The FAIRDOM Commons for Systems BiologyFAIRDOM
 
Computational Reproducibility vs. Transparency: Is It FAIR Enough?
Computational Reproducibility vs. Transparency: Is It FAIR Enough?Computational Reproducibility vs. Transparency: Is It FAIR Enough?
Computational Reproducibility vs. Transparency: Is It FAIR Enough?Bertram Ludäscher
 
Data sharing - Data management - The SysMO-SEEK Story
Data sharing - Data management - The SysMO-SEEK StoryData sharing - Data management - The SysMO-SEEK Story
Data sharing - Data management - The SysMO-SEEK StoryResearch Information Network
 
Data management, data sharing: the SysMO-SEEK Story
Data management, data sharing: the SysMO-SEEK StoryData management, data sharing: the SysMO-SEEK Story
Data management, data sharing: the SysMO-SEEK StoryCarole Goble
 
2021-01-27--biodiversity-informatics-gbif-(52slides)
2021-01-27--biodiversity-informatics-gbif-(52slides)2021-01-27--biodiversity-informatics-gbif-(52slides)
2021-01-27--biodiversity-informatics-gbif-(52slides)Dag Endresen
 
Big Data Standards - Workshop, ExpBio, Boston, 2015
Big Data Standards - Workshop, ExpBio, Boston, 2015Big Data Standards - Workshop, ExpBio, Boston, 2015
Big Data Standards - Workshop, ExpBio, Boston, 2015Susanna-Assunta Sansone
 
GARNet workshop on Integrating Large Data into Plant Science
GARNet workshop on Integrating Large Data into Plant ScienceGARNet workshop on Integrating Large Data into Plant Science
GARNet workshop on Integrating Large Data into Plant ScienceDavid Johnson
 
Open Access Week - Oxford, 20-24 Oct 2014
Open Access Week - Oxford, 20-24 Oct 2014Open Access Week - Oxford, 20-24 Oct 2014
Open Access Week - Oxford, 20-24 Oct 2014Susanna-Assunta Sansone
 

Similar to Reproducibility (and the R*) of Science: motivations, challenges and trends (20)

Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017
 
Research Objects for FAIRer Science
Research Objects for FAIRer Science Research Objects for FAIRer Science
Research Objects for FAIRer Science
 
Reproducibility by Other Means: Transparent Research Objects
Reproducibility by Other Means: Transparent Research ObjectsReproducibility by Other Means: Transparent Research Objects
Reproducibility by Other Means: Transparent Research Objects
 
RDA Scholarly Infrastructure 2015
RDA Scholarly Infrastructure 2015RDA Scholarly Infrastructure 2015
RDA Scholarly Infrastructure 2015
 
Roche_open_science_NIOO_KNAW_workshop_NL
Roche_open_science_NIOO_KNAW_workshop_NLRoche_open_science_NIOO_KNAW_workshop_NL
Roche_open_science_NIOO_KNAW_workshop_NL
 
Results Vary: The Pragmatics of Reproducibility and Research Object Frameworks
Results Vary: The Pragmatics of Reproducibility and Research Object FrameworksResults Vary: The Pragmatics of Reproducibility and Research Object Frameworks
Results Vary: The Pragmatics of Reproducibility and Research Object Frameworks
 
Aussois bda-mdd-2018
Aussois bda-mdd-2018Aussois bda-mdd-2018
Aussois bda-mdd-2018
 
FAIRer Research
FAIRer ResearchFAIRer Research
FAIRer Research
 
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
 
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
 
Keynote speech - Carole Goble - Jisc Digital Festival 2015
Keynote speech - Carole Goble - Jisc Digital Festival 2015Keynote speech - Carole Goble - Jisc Digital Festival 2015
Keynote speech - Carole Goble - Jisc Digital Festival 2015
 
RARE and FAIR Science: Reproducibility and Research Objects
RARE and FAIR Science: Reproducibility and Research ObjectsRARE and FAIR Science: Reproducibility and Research Objects
RARE and FAIR Science: Reproducibility and Research Objects
 
The FAIRDOM Commons for Systems Biology
The FAIRDOM Commons for Systems BiologyThe FAIRDOM Commons for Systems Biology
The FAIRDOM Commons for Systems Biology
 
Computational Reproducibility vs. Transparency: Is It FAIR Enough?
Computational Reproducibility vs. Transparency: Is It FAIR Enough?Computational Reproducibility vs. Transparency: Is It FAIR Enough?
Computational Reproducibility vs. Transparency: Is It FAIR Enough?
 
Data sharing - Data management - The SysMO-SEEK Story
Data sharing - Data management - The SysMO-SEEK StoryData sharing - Data management - The SysMO-SEEK Story
Data sharing - Data management - The SysMO-SEEK Story
 
Data management, data sharing: the SysMO-SEEK Story
Data management, data sharing: the SysMO-SEEK StoryData management, data sharing: the SysMO-SEEK Story
Data management, data sharing: the SysMO-SEEK Story
 
2021-01-27--biodiversity-informatics-gbif-(52slides)
2021-01-27--biodiversity-informatics-gbif-(52slides)2021-01-27--biodiversity-informatics-gbif-(52slides)
2021-01-27--biodiversity-informatics-gbif-(52slides)
 
Big Data Standards - Workshop, ExpBio, Boston, 2015
Big Data Standards - Workshop, ExpBio, Boston, 2015Big Data Standards - Workshop, ExpBio, Boston, 2015
Big Data Standards - Workshop, ExpBio, Boston, 2015
 
GARNet workshop on Integrating Large Data into Plant Science
GARNet workshop on Integrating Large Data into Plant ScienceGARNet workshop on Integrating Large Data into Plant Science
GARNet workshop on Integrating Large Data into Plant Science
 
Open Access Week - Oxford, 20-24 Oct 2014
Open Access Week - Oxford, 20-24 Oct 2014Open Access Week - Oxford, 20-24 Oct 2014
Open Access Week - Oxford, 20-24 Oct 2014
 

More from Carole Goble

The ELIXIR FAIR Knowledge Ecosystem for practical know-how: RDMkit and FAIRCo...
The ELIXIR FAIR Knowledge Ecosystem for practical know-how: RDMkit and FAIRCo...The ELIXIR FAIR Knowledge Ecosystem for practical know-how: RDMkit and FAIRCo...
The ELIXIR FAIR Knowledge Ecosystem for practical know-how: RDMkit and FAIRCo...Carole Goble
 
Can’t Pay, Won’t Pay, Don’t Pay: Delivering open science, a Digital Research...
Can’t Pay, Won’t Pay, Don’t Pay: Delivering open science,  a Digital Research...Can’t Pay, Won’t Pay, Don’t Pay: Delivering open science,  a Digital Research...
Can’t Pay, Won’t Pay, Don’t Pay: Delivering open science, a Digital Research...Carole Goble
 
RO-Crate: packaging metadata love notes into FAIR Digital Objects
RO-Crate: packaging metadata love notes into FAIR Digital ObjectsRO-Crate: packaging metadata love notes into FAIR Digital Objects
RO-Crate: packaging metadata love notes into FAIR Digital ObjectsCarole Goble
 
Research Software Sustainability takes a Village
Research Software Sustainability takes a VillageResearch Software Sustainability takes a Village
Research Software Sustainability takes a VillageCarole Goble
 
Title: Love, Money, Fame, Nudge: Enabling Data-intensive BioScience through D...
Title: Love, Money, Fame, Nudge: Enabling Data-intensive BioScience through D...Title: Love, Money, Fame, Nudge: Enabling Data-intensive BioScience through D...
Title: Love, Money, Fame, Nudge: Enabling Data-intensive BioScience through D...Carole Goble
 
FAIR Computational Workflows
FAIR Computational WorkflowsFAIR Computational Workflows
FAIR Computational WorkflowsCarole Goble
 
Open Research: Manchester leading and learning
Open Research: Manchester leading and learningOpen Research: Manchester leading and learning
Open Research: Manchester leading and learningCarole Goble
 
RDMkit, a Research Data Management Toolkit. Built by the Community for the ...
RDMkit, a Research Data Management Toolkit.  Built by the Community for the ...RDMkit, a Research Data Management Toolkit.  Built by the Community for the ...
RDMkit, a Research Data Management Toolkit. Built by the Community for the ...Carole Goble
 
FAIR Computational Workflows
FAIR Computational WorkflowsFAIR Computational Workflows
FAIR Computational WorkflowsCarole Goble
 
FAIR Computational Workflows
FAIR Computational WorkflowsFAIR Computational Workflows
FAIR Computational WorkflowsCarole Goble
 
EOSC-Life Workflow Collaboratory
EOSC-Life Workflow CollaboratoryEOSC-Life Workflow Collaboratory
EOSC-Life Workflow CollaboratoryCarole Goble
 
FAIR Computational Workflows
FAIR Computational WorkflowsFAIR Computational Workflows
FAIR Computational WorkflowsCarole Goble
 
FAIR Data Bridging from researcher data management to ELIXIR archives in the...
FAIR Data Bridging from researcher data management to ELIXIR archives in the...FAIR Data Bridging from researcher data management to ELIXIR archives in the...
FAIR Data Bridging from researcher data management to ELIXIR archives in the...Carole Goble
 
FAIR Computational Workflows
FAIR Computational WorkflowsFAIR Computational Workflows
FAIR Computational Workflows Carole Goble
 
FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout Carole Goble
 
FAIRy stories: the FAIR Data principles in theory and in practice
FAIRy stories: the FAIR Data principles in theory and in practiceFAIRy stories: the FAIR Data principles in theory and in practice
FAIRy stories: the FAIR Data principles in theory and in practiceCarole Goble
 
RO-Crate: A framework for packaging research products into FAIR Research Objects
RO-Crate: A framework for packaging research products into FAIR Research ObjectsRO-Crate: A framework for packaging research products into FAIR Research Objects
RO-Crate: A framework for packaging research products into FAIR Research ObjectsCarole Goble
 
The swings and roundabouts of a decade of fun and games with Research Objects
The swings and roundabouts of a decade of fun and games with Research Objects The swings and roundabouts of a decade of fun and games with Research Objects
The swings and roundabouts of a decade of fun and games with Research Objects Carole Goble
 
How are we Faring with FAIR? (and what FAIR is not)
How are we Faring with FAIR? (and what FAIR is not)How are we Faring with FAIR? (and what FAIR is not)
How are we Faring with FAIR? (and what FAIR is not)Carole Goble
 
What is Reproducibility? The R* brouhaha and how Research Objects can help
What is Reproducibility? The R* brouhaha and how Research Objects can helpWhat is Reproducibility? The R* brouhaha and how Research Objects can help
What is Reproducibility? The R* brouhaha and how Research Objects can helpCarole Goble
 

Reproducibility (and the R*) of Science: motivations, challenges and trends

  • 1. Reproducibility (and the R*) of Science: motivations, challenges and trends Professor Carole Goble The University of Manchester, UK Software Sustainability Institute, UK Head of Node ELIXIR-UK ELIXIR, IBISBA, FAIRDOM Association e.V., BioExcel Life Science Infrastructures carole.goble@manchester.ac.uk IRCDL Pisa 31 Jan – 1 Feb 2019 Beware. Results may vary.
  • 2. Reproducibility of Science… A fundamental given of the Scientific Method…. https://xkcd.com/242/
  • 4. The famous Nature survey 1576 researchers, 2016 https://www.nature.com/news/1-500-scientists-lift-the-lid-on-reproducibility-1.19970
  • 5. Reporting and Availability John P. A. Ioannidis, Why Most Published Research Findings Are False, August 30, 2005, DOI: 10.1371/journal.pmed.0020124 incomplete reporting of method, software configurations, resources, parameters & resource versions, missed steps, missing data, vague methods, missing software, unreproducible environments. Joppa et al., Troubling Trends in Scientific Software Use, SCIENCE 340, May 2013 Better Training Methodological Support More robust designs Independent accountability Collaboration & team science Diversifying peer review Better Practices Funding replication studies Rewarding right behaviour Design Flaws HARKing (hypothesizing after the results are known), cherry picking data, random seed reporting, non-independent bias, poor positive and negative controls, poor normalisation, arbitrary cut-offs, premature data triage, un-validated materials, improper statistical analysis, poor statistical power, stop when “get to the right answer”, software misconfigurations, misapplied black box software
  • 6. Trend: Policy and advice proliferation Findable (and be Citable) Accessible (and be Trackable) Interoperable (and be Intelligible) Reusable (and be Reproducible) Record, Automate, Contain, Access
  • 7. *Based on Scientific publications • announce a result • convince readers to trust it. Experimental science • describe the results • provide a clear enough description of the materials and protocol to allow successful repetition and extension. (Jill Mesirov 2010*) Computational science • describe the results • provide the complete software development environment, data, instructions, techniques (which generated the figures) (David Donoho 1995*).
  • 8. “Virtual Witnessing” *Leviathan and the Air-Pump: Hobbes, Boyle, and the Experimental Life (1985) Shapin and Schaffer. Joseph Wright, Experiment with the Air Pump c. 1768
  • 9. “virtually witnessing” the “moist lab” Experiment Setup Methods Algorithms, spec. of the analysis steps, models… Materials Datasets, parameters, algorithm seeds… Instruments Codes, services, scripts, workflows, reference datasets… Laboratory Software and hardware infrastructure… Wet Dry Physical Lab Chemicals, reagents, samples, strain of mouse… Mass specs, sequencers, microscopes, calibrations… Lab protocols, standard operating procedures…
  • 10. International Mouse Strain Resource (IMSR) Bramhall et al., Quality of Methods Reporting in Animal Models of Colitis, Inflammatory Bowel Diseases, 2015, “Only one of the 58 papers reported all essential criteria on our checklist. Animal age, gender, housing conditions and mortality/morbidity were all poorly reported…..” The Materials
  • 11. Turning FAIR into reality Final report and action plan from the European Commission expert group on FAIR data , Nov 2018 The Materials
  • 12. The Methods Method Reproducibility the provision of enough detail about study procedures and data so the same procedures could, in theory or in actuality, be exactly repeated. Result Reproducibility the same results from the conduct of an independent study whose procedures are as closely matched to the original experiment as possible Procedure = Software, SOP, Lab Protocol, Workflow, Script. Tools, Technologies, Techniques. A whole bunch of them together. Goodman et al., Science Translational Medicine 8 (341) 2016
  • 14. Assemble Methods, Materials Experiment Observe Simulate Analyse Results Publish/ Share Results Manage Results Plan Run “I can’t immediately reproduce the research in my own laboratory. It took an estimated 280 hours for an average user to approximately reproduce the paper.” Garijo et al. 2013 Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome PLOS ONE.
  • 15. re-compute replicate rerun repeat re-examine repurpose recreate reuse restore reconstruct review regenerate revise recycle redo robustness tolerance verify compliance validate assurance remix conceptually replicate “show A is true by doing B rather than doing A again” verify but not falsify [Yong, Nature 485, 2012] The R* Brouhaha repair
  • 16. The R* Nautilus with thanks to Nicola Ferro for the visualisation Repeat Replicate Reproduce Reuse / Generalise
  • 17. The R* Nautilus with thanks to Nicola Ferro for the visualisation Repeat Same data, set up Same task/goal Same materials Same methods Same group/lab My Research Environment robust, defensible, productive “Micro” Reproducibility
  • 18. The R* Nautilus with thanks to Nicola Ferro for the visualisation Repeat Same data, set up Same task/goal Same materials Same methods Same group/lab Replicate Same data, set up Same task/goal Same materials Same methods Different group Our Research Environment review, validate, certify Publication Environment review, validate, certify “Sameness” Accountability Trust
  • 19. The R* Nautilus with thanks to Nicola Ferro for the visualisation Repeat Same data, set up Same task/goal Same materials Same methods Same group/lab Replicate Same data, set up Same task/goal Same materials Same methods Different group Reproduce Different data, set up Same task/goal Same/different materials Same/different methods Different group Their Research Environment review, compare, verify “Similar” Accountability Trust “Macro” Reproducibility
  • 20. The R* Nautilus with thanks to Nicola Ferro for the visualisation Repeat Same data, set up Same task/goal Same materials Same methods Same group/lab Replicate Same data, set up Same task/goal Same materials Same methods Different group/lab Reproduce Different data, set up Same task/goal Same/different materials Same/different methods Different group/lab Reuse / Generalise Different data, set up Different task/goal Same/different materials Same/different methods Different group/lab Transferred Repurposed Trusted Productivity
  • 21. The R* Nautilus with thanks to Nicola Ferro for the visualisation Reused Experimental outputs Outputs retained Outputs Used and Shared Outputs Published Not all outputs are worth the burden of metadata unless it's automagical and a side-effect
  • 22. Why does this matter? Moving between different environments Recreating / accessing common environments Fragmented, decentralised, multi-various and complicated … Research Infrastructure Services Assemble Methods, Materials Experiment Observe Simulate Analyse Results Quality Assessment Track and Credit Disseminate Deposit & Licence Publishing Services Share Results Manage Results Science 2.0 Repositories: Time for a Change in Scholarly Communication Assante, Candela, Castelli, Manghi, Pagano, D-Lib 2015
  • 24. Why does this matter? Accuracy, Sameness, Change, Dependencies What has been fixed, must be fixed, what variations are valid. We snapshot publications but science does not stay still. Replication may be harder than reproducing and will decay as the tools, methods, software, data … move on or are inherently unavailable. What are the dependencies. What are the black box steps. Results may vary
  • 25. Why does this matter? More than just “FAIR” data Open Access to data, software and platforms Rich descriptions of data, software, methods • Transparent record of steps, dependencies, provenance. • Reporting robustness of methods, versions, parameters, variation sensitivities • Portability and preservation of the software and the data Should be embedded in Research Practice, not a burdensome afterthought at publication. Keeping track as a side effect of using research tools.
  • 27. Extreme example Precision medicine HTS pipelines Alterovitz, Dean, Goble, Crusoe, Soiland-Reyes et al Enabling Precision Medicine via standard communication of NGS provenance, analysis, and results, biorxiv.org, 2017, https://doi.org/10.1101/191783 parameters
  • 28. Why does this matter? • Reproducibility is a spectrum • Strength and difficulty depend on context and purpose in the scholarly workflow • Beware reproducibility (and FAIR) dogmatists.
  • 29. Why does this matter? forced fragmentation and decentralisation distributed knowledge infrastructures De-contextualised Static, Fragmented Lost Semantic linking Contextualised Active, Unified Semantic linking Buried in a PDF figure Reading and Writing Scattered….
  • 30. Trend: Research Commons & Hubs DOI: 10.15490/seek.1.investigation.56 Snapshot preservation http://fairdomhub.org
  • 31. Trend: Research Objects context, data, methods, models, provenance bundled together Handling and embracing decentralisation and enabling portability
  • 32. Trend: Tool/Environment Proliferation built-in reproducibility by side effect, reproducibility ramps, disguised as productivity. If only they worked together… Standards and templates for reporting methods, provenance, tracking Tools and platforms for capturing, tracking, structuring, organising assets throughout the whole project research cycle. Shared Cloud-based analysis systems & collaboratories Workflow/Script Automation Containers for executable software dependencies & portability Electronic lab notebooks Open source software repositories Models and methods archives Research Commons
  • 33. Trend: Publication Tool Proliferation mostly as an additional step eLife Reproducible Document Stack publish computationally reproducible research articles online. Data2Paper
  • 35. Provocation: why are we still publishing articles? For Reproducible Research Release Research Objects Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012 Analogous to software products and practices rather than data or articles or library practices… Treat ALL Products and ALL Research Like Software Times Higher Education Supplement, 14 May 2015
  • 36. Acknowledgements • Dagstuhl Seminar 16041, January 2016 – http://www.dagstuhl.de/en/program/calendar/semhp/?semnr=16041 • ATI Symposium Reproducibility, Sustainability and Preservation, April 2016 – https://turing.ac.uk/events/reproducibility-sustainability-and-preservation/ – https://osf.io/bcef5/files/ • Nicola Ferro • C. Titus Brown • Juliana Freire • David De Roure • Stian Soiland-Reyes • Barend Mons • Tim Clark • Daniel Garijo • Norman Morrison • Matt Spritzer • Scott Edmunds • Paolo Manghi …
  • 37. • Reproducibility rubric https://osf.io/zjvh2/
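
The R* Nautilus on slides 16 to 20 distinguishes Repeat, Replicate, Reproduce and Reuse / Generalise by what stays the same: the data and set-up, the task or goal, and the group or lab. A minimal sketch of that distinction in Python follows. The names `Attempt` and `classify` are hypothetical and not from the slides, and combinations the slides do not list (for example, the same group working on different data) are assigned by assumption to the nearest outer ring of the nautilus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attempt:
    """Hypothetical description of a re-run relative to the original study."""
    same_data: bool   # same data and set-up as the original
    same_task: bool   # same task/goal as the original
    same_group: bool  # carried out by the same group/lab


def classify(attempt: Attempt) -> str:
    """Map an attempt onto the four R* Nautilus rings from slide 20.

    Assumption: cases the slides do not enumerate fall through to the
    outermost ring whose defining difference they exhibit.
    """
    if not attempt.same_task:
        # A different task/goal means the work is transferred or repurposed.
        return "reuse/generalise"
    if not attempt.same_data:
        # Same goal, different data and set-up.
        return "reproduce"
    if not attempt.same_group:
        # Same data, set-up and goal, but a different group/lab.
        return "replicate"
    # Everything is the same: the original group re-running its own work.
    return "repeat"


if __name__ == "__main__":
    print(classify(Attempt(same_data=True, same_task=True, same_group=False)))
```

The checks run from the outermost ring inwards, so an attempt is labelled by the largest difference it has from the original study. Materials and methods are left out because the slides allow them to be either the same or different for Reproduce and Reuse.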