Reproducibility and Scientific Research: why, what, where, when, who, how

Reproducibility and Scientific Research: why, what, where, when, who, how
Professor Carole Goble CBE FREng FBCS
The University of Manchester, UK
carole.goble@manchester.ac.uk
Open Data Manchester, 27th January 2015
[image credit: icanhascheezburger.com]
"Scientific publications have at least two goals: (i) to announce a result and (ii) to convince readers that the result is correct ….. papers in experimental [and computational science] should describe the results and provide a clear enough protocol [or algorithm] to allow successful repetition and extension"
Jill Mesirov, Accessible Reproducible Research, Science 22 January 2010: Vol. 327 no. 5964, pp. 415-416. DOI: 10.1126/science.1179653
Virtual Witnessing / Minute Taking
[Pettifer, Attwood]
http://getutopia.com
Why smart parents often tend to have smart kids
"an experiment is reproducible until another laboratory tries to repeat it."
Alexander Kohn
design: cherry-picking data, random seed reporting, non-independent bias, poor positive and negative controls, dodgy normalisation, arbitrary cut-offs, premature data triage, un-validated materials, improper statistical analysis, poor statistical power, stopping when you "get to the right answer", software misconfigurations, misapplied black-box software
John P. A. Ioannidis, Why Most Published Research Findings Are False, August 30, 2005. DOI: 10.1371/journal.pmed.0020124
reporting: incomplete reporting of software configurations, parameters & resource versions, missed steps, missing data, vague methods, missing software
Joppa et al., Troubling Trends in Scientific Software Use, Science 340, May 2013
Empirical
Statistical
Computational
V. Stodden, IMS Bulletin (2013)
Transparency / Availability Gap
Out of 18 microarray papers, results from 10 could not be reproduced [1]
1. Ioannidis et al., 2009. Repeatability of published microarray gene expression analyses. Nature Genetics 41: 14
2. Science publishing: The trouble with retractions http://www.nature.com/news/2011/111005/full/478026a.html
3. Bjorn Brembs: Open Access and the looming crisis in science https://theconversation.com/open-access-and-the-looming-crisis-in-science-14950
WHERE? Sustainability
Researcher survey, 1202 respondents (PARSE.insight 2010)
[Hylke Koers]
Broken software, broken science
• Geoffrey Chang, Scripps Institute
• Homemade data-analysis program inherited from another lab
• Flipped two columns of data, inverting the electron-density map used to derive protein structure
• Retracted 3 Science papers and 2 papers in other journals
• One paper cited 364 times
[figure: The structures of MsbA (purple) and Sav1866 (green) overlap little (left) until MsbA is inverted (right).]
Miller, A Scientist's Nightmare: Software Problem Leads to Five Retractions, Science 22 December 2006: vol. 314 no. 5807, 1856-1857
http://www.software.ac.uk/blog/2014-12-04-its-impossible-conduct-research-without-software-say-7-out-10-uk-researchers
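A column-swap bug of this kind is exactly what a small regression test against data with a known answer can catch before publication. A minimal sketch, with a hypothetical loader and toy values (not the actual Scripps code):

```python
def load_columns(rows, swap=False):
    """Split (x, y) rows into two column lists.
    A bug like swap=True silently inverts everything downstream."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    return (ys, xs) if swap else (xs, ys)

# Regression test on a tiny dataset with a known answer:
rows = [(1.0, 10.0), (2.0, 20.0)]
xs, ys = load_columns(rows)
assert xs == [1.0, 2.0] and ys == [10.0, 20.0], "columns swapped!"
```

Inherited lab code rarely ships with such tests, which is how the swap went unnoticed across five papers.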
"An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment, [the complete data] and the complete set of instructions which generated the figures."
David Donoho, "Wavelab and Reproducible Research," 1995
Morin et al., Shining Light into Black Boxes, Science 13 April 2012: 336(6078), 159-160
Ince et al., The case for open computer programs, Nature 482, 2012
algorithms, configurations, tools and apps, codes, workflows, scripts, code libraries, third-party services, system software, infrastructure, compilers, hardware
Self-contained codes??
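A low-cost first step toward capturing that environment is to record the interpreter, platform, and package versions next to every result. A sketch using only the Python standard library (the package names and output filename are illustrative):

```python
import json
import platform
import sys
from importlib import metadata

def capture_environment(packages):
    """Record interpreter, OS, and package versions alongside results."""
    env = {
        "python": sys.version,
        "platform": platform.platform(),
        "packages": {},
    }
    for name in packages:
        try:
            env["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            env["packages"][name] = "not installed"
    return env

# Write the record next to the analysis outputs:
record = capture_environment(["numpy", "pandas"])
with open("environment.json", "w") as f:
    json.dump(record, f, indent=2)
```

This does not make the code self-contained, but it makes the "incomplete reporting of software configurations and resource versions" failure above much harder to commit by accident.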
WHY? 12+3 reasons research goes "wrong"
1. Pressure to publish
2. Impact factor mania
3. Tainted resources
4. Bad maths
5. Sins of omission
6. Science is messy
7. Broken peer review
8. Some scientists don’t share
9. Research never reported
10. Poor training -> sloppiness
11. Honest error
12. Fraud
13. Disorganisation & time pressures
14. Cost to prepare and curate materials
15. Inherently "unreplicable" (one-off data, specialist kit, stochastic)
https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)
It's HARD to Prepare and Independently Test
• replication hostility
• resource intensive
• no funding, time, recognition, place to publish
• the complete environment?
[Norman Morrison]
Value People. Data. Method. Software.
WHAT is reproducibility? This is a heated topic of debate.
re-compute, replicate, rerun, repeat, re-examine, repurpose, recreate, reuse, restore, reconstruct, review, regenerate, revise, recycle, redo, regenerate figure
conceptual replication: "show A is true by doing B rather than doing A again", verify but not falsify [Yong, Nature 485, 2012]
robustness, tolerance, verification, compliance, validation, assurance
WHEN?
• Can I repeat my method? DEFEND: publish article. Same experiment, set up, lab.
• Can I replicate your method? CERTIFY: submit article (and move on…), a window before decay sets in… Same experiment, set up, independent lab.
• Can I reproduce my results using your method, or your results using my method? COMPARE: variations on experiment, set up, lab.
• Can I reuse your results / method in my research? TRANSFER: different experiment.
*Adapted from Mesirov, J. Accessible Reproducible Research, Science 327(5964), 415-416 (2010)
WHO? scientific ego-system & access
trust, reciprocity, and competition
blame, scooping, no credit / credit drift, misinterpretation, scrutiny trolling, cost of preparation, support distraction, dependents on old news, loss of dowry, loss of special sauce, hugging, flirting, voyeurism, cautionary creeping
Tenopir et al., Data Sharing by Scientists: Practices and Perceptions, PLoS ONE 6(6), 2012
Borgman, The conundrum of sharing research data, JASIST 2012
John P. A. Ioannidis, How to Make More Published Research True, October 21, 2014. DOI: 10.1371/journal.pmed.1001747
Sandve GK, Nekrutenko A, Taylor J, Hovig E (2013) Ten Simple Rules for Reproducible Computational Research. PLoS Comput Biol 9(10): e1003285. doi:10.1371/journal.pcbi.1003285
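Sandve et al.'s rules include noting the underlying random seeds for any analysis that involves randomness. A minimal sketch of what that looks like in practice (the function, seed value, and subsample size are illustrative):

```python
import random

SEED = 20150127  # chosen arbitrarily; report it with the results

def noisy_analysis(data, seed=SEED):
    """A stochastic step made repeatable by seeding its RNG explicitly."""
    rng = random.Random(seed)  # local RNG: no hidden global state
    sample = rng.sample(data, k=3)
    return sorted(sample)

data = list(range(100))
# Same seed, same subsample: the run can be repeated exactly.
assert noisy_analysis(data) == noisy_analysis(data)
```

Using a local `random.Random` instance rather than the module-level functions keeps the seed visible at the call site instead of buried in global state.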
HOW? [Adapted Freire, 2013]
• transparency: dependencies, steps, provenance
• portability: robustness, tolerance
• preservation: packaging, versioning
• access: available, standards, common APIs, licence
• description: intelligible, standards, common metadata
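The provenance and versioning items above can start as simply as a JSON record written beside each output: what ran, on which input (identified by content hash), with which parameters. A sketch with hypothetical file, parameter, and tool names:

```python
import hashlib
import json
import time

def provenance_record(input_path, params, tool_version):
    """Minimal provenance: what ran, on what input, with which settings."""
    with open(input_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return {
        "input": input_path,
        "input_sha256": digest,
        "parameters": params,
        "tool_version": tool_version,
        "run_at": time.strftime("%Y-%m-%dT%H:%M:%S"),
    }

# Demo with a throwaway input file:
with open("input.csv", "w") as f:
    f.write("x,y\n1,10\n2,20\n")
record = provenance_record("input.csv", {"cutoff": 0.05}, "mytool 1.2.3")
with open("provenance.json", "w") as f:
    json.dump(record, f, indent=2)
```

Hashing the input makes "same data, same settings" checkable later, even after files have been renamed or moved.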
HOW? sustained sites
Findable, Accessible, Intelligible, Reproducible
http://software-carpentry.org/
http://datacarpentry.org/
http://www.nature.com/sdata/
ELNs, Automation, Checklists, eLabs
Gathering scattered research components
Summary
• Replicable Science is hard work and poorly rewarded
• Reproducible Science => Transparent Science, but ideally needs to be born that way
• Collective responsibility
• Barend Mons
• Sean Bechhofer
• Philip Bourne
• Matthew Gamble
• Raul Palma
• Jun Zhao
• Alan Williams
• Stian Soiland-Reyes
• Paul Groth
• Tim Clark
• Juliana Freire
• Alejandra Gonzalez-Beltran
• Philippe Rocca-Serra
• Ian Cottam
• Susanna Sansone
• Kristian Garza
• Hylke Koers
• Norman Morrison
• Ian Fore
• Jill Mesirov
• Robert Stevens
• Steve Pettifer
http://www.researchobject.org
http://www.wf4ever-project.org
http://www.fair-dom.org
http://www.software.ac.uk
Further Reading
• https://www.sciencenews.org/article/redoing-scientific-research-best-way-find-truth
• Drummond C, Replicability is not Reproducibility: Nor is it Good Science, online
• Peng RD, Reproducible Research in Computational Science, Science 2 Dec 2011: 1226-1227