
Reproducibility and Scientific Research: why, what, where, when, who, how






  1. Reproducibility and Scientific Research: why, what, where, when, who, how. Professor Carole Goble CBE FREng FBCS, The University of Manchester, UK. Open Data Manchester, 27th January 2015.
  2. "Scientific publications have at least two goals: (i) to announce a result and (ii) to convince readers that the result is correct … papers in experimental [and computational science] should describe the results and provide a clear enough protocol [or algorithm] to allow successful repetition and extension." Jill Mesirov, Accessible Reproducible Research, Science 22 January 2010: Vol. 327 no. 5964, pp. 415-416, DOI: 10.1126/science.1179653. Virtual Witnessing / Minute Taking.
  3. [Pettifer, Attwood]
  4. Why smart parents often tend to have smart kids
  5. "An experiment is reproducible until another laboratory tries to repeat it." Alexander Kohn
  6. Design and reporting failures, empirical, statistical and computational [V. Stodden, IMS Bulletin (2013)]: cherry-picked data, unreported random seeds, non-independent bias, poor positive and negative controls, dodgy normalisation, arbitrary cut-offs, premature data triage, un-validated materials, improper statistical analysis, poor statistical power, stopping when you "get to the right answer", software misconfigurations, misapplied black-box software, incomplete reporting of software configurations, parameters and resource versions, missed steps, missing data, vague methods, missing software. John P. A. Ioannidis, Why Most Published Research Findings Are False, August 30, 2005, DOI: 10.1371/journal.pmed.0020124; Joppa et al., Troubling Trends in Scientific Software Use, Science 340, May 2013.
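Among the computational failures listed above is the unreported random seed. A minimal, hypothetical sketch (not code from the talk) of why recording the seed matters for any stochastic analysis:

```python
import random

def bootstrap_mean(data, n_samples, seed):
    """Mean of bootstrap-sample means; repeatable because the seed is fixed."""
    rng = random.Random(seed)  # reporting this seed makes the run repeatable
    means = []
    for _ in range(n_samples):
        sample = [rng.choice(data) for _ in data]
        means.append(sum(sample) / len(sample))
    return sum(means) / n_samples

data = [2.0, 4.0, 6.0, 8.0]
# Same data + same reported seed => the identical result on every run.
assert bootstrap_mean(data, 100, seed=42) == bootstrap_mean(data, 100, seed=42)
```

Without the recorded seed, every rerun of such an analysis yields a slightly different number and the published figure cannot be regenerated exactly.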
  7. Transparency / Availability Gap: out of 18 microarray papers, results from 10 could not be reproduced. 1. Ioannidis et al., 2009, Repeatability of published microarray gene expression analyses, Nature Genetics 41: 14; 2. Science publishing: The trouble with retractions; 3. Bjorn Brembs: Open Access and the looming crisis in science.
  8. WHERE? Sustainability. Researcher survey, 1202 respondents (PARSE.insight 2010). [Hylke Koers]
  9. Broken software, broken science. Geoffrey Chang, Scripps Institute: a homemade data-analysis program inherited from another lab flipped two columns of data, inverting the electron-density map used to derive a protein structure. He retracted 3 Science papers and 2 papers in other journals; one had been cited 364 times. The structures of MsbA (purple) and Sav1866 (green) overlap little (left) until MsbA is inverted (right). Miller, A Scientist's Nightmare: Software Problem Leads to Five Retractions, Science 22 December 2006: vol. 314 no. 5807, pp. 1856-1857.
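The Chang failure mode is easy to sketch. The code below is purely illustrative (it is not the actual program): a data-import step silently swaps two columns, and only a regression test on a known input exposes it.

```python
def load_map(rows, swap_bug=False):
    """Illustrative import step; with swap_bug=True it silently
    swaps the first two columns, as in the retraction story above."""
    out = []
    for row in rows:
        row = list(row)
        if swap_bug:
            row[0], row[1] = row[1], row[0]  # the silent error
        out.append(row)
    return out

raw = [[1.0, 2.0, 3.0],
       [4.0, 5.0, 6.0]]

assert load_map(raw) == raw                 # correct import preserves the data
assert load_map(raw, swap_bug=True) != raw  # the bug corrupts it, raising no error
```

The point of the anecdote, and of the sketch, is that nothing crashes: the corrupted output looks plausible, so only a check against known-good data catches it.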
  10. "An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment, [the complete data] and the complete set of instructions which generated the figures." David Donoho, "Wavelab and Reproducible Research," 1995. Self-contained codes?? Algorithms, configurations, tools and apps, codes, workflows, scripts, code libraries, third-party services, system software, infrastructure, compilers, hardware. Morin et al., Shining Light into Black Boxes, Science 13 April 2012: 336(6078), pp. 159-160; Ince et al., The case for open computer programs, Nature 482, 2012.
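In Donoho's spirit, a run can at least record its own environment alongside its outputs. A minimal sketch (the field names here are illustrative, not from the slide):

```python
import json
import platform
import sys

def environment_snapshot():
    """Record the interpreter, OS and invocation that produced a result."""
    return {
        "python": sys.version.split()[0],   # interpreter version
        "platform": platform.platform(),    # OS and architecture
        "argv": list(sys.argv),             # how the run was invoked
    }

snapshot = environment_snapshot()
print(json.dumps(snapshot, indent=2))  # store this next to the figures it generated
```

A fuller version would also capture package versions and the source-control revision of the analysis code; the point is that the "complete environment" is cheap to record at run time and very expensive to reconstruct afterwards.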
  11. WHY? 12+3 reasons research goes "wrong" (adapted): 1. Pressure to publish 2. Impact factor mania 3. Tainted resources 4. Bad maths 5. Sins of omission 6. Science is messy 7. Broken peer review 8. Some scientists don't share 9. Research never reported 10. Poor training -> sloppiness 11. Honest error 12. Fraud 13. Disorganisation & time pressures 14. Cost to prepare and curate materials 15. Inherently "unreplicable" (one-off data, specialist kit, stochastic)
  12. It's HARD to prepare and independently test: replication hostility; resource intensive; no funding, time, recognition, or place to publish; the complete environment? [Norman Morrison]
  13. Value People. Data. Method. Software.
  14. WHAT is reproducibility? This is a heated topic of debate: re-compute, replicate, rerun, repeat, re-examine, repurpose, recreate, reuse, restore, reconstruct, review, regenerate, revise, recycle, regenerate figure, redo; robustness, tolerance, verification, compliance, validation, assurance. Conceptual replication, "show A is true by doing B rather than doing A again", can verify but not falsify [Yong, Nature 485, 2012].
  15. WHEN? Can I repeat my method? DEFEND: same experiment, set up, lab; publish article. Can I replicate your method? CERTIFY: same experiment, set up, independent lab; submit article (and move on…) (a window before decay sets in…). Can I reproduce my results using your method, or your results using my method? COMPARE: variations on experiment, set up, lab. Can I reuse your results / method in my research? TRANSFER: different experiment. *Adapted from Mesirov, J., Accessible Reproducible Research, Science 327(5964), 415-416 (2010).
  16. WHO? The scientific ego-system and access: trust, reciprocity, and competition. Blame, scooping, no credit / credit drift, misinterpretation, scrutiny, trolling, cost of preparation, support, distraction, dependents on old news, loss of dowry, loss of special sauce, hugging, flirting, voyeurism, cautionary creeping. Tenopir et al., Data Sharing by Scientists: Practices and Perceptions, PLoS ONE 6(6), 2012; Borgman, The conundrum of sharing research data, JASIST 2012.
  17. HOW? John P. A. Ioannidis, How to Make More Published Research True, October 21, 2014, DOI: 10.1371/journal.pmed.1001747; Sandve GK, Nekrutenko A, Taylor J, Hovig E (2013), Ten Simple Rules for Reproducible Computational Research, PLoS Comput Biol 9(10): e1003285, doi:10.1371/journal.pcbi.1003285.
  18. HOW? Findable, Accessible, Intelligible, Reproducible. Available: preservation, packaging, versioning, access, sustained sites. Standards: common APIs, licence. Description: intelligible, standards, common metadata. Transparency: dependencies, steps, provenance, portability, robustness, tolerance. [Adapted Freire, 2013]
  19. [image-only slide]
  20. ELNs, Automation, Checklists, eLabs
  21. Gathering scattered research components
  22. Summary • Replicable science is hard work and poorly rewarded • Reproducible science => transparent science, but ideally it needs to be born that way • Collective responsibility
  23. Acknowledgements: Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Alan Williams, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza, Hylke Koers, Norman Morrison, Ian Fore, Jill Mesirov, Robert Stevens, Steve Pettifer
  24. Further Reading • research-best-way-find-truth • Drummond C, Replicability is not Reproducibility: Nor is it Good Science, online • Peng RD, Reproducible Research in Computational Science, Science 2 Dec 2011: 1226-1227.