
Reproducibility


Discussion at 3Dsig as part of ISMB, Chicago, July 10, 2018.

Published in: Education

  1. Reproducibility. Philip E. Bourne PhD, FACMI; Stephenson Chair of Data Science; Director, Data Science Institute; Professor of Biomedical Engineering. peb6a@virginia.edu https://www.slideshare.net/pebourne @pebourne 3Dsig, Chicago, July 10, 2018
  2. This is a discussion. I am merely providing some context … The real work comes this afternoon at 2pm
  3. Collaborative structural biology using machine learning and Jupyter notebook. Fergus Boyles and Fergus Imrie, Department of Statistics, University of Oxford. ISMB July 2018 - Live interactive demonstration - Follow along during the presentation, or use as a reference afterwards - Materials: http://opig.stats.ox.ac.uk/webapps/ISMB_2018.html GitHub instructions: https://github.com/FBoyles/3dsig
  4. Why the fuss?
  5. 47/53 “landmark” publications could not be replicated [Begley, Ellis Nature, 483, 2012]
  6. Causality … • Cherry picking data • Misapplication of black box software • Bias • Poor positive and negative controls • Improper statistical analysis • Etc … The review process, itself under threat, does not catch all of this
  7. It's useful to look at the issue through the eyes of different stakeholders • Researchers – on one hand, reproducibility is like broccoli: no one wants to eat it, but you know you should; on the other, we all know we spend too much time recreating the research of others. • Funders – they are demanding it – what does that mean? • Publishers – they are demanding it too – what does that mean? • Public – just another attack on the value of science
  8. Some terminology • Reproduce: same data, same software • Robust: same data, different software • Replicate: different data, same software • Generalize: different data, different software
  9. It's more complex than that… • Infrastructures change (hardware, compilers, libraries, languages, etc.) • There is the process through which the research is done… • Different parameters • Different protocols / workflows
  10. 3Dsigers do pretty well relative to other disciplines, but we could do better • Major public data repositories • Multiple declarations for depositing data • Thriving open source community • Data standardisation efforts • Core facilities • Heroic data campaigns • International and national coordination
  11. Data/code as first-class citizens http://www.ncbi.nlm.nih.gov/pubmed/26207759 Only 12% of data from research is preserved [Adapted from Carole Goble]
  12. For Labs: Incentives “I can’t immediately reproduce the research in my own laboratory. It took an estimated 280 hours for an average user to approximately reproduce the paper. Data/software versions. Workflows are maturing and becoming helpful” Garijo et al. 2013 Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome PLOS ONE 8(11): e80278.
  13. For Labs: Disincentives
  14. The Neylon equation: Process = (Interest / Friction) × Number of people reached. Cameron Neylon, BOSC 2013, http://cameronneylon.net/ Lower friction, so born reproducible
  15. Emerging reproducible system ecosystem [from Carole Goble 2013] • Sweave • ReproZip • instrumented desktop tools • hosted services • packaging and archiving • repositories, catalogues • online sharing platforms • integrated authoring • integrative frameworks • XworX
  16. Natural selection has taken place
  17. The Research Lifecycle IDEAS – HYPOTHESES – EXPERIMENTS – DATA – ANALYSIS – COMPREHENSION – DISSEMINATION Authoring Tools Lab Notebooks Data Capture Software Repositories Analysis Tools Visualization Scholarly Communication Commercial & Public Tools Git-like Resources By Discipline Data Journals Discipline-Based Metadata Standards Community Portals Institutional Repositories New Reward Systems Commercial Repositories Training
  18. Questions • What is missing from this discussion? • Where do you see the balance between the pain and the gain? • Is your lab doing anything to improve the situation, if so what? • Should we and could we do anything as a community?
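
The terminology slide (same or different data, same or different software) and the caveat that infrastructure also changes can be sketched in Python. This is only an illustration and not material from the talk: `RunRecord`, `record_run` and `classify` are hypothetical names, the mapping of the four terms follows the usual reading of the slide's grid, and treating a data hash plus a software identifier as the definition of "same" is an assumption.

```python
import hashlib
import platform
from dataclasses import dataclass


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical record of what one analysis run used."""
    data_sha256: str     # fingerprint of the input data
    software_id: str     # assumed: a commit hash or version tag of the analysis code
    python_version: str  # infrastructure detail that can drift between runs
    platform: str        # operating system and hardware description


def record_run(data: bytes, software_id: str) -> RunRecord:
    """Capture the data fingerprint, software identifier and environment."""
    return RunRecord(
        data_sha256=hashlib.sha256(data).hexdigest(),
        software_id=software_id,
        python_version=platform.python_version(),
        platform=platform.platform(),
    )


def classify(original: RunRecord, rerun: RunRecord) -> str:
    """Name the kind of rerun using the four terms from the terminology slide."""
    same_data = original.data_sha256 == rerun.data_sha256
    same_software = original.software_id == rerun.software_id
    if same_data and same_software:
        return "reproduce"
    if same_data:
        return "robust"      # same data, different software
    if same_software:
        return "replicate"   # different data, same software
    return "generalize"      # different data, different software


if __name__ == "__main__":
    first = record_run(b"structure-set-A", "v1.0")
    second = record_run(b"structure-set-B", "v1.0")
    print(classify(first, second))
```

Storing the interpreter version and platform alongside each record does not make a run reproducible by itself, but it leaves a trace of the infrastructure differences the talk lists when two runs disagree.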
