Reproducible Science
Panel at iEvoBio 2014
Hilmar Lapp
National Evolutionary Synthesis Center (NESCent)
Disclosure
(or call it perspective)
• Funded by NSF through BEACON for a series
of workshops on developing reproducible
science curriculum
• 3 principal questions:
• What practices, tools, and resources are
available now?
• How best to teach these?
• What are the gaps faced by biologist users?
An experiment with
sobering results
https://storify.com/hlapp/reproducibility-repeatability-bigthink
Main takeaways
(distilled to tweets)
• software with many dependencies ->
exponentially lower prob that all install
• holes or errors in docs -> harmless for
experts, often fatal for "method novice"
• software evolution & rot -> parameters
that worked 1 year ago now throw an
error
• Non-domain reproducers harder: baseline
software, packages differ #dependencyhell
http://ropensci.org/blog/2014/06/09/reproducibility/
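The first takeaway ("many dependencies -> exponentially lower prob that all install") comes down to simple arithmetic. This is a minimal sketch that assumes each of n dependencies installs successfully and independently with the same probability p. Both the independence and the equal-probability assumption are simplifications made for illustration. Under them, the chance that everything installs is p^n, which shrinks geometrically as n grows. With p = 0.95, ten dependencies already bring it down to roughly 60%.

```python
def prob_all_install(p: float, n: int) -> float:
    """Probability that all n dependencies install.

    Illustrative assumption: each dependency succeeds independently
    with the same probability p, so the joint probability is p ** n.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    return p ** n


if __name__ == "__main__":
    # Even a high per-package success rate decays quickly as n grows.
    for n in (1, 5, 10, 20, 50):
        print(f"{n:>3} dependencies: {prob_all_install(0.95, n):.3f}")
```

Real installs are not independent (a missing compiler, for example, breaks many packages at once), so treat this as intuition for the trend rather than a predictive model.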
“arguing that reproducibility
is laudable in general glosses
over the fact that for each
research group it is a
significant amount of work
to make their research
(easily) reproducible for
independent scientists”
“Any work you do to
make your analysis
more reproducible
pays dividends for
colleagues and your
future self.”
Jeremy Leipzig
For research to be
reproducible, the parts need
to be available to start
Collberg et al. (2014), Measuring Reproducibility in Computer Systems Research.
http://reproducibility.cs.arizona.edu/tr.pdf
A huge tech soup
• vagrant
• Ansible
• Docker
• Drone
• Travis
• knitr
• packrat
• VM memory limits
• VM storage limits
• VM uptime limits
• firewalls
• protected data
• data snapshotting
Reproducible science
is a huge opportunity
for Research IT to
enable & accelerate
science.