Know your R usage workflow to handle reproducibility challenges

R is used in a wide variety of ways, from purely ad hoc use by hobbyists to organized, structured use in an enterprise. Each way of using R brings different reproducibility challenges. Going through a range of typical workflows, we will show that understanding reproducibility must start with understanding your workflow. While presenting the workflows, we will show how we deal with reproducibility challenges using R Suite (http://rsuite.io), an open-source solution we developed to support our large-scale R development.

Published in: Data & Analytics

  1. Know your R usage workflow to handle reproducibility challenges. Budapest, 2018. Copyright (c) WLOG Solutions.
  2. Meet the personas: John (student/hobbyist); Kate and Henry (freelancer/scientist/consultant); The Team (corporate/in-house team).
  3. They were coding in R happily until that one day...
  4. https://xkcd.com/234/
  5. John could not deliver his R lab homework because of a package incompatibility on his professor's laptop.
  6. Kate and Henry missed deadlines because of problems installing packages for their R Shiny app on a customer's server running Red Hat Enterprise Linux 6.8.
  7. The Team had serious package version conflicts caused by many users and many projects sharing a Red Hat Enterprise Linux machine without internet access.
  8. Three different stories, the same reproducibility problem.
  9. What is reproducibility?
  10. Reproducibility is the ability to run your code repeatedly, at different times and on different computers, and obtain the same outputs given the same inputs.
  15. Bare metal; operating system; solution dependencies; code; data.
  16. A few examples
  17. 2016-01-03, R 3.2.3: forecast v6.2 (Rcpp (>= 0.11)). 2016-09-08, R 3.3.1: forecast v7.2 (ggplot2 (>= 2.0.0), Rcpp (>= 0.11); added gglagplot). 2017-03-01, R 3.3.2: forecast v8.0 (ggplot2 (>= 2.0.0), Rcpp (>= 0.11); modified defaults for gglagplot).
  19. Development, Production
  20. I recommend using rocker/r-ver
  21. When is reproducibility important while you program in R?
  22. Deploy (share) solution to production: Development (Debian/Ubuntu, RedHat/CentOS, Windows) to Production (Debian/Ubuntu, RedHat/CentOS, Windows).
  23. Restore development environment: Development (Debian/Ubuntu, RedHat/CentOS, Windows) to Development' (Debian/Ubuntu, RedHat/CentOS, Windows).
  24. Three workflows, three reproducibility solutions.
  25. John, student/hobbyist: Dev/Production; version control; family & friends or professor; MRAN.
  26. Kate and Henry, consultancy team/freelancer/scientist: Dev; Production; continuous integration; version control; local CRAN; MRAN; on-premise; cloud; Spark etc.
  27. The Team, corporate/in-house team: Dev; Production; continuous integration; version control; local CRAN.
  28. One word on Docker: Development; Production; build for different OS; deployment package (.zip).
  29. Second word on Docker: Development; Production; build Docker image.
  30. CRAN management; multiple R versions; Debian/Ubuntu; Windows; RedHat/CentOS; Docker; Jenkins; isolated projects; no installation on prod; internetless environments; system requirements; Git/SVN; binary packages. http://rsuite.io https://github.com/WLOGSolutions/RSuite https://www.slideshare.net/WLOGSolutions
  31. Wit Jakuczun, CEO, wit.Jakuczun@wlogsolutions.com, +48 601820620, http://www.wlogsolutions.com
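
In the forecast example (slide 17), versions 7.2 and 8.0 declare the same dependency constraints, yet the defaults of gglagplot changed between them. A lower bound such as `(>= 2.0.0)` therefore does not pin behavior by itself. The following is a minimal sketch in base R of the difference between a minimum-version check and an exact-version check. The helper names `meets_minimum` and `is_exact` are hypothetical and are not part of R Suite or any package; `stats` is used only because it ships with every R installation.

```r
# Hypothetical helpers for illustration only.

# TRUE when the installed version of `pkg` is at least `min_version`.
meets_minimum <- function(pkg, min_version) {
  utils::packageVersion(pkg) >= package_version(min_version)
}

# TRUE only when the installed version of `pkg` is exactly `version`.
is_exact <- function(pkg, version) {
  utils::packageVersion(pkg) == package_version(version)
}

# `stats` is a base package, so its version matches the running R version.
r_version <- as.character(getRversion())

meets_minimum("stats", "3.0.0")  # a lower bound accepts many versions
is_exact("stats", r_version)     # an exact pin accepts only one
```

A lower bound keeps installation flexible, while an exact pin is what makes two environments comparable. Which one fits depends on the workflow.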

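Slide 15 lists the layers that have to match for a run to be reproducible. As a rough illustration, the operating system and dependency layers can be recorded from inside R using base functions only. The function name `snapshot_environment` and the shape of the returned list are assumptions made for this sketch, not something R Suite prescribes. The hardware and data layers are not captured here.

```r
# Hypothetical helper: record the R version, OS, and installed package versions.
snapshot_environment <- function() {
  ip <- utils::installed.packages()
  list(
    r_version = R.version.string,          # e.g. the "R version x.y.z" banner string
    os        = Sys.info()[["sysname"]],   # operating system name
    platform  = R.version$platform,        # build platform triple
    packages  = data.frame(
      package = unname(ip[, "Package"]),
      version = unname(ip[, "Version"]),
      stringsAsFactors = FALSE
    )
  )
}

env <- snapshot_environment()
env$r_version
head(env$packages)
```

Saving such a snapshot next to the project (for example with `saveRDS(env, "environment.rds")`) gives a later run something to compare against.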