SlideShare a Scribd company logo
Reproducibility with Checkpoint & RRO
New York R Conference
Joseph Rickert
April 25, 2015
2
OUR COMPANY
The leading provider
of advanced analytics
software and services
based on open source R,
since 2007
OUR PRODUCT
REVOLUTION R: The
enterprise-grade predictive
analytics application platform
based on the R language
SOME KUDOS
“This acquisition will help
customers use advanced
analytics within Microsoft data
platforms“
-- Joseph Sirosh, CVP C+E
What is Reproducibility?
“The goal of reproducible research is to tie specific
instructions to data analysis and experimental data so that
scholarship can be recreated, better understood and
verified.”
CRAN Task View on Reproducible Research (Kuhn)
 Method + Environment
-> Results
 A process for:
– Sharing the method
– Describing the environment
– Recreating the results
3
xkcd.com/242/
Reproducibility – why do we care?
Academic / Research
 Verify results
 Advance Research
Business
 Production code
 Reliability
 Reusability
 Collaboration
 Regulation
4
www.nytimes.com/2011/07/08/health/research/08genes.html
http://arxiv.org/pdf/1010.1092.pdf
Observations
 R versions are pretty manageable
– Major versions just once a year
– Patches rarely introduce incompatible changes
 Good solutions for literate programming
– Rstudio / knitr / Rmarkdown
 OS/Hardware not the major cause of problems
 The big problem is with packages
– CRAN is in a state of continual flux
5
Package dependency explosion
 R script file using 6 most popular packages
6
Any updated package = potential reproducibility error!
http://blog.revolutionanalytics.com/2014/10/explore-r-package-connections-at-mran.html
7
An R Reproducibility Problem
Adapted from http://xkcd.com/234/ CC BY-NC 2.5
8
Reproducible R Toolkit
 Static CRAN mirror in Revolution R Open
– CRAN packages fixed with each RRO update
 Daily CRAN snapshots
– Storing every package version since September 2014
– Hosted at mran.revolutionanalytics.com/snapshot
 Write and share scripts synced to a specific snapshot date
– checkpoint package installed with RRO
projects.revolutionanalytics.com/rrt/
9
Using checkpoint
 Add 2 lines to the top of your script
library(checkpoint)
checkpoint("2015-01-28")
 That’s it?
 Optionally, check the R version as well
library(checkpoint)
checkpoint("2015-01-28", R.version="3.1.3")
Or, whichever date you want
10
Using checkpoint
 Easy to use: add 2 lines to the top of each script
library(checkpoint)
checkpoint("2014-09-17")
 For the package author:
– Use package versions available on the chosen date
– Installs packages local to this project
• Allows different package versions to be used simultaneously
 For a script collaborator:
– Automatically installs required packages
• Detects required packages (no need to manually install!)
– Uses same package versions as script author to ensure reproducibility
11
R Script with packages having dependencies
 ## Adapted from https://gist.github.com/abresler/46c36c1a88c849b94b07
 ## Blog post Jan 21
 ## http://blog.revolutionanalytics.com/2015/01/a-beautiful-story-about-nyc-weather.html
 library(checkpoint)
 checkpoint("2015-02-05",R="3.1.2")
 library(dplyr)
 library(tidyr)
 library(magrittr)
 library(ggplot2)
12
First time script is run
 checkpoint: Part of the Reproducible R Toolkit from Revolution Analytics
 http://projects.revolutionanalytics.com/rrt/
 > checkpoint("2015-02-05",R="3.1.2")
 Scanning for packages used in this project
 |==========================================================================================| 100%
 - Discovered 5 packages
 Installing packages used in this project
 - Installing ‘dplyr’
 also installing the dependencies ‘assertthat’, ‘R6’, ‘Rcpp’, ‘magrittr’, ‘lazyeval’, ‘DBI’, ‘BH’
 package ‘assertthat’ successfully unpacked and MD5 sums checked
 package ‘R6’ successfully unpacked and MD5 sums checked
 package ‘Rcpp’ successfully unpacked and MD5 sums checked
 package ‘magrittr’ successfully unpacked and MD5 sums checked
 package ‘lazyeval’ successfully unpacked and MD5 sums checked
 package ‘DBI’ successfully unpacked and MD5 sums checked
 package ‘BH’ successfully unpacked and MD5 sums checked
 package ‘dplyr’ successfully unpacked and MD5 sums checked
 - Installing ‘ggplot2’
 also installing the dependencies ‘colorspace’, ‘stringr’, ‘RColorBrewer’, ‘dichromat’, ‘munsell’, ‘labeling’, ‘plyr’, ‘digest’, ‘gtable’, ‘reshape2’,
‘scales’, ‘proto’, ‘MASS’
13
Lots of dependent packages loaded
 package ‘colorspace’ successfully unpacked and MD5 sums checked
 package ‘stringr’ successfully unpacked and MD5 sums checked
 package ‘RColorBrewer’ successfully unpacked and MD5 sums checked
 package ‘dichromat’ successfully unpacked and MD5 sums checked
 package ‘munsell’ successfully unpacked and MD5 sums checked
 package ‘labeling’ successfully unpacked and MD5 sums checked
 package ‘plyr’ successfully unpacked and MD5 sums checked
 package ‘digest’ successfully unpacked and MD5 sums checked
 package ‘gtable’ successfully unpacked and MD5 sums checked
 package ‘reshape2’ successfully unpacked and MD5 sums checked
 package ‘scales’ successfully unpacked and MD5 sums checked
 package ‘proto’ successfully unpacked and MD5 sums checked
 package ‘MASS’ successfully unpacked and MD5 sums checked
 package ‘ggplot2’ successfully unpacked and MD5 sums checked
 - Previously installed ‘magrittr’
 - Installing ‘tidyr’
 also installing the dependency ‘stringi’
 package ‘stringi’ successfully unpacked and MD5 sums checked
 package ‘tidyr’ successfully unpacked and MD5 sums checked
 checkpoint process complete
Checkpoint tips for script authors
 Work within a project
– Dedicated folder with scripts, data and output
– eg /Users/Joe/Rstudio_Projects/weather
 Create a master .R script file beginning with
library(checkpoint)
checkpoint("DATE")
– package versions used will be as of this date
 Don’t use install.packages directly
– Use library() and checkpoint does the rest
– You can have different package versions installed for different projects at the
same time!
14
Sharing projects with checkpoint
 Just share your script or project folder!
 Recipient only needs:
– compatible R version
– checkpoint package (installed with RRO)
– Internet connection to MRAN (at least first time)
 Checkpoint takes care of:
– Installing CRAN packages
• Binaries (ease of installation)
• Correct versions (reproducibility)
• Dependencies (ease of installation)
– Eliminating conflicts with other installed packages
15
The checkpoint magic
The checkpoint() call does all this:
 Scans project for required packages
 Installs required packages and dependencies
– Packages installed specific to project
– Versions specific to checkpoint date
• Installed in ~/.checkpoint/DATE
• Skips packages if already installed (2nd run through)
 Reconfigures package search path
– Points only to project-specific library
16
MRAN checkpoint server
Checkpoint uses MRAN’s downstream CRAN mirror
with daily snapshots.
17
CRAN
RRDaily
snapshots
checkpoint
package
library(checkpoint)
checkpoint("2015-01-28")
CRAN mirror
cran.revolutionanalytics.com
(Windows binaries virus-scanned)
checkpoint
server
Midnight
UTC
mran.revolutionanalytics.com/snapshot/
checkpoint server - implementation
Checkpoint uses MRAN’s downstream CRAN mirror with daily snapshots.
 rsync to mirror CRAN daily
– Only downloads changed packages
 zfs to store incremental snapshots
– Storage only required for new packages
 Organizes snapshots into a labelled hierarchy
– mran.revolutionanalytics.com/snapshot/YYYY-MM-DD
 MRAN hosted by high-performance cloud provider
– Provisioned for availability and latency
18
https://github.com/RevolutionAnalytics/checkpoint-server
19
Using non-CRAN packages Reproducibly
 Today, checkpoint only manages packages from CRAN
 GitHub: use install_github with a specific checkin hash
install_github("ramnathv/rblocks",
ref="a85e748390c17c752cc0ba961120d1e784fb1956")
 BioConductor: use packages from a specific BioConductor release
– Not as easy as it seems!
 Private packages / behind the firewall
– use miniCRAN to create a local, static repository
20
Why use checkpoint?
 Write and share code R whose results can be reproduced, even if new (and
possibly incompatible) package versions are released later.
 Share R scripts with others that will automatically install the appropriate package
versions (no need to manually install CRAN packages).
 Write R scripts that use older versions of packages, or packages that are no
longer available on CRAN.
 Install packages (or package versions) visible only to a specific project, without
affecting other R projects or R users on the same system.
 Manage multiple projects that use different package versions.
What about packrat?
 Packrat is flexible and powerful
– Supports non-CRAN packages (e.g. github)
– Allows mix-and-matching package versions
– Requires shipping all package source
– Requires recipients to build packages from source
 Checkpoint is simple
– Reproducibility from one script
– Simple for recipients to reproduce results
– Only allows use of CRAN packages versions that have been tested
together
– Requires Web access (and availability of MRAN)
21
rstudio.github.io/packrat/
22
Revolution R Open is:
 An enhanced open source distribution of R
 Compatible with all R-related software
 Multi-threaded for performance
 Focused on reproducibility
 Open source (GPLv2 license)
 Available for Windows, Mac OS X, Ubuntu, Red Hat and OpenSUSE
 Includes checkpoint
 Designed to work with RStudio
 Side-by-side installations
 Download from mran.revolutionanalytics.com
CRAN mirrors and RRO
 Revolution R Open ships with a fixed default CRAN mirror
– Currently, 1 March 2015 snapshot (v 8.0.2)
– Soon: (v 8.0.3)
 All users of same version get same CRAN package versions by default
– regardless when “install.packages” is run
 Use checkpoint to access newer package versions
23
24
MRAN
The Managed R Archive Network
 Download Revolution R Open
 Learn about R and RRO
 Explore R Packages
 Explore Task Views
 R tips and applications
 Daily CRAN snapshots
mran.revolutionanalytics.com
Thank you.
Learn more at:
projects.revolutionanalytics.com/rrt/
Find archived webinars at:
revolutionanalytics.com/webinars
www.revolutionanalytics.com
1.855.GET.REVO
Twitter: @RevolutionR
Joseph Rickert
Data Scientist, Community Manager
Revolution Analytics
@RevoJoe
Joseph.rickert@revolutionanalytics.com
26
Multi-threaded performance
 Intel MKL replaces standard
BLAS/LAPACK algorithms
(Windows/Linux)
 Pipelined operations
– Optimized for Intel, works for all archs
 High-performance algorithms
 Sequential  Parallel
– Uses as many threads as there are
available cores
– Control with:
setMKLthreads(<value>)
 No need to change any R code
 Included in RRO binary distribution
More at Revolutions blog
27
# Create a local checkpoint library
library(checkpoint)
checkpoint("2014-11-14")
> library(checkpoint)
checkpoint: Part of the Reproducible R Toolkit from Revolution Analytics
http://projects.revolutionanalytics.com/rrt/
Warning message:
package ‘checkpoint’ was built under R version 3.1.2
> checkpoint("2014-11-14")
Scanning for loaded pkgs
Scanning for packages used in this project
Installing packages used in this project
Warning: dependencies ‘stats’, ‘tools’, ‘utils’, ‘methods’, ‘graphics’, ‘splines’, ‘grid’, ‘grDevices’ are not available
also installing the dependencies ‘bitops’, ‘stringr’, ‘digest’, ‘jsonlite’, ‘lattice’, ‘RCurl’, ‘rjson’, ‘statmod’,
‘survival’, ‘XML’, ‘httr’, ‘Matrix’
package ‘bitops’ successfully unpacked and MD5 sums checked
package ‘stringr’ successfully unpacked and MD5 sums checked
package ‘digest’ successfully unpacked and MD5 sums checked
package ‘jsonlite’ successfully unpacked and MD5 sums checked
package ‘lattice’ successfully unpacked and MD5 sums checked
package ‘RCurl’ successfully unpacked and MD5 sums checked
package ‘rjson’ successfully unpacked and MD5 sums checked
package ‘statmod’ successfully unpacked and MD5 sums checked
package ‘survival’ successfully unpacked and MD5 sums checked
package ‘XML’ successfully unpacked and MD5 sums checked
package ‘httr’ successfully unpacked and MD5 sums checked
package ‘Matrix’ successfully unpacked and MD5 sums checked
package ‘h2o’ successfully unpacked and MD5 sums checked
package ‘miniCRAN’ successfully unpacked and MD5 sums checked
package ‘igraph’ successfully unpacked and MD5 sums checked
SVD 100,000 by 2,000 Matrix
28
RRO R3.1.1
user system elapsed
78.08 1.18 42.51
R 3.1.2
user system elapsed
431.44 0.95 432.70
29
Revolution R Plus Technical Support
Technical support for R, from the R experts.
 Developers: Email and phone support, 8AM-6PM Mon-Fri
 Production Servers and Hadoop: 24x7 Severity-1 coverage
 On-line case management and knowledgebase
 Access to technical resources, documentation and user forums
 Exclusive on-line webinars from community experts
 Guaranteed response times
Also available: expert hands-on and on-line training for R, from
Revolution Analytics AcademyR.
www.revolutionanalytics.com/plus

More Related Content

What's hot

Cross-Project Build Co-change Prediction
Cross-Project Build Co-change PredictionCross-Project Build Co-change Prediction
Cross-Project Build Co-change Prediction
Shane McIntosh
 
Contributing to Upstream Open Source Projects
Contributing to Upstream Open Source ProjectsContributing to Upstream Open Source Projects
Contributing to Upstream Open Source Projects
Scott Garman
 
Optimizing and Profiling Golang Rest Api
Optimizing and Profiling Golang Rest ApiOptimizing and Profiling Golang Rest Api
Optimizing and Profiling Golang Rest Api
Iman Syahputra Situmorang
 
Tracing Software Build Processes to Uncover License Compliance Inconsistencie...
Tracing Software Build Processes to Uncover License Compliance Inconsistencie...Tracing Software Build Processes to Uncover License Compliance Inconsistencie...
Tracing Software Build Processes to Uncover License Compliance Inconsistencie...Shane McIntosh
 
Dependency management in golang
Dependency management in golangDependency management in golang
Dependency management in golang
Ramit Surana
 
The Impact of Code Review Coverage and Participation on Software Quality
The Impact of Code Review Coverage and Participation on Software QualityThe Impact of Code Review Coverage and Participation on Software Quality
The Impact of Code Review Coverage and Participation on Software Quality
Shane McIntosh
 
MPL: modular pipeline library - Dynamic Talks Milwaukee 4/11/2019
MPL: modular pipeline library - Dynamic Talks Milwaukee 4/11/2019MPL: modular pipeline library - Dynamic Talks Milwaukee 4/11/2019
MPL: modular pipeline library - Dynamic Talks Milwaukee 4/11/2019
Grid Dynamics
 
Architecting The Future - WeRise Women in Technology
Architecting The Future - WeRise Women in TechnologyArchitecting The Future - WeRise Women in Technology
Architecting The Future - WeRise Women in Technology
Daniel Barker
 
Agile analysis development
Agile analysis developmentAgile analysis development
Agile analysis development
setitesuk
 
Analyzing Packages in Docker images hosted On DockerHub
Analyzing Packages in Docker images hosted On DockerHubAnalyzing Packages in Docker images hosted On DockerHub
Analyzing Packages in Docker images hosted On DockerHub
Ahmed Zerouali
 
Quickly testing legacy code cppp.fr 2019 - clare macrae
Quickly testing legacy code   cppp.fr 2019 - clare macraeQuickly testing legacy code   cppp.fr 2019 - clare macrae
Quickly testing legacy code cppp.fr 2019 - clare macrae
Clare Macrae
 
Spring Testing, Fight for the Context
Spring Testing, Fight for the ContextSpring Testing, Fight for the Context
Spring Testing, Fight for the Context
GlobalLogic Ukraine
 
Good Practices for Developing Scientific Software Frameworks: The WRENCH fram...
Good Practices for Developing Scientific Software Frameworks: The WRENCH fram...Good Practices for Developing Scientific Software Frameworks: The WRENCH fram...
Good Practices for Developing Scientific Software Frameworks: The WRENCH fram...
Rafael Ferreira da Silva
 
OpenStack documentation & translation management 2012_summit
OpenStack documentation & translation management 2012_summitOpenStack documentation & translation management 2012_summit
OpenStack documentation & translation management 2012_summit
Anne Gentle
 
FASTEN: Scaling static analyses to ecosystem, presented at FOSDEM 2020 in Bru...
FASTEN: Scaling static analyses to ecosystem, presented at FOSDEM 2020 in Bru...FASTEN: Scaling static analyses to ecosystem, presented at FOSDEM 2020 in Bru...
FASTEN: Scaling static analyses to ecosystem, presented at FOSDEM 2020 in Bru...
Fasten Project
 
ICSE2014
ICSE2014ICSE2014
ICSE2014
swy351
 
ICSE2013
ICSE2013ICSE2013
ICSE2013
swy351
 
ASE2010
ASE2010ASE2010
ASE2010
swy351
 
Preparing and submitting a package to CRAN - June Sanderson, Sheffield R User...
Preparing and submitting a package to CRAN - June Sanderson, Sheffield R User...Preparing and submitting a package to CRAN - June Sanderson, Sheffield R User...
Preparing and submitting a package to CRAN - June Sanderson, Sheffield R User...
Paul Richards
 
Collecting and Leveraging a Benchmark of Build System Clones to Aid in Qualit...
Collecting and Leveraging a Benchmark of Build System Clones to Aid in Qualit...Collecting and Leveraging a Benchmark of Build System Clones to Aid in Qualit...
Collecting and Leveraging a Benchmark of Build System Clones to Aid in Qualit...
Shane McIntosh
 

What's hot (20)

Cross-Project Build Co-change Prediction
Cross-Project Build Co-change PredictionCross-Project Build Co-change Prediction
Cross-Project Build Co-change Prediction
 
Contributing to Upstream Open Source Projects
Contributing to Upstream Open Source ProjectsContributing to Upstream Open Source Projects
Contributing to Upstream Open Source Projects
 
Optimizing and Profiling Golang Rest Api
Optimizing and Profiling Golang Rest ApiOptimizing and Profiling Golang Rest Api
Optimizing and Profiling Golang Rest Api
 
Tracing Software Build Processes to Uncover License Compliance Inconsistencie...
Tracing Software Build Processes to Uncover License Compliance Inconsistencie...Tracing Software Build Processes to Uncover License Compliance Inconsistencie...
Tracing Software Build Processes to Uncover License Compliance Inconsistencie...
 
Dependency management in golang
Dependency management in golangDependency management in golang
Dependency management in golang
 
The Impact of Code Review Coverage and Participation on Software Quality
The Impact of Code Review Coverage and Participation on Software QualityThe Impact of Code Review Coverage and Participation on Software Quality
The Impact of Code Review Coverage and Participation on Software Quality
 
MPL: modular pipeline library - Dynamic Talks Milwaukee 4/11/2019
MPL: modular pipeline library - Dynamic Talks Milwaukee 4/11/2019MPL: modular pipeline library - Dynamic Talks Milwaukee 4/11/2019
MPL: modular pipeline library - Dynamic Talks Milwaukee 4/11/2019
 
Architecting The Future - WeRise Women in Technology
Architecting The Future - WeRise Women in TechnologyArchitecting The Future - WeRise Women in Technology
Architecting The Future - WeRise Women in Technology
 
Agile analysis development
Agile analysis developmentAgile analysis development
Agile analysis development
 
Analyzing Packages in Docker images hosted On DockerHub
Analyzing Packages in Docker images hosted On DockerHubAnalyzing Packages in Docker images hosted On DockerHub
Analyzing Packages in Docker images hosted On DockerHub
 
Quickly testing legacy code cppp.fr 2019 - clare macrae
Quickly testing legacy code   cppp.fr 2019 - clare macraeQuickly testing legacy code   cppp.fr 2019 - clare macrae
Quickly testing legacy code cppp.fr 2019 - clare macrae
 
Spring Testing, Fight for the Context
Spring Testing, Fight for the ContextSpring Testing, Fight for the Context
Spring Testing, Fight for the Context
 
Good Practices for Developing Scientific Software Frameworks: The WRENCH fram...
Good Practices for Developing Scientific Software Frameworks: The WRENCH fram...Good Practices for Developing Scientific Software Frameworks: The WRENCH fram...
Good Practices for Developing Scientific Software Frameworks: The WRENCH fram...
 
OpenStack documentation & translation management 2012_summit
OpenStack documentation & translation management 2012_summitOpenStack documentation & translation management 2012_summit
OpenStack documentation & translation management 2012_summit
 
FASTEN: Scaling static analyses to ecosystem, presented at FOSDEM 2020 in Bru...
FASTEN: Scaling static analyses to ecosystem, presented at FOSDEM 2020 in Bru...FASTEN: Scaling static analyses to ecosystem, presented at FOSDEM 2020 in Bru...
FASTEN: Scaling static analyses to ecosystem, presented at FOSDEM 2020 in Bru...
 
ICSE2014
ICSE2014ICSE2014
ICSE2014
 
ICSE2013
ICSE2013ICSE2013
ICSE2013
 
ASE2010
ASE2010ASE2010
ASE2010
 
Preparing and submitting a package to CRAN - June Sanderson, Sheffield R User...
Preparing and submitting a package to CRAN - June Sanderson, Sheffield R User...Preparing and submitting a package to CRAN - June Sanderson, Sheffield R User...
Preparing and submitting a package to CRAN - June Sanderson, Sheffield R User...
 
Collecting and Leveraging a Benchmark of Build System Clones to Aid in Qualit...
Collecting and Leveraging a Benchmark of Build System Clones to Aid in Qualit...Collecting and Leveraging a Benchmark of Build System Clones to Aid in Qualit...
Collecting and Leveraging a Benchmark of Build System Clones to Aid in Qualit...
 

Viewers also liked

But when you call me Bayesian I know I’m not the only one
But when you call me Bayesian I know I’m not the only oneBut when you call me Bayesian I know I’m not the only one
But when you call me Bayesian I know I’m not the only one
Work-Bench
 
A Statistician Walks into a Tech Company: R at a Rapidly Scaling Healthcare S...
A Statistician Walks into a Tech Company: R at a Rapidly Scaling Healthcare S...A Statistician Walks into a Tech Company: R at a Rapidly Scaling Healthcare S...
A Statistician Walks into a Tech Company: R at a Rapidly Scaling Healthcare S...
Work-Bench
 
Broom: Converting Statistical Models to Tidy Data Frames
Broom: Converting Statistical Models to Tidy Data FramesBroom: Converting Statistical Models to Tidy Data Frames
Broom: Converting Statistical Models to Tidy Data Frames
Work-Bench
 
The Political Impact of Social Penumbras
The Political Impact of Social PenumbrasThe Political Impact of Social Penumbras
The Political Impact of Social Penumbras
Work-Bench
 
Data Science Challenges in Personal Program Analysis
Data Science Challenges in Personal Program AnalysisData Science Challenges in Personal Program Analysis
Data Science Challenges in Personal Program Analysis
Work-Bench
 
Using R at NYT Graphics
Using R at NYT GraphicsUsing R at NYT Graphics
Using R at NYT Graphics
Work-Bench
 
Iterating over statistical models: NCAA tournament edition
Iterating over statistical models: NCAA tournament editionIterating over statistical models: NCAA tournament edition
Iterating over statistical models: NCAA tournament edition
Work-Bench
 
R Packages for Time-Varying Networks and Extremal Dependence
R Packages for Time-Varying Networks and Extremal DependenceR Packages for Time-Varying Networks and Extremal Dependence
R Packages for Time-Varying Networks and Extremal Dependence
Work-Bench
 
Thinking Small About Big Data
Thinking Small About Big DataThinking Small About Big Data
Thinking Small About Big Data
Work-Bench
 
Inside the R Consortium
Inside the R ConsortiumInside the R Consortium
Inside the R Consortium
Work-Bench
 
One Algorithm to Rule Them All: How to Automate Statistical Computation
One Algorithm to Rule Them All: How to Automate Statistical ComputationOne Algorithm to Rule Them All: How to Automate Statistical Computation
One Algorithm to Rule Them All: How to Automate Statistical Computation
Work-Bench
 
Reflection on the Data Science Profession in NYC
Reflection on the Data Science Profession in NYCReflection on the Data Science Profession in NYC
Reflection on the Data Science Profession in NYC
Work-Bench
 
I Don't Want to Be a Dummy! Encoding Predictors for Trees
I Don't Want to Be a Dummy! Encoding Predictors for TreesI Don't Want to Be a Dummy! Encoding Predictors for Trees
I Don't Want to Be a Dummy! Encoding Predictors for Trees
Work-Bench
 
Building Scalable Prediction Services in R
Building Scalable Prediction Services in RBuilding Scalable Prediction Services in R
Building Scalable Prediction Services in R
Work-Bench
 
The Feels
The FeelsThe Feels
The Feels
Work-Bench
 
Improving Data Interoperability for Python and R
Improving Data Interoperability for Python and RImproving Data Interoperability for Python and R
Improving Data Interoperability for Python and R
Work-Bench
 
R for Everything
R for EverythingR for Everything
R for Everything
Work-Bench
 
Julia + R for Data Science
Julia + R for Data ScienceJulia + R for Data Science
Julia + R for Data Science
Work-Bench
 
Dr. Datascience or: How I Learned to Stop Munging and Love Tests
Dr. Datascience or: How I Learned to Stop Munging and Love TestsDr. Datascience or: How I Learned to Stop Munging and Love Tests
Dr. Datascience or: How I Learned to Stop Munging and Love Tests
Work-Bench
 
Scaling Data Science at Airbnb
Scaling Data Science at AirbnbScaling Data Science at Airbnb
Scaling Data Science at Airbnb
Work-Bench
 

Viewers also liked (20)

But when you call me Bayesian I know I’m not the only one
But when you call me Bayesian I know I’m not the only oneBut when you call me Bayesian I know I’m not the only one
But when you call me Bayesian I know I’m not the only one
 
A Statistician Walks into a Tech Company: R at a Rapidly Scaling Healthcare S...
A Statistician Walks into a Tech Company: R at a Rapidly Scaling Healthcare S...A Statistician Walks into a Tech Company: R at a Rapidly Scaling Healthcare S...
A Statistician Walks into a Tech Company: R at a Rapidly Scaling Healthcare S...
 
Broom: Converting Statistical Models to Tidy Data Frames
Broom: Converting Statistical Models to Tidy Data FramesBroom: Converting Statistical Models to Tidy Data Frames
Broom: Converting Statistical Models to Tidy Data Frames
 
The Political Impact of Social Penumbras
The Political Impact of Social PenumbrasThe Political Impact of Social Penumbras
The Political Impact of Social Penumbras
 
Data Science Challenges in Personal Program Analysis
Data Science Challenges in Personal Program AnalysisData Science Challenges in Personal Program Analysis
Data Science Challenges in Personal Program Analysis
 
Using R at NYT Graphics
Using R at NYT GraphicsUsing R at NYT Graphics
Using R at NYT Graphics
 
Iterating over statistical models: NCAA tournament edition
Iterating over statistical models: NCAA tournament editionIterating over statistical models: NCAA tournament edition
Iterating over statistical models: NCAA tournament edition
 
R Packages for Time-Varying Networks and Extremal Dependence
R Packages for Time-Varying Networks and Extremal DependenceR Packages for Time-Varying Networks and Extremal Dependence
R Packages for Time-Varying Networks and Extremal Dependence
 
Thinking Small About Big Data
Thinking Small About Big DataThinking Small About Big Data
Thinking Small About Big Data
 
Inside the R Consortium
Inside the R ConsortiumInside the R Consortium
Inside the R Consortium
 
One Algorithm to Rule Them All: How to Automate Statistical Computation
One Algorithm to Rule Them All: How to Automate Statistical ComputationOne Algorithm to Rule Them All: How to Automate Statistical Computation
One Algorithm to Rule Them All: How to Automate Statistical Computation
 
Reflection on the Data Science Profession in NYC
Reflection on the Data Science Profession in NYCReflection on the Data Science Profession in NYC
Reflection on the Data Science Profession in NYC
 
I Don't Want to Be a Dummy! Encoding Predictors for Trees
I Don't Want to Be a Dummy! Encoding Predictors for TreesI Don't Want to Be a Dummy! Encoding Predictors for Trees
I Don't Want to Be a Dummy! Encoding Predictors for Trees
 
Building Scalable Prediction Services in R
Building Scalable Prediction Services in RBuilding Scalable Prediction Services in R
Building Scalable Prediction Services in R
 
The Feels
The FeelsThe Feels
The Feels
 
Improving Data Interoperability for Python and R
Improving Data Interoperability for Python and RImproving Data Interoperability for Python and R
Improving Data Interoperability for Python and R
 
R for Everything
R for EverythingR for Everything
R for Everything
 
Julia + R for Data Science
Julia + R for Data ScienceJulia + R for Data Science
Julia + R for Data Science
 
Dr. Datascience or: How I Learned to Stop Munging and Love Tests
Dr. Datascience or: How I Learned to Stop Munging and Love TestsDr. Datascience or: How I Learned to Stop Munging and Love Tests
Dr. Datascience or: How I Learned to Stop Munging and Love Tests
 
Scaling Data Science at Airbnb
Scaling Data Science at AirbnbScaling Data Science at Airbnb
Scaling Data Science at Airbnb
 

Similar to Reproducibility with Checkpoint & RRO

Simple Reproducibility with the checkpoint package
Simple Reproducibilitywith the checkpoint packageSimple Reproducibilitywith the checkpoint package
Simple Reproducibility with the checkpoint package
Revolution Analytics
 
Reproducibility with Revolution R Open
Reproducibility with Revolution R OpenReproducibility with Revolution R Open
Reproducibility with Revolution R Open
Revolution Analytics
 
A Step Towards Reproducibility in R
A Step Towards Reproducibility in RA Step Towards Reproducibility in R
A Step Towards Reproducibility in R
Revolution Analytics
 
DEF CON 27 - workshop - ISAAC EVANS - discover exploit and eradicate entire v...
DEF CON 27 - workshop - ISAAC EVANS - discover exploit and eradicate entire v...DEF CON 27 - workshop - ISAAC EVANS - discover exploit and eradicate entire v...
DEF CON 27 - workshop - ISAAC EVANS - discover exploit and eradicate entire v...
Felipe Prado
 
R reproducibility
R reproducibilityR reproducibility
R reproducibility
Revolution Analytics
 
Native Hadoop with prebuilt spark
Native Hadoop with prebuilt sparkNative Hadoop with prebuilt spark
Native Hadoop with prebuilt spark
arunkumar sadhasivam
 
OWASP Dependency-Track Introduction
OWASP Dependency-Track IntroductionOWASP Dependency-Track Introduction
OWASP Dependency-Track Introduction
Sergey Sotnikov
 
Introduction to r
Introduction to rIntroduction to r
Introduction to r
gslicraf
 
[JOI] TOTVS Developers Joinville - Java #1
[JOI] TOTVS Developers Joinville - Java #1[JOI] TOTVS Developers Joinville - Java #1
[JOI] TOTVS Developers Joinville - Java #1
Rubens Dos Santos Filho
 
Unit testing of spark applications
Unit testing of spark applicationsUnit testing of spark applications
Unit testing of spark applications
Knoldus Inc.
 
Using R on High Performance Computers
Using R on High Performance ComputersUsing R on High Performance Computers
Using R on High Performance Computers
Dave Hiltbrand
 
SPACK: A Package Manager for Supercomputers, Linux, and MacOS
SPACK: A Package Manager for Supercomputers, Linux, and MacOSSPACK: A Package Manager for Supercomputers, Linux, and MacOS
SPACK: A Package Manager for Supercomputers, Linux, and MacOS
inside-BigData.com
 
FASTEN presentation at OSS2021, by Michele Scarlato, Endocode, May 12, 2021, ...
FASTEN presentation at OSS2021, by Michele Scarlato, Endocode, May 12, 2021, ...FASTEN presentation at OSS2021, by Michele Scarlato, Endocode, May 12, 2021, ...
FASTEN presentation at OSS2021, by Michele Scarlato, Endocode, May 12, 2021, ...
Fasten Project
 
Approaching package manager
Approaching package managerApproaching package manager
Approaching package manager
Timur Safin
 
Apache maven, a software project management tool
Apache maven, a software project management toolApache maven, a software project management tool
Apache maven, a software project management toolRenato Primavera
 
PVS-Studio for Linux (CoreHard presentation)
PVS-Studio for Linux (CoreHard presentation)PVS-Studio for Linux (CoreHard presentation)
PVS-Studio for Linux (CoreHard presentation)
Andrey Karpov
 
How to implement continuous delivery with enterprise java middleware?
How to implement continuous delivery with enterprise java middleware?How to implement continuous delivery with enterprise java middleware?
How to implement continuous delivery with enterprise java middleware?
Thoughtworks
 
Новый InterSystems: open-source, митапы, хакатоны
Новый InterSystems: open-source, митапы, хакатоныНовый InterSystems: open-source, митапы, хакатоны
Новый InterSystems: open-source, митапы, хакатоны
Timur Safin
 

Similar to Reproducibility with Checkpoint & RRO (20)

Simple Reproducibility with the checkpoint package
Simple Reproducibilitywith the checkpoint packageSimple Reproducibilitywith the checkpoint package
Simple Reproducibility with the checkpoint package
 
Reproducibility with Revolution R Open
Reproducibility with Revolution R OpenReproducibility with Revolution R Open
Reproducibility with Revolution R Open
 
A Step Towards Reproducibility in R
A Step Towards Reproducibility in RA Step Towards Reproducibility in R
A Step Towards Reproducibility in R
 
DEF CON 27 - workshop - ISAAC EVANS - discover exploit and eradicate entire v...
DEF CON 27 - workshop - ISAAC EVANS - discover exploit and eradicate entire v...DEF CON 27 - workshop - ISAAC EVANS - discover exploit and eradicate entire v...
DEF CON 27 - workshop - ISAAC EVANS - discover exploit and eradicate entire v...
 
R reproducibility
R reproducibilityR reproducibility
R reproducibility
 
Native Hadoop with prebuilt spark
Native Hadoop with prebuilt sparkNative Hadoop with prebuilt spark
Native Hadoop with prebuilt spark
 
OWASP Dependency-Track Introduction
OWASP Dependency-Track IntroductionOWASP Dependency-Track Introduction
OWASP Dependency-Track Introduction
 
Introduction to r
Introduction to rIntroduction to r
Introduction to r
 
[JOI] TOTVS Developers Joinville - Java #1
[JOI] TOTVS Developers Joinville - Java #1[JOI] TOTVS Developers Joinville - Java #1
[JOI] TOTVS Developers Joinville - Java #1
 
Unit testing of spark applications
Unit testing of spark applicationsUnit testing of spark applications
Unit testing of spark applications
 
Using R on High Performance Computers
Using R on High Performance ComputersUsing R on High Performance Computers
Using R on High Performance Computers
 
SPACK: A Package Manager for Supercomputers, Linux, and MacOS
SPACK: A Package Manager for Supercomputers, Linux, and MacOSSPACK: A Package Manager for Supercomputers, Linux, and MacOS
SPACK: A Package Manager for Supercomputers, Linux, and MacOS
 
MidSem
MidSemMidSem
MidSem
 
kishore_Nokia
kishore_Nokiakishore_Nokia
kishore_Nokia
 
FASTEN presentation at OSS2021, by Michele Scarlato, Endocode, May 12, 2021, ...
FASTEN presentation at OSS2021, by Michele Scarlato, Endocode, May 12, 2021, ...FASTEN presentation at OSS2021, by Michele Scarlato, Endocode, May 12, 2021, ...
FASTEN presentation at OSS2021, by Michele Scarlato, Endocode, May 12, 2021, ...
 
Approaching package manager
Approaching package managerApproaching package manager
Approaching package manager
 
Apache maven, a software project management tool
Apache maven, a software project management toolApache maven, a software project management tool
Apache maven, a software project management tool
 
PVS-Studio for Linux (CoreHard presentation)
PVS-Studio for Linux (CoreHard presentation)PVS-Studio for Linux (CoreHard presentation)
PVS-Studio for Linux (CoreHard presentation)
 
How to implement continuous delivery with enterprise java middleware?
How to implement continuous delivery with enterprise java middleware?How to implement continuous delivery with enterprise java middleware?
How to implement continuous delivery with enterprise java middleware?
 
Новый InterSystems: open-source, митапы, хакатоны
Новый InterSystems: open-source, митапы, хакатоныНовый InterSystems: open-source, митапы, хакатоны
Новый InterSystems: open-source, митапы, хакатоны
 

More from Work-Bench

2017 Enterprise Almanac
2017 Enterprise Almanac2017 Enterprise Almanac
2017 Enterprise Almanac
Work-Bench
 
AI to Enable Next Generation of People Managers
AI to Enable Next Generation of People ManagersAI to Enable Next Generation of People Managers
AI to Enable Next Generation of People Managers
Work-Bench
 
Startup Recruiting Workbook: Sourcing and Interview Process
Startup Recruiting Workbook: Sourcing and Interview ProcessStartup Recruiting Workbook: Sourcing and Interview Process
Startup Recruiting Workbook: Sourcing and Interview Process
Work-Bench
 
Cloud Native Infrastructure Management Solutions Compared
Cloud Native Infrastructure Management Solutions ComparedCloud Native Infrastructure Management Solutions Compared
Cloud Native Infrastructure Management Solutions Compared
Work-Bench
 
Analyzing NYC Transit Data
Analyzing NYC Transit DataAnalyzing NYC Transit Data
Analyzing NYC Transit Data
Work-Bench
 
Building a Demand Generation Machine at MongoDB
Building a Demand Generation Machine at MongoDBBuilding a Demand Generation Machine at MongoDB
Building a Demand Generation Machine at MongoDB
Work-Bench
 
How to Market Your Startup to the Enterprise
How to Market Your Startup to the EnterpriseHow to Market Your Startup to the Enterprise
How to Market Your Startup to the Enterprise
Work-Bench
 
Marketing & Design for the Enterprise
Marketing & Design for the EnterpriseMarketing & Design for the Enterprise
Marketing & Design for the Enterprise
Work-Bench
 

More from Work-Bench (8)

2017 Enterprise Almanac
2017 Enterprise Almanac2017 Enterprise Almanac
2017 Enterprise Almanac
 
AI to Enable Next Generation of People Managers
AI to Enable Next Generation of People ManagersAI to Enable Next Generation of People Managers
AI to Enable Next Generation of People Managers
 
Startup Recruiting Workbook: Sourcing and Interview Process
Startup Recruiting Workbook: Sourcing and Interview ProcessStartup Recruiting Workbook: Sourcing and Interview Process
Startup Recruiting Workbook: Sourcing and Interview Process
 
Cloud Native Infrastructure Management Solutions Compared
Cloud Native Infrastructure Management Solutions ComparedCloud Native Infrastructure Management Solutions Compared
Cloud Native Infrastructure Management Solutions Compared
 
Analyzing NYC Transit Data
Analyzing NYC Transit DataAnalyzing NYC Transit Data
Analyzing NYC Transit Data
 
Building a Demand Generation Machine at MongoDB
Building a Demand Generation Machine at MongoDBBuilding a Demand Generation Machine at MongoDB
Building a Demand Generation Machine at MongoDB
 
How to Market Your Startup to the Enterprise
How to Market Your Startup to the EnterpriseHow to Market Your Startup to the Enterprise
How to Market Your Startup to the Enterprise
 
Marketing & Design for the Enterprise
Marketing & Design for the EnterpriseMarketing & Design for the Enterprise
Marketing & Design for the Enterprise
 

Recently uploaded

Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
enxupq
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
Tiktokethiodaily
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
nscud
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
pchutichetpong
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
NABLAS株式会社
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 

Recently uploaded (20)

Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 

Reproducibility with Checkpoint & RRO

  • 1. Reproducibility with Checkpoint & RRO New York R Conference Joseph Rickert April 25, 2015
  • 2. 2 OUR COMPANY The leading provider of advanced analytics software and services based on open source R, since 2007 OUR PRODUCT REVOLUTION R: The enterprise-grade predictive analytics application platform based on the R language SOME KUDOS “This acquisition will help customers use advanced analytics within Microsoft data platforms“ -- Joseph Sirosh, CVP C+E
  • 3. What is Reproducibility? “The goal of reproducible research is to tie specific instructions to data analysis and experimental data so that scholarship can be recreated, better understood and verified.” CRAN Task View on Reproducible Research (Kuhn)  Method + Environment -> Results  A process for: – Sharing the method – Describing the environment – Recreating the results 3 xkcd.com/242/
  • 4. Reproducibility – why do we care? Academic / Research  Verify results  Advance Research Business  Production code  Reliability  Reusability  Collaboration  Regulation 4 www.nytimes.com/2011/07/08/health/research/08genes.html http://arxiv.org/pdf/1010.1092.pdf
  • 5. Observations  R versions are pretty manageable – Major versions just once a year – Patches rarely introduce incompatible changes  Good solutions for literate programming – Rstudio / knitr / Rmarkdown  OS/Hardware not the major cause of problems  The big problem is with packages – CRAN is in a state of continual flux 5
  • 6. Package dependency explosion  R script file using 6 most popular packages 6 Any updated package = potential reproducibility error! http://blog.revolutionanalytics.com/2014/10/explore-r-package-connections-at-mran.html
  • 7. 7 An R Reproducibility Problem Adapted from http://xkcd.com/234/ CC BY-NC 2.5
  • 8. 8 Reproducible R Toolkit  Static CRAN mirror in Revolution R Open – CRAN packages fixed with each RRO update  Daily CRAN snapshots – Storing every package version since September 2014 – Hosted at mran.revolutionanalytics.com/snapshot  Write and share scripts synced to a specific snapshot date – checkpoint package installed with RRO projects.revolutionanalytics.com/rrt/
  • 9. 9 Using checkpoint  Add 2 lines to the top of your script library(checkpoint) checkpoint("2015-01-28")  That’s it?  Optionally, check the R version as well library(checkpoint) checkpoint("2015-01-28", R.version="3.1.3") Or, whichever date you want
  • 10. 10 Using checkpoint  Easy to use: add 2 lines to the top of each script library(checkpoint) checkpoint("2014-09-17")  For the package author: – Use package versions available on the chosen date – Installs packages local to this project • Allows different package versions to be used simultaneously  For a script collaborator: – Automatically installs required packages • Detects required packages (no need to manually install!) – Uses same package versions as script author to ensure reproducibility
  • 11. 11 R Script with packages having dependencies  ## Adapted from https://gist.github.com/abresler/46c36c1a88c849b94b07  ## Blog post Jan 21  ## http://blog.revolutionanalytics.com/2015/01/a-beautiful-story-about-nyc-weather.html  library(checkpoint)  checkpoint("2015-02-05",R="3.1.2")  library(dplyr)  library(tidyr)  library(magrittr)  library(ggplot2)
  • 12. 12 First time script is run  checkpoint: Part of the Reproducible R Toolkit from Revolution Analytics  http://projects.revolutionanalytics.com/rrt/  > checkpoint("2015-02-05",R="3.1.2")  Scanning for packages used in this project  |==========================================================================================| 100%  - Discovered 5 packages  Installing packages used in this project  - Installing ‘dplyr’  also installing the dependencies ‘assertthat’, ‘R6’, ‘Rcpp’, ‘magrittr’, ‘lazyeval’, ‘DBI’, ‘BH’  package ‘assertthat’ successfully unpacked and MD5 sums checked  package ‘R6’ successfully unpacked and MD5 sums checked  package ‘Rcpp’ successfully unpacked and MD5 sums checked  package ‘magrittr’ successfully unpacked and MD5 sums checked  package ‘lazyeval’ successfully unpacked and MD5 sums checked  package ‘DBI’ successfully unpacked and MD5 sums checked  package ‘BH’ successfully unpacked and MD5 sums checked  package ‘dplyr’ successfully unpacked and MD5 sums checked  - Installing ‘ggplot2’  also installing the dependencies ‘colorspace’, ‘stringr’, ‘RColorBrewer’, ‘dichromat’, ‘munsell’, ‘labeling’, ‘plyr’, ‘digest’, ‘gtable’, ‘reshape2’, ‘scales’, ‘proto’, ‘MASS’
  • 13. 13 Lots of dependent packages loaded  package ‘colorspace’ successfully unpacked and MD5 sums checked  package ‘stringr’ successfully unpacked and MD5 sums checked  package ‘RColorBrewer’ successfully unpacked and MD5 sums checked  package ‘dichromat’ successfully unpacked and MD5 sums checked  package ‘munsell’ successfully unpacked and MD5 sums checked  package ‘labeling’ successfully unpacked and MD5 sums checked  package ‘plyr’ successfully unpacked and MD5 sums checked  package ‘digest’ successfully unpacked and MD5 sums checked  package ‘gtable’ successfully unpacked and MD5 sums checked  package ‘reshape2’ successfully unpacked and MD5 sums checked  package ‘scales’ successfully unpacked and MD5 sums checked  package ‘proto’ successfully unpacked and MD5 sums checked  package ‘MASS’ successfully unpacked and MD5 sums checked  package ‘ggplot2’ successfully unpacked and MD5 sums checked  - Previously installed ‘magrittr’  - Installing ‘tidyr’  also installing the dependency ‘stringi’  package ‘stringi’ successfully unpacked and MD5 sums checked  package ‘tidyr’ successfully unpacked and MD5 sums checked  checkpoint process complete
  • 14. Checkpoint tips for script authors  Work within a project – Dedicated folder with scripts, data and output – eg /Users/Joe/Rstudio_Projects/weather  Create a master .R script file beginning with library(checkpoint) checkpoint("DATE") – package versions used will be as of this date  Don’t use install.packages directly – Use library() and checkpoint does the rest – You can have different package versions installed for different projects at the same time! 14
  • 15. Sharing projects with checkpoint  Just share your script or project folder!  Recipient only needs: – compatible R version – checkpoint package (installed with RRO) – Internet connection to MRAN (at least first time)  Checkpoint takes care of: – Installing CRAN packages • Binaries (ease of installation) • Correct versions (reproducibility) • Dependencies (ease of installation) – Eliminating conflicts with other installed packages 15
  • 16. The checkpoint magic The checkpoint() call does all this:  Scans project for required packages  Installs required packages and dependencies – Packages installed specific to project – Versions specific to checkpoint date • Installed in ~/.checkpoint/DATE • Skips packages if already installed (2nd run through)  Reconfigures package search path – Points only to project-specific library 16
  • 17. MRAN checkpoint server Checkpoint uses MRAN’s downstream CRAN mirror with daily snapshots. 17 CRAN RRDaily snapshots checkpoint package library(checkpoint) checkpoint("2015-01-28") CRAN mirror cran.revolutionanalytics.com (Windows binaries virus-scanned) checkpoint server Midnight UTC mran.revolutionanalytics.com/snapshot/
  • 18. checkpoint server - implementation Checkpoint uses MRAN’s downstream CRAN mirror with daily snapshots.  rsync to mirror CRAN daily – Only downloads changed packages  zfs to store incremental snapshots – Storage only required for new packages  Organizes snapshots into a labelled hierarchy – mran.revolutionanalytics.com/snapshot/YYYY-MM-DD  MRAN hosted by high-performance cloud provider – Provisioned for availability and latency 18 https://github.com/RevolutionAnalytics/checkpoint-server
  • 19. 19 Using non-CRAN packages Reproducibly  Today, checkpoint only manages packages from CRAN  GitHub: use install_github with a specific checkin hash install_github("ramnathv/rblocks", ref="a85e748390c17c752cc0ba961120d1e784fb1956")  BioConductor: use packages from a specific BioConductor release – Not as easy as it seems!  Private packages / behind the firewall – use miniCRAN to create a local, static repository
  • 20. 20 Why use checkpoint?  Write and share code R whose results can be reproduced, even if new (and possibly incompatible) package versions are released later.  Share R scripts with others that will automatically install the appropriate package versions (no need to manually install CRAN packages).  Write R scripts that use older versions of packages, or packages that are no longer available on CRAN.  Install packages (or package versions) visible only to a specific project, without affecting other R projects or R users on the same system.  Manage multiple projects that use different package versions.
  • 21. What about packrat?  Packrat is flexible and powerful – Supports non-CRAN packages (e.g. github) – Allows mix-and-matching package versions – Requires shipping all package source – Requires recipients to build packages from source  Checkpoint is simple – Reproducibility from one script – Simple for recipients to reproduce results – Only allows use of CRAN packages versions that have been tested together – Requires Web access (and availability of MRAN) 21 rstudio.github.io/packrat/
  • 22. 22 Revolution R Open is:  An enhanced open source distribution of R  Compatible with all R-related software  Multi-threaded for performance  Focused on reproducibility  Open source (GPLv2 license)  Available for Windows, Mac OS X, Ubuntu, Red Hat and OpenSUSE  Includes checkpoint  Designed to work with RStudio  Side-by-side installations  Download from mran.revolutionanalytics.com
  • 23. CRAN mirrors and RRO  Revolution R Open ships with a fixed default CRAN mirror – Currently, 1 March 2015 snapshot (v 8.0.2) – Soon: (v 8.0.3)  All users of same version get same CRAN package versions by default – regardless when “install.packages” is run  Use checkpoint to access newer package versions 23
  • 24. 24 MRAN The Managed R Archive Network  Download Revolution R Open  Learn about R and RRO  Explore R Packages  Explore Task Views  R tips and applications  Daily CRAN snapshots mran.revolutionanalytics.com
  • 25. Thank you. Learn more at: projects.revolutionanalytics.com/rrt/ Find archived webinars at: revolutionanalytics.com/webinars www.revolutionanalytics.com 1.855.GET.REVO Twitter: @RevolutionR Joseph Rickert Data Scientist, Community Manager Revolution Analytics @RevoJoe Joseph.rickert@revolutionanalytics.com
  • 26. 26 Multi-threaded performance  Intel MKL replaces standard BLAS/LAPACK algorithms (Windows/Linux)  Pipelined operations – Optimized for Intel, works for all archs  High-performance algorithms  Sequential  Parallel – Uses as many threads as there are available cores – Control with: setMKLthreads(<value>)  No need to change any R code  Included in RRO binary distribution More at Revolutions blog
  • 27. 27 # Create a local checkpoint library library(checkpoint) checkpoint("2014-11-14") > library(checkpoint) checkpoint: Part of the Reproducible R Toolkit from Revolution Analytics http://projects.revolutionanalytics.com/rrt/ Warning message: package ‘checkpoint’ was built under R version 3.1.2 > checkpoint("2014-11-14") Scanning for loaded pkgs Scanning for packages used in this project Installing packages used in this project Warning: dependencies ‘stats’, ‘tools’, ‘utils’, ‘methods’, ‘graphics’, ‘splines’, ‘grid’, ‘grDevices’ are not available also installing the dependencies ‘bitops’, ‘stringr’, ‘digest’, ‘jsonlite’, ‘lattice’, ‘RCurl’, ‘rjson’, ‘statmod’, ‘survival’, ‘XML’, ‘httr’, ‘Matrix’ package ‘bitops’ successfully unpacked and MD5 sums checked package ‘stringr’ successfully unpacked and MD5 sums checked package ‘digest’ successfully unpacked and MD5 sums checked package ‘jsonlite’ successfully unpacked and MD5 sums checked package ‘lattice’ successfully unpacked and MD5 sums checked package ‘RCurl’ successfully unpacked and MD5 sums checked package ‘rjson’ successfully unpacked and MD5 sums checked package ‘statmod’ successfully unpacked and MD5 sums checked package ‘survival’ successfully unpacked and MD5 sums checked package ‘XML’ successfully unpacked and MD5 sums checked package ‘httr’ successfully unpacked and MD5 sums checked package ‘Matrix’ successfully unpacked and MD5 sums checked package ‘h2o’ successfully unpacked and MD5 sums checked package ‘miniCRAN’ successfully unpacked and MD5 sums checked package ‘igraph’ successfully unpacked and MD5 sums checked
  • 28. SVD 100,000 by 2,000 Matrix 28 RRO R3.1.1 user system elapsed 78.08 1.18 42.51 R 3.1.2 user system elapsed 431.44 0.95 432.70
  • 29. 29 Revolution R Plus Technical Support Technical support for R, from the R experts.  Developers: Email and phone support, 8AM-6PM Mon-Fri  Production Servers and Hadoop: 24x7 Severity-1 coverage  On-line case management and knowledgebase  Access to technical resources, documentation and user forums  Exclusive on-line webinars from community experts  Guaranteed response times Also available: expert hands-on and on-line training for R, from Revolution Analytics AcademyR. www.revolutionanalytics.com/plus

Editor's Notes

  1. http://xkcd.com/242/