2. 2Enterprise R platform
Motivation
1. Can new hires get set up in the environment to run analyses
on their first day?
2. Can data scientists utilize the latest tools/packages without
help from IT?
3. Can data scientists use on-demand and scalable compute
resources without help from IT/dev ops?
4. Can data scientists find and reproduce past experiments and
results, using the original code, data, parameters, and
software versions?
The “Joel Test” for Data Science
3. 3Enterprise R platform
Motivation
5. Does collaboration happen through a system other than
email?
6. Can predictive models be deployed to production without
custom engineering or infrastructure work?
7. Is there a single place to search for past research and
reusable data sets, code, etc.?
8. Do your data scientists use the best tools money can buy?
Source: https://blog.dominodatalab.com/joel-test-data-science/
+1 Can you access your statistics without using R?
The “Joel Test” for Data Science
5. 5Enterprise R platform
Ideal Process
From code to production
SCM Package Repo
ProdProdProd
CI
• Documentation
• Unit tests
• Build
6. 6Enterprise R platform
Ideal Process
From code to production
SCM Package Repo
ProdProdProd
CI
• Upload to
repository
7. 7Enterprise R platform
Ideal Process
From code to production
SCM Package Repo
ProdProdProd
CI
• Deploy to separate
applications
8. 8Enterprise R platform
laiR
1. Multiple repositories
• Shared and Private repositories
• CRAN & Bioconductor
2. Upload any tar.gz with a DESCRIPTION
3. Search packages
4. Dependency management across all repositories
5. Zero-config installation
6. Maintain multiple versions of the same package
7. For a demo see: https://lair.ownr.io (demo / demo123)
The R package repository
9. 9Enterprise R platform
laiR
1. Can new hires get set up in the environment to run analyses
on their first day?
2. Can data scientists utilize the latest tools/packages without
help from IT? ✅
3. Can data scientists use on-demand and scalable compute
resources without help from IT/dev ops?
4. Can data scientists find and reproduce past experiments and
results, using the original code, data, parameters, and
software versions?
The “Joel Test” for Data Science
10. 10Enterprise R platform
laiR
5. Does collaboration happen through a system other than
email? ✅
6. Can predictive models be deployed to production without
custom engineering or infrastructure work?
7. Is there a single place to search for past research and
reusable data sets, code, etc.? ✅
8. Do your data scientists use the best tools money can buy?
+1 Can you access your statistics without using R?
The “Joel Test” for Data Science
11. 11Enterprise R platform
roveR
1. Preconfigured, separated R environments
2. Open-source R package
3. Linking R environments to laiR installations
4. Release packages into laiR
5. Install dependencies with specific versions using laiR API
6. Install rover using the following command:
R container management
install.packages("rover",
repos = c("https://lair.functionalfinances.com/repos/cran/",
"https://lair.functionalfinances.com/repos/shared/"),
method = "curl")
12. 12Enterprise R platform
roveR + laiR
1. Can new hires get set up in the environment to run analyses
on their first day? ✅
2. Can data scientists utilize the latest tools/packages without
help from IT? ✅
3. Can data scientists use on-demand and scalable compute
resources without help from IT/dev ops?
4. Can data scientists find and reproduce past experiments and
results, using the original code, data, parameters, and
software versions? ✅
The “Joel Test” for Data Science
13. 13Enterprise R platform
roveR + laiR
5. Does collaboration happen through a system other than
email? ✅
6. Can predictive models be deployed to production without
custom engineering or infrastructure work? ✅
7. Is there a single place to search for past research and
reusable data sets, code, etc.? ✅
8. Do your data scientists use the best tools money can buy?
+1 Can you access your statistics without using R? ✅
The “Joel Test” for Data Science
14. 14Enterprise R platform
exposeR
1. Access selected R calculations via REST
2. roveR container management - CRUD
3. Select functions to expose from packages in container
4. Allows other application to integrate R calculation
5. R in a dedicated environment
6. Zero-config installation
REST API for R containers
15. 15Enterprise R platform
exposeR + roveR + laiR
1. Can new hires get set up in the environment to run analyses
on their first day? ✅
2. Can data scientists utilize the latest tools/packages without
help from IT? ✅
3. Can data scientists use on-demand and scalable compute
resources without help from IT/dev ops?
4. Can data scientists find and reproduce past experiments and
results, using the original code, data, parameters, and
software versions? ✅
The “Joel Test” for Data Science
16. 16Enterprise R platform
exposeR + roveR + laiR
5. Does collaboration happen through a system other than
email? ✅
6. Can predictive models be deployed to production without
custom engineering or infrastructure work? ✅
7. Is there a single place to search for past research and
reusable data sets, code, etc.? ✅
8. Do your data scientists use the best tools money can buy?
+1 Can you access your statistics without using R? ✅
The “Joel Test” for Data Science
17. 17Enterprise R platform
Reference Implementation
The Delta-Lloyd case studyDevelopment
R DEPLOYMENT PROCESS Functional Finances Ltd | January 13, 2016
R Container
roveR::create
GIT
commit
test, build, install
Jenkinscheckout
checkout
test, build, install
laiR
roveR::release
roveR::install
Development
Start
R Container
roveR::install
roveR::create
Test Start Jenkins
Shiny
CLI
execute
execute
download
release
laiR
R Container
(MPI grid)
roveR::install
roveR::create
Production Start
Shiny
CLI
execute
execute
TestProduction
roveR::install
roveR::install
Sandbox
R Container
(MPI grid)
roveR::create
GIT
commit
test, build, install
clone
checkoutModel
Development
Start
18. 18Enterprise R platform
Reference Implementation
1. Can new hires get set up in the environment to run analyses
on their first day? ✅
2. Can data scientists utilize the latest tools/packages without
help from IT? ✅
3. Can data scientists use on-demand and scalable compute
resources without help from IT/dev ops? ✅
4. Can data scientists find and reproduce past experiments and
results, using the original code, data, parameters, and
software versions? ✅
The “Joel Test” for Data Science
19. 19Enterprise R platform
Reference Implementation
5. Does collaboration happen through a system other than
email? ✅
6. Can predictive models be deployed to production without
custom engineering or infrastructure work? ✅
7. Is there a single place to search for past research and
reusable data sets, code, etc.? ✅
8. Do your data scientists use the best tools money can buy? ✅
+1 Can you access your statistics without using R? ✅
The “Joel Test” for Data Science