SlideShare a Scribd company logo
Nobody Knows What It’s Like To Be the Bad Man
The Development Process for the caret Package
Max Kuhn
Pfizer Global R&D
max.kuhn@pfizer.com
Zachary Deane–Mayer
Cognius
zach.mayer@gmail.com
Model Function Consistency
Since there are many modeling packages in R written by di↵erent
people, there are inconsistencies in how models are specified and
predictions are created.
For example, many models have only one method of specifying the
model (e.g. formula method only)
Kuhn & Deane–Mayer (Pfizer / Cognius) caret 2 / 19
Generating Class Probabilities Using Di↵erent
Packages
Function predict Function Syntax
MASS::lda predict(obj) (no options needed)
stats:::glm predict(obj, type = "response")
gbm::gbm predict(obj, type = "response", n.trees)
mda::mda predict(obj, type = "posterior")
rpart::rpart predict(obj, type = "prob")
RWeka::Weka predict(obj, type = "probability")
caTools::LogitBoost predict(obj, type = "raw", nIter)
Kuhn & Deane–Mayer (Pfizer / Cognius) caret 3 / 19
The caret Package
The caret package was developed to:
create a unified interface for modeling and prediction (interfaces
to 180 models)
streamline model tuning using resampling
provide a variety of “helper” functions and classes for
day–to–day model building tasks
increase computational e ciency using parallel processing
First commits within Pfizer: 6/2005, First version on CRAN: 10/2007
Website: http://topepo.github.io/caret/
JSS Paper: http://www.jstatsoft.org/v28/i05/paper
Model List: http://topepo.github.io/caret/bytag.html
Many computing sections in APM
Kuhn & Deane–Mayer (Pfizer / Cognius) caret 4 / 19
Package Dependencies
One thing that makes caret di↵erent from most other packages is
that it uses code from an abnormally large number (> 80) of other
packages.
Originally, these were in the Depends field of the DESCRIPTION file
which caused all of them to be loaded with caret.
For many years, they were moved to Suggests, which solved that
issue.
However, their formal dependency in the DESCRIPTION file required
CRAN to install hundreds of other packages to check caret. The
maintainers were not pleased.
Kuhn & Deane–Mayer (Pfizer / Cognius) caret 5 / 19
Package Dependencies
This problem was somewhat alleviated at the end of 2013 when
custom methods were expanded.
Although this functionality had already existed in the package for
some time, it was refactored to be more user friendly.
In the process, much of the modeling code was moved out of caret’s
R files and into R objects, eliminating the formal dependencies.
Right now, the total number of dependencies is much smaller (2
Depends, 7 Imports, and 25 Suggests).
This still a↵ects testing though (described later). Also:
1 package is needed for this model and is not installed. (gbm).
Would you like to try to install it now?
Kuhn & Deane–Mayer (Pfizer / Cognius) caret 6 / 19
The Basic Release Process
1 create a few dynamic man pages
2 use R CMD check --as-cran to ensure passing CRAN tests
and unit tests
3 update all packages (and R)
4 run regression tests and evaluate results
5 send to CRAN
6 repeat
7 repeat
8 install passed caret version
9 generate HTML documentation and sync github io branch
10 profit!
Kuhn & Deane–Mayer (Pfizer / Cognius) caret 7 / 19
Toolbox
RStudio projects: a clean, self contained package development
environment
I No more setwd(’/path/to/some/folder’) in scripts
I Keep track of project-wide standards, e.g. code formatting
I An RStudio project was the first thing we added after moving
the caret repository to github
devtools: automate boring tasks
testthat: automated unit testing
roxygen2: combine source code with documentation
github: source control
Kuhn & Deane–Mayer (Pfizer / Cognius) caret 8 / 19
devtools
devtools::install builds package and installs it locally
devtools::check:
1 Builds documentation
2 Runs unit tests
3 Builds tarball
4 Runs R CMD CHECK
devtools::release builds package and submits it to CRAN
devtools::install github enables non-CRAN code distribution
or distribution of private packages
devtools::use travis enables automated unit testing through
travis-CI and test coverage reports through coveralls
Kuhn & Deane–Mayer (Pfizer / Cognius) caret 9 / 19
Testing
Testing for the package occurs in a few di↵erent ways:
units tests via testthat and travis-CI
regression tests for consistency
Automated unit testing via testthat
testthat::use testhat
Unit tests prevent new features from breaking old code
All functions should have associated tests
Run during R CMD check --as-cran
Can specify that certain tests be skipped on CRAN
caret is slowly adding more testthat tests
Kuhn & Deane–Mayer (Pfizer / Cognius) caret 10 / 19
github + travis + coveralls
Travis and coveralls are tools for automated unit testing
Travis reports test failutes and CRAN errors/warnings
Coveralls reports % of code covered by unit tests
Both automatically comment on the PR itself
Contributor submits code via a pull request
Travis notifies them of test failures
Coveralls notifies them to write tests for new functions
Automated feedback on code quality allows rapid iteration
Code review once unit tests and R CMD CHECK pass
github supports line-by-line comments
Usually several more iterations here
Kuhn & Deane–Mayer (Pfizer / Cognius) caret 11 / 19
Regression Testing
Prior to CRAN release (or whenever required), a comprehensive set of
regression tests can be conducted.
All modeling packages are updated to their current CRAN versions.
For each model accessed by train, rfe, and/or sbf, a set of test
cases are computed with the production version of caret and the
devel version.
First, test cases are evaluated to make sure that nothing has been
broken by updated versions of the consistuent packages.
Di↵s of the model results are computed to assess any di↵erences in
caret versions.
This process takes approximately 3hrs to complete using make -j 12
on a Mac Pro.
Kuhn & Deane–Mayer (Pfizer / Cognius) caret 12 / 19
Regression Testing
$ R CMD BATCH move_files.R
$ cd ~/tmp/2015_04_19_09__6.0-41/
$ make -j 12 -i
2015-04-19 09:13:44: Starting ada
2015-04-19 09:13:44: Starting AdaBag
2015-04-19 09:13:44: Starting AdaBoost.M1
2015-04-19 09:13:44: Starting ANFIS
:
make: [FH.GBML.RData] Error 1 (ignored)
:
2015-04-19 12:03:52: Finished WM
2015-04-19 12:04:48: Finished xyf
Kuhn & Deane–Mayer (Pfizer / Cognius) caret 13 / 19
Documentation
caret originally contained four package vignettes with in–depth
descriptions of functionality with examples.
However, this added time to R CMD check and was a general pain for
CRAN.
E↵orts to make the vignettes more computationally e cient (e.g.
reducing the number of examples, resamples, etc.) diminished the
e↵ectiveness of the documentation.
Kuhn & Deane–Mayer (Pfizer / Cognius) caret 14 / 19
Documentation
The documentation was moved out of the package and to the github
IO page.
These pages are built using knitr whenever a new version is sent to
CRAN. Some advantages are:
longer and more relevant examples are available
update schedule is under my control
dynamic documentation (e.g. D3 network graphs, JS tables)
better formatting
It currently takes about 4hr to create these (using parallel processing
when possible).
Kuhn & Deane–Mayer (Pfizer / Cognius) caret 15 / 19
Backup Slides
roxygen2
Simplified package documentation
Automates many parts of the documentation process
Special comment block above each function
Name, description, arguments, etc.
Code and documentation are in the same source file
A must have for new packages but hard to convert existing packages
caret has 92 .Rd files
I’m not in a hurry to re-write them all in roxygen2 format
Kuhn & Deane–Mayer (Pfizer / Cognius) caret 17 / 19
Required “Optimizations”
For example, there is one check that produces a large number of false
positive warnings. For example:
> bwplot.diff.resamples <- function (x, data, metric = x$metric, ...) {
+ ## some code
+ plotData <- subset(plotData, Metric %in% metric)
+ ## more code
+ }
will trigger a warning that “bwplot.diff.resamples: no visible
binding for global variable ’Metric’”.
The “solution” is to have a file that is sourced first in the package
(e.g. aaa.R) with the line
> Metric <- NULL
Kuhn & Deane–Mayer (Pfizer / Cognius) caret 18 / 19
Judging the Severity of Problems
It’s hard to tell which warnings should be ignored and which should
not. There is also the issue of inconsistencies related to who is “on
duty” when you submit your package.
It is hard to believe that someone took the time for this:
Description Field: "My awesome R package"
R CMD check: "Malformed Description field: should contain one
or more complete sentences."
Description Field: "This package is awesome"
R CMD check: "The Description field should not start with the
package name, ’This package’ or similar."
Description Field: "Blah blah blah."
R CMD check: PASS
Kuhn & Deane–Mayer (Pfizer / Cognius) caret 19 / 19

More Related Content

What's hot

High-Performance Python
High-Performance PythonHigh-Performance Python
High-Performance Python
Work-Bench
 
An Empirical Study of Unspecified Dependencies in Make-Based Build Systems
An Empirical Study of Unspecified Dependencies in Make-Based Build SystemsAn Empirical Study of Unspecified Dependencies in Make-Based Build Systems
An Empirical Study of Unspecified Dependencies in Make-Based Build Systems
corpaulbezemer
 
PuppetConf 2016: The Future of Testing Puppet Code – Gareth Rushgrove, Puppet
PuppetConf 2016: The Future of Testing Puppet Code – Gareth Rushgrove, PuppetPuppetConf 2016: The Future of Testing Puppet Code – Gareth Rushgrove, Puppet
PuppetConf 2016: The Future of Testing Puppet Code – Gareth Rushgrove, Puppet
Puppet
 
Contributing to Upstream Open Source Projects
Contributing to Upstream Open Source ProjectsContributing to Upstream Open Source Projects
Contributing to Upstream Open Source Projects
Scott Garman
 
Architecting The Future - WeRise Women in Technology
Architecting The Future - WeRise Women in TechnologyArchitecting The Future - WeRise Women in Technology
Architecting The Future - WeRise Women in Technology
Daniel Barker
 
MPL: modular pipeline library - Dynamic Talks Milwaukee 4/11/2019
MPL: modular pipeline library - Dynamic Talks Milwaukee 4/11/2019MPL: modular pipeline library - Dynamic Talks Milwaukee 4/11/2019
MPL: modular pipeline library - Dynamic Talks Milwaukee 4/11/2019
Grid Dynamics
 
Collecting and Leveraging a Benchmark of Build System Clones to Aid in Qualit...
Collecting and Leveraging a Benchmark of Build System Clones to Aid in Qualit...Collecting and Leveraging a Benchmark of Build System Clones to Aid in Qualit...
Collecting and Leveraging a Benchmark of Build System Clones to Aid in Qualit...
Shane McIntosh
 
Dependency management in golang
Dependency management in golangDependency management in golang
Dependency management in golang
Ramit Surana
 
Actor Concurrency Bugs: A Comprehensive Study on Symptoms, Root Causes, API U...
Actor Concurrency Bugs: A Comprehensive Study on Symptoms, Root Causes, API U...Actor Concurrency Bugs: A Comprehensive Study on Symptoms, Root Causes, API U...
Actor Concurrency Bugs: A Comprehensive Study on Symptoms, Root Causes, API U...
Raffi Khatchadourian
 
Optimizing and Profiling Golang Rest Api
Optimizing and Profiling Golang Rest ApiOptimizing and Profiling Golang Rest Api
Optimizing and Profiling Golang Rest Api
Iman Syahputra Situmorang
 
Quickly testing legacy code cppp.fr 2019 - clare macrae
Quickly testing legacy code   cppp.fr 2019 - clare macraeQuickly testing legacy code   cppp.fr 2019 - clare macrae
Quickly testing legacy code cppp.fr 2019 - clare macrae
Clare Macrae
 
Preparing and submitting a package to CRAN - June Sanderson, Sheffield R User...
Preparing and submitting a package to CRAN - June Sanderson, Sheffield R User...Preparing and submitting a package to CRAN - June Sanderson, Sheffield R User...
Preparing and submitting a package to CRAN - June Sanderson, Sheffield R User...
Paul Richards
 
ICSE2013
ICSE2013ICSE2013
ICSE2013
swy351
 
ICSE2014
ICSE2014ICSE2014
ICSE2014
swy351
 
FASTEN: Scaling static analyses to ecosystem, presented at FOSDEM 2020 in Bru...
FASTEN: Scaling static analyses to ecosystem, presented at FOSDEM 2020 in Bru...FASTEN: Scaling static analyses to ecosystem, presented at FOSDEM 2020 in Bru...
FASTEN: Scaling static analyses to ecosystem, presented at FOSDEM 2020 in Bru...
Fasten Project
 
Becoming A Plumber: Building Deployment Pipelines - LISA17
Becoming A Plumber: Building Deployment Pipelines - LISA17Becoming A Plumber: Building Deployment Pipelines - LISA17
Becoming A Plumber: Building Deployment Pipelines - LISA17
Daniel Barker
 
Analyzing Packages in Docker images hosted On DockerHub
Analyzing Packages in Docker images hosted On DockerHubAnalyzing Packages in Docker images hosted On DockerHub
Analyzing Packages in Docker images hosted On DockerHub
Ahmed Zerouali
 
OpenStack documentation & translation management 2012_summit
OpenStack documentation & translation management 2012_summitOpenStack documentation & translation management 2012_summit
OpenStack documentation & translation management 2012_summit
Anne Gentle
 
ASE2010
ASE2010ASE2010
ASE2010
swy351
 

What's hot (19)

High-Performance Python
High-Performance PythonHigh-Performance Python
High-Performance Python
 
An Empirical Study of Unspecified Dependencies in Make-Based Build Systems
An Empirical Study of Unspecified Dependencies in Make-Based Build SystemsAn Empirical Study of Unspecified Dependencies in Make-Based Build Systems
An Empirical Study of Unspecified Dependencies in Make-Based Build Systems
 
PuppetConf 2016: The Future of Testing Puppet Code – Gareth Rushgrove, Puppet
PuppetConf 2016: The Future of Testing Puppet Code – Gareth Rushgrove, PuppetPuppetConf 2016: The Future of Testing Puppet Code – Gareth Rushgrove, Puppet
PuppetConf 2016: The Future of Testing Puppet Code – Gareth Rushgrove, Puppet
 
Contributing to Upstream Open Source Projects
Contributing to Upstream Open Source ProjectsContributing to Upstream Open Source Projects
Contributing to Upstream Open Source Projects
 
Architecting The Future - WeRise Women in Technology
Architecting The Future - WeRise Women in TechnologyArchitecting The Future - WeRise Women in Technology
Architecting The Future - WeRise Women in Technology
 
MPL: modular pipeline library - Dynamic Talks Milwaukee 4/11/2019
MPL: modular pipeline library - Dynamic Talks Milwaukee 4/11/2019MPL: modular pipeline library - Dynamic Talks Milwaukee 4/11/2019
MPL: modular pipeline library - Dynamic Talks Milwaukee 4/11/2019
 
Collecting and Leveraging a Benchmark of Build System Clones to Aid in Qualit...
Collecting and Leveraging a Benchmark of Build System Clones to Aid in Qualit...Collecting and Leveraging a Benchmark of Build System Clones to Aid in Qualit...
Collecting and Leveraging a Benchmark of Build System Clones to Aid in Qualit...
 
Dependency management in golang
Dependency management in golangDependency management in golang
Dependency management in golang
 
Actor Concurrency Bugs: A Comprehensive Study on Symptoms, Root Causes, API U...
Actor Concurrency Bugs: A Comprehensive Study on Symptoms, Root Causes, API U...Actor Concurrency Bugs: A Comprehensive Study on Symptoms, Root Causes, API U...
Actor Concurrency Bugs: A Comprehensive Study on Symptoms, Root Causes, API U...
 
Optimizing and Profiling Golang Rest Api
Optimizing and Profiling Golang Rest ApiOptimizing and Profiling Golang Rest Api
Optimizing and Profiling Golang Rest Api
 
Quickly testing legacy code cppp.fr 2019 - clare macrae
Quickly testing legacy code   cppp.fr 2019 - clare macraeQuickly testing legacy code   cppp.fr 2019 - clare macrae
Quickly testing legacy code cppp.fr 2019 - clare macrae
 
Preparing and submitting a package to CRAN - June Sanderson, Sheffield R User...
Preparing and submitting a package to CRAN - June Sanderson, Sheffield R User...Preparing and submitting a package to CRAN - June Sanderson, Sheffield R User...
Preparing and submitting a package to CRAN - June Sanderson, Sheffield R User...
 
ICSE2013
ICSE2013ICSE2013
ICSE2013
 
ICSE2014
ICSE2014ICSE2014
ICSE2014
 
FASTEN: Scaling static analyses to ecosystem, presented at FOSDEM 2020 in Bru...
FASTEN: Scaling static analyses to ecosystem, presented at FOSDEM 2020 in Bru...FASTEN: Scaling static analyses to ecosystem, presented at FOSDEM 2020 in Bru...
FASTEN: Scaling static analyses to ecosystem, presented at FOSDEM 2020 in Bru...
 
Becoming A Plumber: Building Deployment Pipelines - LISA17
Becoming A Plumber: Building Deployment Pipelines - LISA17Becoming A Plumber: Building Deployment Pipelines - LISA17
Becoming A Plumber: Building Deployment Pipelines - LISA17
 
Analyzing Packages in Docker images hosted On DockerHub
Analyzing Packages in Docker images hosted On DockerHubAnalyzing Packages in Docker images hosted On DockerHub
Analyzing Packages in Docker images hosted On DockerHub
 
OpenStack documentation & translation management 2012_summit
OpenStack documentation & translation management 2012_summitOpenStack documentation & translation management 2012_summit
OpenStack documentation & translation management 2012_summit
 
ASE2010
ASE2010ASE2010
ASE2010
 

Viewers also liked

But when you call me Bayesian I know I’m not the only one
But when you call me Bayesian I know I’m not the only oneBut when you call me Bayesian I know I’m not the only one
But when you call me Bayesian I know I’m not the only one
Work-Bench
 
A Statistician Walks into a Tech Company: R at a Rapidly Scaling Healthcare S...
A Statistician Walks into a Tech Company: R at a Rapidly Scaling Healthcare S...A Statistician Walks into a Tech Company: R at a Rapidly Scaling Healthcare S...
A Statistician Walks into a Tech Company: R at a Rapidly Scaling Healthcare S...
Work-Bench
 
Broom: Converting Statistical Models to Tidy Data Frames
Broom: Converting Statistical Models to Tidy Data FramesBroom: Converting Statistical Models to Tidy Data Frames
Broom: Converting Statistical Models to Tidy Data Frames
Work-Bench
 
The Political Impact of Social Penumbras
The Political Impact of Social PenumbrasThe Political Impact of Social Penumbras
The Political Impact of Social Penumbras
Work-Bench
 
Improving Data Interoperability for Python and R
Improving Data Interoperability for Python and RImproving Data Interoperability for Python and R
Improving Data Interoperability for Python and R
Work-Bench
 
Using R at NYT Graphics
Using R at NYT GraphicsUsing R at NYT Graphics
Using R at NYT Graphics
Work-Bench
 
The Feels
The FeelsThe Feels
The Feels
Work-Bench
 
Inside the R Consortium
Inside the R ConsortiumInside the R Consortium
Inside the R Consortium
Work-Bench
 
I Don't Want to Be a Dummy! Encoding Predictors for Trees
I Don't Want to Be a Dummy! Encoding Predictors for TreesI Don't Want to Be a Dummy! Encoding Predictors for Trees
I Don't Want to Be a Dummy! Encoding Predictors for Trees
Work-Bench
 
R for Everything
R for EverythingR for Everything
R for Everything
Work-Bench
 
Julia + R for Data Science
Julia + R for Data ScienceJulia + R for Data Science
Julia + R for Data Science
Work-Bench
 
Reflection on the Data Science Profession in NYC
Reflection on the Data Science Profession in NYCReflection on the Data Science Profession in NYC
Reflection on the Data Science Profession in NYC
Work-Bench
 
Data Science Challenges in Personal Program Analysis
Data Science Challenges in Personal Program AnalysisData Science Challenges in Personal Program Analysis
Data Science Challenges in Personal Program Analysis
Work-Bench
 
Iterating over statistical models: NCAA tournament edition
Iterating over statistical models: NCAA tournament editionIterating over statistical models: NCAA tournament edition
Iterating over statistical models: NCAA tournament edition
Work-Bench
 
Building Scalable Prediction Services in R
Building Scalable Prediction Services in RBuilding Scalable Prediction Services in R
Building Scalable Prediction Services in R
Work-Bench
 
R Packages for Time-Varying Networks and Extremal Dependence
R Packages for Time-Varying Networks and Extremal DependenceR Packages for Time-Varying Networks and Extremal Dependence
R Packages for Time-Varying Networks and Extremal Dependence
Work-Bench
 
One Algorithm to Rule Them All: How to Automate Statistical Computation
One Algorithm to Rule Them All: How to Automate Statistical ComputationOne Algorithm to Rule Them All: How to Automate Statistical Computation
One Algorithm to Rule Them All: How to Automate Statistical Computation
Work-Bench
 
Thinking Small About Big Data
Thinking Small About Big DataThinking Small About Big Data
Thinking Small About Big Data
Work-Bench
 
Dr. Datascience or: How I Learned to Stop Munging and Love Tests
Dr. Datascience or: How I Learned to Stop Munging and Love TestsDr. Datascience or: How I Learned to Stop Munging and Love Tests
Dr. Datascience or: How I Learned to Stop Munging and Love Tests
Work-Bench
 
Scaling Data Science at Airbnb
Scaling Data Science at AirbnbScaling Data Science at Airbnb
Scaling Data Science at Airbnb
Work-Bench
 

Viewers also liked (20)

But when you call me Bayesian I know I’m not the only one
But when you call me Bayesian I know I’m not the only oneBut when you call me Bayesian I know I’m not the only one
But when you call me Bayesian I know I’m not the only one
 
A Statistician Walks into a Tech Company: R at a Rapidly Scaling Healthcare S...
A Statistician Walks into a Tech Company: R at a Rapidly Scaling Healthcare S...A Statistician Walks into a Tech Company: R at a Rapidly Scaling Healthcare S...
A Statistician Walks into a Tech Company: R at a Rapidly Scaling Healthcare S...
 
Broom: Converting Statistical Models to Tidy Data Frames
Broom: Converting Statistical Models to Tidy Data FramesBroom: Converting Statistical Models to Tidy Data Frames
Broom: Converting Statistical Models to Tidy Data Frames
 
The Political Impact of Social Penumbras
The Political Impact of Social PenumbrasThe Political Impact of Social Penumbras
The Political Impact of Social Penumbras
 
Improving Data Interoperability for Python and R
Improving Data Interoperability for Python and RImproving Data Interoperability for Python and R
Improving Data Interoperability for Python and R
 
Using R at NYT Graphics
Using R at NYT GraphicsUsing R at NYT Graphics
Using R at NYT Graphics
 
The Feels
The FeelsThe Feels
The Feels
 
Inside the R Consortium
Inside the R ConsortiumInside the R Consortium
Inside the R Consortium
 
I Don't Want to Be a Dummy! Encoding Predictors for Trees
I Don't Want to Be a Dummy! Encoding Predictors for TreesI Don't Want to Be a Dummy! Encoding Predictors for Trees
I Don't Want to Be a Dummy! Encoding Predictors for Trees
 
R for Everything
R for EverythingR for Everything
R for Everything
 
Julia + R for Data Science
Julia + R for Data ScienceJulia + R for Data Science
Julia + R for Data Science
 
Reflection on the Data Science Profession in NYC
Reflection on the Data Science Profession in NYCReflection on the Data Science Profession in NYC
Reflection on the Data Science Profession in NYC
 
Data Science Challenges in Personal Program Analysis
Data Science Challenges in Personal Program AnalysisData Science Challenges in Personal Program Analysis
Data Science Challenges in Personal Program Analysis
 
Iterating over statistical models: NCAA tournament edition
Iterating over statistical models: NCAA tournament editionIterating over statistical models: NCAA tournament edition
Iterating over statistical models: NCAA tournament edition
 
Building Scalable Prediction Services in R
Building Scalable Prediction Services in RBuilding Scalable Prediction Services in R
Building Scalable Prediction Services in R
 
R Packages for Time-Varying Networks and Extremal Dependence
R Packages for Time-Varying Networks and Extremal DependenceR Packages for Time-Varying Networks and Extremal Dependence
R Packages for Time-Varying Networks and Extremal Dependence
 
One Algorithm to Rule Them All: How to Automate Statistical Computation
One Algorithm to Rule Them All: How to Automate Statistical ComputationOne Algorithm to Rule Them All: How to Automate Statistical Computation
One Algorithm to Rule Them All: How to Automate Statistical Computation
 
Thinking Small About Big Data
Thinking Small About Big DataThinking Small About Big Data
Thinking Small About Big Data
 
Dr. Datascience or: How I Learned to Stop Munging and Love Tests
Dr. Datascience or: How I Learned to Stop Munging and Love TestsDr. Datascience or: How I Learned to Stop Munging and Love Tests
Dr. Datascience or: How I Learned to Stop Munging and Love Tests
 
Scaling Data Science at Airbnb
Scaling Data Science at AirbnbScaling Data Science at Airbnb
Scaling Data Science at Airbnb
 

Similar to Nobody Knows What It’s Like To Be the Bad Man: The Development Process for the Caret Package

The caret package is a unified interface to a large number of predictive mode...
The caret package is a unified interface to a large number of predictive mode...The caret package is a unified interface to a large number of predictive mode...
The caret package is a unified interface to a large number of predictive mode...
odsc
 
Bounded Model Checking for C Programs in an Enterprise Environment
Bounded Model Checking for C Programs in an Enterprise EnvironmentBounded Model Checking for C Programs in an Enterprise Environment
Bounded Model Checking for C Programs in an Enterprise Environment
AdaCore
 
Care and feeding notes
Care and feeding notesCare and feeding notes
Care and feeding notes
Perrin Harkins
 
Go 1.8 Release Party
Go 1.8 Release PartyGo 1.8 Release Party
Go 1.8 Release Party
Rodolfo Carvalho
 
DCEU 18: Developing with Docker Containers
DCEU 18: Developing with Docker ContainersDCEU 18: Developing with Docker Containers
DCEU 18: Developing with Docker Containers
Docker, Inc.
 
Native Hadoop with prebuilt spark
Native Hadoop with prebuilt sparkNative Hadoop with prebuilt spark
Native Hadoop with prebuilt spark
arunkumar sadhasivam
 
IIT-RTC 2017 Qt WebRTC Tutorial (Qt Janus Client)
IIT-RTC 2017 Qt WebRTC Tutorial (Qt Janus Client)IIT-RTC 2017 Qt WebRTC Tutorial (Qt Janus Client)
IIT-RTC 2017 Qt WebRTC Tutorial (Qt Janus Client)
Alexandre Gouaillard
 
Containerised Testing at Demonware : PyCon Ireland 2016
Containerised Testing at Demonware : PyCon Ireland 2016Containerised Testing at Demonware : PyCon Ireland 2016
Containerised Testing at Demonware : PyCon Ireland 2016
Thomas Shaw
 
Create ReactJS Component & publish as npm package
Create ReactJS Component & publish as npm packageCreate ReactJS Component & publish as npm package
Create ReactJS Component & publish as npm package
Andrii Lundiak
 
Reliable from-source builds (Qshare 28 Nov 2023).pdf
Reliable from-source builds (Qshare 28 Nov 2023).pdfReliable from-source builds (Qshare 28 Nov 2023).pdf
Reliable from-source builds (Qshare 28 Nov 2023).pdf
Ralf Gommers
 
Stop Being Lazy and Test Your Software
Stop Being Lazy and Test Your SoftwareStop Being Lazy and Test Your Software
Stop Being Lazy and Test Your Software
Laura Frank Tacho
 
PuppetConf 2016: Getting to the Latest Puppet – Nate McCurdy & Elizabeth Witt...
PuppetConf 2016: Getting to the Latest Puppet – Nate McCurdy & Elizabeth Witt...PuppetConf 2016: Getting to the Latest Puppet – Nate McCurdy & Elizabeth Witt...
PuppetConf 2016: Getting to the Latest Puppet – Nate McCurdy & Elizabeth Witt...
Puppet
 
ABCs of docker
ABCs of dockerABCs of docker
ABCs of docker
Sabyrzhan Tynybayev
 
Bottlenecks rel b works and rel c planning
Bottlenecks rel b works and rel c planningBottlenecks rel b works and rel c planning
Bottlenecks rel b works and rel c planning
Jun Li
 
Creating Developer-Friendly Docker Containers with Chaperone
Creating Developer-Friendly Docker Containers with ChaperoneCreating Developer-Friendly Docker Containers with Chaperone
Creating Developer-Friendly Docker Containers with Chaperone
Gary Wisniewski
 
Reproducibility with Checkpoint & RRO - NYC R Conference
Reproducibility with Checkpoint & RRO - NYC R ConferenceReproducibility with Checkpoint & RRO - NYC R Conference
Reproducibility with Checkpoint & RRO - NYC R Conference
Revolution Analytics
 
Some Pitfalls with Python and Their Possible Solutions v1.0
Some Pitfalls with Python and Their Possible Solutions v1.0Some Pitfalls with Python and Their Possible Solutions v1.0
Some Pitfalls with Python and Their Possible Solutions v1.0
Yann-Gaël Guéhéneuc
 
Docker pipelines
Docker pipelinesDocker pipelines
Docker pipelines
Chris Mague
 
Start with version control and experiments management in machine learning
Start with version control and experiments management in machine learningStart with version control and experiments management in machine learning
Start with version control and experiments management in machine learning
Mikhail Rozhkov
 

Similar to Nobody Knows What It’s Like To Be the Bad Man: The Development Process for the Caret Package (20)

The caret package is a unified interface to a large number of predictive mode...
The caret package is a unified interface to a large number of predictive mode...The caret package is a unified interface to a large number of predictive mode...
The caret package is a unified interface to a large number of predictive mode...
 
Bounded Model Checking for C Programs in an Enterprise Environment
Bounded Model Checking for C Programs in an Enterprise EnvironmentBounded Model Checking for C Programs in an Enterprise Environment
Bounded Model Checking for C Programs in an Enterprise Environment
 
Ns2leach
Ns2leachNs2leach
Ns2leach
 
Care and feeding notes
Care and feeding notesCare and feeding notes
Care and feeding notes
 
Go 1.8 Release Party
Go 1.8 Release PartyGo 1.8 Release Party
Go 1.8 Release Party
 
DCEU 18: Developing with Docker Containers
DCEU 18: Developing with Docker ContainersDCEU 18: Developing with Docker Containers
DCEU 18: Developing with Docker Containers
 
Native Hadoop with prebuilt spark
Native Hadoop with prebuilt sparkNative Hadoop with prebuilt spark
Native Hadoop with prebuilt spark
 
IIT-RTC 2017 Qt WebRTC Tutorial (Qt Janus Client)
IIT-RTC 2017 Qt WebRTC Tutorial (Qt Janus Client)IIT-RTC 2017 Qt WebRTC Tutorial (Qt Janus Client)
IIT-RTC 2017 Qt WebRTC Tutorial (Qt Janus Client)
 
Containerised Testing at Demonware : PyCon Ireland 2016
Containerised Testing at Demonware : PyCon Ireland 2016Containerised Testing at Demonware : PyCon Ireland 2016
Containerised Testing at Demonware : PyCon Ireland 2016
 
Create ReactJS Component & publish as npm package
Create ReactJS Component & publish as npm packageCreate ReactJS Component & publish as npm package
Create ReactJS Component & publish as npm package
 
Reliable from-source builds (Qshare 28 Nov 2023).pdf
Reliable from-source builds (Qshare 28 Nov 2023).pdfReliable from-source builds (Qshare 28 Nov 2023).pdf
Reliable from-source builds (Qshare 28 Nov 2023).pdf
 
Stop Being Lazy and Test Your Software
Stop Being Lazy and Test Your SoftwareStop Being Lazy and Test Your Software
Stop Being Lazy and Test Your Software
 
PuppetConf 2016: Getting to the Latest Puppet – Nate McCurdy & Elizabeth Witt...
PuppetConf 2016: Getting to the Latest Puppet – Nate McCurdy & Elizabeth Witt...PuppetConf 2016: Getting to the Latest Puppet – Nate McCurdy & Elizabeth Witt...
PuppetConf 2016: Getting to the Latest Puppet – Nate McCurdy & Elizabeth Witt...
 
ABCs of docker
ABCs of dockerABCs of docker
ABCs of docker
 
Bottlenecks rel b works and rel c planning
Bottlenecks rel b works and rel c planningBottlenecks rel b works and rel c planning
Bottlenecks rel b works and rel c planning
 
Creating Developer-Friendly Docker Containers with Chaperone
Creating Developer-Friendly Docker Containers with ChaperoneCreating Developer-Friendly Docker Containers with Chaperone
Creating Developer-Friendly Docker Containers with Chaperone
 
Reproducibility with Checkpoint & RRO - NYC R Conference
Reproducibility with Checkpoint & RRO - NYC R ConferenceReproducibility with Checkpoint & RRO - NYC R Conference
Reproducibility with Checkpoint & RRO - NYC R Conference
 
Some Pitfalls with Python and Their Possible Solutions v1.0
Some Pitfalls with Python and Their Possible Solutions v1.0Some Pitfalls with Python and Their Possible Solutions v1.0
Some Pitfalls with Python and Their Possible Solutions v1.0
 
Docker pipelines
Docker pipelinesDocker pipelines
Docker pipelines
 
Start with version control and experiments management in machine learning
Start with version control and experiments management in machine learningStart with version control and experiments management in machine learning
Start with version control and experiments management in machine learning
 

More from Work-Bench

2017 Enterprise Almanac
2017 Enterprise Almanac2017 Enterprise Almanac
2017 Enterprise Almanac
Work-Bench
 
AI to Enable Next Generation of People Managers
AI to Enable Next Generation of People ManagersAI to Enable Next Generation of People Managers
AI to Enable Next Generation of People Managers
Work-Bench
 
Startup Recruiting Workbook: Sourcing and Interview Process
Startup Recruiting Workbook: Sourcing and Interview ProcessStartup Recruiting Workbook: Sourcing and Interview Process
Startup Recruiting Workbook: Sourcing and Interview Process
Work-Bench
 
Cloud Native Infrastructure Management Solutions Compared
Cloud Native Infrastructure Management Solutions ComparedCloud Native Infrastructure Management Solutions Compared
Cloud Native Infrastructure Management Solutions Compared
Work-Bench
 
Analyzing NYC Transit Data
Analyzing NYC Transit DataAnalyzing NYC Transit Data
Analyzing NYC Transit Data
Work-Bench
 
Building a Demand Generation Machine at MongoDB
Building a Demand Generation Machine at MongoDBBuilding a Demand Generation Machine at MongoDB
Building a Demand Generation Machine at MongoDB
Work-Bench
 
How to Market Your Startup to the Enterprise
How to Market Your Startup to the EnterpriseHow to Market Your Startup to the Enterprise
How to Market Your Startup to the Enterprise
Work-Bench
 
Marketing & Design for the Enterprise
Marketing & Design for the EnterpriseMarketing & Design for the Enterprise
Marketing & Design for the Enterprise
Work-Bench
 

More from Work-Bench (8)

2017 Enterprise Almanac
2017 Enterprise Almanac2017 Enterprise Almanac
2017 Enterprise Almanac
 
AI to Enable Next Generation of People Managers
AI to Enable Next Generation of People ManagersAI to Enable Next Generation of People Managers
AI to Enable Next Generation of People Managers
 
Startup Recruiting Workbook: Sourcing and Interview Process
Startup Recruiting Workbook: Sourcing and Interview ProcessStartup Recruiting Workbook: Sourcing and Interview Process
Startup Recruiting Workbook: Sourcing and Interview Process
 
Cloud Native Infrastructure Management Solutions Compared
Cloud Native Infrastructure Management Solutions ComparedCloud Native Infrastructure Management Solutions Compared
Cloud Native Infrastructure Management Solutions Compared
 
Analyzing NYC Transit Data
Analyzing NYC Transit DataAnalyzing NYC Transit Data
Analyzing NYC Transit Data
 
Building a Demand Generation Machine at MongoDB
Building a Demand Generation Machine at MongoDBBuilding a Demand Generation Machine at MongoDB
Building a Demand Generation Machine at MongoDB
 
How to Market Your Startup to the Enterprise
How to Market Your Startup to the EnterpriseHow to Market Your Startup to the Enterprise
How to Market Your Startup to the Enterprise
 
Marketing & Design for the Enterprise
Marketing & Design for the EnterpriseMarketing & Design for the Enterprise
Marketing & Design for the Enterprise
 

Recently uploaded

Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
Tiktokethiodaily
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
nscud
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
NABLAS株式会社
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
enxupq
 

Recently uploaded (20)

Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 

Nobody Knows What It’s Like To Be the Bad Man: The Development Process for the Caret Package

  • 1. Nobody Knows What It’s Like To Be the Bad Man The Development Process for the caret Package Max Kuhn Pfizer Global R&D max.kuhn@pfizer.com Zachary Deane–Mayer Cognius zach.mayer@gmail.com
  • 2. Model Function Consistency Since there are many modeling packages in R written by di↵erent people, there are inconsistencies in how models are specified and predictions are created. For example, many models have only one method of specifying the model (e.g. formula method only) Kuhn & Deane–Mayer (Pfizer / Cognius) caret 2 / 19
  • 3. Generating Class Probabilities Using Di↵erent Packages Function predict Function Syntax MASS::lda predict(obj) (no options needed) stats:::glm predict(obj, type = "response") gbm::gbm predict(obj, type = "response", n.trees) mda::mda predict(obj, type = "posterior") rpart::rpart predict(obj, type = "prob") RWeka::Weka predict(obj, type = "probability") caTools::LogitBoost predict(obj, type = "raw", nIter) Kuhn & Deane–Mayer (Pfizer / Cognius) caret 3 / 19
  • 4. The caret Package The caret package was developed to: create a unified interface for modeling and prediction (interfaces to 180 models) streamline model tuning using resampling provide a variety of “helper” functions and classes for day–to–day model building tasks increase computational e ciency using parallel processing First commits within Pfizer: 6/2005, First version on CRAN: 10/2007 Website: http://topepo.github.io/caret/ JSS Paper: http://www.jstatsoft.org/v28/i05/paper Model List: http://topepo.github.io/caret/bytag.html Many computing sections in APM Kuhn & Deane–Mayer (Pfizer / Cognius) caret 4 / 19
  • 5. Package Dependencies One thing that makes caret di↵erent from most other packages is that it uses code from an abnormally large number (> 80) of other packages. Originally, these were in the Depends field of the DESCRIPTION file which caused all of them to be loaded with caret. For many years, they were moved to Suggests, which solved that issue. However, their formal dependency in the DESCRIPTION file required CRAN to install hundreds of other packages to check caret. The maintainers were not pleased. Kuhn & Deane–Mayer (Pfizer / Cognius) caret 5 / 19
  • 6. Package Dependencies This problem was somewhat alleviated at the end of 2013 when custom methods were expanded. Although this functionality had already existed in the package for some time, it was refactored to be more user friendly. In the process, much of the modeling code was moved out of caret’s R files and into R objects, eliminating the formal dependencies. Right now, the total number of dependencies is much smaller (2 Depends, 7 Imports, and 25 Suggests). This still a↵ects testing though (described later). Also: 1 package is needed for this model and is not installed. (gbm). Would you like to try to install it now? Kuhn & Deane–Mayer (Pfizer / Cognius) caret 6 / 19
  • 7. The Basic Release Process 1 create a few dynamic man pages 2 use R CMD check --as-cran to ensure passing CRAN tests and unit tests 3 update all packages (and R) 4 run regression tests and evaluate results 5 send to CRAN 6 repeat 7 repeat 8 install passed caret version 9 generate HTML documentation and sync github io branch 10 profit! Kuhn & Deane–Mayer (Pfizer / Cognius) caret 7 / 19
  • 8. Toolbox RStudio projects: a clean, self contained package development environment I No more setwd(’/path/to/some/folder’) in scripts I Keep track of project-wide standards, e.g. code formatting I An RStudio project was the first thing we added after moving the caret repository to github devtools: automate boring tasks testthat: automated unit testing roxygen2: combine source code with documentation github: source control Kuhn & Deane–Mayer (Pfizer / Cognius) caret 8 / 19
  • 9. devtools devtools::install builds package and installs it locally devtools::check: 1 Builds documentation 2 Runs unit tests 3 Builds tarball 4 Runs R CMD CHECK devtools::release builds package and submits it to CRAN devtools::install github enables non-CRAN code distribution or distribution of private packages devtools::use travis enables automated unit testing through travis-CI and test coverage reports through coveralls Kuhn & Deane–Mayer (Pfizer / Cognius) caret 9 / 19
  • 10. Testing Testing for the package occurs in a few di↵erent ways: units tests via testthat and travis-CI regression tests for consistency Automated unit testing via testthat testthat::use testhat Unit tests prevent new features from breaking old code All functions should have associated tests Run during R CMD check --as-cran Can specify that certain tests be skipped on CRAN caret is slowly adding more testthat tests Kuhn & Deane–Mayer (Pfizer / Cognius) caret 10 / 19
  • 11. github + travis + coveralls Travis and coveralls are tools for automated unit testing Travis reports test failutes and CRAN errors/warnings Coveralls reports % of code covered by unit tests Both automatically comment on the PR itself Contributor submits code via a pull request Travis notifies them of test failures Coveralls notifies them to write tests for new functions Automated feedback on code quality allows rapid iteration Code review once unit tests and R CMD CHECK pass github supports line-by-line comments Usually several more iterations here Kuhn & Deane–Mayer (Pfizer / Cognius) caret 11 / 19
  • 12. Regression Testing Prior to CRAN release (or whenever required), a comprehensive set of regression tests can be conducted. All modeling packages are updated to their current CRAN versions. For each model accessed by train, rfe, and/or sbf, a set of test cases are computed with the production version of caret and the devel version. First, test cases are evaluated to make sure that nothing has been broken by updated versions of the consistuent packages. Di↵s of the model results are computed to assess any di↵erences in caret versions. This process takes approximately 3hrs to complete using make -j 12 on a Mac Pro. Kuhn & Deane–Mayer (Pfizer / Cognius) caret 12 / 19
  • 13. Regression Testing $ R CMD BATCH move_files.R $ cd ~/tmp/2015_04_19_09__6.0-41/ $ make -j 12 -i 2015-04-19 09:13:44: Starting ada 2015-04-19 09:13:44: Starting AdaBag 2015-04-19 09:13:44: Starting AdaBoost.M1 2015-04-19 09:13:44: Starting ANFIS : make: [FH.GBML.RData] Error 1 (ignored) : 2015-04-19 12:03:52: Finished WM 2015-04-19 12:04:48: Finished xyf Kuhn & Deane–Mayer (Pfizer / Cognius) caret 13 / 19
  • 14. Documentation caret originally contained four package vignettes with in–depth descriptions of functionality with examples. However, this added time to R CMD check and was a general pain for CRAN. E↵orts to make the vignettes more computationally e cient (e.g. reducing the number of examples, resamples, etc.) diminished the e↵ectiveness of the documentation. Kuhn & Deane–Mayer (Pfizer / Cognius) caret 14 / 19
  • 15. Documentation The documentation was moved out of the package and to the github IO page. These pages are built using knitr whenever a new version is sent to CRAN. Some advantages are: longer and more relevant examples are available update schedule is under my control dynamic documentation (e.g. D3 network graphs, JS tables) better formatting It currently takes about 4hr to create these (using parallel processing when possible). Kuhn & Deane–Mayer (Pfizer / Cognius) caret 15 / 19
  • 17. roxygen2 Simplified package documentation Automates many parts of the documentation process Special comment block above each function Name, description, arguments, etc. Code and documentation are in the same source file A must have for new packages but hard to convert existing packages caret has 92 .Rd files I’m not in a hurry to re-write them all in roxygen2 format Kuhn & Deane–Mayer (Pfizer / Cognius) caret 17 / 19
  • 18. Required “Optimizations” For example, there is one check that produces a large number of false positive warnings. For example: > bwplot.diff.resamples <- function (x, data, metric = x$metric, ...) { + ## some code + plotData <- subset(plotData, Metric %in% metric) + ## more code + } will trigger a warning that “bwplot.diff.resamples: no visible binding for global variable ’Metric’”. The “solution” is to have a file that is sourced first in the package (e.g. aaa.R) with the line > Metric <- NULL Kuhn & Deane–Mayer (Pfizer / Cognius) caret 18 / 19
  • 19. Judging the Severity of Problems It’s hard to tell which warnings should be ignored and which should not. There is also the issue of inconsistencies related to who is “on duty” when you submit your package. It is hard to believe that someone took the time for this: Description Field: "My awesome R package" R CMD check: "Malformed Description field: should contain one or more complete sentences." Description Field: "This package is awesome" R CMD check: "The Description field should not start with the package name, ’This package’ or similar." Description Field: "Blah blah blah." R CMD check: PASS Kuhn & Deane–Mayer (Pfizer / Cognius) caret 19 / 19