Delivered by Max Kuhn, Pfizer Global R&D, and Zachary Deane–Mayer, Cognius, at the inaugural New York R Conference in New York City at Work-Bench on Friday, April 24th, and Saturday, April 25th.
Build systems orchestrate how human-readable source code is translated into executable programs. In a software project, source code changes can induce changes in the build system (a.k.a. build co-changes). It is difficult for developers to identify when build co-changes are necessary due to the complexity of build systems. Prediction of build co-changes works well if there is a sufficient amount of training data to build a model. In practice, however, new projects have only a limited number of changes. Using training data from other projects to predict the build co-changes in a new project can improve prediction performance. We refer to this problem as cross-project build co-change prediction.
In this paper, we propose CroBuild, a novel cross-project build co-change prediction approach that iteratively learns new classifiers. CroBuild constructs an ensemble of classifiers by iteratively building classifiers and assigning them weights according to their prediction error rates. Given that only a small proportion of code changes are build co-changing, we also propose an imbalance-aware approach that, in each iteration, learns a threshold boundary between code changes that are build co-changing and those that are not. To examine the benefits of CroBuild, we perform experiments on 4 large datasets, namely Mozilla, Eclipse-core, Lucene, and Jazz, comprising a total of 50,884 changes. On average, across the 4 datasets, CroBuild achieves an F1-score of up to 0.408. We also compare CroBuild with other approaches: a basic model, AdaBoost proposed by Freund et al., and TrAdaBoost proposed by Dai et al. On average, across the 4 datasets, CroBuild yields an improvement in F1-scores of 41.54%, 36.63%, and 36.97% over the basic model, AdaBoost, and TrAdaBoost, respectively.
The Impact of Code Review Coverage and Participation on Software Quality – Shane McIntosh
Software code review, i.e., the practice of having third-party team members critique changes to a software system, is a well-established best practice in both open source and proprietary software domains. Prior work has shown that the formal code inspections of the past tend to improve the quality of software delivered by students and small teams. However, the formal code inspection process mandates strict review criteria (e.g., in-person meetings and reviewer checklists) to ensure a base level of review quality, while the modern, lightweight code reviewing process does not. Although recent work explores the modern code review process qualitatively, little research quantitatively explores the relationship between properties of the modern code review process and software quality. Hence, in this paper, we study the relationship between software quality and: (1) code review coverage, i.e., the proportion of changes that have been code reviewed, and (2) code review participation, i.e., the degree of reviewer involvement in the code review process. Through a case study of the Qt, VTK, and ITK projects, we find that both code review coverage and participation share a significant link with software quality. Low code review coverage and participation are estimated to produce components with up to two and five additional post-release defects respectively. Our results empirically confirm the intuition that poorly reviewed code has a negative impact on software quality in large systems using modern reviewing tools.
PuppetConf 2016: The Future of Testing Puppet Code – Gareth Rushgrove, Puppet
Here are the slides from Gareth Rushgrove's presentation called The Future of Testing Puppet Code. Watch the videos at https://www.youtube.com/playlist?list=PLV86BgbREluVjwwt-9UL8u2Uy8xnzpIqa
Architecting The Future - WeRise Women in Technology – Daniel Barker
Kubernetes and Docker are two of the top open source projects, and they’re built around abstractions and metadata. These two concepts are the key to architecting in the future. Come with me as I dig a little deeper into these concepts within k8s and Docker and provide some examples from my own work.
A collaborative DevOps culture is crucial to fast project development. With an MPL (modular pipeline library), such collaboration between teams is quick and easy.
Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments – Shane McIntosh
Build systems specify how sources are transformed into deliverables, and hence must be carefully maintained to ensure that deliverables are assembled correctly. Similar to source code, build systems tend to grow in complexity unless specifications are refactored. This paper describes how clone detection can aid in quality assessments that determine if and where build refactoring effort should be applied. We gauge cloning rates in build systems by collecting and analyzing a benchmark comprising 3,872 build systems. Analysis of the benchmark reveals that: (1) build systems tend to have higher cloning rates than other software artifacts, (2) recent build technologies tend to be more prone to cloning, especially of configuration details like API dependencies, than older technologies, and (3) build systems that have fewer clones achieve higher levels of reuse via mechanisms not offered by build technologies. Our findings aided in refactoring a large industrial build system containing 1.1 million lines.
Having trouble managing dependencies with Go? Here's how to resolve those issues using some of the best tools built by the community, for the community.
Actor Concurrency Bugs: A Comprehensive Study on Symptoms, Root Causes, API Usages, and Differences – Raffi Khatchadourian
Actor concurrency is becoming increasingly important in the development of real-world software systems. Although actor concurrency may be less susceptible to some multithreaded concurrency bugs, such as low-level data races and deadlocks, it comes with its own bugs that may be different. However, the fundamental characteristics of actor concurrency bugs, including their symptoms, root causes, API usages, examples, and differences when they come from different sources, are still largely unknown. Actor software development can significantly benefit from a comprehensive qualitative and quantitative understanding of these characteristics, which is the focus of this work, to foster better API documentation, development practices, testing, debugging, repairing, and verification frameworks. To conduct this study, we take the following major steps. First, we construct a set of 186 real-world Akka actor bugs from Stack Overflow and GitHub via manual analysis of 3,924 Stack Overflow questions, answers, and comments and 3,315 GitHub commits, messages, original and modified code snippets, issues, and pull requests. Second, we manually study these actor bugs and their fixes to understand and classify their symptoms, root causes, and API usages. Third, we study the differences between the commonalities and distributions of symptoms, root causes, and API usages of our Stack Overflow and GitHub actor bugs. Fourth, we discuss real-world examples of our actor bugs with these symptoms and root causes. Finally, we investigate the relation of our findings with those of previous work and discuss their implications. A few findings of our study are: (1) symptoms of our actor bugs can be classified into five categories, with Error as the most common symptom and Incorrect Exceptions as the least common, (2) root causes of our actor bugs can be classified into ten categories, with Logic as the most common root cause and Untyped Communication as the least common, (3) a small number of Akka API packages are responsible for most of the API usages by our actor bugs, and (4) our Stack Overflow and GitHub actor bugs can differ significantly in the commonalities and distributions of their symptoms, root causes, and API usages. While some of our findings agree with those of previous work, others sharply contrast.
Presented at CPPP Conference, Paris, on 15th June 2019.
You've inherited some legacy code: it's valuable, but it doesn't have tests, and it wasn't designed to be testable, so you need to start refactoring! But you can't refactor safely until the code has tests, and you can't add tests without refactoring. How can you ever break out of this loop?
I present a new C++ library for applying Llewellyn Falco's "Approval Tests" approach to testing cross-platform C++ code - for both legacy and green-field systems, and a range of testing frameworks.
I describe its use in some real-world situations, including how to quickly lock down the behaviour of legacy code. I show how to quickly achieve good test coverage, even for very large sets of inputs. Finally, I also describe some general techniques I learned along the way.
Preparing and submitting a package to CRAN - Jean Sanderson, Sheffield R Users Group – Paul Richards
Presentation by Jean Sanderson given at the June R Users Group meeting, describing how to build an R package and submit it to CRAN, based on her experiences with the "adwave" package.
Becoming A Plumber: Building Deployment Pipelines - LISA17 – Daniel Barker
A core part of our IT transformation program is the implementation of deployment pipelines for every application. Attendees will learn how to build abstract pipelines that will allow multiple types of applications to fit the same basic pipeline structure. This has been a big win for injecting change and updating legacy applications.
OpenStack documentation includes a series of documents for administrators and API users, all of which need to be translated. The continuous development of these documents complicates translation management, but the process can be automated with continuous integration. This slide deck introduces the process and the technologies used to manage translation during OpenStack document internationalization. It also includes a demo of creating a Chinese version of the manuals.
But when you call me Bayesian I know I’m not the only one – Work-Bench
Delivered by Andrew Gelman, Department of Statistics and Department of Political Science, Columbia University, at the inaugural New York R Conference in New York City at Work-Bench on Friday, April 24th, and Saturday, April 25th.
Delivered by Josh Katz (Graphics Editor, The New York Times) at the 2016 New York R Conference on April 8th and 9th at Work-Bench. See the rest of the conference videos & presentations at http://www.rstats.nyc.
The caret package is a unified interface to a large number of predictive model functions in R – odsc
The caret package is a unified interface to a large number of predictive model functions in R.
The package was first created in 2005, and the home for its source code and documentation has changed several times.
In this talk, we will outline the somewhat unique aspects of the package and how it impacts the development environment (including documentation and testing). Friction points with CRAN and their resolution will also be discussed.
What is new in Go 1.8, and what is expected to come in the Go 1.9 release. Presented at the Golang-Brno meetup group on Feb 28th, 2017.
https://www.meetup.com/Golang-Brno/events/237697083/
DCEU 18: Developing with Docker Containers – Docker, Inc.
Laura Frank Tacho - Director of Engineering, CloudBees
Wouldn't it be great for a new developer on your team to have their dev environment totally set up on their first day? What about having the confidence that your dev environment mirrors testing and prod? Containers enable this to become reality, along with other great benefits like keeping dependencies nice and tidy and making packaged code easier to share. Come learn about the ways containers can help you build and ship software easily, and walk away with two actionable steps you can take to start using Docker containers for development.
Slides that were presented during the WebRTC Qt CMake tutorial at IIT-RTC in October 2017 in Chicago. The slides are not yet complete and will be updated later.
Containerised Testing at Demonware: PyCon Ireland 2016 – Thomas Shaw
A journey through the history of containerised testing and how we use tools such as Docker Compose and Docker Swarm to continuously deliver great gaming experiences for our customers.
Create ReactJS Component & publish as npm package – Andrii Lundiak
How to prepare your (provider) ReactJS component and let your friends (consumers) use it.
Which issues you may face with Babel, Webpack, ESLint, Node, and npm.
When to use the "npm link" approach versus the "npm publish" approach.
What else to read and try.
Reliable from-source builds (Qshare 28 Nov 2023) – Ralf Gommers
Short presentation covering some in-progress work around handling external (non-Python/PyPI) dependencies in Python package metadata and build steps better. Covers PEP 725 and what may come after.
DockerCon EU 2015 in Barcelona
Practical tips for using Docker to run tests during development and CI/CD, plus different strategies for speeding up your test suite by running parallel pipelines in containers.
The jet tool used to demonstrate parallel testing is available here: https://codeship.com/documentation/docker/installation/
PuppetConf 2016: Getting to the Latest Puppet – Nate McCurdy & Elizabeth Wittig Plumb, Puppet
Here are the slides from Nate McCurdy & Elizabeth Wittig Plumb's PuppetConf 2016 presentation, Getting to the Latest Puppet. Watch the videos at https://www.youtube.com/playlist?list=PLV86BgbREluVjwwt-9UL8u2Uy8xnzpIqa
Creating Developer-Friendly Docker Containers with Chaperone – Gary Wisniewski
Chaperone provides a lean, full-featured environment which simplifies development and deployment of container services while adding minimal overhead (a single process does it all).
Presented by Joseph Rickert at the NYC R Conference, April 25 2015.
Good data analysis is reproducible. If someone else can’t independently replicate your results from your data, the consequences can be severe. With R, a major challenge for reproducibility is the ever-changing package ecosystem: it's all too easy to develop an R script using packages, only to find that collaborators download later versions of those packages when they attempt to reproduce your results, and the outcome can be unpredictable!
In this talk I'll introduce the Reproducible R Toolkit and the "checkpoint" package, included with Revolution R Open, and describe some best practices for writing reliable, reproducible R code with packages.
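As a minimal sketch of the workflow described above (the snapshot date and package are illustrative, not from the talk):

# checkpoint() installs and uses every package as it existed on CRAN
# on the given date, so collaborators resolve identical versions.
library(checkpoint)
checkpoint("2015-04-26")

library(ggplot2)  # now resolved against the 2015-04-26 snapshot

Re-running the script later, or on another machine, resolves the same package versions, which is the point.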
Some Pitfalls with Python and Their Possible Solutions v1.0 – Yann-Gaël Guéhéneuc
Python is a very popular programming language that comes with many pitfalls. This presentation describes some of these pitfalls, especially where they could trick unsuspecting object-oriented developers. It proposes solutions, in particular regarding inheritance, which is easily broken by Python's choice of explicit delegation, its method resolution order, and its use of the C3 linearization algorithm. It also discusses some advantages of using Python, especially regarding meta-classes.
Start with version control and experiments management in machine learning – Mikhail Rozhkov
How do you manage the complexity and reproducibility of machine learning projects? What requirements and tools are needed? How do you apply them in your company and projects? Let's start with data and model version control! This talk reviews Data Version Control (DVC), MLflow, and other tools.
Similar to Nobody Knows What It’s Like To Be the Bad Man: The Development Process for the Caret Package
The Next Generational Shift In Enterprise Infrastructure Has Arrived. If SlideShare is broken, please download the report here: https://www.scribd.com/document/352452857/2017-Enterprise-Almanac
AI to Enable Next Generation of People Managers – Work-Bench
In our work with hundreds of top fast growth startups and globally-distributed Fortune 1000 corporations in our enterprise tech ecosystem here in NYC, the most common refrain we hear is: "managing people is hard."
Levelwise PageRank with Loop-Based Dead End Handling Strategy: SHORT REPORT – Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation of ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by a large submission of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
Adjusting primitives for graph: SHORT REPORT / NOTES – Subhajit Sahu
Graph algorithms, like PageRank, commonly operate on Compressed Sparse Row (CSR), an adjacency-list based graph representation. The notes cover the following micro-benchmarks:
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... – John Andrews
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
As Europe's leading economic powerhouse and the fourth-largest #economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like #Russia and #China, #Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in #cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to #AdvancedPersistentThreats (#APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
Nobody Knows What It’s Like To Be the Bad Man: The Development Process for the Caret Package
1. Nobody Knows What It’s Like To Be the Bad Man
The Development Process for the caret Package
Max Kuhn
Pfizer Global R&D
max.kuhn@pfizer.com
Zachary Deane–Mayer
Cognius
zach.mayer@gmail.com
2. Model Function Consistency
Since there are many modeling packages in R written by different people, there are inconsistencies in how models are specified and predictions are created.
For example, many models have only one method of specifying the model (e.g. formula method only).
3. Generating Class Probabilities Using Different Packages
Function               predict Function Syntax
MASS::lda              predict(obj)  (no options needed)
stats:::glm            predict(obj, type = "response")
gbm::gbm               predict(obj, type = "response", n.trees)
mda::mda               predict(obj, type = "posterior")
rpart::rpart           predict(obj, type = "prob")
RWeka::Weka            predict(obj, type = "probability")
caTools::LogitBoost    predict(obj, type = "raw", nIter)
4. The caret Package
The caret package was developed to:
  create a unified interface for modeling and prediction (interfaces to 180 models)
  streamline model tuning using resampling
  provide a variety of “helper” functions and classes for day-to-day model building tasks
  increase computational efficiency using parallel processing
First commits within Pfizer: 6/2005. First version on CRAN: 10/2007.
Website: http://topepo.github.io/caret/
JSS Paper: http://www.jstatsoft.org/v28/i05/paper
Model List: http://topepo.github.io/caret/bytag.html
Many computing sections in APM
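To make the "unified interface" concrete, here is a minimal sketch; the model type (rpart) and dataset (iris) are illustrative choices, not from the slides:

# One syntax for fitting and one for class probabilities,
# regardless of which underlying package does the work.
library(caret)
fit <- train(Species ~ ., data = iris, method = "rpart")
head(predict(fit, newdata = iris, type = "prob"))

Swapping method = "rpart" for another model leaves both calls unchanged, which is exactly the consistency that the table on slide 3 shows is otherwise missing.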
5. Package Dependencies
One thing that makes caret different from most other packages is that it uses code from an abnormally large number (> 80) of other packages.
Originally, these were in the Depends field of the DESCRIPTION file, which caused all of them to be loaded with caret.
For many years, they were moved to Suggests, which solved that issue.
However, their formal dependency in the DESCRIPTION file required CRAN to install hundreds of other packages to check caret. The maintainers were not pleased.
6. Package Dependencies
This problem was somewhat alleviated at the end of 2013 when custom methods were expanded.
Although this functionality had already existed in the package for some time, it was refactored to be more user friendly.
In the process, much of the modeling code was moved out of caret’s R files and into R objects, eliminating the formal dependencies.
Right now, the total number of dependencies is much smaller (2 Depends, 7 Imports, and 25 Suggests).
This still affects testing though (described later). Also:
  1 package is needed for this model and is not installed. (gbm).
  Would you like to try to install it now?
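The gbm prompt above comes from treating model packages as soft dependencies. A sketch of the general pattern (illustrative, not caret's exact internals):

# Check for a Suggests-level package at call time instead of
# declaring a hard dependency in DESCRIPTION.
fitWithGbm <- function(x, y, ...) {
  if (!requireNamespace("gbm", quietly = TRUE)) {
    stop("1 package is needed for this model and is not installed. (gbm)")
  }
  gbm::gbm.fit(x, y, ...)
}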
7. The Basic Release Process
1. create a few dynamic man pages
2. use R CMD check --as-cran to ensure passing CRAN tests and unit tests
3. update all packages (and R)
4. run regression tests and evaluate results
5. send to CRAN
6. repeat
7. repeat
8. install passed caret version
9. generate HTML documentation and sync github io branch
10. profit!
8. Toolbox
RStudio projects: a clean, self-contained package development environment
  - No more setwd('/path/to/some/folder') in scripts
  - Keep track of project-wide standards, e.g. code formatting
  - An RStudio project was the first thing we added after moving the caret repository to github
devtools: automate boring tasks
testthat: automated unit testing
roxygen2: combine source code with documentation
github: source control
9. devtools
devtools::install builds the package and installs it locally
devtools::check:
  1. builds documentation
  2. runs unit tests
  3. builds the tarball
  4. runs R CMD check
devtools::release builds the package and submits it to CRAN
devtools::install_github enables non-CRAN code distribution, or distribution of private packages
devtools::use_travis enables automated unit testing through Travis CI and test coverage reports through Coveralls
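As a usage sketch, the calls named on this slide chain into a simple development loop (the GitHub path is caret's real repository; use_travis has since moved to the usethis package):

devtools::check()                          # docs, unit tests, tarball, R CMD check
devtools::install()                        # build and install locally
devtools::install_github("topepo/caret")   # distribute outside CRAN
devtools::use_travis()                     # one-time Travis CI setup
devtools::release()                        # interactive CRAN submission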
10. Testing
Testing for the package occurs in a few different ways:
  unit tests via testthat and Travis CI
  regression tests for consistency
Automated unit testing via testthat (set up with testthat::use_testthat):
  Unit tests prevent new features from breaking old code
  All functions should have associated tests
  Run during R CMD check --as-cran
  Can specify that certain tests be skipped on CRAN
caret is slowly adding more testthat tests
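A minimal testthat test, as a hedged illustration (the test itself is a toy, not from caret's suite):

library(testthat)

test_that("two-class accuracy is computed as expected", {
  skip_on_cran()  # keep slow or fragile tests off CRAN machines
  pred <- c("a", "a", "b", "b")
  obs  <- c("a", "b", "b", "b")
  expect_equal(mean(pred == obs), 0.75)
})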
11. github + travis + coveralls
Travis and Coveralls are tools for automated unit testing:
  Travis reports test failures and CRAN errors/warnings
  Coveralls reports the % of code covered by unit tests
  Both automatically comment on the PR itself
A contributor submits code via a pull request:
  Travis notifies them of test failures
  Coveralls notifies them to write tests for new functions
  Automated feedback on code quality allows rapid iteration
Code review happens once unit tests and R CMD check pass:
  github supports line-by-line comments
  Usually several more iterations here
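On the coverage side, a sketch of what feeds the Coveralls report; covr is the package commonly used for this, and the calls assume you are in the package directory:

library(covr)
cov <- package_coverage()   # run the package's tests, measure line coverage
percent_coverage(cov)       # the headline number Coveralls posts on each PR
# coveralls(cov)            # upload the results, typically from CI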
12. Regression Testing
Prior to a CRAN release (or whenever required), a comprehensive set of regression tests can be conducted.
All modeling packages are updated to their current CRAN versions.
For each model accessed by train, rfe, and/or sbf, a set of test cases is computed with the production version of caret and the devel version.
First, test cases are evaluated to make sure that nothing has been broken by updated versions of the constituent packages.
Diffs of the model results are computed to assess any differences between caret versions.
This process takes approximately 3 hrs to complete using make -j 12 on a Mac Pro.
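A schematic of the diff step (file names and tolerance are illustrative, not caret's actual harness):

# Compare saved test-case results from the released and devel versions.
prod  <- readRDS("prod_results.rds")    # written by the production caret
devel <- readRDS("devel_results.rds")   # written by the devel caret
all.equal(prod, devel, tolerance = 1e-8)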
14. Documentation
caret originally contained four package vignettes with in-depth descriptions of functionality, with examples.
However, this added time to R CMD check and was a general pain for CRAN.
Efforts to make the vignettes more computationally efficient (e.g. reducing the number of examples, resamples, etc.) diminished the effectiveness of the documentation.
15. Documentation
The documentation was moved out of the package and onto the github IO page.
These pages are built using knitr whenever a new version is sent to CRAN. Some advantages are:
  longer and more relevant examples are available
  the update schedule is under my control
  dynamic documentation (e.g. D3 network graphs, JS tables)
  better formatting
It currently takes about 4 hrs to create these (using parallel processing when possible).
17. roxygen2
Simplified package documentation:
  Automates many parts of the documentation process
  A special comment block above each function gives the name, description, arguments, etc.
  Code and documentation are in the same source file
A must-have for new packages, but hard to convert existing packages:
  caret has 92 .Rd files
  I’m not in a hurry to re-write them all in roxygen2 format
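For readers who have not seen the format, a minimal roxygen2 block (the function is illustrative):

#' Rescale a numeric vector to the unit interval
#'
#' @param x A numeric vector.
#' @return A numeric vector of the same length as \code{x}.
#' @export
rescale01 <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

Running devtools::document() turns each such block into a generated .Rd file.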
18. Required “Optimizations”
There is one check that produces a large number of false positive warnings. For example:
> bwplot.diff.resamples <- function (x, data, metric = x$metric, ...) {
+ ## some code
+ plotData <- subset(plotData, Metric %in% metric)
+ ## more code
+ }
will trigger a warning that “bwplot.diff.resamples: no visible binding for global variable ’Metric’”.
The “solution” is to have a file that is sourced first in the package (e.g. aaa.R) with the line
> Metric <- NULL
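An alternative to the NULL-assignment workaround, not mentioned on the slide, is to declare the non-standard-evaluation names explicitly:

# Tell R CMD check that "Metric" is not an undefined global.
utils::globalVariables("Metric")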
19. Judging the Severity of Problems
It’s hard to tell which warnings should be ignored and which should not. There is also the issue of inconsistencies related to who is “on duty” when you submit your package.
It is hard to believe that someone took the time for this:
Description Field: "My awesome R package"
R CMD check: "Malformed Description field: should contain one or more complete sentences."
Description Field: "This package is awesome"
R CMD check: "The Description field should not start with the package name, ’This package’ or similar."
Description Field: "Blah blah blah."
R CMD check: PASS