Cutting Edge Predictive Modeling For Classification

- A look at GBM in R

Pankaj Sharma
Oct 25, 2012

The views expressed herein are my own and do not
necessarily represent the views of my past or current
employers or of people I have worked with.

My apologies if some sources have not been
cited. I have made my best attempt to include the
relevant sources known to me.


Performance vs. Complexity

[Chart: predictive performance vs. model type, from simple to complex]

Highly complex models have won numerous competitions, such as the Netflix Prize.


Brief Mention of Modeling Techniques

• Statistical Models
- Linear regression
- Logistic regression
- Naïve Bayes

• Semi-Parametric Models
- Credit Scoring
- GAM: generalized additive model
- GNBC: generalized naïve Bayes classifier

• Algorithmic / Non-Parametric Models
- MARS: multivariate adaptive regression splines
- Gradient Boosting / TreeNet
- SVM: support vector machines
- Random Forests
- k-NN: k-nearest neighbors


Leo Breiman's Philosophy of Data Analysis

• According to Leo Breiman (2001), there are two cultures in the use of statistical modeling to reach conclusions from data:
- One assumes that the data are generated by a given stochastic data model.
- The other uses algorithmic models and treats the data mechanism as unknown.
• The statistical community has been committed to the almost exclusive use of data models.

“How much are we prepared to let the data tell us about the process we are studying?
For Breiman the answer would typically be ‘everything’, whereas for more traditional
statisticians the answer would be far more qualified” – Dan Steinberg

http://projecteuclid.org/DPubS/Repository/1.0/Disseminate?handle=euclid.ss/1009213726&view=body&content-type=pdf_1


Best Off-the-Shelf Predictive Model for Classification

The model should be able to handle it all:
- Missing values
- Variable scaling
- Correlated variables
- Variable transformations
- Categorical variables

In the 1990s: decision trees

Today: stochastic gradient boosting
https://www.salford-systems.com/en/products/treenet


Modeling in R – Some Available Functions

Taken from Revolution R Presentation on Introduction to Data Mining


Advanced Analytics begins in R

- Cutting-edge data mining algorithms are usually implemented in R first.
- R is not difficult to learn. In fact, R is easier and more intuitive than SAS.

Friedman, J. H. "Tutorial: Getting Started with MART in R." (April 2002)

MART with R: MART(tm) is an implementation of the gradient tree boosting methods for predictive data mining (regression and classification) described in "Greedy Function Approximation: A Gradient Boosting Machine" and "Stochastic Gradient Boosting".
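To make gradient tree boosting concrete, here is a minimal base-R sketch of stochastic gradient boosting for a 0/1 response. Each iteration does three things. It draws a random subsample of rows, which is the role of gbm's `bag.fraction`. It fits a one-split tree to the residuals y - p, which are the negative gradient of the Bernoulli deviance. It then takes a shrunken Newton step in each leaf. The function names `fit_stump`, `sgb_fit` and `sgb_predict` are hypothetical, and the data are simulated. This is a simplified illustration, not Friedman's MART code or the gbm package's implementation. It omits missing-value handling, categorical splits and deeper trees.

```r
# Minimal stochastic gradient boosting for a 0/1 response (Bernoulli deviance)
# with depth-1 trees (stumps). Illustrative sketch only.
sigmoid <- function(f) 1 / (1 + exp(-f))

# Find the split (column, threshold) whose two-leaf fit minimizes the
# weighted squared error of the working response r.
fit_stump <- function(X, r, w) {
  best <- list(sse = Inf, var = NA, thr = NA)
  for (j in seq_len(ncol(X))) {
    vals <- sort(unique(X[, j]))
    if (length(vals) < 2) next
    for (thr in (head(vals, -1) + tail(vals, -1)) / 2) {  # midpoints
      left <- X[, j] <= thr
      if (all(left) || !any(left)) next
      ml <- sum(w[left] * r[left]) / sum(w[left])
      mr <- sum(w[!left] * r[!left]) / sum(w[!left])
      sse <- sum(w[left] * (r[left] - ml)^2) + sum(w[!left] * (r[!left] - mr)^2)
      if (sse < best$sse) best <- list(sse = sse, var = j, thr = thr)
    }
  }
  best
}

sgb_fit <- function(X, y, n_trees = 100, shrinkage = 0.1,
                    bag_fraction = 0.5, w = rep(1, length(y))) {
  n <- length(y)
  p0 <- sum(w * y) / sum(w)
  f0 <- log(p0 / (1 - p0))            # start from the weighted log-odds of a 1
  f <- rep(f0, n)
  trees <- list()
  for (m in seq_len(n_trees)) {
    idx <- sample.int(n, floor(bag_fraction * n))  # subsample without replacement
    p <- sigmoid(f[idx])
    r <- y[idx] - p                                # negative gradient of the deviance
    s <- fit_stump(X[idx, , drop = FALSE], r, w[idx])
    if (is.na(s$var)) next
    left <- X[idx, s$var] <= s$thr
    h <- w[idx] * p * (1 - p)
    # One Newton step per leaf, scaled by the learning rate (shrinkage)
    gl <- shrinkage * sum(w[idx][left] * r[left]) / sum(h[left])
    gr <- shrinkage * sum(w[idx][!left] * r[!left]) / sum(h[!left])
    trees[[length(trees) + 1]] <- list(var = s$var, thr = s$thr, gl = gl, gr = gr)
    f <- f + ifelse(X[, s$var] <= s$thr, gl, gr)   # update every row, not just the sample
  }
  list(f0 = f0, trees = trees)
}

sgb_predict <- function(model, X) {
  f <- rep(model$f0, nrow(X))
  for (tr in model$trees) f <- f + ifelse(X[, tr$var] <= tr$thr, tr$gl, tr$gr)
  sigmoid(f)  # predicted probability of a 1
}

# Usage on simulated data: only the first column carries signal
set.seed(42)
X <- matrix(rnorm(400), ncol = 2)
y <- as.numeric(X[, 1] + 0.5 * rnorm(200) > 0)
model <- sgb_fit(X, y, n_trees = 50)
p_hat <- sgb_predict(model, X)
mean((p_hat > 0.5) == y)  # in-sample accuracy
```

Using stumps (`interaction.depth = 1` in gbm terms) keeps the sketch short. Deeper trees would let the model capture interactions between variables.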

http://www-stat.stanford.edu/~jhf/R-MART.html


KDD Cup 2009 winner used R

Entrants were given 50k records with 15k variables of CRM data from a telecommunications
company and tasked with predicting three target variables:
1. Churn (likelihood of a customer switching to another provider)
2. Appetency (propensity to buy)
3. Likelihood of response to up-selling

Winners by track (team: approach)

1st prize
- Fast track: IBM Research: Ensemble Selection for the KDD Cup Orange Challenge
- Slow track: University of Melbourne: Boosting

2nd prize
- Fast track: ID Analytics, Inc.: KDD Cup Fast Scoring on a Large Database
- Slow track: Financial Engineering Group, Inc. Japan: TreeNet Stochastic Gradient Boosting

3rd prize
- Fast track: Old dogs with new tricks (David Slate, Peter W. Frey): none
- Slow track: National Taiwan University, Computer Science and Information Engineering: Fast Scoring on a Large Database using regularized maximum entropy model, categorical/numerical balanced AdaBoost and selective Naive Bayes

Fast challenge: complete within five days of the dataset becoming available. Slow challenge: one-month deadline from dataset availability.

The 1st, 2nd and 3rd prize winners all used some form of gradient boosting.


How to win the KDD Cup Challenge with R and GBM – Hugh Miller

• Feature selection is an important first step for all successful data mining projects.
- For categorical variables, we just took the average number of 1's in the response for each category and used this as a predictor.
- For continuous variables, we split the variable up into "bins", as you would for a histogram, and again took the average number of 1's in the response for each bin as the predictor.
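Here is a minimal base-R sketch of the two encodings described above, assuming a 0/1 response `y`. The helper names are hypothetical. Quantile-based bins are also an assumption, since the post only says "bins", as in a histogram. It is generally safer to compute the category and bin means on training data only and then apply them to validation data. Encoding a variable with its own response leaks information into the fit.

```r
# Replace each category with the mean of the 0/1 response in that category
# (the share of 1's). Hypothetical helper for illustration.
target_mean_encode <- function(x, y) {
  means <- tapply(y, x, mean)
  as.numeric(means[as.character(x)])
}

# Cut a continuous variable into quantile bins, then encode each bin the same
# way. Assumes x takes at least two distinct values.
bin_mean_encode <- function(x, y, n_bins = 10) {
  breaks <- unique(quantile(x, probs = seq(0, 1, length.out = n_bins + 1), na.rm = TRUE))
  bins <- cut(x, breaks = breaks, include.lowest = TRUE)
  target_mean_encode(bins, y)
}

# Usage
region <- c("a", "a", "b", "b", "b", "c")
resp   <- c(1, 0, 1, 1, 0, 0)
target_mean_encode(region, resp)  # values 1/2, 1/2, 2/3, 2/3, 2/3, 0
```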

• The main model was a gradient boosted machine which used the "gbm" package in R. This basically fits a series of small decision trees, up-weighting the observations that are predicted poorly at each iteration. We used Bernoulli loss and also up-weighted the "1" response class. A fair amount of time was spent optimizing the number of trees, how big they should be, etc., but a fit of 5,000 trees only took a bit over an hour to fit.
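The up-weighting of the "1" class and the tuning of the number of trees might look like the code below. It uses gbm's `weights` argument and `gbm.perf()`. Several pieces are assumptions for illustration, since the post does not give these details: the inverse-frequency weights, the helper name `class_weights`, the simulated data, and the held-out 20% of rows used to pick the number of trees. The model is only fitted when the gbm package is installed.

```r
# Up-weight the "1" class so both classes carry the same total weight
# (inverse class frequency; an assumed choice, not the one used in the post).
class_weights <- function(y) ifelse(y == 1, sum(y == 0) / sum(y == 1), 1)

# Simulated data with a rare positive class (illustrative only)
set.seed(1)
d <- data.frame(x1 = rnorm(500), x2 = runif(500))
d$resp <- as.numeric(d$x1 + rnorm(500) > 1.5)
w <- class_weights(d$resp)

if (requireNamespace("gbm", quietly = TRUE)) {
  fit <- gbm::gbm(resp ~ x1 + x2, data = d, weights = w,
                  distribution = "bernoulli", n.trees = 500,
                  interaction.depth = 2, shrinkage = 0.05,
                  bag.fraction = 0.5, train.fraction = 0.8)
  # Number of trees with the lowest deviance on the last 20% of rows
  best <- gbm::gbm.perf(fit, method = "test", plot.it = FALSE)
}
```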

• We used trees to avoid doing much data cleaning: they automatically allow for extreme results, non-linearity and missing values, and handle both categorical and continuous variables. The main adjustment we had to make was to aggregate the smaller categories in the categorical variables, as they tended to distort the fits.
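Aggregating small categories can be sketched in base R as follows. The helper name, the `OTHER` label and the count threshold are assumptions, because the post does not say how small a category had to be before it was merged.

```r
# Merge categories with fewer than min_count rows into a single "OTHER" level.
# Hypothetical helper; the default threshold of 50 is arbitrary.
lump_rare <- function(x, min_count = 50, other = "OTHER") {
  x <- as.character(x)
  counts <- table(x)                         # NA values are not counted
  rare <- names(counts)[counts < min_count]
  x[x %in% rare] <- other
  factor(x)
}

# Usage: "b" and "c" fall below min_count = 3 and are merged
f <- lump_rare(c(rep("a", 5), rep("b", 2), "c"), min_count = 3)
table(f)
```

When scoring new data, any level not seen in training would also need to be mapped to `OTHER`.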

http://www.cybaea.net/Blogs/Data/How-to-win-the-KDD-Cup-Challenge-with-R-and-gbm.html


R / SAS Expertise

The Tipping Point: How Little Things Make a Big Difference (2000)
Blink: The Power of Thinking Without Thinking (2005)
Outliers: The Story of Success (2008)

The same logic applies in SAS: macros in SAS are similar to functions in R.

Taken from Revolution R Presentation on Introduction to Data Mining


Case Study – Current Approach vs. GBM

GBM shows promise of higher lift and response rate in the top decile of validation data
(a historical direct marketing program) compared with the current approach.
[Chart: lift and KS by decile (1 to 10) for the current approach vs. GBM. Lift and KS numbers are masked in this chart.]
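# One way of building the gbm model in R (t: training data frame, resp: 0/1 target)
library(gbm)
mdl <- gbm(resp ~ ., data = t, distribution = "bernoulli", n.trees = 1000,
           interaction.depth = 1, n.minobsinnode = 20, shrinkage = 0.1,
           bag.fraction = 0.5, train.fraction = 1, keep.data = TRUE,
           verbose = TRUE, cv.folds = 3)
# Number of trees that minimizes the cross-validated deviance
best.iter <- gbm.perf(mdl, method = "cv")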

http://cran.r-project.org/web/packages/gbm/index.html
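The decile lift and KS figures in the chart can be computed along these lines. This sketch follows common conventions that the slides do not spell out. Lift is a decile's response rate divided by the overall response rate. KS is the largest gap, in percentage points, between the cumulative shares of responders and non-responders. The helper `decile_table` and the simulated scores are hypothetical. For a hypothetical validation data frame `v`, `score` could instead come from `predict(mdl, newdata = v, n.trees = 1000, type = "response")`.

```r
# Decile lift and KS from model scores and 0/1 outcomes (hypothetical helper).
# Rows are ranked by score, highest first, and split into n_groups equal groups.
decile_table <- function(score, y, n_groups = 10) {
  y <- y[order(score, decreasing = TRUE)]
  n <- length(y)
  group <- ceiling(seq_len(n) * n_groups / n)    # 1 = top-scored group
  resp <- as.vector(tapply(y, group, sum))
  cnt  <- as.vector(tapply(y, group, length))
  rate <- resp / cnt
  cum_resp    <- cumsum(resp) / sum(y)           # share of all responders captured
  cum_nonresp <- cumsum(cnt - resp) / sum(1 - y) # share of all non-responders captured
  data.frame(decile = seq_len(n_groups), n = cnt, resp_rate = rate,
             lift = rate / mean(y),              # 1 = no better than random
             ks = 100 * (cum_resp - cum_nonresp))
}

# Usage: the model's KS statistic is the maximum of the ks column
set.seed(7)
score <- runif(1000)
y <- rbinom(1000, 1, score)
tab <- decile_table(score, y)
max(tab$ks)
```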

My Journey from SAS to R

