Cutting Edge Predictive Modeling for Classification: A Look at GBM in R. Pankaj Sharma, Oct 25, 2012
Cutting Edge Predictive Modeling For Classification Problems in Direct Marketing using GBM in R


  1. Cutting Edge Predictive Modeling for Classification: A Look at GBM in R. Pankaj Sharma, Oct 25, 2012
  2. The views expressed herein are my own and do not necessarily represent the views of my past or current employer or of people I have worked with. My apologies if some sources have not been cited. I have made my best attempt to include the relevant sources known to me.
  3. Performance vs. Complexity
     • [Chart: predictive performance (y-axis) against model type, from simple to complex (x-axis).]
     • Highly complex models have won numerous competitions, for example the Netflix Prize.
  4. Brief Mention of Modeling Techniques
     • Statistical models
       - Linear regression
       - Logistic regression
       - Naïve Bayes
     • Semi-parametric models
       - Credit scoring
       - GAM: generalized additive model
       - GNBC: generalized naïve Bayes classifier
     • Algorithmic / non-parametric models
       - MARS: multivariate adaptive regression splines
       - Gradient boosting / TreeNet
       - SVM: support vector machines
       - Random forests
       - kNN: k-nearest neighbors
  5. Leo Breiman's Philosophy of Data Analysis
     • According to Leo Breiman (2001), there are two cultures in the use of statistical modeling to reach conclusions from data.
       - One assumes that the data are generated by a given stochastic data model.
       - The other uses algorithmic models and treats the data mechanism as unknown.
     • The statistical community has been committed to the almost exclusive use of data models.
     • “How much are we prepared to let the data tell us about the process we are studying? For Breiman the answer would typically be ‘everything’, whereas for more traditional statisticians the answer would be far more qualified.” (Dan Steinberg)
     • http://projecteuclid.org/DPubS/Repository/1.0/Disseminate?handle=euclid.ss/1009213726&view=body&content-type=pdf_1
  6. Best Off-the-Shelf Predictive Model for Classification
     • The model should be able to handle it all: missing values, variable scaling, correlated variables, variable transformations, and categorical variables.
     • In the 1990s: decision trees. Today: stochastic gradient boosting.
     • https://www.salford-systems.com/en/products/treenet
  7. Modeling in R: Some Available Functions
     • [Figure taken from the Revolution R presentation on Introduction to Data Mining.]
  8. Advanced Analytics Begins in R
     • Cutting-edge data mining: algorithms are usually implemented in R first.
     • R is not difficult to learn. In fact, R is easier and more intuitive than SAS.
     • Friedman, J. H. "Tutorial: Getting Started with MART in R." (April 2002)
     • MART with R: MART(tm) is an implementation of the gradient tree boosting methods for predictive data mining (regression and classification) described in "Greedy Function Approximation: A Gradient Boosting Machine" and "Stochastic Gradient Boosting".
     • http://www-stat.stanford.edu/~jhf/R-MART.html
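To make the idea of stochastic gradient tree boosting concrete, here is a toy sketch in base R. It is not MART or the gbm package: it assumes a single numeric predictor, uses one-split trees (stumps), takes plain mean-residual leaf values rather than Friedman's Newton-style leaf updates, and every function name (`fit_stump`, `toy_gbm`, and so on) is made up for illustration.

```r
# Toy stochastic gradient boosting with stumps and Bernoulli loss.
# Illustrative assumptions: one numeric predictor, mean-residual leaf values.

# Fit a one-split regression tree to residuals r by minimizing squared error.
fit_stump <- function(x, r) {
  ux <- sort(unique(x))
  if (length(ux) < 2) {
    return(list(cut = Inf, left = mean(r), right = mean(r)))
  }
  # Candidate thresholds: midpoints between neighbouring unique values
  cuts <- (ux[-length(ux)] + ux[-1]) / 2
  sse <- sapply(cuts, function(cc) {
    lo <- r[x <= cc]
    hi <- r[x > cc]
    sum((lo - mean(lo))^2) + sum((hi - mean(hi))^2)
  })
  best <- cuts[which.min(sse)]
  list(cut = best, left = mean(r[x <= best]), right = mean(r[x > best]))
}

predict_stump <- function(stump, x) {
  ifelse(x <= stump$cut, stump$left, stump$right)
}

toy_gbm <- function(x, y, n_trees = 100, shrinkage = 0.1, bag_fraction = 0.5) {
  p0 <- mean(y)
  f0 <- log(p0 / (1 - p0))            # start from the overall log-odds
  f <- rep(f0, length(y))
  stumps <- vector("list", n_trees)
  for (m in seq_len(n_trees)) {
    p <- 1 / (1 + exp(-f))
    resid <- y - p                    # negative gradient of the Bernoulli negative log-likelihood
    # "Stochastic" part: each tree sees only a random subsample of rows
    idx <- sample(length(y), size = ceiling(bag_fraction * length(y)))
    stumps[[m]] <- fit_stump(x[idx], resid[idx])
    f <- f + shrinkage * predict_stump(stumps[[m]], x)
  }
  list(f0 = f0, stumps = stumps, shrinkage = shrinkage)
}

predict_toy_gbm <- function(model, x) {
  f <- rep(model$f0, length(x))
  for (s in model$stumps) {
    f <- f + model$shrinkage * predict_stump(s, x)
  }
  1 / (1 + exp(-f))                   # convert log-odds to probabilities
}

# Simulated data: response rate jumps from 0.2 to 0.8 at x = 0.5
set.seed(1)
x <- runif(500)
y <- rbinom(500, 1, ifelse(x > 0.5, 0.8, 0.2))
model <- toy_gbm(x, y)
p_hat <- predict_toy_gbm(model, x)
```

The arguments `n_trees`, `shrinkage`, and `bag_fraction` are named to echo gbm's `n.trees`, `shrinkage`, and `bag.fraction`, but the resemblance is only conceptual.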
  9. KDD Cup 2009 Winner Used R
     • Entrants were given 50k records with 15k variables of CRM data from a telecommunication company and tasked with predicting three target variables:
       1. Churn (likelihood of a customer switching to another provider)
       2. Appetency (propensity to buy)
       3. Likelihood of response to up-selling
     • Fast challenge: complete in five days. Slow challenge: one-month deadline from dataset availability.
     • 1st prize
       - Fast Track: IBM Research, "Ensemble Selection for the KDD Cup Orange Challenge"
       - Slow Track: University of Melbourne, "Boosting"
     • 2nd prize
       - Fast Track: ID Analytics, Inc, "KDD Cup Fast Scoring on a Large Database - TreeNet"
       - Slow Track: Financial Engineering Group, Inc. Japan, "Stochastic Gradient Boosting"
     • 3rd prize
       - Fast Track: Old dogs with new tricks (David Slate, Peter W. Frey), None
       - Slow Track: National Taiwan University, Computer Science and Information Engineering, "Fast Scoring on a Large Database using regularized maximum entropy model, categorical/numerical balanced AdaBoost and selective Naive Bayes"
     • The 1st, 2nd, and 3rd prize winners used some form of gradient boosting.
  10. How to Win the KDD Cup Challenge with R and GBM (Hugh Miller)
     • Feature selection is an important first step for all successful data mining projects.
       - For categorical variables we just took the average number of 1s in the response for each category and used this as a predictor.
       - For continuous variables we split the variable up into "bins", as you would via histogram, and again took the average number of 1s in the response for each bin as the predictor.
     • The main model was a gradient boosted machine which used the "gbm" package in R. This basically fits a series of small decision trees, up-weighting the observations that are predicted poorly at each iteration. We used Bernoulli loss and also up-weighted the "1" response class. A fair amount of time was spent optimizing the number of trees, how big they should be, etc., but a fit of 5,000 trees only took a bit over an hour to fit.
     • We used trees to avoid doing much data cleaning. They automatically allow for extreme results, non-linearity, missing values and handle both categorical and continuous variables. The main adjustment we had to make was to aggregate the smaller categories in the categorical variables, as they tended to distort the fits.
     • http://www.cybaea.net/Blogs/Data/How-to-win-the-KDD-Cup-Challenge-with-R-and-gbm.html
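The feature preparation described on this slide can be sketched in base R as follows. This is a minimal illustration, not Hugh Miller's actual code: the function names, the `min_n` cutoff, the "OTHER" label, and the choice of equal-width bins are all assumptions.

```r
# Mean-response encoding sketch. All names and defaults are illustrative.

# Pool categories with fewer than min_n rows into a single "OTHER" level.
collapse_rare <- function(x, min_n = 50, other = "OTHER") {
  x <- as.character(x)
  counts <- table(x)
  rare <- names(counts)[counts < min_n]
  x[x %in% rare] <- other
  x
}

# Replace each category with the average of the 0/1 response in that category.
encode_mean_response <- function(x, y) {
  x <- as.character(x)
  rates <- tapply(y, x, mean)
  as.vector(rates[x])
}

# Split a continuous variable into equal-width bins (as a histogram would),
# then encode each bin by its average response.
encode_binned <- function(x, y, n_bins = 10) {
  bins <- cut(x, breaks = n_bins)
  encode_mean_response(bins, y)
}

# Tiny made-up data set
d <- data.frame(
  plan  = c("a", "a", "a", "b", "b", "c"),
  usage = c(1, 2, 3, 10, 11, 12),
  resp  = c(1, 0, 1, 0, 0, 1)
)
d$plan2      <- collapse_rare(d$plan, min_n = 2)
d$plan_rate  <- encode_mean_response(d$plan2, d$resp)
d$usage_rate <- encode_binned(d$usage, d$resp, n_bins = 2)
```

Because the encoding uses the response itself, the rates would normally be computed on training rows only and then looked up for validation rows. The sketch skips that step for brevity.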
  11. R / SAS Expertise
     • The Tipping Point: How Little Things Can Make a Big Difference (2000)
     • Blink: The Power of Thinking Without Thinking (2005)
     • Outliers: The Story of Success (2008)
     • The same logic applies in SAS: macros in SAS are similar to functions in R.
     • Taken from the Revolution R presentation on Introduction to Data Mining.
  12. Case Study: Current Approach vs. GBM
     • GBM shows promise of higher lift and response rate in the top decile of validation data (historical direct marketing program) when compared to the current approach.
     • [Chart: lift and KS by decile (1 to 10) for the current approach and for GBM. Lift and KS numbers are masked in this chart.]
     • One way of building the gbm model in R:

         library(gbm)
         mdl <- gbm(resp ~ .,                  # 0/1 response against all other columns
                    data = t,                  # training data frame (named t on the slide)
                    distribution = "bernoulli",
                    n.trees = 1000,
                    interaction.depth = 1,
                    n.minobsinnode = 20,
                    shrinkage = 0.1,
                    bag.fraction = 0.5,
                    train.fraction = 1,
                    keep.data = TRUE,
                    verbose = TRUE,
                    cv.folds = 3)

     • http://cran.r-project.org/web/packages/gbm/index.html
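The lift and KS curves on this slide are masked, but the calculation behind such a chart can be sketched in base R. The function name `decile_table` and the simulated scores are assumptions for illustration, not the case study's data or code. The sketch assumes at least as many records as groups.

```r
# Decile lift and KS table from model scores and a 0/1 response (base R sketch).
decile_table <- function(score, resp, n_groups = 10) {
  ord <- order(score, decreasing = TRUE)
  resp <- resp[ord]
  n <- length(resp)
  # Group 1 holds the highest-scoring tenth of the records
  grp <- ceiling(seq_len(n) * n_groups / n)
  responders <- as.vector(tapply(resp, grp, sum))
  records <- as.vector(tapply(resp, grp, length))
  # Cumulative share of responders and non-responders captured so far
  cum_resp <- cumsum(responders) / sum(resp)
  cum_nonresp <- cumsum(records - responders) / sum(1 - resp)
  data.frame(
    decile = seq_len(n_groups),
    records = records,
    response_rate = responders / records,
    cum_lift = (cumsum(responders) / cumsum(records)) / mean(resp),
    ks = cum_resp - cum_nonresp
  )
}

# Simulated validation data: higher scores respond more often
set.seed(42)
n <- 1000
score <- runif(n)
resp <- rbinom(n, 1, score)
tab <- decile_table(score, resp)
ks_statistic <- max(tab$ks)   # the KS statistic is the largest gap across deciles
```

With a fitted model such as `mdl` from the slide, the scores could come from something like `predict(mdl, newdata = valid, n.trees = gbm.perf(mdl, method = "cv"), type = "response")`, where `valid` is a hypothetical hold-out data frame.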
  13. My Journey from SAS to R
