Predicting Customer Conversion with Random Forests

Talk given for New England Artificial Intelligence on October 10, 2012.

Usage Rights

© All Rights Reserved

  • Tools that help you decide how to spend limited marketing resources.

Presentation Transcript

  • Predicting Customer Conversion with Random Forests: A Decision Trees Case Study. Daniel Gerlanc, Principal, Enplus Advisors, Inc. www.enplusadvisors.com, dgerlanc@enplusadvisors.com
  • Topics: Objectives (research question, bank prospect data, conversion), Decision Trees, Methods (random forests), Results
  • Objective: Which customers or prospects should you call today? To whom should you offer incentives?
  • Dataset: a direct marketing campaign for bank loans. http://archive.ics.uci.edu/ml/datasets/Bank+Marketing. 45,211 records, 17 features.
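The dataset slide points at the UCI Bank Marketing archive. As a hedged illustration, loading it in Python might look like the sketch below; the semicolon-delimited format and the "y" target column reflect the published dataset, not the slides, and a small inline sample stands in for the real bank-full.csv file.

```python
# Sketch of reading the UCI bank-marketing CSV (semicolon-delimited, quoted
# strings, "y" = takes the loan). Inline sample stands in for bank-full.csv.
import csv
import io

sample = '''"age";"job";"balance";"y"
58;"management";2143;"no"
33;"entrepreneur";2;"no"
35;"management";1350;"yes"'''

rows = list(csv.DictReader(io.StringIO(sample), delimiter=";"))
print(len(rows), sum(r["y"] == "yes" for r in rows))  # 3 records, 1 positive
```

With the real file you would pass `open("bank-full.csv")` instead of the `StringIO` sample.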
  • Dataset
  • Decision Trees
  • Decision Trees (example tree diagram: windy? yes, wear a coat; sunny? no coat)
  • Statistical Decision Trees: randomness, and relationships you may not know ahead of time
  • Decision Trees
  • Splitting: a deterministic process
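The "deterministic process" on this slide is the greedy split search: at each node the tree scans candidate thresholds and keeps the one that minimizes impurity. A toy sketch of that search (illustrative code, not from the talk) using Gini impurity:

```python
# Greedy split search: for each candidate threshold, compute the weighted
# Gini impurity of the two resulting groups and keep the best split.

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Return (threshold, weighted_impurity) of the best binary split."""
    n = len(values)
    best = (None, float("inf"))
    for t in sorted(set(values)):
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best

# Toy data: low balances never take the loan, high balances always do.
balance = [200, 800, 1500, 3000]
taken = ["no", "no", "yes", "yes"]
print(best_split(balance, taken))  # (800, 0.0): splitting at 800 is pure
```

Given fixed data, the same split always wins, which is why a single tree is deterministic; the randomness on the previous slide comes from the data, not the algorithm.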
  • Decision Tree Code: tree.1 <- rpart(takes.loan ~ ., data=bank). See the 'rpart' and 'rpart.plot' R packages. Many parameters are available to control the fit.
  • Make Predictions: predict(tree.1, type="vector")
  • How'd it do? Naïve accuracy: 11.7%. Decision tree precision: 34.8%.

        Actual \ Predicted      no          yes
        no                  (1) 38,904  (3) 3,444
        yes                 (2) 1,018   (4) 1,845
  • Decision Tree Problems: overfitting the data (high variance); may not use all relevant features
  • Random Forests: one decision tree vs. many decision trees (an ensemble)
  • Building RF: sample from the data; at each split, sample from the available variables; repeat for each tree
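The three steps on this slide can be sketched in miniature. The code below is an illustrative toy, not the talk's code: single-split "stumps" stand in for full trees, the per-split feature sample is of size one, and the data are made up.

```python
# Random-forest recipe in miniature: bootstrap-sample the rows, pick a random
# feature for the split, repeat per tree, and predict by majority vote.
import random
from collections import Counter

def train_stump(rows, labels, feature):
    """A one-split 'tree': threshold at the feature's mean, majority label per side."""
    t = sum(r[feature] for r in rows) / len(rows)
    majority = lambda ys: Counter(ys).most_common(1)[0][0] if ys else labels[0]
    left = [y for r, y in zip(rows, labels) if r[feature] <= t]
    right = [y for r, y in zip(rows, labels) if r[feature] > t]
    return (feature, t, majority(left), majority(right))

def random_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample of the rows
        feature = rng.randrange(n_features)          # random feature choice for the split
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], feature))
    return forest

def predict(forest, row):
    """Majority vote across the ensemble."""
    votes = Counter((left if row[feature] <= t else right)
                    for feature, t, left, right in forest)
    return votes.most_common(1)[0][0]

# Toy stand-in data: [age, balance] -> takes the loan?
X = [[25, 200], [30, 400], [50, 2000], [60, 3000]]
y = ["no", "no", "yes", "yes"]
forest = random_forest(X, y)
print(predict(forest, [55, 2500]))
```

Because every tree sees a different row sample and a different feature, the trees disagree in different places, which is exactly the decorrelation the next slide motivates.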
  • Motivations for RF: create uncorrelated trees; variance reduction; subspace exploration
  • Random Forests: rffit.1 <- randomForest(takes.loan ~ ., data=bank). The most important parameters are:

        Variable   Description                                          Default
        ntree      Number of trees                                      500
        mtry       Number of variables randomly selected at each node   square root of # predictors for classification; # predictors / 3 for regression
  • How'd it do? Naïve accuracy: 11.7%. Random forest precision: 64.5% (2,541 / 3,937); recall: 48% (2,541 / 5,289).

        Actual \ Predicted      yes         no
        yes                 (1) 2,541   (3) 2,748
        no                  (2) 1,396   (4) 38,526
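Reading the matrix with rows as actual and columns as predicted, the slide's headline numbers can be re-derived directly from the four cells:

```python
# Recomputing the random-forest metrics from the slide's confusion matrix
# (rows = actual, columns = predicted).
tp, fn = 2_541, 2_748     # actual yes: predicted yes / predicted no
fp, tn = 1_396, 38_526    # actual no:  predicted yes / predicted no

total = tp + fn + fp + tn            # 45,211 records
positives = tp + fn                  # 5,289 customers who took the loan
naive = positives / total            # precision of an always-"yes" baseline
precision = tp / (tp + fp)           # 2,541 / 3,937
recall = tp / (tp + fn)              # 2,541 / 5,289
print(f"{naive:.1%} {precision:.1%} {recall:.1%}")  # 11.7% 64.5% 48.0%
```

The "naïve accuracy" of 11.7% is the positive rate: calling everyone and claiming they will all convert is right 11.7% of the time, so the forest's 64.5% precision is the meaningful comparison.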
  • Tuning RF: rffit.1 <- tuneRF(X, y, mtryStart=1, stepFactor=2, improve=0.05)
  • Benefits of RF: good accuracy with default settings; relatively easy to parallelize; many implementations (R, Weka, RapidMiner, Mahout)
  • References:
    • A. Liaw and M. Wiener (2002). Classification and Regression by randomForest. R News 2(3), 18-22.
    • Breiman, Leo. Classification and Regression Trees. Belmont, Calif.: Wadsworth International Group, 1984. Print.
    • Breiman, Leo and Adele Cutler. Random Forests. http://www.stat.berkeley.edu/~breiman/RandomForests/cc_contact.htm
    • S. Moro, R. Laureano and P. Cortez. Using Data Mining for Bank Direct Marketing: An Application of the CRISP-DM Methodology. In P. Novais et al. (Eds.), Proceedings of the European Simulation and Modelling Conference (ESM 2011), pp. 117-121, Guimarães, Portugal, October 2011. EUROSIS.