Talk given for New England Artificial Intelligence on October 10, 2012.

    1. Predicting Customer Conversion with Random Forests: A Decision Trees Case Study. Daniel Gerlanc, Principal, Enplus Advisors, Inc. (www.enplusadvisors.com, dgerlanc@enplusadvisors.com)
    2. Topics
       • Objectives: research question, bank prospect data, conversion
       • Methods: decision trees, random forests
       • Results
    3. Objective
       • Which customers or prospects should you call today?
       • To whom should you offer incentives?
    4. Dataset
       • Direct marketing campaign for bank loans
       • http://archive.ics.uci.edu/ml/datasets/Bank+Marketing
       • 45,211 records, 17 features
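The dataset can be loaded directly from the UCI file. A minimal sketch, assuming the semicolon-delimited `bank-full.csv` from the archive above has been downloaded to the working directory; the slides call the response `takes.loan`, which corresponds to the `y` column in the UCI file (that renaming is an assumption here):

```r
# Load the UCI bank marketing data (bank-full.csv is semicolon-delimited).
bank <- read.csv("bank-full.csv", sep = ";", stringsAsFactors = TRUE)

# The slides use `takes.loan` for the response; the UCI file names it `y`.
names(bank)[names(bank) == "y"] <- "takes.loan"

dim(bank)  # 45211 records, 17 columns
```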
    5. Dataset
    6. Decision Trees
    7. Decision Trees [diagram: a toy tree splitting on weather (sunny?, windy?) to decide Coat vs. No Coat]
    8. Statistical Decision Trees
       • Randomness
       • May not know the relationships ahead of time
    9. Decision Trees
    10. Splitting
        • A deterministic process
    11. Decision Tree Code
        tree.1 <- rpart(takes.loan ~ ., data=bank)
        • See the 'rpart' and 'rpart.plot' R packages.
        • Many parameters available to control the fit.
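The fit above can be inspected visually with `rpart.plot`. A sketch, assuming the `bank` data frame from the Dataset slide with `takes.loan` as a factor response (with a factor response, `rpart` fits a classification tree by default):

```r
library(rpart)
library(rpart.plot)

# Fit the classification tree from the slide.
tree.1 <- rpart(takes.loan ~ ., data = bank)

rpart.plot(tree.1)  # draw the fitted tree with split labels
printcp(tree.1)     # complexity-parameter table, useful for pruning
```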
    12. Make Predictions
        predict(tree.1, type="vector")
    13. How'd it do?
        Naïve Accuracy: 11.7%
        Decision Tree Recall: 34.8% (1,845 / 5,289)
                     Actual
        Predicted    no          yes
        no           38,904      3,444
        yes          1,018       1,845
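The slide's figures can be reproduced from the confusion matrix: 11.7% is the base rate of converters (5,289 of 45,211), and 34.8% is the 1,845 true positives over all 5,289 actual converters. A sketch:

```r
# Confusion matrix from the slide (rows = predicted, columns = actual).
cm <- matrix(c(38904, 3444,
                1018, 1845),
             nrow = 2, byrow = TRUE,
             dimnames = list(predicted = c("no", "yes"),
                             actual = c("no", "yes")))

sum(cm[, "yes"]) / sum(cm)           # base rate of "yes": 5289 / 45211, ~11.7%
cm["yes", "yes"] / sum(cm[, "yes"])  # 1845 / 5289, ~34.8% (TP / actual positives)
```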
    14. Decision Tree Problems
        • Overfitting the data (high variance)
        • May not use all relevant features
    15. Random Forests
        • One decision tree vs. many decision trees (an ensemble)
    16. Building RF
        • Sample from the data
        • At each split, sample from the available variables
        • Repeat for each tree
    17. Motivations for RF
        • Create uncorrelated trees
        • Variance reduction
        • Subspace exploration
    18. Random Forests
        rffit.1 <- randomForest(takes.loan ~ ., data=bank)
        Most important parameters:
        • ntree: number of trees (default 500)
        • mtry: number of variables randomly selected at each node
          (default: square root of the number of predictors for
          classification; number of predictors / 3 for regression)
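A sketch making those defaults explicit, assuming the `bank` data frame from the Dataset slide: with 16 predictors (17 features minus the response), the classification default is mtry = floor(sqrt(16)) = 4.

```r
library(randomForest)

set.seed(1)  # resampling makes the fit stochastic; fix a seed to reproduce
rffit.1 <- randomForest(takes.loan ~ ., data = bank,
                        ntree = 500,   # the default, stated for clarity
                        mtry = 4,      # sqrt(16) for this classification task
                        importance = TRUE)  # record variable importance

varImpPlot(rffit.1)  # which predictors drive the splits?
```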
    19. How'd it do?
        Naïve Accuracy: 11.7%
        Random Forest
        • Precision: 64.5% (2,541 / 3,937)
        • Recall: 48% (2,541 / 5,289)
                     Predicted
        Actual       yes         no
        yes          2,541       2,748
        no           1,396       38,526
    20. Tuning RF
        rffit.1 <- tuneRF(X, y, mtryStart=1, stepFactor=2, improve=0.05)
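In the call above, tuneRF starts at mtry = mtryStart, scales it by stepFactor at each step, and keeps searching while the out-of-bag error improves by at least the `improve` fraction. A usage sketch, assuming the `bank` data frame from earlier slides supplies the predictors X and response y:

```r
library(randomForest)

# Split the bank data into predictors and response for tuneRF.
X <- bank[, setdiff(names(bank), "takes.loan")]
y <- bank$takes.loan

set.seed(1)
tuned <- tuneRF(X, y, mtryStart = 1, stepFactor = 2, improve = 0.05)
# `tuned` holds the (mtry, OOB error) pairs tried during the search.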
    21. Benefits of RF
        • Good accuracy with default settings
        • Relatively easy to parallelize
        • Many implementations: R, Weka, RapidMiner, Mahout
    22. References
        • A. Liaw and M. Wiener (2002). Classification and Regression by randomForest. R News 2(3), 18-22.
        • Breiman, Leo. Classification and Regression Trees. Belmont, Calif.: Wadsworth International Group, 1984. Print.
        • Breiman, Leo and Adele Cutler. Random Forests. http://www.stat.berkeley.edu/~breiman/RandomForests/cc_contact.htm
        • S. Moro, R. Laureano and P. Cortez. Using Data Mining for Bank Direct Marketing: An Application of the CRISP-DM Methodology. In P. Novais et al. (Eds.), Proceedings of the European Simulation and Modelling Conference (ESM'2011), pp. 117-121, Guimarães, Portugal, October 2011. EUROSIS.
