Random Forests Lightning Talk

Presentation given at the Predictive Analytics World / Boston Predictive Analytics Meetup on 2012-10-01, providing a quick introduction to decision trees and random forests.


    1. Predicting Customer Conversion with Random Forests: A Decision Trees Case Study. Daniel Gerlanc, Principal, Enplus Advisors, Inc. www.enplusadvisors.com, dgerlanc@enplusadvisors.com
    2. Topics. Objectives: research question; bank prospect data; conversion; decision trees. Methods: random forests. Results.
    3. Objective: Which customers or prospects should you call today? To whom should you offer incentives?
    4. Dataset: direct marketing campaign for bank loans. http://archive.ics.uci.edu/ml/datasets/Bank+Marketing. 45,211 records, 17 attributes.
    5. Dataset
    6. Decision Trees
    7. Decision Trees. (Toy tree diagram not captured: e.g. windy: coat; sunny: no coat.)
    8. Statistical Decision Trees: randomness; may not know the relationships ahead of time.
    9. Decision Trees
    10. Splitting: a deterministic process.
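The split search the slide calls deterministic can be sketched in a few lines: every candidate threshold is scored exhaustively, so the same data always yields the same split. A minimal Python sketch (the talk's code is R; the names gini and best_split are illustrative, not taken from the rpart package):

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a collection of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(xs, ys):
    """Scan every candidate threshold on one numeric feature and return
    the (threshold, weighted child impurity) pair with the lowest
    impurity. Same inputs, same answer: no randomness in the split."""
    best = (None, float("inf"))
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (t, score)
    return best
```

For example, best_split([1, 2, 3, 4], [0, 0, 1, 1]) returns (2, 0.0): splitting at 2 separates the two classes perfectly, driving the child impurity to zero.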
    11. Decision Tree Code: tree.1 <- rpart(takes.loan ~ ., data=bank). See the 'rpart' and 'rpart.plot' R packages; many parameters are available to control the fit.
    12. Make Predictions: predict(tree.1, type="vector")
    13. How'd it do? Guessing precision: 11.7%; decision tree: 34.8%. Confusion matrix (rows = predicted, columns = actual): predicted no: 38,904 actual no, 3,444 actual yes; predicted yes: 1,018 actual no, 1,845 actual yes.
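These figures can be recomputed from the confusion matrix itself, as a quick Python arithmetic check (counts copied from the slide; strictly, the quoted "precision" ratio is the true-positive rate, the share of actual converters the model finds, but it is the figure the slide quotes):

```python
# Confusion matrix counts from the slide (rows = predicted, columns = actual).
tn, fn = 38904, 3444  # predicted no:  actual no, actual yes
fp, tp = 1018, 1845   # predicted yes: actual no, actual yes

total = tn + fn + fp + tp        # all 45,211 records from slide 4
base_rate = (fn + tp) / total    # "guessing": overall share of yes
hit_rate = tp / (fn + tp)        # yes found / all actual yes

print(total)                # 45211
print(round(base_rate, 3))  # 0.117 -> the 11.7% guessing baseline
print(round(hit_rate, 3))   # 0.349 -> the slide's 34.8%, up to rounding
```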
    14. Decision Tree Problems: overfitting the data; high variance; not globally optimal.
    15. Random Forests: one decision tree vs. many decision trees (an ensemble).
    16. Building RF: sample from the data; at each split, sample from the available variables; repeat for each tree.
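The three steps on this slide map directly onto a toy implementation: bootstrap the rows, restrict each tree to a random subset of the variables, repeat for every tree, then predict by majority vote. A minimal Python sketch using one-level "stump" trees for brevity (illustrative only; not how the randomForest package is implemented):

```python
import random
from collections import Counter

def fit_stump(rows, labels, feat_ids):
    """Best single-feature threshold split; each side predicts its
    majority label. Returns (feature, threshold, left_label, right_label)."""
    best = None
    for f in feat_ids:
        for t in set(r[f] for r in rows):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            # misclassifications under a majority-vote label on each side
            err = (len(left) - Counter(left).most_common(1)[0][1] +
                   len(right) - Counter(right).most_common(1)[0][1])
            if best is None or err < best[0]:
                best = (err, f, t, Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:  # degenerate bootstrap sample: constant feature values
        lbl = Counter(labels).most_common(1)[0][0]
        return (feat_ids[0], float("-inf"), lbl, lbl)
    return best[1:]

def fit_forest(rows, labels, ntree=25, mtry=1, seed=0):
    """Step 1: bootstrap the rows. Step 2: sample which variables each
    tree may split on. Step 3: repeat for each tree."""
    rng = random.Random(seed)
    forest = []
    for _ in range(ntree):
        idx = [rng.randrange(len(rows)) for _ in rows]   # sample the data
        feats = rng.sample(range(len(rows[0])), mtry)    # sample variables
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Majority vote over the ensemble."""
    votes = [ll if row[f] <= t else rl for f, t, ll, rl in forest]
    return Counter(votes).most_common(1)[0][0]

# Toy data: both features separate converters (1) from non-converters (0).
rows = [(0, 5), (1, 4), (8, 3), (9, 2)]
labels = [0, 0, 1, 1]
forest = fit_forest(rows, labels)
```

On this toy data even bootstrapped stumps vote their way to the right answer; a real forest grows each tree deep, which is where the variance reduction from averaging pays off.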
    17. Why more than one tree? Creates uncorrelated trees; reduces the variance of the predictor; provides continual cross-validation.
    18. Random Forests: rffit.1 <- randomForest(takes.loan ~ ., data=bank). The most important parameters are ntree (number of trees; default 500) and mtry (number of variables to randomly select at each node; default: square root of the number of predictors for classification, number of predictors divided by 3 for regression).
    19. How'd it do? Guessing precision: 11.7%; random forest: 64.5%. Confusion matrix (rows = predicted, columns = actual): predicted no: 38,526 actual no, 1,396 actual yes; predicted yes: 2,748 actual no, 2,541 actual yes.
    20. Benefits of RF: doesn't need a lot of tuning; doesn't need an extra cross-validation step; many implementations (R, Weka, RapidMiner, Mahout).
    21. References: Breiman, Leo. Classification and Regression Trees. Belmont, Calif.: Wadsworth International Group, 1984. Print. Breiman, Leo and Adele Cutler. Random Forests. http://www.stat.berkeley.edu/~breiman/RandomForests/cc_contact.htm. S. Moro, R. Laureano and P. Cortez. Using Data Mining for Bank Direct Marketing: An Application of the CRISP-DM Methodology. In P. Novais et al. (Eds.), Proceedings of the European Simulation and Modelling Conference - ESM2011, pp. 117-121, Guimarães, Portugal, October 2011. EUROSIS.
