Random Forests Lightning Talk

Presentation given at the Predictive Analytics World/Boston Predictive Analytics Meetup on 2012-10-01, providing a quick introduction to decision trees and random forests.

  • Slide note: Tools that help you decide how to spend those limited resources.

Transcript

  • 1. Predicting Customer Conversion with Random Forests: A Decision Trees Case Study. Daniel Gerlanc, Principal, Enplus Advisors, Inc. (www.enplusadvisors.com, dgerlanc@enplusadvisors.com)
  • 2. Topics: Objectives, Research Question, Bank Prospect Data, Conversion, Decision Trees, Methods, Random Forests, Results
  • 3. Objective: Which customers or prospects should you call today? To whom should you offer incentives?
  • 4. Dataset: Direct marketing campaign for bank loans. http://archive.ics.uci.edu/ml/datasets/Bank+Marketing. 45,211 records, 17 attributes.
  • 5. Dataset
  • 6. Decision Trees
  • 7. Decision Trees (example diagram: a small tree splitting on Windy and Sunny, with Coat / No Coat leaves)
  • 8. Statistical Decision Trees• Randomness• May not know the relationships ahead of time
  • 9. Decision Trees
  • 10. Splitting: a deterministic process
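The "deterministic process" on this slide is typically impurity minimization: at each node, the tree evaluates every candidate split and keeps the one that most reduces an impurity measure such as the Gini index. A minimal base-R sketch (the function names here are illustrative, not from rpart):

```r
# Gini impurity of a vector of class labels: 1 - sum(p_k^2)
gini <- function(y) {
  p <- table(y) / length(y)
  1 - sum(p^2)
}

# Weighted impurity after splitting labels y by a logical condition
split_impurity <- function(y, cond) {
  n <- length(y)
  left <- y[cond]
  right <- y[!cond]
  (length(left) / n) * gini(left) + (length(right) / n) * gini(right)
}

# Example: a perfect split drives the weighted impurity to 0
y <- c("yes", "yes", "no", "no")
x <- c(1, 2, 10, 20)
split_impurity(y, x < 5)   # 0: both children are pure
```

The splitting procedure simply tries each variable and threshold, scores each with `split_impurity`, and picks the minimum, which is why the same data always yields the same tree.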
  • 11. Decision Tree Code: tree.1 <- rpart(takes.loan ~ ., data=bank). See the 'rpart' and 'rpart.plot' R packages. Many parameters available to control the fit.
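A runnable version of the slide's call, using the kyphosis data that ships with rpart (the talk's bank data must be downloaded from the UCI repository first, so it is swapped out here):

```r
library(rpart)

# Fit a classification tree; minsplit and cp are two of the many
# fit-control parameters the slide alludes to.
tree.1 <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                method = "class",
                control = rpart.control(minsplit = 10, cp = 0.01))

printcp(tree.1)   # complexity table: one row per candidate pruning level
```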
  • 12. Make Predictions: predict(tree.1, type="vector")
  • 13. How'd it do? Guessing Precision: 11.7%. Decision Tree: 34.8%.

        Predicted \ Actual      no          yes
        no                  (1) 38,904  (3) 3,444
        yes                 (2) 1,018   (4) 1,845
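The slide's 34.8% is computable from the matrix: it is TP / (TP + FN), the fraction of actual "yes" customers the tree finds (usually called recall or sensitivity), and the 11.7% baseline is the overall "yes" rate in the data. In R:

```r
# Confusion matrix from the slide: rows = predicted, columns = actual
cm <- matrix(c(38904, 1018, 3444, 1845), nrow = 2,
             dimnames = list(predicted = c("no", "yes"),
                             actual    = c("no", "yes")))

tp <- cm["yes", "yes"]   # 1,845 conversions the tree caught
fn <- cm["no",  "yes"]   # 3,444 conversions the tree missed

recall   <- tp / (tp + fn)             # ~0.349, the slide's 34.8%
baseline <- sum(cm[, "yes"]) / sum(cm) # ~0.117, the 11.7% guess rate
```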
  • 14. Decision Tree Problems: overfitting the data; high variance; not globally optimal.
  • 15. Random Forests: one decision tree vs. many decision trees (an ensemble).
  • 16. Building RF: sample from the data; at each split, sample from the available variables; repeat for each tree.
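The three steps above are bagging plus random feature selection. A toy base-R sketch of just the sampling (the split-finding itself is omitted, and all names are illustrative):

```r
set.seed(42)
n_rows <- 150
vars <- paste0("x", 1:8)

# 1. Bootstrap-sample the rows (with replacement)
boot_rows <- sample(n_rows, n_rows, replace = TRUE)

# 2. At each split, consider only a random subset of the variables;
#    the classification default is sqrt(# predictors)
mtry <- floor(sqrt(length(vars)))

# 3. Repeat for each tree in the forest (here, 500 trees)
forest_samples <- lapply(1:500, function(i) {
  list(rows = sample(n_rows, n_rows, replace = TRUE),
       split_vars = sample(vars, mtry))
})
```

Each tree sees a different bootstrap sample and different split candidates, which is what decorrelates the trees on the next slide.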
  • 17. Why more than one? Creates uncorrelated trees; reduces the variance of the predictor; provides continual cross-validation (via each tree's out-of-bag samples).
  • 18. Random Forests: rffit.1 <- randomForest(takes.loan ~ ., data=bank)

        Most important parameters:
        ntree: number of trees (default: 500)
        mtry: number of variables to randomly select at each node (default: square root of # predictors for classification; # predictors / 3 for regression)
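A runnable version of the slide's call, swapping in the iris data that ships with R (the bank data must be downloaded separately):

```r
library(randomForest)

set.seed(1)
rf.1 <- randomForest(Species ~ ., data = iris,
                     ntree = 500,  # default number of trees
                     mtry  = 2)    # floor(sqrt(4 predictors)): the classification default

rf.1$confusion  # out-of-bag confusion matrix: the "continual cross-validation"
```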
  • 19. How'd it do? Guessing Precision: 11.7%. Random Forest: 64.5%.

        Predicted \ Actual      no          yes
        no                  (1) 38,526  (3) 1,396
        yes                 (2) 2,748   (4) 2,541
  • 20. Benefits of RF: don't need a lot of tuning; don't need an extra cross-validation step; many implementations (R, Weka, RapidMiner, Mahout).
  • 21. References:

        Breiman, Leo. Classification and Regression Trees. Belmont, Calif: Wadsworth International Group, 1984. Print.

        Breiman, Leo and Adele Cutler. Random Forests. http://www.stat.berkeley.edu/~breiman/RandomForests/cc_contact.htm

        S. Moro, R. Laureano and P. Cortez. Using Data Mining for Bank Direct Marketing: An Application of the CRISP-DM Methodology. In P. Novais et al. (Eds.), Proceedings of the European Simulation and Modelling Conference - ESM2011, pp. 117-121, Guimarães, Portugal, October, 2011. EUROSIS.