
TreeNet Tree Ensembles & CART Decision Trees: A Winning Combination


Understand the pros and cons of CART decision trees, how TreeNet stochastic gradient boosting can help overcome single-tree challenges, and the advantages of using CART and TreeNet in combination for predictive modeling.

Published in: Technology, Business


  1. TreeNet Tree Ensembles and CART Decision Trees: A Winning Combination
     October 2012
     Mikhail Golovnya, Salford Systems
     CART® software is a trademark of California Statistical Software, Inc. and is licensed exclusively to Salford Systems. TreeNet® software is a trademark of Salford Systems.
  2. Course Outline
     • CART decision tree pros/cons
     • TreeNet stochastic gradient boosting: a promising way to overcome the shortcomings of a single tree
     • Introducing TreeNet, a powerful modern ensemble of boosted trees
       o Methodology
       o Reporting
       o Interpretability
       o Post-processing
       o Interaction detection
     • Advantages of using both CART and TreeNet
       o Contribution from CART
       o Contribution from TreeNet
     © Salford Systems 2012
  3. Demonstration Dataset
     108,376 bank customers (commercial and individual), with 6,564 in bad standing over the past two years.
     Goal: identify customers in bad standing using the following predictors:
     • Revolving utilization of credit
     • Age of the primary account holder
     • Debt ratio of the primary account holder
     • Monthly income
     • Number of open credit lines
     • Number of mortgages
     • Number of dependents
  4. CART Advantages
     1. Relatively fast
     2. Handles all types of variables
        1. Numeric, binary, categorical, missing values
     3. Invariant under monotone transformations
        1. Variable scales are irrelevant
        2. Immunity to outliers
        3. Most variables can be used "as is"
     4. Resistance to many irrelevant variables
     5. Few tunable parameters: an "off-the-shelf" procedure
     6. Interpretable model representation
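The invariance claim on the CART Advantages slide can be illustrated with a small sketch. It is not CART itself: it is a plain least-squares search for a single split, written for this example, and the income values and bad-standing flags are made up rather than taken from the demonstration dataset. Because the search only looks at the ordering of the predictor, taking a logarithm (a strictly increasing transformation) leaves the chosen partition unchanged, even with an extreme outlier present.

```python
import math

def best_split_partition(xs, ys):
    """Return the set of row indices sent to the left child by the
    single split on xs that minimises the sum of squared errors."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best_sse, best_left = float("inf"), None
    for k in range(1, len(order)):
        # A split cannot separate two rows with the same predictor value.
        if xs[order[k - 1]] == xs[order[k]]:
            continue
        left, right = order[:k], order[k:]
        sse = 0.0
        for side in (left, right):
            mean = sum(ys[i] for i in side) / len(side)
            sse += sum((ys[i] - mean) ** 2 for i in side)
        if sse < best_sse:
            best_sse, best_left = sse, frozenset(left)
    return best_left

# Hypothetical monthly incomes (one extreme outlier) and bad-standing flags.
income = [1200.0, 3500.0, 800.0, 250000.0, 4100.0, 2900.0]
bad = [1, 0, 1, 0, 0, 0]

raw_partition = best_split_partition(income, bad)
log_partition = best_split_partition([math.log(v) for v in income], bad)

# The same rows go left whether income is used raw or on a log scale.
same_partition = raw_partition == log_partition
print(same_partition, sorted(raw_partition))
```

Only the rank order of `income` enters the search, which is why rescaling the variable or stretching the outlier further does not change which customers end up in which segment.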
  5. CART Disadvantages
     1. Trade-off: accuracy vs. interpretability
     2. Piecewise-constant model
        1. Big errors near region boundaries
        2. Impossible to detect fine differences within a segment
     3. Instability => high variance
        1. Small data change => big model change (especially for large trees)
     4. Data fragmentation – splitting
     5. High interaction order model: an unreasonably complicated way to represent simple additive dependencies
  6. TreeNet Tree Ensembles
     • Complements CART advantages, while dramatically increasing accuracy
     • Model structure: Tree 1 + Tree 2 + Tree 3 + ...
       o Tree 1: first tree grown on the original target. Intentionally a "weak" model.
       o Tree 2: second tree grown on residuals from the first. Predictions made to improve the first tree.
       o Tree 3: third tree grown on residuals from the model consisting of the first two trees.
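The tree-by-tree scheme on the TreeNet Tree Ensembles slide can be sketched as least-squares boosting with two-node trees. This is an illustration under assumptions, not TreeNet's implementation: the function names, the `learning_rate` shrinkage value, the number of trees, and the sine-curve data are all hypothetical, and the random subsampling that makes gradient boosting "stochastic" is left out for brevity.

```python
import math

def fit_stump(xs, targets):
    """Fit a two-node regression tree; return (threshold, left_mean, right_mean)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best = None
    for k in range(1, len(order)):
        lo, hi = xs[order[k - 1]], xs[order[k]]
        if lo == hi:
            continue  # cannot split between identical values
        left, right = order[:k], order[k:]
        left_mean = sum(targets[i] for i in left) / len(left)
        right_mean = sum(targets[i] for i in right) / len(right)
        sse = (sum((targets[i] - left_mean) ** 2 for i in left)
               + sum((targets[i] - right_mean) ** 2 for i in right))
        if best is None or sse < best[0]:
            best = (sse, (lo + hi) / 2, left_mean, right_mean)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

def boost(xs, ys, n_trees=50, learning_rate=0.3):
    """Grow each tree on the residuals of the model built so far."""
    preds = [0.0] * len(ys)  # start from zero, so tree 1 sees the original target
    stumps, mse_path = [], []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Shrink each tree's contribution so every tree stays a "weak" learner.
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
        mse_path.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    return stumps, mse_path

def predict(stumps, learning_rate, x):
    """Ensemble prediction: the shrunken sum of all tree responses."""
    return sum(learning_rate * predict_stump(s, x) for s in stumps)

xs = [i / 10 for i in range(40)]
ys = [math.sin(x) for x in xs]  # a smooth target no single stump can follow
stumps, mse_path = boost(xs, ys)
print(len(stumps), mse_path[0] > mse_path[-1])
```

Each individual tree is a coarse step function, but the training error recorded in `mse_path` keeps falling as trees are added, which is the sense in which many weak trees add up to a finer model.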
  7. TreeNet Overcomes CART's Shortcomings
     • Piecewise-Constant Model
       o CART: big errors near region boundaries, coarse predictions
       o TreeNet: fine predictions, nearly emulating a smooth continuous response surface
     • Instability and Variance
       o CART: small data changes induce big model changes (especially for large trees)
       o TreeNet: stable models due to averaging of individual tree responses
     • Data Fragmentation
       o CART: relatively few predictors make it into the model
       o TreeNet: each tree works with the entire data – many opportunities for variables to enter
     • High Interaction Order Model
       o CART: always enforced
       o TreeNet: allows precise control over the interactions
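The last comparison, control over interactions, can be made concrete with a small check. The slides do not say how TreeNet exposes this control; the sketch below only shows one well-known mechanism in boosted-tree models, namely that an ensemble of two-node trees, each splitting on a single predictor, is purely additive, while one deeper tree that splits on two predictors is not. All split points and leaf values are hypothetical.

```python
def stump_ensemble(row):
    """Sum of hypothetical two-node trees, each using a single predictor."""
    stumps = [
        # (predictor index, threshold, left value, right value)
        (0, 0.5, -0.25, 0.5),
        (1, 40.0, 0.75, -0.5),
        (0, 0.9, 0.0, 1.0),
    ]
    return sum(lv if row[f] <= t else rv for f, t, lv, rv in stumps)

def depth_two_tree(row):
    """A single hypothetical tree whose second split depends on the first."""
    if row[0] <= 0.5:
        return 0.0
    return 1.0 if row[1] <= 40.0 else 0.25

def interaction_contrast(model, a_lo, a_hi, b_lo, b_hi):
    """Difference of differences: zero whenever the model is additive in a and b."""
    return (model((a_hi, b_hi)) - model((a_hi, b_lo))
            - model((a_lo, b_hi)) + model((a_lo, b_lo)))

# Predictor 0 plays the role of revolving utilization, predictor 1 of age.
additive = interaction_contrast(stump_ensemble, 0.2, 0.95, 30.0, 55.0)
interacting = interaction_contrast(depth_two_tree, 0.2, 0.95, 30.0, 55.0)
print(additive, interacting)  # prints: 0.0 -0.75
```

With stumps, the effect of one predictor never depends on the value of another, so the contrast is exactly zero; allowing larger trees is what lets interactions enter the model.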
  8. TreeNet and CART: A Winning Combination
