Tree advanced

Tree: Advanced Topic
Bagging, Random forest, Boosting
Jinseob Kim
GSPH, SNU
August 22, 2014

Tree: Pros & Cons
Pros
Computed very quickly
Simple interpretations.
Built-in feature selection: if a predictor is not used in any split, the
model is completely independent of that predictor.
Cons
Usually do not achieve the best predictive performance
A small change in the data can drastically change the tree (high variance)
Ensemble methods
Many trees are fit and predictions are aggregated across the trees.
Bagging, boosting and random forests

Bagging
Contents
1 Bagging
2 Random Forest
3 Boosting

Bagging
Bootstrap aggregation
Basic idea
Resample the data (bootstrap) and refit a tree on each resample
Aggregate by averaging (continuous outcome) or majority vote (categorical outcome)
Note
Similar Bias
Reduced variance
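A minimal sketch of the resample, refit, and aggregate idea, using only the Python standard library. As a simplifying assumption, the base learner is a one-split regression tree (a stump) on a single predictor; `fit_stump`, `bagging_fit`, and `bagging_predict` are hypothetical names chosen for illustration, not functions from any package.

```python
import random
from statistics import mean

def fit_stump(xs, ys):
    """Fit a one-split regression tree on 1-D data by minimizing squared error."""
    best = None
    for t in sorted(set(xs))[1:]:  # candidate thresholds; the split is x < t
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # all x values identical: fall back to the overall mean
        m = mean(ys)
        return lambda x: m
    _, t, ml, mr = best
    return lambda x: ml if x < t else mr

def bagging_fit(xs, ys, n_trees=50, seed=0):
    """Fit one stump per bootstrap sample (n draws with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees

def bagging_predict(trees, x):
    """Aggregate by averaging (continuous outcome)."""
    return mean(tree(x) for tree in trees)

# Toy data (made up for illustration): a step from 0 to 1 at x = 2
xs = [i / 10 for i in range(40)]
ys = [0.0 if x < 2 else 1.0 for x in xs]
trees = bagging_fit(xs, ys)
print(bagging_predict(trees, 0.5), bagging_predict(trees, 3.5))
```

For a categorical outcome, the averaging step would be replaced by a majority vote over the trees' predicted classes.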

Bagging
Bagging reduces variance: Example 1
Two categories of samples: blue, red
Two predictors: x1 and x2
Diagonal separation: the hardest case for a tree-based classifier
Single tree decision boundary in orange.
Bagged predictor decision boundary in red.
UPenn & Rutgers Albert A. Montillo 12 of 28

Bagging
Bagging reduces variance: Example 2
Ellipsoid separation: two categories, two predictors
Figure panels: single tree decision boundary; decision boundary of 100 bagged trees
UPenn & Rutgers Albert A. Montillo 13 of 28


Random Forest
Basic idea: decorrelated trees
Draw bootstrap samples of the observations
At each split, consider only a random subset of the variables
Grow multiple trees and vote
Pros
Accuracy
Cons
Speed
Interpretability
Overfitting
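Why decorrelating the trees helps can be seen from a standard variance identity. As an assumption for illustration (the symbols are not from the slides), suppose the $B$ tree predictions $T_1(x), \dots, T_B(x)$ are identically distributed with variance $\sigma^2$ and pairwise correlation $\rho$:

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} T_b(x)\right)
  = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2
```

Growing more trees shrinks only the second term; the first term stays unless $\rho$ itself is reduced, which is what the random choice of variables at each split aims for.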

Random Forest
Bagging vs Random Forest
Bagging alone utilizes the same full set of predictors to determine
each split.
Random forest applies another judicious injection of randomness:
namely by selecting a random subset of the predictors for each split
Number of predictors to try at each split (mtry), with k = total number of predictors:
mtry = $\sqrt{k}$ : classification
mtry = $k/3$ : regression
Bagging is a special case of random forest where mtry = k
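The mtry rule of thumb and the per-split draw of predictors can be sketched as follows. The function names are hypothetical, and rounding down to an integer (with a minimum of 1) is an assumption, since the slide does not say how to round.

```python
import math
import random

def default_mtry(k, task):
    """Rule-of-thumb number of predictors tried at each split (rounded down, at least 1)."""
    if task == "classification":
        return max(1, math.floor(math.sqrt(k)))
    if task == "regression":
        return max(1, k // 3)
    raise ValueError("task must be 'classification' or 'regression'")

def candidate_predictors(k, mtry, rng):
    """Indices of the mtry predictors considered at one split, drawn without replacement."""
    return sorted(rng.sample(range(k), mtry))

rng = random.Random(0)
k = 12  # hypothetical number of predictors
print(default_mtry(k, "classification"), default_mtry(k, "regression"))
print(candidate_predictors(k, default_mtry(k, "classification"), rng))
# With mtry = k every predictor is a candidate at every split, i.e. plain bagging
print(candidate_predictors(k, k, rng))
```

A fresh subset is drawn at every split, not once per tree, so different splits within the same tree can see different predictors.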


Boosting
Boosting Algorithms
A method to “boost” weak learning algorithms (e.g. single trees) into
strong learning algorithms.
Boosted trees try to improve the model fit over different trees by
considering past fits (not unlike iteratively reweighted least squares)
The basic tree boosting algorithm:
Initialize equal weights per sample;
for j = 1, ..., M iterations do
    Fit a classification tree using sample weights (denote the model
    equation as f_j(x));
    forall the misclassified samples do
        increase sample weight
    end
    Save a “stage-weight” (ε_j) based on the performance of the current
    model;
end
Max Kuhn (Pfizer) Predictive Modeling 83 / 132
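The loop above can be sketched in plain Python. Two assumptions go beyond the slide: the weak learner is a one-split stump on a single predictor rather than a full classification tree, and the stage weight and weight update follow the usual AdaBoost formulas, which the slide does not spell out. All function names are hypothetical.

```python
import math

def stump_predict(x, t, sign):
    """Predict `sign` when x >= t, otherwise `-sign` (labels are -1 / +1)."""
    return sign if x >= t else -sign

def fit_weighted_stump(xs, ys, w):
    """Exhaustively pick the threshold and sign with the smallest weighted error."""
    best = None
    for t in sorted(set(xs)):
        for sign in (1, -1):
            err = sum(wi for x, y, wi in zip(xs, ys, w)
                      if stump_predict(x, t, sign) != y)
            if best is None or err < best[0]:
                best = (err, t, sign)
    return best

def boost(xs, ys, n_iter=50):
    n = len(xs)
    w = [1.0 / n] * n  # initialize equal weights per sample
    model = []         # one (stage_weight, threshold, sign) per iteration
    for _ in range(n_iter):
        err, t, sign = fit_weighted_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)    # keep the log finite
        stage = 0.5 * math.log((1 - err) / err)  # assumed AdaBoost stage weight
        # Misclassified samples (y * prediction = -1) get their weight increased
        w = [wi * math.exp(-stage * y * stump_predict(x, t, sign))
             for x, y, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
        model.append((stage, t, sign))
    return model

def predict(model, x):
    """Stage-weighted vote over all fitted stumps."""
    score = sum(stage * stump_predict(x, t, sign) for stage, t, sign in model)
    return 1 if score >= 0 else -1

# Toy data (made up): the +1 class sits in the middle, so no single stump fits it
xs = list(range(10))
ys = [1 if 3 <= x < 7 else -1 for x in xs]
model = boost(xs, ys)
print([predict(model, x) for x in xs])
```

No single stump can separate this toy data, but a stage-weighted combination of stumps can, which is the sense in which weak learners are boosted into a strong one.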
