CS189 Discussion 9
Vashisht Madhavan
Decision Trees + Random Forests
Some guidance with this year’s election...
Decision Trees (Review)
● Nodes represent thresholds on features
○ E.g., x1 > 3
● Edges lead us to the next node based on the threshold
○ E.g., go right if true, left if false
● Leaf nodes give us our predicted class
At Test Time
● Start at root node with unlabeled point x
● Descend to leaf node based on feature values of x
● Assign leaf node value as predicted class for x
● Usually O(log n), where n is the number of training points
○ Worst case O(n), for a very unbalanced (chain-like) tree
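A minimal sketch of this test-time traversal, assuming internal nodes store a (feature, threshold) pair and leaves store a class label (the Node and predict names are illustrative, not from the slides):

```python
class Node:
    """One node of a decision tree: internal nodes hold a split, leaves hold a label."""
    def __init__(self, feature=None, threshold=None, left=None, right=None, label=None):
        self.feature = feature      # index of the feature to threshold on
        self.threshold = threshold  # split value, e.g. x[1] > 3
        self.left = left            # subtree taken when x[feature] <= threshold
        self.right = right          # subtree taken when x[feature] >  threshold
        self.label = label          # predicted class (leaf nodes only)

def predict(node, x):
    """Descend from the root to a leaf based on the feature values of x."""
    while node.label is None:                  # internal node: keep descending
        if x[node.feature] > node.threshold:   # "right if true, left if false"
            node = node.right
        else:
            node = node.left
    return node.label                          # the leaf value is the prediction
```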
Training a DTree
● How do we decide which features to split on at each level of the tree?
● We use entropy as our metric for deciding splits
○ Entropy of a node: H(S) = −Σ_c p_c log₂(p_c), where p_c is the fraction of the node’s points in class c
● Here’s how...
Training a DTree (cont.)
● We have this idea of information gain
○ How much does a given feature threshold split reduce our entropy?
● So we choose the feature split with the highest information gain
● We need to exhaustively search through values for all features
○ What would be a good way to do this? (One common approach is sketched below.)
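A hedged sketch of this search, assuming binary splits of the form x_j > t; the names entropy, information_gain, and best_split are illustrative, not from the slides:

```python
import numpy as np

def entropy(y):
    """H(S) = -sum_c p_c * log2(p_c) over the class proportions in y."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(y, y_left, y_right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(y)
    children = (len(y_left) / n) * entropy(y_left) + (len(y_right) / n) * entropy(y_right)
    return entropy(y) - children

def best_split(X, y):
    """Exhaustive search: for each feature, try thresholds at midpoints between
    consecutive sorted values and keep the (feature, threshold) with the highest gain."""
    best_j, best_t, best_gain = None, None, -np.inf
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])                   # unique values come back sorted
        for t in (values[:-1] + values[1:]) / 2:      # candidate midpoint thresholds
            right = X[:, j] > t
            gain = information_gain(y, y[~right], y[right])
            if gain > best_gain:
                best_j, best_t, best_gain = j, t, gain
    return best_j, best_t, best_gain
```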
Discussion Questions 1 & 2
Overfitting in Decision Trees
● As the depth and complexity of the tree increase, so does the amount of overfitting
● As the tree becomes deeper, each leaf gets very few data points
Early Stopping
● We have stopping criteria to prevent our tree from growing further (sketched as hyperparameters below)
○ Max Tree Depth
○ Min # of points at a node
○ Tree Complexity Penalty
○ Validation Error Monitoring
BENEFITS
● Prevents overfitting
● Improves speed
○ Smaller tree
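If you use a recent scikit-learn, the first three criteria roughly correspond to constructor arguments of DecisionTreeClassifier; this is a sketch under that assumption, with illustrative (not recommended) values, and validation-error monitoring still has to be done by hand:

```python
from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(
    max_depth=8,           # Max Tree Depth
    min_samples_leaf=10,   # Min # of points at a (leaf) node
    ccp_alpha=1e-3,        # cost-complexity penalty, roughly "Tree Complexity Penalty"
)

# Validation Error Monitoring is not built in: fit trees of increasing depth
# yourself and keep the one with the lowest error on a held-out validation set.
```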
Early Stopping (cont.)
● By stopping early or pruning we do lose some modeling power
○ We cannot capture more complex distributions
[Figure: early stopping leads us to the red line.]
Ensemble Learning
● “Many idiots are often better than one expert”
● Learn with several algorithms
○ Combine their results
● Usually leads to better accuracy
● Very popular with decision trees
○ Random Forests
● Reduces variance (illustrated below)
[Figure: a combination of decision “stumps”.]
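A quick numerical illustration of the variance-reduction claim (purely synthetic numbers, not tied to any dataset): averaging k independent, equally noisy predictors shrinks the variance by roughly a factor of k.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 25                                              # number of "learners" in the ensemble
noise = rng.normal(scale=0.5, size=(10_000, k))     # 10,000 trials, k noisy predictions each

single = noise[:, 0]                                # one learner's prediction error
ensemble = noise.mean(axis=1)                       # average of k independent learners

print(single.var())     # ~0.25  (sigma^2)
print(ensemble.var())   # ~0.01  (sigma^2 / k), i.e. much lower variance
```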
Random Forests
● Ensemble of short decision trees
● Averaging
○ Randomize each model
○ E.g., trees of different depths
● Bagging
○ Randomize the data fed to each model
○ Take random samples from the training data
■ With replacement
● To predict a new sample, we take whichever class gets the most votes from the learners in the ensemble (sketched below)
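A sketch of bagging plus majority-vote prediction, assuming some train_tree(X, y) that fits a short decision tree and a predict(tree, x) like the traversal sketch above (both hypothetical helper names):

```python
import numpy as np
from collections import Counter

def bagged_trees(X, y, n_trees=100):
    """Fit n_trees trees, each on a bootstrap sample drawn with replacement."""
    n = len(y)
    trees = []
    for _ in range(n_trees):
        idx = np.random.randint(0, n, size=n)      # sample n indices with replacement
        trees.append(train_tree(X[idx], y[idx]))   # train_tree is a hypothetical helper
    return trees

def ensemble_predict(trees, x):
    """Predict the class that receives the most votes across the ensemble."""
    votes = [predict(tree, x) for tree in trees]   # predict() as sketched earlier
    return Counter(votes).most_common(1)[0][0]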
Random Forests (cont.)
● With bagging, trees in the ensemble often look very similar. Why?
○ All trees pick the same best splits (“correlated trees”)
○ Averaging won’t help much then
● Feature bagging (sketched below)
○ At each node, pick a random subset of m features out of the d total features
○ Typically m = sqrt(d)
○ Has the effect of “de-correlating” the trees in the ensemble
● Test error sometimes keeps dropping up to 100s or even 1,000s of decision trees in the ensemble!
● Be careful not to dumb down the individual trees too much
○ Ideally we want a very strong, diverse set of learners in the ensemble
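A sketch of the feature-bagging step, reusing the (hypothetical) best_split from the earlier training sketch; in scikit-learn this corresponds roughly to RandomForestClassifier(max_features="sqrt"):

```python
import numpy as np

def best_split_feature_bagged(X, y):
    """At each node, search only a random subset of m = sqrt(d) features."""
    d = X.shape[1]
    m = max(1, int(round(np.sqrt(d))))
    feats = np.random.choice(d, size=m, replace=False)   # random feature subset
    j, t, gain = best_split(X[:, feats], y)               # best_split from the earlier sketch
    return feats[j], t, gain                               # map back to the original feature index
    # (assumes at least one selected feature is non-constant, so j is not None)
```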
Discussion Question 3
