DECISION TREES & RANDOM FORESTS
Max Pagels, Data Science Specialist
max.pagels@sc5.io
12.6.2016
A TREE IN THE REAL WORLD
A TREE IN COMPUTER SCIENCE
[Diagram: a tree data structure with its Root, Nodes, Edges, and Leaves labelled]
DECISION TREES
A decision tree is a predictive model built by a learning algorithm: it
encodes a hierarchy of decisions derived from training data.
Decision trees are popular because:
• They are naturally non-linear, so you can use them to solve
complex problems
• They are easy to visualise
• How they work is easily explained
• They can be used for regression (predict a number) and
classification (predict a class)
A decision tree is an explicit version of “twenty questions”.
A TOY EXAMPLE
Weekend? | Evening? | Food? | Date?
Yes      | Yes      | Yes   | Yes
No       | Yes      | No    | No
No       | No       | Yes   | No
Yes      | No       | Yes   | Yes
Yes      | No       | No    | No
Yes      | Yes      | No    | No
BUILDING A DECISION TREE
Basic approach (starting with a root node), looping over all leaf nodes
(a code sketch follows the table below):
1. Select the best attribute A
2. Assign A as the decision attribute for the node we are currently traversing
3. For each value of A, create a descendant (leaf) node
4. Sort the training examples to the leaf nodes
5. If a stopping criterion is hit, stop; else continue

Weekend? | Evening? | Food? | Date?
Yes      | Yes      | Yes   | Yes
No       | Yes      | No    | No
No       | No       | Yes   | No
Yes      | No       | Yes   | Yes
Yes      | No       | No    | No
Yes      | Yes      | No    | No
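A minimal sketch of this greedy loop in Python, assuming entropy/information
gain as the “best attribute” criterion (the choice ID3 itself uses; all names
below are illustrative):

from collections import Counter
from math import log2

def entropy(labels):
    # Shannon entropy of a list of class labels
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    # Entropy reduction achieved by splitting on attr
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder

def build_tree(rows, labels, attributes):
    if len(set(labels)) == 1:           # stopping criterion: node is pure
        return labels[0]
    if not attributes:                  # stopping criterion: no attributes left
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes,              # steps 1-2: pick the decision attribute
               key=lambda a: information_gain(rows, labels, a))
    tree = {best: {}}
    for value in set(row[best] for row in rows):      # step 3: one child per value
        idx = [i for i, row in enumerate(rows) if row[best] == value]  # step 4
        tree[best][value] = build_tree(               # step 5: recurse
            [rows[i] for i in idx],
            [labels[i] for i in idx],
            [a for a in attributes if a != best])
    return tree

rows = [dict(zip(["Weekend", "Evening", "Food"], r)) for r in
        [("Yes", "Yes", "Yes"), ("No", "Yes", "No"), ("No", "No", "Yes"),
         ("Yes", "No", "Yes"), ("Yes", "No", "No"), ("Yes", "Yes", "No")]]
labels = ["Yes", "No", "No", "Yes", "No", "No"]
print(build_tree(rows, labels, ["Weekend", "Evening", "Food"]))

Depending on the criterion, the root this sketch picks may differ from the
tree drawn on the following slides; greedy tree builders agree on the
procedure, not necessarily on the result.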
DECISION TREES IN SCIKIT
sklearn.tree.DecisionTreeClassifier(
    criterion='gini',
    splitter='best',
    max_depth=None,
    min_samples_split=2,
    min_samples_leaf=1,
    min_weight_fraction_leaf=0.0,
    max_features=None,
    random_state=None,
    max_leaf_nodes=None,
    min_impurity_split=1e-07,
    class_weight=None,
    presort=False
)
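A hypothetical usage sketch on the toy example, assuming we encode Yes/No
as 1/0 (scikit-learn expects numeric features):

from sklearn.tree import DecisionTreeClassifier

# Columns: Weekend?, Evening?, Food? (1 = Yes, 0 = No)
X = [[1, 1, 1], [0, 1, 0], [0, 0, 1], [1, 0, 1], [1, 0, 0], [1, 1, 0]]
y = [1, 0, 0, 1, 0, 0]  # Date?

clf = DecisionTreeClassifier(random_state=0)  # defaults otherwise, fixed seed
clf.fit(X, y)
print(clf.predict([[1, 0, 1]]))  # weekend, not evening, food -> [1]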
BUILDING A DECISION TREE
Weekend? | Evening? | Food? | Date?
Yes      | Yes      | Yes   | Yes
No       | Yes      | No    | No
No       | No       | Yes   | No
Yes      | No       | Yes   | Yes
Yes      | No       | No    | No
Yes      | Yes      | No    | No
[Diagram, built up step by step across several slides: the root node splits
on Weekend (W); the branches below it split further on Evening (E) and
Food (F) along Yes/No edges, ending in Date / No Date leaves]
ID3
[The toy table and the finished tree from the previous slide, labelled as
the output of the ID3 algorithm]
The ID3 algorithm, like many other decision tree algorithms, is prone to
overfitting: trees become too deep and start to capture noise in the
training data.
Overfitting means a trained model fails to generalise well to new examples.
One way of combating overfitting is to use an ensemble method.
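As an aside, the constructor parameters shown earlier (max_depth,
min_samples_leaf, …) offer a simpler mitigation: stopping growth early.
A minimal sketch, with illustrative values only:

from sklearn.tree import DecisionTreeClassifier

# Capping depth and leaf size keeps the tree from memorising
# individual noisy examples.
clf = DecisionTreeClassifier(max_depth=3, min_samples_leaf=2)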
RANDOM FORESTS
A FOREST IN THE REAL WORLD
A FOREST IN COMPUTER SCIENCE
RANDOM FORESTS
A random forest is an ensemble method based on decision trees.
The basic idea is deceptively simple (see the sketch after this list):
1. Construct N decision trees
   • Randomly sample a subset of the training data (with replacement)
   • Construct/train a decision tree using the decision tree algorithm
     and the sampled subset of data
2. Predict by asking all trees in the forest for their opinion
   • For regression problems, take the mean (average) of all trees’ predictions
   • For classification problems, take the mode of all trees’ predictions (i.e. vote)
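A minimal sketch of both steps for classification, assuming scikit-learn’s
DecisionTreeClassifier as the per-tree learner (train_forest and
forest_predict are illustrative names):

import random
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def train_forest(X, y, n_trees=10):
    # Step 1: train n_trees trees, each on a bootstrap sample
    forest = []
    n = len(X)
    for _ in range(n_trees):
        idx = [random.randrange(n) for _ in range(n)]  # sample with replacement
        tree = DecisionTreeClassifier()
        tree.fit([X[i] for i in idx], [y[i] for i in idx])
        forest.append(tree)
    return forest

def forest_predict(forest, x):
    # Step 2 (classification): the mode of the trees' votes
    votes = [tree.predict([x])[0] for tree in forest]
    return Counter(votes).most_common(1)[0][0]

In practice, sklearn.ensemble.RandomForestClassifier does this for you, and
additionally randomises the features considered at each split (a second
source of randomness this sketch omits).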
CLASSIFICATION
Three trees vote Y, Y and N; the forest returns the mode:
mode({Y, Y, N}) = Y
REGRESSION
Three trees predict 2.1, 1.8 and 1.9; the forest returns the mean:
µ({2.1, 1.8, 1.9}) = 1.933…
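The same two aggregation rules, sketched with Python’s standard library:

from statistics import mean, mode

print(mode(["Y", "Y", "N"]))   # Y
print(mean([2.1, 1.8, 1.9]))   # 1.9333...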
SUMMARY
Decision trees are easy-to-understand models that can be used for
regression and classification, even for non-linear problems.
Random forests are ensemble learning algorithms that help prevent
overfitting by training many decision trees and aggregating their
predictions.
If you are just getting started with machine learning, decision trees
are an excellent starting point.
THANK YOU!
Questions?
