Advanced Machine Learning with Python
Session 9: Decision Trees
SIGKDD
Carlos Santillan
Bentley Systems Inc
csantill@gmail.com
Decision Trees
A decision-support model based on a tree-like graph
Growing a Tree
Types of Decision Trees
There are two main types:
• Classification Tree (categorical target variable)
• Regression Tree (continuous target variable)
CART (Classification and Regression Tree) is used to refer to both.
The type of a decision tree is determined by the type of the target variable.
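In scikit-learn the two types correspond to two estimators; a minimal sketch on built-in toy datasets (the dataset choice here is only illustrative):

    from sklearn.datasets import load_iris, load_diabetes
    from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

    # Categorical target -> classification tree
    X_c, y_c = load_iris(return_X_y=True)
    clf = DecisionTreeClassifier(max_depth=3).fit(X_c, y_c)

    # Continuous target -> regression tree
    X_r, y_r = load_diabetes(return_X_y=True)
    reg = DecisionTreeRegressor(max_depth=3).fit(X_r, y_r)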
Decision Tree Terms
Nodes:
1. Root Node
2. Internal Node (Decision Node)
3. Leaf (Terminal Node)
Depth: length of the longest path from root to leaf
Decision Stump: a one-level decision tree
Decision Tree Algorithm
The basic greedy algorithm is as follows:
Start at node N and find the “best attribute” to split on
Partition N into N1, N2, … according to the best split
Repeat for each child node until a “stop condition” is met
Growing an optimal decision tree is an NP-complete problem.
Fortunately, greedy algorithms offer good accuracy and performance; a sketch follows.
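A minimal sketch of that greedy recursion, assuming numeric features, non-negative integer class labels, and Gini impurity as the split criterion (illustrative only, not how scikit-learn implements it):

    import numpy as np

    def gini_impurity(y):
        """Gini impurity of an array of class labels."""
        _, counts = np.unique(y, return_counts=True)
        p = counts / counts.sum()
        return 1.0 - np.sum(p ** 2)

    def best_split(X, y):
        """Try every (feature, threshold) pair; keep the lowest weighted impurity."""
        best = None
        for j in range(X.shape[1]):
            for t in np.unique(X[:, j]):
                left, right = y[X[:, j] < t], y[X[:, j] >= t]
                if len(left) == 0 or len(right) == 0:
                    continue
                score = (len(left) * gini_impurity(left) +
                         len(right) * gini_impurity(right)) / len(y)
                if best is None or score < best[0]:
                    best = (score, j, t)
        return best

    def grow(X, y, depth=0, max_depth=3):
        """Recursively partition until pure, unsplittable, or max depth (the 'stop condition')."""
        split = best_split(X, y)
        if gini_impurity(y) == 0.0 or split is None or depth == max_depth:
            return {"leaf": np.bincount(y).argmax()}   # majority-class leaf
        _, j, t = split
        mask = X[:, j] < t
        return {"feature": j, "threshold": t,
                "left": grow(X[mask], y[mask], depth + 1, max_depth),
                "right": grow(X[~mask], y[~mask], depth + 1, max_depth)}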
What is the “Best Attribute” to split on?
Several criteria can be used to determine the best attribute to split on (see the scikit-learn snippet after this list):
• Information Gain
• Gini Index
• Classification Error
• Gain Ratio (Normalized Information Gain)
• Variance Reduction
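In scikit-learn the criterion is a hyperparameter of the tree estimators; a small sketch (the classifier supports Gini and entropy, while the regressor's default squared-error criterion corresponds to variance reduction):

    from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

    tree_gini = DecisionTreeClassifier(criterion="gini")        # default
    tree_entropy = DecisionTreeClassifier(criterion="entropy")  # information gain
    tree_reg = DecisionTreeRegressor()                          # variance reduction (MSE)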
Purity
Entropy
Def: measure of impurity in our sample
• Entropy = 0 (all elements belong to the same class)
• Entropy = 1 (elements evenly split between two classes)
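A quick sketch of the definition for the two-class case (base-2 logarithm, using numpy):

    import numpy as np

    def entropy(labels):
        """Shannon entropy (base 2) of a list of class labels."""
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    print(entropy([1, 1, 1, 1]))   # a pure node has entropy 0
    print(entropy([0, 0, 1, 1]))   # an evenly split node has entropy 1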
Information Gain
Information Gain = Entropy(parent) - [weighted average] Entropy(children)
If we split at X < 4
• Entropy (X < 4) = 0.86
• Entropy (X ≥ 4) = 0
Information Gain = 0.95 - (14/16)(0.86) - (2/16)(0) ≈ 0.20
Information Gain
IG = Entropy(parent) - [weighted average] Entropy(children)
If we split at X < 3
• Entropy (X < 3) = 0
• Entropy (X ≥ 3) = 0.811
Information Gain = 0.95 - (8/16)(0) - (8/16)(0.811) ≈ 0.55
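A short check of both splits, assuming (as the entropy values above imply) a toy sample of 16 points with a 6/10 class split, where X < 4 separates the data 14/2 and X < 3 separates it 8/8:

    import numpy as np

    def entropy(counts):
        """Shannon entropy (base 2) from per-class counts."""
        p = np.asarray(counts, dtype=float)
        p = p[p > 0] / p.sum()
        return -np.sum(p * np.log2(p))

    def information_gain(parent, children):
        """IG = H(parent) - sum_i (n_i / n) * H(child_i)."""
        n = sum(sum(c) for c in children)
        weighted = sum(sum(c) / n * entropy(c) for c in children)
        return entropy(parent) - weighted

    parent = [6, 10]                                     # assumed class counts at the root
    print(information_gain(parent, [[4, 10], [2, 0]]))   # split at X < 4 -> ~0.20
    print(information_gain(parent, [[0, 8], [6, 2]]))    # split at X < 3 -> ~0.55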
GINI Index
Definition: expected error rate
• Gini = 0 (all elements belong to the same class)
• Gini = 0.5 (elements evenly split between two classes)
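The same idea as a sketch, computed from per-class counts:

    import numpy as np

    def gini(counts):
        """Gini impurity from per-class counts: 1 - sum(p_k^2)."""
        p = np.asarray(counts, dtype=float)
        p = p / p.sum()
        return 1.0 - np.sum(p ** 2)

    print(gini([8, 0]))   # 0.0 -> all elements the same class
    print(gini([8, 8]))   # 0.5 -> evenly split between two classes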
GINI Gain
If we split at X < 4
• Gini (X < 4) = 0.4081
• Gini (X ≥ 4) = 0
Gini Gain = 0.4687 - (14/16)(0.4081) - (2/16)(0) ≈ 0.11
GINI Gain
If we split at X < 3
• Gini (X < 3) = 0
• Gini (X ≥ 3) = 0.375
Gini Gain = 0.4687 - (8/16)(0) - (8/16)(0.375) ≈ 0.28
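And a check of the two Gini-gain calculations on the same assumed 6/10 toy sample:

    import numpy as np

    def gini(counts):
        """Gini impurity from per-class counts."""
        p = np.asarray(counts, dtype=float) / sum(counts)
        return 1.0 - np.sum(p ** 2)

    def gini_gain(parent, children):
        """Gini gain = Gini(parent) - weighted average of the children's Gini."""
        n = sum(sum(c) for c in children)
        weighted = sum(sum(c) / n * gini(c) for c in children)
        return gini(parent) - weighted

    parent = [6, 10]
    print(gini_gain(parent, [[4, 10], [2, 0]]))   # split at X < 4 -> ~0.11
    print(gini_gain(parent, [[0, 8], [6, 2]]))    # split at X < 3 -> ~0.28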
When to use which?
● Gini works well for continuous attributes
● Entropy works well for categorical attributes
● Entropy is slower to compute than Gini
● Gini may perform poorly when class probabilities are very small
● Theoretically, the two criteria disagree in only about 2% of cases
When to stop growing?
• All data points at a leaf are pure
• The tree reaches depth k
• The number of cases in a node falls below a minimum
• The splitting criterion falls below a given threshold
These stopping rules map onto scikit-learn hyperparameters, as sketched below.
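A sketch (parameter names as in recent scikit-learn versions):

    from sklearn.tree import DecisionTreeClassifier

    tree = DecisionTreeClassifier(
        max_depth=5,                 # stop when the tree reaches depth k
        min_samples_split=20,        # don't split nodes with fewer cases than this
        min_samples_leaf=10,         # every leaf must keep at least this many cases
        min_impurity_decrease=0.01,  # ignore splits whose gain is below this threshold
    )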
Pruning
Prevents overfitting
Smaller trees may be more accurate
Strategies:
• Prepruning: stop growing when information becomes unreliable
• Postpruning: fully grow the tree, then remove unreliable parts
Note: pruning is currently not supported by scikit-learn
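Since post-pruning is not available, a common workaround is prepruning via hyperparameters tuned by cross-validation; a small sketch with GridSearchCV on a built-in dataset (the grid values are arbitrary). Newer scikit-learn releases (0.22+) do add cost-complexity post-pruning through the ccp_alpha parameter.

    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    params = {"max_depth": [2, 3, 4, 5, None],
              "min_samples_leaf": [1, 5, 10, 20]}
    search = GridSearchCV(DecisionTreeClassifier(random_state=0), params, cv=5)
    search.fit(X, y)
    print(search.best_params_, search.best_score_)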
Algorithms
ID3 (Iterative Dichotomiser 3): greedy algorithm, categorical attributes (entropy)
C4.5: improves on ID3; supports categorical and continuous attributes (entropy)
C5.0 (See5): commercial successor to C4.5
CART: similar to C4.5, uses Gini impurity
Pros
• Easy to understand (white box)
• Supports both numerical and categorical data
• Fast (greedy) algorithms
• Performs well with large datasets
• Accurate
• Provides feature importance (see the snippet below)
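The feature-importance bullet corresponds to the fitted estimator's feature_importances_ attribute in scikit-learn, e.g.:

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    data = load_iris()
    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(data.data, data.target)
    for name, score in zip(data.feature_names, tree.feature_importances_):
        print(name, round(score, 3))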
Cons
• Prone to overfitting without pruning / cross-validation
• Information gain is biased toward features with many distinct values
• Sensitive to small changes in the data
DEMO
Resources
• https://github.com/csantill/AustinSIGKDD-DecisionTrees
• Decision Forests for Classification, Regression, Density
Estimation, Manifold Learning and Semi-Supervised Learning
• Classification and Regression Trees
• A Visual Introduction to Machine Learning
• A Complete Tutorial on Tree Based Modeling from Scratch
• Theoretical Comparison between the Gini Index and
Information Gain Criteria
Thank You
Carlos Santillan
