Decision Trees
Gonzalo Martínez Muñoz
Universidad Autónoma de Madrid
Outline
•  What is a decision tree?
•  History
•  Decision tree learning algorithm
•  Growing the tree
•  Pruning the tree
•  Capabilities
What is a decision tree?
•  A hierarchical learning model that recursively partitions the space using decision rules
•  Valid for both classification and regression
•  Ok, but what is a decision tree?
Example for classification
•  Labor negotiations: predict whether a collective agreement is good or bad

[Figure: a decision tree with a root node splitting on "wage increase 1st year" (≤36% vs >36%) and an internal node splitting on "dental plan" (≤full vs >full), ending in "good"/"bad" leaf nodes; alongside it, a scatter plot of wage increase in the 1st year against dental plan (N/A, none, half, full). Root, internal, and leaf nodes are labeled.]
Example for regression
•  Boston housing: predict house values in Boston by neighbourhood

[Figure: a regression tree with a root node splitting on average rooms (≤5.5 vs >5.5) and an internal node splitting on average house year (≤75 vs >75), with leaf predictions of 200000, 350000 and 500000; alongside it, a scatter plot of average rooms against average house year.]
History
•  Precursors: Expert Based Systems (EBS)
   EBS = knowledge database + inference engine
•  MYCIN: medical diagnosis system, ~600 rules
•  XCON: system for configuring VAX computers, ~2500 rules (1982)
•  The rules were created by experts, by hand!!
•  Knowledge acquisition had to be automated
•  Substitute the expert with an archive of solved cases
History
•  CHAID (CHi-squared Automatic Interaction Detector), Gordon V. Kass, 1980
•  CART (Classification and Regression Trees), Breiman, Friedman, Olshen and Stone, 1984
•  ID3 (Iterative Dichotomiser 3), Quinlan, 1986
•  C4.5, Quinlan, 1993: based on ID3
Computational
•  Consider two binary variables. How many ways can we split the space using a decision tree?
•  Two possible splits and two possible assignments to the leaf nodes → at least 8 possible trees

[Figure: the 2x2 input space for the two binary variables (values 0 and 1).]
Computational
•  Under what conditions does someone wait in a restaurant?

There are 2 x 2 x 2 x 2 x 3 x 3 x 2 x 2 x 4 x 4 = 9216 cases
and two classes → 2^9216 possible hypotheses, and many more possible trees!!!!
Computational
•  It is just not feasible to find the optimal solution
•  A bias must be selected to build the models
•  This is a general problem in Machine Learning
Computational
For decision trees a greedy approach is generally taken:
•  The tree is built step by step, instead of as a whole
•  At each step the best split with respect to the training data is selected (following a split criterion)
•  The tree is grown until a stopping criterion is met
•  The tree is generally pruned (following a pruning criterion) to avoid over-fitting
Basic Decision Tree Algorithm

trainTree(Dataset L)
1. T = growTree(L)
2. pruneTree(T, L)   // removes subtrees whose validity is uncertain
3. return T

growTree(Dataset L)
1. T.s = findBestSplit(L)
2. if T.s == null return null
3. (L1, L2) = splitData(L, T.s)
4. T.left = growTree(L1)
5. T.right = growTree(L2)
6. return T
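Below is a minimal Python sketch of growTree, under a couple of assumptions not on the slide: attributes are numeric, splits are binary, and instead of returning null when no split is found it returns a leaf predicting the majority class, so the parent always gets a valid child. find_best_split is passed in as a parameter; one possible implementation is sketched on the next slide.

```python
# A sketch, not the canonical algorithm: numeric attributes, binary splits,
# majority-class leaves. find_best_split(X, y) returns (attribute, threshold)
# or None when a stopping criterion is met.

class Node:
    def __init__(self, split=None, left=None, right=None, prediction=None):
        self.split = split            # (attribute index, threshold); None for a leaf
        self.left = left
        self.right = right
        self.prediction = prediction  # majority class, set on leaves only

def grow_tree(X, y, find_best_split):
    split = find_best_split(X, y)
    if split is None:                 # leaf instead of the pseudocode's null
        return Node(prediction=max(set(y), key=y.count))
    attr, thr = split
    left_i = [i for i, x in enumerate(X) if x[attr] <= thr]
    right_i = [i for i, x in enumerate(X) if x[attr] > thr]
    return Node(split=split,
                left=grow_tree([X[i] for i in left_i], [y[i] for i in left_i],
                               find_best_split),
                right=grow_tree([X[i] for i in right_i], [y[i] for i in right_i],
                                find_best_split))
```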
Finding the best split

findBestSplit(Dataset L)
1. Try all possible splits
2. return best

[Figure: scatter plot of the training data on a 5x5 grid, with the candidate axis-parallel splits.]

There are not too many candidate splits, so the search is computationally feasible: O(# of attributes x # of points). But which one is best?
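A sketch of findBestSplit under the same assumptions: candidate thresholds are the midpoints between consecutive observed values of each attribute, so there are O(#attributes x #points) candidates, as the slide notes. The impurity function is a parameter; the Gini impurity defined on the next slide is one choice.

```python
def find_best_split(X, y, impurity, min_gain=1e-12):
    """Exhaustively try every (attribute, midpoint-threshold) split and
    return the one with the largest impurity decrease, or None if no
    split improves on the parent (e.g. the node is pure)."""
    n = len(y)
    parent = impurity(y)
    best, best_gain = None, min_gain
    for attr in range(len(X[0])):
        values = sorted(set(x[attr] for x in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [c for x, c in zip(X, y) if x[attr] <= thr]
            right = [c for x, c in zip(X, y) if x[attr] > thr]
            gain = parent - len(left)/n * impurity(left) - len(right)/n * impurity(right)
            if gain > best_gain:
                best, best_gain = (attr, thr), gain
    return best
```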
Split Criterion
•  It should measure the impurity of a node. Gini impurity (CART):

   i(t) = \sum_{i=1}^{m} f_i (1 - f_i)

   where f_i is the fraction of instances of class i in node t
•  The improvement of a split is the variation of impurity before and after the split:

   \Delta i(t, s) = i(t) - p_L \, i(t_L) - p_R \, i(t_R)

   where p_L (p_R) is the proportion of instances going to the left (right) node
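The two formulas translate directly into Python; a small sketch (class labels can be any hashable values):

```python
from collections import Counter

def gini(labels):
    """i(t) = sum_i f_i * (1 - f_i) over the class fractions f_i."""
    n = len(labels)
    return sum(c/n * (1 - c/n) for c in Counter(labels).values())

def impurity_decrease(parent, left, right):
    """Delta i(t, s) = i(t) - p_L * i(t_L) - p_R * i(t_R)."""
    n = len(parent)
    return gini(parent) - len(left)/n * gini(left) - len(right)/n * gini(right)
```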
Split Criterion

[Figure: the 12-instance scatter plot with the candidate splits; their impurity decreases are 0.011, 0.102, 0.026, 0.020, 0.044, 0.111, 0.056 and 0.011.]

Worked example for one candidate split:

\Delta i(t, s) = i(t) - p_L \, i(t_L) - p_R \, i(t_R)

i(t) = 2 × 8/12 × 4/12 = 0.44
p_L = 5/12 ;  p_R = 7/12
i(t_L) = 2 × 2/5 × 3/5
i(t_R) = 2 × 6/7 × 1/7

\Delta i(t, s) = 0.102

Which one is best?
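A quick numeric check of the worked example, using the gini helpers sketched on the previous slide; the class counts (8 vs 4 in the node, 2 vs 3 on the left, 6 vs 1 on the right) are read off the slide:

```python
parent = ['a'] * 8 + ['b'] * 4
left   = ['a'] * 2 + ['b'] * 3
right  = ['a'] * 6 + ['b'] * 1

print(gini(parent))                            # 0.444...
print(impurity_decrease(parent, left, right))  # 0.1016, i.e. 0.102 rounded
```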
16!
•  Based on entropy
𝐻(𝑡)=−∑𝑖=1↑𝑚▒​ 𝑓↓𝑖 ​​log↓2 ⁠​ 𝑓↓𝑖   
•  Information gain (used in ID4 and C4.5)
𝐼𝐺(𝑡, 𝑠)= 𝐻(𝑡)−​ 𝑝↓𝐿 𝐻(​ 𝑡↓𝐿 )−​ 𝑝↓𝑅 𝐻(​ 𝑡↓𝑅 )
•  Information gain ratio (C4.5)
𝐼𝐺𝑅(𝑡, 𝑠)= 𝐼𝐺(𝑡,   𝑠)/ 𝐻( 𝑠)
Other splitting criteria
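The entropy-based criteria as a Python sketch, for binary splits; here H(s) in the gain ratio is the entropy of the split itself, computed over the branch proportions:

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum(c/n * log2(c/n) for c in Counter(labels).values())

def information_gain(parent, left, right):
    n = len(parent)
    return entropy(parent) - len(left)/n * entropy(left) - len(right)/n * entropy(right)

def gain_ratio(parent, left, right):
    p_l, p_r = len(left)/len(parent), len(right)/len(parent)
    split_entropy = -(p_l * log2(p_l) + p_r * log2(p_r))   # H(s) for a binary split
    return information_gain(parent, left, right) / split_entropy
```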
17!
•  Any splitting criteria with this shape is good

Splitting criteria
(this is for
binary
problems)!
Full grown tree

[Figure: the 5x5 scatter plot fully partitioned into "good"/"bad" regions, and the corresponding tree: the root splits on X2 at 3.5, with further splits on X1 at 2.5, 3.5 and 4.5 and on X2 at 1.5 and 2.5, down to pure "good"/"bad" leaves.]

Are we happy with this tree? How about this one?
19!
•  All instances assigned to the node considered for
splitting have the same class label
•  No split is found to further partition the data 
•  The number of instances in each terminal node is smaller
than a predefined threshold
•  The impurity gain of the best split is not below a given
threshold
•  The tree has reached a maximum depth
Stopping criteria
The last three elements are also call pre-pruning
Pruning
•  Another option is post-pruning (or simply pruning). It consists of:
   •  Growing the tree as much as possible
   •  Pruning it afterwards, by substituting a subtree with a single leaf node if the error does not worsen significantly
   •  Continuing this process until no more pruning is possible
•  In effect we go back to smaller trees, but through a different path
•  The idea of pruning is to avoid overfitting
Cost-complexity pruning (CART)
•  Cost-complexity based pruning:

   R_\alpha(t) = R(t) + \alpha \cdot C(t)

•  R(t) is the error of the decision tree rooted at node t
•  C(t) is the number of leaf nodes under node t
•  Parameter \alpha specifies the relative weight between the accuracy and the complexity of the tree
Pruning CART

[Figure: the fully grown tree from the previous example, with the subtree considered for pruning replaced by a single "good" leaf.]

Let's say α = 0.1:

Pruned:    R_{α=0.1}(t) = 1/5 + 0.1 · 1 = 0.3
Unpruned:  R_{α=0.1}(t) = 0 + 0.1 · 5 = 0.5
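The comparison is simple arithmetic; a quick check of the slide's numbers:

```python
def cost_complexity(error, n_leaves, alpha):
    return error + alpha * n_leaves   # R_alpha(t) = R(t) + alpha * C(t)

print(cost_complexity(1/5, 1, alpha=0.1))   # 0.3: subtree pruned to a single leaf
print(cost_complexity(0.0, 5, alpha=0.1))   # 0.5: unpruned subtree
# The pruned version has lower cost-complexity, so the subtree is replaced.
```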
Cost-complexity pruning (CART)
•  CART uses 10-fold cross-validation within the training data to estimate alpha: iteratively, nine folds are used for training a tree and one for testing
•  A tree is trained on nine folds and pruned using all possible alphas (there are finitely many)
•  Each of the resulting trees is then tested on the remaining fold
•  The process is repeated 10 times, and the alpha value that gives the best generalization accuracy is kept
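A rough modern equivalent using scikit-learn rather than the original CART code (a sketch; the iris dataset is just a stand-in): cost_complexity_pruning_path enumerates the finite set of effective alphas, and cross-validation picks among them.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# The set of effective alphas is finite: one per internal node of the full tree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)

# 10-fold cross-validated accuracy for each candidate alpha.
scores = [cross_val_score(DecisionTreeClassifier(random_state=0, ccp_alpha=a),
                          X, y, cv=10).mean()
          for a in path.ccp_alphas]

best_alpha = path.ccp_alphas[int(np.argmax(scores))]
print(best_alpha)
```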
Statistical pruning (C4.5)
•  C4.5 estimates the error at the leaf nodes using the upper confidence bound (a parameter) of a normal distribution, instead of the observed training error
•  The error estimate for a subtree is the weighted sum of the error estimates of all its leaves
•  This estimate is higher when few data instances fall on a leaf
•  Hence, leaf nodes with few instances tend to be pruned
Pruning (CART vs C4.5)
•  CART pruning is slower, since it has to build 10 extra trees to estimate alpha
•  C4.5 pruning is faster; however, the algorithm does not propose a way to compute the confidence threshold
•  The statistical grounds for C4.5 pruning are questionable
•  Using cross-validation is safer
Missing values
•  What can be done if a value is missing?
•  Suppose the value of "Pat" for one instance (in the restaurant example) is unknown
•  The instance with the missing value is sent down all three branches, but weighted (e.g. by the fraction of training instances in each branch)
•  The validity of the split is then computed as before
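A sketch of the weighting idea; the branch fractions below are hypothetical stand-ins for the fractions of training instances observed in each branch of the "Pat" split:

```python
# Hypothetical fractions of training instances per branch of "Pat".
branch_fractions = {'none': 0.2, 'some': 0.3, 'full': 0.5}

instance_weight = 1.0   # the instance with the missing value
fragments = {branch: instance_weight * f for branch, f in branch_fractions.items()}
print(fragments)        # {'none': 0.2, 'some': 0.3, 'full': 0.5}
# Each fragment contributes fractionally to the impurity/gain of its branch.
```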
Oblique splits
•  The CART algorithm allows oblique splits, i.e. splits that are not orthogonal to the attribute axes
•  The algorithm searches for planes with good impurity reduction
•  The tree-growing process becomes slower
•  But trees become more expressive and compact

[Figure: an oblique split testing N1 > N2, with "+" instances on one side of the true/false boundary and "-" on the other.]
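An oblique split is simply a threshold on a linear combination of attributes; a minimal sketch (the weights are whatever the impurity search finds):

```python
def oblique_test(x, w, b):
    # True when w . x > b; the slide's N1 > N2 test is w = (1, -1), b = 0.
    return sum(wi * xi for wi, xi in zip(w, x)) > b

print(oblique_test((3, 2), (1, -1), 0))   # True: N1 > N2
```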
Parameters
•  Minimum number of instances necessary to split a node
•  Pruning/no pruning
•  Pruning confidence: how much to prune?
•  For computational reasons, the number of nodes or the depth of the tree can be limited
Algorithm details

Algorithm | Splitting criterion | Pruning criterion | Other features
CART | Gini; Twoing | Cross-validation post-pruning | Regression/classification; nominal/numeric attributes; missing values; oblique splits; grouping of nominal splits
ID3 | Information Gain (IG) | Pre-pruning | Classification; nominal attributes
C4.5 | Information Gain (IG); Information Gain Ratio (IGR) | Statistical post-pruning | Classification; nominal/numeric attributes; missing values; rule generator; multi-way splits
Bad things about DT
•  None! Well, maybe something…
•  They do not handle complex interactions between attributes well. Lack of expressive power.
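A small illustration, assuming scikit-learn is available: XOR is the classic interaction that no single axis-parallel split can capture, so the tree has to compose several splits to express it.

```python
from sklearn.tree import DecisionTreeClassifier

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]                      # XOR: the classes interact across attributes

stump = DecisionTreeClassifier(max_depth=1).fit(X, y)
print(stump.score(X, y))              # 0.5: no single axis-parallel split helps

tree = DecisionTreeClassifier().fit(X, y)
print(tree.score(X, y))               # 1.0, but only by composing several splits
```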
Bad things about DT
•  The replication problem: trees can end up with similar subtrees in mutually exclusive regions.
Good things about DT
•  Self-explanatory: easy for non-experts to understand; can be converted to rules
•  Handle both nominal and numeric attributes
•  Can handle uninformative and redundant attributes
•  Can handle missing values
•  Nonparametric method: in principle, no predefined idea of the concept to learn
•  Easy to tune: they do not have hundreds of parameters
Thanks for listening

[Closing slide, as a joke decision tree: Questions? NO → Go for a coffee. YES → Correctly answered? YES → Go for a coffee; NO → Go for a coffee.]
