Decision Tree
Presenter: Jaehyeong Ahn
Department of Applied Statistics, Konkuk University
jayahn0104@gmail.com
Contents
Introduction
Decision Tree in a nutshell
Four Ingredients in Decision Tree Construction
• Splitting rule
• Stopping rule
• Pruning rule
• Prediction
Some Algorithms for Decision Tree
Advantages and Disadvantages
Introduction
• Decision tree is one of the most widely used supervised learning methods in machine learning
• Its big appeal is that the decision process is very much akin to how we humans make decisions
• Therefore it is easy to understand and accept the results coming from the tree-style decision process
• It is often used as a base model when constructing other algorithms (e.g. Random Forest, Boosting)
Decision Tree in a nutshell
Decision Tree Construction Process
1 Recursively partition the feature space $\mathcal{X}$ into subregions $t_1, \cdots, t_m$ based on a specific splitting rule
2 Stop splitting the feature space (i.e. stop growing the tree) when the stopping rule holds
3 Prune the grown tree based on the pruning rule
4 Make a (local) prediction on each subregion of the pruned tree based on a specific estimation method (sketched below)
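To make the four steps concrete, here is a minimal Python sketch of the growing loop (steps 1, 2 and 4; pruning is treated later). The function names `gini`, `best_split`, and `grow`, and the stopping thresholds `max_depth` and `min_samples`, are illustrative choices, not part of the slides.

```python
import numpy as np

def gini(y):
    """Gini impurity of a node, computed from its label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 0.5 * np.sum(p * (1 - p))

def best_split(X, y):
    """Search every (variable, threshold) pair; keep the one whose two
    children have the smallest weighted impurity sum."""
    best = None
    for j in range(X.shape[1]):
        for c in np.unique(X[:, j])[1:]:          # candidate cut points
            left = X[:, j] < c
            cost = left.mean() * gini(y[left]) + (~left).mean() * gini(y[~left])
            if best is None or cost < best[0]:
                best = (cost, j, c)
    return best

def grow(X, y, depth=0, max_depth=3, min_samples=5):
    """Steps 1-2: recursively partition until a stopping rule holds.
    Step 4: predict by majority vote in each terminal subregion."""
    found = best_split(X, y)
    stop = (depth == max_depth or len(y) < min_samples
            or len(np.unique(y)) == 1 or found is None)
    if stop:
        labels, counts = np.unique(y, return_counts=True)
        return {"predict": labels[np.argmax(counts)]}
    _, j, c = found
    left = X[:, j] < c
    return {"var": j, "cut": c,
            "L": grow(X[left], y[left], depth + 1, max_depth, min_samples),
            "R": grow(X[~left], y[~left], depth + 1, max_depth, min_samples)}
```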
Four Ingredients in Decision Tree Construction
• Splitting rule
  • How does the split work?
  • What is the criterion for splitting the feature space?
• Stopping rule
  • What kind of threshold is used for stopping?
• Pruning rule
  • Why prune the tree?
  • How does it work?
• Estimation method
  • What kind of estimation method is used?
  • How does it work?
Notation
• $D = \{(x^{(i)}, y^{(i)})\}_{i=1}^{N}$: the dataset,
  • where $x^{(i)} = (x_1^{(i)}, \cdots, x_d^{(i)}) \in \mathcal{X}$ and $y^{(i)} \in \mathcal{Y}$, where $\mathcal{Y} = \{1, \cdots, K\}$
• Let $t$ be a node, which is also identified with a subregion of $\mathcal{X}$
  • $D(t) = \{(x^{(i)}, y^{(i)}) \in D : (x^{(i)}, y^{(i)}) \in t\}$: the set of data points in the node $t$
• Let $T$ be a tree
• $N = |D|$: the total number of data points
  • $N(t) = |D(t)|$: the number of data points in the node $t$
  • $N_j(t) = |\{(x^{(i)}, y^{(i)}) \in D(t) : y^{(i)} = j\}|$: the number of data points in the node $t$ with class label $j \in \mathcal{Y}$
• $p(j, t) = \frac{N_j}{N} \cdot \frac{N_j(t)}{N_j} = \frac{N_j(t)}{N}$
• $p(t) = \sum_j p(j, t) = \frac{N(t)}{N}$
• $p(j \mid t) = \frac{p(j, t)}{p(t)} = \frac{N_j(t)}{N(t)}$
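As a quick illustration of the notation, $p(j, t)$, $p(t)$, and $p(j \mid t)$ are just empirical proportions; the label vectors in this sketch are made-up examples.

```python
import numpy as np

y_all = np.array([1, 1, 1, 2, 2, 2, 2, 2])   # labels of the whole dataset D, N = 8
y_t   = np.array([1, 1, 2])                  # labels of D(t) for a node t, N(t) = 3

N, N_t = len(y_all), len(y_t)
for j in (1, 2):
    Nj_t = np.sum(y_t == j)                  # N_j(t)
    print(j, Nj_t / N,                       # p(j, t)  = N_j(t) / N
             Nj_t / N_t)                     # p(j | t) = N_j(t) / N(t)

print(N_t / N)                               # p(t) = sum_j p(j, t) = N(t) / N
```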
Splitting Rule
• For each node, determine a splitting variable $x_j$ and a splitting criterion $c$
• For a continuous splitting variable, the splitting criterion $c$ is a number
  • For example, if an observation $x^{(i)}$ satisfies $x_j^{(i)} < c$, then the tree assigns it to the left child node; otherwise the tree assigns it to the right child
• For a categorical variable, the splitting criterion divides the range of the splitting variable into two parts
  • For example, let the splitting variable $x_j \in \{1, 2, 3, 4\}$ and let the splitting criterion be $\{1, 2, 4\}$
  • If $x_j^{(i)} \in \{1, 2, 4\}$, the tree assigns it to the left child; otherwise, the tree assigns it to the right child (see the sketch below)
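A sketch of the left/right assignment for both variable types; the function name `route` and the example values are illustrative.

```python
def route(xj, criterion):
    """Assign one observation to a child, given its splitting variable x_j.

    For a continuous variable, criterion is a number c: left iff x_j < c.
    For a categorical variable, criterion is a set of levels: left iff x_j is in it.
    """
    if isinstance(criterion, set):
        return "left" if xj in criterion else "right"
    return "left" if xj < criterion else "right"

print(route(2.7, 3.0))       # continuous: 2.7 < 3.0         -> 'left'
print(route(3, {1, 2, 4}))   # categorical: 3 not in {1,2,4} -> 'right'
```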
Splitting Rule
• A split is determined based on impurity
• Impurity (or purity) is a measure of homogeneity for a given node
• For each node, we select the splitting variable and splitting criterion which minimize the weighted sum of impurities of the two child nodes
• Impurity is calculated by an impurity function $\phi$ which satisfies the following conditions
Splitting Rule
• Definition 1. An impurity function $\phi$ is a function $\phi(p_1, \cdots, p_K)$ defined for $p_1, \cdots, p_K$ with $p_j \geq 0$ for all $j$ and $p_1 + \cdots + p_K = 1$ such that
  (i) $\phi(p_1, \cdots, p_K) \geq 0$
  (ii) $\phi(1/K, \cdots, 1/K)$ is the maximum value of $\phi$
  (iii) $\phi(p_1, \cdots, p_K)$ is symmetric with regard to $p_1, \cdots, p_K$
  (iv) $\phi(1, 0, \cdots, 0) = \phi(0, 1, \cdots, 0) = \phi(0, \cdots, 0, 1) = 0$
• Definition 2. For node $t$, its impurity (measure) $i(t)$ is defined as
  $i(t) = \phi(p(1 \mid t), \cdots, p(K \mid t))$
Splitting Rule
Examples of impurity functions
• Entropy impurity
  $\phi(p_1, \cdots, p_K) = -\sum_j p_j \log p_j$
  (where we use the convention $0 \log 0 = 0$)
• Gini impurity
  $\phi(p_1, \cdots, p_K) = \frac{1}{2} \sum_j p_j (1 - p_j)$
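Both impurity functions are one-liners; this sketch follows the formulas above, including the $1/2$ factor in the Gini impurity and the $0 \log 0 = 0$ convention.

```python
import numpy as np

def entropy_impurity(p):
    """phi(p_1, ..., p_K) = -sum_j p_j log p_j, with 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0                               # drop zero entries: 0 log 0 = 0
    return -np.sum(p[nz] * np.log(p[nz]))

def gini_impurity(p):
    """phi(p_1, ..., p_K) = (1/2) sum_j p_j (1 - p_j)."""
    p = np.asarray(p, dtype=float)
    return 0.5 * np.sum(p * (1 - p))

# Maximal at the uniform distribution, zero at a pure node, as Definition 1 requires
print(entropy_impurity([0.5, 0.5]), entropy_impurity([1.0, 0.0]))   # ln 2 and 0
print(gini_impurity([0.5, 0.5]), gini_impurity([1.0, 0.0]))         # 0.25 and 0
```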
Splitting Rule
• Definition 3. The decrease in impurity
  • Let $t$ be a node and let $s$ be a split of $t$ into two child nodes $t_L$ and $t_R$:
  $\Delta i(s, t) = i(t) - p_L\, i(t_L) - p_R\, i(t_R)$
  • where $p_L = \frac{p(t_L)}{p(t)}$ and $p_R = \frac{p(t_R)}{p(t)}$, so $p_L + p_R = 1$
• Then $\Delta i(s, t) \geq 0$ (see Proposition 4.4 in Breiman et al. (1984))
• Hence the splitting rule at $t$ is to take, among all possible candidate splits, the split $s^*$ that decreases the impurity most:
  $s^* = \operatorname{argmax}_s \Delta i(s, t)$
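A sketch of the splitting rule for one continuous variable: compute $\Delta i(s, t)$ for every candidate threshold and take the argmax. The entropy impurity, the helper names, and the toy data are illustrative.

```python
import numpy as np

def impurity(y):
    """Entropy impurity i(t) of a node, computed from the labels in the node."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def decrease(y, left):
    """Delta i(s, t) = i(t) - p_L i(t_L) - p_R i(t_R) for a boolean left mask."""
    pL = left.mean()
    return impurity(y) - pL * impurity(y[left]) - (1 - pL) * impurity(y[~left])

def best_split(x, y):
    """s* = argmax_s Delta i(s, t), over thresholds on a single variable x."""
    return max((decrease(y, x < c), c) for c in np.unique(x)[1:])

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0,   0,   0,   1,   1,   1])
print(best_split(x, y))   # (0.6931..., 4.0): the cut at 4.0 separates the classes
```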
Splitting Rule
Why not use misclassification error as an impurity function?
• Recall that $\Delta i(s, t) = i(t) - p_L\, i(t_L) - p_R\, i(t_R)$ and $s^* = \operatorname{argmax}_s \Delta i(s, t)$
• Consider a node $t$ with 80 observations, 40 in each of two classes. Split $s_A$ sends 40 observations to each child, both with class proportions $(3/4, 1/4)$; split $s_B$ sends 60 observations to a child with proportions $(4/6, 2/6)$ and 20 to a pure child
• Misclassification error: $i_M(t) = 1 - \max_j p_j$ (where $p_j = p(j \mid t)$)
  • A: $\Delta i_M(s_A, t) = (1 - \frac{1}{2}) - \frac{40}{80}(1 - \frac{3}{4}) - \frac{40}{80}(1 - \frac{3}{4}) = \frac{1}{4}$
  • B: $\Delta i_M(s_B, t) = (1 - \frac{1}{2}) - \frac{60}{80}(1 - \frac{4}{6}) - \frac{20}{80}(1 - 1) = \frac{1}{4}$
• Entropy impurity (with the natural logarithm): $i_E(t) = -\sum_j p_j \log p_j$
  • A: $\Delta i_E(s_A, t) = 0.130812$
  • B: $\Delta i_E(s_B, t) = 0.2157616$
• Misclassification error cannot distinguish the two splits, while entropy prefers $s_B$, the split that produces a pure node
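The arithmetic on this slide can be checked directly; the helper names below are ad hoc, and the entropy uses the natural logarithm, which reproduces the stated values.

```python
import numpy as np

def i_M(p):                                   # misclassification error
    return 1 - max(p)

def i_E(p):                                   # entropy impurity, natural log
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

parent = [1/2, 1/2]                           # node t: 80 observations, 40 / 40
A = [(40/80, [3/4, 1/4]), (40/80, [3/4, 1/4])]
B = [(60/80, [4/6, 2/6]), (20/80, [1.0, 0.0])]

def delta(i, split):                          # Delta i(s, t) for a candidate split
    return i(parent) - sum(w * i(p) for w, p in split)

print(delta(i_M, A), delta(i_M, B))           # 0.25 0.25  -- a tie
print(delta(i_E, A), delta(i_E, B))           # 0.1308... 0.2157... -- B wins
```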
Stopping Rule
• Stopping rules terminate further splitting
• For example
  • All observations in a node belong to one class
  • The number of observations in a node is small
  • The decrease of impurity is small
  • The depth of a node is larger than a given number
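For reference, each of the stopping rules listed above has a direct counterpart among the hyperparameters of scikit-learn's DecisionTreeClassifier; the numeric values in this sketch are arbitrary examples.

```python
from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(
    max_depth=5,                 # stop when a node's depth exceeds a given number
    min_samples_split=10,        # stop when a node has few observations
    min_impurity_decrease=1e-3,  # stop when the decrease of impurity is small
)
# The first rule needs no parameter: a node whose observations all share
# one class is never split further.
```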
Pruning Rule
• A tree with too many nodes will have a large prediction error for new observations
• It is therefore appropriate to prune away some branches of the tree to achieve a better prediction error
• To determine the size of the tree, we estimate the prediction error using a validation set or cross-validation
Pruning Rule
Pruning Process
• For a given tree $T$ and positive number $\alpha$, cost-complexity pruning is defined by
  cost-complexity$(\alpha)$ = (error rate of $T$) $+\ \alpha |T|$
  (where $|T|$ is the number of nodes)
• In general, the larger the tree (the larger $|T|$), the smaller the error rate (see Proposition 1 in Hyeongin Choi (2017)). But the cost-complexity does not necessarily decrease as $|T|$ increases
• For the grown tree $T_{\max}$, $T(\alpha)$ is the subtree which minimizes cost-complexity$(\alpha)$
• In general, the larger $\alpha$, the smaller $|T(\alpha)|$
Pruning Rule
Pruning Process
• One important property of $T(\alpha)$ is that if $\alpha_1 \leq \alpha_2$, then $T(\alpha_1) \geq T(\alpha_2)$, which means $T(\alpha_2)$ is a subtree of $T(\alpha_1)$
• Let $T_1 = T(\alpha_1), T_2 = T(\alpha_2), \cdots$; then we get the following sequence of pruned subtrees:
  $T_1 \geq T_2 \geq \cdots \geq \{t_1\}$,
  where $0 = \alpha_1 < \alpha_2 < \cdots$
• For a given $\alpha$, we estimate the generalization error of $T(\alpha)$ by a validation set or cross-validation
• Choose $\alpha^*$ (and the corresponding $T(\alpha^*)$) which minimizes the (estimated) generalization error (see Hyeongin Choi (2017) for details)
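scikit-learn implements this procedure as minimal cost-complexity pruning: cost_complexity_pruning_path returns the sequence of effective $\alpha$ values, and the ccp_alpha parameter refits the pruned subtree $T(\alpha)$. A validation split then selects $\alpha^*$; the dataset here is just a stand-in.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# Alphas 0 = alpha_1 < alpha_2 < ... giving the nested subtrees T_1 >= T_2 >= ...
tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
alphas = tree.cost_complexity_pruning_path(X_tr, y_tr).ccp_alphas

# Estimate the generalization error of each T(alpha) on the validation set
scores = [DecisionTreeClassifier(random_state=0, ccp_alpha=a)
          .fit(X_tr, y_tr).score(X_val, y_val) for a in alphas]
print("alpha* =", alphas[int(np.argmax(scores))])
```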
Estimation Method
• Once a tree is fixed, a prediction at each subregion (terminal node) can be determined from that tree
• Since the estimation operates separately on each subregion, it is called a local estimation
• For this local estimation, any estimation method can be used; which method to use depends on the model assumption
• For example, for classification, majority vote or logistic regression can be used, depending on the modeler's probability assumption
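A sketch of the simplest local estimator, majority vote: the prediction in a terminal node $t$ is the class $j$ maximizing $p(j \mid t)$. The function name is illustrative.

```python
import numpy as np

def leaf_prediction(y_t):
    """Majority vote in terminal node t: argmax_j p(j | t),
    where p(j | t) = N_j(t) / N(t)."""
    labels, counts = np.unique(y_t, return_counts=True)
    return labels[np.argmax(counts)]

print(leaf_prediction(np.array([1, 1, 2, 1, 3])))   # -> 1
```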
Some Algorithms for Decision Tree
CHAID (CHi-squared Automatic Interaction Detector)
• by J. A. Hartigan, 1975
• Employs the $\chi^2$ statistic as the impurity measure
• No pruning process; it stops growing at a certain size
CART (Classification And Regression Tree)
• by Breiman et al., 1984
• Only binary splits are used
• Cost-complexity pruning is an important unique feature
C5.0 (successor of ID3 and C4.5)
• by J. Ross Quinlan, 1993
• Multiway splits are available
• For a categorical input variable, a node splits into as many children as there are categories
Advantages and Disadvantages
Advantages
• Easy to interpret and explain
• Trees can be displayed graphically
• Trees can easily handle both continuous and categorical variables without the need to create dummy variables
Disadvantages
• Poor prediction accuracy compared to other models
• When the depth is large, both accuracy and interpretability suffer
References
1 Breiman, L., Friedman, J.H., Olshen, R.A., and Stone, C.J., Classification and Regression Trees, Chapman & Hall/CRC (1984)
2 Hyeongin Choi (2017), Lecture 9: Classification and Regression Tree (CART), http://www.math.snu.ac.kr/~hichoi/machinelearning/lecturenotes/CART.pdf
3 Yongdai Kim (2017), Chapter 6. Decision Tree, https://stat.snu.ac.kr/ydkim/courses/2017-1/addm/Chap6-DecisionTree.pdf