2. Decision Tree (1/2)
• Training set: feature vectors xi with labels yi (+1: Yes, -1: No)
• Learned decision tree
[Note] Only one feature is tested at each node.
3. Decision Tree (2/2)
• Example (a test instance):

Outlook | Temperature | Humidity | Windy | Play
Rainy   | Hot         | High     | True  | ?

Ans: no
4. Why use decision trees
• Easy to interpret
  • E.g., if the outlook is sunny and the humidity is high, then we can't play tennis.
• Performs feature selection
  • The top few nodes on which the tree splits are essentially the most important variables in the dataset, so feature selection is completed automatically.
5. Why not use decision trees
• Decision trees do not work well when the class boundaries are smooth: axis-aligned splits can only approximate a smooth boundary with a staircase of rectangles.
[Figure: smooth boundaries vs. decision-tree boundaries]
6. Why not use decision trees
• Poor resolution on data with complex relationships among the variables
8. CART (Classification and Regression Trees)
If a dataset S contains n classes, Gini(S) is defined as

  Gini(S) = 1 − Σⱼ pⱼ²

where pⱼ is the probability that a sample in S belongs to class j.

Learning method | Author         | Data type               | Splitting rule | Pruning rule
CART            | Breiman (1984) | Discrete and continuous | Gini index     | Overall error rate
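As a sketch, the definition above translates directly into Python (the function name and the list-of-labels representation are my own choices):

```python
from collections import Counter

def gini(labels):
    """Gini(S) = 1 - sum_j p_j**2, where p_j is the fraction of
    samples in S belonging to class j."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

# A pure node scores 0; an evenly mixed binary node scores 0.5.
print(gini(["yes", "yes", "yes"]))  # 0.0
print(gini(["yes", "no"]))          # 0.5
```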
9. Example
We have two features, Outlook and Temp., and we want to know whether we will play tennis.

Splitting on Outlook (threshold: sunny → play, rain → not play):

  Gini_Outlook(S) = (3/4)·(1 − (3/3)² − (0/3)²) + (1/4)·(1 − (1/1)² − (0/1)²) = 0   ← best threshold

Splitting on Temp. (threshold: hot → play, cold → not play):

  Gini_Temp.(S) = (2/4)·(1 − (1/2)² − (1/2)²) + (2/4)·(1 − (1/2)² − (1/2)²) = 1/2   ← worst threshold

In each term, the expression in parentheses measures how pure a branch is after cutting at the threshold, and the leading fraction is that branch's weight (its share of the samples).
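The weighted computation above can be reproduced with a small helper (a sketch; the function names are mine):

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_gini(branches):
    """Weighted Gini of a split: each branch's Gini (the purity term)
    times its share of the samples (the weight)."""
    total = sum(len(b) for b in branches)
    return sum(len(b) / total * gini(b) for b in branches)

# Outlook: branches of 3 and 1 samples, each pure -> 0 (best threshold)
print(split_gini([["play"] * 3, ["not"]]))             # 0.0
# Temp.: two branches of 2, each a 1/1 mix -> 1/2 (worst threshold)
print(split_gini([["play", "not"], ["play", "not"]]))  # 0.5
```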
10. Example
We have two features, Outlook and Temp., and we want to know whether we will play tennis.
Using these two features, we draw the decision tree with the CART method.

DAY | Outlook | Temp. | Play tennis
D1  | Sunny   | Hot   | NO
D2  | Sunny   | Hot   | YES
D3  | Sunny   | Mild  | NO
D4  | Sunny   | Cold  | YES
D5  | Rain    | Cold  | YES

Play vs. Outlook:
     | Sunny | Rain
NO   |   3   |  0
YES  |   1   |  1

Play vs. Temp.:
     | Hot | Mild | Cold
NO   |  1  |  1   |  0
YES  |  1  |  0   |  2
12. Example
• Gini_Outlook(S) = 3/10 > Gini_Temp.(S) = 4/15

We choose the feature with the smaller Gini as the decision tree's root. (CART builds binary splits, so Temp. is split as [Cold] vs. [Mild, Hot], which is what gives 4/15.)

Temp.
├── [Cold] → Yes
└── [Mild, Hot] → Outlook
        ├── [Rain] → Yes
        └── [Sunny] → No
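As a check on these numbers (a sketch; function names are mine, and branches are given as (NO, YES) class-count pairs taken from the contingency tables on the previous slide):

```python
def gini(counts):
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def split_gini(branches):
    """Weighted Gini over the branches of a binary split."""
    total = sum(sum(b) for b in branches)
    return sum(sum(b) / total * gini(b) for b in branches)

# Outlook split as [Sunny] vs [Rain]:
g_outlook = split_gini([(3, 1), (0, 1)])   # = 3/10
# Temp. split as [Cold] vs [Mild, Hot] (CART splits are binary):
g_temp = split_gini([(0, 2), (2, 1)])      # = 4/15
print(g_outlook > g_temp)                  # True -> Temp. becomes the root
```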
13. Example: riding-mower classification

Obs # | Income | Lot size | Class
1     | Middle | Middle   | Owners
2     | High   | Middle   | Owners
3     | High   | Big      | Owners
4     | High   | Big      | Owners
5     | Middle | Big      | Owners
6     | Middle | Middle   | Non-owners
7     | Low    | Big      | Non-owners
8     | Middle | Middle   | Non-owners
9     | Low    | Big      | Non-owners
10    | High   | Middle   | Non-owners

A riding-mower manufacturer would like to find a way of classifying families in a city into those that are likely to purchase a riding mower and those that are not likely to buy one. A pilot random sample of 5 owners and 5 non-owners in the city is undertaken.
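For this dataset, one can enumerate every two-way partition of each feature's values and keep the lowest weighted Gini, which is how CART's binary splitting rule picks a split (a sketch; all names are mine):

```python
from collections import Counter
from itertools import combinations

# The pilot sample of 10 families from the table above: (Income, Lot size, Class).
rows = [
    ("Middle", "Middle", "Owner"),     ("High",   "Middle", "Owner"),
    ("High",   "Big",    "Owner"),     ("High",   "Big",    "Owner"),
    ("Middle", "Big",    "Owner"),     ("Middle", "Middle", "Non-owner"),
    ("Low",    "Big",    "Non-owner"), ("Middle", "Middle", "Non-owner"),
    ("Low",    "Big",    "Non-owner"), ("High",   "Middle", "Non-owner"),
]

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_binary_split(col):
    """Lowest weighted Gini over all two-way partitions of a feature's values."""
    values = sorted({r[col] for r in rows})
    best = 1.0
    for k in range(1, len(values)):
        for left in combinations(values, k):
            a = [r[2] for r in rows if r[col] in left]
            b = [r[2] for r in rows if r[col] not in left]
            if a and b:
                best = min(best, (len(a) * gini(a) + len(b) * gini(b)) / len(rows))
    return best

income, lot_size = best_binary_split(0), best_binary_split(1)
print(income < lot_size)   # True: Income gives the purer first split
```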
15. How to split?
• Split criterion: goodness function
  • Used to select the attribute to split on at a tree node during the tree-generation phase
• Goodness function in CART: the Gini index
19. Why use impurity instead of error as the goodness function?
• The main objective of a decision tree is to find pure nodes containing only one class.
• A split can leave the overall error rate unchanged even while moving a child closer to purity, so the error rate cannot rank such splits; an impurity measure like the Gini index can.
[Figure: example split with a 25% error rate]
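A small hypothetical example makes the point. The class counts below are (class A, class B) pairs I chose for illustration; the 25% error rate matches the figure's example:

```python
def gini(counts):
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def error_rate(counts):
    return 1.0 - max(counts) / sum(counts)

def weighted(metric, branches):
    total = sum(sum(b) for b in branches)
    return sum(sum(b) / total * metric(b) for b in branches)

split_a = [(3, 1), (1, 3)]   # neither child is pure
split_b = [(4, 2), (0, 2)]   # the second child is pure

# Error rate cannot tell the two splits apart: both leave 25% error ...
print(weighted(error_rate, split_a), weighted(error_rate, split_b))
# ... but Gini prefers split_b, which produces a pure node.
print(weighted(gini, split_b) < weighted(gini, split_a))   # True
```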