INTRODUCTION
Classification Trees: When the decision tree has a categorical target variable. The tree above is an example of a classification tree because the result can take only two possible values.
Regression Trees: When the decision tree has a continuous target variable. For example, a regression tree would be used to predict the price of a newly launched product, because the price can take any value depending on various constraints.
Both types of decision trees fall under the Classification and Regression Tree (CART)
designation.
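As a quick, hedged illustration of the two tree types, the sketch below fits a classification tree on a categorical target and a regression tree on a continuous target with scikit-learn; the tiny arrays are made-up placeholders, not the golf data used later.

```python
# Minimal sketch (toy data assumed): classification tree vs. regression tree.
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[0], [1], [2], [3]]                 # one numeric feature, placeholder values

y_class = ["No", "No", "Yes", "Yes"]     # categorical target -> classification tree
clf = DecisionTreeClassifier().fit(X, y_class)
print(clf.predict([[1.5]]))              # predicts a class label

y_reg = [10.0, 12.5, 20.0, 23.5]         # continuous target -> regression tree
reg = DecisionTreeRegressor().fit(X, y_reg)
print(reg.predict([[1.5]]))              # predicts a numeric value
```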
Golf players for sunny outlook = {25, 30, 35, 38, 48}
Average of golf players for sunny outlook = (25 + 30 + 35 + 38 + 48)/5 = 35.2
Standard deviation of golf players for sunny outlook = √(((25 – 35.2)² + (30 – 35.2)² + …)/5) = 7.78
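The arithmetic above can be checked with a few lines of Python; this is a minimal sketch that assumes the population standard deviation (dividing by N, as the worked numbers do), not the sample standard deviation.

```python
# Verify the sunny-outlook statistics (population standard deviation, divide by N).
sunny = [25, 30, 35, 38, 48]

mean = sum(sunny) / len(sunny)
std = (sum((x - mean) ** 2 for x in sunny) / len(sunny)) ** 0.5

print(round(mean, 2))  # 35.2
print(round(std, 2))   # 7.78
```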
Golf players for overcast outlook = {46, 43, 52, 44}
Average of golf players for overcast outlook = (46 + 43 + 52 + 44)/4 = 46.25
Standard deviation of golf players for overcast outlook = √(((46 – 46.25)² + (43 – 46.25)² + …)/4) = 3.49
Golf players for rainy outlook = {45, 52, 23, 46, 30}
Average of golf players for rainy outlook = (45 + 52 + 23 + 46 + 30)/5 = 39.2
Standard deviation of golf players for rainy outlook = √(((45 – 39.2)² + (52 – 39.2)² + …)/5) = 10.87
Weighted standard deviation for outlook = (4/14)x3.49 + (5/14)x10.87 + (5/14)x7.78 = 7.66
Standard deviation reduction for outlook = 9.32 – 7.66 = 1.66
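The weighted standard deviation and the resulting reduction can be reproduced directly from the three subset values above; a minimal sketch, using 9.32 as the global standard deviation of all 14 records as stated in the text:

```python
# Standard deviation reduction (SDR) for the outlook attribute.
global_std = 9.32        # standard deviation of all 14 records, as used above
subsets = {              # branch: (record count, standard deviation)
    "overcast": (4, 3.49),
    "rainy":    (5, 10.87),
    "sunny":    (5, 7.78),
}

total = sum(n for n, _ in subsets.values())
weighted_std = sum(n / total * s for n, s in subsets.values())

print(round(weighted_std, 2))               # 7.66
print(round(global_std - weighted_std, 2))  # 1.66
```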
Weighted standard deviation for humidity = (7/14)x9.36 + (7/14)x8.73 = 9.04
Standard deviation reduction for humidity = 9.32 – 9.04 = 0.28
[Figure: regression tree after the first split. Outlook is the root node; each branch (4 or 5 records) carries its own standard deviation as the global standard deviation of that sub data set, and the Sunny branch is split further against the candidate attributes Temp (Hot, Mild, Cool), Wind (Weak, Strong) and Humidity (High, Normal).]
Golf players for sunny outlook = {25, 30, 35, 38, 48}
Average of golf players for sunny outlook = (25 + 30 + 35 + 38 + 48)/5 = 35.2
Standard deviation of golf players for sunny outlook = √(((25 – 35.2)² + (30 – 35.2)² + …)/5) = 7.78
This is considered the global standard deviation for this sub data set = 7.78
Standard deviation for sunny outlook and hot temperature = 2.5
Standard deviation for sunny outlook and cool temperature = 0
Standard deviation for sunny outlook and mild temperature = 6.5
Weighted standard deviation for sunny outlook and temperature = (2/5)x2.5 + (1/5)x0 + (2/5)x6.5 = 3.6
Standard deviation reduction for sunny outlook and temperature = 7.78 – 3.6 = 4.18
Weighted standard deviation for sunny outlook and humidity = (3/5)x4.08 + (2/5)x5 = 4.45
Standard deviation reduction for sunny outlook and humidity = 7.78 – 4.45 = 3.33
Weighted standard deviation for sunny outlook and wind = (2/5)x9 + (3/5)x5.56 = 6.93
Standard deviation reduction for sunny outlook and wind = 7.78 – 6.93 = 0.85
Summarizing the standard deviation reductions when outlook is sunny: temperature (4.18) > humidity (3.33) > wind (0.85), so temperature is selected as the next node under the sunny branch (see the sketch below).
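A hedged sketch of this selection step, using only the subset sizes and standard deviations quoted above (the 7.78 "global" value here is the sunny branch's own standard deviation):

```python
# Pick the attribute with the highest standard deviation reduction (SDR)
# inside the sunny branch, from the counts and standard deviations quoted above.
branch_std = 7.78  # standard deviation of the sunny sub data set (5 records)

candidates = {
    # attribute: list of (record count, standard deviation) per attribute value
    "temperature": [(2, 2.5), (1, 0.0), (2, 6.5)],
    "humidity":    [(3, 4.08), (2, 5.0)],
    "wind":        [(2, 9.0), (3, 5.56)],
}

def sdr(groups, global_std):
    total = sum(n for n, _ in groups)
    weighted = sum(n / total * s for n, s in groups)
    return global_std - weighted

reductions = {attr: round(sdr(groups, branch_std), 2)
              for attr, groups in candidates.items()}
print(reductions)                           # {'temperature': 4.18, 'humidity': 3.33, 'wind': 0.84}
print(max(reductions, key=reductions.get))  # 'temperature'
```

(The wind value prints as 0.84 here because the text rounds the weighted standard deviation to 6.93 before subtracting, giving 0.85.)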
FINAL FORM OF REGRESSION TREE
https://sefiks.com/2018/08/28/a-step-by-step-regression-decision-tree-example/
[Figure: final regression tree; each leaf node stores the number of golf players (the prediction) for that branch.]
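The procedure walked through above (compute the standard deviation reduction of every candidate attribute, split on the best one, and recurse on each branch) can be summarized in a short recursive sketch. This is a minimal illustration under assumed data structures (a list of dicts with categorical features and a numeric "players" target) and an assumed stopping rule, not the exact code behind the linked tutorial.

```python
# Minimal sketch of recursive regression-tree construction by standard
# deviation reduction. Data layout and stopping rule are assumptions.
def std(values):
    m = sum(values) / len(values)
    return (sum((v - m) ** 2 for v in values) / len(values)) ** 0.5

def build_tree(rows, features, target="players", min_rows=3):
    values = [r[target] for r in rows]
    # Stop when the subset is small or homogeneous: the leaf predicts the average.
    if len(rows) < min_rows or std(values) == 0 or not features:
        return round(sum(values) / len(values), 2)

    global_std = std(values)
    best_feature, best_reduction = None, 0.0
    for f in features:
        groups = {}
        for r in rows:
            groups.setdefault(r[f], []).append(r[target])
        weighted = sum(len(g) / len(rows) * std(g) for g in groups.values())
        if global_std - weighted > best_reduction:
            best_feature, best_reduction = f, global_std - weighted

    if best_feature is None:        # no split reduces the standard deviation
        return round(sum(values) / len(values), 2)

    # Recurse on each branch of the winning feature.
    remaining = [f for f in features if f != best_feature]
    branches = {}
    for r in rows:
        branches.setdefault(r[best_feature], []).append(r)
    return {best_feature: {value: build_tree(subset, remaining, target, min_rows)
                           for value, subset in branches.items()}}
```

Called on the 14-record golf dataset with features such as ["outlook", "temp", "humidity", "wind"], a sketch like this should choose outlook at the root and temperature under the sunny branch, mirroring the calculations above; the exact leaves depend on the stopping rule assumed here.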
Decision Tree and Entropy
Information gain – the attribute with the higher gain is the best candidate to be selected as a node
Entropy – if all the data belongs to the same class label, entropy = 0 (pure)
If the input data belongs to many class labels, entropy is close to 1 (impure)
Nodes – input attributes (Ex: Outlook)
Arcs/links/edges – values of input attributes (Ex: Sunny, Rainy, Overcast)
Top node – root node
Other nodes in the tree – intermediate nodes
Leaf node (last level of the tree) – identifies the corresponding class label (Ex: Play = Yes/No)
From a decision tree, classification rules can be derived
How many rules can be derived? As many as there are leaf-level nodes
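The entropy and information-gain ideas summarized above can be made concrete with a short sketch; the Play label counts used here (9 Yes / 5 No overall, split by Outlook) are an assumed toy distribution for illustration, not values computed in this section.

```python
# Entropy and information gain for a categorical split (assumed toy label counts).
from math import log2
from collections import Counter

def entropy(labels):
    total = len(labels)
    return sum(-(c / total) * log2(c / total) for c in Counter(labels).values())

# Assumed example: Play labels for the whole dataset and per Outlook value.
play_all = ["Yes"] * 9 + ["No"] * 5
by_outlook = {
    "Sunny":    ["Yes"] * 2 + ["No"] * 3,
    "Overcast": ["Yes"] * 4,
    "Rainy":    ["Yes"] * 3 + ["No"] * 2,
}

gain = entropy(play_all) - sum(len(g) / len(play_all) * entropy(g)
                               for g in by_outlook.values())
print(round(entropy(play_all), 3))     # 0.94  (impure: both labels present)
print(round(entropy(["Yes"] * 4), 3))  # 0.0   (pure: only one label)
print(round(gain, 3))                  # 0.247 information gain for Outlook
```

The attribute with the highest information gain (here Outlook) would be selected as the node, exactly as the higher standard deviation reduction selected the node in the regression case.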