Decision Tree Working Example
Rec  Age     Income  Student  Credit_rating  Buys_computer
R1   <=30    High    No       Fair           No
R2   <=30    High    No       Excellent      No
R3   31..40  High    No       Fair           Yes
R4   >40     Medium  No       Fair           Yes
R5   >40     Low     Yes      Fair           Yes
R6   >40     Low     Yes      Excellent      No
R7   31..40  Low     Yes      Excellent      Yes
R8   <=30    Medium  No       Fair           No
R9   <=30    Low     Yes      Fair           Yes
R10  >40     Medium  Yes      Fair           Yes
R11  <=30    Medium  Yes      Excellent      Yes
R12  31..40  Medium  No       Excellent      Yes
R13  31..40  High    Yes      Fair           Yes
R14  >40     Medium  No       Excellent      No
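As a sketch of how the class counts used in the calculations can be read off the table, here is a short Python encoding of the 14 tuples. The names `ROWS`, `class_counts` and `age_counts` are illustrative choices, not part of the worksheet.

```python
from collections import Counter

# The 14 training tuples R1..R14:
# (Age, Income, Student, Credit_rating, Buys_computer)
ROWS = [
    ("<=30", "High", "No", "Fair", "No"),
    ("<=30", "High", "No", "Excellent", "No"),
    ("31..40", "High", "No", "Fair", "Yes"),
    (">40", "Medium", "No", "Fair", "Yes"),
    (">40", "Low", "Yes", "Fair", "Yes"),
    (">40", "Low", "Yes", "Excellent", "No"),
    ("31..40", "Low", "Yes", "Excellent", "Yes"),
    ("<=30", "Medium", "No", "Fair", "No"),
    ("<=30", "Low", "Yes", "Fair", "Yes"),
    (">40", "Medium", "Yes", "Fair", "Yes"),
    ("<=30", "Medium", "Yes", "Excellent", "Yes"),
    ("31..40", "Medium", "No", "Excellent", "Yes"),
    ("31..40", "High", "Yes", "Fair", "Yes"),
    (">40", "Medium", "No", "Excellent", "No"),
]

# Class distribution of the whole database D (last column).
class_counts = Counter(row[-1] for row in ROWS)
print(class_counts)  # Counter({'Yes': 9, 'No': 5})

# Class distribution inside each Age partition (column 0).
age_counts = {}
for row in ROWS:
    age_counts.setdefault(row[0], Counter())[row[-1]] += 1
for value, counts in age_counts.items():
    # Prints "<=30 2 3", "31..40 4 0", ">40 3 2" (value, Yes count, No count)
    print(value, counts["Yes"], counts["No"])
```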
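Expected information (entropy) needed to classify a tuple in database D, which holds 14 tuples: 9 with Buys_computer = Yes and 5 with Buys_computer = No (all logarithms are base 2):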
Info (D) = -9/14 log(9/14) - 5/14 log(5/14)
= -0.642857 * log(0.642857) - 0.357143 * log(0.357143)
= -0.642857 * (-0.637430) - 0.357143 * (-1.485427)
Info (D) = 0.409776 + 0.530510 = 0.940286 bits
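General formula: $Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)$, where $p_i$ is the proportion of tuples in D that belong to class $C_i$ and m is the number of classes.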
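To check the arithmetic, the formula can be written as a small Python function. `entropy` is a hypothetical helper that takes the number of tuples in each class and uses base-2 logarithms; empty classes are skipped, so 0 log 0 counts as 0.

```python
import math

def entropy(counts):
    """Info(D) = -sum_i p_i * log2(p_i), with p_i = counts[i] / sum(counts).

    Classes with a zero count are skipped, i.e. 0 * log2(0) is treated as 0.
    """
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# D holds 9 tuples of class "Yes" and 5 of class "No" (m = 2 classes).
info_d = entropy([9, 5])
print(f"Info(D) = {info_d:.6f} bits")  # Info(D) = 0.940286 bits
```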
Information needed to classify D after using attribute A to split D into v partitions:
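For Attribute “Age”
(<=30: 2 Yes, 3 No; 31..40: 4 Yes, 0 No; >40: 3 Yes, 2 No. I(y, n) is the entropy of a partition with y Yes and n No tuples.)
Info Age (D) = 5/14 I(2,3) + 4/14 I(4,0) + 5/14 I(3,2)
= 5/14[-2/5 log(2/5) - 3/5 log(3/5)] + 4/14[-4/4 log(4/4) - 0/4 log(0/4)] + 5/14[-3/5 log(3/5) - 2/5 log(2/5)]
= 0.357143 * [-0.4*(-1.321928) - 0.6*(-0.736966)] +
0.285714 * [-1*0 - 0] +
0.357143 * [-0.6*(-0.736966) - 0.4*(-1.321928)]
(the term 0/4 log(0/4) is taken as 0, following the usual convention 0 log 0 = 0)
= 0.357143 * [0.528771 + 0.442179] + 0 + 0.357143 * [0.442179 + 0.528771]
Info Age (D) = 0.346768 + 0.346768 = 0.693536 bits
Information gained by branching on attribute Age:
Gain (Age) = Info(D) - Info Age (D) = 0.940286 - 0.693536 = 0.246750 ≈ 0.2467 bits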
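A minimal Python sketch of the Age calculation, assuming each partition is given as its (Yes, No) class counts; `entropy` and `expected_info` are illustrative helper names.

```python
import math

def entropy(counts):
    # Info of one partition; 0 * log2(0) is taken as 0 by skipping empty classes.
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def expected_info(partitions):
    # Info_A(D) = sum_j |D_j|/|D| * Info(D_j), each D_j given as (yes, no) counts.
    n = sum(sum(p) for p in partitions)
    return sum(sum(p) / n * entropy(p) for p in partitions)

# Age partitions as (yes, no): <=30 -> (2, 3), 31..40 -> (4, 0), >40 -> (3, 2)
info_age = expected_info([(2, 3), (4, 0), (3, 2)])
gain_age = entropy([9, 5]) - info_age
print(f"Info_age(D) = {info_age:.6f}, Gain(age) = {gain_age:.4f}")
# Info_age(D) = 0.693536, Gain(age) = 0.2467
```

Passing the partition counts of any other attribute to `expected_info` gives its Info_A(D) the same way.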
Similarly for Attribute “Income”
(High: 2 Yes, 2 No; Medium: 4 Yes, 2 No; Low: 3 Yes, 1 No)
Info income (D) = 4/14[-2/4 log(2/4) - 2/4 log(2/4)] + 6/14[-4/6 log(4/6) - 2/6 log(2/6)] + 4/14[-3/4 log(3/4) - 1/4 log(1/4)]
= 0.285714 * [-0.5*log(0.5) - 0.5*log(0.5)] + 0.428571 * [-0.666667*log(0.666667) - 0.333333*log(0.333333)]
+ 0.285714 * [-0.75*log(0.75) - 0.25*log(0.25)]
= 0.285714 * [0.5 + 0.5] + 0.428571 * [0.389975 + 0.528321] + 0.285714 * [0.311278 + 0.5]
= 0.285714 * 1 + 0.428571 * 0.918296 + 0.285714 * 0.811278
Info income (D) = 0.285714 + 0.393555 + 0.231794 = 0.911063 bits
Gain (Income) = 0.940286 - 0.911063 = 0.029223 ≈ 0.029 bits
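The Income figures can be reproduced the same way. In this sketch the hypothetical helper `gain` recovers the class counts of D by summing the partition counts, so only the per-value (Yes, No) counts need to be supplied; exact fractions are used throughout instead of rounded decimals.

```python
import math

def entropy(counts):
    # Base-2 entropy of a class-count vector; empty classes contribute 0.
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def gain(partitions):
    """Return (Gain(A), Info_A(D)) for partitions given as (yes, no) counts.

    The class counts of D itself are recovered by summing the partitions.
    """
    parent = [sum(col) for col in zip(*partitions)]
    n = sum(parent)
    info_a = sum(sum(p) / n * entropy(p) for p in partitions)
    return entropy(parent) - info_a, info_a

# Income partitions as (yes, no): High -> (2, 2), Medium -> (4, 2), Low -> (3, 1)
gain_income, info_income = gain([(2, 2), (4, 2), (3, 1)])
print(f"Info_income(D) = {info_income:.6f}, Gain(income) = {gain_income:.4f}")
# Info_income(D) = 0.911063, Gain(income) = 0.0292
```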
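General formulas, where $D_j$ is the j-th of the v partitions produced by attribute A:
$Info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times Info(D_j)$
$Gain(A) = Info(D) - Info_A(D)$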
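The general formulas can also be applied directly to raw tuples rather than precomputed counts. This Python sketch (the helpers `entropy` and `info_gain` are illustrative, and the class label is assumed to be the last field of each tuple) groups tuples by attribute value to form the partitions D_j; it is shown on the (Age, Buys_computer) columns of the table.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    # Info computed from a list of class labels; only classes present are counted.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr):
    """Gain(A) = Info(D) - Info_A(D) for the attribute in column `attr`."""
    partitions = defaultdict(list)  # attribute value a_j -> class labels of D_j
    for row in rows:
        partitions[row[attr]].append(row[-1])
    info_a = sum(len(d_j) / len(rows) * entropy(d_j) for d_j in partitions.values())
    return entropy([row[-1] for row in rows]) - info_a

# (Age, Buys_computer) for tuples R1..R14
rows = [("<=30", "No"), ("<=30", "No"), ("31..40", "Yes"), (">40", "Yes"),
        (">40", "Yes"), (">40", "No"), ("31..40", "Yes"), ("<=30", "No"),
        ("<=30", "Yes"), (">40", "Yes"), ("<=30", "Yes"), ("31..40", "Yes"),
        ("31..40", "Yes"), (">40", "No")]
print(round(info_gain(rows, 0), 4))  # 0.2467
```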
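For Attribute “Student”
(Student = Yes: 6 Yes, 1 No; Student = No: 3 Yes, 4 No)
Info student (D) = 7/14[-6/7 log(6/7) - 1/7 log(1/7)] + 7/14[-3/7 log(3/7) - 4/7 log(4/7)]
= 0.50 * [-0.857143*(-0.222392) - 0.142857*(-2.807355)] + 0.50 * [-0.428571*(-1.222392) - 0.571429*(-0.807355)]
= 0.50 * [0.190622 + 0.401051] + 0.50 * [0.523882 + 0.461346]
= 0.50 * 0.591673 + 0.50 * 0.985228
Info student (D) = 0.295836 + 0.492614 = 0.788450 bits
Gain (Student) = 0.940286 - 0.788450 = 0.151836 ≈ 0.152 bits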
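A short Python check of the Student figures, printing the entropy of each partition, Info_student(D) and the gain. `entropy` here is an illustrative helper that takes the class counts as separate arguments.

```python
import math

def entropy(*counts):
    # Base-2 entropy of the given class counts; zero counts are skipped.
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

info_yes = entropy(6, 1)  # Student = Yes: 6 tuples buy, 1 does not
info_no = entropy(3, 4)   # Student = No: 3 tuples buy, 4 do not
info_student = 7 / 14 * info_yes + 7 / 14 * info_no
gain_student = entropy(9, 5) - info_student
print(f"{info_yes:.4f} {info_no:.4f} {info_student:.4f} {gain_student:.4f}")
# 0.5917 0.9852 0.7885 0.1518
```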
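For Attribute “Credit Rating”
(Fair: 6 Yes, 2 No; Excellent: 3 Yes, 3 No)
Info Credit Rating (D) = 8/14[-6/8 log(6/8) - 2/8 log(2/8)] + 6/14[-3/6 log(3/6) - 3/6 log(3/6)]
= 0.571429 * [-0.75*(-0.415037) - 0.25*(-2)] + 0.428571 * [-0.5*(-1) - 0.5*(-1)]
= 0.571429 * [0.311278 + 0.5] + 0.428571 * [0.5 + 0.5]
= 0.571429 * 0.811278 + 0.428571 * 1
Info Credit Rating (D) = 0.463587 + 0.428571 ≈ 0.892159 bits
Gain (Credit Rating) = 0.940286 - 0.892159 = 0.048127 ≈ 0.048 bits
Age therefore has the highest information gain of the four attributes and is chosen as the splitting attribute for the root node.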
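Putting the steps together: the sketch below computes the gain of every attribute from the full table and, as in ID3-style tree induction, picks the attribute with the highest gain as the root split. The data layout and the helper names are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

ATTRIBUTES = ["Age", "Income", "Student", "Credit_rating"]
# (Age, Income, Student, Credit_rating, Buys_computer) for tuples R1..R14
ROWS = [
    ("<=30", "High", "No", "Fair", "No"),
    ("<=30", "High", "No", "Excellent", "No"),
    ("31..40", "High", "No", "Fair", "Yes"),
    (">40", "Medium", "No", "Fair", "Yes"),
    (">40", "Low", "Yes", "Fair", "Yes"),
    (">40", "Low", "Yes", "Excellent", "No"),
    ("31..40", "Low", "Yes", "Excellent", "Yes"),
    ("<=30", "Medium", "No", "Fair", "No"),
    ("<=30", "Low", "Yes", "Fair", "Yes"),
    (">40", "Medium", "Yes", "Fair", "Yes"),
    ("<=30", "Medium", "Yes", "Excellent", "Yes"),
    ("31..40", "Medium", "No", "Excellent", "Yes"),
    ("31..40", "High", "Yes", "Fair", "Yes"),
    (">40", "Medium", "No", "Excellent", "No"),
]

def entropy(labels):
    # Base-2 entropy of a list of class labels.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr):
    # Gain(A) = Info(D) - Info_A(D); partitions D_j are grouped by column `attr`.
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[attr]].append(row[-1])
    info_a = sum(len(d_j) / len(rows) * entropy(d_j) for d_j in partitions.values())
    return entropy([row[-1] for row in rows]) - info_a

gains = {name: info_gain(ROWS, i) for i, name in enumerate(ATTRIBUTES)}
for name, g in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"Gain({name}) = {g:.4f} bits")
root = max(gains, key=gains.get)
print("Root split:", root)
# Gain(Age) = 0.2467 bits
# Gain(Student) = 0.1518 bits
# Gain(Credit_rating) = 0.0481 bits
# Gain(Income) = 0.0292 bits
# Root split: Age
```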