Decision tree
Hello!
I am Iffat Firozy
I am here because I love to teach.
You can find me at ifirozy@gmail.com
1.
INTRODUCTION
Let’s start with the first set of slides
“A decision tree is a structure that includes a root node, branches, and leaf
nodes. Each internal node denotes a test on an attribute, each branch
denotes the outcome of a test, and each leaf node holds a class label. The
topmost node in the tree is the root node.”
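The structure in the definition above can be sketched as a small data type. This is an illustrative sketch only (the `Node` class and its field names are not from the slides):

```python
class Node:
    """A decision-tree node: each internal node tests an attribute,
    each branch is an outcome of that test, and each leaf holds a class label."""

    def __init__(self, attribute=None, label=None):
        self.attribute = attribute  # attribute tested at an internal node (None for a leaf)
        self.label = label          # class label held at a leaf (None otherwise)
        self.branches = {}          # test outcome -> child node

    def is_leaf(self):
        return self.attribute is None


# The topmost node is the root node.
root = Node(attribute="A")
root.branches[0] = Node(label="NO")   # leaf
root.branches[1] = Node(label="YES")  # leaf
```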
[Decision tree for the weather data set: the root node OUTLOOK branches on SUNNY, OVERCAST, and RAINY. The SUNNY branch tests HUMIDITY (HIGH → NO, NORMAL → YES), OVERCAST leads directly to YES, and the RAINY branch tests WINDY (TRUE → NO, FALSE → YES).]
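Walking that tree from root to leaf can be written directly as nested tests. A minimal sketch, assuming the standard OUTLOOK/HUMIDITY/WINDY weather tree above (the function name `classify` is ours):

```python
def classify(outlook, humidity, windy):
    """Follow the weather decision tree from the root node to a leaf."""
    if outlook == "SUNNY":
        # the SUNNY branch tests HUMIDITY
        return "YES" if humidity == "NORMAL" else "NO"
    if outlook == "OVERCAST":
        # OVERCAST leads straight to a YES leaf
        return "YES"
    # the RAINY branch tests WINDY
    return "NO" if windy else "YES"


print(classify("SUNNY", "HIGH", False))    # NO
print(classify("OVERCAST", "HIGH", True))  # YES
print(classify("RAINY", "NORMAL", False))  # YES
```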
BUT HOW CAN WE GENERATE A DECISION TREE FROM OUR DATA SET?
LET'S SOLVE THIS PROBLEM.
FOR ATTRIBUTE A:
STEP 1: INFORMATION
I0(2,3) = −(2/5)·log2(2/5) − (3/5)·log2(3/5) = 0.971
I1(4,1) = −(4/5)·log2(4/5) − (1/5)·log2(1/5) = 0.722
I(T)(6,4) = −(6/10)·log2(6/10) − (4/10)·log2(4/10) = 0.971

A   C1/YES   C2/NO
0     2        3
1     4        1
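Step 1 can be checked with a few lines of Python. A sketch (the helper name `info` is ours), with values rounded to three decimals:

```python
from math import log2


def info(p, n):
    """Information I(p, n) of a partition with p yes-records and n no-records."""
    total = p + n
    # 0 * log2(0) is taken to be 0, so zero counts are skipped
    return sum(-(c / total) * log2(c / total) for c in (p, n) if c)


print(round(info(2, 3), 3))  # I0(2,3)  = 0.971
print(round(info(4, 1), 3))  # I1(4,1)  = 0.722
print(round(info(6, 4), 3))  # I(T)(6,4) = 0.971
```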
FOR ATTRIBUTE A:
STEP 2: ENTROPY
E(CLASS LABEL, A) = P(0)·I0 + P(1)·I1
= ((2+3)/10)·0.971 + ((4+1)/10)·0.722
= 0.846

A   C1/YES   C2/NO
0     2        3
1     4        1
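Step 2 is a weighted average of the partition informations, weighted by how many of the 10 records fall in each partition. Continuing the sketch with the same `info` helper:

```python
from math import log2


def info(p, n):
    """Information I(p, n) of a partition with p yes-records and n no-records."""
    total = p + n
    return sum(-(c / total) * log2(c / total) for c in (p, n) if c)


# A=0 covers 2+3 = 5 records, A=1 covers 4+1 = 5 of the 10 records
e_a = (5 / 10) * info(2, 3) + (5 / 10) * info(4, 1)
print(round(e_a, 3))  # 0.846
```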
FOR ATTRIBUTE A:
STEP 3: GAIN
GAIN = I(T) − E
= 0.971 − 0.846
= 0.125
Now, we will follow these 3 steps to find the gain of attribute B.
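The gain is the information of the whole set minus the entropy remaining after the split. Continuing the sketch:

```python
from math import log2


def info(p, n):
    """Information I(p, n) of a partition with p yes-records and n no-records."""
    total = p + n
    return sum(-(c / total) * log2(c / total) for c in (p, n) if c)


i_t = info(6, 4)                                     # whole set: 6 yes, 4 no
e_a = (5 / 10) * info(2, 3) + (5 / 10) * info(4, 1)  # entropy after splitting on A
gain_a = i_t - e_a
print(round(gain_a, 3))  # 0.125
```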
FOR ATTRIBUTE B:
STEP 1: INFORMATION
I0(2,4) = −(2/6)·log2(2/6) − (4/6)·log2(4/6) = 0.918
I1(4,0) = 0
STEP 2: ENTROPY
E = (6/10)·0.918 + (4/10)·0 = 0.551
STEP 3: GAIN
GAIN = I(T) − E = 0.971 − 0.551 = 0.420

B   C1/YES   C2/NO
0     2        4
1     4        0
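The same three steps for attribute B, and the comparison against attribute A, in one sketch:

```python
from math import log2


def info(p, n):
    """Information I(p, n) of a partition with p yes-records and n no-records."""
    total = p + n
    return sum(-(c / total) * log2(c / total) for c in (p, n) if c)


i_t = info(6, 4)  # information of the full data set

# Attribute B: B=0 -> (2 yes, 4 no), B=1 -> (4 yes, 0 no)
e_b = (6 / 10) * info(2, 4) + (4 / 10) * info(4, 0)
gain_b = i_t - e_b

# Attribute A: A=0 -> (2 yes, 3 no), A=1 -> (4 yes, 1 no)
e_a = (5 / 10) * info(2, 3) + (5 / 10) * info(4, 1)
gain_a = i_t - e_a

print(round(gain_b, 3), round(gain_a, 3))  # gain(B) > gain(A), so split on B
```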
SEE!
GAIN(B) = 0.420 > GAIN(A) = 0.125
SO, ATTRIBUTE B SPLITS THE DECISION TREE FIRST.
[Resulting tree: root node B; the branch B = 1 leads to a YES leaf, and the branch B = 0 tests attribute A, whose branches end in NO and YES leaves.]
THANKS!
Any questions?
You can find me at @iffat.firozy & ifirozy@gmail.com

Data Mining || Decision Tree
