5. ●
We have to determine the attribute that best
classifies the training data and use this attribute at
the root of the tree. Repeat this process for
each branch.
●
So, now we have to perform Attribute
Selection.
●
For attribute selection we have to find
– Entropy
– Information gain
6. Attribute Selection...
●
Selection of an attribute to test at each node - choosing the
most useful attribute for classifying examples.
●
Information Gain
– Measures how well a given attribute separates the training
examples according to their target classification.
– This measure is used to select among the candidate
attributes at each step while growing the tree.
7. Step2...
●
Finding Entropy:-
– A measure of the impurity (lack of homogeneity) of the set of examples.
– Given a set S of positive and negative examples of some target
concept (a 2-class problem), the entropy of set S relative
to this binary classification is
E(S) = - p(P)log2 p(P) – p(N)log2 p(N)
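The formula above can be sketched in Python as follows. This is a minimal illustration, not part of the original slides; the function name `binary_entropy` is an assumption, and the convention 0 * log2(0) = 0 is applied so that pure sets are handled.

```python
import math

def binary_entropy(pos: int, neg: int) -> float:
    """Entropy in bits of a set with `pos` positive and `neg` negative examples."""
    total = pos + neg
    if total == 0:
        raise ValueError("entropy is undefined for an empty set")
    result = 0.0
    for count in (pos, neg):
        if count:  # skip empty classes: 0 * log2(0) is taken as 0
            p = count / total
            result -= p * math.log2(p)
    return result

print(binary_entropy(5, 5))  # evenly mixed set: maximum uncertainty
print(binary_entropy(8, 0))  # pure set: no uncertainty
```

An evenly mixed set gives the largest entropy (1 bit) and a pure set gives 0, which is why lower entropy after a split is preferred.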
8. Step2...
●
Example:-
●
Suppose S has 25 examples, 15 positive and 10 negative
[15+, 10-]. Then the entropy of S relative to this classification
is
– E(S) = - p(P)log2 p(P) – p(N)log2 p(N)
– E(S)=-(15/25) log2(15/25) - (10/25) log2 (10/25)
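Carrying the arithmetic through (values rounded to three decimals; this worked step is added for illustration):

```latex
\begin{aligned}
E(S) &= -\tfrac{15}{25}\log_2\tfrac{15}{25} - \tfrac{10}{25}\log_2\tfrac{10}{25} \\
     &= -0.6\,(-0.737) - 0.4\,(-1.322) \\
     &\approx 0.442 + 0.529 \\
     &\approx 0.971
\end{aligned}
```

The value is close to 1 because the set is nearly evenly split between the two classes.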
9. Step 3...
●
Finding Information Gain:-
– Information gain measures the expected reduction in entropy,
or uncertainty.
– Values(A) is the set of all possible values for attribute A, and Sv is
the subset of S for which attribute A has value v: Sv = {s in S | A(s) = v}.
– The first term in the equation for Gain is just the entropy of the
original collection S.
– The second term is the expected value of the entropy after S is
partitioned using attribute A.
$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|} \, \mathrm{Entropy}(S_v)$$
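The Gain definition can be sketched in Python roughly as below. The helper names `entropy` and `information_gain`, the dict-per-row layout, and the four-row toy data (attributes `A` and `B`) are all assumptions made for illustration; they do not come from the slides.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy(S) minus the size-weighted entropy of each subset S_v."""
    subsets = defaultdict(list)
    for row in rows:
        subsets[row[attribute]].append(row[target])  # group labels by value v
    expected = sum(len(s) / len(rows) * entropy(s) for s in subsets.values())
    return entropy([row[target] for row in rows]) - expected

# Hypothetical toy data: A separates the classes perfectly, B not at all.
toy = [
    {"A": "x", "B": "p", "label": "+"},
    {"A": "x", "B": "q", "label": "+"},
    {"A": "y", "B": "p", "label": "-"},
    {"A": "y", "B": "q", "label": "-"},
]
gain_a = information_gain(toy, "A", "label")
gain_b = information_gain(toy, "B", "label")
print(gain_a, gain_b)
```

In the toy data, splitting on `A` removes all uncertainty while splitting on `B` removes none, so `A` would be chosen.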
14. Calculating Information gain
●
Applying similar calculations:-
●
Gain(Decision, Outlook) = 0.246
●
Gain(Decision, Temperature) = 0.029
●
Gain(Decision, Humidity) = 0.151
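As a cross-check, the gains can be recomputed with a short Python sketch. The 14 rows are reassembled in day order from the per-outlook tables that appear elsewhere in this deck; the helper names and the `Temp` column key are assumptions made for this sketch.

```python
import math
from collections import Counter, defaultdict

COLUMNS = ("Outlook", "Temp", "Humidity", "Wind", "Decision")
# Days 1-14 in order, reassembled from the Overcast, Sunny and Rain tables.
TABLE = """
Sunny Hot High Weak No
Sunny Hot High Strong No
Overcast Hot High Weak Yes
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Overcast Cool Normal Strong Yes
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rain Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Overcast Mild High Strong Yes
Overcast Hot Normal Weak Yes
Rain Mild High Strong No
"""
ROWS = [dict(zip(COLUMNS, line.split())) for line in TABLE.strip().splitlines()]

def entropy(labels):
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(labels).values())

def gain(rows, attribute, target="Decision"):
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    expected = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - expected

GAINS = {a: gain(ROWS, a) for a in ("Outlook", "Temp", "Humidity", "Wind")}
for attribute, value in GAINS.items():
    print(f"Gain(Decision, {attribute}) = {value:.3f}")
ROOT = max(GAINS, key=GAINS.get)  # attribute with the highest gain
print("Root attribute:", ROOT)
```

The computed values agree with the figures listed above to within about 0.001; small differences in the last digit come from rounding versus truncation.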
15. ●
As seen, Outlook produces the highest information
gain on Decision. That is why Outlook appears
at the root node of the tree.
16. Overcast outlook on decision..
Day  Outlook   Temp.  Humidity  Wind    Decision
3    Overcast  Hot    High      Weak    Yes
7    Overcast  Cool   Normal    Strong  Yes
12   Overcast  Mild   High      Strong  Yes
13   Overcast  Hot    Normal    Weak    Yes
17. Overcast outlook on decision..
●
The decision is always yes when the outlook
is overcast.
●
This branch is pure, so no further classification is needed here.
18. Sunny outlook on decision..
Day  Outlook  Temp.  Humidity  Wind    Decision
1    Sunny    Hot    High      Weak    No
2    Sunny    Hot    High      Strong  No
8    Sunny    Mild   High      Weak    No
9    Sunny    Cool   Normal    Weak    Yes
11   Sunny    Mild   Normal    Strong  Yes
19. Sunny outlook on decision...
●
This subset has 5 instances with mixed decisions, so we
repeat the step above and compute the information gain
of each remaining attribute.
●
1- Gain(Outlook=Sunny|Temperature) = 0.570
●
2- Gain(Outlook=Sunny|Humidity) = 0.970
●
3- Gain(Outlook=Sunny|Wind) = 0.019
At this point humidity is chosen as the decision node because it
produces the highest score when the outlook is sunny.
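The Humidity figure can be reproduced by hand from the five sunny rows [2+, 3-]. The subscript notation S_sunny is introduced here only for this worked step:

```latex
\begin{aligned}
\mathrm{Entropy}(S_{sunny}) &= -\tfrac{2}{5}\log_2\tfrac{2}{5} - \tfrac{3}{5}\log_2\tfrac{3}{5} \approx 0.971 \\
\mathrm{Gain}(S_{sunny}, \mathrm{Humidity})
  &= 0.971 - \tfrac{3}{5}\,\mathrm{Entropy}([0+, 3-]) - \tfrac{2}{5}\,\mathrm{Entropy}([2+, 0-]) \\
  &= 0.971 - \tfrac{3}{5}\cdot 0 - \tfrac{2}{5}\cdot 0 \\
  &\approx 0.971
\end{aligned}
```

Both humidity subsets are pure, so the split removes all of the remaining entropy; the 0.970 above is the same value up to rounding.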
20. Sunny outlook on decision...
Day  Outlook  Temp.  Humidity  Wind    Decision
1    Sunny    Hot    High      Weak    No
2    Sunny    Hot    High      Strong  No
8    Sunny    Mild   High      Weak    No
Decision is always no if humidity is high.
Day  Outlook  Temp.  Humidity  Wind    Decision
9    Sunny    Cool   Normal    Weak    Yes
11   Sunny    Mild   Normal    Strong  Yes
Decision is always yes if humidity is normal.
21. Rain Outlook On Decision...
Day  Outlook  Temp.  Humidity  Wind    Decision
4    Rain     Mild   High      Weak    Yes
5    Rain     Cool   Normal    Weak    Yes
6    Rain     Cool   Normal    Strong  No
10   Rain     Mild   Normal    Weak    Yes
14   Rain     Mild   High      Strong  No
22. Rain Outlook On Decision...
●
Again, we find the information gain of each
remaining attribute on the 5 instances above.
●
1- Gain(Outlook=Rain | Temperature)
●
2- Gain(Outlook=Rain | Humidity)
●
3- Gain(Outlook=Rain | Wind)
●
Wind produces the highest score when the outlook
is rain.
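The slide does not list the three values, so here is a minimal Python sketch that computes them from the five rain rows. The tuple layout, the `ATTRS` index map and the helper names are assumptions made for this illustration.

```python
import math
from collections import Counter, defaultdict

# The five Outlook = Rain rows: (Temperature, Humidity, Wind, Decision)
RAIN = [
    ("Mild", "High", "Weak", "Yes"),     # day 4
    ("Cool", "Normal", "Weak", "Yes"),   # day 5
    ("Cool", "Normal", "Strong", "No"),  # day 6
    ("Mild", "Normal", "Weak", "Yes"),   # day 10
    ("Mild", "High", "Strong", "No"),    # day 14
]
ATTRS = {"Temperature": 0, "Humidity": 1, "Wind": 2}  # column index per attribute
DECISION = 3

def entropy(labels):
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(labels).values())

def gain(rows, column):
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row[DECISION])
    expected = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[DECISION] for row in rows]) - expected

RAIN_GAINS = {name: gain(RAIN, col) for name, col in ATTRS.items()}
for name, value in RAIN_GAINS.items():
    print(f"Gain(Outlook=Rain | {name}) = {value:.3f}")
BEST = max(RAIN_GAINS, key=RAIN_GAINS.get)
```

Under these assumptions the sketch gives roughly 0.020 for Temperature, 0.020 for Humidity and 0.971 for Wind, which is consistent with Wind being selected.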
23. Rain Outlook On Decision...
Day  Outlook  Temp.  Humidity  Wind    Decision
4    Rain     Mild   High      Weak    Yes
5    Rain     Cool   Normal    Weak    Yes
10   Rain     Mild   Normal    Weak    Yes
Decision is always yes if wind is weak and outlook is rain.
Day  Outlook  Temp.  Humidity  Wind    Decision
6    Rain     Cool   Normal    Strong  No
14   Rain     Mild   High      Strong  No
Decision is always no if wind is strong and outlook is rain.
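Putting the steps together, the whole procedure can be sketched as a small recursive function. This is an illustrative ID3-style sketch and not the authors' code: the nested-dict tree format and the names `id3` and `predict` are assumptions, and the 14 rows are reassembled from the tables in this deck.

```python
import math
from collections import Counter, defaultdict

COLUMNS = ("Outlook", "Temp", "Humidity", "Wind", "Decision")
TABLE = """
Sunny Hot High Weak No
Sunny Hot High Strong No
Overcast Hot High Weak Yes
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Overcast Cool Normal Strong Yes
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rain Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Overcast Mild High Strong Yes
Overcast Hot Normal Weak Yes
Rain Mild High Strong No
"""
ROWS = [dict(zip(COLUMNS, line.split())) for line in TABLE.strip().splitlines()]

def entropy(labels):
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(labels).values())

def gain(rows, attribute):
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row["Decision"])
    expected = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row["Decision"] for row in rows]) - expected

def id3(rows, attributes):
    labels = [row["Decision"] for row in rows]
    if len(set(labels)) == 1:   # pure subset: make a leaf
        return labels[0]
    if not attributes:          # nothing left to split on: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(rows, a))
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in sorted({row[best] for row in rows}):
        subset = [row for row in rows if row[best] == value]
        branches[value] = id3(subset, remaining)
    return {best: branches}

def predict(tree, example):
    # Walk down the nested dicts until a leaf label is reached.
    while isinstance(tree, dict):
        attribute = next(iter(tree))
        tree = tree[attribute][example[attribute]]
    return tree

TREE = id3(ROWS, ["Outlook", "Temp", "Humidity", "Wind"])
print(TREE)
```

The resulting tree has Outlook at the root, a Yes leaf for Overcast, a Humidity test under Sunny and a Wind test under Rain, matching the walkthrough above. Note that `predict` assumes every attribute value of the example was seen during training.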