Week02 answer


Week02
The Author
November 24, 2009

1 Exercise 1

2 Exercise 2

• Intuitively, a1 has the higher information gain. The values of a2 are distributed equally across the two classes, so a2 has no discriminative ability.

• H(class) = −(1/2) log2(1/2) − (1/2) log2(1/2) = 1

• Gain for a1:

    H(a1, true) = −(1/3) log2(1/3) − (2/3) log2(2/3) = 0.9183
    H(a1, false) = −(1/3) log2(1/3) − (2/3) log2(2/3) = 0.9183
    H(a1) = (1/2) · 0.9183 + (1/2) · 0.9183 = 0.9183
    Gain(a1) = 1 − 0.9183 = 0.0817

• Gain for a2:

    H(a2, true) = −(1/2) log2(1/2) − (1/2) log2(1/2) = 1
    H(a2, false) = −(1/2) log2(1/2) − (1/2) log2(1/2) = 1
    H(a2) = (2/3) · 1 + (1/3) · 1 = 1
    Gain(a2) = 1 − 1 = 0
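These numbers are easy to sanity-check in code. Below is a minimal Python sketch of entropy and information gain; the six-instance table is a hypothetical reconstruction (the exercise's actual table is not reproduced in this transcript), with per-branch class counts chosen to match the entropy terms above.

    from collections import Counter
    from math import log2

    def entropy(labels):
        # Shannon entropy (in bits) of a sequence of class labels.
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    def info_gain(rows, attr, target="class"):
        # Information gain of splitting `rows` on attribute `attr`.
        n = len(rows)
        remainder = 0.0
        for v in {r[attr] for r in rows}:
            subset = [r[target] for r in rows if r[attr] == v]
            remainder += (len(subset) / n) * entropy(subset)
        return entropy([r[target] for r in rows]) - remainder

    # Hypothetical data: 3 "+" and 3 "-" overall; a1 splits the classes
    # 2:1 / 1:2 (entropy 0.9183 per branch), a2 splits them 2:2 / 1:1.
    data = [
        {"a1": True,  "a2": True,  "class": "+"},
        {"a1": True,  "a2": True,  "class": "+"},
        {"a1": True,  "a2": False, "class": "-"},
        {"a1": False, "a2": False, "class": "+"},
        {"a1": False, "a2": True,  "class": "-"},
        {"a1": False, "a2": True,  "class": "-"},
    ]

    print(info_gain(data, "a1"))  # ~0.0817
    print(info_gain(data, "a2"))  # ~0.0

Running it reproduces Gain(a1) ≈ 0.0817 and Gain(a2) = 0, so a1 would be chosen as the split, as argued above.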
3 Exercise 3

Assume we have the training examples shown in Table 1:

    studentID    score    class
    st1          9        yes
    st2          4        no
    st3          7        yes
    ...

    Table 1: Example of overfitting

The attribute studentID is unique to each instance, so on the training data we can read off the target class as soon as we know the studentID. However, this does not generalize to unseen data: given a new studentID, we cannot predict its class label.

4 Exercise 4

Example: if an attribute has n values, then in the extreme case we can have a data set of n instances in which every instance has a different value of the attribute. Assume a binary target; then each value v of the attribute covers exactly one instance, so (with the convention 0 · log2(0) = 0)

    H(Sv) = −0 · log2(0) − 1 · log2(1) = 0

and therefore

    Gain(S, A) = H(S) − Σ_{v ∈ values(A)} (|Sv| / |S|) · 0 = H(S)    (1)

Since Gain(S, A) ≤ H(S) for any attribute, H(S) is the maximum gain we can have, so the attribute in this extreme case will always be selected by the information gain criterion. However, this is not a good choice (consider the overfitting problem discussed in Exercise 3); a numerical check appears in the sketch below.
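The claim can be verified by appending the lines below to the Exercise 2 sketch: give each instance a unique hypothetical studentID, as in the Table 1 scenario, and the reported gain equals H(S) exactly.

    # Continuing the Exercise 2 sketch: a unique (hypothetical) studentID
    # per instance, mimicking Table 1. Each branch then holds one
    # instance, so every branch entropy is 0 and the gain is H(S).
    for i, row in enumerate(data):
        row["studentID"] = f"st{i + 1}"

    print(entropy([r["class"] for r in data]))  # 1.0 -> H(S)
    print(info_gain(data, "studentID"))         # 1.0 -> gain equals H(S)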
5 Exercise 5

• Assign the most common value among the examples to the missing value, i.e., set a1 = true for instance 2. In this case we have

    gain(a1) = H(class) − H(class, a1) = H(class) − ((3/4) H([2, 1]) + (1/4) H([1, 0]))

• Alternatively, a new value "missing" can be assigned to attribute a1 for instance 2. In this case we have

    gain(a1) = H(class) − ((1/2) H([1, 1]) + (1/4) H([1, 0]) + (1/4) H([1, 0]))

Here H([p, n]) denotes the entropy of a branch containing p "yes" and n "no" examples; both options are evaluated in the sketch below.
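As a final sketch, the two strategies can be compared numerically. The four-instance sub-table behind these brackets is not shown in the transcript, so the counts below are read directly off the terms above, and H(class) = H([3, 1]) is an assumption obtained by summing the branch counts.

    from math import log2

    def H(counts):
        # Entropy (bits) of a class distribution given as counts, e.g. [2, 1].
        n = sum(counts)
        return -sum((c / n) * log2(c / n) for c in counts if c > 0)

    h_class = H([3, 1])  # assumed class counts: 3 "yes", 1 "no"

    # Strategy 1: impute the most common value ("true") for the missing a1.
    gain_impute = h_class - (3/4 * H([2, 1]) + 1/4 * H([1, 0]))

    # Strategy 2: treat "missing" as an attribute value of its own.
    gain_missing = h_class - (1/2 * H([1, 1]) + 1/4 * H([1, 0]) + 1/4 * H([1, 0]))

    print(gain_impute, gain_missing)  # ~0.1226 vs ~0.3113

Under these assumed counts, treating "missing" as its own value reports the larger gain, an instance of the bias toward many-valued attributes discussed in Exercise 4.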
