
# Week02 answer


Week02, The Author, November 24, 2009

## 1 Exercise 1

## 2 Exercise 2

- Intuitively, $a_1$ has a higher information gain. The values of $a_2$ are distributed equally over both classes, which shows no discriminative ability.
- $H(\text{class}) = -\frac{1}{2}\log_2\frac{1}{2} - \frac{1}{2}\log_2\frac{1}{2} = 1$
- Gain for $a_1$:
  - $H(a_1, \text{true}) = -\frac{1}{3}\log_2\frac{1}{3} - \frac{2}{3}\log_2\frac{2}{3} = 0.9183$
  - $H(a_1, \text{false}) = -\frac{1}{3}\log_2\frac{1}{3} - \frac{2}{3}\log_2\frac{2}{3} = 0.9183$
  - $H(a_1) = \frac{1}{2} \cdot 0.9183 + \frac{1}{2} \cdot 0.9183 = 0.9183$
  - $\text{Gain}(a_1) = 1 - 0.9183 = 0.0817$
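The numbers above can be verified with a short script. The split counts below are an assumption inferred from the fractions in the derivation (6 instances, 3 per class; $a_1$ splits them 3/3 into class distributions [1, 2] and [2, 1], while $a_2$ splits them into [1, 1] and [1, 1]); the full data table is not reproduced here.

```python
import math

def entropy(counts):
    """Shannon entropy (in bits) of a class distribution given as counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

# Class counts inferred from the fractions in the derivation (assumption).
h_class = entropy([3, 3])                             # 1.0
h_a1 = 0.5 * entropy([1, 2]) + 0.5 * entropy([2, 1])  # weighted entropy after splitting on a1
h_a2 = 0.5 * entropy([1, 1]) + 0.5 * entropy([1, 1])  # weighted entropy after splitting on a2

print(round(h_class - h_a1, 4))  # Gain(a1) -> 0.0817
print(round(h_class - h_a2, 4))  # Gain(a2) -> 0.0
```

As expected, splitting on $a_2$ yields zero gain, matching the intuition that it has no discriminative ability.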
| studentID | score | class |
|-----------|-------|-------|
| st1       | 9     | yes   |
| st2       | 4     | no    |
| st3       | 7     | yes   |
| ...       | ...   | ...   |

Table 1: Example of overfitting

- Gain for $a_2$:
  - $H(a_2, \text{true}) = -\frac{1}{2}\log_2\frac{1}{2} - \frac{1}{2}\log_2\frac{1}{2} = 1$
  - $H(a_2, \text{false}) = -\frac{1}{2}\log_2\frac{1}{2} - \frac{1}{2}\log_2\frac{1}{2} = 1$
  - $H(a_2) = \frac{1}{2} \cdot 1 + \frac{1}{2} \cdot 1 = 1$
  - $\text{Gain}(a_2) = 1 - 1 = 0$

## 3 Exercise 3

Assume we have the training examples shown in Table 1. The attribute studentID is unique for each instance, so in the training data we can determine the target class value as soon as we know the studentID. However, this cannot be generalized to unseen data, i.e., given a new studentID, we will not be able to predict its class label.

## 4 Exercise 4

Example: if an attribute has $n$ values, then in the extreme case we can have a data set of $n$ instances in which each instance has a different value. Assume that we have a binary target; then for each value $v$ of the attribute, the entropy of the corresponding subset is (using the convention $0 \cdot \log_2 0 = 0$)

$$H(S_v) = -0 \cdot \log_2 0 - 1 \cdot \log_2 1 = 0$$

so that

$$\text{Gain}(S, A) = H(S) - \sum_{v \in \text{values}(A)} \frac{|S_v|}{|S|} \cdot 0 = H(S) \tag{1}$$

Since $\text{Gain}(S, A) \le H(S)$, $H(S)$ is the maximum gain we can have, so the attribute in this extreme case will always be selected by the information gain criterion. However, this is not a good choice (consider the overfitting problem discussed in Exercise 3).

## 5 Exercise 5

- Assign the most common value among the examples to the missing value, i.e., "true" for attribute $a_1$ at instance 2. In this case, we have
  $$\text{gain}(a_1) = H(\text{class}) - \left(\tfrac{3}{4} H([2, 1]) + \tfrac{1}{4} H([1, 0])\right)$$
- A new value "missing" can be assigned to attribute $a_1$ for instance 2. In this case, we have
  $$\text{gain}(a_1) = H(\text{class}) - \left(\tfrac{1}{2} H([1, 1]) + \tfrac{1}{4} H([1, 0]) + \tfrac{1}{4} H([1, 0])\right)$$
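The two strategies can be compared numerically. A minimal sketch, assuming the 4-instance setup implied by the weights above (overall class distribution [3, 1], with instance 2's value of $a_1$ missing):

```python
import math

def entropy(counts):
    """Shannon entropy (in bits) of a class distribution given as counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

# Assumption: 4 instances with overall class counts [3, 1], matching the
# weights 3/4, 1/4 and 1/2, 1/4, 1/4 in the two formulas above.
h_class = entropy([3, 1])

# Strategy 1: impute the most common value ("true") for the missing a1.
gain_impute = h_class - (3/4 * entropy([2, 1]) + 1/4 * entropy([1, 0]))

# Strategy 2: treat "missing" as an extra attribute value of its own.
gain_extra = h_class - (1/2 * entropy([1, 1])
                        + 1/4 * entropy([1, 0])
                        + 1/4 * entropy([1, 0]))

print(round(gain_impute, 4))  # 0.1226
print(round(gain_extra, 4))   # 0.3113
```

Under these assumed counts, treating "missing" as its own value reports a higher gain, which illustrates why the two strategies can lead a decision-tree learner to different splits.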