Week02
The Author
November 24, 2009

1 Exercise 1

2 Exercise 2

  • Intuitively, $a_1$ has the higher information gain. The values of $a_2$ are distributed equally across both classes, so $a_2$ has no discriminative ability.
  • Class entropy: $H(\text{class}) = -\frac{1}{2}\log_2\frac{1}{2} - \frac{1}{2}\log_2\frac{1}{2} = 1$
  • Gain for $a_1$:
    $H(a_1, \text{true}) = -\frac{1}{3}\log_2\frac{1}{3} - \frac{2}{3}\log_2\frac{2}{3} = 0.9183$
    $H(a_1, \text{false}) = -\frac{1}{3}\log_2\frac{1}{3} - \frac{2}{3}\log_2\frac{2}{3} = 0.9183$
    $H(a_1) = \frac{1}{2} \cdot 0.9183 + \frac{1}{2} \cdot 0.9183 = 0.9183$
    $\text{Gain}(a_1) = 1 - 0.9183 = 0.0817$
  • Gain for $a_2$:
    $H(a_2, \text{true}) = -\frac{1}{2}\log_2\frac{1}{2} - \frac{1}{2}\log_2\frac{1}{2} = 1$
    $H(a_2, \text{false}) = -\frac{1}{2}\log_2\frac{1}{2} - \frac{1}{2}\log_2\frac{1}{2} = 1$
    $H(a_2) = \frac{1}{2} \cdot 1 + \frac{1}{2} \cdot 1 = 1$
    $\text{Gain}(a_2) = 1 - 1 = 0$
    (Both gains are checked numerically in the sketch below.)
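A minimal Python sketch that reproduces the arithmetic above. The transcript does not include the original data table, so the class ratios are read off the fractions in the entropy terms: a 1:2 class mix in each branch of $a_1$, a 1:1 mix in each branch of $a_2$, with branches of equal size.

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of a class-count distribution."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

# Class entropy: a balanced binary class, H = 1.
h_class = entropy([1, 1])

# a1 splits the data in half; each half has a 1:2 class mix.
h_a1 = 0.5 * entropy([1, 2]) + 0.5 * entropy([1, 2])
print(f"Gain(a1) = {h_class - h_a1:.4f}")  # 0.0817

# a2 splits the data in half; each half has a 1:1 class mix.
h_a2 = 0.5 * entropy([1, 1]) + 0.5 * entropy([1, 1])
print(f"Gain(a2) = {h_class - h_a2:.4f}")  # 0.0000
```

Working from class-count lists rather than raw rows keeps the computation identical to the hand calculation above.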
3 Exercise 3

Assume we have the training examples shown in Table 1:

    studentID   score   class
    st1         9       yes
    st2         4       no
    st3         7       yes
    ...

    Table 1: Example of overfitting

The attribute studentID is unique for each instance. On the training data we can recover the target class value as soon as we know the studentID. However, this cannot be generalized to unseen data: given a new studentID, we are unable to predict its class label.

4 Exercise 4

Example: if an attribute has $n$ values, then in an extreme case we can have a data set of $n$ instances in which each instance has a different value. Assume the target is binary. Then for each value $v$ of the attribute, the subset $S_v$ contains a single instance, so its entropy (with the convention $0 \cdot \log_2 0 = 0$) is $H(S_v) = -0 \cdot \log_2 0 - 1 \cdot \log_2 1 = 0$, and hence

$$\text{Gain}(S, A) = H(S) - \sum_{v \in \text{values}(A)} \frac{|S_v|}{|S|} \cdot 0 = H(S) \qquad (1)$$

Since $\text{Gain}(S, A) \le H(S)$ for any attribute, $H(S)$ is the maximum gain we can have, so the attribute in this extreme case will always be selected by the information gain criterion. However, this is not a good choice (consider the over-fitting problem discussed in Exercise 3); the sketch below demonstrates the effect.
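A short Python sketch of this effect on a hypothetical completion of Table 1; the st4 row is invented to stand in for the table's elided rows. Splitting on the unique studentID makes every branch pure, so the gain collapses to $H(S)$, as in equation (1).

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (base 2) of a sequence of class labels."""
    total = len(labels)
    return -sum(c / total * math.log2(c / total)
                for c in Counter(labels).values())

def info_gain(rows, attr, target="class"):
    """Information gain of splitting `rows` (a list of dicts) on `attr`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr]].append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder

# Hypothetical training set in the spirit of Table 1 (st4 is invented).
rows = [
    {"studentID": "st1", "score": 9, "class": "yes"},
    {"studentID": "st2", "score": 4, "class": "no"},
    {"studentID": "st3", "score": 7, "class": "yes"},
    {"studentID": "st4", "score": 3, "class": "no"},
]

# Every studentID branch holds one instance, so each branch is pure and
# the remainder term vanishes: the gain equals H(S).
print(info_gain(rows, "studentID"))                 # 1.0
print(entropy([r["class"] for r in rows]))          # 1.0 == H(S)
```

On unseen data such a split is useless, which is exactly the over-fitting concern of Exercise 3.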
5 Exercise 5

  • Assign the most common value among the examples to the missing value, i.e., "true" for attribute $a_1$ at instance 2. In this case we have
    $\text{gain}(a_1) = H(\text{class}) - H(\text{class}, a_1) = H(\text{class}) - \left(\frac{3}{4} H([2, 1]) + \frac{1}{4} H([1, 0])\right)$
  • Alternatively, a new value "missing" can be assigned to attribute $a_1$ for instance 2. In this case we have
    $\text{gain}(a_1) = H(\text{class}) - \left(\frac{1}{2} H([1, 1]) + \frac{1}{4} H([1, 0]) + \frac{1}{4} H([1, 0])\right)$
    (Both expressions are evaluated numerically in the sketch after this list.)
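A numeric check of the two expressions; a minimal sketch, assuming the overall class distribution is [3, 1] (three positives, one negative). That distribution is inferred from the branch counts above, since the transcript omits the underlying table.

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of a class-count list; 0*log2(0) taken as 0."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

# Assumed overall class distribution, inferred from the branch counts.
h_class = entropy([3, 1])  # ~0.8113

# Strategy 1: impute the most common value ("true") for instance 2.
gain_impute = h_class - (3/4 * entropy([2, 1]) + 1/4 * entropy([1, 0]))

# Strategy 2: treat "missing" as a third attribute value.
gain_missing = h_class - (1/2 * entropy([1, 1])
                          + 1/4 * entropy([1, 0])
                          + 1/4 * entropy([1, 0]))

print(f"gain, most-common-value imputation: {gain_impute:.4f}")   # ~0.1226
print(f"gain, explicit 'missing' value:     {gain_missing:.4f}")  # ~0.3113
```

The printed values depend on the assumed [3, 1] class distribution; the structure of the two remainder terms is taken directly from the expressions above.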