Submit to Blackboard in electronic form before 10 am, November 25, 2010.
For questions, please contact the teaching assistants:
Spyros Martzoukos: S.Martzoukos@uva.nl (English only!)
Jiyin He: firstname.lastname@example.org (English only!)
Exercise 1: Information Gain and Attributes with Many Values
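Information gain is defined as
$$\mathrm{gain}(S, A) = H(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|} \, H(S_v)$$
where S_v is the subset of S for which attribute A has value v.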
According to this definition, information gain favors attributes with many values.
Why? Give an example.
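The definition above can be evaluated numerically with a minimal Python sketch. The function names `entropy` and `information_gain` and the two toy attributes are illustrative assumptions, not part of the assignment; the sketch only shows how the formula is computed.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(S) in bits of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(values, labels):
    """gain(S, A) = H(S) - sum over v of |S_v|/|S| * H(S_v)."""
    total = len(labels)
    subsets = {}
    for v, y in zip(values, labels):
        subsets.setdefault(v, []).append(y)  # group labels by attribute value
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder

# Hypothetical toy data: four instances, two candidate attributes.
labels = ["+", "+", "-", "-"]
unique_id = [1, 2, 3, 4]          # a distinct value for every instance
coarse = ["a", "a", "a", "b"]     # only two values

gain_id = information_gain(unique_id, labels)
gain_coarse = information_gain(coarse, labels)
print(gain_id, gain_coarse)
```

Comparing the two printed gains may be a useful starting point for the question above.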
Exercise 2: Missing Attribute Values
Consider the following set of training instances. Instance 2 has a missing value
for attribute a1 .
instance  a1     a2     class
1         true   true   +
2         ?      true   +
3         true   false  -
4         false  false  +
Apply at least two different strategies for dealing with missing attribute values
and show how they work in this concrete example.
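As one possible starting point, the sketch below encodes the table and implements two commonly used strategies: filling in the most common observed value, and splitting the instance into fractional weights proportional to the observed value frequencies. The helper names `most_common_value` and `fractional_weights` and the dictionary layout are assumptions made for illustration; other strategies are equally valid.

```python
from collections import Counter

# The training instances from the table; None marks the missing value.
instances = [
    {"a1": True,  "a2": True,  "cls": "+"},
    {"a1": None,  "a2": True,  "cls": "+"},
    {"a1": True,  "a2": False, "cls": "-"},
    {"a1": False, "a2": False, "cls": "+"},
]

def most_common_value(rows, attr):
    """Strategy A: the most frequent observed value of attr."""
    counts = Counter(r[attr] for r in rows if r[attr] is not None)
    return counts.most_common(1)[0][0]

def fractional_weights(rows, attr):
    """Strategy B: relative frequency of each observed value of attr,
    used as the weight of a fractional copy of the incomplete instance."""
    counts = Counter(r[attr] for r in rows if r[attr] is not None)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}

fill = most_common_value(instances, "a1")
weights = fractional_weights(instances, "a1")
print(fill, weights)
```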
Exercise 3: Perceptrons
1. What is the function of the learning rate in the perceptron training rule?
2. Which Boolean functions can be modeled with perceptrons, which cannot,
   and why?
3. Assume the following set of instances with the weights w0 = 0.4 and w1 = 0.8.
   The threshold is 0.
   instance  x0    x1    target class
   1         1.0   1.0    1
   2         1.0   0.5    1
   3         1.0  -0.8   -1
   4         1.0  -0.2   -1
   What are the output values for each instance before the threshold function
   is applied? What is the accuracy of the model when applying the threshold
   function?
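The computation asked for here can be set up as in the following sketch. It assumes the usual weighted sum w0*x0 + w1*x1 as the pre-threshold output and, as an additional assumption, maps a value exactly equal to the threshold to class -1. The names `net_input` and `threshold` are illustrative.

```python
# Weights and instances as given in the exercise: (inputs [x0, x1], target).
weights = [0.4, 0.8]
data = [
    ([1.0, 1.0], 1),
    ([1.0, 0.5], 1),
    ([1.0, -0.8], -1),
    ([1.0, -0.2], -1),
]

def net_input(w, x):
    """Weighted sum of the inputs, before the threshold function."""
    return sum(wi * xi for wi, xi in zip(w, x))

def threshold(value, theta=0.0):
    """Assumed convention: +1 if value exceeds theta, otherwise -1."""
    return 1 if value > theta else -1

nets = [net_input(weights, x) for x, _ in data]
predictions = [threshold(n) for n in nets]
accuracy = sum(p == t for p, (_, t) in zip(predictions, data)) / len(data)
print(nets, predictions, accuracy)
```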
Exercise 4: Gradient Descent
Consider the data in Exercise 3.3. Apply the gradient descent algorithm and
compute the weight updates for one iteration. You can assume the same initial
weights, threshold, and learning rate as in Exercise 3.3.
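One batch iteration can be sketched as follows, assuming an unthresholded linear unit trained on squared error, so that each weight changes by eta times the sum over all instances of (target - output) * input. The sheet does not state a numeric learning rate, so `eta = 0.1` below is purely a hypothetical placeholder.

```python
eta = 0.1  # hypothetical learning rate; not specified in the assignment
weights = [0.4, 0.8]
data = [
    ([1.0, 1.0], 1),
    ([1.0, 0.5], 1),
    ([1.0, -0.8], -1),
    ([1.0, -0.2], -1),
]

def linear_output(w, x):
    """Unthresholded output of the linear unit."""
    return sum(wi * xi for wi, xi in zip(w, x))

# Accumulate the update over ALL instances before changing any weight.
deltas = [0.0 for _ in weights]
for x, target in data:
    error = target - linear_output(weights, x)
    for i, xi in enumerate(x):
        deltas[i] += eta * error * xi

new_weights = [w + d for w, d in zip(weights, deltas)]
print(deltas, new_weights)
```

The defining feature of the batch variant is that every error is computed with the same initial weights; the weights change only once, after the loop.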
Exercise 5: Stochastic Gradient Descent
Consider the data in Exercise 3.3. Apply the stochastic gradient descent
algorithm and compute the weight updates for one iteration. You can assume the
same initial weights, threshold, and learning rate as in Exercise 3.3.
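The stochastic variant can be sketched in the same way, again assuming an unthresholded linear unit and a hypothetical learning rate `eta = 0.1` (the sheet gives no numeric value). Here "one iteration" is interpreted as one pass over the four instances in table order, which is also an assumption.

```python
eta = 0.1  # hypothetical learning rate; not specified in the assignment
weights = [0.4, 0.8]
data = [
    ([1.0, 1.0], 1),
    ([1.0, 0.5], 1),
    ([1.0, -0.8], -1),
    ([1.0, -0.2], -1),
]

history = []  # weight vector after each single-instance update
for x, target in data:
    output = sum(wi * xi for wi, xi in zip(weights, x))  # uses current weights
    error = target - output
    # Update immediately, so the next instance sees the changed weights.
    weights = [wi + eta * error * xi for wi, xi in zip(weights, x)]
    history.append(list(weights))

for step, w in enumerate(history, start=1):
    print(step, w)
```

Unlike the batch version, each instance's error is computed with weights already modified by the preceding instances, so the result depends on the order of presentation.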