Assignment 6
Exercise 1: Lazy Learning
How is lazy learning different from the other machine learning approaches we have covered
so far?
In the previous lectures and exercises we have been predicting various types of data. The approach used so far is to split the data into a training set and a test set, derive a model (a set of rules or a formula) from the training set, and then apply that model to the data we want to process.
The distinction between a lazy learning system and the previous systems is that a lazy system stores its training data and uses it directly when processing new data. In the previous approaches the training data was only needed up front: once we had derived a formula from it, we could predict new values/instances from the formula alone, and the training data itself was no longer necessary. A lazy learning system, by contrast, keeps the full training set and consults it for every prediction, so it always works with the most up-to-date data. A disadvantage of such a lazy learning system is that storing the entire training set takes a lot of storage, and the work of comparing against it is repeated for every prediction.
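The contrast can be sketched in a few lines of Python (the class name and the tiny dataset are illustrative assumptions, not part of the assignment):

```python
# Minimal sketch of a lazy learner: "training" only stores the data,
# and all real work is deferred until a prediction is requested.

class LazyClassifier:
    def fit(self, X, y):
        # Lazy step: just keep the full training set in memory.
        self.X, self.y = list(X), list(y)

    def predict(self, query):
        # The actual computation happens here, at query time: scan every
        # stored training point and return the label of the closest one.
        dists = [sum(abs(a - b) for a, b in zip(row, query)) for row in self.X]
        return self.y[dists.index(min(dists))]

clf = LazyClassifier()
clf.fit([(0.0, 0.0), (1.0, 1.0)], ["-", "+"])
print(clf.predict((0.9, 0.8)))  # → +
```

An eager learner would do the expensive work inside `fit` and could then discard `X` and `y`; here `fit` is trivially cheap and `predict` carries the cost.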
Exercise 2: k Nearest Neighbor Classification
2.1 How does a form of overfitting affect kNN classification, and what can be done
to overcome this?
The kNN classifier is used to classify new instances against an existing dataset. To classify a new instance we look for the nearest point(s)/neighbor(s) of that instance.
The algorithm takes a parameter k: the number of nearest points taken into account when classifying a new instance. If we only look at one (or a few) nearest point(s), overfitting can easily occur, because a single noisy neighbor then decides the outcome.
To illustrate how overfitting affects the kNN algorithm, consider the three examples below.
Example 1:
In this example we look for the plus or minus that is closest to the red dot, using k = 1 (the number of points taken into account). Because the red dot is closest to a plus, we classify the red dot as a plus.
Example 2:
In this example we again look for the plus or minus that is closest to the red dot, using k = 1. Because the red dot is closest to a minus, we classify the red dot as a minus.
Example 3:
In this example we look at the same red dot as in example 2, but now with k = 3. We check the three nearest neighbors, so a single point no longer decides the value of the red dot. If we draw a circle around the three nearest neighbors, as in the screenshot, we see that the dot from example 2 now gets a plus value instead of a minus, because there are two plusses and one minus in the region closest to the red dot.
Comparing examples 2 and 3: with k = 1 the red dot was classified as a minus, but when we increased the k value its predicted class flipped to a plus. In conclusion, we increased the k value to make the classifier less sensitive to individual training points and thereby more correct. With a lower k value the sensitivity of the classifier to single (possibly noisy) points goes up, which is exactly when overfitting is likely to occur; a larger k averages over more neighbors and generalizes better.
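The flip between k = 1 and k = 3 can be reproduced with a small sketch (the points are hypothetical, chosen so that one stray minus sits right next to the query, like the red dot above):

```python
# Toy illustration of how the chosen k changes a kNN prediction:
# with k = 1 the single adjacent minus decides the label, while with
# k = 3 the two surrounding plusses outvote it.
from collections import Counter

points = [((0.50, 0.52), "-"),   # one stray minus right next to the query
          ((0.45, 0.40), "+"),
          ((0.60, 0.45), "+"),
          ((0.90, 0.90), "-")]
query = (0.50, 0.50)

def knn(points, query, k):
    # Rank all points by Manhattan distance to the query, then let the
    # k nearest vote by simple majority.
    ranked = sorted(points, key=lambda p: sum(abs(a - b) for a, b in zip(p[0], query)))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

print(knn(points, query, k=1))  # the adjacent minus wins → -
print(knn(points, query, k=3))  # the two plusses outvote it → +
```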
2.2 Given the following data: how does the kNN algorithm classify instances 7 and 8, with k = 1 and k = 3? You can use simple majority voting and the Manhattan distance.
The data table (reconstructed here from the calculations below):
Instance 1: x1 = 0.25, x2 = 0.25, class +
Instance 2: x1 = 0.25, x2 = 0.75, class +
Instance 3: x1 = 0.50, x2 = 0.25, class -
Instance 4: x1 = 0.50, x2 = 0.75, class -
Instance 5: x1 = 0.75, x2 = 0.50, class -
Instance 6: x1 = 0.75, x2 = 1.00, class +
Instance 7: x1 = 0.25, x2 = 0.55, class ?
Instance 8: x1 = 0.75, x2 = 0.80, class ?
We use the Manhattan distance to classify instances 7 and 8.
First we compare instance 7 with all other instances and note the distances.
Instance 1
x1: |0.25 – 0.25| = 0.00
x2: |0.55 – 0.25| = 0.30
Distance = 0.00 + 0.30 = 0.30
Instance 2
x1: |0.25 – 0.25| = 0.00
x2: |0.55 – 0.75| = 0.20
Distance = 0.00 + 0.20 = 0.20
Instance 3
x1: |0.25 – 0.50| = 0.25
x2: |0.55 – 0.25| = 0.30
Distance = 0.25 + 0.30 = 0.55
Instance 4
x1: |0.25 – 0.50| = 0.25
x2: |0.55 – 0.75| = 0.20
Distance = 0.25 + 0.20 = 0.45
Instance 5
x1: |0.25 – 0.75| = 0.50
x2: |0.55 – 0.50| = 0.05
Distance = 0.50 + 0.05 = 0.55
Instance 6
x1: |0.25 – 0.75| = 0.50
x2: |0.55 – 1.00| = 0.45
Distance = 0.50 + 0.45 = 0.95
1. With k = 1 we look for the single instance closest to instance 7, i.e. the one with the shortest distance. This is instance 2 (distance 0.20), which is a plus, so we classify instance 7 as a plus (+).
2. With k = 3 we take the three instances with the shortest distance to instance 7: instances 1, 2 and 4 (distances 0.30, 0.20 and 0.45). We classify instance 7 as a plus (+) because 2 of the 3 nearest neighbors are classified as a plus.
Now we compare instance 8 with all other instances and note the distances.
Instance 1
x1: |0.75 – 0.25| = 0.50
x2: |0.80 – 0.25| = 0.55
Distance = 0.50 + 0.55 = 1.05
Instance 2
x1: |0.75 – 0.25| = 0.50
x2: |0.80 – 0.75| = 0.05
Distance = 0.50 + 0.05 = 0.55
Instance 3
x1: |0.75 – 0.50| = 0.25
x2: |0.80 – 0.25| = 0.55
Distance = 0.25 + 0.55 = 0.80
Instance 4
x1: |0.75 – 0.50| = 0.25
x2: |0.80 – 0.75| = 0.05
Distance = 0.25 + 0.05 = 0.30
Instance 5
x1: |0.75 – 0.75| = 0.00
x2: |0.80 – 0.50| = 0.30
Distance = 0.00 + 0.30 = 0.30
Instance 6
x1: |0.75 – 0.75| = 0.00
x2: |0.80 – 1.00| = 0.20
Distance = 0.00 + 0.20 = 0.20
1. With k = 1 we look for the single instance closest to instance 8, i.e. the one with the shortest distance. This is instance 6 (distance 0.20), which is a plus, so we classify instance 8 as a plus (+).
2. With k = 3 we take the three instances with the shortest distance to instance 8: instances 4, 5 and 6 (distances 0.30, 0.30 and 0.20). We classify instance 8 as a minus (-) because 2 of the 3 nearest neighbors are classified as a minus.
2.3 Given the dataset from question 2.2, how are instances 7 and 8 classified when using the prototype classifier, and what are the coordinates of the prototypes (i.e., the x1 and x2 values)?
To classify instances 7 and 8 with the prototype classifier we need to define a 'super'-plus and a 'super'-minus: two new prototype instances whose coordinates are the average x1 and x2 values of, respectively, all plus-classified and all minus-classified instances.
First we calculate the 'super'-plus:
Super-plus x1 value = (0.25 + 0.25 + 0.75) / 3 ≈ 0.42
Super-plus x2 value = (0.25 + 0.75 + 1.00) / 3 ≈ 0.67
Now we calculate the 'super'-minus:
Super-minus x1 value = (0.50 + 0.50 + 0.75) / 3 ≈ 0.58
Super-minus x2 value = (0.25 + 0.75 + 0.50) / 3 = 0.50
Classifying instance 7
We use the Manhattan distance to compute the total difference between the super-plus and instance 7:
Distance x1 = |Super-plus x1 – instance 7 x1| = |0.42 – 0.25| = 0.17
Distance x2 = |Super-plus x2 – instance 7 x2| = |0.67 – 0.55| = 0.12
Distance = 0.17 + 0.12 = 0.29
And then the total difference between the super-minus and instance 7:
Distance x1 = |Super-minus x1 – instance 7 x1| = |0.58 – 0.25| = 0.33
Distance x2 = |Super-minus x2 – instance 7 x2| = |0.50 – 0.55| = 0.05
Distance = 0.33 + 0.05 = 0.38
The smaller distance (0.29 < 0.38) is to the super-plus, so we classify instance 7 as a plus (+).
Classifying instance 8
We use the Manhattan distance to compute the total difference between the super-plus and instance 8:
Distance x1 = |Super-plus x1 – instance 8 x1| = |0.42 – 0.75| = 0.33
Distance x2 = |Super-plus x2 – instance 8 x2| = |0.67 – 0.80| = 0.13
Distance = 0.33 + 0.13 = 0.46
And then the total difference between the super-minus and instance 8:
Distance x1 = |Super-minus x1 – instance 8 x1| = |0.58 – 0.75| = 0.17
Distance x2 = |Super-minus x2 – instance 8 x2| = |0.50 – 0.80| = 0.30
Distance = 0.17 + 0.30 = 0.47
The smaller distance (0.46 < 0.47) is to the super-plus, so we also classify instance 8 as a plus (+). Note that this margin is entirely due to rounding: with the unrounded prototypes, both distances for instance 8 equal exactly 7/15 ≈ 0.467, an exact tie.
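The whole procedure can be sketched as a nearest-centroid classifier in Python (again with the dataset reconstructed from the calculations, and with the prototypes rounded to two decimals as in the worked answer):

```python
# Prototype (nearest-centroid) classification: average the coordinates per
# class, then assign each query to the class whose prototype is nearest in
# Manhattan distance. Rounding to two decimals follows the worked answer
# and breaks the otherwise exact tie for instance 8.
train = [((0.25, 0.25), "+"), ((0.25, 0.75), "+"), ((0.75, 1.00), "+"),
         ((0.50, 0.25), "-"), ((0.50, 0.75), "-"), ((0.75, 0.50), "-")]

def prototype(label):
    # Mean of each coordinate over all instances with this label.
    pts = [p for p, lab in train if lab == label]
    return tuple(round(sum(c) / len(pts), 2) for c in zip(*pts))

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

protos = {lab: prototype(lab) for lab in ("+", "-")}
print(protos)  # {'+': (0.42, 0.67), '-': (0.58, 0.5)}

for name, query in [("instance 7", (0.25, 0.55)), ("instance 8", (0.75, 0.80))]:
    best = min(protos, key=lambda lab: manhattan(protos[lab], query))
    print(name, "→", best)  # both classified as +
```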