K-Nearest Neighbor
CSE 4621 | Machine Learning | BScTE | Summer 22-23
Sabbir Ahmed
Assistant Professor, Dept of CSE, IUT
sabbirahmed@iut-dhaka.edu
Nearest Neighbor Classifier
• Memorize all data and labels
• Predict the label of the most similar training image
Distance Metric to compare images
L1 distance:
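The L1 distance itself appeared only as a figure on the slide; the standard definition, which compares two images pixel by pixel and sums the absolute differences, is:

```latex
d_1(I_1, I_2) = \sum_{p} \left| I_1^{p} - I_2^{p} \right|
```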
Nearest Neighbor Classifier
• Memorize training data
• For each test image:
  • Find the nearest training image
  • Return the label of that nearest image

Q: With N examples, how fast is training?
A: O(1)
Q: With N examples, how fast is testing?
A: O(N)
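As a rough illustration of the points above, here is a minimal from-scratch sketch of a nearest neighbor classifier with the L1 distance (the class name, the use of NumPy, and the array shapes are assumptions for this example, not part of the slides; X and y are assumed to be NumPy arrays):

```python
import numpy as np

class NearestNeighbor:
    def train(self, X, y):
        # "Training" just memorizes the data: O(1) with respect to N.
        self.X_train = X  # shape (N, D): N training images, each flattened to D pixels
        self.y_train = y  # shape (N,): integer labels

    def predict(self, X):
        # Testing compares each query against every training image: O(N) per query.
        y_pred = np.empty(X.shape[0], dtype=self.y_train.dtype)
        for i in range(X.shape[0]):
            # L1 distance from the i-th test image to all training images
            distances = np.sum(np.abs(self.X_train - X[i]), axis=1)
            y_pred[i] = self.y_train[np.argmin(distances)]  # label of the nearest image
        return y_pred
```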
What does this look like?
Nearest Neighbor Decision Boundaries
Nearest neighbors in two dimensions (axes x0 and x1):
• Points are training examples; colors give training labels.
• Background colors give the category a test point x would be assigned.
• The decision boundary is the boundary between two classification regions.
• Decision boundaries can be noisy; they are affected by outliers.
How to smooth out decision boundaries? Use more neighbors!
K-Nearest Neighbors
Instead of copying the label from the single nearest neighbor, take a majority vote from the K closest points (the figures compare K = 1 with K = 3).
• Using more neighbors helps smooth out rough decision boundaries.
• Using more neighbors helps reduce the effect of outliers.
• When K > 1 there can be ties between classes; the tie needs to be broken somehow! (A sketch with one possible tie-break follows below.)
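A minimal extension of the earlier sketch to K neighbors, with majority voting and one possible tie-breaking rule (preferring the class of the closer neighbor); the function name and the choice of L2 distance here are illustrative assumptions:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    # Distances (L2 / Euclidean) from the query point x to every training point.
    distances = np.sqrt(np.sum((X_train - x) ** 2, axis=1))
    nearest = np.argsort(distances)[:k]       # indices of the K closest points
    labels = y_train[nearest]                 # their labels, ordered closest-first
    classes, counts = np.unique(labels, return_counts=True)
    tied = classes[counts == counts.max()]    # classes tied for the most votes
    if len(tied) == 1:
        return tied[0]
    # Tie-break: walk the neighbors from closest to farthest and return the
    # first label that belongs to one of the tied classes.
    for label in labels:
        if label in tied:
            return label
```

For example, `knn_predict(X_train, y_train, x_test, k=3)` returns the majority label among the three training points nearest to `x_test`.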
K-Nearest Neighbors: Distance Metric
L1 (Manhattan) distance vs. L2 (Euclidean) distance (both shown with K = 1).
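For reference, the two metrics compared on this slide, written out explicitly:

```latex
d_1(I_1, I_2) = \sum_{p} \left| I_1^{p} - I_2^{p} \right|
\qquad
d_2(I_1, I_2) = \sqrt{\sum_{p} \left( I_1^{p} - I_2^{p} \right)^{2}}
```

L1 depends on the choice of coordinate axes, while L2 does not, which is why the two metrics produce visibly different decision boundaries in the figures.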
K-Nearest Neighbors: Web Demo
http://vision.stanford.edu/teaching/cs231n-demos/knn/
• Interactively move points around and see the decision boundaries change
• Play with L1 vs L2 metrics
• Play with changing the number of training points and the value of K
Hyperparameters
What is the best value of K to use? What is the best distance metric to use?
These are examples of hyperparameters: choices about our learning algorithm that we don’t learn from the training data; instead, we set them at the start of the learning process.
Very problem-dependent. In general we need to try them all and see what works best for our data / task.
Setting Hyperparameters
Idea #1: Choose hyperparameters that work best on the whole dataset.
BAD: K = 1 always works perfectly on the training data.

Idea #2: Split the data into train and test; choose hyperparameters that work best on the test data.
BAD: No idea how the algorithm will perform on new data.
(Your Dataset → train | test)

Idea #3: Split the data into train, val, and test; choose hyperparameters on val and evaluate on test.
Better!
(Your Dataset → train | validation | test)
A small sketch of Idea #3 follows below.
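A minimal sketch of Idea #3, choosing K on the validation split. It reuses the knn_predict sketch from earlier; the function and variable names are illustrative, not from the slides:

```python
import numpy as np

def choose_k(X_train, y_train, X_val, y_val, candidate_ks=(1, 3, 5, 7, 9)):
    # Evaluate each candidate K on the held-out validation set and keep the best one.
    best_k, best_acc = None, -1.0
    for k in candidate_ks:
        preds = np.array([knn_predict(X_train, y_train, x, k=k) for x in X_val])
        acc = np.mean(preds == y_val)
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k, best_acc
```

The chosen K is then evaluated once on the test set to estimate performance on new data.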
Setting Hyperparameters
Idea #4: Cross-Validation: Split the data into folds, try each fold as validation, and average the results.
(Your Dataset → fold 1 | fold 2 | fold 3 | fold 4 | fold 5 | test)
Useful for small datasets, but (unfortunately) not used too frequently in deep learning.
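If scikit-learn is available, Idea #4 can be sketched in a few lines (the candidate values of K and the choice of KNeighborsClassifier are illustrative assumptions):

```python
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def cross_validate_k(X, y, candidate_ks=(1, 3, 5, 7, 9, 11), folds=5):
    # For each candidate K, run `folds`-fold cross-validation and record mean/std accuracy.
    results = {}
    for k in candidate_ks:
        scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=folds)
        results[k] = (scores.mean(), scores.std())
    return results
```

Plotting the mean and standard deviation per K gives the kind of figure shown on the next slide.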
Setting Hyperparameters
Example of 5-fold cross-validation for the value of K.
Each point is a single outcome; the line goes through the mean, and the bars indicate the standard deviation.
(It seems that K ≈ 7 works best for this data.)
Mathematical explanation of K-Nearest Neighbour
• KNN is a supervised learning algorithm, mostly used for classification: a data point is classified on the basis of how its neighbours are classified.
• KNN stores all available cases and classifies new cases based on a similarity measure.
• K in KNN is a parameter that refers to the number of nearest neighbours to include in the majority voting process.
• How do we choose K?
  • A common rule of thumb is K ≈ √n, where n is the total number of data points; if this gives an even number, add or subtract 1 to make K odd, which helps avoid ties (see the short sketch after this list).
• When to use KNN?
  • KNN is a good fit when the dataset is labelled, noise-free, and fairly small, because KNN is a “lazy learner”: it memorizes the training data and must compare every new point against all of it. Let’s understand the KNN algorithm with the help of an example.
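A tiny sketch of the √n rule of thumb mentioned above (the function name is illustrative):

```python
import math

def heuristic_k(n):
    # Rule of thumb from the slides: K ≈ sqrt(n), nudged to an odd number to reduce ties.
    k = int(round(math.sqrt(n)))
    return k + 1 if k % 2 == 0 else k

print(heuristic_k(100))  # sqrt(100) = 10 (even), so this returns 11
```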
[Slide: a table of people with their age, gender, and favourite sport.]
• Here male is denoted with the numeric value 0 and female with 1.
• Let’s find which class of people Angelina will lie in, given that K = 3 and her age is 5.
• We use the Euclidean distance, d = √((x2 − x1)² + (y2 − y1)²), to find the distance between any two points.
• So let’s find the distance between Ajay and Angelina using the formula
d = √((age2 − age1)² + (gender2 − gender1)²)
d = √((5 − 32)² + (1 − 0)²)
d = √(729 + 1) = √730
d ≈ 27.02
Similarly, we find all the other distances one by one.
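A quick check of this arithmetic (the values are exactly those used on the slide):

```python
import math

# Ajay: age 32, gender 0; Angelina: age 5, gender 1
d = math.sqrt((5 - 32) ** 2 + (1 - 0) ** 2)
print(round(d, 2))  # 27.02
```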
[Slide: a table of the computed distances from Angelina to every other person.]
• With K = 3, the three smallest distances are 9, 10, and 10.5, so the people closest to Angelina are Zaira, Michael and Smith:
Zaira – 9 – cricket
Michael – 10 – cricket
Smith – 10.5 – football
• By majority vote (two of these three prefer cricket), the KNN algorithm places Angelina in the class of people who like cricket. So this is how the KNN algorithm works.
Reference
• https://www.geeksforgeeks.org/mathematical-explanation-of-k-nearest-neighbour/
