K-Nearest Neighbor (KNN)
N Nasurudeen Ahamed,
Assistant Professor,
CSE
K-Nearest Neighbor (KNN)
• Definition:
• K-Nearest Neighbors is one of the most basic yet essential classification
algorithms in machine learning.
• Applications:
• It belongs to the supervised learning domain and is widely applied
in pattern recognition, data mining, and intrusion detection.
K-Nearest Neighbor (KNN)
• Lazy learner: the training data is simply stored, and no model is built until a query arrives (example: KNN).
• KNN is used for classification: the prediction for a test point is made on the basis
of its nearest neighbors.
• K is an integer value. Example: if K = 1, the test point is assigned to the class of its single
nearest neighbor.
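The lazy-learner idea with K = 1 can be sketched as follows. This is only an illustration: the class name `LazyOneNN`, its `fit`/`predict` methods, and the toy data are assumptions, not part of the slides.

```python
import math


class LazyOneNN:
    """Hypothetical 1-nearest-neighbor classifier illustrating lazy learning."""

    def fit(self, points, labels):
        # "Training" only stores the data; no model is built here.
        self.points = list(points)
        self.labels = list(labels)
        return self

    def predict(self, query):
        # All the work happens at query time: find the closest stored point
        # and return its class label.
        nearest = min(
            range(len(self.points)),
            key=lambda i: math.dist(self.points[i], query),
        )
        return self.labels[nearest]


# Toy data, made up for illustration only.
model = LazyOneNN().fit([(0, 0), (10, 10)], ["A", "B"])
print(model.predict((1, 2)))  # A
```

Because nothing is computed in `fit`, every prediction has to scan the stored samples, which is the usual cost of a lazy learner.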
K-Nearest Neighbor (KNN)
• Calculate the distance between the query instance and all the training samples.
• The most popular distance measure is Euclidean distance.
• Euclidean distance is calculated as the square root of the sum of the squared differences
between a new point (x) and an existing point (xi) across all input attributes j.
• Euclidean Distance(x, xi) = sqrt( sum over j of (xj - xij)^2 )
• i.e. $d(x, x_i) = \sqrt{\sum_{j} (x_j - x_{ij})^2}$
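As a minimal sketch, the Euclidean distance described above could be written in Python like this. The function name `euclidean_distance` and the sample points are illustrative assumptions.

```python
import math


def euclidean_distance(x, xi):
    """Square root of the sum of squared differences over all attributes j."""
    if len(x) != len(xi):
        raise ValueError("points must have the same number of attributes")
    return math.sqrt(sum((xj - xij) ** 2 for xj, xij in zip(x, xi)))


# A 3-4-5 right triangle makes the result easy to check by hand.
print(euclidean_distance((0, 0), (3, 4)))  # 5.0
```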
KNN – Example 1
• We have data from a questionnaire survey (asking for people's opinions) and from objective testing with two
attributes (acid durability and strength), used to classify whether a special paper tissue is good or not.
Here are four training samples.
• Now the factory produces a new paper tissue that passes the laboratory test with X1 = 3 and X2 = 7.
Without another expensive survey, can we guess the classification of this new tissue?
Name     Acid Durability (X1)   Strength (X2)   Class
Type 1   7                      7               Bad
Type 2   7                      4               Bad
Type 3   3                      4               Good
Type 4   1                      4               Good
Solution
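• Calculate the distance measure (Euclidean distance) from each training sample to the query point (3, 7).
Name     Acid Durability   Strength   Class   Distance
Type 1   7                 7          Bad     sqrt((7-3)^2 + (7-7)^2) = 4
Type 2   7                 4          Bad     sqrt((7-3)^2 + (4-7)^2) = 5
Type 3   3                 4          Good    sqrt((3-3)^2 + (4-7)^2) = 3
Type 4   1                 4          Good    sqrt((1-3)^2 + (4-7)^2) = sqrt(13) ≈ 3.6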
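The distance step for the four training samples and the query point (3, 7) can be reproduced with a short script. The variable names are illustrative; the data comes from the example.

```python
import math

# Training samples from the example: name -> ((acid durability, strength), class)
samples = {
    "Type 1": ((7, 7), "Bad"),
    "Type 2": ((7, 4), "Bad"),
    "Type 3": ((3, 4), "Good"),
    "Type 4": ((1, 4), "Good"),
}
query = (3, 7)  # new tissue: X1 = 3, X2 = 7

# Euclidean distance from every training sample to the query point.
distances = {
    name: math.dist(point, query) for name, (point, _label) in samples.items()
}

for name, d in distances.items():
    print(f"{name}: {d:.1f}")
```

Rounded to one decimal place, this gives 4.0, 5.0, 3.0 and 3.6 for Types 1 to 4.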
Solution
• Rank the training samples by distance (rank 1 = nearest).
Name     Acid Durability   Strength   Class   Distance   Rank
Type 1   7                 7          Bad     4          3
Type 2   7                 4          Bad     5          4
Type 3   3                 4          Good    3          1
Type 4   1                 4          Good    3.6        2
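The ranking step amounts to sorting the samples by distance. A small sketch, using the distances from the example (the names `ranked` and `ranks` are illustrative):

```python
# Distances to the query point (3, 7), taken from the example.
distances = {"Type 1": 4.0, "Type 2": 5.0, "Type 3": 3.0, "Type 4": 3.6}

# Sort sample names by ascending distance, then number them from 1.
ranked = sorted(distances, key=distances.get)
ranks = {name: position for position, name in enumerate(ranked, start=1)}

print(ranked)  # ['Type 3', 'Type 4', 'Type 1', 'Type 2']
print(ranks)
```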
Solution
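• If K = 3, keep the three samples with the smallest distances (ranks 1 to 3): Type 3, Type 4 and Type 1. Type 2 (rank 4) is excluded.
Name     Acid Durability   Strength   Class   Distance   Rank
Type 1   7                 7          Bad     4          3
Type 2   7                 4          Bad     5          4
Type 3   3                 4          Good    3          1
Type 4   1                 4          Good    3.6        2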
Solution
• Among the three nearest neighbors, 2 are Good and 1 is Bad, so the majority class is Good.
• Since 2 > 1, we conclude that the new paper tissue, which passed the
laboratory test with X1 = 3 and X2 = 7, belongs to the Good category.
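The final majority vote for K = 3 can be sketched with `collections.Counter`. The (distance, class) pairs come from the example; the variable names are illustrative.

```python
from collections import Counter

# (distance, class) pairs for Types 1 to 4, taken from the example.
neighbors = [(4.0, "Bad"), (5.0, "Bad"), (3.0, "Good"), (3.6, "Good")]
k = 3

# Keep the k samples with the smallest distances.
k_nearest = sorted(neighbors)[:k]

# Count the class labels among them and pick the most frequent one.
votes = Counter(label for _distance, label in k_nearest)
prediction = votes.most_common(1)[0][0]

print(dict(votes))  # {'Good': 2, 'Bad': 1}
print(prediction)   # Good
```

An odd K is a common choice in two-class problems like this one, since it prevents tied votes.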
