Intro to k-NN
USED IN CLASSIFICATION PROBLEMS
We all love watching movies
Let's Talk about the Movies
Identify the Genre and Group them
Cluster – Action Vs Comedy
Action
Comedy
Comedy & Action
Hotstar Page – a Good Example of K-NN
Thriller | Action | Drama | Romance | Comedy
Nearest Neighbor Classification
• Nearest-neighbor classifiers label an unlabeled example with the class of the labeled examples whose characteristics are most similar to it.
• Suitable for classification tasks where the relationships between the features and the target class are numerous, complex, and extremely difficult to understand.
• Computer vision applications
• Optical character recognition
• Predicting whether a person will enjoy music or a movie, based on recommendations
• Patterns in genetic data
• Detecting diseases
Identify the Class of Red Star
What k-NN does
Should a new point be classified as a red point or a green point? This is where k-NN comes in.
Euclidean Distance
Default value of 'K' is 5 (for example, scikit-learn's KNeighborsClassifier uses n_neighbors=5 by default)
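For reference, the Euclidean distance named on this slide is the standard straight-line distance between two points. The symbols p, q, and n (the number of features) are notation chosen here, not taken from the slides:

```latex
d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}
```

In the two-dimensional case this reduces to the familiar \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}.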
Brute Force
How do we choose the neighbours? Answer: brute force.
Consider the simple case of a two-dimensional plot. The intuition is to calculate the Euclidean distance from the point of interest (the point whose class we need to determine) to every point in the training set, then take the majority class among the k nearest points. This is called the brute-force method.
Remember that brute force performs worst with many dimensions and large training sets, because every query is compared against every training point across every feature. High-dimensional data also makes distances less informative, a problem known as the "curse of dimensionality".
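The brute-force procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the function name `knn_classify` and the small two-dimensional data set are assumptions made up for the example.

```python
import math
from collections import Counter

def knn_classify(query, points, labels, k=5):
    """Brute-force k-NN: measure the distance to every training point."""
    # Pair the distance from the query to each training point with its label.
    distances = [(math.dist(query, p), label) for p, label in zip(points, labels)]
    # Keep only the k closest training points.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote among the labels of those k neighbours.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D training set with two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (6.5, 7.0), (7.0, 6.0)]
labels = ["red", "red", "red", "green", "green", "green"]

print(knn_classify((2.0, 2.0), points, labels, k=3))  # red
```

Every query scans the whole training set, which is why the cost grows with both the number of training points and the number of features.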
k-NN Ground Realities
Blind Taste Experience case study
• A Blind Taste Experience involves people going to a restaurant and tasting food in darkness.
• In the Mystery Meal, people are asked to rate the food on two parameters: crunchy and sweet.
• Scale used: 1 to 10 (10 being the highest and 1 the lowest)
• The food products are labeled as follows:
Tomato Family
• Notice the pattern of veggies, fruits, and proteins.
• Locating the tomato's nearest neighbor requires a distance formula.
• k-NN uses Euclidean distance to find the answer.
How does it do that?
Interview Questions
In the given image, which would be the best value for k, assuming that the algorithm you are using is k-Nearest Neighbour?
A) 3
B) 10
C) 20
D) 50
Solution: B
The validation error is lowest when k is 10, so that is the best value to use.
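The reasoning behind this answer, picking the k with the lowest validation error, can be sketched as follows. The data set, the candidate values of k, and the helper names (`knn_predict`, `validation_error`, `best_k`) are all hypothetical; they do not reproduce the plot from the slide.

```python
import math
from collections import Counter

def knn_predict(query, train, k):
    """Majority class among the k training rows closest to the query."""
    nearest = sorted(train, key=lambda row: math.dist(query, row[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def validation_error(train, valid, k):
    """Fraction of validation rows that are misclassified for this k."""
    wrong = sum(knn_predict(x, train, k) != y for x, y in valid)
    return wrong / len(valid)

def best_k(train, valid, candidates):
    """Return the candidate k with the lowest validation error."""
    errors = {k: validation_error(train, valid, k) for k in candidates}
    return min(errors, key=errors.get), errors

# Hypothetical data: two groups plus one noisy "b" point at (1.2, 1.2).
train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"), ((1, 1), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b"), ((6, 6), "b"),
         ((1.2, 1.2), "b")]
valid = [((1.3, 1.3), "a"), ((0.5, 0.5), "a"),
         ((5.5, 5.5), "b"), ((4.8, 4.8), "b")]

k, errors = best_k(train, valid, candidates=[1, 3, 9])
print(k, errors)  # 3 {1: 0.25, 3: 0.0, 9: 0.5}
```

In this toy example a very small k follows the noisy point and a very large k ignores local structure, so the error is lowest in between.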
Interview Question
Which of the following options is true about the k-NN algorithm?
A) It can be used for classification
B) It can be used for regression
C) It can be used for both classification and regression
Solution: C
k-NN can also be used for regression problems. In that case the prediction is based on the mean or the median of the k most similar instances.
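A minimal sketch of k-NN regression, assuming a one-feature data set invented for illustration. The function name `knn_regress` and its `aggregate` parameter are hypothetical.

```python
import math
from statistics import mean, median

def knn_regress(query, train, k=3, aggregate=mean):
    """Predict a number by aggregating the targets of the k nearest rows."""
    nearest = sorted(train, key=lambda row: math.dist(query, row[0]))[:k]
    return aggregate([target for _, target in nearest])

# Hypothetical rows: (feature vector, numeric target).
train = [((1.0,), 10.0), ((2.0,), 12.0), ((3.0,), 20.0), ((10.0,), 100.0)]

print(knn_regress((2.0,), train, k=3))                    # 14.0
print(knn_regress((2.0,), train, k=3, aggregate=median))  # 12.0
```

The median is less sensitive than the mean to a single unusual neighbour, which is one reason to prefer it on noisy targets.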
Interview Question
Which of the following statements are true about the k-NN algorithm?
1) k-NN performs much better if all of the data have the same scale
2) k-NN works well with a small number of input variables (p), but struggles when the number of inputs is very large
3) k-NN makes no assumptions about the functional form of the problem being solved
A) 1 and 2
B) 1 and 3
C) Only 1
D) All of the above
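Solution: D
All three statements are true. k-NN is distance-based, so it performs much better when the features share a scale; it struggles when the number of inputs is very large; and it is non-parametric, making no assumptions about the functional form of the problem.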
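The point about scale can be illustrated with a small sketch. The rows (age in years, income in dollars) and the helper names `min_max_scale` and `nearest_index` are hypothetical; min-max scaling is just one common choice of rescaling.

```python
import math

def min_max_scale(rows):
    """Rescale every column to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]

def nearest_index(query, rows):
    """Index of the row with the smallest Euclidean distance to the query."""
    return min(range(len(rows)), key=lambda i: math.dist(query, rows[i]))

# Hypothetical rows: (age in years, income in dollars).
rows = [(25, 50_000), (60, 50_500), (26, 58_000)]
query = (27, 51_000)

# Unscaled: income dominates the distance, so the 60-year-old looks closest.
print(nearest_index(query, rows))  # 1

# Scaled together with the query: age now counts, and the 25-year-old wins.
scaled = min_max_scale(rows + [query])
print(nearest_index(scaled[-1], scaled[:-1]))  # 0
```

Without rescaling, the feature with the largest numeric range decides who the neighbours are, regardless of how informative it is.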
Interview Question
Which of the following machine learning algorithms can be used for imputing missing values of both categorical and continuous variables?
A) k-NN
B) Linear Regression
C) Logistic Regression
Solution: A
The k-NN algorithm can be used for imputing missing values of both categorical and continuous variables.
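One way this can work is sketched below, under stated assumptions: neighbours are found using the columns that are present, a continuous gap is filled with the neighbours' mean, and a categorical gap with their most common value. The function `knn_impute`, the column names, and the rows are all invented for the example.

```python
import math
from collections import Counter
from statistics import mean

def knn_impute(row, donors, feature_cols, target_col, k=3):
    """Fill row[target_col] from the k donor rows closest on feature_cols."""
    def features(r):
        return [r[c] for c in feature_cols]
    nearest = sorted(donors, key=lambda d: math.dist(features(row), features(d)))[:k]
    values = [d[target_col] for d in nearest]
    # Continuous column: average the neighbours' values.
    if all(isinstance(v, (int, float)) for v in values):
        return mean(values)
    # Categorical column: take the most common neighbour value.
    return Counter(values).most_common(1)[0][0]

# Hypothetical complete rows used as donors.
donors = [
    {"x": 1.0, "y": 1.0, "income": 30.0, "city": "Pune"},
    {"x": 1.2, "y": 0.9, "income": 34.0, "city": "Pune"},
    {"x": 0.8, "y": 1.1, "income": 32.0, "city": "Delhi"},
    {"x": 9.0, "y": 9.0, "income": 90.0, "city": "Delhi"},
]
row = {"x": 1.0, "y": 1.0, "income": None, "city": None}

print(knn_impute(row, donors, ["x", "y"], "income"))  # 32.0
print(knn_impute(row, donors, ["x", "y"], "city"))    # Pune
```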
Interview Question
Which of the following distance measures do we use for categorical variables in k-NN?
A) Hamming Distance
B) Euclidean Distance
C) Manhattan Distance
Solution: A
Both Euclidean and Manhattan distances are used for continuous variables, whereas Hamming distance is used for categorical variables.
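Hamming distance simply counts the positions at which two records differ. A minimal sketch, with made-up categorical records:

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length records differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of fields")
    return sum(x != y for x, y in zip(a, b))

# Hypothetical categorical records: (colour, body type, gearbox).
car_a = ("red", "suv", "manual")
car_b = ("red", "sedan", "manual")
car_c = ("blue", "sedan", "automatic")

print(hamming_distance(car_a, car_b))  # 1
print(hamming_distance(car_a, car_c))  # 3
```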
Interview Question
A company has build a kNN classifier that gets 100% accuracy on training data. When they
deployed this model on client side it has been found that the model is not at all accurate. Which
of the following thing might gone wrong?
Note: Model has successfully deployed and no technical issues are found at client side except the
model performance
A) It is probably a overfitted model
B) It is probably a underfitted model
C) Can’t say
D) None of these
Solution: A
An overfitted model seems to perform well on the training data, but it is not generalized enough to give the same results on new data.
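A small sketch of how this can happen with k-NN: with k = 1, each training point is its own nearest neighbour, so training accuracy is perfect even when some labels are noise. The data set (with two deliberately noisy labels) and the helper names are hypothetical.

```python
import math
from collections import Counter

def knn_predict(query, train, k):
    """Majority class among the k training rows closest to the query."""
    nearest = sorted(train, key=lambda row: math.dist(query, row[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, data, k):
    """Fraction of rows in data that the k-NN model labels correctly."""
    return sum(knn_predict(x, train, k) == y for x, y in data) / len(data)

# Hypothetical training set; (1, 1) and (6, 6) carry noisy labels.
train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"), ((1, 1), "b"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b"), ((6, 6), "a")]
# Hypothetical unseen points with their true classes.
unseen = [((0.9, 0.9), "a"), ((0.2, 0.2), "a"),
          ((5.9, 5.9), "b"), ((5.2, 5.2), "b")]

print(accuracy(train, train, k=1))   # 1.0  (memorized, noise included)
print(accuracy(train, unseen, k=1))  # 0.5  (the noise misleads new points)
print(accuracy(train, unseen, k=3))  # 1.0  (voting smooths the noise out)
```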
