SlideShare a Scribd company logo
K-Nearest Neighbor
What is k-Nearest Neighbor ?
• a non-parametric method used for classification and regression
• Nearest Neighbor classifiers are defined by their characteristic of
classifying unlabeled examples by assigning them the class of similar
labeled examples
• k nearest neighbors is a simple algorithm that stores all available
cases and classifies new cases by a majority vote of its k neighbors
Continued…
• This algorithm segregates unlabeled data points into well defined
groups
• k-NN is a type of instance-based learning, or lazy learning
• It does not use the training data points to do any generalization (all
the training data is needed during the testing phase)
K-NN Algorithm
• Consider an example of investigating a food type of ingredient tomato, checking
whether it is in fruit or protein food or vegetable
• We had created a dataset in which we recorded our impressions of a number of
ingredients we tasted
• To keep things simple, we rated only two features of each ingredient, first is a
measure from 1 to 10 of how crunchy the ingredient is, second is a 1 to 10 score
of how sweet
• The k-NN algorithm treats the features as coordinates in a
multidimensional feature space
• Here we can notice the pattern, similar types of food tend to be
grouped closely together
• Is tomato a fruit or vegetable?
• To answer this question we have to use Nearest Neighbor approach to
determine which class is a better fit
• Measuring similarity with distance:
• Locating the tomato’s nearest neighbors requires a distance function
• Traditionally, the k-NN algorithm uses Euclidean distance
• Suppose we are measuring distance between tomato (sweetness = 6,
crunchiness = 4), and the green bean (sweetness = 3, crunchiness = 7)
Dist.(tomato, green bean) = (6 – 3)2 + (4 - 7)2 = 4.2
• Similarly we can find the distance of tomato with other food types ingredients
• The distance between tomato and both fruits orange and grape is smaller than
remaining food types, hence tomato can be labeled as a fruit
Choosing an appropriate k
• Determining the value of k plays a significant role in determining the
efficacy of the model and generalize to future data
• Choosing a larger k value over fits the data, and smaller value of k
causes under fitting to the training data
• One common practice is to begin with k equal to the square root of
the number of training examples
• Test several k values on a variety of test datasets and choose the one
that delivers the best classification performance
Preparing data for use with k-NN
• Features are typically transformed to a standard range prior to
applying the k-NN algorithm
• Scale used for the values for each variable might be different
• The best practice is to normalize the data and transform all the values
to a common scale
• Two traditional methods for normalization:
min-max normalization
z-score standardization
1) min-max normalization:
2) Z-score standardization:
Case Study: Detecting Prostate Cancer
• The data set consists of 100 observations and 10 variables (out of
which 8 numeric variables and one categorical variable and is ID)
which are as follows:
1. Radius
2. Texture
3. Perimeter
4. Area
5. Smoothness
6. Compactness
7. Symmetry
8. Fractal dimension
• Applying normalization and prediction:
• Model performance for different k values
For k = 10; accuracy = 60% For k = 9; accuracy = 68%
K-NN Algorithm: Pros and Cons
• Pros:
• The algorithm is highly unbiased in nature
• makes no prior assumption of the underlying data.
• Being simple and effective in nature, it is easy to implement and has gained good
popularity.
• Cons:
• Doesn’t create a model since there’s no abstraction process involved.
• The training process is really fast but the prediction time is pretty high with useful
insights missing at times.
• Building this algorithm requires time to be invested in data preparation (especially
treating the missing data and categorical features) to obtain a robust model.
Applications of k-NN
Agriculture:
• For instance such as simulating daily precipitations and weather variables
• Evaluation of forest inventories and estimating forest variables
• Climate forecasting and estimating soil water parameters
Finance:
• Stock market forecasting, Currency exchange rate, Bank bankruptcies, Credit rating,
Loan Management, Bank customer profiling
Medicine:
• Predict whether a patient, hospitalized due to a heart attack, will have a second
attack
• Estimating the amount of glucose in the blood of a diabetic person
THANK YOU !!

More Related Content

What's hot

Classification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsClassification Based Machine Learning Algorithms
Classification Based Machine Learning Algorithms
Md. Main Uddin Rony
 
Presentation on K-Means Clustering
Presentation on K-Means ClusteringPresentation on K-Means Clustering
Presentation on K-Means Clustering
Pabna University of Science & Technology
 
K-Nearest Neighbor Classifier
K-Nearest Neighbor ClassifierK-Nearest Neighbor Classifier
K-Nearest Neighbor Classifier
Neha Kulkarni
 
K - Nearest neighbor ( KNN )
K - Nearest neighbor  ( KNN )K - Nearest neighbor  ( KNN )
K - Nearest neighbor ( KNN )
Mohammad Junaid Khan
 
3.3 hierarchical methods
3.3 hierarchical methods3.3 hierarchical methods
3.3 hierarchical methods
Krish_ver2
 
Introduction to Linear Discriminant Analysis
Introduction to Linear Discriminant AnalysisIntroduction to Linear Discriminant Analysis
Introduction to Linear Discriminant Analysis
Jaclyn Kokx
 
Support Vector Machines ( SVM )
Support Vector Machines ( SVM ) Support Vector Machines ( SVM )
Support Vector Machines ( SVM )
Mohammad Junaid Khan
 
03 Single layer Perception Classifier
03 Single layer Perception Classifier03 Single layer Perception Classifier
03 Single layer Perception Classifier
Tamer Ahmed Farrag, PhD
 
Unsupervised learning clustering
Unsupervised learning clusteringUnsupervised learning clustering
Unsupervised learning clustering
Arshad Farhad
 
K means Clustering Algorithm
K means Clustering AlgorithmK means Clustering Algorithm
K means Clustering Algorithm
Kasun Ranga Wijeweera
 
Linear Discriminant Analysis (LDA)
Linear Discriminant Analysis (LDA)Linear Discriminant Analysis (LDA)
Linear Discriminant Analysis (LDA)
Anmol Dwivedi
 
Supervised learning
  Supervised learning  Supervised learning
Supervised learning
Learnbay Datascience
 
Nearest Neighbor Algorithm Zaffar Ahmed
Nearest Neighbor Algorithm  Zaffar AhmedNearest Neighbor Algorithm  Zaffar Ahmed
Nearest Neighbor Algorithm Zaffar Ahmed
Zaffar Ahmed Shaikh
 
Autoencoder
AutoencoderAutoencoder
Autoencoder
Mehrnaz Faraz
 
Hyperparameter Tuning
Hyperparameter TuningHyperparameter Tuning
Hyperparameter Tuning
Jon Lederman
 
04 Classification in Data Mining
04 Classification in Data Mining04 Classification in Data Mining
04 Classification in Data Mining
Valerii Klymchuk
 
Machine Learning: Bias and Variance Trade-off
Machine Learning: Bias and Variance Trade-offMachine Learning: Bias and Variance Trade-off
Machine Learning: Bias and Variance Trade-off
International Institute of Information Technology (I²IT)
 
Knn 160904075605-converted
Knn 160904075605-convertedKnn 160904075605-converted
Knn 160904075605-converted
rameswara reddy venkat
 
Linear discriminant analysis
Linear discriminant analysisLinear discriminant analysis
Linear discriminant analysis
Bangalore
 
Data preprocessing in Machine learning
Data preprocessing in Machine learning Data preprocessing in Machine learning
Data preprocessing in Machine learning
pyingkodi maran
 

What's hot (20)

Classification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsClassification Based Machine Learning Algorithms
Classification Based Machine Learning Algorithms
 
Presentation on K-Means Clustering
Presentation on K-Means ClusteringPresentation on K-Means Clustering
Presentation on K-Means Clustering
 
K-Nearest Neighbor Classifier
K-Nearest Neighbor ClassifierK-Nearest Neighbor Classifier
K-Nearest Neighbor Classifier
 
K - Nearest neighbor ( KNN )
K - Nearest neighbor  ( KNN )K - Nearest neighbor  ( KNN )
K - Nearest neighbor ( KNN )
 
3.3 hierarchical methods
3.3 hierarchical methods3.3 hierarchical methods
3.3 hierarchical methods
 
Introduction to Linear Discriminant Analysis
Introduction to Linear Discriminant AnalysisIntroduction to Linear Discriminant Analysis
Introduction to Linear Discriminant Analysis
 
Support Vector Machines ( SVM )
Support Vector Machines ( SVM ) Support Vector Machines ( SVM )
Support Vector Machines ( SVM )
 
03 Single layer Perception Classifier
03 Single layer Perception Classifier03 Single layer Perception Classifier
03 Single layer Perception Classifier
 
Unsupervised learning clustering
Unsupervised learning clusteringUnsupervised learning clustering
Unsupervised learning clustering
 
K means Clustering Algorithm
K means Clustering AlgorithmK means Clustering Algorithm
K means Clustering Algorithm
 
Linear Discriminant Analysis (LDA)
Linear Discriminant Analysis (LDA)Linear Discriminant Analysis (LDA)
Linear Discriminant Analysis (LDA)
 
Supervised learning
  Supervised learning  Supervised learning
Supervised learning
 
Nearest Neighbor Algorithm Zaffar Ahmed
Nearest Neighbor Algorithm  Zaffar AhmedNearest Neighbor Algorithm  Zaffar Ahmed
Nearest Neighbor Algorithm Zaffar Ahmed
 
Autoencoder
AutoencoderAutoencoder
Autoencoder
 
Hyperparameter Tuning
Hyperparameter TuningHyperparameter Tuning
Hyperparameter Tuning
 
04 Classification in Data Mining
04 Classification in Data Mining04 Classification in Data Mining
04 Classification in Data Mining
 
Machine Learning: Bias and Variance Trade-off
Machine Learning: Bias and Variance Trade-offMachine Learning: Bias and Variance Trade-off
Machine Learning: Bias and Variance Trade-off
 
Knn 160904075605-converted
Knn 160904075605-convertedKnn 160904075605-converted
Knn 160904075605-converted
 
Linear discriminant analysis
Linear discriminant analysisLinear discriminant analysis
Linear discriminant analysis
 
Data preprocessing in Machine learning
Data preprocessing in Machine learning Data preprocessing in Machine learning
Data preprocessing in Machine learning
 

Similar to K nearest neighbor

Selection K in K-means Clustering
Selection K in K-means ClusteringSelection K in K-means Clustering
Selection K in K-means Clustering
Junghoon Kim
 
crossvalidation.pptx
crossvalidation.pptxcrossvalidation.pptx
crossvalidation.pptx
PriyadharshiniG41
 
Nbvtalkonfeatureselection
NbvtalkonfeatureselectionNbvtalkonfeatureselection
Nbvtalkonfeatureselection
Nagasuri Bala Venkateswarlu
 
Data mining with weka
Data mining with wekaData mining with weka
Data mining with weka
Hein Min Htike
 
MLSEV Virtual. Searching for Anomalies
MLSEV Virtual. Searching for AnomaliesMLSEV Virtual. Searching for Anomalies
MLSEV Virtual. Searching for Anomalies
BigML, Inc
 
Diabetes prediction with r(using knn)
Diabetes prediction with r(using knn)Diabetes prediction with r(using knn)
Diabetes prediction with r(using knn)
tanujoshi98
 
k-Nearest Neighbors with brief explanation.pptx
k-Nearest Neighbors with brief explanation.pptxk-Nearest Neighbors with brief explanation.pptx
k-Nearest Neighbors with brief explanation.pptx
gamingzonedead880
 
Clustering: A Scikit Learn Tutorial
Clustering: A Scikit Learn TutorialClustering: A Scikit Learn Tutorial
Clustering: A Scikit Learn Tutorial
Damian R. Mingle, MBA
 
Week 11 Model Evalaution Model Evaluation
Week 11 Model Evalaution Model EvaluationWeek 11 Model Evalaution Model Evaluation
Week 11 Model Evalaution Model Evaluation
khairulhuda242
 
Seminar Slides
Seminar SlidesSeminar Slides
Seminar Slides
pannicle
 
Policy Based reinforcement Learning for time series Anomaly detection
Policy Based reinforcement Learning for time series Anomaly detectionPolicy Based reinforcement Learning for time series Anomaly detection
Policy Based reinforcement Learning for time series Anomaly detection
Kishor Datta Gupta
 
DutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in MLDutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in ML
BigML, Inc
 
Modelling the effluent quality utilizing optical monitoring
Modelling the effluent quality utilizing optical monitoringModelling the effluent quality utilizing optical monitoring
Modelling the effluent quality utilizing optical monitoring
CLIC Innovation Ltd
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
Girish Khanzode
 
ML MODULE 4.pdf
ML MODULE 4.pdfML MODULE 4.pdf
ML MODULE 4.pdf
Shiwani Gupta
 
Decision tree for data mining and computer
Decision tree for data mining and computerDecision tree for data mining and computer
Decision tree for data mining and computer
tttiba
 
Query dependent ranking using k nearest neighbor
Query dependent ranking using k nearest neighborQuery dependent ranking using k nearest neighbor
Query dependent ranking using k nearest neighbor
iyo
 
algoritma klastering.pdf
algoritma klastering.pdfalgoritma klastering.pdf
algoritma klastering.pdf
bintis1
 
Feature selection
Feature selectionFeature selection
Feature selection
dkpawar
 
Kaggle Gold Medal Case Study
Kaggle Gold Medal Case StudyKaggle Gold Medal Case Study
Kaggle Gold Medal Case Study
Alon Bochman, CFA
 

Similar to K nearest neighbor (20)

Selection K in K-means Clustering
Selection K in K-means ClusteringSelection K in K-means Clustering
Selection K in K-means Clustering
 
crossvalidation.pptx
crossvalidation.pptxcrossvalidation.pptx
crossvalidation.pptx
 
Nbvtalkonfeatureselection
NbvtalkonfeatureselectionNbvtalkonfeatureselection
Nbvtalkonfeatureselection
 
Data mining with weka
Data mining with wekaData mining with weka
Data mining with weka
 
MLSEV Virtual. Searching for Anomalies
MLSEV Virtual. Searching for AnomaliesMLSEV Virtual. Searching for Anomalies
MLSEV Virtual. Searching for Anomalies
 
Diabetes prediction with r(using knn)
Diabetes prediction with r(using knn)Diabetes prediction with r(using knn)
Diabetes prediction with r(using knn)
 
k-Nearest Neighbors with brief explanation.pptx
k-Nearest Neighbors with brief explanation.pptxk-Nearest Neighbors with brief explanation.pptx
k-Nearest Neighbors with brief explanation.pptx
 
Clustering: A Scikit Learn Tutorial
Clustering: A Scikit Learn TutorialClustering: A Scikit Learn Tutorial
Clustering: A Scikit Learn Tutorial
 
Week 11 Model Evalaution Model Evaluation
Week 11 Model Evalaution Model EvaluationWeek 11 Model Evalaution Model Evaluation
Week 11 Model Evalaution Model Evaluation
 
Seminar Slides
Seminar SlidesSeminar Slides
Seminar Slides
 
Policy Based reinforcement Learning for time series Anomaly detection
Policy Based reinforcement Learning for time series Anomaly detectionPolicy Based reinforcement Learning for time series Anomaly detection
Policy Based reinforcement Learning for time series Anomaly detection
 
DutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in MLDutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in ML
 
Modelling the effluent quality utilizing optical monitoring
Modelling the effluent quality utilizing optical monitoringModelling the effluent quality utilizing optical monitoring
Modelling the effluent quality utilizing optical monitoring
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
ML MODULE 4.pdf
ML MODULE 4.pdfML MODULE 4.pdf
ML MODULE 4.pdf
 
Decision tree for data mining and computer
Decision tree for data mining and computerDecision tree for data mining and computer
Decision tree for data mining and computer
 
Query dependent ranking using k nearest neighbor
Query dependent ranking using k nearest neighborQuery dependent ranking using k nearest neighbor
Query dependent ranking using k nearest neighbor
 
algoritma klastering.pdf
algoritma klastering.pdfalgoritma klastering.pdf
algoritma klastering.pdf
 
Feature selection
Feature selectionFeature selection
Feature selection
 
Kaggle Gold Medal Case Study
Kaggle Gold Medal Case StudyKaggle Gold Medal Case Study
Kaggle Gold Medal Case Study
 

Recently uploaded

Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
nuttdpt
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
Social Samosa
 
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
74nqk8xf
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
kuntobimo2016
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
aqzctr7x
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
bopyb
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
nuttdpt
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
Sm321
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
Bill641377
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
soxrziqu
 
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdfUdemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Fernanda Palhano
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Aggregage
 
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
74nqk8xf
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
vikram sood
 

Recently uploaded (20)

Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
 
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
 
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdfUdemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
 
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
 

K nearest neighbor

  • 2. What is k-Nearest Neighbor ? • a non-parametric method used for classification and regression • Nearest Neighbor classifiers are defined by their characteristic of classifying unlabeled examples by assigning them the class of similar labeled examples • k nearest neighbors is a simple algorithm that stores all available cases and classifies new cases by a majority vote of its k neighbors
  • 3. Continued… • This algorithm segregates unlabeled data points into well defined groups • k-NN is a type of instance-based learning, or lazy learning • It does not use the training data points to do any generalization (all the training data is needed during the testing phase)
  • 4. K-NN Algorithm • Consider an example of investigating a food type of ingredient tomato, checking whether it is in fruit or protein food or vegetable • We had created a dataset in which we recorded our impressions of a number of ingredients we tasted • To keep things simple, we rated only two features of each ingredient, first is a measure from 1 to 10 of how crunchy the ingredient is, second is a 1 to 10 score of how sweet
  • 5. • The k-NN algorithm treats the features as coordinates in a multidimensional feature space
  • 6. • Here we can notice the pattern, similar types of food tend to be grouped closely together
  • 7. • Is tomato a fruit or vegetable? • To answer this question we have to use Nearest Neighbor approach to determine which class is a better fit
  • 8. • Measuring similarity with distance: • Locating the tomato’s nearest neighbors requires a distance function • Traditionally, the k-NN algorithm uses Euclidean distance • Suppose we are measuring distance between tomato (sweetness = 6, crunchiness = 4), and the green bean (sweetness = 3, crunchiness = 7) Dist.(tomato, green bean) = (6 – 3)2 + (4 - 7)2 = 4.2 • Similarly we can find the distance of tomato with other food types ingredients • The distance between tomato and both fruits orange and grape is smaller than remaining food types, hence tomato can be labeled as a fruit
  • 9. Choosing an appropriate k • Determining the value of k plays a significant role in determining the efficacy of the model and generalize to future data • Choosing a larger k value over fits the data, and smaller value of k causes under fitting to the training data • One common practice is to begin with k equal to the square root of the number of training examples • Test several k values on a variety of test datasets and choose the one that delivers the best classification performance
  • 10. Preparing data for use with k-NN • Features are typically transformed to a standard range prior to applying the k-NN algorithm • Scale used for the values for each variable might be different • The best practice is to normalize the data and transform all the values to a common scale • Two traditional methods for normalization: min-max normalization z-score standardization
  • 11. 1) min-max normalization: 2) Z-score standardization:
  • 12. Case Study: Detecting Prostate Cancer • The data set consists of 100 observations and 10 variables (out of which 8 numeric variables and one categorical variable and is ID) which are as follows: 1. Radius 2. Texture 3. Perimeter 4. Area 5. Smoothness 6. Compactness 7. Symmetry 8. Fractal dimension
  • 13.
  • 14. • Applying normalization and prediction:
  • 15. • Model performance for different k values For k = 10; accuracy = 60% For k = 9; accuracy = 68%
  • 16. K-NN Algorithm: Pros and Cons • Pros: • The algorithm is highly unbiased in nature • makes no prior assumption of the underlying data. • Being simple and effective in nature, it is easy to implement and has gained good popularity. • Cons: • Doesn’t create a model since there’s no abstraction process involved. • The training process is really fast but the prediction time is pretty high with useful insights missing at times. • Building this algorithm requires time to be invested in data preparation (especially treating the missing data and categorical features) to obtain a robust model.
  • 17. Applications of k-NN Agriculture: • For instance such as simulating daily precipitations and weather variables • Evaluation of forest inventories and estimating forest variables • Climate forecasting and estimating soil water parameters Finance: • Stock market forecasting, Currency exchange rate, Bank bankruptcies, Credit rating, Loan Management, Bank customer profiling Medicine: • Predict whether a patient, hospitalized due to a heart attack, will have a second attack • Estimating the amount of glucose in the blood of a diabetic person