SlideShare a Scribd company logo
Jaskaran Singh and
Vansh
Algorithms: K Nearest Neighbors
1
Simple Analogy..
• Tell me about your friends(who your
neighbors are) and I will tell you who you are.
2
Instance-based Learning
Its very similar to a
Desktop!!
3
KNN – Different names
• K-Nearest Neighbors
• Memory-Based Reasoning
• Example-Based Reasoning
• Instance-Based Learning
• Lazy Learning
4
What is KNN?
• A powerful classification algorithm used in pattern
recognition.
• K nearest neighbors stores all available cases and
classifies new cases based on a similarity measure(e.g
distance function)
• One of the top data mining algorithms used today.
• A non-parametric lazy learning algorithm (An Instance-
based Learning method).
5
KNN: Classification Approach
• An object (a new instance) is classified by a
majority votes for its neighbor classes.
• The object is assigned to the most common class
amongst its K nearest neighbors.(measured by a
distant function )
6
7
Distance Measure
Training
Records
Test
Record
Compute
Distance
Choose k of the
“nearest” records
8
Distance measure for Continuous
Variables
9
Distance Between Neighbors
• Calculate the distance between new example
(E) and all examples in the training set.
• Euclidean distance between two examples.
– X = [x1,x2,x3,..,xn]
– Y = [y1,y2,y3,...,yn]
– The Euclidean distance between X and Y is defined
as:
10




n
i
i
i y
x
Y
X
D
1
2
)
(
)
,
(
K-Nearest Neighbor Algorithm
• All the instances correspond to points in an n-dimensional
feature space.
• Each instance is represented with a set of numerical
attributes.
• Each of the training data consists of a set of vectors and a
class label associated with each vector.
• Classification is done by comparing feature vectors of
different K nearest points.
• Select the K-nearest examples to E in the training set.
• Assign E to the most common class among its K-nearest
neighbors.
11
3-KNN: Example(1)
Distance from John
sqrt [(35-37)2+(35-50)2 +(3-
2)2]=15.16
sqrt [(22-37)2+(50-50)2 +(2-
2)2]=15
sqrt [(63-37)2+(200-50)2 +(1-
2)2]=152.23
sqrt [(59-37)2+(170-50)2 +(1-
2)2]=122
sqrt [(25-37)2+(40-50)2 +(4-
2)2]=15.74
12
Customer Age Income No.
credit
cards
Class
George 35 35K 3 No
Rachel 22 50K 2 Yes
Steve 63 200K 1 No
Tom 59 170K 1 No
Anne 25 40K 4 Yes
John 37 50K 2 ? YES
How to choose K?
• If K is too small it is sensitive to noise points.
• Larger K works well. But too large K may include majority
points from other classes.
• Rule of thumb is K < sqrt(n), n is number of examples.
13
X
14
X X X
(a) 1-nearest neighbor (b) 2-nearest neighbor (c) 3-nearest neighbor
K-nearest neighbors of a record x are data points
that have the k smallest distance to x
15
KNN Feature Weighting
• Scale each feature by its importance for
classification
• Can use our prior knowledge about which features
are more important
• Can learn the weights wk using cross‐validation (to
be covered later)
16
Feature Normalization
• Distance between neighbors could be dominated
by some attributes with relatively large numbers.
 e.g., income of customers in our previous example.
• Arises when two features are in different scales.
• Important to normalize those features.
– Mapping values to numbers between 0 – 1.
17
Nominal/Categorical Data
• Distance works naturally with numerical attributes.
• Binary value categorical data attributes can be regarded as 1
or 0.
18
KNN Classification
$0
$50,000
$100,000
$150,000
$200,000
$250,000
0 10 20 30 40 50 60 70
Non-Default
Default
Age
Loan$
19
KNN Classification – Distance
Age Loan Default Distance
25 $40,000 N 102000
35 $60,000 N 82000
45 $80,000 N 62000
20 $20,000 N 122000
35 $120,000 N 22000
52 $18,000 N 124000
23 $95,000 Y 47000
40 $62,000 Y 80000
60 $100,000 Y 42000
48 $220,000 Y 78000
33 $150,000 Y 8000
48 $142,000 ?
2
2
1
2
2
1 )
(
)
( y
y
x
x
D 



20
KNN Classification – Standardized Distance
Age Loan Default Distance
0.125 0.11 N 0.7652
0.375 0.21 N 0.5200
0.625 0.31 N 0.3160
0 0.01 N 0.9245
0.375 0.50 N 0.3428
0.8 0.00 N 0.6220
0.075 0.38 Y 0.6669
0.5 0.22 Y 0.4437
1 0.41 Y 0.3650
0.7 1.00 Y 0.3861
0.325 0.65 Y 0.3771
0.7 0.61 ?
Min
Max
Min
X
Xs



21
Strengths of KNN
• Very simple and intuitive.
• Can be applied to the data from any distribution.
• Good classification if the number of samples is large enough.
22
Weaknesses of KNN
• Takes more time to classify a new example.
• need to calculate and compare distance from new example
to all other examples.
• Choosing k may be tricky.
• Need large number of samples for accuracy.

More Related Content

Similar to knn is the k nearest algorithm ppt that includes all about knn, its adv and disadv

KNN presentation.pdf
KNN presentation.pdfKNN presentation.pdf
KNN presentation.pdf
AbhilashChauhan14
 
Knn
KnnKnn
Knn
KnnKnn
Classification_Algorithms_Student_Data_Presentation
Classification_Algorithms_Student_Data_PresentationClassification_Algorithms_Student_Data_Presentation
Classification_Algorithms_Student_Data_Presentation
Madeleine Organ
 
07 learning
07 learning07 learning
07 learning
ankit_ppt
 
Clasification approaches
Clasification approachesClasification approaches
Clasification approaches
gscprasad1111
 
knn-1.pptx
knn-1.pptxknn-1.pptx
knn-1.pptx
MohammedSahil63
 
Knn
KnnKnn
DBSCAN (2014_11_25 06_21_12 UTC)
DBSCAN (2014_11_25 06_21_12 UTC)DBSCAN (2014_11_25 06_21_12 UTC)
DBSCAN (2014_11_25 06_21_12 UTC)
Cory Cook
 
Measures of Dispersion.pptx
Measures of Dispersion.pptxMeasures of Dispersion.pptx
Measures of Dispersion.pptx
Melba Shaya Sweety
 
K Nearest neighbour
K Nearest neighbourK Nearest neighbour
K Nearest neighbour
Learnbay Datascience
 
Knn
KnnKnn
TunUp final presentation
TunUp final presentationTunUp final presentation
TunUp final presentation
Gianmario Spacagna
 
Fa18_P2.pptx
Fa18_P2.pptxFa18_P2.pptx
Fa18_P2.pptx
Md Abul Hayat
 
Part 2
Part 2Part 2
Variability
VariabilityVariability
k-Nearest Neighbors with brief explanation.pptx
k-Nearest Neighbors with brief explanation.pptxk-Nearest Neighbors with brief explanation.pptx
k-Nearest Neighbors with brief explanation.pptx
gamingzonedead880
 
introduction to machine learning 3c-feature-extraction.pptx
introduction to machine learning 3c-feature-extraction.pptxintroduction to machine learning 3c-feature-extraction.pptx
introduction to machine learning 3c-feature-extraction.pptx
Pratik Gohel
 
Data Mining Lecture_5.pptx
Data Mining Lecture_5.pptxData Mining Lecture_5.pptx
Data Mining Lecture_5.pptx
Subrata Kumer Paul
 
similarities-knn.pptx
similarities-knn.pptxsimilarities-knn.pptx
similarities-knn.pptx
ZhammerEsrael2
 

Similar to knn is the k nearest algorithm ppt that includes all about knn, its adv and disadv (20)

KNN presentation.pdf
KNN presentation.pdfKNN presentation.pdf
KNN presentation.pdf
 
Knn
KnnKnn
Knn
 
Knn
KnnKnn
Knn
 
Classification_Algorithms_Student_Data_Presentation
Classification_Algorithms_Student_Data_PresentationClassification_Algorithms_Student_Data_Presentation
Classification_Algorithms_Student_Data_Presentation
 
07 learning
07 learning07 learning
07 learning
 
Clasification approaches
Clasification approachesClasification approaches
Clasification approaches
 
knn-1.pptx
knn-1.pptxknn-1.pptx
knn-1.pptx
 
Knn
KnnKnn
Knn
 
DBSCAN (2014_11_25 06_21_12 UTC)
DBSCAN (2014_11_25 06_21_12 UTC)DBSCAN (2014_11_25 06_21_12 UTC)
DBSCAN (2014_11_25 06_21_12 UTC)
 
Measures of Dispersion.pptx
Measures of Dispersion.pptxMeasures of Dispersion.pptx
Measures of Dispersion.pptx
 
K Nearest neighbour
K Nearest neighbourK Nearest neighbour
K Nearest neighbour
 
Knn
KnnKnn
Knn
 
TunUp final presentation
TunUp final presentationTunUp final presentation
TunUp final presentation
 
Fa18_P2.pptx
Fa18_P2.pptxFa18_P2.pptx
Fa18_P2.pptx
 
Part 2
Part 2Part 2
Part 2
 
Variability
VariabilityVariability
Variability
 
k-Nearest Neighbors with brief explanation.pptx
k-Nearest Neighbors with brief explanation.pptxk-Nearest Neighbors with brief explanation.pptx
k-Nearest Neighbors with brief explanation.pptx
 
introduction to machine learning 3c-feature-extraction.pptx
introduction to machine learning 3c-feature-extraction.pptxintroduction to machine learning 3c-feature-extraction.pptx
introduction to machine learning 3c-feature-extraction.pptx
 
Data Mining Lecture_5.pptx
Data Mining Lecture_5.pptxData Mining Lecture_5.pptx
Data Mining Lecture_5.pptx
 
similarities-knn.pptx
similarities-knn.pptxsimilarities-knn.pptx
similarities-knn.pptx
 

Recently uploaded

132/33KV substation case study Presentation
132/33KV substation case study Presentation132/33KV substation case study Presentation
132/33KV substation case study Presentation
kandramariana6
 
Digital Twins Computer Networking Paper Presentation.pptx
Digital Twins Computer Networking Paper Presentation.pptxDigital Twins Computer Networking Paper Presentation.pptx
Digital Twins Computer Networking Paper Presentation.pptx
aryanpankaj78
 
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
171ticu
 
VARIABLE FREQUENCY DRIVE. VFDs are widely used in industrial applications for...
VARIABLE FREQUENCY DRIVE. VFDs are widely used in industrial applications for...VARIABLE FREQUENCY DRIVE. VFDs are widely used in industrial applications for...
VARIABLE FREQUENCY DRIVE. VFDs are widely used in industrial applications for...
PIMR BHOPAL
 
Embedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoringEmbedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoring
IJECEIAES
 
Software Engineering and Project Management - Software Testing + Agile Method...
Software Engineering and Project Management - Software Testing + Agile Method...Software Engineering and Project Management - Software Testing + Agile Method...
Software Engineering and Project Management - Software Testing + Agile Method...
Prakhyath Rai
 
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
nedcocy
 
一比一原版(uofo毕业证书)美国俄勒冈大学毕业证如何办理
一比一原版(uofo毕业证书)美国俄勒冈大学毕业证如何办理一比一原版(uofo毕业证书)美国俄勒冈大学毕业证如何办理
一比一原版(uofo毕业证书)美国俄勒冈大学毕业证如何办理
upoux
 
Applications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdfApplications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdf
Atif Razi
 
AI for Legal Research with applications, tools
AI for Legal Research with applications, toolsAI for Legal Research with applications, tools
AI for Legal Research with applications, tools
mahaffeycheryld
 
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Sinan KOZAK
 
Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...
bijceesjournal
 
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELDEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
ijaia
 
Engineering Standards Wiring methods.pdf
Engineering Standards Wiring methods.pdfEngineering Standards Wiring methods.pdf
Engineering Standards Wiring methods.pdf
edwin408357
 
Properties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptxProperties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptx
MDSABBIROJJAMANPAYEL
 
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
shadow0702a
 
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
Yasser Mahgoub
 
Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...
IJECEIAES
 
Software Engineering and Project Management - Introduction, Modeling Concepts...
Software Engineering and Project Management - Introduction, Modeling Concepts...Software Engineering and Project Management - Introduction, Modeling Concepts...
Software Engineering and Project Management - Introduction, Modeling Concepts...
Prakhyath Rai
 
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
upoux
 

Recently uploaded (20)

132/33KV substation case study Presentation
132/33KV substation case study Presentation132/33KV substation case study Presentation
132/33KV substation case study Presentation
 
Digital Twins Computer Networking Paper Presentation.pptx
Digital Twins Computer Networking Paper Presentation.pptxDigital Twins Computer Networking Paper Presentation.pptx
Digital Twins Computer Networking Paper Presentation.pptx
 
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
 
VARIABLE FREQUENCY DRIVE. VFDs are widely used in industrial applications for...
VARIABLE FREQUENCY DRIVE. VFDs are widely used in industrial applications for...VARIABLE FREQUENCY DRIVE. VFDs are widely used in industrial applications for...
VARIABLE FREQUENCY DRIVE. VFDs are widely used in industrial applications for...
 
Embedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoringEmbedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoring
 
Software Engineering and Project Management - Software Testing + Agile Method...
Software Engineering and Project Management - Software Testing + Agile Method...Software Engineering and Project Management - Software Testing + Agile Method...
Software Engineering and Project Management - Software Testing + Agile Method...
 
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
 
一比一原版(uofo毕业证书)美国俄勒冈大学毕业证如何办理
一比一原版(uofo毕业证书)美国俄勒冈大学毕业证如何办理一比一原版(uofo毕业证书)美国俄勒冈大学毕业证如何办理
一比一原版(uofo毕业证书)美国俄勒冈大学毕业证如何办理
 
Applications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdfApplications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdf
 
AI for Legal Research with applications, tools
AI for Legal Research with applications, toolsAI for Legal Research with applications, tools
AI for Legal Research with applications, tools
 
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
 
Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...
 
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELDEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
 
Engineering Standards Wiring methods.pdf
Engineering Standards Wiring methods.pdfEngineering Standards Wiring methods.pdf
Engineering Standards Wiring methods.pdf
 
Properties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptxProperties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptx
 
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
 
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
 
Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...
 
Software Engineering and Project Management - Introduction, Modeling Concepts...
Software Engineering and Project Management - Introduction, Modeling Concepts...Software Engineering and Project Management - Introduction, Modeling Concepts...
Software Engineering and Project Management - Introduction, Modeling Concepts...
 
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
 

knn is the k nearest algorithm ppt that includes all about knn, its adv and disadv

  • 1. Jaskaran Singh and Vansh Algorithms: K Nearest Neighbors 1
  • 2. Simple Analogy.. • Tell me about your friends(who your neighbors are) and I will tell you who you are. 2
  • 3. Instance-based Learning Its very similar to a Desktop!! 3
  • 4. KNN – Different names • K-Nearest Neighbors • Memory-Based Reasoning • Example-Based Reasoning • Instance-Based Learning • Lazy Learning 4
  • 5. What is KNN? • A powerful classification algorithm used in pattern recognition. • K nearest neighbors stores all available cases and classifies new cases based on a similarity measure(e.g distance function) • One of the top data mining algorithms used today. • A non-parametric lazy learning algorithm (An Instance- based Learning method). 5
  • 6. KNN: Classification Approach • An object (a new instance) is classified by a majority votes for its neighbor classes. • The object is assigned to the most common class amongst its K nearest neighbors.(measured by a distant function ) 6
  • 7. 7
  • 9. Distance measure for Continuous Variables 9
  • 10. Distance Between Neighbors • Calculate the distance between new example (E) and all examples in the training set. • Euclidean distance between two examples. – X = [x1,x2,x3,..,xn] – Y = [y1,y2,y3,...,yn] – The Euclidean distance between X and Y is defined as: 10     n i i i y x Y X D 1 2 ) ( ) , (
  • 11. K-Nearest Neighbor Algorithm • All the instances correspond to points in an n-dimensional feature space. • Each instance is represented with a set of numerical attributes. • Each of the training data consists of a set of vectors and a class label associated with each vector. • Classification is done by comparing feature vectors of different K nearest points. • Select the K-nearest examples to E in the training set. • Assign E to the most common class among its K-nearest neighbors. 11
  • 12. 3-KNN: Example(1) Distance from John sqrt [(35-37)2+(35-50)2 +(3- 2)2]=15.16 sqrt [(22-37)2+(50-50)2 +(2- 2)2]=15 sqrt [(63-37)2+(200-50)2 +(1- 2)2]=152.23 sqrt [(59-37)2+(170-50)2 +(1- 2)2]=122 sqrt [(25-37)2+(40-50)2 +(4- 2)2]=15.74 12 Customer Age Income No. credit cards Class George 35 35K 3 No Rachel 22 50K 2 Yes Steve 63 200K 1 No Tom 59 170K 1 No Anne 25 40K 4 Yes John 37 50K 2 ? YES
  • 13. How to choose K? • If K is too small it is sensitive to noise points. • Larger K works well. But too large K may include majority points from other classes. • Rule of thumb is K < sqrt(n), n is number of examples. 13 X
  • 14. 14
  • 15. X X X (a) 1-nearest neighbor (b) 2-nearest neighbor (c) 3-nearest neighbor K-nearest neighbors of a record x are data points that have the k smallest distance to x 15
  • 16. KNN Feature Weighting • Scale each feature by its importance for classification • Can use our prior knowledge about which features are more important • Can learn the weights wk using cross‐validation (to be covered later) 16
  • 17. Feature Normalization • Distance between neighbors could be dominated by some attributes with relatively large numbers.  e.g., income of customers in our previous example. • Arises when two features are in different scales. • Important to normalize those features. – Mapping values to numbers between 0 – 1. 17
  • 18. Nominal/Categorical Data • Distance works naturally with numerical attributes. • Binary value categorical data attributes can be regarded as 1 or 0. 18
  • 19. KNN Classification $0 $50,000 $100,000 $150,000 $200,000 $250,000 0 10 20 30 40 50 60 70 Non-Default Default Age Loan$ 19
  • 20. KNN Classification – Distance Age Loan Default Distance 25 $40,000 N 102000 35 $60,000 N 82000 45 $80,000 N 62000 20 $20,000 N 122000 35 $120,000 N 22000 52 $18,000 N 124000 23 $95,000 Y 47000 40 $62,000 Y 80000 60 $100,000 Y 42000 48 $220,000 Y 78000 33 $150,000 Y 8000 48 $142,000 ? 2 2 1 2 2 1 ) ( ) ( y y x x D     20
  • 21. KNN Classification – Standardized Distance Age Loan Default Distance 0.125 0.11 N 0.7652 0.375 0.21 N 0.5200 0.625 0.31 N 0.3160 0 0.01 N 0.9245 0.375 0.50 N 0.3428 0.8 0.00 N 0.6220 0.075 0.38 Y 0.6669 0.5 0.22 Y 0.4437 1 0.41 Y 0.3650 0.7 1.00 Y 0.3861 0.325 0.65 Y 0.3771 0.7 0.61 ? Min Max Min X Xs    21
  • 22. Strengths of KNN • Very simple and intuitive. • Can be applied to the data from any distribution. • Good classification if the number of samples is large enough. 22 Weaknesses of KNN • Takes more time to classify a new example. • need to calculate and compare distance from new example to all other examples. • Choosing k may be tricky. • Need large number of samples for accuracy.

Editor's Notes

  1. Based on similarity function “tell me who your neighbors are, and I’ll tell you who you are” Simplest Algorithms
  2. KNN stattistical estimation Pattern recognition in the beginning of 1970
  3. Algorithm
  4. Weight and hight of people and we assume Red : female Blue: Male So we have new measurement of hight and weight … K=1,female ,we assigned class =female K=5 3 female, 2 males K=odd K=6 we have ties
  5. Although there are other possible choices, most instance-based learners use Euclid- ean distance.
  6. Although there are other possible choices, most instance-based learners use Euclid- ean distance. Other distance metrics may be more appropriate in special circumstances. If we have too many points Sum the squared differences and we get the square roots In Manhanttan distances : get absoulte values Min-ko-waski distance= Different distance measure .. We use the common one
  7. N-feature size
  8. This single number K is given prior to the testing phase and this number decides how many neighbors influence the classification.
  9. If the sample set is finite.. Historically the optomal K for mpst datasets has been between 3-10 . That produces much better results than 1NN
  10. Some configurations over data before applying KNN Different attributes are often measured on different scales, Numerical: Here the difference between two values is just the numerical difference between them, and it is this difference that is squared and added to yield the distance functio Nominal: the difference between two values that are not the same is often taken to be 1, whereas if the values are the same the difference is 0. No scaling is required in this case because only the values 0 and 1 are used.
  11. For attributes with more than two values, binary indicator variables are used as an implicit solution. Ex: In a customers’ Bank field with 3 values, Bank1 – 100 Bank2 – 010 Bank3 - 001
  12. That’s price to pay
  13. Robust: strong and healthy; vigorou