K-NEAREST NEIGHBOR ALGORITHM
Presented by Hien Nguyen
WHY DO WE CARE?
Amazon Prime Movie
Adaptive Text Retrieval
Spam Email filtering
Online course recommendation
WHAT IS K-NN ALGORITHM?
KNN is a non-parametric, lazy learning algorithm: it builds no model in advance, stores all available cases, and classifies new cases based on a similarity measure.
[Figure: diagram labelled Features, Comparator, Library, Recommendation]
K-NN CLASSIFICATION
Sir/Madam occurrences   Word count   Class
10                      100          Spam
5                       200          Good
25                      100          Spam
30                      400          Spam
20                      300          Spam
0                       400          Good
20                      500          Good
30                      600          Spam
10                      600          Good
15                      700          Spam
8                       700          Spam
2                       700          Good
K-NN CLASSIFICATION
Euclidean distance between two points (x_1, y_1) and (x_2, y_2):
\[ D = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2} \]
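K-NN CLASSIFICATION
Sir/Madam occurrences   Word count   Class   Distance
10                      100          Spam    100
5                       200          Good    5
25                      100          Spam    101.11
30                      400          Spam    200.99
20                      300          Spam    100.49
0                       400          Good    200.24
20                      500          Good    300.16
30                      600          Spam    400.49
10                      600          Good    400
15                      700          Spam    500.025
8                       700          Spam    500.004
Distance is the Euclidean distance from each email to the new email M. M's coordinates are not listed here; the column is consistent with M = (10, 200).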
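As a rough check of the Distance column, the sketch below recomputes it in Python. The query point `M = (10, 200)` is an assumption: it does not appear in the text and was inferred because it reproduces the listed distances. The names `EMAILS`, `M` and `euclidean` are illustrative only.

```python
import math

# Training emails from the slide: (Sir/Madam occurrences, word count, class).
EMAILS = [
    (10, 100, "Spam"), (5, 200, "Good"), (25, 100, "Spam"),
    (30, 400, "Spam"), (20, 300, "Spam"), (0, 400, "Good"),
    (20, 500, "Good"), (30, 600, "Spam"), (10, 600, "Good"),
    (15, 700, "Spam"), (8, 700, "Spam"),
]

# Assumption: the new email M is not given in the text; (10, 200) is
# inferred because it reproduces the Distance column.
M = (10, 200)


def euclidean(p, q):
    """D = sqrt((x1 - x2)^2 + (y1 - y2)^2) for two 2-D points."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)


# One distance per training email, in table order.
distances = [euclidean((x, y), M) for x, y, _ in EMAILS]

for (x, y, label), d in zip(EMAILS, distances):
    print(f"{x:>3} {y:>4} {label:<5} {d:8.3f}")
```

The printed values agree with the slide up to rounding; the slide appears to cut most values off after two decimals (for example 200.9975 is shown as 200.99).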
K-NN CLASSIFICATION
(Same distance table as above.)
K=1 => M: Good email
The single nearest neighbour is (5, 200), class Good, at distance 5.
K-NN CLASSIFICATION
(Same distance table as above.)
K=3 => M: Spam
The three nearest neighbours are (5, 200) Good at 5, (10, 100) Spam at 100 and (20, 300) Spam at 100.49, so Spam wins 2 votes to 1.
K-NN CLASSIFICATION
(Same distance table as above.)
K=5 => M: Spam
The five nearest neighbours add (25, 100) Spam at 101.11 and (0, 400) Good at 200.24, so Spam wins 3 votes to 2.
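The K=1, K=3 and K=5 results can be reproduced with a short majority-vote sketch. As before, `M = (10, 200)` is an assumption inferred from the Distance column, and `knn_classify` is a hypothetical helper name, not something from the slides.

```python
import math
from collections import Counter

# (Sir/Madam occurrences, word count, class) rows from the distance table.
EMAILS = [
    (10, 100, "Spam"), (5, 200, "Good"), (25, 100, "Spam"),
    (30, 400, "Spam"), (20, 300, "Spam"), (0, 400, "Good"),
    (20, 500, "Good"), (30, 600, "Spam"), (10, 600, "Good"),
    (15, 700, "Spam"), (8, 700, "Spam"),
]

M = (10, 200)  # assumption: inferred query point, not stated in the text


def knn_classify(train, query, k):
    """Return the majority class among the k training rows nearest to query."""
    # Sort rows by Euclidean distance of their (x, y) part to the query.
    nearest = sorted(train, key=lambda row: math.dist(row[:2], query))[:k]
    votes = Counter(label for _, _, label in nearest)
    return votes.most_common(1)[0][0]


for k in (1, 3, 5):
    print(f"K={k} => M: {knn_classify(EMAILS, M, k)}")
```

Under the assumed M this prints Good for K=1 and Spam for K=3 and K=5, matching the slides. Note that nothing is trained: the function only sorts the stored cases at query time, which is what "lazy learning" refers to.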
KNN – VOTING
Majority voting:
• All votes are equal. For each class, count how many of the k neighbours belong to it, then return the class with the most votes.
Inverse distance-weighted voting:
• Closer neighbours get larger votes. While there are better-motivated methods, the simplest version takes a neighbour's vote to be the inverse of its distance d to the new instance:
\[ \text{vote} = \frac{1}{d} \]
• Then sum the votes for each class and return the class with the highest total.
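A minimal sketch of inverse distance-weighted voting on the spam example follows. It assumes the same inferred query point `M = (10, 200)`; `weighted_knn` is a hypothetical name, and returning the class of an exact match when the distance is zero is a design choice made here, not something the slides specify.

```python
import math
from collections import defaultdict

# (Sir/Madam occurrences, word count, class) rows from the distance table.
EMAILS = [
    (10, 100, "Spam"), (5, 200, "Good"), (25, 100, "Spam"),
    (30, 400, "Spam"), (20, 300, "Spam"), (0, 400, "Good"),
    (20, 500, "Good"), (30, 600, "Spam"), (10, 600, "Good"),
    (15, 700, "Spam"), (8, 700, "Spam"),
]

M = (10, 200)  # assumption: inferred query point, not stated in the text


def weighted_knn(train, query, k):
    """Sum 1/distance per class over the k nearest rows.

    Returns (winning class, per-class vote totals).
    """
    scored = sorted((math.dist(row[:2], query), row[2]) for row in train)[:k]
    totals = defaultdict(float)
    for d, label in scored:
        if d == 0:
            # Exact match: 1/d is undefined, so take that row's class directly.
            return label, {label: math.inf}
        totals[label] += 1.0 / d
    return max(totals, key=totals.get), dict(totals)


for k in (3, 5):
    winner, totals = weighted_knn(EMAILS, M, k)
    print(k, winner, {c: round(v, 4) for c, v in totals.items()})
```

Under the same assumption, the weighted vote returns Good for both K=3 and K=5: the Good neighbour at distance 5 contributes 0.2, while the Spam neighbours at distances of about 100 contribute roughly 0.01 each. This differs from the majority vote, which returns Spam for those values of K.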
SUMMARY
KNN is conceptually simple, yet able to solve complex problems
Can work with relatively little information
Learning is simple (no learning at all!)
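Memory and CPU cost: every case is stored and compared against at query time
Feature selection problem: irrelevant features distort the distance
Sensitive to representation: results depend on how features are encoded and scaled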
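The sensitivity to representation can be seen on the spam data, where word counts (hundreds) dominate the Sir/Madam counts (tens) in the raw distance. The sketch below compares two training emails before and after min-max scaling. Min-max scaling is one common choice introduced here for illustration and is not part of the slides; `M = (10, 200)` is again the inferred query point, and `min_max_scaler` is a hypothetical helper.

```python
import math

# (Sir/Madam occurrences, word count, class) rows from the distance table.
EMAILS = [
    (10, 100, "Spam"), (5, 200, "Good"), (25, 100, "Spam"),
    (30, 400, "Spam"), (20, 300, "Spam"), (0, 400, "Good"),
    (20, 500, "Good"), (30, 600, "Spam"), (10, 600, "Good"),
    (15, 700, "Spam"), (8, 700, "Spam"),
]

M = (10, 200)  # assumption: inferred query point, not stated in the text


def min_max_scaler(rows):
    """Build a function mapping (x, y) into [0, 1] x [0, 1] using the rows' ranges."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)
    return lambda p: ((p[0] - x_lo) / (x_hi - x_lo),
                      (p[1] - y_lo) / (y_hi - y_lo))


scale = min_max_scaler(EMAILS)
a, b = (25, 100), (0, 400)  # two training emails to compare

# Raw features: a is closer to M than b (about 101.1 versus 200.2).
raw_a, raw_b = math.dist(a, M), math.dist(b, M)

# Scaled features: the order flips (about 0.527 versus 0.471).
scaled_a = math.dist(scale(a), scale(M))
scaled_b = math.dist(scale(b), scale(M))

print(raw_a < raw_b, scaled_a < scaled_b)
```

The same two emails swap places in the neighbour ranking depending only on how the features are scaled, so the choice of representation can change which neighbours vote.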
WWW.ISMARTSOFT.COM
PRACTICE
        Creature A   Creature B
3-NN    ?            ?
5-NN    ?            ?
ACKNOWLEDGEMENT
The two examples (spam email and game) are from MIT Open Course midterm and final exam pages.


Editor's Notes

  1. Nearest-neighbor methods have been used in statistical estimation and pattern recognition since the early 1970s (non-parametric techniques). Dynamic Memory: A Theory of Reminding and Learning in Computers and People (Schank, 1982): people reason by remembering and learn by doing. Thinking is reminding and making analogies.