SlideShare a Scribd company logo
1 of 15
Supervised Learning
and k Nearest
Neighbors
Business Intelligence for Managers
Supervised learning
and classification
 Given: dataset of instances with
known categories
 Goal: using the “knowledge” in the
dataset, classify a given instance
 predict the category of the given
instance that is rationally consistent
with the dataset
Classifiers
Classifier
…
feature
values
category
X1
X2
X3
Xn
Y
DB
collection of instances
with known categories
Algorithms
 K Nearest Neighbors (kNN)
 Naïve-Bayes
 Decision trees
 Many others (support vector
machines, neural networks, genetic
algorithms, etc)
K - Nearest Neighbors
 For a given instance T, get the top k
dataset instances that are “nearest” to
T
 Select a reasonable distance measure
 Inspect the category of these k
instances, choose the category C that
represent the most instances
 Conclude that T belongs to category C
Example 1
 Determining decision on scholarship
application based on the following features:
 Household income (annual income in
millions of pesos)
 Number of siblings in family
 High school grade (on a QPI scale of 1.0 –
4.0)
 Intuition (reflected on data set): award
scholarships to high-performers and to
those with financial need
Distance formula
 Euclidian distance: squareroot of sum
of squares of differences
for two features: (x)2 + (y)2
 Intuition: similar samples should be
close to each other
 May not always apply
(example: quota and actual sales)
Incomparable ranges
 The Euclidian distance formula has
the implicit assumption that the
different dimensions are comparable
 Features that span wider ranges
affect the distance value more than
features with limited ranges
Example revisited
 Suppose household income was
instead indicated in thousands of
pesos per month and that grades are
given on a 70-100 scale
 Note different results produced by
kNN algorithm on the same dataset
Non-numeric data
 Feature values are not always
numbers
 Example
 Boolean values: Yes or no, presence
or absence of an attribute
 Categories: Colors, educational
attainment, gender
 How do these values factor into the
computation of distance?
Dealing with non-numeric
data
 Boolean values => convert to 0 or 1
 Applies to yes-no/presence-absence
attributes
 Non-binary characterizations
 Use natural progression when applicable;
e.g., educational attainment: GS, HS,
College, MS, PHD => 1,2,3,4,5
 Assign arbitrary numbers but be careful
about distances; e.g., color: red, yellow, blue
=> 1,2,3
 How about unavailable data?
(0 value not always the answer)
Preprocessing your dataset
 Dataset may need to be preprocessed
to ensure more reliable data mining
results
 Conversion of non-numeric data to
numeric data
 Calibration of numeric data to reduce
effects of disparate ranges
 Particularly when using the Euclidean
distance metric
k-NN variations
 Value of k
 Larger k increases confidence in prediction
 Note that if k is too large, decision may be
skewed
 Weighted evaluation of nearest neighbors
 Plain majority may unfairly skew decision
 Revise algorithm so that closer neighbors
have greater “vote weight”
 Other distance measures
Other distance measures
 City-block distance (Manhattan dist)
 Add absolute value of differences
 Cosine similarity
 Measure angle formed by the two samples
(with the origin)
 Jaccard distance
 Determine percentage of exact matches
between the samples (not including
unavailable data)
 Others
k-NN Time Complexity
 Suppose there are m instances and n
features in the dataset
 Nearest neighbor algorithm requires
computing m distances
 Each distance computation involves
scanning through each feature value
 Running time complexity is
proportional to m X n

More Related Content

Similar to K nearest neighbors s machine learning method

Datamining intro-iep
Datamining intro-iepDatamining intro-iep
Datamining intro-iepaaryarun1999
 
similarities-knn-1.ppt
similarities-knn-1.pptsimilarities-knn-1.ppt
similarities-knn-1.pptsatvikpatil5
 
Textmining Predictive Models
Textmining Predictive ModelsTextmining Predictive Models
Textmining Predictive ModelsDatamining Tools
 
Textmining Predictive Models
Textmining Predictive ModelsTextmining Predictive Models
Textmining Predictive Modelsguest0edcaf
 
Search Engines
Search EnginesSearch Engines
Search Enginesbutest
 
SVM - Functional Verification
SVM - Functional VerificationSVM - Functional Verification
SVM - Functional VerificationSai Kiran Kadam
 
5. Machine Learning.pptx
5.  Machine Learning.pptx5.  Machine Learning.pptx
5. Machine Learning.pptxssuser6654de1
 
Datascience Introduction WebSci Summer School 2014
Datascience Introduction WebSci Summer School 2014Datascience Introduction WebSci Summer School 2014
Datascience Introduction WebSci Summer School 2014Claudia Wagner
 
Quant Data Analysis
Quant Data AnalysisQuant Data Analysis
Quant Data AnalysisSaad Chahine
 
Know Your Data in data mining applications
Know Your Data in data mining applicationsKnow Your Data in data mining applications
Know Your Data in data mining applicationsMaleehaSheikh2
 
Data mining classifiers.
Data mining classifiers.Data mining classifiers.
Data mining classifiers.ShwetaPatil174
 
Supervised and unsupervised learning
Supervised and unsupervised learningSupervised and unsupervised learning
Supervised and unsupervised learningAmAn Singh
 
Advanced statistics for librarians
Advanced statistics for librariansAdvanced statistics for librarians
Advanced statistics for librariansJohn McDonald
 

Similar to K nearest neighbors s machine learning method (20)

Datamining intro-iep
Datamining intro-iepDatamining intro-iep
Datamining intro-iep
 
similarities-knn-1.ppt
similarities-knn-1.pptsimilarities-knn-1.ppt
similarities-knn-1.ppt
 
Unit-2.ppt
Unit-2.pptUnit-2.ppt
Unit-2.ppt
 
Clique and sting
Clique and stingClique and sting
Clique and sting
 
DataPreProcessing
DataPreProcessing DataPreProcessing
DataPreProcessing
 
Preprocessing
PreprocessingPreprocessing
Preprocessing
 
Data PreProcessing
Data PreProcessingData PreProcessing
Data PreProcessing
 
Textmining Predictive Models
Textmining Predictive ModelsTextmining Predictive Models
Textmining Predictive Models
 
Textmining Predictive Models
Textmining Predictive ModelsTextmining Predictive Models
Textmining Predictive Models
 
Textmining Predictive Models
Textmining Predictive ModelsTextmining Predictive Models
Textmining Predictive Models
 
Search Engines
Search EnginesSearch Engines
Search Engines
 
SVM - Functional Verification
SVM - Functional VerificationSVM - Functional Verification
SVM - Functional Verification
 
5. Machine Learning.pptx
5.  Machine Learning.pptx5.  Machine Learning.pptx
5. Machine Learning.pptx
 
Datascience Introduction WebSci Summer School 2014
Datascience Introduction WebSci Summer School 2014Datascience Introduction WebSci Summer School 2014
Datascience Introduction WebSci Summer School 2014
 
Quant Data Analysis
Quant Data AnalysisQuant Data Analysis
Quant Data Analysis
 
Know Your Data in data mining applications
Know Your Data in data mining applicationsKnow Your Data in data mining applications
Know Your Data in data mining applications
 
Data mining classifiers.
Data mining classifiers.Data mining classifiers.
Data mining classifiers.
 
Supervised and unsupervised learning
Supervised and unsupervised learningSupervised and unsupervised learning
Supervised and unsupervised learning
 
K - Nearest neighbor ( KNN )
K - Nearest neighbor  ( KNN )K - Nearest neighbor  ( KNN )
K - Nearest neighbor ( KNN )
 
Advanced statistics for librarians
Advanced statistics for librariansAdvanced statistics for librarians
Advanced statistics for librarians
 

Recently uploaded

8th International Conference on Soft Computing, Mathematics and Control (SMC ...
8th International Conference on Soft Computing, Mathematics and Control (SMC ...8th International Conference on Soft Computing, Mathematics and Control (SMC ...
8th International Conference on Soft Computing, Mathematics and Control (SMC ...josephjonse
 
Intro to Design (for Engineers) at Sydney Uni
Intro to Design (for Engineers) at Sydney UniIntro to Design (for Engineers) at Sydney Uni
Intro to Design (for Engineers) at Sydney UniR. Sosa
 
Diploma Engineering Drawing Qp-2024 Ece .pdf
Diploma Engineering Drawing Qp-2024 Ece .pdfDiploma Engineering Drawing Qp-2024 Ece .pdf
Diploma Engineering Drawing Qp-2024 Ece .pdfJNTUA
 
15-Minute City: A Completely New Horizon
15-Minute City: A Completely New Horizon15-Minute City: A Completely New Horizon
15-Minute City: A Completely New HorizonMorshed Ahmed Rahath
 
Software Engineering Practical File Front Pages.pdf
Software Engineering Practical File Front Pages.pdfSoftware Engineering Practical File Front Pages.pdf
Software Engineering Practical File Front Pages.pdfssuser5c9d4b1
 
SLIDESHARE PPT-DECISION MAKING METHODS.pptx
SLIDESHARE PPT-DECISION MAKING METHODS.pptxSLIDESHARE PPT-DECISION MAKING METHODS.pptx
SLIDESHARE PPT-DECISION MAKING METHODS.pptxCHAIRMAN M
 
21scheme vtu syllabus of visveraya technological university
21scheme vtu syllabus of visveraya technological university21scheme vtu syllabus of visveraya technological university
21scheme vtu syllabus of visveraya technological universityMohd Saifudeen
 
NEWLETTER FRANCE HELICES/ SDS SURFACE DRIVES - MAY 2024
NEWLETTER FRANCE HELICES/ SDS SURFACE DRIVES - MAY 2024NEWLETTER FRANCE HELICES/ SDS SURFACE DRIVES - MAY 2024
NEWLETTER FRANCE HELICES/ SDS SURFACE DRIVES - MAY 2024EMMANUELLEFRANCEHELI
 
engineering chemistry power point presentation
engineering chemistry  power point presentationengineering chemistry  power point presentation
engineering chemistry power point presentationsj9399037128
 
Working Principle of Echo Sounder and Doppler Effect.pdf
Working Principle of Echo Sounder and Doppler Effect.pdfWorking Principle of Echo Sounder and Doppler Effect.pdf
Working Principle of Echo Sounder and Doppler Effect.pdfSkNahidulIslamShrabo
 
analog-vs-digital-communication (concept of analog and digital).pptx
analog-vs-digital-communication (concept of analog and digital).pptxanalog-vs-digital-communication (concept of analog and digital).pptx
analog-vs-digital-communication (concept of analog and digital).pptxKarpagam Institute of Teechnology
 
Basics of Relay for Engineering Students
Basics of Relay for Engineering StudentsBasics of Relay for Engineering Students
Basics of Relay for Engineering Studentskannan348865
 
Fuzzy logic method-based stress detector with blood pressure and body tempera...
Fuzzy logic method-based stress detector with blood pressure and body tempera...Fuzzy logic method-based stress detector with blood pressure and body tempera...
Fuzzy logic method-based stress detector with blood pressure and body tempera...IJECEIAES
 
handbook on reinforce concrete and detailing
handbook on reinforce concrete and detailinghandbook on reinforce concrete and detailing
handbook on reinforce concrete and detailingAshishSingh1301
 
The Entity-Relationship Model(ER Diagram).pptx
The Entity-Relationship Model(ER Diagram).pptxThe Entity-Relationship Model(ER Diagram).pptx
The Entity-Relationship Model(ER Diagram).pptxMANASINANDKISHORDEOR
 
21P35A0312 Internship eccccccReport.docx
21P35A0312 Internship eccccccReport.docx21P35A0312 Internship eccccccReport.docx
21P35A0312 Internship eccccccReport.docxrahulmanepalli02
 
electrical installation and maintenance.
electrical installation and maintenance.electrical installation and maintenance.
electrical installation and maintenance.benjamincojr
 
Circuit Breakers for Engineering Students
Circuit Breakers for Engineering StudentsCircuit Breakers for Engineering Students
Circuit Breakers for Engineering Studentskannan348865
 
5G and 6G refer to generations of mobile network technology, each representin...
5G and 6G refer to generations of mobile network technology, each representin...5G and 6G refer to generations of mobile network technology, each representin...
5G and 6G refer to generations of mobile network technology, each representin...archanaece3
 
Adsorption (mass transfer operations 2) ppt
Adsorption (mass transfer operations 2) pptAdsorption (mass transfer operations 2) ppt
Adsorption (mass transfer operations 2) pptjigup7320
 

Recently uploaded (20)

8th International Conference on Soft Computing, Mathematics and Control (SMC ...
8th International Conference on Soft Computing, Mathematics and Control (SMC ...8th International Conference on Soft Computing, Mathematics and Control (SMC ...
8th International Conference on Soft Computing, Mathematics and Control (SMC ...
 
Intro to Design (for Engineers) at Sydney Uni
Intro to Design (for Engineers) at Sydney UniIntro to Design (for Engineers) at Sydney Uni
Intro to Design (for Engineers) at Sydney Uni
 
Diploma Engineering Drawing Qp-2024 Ece .pdf
Diploma Engineering Drawing Qp-2024 Ece .pdfDiploma Engineering Drawing Qp-2024 Ece .pdf
Diploma Engineering Drawing Qp-2024 Ece .pdf
 
15-Minute City: A Completely New Horizon
15-Minute City: A Completely New Horizon15-Minute City: A Completely New Horizon
15-Minute City: A Completely New Horizon
 
Software Engineering Practical File Front Pages.pdf
Software Engineering Practical File Front Pages.pdfSoftware Engineering Practical File Front Pages.pdf
Software Engineering Practical File Front Pages.pdf
 
SLIDESHARE PPT-DECISION MAKING METHODS.pptx
SLIDESHARE PPT-DECISION MAKING METHODS.pptxSLIDESHARE PPT-DECISION MAKING METHODS.pptx
SLIDESHARE PPT-DECISION MAKING METHODS.pptx
 
21scheme vtu syllabus of visveraya technological university
21scheme vtu syllabus of visveraya technological university21scheme vtu syllabus of visveraya technological university
21scheme vtu syllabus of visveraya technological university
 
NEWLETTER FRANCE HELICES/ SDS SURFACE DRIVES - MAY 2024
NEWLETTER FRANCE HELICES/ SDS SURFACE DRIVES - MAY 2024NEWLETTER FRANCE HELICES/ SDS SURFACE DRIVES - MAY 2024
NEWLETTER FRANCE HELICES/ SDS SURFACE DRIVES - MAY 2024
 
engineering chemistry power point presentation
engineering chemistry  power point presentationengineering chemistry  power point presentation
engineering chemistry power point presentation
 
Working Principle of Echo Sounder and Doppler Effect.pdf
Working Principle of Echo Sounder and Doppler Effect.pdfWorking Principle of Echo Sounder and Doppler Effect.pdf
Working Principle of Echo Sounder and Doppler Effect.pdf
 
analog-vs-digital-communication (concept of analog and digital).pptx
analog-vs-digital-communication (concept of analog and digital).pptxanalog-vs-digital-communication (concept of analog and digital).pptx
analog-vs-digital-communication (concept of analog and digital).pptx
 
Basics of Relay for Engineering Students
Basics of Relay for Engineering StudentsBasics of Relay for Engineering Students
Basics of Relay for Engineering Students
 
Fuzzy logic method-based stress detector with blood pressure and body tempera...
Fuzzy logic method-based stress detector with blood pressure and body tempera...Fuzzy logic method-based stress detector with blood pressure and body tempera...
Fuzzy logic method-based stress detector with blood pressure and body tempera...
 
handbook on reinforce concrete and detailing
handbook on reinforce concrete and detailinghandbook on reinforce concrete and detailing
handbook on reinforce concrete and detailing
 
The Entity-Relationship Model(ER Diagram).pptx
The Entity-Relationship Model(ER Diagram).pptxThe Entity-Relationship Model(ER Diagram).pptx
The Entity-Relationship Model(ER Diagram).pptx
 
21P35A0312 Internship eccccccReport.docx
21P35A0312 Internship eccccccReport.docx21P35A0312 Internship eccccccReport.docx
21P35A0312 Internship eccccccReport.docx
 
electrical installation and maintenance.
electrical installation and maintenance.electrical installation and maintenance.
electrical installation and maintenance.
 
Circuit Breakers for Engineering Students
Circuit Breakers for Engineering StudentsCircuit Breakers for Engineering Students
Circuit Breakers for Engineering Students
 
5G and 6G refer to generations of mobile network technology, each representin...
5G and 6G refer to generations of mobile network technology, each representin...5G and 6G refer to generations of mobile network technology, each representin...
5G and 6G refer to generations of mobile network technology, each representin...
 
Adsorption (mass transfer operations 2) ppt
Adsorption (mass transfer operations 2) pptAdsorption (mass transfer operations 2) ppt
Adsorption (mass transfer operations 2) ppt
 

K nearest neighbors s machine learning method

  • 1. Supervised Learning and k Nearest Neighbors Business Intelligence for Managers
  • 2. Supervised learning and classification  Given: dataset of instances with known categories  Goal: using the “knowledge” in the dataset, classify a given instance  predict the category of the given instance that is rationally consistent with the dataset
  • 4. Algorithms  K Nearest Neighbors (kNN)  Naïve-Bayes  Decision trees  Many others (support vector machines, neural networks, genetic algorithms, etc)
  • 5. K - Nearest Neighbors  For a given instance T, get the top k dataset instances that are “nearest” to T  Select a reasonable distance measure  Inspect the category of these k instances, choose the category C that represent the most instances  Conclude that T belongs to category C
  • 6. Example 1  Determining decision on scholarship application based on the following features:  Household income (annual income in millions of pesos)  Number of siblings in family  High school grade (on a QPI scale of 1.0 – 4.0)  Intuition (reflected on data set): award scholarships to high-performers and to those with financial need
  • 7. Distance formula  Euclidian distance: squareroot of sum of squares of differences for two features: (x)2 + (y)2  Intuition: similar samples should be close to each other  May not always apply (example: quota and actual sales)
  • 8. Incomparable ranges  The Euclidian distance formula has the implicit assumption that the different dimensions are comparable  Features that span wider ranges affect the distance value more than features with limited ranges
  • 9. Example revisited  Suppose household income was instead indicated in thousands of pesos per month and that grades are given on a 70-100 scale  Note different results produced by kNN algorithm on the same dataset
  • 10. Non-numeric data  Feature values are not always numbers  Example  Boolean values: Yes or no, presence or absence of an attribute  Categories: Colors, educational attainment, gender  How do these values factor into the computation of distance?
  • 11. Dealing with non-numeric data  Boolean values => convert to 0 or 1  Applies to yes-no/presence-absence attributes  Non-binary characterizations  Use natural progression when applicable; e.g., educational attainment: GS, HS, College, MS, PHD => 1,2,3,4,5  Assign arbitrary numbers but be careful about distances; e.g., color: red, yellow, blue => 1,2,3  How about unavailable data? (0 value not always the answer)
  • 12. Preprocessing your dataset  Dataset may need to be preprocessed to ensure more reliable data mining results  Conversion of non-numeric data to numeric data  Calibration of numeric data to reduce effects of disparate ranges  Particularly when using the Euclidean distance metric
  • 13. k-NN variations  Value of k  Larger k increases confidence in prediction  Note that if k is too large, decision may be skewed  Weighted evaluation of nearest neighbors  Plain majority may unfairly skew decision  Revise algorithm so that closer neighbors have greater “vote weight”  Other distance measures
  • 14. Other distance measures  City-block distance (Manhattan dist)  Add absolute value of differences  Cosine similarity  Measure angle formed by the two samples (with the origin)  Jaccard distance  Determine percentage of exact matches between the samples (not including unavailable data)  Others
  • 15. k-NN Time Complexity  Suppose there are m instances and n features in the dataset  Nearest neighbor algorithm requires computing m distances  Each distance computation involves scanning through each feature value  Running time complexity is proportional to m X n