K-Nearest Neighbor (KNN)
Algorithm for Machine Learning
PREPARED BY
SK NOUSHAD DDOJA
ROLL NO:- 28100119039
KNN
 K-Nearest Neighbors (KNN)
 Simple, but a very powerful classification algorithm
 Classifies based on a similarity measure
 Non-parametric
 Lazy learning
Does not “learn” until the test example is given
Whenever we have new data to classify, we find its K nearest neighbors from the training data
KNN: Classification Approach
 Classified by a “MAJORITY VOTE” of its neighbors’ classes
Assigned to the most common class amongst its K nearest neighbors (by measuring the “distance” between data points)
KNN: Example
KNN: Pseudocode
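The pseudocode figure from the original slide is not reproduced in this text export. Below is a minimal Python sketch of the same procedure, assuming NumPy; the names knn_predict, X_train, and y_train are illustrative, not from the slides. It computes the distance from the query to every training example, takes the K closest, and returns the majority class.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    """Classify x_query by a majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training example
    dists = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    # Indices of the k smallest distances
    nearest = np.argsort(dists)[:k]
    # Majority vote among the labels of those neighbors
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy usage
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [8.0, 9.0], [9.0, 8.5]])
y_train = np.array(["A", "A", "B", "B"])
print(knn_predict(X_train, y_train, np.array([1.1, 0.9]), k=3))  # -> "A"
```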
KNN: Example
KNN: Euclidean distance matrix
Decision Boundaries
 Voronoi diagram
Describes the areas that are nearest to any given point, given a set of data.
Each line segment is equidistant between two points of opposite classes.
Decision Boundaries
 With a large number of examples and possible noise in the labels, the decision boundary can become nasty!
This is the “overfitting” problem.
Effect of K
 A larger K produces a smoother boundary effect
 When K == N, it always predicts the majority class
Discussion
 Which model is better between K=1 and K=15?
 Why?
How to choose K?
 Empirically optimal K?
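One common way to answer “empirically optimal K?” is to cross-validate a range of candidate values and keep the one with the best validation accuracy. The sketch below uses scikit-learn and its Iris dataset purely as an illustration; the slides do not name a library or dataset.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Try K = 1, 3, 5, ..., 29 and score each with 5-fold cross-validation
scores = {}
for k in range(1, 31, 2):
    model = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(model, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```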
Pros and Cons
 Pros
Learning and implementation are extremely simple and intuitive
Flexible decision boundaries
 Cons
Irrelevant or correlated features have a high impact and must be eliminated
Typically difficult to handle high dimensionality
Computational cost: both memory and classification-time computation
Similarity and Dissimilarity
 Similarity
Numerical measure of how alike two data objects are.
Is higher when objects are more alike.
Often falls in the range [0, 1]
 Dissimilarity
Numerical measure of how different two data objects are
Lower when objects are more alike
Minimum dissimilarity is often 0
Upper limit varies
 Proximity refers to a similarity or dissimilarity
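The slides do not fix a particular conversion between dissimilarity and similarity. One common convention, assumed here for illustration only, maps a distance of 0 to similarity 1 and larger distances toward 0:

```python
def similarity_from_distance(dist):
    """Map a dissimilarity in [0, inf) to a similarity in (0, 1]."""
    return 1.0 / (1.0 + dist)

print(similarity_from_distance(0.0))  # 1.0  (identical objects)
print(similarity_from_distance(4.0))  # 0.2  (quite different objects)
```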
Euclidean Distance
 Euclidean Distance
\[ \mathrm{dist}(a, b) = \sqrt{\sum_{k=1}^{p} (a_k - b_k)^2} \]
where p is the number of dimensions (attributes) and $a_k$ and $b_k$ are, respectively, the k-th attributes (components) of data objects a and b.
 Standardization is necessary if scales differ.
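A minimal Python sketch of the formula above, assuming NumPy; the function name euclidean_distance is illustrative:

```python
import numpy as np

def euclidean_distance(a, b):
    """dist(a, b) = sqrt(sum_k (a_k - b_k)^2) over the p attributes."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.sqrt(np.sum((a - b) ** 2))

print(euclidean_distance([0, 0], [3, 4]))  # 5.0
```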
Euclidean Distance
Minkowski Distance
 Minkowski Distance is a generalization of Euclidean Distance:
\[ \mathrm{dist}(a, b) = \left( \sum_{k=1}^{p} |a_k - b_k|^r \right)^{1/r} \]
where r is a parameter, p is the number of dimensions (attributes), and $a_k$ and $b_k$ are, respectively, the k-th attributes (components) of data objects a and b.
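A matching sketch of the general formula, assuming NumPy; r = 1 and r = 2 recover the city-block and Euclidean distances discussed on the next slide:

```python
import numpy as np

def minkowski_distance(a, b, r):
    """dist(a, b) = (sum_k |a_k - b_k|^r)^(1/r)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.sum(np.abs(a - b) ** r) ** (1.0 / r)
```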
Minkowski Distance: Examples
 r = 1. City block (Manhattan, taxicab, L1 norm) distance.
A common example of this is the Hamming distance, which is just the number of bits that differ between two binary vectors
 r = 2. Euclidean distance
 r → ∞. “supremum” (L_max norm, L_∞ norm) distance.
This is the maximum difference between any component of the vectors
 Do not confuse r with p, i.e., all these distances are defined for all numbers of dimensions.
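A tiny worked check of the three cases on a made-up pair of vectors (the numbers are illustrative, not from the slides):

```python
import numpy as np

a = np.array([1.0, 7.0, 2.0])
b = np.array([4.0, 5.0, 6.0])
diff = np.abs(a - b)                # component-wise |a_k - b_k| = [3, 2, 4]

print(diff.sum())                   # 9.0    r = 1: city block / Manhattan (L1)
print(np.sqrt((diff ** 2).sum()))   # ~5.385 r = 2: Euclidean (L2)
print(diff.max())                   # 4.0    r -> inf: supremum (L_max / L_inf)
```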
Cosine Similarity
 If d1 and d2 are two document vectors:
cos(d1, d2) = (d1 · d2) / (||d1|| ||d2||),
where · indicates the vector dot product and ||d|| is the length of vector d.
 Example:
d1 = (3, 2, 0, 5, 0, 0, 0, 2, 0, 0)
d2 = (1, 0, 0, 0, 0, 0, 0, 1, 0, 2)
d1 · d2 = 3*1 + 2*0 + 0*0 + 5*0 + 0*0 + 0*0 + 0*0 + 2*1 + 0*0 + 0*2 = 5
||d1|| = (3*3 + 2*2 + 0*0 + 5*5 + 0*0 + 0*0 + 0*0 + 2*2 + 0*0 + 0*0)^0.5 = 42^0.5 ≈ 6.481
||d2|| = (1*1 + 0*0 + 0*0 + 0*0 + 0*0 + 0*0 + 0*0 + 1*1 + 0*0 + 2*2)^0.5 = 6^0.5 ≈ 2.449
cos(d1, d2) = 5 / (6.481 × 2.449) ≈ 0.3150
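The worked example can be verified with a short sketch, assuming NumPy; the function name cosine_similarity is illustrative:

```python
import numpy as np

def cosine_similarity(d1, d2):
    """cos(d1, d2) = (d1 . d2) / (||d1|| * ||d2||)."""
    d1, d2 = np.asarray(d1, dtype=float), np.asarray(d2, dtype=float)
    return d1.dot(d2) / (np.linalg.norm(d1) * np.linalg.norm(d2))

d1 = [3, 2, 0, 5, 0, 0, 0, 2, 0, 0]
d2 = [1, 0, 0, 0, 0, 0, 0, 1, 0, 2]
print(round(cosine_similarity(d1, d2), 4))  # 0.315
```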
Cosine Similarity
\[ \cos(d_1, d_2) = \begin{cases} 1 & \text{exactly the same} \\ 0 & \text{orthogonal} \\ -1 & \text{exactly opposite} \end{cases} \]
Feature scaling
 Standardize the range of independent variables
(features of data)
 A.k.a. normalization or standardization
Standardization
 Standardization or Z-score normalization
 Rescale the data so that the mean is zero and the standard deviation from the mean is one (standard scores)
\[ x_{norm} = \frac{x - \mu}{\sigma} \]
where $\mu$ is the mean and $\sigma$ is the standard deviation from the mean (the result is the standard score)
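A sketch of Z-score standardization applied column-wise to a feature matrix, assuming NumPy; names are illustrative:

```python
import numpy as np

def standardize(X):
    """Rescale each feature (column) to zero mean and unit standard deviation."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma

X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])
print(standardize(X))  # each column now has mean 0 and std 1
```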
Min-Max scaling
 Scale the data to a fixed range – between 0 and 1
\[ x_{norm} = \frac{x - x_{min}}{x_{max} - x_{min}} \]
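And a matching sketch of min-max scaling to [0, 1], under the same assumptions:

```python
import numpy as np

def min_max_scale(X):
    """Rescale each feature (column) into the range [0, 1]."""
    X = np.asarray(X, dtype=float)
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    return (X - x_min) / (x_max - x_min)

X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])
print(min_max_scale(X))  # columns scaled so min -> 0 and max -> 1
```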
Efficient implementation
 Consider data as a matrix or a vector
 Matrix/vector computation is much more efficient than computing with loops
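A sketch of the point, assuming NumPy: the broadcasted matrix expression computes every query-to-training distance at once, whereas the explicit Python loop visits one pair at a time and is far slower.

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.random((5000, 20))   # 5000 training points, 20 features
X_query = rng.random((100, 20))    # 100 query points

# Loop version: one pair of points at a time (slow due to Python-level loops)
def pairwise_loop(A, B):
    D = np.empty((A.shape[0], B.shape[0]))
    for i, a in enumerate(A):
        for j, b in enumerate(B):
            D[i, j] = np.sqrt(np.sum((a - b) ** 2))
    return D

# Vectorized version: broadcasting builds the whole distance matrix at once
def pairwise_vectorized(A, B):
    diff = A[:, None, :] - B[None, :, :]       # shape (|A|, |B|, p)
    return np.sqrt((diff ** 2).sum(axis=-1))   # shape (|A|, |B|)

D = pairwise_vectorized(X_query, X_train)
print(D.shape)  # (100, 5000)
```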