K-Nearest Neighbor (KNN)
Algorithm for Machine Learning
PREPARED BY
SK NOUSHAD DDOJA
ROLL NO:- 28100119039
KNN
 K-Nearest Neighbors (KNN)
 Simple, but a very powerful classification algorithm
 Classifies based on a similarity measure
 Non-parametric
 Lazy learning
Does not “learn” until the test example is given
Whenever we have new data to classify, we find its K nearest neighbors from the training data
KNN: Classification Approach
 Classified by a “MAJORITY VOTE” of its neighbors’ classes
Assigned to the most common class amongst its K nearest neighbors (by measuring the “distance” between data points)
KNN: Example
KNN: Pseudocode
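The pseudocode figure from the original slide is not reproduced in this text export. Below is a minimal Python sketch of the same procedure, assuming NumPy; the names knn_predict, X_train, and y_train are illustrative, not from the slides. It computes the distance from the query to every training example, takes the K closest, and returns the majority class.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    """Classify x_query by a majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training example
    dists = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    # Indices of the k smallest distances
    nearest = np.argsort(dists)[:k]
    # Majority vote among the labels of those neighbors
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy usage
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [8.0, 9.0], [9.0, 8.5]])
y_train = np.array(["A", "A", "B", "B"])
print(knn_predict(X_train, y_train, np.array([1.1, 0.9]), k=3))  # -> "A"
```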
KNN: Example
KNN: Euclidean distance matrix
Decision Boundaries
 Voronoi diagram
Describes the areas that are nearest to any given point, given a set of data.
Each line segment is equidistant between two points of opposite classes.
Decision Boundaries
 With a large number of examples and possible noise in the labels, the decision boundary can become nasty!
This is the “overfitting” problem.
Effect of K
 A larger K produces a smoother boundary effect
 When K == N, it always predicts the majority class
Discussion
 Which model is better between K=1 and K=15?
 Why?
How to choose K?
 Empirically optimal K?
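One common way to answer “empirically optimal K?” is to cross-validate a range of candidate values and keep the one with the best validation accuracy. The sketch below uses scikit-learn and its Iris dataset purely as an illustration; the slides do not name a library or dataset.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Try K = 1, 3, 5, ..., 29 and score each with 5-fold cross-validation
scores = {}
for k in range(1, 31, 2):
    model = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(model, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```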
Pros and Cons
 Pros
Learning and implementation are extremely simple and intuitive
Flexible decision boundaries
 Cons
Irrelevant or correlated features have a high impact and must be eliminated
Typically difficult to handle high dimensionality
Computational cost: both memory and classification-time computation
Similarity and Dissimilarity
 Similarity
Numerical measure of how alike two data objects are.
Is higher when objects are more alike.
Often falls in the range [0, 1]
 Dissimilarity
Numerical measure of how different two data objects are
Lower when objects are more alike
Minimum dissimilarity is often 0
Upper limit varies
 Proximity refers to a similarity or dissimilarity
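The slides do not fix a particular conversion between dissimilarity and similarity. One common convention, assumed here for illustration only, maps a distance of 0 to similarity 1 and larger distances toward 0:

```python
def similarity_from_distance(dist):
    """Map a dissimilarity in [0, inf) to a similarity in (0, 1]."""
    return 1.0 / (1.0 + dist)

print(similarity_from_distance(0.0))  # 1.0  (identical objects)
print(similarity_from_distance(4.0))  # 0.2  (quite different objects)
```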
Euclidean Distance
 Euclidean Distance
\[ \mathrm{dist}(a, b) = \sqrt{\sum_{k=1}^{p} (a_k - b_k)^2} \]
where p is the number of dimensions (attributes) and $a_k$ and $b_k$ are, respectively, the k-th attributes (components) of data objects a and b.
 Standardization is necessary if scales differ.
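A minimal Python sketch of the formula above, assuming NumPy; the function name euclidean_distance is illustrative:

```python
import numpy as np

def euclidean_distance(a, b):
    """dist(a, b) = sqrt(sum_k (a_k - b_k)^2) over the p attributes."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.sqrt(np.sum((a - b) ** 2))

print(euclidean_distance([0, 0], [3, 4]))  # 5.0
```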
Euclidean Distance
Minkowski Distance
 Minkowski Distance is a generalization of Euclidean Distance:
\[ \mathrm{dist}(a, b) = \left( \sum_{k=1}^{p} |a_k - b_k|^r \right)^{1/r} \]
where r is a parameter, p is the number of dimensions (attributes), and $a_k$ and $b_k$ are, respectively, the k-th attributes (components) of data objects a and b.
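A matching sketch of the general formula, assuming NumPy; r = 1 and r = 2 recover the city-block and Euclidean distances discussed on the next slide:

```python
import numpy as np

def minkowski_distance(a, b, r):
    """dist(a, b) = (sum_k |a_k - b_k|^r)^(1/r)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.sum(np.abs(a - b) ** r) ** (1.0 / r)
```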
Minkowski Distance: Examples
 r = 1. City block (Manhattan, taxicab, L1 norm) distance.
A common example of this is the Hamming distance, which is just the number of bits that differ between two binary vectors
 r = 2. Euclidean distance
 r → ∞. “supremum” (L_max norm, L_∞ norm) distance.
This is the maximum difference between any component of the vectors
 Do not confuse r with p, i.e., all these distances are defined for all numbers of dimensions.
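A tiny worked check of the three cases on a made-up pair of vectors (the numbers are illustrative, not from the slides):

```python
import numpy as np

a = np.array([1.0, 7.0, 2.0])
b = np.array([4.0, 5.0, 6.0])
diff = np.abs(a - b)                # component-wise |a_k - b_k| = [3, 2, 4]

print(diff.sum())                   # 9.0    r = 1: city block / Manhattan (L1)
print(np.sqrt((diff ** 2).sum()))   # ~5.385 r = 2: Euclidean (L2)
print(diff.max())                   # 4.0    r -> inf: supremum (L_max / L_inf)
```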
Cosine Similarity
 If d1 and d2 are two document vectors:
cos(d1, d2) = (d1 · d2) / (||d1|| ||d2||),
where · indicates the vector dot product and ||d|| is the length of vector d.
 Example:
d1 = (3, 2, 0, 5, 0, 0, 0, 2, 0, 0)
d2 = (1, 0, 0, 0, 0, 0, 0, 1, 0, 2)
d1 · d2 = 3*1 + 2*0 + 0*0 + 5*0 + 0*0 + 0*0 + 0*0 + 2*1 + 0*0 + 0*2 = 5
||d1|| = (3*3 + 2*2 + 0*0 + 5*5 + 0*0 + 0*0 + 0*0 + 2*2 + 0*0 + 0*0)^0.5 = 42^0.5 ≈ 6.481
||d2|| = (1*1 + 0*0 + 0*0 + 0*0 + 0*0 + 0*0 + 0*0 + 1*1 + 0*0 + 2*2)^0.5 = 6^0.5 ≈ 2.449
cos(d1, d2) = 5 / (6.481 × 2.449) ≈ 0.3150
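The worked example can be verified with a short sketch, assuming NumPy; the function name cosine_similarity is illustrative:

```python
import numpy as np

def cosine_similarity(d1, d2):
    """cos(d1, d2) = (d1 . d2) / (||d1|| * ||d2||)."""
    d1, d2 = np.asarray(d1, dtype=float), np.asarray(d2, dtype=float)
    return d1.dot(d2) / (np.linalg.norm(d1) * np.linalg.norm(d2))

d1 = [3, 2, 0, 5, 0, 0, 0, 2, 0, 0]
d2 = [1, 0, 0, 0, 0, 0, 0, 1, 0, 2]
print(round(cosine_similarity(d1, d2), 4))  # 0.315
```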
Cosine Similarity
\[ \cos(d_1, d_2) = \begin{cases} 1 & \text{exactly the same} \\ 0 & \text{orthogonal} \\ -1 & \text{exactly opposite} \end{cases} \]
Feature scaling
 Standardize the range of independent variables
(features of data)
 A.k.a. normalization or standardization
Standardization
 Standardization or Z-score normalization
 Rescale the data so that the mean is zero and the standard deviation from the mean is one (standard scores)
\[ x_{norm} = \frac{x - \mu}{\sigma} \]
where $\mu$ is the mean and $\sigma$ is the standard deviation from the mean (the result is the standard score)
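A sketch of Z-score standardization applied column-wise to a feature matrix, assuming NumPy; names are illustrative:

```python
import numpy as np

def standardize(X):
    """Rescale each feature (column) to zero mean and unit standard deviation."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma

X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])
print(standardize(X))  # each column now has mean 0 and std 1
```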
Min-Max scaling
 Scale the data to a fixed range – between 0 and 1
\[ x_{norm} = \frac{x - x_{min}}{x_{max} - x_{min}} \]
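And a matching sketch of min-max scaling to [0, 1], under the same assumptions:

```python
import numpy as np

def min_max_scale(X):
    """Rescale each feature (column) into the range [0, 1]."""
    X = np.asarray(X, dtype=float)
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    return (X - x_min) / (x_max - x_min)

X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])
print(min_max_scale(X))  # columns scaled so min -> 0 and max -> 1
```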
Efficient implementation
 Consider data as a matrix or a vector
 Matrix/vector computation is much more efficient than computing with loops
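A sketch of the point, assuming NumPy: the broadcasted matrix expression computes every query-to-training distance at once, whereas the explicit Python loop visits one pair at a time and is far slower.

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.random((5000, 20))   # 5000 training points, 20 features
X_query = rng.random((100, 20))    # 100 query points

# Loop version: one pair of points at a time (slow due to Python-level loops)
def pairwise_loop(A, B):
    D = np.empty((A.shape[0], B.shape[0]))
    for i, a in enumerate(A):
        for j, b in enumerate(B):
            D[i, j] = np.sqrt(np.sum((a - b) ** 2))
    return D

# Vectorized version: broadcasting builds the whole distance matrix at once
def pairwise_vectorized(A, B):
    diff = A[:, None, :] - B[None, :, :]       # shape (|A|, |B|, p)
    return np.sqrt((diff ** 2).sum(axis=-1))   # shape (|A|, |B|)

D = pairwise_vectorized(X_query, X_train)
print(D.shape)  # (100, 5000)
```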