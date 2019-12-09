
Knn 160904075605-converted

KNN algorithm

Published in: Education

  1. Rameswara Reddy.K.V
  2. Simple Analogy
     • Tell me about your friends (who your neighbors are) and I will tell you who you are.
  3. Instance-based Learning
     It's very similar to a Desktop!!
  4. KNN: Different Names
     • K-Nearest Neighbors
     • Memory-Based Reasoning
     • Example-Based Reasoning
     • Instance-Based Learning
     • Lazy Learning
  5. What is KNN?
     • A powerful classification algorithm used in pattern recognition.
     • K-nearest neighbors stores all available cases and classifies new cases based on a similarity measure (e.g., a distance function).
     • One of the top data mining algorithms used today.
     • A non-parametric, lazy learning algorithm (an instance-based learning method).
  6. Approach
     • An object (a new instance) is classified by a majority vote of its neighbors' classes.
     • The object is assigned to the most common class among its K nearest neighbors (measured by a distance function).
  8. Distance Measure
     Figure labels: Training Records; Test Record; Compute Distance; Choose k of the “nearest” records.
  9. Distance Measure for Continuous Variables
  10. Distance Between Neighbors
     • Calculate the distance between the new example (E) and all examples in the training set.
     • Euclidean distance between two examples:
       – X = [x1, x2, x3, ..., xn]
       – Y = [y1, y2, y3, ..., yn]
       – The Euclidean distance between X and Y is defined as:
         $D(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$
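The Euclidean distance defined on this slide can be sketched in a few lines of Python. The function name `euclidean_distance` is an illustrative choice, not something taken from the slides.

```python
import math


def euclidean_distance(x, y):
    """Return the Euclidean distance between two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of attributes")
    # Sum the squared per-attribute differences, then take the square root.
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))
```

For example, `euclidean_distance([0, 0], [3, 4])` evaluates the familiar 3-4-5 right triangle.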
  11. K-Nearest Neighbor Algorithm
     • All instances correspond to points in an n-dimensional feature space.
     • Each instance is represented by a set of numerical attributes.
     • The training data consist of a set of vectors, each with an associated class label.
     • Classification is done by comparing the feature vectors of the K nearest points (K >= 2).
     • Select the K nearest examples to E in the training set.
     • Assign E to the most common class among its K nearest neighbors.
  12. 3-KNN: Example (1)
      Customer  Age  Income  No. credit cards  Class  Distance from John
      George    35   35K     3                 No     sqrt[(35-37)^2 + (35-50)^2 + (3-2)^2] = 15.17
      Rachel    22   50K     2                 Yes    sqrt[(22-37)^2 + (50-50)^2 + (2-2)^2] = 15
      Steve     63   200K    1                 No     sqrt[(63-37)^2 + (200-50)^2 + (1-2)^2] = 152.24
      Tom       59   170K    1                 No     sqrt[(59-37)^2 + (170-50)^2 + (1-2)^2] = 122
      Anne      25   40K     4                 Yes    sqrt[(25-37)^2 + (40-50)^2 + (4-2)^2] = 15.75
      John      37   50K     2                 ? → YES
      The three nearest neighbors are Rachel (Yes), George (No), and Anne (Yes), so the majority vote classifies John as YES.
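The worked example can be reproduced with a minimal sketch of the K-nearest-neighbor procedure from the previous slide. The names `CUSTOMERS`, `JOHN`, `euclidean`, and `knn_classify` are hypothetical, and income is entered in thousands as the table lists it.

```python
import math
from collections import Counter

# Training customers from the table: (age, income in thousands, credit cards) and class.
CUSTOMERS = [
    ("George", (35, 35, 3), "No"),
    ("Rachel", (22, 50, 2), "Yes"),
    ("Steve", (63, 200, 1), "No"),
    ("Tom", (59, 170, 1), "No"),
    ("Anne", (25, 40, 4), "Yes"),
]
JOHN = (37, 50, 2)


def euclidean(x, y):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def knn_classify(train, query, k):
    """Return (majority class, names of the k nearest training records)."""
    # Rank every training record by its distance to the query.
    ranked = sorted(train, key=lambda record: euclidean(record[1], query))
    nearest = ranked[:k]
    # Majority vote over the class labels of the k nearest records.
    votes = Counter(label for _, _, label in nearest)
    return votes.most_common(1)[0][0], [name for name, _, _ in nearest]


label, neighbors = knn_classify(CUSTOMERS, JOHN, k=3)
```

Here `neighbors` lists the customers in order of increasing distance from John, and `label` is the class assigned by the vote.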
  13. How to Choose K?
     • If K is too small, the classifier is sensitive to noise points.
     • A larger K works well, but too large a K may include a majority of points from other classes.
     • The K value should satisfy K >= 2.
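The effect of an overly large K can be illustrated with the five customers from the earlier example. This is a small sketch, assuming the neighbor order Rachel, George, Anne, Tom, Steve (nearest first) that follows from the distances computed there; the names `LABELS_NEAREST_FIRST` and `votes_for_k` are hypothetical.

```python
from collections import Counter

# Class labels of John's neighbors, nearest first:
# Rachel, George, Anne, Tom, Steve.
LABELS_NEAREST_FIRST = ["Yes", "No", "Yes", "No", "No"]


def votes_for_k(labels_nearest_first, k):
    """Count the class votes among the k nearest neighbors."""
    return Counter(labels_nearest_first[:k])


# Compare the vote for a moderate K with the vote over the whole training set.
votes = {k: votes_for_k(LABELS_NEAREST_FIRST, k) for k in (3, 5)}
winners = {k: counts.most_common(1)[0][0] for k, counts in votes.items()}
```

With K = 5 the two distant customers (Tom and Steve) join the vote and outnumber the nearby ones, which is the situation the second bullet warns about.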
  15. Figure: (a) 1-nearest neighbor, (b) 2-nearest neighbor, (c) 3-nearest neighbor.
     The K-nearest neighbors of a record x are the data points that have the k smallest distances to x.
  16. Feature Normalization
     • The distance between neighbors can be dominated by attributes with relatively large values, e.g., customer income in the previous example.
     • This arises when two features are on different scales.
     • It is important to normalize those features.
       – Map values to numbers between 0 and 1.
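One common way to map values into the range 0 to 1 is min-max scaling. The slide does not name a specific method, so the formula below is an assumption; x_min and x_max denote the smallest and largest training values of a single feature, and the worked numbers use the customer table from the earlier example.

```latex
x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}

% Income (training range 35K to 200K), John at 50K:
x'_{\text{income}} = \frac{50 - 35}{200 - 35} = \frac{15}{165} \approx 0.09

% Age (training range 22 to 63), John at 37:
x'_{\text{age}} = \frac{37 - 22}{63 - 22} = \frac{15}{41} \approx 0.37
```

After scaling, a full-range difference in income and a full-range difference in age each contribute at most 1 to the squared-distance sum.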
  17. KNN Classification: Distance
      Age  Loan      Default  Distance
      25   $40,000   N        102000
      35   $60,000   N        82000
      45   $80,000   N        62000
      20   $20,000   N        122000
      35   $120,000  N        22000
      52   $18,000   N        124000
      23   $95,000   Y        47000
      40   $62,000   Y        80000
      60   $100,000  Y        42000
      48   $220,000  Y        78000
      33   $150,000  Y        8000
      48   $142,000  ?
      $D = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$
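The distances in this table are almost entirely determined by the loan amount, because age differences are tiny by comparison. The sketch below recomputes the raw distances and then repeats the nearest-neighbor search after min-max scaling. The scaling step is an assumption (the slides only say to map values to 0 to 1), and `TRAIN`, `QUERY`, `distance`, `nearest`, and `min_max_scaler` are hypothetical names.

```python
import math

# (age, loan) and default label for each training record in the table.
TRAIN = [
    ((25, 40_000), "N"), ((35, 60_000), "N"), ((45, 80_000), "N"),
    ((20, 20_000), "N"), ((35, 120_000), "N"), ((52, 18_000), "N"),
    ((23, 95_000), "Y"), ((40, 62_000), "Y"), ((60, 100_000), "Y"),
    ((48, 220_000), "Y"), ((33, 150_000), "Y"),
]
QUERY = (48, 142_000)


def distance(p, q):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def nearest(train, query):
    """Return the (features, label) record closest to the query."""
    return min(train, key=lambda record: distance(record[0], query))


def min_max_scaler(train):
    """Build a function that maps each feature to [0, 1] using training min/max."""
    columns = list(zip(*(features for features, _ in train)))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]

    def scale(point):
        # A constant feature carries no information, so map it to 0.0.
        return tuple(
            (value - low) / (high - low) if high > low else 0.0
            for value, low, high in zip(point, lows, highs)
        )

    return scale


raw_distances = [round(distance(features, QUERY)) for features, _ in TRAIN]
raw_nearest = nearest(TRAIN, QUERY)

scale = min_max_scaler(TRAIN)
scaled_train = [(scale(features), label) for features, label in TRAIN]
scaled_nearest = nearest(scaled_train, scale(QUERY))
```

Comparing `raw_nearest` with `scaled_nearest` shows whether the closest record, and therefore a 1-nearest-neighbor prediction, changes once both features are on the same scale.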
  18. Strengths of KNN
     • Very simple and intuitive.
     • Can be applied to data from any distribution.
     • Good classification if the number of samples is large enough.
     Weaknesses of KNN
     • Takes more time to classify a new example.
       – Need to calculate and compare the distance from the new example to all other examples.
     • Choosing k may be tricky.
     • Needs a large number of samples for accuracy.
  19. Clustering
     • Clustering: the process of grouping a set of objects into classes of similar objects.
     • Documents within a cluster should be similar. Documents from different clusters should be dissimilar.
     • The most common form of unsupervised learning.
     • Unsupervised learning = learning from raw data, as opposed to supervised learning, where a classification of examples is given.
     • In principle, the optimal partition is achieved by minimising the sum of squared distances from each object to the “representative object” of its cluster.
  20. • The distance measure determines how the similarity of two elements is calculated, and it influences the shape of the clusters.
  21. K Means
  22. • Simply speaking, k-means clustering is an algorithm to classify or group objects, based on their attributes/features, into K groups.
     • K is a positive integer.
     • The grouping is done by minimizing the sum of squared distances between the data and the corresponding cluster centroid.
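The quantity being minimized can be written out explicitly. The symbols are assumptions introduced here for illustration: C_j is the set of points in cluster j, and μ_j is its centroid, taken as the mean of the points assigned to it.

```latex
J = \sum_{j=1}^{K} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{\lvert C_j \rvert} \sum_{x_i \in C_j} x_i
```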
  23. • Step 1: Begin with a decision on the value of k = number of clusters.
     • Step 2: Choose any initial partition that classifies the data into k clusters. You may assign the training samples randomly, or systematically as follows:
       1. Take the first k training samples as single-element clusters.
       2. Assign each of the remaining (N-k) training samples to the cluster with the nearest centroid. After each assignment, recompute the centroid of the gaining cluster.
  24. • Step 3: Take each sample in sequence and compute its distance from the centroid of each of the clusters. If a sample is not currently in the cluster with the closest centroid, switch this sample to that cluster and update the centroids of both the cluster gaining the sample and the cluster losing it.
     • Step 4: Repeat step 3 until convergence is achieved, that is, until a pass through the training samples causes no new assignments.
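Steps 1 to 4 can be sketched as follows, using the systematic initialization from Step 2. This is a minimal illustration, not a production implementation: the names `centroid` and `sequential_kmeans`, the `max_passes` safety cap, and the sample data are all assumptions, and `math.dist` requires Python 3.8 or later.

```python
import math


def centroid(points):
    """Mean of a non-empty list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def sequential_kmeans(samples, k, max_passes=100):
    """Cluster samples into k groups; return (assignment list, centroids)."""
    if not 1 <= k <= len(samples):
        raise ValueError("k must be between 1 and the number of samples")

    # Step 2.1: the first k samples start as single-element clusters.
    members = [[samples[i]] for i in range(k)]
    centroids = [centroid(m) for m in members]
    assignment = list(range(k))

    # Step 2.2: add each remaining sample to the cluster with the nearest
    # centroid and recompute the centroid of the gaining cluster.
    for x in samples[k:]:
        j = min(range(k), key=lambda c: math.dist(x, centroids[c]))
        members[j].append(x)
        centroids[j] = centroid(members[j])
        assignment.append(j)

    # Steps 3 and 4: reassign samples until a full pass changes nothing.
    for _ in range(max_passes):
        moved = False
        for i, x in enumerate(samples):
            current = assignment[i]
            best = min(range(k), key=lambda c: math.dist(x, centroids[c]))
            # Switch only if another centroid is strictly closer; a sample
            # alone in its cluster sits at distance 0 and therefore stays,
            # so no cluster ever becomes empty.
            if math.dist(x, centroids[best]) < math.dist(x, centroids[current]):
                members[current].remove(x)
                members[best].append(x)
                centroids[current] = centroid(members[current])
                centroids[best] = centroid(members[best])
                assignment[i] = best
                moved = True
        if not moved:
            break
    return assignment, centroids


# Illustrative two-dimensional samples (not from the slides).
SAMPLES = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0),
           (3.5, 5.0), (4.5, 5.0), (3.5, 4.5)]
assignment, centroids = sequential_kmeans(SAMPLES, k=2)
```

`assignment[i]` is the cluster index of `SAMPLES[i]`. Because centroids are updated after every single move rather than once per pass, the result can depend on the order of the samples.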
