2. The K-NN algorithm
The key concepts that define nearest neighbor classifiers, and why
they are considered "lazy" learners
Methods for measuring the similarity of two examples using distance
How to apply a popular nearest neighbor classifier called k-NN
4. Nearest Neighbor Classification
Nearest neighbor classifiers are defined by their
characteristic of classifying unlabeled examples by assigning
them the class of similar labeled examples.
In general, nearest neighbor classifiers are well suited to
classification tasks in which the relationships among the features
and the target classes are numerous, complicated, or extremely
difficult to understand, yet items of the same class tend to be
fairly homogeneous.
On the other hand, if the data is noisy and thus no clear
distinction exists among the groups, the nearest neighbor
algorithms may struggle to identify the class boundaries.
5. The K-NN algorithm
The nearest neighbors approach to classification is exemplified by
the k-nearest neighbors algorithm (k-NN).
The k-NN algorithm stores all of the available data and classifies a
new data point based on its similarity to the stored examples. This
means that when new data appears, it can be readily assigned to the
most suitable category.
The strengths and weaknesses of this algorithm are as follows:
7. Nearest Neighbor Classification
The k-NN algorithm gets its name from the fact that it uses
information about an example's k-nearest neighbors to
classify unlabeled examples.
The letter k is a variable term implying that any number of
nearest neighbors could be used. For example, if k is 3, the 3
nearest neighbors are considered in the computation.
After choosing k, the algorithm requires a training dataset
made up of examples that have been classified into several
categories, as labeled by a nominal variable.
9. Nearest Neighbor Classification
Then, for each unlabeled record in the test dataset, k-NN identifies k
records in the training data that are the "nearest" in similarity.
The unlabeled test instance is assigned the class of the majority of the k
nearest neighbors.
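The procedure described above can be sketched in a few lines of pure Python (a minimal illustration; the `knn_classify` function name and the toy fruit dataset are invented for this example):

```python
import math
from collections import Counter

def knn_classify(train, new_point, k=3):
    """Classify new_point by majority vote among its k nearest
    training examples. `train` is a list of (features, label) pairs."""
    # Distance from new_point to every labeled training example.
    distances = [(math.dist(features, new_point), label)
                 for features, label in train]
    # Keep the k training examples nearest to new_point.
    nearest = sorted(distances)[:k]
    # Assign the class held by the majority of those k neighbors.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy dataset: two numeric features per example, two class labels.
train = [((1.0, 1.0), "sweet"), ((1.5, 2.0), "sweet"),
         ((8.0, 8.0), "sour"), ((8.5, 9.0), "sour"), ((9.0, 8.5), "sour")]
print(knn_classify(train, (2.0, 1.5), k=3))  # → sweet (2 of 3 neighbors)
```

With k=3, the new point's three nearest neighbors are the two "sweet" examples and one "sour" example, so the majority vote assigns "sweet".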
11. The working of k-NN can be explained using
the following algorithm:
Step-1: Select the number K of neighbors.
Step-2: Calculate the Euclidean distance from the new data point
to each example in the training data.
Step-3: Take the K nearest neighbors according to the calculated
Euclidean distances.
Step-4: Among these K neighbors, count the number of data
points in each category.
Step-5: Assign the new data point to the category with the
greatest number of neighbors.
Step-6: Our model is ready.
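The six steps above can be traced directly in code (a hypothetical worked example; the two-class dataset is invented for illustration):

```python
import math
from collections import Counter

# Step 1: select the number K of neighbors.
K = 3

# A small labeled training set (invented for this sketch).
training = [((1, 1), "A"), ((2, 1), "A"),
            ((4, 5), "B"), ((5, 4), "B"), ((5, 5), "B")]
new_point = (4, 4)

# Step 2: calculate the Euclidean distance from the new point to
# every example in the training data.
dists = [(math.dist(point, new_point), label) for point, label in training]

# Step 3: take the K nearest neighbors per the calculated distances.
k_nearest = sorted(dists)[:K]

# Step 4: among these K neighbors, count the points in each category.
counts = Counter(label for _, label in k_nearest)

# Steps 5-6: assign the new point to the category with the most neighbors.
prediction = counts.most_common(1)[0][0]
print(prediction)  # → B (all three nearest neighbors are class "B")
```

Here the three nearest neighbors of (4, 4) are the three "B" examples, so the vote is unanimous and the new point is assigned class "B".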
12. Pythagorean theorem
Euclidean distance is the most common distance metric, and is derived
from the Pythagorean theorem. Euclidean distance simply refers to
the straight-line distance between two points. For two points
p = (p1, p2, ..., pn) and q = (q1, q2, ..., qn), the formula for
calculating Euclidean distance is:

dist(p, q) = sqrt((p1 - q1)^2 + (p2 - q2)^2 + ... + (pn - qn)^2)
13. Pythagorean theorem
The formula above states that, for each dimension, we calculate the
length of that side of the triangle by subtracting one point's value
from the other's, then square the result and add it to a running
total. The Euclidean distance is the square root of the running total.
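The calculation just described can be written out directly (a small illustrative function; the example point values are invented):

```python
import math

def euclidean_distance(p, q):
    """Euclidean distance between two points of equal dimension."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        side = p_i - q_i      # length of the triangle's side in this dimension
        total += side ** 2    # square it and add it to the running total
    return math.sqrt(total)   # square root of the running total

print(euclidean_distance((1, 2), (4, 6)))  # → 5.0 (a 3-4-5 right triangle)
```

In two dimensions this is exactly the Pythagorean theorem: the sides are 3 and 4, so the hypotenuse (the distance) is 5.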