2. WHAT IS KNN?
• KNN is a supervised ML algorithm that can be used for
both classification and regression predictive problems.
However, in industry it is mainly used for classification
problems.
• It is a lazy learning algorithm because it does not learn
from the training set immediately; instead, it stores the
dataset and only performs computation on it at
classification time.
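The lazy-learning idea can be made concrete with a minimal sketch (the class name and sample data below are invented for illustration): fit() only memorizes the dataset, and all of the distance computation is deferred to prediction time.

```python
import numpy as np

class LazyKNN:
    """Minimal sketch of lazy learning: 'training' only stores the data."""
    def fit(self, X, y):
        # No model is built here; the dataset is simply memorized
        self.X, self.y = np.asarray(X), np.asarray(y)
        return self

    def predict_one(self, query, k=3):
        # All the work happens at prediction time: distances are computed now
        dists = np.linalg.norm(self.X - np.asarray(query), axis=1)
        nearest = np.argsort(dists)[:k]
        labels, counts = np.unique(self.y[nearest], return_counts=True)
        return labels[np.argmax(counts)]

model = LazyKNN().fit([[0, 0], [0, 1], [5, 5], [5, 6]], ["a", "a", "b", "b"])
print(model.predict_one([0.2, 0.3]))  # "a"
```

Note that fit() returns instantly regardless of dataset size; an eager learner would do its expensive work there instead.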
3. KNN is one of the simplest of all machine learning classifiers.
Simple idea: label a new point the same as the closest known
point.
[Figure: Nearest neighbor? Red is nearest to this new instance]
4. • This technique performs classification by taking a
majority vote among the “k” points closest to the unlabeled
data point.
• K here is the number of neighboring points that should be
taken into consideration.
• Euclidean distance or Manhattan distance is used as the metric
for calculating the distance between points.
Principle of KNN - Classifier
5. The green circle is the unlabeled data
point.
K = 3 in this problem:
• The 3 closest points are taken
• 2 are red, 1 is blue
• Votes: 2 red > 1 blue
So here the green circle will be classified
as a red triangle.
K Nearest Neighbor
6. The green circle is the unlabeled data
point.
K = 5 in this problem:
• The 5 closest points are taken
• 2 are red, 3 are blue
• Votes: 2 red < 3 blue
So here the green circle will be classified
as a blue rectangle.
K Nearest Neighbor
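The two voting examples above can be reproduced in a short sketch. The coordinates below are made up for illustration, arranged so that the 3 closest points to the query are mostly red while all 5 points together are mostly blue:

```python
import math
from collections import Counter

# Hypothetical labeled points arranged like the slide example:
# two reds close to the query, three blues slightly farther away
training = [((0.5, 0.0), "red"), ((-0.5, 0.0), "red"),
            ((0.7, 0.7), "blue"), ((1.0, 0.0), "blue"), ((0.0, 1.0), "blue")]

def knn_vote(query, k):
    # Keep the k points closest to the query and take a majority vote
    nearest = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_vote((0.0, 0.0), 3))  # red  (2 red vs 1 blue among the 3 closest)
print(knn_vote((0.0, 0.0), 5))  # blue (2 red vs 3 blue among all 5)
```

As in the slides, the predicted class flips when k changes, which is why the choice of k matters.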
7. Euclidean Distance
The Euclidean distance between two points in the plane with
coordinates (x1,y1) and (x2,y2) is given by-
D = √[(x2 − x1)² + (y2 − y1)²]
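Both distance metrics mentioned earlier can be computed directly; the helper names below are illustrative:

```python
import math

# Euclidean distance between (x1, y1) and (x2, y2): straight-line distance
def euclidean(p1, p2):
    return math.sqrt((p2[0] - p1[0]) ** 2 + (p2[1] - p1[1]) ** 2)

# Manhattan distance: sum of absolute coordinate differences
def manhattan(p1, p2):
    return abs(p2[0] - p1[0]) + abs(p2[1] - p1[1])

print(euclidean((0, 0), (3, 4)))  # 5.0 (a 3-4-5 right triangle)
print(manhattan((0, 0), (3, 4)))  # 7
```

Manhattan distance is never smaller than Euclidean distance for the same pair of points, so swapping the metric can change which neighbors count as "closest".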
8. Applications of KNN
KNN can be used in various fields, such as:
1. Recommendation System
2. Medicine
3. Finance
4. Agriculture
5. Text Mining
9. KNN Implementation in Python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Load the Iris dataset from the UCI repository
path = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
headernames = ['sepal-length', 'sepal-width', 'petal-length',
               'petal-width', 'Class']
dataset = pd.read_csv(path, names=headernames)
dataset.head()

# Data preprocessing: separate the features from the class labels
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 4].values

# Split the dataset into 60% training data and 40% testing data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.40)
10.
# Next, data scaling will be done
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

# Train the model with the help of the KNeighborsClassifier class of sklearn
from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors=8)
classifier.fit(X_train, y_train)

# Prediction
y_pred = classifier.predict(X_test)

# Next, print the results as follows
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
result = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(result)
result1 = classification_report(y_test, y_pred)
print("Classification Report:")
print(result1)
result2 = accuracy_score(y_test, y_pred)
print("Accuracy:", result2)