PART 3
Classification
Classification is a task of supervised learning. It specifies the class to which data elements belong.
Two common types of classification:
Binary (2 classes)
Multi-Class (more than 2 classes)
Applications:
Social media sentiment analysis, which has two potential outcomes, positive or negative
To find whether a received email is spam or not
To find if a bank loan will be granted or not
To identify if a student will pass or fail an examination
To classify images
Types of classification algorithms (discriminative and generative learning algorithms)
A discriminative learning algorithm tries to find a decision boundary (e.g. a straight line) that separates the classes (e.g. cats and dogs) from each other, e.g. SVM (to be discussed).
A generative learning algorithm builds a separate model of each class (cats and dogs), e.g. Naïve Bayes (to be discussed).
Types of Classification Algorithms
Logistic Regression
Naïve Bayes
Support Vector Machines
K-Nearest Neighbors (KNN)
Decision Tree Classification
Random Forest (Assignment)
Logistic Regression
SIGMOID FUNCTION
Named because it uses the logistic function. The logistic or sigmoid function is an S-shaped curve that can take any real-valued number and map it into a value between 0 and 1, but never exactly at those limits:
sigmoid(x) = 1 / (1 + e^(-x))
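As a quick illustration (a minimal sketch, not part of the original slides), the sigmoid can be defined and plotted in a few lines of NumPy/Matplotlib:
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(x):
    # maps any real-valued input into the open interval (0, 1)
    return 1 / (1 + np.exp(-x))

x = np.linspace(-10, 10, 200)
plt.plot(x, sigmoid(x))   # the S-shaped curve
plt.title("Sigmoid function")
plt.show()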
Unlike linear regression, which outputs continuous numeric values, logistic regression transforms its output using the logistic sigmoid function to return a probability value, which can then be mapped to two or more discrete classes.
Linear regression could help us predict a student's test score on a scale of 0-100.
Logistic regression could help us predict whether the student passed or failed.
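A minimal sketch of this difference, using a made-up one-feature pass/fail dataset (all numbers here are purely illustrative):
import numpy as np
from sklearn.linear_model import LogisticRegression

hours = np.array([[1], [2], [3], [4], [5], [6]])  # hours studied (hypothetical)
passed = np.array([0, 0, 0, 1, 1, 1])             # 1 = pass, 0 = fail

clf = LogisticRegression().fit(hours, passed)
print(clf.predict_proba([[3.5]]))  # sigmoid output: probability of each class
print(clf.predict([[3.5]]))        # probability mapped to a discrete class (0 or 1)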
Types of logistic regression
Binary (example: Pass/Fail)
Multiclass (example: Cats, Dogs, Sheep)
Ordinal (example: Low, Medium, High)
Model Building
Python Example: Digits Dataset
The digits dataset is included in scikit-learn.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.datasets import load_digits

digits = load_digits()
print(digits.data.shape)          # (1797, 64): 1797 8x8 images flattened to 64 features
plt.matshow(digits.images[1796])  # display the last image in the dataset
plt.show()
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)
Scikit-learn 4-Step Modeling Pattern
Step 1. Import the model you want to use
from sklearn.linear_model import LogisticRegression
Step 2. Make an instance of the model
logisticRegr = LogisticRegression()  # if a ConvergenceWarning appears, try LogisticRegression(max_iter=1000)
Step 3. Train the model on the data, storing the information learned from the data
logisticRegr.fit(x_train, y_train)
Step 4. Predict the labels of new data
y_pred = logisticRegr.predict(x_test)
Model Performance
A confusion matrix and a classification report are used to check model performance.
from sklearn.metrics import classification_report, confusion_matrix

print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
Confusion Matrix
1. Accuracy
Accuracy = (TP+TN) / (TP+FP+FN+TN)
The ratio of correctly predicted observations to the total observations.
Accuracy is suitable when you have symmetric datasets, where the numbers of false positives and false negatives are almost the same.
Is accuracy a good measure for the confusion matrix shown on the slide?
Remember: accuracy is suitable only for symmetric datasets (i.e. where the numbers of false positives and false negatives are almost the same).
2. Precision
Precision = TP / (TP+FP)
Precision is a good measure to use when the cost of a false positive is high (e.g. in email spam detection).
3. Recall
Recall = TP / (TP+FN)
Recall is a good measure to use when the cost of a false negative is high (e.g. in fraud detection).
4. F1 Measure
F1 = 2 · (Precision · Recall) / (Precision + Recall)
The F1 score is a better measure to use if we need a balance between precision and recall AND there is an uneven class distribution.
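A minimal sketch that computes all four measures from a 2x2 confusion matrix; the counts below are made up for illustration:
TP, FP, FN, TN = 40, 10, 5, 45  # hypothetical counts

accuracy = (TP + TN) / (TP + FP + FN + TN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, f1)  # 0.85 0.8 0.888... 0.842...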
ASSIGNMENT 02
Date of submission:
Use a logistic regression model on the MNIST database.
Run the four scikit-learn modeling steps.
Calculate the confusion matrix.
Find the performance measures.
Naïve Bayes classifier
The Naïve Bayes classifier is a probabilistic algorithm used for classification. It uses Bayes' theorem to predict the class of unknown data:
P(class | X) = P(X | class) · P(class) / P(X)
It can be used in a wide variety of classification tasks. Typical applications include spam filtering and sentiment prediction. The word naïve is used because the features are assumed to be independent of each other. Naïve Bayes is a simple yet powerful and fast algorithm.
Play-tennis example
(The slide shows the standard 14-day play-tennis table: outlook, temperature, humidity and wind conditions with the decision play / don't play; 9 of the 14 days are "play" and 5 are "don't play".)
Will you play or not if it rains, the temperature is hot, the humidity is high and there is a light wind?
X = rain, hot temperature, high humidity, light wind
P(play | X) = P(X | play) · P(play) / P(X)
= P(rain | play) · P(hot temperature | play) · P(high humidity | play) · P(light wind | play) · P(play) / P(X)
= (3/9 · 2/9 · 3/9 · 6/9 · 9/14) / (5/14 · 4/14 · 7/14 · 8/14)
≈ 0.36
X = rain, hot temperature, high humidity, light wind
P(don't play | X) = P(X | don't play) · P(don't play) / P(X)
= P(rain | don't play) · P(hot temperature | don't play) · P(high humidity | don't play) · P(light wind | don't play) · P(don't play) / P(X)
= (2/5 · 2/5 · 4/5 · 2/5 · 5/14) / (5/14 · 4/14 · 7/14 · 8/14)
≈ 0.63
Since P(don't play | X) ≈ 0.63 > P(play | X) ≈ 0.36 (the two posteriors sum to 1), the prediction is: don't play.
Implementation in sklearn
(demonstrated in a Jupyter notebook)
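The original demo lived in a separate notebook; below is a minimal sketch of what it might look like, using scikit-learn's CategoricalNB on a few integer-encoded weather rows (the rows and encoding here are hypothetical, not the full play-tennis table):
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Assumed encoding: outlook 0=sunny 1=overcast 2=rain; temp 0=hot 1=mild 2=cool;
# humidity 0=high 1=normal; wind 0=weak 1=strong
X = np.array([[0, 0, 0, 0],
              [0, 0, 0, 1],
              [1, 0, 0, 0],
              [2, 1, 0, 0],
              [2, 2, 1, 0],
              [2, 2, 1, 1]])
y = np.array([0, 0, 1, 1, 1, 0])   # 1 = play, 0 = don't play

model = CategoricalNB().fit(X, y)
query = [[2, 0, 0, 0]]             # rain, hot, high humidity, weak wind
print(model.predict(query))        # predicted class
print(model.predict_proba(query))  # posterior probabilities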
Support Vector Machines (SVM)
The SVM algorithm finds a hyperplane that separates the data points into classes.
A hyperplane is a:
point for 1-feature data,
line for 2-feature data,
plane for 3-feature data,
and a hyperplane for data with more than 3 features.
Suppose we have to classify 2 types of objects (represented by circles and squares below) on the basis of two features (X1 and X2).
An infinite number of lines may be drawn to separate them. The optimal hyperplane (the one with the maximum margin to the nearest points of each class) is shown below.
Consider the case when the data is not linearly separable. For example, low and high doses of a drug did not cure the disease (red dots).
The two-feature, linearly non-separable data is shown in the figure below.
In this case the input space is transformed into a higher-dimensional space, as shown below. The data points are plotted on the x-axis and z-axis such that
z = x² + y²
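A minimal sketch of this transformation on synthetic circular data (the data is generated here just for illustration):
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, 50)
inner = np.c_[np.cos(angles), np.sin(angles)] * 1.0  # class 0: small radius
outer = np.c_[np.cos(angles), np.sin(angles)] * 3.0  # class 1: large radius

for points, marker in [(inner, 'o'), (outer, 's')]:
    x, y = points[:, 0], points[:, 1]
    z = x**2 + y**2              # the transformation from the slide
    plt.scatter(x, z, marker=marker)
plt.axhline(5, linestyle='--')   # in (x, z) space a straight line separates the classes
plt.show()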
The decision boundary (blue circle) in the original input space looks as shown below.
KERNEL
A kernel transforms a low-dimensional input space into a higher-dimensional space, i.e. it converts a non-separable problem into a separable one by adding more dimensions.
Three commonly used kernels are:
1. Linear Kernel
2. Polynomial Kernel
3. Radial Basis Function (RBF) Kernel
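In scikit-learn these correspond to the kernel parameter of the SVC class (a brief sketch; 'rbf' is the default):
from sklearn.svm import SVC

linear_svm = SVC(kernel='linear')
poly_svm = SVC(kernel='poly', degree=3)     # degree of the polynomial kernel
rbf_svm = SVC(kernel='rbf', gamma='scale')  # 'scale' is the default gamma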
Example:
Classifier Building in Scikit-learn
We will use the banknote dataset. This example is available online at:
https://stackabuse.com/implementing-svm-and-kernel-svm-with-pythons-scikit-learn/
The task is to predict whether a bank currency note is authentic or not (i.e. binary classification).
Four attributes of the image:
1. skewness
2. variance
3. entropy
4. kurtosis
The following script imports the required libraries:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
Importing the Dataset
The data is available for download at the following link:
https://drive.google.com/file/d/13nw-uRXPY8XIZQxKRNZ3yYlho-CYm_Qt/view
Detailed information about the data is available at the following link:
https://archive.ics.uci.edu/ml/datasets/banknote+authentication
Download the dataset from the Google Drive link and store it locally on your machine.
Load the dataset:
bankdata = pd.read_csv("D:/Datasets/bill_authentication.csv")
Shape of the dataset:
bankdata.shape
To check the first five rows:
bankdata.head()
Data Preprocessing
Data preprocessing involves:
(1) dividing the data into attributes and labels, and
(2) dividing the data into training and testing sets.
(1) Dividing the data into attributes and labels
X = bankdata.drop('Class', axis=1)  #1
y = bankdata['Class']               #2
#1 The drop() call removes the whole column labeled 'Class' (axis=1 selects a column rather than a row).
#2 Only the Class column is stored in the y variable.
Now the X variable contains the features while the y variable contains the corresponding labels.
(2) Dividing the data into training and testing sets
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)
Training the Algorithm
Scikit-learn's svm library contains built-in classes for different SVM algorithms.
We will use the support vector classifier (SVC) class.
The fit method of the SVC class is called to train the algorithm on the training data:
from sklearn.svm import SVC
svclassifier = SVC(kernel='linear')
svclassifier.fit(X_train, y_train)
Making Predictions
y_pred = svclassifier.predict(X_test)
Evaluating the Algorithm
from sklearn.metrics import classification_report, confusion_matrix

print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
ASSIGNMENT NO. 1
1. Download any publicly available linearly separable dataset. Run SVM. Put your code, dataset and confusion matrix in a single Word file. What do you conclude?
THE END
