PART 3
Classification
Classification is a task of supervised learning. It specifies the class to which data elements belong.
Two common types of classification:
Binary (2 classes)
Multi-Class (more than 2 classes)
Applications:
Social media sentiment analysis, which has two potential outcomes, positive or negative
To find whether a received email is spam or not
To find if a bank loan will be granted or not
To identify if a student will pass or fail an examination
To classify images
Types of classification algorithms (discriminative and generative learning algorithms)
A discriminative learning algorithm tries to find a decision boundary (e.g. a straight line) that separates the classes (e.g. cats and dogs) from each other, e.g. SVM (to be discussed).
A generative learning algorithm builds a separate model of each class (cats and dogs), e.g. Naïve Bayes (to be discussed).
Types of Classification Algorithms
Logistic Regression
Naïve Bayes
Support Vector Machines
K-Nearest Neighbors (KNN)
Decision Tree Classification
Random Forest (Assignment)
Logistic Regression
SIGMOID FUNCTION
Named because it uses the logistic function. The logistic or sigmoid function is an S-shaped curve that can take any real-valued number and map it into a value between 0 and 1, but never exactly at those limits:
sigmoid(x) = 1 / (1 + e^(-x))
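As a quick illustration (a minimal sketch, not part of the original slides), the sigmoid can be defined and plotted in a few lines of NumPy/Matplotlib:
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(x):
    # maps any real-valued input into the open interval (0, 1)
    return 1 / (1 + np.exp(-x))

x = np.linspace(-10, 10, 200)
plt.plot(x, sigmoid(x))   # the S-shaped curve
plt.title("Sigmoid function")
plt.show()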
Unlike linear regression, which outputs continuous numeric values, logistic regression transforms its output using the logistic sigmoid function to return a probability value, which can then be mapped to two or more discrete classes.
Linear regression could help us predict a student's test score on a scale of 0-100.
Logistic regression could help us predict whether the student passed or failed.
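A minimal sketch of this difference, using a made-up one-feature pass/fail dataset (all numbers here are purely illustrative):
import numpy as np
from sklearn.linear_model import LogisticRegression

hours = np.array([[1], [2], [3], [4], [5], [6]])  # hours studied (hypothetical)
passed = np.array([0, 0, 0, 1, 1, 1])             # 1 = pass, 0 = fail

clf = LogisticRegression().fit(hours, passed)
print(clf.predict_proba([[3.5]]))  # sigmoid output: probability of each class
print(clf.predict([[3.5]]))        # probability mapped to a discrete class (0 or 1)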
Types of logistic regression
Binary (example: Pass/Fail)
Multiclass (example: Cats, Dogs, Sheep)
Ordinal (example: Low, Medium, High)
Model Building
Python Example: Digits Dataset
The digits dataset is included in scikit-learn.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.datasets import load_digits

digits = load_digits()
print(digits.data.shape)          # (1797, 64): 1797 8x8 images flattened to 64 features
plt.matshow(digits.images[1796])  # display the last image in the dataset
plt.show()
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)
Scikit-learn 4-Step Modeling Pattern
Step 1. Import the model you want to use
from sklearn.linear_model import LogisticRegression
Step 2. Make an instance of the model
logisticRegr = LogisticRegression()  # if a ConvergenceWarning appears, try LogisticRegression(max_iter=1000)
Step 3. Train the model on the data, storing the information learned from the data
logisticRegr.fit(x_train, y_train)
Step 4. Predict the labels of new data
y_pred = logisticRegr.predict(x_test)
Model Performance
A confusion matrix and a classification report are used to check model performance.
from sklearn.metrics import classification_report, confusion_matrix

print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
Confusion Matrix
1. Accuracy
Accuracy = (TP+TN) / (TP+FP+FN+TN)
The ratio of correctly predicted observations to the total observations.
Accuracy is suitable when you have symmetric datasets, where the numbers of false positives and false negatives are almost the same.
Is accuracy a good measure for the confusion matrix shown on the slide?
Remember: accuracy is suitable only for symmetric datasets (i.e. where the numbers of false positives and false negatives are almost the same).
2. Precision
Precision = TP / (TP+FP)
Precision is a good measure to use when the cost of a false positive is high (e.g. in email spam detection).
3. Recall
Recall = TP / (TP+FN)
Recall is a good measure to use when the cost of a false negative is high (e.g. in fraud detection).
4. F1 Measure
F1 = 2 · (Precision · Recall) / (Precision + Recall)
The F1 score is a better measure to use if we need a balance between precision and recall AND there is an uneven class distribution.
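A minimal sketch that computes all four measures from a 2x2 confusion matrix; the counts below are made up for illustration:
TP, FP, FN, TN = 40, 10, 5, 45  # hypothetical counts

accuracy = (TP + TN) / (TP + FP + FN + TN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, f1)  # 0.85 0.8 0.888... 0.842...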
ASSIGNMENT 02
Date of submission:
Use a logistic regression model on the MNIST database.
Run the four scikit-learn modeling steps.
Calculate the confusion matrix.
Find the performance measures.
Naïve Bayes classifier
The Naïve Bayes classifier is a probabilistic algorithm used for classification. It uses Bayes' theorem to predict the class of unknown data:
P(class | X) = P(X | class) · P(class) / P(X)
It can be used in a wide variety of classification tasks. Typical applications include spam filtering and sentiment prediction. The word naïve is used because the features are assumed to be independent of each other. Naïve Bayes is a simple yet powerful and fast algorithm.
Play-tennis example
(The slide shows the standard 14-day play-tennis table: outlook, temperature, humidity and wind conditions with the decision play / don't play; 9 of the 14 days are "play" and 5 are "don't play".)
Will you play or not if it rains, the temperature is hot, the humidity is high and there is a light wind?
X = rain, hot temperature, high humidity, light wind
P(play | X) = P(X | play) · P(play) / P(X)
= P(rain | play) · P(hot temperature | play) · P(high humidity | play) · P(light wind | play) · P(play) / P(X)
= (3/9 · 2/9 · 3/9 · 6/9 · 9/14) / (5/14 · 4/14 · 7/14 · 8/14)
≈ 0.36
X = rain, hot temperature, high humidity, light wind
P(don't play | X) = P(X | don't play) · P(don't play) / P(X)
= P(rain | don't play) · P(hot temperature | don't play) · P(high humidity | don't play) · P(light wind | don't play) · P(don't play) / P(X)
= (2/5 · 2/5 · 4/5 · 2/5 · 5/14) / (5/14 · 4/14 · 7/14 · 8/14)
≈ 0.63
Since P(don't play | X) ≈ 0.63 > P(play | X) ≈ 0.36 (the two posteriors sum to 1), the prediction is: don't play.
Implementation in sklearn
(demonstrated in a Jupyter notebook)
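The original demo lived in a separate notebook; below is a minimal sketch of what it might look like, using scikit-learn's CategoricalNB on a few integer-encoded weather rows (the rows and encoding here are hypothetical, not the full play-tennis table):
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Assumed encoding: outlook 0=sunny 1=overcast 2=rain; temp 0=hot 1=mild 2=cool;
# humidity 0=high 1=normal; wind 0=weak 1=strong
X = np.array([[0, 0, 0, 0],
              [0, 0, 0, 1],
              [1, 0, 0, 0],
              [2, 1, 0, 0],
              [2, 2, 1, 0],
              [2, 2, 1, 1]])
y = np.array([0, 0, 1, 1, 1, 0])   # 1 = play, 0 = don't play

model = CategoricalNB().fit(X, y)
query = [[2, 0, 0, 0]]             # rain, hot, high humidity, weak wind
print(model.predict(query))        # predicted class
print(model.predict_proba(query))  # posterior probabilities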
Support Vector Machines (SVM)
The SVM algorithm finds a hyperplane that separates the data points into classes.
A hyperplane is a:
point for 1-feature data,
line for 2-feature data,
plane for 3-feature data,
and a hyperplane for data with more than 3 features.
Suppose we have to classify 2 types of objects (represented by circles and squares below) on the basis of two features (X1 and X2).
An infinite number of lines may be drawn to separate them. The optimal hyperplane (the one with the maximum margin to the nearest points of each class) is shown below.
Consider the case when the data is not linearly separable. For example, low and high doses of a drug did not cure the disease (red dots).
The two-feature, linearly non-separable data is shown in the figure below.
In this case the input space is transformed into a higher-dimensional space, as shown below. The data points are plotted on the x-axis and z-axis such that
z = x² + y²
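A minimal sketch of this transformation on synthetic circular data (the data is generated here just for illustration):
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, 50)
inner = np.c_[np.cos(angles), np.sin(angles)] * 1.0  # class 0: small radius
outer = np.c_[np.cos(angles), np.sin(angles)] * 3.0  # class 1: large radius

for points, marker in [(inner, 'o'), (outer, 's')]:
    x, y = points[:, 0], points[:, 1]
    z = x**2 + y**2              # the transformation from the slide
    plt.scatter(x, z, marker=marker)
plt.axhline(5, linestyle='--')   # in (x, z) space a straight line separates the classes
plt.show()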
The decision boundary (blue circle) in the original input space looks as shown below.
KERNEL
A kernel transforms a low-dimensional input space into a higher-dimensional space, i.e. it converts a non-separable problem into a separable one by adding more dimensions.
Three commonly used kernels are:
1. Linear Kernel
2. Polynomial Kernel
3. Radial Basis Function (RBF) Kernel
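In scikit-learn these correspond to the kernel parameter of the SVC class (a brief sketch; 'rbf' is the default):
from sklearn.svm import SVC

linear_svm = SVC(kernel='linear')
poly_svm = SVC(kernel='poly', degree=3)     # degree of the polynomial kernel
rbf_svm = SVC(kernel='rbf', gamma='scale')  # 'scale' is the default gamma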
Example:
Classifier Building in Scikit-learn
We will use the banknote dataset. This example is available online at:
https://stackabuse.com/implementing-svm-and-kernel-svm-with-pythons-scikit-learn/
The task is to predict whether a bank currency note is authentic or not (i.e. binary classification).
Four attributes of the image:
1. skewness
2. variance
3. entropy
4. kurtosis
The following script imports the required libraries:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
Importing the Dataset
The data is available for download at the following link:
https://drive.google.com/file/d/13nw-uRXPY8XIZQxKRNZ3yYlho-CYm_Qt/view
Detailed information about the data is available at the following link:
https://archive.ics.uci.edu/ml/datasets/banknote+authentication
Download the dataset from the Google Drive link and store it locally on your machine.
Load the dataset:
bankdata = pd.read_csv("D:/Datasets/bill_authentication.csv")
Shape of the dataset:
bankdata.shape
To check the first five rows:
bankdata.head()
Data Preprocessing
Data preprocessing involves:
(1) dividing the data into attributes and labels, and
(2) dividing the data into training and testing sets.
(1) Dividing the data into attributes and labels
X = bankdata.drop('Class', axis=1)  #1
y = bankdata['Class']               #2
#1 The drop() call removes the whole column labeled 'Class' (axis=1 selects a column rather than a row).
#2 Only the Class column is stored in the y variable.
Now the X variable contains the features while the y variable contains the corresponding labels.
(2) Dividing the data into training and testing sets
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)
Training the Algorithm
Scikit-learn's svm library contains built-in classes for different SVM algorithms.
We will use the support vector classifier (SVC) class.
The fit method of the SVC class is called to train the algorithm on the training data:
from sklearn.svm import SVC
svclassifier = SVC(kernel='linear')
svclassifier.fit(X_train, y_train)
Making Predictions
y_pred = svclassifier.predict(X_test)
Evaluating the Algorithm
from sklearn.metrics import classification_report, confusion_matrix

print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
ASSIGNMENT NO. 1
1. Download any publicly available linearly separable dataset. Run SVM. Put your code, dataset and confusion matrix in a single Word file. What do you conclude?
THE END
