This document discusses support vector machines (SVMs) and their application in agriculture. It begins with an introduction to SVMs, explaining that they are supervised machine learning algorithms used for classification and regression. It then covers key aspects of SVMs: how they find the optimal separating hyperplane for classification; how linearly separable and non-separable data are handled using soft-margin hyperplanes and kernels; and common kernel functions. It provides an example application of an SVM classifier for identifying pests in leaf images, and concludes with an overview of SVMs and their use in solving agricultural classification problems.
Support Vector Machine and its Application in Agriculture
First seminar on AGRICULTURAL STATISTICS
Presented by: HARISH NAYAK, G.H. (PALB 9202)
3. Flow of seminar
1. Introduction
2. SVM
3. Linear and Nonlinear separable case
4. Kernel functions
5. Case study
6. Conclusion
7. References
4. Introduction
o SVM was introduced by Vladimir Vapnik in 1995 as a kernel-based machine learning model for classification and regression tasks.
o SVM has been used as a powerful tool for solving practical binary classification problems.
5. What is SVM?
SVM is a supervised machine learning algorithm that is mainly used to classify data into different classes. Unlike most algorithms, SVM makes use of a hyperplane that acts as a decision boundary between the classes.
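As a minimal illustration (not from the original slides), the following Python sketch trains a linear SVM on made-up toy data using scikit-learn's SVC and classifies new points:

```python
import numpy as np
from sklearn.svm import SVC

# Toy two-class data (illustrative values only)
X = np.array([[1, 2], [2, 3], [3, 3],    # class +1
              [6, 5], [7, 8], [8, 6]])   # class -1
y = np.array([1, 1, 1, -1, -1, -1])

# Fit a linear SVM: the learned hyperplane is the decision boundary
clf = SVC(kernel="linear")
clf.fit(X, y)

# Classify new, unseen points
print(clf.predict([[2, 2], [7, 7]]))     # expected: [ 1 -1]
```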
6. Features of SVM
1. SVM is a supervised learning algorithm (i.e., it trains on a set of labelled data).
o SVM studies the labelled training data and then classifies any new input data depending on what it learned in the training phase.
7. Cont...
2. SVM can be used for both 'classification' and 'regression' problems.
o However, it is mainly used for classification problems.
8. Cont...
3. SVM can classify non-linear data by using the 'kernel trick'.
9. Principle of SVM
o The formulation of SVM learning is based on the principle of Structural Risk Minimization (SRM) (Vapnik, 2000).
o SVM maximizes the generalization ability of a model. This is the objective of the SRM principle, which minimizes a bound on the generalization error of a model, instead of minimizing the mean squared error on the training data, as empirical risk minimization often does.
11. There are two cases in the dataset:
1. Linearly separable case: the data can be separated into two or more classes cleanly, with no overlap or intersection between classes.
2. Non-linearly separable case: the data cannot be separated into two or more classes cleanly; the classes overlap or intersect.
12. Linearly separable case
o Each data point is a pair: an input vector $x_i$ and its associated label $y_i$.
o Let the training set $X$ be
$X = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\} = \{(x_i, y_i)\}_{i=1}^{n}$
where $x_i \in \mathbb{R}^d$ (d dimensions) and $y_i \in \{+1, -1\}$.
13. Cont...
o For visualization purposes, we will consider 2-dimensional input, i.e., $x_i \in \mathbb{R}^2$.
o The data are linearly separable.
o There are an infinite number of hyperplanes that can separate the data.
o Fig. 1 shows several decision hyperplanes that perfectly separate the input data set.
15. Cont...
o Among all these hyperplanes, the one with maximum margin and good generalization ability is selected.
o This hyperplane is called the "optimal separating hyperplane" (Fig. 2).
17. Cont...
o The hyperplane that separates the input space is defined by the equation
$w^T x_i + b = 0$ ...(1), $\quad i = 1, \ldots, N$
o It can be fitted to correctly classify training patterns, where
a. the weight vector $w$ is normal to the hyperplane and defines its orientation,
b. $b$ is the bias term, and
c. $T$ denotes the transpose.
18. Cont...
From Equation (1), the linear classifier (decision function) is given by
$y(x) = \mathrm{sign}(w^T x + b)$ ...(2)
classifying patterns into Class 2 ($y_i = +1$ if $w^T x + b \ge 0$) and Class 1 ($y_i = -1$ if $w^T x + b < 0$).
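A one-line sketch of this decision rule in Python; the weight vector $w$ and bias $b$ are assumed to come from training, and the values below are purely illustrative:

```python
import numpy as np

# Decision function y(x) = sign(w^T x + b); w and b would normally
# come from training (the values here are illustrative only)
w = np.array([0.5, -1.0])
b = 0.25

def classify(x):
    """Return +1 (Class 2) or -1 (Class 1) for input vector x."""
    return 1 if w @ x + b >= 0 else -1

print(classify(np.array([2.0, 1.0])))  # 0.5*2 - 1.0*1 + 0.25 = 0.25 -> +1
```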
19. Cont...
$w^T x_i + b \ge +1$ if $y_i = +1$
$w^T x_i + b \le -1$ if $y_i = -1$, $\quad i = 1, 2, \ldots, N$
These can be combined into a single set of inequalities:
$y_i(w^T x_i + b) - 1 \ge 0$, ...(*) $\quad i = 1, 2, \ldots, N$
where $N$ is the training data size.
o The maximum-margin hyperplane is found by solving $\min_{w,b} \; w \cdot w = \|w\|^2$.
o For maximal separation, the hyperplane should be as far away as possible from the closest points of each class.
20. Cont...
Solving the equations
$w^T x_1 + b = +1$ and $w^T x_2 + b = -1$
gives
$w^T(x_1 - x_2) = 2$
$\frac{w}{\|w\|} \cdot (x_1 - x_2) = \frac{2}{\|w\|}$
where $\|w\| = \sqrt{w^T w} = \sqrt{w_1^2 + w_2^2 + \cdots + w_n^2}$ is the norm of the vector (Fig. 3).
21. Cont...
o The distance between the hyperplane and the training data closest to the hyperplane is called the 'margin'.
o The geometric margin of $x^+$ and $x^-$ is
$\gamma_i = \frac{1}{2}\left(\frac{w}{\|w\|} \cdot x^+ - \frac{w}{\|w\|} \cdot x^-\right) = \frac{1}{2} \cdot \frac{2}{\|w\|} = \frac{1}{\|w\|}$
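The quantity $1/\|w\|$ can be read off a trained linear SVM; a minimal sketch, assuming scikit-learn and made-up separable toy data:

```python
import numpy as np
from sklearn.svm import SVC

# Toy separable data (illustrative values only)
X = np.array([[1.0, 1.0], [2.0, 2.0], [5.0, 5.0], [6.0, 6.0]])
y = np.array([-1, -1, 1, 1])

# A very large C approximates the hard-margin case
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]
print("geometric margin 1/||w|| =", 1.0 / np.linalg.norm(w))
```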
23. Cont...
o Maximizing the geometric margin means minimizing the norm of the weight vector.
o The margin-determining training vectors are called the 'support vectors'. These are the data points closest to the optimal hyperplane.
o The solution of an SVM is given by this small set of support vectors alone.
24. Cont...
o Constructing the hyperplane amounts to solving a convex Quadratic Programming (QP) problem.
o Lagrange multipliers and the Karush-Kuhn-Tucker (KKT) complementary conditions are used to find the optimal solution.
25. o Under the conditions for optimality, the QP problem is finally obtained in the dual space of the Lagrange function:
$L(w, b; \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{N} \alpha_i \left[ y_i(w^T x_i + b) - 1 \right]$ ...(3)
where the Lagrange multipliers satisfy $\alpha_i \ge 0$.
o Thus, by solving the dual QP problem, the decision function from Eq. (2) can be rewritten as
$f(x) = \mathrm{sign}\left( \sum_{i=1}^{N} \alpha_i y_i \, x^T x_i + b \right)$ ...(4)
o Only the positive multipliers influence the classification, and their corresponding training vectors are called the support vectors.
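A library such as scikit-learn exposes exactly these dual quantities after fitting; a minimal sketch on made-up toy data:

```python
import numpy as np
from sklearn.svm import SVC

# Toy data (illustrative values only)
X = np.array([[1, 2], [2, 1], [5, 6], [6, 5]])
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel="linear").fit(X, y)

print(clf.support_vectors_)   # the x_i whose multipliers alpha_i > 0
print(clf.dual_coef_)         # the products alpha_i * y_i from Eq. (4)
print(clf.intercept_)         # the bias term b
```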
26. Limitation of the linearly separable case
o The learning problem presented above is valid only when the data are linearly separable, i.e., the classes in the training data set do not overlap.
o However, such problems are rare in real life.
27. Non-separable case
o In the previous case we assumed that the data are linearly separable, but in practice this is not always so.
o The classes in real data sets overlap, and then it is not possible to classify them using a linear separating hyperplane.
28. Cont...
o Cortes and Vapnik (1995) introduced a modified maximum-margin idea called "soft-margin hyperplanes".
o In other words, a linear SVM can be refitted to learn a hyperplane that tolerates a small number of non-separable training points.
o This refitting is called the soft-margin approach; it introduces slack variables $\xi_i$ for the inseparable cases.
31. Cont...
o To find a classifier with maximum margin, the algorithm presented above must be changed to allow a soft margin (Fig. 4); it is therefore necessary to introduce non-negative slack variables $\xi_i \ge 0$ into Eq. (*):
$y_i(w^T x_i + b) \ge 1 - \xi_i, \quad i = 1, 2, \ldots, N$
32. Cont...
o Thanks to the slack variables $\xi_i$, a feasible solution always exists.
o If $0 < \xi_i < 1$, the training point does not achieve the maximum margin, but it can still be correctly classified.
o $C$ is the regularization parameter.
o If $C = \infty$, no misclassification is allowed.
o For the non-linear case, however, this does not hold; the problem may be feasible only for some value $C < \infty$.
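The effect of $C$ can be seen empirically; a minimal sketch, assuming scikit-learn and synthetic overlapping data (all values illustrative):

```python
import numpy as np
from sklearn.svm import SVC

# Made-up overlapping two-class data
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 2)),
               rng.normal(1.5, 1.0, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)

# Small C -> soft, wide margin (more slack allowed);
# large C -> fewer misclassifications tolerated, narrower margin
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C}: training accuracy={clf.score(X, y):.2f}, "
          f"support vectors={len(clf.support_vectors_)}")
```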
33. o For the optimization problem, instead of the conditions of Eq. (*), the separating hyperplane should satisfy
$\min_{w, b, \xi_i} \; \frac{1}{2}\, w \cdot w + \frac{C}{2} \sum_{i=1}^{N} \xi_i^2$
such that $y_i(w^T x_i + b) \ge 1 - \xi_i$, $\; i = 1, 2, \ldots, N$, $\; \xi_i \ge 0$, i.e.,
$w^T x_i + b \ge +1 - \xi_i$ if $y_i = +1$, $\; \xi_i \ge 0$
$w^T x_i + b \le -1 + \xi_i$ if $y_i = -1$, $\; \xi_i \ge 0$
o For the maximum soft margin, the original Lagrangian is
$L(w, b, \xi; \alpha) = \frac{1}{2}\, w \cdot w - \sum_{i=1}^{N} \alpha_i \left[ y_i(w^T x_i + b) - 1 + \xi_i \right] + \frac{C}{2} \sum_{i=1}^{N} \xi_i^2$
34. Kernels
o In the non-linearly separable case, the classifier may not have high generalization ability even if the hyperplane is optimally determined.
o The original input space is therefore transformed into a higher-dimensional (n-dimensional) space called the "feature space".
35. What does the kernel trick do?
o The kernel trick is a simple method whereby non-linear data are projected onto a higher-dimensional space so as to make it easier to classify the data linearly by a plane.
37. Cont...
o A kernel is a function $K$ such that for each $x, z \in X$,
$K(x, z) = \phi(x) \cdot \phi(z)$
where $\phi$ is a mapping from $X$ to a feature space $F$.
o The decision function then becomes
$f(x) = \mathrm{sign}\left( \sum_{i=1}^{N} \alpha_i y_i K(x_i, x) + b \right)$
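This identity can be checked numerically; the sketch below uses the degree-2 polynomial kernel $K(x, z) = (x \cdot z)^2$, whose explicit 2-D feature map is $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$ (an example chosen for illustration, not taken from the slides):

```python
import numpy as np

# Explicit feature map of the degree-2 polynomial kernel in 2-D
def phi(x):
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

# Both sides equal 121.0: K(x, z) = (x . z)^2 = phi(x) . phi(z)
print((x @ z) ** 2)       # kernel computed in the input space
print(phi(x) @ phi(z))    # dot product computed in the feature space
```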
38. o A kernel function must satisfy the following inner-product properties, for any $x, y, z \in X$ and $\alpha \in \mathbb{R}$:
1. $x \cdot x = 0$ only if $x = 0$
2. $x \cdot x > 0$ otherwise
3. $x \cdot y = y \cdot x$
4. $(\alpha x) \cdot y = \alpha (x \cdot y)$
5. $(z + x) \cdot y = z \cdot y + x \cdot y$
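Common kernels satisfying these properties (linear, polynomial, RBF) are available off the shelf; a hedged sketch, assuming scikit-learn, on XOR-like toy data that no linear hyperplane can separate:

```python
import numpy as np
from sklearn.svm import SVC

# XOR-like toy data: not separable by any linear hyperplane
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1])

for kernel in ("linear", "poly", "rbf"):
    clf = SVC(kernel=kernel, coef0=1.0, C=10.0).fit(X, y)
    print(kernel, clf.score(X, y))   # non-linear kernels can fit XOR
```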
41. Applications of SVM
1. Face detection
2. Text and hypertext categorization
3. Classification of images
4. Bioinformatics
5. Protein fold and remote homology
detection
6. Handwriting recognition
7. Generalized predictive control (GPC)
42. Advantages of SVM
1. SVM works relatively well when there is a clear margin of separation between classes.
2. SVM is effective in high-dimensional spaces.
3. SVM is effective even when the number of dimensions is greater than the number of samples.
4. SVM is relatively memory efficient.
44. Architecture:
The proposed approach consists of stages for image acquisition, image preprocessing, image segmentation, and SVM classification; the accuracy of the infected area is calculated by the SVM classifier. The approach is implemented in MATLAB.

Pipeline: Image Acquisition → Image Preprocessing → Image Segmentation → SVM Classifier → Accuracy of Infected Area
45. a) Image acquisition
The data set contains pest-infested leaf images. The leaves are infested with whiteflies.
[Figure: a sample leaf image with whiteflies]
46. b) Image preprocessing
Contrast stretching is an image enhancement technique that improves the contrast in an image by expanding the dynamic range of the intensity values it contains.
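A minimal sketch of contrast stretching in Python (the percentile limits and synthetic image are assumptions; the original work was implemented in MATLAB):

```python
import numpy as np

def contrast_stretch(img, low_pct=2, high_pct=98):
    """Expand the dynamic range: map the [low_pct, high_pct]
    percentile band of intensities onto the full [0, 255] range."""
    lo, hi = np.percentile(img, (low_pct, high_pct))
    stretched = np.clip((img.astype(float) - lo) / (hi - lo), 0.0, 1.0)
    return (stretched * 255).astype(np.uint8)

# Example on a made-up low-contrast 8-bit image
img = np.random.default_rng(0).integers(100, 140, (64, 64), dtype=np.uint8)
out = contrast_stretch(img)
print(img.min(), img.max(), "->", out.min(), out.max())
```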
47. c) Color-based segmentation using K-means clustering
The K-means clustering algorithm is an unsupervised algorithm used to segment the region of interest from the background.
K-means clustering has many applications, ranging from unsupervised learning of neural networks to pattern recognition, classification analysis, artificial intelligence, image processing, and machine vision.
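A minimal sketch of color-based segmentation with scikit-learn's KMeans (the cluster count and the synthetic stand-in image are assumptions; the original work used MATLAB):

```python
import numpy as np
from sklearn.cluster import KMeans

def segment_colors(img, n_clusters=3):
    """Cluster pixels by color; return a per-pixel cluster label map
    from which the region of interest can be picked out."""
    h, w, c = img.shape
    pixels = img.reshape(-1, c).astype(float)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=0).fit_predict(pixels)
    return labels.reshape(h, w)

# Made-up RGB image standing in for a leaf photograph
img = np.random.default_rng(0).integers(0, 256, (32, 32, 3), dtype=np.uint8)
print(np.unique(segment_colors(img)))   # e.g. [0 1 2]
```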
48. d) SVM classifier
o A support vector machine is a powerful tool for binary classification, capable of generating a very fast classifier function following a training period.
Precision = TP / (TP + FP): the percentage of returned results that are relevant.
Recall = TP / (TP + FN): the percentage of total relevant results correctly classified by the algorithm.
Accuracy = (TP + TN) / (TP + TN + FP + FN): the proportion of correct predictions.

Confusion matrix (rows: actual class, columns: predicted class):
                       Predicted +1         Predicted -1
Actual +1              True positive        False negative
Actual -1              False positive       True negative
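These metrics follow directly from the confusion matrix; a minimal sketch using scikit-learn's metric helpers on made-up labels:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Made-up true and predicted labels (+1 = infected, -1 = healthy)
y_true = [1, 1, 1, 1, -1, -1, -1, -1]
y_pred = [1, 1, 1, -1, -1, -1, 1, -1]

print("precision:", precision_score(y_true, y_pred))  # TP/(TP+FP) = 3/4
print("recall:   ", recall_score(y_true, y_pred))     # TP/(TP+FN) = 3/4
print("accuracy: ", accuracy_score(y_true, y_pred))   # (TP+TN)/total = 6/8
```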
50. Conclusion
o Image processing techniques play an important role in the detection of pests.
o Pests such as whiteflies, aphids, and thrips are very small in size and infest the leaves.
o The main objective is to detect the pest-infested region in the leaf image accurately.
o The multiclass SVM classifier is used to calculate the accuracy of the infected leaf region.
51. Summary
o SVM is a relatively recent algorithm proposed for solving classification problems.
o SVM can also be used for prediction purposes.
o The kernel trick is the main advantage of SVM, and the reason it has gained such importance.
o SVM can be used in many fields of science, depending on the objectives and application domain.
52. References
CERVANTES, J., GARCIA-LAMONT, F., RODRÍGUEZ-MAZAHUA, L. AND LOPEZ, A., 2020, A comprehensive survey on support vector machine classification: Applications, challenges and trends, Neurocomputing, 408:189-215.
JAKKULA, V., 2006, Tutorial on support vector machine (SVM), School of EECS, Washington State University, 37.
MOHAN KUMAR, T.L., 2013, Development of statistical models using nonlinear support vector machines, Doctoral dissertation, IARI-Indian Agricultural Statistics Research Institute, New Delhi.
RANI, R.U. AND AMSINI, P., 2016, Pest identification in leaf images using SVM classifier, Int. J. Computational Intelligence and Informatics, 6(1):248-260.
Speaker notes:
o S represents the decision function; h represents the number of data points.
o The objective of SRM is to minimize both the empirical risk (the training error) and the confidence interval (the capacity of the set of functions); thus the SRM principle defines a trade-off between the accuracy and the complexity of the approximation by minimizing over both terms.
o Weight vector: the weight associated with each input dimension.
o Bias: the distance of the hyperplane solution to the origin.
o We minimize $\|w\|^2$ in order to maximize the margin $2/\|w\|$; $\|w\|$ represents the magnitude of the weight vector.
o We change this to the dual problem using the Lagrange formulation, for two reasons: first, the constraints are replaced by Lagrange multipliers, which are much easier to handle; second, in the reformulated problem the training data appear only in the form of dot products between vectors.
o The Karush-Kuhn-Tucker (KKT) conditions play a very important role in the theory of optimization because they give the conditions for obtaining an optimal solution to a general optimization problem.
o Regularization parameter: how much you want to avoid misclassification.