Support Vector Machines
Amit Praseed
October 23, 2019
Introduction
Support Vector Machines (SVMs) are arguably the best-known machine learning tool. They are used extensively in text and image classification.
Linear Separability
[Figure: two classes of points in the x-y plane that can be separated by a single straight line]
Linear Inseparability (The XOR Problem)
[Figure: the XOR configuration of points in the x-y plane, which no single straight line can separate]
SVM for Linearly Separable Problems
For now, let us consider only linearly separable cases.
The only thing an SVM does is find the hyperplane in n dimensions (in two dimensions, it simply finds the line) that separates the two classes.
[Figure: two linearly separable classes in the x-y plane with a separating line between them]
Which is the Optimal Hyperplane?
[Figure: the same two classes with several candidate separating lines, all of which classify the training data correctly]
The Optimal Hyperplane
The optimal hyperplane should be maximally separated from the closest data points ("maximize the margin").
The optimal hyperplane should completely separate the data points into two perfectly pure classes.
SVM in 2 Dimensions
In two dimensions, the problem reduces to finding the optimal line separating the two classes. The conditions for the optimal hyperplane still hold in two dimensions.
Let the line we have to find be y = ax + b. Rearranging,

ax − y + b = 0

Let X = [x, y] and W = [a, −1]; the equation of the line then becomes

W · X + b = 0

which, incidentally, is also the equation of a hyperplane when generalized to n dimensions. Here W is a vector perpendicular to the hyperplane.
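As a quick numerical check (with made-up example points), the sketch below evaluates W · X + b for points on either side of the line y = 2x + 1, i.e. W = [2, −1] and b = 1; points below the line give positive values and points above give negative values.

```python
import numpy as np

# The line y = 2x + 1 written as W.X + b = 0, with W = [a, -1] = [2, -1], b = 1
W = np.array([2.0, -1.0])
b = 1.0

points = np.array([
    [1.0, 5.0],   # above the line: 2*1 - 5 + 1 = -2 < 0
    [3.0, 2.0],   # below the line: 2*3 - 2 + 1 = +5 > 0
])

for X in points:
    print(X, "->", W @ X + b)  # the sign tells us which side of the line X lies on
```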
SVM in 2 Dimensions
Let us denote the two classes by integers: the SVM will output −1 for all the blue data points and +1 for all the red data points.
It can also be noted that for all blue points, which lie on the left of the separating line, W · X + b ≤ 0, while for all red points W · X + b ≥ 0. Denote

h(x_i) = +1 if W · X_i + b ≥ 0, and −1 otherwise.
[Figure: the separating line in the x-y plane, with blue points on its negative side and red points on its positive side]
SVM in 2 Dimensions
The sign of W · X + b gives the class label.
The magnitude of W · X + b gives the distance of the data point from the separating line.
Let β_i denote the distance from a data point to the separating hyperplane P. The closest point to P then has the lowest value of β_i; let us call this value the functional margin F.
However, the distance measures on the blue side are negative while those on the red side are positive. To avoid this difficulty in comparison, we can simply multiply by the class labels, because all blue points have class label −1 and all red points have class label +1:

F = min_i y_i (W · X_i + b)
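A minimal numpy sketch of this computation, with made-up points and labels: the functional margin is the smallest value of y_i (W · X_i + b) over the training set.

```python
import numpy as np

W, b = np.array([2.0, -1.0]), 1.0   # the separating line from before

X = np.array([[1.0, 5.0], [3.0, 2.0], [2.0, 6.0]])  # example data points
y = np.array([-1, +1, -1])                          # example class labels

f = y * (X @ W + b)   # signed distances, made positive by the labels
F = f.min()           # functional margin: the closest (worst) point
print(f, F)
```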
SVM in 2 Dimensions
From a potentially infinite number of separating hyperplanes, an SVM selects the one which maximizes the margin, or in other words the one which is maximally separated from the closest data points.
Put mathematically, the SVM selects the hyperplane which maximizes the functional margin.
However, the functional margin is not scale invariant.
For instance, consider two planes given by W1 = (2, 1), b1 = 5 and W2 = (20, 10), b2 = 50. Essentially they represent the same plane, because W is a vector orthogonal to the plane P, and the only thing that matters is the direction of W. However, the second functional margin multiplies the distance of every data point from P by a factor of 10, which renders the measure ineffective for comparing hyperplanes.
We therefore compute a scale-invariant distance measure, given by

γ_i = y_i ( (W / |W|) · X_i + b / |W| )
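The sketch below (again with made-up numbers) verifies the problem: scaling (W, b) by 10 scales the functional margin by 10, while the normalized measure γ is unchanged.

```python
import numpy as np

X, y = np.array([3.0, 2.0]), +1   # a single example point and its label

for W, b in [(np.array([2.0, 1.0]), 5.0),
             (np.array([20.0, 10.0]), 50.0)]:    # the same plane, scaled by 10
    f = y * (X @ W + b)                          # functional margin: not invariant
    gamma = f / np.linalg.norm(W)                # normalized measure: invariant
    print(f, gamma)
```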
SVM in 2 Dimensions
We define the scale-invariant geometric margin as

M = min_i γ_i = min_i y_i ( (W / |W|) · X_i + b / |W| )

The SVM problem reduces to finding the hyperplane with the maximum geometric margin, or in other words,

max M   subject to   γ_i ≥ M,  i = 1, 2, ..., m
SVM in 2 Dimensions
Now, we can rewrite the above problem by recalling that M = F / |W|:

max F / |W|   subject to   f_i / |W| ≥ F / |W|,  i = 1, 2, ..., m

where f_i = y_i (W · X_i + b) is the functional margin of point i. Since we are maximizing the geometric margin, which is scale invariant, we can rescale W and b to make F = 1:

max 1 / |W|   subject to   f_i ≥ 1,  i = 1, 2, ..., m
SVM in 2 Dimensions
We can express the same as a minimization problem:

min |W|   subject to   y_i (W · X_i + b) ≥ 1,  i = 1, 2, ..., m

The final optimization problem seen in the literature is a simple modification of this:

min (1/2)|W|²   subject to   y_i (W · X_i + b) − 1 ≥ 0,  i = 1, 2, ..., m

This is a convex quadratic optimization problem, which can be solved using Lagrange multipliers.
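As a rough illustration (not part of the original slides), a linear SVM with a very large penalty behaves like a hard margin SVM on separable toy data; scikit-learn's SVC exposes the learned W and b for a linear kernel, from which the margin width 2/|W| follows.

```python
import numpy as np
from sklearn.svm import SVC

# A tiny separable toy dataset, made up for illustration
X = np.array([[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6]], dtype=float)
y = np.array([-1, -1, -1, +1, +1, +1])

clf = SVC(kernel="linear", C=1e10)  # a huge C approximates the hard margin
clf.fit(X, y)

W, b = clf.coef_[0], clf.intercept_[0]
print("W =", W, "b =", b)
print("margin width =", 2 / np.linalg.norm(W))
```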
Issues with Hard Margin SVMs
[Figure: a single outlier forces the separating line very close to both classes (very small margin due to outliers)]
[Figure: outliers lying inside the opposite class make the data linearly inseparable (linearly inseparable due to outliers)]
Soft Margin SVMs
Hard Margin SVMs have a strict rule that every point in the training dataset must be classified correctly.
Soft Margin SVMs allow some data points in the training dataset to be misclassified if this allows for a better margin and a better decision boundary.
They introduce slack variables ζ_i to allow for this, so that the constraint becomes

y_i (W · X_i + b) ≥ 1 − ζ_i,  i = 1, 2, ..., m

The complete minimization problem for a soft margin SVM becomes

min (1/2)|W|² + C Σ_{i=1..m} ζ_i
subject to y_i (W · X_i + b) ≥ 1 − ζ_i,  ζ_i ≥ 0,  i = 1, 2, ..., m

The additional term C Σ ζ_i in the objective penalizes large slack values, which would otherwise satisfy all the constraints while permitting a large misclassification error; the constant C controls the trade-off between margin width and misclassification.
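A brief sketch (toy data and parameter values made up) of how this trade-off behaves in practice with scikit-learn, where the penalty constant is the C argument of SVC: a small C tolerates misclassified points in exchange for a wider margin, while a large C approaches the hard margin behaviour.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([-1] * 50 + [+1] * 50)   # overlapping classes: not separable

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin = 2 / np.linalg.norm(clf.coef_[0])
    print(f"C={C}: margin width={margin:.3f}, support vectors={len(clf.support_)}")
```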
A Look Back at Linear Separability
[Figure: one-dimensional data with the two classes interleaved along the x-axis; no single threshold separates them]
The Kernel Trick
A data set that appears to be linearly inseparable in a lower-dimensional space can be made linearly separable by projecting it onto a higher-dimensional space.
Instead of explicitly transforming all data points into the new higher-dimensional space, an SVM uses a kernel that computes the inner products between data points as if they belonged to the higher-dimensional space. This is often called the Kernel Trick.
For the earlier dataset, let us apply the mapping φ : R → R²

φ(x) = (x, x²)
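A small sketch of this particular mapping on a made-up one-dimensional dataset: after applying φ(x) = (x, x²), a plain linear SVM separates the two classes in the plane.

```python
import numpy as np
from sklearn.svm import SVC

x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([+1, +1, -1, -1, -1, +1, +1])   # outer points vs inner points

phi = np.column_stack([x, x ** 2])           # the map R -> R^2: x |-> (x, x^2)

clf = SVC(kernel="linear", C=1e6).fit(phi, y)
print(clf.predict(phi))                      # the mapped data is separated perfectly
```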
Linearly Separable in 2D
[Figure: the same data after the mapping φ(x) = (x, x²); the classes now lie along a parabola and can be separated by a straight line]
Another Example
[Figure: two classes in the x-y plane that are not linearly separable]

φ(x, y) = (x², √2·xy, y²)
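A side note that can be checked numerically: for this mapping, the inner product after the transformation equals the squared inner product in the original space, φ(u) · φ(v) = (u · v)², which is exactly the degree-2 polynomial kernel. The kernel therefore supplies the higher-dimensional inner products without ever computing φ explicitly. The vectors below are arbitrary examples.

```python
import numpy as np

def phi(p):
    x, y = p
    return np.array([x ** 2, np.sqrt(2) * x * y, y ** 2])

u, v = np.array([3.0, 1.0]), np.array([2.0, 4.0])

lhs = phi(u) @ phi(v)   # inner product after the explicit mapping
rhs = (u @ v) ** 2      # degree-2 polynomial kernel in the original space
print(lhs, rhs)         # both print 100.0: the kernel trick in miniature
```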
The Kernel Trick
The data, when projected into 3 dimensions, is linearly separable.
Types of Kernels
Linear Kernel
Polynomial Kernel
RBF or Gaussian Kernel: an RBF kernel theoretically transforms the data into an infinite-dimensional space for classification. It is usually recommended to try an RBF kernel first, because it tends to give good results.
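In scikit-learn these choices correspond directly to the kernel argument of SVC; a quick sketch on toy ring-shaped data (parameter values arbitrary):

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

for kernel in ("linear", "poly", "rbf"):
    clf = SVC(kernel=kernel, degree=2, gamma="scale").fit(X, y)
    print(kernel, clf.score(X, y))   # poly and rbf handle the rings; linear cannot
```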
Multi-class Classification using SVM
SVMs are primarily built for binary classification.
However, two approaches can be used to extend an SVM to handle multiple classes:
One against All
One against One
One against All SVM
In One against All SVM, an n-class classification problem is decomposed into n binary classification problems.
For example, a problem involving three classes A, B and C is decomposed into three binary classifiers:
A binary classifier to recognize A
A binary classifier to recognize B
A binary classifier to recognize C
However, this approach can lead to inconsistencies.
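A minimal sketch of this decomposition using scikit-learn's OneVsRestClassifier, on made-up blob data:

```python
from sklearn.datasets import make_blobs
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = make_blobs(n_samples=150, centers=3, random_state=0)  # classes 0, 1, 2

ova = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)
print(len(ova.estimators_))   # 3 binary classifiers, one per class
print(ova.predict(X[:5]))
```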
One against All SVM
[Figure: the three one-against-all decision boundaries for classes A, B and C]
One against All leads to Ambiguity
Data points falling in the blue regions will be classified into two classes simultaneously.
One against One SVM
In One against One SVM, instead of pitting one class against all the others, classes are pitted against each other pairwise.
This leads to a total of k(k − 1)/2 classifiers for a k-class classification problem.
The class of a data point is decided by simple majority voting.
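The corresponding sketch with scikit-learn's OneVsOneClassifier, on the same made-up blob data:

```python
from sklearn.datasets import make_blobs
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import SVC

X, y = make_blobs(n_samples=150, centers=3, random_state=0)

ovo = OneVsOneClassifier(SVC(kernel="linear")).fit(X, y)
print(len(ovo.estimators_))   # k(k-1)/2 = 3 pairwise classifiers for k = 3
print(ovo.predict(X[:5]))
```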
One against One SVM
One against One can still lead to ambiguity if two classes receive the same number of votes.
Anomaly Detection
The fundamental notion of classification is that there are two or more classes that are well represented in the training data, and the classifier must assign one of the labels to the incoming test data.
However, in a number of scenarios only one class of data is available. The other class is either not available at all, or has very few samples.
E.g., security-related domains, such as detecting cyber-attacks or detecting abnormal program execution.
In such scenarios, obtaining training data corresponding to attacks or abnormal runs is expensive or infeasible.
The commonly used approach is to train the classifier on the available data, which models normal behaviour, and to use it to flag any anomalies in the test data.
This is often called anomaly detection.
One Class SVMs
SVMs can be used for anomaly detection by modifying the constraints and the objective function.
Instead of maximizing the margin between classes as in a normal SVM, the One Class SVM tries to form a hypersphere (with radius R and centre a) around the training data, and minimizes the radius of the sphere:

min R² + C Σ_{i=1..N} ζ_i
subject to ||x_i − a||² ≤ R² + ζ_i,  ζ_i ≥ 0,  i = 1, 2, ..., N
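For completeness, a minimal usage sketch with scikit-learn's OneClassSVM, which implements a closely related one-class formulation (with an RBF kernel it behaves much like the hypersphere model above); the data and parameter values here are made up.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(0, 1, (200, 2))          # only "normal" data is available

oc = OneClassSVM(kernel="rbf", nu=0.05).fit(X_train)

X_test = np.array([[0.1, -0.2], [6.0, 6.0]])  # one normal point, one anomaly
print(oc.predict(X_test))                     # +1 = normal, -1 = anomaly
```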