Support Vector Machine
Agenda
• SVM basics
• SVM Objective Function
• Slack Variable
• Kernel Trick
• SVM Hyperparameters
• Coding
AI / ML
Machine Learning
Using computer algorithms to uncover insights, determine relationships, and make predictions about future trends.
Artificial Intelligence
Enabling computer systems to perform tasks that ordinarily require human intelligence.
We use machine learning methods to create AI systems.
Machine Learning Paradigms
• Unsupervised Learning
• Find structure in data. (Clusters, Density, Patterns)
• Supervised Learning
• Find a mapping from features to labels
Support Vector Machine
• Supervised machine learning algorithm.
• Can be used for classification or regression.
• Works well with small datasets.
Classification
• Classification using SVM
• 2-class problem, linearly separable data
The “Best” Separation Boundary
This is the widest road that separates the two groups.
The “Best” Separation Boundary
This is the widest margin that separates the two groups.
The “Best” Separation Boundary
The distance between the points and the line is as large as possible.
The “Best” Separation Boundary
The distance between the support vectors and the line is as large as possible.
[Figure: the margin, with the support vectors sitting on its edges]
The “Best” Separation Boundary
This hyperplane is the optimal hyperplane because it is as far as possible from the support vectors.
[Figure: the maximum-margin hyperplane, with the support vectors and margin labeled]
SVM Objective Function
Decision Rule
[Figure: unknown vector u projected onto w, the normal to the separating boundary]
• w : a vector normal to the boundary, of any length
• u : an unknown vector; which class does it belong to?
Project u onto w and check which side of the boundary it lands on:
if $w \cdot u + b \ge 0$, then the unknown vector will be classified as +
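A minimal numeric sketch of this rule (the boundary w, b and the sample u below are made-up values, not from the slides): project the unknown vector onto w and check the sign.

```python
import numpy as np

# Hypothetical separating boundary: normal vector w and bias b (made-up values).
w = np.array([1.0, 1.0])
b = -3.0

# Unknown sample u that we want to classify.
u = np.array([2.5, 2.0])

# Decision rule: w . u + b >= 0  ->  classify as +, otherwise as -.
score = np.dot(w, u) + b
label = "+" if score >= 0 else "-"
print(f"w.u + b = {score:.2f}  ->  classified as {label}")
```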
Constraints
[Figure: positive and negative samples on either side of the road]
Constraint for positive samples: $w \cdot x_+ + b \ge 1$
Likewise for negative samples: $w \cdot x_- + b \le -1$
Combining Constraints
Constraint for positive samples: $w \cdot x_+ + b \ge 1$
Constraint for negative samples: $w \cdot x_- + b \le -1$
To bring the above inequalities together we introduce another variable, $y_i$, with $y_i = +1$ for positive samples and $y_i = -1$ for negative samples:
$y_i (w \cdot x_i + b) - 1 \ge 0$
For support vectors: $y_i (w \cdot x_i + b) - 1 = 0$
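A quick check of the combined constraint on made-up numbers (the boundary and the points are illustrative, not from the slides): every correctly separated sample satisfies $y_i (w \cdot x_i + b) - 1 \ge 0$, with equality exactly for the support vectors.

```python
import numpy as np

w, b = np.array([1.0, 1.0]), -3.0          # hypothetical boundary
X = np.array([[2.0, 2.0], [3.0, 3.0],      # positive samples
              [1.0, 1.0], [0.0, 0.5]])     # negative samples
y = np.array([1, 1, -1, -1])

# Combined constraint: y_i * (w . x_i + b) - 1 >= 0 for all i,
# and == 0 exactly for the support vectors (points on the gutter).
margins = y * (X @ w + b) - 1
for xi, m in zip(X, margins):
    tag = "support vector" if np.isclose(m, 0) else "strictly inside its class region"
    print(xi, f"constraint value = {m:.2f}", "->", tag)
```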
Width
[Figure: the width of the road measured along the unit normal w / ‖w‖]
$\text{width} = (x_+ - x_-) \cdot \frac{w}{\|w\|}$
In the equation above $x_+$ and $x_-$ are in the gutter (on the hyperplanes bounding the separation), so for positive samples $w \cdot x_+ = 1 - b$ and likewise for negative samples $w \cdot x_- = -1 - b$. Substituting,
$\text{width} = \frac{(1 - b) - (-1 - b)}{\|w\|} = \frac{2}{\|w\|}$
Maximize $\frac{2}{\|w\|}$.
SVM Objective
OBJECTIVE: maximize $\frac{2}{\|w\|}$, which is the same as minimizing $\frac{1}{2}\|w\|^2$
CONSTRAINT: $y_i (w \cdot x_i + b) - 1 \ge 0$
This is a constrained optimization problem.
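To make the constrained optimization concrete, the sketch below feeds this primal (minimize $\frac{1}{2}\|w\|^2$ subject to $y_i (w \cdot x_i + b) \ge 1$) to scipy.optimize.minimize on a tiny made-up dataset; scipy and the data points are illustrative choices, not part of the slides. The printed margin width is $2/\|w\|$ from the previous slide.

```python
import numpy as np
from scipy.optimize import minimize

# Tiny made-up 2-D dataset: two positive and two negative samples.
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Decision variables packed as v = [w1, w2, b].
def objective(v):
    w = v[:2]
    return 0.5 * np.dot(w, w)            # minimize (1/2) ||w||^2

# One inequality constraint per sample: y_i * (w . x_i + b) - 1 >= 0.
constraints = [
    {"type": "ineq",
     "fun": lambda v, xi=xi, yi=yi: yi * (np.dot(v[:2], xi) + v[2]) - 1.0}
    for xi, yi in zip(X, y)
]

# A feasible starting guess for the solver.
res = minimize(objective, x0=np.array([1.0, 1.0, -3.0]), constraints=constraints)
w, b = res.x[:2], res.x[2]
print("w =", np.round(w, 3), " b =", round(b, 3),
      " margin width =", round(2.0 / np.linalg.norm(w), 3))
```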
Lagrange Multipliers
Lagrangian (the OBJECTIVE with the CONSTRAINTs attached through multipliers $\alpha_i \ge 0$):
$L_P = \frac{1}{2}\|w\|^2 - \sum_i \alpha_i \left[ y_i (w \cdot x_i + b) - 1 \right]$
Solving the PRIMAL:
$\frac{\partial L_P}{\partial w} = 0 \;\Rightarrow\; w = \sum_i \alpha_i y_i x_i$
$\frac{\partial L_P}{\partial b} = 0 \;\Rightarrow\; \sum_i \alpha_i y_i = 0$
The normal vector $w$ is a linear combination of the support vectors.
PRIMAL → DUAL
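Substituting $w = \sum_i \alpha_i y_i x_i$ and $\sum_i \alpha_i y_i = 0$ back into $L_P$ eliminates $w$ and $b$; a brief worked step (not spelled out on the slide) showing where the dual objective comes from:

```latex
\begin{aligned}
L_P &= \tfrac{1}{2}\,w \cdot w - \sum_i \alpha_i y_i \, w \cdot x_i - b \sum_i \alpha_i y_i + \sum_i \alpha_i \\
    &= \tfrac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j
       - \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j + 0 + \sum_i \alpha_i \\
    &= \sum_i \alpha_i - \tfrac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j \;=\; L_D
\end{aligned}
```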
SVM Objective (DUAL)
OBJECTIVE: maximize
$L_D = \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j$
CONSTRAINT: $\alpha_i \ge 0$ and $\sum_i \alpha_i y_i = 0$
The SVM objective now depends only on the dot products of pairs of support vectors.
Decision Rule
With $w = \sum_i \alpha_i y_i x_i$, the decision rule becomes: if $\sum_i \alpha_i y_i \, x_i \cdot u + b \ge 0$, classify the new sample $u$ as +.
So whether a new sample falls on the positive side of the road depends only on the dot products of the support vectors with the unknown sample.
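A sketch of this with scikit-learn (the library and the toy data are my choices, not from the slides): for a fitted linear-kernel SVC, dual_coef_ stores $\alpha_i y_i$ for the support vectors, so summing dual_coef_ times the dot products with a new sample, plus the intercept, reproduces decision_function.

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-D data (illustrative).
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [1.0, 0.0]])
y = np.array([1, 1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # very large C ~ hard margin

u = np.array([2.5, 1.0])                      # unknown sample

# Manual decision rule: sum_i (alpha_i * y_i) * (x_i . u) + b,
# where the sum runs over the support vectors only.
manual = np.dot(clf.dual_coef_[0], clf.support_vectors_ @ u) + clf.intercept_[0]
print("manual score :", manual)
print("sklearn score:", clf.decision_function([u])[0])
print("class        :", "+" if manual >= 0 else "-")
```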
Points to Consider
• The SVM problem is a constrained minimization problem.
• To find the widest road between the two classes we only need the dot products of the support vectors.
Slack variable
Separable Case
[Figure: all samples on the correct side of the margin]
Non-Separable Case
[Figure: some samples fall on the wrong side of the margin]
Slack Variables
A slack variable $\xi_i \ge 0$ measures how far sample $i$ is allowed to violate its margin constraint; the penalty parameter $C$ controls how much each violation costs.
PRIMAL Objective
LINEARLY SEPARABLE CASE: minimize $\frac{1}{2}\|w\|^2$ subject to $y_i (w \cdot x_i + b) \ge 1$
LINEARLY NON-SEPARABLE CASE: minimize $\frac{1}{2}\|w\|^2 + C \sum_i \xi_i$ subject to $y_i (w \cdot x_i + b) \ge 1 - \xi_i$, $\xi_i \ge 0$
DUAL Objective
LINEARLY SEPARABLE CASE: maximize $L_D = \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j$ subject to $\alpha_i \ge 0$, $\sum_i \alpha_i y_i = 0$
LINEARLY NON-SEPARABLE CASE: the same objective, with the box constraint $0 \le \alpha_i \le C$, $\sum_i \alpha_i y_i = 0$
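A small scikit-learn sketch of the non-separable case (the overlapping blobs and the specific C values are illustrative choices of mine): C is the slack penalty, so a smaller C tolerates more margin violations and typically leaves more samples as support vectors.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Overlapping (non-separable) 2-D blobs; cluster_std is large on purpose.
X, y = make_blobs(n_samples=200, centers=2, cluster_std=3.0, random_state=0)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C = {C:>6}: {clf.n_support_.sum():3d} support vectors, "
          f"training accuracy = {clf.score(X, y):.2f}")
```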
KERNEL TRICK
Increasing Model Complexity
• Non-linear dataset with n features (~n-dimensional)
• Match the complexity of the data with the complexity of the model.
Linear Classifier?
• Improve accuracy by transforming the input feature space (see the sketch below).
For datasets with many features, it becomes next to impossible to try out all interesting transformations.
https://www.youtube.com/watch?v=3liCbRZPrZA
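A sketch of transforming the input feature space by hand (the circular dataset and the chosen transformation are illustrative, not from the slides): points inside vs. outside a circle are not linearly separable in $(x_1, x_2)$, but adding the squared features makes a linear classifier work.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(400, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 0.5).astype(int)   # inside-circle vs outside

# A linear classifier on the raw features cannot separate a circle...
raw_acc = SVC(kernel="linear").fit(X, y).score(X, y)

# ...but the hand-crafted transformation phi(x) = (x1, x2, x1^2, x2^2) makes the
# classes linearly separable: the boundary x1^2 + x2^2 = 0.5 is a plane there.
Phi = np.c_[X, X ** 2]
transformed_acc = SVC(kernel="linear").fit(Phi, y).score(Phi, y)

print(f"accuracy on raw features        : {raw_acc:.2f}")
print(f"accuracy on transformed features: {transformed_acc:.2f}")
```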
Increasing Model Capacity
LINEAR CLASSIFIERS:
$y(x) = w_0 + w^T x$
GENERALIZED LINEAR CLASSIFIERS:
$y(x) = w_0 + \sum_{j=1}^{M} w_j \phi_j(x) = \sum_{j=0}^{M} w_j \phi_j(x)$
KERNEL TRICK
$y(x) = w_0 + w^T x \;\longrightarrow\; y(x) = w_0 + \sum_{j=1}^{M} w_j \phi_j(x)$
$L_D = \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j \;\longrightarrow\; L_D = \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \, \phi(x_i) \cdot \phi(x_j)$
Kernel Trick
• For a given pair of vectors (in a lower-dimensional feature space) and a transformation into a higher-dimensional space, there exists a function (the kernel function) that computes the dot product in the higher-dimensional space without explicitly transforming the vectors into the higher-dimensional space first.
$L_D = \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \, \phi(x_i) \cdot \phi(x_j)$
KERNEL FUNCTION: $K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$
$L_D = \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \, K(x_i, x_j)$
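A numeric check of this claim for one concrete case (the degree-2 polynomial kernel and the explicit map below are standard textbook choices, not taken from the slides): $K(x, z) = (x \cdot z)^2$ equals $\phi(x) \cdot \phi(z)$ with $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$, so the dot product in the 3-D space is computed without ever building $\phi$.

```python
import numpy as np

def phi(v):
    """Explicit map into the higher-dimensional space for the (x.z)^2 kernel."""
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def kernel(x, z):
    """Degree-2 polynomial kernel: the same dot product, computed in the original space."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

print("phi(x) . phi(z) =", np.dot(phi(x), phi(z)))   # explicit transformation
print("K(x, z)         =", kernel(x, z))             # kernel trick, same number
```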
Kernel functions
SVM Hyperparameters
• Parameter C: penalty parameter for margin violations (slack)
• Parameter gamma: specific to the Gaussian RBF kernel
• Large value of C => small margin (violations are punished heavily)
• Small value of C => large margin (more violations are tolerated)
• Large value of gamma => narrow Gaussian (each support vector has short-range influence)
• Small value of gamma => wide Gaussian (each support vector has long-range influence), as the sketch below illustrates
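A sketch of how C and gamma behave in practice (toy moons data and scikit-learn's RBF SVC; the specific values are illustrative): small C and small gamma give smoother, wider-margin boundaries with many support vectors, while large C and large gamma give tighter boundaries that track individual training points.

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.25, random_state=0)

for C in (0.1, 10.0):
    for gamma in (0.1, 10.0):
        clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(X, y)
        print(f"C = {C:>5}, gamma = {gamma:>5} -> "
              f"{clf.n_support_.sum():3d} support vectors, "
              f"training accuracy = {clf.score(X, y):.2f}")
```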
Code
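The deck's live coding demo is not included in the extracted text; below is a minimal end-to-end sketch of what such a demo typically looks like, using scikit-learn's SVC on the Iris dataset (the dataset, pipeline, and parameter values are my illustrative choices).

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load a small benchmark dataset and split it into train / test sets.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# Scale features, then fit an RBF-kernel SVM.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X_train, y_train)

# Evaluate on the held-out split.
y_pred = model.predict(X_test)
print("test accuracy:", accuracy_score(y_test, y_pred))
```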
What I really do?
Questions
Source
• https://www.quora.com/What-are-C-and-gamma-with-regards-to-a-support-vector-machine
• https://www.quora.com/How-can-I-choose-the-parameter-C-for-SVM
• https://www.youtube.com/watch?v=_PwhiWxHK8o
• https://www.youtube.com/watch?v=N1vOgolbjSc
• https://medium.com/@pushkarmandot/what-is-the-significance-of-c-value-in-support-vector-machine-28224e852c5a
• https://towardsdatascience.com/understanding-support-vector-machine-part-1-lagrange-multipliers-5c24a52ffc5e
• https://towardsdatascience.com/understanding-support-vector-machine-part-2-kernel-trick-mercers-theorem-e1e6848c6c4d
• http://web.mit.edu/6.034/wwwbob/svm-notes-long-08.pdf
• https://www.quora.com/What-is-the-kernel-trick