Support Vector Machine
Agenda
• SVM basics
• SVM Objective Function
• Slack Variable
• Kernel Trick
• SVM Hyperparameters
• Coding
AI / ML
Machine Learning
Using computer algorithms to uncover insights, determine relationships, and make predictions about future trends.
Artificial Intelligence
Enabling computer systems to perform tasks that ordinarily require human intelligence.
We use machine learning methods to create AI systems.
Machine Learning Paradigms
• Unsupervised Learning
• Find structure in data. (Clusters, Density, Patterns)
• Supervised Learning
• Find a mapping from features to labels
Support Vector Machine
• Supervised machine learning algorithm.
• Can be used for classification or regression.
• Works well with small datasets.
Classification
• Classification using SVM
• 2-class problem, linearly separable data
The “Best” Separation Boundary
This is the widest road that separates the two groups.
The “Best” Separation Boundary
This is the widest margin that separates the two groups.
The “Best” Separation Boundary
The distance between the points and the line is as large as possible.
The “Best” Separation Boundary
The distance between the support vectors and the line is as large as possible.
[Figure: the margin, with the support vectors sitting on its edges]
The “Best” Separation Boundary
This hyperplane is the optimal hyperplane because it is as far as possible from the support vectors.
[Figure: the maximum-margin hyperplane, with the support vectors and margin labeled]
SVM Objective Function
Decision Rule
[Figure: unknown vector u projected onto w, the normal to the separating boundary]
• w : a vector normal to the boundary, of any length
• u : an unknown vector; which class does it belong to?
Project u onto w and check which side of the boundary it lands on:
if $w \cdot u + b \ge 0$, then the unknown vector will be classified as +
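A minimal numeric sketch of this rule (the boundary w, b and the sample u below are made-up values, not from the slides): project the unknown vector onto w and check the sign.

```python
import numpy as np

# Hypothetical separating boundary: normal vector w and bias b (made-up values).
w = np.array([1.0, 1.0])
b = -3.0

# Unknown sample u that we want to classify.
u = np.array([2.5, 2.0])

# Decision rule: w . u + b >= 0  ->  classify as +, otherwise as -.
score = np.dot(w, u) + b
label = "+" if score >= 0 else "-"
print(f"w.u + b = {score:.2f}  ->  classified as {label}")
```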
Constraints
[Figure: positive and negative samples on either side of the road]
Constraint for positive samples: $w \cdot x_+ + b \ge 1$
Likewise for negative samples: $w \cdot x_- + b \le -1$
Combining Constraints
Constraint for positive samples: $w \cdot x_+ + b \ge 1$
Constraint for negative samples: $w \cdot x_- + b \le -1$
To bring the above inequalities together we introduce another variable, $y_i$, with $y_i = +1$ for positive samples and $y_i = -1$ for negative samples:
$y_i (w \cdot x_i + b) - 1 \ge 0$
For support vectors: $y_i (w \cdot x_i + b) - 1 = 0$
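A quick check of the combined constraint on made-up numbers (the boundary and the points are illustrative, not from the slides): every correctly separated sample satisfies $y_i (w \cdot x_i + b) - 1 \ge 0$, with equality exactly for the support vectors.

```python
import numpy as np

w, b = np.array([1.0, 1.0]), -3.0          # hypothetical boundary
X = np.array([[2.0, 2.0], [3.0, 3.0],      # positive samples
              [1.0, 1.0], [0.0, 0.5]])     # negative samples
y = np.array([1, 1, -1, -1])

# Combined constraint: y_i * (w . x_i + b) - 1 >= 0 for all i,
# and == 0 exactly for the support vectors (points on the gutter).
margins = y * (X @ w + b) - 1
for xi, m in zip(X, margins):
    tag = "support vector" if np.isclose(m, 0) else "strictly inside its class region"
    print(xi, f"constraint value = {m:.2f}", "->", tag)
```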
Width
[Figure: the width of the road measured along the unit normal w / ‖w‖]
$\text{width} = (x_+ - x_-) \cdot \frac{w}{\|w\|}$
In the equation above $x_+$ and $x_-$ are in the gutter (on the hyperplanes bounding the separation), so for positive samples $w \cdot x_+ = 1 - b$ and likewise for negative samples $w \cdot x_- = -1 - b$. Substituting,
$\text{width} = \frac{(1 - b) - (-1 - b)}{\|w\|} = \frac{2}{\|w\|}$
Maximize $\frac{2}{\|w\|}$.
SVM Objective
OBJECTIVE: maximize $\frac{2}{\|w\|}$, which is the same as minimizing $\frac{1}{2}\|w\|^2$
CONSTRAINT: $y_i (w \cdot x_i + b) - 1 \ge 0$
This is a constrained optimization problem.
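To make the constrained optimization concrete, the sketch below feeds this primal (minimize $\frac{1}{2}\|w\|^2$ subject to $y_i (w \cdot x_i + b) \ge 1$) to scipy.optimize.minimize on a tiny made-up dataset; scipy and the data points are illustrative choices, not part of the slides. The printed margin width is $2/\|w\|$ from the previous slide.

```python
import numpy as np
from scipy.optimize import minimize

# Tiny made-up 2-D dataset: two positive and two negative samples.
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Decision variables packed as v = [w1, w2, b].
def objective(v):
    w = v[:2]
    return 0.5 * np.dot(w, w)            # minimize (1/2) ||w||^2

# One inequality constraint per sample: y_i * (w . x_i + b) - 1 >= 0.
constraints = [
    {"type": "ineq",
     "fun": lambda v, xi=xi, yi=yi: yi * (np.dot(v[:2], xi) + v[2]) - 1.0}
    for xi, yi in zip(X, y)
]

# A feasible starting guess for the solver.
res = minimize(objective, x0=np.array([1.0, 1.0, -3.0]), constraints=constraints)
w, b = res.x[:2], res.x[2]
print("w =", np.round(w, 3), " b =", round(b, 3),
      " margin width =", round(2.0 / np.linalg.norm(w), 3))
```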
Lagrange Multipliers
Lagrangian (the OBJECTIVE with the CONSTRAINTs attached through multipliers $\alpha_i \ge 0$):
$L_P = \frac{1}{2}\|w\|^2 - \sum_i \alpha_i \left[ y_i (w \cdot x_i + b) - 1 \right]$
Solving the PRIMAL:
$\frac{\partial L_P}{\partial w} = 0 \;\Rightarrow\; w = \sum_i \alpha_i y_i x_i$
$\frac{\partial L_P}{\partial b} = 0 \;\Rightarrow\; \sum_i \alpha_i y_i = 0$
The normal vector $w$ is a linear combination of the support vectors.
PRIMAL → DUAL
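Substituting $w = \sum_i \alpha_i y_i x_i$ and $\sum_i \alpha_i y_i = 0$ back into $L_P$ eliminates $w$ and $b$; a brief worked step (not spelled out on the slide) showing where the dual objective comes from:

```latex
\begin{aligned}
L_P &= \tfrac{1}{2}\,w \cdot w - \sum_i \alpha_i y_i \, w \cdot x_i - b \sum_i \alpha_i y_i + \sum_i \alpha_i \\
    &= \tfrac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j
       - \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j + 0 + \sum_i \alpha_i \\
    &= \sum_i \alpha_i - \tfrac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j \;=\; L_D
\end{aligned}
```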
SVM Objective (DUAL)
OBJECTIVE: maximize
$L_D = \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j$
CONSTRAINT: $\alpha_i \ge 0$ and $\sum_i \alpha_i y_i = 0$
The SVM objective now depends only on the dot products of pairs of support vectors.
Decision Rule
With $w = \sum_i \alpha_i y_i x_i$, the decision rule becomes: if $\sum_i \alpha_i y_i \, x_i \cdot u + b \ge 0$, classify the new sample $u$ as +.
So whether a new sample falls on the positive side of the road depends only on the dot products of the support vectors with the unknown sample.
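A sketch of this with scikit-learn (the library and the toy data are my choices, not from the slides): for a fitted linear-kernel SVC, dual_coef_ stores $\alpha_i y_i$ for the support vectors, so summing dual_coef_ times the dot products with a new sample, plus the intercept, reproduces decision_function.

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-D data (illustrative).
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [1.0, 0.0]])
y = np.array([1, 1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # very large C ~ hard margin

u = np.array([2.5, 1.0])                      # unknown sample

# Manual decision rule: sum_i (alpha_i * y_i) * (x_i . u) + b,
# where the sum runs over the support vectors only.
manual = np.dot(clf.dual_coef_[0], clf.support_vectors_ @ u) + clf.intercept_[0]
print("manual score :", manual)
print("sklearn score:", clf.decision_function([u])[0])
print("class        :", "+" if manual >= 0 else "-")
```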
Points to Consider
• The SVM problem is a constrained minimization problem.
• To find the widest road between the two classes we only need the dot products of the support vectors.
Slack variable
Separable Case
[Figure: all samples on the correct side of the margin]
Non-Separable Case
[Figure: some samples fall on the wrong side of the margin]
Slack Variables
A slack variable $\xi_i \ge 0$ measures how far sample $i$ is allowed to violate its margin constraint; the penalty parameter $C$ controls how much each violation costs.
PRIMAL Objective
LINEARLY SEPARABLE CASE: minimize $\frac{1}{2}\|w\|^2$ subject to $y_i (w \cdot x_i + b) \ge 1$
LINEARLY NON-SEPARABLE CASE: minimize $\frac{1}{2}\|w\|^2 + C \sum_i \xi_i$ subject to $y_i (w \cdot x_i + b) \ge 1 - \xi_i$, $\xi_i \ge 0$
DUAL Objective
LINEARLY SEPARABLE CASE: maximize $L_D = \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j$ subject to $\alpha_i \ge 0$, $\sum_i \alpha_i y_i = 0$
LINEARLY NON-SEPARABLE CASE: the same objective, with the box constraint $0 \le \alpha_i \le C$, $\sum_i \alpha_i y_i = 0$
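A small scikit-learn sketch of the non-separable case (the overlapping blobs and the specific C values are illustrative choices of mine): C is the slack penalty, so a smaller C tolerates more margin violations and typically leaves more samples as support vectors.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Overlapping (non-separable) 2-D blobs; cluster_std is large on purpose.
X, y = make_blobs(n_samples=200, centers=2, cluster_std=3.0, random_state=0)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C = {C:>6}: {clf.n_support_.sum():3d} support vectors, "
          f"training accuracy = {clf.score(X, y):.2f}")
```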
KERNEL TRICK
Increasing Model Complexity
• Non-linear dataset with n features (~n-dimensional)
• Match the complexity of the data with the complexity of the model.
Linear Classifier?
• Improve accuracy by transforming the input feature space (see the sketch below).
For datasets with many features, it becomes next to impossible to try out all interesting transformations.
https://www.youtube.com/watch?v=3liCbRZPrZA
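A sketch of transforming the input feature space by hand (the circular dataset and the chosen transformation are illustrative, not from the slides): points inside vs. outside a circle are not linearly separable in $(x_1, x_2)$, but adding the squared features makes a linear classifier work.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(400, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 0.5).astype(int)   # inside-circle vs outside

# A linear classifier on the raw features cannot separate a circle...
raw_acc = SVC(kernel="linear").fit(X, y).score(X, y)

# ...but the hand-crafted transformation phi(x) = (x1, x2, x1^2, x2^2) makes the
# classes linearly separable: the boundary x1^2 + x2^2 = 0.5 is a plane there.
Phi = np.c_[X, X ** 2]
transformed_acc = SVC(kernel="linear").fit(Phi, y).score(Phi, y)

print(f"accuracy on raw features        : {raw_acc:.2f}")
print(f"accuracy on transformed features: {transformed_acc:.2f}")
```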
Increasing Model Capacity
LINEAR CLASSIFIERS:
$y(x) = w_0 + w^T x$
GENERALIZED LINEAR CLASSIFIERS:
$y(x) = w_0 + \sum_{j=1}^{M} w_j \phi_j(x) = \sum_{j=0}^{M} w_j \phi_j(x)$
KERNEL TRICK
$y(x) = w_0 + w^T x \;\longrightarrow\; y(x) = w_0 + \sum_{j=1}^{M} w_j \phi_j(x)$
$L_D = \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j \;\longrightarrow\; L_D = \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \, \phi(x_i) \cdot \phi(x_j)$
Kernel Trick
• For a given pair of vectors (in a lower-dimensional feature space) and a transformation into a higher-dimensional space, there exists a function (the kernel function) that computes the dot product in the higher-dimensional space without explicitly transforming the vectors into the higher-dimensional space first.
$L_D = \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \, \phi(x_i) \cdot \phi(x_j)$
KERNEL FUNCTION: $K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$
$L_D = \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \, K(x_i, x_j)$
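A numeric check of this claim for one concrete case (the degree-2 polynomial kernel and the explicit map below are standard textbook choices, not taken from the slides): $K(x, z) = (x \cdot z)^2$ equals $\phi(x) \cdot \phi(z)$ with $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$, so the dot product in the 3-D space is computed without ever building $\phi$.

```python
import numpy as np

def phi(v):
    """Explicit map into the higher-dimensional space for the (x.z)^2 kernel."""
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def kernel(x, z):
    """Degree-2 polynomial kernel: the same dot product, computed in the original space."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

print("phi(x) . phi(z) =", np.dot(phi(x), phi(z)))   # explicit transformation
print("K(x, z)         =", kernel(x, z))             # kernel trick, same number
```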
Kernel functions
SVM Hyperparameters
• Parameter C: penalty parameter for margin violations (slack)
• Parameter gamma: specific to the Gaussian RBF kernel
• Large value of C => small margin (violations are punished heavily)
• Small value of C => large margin (more violations are tolerated)
• Large value of gamma => narrow Gaussian (each support vector has short-range influence)
• Small value of gamma => wide Gaussian (each support vector has long-range influence), as the sketch below illustrates
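A sketch of how C and gamma behave in practice (toy moons data and scikit-learn's RBF SVC; the specific values are illustrative): small C and small gamma give smoother, wider-margin boundaries with many support vectors, while large C and large gamma give tighter boundaries that track individual training points.

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.25, random_state=0)

for C in (0.1, 10.0):
    for gamma in (0.1, 10.0):
        clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(X, y)
        print(f"C = {C:>5}, gamma = {gamma:>5} -> "
              f"{clf.n_support_.sum():3d} support vectors, "
              f"training accuracy = {clf.score(X, y):.2f}")
```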
Code
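The deck's live coding demo is not included in the extracted text; below is a minimal end-to-end sketch of what such a demo typically looks like, using scikit-learn's SVC on the Iris dataset (the dataset, pipeline, and parameter values are my illustrative choices).

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load a small benchmark dataset and split it into train / test sets.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# Scale features, then fit an RBF-kernel SVM.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X_train, y_train)

# Evaluate on the held-out split.
y_pred = model.predict(X_test)
print("test accuracy:", accuracy_score(y_test, y_pred))
```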
What I really do?
Questions
Source
• https://www.quora.com/What-are-C-and-gamma-with-regards-to-a-support-vector-machine
• https://www.quora.com/How-can-I-choose-the-parameter-C-for-SVM
• https://www.youtube.com/watch?v=_PwhiWxHK8o
• https://www.youtube.com/watch?v=N1vOgolbjSc
• https://medium.com/@pushkarmandot/what-is-the-significance-of-c-value-in-support-vector-machine-28224e852c5a
• https://towardsdatascience.com/understanding-support-vector-machine-part-1-lagrange-multipliers-5c24a52ffc5e
• https://towardsdatascience.com/understanding-support-vector-machine-part-2-kernel-trick-mercers-theorem-e1e6848c6c4d
• http://web.mit.edu/6.034/wwwbob/svm-notes-long-08.pdf
• https://www.quora.com/What-is-the-kernel-trick