Support Vector Machines
Theory and Implementation in Python
In machine learning, support vector machines
are supervised learning models with associated
learning algorithms that analyze data and
recognize patterns, used for classification and
regression analysis.
Properties of an SVM
● Non-probabilistic binary linear classifier
● Support for non-linear classification using the kernel trick
Two sets of points in p-dimensional space are
said to be linearly separable if they can be
separated using a p-1 dimensional hyperplane.
Example - The two sets of 2D data
in the image are separated by a single
straight line (1D hyperplane),
and hence are linearly separable
The hyperplane that separates the two sets of
data is called the linear discriminant.
Wᵀ X = C
where W = [w1, w2, ..., wn] is the weight vector,
X = [x1, x2, ..., xn] is a point in n-dimensional space,
and C is a constant. [Wᵀ - W transpose]
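As a quick sketch of this equation (W, C and the point below are arbitrary illustrative values, not taken from the slides), we can check in NumPy whether a point satisfies Wᵀ X = C and hence lies on the hyperplane:

```python
import numpy as np

# A hyperplane in 3D: all points X satisfying W^T X = C.
# W and C are made-up illustrative values.
W = np.array([2.0, -1.0, 0.5])
C = 3.0

X = np.array([1.0, -0.5, 1.0])        # candidate point
print(np.dot(W, X))                   # 2.0 + 0.5 + 0.5 = 3.0
print(np.isclose(np.dot(W, X), C))    # True: X lies on the hyperplane
```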
Selecting the hyperplane
For any linearly separable data set, there exist
infinitely many separating hyperplanes.
Hence, we must choose the most suitable one.
Maximal Margin Hyperplane
We can compute the (perpendicular) distance from
each observation in the data set to a given
separating hyperplane; the smallest such distance
is the minimal distance from the observations to
the hyperplane, and is known as the margin. The
maximal margin hyperplane is the separating
hyperplane for which the margin is largest.
Finding the shortest distance (margin)
For an observation X, find the point Xp such that ||Xp − X|| is minimum and
Wᵀ Xp = C (as Xp is on the decision boundary).
The minimizer Xp is the perpendicular projection of X onto the hyperplane.
Maximizing the margin
Maximize D such that
D = (Wᵀ X − C) / ||W||
where X is the support vector (the observation closest to the hyperplane).
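A minimal NumPy sketch of this distance computation (the weight vector, offset and points are made-up illustrative values):

```python
import numpy as np

def distance_to_hyperplane(X, W, C):
    """Signed perpendicular distance of point X from the hyperplane W^T X = C."""
    return (np.dot(W, X) - C) / np.linalg.norm(W)

W = np.array([1.0, 1.0])   # illustrative weights
C = 1.0
print(distance_to_hyperplane(np.array([2.0, 2.0]), W, C))  # ~ 2.12 (positive side)
print(distance_to_hyperplane(np.array([0.0, 0.0]), W, C))  # ~ -0.71 (negative side)
```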
Why maximum margin hyperplane?
● Suppose we have a maximal margin hyperplane for
a data set and want to predict the class for a new
observation: we compute the distance from the
observation to the hyperplane.
● The greater the distance from the hyperplane, the more
confident we are that the sample belongs to that class.
● Thus the hyperplane with the largest smallest
distance from the training observations would be the
best classifier.
Classifying a new sample
Consider a new sample x′ = [x1, x2, ..., xn]. To
predict the class to which the sample belongs,
we simply compute Wᵀ x′ and compare it to C.
If Wᵀ x′ > C, the sample lies on one side (the positive
half space) of the hyperplane; if Wᵀ x′ < C, it lies on
the other side (the negative half space). The sample
belongs to the class which represents the corresponding
half space.
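As a sketch of this decision rule (again with made-up W, C and samples), the comparison of Wᵀ x′ against C picks the half space:

```python
import numpy as np

W = np.array([1.0, 1.0])   # illustrative hyperplane parameters
C = 1.0

samples = np.array([[2.0, 2.0],     # W^T x = 4 > C -> positive half space
                    [0.0, 0.0],     # W^T x = 0 < C -> negative half space
                    [3.0, -1.0]])   # W^T x = 2 > C -> positive half space
labels = np.where(samples @ W > C, 1, -1)
print(labels)                        # [ 1 -1  1]
```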
SVM - A linear discriminant
An SVM is simply a linear discriminant which
tries to build a hyperplane such that it has the
maximal margin.
It classifies a new sample by simply computing
the distance from the hyperplane.
● Observations (represented as vectors) which
lie at marginal distance from the hyperplane
are called support vectors.
● These are important as shifting them even
slightly might change the position of the
hyperplane to a great extent.
Example - Support vectors
The vectors lying on the
green lines in the image
are the support vectors.
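With scikit-learn's SVC we can fit a linear SVM and read off the support vectors directly; the data below is invented toy data for illustration:

```python
import numpy as np
from sklearn.svm import SVC

# Two small, linearly separable clusters (invented toy data).
X = np.array([[1, 1], [2, 1], [1, 2],      # class 0
              [4, 4], [5, 4], [4, 5]])     # class 1
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel='linear').fit(X, y)

print(clf.support_vectors_)          # the observations lying on the margin
print(clf.coef_, clf.intercept_)     # fitted W and intercept (the decision
                                     # function is coef_ . x + intercept_)
```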
To avoid ‘overfitting’ the data (i.e. oversensitivity
to individual observations) by trying to make
perfectly linearly separable sets, we may opt to
allow some amount of misclassification, keeping
in mind the greater robustness to individual
observations and the better classification of most of
the training observations.
Achieving soft margin
Each observation has something known as a
‘slack variable’ that allows individual
observations to be on the wrong side of the
margin or the hyperplane.
Sum of slack variables <= C
where C is a nonnegative tuning parameter: C is our
budget for the amount that the margin can be violated by
the n observations.
Tuning parameter C & Support vectors relation
Observations that lie directly on the margin, or on the
wrong side of the margin for their class, are known as
support vectors. Only these observations affect the
support vector classifier.
When the tuning parameter C is large, then the margin is
wide, many observations violate the margin, and so there
are many support vectors.
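This relation is easy to observe empirically. Note that scikit-learn's C parameter plays (roughly) the inverse role of the budget C used here: a small sklearn C tolerates more margin violations and yields more support vectors. A sketch on synthetic blob data:

```python
from sklearn.svm import SVC
from sklearn.datasets import make_blobs

# Overlapping synthetic clusters, invented for illustration.
X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=0)

for c in (0.01, 1.0, 100.0):
    clf = SVC(kernel='linear', C=c).fit(X, y)
    print(f"sklearn C={c}: {len(clf.support_vectors_)} support vectors")
```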
Non-linearly separable data
In this case, an SVM would not be able to linearly
classify the data. Hence the SVM uses what is known
as the ‘kernel trick’.
In this ‘trick’, the feature space is enlarged, which
can be done using various kernel functions. The idea
is that a boundary which is linear in the enlarged
feature space may be non-linear in the original
feature space.
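As a sketch of the kernel trick in practice, an RBF-kernel SVM separates concentric circles that no straight line can (toy data from sklearn's make_circles):

```python
from sklearn.svm import SVC
from sklearn.datasets import make_circles

# Concentric circles: not linearly separable in the original 2D space.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel='linear').fit(X, y)
rbf = SVC(kernel='rbf').fit(X, y)    # implicitly enlarges the feature space

print("linear accuracy:", linear.score(X, y))  # poor: no separating line exists
print("rbf accuracy:", rbf.score(X, y))        # typically near 1.0 on this data
```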