2. Support Vector Machines
• A Support Vector Machine (SVM) is a very powerful and versatile Machine Learning model, capable of performing linear or nonlinear classification, regression, and even outlier detection. It is one of the most popular models in Machine Learning, and anyone interested in Machine Learning should have it in their toolbox. SVMs are particularly well suited for classification of complex but small- or medium-sized datasets.
3. Concept of SVM in three parts:
• Linear SVM – Hard Margin Classifier
• Linear SVM – Soft Margin Classifier
• Non-Linear SVM
Linear SVM vs. Non-Linear SVM:
• Linear SVM: the data can be easily separated with a linear line; the data is classified with the help of a hyperplane.
• Non-Linear SVM: the data cannot be easily separated with a linear line; we use kernels to make non-separable data separable (see the sketch below).
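• A minimal sketch of the kernel idea, assuming Scikit-Learn's SVC (the dataset and settings are illustrative, not from the slides):

from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Moon-shaped data: two interleaving half-circles, not linearly separable
X, y = make_moons(n_samples=200, noise=0.15, random_state=42)

linear_clf = SVC(kernel="linear").fit(X, y)  # a straight-line boundary
rbf_clf = SVC(kernel="rbf").fit(X, y)        # RBF kernel bends the boundary

print("linear kernel accuracy:", linear_clf.score(X, y))  # noticeably below 1.0
print("RBF kernel accuracy:", rbf_clf.score(X, y))        # close to 1.0

• The RBF kernel implicitly maps the data into a higher-dimensional space where a separating hyperplane exists; kernels get their own tutorial, as noted in the next slide.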
4. Support Vector Machine
• 1. Linear SVM – Hard Margin Classifier
• Here we will build our initial concept of SVM by classifying a perfectly separable dataset (linear classification). This is also called the “Linear SVM – Hard Margin Classifier”. We will define the objective function. This tutorial is dedicated to the Hard Margin Classifier.
• 2. Linear SVM – Soft Margin Classifier
• We will extend our concept of the Hard Margin Classifier to solve for datasets where there are some outliers. In this case not all of the data points can be separated using a straight line; there will be some misclassified points. This is similar to adding regularization to a regression model.
• 3. Non-Linear SVM
• Finally we will learn how to derive the Non-Linear SVM using kernels. I will probably have a separate tutorial on kernels before this.
5. Maximal Margin Classifier
• A margin classifier is a classifier that is able to give an associated distance from the decision boundary for each example.
• Hyperplane
• We can use a line to separate data in two dimensions (with 2 features, x1 and x2). Similarly, we need a 2D plane to separate data in three dimensions. To generalize the concept to n dimensions of data (n > 0), we call the separating surface a hyperplane instead of a line or a plane.
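• In symbols (standard notation, added here for concreteness), a hyperplane is the set of points x satisfying

w^\top x + b = 0

• where w is the normal vector to the hyperplane and b is its offset; a new point is classified by the sign of w^\top x + b.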
6. What is Margin?
• The margin can be defined as the minimum distance (normal distance) from the observations to a given separating hyperplane. Let’s see how we can use the margin to find the optimal hyperplane.
• What is classification margin?
• The classification margin is the difference between the classification score for the true class and the maximal classification score for the false classes. The classification margin is a column vector with the same number of rows as the data matrix X.
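• In the hyperplane notation above (an addition, not on the original slide), the normal distance from a point x_i to the hyperplane is

\frac{\lvert w^\top x_i + b \rvert}{\lVert w \rVert}

• and the margin is the minimum of this distance over all observations; the maximal margin classifier chooses the hyperplane that maximizes this minimum.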
7. What is hard margin SVM?
• A hard margin means that the SVM is very rigid in classification: it tries to work extremely well on the training set, which can cause overfitting.
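• The objective function promised in slide 4 takes the standard hard margin form (added here for completeness):

\min_{w,\,b} \; \tfrac{1}{2}\lVert w \rVert^2 \quad \text{subject to} \quad y_i (w^\top x_i + b) \ge 1 \;\; \text{for all } i

• Maximizing the street width 2/\lVert w \rVert is equivalent to minimizing \lVert w \rVert^2, and the constraints force every training point onto the correct side, which is exactly what makes the classifier rigid.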
8. What is soft margin classification?
• Soft Margin Classifier
• The constraint of maximizing the margin of the line
that separates the classes must be relaxed. This is
often called the soft margin classifier. This change
allows some points in the training data to violate the
separating line.
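• Relaxing the constraints with slack variables gives the standard soft margin objective (added here, not on the original slide):

\min_{w,\,b,\,\xi} \; \tfrac{1}{2}\lVert w \rVert^2 + C \sum_i \xi_i \quad \text{subject to} \quad y_i (w^\top x_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0

• Each slack variable \xi_i measures how far point i violates the margin, and C controls how heavily violations are penalized; this is the same C hyperparameter that appears in Scikit-Learn later in this deck.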
9. What is soft and hard margin in SVM?
• The difference between a hard margin and a soft margin in SVMs lies in the separability of the data. If the data is not linearly separable, a soft margin SVM is the appropriate choice. Sometimes the data is linearly separable, but the margin is so small that the model becomes prone to overfitting or overly sensitive to outliers; in that case, too, a soft margin is preferable.
10. What & Why of SVM as Soft Margin Classifier?
• A linear classifier can be made using support vector machines. One disadvantage of an SVM is that a classifier with NO regularization cost is updated only until all the training points are classified correctly.
• SVM-based classifiers do not distinguish models based on how well or how confidently they classify the data. As a result it is difficult to compare the quality of two models. A soft margin classifier is a better choice when we are also concerned about the quality of classification.
• For example, both the SVM models presented below classify the data accurately; however, the one on the right is preferred because it has a larger margin. An SVM update rule without regularized weights will not be able to pick out this difference. Worse, it is possible that without regularized weights the SVM method chooses the classifier with the smaller margin.
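• In penalty form, the soft margin objective above is equivalent to minimizing the regularized hinge loss (standard form, added here):

\tfrac{1}{2}\lVert w \rVert^2 + C \sum_i \max\bigl(0,\; 1 - y_i (w^\top x_i + b)\bigr)

• The regularization term \tfrac{1}{2}\lVert w \rVert^2 is what lets the update rule prefer the larger-margin model; without it, training can stop as soon as every point is classified correctly, regardless of the margin.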
12. SOFT MARGIN CLASSIFICATION
• A single outlier can push the decision boundary greatly, so that the margin becomes very narrow.
• Even though a linear decision boundary can classify the target classes properly, the data may not be separable using a straight line (no clear boundary).
13. SOFT MARGIN CLASSIFICATION
• If we strictly impose that all instances be off the street and on the right side, this is called hard margin classification. There are two main issues with hard margin classification.
• First, it only works if the data is linearly separable, and second, it is quite sensitive to outliers. Figure 5-3 shows the iris dataset with just one additional outlier: on the left, it is impossible to find a hard margin, and on the right the decision boundary ends up very different from the one we saw in Figure 5-1 without the outlier, and it will probably not generalize as well.
• To avoid these issues it is preferable to use a more flexible model. The objective is to find a good balance between keeping the street as large as possible and limiting the margin violations (i.e., instances that end up in the middle of the street or even on the wrong side). This is called soft margin classification.
• In Scikit-Learn’s SVM classes, you can control this balance using the C hyperparameter: a smaller C value leads to a wider street but more margin violations. Figure 5-4 shows the decision boundaries and margins of two soft margin SVM classifiers on a nonlinearly separable dataset. On the left, using a high C value the classifier makes fewer margin violations but ends up with a smaller margin.
• On the right, using a low C value the margin is much larger, but many instances end up on the street. However, it seems likely that the second classifier will generalize better: in fact, even on this training set it makes fewer prediction errors, since most of the margin violations are actually on the correct side of the decision boundary.
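• A minimal sketch of this setup (the feature choice, scaling, and LinearSVC settings are assumptions patterned on the book's iris example, not the slides' own code):

import numpy as np
from sklearn import datasets
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X = iris.data[:, (2, 3)]                   # petal length, petal width
y = (iris.target == 2).astype(np.float64)  # Iris virginica vs. the rest

# Smaller C -> wider street but more margin violations;
# larger C -> fewer violations but a narrower street.
svm_wide = make_pipeline(StandardScaler(), LinearSVC(C=1, loss="hinge", max_iter=10000))
svm_narrow = make_pipeline(StandardScaler(), LinearSVC(C=100, loss="hinge", max_iter=10000))
svm_wide.fit(X, y)
svm_narrow.fit(X, y)

print(svm_wide.predict([[5.5, 1.7]]))      # class prediction for one flower

• Plotting the two decision boundaries reproduces the trade-off described above: the C=100 model makes fewer margin violations but has a narrower street, while the C=1 model accepts some violations in exchange for a wider street that is likely to generalize better.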