Support Vector Machine
➔ One of the most popular Supervised Learning algorithms.
➔ Used for both Classification and Regression tasks (but most widely
used for classification).
➔ Works better on smaller datasets.
➔ Works well in high-dimensional spaces (many independent variables).
➔ Works well even when the number of features exceeds the number of observations.
➔ Doesn't perform well when the dataset is noisy, i.e. when the target
classes overlap.
➔ Doesn't perform well on large datasets, because the required training
time is high.
The goal of the support vector machine algorithm is to create the best
line or decision boundary in an n-dimensional space (n — the number
of features) that distinctly classifies the data points.
This best decision boundary is called a hyperplane.
Our objective is to find the plane with the maximum margin, i.e.
the maximum distance between the data points of both classes.
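As a minimal, hedged sketch of this in practice (scikit-learn's SVC; make_blobs is a synthetic stand-in for any two-class dataset):

```python
# Minimal sketch: fitting an SVM classifier with scikit-learn.
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_blobs(n_samples=200, centers=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# With a linear kernel, the decision boundary is a straight line in 2D.
clf = SVC(kernel="linear")
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```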
Terminologies:
Margin: the perpendicular distance between
the closest data points and the hyperplane.
The optimal hyperplane with the maximum
margin is termed the Maximum Margin Hyperplane.
The closest points, from which the margin distance
is measured, are called support vectors.
➔ Support vectors are the data points closest to the hyperplane.
➔ Support vectors determine the position and orientation of the hyperplane.
➔ With the help of the support vectors, we maximise the margin of the classifier.
➔ Support vectors are the points that build the SVM classifier.
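A fitted scikit-learn SVC exposes its support vectors directly; a small sketch (same synthetic setup as above):

```python
# Inspecting the support vectors of a fitted classifier.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=200, centers=2, random_state=42)
clf = SVC(kernel="linear").fit(X, y)

# Only these points determine the position/orientation of the hyperplane;
# moving any other point (outside the margin) leaves the boundary unchanged.
print("support vectors per class:", clf.n_support_)
print("support vector coordinates:\n", clf.support_vectors_)
```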
Terminologies:
Regularization:
➔ The 'C' parameter in the Python sklearn library.
➔ Tunes how strongly the SVM classifier is penalised for misclassifying training data.
➔ C large → margin of the hyperplane is small.
➔ C small → margin of the hyperplane is large.
Problems with setting C values:
➔ C too small → risk of underfitting.
➔ C too large → risk of overfitting.
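A hedged sketch of the C trade-off (the exact values are illustrative, not recommendations):

```python
# Comparing a small and a large C on the same noisy data. The number of
# support vectors is a rough proxy for margin width: a wide (soft) margin
# touches many points, a narrow (hard) margin touches few.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.0, random_state=0)

for C in (0.01, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C}: {clf.n_support_.sum()} support vectors, "
          f"training accuracy = {clf.score(X, y):.3f}")
```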
Terminologies:
Gamma:
➔ Defines how far the influence of a single training point reaches in the
calculation of the line of separation.
➔ Low gamma: even points far from the hyperplane are considered in
the calculation (far reach).
➔ High gamma: only points close to the hyperplane are considered in
the calculation (close reach).
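A hedged sketch of the gamma effect with the RBF kernel (values are illustrative):

```python
# Sweeping gamma for an RBF-kernel SVM. Low gamma = far-reaching influence
# (smoother boundary); high gamma = only nearby points matter, so the
# boundary hugs the training data (very high training accuracy can signal
# overfitting).
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

for gamma in (0.1, 1.0, 100.0):
    clf = SVC(kernel="rbf", gamma=gamma).fit(X, y)
    print(f"gamma={gamma}: training accuracy = {clf.score(X, y):.3f}")
```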
Outliers in the data can shift the threshold (the learned decision
boundary) and lead to wrong predictions.
Terminologies:
Kernels:
➔ The kernel is the technique used by SVM to classify non-linear data.
➔ Kernel functions implicitly increase the dimension of the data, so that
SVM can fit the optimum hyperplane to separate it.
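To make the "increase the dimension" idea concrete, a hedged sketch: concentric circles are not linearly separable in 2D, but adding a squared-radius feature makes them separable by a plane; the RBF kernel performs an equivalent lift implicitly.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=300, factor=0.4, noise=0.05, random_state=0)

# In the original 2D space, a linear SVM cannot separate the circles...
print("2D, linear kernel:", SVC(kernel="linear").fit(X, y).score(X, y))

# ...but after lifting to 3D with z = x1^2 + x2^2, a plane separates them.
X3 = np.column_stack([X, (X ** 2).sum(axis=1)])
print("3D, linear kernel:", SVC(kernel="linear").fit(X3, y).score(X3, y))

# The RBF kernel achieves the same effect implicitly (the "kernel trick").
print("2D, rbf kernel:   ", SVC(kernel="rbf").fit(X, y).score(X, y))
```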
2D and 3D feature space
If the number of input features is 2, the hyperplane is just a line.
If the number of input features is 3, the hyperplane becomes a two-dimensional plane.
It becomes difficult to visualise the hyperplane when the number of features exceeds 3.
Types of SVM
01 Linear SVM
Linear SVM is used for linearly separable data,
i.e. when a dataset can be classified into two
classes using a single straight line.
02 Non-Linear SVM
Non-Linear SVM is used for non-linearly
separable data, i.e. when a dataset cannot be
classified using a straight line; such data is
termed non-linear data, and the classifier used
is called a Non-Linear SVM classifier.
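A hedged side-by-side sketch of the two types on synthetic data:

```python
# Linear SVM on linearly separable blobs vs. non-linear (RBF) SVM on
# moon-shaped data that no straight line can split.
from sklearn.datasets import make_blobs, make_moons
from sklearn.svm import SVC

Xb, yb = make_blobs(n_samples=200, centers=2, random_state=1)
print("blobs, linear SVM:", SVC(kernel="linear").fit(Xb, yb).score(Xb, yb))

Xm, ym = make_moons(n_samples=200, noise=0.15, random_state=1)
print("moons, linear SVM:", SVC(kernel="linear").fit(Xm, ym).score(Xm, ym))
print("moons, RBF SVM:   ", SVC(kernel="rbf").fit(Xm, ym).score(Xm, ym))
```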