Module 3: SVM, Kernel Methods, and KNN
Support Vector Machine
• Support Vector Machine (SVM) is one of the most popular supervised learning algorithms. It can be used for both classification and regression problems, but it is primarily used for classification problems in machine learning.
• The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional space into classes, so that we can easily put new data points in the correct category in the future. This best decision boundary is called a hyperplane.
Example
• The SVM algorithm can be used for face detection, image classification, text categorization, etc.
• SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed a Support Vector Machine. Consider the below diagram, in which two different categories are classified using a decision boundary or hyperplane:
Hyperplane and Support Vectors in the SVM
• Hyperplane: There can be multiple lines/decision boundaries to segregate the classes in n-dimensional space, but we need to find the best decision boundary that helps to classify the data points. This best boundary is known as the hyperplane of SVM.
• The dimensions of the hyperplane depend on the number of features in the dataset: if there are 2 features (as shown in the image), the hyperplane is a straight line, and if there are 3 features, the hyperplane is a 2-dimensional plane.
• We always create a hyperplane that has the maximum margin, i.e. the maximum distance between the hyperplane and the nearest data points of each class.
• Support Vectors: The data points or vectors that are closest to the hyperplane and affect its position are termed support vectors. Since these vectors support the hyperplane, they are called support vectors.
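As an illustrative aside (not from the slides), a fitted linear SVM exposes exactly these support vectors; here is a minimal sketch using scikit-learn on a toy two-feature dataset (the data and parameters are assumptions for demonstration):

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-feature dataset with two linearly separable classes
X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

print(clf.support_vectors_)       # the closest points that fix the hyperplane
print(clf.coef_, clf.intercept_)  # w and b of the hyperplane w·x + b = 0
```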
Types of SVM
• SVM can be of two types:
o Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be classified into two classes by using a single straight line, then such data is termed linearly separable data, and the classifier used is called a Linear SVM classifier.
o Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If a dataset cannot be classified by using a straight line, then such data is termed non-linear data, and the classifier used is called a Non-linear SVM classifier.
How does SVM work?
Linear SVM:
• The working of the SVM algorithm can be understood by using an example. Suppose we have a dataset with two tags (green and blue) and two features, x1 and x2. We want a classifier that can classify the pair (x1, x2) of coordinates as either green or blue. Consider the below image:
• Since this is a 2-D space, we can easily separate these two classes by using a straight line. But there can be multiple lines that separate these classes. Consider the below image:
• Hence, the SVM algorithm helps to find the best line or decision boundary; this best boundary or region is called a hyperplane. The SVM algorithm finds the closest points of the lines from both classes. These points are called support vectors. The distance between the vectors and the hyperplane is called the margin, and the goal of SVM is to maximize this margin. The hyperplane with the maximum margin is called the optimal hyperplane.
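Continuing the illustrative sketch above, the margin width can be read off the fitted weights as 2 / ||w||, which is exactly the quantity SVM maximizes:

```python
# Reuses clf from the previous sketch
w = clf.coef_[0]
margin = 2 / np.linalg.norm(w)        # width of the maximum-margin corridor
print(f"margin width: {margin:.3f}")
print(clf.predict([[4, 4]]))          # classify a new (x1, x2) point
```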
Non-Linear SVM
• If data is linearly arranged, we can separate it by using a straight line, but for non-linear data we cannot draw a single straight line. Consider the below image:
• So to separate these data points, we need to add one more dimension. For linear data we have used two dimensions, x and y, so for non-linear data we will add a third dimension, z. It can be calculated as:
z = x² + y²
• By adding the third dimension, the sample space becomes as in the below image:
• So now SVM will divide the datasets into classes in the following way. Consider the below image:
• Since we are in 3-D space, the decision boundary looks like a plane parallel to the x-y plane. If we convert it back to 2-D space at z = 1, it becomes:
• Hence we get a circle of radius 1 in the case of non-linear data.
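As an illustrative aside (not from the slides), this lifting trick can be carried out explicitly in NumPy and scikit-learn: generate circular 2-D data, add z = x² + y² as a third feature, and fit a linear SVM in 3-D (the data and parameters are assumptions for demonstration):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, 100)
inner = np.c_[0.5 * np.cos(angles), 0.5 * np.sin(angles)]  # class 0: small circle
outer = np.c_[2.0 * np.cos(angles), 2.0 * np.sin(angles)]  # class 1: large circle
X2d = np.vstack([inner, outer])
y = np.r_[np.zeros(100), np.ones(100)]

z = (X2d ** 2).sum(axis=1)   # the new dimension: z = x^2 + y^2
X3d = np.c_[X2d, z]

clf = SVC(kernel="linear").fit(X3d, y)
print(clf.score(X3d, y))     # 1.0: perfectly separable after the lift
```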
Nonlinear Support Vector Machine
• What if the decision boundary is not linear?
• Transform the data into a higher-dimensional space.
What is the Kernel method in machine learning?
• Kernels or kernel methods (also called kernel functions) are sets of different types of algorithms that are used for pattern analysis.
• They are used to solve a non-linear problem by using a linear classifier. Kernel methods are employed in SVM (Support Vector Machines), which are used in classification and regression problems.
• The SVM uses what is called the "kernel trick", where the data is transformed and an optimal boundary is found for the possible outputs.
• You can break the SVM strategy down into two steps (a sketch of both steps follows below):
• First, the data is projected implicitly onto a high-dimensional space through the kernel trick.
• Second, a linear classifier is applied to the projected data.
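Here is a minimal sketch of that two-step strategy with scikit-learn (an illustrative aside; make_circles and the parameters are assumptions for demonstration). Passing kernel="rbf" makes the implicit projection and the linear separation happen in one call:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: impossible to separate with a straight line in 2-D
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

print(SVC(kernel="linear").fit(X, y).score(X, y))  # poor: no separating line exists
print(SVC(kernel="rbf").fit(X, y).score(X, y))     # near 1.0: implicit projection
```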
• The kernel function will usually convert the training data so that a non-linear decision surface can be transformed into a linear equation in a higher-dimensional space.
• Essentially, it returns the inner product between two points in a suitable feature space.
• Kernel functions are applied to every data instance in order to map the original non-linear observations into a higher-dimensional space.
• These observations become separable in the higher-dimensional space.
The Need for the Kernel Method and Its Working
It is very difficult to solve this classification with a linear classifier, as there is no straight line able to separate the red and the green dots: the points are randomly distributed.
• Here comes the use of the kernel function, which takes the points to higher dimensions, solves the problem there, and returns the output.
• Think of it this way: we can see that the green dots are enclosed in some perimeter area while the red ones lie outside it; likewise, there could be other scenarios where the green dots are distributed in a trapezoid-shaped area.
• So what we do is convert the two-dimensional plane, which was first classified by a one-dimensional hyperplane (a straight line), into a three-dimensional space; here our classifier, i.e. the hyperplane, will not be a straight line but a two-dimensional plane which cuts the area.
• In order to get a mathematical understanding of kernels, consider Lili Jiang's equation of the kernel:
K(x, y) = <f(x), f(y)>, where:
• K is the kernel function,
• x and y are n-dimensional inputs,
• f is the map from n-dimensional to m-dimensional space, and
• <x, y> denotes the dot product.
Illustration with the help of an example
Let us say that we have two points, x = (2, 3, 4) and y = (3, 4, 5).
As we have seen, K(x, y) = <f(x), f(y)>. Here the kernel is K(x, y) = (x · y)², the polynomial kernel of degree 2, whose implicit feature map is the f below.
Let us first calculate <f(x), f(y)>:
f(x) = (x1x1, x1x2, x1x3, x2x1, x2x2, x2x3, x3x1, x3x2, x3x3)
f(y) = (y1y1, y1y2, y1y3, y2y1, y2y2, y2y3, y3y1, y3y2, y3y3)
so,
f(2, 3, 4) = (4, 6, 8, 6, 9, 12, 8, 12, 16) and
f(3, 4, 5) = (9, 12, 15, 12, 16, 20, 15, 20, 25),
so the dot product is
f(x) · f(y) = f(2, 3, 4) · f(3, 4, 5) = 36 + 72 + 120 + 72 + 144 + 240 + 120 + 240 + 400 = 1444.
And K(x, y) = (2·3 + 3·4 + 4·5)² = (6 + 12 + 20)² = 38² = 1444.
As we find, f(x) · f(y) and K(x, y) give the same result, but the former method required many more calculations (because of projecting 3 dimensions into 9), while using the kernel it was much easier.
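This arithmetic is easy to verify numerically; here is a quick NumPy check (an illustrative aside, not from the slides):

```python
import numpy as np

x = np.array([2, 3, 4])
y = np.array([3, 4, 5])

f = lambda v: np.outer(v, v).ravel()  # (v1v1, v1v2, ..., v3v3): 9 dimensions
print(np.dot(f(x), f(y)))             # 1444, via the explicit 9-D projection
print(np.dot(x, y) ** 2)              # 1444, via the kernel shortcut
```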
What are the types of Kernel methods in SVM models?
Support vector machines use various kinds of kernel methods. Here are a few of them:
1. Linear Kernel
Let us say that we have two vectors named x1 and x2; then the linear kernel is defined by the dot product of these two vectors:
K(x1, x2) = x1 · x2
2. Polynomial Kernel
A polynomial kernel is defined by the following equation:
K(x1, x2) = (x1 · x2 + 1)^d,
where d is the degree of the polynomial and x1 and x2 are vectors.
3. Gaussian Kernel
This kernel is an example of a radial basis function (RBF) kernel. Its equation is:
K(x1, x2) = exp(−||x1 − x2||² / (2σ²))
The given sigma plays a very important role in the performance of the Gaussian kernel and should neither be overestimated nor underestimated; it should be carefully tuned according to the problem.
4. Exponential Kernel
This is closely related to the previous kernel, i.e. the Gaussian kernel, the only difference being that the square of the norm is removed.
The equation of the exponential kernel is:
K(x1, x2) = exp(−||x1 − x2|| / (2σ²))
This is also a radial basis kernel function.
5. Laplacian Kernel
This type of kernel is less prone to variations and has the same form as the previously discussed exponential kernel; the equation of the Laplacian kernel is:
K(x1, x2) = exp(−||x1 − x2|| / σ)
6. Hyperbolic or Sigmoid Kernel
This kernel is used in neural-network areas of machine learning. The activation function for the sigmoid kernel is the bipolar sigmoid (hyperbolic tangent) function. The equation for the hyperbolic kernel function is:
K(x1, x2) = tanh(α · x1 · x2 + c)
This kernel is very widely used and popular among support vector machines.
7. ANOVA Radial Basis Kernel
This kernel is known to perform very well in multidimensional regression problems, just like the Gaussian and Laplacian kernels. It also comes under the category of radial basis kernels. The equation for the ANOVA kernel is:
K(x1, x2) = Σ_k exp(−σ (x1[k] − x2[k])²)^d, where x1[k] is the k-th component of x1.
There are many more types of kernel methods; we have discussed the most widely used ones. The type of problem purely determines which kernel function should be used.
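As an illustrative aside (not from the slides), the kernels listed above can be written directly from their equations in plain NumPy; σ, d, α, and c are free parameters to be tuned:

```python
import numpy as np

def linear(x1, x2):
    return np.dot(x1, x2)

def polynomial(x1, x2, d=2):
    return (np.dot(x1, x2) + 1) ** d

def gaussian(x1, x2, sigma=1.0):
    return np.exp(-np.linalg.norm(x1 - x2) ** 2 / (2 * sigma ** 2))

def exponential(x1, x2, sigma=1.0):
    return np.exp(-np.linalg.norm(x1 - x2) / (2 * sigma ** 2))

def laplacian(x1, x2, sigma=1.0):
    return np.exp(-np.linalg.norm(x1 - x2) / sigma)

def sigmoid(x1, x2, alpha=0.01, c=0.0):
    return np.tanh(alpha * np.dot(x1, x2) + c)
```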
K-Nearest Neighbor (KNN) Algorithm for Machine Learning
o K-Nearest Neighbor is one of the simplest machine learning algorithms, based on the supervised learning technique.
o The K-NN algorithm assumes similarity between the new case/data and the available cases and puts the new case into the category most similar to the available categories.
o The K-NN algorithm stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can easily be classified into a well-suited category by using the K-NN algorithm.
o The K-NN algorithm can be used for regression as well as classification, but it is mostly used for classification problems.
o K-NN is a non-parametric algorithm, which means it does not make any assumptions about the underlying data.
o It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead it stores the dataset and, at the time of classification, performs an action on the dataset.
o At the training phase, the KNN algorithm just stores the dataset, and when it gets new data, it classifies that data into the category most similar to the new data.
Example: Suppose we have an image of a creature that looks similar to a cat and a dog, but we want to know whether it is a cat or a dog. For this identification we can use the KNN algorithm, as it works on a similarity measure. Our KNN model will find the features of the new image that are most similar to those of the cat and dog images and, based on the most similar features, will put it in either the cat or the dog category.
Why do we need a K-NN Algorithm?
How does K-NN work?
o Step-1: Select the number K of neighbors.
o Step-2: Calculate the Euclidean distance from the new point to the data points.
o Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
o Step-4: Among these K neighbors, count the number of data points in each category.
o Step-5: Assign the new data point to the category for which the number of neighbors is maximum.
o Step-6: Our model is ready. (A sketch of these steps in code follows below.)
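Here is a minimal sketch of these steps in Python (an illustrative aside, not from the slides; brute-force distances and a majority vote):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=5):
    # Step-2: Euclidean distance from x_new to every training point
    dists = np.linalg.norm(X_train - x_new, axis=1)
    # Step-3: indices of the K nearest neighbors
    nearest = np.argsort(dists)[:k]
    # Steps 4-5: count labels among the neighbors, return the majority
    return Counter(y_train[nearest]).most_common(1)[0][0]

X = np.array([[1, 1], [1, 2], [2, 2], [6, 6], [7, 7]])
y = np.array(["A", "A", "A", "B", "B"])
print(knn_predict(X, y, np.array([2, 1]), k=3))  # -> "A"
```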
• Suppose we have a new data point and we need to put it in the required category. Consider the below image:
o Firstly, we will choose the number of neighbors; we will choose k = 5.
• Next, we will calculate the Euclidean distance between the data points.
• By calculating the Euclidean distance, we get the nearest neighbors: three nearest neighbors in category A and two nearest neighbors in category B. Consider the below image:
• Since 3 of the 5 nearest neighbors are from category A, the new data point must belong to category A.
How to select the value of K in the K-NN Algorithm?
• Below are some points to remember while selecting the value of K in the K-NN algorithm:
o There is no particular way to determine the best value for K, so we need to try several values and pick the best among them. The most preferred value for K is 5.
o A very low value of K, such as K = 1 or K = 2, can be noisy and expose the model to the effects of outliers.
o Large values of K are good for smoothing out noise, but too large a value may cause difficulties, such as including many points from other categories.
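A common way to pick K in practice is cross-validation; here is a minimal sketch (an illustrative aside using scikit-learn and the Iris dataset, not from the slides):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Try several candidate values of K and keep the one with the best CV score
scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
          for k in (1, 3, 5, 7, 9, 11)}
best_k = max(scores, key=scores.get)
print(scores, "-> best K:", best_k)
```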