2. What is Machine Learning?
• Machine learning is the subfield of computer science that "gives computers the ability
to learn without being explicitly programmed".
• Tom M. Mitchell provided a widely quoted, more formal definition: "A computer
program is said to learn from experience E with respect to some class of tasks T and
performance measure P if its performance at tasks in T, as measured by P, improves
with experience E."
3. Types of Machine Learning
• Supervised Learning
Inferring a function from labelled training data. A supervised learning algorithm
analyses the training data (a list of inputs and their correct outputs) and produces
an appropriate function, which can be used for mapping new examples.
• Unsupervised Learning
Inferring a function to describe hidden structure from unlabelled data. No labels
are given to the learning algorithm, leaving it on its own to find structure in its
input.
• Reinforcement Learning
Concerned with how software agents ought to take actions in an environment so
as to maximize some notion of cumulative reward.
4. Types of Supervised Learning
• Regression
In a regression problem, we are trying to predict results within a continuous
output, meaning that we are trying to map input variables to some continuous
function. For example, predicting housing prices where the output is a real
number.
• Classification
In a classification problem, we are instead trying to predict results in a discrete
output. In other words, we are trying to map input variables into discrete
categories. For example, predicting whether a particular email is spam or not.
5. Machine Learning Tools and Techniques
• Linear Regression
Here the predicted function is linear in the input. It is the most common type of
regression, as it fits many regression problems well. It works by minimizing its
Cost Function, typically via Gradient Descent. Linear regression can be single
variable or multivariate. The higher-degree variant of this technique is called
Polynomial Regression.
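The gradient-descent training loop described above can be sketched as follows; the data, learning rate, and iteration count here are illustrative assumptions, not values from the slides.

```python
import numpy as np

# Minimal single-variable linear regression trained with batch
# gradient descent on the mean-squared-error cost function.
def linear_regression_gd(x, y, lr=0.01, epochs=5000):
    theta0, theta1 = 0.0, 0.0          # intercept and slope
    m = len(x)
    for _ in range(epochs):
        pred = theta0 + theta1 * x     # current hypothesis h(x)
        # Gradients of the cost with respect to each parameter
        grad0 = (pred - y).sum() / m
        grad1 = ((pred - y) * x).sum() / m
        theta0 -= lr * grad0
        theta1 -= lr * grad1
    return theta0, theta1

# Fit a line to noise-free points on y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * x + 1
t0, t1 = linear_regression_gd(x, y)
print(round(t0, 2), round(t1, 2))  # close to 1.0 and 2.0
```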
• Logistic Regression
This technique is used for classification problems. Here, the predicted function
has a discrete range. It can be considered a modified form of Linear Regression
that uses the Sigmoid function for its task.
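A minimal sketch of that Sigmoid-based prediction step: the linear score θᵀx is squashed into (0, 1) and thresholded at 0.5 to produce a discrete class. The weights here are assumed, illustrative values, not trained ones.

```python
import math

def sigmoid(z):
    # Maps any real score into the interval (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def predict(theta, x):
    score = sum(t * xi for t, xi in zip(theta, x))  # theta^T x
    return 1 if sigmoid(score) >= 0.5 else 0

theta = [0.5, -1.0]                # assumed, pre-trained weights
print(predict(theta, [4.0, 1.0]))  # score 1.0  -> class 1
print(predict(theta, [1.0, 2.0]))  # score -1.5 -> class 0
```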
6. • Neural Network
This tool can be used both for regression and for classification. It consists of one
or more layers of computational units between the input and output, trained to model
the problem using the Backpropagation Algorithm. For complex problems with many
features to model, neural networks provide efficient solutions compared to other
techniques.
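A minimal sketch of such a network: one hidden layer trained with backpropagation on XOR, a problem a single linear unit cannot model. The architecture, seed, and learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR targets

W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)   # hidden layer
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)   # output layer
sig = lambda z: 1 / (1 + np.exp(-z))

losses = []
for _ in range(5000):
    h = sig(X @ W1 + b1)             # forward pass
    out = sig(h @ W2 + b2)
    losses.append(float(((out - y) ** 2).mean()))
    # Backward pass: propagate the error through each layer
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= 0.5 * h.T @ d_out;  b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h;    b1 -= 0.5 * d_h.sum(axis=0)

print(losses[0] > losses[-1])  # training error decreases
```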
• Support Vector Machine
While a Neural Network outputs the probability that a particular input belongs to
a class, an SVM is a non-probabilistic binary linear classifier. It has the ability
to separate the classes linearly by a large margin. Add the Kernel to it, and the
SVM becomes one of the most powerful classifiers, capable of handling
infinite-dimensional feature vectors.
7. Support Vector Machine
The hypothesis function for an SVM is the same as that of logistic regression:

h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}
The difference lies in the Cost Function:

J(\theta) = C \sum_{i=1}^{m} \left[ y^{(i)} \, \mathrm{cost}_1(\theta^T x^{(i)}) + (1 - y^{(i)}) \, \mathrm{cost}_0(\theta^T x^{(i)}) \right]
8. Where \mathrm{cost}_0 and \mathrm{cost}_1 are defined as

\mathrm{cost}_0(z) = \begin{cases} 0, & \text{if } z \le -1 \\ z + 1, & \text{otherwise} \end{cases}

\mathrm{cost}_1(z) = \begin{cases} 0, & \text{if } z \ge 1 \\ -z + 1, & \text{otherwise} \end{cases}
Thus the learned parameter vector is obtained as:

\theta = \arg\min_{\theta} \; J(\theta) + \frac{1}{2} \sum_{j=1}^{n} \theta_j^2
where the minimization can be performed via Gradient Descent, Conjugate Gradient,
BFGS, L-BFGS, etc., and the second term is a Regularization term that prevents Overfitting.
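The two hinge pieces and the regularized objective above can be sketched directly; the data passed in at the end is illustrative.

```python
# cost_0 penalizes positive scores when the true label is 0;
# cost_1 penalizes low scores when the true label is 1.
def cost0(z):
    return 0.0 if z <= -1 else z + 1

def cost1(z):
    return 0.0 if z >= 1 else -z + 1

def svm_cost(theta, X, y, C):
    # Data term: C * sum over examples of the appropriate hinge cost
    data_term = C * sum(
        yi * cost1(score) + (1 - yi) * cost0(score)
        for xi, yi in zip(X, y)
        for score in [sum(t * f for t, f in zip(theta, xi))]
    )
    # Regularization term: (1/2) * sum of squared parameters
    reg_term = 0.5 * sum(t * t for t in theta)
    return data_term + reg_term

# A confidently correct score (>= 1 or <= -1) costs nothing
print(cost1(2.0), cost0(-2.0))   # 0.0 0.0
print(cost1(0.0), cost0(0.0))    # 1.0 1.0
```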
9. Kernel
• A Kernel is a "similarity function" that we provide to a machine learning algorithm,
most commonly an SVM. It takes two inputs and outputs how similar they are. The
means by which this similarity is determined differentiates one kernel function from
another. It is a shortcut that helps us do certain calculations faster which otherwise
would involve computations in a higher-dimensional space. Examples include the
Gaussian Kernel, String Kernel, Chi-Squared Kernel, Histogram Intersection Kernel, etc.
• Kernel methods owe their name to the use of kernel functions, which enable them to
operate in a high-dimensional, implicit feature space without ever computing the
coordinates of the data in that space, but rather by simply computing the inner
products between the images of all pairs of data in the feature space. This operation is
often computationally cheaper than the explicit computation of the coordinates. This
approach is called the "kernel trick".
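The kernel trick can be made concrete with a degree-2 polynomial kernel (an illustrative choice): K(x, z) = (x · z)² equals the inner product of explicit quadratic feature maps, computed without ever building those features.

```python
import numpy as np

def poly2_kernel(x, z):
    # Kernel evaluation: one dot product in the original space
    return float(np.dot(x, z)) ** 2

def phi(x):
    # Explicit feature map for the same kernel: all pairwise products
    return np.array([xi * xj for xi in x for xj in x])

x = np.array([1.0, 2.0, 3.0])
z = np.array([4.0, 5.0, 6.0])
print(poly2_kernel(x, z))             # 1024.0
print(float(np.dot(phi(x), phi(z)))) # 1024.0 -- same value
```

The kernel call touches only 3 components, while the explicit map builds 9 features per vector; for high-degree or Gaussian kernels this gap is what makes the implicit computation "often computationally cheaper".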
10. SVM Intuition
• We have 2 colours of balls on the table that we want to separate.
11. • We get a stick and put it on the table. This works
pretty well, right?
12. • Some villain comes and places more balls on the
table, it kind of works but one of the balls is on the
wrong side and there is probably a better place to put
the stick now.
13. • SVMs try to put the stick in the best possible place by
having as big a gap on either side of the stick as
possible.
14. • Now when the villain returns, the stick is still in a
pretty good spot.
15. • There is another trick in the SVM toolbox that is even
more important. Say the villain has seen how good
you are with a stick so he gives you a new challenge.
16. • There's no stick in the world that will let you split
those balls well, so what do you do? You flip the table,
of course, throwing the balls into the air. Then, with
your pro ninja skills, you grab a sheet of paper and
slip it between the balls.
17. • Now, looking at the balls from where the villain is
standing, they will look split by some curvy line.
18. "The balls can be considered as data, the stick a classifier,
the biggest-gap trick an optimization, flipping the table
kernelling, and the piece of paper a hyperplane."