3. SYLLABUS
• A Tour of Machine Learning Classifiers Using Scikit-learn
• Choosing a classification algorithm; first steps with
scikit-learn; training a perceptron via scikit-learn
• Modeling class probabilities via logistic regression:
logistic regression intuition and conditional probabilities;
learning the weights of the logistic cost function; training a
logistic regression model with scikit-learn; tackling
overfitting via regularization
4. SYLLABUS
• Maximum margin classification with support vector
machines: maximum margin intuition; dealing with the
nonlinearly separable case using slack variables; alternative
implementations in scikit-learn
• Solving nonlinear problems using a kernel SVM: using
the kernel trick to find separating hyperplanes in higher
dimensional space
• Decision tree learning: maximizing information gain
(getting the most bang for the buck); building a decision tree;
combining weak to strong learners via random forests
• Self Learning Exercise: K-nearest neighbors, a lazy
learning algorithm
5. A TOUR OF MACHINE LEARNING CLASSIFIERS
USING SCIKIT-LEARN
• Several popular and powerful ML algorithms are
commonly used in academia as well as in
industry.
• While learning about the differences between
several supervised learning algorithms for
classification, we will also develop an intuitive
appreciation of their individual strengths and
weaknesses.
• We will use the scikit-learn library, which offers a
user-friendly interface for using those algorithms
efficiently and productively.
6. • Robust and popular algorithms for classification,
such as
⚫ logistic regression,
⚫ support vector machines, and
⚫ decision trees
• Examples and explanations using the scikit-learn
machine learning library, which provides a wide
variety of machine learning algorithms via a
user-friendly Python API
• Discussions about the strengths and weaknesses
of classifiers with linear and nonlinear decision
boundaries
7. CHOOSING A CLASSIFICATION ALGORITHM
• To restate the no free lunch theorem by David
H. Wolpert, no single classifier works best across
all possible scenarios.
• In practice, it is always recommended that you
compare the performance of at least a handful of
different learning algorithms to select the best
model for the particular problem.
• Problems may differ in the number of features or
examples, the amount of noise in a dataset, and
whether the classes are linearly separable or not.
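The advice to compare a handful of algorithms can be sketched as follows. This is only an illustration under assumptions that are not in the slides: it uses the Iris dataset, 5-fold cross-validation, mean accuracy as the metric, and three arbitrarily chosen scikit-learn classifiers with mostly default settings.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Candidate models (an assumed, illustrative selection).
candidates = {
    "logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "SVM (RBF kernel)": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "decision tree": DecisionTreeClassifier(max_depth=4, random_state=1),
}

# Score every candidate with the same 5-fold cross-validation protocol.
scores = {}
for name, model in candidates.items():
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: mean accuracy = {scores[name]:.3f}")
```

Which model comes out on top depends on the dataset, which is the point of the no free lunch theorem.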
8. • The performance of a classifier (computational
performance as well as predictive power)
depends heavily on the underlying data that is
available for learning.
• The five main steps involved in training
a supervised machine learning algorithm are:
1. Selecting features and collecting labeled training
examples.
2. Choosing a performance metric.
3. Choosing a classifier and optimization algorithm.
4. Evaluating the performance of the model.
5. Tuning the algorithm.
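The five steps above can be sketched in scikit-learn roughly as follows. The dataset (Iris), the metric (accuracy), the classifier (logistic regression), and the candidate values for the regularization parameter C are all assumptions chosen for illustration, not choices made in the slides.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler

# Step 1: select features and collect labeled examples.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

# Step 2: choose a performance metric (accuracy here).
# Steps 3 and 5: choose a classifier, then tune its regularization
# strength C by cross-validation on the training data only.
cv_scores = {}
for C in (0.01, 1.0, 100.0):
    model = LogisticRegression(C=C, max_iter=1000)
    cv_scores[C] = cross_val_score(model, X_train_std, y_train, cv=5).mean()
best_C = max(cv_scores, key=cv_scores.get)

# Step 4: evaluate the tuned model on the held-out test set.
final_model = LogisticRegression(C=best_C, max_iter=1000)
final_model.fit(X_train_std, y_train)
acc = accuracy_score(y_test, final_model.predict(X_test_std))
print(f"best C = {best_C}, test accuracy = {acc:.3f}")
```

Tuning is done on the training folds so that the test set is used only once, for the final evaluation.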
9. FIRST STEPS WITH SCIKIT-LEARN – TRAINING A
PERCEPTRON
• In Module 2, Training Simple Machine Learning
Algorithms for Classification, we covered
the perceptron rule and Adaline, which were
implemented in Python and NumPy.
• Now consider the scikit-learn API, which
combines a user-friendly and consistent interface
with a highly optimized implementation of
several classification algorithms.
• The scikit-learn library offers not only a large
variety of learning algorithms, but also many
convenient functions to preprocess data and to
fine-tune and evaluate our models.
10. • To get started with the scikit-learn library, we will
train a perceptron model similar to the one that we
implemented in Module 2.
• For simplicity, we will use the already familiar Iris
dataset.
• We will only use two features from the Iris dataset for
visualization purposes.
• We will assign the petal length and petal width of the
150 flower examples to the feature matrix, X, and the
corresponding class labels of the flower species to the
vector array, y.
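The assignment of X and y described above might look like the following sketch. It assumes the copy of Iris bundled with scikit-learn, in which petal length and petal width are columns 2 and 3 of the data array.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()

# Columns 2 and 3 hold petal length and petal width (in cm).
X = iris.data[:, [2, 3]]
# Integer class labels, one per flower example.
y = iris.target

print("Feature matrix shape:", X.shape)
print("Class labels:", np.unique(y))
```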
19. LOGISTIC REGRESSION IN MACHINE LEARNING
• Logistic regression is a supervised learning technique
used for predicting a categorical dependent variable from
a given set of independent variables.
• It gives probabilistic values that lie
between 0 and 1.
• Linear regression is used for solving regression
problems, whereas logistic regression is used for
solving classification problems.
20. • Logistic regression can be used to classify
observations using different types of data and can
easily determine the most effective variables
for the classification. The image below shows
the logistic function:
21. LOGISTIC FUNCTION (SIGMOID FUNCTION):
• The sigmoid function is a mathematical function used to
map predicted values to probabilities.
• It maps any real value to a value between
0 and 1.
• The output of logistic regression must lie between 0 and
1, so the function forms an S-shaped curve.
• This S-shaped curve is called the sigmoid function or the
logistic function.
• In logistic regression, we use a threshold value to turn
the probability into a class label: values above the
threshold are assigned to class 1, and values below it to
class 0.
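A minimal sketch of the sigmoid mapping and the threshold rule, using the standard form 1 / (1 + e^(-z)). The function names and the threshold of 0.5 are assumptions chosen for illustration.

```python
import numpy as np


def sigmoid(z):
    """Map any real value z to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))


def predict_label(z, threshold=0.5):
    """Assign class 1 when the probability reaches the threshold, else class 0."""
    return (sigmoid(z) >= threshold).astype(int)


z = np.array([-4.0, 0.0, 4.0])
probs = sigmoid(z)
labels = predict_label(z)
print("probabilities:", probs)
print("labels:", labels)
```

Large negative inputs give probabilities near 0, large positive inputs give probabilities near 1, and an input of 0 maps to exactly 0.5.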
23. • Types of logistic regression:
• On the basis of the categories, logistic regression can be
classified into three types:
• Binomial: In binomial logistic regression, there can be
only two possible types of the dependent variable, such as
0 or 1, Pass or Fail, etc.
• Multinomial: In multinomial logistic regression, there
can be 3 or more possible unordered types of the
dependent variable, such as "cat", "dog", or "sheep".
• Ordinal: In ordinal logistic regression, there can be 3 or
more possible ordered types of the dependent variable, such
as "Low", "Medium", or "High".
24. SUPPORT VECTOR MACHINE ALGORITHM
• SVM is a supervised ML algorithm that is used for
classification as well as regression problems.
• The goal of the SVM algorithm is to create the best line
or decision boundary that can segregate n-dimensional
space into classes. This best decision boundary is called
a hyperplane.
• SVM chooses the extreme points/vectors that help in
creating the hyperplane.
• These extreme cases are called support vectors, and
hence the algorithm is termed Support Vector Machine.
25. Consider the diagram below, in which two different
categories are separated by a decision boundary or
hyperplane:
26. The SVM algorithm can be used for face detection,
image classification, text categorization, etc.
27. TYPES OF SVM
• SVM can be of two types:
• Linear SVM: Linear SVM is used for linearly
separable data. If a dataset can be
classified into two classes by using a single straight
line, then such data is termed linearly separable
data, and the classifier used is called a Linear SVM
classifier.
• Non-linear SVM: Non-linear SVM is used for
non-linearly separable data. If a dataset
cannot be classified by using a straight line, then such
data is termed non-linear data, and the classifier used is
called a Non-linear SVM classifier.
28. HYPERPLANE AND SUPPORT VECTORS IN THE
SVM ALGORITHM:
• Hyperplane: There can be multiple lines/decision
boundaries to segregate the classes in n-dimensional
space, but we need to find the best decision
boundary that helps to classify the data points. This
best boundary is known as the hyperplane of SVM.
• The dimensions of the hyperplane depend on the number of
features present in the dataset: if there
are 2 features (as shown in the image), the hyperplane
will be a straight line, and if there are 3 features, the
hyperplane will be a two-dimensional plane.
• We always create the hyperplane that has the maximum
margin, which means the maximum distance between
the hyperplane and the nearest data points of each class.
29. SUPPORT VECTORS:
• The data points or vectors that are closest to the
hyperplane and that affect the position of the
hyperplane are termed support vectors.
• Since these vectors support the hyperplane, they are
called support vectors.
30. HOW DOES SVM WORK?
• Linear SVM:
• The working of the SVM algorithm can be
understood by using an example.
• Suppose we have a dataset that has two tags
(green and blue), and the dataset has two
features, x1 and x2.
31. We want a classifier that can classify a pair of
coordinates (x1, x2) as either green or blue.
Consider the image below:
32. • Since this is a 2-d space, we can easily separate
these two classes by using just a straight line.
But there can be multiple lines that separate
these classes. Consider the image below:
33. • Hence, the SVM algorithm helps to find the best line
or decision boundary; this best boundary or region is
called a hyperplane.
• The SVM algorithm finds the points from both classes
that lie closest to the line. These points are called support
vectors.
• The distance between the support vectors and the hyperplane
is called the margin.
• The goal of SVM is to maximize this margin.
• The hyperplane with maximum margin is called
the optimal hyperplane.
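The ideas of support vectors and margin can be illustrated with a small sketch. The six data points are made up for illustration, and the very large C value is an assumption used to approximate a hard-margin linear SVM with scikit-learn's SVC.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical, linearly separable toy data with features x1 and x2.
X = np.array([
    [1.0, 1.0], [1.5, 2.0], [2.0, 1.0],   # class 0 ("green")
    [4.0, 4.0], [4.5, 5.0], [5.0, 4.0],   # class 1 ("blue")
])
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C leaves almost no room for margin violations.
svm = SVC(kernel="linear", C=1e6)
svm.fit(X, y)

# The hyperplane is w . x + b = 0.
w = svm.coef_[0]
b = svm.intercept_[0]

# Distance from the hyperplane to the closest points of either class.
margin = 1.0 / np.linalg.norm(w)

# Each support vector should sit at (approximately) that distance.
distances = np.abs(svm.decision_function(svm.support_vectors_)) / np.linalg.norm(w)

print("support vectors:\n", svm.support_vectors_)
print("margin:", margin)
print("support vector distances:", distances)
```

Only the support vectors determine the fitted hyperplane; moving any of the other points slightly, without crossing the margin, leaves it unchanged.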
35. • Non-linear SVM:
• If data is linearly arranged, we can separate
it by using a straight line, but for non-linear
data, we cannot draw a single straight line.
Consider the image below:
36. • To separate these data points, we need to add one more
dimension. For linear data, we used two dimensions, x and
y, so for non-linear data, we will add a third dimension, z.
• It can be calculated as: z = x^2 + y^2
• After adding the third dimension, the sample space looks like
the image below:
37. • Now SVM will divide the dataset into classes
in the following way. Consider the image below:
38. • Since we are in 3-d space, the boundary looks like a
plane parallel to the x-y plane. If we convert it back to 2-d
space with z = 1, it becomes the curve x^2 + y^2 = 1.
• Hence, for this non-linear data, we get a circle of radius 1
as the decision boundary.
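The effect of the added dimension z = x^2 + y^2 can be checked with a small NumPy sketch. The two rings of points (radii 0.5 and 1.5) are made-up data, chosen so that the plane z = 1 separates them, and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: an inner ring (class 0) and an outer ring (class 1).
angles = rng.uniform(0.0, 2.0 * np.pi, size=100)
radii = np.concatenate([np.full(50, 0.5), np.full(50, 1.5)])
x = radii * np.cos(angles)
y = radii * np.sin(angles)
labels = (radii > 1.0).astype(int)

# Third dimension from the slides: z = x^2 + y^2.
z = x**2 + y**2

# In (x, y, z) space the flat plane z = 1 separates the classes;
# back in 2-d this is the circle x^2 + y^2 = 1.
predicted = (z > 1.0).astype(int)
accuracy = (predicted == labels).mean()
print("accuracy with the plane z = 1:", accuracy)
```

No straight line in the original (x, y) plane can separate the two rings, but a flat plane in the lifted space can.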
39. • Python implementation of Support Vector
Machine
• Now we will implement the SVM algorithm using
Python. Here we will use the same dataset, user_data,
that we used for logistic regression and KNN
classification.
• Data pre-processing step
• Up to the data pre-processing step, the code remains
the same. The code is available at:
https://www.javatpoint.com/machine-learning-support-vector-machine-algorithm