3. Introduction to SVM
• Support Vector Machines (SVMs) are a type of supervised learning algorithm
that can be used for classification or regression tasks.
• The main idea behind SVMs is to find a hyperplane that maximally separates
the different classes in the training data.
• This is done by finding the hyperplane that has the largest margin, which is
defined as the distance between the hyperplane and the closest data points
from each class.
• SVMs are particularly useful when the data has many features, and/or when
there is a clear margin of separation in the data.
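The idea above can be sketched in a few lines. This is a minimal example using scikit-learn's `SVC` on its bundled iris dataset; the slides do not name a library, so the choice of scikit-learn is an assumption:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load a small labelled dataset and hold out a test split
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Fit a support vector classifier; it searches for the
# maximum-margin separating hyperplane between the classes
clf = SVC(kernel="linear")
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
```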
4. Support Vector Machine (SVM)
[Diagram: within machine learning techniques, SVM is a supervised method used for classification and (linear) regression]
5. Applications of SVM
Face detection Gene classification Handwriting recognition
Object recognition
6. Data Type
SVM works on both linearly separable and non-linearly separable data.
[Figure: examples of linearly separable and non-linearly separable data]
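The difference matters in practice: a linear SVM cannot separate data whose classes form concentric rings, while a non-linear (RBF) kernel can. A small sketch, again assuming scikit-learn:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: not separable by any straight line
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# A linear SVM struggles here, while an RBF kernel separates the rings
linear_acc = SVC(kernel="linear").fit(X, y).score(X, y)
rbf_acc = SVC(kernel="rbf").fit(X, y).score(X, y)
```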
7. Important Parameters in Kernelized SVC (Support Vector Classifier)
1. Kernel
2. Gamma
3. 'C' parameter
8. Parameters of SVC (Cont…)
Margin
• The perpendicular distance between the hyperplane and the closest data points (on both sides)
• The optimal line with the maximum margin is termed the maximum-margin hyperplane
• The closest points, from which the margin distance is measured, are called support vectors
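After fitting, the support vectors can be inspected directly. A short sketch, assuming scikit-learn's `SVC` (whose fitted models expose a `support_vectors_` attribute):

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated clusters of points
X, y = make_blobs(n_samples=60, centers=2, random_state=6)

clf = SVC(kernel="linear", C=1000)  # large C approximates a hard margin
clf.fit(X, y)

# The support vectors are the closest points that fix the margin;
# only a small subset of the training data ends up as support vectors
sv = clf.support_vectors_
n_sv = len(sv)
```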
9. Parameters of SVC (Cont…)
'C' parameter
• Controls the amount of regularization applied to the model
• Large C -> small margin of the hyperplane
• Small C -> large margin of the hyperplane
• If C is too large, there is a risk of overfitting (the model fits the training noise)
• If C is too small, there is a risk of underfitting
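The C/margin trade-off can be observed through the number of support vectors: a wide (soft) margin encloses more points, so more of them become support vectors. A sketch under the same scikit-learn assumption:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping clusters
X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=0)

# Small C: heavy regularization, wide margin, many support vectors
soft = SVC(kernel="linear", C=0.01).fit(X, y)
# Large C: light regularization, narrow margin, fewer support vectors
hard = SVC(kernel="linear", C=100.0).fit(X, y)

n_soft = len(soft.support_vectors_)
n_hard = len(hard.support_vectors_)
```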
10. Parameters of SVC (Cont…)
Gamma
• Defines how far the influence of a single training point reaches when calculating the plausible line of separation
• Low gamma -> points far from the plausible line are considered in the calculation
• High gamma -> only points close to the plausible line are considered in the calculation
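A high gamma lets the boundary wrap tightly around individual training points, which shows up as near-perfect training accuracy (a symptom of overfitting). A sketch, assuming scikit-learn's RBF-kernel `SVC`:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.25, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Low gamma: far-away points still influence the boundary (smoother fit)
low = SVC(kernel="rbf", gamma=0.1).fit(X_tr, y_tr)
# High gamma: only nearby points matter (wiggly boundary, overfits)
high = SVC(kernel="rbf", gamma=100.0).fit(X_tr, y_tr)

low_train, low_test = low.score(X_tr, y_tr), low.score(X_te, y_te)
high_train, high_test = high.score(X_tr, y_tr), high.score(X_te, y_te)
```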
11. Parameters of SVM (Cont…)
Kernel
Different SVM variants use different kernel functions:
• Linear kernel
• Non-linear kernels:
  • Radial basis function (RBF)
  • Sigmoid
  • Polynomial
  • Exponential
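In scikit-learn (an assumed library choice), the kernel is just a constructor argument, so the built-in options can be compared side by side under the same scaling pipeline:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Cross-validate several built-in kernels on the same scaled data
scores = {}
for kernel in ["linear", "rbf", "poly", "sigmoid"]:
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    scores[kernel] = cross_val_score(model, X, y, cv=5).mean()
```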
12. Pros of SVM
• They perform well on a range of datasets.
• They are versatile: different kernel functions can be specified, or custom kernels can be defined for specific data types.
• They work well for both high- and low-dimensional data.
13. Cons of SVM
• Efficiency (running time and memory usage) decreases as the size of the training set increases.
• Needs careful normalization of input data and parameter tuning.
• Does not provide a direct probability estimator.
• Difficult to interpret why a prediction was made.
16. Introduction to Random Forest (RF)
• Random forest is a supervised learning algorithm
• It has two variations: one for classification problems and one for regression problems
• It builds decision trees on random samples of the given data
• It gets a prediction from each tree and selects the best result by voting
• The random forest algorithm combines multiple decision trees, resulting in a forest of trees
• It is also a good indicator of feature importance
• In a random forest classifier, a higher number of trees in the forest generally yields higher accuracy
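A minimal sketch of both points above (voting-based prediction and built-in feature importance), assuming scikit-learn's `RandomForestClassifier`:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# A forest of 100 trees; each tree votes and the majority wins
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_tr, y_tr)
accuracy = rf.score(X_te, y_te)

# Built-in feature-importance scores (they sum to 1)
importances = rf.feature_importances_
```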
20. Advantages of RF
• Reduced risk of overfitting
• Takes less training time
• Can predict highly accurate results on large datasets
• Can also handle missing data points
21. Algorithm of RF
• Step 1: Select random samples from the given data or training set.
• Step 2: Construct a decision tree for each sample.
• Step 3: Collect a prediction from every tree; for classification each tree votes, while for regression the predictions are averaged.
• Step 4: Finally, select the most-voted (or averaged) prediction as the final result.
This combination of multiple models is called an ensemble. Ensembles use two main methods:
Bagging: Creating different training subsets from the sample training data with replacement. The final output is based on majority voting.
Boosting: Combining weak learners into a strong learner by training sequential models such that the final model has the highest accuracy. Examples: AdaBoost, XGBoost.
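The two ensemble methods can be compared directly. A sketch, assuming scikit-learn, whose `BaggingClassifier` defaults to decision-tree base learners (bootstrap resamples plus voting) and whose `AdaBoostClassifier` trains weak learners sequentially:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Bagging: trees trained on bootstrap resamples, combined by voting
bagging = BaggingClassifier(n_estimators=50, random_state=0)
# Boosting: weak learners trained sequentially, each correcting the last
boosting = AdaBoostClassifier(n_estimators=50, random_state=0)

bag_acc = cross_val_score(bagging, X, y, cv=5).mean()
boost_acc = cross_val_score(boosting, X, y, cv=5).mean()
```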