The 10 Algorithms
Machine Learning
Engineers Need to Know
By Chode Amarnath
Machine Learning algorithms are classified into 4 types
→ Supervised Learning
→ Unsupervised Learning
→ Semi-Supervised Learning
→ Reinforcement Learning
Supervised Learning
→ The machine is trained on labeled data.
First, we train the machine with the input and the corresponding output, and then we ask
the machine to predict the output on the test dataset.
Let's understand supervised learning with an example.
→ Suppose we have an input dataset of cat and dog images.
→ First, we train the machine to understand the images: the shape and size of the
tail of a cat and a dog, the shape of the eyes, colour, height (dogs are taller, cats are
smaller), etc. After training, we input the picture of a cat and ask the machine to
identify the object and predict the output. The machine is now well trained, so it
checks all the features of the object, such as height, shape, colour, eyes, ears, and
tail, and identifies that it is a cat.
Advantages and Disadvantages
Advantages:
→ Since supervised learning works with a labelled dataset, we have an
exact idea about the classes of objects.
→ These algorithms are helpful in predicting the output on the basis of prior
experience.
Disadvantages:
→ These algorithms struggle to solve complex tasks.
→ They may predict the wrong output if the test data is different from the training data.
→ They require a lot of computational time to train.
Unsupervised Learning
→ In unsupervised machine learning, the machine is trained on an unlabeled dataset.
→ The main aim of an unsupervised learning algorithm is to group or categorize
the unsorted dataset according to similarities, patterns, and differences.
→ Machines are instructed to find the hidden patterns in the input dataset.
Clustering
→ Clustering is the process of partitioning a set of data (or
objects) into a set of meaningful sub-classes.
→ One application is market segmentation, where you may have a
database of customers and want to group them into different market segments so you
can sell to them separately or serve each segment better.
→ The K-Means algorithm is by far the most popular and most widely
used clustering algorithm; a minimal sketch follows.
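As a minimal sketch of how K-Means is typically run, here is an example with scikit-learn; the synthetic blob data and variable names are illustrative assumptions, not part of the original slides.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Illustrative synthetic data: 300 points in 3 natural groups.
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# Fit K-Means with k=3 clusters (k is chosen by the analyst).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

print(kmeans.cluster_centers_)  # coordinates of the 3 cluster centers
print(labels[:10])              # cluster assignment of the first 10 points
```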
Dimensionality Reduction
There are a couple of different reasons why one might want to do dimensionality
reduction.
→ One is data compression.
→ Data compression not only allows us to compress the data so that it
uses up less computer memory or disk space, but it also allows
us to speed up our learning algorithms; a minimal sketch follows.
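As a minimal sketch of dimensionality reduction used as data compression, here is a PCA example with scikit-learn; the 10-feature random data and the choice of 2 components are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.rand(200, 10)  # illustrative data: 200 samples, 10 features

# Compress 10 features down to 2 principal components.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                # (200, 2): the compressed representation
print(pca.explained_variance_ratio_)  # variance retained by each component
```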
→ Linear regression
→ Logistic regression
→ Decision tree
→ SVM algorithm
→ Naive Bayes algorithm
→ KNN algorithm
→ K-Means
→ Random Forest algorithm
→ Dimensionality reduction algorithms
Important Links
Code Video - https://www.youtube.com/watch?v=rw84t7QU2O0
https://www.fireblazeaischool.in/blogs/assumptions-of-linear-
regression/#Multivariate_Normality
The Assumptions of Linear Regression
Linear Regression is a useful statistical method we can use to understand the relationship
between two variables, x and y.
Linear models make the following assumptions:
1) Linear relationship between the variables and the target (Linearity).
2) Multivariate normality.
3) No or little collinearity.
4) Homoscedasticity.
5) No auto-correlation.
1. The Two Variables Should be in a Linear Relationship
A linear relationship refers to the relation between the independent variables X and the target
Y.
The assumption of a linear relationship can be easily visualized using scatter plots,
where we plot the independent variable X on the X-axis and the dependent variable Y on
the Y-axis.
Linear Relationship - Residual Plots
Multivariate Normality - Histogram
Multivariate normality means that every independent variable X follows a Gaussian
distribution (normal distribution).
1) Normality can be assessed with histograms and Q-Q plots.
2) Normality can be statistically tested, for example with the Kolmogorov-
Smirnov test.
3) When a variable is not normally distributed, a non-linear transformation (e.g. a
logarithm transformation) may fix the issue.
Multiple regression assumes that the residuals are normally distributed.
Tests to check Multivariate Normality
Q-Q plots -- If the data is normally distributed, the points fall along a fairly straight line.
→ If it is not normal, deviations from the straight line are seen (see the sketch below).
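A minimal sketch of both checks on a single variable, using scipy; the variable x and its assumed distribution are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.RandomState(0)
x = rng.normal(loc=0, scale=1, size=500)  # illustrative variable

# Q-Q plot: points should follow the straight reference line if x is normal.
stats.probplot(x, dist="norm", plot=plt)
plt.show()

# Kolmogorov-Smirnov test against a normal with x's own mean and std:
# a small p-value suggests the data is not normally distributed.
stat, p_value = stats.kstest(x, "norm", args=(x.mean(), x.std()))
print(stat, p_value)
```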
No or Low Multicollinearity
The next assumption of linear regression is that there should be little or no
multicollinearity in the given dataset.
→ This situation occurs when the features or independent variables of a
given dataset are highly correlated with each other.
→ In a model with correlated variables, it becomes difficult to determine
which variable is contributing to predicting the target variable. In addition, the
standard errors tend to increase due to the presence of correlated variables.
Methods to handle Multicollinearity
→ You can drop one of those features which are highly correlated in the given data.
→ Derive a new feature from collinear features and drop these features (used for
making new features).
Multicollinearity can be detected via various methods. We will focus on the most
common one – VIF (Variance Inflation Factor).
→ VIF determines the strength of the correlation between the independent
variables. It is computed by taking a variable and regressing it against every other
independent variable.
https://www.analyticsvidhya.com/blog/2020/03/what-is-multicollinearity/
→ An R^2 value is determined to find out how well an independent variable is
described by the other independent variables. A high value of R^2 means that the
variable is highly correlated with the other variables. This is captured by the VIF,
denoted below.
→ VIF = 1 / (1 - R^2)
→ So, the closer the R^2 value is to 1, the higher the value of VIF and the higher
the multicollinearity with that particular independent variable.
VIF starts at 1 and has no upper limit
→ VIF = 1, no correlation between the independent variable and the other variables.
→ VIF exceeding 5 or 10 indicates high multicollinearity between this independent
variable and the others.
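A minimal sketch of computing VIF with statsmodels; the DataFrame X of independent variables (x2 deliberately correlated with x1) is an illustrative assumption.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.RandomState(0)
# Illustrative features; x2 is deliberately correlated with x1.
X = pd.DataFrame({"x1": rng.rand(100)})
X["x2"] = X["x1"] * 2 + rng.rand(100) * 0.1
X["x3"] = rng.rand(100)

# Add an intercept column, since VIF is computed on the regression design matrix.
exog = sm.add_constant(X)

vif = pd.Series(
    [variance_inflation_factor(exog.values, i) for i in range(1, exog.shape[1])],
    index=X.columns,
)
print(vif)  # x1 and x2 should show high VIF values (above 5 or 10)
```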
Fixing Multicollinearity
Dropping one of the correlated features will help in bringing down the multicollinearity
between the correlated features.
Multicollinearity in categorical Variables
→ Multicollinearity can be detected with the Spearman rank correlation
coefficient (ordinal variables).
→ Chi-Square test (nominal variables).
It is important to note that the variables to be compared should have only 2
categories, i.e. 1 and 0; the chi-square test fails to determine the correlation between
variables with more than 2 categories.
Chi-Square Test
The Chi-Square test is a statistical test which is used to find out:
→ The difference between the observed and the expected data.
→ Whether the correlation between categorical variables is due to chance, or due
to a relationship between them.
A minimal sketch follows.
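A minimal sketch of a chi-square independence test between two binary categorical variables, using scipy; the 2x2 contingency table of counts is an illustrative assumption.

```python
from scipy.stats import chi2_contingency

# Illustrative 2x2 contingency table of observed counts for two
# binary variables (rows: variable A = 0/1, columns: variable B = 0/1).
observed = [[30, 10],
            [15, 45]]

chi2, p_value, dof, expected = chi2_contingency(observed)
print(chi2, p_value)  # small p-value: the variables are likely related
print(expected)       # counts expected if the variables were independent
```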
Homoscedasticity
Important video link https://www.youtube.com/watch?v=35jMqo2IroE
→ To understand homoscedasticity we must understand the residual value of the
dependent variable in regression analysis.
→ Residual values are the differences between the actual and predicted values.
→ Homoscedasticity refers to whether these residuals are distributed equally, or
whether they tend to cluster together at some values and spread far apart at other
values.
→ If the residuals are equally distributed, it is called homoscedasticity.
→ If the residuals tend to cluster together at some values, it is called
heteroscedasticity.
If we do a regression analysis and plot the distribution of the residuals, homoscedasticity
means the residuals are distributed uniformly, without forming clusters.
Heteroscedasticity
→ From left to right, the distribution takes a triangle (funnel) shape.
→ At the left side the values come very close together; as we go from left to
right, the values spread farther away from each other.
To Check
Draw a plot of the residuals against the predicted values to check homoscedasticity, as in
the sketch below.
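A minimal sketch of such a residual plot with matplotlib; the synthetic linear data and the fitted model are illustrative assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.rand(200, 1) * 10
y = 3 * X.ravel() + rng.normal(0, 1, 200)  # illustrative linear data

model = LinearRegression().fit(X, y)
predicted = model.predict(X)
residuals = y - predicted

# Homoscedastic data shows an even band around zero;
# a funnel shape from left to right indicates heteroscedasticity.
plt.scatter(predicted, residuals, alpha=0.5)
plt.axhline(0, color="red", linestyle="--")
plt.xlabel("Predicted values")
plt.ylabel("Residuals")
plt.show()
```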
No Autocorrelation
When you are building a linear regression model for forecasting purposes, you may come
across a problem called autocorrelation, which creates multiple issues in interpreting the
results.
Autocorrelation
→ Linear models assume that the error terms are independent.
→ When you build a regression model, the error terms need to be completely
independent of each other.
→ The error term is the difference between the value expected at a particular
time and the value that was actually observed.
→ When the error terms are uncorrelated, they are randomly distributed
around zero with no pattern.
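A common statistical check for autocorrelation in residuals is the Durbin-Watson test; here is a minimal sketch with statsmodels, where the residual series is an illustrative assumption. A statistic near 2 suggests no autocorrelation; values toward 0 or 4 suggest positive or negative autocorrelation.

```python
import numpy as np
from statsmodels.stats.stattools import durbin_watson

rng = np.random.RandomState(0)
residuals = rng.normal(0, 1, 100)  # illustrative residuals from a fitted model

dw = durbin_watson(residuals)
print(dw)  # close to 2.0 here, since these residuals are independent
```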
Outliers
An outlier is a data point which is significantly different from the remaining data.
Algorithms susceptible to outliers
1) Linear models
2) AdaBoost
Detecting Outliers
Normal Distribution
→ About 99.7% of the observations of a normally distributed variable lie within the mean
+- 3 * standard deviation.
→ Values outside the mean +- 3 * standard deviation are considered outliers.
Skewed Distribution
→ The general approach is to calculate the quantiles and then the interquartile
range (IQR).
→ IQR = 75th Quantile - 25th Quantile
→ Upper Limit = 75th Quantile + IQR * 1.5
→ Lower Limit = 25th Quantile - IQR * 1.5
Note
For extreme outliers, multiply the IQR by 3 instead of 1.5. A minimal sketch follows.
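A minimal sketch of IQR-based outlier detection with numpy; the data array, with two outliers injected, is an illustrative assumption.

```python
import numpy as np

rng = np.random.RandomState(0)
data = np.append(rng.normal(50, 5, 200), [95, 120])  # illustrative data + outliers

q25, q75 = np.percentile(data, [25, 75])
iqr = q75 - q25

lower = q25 - 1.5 * iqr  # use 3 * iqr instead of 1.5 for extreme outliers
upper = q75 + 1.5 * iqr

outliers = data[(data < lower) | (data > upper)]
print(outliers)  # should flag the injected values 95 and 120
```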
Linear regression
→ In linear regression, a relationship is established between the independent and
dependent variables by fitting a line.
→ The line is represented as y = a*x + b, where y is the dependent variable, x is the
independent variable, a is the slope, and b is the intercept.
→ Linear regression is used to solve regression problems; a minimal fitting sketch follows.
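A minimal sketch of fitting the line y = a*x + b with scikit-learn; the synthetic data (true a = 2.5, b = 4.0) is an illustrative assumption.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.rand(100, 1) * 10
y = 2.5 * X.ravel() + 4.0 + rng.normal(0, 1, 100)  # true a=2.5, b=4.0 plus noise

model = LinearRegression().fit(X, y)
print(model.coef_[0])          # estimated slope a, close to 2.5
print(model.intercept_)        # estimated intercept b, close to 4.0
print(model.predict([[5.0]]))  # prediction for x = 5
```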
Advantages of Linear Regression
https://www.geeksforgeeks.org/ml-advantages-and-disadvantages-of-linear-regression/
→ Linear Regression is simple to implement, and its output coefficients are easy to
interpret.
Explanation:
→ When you know that the independent and dependent variables have a
linear relationship, this algorithm is the best to use because of its lower complexity
compared to other models.
→ Linear Regression is susceptible to overfitting, but this can be avoided using
regularization (L1 and L2 techniques) and cross-validation.
Disadvantages of Linear regression
→ On the other hand, in the linear regression technique outliers can have huge
effects on the regression, and the boundaries are linear in this technique.
Explanation: Linear regression assumes a linear relationship between the
dependent and independent variables; that is, it assumes there is a straight-line
relationship between them.
Summary:
→ Linear regression is a great tool to analyze the relationship among
variables, but it is not recommended for most practical applications, as it
oversimplifies real-world relationships.
Assumptions of Logistic Regression
Logistic regression does not make many of the key assumptions of linear regression
and general linear models that are based on ordinary least squares algorithms,
particularly regarding linearity, normality, homoscedasticity, and measurement level.
→ First, logistic regression does not require a linear relationship between the
dependent and independent variables.
→ Second, the error terms (residuals) do not need to be normally distributed.
→ Third, homoscedasticity is not required.
→ Finally, the dependent variable in logistic regression is not measured on an
interval or ratio scale.
Supervised learning
→ In supervised learning, we are given a data set and already know what our
correct output should look like, having the idea that there is a relationship between the
input and the output.
→ Supervised learning problems are categorized into
1) Regression.
2) Classification problems.
Classification
→ In a classification problem, we are instead trying to predict results in a
discrete output. In other words, we are trying to map input variables into discrete
categories.
→ The main goal of classification is to predict the target class (Yes/No).
→ The classification problem is just like the regression problem, except that
the values we now want to predict take on only a small number of discrete values.
For now, we will focus on the binary classification problem in which y can take on
only two values, 0 and 1.
Types of classification:
Binary classification
When there are only two classes to predict, usually 1 or 0 values.
Multi-Class Classification
When there are more than two class labels to predict, we call it a multi-class classification task.
Logistic regression model
https://www.javatpoint.com/logistic-regression-in-machine-learning
→ Logistic regression predicts the output of a categorical dependent variable.
→ Therefore the outcome must be a categorical or discrete value. It can be either
Yes or No, 0 or 1, True or False, etc.
→ But instead of giving the exact values 0 and 1, it gives probabilistic values
which lie between 0 and 1.
→ In logistic regression, instead of fitting a regression line, we fit an "S"-shaped
logistic function, which predicts two maximum values (0 or 1).
Logistic regression uses the same concept of predictive modeling as regression; therefore, it
is called logistic regression. But it is used to classify samples; therefore, it falls under the
classification algorithms.
Logistic Function (Sigmoid Function):
→ The sigmoid function is a mathematical function used to map predicted
values to probabilities.
It maps any real value into a value within the range of 0 and 1.
→ In logistic regression, we use the concept of a threshold value, which decides
between 0 and 1: values above the threshold tend to 1, and values below the
threshold tend to 0. A minimal sketch follows.
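A minimal sketch of the sigmoid function and a 0.5 threshold in numpy; the input values are illustrative.

```python
import numpy as np

def sigmoid(z):
    # Maps any real value into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
probs = sigmoid(z)
print(probs)                       # approx [0.018, 0.269, 0.5, 0.731, 0.982]
print((probs >= 0.5).astype(int))  # threshold at 0.5 -> [0, 0, 1, 1, 1]
```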
Advantages of logistic regression
Linear Regression vs Logistic Regression
→ Linear Regression is used to handle regression problems,
→ whereas Logistic Regression is used to handle classification problems.
→ Linear regression provides a continuous output,
→ while Logistic regression provides a discrete output.
→ The purpose of Linear Regression is to find the best-fitted line,
while Logistic regression goes one step further and fits the line's values to the
sigmoid curve.
When do you use linear regression vs Decision
Trees?
→ Linear regression is a linear model, which means it works really nicely when the
data has a linear shape.
→ But, when the data has a non-linear shape, then a linear model cannot capture the
non-linear features.
→ So in this case, you can use the decision trees, which do a better job at capturing
the non-linearity in the data by dividing the space into smaller sub-spaces depending on
the questions asked.
Support Vector Machines
→ SVMs are considered by many to be the most powerful 'black box' learning
algorithm, and by posing a cleverly-chosen optimization objective, they are one of the most
widely used learning algorithms today.
→ Compared to both logistic regression and neural networks, the Support Vector
Machine, or SVM, sometimes gives a cleaner and sometimes more powerful way of
learning **complex nonlinear functions**.
→ SVMs are also called Large Margin Classifiers.
https://www.javatpoint.com/machine-learning-
support-vector-machine-algorithm
→ The Support Vector Machine is one of the most popular supervised learning
algorithms; it can be used for both classification and regression models.
→ It is primarily used for classification problems.
Important topics in SVMs are:
1) Large Margin Classification.
2) Kernels
Large Margin Intuition
In SVMs, for positive examples we want Theta^T x to be not just a little bit bigger than
zero, but comfortably greater; for negative examples, comfortably less than zero.
This builds an extra safety factor, or safety margin, into the SVM.
Consider the case where you set "C" to a very large value.
If "C" is very, very large, then when minimizing the optimization objective we are going to be
highly motivated to choose values so that the first term equals zero.
What would it take to make the first term in the objective equal to zero?
→ Whenever we have a training example with y = 1, to make the first term
zero, we need to find a value of theta
→ so that Theta^T x(i) >= 1.
→ Whenever we have an example with label y = 0, we need
→ Theta^T x(i) <= -1.
The SVM will instead choose the decision boundary drawn in black, and that seems like a
better decision boundary.
→ This black decision boundary has a larger distance from the examples; that
distance is called the "margin".
→ This distance is called the margin of the SVM.
→ This gives the SVM robustness, as it tries to separate the data with as large a
margin as possible.
SVM
https://www.analyticsvidhya.com/blog/2017/09/understaing-support-vector-machine-example-code/
https://www.analyticsvidhya.com/blog/2014/10/support-vector-machine-
simplified/?utm_source=blog&utm_medium=understandingsupportvectormachinearticle
Datacamp SVM
“Support Vector Machine” (SVM) is a supervised machine learning algorithm which can
be used for both classification and regression challenges. However, it is mostly used in
classification problems.
Definition and Objective
The objective of the support vector machine algorithm is to find a hyperplane in an N-
dimensional space (N being the number of features) that distinctly classifies the data
points.
The SVM algorithm has a feature to ignore outliers and find the hyper-plane that has the
maximum margin. Hence, we can say, SVM classification is robust to outliers.
SVM offers high accuracy compared to other classifiers such as logistic regression
and decision trees.
It is used in a variety of applications such as face detection, intrusion detection, and
classification of emails, news articles and web pages.
Support Vector, Hyperplane
Support vectors
Support vectors are the data points that are closest to the hyperplane and influence
the position and orientation of the hyperplane. Using these support vectors, we
maximize the margin of the classifier.
Hyperplane
A hyperplane is a decision plane which separates a set of objects having
different class memberships.
Margin
A margin is the gap between the two lines on the closest class points. If the margin is
larger between the classes, it is considered a good margin; a smaller margin is
considered a bad margin.
How does SVM work?
The main objective is to segregate the given dataset in the best possible way. The
objective is to select a hyperplane with the maximum possible margin between the
support vectors in the given dataset.
Dealing with non-linear and inseparable planes
Some problems can’t be solved using a linear hyperplane, as shown in the figure below.
In such situations, SVM uses a kernel trick to transform the input space into a higher-
dimensional space.
SVM Kernels
A kernel takes a low-dimensional input space and transforms it into a higher-dimensional
space. In other words, you can say that it converts a non-separable problem into a separable
problem by adding more dimensions to it. It is most useful in non-linear separation
problems. The kernel trick helps you to build a more accurate classifier.
Non-linear data
Tuning Hyperparameters
Kernel: The main function of the kernel is to transform the given dataset's input data into
the required form. Polynomial and RBF kernels are useful for non-linear hyperplanes; they
compute the separation line in the higher dimension. This transformation can lead to more
accurate classifiers.
Regularization: The regularization parameter (the C parameter in Python's Scikit-learn) is
used to control regularization. A smaller value of C creates a larger-margin hyperplane
that tolerates more margin violations, while a larger value of C creates a smaller-margin
hyperplane that tries to classify every training point correctly.
Gamma: A lower value of gamma will loosely fit the training dataset, whereas a higher
value of gamma will exactly fit the training dataset, which causes over-fitting. In other
words, a low value of gamma considers only nearby points in calculating the
separation line, while a high value of gamma considers all the data points in the
calculation of the separation line. A minimal sketch follows.
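A minimal sketch of these hyperparameters with scikit-learn's SVC; the moon-shaped synthetic data and the particular C and gamma values are illustrative assumptions.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Illustrative non-linear data: two interleaving half-moons.
X, y = make_moons(n_samples=300, noise=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# The RBF kernel handles the non-linear boundary; C and gamma trade off
# margin width against how tightly the boundary fits the training points.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)

print(clf.score(X_test, y_test))   # test accuracy
print(clf.support_vectors_.shape)  # the support vectors found
```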
Important Decision Tree Links
→ https://www.youtube.com/watch?v=eKD5gxPPeY0&list=PLBv09BD7ez_4temBw
7vLA19p3tdQH6FYO&index=1
Decision Tree (https://www.youtube.com/watch?v=nWuUahhK3Oc&t=1126s)
A supervised learning technique in Data Mining which can be used for the prediction of both
numeric and non-numeric target variables.
Trees in general use a divide-and-conquer strategy to try to divide the training data into
smaller and smaller subsets.
The algorithm goes through all the predictors and sees which one of them is the most
predictive of the target feature, and that feature becomes the root of our tree.
So you have the root at the top, and then you have splits, and then you have decision
nodes.
When the tree terminates, we call that node a terminal node.
But the questions you should ask (and should know the answer to) are:
→ How do you split a decision tree?
→ What are the different splitting criteria?
→ What is the difference between Gini and Information Gain?
LINK - https://www.analyticsvidhya.com/blog/2020/06/4-ways-split-decision-tree/
Parent and Child Node: A node that gets divided into sub-nodes is known as a Parent
Node, and these sub-nodes are known as Child Nodes. Since a node can be divided into
multiple sub-nodes, a node can act as the parent of numerous child nodes.
Root Node: The top-most node of a decision tree. It does not have any parent node. It
represents the entire population or sample.
Leaf / Terminal Nodes: Nodes that do not have any child node are known as
Terminal/Leaf Nodes.
A decision tree makes decisions by splitting nodes into sub-nodes. This process is performed multiple times during
training until only homogeneous nodes are left.
How to choose what feature to split on at each node?
This applies at the root node, as well as at the left branch and the right branch of the
decision tree.
Important points to consider:
→ We have to decide how to split when the examples at a node comprise a
mix of cats and dogs.
→ A decision tree will choose what feature to split on in order to try to maximize
purity.
Purity
Purity means you want to get to subsets which are as close as possible to a single
class of data samples.
Example: If we had a feature that said "does the animal have cat DNA" (we don't have this
feature, but if we did), we could split on this feature at the root node.
Two categories based on the type of target
variable
1. Continuous Target Variable
→ Reduction in Variance
2. Categorical Target Variable
→ Gini Impurity
→ Information Gain
→ Chi-Square
Variance
→ Variance is a measure of spread; it tells us how far the data is spread
from the mean.
→ A low value of variance leads to purer nodes.
→ A high value of variance leads to more impure nodes.
→ We prefer the split with the lowest variance.
Properties of Variance
→ Used when the target is continuous.
→ The split with the lower variance is selected.
Decision Tree Splitting Method #1: Reduction in Variance
Reduction in Variance is a method for splitting a node, used when the target variable is
continuous, i.e., regression problems.
It is so called because it uses variance as the measure for deciding the feature on which a
node is split into child nodes.
→ Variance = Σ(x - μ)² / n, where x is a sample value, μ (mu) is the mean, and n is
the number of samples.
→ A lower value of variance means the split is moving toward purer nodes.
Here are the steps to split a decision tree using reduction in variance (see the sketch
after this list):
For each split,
1) Individually calculate the variance of each child node.
2) Calculate the variance of the split as the weighted average variance of the child
nodes.
3) Select the split with the lowest variance.
4) Perform steps 1-3 until completely homogeneous nodes are achieved.
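A minimal sketch of scoring one candidate split by weighted child variance, using numpy; the toy target values and the chosen split point are illustrative assumptions.

```python
import numpy as np

def split_variance(left_y, right_y):
    # Weighted average variance of the two child nodes.
    n = len(left_y) + len(right_y)
    return (len(left_y) / n) * np.var(left_y) + (len(right_y) / n) * np.var(right_y)

# Illustrative continuous target values arriving at a node.
y = np.array([10.0, 12.0, 11.0, 30.0, 32.0, 31.0])

# Candidate split: first three samples left, last three right.
score = split_variance(y[:3], y[3:])
print(np.var(y))  # variance of the parent node (high: ~100.7)
print(score)      # weighted child variance (low: ~0.67) -> large reduction
```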
Decision Tree Splitting Method #2: Information Gain
Example
To select a feature to split on further, we need to know how impure or pure that split will be.
→ A pure sub-split means that you should be getting either all "yes" or all "no". A minimal
sketch of computing information gain follows.
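A minimal sketch of entropy-based information gain for a binary target, using numpy; the label arrays and the candidate split are illustrative assumptions.

```python
import numpy as np

def entropy(y):
    # Shannon entropy of a binary label array (0 = "no", 1 = "yes").
    p = np.mean(y)
    if p in (0.0, 1.0):
        return 0.0  # a pure node has zero entropy
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def information_gain(parent, left, right):
    # Parent entropy minus the weighted entropy of the child nodes.
    n = len(parent)
    child = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - child

parent = np.array([1, 1, 1, 0, 0, 0, 1, 0])
left, right = parent[:4], parent[4:]  # illustrative candidate split
print(information_gain(parent, left, right))  # higher gain = better split
```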
https://www.analyticsvidhya.com/blog/2021/10/an-introduction-to-random-forest-algorithm-for-beginners/
Random Forest (Bagging Algorithm)
Random Forest is a versatile machine learning method capable of performing both
regression and classification tasks.
It also undertakes dimensionality reduction, treats missing values and outlier
values, handles other essential steps of data exploration, and does a fairly good job. It is a
type of ensemble learning method, where a group of weak models combine to form a
powerful model.
→ We’re combining multiple trees to get the final output, and hence it is called a
forest.
But why is it called a Random Forest?
→ Because we use random bootstrap samples.
Random forest creates decision trees on randomly selected data samples, gets a
prediction from each tree, and selects the best solution by means of voting. It also
provides a pretty good indicator of feature importance.
How does the algorithm work?
It works in four steps (see the sketch after this list):
1) From a given dataset, multiple bootstrap samples are created;
the number of bootstrap samples depends on the number of models we want to train.
E.g.: if I want to build 10 models, then I’ll create 10
bootstrap samples.
2) Construct a decision tree for each bootstrap sample and get a
predicted value from each decision tree.
3) Perform a vote for each predicted result.
4) Select the prediction result with the most votes as the final
prediction.
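A minimal sketch of these steps with scikit-learn's RandomForestClassifier, which performs the bootstrapping and voting internally; the iris data is an illustrative assumption.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # illustrative dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# 10 trees, each trained on its own bootstrap sample (bootstrap=True is the
# default); the final prediction is a vote across the trees.
forest = RandomForestClassifier(n_estimators=10, bootstrap=True, random_state=42)
forest.fit(X_train, y_train)

print(forest.score(X_test, y_test))  # accuracy of the voted predictions
```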
Resample
Resampling is the process of creating new samples based on an observed sample.
→ Permutation tests
→ Bootstrapping -- Bootstrapping is a statistical procedure that resamples a single
dataset to create many simulated samples.
Bootstrapping
Bootstrapping is a powerful, non-parametric resampling technique that is used to assess
the uncertainty of an estimator.
→ In bootstrapping, a large number of samples of the same size are drawn
repeatedly from an original sample.
→ A given observation may be included in more than one sample, which
is known as sampling with replacement.
→ Each sample is of identical size.
→ The larger n is, the closer the set of samples will be to the ideal bootstrap sample.
A minimal sketch follows.
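A minimal sketch of bootstrapping the uncertainty of a sample mean with numpy; the data and the number of resamples are illustrative assumptions.

```python
import numpy as np

rng = np.random.RandomState(0)
data = rng.normal(100, 15, 50)  # illustrative original sample, n = 50

# Draw 1000 bootstrap samples, each the same size as the original,
# sampled with replacement, and record each sample's mean.
boot_means = np.array([
    rng.choice(data, size=len(data), replace=True).mean()
    for _ in range(1000)
])

print(data.mean())                             # point estimate
print(np.percentile(boot_means, [2.5, 97.5]))  # 95% bootstrap interval
```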
Bootstrap aggregation
Bootstrap aggregation, also known as bagging, is a powerful ensemble method that was
proposed to prevent overfitting.
→ The concept behind bagging is to combine the predictions of several base
learners to create a more accurate output.
→ Algorithms such as neural networks and decision trees are examples of
unstable learning algorithms.
→ Bagging supports both classification and regression problems.
→ The bootstrap is effective on small datasets.
Advantages:
Random forest is considered a highly accurate and robust method because of the
number of decision trees participating in the process.
It does not suffer from the overfitting problem. The main reason is that it takes the
average of all the predictions, which cancels out the biases.
The algorithm can be used in both classification and regression problems.
Random forests can also handle missing values. There are two ways to handle these:
using median values to replace continuous variables, and computing the proximity-
weighted average of missing values.
You can get the relative feature importance, which helps in selecting the most
contributing features for the classifier.
Disadvantages:
Random forest is slow in generating predictions because it has multiple decision
trees. Whenever it makes a prediction, all the trees in the forest have to make a
prediction for the same given input and then perform voting on it. This whole process
is time-consuming.
The model is difficult to interpret compared to a decision tree, where you can easily
make a decision by following the path in the tree.
Finding important features
Random forest also offers a good feature selection indicator. Scikit-learn provides an
extra attribute on the model, which shows the relative importance or contribution of
each feature in the prediction. A minimal sketch follows.
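A minimal sketch of reading that indicator from a fitted scikit-learn forest; the iris data is an illustrative assumption.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(iris.data, iris.target)

# feature_importances_ sums to 1.0; larger values contribute more.
for name, importance in zip(iris.feature_names, forest.feature_importances_):
    print(f"{name}: {importance:.3f}")
```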
Random Forests vs Decision Trees
A random forest is a set of multiple decision trees.
Deep decision trees may suffer from overfitting, but a random forest prevents overfitting
by creating trees on random subsets.
Decision trees are computationally faster.
A random forest is difficult to interpret, while a decision tree is easily interpretable and
can be converted to rules.
Gradient Boosting Algorithm
GBM is a boosting algorithm used when we deal with plenty of data and need to make
predictions with high predictive power.
Boosting
→ Boosting is a machine learning technique which combines the
predictions of several base estimators in order to improve robustness over a single
estimator.
→ It combines multiple weak or average predictors to build a strong predictor.
XGBoost is a powerful machine learning algorithm, especially where speed and accuracy
are required.
XGBoost requires parameter tuning to improve and fully leverage its
advantages over other algorithms.
XGBoost, or extreme gradient boosting, is one of the well-known gradient boosting
techniques (ensembles), with enhanced performance and speed among tree-based
machine learning algorithms. A minimal sketch of gradient boosting follows.
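A minimal sketch of gradient boosting with scikit-learn's GradientBoostingClassifier (used here instead of the xgboost library to keep the example dependency-free); the breast-cancer data and the particular hyperparameter values are illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)  # illustrative dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Each new tree is fit to the errors of the ensemble so far; learning_rate
# shrinks each tree's contribution, n_estimators sets the number of trees.
gbm = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
gbm.fit(X_train, y_train)

print(gbm.score(X_test, y_test))  # test accuracy
```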
Algorithm Working Process
Linear Regression - A relationship is established between the independent and dependent
variables by fitting a straight line.
Logistic Regression - In logistic regression, instead of fitting a straight line, an S-shaped
sigmoid function is fitted to get the output in discrete form, which provides two maximum
values, 0 or 1.
Linear regression - The method for calculating the loss function in linear regression is
mean squared error.
Logistic regression - Whereas for logistic regression it is maximum likelihood
estimation.
More Related Content

Similar to The 10 Algorithms Machine Learning Engineers Need to Know.pptx

Explore ml day 2
Explore ml day 2Explore ml day 2
Explore ml day 2preetikumara
 
Supervised Learning.pdf
Supervised Learning.pdfSupervised Learning.pdf
Supervised Learning.pdfgadissaassefa
 
causality_discussion_slides_final.pdf
causality_discussion_slides_final.pdfcausality_discussion_slides_final.pdf
causality_discussion_slides_final.pdfssuser8cde591
 
Core Machine Learning Algorithms
Core Machine Learning AlgorithmsCore Machine Learning Algorithms
Core Machine Learning AlgorithmsPriyanka Kasture
 
Machine Learning Clustering
Machine Learning ClusteringMachine Learning Clustering
Machine Learning ClusteringRupak Roy
 
Lecture 4 - Linear Regression, a lecture in subject module Statistical & Mach...
Lecture 4 - Linear Regression, a lecture in subject module Statistical & Mach...Lecture 4 - Linear Regression, a lecture in subject module Statistical & Mach...
Lecture 4 - Linear Regression, a lecture in subject module Statistical & Mach...Maninda Edirisooriya
 
Machine learning session4(linear regression)
Machine learning   session4(linear regression)Machine learning   session4(linear regression)
Machine learning session4(linear regression)Abhimanyu Dwivedi
 
machine learning
machine learningmachine learning
machine learninggunjisrihari2
 
2018 p 2019-ee-a2
2018 p 2019-ee-a22018 p 2019-ee-a2
2018 p 2019-ee-a2uetian12
 
Machine learning Mind Map
Machine learning Mind MapMachine learning Mind Map
Machine learning Mind MapAshish Patel
 
Predict Backorder on a supply chain data for an Organization
Predict Backorder on a supply chain data for an OrganizationPredict Backorder on a supply chain data for an Organization
Predict Backorder on a supply chain data for an OrganizationPiyush Srivastava
 
CounterFactual Explanations.pdf
CounterFactual Explanations.pdfCounterFactual Explanations.pdf
CounterFactual Explanations.pdfBong-Ho Lee
 
Application of Machine Learning in Agriculture
Application of Machine  Learning in AgricultureApplication of Machine  Learning in Agriculture
Application of Machine Learning in AgricultureAman Vasisht
 
How to understand and implement regression analysis
How to understand and implement regression analysisHow to understand and implement regression analysis
How to understand and implement regression analysisClaireWhittaker5
 
Mc0079 computer based optimization methods--phpapp02
Mc0079 computer based optimization methods--phpapp02Mc0079 computer based optimization methods--phpapp02
Mc0079 computer based optimization methods--phpapp02Rabby Bhatt
 
Artificial intyelligence and machine learning introduction.pptx
Artificial intyelligence and machine learning introduction.pptxArtificial intyelligence and machine learning introduction.pptx
Artificial intyelligence and machine learning introduction.pptxChandrakalaV15
 

Similar to The 10 Algorithms Machine Learning Engineers Need to Know.pptx (20)

Explore ml day 2
Explore ml day 2Explore ml day 2
Explore ml day 2
 
Supervised Learning.pdf
Supervised Learning.pdfSupervised Learning.pdf
Supervised Learning.pdf
 
causality_discussion_slides_final.pdf
causality_discussion_slides_final.pdfcausality_discussion_slides_final.pdf
causality_discussion_slides_final.pdf
 
Core Machine Learning Algorithms
Core Machine Learning AlgorithmsCore Machine Learning Algorithms
Core Machine Learning Algorithms
 
Machine Learning Clustering
Machine Learning ClusteringMachine Learning Clustering
Machine Learning Clustering
 
Lecture 4 - Linear Regression, a lecture in subject module Statistical & Mach...
Lecture 4 - Linear Regression, a lecture in subject module Statistical & Mach...Lecture 4 - Linear Regression, a lecture in subject module Statistical & Mach...
Lecture 4 - Linear Regression, a lecture in subject module Statistical & Mach...
 
Machine learning session4(linear regression)
Machine learning   session4(linear regression)Machine learning   session4(linear regression)
Machine learning session4(linear regression)
 
machine learning
machine learningmachine learning
machine learning
 
Data Science Using Python
Data Science Using PythonData Science Using Python
Data Science Using Python
 
2018 p 2019-ee-a2
2018 p 2019-ee-a22018 p 2019-ee-a2
2018 p 2019-ee-a2
 
Machine learning Mind Map
Machine learning Mind MapMachine learning Mind Map
Machine learning Mind Map
 
Predict Backorder on a supply chain data for an Organization
Predict Backorder on a supply chain data for an OrganizationPredict Backorder on a supply chain data for an Organization
Predict Backorder on a supply chain data for an Organization
 
CounterFactual Explanations.pdf
CounterFactual Explanations.pdfCounterFactual Explanations.pdf
CounterFactual Explanations.pdf
 
Application of Machine Learning in Agriculture
Application of Machine  Learning in AgricultureApplication of Machine  Learning in Agriculture
Application of Machine Learning in Agriculture
 
How to understand and implement regression analysis
How to understand and implement regression analysisHow to understand and implement regression analysis
How to understand and implement regression analysis
 
Mc0079 computer based optimization methods--phpapp02
Mc0079 computer based optimization methods--phpapp02Mc0079 computer based optimization methods--phpapp02
Mc0079 computer based optimization methods--phpapp02
 
Unit-3 Data Analytics.pdf
Unit-3 Data Analytics.pdfUnit-3 Data Analytics.pdf
Unit-3 Data Analytics.pdf
 
Unit-3 Data Analytics.pdf
Unit-3 Data Analytics.pdfUnit-3 Data Analytics.pdf
Unit-3 Data Analytics.pdf
 
Unit-3 Data Analytics.pdf
Unit-3 Data Analytics.pdfUnit-3 Data Analytics.pdf
Unit-3 Data Analytics.pdf
 
Artificial intyelligence and machine learning introduction.pptx
Artificial intyelligence and machine learning introduction.pptxArtificial intyelligence and machine learning introduction.pptx
Artificial intyelligence and machine learning introduction.pptx
 

More from Chode Amarnath

Important Classification and Regression Metrics.pptx
Important Classification and Regression Metrics.pptxImportant Classification and Regression Metrics.pptx
Important Classification and Regression Metrics.pptxChode Amarnath
 
Vectorization In NLP.pptx
Vectorization In NLP.pptxVectorization In NLP.pptx
Vectorization In NLP.pptxChode Amarnath
 
Bag the model with bagging
Bag the model with baggingBag the model with bagging
Bag the model with baggingChode Amarnath
 
Feature engineering mean encodings
Feature engineering   mean encodingsFeature engineering   mean encodings
Feature engineering mean encodingsChode Amarnath
 
Validation and Over fitting , Validation strategies
Validation and Over fitting , Validation strategiesValidation and Over fitting , Validation strategies
Validation and Over fitting , Validation strategiesChode Amarnath
 
Difference between logistic regression shallow neural network and deep neura...
Difference between logistic regression  shallow neural network and deep neura...Difference between logistic regression  shallow neural network and deep neura...
Difference between logistic regression shallow neural network and deep neura...Chode Amarnath
 

More from Chode Amarnath (6)

Important Classification and Regression Metrics.pptx
Important Classification and Regression Metrics.pptxImportant Classification and Regression Metrics.pptx
Important Classification and Regression Metrics.pptx
 
Vectorization In NLP.pptx
Vectorization In NLP.pptxVectorization In NLP.pptx
Vectorization In NLP.pptx
 
Bag the model with bagging
Bag the model with baggingBag the model with bagging
Bag the model with bagging
 
Feature engineering mean encodings
Feature engineering   mean encodingsFeature engineering   mean encodings
Feature engineering mean encodings
 
Validation and Over fitting , Validation strategies
Validation and Over fitting , Validation strategiesValidation and Over fitting , Validation strategies
Validation and Over fitting , Validation strategies
 
Difference between logistic regression shallow neural network and deep neura...
Difference between logistic regression  shallow neural network and deep neura...Difference between logistic regression  shallow neural network and deep neura...
Difference between logistic regression shallow neural network and deep neura...
 

Recently uploaded

Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...jaredbarbolino94
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfMahmoud M. Sallam
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfUjwalaBharambe
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerunnathinaik
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)Dr. Mazin Mohamed alkathiri
 
CELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxCELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxJiesonDelaCerna
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxAvyJaneVismanos
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitolTechU
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,Virag Sontakke
 

Recently uploaded (20)

Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdf
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developer
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)
 
CELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxCELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptx
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptx
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptx
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
 

The 10 Algorithms Machine Learning Engineers Need to Know.pptx

  • 1. The 10 Algorithms Machine Learning Engineers Need to Know By Chode Amarnath
  • 2. Machine Learning algorithms are classified into 4 types → Supervised → Unsupervised Learning → Semi - Supervised Learning → Reinforcement Learning
  • 3. Supervised Learning → The machine trained on Labeled data first, we train the machine with the input and corresponding output, and then we ask the machine to predict the output using the test dataset. Let's understand supervised learning with an example. → Suppose we have an input dataset of cats and dog images. → So, first, we will provide the training to the machine to understand the images, such as the shape & size of the tail of cat and dog, Shape of eyes, colour, height (dogs are taller, cats are smaller), etc. After completion of training, we input the picture of a cat and ask the machine to identify the object and predict the output. Now, the machine is well trained, so it will check all the features of the object, such as height, shape, colour, eyes, ears, tail, etc., and find that it's a cat. So,
  • 4. Advantages and Disadvantages Advantages: → Since supervised learning work with the labelled dataset so we can have an exact idea about the classes of objects. → These algorithms are helpful in predicting the output on the basis of prior experience. Disadvantages: → These algorithms are not able to solve complex tasks. → It may predict the wrong output if the test data is different from the training data. → It requires lots of computational time to train the algorithm.
  • 5. Unsupervised Learning → In unsupervised machine learning the machine is trained on unlabeled dataset → The main aim of the unsupervised learning algorithm is to group or categories the unsorted dataset according to the similarities, patterns, and differences. → Machines are instructed to find the hidden patterns from the input dataset.
  • 6. Clustering → Clustering is a process of partitioning a set of data (or objects) into a set of meaningful sub-classes. → One is market segmentation where you may have a database of customers and want to group them into different market segments so you can sell to them separately or serve your different market segments better. → The K-Means algorithm is by far the most popular, by far the most widely used clustering algorithm.
  • 7. Dimensionality Reduction There are a couple of different reasons why one might want to do dimensionality reduction. → One is data compression. → data compression not only allows us to compress the data and have it therefore use up less computer memory or disk space, but it will also allow us to speed up our learning algorithms.
  • 8. → Linear regression → logistic regression → Decision tree → SVM Algorithm → Naive Bayes algorithm → KNN algorithm → K - means → Random Forest algorithm → Dimensionality reduction algorithm
  • 9. Important Links Code Video - https://www.youtube.com/watch?v=rw84t7QU2O0 https://www.fireblazeaischool.in/blogs/assumptions-of-linear- regression/#Multivariate_Normality
  • 10. The four Assumption of Linear regression Linear Regression is a useful statistical method we can use to understand the relationship between two variables, x and y Linear model make the following assumptions 1) Linear relationship between the variable and the target(Linearity). 2) Multivariate normality 3) No or little Collinearity 4) Homoscedasticity 5) No auto-correlation
  • 11. 1. The Two Variables Should be in a Linear Relationship Linear relationship refer to the relation between the independent variables X and target Y.
  • 12. The Assumptions of linear relationship can be easily visualized using scatter plots where we plot the independent variable X in the a-axis and the dependent variable Y in the Y - axis.
  • 13.
  • 14. Linear Relationship - Residual Plots
  • 15. Multivariate Normality - Histogram Multivariate Normality means that every independent Variable X follow a Gaussian Distribution(Normally Distribution). 1) Normality can be assessed with histograms and Q-Q plots. 2) Normality can be statistically tested for example with the Kolmogorov - Smirnov test. 3) When the variable is not normally distributed a non-linear transformation(Eg: logarithm - transformation) may fix this issue. Multiple regression assumes that the residuals are normally distributed
  • 16. Tests to check Multivariate Normality Q-Q plots -- If the data is normally distributed then it gets a fairly a straight line. → If it not normal then seen with deviation in the straight line
  • 17.
  • 18.
  • 19. No or Low Multicollinearity The next assumption of linear regression is that there should be less or no multicollinearity in the given dataset. → This situation occurs when the features or independent variables of a given dataset are highly correlated to each other. → In a model having correlated variables, it becomes difficult to determine which variable is contributing to predict the target variable. Another thing is, the standard errors tend to increase due to the presence of correlated variables.
  • 20. Methods to handle Multicollinearity → You can drop one of those features which are highly correlated in the given data. → Derive a new feature from collinear features and drop these features (used for making new features).
  • 21. Multicollinearity can be detected via various methods. we will focus on the most common one – VIF (Variable Inflation Factors). → VIF determines the strength of the correlation between the independent variables. It is predicted by taking a variable and regressing it against every other variable. https://www.analyticsvidhya.com/blog/2020/03/what-is-multicollinearity/ → R^2 value is determined to find out how well an independent variable is described by the other independent variables. A high value of R^2 means that the variable is highly correlated with the other variables. This is captured by the VIF which is denoted below. → VIF = 1/ 1 - R^2 → So, the closer the R^2 value to 1, the higher the value of VIF and the higher the multicollinearity with the particular independent variable.
  • 22. VIF starts at 1 and has no upper limit → VIF = 1, no correlation between the independent variable and the other variables. → VIF exceeding 5 or 10 indicates high multicollinearity between this independent variable and the others.
  • 23.
  • 24. Fixing Multicollinearity Dropping one of the correlated features will help in bringing down the multicollinearity between correlated
  • 25. Multicollinearity in categorical Variables → multicollinearity can be detected with spearman rank correlation coefficient(ordinal variables) → Chi - Square test (Nominal variables). It is important to note that the variables to be compared should have only 2 categories i.e 1 and 0 the chi-square test fails to determine the correlation between variables with more than 2 categories
  • 26. Chi-Square Test Chi - Square test is a statistical test which is used to find out the → Difference between the observed and the expected data. → Find the correlation between categorical variables is due to chance, or if it is due to a relationship between them. → It is important to note that the variables to be compared should have only 2 categories i.e 1 and 0 the chi-square test fails to determine the correlation between variables with more than 2 categories. Link
  • 27. Homoscedasticity Important video link https://www.youtube.com/watch?v=35jMqo2IroE → To understand Homoscedasticity we must understand Residual value of the dependent variable in regression Analysis. → Residual value are the difference between the actual and predicted value. → Homoscedasticity refers to whether these residuals distributed equally or whether they tend to cluster together at some values and spread far at some other values → If the residuals are equally distributed then it is called homoscedasticity. → if the residuals tend to cluster together at some values it is called Heteroscedasticity
  • 28. If we do regression analysis and draw the chart of residual variable distribution Residuals are distributed uniformly and any cluster formed uniformly
  • 29. Heteroscedasticity → From left to right the distribution has taken triangle shape → At left side the values are coming very close together as we are going left to right the values are far away from each other
  • 30. To Check Draw regplot on the basis of predicted and residual to check homoscedasticity
  • 31. No Autocorrelation When you are building a linear regression model for forecasting purpose, we’ll come across this problem called autocorrelation and create multiple issues in interpreting the results. Autocorrelation → Linear model assumes that error terms are independent. → when you build regression model the error terms need to be completely independent of each other. → the error term is the difference between the expected price at a particular time and the price that was actually observed.
  • 32. → when you have uncorrelated the error terms will be randomly distributed across the origin and no pattern
  • 33. Outliers An outlier is a data point which is significantly different from the remaining data. Algorithms susceptible to outliers 1) Linear models(Linear models) 2) Adaboost
  • 34.
  • 35.
  • 36.
  • 37. Detecting Outliers Normal Distribution → 99% of the observation of a normal distribution variable lie within the mean +- 3 * standard deviation. → values outside +- 3 * standard deviation are considered as outliers. Skewed Distribution → The general approach is to calculate the quantiles and then Interquartile range(IQR) → IQR = 75th Quantile - 25th Quantile → Upper Limit = 75th Quantile + IQR * 1.5
  • 38. Note For extreme outliers, Multiply the IQR by 3 instead of 1.5
  • 39. Linear regression → In linear regression a relationship is established between independent and dependent variable by fitting the the line. → The line is represented by a line y(Dependent variable) = a*x(independent ) + b(intercept) → linear regression is used to solve regression problems
  • 40. Advantages of Linear Regression https://www.geeksforgeeks.org/ml-advantages-and-disadvantages-of-linear-regression/ → Linear Regression is simple to implement and easier to interpret the output coefficient. Explanation: → when you know the relationship between the independent and dependent variable have a linear relationship, this algorithm is the best to use because of it’s less complexity to compared to other model. → Linear Regression is susceptible to overfitting but it can be avoided using some dimensionality reduction technique(Regularization L1 and L2 technique and cross validation)
  • 41. Disadvantages of Linear Regression
→ On the other hand, outliers can have a huge effect on the regression, and the decision boundaries of this technique are linear.
Explanation:
Linear regression assumes a linear relationship between the dependent and independent variables, meaning it assumes there is a straight-line relationship between them.
Summary:
→ Linear regression is a great tool to analyze the relationships among variables, but it is not recommended for most practical applications.
  • 42. Assumptions of Logistic Regression
Logistic regression does not make many of the key assumptions of linear regression and general linear models that are based on ordinary least squares, particularly regarding linearity, normality, homoscedasticity, and measurement level.
→ First, logistic regression does not require a linear relationship between the dependent and independent variables.
→ Second, the error terms (residuals) do not need to be normally distributed.
→ Third, homoscedasticity is not required.
→ Finally, the dependent variable in logistic regression is not measured on an interval or ratio scale.
  • 43. Supervised learning
→ In supervised learning, we are given a data set and already know what our correct output should look like, with the idea that there is a relationship between the input and the output.
→ Supervised learning problems are categorized into:
1) Regression problems
2) Classification problems
  • 44. Classification
→ In a classification problem, we are instead trying to predict results in a discrete output. In other words, we are trying to map input variables into discrete categories.
→ The main goal of classification is to predict the target class (e.g., Yes/No).
→ The classification problem is just like the regression problem, except that the values we now want to predict take on only a small number of discrete values. For now, we will focus on the binary classification problem, in which y can take on only two values, 0 and 1.
  • 45. Types of classification:
Binary classification: when there are only two classes to predict, usually labeled 1 or 0.
Multi-class classification: when there are more than two class labels to predict, we call it a multi-class classification task.
  • 48. Logistic regression model
https://www.javatpoint.com/logistic-regression-in-machine-learning
→ Logistic regression predicts the output of a categorical dependent variable.
→ Therefore the outcome must be a categorical or discrete value: Yes or No, 0 or 1, True or False, etc.
→ But instead of giving an exact value of 0 or 1, it gives probabilistic values which lie between 0 and 1.
→ In logistic regression, instead of fitting a regression line, we fit an "S"-shaped logistic function, which predicts two maximum values (0 or 1).
  • 49. Logistic regression uses the same concept of predictive modeling as regression, which is why it is called logistic regression; however, it is used to classify samples, so it falls under the classification algorithms.
  • 50. Logistic Function (Sigmoid Function):
→ The sigmoid function is a mathematical function used to map predicted values to probabilities. It maps any real value to another value within the range of 0 and 1.
→ In logistic regression, we use the concept of a threshold value, which defines the cutoff between predicting 0 and 1: values above the threshold tend to 1, and values below the threshold tend to 0. A minimal sketch follows.
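A minimal sketch of the sigmoid function and a 0.5 threshold rule, using NumPy (the input values are made up):

import numpy as np

def sigmoid(z):
    """Map any real value to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
probs = sigmoid(z)                    # probabilistic values between 0 and 1
labels = (probs >= 0.5).astype(int)   # threshold at 0.5
print(probs)
print(labels)  # -> [0 0 1 1 1]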
  • 53. Regression
→ Linear regression is used to handle regression problems,
→ whereas logistic regression is used to handle classification problems.
→ Linear regression provides a continuous output.
→ Logistic regression provides a discrete output.
→ The purpose of linear regression is to find the best-fitted line, while logistic regression goes one step further and fits the line's values to the sigmoid curve.
  • 54. When do you use linear regression vs. decision trees?
→ Linear regression is a linear model, which means it works really nicely when the data has a linear shape.
→ But when the data has a non-linear shape, a linear model cannot capture the non-linear features.
→ So in this case, you can use decision trees, which do a better job of capturing the non-linearity in the data by dividing the space into smaller sub-spaces depending on the questions asked, as the sketch below illustrates.
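A hedged sketch contrasting the two models on deliberately non-linear (quadratic) toy data; the dataset and depth are illustrative assumptions:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.3, size=200)  # non-linear target

linear = LinearRegression().fit(X, y)
tree = DecisionTreeRegressor(max_depth=4).fit(X, y)

print("linear R^2:", round(linear.score(X, y), 3))  # poor fit on curved data
print("tree   R^2:", round(tree.score(X, y), 3))    # captures the curvature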
  • 55. Support Vector Machines
→ SVMs are considered by many to be the most powerful 'black box' learning algorithm, and by posing a cleverly chosen optimization objective, they are one of the most widely used learning algorithms today.
→ Compared to both logistic regression and neural networks, the support vector machine (SVM) sometimes gives a cleaner and sometimes more powerful way of learning complex nonlinear functions.
→ SVMs are also called Large Margin Classifiers.
  • 56. https://www.javatpoint.com/machine-learning-support-vector-machine-algorithm
→ The support vector machine is one of the most popular supervised learning algorithms; it can be used for both classification and regression models.
→ It is primarily used for classification problems.
  • 57. Important topics in SVMs are: 1) Large Margin Classification. 2) Kernels
  • 58. Large Margin Intuition
In SVMs, we want Theta(transpose) * X to be not just a little bit bigger than zero for positive examples, but at least 1, and not just less than zero for negative examples, but at most −1. This builds an extra safety factor, or safety margin, into the SVM.
  • 60. Consider the case where you set C to a very large value.
If C is very, very large, then when minimizing this optimization objective we are going to be highly motivated to choose parameter values that make the first term equal to zero. What would it take to make the first term in the objective equal to zero?
→ Whenever we have a training example with y = 1, to make the first term zero we need to find a value of theta such that Theta(transpose) * Xi >= 1.
→ Whenever we have an example with label y = 0, we need Theta(transpose) * Xi <= -1.
  • 62. The SVM will instead choose the decision boundary in black, and that seems like a better decision boundary.
→ This black decision boundary keeps a larger distance from the nearest training examples; that distance is called the "margin" of the SVM.
→ This gives the SVM robustness, as it tries to separate the data with as large a margin as possible.
  • 66. SVM
https://www.analyticsvidhya.com/blog/2017/09/understaing-support-vector-machine-example-code/
https://www.analyticsvidhya.com/blog/2014/10/support-vector-machine-simplified/?utm_source=blog&utm_medium=understandingsupportvectormachinearticle
Datacamp SVM
A "Support Vector Machine" (SVM) is a supervised machine learning algorithm which can be used for both classification and regression challenges. However, it is mostly used in classification problems.
  • 67. Definition and Objective
The objective of the support vector machine algorithm is to find a hyperplane in an N-dimensional space (N = the number of features) that distinctly classifies the data points.
The SVM algorithm has a feature to ignore outliers and find the hyperplane that has the maximum margin; hence, we can say SVM classification is robust to outliers.
SVM offers high accuracy compared to other classifiers such as logistic regression and decision trees. It is used in a variety of applications such as face detection, intrusion detection, and the classification of emails, news articles, and web pages.
  • 70. Support Vectors, Hyperplane, and Margin
Support vectors: the data points which are closest to the hyperplane and influence its position and orientation; using these support vectors, we maximize the margin of the classifier.
Hyperplane: a decision plane which separates a set of objects having different class memberships.
Margin: the gap between the two lines on the closest class points. If the margin is larger between the classes, it is considered a good margin; a smaller margin is considered a bad one.
  • 71. How does SVM work?
The main objective is to segregate the given dataset in the best possible way, by selecting a hyperplane with the maximum possible margin between the support vectors in the given dataset.
Dealing with non-linear and inseparable planes:
Some problems cannot be solved using a linear hyperplane, as shown in the figure below. In such situations, SVM uses a kernel trick to transform the input space into a higher-dimensional space.
  • 72. SVM Kernels
A kernel takes a low-dimensional input space and transforms it into a higher-dimensional space. In other words, it converts a non-separable problem into a separable problem by adding more dimensions to it. It is most useful in non-linear separation problems. The kernel trick helps you build a more accurate classifier.
(Figure: non-linear data)
  • 74. Tuning Hyperparameters
Kernel: the main function of the kernel is to transform the given input data into the required form. Polynomial and RBF kernels are useful for non-linear hyperplanes; they compute the separation line in the higher dimension. This transformation can lead to more accurate classifiers.
Regularization: in Python's scikit-learn, the C parameter controls regularization. A smaller value of C creates a larger-margin (softer) hyperplane that tolerates some misclassification, while a larger value of C creates a smaller-margin hyperplane that tries to classify every training point correctly.
Gamma: a lower value of gamma will loosely fit the training dataset, whereas a higher value of gamma will exactly fit the training dataset, which causes overfitting. In other words, a low value of gamma considers only nearby points in calculating the separation line, while a high value of gamma considers all the data points in the calculation. A minimal tuning sketch follows.
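A minimal tuning sketch with scikit-learn's SVC and GridSearchCV; the toy dataset and parameter grid are illustrative assumptions, not recommended values:

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

param_grid = {
    "kernel": ["rbf", "poly"],   # non-linear kernels
    "C": [0.1, 1, 10],           # regularization strength
    "gamma": [0.01, 0.1, 1],     # kernel coefficient
}
search = GridSearchCV(SVC(), param_grid, cv=5).fit(X, y)
print(search.best_params_, round(search.best_score_, 3))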
  • 76. Important decision tree videos:
https://www.youtube.com/watch?v=eKD5gxPPeY0&list=PLBv09BD7ez_4temBw7vLA19p3tdQH6FYO&index=1
  • 77. Decision Tree (https://www.youtube.com/watch?v=nWuUahhK3Oc&t=1126s)
A supervised learning technique in data mining which can be used for prediction with both numeric and non-numeric independent variables.
Trees in general use a divide-and-conquer strategy: they try to divide the training data into smaller and smaller subsets.
The algorithm goes through all the predictors, sees which one of them is the most predictive of the target feature, and makes that feature the root of the tree. So you have the root at the top, then splits, then decision nodes. When the tree terminates, we call that a terminal node.
  • 78. But the questions you should ask (and should know the answers to) are:
→ How do you split a decision tree?
→ What are the different splitting criteria?
→ What is the difference between Gini and Information Gain?
LINK - https://www.analyticsvidhya.com/blog/2020/06/4-ways-split-decision-tree/
  • 80. Parent and Child Node: a node that gets divided into sub-nodes is known as a parent node, and these sub-nodes are known as child nodes. Since a node can be divided into multiple sub-nodes, a node can act as the parent node of numerous child nodes.
Root Node: the top-most node of a decision tree. It does not have any parent node, and it represents the entire population or sample.
Leaf / Terminal Nodes: nodes that do not have any child node are known as terminal or leaf nodes.
A decision tree makes decisions by splitting nodes into sub-nodes. This process is performed repeatedly during training until only homogeneous nodes are left.
  • 81. How to choose what feature to split on at each node?
This applies at the root node as well as at the left and right branches of the decision tree. Important points to consider:
→ We have to decide how to split when the examples at a node comprise a mix of cats and dogs.
→ The decision tree will choose which feature to split on in order to try to maximize purity.
  • 82. Purity
Purity means you want to get to subsets which are as close as possible to containing a single class of data samples.
Example: if we had a feature that said "does the animal have cat DNA", we could have split on it at the root node and obtained perfectly pure subsets. We don't have this feature, but if we did, it would be the ideal split.
  • 83. Two categories of splitting criteria, based on the type of target variable:
1. Continuous target variable
→ Reduction in Variance
2. Categorical target variable
→ Gini Impurity
→ Information Gain
→ Chi-Square
  • 84. Variance
→ Variance is a measure of spread: it tells us how far the data is spread from the mean.
→ A low value of variance leads to purer nodes.
→ A high value of variance leads to more impure nodes.
→ We look for low variance when splitting.
  • 85. Properties of Variance → Used when the target is continuous. → Split with lower Variance is selected.
  • 86. Decision Tree Splitting Method #1: Reduction in Variance
Reduction in variance is a method for splitting a node, used when the target variable is continuous, i.e., in regression problems. It is so called because it uses variance as the measure for deciding which feature a node is split on.
→ Variance = (1/n) * Σ (x_i − μ)², where x_i is a sample, μ is the mean, and n is the number of samples.
→ A lower value of variance means we are moving toward purer nodes.
  • 89. Here are the steps to split a decision tree using reduction in variance (a sketch follows):
1) For each split, individually calculate the variance of each child node.
2) Calculate the variance of the split as the weighted average variance of the child nodes.
3) Select the split with the lowest variance.
4) Repeat steps 1-3 until completely homogeneous nodes are achieved.
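A minimal sketch of steps 1-3 for a single candidate split, with hypothetical target values:

import numpy as np

def weighted_variance(left, right):
    """Weighted average variance of the two child nodes."""
    n = len(left) + len(right)
    return (len(left) / n) * np.var(left) + (len(right) / n) * np.var(right)

# Hypothetical target values reaching a node, and one candidate split
y_left = np.array([10.0, 11.0, 10.5])
y_right = np.array([30.0, 29.0, 31.0])

parent = np.concatenate([y_left, y_right])
reduction = np.var(parent) - weighted_variance(y_left, y_right)
print(round(reduction, 3))  # pick the split with the largest reduction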
  • 96. Decision Tree Splitting Method #2: Information Gain
  • 97. Example
To select a feature to split on, we need to know how impure or pure that split will be.
→ A pure sub-split means that you should be getting either all "yes" or all "no".
Suppose this is our dataset.
https://www.analyticsvidhya.com/blog/2021/10/an-introduction-to-random-forest-algorithm-for-beginners/
  • 98. Random Forest (Bagging Algorithm)
Random forest is a versatile machine learning method capable of performing both regression and classification tasks. It also undertakes dimensionality reduction, treats missing values and outlier values, and handles other essential steps of data exploration, and does a fairly good job. It is a type of ensemble learning method, where a group of weak models combine to form a powerful model.
→ We combine multiple trees to get the final output, hence it is called a forest. But why is it called a random forest?
→ Because we use random bootstrap samples.
  • 99. Random forest creates decision trees on randomly selected data samples, gets a prediction from each tree, and selects the best solution by means of voting. It also provides a pretty good indicator of feature importance.
  • 101. How does the algorithm work?
It works in four steps (see the sketch after this list):
1) From a given dataset, multiple bootstrap samples are created; the number of bootstrap samples depends on the number of models we want to train. E.g., if I want to build 10 models, I'll create 10 bootstrap samples.
2) Construct a decision tree for each bootstrap sample and get a predicted value from each decision tree.
3) Perform a vote for each predicted result.
4) Select the prediction result with the most votes as the final prediction.
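A minimal sketch of these four steps using scikit-learn's RandomForestClassifier, which performs the bootstrap sampling and voting internally; the toy dataset is an assumption for illustration:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_estimators = number of bootstrap samples / trees (10, as in the example above)
forest = RandomForestClassifier(n_estimators=10, bootstrap=True, random_state=0)
forest.fit(X_train, y_train)

print("accuracy:", round(forest.score(X_test, y_test), 3))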
  • 102. Resampling
Resampling is the process of creating new samples based on an observed sample.
→ Permutation tests
→ Bootstrapping: a statistical procedure that resamples a single dataset to create many simulated samples.
  • 103. Bootstrapping
Bootstrapping is a powerful, non-parametric resampling technique that is used to assess the uncertainty in an estimator (a sketch follows).
→ In bootstrapping, a large number of samples of the same size are drawn repeatedly from an original sample.
→ This allows a given observation to be included in more than one sample, which is known as sampling with replacement.
→ Each sample is of identical size.
→ The larger the number of resamples, the closer the set of samples will be to the ideal bootstrap distribution.
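A minimal sketch of bootstrapping with NumPy: resampling with replacement to estimate the uncertainty of the sample mean (the original sample is synthetic):

import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=50, scale=10, size=100)  # the original sample

boot_means = []
for _ in range(1000):
    # Draw a new sample of the same size, with replacement
    resample = rng.choice(sample, size=sample.size, replace=True)
    boot_means.append(resample.mean())

# 95% bootstrap confidence interval for the mean
print(np.percentile(boot_means, [2.5, 97.5]))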
  • 105. Bootstrap aggregation
Bootstrap aggregation, also known as bagging, is a powerful ensemble method that was proposed to prevent overfitting.
→ The concept behind bagging is to combine the predictions of several base learners to create a more accurate output.
→ Algorithms such as neural networks and decision trees are examples of unstable learning algorithms.
→ Bagging supports both classification and regression problems.
→ Bootstrapping is effective on small datasets.
  • 109. Advantages:
→ Random forest is considered a highly accurate and robust method because of the number of decision trees participating in the process.
→ It does not suffer from the overfitting problem. The main reason is that it takes the average of all the predictions, which cancels out the biases.
→ The algorithm can be used in both classification and regression problems.
→ Random forests can also handle missing values. There are two ways to handle these: using median values to replace continuous variables, and computing the proximity-weighted average of missing values.
→ You can get the relative feature importance, which helps in selecting the most contributing features for the classifier.
  • 110. Disadvantages:
→ Random forest is slow at generating predictions because it has multiple decision trees: whenever it makes a prediction, all the trees in the forest have to make a prediction for the same input and then vote on it. This whole process is time-consuming.
→ The model is difficult to interpret compared to a decision tree, where you can easily make a decision by following the path in the tree.
  • 111. Finding important features
Random forest also offers a good feature selection indicator. Scikit-learn provides an extra attribute with the model which shows the relative importance or contribution of each feature to the prediction.
  • 112. Random Forests vs Decision Trees
→ A random forest is a set of multiple decision trees.
→ Deep decision trees may suffer from overfitting, but random forests prevent overfitting by creating trees on random subsets.
→ Decision trees are computationally faster.
→ Random forests are difficult to interpret, while a decision tree is easily interpretable and can be converted into rules.
  • 113. Gradient Boosting Algorithm
GBM is a boosting algorithm used when we deal with plenty of data and need to make predictions with high predictive power.
Boosting
→ Boosting is a machine learning technique which combines the predictions of several base estimators in order to improve robustness over a single estimator.
→ It combines multiple weak or average predictors to build a strong predictor.
  • 114. XGBoost is a powerful machine learning algorithm, especially when speed and accuracy are required. XGBoost requires parameter tuning to improve on and fully leverage its advantages over other algorithms.
  • 115. XGBoost, or extreme gradient boosting, is one of the well-known gradient boosting (ensemble) techniques, offering enhanced performance and speed among tree-based models.
  • 116. Algorithm Working Process
Linear regression - a relationship is established between the independent and dependent variables by fitting a straight line.
Logistic regression - instead of fitting a straight line, an S-shaped sigmoid function is fitted to get the output in discrete form, which provides two maximum values, 0 or 1.
Linear regression - the loss function used in linear regression is the mean squared error.
Logistic regression - whereas for logistic regression it is maximum likelihood estimation. A minimal sketch contrasting the two losses follows.
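A minimal sketch on tiny made-up numbers: mean squared error for regression, and log loss (the negative log-likelihood, which maximum likelihood estimation minimizes) for classification:

import numpy as np

# Regression: mean squared error between actual and predicted values
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.5, 6.5])
print("MSE:", round(np.mean((y_true - y_pred) ** 2), 3))

# Classification: log loss between true labels and predicted probabilities
labels = np.array([1, 0, 1])
probs = np.array([0.9, 0.2, 0.7])
log_loss = -np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))
print("log loss:", round(log_loss, 3))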