Philosophies of Modeling
The simplest explanation is the best explanation.
In modeling, if we are given two models that
predict equally well, then we should always
choose the simpler one. 
Machine Learning India 1
Algorithm #1:
Least Squares Fitting
Machine Learning India 2
Scatterplot of your data:
Machine Learning India 3
What is the plot good for?
Machine Learning India 4
Prediction? BAM!
Machine Learning India 5
How do you do that?
Machine Learning India 6
You fit a line!
Machine Learning India 7
But is this the best line?
Machine Learning India 8
Or does the new line fit our data better?
Machine Learning India 9
How about a horizontal line?
Machine Learning India 10
How do you judge whether or not
a line is a good fit?
Machine Learning India 11
By seeing how close it is to the
data points? BAM!
Machine Learning India 12
Back to the horizontal line.
Machine Learning India 13
Residual:
Machine Learning India 14
Machine Learning India 15
Total Error = Sum of Squared Residuals
= (b – y₁)² + (b – y₂)² + … + (b – yₙ)²
Machine Learning India 16
What if we rotate the line a whole lot?
Machine Learning India 17
So there is a sweet spot between
a horizontal and a vertical line!
Machine Learning India 18
Line:
y = mx + c
m = slope, c = y-intercept
Machine Learning India 20
We will have to find the optimal values
of ‘m’ and ‘c’, in order to minimize the
sum of squared residuals.
Machine Learning India 21
Since we want to fit a line that will give us
the least amount of ‘sum of squares’, this
method for finding the best values of ‘m’
and ‘c’ is called least squares.
Machine Learning India 22
Plotting the ‘sum of squared residuals’
versus each rotation…
Machine Learning India 23
Big Important Concept #1:
We have to minimize the difference
between the observed values (target
values) and the line (output values).
Machine Learning India 28
Big Important Concept #2:
We do this by taking the derivative and
finding where the value of the derivative
equals zero.
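A minimal sketch of Concepts #1 and #2, assuming NumPy and a tiny made-up dataset (the data and the function names are illustrative, not from the slides). Setting the derivatives of the sum of squared residuals with respect to 'm' and 'c' to zero gives the closed-form expressions used below.

```python
import numpy as np

def fit_line(x, y):
    """Return (m, c) that minimize sum((y - (m*x + c))**2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Solving d(SSR)/dm = 0 and d(SSR)/dc = 0 gives these two formulas.
    m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    c = y_mean - m * x_mean
    return m, c

def sum_squared_residuals(x, y, m, c):
    """Total error of the line y = m*x + c on the data."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum((y - (m * x + c)) ** 2))

# Hypothetical data lying exactly on y = 2x + 1.
x = [0, 1, 2, 3]
y = [1, 3, 5, 7]
m, c = fit_line(x, y)
ssr_best = sum_squared_residuals(x, y, m, c)
# The horizontal line through the mean of y, for comparison.
ssr_flat = sum_squared_residuals(x, y, 0.0, float(np.mean(y)))
```

Comparing `ssr_best` with `ssr_flat` mirrors the rotation experiment: the fitted line has a smaller total error than the horizontal one.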
Machine Learning India 29
Big Important Concept #3:
Reducible and Irreducible error!
Machine Learning India 30
And you’re done!
Machine Learning India 31
Algorithm #2:
Linear Regression
Machine Learning India 32
Fitting a linear model:
1. Use least squares.
2. Calculate R².
3. Calculate p-value for R².
Machine Learning India 33
Before understanding R², let us understand
what variance, standard deviation,
covariance and correlation mean.
Machine Learning India 34
Variance is the average of the squared
differences from the mean.
Machine Learning India 35
Machine Learning India 36
Standard Deviation:
• It is a measure of how much the
members of a group differ from the
mean value of the group.
• It is a measure of how spread out the
members are.
• It is the square root of variance.
Machine Learning India 37
Machine Learning India 38
For the entire population.
Machine Learning India 39
For a sample from the population.
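A short sketch of the population and sample versions, assuming NumPy and a small made-up dataset. The population formulas divide by n, while the sample formulas divide by n - 1; in NumPy this is the `ddof` argument.

```python
import numpy as np

# Hypothetical data; its mean is exactly 5.
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Variance: the average of the squared differences from the mean.
pop_var = np.var(data)             # divides by n (entire population)
sample_var = np.var(data, ddof=1)  # divides by n - 1 (sample)

# Standard deviation: the square root of the variance.
pop_std = np.std(data)
sample_std = np.std(data, ddof=1)
```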
Machine Learning India 40
Covariance is the measure of the joint
variability of two random variables.
The sign of covariance shows the tendency
of the linear relationship between variables.
Machine Learning India 41
Formula for covariance:
Over the entire
population
Machine Learning India 42
Formula for covariance:
Over a sample
from population
Machine Learning India 43
Correlation is a statistical technique that
can show whether and how strongly pairs
of variables are related.
For example, height and weight are related;
taller people tend to be heavier than
shorter people.
Machine Learning India 44
Difference?
Machine Learning India 45
Covariance provides the direction of the
linear relationship, while correlation
provides the direction as well as strength.
Machine Learning India 46
Covariance has no upper or lower bounds,
and the value is dependent on the scale of
the variable, while…
Correlation is always between -1 and +1,
and is scale independent.
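To make the scale point concrete, here is a small sketch assuming NumPy and invented height and weight values. Converting heights from metres to centimetres multiplies the covariance by 100 but leaves the correlation unchanged.

```python
import numpy as np

def covariance(a, b):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    return float(np.sum((a - a.mean()) * (b - b.mean())) / (len(a) - 1))

def correlation(a, b):
    """Covariance divided by the product of the sample standard deviations."""
    return covariance(a, b) / float(np.std(a, ddof=1) * np.std(b, ddof=1))

# Hypothetical measurements.
height_m = np.array([1.50, 1.60, 1.70, 1.80, 1.90])
weight_kg = np.array([50.0, 58.0, 65.0, 77.0, 85.0])
height_cm = height_m * 100.0  # same data on a different scale

cov_m = covariance(height_m, weight_kg)
cov_cm = covariance(height_cm, weight_kg)
r_m = correlation(height_m, weight_kg)
r_cm = correlation(height_cm, weight_kg)
```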
Machine Learning India 47
Guidelines:
• First find out the pattern that the data is
exhibiting, by looking at a scatterplot.
• Correlation is only applicable to linear
relationships.
• Correlation is not causation.
• A strong correlation is not necessarily
statistically significant.
Machine Learning India 48
Guess the correlation coefficients!
Machine Learning India 49
How about these?
Machine Learning India 50
Pearson’s Correlation Coefficient:
In statistics, the Pearson correlation coefficient
(PCC), is a measure of the linear correlation
between two variables X and Y.
Machine Learning India 51
Machine Learning India 52
Revision!
Machine Learning India 53
Revision!
Machine Learning India 54
How can we more objectively state
whether or not a relationship exists
between two variables?
Machine Learning India 55
Relationship rule of thumb:
If |r| >= 2 / (√n)
Then, a relationship exists.
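The rule of thumb is a one-line check. A sketch in plain Python, where the function name is made up for illustration:

```python
import math

def relationship_exists(r, n):
    """Rule of thumb from the slide: |r| >= 2 / sqrt(n)."""
    return abs(r) >= 2 / math.sqrt(n)

# With n = 25 samples the threshold is 2 / 5 = 0.4.
strong_enough = relationship_exists(0.5, 25)
too_weak = relationship_exists(0.3, 25)
```

Note that the threshold shrinks as n grows, so a weak correlation can still pass the check on a large sample.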
Machine Learning India 56
Coming back to,
Fitting a linear model:
1. Use least squares.
2. Calculate R².
3. Calculate p-value for R².
Machine Learning India 57
r² : R² : R-Squared
It is a measure of how well a model fits to
data. It measures the goodness-of-fit.
It can also be seen as a statistical measure
of how close the data is fitted to the line.
Machine Learning India 58
r² : R² : R-Squared
In general, the higher the R², the better the
model fits your data. R² can be expressed as a
percentage or as a decimal value
between 0 and 1.
Machine Learning India 59
r² : R² : R-Squared
Machine Learning India 60
R² = (Var(mean) – Var(line)) / Var(mean)
Machine Learning India 61
If R² turns out to be 80%, then it
means that there is 80% less variation
around the line than around the mean.
Machine Learning India 62
Big Important Concept #4:
R² gives the percentage of variation
explained by the relationship between two
variables.
Machine Learning India 63
Big Important Concept #5:
If someone gives you the value of the plain
old R (PCC), just square it!
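A small sketch tying the R² formula to Concept #5, assuming NumPy and made-up data. It computes R² from the variation around the mean and around the fitted line, then compares it with the squared Pearson coefficient.

```python
import numpy as np

def r_squared(y, y_pred):
    """R^2 = (Var(mean) - Var(line)) / Var(mean), using sums of squares."""
    y = np.asarray(y, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    var_mean = np.sum((y - y.mean()) ** 2)  # variation around the mean
    var_line = np.sum((y - y_pred) ** 2)    # variation around the line
    return float((var_mean - var_line) / var_mean)

# Hypothetical data, roughly y = 2x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

m, c = np.polyfit(x, y, 1)    # least squares line
r2 = r_squared(y, m * x + c)
r = np.corrcoef(x, y)[0, 1]   # plain old R (PCC)
```

For a simple least squares line with one predictor, `r2` and `r ** 2` agree up to rounding.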
Machine Learning India 64
Adjusted R²
The adjusted R-squared is a modified
version of R-squared that has been
adjusted for the number of predictors in
the model.
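A sketch of the commonly used adjusted R² formula, 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of samples and p the number of predictors. The function and argument names are illustrative.

```python
def adjusted_r_squared(r2, n_samples, n_predictors):
    """Penalize R^2 for the number of predictors in the model."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)

# Same R^2 and sample size, different numbers of predictors.
adj_two = adjusted_r_squared(0.8, 21, 2)
adj_five = adjusted_r_squared(0.8, 21, 5)
```

With the same R², the model that uses more predictors gets the lower adjusted score.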
Machine Learning India 65
Adjusted R²
Machine Learning India 66
P-value
When you perform a hypothesis test in statistics, a
p-value helps you determine the significance of
your results. It answers the question, “Does this
result provide enough evidence that something
is wrong with my assumptions, or could this
result come out just because of luck?”
Machine Learning India 67
The smaller the p-value, the less
likely it is that a result like ours
would show up through luck alone.
Machine Learning India 68
Process:
1. Assuming that the null hypothesis is true.
2. Taking a sample and getting the statistic.
3. Working out how likely it is to get a statistic
like this, by calculating the p-value.
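One way to sketch these three steps is a permutation test, which is a swapped-in technique for illustration and not necessarily the test the slides have in mind for R². It assumes NumPy, made-up data, and a null hypothesis of "no relationship between x and y".

```python
import numpy as np

def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """Estimate how often shuffled data gives a correlation this strong."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Step 2: the statistic observed on the real sample.
    observed = abs(np.corrcoef(x, y)[0, 1])
    count = 0
    for _ in range(n_permutations):
        # Step 1: under the null, any pairing of x and y is equally likely.
        shuffled = rng.permutation(y)
        if abs(np.corrcoef(x, shuffled)[0, 1]) >= observed:
            count += 1
    # Step 3: the share of shuffles at least as extreme (with +1 smoothing).
    return (count + 1) / (n_permutations + 1)

# Hypothetical, perfectly linear data.
x = np.arange(20, dtype=float)
y = 3.0 * x + 2.0
p = permutation_p_value(x, y)
```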
Machine Learning India 69
If ‘p’ is low, NULL must GO!

Machine Learning India 70
If ‘p’ is high, the null gets by!
(We fail to reject the null hypothesis; this does
not prove that the alternative is false.)
Machine Learning India 71
Coming back to,
Fitting a linear model:
1. Use least squares.
2. Calculate R².
3. Calculate p-value for R².
Done!
Machine Learning India 72
Linear Regression Visualization
Machine Learning India 73
Big Important Concept #6:
Overfitting and Underfitting!
Machine Learning India 74
One of the major aspects of training your
machine learning model is avoiding
overfitting. An overfit model scores well on its
training data but has low accuracy on new,
unseen data. This happens because the model is
trying too hard to capture the noise in your
training dataset.
Machine Learning India 75
By noise we mean the data points that reflect
random chance rather than the true properties of
your data. Learning such data points makes your
model more flexible, at the risk of overfitting.
The concept of balancing bias and variance is
helpful in understanding the phenomenon of
overfitting.
Machine Learning India 76
Big Important Concept #7:
Bias Variance Tradeoff:
The inability of a machine learning model to
capture the true relationship is called bias.
The difference in fits between datasets is
called variance. The goal is to achieve low
bias and low variance.
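A rough sketch of the tradeoff, assuming NumPy and synthetic noisy data drawn around the line y = 2x (all values here are invented). A straight line is the rigid, higher-bias model; a degree-7 polynomial is the flexible, higher-variance model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training and test sets drawn from y = 2x plus noise.
x_train = np.linspace(0.0, 1.0, 12)
y_train = 2.0 * x_train + rng.normal(0.0, 0.2, size=12)
x_test = np.linspace(0.04, 0.96, 12)
y_test = 2.0 * x_test + rng.normal(0.0, 0.2, size=12)

def mse(coeffs, x, y):
    """Mean squared error of a polynomial with the given coefficients."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

simple = np.polyfit(x_train, y_train, 1)    # straight line
flexible = np.polyfit(x_train, y_train, 7)  # degree-7 polynomial

train_simple = mse(simple, x_train, y_train)
train_flexible = mse(flexible, x_train, y_train)
test_simple = mse(simple, x_test, y_test)
test_flexible = mse(flexible, x_test, y_test)
```

The flexible model can never do worse than the line on the training data, because it has the line as a special case. Comparing `test_simple` with `test_flexible` shows whether that extra flexibility carried over to unseen points or just chased the noise.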
Machine Learning India 77
Bias Variance Tradeoff
Machine Learning India 78
Bias Variance Tradeoff
Machine Learning India 79
Big Important Concept #8:
No Free Lunch Theorem:
No single machine learning algorithm is
better than all others on all problems. It is
common to try multiple models and find
the one that works the best for that
particular problem.
Machine Learning India 80
Algorithm #3:
Multiple Linear Regression
Machine Learning India 81
Multiple Linear Regression is just an
extension of simple linear regression.
It is used to determine a mathematical
relationship among a number of random
variables. In other terms, MLR examines how
multiple independent variables are related to one
dependent variable.
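A minimal sketch with two independent variables, assuming NumPy and noise-free made-up data generated from y = 1 + 2·x1 - 3·x2. Least squares is the same idea as before, solved here with `np.linalg.lstsq`.

```python
import numpy as np

# Hypothetical data: two independent variables per row.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

# Prepend a column of ones so the first coefficient is the intercept.
X_design = np.column_stack([np.ones(len(X)), X])
coeffs, *_ = np.linalg.lstsq(X_design, y, rcond=None)
intercept, b1, b2 = coeffs
```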
Machine Learning India 82
Machine Learning India 83
The equation:
Machine Learning India 84
Machine Learning India 85
Alert:
• Having more independent variables can make
the model complicated.
• Adding more independent variables does not
guarantee a better prediction model.
Machine Learning India 86
Alert:
Multicollinearity must be checked for and avoided.
Multicollinearity is the phenomenon where one or
more independent variables in a regression model
strongly predict one or more other independent
variables. The dummy-variable trap is one common
way it arises.
Homework!
Machine Learning India 87
Regularization:
This is a form of regression, that constrains/
regularizes or shrinks the coefficient estimates
towards zero. In other words, this technique
discourages learning a more complex or flexible
model, so as to avoid the risk of overfitting.
Ridge Regression
Lasso Regression
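A sketch of the shrinking effect using the closed-form ridge solution, assuming NumPy, synthetic data, and no intercept term (the data is generated around zero). Lasso has no closed form, so it is not shown. The penalty strength `lam` is a name chosen for illustration.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Solve (X^T X + lam * I) w = X^T y; lam = 0 is ordinary least squares."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = X @ np.array([4.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=30)

w_ols = ridge_fit(X, y, 0.0)     # no regularization
w_ridge = ridge_fit(X, y, 10.0)  # coefficients shrunk towards zero
```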
Machine Learning India 88
How do we estimate which parameters
are actually important for our model?
Machine Learning India 89
• Have domain knowledge.
• Use Subset Selection Methods.
– All-in method
– Backward Elimination
– Forward Selection
– Bidirectional Elimination
– Score Comparison
Machine Learning India 90
General Intuition:
Machine Learning India 91
Algorithm #4:
Polynomial Regression
Machine Learning India 92
Polynomial Regression:
In statistics, polynomial regression is a form of
regression analysis in which the relationship
between the independent variable x and the
dependent variable y is modeled as an nth
degree polynomial in x.
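A minimal sketch, assuming NumPy and noise-free made-up data from y = 1 - 2x + 0.5x². `np.polyfit` fits the polynomial by least squares and returns the coefficients with the highest power first.

```python
import numpy as np

# Hypothetical data on a known quadratic.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = 1.0 - 2.0 * x + 0.5 * x ** 2

coeffs = np.polyfit(x, y, deg=2)      # [a2, a1, a0]
prediction = np.polyval(coeffs, 4.0)  # evaluate the fitted curve at x = 4
```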
Machine Learning India 93
The equation:
Machine Learning India 94
The fitment:
Machine Learning India 95
The fitment in 3-Dimensions:
Machine Learning India 96
The fitment in 3-Dimensions:
Machine Learning India 97
Woah, we had a great time
predicting continuous values!
Machine Learning India 98
What if I want to predict
discrete values?
Machine Learning India 99
Algorithm #5:
Logistic Regression
Machine Learning India 100
Logistic regression is a predictive analysis. It is
used to describe data and to explain the
relationship between one dependent binary
variable and one or more independent variables.
Machine Learning India 101
Logistic regression is intended for binary
(two-class) classification problems.
Machine Learning India 102
y = mx + c
m = slope, c = y-intercept
Machine Learning India 106
Logistic Function
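A sketch of how the line and the logistic function fit together, in plain Python. The logistic function 1 / (1 + e^(-z)) squashes the output of the line into a value between 0 and 1, which can be read as a probability. The function names and the example slope and intercept are made up.

```python
import math

def logistic(z):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(x, m, c):
    """Pass the line y = m*x + c through the logistic function."""
    return logistic(m * x + c)

# Hypothetical model with m = 1 and c = -2: the boundary sits at x = 2.
p_boundary = predict_probability(2.0, 1.0, -2.0)
p_high = predict_probability(12.0, 1.0, -2.0)
p_low = predict_probability(-8.0, 1.0, -2.0)
```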
Machine Learning India 107
Big Important Concept #9:
Evaluating classification model with the
help of metrics! Choosing the right metric is
paramount in judging how well the model is
performing.
Machine Learning India 113
A confusion matrix is a table that is often
used to describe the performance of a
classification model (or "classifier") on a set
of test data for which the true values are
known. The confusion matrix itself is
relatively simple to understand, but the
related terminology can be confusing.
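A small sketch of the four cells of a binary confusion matrix and three metrics derived from them, in plain Python. The labels below are invented, with 1 as the positive class.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, with 1 as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    return tp, fp, fn, tn

# Hypothetical test labels and predictions.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)  # of the predicted positives, how many were right
recall = tp / (tp + fn)     # of the actual positives, how many were found
```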
Machine Learning India 114
Woah, we had a great time
predicting binary discrete values!
Machine Learning India 119
What if I want to predict n-ary
discrete values?
Machine Learning India 120
Algorithm #5:
Softmax Regression
Machine Learning India 121
Softmax regression (or multinomial logistic
regression) is a generalization of logistic
regression to the case where we want to handle
multiple classes.
Machine Learning India 122
In logistic regression we assumed that the
labels were binary: y(i) ∈ {0,1}. We used such
a classifier to distinguish between two
categories. Softmax regression allows us to
handle y(i) ∈ {1, …, K} where K is the number
of classes.
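A minimal sketch of the softmax function, assuming NumPy. It turns K class scores into K probabilities that sum to 1. The scores are made up, and the code indexes classes from 0 rather than from 1 as in the text.

```python
import numpy as np

def softmax(scores):
    """Convert K real-valued scores into K probabilities."""
    scores = np.asarray(scores, dtype=float)
    shifted = scores - scores.max()  # subtracting the max avoids overflow
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

# Hypothetical scores for K = 3 classes.
probs = softmax([2.0, 1.0, 0.1])
predicted_class = int(np.argmax(probs))

# With K = 2 and one score fixed at 0, softmax reduces to the logistic function.
two_class = softmax([1.5, 0.0])[0]
```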
Machine Learning India 123
What if we have a huge
number of classes?
Machine Learning India 127
Algorithm #6:
Linear Discriminant Analysis
Let us first understand Principal Component Analysis
Machine Learning India 129
Algorithm #6:
Principal Component Analysis
Machine Learning India 130
In real world data analysis tasks we analyze
complex data i.e. multi-dimensional data.
Machine Learning India 131
As the number of dimensions increases, the data
becomes harder to visualize and to perform
computations on. How do we cope?
Remove the redundant dimensions.
Only keep the most important dimensions.
Machine Learning India 132
Principal component analysis (PCA) to the
rescue! It is a technique used to emphasize
variation and bring out strong patterns in a
dataset. It's often used to make data easy to
explore and visualize.
It is used for dimensionality reduction.
Machine Learning India 133
Too much to visualize.
StatQuest to our rescue!
Machine Learning India 134
https://www.youtube.com/watch?v=FgakZw6K1QQ
Machine Learning India 135
The main idea of principal component analysis (PCA) is
to reduce the dimensionality of a data set consisting
of many variables correlated with each other, either
heavily or lightly, while retaining the variation present
in the dataset, up to the maximum extent.
Machine Learning India 136
The same is done by transforming the variables to a
new set of variables, which are known as the
principal components (or simply, the PCs) and are
orthogonal, ordered such that the retention of
variation present in the original variables decreases as
we move down in the order.
Machine Learning India 137
So, in this way, the 1st principal component retains
maximum variation that was present in the original
components. The principal components are the
eigenvectors of a covariance matrix, and hence they
are orthogonal.
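The description above can be sketched with NumPy in a few lines: centre the data, take the eigenvectors of its covariance matrix, sort them by eigenvalue, and project. The two-dimensional dataset is synthetic and the function name is illustrative.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto the top eigenvectors of its covariance matrix."""
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    # eigh is meant for symmetric matrices such as a covariance matrix.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]  # largest variance first
    components = eigenvectors[:, order[:n_components]]
    return X_centered @ components, components, eigenvalues[order]

# Hypothetical 2D data with two strongly correlated variables.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2.0 * t + rng.normal(scale=0.1, size=200)])

projected, components, variances = pca(X, 1)  # reduce 2D to 1D
```

Each eigenvalue is the variance retained along its principal component, so `variances[0]` matches the variance of the 1D projection.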
Machine Learning India 138
Puzzle!
Suppose you want to reduce the dimensionality of
data from 2D to 1D while classifying it into two
categories. How will you do it?
Machine Learning India 139
Algorithm #7:
Linear Discriminant Analysis
Machine Learning India 140
Linear discriminant analysis is similar to
PCA: both can help us reduce the
dimensionality, but LDA also focuses on
maximizing the linear separability
between classes in the data.
Machine Learning India 142
PCA and LDA both rank the new axes in
order of importance. PCA accounts for
the most variation in data, while LDA
accounts for the most separability in
data.
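A sketch of the two-class case (Fisher's linear discriminant), assuming NumPy and made-up 2D points. It answers the earlier puzzle: the data is projected from 2D onto the single direction that best separates the class means relative to the scatter within each class.

```python
import numpy as np

def fisher_direction(X0, X1):
    """Direction w proportional to S_w^{-1} (m1 - m0) for two classes."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: spread of each class around its own mean.
    S_w = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    w = np.linalg.solve(S_w, m1 - m0)
    return w / np.linalg.norm(w)

# Hypothetical 2D points for two classes.
X0 = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [2.0, 1.0]])
X1 = np.array([[6.0, 5.0], [7.0, 8.0], [8.0, 7.0], [7.0, 6.0]])

w = fisher_direction(X0, X1)
z0, z1 = X0 @ w, X1 @ w  # the 1D projections of each class
```

On this toy data the two projected classes do not overlap, so a single threshold on the new axis separates them.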
Machine Learning India 148
An eigenvector is a vector whose direction remains
unchanged when a linear transformation is applied to
it. Consider the image below in which three vectors
are shown. The green square is only drawn to illustrate
the linear transformation that is applied to each of
these three vectors.
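A tiny numerical check of the definition, assuming NumPy and a made-up 2×2 matrix as the linear transformation. An eigenvector v satisfies A·v = λ·v, so it is only scaled, while a generic vector is also rotated.

```python
import numpy as np

# Hypothetical symmetric transformation.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigenvalues, eigenvectors = np.linalg.eigh(A)  # columns are eigenvectors

# Each eigenvector keeps its direction: A @ v equals eigenvalue * v.
checks = [
    np.allclose(A @ eigenvectors[:, i], eigenvalues[i] * eigenvectors[:, i])
    for i in range(2)
]

# A non-eigenvector changes direction under the same transformation.
u = np.array([1.0, 0.0])
Au = A @ u
cross = u[0] * Au[1] - u[1] * Au[0]  # zero only if u and A @ u are parallel
```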
Machine Learning India 149
More about Eigenvectors on:
www.visiondummy.com/2014/03/eigenvalues-eigenvectors/
Machine Learning India 150
Algorithm #8:
Support Vector Machine
Machine Learning India 151
A Support Vector Machine (SVM) is a
discriminative classifier formally defined
by a separating hyperplane.
It is an algorithm for linearly separable
binary sets.
Machine Learning India 152
In other words, given labeled training data
(supervised learning), the algorithm outputs an
optimal hyperplane which categorizes new
examples. In two-dimensional space this
hyperplane is a line dividing the plane into two
parts, with each class lying on one side.
Machine Learning India 153
The goal of the SVM is to classify all the
training vectors into two classes.
Machine Learning India 157
Confusing? Don’t worry, we shall learn it in
layman’s terms.
Machine Learning India 158
Suppose you are given a plot of two labeled
classes, as shown in the image. Can you
decide on a separating line for the classes?
Machine Learning India 159
Any point to the left of the line falls into the
black circle class, and any point to the right
falls into the blue square class.
Separation of classes. That’s what SVM does.
Machine Learning India 160
So far so good. Now, what if we had
data like that shown in the image below?
Machine Learning India 161
We apply a transformation and add one more
dimension, which we call the z-axis. Now can you
draw a separating hyperplane? Yes!
Machine Learning India 162
When we transform this line back to the original
plane, it maps to a circular boundary, as shown in
the image. These transformations are called
kernels.
Machine Learning India 167
Kernel functions:
These are functions which take a low-dimensional input
space and transform it into a higher-dimensional space,
i.e. they convert a non-separable problem into a
separable one. They are mostly useful in non-linear
separation problems. Simply put, a kernel does some
extremely complex data transformations and then finds
a way to separate the data based on
the labels or outputs you’ve defined.
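A sketch of the "add a z-axis" idea, assuming NumPy, invented points, and the transformation z = x² + y² (the slides do not state which transformation they use, so this choice is an assumption). Points near the origin and points far from it cannot be split by a line in 2D, but a flat threshold on z separates them.

```python
import numpy as np

# Hypothetical data: one class near the origin, the other surrounding it.
inner = np.array([[0.5, 0.0], [0.0, -0.5], [-0.3, 0.4], [0.2, 0.2]])
outer = np.array([[2.0, 0.0], [0.0, -2.0], [-1.5, 1.5], [1.8, -1.0]])

def lift(points):
    """Assumed transformation: the new coordinate is z = x^2 + y^2."""
    return points[:, 0] ** 2 + points[:, 1] ** 2

z_inner = lift(inner)
z_outer = lift(outer)

# In the lifted space the plane z = 1 separates the classes.
# Back in the original plane, z = 1 is the circle x^2 + y^2 = 1.
threshold = 1.0
inner_below = bool(np.all(z_inner < threshold))
outer_above = bool(np.all(z_outer > threshold))
```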
Machine Learning India 168
A bit complicated!
Machine Learning India 169
Which one do you think is appropriate?
Machine Learning India 170
Well, both the answers are correct. The first
one tolerates some outlier points. The second
one is trying to achieve 0 tolerance with perfect
partition.
Machine Learning India 171
But there is a trade-off. In real-world
applications, finding a perfect separation for
millions of training samples takes a lot
of time. Therefore we define two terms:
the regularization parameter and gamma. These
are tuning parameters of the SVM classifier.
Machine Learning India 172
By varying them we can achieve a reasonably
accurate non-linear classification boundary
in a reasonable amount of time.
Machine Learning India 173
The regularization parameter (often called the
C parameter) tells the SVM optimizer how
strongly you want to avoid
misclassifying each training example.
Machine Learning India 174
For large values of C, the optimization will choose a
smaller-margin hyperplane if that hyperplane does a
better job of getting all the training points classified
correctly. Conversely, a very small value of C will cause
the optimizer to look for a larger-margin separating
hyperplane, even if that hyperplane misclassifies more
points.
Machine Learning India 175
The gamma parameter defines how far the influence
of a single training example reaches, with low values
meaning ‘far’ and high values meaning ‘close’.
Machine Learning India 176
In other words, with low gamma, points far away from
the plausible separation line are also considered when
calculating the separation line, whereas with high gamma
only the points close to the plausible line are
considered.
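A sketch of how gamma controls reach, in plain Python. It assumes the RBF (Gaussian) kernel K(a, b) = exp(-gamma · ‖a - b‖²), which is the usual kernel that carries a gamma parameter; the slides do not name the kernel, so this is an assumption. The points are invented.

```python
import math

def rbf_kernel(a, b, gamma):
    """Similarity between points a and b; 1.0 means identical."""
    squared_distance = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * squared_distance)

origin = (0.0, 0.0)
far_point = (3.0, 0.0)

# Low gamma: the far point still has a noticeable influence.
influence_low_gamma = rbf_kernel(origin, far_point, gamma=0.1)
# High gamma: the far point's influence is practically zero.
influence_high_gamma = rbf_kernel(origin, far_point, gamma=10.0)
```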
Machine Learning India 177
Machine Learning India 178
How do we find out the right
hyperplane?
Machine Learning India 179
Identify the right hyperplane (scenario #1):
Machine Learning India 180
Rule #1:
Select the hyper-plane which segregates the
two classes better.
Machine Learning India 181
Identify the right hyperplane (scenario #2):
Machine Learning India 182
Rule #2:
Maximizing the distance between the hyper-plane and the
nearest data point (of either class) helps us decide the
right hyper-plane. This distance is called the margin.
Machine Learning India 183
Identify the right hyperplane (scenario #3):
Machine Learning India 184
Rule #3:
SVM selects the hyper-plane which classifies the
classes accurately prior to maximizing margin.
Machine Learning India 185
SVM has a feature to ignore outliers and find the
hyper-plane that has maximum margin. Hence, we can
say, SVM is robust to outliers.
Machine Learning India 186
Algorithm
1. Define an optimal hyperplane: maximize margin.
2. Extend the above definition for non-linearly separable
problems: have a penalty term for misclassifications.
3. Map data to high dimensional space where it is easier
to classify with linear decision surfaces: reformulate
problem so that data is mapped implicitly to this space.
Machine Learning India 187
To define an optimal hyperplane we need
to maximize the width of the margin, which
equals 2/‖w‖, where w is the weight vector
of the hyperplane.
Machine Learning India 188
Machine Learning India 189
We find w and b by solving the following objective
function using Quadratic Programming.
Machine Learning India 190
Algorithm #9:
Naïve Bayes
Cutest
Machine Learning India 191
The Naive Bayes Classifier technique is based on
Bayes' theorem and is
particularly suited when the dimensionality of
the inputs is high. Despite its simplicity, Naive
Bayes can often outperform more sophisticated
classification methods.
Machine Learning India 192
As indicated, the objects can be classified as
either GREEN or RED. Our task is to classify new
cases as they arrive, i.e., decide to which class
label they belong, based on the currently existing
objects.
Machine Learning India 193
Since there are twice as many GREEN objects as
RED, it is reasonable to believe that a new case
(which hasn't been observed yet) is twice as
likely to have membership GREEN rather than
RED. In the Bayesian analysis, this belief is
known as the prior probability.
Machine Learning India 194
Machine Learning India 195
Since there is a total of 60 objects, 40 of which are
GREEN and 20 RED, our prior probabilities for class
membership are:
Machine Learning India 196
Since the objects are well clustered, it is
reasonable to assume that the more GREEN (or
RED) objects in the vicinity of X (test point), the
more likely that it belongs to that particular
color. To measure this likelihood, we draw a
circle around X which encompasses a number
(to be chosen a priori) of points irrespective of
their class labels.
Machine Learning India 197
Machine Learning India 198
Then we calculate the number of points in the circle
belonging to each class label. From this we calculate
the likelihood:
Machine Learning India 199
Machine Learning India 200
Although the prior probabilities indicate that X may
belong to GREEN (given that there are twice as many
GREEN compared to RED) the likelihood indicates
otherwise; that the class membership of X is RED
(given that there are more RED objects in the vicinity of
X than GREEN). In the Bayesian analysis, the final
classification is produced by combining both sources
of information, i.e., the prior and the likelihood, to
form a posterior probability using the so-called Bayes'
rule
Machine Learning India 201
Machine Learning India 202
Finally, we classify X as RED since its class
membership achieves the largest posterior
probability.
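The whole calculation can be sketched with exact fractions in plain Python. The class totals (40 GREEN, 20 RED) come from the slides; the counts inside the circle around X (1 GREEN, 3 RED) are assumed for illustration, since the figures are not reproduced here.

```python
from fractions import Fraction

totals = {"GREEN": 40, "RED": 20}   # from the slides
in_circle = {"GREEN": 1, "RED": 3}  # assumed neighbourhood counts

n_objects = sum(totals.values())

# Prior: share of each class among all objects.
prior = {c: Fraction(totals[c], n_objects) for c in totals}

# Likelihood of X given a class: points of that class in the circle,
# divided by the total number of points of that class.
likelihood = {c: Fraction(in_circle[c], totals[c]) for c in totals}

# Bayes' rule: the posterior is proportional to prior times likelihood.
unnormalized = {c: prior[c] * likelihood[c] for c in totals}
evidence = sum(unnormalized.values())
posterior = {c: unnormalized[c] / evidence for c in totals}

predicted = max(posterior, key=posterior.get)
```

Under these assumed counts the prior favours GREEN but the posterior favours RED, which matches the conclusion on the slide.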
Machine Learning India 203
Machine Learning India 204
Algorithm #10:
K-Nearest Neighbors
These neighbors
are not annoying.
Machine Learning India 205
“Birds of a feather flock together.”
Machine Learning India 206
K-Nearest Neighbors is one of the most basic
yet essential classification algorithms in Machine
Learning. It belongs to the supervised learning
domain and is widely applied in pattern
recognition, data mining and intrusion
detection.
Machine Learning India 207
Machine Learning India 208
An understanding of how we calculate the
distance between points on a graph is necessary
before moving on. If you are unfamiliar with, or
need a refresher on, how this calculation is done,
treat it as homework.
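As a refresher, here is a sketch of the Euclidean distance and a bare-bones K-Nearest Neighbors vote in plain Python. The training points, labels, and the choice of k = 3 are made up for illustration.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_points, train_labels, query, k=3):
    """Label the query by majority vote among its k nearest neighbors."""
    distances = sorted(
        (euclidean(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical training data: two small groups of points.
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["A", "A", "A", "B", "B", "B"]

near_a = knn_predict(points, labels, (1.5, 1.5))
near_b = knn_predict(points, labels, (6.5, 6.5))
```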
Machine Learning India 209
Algorithm #11:
K-Means Clustering
Machine Learning India 213
K-means clustering is a type of unsupervised learning,
which is used when you have unlabeled data (i.e., data
without defined categories or groups). The goal of this
algorithm is to find groups in the data, with the
number of groups represented by the variable K. The
algorithm works iteratively to assign each data point to
one of K groups based on the features that are
provided. Data points are clustered based on feature
similarity.
Machine Learning India 214
Machine Learning India 215
The results of the K-means clustering algorithm are:
• The centroids of the K clusters, which can be used to
label new data
• Labels for the training data (each data point is
assigned to a single cluster)
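The iterative procedure can be sketched with NumPy as follows. This is a basic version of the standard assign-then-update loop, with random initial centroids picked from the data; the dataset, the seed, and the function name are illustrative assumptions.

```python
import numpy as np

def k_means(X, k, n_iterations=100, seed=0):
    """Return (centroids, labels) after alternating assignment and update."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iterations):
        # Assignment step: each point joins its nearest centroid.
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: each centroid moves to the mean of its points.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # assignments have stopped changing
        centroids = new_centroids
    return centroids, labels

# Hypothetical unlabeled data with two well-separated groups.
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
              [8.0, 8.0], [9.0, 9.5], [8.5, 9.0]])
centroids, labels = k_means(X, k=2)
```

The returned `centroids` and `labels` correspond to the two results listed above; a new point can be labeled by finding its nearest centroid.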
Machine Learning India 216
Rather than defining groups before looking at the data,
clustering allows you to find and analyze the groups
that have formed organically.
Machine Learning India 217
Each centroid of a cluster is a collection of
feature values which define the resulting groups.
Examining the centroid feature weights can be
used to qualitatively interpret what kind of
group each cluster represents.
Machine Learning India 218
BAM! You guys are pros at regression, classification,
dimensionality reduction and clustering!!
Feeling like a data-scientist, eh?
Machine Learning India 222

UNIT-II FMM-Flow Through Circular Conduits
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
NFPA 5000 2024 standard .
NFPA 5000 2024 standard                                  .NFPA 5000 2024 standard                                  .
NFPA 5000 2024 standard .
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdf
 
UNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICS
UNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICSUNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICS
UNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICS
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
Vivazz, Mieres Social Housing Design Spain
Vivazz, Mieres Social Housing Design SpainVivazz, Mieres Social Housing Design Spain
Vivazz, Mieres Social Housing Design Spain
 
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLPVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 

Core Machine Learning Algorithms

  • 1. Philosophies of Modeling: The simplest explanation is the best explanation. In modeling, if we are given two models that predict equally well, then we should always choose the simpler one.
  • 2. Algorithm #1: Least Squares Fitting
  • 3. Scatterplot of your data:
  • 4. What is the plot good for?
  • 6. How do you do that?
  • 7. You fit a line!
  • 8. But is this the best line?
  • 9. Or does the new line fit our data better?
  • 10. How about a horizontal line?
  • 11. How do you judge whether or not a line is a good fit?
  • 12. By seeing how close it is to the data points? BAM!
  • 13. Back to the horizontal line.
  • 16. Total Error = Sum of Squared Residuals = (b − y₁)² + (b − y₂)² + … + (b − yₙ)²
  • 17. What if we rotate the line a whole lot?
  • 18. So there is a sweet spot between a horizontal and a vertical line!
  • 19. y = mx + c, where m is the slope and c is the y-intercept.
  • 21. We will have to find the optimal values of ‘m’ and ‘c’ in order to minimize the sum of squared residuals.
  • 22. Since we want to fit the line that gives us the smallest ‘sum of squares’, this method for finding the best values of ‘m’ and ‘c’ is called least squares.
  • 23. Plotting the ‘sum of squared residuals’ versus each rotation…
  • 28. Big Important Concept #1: We have to minimize the difference between the observed values (target values) and the line (output values).
  • 29. Big Important Concept #2: We do this by taking the derivative and finding where the value of the derivative equals zero.
  • 30. Big Important Concept #3: Reducible and Irreducible error!
  • 31. And you’re done!
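The least squares idea from the slides above can be sketched in a few lines of Python. Setting the derivatives of the sum of squared residuals with respect to ‘m’ and ‘c’ to zero gives the standard closed-form solution used here. The function names and the five data points are made up for illustration; they are not taken from the slides.

```python
def fit_line(xs, ys):
    """Return (m, c) minimizing the sum of squared residuals for y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed form obtained by setting d(SSR)/dm = 0 and d(SSR)/dc = 0.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c


def sum_squared_residuals(xs, ys, m, c):
    """Total error of the line y = m*x + c on the data."""
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, roughly following y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

m, c = fit_line(xs, ys)
best_error = sum_squared_residuals(xs, ys, m, c)

# The horizontal line through the mean of y, for comparison.
mean_y = sum(ys) / len(ys)
horizontal_error = sum_squared_residuals(xs, ys, 0.0, mean_y)

print(f"slope={m:.3f}, intercept={c:.3f}")
print(f"SSR fitted={best_error:.3f}, SSR horizontal={horizontal_error:.3f}")
```

Any other choice of slope or intercept, including the horizontal line, gives a sum of squared residuals at least as large as the fitted one.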
  • 33. Fitting a linear model: 1. Use least squares. 2. Calculate R². 3. Calculate the p-value for R².
  • 34. Before understanding R², let us understand what variance, standard deviation, covariance and correlation mean.
  • 35. Variance is the average of the squared differences from the mean.
  • 37. Standard Deviation: • It is a measure of how much the members of a group differ from the mean value of the group. • It is a measure of how spread out the members are. • It is the square root of variance.
  • 39. For the entire population.
  • 40. For a sample from the population.
  • 41. Covariance is the measure of the joint variability of two random variables. The sign of covariance shows the tendency of the linear relationship between variables.
  • 42. Formula for covariance: Over the entire population
  • 43. Formula for covariance: Over a sample from the population
  • 44. Correlation is a statistical technique that can show whether and how strongly pairs of variables are related. For example, height and weight are related; taller people tend to be heavier than shorter people.
  • 46. Covariance provides the direction of the linear relationship, while correlation provides the direction as well as strength.
  • 47. Covariance has no upper or lower bounds, and its value depends on the scale of the variables, while correlation is always between -1 and +1 and is scale independent.
  • 48. Guidelines: • First find out the pattern that the data is exhibiting, by looking at a scatterplot. • Correlation is only applicable to linear relationships. • Correlation is not causation. • Correlation strength does not necessarily mean that correlation is statistically significant.
  • 49. Guess the correlation coefficients!
  • 50. How about these?
  • 51. Pearson’s Correlation Coefficient: In statistics, the Pearson correlation coefficient (PCC) is a measure of the linear correlation between two variables X and Y.
  • 55. How can we more objectively state whether or not a relationship exists between two variables?
  • 56. Relationship rule of thumb: If |r| ≥ 2 / √n, then a relationship exists.
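As a rough illustration of the definitions above, the sketch below computes sample covariance, Pearson's r, and the |r| ≥ 2 / √n rule of thumb. The height and weight values are hypothetical, and the helper names are chosen only for this example.

```python
import math


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys, sample=True):
    """Covariance over a sample (divide by n - 1) or a population (divide by n)."""
    mx, my = mean(xs), mean(ys)
    total = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return total / (len(xs) - 1 if sample else len(xs))


def variance(xs, sample=True):
    """Variance is the covariance of a variable with itself."""
    return covariance(xs, xs, sample)


def pearson_r(xs, ys):
    """Covariance rescaled by both standard deviations; always in [-1, 1]."""
    return covariance(xs, ys) / math.sqrt(variance(xs) * variance(ys))


def relationship_exists(r, n):
    """Rule of thumb from the slides: |r| >= 2 / sqrt(n)."""
    return abs(r) >= 2 / math.sqrt(n)


# Hypothetical measurements.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [52, 58, 69, 77, 84]
heights_m = [h / 100 for h in heights_cm]

r = pearson_r(heights_cm, weights_kg)
print(f"covariance (cm): {covariance(heights_cm, weights_kg):.2f}")
print(f"covariance (m):  {covariance(heights_m, weights_kg):.4f}")
print(f"r = {r:.4f}, relationship: {relationship_exists(r, len(heights_cm))}")
```

Changing the height unit from centimetres to metres rescales the covariance by a factor of 100 but leaves r unchanged, which is the scale independence mentioned above.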
  • 57. Coming back to fitting a linear model: 1. Use least squares. 2. Calculate R². 3. Calculate the p-value for R².
  • 58. r² : R² : R-Squared It is a measure of how well a model fits the data. It measures the goodness-of-fit. It can also be seen as a statistical measure of how close the data are to the fitted line.
  • 59. r² : R² : R-Squared In general, the higher the R², the better the model fits your data. R² can be expressed as a percentage or as a decimal value between 0 and 1.
  • 60. r² : R² : R-Squared
  • 61. R² = (Var(mean) − Var(line)) / Var(mean)
  • 62. If R² turns out to be 80%, then it means that there is 80% less variation around the line than around the mean.
  • 63. Big Important Concept #4: R² gives the percentage of variation explained by the relationship between two variables.
  • 64. Big Important Concept #5: If someone gives you the value of the plain old R (PCC), just square it!
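The R² formula above can be checked numerically. This sketch, using made-up data and illustrative function names, computes R² from the variation around the mean and around the least squares line, then compares it with the square of Pearson's r.

```python
import math


def r_squared(xs, ys):
    """R^2 = (Var(mean) - Var(line)) / Var(mean) for a least squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    var_mean = sum((y - mean_y) ** 2 for y in ys) / n          # variation around the mean
    var_line = sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys)) / n  # around the line
    return (var_mean - var_line) / var_mean


def pearson_r(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data with some scatter around a rising trend.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 2.8, 2.9, 4.5, 4.1, 6.3]

r2 = r_squared(xs, ys)
r = pearson_r(xs, ys)
print(f"R^2 from variances: {r2:.4f}")
print(f"r squared:          {r * r:.4f}")
```

For a single predictor fitted by least squares the two numbers agree, which is why squaring the plain old R works.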
  • 65. Adjusted R²: The adjusted R-squared is a modified version of R-squared that has been adjusted for the number of predictors in the model.
  • 67. P-value: When you perform a hypothesis test in statistics, a p-value helps you determine the significance of your results. It answers the question, “Does this result provide enough evidence that something is wrong with my assumptions, or could this result come out just because of luck?”
  • 68. The smaller the p-value, the less likely it is that the result we got is an outcome of luck.
  • 69. Process: 1. Assume that the null hypothesis is true. 2. Take a sample and compute the statistic. 3. Work out how likely it is to get a statistic like this, by calculating the p-value.
  • 70. If ‘p’ is low, NULL must GO!
  • 71. If ‘p’ is high, alternative hypothesis is a lie!
  • 72. Coming back to fitting a linear model: 1. Use least squares. 2. Calculate R². 3. Calculate the p-value for R². Done!
  • 74. Big Important Concept #6: Overfitting and Underfitting!
  • 75. One of the major aspects of training your machine learning model is avoiding overfitting. An overfit model will have low accuracy on unseen data. This happens because your model is trying too hard to capture the noise in your training dataset.
  • 76. By noise we mean the data points that don’t really represent the true properties of your data, but random chance. Learning such data points makes your model more flexible, at the risk of overfitting. The concept of balancing bias and variance is helpful in understanding the phenomenon of overfitting.
  • 77. Big Important Concept #7: Bias Variance Tradeoff: The inability of a machine learning model to capture the true relationship is called bias. The difference in fits between datasets is called variance. The goal is to achieve low bias and low variance.
  • 78. Bias Variance Tradeoff
  • 79. Bias Variance Tradeoff
  • 80. Big Important Concept #8: No Free Lunch Theorem: No single machine learning algorithm is better than all others on all problems. It is common to try multiple models and find the one that works the best for that particular problem.
  • 81. Algorithm #3: Multiple Linear Regression
  • 82. Multiple Linear Regression is just an extension of simple linear regression. It is used to determine a mathematical relationship among a number of random variables. In other terms, MLR examines how multiple independent variables are related to one dependent variable.
  • 86. Alert: • Having more independent variables can make the model complicated. • Adding more independent variables does not guarantee a better prediction model.
  • 87. Alert: Multicollinearity must be checked for. Multicollinearity is the phenomenon where one or more independent variables in a regression model strongly predict one or more other independent variables. It can arise, for example, from the dummy-variable trap. Homework!
  • 88. Regularization: This is a form of regression that constrains, regularizes or shrinks the coefficient estimates towards zero. In other words, this technique discourages learning a more complex or flexible model, so as to avoid the risk of overfitting. Examples: Ridge Regression and Lasso Regression.
  • 89. How do we estimate which parameters are actually important for our model?
  • 90. • Have domain knowledge. • Use Subset Selection Methods. – All-in method – Backward Elimination – Forward Selection – Bidirectional Elimination – Score Comparison
  • 93. Polynomial Regression: In statistics, polynomial regression is a form of regression analysis in which the relationship between the independent variable x and the dependent variable y is modeled as an nth degree polynomial in x.
  • 96. The fitment in 3-Dimensions:
  • 97. The fitment in 3-Dimensions:
  • 98. Woah, we had a great time predicting continuous values!
  • 99. What if I want to predict discrete values?
  • 101. Logistic regression is a predictive analysis. It is used to describe data and to explain the relationship between one dependent binary variable and one or more independent variables.
  • 102. Logistic regression is intended for binary (two-class) classification problems.
  • 106. y = mx + c, where m is the slope and c is the y-intercept.
  • 113. Big Important Concept #9: Evaluating a classification model with the help of metrics! Choosing the right metric is paramount in judging how well the model is performing.
  • 114. A confusion matrix is a table that is often used to describe the performance of a classification model (or "classifier") on a set of test data for which the true values are known. The confusion matrix itself is relatively simple to understand, but the related terminology can be confusing.
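To make the confusion matrix terminology concrete, here is a minimal sketch for a binary classifier. The labels and predictions are invented for the example, and accuracy, precision and recall are shown as three common metrics derived from the four cells; the slides may cover a different set.

```python
def confusion_matrix(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def metrics(cm):
    """Accuracy, precision and recall derived from the four cells."""
    total = cm["tp"] + cm["fp"] + cm["tn"] + cm["fn"]
    return {
        "accuracy": (cm["tp"] + cm["tn"]) / total,
        "precision": cm["tp"] / (cm["tp"] + cm["fp"]),
        "recall": cm["tp"] / (cm["tp"] + cm["fn"]),
    }


# Hypothetical test labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)
scores = metrics(cm)
print(cm)
print(scores)
```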
  • 119. Woah, we had a great time predicting binary discrete values!
  • 120. What if I want to predict n-ary discrete values?
  • 122. Softmax regression (or multinomial logistic regression) is a generalization of logistic regression to the case where we want to handle multiple classes.
  • 123. In logistic regression we assumed that the labels were binary: y⁽ⁱ⁾ ∈ {0, 1}. We used such a classifier to distinguish between two categories. Softmax regression allows us to handle y⁽ⁱ⁾ ∈ {1, …, K}, where K is the number of classes.
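A short sketch of the softmax function may help here. It turns K raw class scores into K probabilities that sum to 1, and with K = 2 it reduces to the logistic (sigmoid) function. The scores below are arbitrary, and subtracting the maximum score is a common numerical-stability trick that is assumed here rather than taken from the slides.

```python
import math


def softmax(scores):
    """Convert a list of K real-valued scores into K probabilities."""
    top = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_class(scores):
    """Return the most probable class, numbered 1..K as in the slides."""
    probs = softmax(scores)
    return 1 + probs.index(max(probs))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


scores = [2.0, 1.0, 0.1]  # hypothetical scores for K = 3 classes
probs = softmax(scores)
print([round(p, 3) for p in probs], "-> class", predict_class(scores))

# With two classes and scores (z, 0), softmax matches the sigmoid of z.
z = 0.7
print(softmax([z, 0.0])[0], sigmoid(z))
```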
  • 127. What if we have a huge number of classes?
  • 128. Algorithm #6: Linear Discriminant Analysis
  • 129. Algorithm #6: Linear Discriminant Analysis. Let us first understand Principal Component Analysis.
  • 130. Algorithm #6: Principal Component Analysis
  • 131. In real-world data analysis tasks we analyze complex, i.e. multi-dimensional, data.
  • 132. As the dimensions of data increase, the difficulty of visualizing it and performing computations on it also increases. How do we deal with it? Remove the redundant dimensions. Only keep the most important dimensions.
  • 133. Principal component analysis (PCA) to the rescue! It is a technique used to emphasize variation and bring out strong patterns in a dataset. It's often used to make data easy to explore and visualize. It is used for dimensionality reduction.
  • 134. Too much visualization. StatQuest to our rescue!
  • 136. The main idea of principal component analysis (PCA) is to reduce the dimensionality of a data set consisting of many variables correlated with each other, either heavily or lightly, while retaining the variation present in the dataset to the maximum extent.
  • 137. This is done by transforming the variables to a new set of variables, which are known as the principal components (or simply, the PCs) and are orthogonal, ordered such that the retention of variation present in the original variables decreases as we move down in the order.
  • 138. So, in this way, the 1st principal component retains the maximum variation that was present in the original components. The principal components are the eigenvectors of a covariance matrix, and hence they are orthogonal.
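For two-dimensional data, the eigenvectors of the covariance matrix can be written in closed form, which makes for a compact illustration of PCA. The sketch below is limited to the 2D case and uses invented points. It finds both principal components and then projects the data onto the first one to reduce 2D to 1D.

```python
import math


def covariance_matrix(points):
    """Return the mean and the 2x2 sample covariance matrix of 2D points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    return (mean_x, mean_y), [[sxx, sxy], [sxy, syy]]


def principal_components_2d(cov):
    """Eigenvalues and unit eigenvectors of a symmetric 2x2 matrix, largest first."""
    sxx, sxy, syy = cov[0][0], cov[0][1], cov[1][1]
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)  # angle of the direction of most variation
    pc1 = (math.cos(theta), math.sin(theta))
    pc2 = (-math.sin(theta), math.cos(theta))     # perpendicular to pc1
    half_trace = (sxx + syy) / 2
    spread = math.hypot((sxx - syy) / 2, sxy)
    return (half_trace + spread, pc1), (half_trace - spread, pc2)


# Hypothetical points lying close to the line y = x.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1)]

(mean_x, mean_y), cov = covariance_matrix(points)
(lam1, pc1), (lam2, pc2) = principal_components_2d(cov)

# Dimensionality reduction: keep only the coordinate along the first component.
projected = [(x - mean_x) * pc1[0] + (y - mean_y) * pc1[1] for x, y in points]

print("PC1:", pc1, "variance retained:", lam1 / (lam1 + lam2))
print("1D coordinates:", [round(p, 3) for p in projected])
```

The variance of the projected 1D coordinates equals the first eigenvalue, which is the sense in which the first principal component retains the maximum variation.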
  • 139. Puzzle! If you want to reduce the dimensionality of data from 2D to 1D while classifying it into two categories, how will you do it?
  • 140. Algorithm #7: Linear Discriminant Analysis
  • 141. Linear discriminant analysis is similar to PCA: both can help us reduce the dimensionality, but LDA also focuses on maximizing the linear separability between classes in the data.
  • 142. Linear discriminant analysis is similar to PCA: both can help us reduce the dimensionality, but LDA also focuses on maximizing the linear separability between classes in the data.
  • 148. PCA and LDA both rank the new axes in order of importance. PCA accounts for the most variation in data, while LDA accounts for the most separability in data.
  • 149. An eigenvector is a vector whose direction remains unchanged when a linear transformation is applied to it. Consider the image below in which three vectors are shown. The green square is only drawn to illustrate the linear transformation that is applied to each of these three vectors.
  • 150. More about Eigenvectors on: www.visiondummy.com/2014/03/eigenvalues-eigenvectors/
  • 151. Algorithm #8: Support Vector Machine
  • 152. A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane. It is an algorithm for linearly separable binary sets.
  • 153. In other words, given labeled training data (supervised learning), the algorithm outputs an optimal hyperplane which categorizes new examples. In two-dimensional space this hyperplane is a line dividing a plane into two parts, with each class lying on either side.
  • 157. The goal of the SVM is to classify all the training vectors into two classes.
  • 158. Confusing? Don’t worry, we shall learn in layman's terms.
  • 159. Suppose you are given a plot of two labeled classes on a graph as shown in the image. Can you decide on a separating line for the classes?
  • 160. Any point to the left of the line falls into the black-circle class, and any point to the right falls into the blue-square class. Separation of classes. That’s what SVM does.
  • 161. So far so good. Now consider what if we had data as shown in the image below?
  • 162. We apply a transformation and add one more dimension, which we call the z-axis. Now can you draw a separating hyperplane? Yes!
  • 167. When we transform this line back to the original plane, it maps to a circular boundary as shown in the image. These transformations are called kernels.
  • 168. Kernel functions: These are functions which take a low-dimensional input space and transform it into a higher-dimensional space, i.e. they convert a non-separable problem into a separable problem. They are mostly useful in non-linear separation problems. Simply put, a kernel does some extremely complex data transformations, then finds out how to separate the data based on the labels or outputs you’ve defined.
  • 169. A bit complicated!
  • 170. Which one do you think is appropriate?
  • 171. Well, both the answers are correct. The first one tolerates some outlier points. The second one is trying to achieve zero tolerance with a perfect partition.
  • 172. But there is a trade-off. In real-world applications, finding a perfect separation for millions of training samples takes a lot of time. Therefore we define two terms: the regularization parameter and gamma. These are tuning parameters in the SVM classifier.
  • 173. By varying these, we can achieve a reasonably accurate non-linear classification boundary in a reasonable amount of time.
  • 174. The regularization parameter (often termed the C parameter) tells the SVM optimization the extent to which you want to avoid misclassifying each training example.
  • 175. For large values of C, the optimization will choose a smaller-margin hyperplane if that hyperplane does a better job of getting all the training points classified correctly. Conversely, a very small value of C will cause the optimizer to look for a larger-margin separating hyperplane, even if that hyperplane misclassifies more points.
  • 176. The gamma parameter defines how far the influence of a single training example reaches, with low values meaning ‘far’ and high values meaning ‘close’.
  • 177. In other words, with low gamma, points far away from the plausible separation line are considered in the calculation of the separation line, whereas high gamma means that only the points close to the plausible line are considered.
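Two of the ideas above can be sketched numerically. The slides do not say which transformation or kernel they use, so this example assumes the common choices: a lift z = x² + y² for the extra dimension, and the RBF kernel exp(−gamma · ‖a − b‖²) for the gamma parameter. The points and gamma values are made up.

```python
import math


def lift(point):
    """Add a third coordinate z = x^2 + y^2 (assumed transformation)."""
    x, y = point
    return (x, y, x * x + y * y)


def rbf_kernel(a, b, gamma):
    """RBF kernel: similarity decays with squared distance, at a rate set by gamma."""
    squared_distance = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * squared_distance)


# Hypothetical data: one class near the origin, the other on a ring around it.
inner = [(0.5, 0.0), (0.0, -0.5), (-0.3, 0.3)]
outer = [(2.0, 0.0), (0.0, 2.0), (-2.0, 0.0), (1.5, -1.5)]

# No straight line separates the classes in 2D, but the plane z = 1 does in 3D.
print("inner z:", [lift(p)[2] for p in inner])
print("outer z:", [lift(p)[2] for p in outer])

# Influence of a training point on a query that is 3 units away.
train_point, query = (0.0, 0.0), (3.0, 0.0)
for gamma in (0.01, 1.0):
    print(f"gamma={gamma}: influence={rbf_kernel(train_point, query, gamma):.6f}")
```

With the low gamma the distant point still has noticeable influence, while with the high gamma its influence is effectively zero, matching the ‘far’ and ‘close’ description above.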
  • 179. How do we find the right hyperplane?
  • 180. Identify the right hyperplane (scenario #1):
  • 181. Rule #1: Select the hyper-plane which segregates the two classes better.
  • 182. Identify the right hyperplane (scenario #2):
  • 183. Rule #2: Maximizing the distance between the nearest data point (of either class) and the hyper-plane helps us decide on the right hyper-plane.
  • 184. Identify the right hyperplane (scenario #3):
  • 185. Rule #3: SVM selects the hyper-plane which classifies the classes accurately prior to maximizing the margin.
  • 186. SVM has a feature to ignore outliers and find the hyper-plane that has the maximum margin. Hence, we can say SVM is robust to outliers.
  • 187. Algorithm: 1. Define an optimal hyperplane: maximize the margin. 2. Extend the above definition for non-linearly separable problems: have a penalty term for misclassifications. 3. Map data to a high-dimensional space where it is easier to classify with linear decision surfaces: reformulate the problem so that data is mapped implicitly to this space.
  • 188. To define an optimal hyperplane we need to maximize the width of the margin (w).
  • 190. We find w and b by solving the following objective function using Quadratic Programming.
  • 192. The Naive Bayes Classifier technique is based on Bayes' theorem and is particularly suited when the dimensionality of the inputs is high. Despite its simplicity, Naive Bayes can often outperform more sophisticated classification methods.
  • 193. As indicated, the objects can be classified as either GREEN or RED. Our task is to classify new cases as they arrive, i.e., decide to which class label they belong, based on the currently existing objects.
  • 194. Since there are twice as many GREEN objects as RED, it is reasonable to believe that a new case (which hasn't been observed yet) is twice as likely to have membership GREEN rather than RED. In the Bayesian analysis, this belief is known as the prior probability.
  • 196. Since there is a total of 60 objects, 40 of which are GREEN and 20 RED, our prior probabilities for class membership are:
  • 197. Since the objects are well clustered, it is reasonable to assume that the more GREEN (or RED) objects in the vicinity of X (test point), the more likely that it belongs to that particular color. To measure this likelihood, we draw a circle around X which encompasses a number (to be chosen a priori) of points irrespective of their class labels.
  • 199. Then we calculate the number of points in the circle belonging to each class label. From this we calculate the likelihood:
  • 201. Although the prior probabilities indicate that X may belong to GREEN (given that there are twice as many GREEN compared to RED), the likelihood indicates otherwise: that the class membership of X is RED (given that there are more RED objects in the vicinity of X than GREEN). In the Bayesian analysis, the final classification is produced by combining both sources of information, i.e., the prior and the likelihood, to form a posterior probability using Bayes' rule.
  • 203. Finally, we classify X as RED since its class membership achieves the largest posterior probability.
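The GREEN/RED example can be worked through with exact fractions. The class totals (40 GREEN, 20 RED) come from the slides. The number of each color inside the circle around X is not given in this transcript, so the counts of 1 GREEN and 3 RED below are an assumption chosen to be consistent with the stated conclusion.

```python
from fractions import Fraction

# From the slides: 60 objects in total, 40 GREEN and 20 RED.
class_counts = {"GREEN": 40, "RED": 20}

# Assumed for illustration: points of each color inside the circle around X.
neighbors_in_circle = {"GREEN": 1, "RED": 3}

total = sum(class_counts.values())

# Prior: share of each class among all objects.
prior = {c: Fraction(n, total) for c, n in class_counts.items()}

# Likelihood of X given a class: fraction of that class found in the vicinity of X.
likelihood = {c: Fraction(neighbors_in_circle[c], class_counts[c]) for c in class_counts}

# Posterior is proportional to prior times likelihood.
posterior = {c: prior[c] * likelihood[c] for c in class_counts}

predicted = max(posterior, key=posterior.get)

for c in class_counts:
    print(f"{c}: prior={prior[c]}, likelihood={likelihood[c]}, posterior~{posterior[c]}")
print("X is classified as", predicted)
```

Under these assumed counts the prior favors GREEN, but the likelihood favors RED strongly enough that RED has the larger posterior.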
  • 205. Algorithm #10: K-Nearest Neighbors. These neighbors are not annoying.
  • 206. “Birds of a feather flock together.”
  • 207. K-Nearest Neighbors is one of the most basic yet essential classification algorithms in Machine Learning. It belongs to the supervised learning domain and is widely applied in pattern recognition, data mining and intrusion detection.
  • 209. An understanding of how we calculate the distance between points on a graph is necessary before moving on. If you are unfamiliar with this calculation, or need a refresher on how it is done, treat it as homework.
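A minimal K-Nearest Neighbors sketch, assuming Euclidean distance and a simple majority vote among the k closest training points. The training set, labels and the choice k = 3 are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(train, query, k=3):
    """Label the query by majority vote among its k nearest training points."""
    neighbors = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical labeled points: (features, label).
train = [
    ((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
    ((6, 6), "B"), ((7, 6), "B"), ((6, 7), "B"),
]

print(euclidean((0, 0), (3, 4)))       # the classic 3-4-5 triangle
print(knn_predict(train, (2, 2)))      # close to the "A" group
print(knn_predict(train, (6, 5)))      # close to the "B" group
```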
  • 214. K-means clustering is a type of unsupervised learning, which is used when you have unlabeled data (i.e., data without defined categories or groups). The goal of this algorithm is to find groups in the data, with the number of groups represented by the variable K. The algorithm works iteratively to assign each data point to one of K groups based on the features that are provided. Data points are clustered based on feature similarity.
  • 216. The results of the K-means clustering algorithm are: • The centroids of the K clusters, which can be used to label new data • Labels for the training data (each data point is assigned to a single cluster)
  • 217. Rather than defining groups before looking at the data, clustering allows you to find and analyze the groups that have formed organically.
  • 218. Each centroid of a cluster is a collection of feature values which define the resulting groups. Examining the centroid feature weights can be used to qualitatively interpret what kind of group each cluster represents.
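The iterative procedure described above can be sketched as follows. This is a bare-bones version of the usual assign-then-update loop, with the starting centroids passed in explicitly so the result is reproducible; real implementations typically add random initialization and a convergence check. The points and starting centroids are invented.

```python
def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def assign(points, centroids):
    """Give each point the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))
        for p in points
    ]


def update(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for i, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == i]
        if not members:
            new_centroids.append(old)  # keep an empty cluster's centroid in place
            continue
        dims = len(members[0])
        new_centroids.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(dims))
        )
    return new_centroids


def kmeans(points, initial_centroids, iterations=10):
    centroids = list(initial_centroids)
    for _ in range(iterations):
        labels = assign(points, centroids)
        centroids = update(points, labels, centroids)
    return centroids, assign(points, centroids)


# Hypothetical unlabeled data with two visible groups, K = 2.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(points, initial_centroids=[(1.0, 1.0), (8.0, 8.0)])

print("centroids:", centroids)
print("labels:", labels)

# The centroids can then be used to label new data.
print("new point (2, 2) ->", assign([(2.0, 2.0)], centroids)[0])
```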
  • 222. BAM! You guys are pros at regression, classification, dimensionality reduction and clustering!! Feeling like a data scientist, eh?