Welcome to Deep Learning Online Bootcamp
Day 4
Learning Objectives
● Linear Regression
● Cost
● Epochs
● Learning Rate
● Batch Size
What is linear regression? - an example
Suppose you are thinking of selling your home, and various houses with
different sizes (area in sq. ft) around you have sold for different prices, as
listed below.
Now suppose your home is 3,000 square feet. How much should you
sell it for?
When you look at the existing price patterns (data) and predict a price
for your home, that is an example of linear regression.
Here's an easy way to do it. Plotting the 3 data points (information
about previous homes) we have so far:
Each point represents one home.
What is linear regression? - an example
Now you can eyeball it and roughly draw a line that gets pretty close to
all of these points. Then look at the price shown by the line, where the
square footage is 3000:
Boom! Your home should sell for $260,000.
What is linear regression? - an example
That's all! (Well, kinda.) You plot your data, draw a rough line, and use
the line to make predictions. You just need to make sure your line fits
the data well (but not too well :)
But of course we don't want to draw a rough line by hand; we want to
compute the exact line that best "fits" our data. That's where
machine learning comes into play!
What is linear regression? - an example
What is linear regression? - to summarize
● Linear regression is a linear model, i.e. a model that assumes a linear
relationship (straight-line relationship) between the input variables (x)
and the single output variable (y).
● When there is a single input variable (x), the method is referred to as
simple linear regression or just linear regression.
E.g. the area of the houses in this case.
● When there are multiple input variables, it is often referred to as
multiple linear regression.
E.g. if the area of the house, the neighbourhood, the number of rooms, etc. were given.
Simple Linear Regression
In the case of simple linear regression, the relationship between the dependent and the
independent variable is the equation of a straight line (y = mx + c).
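To make this concrete, here is a minimal sketch (not from the slides) that fits y = mx + c to a few house listings with NumPy and predicts a price for a 3,000 sq. ft home. The data values are hypothetical placeholders; only the 1,000 sq. ft / $200,000 point is mentioned later in the deck.

```python
import numpy as np

# Hypothetical house data: only the first point (1000 sq. ft -> $200,000)
# appears in the slides; the other values are made-up placeholders.
area = np.array([1000.0, 2000.0, 4000.0])             # sq. ft
price = np.array([200_000.0, 230_000.0, 290_000.0])   # sale price in $

# Least-squares fit of a straight line y = m*x + c (degree-1 polynomial).
m, c = np.polyfit(area, price, deg=1)
print(f"price = {m:.2f} * area + {c:.2f}")

# Predict a price for a 3000 sq. ft home.
print(f"predicted price at 3000 sq. ft: ${m * 3000 + c:,.0f}")
```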
Which line is good?
Consider this line and the predictions that it makes.
The line drawn above is way off. For example, according to the line, a
1,000 sq. ft house should sell for $310,000, whereas we know it
actually sold for $200,000.
Which line is good?
Here's a better line. The residuals (the absolute differences between the
actual and the predicted values) in this case are $5,000, $15,000 and $5,000.
This line is off by an average of $8,333 (adding all the distances and
dividing by 3). This $8,333 is called the cost of using this line.
Cost
The cost is how far off the learned line is from the real data. The best line is the
one that is the least off from the real data.
To find out what line is the best line (to find the values of coefficients), we need to
define and use a cost function.
In ML, cost functions are used to estimate how badly models are performing.
Put simply, a cost function is a measure of how wrong the model is in terms of its
ability to estimate the relationship between X and y.
Cost Function
Now let's take a step back and reflect on why we are performing linear
regression. It is performed so that, when we finally use our model for
prediction, it predicts the value of the target variable (or dependent variable)
y for the input value(s) of x (even on new, unseen data).
By finding the best-fit regression line, the model aims to predict y values such
that the error (the difference between the predicted value and the true value) is
minimum.
So, it is very important to update the m and c values to reach the values
that minimize the error between the predicted value (pred) and the true value (y)
for each sample i, averaged over all n samples.
Here, n is the total number of observations/data points.
Now let's have a look at how exactly this cost formula is built up.
Cost Function
Here, pred = mx + c. Pred can also be termed the hypothesis function.
The difference between the model's predicted value and the true value
is called the RESIDUAL or COST or LOSS, which needs to be minimized.
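As a rough sketch of the idea on this slide (my own illustration, using the same average-absolute-residual notion of cost from the earlier example), the cost of a candidate line can be computed like this:

```python
import numpy as np

def line_cost(m, c, x, y):
    """Average absolute residual of the line pred = m*x + c on data (x, y)."""
    pred = m * x + c              # hypothesis function
    residuals = np.abs(y - pred)  # absolute difference per sample
    return residuals.mean()       # average over all n samples

# Hypothetical data points (placeholders, not the slide's exact numbers).
x = np.array([1000.0, 2000.0, 3000.0])
y = np.array([200_000.0, 250_000.0, 260_000.0])

print(line_cost(m=100.0, c=0.0, x=x, y=y))       # a bad line  -> large cost
print(line_cost(m=30.0, c=170_000.0, x=x, y=y))  # a better line -> smaller cost
```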
Cost Function - Types
There are three primary metrics used to evaluate linear models (to find
how well a model is performing):
1. Mean Squared Error
2. Root Mean Squared Error
3. Mean Absolute Error
Mean Squared Error (MSE)
● MSE is simply the average of the squared difference between the true
target value and the value predicted by the regression model.
● As it squares the differences, it penalizes (gives some penalty or weight
for deviating from the objective) even a small error, which can lead to
an over-estimation of how bad the model is.
Root Mean Squared Error (RMSE)
● It is just the square root of the mean square error.
● It is preferred in some cases because the errors are squared
before averaging, which places a high penalty on large
errors. This implies that RMSE is useful when large errors are
undesired.
Mean Absolute Error (MAE)
● MAE is the average of the absolute differences between the target values
and the values predicted by the model.
● MAE does not penalize large errors as heavily as MSE, making it less
suitable for use cases where you want to pay more attention to
outliers.
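Here is a small sketch (my own, with NumPy) showing how these three metrics could be computed from true and predicted values; the numbers are hypothetical:

```python
import numpy as np

y_true = np.array([200_000.0, 250_000.0, 260_000.0])  # hypothetical true prices
y_pred = np.array([205_000.0, 235_000.0, 265_000.0])  # hypothetical predictions

errors = y_true - y_pred

mse = np.mean(errors ** 2)      # Mean Squared Error
rmse = np.sqrt(mse)             # Root Mean Squared Error
mae = np.mean(np.abs(errors))   # Mean Absolute Error

print(f"MSE:  {mse:,.0f}")
print(f"RMSE: {rmse:,.0f}")
print(f"MAE:  {mae:,.0f}")
```

Note how squaring makes MSE and RMSE react much more strongly to the single $15,000 miss than MAE does.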
Objective
Find the optimal values that provide efficient/effective predictions.
The values of m (the coefficient of x) and c (the y-intercept) are updated repeatedly until
the line fits the data well enough, giving the optimal solution:
y = 1.47 * x + 0.0296
Gradient Descent!
How do we find the optimal values, you ask?
Gradient Descent (GD) is an optimization technique in
machine learning which helps us minimize the
cost function by learning the weights of the model such
that it fits the training data well.
Gradient Descent - let's break it down
Gradient = the rate of inclination or declination of a slope (how steep a
slope is and in which direction it is going) of a function at some point.
Descent = an act of moving downwards.
Simple way to memorize:
● line going up as we move right → positive slope (gradient)
● line going down as we move right → negative slope (gradient)
Gradient Descent - A technique to minimize cost
At the start, the parameter values of the hypothesis are randomly
initialized (they can be 0).
Then, Gradient Descent iteratively (one step at a time) adjusts the
parameters and gradually finds the best values for them by minimizing
the cost.
In this case we'll use it to find the parameters m and c of the linear
regression model:
m = the slope of the line that we just learned about
c = the bias: where the line crosses the Y-axis.
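Below is a minimal sketch of this procedure for m and c, minimizing the mean squared error (my own illustration under those assumptions, not the bootcamp's exact code):

```python
import numpy as np

def gradient_descent(x, y, lr=0.05, epochs=2000):
    """Learn m and c for pred = m*x + c by repeatedly stepping against the MSE gradient."""
    m, c = 0.0, 0.0                          # parameters initialized to 0
    n = len(x)
    for _ in range(epochs):
        pred = m * x + c                     # current predictions
        error = pred - y
        dm = (2.0 / n) * np.sum(error * x)   # d(MSE)/dm
        dc = (2.0 / n) * np.sum(error)       # d(MSE)/dc
        m -= lr * dm                         # move m downhill
        c -= lr * dc                         # move c downhill
    return m, c

# Tiny hypothetical dataset.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.5, 3.1, 4.4, 6.0])
m, c = gradient_descent(x, y)
print(f"y = {m:.2f} * x + {c:.4f}")
```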
Gradient Descent - An Example
Imagine a person at the top of a mountain who wants to get to the
bottom of the valley below through the fog (he cannot see clearly
ahead and can only look around the point he is standing at).
He goes down the slope, taking large steps when the slope is
steep and small steps when the slope is less steep.
He decides his next position based on his current position and stops
when he gets to the bottom of the valley, which was his goal.
Gradient Descent in action
Now, think of the valley as a cost function.
The objective of GD is to minimise the cost function (by updating its
parameter values). In this case the minimum value would be 0.
(Figure: the cost plotted against the parameter m, for the model y = m*x.)
What does a cost function look like?
J is the cost function, plotted against its parameter values. Notice the
crests and troughs.
How we navigate the cost function via GD
Hyperparameters
● Hyperparameters are parameters which determine the
model’s performance (or how well we are able to train it)
● These are parameters outside of the model which are used
to tune the model so that it fits the training data for the given
hypothesis
● We can change the values of these hyperparameters during
successive runs of training a model.
● Here you will learn three important hyperparameters:
○ Learning Rate
○ Epoch
○ Batch Size
Learning Rate
How does learning rate play a role in getting the
optimal weights?
Learning Rate
- Learning rate is the rate at which we update (increase or decrease) the
values of the parameters (e.g. m) during gradient descent.
- As you can see in the image, initially the steps are bigger (huge jumps).
- As the point goes down, the steps become shorter (smaller jumps).
- When updating weights during GD, if the learning rate is high, the weights
will be changed aggressively, and when it is low, they'll be updated slowly.
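As a tiny illustration (my own, for a made-up one-parameter cost J(m) = (m - 3)^2), the same update rule with a high and a low learning rate shows the aggressive vs. slow behaviour described above:

```python
# Gradient-descent update for a single weight m on the toy cost J(m) = (m - 3)**2,
# whose gradient is dJ/dm = 2 * (m - 3). Purely illustrative.
def step(m, lr):
    grad = 2 * (m - 3)
    return m - lr * grad

m_fast, m_slow = 10.0, 10.0
for i in range(5):
    m_fast = step(m_fast, lr=0.8)    # high learning rate: big, aggressive jumps (overshoots)
    m_slow = step(m_slow, lr=0.05)   # low learning rate: small, careful steps
    print(f"step {i + 1}: fast={m_fast:7.3f}  slow={m_slow:7.3f}")
```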
Epochs
One epoch is when the ENTIRE dataset is passed forward and
backward through the neural network exactly ONCE.
Usually, training a neural network takes more than a few epochs,
because just passing the dataset once through the neural network is
not enough (it depends on the problem too).
We need to pass the full dataset multiple times through the same neural
network so that the model is able to learn the patterns present in the
data.
Note: there is no guarantee a network will converge or get better
by letting it learn the data for multiple epochs. It is an art in machine
learning to decide the number of epochs sufficient for a network.
Batch
When the data is too big, we can't pass all the data to the
computer at once (think on the order of millions of records of
data).
To overcome this problem, we need to divide the data into
smaller batches, give them to the computer one by one, and
update the weights of the neural network at the end of every
step.
Batch Size
Batch Size = Total number of training examples present in a
single batch.
But what is a batch?
As we said, you can't pass the entire dataset into the neural net at
once. So, you divide the dataset into a number of batches (sets or
parts).
Batch Size
Let's say we have 2000 training examples that we are going to use.
We can divide the dataset of 2000 samples into batches of 500.
What is the batch size here? → 500
What will be the number of batches? → Simple, 2000 / 500 = 4
Thus, when all 4 batches (the whole dataset) have passed through the model
back and forth once, it completes 1 epoch.
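A small sketch of this bookkeeping (hypothetical data, matching the numbers above):

```python
import numpy as np

num_samples = 2000
batch_size = 500
num_batches = num_samples // batch_size           # 2000 / 500 = 4 batches per epoch
print(f"{num_batches} batches per epoch")

X = np.random.rand(num_samples, 1)                # hypothetical dataset
for epoch in range(3):                            # 3 epochs = 3 full passes over the data
    for batch in np.array_split(X, num_batches):  # each batch holds 500 samples
        pass                                      # one weight update would happen here
    print(f"epoch {epoch + 1} complete")
```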
The 5 Step Model Life-Cycle
● A model has a life-cycle, and this very simple knowledge
provides the backbone for both modeling a dataset and
understanding the tf.keras API.
● The five steps in the life-cycle are as follows:
1. Define the model.
2. Compile the model.
3. Fit the model.
4. Make predictions on the test data.
5. Evaluate the model.
● The above 5 steps are explained in the notebook with practical
implementation.
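For reference, here is a minimal sketch of those five steps with the tf.keras API on a tiny made-up regression dataset; the notebooks linked below contain the full, worked versions:

```python
import numpy as np
import tensorflow as tf

# Made-up data: one input feature x and a roughly linear target y.
X = np.random.rand(100, 1)
y = 1.47 * X[:, 0] + 0.03 + 0.05 * np.random.randn(100)

# 1. Define the model.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(1),
])

# 2. Compile the model (choose the optimizer and the cost function).
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.1), loss="mse")

# 3. Fit the model (epochs and batch size are the hyperparameters discussed above).
model.fit(X, y, epochs=50, batch_size=10, verbose=0)

# 4. Make predictions on new data.
print(model.predict(np.array([[0.5]]), verbose=0))

# 5. Evaluate the model.
print(model.evaluate(X, y, verbose=0))
```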
Notebook for practice
● Beginners’ Notebook
● Intermediate Learners Notebook
Overfit - Optimal - Underfit Example
Slide Download Link
● You can download the slides here:
https://docs.google.com/presentation/d/1VuNfO4p5Gumm_UT_t84n9jptTN-REoDIZip5Dx2JxB8/edit?usp=sharing
References
● https://towardsdatascience.com/epoch-vs-iterations-vs-batch-size-4dfb9c7ce9c9
That’s it for the day. Thank you!
Feel free to post any queries on Discuss
or in the #help channel on Slack