Machine Learning
Instructor: Dr. Emad Nabil
Linear regression with one variable
Model representation
Credits: This material is based mainly on Andrew Ng's Coursera ML course, edited by Dr. Emad Nabil
Model Representation And Hypothesis
Supervised Learning
Formally stated, the aim of a supervised learning algorithm is to use the training dataset to output a hypothesis function h, where h takes an input instance and predicts the output based on what it learned from the training dataset.
[Table: the training set has m examples (rows) and n features f1, f2, …, fn (columns), plus a label column.]
As shown above, given a dataset where each training instance consists of the area of a house and its price, the job of the learning algorithm is to produce a hypothesis h that takes the size of a house as input and predicts its price.
For example, for Linear Regression in One Variable (Univariate Linear Regression), the hypothesis h is given by:
hΞΈ(x) = ΞΈ0 + ΞΈ1x
Hypothesis
The hypothesis is the function learned by the algorithm during training and is used to make predictions about unseen data.
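As a minimal sketch in Python with NumPy (the parameter values below are purely illustrative, not learned), the univariate hypothesis is just a line:

    import numpy as np

    def hypothesis(theta0, theta1, x):
        """Univariate hypothesis h_theta(x) = theta0 + theta1 * x."""
        return theta0 + theta1 * np.asarray(x)

    # Illustrative parameters: predict the price of a 1500 ft^2 house
    print(hypothesis(50.0, 0.1, 1500.0))  # 200.0, i.e. $200k in 1000's of dollars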
Cost Function of Linear Regression
[Linear regression plot: price in 1000's of dollars vs. size (feetΒ²), with a fitted line through the training points.]
Cost Function of Linear Regression
The objective of the learning algorithm is to find the best parameters to fit the dataset, i.e., choose ΞΈ0 and ΞΈ1 so that hΞΈ(x) is close to y for the training examples (x, y). This can be mathematically represented as:
minimize over ΞΈ0, ΞΈ1: J(ΞΈ0, ΞΈ1)
Fit the model by minimizing the mean of squared errors (least-squares linear regression):
J(ΞΈ0, ΞΈ1) = (1/2m) Ξ£_{i=1}^{m} (hΞΈ(x^(i)) βˆ’ y^(i))Β²
where hΞΈ(x^(i)) is the value predicted by h and y^(i) is the actual value.
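A minimal NumPy sketch of this cost (assuming x and y are arrays holding the m training inputs and targets):

    import numpy as np

    def compute_cost(theta0, theta1, x, y):
        """J(theta0, theta1) = (1/2m) * sum((h(x_i) - y_i)^2)."""
        m = len(y)
        errors = (theta0 + theta1 * x) - y       # predicted minus actual, per example
        return np.sum(errors ** 2) / (2 * m)

    x = np.array([1.0, 2.0, 3.0])
    y = np.array([1.0, 2.0, 3.0])
    print(compute_cost(0.0, 1.0, x, y))          # 0.0: a perfect fit on this toy data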
Cost function visualization
Consider a simple case of the hypothesis by setting ΞΈ0 = 0, so h becomes hΞΈ(x) = ΞΈ1x, which corresponds to lines passing through the origin (the y-intercept ΞΈ0 is nulled out). Each value of ΞΈ1 corresponds to a different hypothesis, since ΞΈ1 is the slope of the line.
[Plots of the simple hypothesis and its cost: J(1) at ΞΈ1 = 1, J(2) at ΞΈ1 = 2, J(0.5) at ΞΈ1 = 0.5.]
Cost function visualization
Plotting more points like this, one obtains the following graph of the cost function as it depends on the parameter ΞΈ1; each point on the plot corresponds to a different hypothesis.
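Such a curve can be reproduced with a short sketch (toy data chosen to lie on y = x, so the minimum lands at ΞΈ1 = 1; matplotlib is assumed available):

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.array([1.0, 2.0, 3.0])
    y = np.array([1.0, 2.0, 3.0])                 # data on the line y = x, so J(1) = 0

    theta1_grid = np.linspace(-0.5, 2.5, 61)
    J = [np.sum((t * x - y) ** 2) / (2 * len(y)) for t in theta1_grid]

    plt.plot(theta1_grid, J)                      # bowl-shaped curve with minimum at 1
    plt.xlabel("theta_1")
    plt.ylabel("J(theta_1)")
    plt.show()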
Cost function visualization
What is the optimal value of ΞΈ1 that minimizes J(ΞΈ1)?
Clearly the best value is ΞΈ1 = 1, since J(1) = 0, the minimum.
How do we find the best value for ΞΈ1 in general?
Plotting? Not practical, especially in high dimensions.
The solutions:
1. Analytical solution: not applicable for large datasets.
2. Numerical solution, e.g., gradient descent.
Plotting the cost function J(ΞΈ0, ΞΈ1)
Each point (like the red one above) represents a pair of parameter values ΞΈ0 and ΞΈ1.
hΞΈ(x): for fixed ΞΈ0, ΞΈ1, this is a function of x. J(ΞΈ0, ΞΈ1): a function of the parameters ΞΈ0, ΞΈ1.
Gradient descent
To reach the lowest point, we should move the ball step by step in the direction of steepest descent, i.e., opposite the gradient.
[Surface plot of J(ΞΈ0, ΞΈ1) over ΞΈ0 and ΞΈ1. Imagine this is the landscape of a grassy park and you want to reach the lowest point in the park as rapidly as possible. Red means high; blue means low. From the starting point, gradient descent descends to a local minimum.]
[The same surface with a different starting point: gradient descent descends to a different local minimum.]
Have some function J(ΞΈ0, ΞΈ1); want min over ΞΈ0, ΞΈ1 of J(ΞΈ0, ΞΈ1).
Outline:
β€’ Start with some ΞΈ0, ΞΈ1
β€’ Keep changing ΞΈ0, ΞΈ1 to reduce J(ΞΈ0, ΞΈ1) until we hopefully end up at a minimum
Gradient Descent
Gradient descent is a first-order iterative optimization algorithm for finding the minimum of a function.
There are two basic steps involved in this algorithm:
β€’ Start with some random values of ΞΈ0, ΞΈ1.
β€’ Keep updating ΞΈ0, ΞΈ1 to reduce J(ΞΈ0, ΞΈ1) until a minimum is reached.
Gradient descent update rule (repeat until convergence, updating j = 0 and j = 1 simultaneously):
ΞΈj := ΞΈj βˆ’ Ξ± (βˆ‚/βˆ‚ΞΈj) J(ΞΈ0, ΞΈ1)
Understanding Gradient Descent
Consider a simpler cost function J(ΞΈ1) and the objective: min over ΞΈ1 of J(ΞΈ1).
Repeat {
    ΞΈ1 := ΞΈ1 βˆ’ Ξ± (d/dΞΈ1) J(ΞΈ1)
}
[Plot: J(ΞΈ1) with a region of positive slope to the right of the minimum and negative slope to the left.]
If the slope is positive, then since Ξ± is a positive real number, the overall term βˆ’Ξ± (d/dΞΈ1) J(ΞΈ1) is negative. This means the value of ΞΈ1 will decrease, moving toward the minimum as shown by the arrows.
Understanding Gradient Descent
Consider the simpler cost function J(ΞΈ1) and the objective: min over ΞΈ1 of J(ΞΈ1).
If instead the initialization is at point B, the slope of the line is negative, so the update term βˆ’Ξ± (d/dΞΈ1) J(ΞΈ1) is positive, leading to an increase in the value of ΞΈ1, as shown in the plot.
Repeat {
    ΞΈ1 := ΞΈ1 βˆ’ Ξ± (d/dΞΈ1) J(ΞΈ1)
}
Understanding Gradient Descent
Repeat {
    ΞΈ1 := ΞΈ1 βˆ’ Ξ± (d/dΞΈ1) J(ΞΈ1)
}
So, no matter where ΞΈ1 is initialized, the algorithm updates the parameter in the right direction, toward the minimum, provided a proper value for Ξ± is chosen. This is easy to verify numerically, as sketched below.
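A minimal NumPy sketch (the toy data keeps the minimum at ΞΈ1 = 1; the starting point and Ξ± = 0.1 are illustrative choices):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0])
    y = np.array([1.0, 2.0, 3.0])

    theta1, alpha = 2.0, 0.1                      # start right of the minimum (positive slope)
    for _ in range(50):
        grad = np.mean((theta1 * x - y) * x)      # d/dtheta1 J(theta1)
        theta1 -= alpha * grad                    # the update rule above
    print(theta1)                                 # converges toward 1.0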
How can we choose the learning rate Ξ±?
Repeat {
    ΞΈ1 := ΞΈ1 βˆ’ Ξ± (d/dΞΈ1) J(ΞΈ1)
}
The learning rate Ξ± vs. the cost function J(ΞΈ1):
β€’ The yellow plot shows divergence of the algorithm when the learning rate is too high and the learning steps overshoot.
β€’ The green plot shows a learning rate that is not as large as the previous case, but still high enough that the steps keep oscillating around a point that is not the minimum.
β€’ The red plot is the optimal curve for the cost drop: it drops steeply at first and then saturates very close to the optimal value.
β€’ The blue plot corresponds to the smallest value of Ξ± and converges very slowly, as the steps taken during the updates are very small.
How to choose the learning rate?
Practical tips:
β€’ Stopping condition (automatic convergence test): gradient descent can be considered converged when the drop in the cost function between iterations is no more than a preset threshold, say 10^βˆ’3.
β€’ To choose a good value of Ξ±, run the algorithm with different values, such as 1, 0.3, 0.1, 0.03, 0.01, etc., and plot the learning curve to see whether the value should be increased or decreased. A sketch of this procedure follows.
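A sketch combining the convergence test with a sweep over candidate learning rates (the threshold and rates are the illustrative values above, on the same toy data):

    import numpy as np

    def gradient_descent_1d(x, y, alpha, tol=1e-3, max_iters=10000):
        """Minimize J(theta1); stop when the per-iteration cost drop falls below tol."""
        theta1, prev_cost = 0.0, np.inf
        m = len(y)
        for _ in range(max_iters):
            grad = np.mean((theta1 * x - y) * x)
            theta1 -= alpha * grad
            cost = np.sum((theta1 * x - y) ** 2) / (2 * m)
            if prev_cost - cost < tol:            # automatic convergence test; note that a
                break                             # diverging run (rising cost) also exits,
            prev_cost = cost                      # showing up as a large final cost
        return theta1, cost

    x = np.array([1.0, 2.0, 3.0])
    y = np.array([1.0, 2.0, 3.0])
    for alpha in [1.0, 0.3, 0.1, 0.03, 0.01]:
        theta1, cost = gradient_descent_1d(x, y, alpha)
        print(alpha, theta1, cost)                # compare final costs across rates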
Is it required to decrease the value of Ξ± over time? No: with a fixed learning rate Ξ±, the slope becomes smaller as gradient descent approaches the minimum (slope1 > slope2 > slope3 > …), so the step size automatically shrinks over time, and there is no need to change the value of Ξ±.
Applying gradient descent to the linear model
Model against dataset: the housing price problem. Each dataset has m examples, and the hypothesis grows with the number of features (a vectorized sketch follows below):
β€’ One feature (area → price): hΞΈ(x) = ΞΈ0 + ΞΈ1x1
β€’ Two features (area, age → price): hΞΈ(x) = ΞΈ0 + ΞΈ1x1 + ΞΈ2x2
β€’ n features (area, …, #floors → price): hΞΈ(x) = ΞΈ0 + ΞΈ1x1 + β‹― + ΞΈnxn
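In vectorized form, all three cases collapse into one dot product once a constant feature x0 = 1 is prepended (a sketch; the feature values and parameters are illustrative):

    import numpy as np

    def h(theta, X):
        """h_theta(x) = theta0*x0 + theta1*x1 + ... + thetan*xn, with x0 = 1."""
        return X @ theta                           # X has shape (m, n+1)

    X = np.array([[1.0, 2104.0],                   # bias column of ones + area feature
                  [1.0, 1416.0]])
    theta = np.array([50.0, 0.1])                  # illustrative parameters
    print(h(theta, X))                             # [260.4 191.6]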
Gradient descent algorithm applied to the linear regression model
In order to apply gradient descent, the derivative term (βˆ‚/βˆ‚ΞΈj) J(ΞΈ0, ΞΈ1) needs to be calculated.
πœ•
πœ•πœƒπ‘—
𝐽(πœƒ0, πœƒ1) =
πœ•
πœ•πœƒπ‘—
(
1
2π‘š
𝑖=1
π‘š
β„Žπœƒ π‘₯ 𝑖
βˆ’ 𝑦 𝑖 2
)
=
1
2π‘š
(
πœ•
πœ•πœƒπ‘—
𝑖=1
π‘š
β„Žπœƒ(π‘₯ 𝑖 ) βˆ’ 𝑦 𝑖 2
)
=
1
2π‘š
(
πœ•
πœ•πœƒπ‘—
𝑖=1
π‘š
πœƒ0 + πœƒ1π‘₯ 𝑖 βˆ’ 𝑦 𝑖 2
)
=
1
π‘š
𝑖=1
π‘š
(β„Žπœƒ(π‘₯ 𝑖 ) βˆ’ 𝑦 𝑖 ) for j = 0
1
π‘š
𝑖=1
π‘š
(β„Žπœƒ(π‘₯ 𝑖
) βˆ’ 𝑦 𝑖
)π‘₯ 𝑖
for j = 1
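Plugging these two partial derivatives into the update rule gives batch gradient descent for the univariate model. A minimal sketch (toy data generated from y = 1 + 2x; Ξ± and the iteration count are illustrative):

    import numpy as np

    def batch_gradient_descent(x, y, alpha=0.01, iters=5000):
        """Fit h(x) = theta0 + theta1*x by batch gradient descent."""
        theta0 = theta1 = 0.0
        for _ in range(iters):
            pred = theta0 + theta1 * x
            d0 = np.mean(pred - y)                 # (1/m) sum(h(x_i) - y_i),       j = 0
            d1 = np.mean((pred - y) * x)           # (1/m) sum((h(x_i) - y_i) x_i), j = 1
            theta0, theta1 = theta0 - alpha * d0, theta1 - alpha * d1  # simultaneous update
        return theta0, theta1

    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([3.0, 5.0, 7.0, 9.0])             # y = 1 + 2x exactly
    print(batch_gradient_descent(x, y))            # approaches (1.0, 2.0)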
Risk of converging to different local optima
If more than one local optimum exists, gradient descent may converge to a local optimum that is not the global one.
For linear regression, however, the squared-error cost function is always convex (bowl-shaped), so it has a single global minimum and gradient descent cannot get stuck in a non-global local optimum.
[Sequence of plots showing gradient descent steps: on the left, hΞΈ(x) for the current parameters (for fixed ΞΈ0, ΞΈ1, this is a function of x); on the right, the contour plot of J(ΞΈ0, ΞΈ1) (a function of the parameters), with each step moving closer to the minimum.]
Now you can do prediction, given a house size!
The gradient descent technique that uses all the training examples in each step is called Batch Gradient Descent: the derivative term is computed over all the training examples, as can be seen in the equations above.
Linear regression can also be solved with the Normal Equations method, but that approach has the disadvantage of not scaling well to larger datasets, while gradient descent does.
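For comparison, a normal-equations sketch (closed form: solve (X^T X)ΞΈ = X^T y; same toy data as above):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([3.0, 5.0, 7.0, 9.0])

    X = np.column_stack([np.ones_like(x), x])      # prepend the bias column x0 = 1
    theta = np.linalg.solve(X.T @ X, X.T @ y)      # exact least-squares solution
    print(theta)                                   # [1. 2.], matching gradient descent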
There are plenty of other optimizers. Examples:
1. Adagrad
2. Adadelta
3. Adam
4. Conjugate Gradients
5. BFGS
6. Momentum
7. Nesterov Momentum
8. Newton’s Method
9. RMSProp
10. SGD