CS771: Intro to ML
Gradient descent algorithm
• Gradient descent is an optimization algorithm used to minimize a function.
• The function to be minimized is called the objective function.
• In machine learning, the objective function is also termed the cost function or loss function.
• A common loss function is the squared-error loss, which measures the squared difference between the actual values and the predictions.
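Written out explicitly (the formula is implied rather than shown on the slide): for n training examples with actual values 𝑦ᵢ and predictions 𝑦̂ᵢ, this squared-error loss is

\[ L(\mathbf{w}) = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 \]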
• Gradient descent minimizes a function by iteratively moving in the direction of steepest descent, i.e., opposite to the gradient.
• In machine learning, we use gradient descent to update the parameters of our model; for example, the parameters are the coefficients in linear regression. A minimal sketch follows.
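A minimal sketch of this idea, assuming a tiny one-feature dataset and plain batch gradient descent on the squared-error loss (the data, learning rate, and variable names below are illustrative, not from the slides):

import numpy as np

# Illustrative data (assumed): one feature x and target y
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])

slope, intercept = 0.0, 0.0   # the model's parameters (coefficients)
lr = 0.05                     # learning rate (step size)
n = len(x)

for step in range(1000):
    y_pred = slope * x + intercept
    # Gradients of the mean squared error with respect to each parameter
    grad_slope = (-2.0 / n) * np.sum((y - y_pred) * x)
    grad_intercept = (-2.0 / n) * np.sum(y - y_pred)
    # Move opposite to the gradient, i.e., in the direction of steepest descent
    slope -= lr * grad_slope
    intercept -= lr * grad_intercept

print(slope, intercept)   # approaches the least-squares coefficients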
Learning rate
• The size of these steps is called the learning rate.
• With a high learning rate we can cover more ground at each step, but we risk overshooting the lowest point, since the slope of the hill is constantly changing.
• A low learning rate is more precise, but each step requires a fresh gradient calculation, so taking many tiny steps means it will take a very long time to get to the bottom. A small comparison is sketched below.
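A small sketch of this trade-off on a simple quadratic objective f(w) = w², whose gradient is 2w (the function and learning rates are illustrative only):

def gradient_descent(lr, steps=20, w=5.0):
    """Run gradient descent on f(w) = w**2, whose gradient is 2*w."""
    for _ in range(steps):
        w -= lr * 2 * w
    return w

print(gradient_descent(lr=0.9))    # high rate: overshoots and oscillates around the minimum at 0
print(gradient_descent(lr=0.01))   # low rate: moves steadily but is still far from 0 after 20 steps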
Local & Global Minima, Maxima
[Figure: a function 𝑓(𝑥) plotted against 𝑥, with several local maxima and local minima marked, along with the global maximum and the global minimum.]
The tangent is perfectly horizontal at the local minima and maxima, i.e., the derivative is zero there.
Derivatives
 How the derivative itself changes tells us about the function's optima
 The second derivative 𝑓′′(𝑥) can provide this information
• 𝑓′(𝑥) = 0 at 𝑥, 𝑓′(𝑥) > 0 just before 𝑥, and 𝑓′(𝑥) < 0 just after 𝑥: 𝑥 is a maximum
• 𝑓′(𝑥) = 0 at 𝑥, 𝑓′(𝑥) < 0 just before 𝑥, and 𝑓′(𝑥) > 0 just after 𝑥: 𝑥 is a minimum
• 𝑓′(𝑥) = 0 at 𝑥, 𝑓′(𝑥) = 0 just before 𝑥, and 𝑓′(𝑥) = 0 just after 𝑥: 𝑥 may be a saddle
• 𝑓′(𝑥) = 0 and 𝑓′′(𝑥) < 0: 𝑥 is a maximum
• 𝑓′(𝑥) = 0 and 𝑓′′(𝑥) > 0: 𝑥 is a minimum
• 𝑓′(𝑥) = 0 and 𝑓′′(𝑥) = 0: 𝑥 may be a saddle; higher derivatives may be needed
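As a quick worked example (not on the slides): for 𝑓(𝑥) = 𝑥³ − 3𝑥 we have 𝑓′(𝑥) = 3𝑥² − 3 and 𝑓′′(𝑥) = 6𝑥. Setting 𝑓′(𝑥) = 0 gives the stationary points 𝑥 = ±1; since 𝑓′′(1) = 6 > 0, 𝑥 = 1 is a minimum, and since 𝑓′′(−1) = −6 < 0, 𝑥 = −1 is a maximum.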
Saddle Points
 Points where the derivative is zero but which are neither minima nor maxima
 The second or higher derivatives may help identify whether a stationary point is a saddle
 A saddle is a point of inflection at which the derivative is also zero
[Figure: a curve with a saddle point marked.]
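A standard one-variable example (added for illustration): for 𝑓(𝑥) = 𝑥³ we have 𝑓′(0) = 0 and 𝑓′′(0) = 0, so the first two derivatives cannot classify 𝑥 = 0; the third derivative 𝑓′′′(0) = 6 ≠ 0 reveals that 𝑥 = 0 is a point of inflection, i.e., a saddle, neither a minimum nor a maximum.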
Gradient Descent: An Illustration
[Figure: the loss 𝐿(𝒘) plotted against 𝒘, showing two runs of gradient descent: iterates 𝒘(0), 𝒘(1), 𝒘(2), 𝒘(3) that reach the optimum 𝒘∗, and iterates from a different starting point that get stuck at a local minimum.]
• Where the gradient is negative (𝜕𝐿/𝜕𝒘 < 0), move in the positive direction.
• Where the gradient is positive, move in the negative direction.
• The learning rate is very important.
• Good initialization is very important: a poor starting point can leave gradient descent stuck at a local minimum.
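A small sketch of this behaviour in one dimension (the loss function, starting points, and learning rate below are invented for illustration, not taken from the slides):

def grad(w):
    # Gradient of the illustrative loss L(w) = (w**2 - 1)**2 + 0.3*w,
    # which has a global minimum near w = -1 and a shallower local minimum near w = +1.
    return 4 * w * (w**2 - 1) + 0.3

def descend(w, lr=0.01, steps=2000):
    # Plain gradient descent: repeatedly step opposite to the gradient.
    for _ in range(steps):
        w -= lr * grad(w)
    return w

print(descend(w=-2.0))   # good start: converges near the global minimum (about -1.04)
print(descend(w=+2.0))   # bad start: gets stuck near the local minimum (about +0.96)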
Optimal value of the intercept?
[Slides 24–40 present this worked example through figures; only the slide titles are recoverable here:]
• Assume intercept = 0
• For row = 1
• For row = 2 and row = 3
• Different values of the intercept
• Step 3
• Red line is the slope; as the intercept increases…
• For the first row
• Third intercept
• Fourth intercept
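The slide titles suggest a worked example of the following kind: fix the slope, start from intercept = 0, compute the residual for each row of a small dataset, sum the squared residuals, repeat for different intercept values, and then use the gradient of that sum with respect to the intercept to update it. A minimal sketch under those assumptions (the data, slope, and learning rate below are invented for illustration):

# Illustrative data (assumed): three rows of (x, y) observations
data = [(0.5, 1.4), (2.3, 1.9), (2.9, 3.2)]
slope = 0.64        # assume the slope is held fixed, as the slides appear to do
intercept = 0.0     # start from intercept = 0
lr = 0.1            # learning rate

for step in range(100):
    residuals = [y - (intercept + slope * x) for x, y in data]
    ssr = sum(r ** 2 for r in residuals)    # sum of squared residuals over the rows
    grad = sum(-2 * r for r in residuals)   # gradient of the SSR w.r.t. the intercept
    intercept -= lr * grad                  # gradient descent step on the intercept

print(intercept, ssr)   # the intercept approaches the value that minimizes the SSR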
