Gradient Descent
• Gradient Descent is an optimization algorithm.
• Purpose: to find the parameter values that minimize the cost function.
• MSE/SSE are the default cost functions.
• The end goal is to find the best fit line.
• It is used in both ML and DL.
Best Fit Line
Among all the possible lines, there is only one line that minimizes the error.
That one line is the best fit line.
What is an error/residual in linear regression
• A residual is a measure of how far away a point is vertically from the regression line.
• Simply, it is the error between a predicted value and the observed actual value.
What is a loss function in linear regression
• In statistics and machine learning, a loss function quantifies the loss generated by the error we commit on a single prediction:
Loss function = (y_actual − y_pred)^2
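A minimal sketch of the residual and the squared-error loss for one point (the numeric values here are illustrative, not from the slides):

```python
# Residual and squared-error loss for a single prediction (illustrative values).
y_actual = 5.0                        # observed value
y_pred = 4.2                          # value predicted by the regression line
residual = y_actual - y_pred          # vertical distance from the line
loss = (y_actual - y_pred) ** 2       # squared-error loss for this point
print(f"residual={residual:.2f}, loss={loss:.2f}")  # residual=0.80, loss=0.64
```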
Types of loss functions (a few of these are computed in the sketch after this list)
• Mean Absolute Error (MAE).
• Mean Absolute Percentage Error (MAPE).
• Mean Squared Error (MSE).
• Root Mean Squared Error (RMSE).
• Huber Loss.
• Log-Cosh Loss.
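A quick sketch computing the first four of these on a few illustrative points (the data values are assumed, not from the slides):

```python
import numpy as np

# Illustrative actual and predicted values.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 6.0, 9.5])
err = y_true - y_pred

mae  = np.mean(np.abs(err))                 # Mean Absolute Error
mape = np.mean(np.abs(err / y_true)) * 100  # Mean Absolute Percentage Error (%)
mse  = np.mean(err ** 2)                    # Mean Squared Error
rmse = np.sqrt(mse)                         # Root Mean Squared Error
print(mae, mape, mse, rmse)
```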
What is a cost function in linear regression
• The cost function measures how well a machine learning model performs over the whole dataset: it is the average of the per-point losses.
• Formula for the cost function (MSE), where the line is y_pred = m·x + c:
J(m, c) = (1/n) × Σ (y_i − (m·x_i + c))^2
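The formula translates directly into code; a small sketch, with the data points assumed for illustration:

```python
import numpy as np

def mse_cost(m, c, x, y):
    """MSE cost J(m, c) = (1/n) * sum((y - (m*x + c))**2)."""
    return np.mean((y - (m * x + c)) ** 2)

# Illustrative data points (assumed, not from the slides).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.1, 5.9, 8.2])
print(mse_cost(2.0, 0.0, x, y))  # cost of the line y = 2x -> 0.015
```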
Relation Between Variance and Error or Residual
Equation of variance: Var(y) = (1/n) × Σ (y_i − ȳ)^2
Equation of cost function: J = (1/n) × Σ (y_i − ŷ_i)^2
The two have the same form: the cost function replaces the mean ȳ with the model's prediction ŷ_i.
Two types of variance
1. Stochastic variance, or noise: the unexplained error, the part of the variation in y that the model cannot account for.
2. Deterministic variance: the explained error, the part of the variation in y that the regression line accounts for (decomposition sketched below).
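A sketch of this decomposition, fitting a least-squares line with np.polyfit on assumed data; for such a fit the explained and unexplained parts add up exactly to the total variance:

```python
import numpy as np

# Illustrative data (assumed); fit a least-squares line.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
m, c = np.polyfit(x, y, 1)
y_pred = m * x + c

total_var       = np.mean((y - y.mean()) ** 2)       # variance of y
explained_var   = np.mean((y_pred - y.mean()) ** 2)  # deterministic part
unexplained_var = np.mean((y - y_pred) ** 2)         # stochastic part (= MSE cost)
print(total_var, explained_var + unexplained_var)    # equal for a least-squares fit
```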
CONVEX FUNCTION
The MSE cost function is convex: plotted against the parameters m and c it forms a single bowl, with one global minimum and no other local minima (for example, along c the second derivative is d²J/dc² = 2 > 0). This is what guarantees that gradient descent converges to the best fit line.
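A quick numerical check of the bowl shape, on data that lies exactly on y = 2x + 1 (assumed for illustration):

```python
import numpy as np

# Sweep c with m fixed at the true slope; the cost dips to a single minimum.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0                        # points exactly on y = 2x + 1
for c in [-1.0, 0.0, 1.0, 2.0, 3.0]:
    cost = np.mean((y - (2.0 * x + c)) ** 2)
    print(c, cost)                       # 4.0, 1.0, 0.0, 1.0, 4.0: one minimum at c = 1
```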
Partial Differentiation
1. The process of finding the derivative of a function with respect to one variable, while holding the other variables constant, is called partial differentiation.
2. It is used when we take a tangent line to the graph of the function along one variable and obtain its slope.
Partial Differentiation
1. The partial derivatives tell us how much the error changes for a small change in the value of m or c in either direction, i.e. whether it increases or decreases.
2. Following the partial derivatives (the gradient) leads us to the point of minimum cost, through which our best fit line (BFL) passes (sketched in code below).
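For the MSE cost these partial derivatives have a closed form: dJ/dm = −(2/n) Σ x_i(y_i − (m·x_i + c)) and dJ/dc = −(2/n) Σ (y_i − (m·x_i + c)). A sketch, with the data points assumed:

```python
import numpy as np

def gradients(m, c, x, y):
    """Partial derivatives of J(m, c) = mean((y - (m*x + c))**2)."""
    err = y - (m * x + c)
    dJ_dm = -2.0 * np.mean(x * err)   # dJ/dm
    dJ_dc = -2.0 * np.mean(err)       # dJ/dc
    return dJ_dm, dJ_dc

# Illustrative data (assumed).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.1, 5.9, 8.2])
print(gradients(0.0, 0.0, x, y))  # both negative: increase m and c to reduce cost
```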
Learning Rate
The learning rate (η) controls the size of the step the parameters take along the gradient during gradient descent.
Learning Rate
1. Setting the learning rate too high makes the path unstable; setting it too low makes convergence slow.
2. If we set the learning rate to zero, there is no movement at all.
• C(new) = C(old) − [η × slope]
Learning Rate
• Example:
C(old) = -10, slope = -30
Without a learning rate (η = 1) the step overshoots:
C(new) = C(old) − slope
       = -10 − (-30)
       = 20
To scale the step down we use the learning rate, here η = 0.1:
C(new) = C(old) − (η × slope)
       = -10 − (0.1 × -30)
       = -7
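The same worked example as code (a trivial sketch; eta is the learning rate):

```python
c_old, slope = -10.0, -30.0

c_no_lr = c_old - slope      # -10 - (-30) = 20.0 -> overshoots
eta = 0.1
c_new = c_old - eta * slope  # -10 - (0.1 * -30) = -7.0 -> a small, controlled step
print(c_no_lr, c_new)
```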
Gradient Descent in ML
[Demo figure: gradient descent fitting a regression line]
Gradient Descent in DL
[Demo figure] Learning Rate: 0.01, Data Points: 300, Epochs: 170
Gradient Descent in DL
[Demo figure] Learning Rate: 0.02, Data Points: 300, Epochs: 40
Note that on the same 300 points, doubling the learning rate cut the epochs needed to converge from 170 to 40.
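A self-contained sketch of the full loop behind these demos, on synthetic data of 300 points; the true line y = 3x + 4, the noise level, and the function name are assumptions, while the two hyperparameter settings come from the slides above:

```python
import numpy as np

def gradient_descent(x, y, eta=0.01, epochs=170):
    """Fit y ~ m*x + c by gradient descent on the MSE cost."""
    m, c = 0.0, 0.0
    for _ in range(epochs):
        err = y - (m * x + c)
        m -= eta * (-2.0 * np.mean(x * err))  # m := m - eta * dJ/dm
        c -= eta * (-2.0 * np.mean(err))      # c := c - eta * dJ/dc
    return m, c

# Synthetic data: 300 points scattered around y = 3x + 4 (assumed).
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 300)
y = 3.0 * x + 4.0 + rng.normal(0.0, 1.0, 300)

print(gradient_descent(x, y, eta=0.01, epochs=170))  # close to (3, 4)
print(gradient_descent(x, y, eta=0.02, epochs=40))   # fewer epochs, larger steps
```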
THANK YOU
