Gradient Descent
• Gradient Descent is an optimization algorithm.
• Purpose: to find the parameter values that minimize the cost function.
• MSE/SSE are the default cost functions.
• The end goal is to find the best fit line.
• It is used in both ML and DL.
Best Fit Line
Among all the possible lines, there is only one line that minimizes the error.
That one line is the best fit line.
What is an error/residual in linear regression
• A residual is a measure of how far away a point is vertically from the regression line.
• Simply, it is the error between a predicted value and the observed actual value.
What is a loss function in linear regression
• In statistics and machine learning, a loss function quantifies the loss generated by the error we commit on a single prediction:
Loss function = (y_actual − y_pred)^2
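A minimal sketch of the residual and the squared-error loss for one point (the numeric values here are illustrative, not from the slides):

```python
# Residual and squared-error loss for a single prediction (illustrative values).
y_actual = 5.0                        # observed value
y_pred = 4.2                          # value predicted by the regression line
residual = y_actual - y_pred          # vertical distance from the line
loss = (y_actual - y_pred) ** 2       # squared-error loss for this point
print(f"residual={residual:.2f}, loss={loss:.2f}")  # residual=0.80, loss=0.64
```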
Types of loss functions (a few of these are computed in the sketch after this list)
• Mean Absolute Error (MAE).
• Mean Absolute Percentage Error (MAPE).
• Mean Squared Error (MSE).
• Root Mean Squared Error (RMSE).
• Huber Loss.
• Log-Cosh Loss.
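A quick sketch computing the first four of these on a few illustrative points (the data values are assumed, not from the slides):

```python
import numpy as np

# Illustrative actual and predicted values.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 6.0, 9.5])
err = y_true - y_pred

mae  = np.mean(np.abs(err))                 # Mean Absolute Error
mape = np.mean(np.abs(err / y_true)) * 100  # Mean Absolute Percentage Error (%)
mse  = np.mean(err ** 2)                    # Mean Squared Error
rmse = np.sqrt(mse)                         # Root Mean Squared Error
print(mae, mape, mse, rmse)
```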
What is a cost function in linear regression
• The cost function measures how well a machine learning model performs over the whole dataset: it is the average of the per-point losses.
• Formula for the cost function (MSE), where the line is y_pred = m·x + c:
J(m, c) = (1/n) × Σ (y_i − (m·x_i + c))^2
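The formula translates directly into code; a small sketch, with the data points assumed for illustration:

```python
import numpy as np

def mse_cost(m, c, x, y):
    """MSE cost J(m, c) = (1/n) * sum((y - (m*x + c))**2)."""
    return np.mean((y - (m * x + c)) ** 2)

# Illustrative data points (assumed, not from the slides).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.1, 5.9, 8.2])
print(mse_cost(2.0, 0.0, x, y))  # cost of the line y = 2x -> 0.015
```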
Relation Between Variance and Error or Residual
Equation of variance: Var(y) = (1/n) × Σ (y_i − ȳ)^2
Equation of cost function: J = (1/n) × Σ (y_i − ŷ_i)^2
The two have the same form: the cost function replaces the mean ȳ with the model's prediction ŷ_i.
Two types of variance
1. Stochastic variance, or noise: the unexplained error, the part of the variation in y that the model cannot account for.
2. Deterministic variance: the explained error, the part of the variation in y that the regression line accounts for (decomposition sketched below).
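A sketch of this decomposition, fitting a least-squares line with np.polyfit on assumed data; for such a fit the explained and unexplained parts add up exactly to the total variance:

```python
import numpy as np

# Illustrative data (assumed); fit a least-squares line.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
m, c = np.polyfit(x, y, 1)
y_pred = m * x + c

total_var       = np.mean((y - y.mean()) ** 2)       # variance of y
explained_var   = np.mean((y_pred - y.mean()) ** 2)  # deterministic part
unexplained_var = np.mean((y - y_pred) ** 2)         # stochastic part (= MSE cost)
print(total_var, explained_var + unexplained_var)    # equal for a least-squares fit
```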
CONVEX FUNCTION
The MSE cost function is convex: plotted against the parameters m and c it forms a single bowl, with one global minimum and no other local minima (for example, along c the second derivative is d²J/dc² = 2 > 0). This is what guarantees that gradient descent converges to the best fit line.
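A quick numerical check of the bowl shape, on data that lies exactly on y = 2x + 1 (assumed for illustration):

```python
import numpy as np

# Sweep c with m fixed at the true slope; the cost dips to a single minimum.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0                        # points exactly on y = 2x + 1
for c in [-1.0, 0.0, 1.0, 2.0, 3.0]:
    cost = np.mean((y - (2.0 * x + c)) ** 2)
    print(c, cost)                       # 4.0, 1.0, 0.0, 1.0, 4.0: one minimum at c = 1
```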
Partial Differentiation
1. The process of finding the derivative of a function with respect to one variable, while holding the other variables constant, is called partial differentiation.
2. It is used when we take a tangent line to the graph of the function along one variable and obtain its slope.
Partial Differentiation
1. The partial derivatives tell us how much the error changes for a small change in the value of m or c in either direction, i.e. whether it increases or decreases.
2. Following the partial derivatives (the gradient) leads us to the point of minimum cost, through which our best fit line (BFL) passes (sketched in code below).
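For the MSE cost these partial derivatives have a closed form: dJ/dm = −(2/n) Σ x_i(y_i − (m·x_i + c)) and dJ/dc = −(2/n) Σ (y_i − (m·x_i + c)). A sketch, with the data points assumed:

```python
import numpy as np

def gradients(m, c, x, y):
    """Partial derivatives of J(m, c) = mean((y - (m*x + c))**2)."""
    err = y - (m * x + c)
    dJ_dm = -2.0 * np.mean(x * err)   # dJ/dm
    dJ_dc = -2.0 * np.mean(err)       # dJ/dc
    return dJ_dm, dJ_dc

# Illustrative data (assumed).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.1, 5.9, 8.2])
print(gradients(0.0, 0.0, x, y))  # both negative: increase m and c to reduce cost
```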
Learning Rate
The learning rate (η) controls the size of the step the parameters take along the gradient during gradient descent.
Learning Rate
1. Setting the learning rate too high makes the path unstable; setting it too low makes convergence slow.
2. If we set the learning rate to zero, there is no movement at all.
• C(new) = C(old) − [η × slope]
Learning Rate
• Example:
C(old) = -10, slope = -30
Without a learning rate (η = 1) the step overshoots:
C(new) = C(old) − slope
       = -10 − (-30)
       = 20
To scale the step down we use the learning rate, here η = 0.1:
C(new) = C(old) − (η × slope)
       = -10 − (0.1 × -30)
       = -7
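The same worked example as code (a trivial sketch; eta is the learning rate):

```python
c_old, slope = -10.0, -30.0

c_no_lr = c_old - slope      # -10 - (-30) = 20.0 -> overshoots
eta = 0.1
c_new = c_old - eta * slope  # -10 - (0.1 * -30) = -7.0 -> a small, controlled step
print(c_no_lr, c_new)
```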
Gradient Descent in ML
[Demo figure: gradient descent fitting a regression line]
Gradient Descent in DL
[Demo figure] Learning Rate: 0.01, Data Points: 300, Epochs: 170
Gradient Descent in DL
[Demo figure] Learning Rate: 0.02, Data Points: 300, Epochs: 40
Note that on the same 300 points, doubling the learning rate cut the epochs needed to converge from 170 to 40.
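A self-contained sketch of the full loop behind these demos, on synthetic data of 300 points; the true line y = 3x + 4, the noise level, and the function name are assumptions, while the two hyperparameter settings come from the slides above:

```python
import numpy as np

def gradient_descent(x, y, eta=0.01, epochs=170):
    """Fit y ~ m*x + c by gradient descent on the MSE cost."""
    m, c = 0.0, 0.0
    for _ in range(epochs):
        err = y - (m * x + c)
        m -= eta * (-2.0 * np.mean(x * err))  # m := m - eta * dJ/dm
        c -= eta * (-2.0 * np.mean(err))      # c := c - eta * dJ/dc
    return m, c

# Synthetic data: 300 points scattered around y = 3x + 4 (assumed).
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 300)
y = 3.0 * x + 4.0 + rng.normal(0.0, 1.0, 300)

print(gradient_descent(x, y, eta=0.01, epochs=170))  # close to (3, 4)
print(gradient_descent(x, y, eta=0.02, epochs=40))   # fewer epochs, larger steps
```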
THANK YOU
