SlideShare a Scribd company logo
1 of 38
Download to read offline
DERIVATION OF THE GRADIENT
DESCENT RULE
• To calculate the direction of steepest descent
along the error surface:
• This direction can be found by computing the
derivative of E with respect to each
component of the vector 𝑤.
• This vector derivative is called the gradient of
E with respect to 𝑤 written 𝛻𝐸(𝑤)
• Since the gradient specifies the direction of
steepest increase of E, the training rule for
gradient descent is
• training rule can also be written in its
component form
So final standard GRADIENT DESCENT
STOCHASTIC APPROXIMATION TO
GRADIENT DESCENT
• Gradient descent is a strategy for searching
through a large or infinite hypothesis space
that can be applied whenever
(1) the hypothesis space contains continuously
parameterized hypotheses
(2) the error can be differentiated with respect
to these hypothesis parameters.
• The key practical difficulties in applying
gradient descent are
(1) converging to a local minimum can
sometimes be quite slow
(2) if there are multiple local minima in the error
surface, then there is no guarantee that the
procedure will find the global minimum.
• One common variation on gradient descent
intended to alleviate these difficulties is called
incremental gradient descent, or alternatively
stochastic gradient descent.
• Whereas the standard gradient descent
training rule presented in Equation computes
weight updates after summing over all the
training examples in D, the idea behind
stochastic gradient descent is to approximate
this gradient descent search by updating
weights incrementally, following the
calculation of the error for each individual
example.
MULTILAYER NETWORKS AND
THE BACKPROPAGATION ALGORITHM
• single perceptrons can only express linear
decision surfaces.
A Differentiable Threshold Unit
The sigmoid threshold unit
• the sigmoid unit computes its output o as
The BACKPROPAGATIAOIN Algorithm
• we begin by redefining E to sum the errors
over all of the network output units:
ADDING MOMENTUM
LEARNING IN ARBITRARY ACYCLIC
NETWORKS
Derivation of the
BACKPROPAGATION Rule
• The specific problem we address here is
deriving the stochastic gradient descent rule
implemented by the algorithm
• Stochastic gradient descent involves iterating
through the training examples one at a time,
for each training example d descending the
gradient of the error Ed with respect to this
single example.
• In other words, for each training example d
every weight wji is updated by adding to it
𝛻wji
1.
• To begin, notice that weight wji can influence
the rest of the network only through netj.
Case 1: Training Rule for Output Unit
Weights.
• netj can influence the network only through oj
• So,
• Finally, we have the stochastic gradient
descent rule for output units
Case 2: Training Rule for Hidden Unit
Weights
• netj can influence the network outputs (and
therefore Ed) only through the units in
Downstream(j).
• So,

More Related Content

Similar to Derivation of the gradient descent rule.

PR-305: Exploring Simple Siamese Representation Learning
PR-305: Exploring Simple Siamese Representation LearningPR-305: Exploring Simple Siamese Representation Learning
PR-305: Exploring Simple Siamese Representation LearningSungchul Kim
 
Deep learning concepts
Deep learning conceptsDeep learning concepts
Deep learning conceptsJoe li
 
Training DNN Models - II.pptx
Training DNN Models - II.pptxTraining DNN Models - II.pptx
Training DNN Models - II.pptxPrabhuSelvaraj15
 
Conducting and reporting the results of a cfd simulation
Conducting and reporting the results of a cfd simulationConducting and reporting the results of a cfd simulation
Conducting and reporting the results of a cfd simulationMalik Abdul Wahab
 
EMOD_Optimization_Presentation.pptx
EMOD_Optimization_Presentation.pptxEMOD_Optimization_Presentation.pptx
EMOD_Optimization_Presentation.pptxAliElMoselhy
 
Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...
Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...
Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...Maninda Edirisooriya
 
Reporting.ppt
Reporting.pptReporting.ppt
Reporting.pptasodiatel
 
ProxGen: Adaptive Proximal Gradient Methods for Structured Neural Networks (N...
ProxGen: Adaptive Proximal Gradient Methods for Structured Neural Networks (N...ProxGen: Adaptive Proximal Gradient Methods for Structured Neural Networks (N...
ProxGen: Adaptive Proximal Gradient Methods for Structured Neural Networks (N...Jihun Yun
 
Gradient descent method
Gradient descent methodGradient descent method
Gradient descent methodSanghyuk Chun
 
Multilayer & Back propagation algorithm
Multilayer & Back propagation algorithmMultilayer & Back propagation algorithm
Multilayer & Back propagation algorithmswapnac12
 
support vector machine 1.pptx
support vector machine 1.pptxsupport vector machine 1.pptx
support vector machine 1.pptxsurbhidutta4
 
Linear regression
Linear regressionLinear regression
Linear regressionMartinHogg9
 
nural network ER. Abhishek k. upadhyay
nural network ER. Abhishek  k. upadhyaynural network ER. Abhishek  k. upadhyay
nural network ER. Abhishek k. upadhyayabhishek upadhyay
 
ngboost.pptx
ngboost.pptxngboost.pptx
ngboost.pptxHadrian7
 
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)Universitat Politècnica de Catalunya
 
Derivative Free Optimization and Robust Optimization
Derivative Free Optimization and Robust OptimizationDerivative Free Optimization and Robust Optimization
Derivative Free Optimization and Robust OptimizationSSA KPI
 
Methods of Optimization in Machine Learning
Methods of Optimization in Machine LearningMethods of Optimization in Machine Learning
Methods of Optimization in Machine LearningKnoldus Inc.
 
Hands on machine learning with scikit-learn and tensor flow by ahmed yousry
Hands on machine learning with scikit-learn and tensor flow by ahmed yousryHands on machine learning with scikit-learn and tensor flow by ahmed yousry
Hands on machine learning with scikit-learn and tensor flow by ahmed yousryAhmed Yousry
 

Similar to Derivation of the gradient descent rule. (20)

Lecture 11
Lecture 11Lecture 11
Lecture 11
 
Lecture 11
Lecture 11Lecture 11
Lecture 11
 
PR-305: Exploring Simple Siamese Representation Learning
PR-305: Exploring Simple Siamese Representation LearningPR-305: Exploring Simple Siamese Representation Learning
PR-305: Exploring Simple Siamese Representation Learning
 
Deep learning concepts
Deep learning conceptsDeep learning concepts
Deep learning concepts
 
Training DNN Models - II.pptx
Training DNN Models - II.pptxTraining DNN Models - II.pptx
Training DNN Models - II.pptx
 
Conducting and reporting the results of a cfd simulation
Conducting and reporting the results of a cfd simulationConducting and reporting the results of a cfd simulation
Conducting and reporting the results of a cfd simulation
 
EMOD_Optimization_Presentation.pptx
EMOD_Optimization_Presentation.pptxEMOD_Optimization_Presentation.pptx
EMOD_Optimization_Presentation.pptx
 
Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...
Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...
Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...
 
Reporting.ppt
Reporting.pptReporting.ppt
Reporting.ppt
 
ProxGen: Adaptive Proximal Gradient Methods for Structured Neural Networks (N...
ProxGen: Adaptive Proximal Gradient Methods for Structured Neural Networks (N...ProxGen: Adaptive Proximal Gradient Methods for Structured Neural Networks (N...
ProxGen: Adaptive Proximal Gradient Methods for Structured Neural Networks (N...
 
Gradient descent method
Gradient descent methodGradient descent method
Gradient descent method
 
Multilayer & Back propagation algorithm
Multilayer & Back propagation algorithmMultilayer & Back propagation algorithm
Multilayer & Back propagation algorithm
 
support vector machine 1.pptx
support vector machine 1.pptxsupport vector machine 1.pptx
support vector machine 1.pptx
 
Linear regression
Linear regressionLinear regression
Linear regression
 
nural network ER. Abhishek k. upadhyay
nural network ER. Abhishek  k. upadhyaynural network ER. Abhishek  k. upadhyay
nural network ER. Abhishek k. upadhyay
 
ngboost.pptx
ngboost.pptxngboost.pptx
ngboost.pptx
 
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)
 
Derivative Free Optimization and Robust Optimization
Derivative Free Optimization and Robust OptimizationDerivative Free Optimization and Robust Optimization
Derivative Free Optimization and Robust Optimization
 
Methods of Optimization in Machine Learning
Methods of Optimization in Machine LearningMethods of Optimization in Machine Learning
Methods of Optimization in Machine Learning
 
Hands on machine learning with scikit-learn and tensor flow by ahmed yousry
Hands on machine learning with scikit-learn and tensor flow by ahmed yousryHands on machine learning with scikit-learn and tensor flow by ahmed yousry
Hands on machine learning with scikit-learn and tensor flow by ahmed yousry
 

Recently uploaded

Lesson no16 application of Induction Generator in Wind.ppsx
Lesson no16 application of Induction Generator in Wind.ppsxLesson no16 application of Induction Generator in Wind.ppsx
Lesson no16 application of Induction Generator in Wind.ppsxmichaelprrior
 
Dairy management system project report..pdf
Dairy management system project report..pdfDairy management system project report..pdf
Dairy management system project report..pdfKamal Acharya
 
Involute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdf
Involute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdfInvolute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdf
Involute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdfJNTUA
 
Activity Planning: Objectives, Project Schedule, Network Planning Model. Time...
Activity Planning: Objectives, Project Schedule, Network Planning Model. Time...Activity Planning: Objectives, Project Schedule, Network Planning Model. Time...
Activity Planning: Objectives, Project Schedule, Network Planning Model. Time...Lovely Professional University
 
Research Methodolgy & Intellectual Property Rights Series 2
Research Methodolgy & Intellectual Property Rights Series 2Research Methodolgy & Intellectual Property Rights Series 2
Research Methodolgy & Intellectual Property Rights Series 2T.D. Shashikala
 
Introduction to Artificial Intelligence and History of AI
Introduction to Artificial Intelligence and History of AIIntroduction to Artificial Intelligence and History of AI
Introduction to Artificial Intelligence and History of AISheetal Jain
 
Software Engineering - Modelling Concepts + Class Modelling + Building the An...
Software Engineering - Modelling Concepts + Class Modelling + Building the An...Software Engineering - Modelling Concepts + Class Modelling + Building the An...
Software Engineering - Modelling Concepts + Class Modelling + Building the An...Prakhyath Rai
 
Multivibrator and its types defination and usges.pptx
Multivibrator and its types defination and usges.pptxMultivibrator and its types defination and usges.pptx
Multivibrator and its types defination and usges.pptxalijaker017
 
DR PROF ING GURUDUTT SAHNI WIKIPEDIA.pdf
DR PROF ING GURUDUTT SAHNI WIKIPEDIA.pdfDR PROF ING GURUDUTT SAHNI WIKIPEDIA.pdf
DR PROF ING GURUDUTT SAHNI WIKIPEDIA.pdfDrGurudutt
 
"United Nations Park" Site Visit Report.
"United Nations Park" Site  Visit Report."United Nations Park" Site  Visit Report.
"United Nations Park" Site Visit Report.MdManikurRahman
 
How to Design and spec harmonic filter.pdf
How to Design and spec harmonic filter.pdfHow to Design and spec harmonic filter.pdf
How to Design and spec harmonic filter.pdftawat puangthong
 
Complex plane, Modulus, Argument, Graphical representation of a complex numbe...
Complex plane, Modulus, Argument, Graphical representation of a complex numbe...Complex plane, Modulus, Argument, Graphical representation of a complex numbe...
Complex plane, Modulus, Argument, Graphical representation of a complex numbe...MohammadAliNayeem
 
Supermarket billing system project report..pdf
Supermarket billing system project report..pdfSupermarket billing system project report..pdf
Supermarket billing system project report..pdfKamal Acharya
 
ALCOHOL PRODUCTION- Beer Brewing Process.pdf
ALCOHOL PRODUCTION- Beer Brewing Process.pdfALCOHOL PRODUCTION- Beer Brewing Process.pdf
ALCOHOL PRODUCTION- Beer Brewing Process.pdfMadan Karki
 
Research Methodolgy & Intellectual Property Rights Series 1
Research Methodolgy & Intellectual Property Rights Series 1Research Methodolgy & Intellectual Property Rights Series 1
Research Methodolgy & Intellectual Property Rights Series 1T.D. Shashikala
 
Artificial Intelligence Bayesian Reasoning
Artificial Intelligence Bayesian ReasoningArtificial Intelligence Bayesian Reasoning
Artificial Intelligence Bayesian Reasoninghotman30312
 
Circuit Breaker arc phenomenon.pdf engineering
Circuit Breaker arc phenomenon.pdf engineeringCircuit Breaker arc phenomenon.pdf engineering
Circuit Breaker arc phenomenon.pdf engineeringKanchhaTamang
 
ChatGPT Prompt Engineering for project managers.pdf
ChatGPT Prompt Engineering for project managers.pdfChatGPT Prompt Engineering for project managers.pdf
ChatGPT Prompt Engineering for project managers.pdfqasastareekh
 
2024 DevOps Pro Europe - Growing at the edge
2024 DevOps Pro Europe - Growing at the edge2024 DevOps Pro Europe - Growing at the edge
2024 DevOps Pro Europe - Growing at the edgePaco Orozco
 
Linux Systems Programming: Semaphores, Shared Memory, and Message Queues
Linux Systems Programming: Semaphores, Shared Memory, and Message QueuesLinux Systems Programming: Semaphores, Shared Memory, and Message Queues
Linux Systems Programming: Semaphores, Shared Memory, and Message QueuesRashidFaridChishti
 

Recently uploaded (20)

Lesson no16 application of Induction Generator in Wind.ppsx
Lesson no16 application of Induction Generator in Wind.ppsxLesson no16 application of Induction Generator in Wind.ppsx
Lesson no16 application of Induction Generator in Wind.ppsx
 
Dairy management system project report..pdf
Dairy management system project report..pdfDairy management system project report..pdf
Dairy management system project report..pdf
 
Involute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdf
Involute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdfInvolute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdf
Involute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdf
 
Activity Planning: Objectives, Project Schedule, Network Planning Model. Time...
Activity Planning: Objectives, Project Schedule, Network Planning Model. Time...Activity Planning: Objectives, Project Schedule, Network Planning Model. Time...
Activity Planning: Objectives, Project Schedule, Network Planning Model. Time...
 
Research Methodolgy & Intellectual Property Rights Series 2
Research Methodolgy & Intellectual Property Rights Series 2Research Methodolgy & Intellectual Property Rights Series 2
Research Methodolgy & Intellectual Property Rights Series 2
 
Introduction to Artificial Intelligence and History of AI
Introduction to Artificial Intelligence and History of AIIntroduction to Artificial Intelligence and History of AI
Introduction to Artificial Intelligence and History of AI
 
Software Engineering - Modelling Concepts + Class Modelling + Building the An...
Software Engineering - Modelling Concepts + Class Modelling + Building the An...Software Engineering - Modelling Concepts + Class Modelling + Building the An...
Software Engineering - Modelling Concepts + Class Modelling + Building the An...
 
Multivibrator and its types defination and usges.pptx
Multivibrator and its types defination and usges.pptxMultivibrator and its types defination and usges.pptx
Multivibrator and its types defination and usges.pptx
 
DR PROF ING GURUDUTT SAHNI WIKIPEDIA.pdf
DR PROF ING GURUDUTT SAHNI WIKIPEDIA.pdfDR PROF ING GURUDUTT SAHNI WIKIPEDIA.pdf
DR PROF ING GURUDUTT SAHNI WIKIPEDIA.pdf
 
"United Nations Park" Site Visit Report.
"United Nations Park" Site  Visit Report."United Nations Park" Site  Visit Report.
"United Nations Park" Site Visit Report.
 
How to Design and spec harmonic filter.pdf
How to Design and spec harmonic filter.pdfHow to Design and spec harmonic filter.pdf
How to Design and spec harmonic filter.pdf
 
Complex plane, Modulus, Argument, Graphical representation of a complex numbe...
Complex plane, Modulus, Argument, Graphical representation of a complex numbe...Complex plane, Modulus, Argument, Graphical representation of a complex numbe...
Complex plane, Modulus, Argument, Graphical representation of a complex numbe...
 
Supermarket billing system project report..pdf
Supermarket billing system project report..pdfSupermarket billing system project report..pdf
Supermarket billing system project report..pdf
 
ALCOHOL PRODUCTION- Beer Brewing Process.pdf
ALCOHOL PRODUCTION- Beer Brewing Process.pdfALCOHOL PRODUCTION- Beer Brewing Process.pdf
ALCOHOL PRODUCTION- Beer Brewing Process.pdf
 
Research Methodolgy & Intellectual Property Rights Series 1
Research Methodolgy & Intellectual Property Rights Series 1Research Methodolgy & Intellectual Property Rights Series 1
Research Methodolgy & Intellectual Property Rights Series 1
 
Artificial Intelligence Bayesian Reasoning
Artificial Intelligence Bayesian ReasoningArtificial Intelligence Bayesian Reasoning
Artificial Intelligence Bayesian Reasoning
 
Circuit Breaker arc phenomenon.pdf engineering
Circuit Breaker arc phenomenon.pdf engineeringCircuit Breaker arc phenomenon.pdf engineering
Circuit Breaker arc phenomenon.pdf engineering
 
ChatGPT Prompt Engineering for project managers.pdf
ChatGPT Prompt Engineering for project managers.pdfChatGPT Prompt Engineering for project managers.pdf
ChatGPT Prompt Engineering for project managers.pdf
 
2024 DevOps Pro Europe - Growing at the edge
2024 DevOps Pro Europe - Growing at the edge2024 DevOps Pro Europe - Growing at the edge
2024 DevOps Pro Europe - Growing at the edge
 
Linux Systems Programming: Semaphores, Shared Memory, and Message Queues
Linux Systems Programming: Semaphores, Shared Memory, and Message QueuesLinux Systems Programming: Semaphores, Shared Memory, and Message Queues
Linux Systems Programming: Semaphores, Shared Memory, and Message Queues
 

Derivation of the gradient descent rule.

  • 1. DERIVATION OF THE GRADIENT DESCENT RULE
  • 2. • To calculate the direction of steepest descent along the error surface: • This direction can be found by computing the derivative of E with respect to each component of the vector 𝑤. • This vector derivative is called the gradient of E with respect to 𝑤 written 𝛻𝐸(𝑤)
  • 3.
  • 4. • Since the gradient specifies the direction of steepest increase of E, the training rule for gradient descent is
  • 5. • training rule can also be written in its component form
  • 6.
  • 7. So final standard GRADIENT DESCENT
  • 8. STOCHASTIC APPROXIMATION TO GRADIENT DESCENT • Gradient descent is a strategy for searching through a large or infinite hypothesis space that can be applied whenever (1) the hypothesis space contains continuously parameterized hypotheses (2) the error can be differentiated with respect to these hypothesis parameters.
  • 9. • The key practical difficulties in applying gradient descent are (1) converging to a local minimum can sometimes be quite slow (2) if there are multiple local minima in the error surface, then there is no guarantee that the procedure will find the global minimum.
  • 10. • One common variation on gradient descent intended to alleviate these difficulties is called incremental gradient descent, or alternatively stochastic gradient descent.
  • 11.
  • 12. • Whereas the standard gradient descent training rule presented in Equation computes weight updates after summing over all the training examples in D, the idea behind stochastic gradient descent is to approximate this gradient descent search by updating weights incrementally, following the calculation of the error for each individual example.
  • 13.
  • 14. MULTILAYER NETWORKS AND THE BACKPROPAGATION ALGORITHM • single perceptrons can only express linear decision surfaces.
  • 17. • the sigmoid unit computes its output o as
  • 18.
  • 19. The BACKPROPAGATIAOIN Algorithm • we begin by redefining E to sum the errors over all of the network output units:
  • 20.
  • 22. LEARNING IN ARBITRARY ACYCLIC NETWORKS
  • 23. Derivation of the BACKPROPAGATION Rule • The specific problem we address here is deriving the stochastic gradient descent rule implemented by the algorithm
  • 24. • Stochastic gradient descent involves iterating through the training examples one at a time, for each training example d descending the gradient of the error Ed with respect to this single example.
  • 25. • In other words, for each training example d every weight wji is updated by adding to it 𝛻wji
  • 26.
  • 27. 1. • To begin, notice that weight wji can influence the rest of the network only through netj.
  • 28. Case 1: Training Rule for Output Unit Weights. • netj can influence the network only through oj
  • 29.
  • 30.
  • 31.
  • 33. • Finally, we have the stochastic gradient descent rule for output units
  • 34. Case 2: Training Rule for Hidden Unit Weights
  • 35. • netj can influence the network outputs (and therefore Ed) only through the units in Downstream(j).
  • 36.
  • 37.