Backpropagation. A Peek into
the Mathematics of Optimization
1 Motivation
In order to get a truly deep understanding of deep neural networks, one must
look at the mathematics behind them. As backpropagation is at the core of the
optimization process, we wanted to introduce you to it. This is definitely not a
necessary part of the course, as TensorFlow, scikit-learn, or any other machine
learning package (as opposed to plain NumPy) will have backpropagation methods
incorporated.
2 The specific net and notation we will examine
Here’s our simple network:
Figure 1: Backpropagation
We have two inputs: $x_1$ and $x_2$. There is a single hidden layer with 3 units
(nodes): $h_1$, $h_2$, and $h_3$. Finally, there are two outputs: $y_1$ and $y_2$.
The arrows that connect them are the weights. There are two weight matrices: $w$
and $u$. The $w$ weights connect the input layer and the hidden layer, while the
$u$ weights connect the hidden layer and the output layer. We have employed the
two letters $w$ and $u$ so it is easier to follow the computation.
You can also see that we compare the outputs $y_1$ and $y_2$ with the targets
$t_1$ and $t_2$.
There is one last letter we need to introduce before we can get to the compu-
tations. Let $a$ be the linear combination prior to activation. Thus, we have:

$$a^{(1)} = xw + b^{(1)} \quad \text{and} \quad a^{(2)} = hu + b^{(2)}$$
Since we cannot exhaust all activation functions and all loss functions, we will
focus on two of the most common: a sigmoid activation and an L2-norm loss.
With this new information and the new notation, the output $y$ is equal to the
activated linear combination. Therefore, for the output layer we have
$y = \sigma(a^{(2)})$, while for the hidden layer: $h = \sigma(a^{(1)})$.
We will examine backpropagation for the output layer and the hidden layer
separately, as the methodologies differ.
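The forward pass described above can be sketched in NumPy. This is a minimal illustration, not the course's own code; the input values, the random seed, and zero-initialized biases are assumptions chosen for the example:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(42)

# Shapes follow Figure 1: 2 inputs, 3 hidden units, 2 outputs.
x = np.array([[0.5, -1.0]])   # input row vector, shape (1, 2)
w = rng.normal(size=(2, 3))   # input -> hidden weights
b1 = np.zeros((1, 3))
u = rng.normal(size=(3, 2))   # hidden -> output weights
b2 = np.zeros((1, 2))

a1 = x @ w + b1               # a(1) = xw + b(1)
h = sigmoid(a1)               # h = sigma(a(1)), shape (1, 3)
a2 = h @ u + b2               # a(2) = hu + b(2)
y = sigmoid(a2)               # y = sigma(a(2)), shape (1, 2)
```

Note that $x$, $h$, and $y$ are kept as row vectors, so the linear combinations $xw$ and $hu$ match the notation above.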
3 Useful formulas
I would like to remind you that:
L2-norm loss:

$$L = \frac{1}{2} \sum_i (y_i - t_i)^2$$
The sigmoid function is:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

and its derivative is:

$$\sigma'(x) = \sigma(x)(1 - \sigma(x))$$
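These formulas translate directly into code. As a quick sanity check (the test point $z_0 = 0.3$ is an arbitrary choice for illustration), a central finite difference confirms the sigmoid derivative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)          # sigma'(z) = sigma(z)(1 - sigma(z))

def l2_loss(y, t):
    return 0.5 * np.sum((y - t) ** 2)

# Numerically verify sigma' at an arbitrary point.
z0, eps = 0.3, 1e-6
numeric = (sigmoid(z0 + eps) - sigmoid(z0 - eps)) / (2 * eps)
assert abs(numeric - sigmoid_prime(z0)) < 1e-9
```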
4 Backpropagation for the output layer
In order to obtain the update rule:

$$u \leftarrow u - \eta \nabla_u L(u)$$

we must calculate $\nabla_u L(u)$.
Let's take a single weight $u_{ij}$. The partial derivative of the loss w.r.t. $u_{ij}$ equals:

$$\frac{\partial L}{\partial u_{ij}} = \frac{\partial L}{\partial y_j} \frac{\partial y_j}{\partial a_j^{(2)}} \frac{\partial a_j^{(2)}}{\partial u_{ij}}$$

where $i$ corresponds to the previous layer (the input layer for this transformation)
and $j$ corresponds to the next layer (the output layer of the transformation). The
partial derivatives were computed by simply following the chain rule.
$$\frac{\partial L}{\partial y_j} = (y_j - t_j)$$

following the L2-norm loss derivative.
$$\frac{\partial y_j}{\partial a_j^{(2)}} = \sigma(a_j^{(2)})\left(1 - \sigma(a_j^{(2)})\right) = y_j(1 - y_j)$$

following the sigmoid derivative.
Finally, the third partial derivative is simply the derivative of $a^{(2)} = hu + b^{(2)}$. So,

$$\frac{\partial a_j^{(2)}}{\partial u_{ij}} = h_i$$
Replacing the partial derivatives in the expression above, we get:

$$\frac{\partial L}{\partial u_{ij}} = \frac{\partial L}{\partial y_j} \frac{\partial y_j}{\partial a_j^{(2)}} \frac{\partial a_j^{(2)}}{\partial u_{ij}} = (y_j - t_j)\, y_j (1 - y_j)\, h_i = \delta_j h_i$$
Therefore, the update rule for a single weight of the output layer is given by:

$$u_{ij} \leftarrow u_{ij} - \eta\, \delta_j h_i$$
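This update can be vectorized over all the $u_{ij}$ at once: the outer product of $h$ and the delta vector reproduces $\delta_j h_i$ for every weight. The sample activations, targets, learning rate, and zero-initialized $u$ below are illustrative assumptions:

```python
import numpy as np

eta = 0.1                          # learning rate (illustrative)
h = np.array([[0.6, 0.3, 0.9]])    # hidden activations, shape (1, 3)
y = np.array([[0.7, 0.2]])         # outputs, shape (1, 2)
t = np.array([[1.0, 0.0]])         # targets, shape (1, 2)
u = np.zeros((3, 2))               # hidden -> output weights

delta = (y - t) * y * (1 - y)      # delta_j = (y_j - t_j) y_j (1 - y_j)
grad_u = h.T @ delta               # dL/du_ij = delta_j * h_i, shape (3, 2)
u -= eta * grad_u                  # u_ij <- u_ij - eta * delta_j * h_i
```

Entry $(i, j)$ of `grad_u` is exactly the scalar $\delta_j h_i$ derived above.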
5 Backpropagation for a hidden layer
Similarly to the backpropagation of the output layer, the update rule for a single
weight $w_{ij}$ would depend on:

$$\frac{\partial L}{\partial w_{ij}} = \frac{\partial L}{\partial h_j} \frac{\partial h_j}{\partial a_j^{(1)}} \frac{\partial a_j^{(1)}}{\partial w_{ij}}$$

following the chain rule.
Taking advantage of the results we have so far for the transformation using the
sigmoid activation and the linear model, we get:

$$\frac{\partial h_j}{\partial a_j^{(1)}} = \sigma(a_j^{(1)})\left(1 - \sigma(a_j^{(1)})\right) = h_j(1 - h_j)$$

and

$$\frac{\partial a_j^{(1)}}{\partial w_{ij}} = x_i$$
The actual problem for backpropagation comes from the term $\frac{\partial L}{\partial h_j}$. That's due
to the fact that there is no "hidden" target. You can follow the solution for the
weight $w_{11}$ below. It is advisable to also check Figure 1 while going through
the computations.
$$\frac{\partial L}{\partial h_1} = \frac{\partial L}{\partial y_1} \frac{\partial y_1}{\partial a_1^{(2)}} \frac{\partial a_1^{(2)}}{\partial h_1} + \frac{\partial L}{\partial y_2} \frac{\partial y_2}{\partial a_2^{(2)}} \frac{\partial a_2^{(2)}}{\partial h_1} = (y_1 - t_1)y_1(1 - y_1)u_{11} + (y_2 - t_2)y_2(1 - y_2)u_{12}$$
From here, we can calculate $\frac{\partial L}{\partial w_{11}}$, which was what we wanted. The final expression is:

$$\frac{\partial L}{\partial w_{11}} = \left[(y_1 - t_1)y_1(1 - y_1)u_{11} + (y_2 - t_2)y_2(1 - y_2)u_{12}\right] h_1(1 - h_1)\, x_1$$
The generalized form of this equation is:

$$\frac{\partial L}{\partial w_{ij}} = \sum_k (y_k - t_k)\, y_k(1 - y_k)\, u_{jk}\, h_j(1 - h_j)\, x_i$$
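The sum over $k$ can be computed for all hidden weights at once by propagating the output-layer deltas backwards through $u$. In the sketch below (the toy values of $x$, $h$, $y$, $t$, and $u$ are illustrative assumptions), the vectorized gradient is checked against the scalar formula for $w_{11}$:

```python
import numpy as np

x = np.array([[0.5, -1.0]])                    # inputs, shape (1, 2)
h = np.array([[0.6, 0.3, 0.9]])                # hidden activations, (1, 3)
y = np.array([[0.7, 0.2]])                     # outputs, shape (1, 2)
t = np.array([[1.0, 0.0]])                     # targets
u = np.array([[0.1, -0.2],
              [0.4,  0.5],
              [-0.3, 0.2]])                    # hidden -> output weights, (3, 2)

delta_out = (y - t) * y * (1 - y)              # output-layer deltas, (1, 2)
delta_hidden = (delta_out @ u.T) * h * (1 - h) # hidden-layer deltas, (1, 3)
grad_w = x.T @ delta_hidden                    # dL/dw_ij, shape (2, 3)

# Check entry (1, 1) against the scalar formula derived above.
scalar = sum((y[0, k] - t[0, k]) * y[0, k] * (1 - y[0, k]) * u[0, k]
             for k in range(2)) * h[0, 0] * (1 - h[0, 0]) * x[0, 0]
assert np.isclose(grad_w[0, 0], scalar)
```

The matrix product `delta_out @ u.T` is exactly the sum over $k$ of $(y_k - t_k)y_k(1 - y_k)u_{jk}$ for each hidden unit $j$.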
6 Backpropagation generalization
Using the results for backpropagation for the output layer and the hidden layer,
we can put them together in one formula, summarizing backpropagation in the
presence of an L2-norm loss and sigmoid activations:

$$\frac{\partial L}{\partial w_{ij}} = \delta_j x_i$$

where for a hidden layer

$$\delta_j = \sum_k \delta_k w_{jk}\, y_j(1 - y_j)$$

Here the notation is generic: $x_i$ is the input to the layer, $y_j$ is the output of its $j$-th unit, $w_{jk}$ are the weights connecting that unit to the following layer, and $\delta_k$ are the deltas of the following layer.
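As a final sanity check, the two delta formulas can be combined into a full backward pass and compared against a brute-force finite-difference gradient of the loss. The random seed and the toy shapes below are assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 2))
t = np.array([[1.0, 0.0]])
w = rng.normal(size=(2, 3))
u = rng.normal(size=(3, 2))
b1, b2 = np.zeros((1, 3)), np.zeros((1, 2))

def forward(w):
    h = sigmoid(x @ w + b1)
    y = sigmoid(h @ u + b2)
    return h, y

def loss(w):
    _, y = forward(w)
    return 0.5 * np.sum((y - t) ** 2)

# Backpropagated gradient: dL/dw_ij = delta_j * x_i.
h, y = forward(w)
delta_out = (y - t) * y * (1 - y)
delta_hidden = (delta_out @ u.T) * h * (1 - h)
grad_w = x.T @ delta_hidden

# Brute-force central differences over every w_ij.
eps = 1e-6
numeric = np.zeros_like(w)
for i in range(w.shape[0]):
    for j in range(w.shape[1]):
        wp, wm = w.copy(), w.copy()
        wp[i, j] += eps
        wm[i, j] -= eps
        numeric[i, j] = (loss(wp) - loss(wm)) / (2 * eps)

assert np.allclose(grad_w, numeric, atol=1e-8)
```

If the two gradients agree, the chain-rule derivation above has been implemented correctly.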
Kudos to those of you who got to the end.
Thanks for reading.