BACK PROPAGATION
METHOD
LESSON 2
DESCRIPTION
• It is a common method of training artificial neural networks, used in conjunction with an optimization method such as gradient descent.
• The algorithm repeats a two-phase cycle: propagation and weight update.
• When an input vector is presented to the network, it is propagated forward through the network, layer by layer, until it reaches the output layer.
• The output of the network is then compared to the desired output using a loss function, and an error value is calculated for each of the neurons in the output layer (a minimal sketch follows this list).
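A minimal sketch of the first half of this cycle (the forward pass and the per-neuron error at the output), assuming a small fully connected network with sigmoid activations and the quadratic loss; the layer sizes, weights, and inputs below are invented purely for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
# Hypothetical 3-4-2 network: weights and biases drawn at random.
W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal((4, 1))
W2, b2 = rng.standard_normal((2, 4)), rng.standard_normal((2, 1))

x = rng.standard_normal((3, 1))    # input vector
y = np.array([[1.0], [0.0]])       # desired output

# Phase 1: propagate the input forward, layer by layer.
a1 = sigmoid(W1 @ x + b1)
a2 = sigmoid(W2 @ a1 + b2)         # activations of the output layer

# Compare the network output to the desired output with a loss function;
# each output neuron gets its own error value.
loss = 0.5 * np.sum((y - a2) ** 2)
error_per_neuron = a2 - y
print(loss, error_per_neuron.ravel())
```

The weight-update phase would then use these errors, propagated backwards through the network, inside an optimizer such as gradient descent.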
DAVID RUMELHART
GEOFFREY HINTON
RONALD J. WILLIAMS
"The other two took the good quotes."
- Ronald J. Williams
THE HEART OF BACK PROPAGATION
An expression for the partial derivative ∂C/∂w of the cost function C with respect to any weight w (or bias b) in the network.
BASIC THEORY
• The goal of back propagation is to compute the partial
derivatives ∂C/∂w and ∂C/∂b of the cost function C with
respect to any weight w or bias b in the network.
• C = (1/2n) ∑_x ‖y(x) − a^L(x)‖²
• where: n is the total number of training examples;
  the sum is over individual training examples, x;
  y = y(x) is the corresponding desired output;
  L denotes the number of layers in the network;
  and a^L = a^L(x) is the vector of activations output from the network when x is input.
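The cost above can be evaluated directly once the network outputs are known. A small sketch, assuming two hypothetical training examples with 2-dimensional outputs (the arrays below are invented for illustration):

```python
import numpy as np

def quadratic_cost(outputs, targets):
    """C = (1/2n) sum_x ||y(x) - a^L(x)||^2, one row per training example x."""
    n = outputs.shape[0]
    return np.sum(np.linalg.norm(targets - outputs, axis=1) ** 2) / (2 * n)

outputs = np.array([[0.8, 0.1],    # a^L(x) for each example
                    [0.3, 0.6]])
targets = np.array([[1.0, 0.0],    # y(x) for each example
                    [0.0, 1.0]])
print(quadratic_cost(outputs, targets))   # 0.075
```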
2 MAJOR ASSUMPTIONS
• The first assumption we need is that the cost function can be written as an average C = (1/n) ∑_x C_x over cost functions C_x for individual training examples, x.
• This is the case for the quadratic cost function, where the cost for a single training example is C_x = (1/2) ‖y − a^L‖².
• What back propagation actually lets us do is compute the partial derivatives ∂C_x/∂w and ∂C_x/∂b for a single training example. We then recover ∂C/∂w and ∂C/∂b by averaging over training examples (a toy check follows this list).
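This averaging can be verified on a toy model. A sketch assuming a single linear neuron (one weight w, no bias, identity activation) and invented data: the per-example gradients ∂C_x/∂w, averaged, match a finite-difference estimate of ∂C/∂w for the averaged cost.

```python
import numpy as np

xs = np.array([0.5, -1.0, 2.0])    # inputs, one per training example
ys = np.array([1.0, 0.0, 1.5])     # desired outputs
w = 0.3                            # the single weight

# Per-example cost C_x = 0.5*(y - w*x)^2, so dC_x/dw = -(y - w*x)*x.
per_example_grads = -(ys - w * xs) * xs
print(per_example_grads.mean())    # dC/dw recovered by averaging

# Finite-difference check against the averaged cost C = (1/n) sum_x C_x.
C = lambda w_: np.mean(0.5 * (ys - w_ * xs) ** 2)
eps = 1e-6
print((C(w + eps) - C(w - eps)) / (2 * eps))   # agrees with the average above
```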
2ND ASSUMPTION
• The second assumption we make about the cost is that it can be written as a function of the outputs from the neural network, a^L.
• For example, the quadratic cost function satisfies this requirement, since the quadratic cost for a single training example x may be written as C = (1/2) ‖y − a^L‖² = (1/2) ∑_j (y_j − a^L_j)², which depends only on the output activations (a small check follows below).
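A tiny check of this requirement, with invented values: the per-example quadratic cost is a function of the output activations a^L alone, and the norm form equals the component-wise sum.

```python
import numpy as np

def cost_of_outputs(a_L, y):
    """0.5*||y - a^L||^2, a function of the output activations alone."""
    return 0.5 * np.linalg.norm(y - a_L) ** 2

a_L = np.array([0.7, 0.2, 0.1])       # hypothetical output activations
y = np.array([1.0, 0.0, 0.0])         # desired output
print(cost_of_outputs(a_L, y))
print(0.5 * np.sum((y - a_L) ** 2))   # same value, 0.5 * sum_j (y_j - a^L_j)^2
```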
THE HAMDARD PRODUCT
ROOH-AFZA
THE HADAMARD PRODUCT
• The back propagation algorithm is based on common linear
algebraic operations - things like vector addition, multiplying a
vector by a matrix, and so on.
• In particular, suppose s and t are two vectors of the same
dimension. Then we use s⊙t to denote the elementwise product
of the two vectors.
• Thus the components of s⊙t are just (s⊙t)_j = s_j t_j. As an example,
  (1, 2) ⊙ (3, 4) = (1·3, 2·4) = (3, 8)
• This kind of elementwise multiplication is sometimes called the Hadamard product (see the NumPy sketch after this list).
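In NumPy the Hadamard product is simply the `*` operator on same-shaped arrays; a one-line sketch reproducing the worked example above:

```python
import numpy as np

s = np.array([1, 2])
t = np.array([3, 4])
print(s * t)   # [3 8]: the elementwise (Hadamard) product, (s⊙t)_j = s_j·t_j
```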
4 BASIC EQUATIONS OF BACK PROPAGATION
ADDITIONAL THEORY
