A Primer on Back-Propagation of Errors
(applied to neural networks)
• Summary of Forward-Propagation
• The Calculus of Back-propagation
A Feed-Forward Network is a Brain-Inspired Metaphor
Feed-forward to calculate the error relative to the
Error-Function (aka Loss-, Cost-, or Objective-Function)
• In the feed-forward path, calculate the error relative to the desired output
• We define an error-function E(X3, Y) as the “penalty” of predicting X3 when the true output is Y.
• The objective is to minimize the error across all the training samples.
• The error/loss E(X3, Y) assigns a numerical score (a scalar) to the network’s output X3 given the expected output Y.
• The loss is zero only when the network’s output matches the expected output (see the sketch below).
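As a concrete illustration, here is a minimal Python sketch of one possible error function, squared error; the names x3 (network output) and y (desired output) mirror the slides, and the choice of squared error is an assumption, not the only option.

# Minimal sketch of an error function: squared error (one choice among many).
def squared_error(x3, y):
    """Scalar penalty for predicting x3 when the true output is y."""
    return 0.5 * (x3 - y) ** 2

print(squared_error(0.8, 1.0))  # small penalty (0.02) for a near-miss
print(squared_error(1.0, 1.0))  # zero penalty when the output is correct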
Sigmoid Activation Function
The sigmoid activation function
σ(x) = 1 / (1 + e^(-x))
is an S-shaped activation function that maps any real-valued input x into the range (0, 1)
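A minimal Python sketch of the sigmoid itself (NumPy is used only for exp; any math library would do):

import numpy as np

def sigmoid(x):
    """S-shaped squashing function: maps any real x into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(0.0))    # 0.5, the midpoint
print(sigmoid(10.0))   # close to 1
print(sigmoid(-10.0))  # close to 0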
Note: in practice, we don’t expect to find a global minimum of the error surface.
“Unshackled by the chain-rule”
-Patrick Winston, MIT
Derivative of the Error E with-respect-to the Output, X3
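For instance, assuming the squared-error loss sketched earlier, E(X3, Y) = 1/2 (X3 - Y)^2 (an assumption, not the only choice), this first factor of the chain rule is simply

dE/dX3 = X3 - Y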
Derivative of the Sigmoid Activation Function
For the Sigmoid function, the cool thing is that the derivative of the output, X3
(with respect to the input, P3), can be expressed in terms of the output itself, i.e.,
X3 · (1 - X3)
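A quick check of that identity, starting from σ(x) = 1 / (1 + e^(-x)):

dσ/dx = e^(-x) / (1 + e^(-x))^2 = σ(x) · (1 - σ(x))

so with X3 = σ(P3), the derivative is dX3/dP3 = X3 · (1 - X3).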
Derivative of P3 with-respect-to W3
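Assuming the usual notation behind these slides, where P3 is the weighted input to the output neuron, P3 = W3 · X2 (with X2 the previous layer’s output), this last factor is just the incoming activation:

dP3/dW3 = X2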
Propagate the errors backward and adjust the weights,
w, so the actual output mimics the desired output
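Putting the three factors together with the chain rule, dE/dW3 = dE/dX3 · dX3/dP3 · dP3/dW3, and then nudging each weight against its gradient. The Python sketch below assumes a tiny chain of single-input sigmoid neurons (x1 -> w2 -> x2 -> w3 -> x3), the squared-error loss from earlier, and a hypothetical learning rate lr; it illustrates the idea rather than reproducing Winston’s exact formulation.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_step(x1, y, w2, w3, lr=0.5):
    # Forward pass: feed x1 through both sigmoid units.
    x2 = sigmoid(w2 * x1)
    x3 = sigmoid(w3 * x2)

    # Backward pass: chain rule, one factor at a time.
    dE_dx3 = x3 - y                     # dE/dX3 for squared error
    dx3_dp3 = x3 * (1.0 - x3)           # sigmoid derivative, in terms of its output
    dE_dp3 = dE_dx3 * dx3_dp3           # reused below for the earlier layer
    dE_dw3 = dE_dp3 * x2                # dP3/dW3 = X2

    dE_dx2 = dE_dp3 * w3                # dP3/dX2 = W3
    dE_dw2 = dE_dx2 * x2 * (1.0 - x2) * x1

    # Gradient-descent update: move the weights against the gradient.
    return w2 - lr * dE_dw2, w3 - lr * dE_dw3

# Usage: repeat until the actual output mimics the desired output.
w2, w3 = 0.1, 0.1
for _ in range(2000):
    w2, w3 = train_step(x1=1.0, y=1.0, w2=w2, w3=w3)
print(sigmoid(w3 * sigmoid(w2 * 1.0)))  # should move toward 1.0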
Computations are Localized & Partially Pre-computed in the Previous Layer
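Concretely, in the notation above (and assuming P2 is the weighted input to the middle neuron, my labeling rather than the slides’), the derivative needed one layer back reuses the product already computed for the output layer:

dE/dP2 = (dE/dP3) · W3 · X2 · (1 - X2)

Only locally available activations and the already-computed dE/dP3 are needed, which keeps the cost per layer constant.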
☑If there’s a representative set of inputs and
outputs, then back-propagation can learn the
mapping between them.
☑Back-propagation’s running time scales linearly
with the number of layers.
☑Simple to implement (and test; see the gradient-check sketch below)
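One common way to test an implementation (a general practice, not something specific to this lecture) is a numerical gradient check: compare the analytic dE/dW3 against a finite-difference estimate.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def loss(w3, x2=0.7, y=1.0):
    # Squared-error loss of a single sigmoid unit, viewed as a function of its weight.
    return 0.5 * (sigmoid(w3 * x2) - y) ** 2

def analytic_grad(w3, x2=0.7, y=1.0):
    x3 = sigmoid(w3 * x2)
    return (x3 - y) * x3 * (1.0 - x3) * x2   # dE/dX3 · dX3/dP3 · dP3/dW3

w3, eps = 0.3, 1e-6
numeric = (loss(w3 + eps) - loss(w3 - eps)) / (2 * eps)  # central difference
print(analytic_grad(w3), numeric)  # the two values should agree to several decimals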
Concepts crystallized from MIT Professor Patrick Winston’s lecture,