This presentation on backpropagation and gradient descent covers the basics of how backpropagation and gradient descent play a role in training neural networks, using an example of recognizing handwritten letters with a neural network. After the network makes its initial predictions, you will see how it is trained with backpropagation until it predicts with high accuracy. Backpropagation is the process of updating the parameters of a network to reduce the error in its predictions. You will also learn how to calculate a loss function to measure the model's error. Finally, you will see, with the help of a graph, how to find the minimum of a function using gradient descent. Now, let's get started with backpropagation and gradient descent in neural networks.
Why Deep Learning?
TensorFlow is one of the most popular software platforms used for deep learning and contains powerful tools to help you build and implement artificial neural networks.
Advancements in deep learning are being seen in smartphone applications, creating efficiencies in the power grid, driving advancements in healthcare, improving agricultural yields, and helping us find solutions to climate change. With this TensorFlow course, you'll build expertise in deep learning models, learn to operate TensorFlow to manage neural networks, and interpret the results.
And according to payscale.com, the median salary for engineers with deep learning skills tops $120,000 per year.
You can gain in-depth knowledge of Deep Learning by taking our Deep Learning certification training course. With Simplilearn’s Deep Learning course, you will prepare for a career as a Deep Learning engineer as you master concepts and techniques including supervised and unsupervised learning, mathematical and heuristic aspects, and hands-on modeling to develop algorithms. Those who complete the course will be able to:
1. Understand the concepts of TensorFlow, its main functions, operations and the execution pipeline
2. Implement deep learning algorithms, understand neural networks and traverse the layers of data abstraction which will empower you to understand data like never before
3. Master and comprehend advanced topics such as convolutional neural networks, recurrent neural networks, training deep networks and high-level interfaces
4. Build deep learning models in TensorFlow and interpret the results
5. Understand the language and fundamental concepts of artificial neural networks
6. Troubleshoot and improve deep learning models
7. Build your own deep learning project
8. Differentiate between machine learning, deep learning, and artificial intelligence
Learn more at https://www.simplilearn.com/deep-learning-course-with-tensorflow-training
3. The handwritten letters are represented as images of 28×28 pixels
[Diagram: a 28×28 input image feeding a neural network with inputs x1, x2, …, xn and outputs y1, y2, y3 corresponding to the classes a, b, c]
4. The 784 pixels are fed as input to the first layer of our neural
network
[Diagram: the 28×28 = 784 pixel values feed a first layer of 784 neurons; inputs x1, x2, …, xn, outputs y1, y2, y3 for the classes a, b, c]
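As a minimal sketch of this preprocessing step (NumPy and the random image are my assumptions, not part of the slides), a 28×28 image flattens into the 784 input values:

```python
import numpy as np

# A hypothetical 28x28 grayscale image (values in [0, 1]).
image = np.random.rand(28, 28)

# Flatten it into the 784 values fed to the first layer.
inputs = image.reshape(-1)
print(inputs.shape)  # (784,)
```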
5. The initial prediction is made using the random weights assigned
to each channel
[Diagram: random weights such as 0.2, 0.8, 1.2, 0.3, … assigned to the channels; the network outputs the predicted probabilities a = 0.3, b = 0.5, c = 0.2]
6. Our network predicts the input to be ‘b’ with a probability of 0.5
[Diagram: same network; predicted probabilities a = 0.3, b = 0.5, c = 0.2, so 'b' is the predicted class]
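A hedged sketch of this initial forward pass: the single linear layer and softmax output below are illustrative assumptions, chosen so that random weights yield class probabilities comparable to the slide's 0.3 / 0.5 / 0.2.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # Turn raw scores into probabilities that sum to 1.
    e = np.exp(z - z.max())
    return e / e.sum()

x = rng.random(784)                        # flattened 28x28 image
W = rng.normal(scale=0.05, size=(3, 784))  # random weights, one row per class
probs = softmax(W @ x)

classes = ["a", "b", "c"]
print(classes[int(probs.argmax())], np.round(probs, 2))
```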
7. The predicted probabilities are compared against the actual
probabilities and the error is calculated
[Diagram: predicted probabilities a = 0.3, b = 0.5, c = 0.2 compared against the actual probabilities a = 0.9, b = 0.0, c = 0.0; error = actual − prediction = +0.6, −0.5, −0.2]
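In code this comparison is an element-wise subtraction; using the slide's numbers:

```python
import numpy as np

actual    = np.array([0.9, 0.0, 0.0])  # one entry per class a, b, c
predicted = np.array([0.3, 0.5, 0.2])

error = actual - predicted
print(error)  # [ 0.6 -0.5 -0.2]
```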
8. The magnitude indicates the amount of change while the sign
indicates an increase or decrease in the weights
[Diagram: the error vector (+0.6, −0.5, −0.2) against the actual probabilities (0.9, 0.0, 0.0); the size of each error sets how much the weights change, its sign sets the direction]
9. The information is transmitted back through the network
[Diagram: the error signal flowing backwards from the output layer through the network]
10. Weights throughout the network are adjusted in order to reduce the loss in prediction
[Diagram: after the adjustment the predictions move to a = 0.6, b = 0.2, c = 0.0 and the errors shrink to +0.3, −0.2, 0.0]
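A toy version of such a weight update, using the slide's idea that the error's size sets the amount of change and its sign the direction. The error-times-input rule below is a deliberate simplification for illustration; full backpropagation applies the chain rule through every layer.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(784)                        # flattened input image
W = rng.normal(scale=0.05, size=(3, 784))  # current weights

actual = np.array([0.9, 0.0, 0.0])
learning_rate = 0.1

predicted = W @ x                        # simplified linear "prediction"
error = actual - predicted               # sign gives direction, magnitude gives amount
W += learning_rate * np.outer(error, x)  # nudge every weight accordingly
```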
11. In this manner, we keep training the network with multiple inputs until it is able to predict with high accuracy
[Diagram: the predictions improve to a = 0.7, b = 0.1, c = 0.0; errors +0.2, −0.1, 0.0]
12. [Diagram: after further training the prediction for 'a' reaches 0.9 and the errors fall to 0.0]
14. Here’s a straightforward dataset. Let’s build a neural network to
predict the outputs, given the inputs
Input   Output
0       0
1       6
2       12
3       18
4       24
27. Example
[Diagram: a one-neuron network computing y = x * w, with the weight initialized to w = 9]
We, as humans, can tell just by looking at the data that our weight should be 6. But how does the machine come to this conclusion?
28. Loss function
The loss function measures the error between the predicted output and the actual output:
loss = [(actual output) − (predicted output)]²
29. Loss function
Let's apply the loss function to the input value "2":

Input   Actual Output   W=3            W=6             W=9
2       12              6              12              18
Loss                    (12−6)² = 36   (12−12)² = 0    (12−18)² = 36
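A quick sketch reproducing this table (plain Python, nothing assumed beyond the slide's numbers):

```python
x, actual = 2, 12

for w in (3, 6, 9):
    predicted = x * w
    loss = (actual - predicted) ** 2
    print(f"W={w}: predicted={predicted}, loss={loss}")
# W=3: predicted=6, loss=36
# W=6: predicted=12, loss=0
# W=9: predicted=18, loss=36
```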
33. Gradient descent
A random point on this curve is chosen and the slope at this point is calculated
34. Gradient descent
Here the slope is positive: the loss grows as the weight grows, so the weight should be decreased
35. Gradient descent
This time the slope is negative. Hence, another point towards its right is chosen: a negative slope indicates the weight should be increased
36. Gradient descent
We continue checking slopes at various points in this manner
37. Gradient descent
Our aim is to reach a point where the slope is zero
A positive slope indicates the weight should be decreased
A negative slope indicates the weight should be increased
A zero slope indicates the appropriate weight
38. Gradient descent
[Graph: the loss curve with the chosen points converging on the minimum, where the slope is zero]
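To make this concrete, here is a minimal gradient descent sketch for the y = 6x example (the starting weight and learning rate are my assumptions): the loss for the pair (2, 12) is (12 − 2w)², and each step moves w against the slope d(loss)/dw.

```python
# Gradient descent on the loss for the training pair (x=2, y=12).
x, actual = 2, 12
w = 9.0               # assumed random starting weight
learning_rate = 0.05  # assumed step size

for step in range(20):
    predicted = x * w
    slope = -2 * x * (actual - predicted)  # d/dw of (actual - x*w)^2
    w -= learning_rate * slope             # move against the slope

print(round(w, 3))  # converges to 6.0
```

A positive slope makes the update negative and vice versa, which is exactly the rule stated on the slides above.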
42. Backpropagation
A random point on the graph gives a loss value of 36 with a positive slope
36 is quite a large number, which means our current weight needs to change by a large amount
The positive slope indicates that the weight must be decreased
43. Backpropagation
Similarly, another random point on the graph gives a loss value of 10 with a negative slope
10 is a small number, so the weight needs only a small adjustment
The negative slope indicates that the weight needs to be increased rather than decreased
45. Backpropagation
After multiple iterations of backpropagation, our weights are assigned the appropriate value
[Diagram: the trained network now computes y = x * 6]
At this point, our network is trained and can be used to make predictions
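With the trained weight, prediction is a single multiplication; for example, on inputs outside the table:

```python
w = 6  # the weight found above

for x in (5, 7, 10):
    print(x, "->", x * w)  # 30, 42, 60
```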
46. Backpropagation
Let's now get back to our first example and see where backpropagation and gradient descent fall into place
47. As mentioned earlier, our predicted output is compared against the actual output
[Diagram: predicted probabilities a = 0.3, b = 0.5, c = 0.2 against actual probabilities a = 1.0, b = 0.0, c = 0.0; error = actual − prediction = +0.7, −0.5, −0.2]
48. As mentioned earlier, our predicted output is compared against the actual output
1st iteration:
loss(a) = 0.7² = 0.49
loss(b) = 0.5² = 0.25
loss(c) = 0.2² = 0.04
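These per-class values are just the squared errors; a one-liner with the slide's numbers:

```python
import numpy as np

actual    = np.array([1.0, 0.0, 0.0])  # classes a, b, c
predicted = np.array([0.3, 0.5, 0.2])

losses = (actual - predicted) ** 2
for name, loss in zip("abc", losses):
    print(f"loss({name}) = {loss:.2f}")  # 0.49, 0.25, 0.04
```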
49. Weights throughout the network are adjusted in order to reduce the loss in prediction
[Diagram: the predictions move to a = 0.6, b = 0.2, c = 0.1; errors +0.4, −0.2, −0.1]
50. Weights throughout the network are adjusted in order to reduce the loss in prediction
2nd iteration:
loss(a) = 0.4² = 0.16
loss(b) = 0.2² = 0.04
loss(c) = 0.1² = 0.01
51. Weights throughout the network are adjusted in order to reduce the loss in prediction
[Diagram: the predictions move to a = 0.8, b = 0.1, c = 0.1; errors +0.2, −0.1, −0.1]
52. Weights throughout the network are adjusted in order to reduce the loss in prediction
3rd iteration:
loss(a) = 0.2² = 0.04
loss(b) = 0.1² = 0.01
loss(c) = 0.1² = 0.01
53. Let's focus on finding the minimum loss for our variable 'a'
55. And here is where gradient descent comes into the picture
56. Neural Network
Let's assume the graph below shows the loss of our prediction for 'a' against the weights contributing to it from the second-last layer
[Graph: loss vs weight; at the sampled weights w1, w2, w3 the loss values are roughly 0.51, 0.17, and 0.34]
57. Neural Network
Random points chosen on the graph are now backpropagated through the network in order to adjust the weights
[Diagram: the channel weights are updated and the predictions move to a = 0.8, b = 0.1, c = 0.1]
59. Neural Network
This process is repeated multiple times until the network provides accurate predictions
[Diagram: the predictions move to a = 1.0, b = 0.1, c = 0.1]
60. Neural Network
[Diagram: after further repetitions the predictions reach a = 1.0, b = 0.0, c = 0.0, matching the actual probabilities]
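Putting the whole story together, here is a compact end-to-end sketch. Everything in it is an illustrative assumption rather than the exact network from the slides: a single linear layer with a softmax output, trained on one image with the standard gradient step for softmax plus cross-entropy loss (used here instead of the slides' squared loss for simplicity).

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# One hypothetical flattened 28x28 image and its one-hot label ('a').
x = rng.random(784)
actual = np.array([1.0, 0.0, 0.0])

W = rng.normal(scale=0.01, size=(3, 784))  # random initial weights
learning_rate = 0.01

for step in range(200):
    predicted = softmax(W @ x)   # forward pass
    error = actual - predicted   # compare against actual probabilities
    # For softmax with cross-entropy loss, the gradient of the logits is
    # (predicted - actual), so this outer product is a standard gradient step.
    W += learning_rate * np.outer(error, x)

print(np.round(softmax(W @ x), 2))  # approaches [1. 0. 0.]
```

Each pass of the loop mirrors the slides: predict, compute the error against the actual probabilities, and send that error back to adjust the weights until the loss stops shrinking.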