This presentation on backpropagation and gradient descent covers the basics of how backpropagation and gradient descent play a role in training neural networks, using an example of recognizing handwritten letters with a neural network. After the network makes its initial predictions, you will see how it is trained with backpropagation until it predicts with high accuracy. Backpropagation is the process of updating the parameters of a network to reduce the error in its predictions. You will also learn how to calculate a loss function to measure the model's error. Finally, you will see, with the help of a graph, how to find the minimum of a function using gradient descent. Now, let's get started with backpropagation and gradient descent in neural networks.
Why Deep Learning?
TensorFlow is one of the most popular software platforms used for deep learning and contains powerful tools to help you build and implement artificial neural networks.
Advancements in deep learning are being seen in smartphone applications, creating efficiencies in the power grid, driving advancements in healthcare, improving agricultural yields, and helping us find solutions to climate change. With this TensorFlow course, you'll build expertise in deep learning models, learn to operate TensorFlow to manage neural networks, and interpret the results.
And according to payscale.com, the median salary for engineers with deep learning skills tops $120,000 per year.
You can gain in-depth knowledge of Deep Learning by taking our Deep Learning certification training course. With Simplilearn’s Deep Learning course, you will prepare for a career as a Deep Learning engineer as you master concepts and techniques including supervised and unsupervised learning, mathematical and heuristic aspects, and hands-on modeling to develop algorithms. Those who complete the course will be able to:
1. Understand the concepts of TensorFlow, its main functions, operations and the execution pipeline
2. Implement deep learning algorithms, understand neural networks and traverse the layers of data abstraction which will empower you to understand data like never before
3. Master and comprehend advanced topics such as convolutional neural networks, recurrent neural networks, training deep networks and high-level interfaces
4. Build deep learning models in TensorFlow and interpret the results
5. Understand the language and fundamental concepts of artificial neural networks
6. Troubleshoot and improve deep learning models
7. Build your own deep learning project
8. Differentiate between machine learning, deep learning, and artificial intelligence
Learn more at https://www.simplilearn.com/deep-learning-course-with-tensorflow-training
3. The handwritten letters are represented as images of 28×28 pixels
[Diagram: a 28×28 input image feeding a neural network with inputs x1, x2, …, xn and outputs y1, y2, y3 corresponding to the classes a, b, c]
4. The 784 pixels are fed as input to the first layer of our neural
network
[Diagram: the 28×28 = 784 pixel values feed a first layer of 784 neurons; inputs x1, x2, …, xn, outputs y1, y2, y3 for the classes a, b, c]
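As a minimal sketch of this preprocessing step (NumPy and the random image are my assumptions, not part of the slides), a 28×28 image flattens into the 784 input values:

```python
import numpy as np

# A hypothetical 28x28 grayscale image (values in [0, 1]).
image = np.random.rand(28, 28)

# Flatten it into the 784 values fed to the first layer.
inputs = image.reshape(-1)
print(inputs.shape)  # (784,)
```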
5. The initial prediction is made using the random weights assigned
to each channel
[Diagram: random weights such as 0.2, 0.8, 1.2, 0.3, … assigned to the channels; the network outputs the predicted probabilities a = 0.3, b = 0.5, c = 0.2]
6. Our network predicts the input to be ‘b’ with a probability of 0.5
[Diagram: same network; predicted probabilities a = 0.3, b = 0.5, c = 0.2, so 'b' is the predicted class]
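A hedged sketch of this initial forward pass: the single linear layer and softmax output below are illustrative assumptions, chosen so that random weights yield class probabilities comparable to the slide's 0.3 / 0.5 / 0.2.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # Turn raw scores into probabilities that sum to 1.
    e = np.exp(z - z.max())
    return e / e.sum()

x = rng.random(784)                        # flattened 28x28 image
W = rng.normal(scale=0.05, size=(3, 784))  # random weights, one row per class
probs = softmax(W @ x)

classes = ["a", "b", "c"]
print(classes[int(probs.argmax())], np.round(probs, 2))
```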
7. The predicted probabilities are compared against the actual
probabilities and the error is calculated
[Diagram: predicted probabilities a = 0.3, b = 0.5, c = 0.2 compared against the actual probabilities a = 0.9, b = 0.0, c = 0.0; error = actual − prediction = +0.6, −0.5, −0.2]
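In code this comparison is an element-wise subtraction; using the slide's numbers:

```python
import numpy as np

actual    = np.array([0.9, 0.0, 0.0])  # one entry per class a, b, c
predicted = np.array([0.3, 0.5, 0.2])

error = actual - predicted
print(error)  # [ 0.6 -0.5 -0.2]
```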
8. The magnitude indicates the amount of change while the sign
indicates an increase or decrease in the weights
[Diagram: the error vector (+0.6, −0.5, −0.2) against the actual probabilities (0.9, 0.0, 0.0); the size of each error sets how much the weights change, its sign sets the direction]
9. The information is transmitted back through the network
[Diagram: the error signal flowing backwards from the output layer through the network]
10. Weights throughout the network are adjusted in order to reduce the loss in prediction
[Diagram: after the adjustment the predictions move to a = 0.6, b = 0.2, c = 0.0 and the errors shrink to +0.3, −0.2, 0.0]
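A toy version of such a weight update, using the slide's idea that the error's size sets the amount of change and its sign the direction. The error-times-input rule below is a deliberate simplification for illustration; full backpropagation applies the chain rule through every layer.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(784)                        # flattened input image
W = rng.normal(scale=0.05, size=(3, 784))  # current weights

actual = np.array([0.9, 0.0, 0.0])
learning_rate = 0.1

predicted = W @ x                        # simplified linear "prediction"
error = actual - predicted               # sign gives direction, magnitude gives amount
W += learning_rate * np.outer(error, x)  # nudge every weight accordingly
```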
11. In this manner, we keep training the network with multiple inputs until it is able to predict with high accuracy
[Diagram: the predictions improve to a = 0.7, b = 0.1, c = 0.0; errors +0.2, −0.1, 0.0]
12. [Diagram: after further training the prediction for 'a' reaches 0.9 and the errors fall to 0.0]
14. Here’s a straightforward dataset. Let’s build a neural network to
predict the outputs, given the inputs
Input   Output
0       0
1       6
2       12
3       18
4       24
27. Example
[Diagram: a one-neuron network computing y = x * w, with the weight initialized to w = 9]
We, as humans, can tell just by looking at the data that our weight should be 6. But how does the machine come to this conclusion?
28. Loss function
The loss function measures the error between the predicted output and the actual output:
loss = [(actual output) − (predicted output)]²
29. Loss function
Let's apply the loss function to the input value "2":

Input   Actual Output   W=3            W=6             W=9
2       12              6              12              18
Loss                    (12−6)² = 36   (12−12)² = 0    (12−18)² = 36
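A quick sketch reproducing this table (plain Python, nothing assumed beyond the slide's numbers):

```python
x, actual = 2, 12

for w in (3, 6, 9):
    predicted = x * w
    loss = (actual - predicted) ** 2
    print(f"W={w}: predicted={predicted}, loss={loss}")
# W=3: predicted=6, loss=36
# W=6: predicted=12, loss=0
# W=9: predicted=18, loss=36
```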
33. Gradient descent
A random point on this curve is chosen and the slope at this point is calculated
34. Gradient descent
Here the slope is positive: the loss grows as the weight grows, so the weight should be decreased
35. Gradient descent
This time the slope is negative. Hence, another point towards its right is chosen: a negative slope indicates the weight should be increased
36. Gradient descent
We continue checking slopes at various points in this manner
37. Gradient descent
Our aim is to reach a point where the slope is zero
A positive slope indicates the weight should be decreased
A negative slope indicates the weight should be increased
A zero slope indicates the appropriate weight
38. Gradient descent
[Graph: the loss curve with the chosen points converging on the minimum, where the slope is zero]
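To make this concrete, here is a minimal gradient descent sketch for the y = 6x example (the starting weight and learning rate are my assumptions): the loss for the pair (2, 12) is (12 − 2w)², and each step moves w against the slope d(loss)/dw.

```python
# Gradient descent on the loss for the training pair (x=2, y=12).
x, actual = 2, 12
w = 9.0               # assumed random starting weight
learning_rate = 0.05  # assumed step size

for step in range(20):
    predicted = x * w
    slope = -2 * x * (actual - predicted)  # d/dw of (actual - x*w)^2
    w -= learning_rate * slope             # move against the slope

print(round(w, 3))  # converges to 6.0
```

A positive slope makes the update negative and vice versa, which is exactly the rule stated on the slides above.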
42. Backpropagation
A random point on the graph gives a loss value of 36 with a positive slope
36 is quite a large number, which means our current weight needs to change by a large amount
The positive slope indicates that the weight must be decreased
43. Backpropagation
Similarly, another random point on the graph gives a loss value of 10 with a negative slope
10 is a small number, so the weight needs only a small adjustment
The negative slope indicates that the weight needs to be increased rather than decreased
45. Backpropagation
After multiple iterations of backpropagation, our weights are assigned the appropriate value
[Diagram: the trained network now computes y = x * 6]
At this point, our network is trained and can be used to make predictions
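With the trained weight, prediction is a single multiplication; for example, on inputs outside the table:

```python
w = 6  # the weight found above

for x in (5, 7, 10):
    print(x, "->", x * w)  # 30, 42, 60
```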
46. Backpropagation
Let's now get back to our first example and see where backpropagation and gradient descent fall into place
47. As mentioned earlier, our predicted output is compared against the actual output
[Diagram: predicted probabilities a = 0.3, b = 0.5, c = 0.2 against actual probabilities a = 1.0, b = 0.0, c = 0.0; error = actual − prediction = +0.7, −0.5, −0.2]
48. As mentioned earlier, our predicted output is compared against the actual output
1st iteration:
loss(a) = 0.7² = 0.49
loss(b) = 0.5² = 0.25
loss(c) = 0.2² = 0.04
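These per-class values are just the squared errors; a one-liner with the slide's numbers:

```python
import numpy as np

actual    = np.array([1.0, 0.0, 0.0])  # classes a, b, c
predicted = np.array([0.3, 0.5, 0.2])

losses = (actual - predicted) ** 2
for name, loss in zip("abc", losses):
    print(f"loss({name}) = {loss:.2f}")  # 0.49, 0.25, 0.04
```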
49. Weights throughout the network are adjusted in order to reduce the loss in prediction
[Diagram: the predictions move to a = 0.6, b = 0.2, c = 0.1; errors +0.4, −0.2, −0.1]
50. Weights throughout the network are adjusted in order to reduce the loss in prediction
2nd iteration:
loss(a) = 0.4² = 0.16
loss(b) = 0.2² = 0.04
loss(c) = 0.1² = 0.01
51. Weights throughout the network are adjusted in order to reduce the loss in prediction
[Diagram: the predictions move to a = 0.8, b = 0.1, c = 0.1; errors +0.2, −0.1, −0.1]
52. Weights throughout the network are adjusted in order to reduce the loss in prediction
3rd iteration:
loss(a) = 0.2² = 0.04
loss(b) = 0.1² = 0.01
loss(c) = 0.1² = 0.01
53. Let's focus on finding the minimum loss for our variable 'a'
55. And here is where gradient descent comes into the picture
56. Neural Network
Let's assume the graph below shows the loss of our prediction for 'a' against the weights contributing to it from the second-last layer
[Graph: loss vs weight; at the sampled weights w1, w2, w3 the loss values are roughly 0.51, 0.17, and 0.34]
57. Neural Network
Random points chosen on the graph are now backpropagated through the network in order to adjust the weights
[Diagram: the channel weights are updated and the predictions move to a = 0.8, b = 0.1, c = 0.1]
59. Neural Network
This process is repeated multiple times until the network provides accurate predictions
[Diagram: the predictions move to a = 1.0, b = 0.1, c = 0.1]
60. Neural Network
[Diagram: after further repetitions the predictions reach a = 1.0, b = 0.0, c = 0.0, matching the actual probabilities]
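Putting the whole story together, here is a compact end-to-end sketch. Everything in it is an illustrative assumption rather than the exact network from the slides: a single linear layer with a softmax output, trained on one image with the standard gradient step for softmax plus cross-entropy loss (used here instead of the slides' squared loss for simplicity).

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# One hypothetical flattened 28x28 image and its one-hot label ('a').
x = rng.random(784)
actual = np.array([1.0, 0.0, 0.0])

W = rng.normal(scale=0.01, size=(3, 784))  # random initial weights
learning_rate = 0.01

for step in range(200):
    predicted = softmax(W @ x)   # forward pass
    error = actual - predicted   # compare against actual probabilities
    # For softmax with cross-entropy loss, the gradient of the logits is
    # (predicted - actual), so this outer product is a standard gradient step.
    W += learning_rate * np.outer(error, x)

print(np.round(softmax(W @ x), 2))  # approaches [1. 0. 0.]
```

Each pass of the loop mirrors the slides: predict, compute the error against the actual probabilities, and send that error back to adjust the weights until the loss stops shrinking.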