APPLIED MACHINE LEARNING: FINALS
Course Code: CS501
DEEP NEURAL NETWORKS & COMPUTATIONAL GRAPHS
Name Puranam Revanth Kumar
(Research Scholar)
Roll No. 19STRCHH01004
Question
1. Deep Neural Networks & Computational Graphs
(a) Explain the Concept - derivatives, partial derivatives, optimization, training set,
activation functions etc.
(b) Give simple examples of Chain Rule then generalize - assume all activation func-
tions have partial derivatives.
(c) Demonstrate on simple example such as Sigmoid activation functions.
Contents

Question
1 Deep Neural Networks & Computational Graphs
  1.1 Explain the Concept - derivatives, partial derivatives, optimization, training set, activation functions etc.
  1.2 Give simple examples of Chain Rule then generalize - assume all activation functions have partial derivatives
    1.2.1 Forward Propagation
    1.2.2 Chain Rule in Back Propagation
  1.3 Demonstrate on simple example such as Sigmoid activation functions
Summary
List of Figures

1.1 Artificial neural network
1.2 A simple computational graph with two nodes
1.3 Illustration of chain rule in computational graphs: The products of node-specific partial derivatives along paths from weight w to output o are aggregated. The resulting value yields the derivative of output o with respect to weight w. Only two paths between input and output exist in this simplified example.
1.4 Plot of the function f(a) = 3a
1.5 Function f(a) with slope
1.6 Sigmoid activation function
1.7 Hyperbolic tangent
1.8 ReLU function
1.9 Leaky ReLU function
1.10 Neural network
1.11 Neural network with two hidden layers
1.12 Sigmoid function
1.13 DNN using sigmoid function
1. Deep Neural Networks & Computational Graphs
Deep learning is a technique that loosely mimics the human brain. Scientists and researchers asked whether a machine could be made to learn in a similar way, and that question led to the invention of the neural network. The simplest type of neural network is the perceptron. The perceptron had trouble learning many functions because of the concepts it relied on, but in the 1980s Geoffrey Hinton and his collaborators popularized the backpropagation algorithm [1]. As a result, architectures such as ANNs, CNNs, and RNNs became efficient enough that many companies now use them and have developed a wide range of applications.
An artificial neural network computes a function of the inputs by propagating the com-
puted values from the input neurons to the output neuron(s) and using the weights as inter-
mediate parameters. Learning occurs by changing the weights connecting the neurons [2, 3].
Just as external stimuli are needed for learning in biological organisms, the external stimulus
in artificial neural networks is provided by the training data containing examples of input-
output pairs of the function to be learned. For example, the training data might contain pixel
representations of images (input) and their annotated labels (e.g., cat, dog) as the output.
These training data pairs are fed into the neural network by using the input representations
to make predictions about the output labels.
Figure 1.1: Artificial neural network
Here, f1, f2, f3 are the input features.
• For multi-class classification, the output layer contains more than one node.
• For binary classification, a single output node is sufficient.
The training data provides feedback to the correctness of the weights in the neural network
depending on how well the predicted output (e.g., probability of cat) for a particular input
matches the annotated output label in the training data. One can view the errors made by the
neural network in the computation of a function as a kind of unpleasant feedback in a bio-
logical organism, leading to an adjustment in the synaptic strengths. Similarly, the weights
between neurons are adjusted in a neural network in response to prediction errors. The goal
of changing the weights is to modify the computed function to make the predictions more
correct in future iterations. Therefore, the weights are changed carefully in a mathematically
justified way so as to reduce the error in computation on that example. By successively ad-
justing the weights between neurons over many input-output pairs, the function computed by
the neural network is refined over time so that it provides more accurate predictions.
Computational Graphs
A neural network is a computational graph, in which a unit of computation is the neu-
ron. Neural networks are fundamentally more powerful than their building blocks because
the parameters of these models are learned jointly to create a highly optimized composition
function of these models [4]. Furthermore, the nonlinear activations between the different
layers add to the expressive power of the network. A multilayer network evaluates compo-
sitions of functions computed at individual nodes. A path of length 2 in the neural network
in which the function f(·) follows g(·) can be considered a composition function f(g(·)). Just
to provide an idea, let us look at a trivial computational graph with two nodes, in which the
sigmoid function is applied at each node to the input weight w. In such a case, the computed
function appears as follows:
$$f(g(w)) = \frac{1}{1 + \exp\left(-\dfrac{1}{1 + \exp(-w)}\right)}$$
Derivatives of such a composition are computed by an iterative dynamic-programming approach, and the corresponding update is really the chain rule of differential calculus. In order to understand how the chain rule works in a computational graph, we will discuss the two basic variants of the rule that one needs to keep in mind. The simplest version of the chain rule works for a straightforward composition of functions:
$$\frac{\partial f(g(w))}{\partial w} = \frac{\partial f(g(w))}{\partial g(w)} \cdot \frac{\partial g(w)}{\partial w}$$
Figure 1.2: A simple computational graph with two nodes
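As a quick sanity check of this two-node composition and of the chain rule above, the following sketch (a minimal illustration, not part of the original report; the helper names are my own) evaluates f(g(w)) = sigmoid(sigmoid(w)) and compares the chain-rule derivative with a numerical finite difference.

```python
import math

def sigmoid(x):
    """Logistic sigmoid 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def composed(w):
    """Two-node graph: the sigmoid is applied at each node, f(g(w)) = sigmoid(sigmoid(w))."""
    return sigmoid(sigmoid(w))

def composed_grad(w):
    """Chain rule: d f(g(w))/dw = f'(g(w)) * g'(w), using sigmoid'(x) = sigmoid(x)(1 - sigmoid(x))."""
    g = sigmoid(w)
    return sigmoid(g) * (1 - sigmoid(g)) * g * (1 - g)

w = 0.7
eps = 1e-6
numeric = (composed(w + eps) - composed(w - eps)) / (2 * eps)  # central finite difference
print(composed_grad(w), numeric)  # the two values should agree to several decimal places
```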
Consider a sequence of hidden units h1, h2, ..., hk followed by output o, with respect to
which the loss function L is computed. Furthermore, assume that the weight of the connec-
tion from hidden unit hr to hr+1 is w(hr,hr+1). Then, in the case that a single path exists from
h1 to o, one can derive the gradient of the loss function with respect to any of these edge
weights using the chain rule:
$$\frac{\partial L}{\partial w_{(h_{r-1},h_r)}} = \frac{\partial L}{\partial o} \cdot \left[\frac{\partial o}{\partial h_k} \prod_{i=r}^{k-1} \frac{\partial h_{i+1}}{\partial h_i}\right] \cdot \frac{\partial h_r}{\partial w_{(h_{r-1},h_r)}} \qquad \forall\, r \in 1 \ldots k$$
Figure 1.3: Illustration of chain rule in computational graphs: The products of node-specific
partial derivatives along paths from weight w to output o are aggregated. The resulting value
yields the derivative of output o with respect to weight w. Only two paths between input and
output exist in this simplified example.
$$\frac{\partial o}{\partial w} = \frac{\partial o}{\partial p}\frac{\partial p}{\partial w} + \frac{\partial o}{\partial q}\frac{\partial q}{\partial w} \qquad \text{[Multivariable Chain Rule]}$$
$$= \frac{\partial o}{\partial p}\frac{\partial p}{\partial y}\frac{\partial y}{\partial w} + \frac{\partial o}{\partial q}\frac{\partial q}{\partial z}\frac{\partial z}{\partial w} \qquad \text{[Univariate Chain Rule]}$$
$$= \underbrace{\frac{\partial K(p,q)}{\partial p}\, g'(y)\, f'(w)}_{\text{First path}} + \underbrace{\frac{\partial K(p,q)}{\partial q}\, h'(z)\, f'(w)}_{\text{Second path}}$$
1.1 Explain the Concept - derivatives, partial derivatives,
optimization, training set, activation functions etc.
(a) Derivatives: The derivative of a function of a single variable at a chosen input value,
when it exists, is the slope of the tangent line to the graph of the function at that point [5].
Example: Consider the function f(a) = 3a, which is just a straight line.
Figure 1.4: plot of the function f(a) = 3a
Let a = 2. Then f(a) = 3a = 6. Now bump a up a little, to 2.001; this 0.001 difference is too small to show on the plot.
Giving a that little nudge to the right, f(a) becomes three times 2.001, i.e. 6.003, which we also plot.
Figure 1.5: function f(a) with slope
Looking at the little triangle formed, the slope, i.e. the derivative, of f(a) at a = 2 is 3. The term derivative essentially means slope; formally, the slope is defined as height over width, which here equals 3.
Therefore,
$$\frac{d f(a)}{d a} = \frac{d}{d a} f(a) = 3.$$
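To make the "nudge" argument concrete, a tiny numerical check (illustrative only, not from the report) reproduces the slope of 3:

```python
def f(a):
    return 3 * a           # the straight line f(a) = 3a

a = 2.0
da = 0.001                 # the small nudge used in the text
slope = (f(a + da) - f(a)) / da   # height / width of the little triangle
print(slope)               # 3.0, up to floating-point rounding
```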
(b) Partial derivatives: Finding the gradient is essentially finding the derivative of the function. Because there are many independent variables that we can tweak (all the weights and biases), we have to find the derivative with respect to each variable. This is known as a partial derivative, written with the symbol ∂.
Computing the partial derivative of simple functions is easy: simply treat every other variable in the equation as a constant and take the usual scalar derivative.
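A small sketch (using an example function of my own, not one from the report) shows the "treat every other variable as a constant" recipe and checks it against finite differences:

```python
def f(x, y):
    return x**2 * y + y            # example function of two variables

def df_dx(x, y):
    return 2 * x * y               # differentiate in x, treating y as a constant

def df_dy(x, y):
    return x**2 + 1                # differentiate in y, treating x as a constant

x, y, eps = 1.5, -2.0, 1e-6
print(df_dx(x, y), (f(x + eps, y) - f(x - eps, y)) / (2 * eps))  # analytic vs. numeric
print(df_dy(x, y), (f(x, y + eps) - f(x, y - eps)) / (2 * eps))
```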
(c) Optimization: Optimization means choosing inputs that result in the best possible outputs. Optimizers are algorithms or methods used to change the attributes of a neural network, such as its weights and learning rate, in order to reduce the loss.
How the weights or learning rate of the neural network should be changed to reduce the loss is defined by the optimizer you use.
Example (gradient descent update for a parameter θ1, where J denotes the cost being minimized):
$$\theta_1 := \theta_1 - \alpha \frac{\partial J(\theta_1)}{\partial \theta_1}$$
If α is too small, gradient descent can be slow. If α is too large, gradient descent can overshoot the minimum; it may fail to converge or even diverge.
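The effect of the learning rate is easy to see on a one-parameter toy cost such as J(θ) = (θ − 3)²; the sketch below is an assumed example, not something from the report.

```python
def grad_J(theta):
    """Gradient of the toy cost J(theta) = (theta - 3)**2, whose minimum is at theta = 3."""
    return 2 * (theta - 3)

def gradient_descent(alpha, steps=25, theta=0.0):
    for _ in range(steps):
        theta = theta - alpha * grad_J(theta)   # theta := theta - alpha * dJ/dtheta
    return theta

print(gradient_descent(alpha=0.01))   # too small: progress is slow, still far from 3
print(gradient_descent(alpha=0.1))    # moderate: converges close to the minimum at 3
print(gradient_descent(alpha=1.1))    # too large: the iterates overshoot and diverge
```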
(d) Training set: A training set is a set of pairs of input patterns with corresponding desired output patterns. Each pair represents how the network is supposed to respond to a particular input.
There are two approaches to training: supervised and unsupervised. Supervised training involves a mechanism for providing the network with the desired output, either by manually "grading" the network's performance or by supplying the desired outputs alongside the inputs. Unsupervised training is where the network has to make sense of the inputs without outside help.
(e) Activation function: The activation function is a mathematical “gate” in between the
input feeding the current neuron and its output going to the next layer. It can be as simple as
a step function that turns the neuron output on and off, depending on a rule or threshold [7].
Sigmoid / Logistic: maps any real input smoothly into the range (0, 1), which makes the output convenient to interpret as a probability.
Figure 1.6: Sigmoid Activation function
TanH / Hyperbolic Tangent: Zero centered—making it easier to model inputs that have
strongly negative, neutral, and strongly positive values. Otherwise like the Sigmoid function.
Figure 1.7: Hyperbolic Tangent
ReLU (Rectified Linear Unit): computationally efficient, allowing the network to converge very quickly. Non-linear: although it looks like a linear function, ReLU has a derivative and allows for backpropagation.
Figure 1.8: ReLU function
Leaky ReLU: prevents the dying-ReLU problem. This variation of ReLU has a small positive slope in the negative region, so it enables backpropagation even for negative input values; otherwise it behaves like ReLU.
Figure 1.9: Leaky ReLU function
Softmax: able to handle multiple classes, whereas the other activation functions handle only one class. It normalizes the output for each class to lie between 0 and 1 and divides by their sum, giving the probability of the input belonging to a specific class.
Useful for output neurons: typically Softmax is used only in the output layer, for neural networks that need to classify inputs into multiple categories.
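For reference, the activation functions discussed above can be written in a few lines of NumPy. This is a minimal sketch; the leaky-ReLU slope of 0.01 is a common default and is my assumption, not a value fixed by the report.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))        # squashes into (0, 1)

def tanh(x):
    return np.tanh(x)                      # zero-centred, range (-1, 1)

def relu(x):
    return np.maximum(0.0, x)              # zero for negative inputs, identity otherwise

def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)   # small slope keeps gradients alive for x < 0

def softmax(x):
    e = np.exp(x - np.max(x))              # subtract the max for numerical stability
    return e / e.sum()                     # outputs lie in (0, 1) and sum to 1

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z), relu(z), leaky_relu(z), softmax(z))
```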
1.2 Give simple examples of Chain Rule then generalize -
assume all activation functions have partial derivatives
Suppose u is a differentiable function of x1, x2, ..., xn and each xj is a differentiable function of t1, t2, ..., tn. Then u is a function of t1, t2, ..., tn, and the partial derivative of u with respect to t1 is [6]:
$$\frac{\partial u}{\partial t_1} = \frac{\partial u}{\partial x_1}\frac{\partial x_1}{\partial t_1} + \frac{\partial u}{\partial x_2}\frac{\partial x_2}{\partial t_1} + \cdots + \frac{\partial u}{\partial x_n}\frac{\partial x_n}{\partial t_1} \qquad (1)$$
The same formula holds for each of the other variables t2, ..., tn.
1.2.1 Forward Propagation
In this phase, the inputs for a training instance are fed into the neural network. This results
in a forward cascade of computations across the layers, using the current set of weights. The
final predicted output can be compared to that of the training instance and the derivative of
the loss function with respect to the output is computed. The derivative of this loss now
needs to be computed with respect to the weights in all layers in the backwards phase.
Let the inputs be $x_1, x_2, x_3$. These inputs are passed to a hidden neuron, where two important operations take place:
Figure 1.10: Neural Network
Step 1: Compute the weighted sum of the inputs:
$$\sum_{i=1}^{n} w_i x_i, \qquad y = w_1 x_1 + w_2 x_2 + w_3 x_3$$
Step 2: Before the activation function is applied, the bias is added to this sum; the activated value is then scaled by the weight $w_4$ on the connection to the next layer:
$$y = w_1 x_1 + w_2 x_2 + w_3 x_3 + b_i$$
$$z = \text{Act}(y)$$
$$z = z \times w_4$$
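A minimal NumPy sketch of these two steps for a single hidden neuron with a sigmoid activation; the specific weight and bias values are hypothetical, chosen only for illustration.

```python
import numpy as np

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

x = np.array([0.5, -1.0, 2.0])      # inputs x1, x2, x3
w = np.array([0.1, 0.4, -0.3])      # weights w1, w2, w3 (illustrative values)
b = 0.2                             # bias added before the activation
w4 = 0.7                            # weight on the connection to the next layer

y = np.dot(w, x) + b                # Step 1 plus bias: y = w1*x1 + w2*x2 + w3*x3 + b
z = sigmoid(y)                      # Step 2: apply the activation function
out_to_next_layer = z * w4          # value passed forward, z * w4
print(y, z, out_to_next_layer)
```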
1.2.2 Chain Rule in Back Propagation
The main goal of the backward phase is to learn the gradient of the loss function with re-
spect to the different weights by using the chain rule of differential calculus. These gradients
are used to update the weights. Since these gradients are learned in the backward direction,
starting from the output node, this learning process is referred to as the backward phase.
Suppose the inputs are x1, x2, x3, x4, connected to two hidden layers. Hidden layer one contains 3 neurons and hidden layer two contains 2 neurons. A convenient notation for the weights is $w^{1}_{11}$ for the 1st hidden layer and $w^{2}_{11}$ for the 2nd hidden layer, where the superscript indexes the layer and the subscripts index the connected neurons [4].
Figure 1.11: Neural network with two hidden layers
To reduce the loss value, backpropagation needs to be used. During backpropagation these weights get updated.
For a single record, the difference can be measured by the loss function:
$$\text{Loss} = (y - \hat{y})^2$$
For multiple records, a cost function needs to be defined:
$$\text{Cost} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
where
$w^{1}_{11}, w^{2}_{11}, w^{3}_{11}$ are weights,
HL1, HL2 are the hidden layers,
$O_{11}, O_{21}, O_{31}$ are the outputs of the hidden-layer and output neurons.
• Let us update the weights:
$$w^{3}_{11,\,\text{new}} = w^{3}_{11,\,\text{old}} - \alpha \frac{\partial L}{\partial w^{3}_{11}}$$
• $w^{3}_{11}$ needs to be updated during backpropagation: the forward pass gives us $\hat{y}$ and hence a loss value, and when we backpropagate we update the weights.
• Now let us see how to find the derivative $\frac{\partial L}{\partial w^{3}_{11}}$. It basically indicates a slope, and it is obtained through the chain rule.
• The weight $w^{3}_{11}$ impacts the output $O_{31}$. Since it affects the loss only through $O_{31}$, the derivative can be written as
$$\frac{\partial L}{\partial w^{3}_{11}} = \frac{\partial L}{\partial O_{31}} \times \frac{\partial O_{31}}{\partial w^{3}_{11}},$$
which is basically the chain rule.
• Similarly, for the derivative with respect to $w^{3}_{21}$:
$$\frac{\partial L}{\partial w^{3}_{21}} = \frac{\partial L}{\partial O_{31}} \times \frac{\partial O_{31}}{\partial w^{3}_{21}}$$
• To find the derivative with respect to $w^{2}_{11}$, the chain becomes one step longer:
$$\frac{\partial L}{\partial w^{2}_{11}} = \frac{\partial L}{\partial O_{31}} \times \frac{\partial O_{31}}{\partial O_{21}} \times \frac{\partial O_{21}}{\partial w^{2}_{11}}$$
• For a weight in the first hidden layer, such as $w^{1}_{11}$, there are two paths to the output because both neurons of the second hidden layer ($O_{21}$ and $O_{22}$) are affected. After computing the derivative along one path, the derivative along the other path is added:
$$\frac{\partial L}{\partial w^{1}_{11}} = \left(\frac{\partial L}{\partial O_{31}} \times \frac{\partial O_{31}}{\partial O_{21}} \times \frac{\partial O_{21}}{\partial O_{11}} \times \frac{\partial O_{11}}{\partial w^{1}_{11}}\right) + \left(\frac{\partial L}{\partial O_{31}} \times \frac{\partial O_{31}}{\partial O_{22}} \times \frac{\partial O_{22}}{\partial O_{11}} \times \frac{\partial O_{11}}{\partial w^{1}_{11}}\right)$$
• As these derivatives are used, the weights get updated, $\hat{y}$ changes, and the process repeats until the loss approaches the global minimum.
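As a check on these chain-rule expressions, the sketch below works through a deliberately simplified chain of three sigmoid neurons (x → O11 → O21 → O31), not the full 4-3-2 network of Figure 1.11; all weight values are hypothetical. It computes ∂L/∂w for each weight by the chain rule and compares it with a finite-difference estimate.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def forward(x, w1, w2, w3):
    """Chain of three sigmoid neurons: x -> O11 -> O21 -> O31."""
    o11 = sigmoid(w1 * x)
    o21 = sigmoid(w2 * o11)
    o31 = sigmoid(w3 * o21)
    return o11, o21, o31

def grads(x, y, w1, w2, w3):
    """Backward pass: chain rule applied layer by layer, starting from the loss (y - O31)^2."""
    o11, o21, o31 = forward(x, w1, w2, w3)
    dL_do31 = -2 * (y - o31)
    do31_dw3 = o31 * (1 - o31) * o21
    do31_do21 = o31 * (1 - o31) * w3
    do21_dw2 = o21 * (1 - o21) * o11
    do21_do11 = o21 * (1 - o21) * w2
    do11_dw1 = o11 * (1 - o11) * x
    dL_dw3 = dL_do31 * do31_dw3
    dL_dw2 = dL_do31 * do31_do21 * do21_dw2
    dL_dw1 = dL_do31 * do31_do21 * do21_do11 * do11_dw1
    return dL_dw1, dL_dw2, dL_dw3

def loss(x, y, w1, w2, w3):
    return (y - forward(x, w1, w2, w3)[2]) ** 2

x, y, w1, w2, w3, eps = 1.0, 0.0, 0.5, -0.3, 0.8, 1e-6
analytic = grads(x, y, w1, w2, w3)
numeric = ((loss(x, y, w1 + eps, w2, w3) - loss(x, y, w1 - eps, w2, w3)) / (2 * eps),
           (loss(x, y, w1, w2 + eps, w3) - loss(x, y, w1, w2 - eps, w3)) / (2 * eps),
           (loss(x, y, w1, w2, w3 + eps) - loss(x, y, w1, w2, w3 - eps)) / (2 * eps))
print(analytic)
print(numeric)   # should match the chain-rule gradients to several decimal places
```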
1.3 Demonstrate on simple example such as Sigmoid activation functions
The activation function is a mathematical “gate” in between the input feeding the current
neuron and its output going to the next layer. It can be as simple as a step function that turns
the neuron output on and off depending on a rule or threshold [3].
$$\sigma(y) = \frac{1}{1 + e^{-y}}, \qquad y = \sum_{i=1}^{n} w_i x_i + b_i$$
The inputs can then be classified by comparing the sigmoid output against a threshold that we decide on.
Figure 1.12: Sigmoid function
Here 0.5 is the threshold: any input whose output falls above the threshold is classified into one cluster, and any input below it is classified into the other. The sigmoid transforms the value to lie between 0 and 1; if the output is below 0.5 it is treated as 0, otherwise as 1.
Figure 1.13: DNN using Sigmoid function
A nice property of the sigmoid is
$$\frac{d\sigma(x)}{dx} = \sigma(x)\,(1 - \sigma(x)),$$
where $w_0, w_1, \ldots, w_n$ denote the weights and $x_1, x_2, \ldots, x_n$ the inputs.
In the diagram above, the activation function, i.e. the sigmoid, is applied to the weighted summation. Differentiating the sigmoid with respect to $x$:
$$\frac{d\sigma(x)}{dx} = \frac{1}{(1+e^{-x})^{2}} \cdot e^{-x} = \frac{e^{-x}}{(1+e^{-x})^{2}} = \underbrace{\frac{1}{1+e^{-x}}}_{\sigma(x)} \cdot \underbrace{\frac{e^{-x}}{1+e^{-x}}}_{1-\sigma(x)}$$
$$\therefore\; \frac{d\sigma(x)}{dx} = \sigma(x)\,(1 - \sigma(x)).$$
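This "nice property" is easy to confirm numerically; the small sketch below (illustrative only) compares σ(x)(1 − σ(x)) with a finite-difference derivative of σ at a few points.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

for x in (-2.0, 0.0, 1.5):
    eps = 1e-6
    numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)   # finite-difference slope
    analytic = sigmoid(x) * (1 - sigmoid(x))                      # sigma(x) * (1 - sigma(x))
    print(x, analytic, numeric)   # the two columns should agree closely
```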
Summary
Although a neural network can be viewed as a simulation of the learning process in living
organisms, a more direct understanding of neural networks is as computational graphs. Such
computational graphs perform recursive composition of simpler functions in order to learn
more complex functions. Since these computational graphs are parameterized, the problem
generally boils down to learning the parameters of the graph in order to optimize a loss
function. The simplest types of neural networks are often basic machine learning models
like least-squares regression. The real power of neural networks is unleashed by using more
complex combinations of the underlying functions. The parameters of such networks are
learned by using a dynamic programming method, referred to as backpropagation. There
are several challenges associated with learning neural network models, such as overfitting
and training instability. In recent years, numerous algorithmic advancements have reduced
these problems. Lastly, the mathematical intuition behind forward and backward propagation has been derived in order to show how the training of a dataset proceeds internally and how the error is minimized using backpropagation. The design of deep learning methods in specific domains such as text and images requires carefully crafted architectures.
Bibliography
[1] https://en.wikipedia.org/wiki/Geoffrey_Hinton
[2] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.
[3] https://www.youtube.com/watch?v=DKSZHN7jftIt=4s
[4] D. Rumelhart, G. Hinton, and R. Williams. Learning representations by back-propagating errors. Nature, 323 (6088), pp. 533–536, 1986.
[5] https://www.coursera.org
[6] https://towardsdatascience.com/understanding-backpropagation-algorithm-7bb3aa2f95fd
[7] https://missinglink.ai/