Deep Neural Networks & Computational Graphs
By
P Revanth Kumar
Research Scholar,
IFHE Hyderabad.
Objective
• To improve the performance of a deep learning model. The goal is to
reduce the loss (optimization) function, which is defined differently for
classification and regression problems.
Agenda
• Deep Learning
• How Neural Networks Work
• Activation function
• Neural Network with Back Propagation
• What is the Chain Rule
• Chain Rule in Back Propagation
• Computational Graphs
Deep Learning
• Deep learning is a technique that essentially mimics the human brain.
• Scientists and researchers asked whether a machine could learn in the
same way; that is where the deep learning concept came from, leading to
the invention of the neural network.
• The first and simplest type of neural network is called the perceptron.
• There were problems with the perceptron: it was not able to learn
properly because of the concepts it applied.
• Later, in the 1980s, Geoffrey Hinton popularized the backpropagation
algorithm. ANNs, CNNs, and RNNs then became efficient enough that many
companies use them and have built a lot of applications on top of them.
• 𝑓1, 𝑓2, 𝑓3 are the input features.
• This resembles an ANN.
• For multi-class classification, more than one output node is specified.
• For binary classification, only one output node needs to be specified.
How Neural Networks Work
• Features 𝑥1, 𝑥2, 𝑥3 form the input layer. Suppose we want to perform
binary classification.
• Let us understand what kind of processing the hidden layer does and why
the weights 𝑤1, 𝑤2, 𝑤3 matter.
• As soon as the inputs are given, they are multiplied by their respective
weights, and the products are in turn the inputs to the hidden layer.
• Then the activation function is triggered.
• When 𝑤1, 𝑤2, 𝑤3 are assigned, the weighted inputs pass to the hidden
neuron, where two operations usually happen.
• Step 1: The weighted sum of the inputs
y = Σ 𝑤𝑖 𝑥𝑖 (sum over i = 1 … n) = 𝑤1 𝑥1 + 𝑤2 𝑥2 + 𝑤3 𝑥3
• Step 2: The bias is added to the summation before the activation
function is applied
y = 𝑤1 𝑥1 + 𝑤2 𝑥2 + 𝑤3 𝑥3 + 𝑏𝑖 (1)
z = Act(y), using the sigmoid function
z = z × 𝑤4
• If it is a binary classification problem, a 0 or 1 will be obtained.
• This is an example of forward propagation.
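As a rough illustration, here is a minimal NumPy sketch of this forward pass. The input values, weights 𝑤1…𝑤4, and bias are arbitrary assumptions (not from the slides), and the final sigmoid-plus-threshold step is one common way to turn the result into a 0/1 prediction:

```python
import numpy as np

def sigmoid(y):
    # Squashes any real value into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-y))

# Hypothetical inputs and parameters (illustrative only).
x = np.array([2.0, 4.0, 8.0])    # x1, x2, x3
w = np.array([0.5, -0.2, 0.1])   # w1, w2, w3
b = 0.3                          # bias
w4 = 1.5                         # weight on the hidden neuron's output

y = np.dot(w, x) + b             # Step 1 + bias: weighted sum of inputs
z = sigmoid(y)                   # Step 2: activation at the hidden neuron
out = z * w4                     # scale by w4 on the way to the output
pred = 1 if sigmoid(out) >= 0.5 else 0   # assumed 0/1 decision rule
print(y, z, out, pred)
```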
Activation function
• The activation function is a mathematical “gate” in between the input
feeding the current neuron and its output going to the next layer. It can be
as simple as a step function that turns the neuron output on and off
depending on a rule or threshold.
• Sigmoid function: σ(y) = 1 / (1 + e^(−y)), where
y = Σ 𝑊𝑖 𝑋𝑖 + 𝑏𝑖 (sum over i = 1 … n)
• This transforms the value into the range (0, 1). If the output is < 0.5
it is treated as 0. Here 0.5 is the threshold.
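To make the threshold behavior concrete, a small sketch (the sample inputs are arbitrary):

```python
import numpy as np

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

for y in [-2.0, 0.0, 0.75]:
    z = sigmoid(y)
    label = 0 if z < 0.5 else 1   # values below the 0.5 threshold map to class 0
    print(f"y={y:5.2f}  sigmoid={z:.3f}  class={label}")
```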
Neural Network with Back Propagation
• Let us consider a dataset
• Forward propagation: Let the inputs be 𝑥1, 𝑥2, 𝑥3. These inputs pass to
the neuron, and then two important operations take place:
y = [𝑤1 𝑥1 + 𝑤2 𝑥2 + 𝑤3 𝑥3] + 𝑏𝑖
z = Act(y), using the sigmoid activation function
𝒙𝟏 (Play)   𝒙𝟐 (Study)   𝒙𝟑 (Sleep)   O/P (y)
2h          4h           8h           1
*Only one hidden neuron is considered for this training example.
• ŷ is the predicted output. Suppose ŷ is predicted as 0; we need to
compare ŷ with the actual y to check whether they are almost the same.
For the current record, y = 1.
• The difference can be found with the loss function:
Loss = (𝑦 − ŷ)²
= (1 − 0)²
= 1
• Here the loss value is high and the prediction is completely wrong.
• Now the weights have to be adjusted in such a way that the predicted
output becomes 1.
• This is done using an optimizer. To reduce the loss value, back
propagation needs to be used.
Back Propagation: While doing back propagation, the weights get updated:
w4_new = w4_old − α ∂L/∂w4
• Here the learning rate α should be a small value, e.g. 0.001.
• A small learning rate helps reach the global minimum during gradient
descent, which is carried out by the optimizer.
• After updating 𝑤4, the other weights 𝑤1, 𝑤2, 𝑤3 need to be updated
in the same way.
w3_new = w3_old − α ∂L/∂w3
• Once the values are updated, forward propagation starts again. This
iterates until the loss value reduces and ŷ = 𝑦.
• The loss function above is defined for a single record. If there are
multiple records, a cost function needs to be defined:
Cost = Σ (𝑦 − ŷ)² (sum over i = 1 … n)
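Below is a hedged sketch of this update loop for the single-neuron example, using the slide's record (2h, 4h, 8h → y = 1). The initial weights are assumptions, and a finite-difference gradient stands in for the analytic ∂L/∂w to keep the code short:

```python
import numpy as np

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

def predict(w, b, x):
    # Forward pass of the single hidden neuron.
    return sigmoid(np.dot(w, x) + b)

def loss(w, b, x, y_true):
    return (y_true - predict(w, b, x)) ** 2

x, y_true = np.array([2.0, 4.0, 8.0]), 1.0   # the slide's single record
w, b = np.array([-0.5, -0.2, -0.1]), 0.0     # assumed initial weights
alpha, eps = 0.001, 1e-6                     # small learning rate; FD step

for step in range(5000):
    grad = np.zeros_like(w)
    for i in range(len(w)):
        dw = np.zeros_like(w)
        dw[i] = eps
        # Central-difference estimate of dL/dw_i.
        grad[i] = (loss(w + dw, b, x, y_true)
                   - loss(w - dw, b, x, y_true)) / (2 * eps)
    w -= alpha * grad   # w_new = w_old - alpha * dL/dw

print(predict(w, b, x))  # the prediction moves toward the target y = 1
```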
What is the Chain Rule
• Chain Rule: Suppose u is a differentiable function of 𝑥1, 𝑥2, … 𝑥𝑛 and
each 𝑥𝑗 is a differentiable function of 𝑡1, 𝑡2, … 𝑡𝑛. Then u is a function
of 𝑡1, 𝑡2, … 𝑡𝑛, and the partial derivative of u with respect to 𝑡1 is
∂u/∂𝑡1 = (∂u/∂𝑥1)(∂𝑥1/∂𝑡1) + (∂u/∂𝑥2)(∂𝑥2/∂𝑡1) + … + (∂u/∂𝑥𝑛)(∂𝑥𝑛/∂𝑡1)
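A quick SymPy check of this rule, using the arbitrary example u = x1·x2 with x1 = t1² and x2 = sin(t1):

```python
import sympy as sp

t1 = sp.symbols('t1')
x1, x2 = t1**2, sp.sin(t1)   # each x_j is a function of t1
u = x1 * x2                  # u as a function of x1 and x2

direct = sp.diff(u, t1)      # differentiate u w.r.t. t1 directly

# Chain rule: du/dt1 = (du/dx1)(dx1/dt1) + (du/dx2)(dx2/dt1)
X1, X2 = sp.symbols('X1 X2')
U = X1 * X2
chained = (sp.diff(U, X1).subs({X1: x1, X2: x2}) * sp.diff(x1, t1)
           + sp.diff(U, X2).subs({X1: x1, X2: x2}) * sp.diff(x2, t1))

print(sp.simplify(direct - chained))  # prints 0: both sides agree
```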
Chain Rule in Back Propagation
• Suppose the inputs are 𝑥1, 𝑥2, 𝑥3, 𝑥4, connected to two hidden layers.
Hidden layer one has 3 neurons and hidden layer two has 2 neurons.
• The best way to label the weights is w11^1 for the 1st hidden layer and
w11^2 for the 2nd hidden layer.
• Let us update the weights: w11^3_new = w11^3_old − α ∂L/∂w11^3
• To update w11^3 in back propagation, we first obtain ŷ and a loss value;
then, as we back propagate, we update the weights.
• Now let us see how to find the derivative ∂L/∂w11^3. This basically
indicates the slope, and it is related to the chain rule.
• ∂L/∂w11^3 can be expanded with the chain rule.
• The weight w11^3 impacts the output O31. Since it impacts O31, the
derivative can be written as
∂L/∂w11^3 = (∂L/∂O31) × (∂O31/∂w11^3), which is basically the chain rule.
• Similarly, to find the derivative with respect to w21^3:
∂L/∂w21^3 = (∂L/∂O31) × (∂O31/∂w21^3)
• To find the derivative with respect to w11^2:
∂L/∂w11^2 = (∂L/∂O31) × (∂O31/∂O21) × (∂O21/∂w11^2)
• To find the derivative with respect to w12^2, note that both neurons of
the second hidden layer, with outputs 𝑓21 and 𝑓22 (i.e. O21 and O22),
impact the final output, so the derivative terms along the two paths are
added:
[(∂L/∂O31) × (∂O31/∂O21) × (∂O21/∂w11^2)] +
[(∂L/∂O31) × (∂O31/∂O22) × (∂O22/∂w12^2)]
• As these derivatives are computed, the weights get updated and ŷ keeps
changing, iterating until we reach the global minimum.
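To make the two-path sum concrete, here is a sketch of a scalar version of such a network (one value per relevant connection; all parameters are illustrative assumptions, not from the slides). The chain-rule gradient summed over both paths is checked against a finite difference:

```python
import numpy as np

def sig(t):
    return 1.0 / (1.0 + np.exp(-t))

def dsig(t):
    s = sig(t)
    return s * (1.0 - s)   # derivative of the sigmoid

# Assumed scalar parameters and data.
x, y = 1.5, 1.0            # input and target
w = 0.4                    # the weight we differentiate
v1, v2 = 0.7, -0.3         # O11 -> O21 and O11 -> O22
u1, u2 = 0.5, 0.9          # O21 -> O31 and O22 -> O31

def forward(w):
    o11 = sig(w * x)
    o21, o22 = sig(v1 * o11), sig(v2 * o11)
    o31 = sig(u1 * o21 + u2 * o22)
    return o11, o21, o22, o31

o11, o21, o22, o31 = forward(w)

# Chain rule: add the contributions of the two paths
# O11 -> O21 -> O31 and O11 -> O22 -> O31, then multiply by dO11/dw.
dL_dO31 = -2 * (y - o31)
d31 = dsig(u1 * o21 + u2 * o22)
path1 = dL_dO31 * d31 * u1 * dsig(v1 * o11) * v1
path2 = dL_dO31 * d31 * u2 * dsig(v2 * o11) * v2
dL_dw = (path1 + path2) * dsig(w * x) * x

# Finite-difference check of the same derivative.
eps = 1e-6
num = ((y - forward(w + eps)[3]) ** 2
       - (y - forward(w - eps)[3]) ** 2) / (2 * eps)
print(dL_dw, num)   # the two values should match closely
```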
Computational Graphs
• A directed graph whose nodes correspond to mathematical operations.
• A way of expressing and evaluating a mathematical expression.
• Example 1:
• Mathematical equation: p = x + y
• Graph: a "+" node takes x and y as inputs and produces p as output.
Back Propagation Algorithm
• Objective: Compute the gradient of the final output with respect to each
input.
• These gradients are essential for training the neural network using
gradient descent.
Desired gradients: ∂g/∂x, ∂g/∂y, ∂g/∂z
x, y, z are the inputs.
g is the output, where g = p × z and p = x + y.
• Step 1:
- Find the derivative of the output with respect to the output itself.
- This results in the identity derivative, whose value is equal to one:
∂g/∂g = 1
• Computational graph (figure): g = p × z, with p = x + y.
• Step 2
- Backward pass through the "*" operation.
- Calculate the gradients at nodes p and z. Since g = p × z,
we know that ∂g/∂z = p and ∂g/∂p = z.
From the forward pass we get p = 4 and z = −3.
Hence, ∂g/∂z = p = 4 (1)
∂g/∂p = z = −3 (2)
• Step 3
- Calculate the gradients at x and y: ∂g/∂x, ∂g/∂y
• From the chain rule:
∂g/∂x = (∂g/∂p) × (∂p/∂x)
∂g/∂y = (∂g/∂p) × (∂p/∂y)
∂g/∂p = −3, from (2).
Since p = x + y, we have ∂p/∂x = 1 and ∂p/∂y = 1.
For input x: ∂g/∂x = (∂g/∂p) × (∂p/∂x) = −3 × 1 = −3
For input y: ∂g/∂y = (∂g/∂p) × (∂p/∂y) = −3 × 1 = −3
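The whole example can be written as a tiny computational-graph program. The inputs x = 1 and y = 3 are an assumption chosen so that p = 4 and z = −3 match the forward-pass values above:

```python
# Forward pass through the graph g = (x + y) * z.
x, y, z = 1.0, 3.0, -3.0   # assumed inputs giving p = 4, z = -3
p = x + y                   # "+" node
g = p * z                   # "*" node: g = -12

# Backward pass, in reverse order of the forward pass.
dg_dg = 1.0                 # Step 1: derivative of the output w.r.t. itself
dg_dp = z                   # Step 2: since g = p * z
dg_dz = p
dg_dx = dg_dp * 1.0         # Step 3: p = x + y, so dp/dx = 1
dg_dy = dg_dp * 1.0         #               and dp/dy = 1
print(g, dg_dx, dg_dy, dg_dz)   # -12.0 -3.0 -3.0 4.0
```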