GRADIENT DESCENT ALGORITHM
FEED FORWARD NEURAL NETWORK
BACK PROPAGATION ALGORITHM
Content
Brief discussion about Machine Learning, Artificial Intelligence and Deep Learning
Gradient Descent Algorithm
Feed Forward Neural Network
Back propagation Algorithm
Neural Network
MACHINE LEARNING (ML)
Machine learning (ML) is the study of computer algorithms that improve
automatically through experience.
It is seen as a subset of artificial intelligence.
Machine learning algorithms build a model based on sample data, known as
"training data", in order to make predictions or decisions.
ARTIFICIAL INTELLIGENCE
Artificial intelligence (AI) is intelligence demonstrated by machines, unlike the natural intelligence displayed by humans and animals.
It is the study of "intelligent agents": any device that perceives its environment and takes actions that maximize its chance of successfully achieving its goals.
Colloquially, the term "artificial intelligence" is often used to describe machines (or computers) that mimic "cognitive" functions that humans associate with the human mind, such as "learning" and "problem solving".
DEEP LEARNING
Deep learning (also known as deep structured learning) is part of a broader family
of machine learning methods based on artificial neural networks with representation
learning. Learning can be supervised, semi-supervised or unsupervised.
Deep-learning architectures such as deep neural networks, deep belief
networks, recurrent neural networks and convolutional neural networks have been
applied to fields including computer vision, machine vision, speech
recognition, natural language processing, audio recognition, social network
filtering, machine translation, bioinformatics, drug design, medical image analysis,
material inspection and board game programs, where they have produced results
comparable to and in some cases surpassing human expert performance.
FIG 1
FIG 2
GRADIENT DESCENT ALGORITHM
Gradient descent is a first-order iterative optimization algorithm for finding a local minimum of a differentiable function.
Gradient descent is an optimization technique used to improve deep learning and neural network-based models by minimizing the cost function.
To find a local minimum of a function using gradient descent, we take steps proportional to the negative of the gradient (or approximate gradient) of the function at the current point.
If we instead take steps proportional to the positive of the gradient, we approach a local maximum of that function; the procedure is then known as gradient ascent.
Gradient descent is generally attributed to Cauchy, who first suggested it in 1847, but its convergence properties for non-linear optimization problems were first studied by Haskell Curry in 1944.
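As an illustration, here is a minimal sketch of gradient descent in Python on a simple one-dimensional function; the function, learning rate, and stopping rule are arbitrary choices made for the example, not something from the slides.

```python
# Minimal gradient descent sketch: minimize f(x) = (x - 3)^2 + 1.
# The function, learning rate, and stopping rule are illustrative choices.

def f(x):
    return (x - 3) ** 2 + 1

def grad_f(x):
    # Analytic derivative of f; in practice an approximate gradient may be used.
    return 2 * (x - 3)

x = 0.0             # starting point
learning_rate = 0.1
for step in range(100):
    g = grad_f(x)
    if abs(g) < 1e-8:           # stop when the gradient is (almost) zero
        break
    x = x - learning_rate * g   # step in the negative gradient direction

print(f"approximate minimizer: x = {x:.4f}, f(x) = {f(x):.4f}")
```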
Local Minima, Global Minima, Local Maxima, Global Maxima
An analogy for understanding gradient descent
The basic intuition behind gradient descent can be illustrated by a
hypothetical scenario.
A person is stuck in the mountains and is trying to get down (i.e.
trying to find the global minimum). There is heavy fog such that
visibility is extremely low.
Therefore, the path down the mountain is not visible, so they must
use local information to find the minimum.
They can use the method of gradient descent, which involves looking
at the steepness of the hill at their current position, then proceeding
in the direction with the steepest descent (i.e. downhill).
An analogy for understanding gradient descent
In this analogy, the person represents the algorithm, and the path taken down the mountain
represents the sequence of parameter settings that the algorithm will explore.
The steepness of the hill represents the slope of the error surface at that point. The instrument used to measure steepness is differentiation.
The direction they choose to travel in aligns with the gradient of the error surface at that point.
The amount of time they travel before taking another measurement is the learning rate of the
algorithm.
ANOTHER ANALOGY
An analogy could be drawn in the form of a steep mountain whose base touches the sea.
We assume a person's goal is to reach down to sea level. Ideally, the person would have to take one step at a time to reach the goal.
Each step has a gradient in the negative direction (note: the magnitude of each step can differ).
The person continues hiking down until they reach the bottom or a threshold point, where there is no room to go further down.
GRADIENT DESCENT ALGORITHM
Illustration of gradient descent on an example
Consider a nonlinear system of equations (the system and the iterations were shown in the accompanying figure).
The figure shows the first 80 iterations of gradient descent applied to this example; the arrows show the direction of descent. Due to a small and constant step size, the convergence is slow.
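To make the idea concrete, below is a small sketch of gradient descent applied to a hypothetical two-equation nonlinear system (not the system from the figure): it minimizes the sum of squared residuals with a small, constant step size, so convergence is correspondingly slow.

```python
import numpy as np

# Hypothetical nonlinear system (illustrative only, not the slide's system):
#   x^2 + y^2 - 1 = 0
#   y - x^3       = 0
# Solve it by minimizing the sum of squared residuals F(v) = ||r(v)||^2.

def residuals(v):
    x, y = v
    return np.array([x**2 + y**2 - 1.0, y - x**3])

def grad_F(v):
    # Gradient of F(v) = ||r(v)||^2 is 2 * J(v)^T r(v), with J the Jacobian of r.
    x, y = v
    J = np.array([[2 * x, 2 * y],
                  [-3 * x**2, 1.0]])
    return 2.0 * J.T @ residuals(v)

v = np.array([0.5, 0.5])    # starting point
step = 0.05                 # small, constant step size -> slow convergence
for i in range(80):         # first 80 iterations, as in the slide's figure
    v = v - step * grad_F(v)

print("approximate solution:", v, "residuals:", residuals(v))
```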
APPLICATION
Gradient descent can be used to solve a system of linear equations (see the sketch after this list).
Gradient descent can also be used to solve a system of nonlinear equations.
Gradient descent works in spaces of any number of dimensions, even in infinite-dimensional ones.
Gradient descent can be combined with a line search.
Methods based on Newton's method and inversion of the Hessian using conjugate gradient techniques can be better alternatives.
Gradient descent can be viewed as applying Euler's method for solving ordinary differential equations to a gradient flow.
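As a hedged sketch of the first point, gradient descent can solve a linear system Ax = b by minimizing the least-squares objective ||Ax - b||²; the matrix, right-hand side, and step size below are made-up illustrative values.

```python
import numpy as np

# Solve A x = b by minimizing f(x) = ||A x - b||^2 with gradient descent.
# A and b are arbitrary illustrative values.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

x = np.zeros(2)
step = 0.05
for _ in range(500):
    grad = 2.0 * A.T @ (A @ x - b)   # gradient of ||Ax - b||^2
    x = x - step * grad

print("gradient descent solution:", x)
print("direct solution:          ", np.linalg.solve(A, b))
```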
FEED FORWARD NEURAL NETWORK
A feedforward neural network is an artificial neural network wherein connections between
the nodes do not form a cycle. As such, it is different from its descendant: recurrent neural
networks.
The feedforward neural network was the first and simplest type of artificial neural network
devised.
In this network, the information moves in only one direction—forward—from the input nodes,
through the hidden nodes (if any) and to the output nodes.
Deep feedforward networks, also often called feedforward neural networks or multilayer perceptrons (MLPs), are the quintessential deep learning models.
The goal of a feedforward network is to approximate some function f*.
FEED FORWARD NEURAL NETWORK
These models are called feedforward because information flows through the function being
evaluated from x, through the intermediate computations used to define f, and finally to the
output y.
There are no feedback connections in which outputs of the model are fed back into itself.
When feedforward neural networks are extended to include feedback connections, they are called recurrent neural networks.
FEED FORWARD NEURAL NETWORK
The inspiration behind neural networks is our brain, so let's look at the biological aspect of neural networks.
FEED FORWARD NEURAL NETWORK
Consider the two images in Fig 1. The left image shows how a multilayer neural network identifies different objects by learning different characteristics of the object at each layer: for example, edges are detected at the first hidden layer, while corners and contours are identified at the second hidden layer.
Similarly, our brain has different regions for the same purpose; as shown, the region denoted V1 identifies edges, corners, and so on.
SINGLE LAYER PERCEPTRON
The simplest kind of neural network is a single-layer perceptron network, which consists of a single layer of output nodes; the inputs are fed directly to the outputs via a series of weights.
The sum of the products of the weights and the inputs is calculated in each node.
If the value is above some threshold (typically 0), the neuron fires and takes the activated value (typically 1); otherwise it takes the deactivated value (typically -1). Neurons with this kind of activation function are also called artificial neurons or linear threshold units. A small sketch of such a neuron follows this slide.
A perceptron can be created using any values for the activated and deactivated states as long as the threshold value lies between the two.
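A minimal sketch of such a threshold neuron in Python, using the 1 / -1 activated and deactivated values described above; the weights and inputs are arbitrary example numbers.

```python
import numpy as np

def threshold_neuron(x, w, threshold=0.0):
    """Single linear threshold unit: fires (+1) if the weighted sum exceeds
    the threshold, otherwise outputs the deactivated value (-1)."""
    weighted_sum = np.dot(w, x)
    return 1 if weighted_sum > threshold else -1

# Arbitrary example weights and input.
w = np.array([0.4, -0.2, 0.7])
x = np.array([1.0, 0.5, 1.0])
print(threshold_neuron(x, w))   # -> 1, since 0.4 - 0.1 + 0.7 = 1.0 > 0
```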
SINGLE LAYER PERCEPTRON
Perceptrons can be trained by a simple learning algorithm that is usually called the delta rule. It calculates the error between the calculated output and the sample output data, and uses this to adjust the weights, thus implementing a form of gradient descent (see the sketch after this slide).
Single-layer perceptrons are only capable of learning linearly separable patterns.
In 1969, in a famous monograph entitled Perceptrons, Marvin Minsky and Seymour Papert showed that it was impossible for a single-layer perceptron network to learn an XOR function (nonetheless, it was known that multi-layer perceptrons are capable of producing any possible Boolean function).
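Here is a hedged sketch of that error-driven weight update on a linearly separable toy problem (logical AND); the 0/1 targets, learning rate, and epoch count are illustrative choices.

```python
import numpy as np

# Delta-rule-style training of a single threshold neuron on logical AND.
# Targets, learning rate, and epoch count are illustrative choices.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1], dtype=float)          # AND is linearly separable

w = np.zeros(2)
bias = 0.0
lr = 0.1

def predict(x):
    return 1.0 if np.dot(w, x) + bias > 0 else 0.0

for epoch in range(20):
    for x_i, t in zip(X, y):
        error = t - predict(x_i)          # difference between target and output
        w += lr * error * x_i             # adjust weights in proportion to the error
        bias += lr * error

print("learned weights:", w, "bias:", bias)
print("predictions:", [predict(x_i) for x_i in X])
```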
SINGLE LAYER PERCEPTRON
A single-layer neural network can compute a continuous output instead of a step function. A common choice is the so-called logistic function. With this choice, the single-layer network is identical to the logistic regression model, widely used in statistical modeling.
If the single-layer network's activation function is modulo 1, then the network can solve the XOR problem with exactly one neuron.
MULTILAYER PERCEPTRON
This class of networks consists of multiple layers of computational units, usually interconnected
in a feed-forward way. Each neuron in one layer has directed connections to the neurons of the
subsequent layer.
In many applications the units of these networks apply a sigmoid function as an activation function.
The universal approximation theorem for neural networks states that every continuous
function that maps intervals of real numbers to some output interval of real numbers can be
approximated arbitrarily closely by a multi-layer perceptron with just one hidden layer. This
result holds for a wide range of activation functions, e.g. for the sigmoidal functions.
Multi-layer networks use a variety of learning techniques, the most popular being back-propagation.
OTHER FEED FORWARD NETWORKS
More generally, any directed acyclic graph may be used for a feedforward network, with some
nodes (with no parents) designated as inputs, and some nodes (with no children) designated as
outputs. These can be viewed as multilayer networks where some edges skip layers, either
counting layers backwards from the outputs or forwards from the inputs.
Various activation functions can be used, and there can be relations between weights, as
in convolutional neural networks.
Examples of other feedforward networks include radial basis function networks, which use a
different activation function.
Sometimes multi-layer perceptron is used loosely to refer to any feedforward neural network,
while in other cases it is restricted to specific ones (e.g., with specific activation functions, or
with fully connected layers, or trained by the perceptron algorithm).
MULTILAYER PERCEPTRON
A two-layer neural network capable of calculating XOR. The numbers within the neurons represent each neuron's explicit threshold (which can be factored out so that all neurons have the same threshold, usually 1). The numbers that annotate arrows represent the weights of the inputs. This net assumes that if the threshold is not reached, zero (not -1) is output. Note that the bottom layer of inputs is not always considered a real neural network layer. A sketch of one such network follows this slide.
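Since the figure with the exact weights is not reproduced here, the sketch below uses one common choice of weights and thresholds for a two-layer XOR network (hidden units acting as OR and AND, combined at the output); these particular numbers are an assumption, not the ones from the slide's figure.

```python
def step(z, threshold):
    # Outputs 1 if the weighted sum reaches the threshold, otherwise 0
    # (zero, not -1, as described on the slide).
    return 1 if z >= threshold else 0

def xor_net(x1, x2):
    # Hidden layer: one OR-like unit and one AND-like unit.
    # Weights and thresholds are a common textbook choice, assumed here.
    h_or  = step(1.0 * x1 + 1.0 * x2, threshold=1.0)   # fires if at least one input is 1
    h_and = step(1.0 * x1 + 1.0 * x2, threshold=2.0)   # fires only if both inputs are 1
    # Output layer: "OR but not AND" implements XOR.
    return step(1.0 * h_or - 1.0 * h_and, threshold=1.0)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_net(a, b))
```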
BACK PROPAGATION ALGORITHM
The backpropagation algorithm is probably the most fundamental building block in a neural network. It was first introduced in the 1960s and, almost 30 years later, was popularized by Rumelhart, Hinton and Williams in the 1986 paper "Learning representations by back-propagating errors".
The algorithm is used to effectively train a neural network through a method based on the chain rule. In simple terms, after each forward pass through a network, backpropagation performs a backward pass while adjusting the model's parameters (weights and biases).
BACK PROPAGATION ALGORITHM
The output values are compared with the correct answer to compute the value
of some predefined error-function.
By various techniques, the error is then fed back through the network.
Using this information, the algorithm adjusts the weights of each connection in
order to reduce the value of the error function by some small amount.
After repeating this process for a sufficiently large number of training cycles, the network will usually converge to some state where the error of the calculations is small.
To adjust weights properly, we can apply a general method for non-
linear optimization that is called gradient descent.
BACK PROPAGATION
For this, the network calculates the derivative of the error function with respect to the network weights, and changes the weights such that the error decreases (thus going downhill on the surface of the error function).
For this reason, back-propagation can only be applied to networks with differentiable activation functions.
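For example, the sigmoid is a differentiable activation whose derivative has a convenient closed form; a minimal sketch (the choice of sigmoid here is just one example of a differentiable activation):

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) activation: smooth and differentiable everywhere."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    """Derivative of the sigmoid, used when propagating errors backwards."""
    s = sigmoid(z)
    return s * (1.0 - s)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))        # activations
print(sigmoid_prime(z))  # local slopes used by backpropagation
```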
Why We Need Backpropagation?
The most prominent advantages of backpropagation are:
Backpropagation is fast, simple and easy to program.
It has no parameters to tune apart from the number of inputs.
It is a flexible method, as it does not require prior knowledge about the network.
It is a standard method that generally works well.
It does not need any special mention of the features of the function to be learned.
BACK PROPAGATION ALGORITHM
EXAMPLE
Define the neural network model:
The 4-layer neural network consists of 4 neurons for the input layer, 4 neurons for the hidden layers and 1 neuron for the output layer (a small sketch of this setup follows).
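A minimal sketch of setting up such a network's parameters in Python. The slide text is read here as layer sizes [4, 4, 4, 1]; note that a later slide's shape example uses a hidden layer of 2 neurons, so these sizes are an assumption about the figure rather than a definitive choice.

```python
import numpy as np

# Layer sizes read from the slide text: 4 inputs, two hidden layers of 4, 1 output.
# (Treat these sizes as an assumption about the figure, not a definitive choice.)
layer_sizes = [4, 4, 4, 1]

rng = np.random.default_rng(0)
weights = [rng.standard_normal((n_out, n_in))     # each W has shape (n_out, n_in)
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [rng.standard_normal((n_out, 1))         # each b has shape (n_out, 1)
          for n_out in layer_sizes[1:]]

for i, (W, b) in enumerate(zip(weights, biases), start=1):
    print(f"W{i}: {W.shape}, b{i}: {b.shape}")
```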
EXAMPLE CONTINUED
INPUT LAYER:
The neurons, colored in purple, represent the input data.
These can be as simple as scalars or more complex like
vectors or multidimensional matrices.
The first set of activations (a) are equal to the input
values. NB: “activation” is the neuron’s value after applying
an activation function.
EXAMPLE CONTINUED
HIDDEN LAYER:
The final values at the hidden neurons, colored in green, are computed using z^l (the weighted inputs in layer l) and a^l (the activations in layer l).
EXAMPLE CONTINUED
For layers 2 and 3 the equations are:
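The equations themselves appeared in a figure. Generically, and hedging on the exact superscript convention (the slides index the weights both by source layer and by destination layer), the weighted input and the activation of a layer take the form:

```latex
z^{l} = W a^{l-1} + b, \qquad a^{l} = f\!\left(z^{l}\right)
```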
EXAMPLE CONTINUED
W² and W³ are the weights in layer 2 and 3 while b² and b³ are the biases in those layers.
Activations a² and a³ are computed using an activation function f. Typically, this function f is
non-linear (e.g. sigmoid, ReLU, tanh) and allows the network to learn complex patterns in data.
All parameter values are combined in matrices, grouped by layer.
Let’s pick layer 2 and its parameters as an example. The same operations can be applied to any
layer in the network.
W¹ is a weight matrix of shape (n, m) where n is the number of output neurons (neurons in the
next layer) and m is the number of input neurons (neurons in the previous layer). For us, n =
2 and m = 4.
EXAMPLE CONTINUED
NB: The first number in any
weight’s subscript matches the
index of the neuron in the next
layer (in our case this is
the Hidden_1 layer) and the second
number matches the index of the
neuron in previous layer (in our
case this is the Input layer).
EXAMPLE CONTINUED
x is the input vector of
shape (m, 1) where m is the
number of input neurons. For
us, m = 4.
b¹ is a bias vector of shape (n , 1) where n is
the number of neurons in the current layer.
For us, n = 2.
EXAMPLE CONTINUED
Following the equation for z², we can use the above definitions of W¹, x and b¹ to derive the equation for z² (the full matrix form appeared in a figure; a numeric sketch follows).
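A hedged numeric sketch of that computation with the shapes given above (W¹ of shape (2, 4), x of shape (4, 1), b¹ of shape (2, 1)); the numeric values are arbitrary placeholders, not the values from the figure.

```python
import numpy as np

# Shapes follow the slides: W1 is (2, 4), x is (4, 1), b1 is (2, 1).
# The numeric values are arbitrary placeholders.
W1 = np.array([[0.1, 0.2, 0.3, 0.4],
               [0.5, 0.6, 0.7, 0.8]])
x  = np.array([[1.0], [2.0], [3.0], [4.0]])
b1 = np.array([[0.1], [0.2]])

z2 = W1 @ x + b1                  # weighted input of the first hidden layer, shape (2, 1)
a2 = 1.0 / (1.0 + np.exp(-z2))    # activation, using a sigmoid as an example of f
print(z2.shape, a2.shape)
```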
EXAMPLE CONTINUED
Now carefully observe the neural network illustration from above.
EXAMPLE CONTINUED
OUTPUT LAYER
The final part of a neural network is the output layer, which produces the predicted value. In our simple example, it is presented as a single neuron, colored in blue and evaluated as follows:
EXAMPLE CONTINUED
Again, we are using the matrix representation to simplify the equation. One can use the above
techniques to understand the underlying logic.
Forward propagation and evaluation
The equations above form the network's forward propagation. The next slide gives a short overview:
EXAMPLE CONTINUED (overview)
EXAMPLE CONTINUED
The final step in a forward pass is to evaluate the predicted output s against an expected
output y.
The output y is part of the training dataset (x, y) where x is the input (as we saw in the previous
section).
Evaluation between s and y happens through a cost function. This can be as simple as MSE (mean squared error) or more complex like cross-entropy.
We name this cost function C and denote it as C = cost(s, y),
EXAMPLE CONTINUED
where cost can be equal to MSE, cross-entropy or any other cost function.
Based on C’s value, the model “knows” how much to adjust its parameters in order to get closer
to the expected output y. This happens using the backpropagation algorithm.
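As an illustration of the simplest choice mentioned above, here is a small MSE cost in Python, following C = cost(s, y); the specific numbers are example values.

```python
import numpy as np

def mse_cost(s, y):
    """Mean squared error between predicted output s and expected output y."""
    s = np.asarray(s, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.mean((s - y) ** 2)

# Example: predicted output s vs expected output y from the training pair (x, y).
print(mse_cost([0.8], [1.0]))   # -> 0.04
```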
Backpropagation and computing gradients
According to the 1986 paper, backpropagation:
repeatedly adjusts the weights of the connections in the network so as to minimize a measure
of the difference between the actual output vector of the net and the desired output vector.
EXAMPLE CONTINUED
And,
the ability to create useful new features distinguishes back-propagation from earlier,
simpler methods…
In other words, backpropagation aims to minimize the cost function by adjusting the network's weights and biases. The level of adjustment is determined by the gradients of the cost function with respect to those parameters.
One question may arise: why compute gradients?
To answer this, we first need to revisit some calculus terminology:
EXAMPLE CONTINUED
The gradient of a function C(x_1, x_2, …, x_m) at a point x is the vector of the partial derivatives of C at x.
The derivative of a function C measures the sensitivity to change of the function value (output value) with respect to a change in its argument x (input value). In other words, the derivative tells us the direction in which C is going.
The gradient shows how much the parameter x needs to change (in the positive or negative direction) to minimize C.
Computing those gradients is done using a technique called the chain rule.
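The weight-gradient equations were shown as a figure. A hedged reconstruction, writing the weights and weighted inputs with the index of the layer they feed into (the slides switch between indexing by source and by destination layer), is the standard chain-rule form:

```latex
\frac{\partial C}{\partial (w_{jk})^{l}}
  = \frac{\partial C}{\partial (z_{j})^{l}} \,
    \frac{\partial (z_{j})^{l}}{\partial (w_{jk})^{l}}
  = \frac{\partial C}{\partial (z_{j})^{l}} \, (a_{k})^{l-1}
```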
EXAMPLE CONTINUED
A similar set of equations applies to (b_j)^l:
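Again hedging on the exact form in the figure, the corresponding bias equation via the chain rule is:

```latex
\frac{\partial C}{\partial (b_{j})^{l}}
  = \frac{\partial C}{\partial (z_{j})^{l}} \,
    \frac{\partial (z_{j})^{l}}{\partial (b_{j})^{l}}
  = \frac{\partial C}{\partial (z_{j})^{l}} \cdot 1
  = \frac{\partial C}{\partial (z_{j})^{l}}
```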
EXAMPLE CONTINUED
The common part in both equations is often called the "local gradient" and is expressed as follows (see the reconstruction after this slide):
The "local gradient" can easily be determined using the chain rule.
The gradients allow us to optimize the model's parameters:
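The missing expressions, reconstructed with the same hedging: the local gradient is the derivative of the cost with respect to the weighted input, and the parameters are updated by stepping against their gradients with learning rate epsilon (denoted e on the next slide):

```latex
(\delta_{j})^{l} = \frac{\partial C}{\partial (z_{j})^{l}},
\qquad
w \leftarrow w - \epsilon \,\frac{\partial C}{\partial w},
\qquad
b \leftarrow b - \epsilon \,\frac{\partial C}{\partial b}
```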
EXAMPLE CONTINUED
Initial values of w and b are randomly chosen.
Epsilon (e) is the learning rate. It determines the gradient's influence.
w and b are matrix representations of the weights and biases. The derivative of C with respect to w or b can be calculated using the partial derivatives of C with respect to the individual weights or biases.
The termination condition is met once the cost function is minimized.
EXAMPLE CONTINUED
The final part of this section is a simple example in which we will calculate the gradient of C with respect to a single weight, (w_22)².
Let's zoom in on the bottom part of the above neural network:
EXAMPLE CONTINUED
Weight (w_22)² connects (a_2)² and (z_2)³, so computing the gradient requires applying the chain rule through (z_2)³ and (a_2)³:
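A hedged reconstruction of the chain-rule expansion from the figure (the standard form, keeping the slides' subscript-then-superscript notation; the figure's exact typesetting may differ):

```latex
\frac{\partial C}{\partial (w_{22})^{2}}
  = \frac{\partial C}{\partial (a_{2})^{3}} \,
    \frac{\partial (a_{2})^{3}}{\partial (z_{2})^{3}} \,
    \frac{\partial (z_{2})^{3}}{\partial (w_{22})^{2}},
\qquad
\frac{\partial (z_{2})^{3}}{\partial (w_{22})^{2}} = (a_{2})^{2}
```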
Calculating the final value of the derivative of C with respect to (a_2)³ requires knowledge of the function C. Since C depends on (a_2)³, calculating the derivative should be fairly straightforward.
Knowing the nuts and bolts of this algorithm will fortify your knowledge of neural networks and make you feel comfortable taking on more complex models.
Summary of Back Propagation Algorithm
1. Inputs X arrive through the preconnected path.
2. Input is modeled using real weights W. The weights are usually randomly selected.
3. Calculate the output for every neuron from the input layer, through the hidden layers, to the output layer.
4. Calculate the error in the outputs:
Error_B = Actual Output – Desired Output
5. Travel back from the output layer to the hidden layer to adjust the weights such that the error is decreased.
6. Keep repeating the process until the desired output is achieved. A minimal end-to-end sketch of these steps follows.
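To tie the steps together, here is a minimal end-to-end sketch in Python of the forward pass, error computation, backward pass, and repeated weight updates for a tiny network; the architecture (2-4-1), sigmoid activation, squared-error cost, learning rate, and training data are all illustrative assumptions rather than the slides' exact example.

```python
import numpy as np

# Minimal backpropagation sketch on a tiny 2-4-1 network (illustrative choices).
rng = np.random.default_rng(42)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Training data: XOR, a classic non-linearly-separable problem.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)

# Step 2: weights (and biases) are randomly selected.
W1, b1 = rng.standard_normal((2, 4)), np.zeros((1, 4))
W2, b2 = rng.standard_normal((4, 1)), np.zeros((1, 1))
lr = 1.0

for epoch in range(10000):
    # Step 3: forward pass through the hidden layer and the output layer.
    z1 = X @ W1 + b1
    a1 = sigmoid(z1)
    z2 = a1 @ W2 + b2
    a2 = sigmoid(z2)

    # Step 4: error between actual and desired output.
    error = a2 - Y

    # Step 5: backward pass, computing local gradients via the chain rule.
    delta2 = error * a2 * (1 - a2)              # output layer local gradient
    delta1 = (delta2 @ W2.T) * a1 * (1 - a1)    # hidden layer local gradient

    # Gradient descent update of weights and biases.
    W2 -= lr * a1.T @ delta2
    b2 -= lr * delta2.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ delta1
    b1 -= lr * delta1.sum(axis=0, keepdims=True)

print("predictions after training (should approach 0, 1, 1, 0):")
print(np.round(a2, 3))
```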