This document discusses deep neural networks and computational graphs. It begins by explaining key concepts like derivatives, partial derivatives, optimization, training sets, and activation functions. It then provides examples of applying the chain rule in deep learning, including forward and back propagation in a neural network. Specifically, it demonstrates forward propagation through a simple network and calculating the gradient using backpropagation and the chain rule. Finally, it works through an example applying these concepts to a neural network using sigmoid activation functions.
APPLIED MACHINE LEARNING: FINALS
Course Code: CS501
DEEP NEURAL NETWORKS & COMPUTATIONAL GRAPHS
Name: Puranam Revanth Kumar (Research Scholar)
Roll No.: 19STRCHH01004
Question
1. Deep Neural Networks & Computational Graphs
(a) Explain the Concept - derivatives, partial derivatives, optimization, training set, activation functions etc.
(b) Give simple examples of Chain Rule then generalize - assume all activation functions have partial derivatives.
(c) Demonstrate on simple example such as Sigmoid activation functions.
Contents
Question
1 Deep Neural Networks & Computational Graphs
1.1 Explain the Concept - derivatives, partial derivatives, optimization, training set, activation functions etc.
1.2 Give simple examples of Chain Rule then generalize - assume all activation functions have partial derivatives
1.2.1 Forward Propagation
1.2.2 Chain Rule in Back Propagation
1.3 Demonstrate on simple example such as Sigmoid activation functions
Summary
1. Deep Neural Networks & Computational Graphs
Deep learning is a technique that basically mimics the human brain. Scientists and researchers wondered whether machines could be made to learn in the same way, and that question led to the invention of the neural network. The first and simplest type of neural network is called the perceptron. The perceptron had problems: it was not able to learn properly because of the concepts it applied. Later, in the 1980s, Geoffrey Hinton introduced the concept called backpropagation [1], after which ANNs, CNNs, and RNNs became efficient enough that many companies use them and have developed a lot of applications.
An artificial neural network computes a function of the inputs by propagating the computed values from the input neurons to the output neuron(s) and using the weights as intermediate parameters. Learning occurs by changing the weights connecting the neurons [2, 3]. Just as external stimuli are needed for learning in biological organisms, the external stimulus in artificial neural networks is provided by the training data containing examples of input-output pairs of the function to be learned. For example, the training data might contain pixel representations of images (input) and their annotated labels (e.g., cat, dog) as the output. These training data pairs are fed into the neural network by using the input representations to make predictions about the output labels.
Figure 1.1: Artificial neural network
Here, f1, f2, f3 are the input features.
• For multi-class classification, more than one output node can be specified.
• For binary classification, only one output node needs to be specified.
The training data provides feedback on the correctness of the weights in the neural network, depending on how well the predicted output (e.g., probability of cat) for a particular input matches the annotated output label in the training data. One can view the errors made by the neural network in the computation of a function as a kind of unpleasant feedback in a biological organism, leading to an adjustment in the synaptic strengths. Similarly, the weights between neurons are adjusted in a neural network in response to prediction errors. The goal of changing the weights is to modify the computed function to make the predictions more correct in future iterations. Therefore, the weights are changed carefully in a mathematically justified way so as to reduce the error in computation on that example. By successively adjusting the weights between neurons over many input-output pairs, the function computed by the neural network is refined over time so that it provides more accurate predictions.
Computational Graphs
A neural network is a computational graph, in which the unit of computation is the neuron. Neural networks are fundamentally more powerful than their building blocks because the parameters of these models are learned jointly to create a highly optimized composition function of these models [4]. Furthermore, the nonlinear activations between the different layers add to the expressive power of the network. A multilayer network evaluates compositions of functions computed at individual nodes. A path of length 2 in the neural network in which the function f(·) follows g(·) can be considered a composition function f(g(·)). Just to provide an idea, let us look at a trivial computational graph with two nodes, in which the sigmoid function is applied at each node to the input weight w. In such a case, the computed function appears as follows:
\[
f(g(w)) = \frac{1}{1 + \exp\left(-\dfrac{1}{1 + \exp(-w)}\right)}
\]
The resulting iterative approach is dynamic programming, and the corresponding update is really the chain rule of differential calculus. In order to understand how the chain rule works in a computational graph, we will discuss the two basic variants of the rule that one needs to keep in mind. The simplest version of the chain rule works for a straightforward composition of the functions:
\[
\frac{\partial f(g(w))}{\partial w} = \frac{\partial f(g(w))}{\partial g(w)} \cdot \frac{\partial g(w)}{\partial w}
\]
Figure 1.2: A simple computational graph with two nodes
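To make the two-node graph concrete, here is a minimal sketch in Python (not from the original slides; the function names and the test point w = 0.7 are illustrative) that evaluates f(g(w)) with a sigmoid at each node and checks the chain-rule gradient against a central finite difference:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # sigma'(x) = sigma(x) * (1 - sigma(x))

def composed(w):
    # Two-node graph: the sigmoid is applied at each node.
    return sigmoid(sigmoid(w))

def composed_grad(w):
    # Chain rule: d f(g(w)) / dw = f'(g(w)) * g'(w)
    return d_sigmoid(sigmoid(w)) * d_sigmoid(w)

w, eps = 0.7, 1e-6
numeric = (composed(w + eps) - composed(w - eps)) / (2 * eps)
print(composed_grad(w), numeric)  # the two values should agree closely
```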
Consider a sequence of hidden units h1, h2, ..., hk followed by output o, with respect to which the loss function L is computed. Furthermore, assume that the weight of the connection from hidden unit hr to hr+1 is w(hr, hr+1). Then, in the case that a single path exists from h1 to o, one can derive the gradient of the loss function with respect to any of these edge weights using the chain rule:
\[
\frac{\partial L}{\partial w_{(h_{r-1},h_r)}} = \frac{\partial L}{\partial o}\left[\frac{\partial o}{\partial h_k}\prod_{i=r}^{k-1}\frac{\partial h_{i+1}}{\partial h_i}\right]\frac{\partial h_r}{\partial w_{(h_{r-1},h_r)}} \qquad \forall r \in 1 \ldots k
\]
Figure 1.3: Illustration of the chain rule in computational graphs: the products of node-specific partial derivatives along paths from weight w to output o are aggregated. The resulting value yields the derivative of output o with respect to weight w. Only two paths between input and output exist in this simplified example.
\[
\begin{aligned}
\frac{\partial o}{\partial w} &= \frac{\partial o}{\partial p}\frac{\partial p}{\partial w} + \frac{\partial o}{\partial q}\frac{\partial q}{\partial w} && \text{[Multivariable Chain Rule]}\\
&= \frac{\partial o}{\partial p}\frac{\partial p}{\partial y}\frac{\partial y}{\partial w} + \frac{\partial o}{\partial q}\frac{\partial q}{\partial z}\frac{\partial z}{\partial w} && \text{[Univariate Chain Rule]}\\
&= \underbrace{\frac{\partial K(p,q)}{\partial p}\, g'(y)\, f'(w)}_{\text{First path}} + \underbrace{\frac{\partial K(p,q)}{\partial q}\, h'(z)\, f'(w)}_{\text{Second path}}
\end{aligned}
\]
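The two-path aggregation can be checked numerically. Below is an illustrative Python sketch; the concrete choices y = z = f(w) = w², g = cos, h = sin, and K(p, q) = p·q are assumptions, not from the slides:

```python
import math

# Assumed concrete functions: y = z = f(w) = w**2, p = g(y) = cos(y),
# q = h(z) = sin(z), and o = K(p, q) = p * q.
f,  df = lambda w: w**2,        lambda w: 2*w
g,  dg = lambda y: math.cos(y), lambda y: -math.sin(y)
h,  dh = lambda z: math.sin(z), lambda z: math.cos(z)

def output(w):
    y = z = f(w)
    return g(y) * h(z)  # o = K(p, q) = p * q

def grad(w):
    y = z = f(w)
    p, q = g(y), h(z)
    first_path  = q * dg(y) * df(w)   # dK/dp * g'(y) * f'(w)
    second_path = p * dh(z) * df(w)   # dK/dq * h'(z) * f'(w)
    return first_path + second_path   # aggregate over both paths

w, eps = 0.5, 1e-6
numeric = (output(w + eps) - output(w - eps)) / (2 * eps)
print(grad(w), numeric)  # should agree to several decimal places
```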
1.1 Explain the Concept - derivatives, partial derivatives, optimization, training set, activation functions etc.
(a) Derivatives: The derivative of a function of a single variable at a chosen input value, when it exists, is the slope of the tangent line to the graph of the function at that point [5].
Example: Let us plot the function f(a) = 3a. It is just a straight line.
Figure 1.4: Plot of the function f(a) = 3a
Say a = 2. In that case f(a), which is equal to 3 times a, equals 6. Now bump a up a little bit, to 2.001; this 0.001 difference is too small to show on the plot. Giving a that little nudge to the right, f(a) is equal to three times that, i.e., 6.003, which we plot as well.
Figure 1.5: The function f(a) with its slope
Looking at this little triangle, the slope, or derivative, of the function f(a) at a = 2 is 3. The term derivative basically means slope; formally, slope is defined as height/width, which here is 0.003/0.001 = 3.
Now,
\[
\frac{df(a)}{da} = 3, \quad \text{also written as } \frac{d}{da} f(a).
\]
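The same slope can be recovered numerically, mirroring the 0.001 bump above (a tiny illustrative Python check, not part of the slides):

```python
# Numeric slope of f(a) = 3a at a = 2, using the 0.001 bump from the text.
f = lambda a: 3 * a
a, width = 2.0, 0.001
height = f(a + width) - f(a)   # 6.003 - 6 = 0.003
print(height / width)          # slope = height / width = 3.0
```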
(b) Partial derivatives: Finding the gradient is essentially finding the derivative of the function. Since there are many independent variables that we can tweak (all the weights and biases), we have to find the derivative with respect to each variable. This is known as the partial derivative, written with the symbol ∂.
Computing the partial derivative of simple functions is easy: simply treat every other variable in the equation as a constant and find the usual scalar derivative.
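For instance (an illustrative check, not from the slides), take f(x, y) = 3x²y; treating y as constant gives ∂f/∂x = 6xy, and treating x as constant gives ∂f/∂y = 3x²:

```python
# f(x, y) = 3 * x**2 * y; analytic partials: df/dx = 6*x*y, df/dy = 3*x**2.
f = lambda x, y: 3 * x**2 * y
x, y, eps = 2.0, 5.0, 1e-6
df_dx = (f(x + eps, y) - f(x - eps, y)) / (2 * eps)  # hold y constant
df_dy = (f(x, y + eps) - f(x, y - eps)) / (2 * eps)  # hold x constant
print(df_dx, 6 * x * y)   # both approximately 60
print(df_dy, 3 * x**2)    # both approximately 12
```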
(c) Optimization: Optimization chooses inputs that result in the best possible outputs. Optimizers are algorithms or methods used to change the attributes of your neural network, such as the weights and learning rate, in order to reduce the losses.
How you should change the weights or learning rate of your neural network to reduce the losses is defined by the optimizer you use.
Example: \( \theta_1 := \theta_1 - \alpha \frac{\partial L}{\partial \theta_1} \). If α is too small, gradient descent can be slow. If α is too large, gradient descent can overshoot the minimum; it may fail to converge or even diverge.
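A minimal gradient-descent sketch in Python (illustrative; the quadratic loss is an assumption chosen so the minimum is known in advance):

```python
# Minimize L(theta) = (theta - 4)**2; its gradient is dL/dtheta = 2*(theta - 4).
grad = lambda theta: 2 * (theta - 4)

theta, alpha = 0.0, 0.1   # alpha is the learning rate
for _ in range(100):
    theta = theta - alpha * grad(theta)  # theta := theta - alpha * dL/dtheta
print(theta)  # close to 4; with alpha too large (e.g., 1.1) the iterates diverge
```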
(d) Training set: A training set is a set of pairs of input patterns with corresponding desired output patterns. Each pair represents how the network is supposed to respond to a particular input.
There are two approaches to training - supervised and unsupervised. Supervised training
involves a mechanism of providing the network with the desired output either by manually
"grading" the network’s performance or by providing the desired outputs with the inputs.
Unsupervised training is where the network has to make sense of the inputs without outside
help.
(e) Activation function: The activation function is a mathematical “gate” in between the
input feeding the current neuron and its output going to the next layer. It can be as simple as
a step function that turns the neuron output on and off, depending on a rule or threshold [7].
Figure 1.6: Sigmoid Activation function
TanH / Hyperbolic Tangent: Zero centered—making it easier to model inputs that have
strongly negative, neutral, and strongly positive values. Otherwise like the Sigmoid function.
Figure 1.7: Hyperbolic Tangent
ReLU (Rectified Linear Unit): Computationally efficient—allows the network to converge very quickly. Non-linear—although it looks like a linear function, ReLU has a derivative function and allows for backpropagation.
Figure 1.8: ReLU function
Leaky ReLU: Prevents the dying-ReLU problem—this variation of ReLU has a small positive slope in the negative area, so it does enable backpropagation even for negative input values. Otherwise like ReLU.
Figure 1.9: Leaky ReLU function
Softmax: Able to handle multiple classes, where other activation functions handle only one—it normalizes the output for each class to between 0 and 1 and divides by their sum, giving the probability of the input value being in a specific class.
Useful for output neurons—typically Softmax is used only for the output layer, for neural networks that need to classify inputs into multiple categories.
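A compact NumPy sketch of the activations discussed above (illustrative; the max-subtraction in softmax is a standard numerical-stability trick, not from the slides):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)  # zero-centered, otherwise sigmoid-like

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)  # small positive slope for x < 0

def softmax(x):
    e = np.exp(x - np.max(x))  # subtracting the max improves numerical stability
    return e / e.sum()

print(softmax(np.array([1.0, 2.0, 3.0])))  # outputs in (0, 1) summing to 1
```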
1.2 Give simple examples of Chain Rule then generalize - assume all activation functions have partial derivatives
Suppose u is a differentiable function of x1, x2, ..., xn and each xj is a differentiable function of t1, t2, ..., tn. Then u is a function of t1, t2, ..., tn, and the partial derivative of u with respect to t1 is [6]:
\[
\frac{\partial u}{\partial t_1} = \frac{\partial u}{\partial x_1}\frac{\partial x_1}{\partial t_1} + \frac{\partial u}{\partial x_2}\frac{\partial x_2}{\partial t_1} + \cdots + \frac{\partial u}{\partial x_n}\frac{\partial x_n}{\partial t_1} \tag{1}
\]
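Equation (1) can be verified symbolically for a small case. The sketch below is illustrative; it assumes SymPy is available and picks n = 2 with arbitrary compositions:

```python
import sympy as sp

# Check equation (1) for n = 2: u(x1, x2) with x1, x2 functions of t1, t2.
t1, t2 = sp.symbols('t1 t2')
x1 = t1 * t2            # x1(t1, t2)
x2 = t1 + t2            # x2(t1, t2)
u = x1**2 + sp.sin(x2)  # u composed through x1 and x2

lhs = sp.diff(u, t1)    # direct partial derivative du/dt1

# Chain rule: du/dx1 * dx1/dt1 + du/dx2 * dx2/dt1
X1, X2 = sp.symbols('X1 X2')
U = X1**2 + sp.sin(X2)
rhs = (sp.diff(U, X1).subs({X1: x1, X2: x2}) * sp.diff(x1, t1)
       + sp.diff(U, X2).subs({X1: x1, X2: x2}) * sp.diff(x2, t1))
print(sp.simplify(lhs - rhs))  # 0, so both sides agree
```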
1.2.1 Forward Propagation
In this phase, the inputs for a training instance are fed into the neural network. This results in a forward cascade of computations across the layers, using the current set of weights. The final predicted output can be compared to that of the training instance, and the derivative of the loss function with respect to the output is computed. The derivative of this loss now needs to be computed with respect to the weights in all layers in the backwards phase.
Let the inputs be x1, x2, x3. These inputs pass to a hidden neuron, where two important operations take place:
Figure 1.10: Neural Network
Step 1: The weighted summation of the inputs:
\[
y = \sum_{i=1}^{n} w_i x_i = w_1 x_1 + w_2 x_2 + w_3 x_3
\]
Step 2: Before the activation function, the bias is added to the summation, and the activation is applied:
\[
y = w_1 x_1 + w_2 x_2 + w_3 x_3 + b_i, \qquad z = \text{Act}(y), \qquad z = z \times w_4
\]
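The forward pass above in a few lines of Python (the input, weight, and bias values are made up for illustration; sigmoid stands in for Act):

```python
import numpy as np

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

x = np.array([0.5, -1.0, 2.0])    # inputs x1, x2, x3 (made-up values)
w = np.array([0.1, 0.4, -0.2])    # weights w1, w2, w3
b, w4 = 0.3, 0.8                  # bias and the outgoing weight

y = np.dot(w, x) + b  # Step 1 plus bias: y = w1*x1 + w2*x2 + w3*x3 + b
z = sigmoid(y)        # Step 2: z = Act(y)
z = z * w4            # the activation is then scaled by the next weight w4
print(y, z)
```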
1.2.2 Chain Rule in Back Propagation
The main goal of the backward phase is to learn the gradient of the loss function with respect to the different weights by using the chain rule of differential calculus. These gradients are used to update the weights. Since these gradients are learned in the backward direction, starting from the output node, this learning process is referred to as the backward phase.
Suppose the inputs are x1, x2, x3, x4, which are connected to two hidden layers. In hidden layer one there are 3 neurons and in hidden layer two there are 2 neurons. The clearest way to label the weights is \(w^1_{11}\) for the 1st hidden layer and \(w^2_{11}\) for the 2nd hidden layer [4].
Figure 1.11: Neural network with two hidden layers
To reduce the loss value, backpropagation needs to be used; while doing backpropagation, these weights get updated.
For a single record, the difference can be found by the loss function:
\[
\text{Loss} = (y - \hat{y})^2
\]
For multiple records, the cost function needs to be defined:
\[
\sum_{i=1}^{n} (y_i - \hat{y}_i)^2
\]
where \(w^1_{11}, w^2_{11}, w^3_{11}\) are weights, HL1, HL2 are the hidden layers, and \(O_{11}, O_{21}, O_{31}\) are the outputs of the hidden layers.
• Let us update the weights:
\[
w^3_{11}(\text{new}) = w^3_{11}(\text{old}) - \alpha \frac{\partial L}{\partial w^3_{11}}
\]
• \(w^3_{11}\) needs to be updated in backpropagation: we obtain \(\hat{y}\) and a loss value, and when we backpropagate we update the weights.
• Now, let us see how to find the derivative \(\frac{\partial L}{\partial w^3_{11}}\). It basically indicates the slope, and this is where the chain rule enters.
• The weight \(w^3_{11}\) impacts the output \(O_{31}\). Since it impacts \(O_{31}\), \(\frac{\partial L}{\partial w^3_{11}}\) can be written as
\[
\frac{\partial L}{\partial w^3_{11}} = \frac{\partial L}{\partial O_{31}} \times \frac{\partial O_{31}}{\partial w^3_{11}},
\]
which is basically the chain rule.
• Similarly, to find the derivative with respect to \(w^3_{21}\):
\[
\frac{\partial L}{\partial w^3_{21}} = \frac{\partial L}{\partial O_{31}} \times \frac{\partial O_{31}}{\partial w^3_{21}}
\]
• To find the derivative with respect to \(w^2_{11}\):
\[
\frac{\partial L}{\partial w^2_{11}} = \frac{\partial L}{\partial O_{31}} \times \frac{\partial O_{31}}{\partial O_{21}} \times \frac{\partial O_{21}}{\partial w^2_{11}}
\]
• Because two units of the next layer, f21 and f22, are impacted, one more path must be added after finding the first derivative:
\[
\frac{\partial L}{\partial O_{31}} \times \frac{\partial O_{31}}{\partial O_{21}} \times \frac{\partial O_{21}}{\partial w^2_{11}} + \frac{\partial L}{\partial O_{31}} \times \frac{\partial O_{31}}{\partial O_{22}} \times \frac{\partial O_{22}}{\partial w^2_{12}}
\]
• As these derivatives are applied, the weights get updated, and \(\hat{y}\) keeps changing until we reach the global minimum.
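Putting the update rule and the chain rule together, here is a compact backpropagation sketch in Python (illustrative throughout: the layer sizes 4-3-2-1, random weights, sigmoid activations, omitted biases, and the single training example are all assumptions):

```python
import numpy as np

# Assumed sizes: 4 inputs -> 3 hidden -> 2 hidden -> 1 output, sigmoid throughout.
rng = np.random.default_rng(0)
sizes = [4, 3, 2, 1]
W = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]

sigmoid = lambda y: 1.0 / (1.0 + np.exp(-y))

x = rng.normal(size=4)      # one training input (made up)
target = np.array([1.0])    # its desired output

# Forward pass, caching each layer's output O for use in the backward pass.
O = [x]
for Wl in W:
    O.append(sigmoid(Wl @ O[-1]))

# Backward pass for L = (target - y_hat)**2, applying the chain rule layer by layer.
alpha = 0.1
delta = 2 * (O[-1] - target) * O[-1] * (1 - O[-1])  # dL/d(pre-activation) at output
for l in reversed(range(len(W))):
    grad_W = np.outer(delta, O[l])                # dL/dW at layer l
    delta = (W[l].T @ delta) * O[l] * (1 - O[l])  # propagate the gradient backwards
    W[l] = W[l] - alpha * grad_W                  # w_new = w_old - alpha * dL/dw
print(O[-1])  # prediction before the update
```

Each backward step multiplies node-local partial derivatives and aggregates over all incoming paths, exactly the path-aggregation described above.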
1.3 Demonstrate on simple example such as Sigmoid activation functions
The activation function is a mathematical “gate” in between the input feeding the current
neuron and its output going to the next layer. It can be as simple as a step function that turns
the neuron output on and off depending on a rule or threshold [3].
\[
\sigma(y) = \frac{1}{1 + e^{-y}}\,, \qquad y = \sum_{i=1}^{n} w_i x_i + b_i
\]
The inputs can be classified based on the sigmoid output, using a value that we decide as a threshold.
Figure 1.12: Sigmoid function
Here 0.5 is the threshold: any input that falls above the given threshold is classified into one cluster, and any input below the threshold is classified into the other. The sigmoid transforms the value to lie between 0 and 1; a value of exactly 0.5 is considered 0.
Figure 1.13: DNN using Sigmoid function
Nice property:
\[
\frac{d\sigma(x)}{dx} = \sigma(x)\,(1 - \sigma(x))
\]
where \(w_0, w_1, \ldots, w_n\) are the weights and \(x_1, x_2, \ldots, x_n\) are the inputs.
In the above diagram, the activation function, i.e., the sigmoid function, is applied to the summation. Differentiating the sigmoid function with respect to x:
\[
\frac{d\sigma(x)}{dx} = \frac{1}{(1+e^{-x})^2} \cdot e^{-x} = \frac{e^{-x}}{(1+e^{-x})^2} = \underbrace{\frac{1}{1+e^{-x}}}_{\sigma(x)} \cdot \underbrace{\frac{e^{-x}}{1+e^{-x}}}_{1-\sigma(x)}
\]
\[
\therefore \frac{d\sigma(x)}{dx} = \sigma(x)\,(1-\sigma(x)).
\]
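A short numeric confirmation of this property (illustrative Python; the test point x = 1.5 is arbitrary):

```python
import math

# Check the "nice property" sigma'(x) = sigma(x) * (1 - sigma(x)) numerically.
sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))
x, eps = 1.5, 1e-6
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
analytic = sigmoid(x) * (1 - sigmoid(x))
print(numeric, analytic)  # the two values should match closely
```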
Summary
Although a neural network can be viewed as a simulation of the learning process in living organisms, a more direct understanding of neural networks is as computational graphs. Such computational graphs perform recursive composition of simpler functions in order to learn more complex functions. Since these computational graphs are parameterized, the problem generally boils down to learning the parameters of the graph in order to optimize a loss function. The simplest types of neural networks are often basic machine learning models like least-squares regression. The real power of neural networks is unleashed by using more complex combinations of the underlying functions. The parameters of such networks are learned by using a dynamic programming method, referred to as backpropagation. There are several challenges associated with learning neural network models, such as overfitting and training instability. In recent years, numerous algorithmic advancements have reduced these problems. Lastly, the mathematical intuition behind forward and backward propagation has been derived in order to show how training on a dataset is done internally and how the error is minimized using backpropagation. The design of deep learning methods in specific domains such as text and images requires carefully crafted architectures.
Bibliography
[1] https://en.wikipedia.org/wiki/Geoffrey_Hinton
[2] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.
[3] https://www.youtube.com/watch?v=DKSZHN7jftIt=4s
[4] D. Rumelhart, G. Hinton, and R. Williams. Learning representations by back-propagating errors. Nature, 323 (6088), pp. 533–536, 1986.
[5] https://www.coursera.org
[6] https://towardsdatascience.com/understanding-backpropagation-algorithm-7bb3aa2f95fd
[7] https://missinglink.ai/