Neural Networks and
Fuzzy Systems
Multi-layer Feed forward Networks
Dr. Tamer Ahmed Farrag
Course No.: 803522-3
Course Outline
Part I : Neural Networks (11 weeks)
β€’ Introduction to Machine Learning
β€’ Fundamental Concepts of Artificial Neural Networks
(ANN)
β€’ Single layer Perceptron Classifier
β€’ Multi-layer Feed forward Networks
β€’ Single layer FeedBack Networks
β€’ Unsupervised learning
Part II : Fuzzy Systems (4 weeks)
β€’ Fuzzy set theory
β€’ Fuzzy Systems
2
Outline
β€’ Why do we need Multi-layer Feed forward Networks
(MLFF)?
β€’ Error Function (or Cost Function or Loss function)
β€’ Gradient Descent
β€’ Backpropagation
3
Why do we need Multi-layer Feed forward
Networks (MLFF)?
β€’ To overcome the failure of the single layer perceptron in
solving nonlinear problems.
β€’ First Suggestion:
β€’ Divide the problem space into smaller linearly separable
regions
β€’ Use a perceptron for each linearly separable region
β€’ Combine the outputs of the multiple hidden neurons in a
final decision neuron.
4
(Figure: the problem space divided into Region 1 and Region 2.)
Why do we need Multi-layer Feed forward
Networks (MLFF)?
β€’ Second suggestion
β€’ In some cases we need a curved decision boundary, or we want to solve
more complicated classification and regression problems.
β€’ So, we need to:
β€’ Add more layers
β€’ Increase the number of neurons in each layer.
β€’ Use a nonlinear activation function in the hidden layers.
β€’ So, we need Multi-layer Feed forward Networks (MLFF).
5
Notation for Multi-Layer Networks
β€’ Dealing with multi-layer networks is easy if a sensible notation is adopted.
β€’ We simply need another label (n) to tell us which layer in the network we
are dealing with.
β€’ Each unit $j$ in layer $n$ receives activations $out_i^{(n-1)} w_{ij}^{(n)}$ from the previous
layer of processing units and sends activations $out_j^{(n)}$ to the next layer of units.
6
(Figure: units in layer (0) connect to units in layer (1) through weights $w_{ij}^{(1)}$; in general, units in layer (n-1) connect to units in layer (n) through weights $w_{ij}^{(n)}$.)
ANN Representation
(1 input layer + 1 hidden layer +1 output layer)
7
for example:
$z_1^{(1)} = w_{11}^{(1)} x_1 + w_{21}^{(1)} x_2 + w_{31}^{(1)} x_3 + b_1^{(1)}$
$a_1^{(1)} = f(z_1^{(1)}) = \sigma(z_1^{(1)})$
$z_2^{(2)} = w_{12}^{(2)} a_1^{(1)} + w_{22}^{(2)} a_2^{(1)} + w_{32}^{(2)} a_3^{(1)} + b_2^{(2)}$
$y_2 = a_2^{(2)} = f(z_2^{(2)}) = \sigma(z_2^{(2)})$
In general:
$z_j^{(l)} = \sum_i w_{ij}^{(l)} a_i^{(l-1)} + b_j^{(l)}$
$a_j^{(l)} = f(z_j^{(l)}) = \sigma(z_j^{(l)})$
(Figure: layer (0) holds the inputs $x_1 = a_1^{(0)}$, $x_2 = a_2^{(0)}$, $x_3 = a_3^{(0)}$; layer (1) holds hidden units $(z_1^{(1)}, a_1^{(1)})$, $(z_2^{(1)}, a_2^{(1)})$, $(z_3^{(1)}, a_3^{(1)})$ reached through weights $w_{ij}^{(1)}$; layer (2) holds output units $(z_1^{(2)}, a_1^{(2)})$, $(z_2^{(2)}, a_2^{(2)})$ reached through weights $w_{ij}^{(2)}$ and producing the outputs $y_1$ and $y_2$.)
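As a minimal sketch, the forward pass above can be written directly in Python/NumPy using the layer sizes from the figure (3 inputs, 3 hidden units, 2 outputs); the weight values here are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up example weights: W[l][i, j] connects unit i in layer l-1 to unit j in layer l.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 3)), np.zeros(3)   # layer 0 (3 inputs) -> layer 1 (3 hidden)
W2, b2 = rng.normal(size=(3, 2)), np.zeros(2)   # layer 1 (3 hidden) -> layer 2 (2 outputs)

x = np.array([0.5, -1.0, 2.0])   # a^(0): the input vector

z1 = x @ W1 + b1                 # z_j^(1) = sum_i w_ij^(1) a_i^(0) + b_j^(1)
a1 = sigmoid(z1)                 # a_j^(1) = sigma(z_j^(1))
z2 = a1 @ W2 + b2                # z_j^(2) = sum_i w_ij^(2) a_i^(1) + b_j^(2)
y  = sigmoid(z2)                 # network outputs y_1, y_2
print(y)
```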
Gradient Descent
and Backpropagation
Error Function
● How can we evaluate the performance of a neuron?
● We can use an error function (or cost function or
loss function) to measure how far off we are from
the expected value.
● Choosing an appropriate error function helps the
learning algorithm reach the best values for the
weights and biases.
● We’ll use the following variables:
β—‹ D to represent the true value (desired value)
β—‹ y to represent the neuron’s prediction 9
Error Functions
(Cost Function or Loss Function)
β€’ There are many formulas for error functions.
β€’ In this course, we will deal with two error function
formulas.
Sum Squared Error (SSE):
$e_{pj} = (y_j - D_j)^2$   for a single perceptron
$E_{SSE} = \sum_{j=1}^{n} (y_j - D_j)^2$   (1)
Cross entropy (CE):
$E_{CE} = -\frac{1}{n} \sum_{j=1}^{n} \left[ D_j \ln(y_j) + (1 - D_j) \ln(1 - y_j) \right]$   (2)
10
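As a sketch, both error formulas can be computed directly; `y` and `D` below are hypothetical prediction and target vectors, not values from the slides.

```python
import numpy as np

y = np.array([0.8, 0.2, 0.6])   # hypothetical predictions
D = np.array([1.0, 0.0, 1.0])   # desired (true) values

# Sum Squared Error, equation (1)
E_sse = np.sum((y - D) ** 2)

# Cross entropy, equation (2); a small epsilon guards against log(0)
eps = 1e-12
E_ce = -np.mean(D * np.log(y + eps) + (1 - D) * np.log(1 - y + eps))

print(E_sse, E_ce)
```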
Why does error occur in an ANN?
β€’ Every weight and bias in the network contributes to
the error.
β€’ To reduce it we need:
β€’ A cost function or error function to compute the error
(SSE or CE).
β€’ An optimization algorithm to minimize the error
function (Gradient Descent).
β€’ A learning algorithm to modify the weights and biases to
new values that bring the error down (Backpropagation).
β€’ Repeat this operation until we find the best solution.
11
Gradient Descent (in 1 dimension)
β€’ Assume we have an error function E and we need to
use it to update one weight w.
β€’ The figure shows the error function in terms of w.
β€’ Our target is to learn the value of w that produces the
minimum value of E.
How?
12
(Figure: E plotted against w, with its minimum marked.)
Gradient Descent (in 1 dimension)
β€’ In the Gradient Descent algorithm, we use the following
equation to get a better value of w:
$w = w - \alpha \Delta w$   (called the Delta rule)
Where:
$\alpha$ : the learning rate
$\Delta w$ : can be computed mathematically using the derivative of E with respect to w, $\frac{dE}{dw}$
13
(Figure: E plotted against w, with its minimum marked.)
$w = w - \alpha \frac{dE}{dw}$   (3)
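A one-dimensional sketch of the delta rule (3), using a toy error function $E(w) = (w - 2)^2$ (made up for illustration) whose minimum the update should find.

```python
# Toy example: E(w) = (w - 2)^2, so dE/dw = 2*(w - 2) and the minimum is at w = 2.
def dE_dw(w):
    return 2.0 * (w - 2.0)

w = -5.0        # arbitrary starting weight
alpha = 0.1     # learning rate
for _ in range(100):
    w = w - alpha * dE_dw(w)   # equation (3): w = w - alpha * dE/dw
print(w)        # converges close to 2.0
```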
Local Minima problem
14
Choosing learning rate
15
Gradient Descent (multiple dimensions)
β€’ In an ANN with many layers and many neurons in each layer, the
error function is a multi-variable function.
β€’ So, the derivative in equation (3) becomes a partial derivative:
$w_{ij} = w_{ij} - \alpha \frac{\partial E_j}{\partial w_{ij}}$   (4)
β€’ We write equation (4) as:
$w_{ij} = w_{ij} - \alpha\, \partial w_{ij}$
β€’ The same process is used to get the
new bias value:
$b_j = b_j - \alpha\, \partial b_j$
16
Derivative of activation functions
17
(Table: common activation functions and their derivatives; for the sigmoid, $\sigma(z) = \frac{1}{1 + e^{-z}}$ and $\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$.)
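A short numerical check of the sigmoid derivative used below, $\sigma'(z) = \sigma(z)(1 - \sigma(z))$, against a finite-difference approximation (the test point is arbitrary).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = 0.7
analytic = sigmoid(z) * (1.0 - sigmoid(z))            # sigma'(z) = sigma(z)(1 - sigma(z))
h = 1e-6
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)  # central finite difference
print(analytic, numeric)                               # the two values agree closely
```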
Learning Rule in the output layer
using SSE as the error function and sigmoid as the activation function
$\frac{\partial E_j}{\partial w_{ij}} = \frac{\partial E_j}{\partial a_j^{(l)}} \cdot \frac{\partial a_j^{(l)}}{\partial z_j^{(l)}} \cdot \frac{\partial z_j^{(l)}}{\partial w_{ij}^{(l)}}$
Where:
$E_j = (y_j - D_j)^2$
$y_j = a_j^{(l)} = f(z_j^{(l)}) = \sigma(z_j^{(l)})$
$z_j^{(l)} = \sum_i w_{ij}^{(l)} a_i^{(l-1)} + b_j^{(l)}$
From the previous table:
$\sigma'(z_j^{(l)}) = \sigma(z_j^{(l)}) \left(1 - \sigma(z_j^{(l)})\right) = y_j (1 - y_j)$
18
Learning Rule in the output layer (cont.)
So (how?),
$\frac{\partial y_j}{\partial z_j} = y_j (1 - y_j)$
$\frac{\partial z_j}{\partial w_{ij}} = a_i^{(l-1)}$
$\frac{\partial E_j}{\partial y_j} = 2\,(y_j - D_j)$
β€’ Then:
$\frac{\partial E_j}{\partial w_{ij}} = 2\, a_i^{(l-1)} (y_j - D_j)\, y_j (1 - y_j)$
$w_{ij} = w_{ij} - 2\,\alpha\, a_i^{(l-1)} (y_j - D_j)\, y_j (1 - y_j)$
19
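Putting the three partial derivatives together, a sketch of the output-layer update for a single weight; the values for the previous-layer activation, the prediction, and the target are made up for illustration.

```python
# Hypothetical values for one output neuron j and one incoming unit i
a_prev = 0.6    # a_i^(l-1), activation from the previous layer
y_j    = 0.9    # sigmoid output of neuron j
D_j    = 1.0    # desired value
w_ij   = 0.3
alpha  = 0.5

# dE_j/dw_ij = 2 * a_i^(l-1) * (y_j - D_j) * y_j * (1 - y_j)
grad = 2.0 * a_prev * (y_j - D_j) * y_j * (1.0 - y_j)
w_ij = w_ij - alpha * grad   # w_ij = w_ij - 2*alpha*a_i^(l-1)*(y_j - D_j)*y_j*(1 - y_j)
print(grad, w_ij)
```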
Learning Rule in the Hidden layer
β€’ Now we have to determine the appropriate
weight change for an input-to-hidden weight.
β€’ This is more complicated because it depends on
the error at all of the nodes this weighted
connection can lead to.
β€’ The mathematical derivation is beyond the scope of this course.
20
Gradient Descent (Notes)
Note 1:
β€’ The neuron activation function (f) should be a defined
and differentiable function.
Note 2:
β€’ The previous calculation is repeated for every
weight and every bias in the ANN.
β€’ So, we need substantial computational power (what about
deeper networks?)
Note 3:
β€’ Calculating $\partial w_{ij}$ for the hidden layers is
more difficult (Why?)
21
Gradient Descent (Notes)
β€’ $\partial w_{ij}$ represents the change in the value of $w_{ij}$
needed to get a better output.
β€’ The equation for $\partial w_{ij}$ depends on the choice
of error (cost) function and activation function.
β€’ The Gradient Descent algorithm helps in calculating the
new values of the weights and biases.
β€’ Question: is one iteration (one trial) enough to
get the best values for the weights and biases?
β€’ Answer: No, we need an extended version:
Backpropagation
22
How does Backpropagation work?
23
(Figure: a small network with three inputs, two hidden units, and one output $y$; weights $w_{ij}^{(1)}$ connect layer 0 to layer 1 and weights $w_{ij}^{(2)}$ connect layer 1 to layer 2. Forward propagation runs from layer 0 to layer 2; backpropagation runs from layer 2 back to layer 0, applying updates such as $w_{11}^{(2)} = w_{11}^{(2)} - \alpha\, \partial w_{11}^{(2)}$ and then $w_{11}^{(1)} = w_{11}^{(1)} - \alpha\, \partial w_{11}^{(1)}$.)
Online Learning vs. Offline Learning
β€’ Online: Pattern-by-Pattern
learning
β€’ Error calculated for each
pattern
β€’ Weights updated after each
individual pattern
πš«π’˜π’Šπ’‹ = βˆ’πœΆ
𝝏𝑬 𝒑
ππ’˜π’Šπ’‹
β€’ Offline: Batch learning
β€’ Error calculated for all
patterns
β€’ Weights updated once at
the end of each epoch
πš«π’˜π’Šπ’‹ = βˆ’πœΆ
𝒑
𝝏𝑬 𝒑
ππ’˜π’Šπ’‹
24
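A sketch contrasting the two update schedules for a single weight w, using a made-up per-pattern gradient (illustration only): online mode updates w after every pattern, batch mode once per epoch.

```python
# Hypothetical per-pattern gradient dE_p/dw for a single weight w
def grad_for_pattern(w, pattern):
    x, d = pattern
    return 2 * (w * x - d) * x          # gradient of the per-pattern error (w*x - d)^2

patterns = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]
alpha = 0.01

# Online (pattern-by-pattern): update after each individual pattern
w_online = 0.0
for p in patterns:
    w_online -= alpha * grad_for_pattern(w_online, p)

# Offline (batch): sum gradients over all patterns, then update once per epoch
w_batch = 0.0
total_grad = sum(grad_for_pattern(w_batch, p) for p in patterns)
w_batch -= alpha * total_grad

print(w_online, w_batch)
```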
Choosing Appropriate Activation and Cost
Functions
β€’ We already know, from our study of single layer networks, which
output activation and cost functions should be used for
particular problem types.
β€’ We have also seen that non-linear hidden unit activations are
needed, such as sigmoids.
β€’ So we can summarize the required network properties:
β€’ Regression/ Function Approximation Problems
β€’ SSE cost function, linear output activations, sigmoid hidden activations
β€’ Classification Problems (2 classes, 1 output)
β€’ CE cost function, sigmoid output and hidden activations
β€’ Classification Problems (multiple-classes, 1 output per class)
β€’ CE cost function, softmax outputs, sigmoid hidden activations
β€’ In each case, application of the gradient descent learning
algorithm (by computing the partial derivatives) leads to
appropriate back-propagation weight update equations.
25
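The summary above can be captured as a small lookup table; a sketch only, with illustrative names that do not correspond to any particular library.

```python
# Illustrative mapping: problem type -> (cost function, output activation, hidden activation)
NETWORK_CHOICES = {
    "regression":                ("SSE", "linear",  "sigmoid"),
    "classification_2_class":    ("CE",  "sigmoid", "sigmoid"),
    "classification_multiclass": ("CE",  "softmax", "sigmoid"),
}

cost, out_act, hidden_act = NETWORK_CHOICES["classification_2_class"]
print(cost, out_act, hidden_act)
```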
Overall picture : learning process on ANN
26
Neural network simulator
β€’ Search the internet for a neural network simulator and
report on it.
For example:
β€’ https://www.mladdict.com/neural-network-simulator
β€’ http://playground.tensorflow.org/
27