CHAPTER 04
MULTILAYER PERCEPTRONS
CSC445: Neural Networks
Prof. Dr. Mostafa Gadal-Haqq M. Mostafa
Computer Science Department
Faculty of Computer & Information Sciences
AIN SHAMS UNIVERSITY
(Most of the figures in this presentation are copyrighted to Pearson Education, Inc.)
ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq
 Introduction
 Limitation of Rosenblatt’s Perceptron
 Batch Learning and On-line Learning
 The Back-propagation Algorithm
 Heuristics for Making the BP Alg. Perform Better
 Computer Experiment
2
Multilayer Perceptron
ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq
Introduction
 Limitation of Rosenblatt’s Perceptron
 AND operation:
3
A single-neuron perceptron with inputs x1, x2, a fixed bias input +1, weights w0, w1, w2, and output y.

Truth table (x1, x2 → d):
(0, 0) → 0
(0, 1) → 0
(1, 0) → 0
(1, 1) → 1

The corresponding constraints on the weights are:
0·w1 + 0·w2 + w0 < 0
0·w1 + 1·w2 + w0 < 0
1·w1 + 0·w2 + w0 < 0
1·w1 + 1·w2 + w0 ≥ 0
that is:  w0 < 0,  w2 + w0 < 0,  w1 + w0 < 0,  w1 + w2 + w0 ≥ 0.

It is easy to find a set of weights that satisfies the above inequalities, e.g.,
y = f(10 x1 + 10 x2 − 20),  with  f(z) = 1 / (1 + e^(−z)).
The decision boundary is linear.
ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq
Introduction
 Limitation of Rosenblatt’s Perceptron
 OR Operation:
4
A single-neuron perceptron with inputs x1, x2, a fixed bias input +1, weights w0, w1, w2, and output y.

Truth table (x1, x2 → d):
(0, 0) → 0
(0, 1) → 1
(1, 0) → 1
(1, 1) → 1

The corresponding constraints on the weights are:
0·w1 + 0·w2 + w0 < 0
0·w1 + 1·w2 + w0 ≥ 0
1·w1 + 0·w2 + w0 ≥ 0
1·w1 + 1·w2 + w0 ≥ 0
that is:  w0 < 0,  w2 + w0 ≥ 0,  w1 + w0 ≥ 0,  w1 + w2 + w0 ≥ 0.

It is easy to find a set of weights that satisfies the above inequalities, e.g.,
y = f(20 x1 + 20 x2 − 10),  with  f(z) = 1 / (1 + e^(−z)).
The decision boundary is linear.
ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq
Introduction
 Limitation of Rosenblatt’s Perceptron
 XOR Operation:
5
A single-neuron perceptron with inputs x1, x2, a fixed bias input +1, weights w0, w1, w2, and output y = f(?).

Truth table (x1, x2 → d):
(0, 0) → 0
(0, 1) → 1
(1, 0) → 1
(1, 1) → 0

The corresponding constraints on the weights are:
0·w1 + 0·w2 + w0 < 0
0·w1 + 1·w2 + w0 ≥ 0
1·w1 + 0·w2 + w0 ≥ 0
1·w1 + 1·w2 + w0 < 0
that is:  w0 < 0,  w2 + w0 ≥ 0,  w1 + w0 ≥ 0,  w1 + w2 + w0 < 0.

Clearly the second and third inequalities are incompatible with the fourth, so
there is no solution for the XOR problem. We need more complex networks!
The required decision boundary is non-linear.
ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq
The XOR Problem
 A two-layer Network to solve the XOR Problem
Figure 4.8 (a) Architectural graph of network for solving the XOR problem. (b)
Signal-flow graph of the network.
6
The weights and biases of the network in Fig. 4.8 are:
Hidden neuron 1:  w11 = w12 = +1,  bias b1 = −3/2
Hidden neuron 2:  w21 = w22 = +1,  bias b2 = −1/2
Output neuron 3:  w31 = −2,  w32 = +1,  bias b3 = −1/2
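These numbers can be checked directly. Below is a minimal sketch (written for these notes, not taken from the slides) that hard-codes the weights and biases above and assumes threshold units that output 1 when the induced local field is positive and 0 otherwise; it verifies that the network reproduces the XOR truth table.

```python
# Sketch: verify the two-layer XOR network of Fig. 4.8.
# Assumption: each neuron is a threshold unit, y = 1 if v > 0 else 0.

def threshold(v):
    return 1 if v > 0 else 0

def xor_net(x1, x2):
    y1 = threshold(1.0 * x1 + 1.0 * x2 - 1.5)     # hidden neuron 1: fires only for (1, 1)
    y2 = threshold(1.0 * x1 + 1.0 * x2 - 0.5)     # hidden neuron 2: fires for any active input
    return threshold(-2.0 * y1 + 1.0 * y2 - 0.5)  # output neuron

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_net(x1, x2))      # 0 0 -> 0, 0 1 -> 1, 1 0 -> 1, 1 1 -> 0
```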
ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq
The XOR Problem
 A two-layer Network to solve the XOR Problem
Figure 4.9 (a) Decision boundary constructed by hidden neuron 1 of the network in
Fig. 4.8. (b) Decision boundary constructed by hidden neuron 2 of the network. (c)
Decision boundaries constructed by the complete network.
7
ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq 8
MLP: Some Preliminaries
 The multilayer perceptron (MLP) is
proposed to overcome the limitations of the
perceptron
 That is, building a network that can solve
nonlinear problems.
 The basic features of the multilayer perceptrons:
 Each neuron in the network includes a nonlinear activation
function that is differentiable.
 The network contains one or more layers that are hidden from
both the input and output nodes.
 The network exhibits a high degree of connectivity.
ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq
MLP: Some Preliminaries
 Architecture of a multilayer perceptron
Figure 4.1 Architectural graph of a multilayer perceptron with two hidden layers.
9
ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq
MLP: Some Preliminaries
 Weight Dimensions
10
If network has n units in layer i , m units in layer i +1 , then the weight
matrix Wij will be of dimension m x (n+1) .
Wij
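As a concrete illustration (a sketch written for these notes, not from the slides), the dimension rule can be checked with NumPy: with n = 4 units in layer i and m = 3 units in layer i+1, the weight matrix that also absorbs the bias weights has shape 3 × 5.

```python
import numpy as np

n, m = 4, 3                          # n units in layer i, m units in layer i+1
W = np.zeros((m, n + 1))             # one extra column holds the bias weights

y = np.ones(n + 1)                   # y[0] = +1 is the fixed bias input
y[1:] = np.random.default_rng(0).standard_normal(n)

v = W @ y                            # induced local fields of the m units in layer i+1
print(W.shape, v.shape)              # (3, 5) (3,)
```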
ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq
MLP: Some Preliminaries
 Number of neurons in the output layer
11
Pedestrian Car Motorcycle Truck
Desired-response (target) vectors, one output neuron per class:
Pedestrian: [1, 0, 0, 0]
Car: [0, 1, 0, 0]
Motorcycle: [0, 0, 1, 0]
Truck: [0, 0, 0, 1]
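A short sketch of this one-per-class (one-hot) encoding of the desired response; the class names follow the slide, and the helper function is only illustrative.

```python
import numpy as np

classes = ["Pedestrian", "Car", "Motorcycle", "Truck"]

def one_hot(label):
    """Return the desired-response vector with a 1 at the position of the class."""
    d = np.zeros(len(classes))
    d[classes.index(label)] = 1.0
    return d

print(one_hot("Motorcycle"))   # [0. 0. 1. 0.]
```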
ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq 12
MLP: Some Preliminaries
 Training of the multilayer perceptron proceeds in
two phases:
 In the forward phase, the weights of the network are fixed and
the input signal is propagated through the network, layer by
layer, until it reaches the output.
 In the backward phase, the error signal, which is produced by
comparing the output of the network and the desired response,
is propagated through the network, again layer by layer, but in
the backward direction.
ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq
MLP: Some Preliminaries
 Function Signal:
 is the input signal that comes in
at the input end of the network,
propagates forward (neuron
by neuron) through the network,
and emerges at the output of the
network as an output signal.
 Error Signal:
originates at an output neuron of
the network and propagates
backward (layer by layer)
through the network.
 Each hidden or output
neuron computes these two
signals.
Figure 4.2 Illustration of the
directions of two basic signal flows
in a multilayer perceptron: forward
propagation of function signals
and back propagation of error
signals.
13
ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq
MLP: Some Preliminaries
 Function of the Hidden neurons
 The hidden neurons play a critical role in the operation of a
multilayer perceptron; they act as feature detectors.
 The nonlinearity transforms the input data into a feature
space in which data may be separated easily.
 Credit Assignment Problem
 It is the problem of assigning credit or blame for overall outcomes to the internal decisions made by the computational units of the distributed learning system.
 The error-correction learning algorithm is easy to use for training single-layer perceptrons, but it is not easy to apply to multilayer perceptrons;
 the backpropagation algorithm solves this problem.
14
ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq
The Back-propagation Algorithm
 An on-line learning algorithm.
Figure 4.3 Signal-flow graph highlighting the
details of output neuron j.
15
vj(n) = Σi wji(n) yi(n),   i = 0, 1, …, m
yj(n) = φj(vj(n))
ej(n) = dj(n) − yj(n)
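In code, the three relations for output neuron j read as follows (a minimal sketch; the logistic function is used here as the activation φ, and the numbers are made up for illustration):

```python
import numpy as np

def phi(v):                           # logistic activation
    return 1.0 / (1.0 + np.exp(-v))

y_in = np.array([1.0, 0.3, -0.7])     # y_0 = +1 (bias input), y_1, y_2 from the previous layer
w_j  = np.array([0.1, 0.4, -0.2])     # w_j0 (bias weight), w_j1, w_j2
d_j  = 1.0                            # desired response

v_j = np.dot(w_j, y_in)               # induced local field  v_j(n)
y_j = phi(v_j)                        # output               y_j(n) = phi(v_j(n))
e_j = d_j - y_j                       # error signal         e_j(n) = d_j(n) - y_j(n)
print(v_j, y_j, e_j)
```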
ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq
The Back-propagation Algorithm
 The weights are updated in a manner similar to the LMS algorithm and the gradient descent method. That is, the instantaneous error and the weight correction are:
Ej(n) = (1/2) ej²(n)   and   Δwji(n) = −η ∂E(n)/∂wji(n)
 Using the chain rule of calculus, we get:
∂E(n)/∂wji(n) = [∂E(n)/∂ej(n)] [∂ej(n)/∂yj(n)] [∂yj(n)/∂vj(n)] [∂vj(n)/∂wji(n)]
 We have:
∂E(n)/∂ej(n) = ej(n),   ∂ej(n)/∂yj(n) = −1,   ∂yj(n)/∂vj(n) = φ'j(vj(n)),   and   ∂vj(n)/∂wji(n) = yi(n)
16
ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq
The Back-propagation Algorithm
 which yields:
∂E(n)/∂wji(n) = −ej(n) φ'j(vj(n)) yi(n)
 Then the weight correction is given by:
Δwji(n) = η δj(n) yi(n)
 where the local gradient δj(n) is defined by:
δj(n) = −∂E(n)/∂vj(n) = −[∂E(n)/∂ej(n)] [∂ej(n)/∂yj(n)] [∂yj(n)/∂vj(n)] = ej(n) φ'j(vj(n))
17
ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq
The Back-propagation Algorithm
 That is, the local gradient of neuron j is equal to the product of the corresponding error signal of that neuron and the derivative of the associated activation function. Then, we have two distinct cases:
 Case 1: Neuron j is an output node:
 In this case, it is easy to use the credit assignment rule to compute
the error signal ej(n), because we have the desired signal visible to
the output neuron. That is, ej(n)=dj(n) - yj(n).
 Case 2: Neuron j is a hidden node:
 In this case, the desired signal is not visible to the hidden neuron.
Accordingly, the error signal for the hidden neuron would have to be
determined recursively and working backwards in terms of the
error signals of all the neurons to which that hidden neuron is
directly connected.
18
ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq
The Back-propagation Algorithm
 Case 2: Neuron j is a hidden node.
Figure 4.4 Signal-flow graph highlighting the details of output neuron k connected
to hidden neuron j.
19
ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq
The Back-propagation Algorithm
 We redefine the local gradient for a hidden neuron j as:
δj(n) = −[∂E(n)/∂yj(n)] [∂yj(n)/∂vj(n)] = −[∂E(n)/∂yj(n)] φ'j(vj(n))
 where the total instantaneous error over the output neurons k ∈ C is:
E(n) = (1/2) Σk∈C ek²(n)
 Differentiating w.r.t. yj(n) yields:
∂E(n)/∂yj(n) = Σk ek(n) ∂ek(n)/∂yj(n) = Σk ek(n) [∂ek(n)/∂vk(n)] [∂vk(n)/∂yj(n)]
 But
ek(n) = dk(n) − yk(n) = dk(n) − φk(vk(n))
 Hence
∂ek(n)/∂vk(n) = −φ'k(vk(n))
20
ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq
The Back-propagation Algorithm
 Also, we have
vk(n) = Σj wkj(n) yj(n),   j = 0, 1, …, m
 Differentiating yields:
∂vk(n)/∂yj(n) = wkj(n)
 Then, we get
∂E(n)/∂yj(n) = −Σk ek(n) φ'k(vk(n)) wkj(n) = −Σk δk(n) wkj(n)
 Finally, the back-propagation formula for the local gradient of (hidden) neuron j (where neuron k is an output neuron) is:
δj(n) = φ'j(vj(n)) Σk δk(n) wkj(n)
21
ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq
The Back-propagation Algorithm
Figure 4.5 Signal-flow graph of a part of the adjoint system pertaining to back-
propagation of error signals.
22
ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq
The Back-propagation Algorithm
 We summarize the relations for the back-propagation algorithm:
 First: the correction Δwji(n) applied to the weight connecting neuron i to neuron j is defined by the delta rule:
(weight correction Δwji(n)) = (learning-rate parameter η) × (local gradient δj(n)) × (input signal of neuron j, yi(n))
 Second: the local gradient δj(n) depends on whether neuron j is an output or a hidden node:
 Neuron j is an output node:
δj(n) = ej(n) φ'j(vj(n)),   where ej(n) = dj(n) − yj(n)
 Neuron j is a hidden node (neuron k is output or hidden):
δj(n) = φ'j(vj(n)) Σk δk(n) wkj(n)
23
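These two rules are all that is needed to train a small MLP on-line. The sketch below is an illustration written for these notes (not the book's code): one hidden layer, the logistic activation with φ'(v) = y(1 − y), and the delta rule Δw = η δ y applied example by example to the XOR mapping discussed earlier. With these settings it usually converges, although, as noted later for on-line learning, a particular run may still get stuck in a local minimum.

```python
import numpy as np

rng = np.random.default_rng(1)
eta = 0.5                                    # learning-rate parameter

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
D = np.array([0, 1, 1, 0], dtype=float)      # XOR targets

W1 = rng.uniform(-0.5, 0.5, (2, 3))          # hidden layer: 2 neurons, 2 inputs + bias
W2 = rng.uniform(-0.5, 0.5, (1, 3))          # output layer: 1 neuron, 2 hidden + bias
phi = lambda v: 1.0 / (1.0 + np.exp(-v))     # logistic activation

for epoch in range(5000):                    # on-line (sequential) updates
    for x, d in zip(X, D):
        y0 = np.append(1.0, x)               # +1 bias input
        y1 = phi(W1 @ y0)                    # hidden-layer outputs
        y1b = np.append(1.0, y1)
        y2 = phi(W2 @ y1b)                   # network output

        e = d - y2                           # error signal at the output
        delta2 = e * y2 * (1 - y2)                        # output node: e * phi'(v)
        delta1 = y1 * (1 - y1) * (W2[:, 1:].T @ delta2)   # hidden node: phi'(v) * sum_k delta_k w_kj

        W2 += eta * np.outer(delta2, y1b)    # delta rule: eta * delta * input signal
        W1 += eta * np.outer(delta1, y0)

outputs = [phi(W2 @ np.append(1.0, phi(W1 @ np.append(1.0, x))))[0] for x in X]
print(outputs)                               # should approach [0, 1, 1, 0]
```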
ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq
The Activation Function
 Differentiability is the only requirement that an activation function has to satisfy in the BP algorithm.
 This is required to compute the local gradient δ for each neuron.
 Sigmoidal functions are commonly used, since they satisfy such a condition:
 Logistic Function:
φ(v) = 1 / (1 + exp(−av)),   a > 0
φ'(v) = a exp(−av) / [1 + exp(−av)]² = a φ(v) [1 − φ(v)]
 Hyperbolic Tangent Function:
φ(v) = a tanh(bv),   a, b > 0
φ'(v) = (b/a) [a − φ(v)] [a + φ(v)]
24
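A small sketch of the two activation functions and their derivatives exactly as written above; the default constants a = 1.7159 and b = 2/3 for the hyperbolic tangent are a commonly recommended choice, not something required by the algorithm.

```python
import numpy as np

def logistic(v, a=1.0):
    return 1.0 / (1.0 + np.exp(-a * v))

def logistic_prime(v, a=1.0):
    y = logistic(v, a)
    return a * y * (1.0 - y)                 # a * phi(v) * [1 - phi(v)]

def tanh_act(v, a=1.7159, b=2.0 / 3.0):
    return a * np.tanh(b * v)

def tanh_prime(v, a=1.7159, b=2.0 / 3.0):
    y = tanh_act(v, a, b)
    return (b / a) * (a - y) * (a + y)       # (b/a) * [a - phi(v)] * [a + phi(v)]

v = np.linspace(-3.0, 3.0, 7)
print(logistic(v), logistic_prime(v))
print(tanh_act(v), tanh_prime(v))
```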
ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq
The Rate of Learning
 A simple method of increasing the rate of learning while avoiding instability (for a large learning rate η) is to modify the delta rule by including a momentum term:
Δwji(n) = α Δwji(n−1) + η δj(n) yi(n)
 where α is usually a positive number called the momentum constant.
 To ensure convergence, the momentum constant must be restricted to 0 ≤ |α| < 1.
Figure 4.6 Signal-flow graph illustrating the effect of momentum constant α, which lies inside the feedback loop.
25
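A minimal sketch of this generalized delta rule, keeping the previous weight correction between iterations (the variable names are mine, not the slides'):

```python
import numpy as np

eta, alpha = 0.1, 0.9            # learning rate and momentum constant, 0 <= |alpha| < 1

w = np.zeros(3)                  # weights of one neuron (bias weight included)
dw_prev = np.zeros_like(w)       # previous correction, Delta w_ji(n-1)

def momentum_update(delta_j, y_in):
    """Delta w(n) = alpha * Delta w(n-1) + eta * delta_j(n) * y_i(n)."""
    global w, dw_prev
    dw = alpha * dw_prev + eta * delta_j * y_in
    w += dw
    dw_prev = dw
    return dw

print(momentum_update(0.5, np.array([1.0, 0.2, -0.4])))
print(momentum_update(0.3, np.array([1.0, 0.2, -0.4])))   # second step reuses the first correction
```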
ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq
Summary of the Back-propagation Algorithm
1. Initialization
2. Presentation of
training example
3. Forward
computation
4. Backward
computation
5. Iteration
Figure 4.7 Signal-flow graphical summary of back-propagation learning. Top part of
the graph: forward pass. Bottom part of the graph: backward pass.
26
ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq
Heuristics for making the BP Better
1. Stochastic vs. Batch update
 Stochastic (sequential) mode is computationally faster than the
batch mode.
2. Maximizing information content
 Use an example that results in a large training error.
 Use an example that is radically different from the others.
3. Activation function
 Use an odd activation function:
 the hyperbolic tangent rather than the logistic function,
φ(v) = a tanh(bv)
27
ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq
Heuristics for making the BP Better
4. Target values
 It is very important to choose the values of the desired response to be within the range of the sigmoid activation function.
5. Normalizing the input
 Each input should be
preprocessed so that its mean
value, averaged over the entire
training sample, is close to zero,
or else it will be small
compared to its standard
deviation.
28
Figure 4.11 Illustrating the operation of mean
removal, decorrelation, and covariance
equalization for a two-dimensional input space.
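A sketch of the mean-removal and variance-scaling part of this heuristic (decorrelation and covariance equalization, shown in Figure 4.11, would typically be done with PCA and are omitted here):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(100, 3))   # raw training inputs, one row per example

mean = X.mean(axis=0)                # mean of each input variable over the training sample
std = X.std(axis=0)
X_norm = (X - mean) / std            # zero mean and unit variance for every input

print(X_norm.mean(axis=0).round(3), X_norm.std(axis=0).round(3))
```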
ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq
Heuristics for making the BP Better
6. Initialization
 A good choice will be of tremendous help.
 Initialize the weights so that the standard deviation of the induced local field v of a neuron lies in the transition area between the linear and saturated parts of its sigmoid function.
7. Learning from hints
 Is achieved by including prior information that we may have about the mapping function, e.g., symmetry, invariances, etc., in the learning process.
8. Learning rate
 All neurons in the multilayer perceptron should ideally learn at the same rate; however, the learning rate at the last layers should be assigned a smaller value than at the front layers.
29
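One simple way to realize the initialization heuristic (the 1/√m scaling below is an assumption consistent with it, not a rule stated on the slide): draw the weights of a neuron with m inputs from a zero-mean distribution with standard deviation 1/√m, so that for inputs normalized to zero mean and unit variance the induced local field v has a standard deviation near 1, i.e., in the transition region of the sigmoid.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_weights(m_inputs, n_neurons):
    """Zero-mean weights with std 1/sqrt(m), so v = w . x has roughly unit std."""
    sigma = 1.0 / np.sqrt(m_inputs)
    return rng.normal(0.0, sigma, size=(n_neurons, m_inputs))

W = init_weights(m_inputs=100, n_neurons=10)
x = rng.standard_normal(100)                    # normalized input: zero mean, unit variance
v = W @ x                                       # induced local fields
print(round(W.std(), 3), round(v.std(), 3))     # roughly 0.1 and 1.0
```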
ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq
Batch Learning and On-line Learning
 Consider the training sample used to train the network in a supervised manner:
T = {x(n), d(n); n = 1, 2, …, N}
 If yj(n) is the function signal produced at output neuron j, the error signal produced at the same neuron is:
ej(n) = dj(n) − yj(n)
 The instantaneous error produced at output neuron j is:
Ej(n) = (1/2) ej²(n)
 The total instantaneous error of the whole network is:
E(n) = Σj∈C Ej(n) = (1/2) Σj∈C ej²(n)
 The total instantaneous error averaged over the training sample is:
Eav = (1/N) Σ(n=1..N) E(n) = (1/(2N)) Σ(n=1..N) Σj∈C ej²(n)
30
ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq
Batch Learning and On-line Learning
Batch Learning:
 Adjustment of the weights of the MLP is performed after the
presentation of all the N training examples T.
 this is called an epoch of training.
 Thus, weight adjustment is made on an epoch-by-epoch basis.
 After each epoch, the examples in the training samples T are randomly
shuffled.
 Advantages:
 Accurate estimation of the gradient vector (the derivatives of the cost function Eav w.r.t. the weight vector w), which therefore guarantees the convergence of the method of steepest descent to a local minimum.
 Parallelization of the learning process.
 Disadvantages: it is demanding in terms of storage requirements.
31
ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq
Batch Learning and On-line Learning
On-line Learning:
 Adjustment of the weights of the MLP is performed on an example-by-example basis.
 The cost function to be minimized is therefore the total instantaneous error E(n).
 An epoch of training is the presentation of all the N samples to the network. Also, in each epoch the examples are randomly shuffled.
 Advantages:
 Its stochastic nature makes it less likely to be trapped in a local minimum.
 It is much less demanding in terms of storage requirements.
 Disadvantages:
 The learning process cannot be parallelized.
32
ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq
Batch Learning and On-line Learning
 Batch learning vs. On-line Learning:
33
Batch learning | On-line learning
The learning process is performed by ensemble averaging, which in a statistical context may be viewed as a form of statistical inference. | The learning process is performed in a stochastic manner.
Guaranteed convergence to a local minimum. | Less likely to be trapped in a local minimum.
Can be parallelized. | Cannot be parallelized.
Requires large storage. | Requires much less storage.
Well suited for nonlinear regression problems. | Well suited for pattern classification problems.
ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq
Generalization
 A network is said to generalize well when
the network input-output mapping is
correct (or nearly so) for the test data.
 The learning process may be viewed as a "curve-fitting" problem.
 When the network is trained on too many samples, it may become overfitted, or overtrained, which leads to poor generalization.
 Sufficient Training-Sample Size
 Generalization is influenced by three factors:
 The size of the training sample
 The network architecture
 The physical complexity of the problem at hand
 In practice, good generalization is achieved if the training sample size, N, satisfies
N = O(W / ε)
 where W is the number of free parameters in the network, and ε is the fraction of classification errors permitted on test data.
Figure 4.16 (a) Properly fitted nonlinear mapping with good generalization. (b) Overfitted nonlinear mapping with poor generalization.
34
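As a rough worked example of this rule: a network with W = 1,000 free parameters and a permitted classification error of ε = 0.1 (10%) would need on the order of N ≈ 1,000 / 0.1 = 10,000 training examples.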
ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq
Cross-Validation Method
 Cross-validation is a standard tool in statistics that provides an appealing guiding principle:
 First: the available data set is randomly partitioned into a
training set and a test set.
 Second: the training set is further partitioned into two disjoint
subsets:
 An estimation subset, used to select the model (estimate the
parameters).
 A validation subset, used to test or validate the model
 The training set is used to assess various models and choose the
“best” one.
 However, this best model may be overfitting the validation data.
 Then, to guard against this possibility, the generalization
performance is measured on the test set, which is different from
the validation subset.
35
ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq
Cross-Validation Method
 Early-stopping Method
 (Holdout method)
 The training is stopped
periodically, i.e., after so many
epochs, and the network is
assessed using the validation
subset.
 When the validation phase is
complete, the estimation
(training) is resumed for another
period, and the process is
repeated.
 The best model (parameters) is the one with the minimum validation error.
Figure 4.17 Illustration of the early-
stopping rule based on cross-
validation.
36
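A sketch of this early-stopping loop; train_one_period and validation_error are placeholders standing in for ordinary back-propagation training on the estimation subset and error evaluation on the validation subset (they are not defined on the slides):

```python
import copy

def early_stopping_training(net, train_one_period, validation_error,
                            max_periods=100, epochs_per_period=5):
    """Train in periods and keep the parameters with the lowest validation error."""
    best_error = float("inf")
    best_net = copy.deepcopy(net)
    for period in range(max_periods):
        train_one_period(net, epochs_per_period)   # estimation (training) phase
        err = validation_error(net)                # validation phase
        if err < best_error:                       # remember the best model so far
            best_error, best_net = err, copy.deepcopy(net)
    return best_net, best_error
```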
ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq
Cross-Validation Method
 Variant of Cross-Validation
 (Multifold Method)
 Divide the data set of N samples
into K subsets, where K>1.
 In each trial, the network is validated on a different subset after being trained on the other subsets.
 The performance of the model is
assessed by averaging the
squared error under validation
over all trials.
Figure 4.18 Illustration of the multifold
method of cross-validation. For a given trial,
the subset of data shaded in red is used to
validate the model trained on the remaining
data.
37
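A sketch of the multifold (K-fold) procedure; train_and_evaluate is a placeholder that trains a fresh network on the K−1 remaining subsets and returns its squared error on the held-out subset:

```python
import numpy as np

def multifold_cv(X, d, K, train_and_evaluate, seed=0):
    """Split the N examples into K subsets; each trial validates on a different subset."""
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, K)
    errors = []
    for k in range(K):
        val_idx = folds[k]
        train_idx = np.concatenate([folds[i] for i in range(K) if i != k])
        errors.append(train_and_evaluate(X[train_idx], d[train_idx],
                                         X[val_idx], d[val_idx]))
    return float(np.mean(errors))        # average squared error over all K trials
```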
ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq
Computer Experiment
 d= -4
Figure 4.12 Results of the computer experiment on the back-propagation
algorithm applied to the MLP with distance d = –4. MSE stands for mean-square
error.
38
ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq
Computer Experiment
 d = -5
Figure 4.13 Results of the computer experiment on the back-propagation
algorithm applied to the MLP with distance d = –5.
39
ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq
Real Experiment
 Handwritten Digit Recognition*
*Courtesy of Yann LeCun.
40
Homework 4
• Problems: 4.1, 4.3
• Computer Experiment: 4.15
41
Kernel Methods and
RBF Networks
Next Time
42

More Related Content

What's hot

Unit I & II in Principles of Soft computing
Unit I & II in Principles of Soft computing Unit I & II in Principles of Soft computing
Unit I & II in Principles of Soft computing Sivagowry Shathesh
 
Artificial Neural Networks - ANN
Artificial Neural Networks - ANNArtificial Neural Networks - ANN
Artificial Neural Networks - ANNMohamed Talaat
 
Convolution Neural Network (CNN)
Convolution Neural Network (CNN)Convolution Neural Network (CNN)
Convolution Neural Network (CNN)Suraj Aavula
 
Artifical Neural Network and its applications
Artifical Neural Network and its applicationsArtifical Neural Network and its applications
Artifical Neural Network and its applicationsSangeeta Tiwari
 
Introduction Of Artificial neural network
Introduction Of Artificial neural networkIntroduction Of Artificial neural network
Introduction Of Artificial neural networkNagarajan
 
Back propagation
Back propagationBack propagation
Back propagationNagarajan
 
Regularization in deep learning
Regularization in deep learningRegularization in deep learning
Regularization in deep learningKien Le
 
Soft computing (ANN and Fuzzy Logic) : Dr. Purnima Pandit
Soft computing (ANN and Fuzzy Logic)  : Dr. Purnima PanditSoft computing (ANN and Fuzzy Logic)  : Dr. Purnima Pandit
Soft computing (ANN and Fuzzy Logic) : Dr. Purnima PanditPurnima Pandit
 
Analytical learning
Analytical learningAnalytical learning
Analytical learningswapnac12
 
Artificial neural network
Artificial neural networkArtificial neural network
Artificial neural networkDEEPASHRI HK
 
Machine Learning: Introduction to Neural Networks
Machine Learning: Introduction to Neural NetworksMachine Learning: Introduction to Neural Networks
Machine Learning: Introduction to Neural NetworksFrancesco Collova'
 
Neural network
Neural networkNeural network
Neural networkSilicon
 

What's hot (20)

04 Multi-layer Feedforward Networks
04 Multi-layer Feedforward Networks04 Multi-layer Feedforward Networks
04 Multi-layer Feedforward Networks
 
Perceptron & Neural Networks
Perceptron & Neural NetworksPerceptron & Neural Networks
Perceptron & Neural Networks
 
HOPFIELD NETWORK
HOPFIELD NETWORKHOPFIELD NETWORK
HOPFIELD NETWORK
 
Unit I & II in Principles of Soft computing
Unit I & II in Principles of Soft computing Unit I & II in Principles of Soft computing
Unit I & II in Principles of Soft computing
 
Artificial Neural Networks - ANN
Artificial Neural Networks - ANNArtificial Neural Networks - ANN
Artificial Neural Networks - ANN
 
Mc culloch pitts neuron
Mc culloch pitts neuronMc culloch pitts neuron
Mc culloch pitts neuron
 
Convolution Neural Network (CNN)
Convolution Neural Network (CNN)Convolution Neural Network (CNN)
Convolution Neural Network (CNN)
 
Artifical Neural Network and its applications
Artifical Neural Network and its applicationsArtifical Neural Network and its applications
Artifical Neural Network and its applications
 
Hebb network
Hebb networkHebb network
Hebb network
 
Introduction Of Artificial neural network
Introduction Of Artificial neural networkIntroduction Of Artificial neural network
Introduction Of Artificial neural network
 
Back propagation
Back propagationBack propagation
Back propagation
 
Regularization in deep learning
Regularization in deep learningRegularization in deep learning
Regularization in deep learning
 
Multi Layer Network
Multi Layer NetworkMulti Layer Network
Multi Layer Network
 
Neural network
Neural networkNeural network
Neural network
 
Soft computing (ANN and Fuzzy Logic) : Dr. Purnima Pandit
Soft computing (ANN and Fuzzy Logic)  : Dr. Purnima PanditSoft computing (ANN and Fuzzy Logic)  : Dr. Purnima Pandit
Soft computing (ANN and Fuzzy Logic) : Dr. Purnima Pandit
 
Analytical learning
Analytical learningAnalytical learning
Analytical learning
 
Hopfield Networks
Hopfield NetworksHopfield Networks
Hopfield Networks
 
Artificial neural network
Artificial neural networkArtificial neural network
Artificial neural network
 
Machine Learning: Introduction to Neural Networks
Machine Learning: Introduction to Neural NetworksMachine Learning: Introduction to Neural Networks
Machine Learning: Introduction to Neural Networks
 
Neural network
Neural networkNeural network
Neural network
 

Similar to Neural Networks: Multilayer Perceptron

Digital Implementation of Artificial Neural Network for Function Approximatio...
Digital Implementation of Artificial Neural Network for Function Approximatio...Digital Implementation of Artificial Neural Network for Function Approximatio...
Digital Implementation of Artificial Neural Network for Function Approximatio...IOSR Journals
 
Digital Implementation of Artificial Neural Network for Function Approximatio...
Digital Implementation of Artificial Neural Network for Function Approximatio...Digital Implementation of Artificial Neural Network for Function Approximatio...
Digital Implementation of Artificial Neural Network for Function Approximatio...IOSR Journals
 
Implementation of Feed Forward Neural Network for Classification by Education...
Implementation of Feed Forward Neural Network for Classification by Education...Implementation of Feed Forward Neural Network for Classification by Education...
Implementation of Feed Forward Neural Network for Classification by Education...ijsrd.com
 
Artificial Neural Network Implementation On FPGA Chip
Artificial Neural Network Implementation On FPGA ChipArtificial Neural Network Implementation On FPGA Chip
Artificial Neural Network Implementation On FPGA ChipMaria Perkins
 
mohsin dalvi artificial neural networks presentation
mohsin dalvi   artificial neural networks presentationmohsin dalvi   artificial neural networks presentation
mohsin dalvi artificial neural networks presentationAkash Maurya
 
14. mohsin dalvi artificial neural networks presentation
14. mohsin dalvi   artificial neural networks presentation14. mohsin dalvi   artificial neural networks presentation
14. mohsin dalvi artificial neural networks presentationPurnesh Aloni
 
Study Of The Fault Diagnosis Based On Wavelet And Fuzzy Neural Network For Th...
Study Of The Fault Diagnosis Based On Wavelet And Fuzzy Neural Network For Th...Study Of The Fault Diagnosis Based On Wavelet And Fuzzy Neural Network For Th...
Study Of The Fault Diagnosis Based On Wavelet And Fuzzy Neural Network For Th...IJRES Journal
 
Neural Network Dynamical Systems
Neural Network Dynamical Systems Neural Network Dynamical Systems
Neural Network Dynamical Systems M Reza Rahmati
 
Mobile Network Coverage Determination at 900MHz for Abuja Rural Areas using A...
Mobile Network Coverage Determination at 900MHz for Abuja Rural Areas using A...Mobile Network Coverage Determination at 900MHz for Abuja Rural Areas using A...
Mobile Network Coverage Determination at 900MHz for Abuja Rural Areas using A...ijtsrd
 
Incorporating Kalman Filter in the Optimization of Quantum Neural Network Par...
Incorporating Kalman Filter in the Optimization of Quantum Neural Network Par...Incorporating Kalman Filter in the Optimization of Quantum Neural Network Par...
Incorporating Kalman Filter in the Optimization of Quantum Neural Network Par...Waqas Tariq
 
Efficiency of Neural Networks Study in the Design of Trusses
Efficiency of Neural Networks Study in the Design of TrussesEfficiency of Neural Networks Study in the Design of Trusses
Efficiency of Neural Networks Study in the Design of TrussesIRJET Journal
 
Neural Networks: Least Mean Square (LSM) Algorithm
Neural Networks: Least Mean Square (LSM) AlgorithmNeural Networks: Least Mean Square (LSM) Algorithm
Neural Networks: Least Mean Square (LSM) AlgorithmMostafa G. M. Mostafa
 
Neural Networks: Rosenblatt's Perceptron
Neural Networks: Rosenblatt's PerceptronNeural Networks: Rosenblatt's Perceptron
Neural Networks: Rosenblatt's PerceptronMostafa G. M. Mostafa
 
Echo state networks and locomotion patterns
Echo state networks and locomotion patternsEcho state networks and locomotion patterns
Echo state networks and locomotion patternsVito Strano
 
Modeling of neural image compression using gradient decent technology
Modeling of neural image compression using gradient decent technologyModeling of neural image compression using gradient decent technology
Modeling of neural image compression using gradient decent technologytheijes
 
Neural network based identification of multimachine power system
Neural network based identification of multimachine power systemNeural network based identification of multimachine power system
Neural network based identification of multimachine power systemcsandit
 

Similar to Neural Networks: Multilayer Perceptron (20)

1.pptx
1.pptx1.pptx
1.pptx
 
Digital Implementation of Artificial Neural Network for Function Approximatio...
Digital Implementation of Artificial Neural Network for Function Approximatio...Digital Implementation of Artificial Neural Network for Function Approximatio...
Digital Implementation of Artificial Neural Network for Function Approximatio...
 
Digital Implementation of Artificial Neural Network for Function Approximatio...
Digital Implementation of Artificial Neural Network for Function Approximatio...Digital Implementation of Artificial Neural Network for Function Approximatio...
Digital Implementation of Artificial Neural Network for Function Approximatio...
 
Implementation of Feed Forward Neural Network for Classification by Education...
Implementation of Feed Forward Neural Network for Classification by Education...Implementation of Feed Forward Neural Network for Classification by Education...
Implementation of Feed Forward Neural Network for Classification by Education...
 
Artificial Neural Network Implementation On FPGA Chip
Artificial Neural Network Implementation On FPGA ChipArtificial Neural Network Implementation On FPGA Chip
Artificial Neural Network Implementation On FPGA Chip
 
mohsin dalvi artificial neural networks presentation
mohsin dalvi   artificial neural networks presentationmohsin dalvi   artificial neural networks presentation
mohsin dalvi artificial neural networks presentation
 
14. mohsin dalvi artificial neural networks presentation
14. mohsin dalvi   artificial neural networks presentation14. mohsin dalvi   artificial neural networks presentation
14. mohsin dalvi artificial neural networks presentation
 
Study Of The Fault Diagnosis Based On Wavelet And Fuzzy Neural Network For Th...
Study Of The Fault Diagnosis Based On Wavelet And Fuzzy Neural Network For Th...Study Of The Fault Diagnosis Based On Wavelet And Fuzzy Neural Network For Th...
Study Of The Fault Diagnosis Based On Wavelet And Fuzzy Neural Network For Th...
 
Neural Network Dynamical Systems
Neural Network Dynamical Systems Neural Network Dynamical Systems
Neural Network Dynamical Systems
 
Mobile Network Coverage Determination at 900MHz for Abuja Rural Areas using A...
Mobile Network Coverage Determination at 900MHz for Abuja Rural Areas using A...Mobile Network Coverage Determination at 900MHz for Abuja Rural Areas using A...
Mobile Network Coverage Determination at 900MHz for Abuja Rural Areas using A...
 
6
66
6
 
Incorporating Kalman Filter in the Optimization of Quantum Neural Network Par...
Incorporating Kalman Filter in the Optimization of Quantum Neural Network Par...Incorporating Kalman Filter in the Optimization of Quantum Neural Network Par...
Incorporating Kalman Filter in the Optimization of Quantum Neural Network Par...
 
Efficiency of Neural Networks Study in the Design of Trusses
Efficiency of Neural Networks Study in the Design of TrussesEfficiency of Neural Networks Study in the Design of Trusses
Efficiency of Neural Networks Study in the Design of Trusses
 
Neural Networks: Least Mean Square (LSM) Algorithm
Neural Networks: Least Mean Square (LSM) AlgorithmNeural Networks: Least Mean Square (LSM) Algorithm
Neural Networks: Least Mean Square (LSM) Algorithm
 
Neural Networks: Rosenblatt's Perceptron
Neural Networks: Rosenblatt's PerceptronNeural Networks: Rosenblatt's Perceptron
Neural Networks: Rosenblatt's Perceptron
 
071bct537 lab4
071bct537 lab4071bct537 lab4
071bct537 lab4
 
Echo state networks and locomotion patterns
Echo state networks and locomotion patternsEcho state networks and locomotion patterns
Echo state networks and locomotion patterns
 
20120140503023
2012014050302320120140503023
20120140503023
 
Modeling of neural image compression using gradient decent technology
Modeling of neural image compression using gradient decent technologyModeling of neural image compression using gradient decent technology
Modeling of neural image compression using gradient decent technology
 
Neural network based identification of multimachine power system
Neural network based identification of multimachine power systemNeural network based identification of multimachine power system
Neural network based identification of multimachine power system
 

More from Mostafa G. M. Mostafa

Digital Image Processing: Image Restoration
Digital Image Processing: Image RestorationDigital Image Processing: Image Restoration
Digital Image Processing: Image RestorationMostafa G. M. Mostafa
 
Digital Image Processing: Image Segmentation
Digital Image Processing: Image SegmentationDigital Image Processing: Image Segmentation
Digital Image Processing: Image SegmentationMostafa G. M. Mostafa
 
Digital Image Processing: Image Enhancement in the Spatial Domain
Digital Image Processing: Image Enhancement in the Spatial DomainDigital Image Processing: Image Enhancement in the Spatial Domain
Digital Image Processing: Image Enhancement in the Spatial DomainMostafa G. M. Mostafa
 
Digital Image Processing: Image Enhancement in the Frequency Domain
Digital Image Processing: Image Enhancement in the Frequency DomainDigital Image Processing: Image Enhancement in the Frequency Domain
Digital Image Processing: Image Enhancement in the Frequency DomainMostafa G. M. Mostafa
 
Digital Image Processing: Digital Image Fundamentals
Digital Image Processing: Digital Image FundamentalsDigital Image Processing: Digital Image Fundamentals
Digital Image Processing: Digital Image FundamentalsMostafa G. M. Mostafa
 
Digital Image Processing: An Introduction
Digital Image Processing: An IntroductionDigital Image Processing: An Introduction
Digital Image Processing: An IntroductionMostafa G. M. Mostafa
 
Neural Networks: Support Vector machines
Neural Networks: Support Vector machinesNeural Networks: Support Vector machines
Neural Networks: Support Vector machinesMostafa G. M. Mostafa
 
Neural Networks: Model Building Through Linear Regression
Neural Networks: Model Building Through Linear RegressionNeural Networks: Model Building Through Linear Regression
Neural Networks: Model Building Through Linear RegressionMostafa G. M. Mostafa
 
Neural Networks: Radial Bases Functions (RBF)
Neural Networks: Radial Bases Functions (RBF)Neural Networks: Radial Bases Functions (RBF)
Neural Networks: Radial Bases Functions (RBF)Mostafa G. M. Mostafa
 

More from Mostafa G. M. Mostafa (20)

Csc446: Pattern Recognition
Csc446: Pattern Recognition Csc446: Pattern Recognition
Csc446: Pattern Recognition
 
CSC446: Pattern Recognition (LN8)
CSC446: Pattern Recognition (LN8)CSC446: Pattern Recognition (LN8)
CSC446: Pattern Recognition (LN8)
 
CSC446: Pattern Recognition (LN7)
CSC446: Pattern Recognition (LN7)CSC446: Pattern Recognition (LN7)
CSC446: Pattern Recognition (LN7)
 
CSC446: Pattern Recognition (LN6)
CSC446: Pattern Recognition (LN6)CSC446: Pattern Recognition (LN6)
CSC446: Pattern Recognition (LN6)
 
CSC446: Pattern Recognition (LN5)
CSC446: Pattern Recognition (LN5)CSC446: Pattern Recognition (LN5)
CSC446: Pattern Recognition (LN5)
 
CSC446: Pattern Recognition (LN4)
CSC446: Pattern Recognition (LN4)CSC446: Pattern Recognition (LN4)
CSC446: Pattern Recognition (LN4)
 
CSC446: Pattern Recognition (LN3)
CSC446: Pattern Recognition (LN3)CSC446: Pattern Recognition (LN3)
CSC446: Pattern Recognition (LN3)
 
Csc446: Pattren Recognition (LN2)
Csc446: Pattren Recognition (LN2)Csc446: Pattren Recognition (LN2)
Csc446: Pattren Recognition (LN2)
 
Csc446: Pattren Recognition
Csc446: Pattren RecognitionCsc446: Pattren Recognition
Csc446: Pattren Recognition
 
Csc446: Pattren Recognition (LN1)
Csc446: Pattren Recognition (LN1)Csc446: Pattren Recognition (LN1)
Csc446: Pattren Recognition (LN1)
 
Digital Image Processing: Image Restoration
Digital Image Processing: Image RestorationDigital Image Processing: Image Restoration
Digital Image Processing: Image Restoration
 
Digital Image Processing: Image Segmentation
Digital Image Processing: Image SegmentationDigital Image Processing: Image Segmentation
Digital Image Processing: Image Segmentation
 
Digital Image Processing: Image Enhancement in the Spatial Domain
Digital Image Processing: Image Enhancement in the Spatial DomainDigital Image Processing: Image Enhancement in the Spatial Domain
Digital Image Processing: Image Enhancement in the Spatial Domain
 
Digital Image Processing: Image Enhancement in the Frequency Domain
Digital Image Processing: Image Enhancement in the Frequency DomainDigital Image Processing: Image Enhancement in the Frequency Domain
Digital Image Processing: Image Enhancement in the Frequency Domain
 
Digital Image Processing: Digital Image Fundamentals
Digital Image Processing: Digital Image FundamentalsDigital Image Processing: Digital Image Fundamentals
Digital Image Processing: Digital Image Fundamentals
 
Digital Image Processing: An Introduction
Digital Image Processing: An IntroductionDigital Image Processing: An Introduction
Digital Image Processing: An Introduction
 
Neural Networks: Introducton
Neural Networks: IntroductonNeural Networks: Introducton
Neural Networks: Introducton
 
Neural Networks: Support Vector machines
Neural Networks: Support Vector machinesNeural Networks: Support Vector machines
Neural Networks: Support Vector machines
 
Neural Networks: Model Building Through Linear Regression
Neural Networks: Model Building Through Linear RegressionNeural Networks: Model Building Through Linear Regression
Neural Networks: Model Building Through Linear Regression
 
Neural Networks: Radial Bases Functions (RBF)
Neural Networks: Radial Bases Functions (RBF)Neural Networks: Radial Bases Functions (RBF)
Neural Networks: Radial Bases Functions (RBF)
 

Recently uploaded

SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxAmanpreet Kaur
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibitjbellavia9
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin ClassesCeline George
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxVishalSingh1417
 
Third Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptxThird Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptxAmita Gupta
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxnegromaestrong
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docxPoojaSen20
 
Dyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptxDyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptxcallscotland1987
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptxMaritesTamaniVerdade
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Association for Project Management
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseAnaAcapella
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17Celine George
 

Recently uploaded (20)

SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
Third Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptxThird Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptx
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
Dyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptxDyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptx
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 

Neural Networks: Multilayer Perceptron

  • 1. CHAPTER 04 MULTILAYER PERCEPTRONS CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq M. Mostafa Computer Science Department Faculty of Computer & Information Sciences AIN SHAMS UNIVERSITY (most of figures in this presentation are copyrighted to Pearson Education, Inc.)
  • 2. ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq  Introduction  Limitation of Rosenblatt’s Perceptron  Batch Learning and On-line Learning  The Back-propagation Algorithm  Heuristics for Making the BP Alg. Perform Better  Computer Experiment 2 Multilayer Perceptron
  • 3. ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq Introduction  Limitation of Rosenblatt’s Perceptron  AND operation: 3 www www www www 011 001 010 000 021 021 021 021     www ww ww w 021 01 02 0 0     dx2x1 000 010 001 111 +1 x1 x2 w0 w1 w2 y Its easy to find a set of weight that satisfy the above inequalities. xxfy )201010( 21  z e zf    1 1 )( Linear Decision boundary
  • 4. ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq Introduction  Limitation of Rosenblatt’s Perceptron  OR Operation: 4 www www www www 011 001 010 000 021 021 021 021     www ww ww w 021 01 02 0 0     dx2x1 000 110 101 111 +1 x1 x2 w0 w1 w2 y Its easy to find a set of weight that satisfy the above inequalities. xxfy )102020( 21  z e zf    1 1 )( Linear Decision boundary
  • 5. ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq Introduction  Limitation of Rosenblatt’s Perceptron  XOR Operation: 5 www www www www 011 001 010 000 021 021 021 021     www ww ww w 021 01 02 0 0     Clearly the second and third inequalities are incompatible with the fourth, so there is no solution for the XOR problem. We need more complex networks! dx2x1 000 110 101 011 +1 x1 x2 w0 w1 w2 y Non-linear Decision boundary fy (???)
  • 6. ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq The XOR Problem  A two-layer Network to solve the XOR Problem Figure 4.8 (a) Architectural graph of network for solving the XOR problem. (b) Signal-flow graph of the network. 6 b ww 2 3 1 1 1211   b ww 2 1 1 2 2221   b ww 2 1 1,2 3 3231  
  • 7. ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq The XOR Problem  A two-layer Network to solve the XOR Problem Figure 4.9 (a) Decision boundary constructed by hidden neuron 1 of the network in Fig. 4.8. (b) Decision boundary constructed by hidden neuron 2 of the network. (c) Decision boundaries constructed by the complete network. 7
  • 8. ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq 8 MLP: Some Preliminaries  The multilayer perceptron (MLP) is proposed to overcome the limitations of the perceptron  That is, building a network that can solve nonlinear problems.  The basic features of the multilayer perceptrons:  Each neuron in the network includes a nonlinear activation function that is differentiable.  The network contains one or more layers that are hidden from both the input and output nodes.  The network exhibits a high degree of connectivity.
  • 9. ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq MLP: Some Preliminaries  Architecture of a multilayer perceptron Figure 4.1 Architectural graph of a multilayer perceptron with two hidden layers. 9
  • 10. ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq MLP: Some Preliminaries  Weight Dimensions 10 If network has n units in layer i , m units in layer i +1 , then the weight matrix Wij will be of dimension m x (n+1) . Wij
  • 11. ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq MLP: Some Preliminaries  Number of neuron in the output layer 11 Pedestrian Car Motorcycle Truck CarPedestrain Moto Truck 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1
  • 12. ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq 12 MLP: Some Preliminaries  Training of the multilayer perceptron proceeds in two phases:  In the forward phase, the weights of the network are fixed and the input signal is propagated through the network, layer by layer, until it reaches the output.  In the backward phase, the error signal, which is produced by comparing the output of the network and the desired response, is propagated through the network, again layer by layer, but in the backward direction.
  • 13. ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq MLP: Some Preliminaries  Function Signal:  is the input signal that comes in at the input end of the network, propagates forward (neuron by neuron) through the network, and emerges at the output of the network as an output signal.  Error Signal:  originate at the output neuron of the network and propagates backward (layer by layer) through the network.  Each hidden or output neuron computes these two signals. Figure 4.2 Illustration of the directions of two basic signal flows in a multilayer perceptron: forward propagation of function signals and back propagation of error signals. 13
  • 14. ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq MLP: Some Preliminaries  Function of the Hidden neurons  The hidden neurons play a critical role in the operation of a multilayer perceptron; they act as feature detectors.  The nonlinearity transform the input data into a feature space in which data may be separated easily.  Credit Assignment Problem  Is the problem of assigning a credit or a blame for overall outcomes to the internal decisions made by the computational units of the distributed learning system.  The error-correction learning algorithm is easy to use for training single layer perceptrons. But its not easy to use it for a multilayer perceptrons,  the backpropagation algorithm solves this problem. 14
  • 15. ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq The Back-propagation Algorithm  An on-line learning algorithm. Figure 4.3 Signal-flow graph highlighting the details of output neuron j. 15    m i ijij nynwnv 0 )()()( ))(()( nvny ij  )()()( nyndne jjj 
  • 16. ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq The Back-propagation Algorithm  The weights are updated in a manner similar to the LMS and the gradient descent method. That is, the instantaneous error and the weight corrections are: and  Using the chain rule of calculus, we get:  We have: 16 (n)e 2 1 (n) 2 jj E (n)w (n) ηw ji j ji    E Δ )(and))((1 ny (n)w (n)v ,nv (n)v (n)y , (n)y (n)e (n),e (n)e (n) i ji j jj ji j j j j j j              E (n)w (n)v (n)v (n)y (n)y (n)e (n)e (n) (n)w (n) ji j j j j j j j ji j            EE
  • 17. ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq The Back-propagation Algorithm  which yields:  Then the weight correction is given by:  where the local gradient j (n) is defined by: 17 )( )( )( )( )( )( )( )( )( nv ny ny ne ne n nv n n j j j j j j j j j           E E  )())(( nynv(n)e (n)w (n) ijjj ji j   E )()()(Δ nynηnw ijji  ))(()()( nvnen jjjj  
  • 18. ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq The Back-propagation Algorithm  That is, the local gradient of neuron j is equal to the product of the corresponding error signal of that neuron and the derivative of the associated of the activation function. Then, we have two distinct cases:  Case 1: Neuron j is an output node:  In this case, it is easy to use the credit assignment rule to compute the error signal ej(n), because we have the desired signal visible to the output neuron. That is, ej(n)=dj(n) - yj(n).  Case 2: Neuron j is an hidden node:  In this case, the desired signal is not visible to the hidden neuron. Accordingly, the error signal for the hidden neuron would have to be determined recursively and working backwards in terms of the error signals of all the neurons to which that hidden neuron is directly connected. 18
  • 19. ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq The Back-propagation Algorithm  Case 2: Neuron j is hidden node. Figure 4.4 Signal-flow graph highlighting the details of output neuron k connected to hidden neuron j. 19
  • 20. ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq The Back-propagation Algorithm  We redefine the local gradient for a hidden neuron j as:  Where the total instantaneous error of the output neuron k:  Differentiating w. r. t. yj (n) yields:  But  Hence 20 ))(( )( )( )( )( )( )( )( nv ny n nv ny ny n n jj jj j j j           EE    Ck k(n)e(n) 2 2 1 E            k j k k k k k j k k j ny v nv ne ne ny ne ne ny n )()( )( )( )( )( )( )( )(E ))(()()()( nvndnynd(n)e kkkkkk  ))(( )( nv nv e kk k k   
  • 21. ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq The Back-propagation Algorithm  Also, we have  Differentiating, yields:  Then, we get  Finally, the backpropagation for the local gradient of (hidden) neuron j, (neuron k is output neuron), is given by: 21  k kjkjjj nwnnvn )()())(()(      k kjk k kjkkk j wnwnvne ny n )())(()( )( )(  E )( )( )( nw ny nv kj j k       m j jkjk nynwnv 0 )()()(
  • 22. ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq The Back-propagation Algorithm Figure 4.5 Signal-flow graph of a part of the adjoint system pertaining to back- propagation of error signals. 22
  • 23. ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq The Back-propagation Algorithm  We summarize the relations for the back-propagation algorithm:  First: the correction wji(n) applied to the weight connecting neuron i to neuron j is defined by the delta rule:  Second: local gradient j (n) depends on neuron j :  Neuron j is an output node:  Neuron j is an hidden node (neuron k is output or hidden): 23                                              )( jneuronof signalinput )( gradient local parameter ratelearning )( correction weight nynnw ijji  )()()(;))(()()( nyndnenvnen jjjjjjj    k kjkjjj nwnnvn )()())(()( 
  • 24. ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq The Activation Function  Differentiability is the only requirement that an activation function has to satisfy in the BP Algoruthm.  This is required to compute the  for each neuron.  Sigmoidal functions are commonly used, since they satisfy such a condition:  Logistic Function  Hyperbolic Tangent Function 24 0a, )exp(1 1 )(    av v )](1)[( )exp(1 )exp( )(' vva av ava v      0ba,,)tanh()(  bvav )]()][([)(' vava a b v  
  • 25. ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq The Rate of Learning  A simple method of increasing the rate of learning and avoiding instability (for large learning rate ) is to modify the delta rule by including a momentum term as: Figure 4.6 Signal-flow graph illustrating the effect of momentum constant α, which lies inside the feedback loop. 25  where  is usually a positive number called the momentum constant.  To ensure convergence, the momentum constant must be restricted to )()()1()(Δ nynηnwnw ijjiji   10  
  • 26. ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq Summary of the Back-propagation Algorithm 1. Initialization 2. Presentation of training examples 3. Forward computation 4. Backward computation 5. Iteration Figure 4.7 Signal-flow graphical summary of back-propagation learning. Top part of the graph: forward pass. Bottom part of the graph: backward pass. 26
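The five steps can be collected into a short on-line (pattern-by-pattern) training loop. The sketch below, written from the relations summarized earlier, trains a 2-2-1 network on the XOR problem; the architecture, learning rate, number of epochs, and logistic activation are my own choices, not taken from the slides, and convergence depends on the random initialization:

```python
import numpy as np

rng = np.random.default_rng(0)

def phi(v):                       # logistic activation (assumed)
    return 1.0 / (1.0 + np.exp(-v))

# XOR training set (inputs X, desired responses D)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
D = np.array([0.0, 1.0, 1.0, 0.0])

# 1. Initialization: small random weights, one hidden layer of 2 neurons (bias folded in)
W1 = rng.uniform(-0.5, 0.5, size=(2, 3))   # hidden layer: 2 neurons x (2 inputs + bias)
W2 = rng.uniform(-0.5, 0.5, size=(1, 3))   # output layer: 1 neuron x (2 hidden + bias)
eta = 0.5

for epoch in range(5000):
    order = rng.permutation(len(X))         # shuffle the examples each epoch
    for n in order:
        # 2.-3. Presentation of a training example and forward computation
        x = np.append(X[n], 1.0)            # input with bias term
        v1 = W1 @ x
        y1 = np.append(phi(v1), 1.0)        # hidden outputs with bias term
        v2 = W2 @ y1
        y2 = phi(v2)

        # 4. Backward computation of the local gradients
        e = D[n] - y2
        delta2 = e * y2 * (1.0 - y2)                             # output node
        delta1 = y1[:2] * (1.0 - y1[:2]) * (W2[0, :2] * delta2)  # hidden nodes

        # 5. Weight corrections by the delta rule, then iterate
        W2 += eta * np.outer(delta2, y1)
        W1 += eta * np.outer(delta1, x)

# Network outputs after training (should approach 0, 1, 1, 0)
print(np.round([phi(W2 @ np.append(phi(W1 @ np.append(x, 1.0)), 1.0)) for x in X], 2))
```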
  • 27. ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq Heuristics for making the BP Better 1. Stochastic vs. Batch update  The stochastic (sequential) mode is computationally faster than the batch mode. 2. Maximizing information content  Use an example that results in a large training error.  Use an example that is radically different from the others. 3. Activation function  Use an odd function.  Prefer the hyperbolic tangent to the logistic function: \varphi(v) = a \tanh(b v) 27
  • 28. ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq Heuristics for making the BP Better 4. Target values  It is very important to choose the values of the desired response to be within the range of the sigmoid function. 5. Normalizing the input  Each input variable should be preprocessed so that its mean value, averaged over the entire training sample, is close to zero, or else it is small compared to its standard deviation. 28 Figure 4.11 Illustrating the operation of mean removal, decorrelation, and covariance equalization for a two-dimensional input space.
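A sketch of the three preprocessing steps named in Figure 4.11 (mean removal, decorrelation, and covariance equalization), implemented here with an eigendecomposition of the sample covariance; this is a standard whitening recipe assumed by me, not a procedure spelled out on the slide:

```python
import numpy as np

def preprocess(X, eps=1e-8):
    """X has shape (N, d): N training examples, d input variables."""
    # 1) Mean removal: each input variable becomes zero-mean over the training sample
    Xc = X - X.mean(axis=0)

    # 2) Decorrelation: rotate onto the eigenvectors of the sample covariance matrix
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    Xd = Xc @ eigvecs

    # 3) Covariance equalization: rescale so every component has roughly unit variance
    return Xd / np.sqrt(eigvals + eps)

# Correlated two-dimensional input sample (assumed parameters, for illustration only)
rng = np.random.default_rng(0)
X = rng.multivariate_normal([3.0, -1.0], [[2.0, 1.2], [1.2, 1.0]], size=500)
Xw = preprocess(X)
print(np.round(Xw.mean(axis=0), 3), np.round(np.cov(Xw, rowvar=False), 3))
```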
  • 29. ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq Heuristics for making the BP Better 6. Initialization  A good choice of initial weights can be of tremendous help.  Initialize the weights so that the standard deviation of the induced local field v of a neuron lies in the transition area between the linear and saturated parts of its sigmoid function. 7. Learning from hints  This is achieved by including prior information that we may have about the mapping function, e.g., symmetry, invariances, etc. 8. Learning rate  All neurons in the multilayer perceptron should ideally learn at the same rate; since the last layer usually has larger local gradients, its learning rate should be assigned a smaller value than that of the front layers. 29
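One common way to realize heuristic 6, assumed here rather than prescribed by the slide, is to draw each weight with a standard deviation of 1/sqrt(fan-in); with zero-mean, unit-variance inputs this keeps the induced local field v at roughly unit standard deviation, i.e., inside the transition region of the sigmoid:

```python
import numpy as np

def init_weights(fan_in, fan_out, rng=None):
    """Zero-mean Gaussian weights with std 1/sqrt(fan_in) (bias column included)."""
    if rng is None:
        rng = np.random.default_rng(0)
    sigma = 1.0 / np.sqrt(fan_in)
    return rng.normal(0.0, sigma, size=(fan_out, fan_in + 1))

# Check the spread of the induced local fields on standardized random inputs
rng = np.random.default_rng(1)
W = init_weights(fan_in=100, fan_out=10, rng=rng)
X = rng.normal(size=(1000, 100))
V = np.hstack([X, np.ones((1000, 1))]) @ W.T   # induced local fields v
print(np.round(V.std(), 2))                    # close to 1: inside the transition region
```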
  • 30. ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq Batch Learning and On-line Learning  Consider the training sample used to train the network in a supervised manner: T = {x(n), d(n); n = 1, 2, …, N}  If yj(n) is the function signal produced at output neuron j, the error signal produced at the same neuron is: e_j(n) = d_j(n) - y_j(n)  The instantaneous error produced at output neuron j is: \mathcal{E}_j(n) = \frac{1}{2} e_j^2(n)  The total instantaneous error of the whole network is: \mathcal{E}(n) = \sum_{j \in C} \mathcal{E}_j(n) = \frac{1}{2} \sum_{j \in C} e_j^2(n)  The total instantaneous error averaged over the training sample is: \mathcal{E}_{av} = \frac{1}{N} \sum_{n=1}^{N} \mathcal{E}(n) = \frac{1}{2N} \sum_{n=1}^{N} \sum_{j \in C} e_j^2(n) 30
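These four quantities are straightforward to compute; a small sketch with assumed desired responses and network outputs (shape: N examples by |C| output neurons):

```python
import numpy as np

# Assumed values for N = 4 examples and |C| = 2 output neurons (illustration only)
d = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
y = np.array([[0.8, 0.1], [0.2, 0.7], [0.6, 0.9], [0.1, 0.2]])

e = d - y                               # e_j(n) = d_j(n) - y_j(n)
E_j = 0.5 * e**2                        # instantaneous error of each output neuron
E_n = E_j.sum(axis=1)                   # total instantaneous error E(n) of the network
E_av = E_n.mean()                       # average over the training sample

print(E_n, E_av)
```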
  • 31. ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq Batch Learning and On-line Learning Batch Learning:  Adjustment of the weights of the MLP is performed after the presentation of all the N training examples in T; this is called an epoch of training.  Thus, weight adjustment is made on an epoch-by-epoch basis.  After each epoch, the examples in the training sample T are randomly shuffled.  Advantages:  Accurate estimation of the gradient vector (the derivatives of the cost function Eav w.r.t. the weight vector w), which guarantees convergence of the method of steepest descent to a local minimum.  Parallelization of the learning process.  Disadvantages:  It is demanding in terms of storage requirements. 31
  • 32. ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq Batch Learning and On-line Learning On-line Learning:  Adjustment of the weights of the MLP is performed on an example-by-example basis.  The cost function to be minimized is therefore the total instantaneous error E(n).  An epoch of training is the presentation of all the N samples to the network. Also, in each epoch the examples are randomly shuffled.  Advantages:  Its stochastic nature makes it less likely to be trapped in a local minimum.  It is much less demanding in terms of storage requirements.  Disadvantages:  The learning process cannot be parallelized. 32
  • 33. ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq Batch Learning and On-line Learning  Batch learning vs. On-line learning: 33  Batch learning: the learning process is performed by ensemble averaging, which in a statistical context may be viewed as a form of statistical inference; convergence to a local minimum is guaranteed; it can be parallelized; it requires large storage; it is well suited for nonlinear regression problems.  On-line learning: the learning process is performed in a stochastic manner; it is less likely to be trapped in a local minimum; it cannot be parallelized; it requires much less storage; it is well suited for pattern-classification problems.
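The practical difference shows up in where the weight update sits relative to the loop over examples. A minimal sketch for a single linear neuron (the model, data, and learning rate are my own assumptions, not from the slides): batch mode makes one adjustment per epoch from the gradient of E_av, while on-line mode adjusts after every example from the gradient of E(n).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                # 100 examples, 3 inputs (assumed data)
d = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)
eta = 0.1

# Batch learning: one weight adjustment per epoch, steepest descent on E_av
w_batch = np.zeros(3)
for epoch in range(200):
    e = d - X @ w_batch                      # errors of all N examples
    w_batch += eta * (X.T @ e) / len(X)

# On-line learning: one weight adjustment per example, from the gradient of E(n)
w_online = np.zeros(3)
for epoch in range(200):
    for n in rng.permutation(len(X)):        # shuffle each epoch
        e_n = d[n] - X[n] @ w_online
        w_online += eta * e_n * X[n]

print(np.round(w_batch, 2), np.round(w_online, 2))   # both approach [1, -2, 0.5]
```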
  • 34. ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq Generalization  A network is said to generalize well when the network input-output mapping is correct (or nearly so) for the test data.  The learning process can be viewed as "curve fitting".  When the network learns too many input-output examples, it may end up memorizing the training data, i.e., it becomes overfitted, or overtrained, which leads to poor generalization.  Sufficient training-sample size  Generalization is influenced by three factors:  The size of the training sample  The network architecture  The physical complexity of the problem at hand  In practice, good generalization is achieved if the training sample size N satisfies: N = O(W/\varepsilon)  where W is the number of free parameters in the network and \varepsilon is the fraction of classification errors permitted on test data. Figure 4.16 (a) Properly fitted nonlinear mapping with good generalization. (b) Overfitted nonlinear mapping with poor generalization. 34
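As a hypothetical worked example (the numbers are mine, not from the slide): a network with W = 200 free parameters and a permitted test-error fraction \varepsilon = 0.1 would, by this rule of thumb, call for on the order of N \approx 200 / 0.1 = 2000 training examples.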
  • 35. ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq Cross-Validation Method  Cross-validation is a standard tool in statistics that provides an appealing guiding principle:  First: the available data set is randomly partitioned into a training set and a test set.  Second: the training set is further partitioned into two disjoint subsets:  An estimation subset, used to select the model (estimate the parameters).  A validation subset, used to test or validate the model.  The validation subset is used to assess the various candidate models and choose the "best" one.  However, this best model may end up overfitting the validation subset.  To guard against this possibility, the generalization performance is measured on the test set, which is different from the validation subset. 35
  • 36. ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq Cross-Validation Method  Early-stopping Method (Holdout method)  The training is stopped periodically, i.e., after every so many epochs, and the network is assessed using the validation subset.  When the validation phase is complete, the estimation (training) is resumed for another period, and the process is repeated.  The best model (parameters) is the one attaining the minimum validation error. Figure 4.17 Illustration of the early-stopping rule based on cross-validation. 36
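A generic sketch of the early-stopping schedule; the training and evaluation routines are hypothetical placeholders supplied by the caller, not functions defined on the slide. Train for a fixed period, measure the validation error, keep the parameters with the lowest validation error seen so far, and stop once the error has not improved for a while:

```python
import copy

def early_stopping(model, train_for_epochs, validation_error,
                   period=5, max_periods=100, patience=3):
    """Train in periods of `period` epochs and keep the weights with minimum validation error.

    `train_for_epochs(model, k)` runs k epochs of estimation and `validation_error(model)`
    returns the error on the validation subset -- both are assumed, user-supplied callables.
    Training stops once the validation error has not improved for `patience` periods.
    """
    best_error, best_model, stale = float("inf"), copy.deepcopy(model), 0
    for _ in range(max_periods):
        train_for_epochs(model, period)          # estimation (training) phase
        err = validation_error(model)            # validation phase
        if err < best_error:
            best_error, best_model, stale = err, copy.deepcopy(model), 0
        else:
            stale += 1
            if stale >= patience:                # validation error keeps rising: stop
                break
    return best_model, best_error
```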
  • 37. ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq Cross-Validation Method  Variant of Cross-Validation (Multifold Method)  Divide the data set of N samples into K subsets, where K > 1.  In each trial, the network is trained on K - 1 of the subsets and validated on the remaining subset, using a different validation subset for each trial.  The performance of the model is assessed by averaging the squared error under validation over all trials. Figure 4.18 Illustration of the multifold method of cross-validation. For a given trial, the subset of data shaded in red is used to validate the model trained on the remaining data. 37
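A minimal sketch of the multifold split itself: partition the N samples into K subsets and, in each trial, validate on one subset after training on the rest. The `train` and `evaluate` callables are hypothetical placeholders for the user's own fitting and error-measurement routines.

```python
import numpy as np

def multifold_cv(X, d, K, train, evaluate, rng=None):
    """K-fold cross-validation: returns the validation error averaged over the K trials."""
    if rng is None:
        rng = np.random.default_rng(0)
    indices = rng.permutation(len(X))
    folds = np.array_split(indices, K)
    errors = []
    for k in range(K):
        val_idx = folds[k]                                    # held-out subset for this trial
        trn_idx = np.concatenate([folds[i] for i in range(K) if i != k])
        model = train(X[trn_idx], d[trn_idx])                 # fit on the remaining K-1 subsets
        errors.append(evaluate(model, X[val_idx], d[val_idx]))
    return float(np.mean(errors))
```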
  • 38. ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq Computer Experiment  d= -4 Figure 4.12 Results of the computer experiment on the back-propagation algorithm applied to the MLP with distance d = –4. MSE stands for mean-square error. 38
  • 39. ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq Computer Experiment  d = -5 Figure 4.13 Results of the computer experiment on the back-propagation algorithm applied to the MLP with distance d = –5. 39
  • 40. ASU-CSC445: Neural Networks Prof. Dr. Mostafa Gadal-Haqq Real Experiment  Handwritten Digit Recognition* *Courtesy of Yann LeCun. 40
  • 42. Kernel Methods and RBF Networks Next Time 42