TYPES OF NEURAL NETWORKS
Dr.(Mrs.)Lini Mathew
Professor
Electrical Engineering Department
Simple Neural Network
X = I1W1 + I2W2 + ----- + INWN
Activation Function
S = K(X)
K is a threshold function,
i.e. S = 1 if X > T
     S = 0 otherwise
where T is a constant threshold value.
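As a minimal MATLAB sketch of this weighted-sum-and-threshold neuron (the inputs, weights and threshold value below are illustrative, not taken from the slides):
% Simple neuron: weighted sum of the inputs followed by a threshold
I = [0.5 -0.3 0.8];      % example inputs
W = [0.4  0.7 0.2];      % example weights
T = 0.2;                 % constant threshold value
X = sum(I .* W);         % X = I1*W1 + I2*W2 + ... + IN*WN
S = double(X > T);       % S = 1 if X > T, S = 0 otherwise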
Activation Functions
Threshold Function
S = 1 if X ≥ 0
S = 0 if X < 0
S = hardlim(X)
hard-limit transfer function
Also known as Heaviside step function
Binary-Step Function
S = 1 if X ≥ θ
S = 0 if X < θ
where θ is the threshold value
[Plot: hard-limit / binary-step transfer function, S jumps from 0 to +1 at the threshold]
Activation Functions
Signum Function
S = 1 if X ≥ 0
S = -1 if X < 0
S = hardlims(X)
symmetric hard-limit transfer function
[Plot: symmetric hard-limit transfer function, S = -1 for X < 0 and S = +1 for X ≥ 0]
Activation Functions
Squashing Function or Logistic Function or Binary Sigmoidal
Function.
X = 0  →  S = 0.5
X > 0  →  S → 1
X < 0  →  S → 0
a is known as the steepness parameter
S = logsig(X)   log-sigmoid transfer function
S = 1 / (1 + e^(-aX))
Activation Functions
Hyperbolic Tangent Function or Bipolar Sigmoidal Function
S = tanh(X)
X = 0 S = 0
X > 0 S = 1
X < 0 S = -1
S=tansig(X) tan-sigmoid transfer function
S = (1 - e^(-2aX)) / (1 + e^(-2aX)) = 2 / (1 + e^(-2aX)) - 1
Linear Transfer Function
S = purelin(X)
also known as identity function
S=X for all X
Positive Linear Transfer Function
S = poslin(X)
S = X if X ≥ 0
S = 0 if X < 0
Transfer Functions - MATLAB
[Plots: purelin is the straight line S = X; poslin gives S = 0 for X < 0 and S = X for X ≥ 0]
Saturating Linear Transfer Function
S = satlin(X)
S = X if 0 ≤ X ≤ 1
S = 0 if X < 0
S = 1 if X > 1
Symmetric Saturating Linear Transfer
Function
S = satlins(X)
S = X if -1 ≤ X ≤ 1
S = -1 if X < -1
S = 1 if X > 1
Transfer Functions - MATLAB
[Plots: satlin saturates at 0 and +1; satlins saturates at -1 and +1]
Transfer Functions - MATLAB
Radial Basis Function
S = radbas(X)
S = e^(-X²)
Triangular Basis Function
S = tribas(X)
S = 1-abs(X) if -1 ≤ X ≤ 1
S = 0 otherwise
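These MATLAB transfer functions can be reproduced with elementary operations; a minimal sketch (steepness a = 1, illustrative input range) is given below.
X = -3:0.1:3;                        % sample inputs
hard  = double(X >= 0);              % hardlim:  0/1 threshold
hards = 2*(X >= 0) - 1;              % hardlims: -1/+1 threshold
logsg = 1 ./ (1 + exp(-X));          % logsig:   1/(1 + e^(-X))
tansg = 2 ./ (1 + exp(-2*X)) - 1;    % tansig:   equivalent to tanh(X)
lin   = X;                           % purelin:  identity
posl  = max(0, X);                   % poslin:   0 for X < 0, X otherwise
satl  = min(max(X, 0), 1);           % satlin:   clipped to [0, 1]
satls = min(max(X, -1), 1);          % satlins:  clipped to [-1, 1]
radb  = exp(-X.^2);                  % radbas:   e^(-X^2)
trib  = max(0, 1 - abs(X));          % tribas:   1 - |X| inside [-1, 1], else 0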
McCulloch-Pitts Neuron Model
 Formulated by Warren McCulloch and Walter
Pitts in 1943
 McCulloch-Pitts neuron allows binary 0 or 1
states only, i.e. it is binary activated
 The input neurons are connected by directed
weighted paths, excitatory or inhibitory
 The excitatory connections carry positive weights,
the inhibitory ones carry negative weights
 Neuron is associated with a threshold value
Learning Rules
 A neural network learns about its environment through
an interactive process of adjustments applied to its
synaptic weights and bias levels.
 The set of well defined rules for the solution of a learning
problem is called a learning algorithm
 Hebbian Learning Rule. Oldest and most famous of all
learning rules, designed by Donald Hebb in 1949.
 Represents a purely feed-forward, unsupervised learning
 If the cross product of output and input is positive, this
results in increase of weights, otherwise the weight
decreases.
 The weights are adjusted as Wij(k+1) = Wij(k) + xi y
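A minimal sketch of one Hebbian update for a single neuron (the input pattern and initial weights are illustrative):
x = [1 -1 1];             % example bipolar input pattern
W = [0.2 0.1 -0.3];       % example initial weights
y = sign(W * x');         % neuron output for this input
W = W + x * y;            % Wij(k+1) = Wij(k) + xi*y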
Learning Rules
 Perceptron Learning Rule. Learning signal is the difference
between the desired and actual neuron’s response.
 This type of learning is supervised.
 Neti = b + Σxi Wi
Calculated output
yi = f(Neti) = 1 if Neti > θ
             = 0 if -θ ≤ Neti ≤ θ
             = -1 if Neti < -θ
where θ is the threshold
Weight updation
 If t ≠ y and the value of xi is not equal to zero
Wi(k+1) = Wi(k) + α t xi
bi(k+1) = bi(k) + α t
 If t = y, there is no change in weights
Learning Rules
 Delta Learning Rule (Widrow-Hoff Rule or Least Mean
Square (LMS) Rule).
 The delta learning rule is valid only for continuous
activation functions and in the supervised training mode.
 The delta rule assumes that the error signal is directly
measurable.
 The aim of the delta rule is to minimize the error over all
training patterns.
 ∆Wi = α (t - yi) xi
 The mean square error for a particular pattern is
E = Σ(ti - yi)²
 The gradient of E is a vector consisting of partial
derivatives of E with respect to each of the weights.
Learning Rules
 Competitive Learning Rule.
 This rule has a mechanism that permits the neurons to
compete for the right to respond to a given subset of
inputs, such that only one output neuron per group is
active at a time.
 The winner neuron during competition is called winner-
takes-all neuron.
 This rule is suited for unsupervised network training. This
is the standard Kohonen learning rule.
 For neuron P to be the winning neuron, its induced local
field vp for a given particular input pattern must be largest
among all the neurons in the network.
N = 1 if vp > vq for all q, p ≠ q
N = 0 otherwise
Characteristics of Neural Networks
 Exhibit mapping capabilities. They can map input
patterns to their associated output patterns
 Learn by examples. They can be trained with
known examples of a problem and can therefore
identify new objects not seen during training
 Possess the capability to generalize. They can
predict new outcomes from past trends.
 Are robust systems and are fault tolerant. They
can recall full patterns from incomplete, partial or
noisy patterns.
 Can process information in parallel, at high speed
and in a distributed manner
PERCEPTRON
Single Layer Perceptron - The simplest form
of neural network used for the classification
of patterns that are linearly separable.
Algorithm – To start the training process,
initially the weights and biases are set to
zero.
The learning rate value is set, which ranges
from 0 to 1.
Wi(k+1) = Wi(k) + α t xi
bi(k+1) = bi(k) + α t
Perceptron Network
Example: Training of an AND gate
(i) Bias b = 0, W1(0) = 0, W2(0) = 0
Neti = b + Σxi Wi   Net1 = 0 + 0 = 0
y1 = 0 as Net1 = 0   t = -1
W1(1) = W1(0) + α t x1 = 0 + 1×(-1)×(-1) = 1
W2(1) = W2(0) + α t x2 = 0 + 1×(-1)×(-1) = 1
b(1) = b(0) + α t = 0 + 1×(-1) = -1
Perceptron
Binary form:           Bipolar form:
x1  x2  t              x1  x2   t
 0   0  0              -1  -1  -1
 0   1  0              -1   1  -1
 1   0  0               1  -1  -1
 1   1  1               1   1   1
(ii) b = -1, W1(1) = 1, W2(1) = 1, x1 = -1, x2 = 1
Net1 = -1 + 1×(-1) + 1×1 = -1
y1 = -1 as Net1 < 0   t = -1
No weight change
(iii) b = -1, W1(1) = 1, W2(1) = 1, x1 = 1, x2 = -1
Net1 = -1 + 1×1 + 1×(-1) = -1
y1 = -1 as Net1 < 0   t = -1
No weight change
Perceptron
(iv) b = -1, W1(1) = 1, W2(1) = 1, x1 = 1, x2 = 1
Net1 = -1 + 1×1 + 1×1 = 1
y1 = 1 as Net1 > 0   t = 1
No weight change
Epoch 2
Perceptron
x1 x2 b net y t w1 w2
-1 -1 -1 -3 -1 -1 1 1
-1 1 -1 -1 -1 -1 1 1
1 -1 -1 -1 -1 -1 1 1
1 1 -1 1 1 1 1 1
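The training run above can be reproduced with a short MATLAB sketch of the perceptron rule on the bipolar AND data (α = 1, zero initial weights); this is an illustrative script, not toolbox code.
X = [-1 -1; -1 1; 1 -1; 1 1];    % bipolar input patterns, one per row
t = [-1; -1; -1; 1];             % bipolar targets for the AND gate
w = [0 0]; b = 0; alpha = 1;     % initial weights, bias and learning rate
for epoch = 1:10
    changed = false;
    for i = 1:size(X,1)
        net = b + X(i,:) * w';               % Neti = b + sum(xi*Wi)
        y = (net > 0) - (net < 0);           % 1 / 0 / -1 output
        if y ~= t(i)                         % update only when the output is wrong
            w = w + alpha * t(i) * X(i,:);
            b = b + alpha * t(i);
            changed = true;
        end
    end
    if ~changed, break; end                  % stop once an epoch makes no change
end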
Linear Separability
[Plots: the four points (0,0), (0,1), (1,0), (1,1); the AND classes can be separated by a single straight line, the XOR classes cannot]
Linear Separability
 Netj = Σ xi wi + b = x1 w1 + x2 w2 + b
 The relation Σ xi wi + b = 0 gives the boundary region of
the net input.
 The equation denoting this decision boundary can
represent a line or plane.
 On training, if the weights of training input vectors of
correct response +1 lie on one side of the boundary and
that of -1 lie on the other side of the boundary, then the
problem is linearly separable.
 x1 w1 + x2 w2 + b = 0, which can be rewritten as
x2 = -(w1/w2) x1 - b/w2
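For the weights obtained above for the AND gate (w1 = w2 = 1, b = -1), the decision line can be computed directly; a minimal sketch:
w1 = 1; w2 = 1; b = -1;            % weights and bias from the AND-gate training
x1 = -1.5:0.1:1.5;                 % illustrative range for the first input
x2 = -(w1/w2) * x1 - b/w2;         % boundary: x1*w1 + x2*w2 + b = 0
plot(x1, x2);                      % the t = +1 pattern lies on one side of this line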
[MATLAB plots titled "Vectors to be Classified": the input vectors P(1), P(2) and the perceptron decision boundary line]
Linear Separability
[Plots: the XOR points cannot be separated by a single decision line, while the AND points can]
Perceptrons are successful only on problems
with linearly separable solution space.
ADALINE Network
 Adaptive Linear Neuron
 Developed by Widrow and Hoff in 1960.
 Inputs could be binary, bipolar or real valued
 The training process is continued until the error
(t-yi) is minimum.
 Mean Square Error E = Σ (i = 1 to n) (t - yi)²
 Learning algorithm (Delta Rule)
yi = 1 if Neti ≥ 0
= -1 otherwise
Weight Adjustment:
Wi(k+1) = Wi(k) + α (t - yi) xi
Example: ADALINE network for OR function
(i) Bias b = w1(0) = w2(0) = 0.1,  α = 0.4
Neti = b + Σxi wi   Net1 = 0.1 + 0.1 + 0.1 = 0.3
y1 = 0.3   t = 1   ∆wi = α(t - yi)xi
w1(1) = w1(0) + ∆w1 = 0.1 + 0.4×0.7×1 = 0.38
w2(1) = w2(0) + ∆w2 = 0.1 + 0.4×0.7×1 = 0.38
b(1) = b(0) + α(t - yi) = 0.1 + 0.4×0.7 = 0.38
ADALINE Network
x1 x2 t
1 1 1
1 -1 1
-1 1 1
-1 -1 -1
Activation function is
Identity Function.
yi = neti
Epoch 1 : b = w1(0) = w2(0) = 0.38,  α = 0.4
∆w2 = 0.4×(1 - 0.38)×1 = 0.248   w1(1) = 0.38 + 0.25 = 0.63
w2(1) = 0.38 - 0.25 = 0.13   ∆w3 = 0.4×(1 - 0.13)×1 = 0.348
∆w4 = 0.4×(-1 - 0.22)×(-1) = 0.488
E = Σ (t-y)² = 0.49 + 0.38 + 0.76 + 1.49 = 3.12
ADALINE Network
x1 x2 b y t dw1 dw2 db w1 w2 b (t-y)2
1 1 1 0.3 1 0.28 0.28 0.28 0.38 0.38 0.38 0.49
1 -1 1 0.38 1 0.25 -0.25 0.25 0.63 0.13 0.63 0.38
-1 1 1 0.13 1 -0.35 0.35 0.35 0.28 0.48 0.98 0.76
-1 -1 1 0.22 -1 0.49 0.49 -0.49 0.77 0.97 0.49 1.49
Epoch 2 : b = 0.49, w1(0) = 0.77, w2(0) = 0.97,  α = 0.4
∆w2 = 0.4×(1 - 2.23)×1 = -0.492   w1(1) = 0.77 - 0.49 = 0.28
w2(1) = 0.97 - 0.49 = 0.48   ∆w3 = 0.4×(1 + 0.2)×1 = 0.48
∆w4 = 0.4×(1 + 0.28)×1 = 0.51
∆w4 = 0.4×(-1 - 0.23)×(-1) = 0.49
E = Σ (t-y)² = 1.51 + 1.44 + 1.64 + 1.51 = 6.1
ADALINE Network
x1 x2 b y t dw1 dw2 db w1 w2 b (t-y)2
1 1 1 2.23 1 -0.49 -0.49 -0.49 0.28 0.48 0 1.51
1 -1 1 -0.2 1 0.48 -0.48 0.48 0.76 0 0.48 1.44
-1 1 1 -0.28 1 -0.51 0.51 0.51 0.25 0.51 0.99 1.64
-1 -1 1 0.23 -1 0.49 0.49 -0.49 0.74 1.0 0.5 1.51
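A minimal MATLAB sketch of the same ADALINE (delta-rule) training on the bipolar OR data, using the identity activation assumed in the tables above:
X = [1 1; 1 -1; -1 1; -1 -1];   % bipolar OR inputs, one pattern per row
t = [1; 1; 1; -1];              % targets
w = [0.1 0.1]; b = 0.1;         % initial weights and bias
alpha = 0.4;                    % learning rate
for epoch = 1:2
    E = 0;
    for i = 1:size(X,1)
        y = b + X(i,:) * w';                 % identity activation: y = net input
        w = w + alpha * (t(i) - y) * X(i,:); % delta-rule weight update
        b = b + alpha * (t(i) - y);
        E = E + (t(i) - y)^2;                % accumulate the squared error
    end
    fprintf('Epoch %d: E = %.2f\n', epoch, E);
end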
MADALINE Network
Developed by Bernard Widrow
Multiple ADALINE Network
Combining a number of ADALINE Networks
spread across multiple layers with adjustable
weights
The use of multiple ADALINEs helps counter
the problem of non-linear separability
Perceptron Learning Functions
in MATLAB
learnp
 learnp is the perceptron weight/bias learning function.
 learnp calculates the weight change dW for a given neuron
from the neuron's input P and error E according to the
perceptron learning rule:
 dw = 0,   if e = 0
    = p',  if e = 1
    = -p', if e = -1
 This can be summarized as
 dw = e*p'
Perceptron Learning Functions
learnpn
 Normalized perceptron weight and bias learning function
 learnpn is a weight and bias learning function. It can result
in faster learning than learnp when input vectors have
widely varying magnitudes.
 learnpn calculates the weight change dW for a given neuron
from the neuron's input P and error E according to the
normalized perceptron learning rule:
 pn = p / sqrt(1 + p(1)^2 + p(2)^2 + ... + p(R)^2)
 dw = 0, if e = 0
= pn', if e = 1
= -pn', if e = -1
 The expression for dW can be summarized as
dw = e*pn'
Multilayer Perceptron (MLP)
 One of the oldest and most popular multi-layer neural network
architectures
 Uses a non-linear activation function like the logistic
sigmoid or the hyperbolic tangent, or a piecewise-linear
activation function such as the Rectified Linear Unit (ReLU).
Multilayer Perceptron
 The advantage of the MLP over the classic
Perceptron and Adaline is that it can create complex,
non-linear decision boundaries that allow us to tackle
problems where the different classes are not linearly separable.
Back Propagation Network
Developed by Rumelhart, Hinton, Williams
The Back propagation learning rule is
applicable on any feed forward network
architecture (multilayer also)
Back propagation is a systematic method
of training, built on a strong mathematical
foundation, and has very good application
potential.
BP algorithm is a generalization of the Delta
rule or Widrow-Hoff error correction rule.
Slow rate of convergence and local minima
problem are its weaknesses
Error Back Propagation
 The Back propagation learning rule is applicable on
any multilayer feed forward network architecture.
 It can be considered the cornerstone of modern
neural networks and deep learning.
 The backpropagation algorithm consists of two
steps:
Forward Pass: inputs pass through the network and
receive output predictions (this step is also known as
the propagation step).
Backward Pass: the loss function gradient is
calculated in the network's final layer (prediction
layer). It is used then for recursive application of the
chain rule to update the weights in the network (also
known as weight update or backpropagation)
Error Back Propagation
 The input array x passes through the first layer,
whose output values are connected to the input
values of the next layer, and so on, until the
network gives the outputs of the last layer.
 Calculate the value of the error function,
obtained by comparison with the expected output
value.
 In order to minimize the error, the gradient of
the error function with respect to each weight is
calculated.
Error Back Propagation
 Once the gradient vector has been calculated, each
weight is updated iteratively, recalculating the
gradients at the beginning of each training iteration,
until the error becomes lower than an established
threshold or the maximum number of iterations is
reached; the algorithm then ends and the network is trained.
 Current deep learning networks, like Convolutional
Neural Networks, also use backpropagation
internally.
 Recurrent Neural Networks, which have been used for
natural language processing, also utilize this
algorithm.
Back Propagation Network
[Network diagram: input layer (Ii1–Ii3, Oi1–Oi3), hidden layer (Ih1–Ih3, Oh1–Oh3) and output layer (Io1–Io3, Oo1–Oo3), with weights V between the input and hidden layers and weights W between the hidden and output layers]
Back Propagation Network
Input Layer Computation
{O}i = {I}i
{I}h = [V]ᵀ {O}i
Hidden Layer Computation
{I}o = [W]ᵀ {O}h
   
{O}h = 1 / (1 + e^(-λ(Ih - θh)))
λ is the sigmoidal gain
θh is the threshold of the hidden layer
Back Propagation Network
Output Layer Computation
Calculation of error (Euclidean Norm)
   
{O}o = 1 / (1 + e^(-λ(Io - θo)))
E = ½ Σ (To - Oo)²
Back Propagation Network
MLFF networks with non-linear activation functions
have an MSE surface over the Q-dimensional weight space
which is not a smooth parabolic surface.
The error surface is complex and consists of many local
and global minima.
[Figure: error surface E over the weight space (V, W), showing the initial weights A, the adjusted weights B and the best weights C]
Back Propagation Network
 During training, as incremental adjustments are made
to the weights, the location is shifted to a different
point E on the error-weight surface.
 In moving down the error-weight surface, the
path followed depends on the shape of the
surface and the learning rate.
 The error surface is assumed to be truly
spherical
Vector AB = (Vi+1 - Vi)ī + (Wi+1 - Wi)ĵ = ∆V ī + ∆W ĵ
The direction of steepest descent is given by the gradient
∇E = (∂E/∂V) ī + (∂E/∂W) ĵ
Back Propagation Network
 
 
   
∂E/∂W = (∂E/∂Oo)(∂Oo/∂Io)(∂Io/∂W) = λ (To - Oo) Oo (1 - Oo) Oh
Back Propagation Network
   
Oo = 1 / (1 + e^(-λ(Io - θo)))
dOo/dIo = λ e^(-λ(Io - θo)) / (1 + e^(-λ(Io - θo)))² = λ Oo (1 - Oo)
Back Propagation Network
     
∂E/∂V = (∂E/∂Oo)(∂Oo/∂Io)(∂Io/∂Oh)(∂Oh/∂Ih)(∂Ih/∂V)
      = λ (To - Oo) Oo (1 - Oo) [W] λ Oh (1 - Oh) Oi
Back Propagation Network
 Learning Rate Coefficient (α)
 Determines the size of the weight adjustments
made at each iteration and hence influences
the rate of convergence.
 Momentum Term (Coefficient): (η)
 Momentum is used to keep the training process
going in the same general direction.
 ie. By adding a fraction of the previous weight
change to the current weight change.
 It reduces the training time and enhances the
stability of the training process.
Back Propagation Example
Training data:
x1    x2     T
0.4  -0.7   0.1
0.3  -0.5   0.05
0.6   0.1   0.3
0.2   0.4   0.25
Weight matrices:
V = [ 0.1   0.4
     -0.2   0.2 ]
W = [ 0.2
     -0.5 ]
[Network diagram: inputs 0.4 and -0.7 feeding two hidden neurons through V, and one output neuron through W]
Back Propagation Example
Oi = Ii = [0.4; -0.7]
Ih = Vᵀ Oi = [0.1 -0.2; 0.4 0.2] [0.4; -0.7] = [0.18; 0.02]
Oh = [1/(1+e^-0.18); 1/(1+e^-0.02)] = [0.5448; 0.505]
Io = Wᵀ Oh = [0.2 -0.5] [0.5448; 0.505] = -0.14354
Oo = 1/(1+e^0.14354) = 0.4642 and T = 0.1
E = (0.1 - 0.4642)² = 0.13264
∂E/∂W = λ (T - Oo) Oo (1 - Oo) Oh
      = 1 × (0.1 - 0.4642) × 0.4642 × (1 - 0.4642) × [0.5448; 0.505]
      = -0.09058 × [0.5448; 0.505] = [-0.0493; -0.0457]
Back Propagation Network
∂E/∂V = λ (T - Oo) Oo (1 - Oo) W λ Oh (1 - Oh) Oi
      = -0.09058 × W × Oh(1 - Oh) × Oi
Back Propagation Network
Hidden-layer terms:
-0.09058 × [0.2; -0.5] × [0.5448×(1 - 0.5448); 0.505×(1 - 0.505)] = [-0.00449; 0.01132]
With Oi = [0.4; -0.7], the resulting weight changes for V are
∂E/∂V = [ -0.001077  0.002716
           0.001855  0.004754 ]
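The forward pass and the gradient terms of this worked example can be checked with a few lines of MATLAB; this is a minimal sketch with λ = 1, following the sign convention used on the slides (not toolbox code).
V  = [0.1 0.4; -0.2 0.2];          % input-to-hidden weights
W  = [0.2; -0.5];                  % hidden-to-output weights
Oi = [0.4; -0.7];                  % input vector (input layer output)
T  = 0.1;                          % target
Ih = V' * Oi;                      % hidden net input        -> [0.18; 0.02]
Oh = 1 ./ (1 + exp(-Ih));          % hidden output           -> [0.5448; 0.5050]
Io = W' * Oh;                      % output net input        -> -0.1435
Oo = 1 / (1 + exp(-Io));           % network output          -> 0.4642
E  = (T - Oo)^2;                   % squared error           -> 0.1326
delta_o = (T - Oo) * Oo * (1 - Oo);         % output error term   -> -0.0906
dEdW = delta_o * Oh;                        % -> [-0.0493; -0.0457]
delta_h = (delta_o * W) .* Oh .* (1 - Oh);  % hidden error terms  -> [-0.0045; 0.0113]
dEdV = Oi * delta_h';                       % weight-change terms for V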
Gradient Descent Training Functions
traingd
 Gradient descent backpropagation
 traingd can train any network as long as its weight, net input, and
transfer functions have derivative functions.
 Backpropagation is used to calculate derivatives of performance
perf with respect to the weight and bias variables X. Each variable
is adjusted according to gradient descent:
dX = lr * dperf/dX
traingdm
 Gradient descent with momentum backpropagation
 Backpropagation is used to calculate derivatives of performance
perf with respect to the weight and bias variables X. Each variable
is adjusted according to gradient descent with momentum,
dX = mc*dXprev + lr*(1-mc)*dperf/dX
where dXprev is the previous change to the weight or bias.
Gradient Descent Training Functions
traingda
 Gradient descent with adaptive learning rate backpropagation
 traingda can train any network as long as its weight, net
input, and transfer functions have derivative functions.
 Backpropagation is used to calculate derivatives of
performance perf with respect to the weight and bias
variables X.
 Each variable is adjusted according to gradient descent:
dX = lr * dperf/dX
At each epoch, if performance decreases toward the goal,
then the learning rate is increased by the factor lr_inc.
If performance increases by more than the factor
max_perf_inc, the learning rate is adjusted by the factor
lr_dec and the change that increased the performance is not
made.
Gradient Descent Training Functions
traingdx
 Gradient descent with momentum and adaptive learning rate
backpropagation
 traingdx can train any network as long as its weight, net input, and
transfer functions have derivative functions.
 Backpropagation is used to calculate derivatives of performance
perf with respect to the weight and bias variables X.
 Each variable is adjusted according to gradient descent with
momentum,
dX = mc*dXprev + lr*mc*dperf/dX
where dXprev is the previous change to the weight or bias.
For each epoch, if performance decreases toward the goal, then
the learning rate is increased by the factor lr_inc.
If performance increases by more than the factor max_perf_inc, the
learning rate is adjusted by the factor lr_dec and the change that
increased the performance is not made.
Gradient Descent Learning Functions
learngd
 learngd is the gradient descent weight and bias learning function.
 learngd calculates the weight change dW for a given neuron from
the neuron's input P and error E, and the weight (or bias) learning
rate lr, according to the gradient descent dW = lr*gW.
learngdm
 learngdm is the gradient descent with momentum weight and bias
learning function.
 learngdm calculates the weight change dW for a given neuron
from the neuron's input P and error E, the weight (or bias) W,
learning rate lr, and momentum constant mc, according to
gradient descent with momentum:
dW = mc*dWprev + (1-mc)*lr*gW
 The previous weight change dWprev is stored and read from the
learning state LS.
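The learngdm update can be written out directly; a minimal sketch with illustrative sizes and a stand-in gradient:
lr = 0.01; mc = 0.9;                     % learning rate and momentum constant
dWprev = zeros(2, 3);                    % previous weight change (stored in LS)
gW = rand(2, 3);                         % stand-in for the gradient w.r.t. the weights
dW = mc * dWprev + (1 - mc) * lr * gW;   % dW = mc*dWprev + (1-mc)*lr*gW
dWprev = dW;                             % keep for the next iteration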
Associative Memory
Developed by John Hopfield
Single layer feed forward or recurrent
network which makes use of Hebbian
learning or Gradient Descent learning rule
A storehouse of associated patterns
A content-addressable memory system
allows the recall of data based on the degree of
similarity between the input patterns and
the patterns stored in memory.
Associative Memory Neural Networks
(AMNN) -
Associative Memory
AMNN – Hopfield Neural Networks and
Bi-directional Associative Memory.
AMNN are single layer networks in which
the weights are determined for the
network to store a set of pattern
associations. Each association is an
input-output vector pair
AutoAMNN – if the input vector is the same
as the associated output vector
HeteroAMNN – if the inputs and outputs are
different
Auto Associative Memory
 Hopfield Associative Memory
 Connection matrix is indicative of the association of
the pattern with itself
 Autocorrelator’s recall
equation (activation function)
 Two parameter bipolar
threshold equation
 Hamming Distance of
vector X from Y
  
T = Σ (i = 1 to m) Aiᵀ Ai
aj(new) = f( Σi ai(old) tij , aj(old) )
f(α, β) = 1   if α > 0
        = β   if α = 0
        = -1  if α < 0
HD(x, y) = Σ (i = 1 to n) |xi - yi|
Auto Associative Memory - Example
Considering three patterns
A1 = [-1  1 -1  1]
A2 = [ 1  1  1 -1]
A3 = [-1 -1 -1  1]
Recall uses the connection matrix T = Σ Aiᵀ Ai
T = [ 3  1  3 -3
      1  3  1 -1
      3  1  3 -3
     -3 -1 -3  3 ]
 
aj(new) = f( Σi ai(old) tij , aj(old) ),  with f(α, β) = 1 if α > 0, β if α = 0, -1 if α < 0
Auto Associative Memory - Example
Stored pattern A2 = [1 1 1 -1]
T = [ 3  1  3 -3
      1  3  1 -1
      3  1  3 -3
     -3 -1 -3  3 ]
a1(new) = f(1×3 + 1×1 + 1×3 + (-1)×(-3), 1) = f(3+1+3+3, 1) = f(10, 1) = 1
a2(new) = f(6, 1) = 1
a3(new) = f(10, 1) = 1
a4(new) = f(-10, -1) = -1
A2(new) = [1 1 1 -1]
Auto Associative Memory - Example
Another noisy vector A' = [1 1 1 1]
a1(new) = f(3+1+3-3, 1) = f(4, 1) = 1
a2(new) = f(4, 1) = 1
a3(new) = f(4, 1) = 1
a4(new) = f(-4, 1) = -1
A2(new) = [1 1 1 -1]
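A minimal MATLAB sketch of this autocorrelator: it builds T from the three stored patterns and recalls the noisy vector with the two-parameter threshold function.
A = [-1  1 -1  1;
      1  1  1 -1;
     -1 -1 -1  1];            % stored patterns, one per row
T = A' * A;                   % connection matrix, T = sum of Ai'*Ai
f = @(alpha, beta) (alpha > 0) - (alpha < 0) + beta .* (alpha == 0);
a  = [1 1 1 1];               % noisy version of the second pattern
an = f(a * T, a);             % aj(new) = f(sum(ai*tij), aj(old))
disp(an);                     % recalls [1 1 1 -1], i.e. the stored pattern A2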
Hetero Associative Memory
 Developed by Bart Kosko
 Hetero Associative memory neural network
consists of only one layer of weighted
interconnections.
 There exists ‘n’ number of input neurons in the
input layer and ‘m’ number of output neurons in
the output layer.
 This is a fully interconnected network, wherein the
inputs and the outputs are different, hence it is
called Hetero Associative memory neural network.
 The weights are found using the Hebb Rule
Hetero Associative Memory
 There are N training pairs {(A1,B1), (A2,B2),--- }
 Ai = (ai1, ai2, ai3, …, ain)
 Bi = (bi1, bi2, bi3, …, bim)
 Correlation Matrix
 Bi-directional Associative Memory (BAM) is a
hetero associative recurrent neural network
consisting of two layers.
 The net iterates by sending a signal back and
forth between the two layers until each neuron’s
activation remains constant for several steps.
M = Σ (i = 1 to N) Aiᵀ Bi
The net can respond to input on either layer.
The layers are referred to as X-layer and Y-
layer instead of input and output layer.
Recall Equations:
B' = f(A M)
A' = f(B' Mᵀ)
B'' = f(A' M)
A'' = f(B'' Mᵀ)
Hetero Associative Memory
 
Activation function:
f(α, β) = 1   if α > 0
        = β   if α = 0
        = -1  if α < 0
A1 = [1 0 0 1]   B1 = [1 0 1]
A2 = [1 0 1 0]   B2 = [0 1 1]
A3 = [1 1 0 0]   B3 = [0 0 1]
Converting to bipolar
A1 = [1 -1 -1  1]   B1 = [ 1 -1  1]
A2 = [1 -1  1 -1]   B2 = [-1  1  1]
A3 = [1  1 -1 -1]   B3 = [-1 -1  1]
Bi-directional Associative Memories
Finding the connection matrix
M = Σ Aiᵀ Bi = A1ᵀ B1 + A2ᵀ B2 + A3ᵀ B3
M = [ -1 -1  3
      -1 -1 -1
      -1  3 -1
       3 -1 -1 ]
Bi-directional Associative Memories
Stored pattern A1 = [1 -1 -1 1]
M = [ -1 -1  3
      -1 -1 -1
      -1  3 -1
       3 -1 -1 ]
b1(new) = f(1×(-1) + (-1)×(-1) + (-1)×(-1) + 1×3, 1) = f(-1+1+1+3, 1) = f(4, 1) = 1
b2(new) = f(-4, 1) = -1
b3(new) = f(4, 1) = 1
B1(new) = [1 -1 1]
 
Bi-directional Associative Memories
with pattern B1 = [1 -1 1]
Mᵀ = [ -1 -1 -1  3
       -1 -1  3 -1
        3 -1 -1 -1 ]
a1(new) = f(1×(-1) + (-1)×(-1) + 1×3, 1) = f(-1+1+3, 1) = f(3, 1) = 1
a2(new) = f(-1, 1) = -1
a3(new) = f(-5, 1) = -1
a4(new) = f(3, 1) = 1
A1(new) = [1 -1 -1 1]
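The BAM example can be scripted in the same way; a minimal MATLAB sketch using the bipolar pairs and threshold function above:
A = [1 -1 -1  1;  1 -1  1 -1;  1  1 -1 -1];   % bipolar A patterns (rows)
B = [1 -1  1; -1  1  1; -1 -1  1];            % bipolar B patterns (rows)
M = A' * B;                                   % correlation matrix, sum of Ai'*Bi
f = @(alpha, beta) (alpha > 0) - (alpha < 0) + beta .* (alpha == 0);
Bnew = f(A(1,:) * M,  B(1,:));                % forward recall   -> [1 -1 1]
Anew = f(Bnew   * M', A(1,:));                % backward recall  -> [1 -1 -1 1]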
Character Recognition
Two stored patterns of letter E (5×3 grids)
Binary form:
E1:        E2:
1 1 1      1 1 1
1 0 0      1 0 0
1 1 1      1 1 0
1 0 0      1 0 0
1 1 1      1 1 1
Bipolar form:
E1:            E2:
1  1  1        1  1  1
1 -1 -1        1 -1 -1
1  1  1        1  1 -1
1 -1 -1        1 -1 -1
1  1  1        1  1  1
Connection matrix:
[ 10  2  0
   2 10  8
   0  8 10 ]
Written as 15-element vectors:
E1 = [1 1 1 1 -1 -1 1 1 1 1 -1 -1 1 1 1]
E2 = [1 1 1 1 -1 -1 1 1 -1 1 -1 -1 1 1 1]
The connection matrix will be a 15×15 matrix.
Fabric Defect Identification
Self-Organizing Maps (SOMs)
 Self-Organizing Maps (SOMs)
were invented by Professor T.
Kohonen. Also known as the
Kohonen Neural Network (KNN)
 This topology uses an
unsupervised learning procedure
to produce a two-dimensional
discretized representation of the
input space of the training
samples called a ‘map’.
 KNN is widely used for clustering
applications
Competitive Network
 Kohonen worked on the development of the theory
of competition.
 The mostly used competition among group of
neurons is Winner-Takes-All.
 Here, only one neuron in the competing group will
have a non-zero output signal when the
competition is completed.
 The self-organizing map, developed by Kohonen,
groups the input data into clusters; it is
commonly used for unsupervised learning.
Self-Organizing Maps (SOMs)
 Whenever an input is presented, the network
finds out the “distance” of the weight vector of
each node from the input vector, and selects the
node with the smallest distance.
 In this way, the whole network selects the node
with its weight vector closest to the input vector,
i.e. the winner.
 The network learns by moving the winning
weight vector towards the input vector while the
other weight vectors remain unchanged
Self-Organizing Maps (SOMs)
 If the samples are in clusters, then every time
the winning weight vector moves towards a
particular sample in one of the clusters.
 Eventually each of the weight vectors would
converge to the centroid of one cluster. At this
point, the training is complete.
 After training, the weight vectors become
centroids of various clusters.
Self-Organizing Maps (SOMs)
Clustering Technique
 To cluster 4 bipolar input patterns into 2 clusters.
 I1 = [1 1 1 -1]
 I2 = [-1 -1 -1 1]
 I3 = [1 -1 -1 -1]
 I4 = [-1 -1 1 1]
 The weights connected to the cluster units are:
 W1 = [0.2 0.6 0.5 0.9]
 W2 = [0.8 0.4 0.7 0.3]
 Learning rate α = 0.9
Clustering of Bipolar Input Patterns
 The winning cluster unit is the one for which the Euclidean
Distance (ED) between the weight vector associated with it
and the given input vector is the minimum
 ED = Σ (i = 1 to n) (Wi - Ii)²
 ED(1) = (0.2-1)2+(0.6-1)2+(0.5-1)2+(0.9-(-1))2
= 4.66
 ED(2) = (0.8-1)2+(0.4-1)2+(0.7-1)2+(0.3-(-1))2
= 2.18
 Winner is the second cluster unit as ED is
minimum
 Weight Updation for cluster 2
 Wi=2(new) = Wi=2(old) + α*(I1 - Wi=2(old))
 W2 = [0.8 0.4 0.7 0.3]
 W21(new) = 0.8 + 0.9*(1-0.8) = 0.98
 W22(new) = 0.4 + 0.9*(1-0.4) = 0.94
 W23(new) = 0.7 + 0.9*(1-0.7) = 0.97
 W24(new) = 0.3 + 0.9*(-1-0.3) = -0.87
 W2(new) = [0.98 0.94 0.97 -0.87]
 W1 = [0.2 0.6 0.5 0.9]
Clustering of Bipolar Input Patterns
 Euclidean Distance for pattern 2, I2 = [-1 -1 -1 1]
 ED(1) = (0.2+1)² + (0.6+1)² + (0.5+1)² + (0.9-1)² = 6.26
 ED(2) = (0.98+1)² + (0.94+1)² + (0.97+1)² + (-0.87-1)² = 15.06
 Winner is the first cluster unit as ED is minimum
Clustering of Bipolar Input Patterns
 Weight Updation for cluster 1
 Wi=1(new) = Wi=1(old) + α*(I2 - Wi=1(old))
 W1 = [0.2 0.6 0.5 0.9]
 W11(new) = 0.2 + 0.9*(-1-0.2) = -0.88
 W12(new) = 0.6 + 0.9*(-1-0.6) = -0.84
 W13(new) = 0.5 + 0.9*(-1-0.5) = -0.85
 W14(new) = 0.9 + 0.9*(1-0.9) = 0.99
 W1(new) = [-0.88 -0.84 -0.85 0.99]
 W2(new) = [0.98 0.94 0.97 -0.87]
Clustering of Bipolar Input Patterns
 Euclidean Distance for pattern 3, I3 = [1 -1 -1 -1]
 ED(1) = (-0.88-1)² + (-0.84+1)² + (-0.85+1)² + (0.99+1)² = 7.54
 ED(2) = (0.98-1)² + (0.94+1)² + (0.97+1)² + (-0.87+1)² = 7.66
 Winner is the first cluster unit as ED is minimum
Clustering of Bipolar Input Patterns
 Weight Updation for cluster 1
 Wi=1(new) = Wi=1(old) + α*(I3 - Wi=1(old))
 W1 = [-0.88 -0.84 -0.85 0.99]
 W11(new) = -0.88 + 0.9*(1-(-0.88)) = 0.812
 W12(new) = -0.84 + 0.9*(-1-(-0.84)) = -0.984
 W13(new) = -0.85 + 0.9*(-1-(-0.85)) = -0.985
 W14(new) = 0.99 + 0.9*(-1-0.99) = -0.801
 W1(new) = [0.812 -0.984 -0.985 -0.801]
 W2(new) = [0.98 0.94 0.97 -0.87]
Clustering of Bipolar Input Patterns
Clustering of Bipolar Input Patterns
 Euclidean Distance (ED) for pattern 4
I4 = [-1 -1 1 1]
 ED(1) = Σ (i = 1 to n) (Wi - Ii)²
 ED(1) = (0.812-(-1))² + (-0.984-(-1))² + (-0.985-1)² + (-0.801-1)² = 10.4674
 ED(2) = (0.98-(-1))² + (0.94-(-1))² + (0.97-1)² + (-0.87-1)² = 11.1818
 Winner is the first cluster unit as ED is minimum
 Weight Updation for cluster 1
 W1(new) = [0.812 -0.984 -0.985 -0.801]
 W11(new) = 0.812 + 0.9*(-1- 0.812) = -0.8188
 W12(new) = -0.984 + 0.9*(-1-(-0.984)) = -0.9984
 W13(new) = -0.985 + 0.9*(1-(-0.985)) = 0.8015
 W14(new) = -0.801 + 0.9*(1-(-0.801)) = 0.8199
 W1(new) = [-0.8188 -0.9984 -0.8015 0.8199]
 W2(new) = [0.98 0.94 0.97 -0.87]
 After one epoch (iteration), patterns I2,I3 and I4 are
in cluster W1 and I1 is in cluster W2
 After several epochs, the clustering stabilizes and the cluster assignments no longer change
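A minimal MATLAB sketch of this winner-takes-all clustering pass (squared Euclidean distance, α = 0.9), reproducing the single epoch worked out above:
I = [ 1  1  1 -1; -1 -1 -1  1;  1 -1 -1 -1; -1 -1  1  1];  % input patterns
W = [0.2 0.6 0.5 0.9; 0.8 0.4 0.7 0.3];                    % cluster weight vectors
alpha = 0.9;                                               % learning rate
for p = 1:size(I,1)
    d = sum((W - repmat(I(p,:), 2, 1)).^2, 2);      % squared distance to each cluster
    [~, win] = min(d);                              % winning cluster unit
    W(win,:) = W(win,:) + alpha * (I(p,:) - W(win,:));  % move the winner towards the input
end
disp(W);                        % weight vectors after one epoch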
Clustering of Bipolar Input Patterns
Clustering Technique
Vector Quantization is a method of dynamic
allocation of cluster centers.
To begin with, the first pattern will create the
cluster to hold it.
Points x y Points x y
P1 2 3 P7 6 4
P2 3 3 P8 7 4
P3 2 6 P9 2 4
P4 3 6 P10 3 4
P5 6 3 P11 2 7
P6 7 3 P12 3 7
Clustering Technique
[Scatter plot of the twelve points P1–P12 on the x–y plane]
Clustering Technique
[Scatter plot showing the three final cluster centres C1, C2 and C3]
Clustering Technique
Coordinates of P1 = (2,3)
Centre of Cluster C1 = (2,3)
Threshold distance = 1.5
Considering point P2 whose coordinates are (3,3)
Distance between P2 and C1 = √((3-2)² + (3-3)²) = 1.0 < 1.5
Hence P2 is included in C1
New cluster centre of C1 = ((3+2)/2, (3+3)/2) = (2.5, 3)
Clustering Technique
[Scatter plot: cluster C1 formed around P1 and P2]
Clustering Technique
Considering point P3 whose coordinates are (2,6)
Centre of Cluster C1 = (2.5,3)
Distance between P3 and C1 = √((2-2.5)² + (6-3)²) = 3.04
This is greater than 1.5
Hence P3 is not included in C1.
Another cluster C2 is selected whose centre is (2, 6)
Considering point P4 whose coordinates are (3,6)
Distance between P4 and C1 = √((3-2.5)² + (6-3)²) = 3.04 > 1.5
Distance between P4 and C2 = √((3-2)² + (6-6)²) = 1.0 < 1.5
Hence P4 is not included in C1 but is included in C2
New cluster centre of C2 = ((3+2)/2, (6+6)/2) = (2.5, 6)
Clustering Technique
[Scatter plot: clusters C1 and C2 formed]
Clustering Technique
Considering point P5 whose coordinates are (6,3)
Centre of Cluster C1 = (2.5,3)
Distance between P5 and C1 = √((6-2.5)² + (3-3)²) = 3.5 > 1.5
Distance between P5 and C2 = √((6-2.5)² + (3-6)²) = 4.6 > 1.5
Hence P5 is not included in C1 or C2
Another cluster C3 is selected whose centre is (6, 3)
Considering point P6 whose coordinates are (7,3)
Centre of Cluster C1 = (2.5,3)
Centre of Cluster C2 = (2.5,6)
Distance between P6 and C1 = √((7-2.5)² + (3-3)²) = 4.5 > 1.5
Distance between P6 and C2 = √((7-2.5)² + (3-6)²) = 5.40 > 1.5
Hence P6 is not included in C1 or C2
Distance between P6 and C3 = √((7-6)² + (3-3)²) = 1.0 < 1.5
Now P6 is included in C3
New cluster centre of C3 = ((6+7)/2, (3+3)/2) = (6.5, 3)
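The point-by-point procedure can be written as a short MATLAB sketch of this vector-quantization pass (threshold distance 1.5; here each centre is updated as the mean of its members, which matches the averages shown above for the first two members of a cluster):
P = [2 3; 3 3; 2 6; 3 6; 6 3; 7 3; 6 4; 7 4; 2 4; 3 4; 2 7; 3 7];  % points P1..P12
thr = 1.5;                       % threshold distance
C = P(1,:);                      % the first pattern creates the first cluster
n = 1;                           % number of members in each cluster
for k = 2:size(P,1)
    d = sqrt(sum((C - repmat(P(k,:), size(C,1), 1)).^2, 2));  % distance to each centre
    [dmin, j] = min(d);
    if dmin < thr
        C(j,:) = (C(j,:) * n(j) + P(k,:)) / (n(j) + 1);  % update the centre
        n(j) = n(j) + 1;
    else
        C = [C; P(k,:)];         % start a new cluster at this point
        n = [n; 1];
    end
end
disp(C);                         % three cluster centres emerge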
Clustering Technique
[Scatter plots over successive slides: the remaining points P7–P12 are assigned to clusters C1, C2 and C3, ending with the three final cluster centres]
Adaptive Resonance Theory
 ART was introduced by Gail Carpenter and Stephen
Grossberg
 Widely used for clustering applications.
 The problems faced by competitive NNs are that
they do not always form stable clusters.
 They are oscillatory when more input patterns are
presented.
 ART NN are receptive to significant new patterns
and still remain stable.
 There are three types of ART networks: (i) ART-1
(ii) ART-2 and (iii) ART-3
Adaptive Resonance Theory
 ART-1 can cluster only binary inputs
 ART-2 can handle gray-scale inputs
 ART-3 can handle analog inputs better by
overcoming the limitations of ART-2.
 The basic ART learning is an unsupervised one.
 Stability of the network means that a pattern
should not oscillate among different cluster units
at different stages of training.
 Plasticity is the ability of the net to learn a
new pattern equally well at any stage of
learning.
Adaptive Resonance Theory
 The key innovation of ART is the use of a degree
of expectation called vigilance parameter.
 Vigilance parameter is the user specified value to
decide the degree of similarity essential for the
input patterns to be assigned to a cluster unit.
 As each input is presented to the network, it is
compared with the prototype vector for a match
based on the vigilance parameter.
 If the match is not adequate, a new prototype or a
cluster unit is selected.
 In this way, previously learned memories
(prototypes) are not eroded by new learning.
Adaptive Resonance Theory
 ‘Resonance’ in ART is the state of the network
when a class prototype vector very closely
matches the current input vector, and leads to
a state which permits learning.
 During this resonant state, the weight updation
takes place.
 The basic architecture consists of three layers:
 Input Processing Layer for processing the given
inputs.
 Further divided into Input Layer and Input
Interface Layer
 Output layer has the cluster units. This is the
competitive layer or a recognition region.
Adaptive Resonance Theory
 Interface layer is called the comparison region
where it transfers the input vector to its best
match in the recognition region.
 Reset Layer decides the degree of similarity of
patterns placed on the same cluster by a reset
mechanism.
 It compares the strength of the recognition match
to the vigilance parameter.
 Bottom-up weights are connected between the
Input Interface Layer to the Output layer.
 Top-down weights are connected between the
Output layer to the Input Interface Layer.
Adaptive Resonance Theory
Output layer
Input layer
Reset layer Input Interface
layer
Bottom-up weights
Top-down weights
Adaptive Resonance Theory
Example: input S = [0.5 0.6], vigilance parameter v = 0.3, L = 4
Bottom-up weights: u11 = 0.3, u12 = 0.5, u21 = 0.2, u22 = 0.6
Top-down weights: d11 = 0.1, d12 = 0.3, d21 = 0.6, d22 = 0.1
 The units transmit the information to the output
layer through the bottom-up weights u,
 O1 = I1u11 + I2u12 = 0.5*0.3 + 0.6*0.5 = 0.45
 O2 = I1u21 + I2u22 = 0.5*0.2 + 0.6*0.6 = 0.46
 O2 > O1 so output cluster 2 is selected as winner
 The information about the winner is sent from the
output layer to the interface layer through the top-
down weights d.
 I1 = S1d11 = 0.5*0.1 = 0.05
 I2 = S2d12 = 0.6*0.3 = 0.18
 Norm of I is ‖I‖ = I1 + I2 = 0.05 + 0.18 = 0.23
 The value of ‖I‖ gives an estimate of the degree
of match
Adaptive Resonance Theory
 The learning will occur only if the match is
acceptable to the value of vigilance parameter.
 The verdict for learning is carried out by
calculating the ratio of ‖I‖ and ‖S‖.
 The updation of the weights is carried out if the
Match Ratio ‖I‖ / ‖S‖ ≥ v
 ‖I‖ / ‖S‖ = 0.23/1.1 = 0.209 < v (0.3)
 If ‖I‖ / ‖S‖ < v, then the current cluster unit is
rejected and inhibited.
Adaptive Resonance Theory
 Again I1 and I2 are calculated for the next cluster unit
 I1 = S1d21 = 0.5*0.6 = 0.3
 I2 = S2d22 = 0.6*0.1 = 0.06
 ‖I‖ = I1 + I2 = 0.3 + 0.06 = 0.36
 ‖I‖ / ‖S‖ = 0.36/1.1 = 0.327 > v (0.3)
 Cluster 2 is selected and S is assigned to it.
 The weights associated with it are updated.
Adaptive Resonance Theory
 The top-down weights associated with cluster 2
are assigned the new calculated values I1 and I2
 d21 = I1 = 0.3
 d22 = I2 = 0.06
 The new bottom-up weights are calculated as:
 u21 = L*I1 / (L - 1 + ‖I‖) = 4*0.3 / (4 - 1 + 0.36) = 0.454
 u22 = L*I2 / (L - 1 + ‖I‖) = 4*0.06 / (4 - 1 + 0.36) = 0.091
 This procedure is repeated until a cluster unit is
accepted or all the units in the output layer are
inhibited.
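A minimal sketch of the vigilance test and weight update for one candidate cluster unit, using the formulas stated above (the variable names and the candidate unit are illustrative, not a full ART-1 implementation):
S = [0.5 0.6];               % input pattern
d_j = [0.6 0.1];             % top-down weights of the candidate cluster unit
v = 0.3;  L = 4;             % vigilance parameter and learning constant
I = S .* d_j;                % interface-layer activity
match = sum(I) / sum(S);     % match ratio ||I|| / ||S||
if match >= v
    d_j = I;                             % resonance: update the top-down weights
    u_j = L * I / (L - 1 + sum(I));      % and the bottom-up weights
else
    % no resonance: inhibit this unit and test the next cluster unit
end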
Adaptive Resonance Theory
 If all the units in the output layer are inhibited, a
decision has to be taken by the user.
Reduce the value of the vigilance parameter,
allowing patterns with a lower degree of match to
be placed on cluster units which may have been
inhibited during an earlier learning trial.
 Addition of more cluster units.
 Specify the current input pattern as the one
that cannot be clustered.
The vigilance parameter v can have a value less
than 1
 L > 1
THANK YOU
 plotpv - Plots perceptron input/target vectors
 plotpv(P,T) P is the matrix of input vectors
and T is the matrix of binary target vectors
 P = [ -0.5 -0.5 +0.3 -0.1; -0.5 +0.5 -0.5 +1.0];
 T = [1 1 0 0]; plotpv(P,T);
 plotpc - Plots classification line on perceptron
vector plot
 plotpc(W,B) W is the weight matrix and B is the
bias vector
Neural Network Toolbox
 newp Creates a perceptron
 net = newp(P,T,TF,LF)
P is the R x Q1 matrix of input vectors
T is the S x Q2 matrix of target vectors
TF is the transfer function (default = ‘hardlim')
LF is the Learning function (default = 'learnp')
 net.iw{1,1} = [-1.2 -0.5]; net.b{1} = 1;
plotpc(net.iw{1,1},net.b{1})
 adapt Allow neural network to change weights
and biases on inputs
Neural Network Toolbox
(percpt)
 adapt Allow neural network to change weights
and biases on inputs
 This function calculates network outputs and errors
after each presentation of an input.
 [net,Y,E,tr] = adapt(net,P,T)
net is the Network
P Network inputs
T Network targets (default = zeros)
Y Network outputs
E Network errors
tr Training record (epoch and perf)
 net.adaptParam.passes
Neural Network Toolbox
 sim Simulate neural network
 This function simulates the network, calculating its
outputs, errors and performance for the given inputs.
 [Y,E,perf] = sim(net,P,T)
net is the Network
P Network inputs
T Network targets (default = zeros)
Y Network outputs
E Network errors
perf Network performance
Neural Network Toolbox
 newff Creates a feed-forward
backpropagation network
 net = newff(P,T,Si,Tfi)
P is the R x Q1 matrix of input vectors
T is the SN x Q2 matrix of target vectors
Si is the Size of the ith (hidden) layer
TFi is the transfer function of the ith layer
This function initializes its weights and biases. It also
sets the input, output data processing functions and
training functions to default values
Neural Network Toolbox
(feedfrwd)
 train Train neural network
 This function trains a network net according to
net.trainFcn and net.trainParam..
 [net, tr,Y,E] = train(net,P,T)
net is the Network
P Network inputs
T Network targets (default = zeros)
Y Network outputs
E Network errors
tr Training record (epoch and perf)
 net.trainParam.epochs
 net.trainParam.goal
Neural Network Toolbox
 Two different styles of training.
 Incremental training - the weights and biases of
the network are updated each time an input is
presented to the network.
 In this case, the function adapt is used , and the
inputs and targets are presented as sequences.
P = {[1;2] [2;1] [2;3] [3;1]}; T = {4 5 7 7};
 Batch training - the weights and biases are only
updated after all the inputs are presented.
The function train can only perform
batch training.
Neural Network Toolbox
 train applies the inputs to the new network,
calculates the outputs, compares them to the
associated targets, and calculates a mean square
error. If the error goal is met, or if the maximum
number of epochs is reached, the training is
stopped, and train returns the new network and a
training record. Otherwise train goes through
another epoch.
 train uses a matrix of concurrent vectors.
P = [1 2 2 3; 2 1 3 1]; T = [4 5 7 7];
Neural Network Toolbox
 Create and train a FF network to evaluate the
following function:
 y = (x² - 6.5) / (x² + 6.5)   for -10 < x < 10
 Generate input-output training data
x = -10:0.5:10;
y = (x.^2 - 6.5)./(x.^2 + 6.5);
 Create a feed forward neural network
net = newff(x,y,5,{'tansig','tansig'},'traingd');
 Train the network
net = train(net,x,y);
Neural Network Toolbox
(feedfrwd1)
 Pre-processing and Post-processing Inputs and
Outputs
 Result in faster and efficient training of the network
 Pre- and Post-processing training data functions
are assigned automatically by network creation
functions like newff
 The function mapminmax scales inputs and outputs
so that they are in the range [-1 1]
 The normalized output is converted back to original
by using the function mapminmax with argument
reverse
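Continuing the feed-forward example above, a minimal sketch of calling mapminmax explicitly (newff normally assigns these processing functions automatically, so this is only to show the call pattern):
x = -10:0.5:10;
y = (x.^2 - 6.5)./(x.^2 + 6.5);
[xn, xs] = mapminmax(x);                  % inputs scaled to the range [-1, 1]
[yn, ys] = mapminmax(y);                  % targets scaled to the range [-1, 1]
net = newff(xn, yn, 5);                   % small feed-forward network
net = train(net, xn, yn);                 % train on the normalized data
yn_out = sim(net, xn);                    % simulate: output in normalized units
y_out = mapminmax('reverse', yn_out, ys); % convert back to the original scale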
Neural Network Toolbox
(preprocs)

More Related Content

Similar to Types of Neural Networks Explained

Artificial Neural Network
Artificial Neural Network Artificial Neural Network
Artificial Neural Network Iman Ardekani
 
Artificial Neural Network
Artificial Neural NetworkArtificial Neural Network
Artificial Neural Networkssuserab4f3e
 
Artificial Neural Networks Lect7: Neural networks based on competition
Artificial Neural Networks Lect7: Neural networks based on competitionArtificial Neural Networks Lect7: Neural networks based on competition
Artificial Neural Networks Lect7: Neural networks based on competitionMohammed Bennamoun
 
The Perceptron and its Learning Rule
The Perceptron and its Learning RuleThe Perceptron and its Learning Rule
The Perceptron and its Learning RuleNoor Ul Hudda Memon
 
Multilayer perceptron
Multilayer perceptronMultilayer perceptron
Multilayer perceptronsmitamm
 
Kernel based models for geo- and environmental sciences- Alexei Pozdnoukhov –...
Kernel based models for geo- and environmental sciences- Alexei Pozdnoukhov –...Kernel based models for geo- and environmental sciences- Alexei Pozdnoukhov –...
Kernel based models for geo- and environmental sciences- Alexei Pozdnoukhov –...Beniamino Murgante
 
Machine Learning.pdf
Machine Learning.pdfMachine Learning.pdf
Machine Learning.pdfnikola_tesla1
 
Two algorithms to accelerate training of back-propagation neural networks
Two algorithms to accelerate training of back-propagation neural networksTwo algorithms to accelerate training of back-propagation neural networks
Two algorithms to accelerate training of back-propagation neural networksESCOM
 
Unsupervised-learning.ppt
Unsupervised-learning.pptUnsupervised-learning.ppt
Unsupervised-learning.pptGrishma Sharma
 
Classification using perceptron.pptx
Classification using perceptron.pptxClassification using perceptron.pptx
Classification using perceptron.pptxsomeyamohsen3
 
Introduction to Artificial Neural Networks - PART III.pdf
Introduction to Artificial Neural Networks - PART III.pdfIntroduction to Artificial Neural Networks - PART III.pdf
Introduction to Artificial Neural Networks - PART III.pdfSasiKala592103
 

Similar to Types of Neural Networks Explained (20)

Artificial Neural Network
Artificial Neural Network Artificial Neural Network
Artificial Neural Network
 
AI Lesson 38
AI Lesson 38AI Lesson 38
AI Lesson 38
 
Lesson 38
Lesson 38Lesson 38
Lesson 38
 
NN-Ch2.PDF
NN-Ch2.PDFNN-Ch2.PDF
NN-Ch2.PDF
 
Artificial Neural Network
Artificial Neural NetworkArtificial Neural Network
Artificial Neural Network
 
neural networksNnf
neural networksNnfneural networksNnf
neural networksNnf
 
Artificial Neural Networks Lect7: Neural networks based on competition
Artificial Neural Networks Lect7: Neural networks based on competitionArtificial Neural Networks Lect7: Neural networks based on competition
Artificial Neural Networks Lect7: Neural networks based on competition
 
Nn3
Nn3Nn3
Nn3
 
03 Single layer Perception Classifier
03 Single layer Perception Classifier03 Single layer Perception Classifier
03 Single layer Perception Classifier
 
The Perceptron and its Learning Rule
The Perceptron and its Learning RuleThe Perceptron and its Learning Rule
The Perceptron and its Learning Rule
 
Multilayer perceptron
Multilayer perceptronMultilayer perceptron
Multilayer perceptron
 
Artificial neural networks
Artificial neural networks Artificial neural networks
Artificial neural networks
 
Kernel based models for geo- and environmental sciences- Alexei Pozdnoukhov –...
Kernel based models for geo- and environmental sciences- Alexei Pozdnoukhov –...Kernel based models for geo- and environmental sciences- Alexei Pozdnoukhov –...
Kernel based models for geo- and environmental sciences- Alexei Pozdnoukhov –...
 
Machine Learning.pdf
Machine Learning.pdfMachine Learning.pdf
Machine Learning.pdf
 
Neural Networks
Neural NetworksNeural Networks
Neural Networks
 
UofT_ML_lecture.pptx
UofT_ML_lecture.pptxUofT_ML_lecture.pptx
UofT_ML_lecture.pptx
 
Two algorithms to accelerate training of back-propagation neural networks
Two algorithms to accelerate training of back-propagation neural networksTwo algorithms to accelerate training of back-propagation neural networks
Two algorithms to accelerate training of back-propagation neural networks
 
Unsupervised-learning.ppt
Unsupervised-learning.pptUnsupervised-learning.ppt
Unsupervised-learning.ppt
 
Classification using perceptron.pptx
Classification using perceptron.pptxClassification using perceptron.pptx
Classification using perceptron.pptx
 
Introduction to Artificial Neural Networks - PART III.pdf
Introduction to Artificial Neural Networks - PART III.pdfIntroduction to Artificial Neural Networks - PART III.pdf
Introduction to Artificial Neural Networks - PART III.pdf
 

Recently uploaded

result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college projectTonystark477637
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingrknatarajan
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...RajaP95
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 

Recently uploaded (20)

result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 

Types of Neural Networks Explained

  • 14. Learning Rules  Competitive Learning Rule.  This rule has a mechanism that permits the neurons to compete for the right to respond to a given subset of inputs, such that only one output neuron per group is active at a time.  The winning neuron in the competition is called the winner-takes-all neuron.  This rule is suited for unsupervised network training; it is the standard Kohonen learning rule.  For neuron P to be the winning neuron, its induced local field vP for a given input pattern must be the largest among all the neurons in the network: N = 1 if vP > vq for all q ≠ P, and N = 0 otherwise.
  • 15. Characteristics of Neural Networks  Exhibit mapping capabilities: they can map input patterns to their associated output patterns.  Learn by example: they can be trained with known examples of a problem and can then recognize objects on which they were not explicitly trained.  Possess the capability to generalize: they can predict new outcomes from past trends.  Are robust and fault-tolerant systems: they can recall full patterns from incomplete, partial or noisy patterns.  Can process information in parallel, at high speed and in a distributed manner.
  • 17. Perceptron Network  Single Layer Perceptron - the simplest form of neural network, used for the classification of patterns that are linearly separable.  Algorithm - to start the training process, the weights and biases are initially set to zero and the learning rate α is set to a value between 0 and 1. The updates are Wi(k+1) = Wi(k) + α t xi and bi(k+1) = bi(k) + α t.
  • 18. Perceptron  Example: training of an AND gate.
Binary truth table (x1, x2, t): (0,0,0), (0,1,0), (1,0,0), (1,1,1)
Bipolar truth table (x1, x2, t): (-1,-1,-1), (-1,1,-1), (1,-1,-1), (1,1,1)
(i) Bias b = 0, W1(0) = 0, W2(0) = 0, Neti = b + Σ xi Wi
Net1 = 0 + 0 + 0 = 0, y1 = 0 as Net1 = 0, t = -1
W1(1) = W1(0) + α t x1 = 0 + 1×(-1)×(-1) = 1
W2(1) = W2(0) + α t x2 = 0 + 1×(-1)×(-1) = 1
b(1) = b(0) + α t = 0 + 1×(-1) = -1
  • 19. Perceptron  (ii) b = -1, W1(1) = 1, W2(1) = 1, x1 = -1, x2 = 1: Net1 = -1 + 1×(-1) + 1×1 = -1, y1 = -1 as Net1 < 0, t = -1, so no weight change.  (iii) b = -1, W1(1) = 1, W2(1) = 1, x1 = 1, x2 = -1: Net1 = -1 + 1×1 + 1×(-1) = -1, y1 = -1 as Net1 < 0, t = -1, so no weight change.
  • 20. Perceptron  (iv) b = -1, W1(1) = 1, W2(1) = 1, x1 = 1, x2 = 1: Net1 = -1 + 1×1 + 1×1 = 1, y1 = 1 as Net1 > 0, t = 1, so no weight change.
Epoch 2:
x1  x2  b   net  y   t   w1  w2
-1  -1  -1  -3   -1  -1  1   1
-1   1  -1  -1   -1  -1  1   1
 1  -1  -1  -1   -1  -1  1   1
 1   1  -1   1    1   1  1   1
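The epoch-2 table shows that the weights settle at W1 = W2 = 1 with b = -1. A minimal MATLAB sketch of this training loop (bipolar AND targets, α = 1, weights updated only when the output differs from the target) might look as follows; the variable names are illustrative, not from the slides.

X = [-1 -1; -1 1; 1 -1; 1 1];       % bipolar input patterns (one per row)
T = [-1 -1 -1 1];                   % bipolar AND targets
W = [0 0]; b = 0; alpha = 1;        % zero initial weights and bias
for epoch = 1:10
    for i = 1:4
        net = b + X(i,:)*W';        % net input for pattern i
        y = sign(net);              % sign(0) = 0, matching y = 0 when net = 0
        if y ~= T(i)                % update only when output and target differ
            W = W + alpha*T(i)*X(i,:);
            b = b + alpha*T(i);
        end
    end
end
disp(W); disp(b);                   % converges to W = [1 1], b = -1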
  • 21. Linear Separability [Figure: the points (0,0), (0,1), (1,0) and (1,1) plotted for the AND function and for the XOR function]
  • 22. Linear Separability  Netj = Σ xi wi + b = x1 w1 + x2 w2 + b  The relation Σ xi wi + b = 0 gives the boundary region of the net input.  The equation denoting this decision boundary can represent a line or a plane.  On training, if the training input vectors with correct response +1 lie on one side of the boundary and those with response -1 lie on the other side, then the problem is linearly separable.  x1 w1 + x2 w2 + b = 0, i.e. x2 = -(w1/w2) x1 - b/w2
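As a quick illustration of the boundary equation, the weights found in the AND example (w1 = w2 = 1, b = -1) can be plotted against the bipolar AND points with a short MATLAB sketch; the plotting range and marker choices are illustrative only.

w1 = 1; w2 = 1; b = -1;             % weights from the trained AND perceptron
x1 = -1.5:0.1:1.5;
x2 = -(w1/w2)*x1 - b/w2;            % decision boundary: x1*w1 + x2*w2 + b = 0
plot(x1, x2, 'k-'); hold on;
plot([-1 -1 1], [-1 1 -1], 'bo');   % the three patterns with target -1
plot(1, 1, 'r*');                   % the single pattern with target +1
xlabel('x1'); ylabel('x2'); title('Decision boundary for the AND example');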
  • 23. Linear Separability [Figure: MATLAB 'Vectors to be Classified' plots with axes P(1) and P(2)]
  • 24. Linear Separability [Figure: the AND and XOR point sets repeated] Perceptrons are successful only on problems with a linearly separable solution space.
  • 25. ADALINE Network  Adaptive Linear Neuron, developed by Widrow and Hoff in 1960.  Inputs can be binary, bipolar or real valued.  The training process is continued until the error (t - yi) is minimum.  Mean Square Error: E = Σi=1..n (t - yi)²  Learning algorithm (Delta Rule): yi = 1 if Neti ≥ 0, -1 otherwise.  Weight adjustment: Wi(k+1) = Wi(k) + α (t - yi) xi
  • 26. ADALINE Network  Example: ADALINE network for the OR function (bipolar inputs and targets):
x1  x2  t
 1   1  1
 1  -1  1
-1   1  1
-1  -1 -1
(i) Bias b = w1(0) = w2(0) = 0.1, α = 0.4. The activation function is the identity function, so yi = neti.
Neti = b + Σ xi wi ; Net1 = 0.1 + 0.1 + 0.1 = 0.3, y1 = 0.3, t = 1
∆wi = α (t - yi) xi
w1(1) = w1(0) + ∆w1 = 0.1 + 0.4×0.7×1 = 0.38
w2(1) = w2(0) + ∆w2 = 0.1 + 0.4×0.7×1 = 0.38
b(1) = b(0) + α (t - yi) = 0.1 + 0.4×0.7 = 0.38
  • 27. ADALINE Network  Epoch 1: starting from b = w1(0) = w2(0) = 0.38, α = 0.4:
∆ for the second pattern = 0.4×(1 - 0.38) = 0.248, giving w1 = 0.38 + 0.25 = 0.63, w2 = 0.38 - 0.25 = 0.13
∆ for the third pattern = 0.4×(1 - 0.13) = 0.348
∆ for the fourth pattern = 0.4×(-1 - 0.22)×(-1) = 0.488
x1  x2  x0  y     t   dw1    dw2    db     w1    w2    b     (t-y)²   (x0 = 1 is the bias input)
 1   1  1   0.3   1   0.28   0.28   0.28   0.38  0.38  0.38  0.49
 1  -1  1   0.38  1   0.25  -0.25   0.25   0.63  0.13  0.63  0.38
-1   1  1   0.13  1  -0.35   0.35   0.35   0.28  0.48  0.98  0.76
-1  -1  1   0.22 -1   0.49   0.49  -0.49   0.77  0.97  0.49  1.49
E = Σ (t - y)² = 0.49 + 0.38 + 0.76 + 1.49 = 3.12
  • 28. ADALINE Network  Epoch 2: b = 0.49, w1(0) = 0.77, w2(0) = 0.97, α = 0.4:
∆ for the first pattern = 0.4×(1 - 2.23) = -0.492
∆ for the second pattern = 0.4×(1 + 0.2) = 0.48
∆ for the third pattern = 0.4×(1 + 0.28) = 0.51
∆ for the fourth pattern = 0.4×(-1 - 0.23)×(-1) = 0.49
x1  x2  x0  y      t   dw1    dw2    db     w1    w2    b     (t-y)²
 1   1  1   2.23   1  -0.49  -0.49  -0.49   0.28  0.48  0     1.51
 1  -1  1  -0.2    1   0.48  -0.48   0.48   0.76  0     0.48  1.44
-1   1  1  -0.28   1  -0.51   0.51   0.51   0.25  0.51  0.99  1.64
-1  -1  1   0.23  -1   0.49   0.49  -0.49   0.74  1.0   0.5   1.51
E = Σ (t - y)² = 1.51 + 1.44 + 1.64 + 1.51 = 6.1
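A minimal MATLAB sketch of these delta-rule updates for the OR example (identity activation, α = 0.4, all weights and the bias starting at 0.1) is given below; because the slides round intermediate values, the printed errors come out near, but not exactly equal to, 3.12 and 6.1.

X = [1 1; 1 -1; -1 1; -1 -1];        % bipolar OR inputs (one pattern per row)
T = [1 1 1 -1];                      % bipolar OR targets
W = [0.1 0.1]; b = 0.1; alpha = 0.4;
for epoch = 1:2
    E = 0;
    for i = 1:4
        y  = b + X(i,:)*W';          % identity activation: y = net
        dW = alpha*(T(i) - y)*X(i,:);% delta (Widrow-Hoff) rule
        db = alpha*(T(i) - y);
        W  = W + dW;  b = b + db;
        E  = E + (T(i) - y)^2;
    end
    fprintf('Epoch %d: E = %.2f\n', epoch, E);
end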
  • 29. MADALINE Network  Developed by Bernard Widrow.  Multiple ADALINE Network: a number of ADALINEs combined across multiple layers with adjustable weights.  The use of multiple ADALINEs helps counter the problem of non-linear separability.
  • 30. Perceptron Learning Functions in MATLAB  learnp  learnp is the perceptron weight/bias learning function.  learnp calculates the weight change dW for a given neuron from the neuron's input P and error E according to the perceptron learning rule: dw = 0 if e = 0; dw = p' if e = 1; dw = -p' if e = -1.  This can be summarized as dw = e*p'
  • 31. Perceptron Learning Functions  learnpn  Normalized perceptron weight and bias learning function.  learnpn is a weight and bias learning function; it can result in faster learning than learnp when input vectors have widely varying magnitudes.  learnpn calculates the weight change dW for a given neuron from the neuron's input P and error E according to the normalized perceptron learning rule: pn = p / sqrt(1 + p(1)^2 + p(2)^2 + ... + p(R)^2); dw = 0 if e = 0; dw = pn' if e = 1; dw = -pn' if e = -1.  The expression for dW can be summarized as dw = e*pn'
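The normalization can be reproduced directly in plain MATLAB; the input vector and error below are made-up values used only to show the calculation.

p  = [2; -4; 4];                     % hypothetical input vector for one neuron
e  = 1;                              % hypothetical error for that neuron
pn = p / sqrt(1 + sum(p.^2));        % normalized input, as in the rule above
dw = e * pn';                        % weight change dw = e*pn'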
  • 32. Multilayer Perceptron (MLP)  One of the oldest and most popular multi-layer neural network architectures.  Uses a non-linear activation function like the logistic sigmoid or the hyperbolic tangent, or a piecewise-linear activation function such as the Rectified Linear Unit (ReLU).
  • 33. Multilayer Perceptron  The advantage of the MLP over the classic Perceptron and ADALINE is that it can create complex, non-linear decision boundaries, allowing it to tackle problems where the different classes are not linearly separable.
  • 34. Back Propagation Network  Developed by Rumelhart, Hinton and Williams.  The back propagation learning rule is applicable to any feed-forward network architecture (including multilayer networks).  Back propagation is a systematic method of training, built on a strong mathematical foundation, and has very good application potential.  The BP algorithm is a generalization of the Delta rule or Widrow-Hoff error-correction rule.  A slow rate of convergence and the local-minima problem are its weaknesses.
  • 35. Error Back Propagation  The back propagation learning rule is applicable to any multilayer feed-forward network architecture.  It can be considered the cornerstone of modern neural networks and deep learning.  The backpropagation algorithm consists of two steps: Forward Pass: inputs pass through the network to produce output predictions (this step is also known as the propagation step). Backward Pass: the gradient of the loss function is calculated in the network's final layer (prediction layer); it is then used for recursive application of the chain rule to update the weights in the network (also known as the weight update or backpropagation step).
  • 36. Error Back Propagation  The input array x passes through the first layer, whose output values are connected to the input values of the next layer, and so on, until the network gives the outputs of the last layer.  The value of the error function is calculated by comparison with the expected output value.  In order to minimize the error, the gradients of the error function with respect to each weight are calculated.
  • 37. Error Back Propagation  Once the gradient vector has been calculated, each weight is updated iteratively, recalculating the gradients at the beginning of each training iteration, until the error becomes lower than an established threshold or the maximum number of iterations is reached; at that point the algorithm ends and the network is trained.  Current deep learning networks, such as Convolutional Neural Networks, also use backpropagation internally.  Recurrent Neural Networks, which have been used for natural language processing, also utilize this algorithm.
  • 39. Back Propagation Network  Input Layer Computation: {O}I = {I}I  Hidden Layer Computation: {I}H = [V]T {O}I ; {O}H = 1 / (1 + e^(-λ({I}H - fH))), where λ is the sigmoidal gain and fH is the threshold of the hidden layer.  {I}O = [W]T {O}H
  • 40. Back Propagation Network  Output Layer Computation: {O}O = 1 / (1 + e^(-λ({I}O - fO)))  Calculation of error (Euclidean norm): E = ½ Σ (TO - OO)²
  • 41. Back Propagation Network  MLFF networks with non-linear activation functions have an MSE surface, defined over the Q-dimensional weight space, which is not a smooth parabolic surface.  The error surface is complex and consists of many local and global minima. [Figure: error surface E over weights V and W, marking the initial weights (A), the adjusted weights (B) and the best weights (C)]
  • 43. Back Propagation Network  During training, as the incremental adjustments to the weights are made, the location shifts to a different point on the error-weight surface.  In moving down the error-weight surface, the path followed depends on the shape of the surface and on the learning rate.  The error surface is assumed to be truly spherical.  Vector AB = (Vi+1 - Vi) ī + (Wi+1 - Wi) ĵ = ∆V ī + ∆W ĵ, where the weight changes are taken in proportion to the negative gradient, i.e. ∆V ∝ -∂E/∂V and ∆W ∝ -∂E/∂W.
  • 44. Back Propagation Network  ∂E/∂W = (∂E/∂OO)(∂OO/∂IO)(∂IO/∂W), with ∂E/∂OO = -(TO - OO), ∂OO/∂IO = λ OO (1 - OO) and ∂IO/∂W = {O}H, so that ∂E/∂W = -λ (TO - OO) OO (1 - OO) {O}H
  • 45. Back Propagation Network  For {O}O = 1 / (1 + e^(-λ {I}O)), the derivative is d{O}O/d{I}O = λ e^(-λ {I}O) / (1 + e^(-λ {I}O))² = λ OO (1 - OO)
  • 46. Back Propagation Network  ∂E/∂V = (∂E/∂OO)(∂OO/∂IO)(∂IO/∂OH)(∂OH/∂IH)(∂IH/∂V) = -λ (TO - OO) OO (1 - OO) W λ OH (1 - OH) {O}I  The weights are then updated as Vi+1 = Vi + ∆V and Wi+1 = Wi + ∆W, with ∆V ∝ -∂E/∂V and ∆W ∝ -∂E/∂W.
  • 47. Back Propagation Network  Learning Rate Coefficient (α): determines the size of the weight adjustments made at each iteration and hence influences the rate of convergence.  Momentum Term (Coefficient) (η): momentum is used to keep the training process going in the same general direction, i.e. by adding a fraction of the previous weight change to the current weight change.  It reduces the training time and enhances the stability of the training process.
  • 48. Back Propagation Example  Training data:
x1   x2    T
0.4  -0.7  0.1
0.3  -0.5  0.05
0.6   0.1  0.3
0.2   0.4  0.25
Input pattern considered: Oi = [0.4, -0.7]
Weight matrices: V = [0.1 0.4; -0.2 0.2] (rows: inputs, columns: hidden units), W = [0.2; -0.5]
  • 49. Back Propagation Example  Oi = Ii = [0.4, -0.7]  Ih = VT Oi = [0.18, 0.02]  Oh = [0.5448, 0.505]  Io = WT Oh = -0.14354  Oo = 0.4642 and T = 0.1  E = (0.1 - 0.4642)² = 0.13264
  • 50. Back Propagation Network  ∂E/∂W = λ (T - Oo) Oo (1 - Oo) Oh = 1×(0.1 - 0.4642)×0.4642×(1 - 0.4642)×[0.5448, 0.505]T = -0.09058×[0.5448, 0.505]T = [-0.0493, -0.0457]T  ∂E/∂V = λ (T - Oo) Oo (1 - Oo) W λ Oh (1 - Oh) Ii
  • 51. Back Propagation Network  ∂E/∂V = λ (T - Oo) Oo (1 - Oo) W λ Oh (1 - Oh) Oi: the hidden-layer terms are -0.09058 × [0.2, -0.5] × [0.5448, 0.505] × [1 - 0.5448, 1 - 0.505] (element-wise) = [-0.00449, 0.01132]; combined with Oi = [0.4, -0.7] this gives the gradient matrix [-0.001077 0.002716; 0.001855 0.004754]
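The forward-pass numbers of this worked example can be checked with a few lines of MATLAB; the gradient terms are formed with the standard sigmoid-derivative expressions (λ = 1, no thresholds). Sign and learning-rate conventions vary between texts, so this is a sketch of the calculation rather than the slides' exact procedure.

V  = [0.1 0.4; -0.2 0.2];            % input-to-hidden weights
W  = [0.2; -0.5];                    % hidden-to-output weights
Oi = [0.4; -0.7];  T = 0.1;          % input pattern and target
Ih = V'*Oi;                          % [0.18; 0.02]
Oh = 1./(1 + exp(-Ih));              % [0.5448; 0.5050]
Io = W'*Oh;                          % -0.1435
Oo = 1/(1 + exp(-Io));               % 0.4642
E  = (T - Oo)^2;                     % 0.1326
d    = (T - Oo)*Oo*(1 - Oo);         % output-layer term, -0.09058
dEdW = d*Oh;                         % [-0.0493; -0.0457]
dh   = (W*d).*Oh.*(1 - Oh);          % hidden-layer terms, [-0.00449; 0.01132]
dEdV = Oi*dh';                       % gradient terms for the V matrix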
  • 52. Gradient Descent Training Functions traingd  Gradient descent backpropagation  traingd can train any network as long as its weight, net input, and transfer functions have derivative functions.  Backpropagation is used to calculate derivatives of performance perf with respect to the weight and bias variables X. Each variable is adjusted according to gradient descent: dX = lr * dperf/dX traingdm  Gradient descent with momentum backpropagation  Backpropagation is used to calculate derivatives of performance perf with respect to the weight and bias variables X. Each variable is adjusted according to gradient descent with momentum, dX = mc*dXprev + lr*(1-mc)*dperf/dX where dXprev is the previous change to the weight or bias.
  • 53. Gradient Descent Training Functions traingda  Gradient descent with adaptive learning rate backpropagation  traingda can train any network as long as its weight, net input, and transfer functions have derivative functions.  Backpropagation is used to calculate derivatives of performance perf with respect to the weight and bias variables X.  Each variable is adjusted according to gradient descent: dX = lr * dperf/dX At each epoch, if performance decreases toward the goal, then the learning rate is increased by the factor lr_inc. If performance increases by more than the factor max_perf_inc, the learning rate is adjusted by the factor lr_dec and the change that increased the performance is not made.
  • 54. Gradient Descent Training Functions traingdx  Gradient descent with momentum and adaptive learning rate backpropagation  traingdx can train any network as long as its weight, net input, and transfer functions have derivative functions.  Backpropagation is used to calculate derivatives of performance perf with respect to the weight and bias variables X.  Each variable is adjusted according to gradient descent with momentum, dX = mc*dXprev + lr*mc*dperf/dX where dXprev is the previous change to the weight or bias. For each epoch, if performance decreases toward the goal, then the learning rate is increased by the factor lr_inc. If performance increases by more than the factor max_perf_inc, the learning rate is adjusted by the factor lr_dec and the change that increased the performance is not made.
  • 55. Gradient Descent Learning Functions learngd  learngd is the gradient descent weight and bias learning function.  learngd calculates the weight change dW for a given neuron from the neuron's input P and error E, and the weight (or bias) learning rate lr, according to the gradient descent dW = lr*gW. learngdm  learngdm is the gradient descent with momentum weight and bias learning function.  learngdm calculates the weight change dW for a given neuron from the neuron's input P and error E, the weight (or bias) W, learning rate lr, and momentum constant mc, according to gradient descent with momentum: dW = mc*dWprev + (1-mc)*lr*gW  The previous weight change dWprev is stored and read from the learning state LS.
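The momentum expression quoted for learngdm can also be applied by hand outside the toolbox; the numbers below are hypothetical and only illustrate one update.

lr = 0.05; mc = 0.9;                 % learning rate and momentum constant
dWprev = [0 0 0];                    % previous weight change
gW = [0.2 -0.1 0.4];                 % hypothetical gradient term
dW = mc*dWprev + (1 - mc)*lr*gW;     % momentum-smoothed weight change
dWprev = dW;                         % stored for the next iteration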
  • 56. Associative Memory  Associative Memory Neural Networks (AMNN) - developed by John Hopfield.  A single-layer feed-forward or recurrent network which makes use of the Hebbian learning or Gradient Descent learning rule.  A storehouse of associated patterns.  A content-addressable memory system: it allows the recall of data based on the degree of similarity between the input pattern and the patterns stored in memory.
  • 57. Associative Memory  AMNN - Hopfield Neural Networks and Bi-directional Associative Memory.  AMNN are single-layer networks in which the weights are determined so that the network stores a set of pattern associations.  Each association is an input-output vector pair.  AutoAMNN - the associated output vector is the same as the input vector.  HeteroAMNN - the inputs and outputs are different.
  • 58. Auto Associative Memory  Hopfield Associative Memory  The connection matrix is indicative of the association of the pattern with itself: T = Σi=1..m [Ai]T [Ai]  Autocorrelator's recall equation (activation function), a two-parameter bipolar threshold equation: ajnew = f(Σi aiold tij , ajold), where f(α, β) = 1 if α > 0, β if α = 0, -1 if α < 0  Hamming Distance of vector X from Y: HD(x, y) = Σi=1..n |xi - yi|
  • 59. Auto Associative Memory - Example  Considering three patterns A1 = (-1, 1, -1, 1), A2 = (1, 1, 1, -1), A3 = (-1, -1, -1, 1)  Recall equation: ajnew = f(Σi aiold tij , ajold)  T = Σi=1..m [Ai]T [Ai] =
 3  1  3 -3
 1  3  1 -1
 3  1  3 -3
-3 -1 -3  3
  • 60. Auto Associative Memory - Example  Stored pattern A2 = (1, 1, 1, -1), recalled with the matrix T above:
a1new = f(1×3 + 1×1 + 1×3 + (-1)×(-3), 1) = f(3+1+3+3, 1) = f(10, 1) = 1
a2new = f(6, 1) = 1
a3new = f(10, 1) = 1
a4new = f(-10, -1) = -1
A2new = (1, 1, 1, -1)
  • 61. Auto Associative Memory - Example  Another, noisy vector A' = (1, 1, 1, 1):
a1new = f(3+1+3-3, 1) = f(4, 1) = 1
a2new = f(4, 1) = 1
a3new = f(4, 1) = 1
a4new = f(-4, 1) = -1
The recalled pattern is A2 = (1, 1, 1, -1).
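A minimal MATLAB sketch of this autocorrelator: build T from the three stored bipolar patterns and recall the noisy probe with the two-parameter threshold above.

A = [-1 1 -1 1; 1 1 1 -1; -1 -1 -1 1];   % stored patterns A1, A2, A3 (rows)
T = A'*A;                                 % connection matrix, sum of Ai'*Ai
a = [1 1 1 1];                            % noisy probe vector A'
net  = a*T;                               % net input to each unit: [4 4 4 -4]
anew = a;                                 % keep the old value where net = 0
anew(net > 0) =  1;
anew(net < 0) = -1;
disp(anew);                               % recalls A2 = [1 1 1 -1]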
  • 62. Hetero Associative Memory  Developed by Bart Kosko  Hetero Associative memory neural network consists of only one layer of weighted interconnections.  There exists ‘n’ number of input neurons in the input layer and ‘m’ number of output neurons in the output layer.  This is a fully interconnected network, wherein the inputs and the outputs are different, hence it is called Hetero Associative memory neural network.  The weights are found using the Hebb Rule
  • 63. Hetero Associative Memory  There are N training pairs {(A1,B1), (A2,B2), ...}  Ai = (ai1, ai2, ... ain), Bi = (bi1, bi2, ... bim)  Correlation Matrix: M = Σi=1..m [Ai]T [Bi]  Bi-directional Associative Memory (BAM) is a hetero associative recurrent neural network consisting of two layers.  The net iterates by sending a signal back and forth between the two layers until each neuron's activation remains constant for several steps.
  • 64. Hetero Associative Memory  The net can respond to input on either layer.  The layers are referred to as the X-layer and the Y-layer instead of input and output layer.  Recall equations: B' = f(AM), A' = f(B'MT), then B'' = f(A'M), A'' = f(B''MT), where f(α) = 1 if α > 0, 0 if α = 0, -1 if α < 0
  • 65. Bi-directional Associative Memories  A1 = (1 0 0 1), B1 = (1 0 1); A2 = (1 0 1 0), B2 = (0 1 1); A3 = (1 1 0 0), B3 = (0 0 1)  Converting to bipolar: A1 = (1 -1 -1 1), B1 = (1 -1 1); A2 = (1 -1 1 -1), B2 = (-1 1 1); A3 = (1 1 -1 -1), B3 = (-1 -1 1)
  • 66. Bi-directional Associative Memories  Finding the connection matrix: M = Σi=1..m [Ai]T [Bi] =
-1 -1  3
-1 -1 -1
-1  3 -1
 3 -1 -1
  • 67. Bi-directional Associative Memories  Stored pattern A1 = (1 -1 -1 1) presented with the matrix M above:
b1new = f(1×(-1) + (-1)×(-1) + (-1)×(-1) + 1×3) = f(-1+1+1+3) = f(4) = 1
b2new = f(-4) = -1
b3new = f(4) = 1
B1new = (1, -1, 1)
  • 68. Bi-directional Associative Memories  With pattern B1 = (1 -1 1) and MT:
a1new = f(1×(-1) + (-1)×(-1) + 1×3) = f(-1+1+3) = f(3) = 1
a2new = f(-1) = -1
a3new = f(-4) = -1
a4new = f(3) = 1
A1new = (1, -1, -1, 1)
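Using the bipolar pairs reconstructed above, the BAM recall can be sketched in MATLAB. Note that MATLAB's sign() returns 0 for a zero net input, so a zero would have to be handled with the "keep the previous activation" rule.

A = [1 -1 -1 1; 1 -1 1 -1; 1 1 -1 -1];   % A1, A2, A3 (rows)
B = [1 -1 1; -1 1 1; -1 -1 1];           % B1, B2, B3 (rows)
M = A'*B;                                % 4x3 correlation matrix, sum of Ai'*Bi
a  = [1 -1 -1 1];                        % present A1 on the X-layer
b  = sign(a*M);                          % forward pass  -> [1 -1 1]    (B1)
a2 = sign(b*M');                         % backward pass -> [1 -1 -1 1] (A1)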
  • 70. Character Recognition  Two stored patterns of the letter E on a 5×3 grid, in binary and bipolar form:
Pattern 1 (binary):  1 1 1 / 1 0 0 / 1 1 1 / 1 0 0 / 1 1 1
Pattern 2 (binary):  1 1 1 / 1 0 0 / 1 1 0 / 1 0 0 / 1 1 1
Pattern 1 (bipolar): 1 1 1 / 1 -1 -1 / 1 1 1 / 1 -1 -1 / 1 1 1
Pattern 2 (bipolar): 1 1 1 / 1 -1 -1 / 1 1 -1 / 1 -1 -1 / 1 1 1
  • 71. Character Recognition  The connection matrix formed from these two 15-element patterns will be a 15×15 matrix.
  • 73. Competitive Network  Self-Organizing Maps (SOMs)  Self-Organizing Maps (SOMs) were invented by Professor T. Kohonen; the network is also known as the Kohonen Neural Network (KNN).  This topology uses an unsupervised learning procedure to produce a two-dimensional discretized representation of the input space of the training samples, called a 'map'.  The KNN is widely used for clustering applications.
  • 74. Self-Organizing Maps (SOMs)  Kohonen worked on the development of the theory of competition.  The most commonly used competition among a group of neurons is Winner-Takes-All.  Here, only one neuron in the competing group will have a non-zero output signal when the competition is completed.  The self-organizing map, developed by Kohonen, groups the input data into clusters, which is commonly used for unsupervised learning.
  • 75. Self-Organizing Maps (SOMs)  Whenever an input is presented, the network finds the 'distance' of the weight vector of each node from the input vector, and selects the node with the smallest distance.  In this way, the whole network selects the node with its weight vector closest to the input vector, i.e. the winner.  The network learns by moving the winning weight vector towards the input vector while the other weight vectors remain unchanged.
  • 76.  If the samples are in clusters, then every time the winning weight vector moves towards a particular sample in one of the clusters.  Eventually each of the weight vectors would converge to the centroid of one cluster. At this point, the training is complete.  After training, the weight vectors become centroids of various clusters. Self-Organizing Maps (SOMs)
  • 78.  To cluster 4 bipolar input patterns into 2 clusters.  I1 = [1 1 1 -1]  I2 = [-1 -1 -1 1]  I3 = [1 -1 -1 -1]  I4 = [-1 -1 1 1]  The weights connected to the cluster units are:  W1 = [0.2 0.6 0.5 0.9]  W2 = [0.8 0.4 0.7 0.3]  Learning rate α = 0.9 Clustering of Bipolar Input Patterns
  • 79. Clustering of Bipolar Input Patterns  The winning cluster unit is the one for which the Euclidean Distance (ED) between its weight vector and the given input vector is the minimum: ED(j) = Σi=1..n (Wji - Ii)²  ED(1) = (0.2-1)² + (0.6-1)² + (0.5-1)² + (0.9-(-1))² = 4.66  ED(2) = (0.8-1)² + (0.4-1)² + (0.7-1)² + (0.3-(-1))² = 2.18  The winner is the second cluster unit, as its ED is minimum.
  • 80.  Weight Updation for cluster 2  Wi=2(new) = Wi=2(old) + α*(I1 - Wi=2(old))  W2 = [0.8 0.4 0.7 0.3]  W21(new) = 0.8 + 0.9*(1-0.8) = 0.98  W22(new) = 0.4 + 0.9*(1-0.4) = 0.94  W23(new) = 0.7 + 0.9*(1-0.7) = 0.97  W24(new) = 0.3 + 0.9*(-1-0.3) = -0.87  W2(new) = [0.98 0.94 0.97 -0.87]  W1 = [0.2 0.6 0.5 0.9] Clustering of Bipolar Input Patterns
  • 81.  Clustering of Bipolar Input Patterns
  • 82. Clustering of Bipolar Input Patterns  Weight Updation for cluster 1 (the winner for I2):  Wi=1(new) = Wi=1(old) + α*(I2 - Wi=1(old))  W1 = [0.2 0.6 0.5 0.9]  W11(new) = 0.2 + 0.9*(-1-0.2) = -0.88  W12(new) = 0.6 + 0.9*(-1-0.6) = -0.84  W13(new) = 0.5 + 0.9*(-1-0.5) = -0.85  W14(new) = 0.9 + 0.9*(1-0.9) = 0.99  W1(new) = [-0.88 -0.84 -0.85 0.99]  W2(new) = [0.98 0.94 0.97 -0.87]
  • 83.  Clustering of Bipolar Input Patterns
  • 84. Clustering of Bipolar Input Patterns  Weight Updation for cluster 1 (the winner for I3):  Wi=1(new) = Wi=1(old) + α*(I3 - Wi=1(old))  W1 = [-0.88 -0.84 -0.85 0.99]  W11(new) = -0.88 + 0.9*(1-(-0.88)) = 0.812  W12(new) = -0.84 + 0.9*(-1-(-0.84)) = -0.984  W13(new) = -0.85 + 0.9*(-1-(-0.85)) = -0.985  W14(new) = 0.99 + 0.9*(-1-0.99) = -0.801  W1(new) = [0.812 -0.984 -0.985 -0.801]  W2(new) = [0.98 0.94 0.97 -0.87]
  • 85. Clustering of Bipolar Input Patterns  Euclidean Distance (ED) for pattern 4, I4 = [-1 -1 1 1]: ED(j) = Σi=1..n (Wji - Ii)²  ED(1) = (0.812-(-1))² + (-0.984-(-1))² + (-0.985-1)² + (-0.801-1)² = 10.4674  ED(2) = (0.98-(-1))² + (0.94-(-1))² + (0.97-1)² + (-0.87-1)² = 11.1818  The winner is the first cluster unit, as its ED is minimum.
  • 86. Clustering of Bipolar Input Patterns  Weight Updation for cluster 1 (the winner for I4):  W1(old) = [0.812 -0.984 -0.985 -0.801]  W11(new) = 0.812 + 0.9*(-1-0.812) = -0.8188  W12(new) = -0.984 + 0.9*(-1-(-0.984)) = -0.9984  W13(new) = -0.985 + 0.9*(1-(-0.985)) = 0.8015  W14(new) = -0.801 + 0.9*(1-(-0.801)) = 0.8199  W1(new) = [-0.8188 -0.9984 0.8015 0.8199]  W2(new) = [0.98 0.94 0.97 -0.87]  After one epoch (iteration), patterns I2, I3 and I4 are in cluster W1 and I1 is in cluster W2.  After several epochs, the clustering stabilizes.
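One epoch of this winner-takes-all clustering can be written as a short MATLAB sketch (squared Euclidean distance, α = 0.9, only the winner's weight vector is moved):

I = [1 1 1 -1; -1 -1 -1 1; 1 -1 -1 -1; -1 -1 1 1];   % the four bipolar patterns (rows)
W = [0.2 0.6 0.5 0.9; 0.8 0.4 0.7 0.3];              % the two cluster weight vectors (rows)
alpha = 0.9;
for p = 1:size(I,1)
    d = sum((W - repmat(I(p,:), 2, 1)).^2, 2);       % squared distances to each cluster
    [~, j] = min(d);                                 % winning cluster unit
    W(j,:) = W(j,:) + alpha*(I(p,:) - W(j,:));       % move the winner towards the input
end
disp(W);                                             % cluster weights after one epoch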
  • 87. Clustering Technique  Vector Quantization is a method of dynamic allocation of cluster centres.  To begin with, the first pattern creates the cluster that holds it.
Point  x  y     Point  x  y
P1     2  3     P7     6  4
P2     3  3     P8     7  4
P3     2  6     P9     2  4
P4     3  6     P10    3  4
P5     6  3     P11    2  7
P6     7  3     P12    3  7
  • 88. Clustering Technique [Figure: scatter plot of the points P1-P12]
  • 90. Clustering Technique  Coordinates of P1 = (2,3); centre of cluster C1 = (2,3); threshold distance = 1.5.  Considering point P2 with coordinates (3,3): distance between P2 and C1 = √((3-2)² + (3-3)²) = 1.0 < 1.5, hence P2 is included in C1.  New cluster centre of C1 = ((3+2)/2, (3+3)/2) = (2.5, 3)
  • 91. Clustering Technique [Figure: scatter plot of the points with the cluster centre C1 marked]
  • 92. Clustering Technique  Considering point P3 with coordinates (2,6); centre of cluster C1 = (2.5,3): distance between P3 and C1 = √((2-2.5)² + (6-3)²) = 3.04 > 1.5, hence P3 is not included in C1 and another cluster C2 is created with centre (2, 6).  Considering point P4 with coordinates (3,6): distance between P4 and C1 = √((3-2.5)² + (6-3)²) = 3.04 > 1.5; distance between P4 and C2 = √((3-2)² + (6-6)²) = 1.0 < 1.5, hence P4 is not included in C1 but is included in C2.  New cluster centre of C2 = ((3+2)/2, (6+6)/2) = (2.5, 6)
  • 93. Clustering Technique [Figure: scatter plot of the points with cluster centres C1 and C2 marked]
  • 94. Clustering Technique  Considering point P5 with coordinates (6,3); centre of cluster C1 = (2.5,3): distance between P5 and C1 = √((6-2.5)² + (3-3)²) = 3.5 > 1.5; distance between P5 and C2 = √((6-2.5)² + (3-6)²) = 4.6 > 1.5, hence P5 is not included in C1 or C2 and another cluster C3 is created with centre (6, 3).  Considering point P6 with coordinates (7,3); centre of cluster C1 = (2.5,3), centre of cluster C2 = (2.5,6): distance between P6 and C1 = √((7-2.5)² + (3-3)²) = 4.5 > 1.5; distance between P6 and C2 = √((7-2.5)² + (3-6)²) = 5.40 > 1.5, hence P6 is not included in C1 or C2; distance between P6 and C3 = √((7-6)² + (3-3)²) = 1.0 < 1.5, so P6 is included in C3.  New cluster centre of C3 = ((6+7)/2, (3+3)/2) = (6.5, 3)
  • 95-101. Clustering Technique [Figures: scatter plots showing the remaining points being assigned, step by step, to the clusters C1, C2 and C3]
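A sketch of this vector-quantization procedure in MATLAB is given below; it treats each cluster centre as the running mean of the points assigned to it, which reproduces the midpoint updates shown above for two-point clusters.

P = [2 3; 3 3; 2 6; 3 6; 6 3; 7 3; 6 4; 7 4; 2 4; 3 4; 2 7; 3 7];  % P1..P12
thr = 1.5;                                % threshold distance
C = P(1,:);  n = 1;                       % the first point creates cluster C1
for k = 2:size(P,1)
    d = sqrt(sum((C - repmat(P(k,:), size(C,1), 1)).^2, 2));
    [dmin, j] = min(d);
    if dmin < thr                         % close enough: join the nearest cluster
        C(j,:) = (C(j,:)*n(j) + P(k,:)) / (n(j) + 1);   % update centre as the mean
        n(j) = n(j) + 1;
    else                                  % otherwise start a new cluster here
        C = [C; P(k,:)];  n = [n 1];
    end
end
disp(C);                                  % final cluster centres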
  • 103. Adaptive Resonance Theory  ART was introduced by Gail Carpenter and Stephen Grossberg.  It is widely used for clustering applications.  A problem faced by competitive NNs is that they do not always form stable clusters; they may oscillate when more input patterns are presented.  ART NNs are receptive to significant new patterns and still remain stable.  There are three types of ART networks: (i) ART-1, (ii) ART-2 and (iii) ART-3.
  • 104. Adaptive Resonance Theory  ART-1 can cluster only binary inputs.  ART-2 can handle gray-scale inputs.  ART-3 can handle analog inputs better by overcoming the limitations of ART-2.  The basic ART learning is unsupervised.  Stability of the network means that a pattern should not oscillate among different cluster units at different stages of training.  Plasticity is the ability of the net to learn new patterns equally well at any stage of learning.
  • 105. Adaptive Resonance Theory  The key innovation of ART is the use of a degree of expectation called the vigilance parameter.  The vigilance parameter is a user-specified value that decides the degree of similarity essential for input patterns to be assigned to the same cluster unit.  As each input is presented to the network, it is compared with the prototype vector for a match based on the vigilance parameter.  If the match is not adequate, a new prototype or cluster unit is selected.  In this way, previously learned memories (prototypes) are not eroded by new learning.
  • 106. Adaptive Resonance Theory  'Resonance' in ART is the state of the network in which a prototype vector very closely matches the current input vector; it leads to a state which permits learning.  During this resonant state, the weight updation takes place.  The basic architecture consists of three layers:  an Input Processing Layer for processing the given inputs, further divided into an Input Layer and an Input Interface Layer;  an Output Layer which holds the cluster units - this is the competitive layer or recognition region.
  • 107. Adaptive Resonance Theory  Interface layer is called the comparison region where it transfers the input vector to its best match in the recognition region.  Reset Layer decides the degree of similarity of patterns placed on the same cluster by a reset mechanism.  It compares the strength of the recognition match to the vigilance parameter.  Bottom-up weights are connected between the Input Interface Layer to the Output layer.  Top-down weights are connected between the Output layer to the Input Interface Layer.
  • 108. Adaptive Resonance Theory [Architecture diagram: input layer, input interface layer and output (cluster) layer, with a reset layer; bottom-up weights connect the interface layer to the output layer and top-down weights connect the output layer back to the interface layer]
  • 110. Adaptive Resonance Theory  The units transmit the information to the output layer through the bottom-up weights u:  O1 = I1u11 + I2u12 = 0.5*0.3 + 0.6*0.5 = 0.45  O2 = I1u21 + I2u22 = 0.5*0.2 + 0.6*0.6 = 0.46  O2 > O1, so output cluster 2 is selected as the winner.  The information about the winner is sent from the output layer to the interface layer through the top-down weights d:  I1 = S1d11 = 0.5*0.1 = 0.05  I2 = S2d12 = 0.6*0.3 = 0.18  Norm of I: ||I|| = I1 + I2 = 0.05 + 0.18 = 0.23  The value of ||I|| gives an estimate of the degree of match.
  • 111. Adaptive Resonance Theory  Learning will occur only if the match is acceptable relative to the value of the vigilance parameter.  The decision to learn is made by calculating the ratio of ||I|| to ||S||.  The weights are updated if the match ratio ||I||/||S|| ≥ v.  Here ||I||/||S|| = 0.23/1.1 = 0.209 < v (0.3).  Since ||I||/||S|| < v, the current cluster unit is rejected and inhibited.
  • 112. Adaptive Resonance Theory  I1 and I2 are calculated again for the next cluster unit:  I1 = S1d21 = 0.5*0.6 = 0.3  I2 = S2d22 = 0.6*0.1 = 0.06  ||I|| = I1 + I2 = 0.3 + 0.06 = 0.36  ||I||/||S|| = 0.36/1.1 = 0.327 > v (0.3)  Cluster 2 is selected and S is assigned to it.  The weights associated with it are updated.
  • 113. Adaptive Resonance Theory  The top-down weights associated with cluster 2 are assigned the newly calculated values I1 and I2:  d21 = I1 = 0.3, d22 = I2 = 0.06  The new bottom-up weights are calculated as:  u21 = L·I1/(L - 1 + ||I||) = 4×0.3/(4 - 1 + 0.36) = 0.454  u22 = L·I2/(L - 1 + ||I||) = 4×0.06/(4 - 1 + 0.36) = 0.091  This procedure is repeated until a cluster unit is accepted or all the units in the output layer are inhibited.
  • 114. Adaptive Resonance Theory  If all the units in the output layer are inhibited, a decision has to be taken by the user:  Reduce the value of the vigilance parameter, allowing less closely matched patterns to be placed on the same cluster units, which may have been inhibited during an earlier learning trial.  Add more cluster units.  Specify the current input pattern as one that cannot be clustered.  The vigilance parameter v has a value less than 1, and L > 1.
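The vigilance test walked through above can be condensed into a small MATLAB sketch. The layout of the bottom-up weights u and top-down weights d (one row per cluster unit) is an assumption made for this sketch, and the full algorithm would loop, inhibiting any rejected cluster and trying the next one.

S = [0.5 0.6];  v = 0.3;  L = 4;          % input, vigilance parameter, L
u = [0.3 0.5; 0.2 0.6];                   % bottom-up weights (assumed one row per cluster)
d = [0.1 0.3; 0.6 0.1];                   % top-down weights (assumed one row per cluster)
O = u*S';                                 % cluster activations
[~, J] = max(O);                          % winning cluster unit
I = S .* d(J,:);                          % interface-layer activity for the winner
if sum(I)/sum(S) >= v                     % match ratio against the vigilance parameter
    d(J,:) = I;                           % top-down weights take the new values
    u(J,:) = L*I/(L - 1 + sum(I));        % bottom-up update, formula as on the slides
else
    % reject and inhibit cluster J, then repeat with the next best cluster
end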
  • 116. Neural Network Toolbox  plotpv - plots perceptron input/target vectors: plotpv(P,T), where P is the matrix of input vectors and T is the matrix of binary target vectors.  P = [-0.5 -0.5 +0.3 -0.1; -0.5 +0.5 -0.5 +1.0]; T = [1 1 0 0]; plotpv(P,T);  plotpc - plots the classification line on a perceptron vector plot: plotpc(W,B), where W is the weight matrix and B is the bias vector.
  • 117. Neural Network Toolbox (percpt)  newp creates a perceptron: net = newp(P,T,TF,LF), where P is the R x Q1 matrix of input vectors, T is the S x Q2 matrix of target vectors, TF is the transfer function (default = 'hardlim') and LF is the learning function (default = 'learnp').  net.iw{1,1} = [-1.2 -0.5]; net.b{1} = 1; plotpc(net.iw{1,1},net.b{1})  adapt allows the neural network to change its weights and biases on inputs.
  • 118. Neural Network Toolbox  adapt allows the neural network to change its weights and biases on inputs.  This function calculates network outputs and errors after each presentation of an input.  [net,Y,E,tr] = adapt(net,P,T), where net is the network, P the network inputs, T the network targets (default = zeros), Y the network outputs, E the network errors and tr the training record (epoch and perf).  net.adaptParam.passes sets the number of passes over the data.
  • 119. Neural Network Toolbox  sim simulates a neural network: it calculates the network outputs for the given inputs, and the errors when targets are supplied.  [Y,E,perf] = sim(net,P,T), where net is the network, P the network inputs, T the network targets (default = zeros), Y the network outputs, E the network errors and perf the network performance.
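Putting the toolbox functions from the last few slides together, a small perceptron session might look like the following; this assumes the older Neural Network Toolbox interface quoted on these slides.

P = [-0.5 -0.5 +0.3 -0.1; -0.5 +0.5 -0.5 +1.0];  % inputs from the plotpv example
T = [1 1 0 0];                                   % binary targets
net = newp(P,T);                                 % perceptron (hardlim / learnp defaults)
net.adaptParam.passes = 10;                      % several passes over the data
[net,Y,E] = adapt(net,P,T);                      % incremental weight/bias updates
Ysim = sim(net,P);                               % outputs of the trained perceptron
plotpv(P,T);                                     % plot the input/target vectors
plotpc(net.iw{1,1}, net.b{1});                   % overlay the learned decision line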
  • 120. Neural Network Toolbox (feedfrwd)  newff creates a feed-forward backpropagation network: net = newff(P,T,Si,TFi), where P is the R x Q1 matrix of input vectors, T is the SN x Q2 matrix of target vectors, Si is the size of the ith (hidden) layer and TFi is the transfer function of the ith layer.  This function initializes the network's weights and biases; it also sets the input/output data processing functions and the training functions to default values.
  • 121. Neural Network Toolbox  train trains a neural network: it trains a network net according to net.trainFcn and net.trainParam.  [net,tr,Y,E] = train(net,P,T), where net is the network, P the network inputs, T the network targets (default = zeros), Y the network outputs, E the network errors and tr the training record (epoch and perf).  Useful parameters: net.trainParam.epochs and net.trainParam.goal.
  • 122. Neural Network Toolbox  Two different styles of training:  Incremental training - the weights and biases of the network are updated each time an input is presented to the network.  In this case the function adapt is used, and the inputs and targets are presented as sequences: P = {[1;2] [2;1] [2;3] [3;1]}; T = {4 5 7 7};  Batch training - the weights and biases are only updated after all the inputs are presented.  The function train can only perform batch training.
  • 123. Neural Network Toolbox  train applies the inputs to the network, calculates the outputs, compares them to the associated targets, and calculates a mean square error.  If the error goal is met, or if the maximum number of epochs is reached, training is stopped and train returns the new network and a training record; otherwise train goes through another epoch.  train uses a matrix of concurrent vectors: P = [1 2 2 3; 2 1 3 1]; T = [4 5 7 7];
  • 124. Neural Network Toolbox (feedfrwd1)  Create and train a feed-forward network to evaluate the function y = (x² - 6.5)/(x² + 6.5) for -10 ≤ x ≤ 10.  Generate the input-output training data: x = -10:0.5:10; y = (x.^2 - 6.5)./(x.^2 + 6.5);  Create a feed-forward neural network: net = newff(x,y,5,{'tansig','tansig'},'traingd');  Train the network: net = train(net,x,y);
  • 125. Neural Network Toolbox (preprocs)  Pre-processing and post-processing of inputs and outputs result in faster and more efficient training of the network.  Pre- and post-processing functions for the training data are assigned automatically by network creation functions like newff.  The function mapminmax scales inputs and outputs so that they lie in the range [-1, 1].  The normalized output is converted back to the original range by calling mapminmax with the 'reverse' argument.
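A short sketch of this pre- and post-processing, again assuming the mapminmax interface of the (older) Neural Network Toolbox; the training step in the middle is only indicated by a comment.

x = -10:0.5:10;
t = (x.^2 - 6.5)./(x.^2 + 6.5);          % target function from the previous slide
[xn, xs] = mapminmax(x);                 % scale inputs to the range [-1, 1]
[tn, ts] = mapminmax(t);                 % scale targets to the range [-1, 1]
% ... train a network on (xn, tn) and simulate it to obtain yn ...
% y = mapminmax('reverse', yn, ts);      % map the output back to the original range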