UNIT III
Feedback Neural Networks and Self-Organizing Feature Maps
Course: Soft Computing
By: Dr. P. Indira Priyadarsini, B.Tech, M.Tech, Ph.D
DEPARTMENT OF IT
ASSOCIATIVE LEARNING
Hopfield networks
• It is a form of recurrent artificial neural network invented by John Hopfield.
• Hopfield networks serve as content-addressable memory systems with binary threshold units.
• These are single-layered recurrent networks.
• Every neuron in the network receives feedback from all other neurons in the network.
• The number of input nodes is always equal to the number of output nodes.
• The Hopfield network is similar to our own recall mechanism: it is based on content-addressable memory.
• It is a recurrent network with total connectivity, a symmetric weight matrix, and binary-valued outputs.
• It incorporates both feedforward and feedback (recurrent) connections.
Hopfield networks
- It has been developed based on fixed weights and adaptive activations.
- These nets serve as associative memory nets and can be used to solve constraint
satisfaction problems such as the “Travelling Salesman Problem”
Hopfield networks
• A Hopfield network can reconstruct a pattern from a corrupted signal.
• This means that the network has been able to store the correct pattern – in other words, it has a memory.
• These networks are called associative memories or Hopfield memories.
[Figure: Pattern A and Pattern B, each a grid of +1/-1 values, linked by feedback connections]
• Networks with such connections are called "feedback" or recurrent networks.
• A Hopfield network is one in which we add feedback connections to the network; with these connections the network is capable of holding memories.
• Once the outputs are obtained we feed them back to the inputs again. So we take output1 and feed it into input A, and likewise feed output2 into input B. This gives us two new inputs, and the process is repeated.
• We keep doing this until the outputs no longer change (they remain constant).
HOPFIELD NETWORKS CONTD.
A Hopfield network operates in a discrete-time fashion; in other words, the input and output patterns are discrete vectors, which can be either binary (0, 1) or bipolar (+1, −1) in nature. The network has symmetrical weights with no self-connections, i.e., wij = wji and wii = 0.
Architecture
Following are some important points to keep in mind about the discrete Hopfield network.
This model consists of neurons with one inverting and one non-inverting output.
The output of each neuron should be the input of other neurons but not the input of itself.
Weight/connection strength is represented by wij. Connections can be excitatory as well as inhibitory: excitatory if the output of the neuron is the same as the input, otherwise inhibitory.
Weights should be symmetrical, i.e. wij = wji.
For example, if one of the inputs is the colour of a hat, and the hat can be red, green, blue or yellow, then "blue" as a binary input vector would be [0, 0, 1, 0], and as a bipolar vector [-1, -1, 1, -1].
The output from Y1 going to Y2, Yi and Yn have the
weights w12, w1i and w1n respectively. Similarly, other arcs have the weights on
them.
Training Algorithm
• During training of a discrete Hopfield network, weights are updated using the Hebbian principle.
• We can have binary input vectors as well as bipolar input vectors. Take wii = 0 (no self-connection).
• Hence, in both cases, the weight updates can be done with the following relations:
for binary input patterns s(p): wij = Σp [2si(p) − 1][2sj(p) − 1], for i ≠ j
for bipolar input patterns s(p): wij = Σp si(p) sj(p), for i ≠ j
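The Hebbian weight relation above can be sketched in a few lines of Python; the function name and the example patterns are illustrative, not from the slides:

```python
import numpy as np

def hopfield_train(patterns):
    """Build a Hopfield weight matrix by the Hebbian rule.

    `patterns` is a list of bipolar (+1/-1) vectors; binary (0/1)
    patterns should be mapped to bipolar form (2x - 1) first.
    """
    n = len(patterns[0])
    W = np.zeros((n, n))
    for s in patterns:
        s = np.asarray(s, dtype=float)
        W += np.outer(s, s)        # w_ij += s_i * s_j for every pattern
    np.fill_diagonal(W, 0)         # w_ii = 0 (no self-connections)
    return W

W = hopfield_train([[1, -1, 1, -1], [1, 1, -1, -1]])
```

The resulting matrix is symmetric with a zero diagonal, as required by the discrete Hopfield model.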
Testing Algorithm
Step 1 − Initialize the weights to store patterns, which are obtained from the training algorithm by using the Hebbian principle.
Step 2 − Perform steps 3-9, as long as the activations of the network have not converged.
Step 3 − For each input vector X, perform steps 4-8.
Step 4 − Make the initial activation of the network equal to the external input vector X as follows −
yi = xi, for i = 1 to n
Step 5 − For each unit Yi, perform steps 6-9.
Step 6 − Calculate the net input of the network as follows −
yin,i = xi + Σj yj wji
Step 7 − Apply the activation as follows over the net input to calculate the output −
yi = 1 if yin,i > θi; yi unchanged if yin,i = θi; yi = 0 if yin,i < θi
Here θi is the threshold; its value is taken as 0.
Step 8 − Broadcast this output yi to all other units.
Step 9 − Test the network for convergence.
X = [[1 1 1 0],
     [0 0 1 0]]
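The testing steps can be sketched in Python, assuming one reading of the example above: [1 1 1 0] as the stored pattern and [0 0 1 0] as a corrupted test input. Function names are illustrative:

```python
import numpy as np

def hopfield_recall(W, x, theta=0.0, max_iters=100):
    """Asynchronously update units until activations converge (Steps 2-9).

    W     : symmetric weight matrix with zero diagonal
    x     : external binary (0/1) input vector (Step 4: y = x)
    theta : threshold (the slides take theta = 0)
    """
    y = np.array(x, dtype=float)
    for _ in range(max_iters):
        prev = y.copy()
        for i in np.random.permutation(len(y)):   # Step 5: visit each unit Y_i
            net = x[i] + y @ W[:, i]              # Step 6: y_in,i = x_i + sum_j y_j w_ji
            if net > theta:                       # Step 7: apply the activation
                y[i] = 1
            elif net < theta:
                y[i] = 0
            # net == theta: y_i unchanged
        if np.array_equal(y, prev):               # Step 9: test for convergence
            return y
    return y

# Weights storing the pattern [1 1 1 0] (bipolar form [1 1 1 -1])
s = np.array([1, 1, 1, -1], dtype=float)
W = np.outer(s, s)
np.fill_diagonal(W, 0)
y = hopfield_recall(W, [0, 0, 1, 0])
```

Starting from the corrupted input, the iteration settles back onto the stored pattern.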
FEEDBACK NETWORK
When outputs are directed back as inputs to same-layer or preceding-layer nodes, it results in the formation of feedback networks.
SIMULATED ANNEALING
The basic concept of Simulated Annealing (SA) is motivated by annealing in solids. In the process of annealing, if we heat a metal above its melting point and cool it down, then the structural properties will depend upon the rate of cooling. We can also say that SA simulates the metallurgical process of annealing.
Use in ANN
SA is a stochastic computational method, inspired by the annealing analogy, for approximating the global optimum of a given function. We can use SA to train feed-forward neural networks.
Algorithm
Step 1 − Generate a random solution.
Step 2 − Calculate its cost using some cost function.
Step 3 − Generate a random neighboring solution.
Step 4 − Calculate the new solution's cost with the same cost function.
Step 5 − Compare the cost of the new solution with that of the old solution as follows −
If CostNew Solution < CostOld Solution then move to the new solution; otherwise move to it with probability exp(−(CostNew − CostOld)/T), where T is the current temperature. This occasional uphill move is what allows SA to escape local minima.
Step 6 − Test for the stopping condition, which may be a maximum number of iterations reached or an acceptable solution obtained.
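Steps 1-6 can be sketched as follows; the cost function, neighbor generator, and cooling schedule here are illustrative choices, not prescribed by the slides:

```python
import math
import random

def simulated_annealing(cost, neighbor, x0, t0=1.0, cooling=0.95, iters=1000):
    """Minimize `cost` starting from `x0`, following Steps 1-6."""
    x, c = x0, cost(x0)                  # Steps 1-2: initial solution and cost
    best, best_c = x, c
    T = t0
    for _ in range(iters):               # Step 6: stop after a fixed iteration budget
        x_new = neighbor(x)              # Step 3: random neighboring solution
        c_new = cost(x_new)              # Step 4: its cost
        # Step 5: always accept improvements; accept worse solutions with
        # probability exp(-(c_new - c) / T) so the search can climb out of
        # local minima while T is still high
        if c_new < c or random.random() < math.exp(-(c_new - c) / max(T, 1e-12)):
            x, c = x_new, c_new
        if c < best_c:
            best, best_c = x, c
        T *= cooling                     # geometric cooling schedule
    return best, best_c

# Usage: minimize f(x) = (x - 3)^2 from a poor starting point
random.seed(0)
sol, val = simulated_annealing(lambda x: (x - 3) ** 2,
                               lambda x: x + random.uniform(-0.5, 0.5),
                               x0=0.0)
```

Tracking the best solution seen so far is a common practical refinement, since the current state may wander uphill early on.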
STATE TRANSITION DIAGRAM
The energy analysis of the Hopfield network in the previous section shows that the energy of the network at each state either decreases or remains the same as the network dynamics evolve.
• In other words, the network either remains in the same state or moves to a state having a lower energy.
• This can also be demonstrated by means of a state transition diagram, which gives the states of the network and their energies, together with the probability of transition from one state to another.
• In this section we illustrate the state transition diagram for a 3-unit feedback network with symmetric weights wij = wji.
• The units have threshold values θi, i = 1, 2, 3, and a binary (0, 1) output function. A binary output function is assumed for convenience, although the conclusions are equally valid for the bipolar (−1, +1) case.
Figure 5.5 shows a 3-unit feedback network. The state update for unit i is governed by the following equation:
si = 1 if Σj wij sj > θi, and si = 0 otherwise
• The energy at any state s1 s2 s3 of the network is given by
V(s1, s2, s3) = −(1/2) Σi Σj wij si sj + Σi θi si
• There are eight different states for the 3-unit network, as each of the si may assume a value of either 0 or 1.
• Thus the states are: 000, 001, 010, 100, 011, 101, 110 and 111.
• Assuming values for the weights and thresholds, we get the following energy values for each state:
V(000) = 0.0, V(001) = 0.7, V(010) = −0.2, V(100) = −0.1,
V(011) = 0.1, V(101) = 0.1, V(110) = 0.2, and V(111) = 0.0.
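The energy of all eight states can be enumerated directly. The weights and thresholds below are chosen to be consistent with the energy values listed above (the figure's actual values are not shown on the slide), so treat them as an assumed reconstruction:

```python
from itertools import product

# Assumed 3-unit network: symmetric weights, zero diagonal, thresholds theta_i
W = [[0.0, -0.5, 0.5],
     [-0.5, 0.0, 0.4],
     [0.5, 0.4, 0.0]]
theta = [-0.1, -0.2, 0.7]

def energy(s):
    """V(s) = -(1/2) * sum_ij w_ij s_i s_j + sum_i theta_i s_i"""
    quad = sum(W[i][j] * s[i] * s[j] for i in range(3) for j in range(3))
    return -0.5 * quad + sum(theta[i] * s[i] for i in range(3))

for s in product([0, 1], repeat=3):
    print(s, round(energy(s), 2))
```

Running this reproduces the eight energy values listed above, e.g. V(010) = −0.2 and V(110) = 0.2.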
FALSE MINIMA PROBLEM
Supervised learning of multilayered neural networks with conventional learning algorithms faces the problem of local minima.
Gradient-descent-type learning algorithms, including backpropagation, change the connection weights of a network using a training set of input-output pairs, without any prior knowledge.
Using gradient descent to adjust the weights involves following a local slope of the error surface, which may lead toward undesirable points: the local minima.
In this situation, conventional training of neural networks often gets stuck in local minima.
Several studies investigate this problem by exploring conditions on the architecture and the learning environment under which the error surface is free of local minima.
Different types of local minima are studied in order to understand the behavior of the error surface in the neighborhood of a local minimum and to explain its global behavior.
In fact, local minima are mainly associated with two factors: the learning style and the network structure.
The methods handling the problem can be based on a deterministic approach or a probabilistic approach.
The most common learning method used for supervised learning with feedforward neural networks (FNNs) is the backpropagation (BP) algorithm.
The BP algorithm calculates the gradient of the network's error with respect to the network's modifiable weights.
However, the BP algorithm may move toward, and get trapped in, a local minimum.
STOCHASTIC UPDATE
• Error in pattern recall due to false minima can be reduced significantly if the desired patterns are initially stored (by careful training) at the lowest energy minima of a network.
• The solution to the false minima problem is the stochastic update.
• The error can be reduced further by using suitable activation dynamics. Let us assume that by training we have achieved a set of weights which enables the desired patterns to be stored at the lowest energy minima.
• The activation dynamics is modified so that the network can also move initially to a state of higher energy value, and then to the nearest deep energy minimum. This way errors in recall due to false minima can be reduced.
• It is possible to realize a transition to a higher energy state from a lower energy state by using a stochastic update in each unit, instead of the deterministic update of the output function as in the Hopfield model.
• In a stochastic update the activation value of a unit does not decide the next output state of the unit by directly using the output function f(x) as shown in Figure 5.8a.
• Instead, the update is expressed in probabilistic terms: the probability of firing of the unit is greater than 0.5 if the activation value exceeds a threshold, and less than 0.5 if the activation value is below the threshold.
• Note that the output function f(x) is still a nonlinear function, either a hard-limiting threshold logic function or a semilinear sigmoidal function, but the function itself is applied in a stochastic manner.
Figure 5.8b shows a typical probability function that can be used for the stochastic update of units. The output function itself is the binary logic function f(x) shown in Figure 5.8a.
• The probability of firing for an activation value of x can be expressed as
P(s = 1 | x) = 1 / (1 + exp(−(x − θ)/T))
The probability function is defined in terms of a parameter called temperature T. At T = 0, the probability function is sharp, with a discontinuity at x = θ.
• In this case the stochastic update reduces to the deterministic update used in the Hopfield analysis. As the temperature is increased, the uncertainty in making the update according to f(x) increases, thus giving the network a chance to go to a higher energy state.
• Therefore the result of the Hopfield energy analysis, namely that the energy never increases (ΔV ≤ 0), will no longer be true for nonzero temperatures.
• Finally, when T = ∞, the update of the unit does not depend on the activation value x any more: the state of a unit changes randomly from 1 to 0 or vice versa.
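The temperature-controlled firing probability can be sketched directly; the sigmoid form matches the expression above, and the function names are illustrative:

```python
import math
import random

def fire_probability(x, theta=0.0, T=1.0):
    """P(s = 1 | activation x): a sigmoid in (x - theta) / T.

    T -> 0 recovers the deterministic hard threshold of the Hopfield
    model; very large T makes the update nearly random (p -> 0.5 for
    any activation value).
    """
    if T == 0:
        return 1.0 if x > theta else 0.0
    return 1.0 / (1.0 + math.exp(-(x - theta) / T))

def stochastic_update(x, theta=0.0, T=1.0):
    """Draw the unit's next output state (0 or 1) probabilistically."""
    return 1 if random.random() < fire_probability(x, theta, T) else 0
```

At x = θ the probability is exactly 0.5, which is the boundary behavior described above.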
BIDIRECTIONAL ASSOCIATIVE MEMORY
Bidirectional Associative Memory (BAM) is a supervised learning model in artificial neural networks.
It is a hetero-associative memory: for an input pattern, it returns another pattern, which is potentially of a different size.
This phenomenon is very similar to the human brain. Human memory is essentially associative: it uses a chain of mental associations to recover a lost memory, such as associating faces with names, or exam questions with answers.
To form such associations of one type of object with another, a recurrent neural network is needed that receives a pattern over one set of neurons as input and generates a related, but different, output pattern over another set of neurons.
BIDIRECTIONAL ASSOCIATIVE MEMORY
The objective is to store a set of pattern pairs in such a way that any stored pattern pair can be recalled by giving either of the patterns as input.
The network is a two-layer heteroassociative neural network (Figure 7.1) that encodes binary or bipolar pattern pairs (al, bl) using Hebbian learning.
It can learn on-line and it operates in discrete time steps.
The BAM weight matrix from the first layer to the second layer is given by the sum of outer products of the training pairs:
W = Σl al^T bl (with binary pairs first converted to bipolar form)
A bidirectional associative memory stores a set of pattern associations by summing bipolar correlation matrices (an n-by-m outer-product matrix for each pattern pair to be stored).
The architecture of the net consists of two layers of neurons, connected by bidirectional weighted connection paths.
The net iterates, sending signals back and forth between the two layers until all neurons reach equilibrium (i.e., until each neuron's activation remains constant for several steps).
Bidirectional associative memory neural nets can respond to input to either layer.
Because the weights are bidirectional and the algorithm alternates between updating the activations for each layer, we refer to the layers as the X-layer and the Y-layer (rather than the input and output layers).
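BAM storage and bidirectional recall can be sketched as follows; the tiny pattern pairs are made up for illustration:

```python
import numpy as np

def bam_train(pairs):
    """W = sum over l of outer(a_l, b_l) for bipolar pattern pairs (a_l, b_l)."""
    return sum(np.outer(a, b) for a, b in pairs)

def bam_recall(W, a=None, b=None):
    """Recall in either direction: b = sgn(a W) or a = sgn(W b).

    Ties (zero net input) are resolved to +1 here; treatments vary.
    """
    sgn = lambda v: np.where(v >= 0, 1, -1)
    if a is not None:
        return sgn(np.asarray(a) @ W)   # X-layer -> Y-layer
    return sgn(W @ np.asarray(b))       # Y-layer -> X-layer

# Illustrative bipolar pattern pairs of different sizes (n = 3, m = 2)
pairs = [([1, -1, 1], [1, -1]),
         ([-1, 1, -1], [-1, 1])]
W = bam_train(pairs)
```

Presenting either member of a stored pair to its layer retrieves the associated partner from the other layer.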
Competitive Learning
• Competitive networks cluster, encode and classify an input data stream in a
way such that vectors which logically belong to the same category or vectors
that share similar properties cause the same neuron in the network to win the
competition.
• Competitive Learning algorithms use competition between lateral neurons in
a layer to provide selectivity of the learning process.
• Most competitive models use hard competition, where exactly one neuron, the one with the highest activation in the layer, is declared the winner.
• One output neuron wins over other output neurons.
• Only winning neuron learns.
• In some models Soft competition is used. It is the one in which competition
suppresses the activities of all neurons except those that might lie in a
neighborhood of the true winner.
• Whether the competition is soft or hard, competitive learning algorithms
employ localized learning by updating the weights of only the active neurons.
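A single hard-competition learning step can be sketched as follows; the weight-update rule w += lr(x − w) is the standard winner-take-all rule, and the names here are illustrative:

```python
import numpy as np

def competitive_step(W, x, lr=0.1):
    """One hard-competition learning step.

    W  : (num_output, num_input) weight matrix, one row per output neuron
    x  : input vector
    lr : learning rate
    """
    winner = int(np.argmax(W @ x))       # neuron with the highest activation wins
    W[winner] += lr * (x - W[winner])    # localized learning: only the winner moves
    return winner

# Usage: two output neurons; the input is closer to neuron 0's weight vector
W = np.array([[1.0, 0.0],
              [0.0, 1.0]])
winner = competitive_step(W, np.array([1.0, 0.2]))
```

Only the winning row of W changes, which is the "localized learning" property described above.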
• The connections between input nodes x1,x2,x3,x4 and output nodes
y1,y2,y3 are called feed forward connections or excitatory connections.
• The connections between output nodes y1,y2,y3 are called lateral
connections or inhibitory connections.
VECTOR QUANTIZATION
Vector quantization is a lossy compression technique used in speech and image coding. In scalar quantization, a scalar value is selected from a finite list of possible values to represent a sample; in vector quantization, a whole vector of samples is represented by the nearest entry in a finite codebook of reference vectors.
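Nearest-codeword encoding and decoding can be sketched as follows; the 2-entry codebook and input vectors are illustrative:

```python
import numpy as np

def vq_encode(codebook, vectors):
    """Map each input vector to the index of its nearest codebook entry.

    codebook : (K, d) array of codewords
    vectors  : (N, d) array of input vectors
    """
    # squared Euclidean distance from every vector to every codeword
    d2 = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)             # the compressed representation: N indices

def vq_decode(codebook, indices):
    """Reconstruct (lossily) by looking up each codeword."""
    return codebook[indices]

codebook = np.array([[0.0, 0.0],
                     [1.0, 1.0]])
idx = vq_encode(codebook, np.array([[0.1, -0.1],
                                    [0.9, 1.2]]))
```

Transmitting only the indices instead of the raw vectors is what makes the scheme a (lossy) compression technique.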
APPLICATIONS OF SELF ORGANIZING MAPS
1. Pattern Classification
2. Neural Phonetic Typewriter
3. Vector Quantization
4. Data clustering
5. Image compression
1. Pattern Classification Problem:
If a group of input patterns corresponds to the same output pattern, then there will be far fewer output patterns than input patterns.
If some of the output patterns in the pattern association problem are identical, then the distinct output patterns can be viewed as class labels, and the input patterns corresponding to each class can be viewed as samples of that class. Whenever a pattern belonging to a class is given as input, the network identifies the class label.
An example of pattern recognition is classification, which attempts to assign each input value to one of a given set of classes (for example, determining whether a given email is "spam" or "non-spam"). This is opposed to pattern matching algorithms, which look for exact matches of the input against pre-existing patterns.
Convergence describes a progression towards a network state where the network has learned to properly respond to a set of training patterns within some margin of error.