Slide 2
Self-Organizing Networks (Maps)
Properties:
- The weights in the neurons should be representative of a class of patterns, so each neuron represents a different class.
- Input patterns are presented to all of the neurons, and each neuron produces an output. The value of the output of each neuron is used as a measure of the match between the input pattern and the pattern stored in the neuron.
- A competitive learning strategy that selects the neuron with the largest response.
- A method of reinforcing the largest response.
Slide 4
Like other neural networks, the neurons take inputs, x_i, and produce a weighted sum called net_j. This weighted sum is the output of the neuron, which means that there is no non-linear output function in the neurons.
The weighted sum can be expressed in vector form as

$$\text{net}_j = \sum_{i=0}^{n} x_i w_{ij} = X \cdot W_j = |X|\,|W_j|\cos\theta$$

where |X| means the magnitude of X. In other words, net_j is the product of two vectors and can therefore be expressed as the “length” of one vector multiplied by the projection of the other vector along the direction of the first vector.
Slide 5
If the two vectors X and Wj are normalized,
which means scaling them so that they each
have a length of 1, the product ends up being
equal to cos(θ), where θ is the angle between
the two vectors.
If the two vectors are identical, θ will be zero,
and cos(θ) = 1.
The further apart the two vectors become, the
greater the angle (positive or negative) and the
smaller the value of the product.
In the extreme, where the input pattern is the inverse of the stored weights, θ is ±180° and cos(θ) = −1.
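A quick numeric check of this relationship, as a sketch (the three-element vectors here are illustrative):

```python
import numpy as np

def normalize(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)

w = normalize(np.array([0., 1., 1.]))                  # stored, normalized weights

# Identical pattern: theta = 0, so net = cos(0) = 1
print(np.dot(normalize(np.array([0., 1., 1.])), w))    # 1.0
# A dissimilar pattern: larger angle, smaller net
print(np.dot(normalize(np.array([1., 0., 0.])), w))    # 0.0
# Inverse pattern: theta = 180 degrees, so net = -1
print(np.dot(normalize(np.array([0., -1., -1.])), w))  # -1.0
```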
Slide 6
If the assumption is made that patterns that are similar will be close together in pattern space, then normalizing the input vector means that the output of a neuron is a measure of the similarity between the input pattern and its weights.
If a network is set up initially with random
weights, when an input pattern is applied, each
of the neurons will produce an output which is
a measure of the similarity between the weights
and input pattern. The neuron with the largest
response will be the one with the weights that
are most similar to the input pattern.
Slide 7
Normalizing the input vector means dividing each element by the magnitude of the vector, which is the square root of the sum of the squares of all the elements:

$$|X| = \sqrt{\sum_{i=1}^{n} x_i^2}$$

Example:
Assume that a neuron has been trained with the pattern 0 1 1 and that the weights are normalized. The magnitude of X is

$$|X| = \sqrt{0^2 + 1^2 + 1^2} = \sqrt{2} \approx 1.414$$

and the weights are therefore

w_1 = 0, w_2 = w_3 = 1/1.414 ≈ 0.707
Slide 8
The following table shows the value of the output of this
neuron when other input patterns are applied. It can be
seen that the output ranges from 0 to 1, and that the
more the input pattern is like the stored pattern the
higher the output score.
Input    Normalized input    Output
0 0 0    0    0    0         0
0 0 1    0    0    1         0.7
0 1 0    0    1    0         0.7
0 1 1    0    0.7  0.7       1
1 0 0    1    0    0         0
1 0 1    0.7  0    0.7       0.5
1 1 0    0.7  0.7  0         0.5
1 1 1    0.6  0.6  0.6       0.8
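This table can be reproduced with a short script (a sketch; outputs are rounded to one decimal place, as on the slide):

```python
import numpy as np

def normalize(v):
    n = np.linalg.norm(v)
    return v / n if n > 0 else v    # leave the all-zero pattern unchanged

w = normalize(np.array([0., 1., 1.]))    # weights trained on the pattern 011

for bits in range(8):                    # every 3-bit input pattern
    x = np.array([(bits >> 2) & 1, (bits >> 1) & 1, bits & 1], dtype=float)
    xn = normalize(x)
    print(x.astype(int), np.round(xn, 1), round(float(np.dot(xn, w)), 1))
```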
Slide 9
The next step is to apply a learning rule so that the neuron with the largest response is selected and its weights are adjusted to increase its response. The first part is described as a “winner takes all” mechanism and can be stated simply as

y_j = 1 if net_j > net_i for all i ≠ j
y_j = 0 otherwise
The learning rule for adjusting the weights is
different to the Hebbian rules that have been
described in the previous lectures.
Instead of the weights being adjusted so that
the actual output matches some desired output,
the weights are adjusted so that they become
more like the incoming patterns.
Slide 10
Mathematically, the learning rule, which is often referred to as Kohonen learning, is:

$$\Delta w_{ij} = k\,(x_i - w_{ij})\,y_j$$
In the extreme case where k = 1, after
being presented with a pattern, the
weights in a particular neuron will be
adjusted so that they are identical to the
inputs, that is wij = xi. Then for that
neuron, the output is maximum for that
input pattern. Other neurons are trained
to be maximum for other input patterns.
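A single competitive step with this update might look as follows (a sketch; the re-normalization after the update is an assumed practical choice, not stated on the slide):

```python
import numpy as np

def kohonen_step(W, x, k=0.5):
    """One winner-takes-all step with the Kohonen update.

    W : (num_neurons, n) weight matrix with normalized rows.
    x : (n,) normalized input pattern.
    k : learning rate; with k = 1 the winner's weights become x exactly.
    """
    net = W @ x                    # net_j for every neuron
    j = int(np.argmax(net))        # winner: y_j = 1, all other outputs 0
    W[j] += k * (x - W[j])         # delta w_ij = k * (x_i - w_ij) * y_j
    W[j] /= np.linalg.norm(W[j])   # re-normalize the winner (assumption)
    return j
```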
Slide 11
With k < 1 the weights change in a way that makes them more like the input patterns, but not necessarily identical. After training, the weights should be representative of the distribution of the input patterns.
The term yj is included so that, during
training, only the neuron with the largest
response will have an output of 1 after
competitive learning, while all other
outputs are set to zero. Therefore, only
this neuron will adapt its weights.
Slide 12
The combination of finding the weighted
sum of normalized vectors, Kohonen
learning and competition means that the
instar network has the ability to organize
itself such that individual neurons have
weights that represent particular
patterns or classes of patterns. When a
pattern is presented at its input, a single
neuron, which has weights that are the
closest to the input pattern, produces a 1
output while all the other neurons
produce a 0. Learning in the instar is
therefore unsupervised.
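Putting normalization, competition and Kohonen learning together, a self-organizing instar layer could be trained roughly as follows (a sketch; the random data, network size and learning rate are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize_rows(M):
    """Scale every row of M to unit length."""
    return M / np.linalg.norm(M, axis=1, keepdims=True)

n_inputs, n_neurons = 3, 4
W = normalize_rows(rng.random((n_neurons, n_inputs)))   # random initial weights
patterns = normalize_rows(rng.random((20, n_inputs)))   # normalized training inputs

k = 0.2
for epoch in range(50):
    for x in patterns:
        j = int(np.argmax(W @ x))        # competition: winner takes all
        W[j] += k * (x - W[j])           # Kohonen learning, winner only
        W[j] /= np.linalg.norm(W[j])     # keep the weights normalized

# A pattern now drives a single neuron to a 1 output, all others to 0
net = W @ patterns[0]
y = np.zeros(n_neurons, dtype=int)
y[np.argmax(net)] = 1
print(y)
```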
Slide 13
Outstar Networks:
An outstar network consists of neurons which find the weighted sum of their inputs.
Its function is to convert the input pattern, x_i, into a recognized output pattern, and its learning is therefore supervised.
Slide 14
The learning rule for the outstar network is often referred to as Grossberg learning and can be stated mathematically as

$$\Delta w_{ij} = k\,(y_j - w_{ij})\,x_i$$
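A sketch of this update, assuming the same index convention as the instar rule (w_ij runs from instar output i to outstar neuron j, with x the one-hot instar output and y the desired pattern):

```python
import numpy as np

def grossberg_step(W, x, y, k=0.5):
    """One outstar (Grossberg) learning step.

    W : (n_in, n_out) weights from the instar layer to the output layer.
    x : (n_in,) one-hot output of the instar layer (the winning neuron).
    y : (n_out,) desired output pattern (the supervision signal).
    """
    # delta w_ij = k * (y_j - w_ij) * x_i : only the winning row adapts
    W += k * x[:, None] * (y[None, :] - W)
    return W
```

With k = 1 the winning row becomes y exactly, so that whenever the corresponding instar neuron fires, the stored pattern is recalled at the output.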
It is possible to combine instar and outstar
together as shown below:
Slide 15
A property of this network is that if a new pattern is presented, the stored pattern that is most similar to it will produce the maximum output in the first layer and then recall the stored pattern in the second layer. So the instar/outstar network can generalize and recall perfect data from imperfect data.
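A toy illustration of this generalization (a sketch; the two stored patterns are made up for the example):

```python
import numpy as np

# Two assumed stored patterns, one per instar neuron
stored = np.array([[0., 1., 1.],
                   [1., 0., 0.]])
W_in = stored / np.linalg.norm(stored, axis=1, keepdims=True)  # instar weights
W_out = stored.copy()                    # outstar layer recalls the stored pattern

x = np.array([0., 1., 0.])               # imperfect input: one bit of 011 missing
xn = x / np.linalg.norm(x)
winner = int(np.argmax(W_in @ xn))       # instar picks the closest class

print(W_out[winner])                     # [0. 1. 1.] -- the perfect stored pattern
```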
Slide 17
The ART algorithm works as follows:
Step 1: Input pattern X directly to the instar network.
Step 2: Find the neuron with the maximum response – neuron i.
Step 3: Make the output of neuron i equal to 1, and all others zero.
Step 4: Feed the output of the instar to the input of the outstar to generate an output pattern, Y.
Step 5: Feed Y back to create a new pattern which equals X AND Y.
Step 6: Calculate the vigilance, ρ.
Step 7: If ρ is greater than some predetermined threshold, modify the weights of neuron i in the instar network so that the output produced equals the new pattern X AND Y. Go back to Step 1.
Step 8: If ρ is less than the threshold, suppress the output of neuron i, and find the neuron with the next largest output value – neuron j. Go to Step 3.
A code sketch of this search loop is given after the weight equations below.
Slide 18

The vigilance ρ equals the number of 1s in the pattern produced by finding X AND Y, divided by the number of 1s in the input pattern, X. This can be written as

$$\rho = \frac{\sum_{i=0}^{n} x_i \wedge y_i}{\sum_{i=0}^{n} x_i}$$

where y_i is the stored pattern in 0/1 notation and ∧ is the AND function.

In this algorithm, the weights are normalized as

$$w_i = \frac{L\,(x_i \wedge y_i)}{L - 1 + \sum_{i=1}^{n} x_i \wedge y_i}$$

where L must be greater than 1.
Slide 19

A typical solution is to let L = 2, so that the equation becomes

$$w_i = \frac{2\,(x_i \wedge y_i)}{1 + \sum_{i=1}^{n} x_i \wedge y_i}$$

In the second layer, the weights in each of the neurons are adjusted so that they too correspond to the AND of the two patterns, and therefore have values of either 0 or 1. The effect is that the patterns ‘resonate’, producing a stable output.
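Putting the steps from Slide 17 together with these equations, the search loop can be sketched as follows (an illustrative sketch for binary patterns with L = 2; it assumes the input X contains at least one 1):

```python
import numpy as np

def art_present(x, W_in, W_out, threshold=0.8, L=2):
    """One ART presentation of a binary input x (sketch).

    W_in  : (n_neurons, n) bottom-up (instar) weights.
    W_out : (n_neurons, n) top-down stored patterns in 0/1 notation.
    Returns the index of the resonating neuron, or None if none matches.
    """
    net = W_in @ x
    for i in np.argsort(net)[::-1]:               # Steps 2-3 and 8: best first
        y = W_out[i]                              # Step 4: output pattern Y
        xy = np.logical_and(x, y).astype(float)   # Step 5: X AND Y
        rho = xy.sum() / x.sum()                  # Step 6: vigilance
        if rho > threshold:                       # Step 7: resonance -> learn
            W_out[i] = xy                         # second layer stores X AND Y
            W_in[i] = L * xy / (L - 1 + xy.sum()) # normalized instar weights
            return i
    return None                                   # no neuron passed the test
```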
Slide 20

Example:
Consider the following diagram. Assume that an input pattern of 0 1 0 is presented to the network. The response of the three neurons is 0, 0.7 and 0 respectively, so the second neuron gives the largest response, indicating that its stored pattern is the closest to the input pattern. The winner-takes-all mechanism produces a 1 at the output of neuron 2, and a 0 at each of the outputs of neurons 1 and 3. Thus the pattern 1 1 0 is produced at the output of the ART network.
Slide 21

This output is fed back, and the AND of the two patterns is calculated:

input pattern X    0 1 0
output pattern Y   1 1 0
X ∧ Y              0 1 0

The number of 1s in X is 1 and the number of 1s in X ∧ Y is also 1, so the vigilance of this match is 1/1 = 1. Assume a typical value for the threshold of, say, 0.8; as the vigilance is larger than this, the pattern is accepted into the class and the new exemplar pattern is stored. The weights in neuron 2 are modified so that they now represent the AND of the two patterns after normalization. The weights in the output layer are also modified so that if neuron 2 produces a 1 output, the exemplar pattern will be produced at the output of the neuron.
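The numbers in this example can be checked with a short script (neuron 2 stores 1 1 0 as in the text; the patterns assumed for neurons 1 and 3 are chosen to give the zero responses described):

```python
import numpy as np

stored = np.array([[1., 0., 0.],    # neuron 1 (assumed)
                   [1., 1., 0.],    # neuron 2, as in the example
                   [0., 0., 1.]])   # neuron 3 (assumed)
W_in = stored / np.linalg.norm(stored, axis=1, keepdims=True)

x = np.array([0., 1., 0.])                   # input pattern X
net = W_in @ (x / np.linalg.norm(x))
print(np.round(net, 1))                      # [0.  0.7 0. ] -> neuron 2 wins

y = stored[int(np.argmax(net))]              # output pattern Y = 1 1 0
xy = np.logical_and(x, y).astype(float)      # X AND Y = 0 1 0
rho = xy.sum() / x.sum()                     # vigilance = 1/1 = 1
print(rho, rho > 0.8)                        # 1.0 True -> pattern accepted
```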