Introduction to Adaptive Resonance Theory (ART) neural networks: the stability-plasticity dilemma, the ART network and its types, the basic ART architecture, the ART algorithm and learning, a computational example, applications, and conclusions.
1.
Adaptive Resonance Theory (ART)
Shahid Rajaee Teacher Training University
Faculty of Computer Engineering
PRESENTED BY:
Amir Masoud Sefidian
2. OUTLINE
• Introduction (Stability-Plasticity Dilemma)
• ART Network
• ART Types
• Basic ART network Architecture
• ART Algorithm and Learning
• ART Computational Example
• ART Application
• Conclusion
• Main References
3. Stability-Plasticity Dilemma (SPD)
• Stability: system behaviour doesn’t change after irrelevant events.
• Plasticity: system adapts its behaviour in response to significant events.
• Dilemma: how to achieve stability without rigidity and plasticity without chaos?
▫ Ongoing learning capability
▫ Preservation of learned knowledge
The real world confronts systems with situations where data is continuously changing.
Every learning system faces the plasticity-stability dilemma.
How can we create a machine that can act and navigate in a world that is constantly changing?
4. SPD (Contd.)
Every learning system faces the plasticity-stability dilemma.
• The plasticity-stability dilemma poses a few questions:
How can we continue to quickly learn new things about the
environment, yet not forget what we have already learned?
How can a learning system remain plastic (adaptive) in response to
significant input yet stable in response to irrelevant input?
How can a neural network remain plastic enough to learn new
patterns and yet maintain the stability of the already
learned patterns?
How does the system know when to switch between its plastic and stable
modes?
6. Back-propagation Drawback
• The back-propagation algorithm suffers from a stability problem.
• Once a back-propagation network is trained, the number of hidden neurons and the
weights are fixed.
• The network cannot learn from new patterns unless it is
retrained from scratch.
• Thus we consider back-propagation networks to lack plasticity.
• Assuming that the number of hidden neurons can be kept constant, the
plasticity problem can be solved by retraining the network on the new
patterns using an online learning rule.
• However, this causes the network to rapidly forget old knowledge.
We say that such an algorithm is not stable.
7. • Gail Carpenter and Stephen Grossberg (Boston University) developed
“Adaptive Resonance Theory” in 1976 as a learning model to answer this dilemma.
• ART networks tackle the stability-plasticity dilemma:
▫ Maintain the plasticity required to learn new patterns, while preventing the modification
of patterns that have been learned previously.
▫ No stored pattern is modified if the input does not match any stored pattern.
▫ Plasticity: They can always adapt to unknown inputs (by creating a new cluster with a
new weight vector) if the given input cannot be classified by existing clusters (e.g., this is
a computational corollary of the biological model of neural plasticity).
▫ Stability: Existing clusters are not deleted by the introduction of new inputs (new
clusters are just created in addition to the old ones).
▫ The basic ART system is an unsupervised learning model.
8. The key innovation:
“expectations”
As each input is presented to the network, it is compared with the prototype vector
that most closely matches it (the expectation).
If the match between the prototype and the input vector is NOT adequate, a new
prototype is selected.
In this way, previously learned memories (prototypes) are not eroded by new learning.
9. ART Networks
A family tree of ART networks (Grossberg, 1976), divided into unsupervised and supervised variants:
• Unsupervised ART learning:
▫ ART1, ART2 (Carpenter & Grossberg, 1987)
▫ Fuzzy ART (Carpenter, Grossberg, et al., 1991)
▫ Simplified ART (Baraldi & Alpaydin, 1998)
• Supervised ART learning:
▫ ARTMAP (Carpenter, Grossberg, et al., 1991)
▫ Fuzzy ARTMAP (Carpenter, Grossberg, et al., 1991)
▫ Gaussian ARTMAP (Williamson, 1992)
▫ Simplified ARTMAP (Kasuba, 1993)
▫ Mahalanobis Distance Based ARTMAP (Vuskovic & Du, 2001; Vuskovic, Xu & Du, 2002)
10. • ART1:
▫ simplest variety of ART networks
▫ accepts only binary inputs.
• ART2:
▫ supports continuous inputs.
• ART3 is a refinement of both models.
• Fuzzy ART incorporates fuzzy logic into ART’s pattern recognition.
• ARTMAP, also known as Predictive ART, combines two slightly modified ART-1 or ART-2 units
into a supervised learning structure.
• Fuzzy ARTMAP is merely ARTMAP using fuzzy ART units, resulting in a corresponding increase
in efficacy.
11. Basic ART network Architecture
The basic ART system is an unsupervised learning model. It typically consists of
1. a comparison field,
2. a recognition field composed of neurons,
3. a vigilance parameter, and
4. a reset module.
Each F2 node is in one of three states:
• Active
• Inactive, but available to participate in competition
• Inhibited, and prevented from participating in competition
12. ART Subsystems
Layer 1
Comparison of input pattern and expectation.
L1-L2 Connections
Perform clustering operation.
Each row of weights is a prototype pattern.
Layer 2
Competition (Contrast enhancement) winner-take-all learning strategy.
L2-L1 Connections
Perform pattern recall (Expectation).
Orienting Subsystem
Causes a reset when the expectation does not match the input pattern.
The degree of similarity required for patterns to be assigned to the same cluster unit is
controlled by a user-defined gain control, known as the vigilance parameter.
13. ART Algorithm
Flow: new pattern → categorisation (recognition, then comparison) →
known: adapt the winner node | unknown: initialise an uncommitted node.
• The incoming pattern is matched with the stored cluster templates.
• If it is close enough to a stored template, it joins the best-matching cluster and the
weights are adapted.
• If not, a new cluster is initialised with the pattern as its template.
15. Comparison Phase (in Layer 1)
• Backward transmission via top-down weights
• Vigilance test: class template matched with input pattern
• If pattern close enough to template, categorisation was successful and “resonance”
achieved
• If not close enough, reset the winner neuron and try the next best match
• (The reset inhibits the current winning neuron, and the current expectation is
removed)
• A new competition is then performed in Layer 2, while the previous winning neuron
is disabled.
• The new winning neuron in Layer 2 projects a new expectation to Layer 1, through
the L2-L1 connections.
• This process continues until the L2-L1 expectation provides a close enough match to
the input pattern.
• The process of matching, and subsequent adaptation, is referred to as resonance.
16. Step 1:
Send the input from the F1 layer to the F2 layer for
processing.
The first node within the F2 layer is chosen as
the closest match to the input and a
hypothesis is formed.
This hypothesis represents what
the node will look like after learning has
occurred, assuming it is the correct node to be
updated.
17. Step 2:
Assume j is the winner.
If Tj(I*) > ρ, then the hypothesis is accepted and the input is
assigned to that node. Otherwise, the process moves
on to Step 3.
Step 3:
If the hypothesis is rejected, a “reset” command is sent back
to the F2 layer.
In this situation, the jth node within F2 is no longer a
candidate, so the process repeats for node j+1.
18.
19.
20. Learning in ART1
Updates for both the bottom-up and top-down weights are controlled by
differential equations.
Assuming the Jth node is the winner and its connected weights
should be updated, solving them gives two separate learning laws:
L2-L1 (top-down) connections:
t_{Ji}(new) = t_{Ji}(old) · x_i
L1-L2 (bottom-up) connections:
b_{iJ}(new) = t_{Ji}(old) · x_i / (0.5 + Σ_l t_{Jl}(old) · x_l)
Initial weights for ART1:
top-down weights are initialized to 1: t_{ji}(0) = 1;
bottom-up weights should be smaller than or equal to 1/(1 + n).
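In the fast-learning regime these laws reduce to closed-form updates. A minimal NumPy sketch (the function name is illustrative; the 0.5 constant is the one used in the computational example later in the deck):

```python
import numpy as np

def fast_learning_update(t_J, x):
    """Fast-learning ART1 update for the winning node J.

    Top-down:  t_Ji(new) = t_Ji(old) * x_i
    Bottom-up: b_iJ(new) = t_Ji(old) * x_i / (0.5 + sum_l t_Jl(old) * x_l)
    """
    t_J, x = np.asarray(t_J, dtype=float), np.asarray(x, dtype=float)
    t_new = t_J * x                       # keep only bits shared with the input
    b_new = t_new / (0.5 + t_new.sum())   # normalised bottom-up weights
    return t_new, b_new

# First presentation of the worked example: t_J(0) = (1,...,1), x = (1,1,0,0,0,0,1)
t_new, b_new = fast_learning_update(np.ones(7), [1, 1, 0, 0, 0, 0, 1])
# t_new is (1,1,0,0,0,0,1); each nonzero entry of b_new is 1/3.5
```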
21.
22. Vigilance Threshold
• The vigilance parameter ρ determines the tolerance of the
matching process.
• The vigilance threshold sets the granularity of clustering.
• It defines the amount of attraction of each prototype.
• Low threshold
▫ Large mismatch accepted
▫ Few large clusters
▫ Misclassifications more likely
▫ the algorithm will be more willing to accept input
vectors into clusters (i.e., definition of similarity is
LESS strict).
• High threshold
▫ Small mismatch accepted
▫ Many small clusters
▫ Higher precision
▫ the algorithm will be more "picky" about assigning
input vectors to clusters (i.e., the definition of
similarity is MORE strict)
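The effect of the threshold can be seen directly in the ART-1 match test. A small sketch (the helper name `vigilance_passes` is an illustrative assumption):

```python
import numpy as np

def vigilance_passes(x, template, rho):
    """ART-1 vigilance test: share of the input's active bits kept by the template."""
    x, template = np.asarray(x), np.asarray(template)
    return bool(np.minimum(x, template).sum() / x.sum() >= rho)

x = [1, 0, 1, 1, 1, 1, 0]   # input with 5 active bits
t = [0, 0, 0, 1, 1, 1, 0]   # stored template sharing 3 of them: match ratio 0.6
low  = vigilance_passes(x, t, rho=0.5)   # low threshold: input joins the cluster
high = vigilance_passes(x, t, rho=0.7)   # high threshold: reset, try another node
```

The same input/template pair is accepted under the low threshold and rejected under the high one, which is exactly the few-large-clusters versus many-small-clusters trade-off described above.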
24. For this example, let us assume that we have an ART-1 network with 7 input
neurons (n = 7) and initially one output neuron (m = 1).
Our input vectors are
{(1, 1, 0, 0, 0, 0, 1),
(0, 0, 1, 1, 1, 1, 0),
(1, 0, 1, 1, 1, 1, 0),
(0, 0, 0, 1, 1, 1, 0),
(1, 1, 0, 1, 1, 1, 0)}
and the vigilance parameter is ρ = 0.7.
Initially, all top-down weights are set to t_{l,1}(0) = 1, and all bottom-up weights
are set to b_{1,l}(0) = 1/8.
ART Example Computation
25. For the first input vector, (1, 1, 0, 0, 0, 0, 1), we get:
y1 = 1·(1/8) + 1·(1/8) + 0·(1/8) + 0·(1/8) + 0·(1/8) + 0·(1/8) + 1·(1/8) = 3/8
Clearly, y1 is the winner (there are no competitors).
Since we have:
Σ_l t_{l,1} x_l / Σ_l x_l = 3/3 = 1 ≥ 0.7,
the vigilance condition is satisfied and we get the following new weights:
b_{1,1}(1) = b_{1,2}(1) = b_{1,7}(1) = 1/(0.5 + 3) = 1/3.5
b_{1,3}(1) = b_{1,4}(1) = b_{1,5}(1) = b_{1,6}(1) = 0
26. Also, we have:
t_{l,1}(1) = t_{l,1}(0) · x_l
We can express the updated weights as matrices:
B(1) = [1/3.5, 1/3.5, 0, 0, 0, 0, 1/3.5]^T
T(1) = [1, 1, 0, 0, 0, 0, 1]^T
Now we have finished the first learning step and proceed by presenting the
next input vector.
27. For the second input vector, (0, 0, 1, 1, 1, 1, 0), we get:
y1 = (1/3.5)·0 + (1/3.5)·0 + 0·1 + 0·1 + 0·1 + 0·1 + (1/3.5)·0 = 0
Of course, y1 is still the winner.
However, this time we do not reach the vigilance threshold:
Σ_l t_{l,1} x_l / Σ_l x_l = 0/4 = 0 < 0.7.
This means that we have to generate a second node in the output layer that
represents the current input.
Therefore, the top-down weights of the new node will be identical to the
current input vector.
28. The new unit’s bottom-up weights are set to zero in the positions where the
input has zeroes as well.
The remaining weights are set to:
1/(0.5 + 0 + 0 + 1 + 1 + 1 + 1 + 0) = 1/4.5
This gives us the following updated weight matrices:
B(2) = [ [1/3.5, 1/3.5, 0, 0, 0, 0, 1/3.5],
         [0, 0, 1/4.5, 1/4.5, 1/4.5, 1/4.5, 0] ]^T
T(2) = [ [1, 1, 0, 0, 0, 0, 1],
         [0, 0, 1, 1, 1, 1, 0] ]^T
29. For the third input vector, (1, 0, 1, 1, 1, 1, 0), we have:
y1 = 1/3.5 ;  y2 = 4/4.5
Here, y2 is the clear winner.
This time we exceed the vigilance threshold again:
Σ_l t_{l,2} x_l / Σ_l x_l = 4/5 = 0.8 ≥ 0.7.
Therefore, we adapt the second node’s weights.
Each top-down weight is multiplied by the corresponding element of the
current input.
30. The new unit’s bottom-up weights are set to the top-down weights divided by
(0.5 + 0 + 0 + 1 + 1 + 1 + 1 + 0) = 4.5.
It turns out that, in the current case, these updates do not result in any weight
changes at all:
B(3) = [ [1/3.5, 1/3.5, 0, 0, 0, 0, 1/3.5],
         [0, 0, 1/4.5, 1/4.5, 1/4.5, 1/4.5, 0] ]^T
T(3) = [ [1, 1, 0, 0, 0, 0, 1],
         [0, 0, 1, 1, 1, 1, 0] ]^T
31. For the fourth input vector, (0, 0, 0, 1, 1, 1, 0), it is:
y1 = 0 ;  y2 = 3/4.5
Again, y2 is the winner.
The vigilance test succeeds once again:
Σ_l t_{l,2} x_l / Σ_l x_l = 3/3 = 1 ≥ 0.7.
Therefore, we adapt the second node’s weights.
As usual, each top-down weight is multiplied by the corresponding element of
the current input.
32. The new unit’s bottom-up weights are set to the top-down weights divided by
(0.5 + 0 + 0 + 0 + 1 + 1 + 1 + 0) = 3.5.
This gives us the following new weight matrices:
B(4) = [ [1/3.5, 1/3.5, 0, 0, 0, 0, 1/3.5],
         [0, 0, 0, 1/3.5, 1/3.5, 1/3.5, 0] ]^T
T(4) = [ [1, 1, 0, 0, 0, 0, 1],
         [0, 0, 0, 1, 1, 1, 0] ]^T
33. Finally, the fifth input vector, (1, 1, 0, 1, 1, 1, 0), gives us:
y1 = 2/3.5 ;  y2 = 3/3.5
Once again, y2 is the winner.
The vigilance test fails this time:
Σ_l t_{l,2} x_l / Σ_l x_l = 3/5 = 0.6 < 0.7.
This means that the active set A is reduced to contain only the first node, which
becomes the uncontested winner.
34. The vigilance test fails for the first unit as well:
Σ_l t_{l,1} x_l / Σ_l x_l = 2/5 = 0.4 < 0.7.
We thus have to create a third output neuron, which gives us the following new
weight matrices:
B(5) = [ [1/3.5, 1/3.5, 0, 0, 0, 0, 1/3.5],
         [0, 0, 0, 1/3.5, 1/3.5, 1/3.5, 0],
         [1/5.5, 1/5.5, 0, 1/5.5, 1/5.5, 1/5.5, 0] ]^T
T(5) = [ [1, 1, 0, 0, 0, 0, 1],
         [0, 0, 0, 1, 1, 1, 0],
         [1, 1, 0, 1, 1, 1, 0] ]^T
35. In the second epoch, the first input vector, (1, 1, 0, 0, 0, 0, 1), gives us:
y1 = 3/3.5 ;  y2 = 0 ;  y3 = 2/5.5
Here, y1 is the winner, and the vigilance test succeeds:
Σ_l t_{l,1} x_l / Σ_l x_l = 3/3 = 1 ≥ 0.7.
Since the current input is identical to the winner’s top-down weights, no weight
update happens.
36. The second input vector, (0, 0, 1, 1, 1, 1, 0), results in:
y1 = 0 ;  y2 = 3/3.5 ;  y3 = 3/5.5
Now y2 is the winner, and the vigilance test succeeds:
Σ_l t_{l,2} x_l / Σ_l x_l = 3/4 = 0.75 ≥ 0.7.
Because the intersection of the current input with the winner’s top-down weights
equals those weights, no weight update occurs.
37. The third input vector, (1, 0, 1, 1, 1, 1, 0), gives us:
y1 = 1/3.5 ;  y2 = 3/3.5 ;  y3 = 4/5.5
Once again, y2 is the winner, but this time the vigilance test fails:
Σ_l t_{l,2} x_l / Σ_l x_l = 3/5 = 0.6 < 0.7.
This means that the active set is reduced to A = {1, 3}.
Since y3 > y1, the third node is the new winner.
38. The third node does satisfy the vigilance threshold:
Σ_l t_{l,3} x_l / Σ_l x_l = 4/5 = 0.8 ≥ 0.7.
This gives us the following updated weight matrices:
B(8) = [ [1/3.5, 1/3.5, 0, 0, 0, 0, 1/3.5],
         [0, 0, 0, 1/3.5, 1/3.5, 1/3.5, 0],
         [1/4.5, 0, 0, 1/4.5, 1/4.5, 1/4.5, 0] ]^T
T(8) = [ [1, 1, 0, 0, 0, 0, 1],
         [0, 0, 0, 1, 1, 1, 0],
         [1, 0, 0, 1, 1, 1, 0] ]^T
39. For the fourth vector, (0, 0, 0, 1, 1, 1, 0), the second node wins, passes the
vigilance test, but no weight changes occur.
For the fifth vector, (1, 1, 0, 1, 1, 1, 0), the third unit now wins outright
(y3 = 4/4.5 > y2 = 3/3.5); it passes the vigilance test (4/5 = 0.8 ≥ 0.7) but
does not lead to any weight modifications.
Further presentations of the five sample vectors do not lead to any weight
changes; the network has thus stabilized.
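The whole run can be reproduced with a compact NumPy sketch of ART-1 fast learning (a simplified winner-take-all search, not the full differential-equation model; the function name and epoch count are illustrative assumptions):

```python
import numpy as np

def art1(patterns, rho=0.7, epochs=3):
    """Cluster binary patterns with simplified ART-1 fast learning."""
    n = len(patterns[0])
    T = [np.ones(n)]                 # top-down templates; one initial node, all 1s
    B = [np.full(n, 1.0 / (1 + n))]  # bottom-up weights, initialised to 1/(1+n)
    for _ in range(epochs):
        for p in patterns:
            x = np.asarray(p, dtype=float)
            active = list(range(len(T)))
            while active:
                j = max(active, key=lambda k: float(np.dot(B[k], x)))  # recognition
                match = np.minimum(T[j], x)
                if match.sum() / x.sum() >= rho:                       # vigilance
                    T[j] = match                                       # fast learning
                    B[j] = match / (0.5 + match.sum())
                    break
                active.remove(j)                                       # reset winner
            else:                    # no node passed: commit a new output neuron
                T.append(x.copy())
                B.append(x / (0.5 + x.sum()))
    return T, B

patterns = [[1, 1, 0, 0, 0, 0, 1], [0, 0, 1, 1, 1, 1, 0], [1, 0, 1, 1, 1, 1, 0],
            [0, 0, 0, 1, 1, 1, 0], [1, 1, 0, 1, 1, 1, 0]]
T, B = art1(patterns, rho=0.7)
# The run stabilizes with the three cluster templates derived in the example.
```

Running it with ρ = 0.7 yields three output neurons with the same final templates as the hand computation; raising or lowering ρ changes the number of clusters, as described on the vigilance-threshold slide.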
40. Adaptive Resonance Theory
Illustration of the categories (or clusters) formed in input space by ART networks.
Note: increasing the vigilance parameter ρ leads to narrower cones, not to wider ones as
suggested by the figure.
41. • A problem with ART-1 is the need to determine the vigilance parameter for
a given problem, which can be tricky.
• Furthermore, ART-1 always builds clusters of the same size, regardless of the
distribution of samples in the input space.
• Nevertheless, ART is one of the most important and successful attempts at
simulating incremental learning in biological systems.
42. Applications of ART
• Face recognition
• Image compression
• Mobile robot control
• Target recognition
• Medical diagnosis
• Signature verification
43. Conclusion
ART is an artificial neural network system designed to adapt to a changing
environment; without such a design, constant change can make a system unstable,
because the system may learn new information only by forgetting everything it has
learned so far.
44. Main References:
• S. Rajasekaran, G.A.V. Pai, “Neural Networks, Fuzzy Logic and Genetic Algorithms”, Prentice Hall of India,
Adaptive Resonance Theory, Chapter 5.
• Jacek M. Zurada, “Introduction to Artificial Neural Systems”, West Publishing Company, Matching & Self
organizing maps, Chapter 7.
• Carpenter, G. A., & Grossberg, S. (1987). A massively parallel architecture for a self-organizing neural pattern
recognition machine. Computer vision, graphics, and image processing, 37(1), 54-115.
• Adaptive Resonance Theory, Soft computing lecture notes, http://www.myreaders.info/html/soft_computing.html
• Fausett, L. V. (1994). Fundamentals of neural networks: Architectures, algorithms, and applications. Englewood
Cliffs, NJ: Prentice-Hall.