1. An incremental network for on-line
unsupervised classification and
topology learning
Shen Furao, Osamu Hasegawa
Neural Networks, Vol. 19, No. 1, pp. 90-106 (2006)
2. Background: Objective of unsupervised learning (1)
Clustering: Construct decision boundaries
based on unlabeled data.
– Single-link, complete-link, CURE
• High computational cost
• Large memory requirement
• Unsuitable for large data sets or online data
– K-means:
• Dependence on initial starting conditions
• Tendency to result in local minima
• The number of clusters k must be determined in advance
• Suited only to data sets consisting of isotropic clusters
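The dependence of k-means on its starting conditions can be shown on a toy problem. The sketch below is a plain 1-D Lloyd iteration written for illustration only; the function name `kmeans_1d`, the data, and both initializations are assumptions, not material from the paper.

```python
def kmeans_1d(points, centers, iters=100):
    """Plain Lloyd iterations on 1-D data; returns (centers, sse)."""
    centers = list(centers)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its points
        # (an empty cluster keeps its previous center).
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    sse = sum(min((p - c) ** 2 for c in centers) for p in points)
    return centers, sse

# Three well-separated groups (hypothetical data).
points = [0.0, 0.2, 0.4, 5.0, 5.2, 5.4, 10.0, 10.2, 10.4]

# One center per group: converges to the intended solution.
good_centers, good_sse = kmeans_1d(points, [0.2, 5.2, 10.2])

# All centers start inside the first group: stuck in a local minimum.
bad_centers, bad_sse = kmeans_1d(points, [0.0, 0.2, 0.4])

print(good_centers, good_sse)
print(bad_centers, bad_sse)
```

With the second initialization, two centers stay inside the leftmost group and the third sits between the other two groups, so the squared error ends up far larger even though k is correct.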
3. Background: Objective of unsupervised learning (2)
Topology learning: Given some high-dimensional data
distribution, find a topological structure that closely
reflects the topology of the data distribution
– SOM: self-organizing map
• predetermined structure and size
• posterior choice of class labels for the prototypes
– CHL+NG: competitive Hebbian learning + neural gas
• a priori decision about the network size
• ranking of all nodes in each adaptation step
• use of adaptation parameter
– GNG: growing neural gas
• permanent increase in the number of nodes
• permanent drift of centers to capture input probability
density
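The competitive Hebbian learning rule named above can be sketched in a few lines: for every input, the nearest and second-nearest nodes are joined by an edge. This is a minimal illustration with fixed node positions; the function name, the node coordinates, and the inputs are assumptions, and the neural gas adaptation step is left out.

```python
import heapq
import math

def competitive_hebbian_edges(nodes, inputs):
    """Link, for every input, its nearest and second-nearest node.

    Node positions stay fixed here; only the edge set is learned.
    """
    edges = set()
    for x in inputs:
        first, second = heapq.nsmallest(
            2, range(len(nodes)), key=lambda i: math.dist(x, nodes[i])
        )
        # Store each undirected edge once, smaller index first.
        edges.add((min(first, second), max(first, second)))
    return edges

# Hypothetical nodes: three close together, one far from all inputs.
nodes = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
inputs = [(0.4, 0.0), (1.6, 0.0), (0.9, 0.1)]

edges = competitive_hebbian_edges(nodes, inputs)
print(sorted(edges))  # [(0, 1), (1, 2)]
```

Node 3 never wins or comes second for any input, so it stays unconnected; the edges that do form follow the region where data actually occurs.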
4. Background: Online or life-long learning
Fundamental issue (Stability-Plasticity Dilemma): How can
a learning system adapt to new information without
corrupting or forgetting previously learned information?
– GNG-U: deletes nodes which are located in regions of
a low input probability density
• previously learned prototype patterns are destroyed
– Hybrid network: Fuzzy ARTMAP + PNN
– Life-long learning with improved GNG: learns the number
of nodes needed for the current task
• applies only to supervised life-long learning
5. Objectives of proposed algorithm
• Process on-line non-stationary data.
• Perform unsupervised learning without a priori
conditions such as:
– a suitable number of nodes
– a good initial codebook
– the number of classes
• Report a suitable number of classes.
• Represent the topological structure of the input
probability density.
• Separate classes that have low-density overlaps.
• Detect the main structure of clusters polluted by noise.
6. Proposed algorithm
[Figure: two-layer architecture. Input pattern -> first layer (growing network) -> first output -> second layer (growing network) -> second output. Operations shown: insert node, delete node, classify.]
7. Algorithms
• Insert new nodes
– Criterion: a node with high error triggers the insertion
of a new node
– An error radius is used to judge whether the insertion was successful
• Delete nodes
– Criterion: remove nodes in low probability density
regions
– Realization: delete nodes with no or only one direct
topological neighbor
• Classify
– Criterion: all nodes connected by edges form one
cluster
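The delete and classify rules above can be sketched on a small graph. This is an illustrative reading, not the paper's implementation: the graph, the helper names, and the choice to prune in a single pass (degrees are counted before any node is removed) are assumptions.

```python
from collections import deque

def make_graph(edge_list, isolated=()):
    """Build an undirected adjacency map: node -> set of neighbors."""
    adj = {n: set() for n in isolated}
    for a, b in edge_list:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def prune_sparse_nodes(adj):
    """Delete rule: drop every node with zero or one direct neighbor."""
    keep = {n for n, nbrs in adj.items() if len(nbrs) >= 2}
    return {n: adj[n] & keep for n in keep}

def clusters(adj):
    """Classify rule: nodes joined by a path of edges form one cluster."""
    seen, result = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:  # breadth-first search over one component
            node = queue.popleft()
            component.append(node)
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        result.append(sorted(component))
    return result

# Two triangles, one dangling node "g", one isolated node "h".
graph = make_graph(
    [("a", "b"), ("b", "c"), ("c", "a"),
     ("d", "e"), ("e", "f"), ("f", "d"),
     ("a", "g")],
    isolated=["h"],
)
pruned = prune_sparse_nodes(graph)
found = clusters(pruned)
print(found)  # [['a', 'b', 'c'], ['d', 'e', 'f']]
```

Nodes "g" and "h" are treated as lying in low-density regions and removed; the remaining nodes split into two clusters because no edge joins the two triangles.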
8. Experiment
• Stationary environment: patterns are randomly chosen
from all areas A, B, C, D, and E
• Non-stationary environment: areas active (1) or inactive (0)
in each environment I to VII

Area   I   II  III IV  V   VI  VII
A      1   0   1   0   0   0   0
B      0   1   0   1   0   0   0
C      0   0   1   0   0   1   0
D      0   0   0   1   1   0   0
E1     0   0   0   0   1   0   0
E2     0   0   0   0   0   1   0
E3     0   0   0   0   0   0   1

[Figure: original data set]
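The environment schedule in the table can be encoded directly to drive a non-stationary input stream. The sketch below only emits area labels, since the slides do not give the geometry of the areas; the names `SCHEDULE`, `active_areas`, and `draw_area_labels`, and the uniform choice among active areas, are assumptions made for illustration.

```python
import random

ENVIRONMENTS = ["I", "II", "III", "IV", "V", "VI", "VII"]

# Rows copied from the table: 1 = area supplies patterns in that environment.
SCHEDULE = {
    "A":  [1, 0, 1, 0, 0, 0, 0],
    "B":  [0, 1, 0, 1, 0, 0, 0],
    "C":  [0, 0, 1, 0, 0, 1, 0],
    "D":  [0, 0, 0, 1, 1, 0, 0],
    "E1": [0, 0, 0, 0, 1, 0, 0],
    "E2": [0, 0, 0, 0, 0, 1, 0],
    "E3": [0, 0, 0, 0, 0, 0, 1],
}

def active_areas(env):
    """Return the areas marked 1 for the given environment."""
    col = ENVIRONMENTS.index(env)
    return [area for area, row in SCHEDULE.items() if row[col] == 1]

def draw_area_labels(env, n, rng):
    """Pick n area labels among the active areas (uniform choice assumed)."""
    areas = active_areas(env)
    return [rng.choice(areas) for _ in range(n)]

rng = random.Random(0)  # fixed seed so the stream is reproducible
stream = [(env, label)
          for env in ENVIRONMENTS
          for label in draw_area_labels(env, 3, rng)]

for env in ENVIRONMENTS:
    print(env, active_areas(env))
```

Presenting the environments in order I to VII means that areas seen early (for example A) disappear later, which is the situation the stability-plasticity discussion is concerned with.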
22. Conclusion
• An autonomous learning system for
unsupervised classification and topology
representation tasks
• Grows incrementally and learns the number of
nodes needed to solve the current task
• Accommodates input patterns from on-line
non-stationary data distributions
• Eliminates noise in the input data