Supervised versus Unsupervised Learning
Supervised
Labelled Data.
Difficult to justify biologically.
Doesn't fit all situations.
Unsupervised
Input Environment only.
Self Organising Neural Networks
The basic design of an Unsupervised Network
Unsupervised Learning
Geometric Interpretation
What they Learn
Problems with Self Organising Neural Networks
Statistical Views.
Origins: Rosenblatt's "spontaneous learning" in perceptrons
Important work by Fukushima, Grossberg, Kohonen,
von der Malsburg, Willshaw
No Teachers
Learn about regularities in the environment
Recognition — familiarity with previous inputs
Classification — clustering
Feature Mapping — topographic mappings
Encoding — dimensionality reduction — data compression
What determines what is learnt?
Example
von der Malsburg, C. (1973). Self-organisation of orientation sensitive cells in the striate cortex. Kybernetik, 14: 85–100.
Environment.
Initially random.
Orientation-tuned units emerge.
Basic Requirements for Unsupervised Networks
Rumelhart, D. and Zipser, D. (1985). Feature discovery by competitive learning. Cognitive Science, 9: 75–112.
1. Input units or Input lines.
2. Response units.
Number of units.
Units not all the same.
3. Limit the strength of units.
Example values: weights 5, 2, 1, 0, 1, 2000 versus input pattern 1, 0.5, 0.01. Without a limit, a unit that starts with very large weights wins every competition. Weight normalisation.
4. Allow the units to compete. “winner take all”.
5. Learning.
Learning in the Rumelhart and Zipser Network
Winning unit learns.
Weights become more like input patterns (classification).
Normalisation by weight redistribution:
$\Delta w_{ij} = 0$ if unit $j$ loses on stimulus $k$
$\Delta w_{ij} = \alpha \dfrac{c_{ik}}{n_k} - \alpha w_{ij}$ if unit $j$ wins on stimulus $k$
$c_{ik}$ = 1 (0) if input $i$ is (in)active on pattern $k$.
$n_k = \sum_i c_{ik}$ = number of inputs active for pattern $k$.
$\alpha$ is the learning constant.
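As a concrete illustration, a minimal sketch of this update rule in Python; the array shapes, learning rate and example stimulus are assumptions, not taken from Rumelhart and Zipser:

```python
import numpy as np

def competitive_update(W, x, alpha=0.05):
    """One competitive learning step of the kind described above.

    W : (n_units, n_inputs) weight matrix, each row sums to 1
    x : binary input pattern c_k over the input lines
    """
    n_k = x.sum()                      # number of active input lines
    winner = np.argmax(W @ x)          # winner-take-all: largest weighted input
    # only the winning unit learns: gain alpha*c_ik/n_k, loss alpha*w_ij
    W[winner] += alpha * x / n_k - alpha * W[winner]
    return winner

# toy usage: 16 input lines, 2 response units, random normalised weights
rng = np.random.default_rng(0)
W = rng.random((2, 16))
W /= W.sum(axis=1, keepdims=True)      # each row sums to 1 (weight normalisation)
x = np.zeros(16); x[3:11] = 1          # a stimulus with 8 active lines
competitive_update(W, x)
print(W.sum(axis=1))                   # row sums stay at 1
```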
Example of Weight Redistribution
16 inputs; for each stimulus assume 8 inputs are active.
Assume that for each output unit $j$, the weights are initially normalised: $\sum_i w_{ij} = 1$.
Then
$\Delta w_{ij} = \dfrac{\alpha}{n_k} - \alpha w_{ij}$ if $j$ wins and line $i$ is ON
$\Delta w_{ij} = -\alpha w_{ij}$ if $j$ wins and line $i$ is OFF
All weights for the winning unit are decremented by $\alpha w_{ij}$.
Total weight from all lines decremented: $\sum_i \alpha w_{ij}$.
Since $\sum_i w_{ij} = 1$, loss = total deducted from all weights on the winning unit = $\alpha$.
Each weight on an active line is incremented by $\dfrac{\alpha}{n_k} = \dfrac{\alpha}{8}$.
gain = total amount of weight added = $8 \times \dfrac{\alpha}{8} = \alpha$.
loss = gain, so no net change in the total weight of the winning unit.
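A quick numerical check of the loss = gain argument; the particular weights and learning rate below are arbitrary:

```python
import numpy as np

alpha, n_active = 0.1, 8
rng = np.random.default_rng(1)
w = rng.random(16); w /= w.sum()        # 16 weights, normalised to sum to 1
x = np.zeros(16); x[:n_active] = 1      # 8 active input lines

loss = alpha * w.sum()                  # alpha*w_ij removed from every line
gain = n_active * (alpha / n_active)    # alpha/n_k added to each active line
print(loss, gain)                       # both equal alpha = 0.1

w_new = w + alpha * x / n_active - alpha * w
print(w.sum(), w_new.sum())             # total weight unchanged (1.0)
```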
Network Summary
Example
2 classification units — binary classification
16 input lines
Dipole input (2/16 neighbouring inputs active)
Weights learned [figure: weight patterns for unit 1 and unit 2]
Also discovers horizontal and diagonal divisions; similar result in 3D.
The system discovers spatial structure that is not built into the architecture.
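A rough sketch of an experiment of this kind, under my own assumptions about the grid layout, learning rate and number of trials (not the authors' exact setup):

```python
import numpy as np

rng = np.random.default_rng(3)
side = 4                                   # 4x4 grid = 16 input lines

def dipole():
    """Random dipole stimulus: two neighbouring grid cells active."""
    x = np.zeros((side, side))
    r, c = rng.integers(side), rng.integers(side - 1)
    if rng.random() < 0.5:
        x[r, c] = x[r, c + 1] = 1          # horizontal neighbours
    else:
        x[c, r] = x[c + 1, r] = 1          # vertical neighbours
    return x.ravel()

W = rng.random((2, 16)); W /= W.sum(axis=1, keepdims=True)
alpha = 0.05
for _ in range(5000):
    x = dipole()
    j = np.argmax(W @ x)                   # winner-take-all
    W[j] += alpha * x / x.sum() - alpha * W[j]
    # (a unit that never wins stays unchanged: the "dead unit" problem)

# each unit's weights tend to concentrate on one region of the grid
print(W.reshape(2, side, side).round(2))
```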
Geometric representation of Learning
Problems with “Competitive Learning”
How many units?
Normalisation — biologically plausible?
Problem of dead units?
1. Leaky learning
2. Conscience mechanism
Not a magic technique; cf. the horizontal/vertical line task (Rumelhart & Zipser, 1985).
Competitive Learning
Input space is divided up – units learn about a subset of the input patterns.
Input space broken into groups of maximum similarity.
Cluster analysis.
Two sources of competition:
1. Winner-take-all mechanism
2. Resource limitation (normalisation)
Statistical Views
[Diagram: a single linear unit with output y, weights w1 ... wN, and inputs x1 ... xN]
Simple Hebbian learning:
$\dfrac{dw_i}{dt} = \alpha x_i y$
Linear activation function:
$y = \sum_j w_j x_j = \mathbf{w} \cdot \mathbf{x}$
Then
$\dfrac{dw_i}{dt} = \alpha x_i \sum_j w_j x_j = \alpha \sum_j w_j\, x_i x_j$
Correlation matrix
Ensemble average and slowly changing weights:
$\langle \Delta w_i \rangle = \alpha \sum_j \langle x_i x_j \rangle\, w_j = \alpha \sum_j C_{ij} w_j$
$\Delta \mathbf{w} = \alpha C \mathbf{w}$
where $C$ is the correlation matrix: $C_{ij} = \langle x_i x_j \rangle$.
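A minimal numerical sketch of this averaging step; the toy data distribution and all parameter values are my own choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# toy ensemble of input patterns x, correlated across dimensions
X = rng.multivariate_normal(mean=[0, 0], cov=[[3, 2], [2, 2]], size=5000)

C = (X.T @ X) / len(X)          # correlation matrix C_ij = <x_i x_j>
w = rng.random(2)
alpha = 0.01

# ensemble-averaged Hebbian update: <dw/dt> = alpha * C w
dw_avg = alpha * C @ w

# compare with averaging the per-pattern updates alpha * x_i * (w . x)
dw_emp = alpha * np.mean(X * (X @ w)[:, None], axis=0)
print(dw_avg, dw_emp)           # the two agree up to sampling noise
```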
Eigenvectors & Eigenvalues
A vector $\mathbf{x}$ can be viewed as a point in $N$-dimensional space, e.g. $\mathbf{x} = (1, 1, 1.5)$ is the point with $x = 1$, $y = 1$, $z = 1.5$.
A matrix $Q$ acts as a linear transformation: $\mathbf{v} \mapsto Q\mathbf{v}$.
Eigenvectors/eigenvalues: vectors for which $Q\mathbf{e} = \lambda \mathbf{e}$, i.e. the transformation only rescales them.
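For example, the eigenvectors and eigenvalues of a small symmetric matrix can be checked with numpy (the matrix entries are arbitrary):

```python
import numpy as np

C = np.array([[3.0, 2.0],
              [2.0, 2.0]])            # a symmetric, correlation-like matrix

vals, vecs = np.linalg.eigh(C)        # eigh: eigendecomposition for symmetric matrices
print(vals)                           # eigenvalues, in ascending order
e = vecs[:, -1]                       # eigenvector with the largest eigenvalue
print(np.allclose(C @ e, vals[-1] * e))   # C e = lambda e -> True
```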
Unconstrained Hebbian Learning
$\Delta \mathbf{w} = \alpha C \mathbf{w}$
Over a large number of patterns the eigenvector with the largest eigenvalue will be the dominant influence in weight change.
Weights change fastest in the direction of the eigenvector with the largest eigenvalue.
So the weights tend to the principal component of the data.
Solutions to unbounded weights:
Explicit Normalisation.
Oja-type rule – adds new terms.
Simple weight decay.
Principal Components
Find principal components:
Principal component of data = maximal eigenvector of
the covariance matrix of the data.
Oja rule
Simple Hebbian learning is unstable; weights grow without limit:
$\dfrac{dw_i}{dt} = \alpha x_i y$
The Oja rule adds a weight decay term:
$\dfrac{dw_i}{dt} = \alpha y (x_i - y w_i)$
Several properties (p. 202, Hertz et al., 1991):
1. $|\mathbf{w}|$ tends to 1.
2. $\mathbf{w}$ is the maximal eigenvector of $C$.
3. The variance of the output, $\langle y^2 \rangle$, is maximised by $\mathbf{w}$.
Decorrelate output units (via lateral inhibitory connections) to get other components (Sanger).
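A minimal sketch of the Oja rule converging to the principal component; the data distribution, learning rate and number of samples are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# zero-mean data with a dominant direction
X = rng.multivariate_normal(mean=[0, 0], cov=[[3, 2], [2, 2]], size=20000)

alpha = 0.01
w = rng.standard_normal(2) * 0.1
for x in X:
    y = w @ x
    w += alpha * y * (x - y * w)      # Oja rule: Hebbian term plus y^2 w decay

C = (X.T @ X) / len(X)
vals, vecs = np.linalg.eigh(C)
e1 = vecs[:, -1]                      # principal eigenvector of C
print(np.linalg.norm(w))              # close to 1: |w| tends to 1
print(abs(w @ e1))                    # close to 1: w aligns with the principal component
```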
Correlation matrices and eigenvectors
Given the simple rule:
$\Delta \mathbf{w} = C \mathbf{w}$ (ignoring $\alpha$)
$\mathbf{w}$ can be rewritten in terms of the eigenvectors $\mathbf{e}_i$ of $C$, with eigenvalues $\lambda_i$:
$\mathbf{w} = a_1 \mathbf{e}_1 + a_2 \mathbf{e}_2 + \dots + a_n \mathbf{e}_n$
where $a_i = \mathbf{w} \cdot \mathbf{e}_i$.
$\Delta \mathbf{w} = C(a_1 \mathbf{e}_1 + a_2 \mathbf{e}_2 + \dots + a_n \mathbf{e}_n)$
But since $C \mathbf{e}_i = \lambda_i \mathbf{e}_i$:
$\Delta \mathbf{w} = a_1 \lambda_1 \mathbf{e}_1 + a_2 \lambda_2 \mathbf{e}_2 + \dots + a_n \lambda_n \mathbf{e}_n$
So the weight derivative grows mostly in the direction of the eigenvector $\mathbf{e}_m$ with the largest eigenvalue $\lambda_m$.
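Iterating this update is essentially power iteration; a short sketch of that connection (the matrix and iteration count are arbitrary choices):

```python
import numpy as np

C = np.array([[3.0, 2.0],
              [2.0, 2.0]])
vals, vecs = np.linalg.eigh(C)
e_max = vecs[:, -1]                       # eigenvector with the largest eigenvalue

rng = np.random.default_rng(2)
w = rng.standard_normal(2)
for _ in range(50):
    w = C @ w                             # repeated Delta w = C w steps
    w /= np.linalg.norm(w)                # renormalise so only the direction matters

print(abs(w @ e_max))                     # close to 1: w points along the dominant eigenvector
```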
Summary
No external teacher needed.
Competition arises from "winner take all" and weight normalisation.
Discovers principal features of input environment.
Output units have maximal variance.