Chapter 3: Pattern Association &
Associative Memory
• Associating patterns which are
– similar,
– contrary,
– in close proximity (spatial),
– in close succession (temporal)
• Associative recall
– evoke associated patterns
– recall a pattern by part of it
– evoke/recall with incomplete/ noisy patterns
• Two types of associations. For two patterns s and t
– hetero-association (s != t) : relating two different patterns
– auto-association (s = t): relating parts of a pattern with
other parts
• Architectures of NN associative memory
– single layer (with/out input layer)
– two layers (for bidirectional assoc.)
• Learning algorithms for AM
– Hebbian learning rule and its variations
– gradient descent
• Analysis
– storage capacity (how many patterns can be
remembered correctly in a memory)
– convergence
• AM as a model for human memory
Training Algorithms for Simple AM
[Diagram: single-layer network with inputs x_1 … x_n, outputs y_1 … y_m, and weights w_11 … w_nm]
• Goal of learning:
– to obtain a set of weights w_ij
– from a set of training pattern pairs {s:t}
– such that when s is applied to the input layer, t is computed
at the output layer
[Diagram: training pair s = (s_1, …, s_n) associated with t = (t_1, …, t_m)]
• Network structure: single layer
– one output layer of non-linear units and one input layer
– similar to the simple network for classification in Ch. 2
• For all training pairs s:t:  $t_j = f(s \cdot w_j)$ for all $j$  (i.e., $t = f(s\,W)$)
• Similar to Hebbian learning for classification in Ch. 2
• Algorithm (bipolar or binary patterns):
  – For each training sample s:t:
        $\Delta w_{ij} = s_i \cdot t_j$        (Hebbian rule)
  – $\Delta w_{ij}$ increases if both $s_i$ and $t_j$
    are ON (binary) or have the same sign (bipolar)
• If $w_{ij} = 0$ initially, then after the updates for all P training patterns,
        $w_{ij} = \sum_{p=1}^{P} s_i(p)\,t_j(p), \qquad W = \{w_{ij}\}$
• Instead of obtaining W by iterative updates, it can be
  computed from the training set by calculating the outer
  product of s and t.
• Outer product. Let s and t be row vectors.
  Then for a particular training pair s:t
        $\Delta W(p) = s^T(p)\cdot t(p) = \begin{bmatrix} s_1 \\ \vdots \\ s_n \end{bmatrix}\,[\,t_1,\ \ldots,\ t_m\,] = \begin{bmatrix} s_1 t_1 & \cdots & s_1 t_m \\ \vdots & & \vdots \\ s_n t_1 & \cdots & s_n t_m \end{bmatrix} = \begin{bmatrix} \Delta w_{11} & \cdots & \Delta w_{1m} \\ \vdots & & \vdots \\ \Delta w_{n1} & \cdots & \Delta w_{nm} \end{bmatrix}$
  And
        $W = \sum_{p=1}^{P} s^T(p)\cdot t(p)$
• It involves 3 nested loops over p, i, j (the order of p is irrelevant):
        for p = 1 to P      /* for every training pair */
          for i = 1 to n    /* for every row in W */
            for j = 1 to m  /* for every element j in row i */
              $w_{ij} := w_{ij} + s_i(p)\cdot t_j(p)$
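Below is a minimal NumPy sketch of this outer-product construction of W; the function name and array shapes are my own illustrative choices, not from the slides.

```python
import numpy as np

def hebbian_weights(S, T):
    """W = sum over training pairs of the outer product s^T(p) t(p).

    S: (P, n) array of input patterns s(p); T: (P, m) array of targets t(p).
    """
    P, n = S.shape
    _, m = T.shape
    W = np.zeros((n, m))
    for p in range(P):                 # the order of p is irrelevant
        W += np.outer(S[p], T[p])      # same result as the three nested loops
    return W
```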
• Does this method provide a good association?
– Recall with training samples (after the weights are
learned or computed)
– Apply s(k) to one layer and hope t(k) appears on the other,
  e.g. $f(s(k)\,W) = t(k)$
– May not always succeed (each weight contains some
  information from all samples)
        $s(k)\,W = s(k)\cdot\sum_{p=1}^{P} s^T(p)\,t(p) = \sum_{p=1}^{P}\big(s(k)\,s^T(p)\big)\,t(p)$
        $\qquad\ \ = \underbrace{\big(s(k)\,s^T(k)\big)\,t(k)}_{\text{principal term}} \;+\; \underbrace{\sum_{p\neq k}\big(s(k)\,s^T(p)\big)\,t(p)}_{\text{cross-talk term}}$
• Principal term gives the association between s(k) and t(k).
• Cross-talk represents correlation between s(k):t(k) and other
training pairs. When cross-talk is large, s(k) will recall
something other than t(k).
• If all s(p) are orthogonal to each other, then $s(k)\cdot s^T(p) = 0$ for $p \neq k$,
  so no sample other than s(k):t(k) contributes to the result.
• There are at most n orthogonal vectors in an n-dimensional
  space.
• Cross-talk increases when P increases.
• How many arbitrary training pairs can be stored in an AM?
  – Can it be more than n (allowing some non-orthogonal patterns
    while keeping cross-talk terms small)?
  – Storage capacity (more later)
Delta Rule
• Similar to that used in Adaline
• The original delta rule for weight update:
        $\Delta w_{ij} = \alpha\,(t_j - y_j)\,x_i$
• Extended delta rule
  – For output units with differentiable activation functions
  – Derived following the gradient descent approach:
        $\Delta w_{ij} = \alpha\,(t_j - y_j)\,f'(y\_in_j)\,x_i$
        $E = \sum_{j=1}^{m}(t_j - y_j)^2$
        $\dfrac{\partial E}{\partial w_{IJ}} = \dfrac{\partial}{\partial w_{IJ}}\sum_{j}(t_j - y_j)^2 = -2\,(t_J - y_J)\,\dfrac{\partial y_J}{\partial w_{IJ}}$
        $\qquad = -2\,(t_J - y_J)\,f'(y\_in_J)\,\dfrac{\partial\, y\_in_J}{\partial w_{IJ}} = -2\,(t_J - y_J)\,f'(y\_in_J)\cdot x_I$
        $\Delta w_{IJ} \propto -\dfrac{\partial E}{\partial w_{IJ}} = 2\,(t_J - y_J)\,f'(y\_in_J)\,x_I$
• This is the same as the update rule for output nodes in BP learning.
• Works well if the s(p) are linearly independent (even if not
  orthogonal).
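A small sketch of one extended-delta-rule step; the sigmoid activation and the names used here are illustrative assumptions (the slides only require a differentiable f).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def delta_rule_step(W, s, t, alpha=0.1):
    """One extended delta-rule update: dw_ij = alpha * (t_j - y_j) * f'(y_in_j) * s_i."""
    y_in = s @ W                        # net input to each output unit
    y = sigmoid(y_in)                   # differentiable activation
    fprime = y * (1.0 - y)              # sigmoid derivative evaluated at y_in
    W += alpha * np.outer(s, (t - y) * fprime)
    return W

# Illustrative usage: sweep repeatedly over the training pairs.
W = np.zeros((4, 2))
W = delta_rule_step(W, np.array([1., 0., 0., 0.]), np.array([1., 0.]))
```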
Example of hetero-associative memory
• Binary pattern pairs s:t with |s| = 4 and |t| = 2.
• Total weighted input to output units:
        $y\_in_j = \sum_i x_i\,w_{ij}$
• Activation function: threshold
        $y_j = \begin{cases} 1 & \text{if } y\_in_j > 0 \\ 0 & \text{if } y\_in_j \le 0 \end{cases}$
• Weights are computed by the Hebbian rule (sum of outer
  products of all training pairs):
        $W = \sum_{p=1}^{P} s^T(p)\,t(p)$
• Training samples:
        s(p)           t(p)
  p=1   (1 0 0 0)      (1, 0)
  p=2   (1 1 0 0)      (1, 0)
  p=3   (0 0 0 1)      (0, 1)
  p=4   (0 0 1 1)      (0, 1)
Computing the weights:
        $s^T(1)\,t(1) = \begin{bmatrix}1\\0\\0\\0\end{bmatrix}(1\ \ 0) = \begin{bmatrix}1&0\\0&0\\0&0\\0&0\end{bmatrix}$
        $s^T(2)\,t(2) = \begin{bmatrix}1\\1\\0\\0\end{bmatrix}(1\ \ 0) = \begin{bmatrix}1&0\\1&0\\0&0\\0&0\end{bmatrix}$
        $s^T(3)\,t(3) = \begin{bmatrix}0\\0\\0\\1\end{bmatrix}(0\ \ 1) = \begin{bmatrix}0&0\\0&0\\0&0\\0&1\end{bmatrix}$
        $s^T(4)\,t(4) = \begin{bmatrix}0\\0\\1\\1\end{bmatrix}(0\ \ 1) = \begin{bmatrix}0&0\\0&0\\0&1\\0&1\end{bmatrix}$
        $W = \begin{bmatrix}2&0\\1&0\\0&1\\0&2\end{bmatrix}$
Recall:
• x = (1 0 0 0):  $(1\ 0\ 0\ 0)\begin{bmatrix}2&0\\1&0\\0&1\\0&2\end{bmatrix} = (2\ \ 0) \;\Rightarrow\; y = (1, 0)$
• x = (0 1 0 0) (similar to s(1) and s(2)):  $(0\ 1\ 0\ 0)\,W = (1\ \ 0) \;\Rightarrow\; y = (1, 0)$
• x = (0 1 1 0):  $(0\ 1\ 1\ 0)\,W = (1\ \ 1) \;\Rightarrow\; y = (1, 1)$
(1 0 0 0), (1 1 0 0) class (1, 0)
(0 0 0 1), (0 0 1 1) class (0, 1)
(0 1 1 0) is not sufficiently similar
to any class
delta-rule would give same or
similar results.
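As a sanity check, a short NumPy sketch that reproduces the worked example above (the variable names are mine):

```python
import numpy as np

S = np.array([[1, 0, 0, 0], [1, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 1]])  # s(1)..s(4)
T = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])                          # t(1)..t(4)

W = sum(np.outer(s, t) for s, t in zip(S, T))   # sum of outer products
print(W)                                        # [[2 0] [1 0] [0 1] [0 2]]

for x in [(1, 0, 0, 0), (0, 1, 0, 0), (0, 1, 1, 0)]:
    y = (np.array(x) @ W > 0).astype(int)       # threshold activation
    print(x, "->", y)                           # -> [1 0], [1 0], [1 1]
```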
Example of auto-associative memory
• Same as hetero-associative nets, except t(p) = s(p).
• Used to recall a pattern from its noisy or incomplete version
  (pattern completion/pattern recovery).
• A single pattern s = (1, 1, 1, -1) is stored (weights computed
by Hebbian rule – outer product)
        $W = \begin{bmatrix}1&1&1&-1\\1&1&1&-1\\1&1&1&-1\\-1&-1&-1&1\end{bmatrix}$
• $(1\ 1\ 1\ {-1})\cdot W = (4\ 4\ 4\ {-4}) \to (1\ 1\ 1\ {-1})$   (training pattern)
• $(-1\ 1\ 1\ {-1})\cdot W = (2\ 2\ 2\ {-2}) \to (1\ 1\ 1\ {-1})$   (noisy pattern)
• $(0\ 0\ 1\ {-1})\cdot W = (2\ 2\ 2\ {-2}) \to (1\ 1\ 1\ {-1})$   (missing info)
• $(-1\ {-1}\ 1\ {-1})\cdot W = (0\ 0\ 0\ 0)$   (more noisy: not recognized)
• Diagonal elements will dominate the computation when
  multiple patterns are stored (each diagonal element of W equals P).
• When P is large, W is close to an identity matrix. This
  causes output = input, which may not be any stored
  pattern. The pattern correction power is lost.
• Replace diagonal elements by zero.
        $W_0 = \begin{bmatrix}0&1&1&-1\\1&0&1&-1\\1&1&0&-1\\-1&-1&-1&0\end{bmatrix}$
• $(1\ 1\ 1\ {-1})\cdot W_0 = (3\ 3\ 3\ {-3}) \to (1\ 1\ 1\ {-1})$
• $(-1\ 1\ 1\ {-1})\cdot W_0 = (3\ 1\ 1\ {-1}) \to (1\ 1\ 1\ {-1})$
• $(0\ 0\ 1\ {-1})\cdot W_0 = (2\ 2\ 1\ {-1}) \to (1\ 1\ 1\ {-1})$
• $(-1\ {-1}\ 1\ {-1})\cdot W_0 = (1\ 1\ {-1}\ 1) \to (1\ 1\ {-1}\ 1)$: wrong
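A small sketch of this auto-associative store/recall with the diagonal zeroed; the helper name and the threshold choice (sign, with 0 kept as 0) are illustrative:

```python
import numpy as np

s = np.array([1, 1, 1, -1])
W0 = np.outer(s, s).astype(float)
np.fill_diagonal(W0, 0)                          # replace diagonal elements by zero

def recall(x):
    return np.sign(np.array(x) @ W0).astype(int)  # bipolar threshold

print(recall([1, 1, 1, -1]))    # [ 1  1  1 -1]  stored pattern
print(recall([-1, 1, 1, -1]))   # [ 1  1  1 -1]  noisy input corrected
print(recall([-1, -1, 1, -1]))  # [ 1  1 -1  1]  too noisy -> wrong
```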
Storage Capacity
• # of patterns that can be correctly stored & recalled by a
network.
• More patterns can be stored if they are not similar to each
other (e.g., orthogonal)
  – non-orthogonal: store (1 1 1 -1) and (1 1 -1 -1):
        $W_0 = \begin{bmatrix}0&2&0&-2\\2&0&0&-2\\0&0&0&0\\-2&-2&0&0\end{bmatrix}$
        $(1\ 1\ 1\ {-1})\cdot W_0 \to (1\ 1\ 0\ {-1})$: it is not stored correctly.
  – orthogonal: store (1 1 -1 -1), (1 -1 1 -1), (1 -1 -1 1):
        $W_0 = \begin{bmatrix}0&-1&-1&-1\\-1&0&-1&-1\\-1&-1&0&-1\\-1&-1&-1&0\end{bmatrix}$
        All three patterns can be correctly recalled.
• Adding one more orthogonal pattern, (1 1 1 1), the
  weight matrix becomes
        $W_0 = \begin{bmatrix}0&0&0&0\\0&0&0&0\\0&0&0&0\\0&0&0&0\end{bmatrix}$
  The memory is completely destroyed!
• Theorem: an n by n network is able to store up to n-1
  mutually orthogonal (M.O.) bipolar vectors of n
  dimensions, but not n such vectors.
• Informal argument: Suppose m orthogonal vectors
  $a(1), \ldots, a(m)$ are stored with the following weight matrix:
        $w_{ij} = \begin{cases} 0 & \text{if } i = j \quad \text{(zero diagonal)} \\ \sum_{p=1}^{m} a_i(p)\,a_j(p) & \text{otherwise} \quad \text{(Hebbian rule)} \end{cases}$
• Let's try to recall one of them, say $a(k) = (a_1(k), \ldots, a_n(k))$:
        $a(k)\,W = \big(a(k)\cdot w_1,\ a(k)\cdot w_2,\ \ldots,\ a(k)\cdot w_n\big)$, where $w_i$ is the $i$-th column of W.
  The $j$th component:
        $a(k)\cdot w_j = \sum_{i=1}^{n} a_i(k)\,w_{ij} = \sum_{i\neq j} a_i(k)\sum_{p=1}^{m} a_i(p)\,a_j(p) = \sum_{p=1}^{m} a_j(p)\sum_{i\neq j} a_i(k)\,a_i(p)$
  Since $a(k)$ and $a(p)$ are M.O., $a(k)\,a^T(p) = \sum_{i=1}^{n} a_i(k)\,a_i(p) = 0$ for $p \neq k$, so
        $\sum_{i\neq j} a_i(k)\,a_i(p) = -\,a_j(k)\,a_j(p) \quad (p \neq k)$,
  and since $a(k)\,a^T(k) = n$,
        $\sum_{i\neq j} a_i(k)\,a_i(k) = n - 1$.
  Therefore
        $a(k)\cdot w_j = \sum_{p\neq k} a_j(p)\,[-a_j(k)\,a_j(p)] + a_j(k)\,(n-1) = -(m-1)\,a_j(k) + (n-1)\,a_j(k) = (n-m)\,a_j(k)$
• Therefore,  $a(k)\,W = (n-m)\,a(k)$
• When m < n, a(k) can correctly recall itself
when m = n, output is a 0 vector, recall fails
• In linear algebraic terms, a(k) is an eigenvector of W whose
  corresponding eigenvalue is (n-m).
  When m = n, the eigenvalue is zero, the recall output is the
  zero vector, and recall fails.
• How many mutually orthogonal bipolar vectors are there for a given
  dimension n? Write $n = 2^k m$, where m is an odd integer.
  Then there are at most $2^k$ M.O. bipolar vectors.
• Follow-up questions:
  – What would be the capacity of AM if the stored patterns are not
    mutually orthogonal (say, random)?
  – Ability of pattern recovery and completion:
    how far off a pattern can be from a stored pattern and still
    recall the correct/stored pattern.
  – Suppose x is a stored pattern, x' is close to x, and x" =
    f(x'W) is even closer to x than x'. What should we do?
    Feed back x", and hope iterations of feedback will lead to x.
Iterative Autoassociative Networks
• In general: use the current output as the input of the next iteration
      x(0) = initial recall input
      x(I) = f(x(I-1)W), I = 1, 2, ……
      until x(N) = x(K) for some K < N
• Example: a single pattern (1, 1, 1, -1) is stored with zero-diagonal weights
        $W_0 = \begin{bmatrix}0&1&1&-1\\1&0&1&-1\\1&1&0&-1\\-1&-1&-1&0\end{bmatrix}$
  An incomplete recall input:  x' = (1, 0, 0, 0)
        $x'\,W_0 = (0, 1, 1, -1) = x''$
        $x''\,W_0 = (3, 2, 2, -2) \;\to\; (1, 1, 1, -1)$
  (Output units are threshold units.)
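A minimal sketch of this iterative recall loop; the stopping test and names are illustrative choices, not prescribed by the slides:

```python
import numpy as np

def iterative_recall(W, x0, max_iters=20):
    """Iterate x(I) = f(x(I-1) W) with a sign threshold until the state repeats."""
    x = np.array(x0, dtype=float)
    seen = [tuple(x.astype(int))]
    for _ in range(max_iters):
        x = np.sign(x @ W)                 # threshold units (sign, with 0 kept as 0)
        state = tuple(x.astype(int))
        if state in seen:                  # x(N) equals an earlier x(K): stop
            return np.array(state)
        seen.append(state)
    return x.astype(int)

s = np.array([1, 1, 1, -1])
W0 = np.outer(s, s).astype(float)
np.fill_diagonal(W0, 0)
print(iterative_recall(W0, [1, 0, 0, 0]))  # converges to [ 1  1  1 -1]
```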
• Dynamic System: state vector x(I)
  – If K = N-1, x(N) is a stable state (fixed point):
    f(x(N)W) = f(x(N-1)W) = x(N)
• If x(K) is one of the stored pattern, then x(K) is called a
genuine memory
• Otherwise, x(K) is a spurious memory (caused by cross-
talk/interference between genuine memories)
• Each fixed point (genuine or spurious memory) is an
attractor (with different attraction basin)
  – If K != N-1, the network falls into a limit cycle:
    it will repeat x(K), x(K+1), ….. x(N) = x(K) as iteration continues.
• Iteration will eventually stop because the total number of
  distinct states is finite (3^n) if threshold units are used.
• If sigmoid units are used, the system may continue to evolve
  forever (chaos).
Discrete Hopfield Model
• A single-layer network
  – each node acts as both an input and an output unit
• More than an AM
– Other applications e.g., combinatorial optimization
• Different forms: discrete & continuous
• Major contribution of John Hopfield to NN
– Treating a network as a dynamic system
  – Introduced the notion of energy function & attractors into
    NN research
Discrete Hopfield Model (DHM) as AM
• Architecture:
– single layer (units serve as both input and output)
– nodes are threshold units (binary or bipolar)
  – weights: fully connected, symmetric, and zero diagonal:
        $w_{ij} = w_{ji}, \qquad w_{ii} = 0$
  – $x_i$ are external
    inputs, which
    may be transient
    or permanent
• Weights:
  – To store patterns s(p), p = 1, 2, … P
    bipolar:  $w_{ij} = \sum_{p} s_i(p)\,s_j(p)\ \ (i \neq j), \quad w_{ii} = 0$
        (same as the Hebbian rule, with zero diagonal)
    binary:   $w_{ij} = \sum_{p} \big(2 s_i(p) - 1\big)\big(2 s_j(p) - 1\big)\ \ (i \neq j), \quad w_{ii} = 0$
        (i.e., convert s(p) to bipolar when constructing W)
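A small sketch of this weight construction for binary patterns (convert to bipolar, sum the outer products, zero the diagonal); the function name is illustrative:

```python
import numpy as np

def hopfield_weights(binary_patterns):
    """DHM weights: bipolar outer products of the stored patterns, zero diagonal."""
    S = 2 * np.array(binary_patterns) - 1      # {0,1} -> {-1,+1}
    W = (S.T @ S).astype(float)                # sum of bipolar outer products
    np.fill_diagonal(W, 0)                     # w_ii = 0
    return W

print(hopfield_weights([[1, 1, 1, 0]]))        # off-diagonal +/-1, zero diagonal
```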
• Recall
  – Use an input vector to recall a stored vector (the book calls
    this the application of the DHM)
– Each time, randomly select a unit for update
Recall Procedure
1.Apply recall input vector to the network:
2.While convergence = fails do
2.1.Randomly select a unit
2.2. Compute
2.3. Determine activation of Yi
2.4. Periodically test for convergence.
      Step 1:    $y_i = x_i, \quad i = 1, 2, \ldots, n$
      Step 2.2:  $y\_in_i = x_i + \sum_{j \neq i} y_j\,w_{ji}$
      Step 2.3:  $y_i = \begin{cases} 1 & \text{if } y\_in_i > \theta_i \\ y_i & \text{if } y\_in_i = \theta_i \\ -1 \text{ (bipolar) or } 0 \text{ (binary)} & \text{if } y\_in_i < \theta_i \end{cases}$
• Notes:
1. Each unit should have equal probability to be selected
at step 2.1
2. Theoretically, to guarantee convergence of the recall
process, only one unit is allowed to update its
activation at a time during the computation. However,
the system may converge faster if all units are allowed
to update their activations at the same time.
3. Convergence test:  $y_i(\text{current}) = y_i(\text{next}) \ \ \forall i$
4. $\theta_i$ is usually set to zero.
5. $x_i$ in step 2.2 ($y\_in_i = x_i + \sum_{j} y_j\,w_{ji}$) is optional.
• Example: store one binary pattern (1, 1, 1, 0)
  (its bipolar counterpart (1 1 1 -1) gives the same W):
        $W = \begin{bmatrix}0&1&1&-1\\1&0&1&-1\\1&1&0&-1\\-1&-1&-1&0\end{bmatrix}$
  Recall input x = (0, 0, 1, 0): the first two bits are wrong.
  (Binary units, $\theta = 0$.)
  – $Y_1$ is selected:  $y\_in_1 = x_1 + \sum_j y_j\,w_{j1} = 0 + 1 = 1$, so $y_1 = 1$;   Y = (1, 0, 1, 0)
  – $Y_4$ is selected:  $y\_in_4 = x_4 + \sum_j y_j\,w_{j4} = 0 + (-2) = -2$, so $y_4 = 0$;   Y = (1, 0, 1, 0)
  – $Y_3$ is selected:  $y\_in_3 = x_3 + \sum_j y_j\,w_{j3} = 1 + 1 = 2$, so $y_3 = 1$;   Y = (1, 0, 1, 0)
  – $Y_2$ is selected:  $y\_in_2 = x_2 + \sum_j y_j\,w_{j2} = 0 + 2 = 2$, so $y_2 = 1$;   Y = (1, 1, 1, 0)
The stored pattern is correctly recalled
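A short sketch reproducing this recall; the fixed update order Y1, Y4, Y3, Y2 mirrors the slide, whereas the procedure above selects units at random:

```python
import numpy as np

W = np.array([[ 0,  1,  1, -1],
              [ 1,  0,  1, -1],
              [ 1,  1,  0, -1],
              [-1, -1, -1,  0]], dtype=float)

x = np.array([0, 0, 1, 0], dtype=float)   # recall input: first two bits wrong
y = x.copy()
for i in [0, 3, 2, 1]:                    # update Y1, Y4, Y3, Y2, one unit at a time
    y_in = x[i] + y @ W[:, i]             # y_in_i = x_i + sum_j y_j w_ji
    if y_in > 0:
        y[i] = 1
    elif y_in < 0:
        y[i] = 0                          # binary units, theta = 0
print(y)                                  # [1. 1. 1. 0.] -- the stored pattern
```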
Convergence Analysis of DHM
• Two questions:
1.Will Hopfield AM converge (stop) with any given recall input?
2.Will Hopfield AM converge to the stored pattern that is closest
to the recall input ?
• Hopfield provides an answer to the first question
  – by introducing an energy function into this model.
  – There is no satisfactory answer to the second question so far.
• Energy function:
– Notion in thermo-dynamic physical systems. The system has a
tendency to move toward lower energy state.
  – Also known as a Lyapunov function, after the Lyapunov theorem
    for the stability of a system of differential equations.
• In general, the energy function is E(y(t)), where y(t) is the state
  of the system at step (time) t. It must satisfy two conditions:
  1. E(t) is bounded from below:  $E(t) \ge c \ \ \forall t$
  2. E(t) is monotonically nonincreasing:
        $\Delta E(t+1) = E(t+1) - E(t) \le 0$   (in the continuous version: $\dot{E}(t) \le 0$)
• The energy function defined for the DHM:
        $E = -0.5\sum_{i}\sum_{j \neq i} y_i\,w_{ij}\,y_j \;-\; \sum_i x_i\,y_i \;+\; \sum_i \theta_i\,y_i$
• Show $\Delta E(t+1) \le 0$.  At t+1, unit $Y_k$ is selected for update:
        $\Delta E(t+1) = E(t+1) - E(t)$
        $= \big(-0.5\sum_i\sum_{j\neq i} y_i(t{+}1)\,w_{ij}\,y_j(t{+}1) - \sum_i x_i\,y_i(t{+}1) + \sum_i \theta_i\,y_i(t{+}1)\big)$
        $\ \ - \big(-0.5\sum_i\sum_{j\neq i} y_i(t)\,w_{ij}\,y_j(t) - \sum_i x_i\,y_i(t) + \sum_i \theta_i\,y_i(t)\big)$
  Note: $y_j(t+1) = y_j(t)$ for $j \neq k$ (only one unit can update at a time), so the
  terms which are different in the two parts are those involving $y_k$:
        $\Delta E(t+1) = -\big[\sum_{j\neq k} y_j\,w_{jk} + x_k - \theta_k\big]\,\Delta y_k(t+1) = -\big[y\_in_k(t+1) - \theta_k\big]\,\Delta y_k(t+1)$
  Cases:
  – if $y_k(t) = -1$ and $y_k(t+1) = 1$:  $\Delta y_k(t+1) = 2 \Rightarrow y\_in_k(t+1) > \theta_k \Rightarrow \Delta E(t+1) < 0$
  – if $y_k(t) = 1$ and $y_k(t+1) = -1$:  $\Delta y_k(t+1) = -2 \Rightarrow y\_in_k(t+1) < \theta_k \Rightarrow \Delta E(t+1) < 0$
  – otherwise:  $\Delta y_k(t+1) = 0 \Rightarrow \Delta E(t+1) = 0$
• Show E(t) is bounded from below: since $y_i$, $x_i$, $\theta_i$, and $w_{ij}$ are
  all bounded, E is bounded.
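A small sketch that tracks this energy during asynchronous recall and shows it never increases; the pattern, input, and update order are illustrative:

```python
import numpy as np

def energy(W, y, x, theta=None):
    """E = -0.5 * sum_{i != j} y_i w_ij y_j - sum_i x_i y_i + sum_i theta_i y_i."""
    theta = np.zeros_like(y) if theta is None else theta
    return -0.5 * (y @ W @ y) - x @ y + theta @ y   # w_ii = 0, so i = j terms vanish

s = np.array([1, 1, 1, -1], dtype=float)
W = np.outer(s, s)
np.fill_diagonal(W, 0)
x = np.array([-1, 1, 1, -1], dtype=float)           # noisy recall input
y = x.copy()
print(energy(W, y, x))                              # initial energy
for i in range(4):                                  # asynchronous updates
    y_in = x[i] + y @ W[:, i]
    y[i] = 1 if y_in > 0 else (-1 if y_in < 0 else y[i])
    print(energy(W, y, x))                          # never increases
```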
• Comments:
  1. Why it converges:
     • Each time, E is either unchanged or decreases by a finite amount.
     • E is bounded from below.
     • There is a limit to how far E may decrease. After a finite number of
       steps, E stops decreasing no matter which unit is selected for update.
  2. The state to which the system converges is a stable state:
     the system returns to it after a small perturbation. It is called
     an attractor (with its own attraction basin).
  3. The error function of BP learning is another example of an
     energy/Lyapunov function, because
     • it is bounded from below (E > 0), and
     • it is monotonically non-increasing (W updates follow the gradient
       descent of E).
        $\forall k$: either $y_k(t+1) = y_k(t) \Rightarrow \Delta y_k = 0$, or $y\_in_k = \theta_k \Rightarrow \Delta y_k = 0$
Capacity Analysis of DHM
• P: the maximum number of random patterns of dimension n
  that can be stored in a DHM of n nodes.
• Hopfield's observation:
        $P \approx 0.15\,n, \qquad \dfrac{P}{n} \approx 0.15$
• Theoretical analysis:
        $P \approx \dfrac{n}{2\log_2 n}, \qquad \dfrac{P}{n} \approx \dfrac{1}{2\log_2 n}$
  P/n decreases because larger n leads to more
  interference between stored patterns.
• Some work modifies the HM to increase its capacity to close
  to n; there W is trained rather than computed by the Hebbian rule.
My Own Work:
• One possible reason for the small
capacity of HM is that it does not
have hidden nodes.
• Train a feed-forward network (with
  hidden layers) by BP to establish
  pattern auto-association.
• Recall: feed the output back to the
  input layer, making it a dynamic
  system.
• Showed that 1) it will converge, and 2)
  stored patterns become genuine
  memories.
• It can store many more patterns
(seems O(2^n))
• Its pattern complete/recovery
capability decreases when n
increases (# of spurious attractors
seems to increase exponentially)
[Diagram: feed-forward nets with input, hidden, and output layers — one (input–hidden–output) labeled Auto-association, one (input1–hidden1–output1, input2–hidden2–output2) labeled Hetero-association]
Bidirectional AM(BAM)
• Architecture:
– Two layers of non-linear units: X-layer, Y-layer
  – Units: discrete threshold or continuous sigmoid (can be
    either binary or bipolar).
• Weights:
  – Hebbian/outer product:
        $W_{n\times m} = \sum_{p=1}^{P} s^T(p)\cdot t(p)$
  – Symmetric:  $w_{ij} = w_{ji}$
  – Convert binary patterns to bipolar when constructing W
• Recall:
  – Bidirectional: either by X (to recall a Y) or by Y (to recall an X)
  – Recurrent:
        $y_j(t) = f(y\_in_j(t)),\ \ j = 1,\ldots,m, \quad \text{where } y\_in_j(t) = \sum_{i=1}^{n} x_i(t-1)\,w_{ij}$
        $x_i(t+1) = f(x\_in_i(t+1)),\ \ i = 1,\ldots,n, \quad \text{where } x\_in_i(t+1) = \sum_{j=1}^{m} w_{ij}\,y_j(t)$
  – Update can be either asynchronous (as in the HM) or
    synchronous (change all Y units at one time, then all X
    units the next time)
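A minimal sketch of BAM storage and synchronous bidirectional recall under these formulas; the pattern pairs and names are illustrative:

```python
import numpy as np

S = np.array([[1, 1, -1, -1], [1, -1, 1, -1]])   # X-layer patterns s(p) (bipolar)
T = np.array([[1, -1], [1, 1]])                  # Y-layer patterns t(p)
W = S.T @ T                                      # W = sum_p s^T(p) t(p), shape n x m

def bam_recall_from_x(x, steps=5):
    """Synchronous bidirectional recall: X -> Y -> X -> ... with sign thresholds."""
    x = np.array(x, dtype=float)
    y = np.sign(x @ W)                           # y_in_j = sum_i x_i w_ij
    for _ in range(steps):
        x = np.sign(y @ W.T)                     # x_in_i = sum_j w_ij y_j
        y = np.sign(x @ W)
    return x.astype(int), y.astype(int)

print(bam_recall_from_x([1, 1, -1, -1]))         # recovers s(1) = [1,1,-1,-1], t(1) = [1,-1]
```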
• Analysis (discrete case)
  – Energy function (also a Lyapunov function):
        $L = -\,X\,W\,Y^T = -0.5\,\big(X\,W\,Y^T + Y\,W^T X^T\big) = -\sum_{j=1}^{m}\sum_{i=1}^{n} x_i\,w_{ij}\,y_j$
    • The proof is similar to that for the DHM.
    • Holds for both synchronous and asynchronous
      update (it holds for the DHM only with asynchronous
      update, due to lateral connections).
  – Storage capacity: on the order of $\max(n, m)$