The document discusses associative memory and training algorithms for simple associative memory networks. It describes how patterns can be associated based on similarity, contrariness, spatial or temporal proximity. Associative recall involves evoking associated patterns with incomplete or noisy input patterns. The document outlines the architecture and learning algorithms of associative memory networks, including Hebbian learning and gradient descent. It provides an example of storing and recalling patterns from a basic hetero-associative memory network.
A hetero-associative memory is a single-layer neural network in which the input training vectors and the output target vectors are not the same. The weights are determined so that the network stores a set of pattern pairs. A hetero-associative network is static in nature; hence, there are no non-linear or delay operations.
NN-Ch3.PDF
1. Chapter 3: Pattern Association & Associative Memory
• Associating patterns which are
– similar,
– contrary,
– in close proximity (spatial),
– in close succession (temporal)
• Associative recall
– evoke associated patterns
– recall a pattern by part of it
– evoke/recall with incomplete/noisy patterns
• Two types of associations. For two patterns s and t
– hetero-association (s != t): relating two different patterns
– auto-association (s = t): relating parts of a pattern with
other parts
2. • Architectures of NN associative memory
– single layer (with/without input layer)
– two layers (for bidirectional assoc.)
• Learning algorithms for AM
– Hebbian learning rule and its variations
– gradient descent
• Analysis
– storage capacity (how many patterns can be
remembered correctly in a memory)
– convergence
• AM as a model for human memory
3. Training Algorithms for Simple AM
[Figure: single-layer network with inputs x_1 … x_n, outputs y_1 … y_m, and weights w_11 … w_nm]
• Goal of learning:
– to obtain a set of weights w_ij
– from a set of training pattern pairs {s:t}
– such that when s is applied to the input layer, t is computed
at the output layer
– i.e., for all training pairs s:t:  t_j = f(s · w_j) = f(Σ_i s_i w_ij)  for all j
• Network structure: single layer
– one output layer of non-linear units and one input layer
– similar to the simple network for classification in Ch. 2
4. • Similar to Hebbian learning for classification in Ch. 2
• Algorithm (bipolar or binary patterns):
– For each training sample s:t:  Δw_ij = s_i · t_j
– Δw_ij increases if both s_i and t_j are ON (binary) or have the same sign (bipolar)
– If w_ij = 0 initially, then after the updates for all P training patterns,
  W = {w_ij} with  w_ij = Σ_{p=1}^{P} s_i(p) · t_j(p)   (Hebbian rule)
• Instead of obtaining W by iterative updates, it can be
computed from the training set by calculating the outer
product of s and t.
5. • Outer product. Let s and t be row vectors.
Then for a particular training pair s:t:
  ΔW(p) = s^T(p) · t(p) = [s_1, ..., s_n]^T [t_1, ..., t_m]
        = [ s_1 t_1 ... s_1 t_m ]     [ Δw_11 ... Δw_1m ]
          [   ...   ...   ...   ]  =  [   ...  ...  ... ]
          [ s_n t_1 ... s_n t_m ]     [ Δw_n1 ... Δw_nm ]
And over all pairs:  W = Σ_{p=1}^{P} s^T(p) · t(p)
• Computing W involves 3 nested loops over p, i, j (the order of p is irrelevant):
p = 1 to P        /* for every training pair */
  i = 1 to n      /* for every row in W */
    j = 1 to m    /* for every element j in row i */
      w_ij := w_ij + s_i(p) · t_j(p)
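The triple loop above translates directly to code. The following is a minimal sketch, assuming NumPy and that the training pairs are stored as the rows of arrays S (P x n) and T (P x m); the function names hebbian_loops and hebbian_outer are illustrative, not from the slides.

import numpy as np

def hebbian_loops(S, T):
    # Hebbian AM weights via the three nested loops over p, i, j
    P, n = S.shape
    _, m = T.shape
    W = np.zeros((n, m))
    for p in range(P):            # for every training pair
        for i in range(n):        # for every row in W
            for j in range(m):    # for every element j in row i
                W[i, j] += S[p, i] * T[p, j]
    return W

def hebbian_outer(S, T):
    # Same result as a sum of outer products: W = sum_p s(p)^T t(p)
    return sum(np.outer(s, t) for s, t in zip(S, T))

Both functions return the same n x m matrix; the outer-product form is simply S.T @ T.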
6. • Does this method provide a good association?
– Recall with training samples (after the weights are
learned or computed)
– Apply s(k) to one layer and hope t(k) appears on the other, i.e. f(s(k)·W) = t(k)
– May not always succeed (each weight contains some
information from all samples):
  s(k)·W = s(k) · Σ_{p=1}^{P} s^T(p) t(p)
         = Σ_{p=1}^{P} [s(k)·s^T(p)] t(p)
         = [s(k)·s^T(k)] t(k)  +  Σ_{p≠k} [s(k)·s^T(p)] t(p)
           (principal term)       (cross-talk term)
7. • The principal term gives the association between s(k) and t(k).
• The cross-talk term represents the correlation between s(k):t(k) and the other
training pairs. When cross-talk is large, s(k) will recall
something other than t(k).
• If all s(p) are orthogonal to each other, then s(k)·s^T(p) = 0 for p ≠ k, and
no sample other than s(k):t(k) contributes to the result.
• There are at most n orthogonal vectors in an n-dimensional
space.
• Cross-talk increases when P increases.
• How many arbitrary training pairs can be stored in an AM?
– Can it be more than n (allowing some non-orthogonal patterns
while keeping cross-talk terms small)?
– Storage capacity (more later)
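To make the principal-term/cross-talk split concrete, here is a small sketch (NumPy; the function name recall_terms is illustrative) that separates the two terms when recalling s(k); if the rows of S are mutually orthogonal, the cross-talk term is the zero vector.

import numpy as np

def recall_terms(S, T, k):
    # Split s(k)·W into the principal term and the cross-talk term
    W = S.T @ T                                     # Hebbian weights
    principal = (S[k] @ S[k]) * T[k]                # ||s(k)||^2 · t(k)
    cross_talk = sum((S[k] @ S[p]) * T[p]
                     for p in range(len(S)) if p != k)
    assert np.allclose(S[k] @ W, principal + cross_talk)
    return principal, cross_talk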
8. Delta Rule
• Similar to that used in Adaline
• The original delta rule for weight update:
  Δw_ij = α (t_j − y_j) x_i
• Extended delta rule
– For output units with differentiable activation functions
– Derived following the gradient descent approach:
  Δw_ij = α (t_j − y_j) f'(y_in_j) x_i
• Derivation. Define the squared error  E = Σ_{j=1}^{m} (t_j − y_j)². Then
  ∂E/∂w_IJ = ∂/∂w_IJ Σ_j (t_j − y_j)²
           = −2 (t_J − y_J) ∂y_J/∂w_IJ
           = −2 (t_J − y_J) f'(y_in_J) ∂y_in_J/∂w_IJ
           = −2 (t_J − y_J) f'(y_in_J) x_I
  so  Δw_IJ ∝ −∂E/∂w_IJ = (t_J − y_J) f'(y_in_J) x_I   (the constant 2 is absorbed into α)
9. • Same as the update rule for output nodes in BP learning.
• Works well if the stored patterns s(p) are linearly independent (even if not
orthogonal).
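A minimal sketch of one training pass with the extended delta rule, assuming NumPy, a differentiable activation (tanh is used here purely as an example) and a learning rate alpha; all names are illustrative.

import numpy as np

def f(x):
    return np.tanh(x)                # example differentiable activation

def f_prime(x):
    return 1.0 - np.tanh(x) ** 2

def delta_rule_epoch(W, S, T, alpha=0.1):
    # One pass over the training pairs; W must be a float array of shape (n, m).
    # Update: delta w_ij = alpha * (t_j - y_j) * f'(y_in_j) * x_i
    for x, t in zip(S, T):
        y_in = x @ W
        y = f(y_in)
        W += alpha * np.outer(x, (t - y) * f_prime(y_in))
    return W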
10. Example of hetero-associative memory
• Binary pattern pairs s:t with |s| = 4 and |t| = 2.
• Total weighted input to output units:  y_in_j = Σ_i x_i w_ij
• Activation function: threshold
  y_j = 1 if y_in_j > 0,  y_j = 0 if y_in_j ≤ 0
• Weights are computed by the Hebbian rule (sum of the outer
products of all training pairs):  W = Σ_{p=1}^{P} s^T(p) · t(p)
• Training samples:
s(p) t(p)
p=1 (1 0 0 0) (1, 0)
p=2 (1 1 0 0) (1, 0)
p=3 (0 0 0 1) (0, 1)
p=4 (0 0 1 1) (0, 1)
12. Recall (with W = [[2 0], [1 0], [0 1], [0 2]] computed from the training samples above):
x = (1 0 0 0):                             x·W = (2, 0) → y = (1, 0)
x = (0 1 0 0) (similar to s(1) and s(2)):  x·W = (1, 0) → y = (1, 0)
x = (0 1 1 0):                             x·W = (1, 1) → y = (1, 1)
(1 0 0 0), (1 1 0 0) → class (1, 0)
(0 0 0 1), (0 0 1 1) → class (0, 1)
(0 1 1 0) is not sufficiently similar
to any class
The delta rule would give the same or
similar results.
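The whole worked example fits in a few lines of code. A sketch (NumPy, illustrative names) that stores the four s:t pairs with the Hebbian rule and recalls the three test inputs above, reproducing y_in = (2, 0), (1, 0) and (1, 1):

import numpy as np

S = np.array([[1, 0, 0, 0], [1, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 1]])
T = np.array([[1, 0],       [1, 0],       [0, 1],       [0, 1]])

W = S.T @ T                        # Hebbian (outer-product) weights, 4 x 2

def recall(x):
    y_in = x @ W                   # total weighted input to the output units
    return (y_in > 0).astype(int)  # threshold activation

for x in ([1, 0, 0, 0], [0, 1, 0, 0], [0, 1, 1, 0]):
    x = np.array(x)
    print(x, '->', x @ W, '->', recall(x))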
13. Example of auto-associative memory
• Same as hetero-associative nets, except t(p) = s(p).
• Used to recall a pattern from its noisy or incomplete version
(pattern completion/pattern recovery).
• A single pattern s = (1, 1, 1, -1) is stored (weights computed
by Hebbian rule – outer product):
  W = s^T·s = [  1  1  1 -1 ]
              [  1  1  1 -1 ]
              [  1  1  1 -1 ]
              [ -1 -1 -1  1 ]
• Recall:
  training pat.:  ( 1  1  1 -1)·W = ( 4  4  4 -4) → ( 1  1  1 -1)
  noisy pat.:     (-1  1  1 -1)·W = ( 2  2  2 -2) → ( 1  1  1 -1)
  missing info:   ( 0  0  1 -1)·W = ( 2  2  2 -2) → ( 1  1  1 -1)
  more noisy:     (-1 -1  1 -1)·W = ( 0  0  0  0) → not recognized
14. • Diagonal elements will dominate the computation when
multiple patterns are stored (each diagonal element equals P).
• When P is large, W is close to an identity matrix. This
causes output = input, which may not be any stored
pattern. The pattern correction power is lost.
• Replace the diagonal elements by zero:
  W' = [  0  1  1 -1 ]
       [  1  0  1 -1 ]
       [  1  1  0 -1 ]
       [ -1 -1 -1  0 ]
• Recall with W':
  ( 1  1  1 -1)·W' = ( 3  3  3 -3) → ( 1  1  1 -1)
  (-1  1  1 -1)·W' = ( 3  1  1 -1) → ( 1  1  1 -1)
  ( 0  0  1 -1)·W' = ( 2  2  1 -1) → ( 1  1  1 -1)
  (-1 -1  1 -1)·W' = ( 1  1 -1  1) → ( 1  1 -1  1)   wrong
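A short sketch of the same auto-associative example in code (NumPy, illustrative names), with the diagonal zeroed out; np.sign plays the role of the bipolar threshold.

import numpy as np

s = np.array([1, 1, 1, -1])
W = np.outer(s, s)
np.fill_diagonal(W, 0)             # zero the diagonal to keep the correction power

for x in ([1, 1, 1, -1], [-1, 1, 1, -1], [0, 0, 1, -1], [-1, -1, 1, -1]):
    y_in = np.array(x) @ W
    print(x, '->', y_in, '->', np.sign(y_in))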
15. Storage Capacity
• # of patterns that can be correctly stored & recalled by a
network.
• More patterns can be stored if they are not similar to each
other (e.g., orthogonal)
• Example (4-dimensional bipolar patterns, zero-diagonal Hebbian weights):
– non-orthogonal: storing two non-orthogonal patterns, one of them is not
stored correctly (its recall produces a 0 component)
– orthogonal: storing three mutually orthogonal patterns, all three patterns
can be recalled correctly
16. • Adding one more orthogonal pattern, (1 1 1 1), makes the
weight matrix the zero matrix: the memory is
completely destroyed!
• Theorem: an n-by-n network is able to store up to n-1
mutually orthogonal (M.O.) bipolar vectors of dimension n,
but not n such vectors.
• Informal argument: suppose m mutually orthogonal vectors
a(1), ..., a(m) are stored with the following weight matrix (Hebbian rule):
  w_ij = Σ_{p=1}^{m} a_i(p) · a_j(p)  if i ≠ j
  w_ij = 0                            otherwise (zero diagonal)
17. Let's try to recall one of them, say a(k) = (a_1(k), ..., a_n(k)):
  a(k)·W = (a(k)·w_1, ..., a(k)·w_n), where w_j is the jth column of W.
The jth component:
  [a(k)·W]_j = Σ_{i≠j} a_i(k) w_ij
             = Σ_{i≠j} a_i(k) Σ_{p=1}^{m} a_i(p) a_j(p)
             = Σ_{p=1}^{m} a_j(p) Σ_{i≠j} a_i(k) a_i(p)
             = Σ_{p=1}^{m} a_j(p) [ a(k)·a^T(p) − a_j(k) a_j(p) ]
             = a_j(k)(n − 1) − Σ_{p≠k} a_j(p)² a_j(k)
               (since a(k)·a^T(k) = n, and a(k)·a^T(p) = 0 for p ≠ k because they are M.O.)
             = (n − 1) a_j(k) − (m − 1) a_j(k)
             = (n − m) a_j(k)
So a(k)·W = (n − m)·a(k): for m < n the recalled vector has the same signs as a(k)
(correct recall); for m = n it is the zero vector (the memory is destroyed).
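The informal argument is easy to check numerically. A sketch (NumPy, illustrative names): store the first m rows of a 4 x 4 Hadamard-type matrix (mutually orthogonal bipolar vectors) with zero-diagonal Hebbian weights and recall the first row; the result is (n − m)·a(1), so recall works for m < n and collapses to the zero vector at m = n.

import numpy as np

A = np.array([[ 1,  1,  1,  1],      # 4 mutually orthogonal
              [ 1, -1,  1, -1],      # 4-dimensional bipolar vectors
              [ 1,  1, -1, -1],
              [ 1, -1, -1,  1]])

n = A.shape[1]
for m in range(1, n + 1):
    W = A[:m].T @ A[:m]
    np.fill_diagonal(W, 0)           # zero-diagonal Hebbian weights for m patterns
    print(m, A[0] @ W, 'expected', (n - m) * A[0])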
19. • How many mutually orthogonal bipolar vectors are there for a given
dimension n?
  Write n = 2^k · m, where m is an odd integer.
  Then there are maximally 2^k M.O. vectors.
• Follow-up questions:
– What would be the capacity of AM if the stored patterns are not
mutually orthogonal (say, random)?
– Ability of pattern recovery and completion:
how far off can a pattern be from a stored pattern and still
recall the correct/stored pattern?
– Suppose x is a stored pattern, x' is close to x, and x'' =
f(x'W) is even closer to x than x'. What should we do?
Feed back x'', and hope iterations of feedback will lead to x.
20. Iterative Autoassociative Networks
• Example: stored pattern x = (1, 1, 1, -1), zero-diagonal Hebbian weights
  W = [  0  1  1 -1 ]
      [  1  0  1 -1 ]
      [  1  1  0 -1 ]
      [ -1 -1 -1  0 ]
  An incomplete recall input:  x' = (1, 0, 0, 0)
  x'·W  = (0, 1, 1, -1) = x''
  x''·W = (3, 2, 2, -2) → (1, 1, 1, -1), the stored pattern
  (output units are threshold units)
• In general: use the current output as the input of the next iteration
  x(0) = initial recall input
  x(I) = f(x(I-1)·W), I = 1, 2, ……
  until x(N) = x(K) where K < N
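A sketch of the iterative recall loop (NumPy, illustrative names), feeding each output back in until a state repeats; it reproduces the (1,0,0,0) → (0,1,1,-1) → (1,1,1,-1) sequence above.

import numpy as np

def iterative_recall(W, x0, max_steps=20):
    # x(I) = f(x(I-1) W); stop when x(N) equals an earlier x(K)
    seen, x = [tuple(x0)], np.array(x0)
    for _ in range(max_steps):
        x = np.sign(x @ W)           # threshold units (a 0 net input stays 0)
        if tuple(x) in seen:
            return x
        seen.append(tuple(x))
    return x

s = np.array([1, 1, 1, -1])
W = np.outer(s, s)
np.fill_diagonal(W, 0)
print(iterative_recall(W, [1, 0, 0, 0]))   # -> [ 1  1  1 -1]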
21. • Dynamic system: state vector x(I)
– If K = N-1, x(N) is a stable state (fixed point):
  f(x(N)W) = f(x(N-1)W) = x(N)
• If x(K) is one of the stored patterns, then x(K) is called a
genuine memory
• Otherwise, x(K) is a spurious memory (caused by cross-
talk/interference between genuine memories)
• Each fixed point (genuine or spurious memory) is an
attractor (with its own attraction basin)
– If K != N-1, we have a limit cycle:
• The network will repeat
x(K), x(K+1), ..., x(N) = x(K) when iteration continues.
• Iteration will eventually stop because the total number of
distinct states is finite (3^n) if threshold units are used.
• If sigmoid units are used, the system may continue to evolve
forever (chaos).
22. Discrete Hopfield Model
• A single-layer network
– each node serves as both an input and an output unit
• More than an AM
– other applications, e.g., combinatorial optimization
• Different forms: discrete & continuous
• Major contributions of John Hopfield to NN:
– treating a network as a dynamic system
– introducing the notion of energy function & attractors into
NN research
23. Discrete Hopfield Model (DHM) as AM
• Architecture:
– single layer (units serve as both input and output)
– nodes are threshold units (binary or bipolar)
– weights: fully connected, symmetric, and zero diagonal
  (w_ij = w_ji, w_ii = 0)
– x_i are external inputs, which may be transient or permanent
24. • Weights:
– To store patterns s(p), p = 1, 2, …, P
bipolar:  w_ij = Σ_p s_i(p) · s_j(p) for i ≠ j,  w_ii = 0
  (same as the Hebbian rule, with zero diagonal)
binary:   w_ij = Σ_p (2 s_i(p) − 1)(2 s_j(p) − 1) for i ≠ j,  w_ii = 0
  (i.e., convert s(p) to bipolar when constructing W)
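A small sketch of the weight construction (NumPy; hopfield_weights is an illustrative name), converting binary patterns to bipolar and zeroing the diagonal.

import numpy as np

def hopfield_weights(patterns, binary=True):
    # W = sum_p s(p)^T s(p) over bipolar patterns, with zero diagonal
    S = np.array(patterns)
    if binary:
        S = 2 * S - 1              # convert {0, 1} to {-1, +1}
    W = S.T @ S
    np.fill_diagonal(W, 0)
    return W

W = hopfield_weights([[1, 1, 1, 0]])   # the single binary pattern stored in the worked example that follows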
25. • Recall
– Use an input vector to recall a stored vector (the book calls this the
application of the DHM)
– Each time, randomly select a unit for update
Recall Procedure
1. Apply the recall input vector to the network:  y_i := x_i, i = 1, 2, ..., n
2. While convergence fails do
2.1. Randomly select a unit Y_i
2.2. Compute  y_in_i = x_i + Σ_{j≠i} y_j w_ji
2.3. Determine the activation of Y_i:
     y_i = 1 if y_in_i > θ_i;  y_i unchanged if y_in_i = θ_i;  y_i = −1 (0 for binary units) if y_in_i < θ_i
2.4. Periodically test for convergence.
26. • Notes:
1. Each unit should have equal probability of being selected
at step 2.1.
2. Theoretically, to guarantee convergence of the recall
process, only one unit is allowed to update its
activation at a time during the computation. However,
the system may converge faster if all units are allowed
to update their activations at the same time.
3. Convergence test:  y_i(next) = y_i(current) for all i.
4. θ_i is usually set to zero.
5. The external input x_i in step 2.2 (y_in_i = x_i + Σ_j y_j w_ji) is optional.
27. • Example:
Store one pattern: the binary pattern (1, 1, 1, 0)
(its bipolar counterpart (1 1 1 -1) gives the same W)
  W = [  0  1  1 -1 ]
      [  1  0  1 -1 ]
      [  1  1  0 -1 ]
      [ -1 -1 -1  0 ]
Recall input  x = (0, 0, 1, 0)  (the first two bits are wrong)
Y_1 is selected:  y_in_1 = x_1 + Σ_j y_j w_j1 = 0 + 1 = 1 > 0  →  y_1 = 1,  Y = (1, 0, 1, 0)
Y_4 is selected:  y_in_4 = x_4 + Σ_j y_j w_j4 = 0 + (-2) = -2 < 0  →  y_4 = 0,  Y = (1, 0, 1, 0)
Y_3 is selected:  y_in_3 = x_3 + Σ_j y_j w_j3 = 1 + 1 = 2 > 0  →  y_3 = 1,  Y = (1, 0, 1, 0)
Y_2 is selected:  y_in_2 = x_2 + Σ_j y_j w_j2 = 0 + 2 = 2 > 0  →  y_2 = 1,  Y = (1, 1, 1, 0)
The stored pattern is correctly recalled
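The same recall can be run in code. A sketch (NumPy, illustrative names) of the asynchronous recall procedure of slide 25, applied to this example (binary units, θ = 0):

import numpy as np

def hopfield_recall(W, x, theta=0.0, rng=np.random.default_rng(0)):
    # Asynchronous recall: repeatedly pick a random unit and update it
    y = np.array(x, dtype=float)
    for _ in range(100):                      # crude outer convergence loop
        y_old = y.copy()
        for i in rng.permutation(len(y)):     # each unit visited once per sweep
            y_in = x[i] + y @ W[:, i]         # y_in_i = x_i + sum_j y_j w_ji
            if y_in > theta:
                y[i] = 1
            elif y_in < theta:
                y[i] = 0                      # binary units; use -1 for bipolar
        if np.array_equal(y, y_old):          # convergence: no unit changed
            return y
    return y

s = 2 * np.array([1, 1, 1, 0]) - 1            # stored binary pattern, made bipolar
W = np.outer(s, s)
np.fill_diagonal(W, 0)
print(hopfield_recall(W, np.array([0, 0, 1, 0])))   # -> [1. 1. 1. 0.]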
28. Convergence Analysis of DHM
• Two questions:
1. Will the Hopfield AM converge (stop) with any given recall input?
2. Will the Hopfield AM converge to the stored pattern that is closest
to the recall input?
• Hopfield provides an answer to the first question
– by introducing an energy function to this model;
– no satisfactory answer to the second question so far.
• Energy function:
– A notion from thermodynamic physical systems. The system has a
tendency to move toward a lower energy state.
– Also known as a Lyapunov function, after the Lyapunov theorem
for the stability of a system of differential equations.
29. • In general, the energy function E(y(t)), where y(t) is the state
of the system at step (time) t, must satisfy two conditions:
1. E(t) is bounded from below:  E(t) ≥ c for all t
2. E(t) is monotonically non-increasing:
   ΔE(t+1) = E(t+1) − E(t) ≤ 0   (in the continuous version: Ė(t) ≤ 0)
• The energy function defined for the DHM:
   E = −0.5 Σ_i Σ_{j≠i} y_i w_ij y_j − Σ_i x_i y_i + Σ_i θ_i y_i
• Show ΔE(t+1) ≤ 0. Suppose at t+1 unit Y_k is selected for update
  (only one unit can update at a time), so y_j(t+1) = y_j(t) for all j ≠ k:
   ΔE(t+1) = E(t+1) − E(t)
           = [ −0.5 Σ_i Σ_{j≠i} y_i(t+1) w_ij y_j(t+1) − Σ_i x_i y_i(t+1) + Σ_i θ_i y_i(t+1) ]
             − [ −0.5 Σ_i Σ_{j≠i} y_i(t) w_ij y_j(t) − Σ_i x_i y_i(t) + Σ_i θ_i y_i(t) ]
30. The terms which are different in the two parts are those involving y_k, so
   ΔE(t+1) = −[ Σ_{j≠k} y_j(t) w_jk + x_k − θ_k ] · [ y_k(t+1) − y_k(t) ]
           = −( y_in_k − θ_k ) · Δy_k(t+1)
Cases:
– if y_k(t) = 1 and y_k(t+1) = −1:  Δy_k(t+1) = −2 and y_in_k < θ_k  ⇒  ΔE(t+1) < 0
– if y_k(t) = −1 and y_k(t+1) = 1:  Δy_k(t+1) = 2 and y_in_k > θ_k  ⇒  ΔE(t+1) < 0
– otherwise:  Δy_k(t+1) = 0  ⇒  ΔE(t+1) = 0
• Show E(t) is bounded from below: since y_i, x_i, θ_i, w_ij are
all bounded, E is bounded.
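A quick numerical check of the Lyapunov property (NumPy, illustrative names; bipolar units, θ_i = 0): compute E before and after each single-unit update and confirm ΔE is never positive.

import numpy as np

def energy(W, y, x):
    # E = -0.5 * sum_{i != j} y_i w_ij y_j - sum_i x_i y_i   (theta_i = 0, zero diagonal)
    return -0.5 * y @ W @ y - x @ y

rng = np.random.default_rng(1)
s = rng.choice([-1, 1], size=8)                # one stored bipolar pattern
W = np.outer(s, s).astype(float)
np.fill_diagonal(W, 0)
x = rng.choice([-1, 1], size=8).astype(float)  # arbitrary recall input
y = x.copy()

for _ in range(50):                            # random asynchronous updates
    e_before = energy(W, y, x)
    k = rng.integers(len(y))
    y_in = x[k] + y @ W[:, k]
    y[k] = 1.0 if y_in > 0 else (-1.0 if y_in < 0 else y[k])
    assert energy(W, y, x) <= e_before + 1e-9  # Delta E <= 0 at every step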
31. • Comments:
1. Why it converges:
• Each time, E either stays unchanged or decreases by some amount.
• E is bounded from below.
• There is a limit to how far E may decrease. After a finite number of steps,
E will stop decreasing no matter what unit is selected for update:
  for all k, either y_k(t+1) = y_k(t) (so Δy_k = 0) or y_in_k = θ_k (so ΔE = 0)
2. The state the system converges to is a stable state:
it will return to this state after some small perturbation. It is called
an attractor (with its own attraction basin).
3. The error function of BP learning is another example of an
energy/Lyapunov function, because
• it is bounded from below (E ≥ 0)
• it is monotonically non-increasing (W is updated along the gradient
descent of E)
32. Capacity Analysis of DHM
• P: the maximum number of random patterns of dimension n that
can be stored in a DHM of n nodes
• Hopfield's observation:  P ≈ 0.15 n,  P/n ≈ 0.15
• Theoretical analysis:  P ≈ n / (2 log₂ n),  P/n ≈ 1 / (2 log₂ n)
  P/n decreases because larger n leads to more
  interference between stored patterns.
• Some work modifies the HM to increase its capacity to close
  to n; there W is trained (not computed by the Hebbian rule).
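For a sense of scale, the two estimates diverge quickly (plain Python, just the arithmetic):

import math

for n in (16, 100, 1000):
    hopfield = 0.15 * n                  # Hopfield's empirical observation
    theory = n / (2 * math.log2(n))      # theoretical estimate
    print(n, round(hopfield, 1), round(theory, 1))
# 16 2.4 2.0 / 100 15.0 7.5 / 1000 150.0 50.2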
33. My Own Work:
• One possible reason for the small
capacity of the HM is that it does not
have hidden nodes.
• Train a feed-forward network (with
hidden layers) by BP to establish
pattern auto-association.
• Recall: feed the output back to the
input layer, making it a dynamic
system.
• Shown: 1) it will converge, and 2)
stored patterns become genuine
memories.
• It can store many more patterns
(seems O(2^n)).
• Its pattern completion/recovery
capability decreases when n
increases (# of spurious attractors
seems to increase exponentially).
[Figure: feed-forward input-hidden-output networks used for
auto-association and hetero-association]
34. Bidirectional AM (BAM)
• Architecture:
– Two layers of non-linear units: X-layer, Y-layer
– Units: discrete threshold or continuous sigmoid (can be
either binary or bipolar).
35. • Weights:
– W_{n×m} = Σ_{p=1}^{P} s^T(p) · t(p)   (Hebbian/outer product)
– Symmetric: the same weights serve both directions
  (X → Y uses W, Y → X uses W^T)
– Convert binary patterns to bipolar when constructing W
• Recall:
– Bidirectional: either by X (to recall a Y) or by Y (to recall an X)
– Recurrent:
  y(t) = ( f(y_in_1(t)), ..., f(y_in_m(t)) ),  where  y_in_j(t) = Σ_{i=1}^{n} x_i(t−1) · w_ij
  x(t+1) = ( f(x_in_1(t+1)), ..., f(x_in_n(t+1)) ),  where  x_in_i(t+1) = Σ_{j=1}^{m} w_ij · y_j(t)
– Update can be either asynchronous (as in the HM) or
synchronous (change all Y units at one time, then all X
units the next time)
36. • Analysis (discrete case)
– Energy function (also a Lyapunov function):
  L = −0.5 (X W Y^T + Y W^T X^T) = −X W Y^T = −Σ_{j=1}^{m} Σ_{i=1}^{n} x_i w_ij y_j
• The proof is similar to that for the DHM.
• Holds for both synchronous and asynchronous
update (it holds for the DHM only with asynchronous
update, due to lateral connections).
– Storage capacity:  max(n, m)
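Finally, a minimal BAM sketch (NumPy, illustrative names and made-up pattern pairs) of the recurrent recall described on slide 35: store bipolar pairs with the outer-product rule and run synchronous bidirectional updates until the two layer states stop changing. A unit whose net input is 0 keeps its previous value, one common convention for discrete threshold units.

import numpy as np

def f(net, prev):
    # threshold units; keep the previous value when the net input is 0
    return np.where(net > 0, 1, np.where(net < 0, -1, prev))

def bam_weights(S, T):
    # W (n x m) = sum_p s(p)^T t(p); X -> Y uses W, Y -> X uses W^T
    return S.T @ T

def bam_recall_from_x(W, x, steps=10):
    x = np.array(x)
    y = f(x @ W, np.zeros(W.shape[1], dtype=int))   # initial Y-layer update
    for _ in range(steps):
        x_new = f(y @ W.T, x)        # X-layer update
        y_new = f(x_new @ W, y)      # Y-layer update
        if np.array_equal(x_new, x) and np.array_equal(y_new, y):
            break
        x, y = x_new, y_new
    return x, y

S = np.array([[1, 1, 1, -1], [-1, -1, 1, 1]])       # example stored pairs
T = np.array([[1, -1],       [-1, 1]])
W = bam_weights(S, T)
print(bam_recall_from_x(W, [1, 1, 1, -1]))          # recovers (1, 1, 1, -1) and (1, -1)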