Tutorial: Echo State Networks
Dan Popovici
University of Montreal (UdeM)
MITACS 2005
Overview
1. Recurrent neural networks: a 1-minute primer
2. Echo state networks
3. Examples, examples, examples
4. Open Issues
1 Recurrent neural networks
Feedforward- vs. recurrent NN
[Figure: feedforward network (left) vs. recurrent network (right), each with input and output units]
Feedforward NN:
• connections only "from left to right", no connection cycle
• activation is fed forward from input to output through "hidden layers"
• no memory

Recurrent NN:
• at least one connection cycle
• activation can "reverberate", persist even with no input
• system with memory
recurrent NNs, main properties
• input time series → output time series
• can approximate any dynamical system (universal approximation property)
• mathematical analysis difficult
• learning algorithms computationally expensive and difficult to master
• few application-oriented publications, little research
Supervised training of RNNs
[Figure: A. Training: teacher input/output pair and model output. B. Exploitation: new input, correct (unknown) output, and model output]
Backpropagation through time (BPTT)
• Most widely used general-purpose supervised training algorithm
• Idea: 1. stack network copies, 2. interpret as feedforward network, 3. use backprop algorithm.
[Figure: original RNN unrolled into a stack of copies]
What are ESNs?
• a training method for recurrent neural networks
• black-box modelling of nonlinear dynamical systems
• supervised training, offline and online
• exploits linear methods for nonlinear modeling
[Figure: previous approaches adapt all connections; ESN training adapts only the output connections]
Introductory example: a tone generator
Goal: train a network to work as a tuneable tone generator.
Input: frequency setting; output: sines of the desired frequency.
[Figure: example input (frequency setting) and output (sine) signals]
Tone generator, sampling
• For the sampling period, drive a fixed "reservoir" network with teacher input and output.
[Figure: teacher input and output signals]
• Observation: internal states of the dynamical reservoir reflect both input and output teacher signals.
[Figure: traces of four internal reservoir states]
Tone generator: compute weights
• Determine reservoir-to-output weights such that the training output is optimally reconstituted from the internal "echo" signals.
[Figure: teacher input/output signals and internal reservoir states]
Tone generator: exploitation
• With new output weights in place, drive trained network with input.
• Observation: network continues to function as in training.
– internal states reflect input and output
– output is reconstituted from internal states
• internal states and output create each other
[Figure: input, output, and internal states during exploitation; the states "echo" input and output, and the output is "reconstituted" from the states]
Tone generator: generalization
The trained generator network also works with input different from training input
[Figure: A. step input; B. teacher and learned output; C. some internal states]
Dynamical reservoir
• large recurrent network (100+ units)
• works as "dynamical reservoir", "echo chamber"
• units in the DR respond differently to excitation
• output units combine the different internal dynamics into the desired dynamics
[Figure: input units driving a recurrent "dynamical reservoir", read out by output units]
Rich excited dynamics
[Figure: impulse excitation and the responses of several reservoir units]
Unit impulse responses should vary greatly. Achieve this by, e.g.,
• inhomogeneous connectivity
• random weights
• different time constants
• ...
Notation and Update Rules
Input, internal, and output signals at time $n$ (for $K$ input, $N$ internal, and $L$ output units):

$u(n) = (u_1(n), \ldots, u_K(n))'$
$x(n) = (x_1(n), \ldots, x_N(n))'$
$y(n) = (y_1(n), \ldots, y_L(n))'$

Weight matrices: $W^{in} = (w^{in}_{ij})$ for the input connections, $W = (w_{ij})$ for the internal connections, $W^{out} = (w^{out}_{ij})$ for the output connections, and $W^{back} = (w^{back}_{ij})$ for the output-to-reservoir feedback.

Update rules:

$x(n+1) = f\big(W^{in} u(n+1) + W x(n) + W^{back} y(n)\big)$
$y(n+1) = f^{out}\big(W^{out}(u(n+1), x(n+1), y(n))\big)$
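A minimal numpy sketch of these update rules; all sizes, scalings, and names here are illustrative assumptions, not values from the tutorial:

```python
import numpy as np

# Sketch of the ESN update rules above (K inputs, N reservoir units, L outputs).
K, N, L = 1, 100, 1
rng = np.random.default_rng(0)

W_in = rng.uniform(-1, 1, (N, K))     # input weights W^in
W = rng.uniform(-0.5, 0.5, (N, N))    # internal weights W (scaling: see "Practical Considerations")
W_back = rng.uniform(-1, 1, (N, L))   # feedback weights W^back
W_out = np.zeros((L, K + N + L))      # output weights W^out -- the only trained part

def update_state(x, u_next, y):
    """x(n+1) = f(W^in u(n+1) + W x(n) + W^back y(n)), with f = tanh."""
    return np.tanh(W_in @ u_next + W @ x + W_back @ y)

def update_output(u_next, x_next, y):
    """y(n+1) = f^out(W^out (u(n+1), x(n+1), y(n))), with f^out = identity."""
    return W_out @ np.concatenate([u_next, x_next, y])
```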
Learning: basic idea
Every stationary deterministic dynamical system can be defined by an equation like

$d(t) = h\big(u(t), u(t-1), \ldots;\; d(t-1), d(t-2), \ldots\big)$

where the system function $h$ might be a monster.

Combine $h$ from the I/O echo functions by selecting suitable DR-to-output weights $w_i$:

$d(t) \approx y(t) = \sum_i w_i\, x_i(t) = \sum_i w_i\, h_i\big(u(t), \ldots, y(t-1), \ldots\big)$
Offline training: task definition
Recall $y(t) = \sum_i w_i x_i(t)$. Let $d(t)$ be the teacher output. Compute weights $w_i$ such that the mean square error

$E\big[(d(t) - y(t))^2\big] = E\big[(d(t) - \sum_i w_i x_i(t))^2\big]$

is minimized.
Offline training: how it works
1. Let the network run with the training signal $d(t)$ teacher-forced.
2. During this run, collect the network states $x_i(t)$ in a matrix $M$.
3. Compute weights $w_i$ such that $E\big[(d(t) - \sum_i w_i x_i(t))^2\big]$ is minimized.

The MSE-minimizing weight computation (step 3) is a standard operation, e.g. $w = M^{-1} T$, where $M^{-1}$ is the (pseudo)inverse of the state-collection matrix and $T$ the teacher signal. Many efficient implementations are available, offline/constructive and online/adaptive.
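For concreteness, a sketch of step 3 in numpy (variable names are mine; M and d hold the harvested states and the teacher output):

```python
import numpy as np

# Offline output-weight computation: least-squares solution of M w = d,
# equivalent to w = pinv(M) @ d with the Moore-Penrose pseudoinverse.
def train_output_weights(M, d):
    w, *_ = np.linalg.lstsq(M, d, rcond=None)
    return w
```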
Practical Considerations
• $W^{in}$, $W$, $W^{back}$ are chosen randomly
• Spectral radius of $W$ < 1
• $W$ should be sparse
• Input and feedback weights have to be scaled "appropriately"
• Adding noise in the update rule can increase generalization performance
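A sketch of a reservoir initialization following these considerations; the density and target spectral radius below are illustrative choices, not values prescribed by the slides:

```python
import numpy as np

def init_reservoir(N, density=0.1, spectral_radius=0.9, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1, 1, (N, N))
    W *= rng.random((N, N)) < density        # make W sparse
    rho = max(abs(np.linalg.eigvals(W)))     # current spectral radius
    return W * (spectral_radius / rho)       # rescale so that rho(W) < 1
```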
Echo state network training, summary
• use a large recurrent network as an "excitable dynamical reservoir (DR)"
• the DR is not modified through learning
• adapt only the DR-to-output weights
• thereby combine the desired system function from I/O history echo functions
• use any offline or online linear regression algorithm to minimize the error $E\big[(d(t) - y(t))^2\big]$
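Putting the pieces together, a minimal end-to-end sketch on a toy version of the tone-generator task; all sizes, scalings, and the washout length are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, washout = 100, 1000, 100

# Fixed random reservoir, input, and feedback weights (never trained)
W = rng.uniform(-1, 1, (N, N)) * (rng.random((N, N)) < 0.1)
W *= 0.9 / max(abs(np.linalg.eigvals(W)))
W_in, W_back = rng.uniform(-1, 1, N), rng.uniform(-1, 1, N)

# Teacher: constant frequency setting in, sine of that frequency out
u = np.full(T, 0.3)
d = 0.5 * np.sin(0.3 * np.arange(T))

# 1.-2. Teacher-forced run: collect states in the matrix M
x, M = np.zeros(N), np.zeros((T, N))
for n in range(T):
    x = np.tanh(W_in * u[n] + W @ x + W_back * (d[n - 1] if n > 0 else 0.0))
    M[n] = x

# 3. Linear regression of the teacher on the states (washout discarded)
w_out, *_ = np.linalg.lstsq(M[washout:], d[washout:], rcond=None)

# Exploitation: the network runs freely, feeding back its own output
x, y = M[-1], d[-1]
for n in range(200):
    x = np.tanh(W_in * u[0] + W @ x + W_back * y)
    y = w_out @ x   # should continue the sine
```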
3 Examples, examples, examples
3.1 Short-term memories
Delay line: scheme
[Figure: delay line scheme; input s(t), outputs s(t-d1), ..., s(t-dn)]
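Constructing the training targets for this task is straightforward; a sketch (the helper name is mine):

```python
import numpy as np

# For each delay d, the desired output at step t is s(t - d).
def delay_targets(s, delays):
    T = len(s)
    D = np.zeros((T, len(delays)))
    for j, d in enumerate(delays):
        D[d:, j] = s[:T - d]   # the first d steps are left at zero
    return D

# e.g. with the delays from the example below:
# D = delay_targets(s, [1, 30, 60, 70, 80, 90, 100, 103, 106, 120])
```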
Delay line: example
• Network size 400
• Delays: 1, 30, 60, 70, 80, 90, 100, 103, 106, 120 steps
• Training sequence length N = 2000
[Figure: training signal, a random walk with resting states]
Results:
[Figure: correct delayed signals and network outputs for delays 1, 30, 60, 90, 100, 103, 106, 120; traces of some DR internal units]
Delay line: test with different input
[Figure: correct delayed signals and network outputs for delays 1, 30, 60, 90, 100, 103, 106, 120 on the new input; traces of some DR internal units]
3.2 Identification of nonlinear systems
Identifying higher-order nonlinear systems
A tenth-order system:

$y(n+1) = 0.3\, y(n) + 0.05\, y(n) \Big( \sum_{k=0}^{9} y(n-k) \Big) + 1.5\, u(n-9)\, u(n) + 0.1$

[Figure: sample input u(n) and output y(n) sequences]

Training setup: the ESN is driven by the input u(n) and trained on the output y(n).
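A sketch generating this system; drawing u i.i.d. uniform on [0, 0.5] is a common choice for this benchmark but is my assumption, not stated on the slides:

```python
import numpy as np

def tenth_order_system(T, seed=0):
    rng = np.random.default_rng(seed)
    u = rng.uniform(0.0, 0.5, T)    # input sequence (assumed distribution)
    y = np.zeros(T)
    for n in range(9, T - 1):
        y[n + 1] = (0.3 * y[n]
                    + 0.05 * y[n] * y[n - 9:n + 1].sum()   # sum over y(n-9)..y(n)
                    + 1.5 * u[n - 9] * u[n]
                    + 0.1)
    return u, y
```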
Results: offline learning
augmented ESN (800 parameters): NMSE_test = 0.006
previous published state of the art 1): NMSE_train = 0.24
D. Prokhorov, pers. communication 2): NMSE_test = 0.004
[Figure: teacher and model output]
1) Atiya & Parlos (2000), IEEE Trans. Neural Networks 11(3), 697-708
2) EKF-RNN, 30 units, 1000 parameters.
The Mackey-Glass equation
• delay differential equation:
$\dot{x}(t) = 0.2\, \dfrac{x(t-\tau)}{1 + x(t-\tau)^{10}} - 0.1\, x(t)$
• delay τ > 16.8: chaotic
• benchmark for time series prediction
[Figure: time series and delay-embedding plots for τ = 17 and τ = 30]
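A sketch integrating this equation with a simple Euler scheme; the step size matches the "sampling rate 1" of the learning setup below, and the constant initial history is my illustrative choice:

```python
import numpy as np

def mackey_glass(T, tau=17, dt=1.0):
    hist = int(tau / dt)
    x = np.full(T + hist, 1.2)   # constant initial history (assumed)
    for t in range(hist, T + hist - 1):
        x_tau = x[t - hist]      # x(t - tau)
        x[t + 1] = x[t] + dt * (0.2 * x_tau / (1 + x_tau**10) - 0.1 * x[t])
    return x[hist:]
```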
Learning setup
• network size 1000
• training sequence N = 3000
• sampling rate 1
[Figure: section of the training sequence]
Results for τ = 17
Error for 84-step prediction: NRMSE = 1E-4.2 (averaged over 100 training runs on independently created data)
With refined training method: NRMSE = 1E-5.1
Previous best: NRMSE = 1E-1.7
[Figure: delay-embedding plots of the original series and the learnt model]
Prediction with model
Visible discrepancy after about 1500 steps.
[Figure: model prediction continued alongside the original series]
Comparison: NRMSE for 84-step prediction
ESN (refined): -5.1
ESN (1+2 K): -4.2
PCR Local Model (McNames 99, 2 K): -1.7
SOM (Vesanto 97, 3 K): -1.7
DCS-LLM (Chudy & Farkas 98, 3 K): -1.7
AMB (Bersini et al 98, ? K)*: -1.3
Neural Gas (Martinez et al 93, ~4 K): -1.3
EPNet (Yao & Liu 97, 0.5 K): -1.2
BPNN (Lapedes & Farber 87, ? K)*: -1.2

Values are log10(NRMSE). *) data from survey in Gers/Eck/Schmidhuber 2000
3.3 Dynamic pattern recognition
Dynamic pattern detection 1)
Training signal: the output y(n) jumps to 1 after the occurrence of a pattern instance in the input u(n).
[Figure: input u(n) and training output y(n)]
1) see GMD Report Nr 152 for detailed coverage
Single-instance patterns, training setup
1. A single-instance, 10-step pattern is randomly fixed.
[Figure: the pattern]
2. It is inserted into a 500-step random signal at positions 200 (for training) and 350, 400, 450, 500 (for testing).
3. A 100-unit ESN is trained on the first 300 steps (a single positive instance! "single-shot learning"), tested on the remaining 200 steps.
[Figure: test data, 200 steps with 4 occurrences of the pattern on random background; desired output: red impulses]
Single-instance patterns, results
1. trained network response on test data (DR: 12.4)
2. network response after training 800 more pattern-free steps ("negative examples") (DR: 12.1)
3. like 2., but 5 positive examples in training data (DR: 6.4)
4. comparison: optimal linear filter (DR: 3.5)
[Figure: network responses on test data for settings 1-4]

discrimination ratio: $\mathrm{DR} = E[d^2(n_+)] \,/\, E[d^2(n_-)]$, the mean squared output on pattern vs. off-pattern instants
Event detection for robots
(joint work with J.Hertzberg & F. Schönherr)
Robot runs through an office environment, experiencing data streams (27 channels) like...
[Figure: 10-sec traces of an infrared distance sensor, the left motor speed, the activation of "goThruDoor", and an external teacher signal marking the event category]
Learning setup
[Figure: 27 (raw) data channels feeding a 100-unit RNN, with an unlimited number of event detector channels as outputs]
• simulated robot (rich simulation)
• training run spans 15 simulated minutes
• event categories like: pass through door, pass by 90° corner, pass by smooth corner
Results
• easy to train event hypothesis signals
• "boolean" categories possible
• single-shot learning possible
Network setup in training
[Figure: 29 input channels coding the symbols _, a, ..., z; a 400-unit network; 29 output channels for next-symbol hypotheses]
Trained network in "text" generation
[Figure: the trained network feeds its output hypotheses into a decision mechanism, e.g. winner-take-all; the winning symbol is the next input]
Results
Selection by random draw according to output
yth_upsghteyshhfakeofw_io,l_yodoinglle_d_upeiuttytyr_hsymua_doey_sa
mmusos_trll,t.krpuflvek_hwiblhooslolyoe,_wtheble_ft_a_gimllveteud_ ...
Winner-take-all selection
sdear_oh,_grandmamma,_who_will_go_and_the_wolf_said_the_wolf_said
_the_wolf_said_the_wolf_said_the_wolf_said_the_wolf_said_the_wolf ...
4 Open Issues
4.2 Multiple timescales
4.3 Additive mixtures of dynamics
4.4 "Switching" memory
4.5 High-dimensional dynamics
Multiple time scales
This is hard to learn (Laser benchmark time series):
[Figure: laser benchmark time series]
Reason: 2 widely separated time scales.
Approach for future research: ESNs with different time constants in their units.
Additive dynamics
This proved impossible to learn:

$y(n) = \sin(0.2\, n) + \sin(0.311\, n)$

[Figure: the additive two-sine signal]
Reason: it requires 2 independent oscillators, but in an ESN all dynamics are mutually coupled.
Approach for future research: modular ESNs and unsupervised multiple expert learning.
"Switching" memory
This FSA has long memory "switches", generating sequences like
baaa...aaacaaa...aaabaaa...aaacaaa...aaa...
[Figure: automaton with transitions a, b, c; a forgetting curve with bounded area but unbounded width]
Generating such sequences is not possible with monotonic, area-bounded forgetting curves!
An ESN simply is not a model for long-term memory!
High-dimensional dynamics
High-dimensional dynamics would require a very large ESN.
Example: one-step prediction of a 6-DOF nonstationary time series of the form

$y(n) = a\, u_1 + b\, u_1^2 + c\, u_2 + d\, u_2^2 + e\, u_1 u_2 + f$

[Figure: section of the target time series]
200-unit ESN: RMS = 0.2; 400-unit network: RMS = 0.1; best other training technique 1): RMS = 0.02.
Approach for future research: task-specific optimization of ESNs.
1) Prokhorov et al., extended Kalman filtering BPTT. Network size 40, 1400 trained links, training time 3 weeks.
Spreading trouble...
• Signals x_i(n) of the reservoir can be interpreted as vectors in (infinite-dimensional) signal space
• Correlation E[xy] yields an inner product ⟨x, y⟩ on this space
• The output signal y(n) is a linear combination of these x_i(n)
• The more orthogonal the x_i(n), the smaller the output weights:
[Figure: the same y composed from nearly collinear signals (y = 30 x1 - 28 x2) vs. nearly orthogonal signals (y = 0.5 x1 + 0.7 x2)]
• Eigenvectors v_k of the correlation matrix R = (E[x_i x_j]) are orthogonal signals
• Eigenvalues λ_k indicate what "mass" of the reservoir signals x_i (all together) is aligned with v_k
• The eigenvalue spread λ_max/λ_min indicates the overall "non-orthogonality" of the reservoir signals
[Figure: v_max, v_min for a nearly collinear signal pair (λ_max/λ_min ≈ 20) vs. a nearly orthogonal pair (λ_max/λ_min ≈ 1)]
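This quantity is easy to inspect; a sketch, assuming M is the harvested state-collection matrix from offline training:

```python
import numpy as np

def eigenvalue_spread(M):
    R = (M.T @ M) / len(M)         # correlation matrix R = (E[x_i x_j])
    lam = np.linalg.eigvalsh(R)    # eigenvalues of the symmetric matrix R
    return lam.max() / lam.min()   # lambda_max / lambda_min
```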
Large eigenvalue spread → large output weights ...
• harmful for generalization, because slight changes in the reservoir signals will induce large changes in the output
• harmful for model accuracy, because estimation error contained in the reservoir signals is magnified (does not apply to deterministic systems)
• renders LMS online adaptive learning useless
[Figure: nearly collinear signal pair with λ_max/λ_min ≈ 20]
Summary
• Basic idea: dynamical reservoir of echo states +
supervised teaching of output connections.
• Seemed difficult: in nonlinear coupled systems,
every variable interacts with every other. BUT
seen the other way round, every variable rules and
echoes every other. Exploit this for local learning
and local system analysis.
• Echo states shape the tool for the solution from
the task.
Thank you.
References
• H. Jaeger (2002): Tutorial on training recurrent
neural networks, covering BPPT, RTRL, EKF and
the "echo state network" approach. GMD Report
159, German National Research Center for
Information Technology, 2002
• Slides used by Herbert Jaeger at IK2002