Recurrent Neural Networks
Viacheslav Khomenko, Ph.D.
  1. 1. Recurrent Neural Networks Viacheslav Khomenko, Ph.D.
  2. 2. Contents  Recap: feed-forward artificial neural network  Temporal dependencies  Recurrent neural network architectures  RNN training  New RNN architectures  Practical considerations  Neural models for locomotion  Application of RNNs
  3. 3. RECAP: FEED-FORWARD ARTIFICIAL NEURAL NETWORK
  4. 4. Feed-forward network. W. McCulloch and W. Pitts, 1940s: abstract mathematical model of a brain cell. F. Rosenblatt, 1958: perceptron for classification. P. Werbos, 1975: multi-layer artificial neural network. (Figure: Iris-flower classification; input features such as petals, sepal, yellow patch and veins feed the input layer, hidden layer(s) and an output layer whose decisions are Iris vs. ¬Iris.)
  5. 5. Feed-forward network. Decisions are based on current inputs only: • no memory about the past • no future scope. Simplified representation: x is the vector of input features, y the vector of predicted values; the neural activation uses A, some activation function (tanh etc.), and the network parameters w, b. (Figure: input layer, hidden layer(s), output layer; input → decision output.)
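The slide's legend (A, w, b, x, y) points at the usual affine-plus-activation computation; below is a minimal NumPy sketch of such a forward pass. Layer sizes, the tanh/sigmoid choices and the weight values are purely illustrative assumptions, not taken from the slides.

```python
import numpy as np

def feedforward(x, W1, b1, W2, b2):
    """One forward pass: the decision depends only on the current input x."""
    h = np.tanh(W1 @ x + b1)                      # hidden layer: A(w . x + b)
    y = 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))      # output scores, e.g. Iris vs. not-Iris
    return y

rng = np.random.default_rng(0)
x = rng.normal(size=4)                            # 4 features: petals, sepal, yellow patch, veins
W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)
W2, b2 = rng.normal(size=(2, 3)), np.zeros(2)
print(feedforward(x, W1, b1, W2, b2))
```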
  6. 6. TEMPORAL DEPENDENCIES
  7. 7. Temporal dependencies. Analyzing temporal dependencies: a decision made on a sequence of observations improves as more of the flower becomes visible.
      Frame 0 (stem seen, petals hidden): P(Iris) = 0.10, P(¬Iris) = 0.90
      Frame 1 (stem seen, petals hidden): P(Iris) = 0.11, P(¬Iris) = 0.89
      Frame 2 (stem seen, petals partial): P(Iris) = 0.20, P(¬Iris) = 0.80
      Frame 3 (stem partial, petals partial): P(Iris) = 0.45, P(¬Iris) = 0.55
      Frame 4 (stem hidden, petals seen): P(Iris) = 0.90, P(¬Iris) = 0.10
  8. 8. Reber Grammar: a synthetic problem that cannot be solved without memory. For each state, learn to predict the next possible edges. Transitions out of a state have equal probabilities, e.g. P(1→2) = P(1→3) = 0.5. (Figure: grammar graph with states as nodes and transitions as edges, each labelled 0.5.)
  9.–19. Reber Grammar: encoding of an example word. Slides 9–19 step through this table; slide 19 highlights the input vector x and the output vector y at time t = 2.

      Word | Step | Current node (Begin 1 2 3 4 5 6) | Possible paths (1 2 3 4 5 6 End)
      B    |  0   | 1 0 0 0 0 0 0                    | 1 0 0 0 0 0 0
      P    |  1   | 0 1 0 0 0 0 0                    | 0 1 1 0 0 0 0
      T    |  2   | 0 0 0 1 0 0 0                    | 0 0 1 0 1 0 0
      T    |  3   | 0 0 0 1 0 0 0                    | 0 0 1 0 1 0 0
      T    |  4   | 0 0 0 1 0 0 0                    | 0 0 1 0 1 0 0
      T    |  5   | 0 0 0 1 0 0 0                    | 0 0 1 0 1 0 0
      T    |  6   | 0 0 0 1 0 0 0                    | 0 0 1 0 1 0 0
      V    |  7   | 0 0 0 1 0 0 0                    | 0 0 1 0 1 0 0
      P    |  8   | 0 0 0 0 0 1 0                    | 0 0 0 1 0 1 0
      X    |  9   | 0 0 0 0 1 0 0                    | 0 0 1 0 0 1 0
      T    | 10   | 0 0 0 1 0 0 0                    | 0 0 1 0 1 0 0
      T    | 11   | 0 0 0 1 0 0 0                    | 0 0 1 0 1 0 0
      T    | 12   | 0 0 0 1 0 0 0                    | 0 0 1 0 1 0 0
      T    | 13   | 0 0 0 1 0 0 0                    | 0 0 1 0 1 0 0
      V    | 14   | 0 0 0 1 0 0 0                    | 0 0 1 0 1 0 0
      V    | 15   | 0 0 0 0 0 1 0                    | 0 0 0 1 0 1 0
      E    | 16   | 0 0 0 0 0 0 1                    | 0 0 0 0 0 0 1
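For concreteness, a small sketch (not from the slides) of how the first rows of the table could be turned into training vectors, under the reading that the current-node column is the one-hot input x and the possible-paths column is the multi-hot target y; the helper names are mine.

```python
import numpy as np

NODES = ["Begin", "1", "2", "3", "4", "5", "6"]      # current-node one-hot columns
TARGETS = ["1", "2", "3", "4", "5", "6", "End"]      # possible-paths multi-hot columns

def one_hot(index, size):
    v = np.zeros(size)
    v[index] = 1.0
    return v

# First three rows of the table (word B, P, T at steps 0..2), transcribed directly:
inputs = np.stack([
    one_hot(NODES.index("Begin"), 7),   # step 0: at Begin
    one_hot(NODES.index("1"), 7),       # step 1: at node 1
    one_hot(NODES.index("3"), 7),       # step 2: at node 3
])
targets = np.array([
    [1, 0, 0, 0, 0, 0, 0],              # from Begin: only node 1 is reachable
    [0, 1, 1, 0, 0, 0, 0],              # from node 1: nodes 2 and 3
    [0, 0, 1, 0, 1, 0, 0],              # from node 3: nodes 3 and 5
], dtype=float)
print(inputs.shape, targets.shape)      # (3, 7) (3, 7)
```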
  20. 20. Memory is important → Reasoning relies on experience
  21. 21. Time-delay neural network: an FFNN with delayed inputs and no internal state. Pro: captures dependencies between features at different timestamps. Cons: • limited history of the input (< 10 timestamps) • delay values must be set explicitly • not general: cannot solve complex tasks (such as the Reber Grammar). (Figure: input layer fed with x(t), x(t−1), x(t−2), x(t−3) through delay taps, hidden layer, output y(t).)
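A minimal sketch of the delayed-input idea, assuming a simple zero-padded sliding window; the helper name delay_window is mine, not from the slides.

```python
import numpy as np

def delay_window(x_seq, n_delays=3):
    """Stack x(t), x(t-1), ..., x(t-n_delays) into one input vector per timestep.

    Illustrative only: earlier timesteps are zero-padded where no history exists.
    """
    T, d = x_seq.shape
    padded = np.vstack([np.zeros((n_delays, d)), x_seq])
    return np.stack([padded[t:t + n_delays + 1][::-1].ravel() for t in range(T)])

x_seq = np.arange(12, dtype=float).reshape(6, 2)   # 6 timesteps, 2 features
X = delay_window(x_seq, n_delays=3)
print(X.shape)   # (6, 8): each row feeds the FFNN as [x(t), x(t-1), x(t-2), x(t-3)]
```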
  22. 22. RECURRENT NEURAL NETWORK ARCHITECTURES
  23. 23. Introducing recurrence: naïve attempts. Simple recurrence: feed the output back to the input (the past output state through a 1-step delay). But… this does not work because it is not stable: there is no control over the feedback, and the obtained output ŷ drifts away from the expected output y. (Figure: input layer x(t), hidden layer, output layer y(t), with the output fed back to the input through a 1-step delay.)
  24. 24. Jordan recurrent network (M.I. Jordan, 1986): output-to-hidden connections through a context layer with a 1-step delay, giving limited short-term memory. Pro: fast to train because it can be parallelized in time. Cons: • the output transforms the hidden state → nonlinear effects, information is distorted • the output dimension may be too small → the information in the hidden states is truncated. (Figure: input layer x(t), hidden layer, output layer y(t), context layer fed from the output.)
  25. 25. Elman recurrent network (J.L. Elman, 1990): hidden-to-hidden connections through a context layer with a 1-step delay. Often referenced as the basic RNN structure and called the “Vanilla” RNN. Hidden-to-hidden connections make the system Turing-complete. • Must see the complete sequence to be trained • Cannot be parallelized across timestamps • Has some important training difficulties… (Figure: input layer x(t), hidden layer, output layer y(t), context layer fed from the hidden layer.)
  26. 26. Vanilla RNN: h(t) = σ(W_ih · x(t) + U · h(t−1) + b), y(t) = σ(W_o · h(t)), where W_ih is the weight matrix from input to hidden, U the weight matrix from hidden to hidden, W_o the weight matrix from hidden to output, b the bias parameter vector, x(t) the input (feature) vector at time t, h(t) the network internal (hidden) state vector at time t, and y(t) the network output vector at time t.
  27. 27. Unfolding the network in time. Vanilla RNN: h(t) = σ(W_ih · x(t) + U · h(t−1) + b), y(t) = σ(W_o · h(t)).
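A NumPy sketch of this unfolded forward pass, assuming tanh hidden units and a sigmoid output; the slides do not fix σ or the sizes, so these choices are illustrative.

```python
import numpy as np

def rnn_forward(x_seq, W_ih, U, W_o, b):
    """Unfold the vanilla RNN in time:
       h_t = sigma(W_ih @ x_t + U @ h_{t-1} + b),  y_t = sigma(W_o @ h_t)."""
    h = np.zeros(U.shape[0])                 # h_{-1}: initial hidden state
    hs, ys = [], []
    for x_t in x_seq:                        # one step of the unfolded graph per timestep
        h = np.tanh(W_ih @ x_t + U @ h + b)
        y = 1.0 / (1.0 + np.exp(-(W_o @ h)))
        hs.append(h); ys.append(y)
    return np.array(hs), np.array(ys)

rng = np.random.default_rng(1)
x_seq = rng.normal(size=(5, 7))              # 5 timesteps, 7 input features
W_ih, U = rng.normal(size=(16, 7)) * 0.1, rng.normal(size=(16, 16)) * 0.1
W_o, b = rng.normal(size=(7, 16)) * 0.1, np.zeros(16)
hs, ys = rnn_forward(x_seq, W_ih, U, W_o, b)
print(hs.shape, ys.shape)                    # (5, 16) (5, 7)
```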
  28. 28. RNN TRAINING
  29. 29. RNN training. Target: obtain the network parameters that optimize the cost function. Cost functions: log loss, mean squared error, etc. Tasks: • for each timestamp of the input sequence x, predict the output y (synchronously) • for the input sequence x, predict a scalar value y (e.g., at the end of the sequence) • for the input sequence x of length Lx, generate an output sequence y of a different length Ly. Methods: backpropagation (reliable and controlled convergence; supported by most ML frameworks); research directions: evolutionary methods, expectation maximization, non-parametric methods, particle swarm optimization.
  30. 30. Back-propagation through time.
      1. Unfold the network.
      2. Repeat for the training data:
         1. Given some input sequence x,
         2. for t in 0, …, N−1:
            1. forward-propagate,
            2. initialize the hidden state to the past value h(t−1),
            3. obtain the output sequence ŷ,
            4. calculate the error E(y, ŷ),
            5. back-propagate the error across the unfolded network,
            6. average the weights,
            7. compute the next hidden state value h(t).
      With h(t) = σ(W_ih · x(t) + U · h(t−1) + b), y(t) = σ(W_o · h(t)) and, e.g., the cross-entropy loss E(y, ŷ) = −Σ_t y(t) · lg ŷ(t).
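A compact NumPy sketch of one such pass over the unfolded network, assuming tanh hidden units, softmax outputs with the cross-entropy loss above, and a plain gradient-descent update; names, sizes and the toy data are illustrative, not the course's implementation.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def bptt_step(x_seq, y_seq, params, lr=0.1):
    """Unfold, forward-propagate, back-propagate through time, update (one pass)."""
    W_ih, U, W_o, b = params["W_ih"], params["U"], params["W_o"], params["b"]
    T, H = len(x_seq), U.shape[0]
    hs, ys = [np.zeros(H)], []
    # Forward pass over the unfolded network, keeping every hidden state.
    for x_t in x_seq:
        hs.append(np.tanh(W_ih @ x_t + U @ hs[-1] + b))
        ys.append(softmax(W_o @ hs[-1]))
    loss = -sum(y_t @ np.log(p_t + 1e-12) for y_t, p_t in zip(y_seq, ys))
    # Backward pass: accumulate gradients from the last timestep to the first.
    grads = {k: np.zeros_like(v) for k, v in params.items()}
    dh_next = np.zeros(H)
    for t in reversed(range(T)):
        dz = ys[t] - y_seq[t]                       # d loss / d output logits
        grads["W_o"] += np.outer(dz, hs[t + 1])
        dh = W_o.T @ dz + dh_next                   # from the output and from h_{t+1}
        da = dh * (1.0 - hs[t + 1] ** 2)            # back through tanh
        grads["W_ih"] += np.outer(da, x_seq[t])
        grads["U"] += np.outer(da, hs[t])
        grads["b"] += da
        dh_next = U.T @ da                          # carried to the previous timestep
    for k in params:                                # plain gradient-descent update
        params[k] -= lr * grads[k]
    return loss

rng = np.random.default_rng(0)
params = {"W_ih": rng.normal(size=(8, 7)) * 0.1, "U": rng.normal(size=(8, 8)) * 0.1,
          "W_o": rng.normal(size=(7, 8)) * 0.1, "b": np.zeros(8)}
x_seq = [np.eye(7)[i % 7] for i in range(10)]       # toy next-symbol prediction task
y_seq = [np.eye(7)[(i + 1) % 7] for i in range(10)]
for _ in range(5):
    print(bptt_step(x_seq, y_seq, params))          # the printed loss typically decreases
```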
  31. 31. Back-propagation through time: apply the chain rule. For time 2: ∂E_2/∂θ = Σ_{k=0}^{2} (∂E_2/∂ŷ_2) · (∂ŷ_2/∂h_2) · (∂h_2/∂h_k) · (∂h_k/∂θ), where θ are the network parameters and, e.g., ∂h_2/∂h_0 = (∂h_2/∂h_1) · (∂h_1/∂h_0).
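The same chain rule written for a general timestep t (only the index is generalized; the per-step Jacobian product is what the next slides, and the vanishing-gradient discussion, expand on):

```latex
\frac{\partial E_t}{\partial \theta}
  = \sum_{k=0}^{t}
    \frac{\partial E_t}{\partial \hat{y}_t}\,
    \frac{\partial \hat{y}_t}{\partial h_t}\,
    \frac{\partial h_t}{\partial h_k}\,
    \frac{\partial h_k}{\partial \theta},
\qquad
\frac{\partial h_t}{\partial h_k}
  = \prod_{j=k+1}^{t} \frac{\partial h_j}{\partial h_{j-1}} .
```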
  32.–37. Back-propagation through time (these slides illustrate the backward pass through the unfolded network step by step; figures only).
  38. 38. Problem: vanishing gradients. Saturated neurons have gradients close to 0, which drives the gradients of previous layers toward 0 (especially for distant timestamps): ∂h(t)/∂h(0) = (∂h(t)/∂h(t−1)) · … · (∂h(3)/∂h(2)) · (∂h(2)/∂h(1)) · (∂h(1)/∂h(0)). • Smaller weight parameters lead to faster gradient vanishing • Very big initial parameters make gradient descent diverge quickly (explode). The product decays exponentially, the network stops learning and cannot update its weights, so it becomes impossible to learn correlations between temporally distant events. This is a known problem for deep feed-forward networks; for recurrent networks (even shallow ones) it makes learning long-term dependencies impossible!
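A small numerical illustration (not from the slides) of why this product behaves so badly: repeatedly multiplying a gradient by per-step Jacobians with small or large singular values shrinks or blows it up exponentially. The matrices and scales below are arbitrary stand-ins for the per-step Jacobians.

```python
import numpy as np

rng = np.random.default_rng(0)
for scale in (0.5, 1.0, 2.0):              # illustrative "weight" scales
    J = rng.normal(size=(16, 16)) * scale / np.sqrt(16)   # stand-in per-step Jacobian
    g = np.ones(16)                        # gradient arriving at the last timestep
    for _ in range(50):                    # push it 50 timesteps back in time
        g = J.T @ g
    print(f"scale {scale}: |grad| after 50 steps = {np.linalg.norm(g):.3e}")
```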
  39. 39. Problem: exploding gradients. The network cannot converge and the weight parameters do not stabilize. Diagnostics: NaNs; large fluctuations of the cost function; a large increase in the norm of the gradient during training. Solutions: • use gradient clipping • try reducing the learning rate • change the loss function by setting constraints on the weights (L1/L2 norms). Pascanu R. et al., On the difficulty of training recurrent neural networks, arXiv (2012).
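A sketch of gradient clipping by global norm, one of the remedies listed above; the threshold value and the helper name are illustrative.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=5.0):
    """Rescale all gradients together if their global L2 norm exceeds max_norm."""
    total = np.sqrt(sum(np.sum(g ** 2) for g in grads.values()))
    if total > max_norm:
        scale = max_norm / (total + 1e-12)
        grads = {k: g * scale for k, g in grads.items()}
    return grads, total

grads = {"W_ih": np.full((8, 7), 3.0), "U": np.full((8, 8), 3.0)}
clipped, norm = clip_by_global_norm(grads, max_norm=5.0)
print(norm, np.sqrt(sum(np.sum(g ** 2) for g in clipped.values())))   # ~32.9 -> 5.0
```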
  40. 40. Fundamental deep learning problem. Difficulties in training deep networks: • vanishing gradients • exploding gradients. Possible solutions: • one of the previously proposed remedies, or • unsupervised pre-training (difficult to implement; the unsupervised solution sometimes differs greatly from the supervised one), or • improve the network architecture!
  41. 41. NEW RNN ARCHITECTURES
  42. 42. Reservoir computing: Echo State Network (Herbert Jaeger, 2001). Only the readout neurons are trained! In practice: • easy to over-fit (the model learns the training data by heart and gives good results on the training data only) • optimizing the reservoir hyper-parameters is not straightforward.
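A minimal ESN sketch along these lines: a fixed random reservoir is driven by the input and only the linear readout is fitted, here with ridge regression. Reservoir size, spectral radius, regularization and the toy signal are illustrative assumptions.

```python
import numpy as np

def train_esn(x_seq, y_seq, n_res=200, rho=0.9, ridge=1e-3, seed=0):
    """Echo State Network sketch: random fixed reservoir, trained linear readout."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-0.5, 0.5, size=(n_res, x_seq.shape[1]))
    W = rng.normal(size=(n_res, n_res))
    W *= rho / max(abs(np.linalg.eigvals(W)))      # scale spectral radius to rho
    h, states = np.zeros(n_res), []
    for x_t in x_seq:                              # drive the reservoir; no training here
        h = np.tanh(W_in @ x_t + W @ h)
        states.append(h)
    H = np.array(states)
    # Readout only: solve (H^T H + ridge*I) W_out^T = H^T Y  (ridge regression).
    W_out = np.linalg.solve(H.T @ H + ridge * np.eye(n_res), H.T @ y_seq).T
    return W_in, W, W_out

x_seq = np.sin(np.linspace(0, 20, 300))[:, None]   # toy 1-D input signal
y_seq = np.roll(x_seq, -1, axis=0)                 # target: predict the next value
train_esn(x_seq, y_seq)
```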
  43. 43. Reservoir computing: Liquid State Machine. Similar to the ESN, but uses more biologically plausible neuron models → spiking (dynamic) neurons. In practice: • still mostly a research area • requires special hardware to be computationally efficient. (Figure credits: Daniel Brunner; Tal Dahan and Astar Sade.)
  44. 44. Long short-term memory (S. Hochreiter & J. Schmidhuber, 1997). Thanks to its gating (routing) mechanism, it can be efficiently trained to learn LONG-TERM dependencies. Studied variants: • no input gate • no forget gate • no output gate • no input activation function • no output activation function • no peepholes • coupled input and forget gate • full gate recurrence.
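A NumPy sketch of one step of the standard LSTM cell (the full variant with all gates and no peepholes). The packed weight layout and the sizes are implementation choices of this sketch, not something specified in the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One step of a standard LSTM cell.

    W has shape (4*H, D+H): its row blocks hold the input, forget, output gates
    and the input activation ('candidate') transformation.
    """
    z = W @ np.concatenate([x_t, h_prev]) + b
    H = h_prev.shape[0]
    i = sigmoid(z[0:H])            # input gate
    f = sigmoid(z[H:2 * H])        # forget gate
    o = sigmoid(z[2 * H:3 * H])    # output gate
    g = np.tanh(z[3 * H:4 * H])    # input activation (candidate cell state)
    c = f * c_prev + i * g         # gated cell-state update: the long-term memory
    h = o * np.tanh(c)             # gated output exposed to the rest of the network
    return h, c

rng = np.random.default_rng(0)
D, H = 7, 16
W, b = rng.normal(size=(4 * H, D + H)) * 0.1, np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(5, D)):
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape, c.shape)
```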
  45. 45. Bidirectional RNN: has context in both directions, at any timestamp.
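A sketch of the idea: run one vanilla RNN forward in time and another backward, then concatenate their states so every timestep sees context from both directions. Layer sizes and parameter values are illustrative.

```python
import numpy as np

def birnn(x_seq, fwd_params, bwd_params):
    """Bidirectional RNN sketch: concatenate forward and backward hidden states."""
    def run(seq, W_ih, U, b):
        h, hs = np.zeros(U.shape[0]), []
        for x_t in seq:
            h = np.tanh(W_ih @ x_t + U @ h + b)
            hs.append(h)
        return np.array(hs)
    h_fwd = run(x_seq, *fwd_params)
    h_bwd = run(x_seq[::-1], *bwd_params)[::-1]     # reverse time, then re-align
    return np.concatenate([h_fwd, h_bwd], axis=1)   # context from both directions

rng = np.random.default_rng(0)
mk = lambda: (rng.normal(size=(8, 7)) * 0.1, rng.normal(size=(8, 8)) * 0.1, np.zeros(8))
print(birnn(rng.normal(size=(5, 7)), mk(), mk()).shape)   # (5, 16)
```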
  46. 46. Embedded Reber Grammar: tests the capacity to maintain long-term dependencies. The (First+1) symbol determines the (Last−1) symbol, e.g. BPXXXXXPE or BTXXXXXXXXTE. Correct cases: BT … TE, BP … PE. Incorrect cases: BT … PE, BP … TE. The system must be able to learn to compare the (First+1) symbol with the (Last−1) symbol.
  47. 47. PRACTICAL CONSIDERATIONS
  48. 48. Masking the input (output): within a data batch, inputs (outputs) have variable length, so shorter sequences are padded and a mask marks the valid timesteps.
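A sketch of the usual padding-plus-mask approach this slide refers to; the helper name and the pad value are illustrative.

```python
import numpy as np

def pad_and_mask(sequences, pad_value=0.0):
    """Build a fixed-size batch from variable-length sequences plus a mask that
    marks the real (non-padded) timesteps; masked positions are ignored in the loss."""
    T = max(len(s) for s in sequences)
    D = sequences[0].shape[1]
    batch = np.full((len(sequences), T, D), pad_value)
    mask = np.zeros((len(sequences), T))
    for i, s in enumerate(sequences):
        batch[i, :len(s)] = s
        mask[i, :len(s)] = 1.0
    return batch, mask

seqs = [np.ones((3, 2)), np.ones((5, 2)), np.ones((2, 2))]
batch, mask = pad_and_mask(seqs)
print(batch.shape, mask.sum(axis=1))   # (3, 5, 2) [3. 5. 2.]
```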
  49. 49. Length of input ≠ length of output. Two options: the CTC loss function, or an encoder-decoder architecture. CTC transforms the network outputs into a conditional probability distribution over label sequences, using a BLANK label ('-'). Result decoding, e.g. for the raw output -----CCCC---AA-TTTT---: 1) remove repeating symbols: -C-A-T-; 2) remove blanks: CAT.
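The decoding rule from the slide, as a tiny function (best-path / greedy CTC decoding; the '-' blank symbol follows the slide's notation):

```python
def ctc_greedy_decode(raw, blank="-"):
    """Decode a best-path CTC output: 1) collapse repeated symbols, 2) remove blanks."""
    collapsed = [c for i, c in enumerate(raw) if i == 0 or c != raw[i - 1]]
    return "".join(c for c in collapsed if c != blank)

print(ctc_greedy_decode("-----CCCC---AA-TTTT---"))   # CAT
```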
  50. 50. NEURAL MODELS FOR LOCOMOTION
  51. 51. Locomotion principles in nature [S. Roland et al., 2004; A. Ijspeert et al., 2007]. Locomotion: movement, or the ability to move from one place to another. Manipulation ≠ locomotion: manipulation is an aperiodic series of motions and is stable, whereas locomotion relies on periodic motion gaits and is only quasi-stable.
  52. 52. Locomotion efficiency. (Figure: wheeled locomotion on soft ground.) [S. Roland et al., 2004]
  53. 53. Locomotion efficiency. Nature has no “pure” wheeled locomotion; the reason is the variety of surfaces and rough terrain, so adaptation is necessary. Biological locomotion exploits patterns. The number of legs influences: • mechanical complexity • control complexity • the number of possible gait patterns (for 6 legs, N = (2k−1)! = 11! = 39 916 800). [S. Roland 2004]
  54. 54. How does nature deal with locomotion? An inconceivable automation: • gait control is on “automatic pilot” • automatic gait is energy efficient • a perturbation introduces a modification. A simple engineered cycle (initiate motion by injecting energy, passive stage, generate, control for stability, repeat) is not fully nature's way: weak adaptation, no decisions. Where does this automation sit: the brain? the nervous system? the spinal cord?
  55. 55. Biological motor control: complexity of the phenomena involved in motor control, from the central nervous system through the motor nervous system and the spinal cord down to the neuromuscular junction.  Models of the musculoskeletal system …  Models of the motor nervous system. (Figure sources: Univ. du Québec – ÉTS course; Collège de France (L. Damn); Univ. Paris 8, Licence L.612 course.) [P. Hénaff 2013]
  56. 56. Motor unit: an MU aggregates the muscular fibers innervated by a common motor neuron, so contraction of these fibers is simultaneous. Reflex pathways (sensory nerve, motor nerve, dorsal root, posterior horn, anterior horn, ventral root, neuromuscular fiber): muscle contraction as a response to its own elongation, and muscle contraction as a response to external stimuli. [P. Hénaff 2013]
  57. 57. Central Pattern Generator. • Automatic activity is controlled by spinal centers • A CPG (Central Pattern Generator) is a group of synaptically connected neurons that generates rhythmic motions • The spinal pattern-generating networks do not require sensory input, but they are nevertheless strongly regulated by input from limb proprioceptors.
  58. 58. Sensory-motor architecture for locomotion [McCrea 2006] Biological sensory-motor architecture models
  59. 59. How learning occurs: muscular contraction is put in place during embryonic life or after birth. • Insects can walk immediately upon birth • Most mammals require several minutes to stand • Humans require more than a year to walk on two legs.
  60. 60. Mathematical modeling of CPG [J. Nassour et al. 2010] [P.F. Rowat, A.I. Selverston 1997]
  61. 61. Mathematical modeling of CPG: the Hopf oscillator as a CPG approximation, exhibiting limit-cycle behavior; a gait matrix for coupling different CPGs; sensory feedback.
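A sketch of the Hopf oscillator mentioned here, integrated with a plain Euler step; the parameter values and step size are illustrative, and the coupling and sensory-feedback terms from the slide are omitted.

```python
import numpy as np

def hopf_step(x, y, dt=0.001, mu=1.0, omega=2 * np.pi):
    """Euler step of a Hopf oscillator: a stable limit cycle of radius sqrt(mu)
    rotating at angular frequency omega, a common CPG approximation."""
    r2 = x * x + y * y
    dx = (mu - r2) * x - omega * y
    dy = (mu - r2) * y + omega * x
    return x + dt * dx, y + dt * dy

x, y = 0.1, 0.0                      # start away from the limit cycle
for _ in range(10000):               # trajectories converge onto the cycle
    x, y = hopf_step(x, y)
print(round(np.hypot(x, y), 1))      # ~1.0: the limit-cycle radius sqrt(mu)
```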
  62. 62. Neural controllers: a neural-network-based CPG controller for biped locomotion [Taga 1995], built on the Matsuoka neuron model (Matsuoka 1985). CPG of the trunk with ipsilateral and contralateral connections. Neural controller: • 1 CPG per joint • 2 coupled neurons per CPG • contralateral and ipsilateral inhibitions • sensorimotor integration. Internal coupling of the network; articular sensory inputs: speeds, forces, ground contact. (Figure adapted from Taga 1995, Biol. Cybern.) [P. Hénaff 2013]
  63. 63. Learning synchronous compensation of articulation defects (ROBIAN biped, LISV, UVSQ) [V. Khomenko, 2013, LISV, UVSQ, France]. Temporal evolution of the frequency components of the sagittal acceleration of the robot's pelvis, with and without coupling; phase portraits of the oscillator. • Automatically determines the robot's natural frequencies • Continuously adapts to the evolution of defects.
  64. 64. APPLICATION OF RECURRENT NEURAL NETWORKS
  65. 65. Application of RNNs
      • Human-computer interaction
        – Speech and handwriting recognition
        – Music composition
        – Activity recognition
      • Identification and control
        – Identification and control of dynamic systems by learning
        – Biologically inspired robotics for adaptive locomotion
        – Study of the forming and evaluation of biological pattern structures
