Jungsik Hwang1,2 and Jun Tani2
1Korea Advanced Institute of Science and Technology, Korea
2Cognitive Neurorobotics Research Unit, Okinawa Institute of Science and Technology, Japan
Presented at the 28th Annual Conference of the Japanese Neural Network Society (JNNS2018)
Oct 26th, 2018
Visuomotor Associative Learning under the Predictive Coding Framework: A Neuro-robotics Experiment
Research Objectives
Build a cognitive agent that can
• Develop cognitive functions autonomously
Based on two principles:
• Embodiment: "learning from sensorimotor experience" acquired through dynamic interaction with the world
• Prediction: "the brain is a prediction machine"
Obtaining Sensorimotor Experience
• Learning from Demonstration: we show the robot how to perform the task, then make the robot learn from this experience.
Experience consists of
• Visual images (vision)
• Joint position values (proprioception)
How can a robot learn from experience?
• Our approach: train a dynamic neural network on the sensorimotor experience
Proposed Model
Predictive Visuo-Motor Dynamic Neural Network (P-VMDNN)
"Learning to predict sensorimotor signals simultaneously in an end-to-end manner"
Proposed Neural Network Model: Visual Pathway
• Predicts pixel-level dynamic visual images
– P-MSTRNN: Predictive Multiple Spatio-Temporal Scales RNN
(Figure: example of visual prediction; input vs. prediction)
Proposed Neural Network Model: Proprioceptive Pathway
• Predicts the robot's action (specified as joint positions)
– MTRNN: Multiple Timescales RNN
(Figure: example of action generation; input vs. prediction)
Proposed Neural Network Model: Lateral Connections between the Two Pathways
• Bi-directional flow of visuomotor information (see the sketch below)
"Sensorimotor integration is a key part of the 'intelligence algorithm' of the neocortex."
- Jeff Hawkins (2017)
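The bi-directional coupling can be pictured as two recurrent pathways that each receive the other's previous activity through lateral weights. Below is a minimal sketch in Python; the sizes, weight names, and plain tanh-RNN cells are illustrative assumptions, not the actual P-VMDNN architecture.

```python
# Two recurrent pathways coupled by lateral connections (illustrative).
import numpy as np

rng = np.random.default_rng(0)
NV, NP = 16, 8  # hidden sizes: visual / proprioceptive pathway

Wv = rng.normal(0, 0.1, (NV, NV))   # visual recurrent weights
Wp = rng.normal(0, 0.1, (NP, NP))   # proprioceptive recurrent weights
Lvp = rng.normal(0, 0.1, (NV, NP))  # lateral: proprio -> visual
Lpv = rng.normal(0, 0.1, (NP, NV))  # lateral: visual -> proprio

def step(hv, hp):
    """One update: each pathway also sees the other's previous state."""
    hv_new = np.tanh(Wv @ hv + Lvp @ hp)  # visual pathway
    hp_new = np.tanh(Wp @ hp + Lpv @ hv)  # proprioceptive pathway
    return hv_new, hp_new

hv, hp = np.zeros(NV), np.zeros(NP)
for t in range(5):
    hv, hp = step(hv, hp)  # visuomotor information flows both ways
```

Because the lateral weights are trained together with everything else, each pathway learns to exploit the other's context, which is what couples vision and proprioception during both generation and recognition.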
Proposed Neural Network Model: Key Characteristics
• Temporal Hierarchy
– Imposing different constraints (time constants) on neural activation; see the update equation below
– Smaller time constants → fast-changing neural activity
– Larger time constants → slowly-changing neural activity
→ "Emergence of Functional Hierarchy"
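A standard way to impose such a temporal hierarchy is the leaky-integrator update from the MTRNN literature (Yamashita & Tani, 2008), with a layer-specific time constant; this generic form is for illustration and is not necessarily the exact update used in the P-VMDNN:

```latex
u_i(t) = \left(1 - \frac{1}{\tau_i}\right) u_i(t-1)
       + \frac{1}{\tau_i}\left(\sum_j w_{ij}\, y_j(t-1) + b_i\right),
\qquad
y_i(t) = \tanh\left(u_i(t)\right)
```

With a large \tau_i the internal state u_i(t) changes slowly between steps (higher levels), while a small \tau_i lets the unit track fast input dynamics (lower levels).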
Experiment Setting
• Task: imitating human gestures
– 9 gestures × 3 human subjects
• Robot platform
– iCub simulator
– Vision: 64 × 48 grayscale images
– Action: 10 DoF (joint positions)
Predictive Visuo-Motor Dynamic Neural Network: Key Features
• Processing of spatio-temporal patterns
• Coupling of vision & proprioception
• Mental simulation
• Prediction error minimization
MENTAL SIMULATION
Mental Simulation of Action
• Mental Simulation
– The ability to imagine the probable results of our actions
– Important in social interaction
– Needs "what to simulate"
• In Our Experiment
– The ability to generate visuo-proprioceptive predictions from a given intention
• *Intention: specified as the initial states, which are learnable parameters
Mental Simulation of Action
• Mental Simulation in the Proposed Model
1. Set the "intention"
• Specified as the initial states
2. Generate output
• Visual & proprioceptive predictions
3. Feed the prediction output back into the input
• "Closed-loop generation" (see the sketch after this list)
4. Iterate (2)–(3)
• Without external input from environment
• Only with given intention
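In code, closed-loop generation is just rolling the network forward on its own predictions. A minimal sketch in Python, assuming a hypothetical one-step predictor `model.step`; the names and signature are illustrative, not the authors' implementation:

```python
# Mental simulation: closed-loop rollout driven only by the intention.
import numpy as np

def mental_simulation(model, intention, v0, p0, steps):
    """Roll the network forward, feeding predictions back as input."""
    state = intention        # the initial states encode the "intention"
    vision, prop = v0, p0    # seed inputs for the very first step
    v_seq, p_seq = [], []
    for _ in range(steps):
        (v_pred, p_pred), state = model.step(vision, prop, state)
        v_seq.append(v_pred)
        p_seq.append(p_pred)
        # Closed loop: the predictions replace real sensory input
        vision, prop = v_pred, p_pred
    return np.stack(v_seq), np.stack(p_seq)
```

Since nothing from the environment enters the loop after the first step, the generated visuo-proprioceptive sequence is determined entirely by the intention state.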
Result: Mental Simulation of Action
• Setting the intention states
– At the onset of mental simulation
– Obtained from training
Result: Mental Simulation of Action
• With a given "intention", the model generated coherent visuo-proprioceptive patterns
– Imagination without any input from the external world
Self-organized Functional Hierarchy
• Hierarchical representation of visuo-proprioceptive patterns
– Abstract information at the higher level: type of gesture
– Specific information at the lower level: shape of a specific human subject
(Figure: initial states obtained from training; the high-level representation encodes the type of gesture, the low-level representation the shape of a specific subject)
PREDICTION ERROR MINIMIZATION
Prediction Error Minimization: Predictive Coding Framework
• Core of "predictive coding"
– Recognizing intention from observation by minimizing prediction error
– Accounts for the MNS (Mirror Neuron System)
• *Mirror neurons: activated both while executing and while observing an action
(Stefanics et al., 2014)
Prediction Error Minimization
At each time step:
1. Generate visuo-proprioceptive predictions
• From the intention state (top-down process)
2. Compute the prediction error
• Difference between the predicted & observed patterns
3. Backpropagate the prediction error & update the intention state
• Bottom-up process
4. Iterate a few times (see the sketch below)
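These four steps amount to gradient descent on the prediction error with respect to the intention state alone. A minimal sketch in Python (PyTorch), assuming a differentiable predictor `model` that maps an intention state to a predicted pattern; the names are illustrative, not the authors' implementation:

```python
# Error regression: infer the intention behind an observation by
# minimizing prediction error; the trained network weights stay fixed.
import torch

def minimize_prediction_error(model, observation, intention,
                              n_iters=10, lr=0.1):
    intention = intention.clone().requires_grad_(True)
    opt = torch.optim.SGD([intention], lr=lr)  # update the intention only
    for _ in range(n_iters):              # "iterate a few times"
        opt.zero_grad()
        prediction = model(intention)     # top-down: intention -> prediction
        error = ((prediction - observation) ** 2).mean()
        error.backward()                  # bottom-up: backpropagate the error
        opt.step()
    return intention.detach(), error.item()
```

Because recognition reuses the same generative network, inferring the intention and generating the matching predictions are two sides of the same process.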
→ Minimized prediction error
"Perception as an Active Process"
• Not solely determined by the input
Prediction Error Minimization: Two Conditions
• Minimizing the visual PE
• Minimizing the proprioceptive PE
Prediction Error Minimization: Minimizing the Visual PE
• Minimizing the difference between
– The visual prediction (predicted gesture)
– The observation (observed gesture)
• Visual prediction error = Difference(observed gesture, predicted gesture)
• No external proprioceptive signal
– The robot's action was generated simultaneously while minimizing the visual PE
Prediction Error Minimization: Minimizing the Visual PE (Result)
• With visual PE minimization
– Predicted coherent visual & proprioceptive patterns → successful imitation
• Without visual PE minimization
– Did NOT predict the visual & proprioceptive patterns → unsuccessful imitation
Prediction Error Minimization: Minimizing the Proprioceptive PE
• Minimizing the difference between
– The proprioceptive prediction (predicted joint positions)
– The observation (perceived joint positions)
• Proprioceptive prediction error = Difference(observed joint positions, predicted joint positions)
• No external visual target signal
– The visual prediction was generated simultaneously while minimizing the proprioceptive PE
Prediction Error Minimization: Minimizing the Proprioceptive PE (Result)
• With proprioceptive PE minimization
– Successfully minimized the proprioceptive PE
– Generated the corresponding (imaginary) visual prediction
• Without proprioceptive PE minimization
– Not able to adapt to the incoming proprioceptive signal
Neural Activation while Minimizing Visual Prediction Error
(Figure: overlapping neural trajectories between training and testing)
– Inferring the intention latent in the observed patterns at the higher level
– Recalling the corresponding representations at the lower level
→ Retrieval of the missing sensorimotor signals
Neural Activation while Minimizing Visual Prediction Error
• MNS-like behavior emerged from
1. Neural connectivity (between the two pathways)
2. Learning from sensorimotor experience
3. Prediction error minimization
• "Predictive Coding Account of the MNS" (Kilner, Friston and Frith, 2007)
– "Within predictive coding, recognition of causes is simply the process of jointly minimizing prediction error at all levels of a cortical hierarchy."
Conclusion
Build a cognitive agent based on
• Embodiment: "learning from sensorimotor experience" acquired through dynamic interaction with the world
• Prediction: "the brain is a prediction machine"
Complex cognitive behaviors emerged
• Mental simulation, intention recognition, MNS-like behavior, etc.
• From "visuomotor associative learning under the predictive coding framework"
Thank you
Photo by Takazumi
CNRU @ OIST
Please see the following paper for more information:
J. Hwang, J. Kim, A. Ahmadi, M. Choi and J. Tani, "Dealing With Large-Scale Spatio-Temporal Patterns in Imitative Interaction Between a Robot and a Human by Using the Predictive Coding Framework," IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2018. doi: 10.1109/TSMC.2018.2791984