In my talk I will describe our current project at the Interaction Lab, in the School of Mathematical and Computer Sciences at Heriot-Watt University, Scotland. Our research focuses on developing voice-based interactive systems that can interact with people effectively and adaptively. Such systems often use Reinforcement Learning, a computational model that learns complex behaviours by trial and error. A drawback of such systems is limited scalability, i.e. difficulty in coping with large spaces of possibilities and with parallel tasks. I will describe three possible solutions to this problem: the use of prior knowledge, the reuse of learned policies, and flexible interaction. All three approaches will be illustrated with working systems that have been tested with real users. I will conclude by discussing possible directions for future work aimed at deploying Reinforcement Learning systems in real-world (non-experimental) settings.
2. Mary Ellen Foster, Simon Keizer, Zhuoran Wang, Oliver Lemon, Helen Hastie, Srini Janarthanam, Xingkun Liu, Verena Rieser, Dimitra Gkatzia, Nina Dethlefs, Arash Eshghi, Heriberto Cuayáhuitl, Ioannis Efstathiou, Wenshuo Tang, Katrin Lohan
4. Interactive Learning System/Robot
• Interactive learning machine: an entity which improves its performance through interacting with other machines, its physical world and/or humans.
(Cuayáhuitl, H., et al., 2013, IJCAI-MLIS)
6. Outline
1. Reinforcement Learning (RL)
2. Hierarchical RL
3. Applications
4. Related Work
5. Future Directions
6. Summary
[Figure: outline diagram centred on "Interactive Learning Systems"]
7. Outline: Where are we?
8. Interaction as a Markov Decision Process (MDP)
● The environment is described as an MDP:
  ● A set of states S;
  ● A set of actions A;
  ● A state transition function T;
  ● A reward function R.
● The MDP solution (policy or interaction manager) decides what to do using reinforcement learning
[Figure: state transition diagram, Pr(s2|s1,a1), with choice points]
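The slide defines the MDP tuple but leaves the solution criterion implicit. In the standard formulation (not shown on the slide, but consistent with the S, A, T, R and Pr(s2|s1,a1) notation above), the optimal action-value function satisfies the Bellman optimality equation:

```latex
Q^{*}(s,a) \;=\; R(s,a) \;+\; \gamma \sum_{s' \in S} \Pr(s' \mid s, a)\, \max_{a' \in A} Q^{*}(s', a'),
\qquad \gamma \in [0, 1)
```

where γ is the discount factor and T is represented by the transition probabilities Pr(s'|s,a).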
9. Reinforcement Learning is not Trivial
Known issues: scalability and partial observability
[Figure: state-space growth (10^0 up to 10^30 states, log scale) as a function of the number of binary variables — the state space grows exponentially]
10. The Goal of Reinforcement Learners
The goal is to find an optimal policy:
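The equation itself appeared as an image in the original deck; the standard statement of this objective, consistent with the MDP formulation on the previous slide, is:

```latex
\pi^{*} = \arg\max_{\pi}\; \mathbb{E}\!\left[\, \sum_{t=0}^{\infty} \gamma^{t} r_{t} \;\middle|\; \pi \right],
\qquad
\pi^{*}(s) = \arg\max_{a \in A} Q^{*}(s, a)
```

i.e. a policy maximizing the expected discounted cumulative reward, obtained greedily from the optimal action-value function.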
11. How to Represent the Agent's Policy?
● Tabular representations
● Tree-based representations
● Function approximation
  ● Linear
  ● Non-linear
12. Reinforcement Learning Algorithms
● Q-Learning
● Q-Learning with Linear Function Approximation
(Sutton & Barto, MIT Press, 1998; Szepesvári, Morgan & Claypool, 2010)
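As a concrete illustration of the first algorithm on this slide, here is a minimal tabular Q-learning sketch on a toy five-state corridor. The environment, hyperparameters, and all names are illustrative assumptions, not the systems described in this talk; the reward scheme (+100 at the goal, 0 otherwise) mirrors the interactive-taxi example that follows.

```python
import random

random.seed(0)

# Toy corridor: states 0..4, start at 0, goal at 4.
# Primitive actions: +1 (right), -1 (left); moves are clamped to the corridor.
N_STATES, GOAL = 5, 4
ACTIONS = [1, -1]
GAMMA, ALPHA, EPSILON = 0.9, 0.5, 0.1

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def choose(s):
    """Epsilon-greedy action selection over the tabular Q-values."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(s, a)])

for _ in range(200):                       # training episodes
    s = 0
    while s != GOAL:
        a = choose(s)
        s2 = min(N_STATES - 1, max(0, s + a))
        r = 100.0 if s2 == GOAL else 0.0
        target = r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])   # Q-learning update
        s = s2

greedy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)]
print(greedy)   # → [1, 1, 1, 1]: the learned policy steps right everywhere
```

Replacing the dictionary `Q` with a weight vector and features φ(s, a) (e.g. one-hot features, which reproduce the tabular case exactly) turns the same update into the linear-function-approximation variant named on the slide.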
13. Illustrative Example: The Interactive Taxi
• State transitions: 0.8 probability of correct navigation/recognition
• Reward: +100 for reaching the goal, 0 otherwise
• Size of state-action space: |S×A| = 50 × 5^4 × 3 × 4 × 16 = 6M state-actions
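The slide's count can be checked directly. The factors below are copied from the slide; their individual meanings (e.g. how many are grid cells vs. passenger locations) are not given there:

```python
# Product of the factors shown on the slide: 50 * 5^4 * 3 * 4 * 16
factors = [50, 5 ** 4, 3, 4, 16]
size = 1
for f in factors:
    size *= f
print(size)  # → 6000000, i.e. the slide's "6M" state-actions
```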
14. Outline: Where are we?
15. Hierarchical Reinforcement Learning
• Why? To learn system behaviours to carry out multiple tasks jointly (not separately)
[Figure: speech bubble — "I know how to do that, from playing the other game"]
16. Interaction as a Semi-Markov Decision Process (SMDP)
● The environment is described as an SMDP:
  ● S: set of states
  ● A: set of (complex) actions
  ● T: state transition function
  ● R: reward function
● One SMDP for each task or subtask
● Hierarchical reinforcement learning algorithms to solve SMDPs (e.g. HSMQ, MAXQ)
[Figure: task hierarchy — Task 1 … Task N, each decomposed into sub-tasks]
The goal is to find:
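The equation on this slide was an image in the original deck; consistent with the one-SMDP-per-subtask decomposition above, the standard hierarchical objective is a set of policies, one maximizing the expected discounted return of each subtask:

```latex
\pi^{*} = \{\pi_{0}^{*}, \pi_{1}^{*}, \ldots, \pi_{N}^{*}\},
\qquad
\pi_{i}^{*}(s) = \arg\max_{a \in A_{i}} Q_{i}^{*}(s, a)
```

where Q_i* is the optimal action-value function of subtask i, and A_i may contain both primitive and complex (child-subtask) actions.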
17. Conceptual SMDP for Interactive Systems
[Figure: conceptual SMDP hierarchy, annotated with its benefits — quicker learning, more scalability, behaviour reuse]
18. Hierarchical Reinforcement Learning Algorithms
● HSMQ-Learning
● HSMQ-Learning with Linear Function Approximation
● Other HRL algorithms: MAXQ, HAMQ
● Algorithms for structure learning: HEXQ, VISA, HI-MAT
(Barto & Mahadevan, 2003; Hengst, 2010)
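To make the HSMQ idea concrete, here is a minimal two-level sketch: a root task chooses between two subtasks on a toy line world, each subtask learns with ordinary Q-learning over primitive actions, and the root's SMDP update discounts the future by gamma^tau, where tau is how long the chosen subtask ran. Everything here (the line-world environment, the subtask names `GoMid`/`GoEnd`, the hyperparameters) is an illustrative assumption, not the HSMQ-Learning implementation from the cited papers.

```python
import random

random.seed(1)

GAMMA, ALPHA, EPS = 0.95, 0.3, 0.2
ACTIONS = [1, -1]                        # primitive actions: step right/left
SUBGOALS = {"GoMid": 5, "GoEnd": 9}      # each subtask terminates at its own goal

q_sub = {name: {} for name in SUBGOALS}  # one Q table per subtask
q_root = {}                              # root Q over (state, subtask) pairs

def q(table, s, a):
    return table.get((s, a), 0.0)

def greedy(table, s, options):
    return max(options, key=lambda a: q(table, s, a))

def run_subtask(name, s):
    """Run one subtask with epsilon-greedy Q-learning over primitive actions;
    return the final state, the cumulative discounted reward, and the duration."""
    goal, table = SUBGOALS[name], q_sub[name]
    total, discount, steps = 0.0, 1.0, 0
    while s != goal and steps < 50:
        a = random.choice(ACTIONS) if random.random() < EPS \
            else greedy(table, s, ACTIONS)
        s2 = min(9, max(0, s + a))
        r = 1.0 if s2 == goal else -0.01            # small per-step cost
        best = max(q(table, s2, b) for b in ACTIONS)
        table[(s, a)] = q(table, s, a) + ALPHA * (r + GAMMA * best - q(table, s, a))
        total += discount * r
        discount *= GAMMA
        s, steps = s2, steps + 1
    return s, total, steps

for _ in range(300):    # episodes on a line world 0..9, start at 0, goal at 9
    s = 0
    while s != 9:
        options = [t for t in SUBGOALS if SUBGOALS[t] != s]
        t = random.choice(options) if random.random() < EPS \
            else greedy(q_root, s, options)
        s2, r, tau = run_subtask(t, s)
        best = 0.0 if s2 == 9 else \
            max(q(q_root, s2, x) for x in SUBGOALS if SUBGOALS[x] != s2)
        # SMDP update: the continuation value is discounted by GAMMA**tau,
        # the number of primitive steps the subtask took
        q_root[(s, t)] = q(q_root, s, t) + \
            ALPHA * (r + GAMMA ** tau * best - q(q_root, s, t))
        s = s2

print(round(q(q_root, 0, "GoMid"), 2), round(q(q_root, 0, "GoEnd"), 2))
```

After training, the root's value for `GoMid` at state 0 should come out higher than for `GoEnd`: the mid-goal reward arrives after fewer discounted steps, and the episode then continues with `GoEnd` from state 5.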
19. Illustrative Example: The Interactive Taxi
• State transitions: 0.8 probability of correct navigation/recognition
• Reward: +100 for reaching the goal, 0 otherwise
• State-action space: |S×A| = 10.7K state-actions
20. Outline: Where are we?
22. Application 1: Travel Planning
● HRL without prior knowledge (HSMQ-Learning)
● HRL with prior knowledge (HAM+HSMQ-Learning), where W = joint state (SMDP+HAM)
● Training with simulated interactions
● Testing with real users
(Cuayáhuitl et al., Computer Speech & Language, 2010)
23. Travel Planning Spoken Dialogue System
(Cuayáhuitl et al., Computer Speech & Language, 2010)
24. Results in the Travel Planning Domain
• HRL finds solutions faster than flat learning
• HRL is more scalable than flat learning
• Learnt policies outperform hand-coded ones
(Cuayáhuitl et al., Computer Speech & Language, 2010)
25. Application 2: Indoor Wayfinding
● HRL without policy reuse (HSMQ-Learning)
● HRL with policy reuse (HSMQ_PR-Learning)
  ● Detect situations where the system knows how to act
  ● Action selection using an optimal policy (if reuse=true) or an exploratory policy (if reuse=false)
● Training with simulated interactions
● Testing with real users
(Cuayáhuitl & Dethlefs, ACM Trans. Speech & Lang. Proc., 2011)
27. Results in the Indoor Wayfinding Domain
• Policy reuse finds solutions faster than learning without it
• Adaptive route instructions are more efficient
(Cuayáhuitl & Dethlefs, ACM Trans. Speech & Lang. Proc., 2011)
28. Application 3: Human-Robot Interaction
● HSMQ vs. FlexHSMQ Learning with linear function approximation
● Training with simulated interactions
● Testing with real users
(Cuayáhuitl et al., ACM Trans. Interactive Intelligent Sys., 2014)
29. Robot Dialogue System (Quiz Game)
[Figure: system architecture, including the Interaction Manager]
(Cuayáhuitl et al., ACM Trans. Interactive Intelligent Sys., 2014)
30. Results in the Quiz Domain
• Non-strict HRL leads to more natural interactions
• Non-strict HRL is preferred by human users
(Cuayáhuitl et al., ACM Trans. Interactive Intelligent Sys., 2014)
31. Robot Asking and Answering Questions
(Belpaeme et al., 2012, Intl. Journal of HRI)
32. Outline: Where are we?
35. Spectrum of Markov Process Models
[Figure: spectrum of Markov process models, annotated "Promising for multi-task learning systems"]
(Mahadevan, S. et al., 2004, Handbook of Learning and Approximate Dynamic Programming)
36. Outline: Where are we?
37. Issues that Might Lead to Future Interactive Learning Systems
1. Big effort to make the system perform similar tasks
2. Simulations may not represent the real world
3. It is often hard to specify the reward function
4. The real world is partially known and dynamic
5. Poor spatial cognition will affect real-world impact
6. Small vocabularies discourage talking to machines
7. Lack of interactive learning systems in the real world
38. Towards Autonomous Interactive Systems and Robots
[Figure: degree of autonomy vs. number of tasks. Current interactive systems require a lot of human intervention; future interactive systems should be more autonomous. How do we get there? A holistic perspective for language, vision and robotics]
39. Outline: Where are we?
40. Summary
• Machines can be programmed to behave just as expected, but the physical world and humans demand systems that can learn
• Hierarchical learning plays an important role for multi-task interactive systems and robots
• More autonomy is needed if systems are to learn new skills with little human intervention
• A holistic, interdisciplinary perspective is needed for intelligent interactive robots
41. References
• Cuayáhuitl, H., Dethlefs, N., Kruijff-Korbayová, I. (2014). Non-Strict Hierarchical Reinforcement Learning for Interactive Systems and Robots. To appear in ACM Transactions on Interactive Intelligent Systems, vol. 4, no. 3.
• Cuayáhuitl, H., Dethlefs, N. (2011). Spatially-Aware Dialogue Control Using Hierarchical Reinforcement Learning. ACM Transactions on Speech and Language Processing, vol. 7, no. 3, pp. 5:1-5:26.
• Cuayáhuitl, H., Renals, S., Lemon, O., Shimodaira, H. (2010). Evaluation of a Hierarchical Reinforcement Learning Spoken Dialogue System. Computer Speech and Language, vol. 24, no. 2, pp. 395-429.
E-Mail: hc213@hw.ac.uk