Hierarchical Reinforcement Learning for Interactive Systems and Robots
Heriberto Cuayáhuitl
Interaction Lab
Heriot-Watt University, Edinburgh, UK
School of Mathematical & Computer Sciences
hc213@hw.ac.uk
AINL, Moscow, 12-13 September 2014
Mary Ellen Foster 
Simon Keizer 
Zhuoran Wang 
Oliver Lemon Helen Hastie 
Srini Janarthanam 
Xingkun Liu 
Verena Rieser 
Dimitra Gkatzia 
Nina Dethlefs Arash Eshghi 
Heriberto Cuayáhuitl
Ioannis Efstathiou
Wenshuo Tang 
Kathin Lohan
Reinforcement Learning Projects
Interactive Learning System/Robot 
• An interactive learning machine is an entity that improves its performance by interacting with other machines, its physical world and/or humans.
(Cuayáhuitl, H., et al., 2013, IJCAI-MLIS)
A Motivating Scenario 
A robot learning to play multiple games from interaction
Outline
Interactive Learning Systems
1. Reinforcement Learning (RL)
2. Hierarchical RL
3. Applications
4. Related Work
5. Future Directions
6. Summary
Outline: Where are we? Part 1, Reinforcement Learning (RL)
Interaction as a Markov Decision Process (MDP)
● The environment is described as an MDP:
  ● A set of states S;
  ● A set of actions A;
  ● A state transition function T;
  ● A reward function R.
● The MDP solution (a policy, or interaction manager) decides what to do and is learnt using reinforcement learning
[Figure: interaction as a sequence of choice points, with transitions Pr(s2|s1,a1)]
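The four MDP components listed above can be sketched as a small data structure. This is a minimal illustration only: the class name `MDP`, the two-state "slot filling" dialogue and its probabilities are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = str
Action = str

@dataclass
class MDP:
    states: List[State]                      # S
    actions: List[Action]                    # A
    # T: maps (s, a) to a distribution over next states, Pr(s' | s, a)
    transitions: Dict[Tuple[State, Action], Dict[State, float]]
    # R: immediate reward for the transition (s, a, s')
    reward: Callable[[State, Action, State], float]

    def is_valid(self) -> bool:
        """Check that every transition distribution sums to 1."""
        return all(abs(sum(dist.values()) - 1.0) < 1e-9
                   for dist in self.transitions.values())

# Hypothetical toy dialogue: ask for a slot until it is filled.
toy = MDP(
    states=["unfilled", "filled"],
    actions=["ask", "confirm"],
    transitions={
        ("unfilled", "ask"): {"filled": 0.8, "unfilled": 0.2},
        ("unfilled", "confirm"): {"unfilled": 1.0},
        ("filled", "ask"): {"filled": 1.0},
        ("filled", "confirm"): {"filled": 1.0},
    },
    reward=lambda s, a, s2: 100.0 if (s == "unfilled" and s2 == "filled") else 0.0,
)
```

A policy for this toy MDP would simply map each state to an action, for example `{"unfilled": "ask", "filled": "confirm"}`.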
Reinforcement Learning is not Trivial
Known issues: scalability and partial observability
[Figure: State Space Growth. x-axis: Number of Binary Variables (10^0 to 10^2); y-axis ticks from 10^0 to 10^30]
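The growth shown in the figure follows from simple counting: a state described by n binary variables can take 2^n distinct values. A minimal sketch (the function name `num_states` is an illustrative choice):

```python
def num_states(n_binary_variables: int) -> int:
    """Number of distinct states for a state made of n binary variables."""
    return 2 ** n_binary_variables

for n in (1, 10, 100):
    print(f"{n:>3} binary variables -> {num_states(n):.3e} states")
```

With 100 binary variables the count already exceeds 10^30, which is why tabular methods stop being practical for rich state descriptions.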
The Goal of Reinforcement Learners
● The goal is to find an optimal policy:
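In the standard notation of Sutton & Barto, which is assumed here (the symbols $Q^*$, $\pi^*$ and the discount factor $\gamma$ are not defined on the slides), an optimal policy can be written as the greedy policy with respect to the optimal action-value function, using the S, A, T and R of the MDP:

```latex
\begin{align}
Q^*(s,a) &= \sum_{s' \in S} T(s,a,s') \left[ R(s,a,s') + \gamma \max_{a' \in A} Q^*(s',a') \right] \\
\pi^*(s) &= \arg\max_{a \in A} Q^*(s,a)
\end{align}
```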
How to Represent the Agent's Policy? 
● Tabular representations 
● Tree-based representations 
● Function approximation 
● Linear 
● Non-linear 
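To make the difference between the first and third options concrete, the sketch below contrasts a tabular representation with a linear one. All names, states, features and weights are hypothetical values chosen for illustration.

```python
from typing import Dict, List, Tuple

# Tabular: one stored value per (state, action) pair.
q_table: Dict[Tuple[str, str], float] = {
    ("s0", "ask"): 1.5,
    ("s0", "confirm"): 0.2,
}

def q_tabular(state: str, action: str) -> float:
    return q_table.get((state, action), 0.0)

# Linear function approximation: Q(s, a) is a dot product between a
# per-action weight vector and a feature vector describing the state.
weights: Dict[str, List[float]] = {
    "ask": [0.5, 1.0, -0.5],
    "confirm": [0.1, 0.1, 0.1],
}

def features(state: Tuple[int, ...]) -> List[float]:
    """Bias term followed by the state's binary variables."""
    return [1.0] + [float(v) for v in state]

def q_linear(state: Tuple[int, ...], action: str) -> float:
    return sum(w * f for w, f in zip(weights[action], features(state)))
```

The table needs one entry per state-action pair, whereas the linear form needs only one weight per feature and action, at the cost of representing the value function approximately.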
Reinforcement Learning Algorithms 
● Q-Learning 
● Q-Learning with Linear Function Approximation 
(Sutton & Barto, MIT Press, 1998; Szepesvári, Morgan & Claypool, 2010)
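As a minimal sketch of the first algorithm, the code below runs tabular Q-learning on a toy five-state chain. The environment, the hyperparameters and all function names are assumptions made for illustration; only the +100 goal reward mirrors the reward scheme used in the slides' example.

```python
import random
from collections import defaultdict

N_STATES, GOAL = 5, 4             # toy chain 0..4 with the goal at the right end
ACTIONS = ("left", "right")
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2

def step(state, action):
    """Deterministic move; +100 for reaching the goal, 0 otherwise."""
    nxt = max(0, state - 1) if action == "left" else min(N_STATES - 1, state + 1)
    return nxt, (100.0 if nxt == GOAL else 0.0)

def greedy(q, state, rng):
    """Best-valued action, breaking ties at random."""
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])

def q_learning(episodes=500, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)        # Q[(state, action)], initialised to 0
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            if rng.random() < EPSILON:
                action = rng.choice(ACTIONS)          # explore
            else:
                action = greedy(q, state, rng)        # exploit
            nxt, reward = step(state, action)
            # No bootstrapping from the terminal (goal) state.
            future = 0.0 if nxt == GOAL else max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += ALPHA * (reward + GAMMA * future - q[(state, action)])
            state = nxt
            if state == GOAL:
                break
    return q

q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]
print(policy)
```

The linear variant replaces the table lookup with a weighted sum of state features and updates the weights instead of individual table entries.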
Illustrative Example: The Interactive Taxi 
• State transitions: probability 0.8 of correct navigation/recognition
• Reward: +100 for reaching the goal, 0 otherwise
• Size of state-action space:
|S*A| = 50*5^4*3*4*16 = 6M state-actions
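The product above can be checked directly. The slide does not say what each factor stands for, so they are left unnamed in this sketch.

```python
from math import prod

# Factors exactly as written on the slide: 50 * 5^4 * 3 * 4 * 16
factors = [50, 5 ** 4, 3, 4, 16]
state_action_pairs = prod(factors)
print(f"{state_action_pairs:,} state-actions")
```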
Outline: Where are we? Part 2, Hierarchical RL
Hierarchical Reinforcement Learning 
• Why? To learn system behaviours to carry out multiple tasks jointly (not separately)
"I know how to do that, from playing the other game"
Interaction as a Semi-Markov Decision Process (SMDP)
● Environment as an SMDP:
  ● S: set of states
  ● A: set of (complex) actions
  ● T: state transition function
  ● R: reward function
● One SMDP for each task or subtask
● Hierarchical reinforcement learning algorithms to solve SMDPs (e.g. HSMQ, MAXQ)
[Figure: task hierarchy, Task 1 to Task N, each decomposed into sub-tasks]
The goal is to find:
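A common way to write the SMDP objective, assumed here rather than taken from the slides, is one optimal policy per SMDP in the hierarchy. The subscript $i$ indexing the (sub)task, the duration $\tau$ of a complex action, the joint distribution $P_i$ and the discount factor $\gamma$ are all assumed notation:

```latex
\begin{align}
Q_i^*(s,a) &= \sum_{s',\tau} P_i(s',\tau \mid s,a) \left[ R_i(s,a) + \gamma^{\tau} \max_{a'} Q_i^*(s',a') \right] \\
\pi_i^*(s) &= \arg\max_{a \in A_i} Q_i^*(s,a)
\end{align}
```

The only structural difference from the flat MDP case is the exponent $\tau$: a complex action may last several time steps, so the value of the next state is discounted by the action's duration.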
Conceptual SMDP for Interactive Systems 
Benefits: quicker learning, more scalability, behaviour reuse
Hierarchical Reinforcement Learning Algorithms
● HSMQ-Learning
● HSMQ-Learning with Linear Function Approximation
● Other HRL algorithms: MAXQ, HAMQ
● Algorithms for structure learning: HEXQ, VISA, HI-MAT
(Barto & Mahadevan, 2003; Hengst, 2010)
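The sketch below illustrates the general idea behind HSMQ-style learning: each subtask keeps its own Q-function, and a parent treats a child subtask as a single temporally extended action. It is a simplified illustration under stated assumptions, not the authors' implementation: the corridor environment, the two-level hierarchy, the hyperparameters and all names are hypothetical.

```python
import random
from collections import defaultdict

N_CELLS = 6                       # toy corridor with cells 0..5 (assumption)
PRIMITIVES = {"left": -1, "right": +1}
# Hypothetical hierarchy: the root picks navigation subtasks, which pick primitives.
HIERARCHY = {
    "root":    {"actions": ["go_to_3", "go_to_5"], "goal": 5},
    "go_to_3": {"actions": ["left", "right"], "goal": 3},
    "go_to_5": {"actions": ["left", "right"], "goal": 5},
}
ALPHA, GAMMA, EPSILON = 0.2, 0.95, 0.1
Q = defaultdict(float)            # Q[(subtask, state, action)], initialised to 0

def available(task, pos):
    """Primitive actions, plus child subtasks that are not already at their goal."""
    return [a for a in HIERARCHY[task]["actions"]
            if a in PRIMITIVES or HIERARCHY[a]["goal"] != pos]

def choose(task, pos, rng):
    """Epsilon-greedy action selection within one subtask."""
    actions = available(task, pos)
    if rng.random() < EPSILON:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(task, pos, a)])

def hsmq(task, pos, rng):
    """Run `task` to its goal; return (discounted reward, duration, final position)."""
    total, duration = 0.0, 0
    goal = HIERARCHY[task]["goal"]
    while pos != goal:
        action = choose(task, pos, rng)
        if action in PRIMITIVES:   # one-step action with a cost of -1 per move
            nxt = min(N_CELLS - 1, max(0, pos + PRIMITIVES[action]))
            reward, tau = -1.0, 1
        else:                      # composite action: recurse into the child subtask
            reward, tau, nxt = hsmq(action, pos, rng)
        best_next = 0.0 if nxt == goal else max(
            Q[(task, nxt, a)] for a in available(task, nxt))
        key = (task, pos, action)
        # SMDP update: the discount is raised to the action's duration tau.
        Q[key] += ALPHA * (reward + GAMMA ** tau * best_next - Q[key])
        total += GAMMA ** duration * reward
        duration += tau
        pos = nxt
    return total, duration, pos

rng = random.Random(0)
for _ in range(2000):
    _, _, final_pos = hsmq("root", 0, rng)
```

Because `go_to_5` is learnt once and can be invoked from any root state, its Q-values are shared across contexts, which is one way to read the "behaviour reuse" benefit mentioned earlier.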
Illustrative Example: The Interactive Taxi 
• State transitions: probability 0.8 of correct navigation/recognition
• Reward: +100 for reaching the goal, 0 otherwise
• State-action space: |S*A| = 10.7K state-actions
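Setting this figure against the 6M state-actions of the flat representation of the same example gives a rough sense of the reduction. Both numbers are taken from the slides; the subscripts "flat" and "hier" are labels introduced here, and the ratio is rounded to the nearest integer:

```latex
\[
\frac{|S \times A|_{\text{flat}}}{|S \times A|_{\text{hier}}}
\approx \frac{6\,000\,000}{10\,700} \approx 561
\]
```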
Outline: Where are we? Part 3, Applications
Speech-Based Human-Machine Communication
[Figure: HRL agents]
Application 1: Travel Planning 
● HRL without prior knowledge (HSMQ-Learning) 
● HRL with prior knowledge (HAM+HSMQ-Learning) 
W = joint state (SMDP+HAM)
● Training with simulated interactions
● Testing with real users
(Cuayáhuitl et al., Computer Speech & Language, 2010)
Travel Planning Spoken Dialogue System 
(Cuayáhuitl et al., Computer Speech & Language, 2010)
Results in the Travel Planning Domain 
• HRL finds solutions faster than flat learning
• HRL is more scalable than flat learning
• Learnt policies outperform hand-coded ones
(Cuayáhuitl et al., Computer Speech & Language, 2010)
Application 2: Indoor Wayfinding 
● HRL without policy reuse (HSMQ-Learning) 
● HRL with policy reuse (HSMQ_PR-Learning) 
● Detect situations where the system knows how to act 
● Action selection using an optimal policy (if reuse=true) or an exploratory policy (if reuse=false)
● Training with simulated interactions
● Testing with real users
(Cuayáhuitl et al., Computer Speech & Language, 2010)
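The action-selection rule with policy reuse can be sketched as follows. The slides do not specify how the reuse flag is computed or which exploratory policy is used, so this sketch takes `reuse` as a given boolean and assumes epsilon-greedy exploration; the function name and parameters are hypothetical.

```python
import random
from typing import Dict

def select_action(q_values: Dict[str, float], reuse: bool,
                  rng: random.Random, epsilon: float = 0.2) -> str:
    """Pick an action for one state given its Q-values.

    reuse=True  -> act greedily with the already-learnt policy
    reuse=False -> explore (epsilon-greedy is an assumption of this sketch)
    """
    greedy = max(q_values, key=q_values.get)
    if reuse:
        return greedy
    if rng.random() < epsilon:
        return rng.choice(list(q_values))
    return greedy

rng = random.Random(0)
q_for_state = {"turn_left": 0.1, "go_straight": 0.9}   # hypothetical values
print(select_action(q_for_state, reuse=True, rng=rng))
```

Skipping exploration in situations the system already knows how to handle is consistent with the reported result that policy reuse finds solutions faster.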
Indoor Wayfinding Dialogue System 
Infokiosk & mobile phone interfaces
(Cuayáhuitl & Dethlefs, ACM Trans. Speech & Lang. Proc., 2011)
Results in the Indoor Wayfinding Domain 
• Policy reuse finds solutions faster than without it 
• Adaptive route instructions are more efficient 
(Cuayáhuitl & Dethlefs, ACM Trans. Speech & Lang. Proc., 2011)
Application 3: Human-Robot Interaction 
● HSMQ vs. FlexHSMQ Learning with linear function approximation
● Training with simulated interactions
● Testing with real users
(Cuayáhuitl et al., Computer Speech & Language, 2010)
Robot Dialogue System (Quiz Game) 
[Figure: Interaction Manager]
(Cuayáhuitl et al., ACM Trans. Interactive Intelligent Sys., 2014)
Results in the Quiz Domain 
• Non-strict HRL leads to more natural interactions 
• Non-strict HRL is preferred by human users 
(Cuayáhuitl et al., ACM Trans. Interactive Intelligent Sys., 2014) 
Robot Asking and Answering Questions 
(Belpaeme et al., 2012, Intl. Journal of HRI)
Outline: Where are we? Part 4, Related Work
Learning with Large State Spaces 
Learning under Uncertainty 
Spectrum of Markov Process Models 
Promising for multi-task learning systems
(Mahadevan, S. et al., 2004, Handbook of Learning and Approx. Dyn. Prog.)
Outline: Where are we? Part 5, Future Directions
Issues that Might Lead to Future Interactive Learning Systems
1. Big effort to make the system perform similar tasks
2. Simulations may not represent the real world
3. It is often hard to specify the reward function
4. The real world is partially known and dynamic
5. Poor spatial cognition will affect real world impact
6. Small vocabularies discourage talking to machines
7. Lack of interactive learning systems in the real world
Towards Autonomous Interactive Systems and Robots
[Figure: degree of autonomy plotted against number of tasks]
• Current interactive systems require a lot of human intervention
• Future interactive systems should be more autonomous
• How do we get here? Holistic perspective for language, vision and robotics
Outline: Where are we? Part 6, Summary
Summary 
• Machines can be programmed to behave just as expected, but the physical world and humans demand systems that can learn
• Hierarchical learning plays an important role for multi-tasked interactive systems and robots
• More autonomy is needed if systems are to learn new skills with little human intervention
• A holistic interdisciplinary perspective is needed for intelligent interactive robots
References 
• Cuayáhuitl, H., Dethlefs, N., Kruijff-Korbayová, I., (2014), Non-Strict Hierarchical Reinforcement Learning for Interactive Systems and Robots. To appear in ACM Transactions on Interactive Intelligent Systems, vol. 4, no. 3.
• Cuayáhuitl, H. and Dethlefs, N., (2011), Spatially-Aware Dialogue Control Using Hierarchical Reinforcement Learning. In ACM Transactions on Speech and Language Processing, vol. 7, no. 3, pp. 5:1-5:26.
• Cuayáhuitl, H., Renals, S., Lemon, O., Shimodaira, H., (2010), Evaluation of a Hierarchical Reinforcement Learning Spoken Dialogue System. In Computer Speech and Language, vol. 24, no. 2, pp. 395-429.
E-Mail: hc213@hw.ac.uk

 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 

Heriberto Cuayáhuitl, "Adaptive Reinforcement Learning for Interactive Systems and Robots"

  • 1. Hierarchical Reinforcement Learning for Interactive Systems and Robots. Heriberto Cuayáhuitl, Interaction Lab, School of Mathematical & Computer Sciences, Heriot-Watt University, Edinburgh, UK. hc213@hw.ac.uk. AINL, Moscow, 12-13 September 2014
  • 2. Mary Ellen Foster, Simon Keizer, Zhuoran Wang, Oliver Lemon, Helen Hastie, Srini Janarthanam, Xingkun Liu, Verena Rieser, Dimitra Gkatzia, Nina Dethlefs, Arash Eshghi, Heriberto Cuayáhuitl, Ioannis Efstathiou, Wenshuo Tang, Katrin Lohan
  • 4. Interactive Learning System/Robot • An interactive learning machine is an entity that improves its performance by interacting with other machines, its physical world and/or humans. (Cuayáhuitl, H., et al., 2013, IJCAI-MLIS)
  • 5. A Motivating Scenario: a robot learning to play multiple games from interaction
  • 6. Outline (Interactive Learning Systems): 1. Reinforcement Learning (RL) 2. Hierarchical RL 3. Applications 4. Related Work 5. Future Directions 6. Summary
  • 8. Interaction as a Markov Decision Process (MDP) ● The environment is described as an MDP: a set of states S; a set of actions A; a state transition function T; a reward function R. ● The MDP solution (policy or interaction manager) decides what to do using reinforcement learning. [Figure: choice points linked by transitions such as Pr(s2|s1,a1)]
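As a minimal sketch of the four MDP components listed on this slide, the container below stores S, A, T and R and checks that every transition distribution sums to one. The class name `MDP`, the method `is_well_formed`, and the toy states, actions, probabilities and rewards are all hypothetical and only illustrate the shape of the data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MDP:
    states: tuple        # S
    actions: tuple       # A
    transitions: dict    # T: (state, action) -> {next_state: probability}
    rewards: dict        # R: (state, action, next_state) -> reward

    def is_well_formed(self, tol=1e-9):
        """Check that T defines a probability distribution for every (s, a)."""
        for s in self.states:
            for a in self.actions:
                dist = self.transitions.get((s, a))
                if dist is None:
                    return False
                if any(nxt not in self.states for nxt in dist):
                    return False
                if abs(sum(dist.values()) - 1.0) > tol:
                    return False
        return True


# Toy two-state example; the numbers are made up for illustration.
toy = MDP(
    states=("s1", "s2"),
    actions=("a1",),
    transitions={
        ("s1", "a1"): {"s2": 0.8, "s1": 0.2},   # Pr(s2 | s1, a1) = 0.8
        ("s2", "a1"): {"s2": 1.0},
    },
    rewards={("s1", "a1", "s2"): 100.0},
)
print(toy.is_well_formed())
```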
  • 9. Reinforcement Learning is not Trivial ● Known issues: scalability and partial observability. [Plot: state space growth, from 10^0 to 10^30 states, against the number of binary variables, from 10^0 to 10^2]
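The growth shown on this slide follows from counting: n binary variables can describe 2^n distinct states. The short sketch below (the function name `state_space_size` is an illustrative choice) prints the count for a few values of n, and about 100 binary variables already gives more than 10^30 states.

```python
def state_space_size(num_binary_variables: int) -> int:
    """Number of distinct states describable by n binary variables (2**n)."""
    return 2 ** num_binary_variables


for n in (1, 10, 100):
    # Scientific notation keeps the very large counts readable.
    print(n, f"{state_space_size(n):.2e}")
```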
  • 10. The Goal of Reinforcement Learners ● The goal is to find an optimal policy: [equation not captured in the transcript]
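For reference, a common textbook way to state this goal, using the S, A, T and R of the MDP slide, is given below. The discount factor \gamma and the exact form are assumptions here, not a reconstruction of the slide.

```latex
Q^*(s,a) = \sum_{s' \in S} T(s,a,s') \left[ R(s,a,s') + \gamma \max_{a' \in A} Q^*(s',a') \right],
\qquad
\pi^*(s) = \arg\max_{a \in A} Q^*(s,a)
```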
  • 11. How to Represent the Agent's Policy? ● Tabular representations ● Tree-based representations ● Function approximation (linear or non-linear)
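To make the contrast between a tabular and a linear representation concrete, the sketch below stores one weight vector per action and computes Q(s, a) as a dot product with a feature vector, then applies a gradient-style Q-learning update. The function names, the feature vectors and the step sizes are hypothetical; this is one common formulation rather than the one used in the talk.

```python
def linear_q(weights, features, action):
    """Q(s, a) as a dot product between the action's weights and the state features."""
    return sum(w * f for w, f in zip(weights[action], features))


def linear_q_update(weights, features, action, reward, next_features,
                    actions, alpha=0.1, gamma=0.9, done=False):
    """One Q-learning step for a linear approximator; returns the TD error."""
    best_next = 0.0 if done else max(linear_q(weights, next_features, a) for a in actions)
    td_error = reward + gamma * best_next - linear_q(weights, features, action)
    # Move each weight along its feature, scaled by the TD error.
    weights[action] = [w + alpha * td_error * f
                       for w, f in zip(weights[action], features)]
    return td_error


# A tabular policy needs one entry per (state, action) pair ...
tabular_q = {("s1", "a"): 0.0, ("s1", "b"): 0.0}
# ... whereas the linear one needs one weight per (feature, action) pair.
weights = {"a": [0.0, 0.0], "b": [0.0, 0.0]}
td = linear_q_update(weights, [1.0, 0.0], "a", 100.0, [0.0, 1.0], ["a", "b"],
                     alpha=0.5, gamma=0.5)
print(td, weights["a"])
```

The linear form trades exactness for compactness: states that share features also share what is learnt about them.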
  • 12. Reinforcement Learning Algorithms ● Q-Learning ● Q-Learning with Linear Function Approximation (Sutton & Barto, MIT Press, 1998; Szepesvári, Morgan & Claypool, 2010)
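A minimal sketch of tabular Q-learning may help here. The slide's algorithm listing is not part of the transcript, so the code below is the standard textbook update run on a hypothetical five-cell corridor. The environment, the names `step` and `q_learning`, and all hyperparameters are assumptions; only the reward scheme (+100 at the goal, 0 otherwise) mirrors the taxi example.

```python
import random
from collections import defaultdict

GOAL = 4                       # hypothetical corridor with cells 0..4
ACTIONS = ("left", "right")


def step(state, action):
    """Deterministic toy dynamics: +100 on reaching GOAL, 0 otherwise."""
    next_state = max(0, state - 1) if action == "left" else min(GOAL, state + 1)
    done = next_state == GOAL
    return next_state, (100.0 if done else 0.0), done


def q_learning(episodes=200, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)     # tabular Q(s, a), initialised to 0
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)          # explore
            else:
                # Greedy choice, with ties broken at random.
                action = max(ACTIONS, key=lambda a: (q[(state, a)], rng.random()))
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Q-learning update towards the bootstrapped target.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q


q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
print(policy)
```

In this toy problem the learnt greedy policy should move right in every cell, with values that shrink as the goal gets further away.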
  • 13. Illustrative Example: The Interactive Taxi • State transitions: probability 0.8 of correct navigation/recognition • Reward: +100 for reaching the goal, 0 otherwise • Size of state-action space: |S×A| = 50 × 5^4 × 3 × 4 × 16 = 6M state-actions
  • 15. Hierarchical Reinforcement Learning • Why? To learn system behaviours that carry out multiple tasks jointly (not separately). [Speech bubble: "I know how to do that, from playing the other game"]
  • 16. Interaction as a Semi-Markov Decision Process (SMDP) ● Environment as an SMDP: S: set of states; A: set of (complex) actions; T: state transition function; R: reward function ● One SMDP for each task or subtask ● Hierarchical reinforcement learning algorithms to solve SMDPs (e.g. HSMQ, MAXQ) ● The goal is to find: [equation not captured in the transcript] [Figure: tasks (Task 1 to Task N), each decomposed into sub-tasks]
  • 17. Conceptual SMDP for Interactive Systems ● Benefits: quicker learning, more scalability, behaviour reuse
  • 18. Hierarchical Reinforcement Learning Algorithms ● HSMQ-Learning ● HSMQ-Learning with Linear Function Approximation ● Other HRL algorithms: MAXQ, HAMQ ● Algorithms for structure learning: HEXQ, VISA, HI-MAT (Barto & Mahadevan, 2003; Hengst, 2010)
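The slide's algorithm listing is not preserved, so the following is only a sketch of the building block that hierarchical Q-learning methods of this family rely on: an SMDP Q-learning update in which a composite action that ran for τ primitive steps is treated as a single decision. The function name `smdp_q_update`, the state and action labels, and the parameter values are hypothetical.

```python
from collections import defaultdict


def smdp_q_update(q, state, action, rewards, next_state, next_actions,
                  alpha=0.1, gamma=0.9):
    """One SMDP Q-learning update for an action spanning len(rewards) primitive steps."""
    tau = len(rewards)
    # Discounted reward accumulated while the (composite) action was running.
    cumulative = sum((gamma ** k) * r for k, r in enumerate(rewards))
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    # The bootstrap term is discounted by the duration of the action.
    target = cumulative + (gamma ** tau) * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


q = defaultdict(float)
q[("s1", "b")] = 10.0
# A subtask that took three primitive steps and earned its reward on the last one.
updated = smdp_q_update(q, "s0", "subtask", [0.0, 0.0, 100.0], "s1", ["b"],
                        alpha=0.5, gamma=0.5)
print(updated)
```

In a hierarchical learner, each task or subtask would keep its own table of this kind, and invoking a child subtask would supply the `rewards` list for the parent's update.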
  • 19. Illustrative Example: The Interactive Taxi • State transitions: probability 0.8 of correct navigation/recognition • Reward: +100 for reaching the goal, 0 otherwise • State-action space: |S×A| = 10.7K state-actions
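As a quick arithmetic check, the snippet below multiplies the factors quoted for the flat version of the Interactive Taxi (slide 13) and compares the result with the 10.7K figure given here. The transcript does not say what each factor stands for, so they are treated as plain numbers, and 10.7K is taken as a rounded value.

```python
from math import prod

flat_factors = [50, 5 ** 4, 3, 4, 16]    # factors quoted on the flat taxi slide
flat_size = prod(flat_factors)            # flat |S x A|
hierarchical_size = 10_700                # "10.7K" from this slide (rounded)

ratio = flat_size / hierarchical_size
print(flat_size, round(ratio))            # prints: 6000000 561
```

On these figures, the hierarchical decomposition searches a state-action space roughly 560 times smaller than the flat one.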
  • 22. Application 1: Travel Planning ● HRL without prior knowledge (HSMQ-Learning) ● HRL with prior knowledge (HAM+HSMQ-Learning), where W = joint state (SMDP+HAM) ● Training with simulated interactions ● Testing with real users (Cuayáhuitl et al., Computer Speech & Language, 2010)
  • 23. Travel Planning Spoken Dialogue System (Cuayáhuitl et al., Computer Speech & Language, 2010)
  • 24. Results in the Travel Planning Domain • HRL finds solutions faster than flat learning • HRL is more scalable than flat learning • Learnt policies outperform hand-coded ones (Cuayáhuitl et al., Computer Speech & Language, 2010)
  • 25. Application 2: Indoor Wayfinding ● HRL without policy reuse (HSMQ-Learning) ● HRL with policy reuse (HSMQ_PR-Learning) ● Detect situations where the system knows how to act ● Action selection using an optimal policy (if reuse=true) or an exploratory policy (if reuse=false) ● Training with simulated interactions ● Testing with real users (Cuayáhuitl et al., Computer Speech & Language, 2010)
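The action-selection rule on this slide can be sketched as a switch between exploiting and exploring. The transcript does not say how the system detects that it already knows how to act, so `reuse` is simply passed in as a flag, and epsilon-greedy stands in for the unspecified exploratory policy. The function name, states, actions and values are hypothetical.

```python
import random


def select_action(q, state, actions, reuse, rng, epsilon=0.2):
    """Pick an action for `state` from tabular values q[(state, action)]."""
    greedy = max(actions, key=lambda a: q.get((state, a), 0.0))
    if reuse:
        return greedy                      # exploit the reused policy
    if rng.random() < epsilon:             # otherwise explore (epsilon-greedy here)
        return rng.choice(actions)
    return greedy


rng = random.Random(0)
actions = ["turn_left", "go_straight"]
q = {("corridor", "turn_left"): 1.0, ("corridor", "go_straight"): 5.0}
print(select_action(q, "corridor", actions, reuse=True, rng=rng))   # go_straight
```

Skipping exploration where a learnt policy already applies is one plausible reason why reuse would speed up learning.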
  • 26. Indoor Wayfinding Dialogue System ● Infokiosk & mobile phone interfaces (Cuayáhuitl & Dethlefs, ACM Trans. Speech & Lang. Proc., 2011)
  • 27. Results in the Indoor Wayfinding Domain • Policy reuse finds solutions faster than learning without it • Adaptive route instructions are more efficient (Cuayáhuitl & Dethlefs, ACM Trans. Speech & Lang. Proc., 2011)
  • 28. Application 3: Human-Robot Interaction ● HSMQ vs. FlexHSMQ learning with linear function approximation ● Training with simulated interactions ● Testing with real users (Cuayáhuitl et al., Computer Speech & Language, 2010)
  • 29. Robot Dialogue System (Quiz Game) [Figure label: Interaction Manager] (Cuayáhuitl et al., ACM Trans. Interactive Intelligent Sys., 2014)
  • 30. Results in the Quiz Domain • Non-strict HRL leads to more natural interactions • Non-strict HRL is preferred by human users (Cuayáhuitl et al., ACM Trans. Interactive Intelligent Sys., 2014)
  • 31. Robot Asking and Answering Questions (Belpaeme et al., 2012, Intl. Journal of HRI)
  • 33. Learning with Large State Spaces
  • 35. Spectrum of Markov Process Models ● Promising for multi-task learning systems (Mahadevan, S. et al., 2004, Handbook of Learning and Approx. Dyn. Prog.)
  • 37. Issues that Might Lead to Future Interactive Learning Systems 1. Big effort to make the system perform similar tasks 2. Simulations may not represent the real world 3. It is often hard to specify the reward function 4. The real world is partially known and dynamic 5. Poor spatial cognition will affect real-world impact 6. Small vocabularies discourage talking to machines 7. Lack of interactive learning systems in the real world
  • 38. Towards Autonomous Interactive Systems and Robots ● Current interactive systems require a lot of human intervention ● Future interactive systems should be more autonomous ● How do we get here? A holistic perspective for language, vision and robotics [Figure axes: degree of autonomy against number of tasks]
  • 40. Summary • Machines can be programmed to behave just as expected, but the physical world and humans demand systems that can learn • Hierarchical learning plays an important role in multi-task interactive systems and robots • More autonomy is needed if systems are to learn new skills with little human intervention • A holistic, interdisciplinary perspective is needed for intelligent interactive robots
  • 41. References • Cuayáhuitl, H., Dethlefs, N., Kruijff-Korbayová, I. (2014). Non-Strict Hierarchical Reinforcement Learning for Interactive Systems and Robots. To appear in ACM Transactions on Interactive Intelligent Systems, vol. 4, no. 3. • Cuayáhuitl, H., Dethlefs, N. (2011). Spatially-Aware Dialogue Control Using Hierarchical Reinforcement Learning. ACM Transactions on Speech and Language Processing, vol. 7, no. 3, pp. 5:1-5:26. • Cuayáhuitl, H., Renals, S., Lemon, O., Shimodaira, H. (2010). Evaluation of a Hierarchical Reinforcement Learning Spoken Dialogue System. Computer Speech and Language, vol. 24, no. 2, pp. 395-429. E-mail: hc213@hw.ac.uk