Teacher-Aware Active Robot Learning
Mattia Racca, Antti Oulasvirta and Ville Kyrki
ACM/IEEE International Conference on Human-Robot Interaction (HRI), 2019
mattia.racca@aalto.fi
Why (active) learning robots?
Programming robots is hard; pre-programming them
for every task is impossible.
Robots should learn by interacting with humans!
M. Racca and V. Kyrki, Active Robot Learning for Temporal Task Models, HRI '18
The idea behind Active Learning
The agent can efficiently choose what to learn next.
… and improve its model faster!
Important aspects of Active Learning for HRI
1. Interactive Nature
Transparency
Design of questions
Control over interaction
Timing of questions
2. Query Efficiency
Learning faster (with less data)
But what about REAL users?
What if efficient query selection is not best for the interaction?
Can efficiency indirectly counter its own benefits?
Query efficiency → complex questions, questions out of context → harder for the teacher:
● slower interaction
● more effort
● more errors!
Different types of Active Learning
1. CLASSIC AL STRATEGY (LEARNER C)
2. TEACHER-AWARE AL STRATEGY (LEARNER M)
3. HYBRID AL STRATEGY (LEARNER H)
Problem statement & Evaluation scenario
An agent has to learn the value of a certain attribute a for a
set E of entities by making queries. We used the Animals with
Attributes 2* dataset, with 50 animals (entities) and 85
semantic attributes.
Example query: "Do giraffes have patches?" Teacher's answer: "YES".
● Categories C over the entities are built using WordNet.
● Learner assumption: entities in the same category are
more likely to share the same attribute value.
* Y. Xian et al., Zero-Shot Learning - A Comprehensive Evaluation of the Good, the Bad and the Ugly, T-PAMI
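The setup above can be sketched as a small data structure. This is a minimal illustration and not the authors' code: the `Query` class, the `CATEGORIES` mapping and the helpers `same_category` and `guess_from_category` are hypothetical names, and the three animals with their categories are a hand-picked toy stand-in, not data taken from Animals with Attributes 2 or WordNet.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Query:
    """A yes/no question: does `attribute` apply to `entity`?"""
    entity: str
    attribute: str

    def text(self) -> str:
        return f"Do {self.entity}s have {self.attribute}?"


# Hypothetical toy categories (illustrative only, not from the dataset).
CATEGORIES = {
    "ungulate": {"giraffe", "zebra"},
    "aquatic mammal": {"dolphin"},
}


def same_category(e1: str, e2: str) -> bool:
    """True if both entities share at least one category."""
    return any(e1 in m and e2 in m for m in CATEGORIES.values())


def guess_from_category(query: Query, answers: Dict[Query, bool]) -> Optional[bool]:
    """Learner assumption: reuse an answer already given for an entity of the
    same category; return None when no such answer is known yet."""
    for known, value in answers.items():
        if known.attribute == query.attribute and same_category(known.entity, query.entity):
            return value
    return None


# The teacher answered YES to "Do giraffes have patches?"
answers = {Query("giraffe", "patches"): True}
```

With this toy table, `guess_from_category(Query("zebra", "patches"), answers)` reuses the giraffe answer, while the same question about dolphins returns `None` because no answer in that category is known. The guess is only as good as the category assumption itself.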
Classic AL: Uncertainty Sampling
● Learner C:
○ uses Uncertainty Sampling
○ selects the most uncertain query, given the current model
○ is efficient, as expected!
● Learner C drawbacks:
○ Some questions are difficult!
○ Topic or context switches!
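Uncertainty Sampling can be sketched in a few lines. The version below assumes the learner keeps, for each candidate query, a predicted probability that the answer is YES, and it measures uncertainty with binary entropy; both the `predictions` table and the choice of entropy as the uncertainty measure are assumptions made for illustration, not details stated in the slides.

```python
import math
from typing import Dict


def entropy(p: float) -> float:
    """Binary entropy (in bits) of a yes/no answer with P(yes) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def most_uncertain(predictions: Dict[str, float]) -> str:
    """Return the candidate query whose predicted answer is most uncertain."""
    return max(predictions, key=lambda q: entropy(predictions[q]))


# Hypothetical beliefs P(answer is YES) for three candidate entities.
predictions = {"giraffe": 0.95, "zebra": 0.55, "dolphin": 0.10}
next_query = most_uncertain(predictions)  # the entity closest to p = 0.5
```

Here the learner would ask about the zebra next, since 0.55 is the belief closest to a coin flip. Note that nothing in this rule looks at what was asked before, which is where the topic switches come from.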
In response to the drawbacks
● Teacher-Aware strategy (Learner M)
○ inspired by the ACT-R declarative memory model: "Information associated with recently retrieved information is easier to retrieve";
○ minimizes the distance between consecutive queries.
● Hybrid strategy (Learner H)
○ a tradeoff between Learner C and Learner M
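A rough sketch of the two alternative strategies follows. The slides do not define the distance between queries or the exact form of the tradeoff, so everything here is an assumption: the `CATEGORY` table, the three-level toy `distance`, the linear `weight` that mixes uncertainty with closeness, and the function names `learner_m` and `learner_h` are all hypothetical.

```python
import math
from typing import Dict, Iterable

# Hypothetical category of each entity (illustrative only).
CATEGORY = {"giraffe": "ungulate", "zebra": "ungulate", "dolphin": "aquatic mammal"}


def distance(e1: str, e2: str) -> int:
    """Toy distance: 0 for the same entity, 1 for the same category, 2 otherwise."""
    if e1 == e2:
        return 0
    return 1 if CATEGORY[e1] == CATEGORY[e2] else 2


def entropy(p: float) -> float:
    """Binary entropy (in bits) of a yes/no answer with P(yes) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def learner_m(candidates: Iterable[str], previous: str) -> str:
    """Teacher-aware choice: stay as close as possible to the previous query."""
    return min(candidates, key=lambda e: distance(e, previous))


def learner_h(predictions: Dict[str, float], previous: str, weight: float = 0.5) -> str:
    """Hybrid choice: weighted mix of uncertainty and closeness (assumed form)."""
    def score(e: str) -> float:
        closeness = 1 - distance(e, previous) / 2  # 1 = close, 0 = far
        return weight * entropy(predictions[e]) + (1 - weight) * closeness
    return max(predictions, key=score)


predictions = {"zebra": 0.9, "dolphin": 0.5}  # hypothetical P(YES) per candidate
```

After a question about giraffes, `learner_m` picks the zebra because it shares a category with the previous query. With `weight=1.0` the hybrid reduces to pure uncertainty and picks the dolphin, and with `weight=0.0` it reduces to the teacher-aware choice.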
Teacher-Aware AL: Memory Effort strategy
Performance in Simulation
Simulation on the entire dataset:
● Perfect users (no errors, no distraction)
● Baseline: asks random questions and cannot leverage our
model to make predictions
What about real users?
User study: 26 participants, the 3 strategies as conditions (within-subject).
Data logged:
● NASA TLX
● Q&A, response times, prediction power
● Overall preferences
Our hypotheses:
Learner M makes the participants reply (a) faster and (b) with fewer errors compared to Learner C, with Learner H achieving intermediate results.
(Unexpected) Results
[Result plots not recoverable from the source.]
Discussion
● Higher response time and more errors for Learner C.
○ stressful, unpredictable and requiring more thinking
● Higher response time and more errors for Learner M.
○ easy, natural and predictable
○ too easy? lowering attention or causing boredom
○ too predictable? using the same (maybe wrong) answer
● Overall preferences:
○ Learner C seen as efficient → mitigating difficulty!
○ Learner M seen as useless → frustration and boredom!
● AVOID USELESS QUESTIONS!
Conclusions
Can efficiency-driven Active Learning counter its own benefits?
If we bring non-oracle users into the equation, yes!
But we just scratched the surface...
● We need a better understanding of the interaction aspects that can affect learning
● We need strategies that can adapt to the specific user
Thank you for the attention!
Code available at github.com/MattiaRacca
Can efficiency-driven Active Learning counter its own benefits?
If we bring non-oracle users and the interaction into the equation, yes!
Tree building algorithm
Attribute-Category Model
We model the probability of attribute a applying to category c as [equation not recoverable from the source], and we maintain a prior over these distributions. We can then compute the probability of a applying to entity e as [equation not recoverable from the source], and therefore predict attribute-entity pairs, given our current model.
The update step of the model is the computation of the posterior distributions, given the user answer r as an observation.
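The equations for this model are not available in the text, so the sketch below is one plausible instantiation and should not be read as the authors' formulation. It assumes a Bernoulli probability per attribute-category pair with a Beta prior (updated by counting YES and NO answers), and it assumes the entity-level probability is the plain average over the categories containing the entity. `BetaBelief`, `observe`, `entity_probability` and the toy category lists are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BetaBelief:
    """Beta(alpha, beta) belief over P(attribute applies | category)."""
    alpha: float = 1.0  # assumed uniform prior: one pseudo-count for YES
    beta: float = 1.0   # and one pseudo-count for NO

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    def update(self, answer: bool) -> None:
        """Posterior after observing one teacher answer r."""
        if answer:
            self.alpha += 1.0
        else:
            self.beta += 1.0


def observe(categories: List[str], beliefs: Dict[str, BetaBelief], answer: bool) -> None:
    """Propagate an answer about one entity to every category containing it."""
    for c in categories:
        beliefs[c].update(answer)


def entity_probability(categories: List[str], beliefs: Dict[str, BetaBelief]) -> float:
    """Assumed combination rule: average the posterior means of the categories."""
    means = [beliefs[c].mean() for c in categories]
    return sum(means) / len(means)


beliefs = {c: BetaBelief() for c in ("mammal", "ungulate", "aquatic mammal")}
observe(["mammal", "ungulate"], beliefs, True)  # YES for a giraffe-like entity
p_zebra = entity_probability(["mammal", "ungulate"], beliefs)
p_dolphin = entity_probability(["mammal", "aquatic mammal"], beliefs)
```

After a single YES, the belief for an entity sharing both categories rises to 2/3, while an entity sharing only the broader category moves less (7/12). This is the sense in which answers about one entity inform predictions about its category neighbours.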
Scores for each active learner
● Learner C
● Learner M
● Learner H
Assumption choice
