Julia Kiseleva
@julia_kiseleva
UserSat.com, University of Amsterdam
Evaluating Personal Assistants
Google at SIGIR 2016
It brings us new challenges
From Queries to Dialogues
User's dialogue with Cortana. Task: "Finding a hotel in Chicago"
Q1: how is the weather in Chicago
Q2: how is it this weekend
Q3: find me hotels
Q4: which one of these is the cheapest
Q5: which one of these has at least 4 stars
Q6: find me directions from the Chicago airport to number one
From Queries to Dialogues
User's dialogue with Cortana. Task: "Finding a pharmacy"
Q1: find me a pharmacy nearby
Q2: which of these is highly rated
Q3: show more information about number 2
Q4: how long will it take me to get there
Q5: Thanks
Main Research Question
How can we automatically predict user satisfaction with search dialogues on intelligent assistants using click, touch, and voice interactions?
What is user satisfaction?
How to define user satisfaction
with search dialogues?
User: "show restaurants near me"
Cortana: "Here are ten restaurants near you"
User: "show the best ones"
Cortana: "Here are ten restaurants near you that have good reviews"
User: "show directions to the second one"
Cortana: "Getting you directions to the Mayuri Indian Cuisine"
No Clicks ???
User: "show restaurants near me"
Cortana: "Here are ten restaurants near you" (SAT?)
User: "show the best ones"
Cortana: "Here are ten restaurants near you that have good reviews" (SAT?)
User: "show directions to the second one"
Cortana: "Getting you directions to the Mayuri Indian Cuisine" (SAT?)
Overall SAT?
User Frustration
Dialog with Intelligent Assistant. Task: "Planning a weekend"
Q1: what's the weather like in San Francisco
Q2: what's the weather like in Mountain View
Q3: can you find me a hotel close to Mountain View
Q4: can you show me the cheapest ones
Q5: show me the third one
Q6: show me the directions from SFO to this hotel
Q7: go back to first hotel (misrecognition)
Q8: show me hotels in Mountain View (restart search)
Q9: show me cheap hotels in Mountain View
Q10: show me more about the third one (a user is satisfied)

What interaction signals can we track during search dialogues?
Tracking User Interaction: Phonetic Similarity
Phonetic similarity between consecutive requests
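The talk's notes mention Metaphone as the phonetic representation used to detect repeated (e.g. misrecognized) requests. As a minimal sketch of the idea, here is a toy phonetic key standing in for Metaphone, with a Jaccard similarity over the keys of consecutive requests; the key function and similarity choice are illustrative assumptions, not the talk's implementation:

```python
def phonetic_key(word):
    """Toy phonetic key: uppercase, drop vowels after the first letter,
    collapse repeated letters. A simplified stand-in for Metaphone."""
    word = word.upper()
    key = word[0] if word else ""
    for ch in word[1:]:
        if ch in "AEIOU":
            continue
        if key and ch == key[-1]:
            continue
        key += ch
    return key

def phonetic_similarity(req_a, req_b):
    """Jaccard overlap of the phonetic keys of the two requests' words.
    A high value between consecutive requests suggests a repetition."""
    keys_a = {phonetic_key(w) for w in req_a.split()}
    keys_b = {phonetic_key(w) for w in req_b.split()}
    if not keys_a or not keys_b:
        return 0.0
    return len(keys_a & keys_b) / len(keys_a | keys_b)
```

A repeated request scores 1.0, while unrelated consecutive requests score near 0.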
Tracking User Interaction: Viewport Signals
Viewing time is attributed to an answer in proportion to the fraction of the viewport it occupies: 33% of the viewport for 3 seconds, 66% for 6 seconds, and 20% for 2 seconds gives
1s + 4s + 0.4s = 5.4s of attributed time.
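The attribution above is a weighted sum of interval durations; a minimal sketch (the function name and interval representation are illustrative):

```python
def attributed_time(intervals):
    """Attribute viewing time to an answer: each interval's duration
    (seconds) is weighted by the fraction of the viewport the answer
    occupied during that interval."""
    return sum(fraction * duration for fraction, duration in intervals)

# The slide's example: 33% of the viewport for 3s, 66% for 6s, 20% for 2s
total = attributed_time([(1 / 3, 3), (2 / 3, 6), (0.2, 2)])  # 1 + 4 + 0.4 = 5.4
```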
Tracking User Interaction: Touch Signals
• Number of swipes
• Number of up-swipes
• Number of down-swipes
• Total distance swiped (pixels)
• Number of swipes normalized by time
• Total distance divided by number of swipes
• Total swiped distance divided by time
• Number of swipe direction changes
• SERP answer duration (seconds) shown on screen (even partially)
• Fraction of visible pixels belonging to the SERP answer
• Attributed time (seconds) viewing a particular element (answer) on the SERP
• Attributed time (seconds) per unit height (pixels) of a particular element on the SERP
• Attributed time (milliseconds) per unit area (square pixels) of a particular element on the SERP
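The swipe-based features in the list above can be sketched from a sequence of swipe events; this is a hedged illustration (the event encoding and feature names are assumptions, not the talk's feature extractor):

```python
def swipe_features(swipes, session_seconds):
    """Compute several of the touch features listed above from a sequence
    of swipes, each given as a signed vertical distance in pixels
    (positive = down-swipe, negative = up-swipe)."""
    n = len(swipes)
    total_dist = sum(abs(d) for d in swipes)
    direction_changes = sum(
        1 for a, b in zip(swipes, swipes[1:]) if (a > 0) != (b > 0)
    )
    return {
        "num_swipes": n,
        "num_up_swipes": sum(1 for d in swipes if d < 0),
        "num_down_swipes": sum(1 for d in swipes if d > 0),
        "total_distance_px": total_dist,
        "swipes_per_second": n / session_seconds,
        "distance_per_swipe": total_dist / n if n else 0.0,
        "distance_per_second": total_dist / session_seconds,
        "direction_changes": direction_changes,
    }
```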
Quality of Interaction Model

Method            | Accuracy (%)   | Average F1 (%)
Baseline          | 70.62          | 61.38
Interaction Model | 80.81* (14.43) | 79.08* (28.83)

* Statistically significant improvement (p < 0.05)
How can the current prediction of user satisfaction be improved?
Cepstrum: Normal Voice
Cepstrum: Angry Voice
Normal vs Angry
Normal Voice
Angry Voice
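The cepstrum shown on the preceding slides can be computed as the inverse Fourier transform of the log magnitude spectrum; a minimal numpy sketch (the toy signal is illustrative, not the talk's data):

```python
import numpy as np

def real_cepstrum(signal):
    """Real cepstrum: inverse FFT of the log magnitude spectrum.
    Pitch and voice-quality differences (e.g. normal vs angry speech)
    show up as differences in the cepstral coefficients."""
    spectrum = np.fft.fft(signal)
    log_mag = np.log(np.abs(spectrum) + 1e-12)  # guard against log(0)
    return np.real(np.fft.ifft(log_mag))

# Toy "voiced frame": a 100 Hz tone sampled at 8 kHz for 50 ms
t = np.arange(0, 0.05, 1 / 8000)
frame = np.sin(2 * np.pi * 100 * t)
ceps = real_cepstrum(frame)
```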
Changes in User Emotions
Emotion state at t_i → emotion state at t_(i+1)
Changes in User Emotions
The change in emotion state from t_i to t_(i+1) signals SAT or DSAT
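The idea of labeling a step from the emotion-state transition can be sketched as follows; the valence scores, state names, and threshold are illustrative assumptions, not the talk's model:

```python
# Hypothetical valence scores for coarse emotion states.
VALENCE = {"angry": -1.0, "frustrated": -0.5, "neutral": 0.0, "happy": 1.0}

def satisfaction_from_emotion_change(state_t, state_t1):
    """Label a dialogue step SAT/DSAT from the change in the user's
    emotion state between consecutive turns t_i and t_(i+1)."""
    delta = VALENCE[state_t1] - VALENCE[state_t]
    return "DSAT" if delta < 0 else "SAT"
```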
How do we define situational user satisfaction?
User Situation Matters
User: "show restaurants near me"
Cortana: "Here are ten restaurants near you"
User: "show the best ones"
Cortana: "Here are ten restaurants near you that have good reviews"
User: "show directions to the second one"
Cortana: "Getting you directions to the Mayuri Indian Cuisine"
From Queries to Dialogues:
Sequential Interaction
Tendency Toward Direct Answers
User-System Interaction Interface
How to restore the user reward function?
Inverse Reinforcement Learning
[P. Abbeel’s slides on IRL]
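The feature-expectation idea behind this kind of IRL (as in Abbeel and Ng's apprenticeship learning, which the cited slides cover) can be sketched in a few lines: estimate discounted feature expectations from observed user trajectories, then move reward weights toward the direction separating user behavior from the current policy's behavior. This is a hedged, minimal sketch, not the talk's algorithm:

```python
def feature_expectations(trajectories, featurize, gamma=0.9):
    """Discounted average feature counts over observed trajectories.
    `featurize(state)` returns a list of feature values."""
    mu = None
    for traj in trajectories:
        for t, state in enumerate(traj):
            phi = [gamma ** t * f for f in featurize(state)]
            mu = phi if mu is None else [a + b for a, b in zip(mu, phi)]
    return [x / len(trajectories) for x in mu]

def update_reward_weights(w, mu_expert, mu_policy, lr=0.1):
    """One gradient-style step on linear reward weights (reward = w . phi),
    pushing toward features the expert visits more than the policy does."""
    return [wi + lr * (e - p) for wi, e, p in zip(w, mu_expert, mu_policy)]
```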
• User satisfaction with personal assistants is defined in a generalized form: satisfaction with a dialogue is an aggregation of satisfaction with all of the dialogue's tasks, not satisfaction with each of its queries separately
• Features derived from voice, and especially from combined touch and voice interactions, add a significant gain in accuracy over the baseline
• We proposed a novel, dynamic approach to restoring the user reward function
Thank you!
Questions?

Editor's Notes

  • #19 We utilize acoustic features to characterize the voice interactions happening in search dialogues. More specifically, we use the phonetic similarity between consecutive requests to identify patterns of repetition. Metaphone representation [39] is a way of indexing words by their pronunciation, which allows us to represent words by how they are pronounced rather than how they are written.
  • #21 Consider movie recommendation.