From queries to dialogues

From Queries to Dialogues: Predicting User Satisfaction with Intelligent Assistants
Julia Kiseleva, Kyle Williams, Ahmed Hassan Awadallah, Aidan C. Crook, Imed Zitouni, Tasos Anastasakos
Eindhoven University of Technology
Pennsylvania State University
Microsoft

It brings us new challenges
Google at SIGIR 2016

From Queries to Dialogues
Q1: how is the weather in Chicago
Q2: how is it this weekend
Q3: find me hotels
Q4: which one of these is the cheapest
Q5: which one of these has at least 4 stars
Q6: find me directions from the Chicago airport to number one
User’s dialogue with Cortana: Task is “Finding a hotel in Chicago”

Q1: find me a pharmacy nearby
Q2: which of these is highly rated
Q3: show more information about number 2
Q4: how long will it take me to get there
Q5: Thanks
User’s dialogue with Cortana: Task is “Finding a pharmacy”

User: “show restaurants near me”
Cortana: “Here are ten restaurants near you”
User: “show the best ones”
Cortana: “Here are ten restaurants near you that have good reviews”
User: “show directions to the second one”
Cortana: “Getting you directions to the Mayuri Indian Cuisine”

Main Research Question
How can we automatically predict user
satisfaction with search dialogues on
intelligent assistants using
click, touch, and voice interactions?

User: “Do I need to have a jacket tomorrow?”
Cortana: “You could probably go without one. The forecast shows …”
Single Task Search Dialogue

User: “show restaurants near me”
Cortana: “Here are ten restaurants near you”
User: “show the best ones”
Cortana: “Here are ten restaurants near you that have good reviews”
User: “show directions to the second one”
Cortana: “Getting you directions to the Mayuri Indian Cuisine”
Multi-Task Search Dialogues

How can we define user satisfaction with search dialogues?

User: “show restaurants near me”
Cortana: “Here are ten restaurants near you”
User: “show the best ones”
Cortana: “Here are ten restaurants near you that have good reviews”
User: “show directions to the second one”
Cortana: “Getting you directions to the Mayuri Indian Cuisine”
No clicks: ???

User: “show restaurants near me”
Cortana: “Here are ten restaurants near you”
User: “show the best ones”
Cortana: “Here are ten restaurants near you that have good reviews”
User: “show directions to the second one”
Cortana: “Getting you directions to the Mayuri Indian Cuisine”
Per query: SAT? SAT? SAT?
Overall: SAT?

User Frustration
Q1: what's the weather like in San Francisco
Q2: what's the weather like in Mountain View
Q3: can you find me a hotel close to Mountain View
Q4: can you show me the cheapest ones
Q5: show me the third one
Q6: show me the directions from SFO to this hotel
Q7: go back to first hotel (misrecognition)
Q8: show me hotels in Mountain View
Q9: show me cheap hotels in Mountain View
Q10: show me more about the third one


Dialog with Intelligent Assistant: Task is “Planning a weekend”
Annotations: “Restart search”; “A user is satisfied”


What interaction signals can we track during search dialogues?

Tracking User Interaction:
Click Signals
• Number of queries in a dialogue
• Number of clicks in a dialogue
• Number of SAT clicks (> 30 sec. dwell time) in a dialogue
• Number of DSAT clicks (< 15 sec. dwell time) in a dialogue
• Time (seconds) until the first click in a dialogue
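One way the click signals listed above might be extracted from a logged dialogue is sketched below. The `Click` and `Dialogue` records and their field names are hypothetical; only the 30-second and 15-second dwell thresholds come from the slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional

SAT_DWELL_SEC = 30.0   # slide: a SAT click has > 30 sec. dwell time
DSAT_DWELL_SEC = 15.0  # slide: a DSAT click has < 15 sec. dwell time


@dataclass
class Click:
    # Hypothetical log record, not taken from the talk.
    time_sec: float   # seconds elapsed since the dialogue started
    dwell_sec: float  # time spent on the clicked result


@dataclass
class Dialogue:
    queries: List[str]
    clicks: List[Click] = field(default_factory=list)


def click_features(d: Dialogue) -> dict:
    """Compute the five click signals for one dialogue."""
    # None when the dialogue contains no click at all.
    first_click: Optional[float] = min(
        (c.time_sec for c in d.clicks), default=None
    )
    return {
        "num_queries": len(d.queries),
        "num_clicks": len(d.clicks),
        # Clicks with a dwell time between 15 and 30 seconds count as neither.
        "num_sat_clicks": sum(c.dwell_sec > SAT_DWELL_SEC for c in d.clicks),
        "num_dsat_clicks": sum(c.dwell_sec < DSAT_DWELL_SEC for c in d.clicks),
        "time_to_first_click_sec": first_click,
    }
```

A dialogue without clicks yields zero counts and `None` for the time until the first click, which is the case the voice-only restaurant example raises.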

Acoustic Signals
• Phonetic similarity between consecutive requests
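The slides do not say how phonetic similarity is computed. As a rough illustration only, the sketch below encodes each word with a simplified Soundex-style code and compares the code sequences of two consecutive requests with `difflib.SequenceMatcher`; both choices are assumptions, not the authors' method.

```python
from difflib import SequenceMatcher

# Soundex consonant groups; vowels, h, w, and y get no digit.
_CODES = {
    letter: digit
    for digit, letters in {
        "1": "bfpv", "2": "cgjkqsxz", "3": "dt",
        "4": "l", "5": "mn", "6": "r",
    }.items()
    for letter in letters
}


def soundex(word: str) -> str:
    """Simplified Soundex: first letter plus up to three digits."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = _CODES.get(c, "")
        if digit and digit != prev:
            code += digit
        if c not in "hw":  # h and w do not separate equal codes
            prev = digit
    return (code + "000")[:4]


def phonetic_similarity(request_a: str, request_b: str) -> float:
    """Similarity in [0, 1] between the phonetic codes of two requests."""
    codes_a = [soundex(w) for w in request_a.split()]
    codes_b = [soundex(w) for w in request_b.split()]
    return SequenceMatcher(None, codes_a, codes_b).ratio()
```

Under this sketch, two requests that sound alike word by word score 1.0 even when spelled differently, while requests sharing only some words score in between.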

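[Figure: attributed viewing time for a SERP element across three viewport positions (ViewPortHeight shown): 3 seconds at 33% of ViewPort (1s) + 6 seconds at 66% of ViewPort (4s) + 2 seconds at 20% of ViewPort (0.4s) = 5.4s]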
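The arithmetic in the viewport figure appears to weight each display interval by the share of the viewport involved. The sketch below reproduces that sum; reading 33% and 66% as rounded thirds is an assumption, and `attributed_time` is an illustrative name.

```python
from typing import Iterable, Tuple


def attributed_time(exposures: Iterable[Tuple[float, float]]) -> float:
    """Sum duration (seconds) weighted by viewport fraction (0..1)."""
    return sum(seconds * fraction for seconds, fraction in exposures)


# The three intervals from the figure: (seconds, fraction of viewport).
# 33% and 66% are treated as one third and two thirds (assumption).
figure_exposures = [(3.0, 1 / 3), (6.0, 2 / 3), (2.0, 0.2)]
total = attributed_time(figure_exposures)  # about 5.4 seconds
```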
Tracking User Interaction

Touch Signals
• Number of swipes
• Number of up-swipes
• Number of down-swipes
• Total distance swiped (pixels)
• Number of swipes normalized by time
• Total distance divided by number of swipes
• Total swiped distance divided by time
• Number of swipe direction changes
• Duration (seconds) for which a SERP answer is shown on screen (even partially)
• Fraction of visible pixels belonging to SERP answer
• Attributed time (seconds) to viewing a particular element (answer) on SERP
• Attributed time (seconds) per unit height (pixels) associated with a particular element on SERP
• Attributed time (milliseconds) per unit area (square pixels) associated with a particular element on SERP
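The swipe-based signals in the touch list could be derived from a sequence of swipe events roughly as follows. The `Swipe` record and its sign convention (positive means up-swipe) are hypothetical, and the viewport-based signals are left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Swipe:
    # Hypothetical record: signed vertical displacement in pixels,
    # positive for an up-swipe and negative for a down-swipe.
    dy_pixels: float


def swipe_features(swipes: List[Swipe], duration_sec: float) -> dict:
    """Compute the eight swipe signals for one SERP view."""
    n = len(swipes)
    total = sum(abs(s.dy_pixels) for s in swipes)
    # A direction change is a pair of consecutive swipes with opposite signs.
    changes = sum(
        1 for a, b in zip(swipes, swipes[1:]) if a.dy_pixels * b.dy_pixels < 0
    )
    has_time = duration_sec > 0
    return {
        "num_swipes": n,
        "num_up_swipes": sum(s.dy_pixels > 0 for s in swipes),
        "num_down_swipes": sum(s.dy_pixels < 0 for s in swipes),
        "total_distance_px": total,
        "swipes_per_sec": n / duration_sec if has_time else 0.0,
        "distance_per_swipe_px": total / n if n else 0.0,
        "distance_per_sec_px": total / duration_sec if has_time else 0.0,
        "num_direction_changes": changes,
    }
```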

User Study Participants
• 60 participants
• Age: 25.53 +/- 5.42 years
• Gender: Male 75%, Female 25%
• Language: English 55%, Other 45%
• Education: Computer Science 82%, Electrical Engineering 8%, Mathematics 2%, Other 8%

You are planning a vacation. Pick a place. Check if the weather is good enough for the period you are planning the vacation. Find a hotel that suits you. Find the driving directions to this place.

Questionnaire
• Were you able to complete the task?
o Yes/No
• How satisfied are you with your experience in this task?
o If the task has sub-tasks, participants indicate their graded satisfaction with each, e.g.:
o a. How satisfied are you with your experience in finding a hotel?
o b. How satisfied are you with your experience in finding directions?
• How well did Cortana recognize what you said?
o 5-point Likert scale
• Did you put in a lot of effort to complete the task?

8 tasks: 1 simple, 4 with 2 subtasks, 3 with 3 subtasks
~30 minutes

Search Dialog Dataset
• Total number of queries: 2,040
• Number of unique queries: 1,969
• Average query length: 7.07
• The simple task generated 130 queries
• Tasks with 2 context switches generated 685 queries
• Tasks with 3 context switches generated 1,355 queries

How can we predict user satisfaction with search dialogues using interaction signals?

User’s dialogue about the ‘stomach ache’
General Web SERP:
Q1: what do you have medicine for the stomach ache
Q2: stomach ache medicine over the counter
Structured SERP:
Q3: show me the nearest pharmacy
Q4: more information on the second one

General Web and Structured SERP

Aggregating Touch Interactions (I)
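[Figure: three numbered schemes (1., 2., 3.) for aggregating touch interactions, written with the interaction function I( ); the function arguments are icons not reproduced here]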

Quality of Interaction Model
Method                 Accuracy (%)       Average F1 (%)
Baseline               70.62              61.38
Interaction Model 1    78.78* (+11.55)
[Remaining cells, in extraction order: 83.59* (+35.90); (+13.58); 83.31* (+35.44); (14.43); 79.08* (28.83)]
* Statistically significant improvement (p < 0.05)
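The parenthesized figures in the table look like relative gains over the baseline in percent. That reading is an assumption, but it is consistent with the first accuracy cell, as the small check below shows; `relative_gain` is an illustrative helper.

```python
def relative_gain(baseline: float, value: float) -> float:
    """Relative improvement of value over baseline, in percent."""
    return (value - baseline) / baseline * 100.0


# Accuracy: baseline 70.62 vs. Interaction Model 1 at 78.78 (from the table).
gain = relative_gain(70.62, 78.78)
print(f"{gain:.2f}")  # compare with the (+11.55) reported on the slide
```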

Which interaction signals have the highest impact on predicting user satisfaction with search dialogues?

Predicting User Satisfaction
• F1: Because the SERP for a query is ordered by a measure of relevance as
determined by the system, additional exploration is unlikely to lead to
user satisfaction; it more likely indicates that the best-provided
results (i.e. the SERP top) are insufficient to address the user intent

• F2: In the converse case of F1, when users find content that satisfies their
intent, their likelihood of scrolling is reduced, and they dwell for an extended
period on the top viewport

• F3: When users are involved in a complex task, they are dissatisfied when
redirected to a general web SERP. Unlike F2, the absence of scrolling on this
landing page is an indication of dissatisfaction

Conclusion
How can we define user satisfaction with search dialogues?
• We define user satisfaction with search dialogues in a generalized form: as an
aggregation of satisfaction with all of a dialogue’s tasks, not as satisfaction
with each of the dialogue’s queries separately
How can we predict user satisfaction with search dialogues using interaction signals?
• Features derived from voice and especially from touch interactions add a
significant gain in accuracy over the baseline
How can we predict user satisfaction with search dialogues using interaction signals?
• Our analysis showed a strong negative correlation between user satisfaction
and swipe actions

Thank you!
Questions?
