
Understanding User Satisfaction with Intelligent Assistants


Voice-controlled intelligent personal assistants, such as Cortana,
Google Now, Siri and Alexa, are increasingly becoming a part of
users’ daily lives, especially on mobile devices. They represent
a significant change in information access, not only by introducing
voice control and touch gestures but also by enabling dialogues
where the context is preserved. This raises the need to evaluate
their effectiveness in assisting users with their tasks. However,
in order to understand which types of user interaction reflect different
degrees of user satisfaction, we need explicit judgements. In this
paper, we describe a user study that was designed to measure user
satisfaction over a range of typical scenarios of use: controlling a
device, web search, and structured search dialogue. Using this data,
we study how user satisfaction varied with different usage scenarios
and what signals can be used for modeling satisfaction in the
different scenarios. We find that the notion of satisfaction varies
across different scenarios, and show that, in some scenarios (e.g.
making a phone call), task completion is very important while for
others (e.g. planning a night out), the amount of effort spent is key.
We also study how the nature and complexity of the task at hand
affects user satisfaction, and find that preserving the conversation
context is essential and that overall task-level satisfaction cannot
be reduced to query-level satisfaction alone. Finally, we shed light
on the relative effectiveness and usefulness of voice-controlled intelligent
agents, explaining their increasing popularity and uptake
relative to the traditional query-response interaction.
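
To make the idea of "signals for modeling satisfaction" concrete, the sketch below shows how task completion, self-reported effort, and speech recognition quality could in principle be combined in a simple classifier. It is a minimal illustration using assumed feature names, made-up toy data, and scikit-learn's logistic regression; it is not the model or feature set used in the study.

```python
# Minimal sketch (not the paper's actual model) of predicting binary task-level
# satisfaction from a few illustrative signals. The feature names
# (task_completed, effort, asr_quality) and the toy data are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Columns: task_completed (0/1), self-reported effort (1-5), ASR quality (1-5)
X = np.array([
    [1, 1, 5],
    [1, 2, 4],
    [0, 4, 2],
    [1, 4, 3],
    [0, 5, 1],
    [1, 2, 5],
    [0, 3, 3],
    [1, 5, 2],
])
# Binary satisfaction labels (e.g. Likert rating >= 4), again purely illustrative.
y = np.array([1, 1, 0, 0, 0, 1, 0, 0])

model = LogisticRegression().fit(X, y)

# Inspecting the coefficients hints at which signal dominates; the study finds
# completion matters most for device control, while effort matters most for
# planning-style tasks.
for name, coef in zip(["task_completed", "effort", "asr_quality"], model.coef_[0]):
    print(f"{name:15s} {coef:+.2f}")
```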


Understanding User Satisfaction with Intelligent Assistants

  1. 1. Understanding User Satisfaction with Intelligent Assistants. Julia Kiseleva, Kyle Williams, Jiepu Jiang, Ahmed Hassan Awadallah, Aidan C. Crook, Imed Zitouni, Tasos Anastasakos. Eindhoven University of Technology, Pennsylvania State University, University of Massachusetts Amherst, Microsoft. CHIIR’16, Chapel Hill, NC, USA
  2. 2. Q1: how is the weather in Chicago Q2: how is it this weekend Q3: find me hotels Q4: which one of these is the cheapest Q5: which one of these has at least 4 stars Q6: find me directions from the Chicago airport to number one User’s dialogue with Cortana: Task is “Finding a hotel in Chicago”
  3. 3. Q1: find me a pharmacy nearby Q2: which of these is highly rated Q3: show more information about number 2 Q4: how long will it take me to get there Q5: Thanks User’s dialogue with Cortana: Task is “Finding a pharmacy”
  4. 4. Research Questions • RQ1: What are characteristic types of scenarios of use?
  5. 5. Controlling Device • Call a person • Send a text message • Check on-device calendar • Open an application • Turn on/off wi-fi • Play music
  6. 6. Knowledge Pane Image Answer
  7. 7. Knowledge Pane Image Answer Image Answer Organic Results
  8. 8. Knowledge Pane Image Answer Image Answer Location Answer Organic Results
  9. 9. User: “Do I need to have a jacket tomorrow?” Search Dialogue
  10. 10. User: “Do I need to have a jacket tomorrow?” Cortana: “You could probably go without one. The forecast shows …” Search Dialogue
  11. 11. Cortana: “Here are ten restaurants near you” User: “show restaurants near me” Search Dialogue
  12. 12. Cortana: “Here are ten restaurants near you” Cortana: “Here are ten restaurants near you that have good reviews” User: “show restaurants near me” User: “show the best restaurants near me” Search Dialogue
  13. 13. Cortana: “Here are ten restaurants near you” Cortana: “Here are ten restaurants near you that have good reviews” Cortana: “Getting you directions to the Mayuri Indian Cuisine” User: “show restaurants near me” User: “show the best restaurants near me” User: “show directions to the second one” Search Dialogue
  14. 14. Research Questions • RQ1: What are characteristic types of scenarios of use? • RQ2: How can we measure different aspects of user satisfaction? • RQ3: What are key factors determining user satisfaction for the different scenarios? • RQ4: How to characterize abandonment in the web search scenario? • RQ5: How does query-level satisfaction relate to overall user satisfaction for the search dialogue scenario?
  15. 15. Research Questions • RQ1: What are characteristic types of scenarios of use? • RQ2: How can we measure different aspects of user satisfaction? • RQ3: What are key factors determining user satisfaction for the different scenarios? • RQ4: How to characterize abandonment in the web search scenario? • RQ5: How does query-level satisfaction relate to overall user satisfaction for the search dialogue scenario? USER STUDY
  16. 16. User Study Participants • 60 Participants • Age: 25.53 +/- 5.42 years • Language: 55% English, 45% other
  17. 17. User Study Participants • 60 Participants • Age: 25.53 +/- 5.42 years • Language: 55% English, 45% other • Gender: 75% male, 25% female
  18. 18. User Study Participants • 60 Participants • Age: 25.53 +/- 5.42 years • Language: 55% English, 45% other • Gender: 75% male, 25% female • Education: 82% Computer Science, 8% Electrical Engineering, 2% Mathematics, 8% other
  19. 19. User Study Design • Video Instructions (same for all participants) • Tasks are realistic – mined from Cortana logs: o Control type of tasks o Queries where users don’t click o Search dialogue tasks – mostly localization type of queries
  20. 20. Find out what is the hair color of your favorite celebrity.
  21. 21. You are planning a vacation. Pick a place. Check if the weather is good enough for the period you are planning the vacation. Find a hotel that suits you. Find the driving directions to this place.
  22. 22. You are planning a vacation. Pick a place. Check if the weather is good enough for the period you are planning the vacation. Find a hotel that suits you. Find the driving directions to this place.
  23. 23. Questionnaire: Controlling Device • Were you able to complete the task? o Yes/No • How satisfied are you with your experience in this task? o 5-point Likert scale • How well did Cortana recognize what you said? o 5-point Likert scale • Did you put in a lot of effort to complete the task? o 5-point Likert scale
  24. 24. Questionnaire: Controlling Device • Were you able to complete the task? o Yes/No • How satisfied are you with your experience in this task? o 5-point Likert scale • How well did Cortana recognize what you said? o 5-point Likert scale • Did you put in a lot of effort to complete the task? o 5-point Likert scale 5 Tasks 20 Minutes
  25. 25. Questionnaire: Good Abandonment • Were you able to complete the task? o Yes/No • Where did you find the answer? o Answer Box, Image, SERP, Visited Website • Which query led you to finding the answer? o First, Second, Third, >= Fourth • How satisfied are you with your experience in this task? o 5-point Likert scale • Did you put in a lot of effort to complete the task? o 5-point Likert scale
  26. 26. Questionnaire: Good Abandonment • Were you able to complete the task? o Yes/No • Where did you find the answer? o Answer Box, Image, SERP, Visited Website • Which query led you to finding the answer? o First, Second, Third, >= Fourth • How satisfied are you with your experience in this task? o 5-point Likert scale • Did you put in a lot of effort to complete the task? o 5-point Likert scale 5 Tasks 20 Minutes
  27. 27. Questionnaire: Search Dialogue • Were you able to complete the task? o Yes/No • How satisfied are you with your experience in this task? o If the task has sub-tasks, participants indicate their graded satisfaction, e.g. o a. How satisfied are you with your experience in finding a hotel? o b. How satisfied are you with your experience in finding directions? • How well did Cortana recognize what you said? o 5-point Likert scale • Did you put in a lot of effort to complete the task? o 5-point Likert scale
  28. 28. Questionnaire: Search Dialogue • Were you able to complete the task? o Yes/No • How satisfied are you with your experience in this task? o If the task has sub-tasks, participants indicate their graded satisfaction, e.g. o a. How satisfied are you with your experience in finding a hotel? o b. How satisfied are you with your experience in finding directions? • How well did Cortana recognize what you said? o 5-point Likert scale • Did you put in a lot of effort to complete the task? o 5-point Likert scale 8 Tasks: 1 simple, 4 with 2 subtasks, 3 with 3 subtasks; 30 Minutes
  29. 29. Search Dialog Dataset • 540 tasks that incorporated 2,040 queries, of which 1,969 were unique • The average query length is 7.07 • The simple task generated 130 queries in total • Tasks with 2 context switches generated 685 queries • Tasks with 3 context switches generated 1,355 queries
  30. 30. Factors Determining Satisfaction RQ3: What are key factors determining user satisfaction for the different scenarios?
  31. 31. Results Over Scenarios: Mean of Satisfaction [Charts: mean satisfaction level and mean effort per scenario (Across Scenarios, Device Control, Web Search, Structured Dialog)]
  32. 32. Results ‘Good Abandonment’ RQ4: How to characterize abandonment in the web search scenario?
  33. 33. Results ‘Good Abandonment’: Mean of Satisfaction [Charts: mean satisfaction level by the query that led to the answer (First Query, Second Query, Third Query, >= Fourth Query) and by answer source (Answer Box, Image, SERP, Visited Website)]
  34. 34. Search Dialogue Satisfaction RQ5: How does query-level satisfaction relate to overall user satisfaction for the structured search dialogue scenario?
  35. 35. Cortana: “Here are ten restaurants near you” Cortana: “Here are ten restaurants near you that have good reviews” Cortana: “Getting you directions to the Mayuri Indian Cuisine” User: “show restaurants near me” User: “show the best restaurants near me” User: “show directions to the second one” SAT? SAT? SAT? SAT? SAT? SAT? Overall SAT?
  36. 36. Search Dialogue Satisfaction RQ5: How does query-level satisfaction relate to overall user satisfaction for the structured search dialogue scenario?
  37. 37. Satisfaction Over Different Tasks [Chart: number of answers at each satisfaction level (1 to 5) for the Weather Task]
  38. 38. Satisfaction Over Different Tasks [Charts: number of answers at each satisfaction level (1 to 5) for the Weather Task and the Mission Task (2 sub-tasks)]
  39. 39. Satisfaction Over Different Tasks [Charts: number of answers at each satisfaction level (1 to 5) for the Weather Task, the Mission Task (2 sub-tasks), and the Mission Task (3 sub-tasks)]
  40. 40. Q1: what do you have medicine for the stomach ache Q2: stomach ache medicine over the counter Q3: show me the nearest pharmacy Q4: more information on the second one Q5: do they have a stool softener Q6: does Fred Meyer have stool softeners General Search / Search Dialog: a combination of scenarios. User’s dialogue with Cortana related to the ‘stomach ache’ problem
  41. 41. Conclusions (1) • RQ1: What are characteristic types of scenarios of use? • We proposed three main types of scenarios • RQ2: How can we measure different aspects of user satisfaction? • We designed a series of user studies tailored to the three scenarios • RQ3: What are key factors determining user satisfaction for the different scenarios? • Effort is a key component of user satisfaction across the different intelligent assistant scenarios
  42. 42. Conclusions (2) • RQ4: How to characterize abandonment in the web search scenario? • We concluded that to measure good abandonment we need to investigate other forms of interaction signals that are not based on clicks or reformulation • RQ5: How does query-level satisfaction relate to overall user satisfaction for the search dialogue scenario? • We looked at user satisfaction as ‘a user journey towards an information goal where each step is important,’ and showed the importance of session context
  43. 43. Questions? • We proposed three main types of scenarios of use • We designed a series of user studies tailored to the three scenarios • Effort is a key component of user satisfaction across the different intelligent assistant scenarios • We concluded that to measure good abandonment we need to investigate other forms of interaction signals that are not based on clicks or reformulation • We looked at user satisfaction as ‘a user journey towards an information goal where each step is important,’ and showed the importance of session context for user satisfaction
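
The last two bullet points above argue that overall satisfaction is a journey in which each step matters, so it cannot be reduced to an average of query-level satisfaction. The toy sketch below (a hypothetical weighting scheme and made-up per-query labels, not the paper's method) illustrates how a single failed step late in the dialogue, such as losing the conversation context, can drag task-level satisfaction down more than a simple average would suggest.

```python
# Toy illustration (not the paper's method) of why task-level satisfaction in a
# search dialogue is not just the mean of query-level satisfaction: a failed step
# late in the session (e.g. losing the conversation context) can sink the task.
from statistics import mean

# Hypothetical per-query SAT labels (1 = satisfied, 0 = not) for the restaurant
# dialogue on slide 35: two good answers, then a context loss on "the second one".
query_sat = [1, 1, 0]

def naive_task_sat(sats):
    """Simple average: treats every query as equally important."""
    return mean(sats)

def journey_task_sat(sats):
    """Hypothetical journey-style aggregate: later steps, which sit closer to the
    information goal, carry more weight than earlier ones."""
    weights = range(1, len(sats) + 1)          # later queries weigh more
    total = sum(w * s for w, s in zip(weights, sats))
    return total / sum(weights)

print("naive:", naive_task_sat(query_sat))     # ~0.67: looks mostly satisfied
print("journey:", journey_task_sat(query_sat)) # 0.5: the failed final step hurts more
```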
