A Crowd-Powered Conversational Assistant That Automates Itself Over Time

The LTI is proud to announce the following PhD Thesis Defense:

A Crowd-Powered Conversational Assistant That Automates Itself Over Time

Ting-Hao Kenneth Huang
11:00am - Tuesday June 12, 2018
GHC 4405

Committee:
Jeffrey P. Bigham (Chair)
Alexander I. Rudnicky
Niki Kittur
Walter S. Lasecki (University of Michigan)
Chris Callison-Burch (University of Pennsylvania)

A Crowd-Powered Conversational Assistant That Automates Itself Over Time

  1. Live Note/QA: http://tinyurl.com/KenDefense [ Question / Feedback: http://tinyurl.com/KenDefense ] Ting-Hao (Kenneth) Huang, Carnegie Mellon University. A Crowd-Powered Conversational Assistant That Automates Itself Over Time
  2. Chorus: A Crowd-Powered Conversation Assistant • CHI’18, CHI LBW’16 • UIST’17, UIST Poster’17 • HCOMP’17, ’16, ’15, HCOMP DC’16, HCOMP WIP’14 • CI’17 • CSCW Workshop’17
  9. What just happened? • Open conversation • Multi-turn interaction • Multiple domains • Personalized • Coherent dialog • Mix of task-oriented and social conversation
  10. Today’s Conversational Assistants… “What’s new with Alexa?” “Talking to Siri”
  11. Open Conversation • Personal Assistants • Automated
  12. Existing Approaches to Open Conversation • Combining multiple automated dialog systems: DialPort (Zhao, et al., 2016) • End-to-end framework for dialogue systems: Serban, et al. 2016; Li, et al. 2017 • Adapting a model to many other domains: Walker, et al., 2007; Sun, et al., 2016 • Chit-chat systems (social bots) that hold social conversations: Banchs, et al., 2012 • Still a very hard problem…
  13. Existing Approaches to Open Conversation • Combining multiple task-oriented dialog systems: DialPort (Zhao, et al., 2016) • End-to-end framework for dialogue systems: Serban, et al. 2016; Li, et al. 2017 • Adapting a model to many other domains: Walker, et al., 2007; Sun, et al., 2016 • Chit-chat systems (social bots) that hold social conversations: Banchs, et al., 2012 • Still a very hard problem… (MIT Technology Review, Feb 27, 2018)
  14. Open Conversation • Personal Assistants • Automated • AI-Powered Dialog Systems
  16. Open Conversation • Personal Assistants • Automated • AI-Powered Dialog Systems • Crowd-Powered Dialog Systems
  19. Thesis Statement: By allowing new chatbots to be easily integrated, reusing prior crowd answers, and gradually reducing the crowd's role in choosing high-quality responses, a deployed crowd-powered dialog system can be automated over time to support real-world open conversations.
  22. Thesis Statement • Chorus Deployment [ HCOMP’16, HCOMP’17 ] • Evorus [ CHI’18, UIST Poster’17 ] • Guardian [ HCOMP’15, CI’17 ]
  23. Open Conversation • Personal Assistants • Automated • AI-Powered Dialog Systems • Crowd-Powered Dialog Systems
  24. Chorus: A Crowd-Powered Conversation Assistant [ HCOMP’16, HCOMP’17 ]
  25. Chorus: A Crowd-Powered Conversational Assistant (Lasecki, et al. UIST’13) • Crowd workers collectively hold a conversation by: 1. Proposing responses 2. Voting on responses 3. Taking notes • Reward points for each action • Agreement bonus
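
The propose/vote/reward loop on this slide can be sketched in a few lines of Python. This is only an illustration: the point values, the two-vote acceptance threshold, and the `Session` and `Candidate` names are assumptions made for the example, not the settings or code of the deployed Chorus system, and note-taking is left out.

```python
from dataclasses import dataclass, field

# All numeric values are illustrative assumptions, not Chorus's real settings.
PROPOSE_POINTS = 10
VOTE_POINTS = 2
AGREEMENT_BONUS = 5
ACCEPT_THRESHOLD = 2  # votes needed before a response is sent to the user


@dataclass
class Candidate:
    text: str
    proposer: str
    voters: set = field(default_factory=set)


class Session:
    """Tracks proposed responses, votes, and reward points for one chat."""

    def __init__(self):
        self.candidates = {}
        self.points = {}

    def _reward(self, worker, amount):
        self.points[worker] = self.points.get(worker, 0) + amount

    def propose(self, worker, text):
        self.candidates[text] = Candidate(text, worker)
        self._reward(worker, PROPOSE_POINTS)

    def vote(self, worker, text):
        """Record a vote; return the text if it just got accepted, else None."""
        cand = self.candidates[text]
        if worker == cand.proposer or worker in cand.voters:
            return None  # ignore self-votes and duplicate votes
        cand.voters.add(worker)
        self._reward(worker, VOTE_POINTS)
        if len(cand.voters) == ACCEPT_THRESHOLD:
            # Agreement bonus for everyone who backed the accepted response.
            for backer in cand.voters | {cand.proposer}:
                self._reward(backer, AGREEMENT_BONUS)
            return cand.text
        return None
```

With these assumed values, a worker who proposes a response that two other workers upvote ends with 15 points, and each voter ends with 7.
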
  26. User Interface
  27. User & Worker Interface
  29. We Deployed Chorus • Launched on May 20th, 2016 • On Google Hangouts • 2200+ conversations, 420+ users • TalkingToTheCrowd.org
  30. Gift Suggestion • “Can you suggest some birthday present for one of my friend?” • “Sure! What types of things does your friend like?” • “female, computer science PhD student in Texas” • “we're going to visit her this weekend from Pittsburgh” • “She's in Austin” • “Does she have any favorite TV shows, movies, or video games?”
  33. Travel Planning • “How many suitcases can I take on a flight from the US to Israel?” • “Can I ask you from where are you planning to board the flight? and which air services are you using?” • “Pittsburgh” • “with which company are you flying?” • “Let me check” • Full transcript: Huang, et al. HCOMP 2016.
  34. What Did We Learn? • Challenges identified: malicious workers & users; identifying the end of a conversation; when workers’ consensus is not enough… • Basic statistics: avg. session duration = 10.63 min (SD = 8.38); avg. messages per session = 25.87 (SD = 27.27) • Foundation for future automation!
  35. Open Conversation • Personal Assistants • Automated • AI-Powered Dialog Systems • Crowd-Powered Dialog Systems • Chorus Deployment [ HCOMP’16, HCOMP’17 ]
  37. Evorus: A Crowd-Powered Conversational Assistant Built to Automate Itself Over Time [ UIST Poster’17, CHI’18 ]
  38. Automating Chorus Over Time
  42. Empower Chorus with Multiple Chatbots
  43. Chatbots: How to select chatbots automatically?
  44. Ranking Chatbots: Performance & Topic • Posterior of a Chatbot ≈ Chatbot’s Performance combined with Topic Similarity
  45. Ranking Chatbots: Performance & Topic • Chatbot’s Performance ≈ Overall Message Acceptance Rate
  46. Ranking Chatbots: Performance & Topic • Topic Similarity: User Message (“Hey what should I eat in Montreal?”) vs. Domain of the Chatbot
  47. Ranking Chatbots: Performance & Topic • Topic Similarity: User Message (“Hey what should I eat in Montreal?”) vs. Domain of the Chatbot • Example Triggering Messages: “Find me some good restaurants!”, “Where can I get Chinese food?”
  50. Ranking Chatbots: Performance & Topic • Posterior of a Chatbot ≈ Chatbot’s Performance combined with Topic Similarity • Add more chatbots over time!
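
A minimal sketch of the ranking idea on these slides, under stated assumptions: the two factors are multiplied, performance is a smoothed acceptance rate, and topic similarity is a bag-of-words cosine against each chatbot's example triggering messages. The slides only say the posterior is approximated from performance and topic similarity; the product, the smoothing, the similarity measure, and all function names here are stand-ins chosen for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts; a crude stand-in for a real text representation."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def topic_similarity(message, trigger_examples):
    """Best match between the user message and any example triggering message."""
    msg = bag_of_words(message)
    return max((cosine(msg, bag_of_words(ex)) for ex in trigger_examples), default=0.0)


def rank_chatbots(message, chatbots):
    """chatbots maps name -> (accepted_count, proposed_count, trigger_examples)."""
    scores = {}
    for name, (accepted, proposed, examples) in chatbots.items():
        # Add-one smoothing (an assumption) so a brand-new bot is not scored 0.
        acceptance_rate = (accepted + 1) / (proposed + 2)
        scores[name] = acceptance_rate * topic_similarity(message, examples)
    return sorted(scores, key=scores.get, reverse=True)
```

Word overlap is a weak similarity signal: a message that shares no vocabulary with a chatbot's examples scores zero for that chatbot even when the topic matches, so a real system would need a more robust measure.
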
  52. Automatic Upvote: How to estimate the impact of an automation?
  53. Find the Best Confidence Threshold • High threshold: only vote when pretty sure; high precision, but little benefit • Low threshold: nearly always vote; grants agreement bonus by mistake; damages conversation quality
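
The threshold trade-off can be illustrated by sweeping candidate thresholds over logged (confidence, outcome) pairs and keeping the one with the best net benefit. The benefit and cost weights, the data layout, and the function names below are hypothetical; the slides do not state how the expected savings were actually estimated.

```python
def expected_points_saved(history, threshold, saved_per_hit=1.0, cost_per_miss=3.0):
    """Net benefit of auto-upvoting every message at or above the threshold.

    history: list of (classifier_confidence, crowd_accepted) pairs.
    saved_per_hit and cost_per_miss are illustrative weights, not measured values.
    """
    total = 0.0
    for confidence, crowd_accepted in history:
        if confidence >= threshold:
            total += saved_per_hit if crowd_accepted else -cost_per_miss
    return total


def best_threshold(history, candidates):
    """Pick the candidate threshold with the highest net benefit."""
    return max(candidates, key=lambda t: expected_points_saved(history, t))
```

A very high threshold rarely fires and saves little, while a very low one accumulates the penalty for wrong upvotes; the sweep picks a point between the two.
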
  54. Automating Chorus Over Time
  55. Automating Open Conversation • Setup: a 5-month-long deployment with 80 users; 4 chatbots + 1 voting bot • Results: automated responses were chosen 12.44% of the time; human upvotes were reduced by 13.81%; the cost of each message was reduced by 32.76%; conversation quality and user satisfaction were maintained • Conversation quality measures: Satisfaction, Clarity, Responsiveness, Comfort (Liu, et al., 2010)
  56. Open Conversation • Personal Assistants • Automated • AI-Powered Dialog Systems • Crowd-Powered Dialog Systems • Chorus Deployment [ HCOMP’16, HCOMP’17 ] • Evorus [ CHI’18, UIST Poster’17 ]
  58. Empower Chorus with Multiple Chatbots
  59. How to build a set of chatbots quickly?
  60. Use Web APIs to Empower Chorus • 19,758+ APIs
  61. Use Web APIs to Empower Chorus • 19,758+ APIs • How to convert a Web API into a chatbot?
  62. Guardian: A Crowd-Powered Spoken Dialog System for Web APIs [ HCOMP WIP’14, HCOMP’15, CI’17 ]
  63. Guardian: A Crowd-Powered Dialog System for Web APIs • User: “Hi, I’m in San Diego. Any Chinese restaurants here?” • 1 Language Understanding: term = Chinese, location = San Diego • 2 Dialog Management: Yelp Search API 2.0 returns JSON { ... "name": "Mandarin Wok Restaurant", ... "address": ["4227 Balboa Ave", ...], ... } • 3 Response Generation: “Mandarin Wok Restaurant is good! It’s on 4227 Balboa Ave.”
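
The data flow on this slide (extracted parameters, an API query, a JSON result, a natural-language reply) can be sketched as follows. The base URL is a placeholder rather than the real Yelp endpoint, the sample JSON keeps only the two fields visible on the slide, and the reply template is hard-coded, whereas in Guardian these steps involve crowd workers. All function names are hypothetical.

```python
import json
from urllib.parse import urlencode

# Placeholder endpoint for illustration only; not the real Yelp Search API URL.
BASE_URL = "https://api.example.com/search"


def build_request_url(params):
    """Turn extracted parameters into a query string for the Web API."""
    return BASE_URL + "?" + urlencode(params)


def generate_response(raw_json):
    """Turn the API's JSON result into a short reply (fixed template here)."""
    result = json.loads(raw_json)
    name = result["name"]
    street = result["address"][0]
    return f"{name} is good! It's on {street}."


# Parameters as extracted on the slide.
params = {"term": "Chinese", "location": "San Diego"}
url = build_request_url(params)

# Simplified sample payload containing only the fields shown on the slide.
sample = '{"name": "Mandarin Wok Restaurant", "address": ["4227 Balboa Ave"]}'
reply = generate_response(sample)
```
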
  64. Parameter Extraction • User: “Hi, I’m in San Diego. Any Chinese restaurants here?” • Yelp Search API parameters: offset, term, location, sw_latitude, sw_longitude, category_filter, accuracy, deals_filter, radius_filter, ...
  65. Parameter Extraction • 1. How to extract parameters? 2. Which parameters to use?
  66. How to Extract Parameters?
  67. Crowd-Powered Parameter Extraction • “Hi, I’m in San Diego.” • Recruited players answer under a time constraint (10–20 sec) • Answers are aggregated: Location = San Diego • Real-time On-Demand Crowd-powered Entity Extraction. Huang, et al. Collective Intelligence 2017.
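
One simple way to aggregate time-limited crowd answers is a majority vote over the answers that arrive before the deadline. The sketch below assumes exactly that; the normalization, the minimum-agreement rule, and the function name are illustrative choices, and the aggregation used in the cited work may differ.

```python
from collections import Counter


def aggregate_answers(answers, deadline_sec=20.0, min_votes=2):
    """Majority vote over crowd answers that arrive before the deadline.

    answers: list of (seconds_since_request, worker_answer) pairs.
    Returns the winning normalized answer, or None without enough agreement.
    """
    on_time = [text.strip().lower() for t, text in answers if t <= deadline_sec]
    if not on_time:
        return None
    value, count = Counter(on_time).most_common(1)[0]
    return value if count >= min_votes else None
```

Late answers are dropped entirely here; a real-time system would also have to decide what to do when no answer reaches the agreement level in time.
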
  68. Which Parameters to Use?
  69. Parameter Rating Problem • Candidate parameters: offset, term, location, sw_latitude, sw_longitude, category_filter, accuracy, deals_filter, radius_filter, ... • Pick good parameters for the dialog system.
  70. How about just doing a survey? • Task • Parameter Name / Desc
  71. Match Questions with Parameters • Yelp API • Question Collection: “What do you want to eat?” / “I like Chinese food.”; “Which city are you in?” / “I’m in Pittsburgh.”; “Is it dinner or lunch?” / “Dinner.”; ...
  72. Match Questions with Parameters • Question Collection • Parameter Filtering: offset, term, location, sw_latitude, sw_longitude, category_filter
  73. Match Questions with Parameters • Question Collection • Parameter Filtering • Question-Parameter Matching: parameters matched by more questions (e.g., location, term, category_filter) are better parameters
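
Question-parameter matching can be read as a counting problem: each collected question is matched to a parameter, and parameters with more matched questions rank higher. The sketch below assumes that reading of the slide; the tie-breaking rule and the function name are additions for the example.

```python
from collections import Counter


def rank_parameters(matches, all_parameters):
    """Rank API parameters by how many collected questions were matched to them.

    matches: list of (question, parameter_name) pairs produced by the matching step.
    Ties are broken alphabetically (an arbitrary choice for determinism).
    """
    counts = Counter(param for _question, param in matches)
    return sorted(all_parameters, key=lambda p: (-counts[p], p))
```

Parameters that no question maps to, such as `offset`, fall to the bottom of the ranking.
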
  74. Evaluation on Parameter Ranking • [Bar chart: MAP and MRR, axis 0 to 1, comparing Question Matching, Ask Siri, and Ask a Friend] • Average results of 8 Web APIs’ parameters
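
For reference, MAP (mean average precision) and MRR (mean reciprocal rank) are standard ranking metrics. The sketch below uses their textbook definitions; the gold-standard "relevant" parameter sets in any example are made up, since the slide gives no underlying data.

```python
def reciprocal_rank(ranked, relevant):
    """1 / position of the first relevant item, or 0 if none appears."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def average_precision(ranked, relevant):
    """Mean of precision@k taken at each position k that holds a relevant item."""
    hits, total = 0, 0.0
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant) if relevant else 0.0


def mean(values):
    return sum(values) / len(values)


def evaluate(rankings):
    """rankings: list of (ranked_parameters, relevant_set), one per Web API."""
    map_score = mean([average_precision(r, rel) for r, rel in rankings])
    mrr_score = mean([reciprocal_rank(r, rel) for r, rel in rankings])
    return map_score, mrr_score
```

Averaging over several APIs, as the slide does over 8, means passing one (ranking, relevant set) pair per API to `evaluate`.
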
  75. Guardian: A Crowd-Powered Dialog System for Web APIs • 1 Language Understanding • 2 Dialog Management • 3 Response Generation
  76. Evaluation: Task Completion Rate • Find Chinese restaurants in Pittsburgh: API result 9 out of 10, final response 10 out of 10 • Check current weather by using a zip code: API result 9 out of 10, final response 9 out of 10 • Find information of “Titanic”: API result 6 out of 10, final response 10 out of 10 • The crowd recovers errors in stages 2 and 3
  77. Open Conversation • Personal Assistants • Automated • AI-Powered Dialog Systems • Crowd-Powered Dialog Systems • Chorus Deployment [ HCOMP’16, HCOMP’17 ] • Evorus [ CHI’18, UIST Poster’17 ] • Guardian [ HCOMP’15, CI’17 ]
  78. Thesis Statement: By allowing new chatbots to be easily integrated, reusing prior crowd answers, and gradually reducing the crowd's role in choosing high-quality responses, a deployed crowd-powered dialog system can be automated over time to support real-world open conversations.
  79. Thesis Statement • Chorus Deployment [ HCOMP’16, HCOMP’17 ] • Evorus [ CHI’18, UIST Poster’17 ] • Guardian [ HCOMP’15, CI’17 ]
  80. Some More Projects… • Ignition (HCOMP’17) • WearMail (Swaminathan et al., UIST’17) • InstructableCrowd (CHI LBW’16; TOCHI, under review) • Visual Storytelling (VIST) (NAACL’16; Ferraro et al., EMNLP’15) • EmotionLines (Chen et al., LREC’18)
  81. Crowd Research is Critical For Building Future Computer Systems. • Collect data to guide AI models • Accomplish tasks that are not yet fully automated • Pave the way for future AI systems
  82. Future Work • Deployed Chorus as an Open Research Platform: Chorus API, 1000+ chatbots • Chorus on Smart Devices: Echo, Google Home… • Future Crowd-AI Systems: Object Recognition, Speech Recognition, Programming Tools … And More!
  84. Acknowledgment • Family, Yan-Zhu (Lavender) Chen • Jeffrey P. Bigham • Walter S. Lasecki, Chris Callison-Burch, Alex Rudnicky, Margaret Mitchell, Lun-Wei Ku, Hsin-Hsi Chen, Saiph Savage, Jane Hsu… • Shoou-I Yu, Joseph Chee Chang, Chih-Yi (Jessica) Lin, Shihyun Lo, Chu-Cheng Lin, Yun-Nung (Vivian) Chen, Lingpeng Kong, Luan Yi, William Wang, Zi Yang, Yen-Chia Hsu, Kuen-Bang Hou (Favonia), Kerry Shih-Ping Chang, Janet Huang, Yi-Chia Wang, Kai-min Kevin Chang… • Anhong Guo, Sai Ganesh, Kotaro Hara, Yashesh Gaur, Gierad Laput, Robert Xiao, Yang Zhang, Patrick Carrington, Luz Rello, Cole Gleason, Kristin Williams, Alex Chen, Susumu Saito… • Amos Azaria, Oscar Romero Lopez… • Stacey Young
  85. Thank you! • Ting-Hao (Kenneth) Huang, Carnegie Mellon University • @windx0303 • KennethHuang.cc • tinghaoh@cs.cmu.edu
  86. Backup Slides
  88. Automatic Voting
  89. Find the Best Confidence Threshold • Expected Reward Points Saved