A Crowd-Powered Conversational Assistant
That Automates Itself Over Time
CMU LTI PhD Thesis Proposal


Ting-Hao (Kenneth) Huang
11:30am – Wednesday, January 11th
GHC 6501

Committee:
Jeffrey P. Bigham, CMU (Chair)
Alexander I. Rudnicky, CMU
Niki Kittur, CMU
Walter S. Lasecki, University of Michigan
Chris Callison-Burch, University of Pennsylvania

Abstract:
Interaction in rich natural language enables people to exchange thoughts efficiently and come to a shared understanding quickly. Modern personal intelligent assistants such as Apple's Siri and Amazon's Echo all utilize conversational interfaces as their primary communication channels, and illustrate a future in which getting help from a computer is as easy as asking a friend. However, despite decades of research, modern conversational assistants are still limited in domain, expressiveness, and robustness. In this thesis, we take an alternative approach that blends real-time human computation with artificial intelligence to reliably engage in conversations. Instead of bootstrapping automation from the bottom up with only automatic components, we start with our crowd-powered conversational assistant, Chorus, and create a framework that enables Chorus to automate itself over time. Each of Chorus' responses is proposed and voted on by a group of crowd workers in real time. Toward realizing the goal of full automation, we (i) augmented Chorus' capability by connecting it with sensors and effectors on smartphones so that users can safely control them via conversation, and (ii) deployed Chorus to the public as a Google Hangouts chatbot to collect a large corpus of conversations to help speed automation. The deployed Chorus also provides a working system for experimenting with automated approaches. In the future, we will (iii) create a framework that enables Chorus to automate itself over time by automatically obtaining response candidates from multiple dialog systems and selecting appropriate responses based on the current conversation. Over time, the automated systems will take over more responsibility in Chorus, not only helping us to deploy robust conversational assistants before we know how to automate everything, but also allowing us to drive down costs and gradually reduce reliance on the crowd.

For a copy of the thesis proposal, please go to:
https://www.cs.cmu.edu/~tinghaoh/pdf/2017_thesis_proposal.pdf


  1. A Crowd-Powered Conversational Assistant That Automates Itself Over Time. CMU LTI PhD Thesis Proposal. Ting-Hao (Kenneth) Huang, January 11th, 2017. Thesis Committee: Jeffrey P. Bigham, CMU (Chair); Alexander I. Rudnicky, CMU; Chris Callison-Burch, U Penn; Walter S. Lasecki, U Mich; Niki Kittur, CMU
  2. Intelligent Conversational Assistants. Section bar: Intro | Improving Chorus | Deployment | Automating Over Time
  3. Challenges of Open Conversation • Combining multiple dialog systems: DialPort (Zhao, et al., 2016) • Adapting a model to many other domains: Walker, et al., 2007; Sun, et al., 2016 • Chit-chat systems: hold social conversations (Banchs, et al., 2012) • Still a very hard problem… – Alexa Prize: $2.5 Million – “… achieves the grand challenge of conversing coherently and engagingly with humans on popular topics for 20 minutes.”
  4. Chorus: A Crowd-Powered Conversational Assistant • A group of crowd workers collectively holds a conversation by: 1. Proposing responses 2. Voting on responses 3. Taking notes • Workers earn reward points for each action. Lasecki, W. S.; Wesley, R.; Nichols, J.; Kulkarni, A.; Allen, J. F.; and Bigham, J. P. 2013. Chorus: A crowd-powered conversational assistant. In UIST ’13, 151–162.
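
The propose, vote, and take-notes loop on slide 4 can be sketched roughly as below. This is a minimal illustration, not the Chorus implementation: the class name `ChorusSession`, the point values, and the rule that a response is sent once it collects `accept_threshold` votes are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical point values; the slide only says each action earns reward points.
POINTS = {"propose": 2, "vote": 1, "note": 1}


@dataclass
class ChorusSession:
    accept_threshold: int = 2                    # assumed votes needed to send a response
    votes: dict = field(default_factory=dict)    # candidate text -> set of voter ids
    notes: list = field(default_factory=list)
    points: dict = field(default_factory=dict)   # worker id -> reward points
    sent: list = field(default_factory=list)

    def _reward(self, worker, action):
        self.points[worker] = self.points.get(worker, 0) + POINTS[action]

    def propose(self, worker, text):
        """Register a candidate response with no votes yet."""
        self.votes.setdefault(text, set())
        self._reward(worker, "propose")

    def vote(self, worker, text):
        """Record one vote; return True if this vote gets the response sent."""
        if text not in self.votes or text in self.sent or worker in self.votes[text]:
            return False
        self.votes[text].add(worker)
        self._reward(worker, "vote")
        if len(self.votes[text]) >= self.accept_threshold:
            self.sent.append(text)
            return True
        return False

    def take_note(self, worker, note):
        """Store a shared note so later workers can catch up on the conversation."""
        self.notes.append(note)
        self._reward(worker, "note")
```

With the assumed threshold of two, a proposal from one worker goes out to the user as soon as two other workers vote for it.
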
  5. What kind of conversations can Chorus have?
  6. Gift Suggestion. Messages shown: “female, computer science PhD student in Texas” “we're going to visit her this weekend from Pittsburgh” “She's in Austin” “Does she have any favorite TV shows, movies, or video games?” “Sure! What types of things does your friend like?” “Can you suggest some birthday present for one of my friend?”
  7. Travel Planning. Messages shown: “Pittsburgh” “with which company are you flying?” “Let me check” “How many suitcases can I take on a flight from the US to Israel?” “Can I ask you from where are you planning to board the flight? and which air services are you using?”
  8. Arbitrary Question. Messages shown: “okay wait a sec” “How can I help you?” “Who was the prime minister of Australia when JFK was assassinated” “Let me check it” “Robert Menzies”
  9. A Top-Down Approach. Labels: Chorus, Fully-Automated System, Hybrid System, Minor Automated Assistance. Cost: High to Low. Latency: High to Low.
  10. Outline 1. Intro 2. Part I: Improving Chorus 3. Part II: Deployment 4. Part III: Automating Over Time 5. Conclusion
  11. Outline 1. Intro 2. Part I: Improving Chorus – InstructableCrowd: Creating IF-THEN Rules via Conversation with the Crowd (Huang et al., CHI EA 2016) 3. Part II: Deployment 4. Part III: Automating Over Time 5. Conclusion. Ting-Hao K. Huang, Amos Azaria, Jeffrey P Bigham. InstructableCrowd: Creating IF-THEN Rules via Conversations with the Crowd. In CHI LBW 2016. (Best Paper Honorable Mention) Icon made by Pixel Buddha from www.flaticon.com
  12. A Rule = IF(s) + THEN(s). Image sources: https://commons.wikimedia.org/wiki/File:IFTTT_Logo.svg, https://www.flickr.com/photos/cjmartin/9261707401, https://www.flickr.com/photos/paperon/15641138784, https://www.flickr.com/photos/chriscoyier/16673560329, http://www.publicdomainpictures.net/view-image.php?image=23182
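
The "Rule = IF(s) + THEN(s)" structure on slide 12 can be pictured as a small data type. The sketch below is an illustration only: the `Rule` class, the sensor keys `weather` and `hour`, and the choice to require every IF to hold before any THEN fires are assumptions, not details taken from InstructableCrowd.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

State = Dict[str, Any]  # hypothetical snapshot of phone sensor readings


@dataclass
class Rule:
    conditions: List[Callable[[State], bool]]  # the IF(s)
    actions: List[Callable[[State], str]]      # the THEN(s)

    def fire(self, state: State) -> List[str]:
        """Run every THEN only when all IFs hold (assumed conjunction)."""
        if all(cond(state) for cond in self.conditions):
            return [act(state) for act in self.actions]
        return []


# Example rule: IF it is raining AND it is 7 o'clock THEN send a reminder.
umbrella_rule = Rule(
    conditions=[lambda s: s.get("weather") == "rain", lambda s: s.get("hour") == 7],
    actions=[lambda s: "notify: bring an umbrella"],
)
```

Calling `umbrella_rule.fire({"weather": "rain", "hour": 7})` returns the reminder, while any state that fails one of the conditions returns an empty list.
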
  13. InstructableCrowd: Creating IF-THEN Rules via Conversation with the Crowd
  14. InstructableCrowd Overview
  15. Worker Interface
  16. User Study • 10 participants, 6 scenarios, 10 workers per trial • Evaluation – App Selection (P/R/F1) – Attribute Filling (Accuracy)
  17. Evaluation: App Selection (P/R/F1); Attribute Filling (Accuracy)
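
For readers unfamiliar with the metrics named on slides 16 and 17, the sketch below shows the standard set-based precision, recall, and F1, plus a simple per-attribute accuracy. How the study actually scored partial matches is not stated on the slides, so the exact-match comparison and the example app names are assumptions.

```python
def precision_recall_f1(predicted, gold):
    """Set-based P/R/F1: which selected apps also appear in the gold rule."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def attribute_accuracy(predicted, gold):
    """Fraction of gold attributes whose predicted value matches exactly."""
    if not gold:
        return 0.0
    correct = sum(1 for key, value in gold.items() if predicted.get(key) == value)
    return correct / len(gold)


# Hypothetical trial: the crowd picked one extra app and got one attribute wrong.
p, r, f1 = precision_recall_f1({"Weather", "Alarm", "Email"}, {"Weather", "Alarm"})
acc = attribute_accuracy({"time": "07:00", "condition": "snow"},
                         {"time": "07:00", "condition": "rain"})
```
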
  18. What did we learn? • A crowd-powered conversational interface can be used to create IF-THEN rules.
  19. Outline 1. Intro 2. Part I: Improving Chorus 3. Part II: Deployment – Chorus Deployment (Huang et al., HCOMP 2016) – Chorus Dataset (Proposed) 4. Part III: Automating Over Time 5. Conclusion. Ting-Hao K. Huang, Walter S. Lasecki, Amos Azaria, Jeffrey P. Bigham. "Is there anything else I can help you with?": Challenges in Deploying an On-Demand Crowd-Powered Conversational Agent. In Proceedings of Conference on Human Computation & Crowdsourcing (HCOMP 2016), 2016, Austin, TX, USA.
  20. We deployed Chorus • Launched on May 20th, 2016. • 132 users used it during 1,028 conversational sessions • TalkingToTheCrowd.org
  21. System Overview
  22. How to recruit workers fast on-demand?
  23. How to recruit workers fast on-demand? • Two Common Practices – Start recruiting on-demand (Bigham, et al., 2010) – Keep workers on-call (Retainer) (Bernstein, et al., 2011) • Both are designed for short tasks
  24. Retainer Model. Timeline labels: Wait in Retainer, Conv. Starts, Conversation, Conv. Ends, Wait in Retainer. Workers’ waiting time costs money.
  25. Chorus’ Recruiting Method. Timeline labels: Conversation 1, Conversation 2, Post HIT, Fully Occupied, Conv. 1 Ends, Post HIT, Wait in Retainer.
  26. Is this recruiting method fast enough? • Avg first crowd response time = 88.351 sec • 56.08% of first crowd responses arrived within 1 min
  27. What challenges did we identify?
  28. Challenges Identified • Malicious workers & users • Identifying the end of a conversation • When workers’ consensus is not enough
  29. Challenges Identified • Malicious workers & users • Identifying the end of a conversation • When workers’ consensus is not enough
  30. Malicious Users • Abusive Language – Sexual content – Profanity – Hate speech – Threats of criminal acts • Solutions – Word detection
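
The "word detection" solution on slide 30 is not described further. One simple reading, offered here only as an assumption, is a blocklist check over tokenized messages. The function name `flag_message` and the placeholder terms are made up for the example; a deployed list would be curated by the system operators.

```python
import re

# Placeholder terms only; a real blocklist would be maintained separately.
BLOCKLIST = frozenset({"badword", "slur"})


def flag_message(text, blocklist=BLOCKLIST):
    """Return the blocklisted terms found in a message, case-insensitively."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sorted(set(tokens) & blocklist)
```

A message with any hit could then be hidden from workers or routed for review. Exact token matching like this misses misspellings and obfuscated terms, which is one reason it is only a starting point.
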
  31. What did we learn? • Deploying an on-demand, real-time crowd-powered agent is feasible. • Basic Statistics – Avg session duration = 10.493 min (SD = 14.139 min) – Avg # messages per session = 17.877 (SD = 24.158) – Avg cost per conversation = $2.48 (SD = $0.99)
  32. Outline 1. Intro 2. Part I: Improving Chorus 3. Part II: Deployment – Chorus Deployment (Huang et al., HCOMP 2016) – Chorus Dataset (Proposed) 4. Part III: Automating Over Time 5. Conclusion
  33. Chorus Dataset (Proposed) • Goal: Future Automation – Automatic response generation & selection – Dialog Learning (state tracking) • Data – Message, Vote (upvote / downvote), Note • Data Pre-processing – Anonymization – Inappropriate Content – Spamming Messages – Conversation Segmentation
  34. Outline 1. Intro 2. Part I: Improving Chorus 3. Part II: Deployment 4. Part III: Automating Over Time – Guardian: A Crowd-Powered Dialog System for Web APIs (Huang et al., HCOMP 2015; Huang et al., HCOMP WIP 2014) – Automate Chorus over time (Proposed) 5. Conclusion. Ting-Hao K. Huang, Walter S. Lasecki, Jeffrey P. Bigham. Guardian: A Crowd-Powered Spoken Dialog System for Web APIs. In HCOMP 2015. Ting-Hao K. Huang, Walter S. Lasecki, Alan L. Ritter, Jeffrey P. Bigham. Combining Non-Expert and Expert Crowd Work to Convert Web APIs to Dialog Systems. HCOMP WIP 2014.
  35. Empower Chorus with Multiple Dialog Systems
  36. How to build a set of dialog systems quickly?
  37. Use Web APIs to Empower Chorus. 16,583+ APIs
  38. Guardian: A Crowd-Powered Dialog System for Web APIs. Steps: 1. Talk and Extract Parameter 2. Call Web API 3. Interpret Result to User. User: “Hi, I’m in San Diego. Any Chinese restaurants here?” Extracted parameters: term = Chinese, location = San Diego. Yelp Search API 2.0 returns JSON: { ... "name": "Mandarin Wok Restaurant", ... "address": ["4227 Balboa Ave", ...], … } Response: “Mandarin Wok Restaurant is good! It’s on 4227 Balboa Ave.”
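
The three Guardian steps on slide 38 can be traced end to end with the slide's own example. The sketch below is an assumption-laden illustration: in Guardian the extraction and interpretation are done by crowd workers, whereas here step 1 is hard-coded and the API call is replaced by a hypothetical `stub_search` that returns a canned, flattened JSON object with the two fields visible on the slide. No request is sent to any real service.

```python
import json
from urllib.parse import urlencode


def build_query(params):
    """Step 2 (request side): encode the extracted parameters as a query string."""
    return urlencode(params)


def stub_search(query):
    """Stand-in for the Web API call; returns a canned JSON string, no network."""
    return json.dumps({"name": "Mandarin Wok Restaurant", "address": ["4227 Balboa Ave"]})


def interpret(raw_json):
    """Step 3: turn the JSON result into a sentence for the user."""
    result = json.loads(raw_json)
    return f"{result['name']} is good! It's on {result['address'][0]}."


params = {"term": "Chinese", "location": "San Diego"}  # step 1 output shown on the slide
query = build_query(params)
reply = interpret(stub_search(query))
```

The value of `reply` matches the response shown on the slide, which makes the division of labor visible: parameters in, JSON out, natural language back to the user.
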
  39. How to convert a Web API to a conversational agent? Which parameters to use? How to extract parameters? Example: “Hi, I’m in San Diego. Any Chinese restaurants here?” Parameters: term, location
  40. How to convert a Web API to a conversational agent? Which parameters to use? How to extract parameters? Example: “Hi, I’m in San Diego. Any Chinese restaurants here?” Parameters: term, location
  41. Select Parameter: Step (1): Collect Questions. Q: “What do you want to eat?” A: “I like Chinese food.” Q: “Which city are you in?” A: “I’m in Pittsburgh.” Q: “Is it dinner or lunch?” A: “Dinner.” ... Stages: Yelp API, Question Collection
  42. Select Parameter: Step (2): Filter Parameters. Q: “What do you want to eat?” A: “I like Chinese food.” Q: “Which city are you in?” A: “I’m in Pittsburgh.” Q: “Is it dinner or lunch?” A: “Dinner.” ... Parameters: offset, term, location, sw_latitude, sw_longitude, category_filter. Stages: Yelp API, Question Collection, Parameter Filtering
  43. Select Parameter: Step (3): Parameter-Question Matching. Q: “What do you want to eat?” A: “I like Chinese food.” Q: “Which city are you in?” A: “I’m in Pittsburgh.” Q: “Is it dinner or lunch?” A: “Dinner.” ... Parameters: offset, term, location, sw_latitude, sw_longitude, category_filter. Questions are matched to location, term, and category_filter. Label: Better Parameter. Stages: Yelp API, Question Collection, Parameter Filtering, Question-Parameter Matching
  44. How to convert a Web API to a conversational agent? Which parameters to use? How to extract parameters? Example: “Hi, I’m in San Diego. Any Chinese restaurants here?” Parameters: term, location
  45. Extract Parameters: Dialog ESP Game. Labels: Recruited Players, Time Constraint, Answer, Aggregate. Example: “Hi, I’m in San Diego.” yields Location = San Diego
  46. Aggregate Method 1: ESP + 1st. Cases: ESP answers do NOT match; ESP answer matches
  47. Aggregate Method 2: 1st Only. Cases: ESP answers do NOT match; ESP answer matches
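
The aggregate methods on slides 46 and 47 are shown only as diagrams, so the sketch below rests on stated assumptions: an "ESP match" is taken to mean two different players giving the same normalized answer, "ESP + 1st" is taken to fall back to the earliest answer when no match occurs, and "1st only" always takes the earliest answer. The function name, the tuple layout, and the normalization are illustrative; the 20-second default echoes the experiment settings.

```python
def normalize(answer):
    """Lowercase and collapse whitespace so trivially different answers match."""
    return " ".join(answer.lower().split())


def aggregate(answers, method="esp+1st", time_limit=20.0):
    """Pick one label from (seconds, player, answer) tuples.

    method is "esp+1st", "1st", or "esp". Returns None for an empty label.
    """
    in_time = sorted((a for a in answers if a[0] <= time_limit), key=lambda a: a[0])
    if not in_time:
        return None
    if method == "1st":
        return normalize(in_time[0][2])
    first_player = {}  # normalized answer -> first player who gave it
    for _, player, answer in in_time:
        key = normalize(answer)
        if key in first_player and first_player[key] != player:
            return key  # two different players agree: ESP match
        first_player.setdefault(key, player)
    if method == "esp+1st":
        return normalize(in_time[0][2])  # no match: fall back to earliest answer
    return None  # "esp": no match means an empty label


matched = [(3.0, "p1", "Boston"), (5.0, "p2", "San Diego"), (6.5, "p3", "san diego")]
unmatched = [(3.0, "p1", "Boston"), (5.0, "p2", "San Diego")]
```

Under these assumptions, `aggregate(matched)` returns the agreed answer even though a different answer arrived first, which is the quality-for-latency trade the two methods are compared on.
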
  48. Experiment • Data – Airline Travel Information System (ATIS) • Class A: Context Independent (simple) • Class D: Context Dependent • Class X: Unevaluable • Settings – Number of workers = 10 – Time constraint = 20 seconds – 2 aggregate methods – Using Amazon Mechanical Turk
  49. How good? How fast? Figure panels: F1-score by class (Class A, Class D, Class X) for CRF, 1st only, and ESP + 1st; response time (sec) by class for 1st only and ESP + 1st. Callouts: F1-score = 0.8 (Class D); < 9 sec (ESP + 1st)
  50. Guardian: A Crowd-Powered Dialog System for Web APIs. Steps: 1. Talk and Extract Parameter 2. Call Web API 3. Interpret Result to User. User: “Hi, I’m in San Diego. Any Chinese restaurants here?” Extracted parameters: term = Chinese, location = San Diego. Yelp Search API 2.0 returns JSON: { ... "name": "Mandarin Wok Restaurant", ... "address": ["4227 Balboa Ave", ...], … } Response: “Mandarin Wok Restaurant is good! It’s on 4227 Balboa Ave.”
  51. End-to-end Evaluation (TCR), by Web API task • Find Chinese restaurants in Pittsburgh: API Only 9 / 10, API + Crowd Recover 10 / 10, Domain Referenced 0.96 • Check current weather by using a zip code: API Only 9 / 10, API + Crowd Recover 9 / 10, Domain Referenced 0.94 • Find information of “Titanic”: API Only 6 / 10, API + Crowd Recover 10 / 10, Domain Referenced 0.88
  52. What did we learn? • Using a non-expert crowd to convert Web APIs to dialog systems is feasible. Stages: Define Parameters, Extract Parameters
  53. Outline 1. Intro 2. Part I: Improving Chorus 3. Part II: Deployment 4. Part III: Automating Over Time – Guardian: A Crowd-Powered Dialog System for Web APIs (Huang et al., HCOMP 2015; Huang et al., HCOMP WIP 2014) – Automate Chorus over time (Proposed) 5. Conclusion
  54. Empower Chorus with Multiple Dialog Systems
  55. Initial Chorus
  56. Automatic Responder Selection • Label: Upvote/Downvote • Features: • Conversation content • Previously selected bot • Generated text • End-user’s responses • …
  57. Automatic Response Voting (alongside Automatic Responder Selection) • Label: Upvote/Downvote • Features: • Conversation content • Previously selected bot • Metadata of the bot • Other response candidates • …
  58. Adjusting Workers’ Workload (alongside Automatic Response Voting and Automatic Responder Selection) • Bootstrapping • Bots vs. workers • Competing for votes
  59. Preliminary Results • 3 automatic bots • Randomly selected each turn
  60. Preliminary Results (Cont.) Humans are good. Auto bots receive many more downvotes.
  61. Conclusion • What did we do? – InstructableCrowd (CHI EA 2016) • Enabling Chorus to create IF-THEN rules – Chorus Deployment (HCOMP 2016) • Chorus is deployable – Guardian (HCOMP 2015; HCOMP WIP 2014) • Converting a Web API to a dialog system • What will we do? – Automate Chorus over time – Release Chorus dataset
  62. Contributions. Labels: Fully-Automated System, Hybrid System, Minor Automated Assistance. Contributions: Deploying Chorus; Automate Chorus Over Time; Data for Future Automation
  63. Timeline • January - March 2017: Automating response voting • March - May 2017: Automating responder selection • May - September 2017: Automating dynamic workload assignment • September - December 2017: Chorus Dataset • September - December 2017: Thesis writing • Spring 2018: Thesis Defense
  64. Q&A TalkingToTheCrowd.org
  65. References • Zhao, T., Lee, K., & Eskenazi, M. (2016). DialPort: Connecting the Spoken Dialog Research Community to Real User Data. arXiv preprint arXiv:1606.02562. • Banchs, R. E., & Li, H. (2012, July). IRIS: a chat-oriented dialogue system based on the vector space model. In Proceedings of the ACL 2012 System Demonstrations (pp. 37-42). Association for Computational Linguistics. • Walker, M. A., Stent, A., Mairesse, F., & Prasad, R. (2007). Individual and domain adaptation in sentence planning for dialogue. Journal of Artificial Intelligence Research, 30, 413-456. • Sun, M. (2016). Adapting Spoken Dialog Systems Towards Domains and Users. Doctoral dissertation, YAHOO! Research. • Bigham, J. P., Jayant, C., Ji, H., Little, G., Miller, A., Miller, R. C., ... & Yeh, T. (2010, October). VizWiz: nearly real-time answers to visual questions. In Proceedings of the 23rd annual ACM symposium on User interface software and technology (pp. 333-342). ACM. • Bernstein, M. S., Brandt, J., Miller, R. C., & Karger, D. R. (2011, October). Crowds in two seconds: Enabling realtime crowd-powered interfaces. In Proceedings of the 24th annual ACM symposium on User interface software and technology (pp. 33-42). ACM.
  66. Publication List 1. "Is there anything else I can help you with?": Challenges in Deploying an On-Demand Crowd-Powered Conversational Agent. Ting-Hao K. Huang, Walter S. Lasecki, Amos Azaria, Jeffrey P. Bigham. In Proceedings of Conference on Human Computation & Crowdsourcing (HCOMP 2016), 2016, Austin, TX, USA. 2. Guardian: A Crowd-Powered Spoken Dialog System for Web APIs. Ting-Hao K. Huang, Walter S. Lasecki, Jeffrey P. Bigham. In Conference on Human Computation & Crowdsourcing (HCOMP 2015), pages 62–71, November, 2015, San Diego, USA. 3. InstructableCrowd: Creating IF-THEN Rules via Conversations with the Crowd. Ting-Hao K. Huang, Amos Azaria, Jeffrey P Bigham. In CHI '16 Late-Breaking Work on Human Factors in Computing Systems (CHI LBW 2016), May, 2016, San Jose, CA, USA. (Best Paper Honorable Mention Award) 4. Combining Non-Expert and Expert Crowd Work to Convert Web APIs to Dialog Systems. Ting-Hao K. Huang, Walter S. Lasecki, Alan L. Ritter, Jeffrey P. Bigham. Work-in-Progress paper in the Proceedings of Conference on Human Computation & Crowdsourcing (HCOMP WIP 2014), pages 22-23, November 2-4, 2014, Pittsburgh, USA.
  67. Backup Slides
  68. 4 Conditions: 1. User Only 2. Crowd Only 3. Crowd + User 4. Crowd Voting
  69. Worker Interface
  70. Trade-offs (Class A). Figure panels: avg. response time (sec) vs. # players; F1-score vs. # players; F1-score vs. avg. response time (sec) for 5 to 10 players. Series: ESP + 1st (20 sec), ESP + 1st (15 sec), 1st only (20 sec), 1st only (15 sec). Takeaways: More Workers, Faster; More Workers, Better Quality; Faster, Worse Quality
  71. Aggregate Method 1: ESP Only. Cases: ESP answers do NOT match (Empty Label); ESP answer matches
  72. Evaluation. Figure: MAP and MRR (scale 0 to 1) for Question Matching, Not Unnatural, Ask Siri, and Ask a Friend. Question Matching outperforms all baselines.
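
MAP and MRR on slide 72 are standard ranking metrics. The sketch below uses their textbook definitions on one hypothetical ranked list of Yelp parameters; the ranking and the set of relevant parameters are invented for illustration and are not results from the study.

```python
def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0.0 if none appears."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked, relevant):
    """Mean of precision@k taken at each rank k that holds a relevant item."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean(values):
    """MAP and MRR are these per-query scores averaged over all queries."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


# Hypothetical ranking of parameters for one API, with two relevant ones.
ranked = ["offset", "term", "location"]
relevant = {"term", "location"}
rr = reciprocal_rank(ranked, relevant)
ap = average_precision(ranked, relevant)
```

Here the first relevant parameter sits at rank 2, so the reciprocal rank is 0.5, and the average precision is (1/2 + 2/3) / 2.
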
