
Guardian: A Crowd-Powered Spoken Dialog System for Web APIs


Ting-Hao K. Huang, Walter S Lasecki, Jeffrey P Bigham. (2015). Guardian: A Crowd-Powered Spoken Dialog System for Web APIs. Conference on Human Computation & Crowdsourcing (HCOMP 2015), November, 2015, San Diego, USA.


  1. Guardian: A Crowd-Powered Spoken Dialog System for Web APIs. Ting-Hao (Kenneth) Huang (Carnegie Mellon University), Walter S. Lasecki (University of Michigan), Jeffrey P. Bigham (Carnegie Mellon University). Photo: http://www.flickr.com/photos/joshmichtom/4311110421/
  2. "What time is it?" "It's 9:30." (Kenneth's apartment)
  3. "How was the Pirates game last night?" (Kenneth's apartment)
  4. "How was the Steelers game yesterday?" (Kenneth's apartment)
  5. "Is the movie Martian still playing in theaters?" (Kenneth's apartment)
  6. Use Web APIs to Empower Dialog Systems
  7. Gap between User & Machine?
  8. A Crowdsourcing Solution
  9. Two Challenges: Define Parameters and Extract Parameters. "Hi, I'm in San Diego. Any Chinese restaurants here?" (term, location)
  10. How Do Dialog Systems Usually Do This? Define Parameters / Extract Parameters. "Hi, I'm in San Diego. Any Chinese restaurants here?" (term, location)
  11. Bridging this Gap is Expensive. Defining parameters requires experts: experts are expensive, and most services are not designed for dialog systems (unsupervised slot induction). Extracting parameters requires data, which we don't have: supervised slot filling / entity recognition needs labeled data, and state tracking needs dialog data (unsupervised slot filling).
  12. Can the Crowd Do It? Define Parameters / Extract Parameters.
  13. Define Parameters. "Hi, I'm in San Diego. Any Chinese restaurants here?" (term, location)
  14. How does a machine understand a Web API? 1. Which parameters should it use? 2. What questions should it ask the user to elicit these parameters? The Yelp Search API 2.0 has 22 parameters.
  15. Parameter Rating Problem: pick good parameters for the dialog system (offset, term, location, sw_latitude, sw_longitude, category_filter, accuracy, deals_filter, radius_filter, ...).
  16. How about just doing a survey? (Interface: task, parameter name / description)
  17. Baselines: "Not Unnatural", "Ask Siri", "Ask a Friend". MAP and MRR, averaged over 8 Web APIs' parameters. Results are not so good...
  18. Match Questions with Parameters: Question Collection, then Parameter Filtering, then Question-Parameter Matching. Example question-answer pairs: "What do you want to eat?" / "I like Chinese food."; "Which city are you in?" / "I'm in Pittsburgh."; "Is it dinner or lunch?" / "Dinner." Yelp API parameters: term, location, sw_latitude, sw_longitude, category_filter, ...
  19. Evaluation on Parameter Ranking: Question Matching outperforms all baselines (MAP and MRR, averaged over 8 Web APIs' parameters).
  20. Questions Collected Already! 1. Which parameters should it use? 2. What questions should it ask the user to elicit these parameters? The Yelp Search API 2.0 has 22 parameters.
  21. Extract Parameters. "Hi, I'm in San Diego. Any Chinese restaurants here?" (term, location)
  22. Dialog ESP Game: "Hi, I'm in San Diego." Answers from recruited players are aggregated under a time constraint, yielding location = San Diego.
  23. Guardian: A Crowd-Powered Spoken Dialog System for Web APIs. (1) Talk and extract parameters: "Hi, I'm in San Diego. Any Chinese restaurants here?" yields term = Chinese, location = San Diego. (2) Call the Web API (Yelp Search API 2.0), which returns JSON: { ... "name": "Mandarin Wok Restaurant", ... "address": ["4227 Balboa Ave", ...], ... }. (3) Interpret the result to the user: "Mandarin Wok Restaurant is good! It's on 4227 Balboa Ave."
  24. Engineering Challenges: real-time response (retainer model); conversing with the user (Chorus); speech recognition (Web Speech API); parameter extraction (Dialog ESP Game); JSON visualization (JSON visualizer); response generation assistant (jQuery); workflow control (game-like interface); dialog management (finite-state machine); crowdsourcing platform (Mechanical Turk).
  25. System Evaluation. Tasks: (a) find Chinese restaurants in Pittsburgh, (b) check the current weather using a zip code, (c) find information about "Titanic". Valid JSON: 9/10, 9/10, 6/10. Task completion: 10/10, 9/10, 10/10. Domain-referenced TCR: 0.96, 0.94, 0.88.
  26. Guardian: A Hybrid Framework. Annotates data on the fly!
  27. What's next? More automation (slot filling / entity recognition, dialog management, response generation); 1,000+ APIs?; the future of dialog systems (what if you could really talk to a machine, on a wearable device?).
  28. Thank you! http://www.flickr.com/photos/joshmichtom/4311110421/

Editor's Notes

  • Hi everyone, I am Kenneth from Carnegie Mellon University, Pittsburgh.
    Today, I am going to talk about Guardian, a crowd-powered dialog system for Web APIs.
    This is joint work with Walter from Michigan and Jeff from CMU.

    For this talk, we found this interesting photo on Flickr.
    It says DO.NOT.TALK.TO.MACHINE.
    I mean, why?
    Today we have many devices that we can talk to.
    We have Siri, we have Cortana, we have Google Now, and we have Amazon Echo.
    I have an Echo in my apartment.





  • I can ask simple questions like "What time is it?", and it will say, "It's 9:30."
    Or I can ask about the weather, and it will tell me today's weather.

    However, as a researcher from Pittsburgh, I really want to ask Echo this question:
  • Echo is not able to answer it.
    This is NOT because of bad speech recognition.
    When I ask "What is the Pittsburgh Pirates?", it can tell you basic information about the baseball team.

    It cannot answer this question because it does not have the knowledge and service supported in the back end.

    OK, let's try again.
  • How about the Steelers?

  • How about the movie?

    It turns out that dialog systems almost always have limited capabilities.
    They can answer questions in certain domains, and the answers are reasonably good.
    But when you ask something outside the system's scope, it has almost no ability to handle it.

    How can we empower an intelligent agent to handle many different domains?
  • We think of Web APIs.

    This page shows ProgrammableWeb, a website that collects Web APIs.
    Nowadays, it contains 14,000 Web APIs.
    That is a lot of resources we can explore.

    A Web API is a representation of the knowledge available on the Internet.
    Most Web APIs follow the same protocol, REST, which makes life much easier when you try to implement a wrapper for a new Web API.

    However, adding an arbitrary new service to a dialog system is not easy.
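Because REST-style APIs generally accept their parameters as a query string, a generic wrapper can be very small. Here is a minimal sketch in Python; the endpoint and parameter names are hypothetical, chosen only to echo the restaurant-search example in the talk:

```python
from urllib.parse import urlencode

def build_api_request(base_url, params):
    """Build a GET request URL for a REST-style Web API.

    Since REST APIs take parameters as a query string, one
    generic helper can wrap many different services.
    """
    # Drop parameters the dialog has not filled in yet.
    filled = {k: v for k, v in params.items() if v is not None}
    return base_url + "?" + urlencode(filled)

# Hypothetical endpoint and parameter names, for illustration only.
url = build_api_request(
    "https://api.example.com/v2/search",
    {"term": "Chinese", "location": "San Diego", "radius_filter": None},
)
```

A real wrapper would also handle authentication and error responses, but the uniform query-string convention is what makes adding a new service mechanically easy.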
  • All dialog systems in the world face this challenge:
    humans and machines do not talk to each other easily.

    Machines have no problem talking with other machines.

    However, there is a significant communication gap between humans and machines.
    A machine needs to understand your words to do the task for you.
  • In this work, we propose to use crowdsourcing to bridge this gap.

    How? Let's take a closer look:
  • In any dialog system, if you want to add a new service, you need to solve two main problems:
    define the slots, and fill the slots.
    In other words, in the context of APIs, that means defining the parameters and filling the parameters.

    For example, suppose you want to add the Yelp Search API to your system.
    First, you need to know what information this API requires:
    the location, and the query term.

    Then your system needs something that extracts the location and query term for you, so that you can call and use the API.

    How do modern dialog systems usually do this?
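The two problems, defining slots and filling them, can be sketched as a minimal slot store. The slot names follow the Yelp example above, but the code is a toy illustration, not Guardian's implementation:

```python
# Two slots defined for a (hypothetical) restaurant-search API; the
# dialog system must fill both before it can call the API.
REQUIRED_SLOTS = ["location", "term"]

def missing_slots(filled):
    """Return the slots that still need to be elicited from the user."""
    return [s for s in REQUIRED_SLOTS if s not in filled]

slots = {}
slots["location"] = "San Diego"   # extracted from "Hi, I'm in San Diego."
```

Defining parameters amounts to choosing `REQUIRED_SLOTS`; extracting parameters amounts to filling the `slots` dictionary from the conversation.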
  • Not surprisingly, they have experts or the API provider define a set of parameters that fit the capability of the service and the context of the dialog.
    For parameter extraction, modern dialog systems usually use automated approaches.
    There are plenty of supervised learning methods, such as CRFs or RNNs, that can train an entity recognizer or slot filler from labeled training data.

    What is the problem here?
  • Those bridging steps are expensive and painful.

    Most services are not designed for dialog systems, so you need to design a set of proper slots that can be used in dialogs about the service.
    However, experts who understand both the API and the dialog system are not always available.
    Even when they are available, they can be expensive.

    More importantly, automated parameter extraction technology usually relies on labeled training data.
    But we do not have that data.
    We do not always have training data available for arbitrary APIs.
  • In response to these pains, we propose using crowdsourcing to solve both problems:
    parameter definition and parameter extraction.

    How do we do it?
  • Let's start with defining parameters.
    How can the crowd do it?
  • Let’s take a step back and think about this problem.
    What does a machine understand a web API?
    There are two main things:
    First, know the parameters to use.
    Second, know the question to ask.

    First, in Yelp Search API, there are 22 parameters in total. Not all of them fits in the context of dialog systems.
    For instance, in a human conversation, it’s not likely you’re going to specify the location by longitude and latitude.
    Only the location parameter which takes location names as input can be used in the dialog system.

    How to choose the reasonable parameter?

    Second, now you know Yelp API requires the parameter “location”, but how to make the user provide his location?
    -- Simply by asking.
    The system needs to know what question to ask. In this case, “where are you?”

    Let’s start from the first problem.


  • How do we choose parameters?

    We frame this as a Parameter Rating Problem.
    Imagine you have a list of all the parameters of the Yelp API.
    The task is to rate how good each parameter is for dialog systems.
    The output is a rating score attached to each parameter, which gives you a ranked list of all the parameters.
  • As crowdsourcing people would ask: why not just tell the crowd what you want and run a survey on each parameter?
    So we did.

    This is our interface. The survey was conducted on CrowdFlower.
    For each parameter, we show the parameter name, the parameter's description, and the task of the API.
    Then we ask the worker to imagine a scenario and rate how likely they would be, as a user, to provide this parameter's information.

    To be careful, we ran the experiment with three different scenarios.
    First, Ask Siri: imagine you are talking to Siri; how likely are you to provide this information?
    Second, Ask a Friend: imagine you cannot use the Internet right now and call a friend for help; how likely are you to provide this information?
    Third, we also ask the workers to rate how weird the parameter is, and use "Not Unnatural" as the rating.

    How does this work?
  • We ran these three experiments on 8 Web APIs.
    For evaluation, we compare the output ranking against an expert-annotated ranking.
    We use MAP (Mean Average Precision) and MRR (Mean Reciprocal Rank), two common evaluation metrics for ranked lists.
    It turned out that none of these three survey questions is good enough.
    The MAP and MRR are all low, and when you eyeball the output, it is not close to practical use.

    So we might need some workflow that is more complicated...
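MAP and MRR can be computed in a few lines. A sketch, assuming each evaluation run is a ranked parameter list paired with the expert-annotated set of relevant parameters:

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked parameter list against
    the set of parameters the experts marked as relevant."""
    hits, total = 0, 0.0
    for i, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item; 0 if none appears."""
    for i, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / i
    return 0.0

def map_mrr(runs):
    """Mean AP and mean RR over (ranked_list, relevant_set) pairs."""
    n = len(runs)
    return (sum(average_precision(r, rel) for r, rel in runs) / n,
            sum(reciprocal_rank(r, rel) for r, rel in runs) / n)
```

For example, ranking `location` first, `offset` second, and `term` third when `location` and `term` are relevant gives AP = (1/1 + 2/3) / 2 and RR = 1.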
  • Like this!

    The idea we propose here is to collect questions related to the task, and then have workers use the questions to vote for parameters.
    Take the Yelp API for example: we first collect all possible questions from the crowd,
    like "What do you want to eat?", "Where are you?", "What's your budget?" and so on.
    Then we ask workers to associate questions with parameters.
    So essentially, the workers use questions to vote for parameters.
    We assume the parameters associated with more questions are better for dialog systems.

    How does this work?
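The question-as-vote aggregation can be sketched as a simple count. The worker associations below are hypothetical examples in the spirit of the Yelp task:

```python
from collections import Counter

def rank_parameters(associations):
    """Rank API parameters by how many crowd-collected questions
    workers associated with each one (each question is a vote)."""
    votes = Counter(param for _question, param in associations)
    return [p for p, _ in votes.most_common()]

# Hypothetical worker associations of (question, chosen parameter).
ranking = rank_parameters([
    ("What do you want to eat?", "term"),
    ("Where are you?", "location"),
    ("Which city are you in?", "location"),
    ("Any cuisine in mind?", "term"),
    ("Where should I look?", "location"),
])
```

Parameters that attract no natural questions, like `sw_latitude`, simply never accumulate votes and fall to the bottom of the ranking.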

  • It turned out that our workflow outperforms all three baselines.
    When you look at the results, the quality is much better and close to practical use.

  • Even better, this workflow naturally solves the second question of this stage!
    The questions have already been collected in the workflow, and we can just use them.

    So, let's move on to the second challenge.
  • How do you extract parameters from a running conversation if you don't have any training data?
    How can we do it?

    We think real-time crowdsourcing can help.
  • We propose a multi-player Dialog ESP Game to extract parameter values from a running conversation.
    The ESP Game was originally proposed for image labeling; here we adapt the idea to dialog.
    In the interface, we show the dialog and the description of the parameter, and ask each worker to type what the other workers might type.
    If two answers match each other, we take that as the extracted parameter value.

    This method works well. Now we can extract parameters without having any training data.

    Therefore, based on all the work we've done, we propose a system called "Guardian":
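The agreement check at the heart of the Dialog ESP Game can be sketched as follows. Normalizing by lowercasing and the two-answer threshold are my assumptions for the sketch, not details taken from the paper:

```python
from collections import Counter

def esp_agree(answers, min_match=2):
    """Accept a parameter value once `min_match` independent workers
    submit the same answer, after simple normalization."""
    counts = Counter(a.strip().lower() for a in answers)
    for value, n in counts.most_common():
        if n >= min_match:
            return value
    return None          # no agreement yet; keep waiting for answers
```

Requiring independent agreement is what lets the game produce trustworthy labels without any gold data: a single worker's guess is never accepted on its own.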
  • Guardian's framework contains three main steps:
    First, the workers have a conversation with the user and extract the parameter values with a Dialog ESP Game.
    Second, behind the scenes, the system uses these values to call the Yelp API and run the query.
    Finally, the Yelp API returns its result as a JSON file, and we also use the crowd to interpret the response.
    We visualize the JSON file as a user-friendly interface, so the workers can click through the data and explore the information inside the JSON.

    By using Guardian, we can have a running dialog system without using any training data or even prior knowledge of the task.
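In Guardian the third step, turning the API's JSON into a reply, is done by the crowd. Purely as a sketch of what that interpretation produces, here is a scripted version using the example response fields from the slides:

```python
import json

def interpret_result(raw_json):
    """Turn an API's JSON response into a short spoken reply.
    The field names mirror the Yelp example on the slide; a real
    response has many more fields for the crowd to explore."""
    data = json.loads(raw_json)
    return "{} is good! It's on {}.".format(data["name"], data["address"][0])

reply = interpret_result(
    '{"name": "Mandarin Wok Restaurant", "address": ["4227 Balboa Ave"]}'
)
```

The point of using the crowd instead of a template like this is that workers can pick out the relevant fields for an API they have never seen before.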
  • Building an end-to-end Guardian system to test our idea raised a lot of engineering challenges.
    For real-time response, we implement a retainer model;
    for the capability to converse with the user, we use the propose-and-vote mechanism of Chorus;
    for speech recognition, we use the Web Speech API of Google Chrome;
    for parameter extraction, we implement the Dialog ESP Game;
    we also use a JSON visualizer to visualize the JSON object.
    Finally, we use a game-like interface to put all the small features together.
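The finite-state dialog management listed on the slide can be sketched as a tiny state machine that elicits one missing slot per turn. The states and slot names here are illustrative, not Guardian's actual implementation:

```python
class DialogFSM:
    """Tiny finite-state dialog manager: elicit each missing slot in
    turn, then move to the API-calling state."""

    def __init__(self, slots):
        self.pending = list(slots)      # slots still to elicit
        self.filled = {}
        self.state = "elicit" if self.pending else "call_api"

    def step(self, slot_value=None):
        """Advance one turn; `slot_value` is the extracted answer."""
        if self.state == "elicit" and slot_value is not None:
            self.filled[self.pending.pop(0)] = slot_value
            if not self.pending:
                self.state = "call_api"
        return self.state

fsm = DialogFSM(["location", "term"])
fsm.step("San Diego")   # fills `location`
fsm.step("Chinese")     # fills `term`; all slots are now filled
```

A fixed state machine like this is enough because the crowd handles the open-ended parts of the conversation; the system only needs to track which slots remain.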
  • We implemented the system on three different Web APIs:
    the Yelp API for restaurant search, the Weather Underground API for weather queries, and the Rotten Tomatoes API for movie queries.
    We designed a small task for each API and ran 10 trials on each system.

    Here we only talk about the task completion rate.
    By task completion we mean that the system provides valid responses containing the information the user requires.
    You can see the task completion rate is almost perfect.
    This is because, first, the tasks here are relatively simple, and second, even when the results returned from the API are incorrect, crowd workers are usually able to figure it out and recover the correct answer.

    We also compare our results with the task completion rates reported in the literature.
    The numbers are not directly comparable, but you can still see that our system reaches the same level of task completion as automated systems.
  • At the end of the day, we have a hybrid dialog system framework run by both the crowd and the machine.
    It doesn't require any training data or domain knowledge.
    Furthermore, it keeps annotating data as it runs.
    So after you run this system on an API for a while, you have a small amount of labeled data and can start thinking about possible automation.

    That brings us to the future work.
  • What’s next?

    The first thing comes to our mind is automations.
    Each steps in Guardian system can be somewhat automated.
    Entity extraction, dialog management, and response generation.
    Once we start running Guardian, we start creating annotated data;
    And once we collect enough data, all the automations will become possible.

    Second, what happens when we what to add 1000 APIs? Will there be any new challenges?
    We would also like to explore on that.

    And the ultimate question we want to ask is, if we have a system that contains thousands of web APIs and can actually talk to us, what are we going to do with it?
  • Maybe one day, we can proudly say: oh, sure, you can talk to the machine.
    But inside that smart machine, some crowd workers are working hard online to help it.
    So, sort of.

    Thank you very much.