
STAR: A Schema-Guided Dialog Dataset for Transfer Learning | Rasa Summit 2021


  1. STAR: A Schema-Guided Dialog Dataset for Transfer Learning
     Johannes Mosig*, Shikib Mehri*, Thomas Kober
     @shikibmehri
  2. Motivation
     Pre-trained models changed NLP
     Large-scale pre-training → Downstream fine-tuning
  3. Motivation
     Pre-trained models have helped significantly in open-domain dialog
     ● DialoGPT
     ● Meena
     ● Blender
  4. Motivation
     But what about task-oriented dialog?
     ● Unlike chit-chat systems, task-oriented systems must accomplish a goal
       ○ Limited space of valid responses
       ○ Often must interface with API/knowledge
     ● After training on Reddit, can a system make restaurant reservations?
  5. Motivation
     Scenario: Mary joins a COVID-19 hotline. Mary has human-level NLU/NLG and open-domain dialog ability.
     Can Mary do her job without any training?
     Answer: Probably not.
  6. Motivation
     So what can we get from large-scale pre-training?
     ● Language understanding
     ● Language generation
     ● General dialog skills
     What don’t we get?
     ● Task-specific instructions/rules
  7. Motivation
     Scenario: Mary joins a COVID-19 hotline. Mary has human-level NLU/NLG and open-domain dialog ability.
     What is the fastest way to train Mary?
     ● A training corpus that covers many situations → Training data
     ● A few examples? → Few-shot dialog
     ● A flow-chart describing the task? → Task-specific schema
  8. Motivation
     Scenario: COVID-19 is cured! Mary is out of a job. Luckily she found a new job at a tech support call center. They give her a task-specific schema that looks a lot like what she used at her old job.
     Can Mary do her job without any training?
     Answer: Probably.
  9. Dialog Schema Paradigm
     ● Schema acts as an inductive bias
     ● Pre-train → Train with schemas → Zero-shot on new task
     ● What transfers?
       ○ NLU, NLG, general dialog skills
       ○ How to follow the schema
  10. Dataset Overview
      STAR: Schema-Guided Dialog Dataset for Transfer Learning
      ● Wizard of Oz data collection
      ● Schema-guided data collection
      ● 24 tasks in 13 domains
  11. Dataset Overview (no text in transcript)
  12. Dialog Schemas (no text in transcript)
  14. Motivation (continued)
      What else makes a task-oriented dialog dataset good?
      ● System actions should be consistent
      ● Realistic and variable user behavior
      ● API/KB interface should be explicit
      ● There should be a progression of difficulty in the data
  15. Consistency of System Actions
      ● Dialog often has a one-to-many problem → we try to eliminate it
        ○ Responses in a task-oriented system need not be diverse
        ○ System action at each timestep should be deterministic
      ● Achieved by:
        ○ Instructing AMT workers to follow the schema
        ○ A suggestions module
  16. Suggestions Module
      ● Wizard enters a response
      ● Gets a list of suggestions from the schema (using an NLU model)
      ● Can select a suggestion (80%) or write a custom response (20%)
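The suggestion step on slide 16 could be sketched roughly as follows. The slides only say that an NLU model maps the wizard's draft to schema responses; in this sketch a plain string-similarity ranking from Python's `difflib` stands in for that model, and `SCHEMA_RESPONSES` and `suggest` are hypothetical names with reply wording borrowed from the example dialogs in this deck.

```python
from difflib import SequenceMatcher

# Hypothetical schema replies (wording borrowed from the example dialogs).
SCHEMA_RESPONSES = [
    "may i have your name , please ?",
    "when are you arriving ?",
    "at what venue is the party taking place ?",
]

def suggest(draft, candidates=SCHEMA_RESPONSES, top_k=2):
    """Rank schema replies by character-level similarity to the wizard's draft.

    This is a stand-in for the NLU model mentioned on the slide, not the
    authors' implementation.
    """
    ranked = sorted(
        candidates,
        key=lambda reply: SequenceMatcher(None, draft.lower(), reply).ratio(),
        reverse=True,
    )
    return ranked[:top_k]

# The wizard types a free-form draft and is offered the closest schema replies.
suggestions = suggest("what is your name please?")
```

Offering the top few schema replies and letting the wizard pick one is what pushes system turns toward consistent wording, while still leaving room for a custom response.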
  17. Realistic User Behavior
      ● Realistic dialog rarely follows the task schema (happy path)
      ● Users:
        ○ Change their mind
        ○ Request explanation/justification
        ○ Engage in small talk
        ○ Get angry
        ○ Anything that deviates from the standard schema
  18. In-Dialog User Instructions (no text in transcript)
  19. Explicit API Interface
      ● Can’t transfer to a new task without:
        ○ making task-specific API requests
        ○ interpreting task-specific API outputs
      ● Make API requests/responses part of the schema and the dialog
  20. API as Part of the Dialog
  21. Instead of …
  22. Formulate Dialog as a 3-Party Interaction
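One way to picture the three-party formulation is to treat `user`, `wizard`, and `api` as equal speakers in a single turn sequence, which is how the example dialogs in this deck are written. The sketch below splits such a flat transcript into turns; `Turn` and `split_turns` are hypothetical helper names, not part of the STAR codebase.

```python
import re
from typing import List, NamedTuple

class Turn(NamedTuple):
    speaker: str  # "user", "wizard", or "api"
    text: str

# A speaker label is one of the three parties followed by a colon.
_SPEAKER = re.compile(r"\b(user|wizard|api):\s*")

def split_turns(transcript: str) -> List[Turn]:
    """Split 'user: ... wizard: ... api: ...' into a list of turns."""
    # re.split with a capturing group yields
    # [prefix, speaker1, text1, speaker2, text2, ...]
    parts = _SPEAKER.split(transcript)
    return [Turn(s, t.strip()) for s, t in zip(parts[1::2], parts[2::2])]

turns = split_turns(
    "user: hello wizard: name = old town inn , requesttype = check "
    "api: message = available wizard: can i book this room for you ?"
)
```

Here the wizard's API request and the API's reply are ordinary turns, so a model reading the dialog history sees them in the same stream as the natural-language utterances.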
  24. Progression of Difficulty
      ● Happy Single-Task
      ● Unhappy Single-Task
      ● Multi-Task
  25. Data Collection Stages
      ● STAGE I: Single-Task Tutorial + Quiz [10 questions]
      ● STAGE II: Happy/Unhappy Single-Task Collection
      ● STAGE III: Multi-Task Tutorial + Quiz [10 questions]
      ● STAGE IV: Multi-Task Data Collection
  26. Example Quiz
      For the assistant, which of these things is the MOST important?
      (a) Being helpful to the user
      (b) Following the flow chart of the current task whenever possible
      (c) Making the conversation as short as possible
      (d) Making the conversation as long as possible
  27. Data Collection Interface (no text in transcript)
  28. Dataset Overview
      ● 5,820 dialogs (127,833 utterances) across 24 tasks (13 domains)
        ○ Single-task: happy (2,688) / unhappy (1,464)
        ○ Multi-task (1,668)
      ● Properties
        ○ Schema-guided data collection for transfer learning
        ○ System-side consistency
        ○ Explicit API requests
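As a quick sanity check on the figures from slide 28, the three subsets do sum to the stated total, and the utterance count implies roughly 22 utterances per dialog. The variable names below are illustrative only.

```python
# Dialog counts as stated on slide 28.
splits = {"happy": 2688, "unhappy": 1464, "multi_task": 1668}

total_dialogs = sum(splits.values())          # should match the stated 5,820
avg_utterances = 127833 / total_dialogs       # utterances per dialog
shares = {name: count / total_dialogs for name, count in splits.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1%} of dialogs")
print(f"average utterances per dialog: {avg_utterances:.1f}")
```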
  29. Happy Dialog
      user: hello
      wizard: hello , how can i help ?
      user: my name is angela and i need to rsvp to a party
      wizard: at what venue is the party taking place ?
      user: north heights venue
      wizard: could you provide me with the name of the host , please ?
      user: the hosts name is alexis
      wizard: when are you planning to arrive at the party ?
      …
      wizard: name = north heights venue, hostname = alexia, guestname = angela, arrivaltime = 2 pm, numberguests = 6 , needparking = api . is _ equal _ to ( yes )
      api: message = thank you for your rsvp . see you there .
      wizard: great , your rsvp is all done and confirmed !
      user: thank you !
  31. Unhappy Dialog
      user: hi ! i need a room at the shadyside , can you book it for me ?
      wizard: when are you arriving ?
      user: may 11th - 18th , and i need two beds in the room , must have them
      wizard: may i have your name , please ?
      user: yup , i ' m mark but i need to change the hotel to the old town inn , i just remembered that my friends said i should check that place out . still need two beds
      wizard: name = old town inn , startdate = 11th , enddate = 18th , customername = mark , requesttype = check
      api: message = available , hotelname = old town inn
      wizard: alright , the old town inn ticks all of your boxes , can i book this room for you ?
      user: hate to do this to you but my friends dick and jane stayed at another old town inn in a completely different city . i ' m not going to stay there without a recommendation . can you try the hilton ? this blog i love just raved about the authentic feel of the place .
  33. Multi-Task Dialog
      user: i ' m havin a party for my friend on sunday . help me out
      wizard: when would you like your party to kick off ?
      user: 5 pm
      wizard: how many people would you like to invite ?
      user: 80 . she ' s a big deal . . .
      user: my name is angela . actually , what would the weather be like on sunday ? we might just take it outside
      wizard: for what location would you like the weather forecast ? . . .
      user: great . let ' s book a restaurant instead then . somewhere in the west
      wizard: i can filter restaurants by name , location , food type , its average rating , whether or not they take reservations , and whether or not they have a delivery service .
      user: either chinese or pizza is fine
      Tasks in this dialog: party_plan, weather, restaurant_search, eventually restaurant_reserve
  36. Tasks
      ● Response Generation
        ○ Produce the next response given (i) dialog history, (ii) API outputs, (iii) schema
      ● Next Action Prediction
        ○ Predict the next action (ask_name, query, confirm_booking) given (i) dialog history, (ii) API outputs, (iii) schema
  37. Tasks
      user: can you help me ?
      wizard: hello , how can i help ?
      user: i need an urgent ride to the airport .
      wizard: may i have your name , please ?
      user: alexis . i need the ride at forbes and murray .
      wizard: customername = alexis, departurelocation = forbes and murray , arrivallocation = airport , requesttype = check
      api: id = 592 , price = 49 , allowschanges = false , minutestillpickup = 9 , serviceprovider = taxi , drivername = mark , carmodel = corolla , licenseplate = 019 eas , departurelocation = forbes and murray arrivallocation = airport , customername = alexis
      Response: i found a taxi ride for you from ' forbes and murray ' to ' airport ' for 49 credits that could pick you up in 9 minutes . should i book that for you ?
      Action: confirm_book_ride
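Because API requests and outputs appear in the dialog as flat `key = value` text, a model or a preprocessing step has to recover the fields before they can be used in a response. A minimal sketch of such a parser is shown below, assuming keys are single tokens followed by `=`; `parse_key_values` is a hypothetical helper, and the real dataset files may store these fields in a structured form instead.

```python
import re

# A key is a single word followed by "="; its value runs until the next
# "key =" (with or without a separating comma) or the end of the string.
_PAIR = re.compile(r"(\w+)\s*=\s*(.*?)(?=\s*,?\s*\w+\s*=|\s*$)")

def parse_key_values(text):
    """Parse 'a = 1 , b = two words' into {'a': '1', 'b': 'two words'}."""
    return {key: value.strip() for key, value in _PAIR.findall(text)}

api_turn = (
    "id = 592 , price = 49 , minutestillpickup = 9 , licenseplate = 019 eas , "
    "departurelocation = forbes and murray arrivallocation = airport , "
    "customername = alexis"
)
fields = parse_key_values(api_turn)

# Fill a response template from the parsed fields.
response = (
    f"i found a ride from ' {fields['departurelocation']} ' to "
    f"' {fields['arrivallocation']} ' for {fields['price']} credits"
)
```

The lookahead tolerates a missing comma between fields, as in the `departurelocation` / `arrivallocation` pair on the slide, but it would misparse values that themselves contain an `=` sign.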
  38. Schema-Free Classification (no text in transcript)
  39. Schema Representation (no text in transcript)
  43. Determinism in the Schema
      System action is always deterministic. The nodes representing a dialog state before the system action always have an out-degree of 1.
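The out-degree property from slide 43 is easy to check mechanically once a schema is held as a graph. The sketch below uses a hypothetical toy representation (an adjacency dict whose `user:` nodes are the dialog states that precede a system action); the node names and the `SCHEMA` layout are assumptions for illustration and do not mirror the schema files in the STAR repository.

```python
from typing import Dict, List

# Hypothetical toy schema: "user:" nodes are dialog states before a system
# action, "system:" nodes are system actions.
SCHEMA: Dict[str, List[str]] = {
    "user:greets": ["system:ask_name"],
    "system:ask_name": ["user:gives_name", "user:refuses"],
    "user:gives_name": ["system:query"],
    "user:refuses": ["system:ask_name"],
    "system:query": [],
}

def nondeterministic_states(schema: Dict[str, List[str]]) -> List[str]:
    """Return pre-system-action nodes whose out-degree is not exactly 1."""
    return [
        node
        for node, successors in schema.items()
        if node.startswith("user:") and len(successors) != 1
    ]

violations = nondeterministic_states(SCHEMA)
```

System-action nodes may still branch, since the user can react in several ways; only the states that precede a system action are required to have a single outgoing edge.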
  44. Schema-Guided Classification
      Goal: Use the task-specific schema to guide next action prediction.
  49. Results - Next Action Prediction
      Model           Happy   Unhappy   Multi-Task
      BERT            73.30   73.93     73.61
      BERT + Schema   71.09   72.28     73.13
      For each stage (happy/unhappy/multi), models are trained with 80% of data from current stage + all data from previous stages
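The staged training setup described on slide 49 can be sketched as below. The slide states only the composition of the training set; the random shuffle, the fixed seed, and the use of the remaining 20% as a held-out evaluation set are assumptions of this sketch, and `staged_split` is a hypothetical helper name.

```python
import random

# Stages in the order of the progression of difficulty.
STAGES = ["happy", "unhappy", "multi_task"]

def staged_split(dialogs_by_stage, stage, train_frac=0.8, seed=0):
    """Train on train_frac of the current stage plus all earlier stages.

    The rest of the current stage is returned as a held-out set
    (an assumption; the slide does not say how it is used).
    """
    rng = random.Random(seed)
    current = list(dialogs_by_stage[stage])
    rng.shuffle(current)
    cut = int(len(current) * train_frac)
    train, held_out = current[:cut], current[cut:]
    for earlier in STAGES[: STAGES.index(stage)]:
        train.extend(dialogs_by_stage[earlier])
    return train, held_out

# Toy data: ten dialog ids per stage.
data = {s: [f"{s}_{i}" for i in range(10)] for s in STAGES}
train, held_out = staged_split(data, "unhappy")
```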
  50. Zero-Shot Next Action Prediction
      Task Transfer
      Model           Happy   Happy + Unhappy
      BERT            36.45   36.89
      BERT + Schema   36.77   37.15
      Domain Transfer
      Model           Happy   Happy + Unhappy
      BERT            34.84   35.63
      BERT + Schema   37.20   35.71
  51. Other Tasks
      ● Response Generation
      ● Knowledge base query prediction (state tracking)
      ● Schema prediction (set of dialogs → schema graph)
      ● Out-of-domain detection (predict when you’ve gone outside the schema)
  52. Contributions
      ● STAR: Schema-guided Dialog Dataset for Transfer Learning
        ○ Task-specific schema allows zero-shot transfer learning
        ○ System consistency
        ○ Realistic user behavior
        ○ Progression of difficulty
      ● Schema-guided models for classification and generation
  53. Future Work
      ● Improve schema-guided models
      ● Investigate happy → unhappy transfer
      ● Investigate happy → multi-task transfer
      ● Schema prediction (infer schema from a set of example dialogs)
  54. Thank You
      @shikibmehri
      Paper: https://arxiv.org/pdf/2010.11853.pdf
      Data: https://github.com/rasahq/star

Editor's Notes

  • Evaluation metrics are important. They determine what gets PUBLISHED → SHAPE THE DIRECTION OF RESEARCH
