End-to-End Task-Completion Neural Dialogue Systems

  1. End-to-End Task-Completion Neural Dialogue Systems. Xiujun Li, Yun-Nung (Vivian) Chen, Lihong Li, Jianfeng Gao, Asli Celikyilmaz. The 8th International Joint Conference on Natural Language Processing (IJCNLP 2017). https://github.com/MiuLab/TC-Bot
  2. Dialogue Interactions. “I want to talk”: chit-chat (non task-oriented). “I have a question”: information consumption. “I need to get this done”: task completion. Information consumption and task completion are task-oriented.
  3. Dialogue Interactions: information consumption (“I have a question”). Examples: What is the paper review schedule? Which room is the dialogue tutorial in? When is the IJCNLP 2017 conference? What does NLP stand for?
  4. Dialogue Interactions: task completion (“I need to get this done”). Examples: Book me the flight from Seattle to Taipei. Reserve a table at Din Tai Fung for 5 people, 7PM tonight. Purchase two movie tickets for Thor 3 at 9PM tonight. Goal: navigating users to complete tasks (transactions, API calls, etc.). Challenge: completing a task often involves multiple rounds of information consumption.
  5. Task-Completion Dialogues
• Crowdsourced data – Domain: movie; dialogue acts annotated as intent and slot-value pairs
• Problem setting – The user wants to achieve a specific task, e.g. buying a movie ticket; the agent tries to understand the user's goal via dialogue before booking the ticket
Turn | Speaker | Intent & Slots | Utterance
1 | User | greeting(greeting=hi) | Hi
1 | User | request(moviename; city=Houma; state=Louisiana; date=this week; genre=foreign) | Are there any good movies showing around Houma Louisiana this week?
2 | Agent | request(date) | What date would you like me to look for a reservation?
3 | User | inform(date=9th) | How about the 9th
4 | Agent | inform(result={}; genre=foreign) | Unfortunately there are no foreign movies playing at this time
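For concreteness, a frame-level dialogue act like those in the table above can be represented as a small structure. A minimal sketch follows; the field names (intent, request_slots, inform_slots) are illustrative, not necessarily the dataset's exact schema.

```python
# A minimal sketch of the frame-level annotation shown above.  The field
# names (intent, request_slots, inform_slots) are illustrative, not the
# dataset's exact schema.

user_turn = {
    "speaker": "user",
    "intent": "request",
    "request_slots": ["moviename"],
    "inform_slots": {
        "city": "Houma",
        "state": "Louisiana",
        "date": "this week",
        "genre": "foreign",
    },
    "utterance": "Are there any good movies showing around Houma Louisiana this week?",
}

agent_turn = {
    "speaker": "agent",
    "intent": "request",
    "request_slots": ["date"],
    "inform_slots": {},
    "utterance": "What date would you like me to look for a reservation?",
}
```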
  6. Rule-Based Agent
• Gather all information via a series of requests (form-filling)
• Issue 1) No notion about which requests can be skipped
User goal: two tickets for “Deadpool” tomorrow 9PM at AMC Pacific Place 11 theater, Seattle.
User: Do you have any ticket for movie ‘Deadpool’ at 9:00 pm?
Agent: Which theater do you want?
User: AMC pacific place 11 theater, please!
Agent: How many tickets do you want?
User: Two!
Agent: What date?
User: Tomorrow!
Agent: Which city do you want?
User: Seattle!
Agent: Do you have any specification for zip code?
User: No, I do not care.
Agent: Do you have any distance constraint?
User: No, I do not care.
Agent: How about video format? 2d? Imax 3d?
User: Any kind is okay.
Agent: Do you have any constraint to price?
User: No!
Agent: Okay, I will issue 2 tickets for you, tomorrow 9:00 pm at AMC pacific place 11 theater, Seattle, movie ‘Deadpool’.
User: Okay, thanks.
Agent: Thanks!
Wasteful! (very likely the user does not care)
  7. Rule-Based Agent
• Gather all information via a series of requests (form-filling)
• Issues: 1) no notion about which requests can be skipped; 2) no principled way to deal with errors/uncertainty in language understanding
User goal: two tickets for “Hail Caesar” tomorrow 7:15PM at AMC Lowes Oak tree 6, Seattle.
User: Do you have any ticket for movie ‘Hail Caesar’ at theater AMC Lowes Oak Tree 6?
Agent: What start time do you prefer?
User: 7:15 pm Tomorrow
Agent: Which city you are looking for?
User: Seattle
Agent: How many tickets do you want?
User: Two
Agent: Okay, I have issued two tickets for movie ‘Hail Caeser’ tomorrow 7:50 pm at theater AMC Lowes Oak Tree 6, Seattle.
User: Oh, no.
Agent: Thanks!
Task failed! (the agent cannot correct errors from the preceding language understanding component)
  8. Rule-Based Agent
• Gather all information via a series of requests (form-filling)
• Issues: 1) no notion about which requests can be skipped; 2) no principled way to deal with errors/uncertainty in language understanding; 3) no way to recommend options if the user's goal is not achievable
  9. Task-Oriented Dialogue System Framework. [Figure: text input “Book 5 tickets for movie Star Wars” → Language Understanding (LU) → semantic frame request_ticket(moviename=Star Wars; numberofpeople=5) → Dialogue Management (DM), where Dialogue State Tracking (DST) maintains the state representation and Dialogue Policy Learning selects a system action/policy such as request(theater), backed by a knowledge database → Natural Language Generation (NLG) → text response “Which theater do you prefer?”]
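To make the pipeline above concrete, here is a minimal sketch of one system turn chaining the four components. The component objects and method names (lu.parse, kb.query, dst.update, policy.select_action, nlg.generate) are hypothetical placeholders, not the TC-Bot API.

```python
# A minimal sketch of one system turn through the pipeline above.  The
# component objects and method names (lu.parse, kb.query, dst.update,
# policy.select_action, nlg.generate) are hypothetical placeholders,
# not the TC-Bot API.

def system_turn(text_input, lu, dst, policy, nlg, kb):
    # LU: text -> semantic frame,
    # e.g. request_ticket(moviename=Star Wars; numberofpeople=5)
    frame = lu.parse(text_input)

    # DST: fold the new frame and knowledge-base results into the dialogue state
    state = dst.update(frame, kb_results=kb.query(frame))

    # Dialogue policy: state -> system action, e.g. request(theater)
    action = policy.select_action(state)

    # NLG: system action -> text response, e.g. "Which theater do you prefer?"
    return nlg.generate(action)
```

A full dialogue is then just this function called once per user utterance until the task is booked or the turn limit is reached.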
  10. Task-Oriented Dialogue System Framework (dialogue-act view). [Figure: the user's natural language is mapped by LU to a dialogue act; Dialogue Management (DST with a state representation and Dialogue Policy Learning, backed by a knowledge database) produces a system dialogue act, which NLG converts back into natural language for the user.]
  11. User Simulation in Frame-Level Semantics. [Figure: a user model (user simulation) exchanges dialogue acts directly with dialogue management (DST with a state representation and Dialogue Policy Learning, backed by a knowledge database); an error model controller injects recognition and LU errors into the user dialogue act before the system sees it.]
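A rough sketch of the error model controller idea: corrupt the simulated user's frame-level act before the agent sees it. The intent list, error rates, and noise process below are illustrative assumptions, not the paper's exact error model.

```python
import random

# A rough sketch of an error model controller: corrupt the simulated user's
# frame-level dialogue act before the agent sees it.  The intent list and the
# noise process are illustrative assumptions, not the paper's exact model.

INTENTS = ["greeting", "thanks", "inform", "request", "confirm_answer", "closing"]

def corrupt_user_act(user_act, intent_error_rate=0.1, slot_error_rate=0.1):
    noisy = {
        "intent": user_act["intent"],
        "inform_slots": dict(user_act.get("inform_slots", {})),
    }
    # Simulated intent recognition error: swap in a random intent
    if random.random() < intent_error_rate:
        noisy["intent"] = random.choice(INTENTS)
    # Simulated slot-level LU error: drop a slot the user actually informed
    for slot in list(noisy["inform_slots"]):
        if random.random() < slot_error_rate:
            del noisy["inform_slots"][slot]
    return noisy
```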
  12. User Simulation at the Natural Language Level. [Figure: the user model additionally generates natural language through NLG, and the system side parses it with Language Understanding (LU) before dialogue management (DST, state representation, and Dialogue Policy Learning with the knowledge database).]
  13. Task-Completion Neural Dialogue Systems
• Language Understanding: joint semantic frame parsing with a BLSTM (Hakkani-Tur et al., 2016)
• Dialogue State Tracking: available results returned by the formed symbolic query, plus the latest user dialogue action
• Dialogue Policy Learning: reinforcement learning policy (Mnih et al., 2015)
• Natural Language Generation: template-based, or model-based semantically-conditioned LSTM generation (SC-LSTM) (Wen et al., 2015)
Training: LU, DST, and NLG are trained in a supervised way; the dialogue policy is trained with reinforcement learning, and the supervised components can be fine-tuned with reinforcement during end-to-end training.
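As a rough illustration of the joint semantic frame parsing component (intent detection plus slot tagging with a bidirectional LSTM, in the spirit of Hakkani-Tur et al., 2016), here is a minimal PyTorch sketch; the layer sizes and the choice to read the intent off the last time step are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

# A minimal PyTorch sketch of joint semantic frame parsing (intent detection
# plus slot tagging) with a bidirectional LSTM, in the spirit of
# Hakkani-Tur et al. (2016).  Layer sizes and reading the intent off the last
# time step are illustrative choices, not the paper's configuration.

class JointLU(nn.Module):
    def __init__(self, vocab_size, num_slot_tags, num_intents,
                 embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.slot_head = nn.Linear(2 * hidden_dim, num_slot_tags)  # per-token IOB tag
        self.intent_head = nn.Linear(2 * hidden_dim, num_intents)  # per-utterance intent

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer word indices
        h, _ = self.lstm(self.embed(token_ids))      # (batch, seq_len, 2*hidden_dim)
        slot_logits = self.slot_head(h)              # one slot tag per token
        intent_logits = self.intent_head(h[:, -1])   # intent from the last time step
        return slot_logits, intent_logits
```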
  14. End-to-End Neural Dialogue Systems
• LU, DST (neural dialogue system), and NLG (user simulation) are trained in a supervised way
• End-to-end training is used for dialogue policy learning
[Figure: a user model with NLG (words w_0, w_1, w_2, ..., EOS) exchanges natural language with the neural dialogue system, whose LU tags each token with a slot label and the utterance with an intent, whose DST produces the state s_t, and whose dialogue policy maps states s_1..s_n to actions a_1..a_k, all backed by a knowledge database and driven by a user goal.]
  15. Reinforcement Learning Agent
• Dialogue policy learning – a deep Q-network estimates the Q-value of each state-action pair
• Reward
– Success: the agent answers all the requested slots based on the user's constraints and books the movie tickets within the maximum number of turns
– Failure: 1) the agent finds no matching movie based on the user's constraints, or 2) the dialogue exceeds the maximum number of turns
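A minimal sketch of what such a Q-network could look like: a small MLP mapping the dialogue state representation to one Q-value per system action (45 actions in the experiments), with epsilon-greedy selection during training. The architecture, layer sizes, and epsilon value are assumptions; only the state-to-Q-value mapping and the action-set size come from the slides.

```python
import torch
import torch.nn as nn

# A minimal sketch of a DQN-style dialogue policy: a small MLP that maps the
# dialogue state representation to one Q-value per system action (45 actions
# in the experiments).  Layer sizes and epsilon are illustrative assumptions.

class DQNPolicy(nn.Module):
    def __init__(self, state_dim, num_actions=45, hidden_dim=80):
        super().__init__()
        self.num_actions = num_actions
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_actions),
        )

    def forward(self, state):
        # state: (state_dim,) or (batch, state_dim) -> Q(s, a) for every action a
        return self.net(state)

    def act(self, state, epsilon=0.05):
        # Epsilon-greedy action selection while training
        if torch.rand(1).item() < epsilon:
            return torch.randint(0, self.num_actions, (1,)).item()
        with torch.no_grad():
            return self.forward(state).argmax(dim=-1).item()
```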
  16. Experiments
• Rule-based agent
– Actions: ask a question (request), answer a question (inform), give multiple_choice to the user, confirm_answer, closing, thanks
– The agent asks for slots in a priority order, but within a sliding window (say, size 2 or 3) there is randomness
• RL agent
– Model: Deep Q-Network
– Rewards: success = 2 × max_turn; failure = −max_turn; −1 per-turn penalty
– Actions: 45 actions, e.g. request(starttime), inform(moviename), confirm(question), etc.
– State transition tuples (s_t, a_t, r_t, s_{t+1})
– Experience replay (Schaul et al., 2015): prioritized, dynamic pool, starting from a pool of rule-based tuples
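The reward scheme and replay pool above can be sketched as follows. MAX_TURN, the buffer capacity, and the batch size are illustrative values, and the prioritized/dynamic-pool variant of Schaul et al. (2015) is omitted for brevity; the warm start from rule-based tuples follows the slide.

```python
import random
from collections import deque

# A minimal sketch of the reward scheme and experience replay pool above:
# +2 * max_turn on success, -max_turn on failure, -1 per turn, with the pool
# warm-started from rule-based agent tuples.  MAX_TURN, capacity, and batch
# size are illustrative; prioritized replay (Schaul et al., 2015) is omitted.

MAX_TURN = 40  # illustrative value, not necessarily the paper's setting

def turn_reward(done, success):
    if not done:
        return -1                      # per-turn penalty
    return 2 * MAX_TURN if success else -MAX_TURN

class ReplayBuffer:
    def __init__(self, capacity=10000, warm_start_tuples=()):
        # Start from a pool of (s_t, a_t, r_t, s_{t+1}, done) tuples collected
        # by the rule-based agent, as described in the slide above.
        self.pool = deque(warm_start_tuples, maxlen=capacity)

    def push(self, s, a, r, s_next, done):
        self.pool.append((s, a, r, s_next, done))

    def sample(self, batch_size=16):
        return random.sample(list(self.pool), min(batch_size, len(self.pool)))
```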
  17. End-to-End Reinforcement Learning
• Evaluation in both settings: frame-level semantics and natural language
• The RL agent is able to learn how to interact with users to complete tasks more efficiently and effectively, and it outperforms the rule-based agent
  18. Language Understanding Impact
• Different error rates in LU
• Language understanding quality is crucial to dialogue-level performance
  19. Intent Error Analysis
• Intent error types: random, within-group, between-group (Group 1: greeting(), thanks(), etc.; Group 2: inform(xx); Group 3: request(xx))
• Intent error rates: 0.00, 0.10, 0.20
• Example intent confusion: request_moviename(actor=Robert Downey Jr) misrecognized as request_year
• Intent errors slightly influence the RL system performance; between-group intent errors degrade the system performance more
  20. Slot Error Analysis
• Slot error types: random, slot deletion, value substitution, slot substitution
• Slot error rates: 0.00, 0.10, 0.20
• Example: in request_moviename(actor=Robert Downey Jr), the slot actor is misrecognized as director (slot substitution), or the value as Robert Downey Sr (value substitution)
• Slot errors significantly degrade the RL system performance; value substitution has the largest impact
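To illustrate the three non-random slot error types analysed above, here is a small sketch that applies them to an inform-slot dictionary such as {"actor": "Robert Downey Jr"}; the candidate slot and value lists are made up for the example.

```python
import random

# A small sketch of the three non-random slot error types analysed above,
# applied to an inform-slot dictionary such as {"actor": "Robert Downey Jr"}.
# The candidate slot and value lists are made up for the example.

OTHER_SLOTS = ["director", "genre", "starttime"]
OTHER_VALUES = ["Robert Downey Sr", "tomorrow", "Seattle"]

def apply_slot_error(inform_slots, error_type):
    noisy = dict(inform_slots)
    if not noisy:
        return noisy
    slot = random.choice(list(noisy))
    if error_type == "slot_deletion":         # the slot is dropped entirely
        del noisy[slot]
    elif error_type == "value_substitution":  # e.g. Robert Downey Jr -> Robert Downey Sr
        noisy[slot] = random.choice(OTHER_VALUES)
    elif error_type == "slot_substitution":   # e.g. actor -> director
        noisy[random.choice(OTHER_SLOTS)] = noisy.pop(slot)
    return noisy
```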
  21. Error Comparison
• Intent error rate vs. slot error rate
• The RL agent is more robust to intent errors at the dialogue level
• Slot filling is more important than intent detection in language understanding
  22. Human Evaluation
• Users are presented with a predefined user goal
• User rating: 1 (worst) to 5 (best), based on both the naturalness and the coherence of the dialogue
  23. Dialogue Examples (Issue 1: no notion about which requests can be skipped)
User goal: two tickets for “Deadpool” tomorrow 9PM at AMC Pacific Place 11 theater, Seattle.
RULE-BASED SYSTEM
User: Do you have any ticket for movie ‘Deadpool’ at 9:00 pm?
Agent: Which theater do you want?
User: AMC pacific place 11 theater, please!
Agent: How many tickets do you want?
User: Two!
Agent: What date?
User: Tomorrow!
Agent: Which city do you want?
User: Seattle!
Agent: Do you have any specification for zip code?
User: No, I do not care.
Agent: Do you have any distance constraint?
User: No, I do not care.
Agent: How about video format? 2d? Imax 3d?
User: Any kind is okay.
Agent: Do you have any constraint to price?
User: No!
Agent: Okay, I will issue 2 tickets for you, tomorrow 9:00 pm at AMC pacific place 11 theater, Seattle, movie ‘Deadpool’.
User: Okay, thanks.
Agent: Thanks!
REINFORCEMENT LEARNING SYSTEM
User: I want two tickets for movie ‘Deadpool’.
Agent: What is the starttime you prefer?
User: 9:00 PM
Agent: Which day?
User: Tomorrow
Agent: Which city?
User: Seattle
Agent: Which theater do you prefer?
User: AMC Pacific Place 11 theater.
Agent: Okay, I will issue 2 tickets for you, tomorrow 9:00 pm at AMC pacific place 11 theater, Seattle, movie ‘Deadpool’.
User: Okay, thanks.
Agent: Thanks!
The RL agent skips the requests the user may not care about, improving efficiency.
  24. Dialogue Examples (Issue 2: no principled way to deal with LU errors/uncertainty)
User goal: two tickets for “Hail Caesar” tomorrow 7:15PM at AMC Lowes Oak tree 6, Seattle.
RULE-BASED SYSTEM
User: Do you have any ticket for movie ‘Hail Caesar’ at theater AMC Lowes Oak Tree 6?
Agent: What start time do you prefer?
User: 7:15 pm Tomorrow
Agent: Which city you are looking for?
User: Seattle
Agent: How many tickets do you want?
User: Two
Agent: Okay, I have issued two tickets for movie ‘Hail Caeser’ tomorrow 7:50 pm at theater AMC Lowes Oak Tree 6, Seattle.
User: Oh, no.
Agent: Thanks!
REINFORCEMENT LEARNING SYSTEM
Agent: What start time do you prefer?
User: 7:15 pm Tomorrow
Agent: Which city you are looking for?
User: Seattle
Agent: Okay, I will issue you tickets for movie ‘Hail Caeser’ tomorrow 7:50 pm at theater AMC Lowes Oak Tree 6, Seattle.
User: Oh, no.
Agent: It is 7:50 pm, right?
User: Two tickets for 7:15 pm, please.
Agent: Okay, I will issue you two tickets for movie ‘Hail Caeser’ tomorrow 7:15 pm at theater AMC Lowes Oak Tree 6, Seattle.
User: Thanks!
Agent: Thanks!
The RL agent automatically learns when and what to confirm, for better robustness.
  25. Conclusions. Our neural dialogue system demonstrates:
• Robustness: the RL policy selects actions based on uncertainty and confusion, and we provide a systematic analysis of how language understanding errors affect dialogue system performance (slot-level errors hurt performance more than intent-level errors, and slot value replacement degrades performance the most)
• Flexibility: the first neural dialogue system that allows user-initiated behaviors during conversations
• Reproducibility: we demonstrate how to evaluate RL dialogue agents with crowdsourced datasets and simulated users in an end-to-end fashion, guaranteeing reproducibility and consistent comparisons of competing methods in an identical setting
  26. Thanks for your attention! Q & A. https://github.com/MiuLab/TC-Bot

Editor's Notes

  1. The general goal involves treating natural language as a knowledge representation language: MR derives structured information from free text and then does something with it. But the work done in this space represents a tangle of different agendas, so let's unpack them. Where does it make sense for MS to focus its efforts here? For 1, it doesn't drive mainstream AI research; it's a hobby strand, perhaps good for PR, but not a good driver of research or product. For 2 and 3, the user already has some idea of what the information need is. For 4, the user is overwhelmed by the complexity of the problem/solution space and needs help navigating it.