Dilek Hakkani-Tur at AI Frontiers: Conversational machines: Deep Learning for Goal-Oriented Dialogue Systems

In this talk, I will present recent developments at Google Research on end-to-end goal-oriented dialogue systems, with components for language understanding, dialogue state tracking, policy, and language generation. The talk will summarize the novel aspects of each component and highlight approaches that view dialogue as a collaborative game between a user and an agent: the user has a goal in mind, and the agent has access to the data the user is interested in and can perform actions to realize the user’s goal. The two engage in a conversation in which the agent helps the user complete the task.

Published in: Data & Analytics

  1. Deep Learning for Goal-Oriented Conversational Understanding. Dilek Hakkani-Tur.
     Acknowledgments: Gokhan Tur, Larry Heck, Abhinav Rastogi, Pararth Shah, Ankur Bapna, Neha Nayak, Anna Khasin, Raghav Gupta, Yang Song, Grady Simon, Amir Fayazi, Jindong Chen, Georgi Nikolov, Bing Liu (CMU), Izzeddin Gur (UCSB), Rama Pasumarthi (CMU), Saurabh Kumar (GT), Shyam Udaphyay (UIUC), Asli Celikyilmaz (MSR), Vivian Chen (NTU), Marilyn Walker (UCSB)
  2. Data-Driven Dialogue Systems
     Human-like interactions for goal/task-oriented dialogues. Learn from data:
     ● High variability and noise in language
     ● Adapt to available meaning representations
     ● Integrate common sense and world knowledge
     ● Robust modeling of context
     Example: U: Book me a table at Cascal. S: Sure, for what time? U: Around 7pm, for 2 people. S: Nothing is available at 7pm, would 8pm be ok? U: That is too late, what about Amarin? S: OK, I can book you a table at Amarin at 7pm.
  3. Dialogue Systems
     ● Goal/Task-Oriented: a personal assistant that helps users achieve a certain task. Goal: task completion. Combination of rules and learning. Examples: end-to-end trainable task-oriented dialogue system (Wen et al., 2016); end-to-end reinforcement learning dialogue system (Zhao and Eskenazi, 2016).
     ● Chit-Chat: no specific goal; focus on natural responses. Goal: user engagement, naturalness. Uses variants of seq2seq models. Examples: a neural conversation model (Vinyals and Le, 2015); reinforcement learning for dialogue generation (Li et al., 2016).
  4. Task-Oriented Dialogue as a Collaborative Game
     ● User: has a goal (fixed/flexible).
     ● Agent: has access to data; can perform actions.
     Example: U: Book my flu shot with Dr. Straw. S: OK. Monday, October 6th at 5:15pm and 6pm are available. What time would you prefer?
     Games take many forms:
     ● Adversarial (Chess, Go, …)
     ● Cooperative (20 questions, Pictionary)
     ● Collaborative (Dialogue)
  5. Task-Oriented Dialogue as a Collaborative Game (slide 4 repeated, with two additions):
     ● Large space of actions and states
     ● Multi-action turns and flexible turn-taking
  6. Why learn?
     Challenges and our solutions:
     ● Variety in NL & user requests: more flexible parsing mechanism
     ● Noise in input: models learn to correct for likely noise (e.g., ASR errors)
     ● Modeling context: integrating contextual information
     ● Dialogue-level planning: end-to-end modeling with reinforcement learning
     ● Scale, recall: continuous training from the logs, transfer learning, active learning
     ● Scale, intents: transfer learning, warm-start, multi-task modeling
     ● Scale, languages: transfer learning, multi-lingual embeddings
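  7. Goal-Oriented Dialogue Systems
     Architecture diagram: the system/agent contains Conversational Language Understanding, Dialogue State Tracking, the Dialogue Manager, and Response Generation, and connects to back-end action/knowledge providers.
     Example: the user says "Book me a table at Cascal for 2 people"; understanding yields restaurants / reserve_rest. with Rest._name: Cascal and Num_people: 2; the dialogue manager exchanges a back-end query and response with the providers and chooses Request(time), realized as "Sure, at what time do you want the reservation?"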
  8. Goal-Oriented Dialogue Systems - Components (same architecture diagram and Cascal example as slide 7, with the understanding component labeled Conversation Understanding)
  9. Conversational Language Understanding (CLU): Multi-Domain, Joint Semantic Frame Parsing
     Joint, sequence-based:
     ● Slot filling and intent prediction in the same output sequence
     ➢ One model: holistic multi-domain, multi-task modeling
     ➢ Estimate all semantic frames covering all domains in a single RNN model
     ➢ Data from the different domains reinforce each other
     Figure: an RNN (hidden states $h_{t-1}$, $h_t$, $h_{t+1}$, $h_{T+1}$; weights U, W, V) reads "taiwanese food please EOS" and outputs B-cuisine, O, O (slot filling) and FIND_REST at EOS (domain/intent prediction).
     https://www.microsoft.com/en-us/research/wp-content/uploads/2016/06/IS16_MultiJoint.pdf
     D. Hakkani-Tur, G. Tur, A. Celikyilmaz, Y-N. Chen, J. Gao, L. Deng, and Y-Y. Wang, “Multi-domain joint semantic frame parsing using bi-directional RNN-LSTM,” in INTERSPEECH, 2016.
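To make the joint, sequence-based formulation concrete, the sketch below shows how one training pair could be laid out: the utterance gets an appended EOS token, and the target carries one IOB slot tag per word plus the domain/intent label at the EOS position. It is a minimal sketch assuming plain Python lists and the tag names from the figure; the helper names (`make_joint_example`, `split_joint_output`, `extract_slots`) are hypothetical, and no model is trained.

```python
EOS = "EOS"


def make_joint_example(tokens, slot_tags, intent):
    """Build (input, target) sequences for joint slot filling and intent
    prediction: one IOB tag per word, and the intent label at the EOS position."""
    if len(tokens) != len(slot_tags):
        raise ValueError("need exactly one slot tag per token")
    return tokens + [EOS], slot_tags + [intent]


def split_joint_output(predicted):
    """Split a predicted joint sequence back into (slot tags, intent)."""
    return predicted[:-1], predicted[-1]


def extract_slots(tokens, slot_tags):
    """Group IOB tags into {slot: value}; B- starts a span, I- continues it."""
    slots, name, words = {}, None, []
    for tok, tag in zip(tokens, slot_tags):
        if tag.startswith("I-") and name == tag[2:]:
            words.append(tok)  # continue the current span
            continue
        if name:
            slots[name] = " ".join(words)  # close the previous span
        if tag.startswith(("B-", "I-")):
            name, words = tag[2:], [tok]  # start a new span
        else:
            name, words = None, []
    if name:
        slots[name] = " ".join(words)
    return slots


# Example from the slide's figure.
tokens = ["taiwanese", "food", "please"]
x, y = make_joint_example(tokens, ["B-cuisine", "O", "O"], "FIND_REST")
tags, intent = split_joint_output(y)
print(x)  # ['taiwanese', 'food', 'please', 'EOS']
print(y)  # ['B-cuisine', 'O', 'O', 'FIND_REST']
print(intent, extract_slots(tokens, tags))  # FIND_REST {'cuisine': 'taiwanese'}
```

Putting the intent at EOS lets a single tagging decoder produce both outputs, which is what allows one model to cover all domains.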
  10. E2E MemNN for Contextual CLU
      ● What does this utterance say? What do the previous utterances say? (what the last slide showed)
      Y-N. Chen, D. Hakkani-Tur, G. Tur, J. Gao, and L. Deng, “End-to-end memory networks with knowledge carryover for multi-turn spoken language understanding,” in INTERSPEECH, 2016.
      A. Bapna, G. Tur, D. Hakkani-Tur, and L. Heck, “Improving frame semantic parsing with hierarchical dialogue encoders,” in SIGDIAL, 2017.
  11. E2E MemNN for Contextual CLU
      ● What does this utterance say? What do the previous utterances say? (what the last slide showed)
      ● How relevant is each of the previous utterances to the current one?
  12. E2E MemNN for Contextual CLU
      ● What does this utterance say? What do the previous utterances say? (what the last slide showed)
      ● How relevant is each of the previous utterances to the current one?
      ● What do the relevant previous utterances say?
  13. E2E MemNN for Contextual CLU
      1. What does this utterance say? What do the previous utterances say? (what the last slide showed)
      2. How relevant is each of the previous utterances to the current one?
      3. What do the relevant previous utterances say?
      4. Sequence tagging: given the relevant information from the previous and current utterances, how do I tag each token?
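The questions on these slides correspond to the usual memory-network computation: attend over the encoded history with the current utterance, summarize the relevant history, and pass the result on to the tagger. A minimal NumPy sketch, assuming utterances are already encoded as fixed-size vectors (random vectors stand in for learned encoders) and dot-product attention; the projection name `W_kg` and the function names are hypothetical, and the step-4 sequence tagger is omitted.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - x.max())  # subtract the max for numerical stability
    return e / e.sum()


def memory_context(u, memories, W_kg):
    """u: (d,) encoding of the current utterance.
    memories: (n, d) encodings of the n previous utterances.
    W_kg: (d, d) projection (hypothetical name; learned in practice).
    Returns the attention weights and the knowledge vector for the tagger."""
    p = softmax(memories @ u)  # how relevant is each previous utterance?
    h = p @ memories           # what do the relevant previous utterances say?
    o = W_kg @ (h + u)         # combine history with the current utterance
    return p, o


rng = np.random.default_rng(0)
d, n = 8, 3
u = rng.normal(size=d)              # stand-in for an encoded utterance
memories = rng.normal(size=(n, d))  # stand-ins for the encoded history
p, o = memory_context(u, memories, rng.normal(size=(d, d)))
print(p.round(3), o.shape)
```

A tagger would then read each token together with `o`, so every tag decision can see the relevant history.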
  14. Dialogue State Tracking (DST)
      Example conversation: Do you wanna take Angela to go see a movie tonight? Sure, I will be home by 6. Let's grab dinner before the movie. How about some Mexican? Let's go to Vive Sol and see Inferno after that. Angela wants to watch the Trolls movie. Ok. Let's catch the 8 pm show.
      Figure: the tracked state, with a Movies frame (Movie: Inferno, later Trolls; Theatre: Century 16; Date: 11/15/16; Time: 6 pm, 7 pm, 8 pm, 9 pm; #People: 2, then 3) and a Restaurants frame (Restaurant: Vive Sol; Cuisine: Mexican; Date: 11/15/16; Time: 6:30 pm, 7 pm, 7:30 pm).
      ● System's belief of the user's goal at any time
      ● Inputs at user turn t: $\mathrm{DS}_{t-1}$, $\mathrm{CLU}_t$; output: $\mathrm{DS}_t$
      ● Used for accessing information and making transactions
      ● NN models
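The input/output contract above can be illustrated with a deliberately simple, rule-based tracker: each frame keeps its slots, and a newly parsed value overwrites the older one. This is not the neural tracker the slide refers to; the `{domain: {slot: value}}` layout, the domain and slot names, and `update_state` are assumptions for illustration. A learned tracker would keep scores over candidate values rather than a single overwritten value.

```python
from copy import deepcopy


def update_state(prev_state, clu_frames):
    """Compute DS_t from DS_{t-1} and CLU_t.

    prev_state: {domain: {slot: value}}, the state after the previous turn.
    clu_frames: [(domain, {slot: value}), ...] parsed from the current turn.
    A newer value for a slot simply replaces the older one (no uncertainty).
    """
    state = deepcopy(prev_state)  # leave DS_{t-1} untouched
    for domain, slots in clu_frames:
        state.setdefault(domain, {}).update(slots)
    return state


# Values taken from the slide's movie-and-dinner example.
ds = {}
ds = update_state(ds, [("movies", {"movie": "Inferno", "date": "11/15/16"}),
                       ("restaurants", {"restaurant": "Vive Sol", "cuisine": "Mexican"})])
ds = update_state(ds, [("movies", {"movie": "Trolls"})])
ds = update_state(ds, [("movies", {"time": "8 pm"})])
print(ds["movies"])  # {'movie': 'Trolls', 'date': '11/15/16', 'time': '8 pm'}
```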
  15. Dialogue State Tracking (DST)
      A. Rastogi, D. Hakkani-Tur, and L. Heck, “Scalable Multi-Domain Dialogue State Tracking,” IEEE ASRU, 2017.
      S> How about 6 pm?
      U> I am busy at 6, book it for 7 pm instead.
      ● Candidate set generation
        ○ Slots with large/unbounded value sets
        ○ Previously unseen slot values
  18. Dialogue State Tracking (DST) (slide 15 repeated, with two additions):
      ● Sharing parameters between different slots
      ● Transfer learning to unseen domains
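A minimal sketch of the two scalability ideas on these slides, assuming: for each slot, a candidate set of at most K values refreshed every turn from the values mentioned in the current turn plus the best previous candidates; and one scoring function whose weights are shared by all slots, so a new slot or domain needs no slot-specific parameters. The features, weight values, K, and all names are hypothetical, a linear scorer stands in for a learned neural scorer, and detecting that "busy at 6" rejects 6 pm is assumed to happen upstream.

```python
import math

K = 5  # hypothetical cap on candidates kept per slot

# One weight vector shared by every slot (hypothetical values; learned in practice).
SHARED_WEIGHTS = {"user_mentioned": 2.0, "system_mentioned": 0.5,
                  "user_negated": -3.0, "prev_score": 1.0}


def update_candidates(prev_scores, mentioned, k=K):
    """Candidate set for one slot: values mentioned this turn first, then the
    best-scoring previous candidates, truncated to k."""
    cands = list(dict.fromkeys(mentioned))  # de-duplicate, keep order
    for value in sorted(prev_scores, key=prev_scores.get, reverse=True):
        if value not in cands:
            cands.append(value)
    return cands[:k]


def score_candidates(cands, features, weights=SHARED_WEIGHTS):
    """Softmax over the candidates plus a 'none' option.
    features: {value: {feature_name: float}}; the same weights serve any slot."""
    logits = {v: sum(weights[f] * x for f, x in features.get(v, {}).items())
              for v in cands}
    logits["none"] = 0.0
    top = max(logits.values())
    exps = {v: math.exp(l - top) for v, l in logits.items()}
    total = sum(exps.values())
    return {v: e / total for v, e in exps.items()}


# S> How about 6 pm?   U> I am busy at 6, book it for 7 pm instead.
cands = update_candidates({}, ["6 pm", "7 pm"])
features = {"6 pm": {"system_mentioned": 1.0, "user_negated": 1.0},
            "7 pm": {"user_mentioned": 1.0}}
probs = score_candidates(cands, features)
print(max(probs, key=probs.get))  # 7 pm
```

Because the candidate set is rebuilt from mentions, a time value never seen in training can still be tracked, and the set size stays bounded however large the slot's value space is.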
  19. Dialogue State ~ Game Board; Dialogue Move ~ Transformation of the dialogue state
      U: I’m hungry, find me a Mediterranean restaurant. [User acts: inform(category)]
      S: Which area do you prefer? [System acts: request(location); grounded information: time]
      U: Near downtown Mountain View. [User acts: inform(location)]
      Dialogue Manager (DM) policy
  20. Dialogue State ~ Game Board; Dialogue Move ~ Transformation of the dialogue state
      U: I’m hungry, find me a Mediterranean restaurant. [User acts: inform(category)]
      S: Which area do you prefer? [System acts: request(location); grounded information: time]
      U: Near downtown Mountain View. [User acts: inform(location)]
      S: Would you like to eat at Cascal? [System acts: offer(restaurant); grounded information: time, location]
      Dialogue Manager (DM) policy
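Seen as a game, the policy is a function from the current board (dialogue state) to the next move (system acts). The sketch below hard-codes such a function, assuming a state that holds the slots informed so far and a stand-in back-end lookup; it reproduces the two moves on the slides, request(location) followed by offer(restaurant), but it is a hand-written rule policy, not the learned policy discussed in the following slides. `REQUIRED`, `fake_lookup`, and the notify_no_match() act are hypothetical.

```python
REQUIRED = ("category", "location")  # hypothetical: slots needed before an offer


def policy(state, lookup):
    """Map the current dialogue state to the next system acts (hand-written rules).

    state: {"slots": {slot: value}}; lookup: back-end query returning venue names.
    """
    for slot in REQUIRED:
        if slot not in state["slots"]:
            return [f"request({slot})"]
    matches = lookup(state["slots"])
    if not matches:
        return ["notify_no_match()"]  # hypothetical act name
    return [f"offer(restaurant={matches[0]})"]


def fake_lookup(slots):
    """Stand-in back end with a single Mediterranean restaurant in Mountain View."""
    if slots.get("category") == "Mediterranean" and "Mountain View" in slots.get("location", ""):
        return ["Cascal"]
    return []


state = {"slots": {"category": "Mediterranean"}}    # after "find me a Mediterranean restaurant"
print(policy(state, fake_lookup))                     # ['request(location)']
state["slots"]["location"] = "downtown Mountain View"  # after "Near downtown Mountain View."
print(policy(state, fake_lookup))                     # ['offer(restaurant=Cascal)']
```

Rules like these are easy to write for one domain but grow quickly with task and discourse complexity, which is the motivation for learning the policy instead.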
  21. Learning DM Policy
      Multi-stage training of the dialogue manager:
      ● Supervised learning (bootstrap): a corpus of dialogues between a human expert and a user bootstraps the dialogue manager.
      P. Shah, D. Hakkani-Tur, and L. Heck, “Interactive reinforcement learning for task-oriented dialogue management,” Deep Learning for Action and Interaction, NIPS, 2016.
  22. Learning DM Policy (slide 21 repeated, with a second stage added):
      ● Reinforcement learning (simulated refinement): the dialogue manager converses with a user simulator and is refined with a task-level reward.
  23. Learning DM Policy (slide 22 repeated, with a third stage added):
      ● Interactive RL (continual learning): the dialogue manager converses with real users and learns from the task-level reward plus turn-level feedback.
  24. Learning DM Policy
      Learning task-oriented dialogue management through:
      ● Pretraining (supervised learning, imitation): a corpus of human expert and user dialogues
      ● Simulated play (reinforcement learning, experimentation): a user simulator and a reward function
      ● Real interactions (feedback): real users, a reward function, and feedback
      Interactive RL to scalably manage:
      ● Task complexity
      ● Discourse complexity
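The simulated-play stage can be shown at toy scale: starting from no knowledge, a policy learns from a task-level reward and a cooperative user simulator to request each missing slot once and then book. This sketch uses tabular Q-learning as a stand-in for the deep RL in the cited work; the three slots, the rewards (-1 per turn, +20 for booking with every slot filled, -20 for booking early), and the always-answering simulator are assumptions. Supervised pretraining and turn-level feedback from real users are not modeled.

```python
import random

SLOTS = ("date", "time", "num_people")
ACTIONS = [f"request({s})" for s in SLOTS] + ["book"]


def simulate(state, action):
    """Cooperative simulated user plus task-level reward.

    state: tuple of booleans, one per slot (filled or not).
    Returns (next_state, reward, done).
    """
    if action == "book":
        return state, (20.0 if all(state) else -20.0), True
    filled = list(state)
    filled[ACTIONS.index(action)] = True  # the simulated user always answers
    return tuple(filled), -1.0, False


def train(episodes=3000, alpha=0.5, gamma=0.95, eps=0.2, max_turns=10, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {}  # (state, action) -> estimated return; missing entries count as 0

    def best(state):
        return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))

    for _ in range(episodes):
        state = (False,) * len(SLOTS)
        for _ in range(max_turns):
            action = rng.choice(ACTIONS) if rng.random() < eps else best(state)
            nxt, reward, done = simulate(state, action)
            target = reward if done else reward + gamma * q.get((nxt, best(nxt)), 0.0)
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (target - old)
            state = nxt
            if done:
                break
    return q


def greedy_dialogue(q, max_turns=10):
    """Roll out the learned policy against the simulator without exploration."""
    state, turns = (False,) * len(SLOTS), []
    for _ in range(max_turns):
        action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
        turns.append(action)
        state, _, done = simulate(state, action)
        if done:
            break
    return turns, all(state)


turns, success = greedy_dialogue(train())
print(turns, success)
```

The per-turn penalty is what teaches the policy not to re-request a filled slot; the task-level reward alone would only teach it to book eventually.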
  25. Natural Language Generation (NLG)
      ● Convert the system’s action into a natural-language system turn.
        ○ Sequence-to-sequence model with attention
      ● The system action is flattened into a sequence.
      ● The output can be de-lexicalized NL, e.g., <restaurant> does not have a table at <time1>, would <time2> work for you?
      ● Slot values are important for surface realization.
      Figure: an encoder-decoder with attention (context vectors c_i) that reads the flattened action "request time" and decodes the words of "when is your reservation …".
      N. Nayak, D. Hakkani-Tur, M. Walker, and L. Heck, “To Plan or not to Plan? Discourse planning in slot-value informed sequence to sequence models for language generation,” INTERSPEECH, 2017.
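Two steps of this pipeline are simple enough to show without a model: flattening a structured system action into the token sequence the encoder reads, and filling slot values back into a de-lexicalized output. A minimal sketch, assuming actions are lists of (act, slot, value) triples and placeholders have the `<slot>` form used in the example above; the seq2seq model with attention is not shown, and `flatten_action` and `relexicalize` are hypothetical names.

```python
import re


def flatten_action(action):
    """[(act, slot, value), ...] -> flat token sequence for the encoder.
    Values are kept (split into words) so generation can condition on them."""
    tokens = []
    for act, slot, value in action:
        tokens += [act, slot]
        if value is not None:
            tokens += str(value).split()
    return tokens


def relexicalize(template, values):
    """Fill <slot> placeholders with slot values; a missing value raises KeyError."""
    return re.sub(r"<(\w+)>", lambda m: str(values[m.group(1)]), template)


print(flatten_action([("request", "time", None)]))
# ['request', 'time']
print(flatten_action([("negate", "time", "7 pm"), ("offer", "time", "8 pm")]))
# ['negate', 'time', '7', 'pm', 'offer', 'time', '8', 'pm']
print(relexicalize("<restaurant> does not have a table at <time1>, would <time2> work for you?",
                   {"restaurant": "Cascal", "time1": "7 pm", "time2": "8 pm"}))
# Cascal does not have a table at 7 pm, would 8 pm work for you?
```

Keeping the values in the flattened input, rather than only slot names, is one way to let the generator react to them, in line with the slide's point that slot values matter for surface realization.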
  26. Goal-Oriented Dialogue Systems - Training (same architecture diagram and Cascal example as slide 7)
  27. Building User Simulators
      Diagram: a system agent (Conversational Language Understanding, Dialogue State Tracking, Dialogue Manager, Response Generation; backed by task data) converses with a user simulator made of the same components and driven by a user goal. I/O: dialogue states.
      ● User simulators mimic real users and interact with the system agent to collect data, bootstrap modeling, and perform evaluation.
  28. Building User Simulators: User Characteristics
      Personality traits: OCEAN (Wiggins, 1996), PEN (Eysenck, 1990)
      Model aspects that change conversation flow:
      ● Talkativeness
      ● Cooperativeness
      ● Consistency
      ● Flexibility
      Figure: four sliders on a 0 to 1 scale (quiet/talkative, consistent/hesitant, strict/flexible, cooperative/uncooperative) with example settings 0.71, 0.49, 0.71, and 0.26.
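One way such traits can alter conversation flow is by parameterizing how the simulated user answers a system request. A minimal sketch, assuming two of the traits are probabilities in [0, 1]: cooperativeness (answer the requested slot) and talkativeness (volunteer each other goal slot not yet given). This mapping from traits to behavior, and the name `user_turn`, are assumptions, not the model from the talk.

```python
import random


def user_turn(goal, told, requested, cooperativeness, talkativeness, rng):
    """One simulated user reply to request(<requested>).

    goal: {slot: value}; told: slots already given earlier in the dialogue.
    cooperativeness: chance of answering the requested slot.
    talkativeness: chance of volunteering each other untold goal slot.
    Returns {slot: value} to inform this turn (empty dict = no answer).
    """
    informed = {}
    if requested in goal and rng.random() < cooperativeness:
        informed[requested] = goal[requested]
    for slot, value in goal.items():
        if slot != requested and slot not in told and rng.random() < talkativeness:
            informed[slot] = value  # volunteer extra information
    return informed


goal = {"date": "tonight", "time": "7pm", "num_people": 3}
rng = random.Random(0)
print(user_turn(goal, set(), "date", 1.0, 0.0, rng))  # {'date': 'tonight'}
print(user_turn(goal, set(), "date", 0.9, 0.5, rng))  # depends on the random draws
```

A quiet, cooperative user thus needs one system request per slot, while a talkative one can fill several slots in a single turn, which changes how many turns the system policy must plan for.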
  29. Machines Talking to Machines
      Scenario: user type: cooperative; user goal: intent=reserve_restaurant, r_name=Il Fornaio, date=tonight, time=7pm*, num_people=3
      Dialogue acts between the user simulator (U) and the system agent (S):
      S: greeting()
      U: greeting intent=reserve_restaurant inform(restaurant_name=il fornaio)
      S: request(date,time)
      U: inform(date=tonight,time=7pm)
      S: request(num_people)
      U: inform(num_people=3)
      S: negate(time=7pm) offer(time=6:30)
      U: affirm()
      S: notify_success()
      U: thanks() bye()
      S: bye()
  30. Machines Talking to Machines: dialogue acts and crowd workers’ surface realization (same scenario as slide 29)
      S: greeting() → "Hi, how can I help you?"
      U: greeting intent=reserve_restaurant inform(restaurant_name=il fornaio) → "Hey, can I reserve a spot at il Fornaio."
      S: request(date,time) → "Sure, what time and day are you dining?"
      U: inform(date=tonight,time=7pm) → "The dinner is tonight at 7 pm"
      S: request(num_people) → "How many people will be attending?"
      U: inform(num_people=3) → "Myself and two others."
      S: negate(time=7pm) offer(time=6:30) → "Il Fornaio doesn’t have a table available at 7 pm. Would you be ok with 6:30 pm?"
      U: affirm() → "Sure, that is also good."
      S: notify_success() → "Great, We have your appointment all set."
      U: thanks() bye() → "Awesome, I appreciate it. have a good day."
      S: bye() → "You too. bye."
  31. Machines Talking to Machines: the dialogue acts and crowd workers’ surface realization from slide 30, with the added labels NLG, CLU, and DST
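The dialogue-act outlines on these slides come from letting a user simulator and a system agent talk to each other; crowd workers then paraphrase each turn. A minimal self-play sketch, assuming a rule-based user that answers requests from its goal and accepts an alternative only for slots it marks as flexible (one possible reading of the `*` on time), and a rule-based agent backed by a hypothetical list of available times. The act names follow the slide; the classes and `self_play` are illustrative, and the crowd paraphrasing step is not shown.

```python
def render(act):
    """Format (name, args) as, e.g., inform(date=tonight,time=7pm) or request(date,time)."""
    name, args = act
    return name + "(" + ",".join(k if v is None else f"{k}={v}" for k, v in args.items()) + ")"


class UserSimulator:
    def __init__(self, goal, flexible=()):
        self.goal = goal
        self.flexible = set(flexible)  # slots the user accepts alternatives for

    def respond(self, system_acts):
        acts = []
        for name, args in system_acts:
            if name == "greeting":
                acts += [("greeting", {}),
                         ("inform", {"intent": self.goal["intent"],
                                     "restaurant_name": self.goal["restaurant_name"]})]
            elif name == "request":
                acts.append(("inform", {s: self.goal[s] for s in args}))
            elif name == "offer":
                acts.append(("affirm", {}) if set(args) <= self.flexible else ("negate", {}))
            elif name == "notify_success":
                acts += [("thanks", {}), ("bye", {})]
        return acts


class SystemAgent:
    REQUESTS = [("date", "time"), ("num_people",)]  # slots asked for, in order

    def __init__(self, available_times):
        self.available_times = available_times  # hypothetical back-end answer (non-empty)
        self.state, self.offer, self.done = {}, None, False

    def respond(self, user_acts):
        for name, args in user_acts:
            if name == "inform":
                self.state.update(args)
            elif name == "affirm" and self.offer:
                self.state.update(self.offer)  # user accepted the alternative
                return [("notify_success", {})]
            elif name == "bye":
                self.done = True
                return [("bye", {})]
        for group in self.REQUESTS:
            missing = [s for s in group if s not in self.state]
            if missing:
                return [("request", dict.fromkeys(missing))]
        if self.state["time"] not in self.available_times:
            self.offer = {"time": self.available_times[0]}
            return [("negate", {"time": self.state["time"]}), ("offer", self.offer)]
        return [("notify_success", {})]


def self_play(user, agent, max_turns=10):
    """Alternate system and user turns; return the outline as text lines."""
    lines, system_acts = [], [("greeting", {})]
    for _ in range(max_turns):
        lines.append("S: " + " ".join(map(render, system_acts)))
        if agent.done:
            break
        user_acts = user.respond(system_acts)
        lines.append("U: " + " ".join(map(render, user_acts)))
        system_acts = agent.respond(user_acts)
    return lines


goal = {"intent": "reserve_restaurant", "restaurant_name": "il fornaio",
        "date": "tonight", "time": "7pm", "num_people": 3}
for line in self_play(UserSimulator(goal, flexible={"time"}), SystemAgent(["6:30"])):
    print(line)
```

With `flexible` left empty, the simulated user rejects every offer and the loop runs until `max_turns`, so the same harness can also surface dialogues where the agent's policy never reaches the goal.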
  32. What is next?
      ● Understanding meaning beyond words
        ○ “Later today”: 7-9pm for dinner, 3-5pm for meetings
      ● Personalization
      ● More lively conversations
      ● Complex conversations
        ○ Compositionality
        ○ Multi-domain tasks
      ● Interactions beyond domain boundaries
  33. Thanks!
