
[Event Series] Understanding Conversational Bots in One Day (一天搞懂對話機器人)

台灣資料科學年會
Apr. 22, 2017



  1. 2 Future Life – Artificial Intelligence 2  Iron Man’s JARVIS https://www.facebook.com/zuck/videos/10103351034741311/
  2. 3 Tutorial Outline I. Dialogue Systems and Background Knowledge II. Language Understanding III. Dialogue Management IV. Language Generation V. Dialogue System Evaluation, Development, and Trends 3
  3. 4 Tutorial Outline I. Dialogue Systems and Background Knowledge II. Language Understanding III. Dialogue Management IV. Language Generation V. Dialogue System Evaluation, Development, and Trends 4
  4. 5 Outline  Introduction  Background Knowledge  Neural Network Basics  Reinforcement Learning  Modular Dialogue System  Spoken/Natural Language Understanding (SLU/NLU)  Dialogue Management  Dialogue State Tracking (DST)  Dialogue Policy Optimization  Natural Language Generation (NLG) 5
  5. Introduction Introduction 6
  6. 7 Early 1990s Early 2000s 2017 Multi-modal systems e.g., Microsoft MiPad, Pocket PC Keyword Spotting (e.g., AT&T) System: “Please say collect, calling card, person, third number, or operator” TV Voice Search e.g., Bing on Xbox Intent Determination (Nuance’s Emily™, AT&T HMIHY) User: “Uh…we want to move…we want to change our phone line from this house to another house” Task-specific argument extraction (e.g., Nuance, SpeechWorks) User: “I want to fly from Boston to New York next week.” Brief History of Dialogue Systems Apple Siri (2011) Google Now (2012) Facebook M & Bot (2015) Google Home (2016) Microsoft Cortana (2014) Amazon Alexa/Echo (2014) Google Assistant (2016) DARPA CALO Project Virtual Personal Assistants
  7. 8 Language Empowering Intelligent Assistant Apple Siri (2011) Google Now (2012) Facebook M & Bot (2015) Google Home (2016) Microsoft Cortana (2014) Amazon Alexa/Echo (2014) Google Assistant (2016)
  8. 9 Why Do We Need It?  Daily Life Usage  Weather  Schedule  Transportation  Restaurant Seeking 9
  9. 10 Why Do We Need It?  Get things done  E.g. set up an alarm/reminder, take a note  Easy access to structured data, services and apps  E.g. find docs/photos/restaurants  Assist your daily schedule and routine  E.g. commute alerts to/from work  Be more productive in managing your work and personal life 10
  10. 11 Why Natural Language?  Global Digital Statistics (January 2015) 11 Global Population 7.21B Active Internet Users 3.01B Active Social Media Accounts 2.08B Active Unique Mobile Users 3.65B  Device input is evolving towards speech as the more natural and convenient modality.
  11. 12 Intelligent Assistant Architecture 12 Reactive Assistance ASR, LU, Dialog, LG, TTS Proactive Assistance Inferences, User Modeling, Suggestions Data Back-end Data Bases, Services and Client Signals Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps) User Experience “restaurant suggestions”“call taxi”
  12. 13 Spoken Dialogue System (SDS)  Spoken dialogue systems are intelligent agents that are able to help users finish tasks more efficiently via spoken interactions.  Spoken dialogue systems are being incorporated into various devices (smart-phones, smart TVs, in-car navigating system, etc). 13 JARVIS – Iron Man’s Personal Assistant Baymax – Personal Healthcare Companion Good dialogue systems assist users to access information conveniently and finish tasks efficiently.
  13. 14 App → Bot  A bot is responsible for a “single” domain, similar to an app 14 Seamless and automatic information transfer across domains → reduces duplicate information and interaction Food MAP LINE Goal: Schedule lunch with Vivian KKBOX
  14. 15 GUI vs. CUI (Conversational UI) 15 https://github.com/enginebai/Movie-lol-android
  15. 16 GUI vs. CUI (Conversational UI) — GUI in a website/app vs. CUI in instant messaging:
   Scenario: exploratory, the user has no specific goal vs. search-driven, the user issues explicit requests
   Information volume: large vs. small
   Information precision: low vs. high
   Information presentation: structured vs. unstructured
   Interface presentation: mainly graphical vs. mainly text
   Interface operation: mainly point-and-click vs. mainly text or voice input
   Learning and guidance: the user must understand, learn, and adapt to different interface operations vs. as with a messaging app, no extra learning is needed, the user just follows the prompts
   Service entry point: requires downloading an app or visiting a specific website vs. integrated with messaging apps, services can be offered within the conversation
   Service content: needs a clear structure for retrieval vs. can accommodate large, flexible content
   Degree of human-likeness: low, like operating a machine vs. high, like talking to a person
   16
  16. 17 ChatBot Ecosystem & Company 17
  17. Two Branches of Bots  Personal assistant, helps users achieve a certain task  Combination of rules and statistical components  POMDP for spoken dialog systems (Williams and Young, 2007)  End-to-end trainable task-oriented dialogue system (Wen et al., 2016)  End-to-end reinforcement learning dialogue system (Li et al., 2017; Zhao and Eskenazi, 2016)  No specific goal, focus on natural responses  Using variants of seq2seq model  A neural conversation model (Vinyals and Le, 2015)  Reinforcement learning for dialogue generation (Li et al., 2016)  Conversational contextual cues for response ranking (Al-Rfou et al., 2016) 18 Task-Oriented Bot Chit-Chat Bot
  18. 19 Task-Oriented Dialogue System (Young, 2000) 19 Speech Recognition Language Understanding (LU) • Domain Identification • User Intent Detection • Slot Filling Dialogue Management (DM) • Dialogue State Tracking (DST) • Dialogue Policy Natural Language Generation (NLG) Hypothesis are there any action movies to see this weekend Semantic Frame request_movie genre=action, date=this weekend System Action/Policy request_location Text response Where are you located? Text Input Are there any action movies to see this weekend? Speech Signal Backend Action / Knowledge Providers http://rsta.royalsocietypublishing.org/content/358/1769/1389.short
  19. 20 Interaction Example 20 User Intelligent Agent Q: How does a dialogue system process this request? Good Taiwanese eating places include Din Tai Fung, Boiling Point, etc. What do you want to choose? I can help you go there. find a good eating place for taiwanese food
  20. 21 Task-Oriented Dialogue System (Young, 2000) 21 Speech Recognition Language Understanding (LU) • Domain Identification • User Intent Detection • Slot Filling Dialogue Management (DM) • Dialogue State Tracking (DST) • Dialogue Policy Natural Language Generation (NLG) Hypothesis are there any action movies to see this weekend Semantic Frame request_movie genre=action, date=this weekend System Action/Policy request_location Text response Where are you located? Text Input Are there any action movies to see this weekend? Speech Signal Backend Action / Knowledge Providers
  21. 22 1. Domain Identification Requires Predefined Domain Ontology 22 find a good eating place for taiwanese food User Organized Domain Knowledge (Database)Intelligent Agent Restaurant DB Taxi DB Movie DB Classification!
  22. 23 2. Intent Detection Requires Predefined Schema 23 find a good eating place for taiwanese food User Intelligent Agent Restaurant DB FIND_RESTAURANT FIND_PRICE FIND_TYPE : Classification!
  23. 24 3. Slot Filling Requires Predefined Schema find a good eating place for taiwanese food User Intelligent Agent 24 Restaurant DB Restaurant Rating Type Rest 1 good Taiwanese Rest 2 bad Thai : : : FIND_RESTAURANT rating=“good” type=“taiwanese” SELECT restaurant { rest.rating=“good” rest.type=“taiwanese” }Semantic Frame Sequence Labeling O O B-rating O O O B-type O
  24. 25 Task-Oriented Dialogue System (Young, 2000) 25 Speech Recognition Language Understanding (LU) • Domain Identification • User Intent Detection • Slot Filling Dialogue Management (DM) • Dialogue State Tracking (DST) • Dialogue Policy Natural Language Generation (NLG) Hypothesis are there any action movies to see this weekend Semantic Frame request_movie genre=action, date=this weekend System Action/Policy request_location Text response Where are you located? Text Input Are there any action movies to see this weekend? Speech Signal Backend Action / Knowledge Providers
  25. 26 State Tracking Requires Hand-Crafted States User Intelligent Agent find a good eating place for taiwanese food / i want it near to my office 26 States: NULL, location, rating, type, {loc, rating}, {rating, type}, {loc, type}, all
  26. 27 State Tracking Requires Hand-Crafted States User Intelligent Agent find a good eating place for taiwanese food / i want it near to my office 27 States: NULL, location, rating, type, {loc, rating}, {rating, type}, {loc, type}, all
  27. 28 State Tracking Handling Errors and Confidence User Intelligent Agent find a good eating place for taixxxx food 28 Hypotheses: FIND_RESTAURANT(rating=“good”, type=“taiwanese”), FIND_RESTAURANT(rating=“good”, type=“thai”), FIND_RESTAURANT(rating=“good”, type=?) States: NULL, location, rating, type, {loc, rating}, {rating, type}, {loc, type}, all
  28. 29 Policy for Agent Action & Generation  Inform  “The nearest one is at Taipei 101”  Request  “Where is your home?”  Confirm  “Did you want Taiwanese food?”  Database Search  Task Completion / Information Display  ticket booked, weather information 29 Din Tai Fung : :
  29. Background Knowledge30 Neural Network Basics Reinforcement Learning
  30. 31 Outline  Introduction  Background Knowledge  Neural Network Basics  Reinforcement Learning  Modular Dialogue System  Spoken/Natural Language Understanding (SLU/NLU)  Dialogue Management  Dialogue State Tracking (DST)  Dialogue Policy Optimization  Natural Language Generation (NLG) 31
  31. 32 What Can Computers Do? 32 Programs can do the things you ask them to do
  32. 33 Program for Solving Tasks  Task: predicting positive or negative given a product review 33 “I love this product!” (+) “It claims too much.” (-) “It’s a little expensive.” (?) “台灣第一波上市!” (推 = upvote) “規格好雞肋…” (噓 = boo) “樓下買了我才考慮” (?) program.py: if input contains “love”, “like”, etc. → output = positive; if input contains “too much”, “bad”, etc. → output = negative  Some tasks are complex, and we don’t know how to write a program to solve them.
  33. 34 Learning ≈ Looking for a Function  Task: predicting positive or negative given a product review 34 “I love this product!” (+) “It claims too much.” (-) “It’s a little expensive.” (?) “台灣第一波上市!” (推) “規格好雞肋…” (噓) “樓下買了我才考慮” (?) — each review is fed to the same function f  Given a large amount of data, the machine learns what the function f should be.
  34. 35 Learning ≈ Looking for a Function  Speech Recognition: f(audio) = “你好”  Image Recognition: f(image) = “cat”  Go Playing: f(board) = “5-5” (next move)  Chat Bot: f(“中研院在哪裡”) = “地址為… 需要為你導航嗎”
  35. 36 Framework 36 Image Recognition: a model is a set of candidate functions f1, f2, …; e.g. f1(image) = “cat”, f1(image) = “dog”, f2(image) = “monkey”, f2(image) = “snake”
  36. 37 Framework 37 Image Recognition: f(image) = “cat”  Model: a set of functions f1, f2, …  Training data (images labeled “monkey”, “cat”, “dog”) measures the goodness of each function f → pick the “best” function f*  Training is to pick the best function given the observed data; testing is to predict the label using the learned function f*
  37. 38 Machine Learning 38 Machine Learning Unsupervised Learning Supervised Learning Reinforcement Learning
  38. 39 Deep Learning  DL is a general purpose framework for representation learning  Given an objective  Learn the representation that is required to achieve the objective  Directly from raw inputs  Use minimal domain knowledge 39 (a network maps an input vector x ∈ R^N to an output vector y ∈ R^M)
  39. 40 A Single Neuron  z = w1 x1 + w2 x2 + … + wN xN + b (bias)  Activation function: sigmoid σ(z) = 1 / (1 + e^(−z))  Output y = σ(z)  w, b are the parameters of this neuron 40
  40. 41 A Single Neuron  f: R^N → R  With a sigmoid output, decide “is 2” if y ≥ 0.5 and “not 2” if y < 0.5  A single neuron can only handle binary classification 41
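To make the neuron above concrete, here is a minimal NumPy sketch (not from the slides) of z = w·x + b followed by the sigmoid activation, with the output thresholded at 0.5 for the “is 2 / not 2” decision; all numbers are made up.

    import numpy as np

    def sigmoid(z):
        # sigmoid activation: squashes z into (0, 1)
        return 1.0 / (1.0 + np.exp(-z))

    def single_neuron(x, w, b):
        # z = w1*x1 + w2*x2 + ... + wN*xN + b, then y = sigmoid(z)
        return sigmoid(np.dot(w, x) + b)

    x = np.array([0.3, 0.8, 0.1])    # input features (made-up values)
    w = np.array([1.5, -2.0, 0.4])   # weights (parameters of the neuron)
    b = 0.1                          # bias (parameter of the neuron)
    y = single_neuron(x, w, b)
    print(y, "is 2" if y >= 0.5 else "not 2")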
  41. 42 A Layer of Neurons  Handwriting digit classification: f: R^N → R^M  A layer of neurons can handle multiple possible outputs, and the result is the class whose neuron output is the maximum  e.g. 10 neurons for 10 digit classes, one per decision (“1” or not, “2” or not, “3” or not, …) — which one is max?
  42. 43 Deep Neural Networks (DNN)  Fully connected feedforward network f: R^N → R^M  The input vector x ∈ R^N passes through Layer 1 … Layer L to produce the output vector y ∈ R^M  Deep NN: multiple hidden layers
  43. 44 Loss / Objective 44  The loss 𝑙 measures how far the network output (after softmax) is from the target (e.g. the one-hot vector for digit “1”)  Loss can be square error or cross entropy between the network output and target  A good function should make the loss of all examples as small as possible, given a set of parameters
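A minimal NumPy sketch of the loss described above: the network outputs are turned into a distribution with softmax and compared against a one-hot target (digit “1”) with cross entropy; square error is included for contrast. The logits are random placeholders, not values from the lecture.

    import numpy as np

    def softmax(logits):
        e = np.exp(logits - np.max(logits))   # subtract max for numerical stability
        return e / e.sum()

    def cross_entropy(y_pred, y_target):
        return -np.sum(y_target * np.log(y_pred + 1e-12))

    def square_error(y_pred, y_target):
        return np.sum((y_pred - y_target) ** 2)

    logits = np.random.randn(10)              # raw outputs for the 10 digit classes
    y_pred = softmax(logits)                  # network output as a distribution
    y_target = np.zeros(10)
    y_target[1] = 1.0                         # one-hot target for digit "1"
    print(cross_entropy(y_pred, y_target), square_error(y_pred, y_target))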
  44. 45 Recurrent Neural Network (RNN) http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/  The network is unrolled over time; the activation is typically tanh or ReLU  RNN can learn accumulated sequential information (time-series)
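A minimal NumPy sketch of one recurrent step in its common form, h_t = tanh(W_xh x_t + W_hh h_{t-1} + b); the dimensions and random inputs are placeholders, not the lecture’s own code.

    import numpy as np

    def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
        # the new hidden state mixes the current input with the previous hidden
        # state, so it accumulates information over the sequence
        return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

    rng = np.random.default_rng(0)
    W_xh = rng.normal(size=(3, 4))            # input-to-hidden weights (4-dim input)
    W_hh = rng.normal(size=(3, 3))            # hidden-to-hidden weights (3-dim state)
    b_h = np.zeros(3)
    h = np.zeros(3)                           # initial hidden state
    for x_t in rng.normal(size=(5, 4)):       # a sequence of 5 time steps
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)
    print(h)                                  # final state summarizes the sequence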
  45. 46 Model Training  All model parameters can be updated by SGD, comparing the predicted outputs yt-1, yt, yt+1 against their targets 46 http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
  46. 47 BPTT 47  Forward pass: compute the hidden states s1, s2, s3, s4 and outputs y1 … y4 for each time step  Backward pass: backpropagate the costs 𝐶(1), 𝐶(2), 𝐶(3), 𝐶(4) through time  The model is trained by comparing the correct sequence tags and the predicted ones
  47. 48 Outline  Introduction  Background Knowledge  Neural Network Basics  Reinforcement Learning  Modular Dialogue System  Spoken/Natural Language Understanding (SLU/NLU)  Dialogue Management  Dialogue State Tracking (DST)  Dialogue Policy Optimization  Natural Language Generation (NLG) 48
  48. 49 Reinforcement Learning  RL is a general purpose framework for decision making  RL is for an agent with the capacity to act  Each action influences the agent’s future state  Success is measured by a scalar reward signal  Goal: select actions to maximize future reward Big three: action, state, reward
  49. 50 Reinforcement Learning 50 Agent Environment Observation Action Reward: “Don’t do that”
  50. 51 Reinforcement Learning 51 Agent Environment Observation Action Reward: “Thank you.” Agent learns to take actions to maximize expected reward.
  51. Scenario of Reinforcement Learning Agent learns to take actions to maximize expected reward. 52 Environment Observation ot Action at Reward rt If win, reward = 1 If loss, reward = -1 Otherwise, reward = 0 Next Move
  52. 53 Supervised vs. Reinforcement  Supervised: learning from a teacher (“Hello” → say “Hi”, “Bye bye” → say “Good bye”)  Reinforcement: learning from a critic (the agent converses, and only receives feedback such as “Bad”) 53
  53. 54 Supervised vs. Reinforcement  Supervised Learning  Training based on supervisor/label/annotation  Feedback is instantaneous  Time does not matter 54  Reinforcement Learning  Training only based on reward signal  Feedback is delayed  Time matters  Agent actions affect subsequent data
  54. 55 Reward  Reinforcement learning is based on the reward hypothesis  A reward rt is a scalar feedback signal  Indicates how well the agent is doing at step t  Reward hypothesis: all agent goals can be described by the maximization of expected cumulative reward
  55. 56 Sequential Decision Making  Goal: select actions to maximize total future reward  Actions may have long-term consequences  Reward may be delayed  It may be better to sacrifice immediate reward to gain more long-term reward 56
  56. 57 Deep Reinforcement Learning Environment Observation Action Reward  The observation is the function input, the action is the function output, and the reward is used to pick the best function  The function is represented by a DNN
  57. Modular Dialogue System58
  58. 59 Outline  Introduction  Background Knowledge  Neural Network Basics  Reinforcement Learning  Modular Dialogue System  Spoken/Natural Language Understanding (SLU/NLU)  Dialogue Management  Dialogue State Tracking (DST)  Dialogue Policy Optimization  Natural Language Generation (NLG) 59
  59. 60 Semantic Frame Representation  Requires a domain ontology  Contains core content (intent, a set of slots with fillers) find a cheap taiwanese restaurant in oakland show me action movies directed by james cameron find_restaurant (price=“cheap”, type=“taiwanese”, location=“oakland”) find_movie (genre=“action”, director=“james cameron”) Restaurant Domain Movie Domain restaurant typeprice location movie yeargenre director 60
  60. 61 Language Understanding (LU)  Pipelined 61 1. Domain Classification 2. Intent Classification 3. Slot Filling
  61. LU – Domain/Intent Classification • Given a collection of utterances ui with labels ci, D= {(u1,c1),…,(un,cn)} where ci ∊ C, train a model to estimate labels for new utterances uk. Mainly viewed as an utterance classification task 62 find me a cheap taiwanese restaurant in oakland Movies Restaurants Sports Weather Music … Find_movie Buy_tickets Find_restaurant Book_table Find_lyrics …
  62. 63 Language Understanding - Slot Filling 63 Is there um a cheap place in the centre of town please? → O O O O B-price O O O B-area I-area I-area O Figure credit: Milica Gašić As a sequence tagging task • CRF for tagging each utterance As a classification task • SVM for each slot-value pair
  63. 64 Outline  Introduction  Background Knowledge  Neural Network Basics  Reinforcement Learning  Modular Dialogue System  Spoken/Natural Language Understanding (SLU/NLU)  Dialogue Management  Dialogue State Tracking (DST)  Dialogue Policy Optimization  Natural Language Generation (NLG) 64
  64. 65 Dialogue State Tracking (DST)  Maintain a probabilistic distribution instead of a 1-best prediction for better robustness 65 Slide credit: Sungjin Lee Incorrect for both!
  65. 66 Dialogue State Tracking (DST)  Maintain a probabilistic distribution instead of a 1-best prediction for better robustness to SLU errors or ambiguous input 66 How can I help you? Book a table at Sumiko for 5 How many people? 3 Slot Value # people 5 (0.5) time 5 (0.5) Slot Value # people 3 (0.8) time 5 (0.8)
  66. 67 Outline  Introduction  Background Knowledge  Neural Network Basics  Reinforcement Learning  Modular Dialogue System  Spoken/Natural Language Understanding (SLU/NLU)  Dialogue Management  Dialogue State Tracking (DST)  Dialogue Policy Optimization  Natural Language Generation (NLG) 67
  67. 68 Dialogue Policy Optimization  Dialogue management in an RL framework 68 User Reward R Observation O Action A Environment Agent Natural Language Generation Language Understanding Dialogue Manager Slide credit: Pei-Hao Su The optimized dialogue policy selects the best action that can maximize the future reward. Correct rewards are a crucial factor in dialogue policy training
  68. 69 Reward for RL ≅ Evaluation for SDS  Dialogue is a special RL task  Humans are involved in the interaction and in rating (evaluating) the dialogue  Fully human-in-the-loop framework  Rating: correctness, appropriateness, and adequacy - Expert rating: high quality, high cost - User rating: unreliable quality, medium cost - Objective rating: checks desired aspects, low cost 69
  69. 70 Dialogue Reinforcement Signal Typical Reward Function  per-turn penalty of -1  Large reward at completion if successful  Typically requires domain knowledge ✔ Simulated user ✔ Paid users (Amazon Mechanical Turk) ✖ Real users 70
  70. 71 User Simulation  User Simulation  Goal: generate natural and reasonable conversations to enable reinforcement learning for exploring the policy space  Approach  Rule-based crafted by experts (Li et al., 2016)  Learning-based (Schatzmann et al., 2006) Dialogue Corpus Simulated User Real User Dialogue Management (DM) • Dialogue State Tracking (DST) • Dialogue Policy Interaction 71
  71. 72 Outline  Introduction  Background Knowledge  Neural Network Basics  Reinforcement Learning  Modular Dialogue System  Spoken/Natural Language Understanding (SLU/NLU)  Dialogue Management  Dialogue State Tracking (DST)  Dialogue Policy Optimization  Natural Language Generation (NLG) 72
  72. 73 Natural Language Generation (NLG)  Mapping semantic frame into natural language inform(name=Seven_Days, foodtype=Chinese) Seven Days is a nice Chinese restaurant 73
  73. 74 Template-Based NLG  Define a set of rules to map frames to NL 74 Pros: simple, error-free, easy to control Cons: time-consuming, poor scalability Semantic Frame Natural Language confirm() “Please tell me more about the product you are looking for.” confirm(area=$V) “Do you want somewhere in the $V?” confirm(food=$V) “Do you want a $V restaurant?” confirm(food=$V,area=$W) “Do you want a $V restaurant in the $W?”
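A minimal Python sketch of the table above: each (dialogue act, slot set) pair selects a template and the slot values fill the placeholders. The dictionary layout is an assumption for illustration, not the system’s actual implementation.

    TEMPLATES = {
        ("confirm",): "Please tell me more about the product you are looking for.",
        ("confirm", "area"): "Do you want somewhere in the {area}?",
        ("confirm", "food"): "Do you want a {food} restaurant?",
        ("confirm", "area", "food"): "Do you want a {food} restaurant in the {area}?",
    }

    def generate(dialogue_act, slots):
        # pick the template keyed by the act plus the sorted slot names,
        # then fill in the slot values
        key = (dialogue_act, *sorted(slots))
        return TEMPLATES[key].format(**slots)

    print(generate("confirm", {"food": "Thai", "area": "centre"}))
    # -> Do you want a Thai restaurant in the centre?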
  74. 75 RNN Language Generator (Wen et al., 2015) <BOS> SLOT_NAME serves SLOT_FOOD . <BOS> EAT serves British . delexicalisation Inform(name=EAT, food=British) 0, 0, 1, 0, 0, …, 1, 0, 0, …, 1, 0, 0, 0, 0, 0… … dialog act 1-hot representation SLOT_NAME serves SLOT_FOOD . <EOS> Weight tying 75
  75. 76 Concluding Remarks  Modular dialogue system 76 Speech Recognition Language Understanding (LU) • Domain Identification • User Intent Detection • Slot Filling Dialogue Management (DM) • Dialogue State Tracking (DST) • Dialogue Policy Natural Language Generation (NLG) Hypothesis are there any action movies to see this weekend Semantic Frame request_movie genre=action, date=this weekend System Action/Policy request_location Text response Where are you located? Text Input Are there any action movies to see this weekend? Speech Signal
  76. 77 Tutorial Outline I. Dialogue Systems and Background Knowledge II. Language Understanding III. Dialogue Management IV. Language Generation V. Dialogue System Evaluation, Development, and Trends 77
  77. Language Understanding – Ontology and Database 78
  78. 79 Task-Oriented Dialogue System (Young, 2000) 79 Speech Recognition Language Understanding (LU) • Domain Identification • User Intent Detection • Slot Filling Dialogue Management (DM) • Dialogue State Tracking (DST) • Dialogue Policy Natural Language Generation (NLG) Hypothesis are there any action movies to see this weekend Semantic Frame request_movie genre=action, date=this weekend System Action/Policy request_location Text response Where are you located? Text Input Are there any action movies to see this weekend? Speech Signal Backend Database/ Knowledge Providers http://rsta.royalsocietypublishing.org/content/358/1769/1389.short
  79. 80 Task-Oriented Dialogue System (Young, 2000) 80 Speech Recognition Language Understanding (LU) • Domain Identification • User Intent Detection • Slot Filling Dialogue Management (DM) • Dialogue State Tracking (DST) • Dialogue Policy Natural Language Generation (NLG) Hypothesis are there any action movies to see this weekend Semantic Frame request_movie genre=action, date=this weekend System Action/Policy request_location Text response Where are you located? Text Input Are there any action movies to see this weekend? Speech Signal Backend Action / Knowledge Providers http://rsta.royalsocietypublishing.org/content/358/1769/1389.short
  80. 81 Semantic Frame Representation  Requires a domain ontology: early connection to backend  Contains core concept (intent, a set of slots with fillers) find me a cheap taiwanese restaurant in oakland show me action movies directed by james cameron find_restaurant (price=“cheap”, type=“taiwanese”, location=“oakland”) find_movie (genre=“action”, director=“james cameron”) Restaurant Domain Movie Domain restaurant typeprice location movie yeargenre director 81
  81. 82 Backend Database / Ontology 82  Domain-specific table  Target and attributes Movie Name Theater Rating Date Time 美女與野獸 台北信義威秀 8.5 2017/03/21 09:00 美女與野獸 台北信義威秀 8.5 2017/03/21 09:25 美女與野獸 台北信義威秀 8.5 2017/03/21 10:15 美女與野獸 台北信義威秀 8.5 2017/03/21 10:40 美女與野獸 台北信義威秀 8.5 2017/03/21 11:05 movie name daterating theater time
  82. 83 Supported Functionality  Information access  Finding the specific entries from the table  E.g. available theater, movie rating, etc.  Task completion  Finding the row that satisfies the constraints 83 Movie Name Theater Rating Date Time 美女與野獸 台北信義威秀 8.5 2017/03/21 09:00 美女與野獸 台北信義威秀 8.5 2017/03/21 09:25 美女與野獸 台北信義威秀 8.5 2017/03/21 10:15 美女與野獸 台北信義威秀 8.5 2017/03/21 10:40 美女與野獸 台北信義威秀 8.5 2017/03/21 11:05
  83. 84 Dialogue Schema  Slot: domain-specific attributes  Columns from the table  e.g. theater, date 84 Movie Name Theater Rating Date Time 美女與野獸 台北信義威秀 8.5 2017/03/21 09:00 美女與野獸 台北信義威秀 8.5 2017/03/21 09:25 美女與野獸 台北信義威秀 8.5 2017/03/21 10:15 美女與野獸 台北信義威秀 8.5 2017/03/21 10:40 美女與野獸 台北信義威秀 8.5 2017/03/21 11:05
  84. 85 Dialogue Schema  Dialogue Act  inform, request, confirm (system only)  Task-specific action (e.g. book_ticket)  Others (e.g. thanks) 85 Find me the Bill Murray’s movie. I think it came out in 1993. When was it released? BotUser request(moviename; actor=Bill Murray) request(releaseyear) inform(releaseyear=1993) User Intent = Dialogue Act + Slot System Action = Dialogue Act + Slot
  85. 86 Examples  User: 我想要聽適合唸書的歌 (“I want to listen to songs suitable for studying”) request(song; situation=“唸書”)  Requires situation labels for songs  User: 請問附近有類似有錢真好的店嗎? (“Is there a restaurant nearby similar to 有錢真好?”) request(restaurant; similar_to=“有錢真好”)  Requires similar_to labels for restaurants 86
  86. 87 Examples  User: 離散數學的教室是哪間? (“Which room is the Discrete Mathematics class in?”)  User: 週四7,8,9有資訊系的課嗎? (“Are there any CS department courses in periods 7–9 on Thursday?”) 87
  87. 88 Trick  The typed constraints are attributes/slots stored in the columns 88
  88. 89 Table & Graph Backend 89 https://www.google.com/intl/es419/insidesearch/features/search/knowledge.html
  89. 90 Data Collection  Given the task-specific goal, what do users say? 90 Movie Name Theater Rating Date Time 美女與野獸 台北信義威秀 8.5 2017/03/21 09:00 美女與野獸 台北信義威秀 8.5 2017/03/21 09:25 美女與野獸 台北信義威秀 8.5 2017/03/21 10:15 美女與野獸 台北信義威秀 8.5 2017/03/21 10:40 美女與野獸 台北信義威秀 8.5 2017/03/21 11:05 Information access: 你知道美女與野獸電影的評價如何嗎? (“Do you know how the movie 美女與野獸 is rated?”) Task completion: 我想看十點左右在信義威秀的電影 (“I want to see a movie around ten at 信義威秀”)
  90. 91 Annotation Design  Query target: intent  request + targeted attribute  Constraints: slots & values  Attributes with specified values 你知道美女與野獸電影的評價如何嗎? request(rating; moviename=美女與野獸) book_ticket(time=十點左右, theater=信義威秀) 我想看十點左右在信義威秀的電影
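One possible way to store the two annotations above as records; the field names and layout are a hypothetical illustration of the intent/constraint split, not a prescribed schema.

    annotations = [
        {   # 你知道美女與野獸電影的評價如何嗎?
            "intent": "request",
            "requested_slot": "rating",
            "constraints": {"moviename": "美女與野獸"},
        },
        {   # 我想看十點左右在信義威秀的電影
            "intent": "book_ticket",
            "requested_slot": None,
            "constraints": {"time": "十點左右", "theater": "信義威秀"},
        },
    ]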
  91. 92 Extension  Multiple tables as backend  Each table is responsible for a domain  Associate intent with corresponding table/domain Movie Name Theater Date Time 美女與野獸 台北信義威秀 2017/03/21 09:00 美女與野獸 台北信義威秀 2017/03/21 09:25 美女與野獸 台北信義威秀 2017/03/21 10:15 Movie Name Cast Release Year 美女與野獸 Emma Watson 2017 鋼鐵人 : : 你知道美女與野獸是那一年的電影嗎? request(kng.release_year; moviename=美女與野獸) Ticket Table Knowledge Table
  92. 93 Concluding Remarks  Semantic schema highly corresponds to backend DB Movie Name Theater Rating Date Time 美女與野獸 台北信義威秀 8.5 2017/03/21 09:00 美女與野獸 台北信義威秀 8.5 2017/03/21 09:25 美女與野獸 台北信義威秀 8.5 2017/03/21 10:15 美女與野獸 台北信義威秀 8.5 2017/03/21 10:40 美女與野獸 台北信義威秀 8.5 2017/03/21 11:05 slot show me the rating of beauty and beast find_rating (movie=“beauty and beast”) movie timerating date
  93. Language Understanding 94
  94. 95 Task-Oriented Dialogue System (Young, 2000) 95 Speech Recognition Language Understanding (LU) • Domain Identification • User Intent Detection • Slot Filling Dialogue Management (DM) • Dialogue State Tracking (DST) • Dialogue Policy Natural Language Generation (NLG) Hypothesis are there any action movies to see this weekend Semantic Frame request_movie genre=action, date=this weekend System Action/Policy request_location Text response Where are you located? Text Input Are there any action movies to see this weekend? Speech Signal Backend Database/ Knowledge Providers http://rsta.royalsocietypublishing.org/content/358/1769/1389.short
  95. 96 Task-Oriented Dialogue System (Young, 2000) 96 Speech Recognition Language Understanding (LU) • Domain Identification • User Intent Detection • Slot Filling Dialogue Management (DM) • Dialogue State Tracking (DST) • Dialogue Policy Natural Language Generation (NLG) Hypothesis are there any action movies to see this weekend Semantic Frame request_movie genre=action, date=this weekend System Action/Policy request_location Text response Where are you located? Text Input Are there any action movies to see this weekend? Speech Signal Backend Action / Knowledge Providers http://rsta.royalsocietypublishing.org/content/358/1769/1389.short
  96. Classic Language Understanding97
  97. 98 Language Understanding (LU)  Pipelined 98 1. Domain Classification 2. Intent Classification 3. Slot Filling
  98. LU – Domain/Intent Classification As an utterance classification task • Given a collection of utterances ui with labels ci, D= {(u1,c1),…,(un,cn)} where ci ∊ C, train a model to estimate labels for new utterances uk. 99 find me a cheap taiwanese restaurant in oakland Movies Restaurants Music Sports … find_movie, buy_tickets find_restaurant, find_price, book_table find_lyrics, find_singer … Domain Intent
  99. 100 Conventional Approach 100 Data Model Prediction dialogue utterances annotated with domains/intents domains/intents machine learning classification model e.g. support vector machine (SVM)
  100. 101 Theory: Support Vector Machine  SVM is a maximum margin classifier  Input data points are mapped into a high dimensional feature space where the data is linearly separable  Support vectors are input data points that lie on the margin 101 http://www.csie.ntu.edu.tw/~htlin/mooc/
  101. 102 Theory: Support Vector Machine  Multiclass SVM  Extended using one-versus-rest approach  Then transformed into probabilities http://www.csie.ntu.edu.tw/~htlin/mooc/ SVM1 SVM2 SVM3 … SVMk → scores S1 S2 S3 … Sk for each class → probabilities P1 P2 P3 … Pk for each class Domain/intent can be decided based on the estimated scores
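A minimal scikit-learn sketch of this kind of classifier (bag-of-words features plus a linear SVM, which scikit-learn trains one-versus-rest); the utterances and intent labels are made up, and calibrating the scores into probabilities (e.g. Platt scaling) is omitted for brevity.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    train_utterances = [
        "find me a cheap taiwanese restaurant in oakland",
        "book a table for two tonight",
        "show me action movies this weekend",
        "buy two tickets for the new movie",
    ]
    train_intents = ["find_restaurant", "book_table", "find_movie", "buy_tickets"]

    # LinearSVC uses a one-versus-rest scheme for multiclass problems
    clf = make_pipeline(CountVectorizer(), LinearSVC())
    clf.fit(train_utterances, train_intents)
    print(clf.predict(["any cheap thai place nearby"]))   # predicted intent label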
  102. LU – Slot Filling 103 flights from Boston to New York today — Entity tags: O O B-city O B-city I-city O; Slot tags: O O B-dept O B-arrival I-arrival B-date As a sequence tagging task • Given a collection of tagged word sequences, S={((w1,1,w1,2,…, w1,n1), (t1,1,t1,2,…,t1,n1)), ((w2,1,w2,2,…,w2,n2), (t2,1,t2,2,…,t2,n2)) …} where ti ∊ M, the goal is to estimate tags for a new word sequence.
  103. 104 Conventional Approach 104 Data Model Prediction dialogue utterances annotated with slots slots and their values machine learning tagging model e.g. conditional random fields (CRF)
  104. 105 Theory: Conditional Random Fields  CRF assumes that the label at time step t depends on the label in the previous time step t-1  Maximize the log probability log p(y | x) with respect to parameters λ 105 input output Slots can be tagged based on the y that maximizes p(y|x)
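A minimal sketch of CRF slot tagging using the sklearn-crfsuite package (an external library, `pip install sklearn-crfsuite`); the per-token features and the single training example are made up for illustration, and a real system would train on many tagged utterances.

    import sklearn_crfsuite

    def token_features(tokens, i):
        # simple per-token features: the word, its lowercase form, and its neighbours
        feats = {"word": tokens[i], "lower": tokens[i].lower(),
                 "is_first": i == 0, "is_last": i == len(tokens) - 1}
        if i > 0:
            feats["prev"] = tokens[i - 1].lower()
        if i + 1 < len(tokens):
            feats["next"] = tokens[i + 1].lower()
        return feats

    sentence = "flights from Boston to New York today".split()
    tags = ["O", "O", "B-dept", "O", "B-arrival", "I-arrival", "B-date"]

    X = [[token_features(sentence, i) for i in range(len(sentence))]]
    y = [tags]

    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
    crf.fit(X, y)               # maximizes log p(y | x) with respect to the parameters
    print(crf.predict(X)[0])    # predicted slot tags, one per token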
  105. Neural Network Based LU106
  106. 107 Recurrent Neural Network (RNN) http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/ : tanh, ReLU time RNN can learn accumulated sequential information (time-series)
  107. 108 Deep Learning Approach 108 Data Model Prediction dialogue utterances annotated with semantic frames (user intents & slots) user intents, slots and their values deep learning model (classification/tagging) e.g. recurrent neural networks (RNN)
  108. 109 Classification Model  Input: each utterance ui is represented as a feature vector fi  Output: a domain/intent label ci for each input utterance 109 As an utterance classification task • Given a collection of utterances ui with labels ci, D= {(u1,c1),…,(un,cn)} where ci ∊ C, train a model to estimate labels for new utterances uk. How to represent a sentence using a feature vector
  109. Sequence Tagging Model 110 As a sequence tagging task • Given a collection of tagged word sequences, S={((w1,1,w1,2,…, w1,n1), (t1,1,t1,2,…,t1,n1)), ((w2,1,w2,2,…,w2,n2), (t2,1,t2,2,…,t2,n2)) …} where ti ∊ M, the goal is to estimate tags for a new word sequence.  Input: each word wi,j is represented as a feature vector fi,j  Output: a slot label ti for each word in the utterance How to represent a word using a feature vector
  110. 111 Word Representation  Atomic symbols: one-hot representation 111 car = [0 0 0 0 0 0 1 0 0 … 0], motorcycle = [0 0 1 0 0 0 0 0 0 … 0]; car AND motorcycle = 0  Issue: difficult to compute the similarity (i.e. comparing “car” and “motorcycle”)
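A tiny NumPy illustration of the issue: any two distinct one-hot vectors have zero dot product, so “car” and “motorcycle” look no more similar than any other word pair (toy vocabulary, not the real one).

    import numpy as np

    vocab = ["car", "motorcycle", "dog"]       # toy vocabulary

    def one_hot(word):
        v = np.zeros(len(vocab))
        v[vocab.index(word)] = 1.0
        return v

    print(np.dot(one_hot("car"), one_hot("motorcycle")))   # 0.0: no notion of similarity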
  111. 112 Word Representation  Neighbor-based: low-dimensional dense word embedding 112 Idea: words with similar meanings often have similar neighbors
  112. 113 Chinese Input Unit of Representation  Character  Feed each char to each time step  Word  Word segmentation required 你知道美女與野獸電影的評價如何嗎? 你/知道/美女與野獸/電影/的/評價/如何/嗎 Can two types of information fuse together for better performance?
  113. LU – Domain/Intent Classification As an utterance classification task • Given a collection of utterances ui with labels ci, D= {(u1,c1),…,(un,cn)} where ci ∊ C, train a model to estimate labels for new utterances uk. 114 find me a cheap taiwanese restaurant in oakland Movies Restaurants Music Sports … find_movie, buy_tickets find_restaurant, find_price, book_table find_lyrics, find_singer … Domain Intent
  114. 115 Deep Neural Networks for Domain/Intent Classification (Ravuri and Stolcke, 2015)  RNN and LSTMs for utterance classification  Word hashing to deal with large number of singletons  Kat: #Ka, Kat, at#  Each character n-gram is associated with a bit in the input encoding 115 https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/RNNLM_addressee.pdf
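A small sketch of the word-hashing step described above: pad the word with boundary markers, take character trigrams (e.g. Kat → #Ka, Kat, at#), and flip one bit per n-gram in the input encoding. The tiny n-gram vocabulary here is a placeholder.

    def char_ngrams(word, n=3):
        # "Kat" -> ["#Ka", "Kat", "at#"]
        padded = "#" + word + "#"
        return [padded[i:i + n] for i in range(len(padded) - n + 1)]

    ngram_vocab = sorted({g for w in ["Kat", "cat", "Kate"] for g in char_ngrams(w)})

    def hash_encode(word):
        # each character n-gram is associated with one bit of the input encoding
        bits = [0] * len(ngram_vocab)
        for g in char_ngrams(word):
            if g in ngram_vocab:
                bits[ngram_vocab.index(g)] = 1
        return bits

    print(char_ngrams("Kat"))   # ['#Ka', 'Kat', 'at#']
    print(hash_encode("Kat"))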
  115. LU – Slot Filling 116 flights from Boston to New York today — Entity tags: O O B-city O B-city I-city O; Slot tags: O O B-dept O B-arrival I-arrival B-date As a sequence tagging task • Given a collection of tagged word sequences, S={((w1,1,w1,2,…, w1,n1), (t1,1,t1,2,…,t1,n1)), ((w2,1,w2,2,…,w2,n2), (t2,1,t2,2,…,t2,n2)) …} where ti ∊ M, the goal is to estimate tags for a new word sequence.
  116. 117 Recurrent Neural Nets for Slot Tagging – I (Yao et al, 2013; Mesnil et al, 2015)  Variations: a. RNNs with LSTM cells b. Input, sliding window of n-grams c. Bi-directional LSTMs 𝑤0 𝑤1 𝑤2 𝑤 𝑛 ℎ0 𝑓 ℎ1 𝑓 ℎ2 𝑓 ℎ 𝑛 𝑓 ℎ0 𝑏 ℎ1 𝑏 ℎ2 𝑏 ℎ 𝑛 𝑏 𝑦0 𝑦1 𝑦2 𝑦𝑛 (b) LSTM-LA (c) bLSTM 𝑦0 𝑦1 𝑦2 𝑦𝑛 𝑤0 𝑤1 𝑤2 𝑤 𝑛 ℎ0 ℎ1 ℎ2 ℎ 𝑛 (a) LSTM 𝑦0 𝑦1 𝑦2 𝑦𝑛 𝑤0 𝑤1 𝑤2 𝑤 𝑛 ℎ0 ℎ1 ℎ2 ℎ 𝑛 http://131.107.65.14/en-us/um/people/gzweig/Pubs/Interspeech2013RNNLU.pdf; http://dl.acm.org/citation.cfm?id=2876380
  117. 118 Recurrent Neural Nets for Slot Tagging – II (Kurata et al., 2016; Simonnet et al., 2015)  Encoder-decoder networks  Leverages sentence level information  Attention-based encoder- decoder  Use of attention (as in MT) in the encoder-decoder network  Attention is estimated using a feed-forward network with input: ht and st at time t 𝑦0 𝑦1 𝑦2 𝑦𝑛 𝑤 𝑛 𝑤2 𝑤1 𝑤0 ℎ 𝑛 ℎ2 ℎ1 ℎ0 𝑤0 𝑤1 𝑤2 𝑤 𝑛 𝑦0 𝑦1 𝑦2 𝑦𝑛 𝑤0 𝑤1 𝑤2 𝑤 𝑛 ℎ0 ℎ1 ℎ2 ℎ 𝑛 𝑠0 𝑠1 𝑠2 𝑠 𝑛 ci ℎ0 ℎ 𝑛… http://www.aclweb.org/anthology/D16-1223
  118. 119 Joint Semantic Frame Parsing  Sequence-based (Hakkani-Tur et al., 2016): slot filling and intent prediction in the same output sequence (e.g. “taiwanese food please” → B-type O O, with the intent FIND_REST predicted at the EOS step)  Parallel (Liu and Lane, 2016): intent prediction and slot filling are performed in two branches 119 https://www.microsoft.com/en-us/research/wp-content/uploads/2016/06/IS16_MultiJoint.pdf; https://arxiv.org/abs/1609.01454
  119. 120 LU Evaluation  Metrics  Sub-sentence-level: intent accuracy, slot F1  Sentence-level: whole frame accuracy 120
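A small sketch of the three metrics above on toy frames (not an official scorer): intent accuracy, slot F1 over slot-value pairs, and whole-frame accuracy, which credits an utterance only when the intent and all slots are correct.

    def intent_accuracy(gold_frames, pred_frames):
        return sum(g["intent"] == p["intent"]
                   for g, p in zip(gold_frames, pred_frames)) / len(gold_frames)

    def frame_accuracy(gold_frames, pred_frames):
        correct = sum(1 for g, p in zip(gold_frames, pred_frames)
                      if g["intent"] == p["intent"] and g["slots"] == p["slots"])
        return correct / len(gold_frames)

    def slot_f1(gold_frames, pred_frames):
        tp = fp = fn = 0
        for g, p in zip(gold_frames, pred_frames):
            gold, pred = set(g["slots"].items()), set(p["slots"].items())
            tp += len(gold & pred)
            fp += len(pred - gold)
            fn += len(gold - pred)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    gold = [{"intent": "find_movie", "slots": {"genre": "action", "date": "this weekend"}}]
    pred = [{"intent": "find_movie", "slots": {"genre": "action"}}]
    print(intent_accuracy(gold, pred), slot_f1(gold, pred), frame_accuracy(gold, pred))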
  120. 121 Speech Recognition Language Understanding (LU) • Domain Identification • User Intent Detection • Slot Filling Dialogue Management (DM) • Dialogue State Tracking (DST) • Dialogue Policy Natural Language Generation (NLG) Hypothesis are there any action movies to see this weekend Semantic Frame request_movie genre=action, date=this weekend System Action/Policy request_location Text response Where are you located? Text Input Are there any action movies to see this weekend? Speech Signal Backend Database/ Knowledge Providers Concluding Remarks 121
  121. 122 Language Understanding Demo  Movie Bot  http://140.112.21.82:8000/  Example  晚上九點的當他們認真編織時  我想要看晚上七點的電影  我想要看晚上七點在新竹大遠百威秀影城的電影  我要看新竹大遠百威秀影城的刀劍神域劇場版序列爭戰  我想要看晚上七點在新竹大遠百威秀影城的刀劍神域劇 場版序列爭戰 122
  122. 123 Tutorial Outline I. Dialogue Systems and Background Knowledge II. Language Understanding III. Dialogue Management IV. Language Generation V. Dialogue System Evaluation, Development, and Trends 123
  123. Dialogue Management – Dialogue State Tracking 124
  124. 125 Task-Oriented Dialogue System (Young, 2000) 125 Speech Recognition Language Understanding (LU) • Domain Identification • User Intent Detection • Slot Filling Dialogue Management (DM) • Dialogue State Tracking (DST) • Dialogue Policy Natural Language Generation (NLG) Hypothesis are there any action movies to see this weekend Semantic Frame request_movie genre=action, date=this weekend System Action/Policy request_location Text response Where are you located? Text Input Are there any action movies to see this weekend? Speech Signal Backend Database/ Knowledge Providers http://rsta.royalsocietypublishing.org/content/358/1769/1389.short
  125. 126 Task-Oriented Dialogue System (Young, 2000) 126 Speech Recognition Language Understanding (LU) • Domain Identification • User Intent Detection • Slot Filling Natural Language Generation (NLG) Hypothesis are there any action movies to see this weekend Semantic Frame request_movie genre=action, date=this weekend System Action/Policy request_location Text response Where are you located? Text Input Are there any action movies to see this weekend? Speech Signal Backend Action / Knowledge Providers http://rsta.royalsocietypublishing.org/content/358/1769/1389.short Dialogue Management (DM) • Dialogue State Tracking (DST) • Dialogue Policy
  126. 127 Rule-Based Management 127
  127. 128 Example Dialogue 128 request (restaurant; foodtype=Thai) inform (area=centre) request (address) bye ()
  128. 129 Elements of Dialogue Management 129(Figure from Gašić)
  129. 130 Dialogue State Tracking (DST)  Maintain a probabilistic distribution instead of a 1-best prediction for better robustness to recognition errors 130 Incorrect for both!
  130. 131 Dialogue State Tracking (DST)  Maintain a probabilistic distribution instead of a 1-best prediction for better robustness to SLU errors or ambiguous input 131 How can I help you? Book a table at Sumiko for 5 How many people? 3 Slot Value # people 5 (0.5) time 5 (0.5) Slot Value # people 3 (0.8) time 5 (0.8)
  131. 132 1-Best Input w/o State Tracking 132
  132. 133 N-Best Inputs w/o State Tracking 133
  133. 134 N-Best Inputs w/ State Tracking 134
  134. 135 Dialogue State Tracking (DST)  Definition  Representation of the system's belief of the user's goal(s) at any time during the dialogue  Challenge  How to define the state space?  How to tractably maintain the dialogue state?  Which actions to take for each state? 135 Define dialogue as a control problem where the behavior can be automatically learned
  135. 136 Reinforcement Learning  RL is a general purpose framework for decision making  RL is for an agent with the capacity to act  Each action influences the agent’s future state  Success is measured by a scalar reward signal  Goal: select actions to maximize future reward Big three: action, state, reward
  136. 137 Agent and Environment  At time step t  The agent  Executes action at  Receives observation ot  Receives scalar reward rt  The environment  Receives action at  Emits observation ot+1  Emits scalar reward rt+1  t increments at env. step observation ot action at reward rt
  137. 138 State  Experience is the sequence of observations, actions, rewards  State is the information used to determine what happens next  what happens depends on the history experience • The agent selects actions • The environment selects observations/rewards  The state is the function of the history experience
  138. 139 observation ot action at reward rt Environment State  The environment state s_t^e is the environment’s private representation  i.e. whatever data the environment uses to pick the next observation/reward  may not be visible to the agent  may contain irrelevant information
  139. 140 observation ot action at reward rt Agent State  The agent state s_t^a is the agent’s internal representation  i.e. whatever data the agent uses to pick the next action  information used by RL algorithms  can be any function of experience
  140. 141 Reward  Reinforcement learning is based on the reward hypothesis  A reward rt is a scalar feedback signal  Indicates how well the agent is doing at step t  Reward hypothesis: all agent goals can be described by the maximization of expected cumulative reward
  141. 142 Elements of Dialogue Management 142(Figure from Gašić) Dialogue state tracking
  142. 143 Dialogue State Tracking (DST)  Requirement  Dialogue history  Keep track of what has happened so far in the dialogue  Normally done via the Markov property  Task-oriented dialogue  Need to know what the user wants  Modeled via the user goal  Robustness to errors  Need to know what the user says  Modeled via the user action 143
  143. 144 DST Problem Formulation  The DST dataset consists of  Goal: for each informable slot  e.g. price=cheap  Requested: slots by the user  e.g. moviename  Method: search method for entities  e.g. by constraints, by name  The dialogue state is  the distribution over possible slot-value pairs for goals  the distribution over possible requested slots  the distribution over possible methods 144
  144. 145 Class-Based DST 145 Data Model Prediction • Observations labeled w/ dialogue state • Distribution over dialogue states – Dialogue State Tracking • Neural networks • Ranking models
  145. 146 DNN for DST 146 feature extraction DNN A slot value distribution for each slot multi-turn conversation state of this turn
  146. 147 Sequence-Based DST 147 Data Model Prediction • Sequence of observations labeled w/ dialogue state • Distribution over dialogue states – Dialogue State Tracking • Recurrent neural networks (RNN)
  147. 148 Recurrent Neural Network (RNN)  Elman-type  Jordan-type 148
  148. 149 RNN DST  Idea: internal memory for representing dialogue context  Input  most recent dialogue turn  last machine dialogue act  dialogue state  memory layer  Output  update its internal memory  distribution over slot values 149
  149. 150 RNN-CNN DST 150(Figure from Wen et al, 2016) http://www.anthology.aclweb.org/W/W13/W13-4073.pdf; https://arxiv.org/abs/1506.07190
  150. 151 Multichannel Tracker (Shi et al., 2016) 151  Training a multichannel CNN for each slot  Chinese character CNN  Chinese word CNN  English word CNN https://arxiv.org/abs/1701.06247
  151. 152 DST Evaluation  Metric  Tracked state accuracy with respect to user goal  L2-norm of the hypothesized dist. and the true label  Recall/Precision/F-measure individual slots 152
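A small sketch of the L2 measure above: the distance between the tracker’s distribution over slot values and the one-hot vector for the true label (the toy belief mirrors the earlier “# people” booking example; smaller is better).

    import numpy as np

    def l2_to_true_label(belief, true_value):
        values = sorted(belief)
        hyp = np.array([belief[v] for v in values])
        ref = np.array([1.0 if v == true_value else 0.0 for v in values])
        return np.linalg.norm(hyp - ref)

    belief = {"3": 0.8, "5": 0.15, "none": 0.05}   # belief over "# people" after turn 2
    print(l2_to_true_label(belief, "3"))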
  152. 153 Dialog State Tracking Challenge (DSTC) (Williams et al. 2013, Henderson et al. 2014, Henderson et al. 2014, Kim et al. 2016, Kim et al. 2016)
   Challenge / Type / Domain / Data Provider / Main Theme
   DSTC1 / Human-Machine / Bus Route / CMU / Evaluation Metrics
   DSTC2 / Human-Machine / Restaurant / U. Cambridge / User Goal Changes
   DSTC3 / Human-Machine / Tourist Information / U. Cambridge / Domain Adaptation
   DSTC4 / Human-Human / Tourist Information / I2R / Human Conversation
   DSTC5 / Human-Human / Tourist Information / I2R / Language Adaptation
  153. 154 DSTC1  Type: Human-Machine  Domain: Bus Route 154
  154. 155 DSTC4-5  Type: Human-Human  Domain: Tourist Information 155 Tourist: Can you give me some uh- tell me some cheap rate hotels, because I'm planning just to leave my bags there and go somewhere take some pictures. Guide: Okay. I'm going to recommend firstly you want to have a backpack type of hotel, right? Tourist: Yes. I'm just gonna bring my backpack and my buddy with me. So I'm kinda looking for a hotel that is not that expensive. Just gonna leave our things there and, you know, stay out the whole day. Guide: Okay. Let me get you hm hm. So you don't mind if it's a bit uh not so roomy like hotel because you just back to sleep. Tourist: Yes. Yes. As we just gonna put our things there and then go out to take some pictures. Guide: Okay, um- Tourist: Hm. Guide: Let's try this one, okay? Tourist: Okay. Guide: It's InnCrowd Backpackers Hostel in Singapore. If you take a dorm bed per person only twenty dollars. If you take a room, it's two single beds at fifty nine dollars. Tourist: Um. Wow, that's good. Guide: Yah, the prices are based on per person per bed or dorm. But this one is room. So it should be fifty nine for the two room. So you're actually paying about ten dollars more per person only. Tourist: Oh okay. That's- the price is reasonable actually. It's good. {Topic: Accommodation; Type: Hostel; Pricerange: Cheap; GuideAct: ACK; TouristAct: REQ} {Topic: Accommodation; NAME: InnCrowd Backpackers Hostel; GuideAct: REC; TouristAct: ACK}
  155. 156 Concluding Remarks  Dialogue state tracking (DST) of DM has Markov assumption to model the user goal and be robust to errors 156
  156. Dialogue Management – Dialogue Policy 157
  157. 158 Task-Oriented Dialogue System (Young, 2000) 158 Speech Recognition Language Understanding (LU) • Domain Identification • User Intent Detection • Slot Filling Dialogue Management (DM) • Dialogue State Tracking (DST) • Dialogue Policy Natural Language Generation (NLG) Hypothesis are there any action movies to see this weekend Semantic Frame request_movie genre=action, date=this weekend System Action/Policy request_location Text response Where are you located? Text Input Are there any action movies to see this weekend? Speech Signal Backend Database/ Knowledge Providers http://rsta.royalsocietypublishing.org/content/358/1769/1389.short
  158. 159 Task-Oriented Dialogue System (Young, 2000) 159 Speech Recognition Language Understanding (LU) • Domain Identification • User Intent Detection • Slot Filling Natural Language Generation (NLG) Hypothesis are there any action movies to see this weekend Semantic Frame request_movie genre=action, date=this weekend System Action/Policy request_location Text response Where are you located? Text Input Are there any action movies to see this weekend? Speech Signal Backend Action / Knowledge Providers http://rsta.royalsocietypublishing.org/content/358/1769/1389.short Dialogue Management (DM) • Dialogue State Tracking (DST) • Dialogue Policy
  159. 160 Rule-Based Management 160
  160. 161 Example Dialogue 161 request (restaurant; foodtype=Thai) inform (area=centre) request (address) bye () greeting () request (area) inform (restaurant=Bangkok city, area=centre of town, foodtype=Thai) inform (address=24 Green street)
  161. 162 Example Dialogue 162 request (restaurant; foodtype=Thai) inform (area=centre) request (address) bye () greeting () request (area) inform (restaurant=Bangkok city, area=centre of town, foodtype=Thai) inform (address=24 Green street)
  162. 163 Elements of Dialogue Management 163(Figure from Gašić)
  163. 164 Reinforcement Learning  RL is a general purpose framework for decision making  RL is for an agent with the capacity to act  Each action influences the agent’s future state  Success is measured by a scalar reward signal  Goal: select actions to maximize future reward Big three: action, state, reward
  164. 165 State  Experience is the sequence of observations, actions, rewards  State is the information used to determine what happens next  what happens depends on the history experience • The agent selects actions • The environment selects observations/rewards  The state is the function of the history experience
  165. 166 Elements of Dialogue Management 166(Figure from Gašić) Dialogue policy optimization
  166. 167 Dialogue Policy Optimization  Dialogue management in an RL framework 167 User Reward R Observation O Action A Environment Agent Natural Language Generation Language Understanding Dialogue Manager The optimized dialogue policy selects the best action that maximizes the future reward. Correct rewards are a crucial factor in dialogue policy training
  167. 168 Policy Optimization Issue  Optimization problem size  Belief dialogue state space is large and continuous  System action space is large  Knowledge environment (user)  Transition probability is unknown (user status)  How to get rewards 168
  168. 169 Reinforcement Learning for Dialogue Policy Optimization 169  Pipeline: user input (o) → language understanding → state s → dialogue policy 𝑎 = 𝜋(𝑠); collect rewards (𝑠, 𝑎, 𝑟, 𝑠’) and optimize 𝑄(𝑠, 𝑎) → language (response) generation → response
   Type of Bots / State / Action / Reward:
   Social ChatBots — chat history / system response / # of turns maximized; intrinsically motivated reward
   InfoBots (interactive Q/A) — user current question + context / answers to current question / relevance of answer; # of turns minimized
   Task-Completion Bots — user current input + context / system dialogue act w/ slot values (or API calls) / task success rate; # of turns minimized
   Goal: develop a generic deep RL algorithm to learn dialogue policy for all bot categories
  169. 170 Large Belief Space and Action Space  Solution: perform optimization in a reduced summary space built according to the heuristics 170 Belief Dialogue State System Action Summary Dialogue State Summary Action Summary Policy Summary Function Master Function
  170. 171 Transition Probability and Rewards  Solution: learn from real users Speech Recognition Language Understanding (LU) • Domain Identification • User Intent Detection • Slot Filling Dialogue Management (DM) • Dialogue State Tracking (DST) • Dialogue Policy Natural Language Generation (NLG) Backend Database/ Knowledge Providers
  171. 172 Transition Probability and Rewards  Solution: learn from a simulated user 172 Error Model • Recognition error • LU error Dialogue State Tracking (DST) System dialogue acts Reward Backend Action / Knowledge Providers Dialogue Policy Optimization Dialogue Management (DM) User Model Reward Model User Simulation Distribution over user dialogue acts (semantic frames)
  172. 173 Reward for RL ≅ Evaluation for System  Dialogue is a special RL task  Humans are involved in the interaction and in rating (evaluating) the dialogue  Fully human-in-the-loop framework  Rating: correctness, appropriateness, and adequacy - Expert rating: high quality, high cost - User rating: unreliable quality, medium cost - Objective rating: checks desired aspects, low cost 173
  173. 174 Dialogue Reinforcement Learning Signal Typical reward function  -1 per-turn penalty  Large reward at completion if successful  Typically requires domain knowledge ✔ Simulated user ✔ Paid users (Amazon Mechanical Turk) ✖ Real users 174 The user simulator is usually required for dialogue system training before deployment
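A sketch of the typical reward function above, with the -1 per-turn penalty and a large terminal reward or penalty; the terminal magnitudes used here (±2 × max turns) are one common choice, not a fixed rule.

    def turn_reward(dialogue_ended, task_success, max_turns=20):
        # -1 per turn encourages short dialogues; a large terminal reward/penalty
        # reflects whether the task was completed successfully
        if not dialogue_ended:
            return -1
        return 2 * max_turns if task_success else -max_turns

    # a successful dialogue that ends after 6 turns
    rewards = [turn_reward(False, None) for _ in range(5)] + [turn_reward(True, True)]
    print(sum(rewards))    # 5 * (-1) + 40 = 35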
  174. 175 RL-Based Dialogue Management (Li et al., 2017)  Deep RL for training dialogue policy  Input: current semantic frame observation, database returned results  Output: system action 175 Semantic Frame request_movie genre=action, date=this weekend System Action/Policy request_location Dialogue Management (DM)Simulated/paid/real User Backend DB https://arxiv.org/abs/1703.01008
  175. 176 Major Components in an RL Agent  An RL agent may include one or more of these components  Policy: agent’s behavior function  Value function: how good is each state and/or action  Model: agent’s representation of the environment 176
  176. 177 Policy  A policy is the agent’s behavior  A policy maps from state to action  Deterministic policy:  Stochastic policy: 177
  177. 178 Value Function  A value function is a prediction of future reward (with action a in state s)  Q-value function gives expected total reward  from state and action  under policy  with discount factor  Value functions decompose into a Bellman equation 178
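The formulas behind these bullets, written out in standard RL notation (the original slide graphics are not reproduced here): the Q-value under policy π with discount factor γ, and its Bellman decomposition.

    Q^{\pi}(s,a) = \mathbb{E}\big[\, r_{t+1} + \gamma r_{t+2} + \gamma^{2} r_{t+3} + \cdots \mid s_t = s,\, a_t = a,\, \pi \,\big]

    Q^{\pi}(s,a) = \mathbb{E}_{s',a'}\big[\, r + \gamma\, Q^{\pi}(s',a') \mid s,\, a \,\big]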
  178. 179 Optimal Value Function  An optimal value function  is the maximum achievable value  allows us to act optimally  informally maximizes over all decisions  decompose into a Bellman equation 179
  179. 180 Reinforcement Learning Approach  Policy-based RL  Search directly for optimal policy  Value-based RL  Estimate the optimal value function  Model-based RL  Build a model of the environment  Plan (e.g. by lookahead) using model 180 is the policy achieving maximum future reward is maximum value achievable under any policy
  180. 181 Maze Example  Rewards: -1 per time-step  Actions: N, E, S, W  States: agent’s location 181
  181. 182 Maze Example: Policy  Rewards: -1 per time-step  Actions: N, E, S, W  States: agent’s location 182 Arrows represent the policy π(s) for each state s
  182. 183 Maze Example: Value Function  Rewards: -1 per time-step  Actions: N, E, S, W  States: agent’s location 183 Numbers represent the value of each state s under the policy
  183. Estimate How Good Each State and/or Action is Value-Based Deep RL184
  184. 185 Value Function Approximation  Value functions are represented by a lookup table  too many states and/or actions to store  too slow to learn the value of each entry individually  Values can be estimated with function approximation 185
  185. 186 Q-Networks  Q-networks represent value functions with weights  generalize from seen states to unseen states  update parameter for function approximation 186
  186. 187 Q-Learning  Goal: estimate optimal Q-values  Optimal Q-values obey a Bellman equation  Value iteration algorithms solve the Bellman equation 187 learning target
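A minimal tabular Q-learning sketch of the value-iteration idea above. It assumes a small, hypothetical environment object with reset() and step(action) returning (next_state, reward, done); that interface and all hyperparameters are placeholders, not the lecture’s setup.

    import random
    from collections import defaultdict

    def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
        Q = defaultdict(float)                    # Q[(state, action)] -> estimated value
        for _ in range(episodes):
            s, done = env.reset(), False
            while not done:
                # epsilon-greedy exploration
                if random.random() < epsilon:
                    a = random.choice(actions)
                else:
                    a = max(actions, key=lambda act: Q[(s, act)])
                s_next, r, done = env.step(a)
                # learning target: r + gamma * max_a' Q(s', a')
                best_next = 0.0 if done else max(Q[(s_next, act)] for act in actions)
                Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
                s = s_next
        return Q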
  187. 188 Deep Q-Networks (DQN)  Represent value function by deep Q-network with weights  Objective is to minimize mean square error (MSE) loss by SGD 188 Issue: naïve Q-learning oscillates or diverges using NN due to: 1) correlations between samples 2) non-stationary targets learning target
  188. 189 Deep RL in Atari Games 189 Different tricks are performed to enable RL training
  189. 190 DQN in Atari  Goal: end-to-end learning of values Q(s, a) from pixels  Input: state is stack of raw pixels from last 4 frames  Output: Q(s, a) for all joystick/button positions a  Reward is the score change for that step 190DQN Nature Paper [link] [code]
  190. DQN in Atari 191 DQN Nature Paper [link] [code]
  191. DQN in Movie Bot  Goal: interact with the user for movie ticket booking 192 User goal: Two tickets for “the witch” tomorrow 9:30 PM at regal meridian 16, Seattle. RULE BASED SYSTEM User: Can I get 2 tickets for the witch? Agent: Which theater do you want? User: Regal meridian 16 Agent: What time would you like to see it? User: 9:30 pm Agent: What date would you like to see it? User: Tomorrow! Agent: How many people are you? User: Just two adults. Agent: What is your location? User: Seattle! Agent: What kind of video format do you like? User: Oh, any kind is okay. Agent: Do you have any price constraint? User: No, I don’t care. Agent: Great - I was able to purchase 2 tickets for you to see the witch tomorrow at regal meridian 16 theater in Seattle at 9:30 pm. User: Thank you. Agent: Thank you! REINFORCEMENT LEARNING SYSTEM User: Can I get 2 tickets for the witch? Agent: What date would you like to see it? User: Tomorrow Agent: Which theater do you prefer? User: Regal meridian 16 Agent: What is your location? User: Seattle Agent: What time would you like to see it? User: 9:30 pm. Agent: Great - I was able to purchase 2 tickets for you to see the witch tomorrow at regal meridian 16 theater in Seattle at 9:30 pm. User: Thanks. Agent: Thanks! https://arxiv.org/abs/1703.01008
  192. Estimate How Good An Agent’s Behavior is Policy-Based Deep RL193
  193. 194 Deep Policy Networks  Represent policy by deep network with weights  Objective is to maximize total discounted reward by SGD 194 stochastic policy deterministic policy
  194. 195 Policy Gradient  The gradient of a stochastic policy is given by  The gradient of a deterministic policy is given by 195 How to deal with continuous actions
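The two gradients referred to above, in their standard forms (the slide’s own rendering is not reproduced): the likelihood-ratio gradient for a stochastic policy π_θ, and the deterministic policy gradient for a = μ_θ(s) as used in DPG/DDPG, which is how continuous actions are handled.

    \nabla_{\theta} J(\theta) = \mathbb{E}_{\pi_{\theta}}\big[\, \nabla_{\theta} \log \pi_{\theta}(a \mid s)\; Q^{\pi_{\theta}}(s,a) \,\big]

    \nabla_{\theta} J(\theta) = \mathbb{E}\big[\, \nabla_{\theta}\, \mu_{\theta}(s)\; \nabla_{a} Q^{\mu}(s,a)\big|_{a=\mu_{\theta}(s)} \,\big]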
  195. 196 Actor-Critic (Value-Based + Policy-Based)  Estimate value function  Update policy parameters by SGD  Stochastic policy  Deterministic policy 196 Q-networks tell whether a policy is good or not Policy networks optimize the policy accordingly
  196. 197 Deterministic Deep Policy Gradient  Goal: end-to-end learning of control policy from pixels  Input: state is stack of raw pixels from last 4 frames  Output: two separate CNNs for Q and 𝜋 197Lillicrap et al., “Continuous control with deep reinforcement learning,” arXiv, 2015.
  197. 198 Deep RL AI Examples  Play games: Atari, poker, Go, …  Control physical systems: manipulate, …  Explore worlds: 3D worlds, …  Interact with users: recommend, optimize, personalize, … 198
  198. 199 Concluding Remarks  Dialogue policy optimization can be viewed as an RL task  Transition probability and reward come from  Real user  Simulated user
  199. Dialogue Management – User Simulation 200
  200. 201 Task-Oriented Dialogue System (Young, 2000) 201 Speech Recognition Language Understanding (LU) • Domain Identification • User Intent Detection • Slot Filling Dialogue Management (DM) • Dialogue State Tracking (DST) • Dialogue Policy Natural Language Generation (NLG) Hypothesis are there any action movies to see this weekend Semantic Frame request_movie genre=action, date=this weekend System Action/Policy request_location Text response Where are you located? Text Input Are there any action movies to see this weekend? Speech Signal Backend Database/ Knowledge Providers http://rsta.royalsocietypublishing.org/content/358/1769/1389.short
  201. 202 Task-Oriented Dialogue System (Young, 2000) 202 Speech Recognition Language Understanding (LU) • Domain Identification • User Intent Detection • Slot Filling Natural Language Generation (NLG) Hypothesis are there any action movies to see this weekend Semantic Frame request_movie genre=action, date=this weekend System Action/Policy request_location Text response Where are you located? Text Input Are there any action movies to see this weekend? Speech Signal Backend Action / Knowledge Providers http://rsta.royalsocietypublishing.org/content/358/1769/1389.short Dialogue Management (DM) • Dialogue State Tracking (DST) • Dialogue Policy
  202. 203 User Simulation  Goal: generate natural and reasonable conversations to enable reinforcement learning for exploring the policy space  Approach  Rule-based crafted by experts (Li et al., 2016)  Learning-based (Schatzmann et al., 2006; El Asri et al., 2016) [Figure: a Simulated User, built from a Dialogue Corpus, interacts with the Dialogue Management (DM) – Dialogue State Tracking (DST) and Dialogue Policy – in place of a Real User; the simulated user keeps a list of its goals and actions, randomly generates an agenda, and updates its list of goals and adds new ones]
  203. 204 Elements of User Simulation Error Model • Recognition error • LU error Dialogue State Tracking (DST) System dialogue acts Backend Action / Knowledge Providers Dialogue Policy Optimization Dialogue Management (DM) User Model Reward Model User Simulation Distribution over user dialogue acts (semantic frames) Reward
  204. 205 Elements of User Simulation Error Model • Recognition error • LU error Dialogue State Tracking (DST) System dialogue acts Reward Backend Action / Knowledge Providers Dialogue Policy Optimization Dialogue Management (DM) User Model Reward Model User Simulation Distribution over user dialogue acts (semantic frames) The user model defines the natural interactive flow to convey the user’s goal
  205. 206 User Goal  Sampled from a real corpus / hand-crafted list  Information access  Inflexible: the user has specific entries for search  Task completion  Inflexible: the user has specific rows for search  Flexible: some slot values are unknown or can be suggested 206 request_ticket, inform_slots: {moviename=“A”, …} request_slot: rating, inform_slots: {moviename=“A”, …} request_ticket, request_slots: {time, theater}, inform_slots: {moviename=“A”, …}
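Concretely, a sampled user goal can be represented as a small frame-like structure; the slot names and values below are just hypothetical examples in the movie-ticket domain.

```python
# A hypothetical sampled user goal in the movie-ticket domain:
# constraint slots the user will inform, and unknown slots to request.
user_goal = {
    "intent": "request_ticket",
    "inform_slots": {"moviename": "A", "date": "this weekend", "numberofpeople": "2"},
    "request_slots": {"time": "UNK", "theater": "UNK"},
}
```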
  206. 207 Supported Functionality of Ontology  Information access  Finding the specific entries from the table  Task completion  Finding the row that satisfies the constraints 207
Movie Name | Theater | Rating | Date | Time
美女與野獸 | 台北信義威秀 | 8.5 | 2017/03/21 | 09:00
美女與野獸 | 台北信義威秀 | 8.5 | 2017/03/21 | 09:25
美女與野獸 | 台北信義威秀 | 8.5 | 2017/03/21 | 10:15
美女與野獸 | 台北信義威秀 | 8.5 | 2017/03/21 | 10:40
美女與野獸 | 台北信義威秀 | 8.5 | 2017/03/21 | 11:05
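Both functionalities reduce to constraint matching over the backend table; a minimal sketch, assuming a hypothetical list-of-dicts table, could be:

```python
def matching_rows(table, constraints):
    """Return rows whose values satisfy every user-specified constraint.

    `table` is assumed to be a list of dicts (one per row), e.g. the ticket
    DB above with keys such as "theater" and "date".
    """
    return [row for row in table
            if all(row.get(slot) == value for slot, value in constraints.items())]

# e.g. matching_rows(ticket_db, {"theater": "台北信義威秀", "date": "2017/03/21"})
```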
  207. 208 User Goal – Information Access  Information access with inflexible constraints  Finding the specific entries from the table  Intent examples in the ticket DB: request_moviename, request_theater, request_time, etc  Intent examples in the knowledge DB: request_moviename, request_year  Specifying the constraints for the search target  The specified slots should align with the corresponding DB 208
Ticket DB: Movie Name | Theater | Date | Time
美女與野獸 | 台北信義威秀 | 2017/03/21 | 09:00
美女與野獸 | 台北信義威秀 | 2017/03/21 | 09:25
美女與野獸 | 台北信義威秀 | 2017/03/21 | 10:15
Knowledge DB: Movie Name | Cast | Year
美女與野獸 | Emma Watson | 2017
鋼鐵人 | : | :
  208. 209 User Goal – Task-Completion  Task completion with inflexible constraints  Finding the specific rows from the table  Intent examples: request_ticket (ticket DB)  Specifying the slot constraints for the search target  Informable slot: theater, date, etc  Requestable slot: time, number_of_people 209
Movie Name | Theater | Date | Time
美女與野獸 | 台北信義威秀 | 2017/03/21 | 09:00
美女與野獸 | 台北信義威秀 | 2017/03/21 | 09:25
美女與野獸 | 台北信義威秀 | 2017/03/21 | 10:15
  209. 210 User Goal – Task-Completion  Task completion with flexible constraints  No priority: {台北信義威秀, 台北日新威秀}  Prioritized: {台北信義威秀>台北日新威秀} 210
Movie Name | Theater | Date | Time
美女與野獸 | 台北信義威秀 | 2017/03/21 | 09:00
美女與野獸 | 台北信義威秀 | 2017/03/21 | 09:25
美女與野獸 | 台北信義威秀 | 2017/03/21 | 10:15
  210. 211 Satisfiability and Uniqueness  User goal characteristics  Satisfiability – the predefined goal can be satisfied by the backend database  Uniqueness – all satisfied entries should be in the goal 211
Movie Name | Actor | Year
鋼鐵人 | Robert John Downey | 2008
福爾摩斯 | Robert John Downey | 2008
Example goals: request_actor(moviename=鋼鐵人, year=2007) request_moviename(actor=Robert John Downey, year=2008)
  211. 212 More Extensions  Slot value hierarchy  “morning”  {9:15, 9:45, 10:15, 10:45, 11:15, 11:45}  Inexact match / partial match  “ten o’clock”  {9:45, 10:15}  “emma”  {Emma Watson, Emma Stone} 212
  212. 213 User Agenda Model  The user agenda defines how a user interacts with the system at the dialogue level  Random order: randomly sample a slot to inform or request  Fixed order: define an ordered list of slots to inform or request 213
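A toy agenda model along these lines, reusing the goal layout sketched earlier and assuming a hypothetical `order` list for the fixed-order case, might be:

```python
import random

def next_user_action(goal, order=None):
    """Pick the next user dialogue act from the agenda.

    `goal` follows the dict layout sketched earlier; `order`, if given, is
    assumed to be a list covering all informable slots and fixes the informing
    order, otherwise a pending slot is sampled at random.
    """
    informed = goal.setdefault("informed", set())
    pending = [s for s in goal["inform_slots"] if s not in informed]
    if not pending:                                   # everything informed: request the rest
        return {"act": "request", "slots": list(goal["request_slots"])}
    slot = next(s for s in order if s in pending) if order else random.choice(pending)
    informed.add(slot)
    return {"act": "inform", "slots": {slot: goal["inform_slots"][slot]}}
```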
  213. 214 Elements of User Simulation Error Model • Recognition error • LU error Dialogue State Tracking (DST) System dialogue acts Reward Backend Action / Knowledge Providers Dialogue Policy Optimization Dialogue Management (DM) User Model Reward Model User Simulation Distribution over user dialogue acts (semantic frames) The error model enables the system to maintain robustness during training
  214. 215 Frame-Level Interaction  DM receives frame-level information  No error model: perfect recognizer and LU  Error model: simulate the possible errors 215 Error Model • Recognition error • LU error Dialogue State Tracking (DST) system dialogue acts Dialogue Policy Optimization Dialogue Management (DM) User Model User Simulation user dialogue acts (semantic frames)
  215. 216 Natural Language Level Interaction  User simulator sends natural language  No recognition error  Errors from NLG or LU 216 Natural Language Generation (NLG) Dialogue State Tracking (DST) system dialogue acts Dialogue Policy Optimization Dialogue Management (DM) User Model User Simulation NL Language Understanding (LU)
  216. 217 Error Model  Simple  Randomly generate errors by  Replacing with an incorrect intent  Deleting an informable slot  Replacing with an incorrect slot (value is original)  Replacing with an incorrect slot value  Learning  Simulate the errors based on the ASR or LU confusion 217 Example frame: request_moviename (actor=Robert Downey Jr) – possible corruptions: intent → request_year, slot → director, value → Robert Downey Sr
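The simple, rule-based variant can be approximated with a few random perturbations; the error rate `p`, the `<noise>` token, and the vocabulary arguments below are illustrative assumptions, not prescribed by the slides.

```python
import random

def corrupt(frame, slot_vocab, intent_vocab, p=0.1):
    """Simple error model: randomly perturb a semantic frame.

    `frame` = {"intent": ..., "inform_slots": {...}}; `slot_vocab` and
    `intent_vocab` are lists to sample wrong names from; `p` is an
    assumed, illustrative error rate.
    """
    frame = {"intent": frame["intent"], "inform_slots": dict(frame["inform_slots"])}
    if random.random() < p:                          # incorrect intent
        frame["intent"] = random.choice(intent_vocab)
    for slot in list(frame["inform_slots"]):
        r = random.random()
        if r < p:                                    # delete an informable slot
            del frame["inform_slots"][slot]
        elif r < 2 * p:                              # incorrect slot name, value kept
            frame["inform_slots"][random.choice(slot_vocab)] = frame["inform_slots"].pop(slot)
        elif r < 3 * p:                              # incorrect slot value
            frame["inform_slots"][slot] = "<noise>"
    return frame
```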
  217. 218 Elements of User Simulation Error Model • Recognition error • LU error Dialogue State Tracking (DST) System dialogue acts Reward Backend Action / Knowledge Providers Dialogue Policy Optimization Dialogue Management (DM) User Model Reward Model User Simulation Distribution over user dialogue acts (semantic frames) The reward model checks whether the returned results satisfy the goal in order to decide the reward
  218. 219 Reward Model  Success measurement  Check whether the returned results satisfy the predefined user goal  Success – large positive reward  Fail – large negative reward  Minimize the number of turns  Penalty for additional turns 219
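One common shaping that fits this description (the specific values are an assumed choice, not taken from the slides) is a small per-turn penalty plus a large terminal bonus or penalty:

```python
def turn_reward(done, success, max_turn=20):
    """Per-turn reward: -1 each turn to penalize dialogue length,
    plus a large terminal reward on success or penalty on failure."""
    if not done:
        return -1
    return 2 * max_turn if success else -max_turn
```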
  219. 220 Concluding Remarks Error Model • Recognition error • LU error Dialogue State Tracking (DST) System dialogue acts Reward Backend Action / Knowledge Providers Dialogue Policy Optimization Dialogue Management (DM) User Model Reward Model User Simulation Distribution over user dialogue acts (semantic frames)
  220. 221 Tutorial Outline I. 對話系統及基本背景知識 II. 語言理解 (Language Understanding) III. 對話管理 (Dialogue Management) IV. 語言生成 (Language Generation) V. 對話系統評估、發展及趨勢 221
  221. 語言生成 (Language Generation) 222
  222. 223 Task-Oriented Dialogue System (Young, 2000) 223 Speech Recognition Language Understanding (LU) • Domain Identification • User Intent Detection • Slot Filling Dialogue Management (DM) • Dialogue State Tracking (DST) • Dialogue Policy Natural Language Generation (NLG) Hypothesis are there any action movies to see this weekend Semantic Frame request_movie genre=action, date=this weekend System Action/Policy request_location Text response Where are you located? Text Input Are there any action movies to see this weekend? Speech Signal Backend Database/ Knowledge Providers http://rsta.royalsocietypublishing.org/content/358/1769/1389.short
  223. 224 Task-Oriented Dialogue System (Young, 2000) 224 Speech Recognition Language Understanding (LU) • Domain Identification • User Intent Detection • Slot Filling Dialogue Management (DM) • Dialogue State Tracking (DST) • Dialogue Policy Natural Language Generation (NLG) Hypothesis are there any action movies to see this weekend Semantic Frame request_movie genre=action, date=this weekend System Action/Policy request_location Text response Where are you located? Text Input Are there any action movies to see this weekend? Speech Signal Backend Action / Knowledge Providers
  224. 225 Natural Language Generation (NLG)  Mapping dialogue acts into natural language inform(name=Seven_Days, foodtype=Chinese) Seven Days is a nice Chinese restaurant 225
  225. 226 Template-Based NLG  Define a set of rules to map frames to NL 226 Pros: simple, error-free, easy to control Cons: time-consuming, rigid, poor scalability
Semantic Frame | Natural Language
confirm() | “Please tell me more about the product you are looking for.”
confirm(area=$V) | “Do you want somewhere in the $V?”
confirm(food=$V) | “Do you want a $V restaurant?”
confirm(food=$V,area=$W) | “Do you want a $V restaurant in the $W.”
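A tiny Python rendition of such rule-based mapping, using the templates from the table above (the dictionary keys and the `realize` helper are illustrative choices, not part of the original slides):

```python
# A hypothetical handful of templates, keyed by the dialogue act plus its
# (alphabetically sorted) slot names; mirrors the table on the slide above.
templates = {
    "confirm()": "Please tell me more about the product you are looking for.",
    "confirm(area)": "Do you want somewhere in the {area}?",
    "confirm(food)": "Do you want a {food} restaurant?",
    "confirm(area,food)": "Do you want a {food} restaurant in the {area}?",
}

def realize(act, **slots):
    """Map a semantic frame to natural language by template lookup."""
    key = f"{act}({','.join(sorted(slots))})"
    return templates[key].format(**slots)

print(realize("confirm", food="Chinese", area="centre"))
# -> Do you want a Chinese restaurant in the centre?
```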
  226. 227 Class-Based LM NLG (Oh and Rudnicky, 2000)  Class-based language modeling  NLG by decoding 227 Pros: easy to implement/understand, simple rules Cons: computationally inefficient Classes: inform_area inform_address … request_area request_postcode http://dl.acm.org/citation.cfm?id=1117568
  227. 228 Phrase-Based NLG (Mairesse et al, 2010) Semantic DBN Phrase DBN Charlie Chan is a Chinese Restaurant near Cineworld in the centre d d Inform(name=Charlie Chan, food=Chinese, type= restaurant, near=Cineworld, area=centre) 228 Pros: efficient, good performance Cons: require semantic alignments realization phrase semantic stack http://dl.acm.org/citation.cfm?id=1858838
  228. 229 RNN-Based LM NLG (Wen et al., 2015) <BOS> SLOT_NAME serves SLOT_FOOD . <BOS> Din Tai Fung serves Taiwanese . delexicalisation Inform(name=Din Tai Fung, food=Taiwanese) 0, 0, 1, 0, 0, …, 1, 0, 0, …, 1, 0, 0, 0, 0, 0… dialogue act 1-hot representation SLOT_NAME serves SLOT_FOOD . <EOS> Slot weight tying conditioned on the dialogue act Input Output http://www.anthology.aclweb.org/W/W15/W15-46.pdf#page=295
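Delexicalisation itself is just string substitution; a naive sketch (the placeholder naming convention is an assumption) might be:

```python
def delexicalise(utterance, slots):
    """Replace slot values with placeholders so the LM generalises across values."""
    for slot, value in slots.items():
        utterance = utterance.replace(value, f"SLOT_{slot.upper()}")
    return utterance

def relexicalise(template, slots):
    """Put the actual slot values back into the generated skeleton."""
    for slot, value in slots.items():
        template = template.replace(f"SLOT_{slot.upper()}", value)
    return template

slots = {"name": "Din Tai Fung", "food": "Taiwanese"}
print(delexicalise("Din Tai Fung serves Taiwanese .", slots))  # SLOT_NAME serves SLOT_FOOD .
print(relexicalise("SLOT_NAME serves SLOT_FOOD .", slots))     # Din Tai Fung serves Taiwanese .
```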
  229. 230 Handling Semantic Repetition  Issue: semantic repetition  Din Tai Fung is a great Taiwanese restaurant that serves Taiwanese.  Din Tai Fung is a child friendly restaurant, and also allows kids.  Deficiency in either model or decoding (or both)  Mitigation  Post-processing rules (Oh & Rudnicky, 2000)  Gating mechanism (Wen et al., 2015)  Attention (Mei et al., 2016; Wen et al., 2015) 230
  230. 231 Semantic Conditioned LSTM (Wen et al., 2015)  Original LSTM cell (gates $i_t, f_t, o_t$, cell $c_t$, hidden $h_t$)  Dialogue act (DA) cell: a reading gate $r_t$ updates the DA vector, $d_t = r_t \odot d_{t-1}$  Modify $c_t$ so it is also conditioned on the dialogue act, e.g. $c_t = f_t \odot c_{t-1} + i_t \odot \hat{c}_t + \tanh(W_{dc} d_t)$  Inform(name=Seven_Days, food=Chinese) → 0, 0, 1, 0, 0, …, 1, 0, 0, …, 1, 0, 0, … (dialogue act 1-hot representation $d_0$) 231 Idea: using a gate mechanism to control the generated semantics (dialogue act/slots) http://www.aclweb.org/anthology/D/D15/D15-1199.pdf
  231. 232 Structural NLG (Dušek and Jurčíček, 2016)  Goal: NLG based on the syntax tree  Encode trees as sequences  Seq2Seq model for generation 232 https://www.aclweb.org/anthology/P/P16/P16-2.pdf#page=79
  232. 233 Contextual NLG (Dušek and Jurčíček, 2016)  Goal: adapting users’ way of speaking, providing context-aware responses  Context encoder  Seq2Seq model 233 https://www.aclweb.org/anthology/W/W16/W16-36.pdf#page=203
  233. 234 Chit-Chat Bot  Neural conversational model  Non task-oriented 234
  234. 235 Many-to-Many  Both input and output are sequences → Sequence-to-sequence learning  E.g. Machine Translation (machine learning → 機器學習) 235 (figure: the unrolled RNN reads “machine learning” and emits 機 器 學 習, followed by an end symbol “===”) [Ilya Sutskever, NIPS’14][Dzmitry Bahdanau, arXiv’15]
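A bare-bones encoder-decoder in PyTorch, shown only to make the sequence-to-sequence idea concrete (vocabulary sizes and hidden dimensions are arbitrary assumptions, and teacher forcing is assumed during training):

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal encoder-decoder: encode the source, decode conditioned on it."""
    def __init__(self, src_vocab=1000, tgt_vocab=1000, emb=64, hid=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.GRU(emb, hid, batch_first=True)
        self.decoder = nn.GRU(emb, hid, batch_first=True)
        self.out = nn.Linear(hid, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        _, h = self.encoder(self.src_emb(src_ids))            # keep final encoder state
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), h)   # teacher forcing
        return self.out(dec_out)                              # logits over target vocabulary
```

At inference time, decoding would start from a begin symbol and stop once the end symbol (the “===” on the slide) is produced.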
  235. 236 A Neural Conversational Model  Seq2Seq 236 [Vinyals and Le, 2015]
  236. 237 Chit-Chat Bot 237 TV series scripts (~40,000 sentences), US presidential election debates
  237. 238 Sci-Fi Short Film - SUNSPRING 238https://www.youtube.com/watch?v=LY7x2Ihqj
  238. 239 Tutorial Outline I. 對話系統及基本背景知識 II. 語言理解 (Language Understanding) III. 對話管理 (Dialogue Management) IV. 語言生成 (Language Generation) V. 對話系統評估、發展及趨勢 239
  239. 240 Outline  Dialogue System Evaluation  Recent Trends  End-to-End Learning for Dialogue System  Multimodality  Dialogue Breadth  Dialogue Depth  Challenges 240
  240. 241 Dialogue System Evaluation  Language understanding  Sentence-level: frame accuracy (%)  Subsentence-level: intent accuracy (%), slot F-measure (%)  Dialogue management  Dialogue state tracking  Frame accuracy (%)  Dialogue policy  Turn-level: accuracy (%)  Dialogue-level: success rate (%), reward  Natural language generation 241
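As a sketch of how two of these language-understanding metrics could be computed (the data layout is an assumption: frames as dicts, slots as sets of (slot, value) pairs):

```python
def frame_accuracy(predicted_frames, gold_frames):
    """Sentence-level frame accuracy: intent and all slots must match exactly."""
    correct = sum(p == g for p, g in zip(predicted_frames, gold_frames))
    return correct / len(gold_frames)

def slot_f1(predicted_slots, gold_slots):
    """Micro-averaged slot F-measure over sets of (slot, value) pairs."""
    tp = sum(len(p & g) for p, g in zip(predicted_slots, gold_slots))
    n_pred = sum(len(p) for p in predicted_slots)
    n_gold = sum(len(g) for g in gold_slots)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```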
  241. 242 NLG Evaluation  Metrics  Subjective: human judgement (Stent et al., 2005)  Adequacy: correct meaning  Fluency: linguistic fluency  Readability: fluency in the dialogue context  Variation: multiple realizations for the same concept  Objective: automatic metrics  Word overlap: BLEU (Papineni et al, 2002), METEOR, ROUGE  Word embedding based: vector extrema, greedy matching, embedding average 242 There is a gap between human perception and automatic metrics
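For the word-overlap metrics, off-the-shelf implementations exist, e.g. NLTK's sentence-level BLEU (the reference and candidate strings below are made-up examples):

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = ["seven days is a nice chinese restaurant".split()]
candidate = "seven days serves nice chinese food".split()
bleu = sentence_bleu(reference, candidate,
                     smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {bleu:.3f}")
```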
  242. 243 Dialogue Evaluation Measures 243
Understanding Ability: • Whether constrained values specified by users can be understood by the system • Agreement percentage of system/user understandings over the entire dialog (averaging all turns)
Efficiency: • Number of dialogue turns for task completion
Action Appropriateness: • An explicit confirmation for an uncertain user utterance is an appropriate system action • Providing information based on misunderstood user requirements
  243. 244 Outline  Dialogue System Evaluation  Recent Trends  End-to-End Learning for Dialogue System  Multimodality  Dialogue Breadth  Dialogue Depth  Challenges 244
  244. 245 E2E Joint NLU and DM (Yang et al., 2017)  Idea: errors from DM can be propagated to NLU for better robustness 245
Model | DM | NLU
Baseline (CRF+SVMs) | 7.7 | 33.1
Pipeline-BLSTM | 12.0 | 36.4
JointModel | 22.8 | 37.4
Both DM and NLU performance improve https://arxiv.org/abs/1612.00913
  245. 246 E2E Supervised Dialogue System (Wen et al., 2016)  Figure: Intent Network (input “Can I have <v.food>”), Belief Tracker (Korean 0.7, British 0.2, French 0.1, …), Database Operator (MySQL query: “Select * where food=Korean”, DB pointer over Sevendays, CurryPrince, Nirala, RoyalStandard, LittleSeuol), Policy Network, and Generation Network (“<v.name> serves great <v.food> .”) 246 https://arxiv.org/abs/1604.04562
  246. 247 E2E MemNN for Dialogues (Bordes et al., 2016)  Split dialogue system actions into subtasks  API issuing  API updating  Option displaying  Information informing 247 https://arxiv.org/abs/1605.07683 dialogue-level / turn-level
  247. 248 E2E RL-Based Info-Bot (Dhingra et al., 2016) Movie=?; Actor=Bill Murray; Release Year=1993 Find me the Bill Murray’s movie. I think it came out in 1993. When was it released? Groundhog Day is a Bill Murray movie which came out in 1993. KB-InfoBot User (Groundhog Day, actor, Bill Murray) (Groundhog Day, release year, 1993) (Australia, actor, Nicole Kidman) (Mad Max: Fury Road, release year, 2015) Knowledge Base (head, relation, tail) 248 Idea: differentiable database for propagating the gradients https://arxiv.org/abs/1609.00777
  248. 249 E2E RL-Based System (Zhao and Eskenazi, 2016) 249  Joint learning  NLU, DST, Dialogue Policy  Deep RL for training  Deep Q-network  Deep recurrent network Baseline RL Hybrid-RL http://www.aclweb.org/anthology/W/W16/W16-36.pdf#page=19
  249. 250 E2E LSTM-Based Dialogue Control (Williams and Zweig, 2016) 250  Idea: an LSTM maps from raw dialogue history directly to a distribution over system actions  Developers can provide software including business rules & programmatic APIs  LSTM can take actions in the real world on behalf of the user  The LSTM can be optimized using SL or RL https://arxiv.org/abs/1606.01269
  250. 251 E2E Task-Completion Bot (TC-Bot) (Li et al., 2017)  Figure: an End-to-End Neural Dialogue System – Language Understanding (LU) with slot tagging over time steps t-2, t-1, t; Dialogue Management (DM); Natural Language Generation (NLG) – interacting with a User Simulator (User Goal, User Agenda Modeling)  Text Input: Are there any action movies to see this weekend?  Semantic Frame: request_movie genre=action, date=this weekend  System Action / Policy: request_location  User Dialogue Action: Inform(location=San Francisco)  Idea: supervised learning for each component and reinforcement learning for end-to-end training of the neural dialogue system 251 https://arxiv.org/abs/1703.01008
  251. 252 E2E Task-Completion Bot (TC-Bot) (Li et al., 2017)  User goal: Two tickets for “the witch” tomorrow 9:30 PM at regal meridian 16, Seattle. RULE BASED SYSTEM User: Can I get 2 tickets for the witch? Agent: Which theater do you want? User: Regal meridian 16 Agent: What time would you like to see it? User: 9:30 pm Agent: What date would you like to see it? User: Tomorrow! Agent: How many people are you? User: Just two adults. Agent: What is your location? User: Seattle! Agent: What kind of video format do you like? User: Oh, any kind is okay. Agent: Do you have any price constraint? User: No, I don’t care. Agent: Great - I was able to purchase 2 tickets for you to see the witch tomorrow at regal meridian 16 theater in Seattle at 9:30 pm. User: Thank you. Agent: Thank you! 252 The system can learn how to efficiently interact with users for task completion REINFORCEMENT LEARNING SYSTEM User: Can I get 2 tickets for the witch? Agent: What date would you like to see it? User: Tomorrow Agent: Which theater do you prefer? User: Regal meridian 16 Agent: What is your location? User: Seattle Agent: What time would you like to see it? User: 9:30 pm. Agent: Great - I was able to purchase 2 tickets for you to see the witch tomorrow at regal meridian 16 theater in Seattle at 9:30 pm. User: Thanks. Agent: Thanks! https://arxiv.org/abs/1703.01008
  252. 253 Outline  Dialogue System Evaluation  Recent Trends  End-to-End Learning for Dialogue System  Multimodality  Dialogue Breadth  Dialogue Depth  Challenges 253
  253. 254 Brain Signal for Under