"Sorry, I didn't get that!" - Statistical Learning from Dialogues for Intelligent Assistants

NTHU ISA Talk

  1. "Sorry, I didn't get that!" - Statistical Learning from Dialogues for Intelligent Assistants. Dr. Yun-Nung (Vivian) Chen, http://vivianchen.idv.tw
  2. My Background: Yun-Nung (Vivian) Chen 陳縕儂 (http://vivianchen.idv.tw). National Taiwan University: freshman 2005, B.S. 2009, M.S. 2011. Carnegie Mellon University: Ph.D. 2015. Microsoft Research: postdoc 2016. Research topics: spoken dialogue system, language understanding, user modeling, speech summarization, key term extraction, spoken term detection.
  3. Outline
     ◦ Intelligent Assistant: What are they? Why do we need them? Why do companies care?
     ◦ Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview
     ◦ Contributions: Semantic Decoding; Intent Prediction
     ◦ Conclusions & Future Work
  5. What are Intelligent Assistants? Apple Siri (2011), Google Now (2012), Microsoft Cortana (2014), Amazon Alexa/Echo (2014), Facebook M (2015). Sources: https://www.apple.com/ios/siri/, https://www.google.com/landing/now/, http://www.windowsphone.com/en-us/how-to/wp8/cortana/meet-cortana, http://www.amazon.com/oc/echo/
  6. Why do we need them? Daily life usage ◦ Weather ◦ Schedule ◦ Transportation ◦ Restaurant seeking
  7. Why do we need them? ◦ Get things done, e.g. set up an alarm/reminder, take notes ◦ Easy access to structured data, services and apps, e.g. find docs/photos/restaurants ◦ Assistance with your daily schedule and routine, e.g. commute alerts to/from work ◦ Higher productivity in managing your work and personal life
  8. Why do companies care? Global digital statistics (January 2015): global population 7.21B; active Internet users 3.01B; active social media accounts 2.08B; active unique mobile users 3.65B. As devices evolve, their input is moving towards speech, the more natural and convenient modality.
  9. Personal Intelligent Architecture: Reactive Assistance (ASR, LU, Dialog, LG, TTS); Proactive Assistance (Inferences, User Modeling, Suggestions); Data Back-end (Databases, Services and Client Signals); Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps); User Experience (“restaurant suggestions”, “call taxi”).
  10. Personal Intelligent Architecture: Reactive Assistance (ASR, LU, Dialog, LG, TTS); Proactive Assistance (Inferences, User Modeling, Suggestions); Data Back-end (Databases, Services and Client Signals); Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps); User Experience (“call taxi”).
  12. Spoken Dialogue System (SDS). Spoken dialogue systems are intelligent agents that help users finish tasks more efficiently via spoken interactions. They are being incorporated into various devices (smartphones, smart TVs, in-car navigation systems, etc.). Good SDSs help users organize and access information conveniently. Examples: JARVIS (Iron Man’s personal assistant) and Baymax (personal healthcare companion).
  13. What is Baymax’s intelligence? Baymax has a good spoken dialogue system and can learn new knowledge to better understand and interact with people. (Big Hero 6: video content owned and licensed by Disney Entertainment, Marvel Entertainment, LLC, etc.)
  14. SDS Architecture: ASR (Automatic Speech Recognition) → SLU (Spoken Language Understanding) → DM (Dialogue Management) → NLG (Natural Language Generation), supported by domain knowledge (current bottleneck highlighted).
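The four-stage pipeline can be illustrated with four composed stub functions. This is a toy sketch and does not describe a real SDS. The stage functions (`asr`, `slu`, `dm`, `nlg`, `run_sds`), the keyword lexicon that stands in for a trained SLU model, and the in-memory restaurant table are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical keyword lexicon standing in for a trained SLU model.
LEXICON: Dict[str, Tuple[str, str]] = {
    "find": ("seeking", "find"),
    "cheap": ("price", "cheap"),
    "taiwanese": ("food", "taiwanese"),
}

def asr(audio: str) -> str:
    """Stand-in ASR: the 'audio' argument is assumed to be a transcript already."""
    return audio.lower()

def slu(text: str) -> Dict[str, str]:
    """Map recognized words to slot-value pairs via the toy lexicon."""
    frame: Dict[str, str] = {}
    for word in text.split():
        if word in LEXICON:
            slot, value = LEXICON[word]
            frame[slot] = value
    return frame

def dm(frame: Dict[str, str], db: List[Dict[str, str]]) -> List[str]:
    """Return names of rows satisfying every constraint slot ('seeking' is not a constraint)."""
    constraints = {k: v for k, v in frame.items() if k != "seeking"}
    return [row["name"] for row in db
            if all(row.get(k) == v for k, v in constraints.items())]

def nlg(results: List[str]) -> str:
    """Verbalize the DM result as a system response."""
    if not results:
        return "No matching places found."
    return "Options include " + ", ".join(results) + "."

def run_sds(audio: str, db: List[Dict[str, str]]) -> str:
    """ASR -> SLU -> DM -> NLG, falling back when SLU extracts nothing."""
    frame = slu(asr(audio))
    if not frame:
        return "Sorry, I didn't get that!"
    return nlg(dm(frame, db))
```

Because each stage consumes only the previous stage's output, an SLU failure (an empty frame) surfaces as the fallback reply. This is why SLU quality limits the whole pipeline.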
  15. Interaction Example. User: “find a cheap eating place for taiwanese food”. Intelligent agent: “Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. Which one do you want to choose? I can help you go there.” Q: How does a dialogue system process this request?
  16. SDS Process – Available Domain Ontology. To process the user request “find a cheap eating place for taiwanese food”, the intelligent agent relies on organized domain knowledge: the slots target, food, price, and seeking, linked by the dependency relations AMOD, NN, and PREP_FOR.
  17. SDS Process – Available Domain Ontology (continued): the semantic slots are obtained by Ontology Induction.
  18. SDS Process – Available Domain Ontology (continued): the semantic slots come from Ontology Induction, and the inter-slot relations come from Structure Learning.
  19. SDS Process – Spoken Language Understanding (SLU). Using the ontology (target, food, price, seeking; AMOD, NN, PREP_FOR), the agent maps “find a cheap eating place for taiwanese food” to seeking=“find”, target=“eating place”, price=“cheap”, food=“taiwanese”.
  20. SDS Process – Spoken Language Understanding (SLU) (continued): mapping “find a cheap eating place for taiwanese food” to seeking=“find”, target=“eating place”, price=“cheap”, food=“taiwanese” is the Semantic Decoding step.
  21. SDS Process – Dialogue Management (DM). The semantic frame of “find a cheap eating place for taiwanese food” is turned into a database query: SELECT restaurant { restaurant.price=“cheap” restaurant.food=“taiwanese” }.
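The step from frame to query can be sketched as a small rendering function. This is an illustrative assumption and not the system's actual DM. The slot-to-field mapping `SLOT_TO_FIELD`, the choice to map target=“eating place” onto a fixed `restaurant` entity, and the straight-quote output format are all hypothetical.

```python
from typing import Dict

# Hypothetical mapping from slot names to database fields; slots missing here
# (e.g. "seeking", "target") are not treated as query constraints.
SLOT_TO_FIELD: Dict[str, str] = {"price": "price", "food": "food"}

def frame_to_query(frame: Dict[str, str], entity: str = "restaurant") -> str:
    """Render slot-value pairs in the SELECT entity { entity.field="value" ... } notation."""
    conditions = [
        f'{entity}.{SLOT_TO_FIELD[slot]}="{value}"'
        for slot, value in frame.items()
        if slot in SLOT_TO_FIELD
    ]
    return f"SELECT {entity} {{ " + " ".join(conditions) + " }"
```

Called on the SLU output {seeking: find, target: eating place, price: cheap, food: taiwanese}, it keeps only the two constraint slots, in the order they appear in the frame.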
  22. SDS Process – Dialogue Management (DM) (continued): this step relies on Surface Form Derivation (natural language).
  23. SDS Process – Dialogue Management (DM). Executing SELECT restaurant { restaurant.price=“cheap” restaurant.food=“taiwanese” } for “find a cheap eating place for taiwanese food” returns Din Tai Fung, Boiling Point, …; predicted intent: navigation.
  24. SDS Process – Dialogue Management (DM) (continued): predicting the intent (navigation) is the Intent Prediction step.
  25. SDS Process – Natural Language Generation (NLG). The agent responds to “find a cheap eating place for taiwanese food” with: “Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. Which one do you want to choose? I can help you go there.” (navigation)
  26. Required Knowledge. Processing “find a cheap eating place for taiwanese food” requires domain-specific information: the domain ontology (slots target, food, price, seeking with AMOD, NN, and PREP_FOR relations), the query SELECT restaurant { restaurant.price=“cheap” restaurant.food=“taiwanese” }, and the predicted intent (navigation).
  27. Challenges for SDS. An SDS in a new domain requires 1) a hand-crafted domain ontology, 2) utterances labelled with semantic representations, and 3) an SLU component for mapping utterances into semantic representations. This manual work results in high cost, long development time, and poor scalability. The goal is to enable an SDS to 1) automatically infer domain knowledge and then 2) create the data for SLU modeling, fully unsupervised, in order to handle open-domain requests such as “find a cheap eating place for asian food” (seeking=“find”, target=“eating place”, price=“cheap”, food=“asian food”).
  28. Contributions: Ontology Induction (semantic slots), Structure Learning (inter-slot relations), Surface Form Derivation (natural language), Semantic Decoding, and Intent Prediction. Together they supply the required knowledge for “find a cheap eating place for taiwanese food” (domain ontology with AMOD, NN, and PREP_FOR relations; SELECT restaurant { restaurant.price=“cheap” restaurant.food=“asian food” }; predicted intent: navigation).
  29. Contributions for processing the user request “find a cheap eating place for taiwanese food”: Ontology Induction, Structure Learning, Surface Form Derivation, Semantic Decoding, Intent Prediction.
  30. Contributions, grouped: Knowledge Acquisition ◦ Ontology Induction ◦ Structure Learning ◦ Surface Form Derivation; SLU Modeling ◦ Semantic Decoding ◦ Intent Prediction.
  31. Knowledge Acquisition. 1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts? Starting from an unlabelled collection of restaurant-asking conversations, knowledge acquisition (◦ Ontology Induction ◦ Structure Learning ◦ Surface Form Derivation) produces organized domain knowledge: slots such as target, food, price, seeking, and quantity, connected by PREP_FOR, NN, and AMOD relations.
  32. SLU Modeling. 2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents? Given the organized domain knowledge, the SLU component (◦ Semantic Decoding ◦ Intent Prediction) maps “can i have a cheap restaurant” to price=“cheap”, target=“restaurant”, intent=navigation.
  33. SDS Architecture – Contributions: Knowledge Acquisition and SLU Modeling placed in the ASR → SLU → DM → NLG architecture with domain knowledge (current bottleneck highlighted).
  34. SDS Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) and SLU Modeling (Semantic Decoding, Intent Prediction).
  35. SDS Flowchart – Semantic Decoding: Knowledge Acquisition (Ontology Induction, Structure Learning) and SLU Modeling (Semantic Decoding, Intent Prediction).
  37. Semantic Decoding [ACL-IJCNLP’15]. Input: user utterances. Output: the semantic concepts included in each individual utterance, e.g. “can I have a cheap restaurant” → target=“restaurant”, price=“cheap”. MF-SLU (SLU modeling by matrix factorization) combines two models to produce the semantic representation. The Feature Model holds word features Fw and slot features Fs, obtained by frame-semantic parsing of an unlabeled collection (Ontology Induction). The Knowledge Graph Propagation Model holds a Word Relation Model Rw over a lexical knowledge graph and a Slot Relation Model Rs over a semantic knowledge graph (Structure Learning). Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
  38. Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]. FrameNet [Baker et al., 1998] ◦ a linguistic semantic resource based on frame-semantics theory ◦ words/phrases can be represented as frames ◦ e.g. “low fat milk”: “milk” evokes the “food” frame, and “low fat” fills the descriptor frame element. SEMAFOR [Das et al., 2014] ◦ a state-of-the-art frame-semantic parser, trained on manually annotated FrameNet sentences.
  39. Ontology Induction [ASRU’13, SLT’14a] (Best Student Paper Award). Frame-semantic parsing of “can i have a cheap restaurant” yields three slot candidates: Frame: capability (“can”), Frame: expensiveness (“cheap”), and Frame: locale_by_use (“restaurant”). Expensiveness and locale_by_use are good slots for a restaurant SDS, while capability is questionable. 1st Issue: differentiate domain-specific frames from generic frames for SDSs. Das et al., "Frame-semantic parsing," Computational Linguistics, 2014.
  40. Ontology Induction [ASRU’13, SLT’14a] (Best Student Paper Award). Frame-semantic parsing turns each training utterance (Utterance 1 “i would like a cheap restaurant”, Utterance 2 “find a restaurant with chinese food”, …) into a matrix row. The row has binary word-observation columns (cheap, restaurant, food, …) and slot-candidate columns (expensiveness, locale_by_use, food, …). For a test utterance (“show me a list of cheap restaurants”), the model estimates slot-candidate scores (e.g. .97, .95). Idea: increase the weights of domain-specific slots and decrease the weights of others.
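Here is a minimal sketch of how such a training matrix could be assembled. It assumes the frame-semantic parser's output is already available as a word-to-frame dictionary per utterance. The parses below are hand-written stand-ins, not real SEMAFOR output, and `build_matrix` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

def build_matrix(utterances: List[str],
                 frames: List[Dict[str, str]]) -> Tuple[List[str], List[str], List[List[int]]]:
    """Return (word columns, slot columns, rows), where each row is
    [word observations | slot candidates] with 1 marking presence."""
    words = sorted({w for u in utterances for w in u.split()})
    slots = sorted({s for f in frames for s in f.values()})
    rows = []
    for utterance, frame in zip(utterances, frames):
        tokens = set(utterance.split())
        evoked = set(frame.values())  # slot candidates evoked in this utterance
        rows.append([int(w in tokens) for w in words] + [int(s in evoked) for s in slots])
    return words, slots, rows

# Hypothetical parser output: word -> evoked frame (slot candidate).
utterances = ["i would like a cheap restaurant", "find a restaurant with chinese food"]
frames = [{"cheap": "expensiveness", "restaurant": "locale_by_use"},
          {"restaurant": "locale_by_use", "food": "food"}]
words, slots, rows = build_matrix(utterances, frames)
```

Test utterances would be appended as further rows. At that point the slot-candidate part is what the model has to estimate.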
  41. 1st Issue: How to adapt generic slots to a domain-specific setting? Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies on each other. The word-observation/slot-candidate matrix is multiplied by two relation matrices for slot induction. The word relation matrix (Word Relation Model) covers words such as i, like, cheap, restaurant, food. The slot relation matrix (Slot Relation Model) covers slots such as capability, locale_by_use, food, expensiveness, seeking, relational_quantity, desiring. The matrix is built from training utterances such as “i would like a cheap restaurant” and “find a restaurant with chinese food”, and a test utterance such as “show me a list of cheap restaurants”. The relation matrices let nodes propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots receive higher scores after the matrix multiplication.
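The propagation idea can be sketched with plain matrix products that compute $[F_w R_w \mid F_s R_s]$ for each utterance row. The relation weights below are made up. Putting 1s on the diagonal, so that every node keeps its own observed score, is an assumption of this sketch and is not specified on the slide.

```python
from typing import List

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain (rows x k) @ (k x cols) matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def propagate(f_w: Matrix, r_w: Matrix, f_s: Matrix, r_s: Matrix) -> Matrix:
    """Return [F_w R_w | F_s R_s]: word and slot scores after one propagation step."""
    words = matmul(f_w, r_w)
    slots = matmul(f_s, r_s)
    return [w_row + s_row for w_row, s_row in zip(words, slots)]

# Hypothetical example. Words: (cheap, restaurant). Slots: (capability, expensiveness, locale_by_use).
r_w = [[1.0, 0.5], [0.5, 1.0]]
r_s = [[1.0, 0.0, 0.0],   # capability: no edges to the other slots
       [0.0, 1.0, 0.8],   # expensiveness <-> locale_by_use edge, weight 0.8
       [0.0, 0.8, 1.0]]
scores = propagate([[1.0, 0.0]], r_w, [[1.0, 1.0, 1.0]], r_s)
```

Consider an utterance that evokes all three slots. Its slot scores become (1, 1.8, 1.8), up to floating-point rounding. The connected slots, the domain-specific ones, are boosted, while the isolated generic slot capability is not.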
  43. Knowledge Graph Construction. Syntactic dependency parsing is run on utterances, e.g. “can i have a cheap restaurant” (arcs: ccomp, nsubj, dobj, det, amod). Its words evoke the slots capability (can), expensiveness (cheap), and locale_by_use (restaurant). Two graphs are built: a word-based lexical knowledge graph (nodes: can, i, have, a, cheap, restaurant) and a slot-based semantic knowledge graph (nodes: capability, locale_by_use, expensiveness).
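Building the two graphs from parses can be sketched as follows. The head-dependent assignment of the slide's arcs (ccomp, nsubj, dobj, det, amod) is a hypothetical reading of the parse. The rule that a slot-slot edge exists only when an arc directly links two slot-evoking words is a simplification assumed here.

```python
from typing import Dict, List, Set, Tuple

Arc = Tuple[str, str, str]  # (head, dependent, label)

def build_graphs(arcs: List[Arc],
                 word_to_slot: Dict[str, str]) -> Tuple[Set[Tuple[str, str]], Set[Tuple[str, str]]]:
    """Return undirected word edges and slot edges derived from dependency arcs."""
    word_edges: Set[Tuple[str, str]] = set()
    slot_edges: Set[Tuple[str, str]] = set()
    for head, dep, _label in arcs:
        word_edges.add(tuple(sorted((head, dep))))
        # Simplifying assumption: connect two slots only if their words share an arc.
        if head in word_to_slot and dep in word_to_slot:
            slot_edges.add(tuple(sorted((word_to_slot[head], word_to_slot[dep]))))
    return word_edges, slot_edges

# Hypothetical parse of "can i have a cheap restaurant".
arcs = [("can", "have", "ccomp"), ("have", "i", "nsubj"), ("have", "restaurant", "dobj"),
        ("restaurant", "a", "det"), ("restaurant", "cheap", "amod")]
word_to_slot = {"can": "capability", "cheap": "expensiveness", "restaurant": "locale_by_use"}
word_edges, slot_edges = build_graphs(arcs, word_to_slot)
```

Over a whole corpus, the same edges would accumulate across utterances. Weighting them is the job of the edge weight measurement step.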
  44. Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014). Dependency-based word embeddings (e.g. can = [0.8 … 0.24], have = [0.3 … 0.21], …) are trained on the word dependency parses (“can i have a cheap restaurant” with ccomp, nsubj, dobj, det, amod arcs). Dependency-based slot embeddings (e.g. expensiveness = [0.12 … 0.7], capability = [0.3 … 0.6], …) are trained on the same parses with slot-evoking words replaced by their slots (capability, expensiveness, locale_by_use). Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.
  45. Edge Weight Measurement. Compute edge weights to represent relation importance:
     ◦ slot-to-slot semantic relation $R_s^S$: similarity between slot embeddings
     ◦ slot-to-slot dependency relation $R_s^D$: dependency score between slot embeddings
     ◦ word-to-word semantic relation $R_w^S$: similarity between word embeddings
     ◦ word-to-word dependency relation $R_w^D$: dependency score between word embeddings
     The combined relation matrices are $R_w^{SD} = R_w^S + R_w^D$ and $R_s^{SD} = R_s^S + R_s^D$. (Figure: word nodes w1–w7 and slot nodes s1–s3.)
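The edge-weight combination can be sketched as below. Cosine similarity is assumed as the "similarity between embeddings". The dependency scores $R^D$ are treated as a precomputed input because the slide gives no formula for them. The 2-dimensional embeddings and the $R^D$ values are invented for illustration, since the slide's actual vectors are elided.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]

def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def semantic_relation(emb: Dict[str, List[float]], nodes: List[str]) -> Matrix:
    """R^S: pairwise cosine similarity between node embeddings."""
    return [[cosine(emb[a], emb[b]) for b in nodes] for a in nodes]

def combine(r_sem: Matrix, r_dep: Matrix) -> Matrix:
    """R^SD = R^S + R^D, element-wise."""
    return [[s + d for s, d in zip(row_s, row_d)] for row_s, row_d in zip(r_sem, r_dep)]

# Hypothetical slot embeddings and dependency scores.
nodes = ["expensiveness", "locale_by_use", "capability"]
emb = {"expensiveness": [1.0, 0.0], "locale_by_use": [0.6, 0.8], "capability": [-0.6, 0.8]}
r_dep = [[0.0, 0.5, 0.1],
         [0.5, 0.0, 0.2],
         [0.1, 0.2, 0.0]]
r_sd = combine(semantic_relation(emb, nodes), r_dep)
```

The same two functions apply unchanged to word embeddings to obtain $R_w^{SD}$.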
  46. Knowledge Graph Propagation Model. For slot induction, the word-observation/slot-candidate matrix is multiplied by the word relation matrix $R_w^{SD}$ (Word Relation Model) and the slot relation matrix $R_s^{SD}$ (Slot Relation Model). Structure information is integrated to make the self-training data more reliable.
  47. Semantic Decoding [ACL-IJCNLP’15]. Ontology induction builds the word-observation/slot-candidate matrix (Fw, Fs) from training utterances such as “i would like a cheap restaurant” and “find a restaurant with chinese food”, and structure learning supplies the relation models (×). For the test utterance “show me a list of cheap restaurants”, the SLU model should also assign scores (e.g. .97, .90, .95, .85) to hidden semantics that are not explicitly observed. 2nd Issue: unobserved semantics may benefit understanding.
  48. Reasoning with Matrix Factorization. Slot induction combines the Feature Model with the Knowledge Graph Propagation Model ($R_w^{SD}$, $R_s^{SD}$), and the completed matrix assigns scores to cells that were not observed (e.g. .97, .90, .95, .85, .93, .92, .98, .05, .05). Idea: MF completes a partially missing matrix under a low-rank latent-semantics assumption, which lets it model hidden semantics and makes it more robust to noisy data.
  49. 2nd Issue: How to model the unobserved hidden semantics? Matrix Factorization (MF) (Rendle et al., 2009). The decomposed matrices represent latent semantics for utterances and for words/slots, respectively, and their product fills in the probabilities of hidden semantics: $|U| \times (|W|+|S|) \approx (|U| \times d) \times (d \times (|W|+|S|))$, where |U| is the number of utterances, |W| and |S| are the numbers of words and slots, and d is the latent dimension. Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.
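The shape argument can be made concrete with a small reconstruction function. This sketch assumes that scores are squashed with a logistic sigmoid so they read as probabilities. The factor names `P` (|U| × d) and `Q` (d × (|W|+|S|)) and the helper `reconstruct` are hypothetical, and the factors would normally be learned rather than hand-set.

```python
import math
from typing import List

Matrix = List[List[float]]

def reconstruct(P: Matrix, Q: Matrix) -> Matrix:
    """Multiply P (|U| x d) by Q (d x (|W|+|S|)) and map each score into (0, 1)."""
    d = len(Q)
    if any(len(row) != d for row in P):
        raise ValueError("P must have d columns, where d is the number of rows of Q")
    n_cols = len(Q[0])
    completed = []
    for row in P:
        scores = [sum(row[k] * Q[k][j] for k in range(d)) for j in range(n_cols)]
        completed.append([1.0 / (1.0 + math.exp(-s)) for s in scores])
    return completed
```

Every cell of the |U| × (|W|+|S|) result receives a value, including cells that were never observed. Under the low-rank assumption, this is how MF "fills in" hidden semantics.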
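  50. Bayesian Personalized Ranking for MF. Model implicit feedback: ◦ do not treat unobserved facts as negative samples (true or false) ◦ give observed facts higher scores than unobserved facts. Objective: for each utterance u, rank every observed fact $f^+$ above every unobserved fact $f^-$, i.e. $\max \sum_{u} \sum_{f^+ \in \mathcal{O}_u} \sum_{f^- \notin \mathcal{O}_u} \ln \sigma\left(x_{u,f^+} - x_{u,f^-}\right)$, where $\mathcal{O}_u$ is the set of facts observed in u, $x_{u,f}$ is the model score of fact f for u, and σ is the logistic function. The objective is to learn a set of well-ranked semantic slots per utterance.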
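The BPR objective can be optimized with stochastic gradient ascent, following Rendle et al. (2009). The sketch below is a generic BPR-MF trainer on a toy utterance-by-fact matrix. It omits the feature and knowledge-graph terms of MF-SLU. The hyperparameters, the toy data, and the helper names `train_bpr` and `score` are assumptions.

```python
import math
import random
from typing import List, Set, Tuple

Matrix = List[List[float]]

def train_bpr(observed: List[Set[int]], n_items: int, d: int = 4, lr: float = 0.1,
              reg: float = 0.01, epochs: int = 200, seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Learn utterance factors P and fact factors Q by stochastic gradient ascent on
    sum ln sigma(x_{u,f+} - x_{u,f-}) with L2 regularization."""
    rng = random.Random(seed)
    P = [[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in observed]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, facts in enumerate(observed):
            for pos in facts:
                neg = rng.randrange(n_items)
                if neg in facts:  # only unobserved facts may serve as f-
                    continue
                x = sum(P[u][k] * (Q[pos][k] - Q[neg][k]) for k in range(d))
                x = max(-30.0, min(30.0, x))  # keep exp() numerically safe
                g = 1.0 / (1.0 + math.exp(x))  # d/dx ln sigma(x) = sigma(-x)
                for k in range(d):
                    pu, qp, qn = P[u][k], Q[pos][k], Q[neg][k]
                    P[u][k] += lr * (g * (qp - qn) - reg * pu)
                    Q[pos][k] += lr * (g * pu - reg * qp)
                    Q[neg][k] += lr * (-g * pu - reg * qn)
    return P, Q

def score(P: Matrix, Q: Matrix, u: int, f: int) -> float:
    """Model score x_{u,f}: inner product of the latent vectors."""
    return sum(a * b for a, b in zip(P[u], Q[f]))
```

Unobserved facts are only ever compared against observed ones, never labeled false outright. This is the "implicit feedback" treatment described on the slide.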
  51. Matrix Factorization SLU (MF-SLU). MF-SLU combines ontology induction (Fw, Fs) and structure learning (×) and estimates probabilities for slot candidates given test utterances. For example, it scores “show me a list of cheap restaurants” (.97, .90, .95, .85) after training on utterances such as “i would like a cheap restaurant” and “find a restaurant with chinese food”.
  52. Semantic Decoding [ACL-IJCNLP’15] (same MF-SLU overview as slide 37). Idea: utilize the acquired knowledge to decode utterance semantics, fully unsupervised.
  53. Experimental Setup. Dataset: Cambridge University SLU Corpus [Henderson et al., 2012] ◦ restaurant recommendation (WER = 37%) ◦ 2,166 dialogues ◦ 15,453 utterances ◦ dialogue slots: addr, area, food, name, phone, postcode, price range, task, type. Metric: MAP of all estimated slot probabilities over all utterances, computed with a mapping table between induced and reference slots. Henderson et al., "Discriminative spoken language understanding using word confusion networks," in Proc. of SLT, 2012.
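The slide does not spell out how the MAP is computed. The sketch below uses the standard average-precision definition (precision at each rank where a reference slot appears, averaged over the reference slots) and then averages over utterances. It assumes induced slots have already been renamed to reference slots through the mapping table. `average_precision` and `mean_average_precision` are hypothetical helper names.

```python
from typing import Dict, List, Set

def average_precision(scores: Dict[str, float], relevant: Set[str]) -> float:
    """AP of one utterance: slots ranked by estimated probability vs. reference slots."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    hits, total = 0, 0.0
    for rank, slot in enumerate(ranked, start=1):
        if slot in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(all_scores: List[Dict[str, float]],
                           all_relevant: List[Set[str]]) -> float:
    """Mean of AP over utterances that have at least one reference slot."""
    pairs = [(s, r) for s, r in zip(all_scores, all_relevant) if r]
    return sum(average_precision(s, r) for s, r in pairs) / len(pairs)
```

For example, take an utterance with reference slots {food, price range}, ranked food (.9), area (.6), price range (.4). Its AP is (1/1 + 2/3) / 2 = 5/6.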
  54. Experiments of Semantic Decoding: Quality of Semantics Estimation (Cambridge University SLU Corpus; metric: MAP of all estimated slot probabilities over all utterances). Baselines: Support Vector Machine 32.5 (ASR) / 36.6 (transcripts); Multinomial Logistic Regression 34.0 (ASR) / 38.8 (transcripts).
  55. Experiments of Semantic Decoding: Quality of Semantics Estimation (Cambridge University SLU Corpus; metric: MAP of all estimated slot probabilities over all utterances).
      Approach                                                     | ASR            | Transcripts
      Baseline SLU: Support Vector Machine                         | 32.5           | 36.6
      Baseline SLU: Multinomial Logistic Regression (MLR)          | 34.0           | 38.8
      Proposed MF-SLU: Feature Model                               | 37.6*          | 45.3*
      Proposed MF-SLU: Feature Model + Knowledge Graph Propagation | 43.5* (+27.9%) | 53.4* (+37.6%)
      *: significantly better than MLR (t-test, p < 0.05); percentages are relative to MLR.
      The MF-SLU effectively models implicit information to decode semantics, and the structure information further improves the results.
  56. Experiments of Semantic Decoding: Effectiveness of Relations (Cambridge University SLU Corpus; metric: MAP of all estimated slot probabilities over all utterances).
      Approach                                          | ASR            | Transcripts
      Feature Model                                     | 37.6           | 45.3
      Feature + Knowledge Graph Propagation: Semantic   | 41.4*          | 51.6*
      Feature + Knowledge Graph Propagation: Dependency | 41.6*          | 49.0*
      Feature + Knowledge Graph Propagation: All        | 43.5* (+15.7%) | 53.4* (+17.9%)
      *: significantly better than MLR (t-test, p < 0.05); percentages are relative to the Feature Model.
      In the integrated structure information, both semantic and dependency relations are useful for understanding.
  57. Experiments for Structure Learning: Relation Discovery Analysis. Goal: discover inter-slot relations connecting important slot pairs. The learned ontology (slots locale_by_use, food, expensiveness, seeking, relational_quantity, desiring; relations PREP_FOR, NN, AMOD, DOBJ) is compared with the reference ontology, annotated with its most frequent syntactic dependencies (slots type, food, pricerange, task, area; relations DOBJ, AMOD, PREP_IN). The automatically learned domain ontology aligns well with the reference one. The data-driven ontology is more objective, while the expert-annotated one is more subjective.
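The slide annotates the reference ontology with "the most frequent syntactic dependencies" between slot pairs. A frequency count like that can be sketched as follows. This only illustrates the counting idea and is not the structure-learning model itself. The parses, word-to-slot maps, and the helper `discover_relations` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Arc = Tuple[str, str, str]  # (head, dependent, label)

def discover_relations(arcs_per_utt: List[List[Arc]],
                       word_to_slot_per_utt: List[Dict[str, str]],
                       top_k: int = 3) -> List[Tuple[Tuple[str, str, str], int]]:
    """Count (head slot, dependency label, dependent slot) triples and return the top_k."""
    counts: Counter = Counter()
    for arcs, w2s in zip(arcs_per_utt, word_to_slot_per_utt):
        for head, dep, label in arcs:
            if head in w2s and dep in w2s:
                counts[(w2s[head], label, w2s[dep])] += 1
    return counts.most_common(top_k)

# Hypothetical parses of three short requests.
arcs_per_utt = [[("restaurant", "cheap", "AMOD")],
                [("restaurant", "food", "PREP_FOR")],
                [("place", "cheap", "AMOD")]]
w2s_per_utt = [{"restaurant": "locale_by_use", "cheap": "expensiveness"},
               {"restaurant": "locale_by_use", "food": "food"},
               {"place": "locale_by_use", "cheap": "expensiveness"}]
top = discover_relations(arcs_per_utt, w2s_per_utt, top_k=2)
```

Because counts are aggregated over slots rather than words, "cheap restaurant" and "cheap place" both support the same locale_by_use AMOD expensiveness relation.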
  58. Contributions of Semantic Decoding (Knowledge Acquisition: Ontology Induction, Structure Learning; SLU Modeling: Semantic Decoding, Intent Prediction). ◦ Ontology Induction and Structure Learning enable systems to automatically acquire open-domain knowledge. ◦ MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, and 3) model implicit semantics for better understanding.
  59. Low- and High-Level Understanding. Semantic concepts for individual utterances do not capture high-level semantics (user intents), and follow-up behaviors usually correspond to user intents. E.g. “can i have a cheap restaurant” → price=“cheap”, target=“restaurant”, intent=navigation; “i plan to dine in legume tonight” → restaurant=“legume”, time=“tonight”, intent=reservation.
  60. SDS Flowchart – Intent Prediction: Knowledge Acquisition (Ontology Induction, Structure Learning) and SLU Modeling (Semantic Decoding, Intent Prediction).
  62. Intent Prediction of Mobile Apps [SLT’14c] [Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]. Input: spoken utterances that request launching an app. Output: the apps supporting the required functionality. Intent identification ◦ popular domains in Google Play. E.g. “please dial a phone call to alex” → Skype, Hangout, etc. Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
  63. Intent Prediction – Single-Turn Request. Input: a single-turn request. Output: apps that can support the required functionality. Information retrieval over app descriptions (e.g. Outlook: “… your email, calendar, contacts …”; Gmail: “… check and send emails, msgs …”) assigns candidate apps to self-train utterances such as “i would like to contact alex”. Feature enrichment adds semantics (e.g. communication, .90) to the word observations (contact, message, email). Reasoning with feature-enriched MF then scores the intended apps (Gmail, Outlook, Skype) for test utterances (e.g. .90, .85, .97, .95). The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents.
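The slides call the retrieval step an LM-based IR model but give no formula. The sketch below assumes a unigram query-likelihood model with Jelinek-Mercer smoothing (λ = 0.5, chosen arbitrarily). The app descriptions and the helper `rank_apps` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

def rank_apps(query: str, descriptions: Dict[str, str],
              lam: float = 0.5) -> List[Tuple[str, float]]:
    """Rank apps by log P(query | description LM), smoothed with the collection LM."""
    docs = {app: Counter(text.lower().split()) for app, text in descriptions.items()}
    collection: Counter = Counter()
    for counts in docs.values():
        collection.update(counts)
    coll_len = sum(collection.values())
    results = []
    for app, counts in docs.items():
        doc_len = sum(counts.values())
        log_p = 0.0
        for w in query.lower().split():
            p = (1 - lam) * counts[w] / doc_len + lam * collection[w] / coll_len
            if p > 0:  # words unseen in every description add nothing to any app
                log_p += math.log(p)
        results.append((app, log_p))
    return sorted(results, key=lambda pair: pair[1], reverse=True)

# Hypothetical app descriptions.
apps = {"Gmail": "check and send emails",
        "Camera": "take photos and videos",
        "Maps": "navigation and directions"}
ranking = rank_apps("send emails to alex", apps)
```

Skipping words absent from the whole collection does not change the ranking, because such words would contribute the same (zero-probability) term to every app.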
  64. Intent Prediction – Multi-Turn Interaction [ICMI’15]. Input: multi-turn interaction. Output: apps the user plans to launch. Challenge: language ambiguity, which depends on 1) user preference and 2) app-level contexts. E.g. “send to vivian” could mean Email or Message, and the communication app intended depends on the previous turn. Idea: behavioral patterns in the history can help intent prediction. Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com/.
  65. Intent Prediction – Multi-Turn Interaction [ICMI’15]. Input: multi-turn interaction. Output: apps the user plans to launch. Training dialogues provide lexical features (photo, check, camera, tell, send, …) and behavior-history features (null, camera, chrome, email, …) for each user utterance and intended app. Examples: “take this photo” → CAMERA, then “tell vivian this is me in the lab” → IM; “check my grades on website” → CHROME, then “send an email to professor” → EMAIL. For a test dialogue (“take a photo of this”, “send it to alice”), reasoning with feature-enriched MF scores the intended apps (CAMERA, IM; scores such as .85, .70, .95, .80, .55). The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction. Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com/.
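One row of the multi-turn feature matrix pairs the lexical features of a turn with the app used in the previous turn ("null" for the first turn), as in the slide's behavior-history column. A minimal sketch of building those rows follows. The feature-name scheme and the helpers `turn_features` and `dialogue_features` are hypothetical, and the MF step itself is not shown.

```python
from typing import Dict, List, Optional

def turn_features(utterance: str, previous_app: Optional[str]) -> Dict[str, int]:
    """Binary lexical features plus one behavior-history feature for the previous app."""
    feats = {f"word={w}": 1 for w in utterance.lower().split()}
    feats[f"prev_app={previous_app or 'null'}"] = 1
    return feats

def dialogue_features(turns: List[str], apps: List[Optional[str]]) -> List[Dict[str, int]]:
    """apps[i] is the app launched at turn i; turn i's history feature is apps[i - 1]."""
    rows = []
    prev: Optional[str] = None
    for utterance, app in zip(turns, apps):
        rows.append(turn_features(utterance, prev))
        prev = app
    return rows

rows = dialogue_features(["take a photo of this", "send it to alice"], ["camera", "im"])
```

For the second turn, "send it to alice" on its own is ambiguous. The prev_app=camera feature supplies the app-level context that the slide credits for better intent prediction.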
  66. Experiments for Intent Prediction (metric: Mean Average Precision, MAP). Baselines on word observations: single-turn requests with an LM-based IR model (unsupervised), 25.1 (ASR) / 26.1 (transcripts); multi-turn interactions with multinomial logistic regression (MLR, supervised), 52.1 (ASR) / 55.5 (transcripts).
  67. Experiments for Intent Prediction (MAP). With word observations only, MF-SLU improves single-turn requests from 25.1 to 29.2 (+16.2%, ASR) and from 26.1 to 30.4 (+16.4%, transcripts). Multi-turn results change little: 52.1 → 52.7 (+1.2%, ASR) and 55.5 → 55.4 (-0.2%, transcripts). Modeling hidden semantics helps intent prediction, especially for noisy data.
  68. Experiments for Intent Prediction (MAP). Feature enrichment raises the single-turn LM baseline from 25.1 to 32.0 (ASR, embedding-based semantics) or 31.5 (ASR, type-embedding-based semantics), and from 26.1 to 33.3 or 32.9 (transcripts). Adding behavioral patterns raises the multi-turn MLR baseline from 52.1 to 53.9 (ASR) and from 55.5 to 56.6 (transcripts). Semantic enrichment provides rich cues to improve performance.
  69. Experiments for Intent Prediction
      Single-Turn Request: Mean Average Precision (MAP)
        Feature Matrix                        | ASR: LM | ASR: MF-SLU   | Transcripts: LM | Transcripts: MF-SLU
        Word Observation                      | 25.1    | 29.2 (+16.2%) | 26.1            | 30.4 (+16.4%)
        Word + Embedding-Based Semantics      | 32.0    | 34.2 (+6.8%)  | 33.3            | 33.3 (-0.2%)
        Word + Type-Embedding-Based Semantics | 31.5    | 32.2 (+2.1%)  | 32.9            | 34.0 (+3.4%)
      Multi-Turn Interaction: Mean Average Precision (MAP)
        Feature Matrix             | ASR: MLR | ASR: MF-SLU  | Transcripts: MLR | Transcripts: MF-SLU
        Word Observation           | 52.1     | 52.7 (+1.2%) | 55.5             | 55.4 (-0.2%)
        Word + Behavioral Patterns | 53.9     | 55.7 (+3.3%) | 56.6             | 57.7 (+1.9%)
      Intent prediction can benefit from both hidden information and low-level semantics.
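The percentages in parentheses in the slide-69 tables read as relative improvements of MF-SLU over the LM or MLR baseline in the same row and condition. The small sketch below assumes that reading. Recomputing from the rounded table entries can differ from the slide by about 0.1 point (25.1 → 29.2 gives roughly +16.3% versus the reported +16.2%), presumably because the slide used unrounded scores.

```python
def relative_gain(baseline, new):
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100


# (baseline, MF-SLU) pairs copied from the slide-69 tables.
pairs = {
    "single-turn ASR, word observation": (25.1, 29.2),
    "multi-turn transcripts, word observation": (55.5, 55.4),
    "multi-turn transcripts, word + behavioral patterns": (56.6, 57.7),
}
for name, (base, new) in pairs.items():
    print(f"{name}: {relative_gain(base, new):+.1f}%")
```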
  70. Contributions of Intent Prediction
      [Figure: contribution map with Knowledge Acquisition (Ontology Induction, Structure Learning) and SLU Modeling (Semantic Decoding, Intent Prediction).]
      Feature-enriched MF-SLU for intent prediction is able to:
        1) unify knowledge at different levels,
        2) learn inference relations between various features, and
        3) create personalized models by leveraging contextual behaviors.
  71. Personal Intelligent Architecture
      [Figure: layered architecture serving a user request such as "call taxi".]
      ◦ Reactive Assistance: ASR, LU, Dialog, LG, TTS
      ◦ Proactive Assistance: Inferences, User Modeling, Suggestions
      ◦ Data Back-end: Databases, Services, and Client Signals
      ◦ Device/Service End-points: Phone, PC, Xbox, Web Browser, Messaging Apps
      ◦ User Experience
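The reactive-assistance layer is a pipeline of stages. The sketch below chains those stages as plain functions to show how a request such as "call taxi" flows through them. Only the stage names (ASR, LU, Dialog, LG, TTS) come from the slide. Every function body, the `Frame` type, the intent name `book_taxi`, and the system acts are hypothetical stubs standing in for real models.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """Semantic frame produced by language understanding (hypothetical schema)."""
    intent: str
    slots: dict = field(default_factory=dict)


def asr(audio: str) -> str:
    # Stub recognizer: pretend the "audio" is already its transcript.
    return audio.lower().strip()


def lu(text: str) -> Frame:
    # Stub keyword rule standing in for a learned SLU model.
    if "taxi" in text:
        return Frame("book_taxi")
    return Frame("unknown")


def dialog(frame: Frame) -> str:
    # Stub dialogue manager: choose the next system act from the frame.
    if frame.intent == "book_taxi":
        return "request(pickup_location)"
    return "apologize"


def lg(act: str) -> str:
    # Template-based language generation for each system act.
    templates = {
        "request(pickup_location)": "Where should the taxi pick you up?",
        "apologize": "Sorry, I didn't get that!",
    }
    return templates[act]


def tts(text: str) -> str:
    # Stub synthesizer: return the text that would be spoken.
    return text


def reactive_turn(audio: str) -> str:
    """Run one user turn through ASR → LU → Dialog → LG → TTS."""
    return tts(lg(dialog(lu(asr(audio)))))


print(reactive_turn("Call taxi"))  # prints "Where should the taxi pick you up?"
print(reactive_turn("hmm"))        # prints "Sorry, I didn't get that!"
```

Keeping each stage a separate function mirrors the pipeline view. The cost is that errors propagate: an ASR or LU mistake upstream is what produces the fallback reply.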
  73. Conclusions
      The work shows the feasibility of, and the potential for, improving the generalization, maintenance, efficiency, and scalability of SDSs.
      The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.
      The proposed MF-SLU unifies the automatically acquired knowledge and allows systems to consider implicit semantics for better understanding:
      ◦ better semantic representations for individual utterances
      ◦ better high-level intent prediction about follow-up behaviors
  74. Future Work
      Apply the proposed technology to domain discovery
      ◦ find domains that current systems do not cover but users are interested in
      ◦ guide which domains to develop next
      Improve the proposed approach by handling uncertainty
      [Figure: ASR feeds SLU, and knowledge acquisition feeds SLU modeling; the uncertainty comes from recognition errors (ASR) and unreliable knowledge (knowledge acquisition).]
  75. Towards Unsupervised Deep Learning
      [Figure: a convolutional network relating an utterance U to slot candidates S1, ..., Sn. Components shown: word sequence x (w1, w2, ..., wd), word vectors lw, convolution matrix Wc and convolutional layer lc, a pooling operation, utterance and slot vectors lf, semantic projection matrix Ws and semantic layer y, knowledge graph propagation matrix Wp and knowledge graph propagation layer lp, semantic relations R(U, S1), ..., R(U, Sn), and posterior probabilities P(S1 | U), ..., P(Sn | U).]
      Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning.
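The figure lists the layers of a deeper model but not their exact wiring. The pure-Python forward pass below is a rough sketch that makes several assumptions not stated on the slide:
- the layer order is convolution → pooling → semantic layer → propagation layer;
- the activations are tanh and the pooling is max pooling;
- R(U, S) is cosine similarity and P(S | U) is a softmax with a smoothing factor `gamma`;
- slot vectors are given directly in the propagation layer's space.

All weights are random, so `Wp` here does not encode any real knowledge graph. The sketch only illustrates the data flow and is untrained.

```python
import math
import random


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0


def forward(word_vecs, slot_vecs, Wc, Ws, Wp, window=3, gamma=10.0):
    """Return (R(U, S_i) for each slot, P(S_i | U) for each slot)."""
    d = len(word_vecs[0])
    pad = [[0.0] * d] * (window // 2)  # zero padding so every word gets a full window
    seq = pad + word_vecs + pad
    # Convolutional layer: tanh(Wc · concatenated window) at each word position.
    lc = []
    for t in range(len(word_vecs)):
        ctx = [x for vec in seq[t:t + window] for x in vec]
        lc.append([math.tanh(z) for z in matvec(Wc, ctx)])
    # Max pooling over positions gives a fixed-size utterance vector.
    pooled = [max(col) for col in zip(*lc)]
    # Semantic layer y, then the knowledge-graph propagation layer.
    y = [math.tanh(z) for z in matvec(Ws, pooled)]
    lp = matvec(Wp, y)
    # Semantic relation as cosine similarity; posterior via a smoothed softmax.
    rel = [cosine(lp, s) for s in slot_vecs]
    exps = [math.exp(gamma * r) for r in rel]
    total = sum(exps)
    return rel, [e / total for e in exps]


rng = random.Random(0)
d, hc, hs, n_slots = 4, 5, 3, 3  # hypothetical layer sizes
words = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(6)]
slots = [[rng.uniform(-1, 1) for _ in range(hs)] for _ in range(n_slots)]
Wc = rand_matrix(hc, 3 * d, rng)
Ws = rand_matrix(hs, hc, rng)
Wp = rand_matrix(hs, hs, rng)  # hypothetical stand-in for slot-relation structure
rel, post = forward(words, slots, Wc, Ws, Wp)
print([round(p, 3) for p in post])
```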
  76. Take Home Message
      Big data is available without annotations.
      Challenge: how to acquire and organize important knowledge, and further utilize it for applications.
      Language understanding for AI
      ◦ language → action
      ◦ understand voice to control music, lights, etc.
      ◦ teach the assistant to let friends in via face recognition, etc.
      Unsupervised or weakly-supervised methods will be the future trend!
      Deep language understanding is an emerging field!
  77. Q & A
      Thanks for your attention!
      • Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
      • Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
      • Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
      • Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
      • Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
      • Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
      • Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.
