
Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

Carnegie Mellon University - PhD Thesis Proposal (April, 2015)


  1. Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems. Yun-Nung (Vivian) Chen, http://vivianchen.idv.tw, April 2015. Committee: Alexander I. Rudnicky (chair), Anatole Gershman (co-chair), Alan W Black, Dilek Hakkani-Tür.
  2. Outline: Introduction; Ontology Induction [ASRU’13, SLT’14a]; Structure Learning [NAACL-HLT’15]; Surface Form Derivation [SLT’14b]; Semantic Decoding (submitted); Behavior Prediction (proposed); Conclusions & Timeline.
  4. A Popular Robot - Baymax. Big Hero 6 (video content owned and licensed by Disney Entertainment, Marvel Entertainment, LLC, etc.).
  5. A Popular Robot - Baymax. Baymax can sustain a good spoken dialogue and learn new knowledge to better understand and interact with people. The goal of the thesis is to automate the learning and understanding procedures in system development.
  6. Spoken Dialogue System (SDS). Spoken dialogue systems are intelligent agents that help users finish tasks more efficiently via speech interaction. They are being incorporated into various devices (smartphones, smart TVs, in-car navigation systems, etc.). Examples: Apple’s Siri (https://www.apple.com/ios/siri/), Microsoft’s Cortana (http://www.windowsphone.com/en-us/how-to/wp8/cortana/meet-cortana), Amazon’s Echo (http://www.amazon.com/oc/echo/), Samsung’s SMART TV (http://www.samsung.com/us/experience/smart-tv/), Google Now (https://www.google.com/landing/now/).
  7. Large Smart Device Population. The number of global smartphone users will surpass 2 billion in 2016. As of 2012, there are 1.1 billion automobiles on the earth. Device input is evolving towards speech, which is more natural and convenient.
  8. Knowledge Representation/Ontology. Traditional SDSs require manual annotations for specific domains to represent domain knowledge. Restaurant domain: slots restaurant, type, price, location; relation located_in. Movie domain: slots movie, year, genre, director; relations directed_by, released_in. Node: semantic concept/slot. Edge: relation between concepts.
  9. Utterance Semantic Representation. A spoken language understanding (SLU) component requires the domain ontology to decode utterances into semantic forms, which contain the core content (a set of slots and slot-fillers) of the utterance. Restaurant domain: “find a cheap taiwanese restaurant in oakland” → target=“restaurant”, price=“cheap”, type=“taiwanese”, location=“oakland”. Movie domain: “show me action movies directed by james cameron” → target=“movie”, genre=“action”, director=“james cameron”.
  10. Challenges for SDS. An SDS in a new domain requires 1) a hand-crafted domain ontology, 2) utterances labelled with semantic representations, and 3) an SLU component for mapping utterances into semantic representations. As spoken interactions increase, building domain ontologies and annotating utterances becomes so costly that the manual approach does not scale. The thesis goal is to enable an SDS to automatically learn this knowledge so that open-domain requests can be handled.
  11. Questions to Address. 1) Given unlabelled raw audio recordings, how can a system automatically induce and organize domain-specific concepts? 2) With the automatically acquired knowledge, how can a system understand individual utterances and user intents?
  12. Interaction Example. User: “find a cheap eating place for asian food”. Intelligent agent: “Cheap Asian eating places include Rose Tea Cafe, Little Asia, etc. What do you want to choose? I can help you go there.” Q: How does a dialogue system process this request?
  13. SDS Process – Available Domain Ontology. User: “find a cheap eating place for asian food”. Organized domain knowledge available to the intelligent agent: slots seeking, target, price, food; relations PREP_FOR, NN, AMOD.
  14. SDS Process – Available Domain Ontology. User: “find a cheap eating place for asian food”. Organized domain knowledge available to the intelligent agent: slots seeking, target, price, food; relations PREP_FOR, NN, AMOD. Thesis component: Ontology Induction (semantic slot).
  15. SDS Process – Available Domain Ontology. User: “find a cheap eating place for asian food”. Organized domain knowledge available to the intelligent agent: slots seeking, target, price, food; relations PREP_FOR, NN, AMOD. Thesis components: Ontology Induction (semantic slot), Structure Learning (inter-slot relation).
  16. SDS Process – Spoken Language Understanding (SLU). User: “find a cheap eating place for asian food”. Decoded semantic form: seeking=“find”, target=“eating place”, price=“cheap”, food=“asian food”.
  17. SDS Process – Spoken Language Understanding (SLU). User: “find a cheap eating place for asian food”. Decoded semantic form: seeking=“find”, target=“eating place”, price=“cheap”, food=“asian food”. Thesis component: Semantic Decoding.
  18. SDS Process – Dialogue Management (DM). User: “find a cheap eating place for asian food”. Query formed from the decoded slots: SELECT restaurant { restaurant.price=“cheap” restaurant.food=“asian food” }.
  19. SDS Process – Dialogue Management (DM). User: “find a cheap eating place for asian food”. Query formed from the decoded slots: SELECT restaurant { restaurant.price=“cheap” restaurant.food=“asian food” }. Thesis component: Surface Form Derivation (natural language).
  20. SDS Process – Dialogue Management (DM). User: “find a cheap eating place for asian food”. Query: SELECT restaurant { restaurant.price=“cheap” restaurant.food=“asian food” }. Results: Rose Tea Cafe, Little Asia, Lulu’s, Everyday Noodles, ... Predicted behavior: navigation.
  21. SDS Process – Dialogue Management (DM). User: “find a cheap eating place for asian food”. Query: SELECT restaurant { restaurant.price=“cheap” restaurant.food=“asian food” }. Results: Rose Tea Cafe, Little Asia, Lulu’s, Everyday Noodles, ... Predicted behavior: navigation. Thesis component: Behavior Prediction.
  22. SDS Process – Natural Language Generation (NLG). User: “find a cheap eating place for asian food”. Intelligent agent: “Cheap Asian eating places include Rose Tea Cafe, Little Asia, etc. What do you want to choose? I can help you go there.” (navigation)
  23. Thesis Goals. User: “find a cheap eating place for asian food”. Required domain-specific information: the domain ontology (slots seeking, target, price, food; relations PREP_FOR, NN, AMOD), the query SELECT restaurant { restaurant.price=“cheap” restaurant.food=“asian food” }, and the predicted behavior (navigation).
  24. Thesis Goals – 5 Goals. User: “find a cheap eating place for asian food”. 1. Ontology Induction (semantic slot). 2. Structure Learning (inter-slot relation). 3. Surface Form Derivation (natural language). 4. Semantic Decoding. 5. Behavior Prediction. Illustrated on the ontology (seeking, target, price, food; PREP_FOR, NN, AMOD), the query SELECT restaurant { restaurant.price=“cheap” restaurant.food=“asian food” }, and the predicted behavior (navigation).
  25. Thesis Goals – 5 Goals. User: “find a cheap eating place for asian food”. 1. Ontology Induction. 2. Structure Learning. 3. Surface Form Derivation. 4. Semantic Decoding. 5. Behavior Prediction.
  26. Thesis Goals – 5 Goals. User: “find a cheap eating place for asian food”. Knowledge Acquisition: 1. Ontology Induction, 2. Structure Learning, 3. Surface Form Derivation. SLU Modeling: 4. Semantic Decoding, 5. Behavior Prediction.
  27. SDS Architecture - Thesis Contributions. Components: ASR, SLU, DM, NLG, Domain. Thesis contributions: Knowledge Acquisition and SLU Modeling.
  28. Knowledge Acquisition. 1) Given unlabelled raw audio recordings, how can a system automatically induce and organize domain-specific concepts? Input: an unlabelled collection of restaurant-asking conversations. Output: organized domain knowledge (slots target, food, price, seeking, quantity; relations PREP_FOR, NN, AMOD). Components: 1. Ontology Induction, 2. Structure Learning, 3. Surface Form Derivation.
  29. SLU Modeling. 2) With the automatically acquired knowledge, how can a system understand individual utterances and user intents? The SLU component is built from the organized domain knowledge. Example: “can i have a cheap restaurant” → price=“cheap”, target=“restaurant”, behavior=navigation. Components: 4. Semantic Decoding, 5. Behavior Prediction.
  30. Outline: Introduction; Knowledge Acquisition: Ontology Induction [ASRU’13, SLT’14a], Structure Learning [NAACL-HLT’15], Surface Form Derivation [SLT’14b]; SLU Modeling: Semantic Decoding (submitted), Behavior Prediction (proposed); Conclusions & Timeline.
  31. Outline: Introduction; Ontology Induction [ASRU’13, SLT’14a] (100%); Structure Learning [NAACL-HLT’15]; Surface Form Derivation [SLT’14b]; Semantic Decoding (submitted); Behavior Prediction (proposed); Conclusions & Timeline.
  32. Ontology Induction [ASRU’13, SLT’14a]. Input: unlabelled user utterances (e.g. restaurant-asking conversations). Output: slots that are useful for a domain-specific SDS (e.g. target, food, price, seeking, quantity). Step 1: Frame-semantic parsing on all utterances for creating slot candidates. Step 2: Slot ranking model for differentiating domain-specific concepts from generic concepts. Step 3: Slot selection. References: Y.-N. Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013 (Best Student Paper Award). Y.-N. Chen et al., "Leveraging Frame Semantics and Distributional Semantics for Unsupervised Semantic Slot Induction in Spoken Dialogue Systems," in Proc. of SLT, 2014.
  33. Probabilistic Frame-Semantic Parsing. FrameNet [Baker et al., 1998]: a linguistically principled semantic resource based on frame-semantics theory. Example: in “low fat milk”, “milk” evokes the “food” frame and “low fat” fills the descriptor frame element. SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantic parser, trained on manually annotated FrameNet sentences. References: Baker et al., "The Berkeley FrameNet project," in Proc. of International Conference on Computational Linguistics, 1998. Das et al., "Frame-semantic parsing," Computational Linguistics, 2014.
  34. Step 1: Frame-Semantic Parsing for Utterances. Example: “can i have a cheap restaurant”. Frame capability: FT LU “can”, FE LU “i”. Frame expensiveness: FT LU “cheap”. Frame locale by use: FT/FE LU “restaurant”. Annotations on the frames: Good!, Good!, ?. Task: adapting generic frames to domain-specific settings for SDSs. FT: Frame Target; FE: Frame Element; LU: Lexical Unit.
  35. Step 2: Slot Ranking Model. Main idea: rank domain-specific concepts higher than generic semantic concepts. Example: “can i have a cheap restaurant”. Frame capability: FT LU “can”, FE LU “i”. Frame expensiveness: FT LU “cheap”. Frame locale by use: FT/FE LU “restaurant”. Each frame is a slot candidate and its lexical unit is the slot filler.
  36. Step 2: Slot Ranking Model. Rank a slot candidate s by integrating two scores: (1) slot frequency in the domain-specific conversation (slots with higher frequency → more important); (2) semantic coherence of slot fillers (domain-specific concepts → fewer topics).
  37. Step 2: Slot Ranking Model. h(s): semantic coherence of the slot-filler set corresponding to s, measured by cosine similarity between the fillers' word embeddings. Lower coherence in topic space: slot name quantity, with fillers a, one, all, three. Higher coherence in topic space: slot name expensiveness, with fillers cheap, expensive, inexpensive.
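The coherence score h(s) can be sketched as the average pairwise cosine similarity among a slot's filler embeddings. This is a minimal illustration, not the thesis implementation: the averaging scheme, the function names and the 2-d toy vectors are assumptions made here for clarity.

```python
import itertools
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def semantic_coherence(filler_vectors):
    """Average pairwise cosine similarity over a slot's filler embeddings.

    Assumption: a slot with fewer than two fillers gets score 0.0.
    """
    pairs = list(itertools.combinations(filler_vectors, 2))
    if not pairs:
        return 0.0
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

# Toy 2-d embeddings, made up for illustration only.
expensiveness_fillers = [[1.0, 0.1], [0.9, 0.2], [1.0, 0.0]]   # cheap, expensive, inexpensive
quantity_fillers = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]       # a, one, three

h_expensiveness = semantic_coherence(expensiveness_fillers)
h_quantity = semantic_coherence(quantity_fillers)
```

With these toy vectors the tightly clustered "expensiveness" fillers score higher than the scattered "quantity" fillers, which is the contrast the slide draws.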
  38. Step 3: Slot Selection. Rank all slot candidates by their importance scores, then output the candidates whose scores exceed a threshold. The slide shows two rankings under the labels “frequency” and “semantic coherence”. First ranking, with scores: locale_by_use 0.89, capability 0.68, food 0.64, expensiveness 0.49, quantity 0.30, seeking 0.22, ... Second ranking: locale_by_use, food, expensiveness, capability, quantity.
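Ranking and threshold-based selection can be sketched as below. The linear interpolation in `combine`, the weight `alpha` and the threshold value 0.4 are assumptions for illustration; the transcript does not give the exact combination formula or threshold. The scores are the ones listed on the slide.

```python
def combine(frequency, coherence, alpha=0.5):
    """Hypothetical importance score: linear interpolation of two normalized scores."""
    return (1.0 - alpha) * frequency + alpha * coherence

def select_slots(scores, threshold):
    """Rank slot candidates by score (descending) and keep those at or above the threshold."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, score in ranked if score >= threshold]

# Scores as listed on the slide.
slide_scores = {
    "locale_by_use": 0.89,
    "capability": 0.68,
    "food": 0.64,
    "expensiveness": 0.49,
    "quantity": 0.30,
    "seeking": 0.22,
}

selected = select_slots(slide_scores, threshold=0.4)  # threshold is an assumed value
```

A higher threshold trades recall of domain slots for precision; how the thesis tunes it is not stated in the transcript.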
  39. Experiments of Ontology Induction. Dataset: Cambridge University SLU corpus [Henderson, 2012]; restaurant recommendation in an in-car setting in Cambridge; WER = 37%; vocabulary size = 1868; 2,166 dialogues; 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type. The slide also shows the mapping table between induced and reference slots. Reference: Henderson et al., "Discriminative spoken language understanding using word confusion networks," in Proc. of SLT, 2012.
  40. Experiments of Ontology Induction. Slot induction evaluation: Average Precision (AP) and Area Under the Precision-Recall Curve (AUC) of the slot ranking model, measuring the quality of induced slots via the mapping table. Results (AP % / AUC %): Baseline MLE: ASR 56.7 / 54.7, Manual 53.0 / 50.8. MLE + Semantic Coherence: ASR 71.7 (+26.5%) / 70.4 (+28.7%), Manual 74.4 (+40.4%) / 73.6 (+44.9%). Induced slots reach about 70% AP and align well with human-annotated slots for SDS. Semantic relations help decide domain-specific knowledge.
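For readers unfamiliar with the metric, average precision over a ranked slot list can be sketched as follows. The function name, the ranked list and the set of relevant slots are hypothetical; the deck's actual mapping table between induced and reference slots is not reproduced in the transcript.

```python
def average_precision(ranked_slots, relevant):
    """Mean of precision@k taken at each rank k where a relevant slot appears."""
    hits = 0
    precision_sum = 0.0
    for rank, slot in enumerate(ranked_slots, start=1):
        if slot in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

# Hypothetical example: which induced slots map to a reference slot is assumed here.
ranked = ["locale_by_use", "capability", "food", "expensiveness"]
relevant = {"locale_by_use", "food", "expensiveness"}

ap = average_precision(ranked, relevant)
```

Here the relevant slots sit at ranks 1, 3 and 4, so the score averages the precisions 1/1, 2/3 and 3/4.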
  41. Outline: Introduction; Ontology Induction [ASRU’13, SLT’14a] (100%); Structure Learning [NAACL-HLT’15] (90%); Surface Form Derivation [SLT’14b]; Semantic Decoding (submitted); Behavior Prediction (proposed); Conclusions & Timeline.
  42. Structure Learning [NAACL-HLT’15]. Input: unlabelled user utterances (e.g. restaurant-asking conversations). Output: slots with relations, i.e. a domain-specific ontology (slots locale_by_use, food, expensiveness, seeking, relational_quantity, desiring; relations PREP_FOR, NN, AMOD, DOBJ). Step 1: Construct a graph to represent slots, words, and relations. Step 2: Compute scores for edges (relations) and nodes (slots). Step 3: Identify important relations connecting important slot pairs. Reference: Y.-N. Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
  43. Step 1: Knowledge Graph Construction. Syntactic dependency parsing on utterances: “can i have a cheap restaurant”, with dependencies ccomp, amod, dobj, nsubj, det and frames capability, expensiveness, locale_by_use. Word-based lexical knowledge graph: word nodes (w) can, have, i, a, cheap, restaurant. Slot-based semantic knowledge graph: slot nodes (s) capability, locale_by_use, expensiveness.
  44. Step 1: Knowledge Graph Construction. The edge between a node pair is weighted by relation importance. Word-based lexical knowledge graph: word nodes (w) can, have, i, a, cheap, restaurant. Slot-based semantic knowledge graph: slot nodes (s) capability, locale_by_use, expensiveness. How should the weights be decided so that they represent relation importance?
  45. Step 2: Weight Measurement – Slot/Word Embeddings Training. Dependency-based word embeddings are trained on the parsed utterance “can i have a cheap restaurant” (dependencies ccomp, amod, dobj, nsubj, det), e.g. can = [0.8 … 0.24], have = [0.3 … 0.21]. Dependency-based slot embeddings are trained on the same parse with words replaced by their slots (capability, expensiveness, locale_by_use), e.g. expensiveness = [0.12 … 0.7], capability = [0.3 … 0.6]. Reference: Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.
  46. Step 2: Weight Measurement. Compute edge weights to represent relation importance. Slot-to-slot relation L_ss: similarity between slot embeddings. Word-to-slot relation L_ws or L_sw: frequency of the slot-word pair. Word-to-word relation L_ww: similarity between word embeddings. Illustrated on a two-layer graph with word nodes w1 to w7 and slot nodes s1 to s3.
  47. Step 2: Slot Importance by Random Walk. Assumption: slots with more dependencies on more important slots should be more important. The random walk algorithm computes an importance score for each slot, starting from its original frequency score: scores are propagated from the word layer (L_ww, L_ws or L_sw) and then propagated within the slot layer (L_ss). Converged scores can be obtained by a closed-form solution. Code: https://github.com/yvchen/MRRW
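The idea of a random walk with a closed-form fixed point can be sketched on a single slot layer. This is a simplification under stated assumptions: the thesis model is double-layer (words and slots), whereas the update rule r = (1 - alpha) r0 + alpha L r, the value of `alpha`, the toy weights and all function names below are illustrative choices, not taken from the MRRW code.

```python
def normalize_columns(weights):
    """Column-normalize a square weight matrix so each column sums to 1."""
    n = len(weights)
    sums = [sum(weights[i][j] for i in range(n)) for j in range(n)]
    return [[weights[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]

def random_walk(L, r0, alpha=0.85, iters=200):
    """Iterate r <- (1 - alpha) * r0 + alpha * L r for a fixed number of steps."""
    n = len(r0)
    r = list(r0)
    for _ in range(iters):
        r = [(1 - alpha) * r0[i] + alpha * sum(L[i][j] * r[j] for j in range(n))
             for i in range(n)]
    return r

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(M[row][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for row in range(col + 1, n):
            factor = M[row][col] / M[col][col]
            for c in range(col, n + 1):
                M[row][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def closed_form(L, r0, alpha=0.85):
    """Fixed point of the iteration: r = (1 - alpha) * (I - alpha * L)^-1 r0."""
    n = len(r0)
    A = [[(1.0 if i == j else 0.0) - alpha * L[i][j] for j in range(n)]
         for i in range(n)]
    return solve(A, [(1 - alpha) * v for v in r0])

# Toy slot-to-slot similarities and frequency scores (made up for illustration).
L_ss = normalize_columns([[0.0, 0.8, 0.1], [0.8, 0.0, 0.3], [0.1, 0.3, 0.0]])
r0 = [0.5, 0.3, 0.2]
iterated = random_walk(L_ss, r0)
solved = closed_form(L_ss, r0)
```

Because the normalized matrix is column-stochastic and `alpha` is below 1, the iteration converges to the same vector that the linear solve returns, which is what makes a closed-form solution possible.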
  48. Step 3: Identify Domain Slots w/ Relations. The converged slot importance suggests whether a slot is important (Experiment 1). Rank slot pairs by summing their converged slot importance, e.g. pair (s1, s2) scores rs(1) + rs(2) and pair (s3, s4) scores rs(3) + rs(4). Select slot pairs with higher scores according to a threshold (Experiment 2). Resulting ontology: slots locale_by_use, food, expensiveness, seeking, relational_quantity, desiring; relations PREP_FOR, NN, AMOD, DOBJ.
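The pair-ranking step can be sketched as follows. For brevity this sketch scores every pair of slots, whereas the method described on the slide considers slot pairs connected by dependency relations; the importance values, the threshold and the function name are hypothetical.

```python
from itertools import combinations

def rank_slot_pairs(importance, threshold):
    """Score each slot pair by the sum of its slots' importance; keep pairs at or above the threshold."""
    scored = [((a, b), importance[a] + importance[b])
              for a, b in combinations(sorted(importance), 2)]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(pair, score) for pair, score in scored if score >= threshold]

# Hypothetical converged importance scores.
importance = {"locale_by_use": 0.5, "food": 0.3, "seeking": 0.2}
top_pairs = rank_slot_pairs(importance, threshold=0.6)
```

With these values the pairs containing locale_by_use survive the threshold and the (food, seeking) pair is dropped.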
  49. Experiments for Structure Learning. Experiment 1: Quality of Slot Importance. Dataset: Cambridge University SLU Corpus. Results (AP % / AUC %): Baseline MLE: ASR 56.7 / 54.7, Manual 53.0 / 50.8. Random Walk (MLE + Dependent Relations): ASR 69.0 (+21.8%) / 68.5 (+24.8%), Manual 75.2 (+41.8%) / 74.5 (+46.7%). Dependent relations help decide domain-specific knowledge.
  50. Experiments for Structure Learning. Experiment 2: Relation Discovery Evaluation. Discover inter-slot relations connecting important slot pairs. Learned ontology: slots locale_by_use, food, expensiveness, seeking, relational_quantity, desiring; relations PREP_FOR, NN, AMOD, DOBJ. The reference ontology with the most frequent syntactic dependencies: slots type, food, pricerange, task, area; relations DOBJ, AMOD, PREP_IN.
  51. Experiments for Structure Learning. Experiment 2: Relation Discovery Evaluation. Discover inter-slot relations connecting important slot pairs. Learned ontology: slots locale_by_use, food, expensiveness, seeking, relational_quantity, desiring; relations PREP_FOR, NN, AMOD, DOBJ. The reference ontology with the most frequent syntactic dependencies: slots type, food, pricerange, task, area; relations DOBJ, AMOD, PREP_IN. The automatically learned domain ontology aligns well with the reference one.
  52. Outline: Introduction; Ontology Induction [ASRU’13, SLT’14a] (100%); Structure Learning [NAACL-HLT’15] (90%); Surface Form Derivation [SLT’14b] (100%); Semantic Decoding (submitted); Behavior Prediction (proposed); Conclusions & Timeline.
  53. Surface Form Derivation [SLT’14b]. Input: a domain-specific organized ontology. Output: surface forms corresponding to entities in the ontology. Example ontology: movie, connected to director (directed_by), genre (genre) and char (casting). Example surface forms: char → character, actor, role, …; genre → action, comedy, …; director → director, filmmaker, …. Reference: Y.-N. Chen et al., "Deriving Local Relational Surface Forms from Dependency-Based Entity Embeddings for Unsupervised Spoken Language Understanding," in Proc. of SLT, 2014.
  54. Surface Form Derivation. Step 1: Mining query snippets corresponding to specific relations. Step 2: Training dependency-based entity embeddings. Step 3: Deriving surface forms based on embeddings. Pipeline (relational surface form derivation): knowledge graph and query snippets → entity embeddings → entity surface forms P_F(r | w) and entity syntactic contexts P_C(r | w). Examples: “filmmaker”, “director” → $director; “play” → $character.
  55. Step 1: Mining Query Snippets on Web. Query snippets include entity pairs connected with specific relations in the KG. Example for (Avatar, directed_by, James Cameron): “Avatar is a 2009 American epic science fiction film directed by James Cameron.” Example for (The Dark Knight, directed_by, Christopher Nolan): “The Dark Knight is a 2008 superhero film directed and produced by Christopher Nolan.”
  56. Step 2: Training Entity Embeddings. Dependency parsing for training dependency-based embeddings. Example: “Avatar is a 2009 American epic science fiction film directed by James Cameron”, with Avatar replaced by $movie and James Cameron by $director; dependency labels shown on the slide: nsub, num, det, cop, nn, vmod, prop_by, prop, pobj. Resulting embeddings, e.g. $movie = [0.8 … 0.24], is = [0.3 … 0.21], film = [0.12 … 0.7]. Reference: Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.
  57. Step 3: Deriving Surface Forms. Entity surface forms: learn the surface forms corresponding to entities (words with similar contexts) by taking the most similar word vectors for each entity embedding. Examples: $char: “character”, “role”, “who”; $director: “director”, “filmmaker”; $genre: “action”, “fiction”. Entity syntactic contexts: learn the important contexts of entities (words frequently occurring together) by taking the most similar context vectors for each entity embedding. Examples: $char: “played”; $director: “directed”.
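Taking the most similar word vectors for an entity embedding amounts to a nearest-neighbor lookup by cosine similarity, sketched below. The 2-d vectors, the vocabulary and the function names are invented for illustration; real dependency-based embeddings would come from a trained model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def derive_surface_forms(entity_vector, word_vectors, top_k=2):
    """Return the top_k words whose vectors are closest to the entity embedding."""
    ranked = sorted(word_vectors,
                    key=lambda word: cosine(entity_vector, word_vectors[word]),
                    reverse=True)
    return ranked[:top_k]

# Toy embeddings (hypothetical values).
director_entity = [0.9, 0.1]            # stands in for the $director embedding
word_vectors = {
    "director": [0.8, 0.2],
    "filmmaker": [0.7, 0.1],
    "comedy": [0.1, 0.9],
}

forms = derive_surface_forms(director_entity, word_vectors)
```

The same lookup over context vectors instead of word vectors would give the entity syntactic contexts.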
  58. Experiments of Surface Form Derivation. Knowledge base: Freebase; 670K entities; 78 entity types (movie names, actors, etc.). Derived words per entity tag: $character: character, role, who, girl, she, he, officier. $director: director, dir, filmmaker. $genre: comedy, drama, fantasy, cartoon, horror, sci. $language: language, spanish, english, german. $producer: producer, filmmaker, screenwriter. The dependency-based embeddings are able to derive reasonable surface forms, which may benefit semantic understanding.
  59. Integrated with Background Knowledge. Input utterance: “find me some films directed by james cameron”. Background knowledge: relation inference from gazetteers, using the knowledge graph and an entity dictionary to produce P_E(r | w). Local relational surface forms: relational surface form derivation from Bing query snippets and the knowledge graph, using entity embeddings to produce entity surface forms P_F(r | w) and entity syntactic contexts P_C(r | w). Probabilistic enrichment combines these into the final results R_u(r). Reference: D. Hakkani-Tür et al., “Probabilistic enrichment of knowledge graph entities for relation detection in conversational understanding,” in Proc. of Interspeech, 2014.
  60. Experiments of Surface Form Derivation. Relation detection data (NL-SPARQL): a dialog system challenge set for converting natural language to structured queries [Hakkani-Tür, 2014]; crowd-sourced utterances (3,338 for self-training, 1,084 for testing). Evaluation metric: micro F-measure (%). Results: Baseline (Gazetteer): 38.9. Gazetteer + Entity Surface Form + Entity Context: 43.3 (+11.4%). Derived entity surface forms benefit the SLU performance.
  61. Outline: Introduction; Knowledge Acquisition (unlabelled collection of restaurant-asking conversations → organized domain knowledge): Ontology Induction [ASRU’13, SLT’14a], Structure Learning [NAACL-HLT’15], Surface Form Derivation [SLT’14b]; SLU Modeling: Semantic Decoding (submitted), Behavior Prediction (proposed); Conclusions & Timeline.
  62. Outline: Introduction; Ontology Induction [ASRU’13, SLT’14a] (100%); Structure Learning [NAACL-HLT’15] (90%); Surface Form Derivation [SLT’14b] (100%); Semantic Decoding (submitted) (90%); Behavior Prediction (proposed); Conclusions & Timeline.
  63. Semantic Decoding. Input: user utterances and automatically learned knowledge. Output: the semantic concepts included in each individual utterance. Framework (SLU modeling by matrix factorization): frame-semantic parsing and ontology induction over the unlabeled collection give the feature model (Fw, Fs); structure learning over the lexical KG and the semantic KG gives the knowledge graph propagation model (Rw from the word relation model, Rs from the slot relation model). The resulting SLU model maps “can I have a cheap restaurant” to its semantic representation. Reference: Y.-N. Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," submitted.
  64. Matrix Factorization (MF): Feature Model. [Figure: utterance-by-feature matrix built from ontology induction (Fw, Fs). Rows: training utterance 1 “i would like a cheap restaurant”, training utterance 2 “find a restaurant with chinese food”, and the test utterance “show me a list of cheap restaurants”. Columns: word observations (cheap, restaurant, food) and slot candidates from slot induction (expensiveness, locale_by_use, food). Observed cells are 1; estimated cells hold probabilities such as .97, .90, .95, .85, .93, .92, .98, .05, .05 and capture hidden semantics.]
  65. Matrix Factorization (MF): Knowledge Graph Propagation Model. Reasoning with matrix factorization: the feature matrix (word observations and slot candidates for train and test utterances) is multiplied by the word relation matrix (word relation model) and the slot relation matrix (slot relation model). The MF method completes a partially missing matrix based on latent semantics by decomposing it into a product of two matrices.
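Completing a partially observed matrix by factorizing it into two low-rank matrices can be sketched with plain stochastic gradient descent on squared error. This is a generic illustration, not the submitted model: the deck trains with a ranking objective rather than squared error, and the explicit 0 entries, hyperparameters and function names below are assumptions.

```python
import random

def factorize(observed, n_rows, n_cols, rank=2, lr=0.05, epochs=500, seed=0):
    """Fit P (rows x rank) and Q (cols x rank) so that P[i] . Q[j] approximates observed[(i, j)]."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    for _ in range(epochs):
        for (i, j), value in observed.items():
            err = value - sum(p * q for p, q in zip(P[i], Q[j]))
            for k in range(rank):
                p_ik, q_jk = P[i][k], Q[j][k]
                P[i][k] += lr * err * q_jk   # gradient step on the row factor
                Q[j][k] += lr * err * p_ik   # gradient step on the column factor
    return P, Q

def predict(P, Q, i, j):
    """Estimated value of cell (i, j), observed or not."""
    return sum(p * q for p, q in zip(P[i], Q[j]))

def squared_error(P, Q, observed):
    return sum((value - predict(P, Q, i, j)) ** 2 for (i, j), value in observed.items())

# Columns: 0 cheap, 1 restaurant, 2 food (word), 3 expensiveness, 4 locale_by_use.
# Rows 0 and 1 are training utterances; row 2 is a test utterance with words only.
observed = {
    (0, 0): 1, (0, 1): 1, (0, 2): 0, (0, 3): 1, (0, 4): 1,
    (1, 0): 0, (1, 1): 1, (1, 2): 1, (1, 3): 0, (1, 4): 1,
    (2, 0): 1, (2, 1): 1,
}

P0, Q0 = factorize(observed, 3, 5, epochs=0)   # untrained factors
P, Q = factorize(observed, 3, 5)               # trained factors
slot_estimate = predict(P, Q, 2, 3)            # estimated "expensiveness" for the test utterance
```

The unobserved slot cells of the test row are then read off the product of the learned factors, which is the "completion" the slide refers to.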
  66. Bayesian Personalized Ranking for MF. Model implicit feedback: do not treat unobserved facts as negative samples (true or false); instead, give observed facts higher scores than unobserved facts. [Equation: objective shown as an image on the slide; the surviving symbols are an utterance u, an observed fact f+, an unobserved fact f− and an input x.] The objective is to learn a set of well-ranked semantic slots per utterance.
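For reference, the standard Bayesian Personalized Ranking criterion can be written with the symbols that survive on the slide. This is the textbook form of BPR, given here as an assumption; the slide's exact equation is not recoverable from the transcript and may differ in notation or regularization.

```latex
\max_{\theta} \;
\sum_{u} \; \sum_{f^{+} \in \mathcal{O}_u} \; \sum_{f^{-} \notin \mathcal{O}_u}
\ln \sigma\!\left( \hat{x}_{u,f^{+}} - \hat{x}_{u,f^{-}} \right),
\qquad
\sigma(z) = \frac{1}{1 + e^{-z}}
```

Here \mathcal{O}_u denotes the set of facts observed for utterance u, \hat{x}_{u,f} is the model score of fact f for u (for MF, a dot product of latent vectors), and \theta collects the model parameters. Maximizing the sum pushes each observed fact above each unobserved one without labeling the unobserved facts as false.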
  67. Experiments of Semantic Decoding. Experiment 1: Quality of Semantics Estimation. Dataset: Cambridge University SLU Corpus. Metric: Mean Average Precision (MAP) of all estimated slot probabilities for each utterance. Results (MAP %, ASR / Manual): Baselines: Logistic Regression 34.0 / 38.8; Random 22.5 / 25.1; Majority Class 32.9 / 38.4. MF approach (modeling implicit semantics): Feature Model 37.6 / 45.3; Feature Model + Knowledge Graph Propagation 43.5 (+27.9%) / 53.4 (+37.6%). The MF approach effectively models hidden semantics to improve SLU. Adding a knowledge graph propagation model further improves the results.
  68. Experiments of Semantic Decoding. Experiment 2: Effectiveness of Relations. Dataset: Cambridge University SLU Corpus. Metric: Mean Average Precision (MAP) of all estimated slot probabilities for each utterance. Results (MAP %, ASR / Manual): Feature Model 37.6 / 45.3. Feature + Knowledge Graph Propagation: Semantic Relation 41.4 (+10.1%) / 51.6 (+13.9%); Dependent Relation 41.6 (+10.6%) / 49.0 (+8.2%); Both 43.5 (+15.7%) / 53.4 (+17.9%). Both semantic and dependent relations are useful to infer hidden semantics. Combining both types of relations further improves the performance.
  69. Low- and High-Level Understanding. Semantic concepts decoded for individual utterances do not capture high-level semantics (user intents). The follow-up behaviors are observable and usually correspond to user intents. Example 1: “can i have a cheap restaurant” → SLU component → price=“cheap”, target=“restaurant”; behavior=navigation. Example 2: “i plan to dine in legume tonight” → SLU component → restaurant=“legume”, time=“tonight”; behavior=reservation.
  70. 70. Outline Introduction Ontology Induction [ASRU’13, SLT’14a] Structure Learning [NAACL-HLT’15] Surface Form Derivation [SLT’14b] Semantic Decoding (submitted) Behavior Prediction (proposed) Conclusions & Timeline
Progress markers: 100%, 90%, 100%, 90%, 10%
  71. 71. Behavior Prediction
SLU Modeling for Behavior Prediction: Behavior Identification over an Unlabeled Collection feeds an SLU Model composed of a Feature Model (Ff, Fb) and a Relation Model (Rf: Feature Relation Model, Rb: Behavior Relation Model).
Predicting with Matrix Factorization: the matrix has Feature Observation columns (play, song, listen, …) and Behavior columns (pandora, youtube, maps).
Train: Utterance 1 “play lady gaga’s song bad romance”
Test: Utterance 2 “i’d like to listen to lady gaga’s bad romance”
Observed cells are marked 1; estimated scores shown in the figure: .97, .90, .05, .85
“play lady gaga’s bad romance” → SLU Model → Predicted Behavior
  72. 72. Mobile Application Domain
Data: 195 utterances for making requests about launching an app
Behavior Identification
◦ popular domains in Google Play
Feature Relation Model
◦ Word similarity, etc.
Behavior Relation Model
◦ Relation between different apps
Examples:
“please dial a phone call to alex” → skype, hangout, etc.
“I’d like navigation instruction from my home to cmu” → Maps
  73. 73. Insurance-Related Domain
Data: 52 conversations between customers and agents from MetLife call center; 2,205 automatically segmented utterances
Behavior Identification
◦ call_back, verify_id, ask_info
Feature Relation Model
◦ Word similarity, etc.
Behavior Relation Model
◦ transition probability, etc.
Example (behavior label in brackets):
A: Thanks for calling MetLife this is XXX how may help you today [greeting]
C: I wish to change the beneficiaries on my three life insurance policies and I want to change from two different people so I need 3 sets of forms [ask_action]
A: Currently your name [verify_id]
A: now will need to send a format I'm can mail the form fax it email it orders on the website [ask_detail]
C: please mail the hard copies to me [ask_action]
  74. 74. Outline Introduction Ontology Induction [ASRU’13, SLT’14a] Structure Learning [NAACL-HLT’15] Surface Form Derivation [SLT’14b] Semantic Decoding (submitted) Behavior Prediction (proposed) Conclusions & Timeline
SLU Modeling: Organized Domain Knowledge and “can i have a cheap restaurant” → SLU Model → price=“cheap”, target=“restaurant”, behavior=navigation
  75. 75. Outline Introduction Ontology Induction [ASRU’13, SLT’14a] Structure Learning [NAACL-HLT’15] Surface Form Derivation [SLT’14b] Semantic Decoding (submitted) Behavior Prediction (proposed) Conclusions & Timeline
Progress markers: 100%, 90%, 100%, 90%, 10%
  76. 76. Summary
Knowledge Acquisition
1. Ontology Induction → Semantic relations are useful
2. Structure Learning → Dependent relations are useful
3. Surface Form Derivation → Surface forms benefit understanding
SLU Modeling
4. Semantic Decoding → The MF approach builds an SLU model to decode semantics
5. Behavior Prediction → The MF approach builds an SLU model to predict behaviors
  77. 77. Conclusions and Future Work
The knowledge acquisition procedure enables systems to automatically learn open domain knowledge and produce domain-specific ontologies.
The MF technique for SLU modeling provides a principled model that unifies the automatically acquired knowledge and allows systems to consider implicit semantics for better understanding.
◦ Better semantic representations for individual utterances
◦ Better follow-up behavior prediction
The thesis work shows the feasibility and the potential of improving generalization, maintenance, efficiency, and scalability of SDSs.
  78. 78. Timeline
April 2015 – finish identifying important behaviors for insurance data
May 2015 – finish annotating behavior labels for insurance data
June 2015 – implement behavior prediction for both datasets
July 2015 – submit a conference paper about behavior prediction (ASRU)
August 2015 – submit a journal paper about open domain SLU
September 2015 – finish thesis writing
October 2015 – thesis defense
December 2015 – finish thesis revision
  79. 79. Q & A THANKS FOR YOUR ATTENTION! THANKS TO MY COMMITTEE MEMBERS FOR THEIR HELPFUL FEEDBACK.
  80. 80. Word Embeddings
Training Process
◦ Each word w is associated with a vector
◦ The contexts within the window size c are considered as the training data D
◦ Objective function: [equation not captured in the transcript]
CBOW Model (figure): INPUT wt-2, wt-1, wt+1, wt+2 → PROJECTION (SUM) → OUTPUT wt
Mikolov et al., "Efficient Estimation of Word Representations in Vector Space," in Proc. of ICLR, 2013.
Mikolov et al., "Distributed Representations of Words and Phrases and their Compositionality," in Proc. of NIPS, 2013.
Mikolov et al., "Linguistic Regularities in Continuous Space Word Representations," in Proc. of NAACL-HLT, 2013.
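As a hedged sketch, the CBOW objective from the cited Mikolov et al. papers is usually written as the average log-probability of each word given its surrounding window. The corpus length $T$ is an assumption of this notation; $c$ is the window size named on the slide, and the slide's exact form may differ.

```latex
\frac{1}{T} \sum_{t=1}^{T}
  \log p\left(w_t \mid w_{t-c}, \ldots, w_{t-1}, w_{t+1}, \ldots, w_{t+c}\right)
```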
  81. 81. Dependency-Based Embeddings
Word & Context Extraction
Dependency parse (figure): “can i have a cheap restaurant” with arcs ccomp, nsubj, dobj, det, amod

Word         Contexts
can          have/ccomp
i            have/nsubj-1
have         can/ccomp-1, i/nsubj, restaurant/dobj
a            restaurant/det-1
cheap        restaurant/amod-1
restaurant   have/dobj-1, a/det, cheap/amod

Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.
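The extraction rule behind the table can be sketched in a few lines: for each dependency arc, the head gets the context `modifier/relation` and the modifier gets the inverse context `head/relation-1`. The function name `dependency_contexts` and the triple format are assumptions made for this illustration; the arcs are read off the slide's example.

```python
from collections import defaultdict


def dependency_contexts(arcs):
    """Map each word to its dependency-based contexts.

    `arcs` is an iterable of (head, relation, modifier) triples.
    The head receives "modifier/relation"; the modifier receives
    the inverse context "head/relation-1".
    """
    contexts = defaultdict(list)
    for head, rel, mod in arcs:
        contexts[head].append(f"{mod}/{rel}")
        contexts[mod].append(f"{head}/{rel}-1")
    return dict(contexts)


# Arcs for "can i have a cheap restaurant", as listed on the slide.
arcs = [
    ("can", "ccomp", "have"),
    ("have", "nsubj", "i"),
    ("have", "dobj", "restaurant"),
    ("restaurant", "det", "a"),
    ("restaurant", "amod", "cheap"),
]

ctx = dependency_contexts(arcs)
for word in ["can", "i", "have", "a", "cheap", "restaurant"]:
    print(word, ctx[word])
```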
  82. 82. Dependency-Based Embeddings
Training Process
◦ Each word w is associated with a vector vw and each context c is represented as a vector vc
◦ Learn vector representations for both words and contexts such that the dot product vw · vc associated with good word-context pairs belonging to the training data D is maximized
◦ Objective function: [equation not captured in the transcript]
Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.
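As a hedged sketch, the positive-pair part of the objective in the cited Levy and Goldberg paper can be written with the slide's symbols $v_w$, $v_c$ and $D$. This is an assumption about which form the slide showed; in practice the cited paper also adds a term over sampled negative pairs, omitted here.

```latex
\arg\max_{v_w,\, v_c} \; \sum_{(w, c) \in D}
  \log \frac{1}{1 + \exp\left(-\, v_c \cdot v_w\right)}
```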
  83. 83. Evaluation Metrics
◦ Slot Induction Evaluation: Average Precision (AP) and Area Under the Precision-Recall Curve (AUC) of the slot ranking model to measure the quality of induced slots via the mapping table

Rank  Slot             Score   Precision
1.    locale_by_use    0.89    1
2.    capability       0.68    0
3.    food             0.64    2/3
4.    expensiveness    0.49    3/4
5.    quantity         0.30    0

AP = 80.56%
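The AP value on this slide can be reproduced by averaging the precision at each rank where a correct slot appears. In the sketch below, the 0/1 relevance list is inferred from the precision column (ranks 1, 3 and 4 are correct), and the function names are hypothetical helpers. MAP, as used in the semantic decoding experiments, is the mean of AP over utterances.

```python
def average_precision(relevance):
    """Average precision of a ranked list of 0/1 relevance labels."""
    hits = 0
    precisions = []
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)  # precision at this correct rank
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(relevance_lists):
    """Mean of the per-list average precision values."""
    return sum(average_precision(r) for r in relevance_lists) / len(relevance_lists)


# Ranks 1, 3 and 4 are correct in the slide's example.
relevance = [1, 0, 1, 1, 0]
ap = average_precision(relevance)
print(f"AP = {ap:.2%}")  # AP = 80.56%
```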
  84. 84. Slot Induction on ASR & Manual Results
The slot importance:

Rank  ASR                     Manual
1.    locale_by_use  0.88     locale_by_use  0.89
2.    expensiveness  0.78     capability     0.68
3.    food           0.64     food           0.64
4.    capability     0.59     expensiveness  0.49
5.    quantity       0.44     quantity       0.30

Figure label: frequency
Users tend to speak important information more clearly, so misrecognition of less important slots may slightly benefit the slot induction performance.
  85. 85. Slot Mapping Table
Create the mapping if slot fillers of the induced slot are included by the reference slot.
[Figure: slot fillers (asian, japan, beer, noodle) collected from utterances u1, u2, …, uk, …, un for the slots origin and food; labels: induced slots, reference slot]
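A minimal sketch of the mapping rule, assuming "included" means that every filler of the induced slot also appears among the fillers of the reference slot (a looser overlap test is another possible reading). The function name and the filler sets are hypothetical and only loosely modeled on the words visible in the figure.

```python
def build_slot_mapping(induced, reference):
    """Map each induced slot to the reference slots that include all of its fillers."""
    mapping = {}
    for ind_slot, ind_fillers in induced.items():
        mapping[ind_slot] = [
            ref_slot
            for ref_slot, ref_fillers in reference.items()
            if ind_fillers and ind_fillers <= ref_fillers  # non-empty subset test
        ]
    return mapping


# Hypothetical filler sets, not taken from the thesis data.
induced = {
    "origin": {"asian", "japan"},
    "food": {"asian", "beer", "japan", "noodle"},
    "quantity": {"two"},
}
reference = {
    "food": {"asian", "beer", "japan", "noodle"},
    "area": {"centre", "north"},
}

mapping = build_slot_mapping(induced, reference)
print(mapping)
```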
  86. 86. Random Walk Algorithm
The converged algorithm satisfies [equation not captured in the transcript].
The derived closed form solution is the dominant eigenvector of M.
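A dominant eigenvector can be approximated with power iteration, which is a hedged way to illustrate the closed-form statement on this slide. The matrix below is a hypothetical column-stochastic example, and the fixed point r = M r is an assumed reading of the convergence condition, since the slide's equation and its construction of M are not preserved.

```python
import numpy as np


def dominant_eigenvector(M, tol=1e-10, max_iter=1000):
    """Power iteration: apply M and renormalize until the vector stops changing."""
    r = np.full(M.shape[0], 1.0 / M.shape[0])  # start from uniform scores
    for _ in range(max_iter):
        r_next = M @ r
        r_next /= r_next.sum()  # keep the scores summing to 1
        if np.abs(r_next - r).max() < tol:
            return r_next
        r = r_next
    return r


# Hypothetical transition matrix; each column sums to 1.
M = np.array([
    [0.6, 0.1, 0.3],
    [0.3, 0.7, 0.2],
    [0.1, 0.2, 0.5],
])

r = dominant_eigenvector(M)
print(r)
```

At convergence, applying M again leaves the scores essentially unchanged, which is the property the slide describes.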
  87. 87. SEMAFOR Performance
The SEMAFOR evaluation
  88. 88. Matrix Factorization
The decomposed matrices represent latent semantics for utterances and words/slots respectively.
The product of two matrices fills the probability of hidden semantics.
Dimensions: $|U| \times (|W| + |S|) \approx (|U| \times d) \times (d \times (|W| + |S|))$
[Figure: a matrix with Train and Test utterance rows, Word Observation columns (cheap, restaurant, food) and Slot Candidate columns (expensiveness, locale_by_use, food); observed cells are marked 1 and the remaining cells hold estimated scores such as .97, .90, .95, .85, .93, .92, .98, .05]
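The low-rank idea can be illustrated with a truncated SVD, which is swapped in here for simplicity and is not the BPR-trained factorization described in the thesis. The binary matrix and the latent dimension `d` are hypothetical. The product of the two factors assigns a score to every cell, including cells that were not observed, although these scores are not calibrated probabilities.

```python
import numpy as np

# Hypothetical binary observation matrix: rows are utterances,
# columns are word observations followed by slot candidates.
M = np.array([
    [1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0],
    [0, 1, 1, 0, 1],
    [1, 1, 0, 0, 0],  # a "test" row with only word observations
], dtype=float)

d = 2  # number of latent dimensions (assumed)

# Truncated SVD yields the best rank-d approximation in the least-squares sense.
U, s, Vt = np.linalg.svd(M, full_matrices=False)
utterance_factors = U[:, :d] * s[:d]   # shape (n_utterances, d)
item_factors = Vt[:d, :]               # shape (d, n_words + n_slots)

scores = utterance_factors @ item_factors  # dense estimate for every cell
print(np.round(scores, 2))
```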
  89. 89. Cambridge University SLU Corpus
“hi i'd like a restaurant in the cheap price range in the centre part of town” → type=restaurant, pricerange=cheap, area=centre
“um i'd like chinese food please” → food=chinese
“how much is the main cost” → pricerange
“okay and uh what's the address” → addr
“great uh and if i wanted to uh go to an italian restaurant instead” → food=italian, type=restaurant
“italian please” → food=italian
“what's the address” → addr
“i would like a cheap chinese restaurant” → pricerange=cheap, food=chinese, type=restaurant
“something in the riverside” → area=centre
  90. 90. Relation Detection Data: NL-SPARQL
The SPARQL queries corresponding to the natural language utterances are converted into semantic relations as follows:
“how many movies has alfred hitchcock directed?” → movie.directed_by; movie.type; director.name
“who are the actors in captain america” → movie.starring.actor; movie.name
