Advertisement

Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

Assistant Professor at National Taiwan University
Dec. 4, 2015
Advertisement

More Related Content

Similar to Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems(20)

Advertisement

Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

  1. Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems YUN-NUNG (VIVIAN) CHEN H T T P : / / V I V I A N C H E N . I D V . T W 2015 DECEMBER COMMITTEE: ALEXANDER I. RUDNICKY (CHAIR) ANATOLE GERSHMAN (CO-CHAIR) ALAN W BLACK DILEK HAKKANI-TÜR ( ) 1UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS
  2. Outline Introduction Ontology Induction [ ASRU’13, SLT’14a] Structure Learning [NAACL-HLT’15] Surface Form Derivation [SLT’14b] Semantic Decoding [ACL-IJCNLP’15] Intent Prediction [SLT’14c, ICMI’15] SLU in Human-Human Conversations [ASRU’15] Conclusions & Future Work UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 2 Knowledge Acquisition SLU Modeling
  3. Outline Introduction Ontology Induction [ ASRU’13, SLT’14a] Structure Learning [NAACL-HLT’15] Surface Form Derivation [SLT’14b] Semantic Decoding [ACL-IJCNLP’15] Intent Prediction [SLT’14c, ICMI’15] SLU in Human-Human Conversations [ASRU’15] Conclusions & Future Work UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 3
  4. Apple Siri (2011) Google Now (2012) Microsoft Cortana (2014) Amazon Alexa/Echo (2014) https://www.apple.com/ios/siri/ https://www.google.com/landing/now/ http://www.windowsphone.com/en-us/how-to/wp8/cortana/meet-cortana http://www.amazon.com/oc/echo/ Facebook M (2015) Intelligent Assistants UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 4
  5. Large Smart Device Population Global Digital Statistics (2015 January) Global Population 7.21B Active Internet Users 3.01B Active Social Media Accounts 2.08B Active Unique Mobile Users 3.65B UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 5 The more natural and convenient input of the devices evolves towards speech.
  6. Spoken dialogue systems are intelligent agents that are able to help users finish tasks more efficiently via spoken interactions. Spoken dialogue systems are being incorporated into various devices (smart-phones, smart TVs, in-car navigating system, etc). Good SDSs assist users to organize and access information conveniently. Spoken Dialogue System (SDS) UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 6 JARVIS – Iron Man’s Personal Assistant Baymax – Personal Healthcare Companion
  7. ASR: Automatic Speech Recognition SLU: Spoken Language Understanding DM: Dialogue Management NLG: Natural Language Generation SDS Architecture DomainDMASR SLU NLG UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 7 current bottleneck
  8. Knowledge Representation/Ontology Traditional SDSs require manual annotations for specific domains to represent domain knowledge. Restaurant Domain Movie Domain restaurant typeprice location movie yeargenre director Node: semantic concept/slot Edge: relation between concepts located_in directed_by released_in UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 8
  9. Utterance Semantic Representation An SLU model requires a domain ontology to decode utterances into semantic forms, which contain core content (a set of slots and slot- fillers) of the utterance. find a cheap taiwanese restaurant in oakland show me action movies directed by james cameron target=“restaurant”, price=“cheap”, type=“taiwanese”, location=“oakland” target=“movie”, genre=“action”, director=“james cameron” Restaurant Domain Movie Domain restaurant typeprice location movie yeargenre director UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 9
  10. Challenges for SDS An SDS in a new domain requires 1) A hand-crafted domain ontology 2) Utterances labelled with semantic representations 3) An SLU component for mapping utterances into semantic representations Manual work results in high cost, long duration and poor scalability of system development. The goal is to enable an SDS to 1) automatically infer domain knowledge and then to 2) create the data for SLU modeling in order to handle the open-domain requests. UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 10 seeking=“find” target=“eating place” price=“cheap” food=“asian food” find a cheap eating place for asian food   fully unsupervised Prior Focus
  11. Questions to Address 1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts? 2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents? UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 11
  12. Interaction Example find a cheap eating place for asian food User Intelligent Agent Q: How does a dialogue system process this request? Cheap Asian eating places include Rose Tea Cafe, Little Asia, etc. What do you want to choose? I can help you go there. (navigation) UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 12
  13. Process Pipeline target foodprice AMOD NN seeking PREP_FOR SELECT restaurant { restaurant.price=“cheap” restaurant.food=“asian food” } Predicted intent: navigation find a cheap eating place for asian food User Required Domain-Specific Information UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 13
  14. Thesis Contributions target foodprice AMOD NN seeking PREP_FOR SELECT restaurant { restaurant.price=“cheap” restaurant.food=“asian food” } Predicted intent: navigation find a cheap eating place for asian food User Ontology Induction (semantic slot) UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 14
  15. Thesis Contributions target foodprice AMOD NN seeking PREP_FOR SELECT restaurant { restaurant.price=“cheap” restaurant.food=“asian food” } Predicted intent: navigation find a cheap eating place for asian food User Ontology Induction Structure Learning (inter-slot relation) (semantic slot) UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 15
  16. Thesis Contributions target foodprice AMOD NN seeking PREP_FOR SELECT restaurant { restaurant.price=“cheap” restaurant.food=“asian food” } Predicted intent: navigation find a cheap eating place for asian food User Ontology Induction Structure Learning Surface Form Derivation (natural language) (inter-slot relation) (semantic slot) UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 16
  17. Thesis Contributions target foodprice AMOD NN seeking PREP_FOR SELECT restaurant { restaurant.price=“cheap” restaurant.food=“asian food” } Predicted intent: navigation find a cheap eating place for asian food User Ontology Induction Structure Learning Surface Form Derivation Semantic Decoding (natural language) (inter-slot relation) (semantic slot) UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 17
  18. Thesis Contributions target foodprice AMOD NN seeking PREP_FOR SELECT restaurant { restaurant.price=“cheap” restaurant.food=“asian food” } Predicted intent: navigation find a cheap eating place for asian food User Ontology Induction Structure Learning Surface Form Derivation Semantic Decoding Intent Prediction (natural language) (inter-slot relation) (semantic slot) UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 18
  19. Thesis Contributions find a cheap eating place for asian food User Ontology Induction Structure Learning Surface Form Derivation Semantic Decoding Intent Prediction UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 19
  20.  Ontology Induction  Structure Learning  Surface Form Derivation  Semantic Decoding  Intent Prediction Thesis Contributions find a cheap eating place for asian food User Knowledge Acquisition SLU Modeling UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 20
  21. Knowledge Acquisition 1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts? Restaurant Asking Conversations target food price seeking quantity PREP_FOR PREP_FOR NN AMOD AMOD AMOD Organized Domain Knowledge Unlabelled Collection Knowledge Acquisition Knowledge Acquisition  Ontology Induction  Structure Learning  Surface Form Derivation UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 21
  22. SLU Modeling 2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents? Organized Domain Knowledge price=“cheap” target=“restaurant” intent=navigation SLU Modeling SLU Component “can i have a cheap restaurant” SLU Modeling  Semantic Decoding  Intent Prediction UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 22
  23. SDS Architecture – Contributions DomainDMASR SLU NLG Knowledge Acquisition SLU Modeling UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 23 current bottleneck
  24. Outline Introduction Ontology Induction [ ASRU’13, SLT’14a] Structure Learning [NAACL-HLT’15] Surface Form Derivation [SLT’14b] Semantic Decoding [ACL-IJCNLP’15] Intent Prediction [SLT’14c, ICMI’15] SLU in Human-Human Conversations [ASRU’15] Conclusions & Future Work Knowledge Acquisition UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 24 SLU Modeling
  25. Outline Introduction Ontology Induction [ ASRU’13, SLT’14a] Structure Learning [NAACL-HLT’15] Surface Form Derivation [SLT’14b] Semantic Decoding [ACL-IJCNLP’15] Intent Prediction [SLT’14c, ICMI’15] SLU in Human-Human Conversations [ASRU’15] Conclusions & Future Work UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 25
  26. Ontology Induction [ASRU’13, SLT’14a] Input: Unlabelled user utterances Output: Slots that are useful for a domain-specific SDS Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award) Chen et al., "Leveraging Frame Semantics and Distributional Semantics for Unsupervised Semantic Slot Induction in Spoken Dialogue Systems," in Proc. of SLT, 2014. Restaurant Asking Conversations target food price seeking quantity Domain-Specific Slot Unlabelled Collection UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 26 Idea: select a subset of FrameNet-based slots for domain-specific SDS Best Student Paper Award
  27. Step 1: Frame-Semantic Parsing (Das et al., 2014) can i have a cheap restaurant Frame: capability Frame: expensiveness Frame: locale by use Task: differentiate domain-specific frames from generic frames for SDSs Good! Good! ? UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 27 Das et al., " Frame-semantic parsing," in Proc. of Computational Linguistics, 2014. slot candidate
  28. Step 2: Slot Ranking Model Compute an importance score of a slot candidate s by slot frequency in the domain-specific conversation semantic coherence of slot fillers slots with higher frequency  more important domain-specific concepts  fewer topics UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 28 measured by cosine similarity between their word embeddings
  29. Step 3: Slot Selection Rank all slot candidates by their importance scores Output slot candidates with higher scores based on a threshold frequency semantic coherence locale_by_use 0.89 capability 0.68 food 0.64 expensiveness 0.49 quantity 0.30 seeking 0.22 : locale_by_use food expensiveness capability quantity UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 29
  30. Experiments of Ontology Induction Dataset ◦ Cambridge University SLU corpus [Henderson, 2012] ◦ Restaurant recommendation (WER = 37%) ◦ 2,166 dialogues ◦ 15,453 utterances ◦ dialogue slot: addr, area, food, name, phone, postcode, price range, task, type The mapping table between induced and reference slots Henderson et al., "Discriminative spoken language understanding using word confusion networks," in Proc. of SLT, 2012. UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 30
  31. Experiments of Ontology Induction Experiment: Slot Induction ◦ Metric: Average Precision (AP) and Area Under the Precision-Recall Curve (AUC) of the slot ranking model to measure quality of induced slots via the mapping table Semantic relations help decide domain-specific knowledge. Approach Word Embedding ASR Transcripts AP (%) AUC (%) AP (%) AUC (%) Baseline: MLE 58.2 56.2 55.0 53.5 Proposed: + Coherence In-Domain Word Vec. 67.0 65.8 58.0 56.5 External Word Vec. 74.5 (+39.9%) 73.5 (+44.1%) 65.0 (+18.1%) 64.2 (+19.9%) UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 31
  32. Experiments of Ontology Induction Sensitivity to Amount of Training Data ◦ Different amount of training transcripts for ontology induction Most approaches are not sensitive to training data size due to single-domain dialogues. The external word vectors trained on larger data perform better. 45 55 65 0.2 0.4 0.6 0.8 1.0 Frequency In-Domain Word Vector External Word Vector AvgP (%) UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 32 Ratio of Training Data
  33. Outline Introduction Ontology Induction [ ASRU’13, SLT’14a] Structure Learning [NAACL-HLT’15] Surface Form Derivation [SLT’14b] Semantic Decoding [ACL-IJCNLP’15] Intent Prediction [SLT’14c, ICMI’15] SLU in Human-Human Conversations [ASRU’15] Conclusions & Future Work UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 33
  34. Structure Learning [NAACL-HLT’15] Input: Unlabelled user utterances Output: Slots with relations Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015. Restaurant Asking Conversations Domain-Specific Ontology Unlabelled Collection locale_by_use food expensiveness seeking relational_quantity PREP_FOR PREP_FOR NN AMOD AMOD AMOD desiring DOBJ UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 34 Idea: construct a knowledge graph and then compute slot importance based on relations
  35. Step 1: Knowledge Graph Construction ccomp amod dobjnsubj det Syntactic dependency parsing on utterances Word-based lexical knowledge graph Slot-based semantic knowledge graph can i have a cheap restaurant capability expensiveness locale_by_use restaurant can have i a cheap w w capability locale_by_use expensiveness s UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 35
  36. Step 2: Edge Weight Measurement Compute edge weights to represent relation importance ◦ Slot-to-slot relation 𝐿 𝑠𝑠: similarity between slot embeddings ◦ Word-to-slot relation 𝐿 𝑤𝑠 or 𝐿 𝑠𝑤: frequency of the slot-word pair ◦ Word-to-word relation 𝐿 𝑤𝑤: similarity between word embeddings 𝐿 𝑤𝑤 𝐿 𝑤𝑠 𝑜𝑟 𝐿 𝑠𝑤 𝐿 𝑠𝑠 w1 w2 w3 w4 w5 w6 w7 s2 s1 s3 UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 36
  37. Step 2: Slot Importance by Random Walk scores propagated from word-layer then propagated within slot-layer Assumption: the slots with more dependencies to more important slots should be more important The random walk algorithm computes importance for each slot original frequency score slot importance 𝐿 𝑤𝑤 𝐿 𝑤𝑠 𝑜𝑟 𝐿 𝑠𝑤 𝐿 𝑠𝑠 w1 w2 w3 w4 w5 w6 w7 s2 s1 s3 Converged scores can represent the importance. https://github.com/yvchen/MRRW UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 37 locale_by_use food expensiveness seeking relational_quantity desiring
  38. The converged slot importance suggests whether the slot is important Rank slot pairs by summing up their converged slot importance Select slot pairs with higher scores according to a threshold Step 3: Identify Domain Slots w/ Relations s1 s2 rs(1) + rs(2) s3 s4 rs(3) + rs(4) : : : : locale_by_use food expensiveness seeking relational_quantity PREP_FOR PREP_FOR NN AMOD AMOD AMOD desiring DOBJ (Experiment 1) (Experiment 2) UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 38
  39. Experiments for Structure Learning Experiment 1: Quality of Slot Importance Dataset: Cambridge University SLU Corpus Approach ASR Transcripts AP (%) AUC (%) AP (%) AUC (%) Baseline: MLE 56.7 54.7 53.0 50.8 Proposed: Random Walk via Dependencies 71.5 (+26.1%) 70.8 (+29.4%) 76.4 (+44.2%) 76.0 (+49.6%) Dependency relations help decide domain-specific knowledge. UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 39
  40. Experiments for Structure Learning Experiment 2: Relation Discovery Analysis Discover inter-slot relations connecting important slot pairs The reference ontology with the most frequent syntactic dependencies locale_by_use food expensiveness seeking relational_quantity PREP_FOR PREP_FOR NN AMOD AMOD AMOD desiring DOBJ type food pricerange DOBJ AMOD AMOD AMOD task area PREP_IN UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 40
  41. Experiments for Structure Learning Experiment 2: Relation Discovery Analysis Discover inter-slot relations connecting important slot pairs The reference ontology with the most frequent syntactic dependencies locale_by_use food expensiveness seeking relational_quantity PREP_FOR PREP_FOR NN AMOD AMOD AMOD desiring DOBJ type food pricerange DOBJ AMOD AMOD AMOD task area PREP_IN The automatically learned domain ontology aligns well with the reference one. UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 41
  42. Outline Introduction Ontology Induction [ ASRU’13, SLT’14a] Structure Learning [NAACL-HLT’15] Surface Form Derivation [SLT’14b] Semantic Decoding [ACL-IJCNLP’15] Intent Prediction [SLT’14c, ICMI’15] SLU in Human-Human Conversations [ASRU’15] Conclusions & Future Work UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 42
  43. Surface Form Derivation [SLT’14b] Input: a domain-specific organized ontology Output: surface forms corresponding to entities in the ontology Domain-Specific Ontology Chen et al., "Deriving Local Relational Surface Forms from Dependency-Based Entity Embeddings for Unsupervised Spoken Language Understanding," in Proc. of SLT, 2014. movie genrechar director directed_by genre casting character, actor, role, … action, comedy, … director, filmmaker, … UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 43 Idea: mine patterns from the web to help understanding
  44. Step 1: Mining Query Snippets on Web Query snippets including entity pairs connected with specific relations in KG Avatar is a 2009 American epic science fiction film directed by James Cameron. directed_by The Dark Knight is a 2008 superhero film directed and produced by Christopher Nolan. directed_by directed_by Avatar James Cameron directed_by The Dark Knight Christopher Nolan UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 44
  45. Step 2: Training Entity Embeddings Dependency parsing for training dependency-based embeddings Avatar is a 2009 American epic science fiction film Camerondirected by James nsub numdetcop nn vmod prop_by nn $movie $director prop pobjnnnnnn $movie = [0.8 … 0.24] is = [0.3 … 0.21] : : film = [0.12 … 0.7] Levy and Goldberg, " Dependency-Based Word Embeddings," in Proc. of ACL, 2014. UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 45
  46. Step 3: Deriving Surface Forms Entity Surface Forms ◦ learn the surface forms corresponding to entities ◦ most similar word vectors for each entity embedding Entity Contexts ◦ learn the important contexts of entities ◦ most similar context vectors for each entity embedding $char: “character”, “role”, “who” $director: “director”, “filmmaker” $genre: “action”, “fiction” $char: “played” $director: “directed”  with similar contexts  frequently occurring together UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 46
  47. Experiments of Surface Form Derivation Knowledge Base: Freebase ◦ 670K entities; 78 entity types (movie names, actors, etc) Entity Tag Derived Word $character character, role, who, girl, she, he, officier $director director, dir, filmmaker $genre comedy, drama, fantasy, cartoon, horror, sci $language language, spanish, english, german $producer producer, filmmaker, screenwriter The web-derived surface forms provide useful knowledge for better understanding. UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 47
  48. Integrated with Background Knowledge Inference from GazetteersEntity Dict. PE (r | w) Knowledge Graph Entity Probabilistic Enrichment Ru (r) Final Results “find me some films directed by james cameron” Input Utterance Background Knowledge Surface Form Derivation Entity Embeddings PF (r | w) Entity Surface Forms PC (r | w) Entity Contexts Local Relational Surface Form Snippets Knowledge Graph Hakkani-Tür et al., “Probabilistic enrichment of knowledge graph entities for relation detection in conversational understanding,” in Proc. of Interspeech, 2014. UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 48
  49. Approach Micro F-Measure (%) Baseline: Gazetteer 38.9 Gazetteer + Entity Surface Form + Entity Context 43.3 (+11.4%) Experiments of Surface Form Derivation Relation Detection Data (NL-SPARQL): a dialog system challenge set for converting natural language to structured queries (Hakkani-Tür et al., 2014) ◦ Crowd-sourced utterances (3,338 for self-training, 1,084 for testing) Metric: micro F-measure (%) The web-derived knowledge can benefit SLU performance. Hakkani-Tür et al., “Probabilistic enrichment of knowledge graph entities for relation detection in conversational understanding,” in Proc. of Interspeech, 2014. UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 49
  50. Outline Introduction Ontology Induction [ ASRU’13, SLT’14a] Structure Learning [NAACL-HLT’15] Surface Form Derivation [SLT’14b] Semantic Decoding [ACL-IJCNLP’15] Intent Prediction [SLT’14c, ICMI’15] SLU in Human-Human Conversations [ASRU’15] Conclusions & Future Work SLU Modeling Knowledge Acquisition Restaurant Asking Conversations Organized Domain Knowledge Unlabelled Collection UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 50
  51. Outline Introduction Ontology Induction [ ASRU’13, SLT’14a] Structure Learning [NAACL-HLT’15] Surface Form Derivation [SLT’14b] Semantic Decoding [ACL-IJCNLP’15] Intent Prediction [SLT’14c, ICMI’15] SLU in Human-Human Conversations [ASRU’15] Conclusions & Future Work UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 51
  52. Semantic Decoding [ACL-IJCNLP’15] Input: user utterances, automatically acquired knowledge Output: the semantic concepts included in each individual utterance Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015. SLU Model target=“restaurant” price=“cheap” “can I have a cheap restaurant” Frame-Semantic Parsing Unlabeled Collection Semantic KG Ontology Induction Fw Fs Feature Model Rw Rs Knowledge Graph Propagation Model Word Relation Model Lexical KG Slot Relation Model Structure Learning × Semantic KG MF-SLU: SLU Modeling by Matrix Factorization Semantic Representation UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 52 Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised)
  53. Ontology Induction SLU Fw Fs Structure Learning × 1 Utterance 1 i would like a cheap restaurant Word Observation Slot Candidate Train … cheap restaurant foodexpensiveness 1 locale_by_use 11 find a restaurant with chinese food Utterance 2 1 1 food 1 1 1 Test 1 .97.90 .95.85 Ontology Induction show me a list of cheap restaurants Test Utterance hidden semantics Matrix Factorization SLU (MF-SLU) Feature Model UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 53 MF completes a partially-missing matrix based on a low-rank latent semantics assumption.
  54. Matrix Factorization (Rendle et al., 2009) The decomposed matrices represent latent semantics for utterances and words/slots respectively The product of two matrices fills the probability of hidden semantics 1 Word Observation Slot Candidate Train cheap restaurant foodexpensiveness 1 locale_by_use 11 1 1 food 1 1 1 Test 1 1 .97.90 .95.85 .93 .92.98.05 .05 𝑼 𝑾 + 𝑺 ≈ 𝑼 × 𝒅 𝒅 × 𝑾 + 𝑺× UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 54 Rendle et al., “BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.
  55. Reasoning with Matrix Factorization Word Relation Model Slot Relation Model word relation matrix slot relation matrix × 1 Word Observation Slot Candidate Train cheap restaurant foodexpensiveness 1 locale_by_use 11 1 1 food 1 1 1 Test 1 1 .97.90 .95.85 .93 .92.98.05 .05 Slot Induction Matrix Factorization SLU (MF-SLU) Feature Model + Knowledge Graph Propagation Model UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 55 Structure information is integrated to make the self-training data more reliable before MF.
  56. Experiments of Semantic Decoding Experiment 1: Quality of Semantics Estimation Dataset: Cambridge University SLU Corpus Metric: MAP of all estimated slot probabilities for each utterance UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 56 Approach ASR Transcripts Baseline: SLU Support Vector Machine 32.5 36.6 Multinomial Logistic Regression 34.0 38.8
  57. Experiments of Semantic Decoding Experiment 1: Quality of Semantics Estimation Dataset: Cambridge University SLU Corpus Metric: MAP of all estimated slot probabilities for each utterance The MF-SLU effectively models implicit information to decode semantics. The structure information further improves the results. Approach ASR Transcripts Baseline: SLU Support Vector Machine 32.5 36.6 Multinomial Logistic Regression 34.0 38.8 Proposed: MF-SLU Feature Model 37.6* 45.3* Feature Model + Knowledge Graph Propagation 43.5* (+27.9%) 53.4* (+37.6%) UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 57 *: the result is significantly better than the MLR with p < 0.05 in t-test
  58. Experiments of Semantic Decoding Experiment 2: Effectiveness of Relations Dataset: Cambridge University SLU Corpus Metric: MAP of all estimated slot probabilities for each utterance In the integrated structure information, both semantic and dependency relations are useful for understanding. Approach ASR Transcripts Feature Model 37.6 45.3 Feature + Knowledge Graph Propagation Semantic 41.4* 51.6* Dependency 41.6* 49.0* All 43.5* (+15.7%) 53.4* (+17.9%) UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 58 *: the result is significantly better than the MLR with p < 0.05 in t-test
  59. Low- and High-Level Understanding Semantic concepts for individual utterances do not consider high-level semantics (user intents) The follow-up behaviors usually correspond to user intents price=“cheap” target=“restaurant” SLU Model “can i have a cheap restaurant” intent=navigation restaurant=“legume” time=“tonight” SLU Model “i plan to dine in legume tonight” intent=reservation UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 59
  60. Outline Introduction Ontology Induction [ ASRU’13, SLT’14a] Structure Learning [NAACL-HLT’15] Surface Form Derivation [SLT’14b] Semantic Decoding [ACL-IJCNLP’15] Intent Prediction [SLT’14c, ICMI’15] SLU in Human-Human Conversations [ASRU’15] Conclusions & Future Work UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 60
  61. [Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015] Input: spoken utterances for making requests about launching an app Output: the apps supporting the required functionality Intent Identification ◦ popular domains in Google Play please dial a phone call to alex Skype, Hangout, etc. Intent Prediction of Mobile Apps [SLT’14c] UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 61 Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
  62. Input: single-turn request Output: the apps that are able to support the required functionality Intent Prediction– Single-Turn Request 1 Enriched Semantics communication .90 1 1 Utterance 1 i would like to contact alex Word Observation Intended App …… contact message Gmail Outlook Skypeemail Test .90 Reasoning with Feature-Enriched MF Train … your email, calendar, contacts… … check and send emails, msgs … Outlook Gmail IR for app candidates App Desc Self-Train Utterance Test Utterance 1 1 1 1 1 1 1 1 1 1 1 .90 .85 .97 .95 Feature Enrichment Utterance 1 i would like to contact alex … 1 1 UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 62 The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents.
  63. Intent Prediction– Multi-Turn Interaction [ICMI’15] Input: multi-turn interaction Output: the app the user plans to launch Challenge: language ambiguity 1) User preference 2) App-level contexts Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data Available at http://AppDialogue.com/. UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 63 send to vivian v.s. Email? Message? Communication Idea: Behavioral patterns in history can help intent prediction. previous turn
  64. Intent Prediction– Multi-Turn Interaction [ICMI’15] Input: multi-turn interaction Output: the app the user plans to launch 1 Lexical Intended App photo check camera IMtell take this photo tell vivian this is me in the lab CAMERA IM Train Dialogue check my grades on website send an email to professor … CHROME EMAIL send Behavior History null camera .85 take a photo of this send it to alice CAMERA IM … email 1 1 1 1 1 1 .70 chrome 1 1 1 1 1 1 chrome email 1 1 1 1 .95 .80 .55 User Utterance Intended App Reasoning with Feature-Enriched MF Test Dialogue take a photo of this send it to alice … Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data Available at http://AppDialogue.com/. UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 64 The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction.
  65. Single-Turn Request: Mean Average Precision (MAP) Multi-Turn Interaction: Mean Average Precision (MAP) Feature Matrix ASR Transcripts LM MF-SLU LM MF-SLU Word Observation 25.1 26.1 Feature Matrix ASR Transcripts MLR MF-SLU MLR MF-SLU Word Observation 52.1 55.5 LM-Based IR Model (unsupervised) Multinomial Logistic Regression (supervised) Experiments for Intent Prediction UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 65
  66. Single-Turn Request: Mean Average Precision (MAP) Multi-Turn Interaction: Mean Average Precision (MAP) Feature Matrix ASR Transcripts LM MF-SLU LM MF-SLU Word Observation 25.1 29.2 (+16.2%) 26.1 30.4 (+16.4%) Feature Matrix ASR Transcripts MLR MF-SLU MLR MF-SLU Word Observation 52.1 52.7 (+1.2%) 55.5 55.4 (-0.2%) Modeling hidden semantics helps intent prediction especially for noisy data. Experiments for Intent Prediction UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 66
  67. Single-Turn Request: Mean Average Precision (MAP) Multi-Turn Interaction: Mean Average Precision (MAP) Feature Matrix ASR Transcripts LM MF-SLU LM MF-SLU Word Observation 25.1 29.2 (+16.2%) 26.1 30.4 (+16.4%) Word + Embedding-Based Semantics 32.0 33.3 Word + Type-Embedding-Based Semantics 31.5 32.9 Feature Matrix ASR Transcripts MLR MF-SLU MLR MF-SLU Word Observation 52.1 52.7 (+1.2%) 55.5 55.4 (-0.2%) Word + Behavioral Patterns 53.9 56.6 Semantic enrichment provides rich cues to improve performance. Experiments for Intent Prediction UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 67
  68. Single-Turn Request: Mean Average Precision (MAP) Multi-Turn Interaction: Mean Average Precision (MAP) Feature Matrix ASR Transcripts LM MF-SLU LM MF-SLU Word Observation 25.1 29.2 (+16.2%) 26.1 30.4 (+16.4%) Word + Embedding-Based Semantics 32.0 34.2 (+6.8%) 33.3 33.3 (-0.2%) Word + Type-Embedding-Based Semantics 31.5 32.2 (+2.1%) 32.9 34.0 (+3.4%) Feature Matrix ASR Transcripts MLR MF-SLU MLR MF-SLU Word Observation 52.1 52.7 (+1.2%) 55.5 55.4 (-0.2%) Word + Behavioral Patterns 53.9 55.7 (+3.3%) 56.6 57.7 (+1.9%) Intent prediction can benefit from both hidden information and low-level semantics. Experiments for Intent Prediction UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 68
  69. Outline Introduction Ontology Induction [ ASRU’13, SLT’14a] Structure Learning [NAACL-HLT’15] Surface Form Derivation [SLT’14b] Semantic Decoding [ACL-IJCNLP’15] Intent Prediction [SLT’14c, ICMI’15] SLU in Human-Human Conversations [ASRU’15] Conclusions & Future Work SLU Modeling Organized Domain Knowledge price=“cheap” target=“restaurant” intent=navigation SLU Model “can i have a cheap restaurant” UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 69
  70. Outline Introduction Ontology Induction [ ASRU’13, SLT’14a] Structure Learning [NAACL-HLT’15] Surface Form Derivation [SLT’14b] Semantic Decoding [ACL-IJCNLP’15] Intent Prediction [SLT’14c, ICMI’15] SLU in Human-Human Conversations [ASRU’15] Conclusions & Future Work UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 70
  71. SLU in Human-Human Dialogues Computing devices have been easily accessible during regular human-human conversations. The dialogues include discussions for identifying speakers’ next actions. UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 71 Is it possible to apply the techniques developed for human-machine dialogues to human-human dialogues?
  72. Will Vivian come here for the meeting? Find Calendar Entry Actionable Item Utterances UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 72
  73. Goal: provide actions an existing system can handle w/o interrupting conversations Assumption: some actions and associated arguments can be shared across genres Human-Machine Genre create_calendar_entry schedule a meeting with John this afternoon Human-Human Genre create_calendar_entry how about the three of us discuss this later this afternoon ?  more casual, include conversational terms Actionable Item Detection [ASRU’15] Task: multi-class utterance classification • train on the available human-machine genre • test on the human-human genre Adaptation • model adapation • embedding vector adaptation Chen et al., "Detecting Actionable Items in Meetings by Convolutional Deep Structred Semantic Models," in Proc. of ASRU, 2015. UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 73 contact_name start_time contact_name start_time
  74. Action Item Detection Idea: human-machine interactions may help detect actionable items in human-human conversations 12 30 on Monday the 2nd of July schedule meeting to discuss taxes with Bill, Judy and Rick. 12 PM lunch with Jessie 12:00 meeting with cleaning staff to discuss progress 2 hour business presentation with Rick Sandy on March 8 at 7am create_calendar_entry A read receipt should be sent in Karen's email. Activate meeting delay by 15 minutes. Inform the participants. Add more message saying "Thanks". send_email Query Doc Convolutional deep structured semantic models for IR may be useful for this task. UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 74
  75. 20K 20K 20K 1000 w1 w2 w3 1000 1000 20K wd 1000 300 Word Sequence: x Word Hashing Matrix: Wh Word Hashing Layer: lh Convolution Matrix: Wc Convolutional Layer: lc Max Pooling Operation Max Pooling Layer: lm Semantic Projection Matrix: Ws Semantic Layer: y max max max 300 300 300 300 U A1 A2 An CosSim(U, Ai) P(A1 | U) P(A2 | U) P(An | U) … Utterance Action how about we discuss this later (Huang et al., 2013; Shen et al., 2014) Convolutional Deep Structured Semantic Models (CDSSM) Huang et al., "Learning deep structured semantic models for web search using clickthrough data," in Proc. of CIKM, 2013. Shen et al., “Learning semantic representations using ´ convolutional neural networks for web search," in Proc. of WWW, 2014. UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 75 maximizes the likelihood of associated actions given utterances
  76. Adaptation Issue: mismatch between a source genre and a target genre Solution: ◦ Adapting CDSSM ◦ Continually train the CDSSM using the data from the target genre ◦ Adapting Action Embeddings ◦ Moving learned action embeddings close to the observed corresponding utterance embeddings UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 76
  77. Adapting Action Embeddings The mismatch may result in inaccurate action embeddings Action1 Action2 Action3 utt utt utt utt utt utt utt • Action embedding: trained on the source genre • Utterance embedding: generated on the target genre Idea: moving action embeddings close to the observed utterance embeddings from the target genre UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 77
  78. Adapting Action Embeddings Learning adapted action embeddings by minimizing an objective: Action1 Action2 Action3 utt utt utt utt utt utt utt Action3 Action1 Action2 UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 78 the distance between original & new action embeddings the distance between new action vec & corresponding utterance vecs The actionable scores can be measured by the similarity between utterance embeddings and adapted action embeddings.
  79. Iterative Ontology Refinement Actionable information may help refine the induced domain ontology ◦ Intent: create_single_reminder ◦ Higher score  utterance with more core contents ◦ Lower score  less important utterance weighted frequency semantic coherence Ontology Induction Actionable Item Detection SLU Modeling The iterative framework can benefit understanding in both human-machine and human-human conversations. UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 79 Utt1 0.7 Receiving Commerce_pay Utt2 0.2 Capability Unlabeled Collection
  80. Experiments of H-H SLU Experiment 1: Actionable Item Detection Dataset: 22 meetings from the ICSI meeting corpus (3 types of meeting: Bed, Bmr, Bro) Identified action: find_calendar_entry, create_calendar_entry, open_agenda, add_agenda_item, create_single_reminder, send_email, find_email, make_call, search, open_setting Annotating agreement: Cohen’s Kappa = 0.64~0.67 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Bed Bmr Bro create_single_reminder find_calendar_entry search add_agenda_item create_calendar_entry open_agenda send_email find_email make_call open_setting UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 80 Data & Model Available at http://research.microsoft.com/en-us/projects/meetingunderstanding/
  81. Metrics: the average AUC for 10 actions+others Approach Mismatch-CDSSM Adapt-CDSSM Original Similarity 49.1 50.4 Similarity based on Adapted Embeddings 55.8 (+13.6%) 60.1 (+19.2%) Experiments of H-H SLU Experiment 1: Actionable Item Detection UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 81 trained on Cortana data then continually trained on meeting data model adaptation action embedding adaptation Two adaptation approaches are useful to overcome the genre mismatch.
  82. Experiments of H-H SLU Experiment 1: Actionable Item Detection Model AUC (%) Baseline N-gram (N=1,2,3) 52.84 Paragraph Vector (doc2vec) 59.79 Proposed CDSSM Adapted Vector 69.27 Baselines ◦ Lexical: ngram ◦ Semantic: paragraph vector (Le and Mikolov, 2014) Classifier: SVM with RBF The CDSSM semantic features outperform lexical n-grams and paragraph vectors, where about 70% actionable items in meetings can be detected. UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 82 Le and Mikolov, “Distributed Representations of Sentences and Documents," in Proc. of JMLR, 2014.
  83. Dataset: 155 conversations between customers and agents from the MetLife call center; 5,229 automatically segmented utterances (WER = 31.8%) Annotating agreement of actionable utterances (customer intents or agent actions): Cohen’s Kappa = 0.76 Reference Ontology ◦ Slot: frames selected by annotators ◦ Structure: slot pairs with dependency relations FrameNet Coverage: 79.5% ◦ #additional important concepts: 8 ◦ cancel, refund, delete, discount, benefit, person, status, care ◦ #reference slots: 31 Experiments of H-H SLU Experiment 2: Iterative Ontology Refinement UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 83
  84. Metric: AUC for evaluating the ranking lists about slot and slot pairs Approach ASR Transcripts Slot Structure Slot Structure Baseline: MLE 43.4 11.4 59.5 25.9 Proposed: External Word Vec 49.6 12.8 64.7 40.2 The proposed ontology induction significantly improves the baseline in terms of slot and structure performance for multi-domain dialogues. Experiments of H-H SLU Experiment 2: Iterative Ontology Refinement UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 84
  85. Metric: AUC for evaluating the ranking lists about slot and slot pairs Approach ASR Transcripts Slot Structure Slot Structure Baseline: MLE 43.4 11.4 59.5 25.9 + Actionable Score Proposed Oracle Proposed: External Word Vec 49.6 12.8 64.7 40.2 + Actionable Score Proposed Oracle  ground truth of actionable utterances (upper bound) Experiments of H-H SLU Experiment 2: Iterative Ontology Refinement UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 85  the estimation of actionable item detection The proposed ontology induction significantly improves the baseline in terms of slot and structure performance for multi-domain dialogues.
  86. Metric: AUC for evaluating the ranking lists about slot and slot pairs Approach ASR Transcripts Slot Structure Slot Structure Baseline: MLE 43.4 11.4 59.5 25.9 + Actionable Score Proposed 42.9 11.3 Oracle 44.3 12.2 Proposed: External Word Vec 49.6 12.8 64.7 40.2 + Actionable Score Proposed 49.2 12.8 Oracle 48.4 12.9 Actionable information does not significantly improve ASR results due to high WER. Experiments of H-H SLU Experiment 2: Iterative Ontology Refinement UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 86
  87. Metric: AUC for evaluating the ranking lists about slot and slot pairs Approach ASR Transcripts Slot Structure Slot Structure Baseline: MLE 43.4 11.4 59.5 25.9 + Actionable Score Proposed 42.9 11.3 59.8 26.6 Oracle 44.3 12.2 66.7 37.8 Proposed: External Word Vec 49.6 12.8 64.7 40.2 + Actionable Score Proposed 49.2 12.8 65.0 40.5 Oracle 48.4 12.9 82.4 56.9 Actionable information significantly improves the performance for transcripts. Experiments of H-H SLU Experiment 2: Iterative Ontology Refinement UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 87 The iterative ontology refinement is feasible, and it shows the potential room for improvement.
  88. Outline Introduction Ontology Induction [ ASRU’13, SLT’14a] Structure Learning [NAACL-HLT’15] Surface Form Derivation [SLT’14b] Semantic Decoding [ACL-IJCNLP’15] Intent Prediction [SLT’14c, ICMI’15] SLU in Human-Human Conversations [ASRU’15] Conclusions & Future Work UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 88
  89. Summary of Contributions  Ontology Induction  Semantic relations are useful.  Structure Learning  Dependency relations are useful.  Surface Form Derivation  Web-derived surface forms benefit SLU. Knowledge Acquisition  Semantic Decoding  The MF-SLU decodes semantics.  Intent Prediction  The feature-enriched MF-SLU predicts intents. SLU Modeling  CDSSM learns intent embeddings to detect actionable utterances, which may help ontology refinement as an iterative framework. SLU in Human-Human Conversations UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 89
  90. Conclusions The dissertation shows the feasibility and the potential for improving generalization, maintenance, efficiency, and scalability of SDSs, where the proposed techniques work for both human-machine and human-human conversations. The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies. The proposed MF-SLU unifies the automatically acquired knowledge, and then allows systems to consider implicit semantics for better understanding. ◦ Better semantic representations for individual utterances ◦ Better high-level intent prediction about follow-up behaviors UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 90
  91. Future Work Apply the proposed technology to domain discovery ◦ not covered by the current systems but users are interested in ◦ guide the next developed domains Improve the proposed approach by handling the uncertainty UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 91 SLU SLU Modeling ASR Knowledge Acquisition Active Learning for SLU - w/o labels: data selection, filter uncertain instance - w/ explicit labels: crowd-sourcing - w/ implicit labels: successful interactions implies the pseudo labels Topic Prediction for ASR Improvement - Lexicon expansion with potential OOVs - LM adaptation - Lattice rescoring recognition errors unreliable knowledge
  92. Take Home Message Big Data without annotations is available Main challenge: how to acquire and organize important knowledge, and further utilize it for applications UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS 92 Unsupervised or weakly-supervised methods will be the future trend!
  93. Q & A THANKS FOR YOUR ATTENTIONS!! THANKS TO MY COMMITTEE MEMBERS FOR THEIR HELPFUL FEEDBACK. 93UNSUPERVISED LEARNING & MODELING OF KNOWLEDGE & INTENT FOR SPOKEN DIALOGUE SYSTEMS

Editor's Notes

  1. Two slides to describe more, formalism for representing what people say (definition first -> example); set of slots; domain knowledge representation
  2. Two slides to describe more, formalism for representing what people say (definition first -> example); set of slots; domain knowledge representation
  3. New domain requires 1) hand-crafted ontology 2) slot representations for utterances (labelled info) 3) parser for mapping utterances into; labelling cost is too high; so I want to automate this proces
  4. How can find a restaurant
  5. SEMAFOR outputs set of frames; hopefully includes all domain slots; pick some slots from SEMAFOR
  6. What’s V(s_i)? One slide for freq, one for coherence
  7. What’s V(s_i)? One slide for freq, one for coherence
  8. What’s the weight for each edge?
  9. What’s the weight for each edge?
  10. Graph for all utterances
Advertisement