Statistical Learning from Dialogues for Intelligent Assistants
1. DR. YUN-NUNG (VIVIAN) CHEN H T T P : / / V I V I A N C H E N . I D V.T W
Statistical Learning from Dialogues for Intelligence Assistants
Sorry, I didn’t get that!
1"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
2. My Background
Yun-Nung (Vivian) Chen 陳縕儂 http://vivianchen.idv.tw
National Taiwan
University
2009
B.S.
2005
Freshman
2011
M.S.
2015
Ph.D.
Carnegie Mellon
University
spoken dialogue system
language understanding
user modeling
speech summarization
key term extraction
spoken term detection
Microsoft
Research
2016
Postdoc
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 2
3. Outline
Intelligent Assistant
◦ What are they?
◦ Why do we need them?
◦ Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS)
◦ Pipeline Architecture
◦ Current Challenges & Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 3
4. Outline
Intelligent Assistant
◦ What are they?
◦ Why do we need them?
◦ Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS)
◦ Pipeline Architecture
◦ Current Challenges & Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 4
5. Apple Siri
(2011)
Google Now
(2012)
Microsoft Cortana
(2014)
Amazon Alexa/Echo
(2014)
https://www.apple.com/ios/siri/
https://www.google.com/landing/now/
http://www.windowsphone.com/en-us/how-to/wp8/cortana/meet-cortana
http://www.amazon.com/oc/echo/
Facebook M
(2015)
What are Intelligent Assistants?
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 5
6. Why do we need them?
Daily Life Usage
◦ Weather
◦ Schedule
◦ Transportation
◦ Restaurant Seeking
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 6
7. Why do we need them?
◦ Get things done
◦ E.g. set up alarm/reminder, take note
◦ Easy access to structured data, services and apps
◦ E.g. find docs/photos/restaurants
◦ Assist your daily schedule and routine
◦ E.g. commute alerts to/from work
◦ Be more productive in managing your work and personal life
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 7
8. Why do companies care?
Global Digital Statistics (2015 January)
Global Population
7.21B
Active Internet Users
3.01B
Active Social
Media Accounts
2.08B
Active Unique
Mobile Users
3.65B
The more natural and convenient input of the devices evolves towards speech.
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 8
9. Personal Intelligent Architecture
Reactive
Assistance
ASR, LU, Dialog, LG, TTS
Proactive
Assistance
Inferences, User
Modeling, Suggestions
Data
Back-end Data
Bases, Services and
Client Signals
Device/Service End-points
(Phone, PC, Xbox, Web Browser, Messaging Apps)
User Experience
“restaurant suggestions”“call taxi”
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 9
10. Personal Intelligent Architecture
Reactive
Assistance
ASR, LU, Dialog, LG, TTS
Proactive
Assistance
Inferences, User
Modeling, Suggestions
Data
Back-end Data
Bases, Services and
Client Signals
Device/Service End-points
(Phone, PC, Xbox, Web Browser, Messaging Apps)
User Experience
“call taxi”
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 10
11. Outline
Intelligent Assistant
◦ What are they?
◦ Why do we need them?
◦ Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS)
◦ Pipeline Architecture
◦ Current Challenges & Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 11
12. Spoken dialogue systems are intelligent agents that are able to help
users finish tasks more efficiently via spoken interactions.
Spoken dialogue systems are being incorporated into various devices
(smart-phones, smart TVs, in-car navigating system, etc).
Good SDSs assist users to organize and access information conveniently.
Spoken Dialogue System (SDS)
JARVIS – Iron Man’s Personal Assistant Baymax – Personal Healthcare Companion
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 12
13. Baymax is capable of maintaining a good spoken dialogue system and learning new
knowledge for better understanding and interacting with people.
What is Baymax’s intelligence?
Big Hero 6 -- Video content owned and licensed by Disney Entertainment, Marvel Entertainment, LLC, etc
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 13
14. ASR: Automatic Speech Recognition
SLU: Spoken Language Understanding
DM: Dialogue Management
NLG: Natural Language Generation
SDS Architecture
DomainDMASR SLU
NLG
current
bottleneck
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 14
15. Interaction Example
User
Intelligent
Agent Q: How does a dialogue system process this request?
Cheap Taiwanese eating places include Din Tai
Fung, Boiling Point, etc. What do you want to
choose? I can help you go there.
find a cheap eating place for taiwanese food
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 15
16. SDS Process – Available Domain Ontology
find a cheap eating place for taiwanese food
User
target
foodprice
AMOD
NN
seeking
PREP_FOR
Organized Domain Knowledge
Intelligent
Agent
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 16
17. SDS Process – Available Domain Ontology
User
target
foodprice
AMOD
NN
seeking
PREP_FOR
Organized Domain Knowledge
Intelligent
Agent
Ontology Induction
(semantic slot)
find a cheap eating place for taiwanese food
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 17
18. SDS Process – Available Domain Ontology
User
target
foodprice
AMOD
NN
seeking
PREP_FOR
Organized Domain Knowledge
Intelligent
Agent
Ontology Induction
(semantic slot)
Structure Learning
(inter-slot relation)
find a cheap eating place for taiwanese food
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 18
19. SDS Process – Spoken Language Understanding (SLU)
User
target
foodprice
AMOD
NN
seeking
PREP_FOR
Intelligent
Agent
seeking=“find”
target=“eating place”
price=“cheap”
food=“taiwanese”
find a cheap eating place for taiwanese food
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 19
20. find a cheap eating place for taiwanese food
SDS Process – Spoken Language Understanding (SLU)
User
target
foodprice
AMOD
NN
seeking
PREP_FOR
Intelligent
Agent
seeking=“find”
target=“eating place”
price=“cheap”
food=“taiwanese”
Semantic Decoding
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 20
21. find a cheap eating place for taiwanese food
SDS Process – Dialogue Management (DM)
User
target
foodprice
AMOD
NN
seeking
PREP_FOR
SELECT restaurant {
restaurant.price=“cheap”
restaurant.food=“taiwanese”
}Intelligent
Agent
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 21
22. find a cheap eating place for taiwanese food
SDS Process – Dialogue Management (DM)
User
target
foodprice
AMOD
NN
seeking
PREP_FOR
SELECT restaurant {
restaurant.price=“cheap”
restaurant.food=“taiwanese”
}Intelligent
Agent
Surface Form Derivation
(natural language)
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 22
23. SDS Process – Dialogue Management (DM)
User
SELECT restaurant {
restaurant.price=“cheap”
restaurant.food=“taiwanese”
}
Din Tai Fung
Boiling Point
:
:
Predicted intent: navigation
Intelligent
Agent
find a cheap eating place for taiwanese food
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 23
24. SDS Process – Dialogue Management (DM)
User
SELECT restaurant {
restaurant.price=“cheap”
restaurant.food=“taiwanese”
}
Din Tai Fung
Boiling Point
:
:
Predicted intent: navigation
Intelligent
Agent
Intent Prediction
find a cheap eating place for taiwanese food
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 24
25. SDS Process – Natural Language Generation (NLG)
User
Intelligent
Agent
Cheap Taiwanese eating places include Din Tai
Fung, Boiling Point, etc. What do you want to
choose? I can help you go there. (navigation)
find a cheap eating place for taiwanese food
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 25
26. Required Knowledge
target
foodprice
AMOD
NN
seeking
PREP_FOR
SELECT restaurant {
restaurant.price=“cheap”
restaurant.food=“taiwanese”
}
Predicted intent: navigation
User
Required Domain-Specific Information
find a cheap eating place for taiwanese food
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 26
27. Challenges for SDS
An SDS in a new domain requires
1) A hand-crafted domain ontology
2) Utterances labelled with semantic representations
3) An SLU component for mapping utterances into semantic representations
Manual work results in high cost, long duration and poor scalability of
system development.
The goal is to enable an SDS to
1) automatically infer domain knowledge and then to
2) create the data for SLU modeling
in order to handle the open-domain requests.
seeking=“find”
target=“eating place”
price=“cheap”
food=“asian food”
find a cheap eating
place for asian food
fully unsupervised
Prior
Focus
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 27
29. Contributions
User
Ontology Induction
Structure Learning
Surface Form Derivation
Semantic Decoding
Intent Prediction
find a cheap eating place for taiwanese food
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 29
30. Ontology Induction
Structure Learning
Surface Form Derivation
Semantic Decoding
Intent Prediction
Contributions
User
Knowledge Acquisition SLU Modeling
find a cheap eating place for taiwanese food
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 30
31. Knowledge Acquisition
1) Given unlabelled conversations, how can a system automatically
induce and organize domain-specific concepts?
Restaurant
Asking
Conversations
target
food
price
seeking
quantity
PREP_FOR
PREP_FOR
NN AMOD
AMOD
AMOD
Organized Domain
Knowledge
Unlabelled
Collection
Knowledge
Acquisition
Knowledge Acquisition Ontology Induction
Structure Learning
Surface Form Derivation
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 31
32. SLU Modeling
2) With the automatically acquired knowledge, how can a system
understand utterance semantics and user intents?
Organized
Domain
Knowledge
price=“cheap”
target=“restaurant”
intent=navigation
SLU Modeling
SLU Component
“can i have a cheap restaurant”
SLU Modeling
Semantic Decoding
Intent Prediction
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 32
33. SDS Architecture – Contributions
DomainDMASR SLU
NLG
Knowledge Acquisition SLU Modeling
current
bottleneck
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 33
35. SDS Flowchart – Semantic Decoding
Ontology
Induction
Structure
Learning
Semantic
Decoding
Intent
Prediction
Knowledge Acquisition
SLU Modeling
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 35
36. Outline
Intelligent Assistant
◦ What are they?
◦ Why do we need them?
◦ Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS)
◦ Pipeline Architecture
◦ Current Challenges & Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 36
37. Semantic Decoding [ACL-IJCNLP’15]
Input: user utterances
Output: semantic concepts included in each individual utterance
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
SLU Model
target=“restaurant”
price=“cheap”
“can I have a cheap restaurant”
Frame-Semantic Parsing
Unlabeled
Collection
Semantic KG
Ontology Induction
Fw Fs
Feature Model
Rw
Rs
Knowledge Graph
Propagation Model
Word Relation Model
Lexical KG
Slot Relation Model
Structure
Learning
×
Semantic KG
MF-SLU: SLU Modeling by Matrix Factorization
Semantic Representation
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 37
38. [Baker et al. 1998; Das et al., 2014]
Frame-Semantic Parsing
FrameNet [Baker et al., 1998]
◦ a linguistically semantic resource, based on the frame-semantics theory
◦ words/phrases can be represented as frames
◦ “low fat milk” “milk” evokes the “food” frame;
“low fat” fills the descriptor frame element
SEMAFOR [Das et al., 2014]
◦ a state-of-the-art frame-semantics parser, trained on manually annotated
FrameNet sentences
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 38
39. Ontology Induction [ASRU’13, SLT’14a]
can i have a cheap restaurant
Frame: capability
Frame: expensiveness
Frame: locale by use
1st Issue: differentiate domain-specific frames from generic frames for SDSs
Good!
Good!
?
Das et al., " Frame-semantic parsing," in Proc. of Computational Linguistics, 2014.
slot candidate
Best Student Paper Award
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 39
40. 1
Utterance 1
i would like a cheap restaurant
Train
………
cheap restaurant foodexpensiveness
1
locale_by_use
11
find a restaurant with chinese food
Utterance 2
1 1
food
1 1
1
Test
1 .97 .95
Frame Semantic Parsing
show me a list of cheap restaurants
Test Utterance
Word Observation Slot Candidate
Ontology Induction [ASRU’13, SLT’14a]
Best Student Paper Award
Idea: increase weights of domain-specific slots and decrease weights of others
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 40
41. 1st Issue: How to adapt generic slots to a domain-specific setting?
Knowledge Graph Propagation Model
Assumption: domain-specific words/slots have more dependencies to each other
Word Relation Model Slot Relation Model
word
relation
matrix
slot
relation
matrix
×
1
Word Observation Slot Candidate
Train
cheap restaurant foodexpensiveness
1
locale_by_use
11
1 1
food
1 1
1
Test
1
1
Slot Induction
Relation matrices allow nodes to propagate scores to their neighbors
in the knowledge graph, so that domain-specific words/slots have
higher scores after matrix multiplication.
i like
1 1
capability
1
locale_by_use
food expensiveness
seeking
relational_quantitydesiring
Utterance 1
i would like a cheap restaurant
……
find a restaurant with chinese food
Utterance 2
show me a list of cheap restaurants
Test Utterance
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 41
42. Semantic Decoding [ACL-IJCNLP’15]
Input: user utterances
Output: semantic concepts included in each individual utterance
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
SLU Model
target=“restaurant”
price=“cheap”
“can I have a cheap restaurant”
Frame-Semantic Parsing
Unlabeled
Collection
Semantic KG
Ontology Induction
Fw Fs
Feature Model
Rw
Rs
Knowledge Graph
Propagation Model
Word Relation Model
Lexical KG
Slot Relation Model
Structure
Learning
×
Semantic KG
MF-SLU: SLU Modeling by Matrix Factorization
Semantic Representation
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 42
43. Knowledge Graph Construction
Syntactic dependency parsing on utterances
ccomp
amod
dobjnsubj det
can i have a cheap restaurant
capability expensiveness locale_by_use
Word-based lexical
knowledge graph
Slot-based semantic
knowledge graph
restaurant
can
have
i
a
cheap
w
w
capability
locale_by_use expensiveness
s
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 43
44. Dependency-based word embeddings
Dependency-based slot embeddings
Edge Weight Measurement
Slot/Word Embeddings Training (Levy and Goldberg, 2014)
can = [0.8 … 0.24]
have = [0.3 … 0.21]
:
:
expensiveness = [0.12 … 0.7]
capability = [0.3 … 0.6]
:
:
can i have a cheap restaurant
ccomp
amod
dobjnsubj det
have acapability expensiveness locale_by_use
ccomp
amod
dobjnsubj det
Levy and Goldberg, " Dependency-Based Word Embeddings," in Proc. of ACL, 2014.
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 44
45. Edge Weight Measurement
Compute edge weights to represent relation importance
◦ Slot-to-slot semantic relation 𝑅 𝑠
𝑆
: similarity between slot embeddings
◦ Slot-to-slot dependency relation 𝑅 𝑠
𝐷
: dependency score between slot embeddings
◦ Word-to-word semantic relation 𝑅 𝑤
𝑆
: similarity between word embeddings
◦ Word-to-word dependency relation 𝑅 𝑤
𝐷
: dependency score between word embeddings
𝑅 𝑤
𝑆𝐷
= 𝑅 𝑤
𝑆
+𝑅 𝑤
𝐷
𝑅 𝑠
𝑆𝐷 = 𝑅 𝑠
𝑆+𝑅 𝑠
𝐷
w1
w2
w3
w4
w5
w6
w7
s2
s1 s3
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 45
46. Word Relation Model Slot Relation Model
word
relation
matrix
slot
relation
matrix
×
1
Word Observation Slot Candidate
Train
cheap restaurant foodexpensiveness
1
locale_by_use
11
1 1
food
1 1
1
Test
1
1
Slot Induction
Knowledge Graph Propagation Model
𝑅 𝑤
𝑆𝐷
𝑅 𝑠
𝑆𝐷
Structure information is integrated to make the self-training data more reliable.
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 46
47. Ontology Induction
SLU
Fw Fs
Structure
Learning
×
1
Utterance 1
i would like a cheap restaurant
Word Observation Slot Candidate
Train
…
cheap restaurant foodexpensiveness
1
locale_by_use
11
find a restaurant with chinese food
Utterance 2
1 1
food
1 1
1
Test
1 .97.90 .95.85
Ontology
Induction
show me a list of cheap restaurants
Test Utterance hidden semantics
2nd Issue: unobserved semantics may benefit understanding
Semantic Decoding [ACL-IJCNLP’15]
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 47
48. Reasoning with Matrix Factorization
Word Relation Model Slot Relation Model
word
relation
matrix
slot
relation
matrix
×
1
Word Observation Slot Candidate
Train
cheap restaurant foodexpensiveness
1
locale_by_use
11
1 1
food
1 1
1
Test
1
1
.97.90 .95.85
.93 .92.98.05 .05
Slot Induction
Feature Model +
Knowledge Graph Propagation Model
𝑅 𝑤
𝑆𝐷
𝑅 𝑠
𝑆𝐷
Idea: MF completes a partially-missing matrix based on a low-rank latent semantics
assumption, which is able to model hidden semantics and more robust to noisy data.
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 48
49. 2nd Issue: How to model the unobserved hidden semantics?
Matrix Factorization (MF) (Rendle et al., 2009)
The decomposed matrices represent latent semantics for utterances
and words/slots respectively
The product of two matrices fills the probability of hidden semantics
1
Word Observation Slot Candidate
Train
cheap restaurant foodexpensiveness
1
locale_by_use
11
1 1
food
1 1
1
Test
1
1
.97.90 .95.85
.93 .92.98.05 .05
𝑼
𝑾 + 𝑺
≈ 𝑼 × 𝒅 𝒅 × 𝑾 + 𝑺×
Rendle et al., “BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 49
50. Bayesian Personalized Ranking for MF
Model implicit feedback
◦ not treat unobserved facts as negative samples (true or false)
◦ give observed facts higher scores than unobserved facts
Objective:
1
𝑓+
𝑓−
𝑓−
The objective is to learn a set of well-ranked semantic slots per utterance.
𝑢
𝑥
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 50
51. Ontology Induction
SLU
Fw Fs
Structure
Learning
×
1
Utterance 1
i would like a cheap restaurant
Word Observation Slot Candidate
Train
…
cheap restaurant foodexpensiveness
1
locale_by_use
11
find a restaurant with chinese food
Utterance 2
1 1
food
1 1
1
Test
1 .97.90 .95.85
Ontology
Induction
show me a list of cheap restaurants
Test Utterance
Matrix Factorization SLU (MF-SLU)
MF-SLU can estimate probabilities for slot candidates given test utterances.
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 51
52. Semantic Decoding [ACL-IJCNLP’15]
Input: user utterances
Output: semantic concepts included in each individual utterance
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
SLU Model
target=“restaurant”
price=“cheap”
“can I have a cheap restaurant”
Frame-Semantic Parsing
Unlabeled
Collection
Semantic KG
Ontology Induction
Fw Fs
Feature Model
Rw
Rs
Knowledge Graph
Propagation Model
Word Relation Model
Lexical KG
Slot Relation Model
Structure
Learning
×
Semantic KG
MF-SLU: SLU Modeling by Matrix Factorization
Semantic Representation
Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised)
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 52
53. Experimental Setup
Dataset: Cambridge University SLU Corpus
◦ Restaurant recommendation (WER = 37%)
◦ 2,166 dialogues
◦ 15,453 utterances
◦ dialogue slot:
addr, area, food, name,
phone, postcode,
price range, task, type
Metric: MAP of all estimated slot probabilities over all utterances
The mapping table between induced and reference slots
Henderson et al., "Discriminative spoken language understanding using word confusion networks," in Proc. of SLT, 2012.
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 53
54. Experiments of Semantic Decoding
Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances
Approach ASR Transcripts
Baseline:
SLU
Support Vector Machine 32.5 36.6
Multinomial Logistic Regression 34.0 38.8
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 54
55. Experiments of Semantic Decoding
Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances
The MF-SLU effectively models implicit information to decode semantics.
The structure information further improves the results.
Approach ASR Transcripts
Baseline:
SLU
Support Vector Machine 32.5 36.6
Multinomial Logistic Regression 34.0 38.8
Proposed:
MF-SLU
Feature Model 37.6* 45.3*
Feature Model +
Knowledge Graph Propagation
43.5*
(+27.9%)
53.4*
(+37.6%)
*: the result is significantly better than the MLR with p < 0.05 in t-test
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 55
56. Experiments of Semantic Decoding
Effectiveness of Relations
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances
In the integrated structure information, both semantic and dependency
relations are useful for understanding.
Approach ASR Transcripts
Feature Model 37.6 45.3
Feature + Knowledge
Graph Propagation
Semantic 41.4* 51.6*
Dependency 41.6* 49.0*
All 43.5* (+15.7%) 53.4* (+17.9%)
*: the result is significantly better than the MLR with p < 0.05 in t-test
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 56
57. Experiments for Structure Learning
Relation Discovery Analysis
Discover inter-slot relations
connecting important slot pairs
The reference ontology with the most
frequent syntactic dependencies
locale_by_use
food expensiveness
seeking
relational_quantity
PREP_FOR
PREP_FOR
NN AMOD
AMOD
AMOD
desiring
DOBJ
type
food pricerange
DOBJ
AMOD AMOD
AMOD
task
area
PREP_IN
The automatically learned domain ontology aligns well with the reference one.
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 57
The data-driven one is more objective while expert-annotated one is more subjective.
58. Contributions of Semantic Decoding
Ontology
Induction
Structure
Learning
Semantic
Decoding
Intent
Prediction
Knowledge Acquisition
SLU Modeling
Ontology Induction and
Structure Learning enable
systems to automatically
acquire open domain
knowledge.
MF-SLU for Semantic
Decoding is able to
1) unify the automatically
acquired knowledge
2) adapt to a domain-
specific setting
3) and then allows
systems to model
implicit semantics for
better understanding.
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 58
59. Low- and High-Level Understanding
Semantic concepts for individual utterances do not consider high-level
semantics (user intents)
The follow-up behaviors usually correspond to user intents
price=“cheap”
target=“restaurant”
SLU Model
“can i have a cheap restaurant”
intent=navigation
restaurant=“legume”
time=“tonight”
SLU Model
“i plan to dine in legume tonight”
intent=reservation
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 59
61. Outline
Intelligent Assistant
◦ What are they?
◦ Why do we need them?
◦ Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS)
◦ Pipeline Architecture
◦ Current Challenges & Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 61
62. [Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]
Input: spoken utterances for
making requests about launching
an app
Output: the apps supporting the
required functionality
Intent Identification
◦ popular domains in Google Play
please dial a phone call to alex
Skype, Hangout, etc.
Intent Prediction of Mobile Apps [SLT’14c]
Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 62
63. Input: single-turn request
Output: apps that are able to support the required functionality
Intent Prediction– Single-Turn Request
1
Enriched
Semantics
communication
.90
1
1
Utterance 1 i would like to contact alex
Word Observation Intended App
……
contact message Gmail Outlook Skypeemail
Test
.90
Reasoning with Feature-Enriched MF
Train
… your email, calendar, contacts…
… check and send emails, msgs …
Outlook
Gmail
IR for app
candidates
App Desc
Self-Train
Utterance
Test
Utterance
1
1
1
1
1
1
1
1 1
1
1 .90 .85 .97 .95
Feature
Enrichment
Utterance 1 i would like to contact alex
…
1
1
The feature-enriched MF-SLU unifies manually written knowledge and automatically
inferred semantics to predict high-level intents.
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 63
64. Intent Prediction– Multi-Turn Interaction [ICMI’15]
Input: multi-turn interaction
Output: apps the user plans to launch
Challenge: language ambiguity
1) User preference
2) App-level contexts
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data Available at http://AppDialogue.com/.
send to vivian
v.s.
Email? Message?
Communication
Idea: Behavioral patterns in history can help intent prediction.
previous
turn
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 64
65. Intent Prediction– Multi-Turn Interaction [ICMI’15]
Input: multi-turn interaction
Output: apps the user plans to launch
1
Lexical Intended App
photo check camera IMtell
take this photo
tell vivian this is me in the lab
CAMERA
IM
Train
Dialogue
check my grades on website
send an email to professor
…
CHROME
EMAIL
send
Behavior
History
null camera
.85
take a photo of this
send it to alice
CAMERA
IM
…
email
1
1
1 1
1
1 .70
chrome
1
1
1
1
1
1
chrome email
1
1
1
1
.95
.80 .55
User Utterance
Intended
App
Reasoning with Feature-Enriched MF
Test
Dialogue
take a photo of this
send it to alice
…
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data Available at http://AppDialogue.com/.
The feature-enriched MF-SLU leverages behavioral patterns to model contextual
information and user preference for better intent prediction.
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 65
66. Single-Turn Request: Mean Average Precision (MAP)
Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix
ASR Transcripts
LM MF-SLU LM MF-SLU
Word Observation 25.1 26.1
Feature Matrix
ASR Transcripts
MLR MF-SLU MLR MF-SLU
Word Observation 52.1 55.5
LM-Based IR Model (unsupervised)
Multinomial Logistic Regression
(supervised)
Experiments for Intent Prediction
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 66
67. Single-Turn Request: Mean Average Precision (MAP)
Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix
ASR Transcripts
LM MF-SLU LM MF-SLU
Word Observation 25.1 29.2 (+16.2%) 26.1 30.4 (+16.4%)
Feature Matrix
ASR Transcripts
MLR MF-SLU MLR MF-SLU
Word Observation 52.1 52.7 (+1.2%) 55.5 55.4 (-0.2%)
Modeling hidden semantics helps intent prediction especially for noisy data.
Experiments for Intent Prediction
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 67
68. Single-Turn Request: Mean Average Precision (MAP)
Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix
ASR Transcripts
LM MF-SLU LM MF-SLU
Word Observation 25.1 29.2 (+16.2%) 26.1 30.4 (+16.4%)
Word + Embedding-Based Semantics 32.0 33.3
Word + Type-Embedding-Based Semantics 31.5 32.9
Feature Matrix
ASR Transcripts
MLR MF-SLU MLR MF-SLU
Word Observation 52.1 52.7 (+1.2%) 55.5 55.4 (-0.2%)
Word + Behavioral Patterns 53.9 56.6
Semantic enrichment provides rich cues to improve performance.
Experiments for Intent Prediction
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 68
69. Single-Turn Request: Mean Average Precision (MAP)
Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix
ASR Transcripts
LM MF-SLU LM MF-SLU
Word Observation 25.1 29.2 (+16.2%) 26.1 30.4 (+16.4%)
Word + Embedding-Based Semantics 32.0 34.2 (+6.8%) 33.3 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics 31.5 32.2 (+2.1%) 32.9 34.0 (+3.4%)
Feature Matrix
ASR Transcripts
MLR MF-SLU MLR MF-SLU
Word Observation 52.1 52.7 (+1.2%) 55.5 55.4 (-0.2%)
Word + Behavioral Patterns 53.9 55.7 (+3.3%) 56.6 57.7 (+1.9%)
Intent prediction can benefit from both hidden information and low-level semantics.
Experiments for Intent Prediction
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 69
71. Personal Intelligent Architecture
Reactive
Assistance
ASR, LU, Dialog, LG, TTS
Proactive
Assistance
Inferences, User
Modeling, Suggestions
Data
Back-end Data
Bases, Services and
Client Signals
Device/Service End-points
(Phone, PC, Xbox, Web Browser, Messaging Apps)
User Experience
“call taxi”
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 71
72. Outline
Intelligent Assistant
◦ What are they?
◦ Why do we need them?
◦ Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS)
◦ Pipeline Architecture
◦ Current Challenges & Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 72
73. Conclusions
The work shows the feasibility and the potential for improving generalization,
maintenance, efficiency, and scalability of SDSs.
The proposed knowledge acquisition procedure enables systems to
automatically produce domain-specific ontologies.
The proposed MF-SLU unifies the automatically acquired knowledge, and
then allows systems to consider implicit semantics for better understanding.
◦ Better semantic representations for individual utterances
◦ Better high-level intent prediction about follow-up behaviors
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 73
74. Future Work
Apply the proposed technology to domain discovery
◦ not covered by the current systems but users are interested in
◦ guide the next developed domains
Improve the proposed approach by handling the uncertainty
SLU
SLU
Modeling
ASR
Knowledge
Acquisition
recognition
errors
unreliable
knowledge
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 74
75. d d d
U S1 S2
P(S1 | U) P(S2 | U)
…
Semantic Relation
Posterior Probability
Utterance
Slot Candidate
…
w1 w2 wd
Word Sequence: x
Word Vector: lw
Pooling Operation
R(U, S1) R(U, S2)
Knowledge Graph Propagation Matrix: Wp
Semantic Projection Matrix: Ws
Semantic Layer: y
Knowledge Graph Propagation Layer: lp
d
Sn
P(Sn | U)
Utterance Vector: lf
…
R(U, Sn)
Slot Vector: lf
Convolution Matrix: Wc
Convolutional Layer: lc
Towards Unsupervised Deep Learning
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 75
Treating MF as a one-layer neural
net, we can add more layers in
the model towards unsupervised
deep learning.
76. Take Home Message
Available big data w/o annotations
Challenge: how to acquire and
organize important knowledge, and
further utilize it for applications
Language understanding for AI
◦ language action
◦ understand voice to control music, lights, etc.
◦ teach to let friends in by face recognition, etc.
"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 76
Unsupervised or weakly-supervised
methods will be the future trend!
Deep language understanding is an
emerging field!
77. Q & A
THANKS FOR YOUR ATTENTIONS!!
• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper
Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., “Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings,” in Proc. of SLT, 2014.
• Chen et al., “Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., “Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., “Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.
77"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Editor's Notes
How can find a restaurant
Domain knowledge representation (graph)
Domain knowledge representation (graph)
Domain knowledge representation (graph)
Map lexical unit into concepts
Map lexical unit into concepts
What domain does the user wants? Mapp to domain; slot -> SQL
What domain does the user wants? Mapp to domain; slot -> SQL
Execute the action and predict behavior
Execute the action and predict behavior
I can help you to get there
New domain requires 1) hand-crafted ontology 2) slot representations for utterances (labelled info) 3) parser for mapping utterances into; labelling cost is too high; so I want to automate this proces
SEMAFOR outputs set of frames; hopefully includes all domain slots; pick some slots from SEMAFOR