3. Dialog System
[Figure: spoken dialog system pipeline]
A trigger (wake word) starts the interaction; the user's utterance then flows through Speech Recognition → Natural Language Understanding → Dialogue Management → Natural Language Generation → Speech Synthesis, which speaks the response. Backend Knowledge Providers support the Dialogue Management.
• Natural Language Understanding: domain identification, user intent detection, slot filling
• Dialogue Management: dialogue state tracking, dialogue policy optimization
Example flow from utterance to response:
• Text Input: "I will go out at weekends, what is the weather?"
• Semantic Frame: Ask_weather(date=weekends)
• System Action: Request_location
• Text Response: "Where will you go?" / "Where do you want to ask for the weather this weekend?"
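To make the example concrete, here is a minimal Python sketch (function and slot names are illustrative, not from the talk) of how the semantic frame above can drive a trivial dialogue policy that requests whichever required slot is still missing:

REQUIRED_SLOTS = {"Ask_weather": ["date", "location"]}

def next_action(frame):
    # Dialogue policy: request the first missing required slot.
    for slot in REQUIRED_SLOTS[frame["intent"]]:
        if slot not in frame["slots"]:
            return "Request_" + slot       # e.g. Request_location
    return "Inform_weather"                # all slots filled: answer

# Semantic frame from the slide: Ask_weather(date=weekends)
frame = {"intent": "Ask_weather", "slots": {"date": "weekends"}}
print(next_action(frame))                  # -> Request_location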
6. A solution for a trigger word system
[Figure: trigger word detection pipeline]
Wave sound → frequency domain → Convolutional Neural Network → Recurrent Neural Network → output: trigger word vs. unknown.
• The CNN does the pattern extraction, from low-level features to high-level features.
• The RNN is the classifier over the feature sequence.
Trigger phrases: "Hey Bot!" / "OK Bot!"… or whatever else you choose.
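A minimal PyTorch sketch of the CNN + RNN detector the slide describes; the input is a spectrogram (batch, mel bins, time frames), and all layer sizes are illustrative assumptions, not values from the talk:

import torch
import torch.nn as nn

class TriggerWordNet(nn.Module):
    def __init__(self, n_mels=40, hidden=64):
        super().__init__()
        # CNN: low-level -> high-level features along the time axis
        self.conv = nn.Sequential(
            nn.Conv1d(n_mels, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(32, hidden, kernel_size=5, padding=2), nn.ReLU(),
        )
        # RNN classifier over the feature sequence
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 2)   # trigger word vs. unknown

    def forward(self, spec):
        x = self.conv(spec)               # (batch, hidden, time)
        x = x.transpose(1, 2)             # (batch, time, hidden)
        _, h = self.rnn(x)                # h: (1, batch, hidden)
        return self.out(h[-1])            # logits: (batch, 2)

model = TriggerWordNet()
logits = model(torch.randn(1, 40, 100))   # 1 clip, 40 mel bins, 100 frames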
8. Speech Recognition
[Figure: ASR pipeline]
Speech wave → Pre-processing → Acoustic Features → Decoder → text.
The decoder combines three knowledge sources:
• Acoustic Model: maps acoustic features to phones.
• Acoustic Dictionary (pronunciation model): maps words to phone sequences, e.g.

WORD   PRON (IPA)
vợ     v ə ˨˩ˀ
quê    w e

• Language Model: scores word sequences, e.g. n-gram scores

NGRAM  SCORE
Vợ     2.5
quê    0.7

Example output: "Vợ tôi ở quê rất đẹp" ("My wife in the countryside is very beautiful").
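A toy sketch of the decoder's job: pick the word whose combined acoustic and language-model score is best. Only the dictionary entries and LM scores come from the slide; the acoustic model is a dummy stand-in.

PRON = {"vợ": ["v", "ə˨˩ˀ"], "quê": ["w", "e"]}   # pronunciation model
LM   = {"vợ": 2.5, "quê": 0.7}                    # n-gram scores (slide)

def acoustic_score(word, frames):
    # Stand-in for the acoustic model: how well the observed feature
    # frames match the word's phone sequence. Dummy constant here.
    return 1.0

def decode(frames, lm_weight=1.0):
    # The decoder combines acoustic score and LM score for each word.
    return max(PRON, key=lambda w: acoustic_score(w, frames)
                                   + lm_weight * LM[w])

print(decode(frames=None))   # "vợ" (2.5 beats 0.7 at equal acoustic score)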
17. Word Embedding
[Figure: four ways of turning words into vectors]
• Word Embedding: each word maps to a dense low-dimensional vector, e.g.
King (0.12, 0.23, 0.43)
Queen (0.14, 0.57, 0.88)
Man (0.44, 0.90, 0.11)
Woman (0.19, 0.23, 0.53)
Boy (0.12, 0.65, 0.42)
Girl (0.34, 0.44, 0.68)
• One-hot Encoding Vector: each word is a sparse vector with a single 1, e.g. (1, 0, 0, 0, 0, 0, 0), (0, 1, 0, 0, 0, 0, 0), (0, 0, 1, 0, 0, 0, 0), …
• Frequency Based Vector: a terms-by-documents count matrix; each row is a word vector, each column a document vector, e.g.

Terms\Docs  1   2   3   4
1          10   0   1   0
2           0   0   0   2
3           4   0   7   0
4           0   5   0  12

• Prediction Based Vector: vectors learned by predicting words from their context, so that related words (man, woman, queen, king) end up close together.
18. One-hot Encoding Vector
Corpus: Co gai, hot girl, xinh dep, truoc day, la, mot, chang trai, dam my

             1  2  3  4  5  6  7  8
Co gai       1  0  0  0  0  0  0  0
hot girl     0  1  0  0  0  0  0  0
xinh dep     0  0  1  0  0  0  0  0
truoc day    0  0  0  1  0  0  0  0
la           0  0  0  0  1  0  0  0
mot          0  0  0  0  0  1  0  0
chang trai   0  0  0  0  0  0  1  0
dam my       0  0  0  0  0  0  0  1

Each word gets a 1×8 vector representation.
What's wrong… (see the sketch below and the next slide)
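A minimal NumPy sketch of the 1×8 one-hot vectors above, showing what's wrong with them:

import numpy as np

corpus = ["Co gai", "hot girl", "xinh dep", "truoc day",
          "la", "mot", "chang trai", "dam my"]
one_hot = {w: np.eye(len(corpus))[i] for i, w in enumerate(corpus)}

print(one_hot["Co gai"])   # [1. 0. 0. 0. 0. 0. 0. 0.]
# What's wrong: every pair of distinct words is equally far apart,
# so one-hot vectors carry no notion of similarity.
print(one_hot["Co gai"] @ one_hot["hot girl"])   # 0.0 for ANY two words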
19. Custom Encoding Vector
Corpus: Co gai, hot girl, xinh dep, truoc day, la, mot, chang trai, dam my
Hand-picked features: nguoi (person), ban chat (quality), thoi gian (time), so dem (counting), nu tinh (femininity)

             nguoi  ban chat  thoi gian  so dem  nu tinh
Co gai         1       0          0        0       1
hot girl      0.7      1          0        0      0.7
xinh dep      0.6      1          0        0      0.5
truoc day      0       0          1        1       0
la             0       0          0        0       0
mot            0       0          0        1       0
chang trai     1       0          0        0       0
dam my        0.7      1          0        0       0

Each word gets a 1×5 vector representation.
20. Custom Encoding Vector
Same corpus and the same 1×5 vectors as on the previous slide, but now with better relationships: related words get similar vectors (e.g. "Co gai" and "hot girl" agree on the nguoi and nu tinh features). A similarity check follows below.
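A minimal sketch checking that the custom 1×5 vectors above really do encode similarity (cosine similarity, using rows from the table):

import numpy as np

vec = {
    "Co gai":    np.array([1.0, 0.0, 0.0, 0.0, 1.0]),
    "hot girl":  np.array([0.7, 1.0, 0.0, 0.0, 0.7]),
    "truoc day": np.array([0.0, 0.0, 1.0, 1.0, 0.0]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(vec["Co gai"], vec["hot girl"]))    # ~0.70: related words
print(cosine(vec["Co gai"], vec["truoc day"]))   # 0.0: unrelated words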
21. Count Vector
Let us understand this using a simple example.
• D1: He is a lazy boy. She is also lazy.
• D2: Neeraj is a lazy person.
Dictionary = ['He', 'She', 'lazy', 'boy', 'Neeraj', 'person']
D = 2 (# docs), N = 6 (# words in the dictionary)

      He  She  lazy  boy  Neeraj  person
D1     1    1     2    1       0       0
D2     0    0     1    0       1       1

The Count Vector matrix M is D×N; vector("lazy") = [2, 1].
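A sketch of the same count matrix built with scikit-learn's CountVectorizer (not named in the talk). We pass the slide's dictionary explicitly, lowercased because the vectorizer lowercases by default; otherwise sklearn would build its own vocabulary from the corpus:

from sklearn.feature_extraction.text import CountVectorizer

docs = ["He is a lazy boy. She is also lazy.",
        "Neeraj is a lazy person."]
vocab = ["he", "she", "lazy", "boy", "neeraj", "person"]

M = CountVectorizer(vocabulary=vocab).fit_transform(docs)
print(M.toarray())
# [[1 1 2 1 0 0]
#  [0 0 1 0 1 1]]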
22. TF-IDF vectorization
TF = (Number of times term t appears in a document) / (Number of terms in the document)
So, TF(This, Document1) = 1/8 and TF(This, Document2) = 1/5.
IDF = log(N/n), where N is the number of documents and n is the number of documents the term t has appeared in.
So, IDF(This) = log(2/2) = 0.
Let us compute IDF for the word 'Messi':
IDF(Messi) = log(2/1) = 0.301.
Now, let us compare the TF-IDF for a common word 'This' and a word 'Messi', which seems to be of relevance to Document 1:
TF-IDF(This, Document1) = (1/8) * 0 = 0
TF-IDF(This, Document2) = (1/5) * 0 = 0
TF-IDF(Messi, Document1) = (4/8) * 0.301 = 0.15
TF-IDF penalizes the common word 'This' but assigns greater weight to 'Messi'.
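A minimal sketch reproducing the slide's arithmetic. Only the counts come from the slide ('This' appears once in the 8-term Document 1 and once in the 5-term Document 2; 'Messi' appears 4 times in Document 1 and in 1 of the 2 documents):

import math

def tf_idf(count_in_doc, doc_len, n_docs, n_docs_with_term):
    tf = count_in_doc / doc_len
    idf = math.log10(n_docs / n_docs_with_term)  # slide uses log base 10
    return tf * idf

print(tf_idf(1, 8, 2, 2))   # TF-IDF(This, Document1)  -> 0.0
print(tf_idf(1, 5, 2, 2))   # TF-IDF(This, Document2)  -> 0.0
print(tf_idf(4, 8, 2, 1))   # TF-IDF(Messi, Document1) -> ~0.1505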
23. Co-Occurrence Matrix with a fixed context window
The big idea: similar words tend to occur together and will have similar contexts. For example:
"Apple is a fruit. Mango is a fruit."
Apple and mango tend to have a similar context, i.e. fruit.
Not preferred in practice: the full matrix is V×V for a vocabulary of size V, which takes huge memory (see the sketch below).
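A minimal sketch of a co-occurrence matrix with a fixed context window, built from the slide's toy corpus:

from collections import defaultdict

corpus = "apple is a fruit . mango is a fruit .".split()
window = 2   # fixed context window size

cooc = defaultdict(int)
for i, w in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if j != i:
            cooc[(w, corpus[j])] += 1

print(cooc[("apple", "is")], cooc[("mango", "is")])  # 1 1: shared context
# The full matrix has one row and one column per vocabulary word,
# which is why it is rarely used directly in practice.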
24. Prediction based Vector
• Continuous Bag-of-Words (CBOW) & Skip-Gram models
• Each row of the learned input weight matrix is a word vector.
• CBOW predicts P(word|context); Skip-Gram predicts P(context|word).
https://www.analyticsvidhya.com/blog/2017/06/word-embeddings-count-word2veec/
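A minimal sketch of training prediction-based vectors with gensim's word2vec (not part of the talk; gensim 4.x API assumed, toy corpus reused from the previous slide):

from gensim.models import Word2Vec

sentences = [["apple", "is", "a", "fruit"],
             ["mango", "is", "a", "fruit"]]

model = Word2Vec(sentences, vector_size=50, window=2, min_count=1,
                 sg=1)        # sg=1: Skip-Gram, sg=0: CBOW
print(model.wv["apple"])                       # a dense 50-d word vector
print(model.wv.similarity("apple", "mango"))   # learned similarity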
25. Intent and Entities
• Intent = topic/domain
• Entities = keywords
"Go home to have the dinner"
• Entities: Action = "Go", Location = "home", Object = "the dinner"
• Intent = "Home_activity"
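A minimal rule-based sketch of intent and entity extraction for the example above; the keyword lexicon and rule are illustrative assumptions:

ENTITY_LEXICON = {"go": "Action", "home": "Location", "dinner": "Object"}

def parse(text):
    tokens = text.lower().split()
    entities = {ENTITY_LEXICON[t]: t for t in tokens if t in ENTITY_LEXICON}
    intent = "Home_activity" if "home" in tokens else "unknown"
    return {"intent": intent, "entities": entities}

print(parse("Go home to have the dinner"))
# {'intent': 'Home_activity',
#  'entities': {'Action': 'go', 'Location': 'home', 'Object': 'dinner'}}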
28. Natural Language Generation
• Fixed responses + slot filling + random choice from a pool
User: Do you know "I'm really quite something"?
Bot: "I'm really quite something" was composed by Son Tung M-TP.
• Using a neural network and a language model to generate responses: not recommended (the output is hard to control).
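A minimal sketch of the recommended approach: fixed templates with slot filling, picking a phrasing at random from a pool. Template names and wording are illustrative:

import random

TEMPLATES = {
    "song_info": [
        '"{song}" was composed by {artist}.',
        'That is "{song}", a song by {artist}.',
    ],
}

def generate(intent, **slots):
    # Pick one template from the pool at random, then fill the slots.
    return random.choice(TEMPLATES[intent]).format(**slots)

print(generate("song_info",
               song="I'm really quite something",
               artist="Son Tung M-TP"))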
31. Tips
• Script Writer
• Personality
• Control the dialogue
• API saves time
• Label Intent and Entities
• Design the flow
• Expandable
• Lots of testing
40. Use case 1: Health-care
[Figure: health-care chatbot architecture]
• Speech ↔ Google virtual assistant: handles speech recognition and speech synthesis on the front end.
• Text is exchanged with the backend over a REST API: request text goes in, analyzed text comes back.
• NLP: extracts intent and entities from the text.
• Dialog Management: routes the intent and entities to the logical functions.
• Logical Functions: exchange data with the Emotion Detection module, the Health-Care System, and the Recommendation System.
Data cleaning: transform special characters, numbers, dates, and times into words. A Hidden Markov Model or a neural network can be used for this conversion because of ambiguous cases, such as reading a figure like 1984 as a year or as a plain number (see the sketch after this list). Especially in English, some words are written the same but pronounced differently.
Segmentation/Fragmentation: splitting text into words.
Phone dictionary: contains a mapping from each word to its phones.
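A minimal sketch of the number-to-words ambiguity mentioned under data cleaning, using the num2words library (an assumption; the talk does not name a tool):

from num2words import num2words

print(num2words(1984))              # "one thousand, nine hundred and eighty-four"
print(num2words(1984, to="year"))   # "nineteen eighty-four"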