
[Event Series] Understanding Conversational Bots in One Day (一天搞懂對話機器人)

台灣資料科學年會
Apr. 22, 2017



  1. 2 Future Life – Artificial Intelligence 2  Iron Man’s JARVIS https://www.facebook.com/zuck/videos/10103351034741311/
  2. 3 Tutorial Outline I. Dialogue Systems and Background Knowledge II. Language Understanding III. Dialogue Management IV. Language Generation V. Dialogue System Evaluation, Development, and Trends 3
  3. 4 Tutorial Outline I. Dialogue Systems and Background Knowledge II. Language Understanding III. Dialogue Management IV. Language Generation V. Dialogue System Evaluation, Development, and Trends 4
  4. 5 Outline  Introduction  Background Knowledge  Neural Network Basics  Reinforcement Learning  Modular Dialogue System  Spoken/Natural Language Understanding (SLU/NLU)  Dialogue Management  Dialogue State Tracking (DST)  Dialogue Policy Optimization  Natural Language Generation (NLG) 5
  5. Introduction Introduction 6
  6. 7 Early 1990s Early 2000s 2017 Multi-modal systems e.g., Microsoft MiPad, Pocket PC Keyword Spotting (e.g., AT&T) System: “Please say collect, calling card, person, third number, or operator” TV Voice Search e.g., Bing on Xbox Intent Determination (Nuance’s Emily™, AT&T HMIHY) User: “Uh…we want to move…we want to change our phone line from this house to another house” Task-specific argument extraction (e.g., Nuance, SpeechWorks) User: “I want to fly from Boston to New York next week.” Brief History of Dialogue Systems Apple Siri (2011) Google Now (2012) Facebook M & Bot (2015) Google Home (2016) Microsoft Cortana (2014) Amazon Alexa/Echo (2014) Google Assistant (2016) DARPA CALO Project Virtual Personal Assistants
  7. 8 Language Empowering Intelligent Assistant Apple Siri (2011) Google Now (2012) Facebook M & Bot (2015) Google Home (2016) Microsoft Cortana (2014) Amazon Alexa/Echo (2014) Google Assistant (2016)
  8. 9 Why Do We Need It?  Daily Life Usage  Weather  Schedule  Transportation  Restaurant Seeking 9
  9. 10 Why Do We Need It?  Get things done  E.g. set up an alarm/reminder, take a note  Easy access to structured data, services and apps  E.g. find docs/photos/restaurants  Assist your daily schedule and routine  E.g. commute alerts to/from work  Be more productive in managing your work and personal life 10
  10. 11 Why Natural Language?  Global Digital Statistics (January 2015) 11 Global Population 7.21B Active Internet Users 3.01B Active Social Media Accounts 2.08B Active Unique Mobile Users 3.65B  Device input is evolving towards speech as the more natural and convenient modality.
  11. 12 Intelligent Assistant Architecture 12 Reactive Assistance ASR, LU, Dialog, LG, TTS Proactive Assistance Inferences, User Modeling, Suggestions Data Back-end Data Bases, Services and Client Signals Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps) User Experience “restaurant suggestions”“call taxi”
  12. 13 Spoken Dialogue System (SDS)  Spoken dialogue systems are intelligent agents that are able to help users finish tasks more efficiently via spoken interactions.  Spoken dialogue systems are being incorporated into various devices (smart-phones, smart TVs, in-car navigating system, etc). 13 JARVIS – Iron Man’s Personal Assistant Baymax – Personal Healthcare Companion Good dialogue systems assist users to access information conveniently and finish tasks efficiently.
  13. 14 App → Bot  A bot is responsible for a “single” domain, similar to an app 14 Seamless and automatic information transfer across domains → reduces duplicate information and interaction Food MAP LINE Goal: Schedule lunch with Vivian KKBOX
  14. 15 GUI vs. CUI (Conversational UI) 15 https://github.com/enginebai/Movie-lol-android
  15. 16 GUI vs. CUI (Conversational UI) — GUI in a website/app vs. CUI in instant messaging:
   Scenario: exploratory, the user has no specific goal vs. search-driven, the user issues explicit requests
   Information volume: large vs. small
   Information precision: low vs. high
   Information presentation: structured vs. unstructured
   Interface presentation: mainly graphical vs. mainly text
   Interface operation: mainly point-and-click vs. mainly text or voice input
   Learning and guidance: the user must understand, learn, and adapt to different interface operations vs. as with a messaging app, no extra learning is needed, the user just follows the prompts
   Service entry point: requires downloading an app or visiting a specific website vs. integrated with messaging apps, services can be offered within the conversation
   Service content: needs a clear structure for retrieval vs. can accommodate large, flexible content
   Degree of human-likeness: low, like operating a machine vs. high, like talking to a person
   16
  16. 17 ChatBot Ecosystem & Company 17
  17. Two Branches of Bots  Personal assistant, helps users achieve a certain task  Combination of rules and statistical components  POMDP for spoken dialog systems (Williams and Young, 2007)  End-to-end trainable task-oriented dialogue system (Wen et al., 2016)  End-to-end reinforcement learning dialogue system (Li et al., 2017; Zhao and Eskenazi, 2016)  No specific goal, focus on natural responses  Using variants of seq2seq model  A neural conversation model (Vinyals and Le, 2015)  Reinforcement learning for dialogue generation (Li et al., 2016)  Conversational contextual cues for response ranking (Al-Rfou et al., 2016) 18 Task-Oriented Bot Chit-Chat Bot
  18. 19 Task-Oriented Dialogue System (Young, 2000) 19 Speech Recognition Language Understanding (LU) • Domain Identification • User Intent Detection • Slot Filling Dialogue Management (DM) • Dialogue State Tracking (DST) • Dialogue Policy Natural Language Generation (NLG) Hypothesis are there any action movies to see this weekend Semantic Frame request_movie genre=action, date=this weekend System Action/Policy request_location Text response Where are you located? Text Input Are there any action movies to see this weekend? Speech Signal Backend Action / Knowledge Providers http://rsta.royalsocietypublishing.org/content/358/1769/1389.short
  19. 20 Interaction Example 20 User Intelligent Agent Q: How does a dialogue system process this request? Good Taiwanese eating places include Din Tai Fung, Boiling Point, etc. What do you want to choose? I can help you go there. find a good eating place for taiwanese food
  20. 21 Task-Oriented Dialogue System (Young, 2000) 21 Speech Recognition Language Understanding (LU) • Domain Identification • User Intent Detection • Slot Filling Dialogue Management (DM) • Dialogue State Tracking (DST) • Dialogue Policy Natural Language Generation (NLG) Hypothesis are there any action movies to see this weekend Semantic Frame request_movie genre=action, date=this weekend System Action/Policy request_location Text response Where are you located? Text Input Are there any action movies to see this weekend? Speech Signal Backend Action / Knowledge Providers
  21. 22 1. Domain Identification Requires Predefined Domain Ontology 22 find a good eating place for taiwanese food User Organized Domain Knowledge (Database)Intelligent Agent Restaurant DB Taxi DB Movie DB Classification!
  22. 23 2. Intent Detection Requires Predefined Schema 23 find a good eating place for taiwanese food User Intelligent Agent Restaurant DB FIND_RESTAURANT FIND_PRICE FIND_TYPE : Classification!
  23. 24 3. Slot Filling Requires Predefined Schema find a good eating place for taiwanese food User Intelligent Agent 24 Restaurant DB Restaurant Rating Type Rest 1 good Taiwanese Rest 2 bad Thai : : : FIND_RESTAURANT rating=“good” type=“taiwanese” SELECT restaurant { rest.rating=“good” rest.type=“taiwanese” }Semantic Frame Sequence Labeling O O B-rating O O O B-type O
  24. 25 Task-Oriented Dialogue System (Young, 2000) 25 Speech Recognition Language Understanding (LU) • Domain Identification • User Intent Detection • Slot Filling Dialogue Management (DM) • Dialogue State Tracking (DST) • Dialogue Policy Natural Language Generation (NLG) Hypothesis are there any action movies to see this weekend Semantic Frame request_movie genre=action, date=this weekend System Action/Policy request_location Text response Where are you located? Text Input Are there any action movies to see this weekend? Speech Signal Backend Action / Knowledge Providers
  25. 26 State Tracking Requires Hand-Crafted States User Intelligent Agent find a good eating place for taiwanese food / i want it near to my office 26 States: NULL, location, rating, type, {loc, rating}, {rating, type}, {loc, type}, all
  26. 27 State Tracking Requires Hand-Crafted States User Intelligent Agent find a good eating place for taiwanese food / i want it near to my office 27 States: NULL, location, rating, type, {loc, rating}, {rating, type}, {loc, type}, all
  27. 28 State Tracking Handling Errors and Confidence User Intelligent Agent find a good eating place for taixxxx food 28 Hypotheses: FIND_RESTAURANT(rating=“good”, type=“taiwanese”), FIND_RESTAURANT(rating=“good”, type=“thai”), FIND_RESTAURANT(rating=“good”, type=?) States: NULL, location, rating, type, {loc, rating}, {rating, type}, {loc, type}, all
  28. 29 Policy for Agent Action & Generation  Inform  “The nearest one is at Taipei 101”  Request  “Where is your home?”  Confirm  “Did you want Taiwanese food?”  Database Search  Task Completion / Information Display  ticket booked, weather information 29 Din Tai Fung : :
  29. Background Knowledge30 Neural Network Basics Reinforcement Learning
  30. 31 Outline  Introduction  Background Knowledge  Neural Network Basics  Reinforcement Learning  Modular Dialogue System  Spoken/Natural Language Understanding (SLU/NLU)  Dialogue Management  Dialogue State Tracking (DST)  Dialogue Policy Optimization  Natural Language Generation (NLG) 31
  31. 32 What Can Computers Do? 32 Programs can do the things you ask them to do
  32. 33 Program for Solving Tasks  Task: predicting positive or negative given a product review 33 “I love this product!” (+) “It claims too much.” (-) “It’s a little expensive.” (?) “台灣第一波上市!” (推 = upvote) “規格好雞肋…” (噓 = boo) “樓下買了我才考慮” (?) program.py: if input contains “love”, “like”, etc. → output = positive; if input contains “too much”, “bad”, etc. → output = negative  Some tasks are complex, and we don’t know how to write a program to solve them.
  33. 34 Learning ≈ Looking for a Function  Task: predicting positive or negative given a product review 34 “I love this product!” (+) “It claims too much.” (-) “It’s a little expensive.” (?) “台灣第一波上市!” (推) “規格好雞肋…” (噓) “樓下買了我才考慮” (?) — each review is fed to the same function f  Given a large amount of data, the machine learns what the function f should be.
  34. 35 Learning ≈ Looking for a Function  Speech Recognition: f(audio) = “你好”  Image Recognition: f(image) = “cat”  Go Playing: f(board) = “5-5” (next move)  Chat Bot: f(“中研院在哪裡”) = “地址為… 需要為你導航嗎”
  35. 36 Framework 36 Image Recognition: a model is a set of candidate functions f1, f2, …; e.g. f1(image) = “cat”, f1(image) = “dog”, f2(image) = “monkey”, f2(image) = “snake”
  36. 37 Framework 37 Image Recognition: f(image) = “cat”  Model: a set of functions f1, f2, …  Training data (images labeled “monkey”, “cat”, “dog”) measures the goodness of each function f → pick the “best” function f*  Training is to pick the best function given the observed data; testing is to predict the label using the learned function f*
  37. 38 Machine Learning 38 Machine Learning Unsupervised Learning Supervised Learning Reinforcement Learning
  38. 39 Deep Learning  DL is a general purpose framework for representation learning  Given an objective  Learn the representation that is required to achieve the objective  Directly from raw inputs  Use minimal domain knowledge 39 (a network maps an input vector x ∈ R^N to an output vector y ∈ R^M)
  39. 40 A Single Neuron  z = w1 x1 + w2 x2 + … + wN xN + b (bias)  Activation function: sigmoid σ(z) = 1 / (1 + e^(−z))  Output y = σ(z)  w, b are the parameters of this neuron 40
  40. 41 A Single Neuron  f: R^N → R  With a sigmoid output, decide “is 2” if y ≥ 0.5 and “not 2” if y < 0.5  A single neuron can only handle binary classification 41
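To make the neuron above concrete, here is a minimal NumPy sketch (not from the slides) of z = w·x + b followed by the sigmoid activation, with the output thresholded at 0.5 for the “is 2 / not 2” decision; all numbers are made up.

    import numpy as np

    def sigmoid(z):
        # sigmoid activation: squashes z into (0, 1)
        return 1.0 / (1.0 + np.exp(-z))

    def single_neuron(x, w, b):
        # z = w1*x1 + w2*x2 + ... + wN*xN + b, then y = sigmoid(z)
        return sigmoid(np.dot(w, x) + b)

    x = np.array([0.3, 0.8, 0.1])    # input features (made-up values)
    w = np.array([1.5, -2.0, 0.4])   # weights (parameters of the neuron)
    b = 0.1                          # bias (parameter of the neuron)
    y = single_neuron(x, w, b)
    print(y, "is 2" if y >= 0.5 else "not 2")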
  41. 42 A Layer of Neurons  Handwriting digit classification: f: R^N → R^M  A layer of neurons can handle multiple possible outputs, and the result is the class whose neuron output is the maximum  e.g. 10 neurons for 10 digit classes, one per decision (“1” or not, “2” or not, “3” or not, …) — which one is max?
  42. 43 Deep Neural Networks (DNN)  Fully connected feedforward network f: R^N → R^M  The input vector x ∈ R^N passes through Layer 1 … Layer L to produce the output vector y ∈ R^M  Deep NN: multiple hidden layers
  43. 44 Loss / Objective 44  The loss 𝑙 measures how far the network output (after softmax) is from the target (e.g. the one-hot vector for digit “1”)  Loss can be square error or cross entropy between the network output and target  A good function should make the loss of all examples as small as possible, given a set of parameters
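A minimal NumPy sketch of the loss described above: the network outputs are turned into a distribution with softmax and compared against a one-hot target (digit “1”) with cross entropy; square error is included for contrast. The logits are random placeholders, not values from the lecture.

    import numpy as np

    def softmax(logits):
        e = np.exp(logits - np.max(logits))   # subtract max for numerical stability
        return e / e.sum()

    def cross_entropy(y_pred, y_target):
        return -np.sum(y_target * np.log(y_pred + 1e-12))

    def square_error(y_pred, y_target):
        return np.sum((y_pred - y_target) ** 2)

    logits = np.random.randn(10)              # raw outputs for the 10 digit classes
    y_pred = softmax(logits)                  # network output as a distribution
    y_target = np.zeros(10)
    y_target[1] = 1.0                         # one-hot target for digit "1"
    print(cross_entropy(y_pred, y_target), square_error(y_pred, y_target))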
  44. 45 Recurrent Neural Network (RNN) http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/  The network is unrolled over time; the activation is typically tanh or ReLU  RNN can learn accumulated sequential information (time-series)
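A minimal NumPy sketch of one recurrent step in its common form, h_t = tanh(W_xh x_t + W_hh h_{t-1} + b); the dimensions and random inputs are placeholders, not the lecture’s own code.

    import numpy as np

    def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
        # the new hidden state mixes the current input with the previous hidden
        # state, so it accumulates information over the sequence
        return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

    rng = np.random.default_rng(0)
    W_xh = rng.normal(size=(3, 4))            # input-to-hidden weights (4-dim input)
    W_hh = rng.normal(size=(3, 3))            # hidden-to-hidden weights (3-dim state)
    b_h = np.zeros(3)
    h = np.zeros(3)                           # initial hidden state
    for x_t in rng.normal(size=(5, 4)):       # a sequence of 5 time steps
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)
    print(h)                                  # final state summarizes the sequence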
  45. 46 Model Training  All model parameters can be updated by SGD, comparing the predicted outputs yt-1, yt, yt+1 against their targets 46 http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
  46. 47 BPTT 47  Forward pass: compute the hidden states s1, s2, s3, s4 and outputs y1 … y4 for each time step  Backward pass: backpropagate the costs 𝐶(1), 𝐶(2), 𝐶(3), 𝐶(4) through time  The model is trained by comparing the correct sequence tags and the predicted ones
  47. 48 Outline  Introduction  Background Knowledge  Neural Network Basics  Reinforcement Learning  Modular Dialogue System  Spoken/Natural Language Understanding (SLU/NLU)  Dialogue Management  Dialogue State Tracking (DST)  Dialogue Policy Optimization  Natural Language Generation (NLG) 48
  48. 49 Reinforcement Learning  RL is a general purpose framework for decision making  RL is for an agent with the capacity to act  Each action influences the agent’s future state  Success is measured by a scalar reward signal  Goal: select actions to maximize future reward Big three: action, state, reward
  49. 50 Reinforcement Learning 50 Agent Environment Observation Action Reward: “Don’t do that”
  50. 51 Reinforcement Learning 51 Agent Environment Observation Action Reward: “Thank you.” Agent learns to take actions to maximize expected reward.
  51. Scenario of Reinforcement Learning Agent learns to take actions to maximize expected reward. 52 Environment Observation ot Action at Reward rt If win, reward = 1 If loss, reward = -1 Otherwise, reward = 0 Next Move
  52. 53 Supervised vs. Reinforcement  Supervised: learning from a teacher (“Hello” → say “Hi”, “Bye bye” → say “Good bye”)  Reinforcement: learning from a critic (the agent converses, and only receives feedback such as “Bad”) 53
  53. 54 Supervised vs. Reinforcement  Supervised Learning  Training based on supervisor/label/annotation  Feedback is instantaneous  Time does not matter 54  Reinforcement Learning  Training only based on reward signal  Feedback is delayed  Time matters  Agent actions affect subsequent data
  54. 55 Reward  Reinforcement learning is based on the reward hypothesis  A reward rt is a scalar feedback signal  Indicates how well the agent is doing at step t  Reward hypothesis: all agent goals can be described by the maximization of expected cumulative reward
  55. 56 Sequential Decision Making  Goal: select actions to maximize total future reward  Actions may have long-term consequences  Reward may be delayed  It may be better to sacrifice immediate reward to gain more long-term reward 56
  56. 57 Deep Reinforcement Learning Environment Observation Action Reward  The observation is the function input, the action is the function output, and the reward is used to pick the best function  The function is represented by a DNN
  57. Modular Dialogue System58
  58. 59 Outline  Introduction  Background Knowledge  Neural Network Basics  Reinforcement Learning  Modular Dialogue System  Spoken/Natural Language Understanding (SLU/NLU)  Dialogue Management  Dialogue State Tracking (DST)  Dialogue Policy Optimization  Natural Language Generation (NLG) 59
  59. 60 Semantic Frame Representation  Requires a domain ontology  Contains core content (intent, a set of slots with fillers) find a cheap taiwanese restaurant in oakland show me action movies directed by james cameron find_restaurant (price=“cheap”, type=“taiwanese”, location=“oakland”) find_movie (genre=“action”, director=“james cameron”) Restaurant Domain Movie Domain restaurant typeprice location movie yeargenre director 60
  60. 61 Language Understanding (LU)  Pipelined 61 1. Domain Classification 2. Intent Classification 3. Slot Filling
  61. LU – Domain/Intent Classification • Given a collection of utterances ui with labels ci, D= {(u1,c1),…,(un,cn)} where ci ∊ C, train a model to estimate labels for new utterances uk. Mainly viewed as an utterance classification task 62 find me a cheap taiwanese restaurant in oakland Movies Restaurants Sports Weather Music … Find_movie Buy_tickets Find_restaurant Book_table Find_lyrics …
  62. 63 Language Understanding - Slot Filling 63 Is there um a cheap place in the centre of town please? → O O O O B-price O O O B-area I-area I-area O Figure credit: Milica Gašić As a sequence tagging task • CRF for tagging each utterance As a classification task • SVM for each slot-value pair
  63. 64 Outline  Introduction  Background Knowledge  Neural Network Basics  Reinforcement Learning  Modular Dialogue System  Spoken/Natural Language Understanding (SLU/NLU)  Dialogue Management  Dialogue State Tracking (DST)  Dialogue Policy Optimization  Natural Language Generation (NLG) 64
  64. 65 Dialogue State Tracking (DST)  Maintain a probabilistic distribution instead of a 1-best prediction for better robustness 65 Slide credit: Sungjin Lee Incorrect for both!
  65. 66 Dialogue State Tracking (DST)  Maintain a probabilistic distribution instead of a 1-best prediction for better robustness to SLU errors or ambiguous input 66 How can I help you? Book a table at Sumiko for 5 How many people? 3 Slot Value # people 5 (0.5) time 5 (0.5) Slot Value # people 3 (0.8) time 5 (0.8)
  66. 67 Outline  Introduction  Background Knowledge  Neural Network Basics  Reinforcement Learning  Modular Dialogue System  Spoken/Natural Language Understanding (SLU/NLU)  Dialogue Management  Dialogue State Tracking (DST)  Dialogue Policy Optimization  Natural Language Generation (NLG) 67
  67. 68 Dialogue Policy Optimization  Dialogue management in an RL framework 68 User Reward R Observation O Action A Environment Agent Natural Language Generation Language Understanding Dialogue Manager Slide credit: Pei-Hao Su The optimized dialogue policy selects the best action that can maximize the future reward. Correct rewards are a crucial factor in dialogue policy training
  68. 69 Reward for RL ≅ Evaluation for SDS  Dialogue is a special RL task  Humans are involved in the interaction and in rating (evaluating) the dialogue  Fully human-in-the-loop framework  Rating: correctness, appropriateness, and adequacy - Expert rating: high quality, high cost - User rating: unreliable quality, medium cost - Objective rating: checks desired aspects, low cost 69
  69. 70 Dialogue Reinforcement Signal Typical Reward Function  per-turn penalty of -1  Large reward at completion if successful  Typically requires domain knowledge ✔ Simulated user ✔ Paid users (Amazon Mechanical Turk) ✖ Real users 70
  70. 71 User Simulation  User Simulation  Goal: generate natural and reasonable conversations to enable reinforcement learning for exploring the policy space  Approach  Rule-based crafted by experts (Li et al., 2016)  Learning-based (Schatzmann et al., 2006) Dialogue Corpus Simulated User Real User Dialogue Management (DM) • Dialogue State Tracking (DST) • Dialogue Policy Interaction 71
  71. 72 Outline  Introduction  Background Knowledge  Neural Network Basics  Reinforcement Learning  Modular Dialogue System  Spoken/Natural Language Understanding (SLU/NLU)  Dialogue Management  Dialogue State Tracking (DST)  Dialogue Policy Optimization  Natural Language Generation (NLG) 72
  72. 73 Natural Language Generation (NLG)  Mapping semantic frame into natural language inform(name=Seven_Days, foodtype=Chinese) Seven Days is a nice Chinese restaurant 73
  73. 74 Template-Based NLG  Define a set of rules to map frames to NL 74 Pros: simple, error-free, easy to control Cons: time-consuming, poor scalability Semantic Frame Natural Language confirm() “Please tell me more about the product you are looking for.” confirm(area=$V) “Do you want somewhere in the $V?” confirm(food=$V) “Do you want a $V restaurant?” confirm(food=$V,area=$W) “Do you want a $V restaurant in the $W?”
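A minimal Python sketch of the table above: each (dialogue act, slot set) pair selects a template and the slot values fill the placeholders. The dictionary layout is an assumption for illustration, not the system’s actual implementation.

    TEMPLATES = {
        ("confirm",): "Please tell me more about the product you are looking for.",
        ("confirm", "area"): "Do you want somewhere in the {area}?",
        ("confirm", "food"): "Do you want a {food} restaurant?",
        ("confirm", "area", "food"): "Do you want a {food} restaurant in the {area}?",
    }

    def generate(dialogue_act, slots):
        # pick the template keyed by the act plus the sorted slot names,
        # then fill in the slot values
        key = (dialogue_act, *sorted(slots))
        return TEMPLATES[key].format(**slots)

    print(generate("confirm", {"food": "Thai", "area": "centre"}))
    # -> Do you want a Thai restaurant in the centre?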
  74. 75 RNN Language Generator (Wen et al., 2015) <BOS> SLOT_NAME serves SLOT_FOOD . <BOS> EAT serves British . delexicalisation Inform(name=EAT, food=British) 0, 0, 1, 0, 0, …, 1, 0, 0, …, 1, 0, 0, 0, 0, 0… … dialog act 1-hot representation SLOT_NAME serves SLOT_FOOD . <EOS> Weight tying 75
  75. 76 Concluding Remarks  Modular dialogue system 76 Speech Recognition Language Understanding (LU) • Domain Identification • User Intent Detection • Slot Filling Dialogue Management (DM) • Dialogue State Tracking (DST) • Dialogue Policy Natural Language Generation (NLG) Hypothesis are there any action movies to see this weekend Semantic Frame request_movie genre=action, date=this weekend System Action/Policy request_location Text response Where are you located? Text Input Are there any action movies to see this weekend? Speech Signal
  76. 77 Tutorial Outline I. Dialogue Systems and Background Knowledge II. Language Understanding III. Dialogue Management IV. Language Generation V. Dialogue System Evaluation, Development, and Trends 77
  77. Language Understanding – Ontology and Database 78
  78. 79 Task-Oriented Dialogue System (Young, 2000) 79 Speech Recognition Language Understanding (LU) • Domain Identification • User Intent Detection • Slot Filling Dialogue Management (DM) • Dialogue State Tracking (DST) • Dialogue Policy Natural Language Generation (NLG) Hypothesis are there any action movies to see this weekend Semantic Frame request_movie genre=action, date=this weekend System Action/Policy request_location Text response Where are you located? Text Input Are there any action movies to see this weekend? Speech Signal Backend Database/ Knowledge Providers http://rsta.royalsocietypublishing.org/content/358/1769/1389.short
  79. 80 Task-Oriented Dialogue System (Young, 2000) 80 Speech Recognition Language Understanding (LU) • Domain Identification • User Intent Detection • Slot Filling Dialogue Management (DM) • Dialogue State Tracking (DST) • Dialogue Policy Natural Language Generation (NLG) Hypothesis are there any action movies to see this weekend Semantic Frame request_movie genre=action, date=this weekend System Action/Policy request_location Text response Where are you located? Text Input Are there any action movies to see this weekend? Speech Signal Backend Action / Knowledge Providers http://rsta.royalsocietypublishing.org/content/358/1769/1389.short
  80. 81 Semantic Frame Representation  Requires a domain ontology: early connection to backend  Contains core concept (intent, a set of slots with fillers) find me a cheap taiwanese restaurant in oakland show me action movies directed by james cameron find_restaurant (price=“cheap”, type=“taiwanese”, location=“oakland”) find_movie (genre=“action”, director=“james cameron”) Restaurant Domain Movie Domain restaurant typeprice location movie yeargenre director 81
  81. 82 Backend Database / Ontology 82  Domain-specific table  Target and attributes Movie Name Theater Rating Date Time 美女與野獸 台北信義威秀 8.5 2017/03/21 09:00 美女與野獸 台北信義威秀 8.5 2017/03/21 09:25 美女與野獸 台北信義威秀 8.5 2017/03/21 10:15 美女與野獸 台北信義威秀 8.5 2017/03/21 10:40 美女與野獸 台北信義威秀 8.5 2017/03/21 11:05 movie name daterating theater time
  82. 83 Supported Functionality  Information access  Finding the specific entries from the table  E.g. available theater, movie rating, etc.  Task completion  Finding the row that satisfies the constraints 83 Movie Name Theater Rating Date Time 美女與野獸 台北信義威秀 8.5 2017/03/21 09:00 美女與野獸 台北信義威秀 8.5 2017/03/21 09:25 美女與野獸 台北信義威秀 8.5 2017/03/21 10:15 美女與野獸 台北信義威秀 8.5 2017/03/21 10:40 美女與野獸 台北信義威秀 8.5 2017/03/21 11:05
  83. 84 Dialogue Schema  Slot: domain-specific attributes  Columns from the table  e.g. theater, date 84 Movie Name Theater Rating Date Time 美女與野獸 台北信義威秀 8.5 2017/03/21 09:00 美女與野獸 台北信義威秀 8.5 2017/03/21 09:25 美女與野獸 台北信義威秀 8.5 2017/03/21 10:15 美女與野獸 台北信義威秀 8.5 2017/03/21 10:40 美女與野獸 台北信義威秀 8.5 2017/03/21 11:05
  84. 85 Dialogue Schema  Dialogue Act  inform, request, confirm (system only)  Task-specific action (e.g. book_ticket)  Others (e.g. thanks) 85 Find me the Bill Murray’s movie. I think it came out in 1993. When was it released? BotUser request(moviename; actor=Bill Murray) request(releaseyear) inform(releaseyear=1993) User Intent = Dialogue Act + Slot System Action = Dialogue Act + Slot
  85. 86 Examples  User: 我想要聽適合唸書的歌 (“I want to listen to songs suitable for studying”) request(song; situation=“唸書”)  Requires situation labels for songs  User: 請問附近有類似有錢真好的店嗎? (“Is there a restaurant nearby similar to 有錢真好?”) request(restaurant; similar_to=“有錢真好”)  Requires similar_to labels for restaurants 86
  86. 87 Examples  User: 離散數學的教室是哪間? (“Which room is the Discrete Mathematics class in?”)  User: 週四7,8,9有資訊系的課嗎? (“Are there any CS department courses in periods 7–9 on Thursday?”) 87
  87. 88 Trick  The typed constraints are attributes/slots stored in the columns 88
  88. 89 Table & Graph Backend 89 https://www.google.com/intl/es419/insidesearch/features/search/knowledge.html
  89. 90 Data Collection  Given the task-specific goal, what do users say? 90 Movie Name Theater Rating Date Time 美女與野獸 台北信義威秀 8.5 2017/03/21 09:00 美女與野獸 台北信義威秀 8.5 2017/03/21 09:25 美女與野獸 台北信義威秀 8.5 2017/03/21 10:15 美女與野獸 台北信義威秀 8.5 2017/03/21 10:40 美女與野獸 台北信義威秀 8.5 2017/03/21 11:05 Information access: 你知道美女與野獸電影的評價如何嗎? (“Do you know how the movie 美女與野獸 is rated?”) Task completion: 我想看十點左右在信義威秀的電影 (“I want to see a movie around ten at 信義威秀”)
  90. 91 Annotation Design  Query target: intent  request + targeted attribute  Constraints: slots & values  Attributes with specified values 你知道美女與野獸電影的評價如何嗎? request(rating; moviename=美女與野獸) book_ticket(time=十點左右, theater=信義威秀) 我想看十點左右在信義威秀的電影
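One possible way to store the two annotations above as records; the field names and layout are a hypothetical illustration of the intent/constraint split, not a prescribed schema.

    annotations = [
        {   # 你知道美女與野獸電影的評價如何嗎?
            "intent": "request",
            "requested_slot": "rating",
            "constraints": {"moviename": "美女與野獸"},
        },
        {   # 我想看十點左右在信義威秀的電影
            "intent": "book_ticket",
            "requested_slot": None,
            "constraints": {"time": "十點左右", "theater": "信義威秀"},
        },
    ]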
  91. 92 Extension  Multiple tables as backend  Each table is responsible for a domain  Associate intent with corresponding table/domain Movie Name Theater Date Time 美女與野獸 台北信義威秀 2017/03/21 09:00 美女與野獸 台北信義威秀 2017/03/21 09:25 美女與野獸 台北信義威秀 2017/03/21 10:15 Movie Name Cast Release Year 美女與野獸 Emma Watson 2017 鋼鐵人 : : 你知道美女與野獸是那一年的電影嗎? request(kng.release_year; moviename=美女與野獸) Ticket Table Knowledge Table
  92. 93 Concluding Remarks  Semantic schema highly corresponds to backend DB Movie Name Theater Rating Date Time 美女與野獸 台北信義威秀 8.5 2017/03/21 09:00 美女與野獸 台北信義威秀 8.5 2017/03/21 09:25 美女與野獸 台北信義威秀 8.5 2017/03/21 10:15 美女與野獸 台北信義威秀 8.5 2017/03/21 10:40 美女與野獸 台北信義威秀 8.5 2017/03/21 11:05 slot show me the rating of beauty and beast find_rating (movie=“beauty and beast”) movie timerating date
  93. Language Understanding 94
  94. 95 Task-Oriented Dialogue System (Young, 2000) 95 Speech Recognition Language Understanding (LU) • Domain Identification • User Intent Detection • Slot Filling Dialogue Management (DM) • Dialogue State Tracking (DST) • Dialogue Policy Natural Language Generation (NLG) Hypothesis are there any action movies to see this weekend Semantic Frame request_movie genre=action, date=this weekend System Action/Policy request_location Text response Where are you located? Text Input Are there any action movies to see this weekend? Speech Signal Backend Database/ Knowledge Providers http://rsta.royalsocietypublishing.org/content/358/1769/1389.short
  95. 96 Task-Oriented Dialogue System (Young, 2000) 96 Speech Recognition Language Understanding (LU) • Domain Identification • User Intent Detection • Slot Filling Dialogue Management (DM) • Dialogue State Tracking (DST) • Dialogue Policy Natural Language Generation (NLG) Hypothesis are there any action movies to see this weekend Semantic Frame request_movie genre=action, date=this weekend System Action/Policy request_location Text response Where are you located? Text Input Are there any action movies to see this weekend? Speech Signal Backend Action / Knowledge Providers http://rsta.royalsocietypublishing.org/content/358/1769/1389.short
  96. Classic Language Understanding97
  97. 98 Language Understanding (LU)  Pipelined 98 1. Domain Classification 2. Intent Classification 3. Slot Filling
  98. LU – Domain/Intent Classification As an utterance classification task • Given a collection of utterances ui with labels ci, D= {(u1,c1),…,(un,cn)} where ci ∊ C, train a model to estimate labels for new utterances uk. 99 find me a cheap taiwanese restaurant in oakland Movies Restaurants Music Sports … find_movie, buy_tickets find_restaurant, find_price, book_table find_lyrics, find_singer … Domain Intent
  99. 100 Conventional Approach 100 Data Model Prediction dialogue utterances annotated with domains/intents domains/intents machine learning classification model e.g. support vector machine (SVM)
  100. 101 Theory: Support Vector Machine  SVM is a maximum margin classifier  Input data points are mapped into a high dimensional feature space where the data is linearly separable  Support vectors are input data points that lie on the margin 101 http://www.csie.ntu.edu.tw/~htlin/mooc/
  101. 102 Theory: Support Vector Machine  Multiclass SVM  Extended using one-versus-rest approach  Then transformed into probabilities http://www.csie.ntu.edu.tw/~htlin/mooc/ SVM1 SVM2 SVM3 … SVMk → scores S1 S2 S3 … Sk for each class → probabilities P1 P2 P3 … Pk for each class Domain/intent can be decided based on the estimated scores
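A minimal scikit-learn sketch of this kind of classifier (bag-of-words features plus a linear SVM, which scikit-learn trains one-versus-rest); the utterances and intent labels are made up, and calibrating the scores into probabilities (e.g. Platt scaling) is omitted for brevity.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    train_utterances = [
        "find me a cheap taiwanese restaurant in oakland",
        "book a table for two tonight",
        "show me action movies this weekend",
        "buy two tickets for the new movie",
    ]
    train_intents = ["find_restaurant", "book_table", "find_movie", "buy_tickets"]

    # LinearSVC uses a one-versus-rest scheme for multiclass problems
    clf = make_pipeline(CountVectorizer(), LinearSVC())
    clf.fit(train_utterances, train_intents)
    print(clf.predict(["any cheap thai place nearby"]))   # predicted intent label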
  102. LU – Slot Filling 103 flights from Boston to New York today — Entity tags: O O B-city O B-city I-city O; Slot tags: O O B-dept O B-arrival I-arrival B-date As a sequence tagging task • Given a collection of tagged word sequences, S={((w1,1,w1,2,…, w1,n1), (t1,1,t1,2,…,t1,n1)), ((w2,1,w2,2,…,w2,n2), (t2,1,t2,2,…,t2,n2)) …} where ti ∊ M, the goal is to estimate tags for a new word sequence.
  103. 104 Conventional Approach 104 Data Model Prediction dialogue utterances annotated with slots slots and their values machine learning tagging model e.g. conditional random fields (CRF)
  104. 105 Theory: Conditional Random Fields  CRF assumes that the label at time step t depends on the label in the previous time step t-1  Maximize the log probability log p(y | x) with respect to parameters λ 105 input output Slots can be tagged based on the y that maximizes p(y|x)
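A minimal sketch of CRF slot tagging using the sklearn-crfsuite package (an external library, `pip install sklearn-crfsuite`); the per-token features and the single training example are made up for illustration, and a real system would train on many tagged utterances.

    import sklearn_crfsuite

    def token_features(tokens, i):
        # simple per-token features: the word, its lowercase form, and its neighbours
        feats = {"word": tokens[i], "lower": tokens[i].lower(),
                 "is_first": i == 0, "is_last": i == len(tokens) - 1}
        if i > 0:
            feats["prev"] = tokens[i - 1].lower()
        if i + 1 < len(tokens):
            feats["next"] = tokens[i + 1].lower()
        return feats

    sentence = "flights from Boston to New York today".split()
    tags = ["O", "O", "B-dept", "O", "B-arrival", "I-arrival", "B-date"]

    X = [[token_features(sentence, i) for i in range(len(sentence))]]
    y = [tags]

    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
    crf.fit(X, y)               # maximizes log p(y | x) with respect to the parameters
    print(crf.predict(X)[0])    # predicted slot tags, one per token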
  105. Neural Network Based LU106
  106. 107 Recurrent Neural Network (RNN) http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/ : tanh, ReLU time RNN can learn accumulated sequential information (time-series)
  107. 108 Deep Learning Approach 108 Data Model Prediction dialogue utterances annotated with semantic frames (user intents & slots) user intents, slots and their values deep learning model (classification/tagging) e.g. recurrent neural networks (RNN)
  108. 109 Classification Model  Input: each utterance ui is represented as a feature vector fi  Output: a domain/intent label ci for each input utterance 109 As an utterance classification task • Given a collection of utterances ui with labels ci, D= {(u1,c1),…,(un,cn)} where ci ∊ C, train a model to estimate labels for new utterances uk. How to represent a sentence using a feature vector
  109. Sequence Tagging Model 110 As a sequence tagging task • Given a collection of tagged word sequences, S={((w1,1,w1,2,…, w1,n1), (t1,1,t1,2,…,t1,n1)), ((w2,1,w2,2,…,w2,n2), (t2,1,t2,2,…,t2,n2)) …} where ti ∊ M, the goal is to estimate tags for a new word sequence.  Input: each word wi,j is represented as a feature vector fi,j  Output: a slot label ti for each word in the utterance How to represent a word using a feature vector
  110. 111 Word Representation  Atomic symbols: one-hot representation 111 car = [0 0 0 0 0 0 1 0 0 … 0], motorcycle = [0 0 1 0 0 0 0 0 0 … 0]; car AND motorcycle = 0  Issue: difficult to compute the similarity (i.e. comparing “car” and “motorcycle”)
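A tiny NumPy illustration of the issue: any two distinct one-hot vectors have zero dot product, so “car” and “motorcycle” look no more similar than any other word pair (toy vocabulary, not the real one).

    import numpy as np

    vocab = ["car", "motorcycle", "dog"]       # toy vocabulary

    def one_hot(word):
        v = np.zeros(len(vocab))
        v[vocab.index(word)] = 1.0
        return v

    print(np.dot(one_hot("car"), one_hot("motorcycle")))   # 0.0: no notion of similarity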
  111. 112 Word Representation  Neighbor-based: low-dimensional dense word embedding 112 Idea: words with similar meanings often have similar neighbors
  112. 113 Chinese Input Unit of Representation  Character  Feed each char to each time step  Word  Word segmentation required 你知道美女與野獸電影的評價如何嗎? 你/知道/美女與野獸/電影/的/評價/如何/嗎 Can two types of information fuse together for better performance?
  113. LU – Domain/Intent Classification As an utterance classification task • Given a collection of utterances ui with labels ci, D= {(u1,c1),…,(un,cn)} where ci ∊ C, train a model to estimate labels for new utterances uk. 114 find me a cheap taiwanese restaurant in oakland Movies Restaurants Music Sports … find_movie, buy_tickets find_restaurant, find_price, book_table find_lyrics, find_singer … Domain Intent
  114. 115 Deep Neural Networks for Domain/Intent Classification (Ravuri and Stolcke, 2015)  RNN and LSTMs for utterance classification  Word hashing to deal with large number of singletons  Kat: #Ka, Kat, at#  Each character n-gram is associated with a bit in the input encoding 115 https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/RNNLM_addressee.pdf
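A small sketch of the word-hashing step described above: pad the word with boundary markers, take character trigrams (e.g. Kat → #Ka, Kat, at#), and flip one bit per n-gram in the input encoding. The tiny n-gram vocabulary here is a placeholder.

    def char_ngrams(word, n=3):
        # "Kat" -> ["#Ka", "Kat", "at#"]
        padded = "#" + word + "#"
        return [padded[i:i + n] for i in range(len(padded) - n + 1)]

    ngram_vocab = sorted({g for w in ["Kat", "cat", "Kate"] for g in char_ngrams(w)})

    def hash_encode(word):
        # each character n-gram is associated with one bit of the input encoding
        bits = [0] * len(ngram_vocab)
        for g in char_ngrams(word):
            if g in ngram_vocab:
                bits[ngram_vocab.index(g)] = 1
        return bits

    print(char_ngrams("Kat"))   # ['#Ka', 'Kat', 'at#']
    print(hash_encode("Kat"))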
  115. LU – Slot Filling 116 flights from Boston to New York today — Entity tags: O O B-city O B-city I-city O; Slot tags: O O B-dept O B-arrival I-arrival B-date As a sequence tagging task • Given a collection of tagged word sequences, S={((w1,1,w1,2,…, w1,n1), (t1,1,t1,2,…,t1,n1)), ((w2,1,w2,2,…,w2,n2), (t2,1,t2,2,…,t2,n2)) …} where ti ∊ M, the goal is to estimate tags for a new word sequence.
  116. 117 Recurrent Neural Nets for Slot Tagging – I (Yao et al, 2013; Mesnil et al, 2015)  Variations: a. RNNs with LSTM cells b. Input, sliding window of n-grams c. Bi-directional LSTMs 𝑤0 𝑤1 𝑤2 𝑤 𝑛 ℎ0 𝑓 ℎ1 𝑓 ℎ2 𝑓 ℎ 𝑛 𝑓 ℎ0 𝑏 ℎ1 𝑏 ℎ2 𝑏 ℎ 𝑛 𝑏 𝑦0 𝑦1 𝑦2 𝑦𝑛 (b) LSTM-LA (c) bLSTM 𝑦0 𝑦1 𝑦2 𝑦𝑛 𝑤0 𝑤1 𝑤2 𝑤 𝑛 ℎ0 ℎ1 ℎ2 ℎ 𝑛 (a) LSTM 𝑦0 𝑦1 𝑦2 𝑦𝑛 𝑤0 𝑤1 𝑤2 𝑤 𝑛 ℎ0 ℎ1 ℎ2 ℎ 𝑛 http://131.107.65.14/en-us/um/people/gzweig/Pubs/Interspeech2013RNNLU.pdf; http://dl.acm.org/citation.cfm?id=2876380
  117. 118 Recurrent Neural Nets for Slot Tagging – II (Kurata et al., 2016; Simonnet et al., 2015)  Encoder-decoder networks  Leverages sentence level information  Attention-based encoder- decoder  Use of attention (as in MT) in the encoder-decoder network  Attention is estimated using a feed-forward network with input: ht and st at time t 𝑦0 𝑦1 𝑦2 𝑦𝑛 𝑤 𝑛 𝑤2 𝑤1 𝑤0 ℎ 𝑛 ℎ2 ℎ1 ℎ0 𝑤0 𝑤1 𝑤2 𝑤 𝑛 𝑦0 𝑦1 𝑦2 𝑦𝑛 𝑤0 𝑤1 𝑤2 𝑤 𝑛 ℎ0 ℎ1 ℎ2 ℎ 𝑛 𝑠0 𝑠1 𝑠2 𝑠 𝑛 ci ℎ0 ℎ 𝑛… http://www.aclweb.org/anthology/D16-1223
  118. 119 Joint Semantic Frame Parsing  Sequence-based (Hakkani-Tur et al., 2016): slot filling and intent prediction in the same output sequence (e.g. “taiwanese food please” → B-type O O, with the intent FIND_REST predicted at the EOS step)  Parallel (Liu and Lane, 2016): intent prediction and slot filling are performed in two branches 119 https://www.microsoft.com/en-us/research/wp-content/uploads/2016/06/IS16_MultiJoint.pdf; https://arxiv.org/abs/1609.01454
  119. 120 LU Evaluation  Metrics  Sub-sentence-level: intent accuracy, slot F1  Sentence-level: whole frame accuracy 120
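A small sketch of the three metrics above on toy frames (not an official scorer): intent accuracy, slot F1 over slot-value pairs, and whole-frame accuracy, which credits an utterance only when the intent and all slots are correct.

    def intent_accuracy(gold_frames, pred_frames):
        return sum(g["intent"] == p["intent"]
                   for g, p in zip(gold_frames, pred_frames)) / len(gold_frames)

    def frame_accuracy(gold_frames, pred_frames):
        correct = sum(1 for g, p in zip(gold_frames, pred_frames)
                      if g["intent"] == p["intent"] and g["slots"] == p["slots"])
        return correct / len(gold_frames)

    def slot_f1(gold_frames, pred_frames):
        tp = fp = fn = 0
        for g, p in zip(gold_frames, pred_frames):
            gold, pred = set(g["slots"].items()), set(p["slots"].items())
            tp += len(gold & pred)
            fp += len(pred - gold)
            fn += len(gold - pred)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    gold = [{"intent": "find_movie", "slots": {"genre": "action", "date": "this weekend"}}]
    pred = [{"intent": "find_movie", "slots": {"genre": "action"}}]
    print(intent_accuracy(gold, pred), slot_f1(gold, pred), frame_accuracy(gold, pred))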
  120. 121 Speech Recognition Language Understanding (LU) • Domain Identification • User Intent Detection • Slot Filling Dialogue Management (DM) • Dialogue State Tracking (DST) • Dialogue Policy Natural Language Generation (NLG) Hypothesis are there any action movies to see this weekend Semantic Frame request_movie genre=action, date=this weekend System Action/Policy request_location Text response Where are you located? Text Input Are there any action movies to see this weekend? Speech Signal Backend Database/ Knowledge Providers Concluding Remarks 121
  121. 122 Language Understanding Demo  Movie Bot  http://140.112.21.82:8000/  Example  晚上九點的當他們認真編織時  我想要看晚上七點的電影  我想要看晚上七點在新竹大遠百威秀影城的電影  我要看新竹大遠百威秀影城的刀劍神域劇場版序列爭戰  我想要看晚上七點在新竹大遠百威秀影城的刀劍神域劇 場版序列爭戰 122
  122. 123 Tutorial Outline I. Dialogue Systems and Background Knowledge II. Language Understanding III. Dialogue Management IV. Language Generation V. Dialogue System Evaluation, Development, and Trends 123
  123. Dialogue Management – Dialogue State Tracking 124
  124. 125 Task-Oriented Dialogue System (Young, 2000) 125 Speech Recognition Language Understanding (LU) • Domain Identification • User Intent Detection • Slot Filling Dialogue Management (DM) • Dialogue State Tracking (DST) • Dialogue Policy Natural Language Generation (NLG) Hypothesis are there any action movies to see this weekend Semantic Frame request_movie genre=action, date=this weekend System Action/Policy request_location Text response Where are you located? Text Input Are there any action movies to see this weekend? Speech Signal Backend Database/ Knowledge Providers http://rsta.royalsocietypublishing.org/content/358/1769/1389.short
  125. 126 Task-Oriented Dialogue System (Young, 2000) 126 Speech Recognition Language Understanding (LU) • Domain Identification • User Intent Detection • Slot Filling Natural Language Generation (NLG) Hypothesis are there any action movies to see this weekend Semantic Frame request_movie genre=action, date=this weekend System Action/Policy request_location Text response Where are you located? Text Input Are there any action movies to see this weekend? Speech Signal Backend Action / Knowledge Providers http://rsta.royalsocietypublishing.org/content/358/1769/1389.short Dialogue Management (DM) • Dialogue State Tracking (DST) • Dialogue Policy
  126. 127 Rule-Based Management 127
  127. 128 Example Dialogue 128 request (restaurant; foodtype=Thai) inform (area=centre) request (address) bye ()
  128. 129 Elements of Dialogue Management 129(Figure from Gašić)
  129. 130 Dialogue State Tracking (DST)  Maintain a probabilistic distribution instead of a 1-best prediction for better robustness to recognition errors 130 Incorrect for both!
  130. 131 Dialogue State Tracking (DST)  Maintain a probabilistic distribution instead of a 1-best prediction for better robustness to SLU errors or ambiguous input 131 How can I help you? Book a table at Sumiko for 5 How many people? 3 Slot Value # people 5 (0.5) time 5 (0.5) Slot Value # people 3 (0.8) time 5 (0.8)
  131. 132 1-Best Input w/o State Tracking 132
  132. 133 N-Best Inputs w/o State Tracking 133
  133. 134 N-Best Inputs w/ State Tracking 134
  134. 135 Dialogue State Tracking (DST)  Definition  Representation of the system's belief of the user's goal(s) at any time during the dialogue  Challenge  How to define the state space?  How to tractably maintain the dialogue state?  Which actions to take for each state? 135 Define dialogue as a control problem where the behavior can be automatically learned
  135. 136 Reinforcement Learning  RL is a general purpose framework for decision making  RL is for an agent with the capacity to act  Each action influences the agent’s future state  Success is measured by a scalar reward signal  Goal: select actions to maximize future reward Big three: action, state, reward
  136. 137 Agent and Environment  At time step t  The agent  Executes action at  Receives observation ot  Receives scalar reward rt  The environment  Receives action at  Emits observation ot+1  Emits scalar reward rt+1  t increments at env. step observation ot action at reward rt
  137. 138 State  Experience is the sequence of observations, actions, rewards  State is the information used to determine what happens next  what happens depends on the history experience • The agent selects actions • The environment selects observations/rewards  The state is the function of the history experience
  138. 139 observation ot action at reward rt Environment State  The environment state s_t^e is the environment’s private representation  i.e. whatever data the environment uses to pick the next observation/reward  may not be visible to the agent  may contain irrelevant information
  139. 140 observation ot action at reward rt Agent State  The agent state s_t^a is the agent’s internal representation  i.e. whatever data the agent uses to pick the next action  information used by RL algorithms  can be any function of experience
  140. 141 Reward  Reinforcement learning is based on the reward hypothesis  A reward rt is a scalar feedback signal  Indicates how well the agent is doing at step t  Reward hypothesis: all agent goals can be described by the maximization of expected cumulative reward
  141. 142 Elements of Dialogue Management 142(Figure from Gašić) Dialogue state tracking
  142. 143 Dialogue State Tracking (DST)  Requirement  Dialogue history  Keep track of what has happened so far in the dialogue  Normally done via the Markov property  Task-oriented dialogue  Need to know what the user wants  Modeled via the user goal  Robustness to errors  Need to know what the user says  Modeled via the user action 143
  143. 144 DST Problem Formulation  The DST dataset consists of  Goal: for each informable slot  e.g. price=cheap  Requested: slots by the user  e.g. moviename  Method: search method for entities  e.g. by constraints, by name  The dialogue state is  the distribution over possible slot-value pairs for goals  the distribution over possible requested slots  the distribution over possible methods 144
  144. 145 Class-Based DST 145 Data Model Prediction • Observations labeled w/ dialogue state • Distribution over dialogue states – Dialogue State Tracking • Neural networks • Ranking models
  145. 146 DNN for DST 146 feature extraction DNN A slot value distribution for each slot multi-turn conversation state of this turn
  146. 147 Sequence-Based DST 147 Data Model Prediction • Sequence of observations labeled w/ dialogue state • Distribution over dialogue states – Dialogue State Tracking • Recurrent neural networks (RNN)
  147. 148 Recurrent Neural Network (RNN)  Elman-type  Jordan-type 148
  148. 149 RNN DST  Idea: internal memory for representing dialogue context  Input  most recent dialogue turn  last machine dialogue act  dialogue state  memory layer  Output  update its internal memory  distribution over slot values 149
  149. 150 RNN-CNN DST 150(Figure from Wen et al, 2016) http://www.anthology.aclweb.org/W/W13/W13-4073.pdf; https://arxiv.org/abs/1506.07190
  150. 151 Multichannel Tracker (Shi et al., 2016) 151  Training a multichannel CNN for each slot  Chinese character CNN  Chinese word CNN  English word CNN https://arxiv.org/abs/1701.06247
  151. 152 DST Evaluation  Metric  Tracked state accuracy with respect to user goal  L2-norm of the hypothesized dist. and the true label  Recall/Precision/F-measure individual slots 152
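A small sketch of the L2 measure above: the distance between the tracker’s distribution over slot values and the one-hot vector for the true label (the toy belief mirrors the earlier “# people” booking example; smaller is better).

    import numpy as np

    def l2_to_true_label(belief, true_value):
        values = sorted(belief)
        hyp = np.array([belief[v] for v in values])
        ref = np.array([1.0 if v == true_value else 0.0 for v in values])
        return np.linalg.norm(hyp - ref)

    belief = {"3": 0.8, "5": 0.15, "none": 0.05}   # belief over "# people" after turn 2
    print(l2_to_true_label(belief, "3"))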
  152. 153 Dialog State Tracking Challenge (DSTC) (Williams et al. 2013, Henderson et al. 2014, Henderson et al. 2014, Kim et al. 2016, Kim et al. 2016)
   Challenge / Type / Domain / Data Provider / Main Theme
   DSTC1 / Human-Machine / Bus Route / CMU / Evaluation Metrics
   DSTC2 / Human-Machine / Restaurant / U. Cambridge / User Goal Changes
   DSTC3 / Human-Machine / Tourist Information / U. Cambridge / Domain Adaptation
   DSTC4 / Human-Human / Tourist Information / I2R / Human Conversation
   DSTC5 / Human-Human / Tourist Information / I2R / Language Adaptation
  153. 154 DSTC1  Type: Human-Machine  Domain: Bus Route 154
  154. 155 DSTC4-5  Type: Human-Human  Domain: Tourist Information 155 Tourist: Can you give me some uh- tell me some cheap rate hotels, because I'm planning just to leave my bags there and go somewhere take some pictures. Guide: Okay. I'm going to recommend firstly you want to have a backpack type of hotel, right? Tourist: Yes. I'm just gonna bring my backpack and my buddy with me. So I'm kinda looking for a hotel that is not that expensive. Just gonna leave our things there and, you know, stay out the whole day. Guide: Okay. Let me get you hm hm. So you don't mind if it's a bit uh not so roomy like hotel because you just back to sleep. Tourist: Yes. Yes. As we just gonna put our things there and then go out to take some pictures. Guide: Okay, um- Tourist: Hm. Guide: Let's try this one, okay? Tourist: Okay. Guide: It's InnCrowd Backpackers Hostel in Singapore. If you take a dorm bed per person only twenty dollars. If you take a room, it's two single beds at fifty nine dollars. Tourist: Um. Wow, that's good. Guide: Yah, the prices are based on per person per bed or dorm. But this one is room. So it should be fifty nine for the two room. So you're actually paying about ten dollars more per person only. Tourist: Oh okay. That's- the price is reasonable actually. It's good. {Topic: Accommodation; Type: Hostel; Pricerange: Cheap; GuideAct: ACK; TouristAct: REQ} {Topic: Accommodation; NAME: InnCrowd Backpackers Hostel; GuideAct: REC; TouristAct: ACK}
  155. 156 Concluding Remarks  Dialogue state tracking (DST) of DM has Markov assumption to model the user goal and be robust to errors 156
  156. Dialogue Management – Dialogue Policy 157
  157. 158 Task-Oriented Dialogue System (Young, 2000) 158 Speech Recognition Language Understanding (LU) • Domain Identification • User Intent Detection • Slot Filling Dialogue Management (DM) • Dialogue State Tracking (DST) • Dialogue Policy Natural Language Generation (NLG) Hypothesis are there any action movies to see this weekend Semantic Frame request_movie genre=action, date=this weekend System Action/Policy request_location Text response Where are you located? Text Input Are there any action movies to see this weekend? Speech Signal Backend Database/ Knowledge Providers http://rsta.royalsocietypublishing.org/content/358/1769/1389.short
  158. 159 Task-Oriented Dialogue System (Young, 2000) 159 Speech Recognition Language Understanding (LU) • Domain Identification • User Intent Detection • Slot Filling Natural Language Generation (NLG) Hypothesis are there any action movies to see this weekend Semantic Frame request_movie genre=action, date=this weekend System Action/Policy request_location Text response Where are you located? Text Input Are there any action movies to see this weekend? Speech Signal Backend Action / Knowledge Providers http://rsta.royalsocietypublishing.org/content/358/1769/1389.short Dialogue Management (DM) • Dialogue State Tracking (DST) • Dialogue Policy
  159. 160 Rule-Based Management 160
  160. 161 Example Dialogue 161 request (restaurant; foodtype=Thai) inform (area=centre) request (address) bye () greeting () request (area) inform (restaurant=Bangkok city, area=centre of town, foodtype=Thai) inform (address=24 Green street)
  161. 162 Example Dialogue 162 request (restaurant; foodtype=Thai) inform (area=centre) request (address) bye () greeting () request (area) inform (restaurant=Bangkok city, area=centre of town, foodtype=Thai) inform (address=24 Green street)
  162. 163 Elements of Dialogue Management 163(Figure from Gašić)
  163. 164 Reinforcement Learning  RL is a general purpose framework for decision making  RL is for an agent with the capacity to act  Each action influences the agent’s future state  Success is measured by a scalar reward signal  Goal: select actions to maximize future reward Big three: action, state, reward
  164. 165 State  Experience is the sequence of observations, actions, rewards  State is the information used to determine what happens next  what happens depends on the history experience • The agent selects actions • The environment selects observations/rewards  The state is the function of the history experience
  165. 166 Elements of Dialogue Management 166(Figure from Gašić) Dialogue policy optimization
  166. 167 Dialogue Policy Optimization  Dialogue management in an RL framework 167 User Reward R Observation O Action A Environment Agent Natural Language Generation Language Understanding Dialogue Manager The optimized dialogue policy selects the best action that maximizes the future reward. Correct rewards are a crucial factor in dialogue policy training
  167. 168 Policy Optimization Issue  Optimization problem size  Belief dialogue state space is large and continuous  System action space is large  Knowledge environment (user)  Transition probability is unknown (user status)  How to get rewards 168
  168. 169 Reinforcement Learning for Dialogue Policy Optimization 169  Pipeline: user input (o) → language understanding → state s → dialogue policy 𝑎 = 𝜋(𝑠); collect rewards (𝑠, 𝑎, 𝑟, 𝑠’) and optimize 𝑄(𝑠, 𝑎) → language (response) generation → response
   Type of Bots / State / Action / Reward:
   Social ChatBots — chat history / system response / # of turns maximized; intrinsically motivated reward
   InfoBots (interactive Q/A) — user current question + context / answers to current question / relevance of answer; # of turns minimized
   Task-Completion Bots — user current input + context / system dialogue act w/ slot values (or API calls) / task success rate; # of turns minimized
   Goal: develop a generic deep RL algorithm to learn dialogue policy for all bot categories
  169. 170 Large Belief Space and Action Space  Solution: perform optimization in a reduced summary space built according to the heuristics 170 Belief Dialogue State System Action Summary Dialogue State Summary Action Summary Policy Summary Function Master Function
  170. 171 Transition Probability and Rewards  Solution: learn from real users Speech Recognition Language Understanding (LU) • Domain Identification • User Intent Detection • Slot Filling Dialogue Management (DM) • Dialogue State Tracking (DST) • Dialogue Policy Natural Language Generation (NLG) Backend Database/ Knowledge Providers
  171. 172 Transition Probability and Rewards  Solution: learn from a simulated user 172 Error Model • Recognition error • LU error Dialogue State Tracking (DST) System dialogue acts Reward Backend Action / Knowledge Providers Dialogue Policy Optimization Dialogue Management (DM) User Model Reward Model User Simulation Distribution over user dialogue acts (semantic frames)
  172. 173 Reward for RL ≅ Evaluation for System  Dialogue is a special RL task  Humans are involved in the interaction and in rating (evaluating) the dialogue  Fully human-in-the-loop framework  Rating: correctness, appropriateness, and adequacy - Expert rating: high quality, high cost - User rating: unreliable quality, medium cost - Objective rating: checks desired aspects, low cost 173
  173. 174 Dialogue Reinforcement Learning Signal Typical reward function  -1 per-turn penalty  Large reward at completion if successful  Typically requires domain knowledge ✔ Simulated user ✔ Paid users (Amazon Mechanical Turk) ✖ Real users 174 The user simulator is usually required for dialogue system training before deployment
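A sketch of the typical reward function above, with the -1 per-turn penalty and a large terminal reward or penalty; the terminal magnitudes used here (±2 × max turns) are one common choice, not a fixed rule.

    def turn_reward(dialogue_ended, task_success, max_turns=20):
        # -1 per turn encourages short dialogues; a large terminal reward/penalty
        # reflects whether the task was completed successfully
        if not dialogue_ended:
            return -1
        return 2 * max_turns if task_success else -max_turns

    # a successful dialogue that ends after 6 turns
    rewards = [turn_reward(False, None) for _ in range(5)] + [turn_reward(True, True)]
    print(sum(rewards))    # 5 * (-1) + 40 = 35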
  174. 175 RL-Based Dialogue Management (Li et al., 2017)  Deep RL for training dialogue policy  Input: current semantic frame observation, database returned results  Output: system action 175 Semantic Frame request_movie genre=action, date=this weekend System Action/Policy request_location Dialogue Management (DM)Simulated/paid/real User Backend DB https://arxiv.org/abs/1703.01008
  175. 176 Major Components in an RL Agent  An RL agent may include one or more of these components  Policy: agent’s behavior function  Value function: how good is each state and/or action  Model: agent’s representation of the environment 176
  176. 177 Policy  A policy is the agent’s behavior  A policy maps from state to action  Deterministic policy:  Stochastic policy: 177
  177. 178 Value Function  A value function is a prediction of future reward (with action a in state s)  Q-value function gives expected total reward  from state and action  under policy  with discount factor  Value functions decompose into a Bellman equation 178
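The formulas behind these bullets, written out in standard RL notation (the original slide graphics are not reproduced here): the Q-value under policy π with discount factor γ, and its Bellman decomposition.

    Q^{\pi}(s,a) = \mathbb{E}\big[\, r_{t+1} + \gamma r_{t+2} + \gamma^{2} r_{t+3} + \cdots \mid s_t = s,\, a_t = a,\, \pi \,\big]

    Q^{\pi}(s,a) = \mathbb{E}_{s',a'}\big[\, r + \gamma\, Q^{\pi}(s',a') \mid s,\, a \,\big]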
  178. 179 Optimal Value Function  An optimal value function  is the maximum achievable value  allows us to act optimally  informally maximizes over all decisions  decompose into a Bellman equation 179
  179. 180 Reinforcement Learning Approach  Policy-based RL  Search directly for optimal policy  Value-based RL  Estimate the optimal value function  Model-based RL  Build a model of the environment  Plan (e.g. by lookahead) using model 180 is the policy achieving maximum future reward is maximum value achievable under any policy
  180. 181 Maze Example  Rewards: -1 per time-step  Actions: N, E, S, W  States: agent’s location 181
  181. 182 Maze Example: Policy  Rewards: -1 per time-step  Actions: N, E, S, W  States: agent’s location 182 Arrows represent the policy π(s) for each state s
  182. 183 Maze Example: Value Function  Rewards: -1 per time-step  Actions: N, E, S, W  States: agent’s location 183 Numbers represent the value of each state s under the policy
  183. Estimate How Good Each State and/or Action is Value-Based Deep RL184
  184. 185 Value Function Approximation  Value functions are represented by a lookup table  too many states and/or actions to store  too slow to learn the value of each entry individually  Values can be estimated with function approximation 185
  185. 186 Q-Networks  Q-networks represent value functions with weights  generalize from seen states to unseen states  update parameter for function approximation 186
  186. 187 Q-Learning  Goal: estimate optimal Q-values  Optimal Q-values obey a Bellman equation  Value iteration algorithms solve the Bellman equation 187 learning target
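A minimal tabular Q-learning sketch of the value-iteration idea above. It assumes a small, hypothetical environment object with reset() and step(action) returning (next_state, reward, done); that interface and all hyperparameters are placeholders, not the lecture’s setup.

    import random
    from collections import defaultdict

    def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
        Q = defaultdict(float)                    # Q[(state, action)] -> estimated value
        for _ in range(episodes):
            s, done = env.reset(), False
            while not done:
                # epsilon-greedy exploration
                if random.random() < epsilon:
                    a = random.choice(actions)
                else:
                    a = max(actions, key=lambda act: Q[(s, act)])
                s_next, r, done = env.step(a)
                # learning target: r + gamma * max_a' Q(s', a')
                best_next = 0.0 if done else max(Q[(s_next, act)] for act in actions)
                Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
                s = s_next
        return Q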
  187. 188 Deep Q-Networks (DQN)  Represent value function by deep Q-network with weights  Objective is to minimize mean square error (MSE) loss by SGD 188 Issue: naïve Q-learning oscillates or diverges using NN due to: 1) correlations between samples 2) non-stationary targets learning target
  188. 189 Deep RL in Atari Games 189 Different tricks are performed to enable RL training
  189. 190 DQN in Atari  Goal: end-to-end learning of values Q(s, a) from pixels  Input: state is stack of raw pixels from last 4 frames  Output: Q(s, a) for all joystick/button positions a  Reward is the score change for that step 190DQN Nature Paper [link] [code]
  190. DQN in Atari 191 DQN Nature Paper [link] [code]
  191. DQN in Movie Bot  Goal: interact with the user for movie ticket booking 192 User goal: Two tickets for “the witch” tomorrow 9:30 PM at regal meridian 16, Seattle. RULE BASED SYSTEM User: Can I get 2 tickets for the witch? Agent: Which theater do you want? User: Regal meridian 16 Agent: What time would you like to see it? User: 9:30 pm Agent: What date would you like to see it? User: Tomorrow! Agent: How many people are you? User: Just two adults. Agent: What is your location? User: Seattle! Agent: What kind of video format do you like? User: Oh, any kind is okay. Agent: Do you have any price constraint? User: No, I don’t care. Agent: Great - I was able to purchase 2 tickets for you to see the witch tomorrow at regal meridian 16 theater in Seattle at 9:30 pm. User: Thank you. Agent: Thank you! REINFORCEMENT LEARNING SYSTEM User: Can I get 2 tickets for the witch? Agent: What date would you like to see it? User: Tomorrow Agent: Which theater do you prefer? User: Regal meridian 16 Agent: What is your location? User: Seattle Agent: What time would you like to see it? User: 9:30 pm. Agent: Great - I was able to purchase 2 tickets for you to see the witch tomorrow at regal meridian 16 theater in Seattle at 9:30 pm. User: Thanks. Agent: Thanks! https://arxiv.org/abs/1703.01008
  192. Estimate How Good An Agent’s Behavior is Policy-Based Deep RL193
  193. 194 Deep Policy Networks  Represent policy by deep network with weights  Objective is to maximize total discounted reward by SGD 194 stochastic policy deterministic policy
  194. 195 Policy Gradient  The gradient of a stochastic policy is given by  The gradient of a deterministic policy is given by 195 How to deal with continuous actions
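The two gradients referred to above, in their standard forms (the slide’s own rendering is not reproduced): the likelihood-ratio gradient for a stochastic policy π_θ, and the deterministic policy gradient for a = μ_θ(s) as used in DPG/DDPG, which is how continuous actions are handled.

    \nabla_{\theta} J(\theta) = \mathbb{E}_{\pi_{\theta}}\big[\, \nabla_{\theta} \log \pi_{\theta}(a \mid s)\; Q^{\pi_{\theta}}(s,a) \,\big]

    \nabla_{\theta} J(\theta) = \mathbb{E}\big[\, \nabla_{\theta}\, \mu_{\theta}(s)\; \nabla_{a} Q^{\mu}(s,a)\big|_{a=\mu_{\theta}(s)} \,\big]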
  195. 196 Actor-Critic (Value-Based + Policy-Based)  Estimate value function  Update policy parameters by SGD  Stochastic policy  Deterministic policy 196 Q-networks tell whether a policy is good or not Policy networks optimize the policy accordingly
  196. 197 Deterministic Deep Policy Gradient  Goal: end-to-end learning of control policy from pixels  Input: state is stack of raw pixels from last 4 frames  Output: two separate CNNs for Q and 𝜋 197Lillicrap et al., “Continuous control with deep reinforcement learning,” arXiv, 2015.
  197. 198 Deep RL AI Examples  Play games: Atari, poker, Go, …  Control physical systems: manipulate, …  Explore worlds: 3D worlds, …  Interact with users: recommend, optimize, personalize, … 198
  198. 199 Concluding Remarks  Dialogue policy optimization can be viewed as an RL task  Transition probability and reward come from  Real user  Simulated user
  199. Dialogue Management – User Simulation 200
  200. 201 Task-Oriented Dialogue System (Young, 2000) 201 Speech Recognition Language Understanding (LU) • Domain Identification • User Intent Detection • Slot Filling Dialogue Management (DM) • Dialogue State Tracking (DST) • Dialogue Policy Natural Language Generation (NLG) Hypothesis are there any action movies to see this weekend Semantic Frame request_movie genre=action, date=this weekend System Action/Policy request_location Text response Where are you located? Text Input Are there any action movies to see this weekend? Speech Signal Backend Database/ Knowledge Providers http://rsta.royalsocietypublishing.org/content/358/1769/1389.short
  201. 202 Task-Oriented Dialogue System (Young, 2000) 202 Speech Recognition Language Understanding (LU) • Domain Identification • User Intent Detection • Slot Filling Natural Language Generation (NLG) Hypothesis are there any action movies to see this weekend Semantic Frame request_movie genre=action, date=this weekend System Action/Policy request_location Text response Where are you located? Text Input Are there any action movies to see this weekend? Speech Signal Backend Action / Knowledge Providers http://rsta.royalsocietypublishing.org/content/358/1769/1389.short Dialogue Management (DM) • Dialogue State Tracking (DST) • Dialogue Policy
  202. 203 User Simulation  Goal: generate natural and reasonable conversations to enable reinforcement learning for exploring the policy space  Approach  Rule-based crafted by experts (Li et al., 2016)  Learning-based (Schatzmann et al., 2006; El Asri et al., 2016) [Figure: a Simulated User, built from a Dialogue Corpus, interacts with the Dialogue Management (DM) – Dialogue State Tracking (DST) and Dialogue Policy – in place of a Real User; the simulated user keeps a list of its goals and actions, randomly generates an agenda, and updates its list of goals and adds new ones]
  203. 204 Elements of User Simulation Error Model • Recognition error • LU error Dialogue State Tracking (DST) System dialogue acts Backend Action / Knowledge Providers Dialogue Policy Optimization Dialogue Management (DM) User Model Reward Model User Simulation Distribution over user dialogue acts (semantic frames) Reward
  204. 205 Elements of User Simulation Error Model • Recognition error • LU error Dialogue State Tracking (DST) System dialogue acts Reward Backend Action / Knowledge Providers Dialogue Policy Optimization Dialogue Management (DM) User Model Reward Model User Simulation Distribution over user dialogue acts (semantic frames) The user model defines the natural interactive flow to convey the user’s goal
  205. 206 User Goal  Sampled from a real corpus / hand-crafted list  Information access  Inflexible: the user has specific entries for search  Task completion  Inflexible: the user has specific rows for search  Flexible: some slot values are unknown or can be suggested 206 request_ticket, inform_slots: {moviename=“A”, …} request_slot: rating, inform_slots: {moviename=“A”, …} request_ticket, request_slots: {time, theater}, inform_slots: {moviename=“A”, …}
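Concretely, a sampled user goal can be represented as a small frame-like structure; the slot names and values below are just hypothetical examples in the movie-ticket domain.

```python
# A hypothetical sampled user goal in the movie-ticket domain:
# constraint slots the user will inform, and unknown slots to request.
user_goal = {
    "intent": "request_ticket",
    "inform_slots": {"moviename": "A", "date": "this weekend", "numberofpeople": "2"},
    "request_slots": {"time": "UNK", "theater": "UNK"},
}
```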
  206. 207 Supported Functionality of Ontology  Information access  Finding the specific entries from the table  Task completion  Finding the row that satisfies the constraints 207
Movie Name | Theater | Rating | Date | Time
美女與野獸 | 台北信義威秀 | 8.5 | 2017/03/21 | 09:00
美女與野獸 | 台北信義威秀 | 8.5 | 2017/03/21 | 09:25
美女與野獸 | 台北信義威秀 | 8.5 | 2017/03/21 | 10:15
美女與野獸 | 台北信義威秀 | 8.5 | 2017/03/21 | 10:40
美女與野獸 | 台北信義威秀 | 8.5 | 2017/03/21 | 11:05
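Both functionalities reduce to constraint matching over the backend table; a minimal sketch, assuming a hypothetical list-of-dicts table, could be:

```python
def matching_rows(table, constraints):
    """Return rows whose values satisfy every user-specified constraint.

    `table` is assumed to be a list of dicts (one per row), e.g. the ticket
    DB above with keys such as "theater" and "date".
    """
    return [row for row in table
            if all(row.get(slot) == value for slot, value in constraints.items())]

# e.g. matching_rows(ticket_db, {"theater": "台北信義威秀", "date": "2017/03/21"})
```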
  207. 208 User Goal – Information Access  Information access with inflexible constraints  Finding the specific entries from the table  Intent examples in the ticket DB: request_moviename, request_theater, request_time, etc  Intent examples in the knowledge DB: request_moviename, request_year  Specifying the constraints for the search target  The specified slots should align with the corresponding DB 208
Ticket DB: Movie Name | Theater | Date | Time
美女與野獸 | 台北信義威秀 | 2017/03/21 | 09:00
美女與野獸 | 台北信義威秀 | 2017/03/21 | 09:25
美女與野獸 | 台北信義威秀 | 2017/03/21 | 10:15
Knowledge DB: Movie Name | Cast | Year
美女與野獸 | Emma Watson | 2017
鋼鐵人 | : | :
  208. 209 User Goal – Task-Completion  Task completion with inflexible constraints  Finding the specific rows from the table  Intent examples: request_ticket (ticket DB)  Specifying the slot constraints for the search target  Informable slot: theater, date, etc  Requestable slot: time, number_of_people 209
Movie Name | Theater | Date | Time
美女與野獸 | 台北信義威秀 | 2017/03/21 | 09:00
美女與野獸 | 台北信義威秀 | 2017/03/21 | 09:25
美女與野獸 | 台北信義威秀 | 2017/03/21 | 10:15
  209. 210 User Goal – Task-Completion  Task completion with flexible constraints  No priority: {台北信義威秀, 台北日新威秀}  Prioritized: {台北信義威秀>台北日新威秀} 210
Movie Name | Theater | Date | Time
美女與野獸 | 台北信義威秀 | 2017/03/21 | 09:00
美女與野獸 | 台北信義威秀 | 2017/03/21 | 09:25
美女與野獸 | 台北信義威秀 | 2017/03/21 | 10:15
  210. 211 Satisfiability and Uniqueness  User goal characteristics  Satisfiability – the predefined goal can be satisfied by the backend database  Uniqueness – all satisfied entries should be in the goal 211
Movie Name | Actor | Year
鋼鐵人 | Robert John Downey | 2008
福爾摩斯 | Robert John Downey | 2008
Example goals: request_actor(moviename=鋼鐵人, year=2007) request_moviename(actor=Robert John Downey, year=2008)
  211. 212 More Extensions  Slot value hierarchy  “morning”  {9:15, 9:45, 10:15, 10:45, 11:15, 11:45}  Inexact match / partial match  “ten o’clock”  {9:45, 10:15}  “emma”  {Emma Watson, Emma Stone} 212
  212. 213 User Agenda Model  The user agenda defines how a user interacts with the system at the dialogue level  Random order: randomly sample a slot to inform or request  Fixed order: define an ordered list of slots to inform or request 213
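A toy agenda model along these lines, reusing the goal layout sketched earlier and assuming a hypothetical `order` list for the fixed-order case, might be:

```python
import random

def next_user_action(goal, order=None):
    """Pick the next user dialogue act from the agenda.

    `goal` follows the dict layout sketched earlier; `order`, if given, is
    assumed to be a list covering all informable slots and fixes the informing
    order, otherwise a pending slot is sampled at random.
    """
    informed = goal.setdefault("informed", set())
    pending = [s for s in goal["inform_slots"] if s not in informed]
    if not pending:                                   # everything informed: request the rest
        return {"act": "request", "slots": list(goal["request_slots"])}
    slot = next(s for s in order if s in pending) if order else random.choice(pending)
    informed.add(slot)
    return {"act": "inform", "slots": {slot: goal["inform_slots"][slot]}}
```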
  213. 214 Elements of User Simulation Error Model • Recognition error • LU error Dialogue State Tracking (DST) System dialogue acts Reward Backend Action / Knowledge Providers Dialogue Policy Optimization Dialogue Management (DM) User Model Reward Model User Simulation Distribution over user dialogue acts (semantic frames) The error model enables the system to maintain robustness during training
  214. 215 Frame-Level Interaction  DM receives frame-level information  No error model: perfect recognizer and LU  Error model: simulate the possible errors 215 Error Model • Recognition error • LU error Dialogue State Tracking (DST) system dialogue acts Dialogue Policy Optimization Dialogue Management (DM) User Model User Simulation user dialogue acts (semantic frames)
  215. 216 Natural Language Level Interaction  User simulator sends natural language  No recognition error  Errors from NLG or LU 216 Natural Language Generation (NLG) Dialogue State Tracking (DST) system dialogue acts Dialogue Policy Optimization Dialogue Management (DM) User Model User Simulation NL Language Understanding (LU)
  216. 217 Error Model  Simple  Randomly generate errors by  Replacing with an incorrect intent  Deleting an informable slot  Replacing with an incorrect slot (value is original)  Replacing with an incorrect slot value  Learning  Simulate the errors based on the ASR or LU confusion 217 Example frame: request_moviename (actor=Robert Downey Jr) – possible corruptions: intent → request_year, slot → director, value → Robert Downey Sr
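The simple, rule-based variant can be approximated with a few random perturbations; the error rate `p`, the `<noise>` token, and the vocabulary arguments below are illustrative assumptions, not prescribed by the slides.

```python
import random

def corrupt(frame, slot_vocab, intent_vocab, p=0.1):
    """Simple error model: randomly perturb a semantic frame.

    `frame` = {"intent": ..., "inform_slots": {...}}; `slot_vocab` and
    `intent_vocab` are lists to sample wrong names from; `p` is an
    assumed, illustrative error rate.
    """
    frame = {"intent": frame["intent"], "inform_slots": dict(frame["inform_slots"])}
    if random.random() < p:                          # incorrect intent
        frame["intent"] = random.choice(intent_vocab)
    for slot in list(frame["inform_slots"]):
        r = random.random()
        if r < p:                                    # delete an informable slot
            del frame["inform_slots"][slot]
        elif r < 2 * p:                              # incorrect slot name, value kept
            frame["inform_slots"][random.choice(slot_vocab)] = frame["inform_slots"].pop(slot)
        elif r < 3 * p:                              # incorrect slot value
            frame["inform_slots"][slot] = "<noise>"
    return frame
```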
  217. 218 Elements of User Simulation Error Model • Recognition error • LU error Dialogue State Tracking (DST) System dialogue acts Reward Backend Action / Knowledge Providers Dialogue Policy Optimization Dialogue Management (DM) User Model Reward Model User Simulation Distribution over user dialogue acts (semantic frames) The reward model checks whether the returned results satisfy the goal in order to decide the reward
  218. 219 Reward Model  Success measurement  Check whether the returned results satisfy the predefined user goal  Success – large positive reward  Fail – large negative reward  Minimize the number of turns  Penalty for additional turns 219
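One common shaping that fits this description (the specific values are an assumed choice, not taken from the slides) is a small per-turn penalty plus a large terminal bonus or penalty:

```python
def turn_reward(done, success, max_turn=20):
    """Per-turn reward: -1 each turn to penalize dialogue length,
    plus a large terminal reward on success or penalty on failure."""
    if not done:
        return -1
    return 2 * max_turn if success else -max_turn
```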
  219. 220 Concluding Remarks Error Model • Recognition error • LU error Dialogue State Tracking (DST) System dialogue acts Reward Backend Action / Knowledge Providers Dialogue Policy Optimization Dialogue Management (DM) User Model Reward Model User Simulation Distribution over user dialogue acts (semantic frames)
  220. 221 Tutorial Outline I. 對話系統及基本背景知識 II. 語言理解 (Language Understanding) III. 對話管理 (Dialogue Management) IV. 語言生成 (Language Generation) V. 對話系統評估、發展及趨勢 221
  221. 語言生成 (Language Generation) 222
  222. 223 Task-Oriented Dialogue System (Young, 2000) 223 Speech Recognition Language Understanding (LU) • Domain Identification • User Intent Detection • Slot Filling Dialogue Management (DM) • Dialogue State Tracking (DST) • Dialogue Policy Natural Language Generation (NLG) Hypothesis are there any action movies to see this weekend Semantic Frame request_movie genre=action, date=this weekend System Action/Policy request_location Text response Where are you located? Text Input Are there any action movies to see this weekend? Speech Signal Backend Database/ Knowledge Providers http://rsta.royalsocietypublishing.org/content/358/1769/1389.short
  223. 224 Task-Oriented Dialogue System (Young, 2000) 224 Speech Recognition Language Understanding (LU) • Domain Identification • User Intent Detection • Slot Filling Dialogue Management (DM) • Dialogue State Tracking (DST) • Dialogue Policy Natural Language Generation (NLG) Hypothesis are there any action movies to see this weekend Semantic Frame request_movie genre=action, date=this weekend System Action/Policy request_location Text response Where are you located? Text Input Are there any action movies to see this weekend? Speech Signal Backend Action / Knowledge Providers
  224. 225 Natural Language Generation (NLG)  Mapping dialogue acts into natural language inform(name=Seven_Days, foodtype=Chinese) Seven Days is a nice Chinese restaurant 225
  225. 226 Template-Based NLG  Define a set of rules to map frames to NL 226 Pros: simple, error-free, easy to control Cons: time-consuming, rigid, poor scalability
Semantic Frame | Natural Language
confirm() | “Please tell me more about the product you are looking for.”
confirm(area=$V) | “Do you want somewhere in the $V?”
confirm(food=$V) | “Do you want a $V restaurant?”
confirm(food=$V,area=$W) | “Do you want a $V restaurant in the $W.”
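A tiny Python rendition of such rule-based mapping, using the templates from the table above (the dictionary keys and the `realize` helper are illustrative choices, not part of the original slides):

```python
# A hypothetical handful of templates, keyed by the dialogue act plus its
# (alphabetically sorted) slot names; mirrors the table on the slide above.
templates = {
    "confirm()": "Please tell me more about the product you are looking for.",
    "confirm(area)": "Do you want somewhere in the {area}?",
    "confirm(food)": "Do you want a {food} restaurant?",
    "confirm(area,food)": "Do you want a {food} restaurant in the {area}?",
}

def realize(act, **slots):
    """Map a semantic frame to natural language by template lookup."""
    key = f"{act}({','.join(sorted(slots))})"
    return templates[key].format(**slots)

print(realize("confirm", food="Chinese", area="centre"))
# -> Do you want a Chinese restaurant in the centre?
```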
  226. 227 Class-Based LM NLG (Oh and Rudnicky, 2000)  Class-based language modeling  NLG by decoding 227 Pros: easy to implement/understand, simple rules Cons: computationally inefficient Classes: inform_area inform_address … request_area request_postcode http://dl.acm.org/citation.cfm?id=1117568
  227. 228 Phrase-Based NLG (Mairesse et al, 2010) Semantic DBN Phrase DBN Charlie Chan is a Chinese Restaurant near Cineworld in the centre d d Inform(name=Charlie Chan, food=Chinese, type= restaurant, near=Cineworld, area=centre) 228 Pros: efficient, good performance Cons: require semantic alignments realization phrase semantic stack http://dl.acm.org/citation.cfm?id=1858838
  228. 229 RNN-Based LM NLG (Wen et al., 2015) <BOS> SLOT_NAME serves SLOT_FOOD . <BOS> Din Tai Fung serves Taiwanese . delexicalisation Inform(name=Din Tai Fung, food=Taiwanese) 0, 0, 1, 0, 0, …, 1, 0, 0, …, 1, 0, 0, 0, 0, 0… dialogue act 1-hot representation SLOT_NAME serves SLOT_FOOD . <EOS> Slot weight tying conditioned on the dialogue act Input Output http://www.anthology.aclweb.org/W/W15/W15-46.pdf#page=295
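Delexicalisation itself is just string substitution; a naive sketch (the placeholder naming convention is an assumption) might be:

```python
def delexicalise(utterance, slots):
    """Replace slot values with placeholders so the LM generalises across values."""
    for slot, value in slots.items():
        utterance = utterance.replace(value, f"SLOT_{slot.upper()}")
    return utterance

def relexicalise(template, slots):
    """Put the actual slot values back into the generated skeleton."""
    for slot, value in slots.items():
        template = template.replace(f"SLOT_{slot.upper()}", value)
    return template

slots = {"name": "Din Tai Fung", "food": "Taiwanese"}
print(delexicalise("Din Tai Fung serves Taiwanese .", slots))  # SLOT_NAME serves SLOT_FOOD .
print(relexicalise("SLOT_NAME serves SLOT_FOOD .", slots))     # Din Tai Fung serves Taiwanese .
```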
  229. 230 Handling Semantic Repetition  Issue: semantic repetition  Din Tai Fung is a great Taiwanese restaurant that serves Taiwanese.  Din Tai Fung is a child friendly restaurant, and also allows kids.  Deficiency in either model or decoding (or both)  Mitigation  Post-processing rules (Oh & Rudnicky, 2000)  Gating mechanism (Wen et al., 2015)  Attention (Mei et al., 2016; Wen et al., 2015) 230
  230. 231 Semantic Conditioned LSTM (Wen et al., 2015)  Original LSTM cell (gates $i_t, f_t, o_t$, cell $c_t$, hidden $h_t$)  Dialogue act (DA) cell: a reading gate $r_t$ updates the DA vector, $d_t = r_t \odot d_{t-1}$  Modify $c_t$ so it is also conditioned on the dialogue act, e.g. $c_t = f_t \odot c_{t-1} + i_t \odot \hat{c}_t + \tanh(W_{dc} d_t)$  Inform(name=Seven_Days, food=Chinese) → 0, 0, 1, 0, 0, …, 1, 0, 0, …, 1, 0, 0, … (dialogue act 1-hot representation $d_0$) 231 Idea: using a gate mechanism to control the generated semantics (dialogue act/slots) http://www.aclweb.org/anthology/D/D15/D15-1199.pdf
  231. 232 Structural NLG (Dušek and Jurčíček, 2016)  Goal: NLG based on the syntax tree  Encode trees as sequences  Seq2Seq model for generation 232 https://www.aclweb.org/anthology/P/P16/P16-2.pdf#page=79
  232. 233 Contextual NLG (Dušek and Jurčíček, 2016)  Goal: adapting users’ way of speaking, providing context-aware responses  Context encoder  Seq2Seq model 233 https://www.aclweb.org/anthology/W/W16/W16-36.pdf#page=203
  233. 234 Chit-Chat Bot  Neural conversational model  Non task-oriented 234
  234. 235 Many-to-Many  Both input and output are sequences → Sequence-to-sequence learning  E.g. Machine Translation (machine learning → 機器學習) 235 (figure: the unrolled RNN reads “machine learning” and emits 機 器 學 習, followed by an end symbol “===”) [Ilya Sutskever, NIPS’14][Dzmitry Bahdanau, arXiv’15]
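A bare-bones encoder-decoder in PyTorch, shown only to make the sequence-to-sequence idea concrete (vocabulary sizes and hidden dimensions are arbitrary assumptions, and teacher forcing is assumed during training):

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal encoder-decoder: encode the source, decode conditioned on it."""
    def __init__(self, src_vocab=1000, tgt_vocab=1000, emb=64, hid=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.GRU(emb, hid, batch_first=True)
        self.decoder = nn.GRU(emb, hid, batch_first=True)
        self.out = nn.Linear(hid, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        _, h = self.encoder(self.src_emb(src_ids))            # keep final encoder state
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), h)   # teacher forcing
        return self.out(dec_out)                              # logits over target vocabulary
```

At inference time, decoding would start from a begin symbol and stop once the end symbol (the “===” on the slide) is produced.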
  235. 236 A Neural Conversational Model  Seq2Seq 236 [Vinyals and Le, 2015]
  236. 237 Chit-Chat Bot 237 TV series scripts (~40,000 sentences), US presidential election debates
  237. 238 Sci-Fi Short Film - SUNSPRING 238https://www.youtube.com/watch?v=LY7x2Ihqj
  238. 239 Tutorial Outline I. 對話系統及基本背景知識 II. 語言理解 (Language Understanding) III. 對話管理 (Dialogue Management) IV. 語言生成 (Language Generation) V. 對話系統評估、發展及趨勢 239
  239. 240 Outline  Dialogue System Evaluation  Recent Trends  End-to-End Learning for Dialogue System  Multimodality  Dialogue Breadth  Dialogue Depth  Challenges 240
  240. 241 Dialogue System Evaluation  Language understanding  Sentence-level: frame accuracy (%)  Subsentence-level: intent accuracy (%), slot F-measure (%)  Dialogue management  Dialogue state tracking  Frame accuracy (%)  Dialogue policy  Turn-level: accuracy (%)  Dialogue-level: success rate (%), reward  Natural language generation 241
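As a sketch of how two of these language-understanding metrics could be computed (the data layout is an assumption: frames as dicts, slots as sets of (slot, value) pairs):

```python
def frame_accuracy(predicted_frames, gold_frames):
    """Sentence-level frame accuracy: intent and all slots must match exactly."""
    correct = sum(p == g for p, g in zip(predicted_frames, gold_frames))
    return correct / len(gold_frames)

def slot_f1(predicted_slots, gold_slots):
    """Micro-averaged slot F-measure over sets of (slot, value) pairs."""
    tp = sum(len(p & g) for p, g in zip(predicted_slots, gold_slots))
    n_pred = sum(len(p) for p in predicted_slots)
    n_gold = sum(len(g) for g in gold_slots)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```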
  241. 242 NLG Evaluation  Metrics  Subjective: human judgement (Stent et al., 2005)  Adequacy: correct meaning  Fluency: linguistic fluency  Readability: fluency in the dialogue context  Variation: multiple realizations for the same concept  Objective: automatic metrics  Word overlap: BLEU (Papineni et al, 2002), METEOR, ROUGE  Word embedding based: vector extrema, greedy matching, embedding average 242 There is a gap between human perception and automatic metrics
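For the word-overlap metrics, off-the-shelf implementations exist, e.g. NLTK's sentence-level BLEU (the reference and candidate strings below are made-up examples):

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = ["seven days is a nice chinese restaurant".split()]
candidate = "seven days serves nice chinese food".split()
bleu = sentence_bleu(reference, candidate,
                     smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {bleu:.3f}")
```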
  242. 243 Dialogue Evaluation Measures 243
Understanding Ability: • Whether constrained values specified by users can be understood by the system • Agreement percentage of system/user understandings over the entire dialog (averaging all turns)
Efficiency: • Number of dialogue turns for task completion
Action Appropriateness: • An explicit confirmation for an uncertain user utterance is an appropriate system action • Providing information based on misunderstood user requirements
  243. 244 Outline  Dialogue System Evaluation  Recent Trends  End-to-End Learning for Dialogue System  Multimodality  Dialogue Breadth  Dialogue Depth  Challenges 244
  244. 245 E2E Joint NLU and DM (Yang et al., 2017)  Idea: errors from DM can be propagated to NLU for better robustness 245
Model | DM | NLU
Baseline (CRF+SVMs) | 7.7 | 33.1
Pipeline-BLSTM | 12.0 | 36.4
JointModel | 22.8 | 37.4
Both DM and NLU performance improve https://arxiv.org/abs/1612.00913
  245. 246 E2E Supervised Dialogue System (Wen et al., 2016)  Figure: Intent Network (input “Can I have <v.food>”), Belief Tracker (Korean 0.7, British 0.2, French 0.1, …), Database Operator (MySQL query: “Select * where food=Korean”, DB pointer over Sevendays, CurryPrince, Nirala, RoyalStandard, LittleSeuol), Policy Network, and Generation Network (“<v.name> serves great <v.food> .”) 246 https://arxiv.org/abs/1604.04562
  246. 247 E2E MemNN for Dialogues (Bordes et al., 2016)  Split dialogue system actions into subtasks  API issuing  API updating  Option displaying  Information informing 247 https://arxiv.org/abs/1605.07683 dialogue-level / turn-level
  247. 248 E2E RL-Based Info-Bot (Dhingra et al., 2016) Movie=?; Actor=Bill Murray; Release Year=1993 Find me the Bill Murray’s movie. I think it came out in 1993. When was it released? Groundhog Day is a Bill Murray movie which came out in 1993. KB-InfoBot User (Groundhog Day, actor, Bill Murray) (Groundhog Day, release year, 1993) (Australia, actor, Nicole Kidman) (Mad Max: Fury Road, release year, 2015) Knowledge Base (head, relation, tail) 248 Idea: differentiable database for propagating the gradients https://arxiv.org/abs/1609.00777
  248. 249 E2E RL-Based System (Zhao and Eskenazi, 2016) 249  Joint learning  NLU, DST, Dialogue Policy  Deep RL for training  Deep Q-network  Deep recurrent network Baseline RL Hybrid-RL http://www.aclweb.org/anthology/W/W16/W16-36.pdf#page=19
  249. 250 E2E LSTM-Based Dialogue Control (Williams and Zweig, 2016) 250  Idea: an LSTM maps from raw dialogue history directly to a distribution over system actions  Developers can provide software including business rules & programmatic APIs  LSTM can take actions in the real world on behalf of the user  The LSTM can be optimized using SL or RL https://arxiv.org/abs/1606.01269
  250. 251 E2E Task-Completion Bot (TC-Bot) (Li et al., 2017)  Figure: an End-to-End Neural Dialogue System – Language Understanding (LU) with slot tagging over time steps t-2, t-1, t; Dialogue Management (DM); Natural Language Generation (NLG) – interacting with a User Simulator (User Goal, User Agenda Modeling)  Text Input: Are there any action movies to see this weekend?  Semantic Frame: request_movie genre=action, date=this weekend  System Action / Policy: request_location  User Dialogue Action: Inform(location=San Francisco)  Idea: supervised learning for each component and reinforcement learning for end-to-end training of the neural dialogue system 251 https://arxiv.org/abs/1703.01008
  251. 252 E2E Task-Completion Bot (TC-Bot) (Li et al., 2017)  User goal: Two tickets for “the witch” tomorrow 9:30 PM at regal meridian 16, Seattle. RULE BASED SYSTEM User: Can I get 2 tickets for the witch? Agent: Which theater do you want? User: Regal meridian 16 Agent: What time would you like to see it? User: 9:30 pm Agent: What date would you like to see it? User: Tomorrow! Agent: How many people are you? User: Just two adults. Agent: What is your location? User: Seattle! Agent: What kind of video format do you like? User: Oh, any kind is okay. Agent: Do you have any price constraint? User: No, I don’t care. Agent: Great - I was able to purchase 2 tickets for you to see the witch tomorrow at regal meridian 16 theater in Seattle at 9:30 pm. User: Thank you. Agent: Thank you! 252 The system can learn how to efficiently interact with users for task completion REINFORCEMENT LEARNING SYSTEM User: Can I get 2 tickets for the witch? Agent: What date would you like to see it? User: Tomorrow Agent: Which theater do you prefer? User: Regal meridian 16 Agent: What is your location? User: Seattle Agent: What time would you like to see it? User: 9:30 pm. Agent: Great - I was able to purchase 2 tickets for you to see the witch tomorrow at regal meridian 16 theater in Seattle at 9:30 pm. User: Thanks. Agent: Thanks! https://arxiv.org/abs/1703.01008
  252. 253 Outline  Dialogue System Evaluation  Recent Trends  End-to-End Learning for Dialogue System  Multimodality  Dialogue Breadth  Dialogue Depth  Challenges 253
  253. 254 Brain Signal for Under