End-to-End Task-Completion Neural Dialogue Systems
Xiujun Li, Yun-Nung (Vivian) Chen, Lihong Li, Jianfeng Gao, Asli Celikyilmaz
The 8th International Joint Conference on Natural Language Processing
https://github.com/MiuLab/TC-Bot
Dialogue Interactions
“I want to talk” Chit-chat
“I have a question” Information consumption
“I need to get this done” Task completion
Non task-oriented: chit-chat
Task-oriented: information consumption, task completion
Dialogue Interactions
“I want to talk” Chit-chat
“I have a question” Information consumption
“I need to get this done” Task completion
• What is the paper review schedule?
• Which room is the dialogue tutorial in?
• When is the IJCNLP 2017 conference?
• What does NLP stand for?
Dialogue Interactions
“I want to talk” Chit-chat
“I have a question” Information consumption
“I need to get this done” Task completion
• Book me the flight from Seattle to Taipei
• Reserve a table at Din Tai Fung for 5 people, 7PM tonight
• Purchase two movie tickets for Thor 3 at 9PM tonight
Goal: navigate users toward completing tasks (transactions, API calls, etc.)
Challenges: a single task may include multiple rounds of information consumption
Task-Completion Dialogues
• Crowdsourced data
– Domain: Movie
– Dialogue act: intent & slot-value pairs
• Problem setting
– User wants to achieve a specific task, i.e. buying a movie ticket
– Agent tries to understand user’s goal via dialogue before booking the ticket
Turn | Speaker | Intent & Slots | Utterance
1 | User | greeting(greeting=hi) | Hi
1 | User | request(moviename; city=Houma; state=Louisiana; date=this week; genre=foreign) | Are there any good movies showing around Houma Louisiana this week?
2 | Agent | request(date) | What date would you like me to look for a reservation
3 | User | inform(date=9th) | How about the 9th
4 | Agent | inform(result={}; genre=foreign) | Unfortunately there are no foreign movies playing at this time
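To make the frame-level annotation concrete, here is a minimal sketch of how a turn like the ones above could be represented in code. The class and field names are illustrative stand-ins, not the TC-Bot data format.

```python
from dataclasses import dataclass, field

@dataclass
class DialogueAct:
    """Frame-level annotation: an intent plus slot-value pairs (illustrative, not the TC-Bot schema)."""
    speaker: str                                          # "user" or "agent"
    intent: str                                           # e.g. "request", "inform", "greeting"
    request_slots: list = field(default_factory=list)     # slots whose values are being asked for
    inform_slots: dict = field(default_factory=dict)      # slot -> value constraints

# Turn 1 of the example dialogue above
turn_1 = DialogueAct(
    speaker="user",
    intent="request",
    request_slots=["moviename"],
    inform_slots={"city": "Houma", "state": "Louisiana",
                  "date": "this week", "genre": "foreign"},
)
print(turn_1)
```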
Rule-Based Agent
• Gather all information via a
series of requests (form-filling)
• Issues
1) No notion about what requests
can be skipped
User goal – Two tickets for “Deadpool” tomorrow
9PM at AMC Pacific Place 11 theater, Seattle.
User: Do you have any ticket for movie ‘Deadpool’ at 9:00 pm?
Agent: Which theater do you want?
User: AMC pacific place 11 theater, please!
Agent: How many tickets do you want?
User: Two!
Agent: What date?
User: Tomorrow!
Agent: Which city do you want?
User: Seattle!
Agent: Do you have any specification for zip code?
User: No, I do not care.
Agent: Do you have any distance constraint?
User: No, I do not care.
Agent: How about video format? 2d? Imax 3d?
User: Any kind is okay.
Agent: Do you have any constraint to price?
User: No!
Agent: Okay, I will issue 2 tickets for you, tomorrow 9:00 pm at
AMC pacific place 11 theater, Seattle, movie ‘Deadpool’.
User: Okay, thanks.
Agent: Thanks!
Wasteful! (very likely
the user does not care)
Rule-Based Agent
• Gather all information via a
series of requests (form-filling)
• Issues
1) No notion about what requests
can be skipped
2) No principled way to deal with errors/uncertainty in language understanding
User goal – Two tickets for “Hail Caesar” tomorrow
7:15PM at AMC Lowes Oak tree 6, Seattle.
User: Do you any ticket for movie ‘Hail Caesar’ at theater
AMC Lowes Oak Tree 6?
Agent: What start time do you prefer?
User: 7:15 pm Tomorrow
Agent: Which city you are looking for?
User: Seattle
Agent: How many tickets do you want?
User: Two
Agent: Okay, I have issued two tickets for movie ‘Hail Caeser’
tomorrow 7:50 pm at theater AMC Lowes Oak Tree 6, Seattle.
User: Oh, no.
Agent: Thanks!
Task failed!
(cannot correct the errors from the preceding component)
Rule-Based Agent
• Gather all information via a
series of requests (form-filling)
• Issues
1) No notion about what requests
can be skipped
2) No principled way to deal with errors/uncertainty in language understanding
3) Does not know how to recommend options if the user’s goal is not achievable
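To make the three issues tangible, here is a caricature of the form-filling strategy as a short sketch. The slot list and the ask helper are hypothetical, not the actual rule-based agent.

```python
# Hypothetical sketch of a form-filling rule-based agent (not the actual TC-Bot rule agent).
REQUIRED_SLOTS = ["moviename", "theater", "numberofpeople", "date",
                  "city", "zip", "distanceconstraints", "video_format", "price"]

def rule_based_dialogue(ask, initial_frame):
    """ask(slot) -> the user's answer; initial_frame holds slots the user already provided."""
    frame = dict(initial_frame)
    for slot in REQUIRED_SLOTS:         # issue 1: every slot is requested, none can be skipped
        if slot not in frame:
            frame[slot] = ask(slot)     # issue 2: the answer is trusted verbatim, LU errors propagate
    return frame                        # issue 3: no fallback if the resulting query has no match

# Example: the user only cares about a few slots; everything else is asked anyway.
answers = {"theater": "AMC Pacific Place 11", "numberofpeople": "2", "date": "tomorrow"}
print(rule_based_dialogue(lambda slot: answers.get(slot, "I do not care"), {"moviename": "Deadpool"}))
```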
Task-Oriented Dialogue System Framework
[Figure: pipeline overview. Text Input “Book 5 tickets for movie Star Wars” → Language Understanding (LU) → Semantic Frame request_ticket(moviename=Star Wars; numberofpeople=5) → Dialogue Management (DM), i.e. Dialogue State Tracking (DST) + Dialogue Policy Learning over a State Representation, backed by a Knowledge Database → System Action/Policy request(theater) → Natural Language Generation (NLG) → Text Response “Which theater do you prefer?”]
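Reading the framework as a processing loop, a hypothetical wiring of the four components might look like the sketch below; every function here is a stub standing in for the real LU, DST, policy, and NLG modules, and the template-based NLG is just one option.

```python
# A minimal, hypothetical wiring of the pipeline above (component internals are stubbed out).

def understand(text):
    """LU: text -> semantic frame. A real system would run the BLSTM parser; this is a stub."""
    return {"intent": "request_ticket",
            "inform_slots": {"moviename": "Star Wars", "numberofpeople": "5"}}

def track_state(state, user_frame, kb_results):
    """DST: fold the latest user dialogue act and the KB results into the dialogue state."""
    state = dict(state)
    state.setdefault("inform_slots", {}).update(user_frame["inform_slots"])
    state["kb_results"] = kb_results
    return state

def policy(state):
    """Dialogue policy: map the dialogue state to a system action (a trivial rule as a stand-in)."""
    if "theater" not in state["inform_slots"]:
        return {"act": "request", "slot": "theater"}
    return {"act": "inform", "slot": "ticket"}

def generate(action):
    """NLG: template-based realisation of the system action."""
    templates = {("request", "theater"): "Which theater do you prefer?"}
    return templates.get((action["act"], action["slot"]), "Okay.")

user_frame = understand("Book 5 tickets for movie Star Wars")
state = track_state({}, user_frame, kb_results=[])
print(generate(policy(state)))   # -> "Which theater do you prefer?"
```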
Task-Oriented Dialogue System Framework
[Figure: the same framework annotated at the component interfaces. The user's Natural Language input is mapped by LU to a Dialogue Act; Dialogue Management (DST + Dialogue Policy Learning over a State Representation, backed by a Knowledge Database) produces a system Dialogue Act, which NLG turns back into Natural Language for the user.]
User Simulation in Frame-Level Semantics
[Figure: frame-level user simulation. The User Model exchanges Dialogue Acts directly with Dialogue Management (DST + Dialogue Policy Learning over a State Representation, backed by a Knowledge Database); an Error Model Controller injects recognition and LU errors into the user Dialogue Acts before they reach the system.]
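A possible sketch of the error model controller at the frame level: with some probability it corrupts the simulated user's dialogue act before the system sees it. The rates, intent inventory, and dict format are illustrative assumptions, not the TC-Bot implementation.

```python
import random

INTENTS = ["greeting", "thanks", "inform", "request"]

def error_model_controller(user_act, intent_error_rate=0.10):
    """With probability intent_error_rate, replace the user's intent with a different one,
    mimicking recognition/LU errors; slot values can be corrupted in the same way."""
    noisy = {"intent": user_act["intent"], "inform_slots": dict(user_act["inform_slots"])}
    if random.random() < intent_error_rate:
        noisy["intent"] = random.choice([i for i in INTENTS if i != user_act["intent"]])
    return noisy

print(error_model_controller({"intent": "inform", "inform_slots": {"date": "9th"}}))
```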
User Simulation in Natural Language Level
[Figure: natural-language-level user simulation. The User Model additionally passes its Dialogue Acts through NLG to produce natural language (NL), which the system's Language Understanding (LU) maps back to Dialogue Acts before Dialogue Management (DST + Dialogue Policy Learning over a State Representation, backed by a Knowledge Database).]
Task-Completion Neural Dialogue Systems
• Language Understanding
– Joint semantic frame parsing by BLSTM (Hakkani-Tur et al., 2016)
• Dialogue State Tracking
– Available results returned by the formed symbolic query
– Latest user dialogue action
• Dialogue Policy Learning
– Reinforcement learning policy (Mnih et al., 2015)
• Natural Language Generation
– Template-based
– Model-based: semantically-conditioned LSTM generation (SC-LSTM)
(Wen et al., 2015)
[Figure annotations: LU, DST, and NLG are trained in a supervised way; dialogue policy learning uses reinforcement learning; the supervised components can then be fine-tuned with RL. The LU diagram tags each word w_i with a slot label (or O) and predicts the intent at the EOS token.]
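For the LU component, here is a compact sketch of a joint slot-tagging and intent model in the spirit of Hakkani-Tur et al. (2016), assuming PyTorch; the class name, vocabulary sizes, and dimensions are placeholders rather than the TC-Bot implementation.

```python
import torch
import torch.nn as nn

class JointLU(nn.Module):
    """BLSTM that tags each token with a slot label and predicts the utterance intent
    from the final hidden state (a sketch, not the TC-Bot model)."""
    def __init__(self, vocab_size, n_slots, n_intents, emb_dim=100, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.slot_head = nn.Linear(2 * hidden, n_slots)      # per-token slot tags (IOB / O)
        self.intent_head = nn.Linear(2 * hidden, n_intents)  # utterance-level intent

    def forward(self, token_ids):
        h, _ = self.lstm(self.embed(token_ids))              # (batch, seq_len, 2 * hidden)
        slot_logits = self.slot_head(h)                      # one slot label per word w_i
        intent_logits = self.intent_head(h[:, -1, :])        # intent predicted at the EOS position
        return slot_logits, intent_logits

# Toy forward pass with made-up sizes
model = JointLU(vocab_size=5000, n_slots=30, n_intents=12)
tokens = torch.randint(0, 5000, (1, 8))                      # one utterance of 8 token ids
slots, intent = model(tokens)
print(slots.shape, intent.shape)                             # (1, 8, 30) and (1, 12)
```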
End-to-End Neural Dialogue Systems
• LU, DST (neural dialogue system), and NLG (user simulation) are trained in a supervised way
• End-to-end training for dialogue policy learning
[Figure: end-to-end architecture. A User Goal drives the User Simulation (User Model + NLG emitting words w_0, w_1, w_2, ..., EOS); the Neural Dialogue System runs LU (slot tags and intent), DST over states s_1 ... s_n, and Dialogue Policy Learning over actions a_1 ... a_k, backed by a Knowledge Database.]
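To illustrate the interaction pattern, here is a toy, self-contained stand-in for one episode of end-to-end policy learning: a scripted user with a fixed goal talks to a random-policy agent at the dialogue-act level. The goal, action set, and reward values are illustrative placeholders, not TC-Bot's training loop.

```python
import random

# A scripted user goal and a tiny action set; a trained DQN policy would replace random.choice.
GOAL = {"moviename": "Deadpool", "date": "tomorrow", "starttime": "9pm", "numberofpeople": "2"}
ACTIONS = [("request", slot) for slot in GOAL] + [("book", None)]

def run_episode(max_turns=10):
    known, reward = {}, 0
    for _ in range(max_turns):
        act, slot = random.choice(ACTIONS)
        if act == "request":
            known[slot] = GOAL[slot]            # the simulated user answers from its goal
            reward -= 1                         # per-turn penalty
        else:                                   # "book": episode ends in success or failure
            return reward + (2 * max_turns if known == GOAL else -max_turns)
    return reward - max_turns                   # ran out of turns: failure

print(run_episode())
```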
Reinforcement Learning Agent
• Dialogue policy learning
– Deep Q-network: estimates the Q-value for a given state-action pair
• Reward
– Success: the agent answers all of the requested slots based on the user’s constraints and books the movie tickets within the maximum number of turns
– Failure: 1) the agent finds no movie matching the user’s constraints, or 2) the dialogue exceeds the maximum number of turns
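A minimal DQN-style sketch of the Q-value estimator, assuming PyTorch: the dialogue state is a flat feature vector and each output unit corresponds to one system action. All sizes and names are illustrative assumptions.

```python
import random
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a dialogue-state feature vector to one Q-value per system action (illustrative sizes)."""
    def __init__(self, state_dim=270, n_actions=45, hidden=80):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_actions))

    def forward(self, state):
        return self.net(state)          # shape: (batch, n_actions)

def act(q_net, state, n_actions=45, epsilon=0.1):
    """Epsilon-greedy action selection over the estimated Q-values."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        return int(q_net(state).argmax(dim=-1).item())

q_net = QNetwork()
state = torch.zeros(1, 270)             # placeholder state representation
print(act(q_net, state))
```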
Experiments
• Rule-based agent
– Actions
• ask question (request)
• answer question (inform)
• give multiple_choice to user
• say “confirm_answer” to user
• say “closing”, “thanks”
– The agent asks for slots in a priority order, but with some randomness within a sliding window (say, size = 2 or 3).
• RL agent
– Model: Deep Q-Network
– Rewards
• Success: 2 × max_turn
• Fail: - max_turn
• -1 per-turn penalty
– Actions: 45 actions
• e.g., request(starttime), inform(moviename), confirm(question), etc.
– State transition tuples (s_t, a_t, r_t, s_{t+1})
– Experience replay (Schaul et al., 2015)
• Priority, dynamic pool
• Starts with a pool of rule-based tuples
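A small sketch of the experience pool built from (s_t, a_t, r_t, s_{t+1}) tuples and warm-started with rule-based transitions. Uniform sampling is used here for brevity; the prioritized, dynamic pool from the slide is omitted, and the interface is hypothetical.

```python
import random
from collections import deque

class ReplayPool:
    """Bounded pool of (s_t, a_t, r_t, s_t+1) transitions, warm-started with rule-based dialogues."""
    def __init__(self, capacity=10000, rule_based_transitions=()):
        self.pool = deque(rule_based_transitions, maxlen=capacity)

    def push(self, state, action, reward, next_state):
        self.pool.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform sampling; the slides use prioritized replay (Schaul et al., 2015) instead.
        return random.sample(list(self.pool), min(batch_size, len(self.pool)))

# Warm start from transitions collected with the rule-based agent, then add RL experience.
pool = ReplayPool(rule_based_transitions=[("s0", "request(starttime)", -1, "s1")])
pool.push("s1", "inform(moviename)", -1, "s2")
print(pool.sample(2))
```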
End-to-End Reinforcement Learning
[Learning curves at the frame-level semantics and natural language levels: success rate over training epochs for the rule-based and RL agents]
The RL agent learns to interact with users to complete tasks more efficiently and effectively, and it outperforms the rule-based agent.
Language Understanding Impact
• Different error rates in LU
The quality of language understanding is crucial to dialogue-level performance
Intent Error Analysis
• Intent error type
– I0: random
– I1: within group
– I2: between group
• Intent error rate
– I3: 0.00
– I4: 0.10
– I5: 0.20
Intent errors slightly influence the RL system performance
Group 1: greeting(), thanks(), etc
Group 2: inform(xx)
Group 3: request(xx)
Between-group intent errors degrade the system performance more
e.g., request_moviename(actor=Robert Downey Jr) misrecognized as request_year
Slot Error Analysis
• Slot error type
– S0: random
– S1: slot deletion
– S2: value substitution
– S3: slot substitution
• Slot error rate
– S4: 0.00
– S5: 0.10
– S6: 0.20
Slot errors significantly degrade the RL system performance
Value substitution has the largest impact on the system performance
e.g., request_moviename(actor=Robert Downey Jr) misrecognized as request_moviename(director=Robert Downey Sr)
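To make the slot error types concrete, here is a hypothetical injector covering deletion, value substitution, and slot substitution; the slot vocabulary is made up for illustration.

```python
import random

SLOT_VOCAB = {"actor": ["Robert Downey Jr", "Chris Evans"],
              "director": ["Robert Downey Sr", "Taika Waititi"]}

def inject_slot_error(inform_slots, error_type):
    """Return a copy of the slot-value dict with one simulated slot-level error applied."""
    slots = dict(inform_slots)
    if not slots:
        return slots
    slot = random.choice(list(slots))
    if error_type == "deletion":                      # the slot is dropped entirely
        del slots[slot]
    elif error_type == "value_substitution":          # same slot, wrong value
        wrong = [v for v in SLOT_VOCAB.get(slot, []) if v != slots[slot]]
        slots[slot] = random.choice(wrong or [slots[slot]])
    elif error_type == "slot_substitution":           # the value gets attached to the wrong slot
        value = slots.pop(slot)
        others = [s for s in SLOT_VOCAB if s != slot] or [slot]
        slots[random.choice(others)] = value
    return slots

# e.g. actor=Robert Downey Jr can become director=Robert Downey Jr (slot substitution)
print(inject_slot_error({"actor": "Robert Downey Jr"}, "slot_substitution"))
```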
Error Comparison
[Plots: dialogue success rate under increasing intent error rates and under increasing slot error rates]
The RL agent is more robust to intent errors than to slot errors in terms of dialogue-level performance
Slot filling is more important than intent detection in language understanding
Human Evaluation
• Users presented with a predefined user goal
• User rating: 1 (worst) to 5 (best) based on both naturalness
and coherence of the dialogue
Dialogue Examples
User goal – Two tickets for “Deadpool” tomorrow 9PM at AMC Pacific Place 11 theater, Seattle.
RULE BASED SYSTEM
User: Do you have any ticket for movie ‘Deadpool’ at 9:00 pm?
Agent: Which theater do you want?
User: AMC pacific place 11 theater, please!
Agent: How many tickets do you want?
User: Two!
Agent: What date?
User: Tomorrow!
Agent: Which city do you want?
User: Seattle!
Agent: Do you have any specification for zip code?
User: No, I do not care.
Agent: Do you have any distance constraint?
User: No, I do not care.
Agent: How about video format? 2d? Imax 3d?
User: Any kind is okay.
Agent: Do you have any constraint to price?
User: No!
Agent: Okay, I will issue 2 tickets for you, tomorrow 9:00 pm at AMC
pacific place 11 theater, Seattle, movie ‘Deadpool’.
User: Okay, thanks.
Agent: Thanks!
REINFORCEMENT LEARNING SYSTEM
User: I want two tickets for movie ‘Deadpool’.
Agent: What is the starttime you prefer?
User: 9:00 PM
Agent: Which day?
User: Tomorrow
Agent: Which city?
User: Seattle
Agent: Which theater do you prefer?
User: AMC Pacific Place 11 theater.
Agent: Okay, I will issue 2 tickets for you, tomorrow 9:00 pm
at AMC pacific place 11 theater, Seattle, movie ‘Deadpool’.
User: Okay, thanks.
Agent: Thanks!
Issue 1: no notion about what requests can be skipped. The RL agent skips the requests the user may not care about, improving efficiency.
Dialogue Examples
User goal – Two tickets for “Hail Caesar” tomorrow 7:15PM at AMC Lowes Oak Tree 6, Seattle.
Automatically learn when and what to confirm for better robustness
REINFORCEMENT LEARNING SYSTEM
Agent: What start time do you prefer?
User: 7:15 pm Tomorrow
Agent: Which city you are looking for?
User: Seattle
Agent: Okay, I will issue you tickets for movie ‘Hail Caeser’
tomorrow 7:50 pm at theater AMC Lowes Oak Tree 6, Seattle.
User: Oh, no.
Agent: It is 7:50 pm, right?
User: Two tickets for 7:15 pm, please.
Agent: Okay, I will issue you two tickets for movie ‘Hail Caeser’
tomorrow 7:15 pm at theater AMC Lowes Oak Tree 6, Seattle.
User: Thanks!
Agent: Thanks!
RULE BASED SYSTEM
User: Do you any ticket for movie ‘Hail Caesar’ at theater
AMC Lowes Oak Tree 6?
Agent: What start time do you prefer?
User: 7:15 pm Tomorrow
Agent: Which city you are looking for?
User: Seattle
Agent: How many tickets do you want?
User: Two
Agent: Okay, I have issued two tickets for movie ‘Hail
Caeser’ tomorrow 7:50 pm at theater AMC Lowes Oak
Tree 6, Seattle.
User: Oh, no.
Agent: Thanks!
Issue 2: no principled way to deal with LU errors/uncertainty
Conclusions
• Our neural dialogue system demonstrates
– Robustness
• selecting actions based on uncertainty and confusion via RL
• systematic analysis of the impact of language understanding errors on dialogue system performance:
slot-level errors affect system performance more than intent-level errors
slot-value replacement degrades the performance most
– Flexibility
• first neural dialogue system that allows user-initiated behaviors during
conversations
– Reproducibility
• demonstrating how to evaluate RL dialogue agents using crowdsourced datasets and simulated users in an end-to-end fashion
• guaranteeing reproducibility and consistent comparisons of competing
methods in an identical setting
The general goal involves treating natural language as a knowledge representation language, so MR involves deriving structured information from free text and then doing something with it.
But the work that has been done in this space represents a tangle of different agendas. Let's unpack them a bit.
Where does it make sense for MS to focus its efforts in this space?
For 1, it doesn't drive mainstream AI research; it is a hobby strand, perhaps good for PR purposes, but not a good driver of research or product.
For 2 and 3, the user already has some idea of what the information need is.
For 4, the user is overwhelmed by the complexity of the problem/solution space and needs help navigating it.