This is a survey of Dialogue Systems and Question Answering, covering three generations: (1) symbolic rule/template-based QA; (2) data-driven machine learning; (3) data-driven deep learning. It also presents available frameworks and datasets for dialogue systems.
2. Communicating Knowledge
Vietnam Development Center
What is Dialogue System?
Definition
3 generations of Dialog System
Evaluation
Spoken Dialogue System
Architecture
Components
Some approaches
A Neural Conversation Model
Deep Reinforcement Learning for Dialogue Generation
Common Frameworks and Data sets
Discussion
Contents
Definition:
A dialogue system (DS) is a computer program developed to converse with humans in a coherent manner.
A DS can use text, speech, graphics, haptics, gestures, and other modes for communication on both input and output.
Nowadays, speech is most commonly used for input and output => Spoken Dialogue System (SDS).
3 Generations of DS
G1: Symbolic Rule/Template-Based QA
Focus on grammatical rules & ontology design by human experts (early AI approach)
Easy to interpret, debug, and update
Popular before the late 90's
Still in use in commercial systems and by bot startups
Limitations:
heavy reliance on experts
hard to scale over domains
data used only to help design rules, not for learning
What is Dialogue System - 1/3
4. Communicating Knowledge
Vietnam Development Center
G2: Data-Driven, Learning
Data used not to design rules for NLU and action, but to learn statistical parameters in dialogue systems
Reduces the cost of hand-crafting a complex dialogue manager
Robustness against speech recognition errors in noisy environments
MDP (Markov Decision Process) / POMDP (Partially Observable MDP) & RL for dialogue policy
Discriminative (CRF) & generative (HMM) methods for NLU
Popular in academic research until 2014 (before deep learning arrived in the dialogue world); in parallel with G1 (BBN, AT&T, CMU, SRI, CU, ...)
Limitations:
Not easy to interpret, debug, and update systems
Still hard to scale over domains
Models & representations not powerful enough; no end-to-end learning; hard to scale up
Remained academic until deep learning arrived
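As a toy illustration of casting dialogue policy as an MDP, the sketch below runs value iteration on a hypothetical two-state dialogue MDP. All states, actions, transitions, and rewards here are invented for illustration only; they are not from any system in these notes.

```python
# Toy value iteration for a dialogue policy (all names/values hypothetical).
# In state "need_info" the agent can keep asking (small cost) or confirm
# and finish the dialogue (large reward).
states = ["need_info", "done"]
actions = ["ask", "confirm"]
# transition[state][action] = (next_state, reward) -- deterministic for brevity
transition = {
    "need_info": {"ask": ("need_info", -1.0), "confirm": ("done", 5.0)},
    "done":      {"ask": ("done", 0.0),       "confirm": ("done", 0.0)},
}
gamma = 0.9  # discount factor

# Value iteration: repeatedly apply the Bellman optimality backup.
V = {s: 0.0 for s in states}
for _ in range(50):
    V = {s: max(r + gamma * V[s2] for (s2, r) in transition[s].values())
         for s in states}

# Greedy policy with respect to the converged values.
policy = {s: max(actions,
                 key=lambda a: transition[s][a][1] + gamma * V[transition[s][a][0]])
          for s in states}
print(policy)  # in "need_info" the optimal action is "confirm"
```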
What is Dialogue System – 2/3
G3: Data-Driven Deep Learning:
Like G2, data is used to learn everything in dialogue systems
Reduces the cost of hand-crafting a complex dialogue manager
Robustness against speech recognition errors in noisy environments & against NLU errors
MDP/POMDP & reinforcement learning for dialogue policy (same as G2)
Neural models & representations are much more powerful
End-to-end learning becomes feasible
Attracted huge research efforts since 2015 (after deep learning's success in vision/speech and deep RL's success in Atari games)
Limitations:
Still not easy to interpret, debug, and update systems
Lacks an interface between continuous neural learning and the symbolic natural-language structure presented to human users
Lacks active research on scaling over domains via deep transfer learning & RL
No clear commercial success reported yet
Evaluation:
Still debated; no evaluation method has been established as a standard.
BLEU is commonly used.
Some researchers define their own evaluation metrics to measure quality.
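Since BLEU is mentioned above, here is a minimal single-reference sentence-BLEU sketch (modified n-gram precision with a brevity penalty, uniform weights up to 4-grams). This is an illustrative simplification, not the exact implementation used in the cited papers.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu(reference, candidate, max_n=4):
    """Single-reference sentence BLEU with uniform n-gram weights."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(ngrams(candidate, n))
        ref = Counter(ngrams(reference, n))
        overlap = sum((cand & ref).values())        # clipped n-gram matches
        total = max(sum(cand.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0                                  # no smoothing in this sketch
    log_avg = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty: punish candidates shorter than the reference.
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(log_avg)

ref = "i am going to the store".split()
print(round(sentence_bleu(ref, ref), 2))            # → 1.0 (identical sentences)
print(sentence_bleu(ref, "i am".split()))           # → 0.0 (no 3-gram overlap)
```

Production code would normally use a library implementation with smoothing for short sentences rather than this bare version.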
What is Dialogue System – 3/3
Automatic Speech Recognition (ASR):
Converts the voice signal to words and manages uncertainty.
Challenges:
Environmental noise
Speech production: low fluency, false starts, filled pauses, repeats, corrections, accent, age, gender, and differences between human-human and human-machine speech
Technological familiarity of the user
Spoken Language Understanding (SLU):
Spoken Language Understanding is the task of extracting meaning from utterances
Converts words to concepts:
Dialogue acts (the overall intent of an utterance)
Domain-specific concepts
Syntactic/semantic parsing
Very difficult under noisy conditions
Challenges:
Recognizer errors and background noise result in indels (insertions / substitutions / deletions) and word-boundary detection problems
Language production phenomena: low fluency, false starts, corrections, and repairs are difficult to parse
Meaning must often be assembled across multiple speaker turns
There are many, many possible ways to say the same thing.
Spoken Dialog System - Components
Dialogue Management:
Maps concepts to actions.
Manages dialogue history, state, and the general flow of the conversation.
Language Generation:
Generates a response for the input.
Text-To-Speech Synthesis:
Converts the generated response to speech and presents it to the user.
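The SDS component flow described above (ASR → SLU → Dialogue Management → Language Generation → TTS) can be sketched as a pipeline of stubs. Every function and template here is a hypothetical placeholder standing in for a real component.

```python
# Minimal SDS pipeline sketch; each stage is a stub, not a real implementation.
def asr(audio):
    # Automatic Speech Recognition: audio signal -> words.
    return audio["transcript"]            # pretend recognition is perfect

def slu(words):
    # Spoken Language Understanding: words -> concepts (dialogue act / intent).
    intent = "greet" if "hello" in words.lower() else "unknown"
    return {"intent": intent, "text": words}

def dialogue_manager(frame, history):
    # Maps concepts to an action while tracking dialogue history/state.
    history.append(frame)
    return "reply_greeting" if frame["intent"] == "greet" else "ask_clarification"

def nlg(action):
    # Language Generation: action -> response text (template-based, G1 style).
    templates = {"reply_greeting": "Hello! How can I help?",
                 "ask_clarification": "Sorry, could you rephrase that?"}
    return templates[action]

def tts(text):
    # Text-To-Speech stub: just wrap the text instead of synthesizing audio.
    return {"speech": text}

history = []
out = tts(nlg(dialogue_manager(slu(asr({"transcript": "hello there"})), history)))
print(out["speech"])                      # → Hello! How can I help?
```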
Previous approaches are often restricted to specific domains (e.g., booking an airline ticket) and require hand-crafted rules.
Proposed a model based on the authors' "Sequence to Sequence Learning with Neural Networks" (NIPS, 2014).
Can be trained end-to-end and thus requires far fewer hand-crafted rules.
Allows researchers to work on tasks for which domain knowledge may not be readily available, or for which it is simply too hard to design rules manually.
The model:
A Neural Conversation Model – Oriol Vinyals, Quoc V. Le – Google
Uses the seq2seq framework for modeling conversations
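The encoder-decoder idea can be sketched with a toy vanilla-RNN seq2seq in plain Python: the encoder folds the input tokens into a single "thought vector", and the decoder greedily emits tokens from it. The weights are random and untrained, and all sizes and names are hypothetical; the actual model in the paper uses LSTMs trained on large corpora.

```python
import math, random

random.seed(0)

VOCAB, HIDDEN = 5, 4      # toy sizes (hypothetical)
BOS, EOS = 0, 1           # special begin/end tokens

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def vadd(a, b):
    return [x + y for x, y in zip(a, b)]

E = rand_matrix(VOCAB, HIDDEN)       # token embeddings
W_hh = rand_matrix(HIDDEN, HIDDEN)   # recurrent weights (shared for brevity)
W_out = rand_matrix(VOCAB, HIDDEN)   # hidden state -> vocabulary logits

def step(h, token):
    # One vanilla-RNN step: h' = tanh(W_hh h + embed(token)).
    return [math.tanh(x) for x in vadd(matvec(W_hh, h), E[token])]

def encode(tokens):
    h = [0.0] * HIDDEN
    for t in tokens:
        h = step(h, t)
    return h                          # "thought vector" summarising the input

def decode(h, max_len=10):
    out, token = [], BOS
    for _ in range(max_len):
        h = step(h, token)
        logits = matvec(W_out, h)
        token = max(range(VOCAB), key=lambda i: logits[i])   # greedy decoding
        if token == EOS:
            break
        out.append(token)
    return out

reply = decode(encode([2, 3, 4]))     # "respond" to a toy input sequence
print(reply)
```

With random weights the reply is meaningless; the point is only the shape of the computation: encode the whole input, then decode one token at a time conditioned on the running hidden state.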
Datasets:
IT Helpdesk Troubleshooting:
Typical interaction length: 400 words
Turn taking is clearly signaled
30M tokens (3M used for validation)
OpenSubtitles (Tiedemann, 2009):
Noisy dataset
Movie conversations in XML format
After preprocessing:
– Training set: 62M sentences, 923M tokens
– Validation set: 26M sentences, 295M tokens
Experiments:
IT Helpdesk:
Trained a single-layer LSTM with 1024 memory cells using stochastic gradient descent with gradient clipping.
Vocabulary: 20K words
Conversation 1: VPN issues
Experiments:
OpenSubtitles:
Trained a two-layer LSTM with 4096 memory cells per layer.
Vocabulary: the 100K most frequent words.
Conclusion:
A simple language model based on the seq2seq framework can be used to train a conversational engine.
It can generate simple, basic conversations and extract knowledge from a noisy but open-domain dataset.
Purely data-driven, without any rules, yet it can generate quite appropriate answers.
A big limitation: the lack of a coherent personality.
Authors: J. Li, W. Monroe, A. Ritter, M. Galley, J. Gao, D. Jurafsky
Despite the success of seq2seq models in dialogue generation, two problems emerge:
How to keep the conversation going longer?
Seq2seq models tend to generate generic responses like "I don't know" regardless of the input. => Responses like this close off the conversation.
The cause is that seq2seq models use an MLE objective, while the frequency of these generic responses is very high in the training set.
The system becomes stuck in an infinite loop of repetitive responses. This is due to MLE-based seq2seq models' inability to account for repetition.
Deep RL for Dialogue Generation
=> We need a conversation framework that has the ability to:
(1) integrate developer-defined rewards that better mimic the true goal of chatbot development;
(2) model the long-term influence of a generated response in an ongoing dialogue.
Proposed a neural RL generation method that:
can optimize long-term rewards designed by system developers;
uses the encoder-decoder architecture as its backbone;
simulates conversation between two virtual agents to explore the space of possible actions while learning to maximize expected reward.
The authors define simple heuristic approximations to rewards that characterize good conversations: good conversations are forward-looking or interactive (a turn suggests a following turn), informative, and coherent.
Uses a policy-gradient method instead of the MLE objective function.
The authors' goal is to integrate seq2seq and RL to get the advantages of both.
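The policy-gradient idea can be illustrated with a toy REINFORCE update over three candidate responses. The rewards and all names here are invented for illustration; the paper applies the same gradient at the level of full seq2seq decoders, with the dialogue-quality rewards described on the following slides.

```python
import math, random

random.seed(1)

# Three candidate responses with hand-assigned rewards (hypothetical stand-in
# for the dialogue reward r computed from a simulated conversation).
rewards = [0.1, 1.0, 0.2]
theta = [0.0, 0.0, 0.0]                 # unnormalised policy scores
baseline = sum(rewards) / len(rewards)  # simple baseline to reduce variance

def softmax(z):
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

lr = 0.1
for _ in range(2000):
    p = softmax(theta)
    a = random.choices(range(3), weights=p)[0]   # sample a response
    advantage = rewards[a] - baseline
    # REINFORCE: d(log pi(a))/d(theta_i) = 1[i == a] - p_i,
    # scaled by the advantage instead of the raw MLE log-likelihood.
    for i in range(3):
        theta[i] += lr * advantage * ((1.0 if i == a else 0.0) - p[i])

p = softmax(theta)
print(p.index(max(p)))                  # → 1 (the highest-reward response wins)
```

Contrast with MLE: MLE would push probability toward whatever is frequent in the training data (the dull responses), whereas the policy gradient pushes probability toward responses with high long-term reward.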
Reward: r
Ease of answering: a generated answer should be easy to respond to.
S: a set of 8 manually collected dull responses ("I don't know", ...)
N_S: the size of S; s: a sequence in S; N_s: the number of tokens in s.
P_seq2seq: the likelihood computed by the seq2seq model.
Information flow: the agent should contribute new information to keep the dialogue moving => penalize semantic similarity between two consecutive turns of the agent:
h_{p_i}, h_{p_{i+1}}: encoder representations of turns p_i, p_{i+1}
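The formulas for these two rewards do not survive in these notes; reconstructed from the Li et al. (2016) paper, they are approximately:

```latex
% Ease of answering: negative likelihood of producing the dull-response set S
r_1 = -\frac{1}{N_S} \sum_{s \in S} \frac{1}{N_s} \log P_{\text{seq2seq}}(s \mid a)

% Information flow: penalize cosine similarity between consecutive agent turns
r_2 = -\log \cos(h_{p_i}, h_{p_{i+1}})
    = -\log \frac{h_{p_i} \cdot h_{p_{i+1}}}{\lVert h_{p_i} \rVert \, \lVert h_{p_{i+1}} \rVert}
```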
Reward: r
Semantic coherence: to avoid responses that are highly rewarded but ungrammatical or incoherent.
P_seq2seq(a | p_i, q_i): the probability of generating response a given the previous utterances [p_i, q_i]
2nd part: the backward probability of generating the previous utterance q_i given the response a
Final reward r:
λ1 + λ2 + λ3 = 1, with λ1 = λ2 = 0.25, λ3 = 0.5
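Again reconstructing from the Li et al. (2016) paper (the formulas are not present in these notes), the coherence reward and the combined reward are approximately:

```latex
% Semantic coherence: forward and backward seq2seq log-likelihoods
r_3 = \frac{1}{N_a} \log P_{\text{seq2seq}}(a \mid q_i, p_i)
    + \frac{1}{N_{q_i}} \log P^{\text{backward}}_{\text{seq2seq}}(q_i \mid a)

% Final reward: weighted combination of the three heuristics
r(a, [p_i, q_i]) = \lambda_1 r_1 + \lambda_2 r_2 + \lambda_3 r_3
```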
Experimental results:
A subset of 10M messages from the OpenSubtitles dataset; 0.8M messages with the lowest likelihood of generating a dull response were extracted, to ensure the initial inputs are easy to respond to.
TensorFlow:
Open-source software library for numerical computation using data flow graphs
IrisTK:
Java-based framework for developing spoken dialogue systems.
URL: http://www.iristk.net/
OpenDial:
Java-based, domain-independent toolkit for developing spoken dialogue systems
URL: http://www.opendial-toolkit.net/
CSLU Toolkit:
A comprehensive suite of tools to enable exploration, learning, and research into speech and human-computer interaction.
URL: http://www.cslu.ogi.edu/toolkit/
NADIA (developed by Markus M. Berg):
A set of Java-based components for creating spoken dialogue systems.
Detailed information (PhD thesis, papers): http://mmberg.net/nadia/
Reference source code (including data model): https://github.com/mmberg
Datasets:
https://github.com/karthikncode/nlp-datasets
Ubuntu Dialogue Corpus
Frameworks and Datasets for SDS
Three Generations of SDS – Li Deng, Chief Scientist of AI, Microsoft AI
The Unreasonable Effectiveness of Recurrent Neural Networks
A Neural Conversation Model – Oriol Vinyals, Quoc V. Le – Google, 2015
Deep Reinforcement Learning for Dialogue Generation – Jiwei Li, Will Monroe, Dan Jurafsky (Stanford Univ.), Alan Ritter (Ohio State Univ.), Michel Galley, Jianfeng Gao (Microsoft Research), 2016
Neural Responding Machine for Short-Text Conversation – Lifeng Shang, Zhengdong Lu, Hang Li – Huawei Technologies, 2015
Deep Reinforcement Learning: An Overview – Yuxi Li, 2017
Dialogue System – Wikipedia: https://en.wikipedia.org/wiki/Dialog_system
Speech Recognition – Wikipedia: https://en.wikipedia.org/wiki/Speech_recognition
Neural Network Dialogue System Papers: https://github.com/snakeztc/NeuralDialogPapers
Datasets for Natural Language Processing: https://github.com/karthikncode/nlp-datasets
References