1. Deep Learning for Dialogue Systems
Liangqun Lu
PhD program in Biology/Bioinformatics
MS program in Computer Science
2. JARVIS (Just Another Rather Very Intelligent System)
"J.A.R.V.I.S., are you up?"
"For you sir, always."
"J.A.R.V.I.S.? You ever hear the
tale of Jonah?"
"I wouldn't consider him a role
model."
"J.A.R.V.I.S., where's my flight
power?!"
"Working on it, sir. This is a
prototype."
https://www.youtube.com/watch?v=ZwOxM0-byvc
10. Summaries
● The Seq2seq model can generate output sentences conditioned on input sentences.
● The maximum likelihood estimation (MLE) objective function does not guarantee good responses in real-world conversations with humans.
● It tends to generate dull, generic responses such as "I don't know" regardless of the input, which kills a conversation.
● A Mutual Information (MI) objective can avoid roughly 30% of the dull responses (a toy reranking sketch follows this list).
● The model is also prone to getting stuck in an infinite loop of repetitive responses.
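To make the MI idea concrete, here is a minimal Python sketch of Maximum Mutual Information (MMI) reranking in the spirit of Li et al. (2016). Both scoring functions are toy stand-ins for the trained Seq2Seq and language models; every name below is an illustrative assumption, not the paper's code.

# Sketch of MMI reranking: score each candidate response T for message S by
# log P(T|S) - lambda * log P(T), so responses that are likely under ANY
# input ("i don't know") are demoted.

def log_p_response_given_input(response: str, message: str) -> float:
    """Toy stand-in for log P(T|S) from a trained Seq2Seq model."""
    overlap = len(set(response.split()) & set(message.split()))
    return -len(response.split()) + 0.5 * overlap

def log_p_response(response: str) -> float:
    """Toy stand-in for log P(T) from an unconditional language model."""
    generic = {"i", "don't", "know"}
    bonus = sum(1 for w in response.lower().split() if w in generic)
    return -len(response.split()) + bonus  # generic phrases score high

def mmi_rerank(message, candidates, lam=0.5):
    # MMI-antiLM objective: penalizing log P(T) demotes dull responses.
    return max(candidates,
               key=lambda t: log_p_response_given_input(t, message)
                             - lam * log_p_response(t))

print(mmi_rerank("where do you live now ?",
                 ["i don't know", "i live in los angeles"]))

Under plain MLE the dull candidate would win; with the anti-language-model penalty the specific answer is selected.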
11. 2. RL for sentence generation
[Diagram: an encoder-decoder generator maps the input sentence h to an output sentence x, and a human assigns a reward R(h, x) that is fed back to train the generator; a toy sketch of this loop follows.]
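A minimal Python sketch of the policy-gradient loop in the diagram: the generator produces a response x to input h, a (here simulated) human returns a scalar reward R(h, x), and REINFORCE nudges the generator toward higher-reward responses. The unigram "generator" and the toy reward are assumptions for illustration, not the actual chatbot.

import numpy as np

rng = np.random.default_rng(0)
vocab = ["hi", "bye", "i", "don't", "know"]
logits = np.zeros(len(vocab))   # degenerate "generator": a unigram policy

def simulated_human_reward(h, x):
    # Stand-in for R(h, x): the human dislikes the dull token "know".
    return 0.0 if x == "know" else 1.0

lr = 0.1
for step in range(500):
    probs = np.exp(logits) / np.exp(logits).sum()
    i = rng.choice(len(vocab), p=probs)        # sample x ~ pi(.|h)
    r = simulated_human_reward("how are you?", vocab[i])
    grad = -probs                              # d log pi(i) / d logits ...
    grad[i] += 1.0                             # ... equals e_i - probs
    logits += lr * r * grad                    # REINFORCE update

print({w: round(p, 2)
       for w, p in zip(vocab, np.exp(logits) / np.exp(logits).sum())})

Tokens that never earn reward lose probability mass over training, which is the core mechanic the dialogue papers build on.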
12-14. Hung-yi Lee: RL and GAN for Sentence Generation and Chat-bot
16. Evaluation
● Training: OpenSubtitles dataset (0.8M message-response pairs)
● Testing: 1,000 input messages
● Metrics: dialogue length, lexical diversity, and human evaluation (a metric sketch follows this list)
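As a concrete illustration of the lexical-diversity metric, here is a small Python sketch of distinct-n, the ratio of unique n-grams to all generated n-grams; I am assuming the common formulation from Li et al. (2016). Dialogue length, the other automatic metric, counts simulated turns before a dull or repetitive response ends the exchange.

def distinct_n(responses, n=1):
    # distinct-n = unique n-grams / total n-grams across all responses
    ngrams, total = set(), 0
    for r in responses:
        toks = r.split()
        grams = list(zip(*[toks[i:] for i in range(n)]))
        ngrams.update(grams)
        total += len(grams)
    return len(ngrams) / total if total else 0.0

responses = ["i don't know", "i don't know", "i live in los angeles"]
print("distinct-1:", round(distinct_n(responses, 1), 3))
print("distinct-2:", round(distinct_n(responses, 2), 3))

Repeating the same dull response drags both scores down, which is why the metric discriminates between the MLE and RL models.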
17. Summaries
● Reinforcement learning applied to dialogue generation rewards the conversation for three properties: informativity, coherence, and ease of answering (a toy reward sketch follows this list).
● The model shows advantages in diversity, dialogue length, human-judged quality, and interactivity of responses.
● This approach makes it possible to generate long-term dialogues.
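A toy Python sketch of the combined reward from Li et al. (2016), r = lambda1*r1 + lambda2*r2 + lambda3*r3 with the paper's reported weights (0.25, 0.25, 0.5). The three component functions below are simplified surface heuristics standing in for the paper's likelihood- and embedding-based definitions, so treat them as assumptions.

DULL = {"i don't know", "i have no idea"}

def r_ease_of_answering(action: str) -> float:
    # Paper: negative log-likelihood of replying to `action` with a dull
    # response. Toy version: penalize actions that are themselves dull.
    return -1.0 if action.lower() in DULL else 0.5

def r_information_flow(prev_turn: str, action: str) -> float:
    # Paper: penalize semantic similarity to the agent's previous turn.
    a, b = set(prev_turn.split()), set(action.split())
    return -len(a & b) / max(len(a | b), 1)

def r_coherence(message: str, action: str) -> float:
    # Paper: mutual information between message and action. Toy version:
    # reward word overlap with the incoming message.
    return (len(set(message.split()) & set(action.split()))
            / max(len(action.split()), 1))

def reward(message, prev_turn, action, lambdas=(0.25, 0.25, 0.5)):
    l1, l2, l3 = lambdas
    return (l1 * r_ease_of_answering(action)
            + l2 * r_information_flow(prev_turn, action)
            + l3 * r_coherence(message, action))

print(reward("where are you going ?", "see you later", "i don't know"))
print(reward("where are you going ?", "see you later",
             "i am going to the store"))

The dull reply scores negative while the informative, on-topic reply scores positive, which is the signal the policy gradient then optimizes.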
23. Baselines and metrics in SeqGAN's synthetic-data experiment:
● Random: random token generation
● MLE: Seq2Seq trained with the MLE objective function
● SS: scheduled sampling
● PG-BLEU: policy gradient with BLEU* as the reward
* BLEU: bilingual evaluation understudy
* NLL oracle: the negative log-likelihood of generated sequences under the oracle model (a toy sketch follows)
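A toy Python sketch of the NLL-oracle metric: in the paper the oracle is a randomly initialized LSTM that defines the "true" data distribution, and generated sequences are scored by their negative log-likelihood under it. Here both oracle and generator are simple unigram distributions (an assumption) so the sketch stays runnable.

import numpy as np

rng = np.random.default_rng(1)
V = 10
oracle_probs = rng.dirichlet(np.ones(V))   # stand-in for the oracle LSTM
gen_probs = rng.dirichlet(np.ones(V))      # stand-in for the generator

def nll_oracle(samples):
    # Average negative log-likelihood of generated tokens under the oracle.
    return -np.mean(np.log(oracle_probs[samples]))

samples = rng.choice(V, size=(100, 20), p=gen_probs)     # 100 sequences, length 20
print("NLL_oracle (generator):", round(nll_oracle(samples), 3))
samples_true = rng.choice(V, size=(100, 20), p=oracle_probs)
print("NLL_oracle (oracle itself):", round(nll_oracle(samples_true), 3))

The closer the generator's NLL_oracle gets to the oracle's own score, the better it has matched the true distribution.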
24. ● The stability of SeqGAN depends on the training strategy: the number of g-steps, d-steps, and the number of epochs k the discriminator is trained per d-step (a loop skeleton follows)
● The setting g-steps = 1, d-steps = 5, k = 3 gives the best performance
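A Python skeleton of the alternating schedule, showing where g-steps, d-steps, and k enter the loop. The two stub classes are placeholders (assumptions) standing in for SeqGAN's real generator and discriminator.

class Generator:
    def policy_gradient_update(self, discriminator):
        pass  # REINFORCE update with rollout rewards from the discriminator
    def sample(self, n):
        return ["fake"] * n

class Discriminator:
    def fit(self, real, fake):
        pass  # one epoch of real-vs-fake training

def train_seqgan(gen, disc, data, total_iters=50, g_steps=1, d_steps=5, k=3):
    for _ in range(total_iters):
        for _ in range(g_steps):       # generator updates per iteration
            gen.policy_gradient_update(disc)
        for _ in range(d_steps):       # discriminator updates per iteration
            fake = gen.sample(len(data))
            for _ in range(k):         # k epochs per d-step
                disc.fit(real=data, fake=fake)

train_seqgan(Generator(), Discriminator(), data=["real"] * 8)

Training the discriminator harder than the generator (d-steps = 5, k = 3 versus g-steps = 1) keeps its reward signal informative, which is what stabilizes SeqGAN.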
25. Real-world data for SeqGAN:
● Table 2: 16,394 Chinese quatrains (poem composition)
● Table 3: 11,092 paragraphs (speech generation)
● Table 4: 695 music pieces
26. Summaries
● The Generative Adversarial Net (GAN), which uses a discriminative model to guide the training of a generative model, has enjoyed considerable success in generating real-valued data.
● SeqGAN, which applies policy gradients to carry the discriminative model's signal back to the generative model, demonstrates significant improvements on both synthetic and real-world data.
27. References
1. Li, Jiwei, et al. "Deep reinforcement learning for dialogue generation." arXiv preprint arXiv:1606.01541 (2016).
2. Yu, Lantao, et al. "SeqGAN: sequence generative adversarial nets with policy gradient." arXiv preprint arXiv:1609.05473 (2016).
3. Stanford CS224d: Deep Learning for Natural Language Processing.
4. DL/ML tutorials from Hung-yi Lee.
Editor's Notes
My interest in this topic actually comes from the Iron Man movies. In the movies, Iron Man (Tony Stark) has an intelligent assistant called JARVIS, and the two have many interesting conversations. It would be a pleasure to have such a smart virtual friend.
Deep learning techniques have been applied successfully in many areas, including natural language processing. These two papers from the past two years have played an important role in dialogue systems, bringing in advanced techniques from reinforcement learning (RL) and generative adversarial networks (GANs).
In my understanding, the deep learning toolbox provides tools that can be applied to dialogue in at least these three steps.
So far, these techniques have produced some intelligent sentence generation for dialogue.
In seq2seq generation, the simplified architecture looks like this.
Here is an example:
There are two optimizations in this system: MLE and MI.
The seq2seq model is based on an RNN with LSTM cells.
An RNN's structure includes an input and an output; unfolding it shows the details. At step t, the inputs are x_t and the previous state s_(t-1), and the outputs are o_t and the new state s_t. The state s_(t-1) records information from earlier steps, which is important in sequence tasks (a one-step sketch follows).
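A minimal numpy sketch of one unrolled RNN step as just described; the weight shapes and names here are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
U = rng.normal(size=(8, 4))   # input-to-state weights
W = rng.normal(size=(8, 8))   # state-to-state weights (carries the memory)
V = rng.normal(size=(3, 8))   # state-to-output weights

def rnn_step(x_t, s_prev):
    s_t = np.tanh(U @ x_t + W @ s_prev)   # new state mixes input and memory
    o_t = V @ s_t                         # output is read off the state
    return o_t, s_t

s = np.zeros(8)
for x_t in rng.normal(size=(5, 4)):       # a length-5 input sequence
    o_t, s = rnn_step(x_t, s)
print(o_t.shape, s.shape)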
The advantage of RNNs, compared to other deep learning models, is that they are well suited to processing sequence data.
However, RNNs suffer from exploding or vanishing gradients when the sequence is long, because optimization has to account for the memory of all previous steps. LSTM was developed to address this memory problem with three gates in each cell.
The encoder and decoder are functions used to model this complex system.
An encoder-decoder example in Keras shows the parameters in each layer; the encoder and the decoder use the same hidden size, 256 (a sketch follows).
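A sketch of such an encoder-decoder in Keras, following the standard Keras seq2seq example; the token counts are illustrative assumptions, and both encoder and decoder use 256 hidden units.

from keras.layers import Input, LSTM, Dense
from keras.models import Model

num_encoder_tokens, num_decoder_tokens, latent_dim = 71, 93, 256

# Encoder: keep only the final LSTM states as the "thought vector".
encoder_inputs = Input(shape=(None, num_encoder_tokens))
_, state_h, state_c = LSTM(latent_dim, return_state=True)(encoder_inputs)

# Decoder: initialized with the encoder states, predicts the next token.
decoder_inputs = Input(shape=(None, num_decoder_tokens))
decoder_outputs, _, _ = LSTM(latent_dim, return_sequences=True,
                             return_state=True)(decoder_inputs,
                                                initial_state=[state_h, state_c])
decoder_outputs = Dense(num_decoder_tokens, activation="softmax")(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer="rmsprop", loss="categorical_crossentropy")
model.summary()  # prints the per-layer parameter counts shown on the slide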
Evaluating dialogue systems is difficult. Metrics such as BLEU (Papineni et al., 2002) and perplexity have been widely used for dialogue quality evaluation (Li et al., 2016a; Vinyals and Le, 2015; Sordoni et al., 2015), but it is widely debated how well these automatic metrics are correlated with true response quality (Liu et al., 2016; Galley et al., 2015). Since the goal of the proposed system is not to predict the highest probability response, but rather the long-term success of the dialogue, we do not employ BLEU or perplexity for evaluation.
We propose to measure the ease of answering a generated turn by using the negative log likelihood of responding to that utterance with a dull response.