Adversarial Learning for
Neural Dialogue Generation
Presenter: Keon Kim
Original Paper by: Jiwei Li, Will Monroe, Tianlin Shi,
Alan Ritter and Dan Jurafsky
TODOs
● What is this about?
● Result First!
● Why and How on Text Data?
● Adversarial Learning?
● The Model Breakdown
- Generative
- Discriminative
● Training Methods
- Monte Carlo Policy Gradient (REINFORCE)
- Reward for Every Generation Step (REGS)
● Teacher Forcing
● Notes
What Is This About?
- Adversarial Training for open-domain dialogue generation
“to train to produce sequences that are indistinguishable from
human-generated dialogue utterances.”
Result First!
The adversarially trained system generates higher-quality responses than previous baselines!
Adversarial Training
A minimax game between a generator and a discriminator
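The slide's figure is not reproduced here; for reference, the standard GAN minimax objective, written conditionally on the dialogue history x (a standard formulation, not copied from the slides), is:

```latex
\min_G \max_D \;
  \mathbb{E}_{y \sim p_{\text{human}}(\cdot \mid x)}\big[\log D(x, y)\big]
+ \mathbb{E}_{y \sim G(\cdot \mid x)}\big[\log\big(1 - D(x, y)\big)\big]
```

D is trained to tell human responses from generated ones, while G is trained to fool D.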
Why and How on Text Data?
- Analogous to the Turing test (with a discriminator standing in for the human judge)
- Adversarial training has enjoyed great success in computer vision
- But it is hard to apply to NLP, because the text space is discrete
- Small updates to the generator generally don't change the discriminator's feedback
- Progress has been made, and this paper is one such step
Given a dialogue history X consisting of a sequence of dialogue utterances, the model must
generate a response Y. The process of sentence generation is viewed as a sequence of
actions taken according to a policy defined by an encoder-decoder recurrent neural
network.
Model Breakdown
The model has two main parts, G and D:
Generative Model (G)
- Generates a response y given a dialogue history x.
- A standard Seq2Seq model with an attention mechanism
Discriminative Model (D)
- A binary classifier that takes as input a sequence of dialogue
utterances {x, y} and outputs a label indicating whether the
input was generated by a human or a machine
- Hierarchical encoder + 2-class softmax -> returns the probability that the input dialogue episode
is machine- or human-generated.
Training Methods (Important Part)
Policy Gradient Methods:
- The discriminator's score for the current utterance being human-generated is used as a reward
for the generator, which is trained to maximize the expected reward of generated utterances
using the REINFORCE algorithm.
Uses Monte Carlo policy gradient (REINFORCE), approximated by the likelihood ratio:

∇J(θ) ≈ [Q_+({x, y}) − b({x, y})] ∇ log π(y | x)

where Q_+ is the discriminator's classification score (the scalar reward), b is a baseline value
that reduces the variance of the estimate while keeping it unbiased, and ∇ log π is the policy
gradient in parameter space; the policy is updated in the direction of the reward.
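A minimal sketch of a REINFORCE update with a baseline, under stated assumptions: a toy one-step categorical policy stands in for the Seq2Seq generator, and `discriminator_score` is a hypothetical stand-in for D's output, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy setup: a one-step policy over a 5-token vocabulary.
logits = np.zeros(5)

def discriminator_score(token):
    # Hypothetical stand-in for D's probability that the
    # utterance containing `token` is human-generated.
    return 1.0 if token == 3 else 0.1

baseline = 0.5   # b(.): variance-reduction baseline
lr = 0.5

for _ in range(200):
    probs = softmax(logits)
    y = rng.choice(5, p=probs)          # sample an "utterance"
    reward = discriminator_score(y)     # Q_+({x, y}) from D
    # For a softmax policy: grad log pi(y) = onehot(y) - probs
    grad_logpi = -probs
    grad_logpi[y] += 1.0
    logits += lr * (reward - baseline) * grad_logpi

print(softmax(logits).argmax())  # policy concentrates on the rewarded token
```

Note that every token in the sampled utterance would share this single scalar reward; that is exactly the credit-assignment problem REGS addresses below.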
Training Methods (Cont’d)
Problems with REINFORCE:
- The expectation of the reward is approximated by only one sample
- The reward associated with that sample is used for all actions (tokens)
Example:
Input: What’s your name?
human: I am John
machine: I don’t know
REINFORCE assigns the same negative reward to every token in [I, don’t, know], since the
discriminator only scores the full response. Proper credit assignment would give separate
rewards: most likely a neutral reward for the token I, and negative rewards for don’t and know.
The authors call this Reward for Every Generation Step (REGS).
Reward for Every Generation Step (REGS)
We need rewards for intermediate steps.
Two strategies are introduced:
1. Monte Carlo (MC) search
2. Training the discriminator to reward partially decoded sequences
Monte Carlo Search
1. Given a partially decoded sequence s, the model keeps sampling tokens from the distribution until decoding
finishes.
2. This is repeated N times (the N generated sequences share the common prefix s).
3. The N sequences are fed to the discriminator, and the average score is used as the reward for s.
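The three steps above can be sketched as follows; `sample_completion` and `discriminator` are hypothetical stand-ins for the generator's decoder and for D, not the paper's implementation.

```python
import random

random.seed(0)
VOCAB = ["i", "don't", "know", "am", "john", "</s>"]

def sample_completion(prefix, max_len=6):
    # Stand-in for the generator's decoder: keep sampling tokens
    # (here, uniformly) until decoding finishes.
    seq = list(prefix)
    while len(seq) < max_len:
        tok = random.choice(VOCAB)
        seq.append(tok)
        if tok == "</s>":
            break
    return seq

def discriminator(dialogue, response):
    # Stand-in for D: probability that the response is human-generated.
    return 1.0 if response[:2] == ["i", "am"] else 0.2

def mc_reward(dialogue, prefix, n_rollouts=5):
    # N rollouts share the common prefix; the average D score is the reward.
    scores = [discriminator(dialogue, sample_completion(prefix))
              for _ in range(n_rollouts)]
    return sum(scores) / len(scores)

r_good = mc_reward("what's your name", ["i", "am"])
r_bad = mc_reward("what's your name", ["i", "don't"])
print(r_good > r_bad)  # the promising prefix earns the higher reward
```

This gives a per-prefix reward at the cost of N extra decoder rollouts per generation step.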
Rewarding Partially Decoded Sequences
Directly train a discriminator that can assign rewards to both fully and partially decoded sequences:
- Break generated sequences into partial sequences
Problem:
- Earlier actions in a sequence are shared among multiple training examples for the discriminator
- This results in overfitting
The authors propose a strategy similar to one used in AlphaGo to mitigate the problem.
Rewarding Partially Decoded Sequences (Cont’d)
For each collection of subsequences of Y, randomly sample only one positive example and
one negative example; these are used to update the discriminator.
- Time-effective, but less accurate than the MC model.
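The sampling scheme above can be sketched as follows; the helper name `subsequence_examples` and the toy responses are illustrative, not from the paper.

```python
import random

random.seed(0)

def subsequence_examples(sequence, label):
    # All partial sequences Y_{1:t} of a response, each carrying the
    # full sequence's label (human = 1, machine = 0).
    return [(sequence[:t], label) for t in range(1, len(sequence) + 1)]

human = ["i", "am", "john"]
machine = ["i", "don't", "know"]

# Naive REGS training set: every prefix of every example. Early
# actions (e.g. ["i"]) are shared across many examples -> overfitting.
pos = subsequence_examples(human, 1)
neg = subsequence_examples(machine, 0)

# Mitigation (similar in spirit to AlphaGo's position sampling):
# per update, keep only one random positive and one random negative.
batch = [random.choice(pos), random.choice(neg)]
print(len(batch))
```

Each discriminator update then sees at most one prefix per response, avoiding the correlated early-prefix examples.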
With per-step rewards, the generator update becomes:

∇J(θ) ≈ Σ_t (Q_+({x, Y_{1:t}}) − b({x, Y_{1:t}})) ∇ log p(y_t | x, Y_{1:t−1})

where Q_+ is the classification score, b is a baseline value to reduce the variance of the
estimate while keeping it unbiased, and ∇ log p is the policy gradient in parameter space.
Teacher Forcing
The generative model is still unstable, because:
- The generator is exposed to the gold-standard target sequences only indirectly, through the
reward passed back from the discriminator.
- This reward is used to promote or discourage the generator’s own generated sequences.
This is fragile, because:
- Once the generator accidentally deteriorates in some training batches, and the discriminator
consequently does an extremely good job of recognizing sequences from the generator, the
generator immediately gets lost.
- It knows that its generated results are bad, but does not know what results are good.
Teacher Forcing (Cont’d)
The authors propose also feeding human-generated responses to the generator for model updates.
- The discriminator automatically assigns a reward of 1 to a human response, and the generator
uses this reward to update itself on that example.
- This is analogous to having a teacher intervene and force the generator to produce the true responses.
The generator updates itself on the human-generated example only if the reward is larger than
the baseline value.
Pseudocode for the Algorithm
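The original slide shows the paper's algorithm as an image, which is lost here. A hedged Python-style sketch of the training loop (all names and the stand-in values are hypothetical, not the paper's implementation):

```python
import random

random.seed(0)

def adversarial_training(data, n_iters=3, teacher_forcing=True):
    log = []
    for _ in range(n_iters):
        x, y_human = random.choice(data)
        # 1. Generator samples a response for the dialogue history x.
        y_gen = ["i", "don't", "know"]          # stand-in for G.sample(x)
        # 2. Discriminator scores it; the score is the generator's reward.
        reward = 0.2                            # stand-in for D(x, y_gen)
        log.append(("policy_update", reward))   # REINFORCE / REGS step on y_gen
        # 3. Teacher forcing: update G on the human response with reward 1
        #    (only if 1 exceeds the baseline, omitted here for brevity).
        if teacher_forcing:
            log.append(("teacher_forcing", 1.0))
        # 4. Update D with {x, y_human} as positive, {x, y_gen} as negative.
        log.append(("discriminator_update", (y_human, y_gen)))
    return log

steps = adversarial_training([("what's your name", ["i", "am", "john"])])
print(len(steps))
```

Each iteration interleaves a policy-gradient step, an optional teacher-forcing step, and a discriminator update, matching the structure described on the preceding slides.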
Result Again
The adversarially trained system generates higher-quality responses than previous baselines!
Notes
The approach did not show great performance on the abstractive summarization task.
Perhaps the adversarial training strategy is more beneficial to:
- Tasks in which there is a big discrepancy between the distributions of the generated sequences and
the reference target sequences
- Tasks in which the input sequences do not carry all the information needed to generate the target;
in other words, tasks with no single correct target sequence in the semantic space.

More Related Content

What's hot

오토인코더의 모든 것
오토인코더의 모든 것오토인코더의 모든 것
오토인코더의 모든 것NAVER Engineering
 
Generative Adversarial Networks
Generative Adversarial NetworksGenerative Adversarial Networks
Generative Adversarial NetworksMark Chang
 
Finding connections among images using CycleGAN
Finding connections among images using CycleGANFinding connections among images using CycleGAN
Finding connections among images using CycleGANNAVER Engineering
 
[GAN by Hung-yi Lee]Part 1: General introduction of GAN
[GAN by Hung-yi Lee]Part 1: General introduction of GAN[GAN by Hung-yi Lee]Part 1: General introduction of GAN
[GAN by Hung-yi Lee]Part 1: General introduction of GANNAVER Engineering
 
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기NAVER Engineering
 
Generative Adversarial Networks
Generative Adversarial NetworksGenerative Adversarial Networks
Generative Adversarial NetworksMustafa Yagmur
 
Variational Autoencoder
Variational AutoencoderVariational Autoencoder
Variational AutoencoderMark Chang
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networksYunjey Choi
 
EuroSciPy 2019 - GANs: Theory and Applications
EuroSciPy 2019 - GANs: Theory and ApplicationsEuroSciPy 2019 - GANs: Theory and Applications
EuroSciPy 2019 - GANs: Theory and ApplicationsEmanuele Ghelfi
 
Gan seminar
Gan seminarGan seminar
Gan seminarSan Kim
 
Deep Advances in Generative Modeling
Deep Advances in Generative ModelingDeep Advances in Generative Modeling
Deep Advances in Generative Modelingindico data
 
ICASSP 2018 Tutorial: Generative Adversarial Network and its Applications to ...
ICASSP 2018 Tutorial: Generative Adversarial Network and its Applications to ...ICASSP 2018 Tutorial: Generative Adversarial Network and its Applications to ...
ICASSP 2018 Tutorial: Generative Adversarial Network and its Applications to ...宏毅 李
 
Generative Adversarial Networks and Their Applications in Medical Imaging
Generative Adversarial Networks  and Their Applications in Medical ImagingGenerative Adversarial Networks  and Their Applications in Medical Imaging
Generative Adversarial Networks and Their Applications in Medical ImagingSanghoon Hong
 
GAN - Theory and Applications
GAN - Theory and ApplicationsGAN - Theory and Applications
GAN - Theory and ApplicationsEmanuele Ghelfi
 
A pixel to-pixel segmentation method of DILD without masks using CNN and perl...
A pixel to-pixel segmentation method of DILD without masks using CNN and perl...A pixel to-pixel segmentation method of DILD without masks using CNN and perl...
A pixel to-pixel segmentation method of DILD without masks using CNN and perl...남주 김
 
Unsupervised learning represenation with DCGAN
Unsupervised learning represenation with DCGANUnsupervised learning represenation with DCGAN
Unsupervised learning represenation with DCGANShyam Krishna Khadka
 
Tutorial on Deep Generative Models
 Tutorial on Deep Generative Models Tutorial on Deep Generative Models
Tutorial on Deep Generative ModelsMLReview
 
Generative Adversarial Networks and Their Applications
Generative Adversarial Networks and Their ApplicationsGenerative Adversarial Networks and Their Applications
Generative Adversarial Networks and Their ApplicationsArtifacia
 
InfoGAN and Generative Adversarial Networks
InfoGAN and Generative Adversarial NetworksInfoGAN and Generative Adversarial Networks
InfoGAN and Generative Adversarial NetworksZak Jost
 

What's hot (20)

오토인코더의 모든 것
오토인코더의 모든 것오토인코더의 모든 것
오토인코더의 모든 것
 
그림 그리는 AI
그림 그리는 AI그림 그리는 AI
그림 그리는 AI
 
Generative Adversarial Networks
Generative Adversarial NetworksGenerative Adversarial Networks
Generative Adversarial Networks
 
Finding connections among images using CycleGAN
Finding connections among images using CycleGANFinding connections among images using CycleGAN
Finding connections among images using CycleGAN
 
[GAN by Hung-yi Lee]Part 1: General introduction of GAN
[GAN by Hung-yi Lee]Part 1: General introduction of GAN[GAN by Hung-yi Lee]Part 1: General introduction of GAN
[GAN by Hung-yi Lee]Part 1: General introduction of GAN
 
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
 
Generative Adversarial Networks
Generative Adversarial NetworksGenerative Adversarial Networks
Generative Adversarial Networks
 
Variational Autoencoder
Variational AutoencoderVariational Autoencoder
Variational Autoencoder
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks
 
EuroSciPy 2019 - GANs: Theory and Applications
EuroSciPy 2019 - GANs: Theory and ApplicationsEuroSciPy 2019 - GANs: Theory and Applications
EuroSciPy 2019 - GANs: Theory and Applications
 
Gan seminar
Gan seminarGan seminar
Gan seminar
 
Deep Advances in Generative Modeling
Deep Advances in Generative ModelingDeep Advances in Generative Modeling
Deep Advances in Generative Modeling
 
ICASSP 2018 Tutorial: Generative Adversarial Network and its Applications to ...
ICASSP 2018 Tutorial: Generative Adversarial Network and its Applications to ...ICASSP 2018 Tutorial: Generative Adversarial Network and its Applications to ...
ICASSP 2018 Tutorial: Generative Adversarial Network and its Applications to ...
 
Generative Adversarial Networks and Their Applications in Medical Imaging
Generative Adversarial Networks  and Their Applications in Medical ImagingGenerative Adversarial Networks  and Their Applications in Medical Imaging
Generative Adversarial Networks and Their Applications in Medical Imaging
 
GAN - Theory and Applications
GAN - Theory and ApplicationsGAN - Theory and Applications
GAN - Theory and Applications
 
A pixel to-pixel segmentation method of DILD without masks using CNN and perl...
A pixel to-pixel segmentation method of DILD without masks using CNN and perl...A pixel to-pixel segmentation method of DILD without masks using CNN and perl...
A pixel to-pixel segmentation method of DILD without masks using CNN and perl...
 
Unsupervised learning represenation with DCGAN
Unsupervised learning represenation with DCGANUnsupervised learning represenation with DCGAN
Unsupervised learning represenation with DCGAN
 
Tutorial on Deep Generative Models
 Tutorial on Deep Generative Models Tutorial on Deep Generative Models
Tutorial on Deep Generative Models
 
Generative Adversarial Networks and Their Applications
Generative Adversarial Networks and Their ApplicationsGenerative Adversarial Networks and Their Applications
Generative Adversarial Networks and Their Applications
 
InfoGAN and Generative Adversarial Networks
InfoGAN and Generative Adversarial NetworksInfoGAN and Generative Adversarial Networks
InfoGAN and Generative Adversarial Networks
 

Similar to Adversarial learning for neural dialogue generation

Training language models to follow instructions with human feedback (Instruct...
Training language models to follow instructions with human feedback (Instruct...Training language models to follow instructions with human feedback (Instruct...
Training language models to follow instructions with human feedback (Instruct...Rama Irsheidat
 
Online learning & adaptive game playing
Online learning & adaptive game playingOnline learning & adaptive game playing
Online learning & adaptive game playingSaeid Ghafouri
 
Hyperparameter Tuning
Hyperparameter TuningHyperparameter Tuning
Hyperparameter TuningJon Lederman
 
Getting started with Machine Learning
Getting started with Machine LearningGetting started with Machine Learning
Getting started with Machine LearningGaurav Bhalotia
 
Recommendation engine Using Genetic Algorithm
Recommendation engine Using Genetic AlgorithmRecommendation engine Using Genetic Algorithm
Recommendation engine Using Genetic AlgorithmVaibhav Varshney
 
NITW_Improving Deep Neural Networks (1).pptx
NITW_Improving Deep Neural Networks (1).pptxNITW_Improving Deep Neural Networks (1).pptx
NITW_Improving Deep Neural Networks (1).pptxDrKBManwade
 
NITW_Improving Deep Neural Networks.pptx
NITW_Improving Deep Neural Networks.pptxNITW_Improving Deep Neural Networks.pptx
NITW_Improving Deep Neural Networks.pptxssuserd23711
 
Building Continuous Learning Systems
Building Continuous Learning SystemsBuilding Continuous Learning Systems
Building Continuous Learning SystemsAnuj Gupta
 
Recommender Systems from A to Z – Model Training
Recommender Systems from A to Z – Model TrainingRecommender Systems from A to Z – Model Training
Recommender Systems from A to Z – Model TrainingCrossing Minds
 
deepnet-lourentzou.ppt
deepnet-lourentzou.pptdeepnet-lourentzou.ppt
deepnet-lourentzou.pptyang947066
 
PR-231: A Simple Framework for Contrastive Learning of Visual Representations
PR-231: A Simple Framework for Contrastive Learning of Visual RepresentationsPR-231: A Simple Framework for Contrastive Learning of Visual Representations
PR-231: A Simple Framework for Contrastive Learning of Visual RepresentationsJinwon Lee
 
Machine Learning Interview Questions
Machine Learning Interview QuestionsMachine Learning Interview Questions
Machine Learning Interview QuestionsRock Interview
 
Barga Data Science lecture 10
Barga Data Science lecture 10Barga Data Science lecture 10
Barga Data Science lecture 10Roger Barga
 
presentation.pdf
presentation.pdfpresentation.pdf
presentation.pdfcaa28steve
 
Real-time Ranking of Electrical Feeders using Expert Advice
Real-time Ranking of Electrical Feeders using Expert AdviceReal-time Ranking of Electrical Feeders using Expert Advice
Real-time Ranking of Electrical Feeders using Expert AdviceHila Becker
 
Applied Artificial Intelligence Unit 4 Semester 3 MSc IT Part 2 Mumbai Univer...
Applied Artificial Intelligence Unit 4 Semester 3 MSc IT Part 2 Mumbai Univer...Applied Artificial Intelligence Unit 4 Semester 3 MSc IT Part 2 Mumbai Univer...
Applied Artificial Intelligence Unit 4 Semester 3 MSc IT Part 2 Mumbai Univer...Madhav Mishra
 
It's Machine Learning Basics -- For You!
It's Machine Learning Basics -- For You!It's Machine Learning Basics -- For You!
It's Machine Learning Basics -- For You!To Sum It Up
 
1. Demystifying ML.pdf
1. Demystifying ML.pdf1. Demystifying ML.pdf
1. Demystifying ML.pdfJyoti Yadav
 
Sequential Decision Making in Recommendations
Sequential Decision Making in RecommendationsSequential Decision Making in Recommendations
Sequential Decision Making in RecommendationsJaya Kawale
 

Similar to Adversarial learning for neural dialogue generation (20)

Training language models to follow instructions with human feedback (Instruct...
Training language models to follow instructions with human feedback (Instruct...Training language models to follow instructions with human feedback (Instruct...
Training language models to follow instructions with human feedback (Instruct...
 
Online learning & adaptive game playing
Online learning & adaptive game playingOnline learning & adaptive game playing
Online learning & adaptive game playing
 
Hyperparameter Tuning
Hyperparameter TuningHyperparameter Tuning
Hyperparameter Tuning
 
Getting started with Machine Learning
Getting started with Machine LearningGetting started with Machine Learning
Getting started with Machine Learning
 
Recommendation engine Using Genetic Algorithm
Recommendation engine Using Genetic AlgorithmRecommendation engine Using Genetic Algorithm
Recommendation engine Using Genetic Algorithm
 
NITW_Improving Deep Neural Networks (1).pptx
NITW_Improving Deep Neural Networks (1).pptxNITW_Improving Deep Neural Networks (1).pptx
NITW_Improving Deep Neural Networks (1).pptx
 
NITW_Improving Deep Neural Networks.pptx
NITW_Improving Deep Neural Networks.pptxNITW_Improving Deep Neural Networks.pptx
NITW_Improving Deep Neural Networks.pptx
 
Building Continuous Learning Systems
Building Continuous Learning SystemsBuilding Continuous Learning Systems
Building Continuous Learning Systems
 
Endsem AI merged.pdf
Endsem AI merged.pdfEndsem AI merged.pdf
Endsem AI merged.pdf
 
Recommender Systems from A to Z – Model Training
Recommender Systems from A to Z – Model TrainingRecommender Systems from A to Z – Model Training
Recommender Systems from A to Z – Model Training
 
deepnet-lourentzou.ppt
deepnet-lourentzou.pptdeepnet-lourentzou.ppt
deepnet-lourentzou.ppt
 
PR-231: A Simple Framework for Contrastive Learning of Visual Representations
PR-231: A Simple Framework for Contrastive Learning of Visual RepresentationsPR-231: A Simple Framework for Contrastive Learning of Visual Representations
PR-231: A Simple Framework for Contrastive Learning of Visual Representations
 
Machine Learning Interview Questions
Machine Learning Interview QuestionsMachine Learning Interview Questions
Machine Learning Interview Questions
 
Barga Data Science lecture 10
Barga Data Science lecture 10Barga Data Science lecture 10
Barga Data Science lecture 10
 
presentation.pdf
presentation.pdfpresentation.pdf
presentation.pdf
 
Real-time Ranking of Electrical Feeders using Expert Advice
Real-time Ranking of Electrical Feeders using Expert AdviceReal-time Ranking of Electrical Feeders using Expert Advice
Real-time Ranking of Electrical Feeders using Expert Advice
 
Applied Artificial Intelligence Unit 4 Semester 3 MSc IT Part 2 Mumbai Univer...
Applied Artificial Intelligence Unit 4 Semester 3 MSc IT Part 2 Mumbai Univer...Applied Artificial Intelligence Unit 4 Semester 3 MSc IT Part 2 Mumbai Univer...
Applied Artificial Intelligence Unit 4 Semester 3 MSc IT Part 2 Mumbai Univer...
 
It's Machine Learning Basics -- For You!
It's Machine Learning Basics -- For You!It's Machine Learning Basics -- For You!
It's Machine Learning Basics -- For You!
 
1. Demystifying ML.pdf
1. Demystifying ML.pdf1. Demystifying ML.pdf
1. Demystifying ML.pdf
 
Sequential Decision Making in Recommendations
Sequential Decision Making in RecommendationsSequential Decision Making in Recommendations
Sequential Decision Making in Recommendations
 

Recently uploaded

How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...Product School
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlPeter Udo Diehl
 
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...CzechDreamin
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...Product School
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backElena Simperl
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIES VE
 
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeFree and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeCzechDreamin
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
 
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka DoktorováCzechDreamin
 
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...CzechDreamin
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...Product School
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...Product School
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Product School
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaRTTS
 

Recently uploaded (20)

How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
 
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and Planning
 
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeFree and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
 

Adversarial learning for neural dialogue generation

  • 1. Adversarial Learning for Neural Dialogue Generation Presenter: Keon Kim Original Paper by: Jiwei Li, Will Monroe, Tianlin Shi, Alan Ritter and Dan Jurafsky
  • 2. TODOs ● What is this about? ● Result First! ● Why and How on Text Data? ● Adversarial Learning? ● The Model Breakdown - Generative - Discriminative ● Training Methods - Monte Carlo Policy Gradient (REINFORCE) - Reward for Every Generation Step (REGS) ● Teacher Forcing ● Notes
  • 3. What Is This About? - Adversarial Training for open-domain dialogue generation “to train to produce sequences that are indistinguishable from human-generated dialogue utterances.”
  • 4. Result First! The adversarially trained system generates higher-quality responses than previous baselines!
  • 5. Adversarial Training A minimax game between the generator and the discriminator
  • 6. Why and How on Text Data? - Analogous to the Turing Test (just a discriminator instead of a human) - Enjoyed great success in computer vision - But hard to apply to NLP, because the text space is too discontinuous - Small updates generally don’t change the reinforcement feedback - Progress has been made, and this paper is one example Given a dialogue history X consisting of a sequence of dialogue utterances, the model needs to generate a response Y. We view the process of sentence generation as a sequence of actions that are taken according to a policy defined by an encoder-decoder recurrent neural network.
  • 7. Model Breakdown The model has two main parts, G and D: Generative Model (G) - Generates a response y given the dialogue history x - A standard Seq2Seq model with an attention mechanism Discriminative Model (D) - A binary classifier that takes as input a sequence of dialogue utterances {x, y} and outputs a label indicating whether the input was generated by a human or a machine - Hierarchical encoder + 2-class softmax function -> returns the probability that the input dialogue episode is machine- or human-generated
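To make the discriminator's shape concrete, here is a toy sketch, not the paper's model: the RNN utterance encoder is replaced by a mean of token embeddings, and the 2-class softmax collapses to a sigmoid. All names (`encode_utterance`, `prob_human`) and sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM = 100, 16
emb = rng.normal(size=(VOCAB, DIM))   # toy token embeddings
w = rng.normal(size=DIM)              # toy classifier weights

def encode_utterance(token_ids):
    # stand-in for the RNN utterance encoder: mean of token embeddings
    return emb[token_ids].mean(axis=0)

def prob_human(dialogue):
    # hierarchical encoding: encode each utterance, then pool over the dialogue
    utt_vecs = np.stack([encode_utterance(u) for u in dialogue])
    logit = utt_vecs.mean(axis=0) @ w
    return 1.0 / (1.0 + np.exp(-logit))   # 2-class softmax == sigmoid here

p = prob_human([[3, 14, 15], [9, 26, 5]])   # {x, y}: history plus response
```

The key structural point survives the simplification: utterances are encoded individually, then combined into one dialogue-level vector before classification.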
  • 8. Training Methods (Important Part) Policy Gradient Methods: - The score the discriminator assigns to the current utterance being human-generated is used as a reward for the generator, which is trained to maximize the expected reward of generated utterances using the REINFORCE algorithm. Uses the Monte Carlo policy gradient (REINFORCE), approximated by the likelihood-ratio trick
  • 9. Training Methods (Important Part) Policy Gradient Methods: - The score the discriminator assigns to the current utterance being human-generated is used as a reward for the generator, which is trained to maximize the expected reward of generated utterances using the REINFORCE algorithm. Uses the Monte Carlo policy gradient (REINFORCE), approximated by the likelihood-ratio trick: ∇J(θ) ≈ [Q({x, y}) − b({x, y})] ∇ log π(y | x), where Q({x, y}) is the classification score, b({x, y}) is a baseline value that reduces the variance of the estimate while keeping it unbiased, and ∇ log π(y | x) is the policy gradient in parameter space
  • 10. Training Methods (Important Part) Policy Gradient Methods: - The score the discriminator assigns to the current utterance being human-generated is used as a reward for the generator, which is trained to maximize the expected reward of generated utterances using the REINFORCE algorithm. Uses the Monte Carlo policy gradient (REINFORCE), approximated by the likelihood-ratio trick: the scalar reward drives policy updates in the direction of the reward in parameter space
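A minimal likelihood-ratio (REINFORCE) gradient for a one-step toy policy, assuming a softmax policy over three tokens with logits `theta`; the discriminator score and baseline are plain numbers here, not from the paper:

```python
import numpy as np

theta = np.zeros(3)   # logits of a toy softmax policy over 3 tokens

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def reinforce_grad(sampled_token, reward, baseline):
    # (Q - b) * grad_theta log pi(sampled_token): scaling the score-function
    # gradient by the advantage reduces variance without adding bias
    p = softmax(theta)
    grad_logp = -p
    grad_logp[sampled_token] += 1.0   # d log softmax / d theta
    return (reward - baseline) * grad_logp

g = reinforce_grad(sampled_token=2, reward=0.9, baseline=0.5)
```

With a reward above the baseline, the gradient pushes up the sampled token's logit and pushes the others down; with a reward below the baseline, the directions flip.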
  • 11. Training Methods (Cont’d) Problems with REINFORCE: - It has the disadvantage that the expectation of the reward is approximated by only one sample - The reward associated with that sample is used for all actions - REINFORCE assigns the same negative reward to all tokens [I, don’t, know] by comparing the whole sequence with “I don’t know” - Proper credit assignment in training would give separate rewards - most likely a neutral reward for the token I, and negative rewards for don’t and know The authors call their fix Reward for Every Generation Step (REGS) Input: What’s your name? human: I am John machine: I don’t know
  • 12. Reward for Every Generation Step (REGS) We need rewards for intermediate steps. Two Strategies Introduced: 1. Monte Carlo (MC) Search 2. Training Discriminator For Rewarding Partially Decoded Sequences
  • 13. Monte Carlo Search 1. Given a partially decoded sequence s, the model keeps sampling tokens from the distribution until decoding finishes 2. Repeat N times (the N generated sequences share the common prefix s) 3. These N sequences are fed to the discriminator, and the average score is used as the reward for s
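The three steps above can be sketched as a toy rollout loop. The generator is replaced by uniform random sampling and the discriminator by a hand-written scoring rule, both illustrative assumptions:

```python
import random

random.seed(0)
VOCAB = ["i", "don't", "know", "john", "am", "<eos>"]

def rollout(prefix, max_len=6):
    # step 1: keep sampling tokens until <eos> or max length (toy generator)
    seq = list(prefix)
    while len(seq) < max_len:
        tok = random.choice(VOCAB)
        if tok == "<eos>":
            break
        seq.append(tok)
    return seq

def disc_score(seq):
    # toy discriminator: dull "i don't know" responses look machine-generated
    return 0.1 if seq[:3] == ["i", "don't", "know"] else 0.8

def mc_reward(prefix, n=20):
    # steps 2-3: N rollouts share the prefix; their average score is its reward
    return sum(disc_score(rollout(prefix)) for _ in range(n)) / n

r = mc_reward(["i", "don't"])
```

Because the N rollouts share the prefix, the averaged score estimates how promising that partial sequence is, which is exactly the intermediate reward REGS needs.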
  • 14. Rewarding Partially Decoded Sequences Directly train a discriminator that is able to assign rewards to both fully and partially decoded sequences - Break generated sequences into partial sequences Problem: - Earlier actions in a sequence are shared among multiple training examples for the discriminator - This results in overfitting The authors propose a strategy similar to one used in AlphaGo to mitigate the problem.
  • 15. Rewarding Partially Decoded Sequences For each collection of subsequences of Y, randomly sample only one example from the positive examples and one example from the negative examples, and use these to update the discriminator. - Time-efficient, but less accurate than the MC model.
  • 16. Rewarding Partially Decoded Sequences For each collection of subsequences of Y, randomly sample only one example from the positive examples and one example from the negative examples, and use these to update the discriminator. - Time-efficient, but less accurate than the MC model. (Equation annotations on the slide: classification score; baseline value that reduces the variance of the estimate while keeping it unbiased; policy gradient in parameter space)
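The subsampling trick can be sketched as follows, with toy sequences standing in for real dialogue data:

```python
import random

random.seed(0)

def prefixes(y):
    # all partial sequences y_1..y_t of a full sequence y
    return [tuple(y[:t]) for t in range(1, len(y) + 1)]

human = ["i", "am", "john"]        # positive example
machine = ["i", "don't", "know"]   # negative example

# sample only ONE positive and ONE negative prefix per discriminator update,
# so early prefixes shared by many examples don't cause overfitting
pos = random.choice(prefixes(human))
neg = random.choice(prefixes(machine))
```

Sampling one prefix per update, instead of training on every prefix, is what keeps the shared early tokens from being seen over and over by the discriminator.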
  • 17. Rewarding Partially Decoded Sequences (Equation annotations on the slide: classification score; baseline value; policy gradient in parameter space)
  • 18. Teacher Forcing The generative model is still unstable, because: - the generative model can only be indirectly exposed to the gold-standard target sequences through the reward passed back from the discriminator - this reward is only used to promote or discourage the generator’s own generated sequences This is fragile, because: - once the generator accidentally deteriorates in some training batches - and the discriminator consequently does an extremely good job at recognizing sequences from the generator, the generator immediately gets lost - it knows that its generated results are bad, but does not know what results are good
  • 19. Teacher Forcing (Cont’d) The authors propose feeding human-generated responses to the generator for model updates. - The discriminator automatically assigns a reward of 1 to the human responses, and the generator uses this reward to update itself - Analogous to having a teacher intervene and force the generator to produce the true responses The generator then updates itself using this reward on the human-generated example only if the reward is larger than the baseline value.
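The gating rule in the last sentence reduces to a small check; `teacher_forcing_advantage` is a hypothetical helper, not a function from the paper:

```python
def teacher_forcing_advantage(reward, baseline):
    # the discriminator assigns reward 1.0 to a human response; the generator
    # takes a positive-reward update on it only if reward > baseline
    advantage = reward - baseline
    return advantage if advantage > 0 else 0.0

human_step = teacher_forcing_advantage(reward=1.0, baseline=0.5)   # update applied
skipped = teacher_forcing_advantage(reward=0.3, baseline=0.5)      # no update
```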
  • 20. Pseudocode for the Algorithm
  • 21. Result Again The adversarially trained system generates higher-quality responses than previous baselines!
  • 22. Notes It did not show great performance on an abstractive summarization task, perhaps because the adversarial training strategy is more beneficial for: - Tasks in which there is a big discrepancy between the distributions of the generated sequences and the reference target sequences - Tasks in which input sequences do not bear all the information needed to generate the target - in other words, there is no single correct target sequence in the semantic space

Editor's Notes

  1. To set up the synthetic-data experiments, we first initialize the parameters of an LSTM network from the normal distribution N(0, 1) as the oracle describing the real data distribution G_oracle(x_t | x_1, ..., x_{t−1}). Then we use it to generate 10,000 sequences of length 20 as the training set S for the generative models. We use a randomly initialized LSTM as the true model (the oracle) to generate the real data distribution p(x_t | x_1, ..., x_{t−1}) for the following experiments. When optimizing discriminative models, supervised training is applied to minimize the cross entropy, which is widely used as the objective function for classification and prediction tasks: L(y, ŷ) = −y log ŷ − (1 − y) log(1 − ŷ), where y is the ground-truth label of the input sequence and ŷ is the predicted probability from the discriminative models.
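The cross-entropy objective in the note above is small enough to state directly in code:

```python
import math

def cross_entropy(y, y_hat):
    # L(y, y_hat) = -y * log(y_hat) - (1 - y) * log(1 - y_hat)
    return -y * math.log(y_hat) - (1 - y) * math.log(1 - y_hat)

loss_confident = cross_entropy(1, 0.9)   # small loss: correct and confident
loss_wrong = cross_entropy(1, 0.1)       # large loss: confident but wrong
```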