Contrastive Learning with Adversarial Perturbations for Conditional Text Generation
Seanie Lee¹*, Dong Bok Lee¹*, Sung Ju Hwang¹,²
¹KAIST, Daejeon, South Korea
²AITRICS, Seoul, South Korea
Pretrained Language Model
Pretraining a language model on a large corpus and then fine-tuning it for a target
task requires a large amount of labeled data.
Conditional Text Generation
Conditional text generation is the task of generating a target sequence from a given
source sequence. An encoder-decoder architecture is generally used.
[Figure: encoder-decoder architecture. The encoder embeds and encodes the source "the blue house"; the decoder generates the target "la maison bleu <eos>".]
Exposure Bias
Seq2seq models trained with teacher forcing often suffer from the exposure bias
problem, which hurts generalization to unseen inputs.
[Figure: encoder-decoder with the source "the blue house" and the decoder input starting from <bos>. Prediction: "le maison bleu <eos>". Ground truth: "la maison bleu".]
Contrastive Learning Framework
We propose to contrast the ground-truth pair with negative pairs to learn a better
representation of the target sentence.
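[Figure: the source sentence and the GT target sentence ("<bos> He wasn’t in great shape" as decoder input, "He wasn’t in great shape <eos>" as output) pass through the encoder-decoder, shown alongside the unrelated sentence "I cannot do that."]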
Randomly sampled negative examples are easily discriminated by the pretrained
language model, so a large batch size is required to mine meaningful negative examples.
Contrastive Learning with Adversarial Perturbation
We propose to use adversarial perturbations to generate an “imposter” that is
close to the GT target in the embedding space but semantically different.
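[Figure: the source sentence and the GT target sentence ("<bos> He wasn’t in great shape" as decoder input, "He wasn’t in great shape <eos>" as output) pass through the encoder-decoder. Perturbations move the target representation relative to the manifold to produce an imposter and a distant target; the example sentences shown are "He wasn’t in good shape." and "He was was in good shape."]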
Contrastive Learning with Adversarial Perturbation
Conversely, we generate a “distant target” that is far from the source sentence
in the embedding space but semantically similar to the target.
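[Figure: the source sentence and the GT target sentence ("<bos> He wasn’t in great shape" as decoder input, "He wasn’t in great shape <eos>" as output) pass through the encoder-decoder. Perturbations move the target representation relative to the manifold to produce an imposter and a distant target; the example sentences shown are "He wasn’t in good shape." and "He was was in good shape."]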
Contrastive Learning with Adversarial Perturbation
We push the imposter and the other negative examples away from the source and
pull the target and the distant target toward the source.
[Figure: source, target, dist-target, and imposter in the embedding space, with arrows labeled push/pull and Max/Min.]
Contrastive Learning objective
Given a source-target sentence pair (𝒙(𝒊), 𝒚(𝒊)), we randomly sample targets 𝒚(𝒋) with
𝑗 ≠ 𝑖 and use them as the set of negative examples 𝑆.
As in SimCLR (Chen et al., 2020), we maximize the cosine similarity between the source
and the target and minimize it between the source and the negative examples.
Example (RO-EN):
𝒙(𝒊): Nu era într-o formă prea bună.
𝒚(𝒊): He wasn’t in great shape.
𝒚(𝒋): But I cannot do it anymore.
𝒚(𝒌): By mid-July, it was 40 percent.
Chen et al. "A simple framework for contrastive learning of visual representations." ICML 2020.
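The contrastive objective described on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the sentence embeddings are toy vectors, the function names are hypothetical, and the temperature `tau` is an assumption borrowed from SimCLR-style losses (the slide's own formula is not preserved in the text).

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def in_batch_contrastive_loss(sources, targets, tau=0.1):
    """Average over i of -log softmax_i(cos(x_i, y_j) / tau).

    For source x_i, the paired target y_i is the positive and every
    other target y_j (j != i) in the batch acts as a negative.
    tau is an assumed temperature, not a value from the slides.
    """
    total = 0.0
    for i, x in enumerate(sources):
        logits = [cosine(x, y) / tau for y in targets]
        log_denominator = math.log(sum(math.exp(l) for l in logits))
        total += log_denominator - logits[i]
    return total / len(sources)


# Toy embeddings: aligned pairs give a low loss, mismatched pairs a high one.
sources = [[1.0, 0.0], [0.0, 1.0]]
aligned_targets = [[1.0, 0.0], [0.0, 1.0]]
mismatched_targets = [[0.0, 1.0], [1.0, 0.0]]
print(in_batch_contrastive_loss(sources, aligned_targets))
print(in_batch_contrastive_loss(sources, mismatched_targets))
```

Minimizing this loss raises the similarity of each source to its own target and lowers it to the other targets in the batch, which is why the number of useful negatives depends on the batch size.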
Generation of Imposter
We add a small perturbation to the hidden representation of the target sentence to
generate the imposter, using a linear approximation as in Goodfellow et al. (2015).
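[Figure: the source sentence "Nu era într-o formă prea bună." and the target sentence "<bos> He wasn’t in great shape." pass through the encoder-decoder and are pooled; a perturbation of the target representation under a Min objective yields the imposter "He was was in great shape."]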
Goodfellow et al. "Explaining and Harnessing Adversarial Examples." ICLR 2015.
[Equations: Objective; Linear Approximation]
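The idea behind the linear approximation can be illustrated with a small sketch. It assumes the goal is to lower some differentiable score of the target representation under a norm budget `epsilon`; the toy linear score, the weights `w`, and all function names are hypothetical, since the slide's equations are not preserved. Goodfellow et al. take the sign of the gradient (an L∞ budget); the sketch below swaps in an L2-normalized gradient, which is an assumption.

```python
import math


def l2_norm(v):
    return math.sqrt(sum(a * a for a in v))


def linear_approx_perturbation(grad, epsilon):
    """Perturbation of L2 norm epsilon that most decreases the
    first-order approximation score(h + d) ~ score(h) + grad . d."""
    norm = l2_norm(grad)
    if norm == 0.0:
        return [0.0 for _ in grad]
    return [-epsilon * g / norm for g in grad]


# Hypothetical linear score, so its gradient is simply w and the
# first-order approximation is exact.
w = [3.0, 4.0]


def score(h):
    return sum(a * b for a, b in zip(w, h))


h = [1.0, 2.0]                                      # toy hidden representation
delta = linear_approx_perturbation(w, epsilon=0.5)  # gradient of score is w
h_perturbed = [a + d for a, d in zip(h, delta)]

print(score(h), score(h_perturbed))  # the score drops by epsilon * ||w||
```

Because the step is a closed-form function of a single gradient, no inner optimization loop is needed to produce the perturbed representation.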
Generation of Distant Target
We add a large perturbation to the target embedding so that it moves far away from the
source sentence while preserving the semantics of the target sentence.
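[Figure: two steps, "Maximize Distance" and "Semantic Preservation". The source sentence "Nu era într-o formă prea bună." and the target sentence "<bos> He wasn’t in great shape." pass through the encoder-decoder and are pooled; a perturbation of the target under a Max objective yields the distant target "He wasn’t in good shape."]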
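The "maximize distance" step can be sketched as a single normalized gradient step that lowers the cosine similarity between the source and the target embedding. This is only an illustration under assumptions: the vectors are toy embeddings, the step size `eta` and the function names are hypothetical, and the semantic-preservation step named on the slide is not modeled here.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(v):
    return math.sqrt(dot(v, v))


def cosine(u, v):
    return dot(u, v) / (norm(u) * norm(v))


def cosine_grad_wrt_target(source, target):
    """Gradient of cos(source, target) with respect to target."""
    ns, nt = norm(source), norm(target)
    c = cosine(source, target)
    return [s / (ns * nt) - c * t / (nt * nt) for s, t in zip(source, target)]


def move_away_from_source(source, target, eta):
    """Step of length eta along the direction that decreases cos(source, target)."""
    grad = cosine_grad_wrt_target(source, target)
    g = norm(grad)
    if g == 0.0:
        return list(target)
    return [t - eta * gi / g for t, gi in zip(target, grad)]


source = [1.0, 0.0]
target = [0.8, 0.6]
distant = move_away_from_source(source, target, eta=0.5)
print(cosine(source, target), cosine(source, distant))
```

A larger `eta` moves the embedding further from the source, which is why a separate constraint is needed to keep the perturbed target semantically faithful.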
Learning Objective – (1)
We add the imposter to the set of negative examples 𝑆 and use the distant target as
an additional positive example for the source sentence in contrastive learning.
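One plausible reading of this slide, sketched with toy vectors: the imposter joins the negatives, and the loss is computed once with the target as the positive and once with the distant target as the positive. The function names, the temperature `tau`, and the equal weighting of the two terms are assumptions; the slide's formula is not preserved in the text.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nce(anchor, positive, negatives, tau=0.1):
    """-log of the softmax probability assigned to the positive."""
    pos = cosine(anchor, positive) / tau
    logits = [pos] + [cosine(anchor, n) / tau for n in negatives]
    return math.log(sum(math.exp(l) for l in logits)) - pos


def loss_with_perturbed_examples(source, target, negatives, imposter,
                                 distant_target, tau=0.1):
    """Sum of two contrastive terms sharing the same enlarged negative set."""
    hard_negatives = list(negatives) + [imposter]
    return (nce(source, target, hard_negatives, tau)
            + nce(source, distant_target, hard_negatives, tau))


source = [1.0, 0.0]
target = [1.0, 0.0]
negatives = [[0.0, 1.0]]
distant_target = [0.8, 0.6]

# An imposter that sits close to the source is a harder negative.
loss_hard = loss_with_perturbed_examples(source, target, negatives, [0.9, 0.1], distant_target)
loss_easy = loss_with_perturbed_examples(source, target, negatives, [-1.0, 0.0], distant_target)
print(loss_hard, loss_easy)
```

The comparison at the end shows why hard negatives matter: an imposter near the source contributes far more to the loss than an easily separated one.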
Learning Objective
We jointly maximize the following objectives with stochastic gradient ascent.
Experimental Setup – (1)
1) Tasks and evaluation metrics
• Neural machine translation: BLEU
• Question generation: BLEU, F1/EM
• Text summarization: ROUGE
2) Data
• WMT’16 RO-EN
• SQuAD
• XSum
Experimental Setup – (2)
3) Baselines
• T5-MLE:
The T5 model trained with maximum likelihood estimation (MLE).
• T5-𝛼-MLE [Caccia 2020]:
The T5 model trained with MLE, but decoding the target sequence with temperature
scaling 𝛼 in the softmax.
• T5-MLE-contrastive:
Naïve contrastive learning with MLE.
[Caccia 2020] Caccia et al., Language GANs Falling Short, ICLR 2020
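Temperature scaling in the softmax, as used by the T5-𝛼-MLE baseline, can be sketched as follows. The convention of dividing the logits by `alpha` is an assumption (it is the common one); the slides do not state the exact form, and the function name is hypothetical.

```python
import math


def softmax_with_temperature(logits, alpha=1.0):
    """Softmax over logits / alpha (assumed convention).

    alpha < 1 sharpens the distribution, alpha > 1 flattens it.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    scaled = [l / alpha for l in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]
print(softmax_with_temperature(logits, alpha=1.0))
print(softmax_with_temperature(logits, alpha=0.5))  # more peaked
print(softmax_with_temperature(logits, alpha=2.0))  # closer to uniform
```

Only decoding changes under this baseline; the model weights are the same ones obtained with MLE training.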
Experimental Setup – (2)
3) Baselines (continued)
• T5-SSMBA [Ng 2020]:
Generates additional examples by denoising and reconstructing target
sentences with a masked language model.
• T5-WordDropout Contrastive [Yang 2019]:
Generates negative examples by removing the most frequent word from the
target sentence.
• T5-R3F [Aghajanyan 2021]:
Adds Gaussian noise and enforces a consistency loss.
[Ng 2020] Ng et al., SSMBA: Self-Supervised Manifold Based Data Augmentation for Improving Out-of-Domain Robustness, EMNLP 2020
[Yang 2019] Yang et al., Reducing Word Omission Errors in Neural Machine Translation: A Contrastive Learning Approach, ACL 2019
[Aghajanyan 2021] Aghajanyan et al., Better Fine-Tuning by Reducing Representational Collapse, ICLR 2021
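The word-dropout negative described above can be sketched as follows. This is an illustration under assumptions, not the baseline's actual code: tokens are split on whitespace, "most frequent" is measured by counts over a small hypothetical corpus, and every occurrence of the chosen word is removed.

```python
from collections import Counter


def corpus_word_counts(sentences):
    """Lower-cased whitespace-token counts over a corpus."""
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence.lower().split())
    return counts


def word_dropout_negative(target, counts):
    """Drop the word of `target` that is most frequent in the corpus."""
    tokens = target.split()
    if not tokens:
        return target
    most_frequent = max(tokens, key=lambda t: counts[t.lower()])
    return " ".join(t for t in tokens if t != most_frequent)


corpus = ["he was in great shape", "he was not here", "he left early"]
counts = corpus_word_counts(corpus)
print(word_dropout_negative("he was in great shape", counts))
```

The resulting sentence differs from the target by a single high-frequency word, so it is a cheap but fairly easy negative compared with a perturbation computed from the model itself.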
Experimental Result – (1)
Machine Translation – WMT’16 RO-EN
Method                        BLEU
T5-MLE                        32.43
T5-𝛼-MLE                      32.14
T5-MLE-contrastive            32.03
T5-SSMBA                      32.81
T5-WordDropout Contrastive    32.44
T5-CLAPS (Ours)               33.96
Experimental Result – (2)
Question Generation – SQuAD
Method                        BLEU     F1       EM
T5-MLE                        21.00    67.64    55.91
T5-𝛼-MLE                      20.50    68.04    56.30
T5-MLE-contrastive            20.91    67.32    55.25
T5-SSMBA                      21.07    68.47    56.37
T5-WordDropout Contrastive    21.19    68.16    56.41
T5-CLAPS (Ours)               21.55    69.01    57.06
Experimental Result – (3)
Text Summarization – XSum
Method                        ROUGE-1  ROUGE-2  ROUGE-L
T5-MLE                        36.10    14.72    29.16
T5-𝛼-MLE                      36.68    15.10    29.72
T5-MLE-contrastive            36.34    14.81    29.41
T5-SSMBA                      36.58    14.81    29.79
T5-WordDropout Contrastive    36.88    15.11    29.68
T5-CLAPS (Ours)               37.89    15.78    30.59
Visualization of Sentence Embedding
The model learns to push away the imposter from the target sentence and pull the
distant target to the source sentence.
Conclusion
• We propose a contrastive learning framework for conditional sequence
generation to mitigate the exposure bias problem.
• With adversarial perturbations, we generate negative and positive pairs that are
more difficult for the model to distinguish from the GT pair.
• Results show that our method outperforms the T5 baselines across machine
translation, question generation, and summarization tasks.
Future work
• For future work, we will improve the quality of the imposters, which contain
many grammatical errors.
• Generating the imposter and the distant target still requires a large amount of
labeled data. We need to improve the sample efficiency.
Thank you

More Related Content

What's hot

Neural Language Generation Head to Toe
Neural Language Generation Head to Toe Neural Language Generation Head to Toe
Neural Language Generation Head to Toe
Hady Elsahar
 
LLMs Bootcamp
LLMs BootcampLLMs Bootcamp
LLMs Bootcamp
Fiza987241
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
Rabab Munawar
 
Transformers
TransformersTransformers
Transformers
Anup Joseph
 
Intro to LLMs
Intro to LLMsIntro to LLMs
Intro to LLMs
Loic Merckel
 
introducción a Machine Learning
introducción a Machine Learningintroducción a Machine Learning
introducción a Machine Learning
butest
 
Simple Introduction to AutoEncoder
Simple Introduction to AutoEncoderSimple Introduction to AutoEncoder
Simple Introduction to AutoEncoder
Jun Lang
 
Ant colony optimization
Ant colony optimizationAnt colony optimization
Ant colony optimization
Meenakshi Devi
 
Transformers, LLMs, and the Possibility of AGI
Transformers, LLMs, and the Possibility of AGITransformers, LLMs, and the Possibility of AGI
Transformers, LLMs, and the Possibility of AGI
SynaptonIncorporated
 
And then there were ... Large Language Models
And then there were ... Large Language ModelsAnd then there were ... Large Language Models
And then there were ... Large Language Models
Leon Dohmen
 
Machine learning
Machine learningMachine learning
Machine learning
Wes Eklund
 
Generative Models for General Audiences
Generative Models for General AudiencesGenerative Models for General Audiences
Generative Models for General Audiences
Sangwoo Mo
 
Intro to Object Detection with SSD
Intro to Object Detection with SSDIntro to Object Detection with SSD
Intro to Object Detection with SSD
Thomas Delteil
 
Why Batch Normalization Works so Well
Why Batch Normalization Works so WellWhy Batch Normalization Works so Well
Why Batch Normalization Works so Well
Chun-Ming Chang
 
Transformer Introduction (Seminar Material)
Transformer Introduction (Seminar Material)Transformer Introduction (Seminar Material)
Transformer Introduction (Seminar Material)
Yuta Niki
 
RM 701 Genetic Algorithm and Fuzzy Logic lecture
RM 701 Genetic Algorithm and Fuzzy Logic lectureRM 701 Genetic Algorithm and Fuzzy Logic lecture
RM 701 Genetic Algorithm and Fuzzy Logic lecture
VIT University (Chennai Campus)
 
LLM presentation final
LLM presentation finalLLM presentation final
LLM presentation final
Ruth Griffin
 
What Is Artificial Emotional Intelligence?
What Is Artificial Emotional Intelligence?What Is Artificial Emotional Intelligence?
What Is Artificial Emotional Intelligence?
Bernard Marr
 
Deep contextualized word representations
Deep contextualized word representationsDeep contextualized word representations
Deep contextualized word representations
Junya Kamura
 
1909 BERT: why-and-how (CODE SEMINAR)
1909 BERT: why-and-how (CODE SEMINAR)1909 BERT: why-and-how (CODE SEMINAR)
1909 BERT: why-and-how (CODE SEMINAR)
WarNik Chow
 

What's hot (20)

Neural Language Generation Head to Toe
Neural Language Generation Head to Toe Neural Language Generation Head to Toe
Neural Language Generation Head to Toe
 
LLMs Bootcamp
LLMs BootcampLLMs Bootcamp
LLMs Bootcamp
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Transformers
TransformersTransformers
Transformers
 
Intro to LLMs
Intro to LLMsIntro to LLMs
Intro to LLMs
 
introducción a Machine Learning
introducción a Machine Learningintroducción a Machine Learning
introducción a Machine Learning
 
Simple Introduction to AutoEncoder
Simple Introduction to AutoEncoderSimple Introduction to AutoEncoder
Simple Introduction to AutoEncoder
 
Ant colony optimization
Ant colony optimizationAnt colony optimization
Ant colony optimization
 
Transformers, LLMs, and the Possibility of AGI
Transformers, LLMs, and the Possibility of AGITransformers, LLMs, and the Possibility of AGI
Transformers, LLMs, and the Possibility of AGI
 
And then there were ... Large Language Models
And then there were ... Large Language ModelsAnd then there were ... Large Language Models
And then there were ... Large Language Models
 
Machine learning
Machine learningMachine learning
Machine learning
 
Generative Models for General Audiences
Generative Models for General AudiencesGenerative Models for General Audiences
Generative Models for General Audiences
 
Intro to Object Detection with SSD
Intro to Object Detection with SSDIntro to Object Detection with SSD
Intro to Object Detection with SSD
 
Why Batch Normalization Works so Well
Why Batch Normalization Works so WellWhy Batch Normalization Works so Well
Why Batch Normalization Works so Well
 
Transformer Introduction (Seminar Material)
Transformer Introduction (Seminar Material)Transformer Introduction (Seminar Material)
Transformer Introduction (Seminar Material)
 
RM 701 Genetic Algorithm and Fuzzy Logic lecture
RM 701 Genetic Algorithm and Fuzzy Logic lectureRM 701 Genetic Algorithm and Fuzzy Logic lecture
RM 701 Genetic Algorithm and Fuzzy Logic lecture
 
LLM presentation final
LLM presentation finalLLM presentation final
LLM presentation final
 
What Is Artificial Emotional Intelligence?
What Is Artificial Emotional Intelligence?What Is Artificial Emotional Intelligence?
What Is Artificial Emotional Intelligence?
 
Deep contextualized word representations
Deep contextualized word representationsDeep contextualized word representations
Deep contextualized word representations
 
1909 BERT: why-and-how (CODE SEMINAR)
1909 BERT: why-and-how (CODE SEMINAR)1909 BERT: why-and-how (CODE SEMINAR)
1909 BERT: why-and-how (CODE SEMINAR)
 

Similar to Contrastive Learning with Adversarial Perturbations for Conditional Text Generation

[Paper Reading] Unsupervised Learning of Sentence Embeddings using Compositi...
[Paper Reading]  Unsupervised Learning of Sentence Embeddings using Compositi...[Paper Reading]  Unsupervised Learning of Sentence Embeddings using Compositi...
[Paper Reading] Unsupervised Learning of Sentence Embeddings using Compositi...
Hiroki Shimanaka
 
Neural machine translation of rare words with subword units
Neural machine translation of rare words with subword unitsNeural machine translation of rare words with subword units
Neural machine translation of rare words with subword units
Tae Hwan Jung
 
UWB semeval2016-task5
UWB semeval2016-task5UWB semeval2016-task5
UWB semeval2016-task5
Lukáš Svoboda
 
Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...
Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...
Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...
Deren Lei
 
Bootstrapping Entity Alignment with Knowledge Graph Embedding
Bootstrapping Entity Alignment with Knowledge Graph EmbeddingBootstrapping Entity Alignment with Knowledge Graph Embedding
Bootstrapping Entity Alignment with Knowledge Graph Embedding
Nanjing University
 
Fast and Accurate Preordering for SMT using Neural Networks
Fast and Accurate Preordering for SMT using Neural NetworksFast and Accurate Preordering for SMT using Neural Networks
Fast and Accurate Preordering for SMT using Neural Networks
SDL
 
Sergey Nikolenko and Elena Tutubalina - Constructing Aspect-Based Sentiment ...
Sergey Nikolenko and  Elena Tutubalina - Constructing Aspect-Based Sentiment ...Sergey Nikolenko and  Elena Tutubalina - Constructing Aspect-Based Sentiment ...
Sergey Nikolenko and Elena Tutubalina - Constructing Aspect-Based Sentiment ...
AIST
 
nakai22apsipa_presentation.pdf
nakai22apsipa_presentation.pdfnakai22apsipa_presentation.pdf
nakai22apsipa_presentation.pdf
Yuki Saito
 
Neural Mask Generator : Learning to Generate Adaptive Word Maskings for Langu...
Neural Mask Generator : Learning to Generate Adaptive WordMaskings for Langu...Neural Mask Generator : Learning to Generate Adaptive WordMaskings for Langu...
Neural Mask Generator : Learning to Generate Adaptive Word Maskings for Langu...
MLAI2
 
Dynamic pooling and unfolding recursive autoencoders for paraphrase detection
Dynamic pooling and unfolding recursive autoencoders for paraphrase detectionDynamic pooling and unfolding recursive autoencoders for paraphrase detection
Dynamic pooling and unfolding recursive autoencoders for paraphrase detection
Koza Ozawa
 
2-Chapter Two-N-gram Language Models.ppt
2-Chapter Two-N-gram Language Models.ppt2-Chapter Two-N-gram Language Models.ppt
2-Chapter Two-N-gram Language Models.ppt
milkesa13
 
GAN(と強化学習との関係)
GAN(と強化学習との関係)GAN(と強化学習との関係)
GAN(と強化学習との関係)
Masahiro Suzuki
 
Turkish language modeling using BERT
Turkish language modeling using BERTTurkish language modeling using BERT
Turkish language modeling using BERT
AbdurrahimDerric
 
2021 03-02-distributed representations-of_words_and_phrases
2021 03-02-distributed representations-of_words_and_phrases2021 03-02-distributed representations-of_words_and_phrases
2021 03-02-distributed representations-of_words_and_phrases
JAEMINJEONG5
 
UNDERSTANDING NEGATIVE SAMPLING IN KNOWLEDGE GRAPH EMBEDDING
UNDERSTANDING NEGATIVE SAMPLING IN KNOWLEDGE GRAPH EMBEDDINGUNDERSTANDING NEGATIVE SAMPLING IN KNOWLEDGE GRAPH EMBEDDING
UNDERSTANDING NEGATIVE SAMPLING IN KNOWLEDGE GRAPH EMBEDDING
ijaia
 
Score-Based Generative Modeling through Stochastic Differential Equations
Score-Based Generative Modeling through Stochastic Differential EquationsScore-Based Generative Modeling through Stochastic Differential Equations
Score-Based Generative Modeling through Stochastic Differential Equations
Sangwoo Mo
 
Skip-gram Model Broken Down
Skip-gram Model Broken DownSkip-gram Model Broken Down
Skip-gram Model Broken Down
Chin Huan Tan
 
Analyse de sentiment et classification par approche neuronale en Python et Weka
Analyse de sentiment et classification par approche neuronale en Python et WekaAnalyse de sentiment et classification par approche neuronale en Python et Weka
Analyse de sentiment et classification par approche neuronale en Python et Weka
Patrice Bellot - Aix-Marseille Université / CNRS (LIS, INS2I)
 
Maximum likelihood-set - introduction
Maximum likelihood-set - introductionMaximum likelihood-set - introduction
Maximum likelihood-set - introduction
Yusuke Matsubara
 
From_seq2seq_to_BERT
From_seq2seq_to_BERTFrom_seq2seq_to_BERT
From_seq2seq_to_BERT
Huali Zhao
 

Similar to Contrastive Learning with Adversarial Perturbations for Conditional Text Generation (20)

[Paper Reading] Unsupervised Learning of Sentence Embeddings using Compositi...
[Paper Reading]  Unsupervised Learning of Sentence Embeddings using Compositi...[Paper Reading]  Unsupervised Learning of Sentence Embeddings using Compositi...
[Paper Reading] Unsupervised Learning of Sentence Embeddings using Compositi...
 
Neural machine translation of rare words with subword units
Neural machine translation of rare words with subword unitsNeural machine translation of rare words with subword units
Neural machine translation of rare words with subword units
 
UWB semeval2016-task5
UWB semeval2016-task5UWB semeval2016-task5
UWB semeval2016-task5
 
Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...
Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...
Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...
 
Bootstrapping Entity Alignment with Knowledge Graph Embedding
Bootstrapping Entity Alignment with Knowledge Graph EmbeddingBootstrapping Entity Alignment with Knowledge Graph Embedding
Bootstrapping Entity Alignment with Knowledge Graph Embedding
 
Fast and Accurate Preordering for SMT using Neural Networks
Fast and Accurate Preordering for SMT using Neural NetworksFast and Accurate Preordering for SMT using Neural Networks
Fast and Accurate Preordering for SMT using Neural Networks
 
Sergey Nikolenko and Elena Tutubalina - Constructing Aspect-Based Sentiment ...
Sergey Nikolenko and  Elena Tutubalina - Constructing Aspect-Based Sentiment ...Sergey Nikolenko and  Elena Tutubalina - Constructing Aspect-Based Sentiment ...
Sergey Nikolenko and Elena Tutubalina - Constructing Aspect-Based Sentiment ...
 
nakai22apsipa_presentation.pdf
nakai22apsipa_presentation.pdfnakai22apsipa_presentation.pdf
nakai22apsipa_presentation.pdf
 
Neural Mask Generator : Learning to Generate Adaptive Word Maskings for Langu...
Neural Mask Generator : Learning to Generate Adaptive WordMaskings for Langu...Neural Mask Generator : Learning to Generate Adaptive WordMaskings for Langu...
Neural Mask Generator : Learning to Generate Adaptive Word Maskings for Langu...
 
Dynamic pooling and unfolding recursive autoencoders for paraphrase detection
Dynamic pooling and unfolding recursive autoencoders for paraphrase detectionDynamic pooling and unfolding recursive autoencoders for paraphrase detection
Dynamic pooling and unfolding recursive autoencoders for paraphrase detection
 
2-Chapter Two-N-gram Language Models.ppt
2-Chapter Two-N-gram Language Models.ppt2-Chapter Two-N-gram Language Models.ppt
2-Chapter Two-N-gram Language Models.ppt
 
GAN(と強化学習との関係)
GAN(と強化学習との関係)GAN(と強化学習との関係)
GAN(と強化学習との関係)
 
Turkish language modeling using BERT
Turkish language modeling using BERTTurkish language modeling using BERT
Turkish language modeling using BERT
 
2021 03-02-distributed representations-of_words_and_phrases
2021 03-02-distributed representations-of_words_and_phrases2021 03-02-distributed representations-of_words_and_phrases
2021 03-02-distributed representations-of_words_and_phrases
 
UNDERSTANDING NEGATIVE SAMPLING IN KNOWLEDGE GRAPH EMBEDDING
UNDERSTANDING NEGATIVE SAMPLING IN KNOWLEDGE GRAPH EMBEDDINGUNDERSTANDING NEGATIVE SAMPLING IN KNOWLEDGE GRAPH EMBEDDING
UNDERSTANDING NEGATIVE SAMPLING IN KNOWLEDGE GRAPH EMBEDDING
 
Score-Based Generative Modeling through Stochastic Differential Equations
Score-Based Generative Modeling through Stochastic Differential EquationsScore-Based Generative Modeling through Stochastic Differential Equations
Score-Based Generative Modeling through Stochastic Differential Equations
 
Skip-gram Model Broken Down
Skip-gram Model Broken DownSkip-gram Model Broken Down
Skip-gram Model Broken Down
 
Analyse de sentiment et classification par approche neuronale en Python et Weka
Analyse de sentiment et classification par approche neuronale en Python et WekaAnalyse de sentiment et classification par approche neuronale en Python et Weka
Analyse de sentiment et classification par approche neuronale en Python et Weka
 
Maximum likelihood-set - introduction
Maximum likelihood-set - introductionMaximum likelihood-set - introduction
Maximum likelihood-set - introduction
 
From_seq2seq_to_BERT
From_seq2seq_to_BERTFrom_seq2seq_to_BERT
From_seq2seq_to_BERT
 

More from MLAI2

Meta Learning Low Rank Covariance Factors for Energy-Based Deterministic Unce...
Meta Learning Low Rank Covariance Factors for Energy-Based Deterministic Unce...Meta Learning Low Rank Covariance Factors for Energy-Based Deterministic Unce...
Meta Learning Low Rank Covariance Factors for Energy-Based Deterministic Unce...
MLAI2
 
Online Hyperparameter Meta-Learning with Hypergradient Distillation
Online Hyperparameter Meta-Learning with Hypergradient DistillationOnline Hyperparameter Meta-Learning with Hypergradient Distillation
Online Hyperparameter Meta-Learning with Hypergradient Distillation
MLAI2
 
Online Coreset Selection for Rehearsal-based Continual Learning
Online Coreset Selection for Rehearsal-based Continual LearningOnline Coreset Selection for Rehearsal-based Continual Learning
Online Coreset Selection for Rehearsal-based Continual Learning
MLAI2
 
Representational Continuity for Unsupervised Continual Learning
Representational Continuity for Unsupervised Continual LearningRepresentational Continuity for Unsupervised Continual Learning
Representational Continuity for Unsupervised Continual Learning
MLAI2
 
Sequential Reptile_Inter-Task Gradient Alignment for Multilingual Learning
Sequential Reptile_Inter-Task Gradient Alignment for Multilingual LearningSequential Reptile_Inter-Task Gradient Alignment for Multilingual Learning
Sequential Reptile_Inter-Task Gradient Alignment for Multilingual Learning
MLAI2
 
Skill-Based Meta-Reinforcement Learning
Skill-Based Meta-Reinforcement LearningSkill-Based Meta-Reinforcement Learning
Skill-Based Meta-Reinforcement Learning
MLAI2
 
Edge Representation Learning with Hypergraphs
Edge Representation Learning with HypergraphsEdge Representation Learning with Hypergraphs
Edge Representation Learning with Hypergraphs
MLAI2
 
Hit and Lead Discovery with Explorative RL and Fragment-based Molecule Genera...
Hit and Lead Discovery with Explorative RL and Fragment-based Molecule Genera...Hit and Lead Discovery with Explorative RL and Fragment-based Molecule Genera...
Hit and Lead Discovery with Explorative RL and Fragment-based Molecule Genera...
MLAI2
 
Mini-Batch Consistent Slot Set Encoder For Scalable Set Encoding
Mini-Batch Consistent Slot Set Encoder For Scalable Set EncodingMini-Batch Consistent Slot Set Encoder For Scalable Set Encoding
Mini-Batch Consistent Slot Set Encoder For Scalable Set Encoding
MLAI2
 
Task Adaptive Neural Network Search with Meta-Contrastive Learning
Task Adaptive Neural Network Search with Meta-Contrastive LearningTask Adaptive Neural Network Search with Meta-Contrastive Learning
Task Adaptive Neural Network Search with Meta-Contrastive Learning
MLAI2
 
Federated Semi-Supervised Learning with Inter-Client Consistency & Disjoint L...
Federated Semi-Supervised Learning with Inter-Client Consistency & Disjoint L...Federated Semi-Supervised Learning with Inter-Client Consistency & Disjoint L...
Federated Semi-Supervised Learning with Inter-Client Consistency & Disjoint L...
MLAI2
 
Meta-GMVAE: Mixture of Gaussian VAE for Unsupervised Meta-Learning
Meta-GMVAE: Mixture of Gaussian VAE for Unsupervised Meta-LearningMeta-GMVAE: Mixture of Gaussian VAE for Unsupervised Meta-Learning
Meta-GMVAE: Mixture of Gaussian VAE for Unsupervised Meta-Learning
MLAI2
 
Accurate Learning of Graph Representations with Graph Multiset Pooling
Accurate Learning of Graph Representations with Graph Multiset PoolingAccurate Learning of Graph Representations with Graph Multiset Pooling
Accurate Learning of Graph Representations with Graph Multiset Pooling
MLAI2
 
Clinical Risk Prediction with Temporal Probabilistic Asymmetric Multi-Task Le...
Clinical Risk Prediction with Temporal Probabilistic Asymmetric Multi-Task Le...Clinical Risk Prediction with Temporal Probabilistic Asymmetric Multi-Task Le...
Clinical Risk Prediction with Temporal Probabilistic Asymmetric Multi-Task Le...
MLAI2
 
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and ArchitecturesMetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
MLAI2
 
Learning to Extrapolate Knowledge: Transductive Few-shot Out-of-Graph Link Pr...
Learning to Extrapolate Knowledge: Transductive Few-shot Out-of-Graph Link Pr...Learning to Extrapolate Knowledge: Transductive Few-shot Out-of-Graph Link Pr...
Learning to Extrapolate Knowledge: Transductive Few-shot Out-of-Graph Link Pr...
MLAI2
 
Cost-effective Interactive Attention Learning with Neural Attention Process
Cost-effective Interactive Attention Learning with Neural Attention ProcessCost-effective Interactive Attention Learning with Neural Attention Process
Cost-effective Interactive Attention Learning with Neural Attention Process
MLAI2
 
Adversarial Neural Pruning with Latent Vulnerability Suppression
Adversarial Neural Pruning with Latent Vulnerability SuppressionAdversarial Neural Pruning with Latent Vulnerability Suppression
Adversarial Neural Pruning with Latent Vulnerability Suppression
MLAI2
 
Generating Diverse and Consistent QA pairs from Contexts with Information-Max...
Generating Diverse and Consistent QA pairs from Contexts with Information-Max...Generating Diverse and Consistent QA pairs from Contexts with Information-Max...
Generating Diverse and Consistent QA pairs from Contexts with Information-Max...
MLAI2
 
Learning to Balance: Bayesian Meta-Learning for Imbalanced and Out-of-distrib...
Learning to Balance: Bayesian Meta-Learning for Imbalanced and Out-of-distrib...Learning to Balance: Bayesian Meta-Learning for Imbalanced and Out-of-distrib...
Learning to Balance: Bayesian Meta-Learning for Imbalanced and Out-of-distrib...
MLAI2
 

More from MLAI2 (20)

Meta Learning Low Rank Covariance Factors for Energy-Based Deterministic Unce...
Meta Learning Low Rank Covariance Factors for Energy-Based Deterministic Unce...Meta Learning Low Rank Covariance Factors for Energy-Based Deterministic Unce...
Meta Learning Low Rank Covariance Factors for Energy-Based Deterministic Unce...
 
Online Hyperparameter Meta-Learning with Hypergradient Distillation
Online Hyperparameter Meta-Learning with Hypergradient DistillationOnline Hyperparameter Meta-Learning with Hypergradient Distillation
Online Hyperparameter Meta-Learning with Hypergradient Distillation
 
Online Coreset Selection for Rehearsal-based Continual Learning
Online Coreset Selection for Rehearsal-based Continual LearningOnline Coreset Selection for Rehearsal-based Continual Learning
Online Coreset Selection for Rehearsal-based Continual Learning
 
Representational Continuity for Unsupervised Continual Learning
Representational Continuity for Unsupervised Continual LearningRepresentational Continuity for Unsupervised Continual Learning
Representational Continuity for Unsupervised Continual Learning
 
Sequential Reptile_Inter-Task Gradient Alignment for Multilingual Learning
Sequential Reptile_Inter-Task Gradient Alignment for Multilingual LearningSequential Reptile_Inter-Task Gradient Alignment for Multilingual Learning
Sequential Reptile_Inter-Task Gradient Alignment for Multilingual Learning
 
Skill-Based Meta-Reinforcement Learning
Skill-Based Meta-Reinforcement LearningSkill-Based Meta-Reinforcement Learning
Skill-Based Meta-Reinforcement Learning
 
Edge Representation Learning with Hypergraphs
Edge Representation Learning with HypergraphsEdge Representation Learning with Hypergraphs
Edge Representation Learning with Hypergraphs
 
Hit and Lead Discovery with Explorative RL and Fragment-based Molecule Genera...
Hit and Lead Discovery with Explorative RL and Fragment-based Molecule Genera...Hit and Lead Discovery with Explorative RL and Fragment-based Molecule Genera...
Hit and Lead Discovery with Explorative RL and Fragment-based Molecule Genera...
 
Mini-Batch Consistent Slot Set Encoder For Scalable Set Encoding
Mini-Batch Consistent Slot Set Encoder For Scalable Set EncodingMini-Batch Consistent Slot Set Encoder For Scalable Set Encoding
Mini-Batch Consistent Slot Set Encoder For Scalable Set Encoding
 
Task Adaptive Neural Network Search with Meta-Contrastive Learning



Contrastive Learning with Adversarial Perturbations for Conditional Text Generation

  • 1. Contrastive Learning with Adversarial Perturbations for Conditional Text Generation. Seanie Lee (1*), Dong Bok Lee (1*), Sung Ju Hwang (1, 2). (1) KAIST, Daejeon, South Korea; (2) AITRICS, Seoul, South Korea.
  • 2. Pretrained Language Model: Pretraining a language model on a large corpus and fine-tuning it for a target task requires a large amount of labeled data.
  • 3. Conditional Text Generation: Conditional text generation produces a target sequence from a given source sequence, typically with an encoder-decoder architecture. [Figure: the encoder embeds and encodes "the blue house"; the decoder generates "la maison bleu <eos>" token by token.]
  • 4. Exposure Bias: Seq2seq models trained with teacher forcing often suffer from exposure bias, which hurts generalization to unseen inputs. [Figure: for the source "the blue house", the decoder is fed the ground truth "<bos> la maison bleu" and predicts "le maison bleu <eos>".]
  • 5. Contrastive Learning Framework: We propose to contrast a ground-truth (GT) pair with negative pairs to learn a better representation of the target sentence. Randomly sampled negative examples are easily discriminated by the pretrained language model, so mining meaningful negatives requires a large batch size. [Figure: the source sentence and the GT target "He wasn’t in great shape" pass through the encoder-decoder, shown alongside the unrelated sentence "I cannot do that."]
  • 6. Contrastive Learning with Adversarial Perturbation: We propose to use adversarial perturbation to generate an “imposter”, which is close to the GT in embedding space but semantically different. [Figure: on the embedding manifold, perturbing the target "He wasn’t in great shape" yields an imposter ("He was was in good shape.") and a distant target ("He wasn’t in good shape.").]
  • 7. Contrastive Learning with Adversarial Perturbation: Conversely, we generate a “distant target”, which is far away from the source sentence in embedding space but semantically similar. [Figure: on the embedding manifold, perturbing the target "He wasn’t in great shape" yields an imposter ("He was was in good shape.") and a distant target ("He wasn’t in good shape.").]
  • 8. Contrastive Learning with Adversarial Perturbation: We push the imposter and the other negative examples away from the source, and pull the target and the distant target toward the source. [Figure: source, target, distant target, and imposter in embedding space, with "push"/"pull" arrows and "Max"/"Min" labels.]
  • 9. Contrastive Learning Objective: Given a pair of source and target sentences $(x^{(i)}, y^{(i)})$, we randomly sample $y^{(j)}$ with $i \neq j$ and use the samples as a set of negative examples $S$. As in SimCLR, we maximize the cosine similarity between source and target and minimize it between source and negative examples. Example: $x^{(i)}$ = "Nu era într-o formă prea bună."; $y^{(i)}$ = "He wasn’t in great shape."; $y^{(j)}$ = "But I cannot do it anymore."; $y^{(k)}$ = "By mid-July, it was 40 percent." Reference: Chen et al., "A simple framework for contrastive learning of visual representations", ICML 2020.
  • 10. Generation of Imposter: We add a small perturbation to the hidden representation of the target sentence to generate an imposter, using a linear approximation as in Goodfellow et al. (2015). [Figure: source "Nu era într-o formă prea bună." and target "<bos> He wasn’t in great shape." pass through the encoder-decoder with pooling; the perturbed target decodes to "He was was in great shape."; panels labeled "Objective" (Min) and "Linear Approximation".] Reference: Goodfellow et al., "Explaining and harnessing adversarial examples", ICLR 2015.
  • 11. Generation of Distant Target: We add a large perturbation to the target embedding so that it moves far away from the source sentence while preserving the semantics of the target sentence. The objective has two parts: maximizing distance and semantic preservation. [Figure: source "Nu era într-o formă prea bună." and target "<bos> He wasn’t in great shape." pass through the encoder and decoder with pooling; the perturbed target decodes to "He wasn’t in good shape."]
  • 12. Learning Objective – (1): We add the imposter to the set of negative examples $S$ and use the distant target as another positive example for the source sentence in contrastive learning.
  • 13. Learning Objective: We jointly maximize the following objectives with stochastic gradient ascent.
  • 14. Experimental Setup – (1)
      1) Tasks and evaluation metrics
         - Neural machine translation: BLEU score
         - Question generation: BLEU score, F1/EM
         - Text summarization: ROUGE score
      2) Data
         - WMT’16 RO-EN
         - SQuAD
         - XSum
  • 15. Experimental Setup – (2)
      3) Baselines
         - T5-MLE: the T5 model trained with maximum likelihood estimation (MLE).
         - T5-𝛼-MLE: the T5 model trained with MLE, but decoding the target sequence with temperature scaling 𝛼 in the softmax [Caccia 2020].
         - T5-MLE-contrastive: naïve contrastive learning combined with MLE.
      [Caccia 2020] Caccia et al., "Language GANs falling short", ICLR 2020.
  • 16. Experimental Setup – (2)
      3) Baselines
         - T5-SSMBA [Ng 2020]: generates additional examples by denoising and reconstructing target sentences with a masked language model.
         - T5-WordDropout Contrastive [Yang 2019]: generates negative examples by removing the most frequent word from the target sentence.
         - T5-R3F [Aghajanyan 2021]: adds Gaussian noise and enforces a consistency loss.
      [Ng 2020] Ng et al., "SSMBA: Self-supervised manifold based data augmentation for improving out-of-domain robustness", EMNLP 2020.
      [Yang 2019] Yang et al., "Reducing word omission errors in neural machine translation: A contrastive learning approach", ACL 2019.
      [Aghajanyan 2021] Aghajanyan et al., "Better fine-tuning by reducing representational collapse", ICLR 2021.
  • 17. Experimental Result – (1): Machine Translation, WMT’16 RO-EN
      Method                        BLEU
      T5-MLE                        32.43
      T5-𝛼-MLE                      32.14
      T5-MLE-contrastive            32.03
      T5-SSMBA                      32.81
      T5-WordDropout Contrastive    32.44
      T5-CLAPS (Ours)               33.96
  • 18. Experimental Result – (2): Question Generation, SQuAD
      Method                        BLEU     F1       EM
      T5-MLE                        21.00    67.64    55.91
      T5-𝛼-MLE                      20.50    68.04    56.30
      T5-MLE-contrastive            20.91    67.32    55.25
      T5-SSMBA                      21.07    68.47    56.37
      T5-WordDropout Contrastive    21.19    68.16    56.41
      T5-CLAPS (Ours)               21.55    69.01    57.06
  • 19. Experimental Result – (3): Text Summarization, XSum
      Method                        ROUGE-1  ROUGE-2  ROUGE-L
      T5-MLE                        36.10    14.72    29.16
      T5-𝛼-MLE                      36.68    15.10    29.72
      T5-MLE-contrastive            36.34    14.81    29.41
      T5-SSMBA                      36.58    14.81    29.79
      T5-WordDropout Contrastive    36.88    15.11    29.68
      T5-CLAPS (Ours)               37.89    15.78    30.59
  • 20. Visualization of Sentence Embedding: The model learns to push the imposter away from the target sentence and to pull the distant target toward the source sentence.
  • 21. Conclusion
      - We propose a contrastive learning framework for conditional sequence generation to mitigate the exposure bias problem.
      - With adversarial perturbation, we generate negative and positive pairs that are more difficult for the model to distinguish from the GT pair.
      - Results show that our method outperforms the T5 baselines across machine translation, question generation, and summarization tasks.
  • 22. Future Work
      - We will improve the quality of the imposters, which currently contain many grammatical errors.
      - Generating the imposter and the distant target still requires a large amount of labeled data, so we need to improve sample efficiency.
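
The contrastive objective described on slides 9 and 12 can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the temperature `tau`, the function names, and the toy 2-D vectors are assumptions made for illustration, and the exact form of the combined loss is one plausible reading of the slide text (imposter added to the negatives, distant target used as a second positive).

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(source, positive, negatives, tau=0.1):
    """SimCLR-style loss: small when source is closer to positive than to negatives.

    tau is a hypothetical temperature; the slides do not state one.
    """
    pos = math.exp(cosine(source, positive) / tau)
    neg = sum(math.exp(cosine(source, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))


def augmented_loss(source, target, distant_target, negatives, imposter, tau=0.1):
    """Assumed combination: imposter joins the negatives, distant target is a second positive."""
    hard_negatives = negatives + [imposter]
    return (contrastive_loss(source, target, hard_negatives, tau)
            + contrastive_loss(source, distant_target, hard_negatives, tau))


# Toy 2-D "sentence embeddings" (made up for illustration).
source = [1.0, 0.0]
target = [0.9, 0.1]
negatives = [[0.0, 1.0], [-1.0, 0.2]]
imposter = [0.8, 0.3]          # close to the target, so a hard negative
distant_target = [0.6, -0.4]   # farther from the source, but treated as positive

base = contrastive_loss(source, target, negatives)
with_imposter = contrastive_loss(source, target, negatives + [imposter])
total = augmented_loss(source, target, distant_target, negatives, imposter)
print(base, with_imposter, total)
```

Adding the imposter to the negatives raises the loss for the same source-target pair, which is the sense in which it is a harder negative than a randomly sampled sentence.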
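
For the imposter generation on slide 10, the following is a minimal sketch of a linearized (Goodfellow-style) perturbation on a toy model. Everything here is an assumption for illustration: a single hidden vector `h`, a softmax output layer `W`, a gold token index `y`, and a step size `eps`. The idea shown is only the general one named on the slide: take the gradient of the log-likelihood with respect to the hidden representation and step a small, fixed distance in the direction that lowers it.

```python
import math


def logits_of(h, W):
    """Linear output layer: one logit per vocabulary row of W."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def log_softmax(logits):
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def log_likelihood(h, W, y):
    """log p(y | h) under the toy softmax model."""
    return log_softmax(logits_of(h, W))[y]


def grad_log_likelihood(h, W, y):
    """Analytic gradient: d/dh log p(y|h) = W[y] - sum_k p_k * W[k]."""
    probs = [math.exp(lp) for lp in log_softmax(logits_of(h, W))]
    return [W[y][d] - sum(p * row[d] for p, row in zip(probs, W))
            for d in range(len(h))]


def make_imposter(h, W, y, eps=0.5):
    """Step of length eps against the gradient, lowering log p(y|h)."""
    g = grad_log_likelihood(h, W, y)
    norm = math.sqrt(sum(x * x for x in g)) or 1.0  # avoid division by zero
    return [x - eps * gx / norm for x, gx in zip(h, g)]


# Toy setup (hypothetical values): 3-word vocabulary, 2-D hidden state.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
h = [1.0, 0.2]
y = 0

imposter = make_imposter(h, W, y)
print(log_likelihood(h, W, y), log_likelihood(imposter, W, y))
```

The perturbed vector stays within distance `eps` of the original, yet the model assigns the gold token a lower likelihood, mirroring "close in embedding space but semantically different".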
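
The two goals named on slide 11 ("Maximize Distance" and "Semantic Preservation") could be written down as follows. This is a sketch under stated assumptions, not the equations from the slides, which were not captured in the transcript: $H$ denotes the target hidden representation, $\eta$ a step size, $\mathcal{L}_{\text{cont}}$ a contrastive loss between source and target, and $p(\cdot\,; H)$ the decoder's output distribution computed from $H$.

```latex
% Step 1 (maximize distance): move H in the direction that increases the
% contrastive loss, i.e. away from the source representation.
\[
  \bar{H} = H + \eta \, \frac{g}{\lVert g \rVert_2},
  \qquad g = \nabla_{H} \, \mathcal{L}_{\text{cont}}
\]

% Step 2 (semantic preservation): keep the token distributions predicted
% from the perturbed representation close to the original ones.
\[
  \mathcal{L}_{\text{KL}} = \sum_{t} D_{\text{KL}}\!\left(
    p(y_t \mid y_{<t}, x; H) \,\middle\|\, p(y_t \mid y_{<t}, x; \bar{H})
  \right)
\]
```

Under this reading, a large $\eta$ places $\bar{H}$ far from the source, while a small $\mathcal{L}_{\text{KL}}$ indicates that the perturbed representation still decodes to roughly the same sentence.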
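
The T5-𝛼-MLE baseline on slide 15 decodes with temperature scaling in the softmax. A minimal sketch follows, assuming the common convention of dividing the logits by the temperature; the slides do not say whether 𝛼 divides or multiplies the logits, so the function name and the convention are both assumptions.

```python
import math


def softmax_with_temperature(logits, alpha=1.0):
    """Softmax over logits / alpha (assumed convention); alpha < 1 sharpens."""
    scaled = [z / alpha for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up decoder logits for a 3-token vocabulary
plain = softmax_with_temperature(logits, alpha=1.0)
sharp = softmax_with_temperature(logits, alpha=0.5)
print(plain, sharp)
```

With this convention, a temperature below 1 concentrates probability on the highest-scoring token, and a temperature above 1 flattens the distribution.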