BRIO: Bringing Order to Abstractive Summarization
Deep Learning Paper Reading Group
NLP Team: 조해창, 김유진, 변현정, 박산희, 이기성 (presenter)
Contents
1. Background
2. Abstract
3. Introduction
4. Neural Abstractive Summarization
5. Coordinating Abstractive Models
6. Experiments
7. Conclusion
1. Background
ROUGE
• Computes recall, precision, and F1 over n-gram overlap
Example:
Reference text: 딥러닝 논문 읽기 모임은 유익하다. ("The deep learning paper reading group is beneficial.")
Inference text: 논문 모임은 매우 유익하다. ("The paper group is very beneficial.")
Treating each whitespace-separated word as a unigram, 3 of the reference's 5 tokens appear in the inference text, and 3 of the inference text's own 4 tokens appear in the reference:
• Recall: 3/5 = 0.6
• Precision: 3/4 = 0.75
• F1: 2 · 0.6 · 0.75 / (0.6 + 0.75) ≈ 0.67
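A minimal sketch reproducing these numbers, assuming plain whitespace tokenization (real ROUGE implementations add stemming and the ROUGE-2/ROUGE-L variants):

```python
from collections import Counter

def rouge1(reference: str, hypothesis: str) -> tuple[float, float, float]:
    """ROUGE-1 over whitespace unigrams, counted with multiplicity."""
    ref, hyp = Counter(reference.split()), Counter(hypothesis.split())
    overlap = sum((ref & hyp).values())  # clipped unigram matches
    recall = overlap / sum(ref.values())
    precision = overlap / sum(hyp.values())
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return recall, precision, f1

print(rouge1("딥러닝 논문 읽기 모임은 유익하다.", "논문 모임은 매우 유익하다."))
# (0.6, 0.75, 0.666...)
```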
Exposure bias
• During training, teacher forcing is applied: the decoder always conditions on the gold reference tokens
• During inference, the model conditions on the tokens it predicted at previous steps, so early mistakes compound (see the sketch below)
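A toy illustration of this train/inference mismatch; `step` below is a hypothetical stand-in decoder, not the paper's model:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab, hidden = 100, 32
emb = torch.nn.Embedding(vocab, hidden)
proj = torch.nn.Linear(hidden, vocab)

def step(prefix: torch.Tensor) -> torch.Tensor:
    """Toy decoder: mean-pooled prefix embedding -> next-token logits."""
    return proj(emb(prefix).mean(dim=0))

gold = torch.tensor([1, 5, 7, 9, 2])  # BOS, ..., EOS

# Training (teacher forcing): every step conditions on the *gold* prefix.
loss = sum(F.cross_entropy(step(gold[:j]).unsqueeze(0), gold[j:j + 1])
           for j in range(1, len(gold)))

# Inference (free running): every step conditions on the model's *own*
# predictions, so an early mistake changes all later conditioning.
pred = [1]
while pred[-1] != 2 and len(pred) < 10:
    pred.append(step(torch.tensor(pred)).argmax().item())
```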
Contrastive Learning
• Training so that the differences between items are clearly separated (see the sketch below)
• A form of metric learning
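A minimal illustration using PyTorch's built-in margin ranking loss (the scores are made up): pairs where the "better" item already beats the "worse" one by the margin contribute zero loss, so training pushes items apart only where they are not yet separated.

```python
import torch

# Hypothetical scores for two (better, worse) candidate pairs.
better = torch.tensor([0.8, 0.5])
worse = torch.tensor([0.3, 0.6])

# max(0, -y * (x1 - x2) + margin); y = 1 means x1 should rank higher.
loss_fn = torch.nn.MarginRankingLoss(margin=0.2)
loss = loss_fn(better, worse, torch.ones_like(better))
print(loss)  # 0.15: the first pair is separated enough, the second is not
```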
2. Abstract
• Abstractive summarization is usually trained with maximum likelihood estimation (deterministic: all probability mass on one point, the reference)
• The paper proposes a new training paradigm that assumes a non-deterministic distribution
• Probability mass is assigned to different candidate summaries according to their quality
3. Introduction
• Autoregressive generation → exposure bias
• The goal is to resolve the mismatch between the evaluation metric (ROUGE) and the probabilities the model assigns to generated summaries
Reference sample
• 딥러닝 논문 읽기 모임은 유익하다. ("The deep learning paper reading group is beneficial.")
Generated samples
• 딥러닝 논문 읽기 모임은 유익하다. (identical to the reference) / P: 0.5
• 논문 모임은 매우 유익하다. ("The paper group is very beneficial.") / P: 0.7
Here the lower-quality candidate receives the higher model probability: exactly the miscalibration BRIO targets.
• The model is given two roles → generation & evaluation
• Generation model: trained with the MLE loss
• Evaluation model: trained with a contrastive loss
• Main contribution: changing the target distribution the model learns from a deterministic distribution to a non-deterministic one
4. Neural Abstractive Summarization
• D: source document
• S: reference summary / s: summary token
• g: the abstractive summarization model (a function mapping a source document to a summary)
Training Objective
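The objective on this slide was an image in the original deck; reconstructed from the notation above (and omitting the paper's label smoothing), the standard MLE objective for a reference summary S = {s_1, …, s_l} is

$$\mathcal{L}_{\text{xent}} = -\sum_{j=1}^{l} \log p_{g_\theta}\left(s_j \mid D, S_{<j}\right),$$

which puts all target probability mass on the single reference S: the deterministic, one-point distribution noted in the abstract.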
Inference and Exposure Bias
5. Coordinating Abstractive Models
Contrastive Learning for Coordination
Generated samples
• 딥러닝 모임은 유익 ("the deep learning group is beneficial", truncated)
• 논문 모임은 매우 유익하다. ("The paper group is very beneficial.")
• 딥러닝 논문 읽기 모임은 재미있다. ("The deep learning paper reading group is fun.")
• 모임은 재미있다. ("The group is fun.")
Reference sample
• 딥러닝 논문 읽기 모임은 유익하고 재미있다. ("The deep learning paper reading group is beneficial and fun.")
MLE loss (computed against the reference)
Generated samples (sorted by ROUGE, best first)
1. 딥러닝 논문 읽기 모임은 재미있다.
2. 논문 모임은 매우 유익하다.
3. 딥러닝 모임은 유익
4. 모임은 재미있다.
Contrastive loss (computed over the ranked candidates)
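This loss was also shown as an image; as defined in the BRIO paper, with candidates S_i indexed in descending ROUGE order, it is a pairwise margin ranking loss:

$$\mathcal{L}_{\text{ctr}} = \sum_{i}\sum_{j>i} \max\bigl(0,\; f(S_j) - f(S_i) + \lambda_{ij}\bigr), \qquad \lambda_{ij} = (j - i)\,\lambda,$$

where $f(S) = \sum_{t=1}^{|S|} \log p_{g_\theta}(s_t \mid D, S_{<t}) \big/ |S|^{\alpha}$ is the length-normalized log-probability the model assigns to candidate S. Each higher-ranked candidate must out-score each lower-ranked one by a margin that grows with the rank gap, so the model's probabilities become ordered like ROUGE.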
6. Experiments
Experimental Settings / Implementation Details
• Datasets: CNNDM, XSum, NYT
• Baselines: BART, PEGASUS, GSum, SimCLS, GOLD, SeqCo, ConSum
• BRIO-Ctr: contrastive loss only
• BRIO-Mul: cross-entropy (MLE) loss + contrastive loss (combined objective below)
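For reference, BRIO-Mul's multi-task objective in the paper is a weighted sum of the two losses introduced earlier, with the coefficient γ examined in the analysis below:

$$\mathcal{L}_{\text{mul}} = \mathcal{L}_{\text{xent}} + \gamma\, \mathcal{L}_{\text{ctr}}$$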
Results
Analysis
• Coefficients of the Multi-Task Loss
• Generation-Finetuning as a Loop
• Increasing the Beam Width
• Training with Different Evaluation Metrics
• Novel n-grams
• Rank Correlation
• Token-level Calibration
• Few-shot Fine-tuning
Case Study on CNNDM
7. Conclusion and Future Work
• Proposes a contrastive training method that uses metric scores together with the model probabilities of candidate outputs
• Extensible beyond abstractive summarization to machine translation
• Potentially applicable to reinforcement learning
• Possible performance gains from candidate-generation methods other than diverse beam search
