A Deep Reinforced Model for Abstractive
Summarization
Romain Paulus, Caiming Xiong & Richard Socher
-
Slides by Park JeeHyun
7 JUN 2018
Contents
1. Introduction
2. Neural Intra-Attention Model
1) Intra-Temporal Attention on Input Sequence
2) Intra-Decoder Attention
3) Token Generation & Pointer
4) Sharing Decoder Weights
5) Repetition Avoidance at Test Time
3. Hybrid Learning Objective
1) Supervised Learning with Teacher Forcing
2) Policy Learning
3) Mixed Training Objective Function
4. Results
1) Experiments
2) Quantitative Analysis
3) Qualitative Analysis
01. Introduction
• Attentional, RNN-based encoder-decoder models
 good performance on short input and output sequences.
• However, for longer documents and summaries, these models
 repetitive (1) and incoherent (2) phrases.
• A neural network model with a novel intra-attention (1) that attends over the input and continuously generated
output separately
& new training method that combines standard supervised word prediction and reinforcement learning (2).
Our model obtains a 41.16 ROUGE-1 score on the CNN/Daily Mail dataset, an improvement over previous
state-of-the-art models.
Human evaluation also shows that our model produces higher quality summaries.
01. Introduction
• KEYPOINTS
1. Intra-attention
• Encoder: intra-temporal attention
• Decoder: sequential intra-attention
2. New objective function
• { Maximum-likelihood } + { Policy gradient reinforcement }
02. Neural Intra-Attention Model
1) Intra-Temporal Attention on Input Sequence
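The original slide for this subsection is a figure that did not survive extraction. As a stand-in, here is a minimal NumPy sketch of one decoding step of intra-temporal attention, assuming a bilinear score and a running sum of past exponentiated scores; the names W_attn and prev_exp_sum are illustrative, not the authors' code.

```python
import numpy as np

def intra_temporal_attention(dec_hidden, enc_hiddens, W_attn, prev_exp_sum):
    """One decoding step of intra-temporal attention over the input.

    dec_hidden:   (d_dec,)       decoder hidden state h_t^d
    enc_hiddens:  (n, d_enc)     encoder hidden states h_1^e .. h_n^e
    W_attn:       (d_dec, d_enc) bilinear attention weights
    prev_exp_sum: (n,) or None   sum of exp(scores) from earlier decoding steps
    """
    # Bilinear score of the decoder state against every input position.
    scores = dec_hidden @ W_attn @ enc_hiddens.T          # shape (n,)
    exp_scores = np.exp(scores)

    if prev_exp_sum is None:                              # first step: plain softmax
        temporal = exp_scores
        new_sum = exp_scores.copy()
    else:
        # Divide by past attention mass so positions that were already
        # attended to at earlier steps are penalized (the "temporal" part).
        temporal = exp_scores / prev_exp_sum
        new_sum = prev_exp_sum + exp_scores

    alpha = temporal / temporal.sum()                     # normalized attention weights
    context = alpha @ enc_hiddens                         # input context vector c_t^e
    return context, alpha, new_sum
```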
2) Intra-Decoder Attention
• While this intra-temporal attention function ensures that different parts of the
encoded input sequence are used (Equation 3), our decoder can still generate
repeated phrases based on its own hidden states, especially when generating long
sequences.
• To prevent that, we can incorporate more information about the previously decoded
sequence into the decoder.
• To achieve this, we introduce an intra-decoder attention mechanism.
• This mechanism is not present in existing encoder-decoder models for abstractive
summarization.
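A minimal NumPy sketch of such an intra-decoder attention step: the current decoder state attends over all previously generated decoder states. The weight name W_dec_attn is illustrative, not the paper's notation.

```python
import numpy as np

def intra_decoder_attention(dec_hidden, prev_dec_hiddens, W_dec_attn):
    """Attend over previously generated decoder states to discourage repetition.

    dec_hidden:       (d,)      current decoder state h_t^d
    prev_dec_hiddens: (t-1, d)  earlier decoder states h_1^d .. h_{t-1}^d
    W_dec_attn:       (d, d)    bilinear attention weights
    """
    if len(prev_dec_hiddens) == 0:            # first step: no history, c_1^d = 0
        return np.zeros_like(dec_hidden)
    scores = dec_hidden @ W_dec_attn @ prev_dec_hiddens.T   # (t-1,)
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                                     # softmax over past steps
    return alpha @ prev_dec_hiddens                          # decoder context c_t^d
```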
2) Intra-Decoder Attention
3) Token Generation & Pointer
• To generate a token, our decoder uses either
• a token-generation softmax layer or
• a pointer mechanism to copy rare or unseen words from the input
sequence (a minimal sketch follows below)
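A hedged sketch of how the two output paths can be mixed at each step: a switch probability decides between copying (reusing the encoder attention weights as a distribution over source tokens) and generating from the vocabulary softmax. Parameter names (W_gen, w_switch, ...) are placeholders rather than the paper's exact notation.

```python
import numpy as np

def output_distribution(features, alpha_enc, src_token_ids, vocab_size,
                        W_gen, b_gen, w_switch, b_switch):
    """Mix a token-generation softmax with a pointer (copy) distribution.

    features:      concatenation of decoder state and context vectors
    alpha_enc:     (n,) encoder attention weights, reused as copy probabilities
    src_token_ids: (n,) vocabulary id of each source position
    """
    # Generation path: softmax over the fixed vocabulary.
    logits = W_gen @ features + b_gen
    p_gen = np.exp(logits - logits.max())
    p_gen /= p_gen.sum()

    # Copy path: scatter the encoder attention mass onto source token ids,
    # so rare or out-of-vocabulary source words can be reproduced verbatim.
    p_copy = np.zeros(vocab_size)
    np.add.at(p_copy, src_token_ids, alpha_enc)

    # Switch: probability of pointing instead of generating at this step.
    p_point = 1.0 / (1.0 + np.exp(-(w_switch @ features + b_switch)))
    return p_point * p_copy + (1.0 - p_point) * p_gen
```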
3) Token Generation & Pointer
3) Token Generation & Pointer
4) Sharing Decoder Weights
How?
Why?
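How: in the paper, the token-generation layer is not learned independently; its weight matrix is computed from the shared embedding matrix, roughly W_out = tanh(W_emb · W_proj). Why: this reduces the parameter count and lets the output layer reuse the syntactic and semantic information captured by the input embeddings. A one-function sketch with illustrative names:

```python
import numpy as np

def tied_output_weights(W_emb, W_proj):
    """Compute the output-projection matrix from the shared embedding matrix.

    W_emb:  (vocab, d_emb)   token embedding matrix (shared with the input)
    W_proj: (d_emb, d_state) learned projection
    Returns W_out of shape (vocab, d_state), used by the generation softmax.
    """
    return np.tanh(W_emb @ W_proj)
```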
5) Repetition Avoidance at Test Time
Beam search with beam width = 5
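The slide only lists the beam width; the paper's repetition-avoidance rule is a hard constraint at test time: the decoder is never allowed to produce the same trigram twice, so beam hypotheses that would repeat one are pruned. A small sketch of that check (not the authors' code):

```python
def repeats_trigram(tokens):
    """Return True if the candidate summary contains the same trigram twice.

    During beam search, hypotheses for which this returns True are discarded.
    """
    seen = set()
    for i in range(len(tokens) - 2):
        trigram = tuple(tokens[i:i + 3])
        if trigram in seen:
            return True
        seen.add(trigram)
    return False
```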
03. Hybrid Learning Objective
• SUPERVISED LEARNING WITH TEACHER FORCING
• Does not always produce the best results on discrete evaluation metrics
 There are two main reasons for this:
1. exposure bias
• the network has knowledge of the ground truth sequence up to the next token during training,
BUT does not have such supervision when testing, hence accumulating errors as it predicts the
sequence
2. there are many valid ways to arrange tokens into paraphrases or different sentence orders
 the ROUGE metrics take some of this flexibility into account, but the maximum-likelihood objective does not.
• POLICY LEARNING
1) Supervised Learning with Teacher Forcing
With a 25% probability, we use the previously generated token
instead of the ground-truth token as the decoder input token y_{t−1},
which reduces exposure bias.
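A minimal sketch of that input-selection rule during maximum-likelihood training (a scheduled-sampling-style trick; names are illustrative):

```python
import random

def next_decoder_input(ground_truth_prev, model_prev, sample_prob=0.25):
    """Pick the decoder input y_{t-1} during ML training.

    With probability sample_prob (25% here) feed the model's own previous
    prediction instead of the ground-truth token, reducing exposure bias.
    """
    return model_prev if random.random() < sample_prob else ground_truth_prev
```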
1) Supervised Learning with Teacher Forcing
• A figure from
• Arun Venkatraman, Martial Hebert, and J Andrew Bagnell. Improving multi-step
prediction of learned time series models. In AAAI, pp. 3024–3030, 2015.
2) Policy Learning
2) Policy Learning
L_rl (computed from the sampled sequence y^s)
L_ml (computed from the ground-truth sequence y)
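For context, a sketch of the self-critical policy-gradient loss these labels refer to: a summary y^s is sampled from the model, a baseline summary ŷ is produced by greedy decoding, and the reward r (ROUGE-L here) of the sample is measured against the baseline. This is a simplified reading of the paper's L_rl, not the authors' code:

```python
def self_critical_loss(log_probs_sampled, reward_sampled, reward_baseline):
    """Self-critical policy-gradient loss L_rl.

    log_probs_sampled: log p(y_t^s | y_1^s..y_{t-1}^s, x) for the sampled summary
    reward_sampled:    r(y^s), e.g. ROUGE-L of the sampled summary
    reward_baseline:   r(y_hat), reward of the greedy-decoded baseline summary

    Minimizing this increases the likelihood of sampled summaries that beat
    the greedy baseline and decreases it for those that do worse.
    """
    return (reward_baseline - reward_sampled) * sum(log_probs_sampled)
```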
3) Mixed Training Objective Function
We use γ = 0.9984 for the ML+RL loss function
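For reference, the mixed objective combines the two losses as

    L_mixed = γ · L_rl + (1 − γ) · L_ml,   with γ = 0.9984

so the RL term dominates, while the small ML term is kept to preserve the fluency and readability of the generated summaries.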
3) Mixed Training Objective Function
• Hyperparameters and Implementation Details
04. Results
• A result example
1) Experiments
• DATASETS
• CNN/DAILY MAIL
• NEW YORK TIMES
• Setup
1. We first run maximum-likelihood (ML) training with and without intra-decoder
attention (removing c_t^d from Equations 9 and 11 to disable intra-attention) and
select the best-performing architecture.
2. Next, we initialize our model with the best ML parameters and we compare
reinforcement learning (RL) with our mixed-objective learning (ML+RL).
2) Quantitative Analysis
• Evaluation Metric
• F-1 score
• F-score (F1 score) is the harmonic mean of precision and recall
• ROUGE-N: Overlap of N-grams between the system and reference summaries.
• ROUGE-1
• refers to the overlap of 1-gram (each word) between the system and reference summaries.
• ROUGE-2
• refers to the overlap of bigrams between the system and reference summaries.
• ROUGE-L
• Longest Common Subsequence (LCS) based statistics.
• The longest common subsequence naturally takes sentence-level structure similarity
into account and automatically identifies the longest in-sequence co-occurring n-grams (a minimal ROUGE-N sketch follows below).
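A minimal single-reference sketch of ROUGE-N reported as an F-1 score (clipped n-gram overlap between one system summary and one reference; not the official ROUGE toolkit):

```python
from collections import Counter

def rouge_n_f1(system_tokens, reference_tokens, n=1):
    """ROUGE-N as an F-1 score: harmonic mean of n-gram precision and recall."""
    def ngram_counts(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    sys_counts = ngram_counts(system_tokens)
    ref_counts = ngram_counts(reference_tokens)
    overlap = sum((sys_counts & ref_counts).values())      # clipped n-gram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(sys_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)
```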
2) Quantitative Analysis
• ROUGE metrics and options:
• We report the full-length F-1 score of the ROUGE-1, ROUGE-2
and ROUGE-L metrics with the Porter stemmer option. For RL and
ML+RL training, we use the ROUGE-L score as a reinforcement
reward.
• We also tried ROUGE-2 but we found that it created summaries
that almost always reached the maximum length, often ending
sentences abruptly.
2) Quantitative Analysis
2) Quantitative Analysis
• This confirms our assumption that intra-attention improves performance on longer output sequences, and
explains why intra-attention does not improve performance on the NYT dataset, which has shorter summaries
on average.
2) Quantitative Analysis
What?
3) Qualitative Analysis
• Evaluation Setup:
• To perform this evaluation, we randomly select 100 test examples from the CNN/Daily Mail dataset.
• For each example, we show the original article, the ground truth summary as well as summaries generated
by different models side by side to a human evaluator.
• The human evaluator does not know which summaries come from which model or which one is the ground truth.
• Two scores from 1 to 10 are then assigned to each summary,
• one for relevance (how well does the summary capture the important parts of the article) and
• one for readability (how well-written the summary is).
• Each summary is rated by 5 different human evaluators on Amazon Mechanical Turk and the results are
averaged across all examples and evaluators.
3) Qualitative Analysis
Reference
• A Deep Reinforced Model for Abstractive Summarization
(https://arxiv.org/abs/1705.04304)
• Improving Multi-Step Prediction of Learned Time Series Models
(https://www.ri.cmu.edu/pub_files/2015/1/Venkatraman.pdf)