A Neural Attention Model for
Sentence Summarization
Authors: Alexander M. Rush, Sumit Chopra, Jason Weston
Conference: EMNLP 2015
Presenter: Mengsay LOEM
May 21, 2021
Overview
● Proposes a fully data-driven approach to abstractive sentence summarization
● Combines an attention-based probabilistic model with beam search to generate the sentence summary
● The proposed method outperforms several strong baselines on the headline-generation task
Sentence Summarization
Input: russian defense minister ivanov called sunday for the creation of a joint front for combating global terrorism
Output: russia calls for joint front against terrorism
Approaches:
● Compressive: deletion
● Extractive: deletion and reordering
● Abstractive: arbitrary transformation
Problems
● Compressive/Extractive methods
○ Cannot perform some summarization operations
■ Paraphrasing, generalization
● Abstractive methods
○ Require linguistically-inspired constraints
○ Require syntactic transformation of the input text
Mostly heuristic approaches
Can we generate an abstractive summary with a fully data-driven approach?
Solution
● Fully data-driven approach to abstractive summary generation
○ No syntactic transformations or linguistic structure required
○ Can be trained directly on any document-summary pairs
● Uses an attention-based neural network model
○ Inspired by neural machine translation*
* Bahdanau et al.: Neural Machine Translation by Jointly Learning to Align and Translate
Proposed Method: Outline
enc:
• Bag-of-words encoder
• Convolutional encoder
• Attention-based encoder
dec:
• Neural network language model + beam search
[Figure: encoder-decoder diagram. enc takes the input sentence (russian defense minister ivanov ⋯ terrorism); dec takes the summary generated so far (<s> russia calls for joint ⋯) and predicts the next summary word.]
Attention-based Encoder
[Figure: the decoder context (<s> russia calls for joint ⋯) produces attention weights over the input words (russian defense ⋯ terrorism </s>); the weights are applied to a smoothed version of the input.]
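The weighting step on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the matrix names F, G and P follow the paper's notation as far as I recall it, the dimensions and random values are made up, and the smoothing here is a plain local mean that is truncated at the sentence edges (an assumption).

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def attention_encoder(x_ids, ctx_ids, F, G, P, Q=2):
    """Return (encoder vector, attention weights) for one decoding step.

    x_ids   : ids of the M input words
    ctx_ids : ids of the last C generated summary words (the context)
    F, G    : input-side and context-side embedding matrices
    P       : matrix relating input embeddings to the flattened context
    Q       : half-width of the smoothing window (assumed value)
    """
    x_tilde = F[np.asarray(x_ids)]              # (M, H) input embeddings
    y_ctx = G[np.asarray(ctx_ids)].reshape(-1)  # (C * D,) concatenated context
    p = softmax(x_tilde @ P @ y_ctx)            # (M,) weights over input words
    M = len(x_ids)
    # Smoothed version of the input: local mean of embeddings around position i.
    x_bar = np.stack([
        x_tilde[max(0, i - Q): i + Q + 1].mean(axis=0) for i in range(M)
    ])
    return p @ x_bar, p                         # weighted sum, shape (H,)

# Toy usage with hypothetical sizes: vocabulary 10, H = 4, D = 3, C = 2.
rng = np.random.default_rng(0)
F = rng.normal(size=(10, 4))
G = rng.normal(size=(10, 3))
P = rng.normal(size=(4, 2 * 3))
enc_vec, weights = attention_encoder([1, 2, 3, 4, 5], [0, 1], F, G, P)
```

The weights depend on the context, so they change at every decoding step, which is what the animation on the following slides shows.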
Example of Attention
[Figure, repeated over nine slides: attention weights over the input sentence at successive steps of generating the output summary, with the smoothing window marked. Labels: Input sentence, Output (summary), Smoothing window, Weighting with Attention.]
Generation: Beam Search
[Figure: at each step, every hypothesis on the beam is joined with each vocabulary word (russia, for, country, calls, …, defense, terrorism). Hypotheses shown: [ ] → "russia", "defense" → "russia calls", "defense terrorism" → "russia calls for", "defense terrorism country".]
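The joining-and-pruning procedure in the figure can be sketched as follows. This is a generic beam search under stated assumptions, not the authors' code: the output length is fixed in advance, and `toy_log_prob` with its probability table is a made-up stand-in for the trained model's next-word scores.

```python
import math

def beam_search(log_prob, vocab, length, beam_size):
    """Keep the `beam_size` best partial summaries at every step.

    log_prob(prefix, word) returns the log-probability of `word`
    given the words generated so far.
    """
    beam = [(0.0, [])]  # (cumulative log-probability, words so far)
    for _ in range(length):
        # Join every hypothesis on the beam with every vocabulary word.
        candidates = [
            (score + log_prob(prefix, word), prefix + [word])
            for score, prefix in beam
            for word in vocab
        ]
        # Prune: keep only the highest-scoring extensions.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_size]
    return beam

def toy_log_prob(prefix, word):
    """Hypothetical bigram scores, for illustration only."""
    table = {
        "<s>": {"russia": 0.6, "defense": 0.4},
        "russia": {"calls": 0.9, "for": 0.1},
        "calls": {"for": 0.8, "russia": 0.2},
    }
    prev = prefix[-1] if prefix else "<s>"
    return math.log(table.get(prev, {}).get(word, 1e-6))

vocab = ["russia", "defense", "calls", "for"]
best_score, best_words = beam_search(toy_log_prob, vocab, length=3, beam_size=2)[0]
print(" ".join(best_words))  # prints "russia calls for"
```

With a beam size of 1 this reduces to greedy decoding; a larger beam trades computation for a wider search.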
Extension: Extractive Tuning
● A fully abstractive model cannot find extractive word matches when necessary
○ e.g., transferring unseen proper noun phrases from the input
● Solution:
○ Tune a small set of additional features
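The slide does not list the additional features. As a rough sketch, assume they are indicators of whether the newly generated n-gram also occurs in the input, combined with the model's log-probability through a weighted sum. The function names and the weights below are hypothetical.

```python
def extractive_features(candidate, context, source, model_log_prob):
    """Features for appending `candidate` to `context`, given the input `source`.

    Returns [model log-probability, unigram match, bigram match, trigram match],
    where each match is 1.0 if the n-gram ending in `candidate` occurs in `source`.
    """
    def ngram_in_source(n):
        gram = (context + [candidate])[-n:]
        if len(gram) < n:
            return 0.0
        return float(any(
            source[j:j + n] == gram for j in range(len(source) - n + 1)
        ))
    return [model_log_prob, ngram_in_source(1), ngram_in_source(2), ngram_in_source(3)]

def tuned_score(features, weights):
    """Log-linear combination; the weights are the small set being tuned."""
    return sum(w * f for w, f in zip(weights, features))

source = "russian defense minister ivanov called sunday".split()
copied = extractive_features("minister", ["russian", "defense"], source, -2.0)
novel = extractive_features("russia", [], source, -2.0)
weights = [1.0, 0.5, 0.5, 0.5]  # made-up values for illustration
```

With equal model scores, the copied word "minister" receives a higher tuned score than the novel word "russia", which is the bias toward input words noted on the results slide.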
Experiment
● Evaluated on the headline-generation task
○ Metrics: ROUGE-1, ROUGE-2, ROUGE-L
○ Data sets:
■ Training: Annotated Gigaword data set (3.8M)
■ Evaluation: DUC-2003, DUC-2004 (500)
● Baselines
○ Compressive: Prefix, Compress
○ Abstractive: IR, Topiary, MOSES+
○ W&L (quasi-synchronous grammar approach)
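For readers unfamiliar with the metrics, ROUGE-N measures n-gram overlap between a generated summary and a reference. The sketch below computes a simplified recall-only variant on whitespace tokens; the official ROUGE toolkit adds options (stemming, length limits, multiple references) that are not modelled here. The example strings are the ABS+ output and the reference headline from the Gazprom example later in the deck.

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(candidate, reference, n=1):
    """Fraction of reference n-grams that also appear in the candidate."""
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clipped counts: an n-gram is credited at most as often as the candidate has it.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total

system = "russia ’s gazprom set up joint venture in siberia"
headline = "gazprom chevron set up joint venture"
r1 = rouge_n_recall(system, headline, n=1)  # 5 of 6 reference unigrams matched
r2 = rouge_n_recall(system, headline, n=2)  # 3 of 5 reference bigrams matched
```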
Result: Summary Tasks
ABS: Proposed model (attention-based encoder)
ABS+: ABS with extractive tuning
[Table of scores not recoverable; row annotations: Compressive, Winning system on the task, Statistical MT]
● Using only the input article or only a language model is not sufficient (IR, Compress)
● The full model ABS+ scores best, but the additional extractive features bias the system toward input words (useful for the ROUGE metric)
Result: Encoding Ablation
[Table of scores not recoverable]
● An NNLM with no encoder performs better than an n-gram LM
● Including any of the proposed encoders improves the model
● Bag-of-words encoder: averages over input words, ignores word order, makes no use of the generated context
● Convolutional encoder: allows local interactions between words in the input sentence, makes no use of the generated context
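The simplest encoder in this ablation, which averages over input words and ignores word order, can be sketched as below. This is an illustration under assumptions: the embedding matrix F and its sizes are made up, and the encoder is taken to be a uniform average of input embeddings.

```python
import numpy as np

def bag_of_words_encoder(x_ids, F):
    """Uniform average of the input word embeddings.

    Word order and the generated summary context play no role.
    """
    x_tilde = F[np.asarray(x_ids)]                 # (M, H) input embeddings
    p = np.full(len(x_ids), 1.0 / len(x_ids))      # uniform weights over positions
    return p @ x_tilde                             # (H,) encoder vector

rng = np.random.default_rng(0)
F = rng.normal(size=(10, 4))                       # hypothetical: 10 words, H = 4
original = bag_of_words_encoder([1, 2, 3], F)
shuffled = bag_of_words_encoder([3, 1, 2], F)      # same words, different order
```

Both calls return the same vector, which makes the "ignores word order" limitation concrete. An attention-based encoder replaces the uniform weights with weights that depend on the generated context.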
Result: Model and Decoding Ablation
[Table of scores not recoverable; one row is annotated "Pure extractive"]
● The attention-based encoder with beam search gives the biggest impact
● For ROUGE, purely extractive generation is also effective
Example
ABS: interesting rewording
Input: australian foreign minister stephen smith sunday congratulated new zealand ’s new prime minister-elect john key as he praised ousted leader helen clark as a “ gutsy ” and respected politician .
ABS: australian foreign minister congratulates new nz pm after election
ABS+: australian foreign minister congratulates smith new zealand as leader
Headline: time caught up with nz ’s gutsy clark says australian fm
Example
ABS: interesting rewording, but makes a mistake
Input: russia ’s gas and oil giant gazprom and us oil major chevron have set up a joint venture based in resource-rich northwestern siberia , the interfax news agency reported thursday quoting gazprom officials .
ABS: russian oil giant chevron set up siberia joint venture
ABS+: russia ’s gazprom set up joint venture in siberia
Headline: gazprom chevron set up joint venture
Example
Good at picking keywords, but reorders words in a syntactically incorrect way
Input: the white house on thursday warned iran of possible new sanctions after the un nuclear watchdog reported that tehran had begun sensitive nuclear work at a key site in defiance of un resolutions .
ABS: iran warns of possible new sanctions on nuclear work
ABS+: un nuclear watchdog warns iran of possible new sanctions
Headline: us warns iran of step backward on nuclear issue
Conclusion
● Proposed a neural attention-based model for abstractive sentence summarization
● Fully data-driven approach
● Improves on the baseline scores, but performance is still far from human level
○ Remaining errors: repeated semantic elements, improper generalization

