Joint Copying and Restricted Generation for Paraphrase
AAAI-17
2017/10/16
M2 Masahiro Kaneko
Overview
• Many natural language generation tasks, such as abstractive
summarization and text simplification, are paraphrase-oriented.
• In these tasks, copying and rewriting are the two main writing
modes.
• However, Seq2Seq models seldom take these two major writing
modes of paraphrase into account.
Problems
• Copying
• Many keywords provided in the source text may be overlooked in the
target text.
• In addition, certain keywords like named entities are often rare words and
masked as unknown (UNK) tags in Seq2Seq models, which unavoidably
causes the decoder to generate a number of UNK tags.
• Rewriting
• The decoders in most previous work generate words by simply picking the
likely target words that fit the contexts out of a large vocabulary.
• Consequently, the decoding process becomes quite time-consuming.
• Moreover, the decoder sometimes produces named entities or
numbers that are common but do not exist in, or are even irrelevant to,
the input text, which in turn leads to meaningless paraphrases.
Proposed Method : CoRe (Copy and Rewriting)
• This model captures the two core writing modes in paraphrase.
• The copying decoder finds the position to be copied based on the
existing attention mechanism.
• Therefore, the weights learned by the attention mechanism have
explicit meanings in the copying mode.
• Meanwhile, the generative decoder produces words restricted to a
source-specific vocabulary.
• This vocabulary is composed of a source-target word alignment
table and a small set of frequent words.
• Although the output dimension of the generative decoder is only one
tenth of that used by common Seq2Seq models, it is still able to
generate highly relevant words for rewriting.
Contributions
• We develop two different decoders to simulate major human
writing behaviors in paraphrase.
• We introduce a binary sequence labeling task to predict the
current writing mode, which utilizes additional supervision.
• We add restrictions to the generative decoder in order to
produce highly relevant content efficiently.
Seq2Seq Models and Attention Mechanism
• The source sequence: X = (x_1, …, x_n)
• The hidden states of the encoder: h_i
• The context vector: c_t
• The hidden states of the decoder: s_t
• y_t: the predicted target word at step t
• The attention weights: α_{t,i}
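The equation images on this slide did not survive the transcript. As a hedged reconstruction, these labels correspond to the standard Bahdanau-style attentive Seq2Seq formulation (the paper's exact parameterization may differ):

```latex
% Standard attentive Seq2Seq (Bahdanau-style) -- a reconstruction of the
% slide's missing equations, not necessarily the paper's exact form.
\begin{align*}
  h_i &= f_{\mathrm{enc}}(x_i, h_{i-1})           &&\text{encoder hidden states}\\
  e_{t,i} &= a(s_{t-1}, h_i), \quad
  \alpha_{t,i} = \frac{\exp(e_{t,i})}{\sum_{j}\exp(e_{t,j})}
                                                   &&\text{attention weights}\\
  c_t &= \textstyle\sum_{i}\alpha_{t,i}\, h_i      &&\text{context vector}\\
  s_t &= f_{\mathrm{dec}}(s_{t-1}, y_{t-1}, c_t)   &&\text{decoder hidden states}\\
  P(y_t \mid y_{<t}, X) &= \mathrm{softmax}\big(g(s_t, y_{t-1}, c_t)\big)
                                                   &&\text{prediction of } y_t
\end{align*}
```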
Model overview
Copying Decoder
• The copying decoder picks words from the source text.
• Most keywords from the original document are thus retained in the
output.
• The vocabulary for this decoder is VC = X.
• We observe that quite a number of low-frequency words in the
actual target text are extracted from the source text.
• Therefore, the copying decoder greatly reduces the chance of
producing unknown (UNK) tags.
The copying probability distribution:
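The slide's formula is missing from the transcript. Given that the copying decoder finds the position to copy via the existing attention mechanism, the copying distribution is presumably the attention distribution over source positions (a hedged reconstruction):

```latex
% Assumption: the copying probability of source word x_i at step t equals the
% attention weight, which gives the attention an explicit meaning when copying.
P_C(y_t = x_i \mid y_{<t}, X) = \alpha_{t,i}, \qquad x_i \in V_C = X
```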
Restricted Generative Decoder
• Restricts the output to a small yet highly relevant vocabulary derived
from the source text.
• Specifically, we train a rough alignment table A based on the IBM
Model beforehand.
• Our pilot experimental results show that the alignment table covers
most target words that are not extracted from the source.
• To further increase the coverage, we supplement an additional frequent-
word table U (the UNK tag is also included in this table).
• The final vocabulary for this decoder is limited to the union of the two
tables (see the reconstruction after this list).
• Compared with generation over the large vocabulary V, the restricted
decoder not only runs much faster but also produces more relevant
words.
• In our experiments, we retain the 10 most reliable alignments for each
source word, and set |U| = 2000.
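The vocabulary formula on the slide is missing; from the description, the restricted vocabulary is the union of the alignment targets of all source words and the frequent-word table:

```latex
% Reconstruction: A(x_i) is the set of target words aligned to source word x_i,
% and U is the frequent-word table (|U| = 2000, including the UNK tag).
V_G = \Big(\bigcup_{x_i \in X} A(x_i)\Big) \cup U
```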
Binary sequence labeling task
• We introduce a binary sequence labeling task to decide
whether the current target word should come from copying or
rewriting.
• Specifically, for each hidden state st, we compute a predictor
λt to represent the probability of copying at the current
generation position (a reconstruction follows below):
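The predictor's equation did not survive the transcript. A hedged reconstruction, assuming a sigmoid gate over the decoder state, together with the mixture of the two decoders implied by the model overview:

```latex
% Assumption: lambda_t is a sigmoid over the decoder state s_t; w_lambda and
% b_lambda are learned parameters (these names are ours, not the paper's).
\lambda_t = \sigma\!\big(w_\lambda^{\top} s_t + b_\lambda\big)

% The final distribution then mixes the copying and generative decoders:
P(y_t \mid y_{<t}, X) = \lambda_t\, P_C(y_t \mid y_{<t}, X)
                      + (1 - \lambda_t)\, P_G(y_t \mid y_{<t}, X)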
Learning
• The cost function ε in our model is the sum of two parts: ε = ε1 + ε2.
• The first part ε1 is the difference (a cross entropy) between the output
{yt} and the actual target sequence {yt∗}.
• The second part ε2 utilizes the additional supervision of the training
data.
• The experiments show that this cost function accurately balances the
proportion of the words derived from copying and generation.
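A hedged reconstruction of the two-part cost, consistent with the "Cross Entropy" note on the slide; the binary copy labels δ_t (whether the reference word should be copied from the source) are an assumption about how the additional supervision is derived:

```latex
\varepsilon = \varepsilon_1 + \varepsilon_2

% epsilon_1: negative log-likelihood of the reference sequence {y_t^*}
\varepsilon_1 = -\sum_t \log P\big(y_t^{*} \mid y_{<t}^{*}, X\big)

% epsilon_2 (assumed form): cross entropy between the copy predictor lambda_t
% and binary labels delta_t marking whether y_t^* is copied from the source
\varepsilon_2 = -\sum_t \Big[\delta_t \log \lambda_t + (1 - \delta_t)\log(1 - \lambda_t)\Big]
```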
Experiments
• One-sentence abstractive summarization
• One-sentence summarization uses a condensed sentence (a.k.a. a
highlight) to describe the main idea of a document.
• In this work, a collection of news documents and the corresponding
highlights are downloaded from the CNN and Daily Mail websites.
• Text simplification
• Text simplification modifies a document in such a way that its
grammar and vocabulary are greatly simplified, while the underlying
meaning remains the same.
• We use the English Wikipedia and Simple English Wikipedia parallel
corpus.
The basic information of the two datasets
Implementation
• Embedding size: 256
• Hidden size: 512
• Initial learning rate: 0.05
• Batch size: 32
• Framework: Theano
• We leverage the popular tool fast_align (Dyer, Chahuneau, and Smith
2013) to construct the source-target word alignment table A.
• The vocabulary of our generative decoder is restricted to the top 10
alignments of the source words plus 2000 frequent words.
• Although our model is more complex than the standard attentive
Seq2Seq model, it spends only two thirds of that model's time in both
training and testing.
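The paper does not release preprocessing code; to make the alignment-table step concrete, here is a minimal Python sketch (hypothetical file paths; fast_align's standard "src ||| tgt" bitext format and "i-j" alignment output) of building table A and assembling the restricted vocabulary, only illustrating the described procedure:

```python
from collections import Counter, defaultdict

def build_alignment_table(bitext_path, align_path, top_k=10):
    """Build the source->target alignment table A from fast_align output.

    bitext_path: one sentence pair per line, "source ||| target".
    align_path:  fast_align output, one line of "i-j" links per sentence.
    Keeps the top_k most frequently aligned target words per source word,
    mirroring the slide's "10 most reliable alignments".
    """
    counts = defaultdict(Counter)
    with open(bitext_path) as bf, open(align_path) as af:
        for pair, links in zip(bf, af):
            src, tgt = (side.split() for side in pair.split(" ||| "))
            for link in links.split():
                i, j = map(int, link.split("-"))
                counts[src[i]][tgt[j]] += 1
    return {w: [t for t, _ in c.most_common(top_k)] for w, c in counts.items()}

def restricted_vocab(source_words, table, frequent_words):
    """V_G = (union of A(x_i) over the source words) + frequent-word table U."""
    vocab = set(frequent_words) | {"<unk>"}
    for w in source_words:
        vocab.update(table.get(w, []))
    return vocab
```

fast_align itself would be run beforehand on the bitext (its documented invocation is along the lines of `fast_align -i bitext.txt -d -o -v > aligns.txt`) to produce the alignment links consumed above.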
Results
Case Study
• Moses just translates each word of the source.
• This is acceptable for the simplification task, but not for the summarization task.
• The ABS and CoRe outputs are close to the target length, but ABS fails to capture the
main points.
• CoRe outputs a good summary of the source.
Editor's Notes

  1. Not clear what the vocabulary V looks like on the source side
  2. What happens when the same word appears multiple times on the source side?
  3. Why can A(X) handle only one-to-one alignments? Or does it cover phrases too!?
  4. There is no available implementation of CopyNet