Deep Reinforcement Learning with Distributional Semantic Rewards for Abstractive Summarization

Siyao Li1*, Deren Lei1*, Pengda Qin2, William Yang Wang1
1 University of California, Santa Barbara
2 Beijing University of Posts and Telecommunications
{siyaoli, derenlei}@ucsb.edu, qinpengda@bupt.edu.cn, william@cs.ucsb.edu
Abstract
Deep reinforcement learning (RL) has been a commonly used strategy for the abstractive summarization task to address both the exposure bias and non-differentiable task issues. However, the conventional reward, ROUGE-L, simply looks for exact n-gram matches between candidates and annotated references, which inevitably makes the generated sentences repetitive and incoherent. In this paper, instead of ROUGE-L, we explore the practicability of utilizing distributional semantics to measure the matching degree. With distributional semantics, sentence-level evaluation can be obtained, and semantically correct phrases can also be generated without being limited to the surface form of the reference sentences. Human judgments on the Gigaword and CNN/Daily Mail datasets show that our proposed distributional semantics reward (DSR) has distinct superiority in capturing the lexical and compositional diversity of natural language.
1 Introduction
Abstractive summarization is the task of paraphrasing a long article with fewer words. Unlike extractive summarization, abstractive summaries can include tokens outside the article's vocabulary. There exist several encoder-decoder approaches, such as attention-based architectures (Bahdanau et al., 2015; Rush et al., 2015; Nallapati et al., 2016; Chopra et al., 2016), approaches incorporating graph techniques (Moawad and Aref, 2012; Ganesan et al., 2010), and approaches adding a pointer generator (Vinyals et al., 2015; Nallapati et al., 2016; See et al., 2017; Bello et al., 2017). However, long sentence generation suffers from exposure bias (Bahdanau et al., 2017) as errors accumulate during the decoding process.

* Equal contributions.
Many innovative deep RL methods (Ranzato et al., 2016; Wu et al., 2016; Paulus et al., 2018; Lamb et al., 2016) have been developed to alleviate this issue by providing sentence-level feedback after generating a complete sentence, in addition to optimal transport usage (Napoles et al., 2012). However, the automatic evaluation metrics commonly used to generate sentence-level rewards count exact n-gram matches and are not robust to different words that share similar meanings, since the semantic-level reward is deficient.

Currently, many studies on contextualized word representations (Peters et al., 2018; Devlin et al., 2019) prove that they have a powerful capacity for reflecting distributional semantics. In this paper, we propose to use a distributional semantic reward to boost the RL-based abstractive summarization system. Moreover, we design several novel objective functions.

Experimental results show that they outperform the conventional objectives while increasing sentence fluency. Our main contributions are three-fold:
• We are the first to introduce DSR to abstractive summarization, and we achieve better results than conventional rewards.
• Unlike ROUGE, our DSR does not rely on cross-entropy loss (XENT) to produce readable phrases. Thus, no exposure bias is introduced.
• DSR improves the diversity and fluency of generated tokens while avoiding unnecessary repetitions.
2 Methodology
Background While sequence models are usually trained using XENT, they are typically evaluated at test time using discrete NLP metrics such as BLEU (Papineni et al., 2002), ROUGE (Lin, 2004), and METEOR (Banerjee and Lavie, 2005). Therefore, they suffer from both the exposure bias and non-differentiable task metric issues. To solve these problems, many resort to deep RL with sequence-to-sequence models (Paulus et al., 2018; Ranzato et al., 2016; Ryang and Abekawa, 2012), where the learning agent interacts with a given environment. However, RL models have poor sample efficiency, which leads to a very slow convergence rate. Therefore, RL methods usually start from a pretrained policy, which is established by optimizing XENT at each word generation step:
L_{XENT} = -\sum_{t=1}^{n} \log P(y_t \mid y_1, \ldots, y_{t-1}, x).   (1)
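For concreteness, here is a minimal sketch of how this teacher-forcing cross-entropy objective (Equation 1) can be computed in PyTorch; the function name, tensor shapes, and padding convention are illustrative assumptions rather than the authors' implementation.

import torch
import torch.nn.functional as F

def xent_loss(logits, target_ids, pad_id=0):
    """Token-level cross-entropy (Eq. 1) under teacher forcing.

    logits:     (batch, seq_len, vocab) decoder scores conditioned on the
                gold prefix y_1..y_{t-1} and the source article x.
    target_ids: (batch, seq_len) gold summary tokens y_t.
    """
    vocab = logits.size(-1)
    return F.cross_entropy(
        logits.reshape(-1, vocab),   # flatten batch and time dimensions
        target_ids.reshape(-1),
        ignore_index=pad_id,         # skip padding positions
    )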
Then, during the RL stage, the conventional way is to adopt a self-critical strategy to fine-tune based on the target evaluation metric:
L_{RL} = \sum_{t=1}^{n} \log P(\hat{y}_t \mid \hat{y}_1, \ldots, \hat{y}_{t-1}, x) \times \big( r_{\mathrm{metric}}(y^b) - r_{\mathrm{metric}}(\hat{y}) \big)   (2)
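Equation 2 weights the log-likelihood of a generated summary ŷ by how far it falls short of a baseline output y^b (in the standard self-critical setup, ŷ is sampled and y^b is the greedy decode; the paper does not restate this, so treat it as an assumption). A minimal PyTorch sketch of this loss, with argument names and shapes chosen for illustration:

import torch

def self_critical_loss(sample_logprobs, sample_reward, baseline_reward):
    """Self-critical policy-gradient loss (Eq. 2).

    sample_logprobs: (batch, seq_len) log P(ŷ_t | ŷ_<t, x) for the sampled summary.
    sample_reward:   (batch,) sentence-level metric r(ŷ) of the sampled summary.
    baseline_reward: (batch,) metric r(y^b) of the baseline (e.g. greedy) summary.
    """
    # Minimizing log P * (r(y^b) - r(ŷ)) raises the probability of samples
    # that beat the baseline and lowers it otherwise.
    advantage = (baseline_reward - sample_reward).detach()
    return (sample_logprobs.sum(dim=1) * advantage).mean()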
Distributional Semantic Reward When evaluating the quality of generated sentences, ROUGE looks for exact matches between references and generations, which naturally overlooks the expression diversity of natural language. In other words, it fails to capture the semantic relation between similar words. Distributional semantic representations are a practical way to solve this problem. Recent works on contextualized word representations, including ELMo (Peters et al., 2018), GPT (Radford et al., 2018), and BERT (Devlin et al., 2019), prove that distributional semantics can be captured effectively. Based on that, a recent study called BERTSCORE (Zhang et al., 2019) focuses on sentence-level generation evaluation by using pre-trained BERT contextualized embeddings to compute the similarity between two sentences as a weighted aggregation of cosine similarities between their tokens. It has a higher correlation with human evaluation on text generation tasks compared to existing evaluation metrics.
In this paper, we introduce it as a DSR for deep
RL. The BERTSCORE is defined as:
R_{BERT} = \frac{\sum_{y_i \in y} \mathrm{idf}(y_i) \max_{\hat{y}_j \in \hat{y}} \mathbf{y}_i^{\top} \hat{\mathbf{y}}_j}{\sum_{y_i \in y} \mathrm{idf}(y_i)}   (3)

P_{BERT} = \frac{\sum_{\hat{y}_j \in \hat{y}} \mathrm{idf}(\hat{y}_j) \max_{y_i \in y} \mathbf{y}_i^{\top} \hat{\mathbf{y}}_j}{\sum_{\hat{y}_j \in \hat{y}} \mathrm{idf}(\hat{y}_j)}   (4)

F_{BERT} = 2 \frac{R_{BERT} \cdot P_{BERT}}{R_{BERT} + P_{BERT}}   (5)
where \mathbf{y}_i and \hat{\mathbf{y}}_j denote the BERT contextual embeddings of reference token y_i and candidate token \hat{y}_j, respectively, and idf(·) calculates the inverse document frequency (idf). In our DSR, we do not use the idf weights, since Zhang et al. (2019) require the entire dataset, including the test set, to calculate them. Besides, ROUGE does not use a similar weighting, so we do not include idf for consistency.
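Since the DSR drops the idf weights, the reward reduces to greedy cosine matching between the two sets of contextual embeddings. The sketch below illustrates this simplified F_BERT computation (Equations 3-5 without idf); it assumes the token embeddings have already been extracted from BERT and is not the official BERTSCORE implementation.

import torch

def f_bert(ref_emb, cand_emb, eps=1e-8):
    """Greedy-matching F score in the style of Eqs. 3-5, without idf weighting.

    ref_emb:  (m, d) contextual embeddings of the reference tokens.
    cand_emb: (n, d) contextual embeddings of the candidate tokens.
    """
    # Cosine similarity between every reference/candidate token pair.
    ref = torch.nn.functional.normalize(ref_emb, dim=-1)
    cand = torch.nn.functional.normalize(cand_emb, dim=-1)
    sim = ref @ cand.t()                      # (m, n)

    recall = sim.max(dim=1).values.mean()     # each reference token -> best candidate match
    precision = sim.max(dim=0).values.mean()  # each candidate token -> best reference match
    return 2 * recall * precision / (recall + precision + eps)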
3 Experimental Setup
3.1 Datasets
Gigaword corpus It is an English sentence summarization dataset based on Annotated Gigaword (Napoles et al., 2012). A single-sentence summary is paired with a short article. We use the version provided by OpenNMT, which contains 3.8M training and 189k development instances. We randomly sample 15k instances as our test data.
CNN/Daily Mail dataset It consists of online news articles and their corresponding multi-sentence abstracts (Hermann et al., 2015; Nallapati et al., 2016). We use the non-anonymized version provided by See et al. (2017), which contains 287k training, 13k validation, and 11k testing examples. We truncate the articles to 400 tokens and limit the summary lengths to 100 tokens.
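As a small illustration of the length limits described above, the hypothetical helper below truncates a tokenized example to the stated sizes; tokenization itself is assumed to have been done elsewhere.

def truncate_example(article_tokens, summary_tokens,
                     max_article_len=400, max_summary_len=100):
    """Apply the CNN/Daily Mail preprocessing limits: articles are cut to
    400 tokens and summaries to 100 tokens."""
    return article_tokens[:max_article_len], summary_tokens[:max_summary_len]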
3.2 Pretrain
We first pretrain a sequence-to-sequence model with attention using XENT and then select the best parameters to initialize the models for RL. Our models have 256-dimensional hidden states and 128-dimensional word embeddings, and they also incorporate the pointer mechanism (See et al., 2017) for handling out-of-vocabulary words.
3.3 Baseline
In abstractive summarization, ROUGE (Lin, 2004) is a common evaluation metric used to provide a sentence-level reward for RL. However, using ROUGE as a pure RL objective may cause too many repetitions and reduced fluency in the outputs. Paulus et al. (2018) propose a hybrid learning objective that combines XENT and a self-critical ROUGE reward:

L_{baseline} = \gamma L_{ROUGE} + (1 - \gamma) L_{XENT}   (6)

where γ is a scaling factor and the F score of ROUGE-L is used as the reward to calculate L_{ROUGE}. In our experiment, we select γ = 0.998 for the Gigaword corpus and γ = 0.9984 for the CNN/Daily Mail dataset.¹ Note that we do not avoid repetition at test time as Paulus et al. (2018) do, because we want to examine the repetition of sentences directly produced after training.
3.4 Proposed Objective Functions
Inspired by the above objective function (Paulus et al., 2018), we optimize RL models with a loss function similar to Equation 2. Instead of ROUGE-L, we incorporate BERTSCORE, a DSR, to provide sentence-level feedback.

In our experiment, L_{DSR} is the self-critical RL loss (Equation 2) with F_{BERT} as the reward. We introduce the following objective functions:

DSR+ROUGE: A combined reward function of ROUGE and F_{BERT}:

L_1 = \gamma L_{DSR} + (1 - \gamma) L_{ROUGE}   (7)

In our experiment, we select γ = 0.5 for both datasets to balance the influence of the two reward functions.

DSR+XENT: The F_{BERT} reward combined with XENT to make the generated phrases more readable:

L_2 = \gamma' L_{DSR} + (1 - \gamma') L_{XENT}   (8)

In our experiment, we select γ' = 0.998 for the Gigaword corpus and γ' = 0.9984 for the CNN/Daily Mail dataset.

DSR: The pure F_{BERT} objective function without any teacher forcing:

L_3 = L_{DSR}   (9)
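The three objectives differ only in how the self-critical DSR loss is mixed with the ROUGE or XENT losses. A minimal sketch of that mixing, using the γ values reported above as defaults (the function names are ours, not the authors'):

def dsr_rouge_loss(l_dsr, l_rouge, gamma=0.5):
    """Eq. 7: equal-weight mix of the self-critical DSR and ROUGE losses."""
    return gamma * l_dsr + (1.0 - gamma) * l_rouge


def dsr_xent_loss(l_dsr, l_xent, gamma_prime=0.998):
    """Eq. 8: DSR loss with a small teacher-forcing (XENT) component
    (0.998 for Gigaword, 0.9984 for CNN/Daily Mail in the paper)."""
    return gamma_prime * l_dsr + (1.0 - gamma_prime) * l_xent


def dsr_loss(l_dsr):
    """Eq. 9: pure DSR objective, no teacher forcing."""
    return l_dsr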
4 Results
For the abstractive summarization task, we test our models with different objectives on the Gigaword and CNN/Daily Mail datasets. We choose F_{BERT} and ROUGE-L as our automatic evaluation metrics.

¹ γ is very close to 1 due to the difference in the scales of the two loss functions. For the Gigaword corpus, we tune γ on the development set. For CNN/Daily Mail, we use the same γ as Paulus et al. (2018).
[Figure 1: F_{BERT} (left) and ROUGE (right) on the Gigaword corpus during training, after pretraining with XENT, for the DSR, ROUGE, DSR+XENT, ROUGE+XENT, and DSR+ROUGE objectives (training steps 85,000-102,500). Note that ROUGE is not our target evaluation metric.]
For multi-sentence summaries in the CNN/Daily Mail dataset, we calculate ROUGE-L and F_{BERT} after concatenating the original abstract sentences or their BERT contextualized embeddings. The results are shown in Table 1. The DSR+ROUGE model (Equation 7) achieves the best ROUGE performance on Gigaword and performance similar to the ROUGE model on CNN/Daily Mail. Training with DSR (Equation 9) and DSR+XENT (Equation 8) obtains the best BERTScores, as expected. It is also expected that the ROUGE model obtains worse BERTScores, as simply optimizing ROUGE will generate less readable sentences (Paulus et al., 2018); however, the DSR model, without XENT as teacher forcing, can improve the performance of the pretrained model on both the F_{BERT} and ROUGE-L scales.

Note that the DSR model's ROUGE-L is high at training time but does not generalize well to the test set, and ROUGE-L is not our target evaluation metric. In the next section, we conduct a human evaluation to analyze the summarization performance of the different reward systems.
5 Human Evaluation
We perform human evaluation on Amazon Mechanical Turk to verify the benefits of DSR for the coherence and fluency of the output sentences. We randomly sample 500 items as an evaluation set using a uniform distribution.
                        Gigaword                         CNN/Daily Mail
Model          FBERT   PBERT   RBERT   ROUGE    FBERT   PBERT   RBERT   ROUGE
XENT           65.78   67.61   64.53   40.77    62.77   62.18   63.79   29.46
ROUGE          61.46   60.89   62.70   42.73    60.11   59.31   61.33   33.89
ROUGE+XENT     66.50   67.24   66.28   42.72    61.38   61.07   62.17   33.07
DSR+ROUGE      66.48   66.79   66.65   42.95    65.01   65.92   64.56   33.61
DSR+XENT       67.02   67.69   66.85   42.29    66.64   66.06   67.63   31.28
DSR            67.06   67.34   67.28   41.73    66.93   66.27   67.98   30.96

Table 1: Results on Gigaword and CNN/Daily Mail for abstractive summarization. The upper rows show the results of the baselines. ROUGE stands for the F score of ROUGE-L.
                              Gigaword                   CNN/Daily Mail
Task        Model v. base     Win     Lose    Tie        Win     Lose    Tie
Relevance   DSR+XENT          31.0%   25.2%   43.8%      51.1%   34.1%   14.8%
            DSR               45.4%   27.2%   27.4%      48.7%   38.5%   12.8%
Fluency     DSR+XENT          40.2%   19.6%   40.2%      55.8%   28.5%   15.6%
            DSR               45.4%   28.8%   25.8%      54.0%   31.7%   14.3%

Table 2: Human evaluation results on Gigaword and CNN/Daily Mail for abstractive summarization. ROUGE+XENT is the selected baseline model.
During the evaluation, given the context and without knowing the ground truth summary, judges are asked to decide two things between the generations of ROUGE+XENT and DSR+XENT: 1. which summary is more relevant to the article; 2. which summary is more fluent. Ties are permitted. A similar test is also done between the ROUGE+XENT and DSR models. As shown in Table 2, the DSR and DSR+XENT models significantly improve the relevance and fluency of the generated summaries. In addition, using pure DSR achieves better performance on the Gigaword corpus and results comparable to DSR+XENT on CNN/Daily Mail. While Paulus et al.'s (2018) objective function requires XENT to make the generated sentences readable, our proposed DSR does not require XENT and can limit the exposure bias originating from it.
6 Analysis
Diversity Unlike extractive summarization, abstractive summarization allows more degrees of freedom in the choice of words. While simply selecting words from the article makes the task easier to train, a larger action space can provide more paths to potentially better results (Nema et al., 2017). Using the DSR as the deep RL reward encourages models to choose actions that are not n-grams of the articles. In Table 4, we list a few generated samples on the Gigaword corpus.
                  Gigaword             CNN/Daily Mail
Model             Rep(%)   Div(%)      Rep(%)   Div(%)
ROUGE+XENT        11.57    16.33       66.82    21.32
DSR+ROUGE         18.42    15.97       50.41    34.70
DSR+XENT          20.15    16.24       25.98    31.80
DSR                7.20    19.17       22.03    41.35

Table 3: Qualitative analysis of repetition (Rep) / diversity (Div), calculated as the percentage of repeated / out-of-article n-grams (unigrams for Gigaword and 5-grams for CNN/Daily Mail) in generated sentences.
In the first example in Table 4, the word "sino-german" provides an interesting and efficient way to express the relation between China and Germany, and F_{BERT} is also improved by this change. In addition, the second example in Table 4 shows that the RL model with DSR corrects the sentence's grammar and significantly improves the F_{BERT} score by switching "down" to an unseen word, "drops". On the other hand, when optimizing DSR to improve the diversity of generation, some semantically similar words may also be generated and harm the summarization quality, as shown in the third example in Table 4: the new token "wins" reduces the scores of both metrics. We also evaluate the diversity of a model quantitatively by averaging the percentage of out-of-article n-grams in generated sentences. Results can be found in Table 3. The DSR model achieves the highest diversity.
1) Context      economic links between china and germany will be at the center of talks here next month between li peng and chancellor helmut kohl when the chinese premier makes a week-long visit to germany , sources on both sides said .
   Groundtruth  li peng 's visit to germany to focus on economic ties
   ROUGE+XENT   chinese premier to visit germany next month {FBERT: 49.48, ROUGE: 18.64}
   DSR+XENT     chinese premier to visit germany next month {FBERT: 49.48, ROUGE: 18.64}
   DSR          sino-german economic links to be at center of talks with germany {FBERT: 50.77, ROUGE: 17.33}

2) Context      the sensitive index dropped by #.## percent friday on the bombay stock exchange -lrb- bse -rrb- following brisk inquiries from operators that generally pressed sales to square up positions on the last day of the current settlement .
   Groundtruth  sensex falls in bombay stock exchange
   ROUGE+XENT   sensitive index down on bombay stock exchange {FBERT: 70.03, ROUGE: 45.62}
   DSR+XENT     sensitive index drops on bombay stock exchange {FBERT: 73.92, ROUGE: 45.62}
   DSR          sensitive index drops on bombay stock exchange {FBERT: 73.92, ROUGE: 45.62}

3) Context      belgium 's rik verbrugghe took victory in tuesday 's prologue stage of the tour de romandie , with a confident ride through the streets of geneva 's old town .
   Groundtruth  verbrugghe takes prologue victory in tour de romandie
   ROUGE+XENT   verbrugghe takes victory in prologue stage of tour de romandie {FBERT: 91.41, ROUGE: 75.93}
   DSR+XENT     verbrugghe takes victory in tour de romandie prologue stage {FBERT: 91.56, ROUGE: 81.79}
   DSR          verbrugghe wins victory in tour de romandie prologue stage {FBERT: 89.45, ROUGE: 70.10}

4) Context      european finance ministers on saturday urged swedes to vote “ yes ” to adopting the euro , saying entry into the currency bloc would be good for sweden and the rest of europe .
   Groundtruth  european finance ministers urge swedes to vote yes to euro
   ROUGE+XENT   eu finance ministers urge swedes to vote yes to euro {FBERT: 96.63, ROUGE: 93.47}
   DSR+XENT     eu finance ministers urge swedes to vote yes to adopting euro {FBERT: 90.38, ROUGE: 88.89}
   DSR          eu finance ministers urge swedes to vote yes to euro euro {FBERT: 95.05, ROUGE: 93.47}

Table 4: Qualitative analysis of generated samples on the Gigaword corpus. Generated words that do not appear in the context are marked blue, and repeated words are marked red (the color markup is not reproduced in this plain-text version). The first two examples show that DSR's generated tokens are more diverse; however, DSR may suffer from problems such as those shown in examples 3 and 4.
Repetition Repetition leads to a lower F_{BERT}, as shown in the last example in Table 4. Using DSR reduces the probability of producing repetitions. The average percentages of repeated n-grams in generated sentences are presented in Table 3. As shown in this table, unlike ROUGE, the DSR model can achieve high fluency without XENT; moreover, it produces the fewest repetitions among all the rewards. Table 4 gives an example in which DSR produces a repeated word (example 4), but this does not reflect the overall distribution of repeated word generation for all evaluated models.
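Table 3's Rep and Div numbers are described as percentages of repeated and out-of-article n-grams in the generated sentences. The sketch below shows one plausible way to compute per-summary values under that description; the exact counting convention used by the authors is not specified, so treat this as an assumption.

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def repetition_rate(summary_tokens, n):
    """Fraction of n-grams in a generated summary that repeat an earlier
    n-gram of the same summary (Rep in Table 3)."""
    grams = ngrams(summary_tokens, n)
    if not grams:
        return 0.0
    seen, repeats = set(), 0
    for g in grams:
        if g in seen:
            repeats += 1
        seen.add(g)
    return repeats / len(grams)


def diversity_rate(summary_tokens, article_tokens, n):
    """Fraction of generated n-grams that never appear in the source article
    (Div in Table 3)."""
    grams = ngrams(summary_tokens, n)
    if not grams:
        return 0.0
    article_grams = set(ngrams(article_tokens, n))
    novel = sum(1 for g in grams if g not in article_grams)
    return novel / len(grams)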
7 Conclusion
This paper demonstrates the effectiveness of applying a distributional semantic reward to reinforcement learning for abstractive summarization; specifically, we choose BERTSCORE. Our experimental results demonstrate that we achieve better performance on the Gigaword and CNN/Daily Mail datasets. Besides, the generated sentences have fewer repetitions, and their fluency is also improved. Our finding is aligned with a contemporaneous study (Wieting et al., 2019) on leveraging semantic similarity for machine translation.
References
Dzmitry Bahdanau, Philemon Brakel, Kelvin Xu, Anirudh Goyal, Ryan Lowe, Joelle Pineau, Aaron C. Courville, and Yoshua Bengio. 2017. An actor-critic algorithm for sequence prediction. In 5th International Conference on Learning Representations, Conference Track Proceedings.

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In 3rd International Conference on Learning Representations, Conference Track Proceedings.

Satanjeev Banerjee and Alon Lavie. 2005. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, pages 65–72.

Irwan Bello, Hieu Pham, Quoc V. Le, Mohammad Norouzi, and Samy Bengio. 2017. Neural combinatorial optimization with reinforcement learning. In 5th International Conference on Learning Representations, Workshop Track Proceedings.

Sumit Chopra, Michael Auli, and Alexander M. Rush. 2016. Abstractive sentence summarization with attentive recurrent neural networks. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 93–98.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186.

Kavita Ganesan, ChengXiang Zhai, and Jiawei Han. 2010. Opinosis: A graph based approach to abstractive summarization of highly redundant opinions. In Proceedings of the 23rd International Conference on Computational Linguistics, pages 340–348.

Karl Moritz Hermann, Tomas Kocisky, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. 2015. Teaching machines to read and comprehend. In Advances in Neural Information Processing Systems, pages 1693–1701.

Alex M. Lamb, Anirudh Goyal Alias Parth Goyal, Ying Zhang, Saizheng Zhang, Aaron C. Courville, and Yoshua Bengio. 2016. Professor forcing: A new algorithm for training recurrent networks. In Advances in Neural Information Processing Systems, pages 4601–4609.

Chin-Yew Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out.

Ibrahim F. Moawad and Mostafa Aref. 2012. Semantic graph reduction approach for abstractive text summarization. In 2012 Seventh International Conference on Computer Engineering & Systems (ICCES), pages 132–138. IEEE.

Ramesh Nallapati, Bowen Zhou, Cicero dos Santos, Çağlar Gülçehre, and Bing Xiang. 2016. Abstractive text summarization using sequence-to-sequence RNNs and beyond. CoNLL 2016, page 280.

Courtney Napoles, Matthew Gormley, and Benjamin Van Durme. 2012. Annotated Gigaword. In Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction, pages 95–100. Association for Computational Linguistics.

Preksha Nema, Mitesh M. Khapra, Anirban Laha, and Balaraman Ravindran. 2017. Diversity driven attention model for query-based abstractive summarization. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1063–1072.

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pages 311–318. Association for Computational Linguistics.

Romain Paulus, Caiming Xiong, and Richard Socher. 2018. A deep reinforced model for abstractive summarization. In 6th International Conference on Learning Representations, Conference Track Proceedings.

Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT, pages 2227–2237.

Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving language understanding by generative pre-training.

Marc'Aurelio Ranzato, Sumit Chopra, Michael Auli, and Wojciech Zaremba. 2016. Sequence level training with recurrent neural networks. In 4th International Conference on Learning Representations, Conference Track Proceedings.

Alexander M. Rush, Sumit Chopra, and Jason Weston. 2015. A neural attention model for abstractive sentence summarization. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 379–389.

Seonggi Ryang and Takeshi Abekawa. 2012. Framework of automatic text summarization using reinforcement learning. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 256–265. Association for Computational Linguistics.

Abigail See, Peter J. Liu, and Christopher D. Manning. 2017. Get to the point: Summarization with pointer-generator networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1073–1083.

Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. 2015. Pointer networks. In Advances in Neural Information Processing Systems, pages 2692–2700.

John Wieting, Taylor Berg-Kirkpatrick, Kevin Gimpel, and Graham Neubig. 2019. Beyond BLEU: Training neural machine translation with semantic similarity. In Proceedings of the 57th Conference of the Association for Computational Linguistics, pages 4344–4355.

Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, et al. 2016. Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144.

Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, and Yoav Artzi. 2019. BERTScore: Evaluating text generation with BERT. arXiv preprint arXiv:1904.09675.
More Related Content

What's hot

Adaptive relevance feedback in information retrieval
Adaptive relevance feedback in information retrievalAdaptive relevance feedback in information retrieval
Adaptive relevance feedback in information retrievalYI-JHEN LIN
 
Mathematics 06-00092 (1)
Mathematics 06-00092 (1)Mathematics 06-00092 (1)
Mathematics 06-00092 (1)Dr. Hari Arora
 
GDG Tbilisi 2017. Word Embedding Libraries Overview: Word2Vec and fastText
GDG Tbilisi 2017. Word Embedding Libraries Overview: Word2Vec and fastTextGDG Tbilisi 2017. Word Embedding Libraries Overview: Word2Vec and fastText
GDG Tbilisi 2017. Word Embedding Libraries Overview: Word2Vec and fastTextrudolf eremyan
 
Latent Structured Ranking
Latent Structured RankingLatent Structured Ranking
Latent Structured RankingSunny Kr
 
Meng Zhang - 2017 - Adversarial Training for Unsupervised Bilingual Lexicon I...
Meng Zhang - 2017 - Adversarial Training for Unsupervised Bilingual Lexicon I...Meng Zhang - 2017 - Adversarial Training for Unsupervised Bilingual Lexicon I...
Meng Zhang - 2017 - Adversarial Training for Unsupervised Bilingual Lexicon I...Association for Computational Linguistics
 
Towards Building Semantic Role Labeler for Indian Languages
Towards Building Semantic Role Labeler for Indian LanguagesTowards Building Semantic Role Labeler for Indian Languages
Towards Building Semantic Role Labeler for Indian LanguagesAlgoscale Technologies Inc.
 
Why parsing is a part of Language Faculty Science (by Daisuke Bekki)
Why parsing is a part of Language Faculty Science (by Daisuke Bekki)Why parsing is a part of Language Faculty Science (by Daisuke Bekki)
Why parsing is a part of Language Faculty Science (by Daisuke Bekki)Daisuke BEKKI
 
Linguistic variable
Linguistic variable Linguistic variable
Linguistic variable Math-Circle
 
MT SUMMIT13.Language-independent Model for Machine Translation Evaluation wit...
MT SUMMIT13.Language-independent Model for Machine Translation Evaluation wit...MT SUMMIT13.Language-independent Model for Machine Translation Evaluation wit...
MT SUMMIT13.Language-independent Model for Machine Translation Evaluation wit...Lifeng (Aaron) Han
 
Simple effective decipherment via combinatorial optimization
Simple effective decipherment via combinatorial optimizationSimple effective decipherment via combinatorial optimization
Simple effective decipherment via combinatorial optimizationAttaporn Ninsuwan
 
feras_kalita_mcgrory_2015
feras_kalita_mcgrory_2015feras_kalita_mcgrory_2015
feras_kalita_mcgrory_2015Conor McGrory
 
Algorithm of Dynamic Programming for Paper-Reviewer Assignment Problem
Algorithm of Dynamic Programming for Paper-Reviewer Assignment ProblemAlgorithm of Dynamic Programming for Paper-Reviewer Assignment Problem
Algorithm of Dynamic Programming for Paper-Reviewer Assignment ProblemIRJET Journal
 
A Survey of Various Methods for Text Summarization
A Survey of Various Methods for Text SummarizationA Survey of Various Methods for Text Summarization
A Survey of Various Methods for Text SummarizationIJERD Editor
 
Intent Classifier with Facebook fastText
Intent Classifier with Facebook fastTextIntent Classifier with Facebook fastText
Intent Classifier with Facebook fastTextBayu Aldi Yansyah
 

What's hot (18)

Adaptive relevance feedback in information retrieval
Adaptive relevance feedback in information retrievalAdaptive relevance feedback in information retrieval
Adaptive relevance feedback in information retrieval
 
Mathematics 06-00092 (1)
Mathematics 06-00092 (1)Mathematics 06-00092 (1)
Mathematics 06-00092 (1)
 
p138-jiang
p138-jiangp138-jiang
p138-jiang
 
GDG Tbilisi 2017. Word Embedding Libraries Overview: Word2Vec and fastText
GDG Tbilisi 2017. Word Embedding Libraries Overview: Word2Vec and fastTextGDG Tbilisi 2017. Word Embedding Libraries Overview: Word2Vec and fastText
GDG Tbilisi 2017. Word Embedding Libraries Overview: Word2Vec and fastText
 
Latent Structured Ranking
Latent Structured RankingLatent Structured Ranking
Latent Structured Ranking
 
Meng Zhang - 2017 - Adversarial Training for Unsupervised Bilingual Lexicon I...
Meng Zhang - 2017 - Adversarial Training for Unsupervised Bilingual Lexicon I...Meng Zhang - 2017 - Adversarial Training for Unsupervised Bilingual Lexicon I...
Meng Zhang - 2017 - Adversarial Training for Unsupervised Bilingual Lexicon I...
 
P13 corley
P13 corleyP13 corley
P13 corley
 
Word embedding
Word embedding Word embedding
Word embedding
 
Towards Building Semantic Role Labeler for Indian Languages
Towards Building Semantic Role Labeler for Indian LanguagesTowards Building Semantic Role Labeler for Indian Languages
Towards Building Semantic Role Labeler for Indian Languages
 
Why parsing is a part of Language Faculty Science (by Daisuke Bekki)
Why parsing is a part of Language Faculty Science (by Daisuke Bekki)Why parsing is a part of Language Faculty Science (by Daisuke Bekki)
Why parsing is a part of Language Faculty Science (by Daisuke Bekki)
 
Linguistic variable
Linguistic variable Linguistic variable
Linguistic variable
 
MT SUMMIT13.Language-independent Model for Machine Translation Evaluation wit...
MT SUMMIT13.Language-independent Model for Machine Translation Evaluation wit...MT SUMMIT13.Language-independent Model for Machine Translation Evaluation wit...
MT SUMMIT13.Language-independent Model for Machine Translation Evaluation wit...
 
Simple effective decipherment via combinatorial optimization
Simple effective decipherment via combinatorial optimizationSimple effective decipherment via combinatorial optimization
Simple effective decipherment via combinatorial optimization
 
feras_kalita_mcgrory_2015
feras_kalita_mcgrory_2015feras_kalita_mcgrory_2015
feras_kalita_mcgrory_2015
 
ijcai05_srl
ijcai05_srlijcai05_srl
ijcai05_srl
 
Algorithm of Dynamic Programming for Paper-Reviewer Assignment Problem
Algorithm of Dynamic Programming for Paper-Reviewer Assignment ProblemAlgorithm of Dynamic Programming for Paper-Reviewer Assignment Problem
Algorithm of Dynamic Programming for Paper-Reviewer Assignment Problem
 
A Survey of Various Methods for Text Summarization
A Survey of Various Methods for Text SummarizationA Survey of Various Methods for Text Summarization
A Survey of Various Methods for Text Summarization
 
Intent Classifier with Facebook fastText
Intent Classifier with Facebook fastTextIntent Classifier with Facebook fastText
Intent Classifier with Facebook fastText
 

Similar to Deep Reinforcement Learning with Distributional Semantic Rewards for Abstractive Summarization

Fast and Accurate Preordering for SMT using Neural Networks
Fast and Accurate Preordering for SMT using Neural NetworksFast and Accurate Preordering for SMT using Neural Networks
Fast and Accurate Preordering for SMT using Neural NetworksSDL
 
Assessing relative importance using rsp scoring to generate
Assessing relative importance using rsp scoring to generateAssessing relative importance using rsp scoring to generate
Assessing relative importance using rsp scoring to generateDaniel Koh
 
Assessing Relative Importance using RSP Scoring to Generate VIF
Assessing Relative Importance using RSP Scoring to Generate VIFAssessing Relative Importance using RSP Scoring to Generate VIF
Assessing Relative Importance using RSP Scoring to Generate VIFDaniel Koh
 
Audit report[rollno 49]
Audit report[rollno 49]Audit report[rollno 49]
Audit report[rollno 49]RAHULROHAM2
 
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESTHE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESkevig
 
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESTHE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESkevig
 
Implicit regularization of SGD in NLP
Implicit regularization of SGD in NLPImplicit regularization of SGD in NLP
Implicit regularization of SGD in NLPDeren Lei
 
Evolving Reinforcement Learning Algorithms, JD. Co-Reyes et al, 2021
Evolving Reinforcement Learning Algorithms, JD. Co-Reyes et al, 2021Evolving Reinforcement Learning Algorithms, JD. Co-Reyes et al, 2021
Evolving Reinforcement Learning Algorithms, JD. Co-Reyes et al, 2021Chris Ohk
 
COLING 2012 - LEPOR: A Robust Evaluation Metric for Machine Translation with ...
COLING 2012 - LEPOR: A Robust Evaluation Metric for Machine Translation with ...COLING 2012 - LEPOR: A Robust Evaluation Metric for Machine Translation with ...
COLING 2012 - LEPOR: A Robust Evaluation Metric for Machine Translation with ...Lifeng (Aaron) Han
 
[Emnlp] what is glo ve part ii - towards data science
[Emnlp] what is glo ve  part ii - towards data science[Emnlp] what is glo ve  part ii - towards data science
[Emnlp] what is glo ve part ii - towards data scienceNikhil Jaiswal
 
Automated Essay Scoring Using Generalized Latent Semantic Analysis
Automated Essay Scoring Using Generalized Latent Semantic AnalysisAutomated Essay Scoring Using Generalized Latent Semantic Analysis
Automated Essay Scoring Using Generalized Latent Semantic AnalysisGina Rizzo
 
Learning Collaborative Agents with Rule Guidance for Knowledge Graph Reasoning
Learning Collaborative Agents with Rule Guidance for Knowledge Graph ReasoningLearning Collaborative Agents with Rule Guidance for Knowledge Graph Reasoning
Learning Collaborative Agents with Rule Guidance for Knowledge Graph ReasoningDeren Lei
 
Automated evaluation of coherence in student essays.pdf
Automated evaluation of coherence in student essays.pdfAutomated evaluation of coherence in student essays.pdf
Automated evaluation of coherence in student essays.pdfSarah Marie
 
A Robust Method Based On LOVO Functions For Solving Least Squares Problems
A Robust Method Based On LOVO Functions For Solving Least Squares ProblemsA Robust Method Based On LOVO Functions For Solving Least Squares Problems
A Robust Method Based On LOVO Functions For Solving Least Squares ProblemsDawn Cook
 
result analysis for deep leakage from gradients
result analysis for deep leakage from gradientsresult analysis for deep leakage from gradients
result analysis for deep leakage from gradients國騰 丁
 

Similar to Deep Reinforcement Learning with Distributional Semantic Rewards for Abstractive Summarization (20)

Fast and Accurate Preordering for SMT using Neural Networks
Fast and Accurate Preordering for SMT using Neural NetworksFast and Accurate Preordering for SMT using Neural Networks
Fast and Accurate Preordering for SMT using Neural Networks
 
Assessing relative importance using rsp scoring to generate
Assessing relative importance using rsp scoring to generateAssessing relative importance using rsp scoring to generate
Assessing relative importance using rsp scoring to generate
 
Assessing Relative Importance using RSP Scoring to Generate VIF
Assessing Relative Importance using RSP Scoring to Generate VIFAssessing Relative Importance using RSP Scoring to Generate VIF
Assessing Relative Importance using RSP Scoring to Generate VIF
 
Audit report[rollno 49]
Audit report[rollno 49]Audit report[rollno 49]
Audit report[rollno 49]
 
ssc_icml13
ssc_icml13ssc_icml13
ssc_icml13
 
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESTHE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
 
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESTHE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
 
Text Summarization
Text SummarizationText Summarization
Text Summarization
 
Implicit regularization of SGD in NLP
Implicit regularization of SGD in NLPImplicit regularization of SGD in NLP
Implicit regularization of SGD in NLP
 
Evolving Reinforcement Learning Algorithms, JD. Co-Reyes et al, 2021
Evolving Reinforcement Learning Algorithms, JD. Co-Reyes et al, 2021Evolving Reinforcement Learning Algorithms, JD. Co-Reyes et al, 2021
Evolving Reinforcement Learning Algorithms, JD. Co-Reyes et al, 2021
 
COLING 2012 - LEPOR: A Robust Evaluation Metric for Machine Translation with ...
COLING 2012 - LEPOR: A Robust Evaluation Metric for Machine Translation with ...COLING 2012 - LEPOR: A Robust Evaluation Metric for Machine Translation with ...
COLING 2012 - LEPOR: A Robust Evaluation Metric for Machine Translation with ...
 
[Emnlp] what is glo ve part ii - towards data science
[Emnlp] what is glo ve  part ii - towards data science[Emnlp] what is glo ve  part ii - towards data science
[Emnlp] what is glo ve part ii - towards data science
 
Automated Essay Scoring Using Generalized Latent Semantic Analysis
Automated Essay Scoring Using Generalized Latent Semantic AnalysisAutomated Essay Scoring Using Generalized Latent Semantic Analysis
Automated Essay Scoring Using Generalized Latent Semantic Analysis
 
228-SE3001_2
228-SE3001_2228-SE3001_2
228-SE3001_2
 
call for papers, research paper publishing, where to publish research paper, ...
call for papers, research paper publishing, where to publish research paper, ...call for papers, research paper publishing, where to publish research paper, ...
call for papers, research paper publishing, where to publish research paper, ...
 
Learning Collaborative Agents with Rule Guidance for Knowledge Graph Reasoning
Learning Collaborative Agents with Rule Guidance for Knowledge Graph ReasoningLearning Collaborative Agents with Rule Guidance for Knowledge Graph Reasoning
Learning Collaborative Agents with Rule Guidance for Knowledge Graph Reasoning
 
J017256674
J017256674J017256674
J017256674
 
Automated evaluation of coherence in student essays.pdf
Automated evaluation of coherence in student essays.pdfAutomated evaluation of coherence in student essays.pdf
Automated evaluation of coherence in student essays.pdf
 
A Robust Method Based On LOVO Functions For Solving Least Squares Problems
A Robust Method Based On LOVO Functions For Solving Least Squares ProblemsA Robust Method Based On LOVO Functions For Solving Least Squares Problems
A Robust Method Based On LOVO Functions For Solving Least Squares Problems
 
result analysis for deep leakage from gradients
result analysis for deep leakage from gradientsresult analysis for deep leakage from gradients
result analysis for deep leakage from gradients
 

Recently uploaded

All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxAleenaTreesaSaji
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxSwapnil Therkar
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfSwapnil Therkar
 
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxpradhanghanshyam7136
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PPRINCE C P
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 sciencefloriejanemacaya1
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfSELF-EXPLANATORY
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...jana861314
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |aasikanpl
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PPRINCE C P
 
Luciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxLuciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxAleenaTreesaSaji
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxyaramohamed343013
 

Recently uploaded (20)

All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptx
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
 
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptx
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 science
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C P
 
Luciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxLuciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptx
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docx
 

Deep Reinforcement Learning with Distributional Semantic Rewards for Abstractive Summarization

  • 1. Deep Reinforcement Learning with Distributional Semantic Rewards for Abstractive Summarization Siyao Li1∗ , Deren Lei1∗ , Pengda Qin2 , William Yang Wang1 1 University of California, Santa Barbara 2 Beijing University of Posts and Telecommunications {siyaoli, derenlei}@ucsb.edu, qinpengda@bupt.edu.cn, william@cs.ucsb.edu Abstract Deep reinforcement learning (RL) has been a commonly-used strategy for the abstractive summarization task to address both the expo- sure bias and non-differentiable task issues. However, the conventional reward ROUGE-L simply looks for exact n-grams matches be- tween candidates and annotated references, which inevitably makes the generated sen- tences repetitive and incoherent. In this paper, instead of ROUGE-L, we explore the practica- bility of utilizing the distributional semantics to measure the matching degrees. With dis- tributional semantics, sentence-level evalua- tion can be obtained, and semantically-correct phrases can also be generated without be- ing limited to the surface form of the refer- ence sentences. Human judgments on Giga- word and CNN/Daily Mail datasets show that our proposed distributional semantics reward (DSR) has distinct superiority in capturing the lexical and compositional diversity of natural language. 1 Introduction Abstractive summarization is a task of paraphras- ing a long article with fewer words. Unlike extrac- tive summarization, abstractive summaries can in- clude tokens out of the article’s vocabulary. There exists several encoder-decoder approaches such as attention-based architecture (Bahdanau et al., 2015; Rush et al., 2015; Nallapati et al., 2016; Chopra et al., 2016), incorporating graph tech- niques (Moawad and Aref, 2012; Ganesan et al., 2010), and adding pointer generator (Vinyals et al., 2015; Nallapati et al., 2016; See et al., 2017; Bello et al., 2017). However, long sentence gener- ation suffers from exposure bias (Bahdanau et al., 2017) as the error accumulates during the decod- ing process. ∗ Equal contributions. Many innovative deep RL methods (Ranzato et al., 2016; Wu et al., 2016; Paulus et al., 2018; Lamb et al., 2016) are developed to alleviate this issue by providing sentence-level feedback after generating a complete sentence, in addition to op- timal transport usage (Napoles et al., 2012). How- ever, commonly used automatic evaluation metrics for generating sentence-level rewards count exact n-grams matches and are not robust to different words that share similar meanings since the se- mantic level reward is deficient. Currently, many studies on contextualized word representations (Peters et al., 2018; Devlin et al., 2019) prove that they have a powerful capacity of reflecting distributional semantic. In this pa- per, we propose to use the distributional seman- tic reward to boost the RL-based abstractive sum- marization system. Moreover, we design several novel objective functions. Experiment results show that they outperform the conventional objectives while increasing the sentence fluency. Our main contributions are three-fold: • We are the first to introduce DSR to abstractive summarization and achieve better results than con- ventional rewards. • Unlike ROUGE, our DSR does not rely on cross- entropy loss (XENT) to produce readable phrases. Thus, no exposure bias is introduced. • DSR improves generated tokens’ diversity and fluency while avoiding unnecessary repetitions. 
2 Methodology Background While sequence models are usu- ally trained using XENT, they are typically eval- uated at test time using discrete NLP metrics such as BLEU (Papineni et al., 2002), ROUGE (Lin, 2004), METEOR (Banerjee and Lavie, 2005). Therefore, they suffer from both the exposure bias arXiv:1909.00141v1[cs.CL]31Aug2019
  • 2. and non-differentiable task metric issues. To solve these problems, many resorts to deep RL with se- quence to sequence model (Paulus et al., 2018; Ranzato et al., 2016; Ryang and Abekawa, 2012), where the learning agent interacts with a given en- vironment. However, RL models have poor sam- ple efficiency and lead to very slow convergence rate. Therefore, RL methods usually start from a pretrained policy, which is established by optimiz- ing XENT at each word generation step. LXENT = − n t=1 log P(yt|y1, . . . , yt−1, x). (1) Then, during RL stage, the conventional way is to adopt self-critical strategy to fine-tune based on the target evaluation metric, LRL = n t=1 log P( ˆyt| ˆy1, . . . , ˆyt−1, x) (2) × (rmetric(yb ) − rmetric(ˆy)) Distributional Semantic Reward During eval- uating the quality of the generated sentences, ROUGE looks for exact matches between refer- ences and generations, which naturally overlooks the expression diversity of the natural language. In other words, it fails to capture the semantic rela- tion between similar words. To solve this problem, distributional semantic representations are a prac- tical way. Recent works on contextualized word representations, including ELMO (Peters et al., 2018), GPT (Radford et al., 2018), BERT (Devlin et al., 2019), prove that distributional semantics can be captured effectively. Based on that, a recent study, called BERTSCORE (Zhang et al., 2019), focuses on sentence-level generation evaluation by using pre-trained BERT contextualized embed- dings to compute the similarity between two sen- tences as a weighted aggregation of cosine simi- larities between their tokens. It has a higher cor- relation with human evaluation on text generation tasks comparing to existing evaluation metrics. In this paper, we introduce it as a DSR for deep RL. The BERTSCORE is defined as: RBERT = yi∈y idf(yi) maxˆyj∈ˆy yi ˆyj yi∈y idf(yi) (3) PBERT = yj∈y idf(yj) maxˆyi∈ˆy yj ˆyi yj∈y idf(yj) (4) FBERT = 2 RBERT · PBERT RBERT + PBERT (5) where y and ˆy represent BERT contextual embed- dings of reference word y and candidate word ˆy, respectively. The function idf(·) calculates inverse document frequency (idf). In our DSR, we do not use the idf since Zhang et al. (2019) requires to use the entire dataset including test set for calcula- tion. Besides, ROUGE do not use similar weight, so we do not include idf for consistency. 3 Experimental Setup 3.1 Datasets Gigaword corpus It is an English sentence sum- marization dataset based on annotated Gigaword (Napoles et al., 2012). A single sentence sum- marization is paired with a short article. We use the OpenNMT provided version It contains 3.8M training, 189k development instances. We ran- domly sample 15k instances as our test data. CNN/Daily Mail dataset It consists of on- line news articles and their corresponding multi- sentence abstracts (Hermann et al., 2015; Nallap- ati et al., 2016). We use the non-anonymized ver- sion provided by See et al. (2017), which contains 287k training, 13k validation, and 11k testing ex- amples. We truncate the articles to 400 tokens and limit the summary lengths to 100 tokens. 3.2 Pretrain We first pretrain a sequence-to-sequence model with attention using XENT and then select the best parameters to initialize models for RL. Our mod- els have 256-dimensional hidden states and 128- dimensional word embeddings and also incorpo- rates the pointer mechanism (See et al., 2017) for handling out of vocabulary words. 
3.3 Baseline In abstractive summarization, ROUGE (Lin, 2004) is a common evaluation metric to provide a sentence-level reward for RL. However, using ROUGE as a pure RL objective may cause too many repetitions and reduced fluency in outputs. Paulus et al. (2018) propose a hybrid learning objective that combines XENT and self-critical ROUGE reward (Paulus et al., 2018). Lbaseline = γLrouge + (1 − γ)LXENT (6)
  • 3. where γ is a scaling factor and the F score of ROUGE-L is used as the reward to calcu- late LRouge. In our experiment, we select γ = 0.998 for Gigaword Corpus and γ = 0.9984 for CNN/Daily Mail dataset. 1 Note that we do not avoid repetition during the test time as Paulus et al. (2018) do, because we want to examine the repe- tition of sentence directly produced after training. 3.4 Proposed Objective Functions Inspired by the above objective function (Paulus et al., 2018), we optimize RL models with a simi- lar loss function as equation 2. Instead of ROUGE- L, we incorporate BERTSCORE, a DSR to pro- vide sentence-level feedback. In our experiment, LDSR is the self-critical RL loss (equation 2) with FBERT as the reward. We introduce the following objective functions: DSR+ROUGE: A combined reward function of ROUGE and FBERT L1 = γLDSR + (1 − γ)Lrouge (7) In our experiment, we select γ = 0.5 for both datasets to balance the influence of two reward functions. DSR+XENT: FBERT reward with XENT to make the generated phrases more readable. L2 = γ LDSR + (1 − γ )LXENT (8) In our experiment, we select γ = 0.998 for Gi- gaword Corpus and γ = 0.9984 for CNN/Daily Mail dataset. DSR: Pure FBERT objective function without any teacher forcing. L3 = LDSR (9) 4 Results For the abstractive summarization task, we test our models with different objectives on the Giga- word and CNN/Daily Mail datasets. We choose FBERT and ROUGE-L as our automatic evalua- tion metrics. For multi-sentence summaries in 1 γ is very close to 1 due to the difference in the scales of the two loss functions. For Gigaword Corpus, we tune the γ on the development set. For CNN/Daily Mail, we use the same γ as Paulus et al. (2018) does. 85000 87500 90000 92500 95000 97500 100000 102500 Training Steps 0.60 0.62 0.64 0.66 F-BERT DSR ROUGE DSR+XENT ROUGE+XENT DSR+ROUGE 85000 87500 90000 92500 95000 97500 100000 102500 Training Steps 0.32 0.34 0.36 0.38 0.40 0.42 0.44 ROUGE DSR ROUGE DSR+XENT ROUGE+XENT DSR+ROUGE Figure 1: FBERT (left) and ROUGE (right) on Giga- word Corpus during training time after pretraining with XENT. Note that ROUGE is not our target evaluation metrics. CNN/Daily Mail dataset, we calculate ROUGE- L and FBERT after concatenating the original ab- stract sentences or their BERT contextualized em- beddings. The results are shown in Table 1. The model DSR+ROUGE (equation 7) achieves the best ROUGE performance on Gigaword and sim- ilar performance comparing to ROUGE model on CNN/Daily Mail. Training with DSR (equation 9) and DSR+XENT (equation 8) can obtain the best BERTScores as expected. It is also expected that ROUGE model will obtain worse BERTScores as simply optimizing ROUGE will generate less read- able sentences (Paulus et al., 2018); however, DSR model without XENT as a teacher forcing can im- prove the performance of pretrained model in both FBERT and ROUGE-L scale. Note that DSR model’s ROUGE-L is high in training time but does not have a good general- ization on test set, and ROUGE-L is not our tar- get evaluation metrics. In the next section, we will do human evaluation to analyze the summarization performance of different reward systems. 5 Human Evaluation We perform human evaluation on the Amazon Me- chanical Turk to assure the benefits of DSR on output sentences’ coherence and fluency. We ran- domly sample 500 items as an evaluation set us-
During the evaluation, judges are shown the context without the ground-truth summary and asked to decide two things between the generations of ROUGE+XENT and DSR+XENT: 1. which summary is more relevant to the article, and 2. which summary is more fluent. Ties are permitted. A similar test is also done between ROUGE+XENT and the DSR model.

                             Gigaword                CNN/Daily Mail
Task       Model v. base     Win    Lose   Tie       Win    Lose   Tie
Relevance  DSR+XENT          31.0%  25.2%  43.8%     51.1%  34.1%  14.8%
           DSR               45.4%  27.2%  27.4%     48.7%  38.5%  12.8%
Fluency    DSR+XENT          40.2%  19.6%  40.2%     55.8%  28.5%  15.6%
           DSR               45.4%  28.8%  25.8%     54.0%  31.7%  14.3%

Table 2: Human evaluation results on Gigaword and CNN/Daily Mail for abstractive summarization. ROUGE+XENT is the selected baseline model.

As shown in Table 2, the DSR and DSR+XENT models significantly improve the relevance and fluency of the generated summaries. In addition, pure DSR achieves better performance on Gigaword Corpus and results comparable to DSR+XENT on CNN/Daily Mail. While Paulus et al.'s (2018) objective function requires XENT to make the generated sentences readable, our proposed DSR does not require XENT and therefore limits the exposure bias originating from it.

6 Analysis

Diversity. Unlike extractive summarization, abstractive summarization allows more freedom in the choice of words. While simply selecting words from the article makes the task easier to train, a larger action space provides more paths to potentially better results (Nema et al., 2017). Using the DSR as the deep RL reward encourages models to choose actions that are not n-grams of the article. In Table 4, we list a few generated samples on Gigaword Corpus. In the first example, the word "sino-german" provides an interesting and efficient way to express the relation between China and Germany, and F_BERT is also improved by this change. The second example shows that the RL model with DSR corrects the sentence's grammar and significantly improves the F_BERT score by switching "down" to the unseen word "drops". On the other hand, when DSR is optimized to improve the diversity of generation, semantically similar words may also be generated and harm the summarization quality, as shown in the third example: the new token "wins" reduces the scores of both metrics. We also evaluate the diversity of a model quantitatively by averaging the percentage of out-of-article n-grams in the generated sentences. Results can be found in Table 3.

               Gigaword              CNN/Daily Mail
Model          Rep(%)   Div(%)       Rep(%)   Div(%)
ROUGE+XENT     11.57    16.33        66.82    21.32
DSR+ROUGE      18.42    15.97        50.41    34.70
DSR+XENT       20.15    16.24        25.98    31.80
DSR             7.20    19.17        22.03    41.35

Table 3: Qualitative analysis of repetition (Rep) and diversity (Div), calculated as the percentage of repeated / out-of-article n-grams (unigrams for Gigaword and 5-grams for CNN/Daily Mail) in the generated sentences.
The DSR model achieves the highest diversity.
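The repetition and diversity statistics in Table 3 can be approximated with straightforward n-gram counting. The sketch below reflects our reading of those definitions (the share of n-grams repeated within a generated summary, and the share of generated n-grams that never appear in the source article); it is not the authors' evaluation script, and the whitespace tokenization and helper names are our own assumptions.

```python
# Approximate computation of the Rep/Div statistics in Table 3, assuming
# whitespace-tokenized text; n = 1 for Gigaword, n = 5 for CNN/Daily Mail.
def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def repetition_rate(summary, n):
    # Percentage of n-grams in the generated summary that already occurred
    # earlier in the same summary.
    grams = ngrams(summary.split(), n)
    if not grams:
        return 0.0
    seen, repeated = set(), 0
    for g in grams:
        if g in seen:
            repeated += 1
        seen.add(g)
    return 100.0 * repeated / len(grams)


def diversity_rate(summary, article, n):
    # Percentage of generated n-grams that never appear in the source article.
    summary_grams = ngrams(summary.split(), n)
    if not summary_grams:
        return 0.0
    article_grams = set(ngrams(article.split(), n))
    novel = sum(1 for g in summary_grams if g not in article_grams)
    return 100.0 * novel / len(summary_grams)
```

Corpus-level Rep and Div would then be the averages of these per-summary percentages over all generated outputs.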
1) Context: economic links between china and germany will be at the center of talks here next month between li peng and chancellor helmut kohl when the chinese premier makes a week-long visit to germany , sources on both sides said .
Ground truth: li peng 's visit to germany to focus on economic ties
ROUGE+XENT: chinese premier to visit germany next month {F_BERT: 49.48, ROUGE: 18.64}
DSR+XENT: chinese premier to visit germany next month {F_BERT: 49.48, ROUGE: 18.64}
DSR: sino-german economic links to be at center of talks with germany {F_BERT: 50.77, ROUGE: 17.33}

2) Context: the sensitive index dropped by #.## percent friday on the bombay stock exchange -lrb- bse -rrb- following brisk inquiries from operators that generally pressed sales to square up positions on the last day of the current settlement .
Ground truth: sensex falls in bombay stock exchange
ROUGE+XENT: sensitive index down on bombay stock exchange {F_BERT: 70.03, ROUGE: 45.62}
DSR+XENT: sensitive index drops on bombay stock exchange {F_BERT: 73.92, ROUGE: 45.62}
DSR: sensitive index drops on bombay stock exchange {F_BERT: 73.92, ROUGE: 45.62}

3) Context: belgium 's rik verbrugghe took victory in tuesday 's prologue stage of the tour de romandie , with a confident ride through the streets of geneva 's old town .
Ground truth: verbrugghe takes prologue victory in tour de romandie
ROUGE+XENT: verbrugghe takes victory in prologue stage of tour de romandie {F_BERT: 91.41, ROUGE: 75.93}
DSR+XENT: verbrugghe takes victory in tour de romandie prologue stage {F_BERT: 91.56, ROUGE: 81.79}
DSR: verbrugghe wins victory in tour de romandie prologue stage {F_BERT: 89.45, ROUGE: 70.10}

4) Context: european finance ministers on saturday urged swedes to vote " yes " to adopting the euro , saying entry into the currency bloc would be good for sweden and the rest of europe .
Ground truth: european finance ministers urge swedes to vote yes to euro
ROUGE+XENT: eu finance ministers urge swedes to vote yes to euro {F_BERT: 96.63, ROUGE: 93.47}
DSR+XENT: eu finance ministers urge swedes to vote yes to adopting euro {F_BERT: 90.38, ROUGE: 88.89}
DSR: eu finance ministers urge swedes to vote yes to euro euro {F_BERT: 95.05, ROUGE: 93.47}

Table 4: Qualitative analysis of generated samples on the Gigaword corpus. Generated words that do not appear in the context are marked blue; repeated words are marked red. The first two examples show that DSR's generated tokens are more diverse; however, DSR may also suffer from the problems shown in examples 3 and 4.

Repetition. Repetition leads to a lower F_BERT, as shown in the last example in Table 4, and using DSR reduces the probability of producing repetitions. The average percentages of repeated n-grams in the generated sentences are presented in Table 3. As the table shows, unlike ROUGE, the DSR model can achieve high fluency without XENT; moreover, it produces the fewest repetitions among all the rewards. Although Table 4 gives an example in which DSR produces a repeated word (example 4), this does not reflect the overall distribution of repeated-word generation across the evaluated models.

7 Conclusion

This paper demonstrates the effectiveness of applying a distributional semantic reward, specifically BERTSCORE, to reinforcement learning for abstractive summarization. Our experimental results show that we achieve better performance on the Gigaword and CNN/Daily Mail datasets. In addition, the generated sentences contain fewer repetitions, and their fluency is improved.
Our finding aligns with a contemporaneous study (Wieting et al., 2019) on leveraging semantic similarity for machine translation.

References

Dzmitry Bahdanau, Philemon Brakel, Kelvin Xu, Anirudh Goyal, Ryan Lowe, Joelle Pineau, Aaron C. Courville, and Yoshua Bengio. 2017. An actor-critic algorithm for sequence prediction. In 5th International Conference on Learning Representations, Conference Track Proceedings.

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In 3rd International Conference on Learning Representations, Conference Track Proceedings.

Satanjeev Banerjee and Alon Lavie. 2005. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, pages 65–72.

Irwan Bello, Hieu Pham, Quoc V. Le, Mohammad Norouzi, and Samy Bengio. 2017. Neural combinatorial optimization with reinforcement learning. In 5th International Conference on Learning Representations, Workshop Track Proceedings.

Sumit Chopra, Michael Auli, and Alexander M. Rush. 2016. Abstractive sentence summarization with attentive recurrent neural networks. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 93–98.
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186.

Kavita Ganesan, ChengXiang Zhai, and Jiawei Han. 2010. Opinosis: A graph based approach to abstractive summarization of highly redundant opinions. In Proceedings of the 23rd International Conference on Computational Linguistics, pages 340–348.

Karl Moritz Hermann, Tomas Kocisky, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. 2015. Teaching machines to read and comprehend. In Advances in Neural Information Processing Systems, pages 1693–1701.

Alex M. Lamb, Anirudh Goyal Alias Parth Goyal, Ying Zhang, Saizheng Zhang, Aaron C. Courville, and Yoshua Bengio. 2016. Professor forcing: A new algorithm for training recurrent networks. In Advances in Neural Information Processing Systems, pages 4601–4609.

Chin-Yew Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out.

Ibrahim F. Moawad and Mostafa Aref. 2012. Semantic graph reduction approach for abstractive text summarization. In 2012 Seventh International Conference on Computer Engineering & Systems (ICCES), pages 132–138. IEEE.

Ramesh Nallapati, Bowen Zhou, Cicero dos Santos, Çağlar Gülçehre, and Bing Xiang. 2016. Abstractive text summarization using sequence-to-sequence RNNs and beyond. CoNLL 2016, page 280.

Courtney Napoles, Matthew Gormley, and Benjamin Van Durme. 2012. Annotated Gigaword. In Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction, pages 95–100. Association for Computational Linguistics.

Preksha Nema, Mitesh M. Khapra, Anirban Laha, and Balaraman Ravindran. 2017. Diversity driven attention model for query-based abstractive summarization. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1063–1072.

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318. Association for Computational Linguistics.

Romain Paulus, Caiming Xiong, and Richard Socher. 2018. A deep reinforced model for abstractive summarization. In 6th International Conference on Learning Representations, Conference Track Proceedings.

Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT, pages 2227–2237.

Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving language understanding by generative pre-training.

Marc'Aurelio Ranzato, Sumit Chopra, Michael Auli, and Wojciech Zaremba. 2016. Sequence level training with recurrent neural networks. In 4th International Conference on Learning Representations, Conference Track Proceedings.

Alexander M. Rush, Sumit Chopra, and Jason Weston. 2015. A neural attention model for abstractive sentence summarization. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 379–389.
Seonggi Ryang and Takeshi Abekawa. 2012. Framework of automatic text summarization using reinforcement learning. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 256–265. Association for Computational Linguistics.

Abigail See, Peter J. Liu, and Christopher D. Manning. 2017. Get to the point: Summarization with pointer-generator networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1073–1083.

Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. 2015. Pointer networks. In Advances in Neural Information Processing Systems, pages 2692–2700.

John Wieting, Taylor Berg-Kirkpatrick, Kevin Gimpel, and Graham Neubig. 2019. Beyond BLEU: Training neural machine translation with semantic similarity. In Proceedings of the 57th Conference of the Association for Computational Linguistics, pages 4344–4355.

Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, et al. 2016. Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144.
Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, and Yoav Artzi. 2019. BERTScore: Evaluating text generation with BERT. arXiv preprint arXiv:1904.09675.