Abstractive text summarization is a complex task whose goal is to generate a concise version of a text without necessarily reusing the sentences from the original source, while still preserving the meaning and the key contents. In this position paper we address this issue by modeling the problem as sequence-to-sequence learning and exploiting Recurrent Neural Networks (RNNs). Moreover, we discuss the idea of combining RNNs and probabilistic models in a unified way in order to incorporate prior knowledge, such as linguistic features. We believe that this approach can achieve better performance than state-of-the-art models at generating well-formed summaries.
Improving Neural Abstractive Text Summarization with Prior Knowledge
1. Improving Neural Abstractive Text Summarization with Prior Knowledge
Gaetano Rossiello, Pierpaolo Basile, Giovanni Semeraro, Marco Di Ciano and
Gaetano Grasso
gaetano.rossiello@uniba.it
Department of Computer Science
University of Bari - Aldo Moro, Italy
URANIA 16 - 1st Italian Workshop on Deep Understanding and Reasoning:
A challenge for Next-generation Intelligent Agents
28 November 2016
AI*IA 16 - Genoa, Italy
2. Text Summarization
The goal of summarization is to produce a shorter version of a source
text by preserving the meaning and the key contents of the original.
A well written summary can significantly reduce the amount of
cognitive work needed to digest large amounts of text.
3. Information Overload
Information overload is a problem in modern digital society caused by the explosion of the amount of information produced both on the World Wide Web and in enterprise environments.
4. Text Summarization - Approaches
Input: single-document or multi-document
Output: extractive, abstract, or headline
Extractive Summarization
The generated summary is a selection of relevant sentences from the
source text in a copy-paste fashion.
Abstractive Summarization
The generated summary is a new cohesive text not necessarily present
in the original source.
5. Extractive Summarization - Methods
Statistical methods:
  Feature-based
  Machine Learning
  Fuzzy Logic
  Graph-based
Distributional semantics (LSA sketched below):
  LSA (Latent Semantic Analysis)
  NMF (Non-Negative Matrix Factorization)
  Word2Vec
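To make the distributional-semantics family concrete, here is a minimal sketch of LSA-based extractive summarization; this is our own illustrative assumption, not the method of any cited paper. Sentences are projected into a latent topic space via truncated SVD and the most salient ones are copied out verbatim; sentence splitting is deliberately naive.

```python
# Minimal LSA extractive summarizer (illustrative sketch, assumes scikit-learn).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

def lsa_summarize(text, n_sentences=2, n_topics=2):
    sentences = [s.strip() for s in text.split('.') if s.strip()]
    tfidf = TfidfVectorizer().fit_transform(sentences)  # sentence-term matrix
    svd = TruncatedSVD(n_components=min(n_topics, tfidf.shape[1] - 1))
    topic_space = svd.fit_transform(tfidf)              # sentences in latent space
    scores = np.linalg.norm(topic_space, axis=1)        # salience = vector length
    top = sorted(np.argsort(scores)[-n_sentences:])     # keep original order
    return '. '.join(sentences[i] for i in top) + '.'
```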
6. Abstractive Summarization: a Challenging Task
Abstractive summarization requires deep understanding of and reasoning over the text: determining the explicit or implicit meaning of each element, such as words, phrases, sentences and paragraphs, and making inferences about their properties (Norvig, P.: Inference in Text Understanding. AAAI 1987) in order to generate the new sentences which compose the summary.
– Abstractive Example –
Original: Russian defense minister Ivanov called Sunday for the
creation of a joint front for combating global terrorism.
Summary: Russia calls for joint front against terrorism.
7. Deep Learning for Abstractive Text Summarization
Idea
Casting the summarization task as a neural machine translation problem: models trained on large amounts of data learn the alignments between the input text and the target summary through an attention-based encoder-decoder paradigm.
Rush, A., et al. A neural attention model for abstractive
sentence summarization. EMNLP 2015
Nallapati, R., et al. Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond. CoNLL 2016
Chopra, S., et al. Abstractive Sentence Summarization
with Attentive Recurrent Neural Networks. NAACL 2016
8. Deep Learning for Abstractive Text Summarization
[Figure from Rush, A., et al.: A neural attention model for abstractive sentence summarization. EMNLP 2015]
9. Abstractive Summarization - Problem Formulation
Let us consider:
an original text $x = \{x_1, x_2, \dots, x_n\}$
a summary $y = \{y_1, y_2, \dots, y_m\}$
where $n \gg m$ and $x_i, y_j \in V$ ($V$ is the vocabulary).

A probabilistic perspective
The summarization problem consists in finding an output sequence $y$ that maximizes the conditional probability of $y$ given an input sequence $x$:

$\hat{y} = \arg\max_{y} P(y \mid x)$

$P(y \mid x) = P(y \mid x; \theta) = \prod_{t=1}^{|y|} P(y_t \mid \{y_1, \dots, y_{t-1}\}, x; \theta)$

where $\theta$ denotes a set of parameters learnt from a training set of source text and target summary pairs.
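As a minimal illustration of this factorization, the sketch below accumulates the per-token log-probabilities; model_step is a hypothetical stand-in for the trained model, not an API from any of the cited works.

```python
import math

def sequence_log_prob(x, y, model_step):
    """log P(y|x; theta), factorized over time steps.

    model_step(prefix, x) is a hypothetical stand-in: it returns the
    distribution P(. | y_1..y_{t-1}, x; theta) over the vocabulary.
    """
    log_p = 0.0
    for t in range(len(y)):
        p_t = model_step(y[:t], x)[y[t]]  # probability of the gold token y_t
        log_p += math.log(p_t)            # sum of logs = log of the product
    return log_p
```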
10. Recurrent Neural Networks
A recurrent neural network (RNN) is a neural network model proposed in the 1980s for modelling time series.
Its structure is similar to that of a feedforward neural network, with the distinction that it allows a recurrent hidden state whose activation at each time step depends on that of the previous step (a cycle).
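A minimal numpy sketch of this recurrence, assuming pre-initialized weight matrices W_xh, W_hh and bias b_h (the names are illustrative):

```python
import numpy as np

def elman_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One Elman RNN step: the new hidden state depends on the current
    input x_t and on the previous hidden state h_prev (the recurrence)."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

def run_rnn(xs, h0, W_xh, W_hh, b_h):
    """Unroll over a sequence; the same weights are reused at every step."""
    h = h0
    for x_t in xs:
        h = elman_step(x_t, h, W_xh, W_hh, b_h)
    return h  # the final state summarizes the whole sequence
```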
11. Sequence to Sequence Learning
Sequence to sequence learning problem can be modeled by
RNNs using a encoder-decoder paradigm.
The encoder is a RNN that reads one token at time from the
input source and returns a fixed-size vector representing the
input text.
The decoder is another RNN that generates words for the
summary and it is conditioned by the vector representation
returned by the first network.
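A skeleton of this paradigm, with enc_step, dec_step and output_dist as hypothetical stand-ins for the recurrent cells and the output layer (greedy decoding, fixed length, no beam search, for simplicity):

```python
import numpy as np

def encode(xs, h0, enc_step):
    """Encoder: read one token at a time; the final hidden state is the
    fixed-size context vector c representing the whole input text."""
    h = h0
    for x_t in xs:
        h = enc_step(x_t, h)
    return h  # context vector c

def decode(c, h0, dec_step, output_dist, start_token, max_len):
    """Decoder: generate the summary token by token, conditioned on c."""
    h, y_prev, summary = h0, start_token, []
    for _ in range(max_len):
        h = dec_step(y_prev, h, c)                   # h_t = g(y_{t-1}, h_{t-1}, c)
        y_prev = int(np.argmax(output_dist(h, c)))   # greedy choice of y_t
        summary.append(y_prev)
    return summary
```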
12. Abstractive Summarization and Sequence to Sequence
$P(y_t \mid \{y_1, \dots, y_{t-1}\}, x; \theta) = g_\theta(h_t, c)$
where:
$h_t = g_\theta(y_{t-1}, h_{t-1}, c)$
The context vector $c$ is the output of the encoder and encodes the representation of the whole input source.
$g_\theta$ is an RNN cell and can be modeled using:
Elman RNN
LSTM (Long Short-Term Memory)
GRU (Gated Recurrent Unit), sketched below
At time $t$ the decoder RNN computes the probability of the word $y_t$ given the last hidden state $h_t$ and the input context $c$.
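As a concrete instance of one such cell, here is a minimal numpy sketch of a GRU step; the parameter names are illustrative and biases are omitted for brevity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, p):
    """One GRU step (one possible choice for g_theta); p holds the weights."""
    z = sigmoid(p['W_z'] @ x_t + p['U_z'] @ h_prev)        # update gate
    r = sigmoid(p['W_r'] @ x_t + p['U_r'] @ h_prev)        # reset gate
    h_tilde = np.tanh(p['W_h'] @ x_t + p['U_h'] @ (r * h_prev))
    return (1 - z) * h_prev + z * h_tilde                  # interpolate old/new
```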
13. Limits of the State-of-the-Art Neural Models
The proposed neural attention-based models for abstractive summarization are still at an early stage, and thus show some limitations:
Problems in distinguishing rare and unknown words
Grammar errors in the generated summaries

– Example –
Suppose that neither of the two tokens 10 and Genoa belongs to the vocabulary; then the model cannot distinguish between the probabilities of the two sentences:
The airport is about 10 kilometers.
The airport is about Genoa kilometers.
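The issue is easy to reproduce: once out-of-vocabulary tokens are mapped to a shared <unk> symbol, the two sentences become literally identical to the model. A minimal sketch (the tiny vocabulary is a toy assumption):

```python
# Both sentences collapse to the same token sequence after <unk> mapping,
# so any model over this vocabulary assigns them the same probability.
vocab = {"the", "airport", "is", "about", "kilometers", "<unk>"}

def to_tokens(sentence):
    return [w if w in vocab else "<unk>" for w in sentence.lower().split()]

s1 = to_tokens("The airport is about 10 kilometers")
s2 = to_tokens("The airport is about Genoa kilometers")
assert s1 == s2  # indistinguishable to the model
```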
14. Infuse Prior Knowledge into Neural Networks
Our Idea
Infuse prior knowledge, such as linguistic features, into RNNs in order to overcome these limits.
Motivation:
The/DT airport/NN is/VBZ about/IN ?/CD kilometers/NNS
where CD is the Part-of-Speech (POS) tag that identifies a cardinal
number.
Thus, 10 is the unknown token with the highest probability, because it is tagged as CD.
By introducing information about the syntactic role of each word, the neural network can learn the correct placement of words belonging to a certain part-of-speech class.
15. Infuse Prior Knowledge into Neural Networks
Preliminary approach:
Combine hand-crafted linguistic features and word embeddings as input vectors for the RNNs (sketched below).
Replace the softmax layer of the neural network with a log-linear model.
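A minimal sketch of the first point: the RNN input at each time step is the word embedding concatenated with a one-hot POS feature, so the network sees both distributional and syntactic evidence. Dimensions and names are illustrative assumptions.

```python
import numpy as np

EMB_DIM, N_POS = 300, 45  # e.g. 300-d embeddings, 45 Penn Treebank POS tags

def input_vector(word_embedding, pos_tag_id):
    """Concatenate the word embedding with a one-hot POS feature."""
    pos_onehot = np.zeros(N_POS)
    pos_onehot[pos_tag_id] = 1.0
    return np.concatenate([word_embedding, pos_onehot])  # shape (345,)

x_t = input_vector(np.random.randn(EMB_DIM), pos_tag_id=3)  # e.g. the CD tag
```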
16. Evaluation Plan - Dataset
We plan to evaluate our models on gold-standard datasets for the
summarization task:
DUC (Document Understanding Conference) 2002-2007: http://duc.nist.gov/
TAC (Text Analysis Conference) 2008-2011: http://tac.nist.gov/data/index.html
Gigaword: https://catalog.ldc.upenn.edu/LDC2012T21
CNN/DailyMail: https://github.com/deepmind/rc-data
Cornell University Library: https://arxiv.org/
Local government documents, made available by InnovaPuglia S.p.A.
17. Evaluation Plan - Metric
ROUGE (Recall-Oriented Understudy for Gisting Evaluation)
ROUGE metrics (Lin, Chin-Yew: ROUGE: A Package for Automatic Evaluation of Summaries. WAS 2004) compare an automatically produced summary against one or more human-produced reference summaries.
ROUGE-N: n-gram based co-occurrence statistics.
ROUGE-L: Longest Common Subsequence (LCS) based statistics.

$\mathrm{ROUGE}_N(X) = \dfrac{\sum_{S \in \mathrm{RefSummaries}} \sum_{gram_n \in S} \mathrm{count}_{match}(gram_n, X)}{\sum_{S \in \mathrm{RefSummaries}} \sum_{gram_n \in S} \mathrm{count}(gram_n)}$
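A minimal sketch of ROUGE-N recall following the formula above, with clipped n-gram matches as in Lin (2004):

```python
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, references, n=1):
    """Matched reference n-grams over total reference n-grams (recall)."""
    cand = ngrams(candidate, n)
    matched = total = 0
    for ref in references:
        ref_counts = ngrams(ref, n)
        total += sum(ref_counts.values())
        matched += sum(min(c, cand.get(g, 0)) for g, c in ref_counts.items())
    return matched / total if total else 0.0

# Example:
# rouge_n("russia calls for joint front".split(),
#         ["russia calls for joint front against terrorism".split()])
```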
18. Future Work
Evaluate the proposed approach by comparing it with state-of-the-art models.
Integrate relational semantic knowledge into RNNs in order to jointly learn word and knowledge embeddings by exploiting knowledge bases and lexical thesauri.
Generate abstractive summaries from whole documents or from multiple documents.