Automatic summarisation in the Information Age

Constantin Orăsan
Research Group in Computational Linguistics
Research Institute in Information and Language Processing
University of Wolverhampton
http://www.wlv.ac.uk/~in6093/
http://www.summarizationonline.info

12th Sept 2009
Structure of the course

1 Introduction to automatic summarisation
     What is a summary?
     What is automatic summarisation
     Context factors
     Evaluation
         General information about evaluation
         Direct evaluation
         Target-based evaluation
         Task-based evaluation
         Automatic evaluation
         Evaluation conferences

2 Important methods in automatic summarisation

3 Automatic summarisation and the Internet
What is a summary?

Examples:
• Abstract of a scientific paper. Source: (Sparck Jones, 2007)
• Summary of a news event. Source: Google News http://news.google.com
• Summary of a web page. Source: Bing http://www.bing.com
• Summary of financial news. Source: Yahoo! Finance http://finance.yahoo.com/
• Maps. Source: Google Maps http://maps.google.co.uk/
Summaries in everyday life

• Headlines: summaries of newspaper articles
• Table of contents: summary of a book, magazine
• Digest: summary of stories on the same topic
• Highlights: summary of an event (meeting, sport event, etc.)
• Abstract: summary of a scientific paper
• Bulletin: weather forecast, stock market, news
• Biography: resume, obituary
• Abridgment: of books
• Review: of books, music, plays
• Scale-downs: maps, thumbnails
• Trailer: from film, speech
Summaries in the context of this tutorial

• are produced from the text of one or several documents
• the summary is a text or a list of sentences
Definitions of summary

• “an abbreviated, accurate representation of the content of a document preferably prepared by its author(s) for publication with it. Such abstracts are also useful in access publications and machine-readable databases” (American National Standards Institute Inc., 1979)
• “an abstract summarises the essential contents of a particular knowledge record, and it is a true surrogate of the document” (Cleveland, 1983)
• “the primary function of abstracts is to indicate and predict the structure and content of the text” (van Dijk, 1980)
Definitions of summary (II)

• “the abstract is a time saving device that can be used to find a particular part of the article without reading it; [...] knowing the structure in advance will help the reader to get into the article; [...] as a summary of the article, it can serve as a review, or as a clue to the content”. Also, an abstract gives “an exact and concise knowledge of the total content of the very much more lengthy original, a factual summary which is both an elaboration of the title and a condensation of the report [...] if comprehensive enough, it might replace reading the article for some purposes” (Graetz, 1985)
• these definitions refer to human produced summaries
Definitions for automatic summaries

• these definitions are less ambitious
• “a concise representation of a document’s content to enable the reader to determine its relevance to a specific information” (Johnson, 1995)
• “a summary is a text produced from one or more texts, that contains a significant portion of the information in the original text(s), and is not longer than half of the original text(s)” (Hovy, 2003)
What is automatic summarisation?

What is automatic (text) summarisation

• Text summarisation
    • a reductive transformation of source text to summary text through content reduction by selection and/or generalisation on what is important in the source (Sparck Jones, 1999)
    • the process of distilling the most important information from a source (or sources) to produce an abridged version for a particular user (or users) and task (or tasks) (Mani and Maybury, 1999)
• Automatic text summarisation = the process of producing summaries automatically
Related disciplines

There are many disciplines which are related to automatic summarisation:
  • automatic categorisation/classification
  • term/keyword extraction
  • information retrieval
  • information extraction
  • question answering
  • text generation
  • data/opinion mining
Automatic categorisation/classification

• Automatic text categorisation
    • is the task of building software tools capable of classifying text documents under predefined categories or subject codes
    • each document can be in one or several categories
    • examples of categories: Library of Congress subject headings
• Automatic text classification
    • is usually considered broader than text categorisation
    • includes text clustering and text categorisation
    • it does not necessarily require knowing the classes in advance
    • examples: email/spam filtering, routing
Term/keyword extraction

• automatically identifies terms/keywords in texts
• a term is a word or group of words which is important in a domain and represents a concept of the domain
• a keyword is an important word in a document, but it is not necessarily a term
• terms and keywords are extracted using a mixture of statistical and linguistic approaches
• automatic indexing identifies all the relevant occurrences of a keyword in texts and produces indexes
Information retrieval (IR)

• Information retrieval attempts to find information relevant to a user query and rank it according to its relevance
• the output is usually a list of documents, in some cases together with relevant snippets from the documents
• Example: search engines
• needs to be able to deal with enormous quantities of information and process information in any format (e.g. text, image, video, etc.)
• is a field which has achieved a level of maturity and is used in industry and business
• combines statistics, text analysis, link analysis and user interfaces
Information extraction (IE)

• Information extraction is the automatic identification of predefined types of entities, relations or events in free text
• quite often the best results are obtained by rule-based approaches, but machine learning approaches are used more and more
• can generate database records
• is domain dependent
• this field developed a lot as a result of the MUC conferences
• one of the tasks in the MUC conferences was to fill in templates
• Example: Ford appointed Harriet Smith as president
    • Person: Harriet Smith
    • Job: president
    • Company: Ford
Question answering (QA)

• Question answering aims at identifying the answer to a question in a large collection of documents
• the information provided by QA is more focused than in information retrieval
• a QA system should be able to answer any question and should not be restricted to a domain (unlike IE)
• the output can be the exact answer or a text snippet which contains the answer
• the domain took off as a result of the introduction of the QA track in TREC
• user-focused summarisation = open-domain question answering
Text generation

• Text generation creates text from computer-internal representations of information
• most generation systems rely on massive amounts of linguistic knowledge and manually encoded rules for translating the underlying representation into language
• text generation systems are very domain dependent
Data mining

• Data mining is the (semi)automatic discovery of trends, patterns or unusual data across very large data sets, usually for the purposes of decision making
• Text mining applies methods from data mining to textual collections
• Processes really large amounts of data in order to find useful information
• In many cases it is not known (clearly) what is sought
• Visualisation has a very important role in data mining
Opinion mining

• Opinion mining (OM) is a recent discipline at the crossroads of information retrieval and computational linguistics which is concerned not with the topic a document is about, but with the opinion it expresses
• is usually applied to collections of documents (e.g. blogs) and seen as part of text/data mining
• Sentiment Analysis, Sentiment Classification and Opinion Extraction are other names used in the literature to identify this discipline
• Examples of OM problems:
    • What is the general opinion on the proposed tax reform?
    • How is popular opinion on the presidential candidates evolving?
    • Which of our customers are unsatisfied? Why?
Characteristics of summaries
Context factors

• the context factors defined by Sparck Jones (1999; 2001) represent a good way of characterising summaries
• they do not necessarily refer to automatic summaries
• they do not necessarily refer to summaries
• there are three types of factors:
    • input factors: characterise the input document(s)
    • purpose factors: define the transformations necessary to obtain the output
    • output factors: characterise the produced summaries
Context factors

Input factors        Purpose factors         Output factors
Form                 Situation               Form
    - Structure      Use                         - Structure
    - Scale          Summary type                - Scale
    - Medium         Coverage                    - Medium
    - Genre          Relation to source          - Language
    - Language                                   - Format
    - Format
Subject matter
Subject type
Unit
Input factors - Form

• structure: explicit organisation of documents.
  Can be the problem - solution structure of scientific documents, the pyramidal structure of newspaper articles, or the presence of embedded structure in text (e.g. rhetorical patterns)
• scale: the length of the documents.
  Different methods need to be used for a book and for a newspaper article due to very different compression rates
• medium: natural language/sublanguage/specialised language.
  If the text is written in a sublanguage it is less ambiguous and therefore easier to process
• language: monolingual/multilingual/cross-lingual
    • Monolingual: the source and the output are in the same language
    • Multilingual: the input is in several languages and the output in one of these languages
    • Cross-lingual: the language of the output is different from the language of the source(s)
• formatting: whether the source is in any special formatting.
  This is more a programming problem, but needs to be taken into consideration if information is lost as a result of conversion
Input factors

• Subject type: intended readership.
  Indicates whether the source was written for the general reader or for specific readers. It influences the amount of background information present in the source.
• Unit: single/multiple sources (single vs. multi-document summarisation); mainly concerned with the amount of redundancy in the text

Why are input factors useful?

The input factors can be used to decide whether to summarise a text or not:
  • Brandow, Mitze, and Rau (1995) use the structure of the document (presence of speech, tables, embedded lists, etc.) to decide whether to summarise it or not
  • Louis and Nenkova (2009) train a system on DUC data to determine whether the result is expected to be reliable or not
Purpose factors

• Use: how the summary is used
    • retrieving: the user uses the summary to decide whether to read the whole document
    • substituting: use the summary instead of the full document
    • previewing: get the structure of the source, etc.
• Summary type: indicates what kind of summary it is
    • indicative summaries provide a brief description of the source without going into details
    • informative summaries follow the main ideas and structure of the source
    • critical summaries give a description of the source and discuss its contents (e.g. review articles can be considered critical summaries)
Purpose factors

• Relation to source: whether the summary is an extract or an abstract
    • extract: contains units directly extracted from the document (i.e. paragraphs, sentences, clauses)
    • abstract: includes units which are not present in the source
• Coverage: which type of information should be present in the summary
    • generic: the summary should cover all the important information of the document
    • user-focused: the user indicates what the focus of the summary should be
Output factors

• Scale (also referred to as compression rate): indicates the length of the summary
    • American National Standards Institute Inc. (1979) recommends 250 words
    • Borko and Bernier (1975) point out that imposing an arbitrary limit on summaries is not good for their quality, but that a length of around 10% is usually enough
    • Hovy (2003) requires that the length of the summary is kept to less than half of the source’s size
    • Goldstein et al. (1999) point out that the summary length seems to be independent of the length of the source
• the structure of the output can be influenced by the structure of the input or by existing conventions
• the subject matter can be the same as the input, or can be broader when background information is added
Evaluation of automatic summarisation

Why is evaluation necessary?

• Evaluation is very important because it allows us to assess the results of a method or system
• Evaluation allows us to compare the results of different methods or systems
• Some types of evaluation allow us to understand why a method fails
• almost each field has its specific evaluation methods
• there are several ways to perform evaluation
    • How the system is considered
    • How humans interact with the evaluation process
    • What is measured
How the system is considered

• black-box evaluation:
    • the system is considered opaque to the user
    • the system is considered as a whole
    • allows direct comparison between different systems
    • does not explain the system’s performance
• glass-box evaluation:
    • each of the system’s components is assessed in order to understand how the final result is obtained
    • is very time consuming and difficult
    • relies on phenomena which are not fully understood (e.g. error propagation)
How humans interact with the process

• off-line evaluation
    • also called automatic evaluation because it does not require human intervention
    • usually involves the comparison between the system’s output and a gold standard
    • very often annotated corpora are used as gold standards
    • usually preferred because it is fast and not directly influenced by human subjectivity
    • can be repeated
    • cannot be (easily) used in all fields
• on-line evaluation
    • requires humans to assess the output of the system according to some guidelines
    • is useful for those tasks where the output of the system cannot be uniquely predicted (e.g. summarisation, text generation, question answering, machine translation)
    • is time consuming, expensive and cannot be easily repeated
What is measured

• intrinsic evaluation:
    • evaluates the results of a system directly
    • for example: quality, informativeness
    • sometimes does not give a very accurate view of how useful the output can be for another task
• extrinsic evaluation:
    • evaluates the results of another system which uses the results of the first
    • examples: post-edit measures, relevance assessment, reading comprehension
Evaluation used in automatic summarisation

• evaluation is a very difficult task because there is no clear idea of what constitutes a good summary
• the number of perfectly acceptable summaries from a text is not limited
• four types of evaluation methods:

              Intrinsic                    Extrinsic
On-line       Direct evaluation            Task-based evaluation
Off-line      Target-based evaluation      Automatic evaluation
Direct evaluation

• intrinsic & on-line evaluation
• requires humans to read summaries and measure their quality and informativeness according to some guidelines
• is one of the first evaluation methods used in automatic summarisation
• to a certain extent it is quite straightforward, which makes it appealing for small scale evaluation
• it is time consuming, subjective and in many cases cannot be repeated by others
Direct evaluation: quality

• it tries to assess the quality of a summary independently from the source
• can be a simple classification of sentences into acceptable or unacceptable
• Minel, Nugier, and Piat (1997) proposed an evaluation protocol which considers the coherence, cohesion and legibility of summaries
    • cohesion of a summary is measured in terms of dangling anaphors
    • coherence in terms of discourse ruptures
    • legibility is decided by jurors who are requested to classify each summary as very bad, bad, mediocre, good or very good
• it does not assess the contents of a summary so it could be misleading
Direct evaluation: informativeness

• assesses how correctly the information in the source is
  reflected in the summary
• ...
Target-based evaluation


• it is the most used evaluation method
• compares the automatic summary with a gold standard
• ...
Corpora as gold standards



• usually annotated corpora are used as gold standard
• usually the annotation is very simple...
Manually produced corpora



• Require human judges to read each text from the corpus and
  to identify the important unit...
Guidelines for manually annotated corpora

• Edmundson (1969) annotated a heterogeneous corpus
  consisting of 200 document...
Problems with manually produced corpora



• given how subjective the identification of important sentences
  is, the agree...
Automatically produced corpora

• Relies on the fact that very often humans produce summaries
  by copy-paste from the sour...
Evaluation measures used with annotated corpora

 • usually precision, recall and f-measure ...
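To make the truncated point above concrete, here is a minimal sketch (not from the tutorial; the function name and example data are invented) of how precision, recall and F-measure are typically computed when the system extract and the annotated gold standard are both sets of sentence identifiers:

```python
def extract_prf(system_sents, gold_sents):
    """Precision/recall/F1 for sentence extraction, treating both the
    system output and the gold standard as sets of sentence ids."""
    system, gold = set(system_sents), set(gold_sents)
    true_positives = len(system & gold)
    precision = true_positives / len(system) if system else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Example: sentences 2 and 7 were correctly selected, sentence 5 was not.
print(extract_prf(system_sents=[2, 5, 7], gold_sents=[2, 3, 7, 9]))
# (0.666..., 0.5, 0.571...)
```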
Summary Evaluation Environment (SEE)



• the SEE environment was used in the DUC evaluations
• is a combination betw...
Relative utility of sentences (Radev et al., 2000)


• Addresses the problem tha...
Target-based evaluation without annotated corpora


• They require that the sources have a hu...
ROUGE


• ROUGE = Recall-Oriented Understudy for Gisting Evaluation
  (Lin, 2004)
• inspired by BLEU (Bilingual Evaluation...
ROUGE-N



N-gram co-occurrence statistics: a recall-oriented metric
  • S1: Police killed the gunman
  • S2: Police kil...
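A simplified, single-reference sketch of the ROUGE-N idea, using the same police/gunman example as the slide; this is an illustration rather than the official ROUGE package, and it omits multiple references and stemming:

```python
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def rouge_n(candidate, reference, n=2):
    """Simplified single-reference ROUGE-N: recall of reference n-grams,
    with counts clipped by how often each n-gram occurs in the candidate."""
    cand = Counter(ngrams(candidate.lower().split(), n))
    ref = Counter(ngrams(reference.lower().split(), n))
    if not ref:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / sum(ref.values())

reference = "police killed the gunman"
print(rouge_n("police kill the gunman", reference, n=2))   # 1/3
print(rouge_n("the gunman kill police", reference, n=2))   # 1/3
```

Note that both candidates obtain the same ROUGE-2 score here, which is part of the motivation for ROUGE-L below.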
ROUGE-L


Longest common subsequence
  • S1: police killed the gunman
  • S2: police kill the gunman
  • S3: the gunman kill ...
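A correspondingly simplified sketch of ROUGE-L recall based on the longest common subsequence; the official metric additionally combines LCS precision and recall into an F-measure, which is omitted here:

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, tok_a in enumerate(a, 1):
        for j, tok_b in enumerate(b, 1):
            table[i][j] = (table[i - 1][j - 1] + 1 if tok_a == tok_b
                           else max(table[i - 1][j], table[i][j - 1]))
    return table[-1][-1]

def rouge_l_recall(candidate, reference):
    """Simplified ROUGE-L: LCS length divided by reference length."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    return lcs_length(cand, ref) / len(ref) if ref else 0.0

reference = "police killed the gunman"
print(rouge_l_recall("police kill the gunman", reference))   # 3/4 = 0.75
print(rouge_l_recall("the gunman kill police", reference))   # 2/4 = 0.5
```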
ROUGE-W



Weighted Longest Common Subsequence
  • S1: [A B C D E F G]
  • S2: [A B C D H I J]
  • S3: [A H B J C I D]


 ...
ROUGE-S
ROUGE-S: Skip-bigram recall metric
  • Arbitrary in-sequence bigrams are computed
  • S1: police killed the gunman...
ROUGE




• Experiments on DUC 2000 - 2003 data show good correlation
  with human judgement
• Using multiple references a...
Task-based evaluation

• is an extrinsic and on-line evaluation
• instead of evaluating the summaries directly, humans are...
Task-based evaluation



• this evaluation can be very useful because it assesses a summary
  in real situations
• it is tim...
Automatic evaluation

• extrinsic and off-line evaluation method
• tries to replace humans in task-based evaluations with
 ...
intrinsic
    • semi-purpose: inspection (e.g. for proper English)
    • quasi-purpose: com...

extrinsic
    • ...

From (Sparck Jones, 2007)
Evaluation conferences



• evaluation conferences are conferences where all the
  participants have to complete the same ...
SUMMAC


• the first evaluation conference organised in automatic
  summarisation (in 1998)
• 6 participants in the dry-run...
SUMMAC


• the TREC dataset was used
• for the ad-hoc evaluation 20 topics, each with 50 documents,
  were selected
• the tim...
Text Summarization Challenge


• is an evaluation conference organised in Japan and its main
  goals are to evaluate Japan...
Document Understanding Conference (DUC)

• it is an evaluation conference organised pa...
Document Understanding Conference

• in 2004 participants were required to produce short (<665
  bytes) and very short (<7...
Structure of the course


1 Introduction to automatic summarisation


2 Important methods in automatic summarisation
     ...
Ideal summary processing model

Source text(s) → Interpretation → Source representation → ...
How humans produce summaries
How humans summarise documents

• Determining how humans summarise documents is a difficult
  task because it requires inter...
Document exploration


• it’s the first step
• the source’s title, outline, layout and table of contents are
  examined
• t...
Relevance assessment


• at this stage summarisers identify the theme and the thematic
  structure
• theme = a structured ...
Summary production


• the summary is produced from the expanded structure of the
  theme
• in order to avoid producing a ...
Single-document summarisation methods

Single document summarisation

• Produces summaries from a single document
• There are two main approaches:
    • automatic text extraction
    • ...
Automatic text extraction


• Extracts important sentences from the text using different
  methods and produces an extract ...
Automatic text extraction



• These methods are quite robust
• The main drawback of this method is that it overlooks the
...
Surface-based summarisation methods
Term-based summarisation


• It was the first method used to produce summaries, proposed by
  Luhn (1958)
• Relies on the assumption ...
How to compute the importance of a word


• Different methods can be used:
    • Term frequency: how frequent is a word in ...
Term-based summarisation: the algorithm

(and can be used for other types of summarisers)
  1   Score all the words in t...
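Since the algorithm above is truncated, here is a small frequency-based extractor in the spirit of Luhn (1958); the stop-word list, regular expressions and length cut-off are illustrative choices, not the exact procedure from the slides:

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "are", "for"}

def term_based_summary(text, max_sentences=2):
    """Frequency-based extract: score words, score sentences as the sum of
    their word scores, keep the top-scoring sentences in document order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    word_score = Counter(words)
    scored = []
    for position, sentence in enumerate(sentences):
        tokens = re.findall(r"[a-z]+", sentence.lower())
        score = sum(word_score[t] for t in tokens if t not in STOPWORDS)
        scored.append((score, position, sentence))
    top = sorted(scored, reverse=True)[:max_sentences]
    return " ".join(s for _, _, s in sorted(top, key=lambda item: item[1]))
```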
Position method


• It was noticed that in some genres important sentences appear
  in predefined positions
• First used by ...
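For genres such as news, whose pyramidal structure puts the most important material first, the simplest position-based extractor is a lead baseline; the sketch below is illustrative only and is not the specific method cited in the truncated bullet above:

```python
def lead_summary(sentences, k=3):
    """Position-based baseline: for news, the first k sentences
    often form a reasonable extract."""
    return sentences[:k]
```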
Title method




• words in titles and headings are positively relevant to
  summarisation
• Edmundson (1969) noticed that...
Cue words/indicating phrases



• Makes use of words or phrases classified as “positive” or
  “negative” which may indicate...
Methods inspired from IR (Salton et al., 1997)


• decomposes a document in a set ...
How to combine different methods



• Edmundson (1969) used a linear combination of features:

  Weight(S) = α·Title(S) + β·Cue(S) + ...
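A sketch of this kind of linear combination; the feature names assume Edmundson's classic title, cue, keyword and location features, and the weights are placeholder values rather than anything tuned on data:

```python
def combined_score(sentence_features, weights):
    """Linear combination of per-sentence feature scores, e.g.
    Weight(S) = alpha*Title(S) + beta*Cue(S) + gamma*Keyword(S) + delta*Location(S)."""
    return sum(weights[name] * value for name, value in sentence_features.items())

weights = {"title": 1.0, "cue": 0.8, "keyword": 0.6, "location": 0.4}   # illustrative weights
features = {"title": 0.5, "cue": 1.0, "keyword": 0.3, "location": 1.0}  # scores for one sentence
print(combined_score(features, weights))  # 1.88
```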
Machine learning methods
What is machine learning (ML)?



Mitchell (1997):
  • “machine learning is concerned with the question of how to
    cons...
What is machine learning? (2)



• Reasoning is based on the similarity between new situations
  and the ones present in t...
ML for language processing



• Has been widely employed in a large number of NLP
  applications which range from part-of-...
ML as classification task



Very often an NLP problem can be seen as a classification problem
  • POS: finding the appropria...
Summarisation as a classification task


• Each example (instance) in the set to be learnt can be
  described by a set of f...
Kupiec et al. (1995)

• used a Bayesian classifier to combine different features
• the features were:
    • if the length o...
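A sketch of the naive Bayes scoring described by Kupiec et al., assuming boolean features; the probability tables below are invented numbers purely for illustration:

```python
def kupiec_score(features, prior, p_feature_given_summary, p_feature):
    """Naive Bayes score: P(s in summary | F1..Fk) is proportional to
    P(s in summary) * prod_j P(Fj | s in summary) / prod_j P(Fj)."""
    score = prior
    for name, value in features.items():
        score *= p_feature_given_summary[name][value] / p_feature[name][value]
    return score

# Illustrative (made-up) probability tables for two boolean features.
prior = 0.2
p_given_summary = {"cue_phrase": {True: 0.5, False: 0.5},
                   "in_first_paragraph": {True: 0.6, False: 0.4}}
p_feature = {"cue_phrase": {True: 0.1, False: 0.9},
             "in_first_paragraph": {True: 0.2, False: 0.8}}
print(kupiec_score({"cue_phrase": True, "in_first_paragraph": True},
                   prior, p_given_summary, p_feature))  # 0.2 * 5 * 3 = 3.0
```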
Mani and Bloedorn (1998)


• learn rules about how to classify sentences
• features used:
     • location features: locati...
Other ML methods



• Osborne (2002) used maximum entropy with features such as
  word pairs, sentence length, sentence po...
Methods which exploit the discourse structure
Methods which exploit discourse cohesion


• summarisation methods which use discourse structure usually
  produce better ...
Methods which exploit text cohesion

• text cohesion involves relations between words, word senses,
  referring expressi...
Lexical chains for text summarisation

• Telepattan system: Benbrahim and Ahmad (1995)
• two sentences are linked if the w...
Using coreferential chains for text summarisation



• method presented in (Azzam, Hump...
Coreference chain selection



The summarisation module implements several selection criteria:
  • Length of chain: prefer...
Summarisation methods which use the rhetorical structure of texts
• it is based on the Rhetorical Structu...
Figure from (Marcu, 2000)
Summarisation using argumentative zoning



• Teufel and Moens (2002) exploit the s...
Knowledge-rich methods
Knowledge rich methods



• Produce abstracts
• Most of them try to “understand” (at least partially) a text
  and to make...
Knowledge-rich methods


• The abstracts obtained in this way are better in terms of
  cohesion and coherence
• The abstr...
FRUMP (deJong, 1982)



• uses sketchy scripts to understand a situation
• these scripts only keep the information relevan...
Example of script used by FRUMP

1   The demonstrators arrive at the demonstration location
2   The demonstrators march
3   ...
FRUMP


• the evaluation of the system revealed that it could not process
  a large number of scripts because it did not h...
Concept-based abstracting (Paice and Jones, 1993)

• Also referred to as extract and gene...
Other knowledge-rich methods


• Rumelhart (1975) developed a system to understand and
  summarise simple stories, using a...
Multi-document summarisation methods
Multi-document summarisation


• multi-document summarisation is the extension of
  single-document summarisation to colle...
Issues with multi-document summaries


• the collections to be summarised can vary a lot in size, so
  different methods mi...
IR inspired methods

• the method of Salton et al. (1997) can be adapted to multi-document
  summarisation
• instead of using paragraph...
Maximal Marginal Relevance



• proposed by Goldstein et al. (2000)
• addresses the redundancy among multiple documents
•...
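A sketch of the greedy Maximal Marginal Relevance loop; cosine similarity over bag-of-words vectors and lambda = 0.7 are common but illustrative choices, not necessarily those used by Goldstein et al.:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def mmr_select(sentences, query, k=3, lam=0.7):
    """Greedy MMR: at each step pick the sentence that best balances
    relevance to the query against redundancy with what is already selected."""
    vectors = [Counter(s.lower().split()) for s in sentences]
    query_vec = Counter(query.lower().split())
    selected, candidates = [], list(range(len(sentences)))
    while candidates and len(selected) < k:
        def mmr_score(i):
            relevance = cosine(vectors[i], query_vec)
            redundancy = max((cosine(vectors[i], vectors[j]) for j in selected),
                             default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]
```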
Cohesion text maps

• use knowledge based on lexical cohesion (Mani and Bloedorn, 1999)
• good to compare pairs of docume...
Theme fusion (Barzilay et al., 1999)


• used to avoid redundancy in multi-document summaries
• Theme = collection of simi...
Centroid based summarisation



• a centroid = a set of words that are statistically important to
  a cluster of documents...
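A much simplified sketch of the centroid idea: build a centroid of words that are important across the cluster and score sentences by how much centroid weight they contain; plain frequencies stand in here for the TF*IDF weights normally used:

```python
from collections import Counter

def build_centroid(documents, top_n=10):
    """Centroid = the top_n words with the highest total frequency across
    the cluster (a stand-in for the TF*IDF weights used in practice)."""
    totals = Counter()
    for doc in documents:
        totals.update(doc.lower().split())
    return dict(totals.most_common(top_n))

def centroid_score(sentence, centroid):
    """Score a sentence by the total centroid weight of the words it contains."""
    return sum(centroid.get(token, 0) for token in sentence.lower().split())
```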
Cross-document Structure Theory

• Cross-document Structure Theory provides a theoretical model for issues
  that arise when trying to summ...
Automatic summarisation and the Internet
• New research topics have emerged at the confluence of
  summarisation with other disciplines (e.g. question answering
  a...
Challenges posed by the Web



• Huge amount of information
• Wide and diverse
• Information of all types e.g. structured ...
Summarisation of news on the Web

• Newsblaster (McKeown et al., 2002) summarises news from
  the Web (http://newsblaster....
Email summarisation

• email summarisation is more difficult because emails have a
  dialogue structure
• Muresan et al. (200...
Blog summarisation



• Zhou et al. (2006) see a blog entry as a summary of news
  stories with personal opinions added...
Opinion mining and summarisation




• find what reviewers liked and disliked about a product
• usually large number of rev...
Producing the opinion summary



A three stage process:
  1   Extract object features that have been commented on in each
...
Opinion summaries

• Mao and Lebanon (2007) suggest producing summaries that
  track the sentiment flow within a document ...
Opinion summarisation at TAC



• the Text Analysis Conference 2008 (TAC) included an
  opinion summarisation task based on blogs
...
QA and Summarisation at INEX2009


• the QA track at INEX2009 requires participants to answer
  factual and complex questi...
Conclusions




• research in automatic summarisation is still very active, but
  in many cases it merges with other fiel...
Thank you!
    More information and updates at:
http://www.summarizationonline.info
References
Alterman, Richard. 1986. Summarisation in small. In N. Sharkey, editor, Advances in
cognitive science. Chichester, England...
Endres-Niggemeyer, Brigitte. 1998. Summarizing information. Springer.
Fukusima, Takahiro and Manabu Okumura. 2001. Text Su...
Jing, Hongyan and Kathleen R. McKeown. 1999. The decomposition of
human-written summary sentences. In Proceedings of the 2...
Louis, Annie and Ani Nenkova. 2009. Performance confidence estimation for
automatic summarization. In Proceedings of the 12...
  1. 1. Automatic summarisation in the Information Age Constantin Or˘san a Research Group in Computational Linguistics Research Institute in Information and Language Processing University of Wolverhampton http://www.wlv.ac.uk/~in6093/ http://www.summarizationonline.info 12th Sept 2009
  2. 2. Structure of the course 1 Introduction to automatic summarisation
  3. 3. Structure of the course 1 Introduction to automatic summarisation 2 Important methods in automatic summarisation
  4. 4. Structure of the course 1 Introduction to automatic summarisation 2 Important methods in automatic summarisation 3 Automatic summarisation and the Internet
  5. 5. Structure of the course 1 Introduction to automatic summarisation What is a summary? What is automatic summarisation Context factors Evaluation General information about evaluation Direct evaluation Target-based evaluation Task-based evaluation Automatic evaluation Evaluation conferences 2 Important methods in automatic summarisation 3 Automatic summarisation and the Internet
  6. 6. What is a summary?
  7. 7. Abstract of scientific paper Source: (Sparck Jones, 2007)
  8. 8. Summary of a news event Source: Google news http://news.google.com
  9. 9. Summary of a web page Source: Bing http://www.bing.com
  10. 10. Summary of financial news Source: Yahoo! Finance http://finance.yahoo.com/
  11. 11. Summary of financial news Source: Yahoo! Finance http://finance.yahoo.com/
  12. 12. Summary of financial news Source: Yahoo! Finance http://finance.yahoo.com/
  13. 13. Maps Source: Google Maps http://maps.google.co.uk/
  14. 14. Maps Source: Google Maps http://maps.google.co.uk/
  15. 15. Summaries in everyday life
  16. 16. Summaries in everyday life • Headlines: summaries of newspaper articles
  17. 17. Summaries in everyday life • Headlines: summaries of newspaper articles • Table of contents: summary of a book, magazine
  18. 18. Summaries in everyday life • Headlines: summaries of newspaper articles • Table of contents: summary of a book, magazine • Digest: summary of stories on the same topic
  19. 19. Summaries in everyday life • Headlines: summaries of newspaper articles • Table of contents: summary of a book, magazine • Digest: summary of stories on the same topic • Highlights: summary of an event (meeting, sport event, etc.)
  20. 20. Summaries in everyday life • Headlines: summaries of newspaper articles • Table of contents: summary of a book, magazine • Digest: summary of stories on the same topic • Highlights: summary of an event (meeting, sport event, etc.) • Abstract: summary of a scientific paper
  21. 21. Summaries in everyday life • Headlines: summaries of newspaper articles • Table of contents: summary of a book, magazine • Digest: summary of stories on the same topic • Highlights: summary of an event (meeting, sport event, etc.) • Abstract: summary of a scientific paper • Bulletin: weather forecast, stock market, news
  22. 22. Summaries in everyday life • Headlines: summaries of newspaper articles • Table of contents: summary of a book, magazine • Digest: summary of stories on the same topic • Highlights: summary of an event (meeting, sport event, etc.) • Abstract: summary of a scientific paper • Bulletin: weather forecast, stock market, news • Biography: resume, obituary
  23. 23. Summaries in everyday life • Headlines: summaries of newspaper articles • Table of contents: summary of a book, magazine • Digest: summary of stories on the same topic • Highlights: summary of an event (meeting, sport event, etc.) • Abstract: summary of a scientific paper • Bulletin: weather forecast, stock market, news • Biography: resume, obituary • Abridgment: of books
  24. 24. Summaries in everyday life • Headlines: summaries of newspaper articles • Table of contents: summary of a book, magazine • Digest: summary of stories on the same topic • Highlights: summary of an event (meeting, sport event, etc.) • Abstract: summary of a scientific paper • Bulletin: weather forecast, stock market, news • Biography: resume, obituary • Abridgment: of books • Review: of books, music, plays
  25. 25. Summaries in everyday life • Headlines: summaries of newspaper articles • Table of contents: summary of a book, magazine • Digest: summary of stories on the same topic • Highlights: summary of an event (meeting, sport event, etc.) • Abstract: summary of a scientific paper • Bulletin: weather forecast, stock market, news • Biography: resume, obituary • Abridgment: of books • Review: of books, music, plays • Scale-downs: maps, thumbnails
  26. 26. Summaries in everyday life • Headlines: summaries of newspaper articles • Table of contents: summary of a book, magazine • Digest: summary of stories on the same topic • Highlights: summary of an event (meeting, sport event, etc.) • Abstract: summary of a scientific paper • Bulletin: weather forecast, stock market, news • Biography: resume, obituary • Abridgment: of books • Review: of books, music, plays • Scale-downs: maps, thumbnails • Trailer: from film, speech
  27. 27. Summaries in the context of this tutorial • are produced from the text of one or several documents • the summary is a text or a list of sentences
  28. 28. Definitions of summary • “an abbreviated, accurate representation of the content of a document preferably prepared by its author(s) for publication with it. Such abstracts are also useful in access publications and machine-readable databases” (American National Standards Institute Inc., 1979)
  29. 29. Definitions of summary • “an abbreviated, accurate representation of the content of a document preferably prepared by its author(s) for publication with it. Such abstracts are also useful in access publications and machine-readable databases” (American National Standards Institute Inc., 1979)
  30. 30. Definitions of summary • “an abbreviated, accurate representation of the content of a document preferably prepared by its author(s) for publication with it. Such abstracts are also useful in access publications and machine-readable databases” (American National Standards Institute Inc., 1979) • “an abstract summarises the essential contents of a particular knowledge record, and it is a true surrogate of the document” (Cleveland, 1983)
  31. 31. Definitions of summary • “an abbreviated, accurate representation of the content of a document preferably prepared by its author(s) for publication with it. Such abstracts are also useful in access publications and machine-readable databases” (American National Standards Institute Inc., 1979) • “an abstract summarises the essential contents of a particular knowledge record, and it is a true surrogate of the document” (Cleveland, 1983)
  32. 32. Definitions of summary • “an abbreviated, accurate representation of the content of a document preferably prepared by its author(s) for publication with it. Such abstracts are also useful in access publications and machine-readable databases” (American National Standards Institute Inc., 1979) • “an abstract summarises the essential contents of a particular knowledge record, and it is a true surrogate of the document” (Cleveland, 1983) • “the primary function of abstracts is to indicate and predict the structure and content of the text” (van Dijk, 1980)
  33. 33. Definitions of summary • “an abbreviated, accurate representation of the content of a document preferably prepared by its author(s) for publication with it. Such abstracts are also useful in access publications and machine-readable databases” (American National Standards Institute Inc., 1979) • “an abstract summarises the essential contents of a particular knowledge record, and it is a true surrogate of the document” (Cleveland, 1983) • “the primary function of abstracts is to indicate and predict the structure and content of the text” (van Dijk, 1980)
  34. 34. Definitions of summary (II) • “the abstract is a time saving device that can be used to find a particular part of the article without reading it; [...] knowing the structure in advance will help the reader to get into the article; [...] as a summary of the article, it can serve as a review, or as a clue to the content”. Also, an abstract gives “an exact and concise knowledge of the total content of the very much more lengthy original, a factual summary which is both an elaboration of the title and a condensation of the report [...] if comprehensive enough, it might replace reading the article for some purposes” (Graetz, 1985).
  35. 35. Definitions of summary (II) • “the abstract is a time saving device that can be used to find a particular part of the article without reading it; [...] knowing the structure in advance will help the reader to get into the article; [...] as a summary of the article, it can serve as a review, or as a clue to the content”. Also, an abstract gives “an exact and concise knowledge of the total content of the very much more lengthy original, a factual summary which is both an elaboration of the title and a condensation of the report [...] if comprehensive enough, it might replace reading the article for some purposes” (Graetz, 1985).
  36. 36. Definitions of summary (II) • “the abstract is a time saving device that can be used to find a particular part of the article without reading it; [...] knowing the structure in advance will help the reader to get into the article; [...] as a summary of the article, it can serve as a review, or as a clue to the content”. Also, an abstract gives “an exact and concise knowledge of the total content of the very much more lengthy original, a factual summary which is both an elaboration of the title and a condensation of the report [...] if comprehensive enough, it might replace reading the article for some purposes” (Graetz, 1985). • these definitions refer to human produced summaries
  37. 37. Definitions for automatic summaries • these definitions are less ambitious
  38. 38. Definitions for automatic summaries • these definitions are less ambitious • “a concise representation of a document’s content to enable the reader to determine its relevance to a specific information” (Johnson, 1995)
  39. 39. Definitions for automatic summaries • these definitions are less ambitious • “a concise representation of a document’s content to enable the reader to determine its relevance to a specific information” (Johnson, 1995)
  40. 40. Definitions for automatic summaries • these definitions are less ambitious • “a concise representation of a document’s content to enable the reader to determine its relevance to a specific information” (Johnson, 1995) • “a summary is a text produced from one or more texts, that contains a significant portion of the information in the original text(s), and is not longer than half of the original text(s)”. (Hovy, 2003)
  41. 41. Definitions for automatic summaries • these definitions are less ambitious • “a concise representation of a document’s content to enable the reader to determine its relevance to a specific information” (Johnson, 1995) • “a summary is a text produced from one or more texts, that contains a significant portion of the information in the original text(s), and is not longer than half of the original text(s)”. (Hovy, 2003)
  42. 42. What is automatic summarisation?
  43. 43. What is automatic (text) summarisation • Text summarisation • a reductive transformation of source text to summary text through content reduction by selection and/or generalisation on what is important in the source. (Sparck Jones, 1999)
  44. 44. What is automatic (text) summarisation • Text summarisation • a reductive transformation of source text to summary text through content reduction by selection and/or generalisation on what is important in the source. (Sparck Jones, 1999)
  45. 45. What is automatic (text) summarisation • Text summarisation • a reductive transformation of source text to summary text through content reduction by selection and/or generalisation on what is important in the source. (Sparck Jones, 1999) • the process of distilling the most important information from a source (or sources) to produce an abridged version for a particular user (or users) and task (or tasks). (Mani and Maybury, 1999)
  46. 46. What is automatic (text) summarisation • Text summarisation • a reductive transformation of source text to summary text through content reduction by selection and/or generalisation on what is important in the source. (Sparck Jones, 1999) • the process of distilling the most important information from a source (or sources) to produce an abridged version for a particular user (or users) and task (or tasks). (Mani and Maybury, 1999)
  47. 47. What is automatic (text) summarisation • Text summarisation • a reductive transformation of source text to summary text through content reduction by selection and/or generalisation on what is important in the source. (Sparck Jones, 1999) • the process of distilling the most important information from a source (or sources) to produce an abridged version for a particular user (or users) and task (or tasks). (Mani and Maybury, 1999) • Automatic text summarisation = The process of producing summaries automatically.
  48. 48. Related disciplines There are many disciplines which are related to automatic summarisation: • automatic categorisation/classification • term/keyword extraction • information retrieval • information extraction • question answering • text generation • data/opinion mining
  49. 49. Automatic categorisation/classification • Automatic text categorisation • is the task of building software tools capable of classifying text documents under predefined categories or subject codes • each document can be in one or several categories • examples of categories: Library of Congress subject headings • Automatic text classification • is usually considered broader than text categorisation • includes text clustering and text categorisation • in does not necessary require to know the classes • Examples: email/spam filtering, routing,
  50. 50. Term/keyword extraction • automatically identifies terms/keywords in texts • a term is a word or group of words which are important in a domain and represent a concept of the domain • a keyword is an important word in a document, but it is not necessary a term • terms and keywords are extracted using a mixture of statistical and linguistic approaches • automatic indexing identifies all the relevant occurrences of a keyword in texts and produces indexes
  51. 51. Information retrieval (IR) • Information retrieval attempts to find information relevant to a user query and rank it according to its relevance • the output is usually a list of documents in some cases together with relevant snippets from the document • Example: search engines • needs to be able to deal with enormous quantities of information and process information in any format (e.g. text, image, video, etc.) • is a field which achieved a level of maturity and is used in industry and business • combines statistics, text analysis, link analysis and user interfaces
  52. 52. Information extraction (IE) • Information extraction is the automatic identification of predefined types of entities, relations or events in free text • quite often the best results are obtained by rule-based approaches, but machine learning approaches are used more and more • can generate database records • is domain dependent • this field developed a lot as a result of the MUC conferences • one of the tasks in the MUC conferences was to fill in templates • Example: Ford appointed Harriet Smith as president
  53. 53. Information extraction (IE) • Information extraction is the automatic identification of predefined types of entities, relations or events in free text • quite often the best results are obtained by rule-based approaches, but machine learning approaches are used more and more • can generate database records • is domain dependent • this field developed a lot as a result of the MUC conferences • one of the tasks in the MUC conferences was to fill in templates • Example: Ford appointed Harriet Smith as president
  54. 54. Information extraction (IE) • Information extraction is the automatic identification of predefined types of entities, relations or events in free text • quite often the best results are obtained by rule-based approaches, but machine learning approaches are used more and more • can generate database records • is domain dependent • this field developed a lot as a result of the MUC conferences • one of the tasks in the MUC conferences was to fill in templates • Example: Ford appointed Harriet Smith as president • Person: Harriet Smith • Job: president • Company: Ford
  55. 55. Question answering (QA) • Question answering aims at identifying the answer to a question in a large collection of documents • the information provided by QA is more focused than information retrieval • a QA system should be able to answer any question and should not be restricted to a domain (like IE) • the output can be the exact answer or a text snippet which contains the answer • the domain took off as a result of the introduction of QA track in TREC • user-focused summarisation = open-domain question answering
  56. 56. Text generation • Text generation creates text from computer-internal representations of information • most generation systems rely on massive amounts of linguistic knowledge and manually encoded rules for translating the underlying representation into language • text generation systems are very domain dependent
  57. 57. Data mining • Data mining is the (semi)automatic discovery of trends, patterns or unusual data across very large data sets, usually for the purposes of decision making • Text mining applies methods from data mining to textual collections • Processes really large amounts of data in order to find useful information • In many cases it is not known (clearly) what is sought • Visualisation has a very important role in data mining
  58. 58. Opinion mining • Opinion mining (OM) is a recent discipline at the crossroads of information retrieval and computational linguistics which is concerned not with the topic a document is about, but with the opinion it expresses. • Is usually applied to collections of documents (e.g. blogs) and seen part of text/data mining • Sentiment Analysis, Sentiment Classification, Opinion Extraction are other names used in literature to identify this discipline. • Examples of OM problems: • What is the general opinion on the proposed tax reform? • How is popular opinion on the presidential candidates evolving? • Which of our customers are unsatisfied? Why?
  59. 59. Characteristics of summaries
  60. 60. Context factors • the context factors defined by Sparck Jones (1999; 2001) represent a good way of characterising summaries
  61. 61. Context factors • the context factors defined by Sparck Jones (1999; 2001) represent a good way of characterising summaries • they do not necessary refer to automatic summaries
  62. 62. Context factors • the context factors defined by Sparck Jones (1999; 2001) represent a good way of characterising summaries • they do not necessary refer to automatic summaries • they do not necessary refer to summaries
  63. 63. Context factors • the context factors defined by Sparck Jones (1999; 2001) represent a good way of characterising summaries • they do not necessary refer to automatic summaries • they do not necessary refer to summaries • there are three types of factors:
  64. 64. Context factors • the context factors defined by Sparck Jones (1999; 2001) represent a good way of characterising summaries • they do not necessary refer to automatic summaries • they do not necessary refer to summaries • there are three types of factors: • input factors: characterise the input document(s)
  65. 65. Context factors • the context factors defined by Sparck Jones (1999; 2001) represent a good way of characterising summaries • they do not necessary refer to automatic summaries • they do not necessary refer to summaries • there are three types of factors: • input factors: characterise the input document(s) • purpose factors: define the transformations necessary to obtain the output
  66. 66. Context factors • the context factors defined by Sparck Jones (1999; 2001) represent a good way of characterising summaries • they do not necessary refer to automatic summaries • they do not necessary refer to summaries • there are three types of factors: • input factors: characterise the input document(s) • purpose factors: define the transformations necessary to obtain the output • output factors: characterise the produced summaries
  67. 67. Context factors Input factors Purpose factors Output factors Form Situation Form - Structure Use - Structure - Scale Summary type - Scale - Medium Coverage - Medium - Genre Relation to source - Language - Language - Format - Format Subject matter Subject type Unit
  68. 68. Input factors - Form • structure: explicit organisation of documents. Can be problem - solution structure of scientific documents, pyramidal structure of newspaper articles, presence of embedded structure in text (e.g. rhetorical patterns)
  69. 69. Input factors - Form • structure: explicit organisation of documents. Can be problem - solution structure of scientific documents, pyramidal structure of newspaper articles, presence of embedded structure in text (e.g. rhetorical patterns) • scale: the length of the documents Different methods need to be used for a book and for a newspaper article due to very different compression rates
  70. 70. Input factors - Form • structure: explicit organisation of documents. Can be problem - solution structure of scientific documents, pyramidal structure of newspaper articles, presence of embedded structure in text (e.g. rhetorical patterns) • scale: the length of the documents Different methods need to be used for a book and for a newspaper article due to very different compression rates • medium: natural language/sublanguage/specialised language If the text is written in a sublanguage it is less ambiguous and therefore it’s easier to process.
  71. 71. Input factors - Form • language: monolingual/multilingual/cross-lingual
  72. 72. Input factors - Form • language: monolingual/multilingual/cross-lingual • Monolingual: the source and the output are in the same language
  73. 73. Input factors - Form • language: monolingual/multilingual/cross-lingual • Monolingual: the source and the output are in the same language • Multilingual: the input is in several languages and output in one of these languages
  74. 74. Input factors - Form • language: monolingual/multilingual/cross-lingual • Monolingual: the source and the output are in the same language • Multilingual: the input is in several languages and output in one of these languages • Cross-lingual: the language of the output is different from the language of the source(s)
  75. 75. Input factors - Form • language: monolingual/multilingual/cross-lingual • Monolingual: the source and the output are in the same language • Multilingual: the input is in several languages and output in one of these languages • Cross-lingual: the language of the output is different from the language of the source(s) • formatting: whether the source is in any special formatting. This is more a programming problem, but needs to be taken into consideration if information is lost as a result of conversion.
  76. 76. Input factors • Subject type: intended readership Indicates whether the source was written from the general reader or for specific readers. It influences the amount of background information present in the source.
  77. 77. Input factors • Subject type: intended readership Indicates whether the source was written from the general reader or for specific readers. It influences the amount of background information present in the source. • Unit: single/multiple sources (single vs. multi-document summarisation) mainly concerned with the amount of redundancy in the text
  78. 78. Why input factors are useful? The input factors can be used whether to summarise a text or not: • Brandow, Mitze, and Rau (1995) use structure of the document (presence of speech, tables, embedded lists, etc.) to decide whether to summarise it or not. • Louis and Nenkova (2009) train a system on DUC data to determine whether the result is expected to be reliable or not.
  79. 79. Purpose factors • Use: how the summary is used
  80. 80. Purpose factors • Use: how the summary is used • retrieving: the user uses the summary to decide whether to read the whole document,
  81. 81. Purpose factors • Use: how the summary is used • retrieving: the user uses the summary to decide whether to read the whole document, • substituting: use the summary instead of the full document,
  82. 82. Purpose factors • Use: how the summary is used • retrieving: the user uses the summary to decide whether to read the whole document, • substituting: use the summary instead of the full document, • previewing: get the structure of the source, etc.
  83. 83. Purpose factors • Use: how the summary is used • retrieving: the user uses the summary to decide whether to read the whole document, • substituting: use the summary instead of the full document, • previewing: get the structure of the source, etc. • Summary type: indicates how is the summary
  84. 84. Purpose factors • Use: how the summary is used • retrieving: the user uses the summary to decide whether to read the whole document, • substituting: use the summary instead of the full document, • previewing: get the structure of the source, etc. • Summary type: indicates how is the summary • indicative summaries provide a brief description of the source without going into details,
  85. 85. Purpose factors • Use: how the summary is used • retrieving: the user uses the summary to decide whether to read the whole document, • substituting: use the summary instead of the full document, • previewing: get the structure of the source, etc. • Summary type: indicates how is the summary • indicative summaries provide a brief description of the source without going into details, • informative summaries follow the ideas main ideas and structure of the source
  86. 86. Purpose factors • Use: how the summary is used • retrieving: the user uses the summary to decide whether to read the whole document, • substituting: use the summary instead of the full document, • previewing: get the structure of the source, etc. • Summary type: indicates how is the summary • indicative summaries provide a brief description of the source without going into details, • informative summaries follow the ideas main ideas and structure of the source • critical summaries give a description of the source and discuss its contents (e.g. review articles can be considered critical summaries)
  87. 87. Purpose factors • Relation to source: whether the summary is an extract or abstract
  88. 88. Purpose factors • Relation to source: whether the summary is an extract or abstract • extract: contains units directly extracted from the document (i.e. paragraphs, sentences, clauses),
  89. 89. Purpose factors • Relation to source: whether the summary is an extract or abstract • extract: contains units directly extracted from the document (i.e. paragraphs, sentences, clauses), • abstract: includes units which are not present in the source
  90. 90. Purpose factors • Relation to source: whether the summary is an extract or abstract • extract: contains units directly extracted from the document (i.e. paragraphs, sentences, clauses), • abstract: includes units which are not present in the source • Coverage: which type of information should be present in the summary
  91. 91. Purpose factors • Relation to source: whether the summary is an extract or abstract • extract: contains units directly extracted from the document (i.e. paragraphs, sentences, clauses), • abstract: includes units which are not present in the source • Coverage: which type of information should be present in the summary • generic: the summary should cover all the important information of the document,
  92. 92. Purpose factors • Relation to source: whether the summary is an extract or abstract • extract: contains units directly extracted from the document (i.e. paragraphs, sentences, clauses), • abstract: includes units which are not present in the source • Coverage: which type of information should be present in the summary • generic: the summary should cover all the important information of the document, • user-focused: the user indicates which should be the focus of the summary
  93. 93. Output factors • Scale (also referred to as compression rate): indicates the length of the summary • American National Standards Institute Inc. (1979) recommends 250 words • Borko and Bernier (1975) point out that imposing an arbitrary limit on summaries is not good for their quality, but that a length of around 10% is usually enough • Hovy (2003) requires that the length of the summary is kept less then half of the source’s size • Goldstein et al. (1999) point out that the summary length seems to be independent from the length of the source • the structure of the output can be influenced by the structure of the input or by existing conventions • the subject matter can be the same as the input, or can be broader when background information is added
  94. 94. Evaluation of automatic summarisation
  95. 95. Why is evaluation necessary? • Evaluation is very important because it allows us to assess the results of a method or system • Evaluation allows us to compare the results of different methods or systems • Some types of evaluation allow us to understand why a method fails • almost each field has its specific evaluation methods • there are several ways to perform evaluation • How the system is considered • How humans interact with the evaluation process • What is measured
  96. 96. How the system is considered • black-box evaluation: • the system is considered opaque to the user • the system is considered as a whole • allows direct comparison between different systems • does not explain the system’s performance
  97. 97. How the system is considered • black-box evaluation: • the system is considered opaque to the user • the system is considered as a whole • allows direct comparison between different systems • does not explain the system’s performance • glass-box evaluation: • each of the system’s components are assessed in order to understand how the final result is obtained • is very time consuming and difficult • relies on phenomena which are not fully understood (e.g. error propagation)
99. 99. How humans interact with the process • off-line evaluation • also called automatic evaluation because it does not require human intervention • usually involves comparing the system’s output with a gold standard • very often annotated corpora are used as gold standards • is usually preferred because it is fast and not directly influenced by human subjectivity • can be repeated • cannot be (easily) used in all fields • on-line evaluation • requires humans to assess the output of the system according to some guidelines • is useful for those tasks where the output of the system cannot be uniquely predicted (e.g. summarisation, text generation, question answering, machine translation) • is time consuming, expensive and cannot be easily repeated
101. 101. What is measured • intrinsic evaluation: • evaluates the results of a system directly • for example: quality, informativeness • sometimes does not give a very accurate view of how useful the output can be for another task • extrinsic evaluation: • evaluates the results of another system which uses the results of the first • examples: post-edit measures, relevance assessment, reading comprehension
103. 103. Evaluation used in automatic summarisation • evaluation is a very difficult task because there is no clear definition of what constitutes a good summary • the number of perfectly acceptable summaries that can be produced from a text is not limited • four types of evaluation methods: direct evaluation (intrinsic & on-line), task-based evaluation (extrinsic & on-line), target-based evaluation (intrinsic & off-line), automatic evaluation (extrinsic & off-line)
104. 104. Direct evaluation • intrinsic & on-line evaluation • requires humans to read summaries and measure their quality and informativeness according to some guidelines • is one of the first evaluation methods used in automatic summarisation • to a certain extent it is quite straightforward, which makes it appealing for small-scale evaluation • it is time consuming, subjective and in many cases cannot be repeated by others
105. 105. Direct evaluation: quality • it tries to assess the quality of a summary independently of the source • can be a simple classification of sentences as acceptable or unacceptable • Minel, Nugier, and Piat (1997) proposed an evaluation protocol which considers the coherence, cohesion and legibility of summaries • the cohesion of a summary is measured in terms of dangling anaphors • the coherence in terms of discourse ruptures • the legibility is judged by jurors who are asked to classify each summary as very bad, bad, mediocre, good or very good • it does not assess the content of a summary, so it could be misleading
106. 106. Direct evaluation: informativeness • assesses how correctly the information in the source is reflected in the summary • the judges are required to read both the source and the summary, which makes the process longer and more expensive • judges are generally required to: • identify important ideas from the source which do not appear in the summary • identify ideas from the summary which are not important enough and therefore should not be there • identify the logical development of the ideas and see whether it appears in the summary • because this is time consuming, automatic methods for computing informativeness are preferred
107. 107. Target-based evaluation • it is the most widely used evaluation method • it compares the automatic summary with a gold standard • it is appropriate for extractive summarisation methods • it is intrinsic and off-line • it does not require humans to be involved in the evaluation • it has the advantage of being fast and cheap, and it can be repeated by other researchers • the drawback is that it requires a gold standard, which usually is not easy to produce
  108. 108. Corpora as gold standards • usually annotated corpora are used as gold standard • usually the annotation is very simple: for each sentence it indicates whether it is important enough to be included in the summary or not • such corpora are normally used to assess extracts • can be produced manually and automatically • these corpora normally represent one point of view
109. 109. Manually produced corpora • Require human judges to read each text from the corpus and to identify the important units in each text according to guidelines • Kupiec, Pederson, and Chen (1995) and Teufel and Moens (1997) took advantage of the existence of human-produced abstracts and asked human annotators to align sentences from the document with sentences from the abstracts • it is not necessary to use specialised tools to apply this annotation, but in many cases they can help
110. 110. Guidelines for manually annotated corpora • Edmundson (1969) annotated a heterogeneous corpus consisting of 200 documents in the fields of physics, life science, information science and humanities. The important sentences were considered to be those which indicated: • what the subject area is, • why the research is necessary, • how the problem is solved, • what the findings of the research are. • Hasler, Orăsan, and Mitkov (2003) annotated a corpus of newspaper articles, and the important sentences were considered to be those linked to the main topic of the text as indicated in the title (see http://clg.wlv.ac.uk/projects/CAST/ for the complete guidelines)
111. 111. Problems with manually produced corpora • given how subjective the identification of important sentences is, the agreement between annotators is low • the inter-annotator agreement is determined by the genre of the texts and the length of the summaries • Hasler, Orăsan, and Mitkov (2003) measured the agreement between three annotators and noticed a very low value, but • when the content of the selected sentences is compared, the agreement increases
112. 112. Automatically produced corpora • Relies on the fact that very often humans produce summaries by copy-pasting from the source • there are algorithms which identify sets of sentences from the source which cover the information in the summary • Marcu (1999) employed a greedy algorithm which eliminates from the whole document those sentences whose removal does not reduce the similarity between the summary and the remaining sentences • Jing and McKeown (1999) treat the human-produced abstract as a sequence of words which appear in the document, and reformulate the alignment problem as finding the most likely position of the words from the abstract in the full document using a Hidden Markov Model
113. 113. Evaluation measures used with annotated corpora • usually precision, recall and f-measure are used to calculate the performance of a system • the list of sentences extracted by the program is compared with the list of sentences marked by humans:
                              Extracted by program    Not extracted by program
    Extracted by humans       True positives (TP)     False negatives (FN)
    Not extracted by humans   False positives (FP)    True negatives (TN)
• Precision = TP / (TP + FP)
• Recall = TP / (TP + FN)
• F-score = ((β² + 1) · P · R) / (β² · P + R)
(a sketch of these measures follows below)
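A minimal sketch (not from the original slides) of how these measures can be computed when the system extract and the human-annotated extract are represented as sets of sentence indices; the function name and the example data are hypothetical.

```python
def extraction_scores(extracted, gold, beta=1.0):
    """Precision, recall and F-score of an extract against a gold-standard extract."""
    extracted, gold = set(extracted), set(gold)
    tp = len(extracted & gold)          # sentences selected by both
    fp = len(extracted - gold)          # selected by the system only
    fn = len(gold - extracted)          # selected by the annotators only
    precision = tp / (tp + fp) if extracted else 0.0
    recall = tp / (tp + fn) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f = (beta ** 2 + 1) * precision * recall / (beta ** 2 * precision + recall)
    return precision, recall, f

# hypothetical example: the system extracted sentences 0, 2, 5; annotators marked 0, 3, 5
print(extraction_scores({0, 2, 5}, {0, 3, 5}))   # approximately (0.667, 0.667, 0.667)
```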
114. 114. Summary Evaluation Environment (SEE) • the SEE environment has been used in the DUC evaluations • it is a combination of direct and target-based evaluation • it requires humans to assess whether each unit from the automatic summary appears in the target summary • it also offers the option to answer questions about the quality of the summary (e.g. Does the summary build from sentence to sentence to a coherent body of information about the topic?)
115. 115. Relative utility of sentences (Radev et al., 2000) • Addresses the problem that humans often disagree when they are asked to select the top n% sentences from a document • Each sentence in the document receives a score from 1 to 10 depending on how “summary worthy” it is • The score of an automatic summary is the normalised score of the extracted sentences • When several judges are available the score of a summary is the average over all judges • Can be used for any compression rate
116. 116. Target-based evaluation without annotated corpora • These methods require that the sources have a human-provided summary (but they do not need to be annotated) • Donaway et al. (2000) propose using the cosine similarity between an automatic summary and a human summary, but this relies on word co-occurrences • ROUGE uses the number of overlapping units (Lin, 2004) • Nenkova and Passonneau (2004) proposed the pyramid evaluation method, which addresses the problem that different people select different content when writing summaries
  117. 117. ROUGE • ROUGE = Recall-Oriented Understudy for Gisting Evaluation (Lin, 2004) • inspired by BLEU (Bilingual Evaluation Understudy) used in machine translation (Papineni et al., 2002) • Developed by Chin-Yew Lin and available at http://berouge.com • Compares quality of a summary by comparison with ideal summaries • Metrics count the number of overlapping units • There are several versions depending on how the comparison is made
118. 118. ROUGE-N: N-gram co-occurrence statistics, a recall-oriented metric • S1 (reference): Police killed the gunman • S2: Police kill the gunman • S3: The gunman kill police • S2 = S3 according to ROUGE-1 (see the sketch below)
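An illustrative sketch of the n-gram recall idea behind ROUGE-N (a simplification, not the official ROUGE package); whitespace tokenisation, lower-casing and the function names are assumptions.

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n=1):
    """Recall: matched reference n-grams divided by total reference n-grams."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    matched = sum(min(count, cand[gram]) for gram, count in ref.items())
    total = sum(ref.values())
    return matched / total if total else 0.0

reference = "police killed the gunman"
print(rouge_n("police kill the gunman", reference))   # 0.75
print(rouge_n("the gunman kill police", reference))   # 0.75 -> ROUGE-1 cannot separate S2 and S3
```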
119. 119. ROUGE-L: Longest common subsequence • S1: police killed the gunman • S2: police kill the gunman • S3: the gunman kill police • S2 = 3/4 (police the gunman) • S3 = 2/4 (the gunman) • S2 > S3 (see the sketch below)
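A sketch of the longest-common-subsequence recall used by ROUGE-L (again a simplification, not the official package; the names are assumptions).

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            table[i][j] = table[i - 1][j - 1] + 1 if x == y else max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]

def rouge_l_recall(candidate, reference):
    ref = reference.lower().split()
    return lcs_length(candidate.lower().split(), ref) / len(ref)

reference = "police killed the gunman"
print(rouge_l_recall("police kill the gunman", reference))   # 0.75  ("police the gunman")
print(rouge_l_recall("the gunman kill police", reference))   # 0.5   ("the gunman")
```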
  120. 120. ROUGE-W Weighted Longest Common Subsequence • S1: [A B C D E F G] • S2: [A B C D H I J] • S3: [A H B J C I D] • ROUGE-W favours consecutive matches • S2 better than S3
121. 121. ROUGE-S: Skip-bigram recall metric • skip-bigrams are arbitrary in-sequence word pairs (gaps allowed) • S1: police killed the gunman (“police killed”, “police the”, “police gunman”, “killed the”, “killed gunman”, “the gunman”) • S2: police kill the gunman (“police the”, “police gunman”, “the gunman”) • S3: the gunman kill police (“the gunman”) • S4: the gunman police killed (“police killed”, “the gunman”) • S2 better than S4 better than S3 • ROUGE-SU adds unigrams to ROUGE-S (see the sketch below)
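A sketch of skip-bigram recall in the spirit of ROUGE-S (simplified: repeated words and the ROUGE-SU unigram extension are ignored, and all names are assumptions).

```python
from itertools import combinations

def skip_bigrams(sentence):
    """All in-order word pairs of a sentence, with arbitrary gaps allowed."""
    return set(combinations(sentence.lower().split(), 2))

def rouge_s_recall(candidate, reference):
    ref = skip_bigrams(reference)
    return len(skip_bigrams(candidate) & ref) / len(ref)

reference = "police killed the gunman"
for candidate in ["police kill the gunman", "the gunman kill police", "the gunman police killed"]:
    print(candidate, rouge_s_recall(candidate, reference))
# prints roughly 0.5, 0.17 and 0.33, i.e. S2 > S4 > S3 as on the slide
```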
122. 122. ROUGE • Experiments on DUC 2000 – 2003 data show good correlation with human judgement • Using multiple references achieved better correlation with human judgement than using a single reference • Stemming and removing stopwords improved the correlation with human judgement
123. 123. Task-based evaluation • is an extrinsic and on-line evaluation • instead of evaluating the summaries directly, humans are asked to perform tasks using summaries and the accuracy of these tasks is measured • the assumption is that the accuracy does not decrease when good summaries are used • the time needed should decrease • Examples of tasks: classification of summaries according to predefined classes (Saggion and Lapalme, 2000), determining the relevance of a summary to a topic (Miike et al., 1994; Oka and Ueda, 2000), and reading comprehension (Morris, Kasper, and Adams, 1992; Orăsan, Pekar, and Hasler, 2004)
124. 124. Task-based evaluation • this evaluation can be very useful because it assesses summaries in realistic situations • it is time consuming and requires humans to be involved in the evaluation process • in order to obtain statistically significant results a large number of judges have to be involved • this evaluation method has been used in evaluation conferences
  125. 125. Automatic evaluation • extrinsic and off-line evaluation method • tries to replace humans in task-based evaluations with automatic methods which perform the same task and are evaluated automatically • Examples: • text retrieval (Brandow, Mitze, and Rau, 1995): increase in precision but drastic reduction of recall • text categorisation (Kolcz, Prabakarmurthi, and Kalita, 2001): the performance of categorisation increases • has the advantage of being fast and cheap, but in many cases the tasks which can benefit from summaries are as difficult to evaluate as automatic summarisation (e.g. Kuo et al. (2002) proposed to use QA)
130. 130. From intrinsic to extrinsic evaluation (Sparck Jones, 2007) • semi-purpose: inspection (e.g. for proper English) • quasi-purpose: comparison with models (e.g. ngrams, nuggets) • pseudo-purpose: simulation of task contexts (e.g. action scenarios) • full-purpose: operation in task context (e.g. report writing)
131. 131. Evaluation conferences • evaluation conferences are conferences where all the participants have to complete the same task on a common set of data • these conferences allow direct comparison between the participants • such conferences have driven rapid advances in their fields: MUC (information extraction), TREC (information retrieval & question answering), CLEF (question answering for non-English languages and cross-lingual QA)
132. 132. SUMMAC • the first evaluation conference organised in automatic summarisation (in 1998) • 6 participants in the dry run and 16 in the formal evaluation • mainly extrinsic evaluation: • ad-hoc task: determine the relevance of the source document to a query (topic) • categorisation: assign to each document a category on the basis of its summary • question answering: answer questions using the summary • a small acceptability test where direct evaluation was used
133. 133. SUMMAC • the TREC dataset was used • for the ad-hoc evaluation 20 topics, each with 50 documents, were selected • the time for the ad-hoc task is halved, with a slight (not statistically significant) reduction in accuracy • for the categorisation task 10 topics, each with 100 documents (5 categories) • there is no difference in classification accuracy, and the time decreases only for the 10% summaries • more details can be found in (Mani et al., 1998)
134. 134. Text Summarization Challenge • an evaluation conference organised in Japan; its main goal is to evaluate Japanese summarisers • it was organised using the SUMMAC model • precision and recall were used to evaluate single-document summaries • humans had to assess the relevance, to a given query, of summaries of texts retrieved for that query • it also included some readability measures (e.g. how many deletions, insertions and replacements were necessary) • more details can be found in (Fukusima and Okumura, 2001; Okumura, Fukusima, and Nanba, 2003)
135. 135. Document Understanding Conference (DUC) • it is an evaluation conference organised as part of a larger programme called TIDES (Translingual Information Detection, Extraction and Summarisation) • organised from 2000 • at the beginning it was not that different from SUMMAC, but in time more difficult tasks were introduced: • 2001: single and multi-document generic summaries with 50, 100, 200, 400 words • 2002: single and multi-document generic abstracts with 50, 100, 200, 400 words, and multi-document extracts with 200 and 400 words • 2003: abstracts of documents and document sets with 10 and 100 words, and focused multi-document summaries
136. 136. Document Understanding Conference • in 2004 participants were required to produce short (<665 bytes) and very short (<75 bytes) summaries of single documents and document sets, short document profiles, and headlines • from 2004 ROUGE has been used as an evaluation method • in 2005: short multiple-document summaries, user-oriented questions • in 2006: same as in 2005, but pyramid evaluation was also used • in 2007: 250-word summaries, a 100-word update task, pyramid evaluation used as a community effort • in 2008 DUC became TAC (Text Analysis Conference) • more information available at: http://duc.nist.gov/
  137. 137. Structure of the course 1 Introduction to automatic summarisation 2 Important methods in automatic summarisation How humans produce summaries Single-document summarisation methods Surface-based summarisation methods Machine learning methods Methods which exploit the discourse structure Knowledge-rich methods Multi-document summarisation methods 3 Automatic summarisation and the Internet
138. 138. Ideal summary processing model: Source text(s) → [Interpretation] → Source representation → [Transformation] → Summary representation → [Generation] → Summary text
  139. 139. How humans produce summaries
140. 140. How humans summarise documents • Determining how humans summarise documents is a difficult task because it requires interdisciplinary research • Endres-Niggemeyer (1998) breaks the process into three stages: document exploration, relevance assessment and summary production • these stages were determined through interviews with professional summarisers • professional summarisers use a top-down approach • the expert summarisers do not attempt to understand the source in great detail; instead they are trained to identify snippets which contain important information • very few automatic summarisation methods use an approach similar to that of humans
  141. 141. Document exploration • it’s the first step • the source’s title, outline, layout and table of contents are examined • the genre of the texts is investigated because very often each genre dictates a certain structure • For example expository texts are expected to have a problem-solution structure • the abstractor’s knowledge about the source is represented as a schema. • schema = an abstractor’s prior knowledge of document types and their information structure
142. 142. Relevance assessment • at this stage summarisers identify the theme and the thematic structure • theme = a structured mental representation of what the document is about • this structure allows identification of relations between text chunks • it is used to identify important information and to delete irrelevant and unnecessary information • the schema is populated with elements from the thematic structure, producing an extended structure of the theme
143. 143. Summary production • the summary is produced from the expanded structure of the theme • in order to avoid producing a distorted summary, summarisers rely mainly on copy/paste operations • the chunks which are copied are reorganised to fit the new structure • standard sentence patterns are also used • summary production is a long process which requires several iterations • checklists can be used
  144. 144. Single-document summarisation methods
  149. 149. Single document summarisation • Produces summaries from a single document • There are two main approaches: • automatic text extraction → produces extracts also referred to as extract and rearrange • automatic text abstraction → produces abstracts also referred to as understand and generate • Automatic text extraction is the most used method to produce summaries
150. 150. Automatic text extraction • Extracts important sentences from the text using different methods and produces an extract by displaying the important sentences (usually in order of appearance) • A large proportion of the sentences used in human-produced summaries have been extracted directly from the text or contain only minor modifications • Uses different statistical, surface-based and machine learning techniques to determine which sentences are important • First attempts were made in the 1950s
  151. 151. Automatic text extraction • These methods are quite robust • The main drawback of this method is that it overlooks the way in which relationships between concepts in the text are realised by the use of anaphoric links and other discourse devices • Extracting paragraphs can solve some of these problems • Some methods involve excluding the unimportant sentences instead of extracting the important sentences
  152. 152. Surface-based summarisation methods
  153. 153. Term-based summarisation • It was the first method used to produce summaries by Luhn (1958) • Relies on the assumption that important sentences have a large number of important words • The importance of a word is calculated using statistical measures • Even though this method is very simple it is still used in combination with other methods • A demo summariser which relies on term frequency can be found at: http://clg.wlv.ac.uk/projects/CAST/demos.php
154. 154. How to compute the importance of a word • Different methods can be used: • Term frequency: how frequent a word is in the document • TF*IDF: relies on how frequent a word is in a document and in how many documents from a collection the word appears: TF*IDF(w) = TF(w) * log(Number of documents / Number of documents with w) • other statistical measures; for examples see (Orăsan, 2009) • Issues: • stoplists should be used • what should be counted: words, lemmas, truncation, stems • how to select the document collection • (a sketch follows below)
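A minimal sketch of the TF*IDF computation above (illustrative only; in practice a stoplist and lemmatisation would normally be applied first, and the exact smoothing of the IDF term varies between implementations).

```python
import math
from collections import Counter

def tf_idf_scores(document_tokens, collection):
    """TF*IDF score for every word of a document.

    `document_tokens` is the tokenised document; `collection` is a list of
    tokenised background documents used to estimate document frequencies.
    """
    tf = Counter(document_tokens)
    n_docs = len(collection)
    scores = {}
    for word, freq in tf.items():
        docs_with_word = sum(1 for doc in collection if word in doc)
        idf = math.log(n_docs / docs_with_word) if docs_with_word else 0.0
        scores[word] = freq * idf
    return scores
```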
158. 158. Term-based summarisation: the algorithm (the same pipeline can be used with other sentence-scoring methods) 1 Score all the words in the source according to the selected measure 2 Score all the sentences in the text by adding the scores of the words in these sentences 3 Extract the sentences with the top N scores 4 Present the extracted sentences in the original order (a minimal sketch follows below)
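The sketch below ties the four steps together (illustrative only; `word_scores` could come from the TF*IDF sketch above, and every name here is hypothetical).

```python
def term_based_summary(sentences, word_scores, n=3):
    """Score sentences by summing word scores, take the top n, keep document order.

    `sentences` is a list of tokenised sentences; `word_scores` maps words to
    importance values (e.g. term frequency or TF*IDF).
    """
    scored = [(sum(word_scores.get(word, 0.0) for word in sentence), index)
              for index, sentence in enumerate(sentences)]
    top = sorted(sorted(scored, reverse=True)[:n], key=lambda pair: pair[1])
    return [" ".join(sentences[index]) for _, index in top]
```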
159. 159. Position method • It was noticed that in some genres important sentences appear in predefined positions • First used by Edmundson (1969) • Depends very much on the genre: • newswire: lead summary — the first few sentences of the text • scientific papers: the first/last sentences in a paragraph are relevant for the topic of the paragraph (Baxendale, 1958) • scientific papers: important information occurs in specific sections of the document (introduction/conclusion) • Lin and Hovy (1997) use a corpus to determine where these important sentences occur
160. 160. Title method • words in titles and headings are positively relevant to summarisation • Edmundson (1969) noticed that increasing the score of sentences which include such words can lead to an increase in performance of up to 8%
  161. 161. Cue words/indicating phrases • Makes use of words or phrases classified as ”positive” or ”negative” which may indicate the topicality and thus the sentence value in an abstract • positive: significant, purpose, in this paper, we show, • negative: Figure 1, believe, hardly, impossible, pronouns • Paice (1981) proposes indicating phrases which are basically patterns (e.g. [In] this paper/report/article we/I show)
162. 162. Methods inspired by IR (Salton et al., 1997) • decompose a document into a set of paragraphs • compute the similarity between paragraphs, which represents the strength of the link between two paragraphs • paragraphs are considered similar if their similarity is above a threshold • paragraphs can be extracted according to different strategies (e.g. the number of links they have, selecting connected paragraphs)
163. 163. How to combine different methods • Edmundson (1969) used a linear combination of features: Weight(S) = α·Title(S) + β·Cue(S) + γ·Keyword(S) + δ·Position(S) • the weights were adjusted manually • the best system was cue + title + position • it is better to use machine learning methods to combine the results of different modules (see the sketch below)
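A minimal sketch of this linear combination (illustrative; the feature values are assumed to be pre-computed and normalised per sentence, and the default weights are placeholders, not the values tuned by Edmundson).

```python
def edmundson_weight(features, alpha=1.0, beta=1.0, gamma=1.0, delta=1.0):
    """Weight(S) = alpha*Title(S) + beta*Cue(S) + gamma*Keyword(S) + delta*Position(S)."""
    return (alpha * features["title"]
            + beta * features["cue"]
            + gamma * features["keyword"]
            + delta * features["position"])

# hypothetical sentence: contains a title word, no cue phrase, some keywords, lead position
print(edmundson_weight({"title": 1.0, "cue": 0.0, "keyword": 0.6, "position": 1.0}))
```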
  164. 164. Machine learning methods
  165. 165. What is machine learning (ML)? Mitchell (1997): • “machine learning is concerned with the question of how to construct computer programs that automatically improve with experience” • “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E”
166. 166. What is machine learning? (2) • Reasoning is based on the similarity between new situations and the ones present in the training corpus • In some cases it is possible to understand what is learnt (e.g. if-then rules) • But in many cases the knowledge learnt by an algorithm cannot be easily understood (instance-based learning, neural networks)
  167. 167. ML for language processing • Has been widely employed in a large number of NLP applications which range from part-of-speech tagging and syntactic parsing to word-sense disambiguation and coreference resolution. • In NLP both symbolic methods (e.g. decision trees, instance-based classifiers) and numerically oriented statistical and neural-network training approaches were used
  168. 168. ML as classification task Very often an NLP problem can be seen as a classification problem • POS: finding the appropriate class of a word • Segmentation (e.g. noun phrase extraction): each word is classified as the beginning, end or inside of the segment • Anaphora/coreference resolution: classify candidates in antecedent/non-antecedent
169. 169. Summarisation as a classification task • Each example (instance) in the set to be learnt can be described by a set of features f1, f2, ..., fn • The task is to find a way to assign an instance to one of the m disjoint classes c1, c2, ..., cm • The automatic summarisation process is usually transformed into a classification one • The features are different properties of sentences (e.g. position, keywords, etc.) • Two classes: extract/do-not-extract • It is not always classification: it is possible to use scores or automatically learnt rules as well
170. 170. Kupiec et al. (1995) • used a Bayesian classifier to combine different features • the features were: • whether the length of a sentence is above a threshold (true/false) • contains cue words (true/false) • position in the paragraph (initial/middle/final) • contains keywords (true/false) • contains capitalised words (true/false) • the training and testing corpus consisted of 188 documents with summaries • humans identified the sentences from the full text which are used in the summary • the best combination was position + cue + length • Teufel and Moens (1997) used a similar method for sentence extraction (a simplified sketch follows below)
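A simplified sketch of a Naive Bayes classifier over discrete sentence features in the spirit of this approach (not the original implementation; the feature names, the add-one smoothing and all identifiers are assumptions).

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """`examples` is a list of (feature_dict, label) pairs, e.g.
    ({"cue": True, "long_enough": True, "position": "initial"}, "extract")."""
    class_counts = Counter(label for _, label in examples)
    value_counts = defaultdict(Counter)   # (feature, label) -> counts of feature values
    feature_values = defaultdict(set)     # feature -> values seen in training
    for features, label in examples:
        for feature, value in features.items():
            value_counts[(feature, label)][value] += 1
            feature_values[feature].add(value)
    return class_counts, value_counts, feature_values

def classify(features, model):
    """Return the most probable label under Naive Bayes with add-one smoothing."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in class_counts.items():
        score = math.log(count / total)                       # log P(label)
        for feature, value in features.items():
            counts = value_counts[(feature, label)]
            vocabulary = len(feature_values[feature]) or 1
            score += math.log((counts[value] + 1) / (sum(counts.values()) + vocabulary))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```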
  171. 171. Mani and Bloedorn (1998) • learn rules about how to classify sentences • features used: • location features: location of sentence in paragraph, sentence in special section, etc. • thematic features: tf score, tf*idf score, number of section heading words • cohesion features: number of sentences with a synonym link to sentence • user focused features: number of terms relevant to the topic • Example of rule learnt: IF sentence in conclusion & tf*idf high & compression = 20% THEN summary sentence
  172. 172. Other ML methods • Osborne (2002) used maximum entropy with features such as word pairs, sentence length, sentence position, discourse features (e.g., whether sentence follows the “Introduction”, etc.) • Knight and Marcu(2000) use noisy channel for sentence compression • Conroy et. al. (2001) use HMM • Most of the methods these days try to use machine learning
  173. 173. Methods which exploit the discourse structure
  174. 174. Methods which exploit discourse cohesion • summarisation methods which use discourse structure usually produce better quality summaries because they consider the relations between the extracted chunks • they rely on global discourse structure • they are more difficult to implement because very often the theories on which they are based are difficult and not fully understood • there are methods which use text cohesion and text coherence • very often it is difficult to control the length of summaries produced in this way
  176. 176. Methods which exploit text cohesion • text cohesion involves relations between words, word senses, referring expressions which determine how tightly connected the text is • (S13) ”All we want is justice in our own country,” aboriginal activist Charles Perkins told Tuesday’s rally. ... (S14) ”We don’t want budget cuts - it’s hard enough as it is ,” said Perkins • there are methods which exploit lexical chains and coreferential chains
177. 177. Lexical chains for text summarisation • Telepattan system: Benbrahim and Ahmad (1995) • two sentences are linked if their words are related by repetition, synonymy, class/superclass or paraphrase • sentences which have a number of links above a threshold form a bond • on the basis of the bonds a sentence has with the previous and following sentences, it is possible to classify sentences as topic start, topic middle or topic end • sentences are extracted on the basis of this open–continue–end topic structure • Barzilay and Elhadad (1997) implemented a more refined version of the algorithm which includes ambiguity resolution
  178. 178. Using coreferential chains for text summarisation • method presented in (Azzam, Humphreys, Gaizauskas, 1999) • the underlying idea is that it is possible to capture the most important topic of a document by using a principal coreferential chain • The LaSIE system was used to produce the coreferential chains extended with a focus-based algorithm for resolution of pronominal anaphora
179. 179. Coreference chain selection The summarisation module implements several selection criteria: • Length of chain: prefers the chain which contains most entries, i.e. the most frequently mentioned instance in the text • Spread of the chain: the distance between the earliest and the latest entry in each chain • Start of chain: prefers a chain which starts in the title or in the first paragraph of the text (this criterion could be very useful for some genres such as newswire)
  180. 180. Summarisation methods which use rhetorical structure of texts • it is based on the Rhetorical Structure Theory (RST) (Mann and Thompson, 1988) • according to this theory text is organised in non-overlapping spans which are linked by rhetorical relations and can be organised in a tree structure • there are two types of spans: nuclei and satellites • a nucleus can be understood without satellites, but not the other way around • satellites can be removed in order to obtain a summary • the most difficult part is to build the rhetorical structure of a text • Ono, Sumita and Miike (1994), Marcu (1997) and Corston-Oliver (1998) present summarisation methods which use the rhetorical structure of the text
181. 181. [Figure from (Marcu, 2000), not reproduced in this transcript]
  182. 182. Summarisation using argumentative zoning • Teufel and Moens (2002) exploit the structure of scientific documents in order to produce summaries • the summarisation process is split into two parts 1 identification of important sentences using an approach similar to the one proposed by Kupiec, Pederson, and Chen (1995) 2 recognition of the rhetorical roles of the extracted sentences • for rhetorical roles the following classes are used: Aim, Textual, Own, Background, Contrast, Basis, Other
  183. 183. Knowledge-rich methods
184. 184. Knowledge-rich methods • Produce abstracts • Most of them try to “understand” (at least partially) a text and to make inferences before generating the summary • The systems do not really understand the contents of the documents, but they use different techniques to extract the meaning • Since this process requires a huge amount of world knowledge, such systems are restricted to a specific domain
185. 185. Knowledge-rich methods • The abstracts obtained in this way are better in terms of cohesion and coherence • The abstracts produced in this way tend to be more informative • This method is also known as the understand and generate approach • This method extracts the information from the text and holds it in some intermediate form • The representation is then used as the input for a natural language generator to produce an abstract
  186. 186. FRUMP (deJong, 1982) • uses sketchy scripts to understand a situation • these scripts only keep the information relevant to the event and discard the rest • 50 scripts were manually created • words from the source activate scripts and heuristics are used to decide which script is used in case more than one script is activated
  194. 194. Example of script used by FRUMP 1 The demonstrators arrive at the demonstration location 2 The demonstrators march 3 The police arrive on the scene 4 The demonstrators communicate with the target of the demonstration 5 The demonstrators attack the target of the demonstration 6 The demonstrators attack the police 7 The police attack the demonstrators 8 The police arrest the demonstrators
195. 195. FRUMP • the evaluation of the system revealed that it could not process a large number of stories because it did not have the appropriate scripts • the system is very difficult to port to a different domain • sometimes it misunderstands the input: “Vatican City. The death of the Pope shakes the world. He passed away” → “Earthquake in the Vatican. One dead.” • the advantage of this method is that the output can be in any language
196. 196. Concept-based abstracting (Paice and Jones, 1993) • Also referred to as extract and generate • Summaries in the field of agriculture • Relies on predefined text patterns such as “this paper studies the effect of [AGENT] on the [HLP] of [SPECIES]” → “This paper studies the effect of G. pallida on the yield of potato.” • The summarisation process involves instantiation of patterns with concepts from the source • Each pattern has a weight which is used to decide whether the generated sentence is included in the output • This method is well suited to producing informative summaries
  197. 197. Other knowledge-rich methods • Rumelhart (1975) developed a system to understand and summarise simple stories, using a grammar which generated semantic interpretations of the story on the basis of hand-coded rules. • Alterman (1986) used local understanding • Fum, Guida, and Tasso (1985) tries to replicate the human summarisation process • Rau, Jacobs, and Zernik (1989) integrates a bottom-up linguistic analyser and a top-down conceptual interpretation
  198. 198. Multi-document summarisation methods
199. 199. Multi-document summarisation • multi-document summarisation is the extension of single-document summarisation to collections of related documents • methods from single-document summarisation can rarely be used directly • it is not possible to produce single-document summaries for every document in the collection and then concatenate them • normally multi-document summaries are user-focused summaries
  200. 200. Issues with multi-document summaries • the collections to be summarised can vary a lot in size, so different methods might need to be used • a much higher compression rate is needed • redundancy • ordering of sentences (usually the date of publication is used) • similarities and differences between different texts need to be considered • contradiction between information • fragmentary information
201. 201. IR-inspired methods • the method of Salton et al. (1997) can be adapted to multi-document summarisation • instead of using paragraphs from one document, paragraphs from all the documents are used • the extraction strategies are kept
202. 202. Maximal Marginal Relevance • proposed by Goldstein et al. (2000) • addresses the redundancy among multiple documents • allows a balance between the diversity of the information and relevance to a user query • MMR(Q, R, S) = argmax_{Di ∈ R\S} [ λ·Sim1(Di, Q) − (1 − λ)·max_{Dj ∈ S} Sim2(Di, Dj) ], where Q is the query, R the set of candidate units and S the units already selected • can also be used for single-document summarisation (see the sketch below)
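A sketch of greedy MMR selection over bag-of-words cosine similarities (illustrative only: both Sim1 and Sim2 are taken to be cosine similarity here, and the λ value, tokenisation and all names are assumptions).

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    numerator = sum(a[w] * b[w] for w in set(a) & set(b))
    denominator = (math.sqrt(sum(v * v for v in a.values()))
                   * math.sqrt(sum(v * v for v in b.values())))
    return numerator / denominator if denominator else 0.0

def mmr_select(query, candidates, k=3, lam=0.7):
    """Greedily pick k candidates, balancing query relevance against redundancy."""
    q = Counter(query.lower().split())
    docs = [Counter(c.lower().split()) for c in candidates]
    selected, remaining = [], list(range(len(docs)))
    while remaining and len(selected) < k:
        def mmr(i):
            redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
            return lam * cosine(docs[i], q) - (1 - lam) * redundancy
        best = max(remaining, key=mmr)
        selected.append(best)
        remaining.remove(best)
    return [candidates[i] for i in selected]
```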
203. 203. Cohesion text maps • use knowledge based on lexical cohesion (Mani and Bloedorn, 1999) • good for comparing pairs of documents and telling what is common and what is different • build a graph from the texts: the nodes of the graph are the words of the text; arcs represent adjacency, grammatical, co-reference and lexical similarity-based relations • sentences are scored using the tf*idf metric • the user query is used to traverse the graph (spreading activation is used) • to minimise redundancy in extracts, extraction can be greedy so as to cover as many different terms as possible
  204. 204. Cohesion text maps
205. 205. Theme fusion (Barzilay et al., 1999) • used to avoid redundancy in multi-document summaries • Theme = collection of similar sentences drawn from one or more related documents • Computes the theme intersection: phrases which are common to all sentences in a theme • paraphrasing rules are used (active vs. passive, different orders of adjuncts, classifier vs. apposition, ignoring certain premodifiers in NPs, synonymy) • generation is used to put the theme intersection together
206. 206. Centroid-based summarisation • a centroid = a set of words that are statistically important to a cluster of documents • each document is represented as a weighted vector of TF*IDF scores • each sentence receives a score equal to the sum of the centroid values of its words • sentence salience (Boguraev and Kennedy, 1999) • centroid score (Radev, Jing, and Budzikowska, 2000) • (see the sketch below)
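A minimal sketch of centroid-based sentence scoring (illustrative; the centroid size and all names are assumptions, and `tfidf` could be computed over the whole cluster with the earlier TF*IDF sketch).

```python
def centroid_sentence_scores(documents, tfidf, centroid_size=20):
    """Score every sentence in a cluster by the centroid values of its words.

    `documents` is a list of documents, each a list of tokenised sentences;
    `tfidf` maps words to TF*IDF values computed over the cluster.
    """
    centroid = dict(sorted(tfidf.items(), key=lambda kv: kv[1], reverse=True)[:centroid_size])
    scored = []
    for document in documents:
        for sentence in document:
            score = sum(centroid.get(word, 0.0) for word in sentence)
            scored.append((score, " ".join(sentence)))
    return sorted(scored, reverse=True)   # highest-scoring sentences first
```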
207. 207. Cross-document Structure Theory (CST) • Cross-document Structure Theory provides a theoretical model for issues that arise when trying to summarise multiple texts (Radev, Otterbacher, and Zhang, 2004) • it describes relationships between two or more sentences from different source documents related to the same topic • similar to RST, but at the cross-document level • 18 domain-independent relations between text spans, such as identity, equivalence, subsumption, contradiction, overlap, fulfilment and elaboration • can be used to extract sentences and avoid redundancy
  208. 208. Automatic summarisation and the Internet
  209. 209. • New research topics have emerged at the confluence of summarisation with other disciplines (e.g. question answering and opinion mining) • Many of these fields appeared as a result of the expansion of the Internet • The Internet is probably the largest source of information, but it is largely unstructured and heterogeneous • Multi-document summarisation is more necessary than ever • Web content mining = extraction of useful information from the Web
  210. 210. Challenges posed by the Web • Huge amount of information • Wide and diverse • Information of all types e.g. structured data, texts, videos, etc. • Semi-structured • Linked • Redundant • Noisy
211. 211. Summarisation of news on the Web • Newsblaster (McKeown et al., 2002) summarises news from the Web (http://newsblaster.cs.columbia.edu/) • it is mainly statistical, but with symbolic elements • it crawls the Web to identify stories (e.g. filters out ads), clusters them on specific topics and produces a multi-document summary • theme sentences are analysed and fused together to produce the summary • summaries also contain images, selected using high-precision rules • similar services: NewsInEssence, Google News, NewsExplorer • tracking and updating are important features of such systems
212. 212. Email summarisation • email summarisation is more difficult because emails have a dialogue structure • Muresan et al. (2001) use machine learning to learn rules for salient NP extraction • Nenkova and Bagga (2003) developed a set of rules to extract important sentences • Newman and Blitzer (2003) use clustering to group messages together and then extract a summary from each cluster • Rambow et al. (2004) automatically learn rules to extract sentences from emails • these methods do not use many email-specific features, but in general the subject of the first email is used as a query
213. 213. Blog summarisation • Zhou et al. (2006) see a blog entry as a summary of a news story with personal opinions added; they produce a summary by deleting sentences not related to the story • Hu et al. (2007) use a blog’s comments to identify words that can be used to extract sentences from the blog • Conrad et al. (2009) developed query-based opinion summarisation for legal blog entries based on their TAC 2008 system
214. 214. Opinion mining and summarisation • find what reviewers liked and disliked about a product • there is usually a large number of reviews, so an opinion summary should be produced • visualisation of the result is important, and it may not be text • analogous to, but different from, multi-document summarisation
215. 215. Producing the opinion summary A three-stage process: 1 Extract the object features that have been commented on in each review 2 Classify each opinion 3 Group feature synonyms and produce the summary (pros vs. cons, detailed review, graphical representation)
216. 216. Opinion summaries • Mao and Lebanon (2007) suggest producing summaries that track the sentiment flow within a document, i.e. how sentiment orientation changes from one sentence to the next • Pang and Lee (2008) suggest creating “subjectivity extracts” • sometimes graph-based output seems much more appropriate or useful than text-based output • in traditional summarisation redundant information is often discarded, whereas in opinion summarisation one wants to track and report the degree of redundancy, since the user is typically interested in the (relative) number of times a given sentiment is expressed in the corpus • there is much more contradictory information
217. 217. Opinion summarisation at TAC • the Text Analysis Conference 2008 (TAC) included an opinion summarisation task on blogs • http://www.nist.gov/tac/ • the task: generate summaries of opinions about targets • e.g. What features do people dislike about Vista? • a question answering system is used to extract snippets that are passed to the summariser
218. 218. QA and summarisation at INEX 2009 • the QA track at INEX 2009 requires participants to answer factual and complex questions • the complex questions require aggregating the answer from several documents • e.g. What are the main applications of Bayesian networks in the field of bioinformatics? • for complex questions, evaluators will mark syntactic incoherence, unresolved anaphora, redundancy and failure to answer the question • Wikipedia will be used as the document collection
219. 219. Conclusions • research in automatic summarisation is still very active, but in many cases it merges with other fields • evaluation is still a problem in summarisation • the current state of the art is still sentence extraction • more language understanding needs to be added to the systems
  220. 220. Thank you! More information and updates at: http://www.summarizationonline.info
  221. 221. References
222. 222. Alterman, Richard. 1986. Summarisation in small. In N. Sharkey, editor, Advances in cognitive science. Chichester, England, Ellis Horwood. American National Standards Institute Inc. 1979. American National Standard for Writing Abstracts. Technical Report ANSI Z39.14 – 1979, American National Standards Institute, New York. Baxendale, Phyllis B. 1958. Man-made index for technical literature - an experiment. I.B.M. Journal of Research and Development, 2(4):354 – 361. Boguraev, Branimir and Christopher Kennedy. 1999. Salience-based content characterisation of text documents. In Inderjeet Mani and Mark T. Maybury, editors, Advances in Automated Text Summarization. The MIT Press, pages 99 – 110. Borko, Harold and Charles L. Bernier. 1975. Abstracting concepts and methods. Academic Press, London. Brandow, Ronald, Karl Mitze, and Lisa F. Rau. 1995. Automatic condensation of electronic publications by sentence selection. Information Processing & Management, 31(5):675 – 685. Cleveland, Donald B. 1983. Introduction to Indexing and Abstracting. Libraries Unlimited, Inc. Conroy, James M., Judith D. Schlesinger, Dianne P. O’Leary, and Mary E. Okurowski. 2001. Using HMM and logistic regression to generate extract summaries for DUC. In Proceedings of the 1st Document Understanding Conference, New Orleans, Louisiana USA, September 13-14. DeJong, G. 1982. An overview of the FRUMP system. In W. G. Lehnert and M. H. Ringle, editors, Strategies for natural language processing. Hillsdale, NJ: Lawrence Erlbaum, pages 149 – 176. Edmundson, H. P. 1969. New methods in automatic extracting. Journal of the Association for Computing Machinery, 16(2):264 – 285, April.
223. 223. Endres-Niggemeyer, Brigitte. 1998. Summarizing information. Springer. Fukusima, Takahiro and Manabu Okumura. 2001. Text Summarization Challenge: Text summarization evaluation in Japan (TSC). In Proceedings of Automatic Summarization Workshop. Fum, Danilo, Giovanni Guida, and Carlo Tasso. 1985. Evaluating importance: a step towards text summarisation. In Proceedings of the 9th International Joint Conference on Artificial Intelligence, pages 840 – 844, Los Altos CA, August. Goldstein, Jade, Mark Kantrowitz, Vibhu Mittal, and Jaime Carbonell. 1999. Summarizing text documents: Sentence selection and evaluation metrics. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 121 – 128, Berkeley, California, August, 15 – 19. Goldstein, Jade, Vibhu O. Mittal, Jamie Carbonell, and Mark Kantrowitz. 2000. Multi-Document Summarization by Sentence Extraction. In Udo Hahn, Chin-Yew Lin, Inderjeet Mani, and Dragomir R. Radev, editors, Proceedings of the Workshop on Automatic Summarization at the 6th Applied Natural Language Processing Conference and the 1st Conference of the North American Chapter of the Association for Computational Linguistics, Seattle, WA, April. Graetz, Naomi. 1985. Teaching EFL students to extract structural information from abstracts. In J. M. Ulign and A. K. Pugh, editors, Reading for Professional Purposes: Methods and Materials in Teaching Languages. Leuven: Acco, pages 123–135. Hasler, Laura, Constantin Orăsan, and Ruslan Mitkov. 2003. Building better corpora for summarisation. In Proceedings of Corpus Linguistics 2003, pages 309 – 319, Lancaster, UK, March, 28 – 31. Hovy, Eduard. 2003. Text summarisation. In Ruslan Mitkov, editor, The Oxford Handbook of computational linguistics. Oxford University Press, pages 583 – 598.
  224. 224. Jing, Hongyan and Kathleen R. McKeown. 1999. The decomposition of human-written summary sentences. In Proceedings of the 22nd International Conference on Research and Development in Information Retrieval (SIGIR’99), pages 129 – 136, University of Berkeley, CA, August. Johnson, Frances. 1995. Automatic abstracting research. Library review, 44(8):28 – 36. Knight, Kevin and Daniel Marcu. 2000. Statistics-based summarization — step one: Sentence compression. In Proceedings of the 17th National Conference on Artificial Intelligence (AAAI), pages 703 – 710, Austin, Texas, USA, July 30 – August 3. Kolcz, Aleksander, Vidya Prabakarmurthi, and Jugal Kalita. 2001. Summarization as feature selection for text categorization. In Proceedings of the 10th International Conference on Information and Knowledge Management, pages 365 – 370, Atlanta, Georgia, US, October 05 - 10. Kuo, June-Jei, Hung-Chia Wung, Chuan-Jie Lin, and Hsin-Hsi Chen. 2002. Multi-document summarization using informative words and its evaluation with a QA system. In Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2002), pages 391 – 401, Mexico City, Mexico, February, 17 – 23. Kupiec, Julian, Jan Pederson, and Francine Chen. 1995. A trainable document summarizer. In Proceedings of the 18th ACM/SIGIR Annual Conference on Research and Development in Information Retrieval, pages 68 – 73, Seattle, July 09 – 13. Lin, Chin-Yew. 2004. Rouge: a package for automatic evaluation of summaries. In Proceedings of the Workshop on Text Summarization Branches Out (WAS 2004), Barcelona, Spain, July 25 - 26. Lin, Chin-Yew and Eduard Hovy. 1997. Identifying topic by position. In Proceedings of the 5th Conference on Applied Natural Language Processing, pages 283 – 290, Washington, DC, March 31 – April 3.
225. 225. Louis, Annie and Ani Nenkova. 2009. Performance confidence estimation for automatic summarization. In Proceedings of the 12th Conference of the European Chapter of the ACL, pages 541 – 548, Athens, Greece, March 30 - April 3. Luhn, H. P. 1958. The automatic creation of literature abstracts. IBM Journal of research and development, 2(2):159 – 165. Mani, Inderjeet and Eric Bloedorn. 1998. Machine learning of generic and user-focused summarization. In Proceedings of the Fifteenth National Conference on Artificial Intelligence, pages 821 – 826, Madison, Wisconsin. MIT Press. Mani, Inderjeet and Eric Bloedorn. 1999. Summarizing similarities and differences among related documents. In Inderjeet Mani and Mark T. Maybury, editors, Advances in automatic text summarization. The MIT Press, chapter 23, pages 357 – 379. Mani, Inderjeet, Therese Firmin, David House, Michael Chrzanowski, Gary Klein, Lynette Hirshman, Beth Sundheim, and Leo Obrst. 1998. The TIPSTER SUMMAC text summarisation evaluation: Final report. Technical Report MTR 98W0000138, The MITRE Corporation. Mani, Inderjeet and Mark T. Maybury, editors. 1999. Advances in automatic text summarisation. MIT Press. Marcu, Daniel. 1999. The automatic construction of large-scale corpora for summarization research. In The 22nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’99), pages 137–144, Berkeley, CA, August 15 – 19. Marcu, Daniel. 2000. The theory and practice of discourse parsing and summarisation. The MIT Press. Miike, Seiji, Etsuo Itoh, Kenji Ono, and Kazuo Sumita. 1994. A full-text retrieval system with a dynamic abstract generation function. In Proceedings of the 17th ACM SIGIR conference, pages 152 – 161, Dublin, Ireland, 3-6 July. ACM/Springer.