Query-Based Summarization. Mariana Damova, 30.07.2010
  1. Query-Based Summarization. Mariana Damova, 30.07.2010
  2. Outline <ul><li>Definition of the task </li></ul><ul><li>DUC evaluation criteria </li></ul><ul><li>General purpose approaches </li></ul><ul><li>Application tailored systems </li></ul><ul><li>Conclusion </li></ul>
  3. The task of query-based summarization <ul><li>Producing a summary from a document or a set of documents satisfying a request for information expressed by a query. </li></ul><ul><li>The summary is a sequence of sentences, which can be extracted from the documents or produced with NLP techniques. </li></ul>
  4. Types of summaries <ul><li>Summary construction methods </li></ul><ul><ul><li>Abstractive vs. Extractive </li></ul></ul><ul><li>Number of sources for the summary </li></ul><ul><ul><li>Single-document summaries vs. Multi-document summaries </li></ul></ul><ul><li>Summary trigger </li></ul><ul><ul><li>Generic vs. query-based </li></ul></ul><ul><ul><ul><li>Indicative </li></ul></ul></ul><ul><ul><ul><li>Informative </li></ul></ul></ul>
  5. Steps in the query-based summarization process <ul><li>Identification of relevant sections from the documents </li></ul><ul><li>Generation of the summary </li></ul>
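The two steps above can be sketched as a minimal extractive pipeline. Everything here is illustrative: scoring a sentence by raw query-term overlap is a stand-in for the relevance-identification methods surveyed in the following slides, not a technique from the deck itself.

```python
# Minimal sketch of query-based extractive summarization:
# (1) identify relevant sentences via query-term overlap,
# (2) generate the summary from the top-ranked sentences.

def summarize(sentences, query, limit=2):
    query_terms = set(query.lower().split())

    def score(sentence):
        # relevance = number of query terms the sentence contains
        return len(query_terms & set(sentence.lower().split()))

    ranked = sorted(sentences, key=score, reverse=True)
    return ranked[:limit]

docs = [
    "ROUGE is used to evaluate summaries.",
    "The weather was pleasant in July.",
    "Query-based summarization answers an information request.",
]
print(summarize(docs, "query summarization evaluation"))
```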
  6. Evaluation at DUC <ul><li>Recall-Oriented Understudy for Gisting Evaluation (ROUGE) </li></ul><ul><li>DUC conferences, run since 2001 by the National Institute of Standards and Technology (NIST) </li></ul><ul><li>Coverage: C = (number of MUs marked × E) / (total number of MUs in the model summary) </li></ul><ul><li>E, the ratio of completeness, ranges from 1 to 0: 1 for all, 3/4 for most, 1/2 for some, 1/4 for hardly any, and 0 for none. </li></ul>
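The coverage formula can be written as a small function. The E scale follows the mapping given on the slide; the label strings used as dictionary keys are an assumed encoding.

```python
# Sketch of the DUC coverage score described above:
# C = (number of model units marked * E) / (total model units),
# where E maps the assessor's completeness judgement to a ratio.

E_SCALE = {"all": 1.0, "most": 0.75, "some": 0.5, "hardly any": 0.25, "none": 0.0}

def coverage(marked_units, total_units, completeness):
    """marked_units: MUs found in the peer summary; completeness: assessor label."""
    e = E_SCALE[completeness]
    return (marked_units * e) / total_units

print(coverage(3, 4, "most"))  # 3 * 0.75 / 4 = 0.5625
```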
  7. Approaches based on Document graphs <ul><li>Ahmed A. Mohamed, Sanguthevar Rajasekaran (2006). Query-Based Summarization Based on Document Graphs. </li></ul><ul><li>The document graph is produced from a plain-text document by tokenizing it and parsing it into NPs. Relations of the types ISA and related_to are generated following heuristic rules. </li></ul><ul><li>A centric graph is produced from all source documents and guides the summarizer in its search for candidate sentences to add to the output summary. </li></ul><ul><li>Summarization: </li></ul><ul><li>(a) The centric graph is compared with the concepts in the query </li></ul><ul><li>(b) The graph of the document and a graph of the query are generated, and the similarity between each sentence and the query is measured </li></ul><ul><li>(c) A query modification technique is applied by adding the graph of a selected sentence to the query graph </li></ul>
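A rough sketch of steps (a) to (c), with a set of content words standing in for the NP-based document graph. The real system builds ISA and related_to relations; that part is simplified away here, and similarity is reduced to counting shared nodes.

```python
# Illustrative sketch of graph-based sentence selection: similarity is the
# number of shared "concept" nodes, and the query graph is expanded with each
# selected sentence's graph, as in the query-modification step (c).

def graph(text):
    # crude stand-in for NP extraction: longer words become graph nodes
    return {w for w in text.lower().split() if len(w) > 3}

def select(sentences, query, k=2):
    qgraph = graph(query)
    summary = []
    pool = list(sentences)
    for _ in range(k):
        # pick the sentence whose graph overlaps the query graph the most
        best = max(pool, key=lambda s: len(graph(s) & qgraph))
        summary.append(best)
        pool.remove(best)
        qgraph |= graph(best)  # query modification: merge the sentence graph
    return summary
```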
  8. Approaches based on Document graphs <ul><li>Wauter Bosma (2005). Query-Based Summarization using Rhetorical Structure Theory. </li></ul><ul><li>Shows how answers to questions can be improved by extracting additional information about the topic with single-document summarization techniques. </li></ul><ul><li>RST (Rhetorical Structure Theory) is used to create a graph representation of the document: a weighted graph in which each node represents a sentence and the weight of an edge represents the distance between two sentences. </li></ul><ul><li>If a sentence is relevant to an answer, a second sentence is evaluated as relevant too, based on the weight of the path between the two sentences. </li></ul><ul><li>Two-step approach: </li></ul><ul><ul><li>Relations between sentences are defined in a discourse graph </li></ul></ul><ul><ul><li>A graph search algorithm extracts the most salient sentences from the graph for the summary. </li></ul></ul>
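The weighted-graph idea can be illustrated with a standard shortest-path computation. Using Dijkstra's algorithm is an assumed concrete choice; the approach only requires path weights between sentences, with low total weight meaning high inherited relevance.

```python
# Sketch of the RST-graph idea: sentences are nodes, edge weights are
# rhetorical distances, and a sentence's relevance decays with the total
# weight of its path from a sentence already known to be relevant.
import heapq

def path_costs(edges, start, n):
    """Dijkstra over an undirected weighted sentence graph {(u, v): weight}."""
    adj = {i: [] for i in range(n)}
    for (u, v), w in edges.items():
        adj[u].append((v, w))
        adj[v].append((u, w))
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

# sentence 0 answers the query; nearby sentences inherit relevance
edges = {(0, 1): 1.0, (1, 2): 2.0, (0, 3): 5.0}
print(path_costs(edges, 0, 4))
```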
  9. Approaches using linguistics <ul><li>John M. Conroy, Judith D. Schlesinger, Jade Goldstein Stewart (2005). CLASSY Query-Based Multi-Document Summarization. </li></ul><ul><li>An HMM (Hidden Markov Model) for sentence selection within a document and a question-answering algorithm for generating a multi-document summary </li></ul><ul><li>Patterns with lexical cues for sentence and phrase elimination </li></ul><ul><li>Typographic cues (title, paragraph, etc.) to detect the topic description and obtain question-answering capability </li></ul><ul><li>A named-entity identifier, run on all document sets, generates lists of entities for the categories location, person, date, and organization, and evaluates each topic description based on keywords </li></ul><ul><li>After the linguistic processing is done and the query terms are generated, the HMM is used to score the individual sentences as summary or non-summary sentences </li></ul>
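A drastically simplified two-state illustration of HMM sentence labeling: all probabilities below are invented, and the observation is reduced to whether a sentence contains a query term. CLASSY's actual model is far richer; this only shows the summary / non-summary state machinery.

```python
# Toy HMM: two hidden states (summary / non-summary); the observation for
# each sentence is a boolean "contains a query term". Viterbi decoding
# recovers the most likely state sequence. All probabilities are invented.
import math

STATES = ("summary", "non-summary")
START = {"summary": 0.3, "non-summary": 0.7}
TRANS = {"summary": {"summary": 0.5, "non-summary": 0.5},
         "non-summary": {"summary": 0.2, "non-summary": 0.8}}
EMIT = {"summary": {True: 0.8, False: 0.2},       # P(query term | state)
        "non-summary": {True: 0.3, False: 0.7}}

def viterbi(observations):
    """Most likely state sequence for a list of booleans."""
    scores = {s: math.log(START[s]) + math.log(EMIT[s][observations[0]])
              for s in STATES}
    back = []
    for obs in observations[1:]:
        prev = scores
        # best predecessor for each current state
        back.append({s: max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
                     for s in STATES})
        scores = {s: prev[back[-1][s]] + math.log(TRANS[back[-1][s]][s])
                  + math.log(EMIT[s][obs]) for s in STATES}
    state = max(STATES, key=scores.get)
    path = [state]
    for pointers in reversed(back):
        path.insert(0, pointers[path[0]])
    return path
```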
  10. Approaches using linguistics <ul><li>Liang Zhou, Chin-Yew Lin, Eduard Hovy (2006). Summarizing Answers for Complicated Questions. </li></ul><ul><li>Query interpretation is used to analyze the given user profile and topic narrative for document clusters; then the summary is created </li></ul><ul><li>The analysis is based on basic elements, a head-modifier-relation triple representation of the document content produced from a syntactic parse tree, and a set of ‘cutting rules’ extracting just the valid basic elements from the tree </li></ul><ul><li>Scores are assigned to the sentences based on their basic elements </li></ul><ul><li>Filtering and redundancy removal techniques are applied before generating the summary </li></ul><ul><li>The summary outputs the topmost sentences until the required sentence limit is reached </li></ul>
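The basic-element scoring can be sketched as follows, with adjacent word pairs standing in for real head-modifier-relation triples, which would require a syntactic parser and the cutting rules described above.

```python
# Sketch of basic-element (BE) scoring. Real BEs are head-modifier-relation
# triples from a parse tree; adjacent word pairs stand in for them here.

def basic_elements(text):
    words = text.lower().split()
    return set(zip(words, words[1:]))

def rank_by_be(sentences, query_sentences):
    query_bes = set()
    for q in query_sentences:
        query_bes |= basic_elements(q)
    # score each sentence by how many of its BEs match the query's BEs
    scored = [(len(basic_elements(s) & query_bes), s) for s in sentences]
    # topmost sentences first, as in the final summary-generation step
    return [s for score, s in sorted(scored, reverse=True)]
```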
  11. Machine-learning approaches <ul><li>Jagadeesh J, Prasad Pingali, Vasudeva Varma (2007). Capturing Sentence Prior for Query-Based Multi-Document Summarization. </li></ul><ul><li>Information retrieval techniques combined with summarization techniques </li></ul><ul><li>Introduces a notion of sentence importance, independent of the query, into the final scoring </li></ul><ul><li>Sentences are scored using a set of features computed over all sentences and normalized by the maximum score; the final score of a sentence is a weighted linear combination of the individual feature values </li></ul><ul><li>Information measure </li></ul><ul><ul><li>A query-dependent ranking of a document/sentence </li></ul></ul><ul><ul><li>An explicit notion of importance of a document/sentence </li></ul></ul>
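The normalized weighted-linear-combination scoring can be sketched directly; the feature values and weights below are invented, and normalizing each feature by its maximum across sentences is an assumed reading of the slide.

```python
# Sketch of the scoring scheme: each sentence has several feature values,
# each feature column is normalized by its maximum across sentences, and the
# final score is a weighted linear combination of the normalized values.

def combine(feature_matrix, weights):
    """feature_matrix: per-sentence feature lists; one weight per feature."""
    n_features = len(weights)
    maxima = [max(abs(row[j]) for row in feature_matrix) or 1.0
              for j in range(n_features)]
    return [sum(w * row[j] / maxima[j] for j, w in enumerate(weights))
            for row in feature_matrix]

# two sentences, two features (e.g. query overlap and sentence position)
scores = combine([[2.0, 10.0], [4.0, 5.0]], weights=[0.7, 0.3])
```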
  12. Machine-learning approaches <ul><li>Frank Schilder, Ravikumar Kondadadi (2008). FastSum: Fast and accurate query-based multi-document summarization. </li></ul><ul><li>Word-frequency features of clusters, documents, and topics </li></ul><ul><li>Summary sentences are ranked by a regression Support Vector Machine </li></ul><ul><li>Sentence splitting </li></ul><ul><li>Filtering of candidate sentences </li></ul><ul><li>Computing the word frequencies in the documents of a cluster </li></ul><ul><li>Topic description (a list of keywords and phrases) </li></ul><ul><li>Topic title (the query of queries) </li></ul><ul><li>The features used are word-based and sentence-based </li></ul><ul><li>Least Angle Regression selects a minimal set of features, yielding fast processing times </li></ul>
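The word-frequency feature computation can be sketched as follows; the SVR ranking itself is omitted, and scoring a sentence by the mean cluster frequency of its words is an assumed form of the feature, not FastSum's exact definition.

```python
# Sketch of FastSum-style word-frequency features: compute relative word
# frequencies over a document cluster, then score each candidate sentence
# by the mean cluster frequency of its words.
from collections import Counter

def cluster_frequencies(documents):
    counts = Counter(w for doc in documents for w in doc.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def frequency_feature(sentence, freqs):
    words = sentence.lower().split()
    return sum(freqs.get(w, 0.0) for w in words) / len(words)
```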
  13. Application Tailored Systems <ul><li>Subject domain ontology based approach </li></ul><ul><li>Opinion Summarization </li></ul>
  14. Medical Information Summarization System <ul><li>Uses UMLS, an ontology from the National Library of Medicine </li></ul><ul><li>The summarization algorithm is term-based; only terms defined in UMLS are recognized and processed. </li></ul><ul><li>Steps </li></ul><ul><ul><li>Revising the query with UMLS ontology knowledge </li></ul></ul><ul><ul><li>Calculating the distance of each sentence in the document with respect to the query </li></ul></ul><ul><ul><li>Calculating pair-wise distances among the candidate sentences </li></ul></ul>
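A sketch of the term-based distance steps, with a tiny hard-coded vocabulary standing in for UMLS and Jaccard distance as an assumed distance measure (the slide does not specify which measure the system uses).

```python
# Sketch of term-based distance: only terms from a controlled vocabulary
# (a stand-in for UMLS here) are recognized, and the distance between two
# texts is 1 minus the Jaccard overlap of their recognized term sets.

VOCAB = {"diabetes", "insulin", "glucose", "therapy"}

def terms(text):
    return {w for w in text.lower().split() if w in VOCAB}

def distance(a, b):
    ta, tb = terms(a), terms(b)
    if not ta and not tb:
        return 1.0  # no recognized terms: maximally distant
    return 1.0 - len(ta & tb) / len(ta | tb)
```

The same function serves both steps: sentence-to-query distances for relevance, and pair-wise sentence distances for redundancy.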
  15. Opinion summarization <ul><li>Sentiment summarization in the legal domain </li></ul><ul><li>Given an opinion-related question and a set of documents that contain the answer, a summary is produced for each target, summarizing the answers to the question </li></ul><ul><li>Semi-automatic Web blog search module </li></ul><ul><li>FastSum </li></ul><ul><li>Sentiment integration (a sentiment tagger based on unigram term lookup, using gazetteers of positive- and negative-polarity terms based on the General Inquirer) </li></ul>
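The gazetteer-based sentiment tagger can be sketched with tiny invented word lists; the real system draws its polarity gazetteers from the General Inquirer.

```python
# Sketch of the unigram gazetteer tagger: count positive vs negative terms
# in a sentence. The word lists below are tiny invented stand-ins.

POSITIVE = {"good", "favorable", "win", "support"}
NEGATIVE = {"bad", "adverse", "lose", "oppose"}

def sentiment(sentence):
    words = sentence.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"
```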
  16. Conclusion <ul><li>CLASSY and FastSum score highest on the ROUGE criteria: CLASSY in the top 4, FastSum in the top 7 and top 6 </li></ul>