Faculty of Engineering, School of Computer Science and Engineering
Extractive Summarisation Based on Keyword Profile and Language Model
Han Xu, Eric Martin and Ashesh Mahidadia
Research Overview
Motivation:
• Research benefits from and expands on the work of others
• The interdependent nature of knowledge creates a need to explore new fields
• The sheer volume of literature makes this a challenging task
Objectives: design a tool that facilitates identification of the key contributions of papers
1. First, by identifying keywords that capture the most important contributions of a paper
2. Then, by creating an extractive summary of information-rich sentences that cover those keywords
Theoretical Background
Abstract
– Retrospective perception by the authors
– At the time of writing
– Low information redundancy
Citation summary
– Extrospective judgment by the community
– Over a period of time
– High information redundancy
Example citing sentences (here citing McDonald et al., 2005):
o "McDonald et al. (2005) use the Chu-Liu-Edmonds (CLE) algorithm to solve the maximum spanning tree problem."
o "To learn these structures we used online large-margin learning (McDonald et al., 2005) that empirically provides state-of-the-art performance for Czech."
• Source of contributions: citation summaries
• Case study: Qazvinian's single-paper summarisation corpus
  – 25 highly cited papers in the ACL Anthology Network, from 5 different domains
  – Two files provided for each paper:
    1. A citation summary
    2. A manually constructed key contribution list
Qualifying Key Contributions
• Statistical characteristics of key contributions (a worked example follows the figure):
  – Over-representedness: keywords that are frequently used when citing a paper
  – Exclusiveness: keywords that are used only when citing that paper
[Figure: citation summaries of three papers in a small domain, drawn as circles. Citation sentences containing W1 (triangles) are spread evenly across all three summaries; 8 of the 10 sentences containing W2 (dots) fall inside the target paper's summary.]
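To make the figure's intuition concrete, here is a minimal sketch in Python (the counts are read off the illustrative figure, not taken from the paper's data, and the 60-sentence domain size is an assumption): under a hypergeometric null in which the 10 sentences containing W2 land at random among the domain's citing sentences, seeing 8 or more of them in one paper's summary is extremely unlikely, and that statistical surprise is what qualifies W2 as a key contribution.

# Hypothetical counts from the illustrative figure, not real corpus data
from scipy.stats import hypergeom

N, n, K, k = 60, 20, 10, 8  # domain sentences, target-summary sentences, sentences with W2, W2 hits in target
p_value = hypergeom.sf(k - 1, N, K, n)  # P(X >= 8) under the random-scatter null
print(f"P(X >= {k}) = {p_value:.6f}")   # tiny p-value: W2 is over-represented and exclusive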
Multi-stage Extractive Summarisation
A four-stage pipeline, with each later stage consuming the output of earlier stages (details on the following slides).
Stage 1: Keyword Profiling
• Input:
  1. Citation summary of the target paper
  2. Citation summaries of all papers in the target paper's domain
• Method: one-tailed Fisher's exact test (see the sketch below)
• Output: keyword profile
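A minimal sketch of Stage 1, assuming citing sentences are already tokenised into word lists; the function name and corpus handling are illustrative, not the paper's implementation:

from scipy.stats import fisher_exact

def keyword_profile(target_sents, domain_sents):
    """Rank words by a one-tailed Fisher's exact test on their distribution
    over citing sentences (smaller p-value = more salient keyword).
    domain_sents holds the citing sentences of ALL papers in the target
    paper's domain, including the target's own."""
    n_target = len(target_sents)
    n_rest = len(domain_sents) - n_target
    profile = {}
    for w in {w for s in target_sents for w in s}:
        k = sum(w in s for s in target_sents)           # target sentences with w
        k_rest = sum(w in s for s in domain_sents) - k  # other sentences with w
        table = [[k, n_target - k], [k_rest, n_rest - k_rest]]
        _, p = fisher_exact(table, alternative="greater")
        profile[w] = p
    return sorted(profile.items(), key=lambda kv: kv[1])  # most salient first

The one-tailed p-value rewards over-representedness and exclusiveness at once: a word only scores well if it is frequent in the target paper's citation summary and comparatively rare elsewhere.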
Stage 2: Keyword Profile Language Modelling
• Input: paper keyword profile
• Method: negative log transformation (see the sketch below)
• Output: keyword profile language model (KPLM)
  1. Directly encodes words' salience as pseudo generative probabilities
  2. More discriminative than a traditional language model
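A hedged sketch of the negative log transformation, following the talk notes: each p-value becomes the pseudo count -log(p), and the pseudo counts are normalised into a probability distribution. A word with p = 1, the lowest possible salience, gets pseudo count 0 and, as the notes observe for W5, drops out of the model automatically:

import math

def build_kplm(profile):
    """Turn a keyword profile [(word, p_value), ...] into a unigram
    keyword profile language model (KPLM)."""
    pseudo = {w: -math.log(p) for w, p in profile}  # salience as pseudo count
    total = sum(pseudo.values())
    return {w: c / total for w, c in pseudo.items() if c > 0}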
Stage 3: Summarisation as Model Divergence IR
• Input: keyword profile language model
• Method: negative cross entropy retrieval model (see the sketch below)
• Output: top-k sentences whose maximum likelihood estimates (MLEs) have the smallest divergence from the KPLM
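A minimal sketch of the negative cross entropy retrieval step; the additive smoothing of the sentence MLEs, which keeps the logarithms finite, is my assumption rather than the paper's stated choice:

import math

def rank_sentences(kplm, sentences, k=10, eps=1e-6):
    """Score each tokenised citing sentence by the negative cross entropy
    between the KPLM and the sentence's smoothed unigram MLE:
    score(s) = sum_w P_KPLM(w) * log P_s(w); higher = smaller divergence."""
    def score(sent):
        n = len(sent)
        return sum(p * math.log((sent.count(w) + eps) / (n + eps * len(kplm)))
                   for w, p in kplm.items())
    return sorted(sentences, key=score, reverse=True)[:k]

Up to a constant that does not depend on the sentence, ranking by negative cross entropy is equivalent to ranking by negative KL divergence from the KPLM, which is why the output can be described as the top-k sentences of smallest model divergence.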
Stage 4: Novelty-driven Re-ranking
• Input: top-k ranked sentence pool
• Method: Top Sentence Re-ranking (TSR) (see the sketch below)
• Output: 5-sentence extractive summary
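A hedged sketch of Top Sentence Re-ranking as the talk notes describe it: keep the top-ranked sentence, then repeatedly append the pooled sentence that diverges most from the summary built so far. Using smoothed KL divergence as the divergence measure is my assumption:

import math
from collections import Counter

def tsr(pool, limit=5, eps=1e-6):
    """pool: tokenised sentences already ordered by Stage 3 relevance."""
    summary, remaining = [pool[0]], list(pool[1:])
    while remaining and len(summary) < limit:
        counts = Counter(w for s in summary for w in s)
        total = sum(counts.values())
        def novelty(sent):  # KL(MLE(sent) || LM(summary)), crudely smoothed to avoid log 0
            n = len(sent)
            return sum((f / n) * math.log((f / n) / ((counts[w] + eps) / (total + eps)))
                       for w, f in Counter(sent).items())
        best = max(remaining, key=novelty)
        summary.append(best)
        remaining.remove(best)
    return summary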
Results
• Evaluation method: Pyramid score (a scoring sketch follows)
• Performance comparison: KPLM outperforms the state-of-the-art C-LexRank system
• Resilience to more stringent summary size limits: mean Pyramid scores hold up well as the limit decreases from 5 sentences to 1
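A minimal sketch of the Pyramid score as described in the talk notes: the weighted sum of human-picked keywords a summary covers, divided by the weight achieved by an optimal summary. The brute-force search for the optimal summary is illustrative only and assumes a small candidate pool:

from itertools import combinations

def pyramid_score(summary, candidates, weights, limit=5):
    """summary, candidates: lists of tokenised sentences;
    weights: {keyword: weight} from human annotation."""
    def covered(sents):
        return sum(w for kw, w in weights.items() if any(kw in s for s in sents))
    best = max(covered(combo) for r in range(1, limit + 1)
               for combo in combinations(candidates, r))
    return covered(summary) / best if best else 0.0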
Conclusion
• KPLM: a multi-stage statistical summarisation framework
  1. Keyword profiling
  2. Keyword profile language modelling
  3. Summarisation as model-divergence-based IR
  4. Novelty-driven re-ranking
• State-of-the-art performance in summarising scientific papers
• Good resilience to more stringent summary length limits
• Future work:
  – Higher-order n-grams
  – Multiple-paper summarisation
THANK YOU!
QUESTIONS?

NAACL2015 presentation


Editor's Notes

• #3 Firstly, I would like to introduce the motivation behind this work. Science is not an isolated endeavour: it benefits from and expands on the work of others, with more or less cross-fertilisation between disciplines. Researchers thus constantly need to explore the scientific literature beyond the core of their own research, which calls for utilities that make this task less daunting. Our aim in this work is to develop such a tool, one that facilitates identification of the key contributions of a paper. First, we automatically identify the most important and unique contributions of a paper. Then, we automatically extract information-rich sentences that cover those contributions, forming an extractive summary that gives better context for how the contributions are discussed.
• #4 Well, how can we do this automatically? A first question is: from what textual sources could we extract the contributions of a paper? One source that comes to mind immediately is the abstract. Arguably, though, the citation summary can be regarded as a form of crowd-sourced review of a paper's main contributions, and prior work has confirmed that citation summaries offer more focused coverage of a paper's contributions. We therefore chose citation summaries as the corpus from which to mine contributions. As a case study, we use Qazvinian's … in this paper … Having decided on the source from which to mine a paper's contributions and identified the statistical characteristics of its main contributions, we introduce the data used in our experiments.
• #5 Having decided on the information source, a second question we need to answer is: how can we qualify the key contributions of a paper, or what are their characteristics? Imagine a tiny domain containing 3 papers whose citation summaries are represented by circles, and suppose we aim to find the main contributions of the green one. Denote citation sentences containing a word W1 with blue triangles: W1 is evenly distributed across the 3 citation summaries, so one would heuristically conclude that W1 is unlikely to be a keyword of the green paper. In contrast, mark citing sentences containing a word W2 with red dots: their distribution is highly concentrated in the green circle. Intuitively, W2 is much more likely to be a main contribution of the green paper, as 8 out of the 10 total mentions of W2 belong to its citation summary. This intuitive observation translates into statistical characteristics of a paper's key contributions that can be measured along 2 dimensions.
• #6 After consolidating the theoretical background, we present our approach: a multi-stage extractive summarisation framework consisting of 4 stages pipelined together, with each later stage consuming the output of earlier ones. In the first stage, we take the citation summary of the target paper, together with those of the papers belonging to the same domain, and generate a keyword profile of the target paper capturing words' over-representedness and exclusiveness in the target paper's citation summary. In the second stage, the keyword profile created in the first stage is processed into a keyword profile language model that incorporates words' salience in reflecting the paper's main contributions into pseudo generative probabilities … In the third stage, we cast the task of extractive summarisation as model-divergence-based IR and select the top-k ranked sentences best conforming to the paper's KPLM … It is a common problem that top-ranked sentences tend to cover the most important contributions repetitively while falling short in diversity of coverage. In the final stage, we therefore use a novelty-driven sentence re-ranking method to diversify contribution coverage and produce the extractive summary of a paper … The following slides dive into the details of each phase of our pipelined approach.
• #7 In the first stage of our framework, we build a keyword profile for a target paper; our method is informed by the statistical characteristics of key contributions discussed earlier. We use Fisher's exact test to measure each word's salience in characterising the target paper's main contributions. More specifically, we use the following hypergeometric distribution (reconstructed below) to model words' distribution over citing sentences … This is the probability of observing exactly k citing sentences containing word W in the target paper's citation summary. We then calculate the salience of each word using the one-tailed Fisher's exact test; the salience scores are simply the p-values. The output is the target paper's keyword profile: a list of words ranked by their p-values. It statistically and objectively captures words' salience in characterising the target paper's main contributions in terms of statistical surprise. Compared to the human-generated key contribution list on the right, our method proves highly accurate at identifying the paper's main contributions, closely mirroring those picked by human experts.
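The formulas the note refers to appeared on the slide rather than in the transcript; the reconstruction below uses the standard hypergeometric form, where N is the number of citing sentences in the domain, n of them belong to the target paper's citation summary, K contain word W, and k of the target's sentences contain W (this variable mapping is my reading of the description):

  P(X = k) = \frac{\binom{K}{k}\,\binom{N-K}{n-k}}{\binom{N}{n}}

  \text{salience}(W) = p\text{-value} = \sum_{i \geq k} P(X = i)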
• #8 We have now produced a keyword profile for a target paper, and we know that keywords differ in importance: a good summariser should favour sentences covering the most important ones. Intuitively, the keyword profile of a paper, which contains valuable information on words' salience in characterising its main contributions, should be used to drive such a discriminative sentence selection process. In the second stage, we therefore build a discriminative unigram language model from the paper's keyword profile to incorporate this information. We achieve this by converting words' salience into pseudo counts via a negative log transformation and normalising them into a probability distribution. The output of stage 2 is the paper's keyword profile language model. Here is an illustrative KPLM built for an imaginary document consisting of only 5 distinct words, W1 to W5 … We can see that … W5 is a non-keyword with the lowest salience possible, and it is automatically eliminated from the resulting KPLM. The keyword profile language model can function as a true language model, but it has the following advantages over a traditional one. Firstly, the KPLM is constructed not from actual word frequencies but from pseudo frequencies that directly encode words' salience in characterising the target paper's main contributions. Secondly, the pseudo document corresponding to the target paper's keyword profile is essentially a bag of keywords repeating themselves according to their importance; it thus has more discriminative power than a traditional language model.
• #9 The KPLM of a paper is a discriminative generative model that encodes, as generative probabilities, words' salience in characterising the paper's main contributions. It thus represents an effective language model from which a model citing sentence covering the paper's main contributions could be sampled. In the third stage, we therefore cast the task of extractive summarisation as model-divergence-based IR. More specifically, we adopt the negative cross entropy retrieval model to select the citation sentences that best conform to the paper's KPLM. We first build a maximum likelihood estimation LM for each citing sentence. Then we measure the divergence between the MLE of each citing sentence and the paper's KPLM: the smaller the divergence, the better the citing sentence captures the target paper's main contributions.
• #10 We now have a sentence pool containing the top-k citing sentences that best conform to the target paper's KPLM. However, they are likely to cover the most important contributions repetitively while falling short in the diversity of contributions covered. The 4th and final stage of our framework therefore performs novelty-driven re-ranking to diversify contribution coverage. We call our method Top Sentence Re-ranking, or TSR; it implements a very simple heuristic. We first select the top-ranked sentence of the pool into our extractive summary. Then, in subsequent iterations, we select the citing sentence left in the pool with the largest divergence from the current extractive summary and append it to the end of the summary, until the summary length limit, conventionally 5, is reached. We then output the extractive summary.
• #11 For evaluation, we use the pyramid method, a popular evaluation metric that scores summaries by how well they cover human-picked keywords. More specifically, we first calculate the total weight of the keywords a summary covers. The pyramid score of a summary is then the ratio between the weighted sum of the keywords it covers and that of an optimal summary with the highest total weight. Comparing our results with a state-of-the-art system, C-LexRank, it is clear that KPLM performs better. Finally, an artificially imposed limitation in the evaluation is the summary length limit, which may be adjusted to suit a specific application context; the summarisation task becomes increasingly challenging as this limit is tightened. To evaluate KPLM's performance under more stringent summary length limits, we gather its mean pyramid scores under limits decreasing from 5 to 1 and visualise the results.
• #12 We present KPLM, a multi-stage statistical summarisation framework consisting of 4 stages … In the future, we plan to extend our method to higher-order n-grams, to see whether larger information units further boost summarisation performance. We also plan to extend our approach to multiple-paper summarisation, and more specifically to automatically generating surveys of scientific domains.