
A Supervised Approach to Extractive Summarisation of Scientific Papers (CoNLL 2017)

Paper: https://arxiv.org/abs/1706.03946
Abstract:
Automatic summarisation is a popular approach to reduce a document to its main arguments. Recent research in the area has focused on neural approaches to summarisation, which can be very data-hungry. However, few large datasets exist and none for the traditionally popular domain of scientific publications, which opens up challenging research avenues centered on encoding large, complex documents. In this paper, we introduce a new dataset for summarisation of computer science publications by exploiting a large resource of author provided summaries and show straightforward ways of extending it further. We develop models on the dataset making use of both neural sentence encoding and traditionally used summarisation features and show that models which encode sentences as well as their local and global context perform best, significantly outperforming well-established baseline methods.

Published in: Education

Ed Collins, Isabelle Augenstein, Sebastian Riedel
{edward.collins.13 | i.augenstein | s.riedel}@ucl.ac.uk

The Task
  • Select the sentences from within a paper which best summarise that paper.
  • Binary classification task - each sentence classified as either summary or not.

Challenges
  • Length: Papers are long - a lot of information to summarise.
  • Data: No suitable datasets available to train data-hungry learning algorithms.

Data and Evaluation Setup
  • 10K papers, each with highlight statements. Assume highlights are good summaries even out of context.
  • Best and worst summary sentences in each paper found with an oracle, used as training data.
  • 400K sentences with classifications.
  • Train: 263K
  • Test (accuracy): 130K
  • Test (summary quality): 150 full papers, using ROUGE-L

Approach
Features in Order of Utility:
  • AbstractROUGE - ROUGE-L score of sentence and abstract, taking inspiration from other work on summarising scientific papers
  • TF-IDF
  • Keyphrase Score
  • Title Score
  • Document TF-IDF
  • Sentence Length
  • Section Sentence Occurred In
  • Numeric Count - number of numbers in the sentence

Results & Conclusion
  • Significantly outperforms many baselines.
  • Classifiers trained on the automatically extended dataset performed better than those trained without it.
  • Classifiers which use a neural network to read text suffer no significant changes to performance if a feature is missing.
  • Remaining challenges are to encode the whole document, rather than just a sentence, with neural networks to better understand the global context of each sentence.

Code: https://github.com/EdCo95/scientific-paper-summarisation
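
The AbstractROUGE feature and the oracle labelling both depend on scoring a sentence against a reference text with ROUGE-L. The following is a minimal Python sketch of that idea, not the authors' implementation (their code is in the repository linked on the poster). It assumes whitespace tokenisation, the standard LCS-based ROUGE-L F-measure, and an oracle that ranks sentences by ROUGE-L against the paper's highlights. The function names and the parameter `k` are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference, beta=1.0):
    """ROUGE-L F-measure between two strings (assumed whitespace tokens)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)


def oracle_labels(sentences, highlights, k=1):
    """Hypothetical oracle: label the k best-scoring sentences 1 and the
    k worst-scoring sentences 0, ranked by ROUGE-L against the highlights."""
    reference = " ".join(highlights)
    ranked = sorted(sentences, key=lambda s: rouge_l(s, reference), reverse=True)
    positives = [(s, 1) for s in ranked[:k]]
    negatives = [(s, 0) for s in ranked[-k:]]
    return positives + negatives
```

Passing the abstract as `reference` gives an AbstractROUGE-style score for a sentence. Passing the highlight statements to `oracle_labels` gives positive and negative training examples for the binary classifier. The sketch assumes each paper has at least `2 * k` sentences, so the best and worst sets do not overlap.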
