
Multi-Task Learning of Keyphrase Boundary Classification (ACL 2017)

Full paper:
Keyphrase boundary classification (KBC) is the task of detecting keyphrases in scientific articles and labelling them with respect to predefined types. Although important in practice, this task is so far underexplored, partly due to the lack of labelled data. To overcome this, we explore several auxiliary tasks, including semantic super-sense tagging and identification of multi-word expressions, and cast the task as a multi-task learning problem with deep recurrent neural networks. Our multi-task models perform significantly better than previous state-of-the-art approaches on two scientific KBC datasets, particularly for long keyphrases.
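The abstract casts KBC as multi-task learning with deep recurrent networks, i.e. hard parameter sharing: a shared encoder feeds one output layer per task. A minimal forward-pass sketch of that idea, assuming toy dimensions and a single dense layer in place of the paper's 3-layer BiLSTM encoder (the task names and sizes here are illustrative assumptions, not taken from the paper):

```python
# Hard parameter sharing for multi-task sequence labelling:
# one shared encoder, one softmax output head per task.
import numpy as np

rng = np.random.default_rng(0)

EMB_DIM, HID_DIM = 8, 16                    # toy sizes (assumption)
N_LABELS = {"kbc": 4, "chunking": 5}        # main task + one auxiliary task

# Shared parameters: in training, gradients from every task update these.
W_shared = rng.normal(scale=0.1, size=(EMB_DIM, HID_DIM))

# Task-specific output layers: each is updated only by its own task.
W_out = {task: rng.normal(scale=0.1, size=(HID_DIM, n))
         for task, n in N_LABELS.items()}

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def forward(token_embeddings, task):
    """Per-token label distribution for the given task."""
    hidden = np.tanh(token_embeddings @ W_shared)   # shared representation
    return softmax(hidden @ W_out[task])            # task-specific head

sentence = rng.normal(size=(6, EMB_DIM))            # 6 token embeddings
probs_main = forward(sentence, "kbc")               # shape (6, 4)
probs_aux = forward(sentence, "chunking")           # shape (6, 5)
```

At training time, batches from the main and auxiliary tasks would alternate, so the shared encoder is shaped by all tasks while each head stays task-specific.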


Isabelle Augenstein*, Anders Søgaard§
*University College London, §University of Copenhagen
{augenstein | soegaard}

Keyphrase Boundary Classification
Mention-level keyphrase identification & classification into predefined types: PROCESS (e.g. methods, equipment), TASK, MATERIAL (e.g. corpus, physical materials).

Example sentence with keyphrase mentions:
"... addresses the task of named entity recognition (NER), a subtask of information extraction, using conditional random fields (CRF). Our method is evaluated on the CoNLL-2003 NER corpus."

Challenges
• Novel task → scarcity of training data
• Mention-level prediction
• Many long-tailed phrases → existing KBs only of limited use

Models
3-layer BiLSTMs with SENNA embeddings.
[Architecture diagram: shared input layers feed separate output layers for the main task and an auxiliary task, illustrated on an example token sequence labelled with O/T/M tags.]

Auxiliary Tasks
• Chunking (Penn Treebank)
• FrameNet (target identification & classification)
• Hyperlink prediction
• Multiword expression identification (Streusle Corpus)
• Supersense tagging (SemCor)

Baselines
• Stanford NER (Finkel et al. 2005)
• Neural transition-based NER (Lample et al. 2016)

Results
[Bar chart, scale 0-70: scores on the SemEval 2017 Task 10 dev set (Augenstein et al. 2017) and ACL RD-TEC 2.0 (QasemiZadeh and Schumann, 2016) for Finkel et al. (2005), Lample et al. (2016), LSTM, LSTM+Chunking, LSTM+FrameNet, LSTM+Hyperlinks, LSTM+Multiword and LSTM+Supersense.]

Data
Statistics of training sets:

                          SemEval ScienceIE   ACL RD-TEC
  Labels                  3                   7
  Topics                  CS, Phys, MS        NLP
  # Keyphrases            5730                2939
  Singleton keyphrases    31%                 83%
  1-word keyphrases       18%                 23%
  >=2-word keyphrases     82%                 77%
  >=3-word keyphrases     51%                 33%
  >=5-word keyphrases     22%                 8%

Discussion
Overall
• Improvements over baselines for all our models
• Stronger gains on the SemEval dataset

Analysis
• SemEval contains more long keyphrases than the ACL dataset, which might explain the performance difference
• The ACL dataset contains many singletons, but this has no strong impact on performance
• Problematic for all models: vague/broad keyphrases (e.g. 'items', 'scope', 'key') and long keyphrases containing clauses
• Benefit of multi-task models: better at recognising long keyphrases

Acknowledgements
This work was partially supported by Elsevier.

References
SemEval 2017 Task 10 ScienceIE:
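Mention-level prediction means the models emit per-token labels that must be grouped into typed keyphrase spans for evaluation. A minimal sketch of that decoding step, assuming a BIO tag scheme with tags like "B-Process" / "I-Process" / "O" (the exact tag scheme is an assumption, not taken from the paper):

```python
# Group per-token BIO labels into typed (start, end, type) spans.
def bio_to_spans(tags):
    """Return (start, end_exclusive, type) spans from a BIO tag sequence."""
    spans, start, ktype = [], None, None
    for i, tag in enumerate(tags + ["O"]):          # sentinel flushes last span
        # A span ends at B-, at O, or at an I- whose type does not match.
        if tag.startswith("B-") or tag == "O" or \
           (tag.startswith("I-") and tag[2:] != ktype):
            if start is not None:
                spans.append((start, i, ktype))
                start, ktype = None, None
        if tag.startswith("B-"):
            start, ktype = i, tag[2:]
        elif tag.startswith("I-") and start is None:  # stray I- opens a span
            start, ktype = i, tag[2:]
    return spans

tags = ["O", "B-Process", "I-Process", "O", "B-Material", "O"]
print(bio_to_spans(tags))  # [(1, 3, 'Process'), (4, 5, 'Material')]
```

Exact-match scoring of long keyphrases then requires every token in the span to be labelled correctly, which is one reason long keyphrases are harder and why the multi-task models' gains concentrate there.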