
Kolawole John Adebayo, Luigi Di Caro and Guido Boella | A Supervised Keyphrase Extraction System



  1. 1. A SUPERVISED KEYPHRASE EXTRACTION SYSTEM. Semantics 2016, Leipzig. Kolawole J. Adebayo, Luigi Di Caro, Guido Boella
  2. 2. Outline
     • Introduction
     • Related Works
     • Methodology
     • Experiments
     • Conclusions
  3. 3. Introduction
     • Keyphrases
       – What?
       – Why?
     • Document Indexing, Document Summarization, Clustering and Visualization
     • Keyphrase Assignment vs Keyphrase Extraction
     • Unsupervised vs Supervised
     • Classification vs Ranking
  4. 4. Introduction Semantic Features Supervised KeyPhrase Extraction, Keyphrase, Keyword
  5. 5. Related Works

     | Work | Classification | Features | Algorithm |
     |---|---|---|---|
     | Witten et al (1999) | Statistical | TF, TFIDF, length (bi or tri), first occurrence, node degree, etc. | Naïve Bayes |
     | P. Turney (2000) | Statistical | Phrase frequency, position, TF, TFIDF, n-gram overlap, etc. | C4.5, GenEx |
     | Hulth (2003) | Linguistic | Lexical and syntactic features | |
     | Mihalcea and Tarau (2004) | Graph Based | Unsupervised | TextRank |
     | Medelyan et al (2010) | Graph Based | Statistical, lexical, syntactic features | MAUI |
  6. 6. Methodology
     • Training Document → Select Candidate → Extract Feature for Candidates → Combine features with Classifier
     • Training Document → Select Candidate → Extract Feature for Candidates → Predictor → Extracted Keyphrases
  7. 7. Methodology
     • Candidate Selection
       – Extract n-grams (range = 1-4) that do not start or end with a stopword
       – Candidates should not be proper nouns
       – Candidates should not end with an adjective
       – Candidates may start with an abbreviation
       – Verbs are down-weighted
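The candidate selection rules above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the tokens are already POS-tagged with Penn Treebank tags, reads "should not be proper nouns" as "reject n-grams made up only of proper nouns", and uses an arbitrary weight of 0.5 for candidates containing a verb. The stopword list and the name `select_candidates` are hypothetical.

```python
from typing import List, Tuple

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "for", "and", "to", "is", "with"}

Token = Tuple[str, str]  # (word, Penn Treebank POS tag), assumed pre-tagged


def select_candidates(tokens: List[Token], max_n: int = 4) -> List[Tuple[str, float]]:
    """Return (candidate, weight) pairs for n-grams of length 1..max_n."""
    candidates = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            words = [w.lower() for w, _ in gram]
            tags = [t for _, t in gram]
            # Must not start or end with a stopword.
            if words[0] in STOPWORDS or words[-1] in STOPWORDS:
                continue
            # Assumed reading: reject n-grams made up only of proper nouns.
            if all(t in ("NNP", "NNPS") for t in tags):
                continue
            # Must not end with an adjective.
            if tags[-1].startswith("JJ"):
                continue
            # Verbs are down-weighted rather than removed (0.5 is arbitrary).
            weight = 0.5 if any(t.startswith("VB") for t in tags) else 1.0
            candidates.append((" ".join(words), weight))
    return candidates


tagged = [("Supervised", "JJ"), ("extraction", "NN"), ("of", "IN"),
          ("keyphrases", "NNS"), ("uses", "VBZ"), ("Wikipedia", "NNP")]
result = select_candidates(tagged)
```

No special rule is needed for abbreviations here, since nothing in the filter rejects a candidate for starting with one.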
  8. 8. Methodology

     | Category | Description |
     |---|---|
     | Statistical | TF, TFIDF, keyphrase length |
     | Positional | First and last point of appearance; geographical spread, e.g., upper section, mid section and lower section; also key candidates' span |
     | Lexical | NP, NE, n-grams |
     | Semantic | Wikipedia lookup (frequency in Wikipedia), whether it has a Wikipedia page, in-out link frequency on the Wikipedia page |
     | Semantic | LDA topic count (T=50) |
     | Semantic | Candidate similarity to POS-filtered words (proper nouns, verbs and adjectives) |
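As a rough illustration of the statistical and positional rows in the table, the sketch below computes TF, a smoothed TF-IDF, phrase length, and normalized first and last positions for one candidate. The slides do not give the exact formulas, so the IDF smoothing, the normalization by document length, and the names `find_occurrences` and `candidate_features` are all assumptions.

```python
import math
from typing import Dict, List


def find_occurrences(phrase: List[str], doc: List[str]) -> List[int]:
    """Return the token offsets at which `phrase` occurs in `doc`."""
    n = len(phrase)
    return [i for i in range(len(doc) - n + 1) if doc[i:i + n] == phrase]


def candidate_features(phrase: str, doc: List[str],
                       corpus: List[List[str]]) -> Dict[str, float]:
    """Statistical and positional features for one candidate phrase."""
    words = phrase.split()
    positions = find_occurrences(words, doc)
    tf = len(positions) / max(len(doc), 1)
    # Document frequency over the corpus; +1 smoothing is one common variant.
    df = sum(1 for d in corpus if find_occurrences(words, d))
    idf = math.log((1 + len(corpus)) / (1 + df)) + 1.0
    # Positions are normalized to [0, 1]; 1.0 is used when the phrase is absent.
    first = positions[0] / len(doc) if positions else 1.0
    last = positions[-1] / len(doc) if positions else 1.0
    return {
        "tf": tf,
        "tfidf": tf * idf,
        "length": float(len(words)),
        "first_pos": first,
        "last_pos": last,
        "span": last - first,
    }


doc = "keyphrase extraction helps indexing and keyphrase extraction helps summarization".split()
corpus = [doc, "random forest classifier".split()]
feats = candidate_features("keyphrase extraction", doc, corpus)
```

The "upper, mid, lower section" spread could be derived from `first_pos` and `last_pos` by bucketing them into thirds, though the slides do not say how the sections are defined.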
  9. 9. Methodology POS-filtered n-grams (2, 3, 4) and candidate keyphrase → Embedding Similarity
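One possible reading of the embedding similarity feature is sketched below: each phrase is represented by the average of its word vectors, and the feature is the mean cosine similarity between the candidate and the POS-filtered n-grams. The slide does not specify the embedding model or how similarities are aggregated, so the averaging, the toy two-dimensional vectors, and the function names are all assumptions.

```python
import math
from typing import Dict, List

Vector = List[float]


def average_vector(words: List[str], embeddings: Dict[str, Vector]) -> Vector:
    """Average the vectors of the words that have an embedding."""
    known = [embeddings[w] for w in words if w in embeddings]
    if not known:
        return []
    dim = len(known[0])
    return [sum(v[k] for v in known) / len(known) for k in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 for empty or zero vectors."""
    if not a or not b:
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_feature(candidate: str, context_ngrams: List[str],
                       embeddings: Dict[str, Vector]) -> float:
    """Mean cosine similarity between the candidate and each context n-gram."""
    cand_vec = average_vector(candidate.split(), embeddings)
    sims = [cosine(cand_vec, average_vector(g.split(), embeddings))
            for g in context_ngrams]
    return sum(sims) / len(sims) if sims else 0.0


# Toy vectors for illustration only; a real system would load trained embeddings.
toy = {"keyphrase": [1.0, 0.0], "extraction": [1.0, 0.0],
       "random": [0.0, 1.0], "forest": [0.0, 1.0]}
same = similarity_feature("keyphrase extraction", ["keyphrase extraction"], toy)
different = similarity_feature("keyphrase extraction", ["random forest"], toy)
```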
  10. 10. Results

      | Features | Dataset | Precision | Recall | F-Measure |
      |---|---|---|---|---|
      | Medelyan et al (2010) | Marujo | 49.4 | - | - |
      | Marujo et al (2013) | Marujo | 55.4 | - | - |
      | All-features | Marujo | 58.3 | 42.0 | 48.8 |
      | Selected-features | Marujo | 48.7 | 36.5 | 41.7 |

      Table 1: Evaluation result on Marujo dataset
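The F-Measure column in Table 1 is consistent with the balanced F1 score, the harmonic mean of precision and recall. A small sketch, assuming that definition, reproduces the two rows that report all three values:

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (balanced F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall taken from Table 1 (Marujo dataset), in percent.
all_features_f = round(f_measure(58.3, 42.0), 1)       # All-features row
selected_features_f = round(f_measure(48.7, 36.5), 1)  # Selected-features row
```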
  11. 11. Results

      | Features | Dataset | Precision | Recall | F-Measure |
      |---|---|---|---|---|
      | Selected Features | Combined | 29.9 | 20.3 | 16.9 |
      | Selected Features | Reader | 26.4 | 17.1 | 20.7 |
      | All-features | Combined | 32.7 | 21.0 | 25.5 |
      | All-features | Reader | 30.2 | 18.1 | 22.6 |

      Table 2: Evaluation result on Semeval dataset
  12. 12. Results

      | Features | Dataset | Precision | Recall | F-Measure |
      |---|---|---|---|---|
      | (2,5,6,7,8,9) | Combined | 32.1 | 20.6 | 25.0 |
      | (1,2,5,7,8,9) | Combined | 31.8 | 20.1 | 24.7 |
      | (2,4,5,7,8,9) | Combined | 30.2 | 17.7 | 22.3 |
      | (3,4,6,7,8,9) | Combined | 27.4 | 16.3 | 20.4 |

      Table 3: Ablation test on Semeval dataset
  13. 13. Good or Bad? Supervised Keyphrase Extraction, Keyphrase Extraction system, supervised machine learning, Random Forest algorithm, Feature Engineering, Candidate Word, Keyphrase Extraction, Behavioural sciences, supervised classification, Keyphrase overlap
  14. 14. References
      • A. Hulth. Improved automatic keyword extraction given more linguistic knowledge. In Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, pages 216-223. Association for Computational Linguistics, 2003.
      • S. N. Kim, O. Medelyan, M.-Y. Kan, and T. Baldwin. Semeval-2010 task 5: Automatic keyphrase extraction from scientific articles. In Proceedings of the 5th International Workshop on Semantic Evaluation, pages 21-26. Association for Computational Linguistics, 2010.
      • L. Marujo, A. Gershman, J. Carbonell, R. Frederking, and J. P. Neto. Supervised topical key phrase extraction of news stories using crowdsourcing, light filtering and co-reference normalization. arXiv preprint arXiv:1306.4886, 2013.
      • P. Turney. Learning to extract keyphrases from text. 1999.
      • X. Jiang, Y. Hu, and H. Li. A ranking approach to keyphrase extraction. 2010.
      • T. D. Nguyen and M.-Y. Kan. Keyphrase extraction in scientific publications. In Asian Digital Libraries. Looking Back 10 Years and Forging New Frontiers, pages 317-326. Springer, 2007.
      • I. H. Witten, G. W. Paynter, E. Frank, C. Gutwin, and C. G. Nevill-Manning. KEA: Practical automatic keyphrase extraction. In Proceedings of the Fourth ACM Conference on Digital Libraries, pages 254-255. ACM, 1999.
      • R. Mihalcea and P. Tarau. TextRank: Bringing order into texts. Association for Computational Linguistics, 2004.
  15. 15. Conclusions • Many Thanks For The Attention!!!