
Alleviating Manual Feature Engineering for Part-of-Speech Tagging of Twitter Microposts using Distributed Word Representations


Paper presented at NIPS 2014 workshop on modern machine learning and natural language processing.

Many algorithms for natural language processing rely on manual feature engineering.
In this paper, we show that state-of-the-art performance for part-of-speech tagging of Twitter microposts can be achieved by relying solely on a neural network that takes automatically inferred word embeddings as its features.
By pre-training the neural network with large amounts of automatically labeled Twitter microposts to initialize the weights, we achieve a state-of-the-art accuracy of 88.9% when tagging Twitter microposts with Penn Treebank tags.
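
The tagger described above can be pictured as a small feed-forward network over word embeddings. The following is a minimal sketch, not the authors' implementation. The layer sizes (400D word2vec vectors, a 500D hidden layer, a 52D output layer) come from the poster diagram. The three-word input window, the tanh activation, the softmax output and the random initialisation are assumptions made for illustration, and random vectors stand in for real word2vec embeddings.

```python
import math
import random

EMBED_DIM = 400   # word2vec vector size, from the poster
WINDOW = 3        # assumption: number of word vectors fed to the network
HIDDEN_DIM = 500  # hidden layer size, from the poster
NUM_TAGS = 52     # output layer size, from the poster


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (illustrative initialisation)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def affine(weights, biases, x):
    """Compute W x + b for one input vector."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def tag_distribution(window_vectors, hidden, output):
    """Concatenate the window's embeddings and apply one hidden layer."""
    x = [v for vec in window_vectors for v in vec]
    h = [math.tanh(a) for a in affine(*hidden, x)]  # tanh is an assumption
    return softmax(affine(*output, h))


rng = random.Random(0)
hidden = init_layer(EMBED_DIM * WINDOW, HIDDEN_DIM, rng)
output = init_layer(HIDDEN_DIM, NUM_TAGS, rng)

# Random stand-ins for the embeddings of a three-word window.
window = [[rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]
          for _ in range(WINDOW)]
probs = tag_distribution(window, hidden, output)
predicted_tag_index = max(range(NUM_TAGS), key=probs.__getitem__)
```

In this sketch, pre-training would amount to fitting `hidden` and `output` on the automatically labeled microposts first and then continuing training on the manually labeled set; the training loop itself is omitted.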


  1. 1. Alleviating Manual Feature Engineering for Part-of-Speech Tagging of Twitter Microposts using Distributed Word Representations

Fréderic Godin, Baptist Vandersmissen, Azarakhsh Jalalvand, Wesley De Neve and Rik Van de Walle
@frederic_godin, @BaptistV, @wmdeneve and @rvdwalle
Ghent University – iMinds, ELIS Department/Multimedia Lab, Gaston Crommenlaan 8 bus 201, B-9050 Ledeberg – Ghent, Belgium
http://multimedialab.elis.ugent.be
Workshop on Machine Learning and NLP, NIPS 2014, 12/12/2014, Montreal, Canada

Research Question
Can we avoid manual feature engineering when developing a Part-of-Speech tagger for Twitter microposts?

Solution
Automatically learn features on 400 million raw Twitter microposts that capture syntactic and semantic patterns and feed them to a neural network.

Learn Features
[Diagram: 400 million microposts, Word2vec Skip-gram, 400D vector per word]

Train the Part-of-Speech Tagger
[Diagram: 400D word vectors for "im doin good", Hidden Layer (500D), Output Layer (52D), output tag VBG]

Vote-Constrained Bootstrapping*
[Diagram: "im doin good" is tagged by the ARK tagger and the GATE tagger (tags shown: VBG, V). Agree?]
Automatically generate high confidence labeled data.
Use this data to pre-train the neural network.
*Derczynski et al., 2013. "Twitter Part-of-Speech Tagging for All: Overcoming Sparse and Noisy Data"

Evaluation

Word2vec dataset   Pre-training dataset   Accuracy validation set   Accuracy test set
150M               /                      87.95%                    87.46%
150M               50K                    89.64%                    88.82%
400M               50K                    89.73%                    88.95%
400M               125K                   90.09%                    88.90%

Accuracies listed for comparison: Ritter et al. (2011) 84.55%; Derczynski et al. (2013) 88.69%
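
The vote-constrained bootstrapping step on the poster (two taggers label the same micropost, and the result is kept when they agree) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline. The poster shows the tags VBG and V being compared, which suggests the two taggers use different tagsets, so the sketch checks agreement through a mapping. The mapping `PTB_TO_COARSE`, the rule that every token in a micropost must agree, and the toy lookup taggers are all hypothetical; real taggers such as ARK and GATE would replace the two toy functions.

```python
# Hypothetical, deliberately partial mapping from Penn Treebank tags to a
# coarse tagset; a real mapping would have to cover every tag.
PTB_TO_COARSE = {
    "VB": "V", "VBD": "V", "VBG": "V", "VBP": "V", "VBZ": "V",
    "NN": "N", "NNS": "N",
    "JJ": "A",
    "RB": "R",
}


def tags_agree(ptb_tag, coarse_tag):
    """True when the fine-grained tag maps onto the coarse tag."""
    return PTB_TO_COARSE.get(ptb_tag) == coarse_tag


def bootstrap(microposts, ptb_tagger, coarse_tagger):
    """Keep a micropost only if both taggers agree on every token.

    Requiring agreement on the whole micropost is an assumption of this
    sketch. Kept microposts are returned as lists of (token, PTB tag).
    """
    kept = []
    for tokens in microposts:
        ptb = ptb_tagger(tokens)
        coarse = coarse_tagger(tokens)
        if len(ptb) == len(coarse) == len(tokens) and all(
            tags_agree(p, c) for p, c in zip(ptb, coarse)
        ):
            kept.append(list(zip(tokens, ptb)))
    return kept


# Toy stand-ins for the two taggers: simple dictionary lookups.
TOY_PTB = {"doin": "VBG", "good": "JJ", "run": "NN"}
TOY_COARSE = {"doin": "V", "good": "A", "run": "V"}


def toy_ptb_tagger(tokens):
    return [TOY_PTB[t] for t in tokens]


def toy_coarse_tagger(tokens):
    return [TOY_COARSE[t] for t in tokens]


posts = [["doin", "good"], ["good", "run"]]
labeled = bootstrap(posts, toy_ptb_tagger, toy_coarse_tagger)
```

With these toy taggers, the first micropost is kept because both taggers agree on each token, while the second is dropped because they disagree on "run". The kept microposts would then serve as the automatically labeled pre-training data.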
