
An End-to-End Entity Linking Approach for Tweets

This presentation was given at the #Microposts 2015 workshop at WWW2015.

  1. An End-to-End Entity Linking Approach for Tweets
     Ikuya Yamada (1,2,3), Hideaki Takeda (3), Yoshiyasu Takefuji (2)
     (1) Studio Ousia, (2) Keio University, (3) National Institute of Informatics
     Monday, May 18, 2015
  2. Entity Linking
     ‣ Entity Linking: The task of linking entity mentions to entries in a knowledge base (KB) (e.g., DBpedia, Wikipedia)
     ‣ The task has recently received considerable attention
       ✦ Many research papers (2006-) [Cucerzan 2007, Milne et al. 2008, etc.]
     Example: "Kyary Pamyu Pamyu is a Japanese model and singer. Her public image is associated with Japan's kawaisa culture centered in Harajuku, Tokyo."
       ✦ Kyary Pamyu Pamyu → wikipedia/Kyary_Pamyu_Pamyu
       ✦ Kawaii → wikipedia/Kawaii
       ✦ Harajuku → wikipedia/Harajuku
  3. Background
     ‣ Entity linking on Twitter is difficult because tweets are noisy, short, and colloquial
     ‣ Existing methods focus on well-written, long text (e.g., Illinois Wikifier and Wikipedia Miner)
  4. Outline
     ‣ Architecture
     ‣ Implementation
       ✦ End-to-End Entity Linking
       ✦ NIL mention detection
       ✦ Type prediction
     ‣ Experimental Results
  5. Architecture
     [Diagram components: Input Text; Entity Linking; NIL Mention Detection; Type Prediction (KB Entity Mentions); Type Prediction (NIL Mentions); Results]
     ‣ Four supervised machine-learning models are used
       ✦ Entity mentions are detected in an end-to-end manner
       ✦ NIL entity mentions are detected using a separate model
       ✦ Entity types are predicted using two separate models
  6. Entity Linking
  7. Entity Linking: Mention-Entity Dictionary
     ‣ The mention-entity dictionary maps a mention surface to its possible referent entities
     ‣ The possible mention surfaces of an entity are extracted from:
       ✦ The title of the corresponding Wikipedia page
       ✦ Titles of the Wikipedia pages that redirect to the entity
       ✦ Anchor texts in Wikipedia articles that point to the entity
     Example: apple → Apple_Inc., Apple (fruit)
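The three sources listed on this slide can be merged into one lookup table. The following is a minimal sketch, not the authors' implementation: the function names, the lowercase normalization, and the toy Wikipedia data are all assumptions made for illustration.

```python
from collections import defaultdict


def normalize(surface):
    """Hypothetical normalization: underscores to spaces, lowercased, trimmed."""
    return surface.replace("_", " ").strip().lower()


def build_mention_entity_dictionary(titles, redirects, anchors):
    """Map each normalized mention surface to the set of entities it may refer to.

    titles:    iterable of entity page titles
    redirects: iterable of (redirect_title, target_entity) pairs
    anchors:   iterable of (anchor_text, target_entity) pairs
    """
    dictionary = defaultdict(set)
    for title in titles:
        dictionary[normalize(title)].add(title)
    for surface, entity in redirects:
        dictionary[normalize(surface)].add(entity)
    for surface, entity in anchors:
        dictionary[normalize(surface)].add(entity)
    return dict(dictionary)


# Toy data (assumed, not taken from a real Wikipedia dump).
titles = ["Apple_Inc.", "Apple_(fruit)"]
redirects = [("Apple_Computer", "Apple_Inc.")]
anchors = [("apple", "Apple_Inc."), ("Apple", "Apple_(fruit)")]

mention_entity_dictionary = build_mention_entity_dictionary(titles, redirects, anchors)
```

With this toy data, the surface "apple" maps to both Apple_Inc. and Apple_(fruit), which is the ambiguity the later disambiguation step has to resolve.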
  8. Entity Linking: Mention Candidate Generation
     [Steps: Mention Candidate Generation → Mention Detection and Disambiguation]
     ‣ All n-grams (n <= 10) are considered as mention candidates
     ‣ Mention candidates are generated by looking up the n-gram surface in the mention-entity dictionary
     ‣ Several approximate string matching methods are used to handle mentions in irregular forms (e.g., misspellings, abbreviations, and acronyms)
       ✦ Fuzzy match, approximate search, acronym search
     ‣ Many possible candidates are generated here; invalid ones are removed in the next step
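The n-gram lookup on this slide can be sketched as follows. This is an illustration under stated assumptions: it covers exact dictionary lookup only (the fuzzy, approximate, and acronym searches are omitted), and the function name, tuple layout, and toy dictionary are hypothetical.

```python
def generate_candidates(tokens, dictionary, max_n=10):
    """Return (start, end, surface, entity) for every n-gram found in the dictionary.

    Only exact, lowercased lookup is shown; approximate matching is not implemented.
    """
    candidates = []
    for start in range(len(tokens)):
        for n in range(1, max_n + 1):
            end = start + n
            if end > len(tokens):
                break
            surface = " ".join(tokens[start:end]).lower()
            # Sort entities so the output order is deterministic.
            for entity in sorted(dictionary.get(surface, ())):
                candidates.append((start, end, surface, entity))
    return candidates


# Toy dictionary (assumed for illustration).
toy_dictionary = {
    "apple": {"Apple_Inc.", "Apple_(fruit)"},
    "new york": {"New_York_City"},
}
tokens = "I love New York and apple pie".split()
candidates = generate_candidates(tokens, toy_dictionary)
```

Every ambiguous surface yields one candidate per possible entity, so the candidate list is deliberately over-complete and relies on the classifier in the next step to prune it.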
  9. Entity Linking: Mention Detection and Disambiguation
     ‣ Supervised machine learning is used to classify mention candidates
     ‣ Algorithm: Random forest
     ‣ Main features:
       ✦ Base: Link probability, capitalization probability, commonness, # of inbound links, TAGME entity coherence, etc.
       ✦ String similarity: Levenshtein, Jaro-Winkler, Soft TF-IDF, Jaccard similarity, etc.
       ✦ Entity embedding-based similarity feature: Average cosine similarity between tweet words and the entity
       ✦ Temporal popularity feature: Temporal popularity knowledge obtained from Wikipedia page view data
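Three of the features named on this slide can be sketched in a few lines. The definitions of commonness and link probability below follow their usual meaning in the entity linking literature; the count tables, function names, and toy vectors are assumptions, and the slides do not specify how the authors compute them.

```python
import math


def commonness(anchor_counts, surface, entity):
    """Fraction of anchors with this surface that link to this entity."""
    counts = anchor_counts.get(surface, {})
    total = sum(counts.values())
    return counts.get(entity, 0) / total if total else 0.0


def link_probability(anchor_counts, text_counts, surface):
    """Fraction of all occurrences of the surface that appear as a link."""
    linked = sum(anchor_counts.get(surface, {}).values())
    occurrences = text_counts.get(surface, 0)
    return linked / occurrences if occurrences else 0.0


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def average_cosine_similarity(word_vectors, entity_vector):
    """Mean cosine similarity between each tweet word vector and the entity vector."""
    if not word_vectors:
        return 0.0
    return sum(cosine(w, entity_vector) for w in word_vectors) / len(word_vectors)


# Toy counts and vectors (assumed, not real Wikipedia statistics).
anchor_counts = {"apple": {"Apple_Inc.": 30, "Apple_(fruit)": 10}}
text_counts = {"apple": 200}
apple_commonness = commonness(anchor_counts, "apple", "Apple_Inc.")
apple_link_probability = link_probability(anchor_counts, text_counts, "apple")
similarity = average_cosine_similarity([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0])
```

Each function returns one number per mention candidate, so the values can be concatenated into the feature vector fed to the random forest.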
  10. NIL Mention Detection
  11. NIL Mention Detection
      ‣ Supervised machine learning is used to classify whether each possible n-gram should be detected as a NIL mention
      ‣ Algorithm: Random forest
      ‣ Main features:
        ✦ The output of Stanford NER (standard model and without-capitalization model)
        ✦ The ratio of capitalized words in the tweet
        ✦ Part-of-speech tags, orthographic features, # of words, etc.
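The capitalization ratio and a few orthographic features can be computed directly from the token strings. This is a hedged sketch: the slide does not define these features precisely, so the whitespace tokenization, the choice to skip tokens that do not start with a letter (such as @name and #tag), and the specific feature names are assumptions.

```python
def capitalized_word_ratio(tweet):
    """Share of alphabetic-initial words in the tweet that start with an uppercase letter."""
    words = [w for w in tweet.split() if w[0].isalpha()]
    if not words:
        return 0.0
    return sum(1 for w in words if w[0].isupper()) / len(words)


def orthographic_features(ngram_tokens):
    """A few illustrative orthographic features for one n-gram."""
    return {
        "num_words": len(ngram_tokens),
        "all_capitalized": all(t[:1].isupper() for t in ngram_tokens),
        "all_uppercase": all(t.isupper() for t in ngram_tokens),
        "contains_digit": any(c.isdigit() for t in ngram_tokens for c in t),
    }


ratio = capitalized_word_ratio("Kyary Pamyu Pamyu is in Harajuku")
features = orthographic_features(["Kyary", "Pamyu"])
```

A tweet-level ratio near 0 or 1 suggests the author capitalizes nothing or everything, in which case capitalization of a single n-gram carries less evidence.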
  12. Type Prediction
  13. Type Prediction: KB Entity Mentions
      ‣ Classifies the type of an entity that exists in the KB
      ‣ Algorithm: Ensemble of random forest and logistic regression
      ‣ Main features:
        ✦ KB entity classes: Entity classes in open KBs (DBpedia Ontology classes and Freebase types)
        ✦ NER entity types: Entity types predicted by Stanford NER
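The slide does not say how the random forest and logistic regression outputs are combined. One common option is soft voting, that is, averaging the class probabilities of the two models. The sketch below assumes that scheme; the function names, the equal default weights, and the example probabilities and type labels are all hypothetical.

```python
def average_probabilities(distributions, weights=None):
    """Combine per-model {class: probability} dicts by weighted averaging."""
    if weights is None:
        weights = [1.0] * len(distributions)
    total_weight = sum(weights)
    classes = set().union(*distributions)
    return {
        c: sum(w * d.get(c, 0.0) for w, d in zip(weights, distributions)) / total_weight
        for c in classes
    }


def predict_type(distributions, weights=None):
    """Return the class with the highest averaged probability."""
    combined = average_probabilities(distributions, weights)
    # Iterate over sorted class names so ties are broken deterministically.
    return max(sorted(combined), key=combined.get)


# Made-up model outputs for a single mention.
random_forest_probs = {"Person": 0.6, "Organization": 0.2, "Location": 0.2}
logistic_regression_probs = {"Person": 0.1, "Organization": 0.8, "Location": 0.1}
predicted = predict_type([random_forest_probs, logistic_regression_probs])
```

Averaging probabilities rather than taking a majority vote lets a confident model outweigh an uncertain one, which matters when only two models are combined.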
  14. Type Prediction: NIL Entity Mentions
      ‣ Classifies the type of an entity that does not exist in the KB
      ‣ Algorithm: Ensemble of random forest and logistic regression
      ‣ Main features:
        ✦ Word embeddings: Average of the vectors of the words in the n-gram; the GloVe Twitter 2B model is used for the word embeddings
        ✦ NER entity types: Entity types predicted by Stanford NER
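The averaged word-embedding feature can be sketched as below. This is an illustration only: the tiny three-dimensional table stands in for the GloVe Twitter vectors, and the lowercasing and the zero vector for out-of-vocabulary n-grams are assumptions the slide does not state.

```python
def average_word_vector(tokens, embeddings, dim):
    """Average the embedding vectors of the tokens found in the vocabulary.

    Returns a zero vector of length `dim` when no token is in the vocabulary.
    """
    vectors = [embeddings[t.lower()] for t in tokens if t.lower() in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Toy embedding table (assumed; real GloVe vectors would be loaded from a file).
toy_embeddings = {
    "tokyo": [1.0, 0.0, 2.0],
    "tower": [3.0, 2.0, 0.0],
}
ngram_vector = average_word_vector(["Tokyo", "Tower"], toy_embeddings, dim=3)
```

The result is a fixed-length feature regardless of how many words the n-gram contains, which is what allows it to be passed to a standard classifier.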
  15. Experimental Results
      ‣ Evaluated our method on the #Microposts 2015 dataset, split into a training set (3,498 tweets) and a dev set (500 tweets)
      ‣ Achieved accurate performance on all of the measures
  16. Conclusions & Future Work
      ‣ A novel entity linking approach is presented
      ‣ Accurate mention detection seems to be very hard (even for a human!), especially in inconsistently capitalized tweets
      ‣ Many errors are observed in replies (@name) and hashtags (#tag); a method to handle them may be needed
      ‣ Future work: further analysis of the model (e.g., feature study, experimenting with other machine-learning algorithms)
  17. THANK YOU!
