
Joint Learning of the Embedding of Words and Entities for Named Entity Disambiguation

Presented at CoNLL 2016: http://aclanthology.info/events/conll-2016



  1. Joint Learning of the Embedding of Words and Entities for Named Entity Disambiguation. Ikuya Yamada (1,2), Hiroyuki Shindo (3), Hideaki Takeda (4), Yoshiyasu Takefuji (2). (1) Studio Ousia, (2) Keio University, (3) Nara Institute of Science and Technology, (4) National Institute of Informatics
  2. Named Entity Disambiguation ‣ Named Entity Disambiguation (NED) is the task of resolving named entity mentions to their correct references in a knowledge base (a toy illustration follows the slide list) [Figure: the sentence "New Frozen Boutique to Open at Disney's Hollywood Studios" with mentions linked to /wiki/Frozen_(2013_film), /wiki/The_Walt_Disney_Company, and /wiki/Disney’s_Hollywood_Studios]
  3. Joint Learning of the Embedding of Words and Entities ‣ The proposed method extends the skip-gram model to map words and entities into the same continuous vector space ‣ Three models are combined to train the embedding (see the joint skip-gram sketch after the slide list): ‣ KB graph model (graph) learns to estimate the neighboring entities of an entity in the Wikipedia link graph ‣ Anchor context model (anchor) learns to predict the words surrounding an entity, using anchors and their context words ‣ Conventional skip-gram model (word) learns to predict the neighboring words of a target word [Figure: Wikipedia link graph (Aristotle, Plato, Socrates, Avicenna, Philosopher, Philosophy, Metaphysics, Logic, Science, Europe, Renaissance) combined with the neighboring words of words and anchors, e.g. "Aristotle was a philosopher"]
  4. ‣ We propose two simple context models based on the proposed embedding: ‣ Textual context: cosine similarity between the vector of the target entity and the average vector of the nouns in the document ‣ Coherence: cosine similarity between the vector of the target entity and the average vector of the other entities in the document ‣ These context models and standard NED features (e.g., prior probability and entity prior) are combined using supervised machine learning with gradient-boosted regression trees (GBRT); a sketch appears at the end ‣ We achieved state-of-the-art accuracies on two popular NED datasets [Callout: SOTA accuracies on two popular datasets!]
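To make the NED task on slide 2 concrete, here is a toy illustration, not the authors' system: each mention has a set of candidate knowledge-base entities, and the disambiguator selects one. The candidate dictionary and the prior probabilities below are invented for the example.

```python
# Toy NED illustration (hypothetical candidates and priors, not real statistics).
candidates = {
    "Frozen": {
        "/wiki/Frozen_(2013_film)": 0.72,      # assumed p(entity | mention)
        "/wiki/Frozen_(Madonna_song)": 0.08,
        "/wiki/Freezing": 0.20,
    },
}

def resolve(mention: str) -> str:
    """Pick the candidate entity with the highest prior for the mention."""
    return max(candidates[mention].items(), key=lambda kv: kv[1])[0]

print(resolve("Frozen"))  # -> /wiki/Frozen_(2013_film)
```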

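The following is a minimal sketch of the joint training idea on slide 3, under heavy assumptions: words and entities share one embedding matrix, and (target, context) pairs produced by the KB graph model, the anchor context model, and the word skip-gram model are all trained with the same negative-sampling objective. The vocabulary, dimensionality, training pairs, and hyperparameters are toy stand-ins, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared vocabulary of words and entities (toy; real vocabularies are huge).
vocab = ["aristotle", "was", "a", "philosopher",                    # words
         "ENTITY:Aristotle", "ENTITY:Plato", "ENTITY:Philosophy"]   # entities
idx = {v: i for i, v in enumerate(vocab)}
dim = 50
W_in = rng.normal(scale=0.1, size=(len(vocab), dim))   # target vectors
W_out = rng.normal(scale=0.1, size=(len(vocab), dim))  # context vectors

# (target, context) pairs contributed by the three models (toy examples):
pairs = [
    ("ENTITY:Aristotle", "ENTITY:Plato"),   # KB graph model: linked entities
    ("ENTITY:Aristotle", "philosopher"),    # anchor context model: entity -> nearby word
    ("aristotle", "philosopher"),           # word model: ordinary skip-gram
]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

lr, n_neg = 0.025, 2
for epoch in range(200):
    for target, context in pairs:
        ti, ci = idx[target], idx[context]
        negatives = rng.integers(0, len(vocab), size=n_neg)  # noise samples
        grad_in = np.zeros(dim)
        for j, label in [(ci, 1.0)] + [(int(n), 0.0) for n in negatives]:
            score = sigmoid(W_in[ti] @ W_out[j])
            g = score - label                  # gradient of the logistic loss
            grad_in += g * W_out[j]
            W_out[j] -= lr * g * W_in[ti]
        W_in[ti] -= lr * grad_in

# After training, related words and entities (e.g. "philosopher" and
# ENTITY:Aristotle) end up close together in the shared vector space.
```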
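And a sketch of the two context models and their combination from slide 4, again with assumed inputs: cosine similarity of a candidate entity vector with (a) the average vector of the nouns in the document (textual context) and (b) the average vector of the other entities in the document (coherence), combined with a prior-probability feature. scikit-learn's gradient boosting is used here as a stand-in for the paper's GBRT-based learner; the embeddings and labels are random toy data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def candidate_features(entity_vec, noun_vecs, other_entity_vecs, prior):
    """Feature vector for one candidate entity of one mention."""
    textual = cosine(entity_vec, noun_vecs.mean(axis=0))             # textual context
    coherence = cosine(entity_vec, other_entity_vecs.mean(axis=0))   # coherence
    return [textual, coherence, prior]          # plus other standard NED features

# Toy training data with random embeddings: label 1 = correct referent, 0 = wrong.
rng = np.random.default_rng(0)
dim = 50
X, y = [], []
for _ in range(200):
    noun_vecs = rng.normal(size=(5, dim))       # vectors of the nouns in the document
    other_entities = rng.normal(size=(3, dim))  # vectors of the other entities
    label = int(rng.integers(0, 2))
    # Make "correct" candidates resemble the document; wrong ones stay random.
    entity_vec = noun_vecs.mean(axis=0) if label else rng.normal(size=dim)
    X.append(candidate_features(entity_vec, noun_vecs, other_entities,
                                prior=rng.random()))
    y.append(label)

model = GradientBoostingClassifier().fit(X, y)
# At test time, each mention's candidates are scored and the top-scoring one is chosen.
```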