
An introduction to "Entity Linking meets Word Sense Disambiguation: a Unified Approach" (TACL 2014)


My presentation of the paper "Entity Linking meets Word Sense Disambiguation: a Unified Approach" (TACL 2014) by Andrea Moro, Alessandro Raganato, and Roberto Navigli (University of Roma).

Published in: Engineering

  1. 1. Entity Linking meets Word Sense Disambiguation: a Unified Approach. Andrea Moro, Alessandro Raganato, Roberto Navigli (University of Roma). TACL Vol. 2 (5/2014), pp. 231-244. Presenter: Koji Matsuda (Tohoku University). Cutting-Edge NLP Study Group (最先端NLP勉強会) #6 @ The University of Tokyo
  2. 2. WSD and Entity Linking Together. Diagram: a lexical knowledge base and an encyclopedic knowledge base are combined into an integrated knowledge base, which provides semantic signatures. Input text: "Thomas and Mario are strikers playing in Munich. They are …" Candidate meanings (Thomas Muller, Thomas Milan, Mario Gomez, striker, playing, Munich, FC Bayern Munich) form the semantic interpretation graph → select the most suitable meaning on the graph
  3. 3. Background: Word Sense Disambiguation (WSD) in a Nutshell. Input: “Thomas and Mario are strikers playing in Munich” → WSD system, backed by a knowledge base (e.g. WordNet) → sense of the target word
  4. 4. Background: Entity Linking (EL) in a Nutshell. Input: “Thomas and Mario are strikers playing in Munich” → EL system, backed by a knowledge base (e.g. Wikipedia) → named entity. Subtasks: • Mention Detection • Link Detection • Entity Disambiguation
  5. 5. Figure: associating document phrases with appropriate Wikipedia articles. Article labels shown: Hillary Rodham Clinton, Barack Obama, Democratic Party (United States), Democrat, Delegate, Nomination, Voting, President of the United States, Michigan (US State), Florida (US State). Image taken from Milne and Witten (2008b), Learning to Link with Wikipedia, in CIKM '08.
  6. 6. A Joint Approach to WSD and EL • Knowledge-based approaches perform well on both tasks: – The main difference is the kind of inventory (knowledge base) used • If we had a knowledge base that contains both concepts and named entities, we could solve the two tasks together!
  7. 7. BabelNet • Multilingual encyclopedic dictionary – Lexicographic & encyclopedic knowledge – Based on automatic integration of: • WordNet, Wikipedia, Wiktionary, … • Figure labels: concepts from WordNet; named entities and specialized concepts from Wikipedia; concepts integrated from both resources • 50 languages, 21M definitions, 62M entries
  8. 8. Babelfy: A Joint Approach to WSD and EL 1. Precompute semantic signatures; 2. Select all possible candidate meanings from BabelNet by matching mentions against BabelNet lexicalizations; 3. Connect the candidate meanings using the semantic signatures (= semantic interpretation graph); 4. Extract a dense subgraph containing semantically coherent candidates; 5. Select the most connected candidate for each fragment.
  9. 9. Step 1: Compute Semantic Signatures • Semantic signature: the set of relevant vertices for a given vertex in the semantic network – computed by Random Walk with Restart (RWR) over BabelNet 1. Start from the target vertex of the semantic network; 2. Randomly move to a neighbor of the current vertex, or restart from the target vertex; 3. Keep counts of how often each vertex is hit; 4. Take the most visited vertices.
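The four RWR steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the toy graph, the function name `semantic_signature`, and the values of `steps`, `restart_prob`, and `top_k` are all assumptions chosen for the example, and neighbors are picked uniformly, whereas the paper weights edges.

```python
import random
from collections import Counter

def semantic_signature(graph, target, steps=10_000, restart_prob=0.15,
                       top_k=3, seed=0):
    """Approximate the semantic signature of `target` by a random walk with restart.

    graph: dict mapping each vertex to a list of its neighbors (toy stand-in
    for the semantic network). All parameter values are illustrative only.
    """
    rng = random.Random(seed)
    visits = Counter()
    current = target                      # 1. start from the target vertex
    for _ in range(steps):
        neighbors = graph.get(current, [])
        # 2. restart from the target (also at dead ends), or move to a neighbor
        if not neighbors or rng.random() < restart_prob:
            current = target
        else:
            current = rng.choice(neighbors)
            visits[current] += 1          # 3. keep the hitting counts
    visits.pop(target, None)              # the target is not in its own signature
    return [v for v, _ in visits.most_common(top_k)]  # 4. most visited vertices

# Hypothetical toy network built around the slide's "striker" example.
toy_graph = {
    "striker": ["soccer player", "offside", "sport"],
    "soccer player": ["striker", "athlete", "sport"],
    "athlete": ["soccer player", "sport"],
    "sport": ["striker", "soccer player", "athlete"],
    "offside": ["striker"],
    "bank": ["money"],                    # unrelated component, never reached
    "money": ["bank"],
}

signature = semantic_signature(toy_graph, "striker", top_k=4)
print(signature)
```

Because the walk keeps returning to the target, vertices close to "striker" collect most of the visits, while the unrelated "bank"/"money" component is never reached.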
  10. 10. Example of Semantic Signature. Graph nodes: striker, offside, athlete, sport, soccer player. semSign(“striker”) = { “sport”, “offside”, “soccer player”, “athlete”, … }
  11. 11. Babelfy: A Joint Approach to WSD and EL 1. Precompute semantic signatures; 2. Select all possible candidate meanings from BabelNet by matching mentions against BabelNet lexicalizations; 3. Connect the candidate meanings using the semantic signatures (= semantic interpretation graph); 4. Extract a dense subgraph containing semantically coherent candidates; 5. Select the most connected candidate for each fragment.
  12. 12. Step 3: Construct SI Graph (1) Create a vertex for every candidate meaning in the text (Algorithm 2, lines 6-8). Example: Thomas and Mario are strikers playing in Munich. Vertices: (Thomas Milan, Thomas), (Thomas Muller, Thomas), (Mario Adorf, Mario), (Mario Gomez, Mario), (Mario Basler, Mario), (forward, striker), (striker, striker), (Munich, Munich), (FC Bayern Munich, Munich)
  13. 13. Step 3: Construct SI Graph (2) Connect related meanings based on their semantic signatures (Algorithm 2, lines 9-11). Example: Thomas and Mario are strikers playing in Munich. Vertices: (Thomas Milan, Thomas), (Thomas Muller, Thomas), (Mario Adorf, Mario), (Mario Gomez, Mario), (Mario Basler, Mario), (forward, striker), (striker, striker), (Munich, Munich), (FC Bayern Munich, Munich)
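The two construction steps (one vertex per candidate meaning, then edges between related meanings) might look roughly as follows. This is a sketch under stated assumptions: the edge rule (connect (v, f) to (v', f') when v' is in the semantic signature of v and the fragments differ) is my reading of the slides, and the `candidates` and `sem_sign` tables are invented toy data, not BabelNet output.

```python
def build_si_graph(candidates, sem_sign):
    """Build a toy semantic interpretation (SI) graph.

    candidates: dict fragment -> list of candidate meanings
    sem_sign:   dict meaning  -> set of meanings in its semantic signature
    Returns (vertices, edges); a vertex is a (meaning, fragment) pair.
    """
    # (1) one vertex per candidate meaning of each fragment
    vertices = [(m, f) for f, meanings in candidates.items() for m in meanings]
    # (2) directed edge when the second meaning is in the first one's signature;
    #     candidates of the same fragment are never connected to each other
    edges = set()
    for m1, f1 in vertices:
        for m2, f2 in vertices:
            if f1 != f2 and m2 in sem_sign.get(m1, set()):
                edges.add(((m1, f1), (m2, f2)))
    return vertices, edges

# Hypothetical candidates for "Thomas and Mario are strikers playing in Munich".
candidates = {
    "Thomas": ["Thomas Muller", "Thomas Milan"],
    "Mario": ["Mario Gomez", "Mario Basler", "Mario Adorf"],
    "striker": ["striker", "forward"],
    "Munich": ["Munich", "FC Bayern Munich"],
}
# Invented semantic signatures, for illustration only.
sem_sign = {
    "Thomas Muller": {"FC Bayern Munich", "striker", "Mario Gomez"},
    "Mario Gomez": {"FC Bayern Munich", "striker", "Thomas Muller"},
    "Mario Basler": {"FC Bayern Munich"},
    "FC Bayern Munich": {"Thomas Muller", "Mario Gomez", "Munich"},
    "striker": {"Thomas Muller", "Mario Gomez"},
}

vertices, edges = build_si_graph(candidates, sem_sign)
print(len(vertices), "vertices,", len(edges), "edges")
```

Note that "Munich" appears in the signature of "FC Bayern Munich" but no edge is created, because both are candidates for the same fragment.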
  14. 14. Babelfy: A Joint Approach to WSD and EL 1. Precompute semantic signatures; 2. Select all possible candidate meanings from BabelNet by matching mentions against BabelNet lexicalizations; 3. Connect the candidate meanings using the semantic signatures (= semantic interpretation graph); 4. Extract a dense subgraph containing semantically coherent candidates; 5. Select the most connected candidate for each fragment.
  15. 15. Step 4: Densest Subgraph Heuristic • Reducing the level of ambiguity of the SI graph is helpful – Main idea: the most suitable meaning of a fragment will belong to the densest area of the graph – But the size-constrained densest subgraph problem is NP-hard – Instead: iterative removal of low-coherence vertices • Algorithm 3 • Iteratively identify the most ambiguous fragment fmax and discard its weakest interpretation
  16. 16. Step 4: Densest Subgraph Heuristic. Iteratively remove the weakest interpretation of the most ambiguous fragment (Algorithm 2, line 12). Example: Thomas and Mario are strikers playing in Munich. Vertices: (Thomas Milan, Thomas), (Thomas Muller, Thomas), (Mario Adorf, Mario), (Mario Gomez, Mario), (Mario Basler, Mario), (forward, striker), (striker, striker), (Munich, Munich), (FC Bayern Munich, Munich)
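The iterative removal can be illustrated with a simplified loop. This is only a sketch of the idea on the slide: it ranks interpretations by plain degree and stops once every fragment has at most `max_ambiguity` candidates, both of which are assumptions; Algorithm 3 in the paper may differ in its scoring and stopping details. The toy vertices and edges are invented.

```python
def degree(vertex, edges):
    """Number of incoming plus outgoing edges of `vertex`."""
    return sum(1 for a, b in edges if a == vertex or b == vertex)

def prune(vertices, edges, max_ambiguity=1):
    """Repeatedly drop the weakest candidate of the most ambiguous fragment."""
    vertices, edges = set(vertices), set(edges)
    while True:
        # group the remaining candidates by fragment (sorted for determinism)
        by_fragment = {}
        for meaning, fragment in sorted(vertices):
            by_fragment.setdefault(fragment, []).append((meaning, fragment))
        # most ambiguous fragment = the one with the most candidates left
        _, cands = max(by_fragment.items(), key=lambda kv: len(kv[1]))
        if len(cands) <= max_ambiguity:
            return vertices, edges
        # weakest interpretation = lowest degree (a simplification)
        weakest = min(cands, key=lambda v: degree(v, edges))
        vertices.discard(weakest)
        edges = {(a, b) for a, b in edges if weakest not in (a, b)}

# Invented toy SI graph; vertices are (meaning, fragment) pairs.
muller, milan = ("Thomas Muller", "Thomas"), ("Thomas Milan", "Thomas")
gomez, adorf = ("Mario Gomez", "Mario"), ("Mario Adorf", "Mario")
bayern, munich = ("FC Bayern Munich", "Munich"), ("Munich", "Munich")
toy_vertices = [muller, milan, gomez, adorf, bayern, munich]
toy_edges = {
    (muller, bayern), (bayern, muller),
    (gomez, bayern), (bayern, gomez),
    (muller, gomez), (gomez, muller),
    (munich, adorf),
}

kept, kept_edges = prune(toy_vertices, toy_edges)
print(sorted(kept))
```

In this toy graph the three mutually connected football-related meanings survive, while the weakly connected candidates are discarded one at a time.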
  17. 17. Babelfy: A Joint Approach to WSD and EL 1. Precompute semantic signatures; 2. Select all possible candidate meanings from BabelNet by matching mentions against BabelNet lexicalizations; 3. Connect the candidate meanings using the semantic signatures (= semantic interpretation graph); 4. Extract a dense subgraph containing semantically coherent candidates; 5. Select the most connected candidate for each fragment.
  18. 18. Step 5: Select Most Reliable Meanings. Select the most suitable meaning for each fragment by normalized weighted degree (Eq. (2)) (Algorithm 2, lines 14-18). Example: Thomas and Mario are strikers playing in Munich. Remaining vertices: (Thomas Milan, Thomas), (Thomas Muller, Thomas), (Mario Gomez, Mario), (Mario Basler, Mario), (forward, striker), (striker, striker), (FC Bayern Munich, Munich)
  19. 19. Step 5: Select Most Reliable Meanings • Lexical coherence: the fraction of fragments the candidate is related to • Semantic coherence: a graph centrality measure among the candidate meanings, based on deg(v), the number of incoming and outgoing edges of v
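The formulas on this slide did not survive in the transcript, so the following is a sketch of one plausible reading of "normalized weighted degree", to be checked against Eq. (2) of the paper. It assumes that lexical coherence is the fraction of the other fragments a candidate is linked to, that semantic coherence is deg(v), and that their product is normalized over all candidates of the same fragment. The function names and the toy graph are invented for illustration.

```python
def degree(vertex, edges):
    """Semantic coherence (assumed): incoming plus outgoing edges of `vertex`."""
    return sum(1 for a, b in edges if a == vertex or b == vertex)

def lexical_coherence(vertex, edges, fragments):
    """Assumed reading: fraction of the *other* fragments linked to `vertex`."""
    linked = set()
    for a, b in edges:
        if a == vertex:
            linked.add(b[1])
        if b == vertex:
            linked.add(a[1])
    linked.discard(vertex[1])
    return len(linked) / (len(fragments) - 1)

def score(vertex, vertices, edges, fragments):
    """Weighted degree of `vertex`, normalized over its fragment's candidates."""
    def weighted_degree(v):
        return lexical_coherence(v, edges, fragments) * degree(v, edges)
    total = sum(weighted_degree(v) for v in vertices if v[1] == vertex[1])
    return weighted_degree(vertex) / total if total else 0.0

# Invented toy SI graph; vertices are (meaning, fragment) pairs.
muller, gomez = ("Thomas Muller", "Thomas"), ("Mario Gomez", "Mario")
bayern, munich = ("FC Bayern Munich", "Munich"), ("Munich", "Munich")
toy_vertices = [muller, gomez, bayern, munich]
toy_fragments = {"Thomas", "Mario", "Munich"}
toy_edges = {
    (muller, bayern), (bayern, muller),
    (gomez, bayern), (bayern, gomez),
    (muller, gomez), (gomez, muller),
    (munich, gomez),
}

score_bayern = score(bayern, toy_vertices, toy_edges, toy_fragments)
score_munich = score(munich, toy_vertices, toy_edges, toy_fragments)
best_for_munich = max([bayern, munich],
                      key=lambda v: score(v, toy_vertices, toy_edges, toy_fragments))
print(best_for_munich, round(score_bayern, 3), round(score_munich, 3))
```

Under these assumptions, the scores of the candidates for one fragment sum to 1, so they can be compared against a confidence threshold as well as against each other.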
  20. 20. Experiment • WSD datasets (only nominal mentions): – SemEval-2013 task 12: Multilingual WSD • English, French, German, Italian, Spanish – SemEval-2007 task 7: Coarse-grained WSD – SemEval-2007 task 17: Fine-grained WSD – Senseval-3 WSD: Fine-grained WSD • Entity Linking datasets: – KORE50: 50 short sentences, YAGO2 – AIDA-CoNLL: 1392 articles, YAGO
  21. 21. Result: WSD (fine-grained). Chart: F1 score, proposed vs. baseline.
  22. 22. Result: Entity Linking. Chart: accuracy, proposed vs. EL only.
  23. 23. Impact of Components (Sens3 through Sem13 are Word Sense Disambiguation datasets; KORE50 and CoNLL are Entity Linking datasets)
      System             | Sens3 | Sem07 | Sem07 (coarse) | Sem13 (EN, BN) | KORE50 | CoNLL
      Babelfy (proposed) | 68.3  | 62.7  | 85.5           | 69.2           | 71.5   | 82.1
      + unif. weight     | 67.0  | 65.2  | 85.7           | 68.5           | 69.4   | 81.7
      + w/o dens. sub.   | 68.3  | 63.3  | 84.9           | 68.7           | 62.5   | 78.1
      + only concepts    | 68.2  | 62.7  | 85.3           | 68.7           | -      | -
      + only NE          | -     | -     | -              | -              | 68.1   | 78.8
      + on sentences     | 66.0  | 65.2  | 82.3           | 67.1           | -      | -
      Observations: • Triangle-based weighting has a smaller impact • The densest subgraph heuristic is effective on the EL tasks, but not on the WSD datasets • Joint use of lexicographic and encyclopedic knowledge benefits each task
  24. 24. Conclusion • Integrated approach to EL and WSD – Semantic signatures (random walk over the knowledge base) – Unconstrained identification of candidate meanings – Linking based on a high-coherence densest subgraph heuristic • The approach exploits two key features of BabelNet – Multilinguality and integration of lexicographic and encyclopedic knowledge • The experimental results show state-of-the-art performance – Robustness across languages – Ablation tests show which components contribute to performance
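  25. 25. Comments • This work is possible only because of the knowledge base that Roberto Navigli's team has been building for many years – BabelNet: WordNet concepts + Wikipedia NEs (50 languages) + etc. • The method itself is not especially elaborate, but it makes good use of the properties of its knowledge base – The semantic signature simplifies the definition in [Pilehvar and Navigli 2013] (multinomial distribution over senses ⇒ plain set of senses) • The first work to argue that WSD helps EL (that is, that solving the two jointly is worthwhile) – However, I do not think the experimental results show that EL improves WSD performance – Whether BabelNet's multilinguality really helps cannot be stated clearly, since the only evidence is the Multilingual WSD results • I suspected that Random Walk with Restart (RWR) is essentially just PageRank, so I asked the authors; their answer was "Yes, if you run it infinitely many times you get the same values as Personalized PageRank"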
  26. 26. http://babelfy.org (Note: an SDK and a REST API are reportedly also available.)
