
Combining Textual and Graph-Based Features for Named Entity Disambiguation using Undirected Probabilistic Graphical Models


  1. Combining Textual and Graph-Based Features for Named Entity Disambiguation using Undirected Probabilistic Graphical Models
     Sherzod Hakimov, Hendrik ter Horst, Soufian Jebbara, Matthias Hartung & Philipp Cimiano
     Semantic Computing Group, CITEC, Bielefeld University
  2. Problem Definition - Named Entity Disambiguation
     Example sentence, with the mentions to disambiguate in brackets: “[Istanbul] is the capital of [Turkey] and the largest city in the [E.U.]”
  6. Candidate Retrieval
     ● Index from DBpedia & Wikipedia data with frequency values
       ○ DBpedia label properties (rdfs:label, dbo:firstName, etc.)
       ○ Wikipedia anchors
     Example index entries:
       Link: dbr:Barack_Obama, Term: “Barack Obama”, Frequency: 1020
       Link: dbr:Presidency_of_Obama, Term: “Barack Obama”, Frequency: 10
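A minimal sketch of how such a frequency index could be queried, assuming a simple in-memory dictionary. The function names `build_index` and `retrieve_candidates`, the lowercase normalization, and the `top_k` cutoff are illustrative assumptions, not details from the slides; only the two "Barack Obama" rows come from the example above.

```python
from collections import defaultdict

# (term, link, frequency) rows; these two rows are the example from the slide.
ROWS = [
    ("Barack Obama", "dbr:Barack_Obama", 1020),
    ("Barack Obama", "dbr:Presidency_of_Obama", 10),
]


def build_index(rows):
    """Map each surface form (lowercased, an assumption) to {link: frequency}."""
    index = defaultdict(dict)
    for term, link, freq in rows:
        key = term.lower()
        index[key][link] = index[key].get(link, 0) + freq
    return index


def retrieve_candidates(index, surface_form, top_k=10):
    """Return up to top_k (link, frequency) pairs, most frequent first."""
    candidates = index.get(surface_form.lower(), {})
    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]


index = build_index(ROWS)
candidates = retrieve_candidates(index, "Barack Obama")
```

Here `candidates` lists dbr:Barack_Obama before dbr:Presidency_of_Obama because of its higher frequency; an unknown surface form yields an empty list.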
  9. NERFGUN
     ● Undirected Factor Graphs
     ● Collective disambiguation
     ● Textual & Graph-based Features (could be any language)
     ● Comparable with state-of-the-art systems
  11. Inference
      ● Generates new states from given state
      ● Markov Chain Monte Carlo
      ● State - partial or full assignment
      State s (full assignment) for the mentions Istanbul, Turkey, E.U.: dbr:Istanbul, dbr:Turkey, dbr:European_Union
      State s (partial assignment): dbr:Istanbul
  13. Objective Score
      Assignment for the mentions Istanbul, Turkey, E.U.: dbr:Istanbul, dbr:Turkey, dbr:European_Union
  14. Inference - Initial State
      Randomly initialized assignment for the mentions Istanbul, Turkey, E.U.: dbr:Istanbul_Atatürk_Airport, dbr:Turkey, dbr:European_Commission
  16. Inference - Atomic change
      Input: state s_i, with the mentions Istanbul, Turkey, E.U. assigned to dbr:Istanbul_Atatürk_Airport, dbr:Turkey, dbr:European_Commission
      Atomic change: 1 annotation changes; new states are generated from all possible candidates
      Output: list of new states, e.g.
        dbr:Istanbul, dbr:Turkey, dbr:European_Commission
        dbr:Istanbul_University, dbr:Turkey, dbr:European_Commission
        dbr:Istanbul_Atatürk_Airport, dbr:Turkey, dbr:European_Union
        ...
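The atomic-change step can be sketched as follows, assuming a state is a dictionary from mention to URI. The function name `atomic_changes` and the candidate lists are hypothetical (the slides show only the URIs, and the pairing of mentions with URIs is inferred from their order); the input state is the randomly initialized one from the slides.

```python
def atomic_changes(state, candidates):
    """Return every state that differs from `state` in exactly one annotation."""
    new_states = []
    for mention, current in state.items():
        for candidate in candidates[mention]:
            if candidate != current:
                new_state = dict(state)  # copy, so the input state is untouched
                new_state[mention] = candidate
                new_states.append(new_state)
    return new_states


# Hypothetical candidate lists per mention, built from URIs seen on the slides.
candidates = {
    "Istanbul": [
        "dbr:Istanbul",
        "dbr:Istanbul_Atatürk_Airport",
        "dbr:Istanbul_University",
    ],
    "Turkey": ["dbr:Turkey"],
    "E.U.": ["dbr:European_Union", "dbr:European_Commission"],
}

# Input state s_i: the randomly initialized assignment.
state = {
    "Istanbul": "dbr:Istanbul_Atatürk_Airport",
    "Turkey": "dbr:Turkey",
    "E.U.": "dbr:European_Commission",
}

neighbours = atomic_changes(state, candidates)
```

With these candidate lists the input state has three neighbours: two that change the annotation of "Istanbul" and one that changes the annotation of "E.U.". A sampler would then score these neighbours and pick the next state among them.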
  20. Features
      ● PageRank - computed for all DBpedia resources using random walk
      ● Term Frequency - frequency values between surface form and URI
      ● Edit distance - Levenshtein distance between URI and surface form
      ● Document Similarity - text similarity of the given document and DBpedia abstracts of each annotation
      ● Topic Specific PageRank - computed for all DBpedia resources (while noting the source and target nodes for each walk) using random walk
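As an illustration of the edit distance feature, here is a standard dynamic-programming Levenshtein distance applied to a surface form and a URI. The normalization in `uri_label` (dropping the `dbr:` prefix, turning underscores into spaces, lowercasing) is an assumption for this sketch; the slides do not say how the URI is prepared before comparison.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def uri_label(uri: str) -> str:
    """Assumed normalization: strip the prefix, underscores to spaces, lowercase."""
    return uri.split(":", 1)[-1].replace("_", " ").lower()


def edit_distance_feature(surface_form: str, uri: str) -> int:
    return levenshtein(surface_form.lower(), uri_label(uri))
```

Under this normalization, the surface form "Istanbul" is at distance 0 from dbr:Istanbul and at a larger distance from dbr:Istanbul_University, so the feature favors the closer label.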
  25. Factor Graphs - Features
      State: mentions Istanbul, Turkey, E.U. assigned to dbr:Istanbul_University, dbr:Turkey, dbr:European_Union
      Features shown: Edit distance, Term Frequency (e.g. inverted index), PageRank, Topic Specific PageRank
      Example abstract: dbr:Turkey dbo:abstract “Turkey (/ˈtɜːrki/; Turkish: Türkiye [ˈtyɾcije]), officially the Republic of Turkey ...”
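One common way to turn such features into a score for a whole state is a log-linear factor graph: each factor contributes a weighted sum of its features, and the state score is the exponential of the total. The sketch below assumes this form; the split into per-annotation and pairwise factors, the toy feature functions, and all weights and values are hypothetical and stand in for the real features listed above.

```python
import math


def state_score(state, unary_features, pairwise_features, weights):
    """Unnormalized score: exp of the weighted feature sum over all factors."""
    total = 0.0
    # One factor per annotation (mention, URI).
    for mention, uri in state.items():
        for name, value in unary_features(mention, uri).items():
            total += weights.get(name, 0.0) * value
    # One factor per pair of annotations, to model their dependency.
    uris = list(state.values())
    for i in range(len(uris)):
        for j in range(i + 1, len(uris)):
            for name, value in pairwise_features(uris[i], uris[j]).items():
                total += weights.get(name, 0.0) * value
    return math.exp(total)


def toy_unary(mention, uri):
    """Hypothetical stand-in for per-annotation features such as edit distance."""
    label = uri.split(":", 1)[-1].replace("_", " ").lower()
    return {"exact_label_match": 1.0 if label == mention.lower() else 0.0}


# Hypothetical relatedness table standing in for a graph-based pairwise feature.
RELATED = {frozenset(("dbr:Istanbul", "dbr:Turkey"))}


def toy_pairwise(uri_a, uri_b):
    return {"related": 1.0 if frozenset((uri_a, uri_b)) in RELATED else 0.0}


weights = {"exact_label_match": 1.0, "related": 0.5}
good = {"Istanbul": "dbr:Istanbul", "Turkey": "dbr:Turkey"}
bad = {"Istanbul": "dbr:Istanbul_University", "Turkey": "dbr:Turkey"}
good_score = state_score(good, toy_unary, toy_pairwise, weights)
bad_score = state_score(bad, toy_unary, toy_pairwise, weights)
```

The pairwise factors are what make the disambiguation collective: changing one annotation also changes the contribution of every pair it takes part in.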
  29. Model Training
      ● SampleRank - learning weights for features
      ● Datasets: AIDA/CoNLL Training & MicroPost 2014 Training
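SampleRank learns weights from pairs of neighbouring states: when the objective prefers one state but the model does not rank it higher, the weights are moved toward the features of the preferred state. The following is a simplified, perceptron-style sketch of that idea, not the authors' training code; the function names, learning rate, margin, and feature values are assumptions.

```python
def dot(weights, features):
    """Linear model score of a state given its feature values."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def samplerank_update(weights, feats_a, feats_b, objective_a, objective_b,
                      learning_rate=1.0, margin=0.0):
    """Return updated weights if the model ranking contradicts the objective."""
    if objective_a == objective_b:
        return weights  # the objective expresses no preference for this pair
    if objective_a > objective_b:
        better, worse = feats_a, feats_b
    else:
        better, worse = feats_b, feats_a
    if dot(weights, better) - dot(weights, worse) > margin:
        return weights  # the model already ranks the pair correctly
    updated = dict(weights)
    for name in set(better) | set(worse):
        delta = better.get(name, 0.0) - worse.get(name, 0.0)
        updated[name] = updated.get(name, 0.0) + learning_rate * delta
    return updated


# Hypothetical feature values for two neighbouring states.
feats_good = {"term_frequency": 0.9, "edit_distance": 0.0}
feats_bad = {"term_frequency": 0.1, "edit_distance": 0.5}

weights = samplerank_update({}, feats_good, feats_bad, objective_a=1.0, objective_b=0.0)
```

After this single update the model scores the state preferred by the objective higher, so a second call on the same pair leaves the weights unchanged.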
  30. Model Training - Local Evaluation
      ● PageRank + Term Frequency + Edit Distance: 0.70 (short text)
      ● PageRank + Topic Specific PR + Term Frequency + Edit Distance: 0.78 (long text)
  32. Comparison
      ● GERBIL - framework for benchmarking named entity disambiguation and recognition, question answering
      ● State-of-the-art systems: AGDISTIS, AIDA, DBpedia Spotlight, TagMe, Babelfy, etc.
  35. Conclusion
      ● Collective disambiguation of named entities
      ● Model based on factor graphs to capture dependencies between annotations
      ● Impact of combining different features
      ● Achieves better results on unseen datasets
      ● Comparable results to state-of-the-art
      Thank you!
