
David Graus - Entity Linking (at SEA), Search Engines Amsterdam, Fri June 27th

  1. Entity Linking (at SEA): David Graus, University of Amsterdam. Photo by TRPultz (Creative Commons Attribution 3.0 Unported License)
  2. Today’s talk (Entity Linking at SEA, Search Engines Amsterdam, 27 June ’14)
     • What?
     • Why?
     • How?
     • Etc.
  3. Entity Linking?
     • Link mentions of entities (in text) to their referent entities (in a KB)
     • Example:
 
 “During Tank Johnson’s tumultuous tenure with the Bears, incidents with guns got him arrested, jailed and suspended, and his close friend was shot and killed in front of him after an altercation at a Chicago bar.”
  5. [Figure: the entity mention “Tank” in document r is used as query q against the Knowledge Base (KB); the candidate entities TANK (VEHICLE) and TANK JOHNSON are each marked “?”.]
  6. Entity Search Outline
     • What?
     • Why?
     • How?
     • Etc.
  7. [Image slide; no text]
  8. Social Media Monitoring
  9. [Image slide; no text]
  10. Entity Search Outline
     • What?
     • Why?
     • How?
     • Etc.
  11. [Image slide; no text]
  12. The Semanticizer
     • Open source framework (https://github.com/semanticize/semanticizer/)
     • Links to Wikipedia
     • Entity = Wikipedia page
     • “Lexical matching” approach
     • No NER or information extraction
     • http://semanticize.uva.nl/
  13. Lexical matching
     • Construct “entity dictionaries” by taking:
        • entity titles
        • anchors
        • redirect pages
  14. n-gram -> entity
     • Kendrick Lamar
     • K-Dot
     • Kendrick
     • K. Dot
     • Kendrick Duckworth
     • Kendrick Lamar'
     • Kendrick Lamar's
     • K Dot
     • Kendrick Lama
     • Kendrick Lamarr
     • Kendrick Llama
     • The Jig Is Up (Dump'n)
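The dictionary construction from slide 13 can be sketched in a few lines of Python. This is a minimal illustration, not the Semanticizer’s own code: the input format (a list of titles, plus (anchor text, target) and (redirect title, target) pairs), the lowercasing normalisation, and the toy data are all assumptions.

```python
from collections import defaultdict


def normalize(surface_form):
    """Assumed normalisation: underscores to spaces, trimmed, lowercased."""
    return surface_form.replace("_", " ").strip().lower()


def build_entity_dictionary(titles, anchors, redirects):
    """Map each surface form (n-gram) to the set of entities it may refer to.

    titles:    entity (page) titles, e.g. "Kendrick_Lamar"
    anchors:   (anchor text, target entity) pairs taken from hyperlinks
    redirects: (redirect page title, target entity) pairs
    """
    dictionary = defaultdict(set)
    for title in titles:
        # A page title is a surface form for the page itself.
        dictionary[normalize(title)].add(title)
    for anchor_text, entity in anchors:
        dictionary[normalize(anchor_text)].add(entity)
    for redirect_title, entity in redirects:
        dictionary[normalize(redirect_title)].add(entity)
    return dict(dictionary)


# Hypothetical toy input, loosely based on the surface forms on slide 14.
titles = ["Kendrick_Lamar", "Kendrick,_Idaho"]
anchors = [("K-Dot", "Kendrick_Lamar"),
           ("Kendrick", "Kendrick_Lamar"),
           ("Kendrick", "Kendrick,_Idaho")]
redirects = [("Kendrick_Duckworth", "Kendrick_Lamar")]

entity_dictionary = build_entity_dictionary(titles, anchors, redirects)
print(sorted(entity_dictionary["kendrick"]))
```

Because one surface form such as “kendrick” maps to several entities, the dictionary yields candidates rather than answers. Ranking those candidates is the next step.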
  15. Start linking!
     • For an input sentence s, retrieve all possible entity candidates
     • “Eminem Thinks Kendrick Lamar’s good kid, m.A.A.d. city Was ‘Genius’”
  16. Start linking!
     • For an input sentence s, retrieve all possible entity candidates
     • “Eminem Thinks Kendrick Lamar’s good kid, m.A.A.d. city Was ‘Genius’”
     • Candidates:
        • http://en.wikipedia.org/wiki/Eminem
        • http://en.wikipedia.org/wiki/Good_(economics)
        • http://en.wikipedia.org/wiki/Lamar_County,_Alabama
        • http://en.wikipedia.org/wiki/Lamar_County,_Mississippi
        • http://en.wikipedia.org/wiki/Lamar_Advertising_Company
        • http://en.wikipedia.org/wiki/Kendrick,_Idaho
        • http://en.wikipedia.org/wiki/Good_Kid_Maad_City
        • http://en.wikipedia.org/wiki/Kendrick_Lamar
        • http://en.wikipedia.org/wiki/Kendrick_School
        • http://en.wikipedia.org/wiki/Lamar_Cardinals_basketball
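Candidate retrieval amounts to looking up every n-gram of the sentence in the entity dictionary. The sketch below assumes a normalisation that lowercases and treats punctuation as whitespace, so that “Lamar’s” yields the token “lamar” and “good kid, m.A.A.d. city” matches its dictionary entry; the function names and the toy dictionary are hypothetical.

```python
import re


def normalize(text):
    """Assumed normalisation: lowercase, and treat punctuation as whitespace."""
    return " ".join(re.sub(r"\W+", " ", text.lower()).split())


def retrieve_candidates(sentence, surface_forms):
    """Return {matched n-gram: candidate entities} for every n-gram of
    `sentence` that appears as a key of `surface_forms`."""
    index = {}
    for form, entities in surface_forms.items():
        index.setdefault(normalize(form), set()).update(entities)
    # Only n-grams up to the longest dictionary entry can ever match.
    longest = max((len(form.split()) for form in index), default=0)
    tokens = normalize(sentence).split()
    candidates = {}
    for n in range(1, longest + 1):
        for start in range(len(tokens) - n + 1):
            ngram = " ".join(tokens[start:start + n])
            if ngram in index:
                candidates[ngram] = index[ngram]
    return candidates


# Hypothetical toy dictionary covering the candidates listed on slide 16.
surface_forms = {
    "Eminem": {"Eminem"},
    "good": {"Good_(economics)"},
    "Kendrick": {"Kendrick,_Idaho", "Kendrick_School", "Kendrick_Lamar"},
    "Lamar": {"Lamar_County,_Alabama", "Lamar_County,_Mississippi",
              "Lamar_Advertising_Company", "Lamar_Cardinals_basketball"},
    "Kendrick Lamar": {"Kendrick_Lamar"},
    "good kid, m.A.A.d. city": {"Good_Kid_Maad_City"},
}
sentence = "“Eminem Thinks Kendrick Lamar’s good kid, m.A.A.d. city Was ‘Genius’”"
candidates = retrieve_candidates(sentence, surface_forms)
for ngram, entities in candidates.items():
    print(ngram, "->", sorted(entities))
```

Overlapping n-grams (“kendrick”, “lamar”, “kendrick lamar”) are all kept on purpose: deciding between them is left to the ranking and classification steps.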
  17. Ranking entity candidates
     • “Prior probabilities”:
        • link probability
        • commonness
        • sense probability
  18. (1) Link probability
     • “Kendrick Lamar” occurs 698 times on Wikipedia
        • as hyperlink: 501 times
        • not as hyperlink: 197 times
     • “Kendrick” occurs 5,037 times on Wikipedia
        • as hyperlink: 24 times
        • not as hyperlink: 5,014 times
     • $\mathrm{linkprob}(\text{Kendrick Lamar}) = \frac{501}{698} \approx 0.718$
     • $\mathrm{linkprob}(\text{Kendrick}) = \frac{24}{5037} \approx 0.005$
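Link probability is a single ratio per n-gram; the sketch below (function name is illustrative) reproduces the two values on slide 18 from its counts. The slide’s hyperlink and non-hyperlink counts for “Kendrick” (24 + 5,014) add up to 5,038 rather than the stated 5,037; either total rounds to 0.005.

```python
def link_probability(times_as_link, total_occurrences):
    """Fraction of an n-gram's occurrences on Wikipedia that are hyperlinks."""
    if total_occurrences == 0:
        return 0.0  # an n-gram that never occurs gets no link probability
    return times_as_link / total_occurrences


# Counts taken from slide 18.
lp_kendrick_lamar = link_probability(501, 698)
lp_kendrick = link_probability(24, 5037)
print(f"{lp_kendrick_lamar:.3f}")  # 0.718
print(f"{lp_kendrick:.3f}")        # 0.005
```

A high value means that when writers use the n-gram they usually link it, which is a cue that it names an entity at all.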
  19. (2) “Commonness”
     • “Kendrick” (out of its 24 hyperlinks) is used to refer to:
        • Kendrick,_Idaho: 8/24 = 0.333
        • Kendrick,_Oklahoma: 3/24 = 0.125
        • T._D._Kendrick: 3/24 = 0.125
        • Kendrick_School: 2/24 = 0.083
        • John_Kendrick_(American_sea_captain): 2/24 = 0.083
        • Kendrick_Lamar: 2/24 = 0.083
        • Francis_Kenrick: 2/24 = 0.083
        • Kendrick: 1/24 = 0.042
        • Howie Kendrick: 1/24 = 0.042
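Commonness normalises the link-target counts of one anchor text. A minimal sketch using the counts from slide 19 (the function name is illustrative):

```python
def commonness(link_target_counts):
    """Share of an n-gram's hyperlinks that point to each target entity."""
    total_links = sum(link_target_counts.values())
    return {entity: count / total_links
            for entity, count in link_target_counts.items()}


# Link-target counts for the anchor text "Kendrick", copied from slide 19.
kendrick_targets = {
    "Kendrick,_Idaho": 8, "Kendrick,_Oklahoma": 3, "T._D._Kendrick": 3,
    "Kendrick_School": 2, "John_Kendrick_(American_sea_captain)": 2,
    "Kendrick_Lamar": 2, "Francis_Kenrick": 2, "Kendrick": 1,
    "Howie Kendrick": 1,
}
scores = commonness(kendrick_targets)
for entity, score in scores.items():
    print(f"{entity}: {score:.3f}")
```

The counts sum to 24, the number of times “Kendrick” appears as a hyperlink on slide 18, so the scores form a distribution over the possible targets.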
  20. (3) Sense probability
     • Number of times the n-gram links to the entity, over all occurrences of the n-gram
     • Kendrick -> Kendrick_Lamar: $\mathrm{sense}(\text{Kendrick}, \text{Kendrick\_Lamar}) = \frac{2}{5037} \approx 0.0004$
     • Kendrick Lamar -> Kendrick_Lamar: $\mathrm{sense}(\text{Kendrick Lamar}, \text{Kendrick\_Lamar}) = \frac{500}{698} \approx 0.716$
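The three priors are linked. Writing $\mathrm{occ}(n)$ for the number of occurrences of n-gram $n$, $\mathrm{links}(n)$ for how often it appears as a hyperlink, and $\mathrm{links}(n \to e)$ for how often it links to entity $e$ (notation introduced here for illustration), sense probability factors into link probability times commonness. Checking this against the counts on slides 18 to 20 (the commonness $\frac{500}{501}$ for “Kendrick Lamar” is implied by those counts, not stated on a slide):

$$
\mathrm{sense}(n, e) = \frac{\mathrm{links}(n \to e)}{\mathrm{occ}(n)}
= \underbrace{\frac{\mathrm{links}(n)}{\mathrm{occ}(n)}}_{\mathrm{linkprob}(n)} \cdot \underbrace{\frac{\mathrm{links}(n \to e)}{\mathrm{links}(n)}}_{\mathrm{commonness}(n, e)}
$$

$$
\mathrm{sense}(\text{Kendrick}, \text{Kendrick\_Lamar}) = \frac{24}{5037} \cdot \frac{2}{24} = \frac{2}{5037} \approx 0.0004
$$

$$
\mathrm{sense}(\text{Kendrick Lamar}, \text{Kendrick\_Lamar}) = \frac{501}{698} \cdot \frac{500}{501} = \frac{500}{698} \approx 0.716
$$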
  21. Ranking by prior probability
     • Works quite well most of the time!
     • High accuracy has been reported for naive linking using only “popularity ranking” [1]
     [1] Heng Ji, Ralph Grishman, “Knowledge Base Population: Successful Approaches and Challenges”, ACL 2011
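A “popularity ranking” baseline links an n-gram to its highest-prior candidate. The sketch below ranks by commonness, using the values from slide 19; the function name is illustrative, and commonness is just one of the three priors that could serve as the ranking key.

```python
def rank_by_prior(priors):
    """Return (entity, prior) pairs sorted from highest to lowest prior.

    Python's sort is stable (also with reverse=True), so ties keep input order.
    """
    return sorted(priors.items(), key=lambda pair: pair[1], reverse=True)


# Commonness of each target for the anchor text "Kendrick" (slide 19).
kendrick_commonness = {
    "Kendrick,_Idaho": 8 / 24, "Kendrick,_Oklahoma": 3 / 24,
    "T._D._Kendrick": 3 / 24, "Kendrick_School": 2 / 24,
    "John_Kendrick_(American_sea_captain)": 2 / 24, "Kendrick_Lamar": 2 / 24,
    "Francis_Kenrick": 2 / 24, "Kendrick": 1 / 24, "Howie Kendrick": 1 / 24,
}
ranking = rank_by_prior(kendrick_commonness)
best_entity, best_prior = ranking[0]
print(best_entity, round(best_prior, 3))
```

For the bare n-gram “Kendrick”, this baseline picks Kendrick,_Idaho, which illustrates the cases where priors alone fall short and context is needed.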
  22. Beyond ranking: supervised linking
     • Entity linking as binary classification
     • Input: sentence s + set of target entities E
  23. Beyond ranking: supervised linking
     • “Eminem Thinks Kendrick Lamar’s good kid, m.A.A.d. city Was ‘Genius’”
     • Candidates:
        • http://en.wikipedia.org/wiki/Eminem
        • http://en.wikipedia.org/wiki/Good_(economics)
        • http://en.wikipedia.org/wiki/Lamar_County,_Alabama
        • http://en.wikipedia.org/wiki/Lamar_County,_Mississippi
        • http://en.wikipedia.org/wiki/Lamar_Advertising_Company
        • http://en.wikipedia.org/wiki/Kendrick,_Idaho
        • http://en.wikipedia.org/wiki/Good_Kid_Maad_City
        • http://en.wikipedia.org/wiki/Kendrick_Lamar
  24. Beyond ranking: supervised linking
     • Given a new sentence, for each candidate entity e, output the probability of belonging to each class:
        • positive (= target), or
        • negative (= not a target)
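Framing linking as binary classification means every (sentence, candidate) pair becomes one example. A minimal sketch, where the class and function names are hypothetical and the gold targets are assumed to be the three mentions bracketed on slide 27:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkingExample:
    sentence: str
    entity: str
    label: int  # 1 = positive (a target entity), 0 = negative (not a target)


def make_examples(sentence, candidates, targets):
    """Turn one sentence into one binary-classification example per candidate."""
    return [LinkingExample(sentence, entity, int(entity in targets))
            for entity in candidates]


sentence = "Eminem Thinks Kendrick Lamar’s good kid, m.A.A.d. city Was ‘Genius’"
# Candidate set from slide 23.
candidates = ["Eminem", "Good_(economics)", "Lamar_County,_Alabama",
              "Lamar_County,_Mississippi", "Lamar_Advertising_Company",
              "Kendrick,_Idaho", "Good_Kid_Maad_City", "Kendrick_Lamar"]
# Assumed gold targets E (the bracketed mentions on slide 27).
targets = {"Eminem", "Kendrick_Lamar", "Good_Kid_Maad_City"}

examples = make_examples(sentence, candidates, targets)
for example in examples:
    print(example.label, example.entity)
```

At prediction time the same pairs are built for a new sentence, and a classifier assigns each one a probability of being positive.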
  25. Features
     • Local: link each entity mention separately
     • Global: link all mentions in a document simultaneously, to arrive at a coherent set of entities
  26. Global features
     • “Eminem Thinks Kendrick Lamar’s good kid, m.A.A.d. city Was ‘Genius’”
  27. Global features
     • “[Eminem] Thinks [Kendrick Lamar]’s [good kid, m.A.A.d. city] Was ‘Genius’”
     • Candidates:
        • http://en.wikipedia.org/wiki/Eminem
        • http://en.wikipedia.org/wiki/Good_(economics)
        • http://en.wikipedia.org/wiki/Lamar_County,_Alabama
        • http://en.wikipedia.org/wiki/Lamar_County,_Mississippi
        • http://en.wikipedia.org/wiki/Lamar_Advertising_Company
        • http://en.wikipedia.org/wiki/Kendrick,_Idaho
        • http://en.wikipedia.org/wiki/Good_Kid_Maad_City
        • http://en.wikipedia.org/wiki/Kendrick_Lamar
  28. “Relatedness”
     • Source: Milne, D. and Witten, I.H. (2008). An effective, low-cost measure of semantic relatedness obtained from Wikipedia links. In WIKIAI'08.
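The cited Milne and Witten (2008) measure compares two Wikipedia articles by the sets of articles that link to them. The sketch follows the formulation commonly given for it, with A and B the inlink sets of the two articles and W the set of all articles, and turns the resulting distance into a relatedness score by subtracting it from 1 and clipping at 0. The function name and the toy inlink sets are illustrative assumptions.

```python
import math


def wlm_relatedness(inlinks_a, inlinks_b, total_articles):
    """Link-based relatedness of two articles, after Milne & Witten (2008).

    inlinks_a, inlinks_b: ids of the articles that link to a and to b
    total_articles:       number of articles |W| in the knowledge base
    """
    a, b = set(inlinks_a), set(inlinks_b)
    shared = len(a & b)
    if shared == 0:
        return 0.0  # no common inlinks: treat the articles as unrelated
    denominator = math.log(total_articles) - math.log(min(len(a), len(b)))
    if denominator <= 0:
        return 1.0  # degenerate case: both articles are linked from everywhere
    distance = (math.log(max(len(a), len(b))) - math.log(shared)) / denominator
    return max(0.0, 1.0 - distance)


# Toy example: two articles sharing 2 of their 4 inlinks, in a 100-article KB.
print(round(wlm_relatedness({1, 2, 3, 4}, {3, 4, 5, 6}, 100), 3))
```

Identical inlink sets give a relatedness of 1, disjoint sets give 0; how much weight a partial overlap carries depends on how rare the inlinks are relative to the whole KB.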
  29. Features: Semanticizer
     • Local:
        • n-gram
        • KB
        • n-gram+KB
        • text similarity
     • Global:
        • finding “related entities”
  30. Local features: n-gram/KB
     • n-gram features:
        • link probability
        • length of n-gram
        • number of entity titles that contain n-gram
     • Entity features:
        • entity’s number of inlinks
        • entity’s number of outlinks
        • number of redirect pages referring to entity
     • n-gram+entity features:
        • commonness
        • sense probability
        • edit distance between n-gram and entity title
        • does n-gram contain entity title?
        • does entity title contain n-gram?
        • does title equal n-gram?
        • TF of n-gram in entity document
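The string-based n-gram+entity features can be computed without any KB access. The sketch below covers only those features plus n-gram length; it assumes case-insensitive comparison, measures n-gram length in tokens, and uses hypothetical feature names. Count-based features (inlinks, outlinks, redirects, TF) would need the knowledge base and are left out.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                          # delete char_a
                               current[j - 1] + 1,                       # insert char_b
                               previous[j - 1] + (char_a != char_b)))    # substitute
        previous = current
    return previous[-1]


def ngram_entity_features(ngram, entity_title):
    """String features for one (n-gram, entity title) pair, compared lowercased."""
    ngram_text = ngram.lower()
    title_text = entity_title.replace("_", " ").lower()
    return {
        "ngram_length": len(ngram.split()),  # assumption: length in tokens
        "edit_distance": edit_distance(ngram_text, title_text),
        "ngram_contains_title": title_text in ngram_text,
        "title_contains_ngram": ngram_text in title_text,
        "title_equals_ngram": ngram_text == title_text,
    }


print(ngram_entity_features("Kendrick", "Kendrick_Lamar"))
print(ngram_entity_features("Kendrick Lama", "Kendrick_Lamar"))
```

Misspelled surface forms such as “Kendrick Lama” (slide 14) end up one edit away from the title, which is the kind of signal the edit-distance feature is meant to capture.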
  31. Local features: text similarity
     • Similarity between input sentence s and entity candidate document (Wikipedia page)
     • “Eminem Thinks Kendrick Lamar’s good kid, m.A.A.d. city Was ‘Genius’”:
        • Kendrick_Lamar: 0.4215
        • Kendrick,_Idaho: 0.1599
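The slides do not say which similarity measure produced these scores. As an assumption, the sketch below uses cosine similarity over bag-of-words term frequencies, and replaces the Wikipedia pages with hypothetical one-line stand-ins, so its numbers will not match 0.4215 and 0.1599; only the ordering of the two candidates is meant to carry over.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector over lowercased word tokens."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(text_a, text_b):
    a, b = term_vector(text_a), term_vector(text_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


sentence = "Eminem Thinks Kendrick Lamar’s good kid, m.A.A.d. city Was ‘Genius’"
# Hypothetical one-line stand-ins for the candidates' Wikipedia pages.
candidate_docs = {
    "Kendrick_Lamar": "Kendrick Lamar Duckworth is an American rapper from "
                      "Compton, California. His major-label debut album is "
                      "good kid, m.A.A.d city.",
    "Kendrick,_Idaho": "Kendrick is a city in Latah County, Idaho, United States.",
}
scores = {entity: cosine_similarity(sentence, doc)
          for entity, doc in candidate_docs.items()}
for entity, score in scores.items():
    print(f"{entity}: {score:.4f}")
```

A TF-IDF weighting would down-weight common words such as “a” and “is”; plain term frequency is used here only to keep the sketch short.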
  32. Global features
     [Figure: for query q in document r, the candidate entities e1 and e2 are expanded with outlinked entities (e3, e6) and inlinking entities (e4, e5, e7), together with those entities’ anchor texts (Anchor 1 to Anchor 5 and their variants); Anchor 1A, Anchor 5C and Anchor 4B are marked as “support”.]
  33. But
     • Too slow in real life
     • Solution: use the set of linked entities (inlinks/outlinks) as a “virtual document”
  34. Related entity document
     ["Entertainment Weekly", "Compton, California", "California", "Rapping", "songwriter", "Hip hop music", "Top Dawg Entertainment", "Aftermath Entertainment", "Interscope Records", "Black Hippy", "Dr. Dre", "The Game (rapper)", "Jay Rock", "J. Cole", "Hip hop music", "recording artist", "Compton, California", "Carson, California", "Top Dawg Entertainment", "Aftermath Entertainment", "Interscope Records", "West Coast hip hop", "Supergroup (music)", "Black Hippy", "rapper", "Schoolboy Q", "Jay Rock", "Ab-Soul", "Overly Dedicated", "independent album", "Section.80", "iTunes Store", "Major record label", "Dr. Dre", "Game (rapper)", "Drake (entertainer)", "Young Jeezy", "Talib Kweli", "Busta Rhymes", "E-40", "Warren G", …]
  35. Virtual document similarity
     • Use the similarity between sentence s and the virtual document as an approximation of the related-entity (global) features
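The virtual-document shortcut replaces the expensive graph walk with one similarity computation per candidate. A minimal sketch, assuming the virtual document is simply the concatenated titles of an entity’s linked entities and that cosine similarity over term frequencies is the measure; the example sentence and the Kendrick,_Idaho list are hypothetical, and the Kendrick_Lamar list is a subset of slide 34.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector over lowercased word tokens."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def virtual_document(related_entities):
    """Concatenate the titles of an entity's linked entities into one text."""
    return " ".join(related_entities)


def virtual_document_similarity(sentence, related_entities):
    return cosine_similarity(term_vector(sentence),
                             term_vector(virtual_document(related_entities)))


related = {
    # Subset of the related-entity document on slide 34.
    "Kendrick_Lamar": ["Compton, California", "Top Dawg Entertainment",
                       "Black Hippy", "Jay Rock", "Schoolboy Q", "Ab-Soul",
                       "Dr. Dre"],
    # Hypothetical related entities for the competing candidate.
    "Kendrick,_Idaho": ["Latah County, Idaho", "Idaho", "United States"],
}
sentence = "Kendrick and Jay Rock are both members of Black Hippy"
scores = {entity: virtual_document_similarity(sentence, entities)
          for entity, entities in related.items()}
for entity, score in scores.items():
    print(f"{entity}: {score:.3f}")
```

Here the ambiguous mention “Kendrick” gets support for Kendrick_Lamar only because the sentence mentions entities that the Kendrick_Lamar page links to, which is the kind of signal the global features are meant to capture.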
  36. Supervised linking
     • Feature vector for each sentence-entity pair
     • Train a Random Forest classifier
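Before training, each sentence-entity pair has to be flattened into a fixed-order feature vector. The sketch below builds the feature matrix X and label vector y for two pairs for the n-gram “Kendrick”, using values from slides 18, 19 and 31 (the sense probability for Kendrick,_Idaho, 8/5,037, is derived as link probability times commonness); the feature names and the dict-based layout are assumptions.

```python
# Fixed feature order, so every vector lines up column by column.
FEATURES = ["link_probability", "commonness", "sense_probability",
            "text_similarity"]


def to_feature_matrix(pairs):
    """Turn sentence-entity pairs (dicts) into a feature matrix X and labels y."""
    X = [[pair[name] for name in FEATURES] for pair in pairs]
    y = [pair["label"] for pair in pairs]
    return X, y


pairs = [
    {"entity": "Kendrick_Lamar", "link_probability": 24 / 5037,
     "commonness": 2 / 24, "sense_probability": 2 / 5037,
     "text_similarity": 0.4215, "label": 1},
    {"entity": "Kendrick,_Idaho", "link_probability": 24 / 5037,
     "commonness": 8 / 24, "sense_probability": 8 / 5037,
     "text_similarity": 0.1599, "label": 0},
]
X, y = to_feature_matrix(pairs)
for row, label in zip(X, y):
    print(label, [round(value, 4) for value in row])
```

X and y are plain lists, so they can be handed to any learner. With scikit-learn, for instance, `RandomForestClassifier().fit(X, y)` trains a forest and `predict_proba` returns per-class probabilities, matching the positive/negative output on slide 24. This is one possible choice; the slides do not name an implementation.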
  37. Local vs. global
     • Hybrid > Local or Global alone
     • Local & Global > Hybrid
     • Approaches are complementary
     • Global preferred for highly ambiguous entity mentions (i.e., short ones)
  38. Etc…
     • Open challenges:
        • out-of-KB entities
        • knowledge base creation
  39. Thanks!
     David Graus, d.p.graus@uva.nl
