Combining Similarities and Regression for Entity Linking.

An outline of the UC3M participation in the TAC-KBP Entity Linking task in 2010. Joint work with Juan Perea and Paloma Martínez. This presentation was given at the Structural Biology Group at CNIO in June 2012 and includes some introductory slides on previous work.

  1. "Stat rosa pristina nomine, nomine nuda tenemus" (The rose of old remains only in its name; we hold only bare names.) César de Pablo Sanchez
  2. Overview of previous work
     TAC-KBP 2010 - Combining Similarities and Regression Classifiers for Entity Linking
       1. Task definition: KBP and EL
       2. System description
       3. Results
       4. Conclusions
  3. Overview of previous work
  4. Drug-Drug Interactions; relation extraction; anaphora resolution
  5. OPINATOR - Opinion Mining: sentiment-loaded dictionaries, sentiment classification, opinion summarization, search/navigation
  6-11. Knowledge acquisition: "List candidates for the Greek elections in June." "What party does Tsipras represent?" "How old is he?" "What does Syriza mean?" "How old is Samaras?"
  12. TAC-KBP 2010 - Combining Similarities and Regression Classifiers for Entity Linking
      1. Task definition: KBP and EL
      2. System description
      3. Results
      4. Conclusions
  13. TAC-KBP 2010 - Combining Similarities and Regression Classifiers for Entity Linking: Knowledge Base Population
      César de Pablo, Juan Perea, Paloma Martínez
  14. Knowledge Base Population (KBP) [diagram: Knowledge Base]
  15. Knowledge Base Population
      ● Knowledge Base from a Wikipedia dump (2008): title, name, type, id; wiki text; several facts as [name, value] pairs
      ● 1.3 million English newswire documents, published between 1994 and 2008
      ● 488,240 web pages
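To make the KB entry structure above concrete, here is a rough sketch as a Python dataclass; the field names and example entity types are illustrative assumptions, not the official TAC-KBP reference KB schema.

```python
# Illustrative sketch of one KB entry as described on the slide above;
# field names and example types are assumptions, not the official schema.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class KBEntry:
    kb_id: str                 # e.g. "E0700143"
    title: str                 # Wikipedia article title
    name: str                  # canonical entity name
    entity_type: str           # e.g. "PER", "ORG", "GPE"
    wiki_text: str             # article text, used later for context similarity
    facts: List[Tuple[str, str]] = field(default_factory=list)  # (slot_name, slot_value) pairs
```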
  16. IE = KBP?   QA = KBP?
  17. IE = KBP?
      ● Accurate extraction of facts – not annotation
      ● Learn facts from corpus - repetition is not important but helps confidence
      ● Asserting wrong information is bad
      ● Scalability
      ● Provenance
      QA = KBP?
  18. IE = KBP?
      ● Accurate extraction of facts – not annotation
      ● Learn facts from corpus - repetition is not important but helps confidence
      ● Asserting wrong information is bad
      ● Scalability
      ● Provenance
      QA = KBP?
      ● Slots are fixed but targets change
      ● Leverage knowledge from the KB
      ● Global resolution - ground information to the KB
      ● Avoid contradiction
      ● Detect novel info
  19-21. Task at TAC-KBP
      ● Entity Linking – grounding entity mentions in a document to KB entries
      ● Slot Filling – learning attributes about target entities
      Task 2: Entity Linking
  22-23. Entity Linking: Example
      For a name string and a document, determine which entity in a KB, if any, is being referred to by the name string.
        <query id="EL006455">
          <name>Reserve Bank</name>
          <docid>eng-NG-31-100316-11150589</docid>
          <entity>E0700143</entity>
        </query>
        <query id="EL06472">
          <name>Reserve Bank</name>
          <docid>eng-NG-31-142262-10040510</docid>
          <entity>E0421510</entity>
        </query>
      KB entries: … E0421510: Reserve Bank of Australia … E0700143: Reserve Bank of India … NIL
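The queries above are small XML records. A minimal sketch for reading them in Python follows; it assumes the <query> elements sit under some wrapper root element, and the <entity> field is only present when gold links are given.

```python
# Minimal sketch: read entity-linking queries shaped like the example above.
# Assumes the <query> elements sit under some wrapper root element;
# <entity> is only present in gold/training data.
import xml.etree.ElementTree as ET

def read_el_queries(path):
    """Yield (query_id, name_string, doc_id, gold_entity_or_None)."""
    root = ET.parse(path).getroot()
    for q in root.iter("query"):
        yield (q.get("id"),
               q.findtext("name"),
               q.findtext("docid"),
               q.findtext("entity"))   # None when no gold link is provided

# Example (hypothetical file name):
# for qid, name, docid, gold in read_el_queries("el_queries.xml"):
#     print(qid, name, docid, gold)
```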
  24-28. Entity Linking: Challenges
      Focus on confusable entities:
      ● Ambiguous names: Reserve Bank, Alan Jackson, Fonda
      ● Multiple name variants: Saddam Hussain, Saddam Hussein
      ● Acronym expansion: CDC, AZ
      ● Variety of cases: Centre for Disease Control, European Centre for Disease Control, AZ, Arizona, Astra Zeneca
      ● Pilot task – entity linking without text support
      ● Identify missing entities – then cluster (2011)
  29. Entity Linking: Evaluation
      Name mention – document pairs
      ● Accuracy micro = num correct / num queries
      ● Accuracy macro = group by entities (2009)
      | queries | NIL  | set        | genre      | % NIL |
      | 3904    | 2229 | eval 2009  | news       | 0.571 |
      | 1500    | 426  | train 2010 | web        | 0.284 |
      | 2250    | 1230 | eval 2010  | news + web | 0.547 |
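As a worked example of the two scores defined above, a small Python sketch; treating all NIL queries as a single group for the macro average is a simplifying assumption.

```python
# Sketch of the two accuracy scores on the slide above. Each item is a
# (gold_id, predicted_id) pair; NIL queries use the string "NIL".
from collections import defaultdict

def micro_accuracy(pairs):
    # num correct / num queries
    return sum(gold == pred for gold, pred in pairs) / len(pairs)

def macro_accuracy(pairs):
    # Group queries by gold entity, score each group, then average the groups,
    # so frequently queried entities do not dominate the score.
    # Treating all NIL queries as one group is a simplifying assumption.
    groups = defaultdict(list)
    for gold, pred in pairs:
        groups[gold].append(gold == pred)
    return sum(sum(g) / len(g) for g in groups.values()) / len(groups)

# pairs = [("E1", "E1"), ("E1", "E1"), ("E1", "NIL"), ("E2", "NIL")]
# micro_accuracy(pairs)  # 0.5      macro_accuracy(pairs)  # ~0.33
```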
  30. uc3m EL system
      ● Supervised architecture
      ● Use similarities between objects – to KB entries or parts of them – to avoid a wide feature vector
      ● 1) Candidate Entity Retrieval  2) Candidate Filtering  3) Validation (NIL classification)
  31. uc3m EL system
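A rough end-to-end skeleton of how the three stages listed above could be composed; all function names here are illustrative, not the original system's code.

```python
# Rough skeleton of the three-stage architecture on the previous slides;
# the stage functions are placeholders sketched on the following slides.
def link_entity(el_query, retrieve, score_candidates, validate):
    """Return a KB id or "NIL" for one (name string, document) query."""
    candidates = retrieve(el_query)                  # 1) candidate entity retrieval
    if not candidates:
        return "NIL"
    ranked = score_candidates(el_query, candidates)  # 2) candidate filtering (ranking)
    best_id, best_score = ranked[0]
    return validate(el_query, best_id, best_score)   # 3) validation (NIL classification)
```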
  32. 1) Candidate Retrieval
      ● Each KB article is indexed using Lucene, using several indexes and fields:
        ● ALIAS – include names plus aliases extracted from wiki slots: alias, abbreviation, website, etc.
        ● NER – named entities extracted from text: <id, ne, text>
        ● KB – entity slots <id, [(slot_name, slot_value)]>
        ● WIKIPEDIA – anchorList, category, redirect, outlinks, inlinks
      ● Each EL query transforms into several Lucene queries – result: a [KB name, score] list
  33. 1) Candidate Retrieval
      ● EL Query: [Michael Jordan, eng-NG-31-100316-11150589]
      ● Lucene queries:
        ● name=Michael AND name=Jordan
        ● alias=Michael AND alias=Jordan
        ● abbr=Michael AND abbr=Jordan
      ● For each query:
        ● [EL0989789, Michael Jordan, 25.00]
        ● [EL6565356, Michael B. Jordan, 25.00]
        ● [EL6565356, Michael I. Jordan, 25.00]
        ● [EL6565356, Michael-Hakim Jordan, 25.00]
        ● [EL6565356, Jordan, 20.00]
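An illustrative sketch of the fan-out shown above: one EL query becomes several per-field queries. It only builds query strings in the slide's "field=token" notation; it does not run an actual Lucene search.

```python
# Build one per-field query string per index field, mirroring the example
# above. The "field=token" notation follows the slide, not Lucene syntax.
def expand_el_query(name_string, fields=("name", "alias", "abbr")):
    tokens = name_string.split()
    return [" AND ".join(f"{f}={tok}" for tok in tokens) for f in fields]

# expand_el_query("Michael Jordan")
# -> ['name=Michael AND name=Jordan',
#     'alias=Michael AND alias=Jordan',
#     'abbr=Michael AND abbr=Jordan']
```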
  34. 2) Candidate Filtering
      ● Classification problem:
        ● decide whether (EL query + text, KB name + wiki text) is a good match
        ● in fact, rank by prediction confidence
      ● Use similarity scores as features – normalized and unnormalized
      ● Use a cost-sensitive classifier
      ● Best results: model trees with linear regression leaves
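A hedged sketch of this filtering step: score each (query, candidate) feature vector with a regression model and rank candidates by the prediction. The slide's model trees with linear-regression leaves have no direct scikit-learn equivalent, so a plain DecisionTreeRegressor stands in here.

```python
# Score candidates with a regressor trained on 0/1 match labels and rank
# by predicted confidence. DecisionTreeRegressor is a stand-in for the
# model trees with linear-regression leaves used in the original system.
from sklearn.tree import DecisionTreeRegressor

def train_candidate_scorer(X_train, y_train):
    """X_train: similarity feature vectors; y_train: 1.0 for correct matches, else 0.0."""
    return DecisionTreeRegressor(max_depth=6).fit(X_train, y_train)

def rank_candidates(scorer, candidate_ids, candidate_features):
    """Return [(kb_id, score), ...] sorted by predicted confidence, best first."""
    scores = scorer.predict(candidate_features)
    return sorted(zip(candidate_ids, scores), key=lambda pair: pair[1], reverse=True)
```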
  35. Features
      ● Index-based scores: sim(EL query, KB entry), taken directly from initial retrieval
      ● Context-similarity scores: sim(document, wiki text) or sim(document, slots)
      ● Name-similarity scores: sim(EL query, KB entry) – more expensive: equal, QcontainsE, EcontainsQ, Jaro, Jaro-Winkler, SLIM (based on SecondString)
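The cheaper string features named above (equal, QcontainsE, EcontainsQ) are simple enough to sketch directly; Jaro, Jaro-Winkler and SLIM came from the SecondString library in the original system and are not reimplemented here.

```python
# Sketch of the simple name-similarity features from the slide above.
def name_similarity_features(query_name, entry_name):
    q, e = query_name.lower(), entry_name.lower()
    return {
        "equal":      float(q == e),
        "QcontainsE": float(e in q),   # KB entry name is a substring of the query name
        "EcontainsQ": float(q in e),   # query name is a substring of the KB entry name
    }

# name_similarity_features("Reserve Bank", "Reserve Bank of India")
# -> {'equal': 0.0, 'QcontainsE': 0.0, 'EcontainsQ': 1.0}
```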
  36. 3) Validation
      ● Classification – is the selected candidate good enough, or should the query be NIL?
      ● Positive examples – correct candidate examples
      ● Negative examples – top-ranked entities for queries that do not have a link in the KB
      ● Balanced dataset
      ● Best classifier: Logistic Regression
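A minimal sketch of the validation step described above: a binary classifier decides whether the top-ranked candidate is a real link or the query gets NIL. Feature preparation and the balanced training set are assumed to exist; this is not the original system's code.

```python
# Binary NIL validation: keep the top candidate or return "NIL".
from sklearn.linear_model import LogisticRegression

def train_nil_validator(X_train, y_train):
    """y_train: 1 if the top candidate is the correct link, 0 for NIL queries."""
    return LogisticRegression(max_iter=1000).fit(X_train, y_train)

def link_or_nil(validator, top_candidate_id, top_candidate_features):
    is_link = validator.predict([top_candidate_features])[0]
    return top_candidate_id if is_link else "NIL"
```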
  37-41. EL results - main
      Results by entity type:
      | queries | type | news | web  | news+web | Highest | Median |
      | 750     | ORG  | 0.69 | 0.67 | 0.67     | 0.85    | 0.68   |
      | 749     | GPE  | 0.51 | 0.52 | 0.53     | 0.80    | 0.60   |
      | 751     | PER  | 0.82 | 0.76 | 0.85     | 0.96    | 0.85   |
      | 2250    | ALL  | 0.67 | 0.65 | 0.68     | 0.87    | 0.69   |
      ● Influence of domain?
      ● GPE are particularly difficult
      Results for NIL vs. non-NIL queries:
      | queries | set   | news | web  | news+web | Highest | Median |
      | 2250    | ALL   | 0.67 | 0.65 | 0.68     | 0.87    | 0.69   |
      | 1020    | noNIL | 0.51 | 0.59 | 0.49     |         |        |
      | 1230    | NIL   | 0.81 | 0.70 | 0.82     |         |        |
  42-43. EL results – pilot w/o text
      | queries | set   | news (main) | news | +n-sim NIL | +n-sim all |
      | 2250    | ALL   | 0.67        | 0.58 | 0.66       | 0.70       |
      | 1020    | noNIL | 0.51        | 0.35 | 0.40       | 0.47       |
      | 1230    | NIL   | 0.81        | 0.77 | 0.88       | 0.88       |
      ● Including name similarity scores helped
  44. EL systems comparison
      ● Prior on link probability/popularity (Stanford-UBC 2009, LCC 2010, Microsoft 2011)
      ● Learning-to-rank algorithms: ListNet (CUNY 2011)
      ● Expand queries: acronym expansion/coreference (NUS 2011)
      ● Unsupervised system – entity co-occurrence + PageRank (WebTLab 2010)
      ● Inductive EL – first cluster, then link (LCC 2011)
      ● Collective entity linking (Microsoft 2011)
  45. Conclusion
      ● Supervised EL system
        ● Influence of training size
        ● Beware of training data distribution
      ● Consider name similarities, even for reranking
      ● Improve initial candidate retrieval
      ● Perform collective Entity Linking
      ● Efficiency?
  46. Related tasks
      ● Cluster documents mentioning entities
      ● Entity coreference – within-document and cross-document
      ● Add missing links between Wikipedia pages
      ● Link entities to matching Wikipedia articles
