
Summarizing Entity Descriptions for Effective and Efficient Human-centered Entity Linking


Presented at WWW'15, Florence.

Gong Cheng, Danyun Xu, Yuzhong Qu. Summarizing Entity Descriptions for Effective and Efficient Human-centered Entity Linking. In Proceedings of the 24th International World Wide Web Conference (WWW), pages 184--194, 2015.


Summarizing Entity Descriptions for Effective and Efficient Human-centered Entity Linking

  1. Summarizing Entity Descriptions for Effective and Efficient Human-centered Entity Linking
     Gong Cheng, Danyun Xu, Yuzhong Qu
     Websoft Research Group, State Key Laboratory for Novel Software Technology, Nanjing University, China
  2. Entity Linking (EL)
     Text: "But with the release of the iPhone 6 and the 6 Plus phablet, Apple has finally gone into big-screen territory, giving Samsung a challenge in the category that the company has been dominating for some time now."
     Knowledge base:
     • iPhone 6 - type: Smartphone - ...
     • Samsung Electronics - type: IT Company - ...
     Candidate entities for "Apple" (?):
     • Apple (Inc.) - type: Company - type: IT company - product: iPhone 5 - ...
     • Apple (Fruit) - type: Fruit - genus: Malus - ...
  3. Human-centered EL is needed
     • for defining the gold standard,
     • for crowdsourced EL.
     (Text, knowledge base, and candidate entities as on slide 2.)
  4. Entity description: a set of property-value pairs (called features)
     (Text, knowledge base, and candidate entities as on slide 2. For example, the description of Apple (Inc.) contains the features type: Company, type: IT company, product: iPhone 5, ...)
  5. Entity descriptions are long.
  6. Short, extractive summaries are adequate for human-centered EL.
     Entity mention: Apple (?)
     • Apple (Inc.) - type: Company - product: iPhone 5
     • Apple (Corps) - type: Company - product: Let It Be
     • Apple (Fruit) - type: Fruit
     Summary of k candidate entity descriptions: k subsets of features (subject to a length limit)
  7. Short, extractive summaries are adequate for human-centered EL. (Example as on slide 6.)
     Summarizing entity descriptions → combinatorial optimization
  8. Optimization goal (1): +characterizing power, -information overlap
     • Characterizing power of a feature (ch)
       ch(type: IT company) < ch(product: iPhone 5)
       Entities shown for type: IT company: Apple (Inc.), Samsung Electronics
     Apple (Inc.) - type: Company - type: IT company - product: iPhone 5 - ...
  9. Optimization goal (1): +characterizing power, -information overlap
     • Characterizing power of a feature (ch), example as on slide 8:
       $ch(f) = -\log \frac{\text{number of entities having } f}{\text{number of all entities}}$
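The characterizing power from slide 9 can be sketched in a few lines. This is only an illustration: the toy knowledge base below is hypothetical (loosely modeled on the slide examples), the helper name `characterizing_power` is made up, and the natural logarithm is an assumption since the slide does not state the base.

```python
import math
from collections import Counter

# Hypothetical toy knowledge base: entity -> set of (property, value) features.
KB = {
    "Apple (Inc.)": {("type", "Company"), ("type", "IT company"), ("product", "iPhone 5")},
    "Samsung Electronics": {("type", "Company"), ("type", "IT company")},
    "Apple (Corps)": {("type", "Company"), ("product", "Let It Be")},
    "Apple (Fruit)": {("type", "Fruit"), ("genus", "Malus")},
}


def characterizing_power(kb):
    """Return ch(f) = -log(#entities having f / #all entities) for every feature f."""
    counts = Counter(f for features in kb.values() for f in features)
    total = len(kb)
    return {f: -math.log(c / total) for f, c in counts.items()}


ch = characterizing_power(KB)
# A feature shared by fewer entities characterizes an entity better.
print(ch[("type", "IT company")] < ch[("product", "iPhone 5")])
```

In this toy data, type: IT company is held by two of four entities and product: iPhone 5 by one, so the rarer feature receives the higher score, matching the ordering shown on the slide.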
  10. Optimization goal (1): +characterizing power, -information overlap
      • Information overlap between features (ov)
        a) logical inference: entailment → maximized ov
           ov(type: IT company, type: Company) = MAX
        b) string/numerical similarity
      Apple (Inc.) - type: Company - type: IT company - product: iPhone 5 - ...
  12. Optimization goal (1): +characterizing power, -information overlap
      • Information overlap between features (ov), part a) as on slide 10
        b) string/numerical similarity
           ov = max{similarity between properties, similarity between values}
           ov(type: IT company, product: iPhone 5) = SMALL
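A minimal sketch of the overlap measure from slides 10 to 12, under stated assumptions: the class hierarchy `SUBCLASS_OF`, the bound `MAX_OV = 1.0`, and the use of `difflib.SequenceMatcher` as the string similarity are all illustrative choices that the slides do not specify. Numerical similarity is not covered here.

```python
from difflib import SequenceMatcher

MAX_OV = 1.0  # assumed upper bound of the overlap score

# Hypothetical class hierarchy: subclass -> direct superclass.
SUBCLASS_OF = {"IT company": "Company"}


def is_subclass(cls, ancestor):
    """Follow SUBCLASS_OF links upward and report whether `ancestor` is reached."""
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        if cls == ancestor:
            return True
    return False


def entails(f, g):
    """Feature f entails g when both are type features and f's class is a subclass of g's."""
    (p1, v1), (p2, v2) = f, g
    return p1 == p2 == "type" and is_subclass(v1, v2)


def string_sim(a, b):
    """Stand-in string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def overlap(f, g):
    """ov = MAX under entailment, else max(property similarity, value similarity)."""
    if f == g or entails(f, g) or entails(g, f):
        return MAX_OV
    (p1, v1), (p2, v2) = f, g
    return max(string_sim(p1, p2), string_sim(v1, v2))


ov_entail = overlap(("type", "IT company"), ("type", "Company"))
ov_small = overlap(("type", "IT company"), ("product", "iPhone 5"))
print(ov_entail, ov_small)
```

Read literally, the max rule gives two features with the same property the maximal overlap regardless of their values; the slides do not say whether the paper refines this case.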
  13. Optimization goal (1): +characterizing power, -information overlap
      • Formulated as k Quadratic Knapsack Problems (QKP)
        weight of a feature: length
        profit of a pair of features: to maximize characterizing power, to minimize information overlap
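To make the QKP formulation on slide 13 concrete, here is a brute-force sketch for one entity. It assumes a profit matrix whose diagonal holds per-feature characterizing power and whose off-diagonal entries hold negated overlap; that combination and all numbers are hypothetical, since the slide does not give the exact profit formula. Exhaustive search is only viable for tiny instances and is not the authors' solver.

```python
from itertools import combinations


def solve_qkp(weights, profit, capacity):
    """Exhaustively solve a small Quadratic Knapsack Problem.

    profit[i][i] is the individual profit of item i and profit[i][j] (i != j)
    the pairwise profit earned when items i and j are both selected.
    """
    n = len(weights)
    best_value, best_set = 0.0, ()
    for r in range(1, n + 1):
        for subset in combinations(range(n), r):
            if sum(weights[i] for i in subset) > capacity:
                continue  # violates the length limit
            value = sum(profit[i][i] for i in subset)
            value += sum(profit[i][j] for i, j in combinations(subset, 2))
            if value > best_value:
                best_value, best_set = value, subset
    return best_set, best_value


features = ["type: Company", "type: IT company", "product: iPhone 5"]
weights = [len(f) for f in features]  # weight of a feature: its length in characters

# Hypothetical profits: diagonal = characterizing power, off-diagonal = -overlap.
profit = [
    [0.3, -1.0, -0.2],
    [-1.0, 0.7, -0.2],
    [-0.2, -0.2, 1.4],
]

best_set, best_value = solve_qkp(weights, profit, capacity=35)
print([features[i] for i in best_set], round(best_value, 3))
```

With a 35-character limit, all three features do not fit, and the heavy penalty between the two entailed type features steers the selection toward one type feature plus the product feature.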
  14. Optimization goal (2): +differentiating power
      • Differentiating power of a pair of features (di)
        a) string/numerical dissimilarity
           di = dissimilarity between values * property’s value uniqueness
           di(type: IT company, type: Fruit) = LARGE*SMALL = MEDIUM
           (Single-valued properties are more useful.)
        b) logical inference: entailment → minimized di
           di(type: IT company, type: Company) = MIN
      Example entities:
      • Apple (Inc.) - type: Company - type: IT company - product: iPhone 5 - ...
      • Apple (Fruit) - type: Fruit - genus: Malus - ...
      • Samsung Electronics - type: IT Company - ...
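The differentiating power from slides 14 to 16 can be sketched as follows. Several parts are assumptions, not taken from the slides: value uniqueness is approximated as the number of entities having a property divided by the number of (entity, value) pairs for it (so single-valued properties score 1.0), `difflib.SequenceMatcher` stands in for string dissimilarity, features with different properties are given the minimum score, and `MIN_DI = 0.0` is an arbitrary floor.

```python
from difflib import SequenceMatcher

MIN_DI = 0.0  # assumed lower bound of the differentiating power

# Hypothetical class hierarchy and toy knowledge base.
SUBCLASS_OF = {"IT company": "Company"}
KB = {
    "Apple (Inc.)": [("type", "Company"), ("type", "IT company"), ("product", "iPhone 5")],
    "Apple (Fruit)": [("type", "Fruit"), ("genus", "Malus")],
}


def is_subclass(cls, ancestor):
    """Follow SUBCLASS_OF links upward and report whether `ancestor` is reached."""
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        if cls == ancestor:
            return True
    return False


def value_uniqueness(prop, kb):
    """Assumed proxy: entities having `prop` / (entity, value) pairs for `prop`."""
    entities = sum(1 for feats in kb.values() if any(p == prop for p, _ in feats))
    pairs = sum(1 for feats in kb.values() for p, _ in feats if p == prop)
    return entities / pairs if pairs else 0.0


def dissimilarity(v1, v2):
    """Stand-in string dissimilarity in [0, 1]."""
    return 1.0 - SequenceMatcher(None, v1.lower(), v2.lower()).ratio()


def differentiating_power(f, g, kb):
    """di = value dissimilarity * value uniqueness; MIN under entailment."""
    (p1, v1), (p2, v2) = f, g
    if p1 != p2:
        return MIN_DI  # assumption: only same-property features are compared
    if p1 == "type" and (v1 == v2 or is_subclass(v1, v2) or is_subclass(v2, v1)):
        return MIN_DI
    return dissimilarity(v1, v2) * value_uniqueness(p1, kb)


di_fruit = differentiating_power(("type", "IT company"), ("type", "Fruit"), KB)
di_entail = differentiating_power(("type", "IT company"), ("type", "Company"), KB)
print(di_fruit, di_entail)
```

Because type is multi-valued in the toy data, its uniqueness factor is below 1 and dampens the otherwise large value dissimilarity, which mirrors the LARGE*SMALL = MEDIUM example.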
  17. Optimization goal (2): +differentiating power
      • Formulated as a Quadratic Multidimensional Knapsack Problem (QMKP)
        weight of a feature: length
        profit of a pair of features: differentiating power
  18. Optimization goal (3): +relevance to context
      • Relevance of a feature to the context of entity mention
        • cosine similarity in the class vector model (cs)
          Vector(context) = {Smartphone, IT company}
          Vector(type: Fruit) = {Fruit}
          Vector(product: iPhone 5) = {Smartphone}
          cs(context, product: iPhone 5) = HIGH
        • class weighting: class frequency – inverse instance frequency (CF-IIF)
      (Text, knowledge base, and candidate entities as on slide 2.)
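The class vector example on these slides can be reproduced with a plain cosine similarity. In this sketch every class gets weight 1.0, which is an assumption: the slides name CF-IIF weighting but do not define it, so no attempt is made to compute it here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse class vectors (class -> weight)."""
    dot = sum(w * v.get(cls, 0.0) for cls, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Class vectors from the slide, with uniform placeholder weights.
context = {"Smartphone": 1.0, "IT company": 1.0}
type_fruit = {"Fruit": 1.0}
product_iphone5 = {"Smartphone": 1.0}

cs_iphone = cosine(context, product_iphone5)
cs_fruit = cosine(context, type_fruit)
print(cs_iphone, cs_fruit)
```

The product feature shares the Smartphone class with the context and scores well above zero, while type: Fruit shares no class and scores zero, consistent with cs(context, product: iPhone 5) = HIGH.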
  23. Optimization goal (3): +relevance to context
      • Solved by k Maximizing Marginal Relevance (MMR) frameworks
        • Features are iteratively selected.
        • In each iteration, candidate features are re-ranked by
          • relevance to context
          • dissimilarity to selected features
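The iterative selection on slide 23 can be sketched as a greedy loop for one entity. The scoring rule below, `lam * relevance - (1 - lam) * max similarity to selected features`, is the commonly used MMR trade-off and is assumed here; the slide does not give the exact formula. The relevance scores, the similarity function, and the 35-character limit are hypothetical.

```python
def mmr_select(candidates, relevance, similarity, length_limit, lam=0.5):
    """Greedily pick features by MMR score until no remaining feature fits."""
    selected, used = [], 0
    remaining = list(candidates)

    def score(feature):
        # Penalize similarity to anything already selected.
        redundancy = max((similarity(feature, s) for s in selected), default=0.0)
        return lam * relevance[feature] - (1 - lam) * redundancy

    while remaining:
        fitting = [f for f in remaining if used + len(f) <= length_limit]
        if not fitting:
            break
        best = max(fitting, key=score)  # re-ranked in every iteration
        selected.append(best)
        remaining.remove(best)
        used += len(best)
    return selected


# Hypothetical relevance-to-context scores.
relevance = {
    "type: IT company": 0.9,
    "type: Company": 0.85,
    "product: iPhone 5": 0.6,
}


def same_property_similarity(f, g):
    """Hypothetical similarity: high when two features share a property."""
    return 1.0 if f.split(":")[0] == g.split(":")[0] else 0.1


picked = mmr_select(list(relevance), relevance, same_property_similarity, length_limit=35)
print(picked)
```

After type: IT company is chosen, type: Company is penalized for being similar to it, so the less relevant but dissimilar product: iPhone 5 is selected next.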
  24. Optimization goal (1+2+3)
      • Formulated as a Quadratic Multidimensional Knapsack Problem (QMKP)
  25. Experiments: data sets
      • Text corpora (with entity mentions linked to Wikipedia)
        • AQUAINT
        • IITB
      • Knowledge base
        • DBpedia
      • Gold-standard links
        • entity mentions → Wikipedia articles → DBpedia entities
  26. Experiments: EL tasks
      Text: "..., Apple has finally gone into big-screen territory, …" (?)
      • Apple (Inc.) - type: Company - product: iPhone 5
      • Apple (Corps) - type: Company - product: Let It Be
      • Apple (Fruit) - type: Fruit
      Candidates in each task:
      • 1 target entity
        • gold-standard
      • 2 (very challenging) noise entities
        • sharing a common name with the target entity, obtained from Wikipedia’s disambiguation pages
  27. Experiments: approaches
      • Proposed approaches
        • CHR: +characterizing power, -information overlap
        • DFF: +differentiating power
        • CNT: +relevance to context
        • COMB: CHR+DFF+CNT
      • Baseline approaches
        • DESC: returns entire entity descriptions
        • RELIN: a state-of-the-art entity summarization approach for generic purposes
      • average length of entity descriptions: 680 characters
      • length limit for summaries: 100 characters (14.7%)
  28. Experiments: extrinsic evaluation
      • COMB is the only approach that achieved the following statistically significant results on both data sets:
        • accuracy (% of correct answers): COMB = DESC
        • time: COMB < DESC (22-23% faster)
  29. Experiments: intrinsic evaluation
      • Statistically significant results on both data sets:
        • human ratings: COMB > CHR > other approaches
  30. Future work
      • More extensive experiments
        • to test with not-in-the-list
      • Summaries for automatic EL
  31. Questions?
