RELIN: Relatedness and Informativeness-based Centrality for Entity Summarization

Published in: Technology, Education

  1. RELIN: Relatedness and Informativeness-based Centrality for Entity Summarization
      • Gong Cheng (程龚)¹, Thanh Tran², Yuzhong Qu¹
      • ¹ State Key Laboratory for Novel Software Technology, Nanjing University, China
      • ² Institute AIFB, Karlsruhe Institute of Technology, Germany
      • gcheng@nju.edu.cn, ws.nju.edu.cn
      • Presented at ISWC 2011
  2. Motivation
      • DBpedia describes 3.64M entities with 1B RDF triples.
      • 1B/3.64M ≈ 281 RDF triples per entity on average (the slide's figure; the rounded counts shown here give closer to 275).
      • A lengthy entity description is unacceptable in tasks that require quick identification of the underlying entity.
  3. Entity search: find entities that match an information need
  4. Pay-as-you-go data integration: judge whether two entities denote the same thing (sameAs?)
  5. Motivation (continued)
      • Problem: to summarize lengthy entity descriptions
  6. Outline: Problem statement, The RELIN model, Implementation, Experiments, Conclusions
  7. Data graph
  8. Feature set
  9. Entity summarization
      • Entity summarization = feature ranking
      • Entity summary = k top-ranked features
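The definition on slide 9 (summary = k top-ranked features) can be illustrated with a minimal sketch. The scores and the example features below are made up for illustration, and `summarize` is a hypothetical helper name, not part of the presented system; how the scores are obtained is the subject of the following slides.

```python
from typing import Dict, List, Tuple

# A feature is a (property, value) pair, as on the Relatedness slide.
Feature = Tuple[str, str]


def summarize(scores: Dict[Feature, float], k: int) -> List[Feature]:
    """Return the k top-ranked features, highest score first.

    Ties are broken by the feature itself so the result is deterministic.
    """
    ranked = sorted(scores, key=lambda f: (-scores[f], f))
    return ranked[:k]


# Hypothetical ranking scores for one entity description.
example_scores = {
    ("rdf:type", "City"): 0.40,
    ("rdfs:label", "Nanjing"): 0.35,
    ("dbo:country", "China"): 0.15,
    ("dbo:areaCode", "25"): 0.10,
}

print(summarize(example_scores, k=2))
```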
  10. Outline: Problem statement, The RELIN model, Implementation, Experiments, Conclusions
  11. Centrality-based ranking: concepts
      • Widely applied to text summarization and ontology summarization
      • By constructing a graph
          • Nodes: data elements to be ranked
          • Edges: connecting related nodes
      • and then measuring node centrality, e.g. degree, PageRank, …
      • [Figure: graph over features f1 to f5, distinguishing pairs with relatedness ≥ threshold from pairs with relatedness < threshold]
  12. PageRank
      • Simulates the behavior of a random surfer who navigates from node to node
      • Two types of action
          • Following a random edge (with a uniform probability distribution)
          • Jumping at random (with a uniform probability distribution)
      • Ranking is based on the stationary distribution of the resulting Markov chain
      • [Figure: example graph over features f1 to f5]
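The baseline described on slides 11 and 12 can be sketched as follows: threshold a relatedness matrix into a graph, then run PageRank by power iteration. This is an illustrative sketch, not the authors' code. The damping factor 0.85, the handling of nodes without edges, and the example matrix are assumptions.

```python
from typing import List


def threshold_graph(rel: List[List[float]], threshold: float) -> List[List[int]]:
    """Keep an edge i -> j only where relatedness >= threshold (no self-loops)."""
    n = len(rel)
    return [[j for j in range(n) if j != i and rel[i][j] >= threshold]
            for i in range(n)]


def pagerank(neighbors: List[List[int]], damping: float = 0.85,
             iterations: int = 100) -> List[float]:
    """Power iteration for the random surfer with uniform moves and jumps."""
    n = len(neighbors)
    x = [1.0 / n] * n
    for _ in range(iterations):
        # Random jump: uniform over all nodes.
        nxt = [(1.0 - damping) / n] * n
        for q, outs in enumerate(neighbors):
            # Assumption: a node without edges spreads its mass uniformly.
            targets = outs if outs else list(range(n))
            share = damping * x[q] / len(targets)
            for p in targets:
                nxt[p] += share
        x = nxt
    return x


# Hypothetical relatedness values: feature 0 is related to all others.
rel = [
    [0.0, 0.9, 0.9, 0.9],
    [0.9, 0.0, 0.1, 0.1],
    [0.9, 0.1, 0.0, 0.1],
    [0.9, 0.1, 0.1, 0.0],
]
scores = pagerank(threshold_graph(rel, threshold=0.5))
print(scores)
```

Note that the values 0.9 and 0.1 both collapse to "edge" or "no edge" here, which is the loss of information that the next slide criticizes.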
  13. Centrality-based ranking for entity summarization: problems
      • How to define a good feature
          • Not only capturing the main themes of the entity description
          • But also distinguishing the entity from others
      • Loss of information
          • Float-valued function → boolean-valued function
      • [Figure: graph over features f1 to f5, distinguishing pairs with relatedness ≥ threshold from pairs with relatedness < threshold]
  14. RELIN: concepts
      • An extension of PageRank
      • Following a random edge within a complete graph, with a probability proportional to the relatedness between the two associated nodes, i.e. no threshold needed
      • Jumping at random, with a probability proportional to the amount of information carried by the target that helps to identify the entity
  15. RELIN: RELatedness and INformativeness-based centrality
      • Two kinds of action
          • Relational move: more likely to a feature that carries related information about the theme currently under investigation
          • Informational jump: more likely to a feature that provides a large amount of new information for clarifying the identity of the underlying entity
      • Two non-uniform probability distributions
  16. Formalization
      • Actions (given the current feature $f_q$)
          • $P(M \mid f_q)$: the probability of performing a relational move from $f_q$
          • $P(J \mid f_q)$: the probability of performing an informational jump from $f_q$
          • subject to $P(M \mid f_q) + P(J \mid f_q) = 1$
      • Targets for actions (given the feature set $FS$)
          • $P(f_p \mid f_q, M)$: the probability of performing a relational move from $f_q$ to $f_p$
          • $P(f_p \mid f_q, J)$: the probability of performing an informational jump from $f_q$ to $f_p$
          • subject to $\sum_{f_p \in FS} P(f_p \mid f_q, M) = 1$ and $\sum_{f_p \in FS} P(f_p \mid f_q, J) = 1$
      • Result
          • $x(t)$: $|FS|$-dimensional vector
          • $x_p(t)$: the probability that the surfer visits $f_p$ at step $t$
          • Finally, $x_p(t+1) = \sum_{f_q \in FS} x_q(t) \left[ P(M \mid f_q)\, P(f_p \mid f_q, M) + P(J \mid f_q)\, P(f_p \mid f_q, J) \right]$ and $x = \lim_{t \to \infty} x(t)$
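The update rule on slide 16 can be iterated directly. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it takes $P(M \mid f_q) = 1 - \lambda$ and $P(J \mid f_q) = \lambda$ as on the Actions slide, normalizes a given relatedness matrix row by row to obtain $P(f_p \mid f_q, M)$, and makes the jump target distribution independent of $f_q$, in line with the approximation on the Informativeness slide. The function name `relin` and the input numbers are hypothetical.

```python
from typing import List


def relin(relatedness: List[List[float]], informativeness: List[float],
          lam: float, iterations: int = 200) -> List[float]:
    """Iterate x_p(t+1) = sum_q x_q(t) [(1-lam) P(fp|fq,M) + lam P(fp|fq,J)]."""
    n = len(informativeness)

    # P(fp | fq, M): proportional to relatedness, one distribution per fq.
    move = []
    for row in relatedness:
        total = sum(row)
        move.append([r / total for r in row] if total > 0 else [1.0 / n] * n)

    # P(fp | fq, J): proportional to the informativeness of the target fp.
    info_total = sum(informativeness)
    jump = ([i / info_total for i in informativeness]
            if info_total > 0 else [1.0 / n] * n)

    x = [1.0 / n] * n
    for _ in range(iterations):
        x = [sum(x[q] * ((1.0 - lam) * move[q][p] + lam * jump[p])
                 for q in range(n))
             for p in range(n)]
    return x


# Hypothetical inputs for three features.
rel = [
    [0.0, 0.8, 0.2],
    [0.8, 0.0, 0.1],
    [0.2, 0.1, 0.0],
]
info = [1.0, 2.0, 5.0]
print(relin(rel, info, lam=0.5))
```

With `lam=1.0` the result reduces to the normalized informativeness values, and with `lam=0.0` only relatedness matters, which is one way to see what tuning λ trades off.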
  17. Outline: Problem statement, The RELIN model, Implementation, Experiments, Conclusions
  18. Actions
      • $P(M \mid f_q) = 1 - \lambda$
      • $P(J \mid f_q) = \lambda$
      • $\lambda$: to be tuned in experiments
  19. Relatedness: $P(f_p \mid f_q, M)$
      • Relatedness between features (i.e. property-value pairs) combines
          • Relatedness between properties (i.e. resources)
          • Relatedness between values (i.e. resources)
      • Relatedness between resources = relatedness between resource names
          • URI: label or local name
          • Literal: lexical form
      • Distributional relatedness between resource names
          • More related = co-occur more often in certain contexts (e.g. documents)
          • Estimated via "pointwise mutual information + Google", with $P(s_i, s_j) = \mathrm{Hits}(s_i, s_j)/N$ and $P(s_j) = \mathrm{Hits}(s_j)/N$
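As a rough illustration of the estimate on slide 19, the sketch below computes standard pointwise mutual information, $\log \frac{P(s_i, s_j)}{P(s_i)\,P(s_j)}$, from hit counts. Several things here are assumptions: that `N` is the total number of indexed documents, that the hit counts are obtained elsewhere (no search API is called), and the numbers themselves. The slide does not say how PMI values are turned into $P(f_p \mid f_q, M)$ or how property and value relatedness are combined, so that step is left out.

```python
import math


def pmi(hits_i: int, hits_j: int, hits_ij: int, n: int) -> float:
    """Pointwise mutual information of two names from document hit counts.

    P(s_i) = hits_i / n, P(s_j) = hits_j / n, P(s_i, s_j) = hits_ij / n.
    Returns -inf when the two names never co-occur.
    """
    if hits_ij == 0:
        return float("-inf")
    p_i = hits_i / n
    p_j = hits_j / n
    p_ij = hits_ij / n
    return math.log(p_ij / (p_i * p_j))


# Hypothetical hit counts in a collection of 10,000 documents.
N = 10_000
print(pmi(hits_i=100, hits_j=200, hits_ij=50, n=N))  # co-occur often
print(pmi(hits_i=100, hits_j=200, hits_ij=2, n=N))   # roughly independent
```

A positive value means the two names co-occur more often than independence would predict, and a value near zero means they look unrelated.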
  20. Informativeness: $P(f_p \mid f_q, J)$
      • Measured by the self-information of $o$ (the negative logarithm of its probability), where $o$ is an informational jump from $f_q$ to $f_p$
      • $P(f_p \mid f_q)$: the probability that $f_p$ belongs to a feature set given that $f_q$ also does
      • Estimated via a statistical analysis of the data set
      • Approximation: $P(f_p \mid f_q) = P(f_p)$
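The informativeness estimate on slide 20 can be sketched as follows, using the approximation $P(f_p \mid f_q) = P(f_p)$. It assumes that $P(f_p)$ is estimated as the fraction of feature sets in the data set that contain $f_p$ and that jump probabilities are the normalized self-information values. The toy data set and the helper names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Set


def feature_probabilities(feature_sets: List[Set[str]]) -> Dict[str, float]:
    """Estimate P(f) as the fraction of feature sets that contain f."""
    counts = Counter(f for fs in feature_sets for f in fs)
    return {f: c / len(feature_sets) for f, c in counts.items()}


def self_information(p: float) -> float:
    """Self-information of an event with probability p (natural log)."""
    return -math.log(p)


def jump_distribution(features: Iterable[str],
                      prob: Dict[str, float]) -> Dict[str, float]:
    """Jump probabilities proportional to each target's self-information."""
    info = {f: self_information(prob[f]) for f in features}
    total = sum(info.values())
    if total == 0:  # every feature occurs everywhere: fall back to uniform
        return {f: 1.0 / len(info) for f in info}
    return {f: v / total for f, v in info.items()}


# Toy data set: four entity descriptions, each a set of features.
data_set = [
    {"type=City", "label=Nanjing"},
    {"type=City", "label=Karlsruhe"},
    {"type=City"},
    {"type=City"},
]
prob = feature_probabilities(data_set)
print(jump_distribution(["type=City", "label=Nanjing"], prob))
```

In this toy data set the feature shared by every entity carries no information, so the whole jump probability goes to the rare label.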
  21. Outline: Problem statement, The RELIN model, Implementation, Experiments, Conclusions
  22. Experiments
      • Intrinsic evaluation
      • Extrinsic evaluation
  23. Intrinsic evaluation: design
      • Task: to manually construct ideal entity summaries as the gold standard
      • Participants: 24 students majoring in computer science
      • Test cases: 149 entity descriptions randomly selected from DBpedia 3.4
      • Assignment: 4.43 participants per entity description (on average)
      • Output: top-5 features and top-10 features
  24. Intrinsic evaluation: results
      • Metric: overlap between summaries
      • Agreement between participants about ideal summaries
          • 2.91 when k=5
          • 7.86 when k=10
      • Quality of summaries computed under different approach settings
          • [Chart comparing baselines with our approach]
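The overlap metric on slide 24 can be illustrated with a small sketch. It assumes that overlap means the number of features two summaries share and that agreement is the average overlap over all pairs of summaries for the same entity; the slide does not spell out the aggregation, so this is one plausible reading. The summaries below are made up.

```python
from itertools import combinations
from typing import List, Sequence


def overlap(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of features that two summaries have in common."""
    return len(set(a) & set(b))


def average_agreement(summaries: List[Sequence[str]]) -> float:
    """Mean pairwise overlap among summaries of the same entity."""
    pairs = list(combinations(summaries, 2))
    return sum(overlap(a, b) for a, b in pairs) / len(pairs)


# Three hypothetical top-3 summaries written for one entity.
ideal_summaries = [
    ["label", "type", "country"],
    ["type", "country", "population"],
    ["country", "population", "areaCode"],
]
print(average_agreement(ideal_summaries))
```

The same `overlap` function can compare a computed summary against each ideal summary, which matches the second use of the metric on the slide.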
  25. Extrinsic evaluation: design
      • Task: to manually confirm entity mappings by using summaries
      • Participants: 19 students majoring in computer science
      • Test cases: 47 pairs of entity descriptions (DBpedia 3.4 ↔ Freebase Dec. 2009)
          • Gold-standard judgments based on owl:sameAs links
          • 24 correct and 23 incorrect
      • Assignment: 3.62 participants per pair, per approach setting (on average)
      • Output: judgment (correct or incorrect)
  26. Extrinsic evaluation: results
      • Metrics
          • Accuracy of the judgments
              • 1.0 = consistent with the gold standard
              • 0.0 = inconsistent
          • Time spent
              • Normalized by the average time per judgment spent by the participant
              • 1.0 = medium efficiency
              • Smaller value = higher efficiency
      • Results: [chart not preserved in this transcript]
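The two metrics on slide 26 can be sketched as below. The sketch assumes that accuracy is the fraction of judgments that match the gold standard and that each judgment time is divided by the same participant's mean time, so that 1.0 is that participant's average. Names and numbers are illustrative only.

```python
from typing import Dict, List


def accuracy(judgments: List[bool], gold: List[bool]) -> float:
    """Fraction of judgments that agree with the gold standard."""
    matches = sum(j == g for j, g in zip(judgments, gold))
    return matches / len(gold)


def normalized_times(times: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Divide each judgment time by that participant's mean time."""
    result = {}
    for participant, seconds in times.items():
        mean = sum(seconds) / len(seconds)
        result[participant] = [s / mean for s in seconds]
    return result


# Hypothetical judgments on four entity pairs (True = "same entity").
print(accuracy([True, False, True, True], [True, True, True, False]))

# Hypothetical seconds per judgment for two participants.
print(normalized_times({"p1": [10.0, 30.0], "p2": [60.0, 60.0]}))
```

Normalizing per participant keeps a generally slow reader from making one summarization approach look inefficient.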
  27. Discussion
      • Automatically computed summaries are still not as good as handcrafted ones.
                                                                     k=5     k=10
          Agreement between ideal summaries                          2.91    7.86
          Agreement between computed summaries and ideal summaries   2.40    4.88
      • User-specific notion of informativeness
          • Longitude and latitude are highly informative, but …
      • Information redundancy
          • Longitude + latitude = point
          • What if multiple sources …
      • Summarization = what + how (to present)
  28. Outline: Problem statement, The RELIN model, Implementation, Experiments, Conclusions
  29. Conclusions
      • Problem of entity summarization
          • Extractive
          • About identifying the entity that underlies a lengthy description
      • The RELIN model
          • Variant of the random surfer model
          • Non-uniform probability distributions
          • Informativeness + relatedness
      • Implementation
          • Based on linguistic and information theory concepts
          • Using information captured by the labels of nodes and edges in the data graph
      • Experiments
          • Closer to handcrafted ideal summaries
          • Assisting users in confirming entity mappings more accurately
  30. Future work: application-specific entity summarization
      • [Figure: sameAs? example]
  31. Related work: summarization
                   Paradigm          Approach           Measure          Model
          RELIN    Extractive        Centrality-based   PageRank-like    - Relatedness
                   - Text                                                - Informativeness
                   - Ontology                                            - Non-uniform probability distribution
          Others   Non-extractive    Centroid-based     PageRank         - Relatedness
                   - Database                           - Degree         - Uniform probability distribution
                   - Graph                              - Betweenness
                                                        - …
  32. Related work: ranking
      • Different goals: to best identify the underlying entity
          • B. Aleman-Meza et al., Ranking Complex Relationships on the Semantic Web. IEEE Internet Comput. 2005.
          • R. Delbru et al., Hierarchical Link Analysis for Ranking Web Data. ESWC 2010.
          • T. Franz et al., TripleRank: Ranking Semantic Web Data By Tensor Decomposition. ISWC 2009.
          • …
      • Exploitation of data semantics at different levels: use labels of nodes and edges
          • T. Penin et al., Snippet Generation for Semantic Web Search Engines. ASWC 2009.
          • X. Zhang et al., Ontology Summarization Based on RDF Sentence Graph. WWW 2007.
          • …
