Leveraging Usage Data for Linked Data Movie Entity Summarization

Recent research in the field of Linked Data addresses the problem of entity summarization: ranking an entity's features according to their importance for identifying that particular entity. Besides enabling a more human-friendly presentation, such summaries can play a central role in semantic search engines and semantic recommender systems. Current approaches perform entity summarization based on patterns inherent in the data itself. The approach proposed in this paper focuses on the movie domain and uses usage data to help measure the similarity between movie entities. This similarity makes it possible to determine the k nearest neighbors of an entity, which leads to the idea that the features an entity shares with its nearest neighbors can be considered significant for that entity. Additionally, we introduce a downgrading factor (similar to TF-IDF) to counter the large number of commonly occurring features. We exemplify the approach on a movie-ratings dataset that has been linked to Freebase entities.

http://arxiv.org/pdf/1204.2718v1

  • Each entity is associated with an average of 192 triples.

Leveraging Usage Data for Linked Data Movie Entity Summarization: Presentation Transcript

  • Leveraging Usage Data for Linked Data Movie Entity Summarization
    Andreas Thalhammer, Ioan Toma, Antonio J. Roa-Valverde, Dieter Fensel
    USEWOD Workshop, April 17, 2012, Lyon, France
    © Copyright 2012 STI Innsbruck (www.sti-innsbruck.at)
  • Overview
    1. Problem statement
    2. Proposed approach
    3. Dataset
    4. Preliminary results
    5. Conclusions
  • Problem statement
    • Problem: Linked Data entities comprise too much information for a user to grasp quickly.
    • Entity summarization: “... we aim at solving this novel problem that we call entity summarization to produce a version of the original description that is more concise, yet containing sufficient information for users to quickly identify the underlying entity.” [Cheng et al., 2011]
  • Proposed approach (1)
    • Techniques: item-based collaborative filtering [Sarwar et al., 2001]; k-nearest neighbors (kNN).
    • Usage data:

      Movie       Bob    Alice  Marc   Elena  John   Mary
      Toy Story   1      0      1      0      1      1
      Heat        0      0      1      1      0      0
      Jumanji     1      0      1      0      1      0
      Top Gun     1      0      0      1      1      0
      The Juror   1      1      0      1      0      0
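The item-based kNN step can be illustrated on the usage matrix from this slide. The following is a minimal Python sketch, not the authors' implementation: it assumes cosine similarity between the movies' usage vectors (one of the item-similarity measures discussed by Sarwar et al.; the slides do not state which measure was used), and the names `USAGE`, `cosine`, and `k_nearest` are hypothetical.

```python
import math

# Usage matrix from the slide: one row per movie, one column per user
# (Bob, Alice, Marc, Elena, John, Mary).
USAGE = {
    "Toy Story": [1, 0, 1, 0, 1, 1],
    "Heat":      [0, 0, 1, 1, 0, 0],
    "Jumanji":   [1, 0, 1, 0, 1, 0],
    "Top Gun":   [1, 0, 0, 1, 1, 0],
    "The Juror": [1, 1, 0, 1, 0, 0],
}


def cosine(a, b):
    """Cosine similarity of two usage vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def k_nearest(item, usage, k):
    """Return the k items most similar to `item` as (name, score), best first."""
    scores = [(other, cosine(usage[item], vec))
              for other, vec in usage.items() if other != item]
    # Sort by descending similarity; break ties by name for a stable result.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:k]


print(k_nearest("Toy Story", USAGE, 2))
```

With these vectors the two nearest neighbors of Toy Story are Jumanji (three shared users) and Top Gun (two shared users).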
  • Proposed approach (2): feature ranking for a specific entity e
    • First idea:
      – Count shared features in the nearest-neighbor set
      – Rank features by their number of occurrences
      – Problem: many features occur very often, e.g. (cc:attributionName, “Source: Freebase - The World's database”)
    • Improvement:
      – Use TF-IDF to weight the features:
        w(e,f) = |neighbor(e,f)| × log(|all()| / |all(f)|)
      – Rank features according to their weight
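The weighting w(e,f) can be sketched in Python as follows. The reading of the symbols is an assumption: neighbor(e,f) is taken as the nearest neighbors of e that also have feature f, all() as the set of all entities, and all(f) as the entities that have f; the natural logarithm is used. The entities, their (property, value) features, and the names `FEATURES` and `rank_features` are hypothetical toy data, and the neighbor list is passed in precomputed.

```python
import math
from collections import Counter

# Hypothetical toy data: features as (property, value) pairs per entity.
# The attribution feature occurs on every entity, like the slide's example.
ATTR = ("cc:attributionName", "Source: Freebase - The World's database")
FEATURES = {
    "e":  {ATTR, ("genre", "Animation"), ("genre", "Comedy"), ("studio", "Pixar")},
    "n1": {ATTR, ("genre", "Comedy"), ("genre", "Fantasy")},
    "n2": {ATTR, ("genre", "Animation"), ("studio", "Pixar")},
    "x1": {ATTR, ("genre", "Comedy"), ("genre", "Crime")},
    "x2": {ATTR, ("genre", "Crime")},
}


def rank_features(entity, neighbors, features):
    """Rank features of `entity` by w(e,f) = |neighbor(e,f)| * log(|all()| / |all(f)|).

    Assumed reading: |neighbor(e,f)| counts the given neighbors having f,
    |all()| is the number of entities, |all(f)| the entities having f.
    """
    n_all = len(features)
    doc_freq = Counter(f for fs in features.values() for f in fs)
    weights = {}
    for f in features[entity]:
        shared = sum(1 for nb in neighbors if f in features[nb])
        weights[f] = shared * math.log(n_all / doc_freq[f])
    # Highest weight first; ties broken by the feature itself for determinism.
    return sorted(weights.items(), key=lambda item: (-item[1], item[0]))


for feature, weight in rank_features("e", ["n1", "n2"], FEATURES):
    print(f"{weight:.3f}  {feature}")
```

Here the attribution feature is shared with both neighbors, so a plain count would rank it first; because it occurs on every entity, its weight is log(5/5) = 0 and it drops to the bottom, while the rarer Animation and Pixar features rank highest.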
  • Dataset
    • Initial datasets:
      – HetRec 2011 (2,113 users, 10,197 movies, 855,598 ratings)
      – Freebase
    • Identified more than 10,000 of the HetRec 2011 movies in Freebase
    • kNN: (fb:en.pulp_fiction, knn:20, fb:en.reservoir_dogs)
  • Preliminary Results (1)
  • Preliminary Results (2)
  • Conclusions
    • Preliminary results look promising.
    • Interesting challenges:
      – accounting for numeric values
      – features that result from property chains
    • Original idea of entity summarization: “... not just represent the main themes of the original data, but rather, can best identify the underlying entity” [Cheng et al., 2011]
    • Restriction to a single domain.
  • Thank you
    andreas.thalhammer@sti2.at
  • References
    [Cheng et al., 2011] Gong Cheng, Thanh Tran, and Yuzhong Qu. “RELIN: Relatedness and Informativeness-Based Centrality for Entity Summarization.” In: Proc. of the 10th Intl. Conf. on the Semantic Web, Volume Part I (ISWC '11). Bonn, Germany: Springer-Verlag, 2011, pp. 114–129.
    [Sarwar et al., 2001] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. “Item-Based Collaborative Filtering Recommendation Algorithms.” In: Proc. of the 10th Intl. Conf. on World Wide Web (WWW '01). New York, NY, USA: ACM, 2001, pp. 285–295.