Summarization and Relevance of Linked Data Facts

Published in: Education, Technology

  1. Summarization and Relevance of Linked Data Facts
     Presented at the 4. Leipziger Semantic Web Tag 2012, Leipzig, 25.09.2012
     Dr. Harald Sack, Hasso-Plattner-Institut for IT-Systems Engineering, University of Potsdam
     Tuesday, 25 September 2012
  2. Summarization and Relevance of Linked Data Facts
     • Linked Open Data and Semantics
     • The Importance of Being Relevant
     • The Most Important Facts
     • Heuristics for Fact Relevance
     • Relevance Evaluation
  3. There are more than 30 billion facts in the Linked Data universe
     http://www4.wiwiss.fu-berlin.de/lodcloud/state/
     Dr. Harald Sack, Hasso-Plattner-Institut Potsdam, Leipziger Semantic Web Tage, 24./25.09.2012, Leipzig
  4. State of LOD as a knowledge base
     • LOD facts are encoded in RDF
     • related ontologies only provide 'shallow' semantics
     • poor data quality
     • inconsistencies
     • ambiguities
     • redundancies
     • mapping and interlinking are still a problem
  5. Consider a single LOD Dataset
     • e.g., Albert Einstein
     • > 600 facts
     • > 70 properties
     • no given order of facts
     • no given relevance of facts
     Aldous Huxley
  6. How to find out which facts are important?
  7. Which fact is more important?
     • dbpedia:Albert_Einstein rdf:type yago:AmericanVegetarians .
     vs.
     • dbpedia:Albert_Einstein rdf:type yago:TheoreticalPhysicist .
  8. Importance depends on the Context
     Context: 'Nutrition'
     • dbpedia:Albert_Einstein rdf:type yago:AmericanVegetarians .
     vs.
     • dbpedia:Albert_Einstein rdf:type yago:TheoreticalPhysicist .
  9. Importance depends on the Context
     Context: 'Science'
     • dbpedia:Albert_Einstein rdf:type yago:AmericanVegetarians .
     vs.
     • dbpedia:Albert_Einstein rdf:type yago:TheoreticalPhysicist .
  10. What can it be used for?
  11. What can it be used for?
      • Making Decisions
        • which fact(s) should be considered to make a decision?
        • how should the (relevant) facts be weighted to make the right decision?
        • Applications: Recommendation, Exploratory Search, ...
      • Creating Summarizations
        • "entity summarization ... produce a version of the original [entity] description that is more concise, yet containing sufficient information for users to quickly identify the underlying entity." [1]
      [1] Gong Cheng, Thanh Tran, and Yuzhong Qu. "RELIN: relatedness and informativeness-based centrality for entity summarization". In: Proc. of the 10th intl. conf. on The semantic web - Volume Part I. ISWC'11. Bonn, Germany: Springer-Verlag, 2011, pp. 114–129.
  12. Google Knowledge Graph
      • Summarizations of facts related to a given entity
      • on average 192 facts per entity [2]
      [2] Gong Cheng, Thanh Tran, and Yuzhong Qu. RELIN: relatedness and informativeness-based centrality for entity summarization. In Proc. of the 10th int. conf. on The Semantic Web - Vol. Part I, ISWC'11, pages 114–129, Berlin, Heidelberg, 2011. Springer-Verlag.
  13. Google Knowledge Graph
  14. Google Knowledge Graph
      • the Google users decide with their queries what is important
      • Queries:
        • subject + object: "Einstein ETH Zürich"
        • subject + property: "Einstein birthplace"
      • Collaborative Filtering
  15. Entity Summarization
      RDF graph (figure):
      dbpedia:Albert_Einstein foaf:givenName "Albert" .
      dbpedia:Albert_Einstein foaf:familyName "Einstein" .
      dbpedia:Albert_Einstein rdf:type yago:AmericanVegetarian .
      dbpedia:Albert_Einstein rdf:type yago:TheoreticalPhysicist .
  16. Entity Summarization
      The RDF graph of slide 15 as a feature set FS(dbpedia:Albert_Einstein):
      f1 <rdf:type, yago:AmericanVegetarian>
      f2 <rdf:type, yago:TheoreticalPhysicist>
      f3 <foaf:givenName, "Albert">
      f4 <foaf:familyName, "Einstein">
  17. Entity Summarization
      Given a feature set FS(e) of an entity e and a positive integer k < |FS(e)|, the problem of entity summarization is to select Summ(e) ⊂ FS(e) such that |Summ(e)| = k. Summ(e) is called a summary of e. [3]
      [3] Gong Cheng, Thanh Tran, and Yuzhong Qu. "RELIN: relatedness and informativeness-based centrality for entity summarization". In: Proc. of the 10th intl. conf. on The semantic web - Volume Part I. ISWC'11. Bonn, Germany: Springer-Verlag, 2011, pp. 114–129.
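The definition on slide 17 leaves open how the k features are chosen. A minimal sketch, assuming each feature can be given a numeric relevance score: the function `summarize`, the `science_scores` table, and its values are all hypothetical and only illustrate selecting Summ(e) from FS(e) for a 'Science' context.

```python
from typing import Callable, List, Tuple

Feature = Tuple[str, str]  # a (property, value) pair, as in FS(e)


def summarize(features: List[Feature], k: int,
              score: Callable[[Feature], float]) -> List[Feature]:
    """Return Summ(e): the k highest-scoring features of FS(e)."""
    if not 0 < k < len(features):
        raise ValueError("k must satisfy 0 < k < |FS(e)|")
    # sorted() is stable, so equally scored features keep their input order.
    return sorted(features, key=score, reverse=True)[:k]


# FS(dbpedia:Albert_Einstein) as listed on slide 16.
fs_einstein = [
    ("rdf:type", "yago:AmericanVegetarian"),
    ("rdf:type", "yago:TheoreticalPhysicist"),
    ("foaf:givenName", "Albert"),
    ("foaf:familyName", "Einstein"),
]

# Hypothetical scores for the context 'Science' (illustrative values only).
science_scores = {
    ("rdf:type", "yago:TheoreticalPhysicist"): 0.9,
    ("foaf:familyName", "Einstein"): 0.7,
    ("foaf:givenName", "Albert"): 0.5,
    ("rdf:type", "yago:AmericanVegetarian"): 0.1,
}

summary = summarize(fs_einstein, 2, lambda f: science_scores.get(f, 0.0))
```

Swapping in a different score table (for example one for the context 'Nutrition') changes the summary without touching the selection step, which is the point made on slides 8 and 9.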
  18. How to determine relevant LOD facts?
      • graph analysis
      • (usage) statistics
      • semantic analysis
      • linguistic analysis
  19. Heuristics for Feature Relevance
      (1) (Manual) Whitelisting
      (2) RDF-properties connecting objects of same rdf:type
      (3) Inverse and symmetric RDF-properties
      (4) Disambiguations
      (5) Bidirectional (wiki)Links
      (6) Linguistic Co-occurrences
      (7) Frequency of features shared with entities of same type
      (8) Frequency of features shared with 'neighborhood' entities [4,5]
      [4] Jörg Waitelonis and Harald Sack. Towards exploratory video search using linked data. Multimedia Tools and Applications, 59:645–672, 2012.
      [5] Andreas Thalhammer, Ioan Toma, Antonio J. Roa-Valverde, and Dieter Fensel. Leveraging usage data for linked data movie entity summarization. In Proc. of the 2nd Int. Ws. on Usage Analysis and the Web of Data (USEWOD2012).
  20. Heuristics for Feature Relevance
      (1) (Manual) Whitelisting
      dbpedia:Albert_Einstein dbpedia-owl:residence dbpedia:Switzerland .
      dbpedia:Albert_Einstein rdf:type dbpedia-owl:Person .
      dbpedia:Switzerland rdf:type dbpedia-owl:Place .
      Locations are (manually) considered to be important for persons...
  21. Heuristics for Feature Relevance
      (2) RDF-properties connecting objects of same rdf:type
      dbpedia:BillCosby rdf:type yago:AmericanVegetarian .
      dbpedia:Albert_Einstein rdf:type yago:AmericanVegetarian .
      dbpedia:Albert_Einstein rdf:type dbpedia-owl:Scientist .
      dbpedia:Albert_Einstein dbpedia-owl:doctoralAdvisor dbpedia:AlfredKleiner .
      dbpedia:AlfredKleiner rdf:type dbpedia-owl:Scientist .
  22. Heuristics for Feature Relevance
      (3) Inverse and Symmetric RDF properties
      dbpedia:Albert_Einstein rdf:type dbpedia-owl:Scientist .
      dbpedia:AlfredKleiner rdf:type dbpedia-owl:Scientist .
      dbpedia:Albert_Einstein dbpedia-owl:doctoralAdvisor dbpedia:AlfredKleiner .
      dbpedia:AlfredKleiner dbpedia-owl:doctoralStudent dbpedia:Albert_Einstein .
      dbpedia-owl:doctoralAdvisor owl:inverseOf dbpedia-owl:doctoralStudent .
  23. Heuristics for Feature Relevance
      (4) Disambiguations
      dbpedia:Sumerian dbpedia-owl:wikiPageDisambiguates:
      • dbpedia:Cuneiform_script
      • dbpedia:Sumer
      • dbpedia:Sumerian_religion
      • dbpedia:Sumerian_language
      • dbpedia:Sumerian_art
  24. Heuristics for Feature Relevance
      (5) Bidirectional (Wiki)Links
      dbpedia:Albert_Einstein dbprop:wikilink dbpedia:AlfredKleiner .
      dbpedia:AlfredKleiner dbprop:wikilink dbpedia:Albert_Einstein .
  25. Heuristics for Feature Relevance
      (6) Linguistic Co-occurrences
      dbpedia:Albert_Einstein dbpprop:fields dbpedia:Physics .
      dbpedia:Albert_Einstein foaf:familyName "Einstein" .
      dbpedia:Albert_Einstein foaf:givenName "Albert" .
      dbpedia:Albert_Einstein rdf:type dbpedia:Violinists .
  26. Heuristics for Feature Relevance
      (7) Frequency of features shared with entities of same type
      dbpedia:Albert_Einstein rdf:type dbpedia-owl:Scientist .
      dbpedia:AlfredKleiner rdf:type dbpedia-owl:Scientist .
      dbpedia:Archimedes rdf:type dbpedia-owl:Scientist .
      dbpedia:Niels_Bohr rdf:type dbpedia-owl:Scientist .
  27. Heuristics for Feature Relevance
      (8) Frequency of features shared with 'neighborhood' entities
      (1) Determine the neighborhood entities N_{e,k} ⊂ E of an entity e ∈ E.
      (2) The frequency of features shared with neighbors determines feature importance: for every feature f in FS(e) of entity e, A_{e,f} and B_{e,f} are the sets of items sharing that feature, where A_{e,f} ⊂ N_{e,k} and B_{e,f} ⊂ E.
      (3) The weight w_e(f) of a feature f for an entity e is determined as [formula shown on the slide, not included in the transcript]
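Two of the heuristics above, (2) properties connecting entities of the same rdf:type and (5) bidirectional wiki links, reduce to simple checks over a set of triples. The following is a minimal sketch over an in-memory list of triples taken from slides 20 to 24. The function names are hypothetical, the last triple (a one-way link to dbpedia:Switzerland) is an invented counter-example, and this is not the implementation used by the authors.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

TRIPLES: List[Triple] = [
    ("dbpedia:Albert_Einstein", "rdf:type", "dbpedia-owl:Scientist"),
    ("dbpedia:AlfredKleiner", "rdf:type", "dbpedia-owl:Scientist"),
    ("dbpedia:Switzerland", "rdf:type", "dbpedia-owl:Place"),
    ("dbpedia:Albert_Einstein", "dbpedia-owl:doctoralAdvisor", "dbpedia:AlfredKleiner"),
    ("dbpedia:Albert_Einstein", "dbpedia-owl:residence", "dbpedia:Switzerland"),
    ("dbpedia:Albert_Einstein", "dbprop:wikilink", "dbpedia:AlfredKleiner"),
    ("dbpedia:AlfredKleiner", "dbprop:wikilink", "dbpedia:Albert_Einstein"),
    # Invented one-way link, added only as a counter-example for heuristic (5).
    ("dbpedia:Albert_Einstein", "dbprop:wikilink", "dbpedia:Switzerland"),
]


def types_of(entity: str, triples: List[Triple]) -> Set[str]:
    """Collect all rdf:type values asserted for an entity."""
    return {o for s, p, o in triples if s == entity and p == "rdf:type"}


def connects_same_type(triple: Triple, triples: List[Triple]) -> bool:
    """Heuristic (2): subject and object share at least one rdf:type."""
    s, p, o = triple
    if p == "rdf:type":
        return False
    return bool(types_of(s, triples) & types_of(o, triples))


def is_bidirectional_link(triple: Triple, triples: List[Triple],
                          link: str = "dbprop:wikilink") -> bool:
    """Heuristic (5): the link also exists in the reverse direction."""
    s, p, o = triple
    return p == link and (o, link, s) in triples


advisor = TRIPLES[3]
residence = TRIPLES[4]
flags = (connects_same_type(advisor, TRIPLES),
         connects_same_type(residence, TRIPLES),
         is_bidirectional_link(TRIPLES[5], TRIPLES),
         is_bidirectional_link(TRIPLES[7], TRIPLES))
```

On this toy data the doctoralAdvisor triple passes heuristic (2) because both ends are typed dbpedia-owl:Scientist, while the residence triple does not; it would need the manual whitelist of heuristic (1) to be kept.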
  28. A Ground Truth is Hard to Find
      • A ground truth for relevance evaluation of facts is hard to find. We need:
        • a sufficient number of arbitrary and independent facts
        • a sufficient number of people to ask with different opinions
      • Idea: Crowd Sourcing
      • Game based approach for evaluation
  29. Game Based Evaluation Approach
      • Idea: a Quiz Game. Create Questions from LOD facts
      • Hypotheses:
        • If the user is able to answer the question correctly, it is rather likely that the fact behind the answer is well known and of some importance for the related entity.
        • If the user is not able to answer the question or gives the wrong answer, it is rather likely that the fact behind the answer is not well known and maybe not of high importance for the related entity.
  30. Game Based Evaluation Approach
      Jörg Waitelonis, Nadine Ludwig, Magnus Knuth, Harald Sack: Whoknows? - Evaluating Linked Data Heuristics with a Quiz that cleans up DBpedia. International Journal of Interactive Technology and Smart Education (ITSE), Emerald Group, Vol. 8, 2011 (3).
      http://tinyurl.com/whoknowsgame
  31. Game Based Evaluation Approach
      Lina Wolf, Magnus Knuth, Johannes P. Osterhoff, Harald Sack: RISQ! Renowned Individuals Semantic Quiz – A Jeopardy like Quiz Game for Ranking Facts. In Proc. of 7th Int. Conf. on Semantic Systems I-SEMANTICS, 07.-09. Sept., 2011, Graz, Austria, ACM, 2011, pp. 71-78.
      http://apps.facebook.com/hpi-risq
  32. Game Based Evaluation Approach
      • WhoKnows? Movies! game adapted for entity summarization in the movie domain
      • restricted set of entities (movies) and related properties
      • Questions are generated out of RDF triples, e.g.
        fb:en.pulp_fiction :hasActor fb:en.john_travolta .
        Question: John Travolta is the actor of ...?
        Correct Answer: Pulp Fiction
      http://bit.ly/WhoKnowsMovies
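The question on slide 32 can be produced from its triple with a per-property template. This is only a sketch of the idea: the `TEMPLATES` table, the `label` helper, and the way labels are derived from the identifiers are assumptions, not the game's actual code, which would more plausibly read labels from the dataset.

```python
from typing import Dict, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical question templates, one per property.
TEMPLATES: Dict[str, str] = {
    ":hasActor": "{object} is the actor of ...?",
}


def label(identifier: str) -> str:
    """Derive a display label, e.g. 'fb:en.pulp_fiction' -> 'Pulp Fiction'."""
    local_name = identifier.split(".", 1)[-1]
    return local_name.replace("_", " ").title()


def make_question(triple: Triple) -> Tuple[str, str]:
    """Return (question text, correct answer) for a triple."""
    subject, predicate, obj = triple
    question = TEMPLATES[predicate].format(object=label(obj))
    return question, label(subject)


question, answer = make_question(
    ("fb:en.pulp_fiction", ":hasActor", "fb:en.john_travolta")
)
```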
  33. Game Based Ground Truth
      • WhoKnows? Movies has been played 690 times by 217 players.
      • All 2,829 triples have been played at least once, 2,314 triples at least three times.
      • In total 8,308 questions have been played, of which 4,716 have been answered correctly.
      • Overall result of relevance ranking: http://yovisto.com/labs/iswc2012/
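Following the hypotheses on slide 29, one simple way to turn such game logs into a relevance ranking is the share of correct answers per triple. The sketch below assumes a log of (triple, answered correctly) pairs; the log entries, the `fb:en.uma_thurman` and `fb:en.samuel_l_jackson` identifiers, and the `min_plays` cut-off (chosen to echo the "at least three times" figure) are illustrative assumptions, not the published procedure.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def correct_ratio(answers: Iterable[Tuple[Triple, bool]]) -> Dict[Triple, float]:
    """Share of correctly answered questions for each played triple."""
    played: Dict[Triple, int] = defaultdict(int)
    correct: Dict[Triple, int] = defaultdict(int)
    for triple, was_correct in answers:
        played[triple] += 1
        if was_correct:
            correct[triple] += 1
    return {t: correct[t] / played[t] for t in played}


def rank_facts(answers: List[Tuple[Triple, bool]], min_plays: int = 3) -> List[Triple]:
    """Rank triples by correct-answer ratio, ignoring rarely played ones."""
    counts: Dict[Triple, int] = defaultdict(int)
    for triple, _ in answers:
        counts[triple] += 1
    ratios = correct_ratio(answers)
    eligible = [t for t in ratios if counts[t] >= min_plays]
    return sorted(eligible, key=lambda t: ratios[t], reverse=True)


# Hypothetical game log (illustrative, not taken from the real dataset).
t1 = ("fb:en.pulp_fiction", ":hasActor", "fb:en.john_travolta")
t2 = ("fb:en.pulp_fiction", ":hasActor", "fb:en.uma_thurman")
t3 = ("fb:en.pulp_fiction", ":hasActor", "fb:en.samuel_l_jackson")
log = [(t1, True), (t2, False), (t1, True), (t2, True),
       (t1, False), (t2, False), (t3, True)]

ranking = rank_facts(log)

# Overall share of correct answers reported on slide 33.
overall = 4716 / 8308
```

With the figures from the slide, `overall` comes to roughly 0.57, i.e. a little more than half of all questions were answered correctly.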
  34. Evaluation of Fact Ranking Heuristics
      • Heuristics based on User-Based Entity Summarization (UBES), i.e. heuristic (8) Frequency of features shared with 'neighborhood' entities, with usage-based neighborhood determination using data from Freebase and the HetRec2011 MovieLens2k dataset
      • Comparison with results derived from Google Knowledge Graph (GKG) and random results
      • Kendall τ rank correlation used to evaluate the orderings of GKG and UBES with our ground truth
  35. Evaluation of Fact Ranking Heuristics
      • Feature Ranking for the cast of a movie
      • The average difference of both UBES and GKG from random is significant
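The Kendall τ rank correlation named on slide 34 compares two orderings of the same items by counting concordant and discordant pairs. A minimal sketch of the tie-free variant (τ-a) in plain Python follows; the two cast orderings are hypothetical, and the actual evaluation may have used a variant that handles ties.

```python
from itertools import combinations
from typing import List, Sequence


def kendall_tau(rank_a: Sequence[str], rank_b: Sequence[str]) -> float:
    """Kendall tau-a for two tie-free rankings of the same items."""
    if len(set(rank_a)) != len(rank_a) or set(rank_a) != set(rank_b) \
            or len(rank_a) != len(rank_b):
        raise ValueError("both rankings must order the same distinct items")
    if len(rank_a) < 2:
        raise ValueError("at least two items are required")
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    concordant = discordant = 0
    for x, y in combinations(rank_a, 2):
        # A pair is concordant if both rankings order x and y the same way.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
            concordant += 1
        else:
            discordant += 1
    n = len(rank_a)
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical orderings of a movie cast (illustrative only).
ground_truth: List[str] = ["travolta", "jackson", "thurman", "willis"]
heuristic: List[str] = ["travolta", "thurman", "jackson", "willis"]

tau = kendall_tau(ground_truth, heuristic)
```

A value of 1 means identical orderings, -1 a fully reversed ordering, and values near 0 are what a random ordering would be expected to produce, which is why random results serve as the baseline on slide 34.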
  36. Summary of Results
      • LOD fact relevance can be determined via heuristics based on statistical, linguistic, and semantic features
      • Ground Truth for fact relevance can be created with the help of a game based approach
      • Ground truth dataset and evaluation dataset are publicly available (Thalhammer, Knuth, Sack: Evaluating Entity Summarizations Using a Game-Based Ground Truth, ISWC 2012)
  37. Summarization and Relevance of Linked Data Facts
      Contact: Harald Sack
      Hasso-Plattner-Institut für Softwaresystemtechnik, Universität Potsdam
      Prof.-Dr.-Helmert-Str. 2-3, D-14482 Potsdam
      Homepage: http://bit.ly/HaraldSack-HPI
      http://www.yovisto.com/
      Blog: http://moresemantic.blogspot.com/
      E-Mail: harald.sack@hpi.uni-potsdam.de
      Twitter: lysander07 / biblionomicon / yovisto
