Summarization and Relevance of Linked Data Facts
 

Usage Rights

CC Attribution-NonCommercial-NoDerivs License

  • @narphorium yes...but in the Google Knowledge Graph you will find 5 major properties/facts about the displayed entity. The number 192 only refers to the average number of facts that has to be summarized to come up with the 5 properties/facts of GKG.
  • Slide 12 shows the Google Knowledge Graph and says that there are '192 facts per entity'. This is actually a statistic about DBpedia and is not an accurate measurement of Google's Knowledge Graph.

    Summarization and Relevance of Linked Data Facts: Presentation Transcript

    • Summarization and Relevance of Linked Data Facts. 4. Leipziger Semantic Web Tag 2012, Leipzig, 25.09.2012. Dr. Harald Sack, Hasso-Plattner-Institut for IT-Systems Engineering, University of Potsdam. Tuesday, 25 September 2012
    • Summarization and Relevance of Linked Data Facts
      • Linked Open Data and Semantics
      • The Importance of Being Relevant
      • The Most Important Facts
      • Heuristics for Fact Relevance
      • Relevance Evaluation
    • There are more than 30 billion facts in the Linked Data universe (http://www4.wiwiss.fu-berlin.de/lodcloud/state/). Dr. Harald Sack, Hasso-Plattner-Institut Potsdam, Leipziger Semantic Web Tage, 24./25.09.2012, Leipzig
    • State of LOD as a knowledge base
      • LOD facts are encoded in RDF
      • related ontologies only provide ‘shallow’ semantics
      • poor data quality
      • inconsistencies
      • ambiguities
      • redundancies
      • mapping and interlinking are still a problem
    • Consider a single LOD dataset
      • e.g., Albert Einstein
      • > 600 facts
      • > 70 properties
      • no given order of facts
      • no given relevance of facts
      Aldous Huxley
    • How to find out which facts are important?
    • Which fact is more important?
      • dbpedia:Albert_Einstein rdf:type yago:AmericanVegetarians .
      vs.
      • dbpedia:Albert_Einstein rdf:type yago:TheoreticalPhysicist .
    • Importance depends on the context. Context: ‘Nutrition’
      • dbpedia:Albert_Einstein rdf:type yago:AmericanVegetarians .
      vs.
      • dbpedia:Albert_Einstein rdf:type yago:TheoreticalPhysicist .
    • Importance depends on the context. Context: ‘Science’
      • dbpedia:Albert_Einstein rdf:type yago:AmericanVegetarians .
      vs.
      • dbpedia:Albert_Einstein rdf:type yago:TheoreticalPhysicist .
    • What can it be used for?
    • What can it be used for?
      • Making decisions
        • Which fact(s) should be considered to make a decision?
        • How should the (relevant) facts be weighted to make the right decision?
        • Applications: recommendation, exploratory search, ...
      • Creating summarizations
        • “entity summarization ... produce a version of the original [entity] description that is more concise, yet containing sufficient information for users to quickly identify the underlying entity.” [1]
      [1] Gong Cheng, Thanh Tran, and Yuzhong Qu. RELIN: relatedness and informativeness-based centrality for entity summarization. In: Proc. of the 10th intl. conf. on The semantic web - Volume Part I. ISWC'11. Bonn, Germany: Springer-Verlag, 2011, pp. 114–129.
    • Google Knowledge Graph
      • Summarizations of facts related to a given entity
      • on average 192 facts per entity [2]
      [2] Gong Cheng, Thanh Tran, and Yuzhong Qu. RELIN: relatedness and informativeness-based centrality for entity summarization. In Proc. of the 10th int. conf. on The Semantic Web - Vol. Part I, ISWC’11, pages 114–129, Berlin, Heidelberg, 2011. Springer-Verlag.
    • Google Knowledge Graph
    • Google Knowledge Graph
      • Google users decide with their queries what is important
      • Queries:
        • subject + object: “Einstein ETH Zürich”
        • subject + property: “Einstein birthplace”
      • Collaborative filtering
    • Entity Summarization
      [Figure: RDF graph. The node dbpedia:Albert_Einstein is connected via foaf:givenName to “Albert”, via foaf:familyName to “Einstein”, and via rdf:type to yago:AmericanVegetarian and yago:TheoreticalPhysicist.]
    • Entity Summarization
      [Figure: the same RDF graph for dbpedia:Albert_Einstein, shown next to its feature set.]
      FS(dbpedia:Albert_Einstein):
      • f1 = <rdf:type, yago:AmericanVegetarian>
      • f2 = <rdf:type, yago:TheoreticalPhysicist>
      • f3 = <foaf:givenName, “Albert”>
      • f4 = <foaf:familyName, “Einstein”>
    • Entity Summarization
      Given a feature set FS(e) of an entity e and a positive integer k < |FS(e)|, the problem of entity summarization is to select Summ(e) ⊂ FS(e) such that |Summ(e)| = k. Summ(e) is called a summary of e. [3]
      [3] Gong Cheng, Thanh Tran, and Yuzhong Qu. RELIN: relatedness and informativeness-based centrality for entity summarization. In: Proc. of the 10th intl. conf. on The semantic web - Volume Part I. ISWC'11. Bonn, Germany: Springer-Verlag, 2011, pp. 114–129.
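The definition above leaves open how the k features are chosen. A minimal sketch, assuming each feature can be given a numeric relevance score and the summary is simply the k best-scoring features: the function name `summarize`, the `score` callback, and the toy scores are hypothetical and not taken from the slides.

```python
from typing import Callable, List, Tuple

Feature = Tuple[str, str]  # (property, value) pair, as in FS(e)


def summarize(features: List[Feature], k: int,
              score: Callable[[Feature], float]) -> List[Feature]:
    """Return Summ(e): the k highest-scoring features of FS(e)."""
    if not 0 < k < len(features):
        raise ValueError("k must satisfy 0 < k < |FS(e)|")
    # sorted() is stable, so features with equal scores keep their input order.
    return sorted(features, key=score, reverse=True)[:k]


fs_einstein = [
    ("rdf:type", "yago:AmericanVegetarian"),
    ("rdf:type", "yago:TheoreticalPhysicist"),
    ("foaf:givenName", "Albert"),
    ("foaf:familyName", "Einstein"),
]

# Hypothetical relevance scores, made up purely for illustration.
toy_scores = {
    fs_einstein[0]: 0.1,
    fs_einstein[1]: 0.9,
    fs_einstein[2]: 0.5,
    fs_einstein[3]: 0.7,
}

summary = summarize(fs_einstein, 2, toy_scores.__getitem__)
```

Any of the heuristics discussed in the talk could in principle supply the `score` function; the sketch only fixes the selection step.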
    • How to determine relevant LOD facts?
      • graph analysis
      • (usage) statistics
      • semantic analysis
      • linguistic analysis
    • Heuristics for Feature Relevance [4,5]
      (1) (Manual) whitelisting
      (2) RDF properties connecting objects of the same rdf:type
      (3) Inverse and symmetric RDF properties
      (4) Disambiguations
      (5) Bidirectional (wiki)links
      (6) Linguistic co-occurrences
      (7) Frequency of features shared with entities of the same type
      (8) Frequency of features shared with ‘neighborhood’ entities
      [4] Jörg Waitelonis and Harald Sack. Towards exploratory video search using linked data. Multimedia Tools and Applications, 59:645–672, 2012.
      [5] Andreas Thalhammer, Ioan Toma, Antonio J. Roa-Valverde, and Dieter Fensel. Leveraging usage data for linked data movie entity summarization. In Proc. of the 2nd Int. Ws. on Usage Analysis and the Web of Data (USEWOD2012).
    • Heuristics for Feature Relevance: (1) (Manual) whitelisting
      • dbpedia:Albert_Einstein dbpedia-owl:residence dbpedia:Switzerland .
      • dbpedia:Albert_Einstein rdf:type dbpedia-owl:Person .
      • dbpedia:Switzerland rdf:type dbpedia-owl:Place .
      Locations are (manually) considered to be important for persons...
    • Heuristics for Feature Relevance: (2) RDF properties connecting objects of the same rdf:type
      • dbpedia:BillCosby rdf:type yago:AmericanVegetarian .
      • dbpedia:Albert_Einstein rdf:type yago:AmericanVegetarian .
      • dbpedia:Albert_Einstein rdf:type dbpedia-owl:Scientist .
      • dbpedia:AlfredKleiner rdf:type dbpedia-owl:Scientist .
      • dbpedia:Albert_Einstein dbpedia-owl:doctoralAdvisor dbpedia:AlfredKleiner .
    • Heuristics for Feature Relevance: (3) Inverse and symmetric RDF properties
      • dbpedia:Albert_Einstein rdf:type dbpedia-owl:Scientist .
      • dbpedia:AlfredKleiner rdf:type dbpedia-owl:Scientist .
      • dbpedia:Albert_Einstein dbpedia-owl:doctoralAdvisor dbpedia:AlfredKleiner .
      • dbpedia:AlfredKleiner dbpedia-owl:doctoralStudent dbpedia:Albert_Einstein .
      • dbpedia-owl:doctoralAdvisor owl:inverseOf dbpedia:doctoralStudent .
    • Heuristics for Feature Relevance: (4) Disambiguations
      dbpedia:Sumerian dbpedia-owl:wikiPageDisambiguates:
      • dbpedia:Cuneiform_script
      • dbpedia:Sumer
      • dbpedia:Sumerian_religion
      • dbpedia:Sumerian_language
      • dbpedia:Sumerian_art
    • Heuristics for Feature Relevance: (5) Bidirectional (wiki)links
      • dbpedia:Albert_Einstein dbprop:wikilink dbpedia:AlfredKleiner .
      • dbpedia:AlfredKleiner dbprop:wikilink dbpedia:Albert_Einstein .
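A minimal sketch of how bidirectional links might be detected, assuming the dbprop:wikilink triples are available as a set of (source, target) pairs. The function name `bidirectional_pairs` and the extra one-way link to dbpedia:Physics are hypothetical additions for illustration.

```python
from typing import FrozenSet, Set, Tuple

Link = Tuple[str, str]  # (source, target) of a dbprop:wikilink triple


def bidirectional_pairs(links: Set[Link]) -> Set[FrozenSet[str]]:
    """Return the unordered entity pairs that link to each other."""
    return {
        frozenset((source, target))
        for (source, target) in links
        # Ignore self-links and keep only pairs where the reverse link exists.
        if source != target and (target, source) in links
    }


links = {
    ("dbpedia:Albert_Einstein", "dbpedia:AlfredKleiner"),
    ("dbpedia:AlfredKleiner", "dbpedia:Albert_Einstein"),
    # One-way link, assumed here only to show a non-matching case.
    ("dbpedia:Albert_Einstein", "dbpedia:Physics"),
}

pairs = bidirectional_pairs(links)
```

Set membership makes the reverse-link check constant time per triple, so the whole pass is linear in the number of links.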
    • Heuristics for Feature Relevance: (6) Linguistic co-occurrences
      • dbpedia:Albert_Einstein dbpprop:fields dbpedia:Physics .
      • dbpedia:Albert_Einstein foaf:familyName “Einstein” .
      • dbpedia:Albert_Einstein foaf:givenName “Albert” .
      • dbpedia:Albert_Einstein rdf:type dbpedia:Violinists .
    • Heuristics for Feature Relevance: (7) Frequency of features shared with entities of the same type
      [Figure: dbpedia:Albert_Einstein, dbpedia:AlfredKleiner, dbpedia:Archimedes, and dbpedia:Niels_Bohr, connected via rdf:type to dbpedia-owl:Scientist.]
    • Heuristics for Feature Relevance: (8) Frequency of features shared with ‘neighborhood’ entities
      (1) Determine the neighborhood entities N_{e,k} ⊂ E of an entity e ∈ E.
      (2) The frequency of features shared with neighbors determines feature importance: for each feature f in FS(e) of entity e, A_{e,f} and B_{e,f} are the sets of items sharing that feature, where A_{e,f} ⊂ N_{e,k} and B_{e,f} ⊂ E.
      (3) The weight w_e(f) of a feature f for an entity e is determined as [formula missing from the transcript]
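Steps (1) and (2) can be sketched as follows. The weighting formula of step (3) is not preserved in this transcript, so the sketch stops at computing the two sets and does not guess at it. It assumes that e itself is excluded from both sets and that the neighborhood is already given; the function name `sharing_sets`, the `m:` movie identifiers, and the neighbor set are hypothetical toy data.

```python
from typing import Dict, Set, Tuple

Feature = Tuple[str, str]  # (property, value)


def sharing_sets(e: str, f: Feature,
                 features: Dict[str, Set[Feature]],
                 neighbors: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (A, B) for entity e and feature f.

    B: all entities other than e that have feature f (subset of E).
    A: the members of B that are also neighbors of e (subset of N_{e,k}).
    Excluding e itself is an assumption of this sketch.
    """
    b = {x for x, fs in features.items() if x != e and f in fs}
    a = b & neighbors
    return a, b


# Toy data, invented for illustration only.
features = {
    "m:pulp_fiction": {("hasActor", "john_travolta"), ("hasActor", "uma_thurman")},
    "m:grease": {("hasActor", "john_travolta")},
    "m:get_shorty": {("hasActor", "john_travolta")},
    "m:kill_bill": {("hasActor", "uma_thurman")},
}
# Hypothetical neighborhood of m:pulp_fiction with k = 2.
neighbors = {"m:get_shorty", "m:kill_bill"}

a_travolta, b_travolta = sharing_sets(
    "m:pulp_fiction", ("hasActor", "john_travolta"), features, neighbors)
a_thurman, b_thurman = sharing_sets(
    "m:pulp_fiction", ("hasActor", "uma_thurman"), features, neighbors)
```

A weight function would then combine the sizes of A and B; which combination the authors use should be taken from references [4,5] rather than from this sketch.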
    • A Ground Truth is Hard to Find
      • A ground truth for relevance evaluation of facts is hard to find. We need:
        • a sufficient number of arbitrary and independent facts
        • a sufficient number of people with different opinions to ask
      • Idea: crowdsourcing
      • Game-based approach for evaluation
    • Game Based Evaluation Approach
      • Idea: a quiz game. Create questions from LOD facts.
      • Hypotheses:
        • If the user is able to answer the question correctly, it is rather likely that the fact behind the answer is well known and of some importance for the related entity.
        • If the user is not able to answer the question or gives the wrong answer, it is rather likely that the fact behind the answer is not well known and maybe not of high importance for the related entity.
    • Game Based Evaluation Approach
      Jörg Waitelonis, Nadine Ludwig, Magnus Knuth, Harald Sack: Whoknows? - Evaluating Linked Data Heuristics with a Quiz that cleans up DBpedia. International Journal of Interactive Technology and Smart Education (ITSE), Emerald Group, Vol. 8, 2011 (3). http://tinyurl.com/whoknowsgame
    • Game Based Evaluation Approach
      Lina Wolf, Magnus Knuth, Johannes P. Osterhoff, Harald Sack: RISQ! Renowned Individuals Semantic Quiz – A Jeopardy like Quiz Game for Ranking Facts. In Proc. of 7th Int. Conf. on Semantic Systems I-SEMANTICS, 07.-09. Sept., 2011, Graz, Austria, ACM, 2011, pp. 71-78. http://apps.facebook.com/hpi-risq
    • Game Based Evaluation Approach
      • WhoKnows? Movies! game adapted for entity summarization in the movie domain (http://bit.ly/WhoKnowsMovies)
      • restricted set of entities (movies) and related properties
      • Questions are generated from RDF triples, e.g.
        fb:en.pulp_fiction :hasActor fb:en.john_travolta .
        Question: John Travolta is the actor of ...?
        Correct answer: Pulp Fiction
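A minimal sketch of how a question might be generated from a triple, assuming one text template per property and a naive rule for turning identifiers into labels. Only the `:hasActor` wording comes from the slide; the `TEMPLATES` table, `label`, and `make_question` are hypothetical and do not describe the game's actual implementation.

```python
from typing import Dict, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)

# Hypothetical question templates keyed by property.
# Only the ":hasActor" wording is taken from the slide.
TEMPLATES: Dict[str, str] = {
    ":hasActor": "{obj} is the actor of ...?",
}


def label(identifier: str) -> str:
    """Turn 'fb:en.john_travolta' into 'John Travolta' (naive heuristic)."""
    local_name = identifier.split(".")[-1]
    return " ".join(word.capitalize() for word in local_name.split("_"))


def make_question(triple: Triple) -> Tuple[str, str]:
    """Return (question text, correct answer) for one RDF triple."""
    subject, prop, obj = triple
    return TEMPLATES[prop].format(obj=label(obj)), label(subject)


question, answer = make_question(
    ("fb:en.pulp_fiction", ":hasActor", "fb:en.john_travolta"))
```

In this sketch the object of the triple appears in the question and the subject is the expected answer, mirroring the Pulp Fiction example on the slide.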
    • Game Based Ground Truth
      • WhoKnows? Movies! has been played 690 times by 217 players.
      • All 2,829 triples have been played at least once, 2,314 triples at least three times.
      • In total, 8,308 questions have been played, of which 4,716 have been answered correctly.
      • Overall result of relevance ranking: http://yovisto.com/labs/iswc2012/
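The figures above imply two simple ratios, worked out here as a small sketch. The numbers are the ones stated on the slide; the variable names are chosen for this example only.

```python
# Figures as stated on the slide.
games_played = 690
players = 217
triples_total = 2829
triples_played_3x = 2314
questions_played = 8308
answered_correctly = 4716

# Share of all questions that were answered correctly.
correct_ratio = answered_correctly / questions_played

# Share of triples that were played at least three times.
coverage_3x = triples_played_3x / triples_total

# Average number of questions per game and of games per player.
questions_per_game = questions_played / games_played
games_per_player = games_played / players

print(f"correct: {correct_ratio:.1%}, played 3+ times: {coverage_3x:.1%}")
```

Roughly 57% of all questions were answered correctly, and about 82% of the triples were played at least three times.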
    • Evaluation of Fact Ranking Heuristics
      • Heuristics based on User-Based Entity Summarization (UBES), i.e., heuristic (8), frequency of features shared with ‘neighborhood’ entities, with usage-based neighborhood determination, using data from Freebase and the HetRec2011 MovieLens2k dataset
      • Comparison with results derived from the Google Knowledge Graph (GKG) and with random results
      • Kendall τ rank correlation is used to evaluate the orderings of GKG and UBES against our ground truth
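A minimal sketch of a Kendall τ computation, assuming both rankings cover the same items and contain no ties (the tau-a variant). The slides do not say which variant or implementation the authors used, and the actor labels below are hypothetical toy data.

```python
from itertools import combinations
from typing import Dict


def kendall_tau(rank_a: Dict[str, int], rank_b: Dict[str, int]) -> float:
    """Kendall tau-a between two rankings of the same items (no ties assumed)."""
    items = list(rank_a)
    if set(items) != set(rank_b) or len(items) < 2:
        raise ValueError("both rankings must cover the same two or more items")
    concordant = discordant = 0
    for x, y in combinations(items, 2):
        # Positive product: both rankings order x and y the same way.
        product = (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(items) * (len(items) - 1) // 2
    return (concordant - discordant) / n_pairs


# Toy rankings of four cast members (1 = most relevant).
ground_truth = {"actor_1": 1, "actor_2": 2, "actor_3": 3, "actor_4": 4}
system = {"actor_1": 1, "actor_2": 3, "actor_3": 2, "actor_4": 4}

tau = kendall_tau(ground_truth, system)
```

In the toy example one of the six item pairs is ordered differently, so five pairs are concordant and one is discordant, giving τ = (5 − 1) / 6. Identical rankings give 1 and fully reversed rankings give −1.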
    • Evaluation of Fact Ranking Heuristics
      • Feature ranking for the cast of a movie
      • The average difference from random is significant for both UBES and GKG
    • Summary of Results
      • LOD fact relevance can be determined via heuristics based on statistical, linguistic, and semantic features
      • A ground truth for fact relevance can be created with the help of a game-based approach
      • The ground truth dataset and the evaluation dataset are publicly available (Thalhammer, Knuth, Sack: Evaluating Entity Summarizations Using a Game-Based Ground Truth, ISWC 2012)
    • Summarization and Relevance of Linked Data Facts
      Contact: Harald Sack, Hasso-Plattner-Institut für Softwaresystemtechnik, Universität Potsdam, Prof.-Dr.-Helmert-Str. 2-3, D-14482 Potsdam
      Homepage: http://bit.ly/HaraldSack-HPI, http://www.yovisto.com/
      Blog: http://moresemantic.blogspot.com/
      E-Mail: harald.sack@hpi.uni-potsdam.de
      Twitter: lysander07 / biblionomicon / yovisto