Deriving human readable labels from sparql queries
Upcoming SlideShare
Loading in...5
×
 

Like this? Share it with your network

Share

Deriving human readable labels from sparql queries

on

  • 155 views

This presentation was given at I-SEMANTICS 2011, 7th International Conference on Semantic Systems, Graz, and is related the publication of the same title. ...

This presentation was given at I-SEMANTICS 2011, 7th International Conference on Semantic Systems, Graz, and is related the publication of the same title.

Over 80% of entities on the Semantic Web lack a human-readable label. This hampers the ability of any tool that uses linked data to offer a meaningful interface to human users. We argue that methods for deriving human-readable labels are essential in order to allow the usage of the Web of Data. In this paper we explore, implement, and evaluate a method for deriving human-readable labels based on the variable names used in a large corpus of SPARQL queries that we built from a set of log files. We analyze the structure of the SPARQL graph patterns and offer a classification scheme for graph patterns. Based on this classification, we identify graph patterns that allow us to derive useful labels. We also provide an overview over the current usage of SPARQL in the newly built corpus.

The publication is available at http://www.aifb.kit.edu/images/9/9d/Sparql_queries.pdf

Statistics

Views

Total Views
155
Views on SlideShare
152
Embed Views
3

Actions

Likes
0
Downloads
0
Comments
0

1 Embed 3

http://www.slideee.com 3

Accessibility

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

Deriving human readable labels from sparql queries Presentation Transcript

  • 1. KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association INSTITUTE FOR APPLIED INFORMATICS AND FORMAL DESCRIPTION METHODS www.kit.edu Deriving Human-Readable Labels from SPARQL Queries Basil Ell, Denny Vrandečić, and Elena Simperl 7th International Conference on Semantic Systems, Graz 7 September 2011
  • 2. KIT – Karlsruhe Institute of Technology Institute for Applied Informatics and Formal Description Methods 2 31.03.2014 Basil Ell – Deriving Human-Readable Labels from SPARQL queries Outline Motivation Human-readability of the LOD cloud Method Evaluation Conclusions
  • 3. KIT – Karlsruhe Institute of Technology Institute for Applied Informatics and Formal Description Methods 3 31.03.2014 Introduction Entities are identified by URIs, such as http://de.dbpedia.org/resource/Graz http://rdf.freebase.com/ns/m.043j22x Human-readable names can be provided e.g. using the property rdfs:label dbpedia:Austria rdfs:label "Österreich"@de Basil Ell – Deriving Human-Readable Labels from SPARQL queries
  • 4. KIT – Karlsruhe Institute of Technology Institute for Applied Informatics and Formal Description Methods 4 31.03.2014 Motivation – Why are labels necessary? Scenario: linked data browsing Basil Ell – Deriving Human-Readable Labels from SPARQL queries [SIGMA] Is this meaningful to human users?
  • 5. KIT – Karlsruhe Institute of Technology Institute for Applied Informatics and Formal Description Methods 5 31.03.2014 Human-Readability of the LOD Cloud BTC2010 Corpus [BTC2010] 3,167,799,445 ntriples 159,177,123 distinct subjects 137,156,213 (86.17%) have no value for any of the properties rdfs:label, rdfs:comment, dc:title, and foaf:name. 61.8% of the analyzed non-information resources have no label (regarding 36 labeling properties) [Ell et al. 2011] Basil Ell – Deriving Human-Readable Labels from SPARQL queries
  • 6. KIT – Karlsruhe Institute of Technology Institute for Applied Informatics and Formal Description Methods 6 31.03.2014 Main Idea Can we automatically derive labels for entities by analyzing SPARQL queries? station can be used as a label for http://dbpedia.org/ontology/RadioStation Basil Ell – Deriving Human-Readable Labels from SPARQL queries PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX dbo: <http://dbpedia.org/ontology/> SELECT ?stationWHERE { ?station rdf:type dbo:RadioStation }
  • 7. KIT – Karlsruhe Institute of Technology Institute for Applied Informatics and Formal Description Methods 7 31.03.2014 Analyzed data set USEWOD2011 corpus[USEWOD2011] Contains log files from DBpedia and SWDF distinct parsable SPARQL SELECT queries: 1,212,932 (DBpedia) 195,641 (SWDF) Basil Ell – Deriving Human-Readable Labels from SPARQL queries Semantic Web Dog Food (SWDF)
  • 8. KIT – Karlsruhe Institute of Technology Institute for Applied Informatics and Formal Description Methods 8 31.03.2014 Classification of variable names Class Description short String length up to 2 chars. Common: s, p, o, x. stop Known no-short strings that cannot be used as labels, e.g. subject, instance, uri. lang A no-stop string that belongs to a natural language or that consists of separatedwords of a natural language, e.g. Artist and RadioStation. Checkedfor the languages {de, en, es, fr, it} using the [Corpex] webservice. (The Corpex dataset consists of all words and their frequencies as extractedand counted from instances of Wikipedia in multiple languages. [Vrandecic et al. 2011]) nolang Variable names that are neither short, nor stop, nor lang. Basil Ell – Deriving Human-Readable Labels from SPARQL queries
  • 9. KIT – Karlsruhe Institute of Technology Institute for Applied Informatics and Formal Description Methods 9 31.03.2014 Classification of triple patterns Triple pattern classes P = {RRV, RVR, VRL, ...} R is a resource, V is a variable, L is a literal Ignoring features such as UNION, OPTIONAL etc. Basil Ell – Deriving Human-Readable Labels from SPARQL queries SELECT ... WHERE { ... dbpedia:Karlsruhe dbo:populationTotal ?population . ... } RRV pattern
  • 10. KIT – Karlsruhe Institute of Technology Institute for Applied Informatics and Formal Description Methods 10 31.03.2014 Classification of triple patterns (2) Basil Ell – Deriving Human-Readable Labels from SPARQL queries DBpedia
  • 11. KIT – Karlsruhe Institute of Technology Institute for Applied Informatics and Formal Description Methods 11 31.03.2014 DBpedia – top query patterns (pruned n >= 5000) Basil Ell – Deriving Human-Readable Labels from SPARQL queries 8312 queries consist of one VVL triple and three VRV triples Graph pattern classes visualized as hypergraph: n Number of instances TP Name of triple pattern
  • 12. KIT – Karlsruhe Institute of Technology Institute for Applied Informatics and Formal Description Methods 12 31.03.2014 SWDF – top query patterns pruned (n >= 1000) Basil Ell – Deriving Human-Readable Labels from SPARQL queries Graph pattern classes visualized as hypergraph: n Number of instances TP Name of triple pattern
  • 13. KIT – Karlsruhe Institute of Technology Institute for Applied Informatics and Formal Description Methods 13 31.03.2014 Derivation pattern 1: 1 x RRV (31.75% of all DBpedia queries) Assumption: V‘ is a human-readable label for property R2 iff local_name(R2) = V and lang(V). V‘ can be derived from V by substituting separators and splitting camel-cased words into constituents. Basil Ell – Deriving Human-Readable Labels from SPARQL queries <http://dbpedia.org/page/NASA> R1 <http://dbpedia.org/property/agencyName> R2 ?agencyName V
  • 14. KIT – Karlsruhe Institute of Technology Institute for Applied Informatics and Formal Description Methods 14 31.03.2014 Derivation pattern 2: Any graph with VRR (22.32% of all DBpedia queries) Assumption: V‘ is a human-readable label for class R2 iff lang(V) and R1 = rdf:type Example: ?place rdf:type dbo:Location Basil Ell – Deriving Human-Readable Labels from SPARQL queries ?paper V <http://data.semanticweb.org/ns/swc/ontology#isPartOf> R1 <http://data.semanticweb.org/conference/www/2009/proceedings> R2
  • 15. KIT – Karlsruhe Institute of Technology Institute for Applied Informatics and Formal Description Methods 15 31.03.2014 Evaluation – 1 x RRV Basil Ell – Deriving Human-Readable Labels from SPARQL queries 1,366,363 triples of class RRV 549,093 cases: local_name(R2) = V 817,269 cases: local_name(R2) ≠ V 226 pairs (URI, guessed label) 54.5% correct: sufficiently similar to existing labels 14% correct: manual evaluation 9.1% correct within a given context (location for dbo:residence) 22.4% wrong (containedfor dbprop:creator) 68%
  • 16. KIT – Karlsruhe Institute of Technology Institute for Applied Informatics and Formal Description Methods 16 31.03.2014 Evaluation – Any graph with VRR Basil Ell – Deriving Human-Readable Labels from SPARQL queries 80,455 triples of class RRV 549,093 cases: local_name(R2) = V 60 distinct URIs, 36 labels 25% correct: sufficiently similar to existing labels 39.975% correct: manual evaluation 35.025% wrong (scientist for dbo:SoccerPlayer) 64.975%
  • 17. KIT – Karlsruhe Institute of Technology Institute for Applied Informatics and Formal Description Methods 17 31.03.2014 Conclusions Approach for automatically deriving labels Acceptable precision: most derived labels matched the already existing labels (atypical datasets) Derived variable names less specific Derived labels for terminological entities (properties and classes), not for instances. Basil Ell – Deriving Human-Readable Labels from SPARQL queries
  • 18. KIT – Karlsruhe Institute of Technology Institute for Applied Informatics and Formal Description Methods 18 31.03.2014 References & Acknowledgements [BTC 2010]http://km.aifb.kit.edu/projects/btc-2010/ [Ell et al. 2011] Labels in the Web of Data, ISWC2011, to appear. [SIGMA] http://sig.ma/search?q=Sidney+Bechet [USEWOD2011] http://data.semanticweb.org/usewod/2011/challenge.html [Corpex] http://km.aifb.kit.edu/sites/corpex/ [Vrandecic et al. 2011] Basil Ell – Deriving Human-Readable Labels from SPARQL queries Part of this work has been carriedout in the framework of the German Research Foundation (DFG) project entitled: "Entwicklung einer Virtuellen Forschungs- umgebung für die Historische Bildungsforschung mit Semantischer Wiki-Techno- logie - Semantic MediaWiki for Collaborative CorporaAnalysis" (INST 5580/1-1), in the domain of "Scientific Library Services and Information Systems" (LIS).
  • 19. KIT – Karlsruhe Institute of Technology Institute for Applied Informatics and Formal Description Methods 19 31.03.2014 THANK YOU FOR YOUR ATTENTION Basil Ell – Deriving Human-Readable Labels from SPARQL queries
  • 20. KIT – Karlsruhe Institute of Technology Institute for Applied Informatics and Formal Description Methods 20 31.03.2014 BACKUP SLIDES Basil Ell – Deriving Human-Readable Labels from SPARQL queries
  • 21. KIT – Karlsruhe Institute of Technology Institute for Applied Informatics and Formal Description Methods 21 31.03.2014 Triple pattern classes (SWDF) Basil Ell – Deriving Human-Readable Labels from SPARQL queries
  • 22. KIT – Karlsruhe Institute of Technology Institute for Applied Informatics and Formal Description Methods 22 31.03.2014 Basil Ell – Deriving Human-Readable Labels from SPARQL queries