This presentation was given at I-SEMANTICS 2011, 7th International Conference on Semantic Systems, Graz, and is related to the publication of the same title.
Over 80% of entities on the Semantic Web lack a human-readable label. This hampers the ability of any tool that uses linked data to offer a meaningful interface to human users. We argue that methods for deriving human-readable labels are essential to make the Web of Data usable. In this paper we explore, implement, and evaluate a method for deriving human-readable labels based on the variable names used in a large corpus of SPARQL queries that we built from a set of log files. We analyze the structure of the SPARQL graph patterns and offer a classification scheme for graph patterns. Based on this classification, we identify graph patterns that allow us to derive useful labels. We also provide an overview of the current usage of SPARQL in the newly built corpus.
The publication is available at http://www.aifb.kit.edu/images/9/9d/Sparql_queries.pdf
Deriving human readable labels from sparql queries
1. KIT – University of the State of Baden-Wuerttemberg and
National Research Center of the Helmholtz Association
INSTITUTE FOR APPLIED INFORMATICS AND FORMAL DESCRIPTION METHODS
www.kit.edu
Deriving Human-Readable Labels
from SPARQL Queries
Basil Ell, Denny Vrandečić, and Elena Simperl
7th International Conference on Semantic Systems, Graz
7 September 2011
2. KIT – Karlsruhe Institute of Technology
Institute for Applied Informatics and Formal Description Methods
2 31.03.2014 Basil Ell – Deriving Human-Readable Labels from SPARQL queries
Outline
Motivation
Human-readability of the LOD cloud
Method
Evaluation
Conclusions
3. KIT – Karlsruhe Institute of Technology
Introduction
Entities are identified by URIs, such as
http://de.dbpedia.org/resource/Graz
http://rdf.freebase.com/ns/m.043j22x
Human-readable names can be provided, e.g., using the property rdfs:label:
dbpedia:Austria rdfs:label "Österreich"@de
4. KIT – Karlsruhe Institute of Technology
Motivation – Why are labels necessary?
Scenario: linked data browsing
[Screenshot: SIGMA linked data browser [SIGMA]]
Is this meaningful to human users?
5. KIT – Karlsruhe Institute of Technology
Human-Readability of the LOD Cloud
BTC2010 Corpus [BTC2010]
3,167,799,445 ntriples
159,177,123 distinct subjects
137,156,213 (86.17%) have no value for any of the
properties rdfs:label, rdfs:comment,
dc:title, and foaf:name.
61.8% of the analyzed non-information resources have
no label (regarding 36 labeling properties) [Ell et al. 2011]
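The count on this slide can be sketched as a single pass over the triples. This is a minimal illustration, not the authors' tooling: the function name `unlabeled_share` and the in-memory triple tuples are assumptions, and `dc:title` is assumed to mean the Dublin Core elements 1.1 namespace.

```python
# Labeling properties named on the slide, written as full URIs.
# Assumption: dc: refers to the Dublin Core elements 1.1 namespace.
LABEL_PROPERTIES = {
    "http://www.w3.org/2000/01/rdf-schema#label",
    "http://www.w3.org/2000/01/rdf-schema#comment",
    "http://purl.org/dc/elements/1.1/title",
    "http://xmlns.com/foaf/0.1/name",
}


def unlabeled_share(triples):
    """Return (unlabeled, total, share) over the distinct subjects.

    `triples` is any iterable of (subject, predicate, object) strings.
    A subject counts as labeled if it has a value for at least one
    of the labeling properties above.
    """
    subjects = set()
    labeled = set()
    for s, p, _o in triples:
        subjects.add(s)
        if p in LABEL_PROPERTIES:
            labeled.add(s)
    total = len(subjects)
    unlabeled = total - len(labeled)
    share = unlabeled / total if total else 0.0
    return unlabeled, total, share
```

For a corpus of the size reported above, the two sets would not fit comfortably in memory; a sorted or streamed variant would be needed, which this sketch leaves out.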
6. KIT – Karlsruhe Institute of Technology
Main Idea
Can we automatically derive labels for entities by analyzing SPARQL queries?
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?station WHERE {
  ?station rdf:type dbo:RadioStation
}
Here, the variable name station can be used as a label for
http://dbpedia.org/ontology/RadioStation
7. KIT – Karlsruhe Institute of Technology
Analyzed data set
USEWOD2011 corpus [USEWOD2011]
Contains log files from DBpedia and Semantic Web Dog Food (SWDF)
Distinct parsable SPARQL SELECT queries:
1,212,932 (DBpedia)
195,641 (SWDF)
8. KIT – Karlsruhe Institute of Technology
Classification of variable names
Class: Description
short: String of up to 2 characters. Common: s, p, o, x.
stop: Known non-short strings that cannot be used as labels, e.g. subject, instance, uri.
lang: A non-stop string that belongs to a natural language or that consists of separated words of a natural language, e.g. Artist and RadioStation. Checked for the languages {de, en, es, fr, it} using the [Corpex] web service. (The Corpex dataset consists of all words and their frequencies as extracted and counted from instances of Wikipedia in multiple languages. [Vrandecic et al. 2011])
nolang: Variable names that are neither short, nor stop, nor lang.
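The four classes can be read as a cascade of checks. The sketch below illustrates that reading under stated assumptions: `KNOWN_WORDS` is a tiny hypothetical vocabulary standing in for the Corpex web service (which is not called here), `STOP_WORDS` holds only the three examples from the slide, and the function names are made up for illustration.

```python
import re

# Only the examples given on the slide; the full stop list is not known.
STOP_WORDS = {"subject", "instance", "uri"}

# Hypothetical stand-in for a Corpex lookup: a tiny fixed vocabulary.
KNOWN_WORDS = {"artist", "radio", "station", "place", "population"}


def split_words(name):
    """Split a variable name at separators and camel-case boundaries."""
    words = []
    for part in re.split(r"[_\-\s]+", name):
        # Acronyms, capitalized or lowercase words, and digit runs.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return words


def classify_variable(name, vocabulary=KNOWN_WORDS):
    """Return 'short', 'stop', 'lang', or 'nolang' for a variable name."""
    if len(name) <= 2:
        return "short"
    if name.lower() in STOP_WORDS:
        return "stop"
    words = split_words(name)
    if words and all(w.lower() in vocabulary for w in words):
        return "lang"
    return "nolang"
```

The order of the checks matters: a name is tested for stop only if it is not short, and for lang only if it is neither short nor stop, which mirrors the definitions above.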
9. KIT – Karlsruhe Institute of Technology
Classification of triple patterns
Triple pattern classes P = {RRV, RVR, VRL, ...}
R is a resource, V is a variable, L is a literal
Ignoring features such as UNION, OPTIONAL etc.
SELECT ... WHERE {
...
dbpedia:Karlsruhe dbo:populationTotal ?population .
...
}
RRV pattern
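The R/V/L scheme can be illustrated with a small classifier over already-tokenized triple patterns. This is a simplified sketch, not the authors' parser: it assumes each term arrives as a string, ignores blank nodes and property paths, and the helper names are hypothetical. Counting classes per query, as in `graph_pattern_summary`, is one possible way to summarize a graph pattern and is likewise an assumption.

```python
import re
from collections import Counter

# Plain numeric literals such as 42, -3.5, or 1e6.
_NUMERIC = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?$")


def term_class(term):
    """Map one term of a triple pattern to 'V', 'L', or 'R'."""
    if term.startswith(("?", "$")):
        return "V"  # SPARQL variable
    if term.startswith(('"', "'")) or term in ("true", "false") or _NUMERIC.match(term):
        return "L"  # literal
    return "R"      # IRI or prefixed name (blank nodes are not handled)


def triple_pattern_class(subject, predicate, obj):
    """Return the three-letter class of a triple pattern, e.g. 'RRV'."""
    return term_class(subject) + term_class(predicate) + term_class(obj)


def graph_pattern_summary(triples):
    """Count how often each triple pattern class occurs in one query."""
    return Counter(triple_pattern_class(*t) for t in triples)
```

Applied to the triple shown above, `triple_pattern_class("dbpedia:Karlsruhe", "dbo:populationTotal", "?population")` yields the RRV class.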
10. KIT – Karlsruhe Institute of Technology
Classification of triple patterns (2)
DBpedia
11. KIT – Karlsruhe Institute of Technology
DBpedia – top query patterns
pruned (n >= 5000)
[Figure: graph pattern classes visualized as a hypergraph; n = number of instances, TP = name of triple pattern. Annotation: 8312 queries consist of one VVL triple and three VRV triples.]
12. KIT – Karlsruhe Institute of Technology
SWDF – top query patterns
pruned (n >= 1000)
[Figure: graph pattern classes visualized as a hypergraph; n = number of instances, TP = name of triple pattern.]
13. KIT – Karlsruhe Institute of Technology
Derivation pattern 1: 1 x RRV
(31.75% of all DBpedia queries)
Assumption: V' is a human-readable label for property R2 iff local_name(R2) = V and lang(V).
V' can be derived from V by substituting separators and splitting camel-cased words into constituents.
<http://dbpedia.org/page/NASA> R1
<http://dbpedia.org/property/agencyName> R2
?agencyName V
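The rule for this pattern can be sketched in a few lines. The code below is an illustration under assumptions, not the authors' implementation: `is_lang` is a caller-supplied stand-in for the lang check, the function names are hypothetical, and lowercasing the derived label is a choice of this sketch that the slides do not specify.

```python
import re


def local_name(uri):
    """Return the part of a URI after the last '#' or '/'."""
    return re.split(r"[#/]", uri.rstrip("#/"))[-1]


def derive_label(variable_name):
    """Derive V' from V: replace separators and split camel-cased words."""
    spaced = re.sub(r"[_\-]+", " ", variable_name)
    # Insert a space wherever a lowercase letter or digit precedes a capital.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", spaced)
    # Lowercasing is an assumption of this sketch.
    return " ".join(spaced.split()).lower()


def label_for_property(property_uri, variable, is_lang):
    """Return V' if local_name(R2) = V and lang(V) hold, else None.

    `is_lang` is a callable that decides whether a name is of class lang.
    """
    name = variable.lstrip("?$")
    if local_name(property_uri) == name and is_lang(name):
        return derive_label(name)
    return None
```

With the example above, the property `http://dbpedia.org/property/agencyName` and the variable `?agencyName` share the local name, so the sketch derives the label from the variable name, provided the lang check accepts it.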
14. KIT – Karlsruhe Institute of Technology
Derivation pattern 2: Any graph with VRR
(22.32% of all DBpedia queries)
Assumption: V' is a human-readable label for class R2 iff lang(V) and R1 = rdf:type
Example:
?place rdf:type dbo:Location
?paper V
<http://data.semanticweb.org/ns/swc/ontology#isPartOf> R1
<http://data.semanticweb.org/conference/www/2009/proceedings> R2
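The second rule only fires for VRR triples whose predicate is rdf:type, so the `?paper` triple above yields no label while `?place rdf:type dbo:Location` does. A minimal sketch, assuming pre-tokenized triples, a caller-supplied `is_lang` check, and hypothetical names throughout:

```python
import re

# Spellings of rdf:type accepted by this sketch.
RDF_TYPE = {
    "rdf:type",
    "a",
    "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>",
}


def derive_class_labels(triples, is_lang):
    """Collect (class, label) pairs from VRR triples with R1 = rdf:type.

    `triples` is an iterable of (subject, predicate, object) strings;
    `is_lang` decides whether a variable name is of class lang.
    """
    pairs = []
    for s, p, o in triples:
        is_vrr = (
            s.startswith("?")
            and not p.startswith("?")
            and not o.startswith(("?", '"'))
        )
        name = s[1:]
        if is_vrr and p in RDF_TYPE and is_lang(name):
            # Split camel case and replace underscores; lowercasing is
            # an assumption of this sketch.
            label = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
            pairs.append((o, label.replace("_", " ").lower()))
    return pairs
```

Because the rule looks at single triples, it applies to any graph pattern that contains such a triple, regardless of what else the query asks for.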
15. KIT – Karlsruhe Institute of Technology
Evaluation – 1 x RRV
1,366,363 triples of class RRV
549,093 cases: local_name(R2) = V
817,269 cases: local_name(R2) ≠ V
226 pairs (URI, guessed label)
54.5% correct: sufficiently similar to existing labels
14% correct: manual evaluation
9.1% correct within a given context (location for dbo:residence)
22.4% wrong (contained for dbprop:creator)
68%
16. KIT – Karlsruhe Institute of Technology
Evaluation – Any graph with VRR
80,455 triples of class VRR
549,093 cases: local_name(R2) = V
60 distinct URIs, 36 labels
25% correct: sufficiently similar to existing labels
39.975% correct: manual evaluation
35.025% wrong (scientist for dbo:SoccerPlayer)
64.975%
17. KIT – Karlsruhe Institute of Technology
Conclusions
Approach for automatically deriving labels
Acceptable precision: most derived labels matched the already existing labels (atypical datasets)
Derived variable names less specific
Derived labels for terminological entities (properties and classes), not for instances.
18. KIT – Karlsruhe Institute of Technology
References & Acknowledgements
[BTC2010] http://km.aifb.kit.edu/projects/btc-2010/
[Ell et al. 2011] Labels in the Web of Data, ISWC2011, to appear.
[SIGMA] http://sig.ma/search?q=Sidney+Bechet
[USEWOD2011] http://data.semanticweb.org/usewod/2011/challenge.html
[Corpex] http://km.aifb.kit.edu/sites/corpex/
[Vrandecic et al. 2011]
Part of this work has been carried out in the framework of the German Research Foundation (DFG) project entitled: "Entwicklung einer Virtuellen Forschungsumgebung für die Historische Bildungsforschung mit Semantischer Wiki-Technologie - Semantic MediaWiki for Collaborative Corpora Analysis" (INST 5580/1-1), in the domain of "Scientific Library Services and Information Systems" (LIS).
19. KIT – Karlsruhe Institute of Technology
THANK YOU FOR YOUR ATTENTION
20. KIT – Karlsruhe Institute of Technology
BACKUP SLIDES
21. KIT – Karlsruhe Institute of Technology
Triple pattern classes (SWDF)
22. KIT – Karlsruhe Institute of Technology