KIT – University of the State of Baden-Wuerttemberg and
National Research Center of the Helmholtz Association
INSTITUTE FOR APPLIED INFORMATICS AND FORMAL DESCRIPTION METHODS
www.kit.edu
Deriving Human-Readable Labels
from SPARQL Queries
Basil Ell, Denny Vrandečić, and Elena Simperl
7th International Conference on Semantic Systems, Graz
7 September 2011
KIT – Karlsruhe Institute of Technology
Institute for Applied Informatics and Formal Description Methods
2 31.03.2014 Basil Ell – Deriving Human-Readable Labels from SPARQL queries
Outline
Motivation
Human-readability of the LOD cloud
Method
Evaluation
Conclusions
Introduction
Entities are identified by URIs, such as
http://de.dbpedia.org/resource/Graz
http://rdf.freebase.com/ns/m.043j22x
Human-readable names can be provided, e.g., using the property rdfs:label:
dbpedia:Austria rdfs:label "Österreich"@de
Motivation – Why are labels necessary?
Scenario: linked data browsing (example from [SIGMA])
Is this meaningful to human users?
Human-Readability of the LOD Cloud
BTC2010 Corpus [BTC2010]
3,167,799,445 ntriples
159,177,123 distinct subjects
137,156,213 (86.17%) have no value for any of the properties rdfs:label, rdfs:comment, dc:title, and foaf:name.
61.8% of the analyzed non-information resources have no label (regarding 36 labeling properties) [Ell et al. 2011]
Main Idea
Can we automatically derive labels for entities by
analyzing SPARQL queries?
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?station WHERE {
  ?station rdf:type dbo:RadioStation
}
Here, the variable name station can be used as a label for http://dbpedia.org/ontology/RadioStation
Analyzed data set
USEWOD2011 corpus [USEWOD2011]
Contains log files from DBpedia and SWDF (Semantic Web Dog Food)
Distinct parsable SPARQL SELECT queries:
1,212,932 (DBpedia)
195,641 (SWDF)
Classification of variable names
Class     Description
short     String length of up to 2 characters. Common: s, p, o, x.
stop      Known non-short strings that cannot be used as labels, e.g. subject, instance, uri.
lang      A non-stop string that belongs to a natural language or that consists of separated words of a natural language, e.g. Artist and RadioStation. Checked for the languages {de, en, es, fr, it} using the [Corpex] web service. (The Corpex dataset consists of all words and their frequencies as extracted and counted from instances of Wikipedia in multiple languages. [Vrandecic et al. 2011])
nolang    Variable names that are neither short, nor stop, nor lang.
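The four classes can be sketched as a small classifier. This is an illustration only: the slides check words against the Corpex web service, whereas the sketch below assumes a tiny hard-coded word set (`KNOWN_WORDS`) and a stop list limited to the three examples above. The function names are hypothetical.

```python
import re

# Assumption: stand-ins for the stop list and the Corpex lookup.
STOP_WORDS = {"subject", "instance", "uri"}
KNOWN_WORDS = {"artist", "radio", "station", "place", "population"}


def split_words(name: str) -> list[str]:
    """Split a variable name on separators and camel-case boundaries."""
    spaced = re.sub(r"[_\-.]+", " ", name)
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", spaced)
    return [word.lower() for word in spaced.split()]


def classify_variable(name: str) -> str:
    """Return 'short', 'stop', 'lang', or 'nolang' for a variable name."""
    if len(name) <= 2:
        return "short"
    if name.lower() in STOP_WORDS:
        return "stop"
    words = split_words(name)
    # 'lang' requires every constituent word to be a known word.
    if words and all(word in KNOWN_WORDS for word in words):
        return "lang"
    return "nolang"
```

The checks run in the order of the table, so a name is only tested against the word list once it is known to be neither short nor a stop word.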
Classification of triple patterns
Triple pattern classes P = {RRV, RVR, VRL, ...}
R is a resource, V is a variable, L is a literal
Ignoring features such as UNION, OPTIONAL etc.
SELECT ... WHERE {
...
dbpedia:Karlsruhe dbo:populationTotal ?population .
...
}
The triple pattern above belongs to class RRV.
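A minimal sketch of how a triple pattern could be mapped to its class name. It assumes the pattern has already been split into three term strings; the literal test (quoted strings, numbers, booleans) is a simplification, and the function names are hypothetical.

```python
def _is_number(term: str) -> bool:
    """Return True if the term parses as a numeric literal."""
    try:
        float(term)
        return True
    except ValueError:
        return False


def term_class(term: str) -> str:
    """Map one term of a triple pattern to 'V', 'L', or 'R'."""
    if term.startswith(("?", "$")):
        return "V"  # SPARQL variable
    if term.startswith(('"', "'")) or term in ("true", "false") or _is_number(term):
        return "L"  # literal (simplified check)
    return "R"      # everything else is treated as a resource


def pattern_class(subject: str, predicate: str, obj: str) -> str:
    """Return the triple pattern class, e.g. 'RRV' or 'VRL'."""
    return term_class(subject) + term_class(predicate) + term_class(obj)
```

For the pattern above, `pattern_class("dbpedia:Karlsruhe", "dbo:populationTotal", "?population")` yields `"RRV"`.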
Classification of triple patterns (2)
[Figure: triple pattern classes (DBpedia)]
DBpedia – top query patterns (pruned, n >= 5000)
[Figure: graph pattern classes visualized as a hypergraph. Legend: n = number of instances, TP = name of triple pattern.]
Example: 8312 queries consist of one VVL triple and three VRV triples.
SWDF – top query patterns (pruned, n >= 1000)
[Figure: graph pattern classes visualized as a hypergraph. Legend: n = number of instances, TP = name of triple pattern.]
Derivation pattern 1: 1 x RRV
(31.75% of all DBpedia queries)
Assumption: V' is a human-readable label for property R2 iff local_name(R2) = V and lang(V).
V' can be derived from V by substituting separators and splitting camel-cased words into constituents.
Example:
R1: <http://dbpedia.org/page/NASA>
R2: <http://dbpedia.org/property/agencyName>
V: ?agencyName
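The rule for derivation pattern 1 can be sketched as follows. Assumptions not stated on the slide: the local name is taken as the part of the URI after the last '#' or '/', the derived label is lowercased, and the lang(V) check is passed in as a function (`is_lang`, accepting everything by default). All function names are hypothetical.

```python
import re


def local_name(uri: str) -> str:
    """Return the part of a URI after the last '#' or '/'."""
    return re.split(r"[#/]", uri.strip("<>"))[-1]


def to_label(name: str) -> str:
    """Substitute separators and split camel-cased words."""
    spaced = re.sub(r"[_\-]+", " ", name)
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", spaced)
    return spaced.lower()  # lowercasing is an assumption


def derive_property_label(r2: str, variable: str, is_lang=lambda name: True):
    """Return a label V' for property R2, or None if the rule does not apply."""
    name = variable.lstrip("?$")
    if local_name(r2) == name and is_lang(name):
        return to_label(name)
    return None
```

With the example above, `derive_property_label("<http://dbpedia.org/property/agencyName>", "?agencyName")` gives `"agency name"`.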
Derivation pattern 2: Any graph with VRR
(22.32% of all DBpedia queries)
Assumption: V' is a human-readable label for class R2 iff lang(V) and R1 = rdf:type.
Example:
?place rdf:type dbo:Location
Example of a VRR triple pattern:
V: ?paper
R1: <http://data.semanticweb.org/ns/swc/ontology#isPartOf>
R2: <http://data.semanticweb.org/conference/www/2009/proceedings>
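A minimal sketch of derivation pattern 2, assuming each triple pattern is given as a tuple of strings in which variables start with '?' and resources are full URIs without angle brackets. The lang(V) check is passed in as `is_lang`; it and the function name are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def derive_class_labels(triple_patterns, is_lang):
    """Collect (class URI, candidate label) pairs from VRR patterns with R1 = rdf:type."""
    pairs = []
    for subject, predicate, obj in triple_patterns:
        if not subject.startswith("?"):
            continue  # subject must be a variable
        if predicate != RDF_TYPE or obj.startswith("?"):
            continue  # predicate must be rdf:type, object a resource
        name = subject[1:]
        if is_lang(name):
            pairs.append((obj, name))
    return pairs
```

A VRR pattern whose predicate is not rdf:type, such as the isPartOf pattern above, yields no pair.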
Evaluation – 1 x RRV
1,366,363 triples of class RRV
549,093 cases: local_name(R2) = V
817,269 cases: local_name(R2) ≠ V
226 pairs (URI, guessed label)
54.5% correct: sufficiently similar to existing labels
14% correct: manual evaluation
9.1% correct within a given context (e.g. location for dbo:residence)
22.4% wrong (e.g. contained for dbprop:creator)
68% correct in total (the first two categories combined)
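The slide does not say how "sufficiently similar to existing labels" was measured. As an illustration only, the sketch below assumes a case-insensitive string similarity from Python's `difflib` with a hypothetical threshold of 0.8; neither is taken from the original evaluation.

```python
from difflib import SequenceMatcher


def is_similar(guessed: str, existing_labels, threshold: float = 0.8) -> bool:
    """Return True if the guessed label is close to any existing label."""
    guess = guessed.strip().lower()
    return any(
        SequenceMatcher(None, guess, label.strip().lower()).ratio() >= threshold
        for label in existing_labels
    )
```

Under these assumptions, a guess such as "agency name" matches an existing label "Agency Name", while "contained" does not match "creator".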
Evaluation – Any graph with VRR
80,455 triples of class VRR
549,093 cases: local_name(R2) = V
60 distinct URIs, 36 labels
25% correct: sufficiently similar to existing labels
39.975% correct: manual evaluation
35.025% wrong (e.g. scientist for dbo:SoccerPlayer)
64.975% correct in total (25% + 39.975%)
Conclusions
Approach for automatically deriving labels
Acceptable precision: most derived labels matched the already existing labels (atypical datasets)
Derived variable names are less specific
Labels were derived for terminological entities (properties and classes), not for instances.
References & Acknowledgements
[BTC2010] http://km.aifb.kit.edu/projects/btc-2010/
[Ell et al. 2011] Labels in the Web of Data, ISWC2011, to appear.
[SIGMA] http://sig.ma/search?q=Sidney+Bechet
[USEWOD2011] http://data.semanticweb.org/usewod/2011/challenge.html
[Corpex] http://km.aifb.kit.edu/sites/corpex/
[Vrandecic et al. 2011]
Part of this work has been carried out in the framework of the German Research Foundation (DFG) project entitled: "Entwicklung einer Virtuellen Forschungsumgebung für die Historische Bildungsforschung mit Semantischer Wiki-Technologie - Semantic MediaWiki for Collaborative Corpora Analysis" (INST 5580/1-1), in the domain of "Scientific Library Services and Information Systems" (LIS).
THANK YOU FOR YOUR ATTENTION
BACKUP SLIDES
Triple pattern classes (SWDF)