Poster - Completeness Statements about RDF Data Sources and Their Use for Query Answering

Thousands of RDF data sources are today available on the Web.
Machine-readable qualitative descriptions of their content are crucial.
We focus on data completeness, an important aspect of data quality.
How can completeness information about RDF data sources be formalized and expressed in a machine-readable way?
How can such completeness information be leveraged?
We contribute a formal framework for expressing completeness information,
and a study of how query completeness can be determined from completeness information in various settings.

Fariz Darari, joint work with Werner Nutt, Giuseppe Pirrò, and Simon Razniewski
KRDB, Free University of Bozen-Bolzano, Italy

Context

Thousands of RDF data sources are today available on the Web. Machine-readable qualitative descriptions of their content are crucial. We focus on data completeness, an important aspect of data quality.

Problem

How to formalize and express in a machine-readable way completeness information about RDF data sources? How to leverage such completeness information?

Contributions

  1. Formal framework for expressing completeness information.
  2. Study of query completeness from completeness information in various settings.

Completeness statement on the Web

Users visiting this source can prefer it to other sources. However, the completeness statement verified as complete is only human-readable!

Completeness statement on the Semantic Web

    lv:lmdbdataset rdf:type void:Dataset.
    lv:lmdbdataset c:hasComplStmt lv:st1.
    lv:st1 c:hasPattern [c:subject [spin:varName "m"]; c:predicate schema:actor; c:object [spin:varName "a"]].
    lv:st1 c:hasCondition [c:subject [spin:varName "m"]; c:predicate rdf:type; c:object schema:Movie].
    lv:st1 c:hasCondition [c:subject [spin:varName "m"]; c:predicate schema:director; c:object dbp:Tarantino].

Semantics of completeness statements

For each completeness statement, all the triple patterns defined via hasPattern are collected into a set P1 and all the triple patterns defined via hasCondition are collected into a set P2. A completeness statement is interpreted as:

    CONSTRUCT {P1} WHERE {P1 . P2}

When a data source has a completeness statement (defined via hasComplStmt), it means that if the query above is evaluated over an “ideal” graph, then all the results are in the data source.

Checking query completeness

Given a query Q and a data source with completeness statements S:

  1. Create a template answer graph GQ of Q.
  2. Over GQ, evaluate all CONSTRUCT queries derived from S.
  3. Check whether GQ can be obtained after the evaluation.

If yes, the query is complete; otherwise, it might be incomplete.

Query completeness in a single data source scenario

Prefixes:

    @prefix c: <http://inf.unibz.it/ontologies/completeness#> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix spin: <http://spinrdf.org/sp#> .
    @prefix void: <http://rdfs.org/ns/void#> .
    @prefix dv: <http://dbpedia.org/void/> .
    @prefix lv: <http://linkedmdb.org/void/> .
    @prefix dbp: <http://dbpedia.org/resource/> .
    @prefix schema: <http://schema.org/> .

Query Q: Select all the movies for which Tarantino is the director and also an actor.

    SELECT ?m
    WHERE { ?m rdf:type schema:Movie .
            ?m schema:director dbp:Tarantino .
            ?m schema:actor dbp:Tarantino }

DBpedia SPARQL endpoint (Endpoint IRI: DBPe). DBpedia is complete for all Tarantino’s movies. For Q, the answer may be incomplete.

    dv:dbpdataset rdf:type void:Dataset.
    dv:dbpdataset c:hasComplStmt dv:st1.
    dv:st1 c:hasPattern [c:subject [spin:varName "m"]; c:predicate rdf:type; c:object schema:Movie].
    dv:st1 c:hasPattern [c:subject [spin:varName "m"]; c:predicate schema:director; c:object dbp:Tarantino].

LinkedMDB SPARQL endpoint (Endpoint IRI: LMDBe). LinkedMDB is complete for all Tarantino’s movies and also their actors. For Q, the answer is complete.

    lv:lmdbdataset rdf:type void:Dataset.
    lv:lmdbdataset c:hasComplStmt lv:st1.
    lv:st1 c:hasPattern [c:subject [spin:varName "m"]; c:predicate rdf:type; c:object schema:Movie].
    lv:st1 c:hasPattern [c:subject [spin:varName "m"]; c:predicate schema:director; c:object dbp:Tarantino].
    lv:lmdbdataset c:hasComplStmt lv:st2.
    lv:st2 c:hasPattern [c:subject [spin:varName "m"]; c:predicate schema:actor; c:object [spin:varName "a"]].
    lv:st2 c:hasCondition [c:subject [spin:varName "m"]; c:predicate rdf:type; c:object schema:Movie].
    lv:st2 c:hasCondition [c:subject [spin:varName "m"]; c:predicate schema:director; c:object dbp:Tarantino].

Why is DBpedia not complete for the query?

The completeness statement in DBpedia says that it is complete for Tarantino’s movies (dv:st1). However, the query asks about all movies for which Tarantino is the director and also an actor. It is not stated that DBpedia includes all the actors of Tarantino’s movies. Therefore, DBpedia is possibly not complete for this query.

Why is LinkedMDB complete?

The completeness statements in LMDB say that it is complete for Tarantino’s movies (lv:st1) and also their actors (lv:st2).

Extensions

  • SPARQL queries with OPT
  • Completeness with RDFS inference
  • Federated query completeness

Work In Progress

  • SPARQL queries with negations and comparisons
  • Live, Web-based CoRner
  • Empirical evaluation of query completeness checking

Implementation

CoRner: Completeness Reasoner
http://rdfcorner.wordpress.com
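
The three checking steps from the poster can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not CoRner's actual implementation: queries are plain basic graph patterns, terms are strings, variables start with "?", and the helper names (freeze, match, construct, is_complete) are hypothetical. The "template answer graph" GQ is assumed here to be the query with each variable replaced by a fresh constant.

```python
# Illustrative sketch only; all function names are hypothetical.
# A triple (pattern) is a 3-tuple of strings; variables start with "?".

def is_var(term):
    return term.startswith("?")

def freeze(bgp):
    """Step 1: build the template answer graph GQ by turning each
    variable ?v into a fresh constant ~v (an assumed encoding)."""
    return {tuple("~" + t[1:] if is_var(t) else t for t in triple)
            for triple in bgp}

def match(patterns, graph, binding=None):
    """Yield every variable binding under which all patterns occur in graph."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)
        ok = True
        for p_term, g_term in zip(first, triple):
            if is_var(p_term):
                # Bind the variable, or reject if it is already bound differently.
                if candidate.setdefault(p_term, g_term) != g_term:
                    ok = False
                    break
            elif p_term != g_term:
                ok = False
                break
        if ok:
            yield from match(rest, graph, candidate)

def construct(pattern, condition, graph):
    """Evaluate CONSTRUCT {P1} WHERE {P1 . P2} over graph."""
    result = set()
    for binding in match(list(pattern) + list(condition), graph):
        for triple in pattern:
            result.add(tuple(binding.get(t, t) for t in triple))
    return result

def is_complete(query_bgp, statements):
    """Steps 2 and 3: evaluate all statements over GQ and test whether
    GQ is recovered. Constructed triples always come from GQ, so it is
    enough to test that GQ is a subset of what was constructed."""
    gq = freeze(query_bgp)
    derived = set()
    for pattern, condition in statements:
        derived |= construct(pattern, condition, gq)
    return gq <= derived

# The poster's example, with prefixed names kept as plain strings.
MOVIE = ("?m", "rdf:type", "schema:Movie")
DIRECTED = ("?m", "schema:director", "dbp:Tarantino")

QUERY = [MOVIE, DIRECTED, ("?m", "schema:actor", "dbp:Tarantino")]

# Each statement is a pair (hasPattern triples, hasCondition triples).
DBPEDIA_STATEMENTS = [([MOVIE, DIRECTED], [])]                # dv:st1
LMDB_STATEMENTS = [
    ([MOVIE, DIRECTED], []),                                  # lv:st1
    ([("?m", "schema:actor", "?a")], [MOVIE, DIRECTED]),      # lv:st2
]

print("DBpedia:", is_complete(QUERY, DBPEDIA_STATEMENTS))
print("LinkedMDB:", is_complete(QUERY, LMDB_STATEMENTS))
```

With the DBpedia statement alone, the actor triple of GQ is never constructed, so the check fails. With both LinkedMDB statements, every triple of GQ is recovered and the query is reported as complete, matching the poster's explanation.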
