Using SPARQL and SPIN for Data Quality Management on the Semantic Web
Published in: Technology, Business

Transcript

  • 1. Using SPARQL and SPIN for Data Quality Management on the Semantic Web Christian Fürber / Martin Hepp christian@fuerber.com, mhepp@computer.org Presentation @ BIS May 4th 2010
  • 2. Vision of the Semantic Web Publishing data on the web in a meaningful way for more automation, better integration, and higher reusability of data. © Hanspeter Graf / www.pixelio.de C. Fürber, M. Hepp: Using SPARQL and SPIN for Data Quality Management on the Semantic Web 2
  • 3. Growth of Data: Well on Track… Retrieving information. Building smart SemWeb apps. Reference: http://www4.wiwiss.fu-berlin.de/bizer/pub/lod-datasets_2009-03-05.html
  • 4. …but what if the published data was of poor quality? Get a giant camcorder from Amazon!
  • 5. Using Poor Data is Costly. Without quality checks, your SemWeb apps will take this data seriously and get an oversized shipping package with expensive postage, and waste transportation capacity.
  • 6. Is there any way to avoid data quality disasters? Yes, if we know about data quality problems before anything bad happens! (Image: a giant camcorder on the road!)
  • 7. The Impact of Poor Data Quality
      - Higher Costs
      - Missed Revenues
      - Poor Decisions
      - Lower Product / Service Quality
      - Failed Business Processes
      - Failed Projects
      - Lower Stakeholder Satisfaction
      - Fatal Disasters
  • 8. Data Quality is a Key Bottleneck of the Semantic Web
      <vocab:location rdf:about="http://www.stockdbdemo2.com/stockdblocation/1">
        <vocab:location_ZIP></vocab:location_ZIP>
        <vocab:location_STREETNO></vocab:location_STREETNO>
        <vocab:location_COUNTRY>France</vocab:location_COUNTRY>
        <vocab:location_ID rdf:datatype="http://www.w3.org/2001/XMLSchema#int">1</vocab:location_ID>
        <vocab:location_STREET>8489 Strong St.</vocab:location_STREET>
        <vocab:location_STATE>NV</vocab:location_STATE>
        <rdfs:label>location #1</rdfs:label>
        <vocab:location_CITY>Las Vegas</vocab:location_CITY>
      </vocab:location>
      Problems annotated on the slide: unique value violation, missing literal values, functional dependency violation, syntax violation.
  • 9. Our Approach
      Identification of data quality problems on instance level of Semantic Web sources solely with Semantic Web technologies.
      - Integration advantages
      - Access to SemWeb data may be useful for DQM.
      Slide background (RDF/XML location example):
      <vocab:location rdf:about="http://www.stockdbdemo2.com/stockdblocation/1">
        <vocab:location_ZIP></vocab:location_ZIP>
        <vocab:location_STREETNO></vocab:location_STREETNO>
        <vocab:location_COUNTRY>France</vocab:location_COUNTRY>
        <vocab:location_ID rdf:datatype="http://www.w3.org/2001/XMLSchema#int">1</vocab:location_ID>
        <vocab:location_STREET>8489 Strong St.</vocab:location_STREET>
        <vocab:location_STATE>NV</vocab:location_STATE>
        <rdfs:label>location #1</rdfs:label>
        <vocab:location_CITY>Las Vegas</vocab:location_CITY>
      </vocab:location>
  • 10. Proposed Architecture
      - Query Layer: SPARQL + SPIN
      - Ontology Layer: Domain Ontology, SPIN, OBDQM
      - Data Sources Layer: RDB, Knowledge Base, Linked Data Cloud
  • 11. Defining Data Quality Rules with SPARQL (1)
      - Define what is allowed and negate it.
      - Define what is not allowed.
      Negations and regular expressions save manual effort.
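The two rule styles on this slide can be sketched outside SPARQL. The following is a plain-Python analogue, not SPARQL or SPIN; the sample values, the pattern, and the names `ALLOWED` and `FORBIDDEN` are assumptions made up for illustration.

```python
import re

# Hypothetical sample values for a country property (not from the slides).
values = ["France", "USA", "U$A", ""]

# Style 1: describe what is allowed, then negate it to find violations.
# Assumed rule: one or more letters, dots, or spaces.
ALLOWED = re.compile(r"[A-Za-z. ]+")
violations_by_negation = [v for v in values if not ALLOWED.fullmatch(v)]

# Style 2: enumerate what is not allowed directly.
# Every bad value has to be listed by hand.
FORBIDDEN = {"U$A", ""}
violations_by_listing = [v for v in values if v in FORBIDDEN]

print(violations_by_negation)
print(violations_by_listing)
```

Both lists agree on this sample, but only the negated pattern would also catch bad values nobody thought to list, which is one reading of why negation and regular expressions save manual effort.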
  • 12. Defining Data Quality Rules with SPARQL (2)
      The city "Las Vegas" must be in the country "USA".
      # Checking functional dependency of {?arg4} with {?arg2}
      CONSTRUCT {
          _:b0 a spin:ConstraintViolation .
          _:b0 spin:violationRoot ?this .
          _:b0 spin:violationPath vocab:location_COUNTRY .
      }
      WHERE {
          ?this vocab:location_CITY "Las Vegas" .
          FILTER (!spl:hasValue(?this, vocab:location_COUNTRY, "USA")) .
      }
  • 13. Defining Data Quality Rules with SPARQL (3)
      High reusability of data quality rules through SPIN's SPARQL query templates.
      # Checking functional dependency of {?arg4} with {?arg2}
      CONSTRUCT {
          _:b0 a spin:ConstraintViolation .
          _:b0 spin:violationRoot ?this .
          _:b0 spin:violationPath ?arg3 .
      }
      WHERE {
          ?this ?arg1 ?arg2 .
          FILTER (!spl:hasValue(?this, ?arg3, ?arg4)) .
      }
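To make the template's logic concrete, here is a minimal plain-Python sketch of the same check over an in-memory list of triples. It is an analogue only and does not run SPARQL or SPIN. The `Triple` type, the function name, and the second sample location are assumptions; the parameter names mirror `?arg1` to `?arg4`, and `spl:hasValue` is approximated as a simple "this exact triple exists" test.

```python
from typing import Iterable, List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def functional_dependency_violations(
    triples: Iterable[Triple], arg1: str, arg2: str, arg3: str, arg4: str
) -> List[str]:
    """Return subjects that have (arg1, arg2) but lack (arg3, arg4)."""
    triples = list(triples)
    facts = set(triples)
    violating = {
        t.subject
        for t in triples
        # WHERE { ?this ?arg1 ?arg2 . ... }
        if t.predicate == arg1 and t.obj == arg2
        # FILTER (!hasValue(?this, ?arg3, ?arg4)), approximated as set lookup
        and Triple(t.subject, arg3, arg4) not in facts
    }
    return sorted(violating)


# Sample data: location/1 follows the slides, location/2 is hypothetical.
sample = [
    Triple("location/1", "location_CITY", "Las Vegas"),
    Triple("location/1", "location_COUNTRY", "France"),
    Triple("location/2", "location_CITY", "Las Vegas"),
    Triple("location/2", "location_COUNTRY", "USA"),
]

violations = functional_dependency_violations(
    sample, "location_CITY", "Las Vegas", "location_COUNTRY", "USA"
)
print(violations)
```

Reusing the rule for another dependency only requires different arguments, which is the kind of reuse the slide attributes to templates.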
  • 14. Enforced DQ-Rules with SPIN. Application: http://www.topquadrant.com/products/TB_Composer.html#free
  • 15. More Data Quality Rule Templates (1)
      Data Quality Problem | SPARQL Query Template
      Missing literal values | ASK WHERE { ?this ?arg1 "" . }
      Out of range value (lower limit) | ASK WHERE { ?this ?arg1 ?value . FILTER (?value < ?arg2) . }
      Out of range value (upper limit) | ASK WHERE { ?this ?arg1 ?value . FILTER (?value > ?arg2) . }
      (Diagram: Global Ontology, RDB, RDB, Knowledge Base)
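The three templates on this slide reduce to simple comparisons. Below is a plain-Python sketch over (subject, predicate, value) tuples. It is an analogue, not SPARQL: where the `ASK` templates answer yes or no for a given `?this`, these assumed helper functions return every offending subject. The sample data and the limits 0 and 999 are hypothetical.

```python
# Hypothetical sample data as (subject, predicate, value) tuples.
triples = [
    ("location/1", "location_ZIP", ""),
    ("location/1", "location_ID", 1),
    ("location/2", "location_ZIP", "89101"),
    ("location/2", "location_ID", -5),
    ("location/3", "location_ID", 1000),
]


def missing_literal(data, arg1):
    """Subjects whose arg1 value is the empty string."""
    return sorted(s for s, p, v in data if p == arg1 and v == "")


def below_lower_limit(data, arg1, arg2):
    """Subjects whose arg1 value is smaller than the lower limit arg2."""
    return sorted(s for s, p, v in data if p == arg1 and v < arg2)


def above_upper_limit(data, arg1, arg2):
    """Subjects whose arg1 value is larger than the upper limit arg2."""
    return sorted(s for s, p, v in data if p == arg1 and v > arg2)


print(missing_literal(triples, "location_ZIP"))
print(below_lower_limit(triples, "location_ID", 0))
print(above_upper_limit(triples, "location_ID", 999))
```

Note that the missing-value check only finds properties that are present with an empty literal, as in the slide's template; a property that is absent altogether would need a separate rule.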
  • 16. More Data Quality Rule Templates (2)
      Data Quality Problem | SPARQL Query Template
      Syntax violation (only letters and dots allowed) | ASK WHERE { ?this ?arg1 ?value . FILTER (!regex(str(?value), "^([A-Za-z,. ])*$")) }
      Unique value violation | CONSTRUCT { _:b0 a spin:ConstraintViolation . _:b0 spin:violationRoot ?a . _:b0 spin:violationPath ?arg1 . } WHERE { ?a ?arg1 ?uniqueValue . ?b ?arg1 ?uniqueValue . FILTER (?a != ?b) }
      (Diagram: Global Ontology, RDB, RDB, Knowledge Base)
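A plain-Python sketch of these two checks follows. It is an analogue, not SPARQL or SPIN; the function names and all sample rows except the street of location/1 are assumptions. The character class is copied from the slide's pattern, which also admits commas and spaces, and `fullmatch` stands in for the `^…$` anchors.

```python
import re
from collections import defaultdict

# Same character class as the slide's pattern "^([A-Za-z,. ])*$".
LETTERS_ONLY = re.compile(r"[A-Za-z,. ]*")

# Hypothetical sample data as (subject, predicate, value) tuples.
triples = [
    ("location/1", "location_STREET", "8489 Strong St."),
    ("location/2", "location_STREET", "Main St."),
    ("location/1", "location_ID", "1"),
    ("location/2", "location_ID", "1"),
    ("location/3", "location_ID", "2"),
]


def syntax_violations(data, arg1):
    """Subjects whose arg1 value contains characters outside the class."""
    return sorted(
        s for s, p, v in data
        if p == arg1 and not LETTERS_ONLY.fullmatch(str(v))
    )


def unique_value_violations(data, arg1):
    """Subjects sharing an arg1 value with a different subject (?a != ?b)."""
    subjects_by_value = defaultdict(set)
    for s, p, v in data:
        if p == arg1:
            subjects_by_value[v].add(s)
    violating = set()
    for subjects in subjects_by_value.values():
        if len(subjects) > 1:
            violating |= subjects
    return sorted(violating)


print(syntax_violations(triples, "location_STREET"))
print(unique_value_violations(triples, "location_ID"))
```

Grouping subjects by value avoids comparing every pair of resources, which is one possible way to address the scalability concern for the uniqueness rule; the slides do not say how the query is evaluated.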
  • 17. Contributions
      - Domain-independent SPARQL query templates for data quality problem identification
      - Queries are highly reusable
      - Architecture enables the use of Linked Data
      - Methodology for data quality management of Semantic Web data
      - First approach on how to apply SPIN for DQM
  • 18. Limitations & Open Issues
      - Knowing the problem does not mean we can solve it
      - Homonym / Synonym handling
      - Incomplete knowledge may cause constraint violations of clean instances
      - Current approach focuses on literal values
      - Scalability on large data sets
  • 19. Ongoing Extensions
      - Extension to a broader set of data quality problems
      - Enabling synonym handling and homonym tolerance
      - Enhancement of performance
      - Calculation of information quality scores
      - Integration of Linked Data as trusted reference for data quality management
      - Evaluation of the quality of popular Semantic Web data sets on instance level (e.g. Geonames & DBpedia)
      - Extension for (semi-)automated data cleansing
  • 20. Christian Fuerber, Researcher, E-Business & Web Science Research Group, Werner-Heisenberg-Weg 39, 85577 Neubiberg, Germany
      skype: c.fuerber
      email: christian@fuerber.com
      web: http://www.unibw.de/ebusiness
      homepage: http://www.fuerber.com
      Paper is available at http://bit.ly/bYes0V
  • 21. References & Links
      - LOD-Cloud: http://www4.wiwiss.fu-berlin.de/bizer/pub/lod-datasets_2009-03-05.html
      - D2RQ: http://www4.wiwiss.fu-berlin.de/bizer/d2rq/spec/
      - SPIN: http://spinrdf.org/
      - TopBraid Composer Free Edition: http://www.topquadrant.com/products/TB_Composer.html#free
