2016.02 - Validating RDF Data Quality using Constraints to Direct the Development of Constraint Languages (ICSC 2016)

For research institutes, data libraries, and data
archives, RDF data validation according to predefined constraints
is a much sought-after feature, particularly as this is taken
for granted in the XML world. Based on our work in the
DCMI RDF Application Profiles Task Group and in cooperation
with the W3C Data Shapes Working Group, we have so far identified
and published 81 types of constraints that are required
by various stakeholders for data applications. In this paper,
in collaboration with several domain experts, we formulate 115
constraints on three different vocabularies (DDI-RDF, QB, and
SKOS) and classify them according to (1) the severity of an
occurring violation and (2) the complexity of the constraint
expression in common constraint languages. We evaluate the
data quality of 15,694 data sets (4.26 billion triples) of research
data for the social, behavioral, and economic sciences obtained
from 33 SPARQL endpoints. Based on the results, we formulate
several findings to direct the further development of constraint
languages.
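
To make the notions of constraint, violation, and severity concrete, the following is a minimal sketch in Python, not the validation environment used in the paper. It checks one minimum-cardinality constraint over in-memory triples. The class `MinCountConstraint`, the function `violations`, and the toy data are hypothetical names introduced for illustration; the three severity levels are the ones named in the slides.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    """Severity levels of a constraint violation, as named in the slides."""
    INFORMATIONAL = 1
    WARNING = 2
    ERROR = 3


@dataclass(frozen=True)
class MinCountConstraint:
    """Hypothetical constraint: each instance of target_class needs
    at least min_count values for predicate."""
    target_class: str
    predicate: str
    min_count: int
    severity: Severity


def violations(triples, constraint):
    """Return the sorted subjects of the target class that break the constraint."""
    instances = {s for s, p, o in triples
                 if p == "rdf:type" and o == constraint.target_class}
    failing = []
    for subject in sorted(instances):
        count = sum(1 for s, p, _ in triples
                    if s == subject and p == constraint.predicate)
        if count < constraint.min_count:
            failing.append(subject)
    return failing


# Hypothetical toy data: two books, only one of which has an author.
TRIPLES = [
    (":book1", "rdf:type", ":Book"),
    (":book1", ":author", ":alice"),
    (":book2", "rdf:type", ":Book"),
]

constraint = MinCountConstraint(":Book", ":author", 1, Severity.ERROR)
for subject in violations(TRIPLES, constraint):
    print(f"{constraint.severity.name}: {subject} lacks {constraint.predicate}")
```

Attaching the severity to the constraint rather than to each violation mirrors the classification described above, where every constraint is assigned one severity level.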

  1. Validating RDF Data Quality using Constraints to Direct the Development of Constraint Languages. Thomas Hartmann, Benjamin Zapilko, Joachim Wackerow, Kai Eckert. International Conference on Semantic Systems (ICSC 2016)
  2. XML Validation
  3. <!ELEMENT library (book+, author*)>
     <!ELEMENT book (isbn, title, author-ref+)>
     <!ATTLIST book id ID #REQUIRED>
     <!ELEMENT author-ref EMPTY>
     <!ATTLIST author-ref id IDREF #REQUIRED>
     <!ELEMENT author (name)>
     <!ATTLIST author id ID #REQUIRED>
     <!ELEMENT isbn (#PCDATA)>
     <!ELEMENT title (#PCDATA)>
     <!ELEMENT name (#PCDATA)>
  4. RDF Validation Workshop
  5. Working Groups on RDF Validation
        • W3C Data Shapes Working Group
        • DCMI RDF Application Profiles Task Group
  6. 81 Types of Constraints on RDF Data: http://purl.org/net/rdf-validation
  7. Constraint Languages
  8. SPARQL Query Language for RDF
        SELECT ?concept WHERE {
          ?concept a [ rdfs:subClassOf* skos:Concept ] .
          FILTER NOT EXISTS {
            ?concept ?p ?o .
            FILTER ( ?p IN ( skos:related, skos:relatedMatch, skos:broader, ... ) ) .
          }
        }
  9. SPARQL Inferencing Notation (SPIN)
        # FILTER NOT EXISTS { ?book author ?person }
        [ a sp:Filter ;
          sp:expression [
            a sp:notExists ;
            sp:elements (
              [ sp:subject [ sp:varName "book" ] ;
                sp:predicate author ;
                sp:object [ sp:varName "person" ] ] ) ] ] )
  10. Web Ontology Language (OWL)
        :Publication rdfs:subClassOf [
          a owl:Restriction ;
          owl:onProperty :author ;
          owl:allValuesFrom :Person ] .
  11. Shape Expressions (ShEx)
        :Publication {
          ( :isbn xsd:string, :title xsd:string )
          | ( :issn xsd:string, :title xsd:string )
        }
  12. Resource Shapes (ReSh)
        :Computer-Science-Book a oslc:ResourceShape ;
          oslc:property [
            oslc:propertyDefinition :subject ;
            oslc:allowedValues [
              oslc:allowedValue "Computer Science" , "Informatics" , "Information Technology" ] ] .
  13. Description Set Profiles (DSP)
        [ a dsp:DescriptionTemplate ;
          dsp:resourceClass :Science-Fiction-Book ;
          dsp:statementTemplate [
            dsp:property :subject ;
            dsp:nonLiteralConstraint [
              dsp:valueClass skos:Concept ;
              dsp:valueURI :Science-Fiction, :Sci-Fi, :SF ;
              dsp:vocabularyEncodingScheme :Science-Fiction-Book-Subjects ;
            ] ] ] .
  14. Shapes Constraint Language (SHACL)
        :BookShape a sh:Shape ;
          sh:scopeClass :Book ;
          sh:property [
            sh:predicate :author ;
            sh:valueShape :PersonShape ;
            sh:minCount 1 ;
          ] .
  15. RDF Validation Environment: http://purl.org/net/rdfval-demo
  16. Constraint Types Classification
        1. RDFS/OWL Based
        2. Constraint Language Based
        3. SPARQL Based
  17. RDFS/OWL Based
        :Publication rdfs:subClassOf [
          a owl:Restriction ;
          owl:onProperty :author ;
          owl:allValuesFrom :Person ] .
  18. Constraint Language Based
        :Publication {
          ( :isbn xsd:string, :title xsd:string )
          | ( :issn xsd:string, :title xsd:string )
        }
  19. SPARQL Based
        SELECT ?concept WHERE {
          ?concept a [ rdfs:subClassOf* skos:Concept ] .
          FILTER NOT EXISTS {
            ?concept ?p ?o .
            FILTER ( ?p IN ( skos:related, skos:relatedMatch, skos:broader, ... ) ) .
          }
        }
  20. Constraints Classification
        1. Informational
        2. Warning
        3. Error
  21. Evaluation Setup
        • 115 constraints from vocabularies and experts
        • constraints classified and implemented
        • on 3 vocabularies in the SBE sciences
            – well-established vocabularies (QB, SKOS)
            – vocabulary under development (DDI-RDF)
  22. Validated Data Sets
        Vocabulary   Data Sets   Triples
        QB           9,990       3,775,983,610
        SKOS         4,178       477,737,281
        DDI-RDF      1,526       9,673,055
        Total        15,694      4.26 billion
        33 SPARQL Endpoints
  23. Finding 1
                     C [%]   CV [%]
        SPARQL       63.2    78.2
        CL           34.7    21.8
        RDFS/OWL     35.6    21.8
        C (constraints), CV (constraint violations)
  24. Finding 2
                     C [%]   CV [%]
        SPARQL       63.2    78.2
        CL           34.7    21.8
        RDFS/OWL     35.6    21.8
        C (constraints), CV (constraint violations)
  25. Finding 3
                     C [%]   CV [%]
        Info         42.3    31.3
        Warning      18.7    62.7
        Error        39.0    6.1
        C (constraints), CV (constraint violations)
  26. Limitations
        • 3 Vocabularies
        • 1 Domain
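
The totals on the "Validated Data Sets" slide can be re-derived from the per-vocabulary rows. The short Python sketch below does only that arithmetic; the per-vocabulary figures are copied from the slide, while the names `DATA_SETS` and `totals` are made up for this example.

```python
# Per-vocabulary figures copied from the "Validated Data Sets" slide:
# vocabulary -> (number of data sets, number of triples).
DATA_SETS = {
    "QB": (9_990, 3_775_983_610),
    "SKOS": (4_178, 477_737_281),
    "DDI-RDF": (1_526, 9_673_055),
}


def totals(rows):
    """Return (total data sets, total triples) summed over all vocabularies."""
    data_sets = sum(n for n, _ in rows.values())
    triples = sum(t for _, t in rows.values())
    return data_sets, triples


total_data_sets, total_triples = totals(DATA_SETS)
print(total_data_sets)                         # 15694
print(f"{total_triples / 1e9:.2f} billion")    # 4.26 billion
```

The sums match the "Total" row: 15,694 data sets and roughly 4.26 billion triples, of which QB alone contributes about 3.78 billion.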