
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

In this thesis, a validation framework is introduced that makes it possible to consistently execute RDF-based constraint languages on RDF data and to formulate constraints of any type. The framework reduces the representation of constraints to the absolute minimum, is based on formal logics, and consists of a small, lightweight vocabulary. It ensures consistent validation results and enables constraint transformations for each constraint type across RDF-based constraint languages.

Published in: Technology

  1. 1. KIT – The Research University in the Helmholtz Association (www.kit.edu). Validation Framework for RDF-based Constraint Languages. M.Sc. (TUM) Thomas Hartmann. Professor Dr. York Sure-Vetter, Professor Dr. Kai Eckert (Stuttgart Media University), Professor Dr. Rudi Studer, Professor Dr. Andreas Geyer-Schulz. Disputation, 08.07.2016
  2. 2. 2 enthusiasm for SW technologies problem statement
  3. 3. 3 common need for RDF Validation problem statement
  4. 4. 4 problem statement: common needs of data practitioners. 2013: W3C RDF Validation Workshop. 2014: 2 international working groups on RDF validation (W3C RDF Data Shapes Working Group, DCMI RDF Application Profiles Task Group). Constraint languages: SPARQL Query Language for RDF, SPARQL Inferencing Notation (SPIN), Web Ontology Language (OWL), Shape Expressions (ShEx), Resource Shapes (ReSh), Description Set Profiles (DSP), Shapes Constraint Language (SHACL). None of these languages meets all requirements. RDF validation as research field.
  5. 5. 5 Resource Description Framework (RDF) problem statement
  6. 6. 6 constraints of running example problem statement
  7. 7. 7 constraints of running example problem statement
  8. 8. 8 constraints of running example problem statement
  9. 9. 9 constraints of running example problem statement
  10. 10. 10 constraints of running example problem statement
  11. 11. 11 provide a basis for continued research RDF validation development of constraint languages further development of constraint languages based on commonly approved requirements incorporate the findings into the working groups thesis objectives thesis objectives
  12. 12. www.kit.edu 12 5 research questions
  13. 13. 13 Which types of research data and related metadata are not yet representable in RDF and how to adequately model them to be able to validate RDF data against constraints extractable from these vocabularies? research question 1 RQ1 IASSIST Quarterly, 38(4) & 39(1), 7-16 IASSIST Quarterly, 38(4) & 39(1), 17-24 IASSIST Quarterly, 38(4) & 39(1), 25-37 IASSIST Quarterly, 38(4) & 39(1), 38-46 LDOW (WWW 2013) SemStats (ISWC 2013) DC 2012 ESWC 2011 (Poster) DDI Moving Forward Project RDF Vocabularies Working Group
  14. 14. 14 How to directly validate XML data on semantically rich OWL axioms using common RDF validation tools when XML Schemas, adequately representing particular domains, have already been designed? research question 2 RQ2 IJMSO, 8(3) ISWC 2012 ICITST 2011 OCAS (ISWC 2011)
  15. 15. www.kit.edu 15 research question 3
  16. 16. 16 http://purl.org/net/rdf-validation DC 2014RQ3
  17. 17. 17RQ3
  18. 18. 18RQ3
  19. 19. 19RQ3
  20. 20. 20RQ3
  21. 21. 21 Which types of constraints must be expressible by constraint languages to meet all collaboratively and comprehensively identified requirements to formulate constraints and validate RDF data? research question 3 RQ3
  22. 22. 22 a constraint is instantiated from a constraint type each constraint type corresponds to a requirement 81 constraint types types of constraints on RDF data RQ3
  23. 23. www.kit.edu 23 research question 4
  24. 24. 24 minimum qualified cardinality restrictions (R-75) ShEx: :Book { :author @:Person{1, } } ReSh: :Book a rs:ResourceShape ; rs:property [ rs:propertyDefinition :author ; rs:valueShape :Person ; rs:occurs rs:One-or-many ; ] . SHACL: :BookShape a sh:Shape ; sh:scopeClass :Book ; sh:property [ sh:predicate :author ; sh:valueShape :PersonShape ; sh:minCount 1 ; ] . :PersonShape a sh:Shape ; sh:scopeClass :Person . RQ4
  25. 25. 25 SPARQL and SPIN: CONSTRUCT { [ a spin:ConstraintViolation ... . ] } WHERE { ?subject a ?C1 ; ?predicate ?object . BIND ( qualifiedCardinality( ?subject, ?predicate, ?C2 ) AS ?c ). BIND( STRDT ( STR ( ?c ), xsd:nonNegativeInteger ) AS ?cardinality ) . FILTER ( ?cardinality < ?minimumCardinality ) . FILTER ( ?minimumCardinality = 1 ) . FILTER ( ?C1 = :Book ) . FILTER ( ?C2 = :Person ) . FILTER ( ?predicate = :author ) . } SELECT ( COUNT ( ?arg1 ) AS ?c ) WHERE { ?arg1 ?arg2 ?object . ?object a ?arg3 . } RQ4 minimum qualified cardinality restrictions (R-75)
  26. 26. 26 minimum qualified cardinality restrictions (R-75) OWL: :Book rdfs:subClassOf [ a owl:Restriction ; owl:minQualifiedCardinality 1 ; owl:onProperty :author ; owl:onClass :Person ] . DSP: [ dsp:resourceClass :Book ; dsp:statementTemplate [ dsp:minOccur 1 ; dsp:property :author ; dsp:nonLiteralConstraint [ dsp:valueClass :Person ] ] ] . RQ4
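The slides above express the same minimum qualified cardinality restriction (R-75) in several constraint languages. As a rough illustration of what such a check does, here is a minimal Python sketch over plain triples. It is not the thesis implementation (which maps constraints to SPIN/SPARQL); the function name, the tuple-based triple format, and the sample resources `:Untitled-Draft` and `:Unknown` are assumptions made for this example. The check is closed-world and performs no reasoning.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"


def min_qualified_cardinality_violations(triples, subject_class, predicate,
                                         value_class, minimum):
    """Return subjects of subject_class that have fewer than `minimum`
    values for `predicate` that are explicitly typed as value_class."""
    # Collect the explicitly asserted classes of every resource.
    types = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)

    violations = []
    subjects = sorted(s for s, classes in types.items() if subject_class in classes)
    for subject in subjects:
        # Count only values that carry the required class (the "qualified" part).
        qualified = {
            o for s, p, o in triples
            if s == subject and p == predicate and value_class in types.get(o, set())
        }
        if len(qualified) < minimum:
            violations.append(subject)
    return violations


# Hypothetical sample data: one book with a typed author, one without.
triples = [
    (":Huckleberry-Finn", RDF_TYPE, ":Book"),
    (":Huckleberry-Finn", ":author", ":Mark-Twain"),
    (":Mark-Twain", RDF_TYPE, ":Person"),
    (":Untitled-Draft", RDF_TYPE, ":Book"),
    (":Untitled-Draft", ":author", ":Unknown"),  # :Unknown is not typed as :Person
]

result = min_qualified_cardinality_violations(triples, ":Book", ":author", ":Person", 1)
print(result)
```

The second book is reported because its only author is not typed as `:Person`, which is the difference between a qualified and an unqualified cardinality check.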
  27. 27. 27 high-level constraint languages either lack an implementation or are based on different implementations How to consistently validate RDF data against constraints of any constraint type expressed in any RDF-based constraint language? research question 4-1 RQ4
  28. 28. 28 validation environment constraint language implementation (SPIN mapping): :MinimumQualifiedCardinalityRestrictions a spin:ConstructTemplate ; spin:body [ ... CONSTRUCT { ... } WHERE { ... } ... ] . RQ4
  29. 29. 29 validation process RQ4
  30. 30. 30RQ4 validation results 30
  31. 31. 31 validation results RQ4 31
  32. 32. 32 validation results RQ4 32
  33. 33. 33 validation results RQ4 33
  34. 34. 34 validation results RQ4 34
  35. 35. 35 validation results RQ4 35
  36. 36. 36 validation results RQ4 36
  37. 37. 37 full implementations for all OWL 2 and DSP language constructs all constraint types expressible in OWL 2 and DSP major constraint types representable by ShEx and ReSh RDF serialization for DSP validation environment http://purl.org/net/rdfval-demo RQ4
  38. 38. 38 http://purl.org/net/rdfval-demo RQ4
  39. 39. 39 constraints and constraint language constructs must be representable in RDF constraint languages and supported constraint types must be expressible in SPARQL limitations RQ4
  40. 40. 40 How to represent constraints of any constraint type and how to reduce the representation of constraints of any constraint type to the absolute minimum? research question 4-2 RQ4 Values in % of the 81 constraint types (count): DSP 17.3 (14), ReSh 25.9 (21), ShEx 29.6 (24), SHACL 51.9 (42), OWL 2 67.9 (55), SPARQL 100.0 (81)
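The percentages on this slide follow from the counts in parentheses divided by the 81 constraint types. A short Python check of that arithmetic is sketched below; pairing each language with a count by slide order is an assumption, and the variable names are illustrative.

```python
TOTAL_CONSTRAINT_TYPES = 81

# Counts as listed on the slide, paired with languages in slide order (assumed).
counts = {
    "DSP": 14,
    "ReSh": 21,
    "ShEx": 24,
    "SHACL": 42,
    "OWL 2": 55,
    "SPARQL": 81,
}

# Share of the 81 constraint types, rounded to one decimal place.
coverage = {
    language: round(100 * count / TOTAL_CONSTRAINT_TYPES, 1)
    for language, count in counts.items()
}

for language, percent in coverage.items():
    print(f"{language}: {percent}% ({counts[language]})")
```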
  41. 41. 41 validation framework for RDF-based constraint languages: an intermediate abstraction layer based on formal logics that makes it possible to express any constraint type, enables straightforward mappings from high-level constraint languages, and reduces the representation of constraints to the absolute minimum RQ4
  42. 42. 42 conceptual model DC 2015 RQ4 74% 26%
  43. 43. 43RQ4 43 simple constraints
  44. 44. 44 different validation results RQ4
  45. 45. 45 different validation results RQ4 45
  46. 46. 46 different validation results RQ4 46
  47. 47. 47 different validation results RQ4 47
  48. 48. 48 different validation results RQ4 48
  49. 49. 49 different validation results RQ4 49
  50. 50. 50 How to ensure for any constraint type that RDF data is consistently validated against semantically equivalent constraints of the same constraint type across RDF-based constraint languages? framework is solely based on the abstract definitions of constraint types just 1 SPIN mapping for each constraint type research question 4-3 RQ4
  51. 51. 51RQ4 semantically equivalent constraints 51
  52. 52. 52 How to ensure for any constraint type that semantically equivalent constraints of the same constraint type can be transformed from one RDF-based constraint language to another? $gc = m_\alpha(c_\alpha)$, $c_\beta = m'_\beta(gc)$ RQ4 research question 4-4
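The two mappings on this slide can be read as a round trip through a language-independent form: a constraint in language α is mapped to a generic constraint gc, which is then mapped to language β. The Python sketch below illustrates that idea for the R-75 example, going from the OWL restriction to the DSP statement template shown earlier in the deck. The `GenericConstraint` class, the dictionary layouts, and the function names `from_owl` and `to_dsp` are hypothetical; the framework itself represents constraints in RDF.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenericConstraint:
    """Hypothetical language-independent form of one constraint (gc)."""
    constraint_type: str
    context_class: str
    prop: str
    value_class: str
    minimum: int


def from_owl(restriction):
    """Plays the role of m_alpha: OWL-style restriction -> generic constraint."""
    return GenericConstraint(
        constraint_type="minimum qualified cardinality restrictions",
        context_class=restriction["class"],
        prop=restriction["owl:onProperty"],
        value_class=restriction["owl:onClass"],
        minimum=restriction["owl:minQualifiedCardinality"],
    )


def to_dsp(gc):
    """Plays the role of m'_beta: generic constraint -> DSP-style template."""
    return {
        "dsp:resourceClass": gc.context_class,
        "dsp:statementTemplate": {
            "dsp:minOccur": gc.minimum,
            "dsp:property": gc.prop,
            "dsp:nonLiteralConstraint": {"dsp:valueClass": gc.value_class},
        },
    }


# The OWL restriction from the R-75 slide, flattened into a dict (assumed layout).
owl_constraint = {
    "class": ":Book",
    "owl:onProperty": ":author",
    "owl:onClass": ":Person",
    "owl:minQualifiedCardinality": 1,
}

gc = from_owl(owl_constraint)
dsp_constraint = to_dsp(gc)
print(dsp_constraint)
```

Routing every transformation through one generic form means each language needs only a mapping to and from that form, rather than one mapping per language pair.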
  53. 53. 53 What is the role reasoning plays in practical data validation and for which constraint types reasoning may be performed prior to validation to enhance data quality? research question 5 RQ5 SEMANTiCS 2015
  54. 54. 54 collected, classified, and implemented 115 constraints from vocabularies or domain experts on 3 common vocabularies well-established (QB, SKOS) under development (DDI-RDF) evaluation evaluation IJSC, 10(2) ICSC 2016 33 SPARQL endpoints
  55. 55. 55 future work: validation database and framework maintain and extend RDF validation database collect case studies and use cases extract requirements publish constraint types keep framework in sync evaluate solutions future work http://purl.org/net/rdf-validation
  56. 56. 56 future work: combine framework with SHACL derive SHACL extensions define mappings from SHACL to the abstraction layer and back maintain consistency of implementations of constraint types future work W3C RDF Data Shapes Working Group DCMI RDF Application Profiles Task Group
  57. 57. 57 summary of main contributions development of 3 RDF vocabularies direct validation of XML using common RDF validation tools publication of 81 constraint types validation framework for RDF-based constraint languages role of reasoning for RDF validation THANK YOU!
  58. 58. 58 acknowledgements, publications, research data. 30 publications: 6 journal articles, 9 conference articles, 3 workshop articles, 2 specifications, 10 technical reports; first author of all (except 1) journal articles, conference articles, workshop articles. research data and results: KIT research data repository: http://dx.doi.org/10.5445/BWDD/11 GitHub repository: https://github.com/github-thomas-hartmann/phd-thesis 4 international working groups: DCMI RDF Application Profiles Task Group (part of the editorial board), RDF Vocabularies Working Group (editor for DDI-RDF and PHDD), W3C RDF Data Shapes Working Group, DDI Moving Forward Project THANK YOU!
  59. 59. www.kit.edu 59 appendix
  60. 60. 60 publications: journal articles 1. Hartmann, Thomas, Zapilko, B., Wackerow, J., & Eckert, K. (2016). Directing the Development of Constraint Languages by Checking Constraints on RDF Data. International Journal of Semantic Computing, 10(02), 1–25. http://www.worldscientific.com/worldscinet/ijsc 2. Bosch, Thomas & Mathiak, B. (2015). Use Cases Related to an Ontology of the Data Documentation Initiative. IASSIST Quarterly, 38(4) & 39(1), 25–37. http://iassistdata.org/iq/issue/38/4 3. Bosch, Thomas, Olsson, O., Gregory, A., & Wackerow, J. (2015). DDI-RDF Discovery - A Discovery Model for Microdata. IASSIST Quarterly, 38(4) & 39(1), 17–24. http://iassistdata.org/iq/issue/38/4 4. Bosch, Thomas & Zapilko, B. (2015). Semantic Web Applications for the Social Sciences. IASSIST Quarterly, 38(4) & 39(1), 7–16. http://iassistdata.org/iq/issue/38/4 5. Schaible, J., Zapilko, B., Bosch, Thomas, & Zenk-Möltgen, W. (2015). Linking Study Descriptions to the Linked Open Data Cloud. IASSIST Quarterly, 38(4) & 39(1), 38–46. http://iassistdata.org/iq/issue/38/4 6. Bosch, Thomas & Mathiak, B. (2013). How to Accelerate the Process of Designing Domain Ontologies based on XML Schemas. International Journal of Metadata, Semantics and Ontologies - Special Issue on Metadata, Semantics and Ontologies for Web Intelligence, 8(3), 254 – 266. http://www.inderscience.com/info/inarticle.php?artid=57760 Please note that in 2015, my last name changed from Bosch to Hartmann.
  61. 61. 61 publications: articles in conference proceedings 1. Hartmann, Thomas, Zapilko, B., Wackerow, J., & Eckert, K. (2016). Validating RDF Data Quality using Constraints to Direct the Development of Constraint Languages. In Proceedings of the 10th International Conference on Semantic Computing (ICSC 2016) Laguna Hills, California, USA: IEEE. http://www.ieee-icsc.com/ 2. Bosch, Thomas & Eckert, K. (2015). Guidance, Please! Towards a Framework for RDF-based Constraint Languages. In Proceedings of the 15th DCMI International Conference on Dublin Core and Metadata Applications (DC 2015) São Paulo, Brazil. http://dcevents.dublincore.org/IntConf/dc-2015/paper/view/386/368 3. Bosch, Thomas, Acar, E., Nolle, A., & Eckert, K. (2015). The Role of Reasoning for RDF Validation. In Proceedings of the 11th International Conference on Semantic Systems (SEMANTiCS 2015) (pp. 33–40). Vienna, Austria: ACM. http://doi.acm.org/10.1145/2814864.2814867 4. Bosch, Thomas & Eckert, K. (2014). Requirements on RDF Constraint Formulation and Validation. In Proceedings of the 14th DCMI International Conference on Dublin Core and Metadata Applications (DC 2014) Austin, Texas, USA. http://dcevents.dublincore.org/IntConf/dc-2014/paper/view/257 5. Bosch, Thomas & Eckert, K. (2014). Towards Description Set Profiles for RDF using SPARQL as Intermediate Language. In Proceedings of the 14th DCMI International Conference on Dublin Core and Metadata Applications (DC 2014) Austin, Texas, USA. http://dcevents.dublincore.org/IntConf/dc-2014/paper/view/270 Please note that in 2015, my last name changed from Bosch to Hartmann.
  62. 62. 62 publications: articles in conference proceedings 6. Bosch, Thomas, Cyganiak, R., Wackerow, J., & Zapilko, B. (2012). Leveraging the DDI Model for Linked Statistical Data in the Social, Behavioural, and Economic Sciences. In Proceedings of the 12th DCMI International Conference on Dublin Core and Metadata Applications (DC 2012) Kuching, Sarawak, Malaysia. http://dcpapers.dublincore.org/pubs/article/view/3654 7. Bosch, Thomas (2012). Reusing XML Schemas’ Information as a Foundation for Designing Domain Ontologies. In P. Cudré-Mauroux, J. Heflin, E. Sirin, T. Tudorache, J. Euzenat, M. Hauswirth, J. Parreira, J. Hendler, G. Schreiber, A. Bernstein, & E. Blomqvist (Eds.), The Semantic Web - ISWC 2012, volume 7650 of Lecture Notes in Computer Science (pp. 437–440). Springer Berlin Heidelberg. http://dx.doi.org/10.1007/978-3-642-35173-0_34 8. Bosch, Thomas & Mathiak, B. (2012). XSLT Transformation Generating OWL Ontologies Automatically Based on XML Schemas. In Proceedings of the 6th International Conference for Internet Technology and Secured Transactions (ICITST 2011), IEEE Xplore Digital Library (pp. 660–667). Abu Dhabi, United Arab Emirates. http://edas.info/web/icitst2011/program.html 9. Bosch, Thomas, Wira-Alam, A., & Mathiak, B. (2011). Designing an Ontology for the Data Documentation Initiative. In Proceedings of the 8th Extended Semantic Web Conference (ESWC 2011), Poster-Session Heraklion, Greece. http://www.eswc2011.org/content/accepted-posters.html Please note that in 2015, my last name changed from Bosch to Hartmann.
  63. 63. 63 publications: articles in workshop proceedings Please note that in 2015, my last name changed from Bosch to Hartmann. 1. Bosch, Thomas, Cyganiak, R., Gregory, A., & Wackerow, J. (2013). DDI-RDF Discovery Vocabulary: A Metadata Vocabulary for Documenting Research and Survey Data. In Proceedings of the 6th Workshop on Linked Data on the Web (LDOW 2013), 22nd International World Wide Web Conference (WWW 2013), volume 996 Rio de Janeiro, Brazil. http://ceur-ws.org/Vol-996/ 2. Bosch, Thomas, Zapilko, B., Wackerow, J., & Gregory, A. (2013). Towards the Discovery of Person-Level Data - Reuse of Vocabularies and Related Use Cases. In Proceedings of the 1st International Workshop on Semantic Statistics (SemStats 2013), 12th International Semantic Web Conference (ISWC 2013), Sydney, Australia. http://semstats.github.io/2013/proceedings 3. Bosch, Thomas & Mathiak, B. (2011). Generic Multilevel Approach Designing Domain Ontologies Based on XML Schemas. In Proceedings of the 1st Workshop Ontologies Come of Age in the Semantic Web (OCAS 2011), 10th International Semantic Web Conference (ISWC 2011) (pp. 1–12). Bonn, Germany. http://ceur-ws.org/Vol-809/
  64. 64. 64 publications: specifications Please note that in 2015, my last name changed from Bosch to Hartmann. 1. Bosch, Thomas, Cyganiak, R., Wackerow, J., & Zapilko, B. (2016). DDI-RDF Discovery Vocabulary: A Vocabulary for Publishing Metadata about Data Sets (Research and Survey Data) into the Web of Linked Data. DDI Alliance Specification, DDI Alliance. http://rdf-vocabulary.ddialliance.org/discovery 2. Wackerow, J., Hoyle, L., & Bosch, Thomas (2016). Physical Data Description. DDI Alliance Specification, DDI Alliance. http://rdf-vocabulary.ddialliance.org/phdd.html
  65. 65. 65 publications: technical reports Please note that in 2015, my last name changed from Bosch to Hartmann. 1. Hartmann, Thomas (2016). Validation Framework for RDF-based Constraint Languages - PhD Thesis Appendix. Karlsruhe Institute of Technology (KIT), Karlsruhe. http://dx.doi.org/10.5445/IR/1000054062 2. Vompras, J., Gregory, A., Bosch, Thomas, & Wackerow, J. (2015). Scenarios for the DDI-RDF Discovery Vocabulary. DDI Working Paper Series. http://dx.doi.org/10.3886/DDISemanticWeb02 3. Alonen, M., Bosch, Thomas, Charles, V., Clayphan, R., Coyle, K., Dröge, E., Isaac, A., Matienzo, M., Pohl, A., Rühle, S., & Svensson, L. (2015). Report on Validation Requirements. DCMI Draft, Dublin Core Metadata Initiative (DCMI). http://wiki.dublincore.org/index.php/RDF_Application_Profiles/Requirements 4. Alonen, M., Bosch, Thomas, Charles, V., Clayphan, R., Coyle, K., Dröge, E., Isaac, A., Matienzo, M., Pohl, A., Rühle, S., & Svensson, L. (2015). Report on the Current State: Use Cases and Validation Requirements. DCMI Draft, Dublin Core Metadata Initiative (DCMI). http://wiki.dublincore.org/index.php/RDF_Application_Profiles/UCR_Deliverable 5. Bosch, Thomas, Nolle, A., Acar, E., & Eckert, K. (2015). RDF Validation Requirements - Evaluation and Logical Underpinning. Computing Research Repository (CoRR), abs/1501.03933. http://arxiv.org/abs/1501.03933
  66. 66. 66 publications: technical reports Please note that in 2015, my last name changed from Bosch to Hartmann. 6. Hartmann, Thomas, Zapilko, B., Wackerow, J., & Eckert, K. (2015). Constraints to Validate RDF Data Quality on Common Vocabularies in the Social, Behavioral, and Economic Sciences. Computing Research Repository (CoRR), abs/1504.04479. http://arxiv.org/abs/1504.04479 7. Hartmann, Thomas, Zapilko, B., Wackerow, J., & Eckert, K. (2015). Evaluating the Quality of RDF Data Sets on Common Vocabularies in the Social, Behavioral, and Economic Sciences. Computing Research Repository (CoRR), abs/1504.04478. http://arxiv.org/abs/1504.04478 8. Bosch, Thomas, Wira-Alam, A., & Mathiak, B. (2014). Designing an Ontology for the Data Documentation Initiative. Computing Research Repository (CoRR), abs/1402.3470. http://arxiv.org/abs/1402.3470 9. Bosch, Thomas & Mathiak, B. (2013). Evaluation of a Generic Approach for Designing Domain Ontologies Based on XML Schemas. Gesis Technical Report 08, Gesis - Leibniz Institute for the Social Sciences, Mannheim, Germany. http://www.gesis.org/publikationen/archiv/gesis-technical-reports/ 10. Block, W., Bosch, Thomas, Fitzpatrick, B., Gillman, D., Greenfield, J., Gregory, A., Hebing, M., Hoyle, L., Humphrey, C., Johnson, J., Linnerud, J., Mathiak, B., McEachern, S., Radler, B., Risnes, Ø., Smith, D., Thomas, W., Wackerow, J., Wegener, D., & Zenk-Möltgen, W. (2012). Developing a Model-Driven DDI Specification. DDI Working Paper Series
  67. 67. 67 research questions 1. Which types of research data and related metadata are not yet representable in RDF and how to adequately model them to be able to validate RDF data against constraints extractable from these vocabularies? 2. How to directly validate XML data on semantically rich OWL axioms using common RDF validation tools when XML Schemas, adequately representing particular domains, have already been designed? 3. Which types of constraints must be expressible by constraint languages to meet all collaboratively and comprehensively identified requirements to formulate constraints and validate RDF data? 4. How to ensure for any constraint type that (1) RDF data is consistently validated against semantically equivalent constraints of the same constraint type across RDF-based constraint languages and (2) semantically equivalent constraints of the same constraint type can be transformed from one RDF-based constraint language to another? 5. What is the role reasoning plays in practical data validation and for which constraint types reasoning may be performed prior to validation to enhance data quality? appendix
  68. 68. 68 summary of contributions 1. Development of three RDF vocabularies (1) to represent all types of research data and related metadata in RDF and (2) to validate RDF data against constraints extractable from these vocabularies 2. Direct validation of XML data using common RDF validation tools against semantically rich OWL axioms extracted from XML Schemas properly describing certain domains 3. Publication of 81 types of constraints that must be expressible by constraint languages to meet all jointly and extensively identified requirements to formulate constraints and validate RDF data against constraints 4.1 Consistent validation across RDF-based constraint languages 4.2 Minimal representation of constraints of any type 4.3 For any constraint type, RDF data is consistently validated against semantically equivalent constraints of the same constraint type across RDF-based constraint languages 4.4 For any constraint type, semantically equivalent constraints of the same constraint type can be transformed from one RDF-based constraint language to another 5. Delineation of the role reasoning plays in practical data validation and investigation, for each constraint type, of (1) whether reasoning may be performed prior to validation to enhance data quality, (2) how efficiently, in terms of runtime, validation is performed with and without reasoning, and (3) whether validation results depend on different underlying semantics 6. Evaluation of the usability of constraint types for assessing RDF data quality appendix
  69. 69. 69 summary of limitations 1. XML Schemas must adequately represent particular domains in a syntactically and semantically correct way 2. Constraints of supported constraint types and constraint language constructs must be representable in RDF 3. Constraint languages and supported constraint types must be expressible in SPARQL 4. The generality of the findings of the large-scale evaluation has to be proved for all vocabularies appendix
  70. 70. www.kit.edu 70 research question 1
  71. 71. 71 Which types of research data and related metadata are not yet representable in RDF and how to adequately model them to be able to validate RDF data against constraints extractable from these vocabularies? research question 1 RQ1 IASSIST Quarterly, 38(4) & 39(1), 7-16 IASSIST Quarterly, 38(4) & 39(1), 17-24 IASSIST Quarterly, 38(4) & 39(1), 25-37 IASSIST Quarterly, 38(4) & 39(1), 38-46 LDOW (WWW 2013) SemStats (ISWC 2013) DC 2012 ESWC 2011 (Poster) DDI Moving Forward Project RDF Vocabularies Working Group
  72. 72. 72 development of 3 RDF vocabularies: 1. DDI-RDF Discovery Vocabulary (DDI-RDF) to describe unit-record data 2. Physical Data Description (PHDD) to describe data in tabular format and its physical properties 3. The SKOS Extension for Statistics (XKOS) to describe the structure and textual properties of formal statistical classifications to describe relations between classifications and concepts and among concepts contribution RQ1
  73. 73. www.kit.edu 73 research question 2
  74. 74. 74 XML, XML Schema (XSD) RDF, Web Ontology Language (OWL) XML Schemas > OWL ontologies time-consuming work designing domain ontologies from scratch by hand reuse information contained in XML Schemas designing OWL domain ontologies RQ2
  75. 75. 75 How to directly validate XML data on semantically rich OWL axioms using common RDF validation tools when XML Schemas, adequately representing particular domains, have already been designed? research question 2 RQ2 IJMSO, 8(3) ISWC 2012 ICITST 2011 OCAS (ISWC 2011)
  76. 76. 76 semantically rich OWL axioms: sub-class relationships, OWL hasValue restrictions on data properties, OWL universal restrictions on object properties <library> <book year="February 1890"> <author> <name>Arthur Conan Doyle</name> </author> <title>The Sign of the Four</title> </book> </library> Title ⊑ ∀ value.string Year ⊑ ∀ value.integer RQ2
  77. 77. 77 transformations based on formal logics; OWL axioms extracted from XML Schemas explicitly and implicitly; formally underpin transformations to formally define and model semantics in a semantically correct way; complete extraction of XML Schemas' structural information. contributions: XML can directly be validated against semantically rich OWL axioms; any XML Schema is convertible to OWL; minimized effort designing OWL domain ontologies IJMSO, 8(3) RQ2
  78. 78. 78 ISWC 2012 ICITST 2011 OCAS (ISWC 2011) RQ2
  79. 79. 79 1. step of approach executed generic test cases created out of the XML Schema meta-model transformed XML Schemas of 6 XML standards 2. step of approach specified SWRL rules for 3 OWL domain ontologies verified hypothesis determined effort for traditional manual approach estimated effort for semi-automatic approach DDI-RDF serves as OWL domain ontology The effort and the time needed to deliver high quality domain ontologies from scratch by reusing information of already existing XML Schemas is much less than creating domain ontologies completely manually and from the ground up. evaluation IJMSO, 8(3) RQ2
  80. 80. www.kit.edu 80 research question 5
  81. 81. 81 What is the role reasoning plays in practical data validation and for which constraint types reasoning may be performed prior to validation to enhance data quality? research question 5 RQ5 SEMANTiCS 2015
  82. 82. 82 What is the role reasoning plays in practical data validation? research question 5-1 RQ5
  83. 83. 83 reasoning may resolve violations Book ⊑ ∀ author.Person Book(Huckleberry-Finn) author(Huckleberry-Finn, Mark-Twain) → Person(Mark-Twain) RQ5
  84. 84. 84 reasoning may cause violations Publication ⊑ ∃ publisher.Publisher Book(Huckleberry-Finn) Book ⊑ Publication RQ5
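The two preceding slides state that reasoning may both resolve and cause violations. The Python sketch below replays both examples with a deliberately tiny, single-step inference function; it is not a full OWL reasoner. All function names and data structures are assumptions for this illustration, and the existential restriction on `publisher` is simplified to "has at least one publisher".

```python
def infer(types, authors, subclass_of):
    """One illustrative inference step (single subclass level, not a real reasoner)."""
    inferred = {resource: set(classes) for resource, classes in types.items()}
    # Subclass axiom, e.g. Book is a subclass of Publication.
    for classes in inferred.values():
        for cls in list(classes):
            if cls in subclass_of:
                classes.add(subclass_of[cls])
    # Universal restriction: every author of a Book is inferred to be a Person.
    for book, author in authors:
        if "Book" in inferred.get(book, set()):
            inferred.setdefault(author, set()).add("Person")
    return inferred


def validate(types, authors, publishers):
    """Closed-world checks against the explicitly available type information."""
    violations = []
    for book, author in authors:
        if "Book" in types.get(book, set()) and "Person" not in types.get(author, set()):
            violations.append(("author-not-person", book))
    for resource, classes in types.items():
        has_publisher = any(subject == resource for subject, _ in publishers)
        if "Publication" in classes and not has_publisher:
            violations.append(("missing-publisher", resource))
    return violations


types = {"Huckleberry-Finn": {"Book"}}
authors = [("Huckleberry-Finn", "Mark-Twain")]
publishers = []  # no publisher statement in the data
subclass_of = {"Book": "Publication"}

before = validate(types, authors, publishers)
after = validate(infer(types, authors, subclass_of), authors, publishers)
print(before)
print(after)
```

Without inference the author is flagged because `Mark-Twain` is not typed as a person; after inference that violation disappears, but the book is now also a publication and is flagged for lacking a publisher.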
  85. 85. 85 reasoning solves redundancy Publication ⊑ ∃ publicationDate . xsd:date Book ⊑ Publication Conference-Proceeding ⊑ Publication Journal-Article ⊑ Publication RQ5
  86. 86. 86 For which constraint types reasoning may be performed prior to validation to enhance data quality? research question 5-2 RQ5
  87. 87. 87 > 2/5 of constraint types property domains (R-25): constraint types with reasoning ∃ author.⊤ ⊑ Publication author(Alices-Adventures-In-Wonderland, Lewis-Carroll) → rdf:type(Alices-Adventures-In-Wonderland, Publication) RQ5
  88. 88. 88 < 3/5 of constraint types literal pattern matching (R-44): constraint types without reasoning RQ5 ISBN a rdfs:Datatype ; owl:equivalentClass [ a rdfs:Datatype ; owl:onDatatype xsd:string ; owl:withRestrictions ([ xsd:pattern "^\d{9}[\d|X]$" ])] . Book ⊑ ∀ identifier.ISBN
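Literal pattern matching (R-44) needs no reasoning: each literal is simply tested against a regular expression. A minimal Python sketch follows, assuming the slide's pattern is meant as `^\d{9}[\d|X]$`; the function name and the sample identifiers are illustrative only.

```python
import re

# Pattern as read from the slide (assumed): nine digits, then one final character.
ISBN_PATTERN = re.compile(r"^\d{9}[\d|X]$")


def invalid_isbn_identifiers(identifiers):
    """Return the identifier literals that do not match the ISBN pattern."""
    return [value for value in identifiers if not ISBN_PATTERN.match(value)]


# Hypothetical identifier values attached to books.
identifiers = ["030640615X", "0306406152", "03064061", "ABCDEFGHIJ"]
invalid = invalid_isbn_identifiers(identifiers)
print(invalid)
```

Note that inside a character class `|` is a literal character rather than an alternation, so `[\d|X]` also accepts a trailing `|`; `[\dX]` would be the stricter form.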
  89. 89. 89 For which constraint types do validation results differ (1) depending on whether the CWA or the OWA is assumed and (2) depending on whether the UNA or the nUNA is assumed? CWA dependent: 56.8% UNA dependent: 66.6% research question 5-3 RQ5
  90. 90. 90 56.8% of constraint types minimum qualified cardinality restrictions (R-75): CWA dependent constraint types RQ5 Book ⊑ ∃ title.⊤
  91. 91. 91 disjoint classes (R-7): CWA independent constraint types RQ5 Book ⊓ JournalArticle ⊑ ⊥
  92. 92. 92 66.6% of constraint types functional properties (R-57/65): UNA dependent constraint types RQ5 funct(title) title(The-Adventures-of-Huckleberry-Finn, "The Adventures of Huckleberry Finn") title(The-Adventures-of-Huckleberry-Finn, "Die Abenteuer des Huckleberry Finn")
  93. 93. 93 literal value comparison (R-43): UNA independent constraint types RQ5 birthDate(Albert-Einstein, "1955-04-18") deathDate(Albert-Einstein, "1879-03-14") birthDate(Albert_Einstein, "1879-03-14") deathDate(Albert_Einstein, "1955-04-18") owl:sameAs(Albert-Einstein, Albert_Einstein)
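One way to read the slide above: a literal value comparison such as "birth date must not be after death date" reports a violation whether the two names are treated as distinct individuals or merged via `owl:sameAs`. The Python sketch below replays the slide's data under both readings. This interpretation, the function name, and the simple merge of `owl:sameAs` pairs are assumptions for illustration.

```python
from datetime import date


def birth_after_death(birth, death, same_as=()):
    """Return individuals for which some birth date lies after some death date.

    Names linked by a pair in `same_as` are merged into one individual first.
    """
    canonical = {}

    def find(name):
        # Follow merge links until a representative name is reached.
        while canonical.get(name, name) != name:
            name = canonical[name]
        return name

    for left, right in same_as:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            canonical[root_left] = root_right

    births, deaths = {}, {}
    for subject, value in birth:
        births.setdefault(find(subject), set()).add(date.fromisoformat(value))
    for subject, value in death:
        deaths.setdefault(find(subject), set()).add(date.fromisoformat(value))

    return sorted(
        subject for subject, born in births.items()
        if any(b > d for b in born for d in deaths.get(subject, ()))
    )


# Data from the slide.
birth = [("Albert-Einstein", "1955-04-18"), ("Albert_Einstein", "1879-03-14")]
death = [("Albert-Einstein", "1879-03-14"), ("Albert_Einstein", "1955-04-18")]

distinct_names = birth_after_death(birth, death)
merged_names = birth_after_death(
    birth, death, same_as=[("Albert-Einstein", "Albert_Einstein")]
)
print(distinct_names)
print(merged_names)
```

In both runs a violation is reported; only the name under which it is reported changes.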
  94. 94. www.kit.edu 94 evaluation
  95. 95. 95 collected, classified, and implemented 115 constraints from vocabularies or domain experts on 3 common vocabularies well-established (QB, SKOS) under development (DDI-RDF) evaluation evaluation IJSC, 10(2) ICSC 2016 33 SPARQL endpoints
  96. 96. 96 classification of constraint types RDFS/OWL based constraint language based SPARQL based classification of constraints informational warning error evaluation classification
  97. 97. 97 RDFS/OWL based evaluation classification of constraint types :Publication rdfs:subClassOf [ a owl:Restriction ; owl:onProperty :author ; owl:allValuesFrom :Person ] .
  98. 98. 98 constraint language based evaluation classification of constraint types :Publication { ( :isbn xsd:string, :title xsd:string ) | ( :issn xsd:string, :title xsd:string )}
  99. 99. 99 SPARQL based evaluation classification of constraint types SELECT ?concept WHERE { ?concept a [ rdfs:subClassOf* skos:Concept ] . FILTER NOT EXISTS { ?concept ?p ?o . FILTER ( ?p IN ( skos:related, skos:relatedMatch, skos:broader, ... ) ) . } }
  100. 100. 100 evaluation finding 1. C (constraints), CV (constraint violations), values in %. SPARQL: C 63.2, CV 78.2; CL: C 34.7, CV 21.8; RDFS/OWL: C 35.6, CV 21.8
  101. 101. 101 evaluation finding 2. C (constraints), CV (constraint violations), values in %. SPARQL: C 63.2, CV 78.2; CL: C 34.7, CV 21.8; RDFS/OWL: C 35.6, CV 21.8
  102. 102. 102 evaluation finding 3. C (constraints), CV (constraint violations), values in %. Info: C 42.3, CV 31.3; Warning: C 18.7, CV 62.7; Error: C 39.0, CV 6.1
  103. 103. www.kit.edu 103 future work
  104. 104. 104 future work: RQ1 publication of RDF vocabularies DDI Alliance specifications W3C recommendation for DDI-RDF DDI-Lifecycle MD (Model-Driven) new requirements based on experiences with DDI-RDF international working group: DDI Moving Forward Project individual contributions formalize conceptual model (using UML 2) conceptualize and implement diverse model serializations (e.g., RDFS/OWL) future work
  105. 105. 105 aligning PHDD and CSV on the WEB overlap in the description of tabular data in CSV format broader scope of PHDD description of tabular data with fixed record length description of tabular data with multiple records per case evaluation for use in DDI-Lifecycle MD future work: RQ1 future work
  106. 106. 106 future work: RQ2 bidirectional transformations from models of any meta-model to OWL generalize from XSD meta-model based unidirectional transformations from XSD models into OWL models enable to validate any data against constraints extractable from models of any meta-model using common RDF validation tools future work
  107. 107. 107 future work: validation database and framework maintain and extend RDF validation database collect case studies and use cases extract requirements publish constraint types keep framework in sync evaluate solutions future work http://purl.org/net/rdf-validation
  108. 108. 108 future work: combine framework with SHACL derive SHACL extensions define mappings from SHACL to the abstraction layer and back maintain consistency of implementations of constraint types future work W3C RDF Data Shapes Working Group DCMI RDF Application Profiles Task Group
