Towards an RDF Validation Language based on Regular Expression Derivatives

Towards an RDF Validation Language based on Regular Expression Derivatives
Author: Jose Emilio Labra Gayo
Slides presented at: Linked Web Data Management Workshop
Brussels, 27th March, 2015

  1. Towards an RDF Validation Language based on Regular Expression Derivatives. Authors: Eric Prud'hommeaux (World Wide Web Consortium, MIT, Cambridge, MA, USA); Harold Solbrig (Mayo Clinic College of Medicine, Rochester, MN, USA); Jose Emilio Labra Gayo (WESO Research group, University of Oviedo, Spain); Sławek Staworko (LINKS, INRIA & CNRS, University of Lille, France)
  2. Overview: Shape Expressions for RDF validation (justification); Regular Shape Expressions (axiomatic semantics, implementation based on derivatives); Regular Shape Expression Schemas (adapt the axiomatic semantics to schemas, adapt the implementation based on derivatives); Conclusions & Future work
  3. Shape Expressions: a simple and intuitive language that can describe the topology of RDF data and validate that RDF instance data matches a shape. Two syntaxes: a compact syntax (inspired by RelaxNG, Turtle and SPARQL) and RDF. Related to the W3C RDF Data Shapes Working Group
  4. Example: RDF model of a Person
     E-R diagram: Person with foaf:age xsd:integer, foaf:name xsd:string (+), and a foaf:knows self-relation (0..*)
     Shape Expressions schema:
       <Person> {
         foaf:age xsd:integer ,
         foaf:name xsd:string+ ,
         foaf:knows @<Person>*
       }
     Some RDF data:
       :john foaf:age 23;
             foaf:name "John";
             foaf:knows :bob .
       :bob foaf:age 34;
            foaf:name "Bob", "Robert" .
       :mary foaf:age 50, 65 .
  5. Why not SPARQL? The shape
       <Person> {
         foaf:age xsd:integer ,
         foaf:name xsd:string+ ,
         foaf:knows @<Person>*
       }
     has to be written in SPARQL as:
       ASK {
         { SELECT ?Person {
             ?Person foaf:age ?o .
           } GROUP BY ?Person HAVING (COUNT(*)=1)
         }
         { SELECT ?Person {
             ?Person foaf:age ?o .
             FILTER ( isLiteral(?o) && datatype(?o) = xsd:integer )
           } GROUP BY ?Person HAVING (COUNT(*)=1)
         }
         { SELECT ?Person (COUNT(*) AS ?Person_c0) {
             ?Person foaf:name ?o .
           } GROUP BY ?Person HAVING (COUNT(*)>=1)
         }
         { SELECT ?Person (COUNT(*) AS ?Person_c1) {
             ?Person foaf:name ?o .
             FILTER (isLiteral(?o) && datatype(?o) = xsd:string)
           } GROUP BY ?Person HAVING (COUNT(*)>=1)
         }
         FILTER (?Person_c0 = ?Person_c1)
         { {
             { SELECT ?Person (COUNT(*) AS ?Person_c2) {
                 ?Person foaf:knows ?o .
               } GROUP BY ?Person
             }
             { SELECT ?Person (COUNT(*) AS ?Person_c3) {
                 ?Person foaf:knows ?o .
                 FILTER ((isIRI(?o) || isBlank(?o)))
               } GROUP BY ?Person HAVING (COUNT(*) >= 1)
             }
             FILTER (?Person_c2 = ?Person_c3)
           }
           UNION
           { SELECT ?Person {
               OPTIONAL { ?Person foaf:knows ?o }
               FILTER (!bound(?o))
             }
           }
         }
       }
  6. Regular Shape Expressions (RSEs): a simplified version of Shape Expressions, based on regular expressions, with sets of triples instead of lists of characters and interleave instead of concatenation. Abstract syntax (shown on the slide)
  7. Shape Expressions vs RSEs*. Example 1: the Shape Expression <Shape1> { foaf:age xsd:integer , foaf:name xsd:string* } and its RSE (shown on the slide). Example 2: <Shape2> { :a ( 1 ) , :b ( 1 2 ) * }. * Note: we are considering a subset of Shape Expressions with closed shapes and inclusive Or
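The RSE forms of the two examples did not survive in this transcript. As a minimal sketch, the abstract syntax could be encoded as small immutable classes, with the comma read as interleave. All class, field and helper names here (`Arc`, `value_ok`, `one_of`, ...) are assumptions made for illustration, not the paper's notation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Empty:
    """Matches no set of triples."""


@dataclass(frozen=True)
class Epsilon:
    """Matches only the empty set of triples."""


@dataclass(frozen=True)
class Arc:
    """Matches one triple with this predicate whose object passes value_ok."""
    predicate: str
    value_ok: Callable[[object], bool]


@dataclass(frozen=True)
class Star:
    """Zero or more interleaved copies of expr."""
    expr: object


@dataclass(frozen=True)
class Or:
    left: object
    right: object


@dataclass(frozen=True)
class Interleave:
    """Both sides must match, each on its own part of the triple set."""
    left: object
    right: object


def is_integer(v):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(v, int) and not isinstance(v, bool)


def is_string(v):
    return isinstance(v, str)


def one_of(*values):
    """Value-set check, e.g. ( 1 2 ) in the compact syntax."""
    return lambda v: v in values


# <Shape1> { foaf:age xsd:integer , foaf:name xsd:string* }
shape1 = Interleave(Arc("foaf:age", is_integer), Star(Arc("foaf:name", is_string)))

# <Shape2> { :a ( 1 ) , :b ( 1 2 ) * }
shape2 = Interleave(Arc(":a", one_of(1)), Star(Arc(":b", one_of(1, 2))))
```

Frozen dataclasses keep expressions immutable, which suits an algorithm that repeatedly rewrites one expression into another.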
  8. Cardinalities in RSEs: cardinalities can be defined in terms of the other operators (definitions and example shown on the slide)
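The slide's definitions are not preserved here. One common way to derive cardinalities from the basic operators, offered only as an assumption about what they might look like, is e? = e | ε, e+ = e ‖ e*, and e{m,n} as m mandatory copies interleaved with n − m optional ones. The helper names `optional`, `plus` and `repeat` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Epsilon:
    """Matches only the empty set of triples."""


@dataclass(frozen=True)
class Arc:
    predicate: str
    datatype: str  # kept as a plain string in this sketch


@dataclass(frozen=True)
class Star:
    expr: object


@dataclass(frozen=True)
class Or:
    left: object
    right: object


@dataclass(frozen=True)
class Interleave:
    left: object
    right: object


def optional(e):
    """e? : either e or nothing."""
    return Or(e, Epsilon())


def plus(e):
    """e+ : one copy of e interleaved with zero or more further copies."""
    return Interleave(e, Star(e))


def repeat(e, m, n=None):
    """e{m,n}; n=None means unbounded, i.e. e{m,}."""
    if m < 0 or (n is not None and n < m):
        raise ValueError("need 0 <= m <= n")
    parts = [e] * m
    if n is None:
        parts.append(Star(e))
    else:
        parts.extend([optional(e)] * (n - m))
    if not parts:
        return Epsilon()
    # Fold the parts into a right-nested interleave.
    result = parts[-1]
    for part in reversed(parts[:-1]):
        result = Interleave(part, result)
    return result


name = Arc("foaf:name", "xsd:string")
one_or_two_names = repeat(name, 1, 2)
```

With these helpers, `foaf:name xsd:string+` from the Person example is simply `plus(name)`.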
  9. Shape of an RSE: example
  10. Simplification rules: it is easy to show that the operators obey a set of algebraic identities (listed on the slide)
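The identities themselves are missing from this transcript. The sketch below assumes the usual ones for regular-expression-like operators (∅ | e = e, e | e = e, ∅ ‖ e = ∅, ε ‖ e = e, ∅* = ε* = ε, (e*)* = e*) and applies them through "smart constructors". The names `mk_or`, `mk_interleave` and `mk_star` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Empty:
    """Matches no set of triples."""


@dataclass(frozen=True)
class Epsilon:
    """Matches only the empty set of triples."""


@dataclass(frozen=True)
class Arc:
    predicate: str
    datatype: str  # kept as a plain string in this sketch


@dataclass(frozen=True)
class Star:
    expr: object


@dataclass(frozen=True)
class Or:
    left: object
    right: object


@dataclass(frozen=True)
class Interleave:
    left: object
    right: object


def mk_or(a, b):
    """Build a | b, dropping Empty operands and duplicate alternatives."""
    if isinstance(a, Empty):
        return b
    if isinstance(b, Empty):
        return a
    if a == b:
        return a
    return Or(a, b)


def mk_interleave(a, b):
    """Build a ‖ b; Empty absorbs, Epsilon is the identity."""
    if isinstance(a, Empty) or isinstance(b, Empty):
        return Empty()
    if isinstance(a, Epsilon):
        return b
    if isinstance(b, Epsilon):
        return a
    return Interleave(a, b)


def mk_star(e):
    """Build e*; stars of Empty or Epsilon collapse, nested stars flatten."""
    if isinstance(e, (Empty, Epsilon)):
        return Epsilon()
    if isinstance(e, Star):
        return e
    return Star(e)
```

Using such constructors while computing derivatives keeps intermediate expressions small, because dead branches collapse to `Empty` as soon as they appear.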
  11. Matching triples with RSEs
  12. Example matching tree, with the rules employed
  13. Derivatives of RSEs: Brzozowski's algorithm (1964) was developed for regular expressions. We adapted that algorithm to RSEs. It calculates the derivative of an RSE with respect to a triple t (definition shown on the slide)
  14. Calculating the derivative: definitions (shown on the slide)
  15. Matching using derivatives: an auxiliary function returns true if an RSE matches the empty graph. The matching relation can then be expressed in terms of that function and the derivative (formula shown on the slide)
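The formulas are not preserved in this transcript, so the following is a sketch under stated assumptions rather than the authors' definitions. It uses Brzozowski-style rules with interleave in place of concatenation: the derivative of e1 ‖ e2 by a triple t is (∂t e1 ‖ e2) | (e1 ‖ ∂t e2), and the derivative of e* is ∂t e ‖ e*. The names `nullable`, `derivative` and `matches` are hypothetical. No simplification is applied, so intermediate expressions grow quickly.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Empty:
    """Matches no set of triples."""


@dataclass(frozen=True)
class Epsilon:
    """Matches only the empty set of triples."""


@dataclass(frozen=True)
class Arc:
    predicate: str
    value_ok: Callable[[object], bool]


@dataclass(frozen=True)
class Star:
    expr: object


@dataclass(frozen=True)
class Or:
    left: object
    right: object


@dataclass(frozen=True)
class Interleave:
    left: object
    right: object


def nullable(e):
    """True if e matches the empty graph."""
    if isinstance(e, (Epsilon, Star)):
        return True
    if isinstance(e, (Empty, Arc)):
        return False
    if isinstance(e, Or):
        return nullable(e.left) or nullable(e.right)
    if isinstance(e, Interleave):
        return nullable(e.left) and nullable(e.right)
    raise TypeError(f"not an RSE: {e!r}")


def derivative(e, triple):
    """Expression that must match the rest of the graph once triple is consumed."""
    if isinstance(e, (Empty, Epsilon)):
        return Empty()
    if isinstance(e, Arc):
        _, predicate, obj = triple
        ok = predicate == e.predicate and e.value_ok(obj)
        return Epsilon() if ok else Empty()
    if isinstance(e, Star):
        return Interleave(derivative(e.expr, triple), e)
    if isinstance(e, Or):
        return Or(derivative(e.left, triple), derivative(e.right, triple))
    if isinstance(e, Interleave):
        # The triple is consumed by exactly one side of the interleave.
        return Or(Interleave(derivative(e.left, triple), e.right),
                  Interleave(e.left, derivative(e.right, triple)))
    raise TypeError(f"not an RSE: {e!r}")


def matches(e, triples):
    """Closed-shape matching: every triple must be consumed by e."""
    for triple in triples:
        e = derivative(e, triple)
    return nullable(e)


def is_integer(v):
    return isinstance(v, int) and not isinstance(v, bool)


# <Shape1> { foaf:age xsd:integer , foaf:name xsd:string* }
shape1 = Interleave(Arc("foaf:age", is_integer),
                    Star(Arc("foaf:name", lambda v: isinstance(v, str))))

john = [(":john", "foaf:age", 23), (":john", "foaf:name", "John")]
mary = [(":mary", "foaf:age", 50), (":mary", "foaf:age", 65)]
john_ok = matches(shape1, john)   # True
mary_ok = matches(shape1, mary)   # False: the second foaf:age cannot be consumed
```

Because interleave is commutative, the result does not depend on the order in which the triples are consumed.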
  16. Example trace
  17. Regular Shape Expression Schemas: given a set of labels, an RSE schema is a function from labels to RSEs, where we extend RSEs to admit label references. Example 1 (shown on the slide). Example 2: <Person> { foaf:age xsd:integer , foaf:knows @<Person>* } corresponds to an RSE with a reference to the label Person (shown on the slide)
  18. From matching to typing: we extend the previous definitions to include the notion of typing. A typing associates a label with a node in a context. Definitions on typings (shown on the slide). The matching algorithm returns the typing in the context
  19. Matching RSE Schemas: we define the matching of an RSE e with a set of triples as a partial function that returns a typing. The function takes a typing context as an argument, and we extend the previous axiomatic definitions accordingly
  20. Axiomatic definitions adapted to RSE Schemas
  21. Derivative of an RSE in a typing context: we adapt the previous definitions to typing contexts (definitions shown on the slide)
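The adapted definitions are not preserved in this transcript. As a rough sketch only, one way to thread a typing context through the derivative is to let each derivative step return the possible (residual expression, typing) pairs, and to treat a (node, label) pair already in the context as assumed, which lets recursive references such as @<Person> terminate. Every name below (`IRI`, `Ref`, `derive`, `check`) and the coinductive handling of cycles are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    name: str


@dataclass(frozen=True)
class Ref:
    label: str


@dataclass(frozen=True)
class Epsilon:
    pass


@dataclass(frozen=True)
class Arc:
    predicate: str
    value: object  # a Python type (literal check) or a Ref (shape reference)


@dataclass(frozen=True)
class Star:
    expr: object


@dataclass(frozen=True)
class Interleave:
    left: object
    right: object


def nullable(e):
    if isinstance(e, (Epsilon, Star)):
        return True
    if isinstance(e, Interleave):
        return nullable(e.left) and nullable(e.right)
    return False  # Arc


def derive(e, triple, schema, graph, ctx):
    """All (residual expression, typing) pairs left after consuming triple."""
    _, predicate, obj = triple
    if isinstance(e, Arc):
        if predicate != e.predicate:
            return []
        if isinstance(e.value, Ref):
            typing = check(obj, e.value.label, schema, graph, ctx)
            return [] if typing is None else [(Epsilon(), typing)]
        return [(Epsilon(), frozenset())] if type(obj) is e.value else []
    if isinstance(e, Star):
        return [(Interleave(d, e), ty)
                for d, ty in derive(e.expr, triple, schema, graph, ctx)]
    if isinstance(e, Interleave):
        lefts = derive(e.left, triple, schema, graph, ctx)
        rights = derive(e.right, triple, schema, graph, ctx)
        return ([(Interleave(d, e.right), ty) for d, ty in lefts] +
                [(Interleave(e.left, d), ty) for d, ty in rights])
    return []  # Epsilon consumes nothing


def check(node, label, schema, graph, ctx=frozenset()):
    """Typing (set of (node, label) pairs) if node has shape label, else None."""
    if (node, label) in ctx:
        return frozenset()  # already assumed while checking an enclosing node
    ctx = ctx | {(node, label)}
    states = [(schema[label], frozenset())]
    for triple in (t for t in graph if t[0] == node):
        states = [(d, ty | ty2) for e, ty in states
                  for d, ty2 in derive(e, triple, schema, graph, ctx)]
    for e, ty in states:
        if nullable(e):
            return ty | {(node, label)}
    return None


name = Arc("foaf:name", str)
schema = {"Person": Interleave(
    Arc("foaf:age", int),
    Interleave(Interleave(name, Star(name)), Star(Arc("foaf:knows", Ref("Person")))))}
john, bob, mary = IRI("john"), IRI("bob"), IRI("mary")
graph = [(john, "foaf:age", 23), (john, "foaf:name", "John"), (john, "foaf:knows", bob),
         (bob, "foaf:age", 34), (bob, "foaf:name", "Bob"), (bob, "foaf:name", "Robert"),
         (mary, "foaf:age", 50), (mary, "foaf:age", 65)]
john_typing = check(john, "Person", schema, graph)
mary_typing = check(mary, "Person", schema, graph)
```

On the Person data from the earlier example slide, `john_typing` contains both `(john, "Person")` and `(bob, "Person")`, while `mary_typing` is `None`. The `Or` and `Empty` forms are left out to keep the sketch short. Failed branches are dropped from the result list instead of being represented as an empty expression.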
  22. Example
  23. Implementations: the algorithm has been implemented in Scala, available at http://labra.github.io/shexcala. We have also implemented a simplified prototype in Haskell that follows the paper's definitions, available at http://labra.github.io/Haws. An online version is also available at http://rdfshape.weso.es
  24. First experimental results: comparison between derivatives (deriv) and backtracking (back)
  25. Conclusions & Future work: a declarative algorithm to match Regular Shape Expressions, based on equational reasoning. The theoretical complexity is unaffected; however, the derivatives algorithm behaves better than backtracking in practice. Future work: prove the correctness of the algorithm; obtain more experimental results; align this work with current RDF Data Shapes development
  26. End of Presentation
  27. SHACL vs RSEs: at this moment, SHACL is being defined by the RDF Data Shapes WG. Some differences: open shapes (allow remaining triples); arcs check that there are no other arcs with the same predicate and different values; And operator instead of interleave; inclusive vs exclusive Or. The semantics of all these features are under discussion
  28. Example of derivatives that don't match
