SHACL: Shaping the Big Ball of Data Mud

Semantic Web technologies (such as RDF and SPARQL) excel at bringing together diverse data in a world of independent data publishers and consumers. Common ontologies help to arrive at a shared understanding of the intended meaning of data.

However, they don’t address one critically important issue: What does it mean for data to be complete and/or valid? Semantic knowledge graphs without a shared notion of completeness and validity quickly turn into a Big Ball of Data Mud.

The Shapes Constraint Language (SHACL), an upcoming W3C standard, promises to help solve this problem. By keeping semantics separate from validity, SHACL makes it possible to resolve a slew of data quality and data exchange issues.

Presented at the Lotico Berlin Semantic Web Meetup.


  1. Shaping the Big Ball of Data Mud: W3C's Shapes Constraint Language (SHACL). Richard Cyganiak, Lotico Berlin Semantic Web Meetup, 17 November 2016
  2. Semantic Web: RDF, SPARQL, OWL, RDFS
  3. RDF, SPARQL, OWL, RDFS
  4. Strengths and weaknesses
     Strengths:
     • Flexible can-say-anything data model
     • Merging data is trivial
     • Shared, explicit meaning thanks to URIs
     • Mixing and matching of schemas; partial understanding
     • Painstakingly developed vocabularies
     • “Neutral ground” for modelling
     • SPARQL
     Weaknesses:
     • Overgeneralisation: works for anything, but great at nothing
     • “RDF tax”
     • Logic foundations and web foundations can be baggage
     • Maps poorly to common programming language data structures
     • Schemaless nature makes optimisation difficult
     • Not good at semi-structured
  5. Application Areas
     • Knowledge graphs
     • Publishing
     • Life sciences
     • Fraud detection & identity management
     • Data integration & analysis
     The V’s of Big Data: Volume, Velocity, Variety
  6. https://www.w3.org/blog/2010/05/linked-data-its-is-not-like-th/
  7. RDF, SPARQL, OWL, RDFS. Validation? Constraint checking?
  8. RDF is supposedly self-describing.
  9. Schema.org
  10. Simple Knowledge Organization System (SKOS)
  11. Dublin Core
  12. Data Cube Vocabulary
  13. R2RML
  14. Linked Data Platform (LDP)
  15. Why is RDFS not enough? (RDF, SPARQL, OWL, RDFS)
  16. Why is RDFS not enough?
      • RDF “Schema”, and schemas are for validation, right?
      • It’s a misnomer; should be “RDF Vocabulary Definition Language”
      • Very limited expressivity
      • Not the right semantics for validation
      • ex:capital range ex:City. ex:Berlin ex:capital ex:Germany => …?
      • Invalid data -> infer more invalid data => ex:Germany a ex:City
  17. Why is OWL not enough? (RDF, SPARQL, OWL, RDFS)
  18. Why is OWL not enough?
      • De facto a constraint language: logical contradiction => invalid
      • Very expressive
      • But targeted at logic modelling, not validity constraints
      • Not the right semantics for validation
      • ex:Dublin ex:inCountry ex:Ireland, ex:USA => …?
      • Open world assumption
      • No unique name assumption => ex:Ireland owl:sameAs ex:USA
  19. ICV: OWL closed-world semantics in Stardog
  20. Why is SPARQL not enough? (RDF, SPARQL, OWL, RDFS)
  21. Why is SPARQL not enough? SPARQL
  22. http://spinrdf.org/
  23. Why is SPARQL not enough?
      • SPARQL ASK seems ideal for constraint validation
      • Very expressive
      • Efficient implementations
      • But writing even simple constraints can be tedious
  24. Other proposals
  25. ShEx: Shape Expressions, http://shex.io/
  26. So, something new? RDF, SPARQL, OWL, RDFS. Validation? Constraint checking?
  27. SHACL: Shapes Constraint Language
  28. SHACL Overview
      • A language for “checking RDF graphs against conditions”
      • Produced by W3C Data Shapes Working Group
      • Work in progress, some features at risk
      • 4th Working Draft: August 2016
      • Should be done by June 2017
      • Like RDFS and OWL, SHACL constraints are themselves written in RDF
      • SPARQL underneath (for evaluation semantics and extensibility)
  29. ex:PersonShape
          a sh:Shape ;
          sh:targetClass ex:Person ;
          sh:property [
              sh:predicate ex:ssn ;
              sh:maxCount 1 ;
              sh:datatype xsd:string ;
              sh:pattern "^\\d{3}-\\d{2}-\\d{4}$" ;
          ] ;
          sh:property [
              sh:predicate ex:child ;
              sh:class ex:Person ;
              sh:nodeKind sh:IRI ;
          ] ;
          sh:property [
              sh:path [ sh:inversePath ex:child ] ;
              sh:name "parent" ;
              sh:maxCount 2 ;
          ] .
  30. How a Shape works (Diagram: Dimitris Kontokostas)
  31. Targets: Initial selection of focus nodes
      • Node target
      • Class instance target
      • Subjects-of target
      • Objects-of target
      • SPARQL-based selection (advanced)
  32. Node constraints
      Constraints about the focus node itself:
      • Node kind (IRI, blank, literal)
      • IRI stem (namespace)
      • IRI regex
      • SPARQL query constraint (advanced)
  33. Property constraints
      Constraints about a certain outgoing or incoming property of the focus node(s):
      • Cardinality
      • Class
      • Datatype
      • Node kind (IRI, blank node, literal)
      • String min/max length, string regex
      • Numeric min/max
      • Value must match another shape
      • Value must not match another shape
  34. Other features
      • Combine constraints with logical OR/any (default: AND/all)
      • Property-pair comparison (=, <, >)
      • Severities (Violation, Warning, Info)
      • Annotations (name, description, grouping, order)
      • Define additional types of constraints based on SPARQL (advanced)
  35. Violation reports can be produced in RDF
      ex:ExampleConstraintViolation
          a sh:ValidationResult ;
          sh:severity sh:Violation ;
          sh:focusNode ex:Bob ;
          sh:path ex:age ;
          sh:value "twenty two" ;
          sh:message "ex:age must be literal of datatype xsd:integer." ;
          sh:sourceConstraintComponent sh:DatatypeConstraintComponent ;
          sh:sourceShape ex:PersonShape .
  36. Relationship to Rules
      • Rules: “If someone says this, then I say that.”
      • SHACL can’t do this.
      • Does not replace SWRL, Jena Rules, RIF, SPIN Rules
  37. Uses and implementations
  38. SHACL in TopBraid Composer: Shapes + Constraints. SHACL support is available in the TopBraid Composer Free Edition. http://www.topquadrant.com/downloads/
  39. SHACL in TopBraid Composer: SPARQL-based constraints
  40. SHACL in TopQuadrant’s web products (EVN, EDG)
  41. SHACL Protégé Plugin: http://me-at-big.blogspot.de/2015/07/shacl4p-shapes-constraint-language.html
  42. Repairing SKOS taxonomies with SHACL. Validation of SKOS with SHACL, and extension of SHACL with specification of repair strategies. Christian Mader and Monika Solanki, http://ceur-ws.org/Vol-1666/paper-06.pdf
  43. Validating the “bag of crisps”…
      • Validation is often not about correct/incorrect or valid/invalid
      • Constraints-first (e.g., SQL)
      • Well-formed vs valid (e.g., XML Schema)
      • Validation is often about completeness and correctness for a specific purpose: “This is what I produce”; “This is what I understand”
      • Assumption is that there may be other statements
      • Different consumers may apply different constraints
      • SHACL should work well in this flexible, multi-source, multi-consumer world.
  44. “Anyone can say anything about anything” RDF SPARQL OWL RDFS Statements: What is being said? What words do we have? What makes logical sense to say? What did you say about XYZ? OWL SHACL Is that word used correctly? What do you need to know from me? You can't say that here! I’d never say that!
  45. richard@topquadrant.com
  46. Backup slides
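
The RDFS point on slide 16 can be made concrete with a small sketch. This is plain Python with tuples standing in for triples, not an RDF library, and the helper names `rdfs_range_inference` and `range_violations` are hypothetical. It contrasts the two readings of `ex:capital range ex:City`: RDFS inference quietly derives `ex:Germany a ex:City` from the reversed statement, while a validation-style check reports the statement itself.

```python
# Triples as plain (subject, predicate, object) tuples; no RDF library assumed.
RDF_TYPE = "rdf:type"
RDFS_RANGE = "rdfs:range"

schema = {("ex:capital", RDFS_RANGE, "ex:City")}
# The data has subject and object the wrong way round (slide 16).
data = {("ex:Berlin", "ex:capital", "ex:Germany")}


def rdfs_range_inference(schema, data):
    """RDFS reading: every object of a property is inferred to be in its range."""
    ranges = {(s, o) for s, p, o in schema if p == RDFS_RANGE}
    inferred = set()
    for s, p, o in data:
        for prop, cls in ranges:
            if p == prop:
                inferred.add((o, RDF_TYPE, cls))
    return inferred


def range_violations(schema, data):
    """Validation reading: the object must already be typed with the range class."""
    ranges = {(s, o) for s, p, o in schema if p == RDFS_RANGE}
    violations = []
    for s, p, o in sorted(data):
        for prop, cls in ranges:
            if p == prop and (o, RDF_TYPE, cls) not in data:
                violations.append((s, p, o))
    return violations


inferred = rdfs_range_inference(schema, data)
violations = range_violations(schema, data)
print(inferred)    # inference adds a type triple instead of complaining
print(violations)  # the check reports the offending statement
```

The second function is only one possible closed-world reading, chosen here for illustration; it is not how any particular SHACL engine is implemented.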

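As a rough illustration of how the `ex:ssn` part of the shape on slide 29 behaves, the sketch below hand-codes two of its constraints in plain Python. It is an analogue under stated assumptions, not a SHACL engine: the sample data, the function name `validate_person_ssn`, and the result dictionaries are all hypothetical, and the regular expression assumes the digit escapes (`\d`) that the slide's pattern is meant to contain.

```python
import re

RDF_TYPE = "rdf:type"
# Assumed intent of the slide's sh:pattern: three digits, two digits, four digits.
SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")

# Hypothetical sample data as (subject, predicate, object) tuples.
data = [
    ("ex:Alice", RDF_TYPE, "ex:Person"),
    ("ex:Alice", "ex:ssn", "987-65-4321"),
    ("ex:Bob", RDF_TYPE, "ex:Person"),
    ("ex:Bob", "ex:ssn", "123-45-6789"),
    ("ex:Bob", "ex:ssn", "not-an-ssn"),
]


def validate_person_ssn(triples):
    """Check the ex:ssn constraints for every instance of ex:Person."""
    # Target selection: focus nodes are the instances of ex:Person.
    focus_nodes = sorted(
        s for s, p, o in triples if p == RDF_TYPE and o == "ex:Person"
    )
    results = []
    for node in focus_nodes:
        values = [o for s, p, o in triples if s == node and p == "ex:ssn"]
        # Analogue of sh:maxCount 1
        if len(values) > 1:
            results.append(
                {"focusNode": node, "path": "ex:ssn", "component": "maxCount"}
            )
        # Analogue of sh:pattern
        for value in values:
            if not SSN_PATTERN.match(value):
                results.append(
                    {"focusNode": node, "path": "ex:ssn",
                     "value": value, "component": "pattern"}
                )
    return results


results = validate_person_ssn(data)
for result in results:
    print(result)
```

Each result records the focus node, the path and, where relevant, the offending value, loosely mirroring the fields of the validation result on slide 35.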