SHACL: Shaping the Big Ball of Data Mud

Richard Cyganiak, Principal Software Engineer at TopQuadrant
Nov. 19, 2016

Editor's Notes

  1. It’s amazing how many people have done incredible work. Massive effort shown in this pic. But there is some hype. Quite a few datasets are the output of a sloppy conversion script, thrown into a SPARQL store with some haphazard links to DBpedia. Their publishers run a handful of SPARQL queries as sanity checks, but do no in-depth quality control at all. Lots of data quality issues. Querying within a dataset can be hard enough; across datasets it is often impossible. If one dataset (e.g., DBpedia) changes, links break and often are never fixed.
  2. The talk will be about validation and SHACL, but I’d like to start by setting the scene. Where is the Semantic Web on the hype cycle? Arguably, it went over the bump twice already: with a focus on logic/AI around 2000, and a focus on Linked Data around 2010. I helped to fan the flames of the second hype. The base standards are no longer that exciting today. Overblown expectations have cooled off. It’s no longer expected to change the world. Getting stable and mature. Specific applications can be elsewhere on the cycle. See “Enterprise Taxonomy and Ontology Management”. That’s actually what TQ does.
  3. If you work with these technologies, life is pretty good these days, and still getting better. Maturing standards and tool support. And today we really understand what the technologies are good at, and what not.
  4. “Maps poorly to programming languages”: property names are not simple identifiers; every property can be multivalued; you need navigability along incoming and outgoing arcs; ordering is difficult. Semi-structured data is important in big data.
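     To make the mismatch concrete, here is a small hypothetical Turtle snippet (the ex: namespace and all names are invented). Property names are full IRIs rather than simple identifiers, and nothing stops a property from being multivalued, so it cannot simply become a scalar field on a class:

        @prefix ex: <http://example.org/ns#> .

        ex:alice
            ex:name  "Alice" ;                  # looks like a scalar field...
            ex:email "alice@example.org" ,
                     "alice.s@example.org" ;    # ...but any property may be multivalued
            ex:knows ex:bob .                   # the outgoing arc is easy to navigate; the
                                                # incoming arc (who knows Alice?) is not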
  5. We know where it works and where it doesn’t. It’s productive in a number of niches. RDF is good at dealing with Variety. (But not good enough: contextual validation, fuzzy/statistical matching for the semi-structured stuff.) Variety tends to make logic approaches difficult: there is no single global truth, so less OWL, more SPARQL.
  6. Tim Berners-Lee deconstructing a bag of crisps: the perfect metaphor for the strengths of the SW. Different information co-exists on the packaging:
     - the plain English “potato chips”
     - the nutrition information on the back, standardized by the U.S. Food and Drug Administration
     - some allergy information that many people don’t pay any attention to, but those with allergies read very carefully
     - the UPC code that can be read by any retail checkout machine in the world
     - some numbers on the bottom edge of the package that make no sense to him whatsoever
     Mixing and matching of different vocabularies, standardised by different organisations, intended for different consumers. Partial understanding. Once you have agreed on an identifier for a thing *and a location for data about it*, different data producers and consumers can use it without stepping on each others’ toes.
  7. The two main open source implementations of the technology stack, Jena and Sesame, are now at the Apache Foundation and at the Eclipse Foundation—big, established, mature, enterprisey organisations.
  8. So life is pretty good. Maturing technology stack, clearly understood strengths and weaknesses, productive niches, improving tools. But… we never solved validation. That’s kind of surprising. After all, each of these technologies has aspects that address these needs. Let’s review them one by one.
  9. Every class and property has a URI. The URI references an ontology that defines the term. So each triple describes itself, right? One of the major strengths, right? No. Actually, most of the meaning is just not given in the ontology. Too much of the meaning is implicit, or just written down in text somewhere and cannot be automatically checked. Let me give examples.
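     A hypothetical illustration (the ex: terms are invented): the triples below are perfectly well-formed, and ex:birthDate may even be defined in an ontology, but nothing machine-readable says the value must be a date, must be unique, or must lie in the past. Those expectations live only in prose:

        @prefix ex: <http://example.org/ns#> .

        # Well-formed RDF, and every term can resolve to an ontology...
        ex:alice ex:birthDate "next Tuesday" .   # ...yet nothing flags this
        ex:alice ex:birthDate "1985-02-30" .     # nor this second, malformed value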
  10. Arguably the most important ontology in existence. These are examples of things they want to validate in a tool for webmasters; they came out of the workshop that kicked off the Data Shapes WG. See https://www.w3.org/2001/sw/wiki/images/0/00/SimpleApplication-SpecificConstraintsforRDFModels.pdf and https://www.w3.org/TR/shacl-ucr/#uc23-schema.org-constraints
  11. DC is widely used. It’s easy enough to agree on calling a title “dc:title” and an author “dc:creator”, but different orgs have widely differing views on what constitutes a complete metadata record. DC application profiles emerged as a response. DC developed its own way to represent them. Not a standard, and not used apart from the DC community.
  12. I’ve been involved. We wrote constraints in prose, and added SPARQL queries to make them more formal/explicit. And yes, people can copy-paste them. But still no way of just running all of them automatically against a published dataset! And no error reporting: just true/false.
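     As a sketch of the pattern (a simplified paraphrase of one of the Data Cube integrity constraints, not the spec’s exact text): an ASK query can detect that some observation lacks a dataset, but it answers only true or false; it cannot say which observation is broken, or why:

        PREFIX qb: <http://purl.org/linked-data/cube#>

        # Returns true if a violation exists, but not where, or why.
        ASK WHERE {
            ?obs a qb:Observation .
            FILTER NOT EXISTS { ?obs qb:dataSet ?dataset }
        }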
  13. I’ve been involved. Mapping files are written in RDF; goal was to be very clear about what constitutes a valid mapping file. This is semi-formal. Surely this should be representable in some standard machine-readable way?
  14. Read/write Linked Data. Applications want to put constraints on the kind of data they can receive. Address book application wants to say that there should be an address in the RDF you PUT/POST. But completely punted on saying how to achieve it. “machine-readable ones facilitate better client interaction”—no shit!
  15. So, lots of initiatives that are serious about using SW in an interoperable and robust way end up just putting constraints in prose text, where it should really be in a machine-processable form. Same problem everywhere! But we have RDF Schema. SCHEMA!
  16. RDF Schema sounds analogous to XML Schema, but they really do very different things: XML Schema validates documents and rejects invalid ones, while RDFS definitions drive inference and never reject data.
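     A small hypothetical sketch of that difference (ex: invented): an rdfs:range declaration does not reject an unexpected value; under RDFS semantics it licenses an inference instead:

        @prefix ex:   <http://example.org/ns#> .
        @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

        ex:author rdfs:range ex:Person .

        # Looks like a type error...
        ex:book1 ex:author "Jane Doe" .

        # ...but no RDFS processor complains. A reasoner simply concludes
        # that the value denotes an ex:Person.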
  17. So RDFS is just not powerful enough. But OWL surely gets us there?
  18. Clark&Parsia. Use OWL syntax, but switch to a semantics based on CW and UNA. This works pretty well! But can be a bit confusing—if you find some OWL, what semantics is intended? And OWL, while expressive, lacks some things that one would like to have in validation.
  19. We saw the Data Cube example where SPARQL was used to query the graph to see if it’s complete. Isn’t that enough to solve all validation issues?
  20. SPIN is a technology introduced by TQ. A bunch of things (rules written in SPARQL, templated SPARQL queries, defining custom SPARQL functions, etc.) We have used this for years and it actually works very well.
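     A minimal sketch of the idea, assuming SPIN’s textual query form (the ex: class and property are invented): spin:constraint attaches a query to a class, ?this binds to each instance being checked, and an ASK constraint that evaluates to true signals a violation:

        @prefix ex:   <http://example.org/ns#> .
        @prefix spin: <http://spinrdf.org/spin#> .
        @prefix sp:   <http://spinrdf.org/sp#> .

        ex:Person spin:constraint [
            a sp:Ask ;
            sp:text """
                # Violation if this person has no email
                ASK WHERE {
                    FILTER NOT EXISTS { ?this ex:email ?email }
                }
            """
        ] .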
  21. Custom syntax. Somewhere between SPARQL, regular expressions, and grammar parsing. “regex for graphs.” Pretty cool. Concise. Needs new parsers.
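     For flavour, a hypothetical shape in the ShEx compact syntax (all names invented): each conforming node needs exactly one string name and one or more email IRIs:

        PREFIX ex:  <http://example.org/ns#>
        PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

        ex:PersonShape {
            ex:name  xsd:string ;   # exactly one by default
            ex:email IRI+           # one or more
        }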
  22. So, several good solutions around—but none has enough mindshare to take over. Meet at W3C, make a standard with the best aspects of each. (Or with the worst aspects of each—fingers crossed.)
  23. Some features and aspects are still highly controversial.
  24. When a violation occurs, the result is not just “false”. It’s a structure with info. You can process it in various ways. Just display it? Attach it to the right form field based on sh:path? Just count the violations per type in a large dataset? Different behaviour for different severities?
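     A sketch of what that structure looks like (the shape and data are invented; the report vocabulary is standard SHACL): validating ex:alice against the shape below produces a report whose sh:result node carries the focus node, path, severity, and a message that a UI can route to the right place:

        @prefix ex: <http://example.org/ns#> .
        @prefix sh: <http://www.w3.org/ns/shacl#> .

        # The shape: every ex:Person needs at least one email.
        ex:PersonShape
            a sh:NodeShape ;
            sh:targetClass ex:Person ;
            sh:property [
                sh:path ex:email ;
                sh:minCount 1 ;
                sh:message "A person must have an email address." ;
            ] .

        # The kind of report a validator returns for  ex:alice a ex:Person .
        [] a sh:ValidationReport ;
            sh:conforms false ;
            sh:result [
                a sh:ValidationResult ;
                sh:focusNode ex:alice ;
                sh:resultPath ex:email ;
                sh:resultSeverity sh:Violation ;
                sh:sourceConstraintComponent sh:MinCountConstraintComponent ;
                sh:resultMessage "A person must have an email address." ;
            ] .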
  25. It’s still early days. Mostly individuals and organisations that are active in the working group.
  26. SHACL is getting really important to our products. We have made a major contribution: TQ’s Holger Knublauch is one of the editors of the spec. TBC is an SW IDE, a workbench for SW professionals. At its heart is a schema/ontology editor. It supports editing of SHACL constraints through a nice UI.
  27. EVN is a taxonomy and ontology management platform; EDG is a data governance solution. SHACL allows our customers to add custom constraints over their own data models. Very powerful.
  28. Note the suggestions for fixing the problem. This goes beyond standard SHACL, but it’s an obvious addition and very cool.
  29. Not sure how well maintained it is, or whether it follows the spec.
  30. Semantic Web Company. Nice application that shows using SHACL for bulk validation. Automated repair—somewhat similar to our suggestions extension.
  31. Dimitris Kontokostas (SHACL spec editor) and team at the University of Leipzig. Organising entire test suites, expressed originally in SPARQL but now with SHACL support, for data quality of large datasets. Used in the context of DBpedia.
  32. So, how do the parts of the stack fit together? High-level view. Let’s run with the metaphor that anyone can say anything about anything. First we should note: Just because you can say anything about anything doesn’t mean you should! RDF is triples. But we also call them RDF statements. Each triple is a statement of some fact.