Shaping the Big Ball of Data Mud
W3C's Shapes Constraint Language (SHACL)
Richard Cyganiak
Lotico Berlin Semantic Web Meetup, 17 November 2016
Semantic Web
RDF, RDFS, OWL, SPARQL
Strengths
• Flexible can-say-anything data model
• Merging data is trivial
• Shared, explicit meaning thanks to URIs
• Mixing and matching of schemas; partial understanding
• Painstakingly developed vocabularies
• “Neutral ground” for modelling
• SPARQL
Weaknesses
• Overgeneralisation: works for anything, but great at nothing
• “RDF tax”
• Logic foundations and web foundations can be baggage
• Maps poorly to common programming language data structures
• Schemaless nature makes optimisation difficult
• Not good at semi-structured
Application Areas
• Knowledge graphs
• Publishing
• Life sciences
• Fraud detection & identity management
• Data integration & analysis
The V’s of Big Data: Volume, Velocity, Variety
https://www.w3.org/blog/2010/05/linked-data-its-is-not-like-th/
RDF, RDFS, OWL, SPARQL
Validation?
Constraint checking?
RDF is supposedly self-describing.
Schema.org
Simple Knowledge Organization System (SKOS)
Dublin Core
Data Cube Vocabulary
R2RML
Linked Data Platform (LDP)
Why is RDFS not enough?
• RDF “Schema” — and schemas are for validation, right?
• It’s a misnomer; should be “RDF Vocabulary Definition Language”
• Very limited expressivity
• Not the right semantics for validation
• ex:capital rdfs:range ex:City . ex:Berlin ex:capital ex:Germany . => …?
• Invalid data -> infer more invalid data
=> ex:Germany a ex:City
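Here is the RDFS example written out in Turtle. It is a sketch under stated assumptions. The ex: namespace (http://example.org/) and the shape name ex:CapitalShape are hypothetical. The shape uses the vocabulary of the final SHACL Recommendation (2017), not the 2016 draft. The comparison assumes the SHACL processor validates the data as given, without RDFS entailment, which is the SHACL default. Under that assumption ex:Germany is not an ex:City instance, so the shape reports a violation. An RDFS reasoner would instead infer ex:Germany a ex:City.

```turtle
@prefix ex:   <http://example.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh:   <http://www.w3.org/ns/shacl#> .

# Vocabulary plus data with subject and object swapped by mistake.
ex:capital rdfs:range ex:City .
ex:Berlin  ex:capital ex:Germany .

# RDFS does not complain; it entails the (wrong) triple
#   ex:Germany a ex:City .

# Hypothetical SHACL shape: the same expectation as a check.
# Every node with an outgoing ex:capital triple is validated, and each
# ex:capital value must be an instance of ex:City in the data graph.
ex:CapitalShape
    a sh:NodeShape ;
    sh:targetSubjectsOf ex:capital ;
    sh:property [
        sh:path ex:capital ;
        sh:class ex:City ;
    ] .
```

With the data above, ex:Berlin is the focus node. The violation's value is ex:Germany, which carries no rdf:type ex:City.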
Why is OWL not enough?
• De facto a constraint language: logical contradiction => invalid
• Very expressive
• But targeted at logic modelling, not validity constraints
• Not the right semantics for validation
• ex:Dublin ex:inCountry ex:Ireland, ex:USA (with ex:inCountry declared an owl:FunctionalProperty) => …?
• Open world assumption
• No unique name assumption
=> ex:Ireland owl:sameAs ex:USA
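The OWL example can be contrasted with a closed-world SHACL check. This is a hedged sketch. The ex: names and ex:CityShape are hypothetical, and the shape uses the final SHACL Recommendation (2017) vocabulary. The OWL axiom says a city is in at most one country. Under the open world and no unique name assumptions, an OWL reasoner reconciles the two values by equating them. SHACL instead counts the distinct values and reports the excess.

```turtle
@prefix ex:  <http://example.org/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix sh:  <http://www.w3.org/ns/shacl#> .

# OWL: "a city is in at most one country".
ex:inCountry a owl:ObjectProperty, owl:FunctionalProperty .
ex:Dublin ex:inCountry ex:Ireland, ex:USA .
# An OWL reasoner concludes:  ex:Ireland owl:sameAs ex:USA .
# This is not a contradiction unless the two are also stated to be
# different, e.g. with owl:differentFrom.

# SHACL: the same intent as a closed-world check (hypothetical shape).
ex:CityShape
    a sh:NodeShape ;
    sh:targetSubjectsOf ex:inCountry ;
    sh:property [
        sh:path ex:inCountry ;
        sh:maxCount 1 ;   # two distinct values => one violation for ex:Dublin
    ] .
```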
ICV: OWL closed-world semantics in Stardog
SPIN (SPARQL Inferencing Notation): http://spinrdf.org/
Why is SPARQL not enough?
• SPARQL ASK seems ideal for constraint validation
• Very expressive
• Efficient implementations
• But writing even simple constraints can be tedious
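The tedium is easy to see when the same rule is written both ways. This sketch uses the final SHACL Recommendation (2017). There, SPARQL-based constraints use a SELECT query with $this pre-bound to each focus node, and each solution becomes one validation result. The shape names and the ex:ssn property are hypothetical. The SPARQL version is one possible hand-written formulation, not the only one.

```turtle
@prefix ex: <http://example.org/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .

# SHACL Core: "at most one ex:ssn" is a single triple.
ex:PersonSsnShape
    a sh:NodeShape ;
    sh:targetClass ex:Person ;
    sh:property [
        sh:path ex:ssn ;
        sh:maxCount 1 ;
    ] .

# Roughly the same check hand-written as a SHACL-SPARQL constraint:
# a self-join plus a filter. Each offending value may be reported
# more than once when there are three or more values.
ex:PersonSsnSparqlShape
    a sh:NodeShape ;
    sh:targetClass ex:Person ;
    sh:sparql [
        a sh:SPARQLConstraint ;
        sh:message "A person must have at most one ex:ssn value." ;
        sh:select """
            SELECT $this ?value
            WHERE {
                $this <http://example.org/ssn> ?value .
                $this <http://example.org/ssn> ?other .
                FILTER (!sameTerm(?value, ?other))
            }
        """ ;
    ] .
```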
Other proposals
ShEx — Shape Expressions
http://shex.io/
So, something new?
RDF, RDFS, OWL, SPARQL
Validation?
Constraint checking?
SHACL
Shapes Constraint Language
SHACL Overview
• A language for “checking RDF graphs against conditions”
• Produced by W3C Data Shapes Working Group
• Work in progress, some features at risk
• 4th Working Draft: August 2016
• Should be done by June 2017 (SHACL became a W3C Recommendation on 20 July 2017)
• Like RDFS and OWL, SHACL constraints are themselves written in RDF
• SPARQL underneath (for evaluation semantics and extensibility)
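@prefix ex:  <http://example.org/> .
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# 2016 Working Draft syntax, as presented in the talk. The final Recommendation
# (July 2017) declares node shapes as sh:NodeShape and uses sh:path instead of sh:predicate.
ex:PersonShape
    a sh:Shape ;
    sh:targetClass ex:Person ;
    # At most one SSN: a string of the form 123-45-6789.
    sh:property [
        sh:predicate ex:ssn ;
        sh:maxCount 1 ;
        sh:datatype xsd:string ;
        sh:pattern "^\\d{3}-\\d{2}-\\d{4}$" ;   # "\\d" in a Turtle string is the regex \d
    ] ;
    # Every child is an ex:Person identified by an IRI.
    sh:property [
        sh:predicate ex:child ;
        sh:class ex:Person ;
        sh:nodeKind sh:IRI ;
    ] ;
    # Incoming ex:child triples ("parents"): at most two.
    sh:property [
        sh:path [ sh:inversePath ex:child ] ;
        sh:name "parent" ;
        sh:maxCount 2 ;
    ] .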
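Here is a small, hypothetical data graph that ex:PersonShape above would check. All names are illustrative, under the assumed ex: namespace http://example.org/. The comments mark which nodes are expected to conform and which should produce violations, and why.

```turtle
@prefix ex: <http://example.org/> .

# Conforms: ex:Bob is an ex:Person identified by an IRI, with one parent.
ex:Alice a ex:Person ;
    ex:child ex:Bob .
ex:Bob a ex:Person .

# Violation (sh:class): ex:Rex is not an ex:Person.
ex:Carol a ex:Person ;
    ex:child ex:Rex .
ex:Rex a ex:Dog .

# Violation (sh:nodeKind): the child is a blank node, not an IRI.
ex:Hal a ex:Person ;
    ex:child [ a ex:Person ] .

# Violation (sh:maxCount 2 on the inverse path): ex:Dan has three "parents".
ex:Dan a ex:Person .
ex:Erin  a ex:Person ; ex:child ex:Dan .
ex:Frank a ex:Person ; ex:child ex:Dan .
ex:Gina  a ex:Person ; ex:child ex:Dan .
```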
How a Shape works
Diagram: Dimitris Kontokostas
Targets: Initial selection of focus nodes
• Node target
• Class instance target
• Subjects-of target
• Objects-of target
• SPARQL-based selection (advanced)
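The four Core target types can be sketched in Turtle. This follows the final SHACL Recommendation (2017) vocabulary, and the shape and resource names are hypothetical. SPARQL-based targets are not part of SHACL Core and are omitted here.

```turtle
@prefix ex: <http://example.org/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .

# Node target: validate exactly this node.
ex:BerlinShape
    a sh:NodeShape ;
    sh:targetNode ex:Berlin ;
    sh:nodeKind sh:IRI .

# Class instance target: every instance of ex:Person in the data graph,
# including instances of its rdfs:subClassOf subclasses.
ex:PersonTargetShape
    a sh:NodeShape ;
    sh:targetClass ex:Person ;
    sh:nodeKind sh:IRI .

# Subjects-of target: every node with an outgoing ex:capital triple.
ex:HasCapitalShape
    a sh:NodeShape ;
    sh:targetSubjectsOf ex:capital ;
    sh:class ex:Country .

# Objects-of target: every node that is a value of ex:capital.
ex:IsCapitalShape
    a sh:NodeShape ;
    sh:targetObjectsOf ex:capital ;
    sh:class ex:City .
```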
Node constraints
Constraints about the focus node itself:
• Node kind (IRI, blank node, literal)
• IRI stem (namespace)
• IRI regex
• SPARQL query constraint (advanced)
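A node shape can constrain the focus node itself, as sketched below. The example assumes the final SHACL Recommendation (2017). In that version a namespace ("IRI stem") check can be expressed with sh:pattern, which matches against the string form of an IRI. The people/ namespace and the shape name are hypothetical.

```turtle
@prefix ex: <http://example.org/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .

# Every ex:Person must be an IRI (not a blank node or literal)
# in the hypothetical namespace http://example.org/people/.
ex:PersonIdShape
    a sh:NodeShape ;
    sh:targetClass ex:Person ;
    sh:nodeKind sh:IRI ;
    sh:pattern "^http://example\\.org/people/" .   # "\\." is a literal dot in the regex
```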
Property constraints
Constraints about a certain outgoing or incoming property of the focus node(s):
• Cardinality
• Class
• Datatype
• Node kind (IRI, blank node, literal)
• String min/max length, string regex
• Numeric min/max
• Value must match another shape
• Value must not match another shape
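The property constraints in the list above can be sketched in one place, one constraint type per group. This uses the final SHACL Recommendation (2017) vocabulary. The book vocabulary (ex:Book, ex:isbn13, ex:pages and so on) and both shape names are hypothetical.

```turtle
@prefix ex:  <http://example.org/> .
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:BookShape
    a sh:NodeShape ;
    sh:targetClass ex:Book ;
    # Cardinality, datatype, string min/max length
    sh:property [
        sh:path ex:title ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
        sh:datatype xsd:string ;
        sh:minLength 1 ;
        sh:maxLength 200 ;
    ] ;
    # String regex
    sh:property [
        sh:path ex:isbn13 ;
        sh:maxCount 1 ;
        sh:pattern "^[0-9]{13}$" ;
    ] ;
    # Numeric min/max
    sh:property [
        sh:path ex:pages ;
        sh:datatype xsd:integer ;
        sh:minInclusive 1 ;
        sh:maxInclusive 10000 ;
    ] ;
    # Class and node kind
    sh:property [
        sh:path ex:author ;
        sh:minCount 1 ;
        sh:class ex:Person ;
        sh:nodeKind sh:IRI ;
    ] ;
    # Value must match another shape, and must not match another shape
    sh:property [
        sh:path ex:publisher ;
        sh:node ex:OrganisationShape ;
        sh:not [ sh:class ex:Person ] ;
    ] .

ex:OrganisationShape
    a sh:NodeShape ;
    sh:property [
        sh:path ex:name ;
        sh:minCount 1 ;
    ] .
```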
Other features
• Combine constraints with logical OR/any (default: AND/all)
• Property-pair comparison (=, <, >)
• Severities (Violation, Warning, Info)
• Annotations (name, description, grouping, order)
• Define additional types of constraints based on SPARQL (advanced)
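The remaining features can be combined in a single shape, as in the hedged sketch below. It assumes the final SHACL Recommendation (2017), where property-pair comparison is written with sh:equals, sh:disjoint, sh:lessThan and sh:lessThanOrEquals. The event vocabulary, the shape name and ex:DatesGroup are hypothetical. SPARQL-defined constraint types are not shown.

```turtle
@prefix ex:   <http://example.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

ex:EventShape
    a sh:NodeShape ;
    sh:targetClass ex:Event ;
    # Logical OR: a venue or an online URL (other constraints are ANDed)
    sh:or (
        [ sh:path ex:venue ; sh:minCount 1 ]
        [ sh:path ex:onlineUrl ; sh:minCount 1 ]
    ) ;
    sh:property [
        sh:path ex:startDate ;
        sh:datatype xsd:date ;
        # Property-pair comparison: each start date < each end date
        sh:lessThan ex:endDate ;
        # Annotations: used by forms and documentation, not by validation
        sh:name "start date" ;
        sh:description "First day of the event." ;
        sh:group ex:DatesGroup ;
        sh:order 1 ;
    ] ;
    sh:property [
        sh:path ex:endDate ;
        sh:datatype xsd:date ;
        sh:name "end date" ;
        sh:group ex:DatesGroup ;
        sh:order 2 ;
    ] ;
    # Severity: a missing description is reported as a warning
    sh:property [
        sh:path ex:description ;
        sh:minCount 1 ;
        sh:severity sh:Warning ;
        sh:message "An event should have a description." ;
    ] .

ex:DatesGroup
    a sh:PropertyGroup ;
    rdfs:label "Dates" ;
    sh:order 0 .
```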
Violation reports can be produced in RDF
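@prefix ex: <http://example.org/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .

# 2016 draft result vocabulary. The final Recommendation names these result properties
# sh:resultSeverity, sh:resultPath and sh:resultMessage instead of sh:severity, sh:path, sh:message.
ex:ExampleConstraintViolation
    a sh:ValidationResult ;
    sh:severity sh:Violation ;
    sh:focusNode ex:Bob ;
    sh:path ex:age ;
    sh:value "twenty two" ;
    sh:message "ex:age must be a literal of datatype xsd:integer." ;
    sh:sourceConstraintComponent sh:DatatypeConstraintComponent ;
    sh:sourceShape ex:PersonShape .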
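For comparison, here is a hedged sketch of a complete report in the final SHACL Recommendation (2017) vocabulary. There, results sit inside a sh:ValidationReport with a sh:conforms flag, and result properties carry a "result" prefix. In this version, sh:sourceShape points at the property shape that declared the violated constraint. The shapes and data are hypothetical. A real processor may add further properties, such as sh:resultMessage.

```turtle
@prefix ex:  <http://example.org/> .
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Hypothetical shapes graph: ex:age values must be xsd:integer.
ex:PersonAgeShape
    a sh:NodeShape ;
    sh:targetClass ex:Person ;
    sh:property ex:PersonAgeShape-age .

ex:PersonAgeShape-age
    a sh:PropertyShape ;
    sh:path ex:age ;
    sh:datatype xsd:integer .

# A report a processor could return for the data
#   ex:Bob a ex:Person ; ex:age "twenty two" .
[]  a sh:ValidationReport ;
    sh:conforms false ;
    sh:result [
        a sh:ValidationResult ;
        sh:resultSeverity sh:Violation ;
        sh:focusNode ex:Bob ;
        sh:resultPath ex:age ;
        sh:value "twenty two" ;
        sh:sourceConstraintComponent sh:DatatypeConstraintComponent ;
        sh:sourceShape ex:PersonAgeShape-age ;
    ] .
```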
Relationship to Rules
• Rules: “If someone says this, then I say that.”
• SHACL can’t do this.
• Does not replace SWRL, Jena Rules, RIF, SPIN Rules
Uses and implementations
SHACL in TopBraid Composer: Shapes + Constraints
SHACL support is available in the TopBraid Composer Free Edition. http://www.topquadrant.com/downloads/
SHACL in TopBraid Composer: SPARQL-based constraints
SHACL in TopQuadrant’s web products (EVN, EDG)
SHACL Protégé Plugin
http://me-at-big.blogspot.de/2015/07/shacl4p-shapes-constraint-language.html
Repairing SKOS taxonomies with SHACL
Validation of SKOS with SHACL, and extension of SHACL with specification of repair strategies.
Christian Mader and Monika Solanki, http://ceur-ws.org/Vol-1666/paper-06.pdf
Validating the “bag of crisps”…
• Validation is often not about correct/incorrect or valid/invalid
• Constraints-first (e.g., SQL)
• Well-formed vs valid (e.g., XML Schema)
• Validation is often about completeness and correctness for a specific purpose: “This is what I produce”; “This is what I understand”
• The assumption is that there may be other statements
• Different consumers may apply different constraints
• SHACL should work well in this flexible, multi-source, multi-consumer world.
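The producer and consumer perspectives can be written as separate shapes over the same data. This sketch uses the final SHACL Recommendation (2017) vocabulary, and all shape and property names are hypothetical. SHACL shapes are open by default, so statements the shape does not mention are allowed. A consumer that really wants to reject everything else can opt in with sh:closed.

```turtle
@prefix ex:  <http://example.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix sh:  <http://www.w3.org/ns/shacl#> .

# Producer: "This is what I produce": every person has a name and an email.
ex:ProducerPersonShape
    a sh:NodeShape ;
    sh:targetClass ex:Person ;
    sh:property [ sh:path ex:name ; sh:minCount 1 ] ;
    sh:property [ sh:path ex:email ; sh:minCount 1 ] .

# Consumer A: needs exactly one name; other statements are fine (open shape).
ex:ConsumerAPersonShape
    a sh:NodeShape ;
    sh:targetClass ex:Person ;
    sh:property [ sh:path ex:name ; sh:minCount 1 ; sh:maxCount 1 ] .

# Consumer B: "This is what I understand": any property other than
# ex:name and ex:email is reported (rdf:type is ignored).
ex:ConsumerBPersonShape
    a sh:NodeShape ;
    sh:targetClass ex:Person ;
    sh:closed true ;
    sh:ignoredProperties ( rdf:type ) ;
    sh:property [ sh:path ex:name ; sh:minCount 1 ] ;
    sh:property [ sh:path ex:email ; sh:minCount 1 ] .
```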
“Anyone can say anything about anything”
Diagram: the questions that RDF, RDFS, OWL, SPARQL and SHACL each answer
Statements: What is being said?
What words do we have?
What makes logical sense to say?
What did you say about XYZ?
Is that word used correctly?
What do you need to know from me?
You can't say that here!
I’d never say that!
richard@topquadrant.com
Backup slides
SHACL: Shaping the Big Ball of Data Mud

More Related Content

What's hot

Introduction To RDF and RDFS
Introduction To RDF and RDFSIntroduction To RDF and RDFS
Introduction To RDF and RDFSNilesh Wagmare
 
An Introduction to SPARQL
An Introduction to SPARQLAn Introduction to SPARQL
An Introduction to SPARQLOlaf Hartig
 
Introduction to RDF & SPARQL
Introduction to RDF & SPARQLIntroduction to RDF & SPARQL
Introduction to RDF & SPARQLOpen Data Support
 
Introduction to RDF
Introduction to RDFIntroduction to RDF
Introduction to RDFNarni Rajesh
 
Understanding RDF: the Resource Description Framework in Context (1999)
Understanding RDF: the Resource Description Framework in Context  (1999)Understanding RDF: the Resource Description Framework in Context  (1999)
Understanding RDF: the Resource Description Framework in Context (1999)Dan Brickley
 
SPARQL in a nutshell
SPARQL in a nutshellSPARQL in a nutshell
SPARQL in a nutshellFabien Gandon
 
SHACL in Apache jena - ApacheCon2020
SHACL in Apache jena - ApacheCon2020SHACL in Apache jena - ApacheCon2020
SHACL in Apache jena - ApacheCon2020andyseaborne
 
RDF, SPARQL and Semantic Repositories
RDF, SPARQL and Semantic RepositoriesRDF, SPARQL and Semantic Repositories
RDF, SPARQL and Semantic RepositoriesMarin Dimitrov
 
Resource description framework
Resource description frameworkResource description framework
Resource description frameworkStanley Wang
 
Debunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative FactsDebunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative FactsNeo4j
 
Graph and RDF databases
Graph and RDF databasesGraph and RDF databases
Graph and RDF databasesNassim Bahri
 
Getting Started with Knowledge Graphs
Getting Started with Knowledge GraphsGetting Started with Knowledge Graphs
Getting Started with Knowledge GraphsPeter Haase
 
LOD (linked open data) part 2 lod 구축과 현황
LOD (linked open data) part 2   lod 구축과 현황LOD (linked open data) part 2   lod 구축과 현황
LOD (linked open data) part 2 lod 구축과 현황LiST Inc
 
Validating RDF data: Challenges and perspectives
Validating RDF data: Challenges and perspectivesValidating RDF data: Challenges and perspectives
Validating RDF data: Challenges and perspectivesJose Emilio Labra Gayo
 
SPARQL introduction and training (130+ slides with exercices)
SPARQL introduction and training (130+ slides with exercices)SPARQL introduction and training (130+ slides with exercices)
SPARQL introduction and training (130+ slides with exercices)Thomas Francart
 

What's hot (20)

Introduction To RDF and RDFS
Introduction To RDF and RDFSIntroduction To RDF and RDFS
Introduction To RDF and RDFS
 
RDF and OWL
RDF and OWLRDF and OWL
RDF and OWL
 
An Introduction to SPARQL
An Introduction to SPARQLAn Introduction to SPARQL
An Introduction to SPARQL
 
Introduction to SPARQL
Introduction to SPARQLIntroduction to SPARQL
Introduction to SPARQL
 
SPARQL Cheat Sheet
SPARQL Cheat SheetSPARQL Cheat Sheet
SPARQL Cheat Sheet
 
Introduction to RDF & SPARQL
Introduction to RDF & SPARQLIntroduction to RDF & SPARQL
Introduction to RDF & SPARQL
 
Introduction to RDF
Introduction to RDFIntroduction to RDF
Introduction to RDF
 
JSON-LD and SHACL for Knowledge Graphs
JSON-LD and SHACL for Knowledge GraphsJSON-LD and SHACL for Knowledge Graphs
JSON-LD and SHACL for Knowledge Graphs
 
Understanding RDF: the Resource Description Framework in Context (1999)
Understanding RDF: the Resource Description Framework in Context  (1999)Understanding RDF: the Resource Description Framework in Context  (1999)
Understanding RDF: the Resource Description Framework in Context (1999)
 
ShEx by Example
ShEx by ExampleShEx by Example
ShEx by Example
 
SPARQL in a nutshell
SPARQL in a nutshellSPARQL in a nutshell
SPARQL in a nutshell
 
SHACL in Apache jena - ApacheCon2020
SHACL in Apache jena - ApacheCon2020SHACL in Apache jena - ApacheCon2020
SHACL in Apache jena - ApacheCon2020
 
RDF, SPARQL and Semantic Repositories
RDF, SPARQL and Semantic RepositoriesRDF, SPARQL and Semantic Repositories
RDF, SPARQL and Semantic Repositories
 
Resource description framework
Resource description frameworkResource description framework
Resource description framework
 
Debunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative FactsDebunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative Facts
 
Graph and RDF databases
Graph and RDF databasesGraph and RDF databases
Graph and RDF databases
 
Getting Started with Knowledge Graphs
Getting Started with Knowledge GraphsGetting Started with Knowledge Graphs
Getting Started with Knowledge Graphs
 
LOD (linked open data) part 2 lod 구축과 현황
LOD (linked open data) part 2   lod 구축과 현황LOD (linked open data) part 2   lod 구축과 현황
LOD (linked open data) part 2 lod 구축과 현황
 
Validating RDF data: Challenges and perspectives
Validating RDF data: Challenges and perspectivesValidating RDF data: Challenges and perspectives
Validating RDF data: Challenges and perspectives
 
SPARQL introduction and training (130+ slides with exercices)
SPARQL introduction and training (130+ slides with exercices)SPARQL introduction and training (130+ slides with exercices)
SPARQL introduction and training (130+ slides with exercices)
 

Similar to SHACL: Shaping the Big Ball of Data Mud

RDF SHACL, Annotations, and Data Frames
RDF SHACL, Annotations, and Data FramesRDF SHACL, Annotations, and Data Frames
RDF SHACL, Annotations, and Data FramesKurt Cagle
 
A Hands On Overview Of The Semantic Web
A Hands On Overview Of The Semantic WebA Hands On Overview Of The Semantic Web
A Hands On Overview Of The Semantic WebShamod Lacoul
 
8th TUC Meeting - Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...
8th TUC Meeting -  Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...8th TUC Meeting -  Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...
8th TUC Meeting - Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...LDBC council
 
A hands on overview of the semantic web
A hands on overview of the semantic webA hands on overview of the semantic web
A hands on overview of the semantic webMarakana Inc.
 
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the CloudFirst Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the CloudOntotext
 
SemanticWeb Nuts 'n Bolts
SemanticWeb Nuts 'n BoltsSemanticWeb Nuts 'n Bolts
SemanticWeb Nuts 'n BoltsRinke Hoekstra
 
A year on the Semantic Web @ W3C
A year on the Semantic Web @ W3CA year on the Semantic Web @ W3C
A year on the Semantic Web @ W3CIvan Herman
 
2016.02 - Validating RDF Data Quality using Constraints to Direct the Develop...
2016.02 - Validating RDF Data Quality using Constraints to Direct the Develop...2016.02 - Validating RDF Data Quality using Constraints to Direct the Develop...
2016.02 - Validating RDF Data Quality using Constraints to Direct the Develop...Dr.-Ing. Thomas Hartmann
 
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)Dr.-Ing. Thomas Hartmann
 
Eclipse RDF4J - Working with RDF in Java
Eclipse RDF4J - Working with RDF in JavaEclipse RDF4J - Working with RDF in Java
Eclipse RDF4J - Working with RDF in JavaJeen Broekstra
 
Making the semantic web work
Making the semantic web workMaking the semantic web work
Making the semantic web workPaul Houle
 
CSHALS 2010 W3C Semanic Web Tutorial
CSHALS 2010 W3C Semanic Web TutorialCSHALS 2010 W3C Semanic Web Tutorial
CSHALS 2010 W3C Semanic Web TutorialLeeFeigenbaum
 

Similar to SHACL: Shaping the Big Ball of Data Mud (20)

RDF SHACL, Annotations, and Data Frames
RDF SHACL, Annotations, and Data FramesRDF SHACL, Annotations, and Data Frames
RDF SHACL, Annotations, and Data Frames
 
A Hands On Overview Of The Semantic Web
A Hands On Overview Of The Semantic WebA Hands On Overview Of The Semantic Web
A Hands On Overview Of The Semantic Web
 
8th TUC Meeting - Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...
8th TUC Meeting -  Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...8th TUC Meeting -  Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...
8th TUC Meeting - Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...
 
KIT Graduiertenkolloquium 11.05.2016
KIT Graduiertenkolloquium 11.05.2016KIT Graduiertenkolloquium 11.05.2016
KIT Graduiertenkolloquium 11.05.2016
 
A hands on overview of the semantic web
A hands on overview of the semantic webA hands on overview of the semantic web
A hands on overview of the semantic web
 
What's New in RDF 1.1?
What's New in RDF 1.1?What's New in RDF 1.1?
What's New in RDF 1.1?
 
Linked services
Linked servicesLinked services
Linked services
 
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the CloudFirst Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
 
SemanticWeb Nuts 'n Bolts
SemanticWeb Nuts 'n BoltsSemanticWeb Nuts 'n Bolts
SemanticWeb Nuts 'n Bolts
 
A year on the Semantic Web @ W3C
A year on the Semantic Web @ W3CA year on the Semantic Web @ W3C
A year on the Semantic Web @ W3C
 
SPIN and Shapes
SPIN and ShapesSPIN and Shapes
SPIN and Shapes
 
2016.02 - Validating RDF Data Quality using Constraints to Direct the Develop...
2016.02 - Validating RDF Data Quality using Constraints to Direct the Develop...2016.02 - Validating RDF Data Quality using Constraints to Direct the Develop...
2016.02 - Validating RDF Data Quality using Constraints to Direct the Develop...
 
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)
 
SPIN in Five Slides
SPIN in Five SlidesSPIN in Five Slides
SPIN in Five Slides
 
Linked Open Data
Linked Open DataLinked Open Data
Linked Open Data
 
Converting GHO to RDF
Converting GHO to RDFConverting GHO to RDF
Converting GHO to RDF
 
Eclipse RDF4J - Working with RDF in Java
Eclipse RDF4J - Working with RDF in JavaEclipse RDF4J - Working with RDF in Java
Eclipse RDF4J - Working with RDF in Java
 
Making the semantic web work
Making the semantic web workMaking the semantic web work
Making the semantic web work
 
Semantic web
Semantic web Semantic web
Semantic web
 
CSHALS 2010 W3C Semanic Web Tutorial
CSHALS 2010 W3C Semanic Web TutorialCSHALS 2010 W3C Semanic Web Tutorial
CSHALS 2010 W3C Semanic Web Tutorial
 

More from Richard Cyganiak

EDF2012: The Web of Data and its Five Stars
EDF2012: The Web of Data and its Five StarsEDF2012: The Web of Data and its Five Stars
EDF2012: The Web of Data and its Five StarsRichard Cyganiak
 
VoID: Metadata for RDF Datasets
VoID: Metadata for RDF DatasetsVoID: Metadata for RDF Datasets
VoID: Metadata for RDF DatasetsRichard Cyganiak
 
Practical Cross-Dataset Queries with SPARQL (Introduction)
Practical Cross-Dataset Queries with SPARQL (Introduction)Practical Cross-Dataset Queries with SPARQL (Introduction)
Practical Cross-Dataset Queries with SPARQL (Introduction)Richard Cyganiak
 
Sigma EE: Reaping low-hanging fruits in RDF-based data integration
Sigma EE: Reaping low-hanging fruits in RDF-based data integrationSigma EE: Reaping low-hanging fruits in RDF-based data integration
Sigma EE: Reaping low-hanging fruits in RDF-based data integrationRichard Cyganiak
 
Investigating Community Implementation of the GoodRelations Ontology
Investigating Community Implementation of the GoodRelations OntologyInvestigating Community Implementation of the GoodRelations Ontology
Investigating Community Implementation of the GoodRelations OntologyRichard Cyganiak
 
How to get your data into Sindice and Google with sitemap4rdf
How to get your data into Sindice and Google with sitemap4rdfHow to get your data into Sindice and Google with sitemap4rdf
How to get your data into Sindice and Google with sitemap4rdfRichard Cyganiak
 
Self-Service Linked Government Data with dcat and Gridworks
Self-Service Linked Government Data with dcat and GridworksSelf-Service Linked Government Data with dcat and Gridworks
Self-Service Linked Government Data with dcat and GridworksRichard Cyganiak
 
The State of Linked Government Data
The State of Linked Government DataThe State of Linked Government Data
The State of Linked Government DataRichard Cyganiak
 
dcat: An RDF vocabulary for interoperability of data catalogues
dcat: An RDF vocabulary for interoperability of data cataloguesdcat: An RDF vocabulary for interoperability of data catalogues
dcat: An RDF vocabulary for interoperability of data cataloguesRichard Cyganiak
 

More from Richard Cyganiak (11)

EDF2012: The Web of Data and its Five Stars
EDF2012: The Web of Data and its Five StarsEDF2012: The Web of Data and its Five Stars
EDF2012: The Web of Data and its Five Stars
 
VoID: Metadata for RDF Datasets
VoID: Metadata for RDF DatasetsVoID: Metadata for RDF Datasets
VoID: Metadata for RDF Datasets
 
Practical Cross-Dataset Queries with SPARQL (Introduction)
Practical Cross-Dataset Queries with SPARQL (Introduction)Practical Cross-Dataset Queries with SPARQL (Introduction)
Practical Cross-Dataset Queries with SPARQL (Introduction)
 
How to Publish Open Data
How to Publish Open DataHow to Publish Open Data
How to Publish Open Data
 
Sigma EE: Reaping low-hanging fruits in RDF-based data integration
Sigma EE: Reaping low-hanging fruits in RDF-based data integrationSigma EE: Reaping low-hanging fruits in RDF-based data integration
Sigma EE: Reaping low-hanging fruits in RDF-based data integration
 
Investigating Community Implementation of the GoodRelations Ontology
Investigating Community Implementation of the GoodRelations OntologyInvestigating Community Implementation of the GoodRelations Ontology
Investigating Community Implementation of the GoodRelations Ontology
 
How to get your data into Sindice and Google with sitemap4rdf
How to get your data into Sindice and Google with sitemap4rdfHow to get your data into Sindice and Google with sitemap4rdf
How to get your data into Sindice and Google with sitemap4rdf
 
Self-Service Linked Government Data with dcat and Gridworks
Self-Service Linked Government Data with dcat and GridworksSelf-Service Linked Government Data with dcat and Gridworks
Self-Service Linked Government Data with dcat and Gridworks
 
The State of Linked Government Data
The State of Linked Government DataThe State of Linked Government Data
The State of Linked Government Data
 
What is SDMX-RDF?
What is SDMX-RDF?What is SDMX-RDF?
What is SDMX-RDF?
 
dcat: An RDF vocabulary for interoperability of data catalogues
dcat: An RDF vocabulary for interoperability of data cataloguesdcat: An RDF vocabulary for interoperability of data catalogues
dcat: An RDF vocabulary for interoperability of data catalogues
 

Recently uploaded

Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 

Recently uploaded (20)

Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 

SHACL: Shaping the Big Ball of Data Mud

  • 1. Shaping the Big Ball of Data Mud W3C's Shapes Constraint Language (SHACL) Richard Cyganiak Lotico Berlin Semantic Web Meetup, 17 November 2016
  • 4. Strengths Weaknesses • Flexible can-say-anything data model • Merging data is trivial • Shared, explicit meaning thanks to URIs • Mixing and matching of schemas; partial understanding • Painstakingly developed vocabularies • “Neutral ground” for modelling • SPARQL • Overgeneralisation: works for anything, but great at nothing • “RDF tax” • Logic foundations and web foundations can be baggage • Maps poorly to common programming language data structures • Schemaless nature makes optimisation difficult • Not good at semi-structured
  • 5. Application Areas • Knowledge graphs • Publishing • Life sciences • Fraud detection & identity management • Data integration & analysis The V’s of Big Data: Volume, Velocity, Variety
  • 7.
  • 9. RDF is supposedly self-describing. RDF
  • 14. R2RML
  • 16. Why is RDFS not enough? RDF SPARQL OWL RDFS
  • 17. Why is RDFS not enough? • RDF “Schema” — and schemas are for validation, right? • It’s a misnomer; should be “RDF Vocabulary Definition Language” • Very limited expressivity • Not the right semantics for validation • ex:capital range ex:City. ex:Berlin ex:capital ex:Germany => …? • Invalid data -> infer more invalid data => ex:Germany a ex:City RDFS
  • 18. Why is OWL not enough? (Diagram: RDF, SPARQL, OWL, RDFS.)
  • 19. Why is OWL not enough? • De facto a constraint language: logical contradiction => invalid • Very expressive • But targeted at logic modelling, not validity constraints • Not the right semantics for validation • Given ex:inCountry a owl:FunctionalProperty: ex:Dublin ex:inCountry ex:Ireland, ex:USA => …? • Open world assumption • No unique name assumption => ex:Ireland owl:sameAs ex:USA
  • 20. ICV: OWL closed-world semantics in Stardog
  • 21. Why is SPARQL not enough? (Diagram: RDF, SPARQL, OWL, RDFS.)
  • 22. Why is SPARQL not enough?
  • 24. Why is SPARQL not enough? • SPARQL ASK seems ideal for constraint validation • Very expressive • Efficient implementations • But writing even simple constraints can be tedious
  • 26. ShEx — Shape Expressions http://shex.io/
  • 29. SHACL Overview • A language for “checking RDF graphs against conditions” • Produced by W3C Data Shapes Working Group • Work in progress, some features at risk • 4th Working Draft: August 2016 • Should be done by June 2017 • Like RDFS and OWL, SHACL constraints are themselves written in RDF • SPARQL underneath (for evaluation semantics and extensibility)
  • 30. Example shape (Turtle):
        ex:PersonShape
            a sh:Shape ;
            sh:targetClass ex:Person ;
            sh:property [
                sh:predicate ex:ssn ;
                sh:maxCount 1 ;
                sh:datatype xsd:string ;
                sh:pattern "^\\d{3}-\\d{2}-\\d{4}$" ;
            ] ;
            sh:property [
                sh:predicate ex:child ;
                sh:class ex:Person ;
                sh:nodeKind sh:IRI ;
            ] ;
            sh:property [
                sh:path [ sh:inversePath ex:child ] ;
                sh:name "parent" ;
                sh:maxCount 2 ;
            ] .
  • 31. How a shape works (diagram: Dimitris Kontokostas)
  • 32. Targets: Initial selection of focus nodes • Node target • Class instance target • Subjects-of target • Objects-of target • SPARQL-based selection (advanced)
  • 33. Node constraints Constraints about the focus node itself: • Node kind (IRI, blank, literal) • IRI stem (namespace) • IRI regex • SPARQL query constraint (advanced)
  • 34. Property constraints Constraints about a certain outgoing or incoming property of the focus node(s): • Cardinality • Class • Datatype • Node kind (IRI, blank node, literal) • String min/max length, string regex • Numeric min/max • Value must match another shape • Value must not match another shape
  • 35. Other features • Combine constraints with logical OR/any (default: AND/all) • Property-pair comparison (=, <, >) • Severities (Violation, Warning, Info) • Annotations (name, description, grouping, order) • Define additional types of constraints based on SPARQL (advanced)
  • 36. Violation reports can be produced in RDF:
        ex:ExampleConstraintViolation
            a sh:ValidationResult ;
            sh:severity sh:Violation ;
            sh:focusNode ex:Bob ;
            sh:path ex:age ;
            sh:value "twenty two" ;
            sh:message "ex:age must be a literal of datatype xsd:integer." ;
            sh:sourceConstraintComponent sh:DatatypeConstraintComponent ;
            sh:sourceShape ex:PersonShape .
  • 37. Relationship to Rules • Rules: “If someone says this, then I say that.” • SHACL can’t do this. • Does not replace SWRL, Jena Rules, RIF, SPIN Rules
  • 39. SHACL in TopBraid Composer: Shapes + Constraints SHACL support is available in the TopBraid Composer Free Edition. http://www.topquadrant.com/downloads/
  • 40. SHACL in TopBraid Composer: SPARQL-based constraints
  • 41. SHACL in TopQuadrant’s web products (EVN, EDG)
  • 44. Repairing SKOS taxonomies with SHACL Validation of SKOS with SHACL, and extension of SHACL with specification of repair strategies. Christian Mader and Monika Solanki, http://ceur-ws.org/Vol-1666/paper-06.pdf
  • 46. Validating the “bag of crisps”… • Validation is often not about correct/incorrect or valid/invalid • Constraints-first (e.g., SQL) • Well-formed vs valid (e.g., XML Schema) • Validation is often about completeness and correctness for a specific purpose: “This is what I produce”; “This is what I understand” • Assumption is that there may be other statements • Different consumers may apply different constraints • SHACL should work well in this flexible, multi-source, multi-consumer world.
  • 47. “Anyone can say anything about anything” RDF SPARQL OWL RDFS Statements: What is being said? What words do we have? What makes logical sense to say? What did you say about XYZ? OWL SHACL Is that word used correctly? What do you need to know from me? You can't say that here! I’d never say that!
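The shape on slide 30 and the report on slide 36 can be approximated in a few lines of plain Python to show how a shape is evaluated. This is a hypothetical toy, not a SHACL engine and not the API of TopBraid or any other library: IRIs are prefixed-name strings, literals are a small `Literal` class, and only a class-instance target plus `sh:maxCount`, `sh:datatype`, `sh:pattern` and `sh:class` checks are modelled (no inverse paths, node kinds, subclass reasoning or datatype well-formedness checks). It does follow the order the slides describe: select focus nodes from the target, check each property constraint, and return structured results instead of a bare true/false.

```python
import re
from dataclasses import dataclass

# Toy model only: prefixed-name strings stand in for IRIs; all ex: terms are hypothetical.
RDF_TYPE = "rdf:type"


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str = "xsd:string"


@dataclass(frozen=True)
class ValidationResult:
    # Fields loosely mirror sh:focusNode, sh:path, sh:value, sh:sourceConstraintComponent, sh:severity.
    focus_node: str
    path: str
    value: object
    component: str
    severity: str = "sh:Violation"


def objects(graph, subject, predicate):
    """All o such that (subject, predicate, o) is in the graph."""
    return [o for (s, p, o) in graph if s == subject and p == predicate]


def class_target(graph, cls):
    """Class-instance target: subjects typed with cls (no rdfs:subClassOf reasoning)."""
    return sorted(s for (s, p, o) in graph if p == RDF_TYPE and o == cls)


def check_property(graph, focus, path, max_count=None, datatype=None, pattern=None, cls=None):
    """Check one property constraint on one focus node; return a list of results."""
    results = []
    values = objects(graph, focus, path)
    if max_count is not None and len(values) > max_count:
        results.append(ValidationResult(focus, path, None, "sh:MaxCountConstraintComponent"))
    for v in sorted(values, key=str):
        if datatype is not None and not (isinstance(v, Literal) and v.datatype == datatype):
            results.append(ValidationResult(focus, path, v, "sh:DatatypeConstraintComponent"))
        text = v.lexical if isinstance(v, Literal) else v
        if pattern is not None and re.search(pattern, text) is None:
            results.append(ValidationResult(focus, path, v, "sh:PatternConstraintComponent"))
        if cls is not None and (isinstance(v, Literal) or cls not in objects(graph, v, RDF_TYPE)):
            results.append(ValidationResult(focus, path, v, "sh:ClassConstraintComponent"))
    return results


# Hypothetical dict encoding of the first two property constraints of the slide-30 shape.
PERSON_SHAPE = {
    "target_class": "ex:Person",
    "properties": [
        {"path": "ex:ssn", "max_count": 1, "datatype": "xsd:string",
         "pattern": r"^\d{3}-\d{2}-\d{4}$"},
        {"path": "ex:child", "cls": "ex:Person"},
    ],
}


def validate(graph, shape):
    """Select focus nodes from the target, then run every property constraint on each."""
    results = []
    for focus in class_target(graph, shape["target_class"]):
        for prop in shape["properties"]:
            params = {k: v for k, v in prop.items() if k != "path"}
            results.extend(check_property(graph, focus, prop["path"], **params))
    return results


data = {
    ("ex:Alice", RDF_TYPE, "ex:Person"),
    ("ex:Alice", "ex:ssn", Literal("123-45-6789")),
    ("ex:Alice", "ex:child", "ex:Bob"),
    ("ex:Bob", RDF_TYPE, "ex:Person"),
    ("ex:Bob", "ex:ssn", Literal("12-345")),  # does not match the pattern
    ("ex:Bob", "ex:child", "ex:Carol"),       # ex:Carol is not typed ex:Person
}

if __name__ == "__main__":
    for r in validate(data, PERSON_SHAPE):
        print(r.severity, r.focus_node, r.path, r.value, r.component)
```

Here ex:Alice conforms, while ex:Bob yields two results: one for the malformed SSN and one because ex:Carol is not an ex:Person. Each result carries the focus node, path, value and source constraint component that slide 36 expresses in RDF, so a caller can attach it to a form field, count it, or filter it by severity.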

Editor's Notes

  1. It’s amazing how many people have done incredible work; this picture shows a massive effort. But there is some hype. Quite a few datasets are the product of a sloppy conversion script: results thrown into a SPARQL store, some haphazard links to DBpedia, a handful of SPARQL queries run as sanity checks, but no in-depth quality control at all. Lots of data quality issues. Querying within a dataset can be hard enough; querying across datasets is often impossible. If one dataset (e.g., DBpedia) changes, links break and are often never fixed.
  2. The talk will be about validation and SHACL, but I’d like to start by setting the scene. Where is the Semantic Web on the hype cycle? Arguably, it went over the bump twice already: with a focus on logic/AI around 2000, and with a focus on Linked Data around 2010. I helped to fan the flames of the second hype. The base standards are no longer that exciting today. Overblown expectations have cooled off; the Semantic Web is no longer expected to change the world. It’s getting stable and mature. Specific applications can be elsewhere on the cycle; see “Enterprise Taxonomy and Ontology Management”. That’s actually what TQ does.
  3. If you work with these technologies, life is pretty good these days, and still getting better. Maturing standards and tool support. And today we really understand what the technologies are good at, and what not.
  4. “Maps poorly to programming languages”: property names are not simple identifiers; every property can be multivalued; you need navigability along incoming and outgoing arcs; ordering is difficult. Semi-structured data is important in big data.
  5. We know where it works and where it doesn’t. It’s productive in a number of niches. RDF is good at dealing with Variety. (But not good enough: it lacks contextual validation, and fuzzy/statistical matching for the semi-structured stuff.) Variety tends to make logic approaches difficult, because there is no single global truth: less OWL, more SPARQL.
  6. Tim Berners-Lee deconstructing a bag of crisps. A perfect metaphor for the strengths of the SW. Different information co-exists on the packaging: the plain English “potato chips”; the nutrition information on the back, standardized by the U.S. Food and Drug Administration; some allergy information that many people don’t pay any attention to, but that those with allergies read very carefully; the UPC code that can be read by any retail checkout machine in the world; and some numbers on the bottom edge of the package that make no sense to him whatsoever. Mixing and matching of different vocabularies, standardised by different organisations, intended for different consumers. Partial understanding. Once you have agreed on an identifier for a thing *and a location for data about it*, different data producers and consumers can use it without stepping on each other’s toes.
  7. The two main open source implementations of the technology stack, Jena and Sesame, are now at the Apache Foundation and at the Eclipse Foundation—big, established, mature, enterprisey organisations.
  8. So life is pretty good. Maturing technology stack, clearly understood strengths and weaknesses, productive niches, improving tools. But… we never solved validation. That’s kind of surprising. After all, each of these technologies has aspects that address these needs. Let’s review them one by one.
  9. Every class and property has a URI. The URI references an ontology that defines the term. So each triple describes itself, right? One of the major strengths, right? No. Actually, most of the meaning is just not given in the ontology. Too much of the meaning is implicit, or just written down in text somewhere and cannot be automatically checked. Let me give examples.
  10. Arguably the most important ontology in existence. Examples of things they want to validate in a tool for webmasters came out of the workshop that kicked off the Data Shapes WG. See https://www.w3.org/2001/sw/wiki/images/0/00/SimpleApplication-SpecificConstraintsforRDFModels.pdf and https://www.w3.org/TR/shacl-ucr/#uc23-schema.org-constraints
  11. DC is widely used. It’s easy enough to agree on calling a title “dc:title” and an author “dc:creator”, but different orgs have widely differing views on what constitutes a complete metadata record. DC Application Profiles were the response. DC developed its own way to represent those. Not a standard, and not used outside the DC community.
  12. I’ve been involved. We wrote constraints in prose, and added SPARQL queries to make them more formal/explicit. And yes, people can copy-paste them. But there is still no way of just running all of them automatically against a published dataset! And no error reporting, just true/false.
  13. I’ve been involved. Mapping files are written in RDF; goal was to be very clear about what constitutes a valid mapping file. This is semi-formal. Surely this should be representable in some standard machine-readable way?
  14. Read/write Linked Data. Applications want to put constraints on the kind of data they can receive. An address book application wants to say that there should be an address in the RDF you PUT/POST. But the spec completely punted on saying how to achieve it. “machine-readable ones facilitate better client interaction”. No shit!
  15. So, lots of initiatives that are serious about using SW in an interoperable and robust way end up just putting constraints in prose text, where it should really be in a machine-processable form. Same problem everywhere! But we have RDF Schema. SCHEMA!
  16. RDF Schema was named in analogy to XML Schema, but the two really do very different things.
  17. So RDFS is just not powerful enough. But OWL surely gets us there?
  18. Clark & Parsia. Use OWL syntax, but switch to a semantics based on the closed-world assumption (CWA) and the unique name assumption (UNA). This works pretty well! But it can be a bit confusing: if you find some OWL, what semantics is intended? And OWL, while expressive, lacks some things that one would like to have in validation.
  19. We saw the Data Cube example where SPARQL was used to query the graph to see if it’s complete. Isn’t that enough to solve all validation issues?
  20. SPIN is a technology introduced by TQ. A bunch of things (rules written in SPARQL, templated SPARQL queries, defining custom SPARQL functions, etc.) We have used this for years and it actually works very well.
  21. Custom syntax. Somewhere between SPARQL, regular expressions, and grammar parsing. “regex for graphs.” Pretty cool. Concise. Needs new parsers.
  22. So, several good solutions around—but none has enough mindshare to take over. Meet at W3C, make a standard with the best aspects of each. (Or with the worst aspects of each—fingers crossed.)
  23. Some features and aspects are still highly controversial.
  24. When a violation occurs, the result is not just “false”. It’s a structure with info. You can process it in various ways. Just display it? Attach it to the right form field based on sh:path? Just count the violations per type in a large dataset? Different behaviour for different severities?
  25. It’s still early days. Mostly individuals and organisations that are active in the working group.
  26. SHACL is getting really important to our products. We have made a major contribution: TQ’s Holger Knublauch is one of the editors of the spec. TBC is an SW IDE, a workbench for SW professionals. At its heart is a schema/ontology editor. It supports editing of SHACL constraints through a nice UI.
  27. EVN is a taxonomy and ontology management platform. EDG a data governance solution. SHACL allows our customers to add custom constraints over their own data models. Very powerful.
  28. Note the suggestions for fixing the problem. Goes beyond standard SHACL but an obvious addition and very cool.
  29. Not sure how well maintained and if it’s following the spec.
  30. Semantic Web Company. Nice application that shows using SHACL for bulk validation. Automated repair—somewhat similar to our suggestions extension.
  31. Dimitris Kontokostas (SHACL spec editor) and team at Uni Leipzig. Organising entire test suites, expressed originally in SPARQL but now with SHACL support, for data quality of large data sets. Used in context of DBpedia.
  32. So, how do the parts of the stack fit together? High-level view. Let’s run with the metaphor that anyone can say anything about anything. First we should note: Just because you can say anything about anything doesn’t mean you should! RDF is triples. But we also call them RDF statements. Each triple is a statement of some fact.
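Notes 16 to 18, together with slides 17 and 19, contrast inference with validation. The toy sketch below, in plain Python with prefixed-name strings standing in for IRIs (every ex: term is hypothetical, and this is not any reasoner's API), applies the RDFS range rule (rdfs3) and the OWL functional-property inference to the two slide examples. It then applies the closed-world, unique-name reading that the notes attribute to Stardog ICV and that SHACL adopts. Under inference, bad data just produces more statements; under validation, it produces a report.

```python
from itertools import permutations

# Prefixed names stand in for IRIs; every ex: term is a hypothetical example.
RDF_TYPE = "rdf:type"
RDFS_RANGE = "rdfs:range"
OWL_FUNCTIONAL = "owl:FunctionalProperty"
OWL_SAME_AS = "owl:sameAs"


def rdfs_range_inference(graph):
    """RDFS rule rdfs3: (p rdfs:range C) and (s p o) entail (o rdf:type C)."""
    ranges = {(s, o) for (s, p, o) in graph if p == RDFS_RANGE}
    inferred = {(o, RDF_TYPE, cls)
                for (prop, cls) in ranges
                for (s, p, o) in graph if p == prop}
    return graph | inferred


def owl_functional_inference(graph):
    """Open world, no unique names: two values of a functional property denote one individual."""
    functional = {s for (s, p, o) in graph if p == RDF_TYPE and o == OWL_FUNCTIONAL}
    values = {}
    for (s, p, o) in graph:
        if p in functional:
            values.setdefault((s, p), set()).add(o)
    inferred = {(a, OWL_SAME_AS, b)
                for objs in values.values()
                for a, b in permutations(sorted(objs), 2)}
    return graph | inferred


def closed_world_range_check(graph):
    """Validation reading: report triples whose object is not already typed with the range."""
    ranges = {(s, o) for (s, p, o) in graph if p == RDFS_RANGE}
    return sorted((s, p, o)
                  for (prop, cls) in ranges
                  for (s, p, o) in graph
                  if p == prop and (o, RDF_TYPE, cls) not in graph)


def closed_world_max_one(graph, prop):
    """Validation reading with unique names: more than one value of prop is a violation."""
    counts = {}
    for (s, p, o) in graph:
        if p == prop:
            counts[s] = counts.get(s, 0) + 1
    return sorted(s for s, n in counts.items() if n > 1)


capital = {
    ("ex:capital", RDFS_RANGE, "ex:City"),
    ("ex:Berlin", RDF_TYPE, "ex:City"),
    ("ex:Berlin", "ex:capital", "ex:Germany"),  # subject and object swapped
}
dublin = {
    ("ex:inCountry", RDF_TYPE, OWL_FUNCTIONAL),
    ("ex:Dublin", "ex:inCountry", "ex:Ireland"),
    ("ex:Dublin", "ex:inCountry", "ex:USA"),
}

if __name__ == "__main__":
    print(sorted(rdfs_range_inference(capital) - capital))    # new "facts"
    print(closed_world_range_check(capital))                  # a violation
    print(sorted(owl_functional_inference(dublin) - dublin))  # new "facts"
    print(closed_world_max_one(dublin, "ex:inCountry"))       # a violation
```

Running `closed_world_range_check` on the output of `rdfs_range_inference` instead of the raw data returns an empty list. That is the problem slide 17 points at: once ex:Germany a ex:City has been inferred, the error is no longer visible to a check that runs afterwards.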