Shaping the Big Ball of Data Mud
W3C's Shapes Constraint Language (SHACL)
Richard Cyganiak
Lotico Berlin Semantic Web Meetup, 17 November 2016
Semantic Web
[Diagram: the technology stack RDF, RDFS, OWL, SPARQL]
Strengths
• Flexible can-say-anything data model
• Merging data is trivial
• Shared, explicit meaning thanks to URIs
• Mixing and matching of schemas; partial understanding
• Painstakingly developed vocabularies
• “Neutral ground” for modelling
• SPARQL
Weaknesses
• Overgeneralisation: works for anything, but great at nothing
• “RDF tax”
• Logic foundations and web foundations can be baggage
• Maps poorly to common programming language data structures
• Schemaless nature makes optimisation difficult
• Not good at semi-structured
Application Areas
• Knowledge graphs
• Publishing
• Life sciences
• Fraud detection & identity management
• Data integration & analysis
The V’s of Big Data: Volume, Velocity, Variety
https://www.w3.org/blog/2010/05/linked-data-its-is-not-like-th/
[Diagram: RDF, RDFS, OWL, SPARQL. Validation? Constraint checking?]
RDF is supposedly self-describing.
Schema.org
Simple Knowledge Organization System (SKOS)
Dublin Core
Data Cube Vocabulary
R2RML
Linked Data Platform (LDP)
Why is RDFS not enough?
• RDF “Schema” — and schemas are for validation, right?
• It’s a misnomer; should be “RDF Vocabulary Definition Language”
• Very limited expressivity
• Not the right semantics for validation
• ex:capital rdfs:range ex:City . ex:Berlin ex:capital ex:Germany . => …?
• Invalid data -> RDFS infers more invalid data
=> ex:Germany a ex:City
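A toy sketch of RDFS's range rule (rule rdfs3 in the RDF 1.1 Semantics), assuming triples are Python tuples of prefixed-name strings rather than parsed RDF; the function name is hypothetical and this is not a complete RDFS reasoner. It shows the behaviour described above: the wrong triple is not rejected, it is used to derive another wrong one.

```python
# Toy RDFS range inference (entailment rule rdfs3) over plain string triples.
# Prefixed names such as "ex:capital" are illustrative; no RDF library is used.
Triple = tuple[str, str, str]


def rdfs_range_closure(graph: set[Triple]) -> set[Triple]:
    """Apply 'p rdfs:range C . s p o .  =>  o rdf:type C .' until nothing new appears."""
    inferred = set(graph)
    while True:
        ranges = [(p, c) for (p, rel, c) in inferred if rel == "rdfs:range"]
        new = {
            (o, "rdf:type", c)
            for (p, c) in ranges
            for (_, pred, o) in inferred
            if pred == p
        } - inferred
        if not new:
            return inferred
        inferred |= new


data = {
    ("ex:capital", "rdfs:range", "ex:City"),
    ("ex:Berlin", "ex:capital", "ex:Germany"),  # subject and object swapped: bad data
}
closure = rdfs_range_closure(data)
# RDFS does not reject the bad triple; it derives a second wrong fact from it.
print(("ex:Germany", "rdf:type", "ex:City") in closure)  # True
```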
Why is OWL not enough?
• De facto a constraint language: logical contradiction => invalid
• Very expressive
• But targeted at logic modelling, not validity constraints
• Not the right semantics for validation
• ex:inCountry a owl:FunctionalProperty . ex:Dublin ex:inCountry ex:Ireland, ex:USA . => …?
• Open world assumption
• No unique name assumption
=> ex:Ireland owl:sameAs ex:USA
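A toy contrast between the two readings, assuming (as above) that ex:inCountry is functional; triples are plain string tuples and all function names are hypothetical. The OWL-style function derives owl:sameAs, as the slide warns, while a closed-world check that treats different names as different things simply reports the extra value, which is usually what validation should do.

```python
# Toy contrast: OWL-style inference vs. closed-world validation for one property,
# assuming ex:inCountry is declared an owl:FunctionalProperty (at most one value).
from collections import defaultdict
from itertools import combinations

Triple = tuple[str, str, str]


def values_by_subject(graph: set[Triple], prop: str) -> dict[str, list[str]]:
    """Group the objects of `prop` by subject (sorted, for deterministic output)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for s, p, o in graph:
        if p == prop:
            groups[s].append(o)
    return {s: sorted(objs) for s, objs in groups.items()}


def owl_functional_inference(graph: set[Triple], prop: str) -> set[Triple]:
    """Open world, no unique name assumption: all values of a functional
    property must denote the same individual, so derive owl:sameAs links."""
    return {
        (a, "owl:sameAs", b)
        for objs in values_by_subject(graph, prop).values()
        for a, b in combinations(objs, 2)
    }


def closed_world_max_one(graph: set[Triple], prop: str) -> list[str]:
    """Validation reading: different names are different things,
    so a second value is simply an error."""
    return [
        f"{s} has {len(objs)} values for {prop}"
        for s, objs in values_by_subject(graph, prop).items()
        if len(objs) > 1
    ]


data = {
    ("ex:Dublin", "ex:inCountry", "ex:Ireland"),
    ("ex:Dublin", "ex:inCountry", "ex:USA"),
}
print(owl_functional_inference(data, "ex:inCountry"))
# {('ex:Ireland', 'owl:sameAs', 'ex:USA')}
print(closed_world_max_one(data, "ex:inCountry"))
# ['ex:Dublin has 2 values for ex:inCountry']
```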
ICV: OWL closed-world semantics in Stardog
Why is SPARQL not enough?
SPIN (SPARQL Inferencing Notation): http://spinrdf.org/
• SPARQL ASK seems ideal for constraint validation
• Very expressive
• Efficient implementations
• But writing even simple constraints can be tedious
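To illustrate the tedium, here is a sketch of a hypothetical helper that emits a SPARQL 1.1 query for a single "at most N values" constraint; the example IRIs are placeholders. Even this simple check needs a property path to include subclass instances, grouping, a HAVING clause, and COUNT(DISTINCT ...) so that a node reachable through several rdf:type/rdfs:subClassOf routes is not double-counted.

```python
# Sketch: the hand-written SPARQL behind one "at most N values" constraint.
# max_count_query and the example IRIs are hypothetical; the output is plain SPARQL 1.1.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASSOF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"


def max_count_query(target_class: str, prop: str, max_count: int) -> str:
    """SELECT every instance of target_class (including subclass instances)
    that has more than max_count distinct values for prop."""
    return "\n".join([
        "SELECT ?this (COUNT(DISTINCT ?value) AS ?count)",
        "WHERE {",
        f"  ?this <{RDF_TYPE}>/<{RDFS_SUBCLASSOF}>* <{target_class}> .",
        f"  ?this <{prop}> ?value .",
        "}",
        "GROUP BY ?this",
        # DISTINCT guards against duplicate rows produced by the property path
        f"HAVING (COUNT(DISTINCT ?value) > {max_count})",
    ])


print(max_count_query("http://example.org/Person", "http://example.org/ssn", 1))
```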
Other proposals
ShEx — Shape Expressions
http://shex.io/
So, something new?
[Diagram: RDF, RDFS, OWL, SPARQL. Validation? Constraint checking?]
SHACL
Shapes Constraint Language
SHACL Overview
• A language for “checking RDF graphs against conditions”
• Produced by W3C Data Shapes Working Group
• Work in progress, some features at risk
• 4th Working Draft: August 2016
• Should be done by June 2017
• Like RDFS and OWL, SHACL constraints are themselves written in RDF
• SPARQL underneath (for evaluation semantics and extensibility)
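    @prefix sh:  <http://www.w3.org/ns/shacl#> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    @prefix ex:  <http://example.org/> .    # placeholder namespace

    # Vocabulary follows the W3C SHACL Recommendation (2017); the August 2016
    # draft shown in the talk used sh:Shape and sh:predicate instead.
    ex:PersonShape
        a sh:NodeShape ;
        sh:targetClass ex:Person ;
        sh:property [
            sh:path ex:ssn ;               # at most one SSN, formatted like 123-45-6789
            sh:maxCount 1 ;
            sh:datatype xsd:string ;
            sh:pattern "^\\d{3}-\\d{2}-\\d{4}$" ;
        ] ;
        sh:property [
            sh:path ex:child ;             # every child is an IRI-identified ex:Person
            sh:class ex:Person ;
            sh:nodeKind sh:IRI ;
        ] ;
        sh:property [
            sh:path [ sh:inversePath ex:child ] ;   # "parents": nodes linking here via ex:child
            sh:name "parent" ;
            sh:maxCount 2 ;
        ] .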
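The third property shape uses an inverse path: its value nodes are the subjects that point at the focus node via ex:child. A toy sketch of that evaluation plus the sh:maxCount 2 check, assuming string-tuple triples; the function names are hypothetical.

```python
# Toy evaluation of the third property shape above:
# sh:path [ sh:inversePath ex:child ] with sh:maxCount 2 (at most two parents).
Triple = tuple[str, str, str]


def inverse_path_values(graph: set[Triple], focus: str, prop: str) -> set[str]:
    """Value nodes of an inverse path: every subject s with (s, prop, focus) in the graph."""
    return {s for (s, p, o) in graph if p == prop and o == focus}


def max_count_ok(values: set[str], max_count: int) -> bool:
    return len(values) <= max_count


family = {
    ("ex:Alice", "ex:child", "ex:Carol"),
    ("ex:Bob", "ex:child", "ex:Carol"),
    ("ex:Dave", "ex:child", "ex:Carol"),  # a third "parent": should be reported
}
parents = inverse_path_values(family, "ex:Carol", "ex:child")
print(sorted(parents))            # ['ex:Alice', 'ex:Bob', 'ex:Dave']
print(max_count_ok(parents, 2))   # False
```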
How a Shape works
Diagram: Dimitris Kontokostas
Targets: Initial selection of focus nodes
• Node target
• Class instance target
• Subjects-of target
• Objects-of target
• SPARQL-based selection (advanced)
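A toy sketch of how the four basic target types pick focus nodes, assuming string-tuple triples. The kind strings mirror the Recommendation's sh:targetNode, sh:targetClass, sh:targetSubjectsOf and sh:targetObjectsOf, but the function and its signature are made up, and subclass instances are ignored for brevity (real class targets include them).

```python
# Toy focus-node selection for the four basic target types.
# The function and its (kind, value) arguments are a hypothetical simplification.
Triple = tuple[str, str, str]


def focus_nodes(graph: set[Triple], kind: str, value: str) -> set[str]:
    if kind == "targetNode":        # exactly the named node, even if absent from the data
        return {value}
    if kind == "targetClass":       # instances of the class (subclasses ignored here)
        return {s for (s, p, o) in graph if p == "rdf:type" and o == value}
    if kind == "targetSubjectsOf":  # every subject of the given property
        return {s for (s, p, o) in graph if p == value}
    if kind == "targetObjectsOf":   # every object of the given property
        return {o for (s, p, o) in graph if p == value}
    raise ValueError(f"unsupported target kind: {kind}")


graph = {
    ("ex:Alice", "rdf:type", "ex:Person"),
    ("ex:Alice", "ex:child", "ex:Carol"),
}
print(focus_nodes(graph, "targetClass", "ex:Person"))     # {'ex:Alice'}
print(focus_nodes(graph, "targetObjectsOf", "ex:child"))  # {'ex:Carol'}
```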
Node constraints
Constraints about the focus node itself:
• Node kind (IRI, blank node, literal)
• IRI stem (namespace)
• IRI regex
• SPARQL query constraint (advanced)
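A toy sketch of the first three node constraints on a single term. The Term class is a hypothetical stand-in for an RDF term, the example IRIs are placeholders, and Python's re module only approximates the XPath regular expressions that SHACL patterns use.

```python
# Toy node-constraint checks: node kind, IRI stem (namespace) and IRI regex.
# Term is a made-up stand-in for an RDF term, not a real library type.
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    kind: str   # "IRI", "BlankNode" or "Literal"
    value: str


def node_kind_ok(term: Term, allowed: set[str]) -> bool:
    return term.kind in allowed


def iri_stem_ok(term: Term, stem: str) -> bool:
    """IRI stem: the node must be an IRI inside the given namespace."""
    return term.kind == "IRI" and term.value.startswith(stem)


def iri_regex_ok(term: Term, pattern: str) -> bool:
    """IRI regex: unanchored search, as with sh:pattern; use ^...$ for a full match."""
    return term.kind == "IRI" and re.search(pattern, term.value) is not None


alice = Term("IRI", "http://example.org/people/alice")
anon = Term("BlankNode", "b0")
print(node_kind_ok(alice, {"IRI"}), node_kind_ok(anon, {"IRI"}))  # True False
print(iri_stem_ok(alice, "http://example.org/people/"))            # True
print(iri_regex_ok(alice, r"/people/[a-z]+$"))                     # True
```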
Property constraints
Constraints about a certain outgoing or incoming property of the focus node(s):
• Cardinality
• Class
• Datatype
• Node kind (IRI, blank node, literal)
• String min/max length, string regex
• Numeric min/max
• Value must match another shape
• Value must not match another shape
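A toy sketch of the cardinality, datatype and string-pattern checks, applied to the ex:ssn rules of the example shape. Values are assumed to be (lexical form, datatype) pairs, and check_property and its message texts are hypothetical.

```python
# Toy property-constraint checks in the spirit of sh:minCount, sh:maxCount,
# sh:datatype and sh:pattern, over (lexical form, datatype) pairs.
import re
from typing import Optional

LiteralValue = tuple[str, str]


def check_property(values: list[LiteralValue], *, min_count: int = 0,
                   max_count: Optional[int] = None, datatype: Optional[str] = None,
                   pattern: Optional[str] = None) -> list[str]:
    """Return a list of human-readable problems (empty if the values conform)."""
    problems = []
    if len(values) < min_count:
        problems.append(f"fewer than {min_count} value(s)")
    if max_count is not None and len(values) > max_count:
        problems.append(f"more than {max_count} value(s)")
    for lexical, dt in values:
        if datatype is not None and dt != datatype:
            problems.append(f"{lexical!r} is not of datatype {datatype}")
        if pattern is not None and re.search(pattern, lexical) is None:
            problems.append(f"{lexical!r} does not match {pattern}")
    return problems


ssn_rules = dict(max_count=1, datatype="xsd:string", pattern=r"^\d{3}-\d{2}-\d{4}$")
print(check_property([("123-45-6789", "xsd:string")], **ssn_rules))  # []
print(check_property([("123-45-6789", "xsd:string"), ("12345", "xsd:integer")], **ssn_rules))
```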
Other features
• Combine constraints with logical OR/any (default: AND/all)
• Property-pair comparison (=, <, >)
• Severities (Violation, Warning, Info)
• Annotations (name, description, grouping, order)
• Define additional types of constraints based on SPARQL (advanced)
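A toy sketch of two of these features: AND versus OR combination of checks, and a pairwise comparison in the spirit of sh:lessThan, where every value must be smaller than every value of the other property. All function names are hypothetical.

```python
# Toy AND/OR combination of checks and an sh:lessThan-style property-pair comparison.
from datetime import date
from typing import Any, Callable, Iterable

Check = Callable[[Any], bool]


def conforms_all(value: Any, checks: Iterable[Check]) -> bool:
    """Default combination: every constraint must hold."""
    return all(check(value) for check in checks)


def conforms_any(value: Any, checks: Iterable[Check]) -> bool:
    """OR combination: at least one alternative must hold."""
    return any(check(value) for check in checks)


def less_than(values: list, other_values: list) -> bool:
    """Every value must be smaller than every value of the other property."""
    return all(v < o for v in values for o in other_values)


def is_int(v: Any) -> bool:
    return isinstance(v, int)


def is_str(v: Any) -> bool:
    return isinstance(v, str)


print(conforms_any("42", [is_int, is_str]))                 # True
print(conforms_all("42", [is_int, is_str]))                 # False
print(less_than([date(2016, 11, 17)], [date(2017, 6, 30)]))  # True
```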
Violation reports can be produced in RDF
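    # Result properties follow the 2017 W3C Recommendation (sh:resultSeverity,
    # sh:resultPath, sh:resultMessage); the 2016 draft used sh:severity, sh:path, sh:message.
    ex:ExampleConstraintViolation
        a sh:ValidationResult ;
        sh:resultSeverity sh:Violation ;
        sh:focusNode ex:Bob ;
        sh:resultPath ex:age ;
        sh:value "twenty two" ;
        sh:resultMessage "ex:age must be a literal of datatype xsd:integer." ;
        sh:sourceConstraintComponent sh:DatatypeConstraintComponent ;
        sh:sourceShape ex:PersonShape .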
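A toy sketch that produces a result like the one above for an sh:datatype xsd:integer constraint and prints it as Turtle. Values are assumed to be (lexical form, datatype) pairs; the function names are hypothetical; property names follow the SHACL Recommendation; json.dumps is used only as a convenient way to quote simple string literals.

```python
# Toy producer of RDF validation results for an "xsd:integer" datatype constraint.
import json
import re

INTEGER = re.compile(r"[+-]?[0-9]+")   # lexical space of xsd:integer


def integer_results(focus: str, path: str, values: list[tuple[str, str]], shape: str) -> list[dict]:
    results = []
    for lexical, dt in values:
        if dt == "xsd:integer" and INTEGER.fullmatch(lexical):
            continue  # conforms: right datatype, well-formed lexical form
        quoted = json.dumps(lexical)
        results.append({
            "a": "sh:ValidationResult",
            "sh:resultSeverity": "sh:Violation",
            "sh:focusNode": focus,
            "sh:resultPath": path,
            "sh:value": quoted if dt == "xsd:string" else f"{quoted}^^{dt}",
            "sh:resultMessage": json.dumps(f"{path} must be a literal of datatype xsd:integer."),
            "sh:sourceConstraintComponent": "sh:DatatypeConstraintComponent",
            "sh:sourceShape": shape,
        })
    return results


def to_turtle(node: str, result: dict) -> str:
    """Render one result as a Turtle subject block (prefix declarations omitted)."""
    items = list(result.items())
    lines = [node]
    for i, (prop, obj) in enumerate(items):
        end = " ." if i == len(items) - 1 else " ;"
        lines.append(f"    {prop} {obj}{end}")
    return "\n".join(lines)


results = integer_results("ex:Bob", "ex:age", [("twenty two", "xsd:string")], "ex:PersonShape")
print(to_turtle("ex:ExampleConstraintViolation", results[0]))
```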
Relationship to Rules
• Rules: “If someone says this, then I say that.”
• SHACL can’t do this.
• Does not replace SWRL, Jena Rules, RIF, SPIN Rules
Uses and implementations
SHACL in TopBraid Composer: Shapes + Constraints
SHACL support is available in the TopBraid Composer Free Edition. http://www.topquadrant.com/downloads/
SHACL in TopBraid Composer: SPARQL-based constraints
SHACL in TopQuadrant’s web products (EVN, EDG)
SHACL Protégé Plugin
http://me-at-big.blogspot.de/2015/07/shacl4p-shapes-constraint-language.html
Repairing SKOS taxonomies with SHACL
Validation of SKOS with SHACL, and extension of SHACL with specification of repair strategies.
Christian Mader and Monika Solanki, http://ceur-ws.org/Vol-1666/paper-06.pdf
Validating the “bag of crisps”…
• Validation is often not about correct/incorrect or valid/invalid
• Constraints-first (e.g., SQL)
• Well-formed vs valid (e.g., XML Schema)
• Validation is often about completeness and correctness for a specific purpose: “This is what I produce”; “This is what I understand”
• The assumption is that there may be other statements
• Different consumers may apply different constraints
• SHACL should work well in this flexible, multi-source, multi-consumer world.
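A toy sketch of purpose-specific validation: two hypothetical consumers check the same data against different requirements, and statements neither of them asks about are ignored rather than treated as errors. Triples are string tuples and all names are made up.

```python
# Toy purpose-specific validation: same data, two consumers, different requirements.
Triple = tuple[str, str, str]

PUBLISHER_NEEDS = {"ex:name"}            # hypothetical consumer A
MAILER_NEEDS = {"ex:name", "ex:email"}   # hypothetical consumer B


def missing(graph: set[Triple], focus: str, required: set[str]) -> set[str]:
    """Properties this consumer needs that the focus node lacks; extra ones are fine."""
    present = {p for (s, p, o) in graph if s == focus}
    return required - present


data = {
    ("ex:Bob", "ex:name", "Bob"),
    ("ex:Bob", "ex:shoeSize", "44"),   # unknown to both consumers, not an error
}
print(missing(data, "ex:Bob", PUBLISHER_NEEDS))  # set()
print(missing(data, "ex:Bob", MAILER_NEEDS))     # {'ex:email'}
```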
“Anyone can say anything about anything”
[Diagram: RDF, RDFS, OWL, SPARQL and SHACL as ways of talking about statements: “Statements: What is being said?”, “What words do we have?”, “What makes logical sense to say?”, “What did you say about XYZ?”, “Is that word used correctly?”, “What do you need to know from me?”, “You can't say that here!”, “I’d never say that!”]
richard@topquadrant.com
Backup slides
SHACL: Shaping the Big Ball of Data Mud

More Related Content

What's hot

Inference on the Semantic Web
Inference on the Semantic WebInference on the Semantic Web
Inference on the Semantic WebMyungjin Lee
 
Validating RDF data: Challenges and perspectives
Validating RDF data: Challenges and perspectivesValidating RDF data: Challenges and perspectives
Validating RDF data: Challenges and perspectivesJose Emilio Labra Gayo
 
SPARQL in a nutshell
SPARQL in a nutshellSPARQL in a nutshell
SPARQL in a nutshellFabien Gandon
 
LOD(linked open data) part 1 lod 란 무엇인가
LOD(linked open data) part 1   lod 란 무엇인가LOD(linked open data) part 1   lod 란 무엇인가
LOD(linked open data) part 1 lod 란 무엇인가LiST Inc
 
Introduction To RDF and RDFS
Introduction To RDF and RDFSIntroduction To RDF and RDFS
Introduction To RDF and RDFSNilesh Wagmare
 
Apache Spark Training | Spark Tutorial For Beginners | Apache Spark Certifica...
Apache Spark Training | Spark Tutorial For Beginners | Apache Spark Certifica...Apache Spark Training | Spark Tutorial For Beginners | Apache Spark Certifica...
Apache Spark Training | Spark Tutorial For Beginners | Apache Spark Certifica...Edureka!
 
LOD連続講義 第5回「LODの作り方・使い方」
LOD連続講義 第5回「LODの作り方・使い方」LOD連続講義 第5回「LODの作り方・使い方」
LOD連続講義 第5回「LODの作り方・使い方」Fuyuko Matsumura
 
SPARQL-DL - Theory & Practice
SPARQL-DL - Theory & PracticeSPARQL-DL - Theory & Practice
SPARQL-DL - Theory & PracticeAdriel Café
 
Introduction to RDF
Introduction to RDFIntroduction to RDF
Introduction to RDFNarni Rajesh
 
SHACL in Apache jena - ApacheCon2020
SHACL in Apache jena - ApacheCon2020SHACL in Apache jena - ApacheCon2020
SHACL in Apache jena - ApacheCon2020andyseaborne
 
LODを使ってみよう!
LODを使ってみよう!LODを使ってみよう!
LODを使ってみよう!uedayou
 
Graph and RDF databases
Graph and RDF databasesGraph and RDF databases
Graph and RDF databasesNassim Bahri
 
RDFS In A Nutshell V1
RDFS In A Nutshell V1RDFS In A Nutshell V1
RDFS In A Nutshell V1Fabien Gandon
 

What's hot (20)

SPARQL Tutorial
SPARQL TutorialSPARQL Tutorial
SPARQL Tutorial
 
Inference on the Semantic Web
Inference on the Semantic WebInference on the Semantic Web
Inference on the Semantic Web
 
RDF data validation 2017 SHACL
RDF data validation 2017 SHACLRDF data validation 2017 SHACL
RDF data validation 2017 SHACL
 
RDF data model
RDF data modelRDF data model
RDF data model
 
Validating RDF data: Challenges and perspectives
Validating RDF data: Challenges and perspectivesValidating RDF data: Challenges and perspectives
Validating RDF data: Challenges and perspectives
 
SPARQL in a nutshell
SPARQL in a nutshellSPARQL in a nutshell
SPARQL in a nutshell
 
LOD(linked open data) part 1 lod 란 무엇인가
LOD(linked open data) part 1   lod 란 무엇인가LOD(linked open data) part 1   lod 란 무엇인가
LOD(linked open data) part 1 lod 란 무엇인가
 
HyperGraphQL
HyperGraphQLHyperGraphQL
HyperGraphQL
 
RDF Data Model
RDF Data ModelRDF Data Model
RDF Data Model
 
Introduction To RDF and RDFS
Introduction To RDF and RDFSIntroduction To RDF and RDFS
Introduction To RDF and RDFS
 
Apache Spark Training | Spark Tutorial For Beginners | Apache Spark Certifica...
Apache Spark Training | Spark Tutorial For Beginners | Apache Spark Certifica...Apache Spark Training | Spark Tutorial For Beginners | Apache Spark Certifica...
Apache Spark Training | Spark Tutorial For Beginners | Apache Spark Certifica...
 
LOD連続講義 第5回「LODの作り方・使い方」
LOD連続講義 第5回「LODの作り方・使い方」LOD連続講義 第5回「LODの作り方・使い方」
LOD連続講義 第5回「LODの作り方・使い方」
 
SPARQL Cheat Sheet
SPARQL Cheat SheetSPARQL Cheat Sheet
SPARQL Cheat Sheet
 
SPARQL-DL - Theory & Practice
SPARQL-DL - Theory & PracticeSPARQL-DL - Theory & Practice
SPARQL-DL - Theory & Practice
 
JSON-LD and SHACL for Knowledge Graphs
JSON-LD and SHACL for Knowledge GraphsJSON-LD and SHACL for Knowledge Graphs
JSON-LD and SHACL for Knowledge Graphs
 
Introduction to RDF
Introduction to RDFIntroduction to RDF
Introduction to RDF
 
SHACL in Apache jena - ApacheCon2020
SHACL in Apache jena - ApacheCon2020SHACL in Apache jena - ApacheCon2020
SHACL in Apache jena - ApacheCon2020
 
LODを使ってみよう!
LODを使ってみよう!LODを使ってみよう!
LODを使ってみよう!
 
Graph and RDF databases
Graph and RDF databasesGraph and RDF databases
Graph and RDF databases
 
RDFS In A Nutshell V1
RDFS In A Nutshell V1RDFS In A Nutshell V1
RDFS In A Nutshell V1
 

Similar to SHACL: Shaping the Big Ball of Data Mud

RDF SHACL, Annotations, and Data Frames
RDF SHACL, Annotations, and Data FramesRDF SHACL, Annotations, and Data Frames
RDF SHACL, Annotations, and Data FramesKurt Cagle
 
A Hands On Overview Of The Semantic Web
A Hands On Overview Of The Semantic WebA Hands On Overview Of The Semantic Web
A Hands On Overview Of The Semantic WebShamod Lacoul
 
8th TUC Meeting - Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...
8th TUC Meeting -  Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...8th TUC Meeting -  Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...
8th TUC Meeting - Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...LDBC council
 
A hands on overview of the semantic web
A hands on overview of the semantic webA hands on overview of the semantic web
A hands on overview of the semantic webMarakana Inc.
 
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the CloudFirst Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the CloudOntotext
 
SemanticWeb Nuts 'n Bolts
SemanticWeb Nuts 'n BoltsSemanticWeb Nuts 'n Bolts
SemanticWeb Nuts 'n BoltsRinke Hoekstra
 
A year on the Semantic Web @ W3C
A year on the Semantic Web @ W3CA year on the Semantic Web @ W3C
A year on the Semantic Web @ W3CIvan Herman
 
2016.02 - Validating RDF Data Quality using Constraints to Direct the Develop...
2016.02 - Validating RDF Data Quality using Constraints to Direct the Develop...2016.02 - Validating RDF Data Quality using Constraints to Direct the Develop...
2016.02 - Validating RDF Data Quality using Constraints to Direct the Develop...Dr.-Ing. Thomas Hartmann
 
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)Dr.-Ing. Thomas Hartmann
 
Understanding RDF: the Resource Description Framework in Context (1999)
Understanding RDF: the Resource Description Framework in Context  (1999)Understanding RDF: the Resource Description Framework in Context  (1999)
Understanding RDF: the Resource Description Framework in Context (1999)Dan Brickley
 
Eclipse RDF4J - Working with RDF in Java
Eclipse RDF4J - Working with RDF in JavaEclipse RDF4J - Working with RDF in Java
Eclipse RDF4J - Working with RDF in JavaJeen Broekstra
 
Making the semantic web work
Making the semantic web workMaking the semantic web work
Making the semantic web workPaul Houle
 

Similar to SHACL: Shaping the Big Ball of Data Mud (20)

RDF SHACL, Annotations, and Data Frames
RDF SHACL, Annotations, and Data FramesRDF SHACL, Annotations, and Data Frames
RDF SHACL, Annotations, and Data Frames
 
A Hands On Overview Of The Semantic Web
A Hands On Overview Of The Semantic WebA Hands On Overview Of The Semantic Web
A Hands On Overview Of The Semantic Web
 
8th TUC Meeting - Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...
8th TUC Meeting -  Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...8th TUC Meeting -  Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...
8th TUC Meeting - Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...
 
KIT Graduiertenkolloquium 11.05.2016
KIT Graduiertenkolloquium 11.05.2016KIT Graduiertenkolloquium 11.05.2016
KIT Graduiertenkolloquium 11.05.2016
 
A hands on overview of the semantic web
A hands on overview of the semantic webA hands on overview of the semantic web
A hands on overview of the semantic web
 
What's New in RDF 1.1?
What's New in RDF 1.1?What's New in RDF 1.1?
What's New in RDF 1.1?
 
Linked services
Linked servicesLinked services
Linked services
 
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the CloudFirst Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
 
SemanticWeb Nuts 'n Bolts
SemanticWeb Nuts 'n BoltsSemanticWeb Nuts 'n Bolts
SemanticWeb Nuts 'n Bolts
 
A year on the Semantic Web @ W3C
A year on the Semantic Web @ W3CA year on the Semantic Web @ W3C
A year on the Semantic Web @ W3C
 
SPIN and Shapes
SPIN and ShapesSPIN and Shapes
SPIN and Shapes
 
2016.02 - Validating RDF Data Quality using Constraints to Direct the Develop...
2016.02 - Validating RDF Data Quality using Constraints to Direct the Develop...2016.02 - Validating RDF Data Quality using Constraints to Direct the Develop...
2016.02 - Validating RDF Data Quality using Constraints to Direct the Develop...
 
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)
 
SPIN in Five Slides
SPIN in Five SlidesSPIN in Five Slides
SPIN in Five Slides
 
Linked Open Data
Linked Open DataLinked Open Data
Linked Open Data
 
Converting GHO to RDF
Converting GHO to RDFConverting GHO to RDF
Converting GHO to RDF
 
Understanding RDF: the Resource Description Framework in Context (1999)
Understanding RDF: the Resource Description Framework in Context  (1999)Understanding RDF: the Resource Description Framework in Context  (1999)
Understanding RDF: the Resource Description Framework in Context (1999)
 
Eclipse RDF4J - Working with RDF in Java
Eclipse RDF4J - Working with RDF in JavaEclipse RDF4J - Working with RDF in Java
Eclipse RDF4J - Working with RDF in Java
 
Introduction to RDF
Introduction to RDFIntroduction to RDF
Introduction to RDF
 
Making the semantic web work
Making the semantic web workMaking the semantic web work
Making the semantic web work
 

More from Richard Cyganiak

EDF2012: The Web of Data and its Five Stars
EDF2012: The Web of Data and its Five StarsEDF2012: The Web of Data and its Five Stars
EDF2012: The Web of Data and its Five StarsRichard Cyganiak
 
VoID: Metadata for RDF Datasets
VoID: Metadata for RDF DatasetsVoID: Metadata for RDF Datasets
VoID: Metadata for RDF DatasetsRichard Cyganiak
 
Practical Cross-Dataset Queries with SPARQL (Introduction)
Practical Cross-Dataset Queries with SPARQL (Introduction)Practical Cross-Dataset Queries with SPARQL (Introduction)
Practical Cross-Dataset Queries with SPARQL (Introduction)Richard Cyganiak
 
Sigma EE: Reaping low-hanging fruits in RDF-based data integration
Sigma EE: Reaping low-hanging fruits in RDF-based data integrationSigma EE: Reaping low-hanging fruits in RDF-based data integration
Sigma EE: Reaping low-hanging fruits in RDF-based data integrationRichard Cyganiak
 
Investigating Community Implementation of the GoodRelations Ontology
Investigating Community Implementation of the GoodRelations OntologyInvestigating Community Implementation of the GoodRelations Ontology
Investigating Community Implementation of the GoodRelations OntologyRichard Cyganiak
 
How to get your data into Sindice and Google with sitemap4rdf
How to get your data into Sindice and Google with sitemap4rdfHow to get your data into Sindice and Google with sitemap4rdf
How to get your data into Sindice and Google with sitemap4rdfRichard Cyganiak
 
Self-Service Linked Government Data with dcat and Gridworks
Self-Service Linked Government Data with dcat and GridworksSelf-Service Linked Government Data with dcat and Gridworks
Self-Service Linked Government Data with dcat and GridworksRichard Cyganiak
 
The State of Linked Government Data
The State of Linked Government DataThe State of Linked Government Data
The State of Linked Government DataRichard Cyganiak
 
dcat: An RDF vocabulary for interoperability of data catalogues
dcat: An RDF vocabulary for interoperability of data cataloguesdcat: An RDF vocabulary for interoperability of data catalogues
dcat: An RDF vocabulary for interoperability of data cataloguesRichard Cyganiak
 

More from Richard Cyganiak (11)

EDF2012: The Web of Data and its Five Stars
EDF2012: The Web of Data and its Five StarsEDF2012: The Web of Data and its Five Stars
EDF2012: The Web of Data and its Five Stars
 
VoID: Metadata for RDF Datasets
VoID: Metadata for RDF DatasetsVoID: Metadata for RDF Datasets
VoID: Metadata for RDF Datasets
 
Practical Cross-Dataset Queries with SPARQL (Introduction)
Practical Cross-Dataset Queries with SPARQL (Introduction)Practical Cross-Dataset Queries with SPARQL (Introduction)
Practical Cross-Dataset Queries with SPARQL (Introduction)
 
How to Publish Open Data
How to Publish Open DataHow to Publish Open Data
How to Publish Open Data
 
Sigma EE: Reaping low-hanging fruits in RDF-based data integration
Sigma EE: Reaping low-hanging fruits in RDF-based data integrationSigma EE: Reaping low-hanging fruits in RDF-based data integration
Sigma EE: Reaping low-hanging fruits in RDF-based data integration
 
Investigating Community Implementation of the GoodRelations Ontology
Investigating Community Implementation of the GoodRelations OntologyInvestigating Community Implementation of the GoodRelations Ontology
Investigating Community Implementation of the GoodRelations Ontology
 
How to get your data into Sindice and Google with sitemap4rdf
How to get your data into Sindice and Google with sitemap4rdfHow to get your data into Sindice and Google with sitemap4rdf
How to get your data into Sindice and Google with sitemap4rdf
 
Self-Service Linked Government Data with dcat and Gridworks
Self-Service Linked Government Data with dcat and GridworksSelf-Service Linked Government Data with dcat and Gridworks
Self-Service Linked Government Data with dcat and Gridworks
 
The State of Linked Government Data
The State of Linked Government DataThe State of Linked Government Data
The State of Linked Government Data
 
What is SDMX-RDF?
What is SDMX-RDF?What is SDMX-RDF?
What is SDMX-RDF?
 
dcat: An RDF vocabulary for interoperability of data catalogues
dcat: An RDF vocabulary for interoperability of data cataloguesdcat: An RDF vocabulary for interoperability of data catalogues
dcat: An RDF vocabulary for interoperability of data catalogues
 

Recently uploaded

SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 

Recently uploaded (20)

SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 

SHACL: Shaping the Big Ball of Data Mud

  • 1. Shaping the Big Ball of Data Mud W3C's Shapes Constraint Language (SHACL) Richard Cyganiak Lotico Berlin Semantic Web Meetup, 17 November 2016
  • 4. Strengths Weaknesses • Flexible can-say-anything data model • Merging data is trivial • Shared, explicit meaning thanks to URIs • Mixing and matching of schemas; partial understanding • Painstakingly developed vocabularies • “Neutral ground” for modelling • SPARQL • Overgeneralisation: works for anything, but great at nothing • “RDF tax” • Logic foundations and web foundations can be baggage • Maps poorly to common programming language data structures • Schemaless nature makes optimisation difficult • Not good at semi-structured
  • 5. Application Areas • Knowledge graphs • Publishing • Life sciences • Fraud detection & identity management • Data integration & analysis The V’s of Big Data: Volume, Velocity, Variety
  • 7.
  • 9. RDF is supposedly self-describing. RDF
  • 14. R2RML
  • 16. Why is RDFS not enough? RDF SPARQL OWL RDFS
  • 17. Why is RDFS not enough? • RDF “Schema” — and schemas are for validation, right? • It’s a misnomer; should be “RDF Vocabulary Definition Language” • Very limited expressivity • Not the right semantics for validation • ex:capital range ex:City. ex:Berlin ex:capital ex:Germany => …? • Invalid data -> infer more invalid data => ex:Germany a ex:City RDFS
  • 18. Why is OWL not enough? RDF SPARQL OWL RDFS
  • 19. Why is OWL not enough? • De facto a constraint language: logical contradiction => invalid • Very expressive • But targeted at logic modelling, not validity constraints • Not the right semantics for validation • ex:Dublin ex:inCountry ex:Ireland, ex:USA => …? • Open world assumption • No unique name assumption => ex:Ireland owl:sameAs ex:USA OWL
  • 20. ICV: OWL closed-world semantics in Stardog
  • 21. Why is SPARQL not enough? RDF SPARQL OWL RDFS
  • 22. Why is SPARQL not enough? SPARQL
  • 24. Why is SPARQL not enough? • SPARQL ASK seems ideal for constraint validation • Very expressive • Efficient implementations • But writing even simple constraints can be tedious SPARQL
  • 26. ShEx — Shape Expressions http://shex.io/
  • 29. SHACL Overview • A language for “checking RDF graphs against conditions” • Produced by W3C Data Shapes Working Group • Work in progress, some features at risk • 4th Working Draft: August 2016 • Should be done by June 2017 • Like RDFS and OWL, SHACL constraints are themselves written in RDF • SPARQL underneath (for evaluation semantics and extensibility)
  • 30. ex:PersonShape
          a sh:Shape ;
          sh:targetClass ex:Person ;
          sh:property [
              sh:predicate ex:ssn ;
              sh:maxCount 1 ;
              sh:datatype xsd:string ;
              sh:pattern "^\\d{3}-\\d{2}-\\d{4}$" ;
          ] ;
          sh:property [
              sh:predicate ex:child ;
              sh:class ex:Person ;
              sh:nodeKind sh:IRI ;
          ] ;
          sh:property [
              sh:path [ sh:inversePath ex:child ] ;
              sh:name "parent" ;
              sh:maxCount 2 ;
          ] .
  • 31. How a Shape works Diagram: Dimitris Kontokostas
  • 32. Targets: Initial selection of focus nodes • Node target • Class instance target • Subjects-of target • Objects-of target • SPARQL-based selection (advanced)
  • 33. Node constraints Constraints about the focus node itself: • Node kind (IRI, blank, literal) • IRI stem (namespace) • IRI regex • SPARQL query constraint (advanced)
  • 34. Property constraints Constraints about a certain outgoing or incoming property of the focus node(s): • Cardinality • Class • Datatype • Node kind (IRI, blank node, literal) • String min/max length, string regex • Numeric min/max • Value must match another shape • Value must not match another shape
  • 35. Other features • Combine constraints with logical OR/any (default: AND/all) • Property-pair comparison (=, <, >) • Severities (Violation, Warning, Info) • Annotations (name, description, grouping, order) • Define additional types of constraints based on SPARQL (advanced)
  • 36. Violation reports can be produced in RDF:
      ex:ExampleConstraintViolation
          a sh:ValidationResult ;
          sh:severity sh:Violation ;
          sh:focusNode ex:Bob ;
          sh:path ex:age ;
          sh:value "twenty two" ;
          sh:message "ex:age must be literal of datatype xsd:integer." ;
          sh:sourceConstraintComponent sh:DatatypeConstraintComponent ;
          sh:sourceShape ex:PersonShape .
  • 37. Relationship to Rules • Rules: “If someone says this, then I say that.” • SHACL can’t do this. • Does not replace SWRL, Jena Rules, RIF, SPIN Rules
  • 39. SHACL in TopBraid Composer: Shapes + Constraints SHACL support is available in the TopBraid Composer Free Edition. http://www.topquadrant.com/downloads/
  • 40. SHACL in TopBraid Composer: SPARQL-based constraints
  • 41. SHACL in TopQuadrant’s web products (EVN, EDG)
  • 44. Repairing SKOS taxonomies with SHACL Validation of SKOS with SHACL, and extension of SHACL with specification of repair strategies. Christian Mader and Monika Solanki, http://ceur-ws.org/Vol-1666/paper-06.pdf
  • 46. Validating the “bag of crisps”… • Validation is often not about correct/incorrect or valid/invalid • Constraints-first (e.g., SQL) • Well-formed vs valid (e.g., XML Schema) • Validation is often about completeness and correctness for a specific purpose: “This is what I produce”; “This is what I understand” • Assumption is that there may be other statements • Different consumers may apply different constraints • SHACL should work well in this flexible, multi-source, multi-consumer world.
  • 47. “Anyone can say anything about anything” RDF SPARQL OWL RDFS Statements: What is being said? What words do we have? What makes logical sense to say? What did you say about XYZ? OWL SHACL Is that word used correctly? What do you need to know from me? You can't say that here! I’d never say that!
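Slide 17’s RDFS example can be made concrete with a small Python sketch. It treats triples as plain tuples of prefixed names (a simplification for brevity, not how a triple store works) and contrasts two readings of `ex:capital rdfs:range ex:City`. The RDFS entailment rule for `rdfs:range` adds `ex:Germany a ex:City`, whereas a hypothetical closed-world check reports the same triple as a problem instead.

```python
RDF_TYPE = "rdf:type"
RDFS_RANGE = "rdfs:range"


def rdfs_range_closure(triples):
    """Apply the RDFS rdfs:range entailment rule until no new triples are added."""
    graph = set(triples)
    while True:
        ranges = {(s, o) for (s, p, o) in graph if p == RDFS_RANGE}
        inferred = {
            (o, RDF_TYPE, cls)
            for (prop, cls) in ranges
            for (s, p, o) in graph
            if p == prop
        }
        if inferred <= graph:  # fixpoint reached
            return graph
        graph |= inferred


def range_violations(triples):
    """Hypothetical validation reading: flag values that are not already typed."""
    graph = set(triples)
    ranges = {(s, o) for (s, p, o) in graph if p == RDFS_RANGE}
    return sorted(
        (s, p, o, cls)
        for (prop, cls) in ranges
        for (s, p, o) in graph
        if p == prop and (o, RDF_TYPE, cls) not in graph
    )


data = {
    ("ex:capital", RDFS_RANGE, "ex:City"),
    ("ex:Berlin", "ex:capital", "ex:Germany"),  # subject and object swapped by mistake
    ("ex:Germany", RDF_TYPE, "ex:Country"),
}

print(("ex:Germany", RDF_TYPE, "ex:City") in rdfs_range_closure(data))  # True
print(range_violations(data))  # [('ex:Berlin', 'ex:capital', 'ex:Germany', 'ex:City')]
```

Under inference the mistake spreads into a new (wrong) type; under the check it surfaces as a report about the original triple.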
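The OWL example on slide 19 only yields `owl:sameAs` if `ex:inCountry` may have at most one value, for instance because it is declared an `owl:FunctionalProperty`. That axiom is an assumption in this sketch. The Python code contrasts the open-world, no-unique-name reading (merge the two countries) with a closed-world, unique-name reading of the kind slide 20 attributes to Stardog ICV (report a violation). Triples are plain tuples, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import permutations

# Assumption: ex:inCountry is declared owl:FunctionalProperty (at most one value).
FUNCTIONAL_PROPERTIES = {"ex:inCountry"}


def values_by_subject(triples):
    """Group the values of each functional property per subject."""
    values = defaultdict(set)
    for s, p, o in triples:
        if p in FUNCTIONAL_PROPERTIES:
            values[(s, p)].add(o)
    return values


def owl_inference(triples):
    """Open world, no unique name assumption: distinct values must be one individual."""
    return {
        (a, "owl:sameAs", b)
        for objs in values_by_subject(triples).values()
        for a, b in permutations(sorted(objs), 2)
    }


def closed_world_violations(triples):
    """Closed world plus unique names: more than one value is simply an error."""
    return [
        (s, p, sorted(objs))
        for (s, p), objs in sorted(values_by_subject(triples).items())
        if len(objs) > 1
    ]


data = [
    ("ex:Dublin", "ex:inCountry", "ex:Ireland"),
    ("ex:Dublin", "ex:inCountry", "ex:USA"),
]

print(owl_inference(data))
print(closed_world_violations(data))  # [('ex:Dublin', 'ex:inCountry', ['ex:Ireland', 'ex:USA'])]
```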
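Slide 24 notes that hand-writing even simple constraints in SPARQL gets tedious. One way to see this is a hypothetical Python helper that generates an ASK query which returns true when some instance has more than `max_count` distinct values for a property (so `true` means invalid). The `ex:` namespace IRI is an assumption. The number of `FILTER` clauses grows quadratically with `max_count`, which is the kind of boilerplate a declarative `sh:maxCount` hides.

```python
from itertools import combinations


def max_count_violation_ask(target_class, prop, max_count):
    """Build a SPARQL ASK query that is true if any instance of target_class
    has more than max_count distinct values for prop."""
    variables = [f"?v{i}" for i in range(max_count + 1)]
    # Every pair of value variables must bind to different RDF terms.
    filters = "\n".join(
        f"  FILTER (!sameTerm({a}, {b}))" for a, b in combinations(variables, 2)
    )
    return (
        "PREFIX ex: <http://example.org/>\n"  # assumed namespace for this sketch
        "ASK {\n"
        f"  ?focus a {target_class} ;\n"
        f"         {prop} {', '.join(variables)} .\n"
        f"{filters}\n"
        "}"
    )


print(max_count_violation_ask("ex:Person", "ex:ssn", 1))
```

For `max_count=3` the same helper already emits six `FILTER` lines, and a real validation setup would also need to report which node failed, not just true/false.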
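To show what slide 30’s `ex:PersonShape` asks of the data, here is a hand-written Python sketch that evaluates the same constraints over a tiny graph of tuples. It is not a SHACL engine: literals are a hypothetical `Literal` class, blank nodes are strings starting with `_:`, `sh:class` ignores subclasses, and each constraint is checked with ad hoc code. In Turtle the regex backslashes are escaped (`\\d`); in a Python raw string they are written as `\d`.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """Hypothetical minimal literal: lexical form plus datatype."""
    value: str
    datatype: str = "xsd:string"


SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")


def objects(graph, s, p):
    return [o for (s2, p2, o) in graph if s2 == s and p2 == p]


def subjects(graph, p, o):
    return [s for (s, p2, o2) in graph if p2 == p and o2 == o]


def is_iri(node):
    # Sketch convention: IRIs are strings, blank nodes are strings starting with "_:".
    return isinstance(node, str) and not node.startswith("_:")


def validate_person_shape(graph):
    """Return (focus node, path, constraint) tuples, one per violation found."""
    people = {s for (s, p, o) in graph if p == "rdf:type" and o == "ex:Person"}
    results = []
    for person in sorted(people):
        ssns = objects(graph, person, "ex:ssn")
        if len(ssns) > 1:
            results.append((person, "ex:ssn", "sh:maxCount"))
        for v in ssns:
            if not (isinstance(v, Literal) and v.datatype == "xsd:string"):
                results.append((person, "ex:ssn", "sh:datatype"))
            if not (isinstance(v, Literal) and SSN_PATTERN.search(v.value)):
                results.append((person, "ex:ssn", "sh:pattern"))
        for child in objects(graph, person, "ex:child"):
            if not is_iri(child):
                results.append((person, "ex:child", "sh:nodeKind"))
            if child not in people:
                results.append((person, "ex:child", "sh:class"))
        # Inverse path: how many nodes point at this person via ex:child?
        if len(subjects(graph, "ex:child", person)) > 2:
            results.append((person, "^ex:child", "sh:maxCount"))
    return results


graph = {
    ("ex:Alice", "rdf:type", "ex:Person"),
    ("ex:Alice", "ex:ssn", Literal("123-45-6789")),
    ("ex:Alice", "ex:child", "ex:Bob"),
    ("ex:Bob", "rdf:type", "ex:Person"),
    ("ex:Bob", "ex:ssn", Literal("12-345")),
    ("ex:Bob", "ex:child", "_:b1"),
}

for result in validate_person_shape(graph):
    print(result)
```

Alice conforms; Bob gets a pattern violation for his SSN and node-kind and class violations for the blank-node child.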
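Slide 32 lists the ways a shape selects its initial focus nodes. Below is a rough Python sketch of the four non-SPARQL targets. The parameter names loosely mirror `sh:targetNode`, `sh:targetClass`, `sh:targetSubjectsOf` and `sh:targetObjectsOf`, but the function itself is hypothetical. Following the SHACL notion of class instances, the class target also picks up instances of subclasses via `rdfs:subClassOf`.

```python
def subclass_closure(graph, cls):
    """cls plus all of its transitive rdfs:subClassOf descendants."""
    result, todo = {cls}, [cls]
    while todo:
        current = todo.pop()
        for s, p, o in graph:
            if p == "rdfs:subClassOf" and o == current and s not in result:
                result.add(s)
                todo.append(s)
    return result


def select_focus_nodes(graph, target_nodes=(), target_classes=(),
                       subjects_of=(), objects_of=()):
    """Union of the focus nodes produced by each kind of target."""
    focus = set(target_nodes)  # node targets need not occur in the graph
    for cls in target_classes:
        classes = subclass_closure(graph, cls)
        focus |= {s for (s, p, o) in graph if p == "rdf:type" and o in classes}
    focus |= {s for (s, p, o) in graph if p in subjects_of}
    focus |= {o for (s, p, o) in graph if p in objects_of}
    return focus


graph = {
    ("ex:Student", "rdfs:subClassOf", "ex:Person"),
    ("ex:Alice", "rdf:type", "ex:Person"),
    ("ex:Carol", "rdf:type", "ex:Student"),
    ("ex:Alice", "ex:knows", "ex:Dave"),
}

print(sorted(select_focus_nodes(graph, target_classes=["ex:Person"])))  # ['ex:Alice', 'ex:Carol']
print(sorted(select_focus_nodes(graph, objects_of=["ex:knows"])))  # ['ex:Dave']
```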
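The node constraints on slide 33 test the focus node itself. The Python sketch below checks node kind, an IRI stem (namespace) and an IRI regex. Nodes follow a hypothetical convention: full IRIs are strings, blank nodes are strings starting with `_:`, and literals are a small `Literal` class. The returned labels are illustrative rather than official SHACL component names, and the `example.org` namespace is made up.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    value: str


def node_kind(node):
    if isinstance(node, Literal):
        return "sh:Literal"
    if node.startswith("_:"):
        return "sh:BlankNode"
    return "sh:IRI"


def check_node(node, kind=None, stem=None, iri_regex=None):
    """Return the names of the node constraints that the node violates."""
    errors = []
    is_iri = node_kind(node) == "sh:IRI"
    if kind is not None and node_kind(node) != kind:
        errors.append("nodeKind")
    # Stem and regex only make sense for IRIs; anything else fails them.
    if stem is not None and not (is_iri and node.startswith(stem)):
        errors.append("stem")
    if iri_regex is not None and not (is_iri and re.search(iri_regex, node)):
        errors.append("pattern")
    return errors


PEOPLE = "http://example.org/people/"  # hypothetical namespace

print(check_node(PEOPLE + "alice", kind="sh:IRI", stem=PEOPLE))  # []
print(check_node("_:b0", kind="sh:IRI", stem=PEOPLE))  # ['nodeKind', 'stem']
print(check_node("http://other.org/x", iri_regex=r"^http://example\.org/"))  # ['pattern']
```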
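Slide 34’s property constraints apply to the set of values reached via a property. A minimal Python sketch of the cardinality, string-length and numeric-range checks follows. The keyword names echo `sh:minCount`, `sh:maxCount`, `sh:minLength`, `sh:maxLength`, `sh:minInclusive` and `sh:maxInclusive`; values are plain Python strings and numbers for simplicity, whereas a real engine works on RDF literals and their datatypes.

```python
def check_property(values, min_count=None, max_count=None,
                   min_length=None, max_length=None,
                   min_inclusive=None, max_inclusive=None):
    """Return (constraint, offending value) pairs; None marks set-level checks."""
    errors = []
    # Cardinality looks at the whole value set.
    if min_count is not None and len(values) < min_count:
        errors.append(("minCount", None))
    if max_count is not None and len(values) > max_count:
        errors.append(("maxCount", None))
    # The remaining checks look at each value on its own.
    for v in values:
        if min_length is not None and len(str(v)) < min_length:
            errors.append(("minLength", v))
        if max_length is not None and len(str(v)) > max_length:
            errors.append(("maxLength", v))
        is_number = isinstance(v, (int, float))
        if min_inclusive is not None and not (is_number and v >= min_inclusive):
            errors.append(("minInclusive", v))
        if max_inclusive is not None and not (is_number and v <= max_inclusive):
            errors.append(("maxInclusive", v))
    return errors


print(check_property(["Alice"], min_count=1, max_count=1, min_length=2))  # []
print(check_property([17, 150], max_count=1, min_inclusive=18, max_inclusive=120))
# [('maxCount', None), ('minInclusive', 17), ('maxInclusive', 150)]
```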
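Slide 35 mentions combining constraints with OR instead of the default AND, and comparing the values of two properties. The Python sketch below models each constraint as a plain predicate (a hypothetical simplification) to show the difference between all-of and any-of. It adds a pairwise check in the spirit of `sh:lessThan`, applied here to made-up start and end dates.

```python
from datetime import date


# Hypothetical constraints, each modelled as a predicate on one value.
def is_string(v):
    return isinstance(v, str)


def is_integer(v):
    return isinstance(v, int)


def conforms_all(value, constraints):
    """Default combination: every constraint must hold (AND)."""
    return all(c(value) for c in constraints)


def conforms_any(value, constraints):
    """OR-combination (like sh:or): at least one alternative must hold."""
    return any(c(value) for c in constraints)


def less_than_violations(left_values, right_values):
    """Every value of the left property must be < every value of the right one."""
    return [(a, b) for a in left_values for b in right_values if not a < b]


print(conforms_all(42, [is_string, is_integer]))  # False
print(conforms_any(42, [is_string, is_integer]))  # True
start, end = [date(2016, 11, 17)], [date(2016, 11, 16)]
print(less_than_violations(start, end))
```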
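Because the results on slide 36 are structured rather than a bare true/false, an application can process them in different ways, as the speaker notes suggest: count them per severity, or attach them to form fields via `sh:path`. The Python sketch below assumes the results have already been parsed into dictionaries whose keys mirror the `sh:` properties. The second and third results are made up for illustration. Treating only `sh:Violation` as blocking is an application policy assumed here, not something SHACL prescribes.

```python
from collections import Counter, defaultdict

# Results as parsed dictionaries (assumed shape); keys mirror the sh: properties.
results = [
    {"severity": "sh:Violation", "focusNode": "ex:Bob", "path": "ex:age",
     "message": "ex:age must be literal of datatype xsd:integer."},
    {"severity": "sh:Warning", "focusNode": "ex:Bob", "path": "ex:email",
     "message": "ex:email should be present."},  # made-up example
    {"severity": "sh:Violation", "focusNode": "ex:Alice", "path": "ex:age",
     "message": "ex:age must be literal of datatype xsd:integer."},  # made-up example
]


def count_by_severity(results):
    """Summary for bulk validation of a large dataset."""
    return Counter(r["severity"] for r in results)


def messages_by_field(results, focus_node):
    """Messages for one resource, keyed by sh:path, e.g. to show next to form fields."""
    fields = defaultdict(list)
    for r in results:
        if r["focusNode"] == focus_node:
            fields[r["path"]].append(r["message"])
    return dict(fields)


def blocks_save(results):
    """Application policy (assumption): only violations stop the user from saving."""
    return any(r["severity"] == "sh:Violation" for r in results)


print(count_by_severity(results))  # Counter({'sh:Violation': 2, 'sh:Warning': 1})
print(messages_by_field(results, "ex:Bob"))
print(blocks_save(results))  # True
```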

Editor's Notes

  1. It’s amazing how many people have done incredible work. Massive effort shown in this pic. But there is some hype. Quite a few datasets are just the output of a sloppy conversion script, thrown into a SPARQL store with some haphazard links to DBpedia. A handful of SPARQL queries are run as sanity checks, but there is no in-depth quality control at all. Lots of data quality issues. Querying within a dataset can be hard enough; querying across datasets is often impossible. If one dataset (e.g., DBpedia) changes, links break and often are never fixed.
  2. The talk will be about validation and SHACL, but I’d like to start by setting the scene. Where is the Semantic Web on the hype cycle? Arguably, it has gone over the bump twice already: with a focus on logic/AI around 2000, and with a focus on Linked Data around 2010. I helped to fan the flames of the second hype. As for the base standards: today they are no longer that exciting. Overblown expectations have cooled off. It’s no longer expected to change the world. It’s getting stable and mature. Specific applications can be elsewhere on the cycle. See “Enterprise Taxonomy and Ontology Management”. That’s actually what TQ does.
  3. If you work with these technologies, life is pretty good these days, and still getting better. Maturing standards and tool support. And today we really understand what the technologies are good at, and what not.
  4. “Maps poorly to programming languages”: property names are not simple identifiers; every property can be multivalued; you need navigability along both incoming and outgoing arcs; ordering is difficult. Semi-structured data is important in big data.
  5. We know where it works and where it doesn’t. It’s productive in a number of niches. RDF is good at dealing with Variety. (But not good enough at contextual validation, or at fuzzy/statistical matching for the semi-structured stuff.) Variety tends to make logic approaches difficult, because there is no single global truth: less OWL, more SPARQL.
  6. Tim Berners-Lee deconstructing a bag of crisps. A perfect metaphor for the strengths of the SW. Different information co-exists on the packaging: the plain English “potato chips”; the nutrition information on the back, standardized by the U.S. Food and Drug Administration; some allergy information that many people don’t pay any attention to, but that those with allergies read very carefully; the UPC code that can be read by any retail checkout machine in the world; and some numbers on the bottom edge of the package that make no sense to him whatsoever. Mixing and matching of different vocabularies, standardised by different organisations, intended for different consumers. Partial understanding. Once you have agreed on an identifier for a thing *and a location for data about it*, different data producers and consumers can use it without stepping on each other's toes.
  7. The two main open source implementations of the technology stack, Jena and Sesame, are now at the Apache Foundation and at the Eclipse Foundation—big, established, mature, enterprisey organisations.
  8. So life is pretty good: a maturing technology stack, clearly understood strengths and weaknesses, productive niches, improving tools. But… we never solved validation. That’s kind of surprising. After all, each of these technologies has aspects that address these needs. Let’s review them one by one.
  9. Every class and property has a URI. The URI references an ontology that defines the term. So each triple describes itself, right? One of the major strengths, right? No. Actually, most of the meaning is just not given in the ontology. Too much of the meaning is implicit, or just written down in text somewhere and cannot be automatically checked. Let me give examples.
  10. Arguably the most important ontology in existence. These are examples of things they want to validate in a tool for webmasters; they came out of the workshop that kicked off the Data Shapes WG. See https://www.w3.org/2001/sw/wiki/images/0/00/SimpleApplication-SpecificConstraintsforRDFModels.pdf https://www.w3.org/TR/shacl-ucr/#uc23-schema.org-constraints
  11. DC is widely used. It’s easy enough to agree on calling a title “dc:title” and an author “dc:creator”, but different orgs have widely differing views on what constitutes a complete metadata record. DC Application profiles as a response. DC developed its own way to represent those. Not a standard, not used apart from the DC community.
  12. I’ve been involved. We wrote the constraints in prose, and added SPARQL queries to make them more formal/explicit. And yes, people can copy-paste them. But there is still no way of just running all of them automatically against a published dataset! And no error reporting, just true/false.
  13. I’ve been involved. Mapping files are written in RDF; goal was to be very clear about what constitutes a valid mapping file. This is semi-formal. Surely this should be representable in some standard machine-readable way?
  14. Read/write Linked Data. Applications want to put constraints on the kind of data they can receive. An address book application wants to say that there should be an address in the RDF you PUT/POST. But the spec completely punted on saying how to achieve this: “machine-readable ones facilitate better client interaction”. No shit!
  15. So, lots of initiatives that are serious about using SW in an interoperable and robust way end up just putting constraints in prose text, where it should really be in a machine-processable form. Same problem everywhere! But we have RDF Schema. SCHEMA!
  16. RDF Schema in analogy to XML Schema, but they really do very different things.
  17. So RDFS is just not powerful enough. But OWL surely gets us there?
  18. Clark & Parsia. Use OWL syntax, but switch to a semantics based on the closed-world assumption (CWA) and the unique name assumption (UNA). This works pretty well! But it can be a bit confusing: if you find some OWL, which semantics is intended? And OWL, while expressive, lacks some things that one would like to have in validation.
  19. We saw the Data Cube example where SPARQL was used to query the graph to see if it’s complete. Isn’t that enough to solve all validation issues?
  20. SPIN is a technology introduced by TQ. A bunch of things (rules written in SPARQL, templated SPARQL queries, defining custom SPARQL functions, etc.) We have used this for years and it actually works very well.
  21. Custom syntax. Somewhere between SPARQL, regular expressions, and grammar parsing. “regex for graphs.” Pretty cool. Concise. Needs new parsers.
  22. So, several good solutions around—but none has enough mindshare to take over. Meet at W3C, make a standard with the best aspects of each. (Or with the worst aspects of each—fingers crossed.)
  23. Some features and aspects are still highly controversial.
  24. When a violation occurs, the result is not just “false”. It’s a structure with information that can be processed in various ways. Just display it? Attach it to the right form field based on sh:path? Just count the violations per type in a large dataset? Different behaviour for different severities?
  25. It’s still early days. Mostly individuals and organisations that are active in the working group.
  26. SHACL is getting really important to our products. We have made a major contribution. TQ’s Holger Knublauch is one of the editors of the spec. TBC is an SW IDE, workbench for SW professionals. At its heart is a schema/ontology editor. It supports editing of SHACL constraints through nice UI.
  27. EVN is a taxonomy and ontology management platform. EDG a data governance solution. SHACL allows our customers to add custom constraints over their own data models. Very powerful.
  28. Note the suggestions for fixing the problem. Goes beyond standard SHACL but an obvious addition and very cool.
  29. Not sure how well maintained and if it’s following the spec.
  30. Semantic Web Company. Nice application that shows using SHACL for bulk validation. Automated repair—somewhat similar to our suggestions extension.
  31. Dimitris Kontokostas (SHACL spec editor) and team at Uni Leipzig. Organising entire test suites, expressed originally in SPARQL but now with SHACL support, for data quality of large data sets. Used in context of DBpedia.
  32. So, how do the parts of the stack fit together? High-level view. Let’s run with the metaphor that anyone can say anything about anything. First we should note: Just because you can say anything about anything doesn’t mean you should! RDF is triples. But we also call them RDF statements. Each triple is a statement of some fact.