Shape Expressions: An RDF validation and transformation language
Eric Prud'hommeaux
World Wide Web Consortium, MIT, Cambridge, MA, USA
eric@w3.org
Harold Solbrig
Mayo Clinic College of Medicine, Rochester, MN, USA
Jose Emilio Labra Gayo
WESO Research group, University of Oviedo, Spain
labra@uniovi.es
This talk in 1 slide 
Motivating example: 
Represent issues and users in RDF 
...and validate that data 
Shape Expressions = simple language to: 
Describe the topology of RDF data 
Validate if an RDF graph matches a given shape 
Shape expressions can be extended with actions 
Possible application: transform RDF into XML
Motivating example 
Represent an issue tracking system in RDF
Issues are reported by users on some date 
Issues have some status (assigned/unassigned) 
Issues can also be reproduced on some date by users 
E-R Diagram (figure)
User: foaf:name: xsd:string; foaf:givenName: xsd:string*; foaf:familyName: xsd:string; foaf:mbox: IRI
Issue: :status: (:Assigned :Unassigned); :reportedOn: xsd:date; :reproducedOn: xsd:date
Relationships between Issue and User: :reportedBy (cardinality labels 1 and 0..*), :reproducedBy (0..* and 0..1); relationship from Issue to Issue: :related (0..* and 0..1)
...and several constraints 
A user:
- has a full name, or one or more given names and one family name
- can have at most one mbox
An issue:
- has status Assigned or Unassigned
- is reported by a user
- is reported on a date
- can be reproduced by a user on a date
- can be related to other issues
Example data in RDF 
:Issue1 
:status :Unassigned ; 
:reportedBy :Bob ; 
:reportedOn "2013-01-23"^^xsd:date ; 
:reproducedBy :Thompson.J ; 
:reproducedOn "2013-01-23"^^xsd:date . 
:Bob 
foaf:name "Bob Smith" ; 
foaf:mbox <mail:bob@example.org> . 
:Thompson.J 
foaf:givenName "Joe", "Joseph" ; 
foaf:familyName "Thompson" ; 
foaf:mbox <mail:joe@example.org> . 
:Issue2 
:status :Checked ; 
:reportedBy :Issue1 ; 
:reportedOn 2014 ; 
:reproducedBy :Tom . 
 
:Tom 
foaf:name "Tom Smith", "Tam" . 
:Anna 
foaf:givenName "Anna" ; 
foaf:mbox 23. 

Problem statement 
We want to detect possible errors in RDF such as:
Issues without status
Issues with a status other than Assigned/Unassigned
Issues reported by something other than a user
Issues whose report date does not have a date type
Issues reproduced on a date before the reported date
Users without mbox
Users with two names
Users with a name of type integer
...lots of other errors... 
Q: How can we describe RDF data to be able to detect those errors? 
A: Our proposal = Shape Expressions
Shape Expressions - Users 
A user has either:
one foaf:name
or one or more foaf:givenName and one foaf:familyName
all of them of type xsd:string
A user can have at most one foaf:mbox, whose value can be any IRI
<UserShape> { 
( foaf:name xsd:string 
| foaf:givenName xsd:string+ 
, foaf:familyName xsd:string 
) 
, foaf:mbox IRI ? 
} 
The example uses compact syntax 
Shape Expressions can also be represented in RDF
Shape Expressions - Issues 
An issue's :status must be either :Assigned or :Unassigned
Issues are :reportedBy a user
Issues are :reportedOn an xsd:date
An issue may be :reproducedBy a user and :reproducedOn an xsd:date
An issue can be :related to any number of other issues
<IssueShape> { 
:status (:Assigned :Unassigned) 
, :reportedBy @<UserShape> 
, :reportedOn xsd:date 
, ( :reproducedBy @<UserShape> 
, :reproducedOn xsd:date 
)? 
, :related @<IssueShape>* 
}
Full example 
prefix : <http://example.org/> 
prefix xsd: <http://www.w3.org/2001/XMLSchema#> 
prefix foaf: <http://xmlns.com/foaf/0.1/> 
<UserShape> { 
( foaf:name xsd:string 
| foaf:givenName xsd:string+ 
, foaf:familyName xsd:string 
) 
, foaf:mbox IRI ? 
} 
<IssueShape> { 
:status (:Assigned :Unassigned) 
, :reportedBy @<UserShape> 
, :reportedOn xsd:date 
, ( :reproducedBy @<UserShape> 
, :reproducedOn xsd:date 
)? 
, :related @<IssueShape>* 
} 
Online Shape Expressions validators: 
http://www.w3.org/2013/ShEx 
http://rdfshape.weso.es
FAQ: Why not use SPARQL? 
<UserShape> { 
( foaf:name xsd:string 
| foaf:givenName xsd:string+ 
, foaf:familyName xsd:string 
) 
, foaf:mbox IRI ? 
} 
<IssueShape> { 
:status (:Assigned :Unassigned) 
, :reportedBy @<UserShape> 
, :reportedOn xsd:date 
, ( :reproducedBy @<UserShape> 
, :reproducedOn xsd:date 
)? 
, :related @<IssueShape>* 
} 
CONSTRUCT {
?IssueShape :hasShape <IssueShape> .
?UserShape :hasShape <UserShape> .
} { { SELECT ?IssueShape {
?IssueShape :status ?o } GROUP BY ?IssueShape HAVING (COUNT(*)=1)}
{ SELECT ?IssueShape {
?IssueShape :status ?o .
FILTER ((?o = :Assigned || ?o = :Unassigned))
} GROUP BY ?IssueShape HAVING (COUNT(*)=1)}
{ SELECT ?IssueShape (COUNT(*) AS ?IssueShape_c0) {
?IssueShape :reportedBy ?o .
} GROUP BY ?IssueShape HAVING (COUNT(*)=1)}
{ SELECT ?IssueShape {
?IssueShape :reportedBy ?o .
FILTER ((isIRI(?o) || isBlank(?o)))
} GROUP BY ?IssueShape HAVING (COUNT(*)=1)}
{ SELECT ?IssueShape (COUNT(*) AS ?IssueShape_c1) {
{ SELECT ?IssueShape ?UserShape {
?IssueShape :reportedBy ?UserShape .
FILTER (isIRI(?UserShape) || isBlank(?UserShape))
} }
{ SELECT ?UserShape WHERE {
{ { SELECT ?UserShape {
?UserShape foaf:name ?o .
} GROUP BY ?UserShape HAVING (COUNT(*)=1)}
{ SELECT ?UserShape {
?UserShape foaf:name ?o .
FILTER ((isLiteral(?o) && datatype(?o) = xsd:string))
} GROUP BY ?UserShape HAVING (COUNT(*)=1)}
} UNION {
{ SELECT ?UserShape (COUNT(*) AS ?UserShape_c0) {
?UserShape foaf:givenName ?o .
} GROUP BY ?UserShape HAVING (COUNT(*)>=1)}
{ SELECT ?UserShape (COUNT(*) AS ?UserShape_c1)
{ ?UserShape foaf:givenName ?o .
FILTER ((isLiteral(?o) && datatype(?o) = xsd:string))
} GROUP BY ?UserShape
HAVING (COUNT(*)>=1)}
FILTER (?UserShape_c0 = ?UserShape_c1)
{ SELECT ?UserShape {
?UserShape foaf:familyName ?o .
} GROUP BY ?UserShape
HAVING (COUNT(*)=1)}
{ SELECT ?UserShape {
?UserShape foaf:familyName ?o .
FILTER ((isLiteral(?o) && datatype(?o) = xsd:string))
} GROUP BY ?UserShape HAVING (COUNT(*)=1)}
}
} GROUP BY ?UserShape HAVING (COUNT(*) = 1)}
{ SELECT ?UserShape (COUNT(*) AS ?UserShape_c2)
{
?UserShape foaf:mbox ?o .
} GROUP BY ?UserShape HAVING (COUNT(*)<=1)}
{ SELECT ?UserShape (COUNT(*) AS ?UserShape_c3)
{
?UserShape foaf:mbox ?o .
FILTER (isIRI(?o))
} GROUP BY ?UserShape HAVING (COUNT(*)<=1)}
FILTER (?UserShape_c2 = ?UserShape_c3)
} GROUP BY ?IssueShape }
FILTER (?IssueShape_c0 = ?IssueShape_c1)
OPTIONAL {
?IssueShape :reportedBy ?IssueShape_UserShape_ref0 .
FILTER (isIRI(?IssueShape_UserShape_ref0)
|| isBlank(?IssueShape_UserShape_ref0)) }
{ SELECT ?IssueShape {
?IssueShape :reportedOn ?o } GROUP BY ?IssueShape HAVING (COUNT(*)=1)}
{ SELECT ?IssueShape {
?IssueShape :reportedOn ?o .
FILTER ((isLiteral(?o) && datatype(?o) = xsd:date))
} GROUP BY ?IssueShape HAVING (COUNT(*)=1)} {
{ SELECT ?IssueShape (COUNT(*) AS ?IssueShape_c2) {
?IssueShape :reproducedBy ?o .
} GROUP BY ?IssueShape}
{ SELECT ?IssueShape (COUNT(*) AS ?IssueShape_c3) {
?IssueShape :reproducedBy ?o .
FILTER ((isIRI(?o) || isBlank(?o)))
} GROUP BY ?IssueShape}
FILTER (?IssueShape_c2 = ?IssueShape_c3)
{ SELECT ?IssueShape (COUNT(*) AS ?IssueShape_c5) {
?IssueShape :reproducedOn ?o .
} GROUP BY ?IssueShape}
{ SELECT ?IssueShape (COUNT(*) AS ?IssueShape_c6) {
?IssueShape :reproducedOn ?o .
FILTER ((isLiteral(?o) && datatype(?o) = xsd:date))
} GROUP BY ?IssueShape}
FILTER (?IssueShape_c5 = ?IssueShape_c6)
FILTER (?IssueShape_c2=0 && ?IssueShape_c5=0 ||
?IssueShape_c2>=1&&?IssueShape_c2<=1 &&
?IssueShape_c5>=1&&?IssueShape_c5<=1)
}
{ SELECT ?IssueShape (COUNT(*) AS ?IssueShape_c7) {
?IssueShape :related ?o .
} GROUP BY ?IssueShape}
{ SELECT ?IssueShape (COUNT(*) AS ?IssueShape_c8) {
?IssueShape :related ?o .
} GROUP BY ?IssueShape}
FILTER (?IssueShape_c7 = ?IssueShape_c8)
{ SELECT ?UserShape WHERE {
{ { SELECT ?UserShape {
?UserShape foaf:name ?o .
} GROUP BY ?UserShape HAVING (COUNT(*)=1)}
{ SELECT ?UserShape {
?UserShape foaf:name ?o .
FILTER ((isLiteral(?o) && datatype(?o) = xsd:string))
} GROUP BY ?UserShape HAVING (COUNT(*)=1)}
} UNION {
{ SELECT ?UserShape (COUNT(*) AS ?UserShape_c0) {
?UserShape foaf:givenName ?o .
} GROUP BY ?UserShape HAVING (COUNT(*)>=1)}
{ SELECT ?UserShape (COUNT(*) AS ?UserShape_c1) {
?UserShape foaf:givenName ?o .
FILTER ((isLiteral(?o) && datatype(?o) = xsd:string))
} GROUP BY ?UserShape HAVING (COUNT(*)>=1)}
FILTER (?UserShape_c0 = ?UserShape_c1)
{ SELECT ?UserShape {
?UserShape foaf:familyName ?o .
} GROUP BY ?UserShape HAVING (COUNT(*)=1)}
{ SELECT ?UserShape {
?UserShape foaf:familyName ?o .
FILTER ((isLiteral(?o) && datatype(?o) = xsd:string))
} GROUP BY ?UserShape HAVING (COUNT(*)=1)}
}
} GROUP BY ?UserShape HAVING (COUNT(*) = 1)}
{ SELECT ?UserShape (COUNT(*) AS ?UserShape_c2) {
?UserShape foaf:mbox ?o .
} GROUP BY ?UserShape HAVING (COUNT(*)<=1)}
{ SELECT ?UserShape (COUNT(*) AS ?UserShape_c3) {
?UserShape foaf:mbox ?o . FILTER (isIRI(?o))
} GROUP BY ?UserShape HAVING (COUNT(*)<=1)}
FILTER (?UserShape_c2 = ?UserShape_c3)
}
Shape Expressions can be converted to SPARQL
...but for this problem Shape Expressions are simpler and more readable
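The generated SPARQL repeats one idea for every arc: per subject, count all arcs with the predicate (COUNT(*) without FILTER), count the arcs whose object passes the value test (COUNT(*) with FILTER), require the two counts to be equal, and bound them with HAVING. A minimal Python sketch of that counting pattern, assuming triples as (subject, predicate, object) tuples with literals as (lexical, datatype) pairs; `count_check` is a hypothetical name.

```python
from collections import Counter

def count_check(triples, predicate, value_test, min_card=1, max_card=1):
    """Subjects whose `predicate` arcs all pass `value_test`, with a count
    in [min_card, max_card]; mirrors the ?X_c0 = ?X_c1 / HAVING pattern."""
    total, valid = Counter(), Counter()
    for s, p, o in triples:
        if p == predicate:
            total[s] += 1              # like COUNT(*) without the FILTER
            if value_test(o):
                valid[s] += 1          # like COUNT(*) with the FILTER
    # As with GROUP BY, a subject with no `predicate` arc forms no group at all.
    return {s for s in total
            if total[s] == valid[s] and min_card <= total[s] <= max_card}

def is_string(o):
    return isinstance(o, tuple) and o[1] == "xsd:string"

def is_iri(o):
    return isinstance(o, str)

data = [
    (":Bob", "foaf:name", ("Bob Smith", "xsd:string")),
    (":Bob", "foaf:mbox", "mail:bob@example.org"),
    (":Tom", "foaf:name", ("Tom Smith", "xsd:string")),
    (":Tom", "foaf:name", ("Tam", "xsd:string")),
    (":Anna", "foaf:mbox", ("23", "xsd:integer")),
]
print(count_check(data, "foaf:name", is_string))       # foaf:name xsd:string
print(count_check(data, "foaf:mbox", is_iri, 0, 1))   # foaf:mbox IRI ?
```

The comment about missing groups matters: a subject with zero arcs never shows up in a grouped subquery, which is one reason optional arcs need extra care when shapes are translated to SPARQL.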
Shape Expressions Language 
Schema = set of Shape Expressions 
Shape Expression = labeled pattern 
Typical pattern = conjunction of several expressions 
Conjunction represented by , 
<IssueShape> { 
:status (:Assigned :Unassigned) 
, :reportedBy @<UserShape> 
, :reportedOn xsd:date 
... 
} 
<label> { 
...pattern... 
} 
Here <IssueShape> is the label, and each , joins two conjuncts
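One way to picture "schema = set of labeled patterns" is as a dictionary from labels to expression trees. The sketch below uses hypothetical Python classes `Arc` and `And`; value definitions are kept as plain text here because they are covered separately.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Arc:
    name: str     # name definition, e.g. ":status"
    value: str    # value definition kept as text in this sketch, e.g. "xsd:date"

@dataclass(frozen=True)
class And:        # conjunction, written with ',' in the compact syntax
    exprs: tuple

# A schema maps each label to its pattern.
schema = {
    "IssueShape": And((
        Arc(":status", "(:Assigned :Unassigned)"),
        Arc(":reportedBy", "@<UserShape>"),
        Arc(":reportedOn", "xsd:date"),
    )),
}

def arcs(expr):
    """Flatten a pattern into its arcs, left to right."""
    if isinstance(expr, Arc):
        return [expr]
    return [a for e in expr.exprs for a in arcs(e)]

print([a.name for a in arcs(schema["IssueShape"])])
```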
Arcs 
Basic expression: an Arc 
Arc = name definition followed by value definition 
<IssueShape> { 
:status (:Assigned :Unassigned) 
, :reportedBy @<UserShape> 
, :reportedOn xsd:date 
... 
} 
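Each arc has a name definition (e.g. :status) followed by a value definition (e.g. (:Assigned :Unassigned)). Matching data:
:issue1 :status :Unassigned ;
:reportedBy :bob ;
:reportedOn "2013-01-23"^^xsd:date .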
Value definition 
Value definitions can be:
Value type   xsd:date                   Matches a value of type xsd:date
Value set    ( :Assigned :Unassigned )  The object is an element of the given set
Reference    @<UserShape>               The object has shape <UserShape>
Stem         foaf:~                     Starts with the IRI associated with foaf
Any          - :Checked                 Any value except :Checked
<IssueShape> { 
:status (:Assigned :Unassigned) 
, :reportedBy @<UserShape> 
, :reportedOn xsd:date 
... 
} 
In <IssueShape> above, (:Assigned :Unassigned) is a value set, @<UserShape> is a value reference, and xsd:date is a value type
Name definition 
Name definitions can be:
Name term   foaf:name     Matches the given IRI
Name stem   foaf:~        Any predicate that starts with the foaf namespace IRI
Name any    - foaf:name   Any predicate except foaf:name
<IssueShape> { 
:status (:Assigned :Unassigned) 
, :reportedBy @<UserShape> 
, :reportedOn xsd:date 
... 
} 
In <IssueShape> above, :status, :reportedBy and :reportedOn are name terms
Alternatives 
Alternatives (disjunctions) are marked by | 
Example 1: An agent has either foaf:name or rdfs:label 
<Agent> { 
( foaf:name xsd:string | rdfs:label xsd:string ) 
... 
} 
Example 2: A list of integers 
<listOfInt> { 
rdf:first xsd:integer 
, ( rdf:rest ( rdf:nil ) 
| rdf:rest @<listOfInt> 
) 
}
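The recursive <listOfInt> shape can be checked by a function that calls itself on the rdf:rest node. A Python sketch, assuming triples as tuples, literals as (lexical, datatype IRI) pairs and blank nodes written as "_:a"-style strings. The handling of cyclic lists (a revisited node is taken to conform) is a simplifying assumption of this sketch.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD = "http://www.w3.org/2001/XMLSchema#"

def objs(graph, node, predicate):
    return [o for (s, p, o) in graph if s == node and p == predicate]

def list_of_int(graph, node, in_progress=frozenset()):
    """rdf:first xsd:integer , ( rdf:rest ( rdf:nil ) | rdf:rest @<listOfInt> )"""
    if node in in_progress:      # cyclic list: assume conformance (simplification)
        return True
    firsts = objs(graph, node, RDF + "first")
    rests = objs(graph, node, RDF + "rest")
    if len(firsts) != 1 or not (isinstance(firsts[0], tuple)
                                and firsts[0][1] == XSD + "integer"):
        return False
    if len(rests) != 1:
        return False
    rest = rests[0]
    # Alternative 1: the rest is rdf:nil; alternative 2: the rest matches <listOfInt>
    return rest == RDF + "nil" or list_of_int(graph, rest, in_progress | {node})

# The list (1 2), using blank-node labels _:a and _:b
good = [
    ("_:a", RDF + "first", ("1", XSD + "integer")), ("_:a", RDF + "rest", "_:b"),
    ("_:b", RDF + "first", ("2", XSD + "integer")), ("_:b", RDF + "rest", RDF + "nil"),
]
bad = [("_:c", RDF + "first", ("one", XSD + "string")), ("_:c", RDF + "rest", RDF + "nil")]
print(list_of_int(good, "_:a"), list_of_int(bad, "_:c"))  # True False
```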
Cardinalities 
The same as in common regular expressions 
* 0 or more 
+ 1 or more 
? 0 or 1 
{m} m repetitions 
{m,n} Between m and n repetitions 
<IssueShape> { 
... 
( :reproducedBy @<UserShape>, :reproducedOn xsd:date)? 
, :related @<IssueShape>* 
}
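The cardinality suffixes can be read as (minimum, maximum) pairs, with no suffix meaning exactly one (as in foaf:familyName xsd:string). A small Python sketch of that mapping; `parse_card` and `card_ok` are hypothetical names.

```python
import math
import re

def parse_card(card):
    """Map a cardinality suffix to (min, max); max is math.inf when unbounded."""
    fixed = {"": (1, 1), "*": (0, math.inf), "+": (1, math.inf), "?": (0, 1)}
    if card in fixed:
        return fixed[card]
    m = re.fullmatch(r"\{(\d+)(?:,(\d+))?\}", card)   # {m} or {m,n}
    if m is None:
        raise ValueError(f"unsupported cardinality: {card!r}")
    low = int(m.group(1))
    high = int(m.group(2)) if m.group(2) is not None else low
    return (low, high)

def card_ok(card, count):
    low, high = parse_card(card)
    return low <= count <= high

# :related @<IssueShape>* accepts any number of :related arcs
print(card_ok("*", 0), card_ok("*", 7))   # True True
# ( :reproducedBy ..., :reproducedOn ... )? allows the group at most once
print(card_ok("?", 2))                     # False
print(parse_card("{2,5}"))                 # (2, 5)
```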
Semantic actions 
Define actions to be executed during validation 
<Issue> { 
... 
:reportedOn xsd:date %js{ report = _.o; return true; %} 
, ( :reproducedBy @<UserShape> 
, :reproducedOn xsd:date %js{ return _.o.lex > report.lex; %} 
) ? 
} 
%lang{ ...actions... %}
Calls the lang processor, passing it the given actions
Example:
Check that :reportedOn is before :reproducedOn
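The two %js actions share state: the first stores the report date, the second compares the reproduction date against it. The Python sketch below imitates that with callbacks run in the order the arcs appear in the shape; the encoding (a dict from predicate to lexical value) and all names are assumptions. One deliberate difference: the slide's action uses a strict `>`, which would reject :Issue1 (reported and reproduced on the same day), while the problem statement only flags reproduction before the report date, so this sketch uses `>=`.

```python
from datetime import date

def run_actions(arc_values, shape_actions):
    """arc_values: predicate -> lexical value for one issue (assumed encoding).
    shape_actions: (predicate, action) pairs in the order they appear in the shape."""
    env = {}  # shared state, playing the role of the global `report` in the %js actions
    for predicate, action in shape_actions:
        if predicate in arc_values and not action(arc_values[predicate], env):
            return False
    return True

def remember_report(value, env):
    env["report"] = date.fromisoformat(value)
    return True

def not_before_report(value, env):
    # >= so that same-day reproduction (as in :Issue1) passes
    return "report" in env and date.fromisoformat(value) >= env["report"]

ISSUE_ACTIONS = [(":reportedOn", remember_report), (":reproducedOn", not_before_report)]

issue1 = {":reportedOn": "2013-01-23", ":reproducedOn": "2013-01-23"}
too_early = {":reportedOn": "2013-01-23", ":reproducedOn": "2013-01-20"}
print(run_actions(issue1, ISSUE_ACTIONS), run_actions(too_early, ISSUE_ACTIONS))  # True False
```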
Semantics of Shape Expressions 
Operational semantics using inference rules
Inspired by the semantics of RELAX NG
Formalism used to define type inference systems
Matching infers shape typings
Axioms and rules in the usual form: premises above the line, conclusion below
Example: matching rule for conjunction
The graph can be decomposed into g1 and g2
The typings t1 and t2 obtained for each part are combined
The judgment relates a context, a graph and a type assignment
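A conjunction rule of the kind described above could be typeset as follows. The notation is an assumption of this sketch, not taken from the paper: Γ is the context, g_1 ⊎ g_2 splits the graph into two disjoint sets of triples, e_1 , e_2 is the conjunction, ≃ reads "matches", ⇝ reads "yielding the typing", and ⊕ combines typings.

```latex
\frac{\Gamma \vdash g_1 \simeq e_1 \leadsto t_1 \qquad \Gamma \vdash g_2 \simeq e_2 \leadsto t_2}
     {\Gamma \vdash g_1 \uplus g_2 \simeq (e_1 \,,\, e_2) \leadsto t_1 \oplus t_2}
```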
Transforming RDF using ShEx 
Semantic actions can be combined with 
specialized languages 
Possible languages: sparql, js 
Other examples: 
GenX = very simple language to generate XML 
Goal: Semantic lowering 
Map RDF clinical records to XML 
GenJ generates JSON
Example 
:Issue1 
:status :Unassigned ; 
:reportedBy :Bob ; 
:reportedOn "2013-01-23"^^xsd:date ; 
:reproducedBy :Thompson.J ; 
:reproducedOn "2013-01-23"^^xsd:date . 
:Bob 
foaf:name "Bob Smith" ; 
foaf:mbox <mail:bob@example.org> . 
:Thompson.J 
foaf:givenName "Joe", "Joseph" ; 
foaf:familyName "Thompson" ; 
foaf:mbox <mail:joe@example.org> . 
RDF (Turtle) + Shape Expressions + GenX → XML
<issue xmlns="http://ex.example/xml" 
id="Issue1" status="Unassigned"> 
<reported date="2013-01-23"> 
<given-name>Bob</given-name> 
<family-name>Smith</family-name> 
<email>mail:bob@example.org</email> 
</reported> 
<reproduced date="2013-01-23"> 
<given-name>Joe</given-name> 
<given-name>Joseph</given-name> 
<family-name>Thompson</family-name> 
<email>mail:joe@example.org</email> 
</reproduced> 
</issue>
GenX 
GenX syntax
$IRI      Generate elements in that namespace
<name>    Add element <name>
@<name>   Add attribute <name>
=<expr>   Apply an XPath function to the value
=         Don't emit the value
[-n]      Place the value n levels up in the hierarchy
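The =<expr> form applies an XPath function to the value; the example below uses =substr(19). If substr behaves like XPath's two-argument substring(), which counts characters from 1, it keeps everything from the 19th character onward. A Python sketch of that behavior, under the hypothetical assumption that ex: expands to http://ex.example/ (the slides do not declare it).

```python
def xpath_substring(s, start):
    """Two-argument XPath substring(): characters of s from 1-based position
    `start` onward (positions before 1 are clamped to the start of the string)."""
    return s[max(start - 1, 0):]

# Hypothetical expansion of ex:unassigned; the 18-character prefix is dropped.
status_iri = "http://ex.example/unassigned"
print(len("http://ex.example/"))           # 18
print(xpath_substring(status_iri, 19))     # unassigned
```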
Example transforming RDF to XML 
%GenX{ issue $http://ex.example/xml %} 
<IssueShape> { 
ex:status (ex:unassigned ex:assigned) %GenX{@status =substr(19)%} 
, ex:reportedBy @<UserShape> %GenX{ reported = %} 
, ex:reportedOn xsd:date %GenX{ [-1]@date %} 
, (ex:reproducedBy @<UserShape>, 
ex:reproducedOn xsd:date %GenX{ @date %} 
)? %GenX{ reproduced = %} 
, ex:related @<IssueShape>* 
} %GenX{ @id %} 
<UserShape> { 
(foaf:name xsd:string %GenX{ full-name %} 
| foaf:givenName xsd:string+ %GenX{ given-name %} 
, foaf:familyName xsd:string %GenX{ family-name %} 
) 
, foaf:mbox shex:IRI ? %GenX{ email %} 
}
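Following the annotations above (reproduced = creates an element without emitting the user IRI, @date becomes an attribute, and the <UserShape> arcs map to given-name, family-name and email), the <reproduced> part of the output can be rebuilt with Python's xml.etree.ElementTree. This is a hand-coded sketch of what the annotations describe, not a GenX interpreter. It encodes a user as a hypothetical dict keyed by predicate and omits the namespace.

```python
import xml.etree.ElementTree as ET

def add_user(parent, user):
    """Child elements for a node matching <UserShape>, following the
    full-name / given-name / family-name / email annotations."""
    if "foaf:name" in user:
        ET.SubElement(parent, "full-name").text = user["foaf:name"]
    for given in user.get("foaf:givenName", []):
        ET.SubElement(parent, "given-name").text = given
    if "foaf:familyName" in user:
        ET.SubElement(parent, "family-name").text = user["foaf:familyName"]
    if "foaf:mbox" in user:
        ET.SubElement(parent, "email").text = user["foaf:mbox"]

def reproduced_element(user, date):
    # "reproduced =": create <reproduced> but do not emit the user IRI itself;
    # "@date" on :reproducedOn: the date becomes an attribute of that element.
    elem = ET.Element("reproduced", {"date": date})
    add_user(elem, user)
    return elem

thompson = {
    "foaf:givenName": ["Joe", "Joseph"],
    "foaf:familyName": "Thompson",
    "foaf:mbox": "mail:joe@example.org",
}
print(ET.tostring(reproduced_element(thompson, "2013-01-23"), encoding="unicode"))
```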
Current Implementations 
Name        Main developer       Language    Features
FancyDemo   Eric Prud'hommeaux   JavaScript  First implementation; semantic actions (GenX, GenJ); conversion to SPARQL
            http://www.w3.org/2013/ShEx/
JsShExTest  Jesse van Dam        JavaScript  Supports RDF and compact syntax
            https://github.com/jessevdam/shextest
ShExcala    Jose E. Labra        Scala       Several extensions (negations, reverse arcs, relations, ...); efficient implementation using derivatives
            http://labra.github.io/ShExcala/
Haws        Jose E. Labra        Haskell     Prototype to check the inference semantics
            http://labra.github.io/haws/
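The ShExcala entry mentions derivatives. The sketch below shows the general idea under stated assumptions; it is not ShExcala's code. It uses Brzozowski-style derivatives extended with an unordered interleave for ',' so a node's triples can be consumed in any order. References, value types other than a simple test, and cardinalities other than * are left out, and all class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Empty:            # matches nothing
    pass

@dataclass(frozen=True)
class Eps:              # matches the empty set of triples
    pass

@dataclass(frozen=True)
class Arc:              # one triple with predicate `pred` whose object passes `test`
    pred: str
    test: Callable[[object], bool]

@dataclass(frozen=True)
class Each:             # e1 , e2  (interleave: any order)
    left: object
    right: object

@dataclass(frozen=True)
class OneOf:            # e1 | e2
    left: object
    right: object

@dataclass(frozen=True)
class Star:             # e*
    expr: object

def one_of(a, b):       # smart constructors keep derivatives small
    if isinstance(a, Empty):
        return b
    if isinstance(b, Empty):
        return a
    return OneOf(a, b)

def each(a, b):
    if isinstance(a, Empty) or isinstance(b, Empty):
        return Empty()
    if isinstance(a, Eps):
        return b
    if isinstance(b, Eps):
        return a
    return Each(a, b)

def nullable(e):
    """True if e accepts the empty set of remaining triples."""
    if isinstance(e, (Eps, Star)):
        return True
    if isinstance(e, Each):
        return nullable(e.left) and nullable(e.right)
    if isinstance(e, OneOf):
        return nullable(e.left) or nullable(e.right)
    return False

def deriv(e, triple):
    """What e still has to match after consuming `triple`."""
    _, pred, obj = triple
    if isinstance(e, Arc):
        return Eps() if e.pred == pred and e.test(obj) else Empty()
    if isinstance(e, OneOf):
        return one_of(deriv(e.left, triple), deriv(e.right, triple))
    if isinstance(e, Each):   # the triple may be consumed by either side
        return one_of(each(deriv(e.left, triple), e.right),
                      each(e.left, deriv(e.right, triple)))
    if isinstance(e, Star):
        return each(deriv(e.expr, triple), e)
    return Empty()            # Empty and Eps cannot consume a triple

def matches(e, triples):
    for t in triples:
        e = deriv(e, t)
    return nullable(e)

def is_str(o):
    return isinstance(o, str)

# ( foaf:name | foaf:givenName+ , foaf:familyName ) , foaf:mbox ?
given = Arc("foaf:givenName", is_str)
user = Each(OneOf(Arc("foaf:name", is_str),
                  Each(Each(given, Star(given)), Arc("foaf:familyName", is_str))),
            OneOf(Arc("foaf:mbox", is_str), Eps()))
thompson = [(":T", "foaf:givenName", "Joe"), (":T", "foaf:givenName", "Joseph"),
            (":T", "foaf:familyName", "Thompson"), (":T", "foaf:mbox", "mail:joe@example.org")]
print(matches(user, thompson), matches(user, thompson[::-1]))  # True True
```

The smart constructors are what keep this practical: without pruning Empty and Eps, each derivative step would grow the expression.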
Applications to linked data portals 
2 data portals: WebIndex and LandPortal 
Data portal documentation 
http://weso.github.io/wiDoc/ http://weso.github.io/landportalDoc/data 
<Observation> { 
cex:md5-checksum xsd:string 
, cex:computation @<Computation> 
, dcterms:issued xsd:integer 
, dcterms:publisher ( wi-org:WebFoundation ) 
, qb:dataSet @<Dataset> 
, rdfs:label (@en) 
, sdmx-concept:obsStatus @<ObsStatus> 
, wi-onto:ref-area @<Area> 
, wi-onto:ref-indicator @<Indicator> 
, wi-onto:ref-year xsd:int 
, cex:value xsd:double 
, a ( qb:Observation ) 
} 
<Observation> { 
cex:ref-area @<Area> 
, cex:ref-indicator @<Indicator> 
, cex:ref-time @<Time> 
, cex:value xsd:double? 
, cex:computation @<Computation> 
, dcterms:issued xsd:dateTime 
, qb:dataSet @<DataSet> 
, qb:slice @<Slice> 
, rdfs:label xsd:string 
, lb:source @<Upload> 
, a ( qb:Observation ) 
} 
Same type (qb:Observation)...
...but different shapes
More info: paper at the Linked Data Quality Workshop
Conclusions 
Shape Expressions = simple language 
One goal: Describe and validate RDF graphs 
Semantics of Shape Expressions 
Described using inference rules 
...but Shape Expressions can be converted to SPARQL 
Compatible with other Semantic technologies 
Semantic actions = Extensibility mechanism 
Can be applied to transform RDF
Future Work 
Improve implementations and language 
Debugging and error messages 
Expressiveness and usability of language 
Performance evaluation 
Shape Expressions could play a role similar to XML Schema for XML
Future applications: 
Online validators 
Interface generators 
Binding: generate parsers/tools from shapes 
Performance of RDF triplestores?
Future work at w3c 
RDF Data Shapes WG chartered
Mailing list: public-rdf-shapes@w3.org
"The discussion on public-rdf-shapes@w3.org is the best entertainment since years; 
Game of Thrones colors pale." Paul Hermans (@PaulZH)
End of presentation 
Slides available at: 
http://www.slideshare.net/jelabra/semantics-2014

Shape Expressions: An RDF validation and transformation language

  • 1.
    Shape Expressions: AnRDF validation and transformation language Eric Prud'hommeaux World Wide Web Consortium MIT, Cambridge, MA, USA eric@w3.org Harold Solbrig Mayo Clinic USA College of Medicine, Rochester, MN, USA Jose Emilio Labra Gayo WESO Research group University of Oviedo Spain labra@uniovi.es
  • 2.
    This talk in1 slide Motivating example: Represent issues and users in RDF ...and validate that data Shape Expressions = simple language to: Describe the topology of RDF data Validate if an RDF graph matches a given shape Shape expressions can be extended with actions Possible application: transform RDF into XML
  • 3.
    Motivating example Representin RDF a issue tracking system Issues are reported by users on some date Issues have some status (assigned/unassigned) Issues can also be reproduced on some date by users User Issue
  • 4.
    User__ foaf:name: xsd:string foaf:givenName: xsd:string* foaf:familyName: xsd:string foaf:mbox: IRI Issue__ :status: (:Assigned :Unassigned) :reportedOn: xsd:date :reproducedOn: xsd:date 1 :reportedBy 0..* 0..* :reproducedBy 0..1 0..* 0..1 :related E-R Diagram ...and several constraints A user: - has full name or several given names and one family name - can have one mbox A Issue - has status Assigned/Unassigned - is reported by a user - is reported on a date - can be reproduced by a user on a date - is related to other issues
  • 5.
    Example data inRDF :Issue1 :status :Unassigned ; :reportedBy :Bob ; :reportedOn "2013-01-23"^^xsd:date ; :reproducedBy :Thompson.J ; :reproducedOn "2013-01-23"^^xsd:date . :Bob foaf:name "Bob Smith" ; foaf:mbox <mail:bob@example.org> . :Thompson.J foaf:givenName "Joe", "Joseph" ; foaf:familyName "Thompson" ; foaf:mbox <mail:joe@example.org> . :Issue2 :status :Checked ; :reportedBy :Issue1 ; :reportedOn 2014 ; :reproducedBy :Tom .  :Tom foaf:name "Tom Smith", "Tam" . :Anna foaf:givenName "Anna" ; foaf:mbox 23. 
  • 6.
    Problem statement Wewant to detect possible errors in RDF like: Issues without status Issues with status different of Assigned/Unassigned Issues reported by something different to a user Issues reported on a date with a non-date type Issues reproduced on a date before the reported date Users without mbox Users with 2 names Users with with a name of type integer ...lots of other errors... Q: How can we describe RDF data to be able to detect those errors? A: Our proposal = Shape Expressions
  • 7.
    Shape Expressions -Users A user can have either: one foaf:name or one or more foaf:givenName and one foaf:familyName all of them must be of type xsd:string A user can have one foaf:mbox with value any IRI <UserShape> { ( foaf:name xsd:string | foaf:givenName xsd:string+ , foaf:familyName xsd:string ) , foaf:mbox IRI ? } The example uses compact syntax Shape Expressions can also be represented in RDF
  • 8.
    Shape Expressions -Issues Issues :status must be either :Assigned or :Unassigned Issues are :reportedBy a user Issues are :reportedOn a xsd:date A issue may be :reproducedBy a user and :reproduceOn an xsd:date A issue can be :related to several issues <IssueShape> { :status (:Assigned :Unassigned) , :reportedBy @<UserShape> , :reportedOn xsd:date , ( :reproducedBy @<UserShape> , :reproducedOn xsd:date )? , :related @<IssueShape>* }
  • 9.
    Full example prefix: <http://example.org/> prefix xsd: <http://www.w3.org/2001/XMLSchema#> prefix foaf: <http://xmlns.com/foaf/0.1/> <UserShape> { ( foaf:name xsd:string | foaf:givenName xsd:string+ , foaf:familyName xsd:string ) , foaf:mbox IRI ? } <IssueShape> { :status (:Assigned :Unassigned) , :reportedBy @<UserShape> , :reportedOn xsd:date , ( :reproducedBy @<UserShape> , :reproducedOn xsd:date )? , :related @<IssueShape>* } Online Shape Expressions validators: http://www.w3.org/2013/ShEx http://rdfshape.weso.es
  • 10.
    FAQ: Why notuse SPARQL? <UserShape> { ( foaf:name xsd:string | foaf:givenName xsd:string+ , foaf:familyName xsd:string ) , foaf:mbox IRI ? } <IssueShape> { :status (:Assigned :Unassigned) , :reportedBy @<UserShape> , :reportedOn xsd:date , ( :reproducedBy @<UserShape> , :reproducedOn xsd:date )? , :related @<IssueShape>* } 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 CONSTRUCT { ?IssueShape :hasShape <IssueShape> . ?UserShape :hasShape <UserShape> . } { { SELECT ?IssueShape { ?IssueShape :status ?o } GROUP BY ?IssueShape HAVING (COUNT(*)=1)} { SELECT ?IssueShape { ?IssueShape :status ?o . FILTER ((?o = :Assigned || ?o = :Unassigned)) } GROUP BY ?IssueShape HAVING (COUNT(*)=1)} { SELECT ?IssueShape (COUNT(*) AS ?IssueShape_c0) { ?IssueShape :reportedBy ?o . } GROUP BY ?IssueShape HAVING (COUNT(*)=1)} { SELECT ?IssueShape { ?IssueShape :reportedBy ?o . FILTER ((isIRI(?o) || isBlank(?o))) } GROUP BY ?IssueShape HAVING (COUNT(*)=1)} { SELECT ?IssueShape (COUNT(*) AS ?IssueShape_c1) { { SELECT ?IssueShape ?UserShape { ?IssueShape :reportedBy ?UserShape . FILTER (isIRI(?UserShape) || isBlank(?UserShape)) } } { SELECT ?UserShape WHERE { { { SELECT ?UserShape { ?UserShape foaf:name ?o . } GROUP BY ?UserShape HAVING (COUNT(*)=1)} { SELECT ?UserShape { ?UserShape foaf:name ?o . FILTER ((isLiteral(?o) && datatype(?o) = xsd:string)) } GROUP BY ?UserShape HAVING (COUNT(*)=1) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 } UNION { { SELECT ?UserShape (COUNT(*) AS ?UserShape_c0) { ?UserShape foaf:givenName ?o . } GROUP BY ?UserShape HAVING (COUNT(*)>=1)} { SELECT ?UserShape (COUNT(*) AS ?UserShape_c1) { ?UserShape foaf:givenName ?o . FILTER ((isLiteral(?o) && datatype(?o) = xsd:string)) } GROUP BY ?UserShape HAVING (COUNT(*)>=1)} FILTER (?UserShape_c0 = ?UserShape_c1) { SELECT ?UserShape { ?UserShape foaf:familyName ?o . } GROUP BY ?UserShape HAVING (COUNT(*)=1)} { SELECT ?UserShape { ?UserShape foaf:familyName ?o . 
FILTER ((isLiteral(?o) && datatype(?o) = xsd:string)) } GROUP BY ?UserShape HAVING (COUNT(*)=1)} } } GROUP BY ?UserShape HAVING (COUNT(*) = 1)} { SELECT ?UserShape (COUNT(*) AS ?UserShape_c2) { ?UserShape foaf:mbox ?o . } GROUP BY ?UserShape HAVING (COUNT(*)<=1)} { SELECT ?UserShape (COUNT(*) AS ?UserShape_c3) { ?UserShape foaf:mbox ?o . FILTER (isIRI(?o)) } GROUP BY ?HAVING (COUNT(*)<=1)} FILTER (?UserShape_c2 = ?UserShape_c3) 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 UserShape_c2 = ?UserShape_c3) } GROUP BY ?IssueShape } FILTER (?IssueShape_c0 = ?IssueShape_c1) OPTIONAL { ?IssueShape :reportedBy ?IssueShape_UserShape_ref0 . FILTER (isIRI(?IssueShape_UserShape_ref0) || isBlank(?IssueShape_UserShape_ref0)) } { SELECT ?IssueShape { ?IssueShape :reportedOn } GROUP BY ?IssueShape HAVING (COUNT(*)=1)} { SELECT ?IssueShape { ?IssueShape :reportedOn ?o . FILTER ((isLiteral(?o) && datatype(?o) = xsd:date)) } GROUP BY ?IssueShape HAVING (COUNT(*)=1)} { { SELECT ?IssueShape (COUNT(*) AS ?IssueShape_c2) { ?IssueShape :reproducedBy ?o . } GROUP BY IssueShape} { SELECT ?IssueShape (COUNT(*) AS ?IssueShape_c3) { ?IssueShape :reproducedBy ?o . FILTER ((isIRI(?o) || isBlank(?o))) } GROUP BY ?IssueShape} FILTER (?IssueShape_c2 = ?IssueShape_c3) { SELECT ?IssueShape (COUNT(*) AS ?IssueShape_c5) { ?IssueShape :reproducedOn ?o . } GROUP BY ?IssueShape} { SELECT ?IssueShape (COUNT(*) AS ?IssueShape_c6) { ?IssueShape :reproducedOn ?o . FILTER ((isLiteral(?o) && datatype(?o) = xsd:date)) } GROUP BY IssueShape} FILTER (?IssueShape_c5 = ?IssueShape_c6) 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 FILTER (?IssueShape_c2=0 && ?IssueShape_c5=0 || ?IssueShape_c2>=1&&?IssueShape_c2<=1 && ?IssueShape_c5>=1&&?IssueShape_c5<=1) } { SELECT ?IssueShape (COUNT(*) AS ?IssueShape_c7) { ?IssueShape :related ?o . 
} GROUP BY ?IssueShape} { SELECT ?IssueShape (COUNT(*) AS ?IssueShape_c8) { ?IssueShape :related ?o . } GROUP BY ?IssueShape} FILTER (?IssueShape_c7 = ?IssueShape_c8) { SELECT ?UserShape WHERE { { { SELECT ?UserShape { ?UserShape foaf:name ?o . } GROUP BY ?UserShape HAVING (COUNT(*)=1)} { SELECT ?UserShape { ?UserShape foaf:name ?o . FILTER ((isLiteral(?o) && datatype(?o) = xsd:string)) } GROUP BY ?UserShape HAVING (COUNT(*)=1)} } UNION { { SELECT ?UserShape (COUNT(*) AS ?UserShape_c0) { ?UserShape foaf:givenName ?o . } GROUP BY ?UserShape HAVING (COUNT(*)>=1)} { SELECT ?UserShape (COUNT(*) AS ?UserShape_c1) { ?UserShape foaf:givenName ?o . FILTER ((isLiteral(?o) && datatype(?o) = xsd:string)) } GROUP BY ?UserShape HAVING (COUNT(*)>=1)} FILTER (?UserShape_c0 = ?UserShape_c1) { SELECT ?UserShape { ?UserShape foaf:familyName ?o . 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 } GROUP BY ?UserShape HAVING (COUNT(*)=1)} { SELECT ?UserShape { ?UserShape foaf:familyName ?o . FILTER ((isLiteral(?o) && datatype(?o) = xsd:string)) } GROUP BY ?UserShape HAVING (COUNT(*)=1)} } } GROUP BY ?UserShape HAVING (COUNT(*) = 1)} { SELECT ?UserShape (COUNT(*) AS ?UserShape_c2) { ?UserShape foaf:mbox ?o . } GROUP BY ?UserShape HAVING (COUNT(*)<=1)} { SELECT ?UserShape (COUNT(*) AS ?UserShape_c3) { ?UserShape foaf:mbox ?o . FILTER (isIRI(?o)) } GROUP BY ?UserShape HAVING (COUNT(*)<=1)} FILTER (?UserShape_c2 = ?UserShape_c3) } 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 . . . . Shape Expression Shape Expressions can be converted to SPARQL But Shape Expressions are simpler and more readable to solve this problem
  • 11.
    Shape Expressions Language Schema = set of Shape Expressions Shape Expression = labeled pattern Typical pattern = conjunction of several expressions Conjunction represented by , <IssueShape> { :status (:Assigned :Unassigned) , :reportedBy @<UserShape> , :reportedOn xsd:date ... } <label> { ...pattern... } Label Conjunction
  • 12.
    Arcs Basic expression:an Arc Arc = name definition followed by value definition <IssueShape> { :status (:Assigned :Unassigned) , :reportedBy @<UserShape> , :reportedOn xsd:date ... } Name defn Value defn :status :Unassigned :isue1 :reportedBy :bob :reportedOn 23-01-2013
  • 13.
    Value definition Valuedefinitions can be Value type xsd:date Matches a value of type xsd:date Value set ( :Assigned :Unassigned ) The object is an element of the given set Reference @<UserShape> The object has shape <UserShape> Stem foaf:~ Starts with the IRI associated with foaf Any - :Checked Any value except :Checked <IssueShape> { :status (:Assigned :Unassigned) , :reportedBy @<UserShape> , :reportedOn xsd:date ... } Value set Value reference Value type
  • 14.
    Name definition Namedefinitions can be Name term foaf:name Matches given IRI Name stem foaf:~ Any predicate that starts by foaf Name any - foaf:name Any predicate except foaf:name <IssueShape> { :status (:Assigned :Unassigned) , :reportedBy @<UserShape> , :reportedOn xsd:date ... } Name terms
  • 15.
    Alternatives Alternatives (disjunctions)are marked by | Example 1: An agent has either foaf:name or rdfs:label <Agent> { ( foaf:name xsd:string | rdfs:label xsd:string ) ... } Example 2: A list of integers <listOfInt> { rdf:first xsd:integer , ( rdf:rest ( rdf:nil ) | rdf:rest @<listOfInt> ) }
  • 16.
    Cardinalities The sameas in common regular expressions * 0 or more + 1 or more ? 0 or 1 {m} m repetitions {m,n} Between m and n repetitions <IssueShape> { ... ( :reproducedBy @<UserShape>, :reproducedOn xsd:date)? , :related @<IssueShape>* }
  • 17.
    Semantic actions Defineactions to be executed during validation <Issue> { ... :reportedOn xsd:date %js{ report = _.o; return true; %} , ( :reproducedBy @<UserShape> , :reproducedOn xsd:date %js{ return _.o.lex > report.lex; %} ) ? } %lang{ ...actions... %} Calls lang processor passing it the given actions Example: Check that :reportedOn must be before :reproducedOn
  • 18.
    Semantics of ShapeExpressions Operational semantics using inference rules Inspired by the semantics of RelaxNG Formalism used to define type inference systems Matching  infer shape typings Axioms and rules of the form:
  • 19.
    Example: matching rules( ) Graph can be decomposed in g1 and g2 Combine typings t1 and t2 Context Graph Type Assignment
  • 20.
    Transforming RDF usingShEx Semantic actions can be combined with specialized languages Possible languages: sparql, js Other examples: GenX = very simple language to generate XML Goal: Semantic lowering Map RDF clinical records to XML GenJ generates JSON
  • 21.
    Example :Issue1 :status:Unassigned ; :reportedBy :Bob ; :reportedOn "2013-01-23"^^xsd:date ; :reproducedBy :Thompson.J ; :reproducedOn "2013-01-23"^^xsd:date . :Bob foaf:name "Bob Smith" ; foaf:mbox <mail:bob@example.org> . :Thompson.J foaf:givenName "Joe", "Joseph" ; foaf:familyName "Thompson" ; foaf:mbox <mail:joe@example.org> . RDF (Turtle) Shape Expressions XML + GenX <issue xmlns="http://ex.example/xml" id="Issue1" status="Unassigned"> <reported date="2013-01-23"> <given-name>Bob</given-name> <family-name>Smith</family-name> <email>mail:bob@example.org</email> </reported> <reproduced date="2013-01-23"> <given-name>Joe</given-name> <given-name>Joseph</given-name> <family-name>Thompson</family-name> <email>mail:joe@example.org</email> </reproduced> </issue>
  • 22.
    GenX GenX syntax $IRI Generates elements in that namespace <name> Add element <name> @<name> Add attribute <name> =<expr> XPath function applied to the value = Don't emit the value [-n] Place the value up n values in the hierarchy
  • 23.
    Example transforming RDFto XML %GenX{ issue $http://ex.example/xml %} <IssueShape> { ex:status (ex:unassigned ex:assigned) %GenX{@status =substr(19)%} , ex:reportedBy @<UserShape> %GenX{ reported = %} , ex:reportedOn xsd:date %GenX{ [-1]@date %} , (ex:reproducedBy @<UserShape>, ex:reproducedOn xsd:date %GenX{ @date %} )? %GenX{ reproduced = %} , ex:related @<IssueShape>* } %GenX{ @id %} <UserShape> { (foaf:name xsd:string %GenX{ full-name %} | foaf:givenName xsd:string+ %GenX{ given-name %} , foaf:familyName xsd:string %GenX{ family-name %} ) , foaf:mbox shex:IRI ? %GenX{ email %} }
  • 25.
    Current implementations (name, main developer, language: features)
    - FancyDemo (Eric Prud'hommeaux, JavaScript): first implementation; semantic actions (GenX, GenJ); conversion to SPARQL. http://www.w3.org/2013/ShEx/
    - JsShExTest (Jesse van Dam, JavaScript): supports RDF and compact syntax. https://github.com/jessevdam/shextest
    - ShExcala (Jose E. Labra, Scala): several extensions (negations, reverse arcs, relations, ...); efficient implementation using derivatives. http://labra.github.io/ShExcala/
    - Haws (Jose E. Labra, Haskell): prototype to check inference semantics. http://labra.github.io/haws/
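    ShExcala is described as using derivatives. The sketch below shows the general technique, Brzozowski-style derivatives adapted to unordered bags of triples: each triple is consumed by taking the derivative of the expression, and the bag matches if what remains accepts the empty bag. It assumes a reduced expression language (`And` for ShEx `,`, `Or` for `|`, `Star` for `*`, single arcs with plain value tests) and is an illustration, not ShExcala's code; references to other shapes are not handled.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Union

Triple = tuple[str, str, str]   # (subject, predicate, object)


@dataclass(frozen=True)
class Empty:                    # matches only the empty bag of triples
    pass


@dataclass(frozen=True)
class Fail:                     # matches nothing
    pass


@dataclass(frozen=True)
class Arc:                      # one triple with this predicate and a valid object
    predicate: str
    test: Callable[[str], bool]


@dataclass(frozen=True)
class And:                      # ShEx ",": both parts, in any order
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Or:                       # ShEx "|"
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Star:                     # ShEx "*"
    expr: "Expr"


Expr = Union[Empty, Fail, Arc, And, Or, Star]


def and_(a: Expr, b: Expr) -> Expr:
    """Smart constructor: drop Empty, propagate Fail."""
    if isinstance(a, Fail) or isinstance(b, Fail):
        return Fail()
    if isinstance(a, Empty):
        return b
    if isinstance(b, Empty):
        return a
    return And(a, b)


def or_(a: Expr, b: Expr) -> Expr:
    """Smart constructor: drop Fail branches."""
    if isinstance(a, Fail):
        return b
    if isinstance(b, Fail):
        return a
    return Or(a, b)


def nullable(e: Expr) -> bool:
    """True if e accepts the empty bag of triples."""
    if isinstance(e, (Empty, Star)):
        return True
    if isinstance(e, (Fail, Arc)):
        return False
    if isinstance(e, And):
        return nullable(e.left) and nullable(e.right)
    return nullable(e.left) or nullable(e.right)   # Or


def deriv(e: Expr, t: Triple) -> Expr:
    """Derivative of e with respect to triple t: what remains to be matched."""
    _, p, o = t
    if isinstance(e, (Empty, Fail)):
        return Fail()
    if isinstance(e, Arc):
        return Empty() if p == e.predicate and e.test(o) else Fail()
    if isinstance(e, And):
        # Bags are unordered, so t may be consumed by either side.
        return or_(and_(deriv(e.left, t), e.right), and_(e.left, deriv(e.right, t)))
    if isinstance(e, Or):
        return or_(deriv(e.left, t), deriv(e.right, t))
    return and_(deriv(e.expr, t), e)               # Star


def matches(e: Expr, triples: list[Triple]) -> bool:
    """Consume the triples one at a time, then check for acceptance."""
    return nullable(reduce(deriv, triples, e))
```

    Unlike trying every decomposition of the graph, each triple is processed once; the smart constructors keep the intermediate expressions small by pruning failed branches.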
  • 26.
    Applications to linked data portals
    2 data portals: WebIndex and LandPortal
    Data portal documentation:
    http://weso.github.io/wiDoc/
    http://weso.github.io/landportalDoc/data
    WebIndex:
    <Observation> {
        cex:md5-checksum xsd:string ,
        cex:computation @<Computation> ,
        dcterms:issued xsd:integer ,
        dcterms:publisher ( wi-org:WebFoundation ) ,
        qb:dataSet @<Dataset> ,
        rdfs:label (@en) ,
        sdmx-concept:obsStatus @<ObsStatus> ,
        wi-onto:ref-area @<Area> ,
        wi-onto:ref-indicator @<Indicator> ,
        wi-onto:ref-year xsd:int ,
        cex:value xsd:double ,
        a ( qb:Observation )
    }
    LandPortal:
    <Observation> {
        cex:ref-area @<Area> ,
        cex:ref-indicator @<Indicator> ,
        cex:ref-time @<Time> ,
        cex:value xsd:double? ,
        cex:computation @<Computation> ,
        dcterms:issued xsd:dateTime ,
        qb:dataSet @<DataSet> ,
        qb:slice @<Slice> ,
        rdfs:label xsd:string ,
        lb:source @<Upload> ,
        a ( qb:Observation )
    }
    Same type: qb:Observation ...but different shapes
    More info: paper at the Linked Data Quality Workshop
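    The point that the same type can have different shapes can be made concrete by checking only the cardinalities of the two `<Observation>` shapes above. This is a deliberately reduced sketch: it ignores value tests (datatypes, `@<Shape>` references, value sets, language tags), writes `a` as `rdf:type`, does not model whether undeclared predicates are allowed, and uses a made-up observation whose values are placeholders. The helper names are hypothetical.

```python
from collections import Counter

Card = tuple[int, int]   # (min, max) number of arcs allowed for a predicate


def violations(arcs: list[tuple[str, str]], shape: dict[str, Card]) -> list[str]:
    """Predicates of `shape` whose arc count falls outside [min, max]."""
    counts = Counter(p for p, _ in arcs)
    return [p for p, (lo, hi) in shape.items() if not lo <= counts[p] <= hi]


def exactly_once(*preds: str) -> dict[str, Card]:
    """Shape where every listed predicate must appear exactly once."""
    return {p: (1, 1) for p in preds}


# Predicates of the WebIndex <Observation> shape.
WEBINDEX = exactly_once(
    "cex:md5-checksum", "cex:computation", "dcterms:issued", "dcterms:publisher",
    "qb:dataSet", "rdfs:label", "sdmx-concept:obsStatus", "wi-onto:ref-area",
    "wi-onto:ref-indicator", "wi-onto:ref-year", "cex:value", "rdf:type")

# Predicates of the LandPortal <Observation> shape.
LANDPORTAL = exactly_once(
    "cex:ref-area", "cex:ref-indicator", "cex:ref-time", "cex:computation",
    "dcterms:issued", "qb:dataSet", "qb:slice", "rdfs:label", "lb:source",
    "rdf:type")
LANDPORTAL["cex:value"] = (0, 1)   # cex:value xsd:double?
```

    A hypothetical LandPortal observation without `cex:value` passes the LandPortal check, while the same arcs fail the WebIndex check on predicates such as `wi-onto:ref-area` and `cex:value`, even though both nodes are typed `qb:Observation`.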
  • 27.
    Conclusions
    Shape Expressions = simple language
    One goal: describe and validate RDF graphs
    Semantics of Shape Expressions described using inference rules
    ...but Shape Expressions can be converted to SPARQL
    Compatible with other Semantic technologies
    Semantic actions = extensibility mechanism
    Can be applied to transform RDF
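    As a rough illustration of converting shapes to SPARQL, the sketch below generates an ASK query for one kind of constraint only: the node must have exactly one value for a predicate, and that value must have a given datatype. It is a naive translation written for this example (the `shape_to_ask` name is hypothetical), not the conversion performed by FancyDemo; optional and repeated arcs, alternatives and references to other shapes are not handled.

```python
def shape_to_ask(node: str, constraints: list[tuple[str, str]],
                 prefixes: dict[str, str]) -> str:
    """Build a SPARQL ASK query that is true when `node` has exactly one value
    for each (predicate, datatype) pair and that value has the datatype."""
    lines = [f"PREFIX {p}: <{iri}>" for p, iri in prefixes.items()]
    lines.append("ASK {")
    for i, (pred, dt) in enumerate(constraints):
        # ?n counts all values; ?m counts values with the expected datatype.
        # COALESCE turns the error raised by datatype() on an IRI into false.
        lines.append(
            f"  {{ SELECT (COUNT(?o{i}) AS ?n{i}) "
            f"(SUM(IF(COALESCE(datatype(?o{i}) = {dt}, false), 1, 0)) AS ?m{i}) "
            f"WHERE {{ {node} {pred} ?o{i} }} }}")
        lines.append(f"  FILTER(?n{i} = 1 && ?m{i} = 1)")
    lines.append("}")
    return "\n".join(lines)
```

    For example, `shape_to_ask(":Bob", [("foaf:name", "xsd:string")], {"foaf": "http://xmlns.com/foaf/0.1/", "xsd": "http://www.w3.org/2001/XMLSchema#"})` yields a query that is true for `:Bob` and false for `:Tom`, who has two `foaf:name` values (a `PREFIX :` entry for the example namespace would also be needed; its IRI is not given in the slides).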
  • 28.
    Future work
    Improve implementations and language:
    - Debugging and error messages
    - Expressiveness and usability of the language
    - Performance evaluation
    Shape Expressions = a role similar to XML Schema for XML
    Future applications:
    - Online validators
    - Interface generators
    - Binding: generate parsers/tools from shapes
    - Performance of RDF triplestores?
  • 29.
    Future work at W3C
    RDF Data Shapes WG chartered
    Mailing list: public-rdf-shapes@w3.org
    "The discussion on public-rdf-shapes@w3.org is the best entertainment since years; Game of Thrones colors pale." Paul Hermans (@PaulZH)
  • 30.
    End of presentation
    Slides available at: http://www.slideshare.net/jelabra/semantics-2014