Researcher - University of Oviedo at University of Oviedo
Aug. 31, 2014•0 likes•1,574 views
1 of 29
Validating and Describing Linked Data Portals using RDF Shape Expressions
Aug. 31, 2014•0 likes•1,574 views
Download to read offline
Report
Software
Presentation at 1st Linked Data Quality Workshop, Leipzig, 2nd Sept. 2014
Author: Jose Emilio Labra Gayo
Applies Shapes Expressions to validate the WebIndex linked data portal
Validating and Describing Linked Data Portals using RDF Shape Expressions
1. Validating and Describing
Linked Data Portals using
RDF Shape Expressions
Eric Prud'hommeaux
World Wide Web
Consortium
MIT, Cambridge, MA, USA
eric@w3.org
Jose Emilio Labra Gayo
WESO Research group
University of Oviedo
Spain
labra@uniovi.es
Harold Solbrig
Mayo Clinic
USA
College of Medicine, Rochester,
MN, USA
Jose María Álvarez Rodríguez
Dept. Computer Science
Carlos III University
Spain
josemaria.alvarez@uc3m.es
2. This talk in one slide
Shape Expressions
Simple language to describe and validate RDF
Can be applied to linked data portals
Use case: WebIndex project
3. Web Index
Measure WWW's contribution to development and
human rights by country
Developed by the Web Foundation
81 countries, 116 indicators, 5 years (2007-12)
Linked data portal
http://data.webfoundation.org/webindex/2013
4. Webindex workflow
Data
(Excel)
RDF
Datastore
Visualizations
Linked data portal
Conversion
Excel RDF
Enrichment
5. WebIndex data model
Observations have values by years
Observations refer to indicators and countries
ITU_B 2011 2012 2013 ...
Germany 20.34 35.46 37.12 ...
Spain 19.12 23.78 25.45 ...
France 20.12 21.34 28.34 ...
... ... ... ... ...
ITU_B 2011 2012 2013 ...
Germany 20.34 35.46 37.12 ...
Spain 19.12 23.78 25.45 ...
France 20.12 21.34 28.34 ...
... ... ... ... ...
ITU_B 2010 2011 2012 ...
Germany 20.34 35.46 37.12 ...
Spain 19.12 23.78 25.45 ...
France 20.12 21.34 28.34 ...
... ... ... ... ...
Model based on RDF Data Cube
Main entity = Observation
DataSets are published by Organizations
Datasets contain several slices
Slices group observations
Observation
Indicator Years
% Broadband subscribers
Countries
Slice
DataSet
Indicators are provided by Organizations
Examples
ITU = International Telecommunication Union
UN = United Nations
WB = World bank
...
8. Description and Validation
Lots of constraints
Observations must be linked to some country
Observations have a float value
Observations are related with an indicator, a country
and a year
Dataset contains several slices and slices contain
several observations
....etc.
Q: How can we express those constraints easily?
Our proposal: Shape expressions
9. Shape Expressions
A simple and intuitive language to:
Describe the topology of RDF data
Validate that an RDF graph matches a shape
Two syntaxes
- Compact syntax (inspired by RelaxNG, Turtle and SPARQL)
- RDF
More details about ShEx:
Paper in Semantics-2014 (5th Sept, 10:30h)
http://www.w3.org/2013/ShEx/Primer
10. Country
A <Country> has at least the following properties:
rdf:type with value wf:Country
rdfs:label with value of type xsd:string
wf:iso2 with value of type xsd:string
wf:iso3 with value of type xsd:string
Using shape Expressions:
Label Open shape
<Country> {
rdf:type (wf:Country)
, rdfs:label xsd:string
, wf:iso2 xsd:string
, wf:iso3 xsd:string
}
Conjunction
11. DataSets
A <DataSet> has the shape:
rdf:type with value qb:Dataset
qb:structure with value wf:DSD
Optional rdfs:label with value of type xsd:string
One or more qb:slice with shape <Slice>
<DataSet> {
rdf:type (qb:DataSet)
, qb:structure (wf:DSD)
, dc:publisher @<Organization>
, rdfs:label xsd:string ?
, qb:slice @<Slice>+
}
Cardinality posibilities:
* (0 or more)
? (0 or 1)
+ (1 or more)
{m,n} between m and n
12. Slices
<Slice> has the properties:
rdf:type with value qb:Slice
qb:SliceStructure Whar does with it value mean?
wf:sliceByYear
Several qb:observation with shape <Observation>
cex:indicator with shape <Indicator>
<Slice> {
rdf:type (qb:Slice)
, qb:sliceStructure (wf:sliceByYear)
, qb:observation @<Observation>+
, cex:indicator @<Indicator>
}
15. Current Use of shape expressions
in WebIndex
1. Documentation of linked data portal
Human-readable
Machine processable
2. Team communication
Tell the developers which shapes they must generate
3. Validation
For example: check if a value of type
qb:Observation has shape <Observation>
16. OK, but...why not use SPARQL?
1 CONSTRUCT {
2 ?Organization shex:hasShape <Organization> .
3 } WHERE { SELECT ?Organization {
4 ?Organization a ?o .
5 } GROUP BY ?Organization HAVING (COUNT(*)=1)}
6 { SELECT ?Organization {
7 ?Organization a ?o . FILTER ((?o = org:Organization))
8 } GROUP BY ?Organization HAVING (COUNT(*)=1)}
9 { SELECT ?Organization {
10 ?Organization rdfs:label ?o .
11 } GROUP BY ?Organization HAVING (COUNT(*)=1)}
12 { SELECT ?Organization {
13 ?Organization rdfs:label ?o .
14 FILTER ((isLiteral(?o) && datatype(?o) = xsd:string))
15 } GROUP BY ?Organization HAVING (COUNT(*)=1)}
16 { SELECT ?Organization {
17 ?Organization foaf:homepage ?o .
18 } GROUP BY ?Organization HAVING (COUNT(*)=1)}
19 { SELECT ?Organization {
20 ?Organization foaf:homepage ?o . FILTER (isIRI(?o))
21 } GROUP BY ?Organization HAVING (COUNT(*)=1)}
22 { SELECT ?Organization (COUNT(*) AS ?Organization_c0) {
23 ?Organization org:hasSubOrganization ?o .
24 } GROUP BY ?Organization}
25 { SELECT ?Organization (COUNT(*) AS ?Organization_c1) {
26 ?Organization org:hasSubOrganization ?o .
26 } GROUP BY ?Organization}
28 FILTER (?Organization_c0 = ?Organization_c1)
29 }
SPARQL
Opgnization shape
Shape Expressions
1 <Organization> {
2 rdf:type (org:Organization)
3 , rdfs:label xsd:string
4 , foaf:homepage IRI
5 , org:hasSubOrganization @<Organization>
6 }
Full example of WebIndex:
61 lines ShEx vs 580 lines SPARQL
17. Implementations of
Shape Expressions
Name Main
Developer
Language Features
FancyDemo Eric
Prud'hommeaux
Javascript First implementation
Semantic Actions and Conversion to SPARQL
http://www.w3.org/2013/ShEx/
JsShExTest Jesse van Dam Javascript Supports RDF and Compact syntax
https://github.com/jessevdam/shextest
ShExcala Jose E. Labra Scala Several extensions:
negations, reverse arcs, relations,...
Efficient implementation using Derivatives
http://labra.github.io/ShExcala/
Haws Jose E. Labra Haskell Prototype to check Inference semantics
http://labra.github.io/haws/
18. RDFShape: Online validation tool
RDFShape = online validation tool
Deployed at: http://rdfshape.weso.es
Based on ShExcala
Can validate by:
URI
File
Textarea
SPARQL endpoint
Dereference
19. Shapes ≠ Types
Types oriented to concepts
Inference: RDF Schena, OWL
Shapes oriented to RDF graphs
Interface descriptions
WebIndex
<Observation> {
rdf:type (qb:Observation)
,cex:ref-year xsd:gYear
...other properties from WebIndex
}
LandPortal
<Observation> {
rdf:type (qb:Observation)
, lb:time @<Time>
...other properties from LandPortal
}
RDF Data Cube
qb:Observation a rdfs:Class,
owl:Class;
rdfs:label "Observation"@en;
rdfs:comment "... "@en;
rdfs:subClassOf qb:Attachable;
owl:equivalentClass scovo:Item;
rdfs:isDefinedBy ...
Can be local to a data portal Types are global
However, we can also define generic Shapes templates
20. Extensions & challenges
Shape Expressions language has just been born
It will be affected by W3c Charter group about
RDF Data Shapes
Mailing list: public-rdf-shapes@w3c.org
"The discussion on public-rdf-shapes@w3.org is the
best entertainment since years;
Game of Thrones colors pale." @PaulZH
21. Alternatives, negations, etc.
More expressiveness from regular expressions
Alternatives
Negations
Groupings
<Country> { a (wf:Country)
, rdfs:label xsd:string
, ( wf:iso2 xsd:string | wf:iso3 xsd:string )
, ! dc:creator .
}
22. Open vs Closed shapes
Open shapes allow remaining triples after validation
<User> {
a (foaf:Person)
}
:john a foaf:Person,
foaf:name "John" .
<User> [
a (foaf:Person)
]
:anna a foaf:Person .
Open shape Closed shape
23. Shape inclusions
A shape includes/extends another shape
Example:
<Provider> extends <Organization> {
wf:sourceURI IRI
}
A <Provider> has the same shape as an <Organization>
plus the property wf:sourceURI
24. Reverse and relation arcs
Shape Expressions describe subjects
Can be extended to describe
objects (reverse arcs)
predicates (relation arcs)
<Country> {
rdf:type (wf:Country)
, rdfs:label xsd:string
, wf:iso2 xsd:string
, wf:iso3 xsd:string
, ^ cex:ref-area @<Observation> *
}
Reverse arc: a country has several incoming arcs with property cex:ref-area
from subjects with shape <Observation>
25. Semantic actions
Defines actions to be executed during validation
%lang{ ...actions... %}
Calls lang processor passing it the given actions
<Country> {
...
wf:iso2 xsd:string %js{return /^[A-Z]{2}$/i.test(_.o.lex); %}
, wf:iso3 xsd:string %js{return /^[A-Z]{3}$/i.test(_.o.lex); %}
}
26. Variables and filters
Add variable bindings to matched values
Add FILTER rules similar to SPARQL
<Country> {
wf:iso2 (xsd:string AS ?iso2)
wf:iso3 (xsd:string AS ?iso3)
FILTER (regex(?iso2,"^[A-Z]{2}$","i") &&
regex(?iso3,"^[A-Z]{3}$","i"))
}
27. Conclusions
Shape Expressions:
DSL to describe and validate RDF
Role similar to DTDs or Schema languages for XML
Quality of linked data portals
Shape Expressions as interface descriptions
Publishers: Understand what to publish
Consumers: Check data before processing
28. Future Work
Shape Expressions language
Named graphs
Implementations: Debugging and error messages
Performance
Applications to
Other linked data use cases
User interface generation
Binding: generate parsers/tools from shapes
...