Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Validating and Describing 
Linked Data Portals using 
RDF Shape Expressions 
Eric Prud'hommeaux 
World Wide Web 
Consortiu...
This talk in one slide 
Shape Expressions 
Simple language to describe and validate RDF 
Can be applied to linked data por...
Web Index 
Measure WWW's contribution to development and 
human rights by country 
Developed by the Web Foundation 
81 cou...
Webindex workflow 
Data 
(Excel) 
RDF 
Datastore 
Visualizations 
Linked data portal 
Conversion 
Excel  RDF 
Enrichment
WebIndex data model 
Observations have values by years 
Observations refer to indicators and countries 
ITU_B 2011 2012 20...
Main webIndex data model* 
1..n 
qb:slice 
Observation 
qb:observation 
rdf:type = qb:Observation 
cex:value: xsd:float 
d...
Excel  RDF (Turtle) 
indicator:ITU_B 
a wf:SecondaryIndicator ; 
rdfs:label "Broadband subscribers %" 
. 
dataset:DITU a ...
Description and Validation 
Lots of constraints 
Observations must be linked to some country 
Observations have a float va...
Shape Expressions 
A simple and intuitive language to: 
Describe the topology of RDF data 
Validate that an RDF graph matc...
Country 
A <Country> has at least the following properties: 
rdf:type with value wf:Country 
rdfs:label with value of type...
DataSets 
A <DataSet> has the shape: 
rdf:type with value qb:Dataset 
qb:structure with value wf:DSD 
Optional rdfs:label ...
Slices 
<Slice> has the properties: 
rdf:type with value qb:Slice 
qb:SliceStructure Whar does with it value mean? 
wf:sli...
Observations 
<Observation> { 
rdf:type (qb:Observation) 
, cex:value xsd:float ? 
, dc:issued xsd:dateTime 
, rdfs:label ...
...and more 
Indicators 
<Indicator> { 
rdf:type (wf:PrimaryIndicator wf:SecondaryIndicator) 
, rdfs:label xsd:string 
, r...
Current Use of shape expressions 
in WebIndex 
1. Documentation of linked data portal 
Human-readable 
Machine processable...
OK, but...why not use SPARQL? 
1 CONSTRUCT { 
2 ?Organization shex:hasShape <Organization> . 
3 } WHERE { SELECT ?Organiza...
Implementations of 
Shape Expressions 
Name Main 
Developer 
Language Features 
FancyDemo Eric 
Prud'hommeaux 
Javascript ...
RDFShape: Online validation tool 
RDFShape = online validation tool 
Deployed at: http://rdfshape.weso.es 
Based on ShExca...
Shapes ≠ Types 
Types oriented to concepts 
Inference: RDF Schena, OWL 
Shapes oriented to RDF graphs 
Interface descripti...
Extensions & challenges 
Shape Expressions language has just been born 
It will be affected by W3c Charter group about 
RD...
Alternatives, negations, etc. 
More expressiveness from regular expressions 
Alternatives 
Negations 
Groupings 
<Country>...
Open vs Closed shapes 
Open shapes allow remaining triples after validation 
<User> { 
a (foaf:Person) 
} 
:john a foaf:Pe...
Shape inclusions 
A shape includes/extends another shape 
Example: 
<Provider> extends <Organization> { 
wf:sourceURI IRI ...
Reverse and relation arcs 
Shape Expressions describe subjects 
Can be extended to describe 
objects (reverse arcs) 
predi...
Semantic actions 
Defines actions to be executed during validation 
%lang{ ...actions... %} 
Calls lang processor passing ...
Variables and filters 
Add variable bindings to matched values 
Add FILTER rules similar to SPARQL 
<Country> { 
wf:iso2 (...
Conclusions 
Shape Expressions: 
DSL to describe and validate RDF 
Role similar to DTDs or Schema languages for XML 
Quali...
Future Work 
Shape Expressions language 
Named graphs 
Implementations: Debugging and error messages 
Performance 
Applica...
End of presentation 
More info: 
Jose Emilio Labra Gayo 
http://www.weso.es
Upcoming SlideShare
Loading in …5
×

Validating and Describing Linked Data Portals using RDF Shape Expressions

1,298 views

Published on

Presentation at 1st Linked Data Quality Workshop, Leipzig, 2nd Sept. 2014
Author: Jose Emilio Labra Gayo
Applies Shapes Expressions to validate the WebIndex linked data portal

Published in: Software
  • Be the first to comment

Validating and Describing Linked Data Portals using RDF Shape Expressions

  1. 1. Validating and Describing Linked Data Portals using RDF Shape Expressions Eric Prud'hommeaux World Wide Web Consortium MIT, Cambridge, MA, USA eric@w3.org Jose Emilio Labra Gayo WESO Research group University of Oviedo Spain labra@uniovi.es Harold Solbrig Mayo Clinic USA College of Medicine, Rochester, MN, USA Jose María Álvarez Rodríguez Dept. Computer Science Carlos III University Spain josemaria.alvarez@uc3m.es
  2. 2. This talk in one slide Shape Expressions Simple language to describe and validate RDF Can be applied to linked data portals Use case: WebIndex project
  3. 3. Web Index Measure WWW's contribution to development and human rights by country Developed by the Web Foundation 81 countries, 116 indicators, 5 years (2007-12) Linked data portal http://data.webfoundation.org/webindex/2013
  4. 4. Webindex workflow Data (Excel) RDF Datastore Visualizations Linked data portal Conversion Excel  RDF Enrichment
  5. 5. WebIndex data model Observations have values by years Observations refer to indicators and countries ITU_B 2011 2012 2013 ... Germany 20.34 35.46 37.12 ... Spain 19.12 23.78 25.45 ... France 20.12 21.34 28.34 ... ... ... ... ... ... ITU_B 2011 2012 2013 ... Germany 20.34 35.46 37.12 ... Spain 19.12 23.78 25.45 ... France 20.12 21.34 28.34 ... ... ... ... ... ... ITU_B 2010 2011 2012 ... Germany 20.34 35.46 37.12 ... Spain 19.12 23.78 25.45 ... France 20.12 21.34 28.34 ... ... ... ... ... ... Model based on RDF Data Cube Main entity = Observation DataSets are published by Organizations Datasets contain several slices Slices group observations Observation Indicator Years % Broadband subscribers Countries Slice DataSet Indicators are provided by Organizations Examples ITU = International Telecommunication Union UN = United Nations WB = World bank ...
  6. 6. Main webIndex data model* 1..n qb:slice Observation qb:observation rdf:type = qb:Observation cex:value: xsd:float dc:issued: xsd:dateTime rdfs:label: xsd:String cex:ref-year: xsd:gYear Organization rdf:type = org:Organization rdfs:label: xsd:String foaf:homepage: IRI Indicator rdf:type = cex:Primary | cex:Secondary rdfs:label: xsd:string rdfs:comment: xsd:string skos:notation: xsd:String Slice rdf:type = qb:Slice qb:sliceStructure: wf:sliceByArea Country rdf:type = wf:Country wf:iso2 : xsd:string wf:iso3 : xsd:string rdfs:label : xsd:String DataSet rdf:type = qb:DataSet qb:structure : wf:DSD rdfs:label : xsd:String qb:observation cex:ref-area cex:indicator wf:provider dc:publisher 1..n *Simplified
  7. 7. Excel  RDF (Turtle) indicator:ITU_B a wf:SecondaryIndicator ; rdfs:label "Broadband subscribers %" . dataset:DITU a qb:DataSet ; rdfs:label "ITU Dataset" ; dc:publisher org:ITU ; qb:slice slice:ITU10B , slice:ITU11B, . ... ... slice:ITU11B a qb:Slice ; qb:sliceStructure wf:sliceByYear ; qb:observation obs:obs8165, obs:obs8166, ... ... org:ITU a org:Organization ; rdfs:label "ITU" ; foaf:homepage <http://www.itu.int/> . country:Spain a wf:Country ; wf:iso2 "ES" ; wf:iso3 "ESP" ; rdfs:label "Spain" . interrelated linked data obs:obs8165 a qb:Observation ; rdfs:label "ITU B in ESP, 2011" ; cex:indicator indicator:ITU_B ; qb:dataSet dataset:DITU ; cex:value "23.78"^^xsd:float ; cex:ref-year 2011 ; cex:ref-area country:Spain ; dc:issued "2013-05-30"^^xsd:date ; ... .
  8. 8. Description and Validation Lots of constraints Observations must be linked to some country Observations have a float value Observations are related with an indicator, a country and a year Dataset contains several slices and slices contain several observations ....etc. Q: How can we express those constraints easily? Our proposal: Shape expressions
  9. 9. Shape Expressions A simple and intuitive language to: Describe the topology of RDF data Validate that an RDF graph matches a shape Two syntaxes - Compact syntax (inspired by RelaxNG, Turtle and SPARQL) - RDF More details about ShEx: Paper in Semantics-2014 (5th Sept, 10:30h) http://www.w3.org/2013/ShEx/Primer
  10. 10. Country A <Country> has at least the following properties: rdf:type with value wf:Country rdfs:label with value of type xsd:string wf:iso2 with value of type xsd:string wf:iso3 with value of type xsd:string Using shape Expressions: Label Open shape <Country> { rdf:type (wf:Country) , rdfs:label xsd:string , wf:iso2 xsd:string , wf:iso3 xsd:string } Conjunction
  11. 11. DataSets A <DataSet> has the shape: rdf:type with value qb:Dataset qb:structure with value wf:DSD Optional rdfs:label with value of type xsd:string One or more qb:slice with shape <Slice> <DataSet> { rdf:type (qb:DataSet) , qb:structure (wf:DSD) , dc:publisher @<Organization> , rdfs:label xsd:string ? , qb:slice @<Slice>+ } Cardinality posibilities: * (0 or more) ? (0 or 1) + (1 or more) {m,n} between m and n
  12. 12. Slices <Slice> has the properties: rdf:type with value qb:Slice qb:SliceStructure Whar does with it value mean? wf:sliceByYear Several qb:observation with shape <Observation> cex:indicator with shape <Indicator> <Slice> { rdf:type (qb:Slice) , qb:sliceStructure (wf:sliceByYear) , qb:observation @<Observation>+ , cex:indicator @<Indicator> }
  13. 13. Observations <Observation> { rdf:type (qb:Observation) , cex:value xsd:float ? , dc:issued xsd:dateTime , rdfs:label xsd:string ? , qb:dataSet @<DataSet> , cex:ref-area @<Country> , cex:indicator @<Indicator> , cex:ref-year xsd:gYear }
  14. 14. ...and more Indicators <Indicator> { rdf:type (wf:PrimaryIndicator wf:SecondaryIndicator) , rdfs:label xsd:string , rdfs:comment xsd:string ? , skos:notation xsd:string ? } Organizations <Organization> { rdf:type (org:Organization) , rdfs:label xsd:string , foaf:homepage IRI , org:hasSubOrganization @<Organization> }
  15. 15. Current Use of shape expressions in WebIndex 1. Documentation of linked data portal Human-readable Machine processable 2. Team communication Tell the developers which shapes they must generate 3. Validation For example: check if a value of type qb:Observation has shape <Observation>
  16. 16. OK, but...why not use SPARQL? 1 CONSTRUCT { 2 ?Organization shex:hasShape <Organization> . 3 } WHERE { SELECT ?Organization { 4 ?Organization a ?o . 5 } GROUP BY ?Organization HAVING (COUNT(*)=1)} 6 { SELECT ?Organization { 7 ?Organization a ?o . FILTER ((?o = org:Organization)) 8 } GROUP BY ?Organization HAVING (COUNT(*)=1)} 9 { SELECT ?Organization { 10 ?Organization rdfs:label ?o . 11 } GROUP BY ?Organization HAVING (COUNT(*)=1)} 12 { SELECT ?Organization { 13 ?Organization rdfs:label ?o . 14 FILTER ((isLiteral(?o) && datatype(?o) = xsd:string)) 15 } GROUP BY ?Organization HAVING (COUNT(*)=1)} 16 { SELECT ?Organization { 17 ?Organization foaf:homepage ?o . 18 } GROUP BY ?Organization HAVING (COUNT(*)=1)} 19 { SELECT ?Organization { 20 ?Organization foaf:homepage ?o . FILTER (isIRI(?o)) 21 } GROUP BY ?Organization HAVING (COUNT(*)=1)} 22 { SELECT ?Organization (COUNT(*) AS ?Organization_c0) { 23 ?Organization org:hasSubOrganization ?o . 24 } GROUP BY ?Organization} 25 { SELECT ?Organization (COUNT(*) AS ?Organization_c1) { 26 ?Organization org:hasSubOrganization ?o . 26 } GROUP BY ?Organization} 28 FILTER (?Organization_c0 = ?Organization_c1) 29 } SPARQL Opgnization shape Shape Expressions 1 <Organization> { 2 rdf:type (org:Organization) 3 , rdfs:label xsd:string 4 , foaf:homepage IRI 5 , org:hasSubOrganization @<Organization> 6 } Full example of WebIndex: 61 lines ShEx vs 580 lines SPARQL
  17. 17. Implementations of Shape Expressions Name Main Developer Language Features FancyDemo Eric Prud'hommeaux Javascript First implementation Semantic Actions and Conversion to SPARQL http://www.w3.org/2013/ShEx/ JsShExTest Jesse van Dam Javascript Supports RDF and Compact syntax https://github.com/jessevdam/shextest ShExcala Jose E. Labra Scala Several extensions: negations, reverse arcs, relations,... Efficient implementation using Derivatives http://labra.github.io/ShExcala/ Haws Jose E. Labra Haskell Prototype to check Inference semantics http://labra.github.io/haws/
  18. 18. RDFShape: Online validation tool RDFShape = online validation tool Deployed at: http://rdfshape.weso.es Based on ShExcala Can validate by: URI File Textarea SPARQL endpoint Dereference
  19. 19. Shapes ≠ Types Types oriented to concepts Inference: RDF Schena, OWL Shapes oriented to RDF graphs Interface descriptions WebIndex <Observation> { rdf:type (qb:Observation) ,cex:ref-year xsd:gYear ...other properties from WebIndex } LandPortal <Observation> { rdf:type (qb:Observation) , lb:time @<Time> ...other properties from LandPortal } RDF Data Cube qb:Observation a rdfs:Class, owl:Class; rdfs:label "Observation"@en; rdfs:comment "... "@en; rdfs:subClassOf qb:Attachable; owl:equivalentClass scovo:Item; rdfs:isDefinedBy ... Can be local to a data portal Types are global However, we can also define generic Shapes templates
  20. 20. Extensions & challenges Shape Expressions language has just been born It will be affected by W3c Charter group about RDF Data Shapes Mailing list: public-rdf-shapes@w3c.org "The discussion on public-rdf-shapes@w3.org is the best entertainment since years; Game of Thrones colors pale." @PaulZH
  21. 21. Alternatives, negations, etc. More expressiveness from regular expressions Alternatives Negations Groupings <Country> { a (wf:Country) , rdfs:label xsd:string , ( wf:iso2 xsd:string | wf:iso3 xsd:string ) , ! dc:creator . }
  22. 22. Open vs Closed shapes Open shapes allow remaining triples after validation <User> { a (foaf:Person) } :john a foaf:Person, foaf:name "John" . <User> [ a (foaf:Person) ] :anna a foaf:Person . Open shape Closed shape    
  23. 23. Shape inclusions A shape includes/extends another shape Example: <Provider> extends <Organization> { wf:sourceURI IRI } A <Provider> has the same shape as an <Organization> plus the property wf:sourceURI
  24. 24. Reverse and relation arcs Shape Expressions describe subjects Can be extended to describe objects (reverse arcs) predicates (relation arcs) <Country> { rdf:type (wf:Country) , rdfs:label xsd:string , wf:iso2 xsd:string , wf:iso3 xsd:string , ^ cex:ref-area @<Observation> * } Reverse arc: a country has several incoming arcs with property cex:ref-area from subjects with shape <Observation>
  25. 25. Semantic actions Defines actions to be executed during validation %lang{ ...actions... %} Calls lang processor passing it the given actions <Country> { ... wf:iso2 xsd:string %js{return /^[A-Z]{2}$/i.test(_.o.lex); %} , wf:iso3 xsd:string %js{return /^[A-Z]{3}$/i.test(_.o.lex); %} }
  26. 26. Variables and filters Add variable bindings to matched values Add FILTER rules similar to SPARQL <Country> { wf:iso2 (xsd:string AS ?iso2) wf:iso3 (xsd:string AS ?iso3) FILTER (regex(?iso2,"^[A-Z]{2}$","i") && regex(?iso3,"^[A-Z]{3}$","i")) }
  27. 27. Conclusions Shape Expressions: DSL to describe and validate RDF Role similar to DTDs or Schema languages for XML Quality of linked data portals Shape Expressions as interface descriptions Publishers: Understand what to publish Consumers: Check data before processing
  28. 28. Future Work Shape Expressions language Named graphs Implementations: Debugging and error messages Performance Applications to Other linked data use cases User interface generation Binding: generate parsers/tools from shapes ...
  29. 29. End of presentation More info: Jose Emilio Labra Gayo http://www.weso.es

×