Weaving the Pedantic Web (LD

Published in: Technology

  1. Weaving the Pedantic Web. LDOW 2010. Aidan Hogan, Andreas Harth, Alexandre Passant, Stefan Decker, Axel Polleres. Digital Enterprise Research Institute, www.deri.ie. Copyright 2009 Digital Enterprise Research Institute. All rights reserved.
  2. Linked Data…
  3. Purpose of talk: Application developers… how to not sink…
  4. Purpose of talk: RDF Publishers… how to avoid common mistakes…
  5. Talking about errors in Linked Data… We’ll try not to ruin the party
     …statistics based on crawl:
     - April 2009
     - 5k domain limit
     - 150k URIs, 55k RDF docs
     - 12.5m triples (quads)
     - Mentioning 1.6m URIs
     - 5,850 classes / 9,507 props
     - Accept: application/rdf+xml …okay… so no RDFa
     Statistics are *illustrative*, not exhaustive!
  6. Chapter 1: HTTP-level issues… …a good RDF description these days is hard to find
  7. Waldo URIs: URIs with no dereferenceable RDF. Not a crawler’s idea of fun…
  8. Hmm not *so* many…
     - 5.3% of HTTP URIs return 40x/50x
     - Excluding redirects… 92.8% return 200 OK
     - In turn, only 45.4% of the 200 OK responses report application/rdf+xml
     - 34.8% return HTML… probably just HTML docs… okay… maybe a *few* contain RDFa
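A crawler could bucket each lookup along the lines of the percentages above. This is only a minimal sketch: the function name `classify_lookup` and the bucket labels are invented for illustration, and it works on an already-fetched status code and Content-Type header rather than performing the HTTP request itself.

```python
from typing import Optional


def classify_lookup(status: int, content_type: Optional[str]) -> str:
    """Bucket one HTTP lookup (after redirects have been followed)."""
    if 400 <= status < 600:
        return "error"  # the 40x/50x bucket
    if status != 200:
        return "other"
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = (content_type or "").split(";")[0].strip().lower()
    if media_type == "application/rdf+xml":
        return "rdf/xml"
    if media_type in ("text/html", "application/xhtml+xml"):
        return "html"  # may still carry RDFa
    return "other-200"


print(classify_lookup(404, None))
print(classify_lookup(200, "application/rdf+xml; charset=UTF-8"))
print(classify_lookup(200, "text/html"))
```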
  9. Lies… Damned Lies… & Content-Type Reporting. “Trust me, it’s RDF/XML”
  10. Okay… So he’s actually pretty honest
     - 16.9% of valid RDF/XML documents returned with an invalid/more generic Content-type:
       text/xml (9.5%), application/xml (5.9%), text/plain (1%), text/html (0.4%)
     - Of those returning Content-type: application/rdf+xml, 98.8% were valid RDF/XML
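The comparison behind these figures (reported Content-type versus what the body actually is) can be sketched as follows. The helper names are hypothetical, and `looks_like_rdfxml` is a deliberately crude stand-in for a real RDF/XML parser: it only checks that the body is well-formed XML with an `rdf:RDF` root element, which is an assumption made here and not how the crawl was evaluated.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def looks_like_rdfxml(body: str) -> bool:
    """Crude check: well-formed XML whose root element is rdf:RDF."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return False
    return root.tag == "{%s}RDF" % RDF_NS


def audit_content_type(reported: str, body: str) -> str:
    """Compare the reported media type with what the body looks like."""
    media_type = reported.split(";")[0].strip().lower()
    is_rdf = looks_like_rdfxml(body)
    if media_type == "application/rdf+xml":
        return "honest" if is_rdf else "claims RDF/XML but is not"
    return "RDF/XML under another type" if is_rdf else "not RDF/XML"


doc = '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>'
print(audit_content_type("text/xml", doc))
print(audit_content_type("application/rdf+xml", doc))
```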
  11. Same triples, different document. I wish they’d used a redirect…
  12. E.g., the Miracle at Calais: turning 1,778 triples into ~∞ quads
     http://d.opencalais.com/1/type/em/r/SameTriplesDifferentDocument
     (apologies to OpenCalais guys – it’s just a convenient example)
  13. Chapter 2: Reasoning issues… …or, how I learned to start worrying and stop loving OWL
  14. Undefined classes and properties… It looks important, but I’m afraid I don’t fully follow
  15. Quite common…
     - 14.3% of triples use an undeclared property
     - 8.1% of triples use an undeclared class
     - Three cases:
       - Case 1: Namespace has no vocabulary / is not dereferenceable (e.g., rss:item)
       - Case 2: Term invented in related namespace (e.g., foaf:tagLine invented by LiveJournal)
       - Case 3: Term is a misspelt version of a term defined in the namespace (e.g., foaf:image vs. foaf:img)
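The three cases can be told apart mechanically once the declared terms of each namespace are known. The sketch below is illustrative only: `classify_undeclared`, the shape of the `vocabularies` mapping, the sample FOAF term list, and the similarity cutoff of 0.7 are all assumptions, and string similarity is just one plausible way to guess at misspellings.

```python
from difflib import get_close_matches


def classify_undeclared(namespace, local_name, vocabularies):
    """vocabularies maps a namespace URI to the set of local names it declares."""
    declared = vocabularies.get(namespace)
    if declared is None:
        return "case 1: no dereferenceable vocabulary"
    if local_name in declared:
        return "declared"
    # Assumed heuristic: a near-identical declared name suggests a typo.
    close = get_close_matches(local_name, declared, n=1, cutoff=0.7)
    if close:
        return "case 3: possible misspelling of " + close[0]
    return "case 2: term invented in namespace"


FOAF = "http://xmlns.com/foaf/0.1/"
vocabularies = {FOAF: {"img", "name", "knows", "mbox_sha1sum"}}  # tiny sample

print(classify_undeclared("http://purl.org/rss/1.0/", "item", vocabularies))
print(classify_undeclared(FOAF, "tagLine", vocabularies))
print(classify_undeclared(FOAF, "image", vocabularies))
```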
  16. Not-so-unique values for Inverse-Functional Properties. Despite what you claim, not all of you can *actually be* Spartacus
  17. Spartacus relived…
     08445a31a78661b5c746feff39a9db6e4e2cc5cf
     - sha1-sum of "mailto:"
     - common value for foaf:mbox_sha1sum
     - An inverse-functional (uniquely identifying) property!!!
     - Any person who shares the same value will be considered the same
     *I’m Spartacus!* …and so’s my wife
  18. …unattended, can be pretty serious…
     foaf:mbox_sha1sum a owl:InverseFunctionalProperty .
     ?x foaf:mbox_sha1sum 08445a31a78661b5c746feff39a9db6e4e2cc5cf .
     OWL 2 RL rule prp-ifp:
     ?p a owl:InverseFunctionalProperty . ?x1 ?p ?z . ?x2 ?p ?z . ⇒ ?x1 owl:sameAs ?x2 .
     - 10^6 ?x1/?x2 bindings in body
     - 10^12 inferred pair-wise and reflexive owl:sameAs statements
     …or in simpler terms: pow!
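The quadratic blow-up is easy to reproduce on a small scale. The sketch below applies prp-ifp naively over in-memory tuples; the function names, the `ex:personN` identifiers, and the choice of 100 subjects are illustrative assumptions, and a real reasoner would not materialise the pairs this way.

```python
import hashlib
from collections import defaultdict
from itertools import product


def mbox_sha1sum(email):
    # The hash is taken over the full mailto: URI, so an empty address
    # leaves just the string "mailto:".
    return hashlib.sha1(("mailto:" + email).encode("utf-8")).hexdigest()


def prp_ifp(triples, inverse_functional):
    """Subjects sharing a value for an inverse-functional property are
    inferred to be owl:sameAs each other (pair-wise and reflexive)."""
    subjects_by_key = defaultdict(set)
    for s, p, o in triples:
        if p in inverse_functional:
            subjects_by_key[(p, o)].add(s)
    inferred = set()
    for subjects in subjects_by_key.values():
        # Every ordered pair, including (x, x): n subjects give n * n statements.
        for x1, x2 in product(subjects, repeat=2):
            inferred.add((x1, "owl:sameAs", x2))
    return inferred


empty = mbox_sha1sum("")  # what a form emits when the e-mail field is left blank
triples = [("ex:person%d" % i, "foaf:mbox_sha1sum", empty) for i in range(100)]
same_as = prp_ifp(triples, {"foaf:mbox_sha1sum"})
print(len(same_as))  # 100 subjects give 100 * 100 = 10000 statements
```

An application-side guard would be to skip known-degenerate values, such as the hash of the bare "mailto:" string, before applying the rule.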
  19. Malformed/incompatible datatypes. As he would undoubtedly be able to tell you, “true” is not a valid xsd:int
  20. Not *too* bad…
     - 4.7% of typed literals were “ill-typed” (lexically invalid)…
       - mostly xsd:dateTimes (26.4% of all date-time literals were invalid; e.g., omitted the seconds field)
     - Also, literals are sometimes incompatible with the datatype-range of a property:
       - E.g., 21.8% of ical:description triples used language tags incompatible with the defined range of xsd:string
       - E.g., 100% of sl:creationDate triples use plain literal values incompatible with the defined range of xsd:date
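Lexical checks of this kind are cheap to run before publishing. The sketch below is an approximation under stated assumptions: the regular expression only checks the overall shape of an xsd:dateTime (it does not reject month 13 or hour 25), and the function names are made up for illustration.

```python
import re

# Shape of xsd:dateTime: date, "T", time with mandatory seconds,
# optional fractional seconds, optional timezone.
XSD_DATETIME = re.compile(
    r"-?[0-9]{4,}-[0-9]{2}-[0-9]{2}"
    r"T[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+)?"
    r"(Z|[+-][0-9]{2}:[0-9]{2})?"
)


def is_datetime_shaped(lexical: str) -> bool:
    return XSD_DATETIME.fullmatch(lexical) is not None


def is_valid_int(lexical: str) -> bool:
    """xsd:int is a signed 32-bit integer written in decimal digits."""
    if re.fullmatch(r"[+-]?[0-9]+", lexical) is None:
        return False
    return -2**31 <= int(lexical) <= 2**31 - 1


print(is_datetime_shaped("2009-04-01T12:30:00Z"))
print(is_datetime_shaped("2009-04-01T12:30"))  # seconds field omitted
print(is_valid_int("true"))
```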
  21. Mystical beings… Members of disjoint classes. Despite what FOAF says, it seems that Persons can also be Documents
  22. Again, not *too* bad…
     - 1,329 members of disjoint classes found
     - Generally caused by naïve URI naming:
       - Use of information resource URIs to name entities (particularly foaf:Persons)
       - E.g., <me> foaf:knows <jim/foaf.rdf> .
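Once types have been collected (asserted or inferred), spotting such members is a set lookup. The sketch below assumes the class memberships are already available as pairs; the function name and the input layout are hypothetical, and no reasoning is performed here.

```python
from collections import defaultdict
from itertools import combinations


def disjoint_violations(memberships, disjoint_pairs):
    """memberships: iterable of (instance, class) pairs.
    disjoint_pairs: iterable of (class_a, class_b) declared disjoint."""
    types = defaultdict(set)
    for instance, cls in memberships:
        types[instance].add(cls)
    disjoint = {frozenset(pair) for pair in disjoint_pairs}
    violations = []
    for instance, classes in types.items():
        for a, b in combinations(sorted(classes), 2):
            if frozenset((a, b)) in disjoint:
                violations.append((instance, a, b))
    return violations


memberships = [
    ("jim/foaf.rdf", "foaf:Document"),  # it is a FOAF file
    ("jim/foaf.rdf", "foaf:Person"),    # implied by the range of foaf:knows
    ("me", "foaf:Person"),
]
print(disjoint_violations(memberships, [("foaf:Person", "foaf:Document")]))
```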
  23. Ontology hijacking… Anybody can say anything, anywhere, and unfortunately for everyone else, have a good chance of being taken seriously
  24. Redefining Everything… …and home in time for tea
     From http://www.eiao.net/rdf/1.0
     <owl:Property rdf:about="http://www.w3.org/1999/02/22-rdf-syntax-ns#type">
       <rdfs:label xml:lang="en">type</rdfs:label>
       <rdfs:comment xml:lang="en">Type of resource</rdfs:comment>
       <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#testRun"/>
       <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#pageSurvey"/>
       <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#siteSurvey"/>
       <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#scenario"/>
       <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#rangeLocation"/>
       <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#startPointer"/>
       <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#endPointer"/>
       <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#header"/>
       <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#runs"/>
     </owl:Property>
     Ontology hijacking!!
     (apologies to EIAO guys – it’s just a convenient example)
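A consumer can flag this pattern before reasoning over a document. The sketch below uses a simplified test that is an assumption of this example: a schema-level statement is suspicious when its subject does not live under the URI of the document making the statement. The function name and the short list of predicates are illustrative.

```python
# Predicates treated as "defining" a term; an illustrative subset.
AXIOM_PREDICATES = {
    "rdfs:domain",
    "rdfs:range",
    "rdfs:subClassOf",
    "rdfs:subPropertyOf",
}

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def hijacked_terms(document_uri, triples):
    """Return terms that the document redefines but does not own."""
    flagged = set()
    for s, p, o in triples:
        if p in AXIOM_PREDICATES and not s.startswith(document_uri):
            flagged.add(s)
    return flagged


doc = "http://www.eiao.net/rdf/1.0"
triples = [
    (RDF_TYPE, "rdfs:domain", doc + "#testRun"),  # as in the example above
    (doc + "#hasRun", "rdfs:domain", doc + "#testRun"),  # hypothetical local term
]
print(hijacked_terms(doc, triples))
```

Statements about flagged terms can then be ignored for reasoning while the rest of the document is still used.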
  25. Solutions?
  26. Application side: workarounds
     - All presented issues have a suitable antidote, once you know about them
     - See paper for discussion…
  27. Publishing side: Validators!
     - Syntax errors quite rare, partly due to popularity of W3C RDF/XML syntax validator
     - Need an all-in-one validation service
     - Should not only validate strict errors, but give feedback on suspected issues
     - We offer a prototypical service at: http://swse.deri.org/RDFAlerts/
  28. Publishing side: Pedantic Web Group
     - Get the community to contact publishers about errors/issues as they arise
     - Get involved: http://pedantic-web.org/
     - 137 members!
     - Acknowledgements to: Aidan Hogan, Alex Passant, Me, Antoine Zimmermann, Axel Polleres, Michael Hausenblas, Richard Cyganiak, Stéphane Corlosquet