SlideShare a Scribd company logo
Digital Enterprise Research Institute                                             www.deri.ie




                                        Weaving the Pedantic Web

                                              LDOW 2010
                          Aidan Hogan, Andreas Harth, Alexandre Passant, Stefan
                                         Decker, Axel Polleres




              0:39:00
 Copyright 2009 Digital Enterprise Research Institute. All rights reserved.
                                                                                  1
Linked Data…
Digital Enterprise Research Institute       www.deri.ie




                                        2
Purpose of talk: Application developers…
                   how to not sink…
Digital Enterprise Research Institute                    www.deri.ie




                                        3
Purpose of talk: RDF Publishers…   how
              to avoid common mistakes…
Digital Enterprise Research Institute                  www.deri.ie




                                        4
Talking about errors in Linked Data…
Digital Enterprise Research Institute                                       www.deri.ie


                                        We’ll try not to ruin the party

                                        …statistics based on crawl:
                                         April 2009
                                         5k domain limit
                                         150k URIS, 55k RDF docs
                                         12.5m triples (quads)
                                         Mentioning 1.6m URIs
                                         5,850 classes/9,507 props
                                         Accept: application/rdf+xml
                                             …okay… so no RDFa
                                        Statistics are *illustrative* not
                                          exhaustive!
                                         5
Digital Enterprise Research Institute                                   www.deri.ie




 Chapter 1: HTTP-level issues…
                   …a good RDF description these days is hard to find




                                          6
Waldo URIs:
               URIs with no dereferencable RDF
Digital Enterprise Research Institute                                  www.deri.ie




                                        Not a crawler’s idea of fun…

                                                   7
Hmm not *so* many…
Digital Enterprise Research Institute                  www.deri.ie




    5.3% of HTTP URIs return 40x/50x
    Excluding redirects…
        92.8% return 200 OK

    In return, only 45.4% of 200 Okay return report
     application/rdf+xml
    34.8% return HTML… probably just HTML docs…
        okay… maybe a *few* contain RDFa




                                        8
Lies… Damned Lies…
                  & Content-Type Reporting
Digital Enterprise Research Institute                              www.deri.ie




                                        “Trust me, it’s RDF/XML”

                                                   9
Okay… So he’s actually pretty honest
Digital Enterprise Research Institute                www.deri.ie


      16.9% of valid RDF/XML documents
       returned with an invalid/more generic
       Content-type:
           text/xml (9.5%)
           application/xml (5.9%)
           text/plain (1%)
           text/html (0.4%)

      Of those returning
       Content-type:application/rdf+xml
       98.8% were valid RDF/XML

                                        10
Same triples, different document
Digital Enterprise Research Institute                                    www.deri.ie




                                        I wish they’d used a redirect…

                                                    11
E.g., the Miracle at Calais:
                     turning 1,778 triples into ~∞ quads
Digital Enterprise Research Institute                                                          www.deri.ie




             http://d.opencalais.com/1/type/em/r/SameTriplesDifferentDocument


                             (apologies to OpenCalais guys – it’s just a convenient example)



                                                        12
Digital Enterprise Research Institute                                www.deri.ie




 Chapter 2: Reasoning issues…
          …or, how I learned to start worrying and stop loving OWL




                                        13
Undefined classes and properties…
Digital Enterprise Research Institute                                  www.deri.ie




             It looks important, but I’m afraid I don’t fully follow

                                        14
Quite common…
Digital Enterprise Research Institute                                  www.deri.ie


      14.3% of triples use undeclared property
      8.1% of triples use undeclared class

      Three cases:

      Case 1: Namespace has no vocabulary/
       is not deferencable
                  (e.g., rss:item)
      Case 2: Term invented in related namespace
            (e.g., foaf:tagLine invented by LiveJournal)
      Case 3: Term is misspelt version of term defined in namespace
            (e.g., foaf:image vs. foaf:img)



                                        15
Not-so-unique values for
                  Inverse-Functional Properties
Digital Enterprise Research Institute                                www.deri.ie




                                  Despite what you claim,
                        not all of you can *actually be* Spartacus

                                          16
Spartacus relived…
Digital Enterprise Research Institute                                         www.deri.ie



                            08445a31a78661b5c746feff39a9db6e4e2cc5cf



           sha1-sum of „mailto:‟
           common value for foaf:mbox_sha1sum
                  An inverse-functional (uniquely identifying) property!!!
                  Any person who shares the same value will be considered
                   the same


                                        *I’m Spartacus!*
                                          …and so’s my wife


                                             17
…unattended, can be pretty serious…
Digital Enterprise Research Institute                                       www.deri.ie




 foaf:mbox_sha1sum a owl:InverseFunctionalProperty .
 ?x foaf:mbox_sha1sum 08445a31a78661b5c746feff39a9db6e4e2cc5cf .


 OWL 2 RL rule prp-ifp:
 ?p a owl:InverseFunctionalProperty . ?x1 ?p ?z . ?x2 ?p ?z .
 ⇒ ?x1 owl:sameAs ?x2 .


 106     ?x1/?x2bindings in body
       1012 inferred pair-wise and reflexive owl:sameAs statements



                                        …or in simpler terms:
                                                                     pow!



                                             18
Malformed/incompatible datatypes
Digital Enterprise Research Institute                              www.deri.ie




                    As he would undoubtedly be able to tell you,
                          “true” is not a valid xsd:int

                                        19
Not *too* bad…
Digital Enterprise Research Institute                                   www.deri.ie




      4.7% of typed literals were “ill-typed” (lexically
       invalid)…
             mostly xsd:dateTimes (26.4% of all date-time literals
              were invalid; e.g., omitted the seconds field)


      Also, literals are sometimes incompatible with
       the datatype-range of a property:
             E.g., 21.8% of ical:description triples used
              language tags incompatible with the defined range of
              xsd:string
             E.g., 100% of sl:creationDate triples use plain literal
              values incompatible with defined range of xsd:date


                                        20
Mystical beings…
                 Members of disjoint classes
Digital Enterprise Research Institute                              www.deri.ie




                           Despite what FOAF says, it seems that
                             Persons can also be Documents

                                           21
Again, not *too* bad…
Digital Enterprise Research Institute                             www.deri.ie




      1,329 members of disjoint classes found



      Generally caused by naïve URI naming:
             Use of information resource URIs to name entities
              (particularly foaf:Persons)
             E.g., <me> foaf:knows <jim/foaf.rdf> .




                                        22
Ontology hijacking…
Digital Enterprise Research Institute                            www.deri.ie




   Anybody can say anything, anywhere, and unfortunately for everyone
            else, have a good chance of being taken seriously


                                        23
Redefining Everything…
                                        …and home in time for tea
Digital Enterprise Research Institute                                                         www.deri.ie




      From http://www.eiao.net/rdf/1.0
      <owl:Property rdf:about="http://www.w3.org/1999/02/22-rdf-syntax-ns#type">
          <rdfs:label xml:lang="en">type</rdfs:label>
          <rdfs:comment xml:lang="en">Type of resource</rdfs:comment>
          <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#testRun"/>
          <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#pageSurvey"/>
          <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#siteSurvey"/>
          <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#scenario"/>
          <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#rangeLocation"/>
          <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#startPointer"/>
          <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#endPointer"/>
          <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#header"/>
          <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#runs"/>
      </owl:Property>


      Ontology hijacking!!
                                  (apologies to EIAO guys – it’s just a convenient example)


                                                          24
Solutions?
Digital Enterprise Research Institute        www.deri.ie




                                        25
Application side: workarounds
Digital Enterprise Research Institute                             www.deri.ie




           All presented issues have a suitable antidote, once
            you know about them

           See paper for discussion…




                                        26
Publishing side: Validators!
Digital Enterprise Research Institute                                 www.deri.ie




           Syntax errors quite rare, partly due to popularity of
            W3C RDF/XML syntax validator

           Need an all-in-one validation service
               Should    not only validate strict errors, but give
                   feedback on suspected issues
               We       offer a prototypical service at:
                      http://swse.deri.org/RDFAlerts/




                                          27
Publishing side: Pedantic Web Group
Digital Enterprise Research Institute                                                      www.deri.ie




           Get the community to contact publishers about
            errors/issues as they arise
           Get involved: http://pedantic-web.org/
           137 members!
           Acknowledgements to: Aidan Hogan, Alex Passant, Me, Antoine Zimmermann, Axel
            Polleres, Michael Hausenblas, Richard Cyganiak, Stéphane Corlosquet


                                                28

More Related Content

Similar to Weaving the Pedantic Web (LD

How to Publish Open Data
How to Publish Open DataHow to Publish Open Data
How to Publish Open Data
Richard Cyganiak
 
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)
net2-project
 
A Privacy Preference Manager for the Social Semantic Web
A Privacy Preference Manager for the Social Semantic WebA Privacy Preference Manager for the Social Semantic Web
A Privacy Preference Manager for the Social Semantic Web
Owen Sacco
 
Exploring the Semantic Web
Exploring the Semantic WebExploring the Semantic Web
Exploring the Semantic Web
Roberto García
 
Understanding the Standards Gap
Understanding the Standards GapUnderstanding the Standards Gap
Understanding the Standards Gap
Dan Brickley
 
Web3uploaded
Web3uploadedWeb3uploaded
Web3uploaded
fahimilyas
 
Rethinking Microblogging: Open Distributed Semantic
Rethinking Microblogging: Open Distributed SemanticRethinking Microblogging: Open Distributed Semantic
Rethinking Microblogging: Open Distributed Semantic
Alexandre Passant
 
Riding the Semantic Web
Riding the Semantic WebRiding the Semantic Web
Riding the Semantic Web
Matthias Vandermaesen
 
Towards an RDF Analytics Language: Learning from Successful Experiences
Towards an RDF Analytics Language: Learning from Successful ExperiencesTowards an RDF Analytics Language: Learning from Successful Experiences
Towards an RDF Analytics Language: Learning from Successful Experiences
Fadi Maali
 
ISWC GoodRelations Tutorial Part 2
ISWC GoodRelations Tutorial Part 2ISWC GoodRelations Tutorial Part 2
ISWC GoodRelations Tutorial Part 2
Martin Hepp
 
GoodRelations Tutorial Part 2
GoodRelations Tutorial Part 2GoodRelations Tutorial Part 2
GoodRelations Tutorial Part 2
guestecacad2
 
Linked Data: opportunities and challenges
Linked Data: opportunities and challengesLinked Data: opportunities and challenges
Linked Data: opportunities and challenges
Michael Hausenblas
 
The Social Semantic Web and Linked Data
The Social Semantic Web and Linked DataThe Social Semantic Web and Linked Data
The Social Semantic Web and Linked Data
Alexandre Passant
 
Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011
Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011
Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011
François Scharffe
 
Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011
Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011
Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011Datalift
 
Naming and labeling in the Multilingual Web of Data
Naming and labeling in the Multilingual Web of DataNaming and labeling in the Multilingual Web of Data
Naming and labeling in the Multilingual Web of Data
Daniel Vila Suero
 
dcat: An RDF vocabulary for interoperability of data catalogues
dcat: An RDF vocabulary for interoperability of data cataloguesdcat: An RDF vocabulary for interoperability of data catalogues
dcat: An RDF vocabulary for interoperability of data catalogues
Richard Cyganiak
 
SemWeb Fundamentals - Info Linking & Layering in Practice
SemWeb Fundamentals - Info Linking & Layering in PracticeSemWeb Fundamentals - Info Linking & Layering in Practice
SemWeb Fundamentals - Info Linking & Layering in Practice
Dan Brickley
 
OO and Rails...
OO and Rails... OO and Rails...
OO and Rails... adzdavies
 

Similar to Weaving the Pedantic Web (LD (20)

How to Publish Open Data
How to Publish Open DataHow to Publish Open Data
How to Publish Open Data
 
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)
 
A Privacy Preference Manager for the Social Semantic Web
A Privacy Preference Manager for the Social Semantic WebA Privacy Preference Manager for the Social Semantic Web
A Privacy Preference Manager for the Social Semantic Web
 
Exploring the Semantic Web
Exploring the Semantic WebExploring the Semantic Web
Exploring the Semantic Web
 
Understanding the Standards Gap
Understanding the Standards GapUnderstanding the Standards Gap
Understanding the Standards Gap
 
Web3uploaded
Web3uploadedWeb3uploaded
Web3uploaded
 
Rethinking Microblogging: Open Distributed Semantic
Rethinking Microblogging: Open Distributed SemanticRethinking Microblogging: Open Distributed Semantic
Rethinking Microblogging: Open Distributed Semantic
 
Riding the Semantic Web
Riding the Semantic WebRiding the Semantic Web
Riding the Semantic Web
 
When?
When?When?
When?
 
Towards an RDF Analytics Language: Learning from Successful Experiences
Towards an RDF Analytics Language: Learning from Successful ExperiencesTowards an RDF Analytics Language: Learning from Successful Experiences
Towards an RDF Analytics Language: Learning from Successful Experiences
 
ISWC GoodRelations Tutorial Part 2
ISWC GoodRelations Tutorial Part 2ISWC GoodRelations Tutorial Part 2
ISWC GoodRelations Tutorial Part 2
 
GoodRelations Tutorial Part 2
GoodRelations Tutorial Part 2GoodRelations Tutorial Part 2
GoodRelations Tutorial Part 2
 
Linked Data: opportunities and challenges
Linked Data: opportunities and challengesLinked Data: opportunities and challenges
Linked Data: opportunities and challenges
 
The Social Semantic Web and Linked Data
The Social Semantic Web and Linked DataThe Social Semantic Web and Linked Data
The Social Semantic Web and Linked Data
 
Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011
Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011
Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011
 
Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011
Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011
Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011
 
Naming and labeling in the Multilingual Web of Data
Naming and labeling in the Multilingual Web of DataNaming and labeling in the Multilingual Web of Data
Naming and labeling in the Multilingual Web of Data
 
dcat: An RDF vocabulary for interoperability of data catalogues
dcat: An RDF vocabulary for interoperability of data cataloguesdcat: An RDF vocabulary for interoperability of data catalogues
dcat: An RDF vocabulary for interoperability of data catalogues
 
SemWeb Fundamentals - Info Linking & Layering in Practice
SemWeb Fundamentals - Info Linking & Layering in PracticeSemWeb Fundamentals - Info Linking & Layering in Practice
SemWeb Fundamentals - Info Linking & Layering in Practice
 
OO and Rails...
OO and Rails... OO and Rails...
OO and Rails...
 

Recently uploaded

FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
CatarinaPereira64715
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
Fwdays
 

Recently uploaded (20)

FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 

Weaving the Pedantic Web (LD

  • 1. Digital Enterprise Research Institute www.deri.ie Weaving the Pedantic Web LDOW 2010 Aidan Hogan, Andreas Harth, Alexandre Passant, Stefan Decker, Axel Polleres 0:39:00 Copyright 2009 Digital Enterprise Research Institute. All rights reserved. 1
  • 2. Linked Data… Digital Enterprise Research Institute www.deri.ie 2
  • 3. Purpose of talk: Application developers… how to not sink… Digital Enterprise Research Institute www.deri.ie 3
  • 4. Purpose of talk: RDF Publishers… how to avoid common mistakes… Digital Enterprise Research Institute www.deri.ie 4
  • 5. Talking about errors in Linked Data… Digital Enterprise Research Institute www.deri.ie We’ll try not to ruin the party …statistics based on crawl:  April 2009  5k domain limit  150k URIS, 55k RDF docs  12.5m triples (quads)  Mentioning 1.6m URIs  5,850 classes/9,507 props  Accept: application/rdf+xml …okay… so no RDFa Statistics are *illustrative* not exhaustive! 5
  • 6. Digital Enterprise Research Institute www.deri.ie Chapter 1: HTTP-level issues… …a good RDF description these days is hard to find 6
  • 7. Waldo URIs: URIs with no dereferencable RDF Digital Enterprise Research Institute www.deri.ie Not a crawler’s idea of fun… 7
  • 8. Hmm not *so* many… Digital Enterprise Research Institute www.deri.ie  5.3% of HTTP URIs return 40x/50x  Excluding redirects… 92.8% return 200 OK  In return, only 45.4% of 200 Okay return report application/rdf+xml  34.8% return HTML… probably just HTML docs… okay… maybe a *few* contain RDFa 8
  • 9. Lies… Damned Lies… & Content-Type Reporting Digital Enterprise Research Institute www.deri.ie “Trust me, it’s RDF/XML” 9
  • 10. Okay… So he’s actually pretty honest Digital Enterprise Research Institute www.deri.ie  16.9% of valid RDF/XML documents returned with an invalid/more generic Content-type: text/xml (9.5%) application/xml (5.9%) text/plain (1%) text/html (0.4%)  Of those returning Content-type:application/rdf+xml 98.8% were valid RDF/XML 10
  • 11. Same triples, different document Digital Enterprise Research Institute www.deri.ie I wish they’d used a redirect… 11
  • 12. E.g., the Miracle at Calais: turning 1,778 triples into ~∞ quads Digital Enterprise Research Institute www.deri.ie http://d.opencalais.com/1/type/em/r/SameTriplesDifferentDocument (apologies to OpenCalais guys – it’s just a convenient example) 12
  • 13. Digital Enterprise Research Institute www.deri.ie Chapter 2: Reasoning issues… …or, how I learned to start worrying and stop loving OWL 13
  • 14. Undefined classes and properties… Digital Enterprise Research Institute www.deri.ie It looks important, but I’m afraid I don’t fully follow 14
  • 15. Quite common… Digital Enterprise Research Institute www.deri.ie  14.3% of triples use undeclared property  8.1% of triples use undeclared class  Three cases:  Case 1: Namespace has no vocabulary/ is not deferencable (e.g., rss:item)  Case 2: Term invented in related namespace (e.g., foaf:tagLine invented by LiveJournal)  Case 3: Term is misspelt version of term defined in namespace (e.g., foaf:image vs. foaf:img) 15
  • 16. Not-so-unique values for Inverse-Functional Properties Digital Enterprise Research Institute www.deri.ie Despite what you claim, not all of you can *actually be* Spartacus 16
  • 17. Spartacus relived… Digital Enterprise Research Institute www.deri.ie 08445a31a78661b5c746feff39a9db6e4e2cc5cf  sha1-sum of „mailto:‟  common value for foaf:mbox_sha1sum  An inverse-functional (uniquely identifying) property!!!  Any person who shares the same value will be considered the same *I’m Spartacus!* …and so’s my wife 17
  • 18. …unattended, can be pretty serious… Digital Enterprise Research Institute www.deri.ie foaf:mbox_sha1sum a owl:InverseFunctionalProperty . ?x foaf:mbox_sha1sum 08445a31a78661b5c746feff39a9db6e4e2cc5cf . OWL 2 RL rule prp-ifp: ?p a owl:InverseFunctionalProperty . ?x1 ?p ?z . ?x2 ?p ?z . ⇒ ?x1 owl:sameAs ?x2 . 106 ?x1/?x2bindings in body 1012 inferred pair-wise and reflexive owl:sameAs statements …or in simpler terms: pow! 18
  • 19. Malformed/incompatible datatypes Digital Enterprise Research Institute www.deri.ie As he would undoubtedly be able to tell you, “true” is not a valid xsd:int 19
  • 20. Not *too* bad… Digital Enterprise Research Institute www.deri.ie  4.7% of typed literals were “ill-typed” (lexically invalid)…  mostly xsd:dateTimes (26.4% of all date-time literals were invalid; e.g., omitted the seconds field)  Also, literals are sometimes incompatible with the datatype-range of a property:  E.g., 21.8% of ical:description triples used language tags incompatible with the defined range of xsd:string  E.g., 100% of sl:creationDate triples use plain literal values incompatible with defined range of xsd:date 20
  • 21. Mystical beings… Members of disjoint classes Digital Enterprise Research Institute www.deri.ie Despite what FOAF says, it seems that Persons can also be Documents 21
  • 22. Again, not *too* bad… Digital Enterprise Research Institute www.deri.ie  1,329 members of disjoint classes found  Generally caused by naïve URI naming:  Use of information resource URIs to name entities (particularly foaf:Persons)  E.g., <me> foaf:knows <jim/foaf.rdf> . 22
  • 23. Ontology hijacking… Digital Enterprise Research Institute www.deri.ie Anybody can say anything, anywhere, and unfortunately for everyone else, have a good chance of being taken seriously 23
  • 24. Redefining Everything… …and home in time for tea Digital Enterprise Research Institute www.deri.ie From http://www.eiao.net/rdf/1.0 <owl:Property rdf:about="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"> <rdfs:label xml:lang="en">type</rdfs:label> <rdfs:comment xml:lang="en">Type of resource</rdfs:comment> <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#testRun"/> <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#pageSurvey"/> <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#siteSurvey"/> <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#scenario"/> <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#rangeLocation"/> <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#startPointer"/> <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#endPointer"/> <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#header"/> <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#runs"/> </owl:Property> Ontology hijacking!! (apologies to EIAO guys – it’s just a convenient example) 24
  • 25. Solutions? Digital Enterprise Research Institute www.deri.ie 25
  • 26. Application side: workarounds Digital Enterprise Research Institute www.deri.ie  All presented issues have a suitable antidote, once you know about them  See paper for discussion… 26
  • 27. Publishing side: Validators! Digital Enterprise Research Institute www.deri.ie  Syntax errors quite rare, partly due to popularity of W3C RDF/XML syntax validator  Need an all-in-one validation service  Should not only validate strict errors, but give feedback on suspected issues  We offer a prototypical service at: http://swse.deri.org/RDFAlerts/ 27
  • 28. Publishing side: Pedantic Web Group Digital Enterprise Research Institute www.deri.ie  Get the community to contact publishers about errors/issues as they arise  Get involved: http://pedantic-web.org/  137 members!  Acknowledgements to: Aidan Hogan, Alex Passant, Me, Antoine Zimmermann, Axel Polleres, Michael Hausenblas, Richard Cyganiak, Stéphane Corlosquet 28