Learning from Linked Open Data Usage


"Although the cloud of Linked Open Data has been growing continuously for several years, little is known about the particular features of linked data usage. Motivating why it is important to understand the usage of Linked Data, we describe typical linked data usage scenarios and contrast the requirements derived from them with conventional server access analysis. Then, we report on usage patterns found through an in-depth analysis of access logs of four popular LOD datasets. Finally, based on the usage patterns we found in the analysis, we propose metrics for assessing Linked Data usage from the human and the machine perspective, taking into account different agent types and resource representations."

Slides for a presentation at WebScience 2010. The paper is available for download at http://journal.webscience.org/302/.


Usage Rights

CC Attribution-ShareAlike License


    Learning from Linked Open Data Usage: Presentation Transcript

    • Learning from Linked Open Data Usage: Patterns & Metrics. Knud Möller, Michael Hausenblas, Richard Cyganiak, Gunnar Grimnes, Siegfried Handschuh. WebScience 2010, Raleigh, NC, USA, Monday 26 April 2010. Copyright 2010 Knud Möller. Except where otherwise noted, this work is licensed under http://creativecommons.org/licenses/by-sa/3.0/. Copyright 2010 Digital Enterprise Research Institute. All rights reserved.
    • What is Linked (Open) Data? (in <1 minute)
        – Conventional “Eye-ball” Web: interlinked documents; mainly people / Web browsers
        – Web of Linked Data: interlinked items of data (URIs, RDF); mainly machine agents
    • What is Linked (Open) Data? (in <1 minute)
        – Linked Open Data cloud (the set of interlinked Semantic Web datasets)
        – [Figure: the Linked Open Data cloud in February 2008 and in July 2009. Source: http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData]
    • Question: How is Linked Data being Used?
        • plenty of research on conventional Web usage (webometrics?)
        • what about usage of linked data?
      Why?
        • how healthy is the Web of linked data?
        • who is using the data and how? Is it useful? Are there trends?
        • providers: improve hosting
        • ... just curiosity!
    • Approach
        • particular sites:
            – a URI for each data item ➙ a request for each data item (resource)
            – content negotiation best practices
            – redirection (HTTP 303)
        • example:
            – plain resource URI: http://data.semanticweb.org/conference/www/2009
            – RDF document URI: http://data.semanticweb.org/conference/www/2009/rdf
            – HTML document URI: http://data.semanticweb.org/conference/www/2009/html
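The redirection pattern on this slide can be illustrated with a small sketch. It is not the server code of data.semanticweb.org: the function name `negotiate`, the set of RDF media types, and the fallback to HTML are assumptions made for illustration. Only the `/rdf` and `/html` document URI suffixes and the use of HTTP 303 come from the slide.

```python
# Assumed set of RDF media types; the slide does not list which ones the server accepts.
RDF_TYPES = {"application/rdf+xml", "text/turtle"}
HTML_TYPES = {"text/html", "application/xhtml+xml"}


def negotiate(resource_uri: str, accept_header: str) -> tuple[int, str]:
    """Return (status, Location) for a request to a plain resource URI."""
    base = resource_uri.rstrip("/")
    # Walk the media types in the order listed; q-values are ignored for simplicity.
    for part in accept_header.split(","):
        media_type = part.split(";")[0].strip()
        if media_type in RDF_TYPES:
            return 303, base + "/rdf"
        if media_type in HTML_TYPES:
            return 303, base + "/html"
    # Assumption: agents that accept anything else are sent to the HTML document.
    return 303, base + "/html"


if __name__ == "__main__":
    uri = "http://data.semanticweb.org/conference/www/2009"
    print(negotiate(uri, "application/rdf+xml"))
    print(negotiate(uri, "text/html,application/xhtml+xml;q=0.9"))
```

Because every data item has its own URI, each lookup of this kind leaves one 303 entry and one document entry in the server log, which is what makes a log-based analysis possible.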
    • Approach (ctd.)
        • server log files
            – common log format (CLF), combined log format
            – example entry: 80.219.211.147 - - [23/May/2009:09:52:03 +0100] "GET /sparql?query=PREFIX [..] LIMIT+200 HTTP/1.0" 200 64674 "-" "ARC Reader (http://arc.semsol.org/)"
            – labelled fields, in order of appearance: request IP, request date, request string, response code, response size, referrer, user agent
        • RDF requests vs. “semantic” requests
            – 90.21.243.141 - - [06/Oct/2008:16:07:58 +0100] "GET /organization/vrije-universiteit-amsterdam-the-netherlands HTTP/1.1" 303 7592 "-" "rdflib-2.4.0 (http://rdflib.net/; eikeon@eikeon.com)"
            – 90.21.243.141 - - [06/Oct/2008:16:08:02 +0100] "GET /organization/vrije-universiteit-amsterdam-the-netherlands/rdf HTTP/1.1" 200 45358 "-" "rdflib-2.4.0 (http://rdflib.net/; eikeon@eikeon.com)"
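A minimal sketch of how a combined-log-format entry like the ones on this slide could be split into its fields. The regular expression and the names `LOG_PATTERN` and `parse_line` are illustrative assumptions, not the authors' tooling. The pattern assumes that no field contains an escaped double quote, which real logs occasionally do.

```python
import re
from datetime import datetime

# One named group per field of the combined log format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<date>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    entry["date"] = datetime.strptime(entry["date"], "%d/%b/%Y:%H:%M:%S %z")
    # The request string is "METHOD PATH PROTOCOL"; keep the path separately.
    parts = entry["request"].split(" ")
    entry["path"] = parts[1] if len(parts) >= 2 else ""
    return entry


# Second example entry from the slide.
SAMPLE = (
    '90.21.243.141 - - [06/Oct/2008:16:08:02 +0100] '
    '"GET /organization/vrije-universiteit-amsterdam-the-netherlands/rdf HTTP/1.1" '
    '200 45358 "-" "rdflib-2.4.0 (http://rdflib.net/; eikeon@eikeon.com)"'
)

if __name__ == "__main__":
    parsed = parse_line(SAMPLE)
    print(parsed["ip"], parsed["status"], parsed["path"], parsed["agent"])
```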
    • Source Data
        Table 1: Overview of four LOD datasets (values in parentheses are per-day averages, i.e. the count divided by # days)

        Dataset       # triples     # days   total # hits   # plain hits   # RDF hits   # HTML hits   SPARQL
        Dog Food      79,175        597      8,427,967      1,923,945      259,031      1,647,205     879,932
                                             (14,117)       (3,223)        (434)        (2,759)       (1,471)
        DBpedia       109,750,000   118      87,203,310     22,821,475     7,008,310    22,999,237    20,972,630
                                             (739,011)      (193,402)      (59,392)     (194,909)     (177,734)
        DBTune        74,209,000    61       7,467,125      1,952,185      1,135,509    677,904       3,055,493
                                             (122,412)      (32,003)       (18,615)     (11,113)      (50,090)
        RKBExplorer   91,501,684    29       529,938        n/a            n/a          n/a           9,327
                                             (18,274)                                                 (322)

        [Figure: three pie charts showing the share of Plain, HTML, RDF and Semantic requests for DBpedia, DBTune and Dog Food. Across the three charts, Plain ranges from 41.0% to 47.7%, HTML from 39.9% to 51.1%, RDF from 5.8% to 14.9% and Semantic from 2.5% to 4.2%.]

        Excerpts from the paper shown on the slide (text cut off at the column edges is marked [...]):
            – For our evaluation, we had access to log [...] two periods: from 24/05/2009–21/06/2009 and from [...]2009–29/10/2009, i.e., roughly two months.
            – RKBExplorer [11] is another meta-dataset currently com[...] 44 sub-datasets covering various topics and sources [...] the domain of academic research, as well as a Web application that allows users to access and browse its content [...] integrated fashion. Both RDF and HTML documents [...] the resources in all datasets are available. Apart from [...] linked data, the site also features a module that provides co-reference resolution functionality [10]. For our evaluation, we had access to log files in the period from [...]2009–21/06/2009, i.e., roughly one month. However, [...]
            – [...]taining a SPARQL query, we assume that it is capable of handling the query result, i.e., either a [...] bindings (in the case of a SELECT query), potentially containing URIs of RDF resources, or an RDF [...] (in the case of a CONSTRUCT or DESCRIBE query).
            – RDF requests: if an agent directly requests [...] from a server, we assume that it knows how to process data in this format. Directly here mean[...] the agent specified an RDF syntax such as rd[...] as an acceptable response in the header of its request. Merely requesting the URI of an RDF representation does not suffice to indicate semanticity, as this [...] simply mean that the agent followed a link to th[...] representation.
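One way to approximate the request categories from log entries is sketched below. It is a heuristic written for illustration, not the authors' classifier. It assumes that a request for an RDF document counts as "semantic" only when the same client (IP plus user agent) was earlier redirected with a 303 from the corresponding plain URI, and that any request to `/sparql` is semantic. The entry keys `ip`, `agent`, `path` and `status`, and the function names, are hypothetical.

```python
def request_kind(path):
    """Classify a request path as 'sparql', 'rdf', 'html' or 'plain'."""
    bare = path.split("?", 1)[0]
    if bare == "/sparql":
        return "sparql"
    if bare.endswith("/rdf"):
        return "rdf"
    if bare.endswith("/html"):
        return "html"
    return "plain"


def label_semantic(entries):
    """Return a (kind, is_semantic) pair for each entry, in input order."""
    redirected = set()  # (client, plain path) pairs that received a 303
    labels = []
    for entry in entries:
        bare = entry["path"].split("?", 1)[0]
        kind = request_kind(entry["path"])
        client = (entry["ip"], entry["agent"])
        semantic = False
        if kind == "sparql":
            semantic = True
        elif kind == "plain" and entry["status"] == 303:
            redirected.add((client, bare))
        elif kind == "rdf":
            # Semantic only if this client was redirected here from the plain URI.
            semantic = (client, bare[: -len("/rdf")]) in redirected
        labels.append((kind, semantic))
    return labels


if __name__ == "__main__":
    log = [
        {"ip": "90.21.243.141", "agent": "rdflib-2.4.0", "path": "/organization/x", "status": 303},
        {"ip": "90.21.243.141", "agent": "rdflib-2.4.0", "path": "/organization/x/rdf", "status": 200},
        {"ip": "10.0.0.1", "agent": "Mozilla", "path": "/organization/x/rdf", "status": 200},
    ]
    print(label_semantic(log))
```

The third entry in the usage example is an RDF request but not a semantic one: the client fetched the RDF document URI directly, which may only mean that it followed a link.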
    • Agents: Ordinary Traffic
        – ordinary traffic: the usual suspects
        – [Bar chart: hits per agent for SW Dog Food (http://data.semanticweb.org, 21/07/2008 - 20/06/2009); x-axis: agents, y-axis: hits (0 to 500000). The bar labels are not recoverable.]
    • Agents: How “Semantic” are they?
        – semantic traffic: new kinds of agents
        – [Bar chart: semantic hits/total hits per agent, on a scale from 0 to 1, for agents with more than 100 semantic hits]
        – agents shown: attributor/1.13.2, triplr, sindicebot, rdflib-2.4.2, Ripple, OL_Virtuoso_RDF_crawler, Morph_Converter_Service, Falconsbot, Speedy, Slug_SW_Crawler, yacybot, hclsreport-crawler, MJ12bot, PycURL, heritrix/1.14.3, SindiceFetcher, heritrix/pom.version, heritrix/2.0.2, multicrawler, SindiceBot, ia_archiver, Zitgist-APlusPlus-Agent, rdflib-2.4.1, Mp3Bot, curl, Zend_Http_Client, Speedy_Spider, nxcrawler, marbles - Java, rdflib-2.4.0, (unknown), ARC_Reader, MLBot, Mozilla, Jakarta_HttpClient, Wget, libwww-perl, MSIE, Firefox, Python-urllib, sindice_ontology_fetcher, Googlebot
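The ratio plotted on this slide, semantic hits divided by total hits per agent, can be computed as in the following sketch. The function name `semanticity`, the input shape (pairs of agent name and a semantic flag) and the counts in the usage example are invented for illustration and are not figures from the study. Only the ratio itself and the threshold of more than 100 semantic hits come from the slide.

```python
from collections import Counter


def semanticity(hits, min_semantic_hits=100):
    """Map each agent to semantic hits / total hits.

    `hits` is an iterable of (agent, is_semantic) pairs. Agents with
    `min_semantic_hits` or fewer semantic hits are left out, mirroring
    the ">100 semantic hits" filter on the slide.
    """
    total = Counter()
    semantic = Counter()
    for agent, is_semantic in hits:
        total[agent] += 1
        if is_semantic:
            semantic[agent] += 1
    return {
        agent: semantic[agent] / total[agent]
        for agent in total
        if semantic[agent] > min_semantic_hits
    }


if __name__ == "__main__":
    # Invented counts, for illustration only.
    sample = (
        [("rdflib", True)] * 150
        + [("rdflib", False)] * 50
        + [("Googlebot", False)] * 1000
        + [("curl", True)] * 10
    )
    print(semanticity(sample))
```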
    • Is Demand for LOD increasing?
        – [Line chart: Dog Food Hits over Time (smoothing factor 0.05); series: plain, html, rdf, semantic; y-axis: 0 to 6000; x-axis: 2008-07-01 to 2010-05-01]
        – no increase for semantic requests
    • Is Demand for LOD increasing? (ctd.)
        – [Line chart: DBpedia Hits over Time (smoothing factor 0.05); series: plain, html, rdf, semantic; y-axis: 0 to 300000; x-axis: 2009-06-20 to 2009-11-07]
        – no increase for semantic requests
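The slides do not say which smoothing method lies behind "smoothing factor 0.05". As one plausible reading, the sketch below applies simple exponential smoothing with alpha = 0.05 to a series of daily hit counts. Both the choice of method and the function name `exponential_smoothing` are assumptions.

```python
def exponential_smoothing(values, alpha=0.05):
    """Simple exponential smoothing: s[0] = x[0], s[t] = alpha*x[t] + (1-alpha)*s[t-1]."""
    smoothed = []
    for value in values:
        if not smoothed:
            # The first smoothed value is the first observation.
            smoothed.append(float(value))
        else:
            smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


if __name__ == "__main__":
    # Invented daily hit counts with one spike, for illustration only.
    daily_hits = [100, 100, 100, 2000, 100, 100]
    print(exponential_smoothing(daily_hits))
```

With a factor as small as 0.05, a single-day spike moves the curve only slightly, so the plotted lines would mostly reflect sustained changes in demand.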
    • Do Real-world Events have an Impact on LOD Usage?
        – [Line chart: Demand for Events (smoothing factor 0.05); series: iswc2008, www2009, eswc2009, iswc2009; y-axis: 0 to 700; x-axis: 2008-07-01 to 2010-05-01]
        – possible impact
    • Do Real-world Events have an Impact on LOD Usage?
        – [Line chart: Irish Lisbon Treaty Referendum (smoothing factor 0.05); series: http://dbpedia.org/resource/Republic_of_Ireland, http://dbpedia.org/resource/European_Union, http://dbpedia.org/resource/Treaty_of_Lisbon; y-axis: 0 to 9; x-axis: 2009-06-20 to 2009-11-07]
        – possible impact
    • Do Real-world Events have an Impact on LOD Usage?
        – [Line chart: Michael Jackson Memorial Service (smoothing factor 0.05); series: http://dbpedia.org/resource/Staples_Center, http://dbpedia.org/resource/Michael_Jackson_memorial_service, http://dbpedia.org/resource/Michael_Jackson; y-axis: 0 to 4.5; x-axis: 2009-06-20 to 2009-11-07]
        – possible impact
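Per-resource curves like the ones on the event slides start from a count of hits per day for a single resource. The sketch below shows one way to derive those counts. The function name `daily_hits`, the input shape (pairs of date and request path) and the sample data are invented for illustration.

```python
from collections import Counter
from datetime import date


def daily_hits(requests, resource_path):
    """Count requests per day for one resource path.

    `requests` is an iterable of (day, path) pairs. Days without any
    hit for the resource are simply absent from the result.
    """
    counts = Counter()
    for day, path in requests:
        if path == resource_path:
            counts[day] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Invented requests, for illustration only.
    sample = [
        (date(2009, 7, 6), "/resource/Michael_Jackson"),
        (date(2009, 7, 7), "/resource/Michael_Jackson"),
        (date(2009, 7, 7), "/resource/Michael_Jackson"),
        (date(2009, 7, 7), "/resource/Staples_Center"),
    ]
    print(daily_hits(sample, "/resource/Michael_Jackson"))
```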
    • Conclusion (of sorts)
        • Generic approach for analysing usage of LOD sites (but see below), based on server log files
        • Metric for semanticity of agents
        • Did not notice a rising demand in LOD
        • However: real-world events do seem to have an effect on LOD usage
        • Restrictions:
            – does not work well with embedded metadata (e.g., RDFa-based sites)
            – does not take into account usage through meta sites (indexes, search engines, ...)