The Lonesome LOD Cloud

Invited talk at USEWOD2014 (http://people.cs.kuleuven.be/~bettina.berendt/USEWOD2014/)

A tremendous amount of machine-interpretable information is available in the Linked Open Data Cloud. Unfortunately, much of this data remains underused as machine clients struggle to use the Web. I believe this can be solved by giving machines interfaces similar to those we offer humans, instead of separate interfaces such as SPARQL endpoints. In this talk, I'll discuss the Linked Data Fragments vision on machine access to the Web of Data, and indicate how this impacts usage analysis of the LOD Cloud. We all can learn a lot from how humans access the Web, and those strategies can be applied to querying and analysis. In particular, we have to focus first on solving those use cases that humans can do easily, and only then consider tackling others.

Published in: Technology

  1. 1. The Lonesome LOD Cloud Ruben Verborgh
  2. 2. The Web for humans offers an HTTP interface to HTML documents. (Diagram labels: client, HTTP, HTML, data)
  3. 3. The Web for applications offers an HTTP interface to JSON documents. (Diagram labels: client, HTTP, JSON, data)
  4. 4. The Web for intelligent agents offers an HTTP interface to RDF documents. (Diagram labels: client, HTTP, RDF, data)
  5. 5. The Web for intelligent agents offers a SPARQL interface to an RDF database. (Diagram labels: client, HTTP, SPARQL, RDF, data)
  6. 6. The good news about SPARQL endpoints: they’re fast, they’re great for analysis, and they’re efficient.
  7. 7. The bad news about SPARQL endpoints: they don’t work on the Web.
  8. 8. We cannot wait any longer for public SPARQL endpoints to work. The LOD cloud is there; we should query it now.
  9. 9. Stop building intelligent servers, start building intelligent clients. But how will we analyze those?
  10. 10. The Lonesome LOD Cloud. Outline: Linked Data Fragments; Client-side querying; Analyzing client queries
  11. 11. The Lonesome LOD Cloud. Outline: Linked Data Fragments; Client-side querying; Analyzing client queries
  12. 12. Currently, there are three ways to provide access to a Linked Data dataset: SPARQL endpoint, data dump, Linked Data documents.
  13. 13. Those three ways have one thing in common: they offer fragments of a dataset. (SPARQL endpoint, data dump, Linked Data documents)
  14. 14. Linked Data Fragments look at all ways at the same time. (Figure: an axis between generic requests, with high client effort and high availability, and specific queries, with high server effort and low availability; fragment types shown: data dump, LD document, SPARQL result)
  15. 15. Each type of Linked Data Fragment is defined by three characteristics. Selector: What data does it contain? Metadata: What do we know about it? Controls: What can we do next?
  16. 16. Each type of Linked Data Fragment is defined by three characteristics. Linked Data Document: selector: a specific entity; metadata: creator, maintainer, …; controls: links to other LD documents.
  17. 17. Each type of Linked Data Fragment is defined by three characteristics. SPARQL CONSTRUCT result: selector: a SPARQL query; metadata: (none); controls: (none).
  18. 18. Each type of Linked Data Fragment is defined by three characteristics. Data dump: selector: everything; metadata: number of triples, file size; controls: (none).
  19. 19. Any API that provides triples publishes Linked Data Fragments. (Figure: an axis between generic requests, with high client effort and high availability, and specific queries, with high server effort and low availability; fragment types shown: data dump, LD document, SPARQL result)
  20. 20. Can we define APIs that efficiently allow SPARQL querying with high availability? (Figure: the axis between generic requests and specific queries, with data dump, LD document and SPARQL result, now also showing basic LDFs)
  21. 21. A basic Linked Data Fragments API offers triple-pattern-based access. Basic Linked Data Fragment: selector: a triple pattern; metadata: total number of matches; controls: access to all basic LDFs.
  22. 22. Parts of a basic LDF response: data (first 100), metadata (total count), controls (other basic LDFs).
  23. 23. Triple-pattern-based access to Linked Data doesn’t endanger a server’s availability. Easy to generate; efficiently cacheable through HTTP; low message entropy; compressed triple format HDT.
  24. 24. The higher the message entropy, the more interesting analysis becomes. (Figure: data dump, LD document, basic LDFs and SPARQL result on an axis from low message entropy, “boring for USEWOD?”, to high message entropy, “interesting for USEWOD?”)
  25. 25. The Lonesome LOD Cloud. Outline: Linked Data Fragments; Client-side querying; Analyzing client queries
  26. 26. The current server-does-all model leads inevitably to low availability. (Figure (a): one SPARQL server serving many clients.) SPARQL endpoints perform all processing on the server, leading to fast query execution with low data bandwidth, and a rapidly overloaded server.
  27. 27. Instead of asking one complex question, ask several simpler questions. (Figure (b): one LDF server serving many clients.) LDF servers only support simple requests and can thus handle far higher loads. Clients perform the querying, so they need more (cacheable) data.
  28. 28. How can we answer this query using basic Linked Data Fragments? SELECT ?person ?city WHERE { ?person a dbpedia-owl:Artist. ?person dbpedia-owl:birthPlace ?city. ?city foaf:name "York"@en. }
  29. 29. Split the query based on the available fragment types. ?person a dbpedia-owl:Artist. ?person dbpedia-owl:birthPlace ?city. ?city foaf:name "York"@en.
  30. 30. Get the first page of the corresponding fragments.
 ?person a dbpedia-owl:Artist. returns: dbpedia:Aamir_Zaki a dbpedia-owl:Artist. dbpedia:Ahmad_Morid a dbpedia-owl:Artist. …
 ?person dbpedia-owl:birthPlace ?city. returns: dbpedia:Ganesh_Ghosh …:birthPlace dbpedia:Bengal_Presidency. dbpedia:Jacques_L'enfant …:birthPlace dbpedia:Beauce. …
 ?city foaf:name "York"@en. returns: dbpedia:York foaf:name "York"@en. dbpedia:York,_Ontario foaf:name "York"@en. …
  31. 31. Read the count metadata of each fragment page.
 ?person a dbpedia-owl:Artist. (±61,000 matches): dbpedia:Aamir_Zaki a dbpedia-owl:Artist. dbpedia:Ahmad_Morid a dbpedia-owl:Artist. …
 ?person dbpedia-owl:birthPlace ?city. (±470,000 matches): dbpedia:Ganesh_Ghosh …:birthPlace dbpedia:Bengal_Presidency. dbpedia:Jacques_L'enfant …:birthPlace dbpedia:Beauce. …
 ?city foaf:name "York"@en. (12 matches): dbpedia:York foaf:name "York"@en. dbpedia:York,_Ontario foaf:name "York"@en. …
  32. 32. Take the smallest fragment, start with its first match. The smallest fragment is ?city foaf:name "York"@en. with 12 matches; its first match is dbpedia:York foaf:name "York"@en.
  33. 33. How can we answer this query using basic Linked Data Fragments? SELECT ?person WHERE { ?person a dbpedia-owl:Artist. ?person dbpedia-owl:birthPlace dbpedia:York. dbpedia:York foaf:name "York"@en. }
  34. 34. Split the query based on the available fragment types. ?person a dbpedia-owl:Artist. ?person dbpedia-owl:birthPlace dbpedia:York.
  35. 35. Get the first page of the corresponding fragments.
 ?person a dbpedia-owl:Artist. returns: dbpedia:Aamir_Zaki a dbpedia-owl:Artist. dbpedia:Ahmad_Morid a dbpedia-owl:Artist. …
 ?person dbpedia-owl:birthPlace dbpedia:York. returns: dbpedia:John_Flaxman dbpo:birthPlace dbpedia:York. dbpedia:Joseph_Hansom dbpo:birthPlace dbpedia:York. …
  36. 36. Read the count metadata of each fragment page.
 ?person a dbpedia-owl:Artist. (±61,000 matches): dbpedia:Aamir_Zaki a dbpedia-owl:Artist. dbpedia:Ahmad_Morid a dbpedia-owl:Artist. …
 ?person dbpedia-owl:birthPlace dbpedia:York. (75 matches): dbpedia:John_Flaxman dbpo:birthPlace dbpedia:York. dbpedia:Joseph_Hansom dbpo:birthPlace dbpedia:York. …
  37. 37. Take the smallest fragment, start with its first match. The smallest fragment is ?person dbpedia-owl:birthPlace dbpedia:York. with 75 matches; its first match is dbpedia:John_Flaxman dbpo:birthPlace dbpedia:York.
  38. 38. How can we answer this query using basic Linked Data Fragments? ASK { dbpedia:John_Flaxman a dbpedia-owl:Artist. dbpedia:John_Flaxman :birthPlace dbpedia:York. dbpedia:York foaf:name "York"@en. }
  39. 39. Split the query based on the available fragment types. dbpedia:John_Flaxman a dbpedia-owl:Artist.
  40. 40. Check if the fragment has matches. dbpedia:John_Flaxman a dbpedia-owl:Artist. Request: HEAD /dbpedia?s=John_Flaxman&p=… Response: HTTP/1.1 200 OK
  41. 41. Output the mappings for each successful match. ?person = dbpedia:John_Flaxman, ?city = dbpedia:York is a result of the query.
  42. 42. Recursively repeat the process for the remaining mappings.
 ?person dbpo:birthPlace dbpedia:York. still has: dbpedia:John_Flaxman dbpo:birthPlace dbpedia:York. dbpedia:Joseph_Hansom dbpo:birthPlace dbpedia:York. …
 ?city foaf:name "York"@en. still has: dbpedia:York foaf:name "York"@en. dbpedia:York,_Ontario foaf:name "York"@en. …
  43. 43. All client and server software is available as open source: linkeddatafragments.org, data.linkeddatafragments.org, client.linkeddatafragments.org
  44. 44. The Lonesome LOD Cloud. Outline: Linked Data Fragments; Client-side querying; Analyzing client queries
  45. 45. How can we analyze queries from intelligent clients? Client queries are different. Look at the logs. Treat machine clients as humans.
  46. 46. How can we analyze queries from intelligent clients? Client queries are different. Look at the logs. Treat machine clients as humans.
  47. 47. Despite being on the Web, we use public SPARQL endpoints like databases. Ask a complex question. Wait. Process the answer.
  48. 48. When was the last time you used the Web like that? Ask a complex question. Wait. Process the answer.
  49. 49. On the Web, there are no final answers. We ask questions and iteratively improve. Ask a simple question. Process answers as they arrive. Create new questions.
  50. 50. Show a sorted list of names of Greek artists, linked to their DBpedia page. (Figure labels: καλλιτέχνες; endpoint approach; fragment approach)
  51. 51. Endpoint approach: Show a sorted list of names of Greek artists, linked to their DBpedia page. (Figure labels: καλλιτέχνες; SPARQL endpoint)
  52. 52. Endpoint approach: Show a sorted list of names of Greek artists, linked to their DBpedia page. SELECT DISTINCT(?person) MIN(?name) WHERE { ?person a dbpedia-owl:Artist; foaf:name ?name; dbpedia-owl:birthPlace ?city. ?city dbpedia-owl:country dbpedia:Greece. } ORDER BY ?name
  53. 53. Endpoint approach: Show a sorted list of names of Greek artists, linked to their DBpedia page. SELECT DISTINCT(?person) MIN(?name) WHERE { ?person a dbpedia-owl:Artist; foaf:name ?name; dbpedia-owl:birthPlace ?city. ?city dbpedia-owl:country dbpedia:Greece. } ORDER BY ?name
  54. 54. Endpoint approach: Show a sorted list of names of Greek artists, linked to their DBpedia page. Consequences: DISTINCT: keep all results in memory. MIN: keep all results in memory, blocking. ORDER BY: keep all results in memory, blocking. Doesn’t matter; we’re waiting anyway.
  55. 55. Fragment approach: Show a sorted list of names of Greek artists, linked to their DBpedia page. SELECT ?person ?name WHERE { ?person a dbpedia-owl:Artist; foaf:name ?name; dbpedia-owl:birthPlace ?city. ?city dbpedia-owl:country dbpedia:Greece } No blocking operators; streaming is important.
  56. 56. Fragment approach: Show a sorted list of names of Greek artists, linked to their DBpedia page. (Figure labels: καλλιτέχνες)
  57. 57. Making the LOD cloud less lonesome starts with embracing its open nature. How meaningful is a sort anyway? How meaningful is a single answer? Build applications that react to data.
  58. 58. How can we analyze queries from intelligent clients? Client queries are different. Look at the logs. Treat machine clients as humans.
  59. 59. Let’s closely inspect the server logs of the “Artists from York” query. SPARQL: SELECT ?person ?city WHERE { ?person a dbpedia-owl:Artist. ?person dbpedia-owl:birthPlace ?city. ?city foaf:name "York"@en. }
 Logged request (truncated): http://dbpedia.org/sparql?query=SELECT+%3Fp+%3Fc+WHERE+%7B%0D%0A++++%3Fp+a+%3Chttp%3A%2F%2Fdbpedia.org%2Fontology
  60. 60. Let’s closely inspect the server logs of the “Artists from York” query. Basic Linked Data Fragments (request paths, truncated):
 /dbpedia
 /dbpedia?predicate=http%3A%2F%2Fxmlns.com%2Ffoa
 /dbpedia?predicate=http%3A%2F%2Fwww.w3.org%2F1
 /dbpedia?predicate=http%3A%2F%2Fdbpedia.org%2Fo
 /dbpedia?predicate=http%3A%2F%2Fdbpedia.org%2Fo
 /dbpedia?predicate=http%3A%2F%2Fdbpedia.org%2Fo
 /dbpedia?predicate=http%3A%2F%2Fdbpedia.org%2Fo
 /dbpedia?predicate=http%3A%2F%2Fdbpedia.org%2Fo
 /dbpedia?predicate=http%3A%2F%2Fdbpedia.org%2Fo
 /dbpedia?predicate=http%3A%2F%2Fdbpedia.org%2Fo
 /dbpedia?predicate=http%3A%2F%2Fdbpedia.org%2Fo
  61. 61. Let’s closely inspect the server logs of the “Artists from York” query. Basic Linked Data Fragments (requested patterns):
 ?c foaf:name "York"@en.
 ?p rdf:type dbpedia-owl:Artist.
 ?p dbpedia-owl:birthPlace ?c.
 ?p dbpedia-owl:birthPlace dbpedia:York_(explorer).
 ?p dbpedia-owl:birthPlace dbpedia:York_railway_station.
 ?p dbpedia-owl:birthPlace dbpedia:28220_York.
 ?p dbpedia-owl:birthPlace dbpedia:York_(provincial_elect
 ?p dbpedia-owl:birthPlace dbpedia:York,_New_York.
 dbpedia:Cornelius_R._Parsons rdf:type dbpedia-owl:Artist
 dbpedia:John_R._McPherson rdf:type dbpedia-owl:Artist.
  62. 62. Access logs resulting from basic LDF clients are hard to interpret. Parallel requests, unclear dependencies. Full query hard to reconstruct. Was it SPARQL in the first place?
  63. 63. What would we do if the users were humans? Create a user profile. Use cookies. Check the Referer header.
  64. 64. The Referer header tells us the path the client has followed. An interesting, still underused idea: Augmenting the Web of Data using Referers by Hannes Mühleisen & Anja Jentzsch. It explains part of the “why”. It allows us to reconstruct dependencies.
  65. 65. Reconstruct dependencies from Referer metadata. (Figure residue, request patterns truncated as on the slide:) ?c foaf:name "York"@en. ?p rdf:type dbpedia-owl:Artist. ?p dbpedia-owl:birthPlace ?c. ?p dbpedia-owl:birt ?p dbpedia-owl:birt ?p dbpedia-owl:birt ?p dbpedia-owl:birt ?p dbpedia-owl:birt dbpedia:Cornelius_R._Parsons rdf:type dbpedia-owl:Artist. dbpedia:John_R._McPherson
  66. 66. These dependencies can help us cache and prefetch. Example observation: After retrieving ?s <p> <o> patterns, clients often ask for <s> rdfs:label ?l . Example action: Always add labels to concepts in all responses. Cf. Caching and Prefetching Strategies for SPARQL queries by Johannes Lorey and Felix Naumann.
  67. 67. However, the open-world assumption can cause cardinality trouble. SELECT ?person ?label WHERE { ?person a dbpedia-owl:Artist; rdfs:label ?label. } Fragment “?person a dbpedia-owl:Artist”: dbpedia:Yannis_Markopoulos a dbpedia-owl:Artist; rdfs:label "Yannis Markopoulos"@en. dbpedia:Yanni a dbpedia-owl:Artist; rdfs:label "Yanni"@en. … Are these all labels? Should I ask for more?
  68. 68. The intent of this query is probably different from its semantics. SELECT ?person ?label WHERE { ?person a dbpedia-owl:Artist; rdfs:label ?label. } With SPARQL endpoints, this doesn’t matter. Clients don’t have to work more. To optimize client usage patterns, this difference is really important.
  69. 69. Referers only show part of the story. Can we know more? Inform the server what you’re doing. Then the server can help you better in the future.
 GET /dbpedia?o=dbpedia%3AGreece HTTP/1.1
 User-Agent: curl/7.35.0
 Host: data.linkeddatafragments.org
 Accept: text/turtle
 Referer: http://data.linkeddatafragments.org/dbpedia
 X-Executed-Query: SELECT ?person ?label WHERE { ?person a dbpe
  70. 70. How can we analyze queries from intelligent clients? Client queries are different. Look at the logs. Treat machine clients as humans.
  71. 71. My reflex when building machine clients is to wonder: what would a human do? I don’t expect any server to solve my queries; I collect small pieces of information to solve queries myself.
  72. 72. If you as a human use a website and it doesn’t work the way you want, what would you do?
  73. 73. As a human, I would leave feedback. I would comment, like or upvote/downvote. “I tried to find artists from Greece. Finding out Greek citizens was easy, but the artist checks went quite slow. The total query took me 4 minutes, whereas I would prefer 1 minute.” ★★★☆☆ Feedback is key to improving a service.
  74. 74. Why don’t we let machines give feedback about their experience? [ a f:ExperienceFeedback; f:author _:agent; f:subject _:query; f:actualSituation [ f:duration "3m"; f:bandwidth "500KB" ]; f:desiredSituation [ f:duration "1m" ] ].
  75. 75. If clients are more intelligent than servers, we have to analyze usage differently. Enable clients to act smart. Creatively reuse human techniques. Learn from optimizations feedback.
  76. 76. The Lonesome LOD Cloud. Outline: Linked Data Fragments; Client-side querying; Analyzing client queries
  77. 77. “Machine clients sending feedback? What you say is total science fiction! What’s next, machine clients that poke you on Facebook?”
  78. 78. What I consider science fiction: a public endpoint on the Web that answers any question.
  79. 79. 99.9% of the time, a basic LDF client solves this query in 3 seconds: SELECT ?person ?city WHERE { ?person a dbpedia-owl:Artist. ?person dbpedia-owl:birthPlace ?city. ?city foaf:name "York"@en. } Which public SPARQL endpoints could guarantee you that?
  80. 80. “You cannot solve all queries with basic Linked Data Fragments!” SELECT ?x ?l WHERE { ?x rdfs:label ?l. FILTER REGEX(?l, "^A") }
  81. 81. The Semantic Web tried to solve too much too fast. The result is a very lonesome LOD Cloud. You can query anything, but it never works.
  82. 82. Start with enabling tasks humans could easily do. SELECT ?x ?l WHERE { ?x rdfs:label ?l. FILTER REGEX(?l, "^A") }
  83. 83. Start with enabling tasks humans could easily do. SELECT ?person ?city WHERE { ?person a dbpedia-owl:Artist. ?person dbpedia-owl:birthPlace ?city. ?city foaf:name "York"@en. }
  84. 84. Start with enabling tasks humans could easily do. “I tried to find artists from Greece. Finding out Greek citizens was easy, but the artist checks went quite slow. The total query took me 4 minutes, whereas I would prefer 1 minute.” ★★★☆☆
  85. 85. Start with enabling tasks humans could easily do. After that, we’ll talk about the rest.
  86. 86. The LOD usage community can help create intelligent clients. Put the intelligent servers aside, enable clients to be intelligent. Look at usage from the perspective of intelligent clients.
  87. 87. The LOD Cloud is lonesome because we gave human and machine clients different interfaces. Let’s make the simple things work. Let’s get the data used.
  88. 88. The Lonesome LOD Cloud @RubenVerborgh
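
The client-side querying steps on slides 28 to 42 (split the query into triple patterns, fetch each fragment, read the count metadata, start from the smallest fragment, recurse on the remaining patterns) can be sketched in Python. This is only an illustration under stated assumptions, not the actual Linked Data Fragments client: `TRIPLES` is toy data standing in for a basic LDF server, `fragment` is a hypothetical helper that simulates one fragment request without paging, and `solve` is a made-up name for the recursive evaluation.

```python
# Toy data standing in for a basic LDF server (hypothetical, not real DBpedia counts).
TRIPLES = [
    ("dbpedia:York", "foaf:name", '"York"@en'),
    ("dbpedia:York,_Ontario", "foaf:name", '"York"@en'),
    ("dbpedia:John_Flaxman", "dbpedia-owl:birthPlace", "dbpedia:York"),
    ("dbpedia:Joseph_Hansom", "dbpedia-owl:birthPlace", "dbpedia:York"),
    ("dbpedia:Ganesh_Ghosh", "dbpedia-owl:birthPlace", "dbpedia:Bengal_Presidency"),
    ("dbpedia:John_Flaxman", "rdf:type", "dbpedia-owl:Artist"),
    ("dbpedia:Aamir_Zaki", "rdf:type", "dbpedia-owl:Artist"),
    ("dbpedia:Ahmad_Morid", "rdf:type", "dbpedia-owl:Artist"),
]


def is_var(term):
    """Variables are written SPARQL-style, with a leading question mark."""
    return term.startswith("?")


def fragment(pattern):
    """Simulate one basic LDF request: matching triples plus the count metadata.

    Assumption: a real server would page the data; paging is omitted here.
    """
    matches = [t for t in TRIPLES
               if all(is_var(p) or p == v for p, v in zip(pattern, t))]
    return matches, len(matches)


def bind(pattern, mapping):
    """Replace already-bound variables in a pattern by their values."""
    return tuple(mapping.get(term, term) for term in pattern)


def solve(patterns, mapping=None):
    """Yield variable mappings that satisfy all triple patterns."""
    mapping = mapping or {}
    if not patterns:
        yield mapping  # every pattern matched: this mapping is a result
        return
    bound = [bind(p, mapping) for p in patterns]
    fragments = [fragment(p) for p in bound]
    # Take the fragment with the smallest count.
    i = min(range(len(bound)), key=lambda k: fragments[k][1])
    matches, count = fragments[i]
    if count == 0:
        return  # one pattern has no matches, so this branch has no results
    rest = patterns[:i] + patterns[i + 1:]
    for triple in matches:
        extended = dict(mapping)
        for term, value in zip(bound[i], triple):
            if is_var(term):
                extended[term] = value
        # Recursively repeat the process for the remaining patterns.
        yield from solve(rest, extended)


QUERY = [
    ("?person", "rdf:type", "dbpedia-owl:Artist"),
    ("?person", "dbpedia-owl:birthPlace", "?city"),
    ("?city", "foaf:name", '"York"@en'),
]

if __name__ == "__main__":
    for result in solve(QUERY):
        print(result)
```

With this toy data, the `foaf:name` fragment is the smallest, so evaluation starts there, as on slide 32. A fully bound pattern acts as the existence check of slide 40: its fragment has either one match or none.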

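Slides 64 and 65 suggest reconstructing request dependencies from the Referer header. A minimal Python sketch of that idea follows, assuming each access-log entry has already been reduced to a pair of requested path and Referer, and assuming clients send the fragment that led to a request as its Referer. The log entries, the shortened query strings and the names `build_tree` and `render` are all made up for illustration; the sketch also assumes the Referer chain contains no cycles.

```python
from collections import defaultdict

# Hypothetical, simplified access log: (requested path, Referer header or None).
LOG = [
    ("/dbpedia", None),
    ("/dbpedia?p=foaf:name&o=York", "/dbpedia"),
    ("/dbpedia?p=rdf:type&o=Artist", "/dbpedia"),
    ("/dbpedia?p=birthPlace&o=York", "/dbpedia?p=foaf:name&o=York"),
    ("/dbpedia?s=John_Flaxman&p=rdf:type&o=Artist", "/dbpedia?p=birthPlace&o=York"),
]


def build_tree(log):
    """Group requests by the Referer that led to them."""
    roots = []
    children = defaultdict(list)
    for path, referer in log:
        if referer is None:
            roots.append(path)  # entry point: no Referer was sent
        else:
            children[referer].append(path)
    return roots, children


def render(paths, children, depth=0):
    """Return the dependency tree as indented lines, depth first."""
    lines = []
    for path in paths:
        lines.append("  " * depth + path)
        lines.extend(render(children.get(path, []), children, depth + 1))
    return lines


if __name__ == "__main__":
    roots, children = build_tree(LOG)
    print("\n".join(render(roots, children)))
```

The indentation in the rendered output shows which request followed from which, which is the dependency information that slide 62 describes as unclear in a flat access log.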