The Lonesome LOD Cloud
Invited talk at USEWOD2014 (http://people.cs.kuleuven.be/~bettina.berendt/USEWOD2014/)

A tremendous amount of machine-interpretable information is available in the Linked Open Data Cloud. Unfortunately, much of this data remains underused as machine clients struggle to use the Web. I believe this can be solved by giving machines interfaces similar to those we offer humans, instead of separate interfaces such as SPARQL endpoints. In this talk, I'll discuss the Linked Data Fragments vision on machine access to the Web of Data, and indicate how this impacts usage analysis of the LOD Cloud. We all can learn a lot from how humans access the Web, and those strategies can be applied to querying and analysis. In particular, we have to focus first on solving those use cases that humans can do easily, and only then consider tackling others.

Published in: Technology

Transcript

  • 1. The Lonesome LOD Cloud Ruben Verborgh
  • 2. The Web for humans offers an HTTP interface to HTML documents. [Diagram: client, HTTP, HTML, data]
  • 3. The Web for applications offers an HTTP interface to JSON documents. [Diagram: client, HTTP, JSON, data]
  • 4. The Web for intelligent agents offers an HTTP interface to RDF documents. [Diagram: client, HTTP, RDF, data]
  • 5. The Web for intelligent agents offers a SPARQL interface to an RDF database. [Diagram: client, HTTP, SPARQL, RDF, data]
  • 6. The good news about SPARQL endpoints: they’re fast. They’re great for analysis. They’re efficient.
  • 7. The bad news about SPARQL endpoints: they don’t work on the Web.
  • 8. We cannot wait any longer for
 public SPARQL endpoints to work. The LOD cloud is there— we should query it now.
  • 9. Stop building intelligent servers, start building intelligent clients. But how will we analyze those?
  • 10. The Lonesome LOD Cloud. Outline: Linked Data Fragments; Client-side querying; Analyzing client queries.
  • 11. The Lonesome LOD Cloud. Outline: Linked Data Fragments; Client-side querying; Analyzing client queries.
  • 12. Currently, there are three ways
 to provide access to a Linked Data dataset: SPARQL endpoint, data dump, Linked Data documents.
  • 13. Those three ways have one thing in common: they offer fragments of a dataset. [SPARQL endpoint, data dump, Linked Data documents]
  • 14. Linked Data Fragments look
 at all ways at the same time. [Axis diagram. One end: generic requests, high client effort, high availability (data dump). Other end: specific queries, high server effort, low availability (SPARQL result). In between: LD document.]
  • 15. Each type of Linked Data Fragment
 is defined by three characteristics. Selector: what data does it contain? Metadata: what do we know about it? Controls: what can we do next?
  • 16. Each type of Linked Data Fragment
 is defined by three characteristics. Linked Data document. Selector: a specific entity. Metadata: creator, maintainer, … Controls: links to other LD documents.
  • 17. Each type of Linked Data Fragment
 is defined by three characteristics. SPARQL CONSTRUCT result. Selector: a SPARQL query. Metadata: (none). Controls: (none).
  • 18. Each type of Linked Data Fragment
 is defined by three characteristics. Data dump. Selector: everything. Metadata: number of triples, file size. Controls: (none).
  • 19. Any API that provides triples
 publishes Linked Data Fragments. [Axis diagram from generic requests (high client effort, high availability) to specific queries (high server effort, low availability): data dump, LD document, SPARQL result]
  • 20. Can we define APIs that efficiently allow
 SPARQL querying with high availability? [Axis diagram from generic requests (high client effort, high availability) to specific queries (high server effort, low availability): data dump, LD document, SPARQL result, with basic LDFs added]
  • 21. A basic Linked Data Fragments API
 offers triple-pattern-based access. Basic Linked Data Fragment. Selector: a triple pattern. Metadata: total number of matches. Controls: access to all basic LDFs.
  • 22. [Screenshot of a basic Linked Data Fragment, annotated: data (first 100), metadata (total count), controls (other basic LDFs)]
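The three characteristics of a basic Linked Data Fragment (selector, metadata, controls) could be sketched as follows. This is a minimal in-memory illustration, not the server software from linkeddatafragments.org: the names `Fragment`, `matches`, and `basic_fragment` are hypothetical, `None` stands for a variable in the triple pattern, and the dataset is a toy subset of triples that appear in the slides.

```python
from typing import List, NamedTuple, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


class Fragment(NamedTuple):
    data: List[Triple]         # one page of triples matching the selector
    total_count: int           # metadata: total number of matches
    next_page: Optional[int]   # control: number of the next page, if any


def matches(triple: Triple, pattern: Pattern) -> bool:
    """A pattern term of None acts as a variable and matches any value."""
    return all(p is None or p == t for t, p in zip(triple, pattern))


def basic_fragment(dataset: List[Triple], pattern: Pattern,
                   page: int = 1, page_size: int = 100) -> Fragment:
    """Return one page of the fragment selected by a triple pattern."""
    found = [t for t in dataset if matches(t, pattern)]
    start = (page - 1) * page_size
    has_more = start + page_size < len(found)
    return Fragment(found[start:start + page_size], len(found),
                    page + 1 if has_more else None)


# Toy dataset (assumption): a handful of triples taken from the slides.
DATASET: List[Triple] = [
    ("dbpedia:John_Flaxman", "rdf:type", "dbpedia-owl:Artist"),
    ("dbpedia:John_Flaxman", "dbpedia-owl:birthPlace", "dbpedia:York"),
    ("dbpedia:Joseph_Hansom", "dbpedia-owl:birthPlace", "dbpedia:York"),
    ("dbpedia:York", "foaf:name", '"York"@en'),
]

fragment = basic_fragment(
    DATASET, (None, "dbpedia-owl:birthPlace", "dbpedia:York"), page_size=1)
print(fragment.data, fragment.total_count, fragment.next_page)
```

The page size of 1 is only there to make the paging control visible on such a small dataset; the slides mention pages of 100 triples.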
  • 23. Triple-pattern-based access to Linked Data
 doesn’t endanger a server’s availability. Easy to generate; efficiently cacheable through HTTP; low message entropy; compressed triple format HDT.
  • 24. The higher the message entropy, the more interesting analysis becomes. [Axis diagram from low message entropy (boring for USEWOD?) to high message entropy (interesting for USEWOD?), positioning data dump, LD document, basic LDFs, and SPARQL result]
  • 25. The Lonesome LOD Cloud. Outline: Linked Data Fragments; Client-side querying; Analyzing client queries.
  • 26. The current server-does-all model
 leads inevitably to low availability. [Figure (a): one SPARQL server, many clients.] SPARQL endpoints perform all processing on the server, leading to fast query execution with low data bandwidth, and a rapidly overloaded server.
  • 27. Instead of asking one complex question,
 ask several simpler questions. [Figure (b): one LDF server, many clients.] LDF servers only support simple requests and can thus handle far higher loads. Clients perform the querying, so they need more (cacheable) data.
  • 28. How can we answer this query
 using basic Linked Data Fragments? SELECT ?person ?city WHERE { ?person a dbpedia-owl:Artist. ?person dbpedia-owl:birthPlace ?city. ?city foaf:name "York"@en. }
  • 29. Split the query based on
 the available fragment types. ?person a dbpedia-owl:Artist. ?person dbpedia-owl:birthPlace ?city. ?city foaf:name "York"@en.
  • 30. Get the first page
 of the corresponding fragments.
 ?person a dbpedia-owl:Artist. First page: dbpedia:Aamir_Zaki a dbpedia-owl:Artist. dbpedia:Ahmad_Morid a dbpedia-owl:Artist. …
 ?person dbpedia-owl:birthPlace ?city. First page: dbpedia:Ganesh_Ghosh …:birthPlace dbpedia:Bengal_Presidency. dbpedia:Jacques_L'enfant …:birthPlace dbpedia:Beauce. …
 ?city foaf:name "York"@en. First page: dbpedia:York foaf:name "York"@en. dbpedia:York,_Ontario foaf:name "York"@en. …
  • 31. Read the count metadata
 of each fragment page.
 ?person a dbpedia-owl:Artist. Count: ±61,000.
 ?person dbpedia-owl:birthPlace ?city. Count: ±470,000.
 ?city foaf:name "York"@en. Count: 12.
  • 32. Take the smallest fragment, start with its first match. The smallest fragment is ?city foaf:name "York"@en. (12 matches, against ±61,000 and ±470,000); its first match is dbpedia:York foaf:name "York"@en.
  • 33. How can we answer this query
 using basic Linked Data Fragments? SELECT ?person WHERE { ?person a dbpedia-owl:Artist. ?person dbpedia-owl:birthPlace dbpedia:York. } Already matched: dbpedia:York foaf:name "York"@en.
  • 34. Split the query based on
 the available fragment types. ?person a dbpedia-owl:Artist. ?person dbpedia-owl:birthPlace dbpedia:York.
  • 35. Get the first page
 of the corresponding fragments.
 ?person a dbpedia-owl:Artist. First page: dbpedia:Aamir_Zaki a dbpedia-owl:Artist. dbpedia:Ahmad_Morid a dbpedia-owl:Artist. …
 ?person dbpedia-owl:birthPlace dbpedia:York. First page: dbpedia:John_Flaxman dbpo:birthPlace dbpedia:York. dbpedia:Joseph_Hansom dbpo:birthPlace dbpedia:York. …
  • 36. Read the count metadata
 of each fragment page.
 ?person a dbpedia-owl:Artist. Count: ±61,000.
 ?person dbpedia-owl:birthPlace dbpedia:York. Count: 75.
  • 37. Take the smallest fragment, start with its first match. The smallest fragment is ?person dbpedia-owl:birthPlace dbpedia:York. (75 matches, against ±61,000); its first match is dbpedia:John_Flaxman dbpo:birthPlace dbpedia:York.
  • 38. How can we answer this query
 using basic Linked Data Fragments? ASK { dbpedia:John_Flaxman a dbpedia-owl:Artist. } Already matched: dbpedia:John_Flaxman :birthPlace dbpedia:York. dbpedia:York foaf:name "York"@en.
  • 39. Split the query based on
 the available fragment types. dbpedia:John_Flaxman a dbpedia-owl:Artist.
  • 40. Check if the fragment
 has matches. dbpedia:John_Flaxman a dbpedia-owl:Artist. HEAD /dbpedia?s=John_Flaxman&p=… HTTP/1.1 200 OK
  • 41. Output the mappings
 for each successful match. ?person = dbpedia:John_Flaxman
 ?city = dbpedia:York is a result of the query.
  • 42. Recursively repeat the process
 for the remaining mappings.
 ?person dbpo:birthPlace dbpedia:York. Matches: dbpedia:John_Flaxman dbpo:birthPlace dbpedia:York. dbpedia:Joseph_Hansom dbpo:birthPlace dbpedia:York. …
 ?city foaf:name "York"@en. Matches: dbpedia:York foaf:name "York"@en. dbpedia:York,_Ontario foaf:name "York"@en. …
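The client-side procedure walked through in the slides (split the query into triple patterns, read each fragment's count, start from the smallest fragment, bind its matches, recurse on the rest) could be sketched roughly as below. This is an illustrative sketch, not the open-source client: `fetch` is a hypothetical stand-in that filters an in-memory list instead of requesting pages over HTTP, and the toy dataset only reuses a few triples from the slides.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Bindings = Dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def match(pattern: Triple, triple: Triple) -> Optional[Bindings]:
    """Return the variable bindings if the triple matches, else None."""
    bindings: Bindings = {}
    for term, value in zip(pattern, triple):
        if is_var(term):
            if bindings.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return bindings


def fetch(dataset: List[Triple], pattern: Triple) -> List[Bindings]:
    """Stand-in for requesting a basic LDF (assumption: no HTTP, no paging).
    The length of the result plays the role of the count metadata."""
    found = (match(pattern, triple) for triple in dataset)
    return [b for b in found if b is not None]


def solve(dataset: List[Triple], patterns: List[Triple],
          bindings: Optional[Bindings] = None) -> Iterator[Bindings]:
    bindings = bindings or {}
    if not patterns:
        yield dict(bindings)  # every pattern matched: output the mapping
        return
    # Apply the mappings found so far, then compare fragment sizes.
    bound = [tuple(bindings.get(t, t) for t in p) for p in patterns]
    fragments = [fetch(dataset, p) for p in bound]
    smallest = min(range(len(bound)), key=lambda i: len(fragments[i]))
    rest = bound[:smallest] + bound[smallest + 1:]
    # Recursively repeat the process for each match of the smallest fragment.
    for new in fragments[smallest]:
        yield from solve(dataset, rest, {**bindings, **new})


# Toy dataset (assumption): a few triples that appear in the slides.
DATASET: List[Triple] = [
    ("dbpedia:John_Flaxman", "rdf:type", "dbpedia-owl:Artist"),
    ("dbpedia:Aamir_Zaki", "rdf:type", "dbpedia-owl:Artist"),
    ("dbpedia:Ahmad_Morid", "rdf:type", "dbpedia-owl:Artist"),
    ("dbpedia:John_Flaxman", "dbpedia-owl:birthPlace", "dbpedia:York"),
    ("dbpedia:Joseph_Hansom", "dbpedia-owl:birthPlace", "dbpedia:York"),
    ("dbpedia:Ganesh_Ghosh", "dbpedia-owl:birthPlace", "dbpedia:Bengal_Presidency"),
    ("dbpedia:York", "foaf:name", '"York"@en'),
    ("dbpedia:York,_Ontario", "foaf:name", '"York"@en'),
]

QUERY: List[Triple] = [
    ("?person", "rdf:type", "dbpedia-owl:Artist"),
    ("?person", "dbpedia-owl:birthPlace", "?city"),
    ("?city", "foaf:name", '"York"@en'),
]

for result in solve(DATASET, QUERY):
    print(result)
```

Because `solve` is a generator, mappings are produced as soon as they are found. A real client would read the count from the first page of each fragment rather than materialize all matches as `fetch` does here.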
  • 43. All client and server software
 is available as open source. linkeddatafragments.org data.linkeddatafragments.org client.linkeddatafragments.org
  • 44. The Lonesome LOD Cloud. Outline: Linked Data Fragments; Client-side querying; Analyzing client queries.
  • 45. How can we analyze queries
 from intelligent clients? Client queries are different. Look at the logs. Treat machine clients as humans.
  • 46. How can we analyze queries
 from intelligent clients? Client queries are different. Look at the logs. Treat machine clients as humans.
  • 47. Despite being on the Web, we use public SPARQL endpoints like databases. Ask a complex question. Wait. Process the answer.
  • 48. When was the last time
 you used the Web like that? Ask a complex question. Wait. Process the answer.
  • 49. On the Web, there are no final answers.
 We ask questions and iteratively improve. Ask a simple question. Process answers as they arrive. Create new questions.
  • 50. Show a sorted list of names of Greek artists,
 linked to their DBpedia page. [Mock-up of the result page, titled καλλιτέχνες ("artists"), with two ways to build it: the endpoint approach and the fragment approach]
  • 51. Show a sorted list of names of Greek artists,
 linked to their DBpedia page. [Diagram, endpoint approach: the καλλιτέχνες page and a SPARQL endpoint]
  • 52. Show a sorted list of names of Greek artists,
 linked to their DBpedia page. Endpoint approach: SELECT DISTINCT(?person) MIN(?name) WHERE { ?person a dbpedia-owl:Artist; foaf:name ?name; dbpedia-owl:birthPlace ?city. ?city dbpedia-owl:country dbpedia:Greece. } ORDER BY ?name
  • 53. Show a sorted list of names of Greek artists,
 linked to their DBpedia page. Endpoint approach: SELECT DISTINCT(?person) MIN(?name) WHERE { ?person a dbpedia-owl:Artist; foaf:name ?name; dbpedia-owl:birthPlace ?city. ?city dbpedia-owl:country dbpedia:Greece. } ORDER BY ?name
  • 54. Show a sorted list of names of Greek artists,
 linked to their DBpedia page. Endpoint approach. DISTINCT: keep all results in memory. MIN: keep all results in memory, blocking. ORDER BY: keep all results in memory, blocking. Consequences: doesn’t matter; we’re waiting anyway.
  • 55. Show a sorted list of names of Greek artists,
 linked to their DBpedia page. Fragment approach: SELECT ?person ?name
 WHERE { ?person a dbpedia-owl:Artist; foaf:name ?name; dbpedia-owl:birthPlace ?city. ?city dbpedia-owl:country dbpedia:Greece } No blocking operators; streaming is important.
  • 56. Show a sorted list of names of Greek artists,
 linked to their DBpedia page. [Diagram, fragment approach: the καλλιτέχνες page]
  • 57. Making the LOD cloud less lonesome
 starts with embracing its open nature. How meaningful is a sort anyway? How meaningful is a single answer? Build applications that react to data.
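One way to picture an application that "reacts to data" instead of waiting for a blocking DISTINCT or ORDER BY is a view that updates on every arriving result. The sketch below is only an illustration under assumptions: `incremental_view` is a hypothetical name, the arrival order is invented, and the two artists are the ones that appear elsewhere in the slides.

```python
import bisect
from typing import Iterable, Iterator, List, Tuple


def incremental_view(
        results: Iterable[Tuple[str, str]]) -> Iterator[List[Tuple[str, str]]]:
    """After each new (person, name) result, yield the sorted list so far.

    Duplicates are dropped on arrival, so no operator has to wait
    for the complete result set before anything can be displayed.
    """
    seen = set()
    view: List[Tuple[str, str]] = []
    for person, name in results:
        if person in seen:
            continue
        seen.add(person)
        bisect.insort(view, (name, person))  # keep the view sorted by name
        yield list(view)


# Assumed arrival order of streamed results, including one duplicate.
ARRIVALS = [
    ("dbpedia:Yannis_Markopoulos", "Yannis Markopoulos"),
    ("dbpedia:Yanni", "Yanni"),
    ("dbpedia:Yanni", "Yanni"),
]

for snapshot in incremental_view(ARRIVALS):
    print([name for name, _ in snapshot])
```

The list is usable after the first result and simply gets better as more results stream in, which is the trade-off the fragment approach makes.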
  • 58. How can we analyze queries
 from intelligent clients? Client queries are different. Look at the logs. Treat machine clients as humans.
  • 59. Let’s closely inspect the server logs
 of the “Artists from York” query. SELECT ?person ?city WHERE { ?person a dbpedia-owl:Artist. ?person dbpedia-owl:birthPlace ?city. ?city foaf:name "York"@en. }
 SPARQL: http://dbpedia.org/sparql?query=SELECT+%3Fp+%3Fc+WHERE+%7B%0D%0A++++%3Fp+a+%3Chttp%3A%2F%2Fdbpedia.org%2Fontology
  • 60. Let’s closely inspect the server logs
 of the “Artists from York” query. basic Linked Data Fragments:
 /dbpedia
 /dbpedia?predicate=http%3A%2F%2Fxmlns.com%2Ffoa
 /dbpedia?predicate=http%3A%2F%2Fwww.w3.org%2F1
 /dbpedia?predicate=http%3A%2F%2Fdbpedia.org%2Fo
 /dbpedia?predicate=http%3A%2F%2Fdbpedia.org%2Fo
 /dbpedia?predicate=http%3A%2F%2Fdbpedia.org%2Fo
 /dbpedia?predicate=http%3A%2F%2Fdbpedia.org%2Fo
 /dbpedia?predicate=http%3A%2F%2Fdbpedia.org%2Fo
 /dbpedia?predicate=http%3A%2F%2Fdbpedia.org%2Fo
 /dbpedia?predicate=http%3A%2F%2Fdbpedia.org%2Fo
 /dbpedia?predicate=http%3A%2F%2Fdbpedia.org%2Fo
  • 61. Let’s closely inspect the server logs
 of the “Artists from York” query. basic Linked Data Fragments:
 ?c foaf:name "York"@en.
 ?p rdf:type dbpedia-owl:Artist.
 ?p dbpedia-owl:birthPlace ?c.
 ?p dbpedia-owl:birthPlace dbpedia:York_(explorer).
 ?p dbpedia-owl:birthPlace dbpedia:York_railway_station.
 ?p dbpedia-owl:birthPlace dbpedia:28220_York.
 ?p dbpedia-owl:birthPlace dbpedia:York_(provincial_elect
 ?p dbpedia-owl:birthPlace dbpedia:York,_New_York.
 dbpedia:Cornelius_R._Parsons rdf:type dbpedia-owl:Artist
 dbpedia:John_R._McPherson rdf:type dbpedia-owl:Artist.
  • 62. Access logs resulting from basic LDF clients
 are hard to interpret. Parallel requests, unclear dependencies. Full query hard to reconstruct. Was it SPARQL in the first place?
  • 63. What would we do
 if the users were humans? Create a user profile. Use cookies. Check the Referer header.
  • 64. The Referer header tells us
 the path the client has followed. Interesting, still underused idea: “Augmenting the Web of Data using Referers”
 by Hannes Mühleisen & Anja Jentzsch. It explains part of the “why”. It allows us to reconstruct dependencies.
  • 65. Reconstruct dependencies
 from Referer metadata. [Diagram: the logged requests arranged as a dependency tree, including ?c foaf:name "York"@en., ?p rdf:type dbpedia-owl:Artist., ?p dbpedia-owl:birthPlace ?c., five truncated "?p dbpedia-owl:birt" requests, dbpedia:Cornelius_R._Parsons rdf:type dbpedia-owl:Artist., and dbpedia:John_R._McPherson]
  • 66. These dependencies can help us
 cache and prefetch. Example observation: after retrieving ?s <p> <o> patterns,
 clients often ask for <s> rdfs:label ?l. Example action: always add labels to concepts
 in all responses. Cf. “Caching and Prefetching Strategies for SPARQL queries”
 by Johannes Lorey and Felix Naumann.
  • 67. However, the open-world assumption
 can cause cardinality trouble. SELECT ?person ?label WHERE { ?person a dbpedia-owl:Artist; rdfs:label ?label. } Fragment “?person a dbpedia-owl:Artist”: dbpedia:Yannis_Markopoulos a dbpedia-owl:Artist; rdfs:label "Yannis Markopoulos"@en. dbpedia:Yanni a dbpedia-owl:Artist; rdfs:label "Yanni"@en. … Are these all labels?
 Should I ask for more?
  • 68. The intent of this query
 is probably different from its semantics. SELECT ?person ?label WHERE { ?person a dbpedia-owl:Artist; rdfs:label ?label. } With SPARQL endpoints, this doesn’t matter.
 Clients don’t have to work more. To optimize client usage patterns,
 this difference is really important.
  • 69. Referers only show part of the story.
 Can we know more? Inform the server what you’re doing. Then the server can help you better in the future.
 GET /dbpedia?o=dbpedia%3AGreece HTTP/1.1
 User-Agent: curl/7.35.0
 Host: data.linkeddatafragments.org
 Accept: text/turtle
 Referer: http://data.linkeddatafragments.org/dbpedia
 X-Executed-Query: SELECT ?person ?label WHERE { ?person a dbpe
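A client that wants to tell the server what it is doing could assemble such a request along these lines. This is a sketch under assumptions: `build_request` is a hypothetical helper, the query parameter names `s`, `p`, `o` follow the example request on the slide, and the query passed as `X-Executed-Query` is the label query from the preceding slides, used here purely for illustration since the header on the slide is truncated. No request is actually sent.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urlencode


def build_request(fragment_url: str,
                  pattern: Dict[str, Optional[str]],
                  referer: str,
                  executed_query: str) -> Tuple[str, Dict[str, str]]:
    """Build the URL and headers for one fragment request (nothing is sent)."""
    params = {key: value for key, value in pattern.items() if value is not None}
    url = fragment_url + ("?" + urlencode(params) if params else "")
    headers = {
        "Accept": "text/turtle",
        "Referer": referer,
        # Header values must stay on one line, so collapse all whitespace.
        "X-Executed-Query": " ".join(executed_query.split()),
    }
    return url, headers


BASE = "http://data.linkeddatafragments.org/dbpedia"
QUERY = """SELECT ?person ?label WHERE {
  ?person a dbpedia-owl:Artist; rdfs:label ?label.
}"""

url, headers = build_request(
    BASE, {"s": None, "p": None, "o": "dbpedia:Greece"}, BASE, QUERY)
print(url)
for name, value in headers.items():
    print(f"{name}: {value}")
```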
  • 70. How can we analyze queries
 from intelligent clients? Client queries are different. Look at the logs. Treat machine clients as humans.
  • 71. My reflex when building machine clients
 is to wonder: what would a human do? I don’t expect any server
 to solve my queries;
 I collect small pieces of information
 to solve queries myself.
  • 72. If you as a human use a website
 and it doesn’t work the way you want, what would you do?
  • 73. As a human, I would leave feedback.
 I would comment, like or upvote/downvote. “I tried to find artists from Greece.
 Finding out Greek citizens was easy,
 but the artist checks went quite slow.
 The total query took me 4 minutes,
 whereas I would prefer 1 minute.” ★★★☆☆ Feedback is key
 to improving a service.
  • 74. Why don’t we let machines
 give feedback about their experience? [ a f:ExperienceFeedback; f:author _:agent; f:subject _:query; f:actualSituation [ f:duration "3m"; f:bandwidth "500KB" ]; f:desiredSituation [ f:duration "1m" ] ].
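A machine client could generate this kind of feedback from its own measurements. The sketch below simply formats the Turtle shown on the slide; the function name is hypothetical, the `f:` vocabulary is taken from the slide as-is (its namespace is not given there), and values are inserted without escaping, which a real implementation would need to handle.

```python
from typing import Optional


def experience_feedback(actual_duration: str, desired_duration: str,
                        bandwidth: Optional[str] = None) -> str:
    """Format experience feedback as a Turtle blank node, as on the slide."""
    actual = ['f:duration "%s"' % actual_duration]
    if bandwidth is not None:
        actual.append('f:bandwidth "%s"' % bandwidth)
    return (
        "[ a f:ExperienceFeedback;\n"
        "  f:author _:agent;\n"
        "  f:subject _:query;\n"
        "  f:actualSituation [ %s ];\n"
        '  f:desiredSituation [ f:duration "%s" ] ].'
        % ("; ".join(actual), desired_duration)
    )


print(experience_feedback("3m", "1m", bandwidth="500KB"))
```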
  • 75. If clients are more intelligent than servers,
 we have to analyze usage differently. Enable clients to act smart. Creatively reuse human techniques. Learn from optimizations feedback.
  • 76. The Lonesome LOD Cloud. Outline: Linked Data Fragments; Client-side querying; Analyzing client queries.
  • 77. “Machine clients sending feedback? What you say is total science fiction! What’s next, machine clients
 that poke you on Facebook?”
  • 78. What I consider science fiction: a public endpoint on the Web
 that answers any question.
  • 79. 99.9% of the time, a basic LDF client
 solves this query in 3 seconds: SELECT ?person ?city WHERE { ?person a dbpedia-owl:Artist. ?person dbpedia-owl:birthPlace ?city. ?city foaf:name "York"@en. } Which public SPARQL endpoints
 could guarantee you that?
  • 80. “You cannot solve all queries
 with basic Linked Data Fragments!” SELECT ?x ?l WHERE { ?x rdfs:label ?l. FILTER REGEX(?l, "^A") }
  • 81. The Semantic Web tried
 to solve too much too fast. The result is
 a very lonesome LOD Cloud. You can query anything, but it never works.
  • 82. Start with enabling tasks
 humans could easily do. SELECT ?x ?l WHERE { ?x rdfs:label ?l. FILTER REGEX(?l, "^A") }
  • 83. Start with enabling tasks
 humans could easily do. SELECT ?person ?city WHERE { ?person a dbpedia-owl:Artist. ?person dbpedia-owl:birthPlace ?city. ?city foaf:name "York"@en. }
  • 84. Start with enabling tasks
 humans could easily do. “I tried to find artists from Greece.
 Finding out Greek citizens was easy,
 but the artist checks went quite slow.
 The total query took me 4 minutes,
 whereas I would prefer 1 minute.” ★★★☆☆
  • 85. Start with enabling tasks
 humans could easily do. After that,
 we’ll talk about the rest.
  • 86. The LOD usage community
 can help create intelligent clients. Put the intelligent servers aside,
 enable clients to be intelligent. Look at usage from the perspective
 of intelligent clients.
  • 87. The LOD Cloud is lonesome
 because we gave
 human and machine clients
 different interfaces. Let’s make the simple things work.
 Let’s get the data used.
  • 88. The Lonesome LOD Cloud @RubenVerborgh
