Querying data on the Web – client or server?

  1. 1. Querying data on the Web:
 client or server? Ruben Verborgh Ghent University – iMinds
  2. 2. The current Semantic Web
 has many implicit assumptions. We should be able
 to answer all queries. Complexity is more important
 than availability. Data servers
 need to be expensive.
  3. 3. Those assumptions are
 not necessarily wrong. They’re also not necessarily
 the only possible ones.
  4. 4. Some queries are
 hard to answer. Availability is a top priority. Low-cost data servers
 have potential. Let’s rethink our assumptions,
 just to see what’s possible.
  5. 5. Different assumptions lead
 to a different Semantic Web. Maybe they bring us closer
 to the Web We Want.
  6. 6. …but what do we want?
  7. 7. Querying data on the Web:
 client or server? Outline: The Semantic Web’s assumptions. Client-side query execution. New query opportunities.
  8. 8. 1. Clients need a different protocol.
  9. 9. The Web for humans offers an HTTP interface to HTML. [Diagram: client, HTTP, HTML data]
  10. 10. The Web for applications offers an HTTP interface to JSON. [Diagram: client, HTTP, JSON data]
  11. 11. The Web for applications offers an HTTP interface to RDF. [Diagram: client, HTTP, RDF data]
  12. 12. The Web for applications offers a SPARQL interface to RDF. [Diagram: client, HTTP, SPARQL, RDF data]
  13. 13. Semantic Web clients were
 perceived as very limited. Documents need a new language. Querying needs a new protocol. …unlike “simple” JSON clients.
  14. 14. 1. Clients need a different protocol. 2. Live queries require that protocol.
  15. 15. There are 3 common ways
 to publish Linked Data: public SPARQL endpoints, Linked Data documents, and downloadable data dumps.
  16. 16. Public SPARQL endpoints
 offer a very powerful interface… …and that’s not always a good thing. Clients can ask any query… …if the endpoint is available. Hosting an endpoint is costly.
  17. 17. Linked Data documents
 seem to work like the Web. Low-cost to host. Solve queries by traversing links. Many queries cannot be solved.
  18. 18. Downloadable data dumps
 have high availability. Set up your own endpoint. Data is not live. You’re not really querying the Web.
  19. 19. 1. Clients need a different protocol. 2. Live queries require that protocol. 3. Clients can request any query.
  20. 20. In SPARQL, asking a simple query
 is as easy as asking a difficult one. The query language abstracts away
 the steps needed to solve a query. In contrast to the rest of the Web,
 clients are in control.
  21. 21. With a JSON interface, the server decides how clients access data. [Diagram: client, HTTP, JSON data]
  22. 22. With a SPARQL interface, clients
 decide how they access data. [Diagram: client, HTTP, SPARQL, RDF data]
  23. 23. Clients can ask anything, including
 queries that bring servers down. The majority
 of public SPARQL endpoints
 have less than 95% availability. That means the endpoint,
 and thus your application,
 doesn’t work 1.5 days each month.
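The downtime figure on this slide follows from simple arithmetic. A minimal sketch, assuming a 30-day month (the slide does not state the month length; the function name is illustrative):

```python
def downtime_days(availability: float, days_in_month: int = 30) -> float:
    """Days per month during which a service with the given availability is down."""
    return (1.0 - availability) * days_in_month


# 95% availability over an assumed 30-day month
print(round(downtime_days(0.95), 2))  # 1.5
```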
  24. 24. If you have operational need
 for SPARQL accessible data,
 you must have your own infrastructure. No public endpoints.
 Public endpoints are for lookups and discovery;
 sort of a dataset demo. —Orri Erling, OpenLink (2014)
  25. 25. SEMANTIC things we happen to have
 downloaded from the WEB
  26. 26. If you want to study
 a subject on Wikipedia, do you download all
 4,614,000 articles first?
  27. 27. 1. Clients need a different protocol. 2. Live queries require that protocol. 3. Clients can request any query.
  28. 28. Querying data on the Web:
 client or server? Outline: The Semantic Web’s assumptions. Client-side query execution. New query opportunities.
  29. 29. Any fragment of a Linked Data set
 is called a Linked Data Fragment. [Diagram: a spectrum from high client effort to high server effort, labelled by selector. Data dump: all. Dereferencing: subject. SPARQL endpoint: SPARQL query.]
  30. 30. Each type of Linked Data Fragment
 is defined by three characteristics. Selector: What data does it contain? Metadata: What do we know about it? Controls: What can we do next?
  31. 31. Each type of Linked Data Fragment
 is defined by three characteristics. SPARQL CONSTRUCT result. Selector: a SPARQL query. Metadata: (none). Controls: (none).
  32. 32. Each type of Linked Data Fragment
 is defined by three characteristics. Linked Data document. Selector: a specific entity. Metadata: creator, maintainer, … Controls: links to other LD documents.
  33. 33. Each type of Linked Data Fragment
 is defined by three characteristics. Data dump. Selector: everything. Metadata: number of triples, file size. Controls: (none).
  34. 34. Can we query fragments that
 balance client and server effort? [Diagram: a spectrum from high client effort to high server effort, labelled by selector. Data dump: all. Dereferencing: subject. Triple pattern fragments: triple pattern. SPARQL endpoint: SPARQL query.]
  35. 35. Triple pattern fragments are cheap
 yet enable efficient querying. Selector: triple pattern. Metadata: total number of matches. Controls: access to all other fragments.
  36. 36. data (first 100) controls (other fragments) metadata (total count)
  37. 37. Triple pattern fragment servers
 enable clients to execute queries. Other APIs exist, but are specific. Triple patterns work on all datasets. Combine data, metadata & controls.
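To make the three parts of a triple pattern fragment concrete, here is a minimal in-memory sketch. It is not the real server implementation: the `Fragment` type, the `triple_pattern_fragment` function, and the use of a page number as the only control are assumptions made for illustration, and the dataset holds just a few triples shown on the slides.

```python
from typing import List, NamedTuple, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]  # None acts as a variable


class Fragment(NamedTuple):
    data: List[Triple]        # one page of matching triples
    total_count: int          # metadata: total number of matches
    next_page: Optional[int]  # control (simplified): page to request next, if any


# Toy dataset using triples that appear on the slides.
DATASET: List[Triple] = [
    ("dbpedia:York", "foaf:name", '"York"@en'),
    ("dbpedia:York,_Ontario", "foaf:name", '"York"@en'),
    ("dbpedia:John_Flaxman", "dbpo:birthPlace", "dbpedia:York"),
    ("dbpedia:Joseph_Hansom", "dbpo:birthPlace", "dbpedia:York"),
]


def matches(triple: Triple, pattern: Pattern) -> bool:
    """A triple matches when every non-variable position is equal."""
    return all(p is None or p == t for t, p in zip(triple, pattern))


def triple_pattern_fragment(pattern: Pattern, page: int = 0, page_size: int = 100) -> Fragment:
    """Return one page of matches with count metadata and a paging control."""
    hits = [t for t in DATASET if matches(t, pattern)]
    start = page * page_size
    has_more = start + page_size < len(hits)
    return Fragment(hits[start:start + page_size], len(hits), page + 1 if has_more else None)


first = triple_pattern_fragment((None, "foaf:name", '"York"@en'), page_size=1)
print(first.total_count, len(first.data), first.next_page)  # 2 1 1
```

The server only filters and pages; everything else is left to the client, which is what keeps each request cheap.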
  38. 38. How to answer this query using
 only triple pattern fragments? SELECT ?person ?city WHERE { ?person a dbpedia-owl:Artist. ?person dbpedia-owl:birthPlace ?city. ?city foaf:name "York"@en. }
  39. 39. Get the corresponding fragments.
 Pattern: ?person a dbpedia-owl:Artist. Matches: dbpedia:Aamir_Zaki a dbpedia-owl:Artist. dbpedia:Ahmad_Morid a dbpedia-owl:Artist. …
 Pattern: ?person dbpedia-owl:birthPlace ?city. Matches: dbpedia:Ganesh_Ghosh …:birthPlace dbpedia:Bengal_Presidency. dbpedia:Jacques_L'enfant …:birthPlace dbpedia:Beauce. …
 Pattern: ?city foaf:name "York"@en. Matches: dbpedia:York foaf:name "York"@en. dbpedia:York,_Ontario foaf:name "York"@en. …
  40. 40. Get the corresponding fragments
 and read the count metadata.
 Pattern: ?person a dbpedia-owl:Artist. Count: ±61,000. Matches: dbpedia:Aamir_Zaki a dbpedia-owl:Artist. dbpedia:Ahmad_Morid a dbpedia-owl:Artist. …
 Pattern: ?person dbpedia-owl:birthPlace ?city. Count: ±470,000. Matches: dbpedia:Ganesh_Ghosh …:birthPlace dbpedia:Bengal_Presidency. dbpedia:Jacques_L'enfant …:birthPlace dbpedia:Beauce. …
 Pattern: ?city foaf:name "York"@en. Count: 12. Matches: dbpedia:York foaf:name "York"@en. dbpedia:York,_Ontario foaf:name "York"@en. …
  41. 41. Start with the smallest fragment.
 Start with the first match.
 Pattern: ?person a dbpedia-owl:Artist. Count: ±61,000. Matches: dbpedia:Aamir_Zaki a dbpedia-owl:Artist. dbpedia:Ahmad_Morid a dbpedia-owl:Artist. …
 Pattern: ?person dbpedia-owl:birthPlace ?city. Count: ±470,000. Matches: dbpedia:Ganesh_Ghosh …:birthPlace dbpedia:Bengal_Presidency. dbpedia:Jacques_L'enfant …:birthPlace dbpedia:Beauce. …
 Pattern: ?city foaf:name "York"@en. Count: 12. Matches: dbpedia:York foaf:name "York"@en. dbpedia:York,_Ontario foaf:name "York"@en. …
  42. 42. How to answer this query using
 only triple pattern fragments? SELECT ?person WHERE { ?person a dbpedia-owl:Artist. ?person dbpedia-owl:birthPlace dbpedia:York. dbpedia:York foaf:name "York"@en. }
  43. 43. Get the corresponding fragments.
 Pattern: ?person a dbpedia-owl:Artist. Matches: dbpedia:Aamir_Zaki a dbpedia-owl:Artist. dbpedia:Ahmad_Morid a dbpedia-owl:Artist. …
 Pattern: ?person dbpo:birthPlace dbpedia:York. Matches: dbpedia:John_Flaxman dbpo:birthPlace dbpedia:York. dbpedia:Joseph_Hansom dbpo:birthPlace dbpedia:York. …
  44. 44. Get the corresponding fragments
 and read the count metadata.
 Pattern: ?person a dbpedia-owl:Artist. Count: ±61,000. Matches: dbpedia:Aamir_Zaki a dbpedia-owl:Artist. dbpedia:Ahmad_Morid a dbpedia-owl:Artist. …
 Pattern: ?person dbpo:birthPlace dbpedia:York. Count: 75. Matches: dbpedia:John_Flaxman dbpo:birthPlace dbpedia:York. dbpedia:Joseph_Hansom dbpo:birthPlace dbpedia:York. …
  45. 45. Start with the smallest fragment.
 Start with the first match.
 Pattern: ?person a dbpedia-owl:Artist. Count: ±61,000. Matches: dbpedia:Aamir_Zaki a dbpedia-owl:Artist. dbpedia:Ahmad_Morid a dbpedia-owl:Artist. …
 Pattern: ?person dbpo:birthPlace dbpedia:York. Count: 75. Matches: dbpedia:John_Flaxman dbpo:birthPlace dbpedia:York. dbpedia:Joseph_Hansom dbpo:birthPlace dbpedia:York. …
  46. 46. How to answer this query using
 only triple pattern fragments? ASK { dbp:John_Flaxman a dbpo:Artist. dbp:John_Flaxman dbpo:birthPlace dbp:York. dbp:York foaf:name "York"@en. }
  47. 47. Get the corresponding fragment
 and read the count metadata.
 Pattern: dbpedia:John_Flaxman a dbpedia-owl:Artist. Count: 1. Match: dbpedia:John_Flaxman a dbpedia-owl:Artist.
 Output the match: ?person = dbpedia:John_Flaxman, ?city = dbpedia:York
  48. 48. Recursively repeat the process
 for all bindings.
 Pattern: ?person dbpo:birthPlace dbpedia:York. Matches: dbpedia:John_Flaxman dbpo:birthPlace dbpedia:York. dbpedia:Joseph_Hansom dbpo:birthPlace dbpedia:York. …
 Pattern: ?city foaf:name "York"@en. Matches: dbpedia:York foaf:name "York"@en. dbpedia:York,_Ontario foaf:name "York"@en. …
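The walk-through on slides 38 to 48 can be sketched as a small recursive function. This is a simplified illustration rather than the actual client: `fragment` is a hypothetical stand-in for one HTTP request to a triple pattern fragment server, the dataset is a toy list built only from triples shown on the slides, and `a` is written out as `rdf:type`.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Bindings = Dict[str, str]

# Toy dataset built from triples that appear on the slides.
DATASET: List[Triple] = [
    ("dbpedia:Aamir_Zaki", "rdf:type", "dbpedia-owl:Artist"),
    ("dbpedia:Ahmad_Morid", "rdf:type", "dbpedia-owl:Artist"),
    ("dbpedia:John_Flaxman", "rdf:type", "dbpedia-owl:Artist"),
    ("dbpedia:John_Flaxman", "dbpedia-owl:birthPlace", "dbpedia:York"),
    ("dbpedia:Joseph_Hansom", "dbpedia-owl:birthPlace", "dbpedia:York"),
    ("dbpedia:Ganesh_Ghosh", "dbpedia-owl:birthPlace", "dbpedia:Bengal_Presidency"),
    ("dbpedia:York", "foaf:name", '"York"@en'),
    ("dbpedia:York,_Ontario", "foaf:name", '"York"@en'),
]


def is_var(term: str) -> bool:
    return term.startswith("?")


def fragment(pattern: Triple) -> Tuple[List[Triple], int]:
    """Hypothetical stand-in for one fragment request: matches plus count metadata."""
    hits = [t for t in DATASET if all(is_var(p) or p == v for p, v in zip(pattern, t))]
    return hits, len(hits)


def substitute(pattern: Triple, bindings: Bindings) -> Triple:
    """Replace variables that already have a value."""
    s, p, o = (bindings.get(term, term) for term in pattern)
    return (s, p, o)


def solve(patterns: List[Triple], bindings: Optional[Bindings] = None) -> Iterator[Bindings]:
    bindings = bindings or {}
    if not patterns:
        yield dict(bindings)  # every pattern is satisfied: output the match
        return
    bound = [substitute(p, bindings) for p in patterns]
    fragments = [fragment(p) for p in bound]  # get the fragments, read the counts
    smallest = min(range(len(bound)), key=lambda i: fragments[i][1])
    pattern = bound[smallest]
    rest = bound[:smallest] + bound[smallest + 1:]
    for triple in fragments[smallest][0]:  # start with the first match
        new_bindings = dict(bindings)
        new_bindings.update({p: v for p, v in zip(pattern, triple) if is_var(p)})
        yield from solve(rest, new_bindings)  # recursively repeat for this binding


QUERY = [
    ("?person", "rdf:type", "dbpedia-owl:Artist"),
    ("?person", "dbpedia-owl:birthPlace", "?city"),
    ("?city", "foaf:name", '"York"@en'),
]
for result in solve(QUERY):
    print(result)
```

On this toy data the only solution binds `?person` to `dbpedia:John_Flaxman` and `?city` to `dbpedia:York`. Because `solve` is a generator, results are produced as soon as they are found rather than after the whole search completes.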
  49. 49. This way of querying
 changes the usual assumptions. Use the Web’s protocol HTTP. Don’t be smart; enable intelligence. Some queries will be hard / slow.
  50. 50. Querying semantic datasources
 means managing expectations. [Diagram: a spectrum of data dump, dereferencing, triple pattern fragments, and SPARQL endpoint. Data dump end: high client effort, high availability, low freshness / speed. SPARQL endpoint end: high server effort, low availability, high freshness / speed.]
  51. 51. Querying data on the Web:
 client or server? Outline: The Semantic Web’s assumptions. Client-side query execution. New query opportunities.
  52. 52. Coupling access and processing
 leads to low availability. (a) SPARQL endpoints perform all processing on the server, leading to fast query execution with low data bandwidth, and a rapidly overloaded server. [Diagram: many clients connected to one SPARQL server]
  53. 53. Enabling clients to query
 leads to high scalability. (b) LDF servers only support simple requests and can thus handle far higher loads. Clients perform the querying, so they need more (cacheable) data. [Diagram: many clients connected to one LDF server]
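One reason simple requests scale is that many clients ask for the same fragments, so responses can be cached. The sketch below models this with a toy in-memory cache in front of a hypothetical origin server; on the Web this role would be played by ordinary HTTP caching, and all names here are illustrative assumptions.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]


class CachingProxy:
    """Toy shared cache in front of a fragment server (hypothetical, in-memory)."""

    def __init__(self, origin: Callable[[Triple], List[Triple]]) -> None:
        self._origin = origin
        self._cache: Dict[Triple, List[Triple]] = {}
        self.origin_hits = 0  # how often the origin server had to do work

    def get(self, pattern: Triple) -> List[Triple]:
        if pattern not in self._cache:
            self.origin_hits += 1
            self._cache[pattern] = self._origin(pattern)
        return self._cache[pattern]


def origin_server(pattern: Triple) -> List[Triple]:
    """Stand-in for the real server: filter a tiny dataset by triple pattern."""
    data = [("dbpedia:York", "foaf:name", '"York"@en')]
    return [t for t in data if all(p.startswith("?") or p == v for p, v in zip(pattern, t))]


proxy = CachingProxy(origin_server)
pattern = ("?city", "foaf:name", '"York"@en')
for _client in range(7):  # seven clients request the same fragment
    proxy.get(pattern)
print(proxy.origin_hits)  # 1
```

Arbitrary SPARQL queries rarely repeat exactly, which is why the same trick helps far less in front of an endpoint.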
  54. 54. Show a sorted list of molecules
 that match certain characteristics. [Diagram: Molecules, endpoint approach, fragment approach]
  55. 55. Show a sorted list of molecules
 that match certain characteristics. [Diagram, endpoint approach: Molecules, SPARQL endpoint, Molecules]
  56. 56. Show a sorted list of molecules
 that match certain characteristics. Endpoint approach: SELECT DISTINCT(?mol) MIN(?name) WHERE { ?mol rdfs:label ?name; … … } ORDER BY ?name
  57. 57. Show a sorted list of molecules
 that match certain characteristics. Endpoint approach: SELECT DISTINCT(?mol) MIN(?name) WHERE { ?mol rdfs:label ?name; … … } ORDER BY ?name
  58. 58. Show a sorted list of molecules
 that match certain characteristics. Endpoint approach, consequences. DISTINCT: keep all results in memory. MIN: keep all results in memory, blocking. SORT BY: keep all results in memory, blocking. Doesn’t matter; we’re waiting anyway.
  59. 59. Show a sorted list of molecules
 that match certain characteristics. Fragments approach: no blocking operators; streaming matters. SELECT ?mol ?name WHERE { ?mol rdfs:label ?name; … … }
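The difference between a blocking operator and a streaming one can be shown with a small sketch. The molecule rows are made up for illustration, and `CountingStream` is a hypothetical helper that records how many results had to arrive before the first one could be shown.

```python
from typing import Iterable, Iterator, Tuple

Row = Tuple[str, str]  # (molecule, name); illustrative values only

ROWS = [("mol:3", "water"), ("mol:1", "ethanol"), ("mol:2", "methane")]


class CountingStream:
    """Wraps a result stream and counts how many rows have been consumed."""

    def __init__(self, rows: Iterable[Row]) -> None:
        self._rows: Iterator[Row] = iter(rows)
        self.consumed = 0

    def __iter__(self) -> "CountingStream":
        return self

    def __next__(self) -> Row:
        row = next(self._rows)
        self.consumed += 1
        return row


# Streaming: the first row can be displayed as soon as it arrives.
streaming = CountingStream(ROWS)
first_streamed = next(streaming)
print(first_streamed, streaming.consumed)  # ('mol:3', 'water') 1

# Blocking: sorting by name needs every row before the first can be displayed.
blocking = CountingStream(ROWS)
first_sorted = sorted(blocking, key=lambda row: row[1])[0]
print(first_sorted, blocking.consumed)  # ('mol:1', 'ethanol') 3
```

With client-side execution, results trickle in over many requests, so dropping the blocking operators from the query lets the application show something early and sort incrementally in the interface if needed.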
  60. 60. Show a sorted list of molecules
 that match certain characteristics. [Diagram, fragments approach: Molecules, Molecules, Molecules]
  61. 61. The algorithm remains the same
 when clients use one or multiple
 triple pattern fragment servers. Federation also becomes
 substantially easier. Avoid the unavailability cascade.
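A rough sketch of why federation fits this model: a client can treat several servers as one by requesting the same triple pattern from each, concatenating the matches, and adding up the counts. The two server datasets and the function names below are assumptions for illustration, and duplicate triples across servers are not removed.

```python
from typing import List, Sequence, Tuple

Triple = Tuple[str, str, str]

# Two hypothetical servers, each holding part of the data.
SERVER_A: List[Triple] = [
    ("dbpedia:John_Flaxman", "dbpo:birthPlace", "dbpedia:York"),
]
SERVER_B: List[Triple] = [
    ("dbpedia:York", "foaf:name", '"York"@en'),
    ("dbpedia:York,_Ontario", "foaf:name", '"York"@en'),
]


def is_var(term: str) -> bool:
    return term.startswith("?")


def fragment(dataset: Sequence[Triple], pattern: Triple) -> Tuple[List[Triple], int]:
    """Stand-in for one server's fragment: matches plus count metadata."""
    hits = [t for t in dataset if all(is_var(p) or p == v for p, v in zip(pattern, t))]
    return hits, len(hits)


def federated_fragment(servers: Sequence[Sequence[Triple]], pattern: Triple) -> Tuple[List[Triple], int]:
    """Combine the fragments of all servers: concatenate matches, sum counts."""
    all_hits: List[Triple] = []
    total = 0
    for dataset in servers:
        hits, count = fragment(dataset, pattern)
        all_hits.extend(hits)
        total += count
    return all_hits, total


hits, count = federated_fragment([SERVER_A, SERVER_B], ("?s", "?p", "?o"))
print(count)  # 3
```

A query algorithm that only needs matches and counts per pattern can use `federated_fragment` in place of a single-server request without other changes.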
  62. 62. An optimal solution doesn’t exist.
 We should look at all APIs. [Diagram: data dump, dereferencing, triple pattern fragments, SPARQL endpoint]
  63. 63. Servers indicate what they do,
 enabling clients to query optimally. “This server supports triple patterns
 and full-text search on objects.” “This server supports SPARQL queries
 with up to 2 joins.” “This server supports Linked Data documents.”
  64. 64. Querying data on the Web:
 client or server? Outline: The Semantic Web’s assumptions. Client-side query execution. New query opportunities.
  65. 65. Different assumptions
 lead to different trade-offs. Live querying of public data is possible at low cost,
 but at slower speeds… …for now :-)
  66. 66. Let your browser
 solve a SPARQL query:
 client.linkeddatafragments.org Ruben Verborgh Ghent University – iMinds
