1. Querying data on the Web:
client or server?
Ruben Verborgh
Ghent University – iMinds
2. The current Semantic Web
has many implicit assumptions.
We should be able
to answer all queries.
Complexity is more important
than availability.
Data servers
need to be expensive.
4. Some queries are
hard to answer.
Availability is
a top priority.
Low-cost data servers
have potential.
Let’s rethink our assumptions,
just to see what’s possible.
9. The Web for humans offers
an HTTP interface to HTML.
[Diagram: a client accesses data over HTTP and receives HTML.]
10. The Web for applications offers
an HTTP interface to JSON.
[Diagram: a client accesses data over HTTP and receives JSON.]
11. The Web for applications offers
an HTTP interface to RDF.
[Diagram: a client accesses data over HTTP and receives RDF.]
12. The Web for applications offers
a SPARQL interface to RDF.
[Diagram: a client accesses RDF data over HTTP through a SPARQL interface.]
13. Semantic Web clients were
perceived as very limited…
…unlike “simple” JSON clients.
Documents need a new language.
Querying needs a new protocol.
14. 1. Clients need a different protocol.
2. Live queries require that protocol.
15. There are 3 common ways
to publish Linked Data.
Public SPARQL endpoints
Linked Data documents
Downloadable data dumps
16. Public SPARQL endpoints
offer a very powerful interface…
…and that’s not always a good thing.
Clients can ask any query…
…if the endpoint is available.
Hosting an endpoint is costly.
17. Linked Data documents
seem to work like the Web.
Low-cost to host.
Solve queries by traversing links.
Many queries cannot be solved.
18. Downloadable data dumps
have high availability.
Set up your own endpoint.
Data is not live.
You’re not really querying the Web.
19. 1. Clients need a different protocol.
2. Live queries require that protocol.
3. Clients can request any query.
20. The query language abstracts away
the steps needed to solve it.
In SPARQL, asking a simple query
is as easy as asking a difficult one.
In contrast to the rest of the Web,
clients are in control.
21. With a JSON interface, the server
decides how clients access data.
[Diagram: a client accesses data over HTTP and receives JSON.]
23. Clients can ask anything, including
queries that bring servers down.
The majority
of public SPARQL endpoints
have less than 95% availability.
That means the endpoint,
and thus your application,
is down for more than 1.5 days each month.
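The 1.5-day figure is plain arithmetic, assuming a 30-day month: 5% of 30 days is 1.5 days, and any availability below 95% means more downtime than that. A minimal Python sketch of the calculation (the function name is illustrative):

```python
def monthly_downtime_days(availability: float, days_in_month: float = 30.0) -> float:
    """Days per month an endpoint is unavailable, given availability in [0, 1]."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * days_in_month


# At exactly 95% availability, a 30-day month has 1.5 days of downtime;
# anything below 95% means more downtime than that.
print(f"{monthly_downtime_days(0.95):.2f}")  # prints 1.50
print(f"{monthly_downtime_days(0.90):.2f}")  # prints 3.00
```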
24. If you have operational need
for SPARQL accessible data,
you must have your own infrastructure.
No public endpoints.
Public endpoints are for lookups and discovery;
sort of a dataset demo.
—Orri Erling, OpenLink (2014)
26. If you want to study
a subject on Wikipedia,
do you download all
4,614,000 articles first?
27. 1. Clients need a different protocol.
2. Live queries require that protocol.
3. Clients can request any query.
28. The Semantic Web’s assumptions
Client-side query execution
New query opportunities
Querying data on the Web:
client or server?
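29. Any fragment of a Linked Data set
is called a Linked Data Fragment.
[Figure: a spectrum of Linked Data Fragments ordered by selector: data dumps (selector: all; high client effort), Linked Data documents obtained by dereferencing (selector: subject), and SPARQL endpoints (selector: SPARQL query; high server effort).]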
30. Each type of Linked Data Fragment
is defined by three characteristics.
selector: What data does it contain?
metadata: What do we know about it?
controls: What can we do next?
31. SPARQL CONSTRUCT result
selector: a SPARQL query
metadata: (none)
controls: (none)
32. Linked Data document
selector: a specific entity
metadata: creator, maintainer, …
controls: links to other LD documents
33. data dump
selector: everything
metadata: number of triples, file size
controls: (none)
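The three characteristics can be captured in a small record type. A minimal Python sketch holding the three fragment types described above; the class and field names are illustrative and not taken from any specification:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FragmentType:
    """A type of Linked Data Fragment, described by its three characteristics."""
    name: str
    selector: str            # What data does it contain?
    metadata: Optional[str]  # What do we know about it? (None: nothing)
    controls: Optional[str]  # What can we do next? (None: nothing)


# The three types from the slides above.
FRAGMENT_TYPES = [
    FragmentType("SPARQL CONSTRUCT result", "a SPARQL query", None, None),
    FragmentType("Linked Data document", "a specific entity",
                 "creator, maintainer, …", "links to other LD documents"),
    FragmentType("data dump", "everything", "number of triples, file size", None),
]

for f in FRAGMENT_TYPES:
    print(f"{f.name}: selector = {f.selector}; "
          f"metadata = {f.metadata or '(none)'}; controls = {f.controls or '(none)'}")
```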
34. Can we query fragments that
balance client and server effort?
[Figure: the same spectrum, with triple pattern fragments (selector: triple pattern) added between Linked Data documents and SPARQL endpoints.]
35. Triple pattern fragments are cheap
yet enable efficient querying.
selector: a triple pattern
metadata: total number of matches
controls: access to all other fragments
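To see why such fragments are cheap to serve, here is a minimal Python sketch of a server answering one triple pattern, assuming an in-memory list of triples (a real server would typically rely on an index rather than a linear scan). It returns the three parts named above: a page of matching data, the total count as metadata, and a URL template as controls. The template, the page size and all names are hypothetical.

```python
from typing import NamedTuple, Optional

Triple = tuple[str, str, str]


class Fragment(NamedTuple):
    data: list[Triple]   # one page of triples matching the selector
    total_matches: int   # metadata: total number of matches
    controls: str        # controls: how to request any other fragment


# Hypothetical URL template a client could fill in to reach any other fragment.
CONTROLS = "http://example.org/dataset{?subject,predicate,object}"


def triple_pattern_fragment(dataset: list[Triple],
                            s: Optional[str] = None,
                            p: Optional[str] = None,
                            o: Optional[str] = None,
                            page_size: int = 100) -> Fragment:
    """Return the fragment for the pattern (s, p, o); None acts as a variable."""
    matches = [t for t in dataset
               if (s is None or t[0] == s)
               and (p is None or t[1] == p)
               and (o is None or t[2] == o)]
    return Fragment(matches[:page_size], len(matches), CONTROLS)


# Toy dataset using triples shown on the following slides.
DATA = [
    ("dbpedia:York", "foaf:name", '"York"@en'),
    ("dbpedia:York,_Ontario", "foaf:name", '"York"@en'),
    ("dbpedia:John_Flaxman", "dbpedia-owl:birthPlace", "dbpedia:York"),
    ("dbpedia:John_Flaxman", "rdf:type", "dbpedia-owl:Artist"),
]

fragment = triple_pattern_fragment(DATA, p="foaf:name", o='"York"@en')
print(fragment.total_matches)  # prints 2
```

The count is reported even when the page is cut short, which is what lets a client estimate the size of a fragment without downloading all of it.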
37. Triple pattern fragment servers
enable clients to execute queries.
Triple patterns work on all datasets.
Other APIs exist, but are specific.
Combine data, metadata & controls.
38. How to answer this query using
only triple pattern fragments?
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?city WHERE {
  ?person a dbpedia-owl:Artist.
  ?person dbpedia-owl:birthPlace ?city.
  ?city foaf:name "York"@en.
}
39. Get the corresponding fragments
?person a dbpedia-owl:Artist.
  dbpedia:Aamir_Zaki a dbpedia-owl:Artist.
  dbpedia:Ahmad_Morid a dbpedia-owl:Artist.
  …
?person dbpedia-owl:birthPlace ?city.
  dbpedia:Ganesh_Ghosh …:birthPlace dbpedia:Bengal_Presidency.
  dbpedia:Jacques_L'enfant …:birthPlace dbpedia:Beauce.
  …
?city foaf:name "York"@en.
  dbpedia:York foaf:name "York"@en.
  dbpedia:York,_Ontario foaf:name "York"@en.
  …
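Each of these fragments is one plain HTTP request. A minimal Python sketch of how a client might build such a request URL, assuming the server accepts `subject`, `predicate` and `object` query parameters (an assumption for illustration; a client should read the actual request template from the fragment's controls). The server address is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical server address; real clients discover the request template
# from the hypermedia controls inside each fragment.
BASE = "http://fragments.example.org/dbpedia"


def fragment_url(base: str,
                 subject: Optional[str] = None,
                 predicate: Optional[str] = None,
                 obj: Optional[str] = None) -> str:
    """URL of the triple pattern fragment for (subject, predicate, obj).

    Variables such as ?person or ?city are passed as None and simply
    left out of the request.
    """
    params = {name: value
              for name, value in (("subject", subject),
                                  ("predicate", predicate),
                                  ("object", obj))
              if value is not None}
    return f"{base}?{urlencode(params)}" if params else base


# ?city foaf:name "York"@en.  (foaf:name expanded to its full IRI)
print(fragment_url(BASE, predicate="http://xmlns.com/foaf/0.1/name",
                   obj='"York"@en'))
```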
40. Get the corresponding fragments
and read the count metadata.
?person a dbpedia-owl:Artist.  ±61,000 matches
?person dbpedia-owl:birthPlace ?city.  ±470,000 matches
?city foaf:name "York"@en.  12 matches
41. Start with the smallest fragment:
?city foaf:name "York"@en.  (12 matches)
Start with the first match:
dbpedia:York foaf:name "York"@en.
This binds ?city to dbpedia:York.
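The selection step can be sketched in a few lines of Python, using the approximate counts read from the fragments. The helper names are hypothetical; the first match is the triple shown on the slide, whereas a real client would take it from the fragment's first page.

```python
Triple = tuple[str, str, str]


def is_variable(term: str) -> bool:
    return term.startswith("?")


def bindings_from(pattern: Triple, match: Triple) -> dict[str, str]:
    """Map each variable in the pattern to the value in the matching triple."""
    return {p: m for p, m in zip(pattern, match) if is_variable(p)}


def substitute(pattern: Triple, bindings: dict[str, str]) -> Triple:
    """Replace bound variables in a pattern by their values."""
    s, p, o = (bindings.get(term, term) for term in pattern)
    return (s, p, o)


# Count metadata read from the three fragments (approximate, as on the slides).
counts = {
    ("?person", "a", "dbpedia-owl:Artist"): 61_000,
    ("?person", "dbpedia-owl:birthPlace", "?city"): 470_000,
    ("?city", "foaf:name", '"York"@en'): 12,
}

smallest = min(counts, key=lambda pattern: counts[pattern])
first_match = ("dbpedia:York", "foaf:name", '"York"@en')
bindings = bindings_from(smallest, first_match)   # {'?city': 'dbpedia:York'}
next_query = [substitute(pattern, bindings) for pattern in counts]
for pattern in next_query:
    print(" ".join(pattern))
```

The rewritten patterns are exactly the query on the next slide, with ?city replaced by dbpedia:York.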
42. How to answer this query using
only triple pattern fragments?
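PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person WHERE {
  ?person a dbpedia-owl:Artist.
  ?person dbpedia-owl:birthPlace dbpedia:York.
  dbpedia:York foaf:name "York"@en.
}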
43. Get the corresponding fragments
?person a dbpedia-owl:Artist.
  dbpedia:Aamir_Zaki a dbpedia-owl:Artist.
  dbpedia:Ahmad_Morid a dbpedia-owl:Artist.
  …
?person dbpo:birthPlace dbpedia:York.
  dbpedia:John_Flaxman dbpo:birthPlace dbpedia:York.
  dbpedia:Joseph_Hansom dbpo:birthPlace dbpedia:York.
  …
44. Get the corresponding fragments
and read the count metadata.
?person a dbpedia-owl:Artist.  ±61,000 matches
?person dbpo:birthPlace dbpedia:York.  75 matches
45. Start with the smallest fragment:
?person dbpo:birthPlace dbpedia:York.  (75 matches)
Start with the first match:
dbpedia:John_Flaxman dbpo:birthPlace dbpedia:York.
This binds ?person to dbpedia:John_Flaxman.
46. How to answer this query using
only triple pattern fragments?
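PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
ASK {
  dbpedia:John_Flaxman a dbpedia-owl:Artist.
  dbpedia:John_Flaxman dbpedia-owl:birthPlace dbpedia:York.
  dbpedia:York foaf:name "York"@en.
}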
47. Get the corresponding fragment
and read the count metadata.
dbpedia:John_Flaxman a dbpedia-owl:Artist.  1 match
The count is 1, so this triple exists;
the other two patterns were matched in earlier steps.
Output the match:
?person = dbpedia:John_Flaxman
?city = dbpedia:York
48. Recursively repeat the process
for all bindings.
?person dbpo:birthPlace dbpedia:York.
  dbpedia:John_Flaxman dbpo:birthPlace dbpedia:York.
  dbpedia:Joseph_Hansom dbpo:birthPlace dbpedia:York.
  …
?city foaf:name "York"@en.
  dbpedia:York foaf:name "York"@en.
  dbpedia:York,_Ontario foaf:name "York"@en.
  …
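The whole process is a greedy, recursive evaluation: fetch the fragment of every remaining pattern, pick the smallest, bind its matches one by one, and recurse on the rewritten patterns. A compact Python sketch, assuming a toy in-memory dataset in place of HTTP requests and full fragment sizes in place of the count metadata; it illustrates the idea on these slides and is not the actual client implementation. All helper names are hypothetical.

```python
from typing import Iterator

Triple = tuple[str, str, str]
Bindings = dict[str, str]

# Toy dataset built from triples on the slides; it stands in for a server.
DATA: list[Triple] = [
    ("dbpedia:York", "foaf:name", '"York"@en'),
    ("dbpedia:York,_Ontario", "foaf:name", '"York"@en'),
    ("dbpedia:John_Flaxman", "dbpedia-owl:birthPlace", "dbpedia:York"),
    ("dbpedia:Joseph_Hansom", "dbpedia-owl:birthPlace", "dbpedia:York"),
    ("dbpedia:Ganesh_Ghosh", "dbpedia-owl:birthPlace", "dbpedia:Bengal_Presidency"),
    ("dbpedia:Jacques_L'enfant", "dbpedia-owl:birthPlace", "dbpedia:Beauce"),
    ("dbpedia:John_Flaxman", "a", "dbpedia-owl:Artist"),
    ("dbpedia:Aamir_Zaki", "a", "dbpedia-owl:Artist"),
    ("dbpedia:Ahmad_Morid", "a", "dbpedia-owl:Artist"),
]


def is_variable(term: str) -> bool:
    return term.startswith("?")


def fetch_fragment(pattern: Triple) -> list[Triple]:
    """Stand-in for an HTTP request: every triple matching the pattern.
    A real client would read the count metadata from the first page instead."""
    return [t for t in DATA
            if all(is_variable(p) or p == v for p, v in zip(pattern, t))]


def substitute(pattern: Triple, bindings: Bindings) -> Triple:
    s, p, o = (bindings.get(term, term) for term in pattern)
    return (s, p, o)


def evaluate(patterns: list[Triple], bindings: Bindings) -> Iterator[Bindings]:
    """Greedy, recursive evaluation: smallest fragment first, one match at a time."""
    if not patterns:
        yield bindings            # all patterns matched: output the solution
        return
    fragments = [fetch_fragment(p) for p in patterns]
    i = min(range(len(patterns)), key=lambda k: len(fragments[k]))
    pattern, rest = patterns[i], patterns[:i] + patterns[i + 1:]
    for triple in fragments[i]:   # an empty fragment prunes this branch
        extended = dict(bindings)
        consistent = True
        for term, value in zip(pattern, triple):
            if is_variable(term) and extended.setdefault(term, value) != value:
                consistent = False  # same variable bound to two values
        if consistent:
            yield from evaluate([substitute(p, extended) for p in rest], extended)


QUERY: list[Triple] = [
    ("?person", "a", "dbpedia-owl:Artist"),
    ("?person", "dbpedia-owl:birthPlace", "?city"),
    ("?city", "foaf:name", '"York"@en'),
]

for solution in evaluate(QUERY, {}):
    print(solution)  # prints {'?city': 'dbpedia:York', '?person': 'dbpedia:John_Flaxman'}
```

On this toy data the run follows the slides: the York name pattern is smallest, then the birthplace pattern, and the final check on dbpedia:John_Flaxman yields the single solution, while the dbpedia:Joseph_Hansom and dbpedia:York,_Ontario branches end in empty fragments.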
49. This way of querying
changes the usual assumptions.
Use the Web’s protocol, HTTP.
Don’t be smart; enable intelligence.
Some queries will be hard / slow.
50. Querying semantic data sources
means managing expectations.
[Figure: the Linked Data Fragments spectrum (data dumps, Linked Data documents via dereferencing, triple pattern fragments, SPARQL endpoints), annotated with three trade-offs: server vs. client effort, availability, and freshness / speed.]
51. The Semantic Web’s assumptions
Client-side query execution
New query opportunities
Querying data on the Web:
client or server?
52. Coupling access and processing
leads to low availability.
[Figure: seven clients sending queries to a single SPARQL server.]
(a) SPARQL endpoints perform all processing on the server, leading to fast
query execution with low data bandwidth, and a rapidly overloaded server.
57. endpoint approach
Show a sorted list of molecules
that match certain characteristics.
SELECT DISTINCT ?mol (MIN(?name) AS ?minName)
WHERE {
  ?mol rdfs:label ?name;
       …
       …
}
GROUP BY ?mol
ORDER BY ?minName
58. endpoint approach
Consequences:
DISTINCT: keep all results in memory
MIN: keep all results in memory, blocking
ORDER BY: keep all results in memory, blocking
Doesn’t matter; we’re waiting anyway.
59. fragments approach
No blocking operators; streaming matters.
SELECT ?mol ?name
WHERE {
  ?mol rdfs:label ?name;
       …
       …
}
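Without blocking operators, a client can still present a sorted, distinct list by maintaining it incrementally as results stream in. A minimal Python sketch of one way to do that, assuming results arrive as (molecule, label) pairs; the function name and the `ex:` identifiers are hypothetical. It still keeps state per molecule, but it never waits for the last result before showing something.

```python
import bisect
from typing import Iterable, Iterator


def stream_sorted_molecules(results: Iterable[tuple[str, str]]) -> Iterator[list[str]]:
    """Consume (molecule, label) results as they stream in and, after each
    change, yield the molecules sorted by their smallest label so far.

    DISTINCT, MIN and ORDER BY are maintained incrementally, so nothing blocks:
    every intermediate list is correct for the results seen so far.
    """
    best: dict[str, str] = {}           # molecule -> smallest label seen
    shown: list[tuple[str, str]] = []   # (label, molecule), kept sorted
    for mol, label in results:
        current = best.get(mol)
        if current is not None and current <= label:
            continue                    # duplicate or larger label: no change
        if current is not None:
            shown.remove((current, mol))  # replace the molecule's old label
        best[mol] = label
        bisect.insort(shown, (label, mol))
        yield [m for _, m in shown]


# Hypothetical results arriving one by one from a fragments-based client.
arrivals = [("ex:m2", "benzene"), ("ex:m1", "caffeine"),
            ("ex:m2", "acetone"), ("ex:m3", "aspirin")]
for view in stream_sorted_molecules(arrivals):
    print(view)
```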
61. The algorithm remains the same
when clients use one or multiple
triple pattern fragment servers.
Federation also becomes
substantially easier.
Avoid the unavailability cascade.
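Because the client only asks for triple pattern fragments, several servers can be presented to the same algorithm as one merged fragment source. A minimal Python sketch under that assumption; the helper names and `ex:` data are hypothetical, and skipping unreachable servers (at the cost of incomplete results) is one possible policy, not necessarily the approach behind these slides.

```python
from typing import Callable

Triple = tuple[str, str, str]
# A "server" is modelled as any function that answers a triple pattern.
Server = Callable[[Triple], list[Triple]]


def federated_fragment(servers: list[Server], pattern: Triple) -> tuple[list[Triple], int]:
    """Ask every server for the same triple pattern and merge the answers.

    Returns the de-duplicated matches and their count, so the rest of the
    client algorithm can treat the federation as a single fragment source.
    """
    merged: dict[Triple, None] = {}     # insertion-ordered set
    for server in servers:
        try:
            for triple in server(pattern):
                merged[triple] = None
        except ConnectionError:
            continue                    # one possible policy: skip servers that are down
    return list(merged), len(merged)


def make_server(data: list[Triple]) -> Server:
    """Hypothetical in-memory server used for illustration."""
    def answer(pattern: Triple) -> list[Triple]:
        return [t for t in data
                if all(p.startswith("?") or p == v for p, v in zip(pattern, t))]
    return answer


def unavailable(pattern: Triple) -> list[Triple]:
    raise ConnectionError("server is down")


s1 = make_server([("ex:a", "ex:knows", "ex:b")])
s2 = make_server([("ex:c", "ex:knows", "ex:d"), ("ex:a", "ex:knows", "ex:b")])
matches, count = federated_fragment([s1, unavailable, s2], ("?x", "ex:knows", "?y"))
print(count)  # prints 2
```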
62. An optimal solution doesn’t exist.
We should look at all APIs.
[Figure: the Linked Data Fragments spectrum: data dumps, Linked Data documents (dereferencing), triple pattern fragments, SPARQL endpoints.]
63. Servers indicate what they do,
enabling clients to query optimally.
“This server supports triple patterns
and full-text search on objects.”
“This server supports SPARQL queries
with up to 2 joins.”
“This server supports Linked Data documents.”
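A client could use such self-descriptions to decide how much work to delegate. A minimal Python sketch, assuming a made-up vocabulary of feature names and a client that already knows how many joins its query needs; none of these names come from an existing standard.

```python
from dataclasses import dataclass, field


@dataclass
class ServerDescription:
    """What a server declares about itself (hypothetical feature names)."""
    url: str
    features: set[str] = field(default_factory=set)
    max_joins: int = 0   # only relevant when "sparql" is declared


def choose_strategy(server: ServerDescription, joins_needed: int) -> str:
    """Pick the strategy that lets the server do as much as it offers."""
    if "sparql" in server.features and joins_needed <= server.max_joins:
        return "send the whole query to the server"
    if "triple-patterns" in server.features:
        return "split the query into triple patterns and join on the client"
    if "documents" in server.features:
        return "traverse Linked Data documents"
    raise ValueError(f"{server.url} offers no interface this client can use")


servers = [
    ServerDescription("http://a.example.org/", {"triple-patterns", "full-text-search"}),
    ServerDescription("http://b.example.org/", {"sparql"}, max_joins=2),
    ServerDescription("http://c.example.org/", {"documents"}),
]
for server in servers:
    print(server.url, "->", choose_strategy(server, joins_needed=2))
```

The full-text search feature is recorded but ignored by this simplified strategy; a fuller client could use it for text filters.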
64. The Semantic Web’s assumptions
Client-side query execution
New query opportunities
Querying data on the Web:
client or server?
65. Different assumptions
lead to different trade-offs.
Live querying of public data
is possible at low cost,
but at slower speeds…
…for now :-)
66. Let your browser
solve a SPARQL query:
client.linkeddatafragments.org
Ruben Verborgh
Ghent University – iMinds