1.
Querying data on the Web:
client or server?
Ruben Verborgh
Ghent University – iMinds
2.
The current Semantic Web
has many implicit assumptions.
We should be able
to answer all queries.
Complexity is more important
than availability.
Data servers
need to be expensive.
3.
Those assumptions are
not necessarily wrong.
They’re also not necessarily
the only possible ones.
4.
Some queries are
hard to answer.
Availability is
a top priority.
Low-cost data servers
have potential.
Let’s rethink our assumptions,
just to see what’s possible.
5.
Different assumptions lead
to a different Semantic Web.
Maybe they bring us closer
to the Web We Want.
9.
The Web for humans offers
an HTTP interface to HTML.
Diagram: client and data, connected over HTTP, exchanging HTML.
10.
The Web for applications offers
an HTTP interface to JSON.
Diagram: client and data, connected over HTTP, exchanging JSON.
11.
The Web for applications offers
an HTTP interface to RDF.
Diagram: client and data, connected over HTTP, exchanging RDF.
12.
The Web for applications offers
a SPARQL interface to RDF.
Diagram: client and data, connected over HTTP using SPARQL, exchanging RDF.
13.
Semantic Web clients were
perceived as very limited.
…unlike “simple” JSON clients.
Documents need a new language.
Querying needs a new protocol.
14.
1. Clients need a different protocol.
2. Live queries require that protocol.
15.
There are 3 common ways
to publish Linked Data.
public SPARQL endpoints
Linked Data documents
downloadable data dumps
16.
Public SPARQL endpoints
offer a very powerful interface.
…and that’s not always a good thing.
Clients can ask any query…
…if the endpoint is available.
Hosting an endpoint is costly.
17.
Linked Data documents
seem to work like the Web.
Low-cost to host.
Solve queries by traversing links.
Many queries cannot be solved.
18.
Downloadable data dumps
have high availability.
Set up your own endpoint.
Data is not live.
You’re not really querying the Web.
19.
1. Clients need a different protocol.
2. Live queries require that protocol.
3. Clients can request any query.
20.
The query language abstracts away
the steps needed to solve it.
In SPARQL, asking a simple query
is as easy as asking a difficult one.
In contrast to the rest of the Web,
clients are in control.
21.
With a JSON interface, the server
decides how clients access data.
Diagram: client and data, connected over HTTP, exchanging JSON.
22.
With a SPARQL interface, clients
decide how they access data.
Diagram: client and data, connected over HTTP using SPARQL, exchanging RDF.
23.
Clients can ask anything, also
queries that bring servers down.
The majority
of public SPARQL endpoints
have less than 95% availability.
That means the endpoint
—and thus your application—
doesn’t work 1.5 days each month.
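The 1.5-day figure is plain arithmetic. A minimal sketch, assuming a 30-day month (the slide does not state the month length; `downtime_days` is an illustrative name):

```python
def downtime_days(availability: float, days_in_month: int = 30) -> float:
    """Days per month a service is unreachable at the given availability."""
    return (1.0 - availability) * days_in_month


# 95% availability leaves 5% of a 30-day month, i.e. about 1.5 days.
print(round(downtime_days(0.95), 2))
```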
24.
If you have operational need
for SPARQL accessible data,
you must have your own infrastructure.
No public endpoints.
Public endpoints are for lookups and discovery;
sort of a dataset demo.
—Orri Erling, OpenLink (2014)
25.
SEMANTIC things we happen to have downloaded from the WEB
26.
If you want to study
a subject on Wikipedia,
do you download all
4,614,000 articles first?
27.
1. Clients need a different protocol.
2. Live queries require that protocol.
3. Clients can request any query.
28.
The Semantic Web’s assumptions
Client-side query execution
New query opportunities
Querying data on the Web:
client or server?
29.
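Any fragment of a Linked Data set
is called a Linked Data Fragment.
Diagram: a spectrum from high client effort (data dump) to high server effort (SPARQL endpoint); dereferencing lies in between.
Selectors: all (data dump), subject (dereferencing), SPARQL query (SPARQL endpoint).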
30.
Each type of Linked Data Fragment
is defined by three characteristics.
selector: What data does it contain?
metadata: What do we know about it?
controls: What can we do next?
31.
Each type of Linked Data Fragment
is defined by three characteristics.
SPARQL CONSTRUCT result
selector: a SPARQL query
metadata: (none)
controls: (none)
32.
Each type of Linked Data Fragment
is defined by three characteristics.
Linked Data Document
selector: a specific entity
metadata: creator, maintainer, …
controls: links to other LD documents
33.
Each type of Linked Data Fragment
is defined by three characteristics.
data dump
selector: everything
metadata: number of triples, file size
controls: (none)
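The three characteristics can be written down as a small record type. This is only an illustrative sketch: `FragmentType` and `FRAGMENT_TYPES` are made-up names, and the field values are copied from the three slides above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FragmentType:
    name: str
    selector: str  # What data does it contain?
    metadata: str  # What do we know about it?
    controls: str  # What can we do next?


FRAGMENT_TYPES: List[FragmentType] = [
    FragmentType("SPARQL CONSTRUCT result", "a SPARQL query", "(none)", "(none)"),
    FragmentType("Linked Data document", "a specific entity",
                 "creator, maintainer, …", "links to other LD documents"),
    FragmentType("data dump", "everything",
                 "number of triples, file size", "(none)"),
]

for fragment_type in FRAGMENT_TYPES:
    print(f"{fragment_type.name}: selector = {fragment_type.selector}")
```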
34.
Can we query fragments that
balance client and server effort?
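Diagram: a spectrum from high client effort (data dump) to high server effort (SPARQL endpoint); dereferencing and triple pattern fragments lie in between.
Selectors: all (data dump), subject (dereferencing), triple pattern (triple pattern fragments), SPARQL query (SPARQL endpoint).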
35.
Triple pattern fragments are cheap
yet enable efficient querying.
selector: triple pattern
metadata: total number of matches
controls: access to all other fragments
36.
data (first 100)
controls (other fragments)
metadata (total count)
37.
Triple pattern fragment servers
enable clients to execute queries.
Other APIs exist, but are specific.
Triple patterns work on all datasets.
Combine data, metadata & controls.
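To make "data, metadata & controls" concrete, here is a minimal in-memory sketch of what such a server might return for one request. It is not the actual server implementation: `Fragment`, `triple_pattern_fragment`, and the toy `DATASET` are assumptions for illustration, with the triples borrowed from the examples later in this deck and `a` spelled out as `rdf:type`. The page size of 100 mirrors the "first 100" label above.

```python
from typing import List, NamedTuple, Optional, Tuple

Triple = Tuple[str, str, str]

# Toy dataset (illustrative only).
DATASET: List[Triple] = [
    ("dbpedia:Aamir_Zaki", "rdf:type", "dbpedia-owl:Artist"),
    ("dbpedia:Ahmad_Morid", "rdf:type", "dbpedia-owl:Artist"),
    ("dbpedia:John_Flaxman", "rdf:type", "dbpedia-owl:Artist"),
    ("dbpedia:John_Flaxman", "dbpedia-owl:birthPlace", "dbpedia:York"),
    ("dbpedia:Joseph_Hansom", "dbpedia-owl:birthPlace", "dbpedia:York"),
]


class Fragment(NamedTuple):
    data: List[Triple]        # one page of matching triples
    total_count: int          # metadata: total number of matches
    next_page: Optional[int]  # control: the page to request next, if any


def triple_pattern_fragment(subject: Optional[str] = None,
                            predicate: Optional[str] = None,
                            obj: Optional[str] = None,
                            page: int = 1,
                            page_size: int = 100) -> Fragment:
    """Select triples by pattern; None acts as a wildcard."""
    pattern = (subject, predicate, obj)
    matches = [triple for triple in DATASET
               if all(want is None or want == have
                      for want, have in zip(pattern, triple))]
    start = (page - 1) * page_size
    has_more = start + page_size < len(matches)
    return Fragment(matches[start:start + page_size],
                    len(matches),
                    page + 1 if has_more else None)


# A small page size is used here only to show the paging control.
artists = triple_pattern_fragment(predicate="rdf:type",
                                  obj="dbpedia-owl:Artist", page_size=2)
print(artists.total_count, artists.next_page)
```

The server only filters and pages; it never joins, which is what keeps each request cheap.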
38.
How to answer this query using
only triple pattern fragments?
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?city WHERE {
  ?person a dbpedia-owl:Artist.
  ?person dbpedia-owl:birthPlace ?city.
  ?city foaf:name "York"@en.
}
39.
Get the corresponding fragments
?person a dbpedia-owl:Artist.
?person dbpedia-owl:birthPlace ?city.
?city foaf:name "York"@en.
dbpedia:York foaf:name "York"@en.
dbpedia:York,_Ontario foaf:name "York"@en.
…
dbpedia:Ganesh_Ghosh …:birthPlace dbpedia:Bengal_Presidency.
dbpedia:Jacques_L'enfant …:birthPlace dbpedia:Beauce.
…
dbpedia:Aamir_Zaki a dbpedia-owl:Artist.
dbpedia:Ahmad_Morid a dbpedia-owl:Artist.
…
40.
Get the corresponding fragments
and read the count metadata.
?person a dbpedia-owl:Artist. (±61,000 matches)
?person dbpedia-owl:birthPlace ?city. (±470,000 matches)
?city foaf:name "York"@en. (12 matches)
dbpedia:York foaf:name "York"@en.
dbpedia:York,_Ontario foaf:name "York"@en.
…
dbpedia:Ganesh_Ghosh …:birthPlace dbpedia:Bengal_Presidency.
dbpedia:Jacques_L'enfant …:birthPlace dbpedia:Beauce.
…
dbpedia:Aamir_Zaki a dbpedia-owl:Artist.
dbpedia:Ahmad_Morid a dbpedia-owl:Artist.
…
41.
Start with the smallest fragment.
Start with the first match.
?person a dbpedia-owl:Artist. (±61,000 matches)
?person dbpedia-owl:birthPlace ?city. (±470,000 matches)
?city foaf:name "York"@en. (12 matches)
dbpedia:York foaf:name "York"@en.
dbpedia:York,_Ontario foaf:name "York"@en.
…
dbpedia:Ganesh_Ghosh …:birthPlace dbpedia:Bengal_Presidency.
dbpedia:Jacques_L'enfant …:birthPlace dbpedia:Beauce.
…
dbpedia:Aamir_Zaki
dbpedia:Ahmad_Morid a dbpedia-owl:Artist.
…
42.
How to answer this query using
only triple pattern fragments?
SELECT ?person WHERE {
?person a dbpedia-owl:Artist.
?person dbpedia-owl:birthPlace dbpedia:York.
dbpedia:York foaf:name "York"@en.
}
43.
Get the corresponding fragments
?person a dbpedia-owl:Artist.
?person dbpo:birthPlace dbpedia:York.
dbpedia:John_Flaxman dbpo:birthPlace dbpedia:York.
dbpedia:Joseph_Hansom dbpo:birthPlace dbpedia:York.
…
dbpedia:Aamir_Zaki a dbpedia-owl:Artist.
dbpedia:Ahmad_Morid a dbpedia-owl:Artist.
…
44.
Get the corresponding fragments
and read the count metadata.
?person a dbpedia-owl:Artist. (±61,000 matches)
?person dbpo:birthPlace dbpedia:York. (75 matches)
dbpedia:John_Flaxman dbpo:birthPlace dbpedia:York.
dbpedia:Joseph_Hansom dbpo:birthPlace dbpedia:York.
…
dbpedia:Aamir_Zaki a dbpedia-owl:Artist.
dbpedia:Ahmad_Morid a dbpedia-owl:Artist.
…
45.
Start with the smallest fragment.
Start with the first match.
?person a dbpedia-owl:Artist. (±61,000 matches)
?person dbpo:birthPlace dbpedia:York. (75 matches)
dbpedia:John_Flaxman dbpo:birthPlace dbpedia:York.
dbpedia:Joseph_Hansom dbpo:birthPlace dbpedia:York.
…
dbpedia:Aamir_Zaki
dbpedia:Ahmad_Morid a dbpedia-owl:Artist.
…
46.
How to answer this query using
only triple pattern fragments?
ASK {
dbp:John_Flaxman a dbpo:Artist.
dbp:John_Flaxman dbpo:birthPlace dbp:York.
dbp:York foaf:name "York"@en.
}
47.
Get the corresponding fragment
and read the count metadata.
dbpedia:John_Flaxman a dbpedia-owl:Artist. (1 match)
dbpedia:John_Flaxman a dbpedia-owl:Artist.
Output the match:
?person = dbpedia:John_Flaxman
?city = dbpedia:York
48.
Recursively repeat the process
for all bindings.
?person dbpo:birthPlace dbpedia:York.
dbpedia:John_Flaxman dbpo:birthPlace dbpedia:York.
dbpedia:Joseph_Hansom dbpo:birthPlace dbpedia:York.
…
?city foaf:name "York"@en.
dbpedia:York foaf:name "York"@en.
dbpedia:York,_Ontario foaf:name "York"@en.
…
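The walkthrough above (fetch the fragments, read the counts, start with the smallest, bind, recurse) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the actual client: `fragment` stands in for an HTTP request to a triple pattern fragment server, `DATASET` is a toy sample of the triples shown on the slides, `a` is spelled out as `rdf:type`, and all function names are made up.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[str, str, str]  # terms starting with "?" are variables
Bindings = Dict[str, str]

DATASET: List[Triple] = [
    ("dbpedia:John_Flaxman", "rdf:type", "dbpedia-owl:Artist"),
    ("dbpedia:Aamir_Zaki", "rdf:type", "dbpedia-owl:Artist"),
    ("dbpedia:Ahmad_Morid", "rdf:type", "dbpedia-owl:Artist"),
    ("dbpedia:John_Flaxman", "dbpedia-owl:birthPlace", "dbpedia:York"),
    ("dbpedia:Joseph_Hansom", "dbpedia-owl:birthPlace", "dbpedia:York"),
    ("dbpedia:Ganesh_Ghosh", "dbpedia-owl:birthPlace", "dbpedia:Bengal_Presidency"),
    ("dbpedia:York", "foaf:name", '"York"@en'),
    ("dbpedia:York,_Ontario", "foaf:name", '"York"@en'),
]


def is_var(term: str) -> bool:
    return term.startswith("?")


def fragment(pattern: Pattern) -> List[Triple]:
    """Stand-in for one request to a triple pattern fragment server."""
    return [triple for triple in DATASET
            if all(is_var(want) or want == have
                   for want, have in zip(pattern, triple))]


def substitute(pattern: Pattern, bindings: Bindings) -> Pattern:
    s, p, o = (bindings.get(term, term) for term in pattern)
    return (s, p, o)


def solve(patterns: List[Pattern],
          bindings: Optional[Bindings] = None) -> Iterator[Bindings]:
    bindings = bindings or {}
    if not patterns:
        yield bindings  # every pattern is satisfied: output the match
        return
    bound = [substitute(pattern, bindings) for pattern in patterns]
    # A real client would read the count from the fragment's metadata
    # instead of fetching and counting the data itself.
    smallest = min(bound, key=lambda pattern: len(fragment(pattern)))
    rest = [pattern for pattern in bound if pattern is not smallest]
    for triple in fragment(smallest):
        extended = dict(bindings)
        consistent = True
        for term, value in zip(smallest, triple):
            # setdefault also catches a variable used twice in one pattern.
            if is_var(term) and extended.setdefault(term, value) != value:
                consistent = False
        if consistent:
            yield from solve(rest, extended)  # recurse for this binding


QUERY: List[Pattern] = [
    ("?person", "rdf:type", "dbpedia-owl:Artist"),
    ("?person", "dbpedia-owl:birthPlace", "?city"),
    ("?city", "foaf:name", '"York"@en'),
]
results = list(solve(QUERY))
print(results)
```

On this toy data the only solution binds `?person` to `dbpedia:John_Flaxman` and `?city` to `dbpedia:York`. Because `solve` is a generator, the first match is available before the remaining bindings have been explored.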
49.
This way of querying
changes the usual assumptions.
Use the Web’s protocol HTTP.
Don’t be smart; enable intelligence.
Some queries will be hard / slow.
50.
Querying semantic datasources
means managing expectations.
Diagram: a spectrum from data dump to SPARQL endpoint, with dereferencing and triple pattern fragments in between.
data dump end: high client effort, high availability, low freshness / speed
SPARQL endpoint end: high server effort, low availability, high freshness / speed
51.
The Semantic Web’s assumptions
Client-side query execution
New query opportunities
Querying data on the Web:
client or server?
52.
Coupling access and processing
leads to low availability.
Diagram: one SPARQL server serving many clients.
(a) SPARQL endpoints perform all processing on the server, leading to fast
query execution with low data bandwidth, and a rapidly overloaded server.
53.
Enabling clients to query
leads to high scalability.
Diagram: one LDF server serving many clients.
(b) LDF servers only support simple requests and can thus handle far higher
loads. Clients perform the querying, so they need more (cacheable) data.
54.
Show a sorted list of molecules
that match certain characteristics.
Molecules …
endpoint approach
fragment approach
55.
Show a sorted list of molecules
that match certain characteristics.
endpoint approach
Diagram: the Molecules application and a SPARQL endpoint.
56.
Show a sorted list of molecules
that match certain characteristics.
endpoint approach
SELECT DISTINCT(?mol) MIN(?name)
WHERE {
  ?mol rdfs:label ?name;
  …
  …
}
ORDER BY ?name
57.
endpoint
approach
Show a sorted list of molecules
that match certain characteristics.
SELECT DISTINCT(?mol) MIN(?name)
WHERE {
?mol rdfs:label ?name;
…
…
}
ORDER BY ?name
58.
Show a sorted list of molecules
that match certain characteristics.
endpoint approach
Consequences:
DISTINCT: keep all results in memory
MIN: keep all results in memory, blocking
ORDER BY: keep all results in memory, blocking
Doesn’t matter; we’re waiting anyway.
59.
fragments
approach
No blocking operators; streaming matters.
Show a sorted list of molecules
that match certain characteristics.
SELECT ?mol ?name
WHERE {
?mol rdfs:label ?name;
…
…
}
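Why blocking operators matter on the client can be shown with a small sketch. The rows and all names here (`ROWS`, `blocking_order_by`, `streaming`, `rows_pulled_before_first_output`) are made up for illustration; the point is only that a sort has to consume its whole input before it can emit anything, while a streaming operator passes each row on as it arrives.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Row = Tuple[str, str]  # (molecule, name); hypothetical sample rows

ROWS: List[Row] = [
    ("mol:c", "methane"),
    ("mol:a", "ethanol"),
    ("mol:b", "benzene"),
]


def blocking_order_by(rows: Iterable[Row]) -> Iterator[Row]:
    """Sorts by name; must read every row before yielding the first."""
    yield from sorted(rows, key=lambda row: row[1])


def streaming(rows: Iterable[Row]) -> Iterator[Row]:
    """Passes each row through as soon as it arrives."""
    for row in rows:
        yield row


def rows_pulled_before_first_output(
        operator: Callable[[Iterable[Row]], Iterator[Row]]) -> int:
    pulled = 0

    def counted() -> Iterator[Row]:
        nonlocal pulled
        for row in ROWS:
            pulled += 1
            yield row

    next(operator(counted()))  # ask for the first output row only
    return pulled


print(rows_pulled_before_first_output(streaming))          # 1
print(rows_pulled_before_first_output(blocking_order_by))  # 3
```

When results trickle in over many HTTP requests, the streaming version lets the application show the first molecule right away; sorting can then be done incrementally in the user interface if it is needed.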
60.
Show a sorted list of molecules
that match certain characteristics.
fragments approach
Diagram labels: Molecules, Molecules, Molecules.
61.
The algorithm remains the same
when clients use one or multiple
triple pattern fragment servers.
Federation also becomes
substantially easier.
Avoid the unavailability cascade.
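One way to read "the algorithm remains the same" is that the client only swaps out how it obtains a fragment. The sketch below assumes that a federated fragment is the concatenation of the per-server fragments and that its count is the sum of the per-server counts; this is an interpretation, not something the slides spell out. The server URLs, `SERVERS`, `fragment`, and `federated_fragment` are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]  # None = wildcard

# Two hypothetical servers, each holding part of the data.
SERVERS: Dict[str, List[Triple]] = {
    "http://a.example/": [
        ("dbpedia:John_Flaxman", "dbpedia-owl:birthPlace", "dbpedia:York"),
    ],
    "http://b.example/": [
        ("dbpedia:Joseph_Hansom", "dbpedia-owl:birthPlace", "dbpedia:York"),
        ("dbpedia:York", "foaf:name", '"York"@en'),
    ],
}


def fragment(server: str, pattern: Pattern) -> Tuple[List[Triple], int]:
    """Stand-in for one HTTP request: matching triples plus count metadata."""
    matches = [triple for triple in SERVERS[server]
               if all(want is None or want == have
                      for want, have in zip(pattern, triple))]
    return matches, len(matches)


def federated_fragment(servers: List[str],
                       pattern: Pattern) -> Tuple[List[Triple], int]:
    """Ask every server for the same pattern and merge the answers."""
    data: List[Triple] = []
    total = 0
    for server in servers:
        triples, count = fragment(server, pattern)
        data.extend(triples)
        total += count
    return data, total


born_in_york, count = federated_fragment(
    list(SERVERS), (None, "dbpedia-owl:birthPlace", "dbpedia:York"))
print(count, born_in_york)
```

A query algorithm that only needs "fragment data plus a count" can use `federated_fragment` in place of a single-server request without further changes.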
62.
An optimal solution doesn’t exist.
We should look at all APIs.
Diagram: data dump, dereferencing, triple pattern fragments, SPARQL endpoint.
63.
Servers indicate what they do,
enabling clients to query optimally.
“This server supports triple patterns
and full-text search on objects.”
“This server supports SPARQL queries
with up to 2 joins.”
“This server supports Linked Data documents.”
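A client could turn such self-descriptions into a choice of strategy. The following is a hedged sketch only: the `Capabilities` record, its field names, and the preference order inside `plan` are assumptions made for illustration, not a format or policy defined in these slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capabilities:
    """Hypothetical machine-readable form of what a server says it supports."""
    triple_patterns: bool = False
    full_text_search: bool = False
    max_sparql_joins: int = -1  # -1 means no SPARQL support
    linked_data_documents: bool = False


def plan(server: Capabilities, joins_in_query: int,
         needs_text_search: bool = False) -> str:
    """Pick a strategy; the preference order is an assumption of this sketch."""
    if server.max_sparql_joins >= joins_in_query:
        return "send the whole query as SPARQL"
    if server.triple_patterns:
        if needs_text_search and server.full_text_search:
            return "use triple patterns with server-side text search"
        return "split into triple patterns and join on the client"
    if server.linked_data_documents:
        return "dereference documents and follow links"
    return "no supported interface"


# The three example servers from the slide, encoded by hand.
tpf_server = Capabilities(triple_patterns=True, full_text_search=True)
small_sparql_server = Capabilities(max_sparql_joins=2)
document_server = Capabilities(linked_data_documents=True)

for server in (tpf_server, small_sparql_server, document_server):
    print(plan(server, joins_in_query=2))
```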
64.
The Semantic Web’s assumptions
Client-side query execution
New query opportunities
Querying data on the Web:
client or server?
65.
Different assumptions
lead to different trade-offs.
Live querying of public data
is possible at low cost,
but at slower speeds…
…for now :-)
66.
Let your browser
solve a SPARQL query:
client.linkeddatafragments.org
Ruben Verborgh
Ghent University – iMinds