Initial Usage Analysis of DBpedia's Triple Pattern Fragments
1. Initial Usage Analysis
of DBpedia's
Triple Pattern Fragments
Ruben Verborgh,
Erik Mannens & Rik Van de Walle
3. What role will the Semantic Web play
for the future generation?
Will it be remotely as important
as the Web now is to us?
4. There used to be no applications
because there was no data.
Linked Data more or less solved
this chicken-and-egg problem.
5. There are no applications
because data is not queryable.
SPARQL endpoints are unreliable.
Data dumps are not live.
6. We analyzed DBpedia’s low-cost
Triple Pattern Fragments interface
between Nov 2014 and Feb 2015.
Over 4M requests were made.
There was 1 minute of downtime.
7. Web interfaces to triples
Four months of fragments
Extending the analysis
9. Web interfaces act as gateways
between clients and databases.
[Diagram: Client ↔ Web interface ↔ Database]
The interface hides the database schema.
The interface restricts the kind of queries.
10. No sane Web developer or admin
would give direct database access.
[Diagram: Database, Web interface, Client]
The client must know the database schema.
The client can ask any query.
11. SPARQL endpoints happily give
direct access to the database.
[Diagram: Client ↔ SPARQL protocol ↔ Triple store]
The client must know the database schema.
The client can ask any query.
12. SPARQL interfaces are expensive,
so we have an availability problem.
There are few SPARQL endpoints
because hosting them is expensive.
Many of the endpoints that exist
suffer from low availability.
You already give data for free.
Do you have to pay for query time as well?
13. Data dumps allow you to set up
your own private SPARQL endpoint.
But then we no longer query the Web…
No usage statistics whatsoever.
Not everybody can do this:
mobile devices, non-technical users, …
14. A Triple Pattern Fragments interface
acts as a gateway to an RDF source.
[Diagram: Client ↔ TPF interface ↔ RDF source]
The interface hides the database schema.
The interface restricts the kind of queries.
15. A Triple Pattern Fragments interface
acts as a gateway to an RDF source.
Clients can only ask ?s ?p ?o patterns.
Complex SPARQL queries are decomposed
on the client side.
Low server cost, highly cacheable,
but higher bandwidth and query time.
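
The division of labor on this slide can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the actual TPF server or client: the in-memory `TRIPLES` list, the `fragment` function, and the `solve` function are hypothetical names, and the data is a made-up sample using DBpedia-style prefixes. The "server" only answers single triple patterns; the "client" joins several patterns itself by issuing one request per partially bound pattern. Real TPF responses are also paged and carry count metadata that clients can use to pick a good pattern order, which this sketch omits.

```python
# Toy data standing in for an RDF source (illustrative sample only).
TRIPLES = [
    ("dbr:Ghent", "rdf:type", "dbo:City"),
    ("dbr:Ghent", "dbo:country", "dbr:Belgium"),
    ("dbr:Paris", "rdf:type", "dbo:City"),
    ("dbr:Paris", "dbo:country", "dbr:France"),
    ("dbr:Belgium", "rdf:type", "dbo:Country"),
]

def fragment(s=None, p=None, o=None):
    """'Server' side: the only supported query is one triple pattern.
    None acts as a wildcard for that position."""
    return [
        triple for triple in TRIPLES
        if all(want is None or want == got
               for want, got in zip((s, p, o), triple))
    ]

def is_var(term):
    """Variables are written SPARQL-style, e.g. '?city'."""
    return term.startswith("?")

def solve(patterns, bindings=None):
    """'Client' side: evaluate a list of triple patterns by requesting
    one fragment per pattern and joining the results locally."""
    bindings = bindings or {}
    if not patterns:
        yield dict(bindings)
        return
    first, rest = patterns[0], patterns[1:]
    # Fill in variables that earlier patterns already bound.
    bound = [bindings.get(term, term) for term in first]
    # Remaining variables become wildcards in the request.
    request = [None if is_var(term) else term for term in bound]
    for triple in fragment(*request):
        extended = dict(bindings)
        consistent = True
        for term, value in zip(bound, triple):
            # setdefault also catches a variable repeated within one pattern.
            if is_var(term) and extended.setdefault(term, value) != value:
                consistent = False
                break
        if consistent:
            yield from solve(rest, extended)

# "Which cities are in Belgium?" as two triple patterns.
query = [
    ("?city", "rdf:type", "dbo:City"),
    ("?city", "dbo:country", "dbr:Belgium"),
]
results = list(solve(query))
print(results)
```

The trade-off from the slide is visible here: `fragment` is trivial to serve, but `solve` makes one request per candidate binding, which is where the extra bandwidth and query time come from.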
18. Web interfaces to triples
Four months of fragments
Extending the analysis
19. In mid-October 2014, we started
an official TPF interface for DBpedia.
Will this interface be used?
How will clients use it?
Will the availability be sufficient
for live application usage?
20. The server is deployed virtually,
availability monitored externally.
Amazon Elastic Compute Cloud
c3.2xlarge machine (8 CPU, 15GB RAM)
Compressed HDT format as backend
Pingdom for analysis
26. A quarter of all requests was cached
(but we could cache everything).
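
One way to see why every response could in principle be cached: each triple pattern maps to a single GET URL, so identical patterns produce identical URLs that any HTTP cache can serve. The sketch below assumes a hypothetical base URL and the query parameter names `subject`, `predicate`, and `object`; a real client should discover the URL template from the hypermedia controls in the server's response rather than hard-code it.

```python
from urllib.parse import urlencode

def fragment_url(base, subject=None, predicate=None, obj=None):
    """Build the URL of one fragment. Parameter names are an assumption;
    unbound positions are simply left out."""
    candidates = (("subject", subject), ("predicate", predicate), ("object", obj))
    params = {name: value for name, value in candidates if value is not None}
    return f"{base}?{urlencode(params)}" if params else base

BASE = "http://example.org/fragments"  # hypothetical endpoint
first = fragment_url(BASE, predicate="http://dbpedia.org/ontology/birthPlace")
second = fragment_url(BASE, predicate="http://dbpedia.org/ontology/birthPlace")

# Two clients asking the same pattern request the same resource,
# so the second request can be answered from a cache.
print(first == second)
```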
27. During four months,
the API had 99.9994% availability.
We deeply apologize for
that one minute of downtime
in November.
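
The availability figure can be checked with simple arithmetic. The sketch assumes the measurement window runs from 1 November 2014 through 28 February 2015 (the slides only say "between Nov 2014 and Feb 2015") and that the downtime was exactly one minute.

```python
from datetime import date

window_start = date(2014, 11, 1)  # assumed first day of the analysis
window_end = date(2015, 3, 1)     # assumed end of the window, exclusive

total_minutes = (window_end - window_start).days * 24 * 60
downtime_minutes = 1

availability = 100 * (1 - downtime_minutes / total_minutes)
print(f"{total_minutes} minutes in window, {availability:.4f}% available")
```

Under these assumptions the window is 120 days, or 172,800 minutes, and one minute of downtime gives 99.9994%.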
28. Web interfaces to triples
Four months of fragments
Extending the analysis
29. We don’t know exactly
which clients executed queries.
Was the TPF client used standalone?
As a library of another application?
Also hard for SPARQL endpoints.
30. The analysis did not give insight
into which queries clients executed.
Good for privacy!
We can try reconstructing SPARQL queries,
but maybe clients did something else.
We only know with SPARQL endpoints,
not with data dumps or LD documents.
31. We could learn from the human Web:
can clients give explicit feedback?
“This is the query I executed.
It took me 5 seconds.”
Potential source of insights,
but clients need an incentive.
Will this be representative/truthful?
32. Web interfaces to triples
Four months of fragments
Extending the analysis
33. We have a >99.999% available API
to the most popular RDF data source.
No more excuses not to build apps.
So where are they?
Is something else holding us back?
34. We need to think differently
about how to build Linked Data apps.
The paradigm of querying a database
and waiting for the results
does not scale to the Web.
Live data requires new interfaces
and new visualizations.
35. We need developers to build bridges
from data to end users.
Now that the chicken-and-egg problem
and the availability problems are solved,
we need to tackle fundamental questions.
Where are the killer apps
the next generation is waiting for?