The Future is Federated

Ruben Verborgh

Big Data thrives 
on centralization.

Knowledge 
is inherently distributed.

Knowledge 
is inherently heterogeneous.

Knowledge on the Web 
is inherently linked.

Centralization skips
the most interesting problems.

Where to find
the data you need?
How to access them?
How to integrate them?

Let’s create smart apps 
over VIVO and Web data.

You’ll get to see 3 things:
a light interface to VIVO data
queries over that interface
an app built on such queries

We can integrate 
multiple data sources 
on the live Web,
but we need to set 
our expectations right.

The future 
is federated
Big Data fails at Web scale
Light interfaces rule
Engineer for serendipity

A triple:
<subject> <predicate> <object>.

A client sends a SPARQL query
to a SPARQL endpoint
over the SPARQL protocol.

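PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?person ?name WHERE {
?person a dbo:Scientist.
?person rdfs:label ?name.
?person dbo:birthPlace dbr:Denver.
}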
Hey, SPARQL endpoint…
Sure!

SELECT DISTINCT ?drug ?drug1 ?drug2 ?drug3 ?drug4 ?d1 WHERE {
?drug1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/drugCategory> <http://www4.wiwiss.fu-berlin.de/drugban
?drug1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/target> ?o1 .
?o1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/genbankIdGene> ?g1 .
?o1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/locus> ?l1 .
?o1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/molecularWeight> ?mw1 .
?o1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/hprdId> ?hp1 .
?o1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/swissprotName> ?sn1 .
?o1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/proteinSequence> ?ps1 .
?o1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/generalReference> ?gr1 .
?drug <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/target>?o1 .
?drug2 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/target> ?o2 .
?o2 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/genbankIdGene> ?g2 .
?o2 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/locus> ?l2 .
?o2 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/molecularWeight> ?mw2 .
?o2 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/hprdId> ?hp2 .
?o2 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/swissprotName> ?sn2 .
?o2 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/proteinSequence> ?ps2 .
?o2 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/generalReference> ?gr2 .
?drug <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/target>?o2 .
Hey, SPARQL endpoint…
Sure!

SPARQL endpoints
try to be the Web’s
Big Data processors,
for free.

Can I SPARQL
your endpoint?
Few endpoints exist.
The average endpoint is
down for 1.5 days/month.

Big Data fails 
at Web scale
because Web scale
is much bigger.

SEMANTIC WEB
SHOULDN’T TRY TO COMPETE WITH
BIG DATA

I WANT TO PUT THE WEB
BACK INTO SEMANTIC WEB.
IT’S OUR MAIN DIFFERENTIATOR
FROM BIG DATA.

IF IT’S NOT WEB,
I’M NOT INTERESTED.
That’s why I think 
Big Data is boring.

What would the
AVERAGE HUMAN
do?

}
AVERAGE 
HUMAN
You can use only Wikipedia.

AVERAGE 
HUMAN
Which scientists were born in Denver?

AVERAGE 
HUMAN
1. visit the page about Denver
2. make a list of people born there
3. read their pages to see if they’re a scientist

WEB LINKING 
IS UNIDIRECTIONAL
a Denver person’s page links to Denver
Denver doesn’t necessarily link to that person

We need to empower the
AVERAGE HUMAN,
but please not with a SPARQL endpoint,
because they’re so expensive to keep up.

WHAT IS THE
SIMPLEST COMPLEXITY?

THE ESSENCE 
OF RDF
<subject> <predicate> <object>.

THE ESSENCE 
OF LINKED DATA
?subject <predicate> <object>.

THE ESSENCE 
OF LINKED DATA
Denver <predicate> <object>.

THE ESSENCE 
OF TPF
?subject ?predicate ?object.

THE ESSENCE 
OF TPF
?subject ?predicate Denver.

TRIPLE 
PATTERN 
FRAGMENTS

Clients can ask 
the server only 
for triple patterns.
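To make the difference with plain Linked Data lookups concrete, here is a minimal Python sketch over a hypothetical in-memory dataset (the `ex:` names, `dereference`, `fragment`, and the `page_size` default are all made up for illustration). A Linked Data lookup can only fix the subject, while a triple pattern fragment lets the client fix any position. Real TPF servers also return metadata, such as an estimated total count, and paging controls; the sketch approximates these with an exact count and list slicing.

```python
from typing import List, NamedTuple, Optional, Tuple

class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str

# Hypothetical toy dataset (not real DBpedia data). Links point from each
# person to Denver, never from Denver back to the person.
DATA: List[Triple] = [
    Triple("ex:alice", "birthPlace", "Denver"),
    Triple("ex:alice", "type", "Scientist"),
    Triple("ex:bob", "birthPlace", "Denver"),
    Triple("Denver", "type", "City"),
]

def dereference(subject: str) -> List[Triple]:
    """Linked Data lookup: the client can only ask about one fixed subject."""
    return [t for t in DATA if t.subject == subject]

def fragment(s: Optional[str] = None, p: Optional[str] = None,
             o: Optional[str] = None, page: int = 0,
             page_size: int = 100) -> Tuple[List[Triple], int]:
    """Triple pattern fragment: any position may be fixed or left open (None).
    Returns one page of matching triples plus the total number of matches."""
    matches = [t for t in DATA
               if (s is None or t.subject == s)
               and (p is None or t.predicate == p)
               and (o is None or t.object == o)]
    start = page * page_size
    return matches[start:start + page_size], len(matches)
```

Dereferencing `Denver` only returns what Denver says about itself, so nobody born there shows up; the pattern `?subject birthPlace Denver` finds both people because it follows the links backwards.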

AVERAGE 
HUMAN
Which scientists were born in Denver?
You can only use a TPF interface of DBpedia.

AVERAGE 
HUMAN
1. “?person birthPlace Denver.”
2. “?person type Scientist.”
3. “?person fullName ?name.”

AVERAGE 
MACHINE
1. “?person birthPlace Denver.”
2. “?person type Scientist.”
3. “?person fullName ?name.”
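The machine’s three steps can be sketched as a nested-loop evaluation in Python: fetch the fragment for the first pattern, plug each binding of `?person` into the next pattern, and repeat. This is a simplified illustration under assumptions, not the actual TPF client algorithm; the toy data, the `ex:` identifiers, and the functions `fragment` and `evaluate` are hypothetical, and each `fragment` call stands in for one HTTP request.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[str, str, str]  # terms starting with "?" are variables

# Hypothetical toy data standing in for a TPF interface of DBpedia.
DATA: List[Triple] = [
    ("ex:alice", "birthPlace", "Denver"),
    ("ex:alice", "type", "Scientist"),
    ("ex:alice", "fullName", "Alice Example"),
    ("ex:bob", "birthPlace", "Denver"),
    ("ex:bob", "type", "Musician"),
    ("ex:bob", "fullName", "Bob Example"),
    ("ex:carol", "birthPlace", "Boston"),
    ("ex:carol", "type", "Scientist"),
]

def fragment(s: Optional[str], p: Optional[str], o: Optional[str]) -> List[Triple]:
    """Toy TPF server: answers exactly one triple pattern (None = unbound)."""
    return [t for t in DATA
            if all(x is None or x == y for x, y in zip((s, p, o), t))]

def is_var(term: str) -> bool:
    return term.startswith("?")

def evaluate(patterns: List[Pattern],
             binding: Optional[Dict[str, str]] = None) -> Iterator[Dict[str, str]]:
    """Nested-loop evaluation: fetch the fragment for the first pattern,
    bind its variables, and substitute them into the remaining patterns."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    # Replace already-bound variables, so the server only sees a triple pattern.
    bound = [binding.get(t, t) for t in patterns[0]]
    request = [None if is_var(t) else t for t in bound]
    for triple in fragment(*request):
        new = dict(binding)
        # A variable occurring twice in one pattern must bind to the same value.
        if all(new.setdefault(t, v) == v for t, v in zip(bound, triple) if is_var(t)):
            yield from evaluate(patterns[1:], new)

QUERY: List[Pattern] = [
    ("?person", "birthPlace", "Denver"),
    ("?person", "type", "Scientist"),
    ("?person", "fullName", "?name"),
]
```

A more careful client would likely use each fragment’s count metadata to decide which pattern to fetch first; this sketch simply follows the listed order.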

}
AVERAGE 
MACHINE

Engineer for serendipity.
—Roy T. Fielding

If 1 endpoint is down
for 1.5 days each month,
then 2 endpoints might be down
for 3 days each month.
Federated queries with 
SPARQL endpoints 
pose a problem.
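The arithmetic behind this slide, as a small Python sketch. Adding up the downtimes gives an upper bound (outages never overlap). If we additionally assume that outages are independent and that a month has 30 days (both assumptions, not measurements), the expected number of days on which at least one of n endpoints is down is 30 * (1 - (1 - 1.5/30)^n).

```python
DAYS_PER_MONTH = 30  # assumption: a 30-day month
DOWN_DAYS = 1.5      # average downtime of one endpoint per month, from the slides

def worst_case_down_days(n: int, down_days: float = DOWN_DAYS) -> float:
    """Upper bound: the outages of different endpoints never overlap."""
    return n * down_days

def expected_down_days(n: int, down_days: float = DOWN_DAYS,
                       days: int = DAYS_PER_MONTH) -> float:
    """Expected number of days on which at least one of n endpoints is down,
    assuming each endpoint is down on any given day with probability
    down_days / days, independently of the others."""
    p_up = 1 - down_days / days
    return days * (1 - p_up ** n)
```

For 2 endpoints that gives at most 3 days and about 2.9 days expected; for 5 endpoints, at most 7.5 days and about 6.8 days expected. Either way, the chance that some endpoint in a federated query is unavailable grows with every endpoint you add.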

Just ask each of the questions 
to different TPF servers.
Federated queries are 
native to TPF clients.
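A minimal Python sketch of why federation comes naturally to such a client: it sends the same triple pattern to every server and merges the answers, and the join logic never needs to know where a triple came from. The two server URLs, their toy contents, and the functions below are hypothetical.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Two hypothetical TPF servers with toy data (the URLs are placeholders).
SERVERS: Dict[str, List[Triple]] = {
    "http://example.org/dbpedia": [
        ("ex:alice", "birthPlace", "Denver"),
        ("ex:bob", "birthPlace", "Denver"),
    ],
    "http://example.org/vivo": [
        ("ex:alice", "type", "Scientist"),
        ("ex:carol", "type", "Scientist"),
    ],
}

def fragment(server: str, s: Optional[str], p: Optional[str],
             o: Optional[str]) -> List[Triple]:
    """One TPF request to one server (None = unbound position)."""
    return [t for t in SERVERS[server]
            if all(x is None or x == y for x, y in zip((s, p, o), t))]

def federated_fragment(s: Optional[str], p: Optional[str],
                       o: Optional[str]) -> List[Triple]:
    """Ask every server the same triple pattern and merge the answers,
    dropping duplicates while keeping their order."""
    merged: List[Triple] = []
    for server in SERVERS:
        merged.extend(fragment(server, s, p, o))
    return list(dict.fromkeys(merged))

def evaluate(patterns: List[Triple],
             binding: Optional[Dict[str, str]] = None) -> Iterator[Dict[str, str]]:
    """Nested-loop evaluation; nothing here is federation-specific except
    the call to federated_fragment."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    bound = [binding.get(t, t) for t in patterns[0]]
    request = [None if t.startswith("?") else t for t in bound]
    for triple in federated_fragment(*request):
        new = dict(binding)
        if all(new.setdefault(t, v) == v
               for t, v in zip(bound, triple) if t.startswith("?")):
            yield from evaluate(patterns[1:], new)

QUERY: List[Triple] = [
    ("?person", "birthPlace", "Denver"),
    ("?person", "type", "Scientist"),
]
```

Here the birthplace comes from one server and the type from the other, yet the query is answered exactly as if there were a single source. Source selection (skipping servers that cannot contribute to a pattern) is left out.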

TPF trades server cost
for query performance.
But in federated scenarios,
performance can be on par
with SPARQL endpoints!

TPF is not the final solution
—no API will ever be—
but an excellent starting point.
Lightweight interfaces 
are easy to extend
and combine with others.

The Memento protocol 
brings time to the Web.
Ask for representations at
a certain point in the past.

TPF and Memento 
are a great match.
We combined them in collaboration 
with Herbert Van de Sompel & team 
at the Los Alamos National Laboratory.
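A sketch of what a time-travelling TPF request could look like from Python, using only the standard library. The `Accept-Datetime` header comes from the Memento protocol (RFC 7089); the fragment URL and its `subject`/`predicate`/`object` query parameters are assumptions for illustration, and the request is only built here, not sent.

```python
import urllib.request
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Optional
from urllib.parse import urlencode

# Hypothetical TPF fragment URL; the parameter names below are an assumption.
TPF_URL = "http://example.org/dbpedia"

def memento_fragment_request(when: datetime,
                             subject: Optional[str] = None,
                             predicate: Optional[str] = None,
                             obj: Optional[str] = None) -> urllib.request.Request:
    """Build (without sending) a request for one triple pattern fragment
    as it was at `when`, using the Memento Accept-Datetime header."""
    pattern = (("subject", subject), ("predicate", predicate), ("object", obj))
    params = {name: value for name, value in pattern if value is not None}
    url = TPF_URL + ("?" + urlencode(params) if params else "")
    # Accept-Datetime uses the HTTP date format, always expressed in GMT.
    accept_datetime = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return urllib.request.Request(url, headers={"Accept-Datetime": accept_datetime})

req = memento_fragment_request(datetime(2010, 1, 1, tzinfo=timezone.utc),
                               predicate="birthPlace", obj="Denver")
```

Passing the request to `urllib.request.urlopen` would send it; a Memento-aware server (a TimeGate) typically answers by redirecting to the archived version closest to the requested date.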

VIVO today:
clients query VIVO through SPARQL.

VIVO tomorrow?
Clients query VIVO through a TPF server.

Federation with the TPF interface
is a game changer.

With great power
comes great responsibility.

We need to be realistic
about our expectations.

Some queries will 
always be hard 
on an open Web.
You might need centralization 
if you want answers fast.
*
*Terms and conditions apply.

Many more queries
than you’d think
are pretty fast…
…and streaming!

OPEN SOURCE
linkeddatafragments.org

@RubenVerborgh

The future
is federated,
and it starts today.
