The future

is federated
Ruben Verborgh
I think
Big Data
is boring.
Big Data thrives

on centralization.
Knowledge

is inherently distributed.
Knowledge

is inherently heterogeneous.
Knowledge on the Web

is inherently linked.
Centralization skips

the most interesting problems.
Where to find
the data you need?
How to access them?
How to integrate them?
Let’s create smart apps

over VIVO and Web data.
You’ll get to see 3 things:
a light interface to VIVO data
queries over that interface
an app built on such queries
We can integrate

multiple data sources

on the live Web,
but we need to set

our expectations right.


The future

is federated
Big Data fails at Web scale
Light interfaces rule
Engineer for serendipity


RDF: THE DATA LANGUAGE
<subject> <predicate> <object>.
triple
SPARQL: THE QUERY LANGUAGE
SPARQL: THE PROTOCOL
[Diagram: a client sends a SPARQL query to a SPARQL endpoint over the SPARQL protocol]
SELECT ?person ?name WHERE {
?person a dbo:Scientist.
?person rdfs:label ?name.
?person dbo:birthPlace dbp:Denver.
}
Hey, SPARQL endpoint…
Sure!
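To make the client/endpoint exchange concrete, here is a minimal sketch of how a client could encode the query above as a SPARQL protocol GET request. The endpoint URL and the prefix bindings are assumptions for illustration (the slides do not spell them out), and the sketch only builds the URL without sending it.

```python
from urllib.parse import urlencode

# Assumed endpoint URL; the slides do not name one.
ENDPOINT = "http://dbpedia.org/sparql"

# The query from the slide. The PREFIX bindings are assumptions,
# chosen so that dbp:Denver resolves to a resource URI.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/resource/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?person ?name WHERE {
  ?person a dbo:Scientist.
  ?person rdfs:label ?name.
  ?person dbo:birthPlace dbp:Denver.
}
"""


def build_sparql_get_url(endpoint: str, query: str) -> str:
    """Encode a query in the 'query' parameter, as the SPARQL protocol's GET binding expects."""
    return endpoint + "?" + urlencode({"query": query})


url = build_sparql_get_url(ENDPOINT, QUERY)
```

The whole query travels to the server in one request, so the server carries the full cost of evaluating it.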
SELECT DISTINCT ?drug ?drug1 ?drug2 ?drug3 ?drug4 ?d1 WHERE {
?drug1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/drugCategory> <http://www4.wiwiss.fu-berlin.de/drugban
?drug2 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/drugCategory> <http://www4.wiwiss.fu-berlin.de/drugban
?drug3 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/drugCategory> <http://www4.wiwiss.fu-berlin.de/drugban
?drug4 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/drugCategory> <http://www4.wiwiss.fu-berlin.de/drugban
?drug1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/target> ?o1 .
?o1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/genbankIdGene> ?g1 .
?o1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/locus> ?l1 .
?o1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/molecularWeight> ?mw1 .
?o1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/hprdId> ?hp1 .
?o1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/swissprotName> ?sn1 .
?o1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/proteinSequence> ?ps1 .
?o1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/generalReference> ?gr1 .
?drug <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/target>?o1 .
?drug2 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/target> ?o2 .
?o1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/genbankIdGene> ?g2 .
?o2 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/locus> ?l2 .
?o2 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/molecularWeight> ?mw2 .
?o2 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/hprdId> ?hp2 .
?o2 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/swissprotName> ?sn2 .
?o2 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/proteinSequence> ?ps2 .
?o2 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/generalReference> ?gr2 .
?drug <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/target>?o2 .
Hey, SPARQL endpoint…
Sure!
SPARQL endpoints

try to be the Web’s

Big Data processors,
for free.
few endpoints exist
the average endpoint is

down for 1.5 days/month
Can I SPARQL

your endpoint?
Big Data fails

at Web scale
because Web scale

is much bigger.
SEMANTIC WEB
SHOULDN’T TRY TO COMPETE WITH
BIG DATA
I WANT TO PUT THE
WEB
BACK INTO SEMANTIC WEB
IT’S OUR MAIN DIFFERENTIATOR

FROM BIG DATA
IF IT’S NOT
WEB,
I’M NOT INTERESTED
That’s why I think

Big Data is boring.


The future

is federated
Big Data fails at Web scale
Light interfaces rule
Engineer for serendipity
What would the
AVERAGE HUMAN
do?
SELECT ?person ?name WHERE {
?person a dbo:Scientist.
?person rdfs:label ?name.
?person dbo:birthPlace dbp:Denver.
}
AVERAGE

HUMAN
Which scientists were born in Denver?
You can use only Wikipedia.
1. visit the page about Denver
2. make a list of people born there
3. read their pages to see if they’re a scientist
WEB LINKING

IS UNIDIRECTIONAL
a Denver person’s page links to Denver
Denver doesn’t necessarily link to that person
We need to empower the
AVERAGE

HUMAN,
but please not with a SPARQL endpoint,

because those are so expensive to keep up.
WHAT IS THE
SIMPLEST

COMPLEXITY?
THE ESSENCE

OF RDF
<subject> <predicate> <object>.
THE ESSENCE

OF LINKED DATA
?subject <predicate> <object>.
THE ESSENCE

OF LINKED DATA
Denver <predicate> <object>.
THE ESSENCE

OF TPF
?subject ?predicate ?object.
THE ESSENCE

OF TPF
?subject ?predicate Denver.
TRIPLE

PATTERN

FRAGMENTS
Clients can ask

the server only

for triple patterns.
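As a rough illustration of how little a triple-pattern interface has to do, here is a minimal in-memory sketch. The dataset and the `match_pattern` helper are hypothetical; a real TPF server answers the same kind of request over HTTP, with paging and metadata.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]

# Tiny made-up dataset; the names are illustrative, not real DBpedia data.
TRIPLES = [
    ("Alice", "birthPlace", "Denver"),
    ("Alice", "type", "Scientist"),
    ("Bob", "birthPlace", "Denver"),
    ("Bob", "type", "Musician"),
    ("Carol", "birthPlace", "Boston"),
]


def match_pattern(
    triples: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield every triple that fits the pattern; None plays the role of a variable."""
    for triple in triples:
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


# The pattern "?subject birthPlace Denver."
born_in_denver = list(match_pattern(TRIPLES, p="birthPlace", o="Denver"))
```

The server only filters on up to three fixed terms; joins, ordering, and everything else are left to the client.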
AVERAGE

HUMAN
Which scientists were born in Denver?
You can only use a TPF interface of DBpedia.
1. “?person birthPlace Denver.”
2. “?person type Scientist.”
3. “?person fullName ?name.”
AVERAGE

MACHINE
1. “?person birthPlace Denver.”
2. “?person type Scientist.”
3. “?person fullName ?name.”
You can only use a TPF interface of DBpedia.
SELECT ?person ?name WHERE {
?person a dbo:Scientist.
?person rdfs:label ?name.
?person dbo:birthPlace dbp:Denver.
}
AVERAGE

MACHINE
You can only use a TPF interface of DBpedia.
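The three steps can be sketched as a client-side join: the machine asks for one triple pattern at a time and fills in the variables it has already bound. This is a simplified sketch under stated assumptions: `lookup` stands in for a request to a TPF interface, the data is made up, and real clients would additionally deal with paging and pattern ordering.

```python
# Made-up data standing in for what a TPF interface would serve.
TRIPLES = [
    ("Alice", "birthPlace", "Denver"),
    ("Alice", "type", "Scientist"),
    ("Alice", "fullName", "Alice Example"),
    ("Bob", "birthPlace", "Denver"),
    ("Bob", "type", "Musician"),
    ("Bob", "fullName", "Bob Example"),
    ("Carol", "birthPlace", "Boston"),
    ("Carol", "type", "Scientist"),
]


def lookup(s=None, p=None, o=None):
    """Hypothetical stand-in for one triple-pattern request (None = variable)."""
    return [t for t in TRIPLES
            if all(w is None or w == g for w, g in zip((s, p, o), t))]


def is_var(term):
    return term.startswith("?")


def evaluate(patterns, binding=None):
    """Yield bindings that satisfy all patterns, one pattern request at a time."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    # Fill in variables we already know; unknown ones stay as "?name".
    bound = [binding.get(t, t) if is_var(t) else t for t in first]
    request = [None if is_var(t) else t for t in bound]
    for triple in lookup(*request):
        new = dict(binding)
        consistent = True
        for term, value in zip(bound, triple):
            if is_var(term):
                # A variable repeated within one pattern must match one value.
                if new.get(term, value) != value:
                    consistent = False
                    break
                new[term] = value
        if consistent:
            yield from evaluate(rest, new)


QUERY = [
    ("?person", "birthPlace", "Denver"),
    ("?person", "type", "Scientist"),
    ("?person", "fullName", "?name"),
]
results = list(evaluate(QUERY))
```

With this data, only Alice survives all three patterns; Bob is dropped at the second step and Carol never appears because she was not born in Denver.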


The future

is federated
Big Data fails at Web scale
Light interfaces rule
Engineer for serendipity
Engineer for serendipity.
—Roy T. Fielding
If 1 endpoint is down

for 1.5 days each month,
then 2 endpoints might be down

for 3 days each month.
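The arithmetic behind this estimate can be checked with a small sketch. It assumes a 30-day month and, for the second figure, that endpoint outages are independent; both are assumptions, not claims from the talk. The 3-day figure is the worst case, where the two outages never overlap.

```python
DAYS_PER_MONTH = 30  # assumption: a 30-day month

downtime_one = 1.5  # days per month, the figure quoted on the slide
availability_one = 1 - downtime_one / DAYS_PER_MONTH  # 0.95


def expected_downtime(n_endpoints, availability, days=DAYS_PER_MONTH):
    """Days per month on which at least one of n independent endpoints is down."""
    return days * (1 - availability ** n_endpoints)


# Worst case: outages never overlap, so the downtimes simply add up.
worst_case_two = 2 * downtime_one

# Independent outages: slightly less, because outages sometimes coincide.
independent_two = expected_downtime(2, availability_one)
```

Either way, a federated query that needs every endpoint to be up becomes less available with each endpoint it touches.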
Federated queries with

SPARQL endpoints

pose a problem.
Just ask each of the questions

to different TPF servers.
Federated queries are

native to TPF clients.
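One way to picture why federation comes naturally to such a client: since every question is a single triple pattern, the client can send the same pattern to several servers and merge the answers. The sketch below uses in-memory lists as stand-ins for servers; the source names and data are hypothetical.

```python
from itertools import chain

# Hypothetical sources: each list stands in for one TPF server.
SOURCES = {
    "vivo-example": [
        ("Alice", "worksAt", "ExampleUniversity"),
        ("Alice", "type", "Scientist"),
    ],
    "dbpedia-example": [
        ("Alice", "birthPlace", "Denver"),
        ("Bob", "birthPlace", "Denver"),
    ],
}


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None plays the role of a variable."""
    return [t for t in triples
            if all(w is None or w == g for w, g in zip((s, p, o), t))]


def federated_lookup(sources, s=None, p=None, o=None):
    """Ask every source for the same pattern and concatenate the answers."""
    return list(chain.from_iterable(match(ts, s, p, o) for ts in sources.values()))


# Everything known about Alice, regardless of which server holds it.
about_alice = federated_lookup(SOURCES, s="Alice")
```

The client code that joins patterns does not need to change; only the lookup step fans out to more than one server.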
TPF trades server cost

for query performance.
But in federated scenarios,

performance can be on par

with SPARQL endpoints!
TPF is not the final solution
—no API will ever be—
but an excellent starting point.
Lightweight interfaces

are easy to extend
and combine with others.
The Memento protocol

brings time to the Web.
Ask for representations at
a certain point in the past.
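In Memento (RFC 7089), a client expresses the desired point in time with an `Accept-Datetime` request header. The sketch below only builds that header; the helper name and the chosen date are illustrative assumptions, and no request is sent.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def memento_headers(moment: datetime) -> dict:
    """Build the Accept-Datetime header for a timezone-aware moment, in HTTP date format."""
    as_utc = moment.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(as_utc, usegmt=True)}


# Illustrative date: ask for the state of a resource as of 1 March 2014.
headers = memento_headers(datetime(2014, 3, 1, tzinfo=timezone.utc))
```

A client would send these headers to a TimeGate for the resource, which then redirects to the archived version closest to the requested time.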
TPF and Memento

are a great match.
We combined them in collaboration

with Herbert Van de Sompel & team

at the Los Alamos National Laboratory.


The future

is federated
Big Data fails at Web scale
Light interfaces rule
Engineer for serendipity
VIVO today
[Diagram: a client queries VIVO through SPARQL]
VIVO tomorrow?
[Diagram: a client queries VIVO through a TPF server]
Federation with the TPF interface
is a game changer.
With great
power
comes great
responsibility
We need
to be
realistic
about our
expectations
Some queries will

always be hard

on an open Web.
You might need centralization

if you want answers fast.
Many more queries

than you’d think

are pretty fast…*
…and streaming!
*Terms and conditions apply.
OPEN SOURCE
linkeddatafragments.org
@RubenVerborgh


The future

is federated

and it

starts today