SlideShare a Scribd company logo
1 of 46
Querying datasets on the Web 
with high availability 
Ruben Verborgh, Olaf Hartig, Ben De Meester, Gerald Haesendonck, 
Laurens De Vocht, Miel Vander Sande, Richard Cyganiak, Pieter Colpaert, 
Erik Mannens, and Rik Van de Walle
Somebody comes to you 
with a high-quality 
Linked Open Data set: 
“How can people query 
my Linked Data on the Web?”
“A public SPARQL endpoint 
gives live querying, but it’s costly 
and has availability issues.” 
“Offer a data dump. 
but it’s not really Web querying: 
users need to set up an endpoint” 
“Publish Linked Data documents. 
But querying is very slow…”
Querying Linked Data 
on the Web 
always involves trade-offs. 
But have we looked 
at all possible trade-offs?
Querying Linked Data 
live on the Web 
becomes affordable 
by building simpler servers 
and more intelligent clients.
We formalized and evaluated 
client-side SPARQL querying 
through a triple-pattern interface. 
The trade-offs are different: 
low-cost, high-availability servers, 
more bandwidth and slower queries.
Querying datasets on the Web 
with high availability 
Trade-offs for the Semantic Web 
Triple pattern fragment querying 
Evaluating the trade-offs
Querying datasets on the Web 
with high availability 
Trade-offs for the Semantic Web 
Triple pattern fragment querying 
Evaluating the trade-offs
It is not an all-or-nothing world. 
There is a spectrum of trade-offs. 
out-of-date data live data 
high bandwidth low bandwidth 
high availability low availability 
high client cost low client cost 
low server cost high server cost 
data 
dump 
SPARQL 
endpoint 
Linked Data 
documents 
interface offered by the server
Linked Data Fragments are 
a uniform view on Linked Data interfaces. 
data 
dump 
SPARQL 
endpoint 
Every Linked Data interface 
offers specific fragments 
of a Linked Data set. 
Linked Data 
documents 
interface offered by the server
Each type of Linked Data Fragment 
is defined by three characteristics. 
data 
metadata 
controls 
What triples does it contain? 
What do we know about it? 
How to access more data?
Each type of Linked Data Fragment 
is defined by three characteristics. 
all dataset triples 
(none) 
data dump 
number of triples, file size 
data 
metadata 
controls
Each type of Linked Data Fragment 
is defined by three characteristics. 
SPARQL query result 
data 
triples matching the query 
(none) 
(none) 
metadata 
controls
Each type of Linked Data Fragment 
is defined by three characteristics. 
Linked Data Document 
data 
triples about a subject 
creator, maintainer, … 
links to other LD documents 
metadata 
controls
We designed a new trade-off mix 
with low cost and high availability. 
out-of-date data live data 
high bandwidth low bandwidth 
high availability low availability 
high client cost low client cost 
low server cost high server cost 
data 
dump 
SPARQL 
query results 
Linked Data 
documents
A triple pattern fragments interface 
is low-cost and enables clients to query. 
live data 
low server cost 
data 
dump 
SPARQL 
query results 
high availability 
Linked Data 
documents 
triple pattern 
fragments
A triple pattern fragments interface 
is low-cost and enables clients to query. 
matches of a triple pattern 
total number of matches 
access to all other fragments 
data 
metadata 
controls 
(paged)
controls (other fragments) 
metadata (total count) 
data (first 100)
Triple patterns are not the final answer. 
No interface ever will be. 
Triple patterns show how far we can get 
with simple servers and smart clients. 
data 
dump 
SPARQL 
query results 
Linked Data 
documents 
triple pattern 
fragments
Querying datasets on the Web 
with high availability 
Trade-offs for the Semantic Web 
Triple pattern fragment querying 
Evaluating the trade-offs
Triple pattern fragment servers 
enable clients to be intelligent. 
The HTML representation explains: 
“you can query by triple pattern”. 
controls
Triple pattern fragment servers 
enable clients to be intelligent. 
<http://fragments.dbpedia.org/2014/en#dataset> hydra:search [ 
hydra:template "http://fragments.dbpedia.org/2014/en 
controls 
{?subject,predicate,object}"; 
hydra:mapping 
[ hydra:variable "subject"; hydra:property rdf:subject ], 
[ hydra:variable "predicate"; hydra:property rdf:predicate ], 
[ hydra:variable "object"; hydra:property rdf:object ] 
]. 
The RDF representation explains: 
“you can query by triple pattern”.
Triple pattern fragment servers 
enable clients to be intelligent. 
The HTML representation explains: 
“this is the number of matches”. 
metadata
Triple pattern fragment servers 
enable clients to be intelligent. 
<#fragment> void:triples 8141. 
The RDF representation explains: 
“this is the number of matches”. 
metadata
How can intelligent clients 
solve SPARQL queries over fragments? 
Give them a SPARQL query. 
Give them a URL of any dataset fragment. 
They look inside the fragment 
to see how to access the dataset 
and use the metadata 
to decide how to plan the query.
Let’s follow the execution 
of an example SPARQL query. 
Find names of artists born in Padua, Italy. 
SELECT ?artist ?name WHERE { 
?artist a dbpedia-owl:Artist; 
rdfs:label ?name; 
dbpedia-owl:birthPlace dbpedia:Padua. 
FILTER LANGMATCHES(LANG(?name), "EN") 
} 
Fragment: http://fragments.dbpedia.org/2014/en
The client looks inside the fragment 
to see how to access the dataset. 
Fragment: http://fragments.dbpedia.org/2014/en 
<http://fragments.dbpedia.org/2014/en#dataset> hydra:search [ 
hydra:template "http://fragments.dbpedia.org/2014/en 
{?subject,predicate,object}"; 
hydra:mapping 
[ hydra:variable "subject"; hydra:property rdf:subject ], 
[ hydra:variable "predicate"; hydra:property rdf:predicate ], 
[ hydra:variable "object"; hydra:property rdf:object ] 
]. 
“I can query the dataset by triple pattern.”
The client splits the query 
into the available fragments. 
SELECT ?artist ?name WHERE { 
?artist a dbpedia-owl:Artist; 
rdfs:label ?name; 
dbpedia-owl:birthPlace dbpedia:Padua. 
FILTER LANGMATCHES(LANG(?name), "EN") 
}
The client gets the fragments 
and inspects their metadata. 
?artist a dbpedia-owl:Artist. 
first 100 triples 
96.000 
?artist rdfs:label ?name. 
first 100 triples 
12.000.000 
?artist dbont:birthPlace dbpedia:Padua. 
first 100 triples 
135
The metadata enables the client 
to choose the right starting point. 
?dbp:artist Alberto_a dbpedia-Benettin owl:a Artist. dbont:Artist. 
96.000 
?artist rdfs:label ?name. 12.000.000 
dbp:Alberto_Benettin rdfs:label ?name. 
?artist dbont:birthPlace dbpedia:Padua. 
135 
dbpedia:Alberto_Benettin dbont:birthPlace dbpedia:Padua. 
dbpedia:Alberto_Bigon dbont:birthPlace dbpedia:Padua.
Clients execute the query in 3 seconds 
on a highly available, low-cost server. 
SELECT ?artist ?name WHERE { 
?artist a dbpedia-owl:Artist; 
rdfs:label ?name; 
dbpedia-owl:birthPlace dbpedia:Padua. 
FILTER LANGMATCHES(LANG(?name), "EN") 
} 
Try it yourself: 
bit.ly/artistspadua
Querying datasets on the Web 
with high availability 
Trade-offs for the Semantic Web 
Triple pattern fragment querying 
Evaluating the trade-offs
We evaluated triple pattern fragments 
for server cost and availability. 
Berlin SPARQL benchmark 
on Amazon EC2 machines 
100 million triples 
high query diversity 
BGP, UNION, FILTER, …
We evaluated triple pattern fragments 
for server cost and availability. 
Berlin SPARQL benchmark 
on Amazon EC2 machines 
1 server 
1 cache 
1–240 simultaneous clients
The query throughput is lower, 
but resilient to high client numbers. 
Querying Datasets on 1 10 100 10 100 1000 10000 clients 
throughput (q/hr) 
Virtuoso 6 Fuseki–tdb triple pattern Fig. 3.1: Server performance (log-log plot) 
200 
executed SPARQL queries per hour 
timeouts
The server traffic is higher, 
but requests are significantly lighter. 
Datasets on the Web with High Availability 13 
Virtuoso 6 Virtuoso 7 
Fuseki–tdb Fuseki–hdt 
pattern fragments 
1 10 100 
4 
2 
0 
clients 
data sent (mb) 
Fig. 3.2: Server network traffic 
data sent by server in MB
2 
Caching is significantly more effective, 
as clients reuse fragments for queries. 
1 10 100 
0 
clients 
sent (mb) 
Fig. 3.2: Server network traffic 
1 10 100 
20 
10 
0 
clients 
sent (mb) 
Fig. 3.4: Cache network traffic 
ram use data sent by cache in MB 
8 
6 
4
150 
The server uses much less CPU, 
allowing for higher availability. 
clients 
1 10 100 
1 1 10 100 
100 
50 
0 
100 
50 
1 50 
server CPU usage per core 
#timeouts 
Fig. 3.3: Query timeouts 
0 
clients 
cpu use (%) 
Fig. 3.5: Server processor usage per core 
100 
use (%)
150 
Servers enable clients to be intelligent, 
so they remain simple and light-weight. 
1 10 100 
1 1 10 100 
100 
50 
0 
clients 
#timeouts 
Fig. 3.3: Query timeouts 
100 
50 
0 
clients 
cpu use (%) 
1 Fig. 3.5: Server processor usage per core 
50 
100 
use (%) 
server CPU usage per core
Querying datasets on the Web 
with high availability 
Trade-offs for the Semantic Web 
Triple pattern fragment querying 
Evaluating the trade-offs
Triple pattern fragments allow 
querying live data with high availability. 
live data 
low server cost 
data 
dump 
SPARQL 
query results 
high availability 
Linked Data 
documents 
triple pattern 
fragments
Like each Linked Data query method, 
triple pattern fragments have trade-offs. 
Live data 
Low-cost and highly available servers 
More bandwidth 
Slower results (but they are streaming)
Experience the trade-offs yourself 
on the official DBpedia interfaces. 
DBpedia data dump 
DBpedia Linked Data documents 
DBpedia SPARQL endpoint 
DBpedia triple pattern fragments 
fragments.dbpedia.org
Triple pattern fragments are easy: 
all software is available as open source. 
Software 
github.com/LinkedDataFragments 
Documentation and specification 
linkeddatafragments.org
Querying is a client–server dialogue. 
Start exploring the entire spectrum. 
data 
dump 
SPARQL 
query results 
Linked Data 
documents 
triple pattern 
fragments
Querying datasets on the Web 
with high availability 
linkeddatafragments.org 
@LDFragments

More Related Content

What's hot

The Digital Cavemen of Linked Lascaux
The Digital Cavemen of Linked LascauxThe Digital Cavemen of Linked Lascaux
The Digital Cavemen of Linked LascauxRuben Verborgh
 
Functional Composition of Sensor Web APIs
Functional Composition of Sensor Web APIsFunctional Composition of Sensor Web APIs
Functional Composition of Sensor Web APIsRuben Verborgh
 
The Lonesome LOD Cloud
The Lonesome LOD CloudThe Lonesome LOD Cloud
The Lonesome LOD CloudRuben Verborgh
 
Distributed Affordance
Distributed AffordanceDistributed Affordance
Distributed AffordanceRuben Verborgh
 
Linking media, data, and services
Linking media, data, and servicesLinking media, data, and services
Linking media, data, and servicesRuben Verborgh
 
The web – A hypermedia story
The web – A hypermedia storyThe web – A hypermedia story
The web – A hypermedia storyRuben Verborgh
 
Hypermedia APIs that make sense
Hypermedia APIs that make senseHypermedia APIs that make sense
Hypermedia APIs that make senseRuben Verborgh
 
Hypermedia Cannot be the Engine
Hypermedia Cannot be the EngineHypermedia Cannot be the Engine
Hypermedia Cannot be the EngineRuben Verborgh
 
Creating 3rd Generation Web APIs with Hydra
Creating 3rd Generation Web APIs with HydraCreating 3rd Generation Web APIs with Hydra
Creating 3rd Generation Web APIs with HydraMarkus Lanthaler
 
Overview of GraphQL & Clients
Overview of GraphQL & ClientsOverview of GraphQL & Clients
Overview of GraphQL & ClientsPokai Chang
 
RESTdesc – Efficient runtime service discovery and consumption
RESTdesc – Efficient runtime service discovery and consumptionRESTdesc – Efficient runtime service discovery and consumption
RESTdesc – Efficient runtime service discovery and consumptionRuben Verborgh
 
CrossRef Technical Information for Libraries
CrossRef Technical Information for LibrariesCrossRef Technical Information for Libraries
CrossRef Technical Information for LibrariesCrossref
 
Synchronicity: Just-In-Time Discovery of Lost Web Pages
Synchronicity: Just-In-Time Discovery of Lost Web PagesSynchronicity: Just-In-Time Discovery of Lost Web Pages
Synchronicity: Just-In-Time Discovery of Lost Web PagesMichael Nelson
 
On the Persistence of Persistent Identifiers of the Scholarly Web
On the Persistence of Persistent Identifiers of the Scholarly WebOn the Persistence of Persistent Identifiers of the Scholarly Web
On the Persistence of Persistent Identifiers of the Scholarly WebMartin Klein
 
Opportunistic Linked Data Querying through Approximate Membership Metadata
Opportunistic Linked Data Querying through Approximate Membership MetadataOpportunistic Linked Data Querying through Approximate Membership Metadata
Opportunistic Linked Data Querying through Approximate Membership MetadataMiel Vander Sande
 
GraphQL & Relay - 串起前後端世界的橋樑
GraphQL & Relay - 串起前後端世界的橋樑GraphQL & Relay - 串起前後端世界的橋樑
GraphQL & Relay - 串起前後端世界的橋樑Pokai Chang
 
Open Standards for the Semantic Web: XML / RDF(S) / OWL / SOAP
Open Standards for the Semantic Web: XML / RDF(S) / OWL / SOAPOpen Standards for the Semantic Web: XML / RDF(S) / OWL / SOAP
Open Standards for the Semantic Web: XML / RDF(S) / OWL / SOAPPieter De Leenheer
 
Log File Analysis: The most powerful tool in your SEO toolkit
Log File Analysis: The most powerful tool in your SEO toolkitLog File Analysis: The most powerful tool in your SEO toolkit
Log File Analysis: The most powerful tool in your SEO toolkitTom Bennet
 

What's hot (20)

The Digital Cavemen of Linked Lascaux
The Digital Cavemen of Linked LascauxThe Digital Cavemen of Linked Lascaux
The Digital Cavemen of Linked Lascaux
 
Functional Composition of Sensor Web APIs
Functional Composition of Sensor Web APIsFunctional Composition of Sensor Web APIs
Functional Composition of Sensor Web APIs
 
The Lonesome LOD Cloud
The Lonesome LOD CloudThe Lonesome LOD Cloud
The Lonesome LOD Cloud
 
Reasoned SPARQL
Reasoned SPARQLReasoned SPARQL
Reasoned SPARQL
 
Distributed Affordance
Distributed AffordanceDistributed Affordance
Distributed Affordance
 
Linking media, data, and services
Linking media, data, and servicesLinking media, data, and services
Linking media, data, and services
 
The web – A hypermedia story
The web – A hypermedia storyThe web – A hypermedia story
The web – A hypermedia story
 
Hypermedia APIs that make sense
Hypermedia APIs that make senseHypermedia APIs that make sense
Hypermedia APIs that make sense
 
Hypermedia Cannot be the Engine
Hypermedia Cannot be the EngineHypermedia Cannot be the Engine
Hypermedia Cannot be the Engine
 
Creating 3rd Generation Web APIs with Hydra
Creating 3rd Generation Web APIs with HydraCreating 3rd Generation Web APIs with Hydra
Creating 3rd Generation Web APIs with Hydra
 
Overview of GraphQL & Clients
Overview of GraphQL & ClientsOverview of GraphQL & Clients
Overview of GraphQL & Clients
 
RESTdesc – Efficient runtime service discovery and consumption
RESTdesc – Efficient runtime service discovery and consumptionRESTdesc – Efficient runtime service discovery and consumption
RESTdesc – Efficient runtime service discovery and consumption
 
CrossRef Technical Information for Libraries
CrossRef Technical Information for LibrariesCrossRef Technical Information for Libraries
CrossRef Technical Information for Libraries
 
Synchronicity: Just-In-Time Discovery of Lost Web Pages
Synchronicity: Just-In-Time Discovery of Lost Web PagesSynchronicity: Just-In-Time Discovery of Lost Web Pages
Synchronicity: Just-In-Time Discovery of Lost Web Pages
 
On the Persistence of Persistent Identifiers of the Scholarly Web
On the Persistence of Persistent Identifiers of the Scholarly WebOn the Persistence of Persistent Identifiers of the Scholarly Web
On the Persistence of Persistent Identifiers of the Scholarly Web
 
Opportunistic Linked Data Querying through Approximate Membership Metadata
Opportunistic Linked Data Querying through Approximate Membership MetadataOpportunistic Linked Data Querying through Approximate Membership Metadata
Opportunistic Linked Data Querying through Approximate Membership Metadata
 
STACK OVERFLOW DATASET ANALYSIS
STACK OVERFLOW DATASET ANALYSISSTACK OVERFLOW DATASET ANALYSIS
STACK OVERFLOW DATASET ANALYSIS
 
GraphQL & Relay - 串起前後端世界的橋樑
GraphQL & Relay - 串起前後端世界的橋樑GraphQL & Relay - 串起前後端世界的橋樑
GraphQL & Relay - 串起前後端世界的橋樑
 
Open Standards for the Semantic Web: XML / RDF(S) / OWL / SOAP
Open Standards for the Semantic Web: XML / RDF(S) / OWL / SOAPOpen Standards for the Semantic Web: XML / RDF(S) / OWL / SOAP
Open Standards for the Semantic Web: XML / RDF(S) / OWL / SOAP
 
Log File Analysis: The most powerful tool in your SEO toolkit
Log File Analysis: The most powerful tool in your SEO toolkitLog File Analysis: The most powerful tool in your SEO toolkit
Log File Analysis: The most powerful tool in your SEO toolkit
 

Similar to Querying datasets on the Web with high availability

Time travelling through DBpedia
Time travelling through DBpediaTime travelling through DBpedia
Time travelling through DBpediaMiel Vander Sande
 
Deep dive into the native multi model database ArangoDB
Deep dive into the native multi model database ArangoDBDeep dive into the native multi model database ArangoDB
Deep dive into the native multi model database ArangoDBArangoDB Database
 
A sweet affordable combo for Linked Data Archives
A sweet affordable combo for Linked Data ArchivesA sweet affordable combo for Linked Data Archives
A sweet affordable combo for Linked Data ArchivesMiel Vander Sande
 
Reproducibility with 
the 99 cents Linked Data archive
Reproducibility with 
the 99 cents Linked Data archiveReproducibility with 
the 99 cents Linked Data archive
Reproducibility with 
the 99 cents Linked Data archiveMiel Vander Sande
 
Hatkit Project - Datafiddler
Hatkit Project - DatafiddlerHatkit Project - Datafiddler
Hatkit Project - Datafiddlerholiman
 
How to govern and secure a Data Mesh?
How to govern and secure a Data Mesh?How to govern and secure a Data Mesh?
How to govern and secure a Data Mesh?confluent
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...confluent
 
What is a distributed data science pipeline. how with apache spark and friends.
What is a distributed data science pipeline. how with apache spark and friends.What is a distributed data science pipeline. how with apache spark and friends.
What is a distributed data science pipeline. how with apache spark and friends.Andy Petrella
 
RDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataRDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataGiorgos Santipantakis
 
Data Science with the Help of Metadata
Data Science with the Help of MetadataData Science with the Help of Metadata
Data Science with the Help of MetadataJim Dowling
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptxAlex Ivy
 
Document Databases & RavenDB
Document Databases & RavenDBDocument Databases & RavenDB
Document Databases & RavenDBBrian Ritchie
 
Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...
Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...
Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...Amazon Web Services
 
Data Architectures for Robust Decision Making
Data Architectures for Robust Decision MakingData Architectures for Robust Decision Making
Data Architectures for Robust Decision MakingGwen (Chen) Shapira
 
Big Data, Mob Scale.
Big Data, Mob Scale.Big Data, Mob Scale.
Big Data, Mob Scale.darach
 
Big Events, Mob Scale - Darach Ennis (Push Technology)
Big Events, Mob Scale - Darach Ennis (Push Technology)Big Events, Mob Scale - Darach Ennis (Push Technology)
Big Events, Mob Scale - Darach Ennis (Push Technology)jaxLondonConference
 
Computing Outside The Box June 2009
Computing Outside The Box June 2009Computing Outside The Box June 2009
Computing Outside The Box June 2009Ian Foster
 
Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...
Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...
Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...Amazon Web Services
 

Similar to Querying datasets on the Web with high availability (20)

Time travelling through DBpedia
Time travelling through DBpediaTime travelling through DBpedia
Time travelling through DBpedia
 
Deep dive into the native multi model database ArangoDB
Deep dive into the native multi model database ArangoDBDeep dive into the native multi model database ArangoDB
Deep dive into the native multi model database ArangoDB
 
A sweet affordable combo for Linked Data Archives
A sweet affordable combo for Linked Data ArchivesA sweet affordable combo for Linked Data Archives
A sweet affordable combo for Linked Data Archives
 
Reproducibility with 
the 99 cents Linked Data archive
Reproducibility with 
the 99 cents Linked Data archiveReproducibility with 
the 99 cents Linked Data archive
Reproducibility with 
the 99 cents Linked Data archive
 
Hatkit Project - Datafiddler
Hatkit Project - DatafiddlerHatkit Project - Datafiddler
Hatkit Project - Datafiddler
 
How to govern and secure a Data Mesh?
How to govern and secure a Data Mesh?How to govern and secure a Data Mesh?
How to govern and secure a Data Mesh?
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
 
What is a distributed data science pipeline. how with apache spark and friends.
What is a distributed data science pipeline. how with apache spark and friends.What is a distributed data science pipeline. how with apache spark and friends.
What is a distributed data science pipeline. how with apache spark and friends.
 
RDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataRDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival data
 
Data Science with the Help of Metadata
Data Science with the Help of MetadataData Science with the Help of Metadata
Data Science with the Help of Metadata
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
 
Document Databases & RavenDB
Document Databases & RavenDBDocument Databases & RavenDB
Document Databases & RavenDB
 
Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...
Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...
Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...
 
Data Architectures for Robust Decision Making
Data Architectures for Robust Decision MakingData Architectures for Robust Decision Making
Data Architectures for Robust Decision Making
 
Big Data, Mob Scale.
Big Data, Mob Scale.Big Data, Mob Scale.
Big Data, Mob Scale.
 
Big Events, Mob Scale - Darach Ennis (Push Technology)
Big Events, Mob Scale - Darach Ennis (Push Technology)Big Events, Mob Scale - Darach Ennis (Push Technology)
Big Events, Mob Scale - Darach Ennis (Push Technology)
 
Chachra, "Improving Discovery Systems Through Post Processing of Harvested Data"
Chachra, "Improving Discovery Systems Through Post Processing of Harvested Data"Chachra, "Improving Discovery Systems Through Post Processing of Harvested Data"
Chachra, "Improving Discovery Systems Through Post Processing of Harvested Data"
 
Computing Outside The Box June 2009
Computing Outside The Box June 2009Computing Outside The Box June 2009
Computing Outside The Box June 2009
 
Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...
Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...
Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...
 
Practical OData
Practical ODataPractical OData
Practical OData
 

Recently uploaded

VIP Kolkata Call Girl Kestopur 👉 8250192130 Available With Room
VIP Kolkata Call Girl Kestopur 👉 8250192130  Available With RoomVIP Kolkata Call Girl Kestopur 👉 8250192130  Available With Room
VIP Kolkata Call Girl Kestopur 👉 8250192130 Available With Roomdivyansh0kumar0
 
Russian Call Girls Thane Swara 8617697112 Independent Escort Service Thane
Russian Call Girls Thane Swara 8617697112 Independent Escort Service ThaneRussian Call Girls Thane Swara 8617697112 Independent Escort Service Thane
Russian Call Girls Thane Swara 8617697112 Independent Escort Service ThaneCall girls in Ahmedabad High profile
 
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...APNIC
 
AWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptxAWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptxellan12
 
VIP Call Girls Pune Madhuri 8617697112 Independent Escort Service Pune
VIP Call Girls Pune Madhuri 8617697112 Independent Escort Service PuneVIP Call Girls Pune Madhuri 8617697112 Independent Escort Service Pune
VIP Call Girls Pune Madhuri 8617697112 Independent Escort Service PuneCall girls in Ahmedabad High profile
 
VIP Kolkata Call Girl Alambazar 👉 8250192130 Available With Room
VIP Kolkata Call Girl Alambazar 👉 8250192130  Available With RoomVIP Kolkata Call Girl Alambazar 👉 8250192130  Available With Room
VIP Kolkata Call Girl Alambazar 👉 8250192130 Available With Roomdivyansh0kumar0
 
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine ServiceHot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Servicesexy call girls service in goa
 
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
How is AI changing journalism? (v. April 2024)
How is AI changing journalism? (v. April 2024)How is AI changing journalism? (v. April 2024)
How is AI changing journalism? (v. April 2024)Damian Radcliffe
 
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024APNIC
 
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts serviceChennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts servicesonalikaur4
 
AlbaniaDreamin24 - How to easily use an API with Flows
AlbaniaDreamin24 - How to easily use an API with FlowsAlbaniaDreamin24 - How to easily use an API with Flows
AlbaniaDreamin24 - How to easily use an API with FlowsThierry TROUIN ☁
 
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
VIP 7001035870 Find & Meet Hyderabad Call Girls Dilsukhnagar high-profile Cal...
VIP 7001035870 Find & Meet Hyderabad Call Girls Dilsukhnagar high-profile Cal...VIP 7001035870 Find & Meet Hyderabad Call Girls Dilsukhnagar high-profile Cal...
VIP 7001035870 Find & Meet Hyderabad Call Girls Dilsukhnagar high-profile Cal...aditipandeya
 
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...
₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...
₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...Diya Sharma
 
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...Sheetaleventcompany
 
Russian Call Girls in Kolkata Samaira 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Samaira 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Samaira 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Samaira 🤌 8250192130 🚀 Vip Call Girls Kolkataanamikaraghav4
 

Recently uploaded (20)

VIP Kolkata Call Girl Kestopur 👉 8250192130 Available With Room
VIP Kolkata Call Girl Kestopur 👉 8250192130  Available With RoomVIP Kolkata Call Girl Kestopur 👉 8250192130  Available With Room
VIP Kolkata Call Girl Kestopur 👉 8250192130 Available With Room
 
Russian Call Girls Thane Swara 8617697112 Independent Escort Service Thane
Russian Call Girls Thane Swara 8617697112 Independent Escort Service ThaneRussian Call Girls Thane Swara 8617697112 Independent Escort Service Thane
Russian Call Girls Thane Swara 8617697112 Independent Escort Service Thane
 
Rohini Sector 26 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 26 Call Girls Delhi 9999965857 @Sabina Saikh No AdvanceRohini Sector 26 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 26 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
 
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...
 
AWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptxAWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptx
 
VIP Call Girls Pune Madhuri 8617697112 Independent Escort Service Pune
VIP Call Girls Pune Madhuri 8617697112 Independent Escort Service PuneVIP Call Girls Pune Madhuri 8617697112 Independent Escort Service Pune
VIP Call Girls Pune Madhuri 8617697112 Independent Escort Service Pune
 
Rohini Sector 22 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 22 Call Girls Delhi 9999965857 @Sabina Saikh No AdvanceRohini Sector 22 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 22 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
 
VIP Kolkata Call Girl Alambazar 👉 8250192130 Available With Room
VIP Kolkata Call Girl Alambazar 👉 8250192130  Available With RoomVIP Kolkata Call Girl Alambazar 👉 8250192130  Available With Room
VIP Kolkata Call Girl Alambazar 👉 8250192130 Available With Room
 
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine ServiceHot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Service
 
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
 
How is AI changing journalism? (v. April 2024)
How is AI changing journalism? (v. April 2024)How is AI changing journalism? (v. April 2024)
How is AI changing journalism? (v. April 2024)
 
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024
 
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts serviceChennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts service
 
AlbaniaDreamin24 - How to easily use an API with Flows
AlbaniaDreamin24 - How to easily use an API with FlowsAlbaniaDreamin24 - How to easily use an API with Flows
AlbaniaDreamin24 - How to easily use an API with Flows
 
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
 
VIP 7001035870 Find & Meet Hyderabad Call Girls Dilsukhnagar high-profile Cal...
VIP 7001035870 Find & Meet Hyderabad Call Girls Dilsukhnagar high-profile Cal...VIP 7001035870 Find & Meet Hyderabad Call Girls Dilsukhnagar high-profile Cal...
VIP 7001035870 Find & Meet Hyderabad Call Girls Dilsukhnagar high-profile Cal...
 
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
 
₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...
₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...
₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...
 
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
 
Russian Call Girls in Kolkata Samaira 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Samaira 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Samaira 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Samaira 🤌 8250192130 🚀 Vip Call Girls Kolkata
 

Querying datasets on the Web with high availability

  • 1. Querying datasets on the Web with high availability Ruben Verborgh, Olaf Hartig, Ben De Meester, Gerald Haesendonck, Laurens De Vocht, Miel Vander Sande, Richard Cyganiak, Pieter Colpaert, Erik Mannens, and Rik Van de Walle
  • 2. Somebody comes to you with a high-quality Linked Open Data set: “How can people query my Linked Data on the Web?”
  • 3. “A public SPARQL endpoint gives live querying, but it’s costly and has availability issues.” “Offer a data dump. but it’s not really Web querying: users need to set up an endpoint” “Publish Linked Data documents. But querying is very slow…”
  • 4. Querying Linked Data on the Web always involves trade-offs. But have we looked at all possible trade-offs?
  • 5. Querying Linked Data live on the Web becomes affordable by building simpler servers and more intelligent clients.
  • 6. We formalized and evaluated client-side SPARQL querying through a triple-pattern interface. The trade-offs are different: low-cost, high-availability servers, more bandwidth and slower queries.
  • 7. Querying datasets on the Web with high availability Trade-offs for the Semantic Web Triple pattern fragment querying Evaluating the trade-offs
  • 8. Querying datasets on the Web with high availability Trade-offs for the Semantic Web Triple pattern fragment querying Evaluating the trade-offs
  • 9. It is not an all-or-nothing world. There is a spectrum of trade-offs. out-of-date data live data high bandwidth low bandwidth high availability low availability high client cost low client cost low server cost high server cost data dump SPARQL endpoint Linked Data documents interface offered by the server
  • 10. Linked Data Fragments are a uniform view on Linked Data interfaces. data dump SPARQL endpoint Every Linked Data interface offers specific fragments of a Linked Data set. Linked Data documents interface offered by the server
  • 11. Each type of Linked Data Fragment is defined by three characteristics. data metadata controls What triples does it contain? What do we know about it? How to access more data?
  • 12. Each type of Linked Data Fragment is defined by three characteristics. all dataset triples (none) data dump number of triples, file size data metadata controls
  • 13. Each type of Linked Data Fragment is defined by three characteristics. SPARQL query result data triples matching the query (none) (none) metadata controls
  • 14. Each type of Linked Data Fragment is defined by three characteristics. Linked Data Document data triples about a subject creator, maintainer, … links to other LD documents metadata controls
  • 15. We designed a new trade-off mix with low cost and high availability. out-of-date data live data high bandwidth low bandwidth high availability low availability high client cost low client cost low server cost high server cost data dump SPARQL query results Linked Data documents
  • 16. A triple pattern fragments interface is low-cost and enables clients to query. live data low server cost data dump SPARQL query results high availability Linked Data documents triple pattern fragments
  • 17. A triple pattern fragments interface is low-cost and enables clients to query. matches of a triple pattern total number of matches access to all other fragments data metadata controls (paged)
  • 18. controls (other fragments) metadata (total count) data (first 100)
  • 19. Triple patterns are not the final answer. No interface ever will be. Triple patterns show how far we can get with simple servers and smart clients. data dump SPARQL query results Linked Data documents triple pattern fragments
  • 20. Querying datasets on the Web with high availability Trade-offs for the Semantic Web Triple pattern fragment querying Evaluating the trade-offs
  • 21. Triple pattern fragment servers enable clients to be intelligent. The HTML representation explains: “you can query by triple pattern”. controls
  • 22. Triple pattern fragment servers enable clients to be intelligent. <http://fragments.dbpedia.org/2014/en#dataset> hydra:search [ hydra:template "http://fragments.dbpedia.org/2014/en controls {?subject,predicate,object}"; hydra:mapping [ hydra:variable "subject"; hydra:property rdf:subject ], [ hydra:variable "predicate"; hydra:property rdf:predicate ], [ hydra:variable "object"; hydra:property rdf:object ] ]. The RDF representation explains: “you can query by triple pattern”.
  • 23. Triple pattern fragment servers enable clients to be intelligent. The HTML representation explains: “this is the number of matches”. metadata
  • 24. Triple pattern fragment servers enable clients to be intelligent. <#fragment> void:triples 8141. The RDF representation explains: “this is the number of matches”. metadata
  • 25. How can intelligent clients solve SPARQL queries over fragments? Give them a SPARQL query. Give them a URL of any dataset fragment. They look inside the fragment to see how to access the dataset and use the metadata to decide how to plan the query.
  • 26. Let’s follow the execution of an example SPARQL query. Find names of artists born in Padua, Italy. SELECT ?artist ?name WHERE { ?artist a dbpedia-owl:Artist; rdfs:label ?name; dbpedia-owl:birthPlace dbpedia:Padua. FILTER LANGMATCHES(LANG(?name), "EN") } Fragment: http://fragments.dbpedia.org/2014/en
  • 27. The client looks inside the fragment to see how to access the dataset. Fragment: http://fragments.dbpedia.org/2014/en <http://fragments.dbpedia.org/2014/en#dataset> hydra:search [ hydra:template "http://fragments.dbpedia.org/2014/en {?subject,predicate,object}"; hydra:mapping [ hydra:variable "subject"; hydra:property rdf:subject ], [ hydra:variable "predicate"; hydra:property rdf:predicate ], [ hydra:variable "object"; hydra:property rdf:object ] ]. “I can query the dataset by triple pattern.”
  • 28. The client splits the query into the available fragments. SELECT ?artist ?name WHERE { ?artist a dbpedia-owl:Artist; rdfs:label ?name; dbpedia-owl:birthPlace dbpedia:Padua. FILTER LANGMATCHES(LANG(?name), "EN") }
  • 29. The client gets the fragments and inspects their metadata. ?artist a dbpedia-owl:Artist. first 100 triples 96.000 ?artist rdfs:label ?name. first 100 triples 12.000.000 ?artist dbont:birthPlace dbpedia:Padua. first 100 triples 135
  • 30. The metadata enables the client to choose the right starting point. ?dbp:artist Alberto_a dbpedia-Benettin owl:a Artist. dbont:Artist. 96.000 ?artist rdfs:label ?name. 12.000.000 dbp:Alberto_Benettin rdfs:label ?name. ?artist dbont:birthPlace dbpedia:Padua. 135 dbpedia:Alberto_Benettin dbont:birthPlace dbpedia:Padua. dbpedia:Alberto_Bigon dbont:birthPlace dbpedia:Padua.
  • 31. Clients execute the query in 3 seconds on a highly available, low-cost server. SELECT ?artist ?name WHERE { ?artist a dbpedia-owl:Artist; rdfs:label ?name; dbpedia-owl:birthPlace dbpedia:Padua. FILTER LANGMATCHES(LANG(?name), "EN") } Try it yourself: bit.ly/artistspadua
  • 32. Querying datasets on the Web with high availability Trade-offs for the Semantic Web Triple pattern fragment querying Evaluating the trade-offs
  • 33. We evaluated triple pattern fragments for server cost and availability. Berlin SPARQL benchmark on Amazon EC2 machines 100 million triples high query diversity BGP, UNION, FILTER, …
  • 34. We evaluated triple pattern fragments for server cost and availability. Berlin SPARQL benchmark on Amazon EC2 machines 1 server 1 cache 1–240 simultaneous clients
  • 35. The query throughput is lower, but resilient to high client numbers. Querying Datasets on 1 10 100 10 100 1000 10000 clients throughput (q/hr) Virtuoso 6 Fuseki–tdb triple pattern Fig. 3.1: Server performance (log-log plot) 200 executed SPARQL queries per hour timeouts
  • 36. The server traffic is higher, but requests are significantly lighter. Datasets on the Web with High Availability 13 Virtuoso 6 Virtuoso 7 Fuseki–tdb Fuseki–hdt pattern fragments 1 10 100 4 2 0 clients data sent (mb) Fig. 3.2: Server network traffic data sent by server in MB
  • 37. 2 Caching is significantly more effective, as clients reuse fragments for queries. 1 10 100 0 clients sent (mb) Fig. 3.2: Server network traffic 1 10 100 20 10 0 clients sent (mb) Fig. 3.4: Cache network traffic ram use data sent by cache in MB 8 6 4
  • 38. 150 The server uses much less CPU, allowing for higher availability. clients 1 10 100 1 1 10 100 100 50 0 100 50 1 50 server CPU usage per core #timeouts Fig. 3.3: Query timeouts 0 clients cpu use (%) Fig. 3.5: Server processor usage per core 100 use (%)
  • 39. 150 Servers enable clients to be intelligent, so they remain simple and light-weight. 1 10 100 1 1 10 100 100 50 0 clients #timeouts Fig. 3.3: Query timeouts 100 50 0 clients cpu use (%) 1 Fig. 3.5: Server processor usage per core 50 100 use (%) server CPU usage per core
  • 40. Querying datasets on the Web with high availability Trade-offs for the Semantic Web Triple pattern fragment querying Evaluating the trade-offs
  • 41. Triple pattern fragments allow querying live data with high availability. live data low server cost data dump SPARQL query results high availability Linked Data documents triple pattern fragments
  • 42. Like each Linked Data query method, triple pattern fragments have trade-offs. Live data Low-cost and highly available servers More bandwidth Slower results (but they are streaming)
  • 43. Experience the trade-offs yourself on the official DBpedia interfaces. DBpedia data dump DBpedia Linked Data documents DBpedia SPARQL endpoint DBpedia triple pattern fragments fragments.dbpedia.org
  • 44. Triple pattern fragments are easy: all software is available as open source. Software github.com/LinkedDataFragments Documentation and specification linkeddatafragments.org
  • 45. Querying is a client–server dialogue. Start exploring the entire spectrum. data dump SPARQL query results Linked Data documents triple pattern fragments
  • 46. Querying datasets on the Web with high availability linkeddatafragments.org @LDFragments