Semantic Web Solutions For Large-Scale Biomedical Data Analytics (SEWEBMEDA)
Workshop at ESWC2017, Portoroz, Slovenia
May 28th, 2017
Federated Query Formulation and Processing through BioFed
Ali Hasnain, Syeda Sana E Zainab, Dure Zehra, Qaiser Mehmood, Muhammad Saleem and Dietrich Rebholz-Schuhmann
OUTLINE
1. Introduction
2. BioFed query processing
   - Source selection
   - Query re-writing
3. Evaluation
4. BioFed demo
INTRODUCTION
- Linked, decentralized and distributed architecture
- 9,960 datasets
- ~150B triples
- Complex information needs
- Need for federated queries
INTRODUCTION: EXAMPLE
Return the party membership and news pages about all US presidents.
- Source 1 holds: party memberships, US presidents
- Source 2 holds: US presidents, news pages
- Computing the results requires data from both sources
BIOFED: QUERY PROCESSING
BioFed Engine:
1. Parse Query: get individual triple patterns
2. Source Selection (using the Road Map): identify relevant sources
3. SERVICE Annotation: rewrite the query, i.e., add SPARQL SERVICE clauses
4. Optimizer: generate an optimized query execution plan
5. Federator: execute sub-queries
6. Integrator: integrate sub-query results
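The "Get individual triple patterns" step can be illustrated with a deliberately simple splitter. This is a sketch, not BioFed's parser: it assumes a flat basic graph pattern whose triple patterns each end with " ." (as in the FedBench LD3 query used on the following slides), with no literals containing spaces and no FILTER, OPTIONAL, UNION or nested groups. The function name get_triple_patterns is hypothetical.

```python
# Toy version of "Parse Query -> Get individual triple patterns".
# Assumption: a flat basic graph pattern of simple triple patterns, each
# terminated by " ."; this is string splitting, not a real SPARQL parser.


def get_triple_patterns(query: str) -> list[tuple[str, str, str]]:
    """Return the (subject, predicate, object) patterns of a simple query, in order."""
    body = query[query.index("{") + 1 : query.rindex("}")]
    patterns = []
    for chunk in body.split(" ."):
        terms = chunk.split()
        if not terms:
            continue  # the text after the final " ." is empty
        if len(terms) != 3:
            raise ValueError(f"not a simple triple pattern: {chunk.strip()!r}")
        patterns.append((terms[0], terms[1], terms[2]))
    return patterns


LD3 = """SELECT ?president ?party ?page
WHERE {
?president rdf:type dbpedia:President .
?president dbpedia:nationality dbpedia:United_States .
?president dbpedia:party ?party .
?x nyt:topicPage ?page .
?x owl:sameAs ?president .
}"""

if __name__ == "__main__":
    for i, tp in enumerate(get_triple_patterns(LD3), start=1):
        print(f"TP{i}:", *tp)
```

A real engine would rely on a full SPARQL parser (for example the one in Apache Jena ARQ or RDFLib) rather than splitting strings.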
BIOFED: SOURCE SELECTION
Two-step, triple pattern-wise source selection:
1. Road Map lookup for the predicate of each triple pattern
   - Select the sources that contain the predicate
   - Select all sources if the predicate is unbound
2. If the subject or object of the triple pattern is bound
   - Send a SPARQL ASK query for the complete triple pattern to each source selected in step 1
   - Prune the sources that return false for the SPARQL ASK query
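The two steps above can be sketched for a single triple pattern as follows. This is an illustration under stated assumptions, not BioFed's code: the Road Map is modelled as a plain dictionary from predicate to sources, and the ASK call is passed in as a function. TOY_DATA, ROAD_MAP and toy_ask are hypothetical stand-ins invented for the example, not the real FedBench datasets.

```python
from typing import Callable, Mapping

Triple = tuple[str, str, str]


def is_var(term: str) -> bool:
    """A term is unbound if it is a SPARQL variable such as ?president."""
    return term.startswith("?")


def select_sources(
    tp: Triple,
    road_map: Mapping[str, set[str]],
    all_sources: set[str],
    ask: Callable[[str, Triple], bool],
) -> set[str]:
    """Two-step, triple pattern-wise source selection for one triple pattern."""
    s, p, o = tp
    # Step 1: Road Map lookup on the predicate; an unbound predicate keeps all sources.
    if is_var(p):
        candidates = set(all_sources)
    else:
        candidates = set(road_map.get(p, set()))
    # Step 2: if the subject or object is bound, ASK each candidate for the
    # complete pattern and prune the sources that answer false.
    if not is_var(s) or not is_var(o):
        candidates = {src for src in candidates if ask(src, tp)}
    return candidates


# --- Hypothetical toy setup (not the real datasets) ---
TOY_DATA: dict[str, set[Triple]] = {
    "S1": {(":p1", "rdf:type", "dbpedia:President"), (":p1", "dbpedia:party", ":partyA")},
    "S2": {(":c1", "rdf:type", ":Compound")},
    "S3": {(":c2", "rdf:type", ":Entity")},
    "S4": {(":t1", "rdf:type", ":Topic"), (":t1", "nyt:topicPage", ":page1")},
}
ROAD_MAP = {
    "rdf:type": {"S1", "S2", "S3", "S4"},
    "dbpedia:party": {"S1"},
    "nyt:topicPage": {"S4"},
}


def toy_ask(source: str, tp: Triple) -> bool:
    """Stand-in for a SPARQL ASK: does any triple in the source match the pattern?
    (Ignores the case of one variable repeated inside the pattern.)"""
    return any(
        all(is_var(q) or q == t for q, t in zip(tp, triple))
        for triple in TOY_DATA[source]
    )


if __name__ == "__main__":
    tp1 = ("?president", "rdf:type", "dbpedia:President")
    print(select_sources(tp1, ROAD_MAP, set(TOY_DATA), toy_ask))
```

Passing the ASK function in as a parameter keeps the selection logic independent of how requests are actually sent to the endpoints.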
BIOFED: SOURCE SELECTION
FedBench (LD3): Return for all US presidents their party membership and news pages about them.
SELECT ?president ?party ?page
WHERE {
?president rdf:type dbpedia:President . //TP1
?president dbpedia:nationality dbpedia:United_States . //TP2
?president dbpedia:party ?party . //TP3
?x nyt:topicPage ?page . //TP4
?x owl:sameAs ?president . //TP5
}
Sources: S1 = DBpedia, S2 = KEGG, S3 = ChEBI, S4 = NYT (all RDF)
Source selection algorithm: triple pattern-wise source selection
Step 1: Road Map lookup for rdf:type (TP1)
TP1 = {S1, S2, S3, S4}
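The slides do not say how the Road Map is built. One plausible model, assumed here only for illustration, is an index from each predicate to the sources that use it, filled by asking every endpoint for its distinct predicates over the SPARQL 1.1 Protocol. fetch_predicates, build_road_map, PREDICATE_QUERY and the toy input are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from collections import defaultdict

# Hypothetical catalogue query: list the distinct predicates one endpoint uses.
# On very large stores this can be slow or cut off by endpoint result limits.
PREDICATE_QUERY = "SELECT DISTINCT ?p WHERE { ?s ?p ?o }"


def fetch_predicates(endpoint: str, timeout: float = 60.0) -> set[str]:
    """Run PREDICATE_QUERY against one endpoint with an HTTP GET."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": PREDICATE_QUERY})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        results = json.load(response)
    return {row["p"]["value"] for row in results["results"]["bindings"]}


def build_road_map(predicates_by_source: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert source -> predicates into predicate -> sources."""
    road_map: dict[str, set[str]] = defaultdict(set)
    for source, predicates in predicates_by_source.items():
        for predicate in predicates:
            road_map[predicate].add(source)
    return dict(road_map)


if __name__ == "__main__":
    # Toy input with prefixed names; fetch_predicates would return full IRIs.
    toy = {
        "S1": {"rdf:type", "dbpedia:party", "owl:sameAs"},
        "S2": {"rdf:type", "owl:sameAs"},
        "S3": {"rdf:type"},
        "S4": {"rdf:type", "nyt:topicPage", "owl:sameAs"},
    }
    road_map = build_road_map(toy)
    print(sorted(road_map["rdf:type"]))  # ['S1', 'S2', 'S3', 'S4']
```

With such an index, the step 1 lookup for rdf:type is a single dictionary access, which is why every source is returned when a predicate as common as rdf:type is used.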
Step 2: Prune step 1 sources using SPARQL ASK queries (the object of TP1 is bound, so each candidate source is asked for the complete pattern)
ASK { ?president rdf:type dbpedia:President }
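Generating and sending the step 2 ASK query could look like the sketch below. ask_query and ask_endpoint are hypothetical helper names. The request follows the SPARQL 1.1 Protocol (query passed as a GET parameter) and reads the "boolean" member of the SPARQL JSON results format. A real endpoint would also need a PREFIX declaration for dbpedia:, whose IRI is not given on the slide, so the example passes prefixes in as an optional mapping.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

Triple = tuple[str, str, str]


def ask_query(tp: Triple, prefixes: Optional[dict[str, str]] = None) -> str:
    """Build a SPARQL ASK query for one complete triple pattern."""
    header = "".join(
        f"PREFIX {name}: <{iri}>\n" for name, iri in (prefixes or {}).items()
    )
    s, p, o = tp
    return f"{header}ASK {{ {s} {p} {o} }}"


def ask_endpoint(endpoint: str, query: str, timeout: float = 30.0) -> bool:
    """Send an ASK query with HTTP GET and return the boolean answer."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return bool(json.load(response)["boolean"])


if __name__ == "__main__":
    tp1 = ("?president", "rdf:type", "dbpedia:President")
    print(ask_query(tp1))  # ASK { ?president rdf:type dbpedia:President }
```

Sources whose ask_endpoint call returns False are the ones removed from the step 1 candidate set.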
After pruning, only S1 answers true: TP1 = {S1}
Applying the same procedure to TP2 to TP5 gives the complete triple pattern-wise source selection (TP3, TP4 and TP5 have an unbound subject and object, so only step 1 applies to them):
TP1 = {S1}, TP2 = {S1}, TP3 = {S1}, TP4 = {S4}, TP5 = {S1, S2, S4}
BIOFED: QUERY RE-WRITING
SPARQL 1.0 To SPARQL 1.1 conversion
Triple pattern-wise source selection: TP1 = {S1}, TP2 = {S1}, TP3 = {S1}, TP4 = {S4}, TP5 = {S1, S2, S4}
SELECT ?president ?party ?page
WHERE {
?president rdf:type dbpedia:President . //TP1
?president dbpedia:nationality dbpedia:United_States . //TP2
?president dbpedia:party ?party . //TP3
?x nyt:topicPage ?page . //TP4
?x owl:sameAs ?president . //TP5
}
- Combine triple patterns that have the same single relevant source into one SERVICE block
SELECT ?president ?party ?page
WHERE {
SERVICE <S1> {
?president rdf:type dbpedia:President . //TP1
?president dbpedia:nationality dbpedia:United_States . //TP2
?president dbpedia:party ?party . } //TP3
SERVICE <S4> { ?x nyt:topicPage ?page . } //TP4
?x owl:sameAs ?president . //TP5
}
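The grouping rule above can be sketched as a pass over the source selection result: every triple pattern with exactly one relevant source joins that source's group, and the rest are left for the next rule. group_exclusive is a hypothetical name; SELECTION copies the LD3 result from the slides.

```python
from collections import defaultdict


def group_exclusive(
    selection: dict[str, set[str]],
) -> tuple[dict[str, list[str]], list[str]]:
    """Split triple patterns into per-source groups (exactly one relevant
    source) and the remaining patterns that have several relevant sources."""
    groups: dict[str, list[str]] = defaultdict(list)
    rest: list[str] = []
    for tp_id, sources in selection.items():
        if not sources:
            # In a conjunctive BGP, a pattern with no source means no answers.
            raise ValueError(f"{tp_id} has no relevant source")
        if len(sources) == 1:
            (only,) = sources
            groups[only].append(tp_id)
        else:
            rest.append(tp_id)
    return dict(groups), rest


# Source selection for FedBench LD3, as on the slides.
SELECTION = {
    "TP1": {"S1"}, "TP2": {"S1"}, "TP3": {"S1"},
    "TP4": {"S4"}, "TP5": {"S1", "S2", "S4"},
}

if __name__ == "__main__":
    groups, rest = group_exclusive(SELECTION)
    print(groups)  # {'S1': ['TP1', 'TP2', 'TP3'], 'S4': ['TP4']}
    print(rest)    # ['TP5']
```

Each group becomes one SERVICE block, so S1 receives a single sub-query with three triple patterns instead of three separate requests.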
- Use UNION and SERVICE for triple patterns with more than one relevant source
SELECT ?president ?party ?page
WHERE {
SERVICE <S1> {
?president rdf:type dbpedia:President . //TP1
?president dbpedia:nationality dbpedia:United_States . //TP2
?president dbpedia:party ?party . } //TP3
SERVICE <S4> { ?x nyt:topicPage ?page . } //TP4
{ SERVICE <S1> { ?x owl:sameAs ?president . } } //TP5
UNION
{ SERVICE <S2> { ?x owl:sameAs ?president . } } //TP5
UNION
{ SERVICE <S4> { ?x owl:sameAs ?president . } } //TP5
}
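The UNION rule can be sketched as a function that wraps one triple pattern in SERVICE clauses. It emits a plain SERVICE block for a single relevant source and a UNION of SERVICE blocks otherwise. service_block is a hypothetical name, and sorting the sources only makes the output deterministic; it is not a cost-based ordering.

```python
def service_block(pattern: str, sources: set[str]) -> str:
    """Wrap one triple pattern in SERVICE clauses: a single SERVICE for one
    relevant source, otherwise a UNION with one SERVICE per source."""
    if not sources:
        raise ValueError("triple pattern has no relevant source")
    ordered = sorted(sources)  # deterministic output only
    if len(ordered) == 1:
        return f"SERVICE <{ordered[0]}> {{ {pattern} }}"
    branches = [f"{{ SERVICE <{src}> {{ {pattern} }} }}" for src in ordered]
    return "\nUNION\n".join(branches)


if __name__ == "__main__":
    print(service_block("?x owl:sameAs ?president .", {"S1", "S2", "S4"}))
    # { SERVICE <S1> { ?x owl:sameAs ?president . } }
    # UNION
    # { SERVICE <S2> { ?x owl:sameAs ?president . } }
    # UNION
    # { SERVICE <S4> { ?x owl:sameAs ?president . } }
```

The UNION is needed because each relevant source may hold different owl:sameAs links, so the answers from all of them must be combined before the join with the other blocks. SPARQL 1.1 also offers SERVICE SILENT, which lets an unreachable endpoint contribute no results instead of failing the whole query; the slides do not say whether BioFed uses it.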
COMPARISON ON LARGERDFBENCH
BIOFED DEMO
http://vmurq09.deri.ie:8007/
THANK YOU
