SPARQL Querying Benchmarks
Muhammad Saleem, Ivan Ermilov, Axel-Cyrille Ngonga Ngomo, Ricardo Usbeck, Michael Röder
https://sites.google.com/site/sqbenchmarks/
Tutorial at ISWC 2016, Kobe, Japan, 17/10/2016
Agile Knowledge Engineering and Semantic Web (AKSW), University of Leipzig,
Germany
11/13/2016 1
Agenda
• Why benchmarks?
• Components and design principles
• Key features and choke points
• Centralized SPARQL benchmarks
• Federated SPARQL benchmarks
• Hands-on
• HOBBIT introduction
Session times: 9:00 – 10:00, 10:00 – 10:30, 10:30 – 12:00
Why Benchmarks?
• Which tools can I use for my use case?
• Which tool best suits my use case, and why?
• Which measures are relevant?
• How do the existing engines behave?
• What are the limitations of the existing engines?
• How can existing engines be improved?
Benchmark Categories
• Micro benchmarks
 ◦ Specialized, detailed, very focused, and easy to run
 ◦ Neglect the larger picture
 ◦ Results are difficult to generalize
 ◦ Do not use standardized metrics
 ◦ Example: a join evaluation benchmark
• Standard benchmarks
 ◦ Generalized, well defined
 ◦ Standard metrics
 ◦ Complicated to run
 ◦ Systems are often optimized for the benchmarks
 ◦ Example: the Transaction Processing Performance Council (TPC) benchmarks
• Real-life applications
SPARQL Querying Benchmarks
• Centralized benchmarks
 ◦ Centralized repositories
 ◦ Queries span a single dataset
 ◦ Real or synthetic
 ◦ Examples: LUBM, SP2Bench, BSBM, WatDiv, DBPSB, FEASIBLE
• Federated benchmarks
 ◦ Multiple interlinked datasets
 ◦ Queries span multiple datasets
 ◦ Real or synthetic
 ◦ Examples: FedBench, LargeRDFBench
Querying Benchmark Components
• Datasets (real or synthetic)
• Queries (real or synthetic)
• Performance metrics
• Execution rules
Design Principles [L97]
• Relevant
• Understandable
• Good metrics
• Scalable
• Coverage
• Acceptance
• Repeatable
• Verifiable
Choke Points: Technological Challenges [BNE14]
• CP1: Aggregation Performance
• CP2: Join Performance
• CP3: Data Access Locality (materialized views)
• CP4: Expression Calculation
• CP5: Correlated Sub-queries
• CP6: Parallelism and Concurrency
RDF Querying Benchmarks Choke Points [FK16]
• CP1: Join Ordering
• CP2: Aggregation
• CP3: OPTIONAL and nested OPTIONAL clauses
• CP4: Reasoning
• CP5: Parallel execution of UNIONS
• CP6: FILTERS
• CP7: ORDERING
• CP8: Geo-spatial predicates
• CP9: Full Text
• CP10: Duplicate elimination
• CP11: Complex FILTER conditions
SPARQL Queries as Directed Labelled Hypergraphs (DLH) [SNM15]

SELECT ?president ?party ?page
WHERE {
  ?president rdf:type dbpedia:President .
  ?president dbpedia:nationality dbpedia:United_States .
  ?president dbpedia:party ?party .
  ?x nyt:topicPage ?page .
  ?x owl:sameAs ?president .
}

DLH of SPARQL Queries
Each triple pattern becomes a directed labelled hyperedge whose tail is the subject and whose label is the predicate. The hypergraph is built up one triple pattern at a time: ?president --rdf:type--> dbpedia:President, ?president --dbpedia:nationality--> dbpedia:United_States, ?president --dbpedia:party--> ?party, ?x --nyt:topicPage--> ?page, and ?x --owl:sameAs--> ?president. Vertices shared by several triple patterns, such as ?president and ?x, are join vertices and come in different types (e.g., star, simple, hybrid). (The step-by-step hypergraph figures from the original slides are omitted here.)
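The hypergraph view makes features such as join vertices and their degrees easy to compute. A minimal sketch in plain Python (the triple patterns are those of the example query; `join_vertices` is our own illustrative helper, not FEASIBLE's API):

```python
from collections import defaultdict

# Triple patterns of the example query as (subject, predicate, object) strings.
patterns = [
    ("?president", "rdf:type", "dbpedia:President"),
    ("?president", "dbpedia:nationality", "dbpedia:United_States"),
    ("?president", "dbpedia:party", "?party"),
    ("?x", "nyt:topicPage", "?page"),
    ("?x", "owl:sameAs", "?president"),
]

def join_vertices(patterns):
    """Vertices shared by more than one triple pattern, together with their
    degree (the number of triple patterns in which the vertex occurs)."""
    degree = defaultdict(int)
    for s, _, o in patterns:
        degree[s] += 1
        degree[o] += 1
    return {v: d for v, d in degree.items() if d > 1}

jv = join_vertices(patterns)                        # {'?president': 4, '?x': 2}
mean_join_vertex_degree = sum(jv.values()) / len(jv)  # 3.0
```

Here ?president is a star-like join vertex (degree 4) while ?x has degree 2, matching the figure the slides build up.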
Key SPARQL Query Characteristics
FEASIBLE [SNM15], WatDiv [AHO+14], and LUBM [GPH05] identified:
• Query forms
 ◦ SELECT, DESCRIBE, ASK, CONSTRUCT
• Constructs
 ◦ UNION, DISTINCT, ORDER BY, REGEX, LIMIT, FILTER, OPTIONAL, GROUP BY, negation
• Features
 ◦ Result size, number of BGPs, number of triple patterns, number of join vertices, mean join vertex degree, mean triple pattern selectivity, join selectivity, query runtime, unbound predicates
Centralized SPARQL Querying Benchmarks
Lehigh University Benchmark (LUBM) [GPH05]
• Synthetic RDF benchmark
• Tests the reasoning capabilities of triple stores
• Synthetic university-domain data generator
• 15 SPARQL 1.0 queries
• Query design criteria
 ◦ Input size, selectivity, complexity, logical inferencing
• Performance metrics
 ◦ Load time, repository size, query runtime, query completeness and soundness, combined metric (runtime + completeness + soundness)
LUBM Queries Choke Points [FK16]
(The per-query choke-point matrix, Q1-Q14 against CP1-CP11, did not survive extraction; its checkmarks are lost.) Overall, the LUBM queries mainly stress CP1 (join ordering) and CP4 (reasoning).
LUBM Queries Characteristics [SNM15]
Queries: 15
Query forms: SELECT 100.00%, ASK 0.00%, CONSTRUCT 0.00%, DESCRIBE 0.00%
SPARQL constructs: UNION, DISTINCT, ORDER BY, REGEX, LIMIT, OFFSET, OPTIONAL, FILTER, GROUP BY: all 0.00%
Result size: min 3, max 1.39E+04, mean 4.96E+03, S.D. 1.14E+04
BGPs: min 1, max 1, mean 1, S.D. 0
Triple patterns: min 1, max 6, mean 3, S.D. 1.81
Join vertices: min 0, max 4, mean 1.6, S.D. 1.40
Mean join vertex degree: min 0, max 5, mean 2.02, S.D. 1.30
Mean triple pattern selectivity: min 3.21E-04, max 0.432, mean 0.01, S.D. 0.0745
Query runtime (ms): min 2, max 3200, mean 437.68, S.D. 320.34
SP2Bench [SHM+09]
• Synthetic RDF triple store benchmark
• DBLP bibliographic synthetic data generator
• 12 SPARQL 1.0 queries
• Query design criteria
 ◦ SELECT and ASK SPARQL forms; covers the majority of SPARQL constructs
• Performance metrics
 ◦ Load time, per-query runtime, arithmetic and geometric mean of overall query runtimes, memory consumption
SP2Bench Queries Choke Points [FK16]
(Per-query choke-point matrix omitted; its checkmarks were lost in extraction.) Overall, the SP2Bench queries mainly stress CP1 (join ordering), FILTERs (CP6, CP11), and CP10 (duplicate elimination).
SP2Bench Queries Characteristics [SNM15]
Queries: 12
Query forms: SELECT 91.67%, ASK 8.33%, CONSTRUCT 0.00%, DESCRIBE 0.00%
SPARQL constructs: UNION 16.67%, DISTINCT 41.67%, ORDER BY 16.67%, REGEX 0.00%, LIMIT 8.33%, OFFSET 8.33%, OPTIONAL 25.00%, FILTER 58.33%, GROUP BY 0.00%
Result size: min 1, max 4.34E+07, mean 4.55E+06, S.D. 1.37E+07
BGPs: min 1, max 3, mean 1.5, S.D. 0.67
Triple patterns: min 1, max 13, mean 5.92, S.D. 3.82
Join vertices: min 0, max 10, mean 4.25, S.D. 3.79
Mean join vertex degree: min 0, max 9, mean 2.41, S.D. 2.26
Mean triple pattern selectivity: min 6.56E-05, max 0.540, mean 0.222, S.D. 0.208
Query runtime (ms): min 7, max 7.13E+05, mean 2.83E+05, S.D. 5.26E+05
Berlin SPARQL Benchmark (BSBM) [BS09]
• Synthetic RDF triple store benchmark
• E-commerce use case synthetic data generator
• 20 queries
 ◦ 12 SPARQL 1.0 queries for the explore and explore-and-update use cases
 ◦ 8 SPARQL 1.1 analytical queries for the business intelligence use case
• Query design criteria
 ◦ SELECT, DESCRIBE, and CONSTRUCT SPARQL forms; covers the majority of SPARQL constructs
• Performance metrics
 ◦ Load time, Query Mixes per Hour (QMpH), Queries per Second (QpS)
BSBM Queries Choke Points [FK16]
(Per-query choke-point matrix omitted; its checkmarks were lost in extraction.) Overall, the BSBM queries mainly stress CP1 (join ordering), CP6 (FILTERs), and CP7 (result ordering).
BSBM Queries Characteristics [SNM15]
Queries: 20
Query forms: SELECT 80.00%, ASK 0.00%, CONSTRUCT 4.00%, DESCRIBE 16.00%
SPARQL constructs: UNION 8.00%, DISTINCT 24.00%, ORDER BY 36.00%, REGEX 0.00%, LIMIT 36.00%, OFFSET 4.00%, OPTIONAL 52.00%, FILTER 52.00%, GROUP BY 0.00%
Result size: min 0, max 31, mean 8.31, S.D. 9.03
BGPs: min 1, max 5, mean 2.8, S.D. 1.70
Triple patterns: min 1, max 15, mean 9.32, S.D. 5.18
Join vertices: min 0, max 6, mean 2.88, S.D. 1.80
Mean join vertex degree: min 0, max 4.5, mean 3.05, S.D. 1.64
Mean triple pattern selectivity: min 9E-08, max 0.0453, mean 0.0105, S.D. 0.0142
Query runtime (ms): min 5, max 99, mean 9.1, S.D. 14.56
DBpedia SPARQL Benchmark (DBPSB) [MLA+14]
• Real benchmark generation framework based on
 ◦ the DBpedia dataset at different sizes
 ◦ DBpedia query log mining
• Clustering of log queries
 ◦ Normalize variable names in triple patterns
 ◦ Select frequently executed queries
 ◦ Remove SPARQL keywords and prefixes
 ◦ Compute query similarity using Levenshtein string matching
 ◦ Compute query clusters using a soft graph clustering algorithm [NS09]
 ◦ Derive query templates (most frequently asked, using the most SPARQL constructs) from clusters with more than 5 queries
 ◦ Generate any number of queries from the query templates
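The Levenshtein-based similarity step can be sketched as follows (a plain dynamic-programming implementation of edit distance; DBPSB's actual pipeline additionally normalizes variables and strips keywords and prefixes before comparing):

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def similarity(q1, q2):
    """Normalized string similarity in [0, 1]; 1.0 means identical."""
    if not q1 and not q2:
        return 1.0
    return 1 - levenshtein(q1, q2) / max(len(q1), len(q2))
```

Pairs whose similarity exceeds a chosen threshold would then feed the graph clustering step.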
DBPSB Queries Features
• Number of triple patterns
 ◦ Tests the efficiency of join operations (CP1)
• SPARQL UNION and OPTIONAL constructs
 ◦ Tests the parallel execution of UNIONs (CP5)
• Solution sequences and modifiers (DISTINCT)
 ◦ Tests the efficiency of duplicate elimination (CP10)
• Filter conditions and operators (FILTER, LANG, REGEX, STR)
 ◦ Tests the ability of engines to execute filters as early as possible (CP6)
DBPSB Limitations
• Queries are based on only 25 templates
• Does not consider features such as the number of join vertices, join vertex degree, triple pattern selectivities, or query execution times
• Only considers SPARQL SELECT queries
• Not customizable for given use cases or the needs of an application
Recall: Key SPARQL Query Characteristics
FEASIBLE [SNM15], WatDiv [AHO+14], and LUBM [GPH05] identified:
• Query forms
 ◦ SELECT, DESCRIBE, ASK, CONSTRUCT
• Constructs
 ◦ UNION, DISTINCT, ORDER BY, REGEX, LIMIT, FILTER, OPTIONAL, GROUP BY, negation
• Features
 ◦ Result size, number of BGPs, number of triple patterns, number of join vertices, mean join vertex degree, mean triple pattern selectivity, join selectivity, query runtime, unbound predicates
DBPSB Queries Characteristics [SNM15]
Queries (from 25 templates): 125
Query forms: SELECT 100%, ASK 0%, CONSTRUCT 0%, DESCRIBE 0%
SPARQL constructs: UNION 36%, DISTINCT 100%, ORDER BY 0%, REGEX 4%, LIMIT 0%, OFFSET 0%, OPTIONAL 32%, FILTER 48%, GROUP BY 0%
Result size: min 197, max 4.62E+06, mean 3.24E+05, S.D. 9.56E+05
BGPs: min 1, max 9, mean 2.70, S.D. 2.44
Triple patterns: min 1, max 12, mean 4.52, S.D. 2.79
Join vertices: min 0, max 3, mean 1.22, S.D. 1.13
Mean join vertex degree: min 0, max 5, mean 1.83, S.D. 1.44
Mean triple pattern selectivity: min 1.19E-05, max 1, mean 0.119, S.D. 0.227
Query runtime (ms): min 11, max 5.40E+04, mean 1.07E+04, S.D. 1.73E+04
Waterloo SPARQL Diversity Test Suite (WatDiv) [AHO+14]
• Synthetic benchmark
 ◦ Synthetic data generator
 ◦ Synthetic query generator
• User-controlled data generator
 ◦ Entities to include
 ◦ Structuredness [DKS+11] of the dataset
 ◦ Probability of entity associations
 ◦ Cardinality of property associations
• Query design criteria
 ◦ Structural query features
 ◦ Data-driven query features
WatDiv Query Design Criteria
• Structural features
 ◦ Number of triple patterns
 ◦ Join vertex count
 ◦ Join vertex degree
• Data-driven features
 ◦ Result size
 ◦ (Filtered) Triple Pattern (f-TP) selectivity
 ◦ BGP-restricted f-TP selectivity
 ◦ Join-restricted f-TP selectivity
WatDiv Queries Generation
• Query template generator
 ◦ User-specified number of templates
 ◦ User-specified template characteristics
• Query generator
 ◦ Instantiates the query templates with terms (IRIs, literals, etc.) from the RDF dataset
 ◦ User-specified number of queries produced
WatDiv Queries Characteristics [SNM15]
Query templates: 125
Query forms: SELECT 100.00%, ASK 0.00%, CONSTRUCT 0.00%, DESCRIBE 0.00%
SPARQL constructs: UNION, DISTINCT, ORDER BY, REGEX, LIMIT, OFFSET, OPTIONAL, FILTER, GROUP BY: all 0.00%
Result size: min 0, max 4.17E+09, mean 3.49E+07, S.D. 3.73E+08
BGPs: min 1, max 1, mean 1, S.D. 0
Triple patterns: min 1, max 12, mean 5.33, S.D. 2.61
Join vertices: min 0, max 5, mean 1.78, S.D. 1.00
Mean join vertex degree: min 0, max 7, mean 3.62, S.D. 1.41
Mean triple pattern selectivity: min 0, max 0.01176, mean 0.00494, S.D. 0.00239
Query runtime (ms): min 3, max 8.82E+08, mean 4.41E+08, S.D. 2.77E+07
FEASIBLE: Benchmark Generation Framework [SNM15]
• Customizable benchmark generation framework
• Generates real benchmarks from query logs
• Can be applied to any SPARQL query log
• Customizable for given use cases or the needs of an application
FEASIBLE Queries Selection Criteria
• Query forms
 ◦ SELECT, DESCRIBE, ASK, CONSTRUCT
• Constructs
 ◦ UNION, DISTINCT, ORDER BY, REGEX, LIMIT, FILTER, OPTIONAL, GROUP BY, negation
• Features
 ◦ Result size, number of BGPs, number of triple patterns, number of join vertices, mean join vertex degree, mean triple pattern selectivity, join selectivity, query runtime, unbound predicates
FEASIBLE: Benchmark Generation Framework
• Dataset cleaning
• Feature vectors and normalization
• Selection of exemplars
• Selection of benchmark queries
Feature Vectors and Normalization
SELECT DISTINCT ?entita ?nome
WHERE {
  ?entita rdf:type dbo:VideoGame .
  ?entita rdfs:label ?nome
  FILTER regex(?nome, "konami", "i")
}
LIMIT 100
Query Type: SELECT
Results Size: 13
Basic Graph Patterns (BGPs): 1
Triple Patterns: 2
Join Vertices: 1
Mean Join Vertices Degree: 2.0
Mean triple patterns selectivity: 0.01709761619798973
UNION: No
DISTINCT: Yes
ORDER BY: No
REGEX: Yes
LIMIT: Yes
OFFSET: No
OPTIONAL: No
FILTER: Yes
GROUP BY: No
Runtime (ms): 65
Feature vector:
13 1 2 1 2 0.017 0 1 0 1 1 0 0 1 0 65
Normalized feature vector:
0.11 0.53 0.67 0.14 0.08 0.017 0 1 0 1 1 0 0 1 0 0.14
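One common way to obtain such normalized vectors is min-max scaling of each feature column over the whole query log (a sketch under that assumption; FEASIBLE's exact scaling may differ in detail, and binary flags are already in [0, 1]):

```python
def min_max_normalize(vectors):
    """Scale every feature column to [0, 1] via (v - min) / (max - min).
    Constant columns (including all-zero flags) map to 0.0."""
    columns = list(zip(*vectors))
    scaled = []
    for col in columns:
        lo, hi = min(col), max(col)
        scaled.append([(v - lo) / (hi - lo) if hi > lo else 0.0 for v in col])
    return [list(row) for row in zip(*scaled)]

# Three toy feature vectors (e.g., result size and runtime columns):
vectors = [[0, 10], [5, 20], [10, 30]]
normalized = min_max_normalize(vectors)   # [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```

After normalization every query becomes a point in the unit hypercube, which is the space the clustering below operates in.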
FEASIBLE
Plot the feature vectors in a multidimensional space (two features F1 and F2 are shown for illustration):

Query  F1    F2
Q1     0.2   0.2
Q2     0.5   0.3
Q3     0.8   0.3
Q4     0.9   0.1
Q5     0.5   0.5
Q6     0.2   0.7
Q7     0.1   0.8
Q8     0.13  0.65
Q9     0.9   0.5
Q10    0.1   0.5

Suppose we need a benchmark of 3 queries:
1. Calculate the average point of all queries.
2. Select the point with the minimum Euclidean distance to the average point; this is the first exemplar.
3. Select the point that is farthest from the current exemplars as the next exemplar; repeat until 3 exemplars are chosen.
4. For each query, calculate its distance to each exemplar and assign it to the minimum-distance exemplar, yielding 3 clusters.
5. Calculate the average point of each cluster.
6. From each cluster, select the query with the minimum distance to the cluster average as the final benchmark query.
In this example, Q2, Q3, and Q8 are the final benchmark queries.
(The step-by-step scatter plots from the original slides are omitted here.)
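The whole procedure (exemplar seeding, farthest-point spreading, nearest-exemplar clustering, per-cluster medoid choice) can be written compactly. This is our reading of the steps above, not FEASIBLE's reference code:

```python
import math

def select_benchmark(points, k):
    """Pick k representative points: seed with the point nearest the global
    average, grow exemplars farthest-first, cluster every point by its
    nearest exemplar, then return the point nearest each cluster average."""
    avg = [sum(c) / len(points) for c in zip(*points)]
    exemplars = [min(points, key=lambda p: math.dist(p, avg))]
    while len(exemplars) < k:
        exemplars.append(max(points,
                             key=lambda p: min(math.dist(p, e) for e in exemplars)))
    clusters = {tuple(e): [] for e in exemplars}
    for p in points:
        nearest = min(exemplars, key=lambda e: math.dist(p, e))
        clusters[tuple(nearest)].append(p)
    selected = []
    for members in clusters.values():
        c_avg = [sum(c) / len(members) for c in zip(*members)]
        selected.append(min(members, key=lambda p: math.dist(p, c_avg)))
    return selected

# The ten example queries from the walkthrough, as (F1, F2) points.
queries = [[0.2, 0.2], [0.5, 0.3], [0.8, 0.3], [0.9, 0.1], [0.5, 0.5],
           [0.2, 0.7], [0.1, 0.8], [0.13, 0.65], [0.9, 0.5], [0.1, 0.5]]
benchmark = select_benchmark(queries, 3)   # three representative queries
```

`math.dist` (Python 3.8+) computes the Euclidean distance; each exemplar always lands in its own cluster, so no cluster is empty.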
Comparison of Composite Error
FEASIBLE's composite error is 54.9% lower than that of DBPSB.
Rank-wise Ranking of Triple Stores
All values are percentages.
• No system is the sole winner or loser for a particular rank
• Virtuoso mostly occupies the higher ranks, i.e., ranks 1 and 2 (68.29%)
• Fuseki is mostly in the middle ranks, i.e., ranks 2 and 3 (65.14%)
• OWLIM-SE is usually on the slower side, i.e., ranks 3 and 4 (60.86%)
• Sesame is either fast or slow: rank 1 (31.71% of the queries) and rank 4 (23.14%)
FEASIBLE (DBpedia) Queries Characteristics [SNM15]
Queries: 125
Query forms: SELECT 95.20%, ASK 0.00%, CONSTRUCT 4.00%, DESCRIBE 0.80%
SPARQL constructs: UNION 40.80%, DISTINCT 52.80%, ORDER BY 28.80%, REGEX 14.40%, LIMIT 38.40%, OFFSET 18.40%, OPTIONAL 30.40%, FILTER 58.40%, GROUP BY 0.80%
Result size: min 1, max 1.41E+06, mean 52183, S.D. 1.97E+05
BGPs: min 1, max 14, mean 3.18, S.D. 3.56
Triple patterns: min 1, max 18, mean 4.88, S.D. 4.40
Join vertices: min 0, max 11, mean 1.30, S.D. 2.39
Mean join vertex degree: min 0, max 11, mean 1.45, S.D. 2.13
Mean triple pattern selectivity: min 2.87E-09, max 1, mean 0.140, S.D. 0.319
Query runtime (ms): min 2, max 3.22E+04, mean 2242.6, S.D. 6962.0
FEASIBLE (SWDF) Queries Characteristics [SNM15]
Queries: 125
Query forms: SELECT 92.80%, ASK 2.40%, CONSTRUCT 3.20%, DESCRIBE 1.60%
SPARQL constructs: UNION 32.80%, DISTINCT 50.40%, ORDER BY 25.60%, REGEX 16.00%, LIMIT 45.60%, OFFSET 20.80%, OPTIONAL 32.00%, FILTER 29.60%, GROUP BY 19.20%
Result size: min 1, max 3.01E+05, mean 9091.5, S.D. 4.70E+04
BGPs: min 0, max 14, mean 2.69, S.D. 2.81
Triple patterns: min 0, max 14, mean 3.23, S.D. 2.76
Join vertices: min 0, max 3, mean 0.52, S.D. 0.66
Mean join vertex degree: min 0, max 4, mean 0.97, S.D. 1.09
Mean triple pattern selectivity: min 1.06E-05, max 1, mean 0.292, S.D. 0.325
Query runtime (ms): min 4, max 4.13E+04, mean 1308.8, S.D. 5335.4
Other Useful Benchmarks
• Semantic Publishing Benchmark (SPB)
• UniProt [RU09][UniProtKB]
• YAGO (Yet Another Great Ontology) [SKW07]
• Barton Library [Barton]
• Linked Sensor Dataset [PHS10]
• WordNet [WordNet]
• Publishing TPC-H as RDF [TPC-H]
• Apples and Oranges [DKS+11]
Centralized SPARQL Querying Benchmarks Summary [SNM15]
(Values per row, in order: LUBM | BSBM | SP2Bench | WatDiv | DBPSB | FEASIBLE(DBpedia) | DBpediaLog | FEASIBLE(SWDF) | SWDFLog)

Queries: 15 | 125 | 12 | 125 | 125 | 125 | 130466 | 125 | 64030

Query forms
SELECT: 100.00% | 80.00% | 91.67% | 100.00% | 100% | 95.20% | 97.96% | 92.80% | 58.71%
ASK: 0.00% | 0.00% | 8.33% | 0.00% | 0% | 0.00% | 1.93% | 2.40% | 0.09%
CONSTRUCT: 0.00% | 4.00% | 0.00% | 0.00% | 0% | 4.00% | 0.09% | 3.20% | 0.04%
DESCRIBE: 0.00% | 16.00% | 0.00% | 0.00% | 0% | 0.80% | 0.02% | 1.60% | 41.17%

SPARQL constructs
UNION: 0.00% | 8.00% | 16.67% | 0.00% | 36% | 40.80% | 7.97% | 32.80% | 29.32%
DISTINCT: 0.00% | 24.00% | 41.67% | 0.00% | 100% | 52.80% | 4.16% | 50.40% | 34.18%
ORDER BY: 0.00% | 36.00% | 16.67% | 0.00% | 0% | 28.80% | 0.30% | 25.60% | 10.67%
REGEX: 0.00% | 0.00% | 0.00% | 0.00% | 4% | 14.40% | 0.21% | 16.00% | 0.03%
LIMIT: 0.00% | 36.00% | 8.33% | 0.00% | 0% | 38.40% | 0.40% | 45.60% | 1.79%
OFFSET: 0.00% | 4.00% | 8.33% | 0.00% | 0% | 18.40% | 0.03% | 20.80% | 0.14%
OPTIONAL: 0.00% | 52.00% | 25.00% | 0.00% | 32% | 30.40% | 20.11% | 32.00% | 29.52%
FILTER: 0.00% | 52.00% | 58.33% | 0.00% | 48% | 58.40% | 93.38% | 29.60% | 0.72%
GROUP BY: 0.00% | 0.00% | 0.00% | 0.00% | 0% | 0.80% | 7.66E-06 | 19.20% | 1.34%

Result size
Min: 3 | 0 | 1 | 0 | 197 | 1 | 1 | 1 | 1
Max: 1.39E+04 | 31 | 4.34E+07 | 4.17E+09 | 4.62E+06 | 1.41E+06 | 1.41E+06 | 3.01E+05 | 3.01E+05
Mean: 4.96E+03 | 8.31 | 4.55E+06 | 3.49E+07 | 3.24E+05 | 52183 | 404.00 | 9091.51 | 39.51
S.D.: 1.14E+04 | 9.03 | 1.37E+07 | 3.73E+08 | 9.56E+05 | 1.97E+05 | 12932.25 | 4.70E+04 | 2208.7

BGPs
Min: 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0
Max: 1 | 5 | 3 | 1 | 9 | 14 | 14 | 14 | 14
Mean: 1 | 2.8 | 1.5 | 1 | 2.70 | 3.18 | 1.68 | 2.69 | 2.29
S.D.: 0 | 1.70 | 0.67 | 0 | 2.44 | 3.56 | 1.66 | 2.81 | 2.94

Triple patterns
Min: 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0
Max: 6 | 15 | 13 | 12 | 12 | 18 | 18 | 14 | 14
Mean: 3 | 9.32 | 5.92 | 5.33 | 4.52 | 4.88 | 1.71 | 3.23 | 2.51
S.D.: 1.81 | 5.18 | 3.82 | 2.61 | 2.79 | 4.40 | 1.69 | 2.76 | 3.21

Join vertices
Min: 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Max: 4 | 6 | 10 | 5 | 3 | 11 | 11 | 3 | 3
Mean: 1.6 | 2.88 | 4.25 | 1.78 | 1.22 | 1.30 | 0.02 | 0.52 | 0.18
S.D.: 1.40 | 1.80 | 3.79 | 1.00 | 1.13 | 2.39 | 0.23 | 0.66 | 0.46

Mean join vertex degree
Min: 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Max: 5 | 4.5 | 9 | 7 | 5 | 11 | 11 | 4 | 5
Mean: 2.02 | 3.05 | 2.41 | 3.62 | 1.83 | 1.45 | 0.04 | 0.97 | 0.37
S.D.: 1.30 | 1.64 | 2.26 | 1.41 | 1.44 | 2.13 | 0.33 | 1.09 | 0.87

Mean triple pattern selectivity
Min: 3.21E-04 | 9E-08 | 6.56E-05 | 0 | 1.19E-05 | 2.87E-09 | 1.26E-05 | 1.06E-05 | 1.1E-05
Max: 0.432 | 0.0453 | 0.540 | 0.01176 | 1 | 1 | 1 | 1 | 1
Mean: 0.01 | 0.0105 | 0.222 | 0.00494 | 0.119 | 0.140 | 0.0058 | 0.292 | 0.0238
S.D.: 0.0745 | 0.0142 | 0.208 | 0.00239 | 0.227 | 0.319 | 0.0367 | 0.325 | 0.0786

Query runtime (ms)
Min: 2 | 5 | 7 | 3 | 11 | 2 | 1 | 4 | 3
Max: 3200 | 99 | 7.13E+05 | 8.82E+08 | 5.40E+04 | 3.22E+04 | 5.60E+04 | 4.13E+04 | 4.13E+04
Mean: 437.68 | 9.1 | 2.83E+05 | 4.41E+08 | 1.07E+04 | 2242.6 | 30.42 | 1308.83 | 16.16
S.D.: 320.34 | 14.56 | 5.26E+05 | 2.77E+07 | 1.73E+04 | 6961.99 | 702.52 | 5335.44 | 249.67
Federated SPARQL Querying Benchmarks
Federated Query
• Return the party membership and news pages about all US presidents.
 ◦ One data source holds the US presidents and their party memberships.
 ◦ Another data source holds the US presidents and their news pages.
Computing the results requires data from both sources.
Federated SPARQL Query Processing
A federation engine answers a query over multiple RDF sources (S1, ..., S4) in four main steps:
1. Parsing/Rewriting: rewrite the query and extract the individual triple patterns.
2. Source selection: identify the capable/relevant sources for each triple pattern.
3. Optimization: generate an optimized query execution plan.
4. Execution and integration: execute the sub-queries against the selected sources and integrate the sub-query results.
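Triple-pattern-wise source selection (step 2) is commonly implemented by probing each endpoint with a SPARQL ASK per triple pattern. A sketch with a stubbed `ask` callable; the endpoint URLs, patterns, and helper names here are illustrative only:

```python
def select_sources(triple_patterns, endpoints, ask):
    """Map each triple pattern to the endpoints whose ASK probe succeeds."""
    return {tp: [e for e in endpoints if ask(e, tp)] for tp in triple_patterns}

# Stub standing in for real SPARQL ASK requests (illustrative data only).
capable = {
    ("http://dbpedia.org/sparql", "?p dbpedia:party ?party"),
    ("http://example.org/nyt/sparql", "?x nyt:topicPage ?page"),
}
def ask(endpoint, tp):
    return (endpoint, tp) in capable

sources = select_sources(
    ["?p dbpedia:party ?party", "?x nyt:topicPage ?page"],
    ["http://dbpedia.org/sparql", "http://example.org/nyt/sparql"],
    ask,
)
```

The size of this mapping and the number of ASK probes issued are exactly the source-selection metrics discussed later in this tutorial.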
SPARQL Query Federation Approaches
• SPARQL Endpoint Federation (SEF)
• Linked Data Federation (LDF)
• Hybrid of SEF+LDF
SPLODGE [SP+12]
• Federated benchmark generation tool
• Query design criteria
 ◦ Query form
 ◦ Join type
 ◦ Result modifiers: DISTINCT, LIMIT, OFFSET, ORDER BY
 ◦ Variable triple patterns
 ◦ Triple pattern joins
 ◦ Cross-product triple patterns
 ◦ Number of sources
 ◦ Number of join vertices
 ◦ Query selectivity
• Non-conjunctive queries that make use of SPARQL UNION and OPTIONAL are not considered
FedBench [FB+11]
• Based on 9 real interconnected datasets
 ◦ KEGG, DrugBank, ChEBI from the life sciences domain
 ◦ DBpedia, GeoNames, Jamendo, SWDF, NYT, LMDB from the cross domain
 ◦ The datasets vary in structuredness and size
• Four sets of queries
 ◦ 7 life sciences queries
 ◦ 7 cross domain queries
 ◦ 11 Linked Data queries
 ◦ 14 queries from SP2Bench
FedBench Queries Characteristics
Queries: 25
Query forms: SELECT 100.00%, ASK 0.00%, CONSTRUCT 0.00%, DESCRIBE 0.00%
SPARQL constructs: UNION 12%, DISTINCT 0.00%, ORDER BY 0.00%, REGEX 0.00%, LIMIT 0.00%, OFFSET 0.00%, OPTIONAL 4%, FILTER 4%, GROUP BY 0.00%
Result size: min 1, max 9054, mean 529, S.D. 1764
BGPs: min 1, max 2, mean 1.16, S.D. 0.37
Triple patterns: min 2, max 7, mean 4, S.D. 1.25
Join vertices: min 0, max 5, mean 2.52, S.D. 1.26
Mean join vertex degree: min 0, max 3, mean 2.14, S.D. 0.56
Mean triple pattern selectivity: min 0.001, max 1, mean 0.05, S.D. 0.092
Query runtime (ms): min 50, max 1.2E+04, mean 1987, S.D. 3950
LargeRDFBench [LB+16]
• 32 queries
 ◦ 14 simple
 ◦ 10 complex
 ◦ 8 large data
• 14 interlinked datasets spanning three categories, including:
 ◦ Cross domain: DBpedia, LinkedMDB, New York Times, SW Dog Food, Jamendo, GeoNames
 ◦ Life sciences: KEGG, DrugBank, ChEBI, Affymetrix
 ◦ Large data: Linked TCGA-M, Linked TCGA-E, Linked TCGA-A
• The datasets are interlinked via predicates such as owl:sameAs, basedNear, x-geneid, keggCompoundId, bcr_patient_barcode, and country/ethnicity/race, with link sets ranging from about 1.3k to 251.3k links. (The dataset interlinking figure is omitted here.)
LargeRDFBench Datasets Statistics (the statistics table is shown as a figure in the original slides and is omitted here)
LargeRDFBench Queries Properties
• 14 simple queries
 ◦ 2-7 triple patterns
 ◦ Subset of SPARQL clauses
 ◦ Query execution time around 2 seconds on average
• 10 complex queries
 ◦ 8-13 triple patterns
 ◦ Use more SPARQL clauses
 ◦ Query execution time up to 10 minutes
• 8 large-data queries
 ◦ Minimum 80,459 results
 ◦ Large intermediate results
 ◦ Query execution time in hours
LargeRDFBench Queries Characteristics
Queries: 32
Query forms: SELECT 100.00%, ASK 0.00%, CONSTRUCT 0.00%, DESCRIBE 0.00%
SPARQL constructs: UNION 18.75%, DISTINCT 28.21%, ORDER BY 9.37%, REGEX 3.12%, LIMIT 12.5%, OFFSET 0.00%, OPTIONAL 25%, FILTER 31.25%, GROUP BY 0.00%
Result size: min 1, max 3.0E+05, mean 5.9E+04, S.D. 1.1E+05
BGPs: min 1, max 2, mean 1.43, S.D. 0.5
Triple patterns: min 2, max 12, mean 6.6, S.D. 2.6
Join vertices: min 0, max 6, mean 3.43, S.D. 1.36
Mean join vertex degree: min 0, max 6, mean 2.56, S.D. 0.76
Mean triple pattern selectivity: min 0.001, max 1, mean 0.10, S.D. 0.14
Query runtime (ms): min 159, max >1 hr, mean undefined, S.D. undefined
FedBench vs. LargeRDFBench
Performance Metrics
• Efficiency of source selection, in terms of
 ◦ Total triple pattern-wise sources selected
 ◦ Total number of SPARQL ASK requests used during source selection
 ◦ Source selection time
• Query execution time
• Result completeness and correctness
• Number of remote requests during query execution
• Index compression ratio (1 - index size / data dump size)
• Number of intermediate results
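Two of the listed metrics are simple enough to pin down in code (the helper names are ours, for illustration):

```python
def index_compression_ratio(index_size, dump_size):
    """Index compression ratio = 1 - index size / data dump size."""
    return 1 - index_size / dump_size

def total_tp_sources(selection):
    """Total triple pattern-wise sources selected: the sum, over all
    triple patterns, of the number of sources chosen for that pattern."""
    return sum(len(sources) for sources in selection.values())

ratio = index_compression_ratio(2.0, 10.0)                           # 0.8
tp_sources = total_tp_sources({"tp1": ["s1", "s2"], "tp2": ["s1"]})  # 3
```

A higher compression ratio means a smaller index relative to the data dump; a lower total of triple-pattern-wise sources generally means fewer remote requests during execution.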
Future Directions
• Micro benchmarking
• Synthetic benchmark generation
 ◦ Synthetic data that is like real data
 ◦ Synthetic queries that are like real queries
• Customizable and flexible benchmark generation
 ◦ Fits user needs
 ◦ Fits the current use case
• What are the most important choke points for SPARQL querying benchmarks? How are they related to query performance?
References
• [L97] Charles Levine. TPC-C: The OLTP Benchmark. In SIGMOD – Industrial
Session, 1997.
• [GPH05] Y. Guo, Z. Pan, and J. Heflin. LUBM: A Benchmark for OWL Knowledge
Base Systems. Journal Web Semantics: Science, Services and Agents on the World
Wide Web archive Volume 3 Issue 2-3, October, 2005 , Pages 158-182
• [SHM+09] M. Schmidt , T. Hornung, M. Meier, C. Pinkel, G. Lausen. SP2Bench: A
SPARQL Performance Benchmark. Semantic Web Information Management, 2009.
• [BS09] C. Bizer and A. Schultz. The Berlin SPARQL Benchmark. Int. J. Semantic
Web and Inf. Sys., 5(2), 2009.
• [BSBM] Berlin SPARQL Benchmark (BSBM) Specification - V3.1. http://wifo5-3.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/spec/index.html.
• [RU09] N. Redaschi and UniProt Consortium. UniProt in RDF: Tackling Data
Integration and Distributed Annotation with the Semantic Web. In Biocuration
Conference, 2009.
References
• [UniProtKB] UniProtKB Queries. http://www.uniprot.org/help/query-fields.
• [SKW07]F. M. Suchanek, G. Kasneci and G. Weikum. YAGO: A Core of Semantic Knowledge
Unifying WordNet and Wikipedia, In WWW 2007.
• [Barton] The MIT Barton Library dataset. http://simile.mit.edu/rdf-test-data/
• [PHS10] H. Patni, C. Henson, and A. Sheth. Linked sensor data. 2010
• [TPC-H] The TPC-H Homepage. http://www.tpc.org/tpch/
• [WordNet] WordNet: A lexical database for English. http://wordnet.princeton.edu/
• [MLA+14] M. Morsey, J. Lehmann, S. Auer, A-C. Ngonga Ngomo. DBpedia SPARQL Benchmark: Performance Assessment with Real Queries on Real Data.
• [SP+12] Görlitz, Olaf, Matthias Thimm, and Steffen Staab. Splodge: Systematic generation of
sparql benchmark queries for linked open data. International Semantic Web Conference. Springer
Berlin Heidelberg, 2012.
• [BNE14] P. Boncz, T. Neumann, O. Erling. TPC-H Analyzed: Hidden Messages and Lessons Learned
from an Influential Benchmark. Performance Characterization and Benchmarking. In TPCTC 2013,
Revised Selected Papers.
References
• [NS09] A–C. Ngonga Ngomo and D. Schumacher. Borderflow: A local graph clustering algorithm
for natural language processing. In CICLing, 2009.
• [AHO+14]G. Aluc, O. Hartig, T. Ozsu, K. Daudjee. Diversifed Stress Testing of RDF Data
Management Systems. In ISWC, 2014.
• [SNM15] M. Saleem, Q. Mehmood, and A-C. Ngonga Ngomo. FEASIBLE: A Feature-Based SPARQL Benchmark Generation Framework. In ISWC, 2015.
• [DKS+11] S. Duan, A. Kementsietsidis, Kavitha Srinivas and Octavian Udrea. Apples and oranges: a
comparison of RDF benchmarks and real RDF datasets. In SIGMOD, 2011.
• [FK16] I.Fundulaki, A.Kementsietsidis Assessing the performance of RDF Engines: Discussing RDF
Benchmarks, Tutorial at ESWC2016
• [FB+11] Schmidt, Michael, et al. Fedbench: A benchmark suite for federated semantic data query
processing. International Semantic Web Conference. Springer Berlin Heidelberg, 2011.
• [LB+16] M.Saleem, A.Hasnain, A–C. Ngonga Ngomo. LargeRDFBench: A Billion Triples Benchmark
for SPARQL Query Federation, Submitted to Journal of Web Semantics
Thanks
{lastname}@informatik.uni-leipzig.de
AKSW, University of Leipzig, Germany
This work was supported by grants from the BMWi project SAKE and the EU H2020 Framework Programme project HOBBIT (GA no. 688227).

SPARQL Querying Benchmarks ISWC2016

  • 8. Choke Points: Technological Challenges [BNE14] • CP1: Aggregation Performance • CP2: Join Performance • CP3: Data Access Locality (materialized views) • CP4: Expression Calculation • CP5: Correlated Sub-queries • CP6: Parallelism and Concurrency 11/13/2016 8
  • 9. RDF Querying Benchmarks Choke Points [FK16] • CP1: Join Ordering • CP2: Aggregation • CP3: OPTIONAL and nested OPTIONAL clauses • CP4: Reasoning • CP5: Parallel execution of UNIONS • CP6: FILTERS • CP7: ORDERING • CP8: Geo-spatial predicates • CP9: Full Text • CP10: Duplicate elimination • CP11: Complex FILTER conditions 11/13/2016 9
  • 10. SPARQL Queries as Directed Labelled Hyper-graphs (DLH) [SNM15] 11/13/2016 10
  • 11. SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } ?president rdf:type dbpedia: President 11 DLH Of SPARQL Queries
  • 12. SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } ?president rdf:type dbpedia: President dbpedia: United_States dbpedia: nationality 12 DLH Of SPARQL Queries
  • 13. SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } ?president rdf:type dbpedia: President dbpedia: United_States dbpedia: nationality dbpedia: party ?party 13 DLH Of SPARQL Queries
  • 14. SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } ?president rdf:type dbpedia: President dbpedia: United_States dbpedia: nationality ?x dbpedia: party ?party nyt:topic Page ?page 14 DLH Of SPARQL Queries
  • 15. SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } ?president rdf:type dbpedia: President dbpedia: United_States dbpedia: nationality ?x owl: SameAS dbpedia: party ?party nyt:topic Page ?page Star simple hybrid Tail of hyperedge 15 DLH Of SPARQL Queries
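The join-vertex notions used throughout these slides (a join vertex is a subject/object term shared by several triple patterns; its degree is the number of patterns it touches) can be sketched for the example query above. This is an illustrative simplification in Python, not the full DLH formalism of [SNM15] (predicate vertices and hyperedge tails are ignored here):

```python
from collections import defaultdict

# Triple patterns of the example query (subject, predicate, object).
patterns = [
    ("?president", "rdf:type", "dbpedia:President"),
    ("?president", "dbpedia:nationality", "dbpedia:United_States"),
    ("?president", "dbpedia:party", "?party"),
    ("?x", "nyt:topicPage", "?page"),
    ("?x", "owl:sameAs", "?president"),
]

def join_vertices(patterns):
    """Count, for each subject/object term, how many triple patterns it occurs
    in; terms occurring in more than one pattern are join vertices, and the
    count is their degree."""
    degree = defaultdict(int)
    for s, _, o in patterns:
        degree[s] += 1
        degree[o] += 1
    return {v: d for v, d in degree.items() if d > 1}

print(join_vertices(patterns))  # ?president joins 4 patterns (a star), ?x joins 2
```

The mean of these degrees is exactly the "mean join vertices degree" feature reported in the benchmark tables below.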
  • 16. Key SPARQL Query Characteristics FEASIBLE [SNM15], WatDiv [AHO+14], LUBM [GPH05] identified: • Query forms  SELECT, DESCRIBE, ASK, CONSTRUCT • Constructs  UNION, DISTINCT, ORDER BY, REGEX, LIMIT, FILTER, OPTIONAL, GROUP BY, Negation • Features  Result size, No. of BGPs, No. of triple patterns, No. of join vertices, Mean join vertices degree, Mean triple pattern selectivity, Join selectivity, Query runtime, Unbound predicates 11/13/2016 16
  • 17. Centralized SPARQL Querying Benchmarks 11/13/2016 17
  • 18. Lehigh University Benchmark (LUBM) [GPH05] • Synthetic RDF benchmark • Tests the reasoning capabilities of triple stores • Synthetic university data generator • 15 SPARQL 1.0 queries • Query design criteria  Input size, Selectivity, Complexity, Logical inferencing • Performance metrics  Load time, Repository size, Query runtime, Query completeness and soundness, Combined metric (runtime + completeness + soundness) 11/13/2016 18
  • 19. LUBM Queries Choke Points [FK16] # CP1 CP2 CP3 CP4 CP5 CP6 CP7 CP8 CP9 CP10 CP11 Q1 Q2 ✓ Q3 ✓ Q4 ✓ ✓ Q5 ✓ Q6 ✓ Q7 ✓ Q8 ✓ Q9 ✓ Q10 ✓ Q11 ✓ Q12 ✓ ✓ Q13 ✓ Q14 11/13/2016 19 Join Ordering Reasoning
  • 20. LUBM Queries Characteristic [SNM15] 11/13/2016 20 Queries 15 Query Forms SELECT 100.00% ASK 0.00% CONSTRUCT 0.00% DESCRIBE 0.00% Important SPARQL Constructs UNION 0.00% DISTINCT 0.00% ORDER BY 0.00% REGEX 0.00% LIMIT 0.00% OFFSET 0.00% OPTIONAL 0.00% FILTER 0.00% GROUP BY 0.00% Result Size Min 3 Max 1.39E+04 Mean 4.96E+03 S.D. 1.14E+04 BGPs Min 1 Max 1 Mean 1 S.D. 0 Triple Patterns Min 1 Max 6 Mean 3 S.D. 1.8126539 Join Vertices Min 0 Max 4 Mean 1.6 S.D. 1.4040757 Mean Join Vertices Degree Min 0 Max 5 Mean 2.0222222 S.D. 1.2999796 Mean Triple Patterns Selectivity Min 0.0003212 Max 0.432 Mean 0.01 S.D. 0.0745 Query Runtime (ms) Min 2 Max 3200 Mean 437.675 S.D. 320.34
  • 21. SP2Bench [SHM+09] • Synthetic benchmark for RDF triple stores • Synthetic data generator based on DBLP bibliographic data • 12 SPARQL 1.0 queries • Query design criteria  SELECT and ASK query forms; covers the majority of SPARQL constructs • Performance metrics  Load time, Per-query runtime, Arithmetic and geometric mean of overall query runtimes, Memory consumption 11/13/2016 21
  • 22. SP2Bench Queries Choke Points [FK16] # CP1 CP2 CP3 CP4 CP5 CP6 CP7 CP8 CP9 CP10 CP11 Q1 ✓ Q2 ✓ Q3 ✓ Q4 ✓ ✓ ✓ Q5 ✓ ✓ ✓ Q6 ✓ ✓ ✓ ✓ Q7 ✓ ✓ ✓ ✓ Q8 ✓ ✓ ✓ ✓ ✓ Q9 ✓ ✓ Q10 Q11 Q12 ✓ ✓ ✓ 11/13/2016 22 Join Ordering FILTERS Duplicate Elimination
  • 23. SP2Bench Queries Characteristic [SNM15] 11/13/2016 23 Queries 12 Query Forms SELECT 91.67% ASK 8.33% CONSTRUCT 0.00% DESCRIBE 0.00% Important SPARQL Constructs UNION 16.67% DISTINCT 41.67% ORDER BY 16.67% REGEX 0.00% LIMIT 8.33% OFFSET 8.33% OPTIONAL 25.00% FILTER 58.33% GROUP BY 0.00% Result Size Min 1 Max 4.34E+07 Mean 4.55E+06 S.D. 1.37E+07 BGPs Min 1 Max 3 Mean 1.5 S.D. 0.67419986 Triple Patterns Min 1 Max 13 Mean 5.91666667 S.D. 3.82475985 Join Vertices Min 0 Max 10 Mean 4.25 S.D. 3.79293602 Mean Join Vertices Degree Min 0 Max 9 Mean 2.41342593 S.D. 2.26080826 Mean Triple Patterns Selectivity Min 6.5597E-05 Max 0.53980613 Mean 0.22180428 S.D. 0.20831387 Query Runtime (ms) Min 7 Max 7.13E+05 Mean 2.83E+05 S.D. 5.26E+05
  • 24. Berlin SPARQL Benchmark (BSBM) [BS09] • Synthetic benchmark for RDF triple stores • E-commerce use case synthetic data generator • 20 Queries  12 SPARQL 1.0 queries for the explore and explore-and-update use cases  8 SPARQL 1.1 analytical queries for the business intelligence use case • Query design criteria  SELECT, DESCRIBE, and CONSTRUCT query forms; covers the majority of SPARQL constructs • Performance metrics  Load time, Query Mixes per Hour (QMpH), Queries per Second (QpS) 11/13/2016 24
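BSBM's two throughput metrics can be derived from measured runtimes as follows; this is a minimal sketch with illustrative numbers, not BSBM's actual test driver:

```python
def qmph(mix_runtimes_ms):
    """Query Mixes per Hour: number of executed mixes divided by elapsed hours."""
    return len(mix_runtimes_ms) / (sum(mix_runtimes_ms) / 3_600_000)

def qps(query_runtimes_ms):
    """Queries per Second for one query type: executions divided by elapsed seconds."""
    return len(query_runtimes_ms) / (sum(query_runtimes_ms) / 1000)

# Two query mixes taking 20 s and 25 s yield 160 mixes per hour.
print(qmph([20_000, 25_000]))
print(qps([100, 100, 200]))
```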
  • 25. BSBM Queries Choke Points [FK16] # CP1 CP2 CP3 CP4 CP5 CP6 CP7 CP8 CP9 CP10 CP11 Q1 ✓ ✓ ✓ ✓ Q2 ✓ Q3 ✓ ✓ ✓ Q4 ✓ ✓ ✓ ✓ Q5 ✓ ✓ ✓ ✓ Q6 ✓ ✓ Q7 ✓ ✓ ✓ Q8 ✓ ✓ ✓ Q9 ✓ Q10 ✓ ✓ ✓ ✓ Q11 ✓ Q12 ✓ 11/13/2016 25 Join Ordering FILTERS Result Ordering
  • 26. BSBM Queries Characteristic [SNM15] 11/13/2016 26 Queries 20 Query Forms SELECT 80.00% ASK 0.00% CONSTRUCT 4.00% DESCRIBE 16.00% Important SPARQL Constructs UNION 8.00% DISTINCT 24.00% ORDER BY 36.00% REGEX 0.00% LIMIT 36.00% OFFSET 4.00% OPTIONAL 52.00% FILTER 52.00% GROUP BY 0.00% Result Size Min 0 Max 31 Mean 8.312 S.D. 9.0308 BGPs Min 1 Max 5 Mean 2.8 S.D. 1.7039 Triple Patterns Min 1 Max 15 Mean 9.32 S.D. 5.18 Join Vertices Min 0 Max 6 Mean 2.88 S.D. 1.8032 Mean Join Vertices Degree Min 0 Max 4.5 Mean 3.05 S.D. 1.6375 Mean Triple Patterns Selectivity Min 9E-08 Max 0.0453 Mean 0.0105 S.D. 0.0142 Query Runtime (ms) Min 5 Max 99 Mean 9.1 S.D. 14.564
  • 27. DBpedia SPARQL Benchmark (DBPSB) [MLA+14] • Real benchmark generation framework based on  DBpedia dataset with different sizes  DBpedia query log mining • Clustering of log queries  Name variables in triple patterns  Select frequently executed queries  Remove SPARQL keywords and prefixes  Compute query similarity using Levenshtein string matching  Compute query clusters using a soft graph clustering algorithm [NS09]  Get query templates (most frequently asked and using the most SPARQL constructs) from clusters with > 5 queries  Generate any number of queries from the query templates 11/13/2016 27
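The Levenshtein-based similarity step of the pipeline above can be sketched as follows; a toy implementation over already-cleaned query strings (the real DBPSB pipeline additionally strips keywords/prefixes and clusters with BorderFlow [NS09]):

```python
def levenshtein(a, b):
    """Edit distance between two strings via the classic dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def similarity(a, b):
    """Distance normalized into [0, 1]; 1.0 means the cleaned queries are identical."""
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a, b) / max(len(a), len(b))

# Two cleaned triple patterns after variable naming (illustrative strings).
print(similarity("?v0 rdf:type ?v1", "?v0 rdfs:label ?v1"))
```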
  • 28. DBPSB Queries Features • Number of triple patterns  Tests the efficiency of join operations (CP1) • SPARQL UNION & OPTIONAL constructs  Parallel execution of UNIONs (CP5) • Solution sequences & modifiers (DISTINCT)  Efficiency of duplicate elimination (CP10) • Filter conditions and operators (FILTER, LANG, REGEX, STR)  Efficiency of engines in executing filters as early as possible (CP6) 11/13/2016 28
  • 29. DBPSB Limitations • Queries are based on 25 templates • Does not consider features such as the number of join vertices, join vertex degree, triple pattern selectivities, or query execution times • Only considers SPARQL SELECT queries • Not customizable for given use cases or the needs of an application 11/13/2016 29
  • 30. Recall: Key SPARQL Query Characteristics FEASIBLE [SNM15], WatDiv [AHO+14], LUBM [GPH05] identified: • Query forms  SELECT, DESCRIBE, ASK, CONSTRUCT • Constructs  UNION, DISTINCT, ORDER BY, REGEX, LIMIT, FILTER, OPTIONAL, GROUP BY, Negation • Features  Result size, No. of BGPs, No. of triple patterns, No. of join vertices, Mean join vertices degree, Mean triple pattern selectivity, Join selectivity, Query runtime, Unbound predicates 11/13/2016 30
  • 31. DBPSB Query Characteristics [SNM15] 11/13/2016 31 Queries from 25 templates 125 Query Forms SELECT 100% ASK 0% CONSTRUCT 0% DESCRIBE 0% Important SPARQL Constructs UNION 36% DISTINCT 100% ORDER BY 0% REGEX 4% LIMIT 0% OFFSET 0% OPTIONAL 32% FILTER 48% GROUP BY 0% Result Size Min 197 Max 4.62E+06 Mean 3.24E+05 S.D. 9.56E+05 BGPs Min 1 Max 9 Mean 2.695652 S.D. 2.438979 Triple Patterns Min 1 Max 12 Mean 4.521739 S.D. 2.79398 Join Vertices Min 0 Max 3 Mean 1.217391 S.D. 1.126399 Mean Join Vertices Degree Min 0 Max 5 Mean 1.826087 S.D. 1.435022 Mean Triple Patterns Selectivity Min 1.19E-05 Max 1 Mean 0.119288 S.D. 0.226966 Query Runtime (ms) Min 11 Max 5.40E+04 Mean 1.07E+04 S.D. 1.73E+04
  • 32. Waterloo SPARQL Diversity Test Suite (WatDiv) [AHO+14] • Synthetic benchmark  Synthetic data generator  Synthetic query generator • User-controlled data generator  Entities to include  Structuredness [DKS+11] of the dataset  Probability of entity associations  Cardinality of property associations • Query design criteria  Structural query features  Data-driven query features 11/13/2016 32
  • 33. WatDiv Query Design Criteria • Structural features  Number of triple patterns  Join vertex count  Join vertex degree • Data-driven features  Result size  (Filtered) Triple Pattern (f-TP) selectivity  BGP-Restricted f-TP selectivity  Join-Restricted f-TP selectivity 11/13/2016 33
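The data-driven features above rest on triple pattern selectivity, i.e. the fraction of dataset triples a pattern matches. A minimal sketch over an in-memory toy dataset, not WatDiv's implementation (which also handles the FILTERed and BGP/join-restricted variants):

```python
def tp_selectivity(pattern, triples):
    """Selectivity of a triple pattern: fraction of dataset triples it matches.
    Terms starting with '?' are variables and match anything."""
    def matches(term, value):
        return term.startswith("?") or term == value
    hits = sum(all(matches(t, v) for t, v in zip(pattern, triple))
               for triple in triples)
    return hits / len(triples)

# Toy dataset; real benchmarks compute this over the full RDF graph.
triples = [
    ("s1", "rdf:type", "Product"),
    ("s2", "rdf:type", "Product"),
    ("s2", "rdfs:label", "Widget"),
    ("s3", "rdf:type", "Review"),
]
print(tp_selectivity(("?s", "rdf:type", "Product"), triples))  # 0.5
```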
  • 34. WatDiv Queries Generation • Query Template Generator  User-specified number of templates  User-specified template characteristics • Query Generator  Instantiates the query templates with terms (IRIs, literals, etc.) from the RDF dataset  User-specified number of queries produced 11/13/2016 34
  • 35. WatDiv Queries Characteristic [SNM15] 11/13/2016 35 Queries templates 125 Query Forms SELECT 100.00% ASK 0.00% CONSTRUCT 0.00% DESCRIBE 0.00% Important SPARQL Constructs UNION 0.00% DISTINCT 0.00% ORDER BY 0.00% REGEX 0.00% LIMIT 0.00% OFFSET 0.00% OPTIONAL 0.00% FILTER 0.00% GROUP BY 0.00% Result Size Min 0 Max 4.17E+09 Mean 3.49E+07 S.D. 3.73E+08 BGPs Min 1 Max 1 Mean 1 S.D. 0 Triple Patterns Min 1 Max 12 Mean 5.328 S.D. 2.60823 Join Vertices Min 0 Max 5 Mean 1.776 S.D. 0.9989 Mean Join Vertices Degree Min 0 Max 7 Mean 3.62427 S.D. 1.40647 Mean Triple Patterns Selectivity Min 0 Max 0.01176 Mean 0.00494 S.D. 0.00239 Query Runtime (ms) Min 3 Max 8.82E+08 Mean 4.41E+08 S.D. 2.77E+07
  • 36. FEASIBLE: Benchmark Generation Framework [SNM15] • Customizable benchmark generation framework • Generates real benchmarks from query logs • Can be applied to any SPARQL query log • Customizable for given use cases or the needs of an application 11/13/2016 36
  • 37. FEASIBLE Queries Selection Criteria • Query forms  SELECT, DESCRIBE, ASK, CONSTRUCT • Constructs  UNION, DISTINCT, ORDER BY, REGEX, LIMIT, FILTER, OPTIONAL, GROUP BY, Negation • Features  Result size, No. of BGPs, Number of triple patterns, No. of join vertices, Mean join vertices degree, Mean triple pattern selectivity, Join selectivity, Query runtime, Unbound predicates 11/13/2016 37
  • 38. FEASIBLE: Benchmark Generation Framework • Dataset cleaning • Feature vectors and normalization • Selection of exemplars • Selection of benchmark queries 38
  • 39. Feature Vectors and Normalization 39 SELECT DISTINCT ?entita ?nome WHERE { ?entita rdf:type dbo:VideoGame . ?entita rdfs:label ?nome FILTER regex(?nome, "konami", "i") } LIMIT 100 Query Type: SELECT Results Size: 13 Basic Graph Patterns (BGPs): 1 Triple Patterns: 2 Join Vertices: 1 Mean Join Vertices Degree: 2.0 Mean triple patterns selectivity: 0.01709761619798973 UNION: No DISTINCT: Yes ORDER BY: No REGEX: Yes LIMIT: Yes OFFSET: No OPTIONAL: No FILTER: Yes GROUP BY: No Runtime (ms): 65 13 1 2 1 2 0.017 0 1 0 1 1 0 0 1 0 65 0.11 0.53 0.67 0.14 0.08 0.017 0 1 0 1 1 0 0 1 0 0.14 Feature Vector Normalized Feature Vector
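The normalization step shown above can be sketched as per-feature min-max scaling into [0, 1] across all queries of the log; the vectors below are illustrative, not the slide's exact values:

```python
def normalize(vectors):
    """Min-max scale every feature column into [0, 1] across all query vectors;
    constant columns map to 0.0."""
    cols = list(zip(*vectors))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    return [
        [0.0 if hi[j] == lo[j] else (v - lo[j]) / (hi[j] - lo[j])
         for j, v in enumerate(vec)]
        for vec in vectors
    ]

# Toy vectors: (result size, triple patterns, runtime in ms) for three queries.
print(normalize([[13, 2, 65], [1300, 4, 650], [130, 3, 6500]]))
```

Scaling each dimension keeps features with large ranges (result size, runtime) from dominating the Euclidean distances used in the clustering that follows.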
  • 40. FEASIBLE 40 Plot the feature vectors in a multidimensional space. Query (F1, F2): Q1 (0.2, 0.2), Q2 (0.5, 0.3), Q3 (0.8, 0.3), Q4 (0.9, 0.1), Q5 (0.5, 0.5), Q6 (0.2, 0.7), Q7 (0.1, 0.8), Q8 (0.13, 0.65), Q9 (0.9, 0.5), Q10 (0.1, 0.5). Suppose we need a benchmark of 3 queries.
  • 41. FEASIBLE 41 Calculate the average point.
  • 42. FEASIBLE 42 Select the point with minimum Euclidean distance to the average point. *Red is our first exemplar.
  • 43. FEASIBLE 43 Select the point that is farthest from the exemplars.
  • 45. FEASIBLE 45 Select the point that is farthest from the exemplars.
  • 47. FEASIBLE 47 Calculate the distance from Q1 to each exemplar.
  • 48. FEASIBLE 48 Assign Q1 to the minimum-distance exemplar.
  • 49. FEASIBLE 49 Repeat the process for Q2.
  • 50. FEASIBLE 50 Repeat the process for Q3.
  • 51. FEASIBLE 51 Repeat the process for Q6.
  • 52. FEASIBLE 52 Repeat the process for Q8.
  • 53. FEASIBLE 53 Repeat the process for Q9.
  • 54. FEASIBLE 54 Repeat the process for Q10.
  • 55. FEASIBLE 55 Calculate the average of each cluster.
  • 56. FEASIBLE 56 Calculate the distance of each point in a cluster to its average.
  • 57. FEASIBLE 57 Select the minimum-distance query as the final benchmark query from that cluster. Purple, i.e., Q2, is the final selected query from the yellow cluster.
  • 58. FEASIBLE 58 Select the minimum-distance query as the final benchmark query from that cluster. Purple, i.e., Q3, is the final selected query from the green cluster.
  • 59. FEASIBLE 59 Select the minimum-distance query as the final benchmark query from that cluster. Purple, i.e., Q8, is the final selected query from the brown cluster. Our benchmark queries are Q2, Q3, and Q8.
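The walkthrough above (seed with the point nearest the global average, repeatedly add the farthest point as a new exemplar, assign every query to its nearest exemplar, then keep the query nearest each cluster's average) can be condensed into a short sketch. This is a simplified reading of FEASIBLE's selection, not the reference implementation, and ties between equidistant points are broken arbitrarily:

```python
import math

def select_benchmark(points, k):
    """Pick k benchmark queries from normalized feature vectors."""
    avg = [sum(c) / len(points) for c in zip(*points)]
    exemplars = [min(points, key=lambda p: math.dist(p, avg))]
    while len(exemplars) < k:  # farthest-first selection of further exemplars
        exemplars.append(max(points,
                             key=lambda p: min(math.dist(p, e) for e in exemplars)))
    clusters = {tuple(e): [] for e in exemplars}
    for p in points:  # assign each point to its nearest exemplar
        nearest = min(exemplars, key=lambda e: math.dist(p, e))
        clusters[tuple(nearest)].append(p)
    selected = []
    for members in clusters.values():  # point nearest each cluster's average
        c_avg = [sum(c) / len(members) for c in zip(*members)]
        selected.append(min(members, key=lambda p: math.dist(p, c_avg)))
    return selected

# The slide's ten feature points, asking for a benchmark of 3 queries.
points = [[0.2, 0.2], [0.5, 0.3], [0.8, 0.3], [0.9, 0.1], [0.5, 0.5],
          [0.2, 0.7], [0.1, 0.8], [0.13, 0.65], [0.9, 0.5], [0.1, 0.5]]
print(select_benchmark(points, 3))
```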
  • 60. Comparison of Composite Error 60 FEASIBLE’s composite error is 54.9% less than that of DBPSB
  • 61. Rank-wise Ranking of Triple Stores 61 All values are in percentages  No system is the sole winner or loser for a particular rank  Virtuoso mostly lies in the higher ranks, i.e., ranks 1 and 2 (68.29%)  Fuseki mostly lies in the middle ranks, i.e., ranks 2 and 3 (65.14%)  OWLIM-SE is usually on the slower side, i.e., ranks 3 and 4 (60.86%)  Sesame is either fast or slow: rank 1 (31.71% of the queries) or rank 4 (23.14%)
  • 62. FEASIBLE(DBpedia) Queries Characteristic [SNM15] 11/13/2016 62 Queries 125 Query Forms SELECT 95.20% ASK 0.00% CONSTRUCT 4.00% DESCRIBE 0.80% Important SPARQL Constructs UNION 40.80% DISTINCT 52.80% ORDER BY 28.80% REGEX 14.40% LIMIT 38.40% OFFSET 18.40% OPTIONAL 30.40% FILTER 58.40% GROUP BY 0.80% Result Size Min 1 Max 1.41E+06 Mean 52183 S.D. 1.97E+05 BGPs Min 1 Max 14 Mean 3.176 S.D. 3.55841574 Triple Patterns Min 1 Max 18 Mean 4.88 S.D. 4.396846377 Join Vertices Min 0 Max 11 Mean 1.296 S.D. 2.39294662 Mean Join Vertices Degree Min 0 Max 11 Mean 1.44906666 S.D. 2.13246612 Mean Triple Patterns Selectivity Min 2.86693E-09 Max 1 Mean 0.14021433 7 S.D. 0.31899488 Query Runtime (ms) Min 2 Max 3.22E+04 Mean 2242.6 S.D. 6961.99191
  • 63. FEASIBLE(SWDF) Queries Characteristic [SNM15] 11/13/2016 63 Queries 125 Query Forms SELECT 92.80% ASK 2.40% CONSTRUCT 3.20% DESCRIBE 1.60% Important SPARQL Constructs UNION 32.80% DISTINCT 50.40% ORDER BY 25.60% REGEX 16.00% LIMIT 45.60% OFFSET 20.80% OPTIONAL 32.00% FILTER 29.60% GROUP BY 19.20% Result Size Min 1 Max 3.01E+05 Mean 9091.512 S.D. 4.70E+04 BGPs Min 0 Max 14 Mean 2.688 S.D. 2.812460752 Triple Patterns Min 0 Max 14 Mean 3.232 S.D. 2.76246734 Join Vertices Min 0 Max 3 Mean 0.52 S.D. 0.65500554 Mean Join Vertices Degree Min 0 Max 4 Mean 0.968 S.D. 1.09202386 Mean Triple Patterns Selectivity Min 1.06097E-05 Max 1 Mean 0.29192835 S.D. 0.32513860 1 Query Runtime (ms) Min 4 Max 4.13E+04 Mean 1308.832 S.D. 5335.44123
  • 64. Other Useful Benchmarks • Semantic Publishing Benchmark (SPB) • UniProt [RU09][UniProtKB] • YAGO (Yet Another Great Ontology) [SKW07] • Barton Library [Barton] • Linked Sensor Dataset [PHS10] • WordNet [WordNet] • Publishing TPC-H as RDF [TPC-H] • Apples and Oranges [DKS+11] 11/13/2016 64
  • 65. Summary of the centralized SPARQL querying benchmarks 11/13/2016 65
  • 66. Centralized SPARQL Querying Benchmarks Summary [SNM15] 11/13/2016 66 LUBM BSBM SP2Bench WatDiv DBPSB FEASIBLE(DBpedia) DBpediaLog FEASIBLE(SWDF) SWDFLog Queries 15 125 12 125 125 125 130466 125 64030 Basic Query Forms SELECT 100.00% 80.00% 91.67% 100.00% 100% 95.20% 97.964987 92.80% 58.7084 ASK 0.00% 0.00% 8.33% 0.00% 0% 0.00% 1.93% 2.40% 0.09% CONSTRUCT 0.00% 4.00% 0.00% 0.00% 0% 4.00% 0.09% 3.20% 0.04% DESCRIBE 0.00% 16.00% 0.00% 0.00% 0% 0.80% 0.02% 1.60% 41.17%
  • 67. 11/13/2016 67 LUBM BSBM SP2Bench WatDiv DBPSB FEASIBLE(DBpedia) DBpediaLog FEASIBLE(SWDF) SWDFLog Important SPARQL Construct s UNION 0.00% 8.00% 16.67% 0.00% 36% 40.80% 7.97% 32.80% 29.32% DISTINCT 0.00% 24.00% 41.67% 0.00% 100% 52.80% 4.16% 50.40% 34.18% ORDER BY 0.00% 36.00% 16.67% 0.00% 0% 28.80% 0.30% 25.60% 10.67% REGEX 0.00% 0.00% 0.00% 0.00% 4% 14.40% 0.21% 16.00% 0.03% LIMIT 0.00% 36.00% 8.33% 0.00% 0% 38.40% 0.40% 45.60% 1.79% OFFSET 0.00% 4.00% 8.33% 0.00% 0% 18.40% 0.03% 20.80% 0.14% OPTIONAL 0.00% 52.00% 25.00% 0.00% 32% 30.40% 20.11% 32.00% 29.52% FILTER 0.00% 52.00% 58.33% 0.00% 48% 58.40% 93.38% 29.60% 0.72% GROUP BY 0.00% 0.00% 0.00% 0.00% 0% 0.80% 7.66E-06 19.20% 1.34% Centralized SPARQL Querying Benchmarks Summary [SNM15]
  • 68. Centralized SPARQL Querying Benchmarks Summary [SNM15] 11/13/2016 68 LUBM BSBM SP2Bench WatDiv DBPSB FEASIBLE(DBpedia) DBpediaLog FEASIBLE(SWDF) SWDFLog Result Size Min 3 0 1 0 197 1 1 1 1 Max 1.39E+04 31 4.34E+07 4.17E+09 4.62E+06 1.41E+06 1.41E+06 3.01E+05 3.01E+05 Mean 4.96E+03 8.312 4.55E+06 3.49E+07 3.24E+05 52183 404.000307 9091.512 39.5068 S.D 1.14E+04 9.0308 1.37E+07 3.73E+08 9.56E+05 1.97E+05 12932.2472 4.70E+04 2208.7 BGPs Min 1 1 1 1 1 1 0 0 0 Max 1 5 3 1 9 14 14 14 14 Mean 1 2.8 1.5 1 2.695652 3.176 1.67629114 2.688 2.28603 S.D 0 1.7039 0.67419986 0 2.438979 3.55841574 1.66075812 2.81246075 2.94057 Triple Patterns Min 1 1 1 1 1 1 0 0 0 Max 6 15 13 12 12 18 18 14 14 Mean 3 9.32 5.91666667 5.328 4.521739 4.88 1.7062683 3.232 2.50928 S.D 1.812653 5.18 3.82475985 2.60823 2.79398 4.396846377 1.68639622 2.76246734 3.21393 Join Vertices Min 0 0 0 0 0 0 0 0 0 Max 4 6 10 5 3 11 11 3 3 Mean 1.6 2.88 4.25 1.776 1.217391 1.296 0.02279521 0.52 0.18076 S.D 1.40407 1.8032 3.79293602 0.9989 1.126399 2.392946625 0.23381101 0.65500554 0.45669
  • 69. Centralized SPARQL Querying Benchmarks Summary [SNM15] 11/13/2016 69 LUBM BSBM SP2Bench WatDiv DBPSB FEASIBLE(DBpedia) DBpediaLog FEASIBLE(SWDF) SWDFLog Mean Join Vertices Degree Min 0 0 0 0 0 0 0 0 0 Max 5 4.5 9 7 5 11 11 4 5 Mean 2.02222 3.05 2.4134259 3.62427 1.826087 1.449066667 0.04159183 0.968 0.37006 S.D 1.29997 1.6375 2.2608082 1.40647 1.435022 2.132466121 0.33443107 1.092023868 0.87378 Mean Triple Patterns Selectivi ty Min 0.00032 9E-08 6.559E-05 0 1.19E-05 2.86693E-09 1.261E-05 1.06097E-05 1.1E-05 Max 0.432 0.0453 0.5398061 0.01176 1 1 1 1 1 Mean 0.01 0.0105 0.2218042 0.00494 0.119288 0.140214337 0.00578652 0.29192835 0.02381 S.D 0.0745 0.0142 0.2083138 0.00239 0.226966 0.318994887 0.03669906 0.325138601 0.07857 Query Runtime Min 2 5 7 3 11 2 1 4 3 Max 3200 99 7.13E+05 8.82E+08 5.40E+04 3.22E+04 5.60E+04 4.13E+04 4.13E+04 Mean 437.675 9.1 2.83E+05 4.41E+08 1.07E+04 2242.6 30.4185995 1308.832 16.1632 S.D 320.34 14.564 5.26E+05 2.77E+07 1.73E+04 6961.991912 702.518249 5335.441231 249.674
  • 70. Federated SPARQL Querying Benchmarks 11/13/2016 70
  • 71. Federated Query • Return the party membership and news pages about all US presidents.  One source: party memberships, US presidents  Another source: US presidents, news pages 71 Computation of the results requires data from both sources
  • 72. Federated SPARQL Query Processing S1 S2 S3 S4 RDF RDF RDF RDF Parsing/Rewriting Source Selection Federator Optimizer Integrator Rewrite query and get Individual Triple Patterns Identify capable/relevant sources Generate optimized query Execution Plan Integrate sub- queries results Execute sub- queries Federation Engine 72
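Triple-pattern-wise source selection, the second stage of the pipeline above, is commonly implemented by sending a SPARQL ASK per triple pattern to every endpoint. A toy sketch with endpoints simulated as in-memory triple sets; the source names and data are invented for illustration:

```python
# Endpoints simulated as in-memory triple sets (names and data are invented).
sources = {
    "dbpedia": {("Obama", "dbpedia:party", "Democratic")},
    "nytimes": {("nyt1", "nyt:topicPage", "page1"),
                ("nyt1", "owl:sameAs", "Obama")},
}

def ask(source, pattern):
    """Simulated SPARQL ASK: does any triple in the source match the pattern?"""
    def m(term, value):
        return term.startswith("?") or term == value
    return any(all(m(t, v) for t, v in zip(pattern, triple))
               for triple in sources[source])

def select_sources(patterns):
    """Triple-pattern-wise source selection: for each pattern, the set of
    sources whose ASK request succeeds."""
    return {p: {s for s in sources if ask(s, p)} for p in patterns}

print(select_sources([("?x", "nyt:topicPage", "?page"),
                      ("?p", "dbpedia:party", "?party")]))
```

The ASK-request count and the size of the selected source sets are exactly the source-selection metrics listed on the performance-metrics slide below in this deck.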
  • 73. SPARQL Query Federation Approaches • SPARQL Endpoint Federation (SEF) • Linked Data Federation (LDF) • Hybrid of SEF+LDF 11/13/2016 73
  • 74. SPLODGE [SP+12] • Federated benchmark generation tool • Query design criteria  Query form  Join type  Result modifiers: DISTINCT, LIMIT, OFFSET, ORDER BY  Variable triple patterns  Triple pattern joins  Cross-product triple patterns  Number of sources  Number of join vertices  Query selectivity • Non-conjunctive queries that make use of SPARQL UNION or OPTIONAL are not considered 11/13/2016 74
  • 75. FedBench [FB+11] • Based on 9 real interconnected datasets  KEGG, DrugBank, ChEBI from life sciences  DBpedia, GeoNames, Jamendo, SWDF, NYT, LMDB from cross domain  Vary in structuredness and size • Four sets of queries  7 life sciences queries  7 cross domain queries  11 Linked Data queries  14 queries from SP2Bench 11/13/2016 75
  • 76. FedBench Queries Characteristic 11/13/2016 76 Queries 25 Query Forms SELECT 100.00% ASK 0.00% CONSTRUCT 0.00% DESCRIBE 0.00% Important SPARQL Constructs UNION 12% DISTINCT 0.00% ORDER BY 0.00% REGEX 0.00% LIMIT 0.00% OFFSET 0.00% OPTIONAL 4% FILTER 4% GROUP BY 0.00% Result Size Min 1 Max 9054 Mean 529 S.D. 1764 BGPs Min 1 Max 2 Mean 1.16 S.D. 0.37 Triple Patterns Min 2 Max 7 Mean 4 S.D. 1.25 Join Vertices Min 0 Max 5 Mean 2.52 S.D. 1.26 Mean Join Vertices Degree Min 0 Max 3 Mean 2.14 S.D. 0.56 Mean Triple Patterns Selectivity Min 0.001 Max 1 Mean 0.05 S.D. 0.092 Query Runtime (ms) Min 50 Max 1.2E+4 Mean 1987 S.D. 3950
  • 77. LargeRDFBench [LB+16] • 32 Queries  14 simple  10 complex  8 large data • 14 interlinked datasets 77 Cross Domain: LinkedMDB, DBpedia, New York Times, SW Dog Food, Jamendo, Geonames; Life Sciences: KEGG, DrugBank, ChEBI, Affymetrix; Large Data: Linked TCGA-M, Linked TCGA-E, Linked TCGA-A. Interlinking predicates include basedNear, owl:sameAs, x-geneid, country/ethnicity/race, keggCompoundId, and bcr_patient_barcode, with link set sizes ranging from 1.3k to 251.3k
  • 79. LargeRDFBench Queries Properties • 14 Simple  2-7 triple patterns  Subset of SPARQL clauses  Query execution time around 2 seconds on avg. • 10 Complex  8-13 triple patterns  Use more SPARQL clauses  Query execution time up to 10 min • 8 Large Data  Minimum 80459 results  Large intermediate results  Query execution time in hours 11/13/2016 79
  • 80. LargeRDFBench Queries Characteristic 11/13/2016 80 Queries 32 Query Forms SELECT 100.00% ASK 0.00% CONSTRUCT 0.00% DESCRIBE 0.00% Important SPARQL Constructs UNION 18.75% DISTINCT 28.21% ORDER BY 9.37% REGEX 3.12% LIMIT 12.5% OFFSET 0.00% OPTIONAL 25% FILTER 31.25% GROUP BY 0.00% Result Size Min 1 Max 3.0E+5 Mean 5.9E+4 S.D. 1.1E+5 BGPs Min 1 Max 2 Mean 1.43 S.D. 0.5 Triple Patterns Min 2 Max 12 Mean 6.6 S.D. 2.6 Join Vertices Min 0 Max 6 Mean 3.43 S.D. 1.36 Mean Join Vertices Degree Min 0 Max 6 Mean 2.56 S.D. 0.76 Mean Triple Patterns Selectivity Min 0.001 Max 1 Mean 0.10 S.D. 0.14 Query Runtime (ms) Min 159 Max >1hr Mean Undefined S.D. Undefined
  • 82. Performance Metrics • Efficient source selection in terms of • Total triple pattern-wise sources selected • Total number of SPARQL ASK requests used during source selection • Source selection time • Query execution time • Result completeness and correctness • Number of remote requests during query execution • Index compression ratio (1 − index size / data dump size) • Number of intermediate results 11/13/2016 82
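The index compression ratio above is a one-line computation; the sizes below are illustrative:

```python
def index_compression_ratio(index_size, dump_size):
    """1 - index size / data dump size: closer to 1 means a more compact index."""
    return 1 - index_size / dump_size

# A 5 MB source-selection index over a 500 MB data dump (illustrative sizes).
print(index_compression_ratio(5 * 2**20, 500 * 2**20))
```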
  • 83. Future Directions • Micro benchmarking • Synthetic benchmark generation  Synthetic data that is like real data  Synthetic queries that are like real queries • Customizable and flexible benchmark generation • Fits user needs • Fits the current use case • What are the most important choke points for SPARQL querying benchmarks? How are they related to query performance? 11/13/2016 83
  • 84. References • [L97] Charles Levine. TPC-C: The OLTP Benchmark. In SIGMOD – Industrial Session, 1997. • [GPH05] Y. Guo, Z. Pan, and J. Heflin. LUBM: A Benchmark for OWL Knowledge Base Systems. Journal of Web Semantics, 3(2–3):158–182, 2005. • [SHM+09] M. Schmidt, T. Hornung, M. Meier, C. Pinkel, G. Lausen. SP2Bench: A SPARQL Performance Benchmark. In Semantic Web Information Management, 2009. • [BS09] C. Bizer and A. Schultz. The Berlin SPARQL Benchmark. Int. J. Semantic Web and Inf. Sys., 5(2), 2009. • [BSBM] Berlin SPARQL Benchmark (BSBM) Specification – V3.1. http://wifo5-3.informatik.unimannheim.de/bizer/berlinsparqlbenchmark/spec/index.html • [RU09] N. Redaschi and UniProt Consortium. UniProt in RDF: Tackling Data Integration and Distributed Annotation with the Semantic Web. In Biocuration Conference, 2009. 11/13/2016 84
  • 85. References • [UniProtKB] UniProtKB Queries. http://www.uniprot.org/help/query-fields • [SKW07] F. M. Suchanek, G. Kasneci and G. Weikum. YAGO: A Core of Semantic Knowledge Unifying WordNet and Wikipedia. In WWW, 2007. • [Barton] The MIT Barton Library dataset. http://simile.mit.edu/rdf-test-data/ • [PHS10] H. Patni, C. Henson, and A. Sheth. Linked Sensor Data. 2010. • [TPC-H] The TPC-H Homepage. http://www.tpc.org/tpch/ • [WordNet] WordNet: A Lexical Database for English. http://wordnet.princeton.edu/ • [MLA+14] M. Morsey, J. Lehmann, S. Auer, A-C. Ngonga Ngomo. DBpedia SPARQL Benchmark. • [SP+12] O. Görlitz, M. Thimm, and S. Staab. SPLODGE: Systematic Generation of SPARQL Benchmark Queries for Linked Open Data. In ISWC, 2012. • [BNE14] P. Boncz, T. Neumann, O. Erling. TPC-H Analyzed: Hidden Messages and Lessons Learned from an Influential Benchmark. In TPCTC 2013, Revised Selected Papers. 11/13/2016 85
  • 86. References • [NS09] A-C. Ngonga Ngomo and D. Schumacher. BorderFlow: A Local Graph Clustering Algorithm for Natural Language Processing. In CICLing, 2009. • [AHO+14] G. Aluç, O. Hartig, T. Özsu, K. Daudjee. Diversified Stress Testing of RDF Data Management Systems. In ISWC, 2014. • [SNM15] M. Saleem, Q. Mehmood, and A-C. Ngonga Ngomo. FEASIBLE: A Feature-Based SPARQL Benchmark Generation Framework. In ISWC, 2015. • [DKS+11] S. Duan, A. Kementsietsidis, K. Srinivas, and O. Udrea. Apples and Oranges: A Comparison of RDF Benchmarks and Real RDF Datasets. In SIGMOD, 2011. • [FK16] I. Fundulaki, A. Kementsietsidis. Assessing the Performance of RDF Engines: Discussing RDF Benchmarks. Tutorial at ESWC 2016. • [FB+11] M. Schmidt, et al. FedBench: A Benchmark Suite for Federated Semantic Data Query Processing. In ISWC, 2011. • [LB+16] M. Saleem, A. Hasnain, A-C. Ngonga Ngomo. LargeRDFBench: A Billion Triples Benchmark for SPARQL Query Federation. Submitted to the Journal of Web Semantics. 11/13/2016 86
  • 87. Thanks {lastname}@informatik.uni-leipzig.de AKSW, University of Leipzig, Germany 11/13/2016 87 This work was supported by grants from the BMWi project SAKE and by the EU H2020 Framework Programme through the project HOBBIT (GA no. 688227).