SlideShare a Scribd company logo
FEASIBLE: A Feature-Based SPARQL Benchmark
Generation Framework
Muhammad Saleem1, Qaiser Mehmood2, Axel-Cyrille Ngonga Ngomo1
http://feasible.aksw.org/
1Agile Knowledge Engineering and Semantic Web (AKSW), University of Leipzig, Germany
2Insight Center for Data Analytics, National University of Ireland, Galway
International Semantic Web Conference, Bethlehem, USA, 2015
10/14/2015 1
Triple Stores Benchmarks
• Synthetic Benchmarks
• Make use of the synthetic queries and/or data
• Benchmarks of different data sizes possible
• Suitable to test the scalability
• Often fail to reflect the reality
• For example, LUBM, SP2Bench, BSBM, WatDiv etc.
• Queries Log Benchmarks
• Make use of the real queries from queries log
• Can be more close to the reality
• Can be used with different data sizes
• Scalability can be tested
• For example, DBPSB, FEASIBLE
10/14/2015 2
DBpedia SPARQL Benchmark
• Based on real DBpedia queries log
• Benchmarks of different data sizes possible
• Suitable to test the scalability
• Only Considers SPARQL SELECT
• Does not consider Important query features
• For example, number of join vertices, triple patterns selectivities
• Not customizable for given use cases or needs of an application
10/14/2015 3
FEASIBLE SPARQL Benchmark
• Can be applied to any SPARQL queries log
• Considers SPARQL SELECT, ASK, DESCRIBE, CONSTRUCT
• Considers Important query features
• For example, number of join vertices, triple patterns selectivities,
query runtime, resultset size, number of BGPs, Mean join vertices
degree, number of triple patterns etc.
• Customizable for given use cases or needs of an application
10/14/2015 4
FEASIBLE SPARQL Benchmark
• Dataset cleaning
• Feature vectors and normalization
• Selection of exemplars
• Selection of benchmark queries
10/14/2015 5
Dataset Cleaning
• Remove syntactically incorrect queries
• Remove zero result size queries
• It is an optional step
• Not of theoretical necessity
• Leads to practically reliable benchmarks
10/14/2015 6
Feature Vectors and Normalization
SELECT DISTINCT ?entita ?nome
WHERE
{
?entita rdf:type dbo:VideoGame .
?entita rdfs:label ?nome
FILTER regex(?nome, "konami", "i")
}
LIMIT 100
Query Type: SELECT
Results Size: 13
Basic Graph Patterns (BGPs): 1
Triple Patterns: 2
Join Vertices: 1
Mean Join Vertices Degree: 2.0
Mean triple patterns selectivity: 0.01709761619798973
UNION: No
DISTINCT: Yes
ORDER BY: No
REGEX: Yes
LIMIT: Yes
OFFSET: No
OPTIONAL: No
FILTER: Yes
GROUP BY: No
Runtime (ms): 65
13 1 2 1 2 0.017 0 1 0 1 1 0 0 1 0 65
0.11 0.53 0.67 0.14 0.08 0.017 0 1 0 1 1 0 0 1 0 0.14
Feature Vector
Normalized Feature Vector
10/14/2015 7
Selection of exemplars
Q1
Q2 Q3
Q4
Q5
Q6
Q7
Q8
Q9Q10
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
10/14/2015 8
Plot feature vectors in a multidimensional space
Query Feature 1 Feature 2
Q1 0.2 0.2
Q2 0.5 0.3
Q3 0.8 0.3
Q4 0.9 0.1
Q5 0.5 0.5
Q6 0.2 0.7
Q7 0.1 0.8
Q8 0.13 0.65
Q9 0.9 0.5
Q10 0.1 0.5
Suppose we need a benchmark of 3 queries
Selection of exemplars
10/14/2015 9
Calculate average point
Q1
Q2 Q3
Q4
Q5
Q6
Q7
Q8
Q9Q10
Avg.
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Selection of exemplars
Q1
Q2 Q3
Q4
Q5
Q6
Q7
Q8
Q9Q10
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
10/14/2015 10
Select point of minimum Euclidean distance to avg. point
*Red is our first exemplar
Selection of exemplars
Q1
Q2 Q3
Q4
Q5
Q6
Q7
Q8
Q9Q10
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
10/14/2015 11
Select point that is farthest to exemplars
Selection of exemplars
Q1
Q2 Q3
Q4
Q5
Q6
Q7
Q8
Q9Q10
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
10/14/2015 12
Selection of exemplars
Q1
Q2 Q3
Q4
Q5
Q6
Q7
Q8
Q9Q10
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
10/14/2015 13
Select point that is farthest to exemplars
Selection of exemplars
Q1
Q2 Q3
Q4
Q5
Q6
Q7
Q8
Q9Q10
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
10/14/2015 14
Selection of Benchmark Queries
10/14/2015 15
Calculate distance from Q1 to each exemplars
Q1
Q2 Q3
Q4
Q5
Q6
Q7
Q8
Q9Q10
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Selection of Benchmark Queries
10/14/2015 16
Q1
Q2 Q3
Q4
Q5
Q6
Q7
Q8
Q9Q10
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Assign Q1 to the minimum distance exemplar
Selection of Benchmark Queries
Q1
Q2 Q3
Q4
Q5
Q6
Q7
Q8
Q9Q10
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
10/14/2015 17
Repeat the process for Q2
Selection of Benchmark Queries
Q1
Q2 Q3
Q4
Q5
Q6
Q7
Q8
Q9Q10
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
10/14/2015 18
Repeat the process for Q3
Selection of Benchmark Queries
10/14/2015 19
Repeat the process for Q6
Q1
Q2 Q3
Q4
Q5
Q6
Q7
Q8
Q9Q10
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Selection of Benchmark Queries
Q1
Q2 Q3
Q4
Q5
Q6
Q7
Q8
Q9Q10
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
10/14/2015 20
Repeat the process for Q8
Q1
Q2 Q3
Q4
Q5
Q6
Q7
Q8
Q9Q10
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Selection of Benchmark Queries
10/14/2015 21
Repeat the process for Q9
Q1
Q2 Q3
Q4
Q5
Q6
Q7
Q8
Q9Q10
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Selection of Benchmark Queries
10/14/2015 22
Repeat the process for Q10
Selection of Benchmark Queries
10/14/2015 23
Calculate Average across each cluster
Q1
Q2 Q3
Q4
Q5
Q6
Q7
Q8
Q9Q10
Avg.
Avg.
Avg.
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Selection of Benchmark Queries
10/14/2015 24
Calculate distance of each point in cluster to the average
Q1
Q2 Q3
Q4
Q5
Q6
Q7
Q8
Q9Q10
Avg.
Avg.
Avg.
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Selection of Benchmark Queries
10/14/2015 25
Select minimum distance query as the final benchmark
query from that cluster
Q1
Q2 Q3
Q4
Q5
Q6
Q7
Q8
Q9Q10
Avg.
Avg.
Avg.
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Black, i.e., Q2 is the final selected query from yellow cluster
Selection of Benchmark Queries
10/14/2015 26
Select minimum distance query as the final benchmark
query from that cluster
Q1
Q2 Q3
Q4
Q5
Q6
Q7
Q8
Q9Q10
Avg.
Avg.
Avg.
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Black, i.e., Q8 is the final selected query from brown cluster
Selection of Benchmark Queries
10/14/2015 27
Select minimum distance query as the final benchmark
query from that cluster
Q1
Q2 Q3
Q4
Q5
Q6
Q7
Q8
Q9Q10
Avg.
Avg.
Avg.
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Black, i.e., Q3 is the final selected query from green cluster
Our benchmark queries are Q2, Q3, and Q8
Experimental Setup
• Composite Error Estimation
• L is the query log, B is the benchmark and K is the set of all features
10/14/2015 28
Experimental Setup
• Virtuoso Open-Source Edition version 7.2
• NumberOfBuffers = 680000, MaxDirtyBuffers = 500000
• Sesame Version 2.7.8
• Tomcat 7 as HTTP interface and native storage layout.
• Set the spoc, posc, opsc indices to those specified in the native storage configuration
• The Java heap size was set to 6GB
• Jena-TDB (Fuseki) Version 2.0
• Java heap size set to 6GB
• OWLIM-SE Version 6.1
• Tomcat 7.0 as HTTP interface
• Set the entity index size to 45,000,000 and enabled the predicate list
• Rule set was empty and the Java heap size was set to 6GB.
• We configured all triple stores to use 6GB of memory and used default values
otherwise.
10/14/2015 29
Comparison of Composite Error
10/14/2015 30
FEASIBLE’s composite error is 54.9% less than DBPSB
Comparison of Triple Stores: QpS
10/14/2015 31
0
50
100
150
200
250
Sesame
Virtuoso
OWLIM-SE
Fuseki
Sesame
Virtuoso
OWLIM-SE
Fuseki
SWDF DBpedia
QpS
0
0.5
1
1.5
2
2.5
3
Sesame
Virtuoso
OWLIM-SE
Fuseki
Sesame
Virtuoso
OWLIM-SE
Fuseki
SWDF DBpedia
QpS
0
10
20
30
40
50
60
70
Sesame
Virtuoso
OWLIM-SE
Fuseki
Sesame
Virtuoso
OWLIM-SE
Fuseki
SWDF DBpedia
QpS
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
2
Sesame
Virtuoso
OWLIM-SE
Fuseki
Sesame
Virtuoso
OWLIM-SE
Fuseki
SWDF DBpedia
QpS
SPARQL ASK SPARQL CONSTRUCT
SPARQL DESCRIBE SPARQL SELECT
Comparison of Triple Stores: Mix Queries
10/14/2015 32
0
5
10
15
20
25
30
35
40
Sesame Virtuoso OWLIM-SE Fuseki
SWDF
QMpH
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
2
Sesame Virtuoso OWLIM-SE Fuseki
DBpedia
QMpH
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
2
Sesame Virtuoso OWLIM-SE Fuseki
SWDF
QpS
0
0.01
0.02
0.03
0.04
0.05
0.06
0.07
0.08
0.09
0.1
Sesame Virtuoso OWLIM-SE Fuseki
DBpedia
QpS
Rank-wise Ranking of Triple Stores
10/14/2015 33
All values are in percentages
• None of the system is sole winner or loser for a particular rank
• Virtuoso mostly lies in the higher ranks, i.e., rank 1 and 2 (68.29%)
• Fuseki mostly in the middle ranks, i.e., rank 2 and 3 (65.14%)
• OWLIM-SE usually on the slower side, i.e., rank 3 and 4 (60.86 %)
• Sesame is either fast or slow. Rank 1 (31.71% of the queries) and rank 4 (23.14%)
Thanks
saleem@informatik.uni-Leipzig.de
Try Yourself
http://feasible.aksw.org/
10/14/2015 34

More Related Content

Similar to FEASIBLE-Benchmark-Framework-ISWC2015

open sta testing Certification
open sta testing Certificationopen sta testing Certification
open sta testing Certification
Vskills
 
SQCFramework: SPARQL Query containment Benchmark Generation Framework
SQCFramework: SPARQL Query containment  Benchmark Generation Framework SQCFramework: SPARQL Query containment  Benchmark Generation Framework
SQCFramework: SPARQL Query containment Benchmark Generation Framework
Muhammad Saleem
 
My sql cluster case study apr16
My sql cluster case study apr16My sql cluster case study apr16
My sql cluster case study apr16
Sumi Ryu
 
Postgre sql vs oracle
Postgre sql vs oraclePostgre sql vs oracle
Postgre sql vs oracle
Jacques Kostic
 
10 Emerging Test Frameworks for Cross Browser Testing
10 Emerging Test Frameworks for Cross Browser Testing10 Emerging Test Frameworks for Cross Browser Testing
10 Emerging Test Frameworks for Cross Browser Testing
Perfecto by Perforce
 
Automating Speed: A Proven Approach to Preventing Performance Regressions in ...
Automating Speed: A Proven Approach to Preventing Performance Regressions in ...Automating Speed: A Proven Approach to Preventing Performance Regressions in ...
Automating Speed: A Proven Approach to Preventing Performance Regressions in ...
HostedbyConfluent
 
Using Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning ModelsUsing Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning Models
Scott Clark
 
Using Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning ModelsUsing Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning Models
SigOpt
 
Elements of a Test Framework
Elements of a Test FrameworkElements of a Test Framework
Elements of a Test Framework
SmartBear
 
Ceph QoS: How to support QoS in distributed storage system - Taewoong Kim
Ceph QoS: How to support QoS in distributed storage system - Taewoong KimCeph QoS: How to support QoS in distributed storage system - Taewoong Kim
Ceph QoS: How to support QoS in distributed storage system - Taewoong Kim
Ceph Community
 
Performance Test Driven Development with Oracle Coherence
Performance Test Driven Development with Oracle CoherencePerformance Test Driven Development with Oracle Coherence
Performance Test Driven Development with Oracle Coherence
aragozin
 
Road to A/B testing - Alexey Vasiliev (ENG) | Ruby Meditation 25
Road to A/B testing - Alexey Vasiliev (ENG) | Ruby Meditation 25Road to A/B testing - Alexey Vasiliev (ENG) | Ruby Meditation 25
Road to A/B testing - Alexey Vasiliev (ENG) | Ruby Meditation 25
Ruby Meditation
 
The state of SQL-on-Hadoop in the Cloud
The state of SQL-on-Hadoop in the CloudThe state of SQL-on-Hadoop in the Cloud
The state of SQL-on-Hadoop in the Cloud
DataWorks Summit/Hadoop Summit
 
VMworld 2013: Strategic Reasons for Classifying Workloads for Tier 1 Virtuali...
VMworld 2013: Strategic Reasons for Classifying Workloads for Tier 1 Virtuali...VMworld 2013: Strategic Reasons for Classifying Workloads for Tier 1 Virtuali...
VMworld 2013: Strategic Reasons for Classifying Workloads for Tier 1 Virtuali...VMworld
 
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
DataKitchen
 
Classifiers Optimization Using Swarm Algorithms
Classifiers Optimization Using Swarm AlgorithmsClassifiers Optimization Using Swarm Algorithms
Classifiers Optimization Using Swarm Algorithms
Aboul Ella Hassanien
 
Demo how to efficiently evaluate nf-vi performance by leveraging opnfv testi...
Demo  how to efficiently evaluate nf-vi performance by leveraging opnfv testi...Demo  how to efficiently evaluate nf-vi performance by leveraging opnfv testi...
Demo how to efficiently evaluate nf-vi performance by leveraging opnfv testi...
OPNFV
 
Improving Fault Localization for Simulink Models using Search-Based Testing a...
Improving Fault Localization for Simulink Models using Search-Based Testing a...Improving Fault Localization for Simulink Models using Search-Based Testing a...
Improving Fault Localization for Simulink Models using Search-Based Testing a...
Lionel Briand
 
Pivotal - Advanced Analytics for Telecommunications
Pivotal - Advanced Analytics for Telecommunications Pivotal - Advanced Analytics for Telecommunications
Pivotal - Advanced Analytics for Telecommunications
Hortonworks
 
SQL 2016 Query Store: Et si mes queries m'étaient contées...
SQL 2016 Query Store: Et si mes queries m'étaient contées...SQL 2016 Query Store: Et si mes queries m'étaient contées...
SQL 2016 Query Store: Et si mes queries m'étaient contées...
Isabelle Van Campenhoudt
 

Similar to FEASIBLE-Benchmark-Framework-ISWC2015 (20)

open sta testing Certification
open sta testing Certificationopen sta testing Certification
open sta testing Certification
 
SQCFramework: SPARQL Query containment Benchmark Generation Framework
SQCFramework: SPARQL Query containment  Benchmark Generation Framework SQCFramework: SPARQL Query containment  Benchmark Generation Framework
SQCFramework: SPARQL Query containment Benchmark Generation Framework
 
My sql cluster case study apr16
My sql cluster case study apr16My sql cluster case study apr16
My sql cluster case study apr16
 
Postgre sql vs oracle
Postgre sql vs oraclePostgre sql vs oracle
Postgre sql vs oracle
 
10 Emerging Test Frameworks for Cross Browser Testing
10 Emerging Test Frameworks for Cross Browser Testing10 Emerging Test Frameworks for Cross Browser Testing
10 Emerging Test Frameworks for Cross Browser Testing
 
Automating Speed: A Proven Approach to Preventing Performance Regressions in ...
Automating Speed: A Proven Approach to Preventing Performance Regressions in ...Automating Speed: A Proven Approach to Preventing Performance Regressions in ...
Automating Speed: A Proven Approach to Preventing Performance Regressions in ...
 
Using Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning ModelsUsing Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning Models
 
Using Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning ModelsUsing Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning Models
 
Elements of a Test Framework
Elements of a Test FrameworkElements of a Test Framework
Elements of a Test Framework
 
Ceph QoS: How to support QoS in distributed storage system - Taewoong Kim
Ceph QoS: How to support QoS in distributed storage system - Taewoong KimCeph QoS: How to support QoS in distributed storage system - Taewoong Kim
Ceph QoS: How to support QoS in distributed storage system - Taewoong Kim
 
Performance Test Driven Development with Oracle Coherence
Performance Test Driven Development with Oracle CoherencePerformance Test Driven Development with Oracle Coherence
Performance Test Driven Development with Oracle Coherence
 
Road to A/B testing - Alexey Vasiliev (ENG) | Ruby Meditation 25
Road to A/B testing - Alexey Vasiliev (ENG) | Ruby Meditation 25Road to A/B testing - Alexey Vasiliev (ENG) | Ruby Meditation 25
Road to A/B testing - Alexey Vasiliev (ENG) | Ruby Meditation 25
 
The state of SQL-on-Hadoop in the Cloud
The state of SQL-on-Hadoop in the CloudThe state of SQL-on-Hadoop in the Cloud
The state of SQL-on-Hadoop in the Cloud
 
VMworld 2013: Strategic Reasons for Classifying Workloads for Tier 1 Virtuali...
VMworld 2013: Strategic Reasons for Classifying Workloads for Tier 1 Virtuali...VMworld 2013: Strategic Reasons for Classifying Workloads for Tier 1 Virtuali...
VMworld 2013: Strategic Reasons for Classifying Workloads for Tier 1 Virtuali...
 
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
 
Classifiers Optimization Using Swarm Algorithms
Classifiers Optimization Using Swarm AlgorithmsClassifiers Optimization Using Swarm Algorithms
Classifiers Optimization Using Swarm Algorithms
 
Demo how to efficiently evaluate nf-vi performance by leveraging opnfv testi...
Demo  how to efficiently evaluate nf-vi performance by leveraging opnfv testi...Demo  how to efficiently evaluate nf-vi performance by leveraging opnfv testi...
Demo how to efficiently evaluate nf-vi performance by leveraging opnfv testi...
 
Improving Fault Localization for Simulink Models using Search-Based Testing a...
Improving Fault Localization for Simulink Models using Search-Based Testing a...Improving Fault Localization for Simulink Models using Search-Based Testing a...
Improving Fault Localization for Simulink Models using Search-Based Testing a...
 
Pivotal - Advanced Analytics for Telecommunications
Pivotal - Advanced Analytics for Telecommunications Pivotal - Advanced Analytics for Telecommunications
Pivotal - Advanced Analytics for Telecommunications
 
SQL 2016 Query Store: Et si mes queries m'étaient contées...
SQL 2016 Query Store: Et si mes queries m'étaient contées...SQL 2016 Query Store: Et si mes queries m'étaient contées...
SQL 2016 Query Store: Et si mes queries m'étaient contées...
 

More from Muhammad Saleem

How Representative Is a SPARQL Benchmark? An Analysis of RDF Triplestore Benc...
How Representative Is a SPARQL Benchmark? An Analysis of RDF Triplestore Benc...How Representative Is a SPARQL Benchmark? An Analysis of RDF Triplestore Benc...
How Representative Is a SPARQL Benchmark? An Analysis of RDF Triplestore Benc...
Muhammad Saleem
 
LargeRDFBench
LargeRDFBenchLargeRDFBench
LargeRDFBench
Muhammad Saleem
 
Extended LargeRDFBench
Extended LargeRDFBenchExtended LargeRDFBench
Extended LargeRDFBench
Muhammad Saleem
 
CostFed: Cost-Based Query Optimization for SPARQL Endpoint Federation
CostFed: Cost-Based Query Optimization for SPARQL Endpoint FederationCostFed: Cost-Based Query Optimization for SPARQL Endpoint Federation
CostFed: Cost-Based Query Optimization for SPARQL Endpoint Federation
Muhammad Saleem
 
Question Answering Over Linked Data: What is Difficult to Answer? What Affect...
Question Answering Over Linked Data: What is Difficult to Answer? What Affect...Question Answering Over Linked Data: What is Difficult to Answer? What Affect...
Question Answering Over Linked Data: What is Difficult to Answer? What Affect...
Muhammad Saleem
 
Federated Query Formulation and Processing Through BioFed
Federated Query Formulation and Processing Through BioFedFederated Query Formulation and Processing Through BioFed
Federated Query Formulation and Processing Through BioFed
Muhammad Saleem
 
Fine-grained Evaluation of SPARQL Endpoint Federation Systems
Fine-grained Evaluation of SPARQL Endpoint Federation SystemsFine-grained Evaluation of SPARQL Endpoint Federation Systems
Fine-grained Evaluation of SPARQL Endpoint Federation Systems
Muhammad Saleem
 
Efficient source selection for sparql endpoint federation
Efficient source selection for sparql endpoint federationEfficient source selection for sparql endpoint federation
Efficient source selection for sparql endpoint federation
Muhammad Saleem
 
LSQ: The Linked SPARQL Queries Dataset
LSQ: The Linked SPARQL Queries DatasetLSQ: The Linked SPARQL Queries Dataset
LSQ: The Linked SPARQL Queries Dataset
Muhammad Saleem
 
Federated SPARQL Query Processing ISWC2015 Tutorial
Federated SPARQL Query Processing ISWC2015 TutorialFederated SPARQL Query Processing ISWC2015 Tutorial
Federated SPARQL Query Processing ISWC2015 Tutorial
Muhammad Saleem
 
SAFE: Policy Aware SPARQL Query Federation Over RDF Data Cubes
SAFE: Policy Aware SPARQL Query Federation Over RDF Data CubesSAFE: Policy Aware SPARQL Query Federation Over RDF Data Cubes
SAFE: Policy Aware SPARQL Query Federation Over RDF Data Cubes
Muhammad Saleem
 
Federated SPARQL query processing over the Web of Data
Federated SPARQL query processing over the Web of DataFederated SPARQL query processing over the Web of Data
Federated SPARQL query processing over the Web of Data
Muhammad Saleem
 
DAW: Duplicate-AWare Federated Query Processing over the Web of Data
DAW: Duplicate-AWare Federated Query Processing over the Web of DataDAW: Duplicate-AWare Federated Query Processing over the Web of Data
DAW: Duplicate-AWare Federated Query Processing over the Web of Data
Muhammad Saleem
 
Fostering Serendipity through Big Linked Data
Fostering Serendipity through Big Linked DataFostering Serendipity through Big Linked Data
Fostering Serendipity through Big Linked Data
Muhammad Saleem
 
Linked Cancer Genome Atlas Database
Linked Cancer Genome Atlas DatabaseLinked Cancer Genome Atlas Database
Linked Cancer Genome Atlas Database
Muhammad Saleem
 

More from Muhammad Saleem (15)

How Representative Is a SPARQL Benchmark? An Analysis of RDF Triplestore Benc...
How Representative Is a SPARQL Benchmark? An Analysis of RDF Triplestore Benc...How Representative Is a SPARQL Benchmark? An Analysis of RDF Triplestore Benc...
How Representative Is a SPARQL Benchmark? An Analysis of RDF Triplestore Benc...
 
LargeRDFBench
LargeRDFBenchLargeRDFBench
LargeRDFBench
 
Extended LargeRDFBench
Extended LargeRDFBenchExtended LargeRDFBench
Extended LargeRDFBench
 
CostFed: Cost-Based Query Optimization for SPARQL Endpoint Federation
CostFed: Cost-Based Query Optimization for SPARQL Endpoint FederationCostFed: Cost-Based Query Optimization for SPARQL Endpoint Federation
CostFed: Cost-Based Query Optimization for SPARQL Endpoint Federation
 
Question Answering Over Linked Data: What is Difficult to Answer? What Affect...
Question Answering Over Linked Data: What is Difficult to Answer? What Affect...Question Answering Over Linked Data: What is Difficult to Answer? What Affect...
Question Answering Over Linked Data: What is Difficult to Answer? What Affect...
 
Federated Query Formulation and Processing Through BioFed
Federated Query Formulation and Processing Through BioFedFederated Query Formulation and Processing Through BioFed
Federated Query Formulation and Processing Through BioFed
 
Fine-grained Evaluation of SPARQL Endpoint Federation Systems
Fine-grained Evaluation of SPARQL Endpoint Federation SystemsFine-grained Evaluation of SPARQL Endpoint Federation Systems
Fine-grained Evaluation of SPARQL Endpoint Federation Systems
 
Efficient source selection for sparql endpoint federation
Efficient source selection for sparql endpoint federationEfficient source selection for sparql endpoint federation
Efficient source selection for sparql endpoint federation
 
LSQ: The Linked SPARQL Queries Dataset
LSQ: The Linked SPARQL Queries DatasetLSQ: The Linked SPARQL Queries Dataset
LSQ: The Linked SPARQL Queries Dataset
 
Federated SPARQL Query Processing ISWC2015 Tutorial
Federated SPARQL Query Processing ISWC2015 TutorialFederated SPARQL Query Processing ISWC2015 Tutorial
Federated SPARQL Query Processing ISWC2015 Tutorial
 
SAFE: Policy Aware SPARQL Query Federation Over RDF Data Cubes
SAFE: Policy Aware SPARQL Query Federation Over RDF Data CubesSAFE: Policy Aware SPARQL Query Federation Over RDF Data Cubes
SAFE: Policy Aware SPARQL Query Federation Over RDF Data Cubes
 
Federated SPARQL query processing over the Web of Data
Federated SPARQL query processing over the Web of DataFederated SPARQL query processing over the Web of Data
Federated SPARQL query processing over the Web of Data
 
DAW: Duplicate-AWare Federated Query Processing over the Web of Data
DAW: Duplicate-AWare Federated Query Processing over the Web of DataDAW: Duplicate-AWare Federated Query Processing over the Web of Data
DAW: Duplicate-AWare Federated Query Processing over the Web of Data
 
Fostering Serendipity through Big Linked Data
Fostering Serendipity through Big Linked DataFostering Serendipity through Big Linked Data
Fostering Serendipity through Big Linked Data
 
Linked Cancer Genome Atlas Database
Linked Cancer Genome Atlas DatabaseLinked Cancer Genome Atlas Database
Linked Cancer Genome Atlas Database
 

Recently uploaded

Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
SAMIR PANDA
 
Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
yqqaatn0
 
Chapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisisChapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisis
tonzsalvador2222
 
GBSN - Microbiology (Lab 4) Culture Media
GBSN - Microbiology (Lab 4) Culture MediaGBSN - Microbiology (Lab 4) Culture Media
GBSN - Microbiology (Lab 4) Culture Media
Areesha Ahmad
 
Leaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdfLeaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdf
RenuJangid3
 
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
yqqaatn0
 
general properties of oerganologametal.ppt
general properties of oerganologametal.pptgeneral properties of oerganologametal.ppt
general properties of oerganologametal.ppt
IqrimaNabilatulhusni
 
GBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram StainingGBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram Staining
Areesha Ahmad
 
Toxic effects of heavy metals : Lead and Arsenic
Toxic effects of heavy metals : Lead and ArsenicToxic effects of heavy metals : Lead and Arsenic
Toxic effects of heavy metals : Lead and Arsenic
sanjana502982
 
Phenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvementPhenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvement
IshaGoswami9
 
BLOOD AND BLOOD COMPONENT- introduction to blood physiology
BLOOD AND BLOOD COMPONENT- introduction to blood physiologyBLOOD AND BLOOD COMPONENT- introduction to blood physiology
BLOOD AND BLOOD COMPONENT- introduction to blood physiology
NoelManyise1
 
Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdfMudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
frank0071
 
S.1 chemistry scheme term 2 for ordinary level
S.1 chemistry scheme term 2 for ordinary levelS.1 chemistry scheme term 2 for ordinary level
S.1 chemistry scheme term 2 for ordinary level
ronaldlakony0
 
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
Wasswaderrick3
 
What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.
moosaasad1975
 
in vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptxin vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptx
yusufzako14
 
Lateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensiveLateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensive
silvermistyshot
 
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATIONPRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
ChetanK57
 
Introduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptxIntroduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptx
zeex60
 

Recently uploaded (20)

Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
 
Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
 
Chapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisisChapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisis
 
GBSN - Microbiology (Lab 4) Culture Media
GBSN - Microbiology (Lab 4) Culture MediaGBSN - Microbiology (Lab 4) Culture Media
GBSN - Microbiology (Lab 4) Culture Media
 
Leaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdfLeaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdf
 
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
 
general properties of oerganologametal.ppt
general properties of oerganologametal.pptgeneral properties of oerganologametal.ppt
general properties of oerganologametal.ppt
 
GBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram StainingGBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram Staining
 
Toxic effects of heavy metals : Lead and Arsenic
Toxic effects of heavy metals : Lead and ArsenicToxic effects of heavy metals : Lead and Arsenic
Toxic effects of heavy metals : Lead and Arsenic
 
Phenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvementPhenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvement
 
BLOOD AND BLOOD COMPONENT- introduction to blood physiology
BLOOD AND BLOOD COMPONENT- introduction to blood physiologyBLOOD AND BLOOD COMPONENT- introduction to blood physiology
BLOOD AND BLOOD COMPONENT- introduction to blood physiology
 
Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdfMudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
 
S.1 chemistry scheme term 2 for ordinary level
S.1 chemistry scheme term 2 for ordinary levelS.1 chemistry scheme term 2 for ordinary level
S.1 chemistry scheme term 2 for ordinary level
 
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
 
What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.
 
in vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptxin vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptx
 
Lateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensiveLateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensive
 
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATIONPRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
 
Introduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptxIntroduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptx
 

FEASIBLE-Benchmark-Framework-ISWC2015

  • 1. FEASIBLE: A Feature-Based SPARQL Benchmark Generation Framework Muhammad Saleem1, Qaiser Mehmood2, Axel-Cyrille Ngonga Ngomo1 http://feasible.aksw.org/ 1Agile Knowledge Engineering and Semantic Web (AKSW), University of Leipzig, Germany 2Insight Center for Data Analytics, National University of Ireland, Galway International Semantic Web Conference, Bethlehem, USA, 2015 10/14/2015 1
  • 2. Triple Stores Benchmarks • Synthetic Benchmarks • Make use of the synthetic queries and/or data • Benchmarks of different data sizes possible • Suitable to test the scalability • Often fail to reflect the reality • For example, LUBM, SP2Bench, BSBM, WatDiv etc. • Queries Log Benchmarks • Make use of the real queries from queries log • Can be more close to the reality • Can be used with different data sizes • Scalability can be tested • For example, DBPSB, FEASIBLE 10/14/2015 2
  • 3. DBpedia SPARQL Benchmark • Based on real DBpedia queries log • Benchmarks of different data sizes possible • Suitable to test the scalability • Only Considers SPARQL SELECT • Does not consider Important query features • For example, number of join vertices, triple patterns selectivities • Not customizable for given use cases or needs of an application 10/14/2015 3
  • 4. FEASIBLE SPARQL Benchmark • Can be applied to any SPARQL queries log • Considers SPARQL SELECT, ASK, DESCRIBE, CONSTRUCT • Considers Important query features • For example, number of join vertices, triple patterns selectivities, query runtime, resultset size, number of BGPs, Mean join vertices degree, number of triple patterns etc. • Customizable for given use cases or needs of an application 10/14/2015 4
  • 5. FEASIBLE SPARQL Benchmark • Dataset cleaning • Feature vectors and normalization • Selection of exemplars • Selection of benchmark queries 10/14/2015 5
  • 6. Dataset Cleaning • Remove syntactically incorrect queries • Remove zero result size queries • It is an optional step • Not of theoretical necessity • Leads to practically reliable benchmarks 10/14/2015 6
  • 7. Feature Vectors and Normalization SELECT DISTINCT ?entita ?nome WHERE { ?entita rdf:type dbo:VideoGame . ?entita rdfs:label ?nome FILTER regex(?nome, "konami", "i") } LIMIT 100 Query Type: SELECT Results Size: 13 Basic Graph Patterns (BGPs): 1 Triple Patterns: 2 Join Vertices: 1 Mean Join Vertices Degree: 2.0 Mean triple patterns selectivity: 0.01709761619798973 UNION: No DISTINCT: Yes ORDER BY: No REGEX: Yes LIMIT: Yes OFFSET: No OPTIONAL: No FILTER: Yes GROUP BY: No Runtime (ms): 65 13 1 2 1 2 0.017 0 1 0 1 1 0 0 1 0 65 0.11 0.53 0.67 0.14 0.08 0.017 0 1 0 1 1 0 0 1 0 0.14 Feature Vector Normalized Feature Vector 10/14/2015 7
  • 8. Selection of exemplars Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9Q10 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 10/14/2015 8 Plot feature vectors in a multidimensional space Query Feature 1 Feature 2 Q1 0.2 0.2 Q2 0.5 0.3 Q3 0.8 0.3 Q4 0.9 0.1 Q5 0.5 0.5 Q6 0.2 0.7 Q7 0.1 0.8 Q8 0.13 0.65 Q9 0.9 0.5 Q10 0.1 0.5 Suppose we need a benchmark of 3 queries
  • 9. Selection of exemplars 10/14/2015 9 Calculate average point Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9Q10 Avg. 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
  • 10. Selection of exemplars Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9Q10 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 10/14/2015 10 Select point of minimum Euclidean distance to avg. point *Red is our first exemplar
  • 11. Selection of exemplars Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9Q10 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 10/14/2015 11 Select point that is farthest to exemplars
  • 12. Selection of exemplars Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9Q10 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 10/14/2015 12
  • 13. Selection of exemplars Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9Q10 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 10/14/2015 13 Select point that is farthest to exemplars
  • 14. Selection of exemplars Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9Q10 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 10/14/2015 14
  • 15. Selection of Benchmark Queries 10/14/2015 15 Calculate distance from Q1 to each exemplars Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9Q10 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
  • 16. Selection of Benchmark Queries 10/14/2015 16 Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9Q10 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Assign Q1 to the minimum distance exemplar
  • 17. Selection of Benchmark Queries Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9Q10 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 10/14/2015 17 Repeat the process for Q2
  • 18. Selection of Benchmark Queries Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9Q10 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 10/14/2015 18 Repeat the process for Q3
  • 19. Selection of Benchmark Queries 10/14/2015 19 Repeat the process for Q6 Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9Q10 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
  • 20. Selection of Benchmark Queries Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9Q10 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 10/14/2015 20 Repeat the process for Q8
  • 21. Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9Q10 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Selection of Benchmark Queries 10/14/2015 21 Repeat the process for Q9
  • 22. Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9Q10 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Selection of Benchmark Queries 10/14/2015 22 Repeat the process for Q10
  • 23. Selection of Benchmark Queries 10/14/2015 23 Calculate Average across each cluster Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9Q10 Avg. Avg. Avg. 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
  • 24. Selection of Benchmark Queries 10/14/2015 24 Calculate distance of each point in cluster to the average Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9Q10 Avg. Avg. Avg. 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
  • 25. Selection of Benchmark Queries 10/14/2015 25 Select minimum distance query as the final benchmark query from that cluster Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9Q10 Avg. Avg. Avg. 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Black, i.e., Q2 is the final selected query from yellow cluster
  • 26. Selection of Benchmark Queries 10/14/2015 26 Select minimum distance query as the final benchmark query from that cluster Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9Q10 Avg. Avg. Avg. 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Black, i.e., Q8 is the final selected query from brown cluster
  • 27. Selection of Benchmark Queries 10/14/2015 27 Select minimum distance query as the final benchmark query from that cluster Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9Q10 Avg. Avg. Avg. 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Black, i.e., Q3 is the final selected query from green cluster Our benchmark queries are Q2, Q3, and Q8
  • 28. Experimental Setup • Composite Error Estimation • L is the query log, B is the benchmark and K is the set of all features 10/14/2015 28
  • 29. Experimental Setup • Virtuoso Open-Source Edition version 7.2 • NumberOfBuffers = 680000, MaxDirtyBuffers = 500000 • Sesame Version 2.7.8 • Tomcat 7 as HTTP interface and native storage layout. • Set the spoc, posc, opsc indices to those specified in the native storage configuration • The Java heap size was set to 6GB • Jena-TDB (Fuseki) Version 2.0 • Java heap size set to 6GB • OWLIM-SE Version 6.1 • Tomcat 7.0 as HTTP interface • Set the entity index size to 45,000,000 and enabled the predicate list • Rule set was empty and the Java heap size was set to 6GB. • We configured all triple stores to use 6GB of memory and used default values otherwise. 10/14/2015 29
  • 30. Comparison of Composite Error 10/14/2015 30 FEASIBLE’s composite error is 54.9% less than DBPSB
  • 31. Comparison of Triple Stores: QpS 10/14/2015 31 0 50 100 150 200 250 Sesame Virtuoso OWLIM-SE Fuseki Sesame Virtuoso OWLIM-SE Fuseki SWDF DBpedia QpS 0 0.5 1 1.5 2 2.5 3 Sesame Virtuoso OWLIM-SE Fuseki Sesame Virtuoso OWLIM-SE Fuseki SWDF DBpedia QpS 0 10 20 30 40 50 60 70 Sesame Virtuoso OWLIM-SE Fuseki Sesame Virtuoso OWLIM-SE Fuseki SWDF DBpedia QpS 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 Sesame Virtuoso OWLIM-SE Fuseki Sesame Virtuoso OWLIM-SE Fuseki SWDF DBpedia QpS SPARQL ASK SPARQL CONSTRUCT SPARQL DESCRIBE SPARQL SELECT
  • 32. Comparison of Triple Stores: Mix Queries 10/14/2015 32 0 5 10 15 20 25 30 35 40 Sesame Virtuoso OWLIM-SE Fuseki SWDF QMpH 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 Sesame Virtuoso OWLIM-SE Fuseki DBpedia QMpH 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 Sesame Virtuoso OWLIM-SE Fuseki SWDF QpS 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1 Sesame Virtuoso OWLIM-SE Fuseki DBpedia QpS
  • 33. Rank-wise Ranking of Triple Stores 10/14/2015 33 All values are in percentages • None of the system is sole winner or loser for a particular rank • Virtuoso mostly lies in the higher ranks, i.e., rank 1 and 2 (68.29%) • Fuseki mostly in the middle ranks, i.e., rank 2 and 3 (65.14%) • OWLIM-SE usually on the slower side, i.e., rank 3 and 4 (60.86 %) • Sesame is either fast or slow. Rank 1 (31.71% of the queries) and rank 4 (23.14%)