Practical SPARQL Benchmarking Revisited

Rob Vesse, Software Engineer at YarcData
rvesse@yarcdata.com 
@RobVesse
1. Rewind to 2012 
2. Limitations 
3. Evolving the Framework 
4. Examples 
5. Future Work
 Presentation I gave at this conference in 2012 
 Slides at http://www.slideshare.net/RobVesse/practical-sparql-benchmarking 
 Highlighted some issues with SPARQL Benchmarking: 
 Standard Benchmarks all have known deficiencies 
 Lack of standardized methodology 
 Best benchmark is the one you run with your data and workload 
 Introduced the 1.x version of our SPARQL Query 
Benchmarker tool 
 Java tool and API for benchmarking 
 Used a methodology based upon a combination of the BSBM runner and the Revelytix SP2B white 
paper 
 Reports various appropriate statistics 
 Various configuration options to change what exactly is benchmarked e.g. whether results are 
fully parsed and counted
 The 1.x tool was open sourced shortly after the 2012 
conference under a 3 clause BSD License 
 Available on SourceForge 
 http://sourceforge.net/projects/sparql-query-bm/files/1.0.0/ 
 Also as Maven artifacts (in Maven Central): 
 Group ID: net.sf.sparql-query-bm 
 Artifact IDs: 
 cmd 
 core 
 Latest 1.x Version: 1.1.0
 The 1.x tool can only benchmark SPARQL queries 
 SPARQL 1.1 has been standardized since the 1.x version of 
the tool was written and adds various additional SPARQL 
features that you may want to test: 
 SPARQL Updates 
 SPARQL Graph Store Protocol 
 Queries are fixed 
 No parameterization support 
 Can't pass custom endpoint parameters in 
 For example enable/disable reasoning 
 Also no way to test endpoint specific extensions 
 e.g. transactions
 Requires using HTTP endpoints to access the SPARQL 
system to be tested 
 Adds communication overheads to the results 
 Sometimes this may be desirable 
 No ability to test SPARQL operations in-memory 
 i.e. can't test lower level APIs
 Only supports a single benchmarking methodology 
 Methodology is hard coded 
 Can't do things like run a subset of the provided operations 
on each run 
 Or repeat an operation within a run 
 Or retry an operation under specific failure conditions 
 Configuration of the methodology is tightly coupled to the 
methodology 
 Many aspects are actually independent of the methodology
 The 1.x query mix file used a simplistic text based format 
 One query file per line 
 No way to specify additional parameters 
 No way to assign a friendly name to queries 
 Assigns each query the filename
 There is a progress monitoring API but it is limited 
 E.g. Gets called after a query completes but not before it 
starts 
 Makes it awkward/impossible to implement some kinds of 
monitoring 
 e.g. crash detection, memory usage
 In the interests of speed over usability we rolled our own 
command line arguments parser 
 Means argument parsing is awkward to extend
 Earlier this year we found a compelling reason to rewrite 
the tool and address the various limitations 
 First 2.x release was made 9th June 2014 
 Minor bug fix and maintenance releases since 
 Releases available at: 
 http://sourceforge.net/projects/sparql-query-bm/files/ 
 Code is now using Git 
 http://git.code.sf.net/p/sparql-query-bm/git sparql-query-bm-git 
 Mirrors available on GitHub for those who think that it is the one true source 
 https://github.com/rvesse/sparql-query-bm 
 Maven artifacts available through Maven Central as before: 
 Group ID: net.sf.sparql-query-bm 
 Artifact IDs: core, cmd and dist 
 Latest 2.x version: 2.0.1
 Concept of Queries replaced with the general concept of 
Operations 
 Also divorces the definition of an operation from how to run 
said operation 
 Makes it easier to change runtime behaviour of operations 
 20 built-in operations provided 
 API allows defining and plugging in new operations as 
desired 
 http://sparql-query-bm.sourceforge.net/javadoc/latest/core/
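The split between defining an operation and running it can be sketched roughly as follows. This is only an illustration in plain Java 8: `SketchOperation`, `FixedResultOperation` and `SketchOperationRunner` are hypothetical names invented here, not the real sparql-query-bm interfaces, whose exact signatures are documented in the Javadoc linked above.

```java
import java.util.concurrent.Callable;

// Hypothetical, simplified types for illustration only.
// They are not the real sparql-query-bm interfaces.
interface SketchOperation {
    String getName();

    // The definition only knows how to produce something runnable.
    Callable<Long> createCallable();
}

// A stand-in operation that "returns" a fixed number of results.
final class FixedResultOperation implements SketchOperation {
    private final String name;
    private final long resultCount;

    FixedResultOperation(String name, long resultCount) {
        this.name = name;
        this.resultCount = resultCount;
    }

    @Override
    public String getName() {
        return name;
    }

    @Override
    public Callable<Long> createCallable() {
        // A real operation would issue a query or update here.
        return () -> resultCount;
    }
}

final class SketchOperationRunner {
    // Runs any operation without knowing what kind of operation it is.
    static long run(SketchOperation op) {
        try {
            return op.createCallable().call();
        } catch (Exception e) {
            throw new IllegalStateException("Operation " + op.getName() + " failed", e);
        }
    }
}
```

Because the runner only sees the callable, runtime behaviour (timeouts, retries, threading) can change without touching the operation definitions.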
 Several kinds of query/update 
 Fixed 
 Parameterized 
 Dataset Size 
 Variants for both remote endpoints and in-memory 
datasets 
 Remote variants have additional NVP variants 
 Allows adding custom parameters to the remote request 
 Accounts for 13 of the built in operations
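As a rough idea of what an NVP (name value pair) variant does, the sketch below appends custom parameters to a SPARQL Protocol GET request URL. The class name `NvpRequestSketch` and the `reasoning` parameter used in the usage note are assumptions made for illustration; the tool's own implementation may build requests differently.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

// Hypothetical helper, not part of sparql-query-bm.
final class NvpRequestSketch {
    // Builds a GET URL carrying the query plus any custom name value pairs.
    static String buildQueryUrl(String endpoint, String query, Map<String, String> customParams) {
        StringBuilder url = new StringBuilder(endpoint);
        // "query" is the parameter name defined by the SPARQL Protocol.
        url.append("?query=").append(encode(query));
        for (Map.Entry<String, String> e : customParams.entrySet()) {
            url.append('&').append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return url.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

For example, passing a single pair such as `reasoning=false` (an endpoint specific parameter, assumed here) yields a URL ending in `&reasoning=false`.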
 One for each graph store protocol operation: 
 DELETE 
 GET 
 HEAD 
 POST 
 PUT 
 Accounts for a further 5 of the built-in operations
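The five operations map directly onto HTTP methods applied to a graph identified via the `graph` or `default` query parameter, as defined by the SPARQL 1.1 Graph Store HTTP Protocol. A minimal sketch of that mapping, using hypothetical names (`GspMethod`, `GspRequestSketch`) rather than the tool's own classes:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// One constant per graph store protocol operation.
enum GspMethod { DELETE, GET, HEAD, POST, PUT }

// Hypothetical helper, not part of sparql-query-bm.
final class GspRequestSketch {
    // Describes the request line for an operation on a named graph,
    // or on the default graph when graphIri is null.
    static String describe(GspMethod method, String graphStoreUrl, String graphIri) {
        String target = (graphIri == null)
                ? graphStoreUrl + "?default"
                : graphStoreUrl + "?graph=" + encode(graphIri);
        return method.name() + " " + target;
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```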
 Sleep 
 Do nothing for some period 
 Useful for simulating quiet periods as part of testing 
 Mix 
 Allow grouping a set of operations into a single operation 
 Lets you compose mixes from other mixes
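Treating a mix as just another operation is essentially the composite pattern. The sketch below shows the idea with hypothetical types (`MixableOp`, `NamedOp`, `MixOp`); it only collects operation names and is not how the tool itself represents mixes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical types for illustration only.
interface MixableOp {
    // Appends the names of the operations that would run, in order.
    void collectNames(List<String> out);
}

final class NamedOp implements MixableOp {
    private final String name;

    NamedOp(String name) {
        this.name = name;
    }

    @Override
    public void collectNames(List<String> out) {
        out.add(name);
    }
}

// A mix is itself an operation, so mixes can contain other mixes.
final class MixOp implements MixableOp {
    private final List<MixableOp> children;

    MixOp(MixableOp... children) {
        this.children = Arrays.asList(children);
    }

    @Override
    public void collectNames(List<String> out) {
        for (MixableOp child : children) {
            child.collectNames(out);
        }
    }
}

final class MixSketch {
    static List<String> flatten(MixableOp op) {
        List<String> names = new ArrayList<>();
        op.collectNames(names);
        return names;
    }
}
```

A warm-up mix can then be reused as the first element of a larger mix without the larger mix needing to know what it contains.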
 As already noted in-memory variants of some operations 
are now available 
 These run tests against a Dataset implementation 
 Part of Apache Jena ARQ API 
 Removes SPARQL Protocol and HTTP overhead from testing 
 Of course, depending on the Dataset implementation, there may still be some communication overhead 
 But this is likely using lower level back end native communications protocols instead
 Addresses the limitation of hard coded methodology 
 Separates test running into three components: 
 Overall runner 
 Mix runner 
 Operation runner 
 Each has own API and can be customized as desired 
 Various useful base/abstract implementations provided 
 Four different test runners are provided: 
 Benchmark 
 Smoke 
 Soak 
 Stress
 Smoke 
 Runs the mix once and indicates whether it passes/fails 
 Pass is defined as all operations pass 
 Soak 
 Run the mix continuously for some period of time 
 Test how a system reacts under continuous load 
 Stress 
 Run the mix with increasingly high load 
 Test how a system reacts under increasing load 
 AbstractRunner provides a basic framework and helper 
methods to make it easy to add custom runners or 
customize existing runners
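To make the difference between the smoke and soak styles concrete, here is a minimal sketch in which an operation is modelled as a `BooleanSupplier` that returns true on success. `RunnerSketches` is a hypothetical class written for this illustration; it does not use `AbstractRunner` or any other part of the real API, and the stress runner is left out.

```java
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of sparql-query-bm.
final class RunnerSketches {
    // Smoke: run the mix once; pass only if every operation passes.
    static boolean smoke(List<BooleanSupplier> mix) {
        boolean allPassed = true;
        for (BooleanSupplier op : mix) {
            // Non-short-circuit so every operation runs even after a failure.
            allPassed &= op.getAsBoolean();
        }
        return allPassed;
    }

    // Soak: run the mix repeatedly until the time budget is used up.
    // Returns the number of complete mix runs (always at least one).
    static int soak(List<BooleanSupplier> mix, long durationMillis) {
        long deadline = System.nanoTime() + durationMillis * 1_000_000L;
        int runs = 0;
        do {
            smoke(mix);
            runs++;
        } while (deadline - System.nanoTime() > 0);
        return runs;
    }
}
```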
 Allows customizing how mixes and individual operations 
are run 
 Some alternative implementations built in: 
 E.g. SamplingOperationMixRunner 
 Runs a sample of the operations in the mix 
 May include repeats 
 E.g. RetryingOperationRunner 
 Retries an operation if it doesn't succeed 
 Easy to implement your own
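The two behaviours named above, sampling with possible repeats and retrying on failure, can be sketched in a few lines. The class `SamplingAndRetrySketch` and its method names are assumptions for illustration and deliberately do not mirror the signatures of `SamplingOperationMixRunner` or `RetryingOperationRunner`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of sparql-query-bm.
final class SamplingAndRetrySketch {
    // Picks sampleSize operations at random; the same operation
    // may be picked more than once.
    static <T> List<T> sampleWithRepeats(List<T> ops, int sampleSize, Random rng) {
        List<T> sample = new ArrayList<>(sampleSize);
        for (int i = 0; i < sampleSize; i++) {
            sample.add(ops.get(rng.nextInt(ops.size())));
        }
        return sample;
    }

    // Tries the operation once plus up to maxRetries further times.
    // Returns the attempt number that succeeded, or -1 if none did.
    static int runWithRetries(BooleanSupplier op, int maxRetries) {
        for (int attempt = 1; attempt <= maxRetries + 1; attempt++) {
            if (op.getAsBoolean()) {
                return attempt;
            }
        }
        return -1;
    }
}
```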
 Separates test configuration from the test runner 
 Interface with all common configuration defined 
 Endpoints 
 Timeouts 
 Progress Listeners 
 etc 
 NB - Runners are typically defined such that they restrict 
their input options to sub-interfaces that add runner 
specific configuration e.g. 
 Warm-ups for benchmarks 
 Total runtime for soak testing 
 Ramp up factor for stress testing
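One way to picture a runner restricting its input to a sub-interface is with a bounded generic type. All names below (`BaseOptionsSketch`, `SoakOptionsSketch`, `RunnerSketch`, `SoakRunnerSketch`, `SimpleSoakOptions`) are hypothetical stand-ins; the real option interfaces carry many more settings.

```java
// Common configuration shared by every kind of runner (simplified).
interface BaseOptionsSketch {
    String getEndpoint();

    int getTimeoutSeconds();
}

// Soak specific configuration layered on top of the common options.
interface SoakOptionsSketch extends BaseOptionsSketch {
    long getMaxRuntimeMinutes();
}

// A runner declares which options type it needs.
interface RunnerSketch<T extends BaseOptionsSketch> {
    String describe(T options);
}

// This runner cannot be handed options that lack a total runtime.
final class SoakRunnerSketch implements RunnerSketch<SoakOptionsSketch> {
    @Override
    public String describe(SoakOptionsSketch o) {
        return "soak " + o.getEndpoint() + " for " + o.getMaxRuntimeMinutes()
                + " min (timeout " + o.getTimeoutSeconds() + "s)";
    }
}

final class SimpleSoakOptions implements SoakOptionsSketch {
    private final String endpoint;
    private final int timeoutSeconds;
    private final long maxRuntimeMinutes;

    SimpleSoakOptions(String endpoint, int timeoutSeconds, long maxRuntimeMinutes) {
        this.endpoint = endpoint;
        this.timeoutSeconds = timeoutSeconds;
        this.maxRuntimeMinutes = maxRuntimeMinutes;
    }

    @Override
    public String getEndpoint() {
        return endpoint;
    }

    @Override
    public int getTimeoutSeconds() {
        return timeoutSeconds;
    }

    @Override
    public long getMaxRuntimeMinutes() {
        return maxRuntimeMinutes;
    }
}
```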
 Operation mixes now use TSV as the file format 
 Still wanted it to be simple enough that someone with zero RDF/SPARQL knowledge can 
configure it 
 Each line is a series of parameters separated by a tab 
character 
 First parameter is an identifier for the type of the operation 
 Used to decide how to interpret the remaining parameters 
 Can define your own mix file format and register a loader 
for it 
 Possible to override the loader for a specific operation 
identifier since this has an API 
 Means you can do neat tricks like use a mix designed for remote endpoints against an in-memory 
dataset
query 806670-warmup1.rq 806670 Warmup Query 1 
query 806670-warmup2.rq 806670 Warmup Query 2 
query 806670-nofilter.rq 806670 Query with No Filter 
query 806670-filter3.rq 806670 Query with Filter (Variant 3) 
param-query 806670-filter3-params.rq instances.tsv Parameterized Query with Filter (Variant 3) 
query 806670-filter4.rq 806670 Query with Filter (Variant 4) 
query 806670-filter4a.rq 806670 Query with Filter (Variant 4a - Zero Results) 
param-query 806670-filter4-params.rq instances.tsv Parameterized Query with Filter (Variant 4) 
query 806238-warmup1.rq 806238 Warmup Query 1 
query 806238-warmup2.rq 806238 Warmup Query 2 
query 806238-comment43.rq 806238 Query (Comment 43) 
query 806238-comment43a.rq 806238 Query (Comment 43 - SELECT * sub-query) 
query 806238-comment45.rq 806238 Query (Comment 45 - Multiple sub-queries) 
query 806238-comment54.rq 806238 Query (Comment 54) 
param-update load-full1m.ru graph-names.tsv Load 1M Dataset into named graph 
param-query count-loaded.rq graph-names.tsv Count named graph 
param-update drop-loaded.ru graph-names.tsv Drop named graph 
query count.rq Count quads 
checkpoint10 Checkpoint every 10 runs 
sleep 180 3 minute sleep 
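Lines like those in the mixes above could be handled by splitting on tabs and dispatching on the first field to a registered loader. The sketch below is an assumption about the general shape, not the tool's loader API: `MixLineParserSketch` is a hypothetical class, and each loader simply returns a description string in place of a real operation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical parser, not part of sparql-query-bm.
final class MixLineParserSketch {
    // Maps an operation identifier to the loader that interprets
    // the remaining tab separated parameters.
    private final Map<String, Function<String[], String>> loaders = new HashMap<>();

    // Registering an identifier again overrides the earlier loader.
    void register(String identifier, Function<String[], String> loader) {
        loaders.put(identifier, loader);
    }

    String parseLine(String line) {
        String[] fields = line.split("\t");
        Function<String[], String> loader = loaders.get(fields[0]);
        if (loader == null) {
            throw new IllegalArgumentException("No loader registered for: " + fields[0]);
        }
        // Hand the loader everything after the identifier.
        return loader.apply(Arrays.copyOfRange(fields, 1, fields.length));
    }
}
```

Overriding the loader for `query` with one that builds an in-memory operation is what lets a mix written for remote endpoints be run against an in-memory dataset unchanged.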
 The progress monitoring API now provides notifications before and after operation and 
mix runs 
 Improvements to how some of the built-in 
implementations handle multi-threaded output 
 Makes it easier to distinguish where errors occurred when running multi-threaded 
benchmarks
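Having a notification before an operation starts as well as after it finishes is what makes monitoring such as crash detection practical: anything that started but never finished is a suspect. A minimal sketch, assuming a hypothetical two-method listener interface (`ProgressListenerSketch`) that is much simpler than the real progress listener API:

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical listener interface for illustration only.
interface ProgressListenerSketch {
    void beforeOperation(String name);

    void afterOperation(String name, boolean success);
}

// Tracks operations that have started but not yet finished.
final class InFlightTracker implements ProgressListenerSketch {
    private final Set<String> inFlight = new LinkedHashSet<>();

    // Synchronized because listeners may be called from several threads.
    @Override
    public synchronized void beforeOperation(String name) {
        inFlight.add(name);
    }

    @Override
    public synchronized void afterOperation(String name, boolean success) {
        inFlight.remove(name);
    }

    // Returns a snapshot of the operations still in flight.
    synchronized Set<String> getInFlight() {
        return new LinkedHashSet<>(inFlight);
    }
}
```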
 Command line parsing is now based upon the powerful open source Airline library 
 https://github.com/airlift/airline 
 Provides a command line interface to each built-in runner 
 Also provides AbstractCommand with all standard options exposed 
 Standardized exit codes across all commands 
 Comprehensive built-in help 
 Can help you define operation mixes 
 ./operations 
 ./operation --op param-query
 These are things we've done (or are currently doing) with 
the framework that aren't in the open source releases 
 However the 2.x framework makes these (hopefully) easy 
to replicate yourself 
 Many stores have rich REST APIs in addition to their 
SPARQL APIs 
 Can be useful to include testing of these in your mixes 
 Requires implementing two interfaces: 
 Operation 
 OperationCallable 
 Abstract implementations of both are available to give you the 
boilerplate bits 
 Internally we have 9 different custom operations defined 
which test a subset of our REST API: 
 Database Management 
 Asynchronous Queries 
 Import Management
 One thing we're particularly interested in is how operations 
affect memory usage 
 We added custom progress listeners that track and monitor memory usage 
 Reports on min, max and average memory usage 
 We also have another progress listener that tracks 
processes to identify when a test run may have been 
impacted by other activity on the system
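The internal listeners are not in the open source release, so the following is only a guess at the general approach: sample the JVM's used heap and keep running min, max and average figures. `MemoryUsageStatsSketch` is a hypothetical class; how and when the real listeners take samples, and whether they measure the JVM or the store under test, is not stated in the talk.

```java
// Hypothetical helper, not part of sparql-query-bm.
final class MemoryUsageStatsSketch {
    private long min = Long.MAX_VALUE;
    private long max = Long.MIN_VALUE;
    private long total = 0;
    private long samples = 0;

    // Records one memory usage sample in bytes.
    synchronized void record(long usedBytes) {
        min = Math.min(min, usedBytes);
        max = Math.max(max, usedBytes);
        total += usedBytes;
        samples++;
    }

    // Samples the used heap of the current JVM (total minus free).
    void sampleNow() {
        Runtime rt = Runtime.getRuntime();
        record(rt.totalMemory() - rt.freeMemory());
    }

    synchronized long getMin() {
        return min;
    }

    synchronized long getMax() {
        return max;
    }

    // Average of all samples, or 0.0 when nothing has been recorded.
    synchronized double getAverage() {
        return samples == 0 ? 0.0 : (double) total / samples;
    }
}
```

A listener could call `sampleNow()` from its before and after operation callbacks and report the three figures at the end of the run.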
// Imports of the sparql-query-bm core classes are omitted here
public class RetryOnAuthFailureOperationRunner extends RetryingOperationRunner {

    // Default to a single retry
    public RetryOnAuthFailureOperationRunner() {
        this(1);
    }

    public RetryOnAuthFailureOperationRunner(int maxRetries) {
        super(maxRetries);
    }

    // Only retry when the failed run was categorized as an authentication error
    @Override
    protected <T extends Options> boolean shouldRetry(Runner<T> runner, T options,
            Operation op, OperationRun run) {
        return run.getErrorCategory() == ErrorCategories.AUTHENTICATION;
    }
}
 Extends the built-in RetryingOperationRunner 
 Simply adds a constraint on retries by overriding the 
shouldRetry() method
 Embrace Java 7 features fully 
 Use ServiceLoader to automatically discover new operations and mix formats 
 Make it even easier to customize runners 
 i.e. provide more abstraction of the current implementations
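Discovery via `java.util.ServiceLoader` might look roughly like the sketch below. The `OperationLoaderSketch` interface and `LoaderDiscoverySketch` class are hypothetical; since this is listed as future work, the service interfaces the tool would actually expose are not yet defined.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical service interface that plug-ins would implement.
interface OperationLoaderSketch {
    String getIdentifier();
}

final class LoaderDiscoverySketch {
    // Finds every implementation listed in a META-INF/services file
    // on the classpath and returns the identifiers they handle.
    static List<String> discoverIdentifiers() {
        List<String> identifiers = new ArrayList<>();
        for (OperationLoaderSketch loader : ServiceLoader.load(OperationLoaderSketch.class)) {
            identifiers.add(loader.getIdentifier());
        }
        return identifiers;
    }
}
```

With no providers registered the list is simply empty; dropping a JAR containing an implementation and its `META-INF/services` entry onto the classpath would make it appear without any code changes.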
Questions? 
rvesse@yarcdata.com 
@RobVesse
1 of 35

Recommended

Apache Jena Elephas and Friends by
Apache Jena Elephas and FriendsApache Jena Elephas and Friends
Apache Jena Elephas and FriendsRob Vesse
2.4K views41 slides
Semantic Integration with Apache Jena and Stanbol by
Semantic Integration with Apache Jena and StanbolSemantic Integration with Apache Jena and Stanbol
Semantic Integration with Apache Jena and StanbolAll Things Open
3.5K views27 slides
Quadrupling your elephants - RDF and the Hadoop ecosystem by
Quadrupling your elephants - RDF and the Hadoop ecosystemQuadrupling your elephants - RDF and the Hadoop ecosystem
Quadrupling your elephants - RDF and the Hadoop ecosystemRob Vesse
5K views40 slides
Sempala - Interactive SPARQL Query Processing on Hadoop by
Sempala - Interactive SPARQL Query Processing on HadoopSempala - Interactive SPARQL Query Processing on Hadoop
Sempala - Interactive SPARQL Query Processing on HadoopAlexander Schätzle
1.6K views21 slides
Cassandra Summit Sept 2015 - Real Time Advanced Analytics with Spark and Cass... by
Cassandra Summit Sept 2015 - Real Time Advanced Analytics with Spark and Cass...Cassandra Summit Sept 2015 - Real Time Advanced Analytics with Spark and Cass...
Cassandra Summit Sept 2015 - Real Time Advanced Analytics with Spark and Cass...Chris Fregly
4.8K views42 slides
WebTech Tutorial Querying DBPedia by
WebTech Tutorial Querying DBPediaWebTech Tutorial Querying DBPedia
WebTech Tutorial Querying DBPediaKatrien Verbert
8.2K views41 slides

More Related Content

What's hot

Debugging Apache Spark - Scala & Python super happy fun times 2017 by
Debugging Apache Spark -   Scala & Python super happy fun times 2017Debugging Apache Spark -   Scala & Python super happy fun times 2017
Debugging Apache Spark - Scala & Python super happy fun times 2017Holden Karau
881 views47 slides
Pandas UDF and Python Type Hint in Apache Spark 3.0 by
Pandas UDF and Python Type Hint in Apache Spark 3.0Pandas UDF and Python Type Hint in Apache Spark 3.0
Pandas UDF and Python Type Hint in Apache Spark 3.0Databricks
525 views38 slides
Querying Linked Data with SPARQL by
Querying Linked Data with SPARQLQuerying Linked Data with SPARQL
Querying Linked Data with SPARQLOlaf Hartig
15.4K views48 slides
Apache Spark MLlib 2.0 Preview: Data Science and Production by
Apache Spark MLlib 2.0 Preview: Data Science and ProductionApache Spark MLlib 2.0 Preview: Data Science and Production
Apache Spark MLlib 2.0 Preview: Data Science and ProductionDatabricks
13.9K views22 slides
Apache Spark - Intro to Large-scale recommendations with Apache Spark and Python by
Apache Spark - Intro to Large-scale recommendations with Apache Spark and PythonApache Spark - Intro to Large-scale recommendations with Apache Spark and Python
Apache Spark - Intro to Large-scale recommendations with Apache Spark and PythonChristian Perone
9.5K views72 slides
Why Scala Is Taking Over the Big Data World by
Why Scala Is Taking Over the Big Data WorldWhy Scala Is Taking Over the Big Data World
Why Scala Is Taking Over the Big Data WorldDean Wampler
41.7K views89 slides

What's hot(20)

Debugging Apache Spark - Scala & Python super happy fun times 2017 by Holden Karau
Debugging Apache Spark -   Scala & Python super happy fun times 2017Debugging Apache Spark -   Scala & Python super happy fun times 2017
Debugging Apache Spark - Scala & Python super happy fun times 2017
Holden Karau881 views
Pandas UDF and Python Type Hint in Apache Spark 3.0 by Databricks
Pandas UDF and Python Type Hint in Apache Spark 3.0Pandas UDF and Python Type Hint in Apache Spark 3.0
Pandas UDF and Python Type Hint in Apache Spark 3.0
Databricks525 views
Querying Linked Data with SPARQL by Olaf Hartig
Querying Linked Data with SPARQLQuerying Linked Data with SPARQL
Querying Linked Data with SPARQL
Olaf Hartig15.4K views
Apache Spark MLlib 2.0 Preview: Data Science and Production by Databricks
Apache Spark MLlib 2.0 Preview: Data Science and ProductionApache Spark MLlib 2.0 Preview: Data Science and Production
Apache Spark MLlib 2.0 Preview: Data Science and Production
Databricks13.9K views
Apache Spark - Intro to Large-scale recommendations with Apache Spark and Python by Christian Perone
Apache Spark - Intro to Large-scale recommendations with Apache Spark and PythonApache Spark - Intro to Large-scale recommendations with Apache Spark and Python
Apache Spark - Intro to Large-scale recommendations with Apache Spark and Python
Christian Perone9.5K views
Why Scala Is Taking Over the Big Data World by Dean Wampler
Why Scala Is Taking Over the Big Data WorldWhy Scala Is Taking Over the Big Data World
Why Scala Is Taking Over the Big Data World
Dean Wampler41.7K views
Learn Apache Spark: A Comprehensive Guide by Whizlabs
Learn Apache Spark: A Comprehensive GuideLearn Apache Spark: A Comprehensive Guide
Learn Apache Spark: A Comprehensive Guide
Whizlabs591 views
Holden Karau - Spark ML for Custom Models by sparktc
Holden Karau - Spark ML for Custom ModelsHolden Karau - Spark ML for Custom Models
Holden Karau - Spark ML for Custom Models
sparktc2.2K views
SPARQL 1.1 Update (2013-03-05) by andyseaborne
SPARQL 1.1 Update (2013-03-05)SPARQL 1.1 Update (2013-03-05)
SPARQL 1.1 Update (2013-03-05)
andyseaborne4.5K views
Scalable Data Science in Python and R on Apache Spark by felixcss
Scalable Data Science in Python and R on Apache SparkScalable Data Science in Python and R on Apache Spark
Scalable Data Science in Python and R on Apache Spark
felixcss2K views
Migrating Apache Spark ML Jobs to Spark + Tensorflow on Kubeflow by Databricks
Migrating Apache Spark ML Jobs to Spark + Tensorflow on KubeflowMigrating Apache Spark ML Jobs to Spark + Tensorflow on Kubeflow
Migrating Apache Spark ML Jobs to Spark + Tensorflow on Kubeflow
Databricks2.6K views
Apache Spark Super Happy Funtimes - CHUG 2016 by Holden Karau
Apache Spark Super Happy Funtimes - CHUG 2016Apache Spark Super Happy Funtimes - CHUG 2016
Apache Spark Super Happy Funtimes - CHUG 2016
Holden Karau149 views
Tuning and Monitoring Deep Learning on Apache Spark by Databricks
Tuning and Monitoring Deep Learning on Apache SparkTuning and Monitoring Deep Learning on Apache Spark
Tuning and Monitoring Deep Learning on Apache Spark
Databricks3.3K views
SPARQL Cheat Sheet by LeeFeigenbaum
SPARQL Cheat SheetSPARQL Cheat Sheet
SPARQL Cheat Sheet
LeeFeigenbaum92.3K views
Functional programming in Scala by datamantra
Functional programming in ScalaFunctional programming in Scala
Functional programming in Scala
datamantra1.7K views
Getting started with Apache Spark in Python - PyLadies Toronto 2016 by Holden Karau
Getting started with Apache Spark in Python - PyLadies Toronto 2016Getting started with Apache Spark in Python - PyLadies Toronto 2016
Getting started with Apache Spark in Python - PyLadies Toronto 2016
Holden Karau222 views
Apache: Big Data - Starting with Apache Spark, Best Practices by felixcss
Apache: Big Data - Starting with Apache Spark, Best PracticesApache: Big Data - Starting with Apache Spark, Best Practices
Apache: Big Data - Starting with Apache Spark, Best Practices
felixcss442 views
Scalable Data Science with SparkR: Spark Summit East talk by Felix Cheung by Spark Summit
Scalable Data Science with SparkR: Spark Summit East talk by Felix CheungScalable Data Science with SparkR: Spark Summit East talk by Felix Cheung
Scalable Data Science with SparkR: Spark Summit East talk by Felix Cheung
Spark Summit2.2K views
Intro to apache spark stand ford by Thu Hiền
Intro to apache spark stand fordIntro to apache spark stand ford
Intro to apache spark stand ford
Thu Hiền1.1K views

Similar to Practical SPARQL Benchmarking Revisited

Integration Group - Robot Framework by
Integration Group - Robot Framework Integration Group - Robot Framework
Integration Group - Robot Framework OpenDaylight
2.4K views17 slides
Play framework : A Walkthrough by
Play framework : A WalkthroughPlay framework : A Walkthrough
Play framework : A Walkthroughmitesh_sharma
462 views44 slides
Network Protocol Testing Using Robot Framework by
Network Protocol Testing Using Robot FrameworkNetwork Protocol Testing Using Robot Framework
Network Protocol Testing Using Robot FrameworkPayal Jain
7.6K views27 slides
Automation using ibm rft by
Automation using ibm rftAutomation using ibm rft
Automation using ibm rftPrashant Chaudhary
761 views6 slides
Maximizing SAP ABAP Performance by
Maximizing SAP ABAP PerformanceMaximizing SAP ABAP Performance
Maximizing SAP ABAP PerformancePeterHBrown
9.2K views35 slides
Meetup 2022 - APIs with Quarkus.pdf by
Meetup 2022 - APIs with Quarkus.pdfMeetup 2022 - APIs with Quarkus.pdf
Meetup 2022 - APIs with Quarkus.pdfLuca Mattia Ferrari
26 views23 slides

Similar to Practical SPARQL Benchmarking Revisited(20)

Integration Group - Robot Framework by OpenDaylight
Integration Group - Robot Framework Integration Group - Robot Framework
Integration Group - Robot Framework
OpenDaylight2.4K views
Play framework : A Walkthrough by mitesh_sharma
Play framework : A WalkthroughPlay framework : A Walkthrough
Play framework : A Walkthrough
mitesh_sharma462 views
Network Protocol Testing Using Robot Framework by Payal Jain
Network Protocol Testing Using Robot FrameworkNetwork Protocol Testing Using Robot Framework
Network Protocol Testing Using Robot Framework
Payal Jain7.6K views
Maximizing SAP ABAP Performance by PeterHBrown
Maximizing SAP ABAP PerformanceMaximizing SAP ABAP Performance
Maximizing SAP ABAP Performance
PeterHBrown9.2K views
Native Support of Prometheus Monitoring in Apache Spark 3.0 by Databricks
Native Support of Prometheus Monitoring in Apache Spark 3.0Native Support of Prometheus Monitoring in Apache Spark 3.0
Native Support of Prometheus Monitoring in Apache Spark 3.0
Databricks2.4K views
Performancetestingjmeter 131210111657-phpapp02 by Nitish Bhardwaj
Performancetestingjmeter 131210111657-phpapp02Performancetestingjmeter 131210111657-phpapp02
Performancetestingjmeter 131210111657-phpapp02
Nitish Bhardwaj533 views
Linaro Connect 2016 (BKK16) - Introduction to LISA by Patrick Bellasi
Linaro Connect 2016 (BKK16) - Introduction to LISALinaro Connect 2016 (BKK16) - Introduction to LISA
Linaro Connect 2016 (BKK16) - Introduction to LISA
Patrick Bellasi673 views
Adventures in Laravel 5 SunshinePHP 2016 Tutorial by Joe Ferguson
Adventures in Laravel 5 SunshinePHP 2016 TutorialAdventures in Laravel 5 SunshinePHP 2016 Tutorial
Adventures in Laravel 5 SunshinePHP 2016 Tutorial
Joe Ferguson5.6K views
Streaming Processing with a Distributed Commit Log by Joe Stein
Streaming Processing with a Distributed Commit LogStreaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit Log
Joe Stein3.1K views
Mykola Kovsh - Functional API automation with Jmeter by Ievgenii Katsan
Mykola Kovsh - Functional API automation with JmeterMykola Kovsh - Functional API automation with Jmeter
Mykola Kovsh - Functional API automation with Jmeter
Ievgenii Katsan1.7K views
Performance Testing REST APIs by Jason Weden
Performance Testing REST APIsPerformance Testing REST APIs
Performance Testing REST APIs
Jason Weden13.2K views
Basics of QTP Framework by Anish10110
Basics of QTP FrameworkBasics of QTP Framework
Basics of QTP Framework
Anish1011018.6K views
How to use Exachk effectively to manage Exadata environments OGBEmea by Sandesh Rao
How to use Exachk effectively to manage Exadata environments OGBEmeaHow to use Exachk effectively to manage Exadata environments OGBEmea
How to use Exachk effectively to manage Exadata environments OGBEmea
Sandesh Rao328 views

More from Rob Vesse

Challenges and patterns for semantics at scale by
Challenges and patterns for semantics at scaleChallenges and patterns for semantics at scale
Challenges and patterns for semantics at scaleRob Vesse
390 views15 slides
Introducing JDBC for SPARQL by
Introducing JDBC for SPARQLIntroducing JDBC for SPARQL
Introducing JDBC for SPARQLRob Vesse
2.4K views27 slides
Practical SPARQL Benchmarking by
Practical SPARQL BenchmarkingPractical SPARQL Benchmarking
Practical SPARQL BenchmarkingRob Vesse
2.8K views13 slides
Everyday Tools for the Semantic Web Developer by
Everyday Tools for the Semantic Web DeveloperEveryday Tools for the Semantic Web Developer
Everyday Tools for the Semantic Web DeveloperRob Vesse
1.2K views10 slides
Everyday Tools for the Semantic Web Developer by
Everyday Tools for the Semantic Web DeveloperEveryday Tools for the Semantic Web Developer
Everyday Tools for the Semantic Web DeveloperRob Vesse
2.6K views19 slides
dotNetRDF - A Semantic Web/RDF Library for .Net Developers by
dotNetRDF - A Semantic Web/RDF Library for .Net DevelopersdotNetRDF - A Semantic Web/RDF Library for .Net Developers
dotNetRDF - A Semantic Web/RDF Library for .Net DevelopersRob Vesse
2.8K views10 slides

More from Rob Vesse(6)

Challenges and patterns for semantics at scale by Rob Vesse
Challenges and patterns for semantics at scaleChallenges and patterns for semantics at scale
Challenges and patterns for semantics at scale
Rob Vesse390 views
Introducing JDBC for SPARQL by Rob Vesse
Introducing JDBC for SPARQLIntroducing JDBC for SPARQL
Introducing JDBC for SPARQL
Rob Vesse2.4K views
Practical SPARQL Benchmarking by Rob Vesse
Practical SPARQL BenchmarkingPractical SPARQL Benchmarking
Practical SPARQL Benchmarking
Rob Vesse2.8K views
Everyday Tools for the Semantic Web Developer by Rob Vesse
Everyday Tools for the Semantic Web DeveloperEveryday Tools for the Semantic Web Developer
Everyday Tools for the Semantic Web Developer
Rob Vesse1.2K views
Everyday Tools for the Semantic Web Developer by Rob Vesse
Everyday Tools for the Semantic Web DeveloperEveryday Tools for the Semantic Web Developer
Everyday Tools for the Semantic Web Developer
Rob Vesse2.6K views
dotNetRDF - A Semantic Web/RDF Library for .Net Developers by Rob Vesse
dotNetRDF - A Semantic Web/RDF Library for .Net DevelopersdotNetRDF - A Semantic Web/RDF Library for .Net Developers
dotNetRDF - A Semantic Web/RDF Library for .Net Developers
Rob Vesse2.8K views

Recently uploaded

Future of Learning - Yap Aye Wee.pdf by
Future of Learning - Yap Aye Wee.pdfFuture of Learning - Yap Aye Wee.pdf
Future of Learning - Yap Aye Wee.pdfNUS-ISS
38 views11 slides
Understanding GenAI/LLM and What is Google Offering - Felix Goh by
Understanding GenAI/LLM and What is Google Offering - Felix GohUnderstanding GenAI/LLM and What is Google Offering - Felix Goh
Understanding GenAI/LLM and What is Google Offering - Felix GohNUS-ISS
39 views33 slides
Liqid: Composable CXL Preview by
Liqid: Composable CXL PreviewLiqid: Composable CXL Preview
Liqid: Composable CXL PreviewCXL Forum
121 views8 slides
Architecting CX Measurement Frameworks and Ensuring CX Metrics are fit for Pu... by
Architecting CX Measurement Frameworks and Ensuring CX Metrics are fit for Pu...Architecting CX Measurement Frameworks and Ensuring CX Metrics are fit for Pu...
Architecting CX Measurement Frameworks and Ensuring CX Metrics are fit for Pu...NUS-ISS
32 views54 slides
AI: mind, matter, meaning, metaphors, being, becoming, life values by
AI: mind, matter, meaning, metaphors, being, becoming, life valuesAI: mind, matter, meaning, metaphors, being, becoming, life values
AI: mind, matter, meaning, metaphors, being, becoming, life valuesTwain Liu 刘秋艳
34 views16 slides
"How we switched to Kanban and how it integrates with product planning", Vady... by
"How we switched to Kanban and how it integrates with product planning", Vady..."How we switched to Kanban and how it integrates with product planning", Vady...
"How we switched to Kanban and how it integrates with product planning", Vady...Fwdays
61 views24 slides

Recently uploaded(20)

Future of Learning - Yap Aye Wee.pdf by NUS-ISS
Future of Learning - Yap Aye Wee.pdfFuture of Learning - Yap Aye Wee.pdf
Future of Learning - Yap Aye Wee.pdf
NUS-ISS38 views
Understanding GenAI/LLM and What is Google Offering - Felix Goh by NUS-ISS
Understanding GenAI/LLM and What is Google Offering - Felix GohUnderstanding GenAI/LLM and What is Google Offering - Felix Goh
Understanding GenAI/LLM and What is Google Offering - Felix Goh
NUS-ISS39 views
Liqid: Composable CXL Preview by CXL Forum
Liqid: Composable CXL PreviewLiqid: Composable CXL Preview
Liqid: Composable CXL Preview
CXL Forum121 views
Architecting CX Measurement Frameworks and Ensuring CX Metrics are fit for Pu... by NUS-ISS
Architecting CX Measurement Frameworks and Ensuring CX Metrics are fit for Pu...Architecting CX Measurement Frameworks and Ensuring CX Metrics are fit for Pu...
Architecting CX Measurement Frameworks and Ensuring CX Metrics are fit for Pu...
NUS-ISS32 views
AI: mind, matter, meaning, metaphors, being, becoming, life values by Twain Liu 刘秋艳
AI: mind, matter, meaning, metaphors, being, becoming, life valuesAI: mind, matter, meaning, metaphors, being, becoming, life values
AI: mind, matter, meaning, metaphors, being, becoming, life values
"How we switched to Kanban and how it integrates with product planning", Vady... by Fwdays
"How we switched to Kanban and how it integrates with product planning", Vady..."How we switched to Kanban and how it integrates with product planning", Vady...
"How we switched to Kanban and how it integrates with product planning", Vady...
Fwdays61 views
"Role of a CTO in software outsourcing company", Yuriy Nakonechnyy by Fwdays
"Role of a CTO in software outsourcing company", Yuriy Nakonechnyy"Role of a CTO in software outsourcing company", Yuriy Nakonechnyy
"Role of a CTO in software outsourcing company", Yuriy Nakonechnyy
Fwdays40 views
GigaIO: The March of Composability Onward to Memory with CXL by CXL Forum
GigaIO: The March of Composability Onward to Memory with CXLGigaIO: The March of Composability Onward to Memory with CXL
GigaIO: The March of Composability Onward to Memory with CXL
CXL Forum126 views
Business Analyst Series 2023 - Week 3 Session 5 by DianaGray10
Business Analyst Series 2023 -  Week 3 Session 5Business Analyst Series 2023 -  Week 3 Session 5
Business Analyst Series 2023 - Week 3 Session 5
DianaGray10165 views
Photowave Presentation Slides - 11.8.23.pptx by CXL Forum
Photowave Presentation Slides - 11.8.23.pptxPhotowave Presentation Slides - 11.8.23.pptx
Photowave Presentation Slides - 11.8.23.pptx
CXL Forum126 views
.conf Go 2023 - Data analysis as a routine by Splunk
.conf Go 2023 - Data analysis as a routine.conf Go 2023 - Data analysis as a routine
.conf Go 2023 - Data analysis as a routine
Splunk90 views
Upskilling the Evolving Workforce with Digital Fluency for Tomorrow's Challen... by NUS-ISS
Upskilling the Evolving Workforce with Digital Fluency for Tomorrow's Challen...Upskilling the Evolving Workforce with Digital Fluency for Tomorrow's Challen...
Upskilling the Evolving Workforce with Digital Fluency for Tomorrow's Challen...
NUS-ISS23 views
"Thriving Culture in a Product Company — Practical Story", Volodymyr Tsukur by Fwdays
"Thriving Culture in a Product Company — Practical Story", Volodymyr Tsukur"Thriving Culture in a Product Company — Practical Story", Volodymyr Tsukur
"Thriving Culture in a Product Company — Practical Story", Volodymyr Tsukur
Fwdays40 views
Samsung: CMM-H Tiered Memory Solution with Built-in DRAM by CXL Forum
Samsung: CMM-H Tiered Memory Solution with Built-in DRAMSamsung: CMM-H Tiered Memory Solution with Built-in DRAM
Samsung: CMM-H Tiered Memory Solution with Built-in DRAM
CXL Forum105 views
How to reduce cold starts for Java Serverless applications in AWS at JCON Wor... by Vadym Kazulkin
How to reduce cold starts for Java Serverless applications in AWS at JCON Wor...How to reduce cold starts for Java Serverless applications in AWS at JCON Wor...
How to reduce cold starts for Java Serverless applications in AWS at JCON Wor...
Vadym Kazulkin70 views
"Ukrainian Mobile Banking Scaling in Practice. From 0 to 100 and beyond", Vad... by Fwdays
"Ukrainian Mobile Banking Scaling in Practice. From 0 to 100 and beyond", Vad..."Ukrainian Mobile Banking Scaling in Practice. From 0 to 100 and beyond", Vad...
"Ukrainian Mobile Banking Scaling in Practice. From 0 to 100 and beyond", Vad...
Fwdays40 views
Five Things You SHOULD Know About Postman by Postman
Five Things You SHOULD Know About PostmanFive Things You SHOULD Know About Postman
Five Things You SHOULD Know About Postman
Postman25 views
Web Dev - 1 PPT.pdf by gdsczhcet
Web Dev - 1 PPT.pdfWeb Dev - 1 PPT.pdf
Web Dev - 1 PPT.pdf
gdsczhcet52 views

Practical SPARQL Benchmarking Revisited

  • 1. 1 Rob Vesse rvesse@yarcdata.com @RobVesse
  • 2. 2 1. Rewind to 2012 2. Limitations 3. Evolving the Framework 4. Examples 5. Future Work
  • 3. 3
  • 4.
      • Presentation I gave at this conference in 2012
          • Slides at http://www.slideshare.net/RobVesse/practical-sparql-benchmarking
      • Highlighted some issues with SPARQL benchmarking:
          • Standard benchmarks all have known deficiencies
          • Lack of standardized methodology
          • Best benchmark is the one you run with your data and workload
      • Introduced the 1.x version of our SPARQL Query Benchmarker tool
          • Java tool and API for benchmarking
          • Used a methodology based upon a combination of the BSBM runner and the Revelytix SP2B white paper
          • Reports various appropriate statistics
          • Various configuration options to change what exactly is benchmarked, e.g. whether results are fully parsed and counted
  • 5.
      • The 1.x tool was open sourced shortly after the 2012 conference under a 3-clause BSD License
      • Available on SourceForge
          • http://sourceforge.net/projects/sparql-query-bm/files/1.0.0/
      • Also as Maven artifacts (in Maven Central):
          • Group ID: net.sf.sparql-query-bm
          • Artifact IDs: cmd, core
      • Latest 1.x version: 1.1.0
  • 6. 6
  • 7.
      • The 1.x tool can only benchmark SPARQL queries
      • SPARQL 1.1 has been standardized since the 1.x version of the tool was written and adds various additional SPARQL features that you may want to test:
          • SPARQL Updates
          • SPARQL Graph Store Protocol
      • Queries are fixed
          • No parameterization support
      • Can't pass custom endpoint parameters in
          • For example enable/disable reasoning
      • Also no way to test endpoint specific extensions
          • e.g. transactions
  • 8.
      • Requires using HTTP endpoints to access the SPARQL system to be tested
          • Adds communication overheads to the results
          • Sometimes this may be desirable
      • No ability to test SPARQL operations in-memory
          • i.e. can't test lower level APIs
  • 9.
      • Only supports a single benchmarking methodology
      • Methodology is hard coded
          • Can't do things like run a subset of the provided operations on each run
          • Or repeat an operation within a run
          • Or retry an operation under specific failure conditions
      • Configuration of the methodology is tightly coupled to the methodology
          • Many aspects are actually independent of the methodology
  • 10.
      • Used a simplistic text-based format
          • One query file per line
      • No way to specify additional parameters
      • No way to assign a friendly name to queries
          • Assigns each query the filename
  • 11.
      • There is a progress monitoring API but it is limited
          • e.g. gets called after a query completes but not before it starts
      • Makes it awkward/impossible to implement some kinds of monitoring
          • e.g. crash detection, memory usage
  • 12.
      • In the interests of speed over usability we rolled our own command line arguments parser
          • Means argument parsing is awkward to extend
  • 13. 1 3
  • 14.
      • Earlier this year we found a compelling reason to rewrite the tool and address the various limitations
      • First 2.x release was made 9th June 2014
          • Minor bug fix and maintenance releases since
      • Releases available at:
          • http://sourceforge.net/projects/sparql-query-bm/files/
      • Code is now using Git
          • http://git.code.sf.net/p/sparql-query-bm/git sparql-query-bm-git
          • Mirrors available on GitHub for those who think that it is the one true source
          • https://github.com/rvesse/sparql-query-bm
      • Maven artifacts available through Maven Central as before:
          • Group ID: net.sf.sparql-query-bm
          • Artifact IDs: core, cmd and dist
      • Latest 2.x version: 2.0.1
  • 15.
      • Concept of Queries replaced with the general concept of Operations
      • Also divorces the definition of an operation from how to run said operation
          • Makes it easier to change runtime behaviour of operations
      • 20 built-in operations provided
      • API allows defining and plugging in new operations as desired
          • http://sparql-query-bm.sourceforge.net/javadoc/latest/core/
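The split between what an operation is and how it gets run can be illustrated with a minimal Java sketch. Every name in it (`OperationSketch`, `Op`, `OpRunner`, `TimingRunner`, `Result`) is hypothetical and chosen for illustration only; the framework's real types are described in the Javadoc linked on slide 15.

```java
import java.util.concurrent.Callable;

/** Hypothetical sketch; none of these types are the framework's real API. */
final class OperationSketch {

    /** Outcome of a single run of an operation. */
    static final class Result {
        final boolean success;
        final long nanos;

        Result(boolean success, long nanos) {
            this.success = success;
            this.nanos = nanos;
        }
    }

    /** The definition: what the operation is, with no timing or error handling. */
    interface Op {
        String name();

        Callable<Boolean> createCallable();
    }

    /** The execution policy: how an operation is run and measured. */
    interface OpRunner {
        Result run(Op op);
    }

    /** A runner that times a single attempt and treats any exception as a failure. */
    static final class TimingRunner implements OpRunner {
        @Override
        public Result run(Op op) {
            long start = System.nanoTime();
            boolean ok;
            try {
                ok = op.createCallable().call();
            } catch (Exception e) {
                ok = false;
            }
            return new Result(ok, System.nanoTime() - start);
        }
    }

    /** Helper for building a simple operation from a name and a unit of work. */
    static Op op(final String name, final Callable<Boolean> work) {
        return new Op() {
            @Override
            public String name() {
                return name;
            }

            @Override
            public Callable<Boolean> createCallable() {
                return work;
            }
        };
    }
}
```

Because the runner only sees the `Op` interface, swapping `TimingRunner` for a different policy (retries, timeouts) needs no change to the operation definitions.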
  • 16.
      • Several kinds of query/update
          • Fixed
          • Parameterized
          • Dataset Size
      • Variants for both remote endpoints and in-memory datasets
      • Remote variants have additional NVP (name-value pair) variants
          • Allows adding custom parameters to the remote request
      • Accounts for 13 of the built-in operations
  • 17.
      • One for each graph store protocol operation:
          • DELETE
          • GET
          • HEAD
          • POST
          • PUT
      • Accounts for a further 5 of the built-in operations
  • 18.
      • Sleep
          • Do nothing for some period
          • Useful for simulating quiet periods as part of testing
      • Mix
          • Allow grouping a set of operations into a single operation
          • Lets you compose mixes from other mixes
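The Sleep and Mix operations on slide 18 can be sketched as a small composite. This is only an illustration of the idea, assuming a simplified `Op` interface that returns pass/fail; the class names are hypothetical and are not the framework's own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of composing mixes; not the framework's real classes. */
final class MixSketch {

    interface Op {
        /** Runs the operation and returns true if it succeeded. */
        boolean run();
    }

    /** Does nothing for a fixed period, simulating a quiet spell in the workload. */
    static final class Sleep implements Op {
        private final long millis;

        Sleep(long millis) {
            this.millis = millis;
        }

        @Override
        public boolean run() {
            try {
                Thread.sleep(millis);
                return true;
            } catch (InterruptedException e) {
                // Restore the interrupt flag and report the sleep as failed
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    /** Groups operations into one; since a Mix is itself an Op, mixes can nest. */
    static final class Mix implements Op {
        private final List<Op> ops;

        Mix(Op... ops) {
            this.ops = new ArrayList<>(Arrays.asList(ops));
        }

        @Override
        public boolean run() {
            boolean allPassed = true;
            for (Op op : ops) {
                // Non-short-circuit AND: every operation runs even after a failure
                allPassed &= op.run();
            }
            return allPassed;
        }
    }
}
```

A nested mix is then just `new MixSketch.Mix(innerMix, anotherOp)`, which mirrors composing mixes from other mixes.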
  • 19.
      • As already noted, in-memory variants of some operations are now available
      • These run tests against a Dataset implementation
          • Part of the Apache Jena ARQ API
      • Removes SPARQL Protocol and HTTP overhead from testing
      • Of course, depending on the Dataset implementation, there may still be some communication overhead
          • But this is likely using lower level back end native communications protocols instead
  • 20.
      • Addresses the limitation of hard coded methodology
      • Separates test running into three components:
          • Overall runner
          • Mix runner
          • Operation runner
      • Each has its own API and can be customized as desired
          • Various useful base/abstract implementations provided
      • Four different test runners are provided:
          • Benchmark
          • Smoke
          • Soak
          • Stress
  • 21.
      • Smoke
          • Runs the mix once and indicates whether it passes/fails
          • Pass is defined as all operations pass
      • Soak
          • Run the mix continuously for some period of time
          • Test how a system reacts under continuous load
      • Stress
          • Run the mix with increasingly high load
          • Test how a system reacts under increasing load
      • AbstractRunner provides a basic framework and helper methods to make it easy to add custom runners or customize existing runs
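The smoke, soak and stress behaviours described on slide 21 can be sketched in a few lines of Java. This is an illustration under stated assumptions rather than the framework's implementation: operations are modelled as `BooleanSupplier`, all method names are hypothetical, and the stress schedule assumes load is multiplied by a ramp-up factor each step, which the slides do not spell out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of smoke, soak and stress runs; not the framework's real runners. */
final class RunnerSketch {

    /** Runs every operation once; returns true only if all of them passed. */
    static boolean runMixOnce(List<BooleanSupplier> mix) {
        boolean allPassed = true;
        for (BooleanSupplier op : mix) {
            allPassed &= op.getAsBoolean();
        }
        return allPassed;
    }

    /** Smoke test: a single pass over the mix, reported as pass/fail. */
    static boolean smoke(List<BooleanSupplier> mix) {
        return runMixOnce(mix);
    }

    /** Soak test: repeat the mix until the time budget is used up; returns completed mix runs. */
    static int soak(List<BooleanSupplier> mix, long maxMillis) {
        long deadline = System.nanoTime() + maxMillis * 1_000_000L;
        int runs = 0;
        do {
            runMixOnce(mix);
            runs++;
        } while (deadline - System.nanoTime() > 0);
        return runs;
    }

    /** Stress schedule (assumed): thread counts grow by rampUpFactor until maxThreads is passed. */
    static List<Integer> stressLoadLevels(int startThreads, int rampUpFactor, int maxThreads) {
        if (startThreads < 1 || rampUpFactor < 2) {
            throw new IllegalArgumentException("startThreads must be >= 1 and rampUpFactor >= 2");
        }
        List<Integer> levels = new ArrayList<>();
        // long counter avoids int overflow when multiplying near Integer.MAX_VALUE
        for (long threads = startThreads; threads <= maxThreads; threads *= rampUpFactor) {
            levels.add((int) threads);
        }
        return levels;
    }
}
```

A real stress runner would execute the mix concurrently at each level; the sketch only computes the schedule so that the ramp-up idea is visible.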
  • 22.
      • Allows customizing how mixes and individual operations are run
      • Some alternative implementations built in:
          • e.g. SamplingOperationMixRunner
              • Runs a sample of the operations in the mix
              • May include repeats
          • e.g. RetryingOperationRunner
              • Retries an operation if it doesn't succeed
      • Easy to implement your own
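The two ideas named on slide 22, sampling a mix and retrying a failed operation, can be sketched as plain static helpers. This is not the code of SamplingOperationMixRunner or RetryingOperationRunner; the method names and signatures below are hypothetical, and operations are modelled as `BooleanSupplier` for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of sampling and retrying; not the framework's real runners. */
final class CustomRunnerSketch {

    /** Picks sampleSize items at random; with allowRepeats the same item may be picked again. */
    static <T> List<T> sample(List<T> mix, int sampleSize, boolean allowRepeats, Random random) {
        if (sampleSize < 0 || (sampleSize > 0 && mix.isEmpty())) {
            throw new IllegalArgumentException("Cannot take " + sampleSize + " items from this mix");
        }
        if (!allowRepeats && sampleSize > mix.size()) {
            throw new IllegalArgumentException("Sample larger than mix requires repeats");
        }
        List<T> pool = new ArrayList<>(mix);
        List<T> chosen = new ArrayList<>(sampleSize);
        for (int i = 0; i < sampleSize; i++) {
            int index = random.nextInt(pool.size());
            // Without repeats the chosen item is removed from the pool
            chosen.add(allowRepeats ? pool.get(index) : pool.remove(index));
        }
        return chosen;
    }

    /** Runs the operation once, then retries up to maxRetries more times until it succeeds. */
    static boolean runWithRetries(BooleanSupplier op, int maxRetries) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            if (op.getAsBoolean()) {
                return true;
            }
        }
        return false;
    }
}
```

Passing a seeded `Random` keeps a sampled run reproducible, which matters when comparing results between runs.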
  • 23.
      • Separates test configuration from the test runner
      • Interface with all common configuration defined
          • Endpoints
          • Timeouts
          • Progress Listeners
          • etc.
      • NB - Runners are typically defined such that they restrict their input options to sub-interfaces that add runner specific configuration, e.g.
          • Warm-ups for benchmarks
          • Total runtime for soak testing
          • Ramp up factor for stress testing
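The arrangement on slide 23, where common configuration lives in one interface and each runner narrows its input to a sub-interface, maps naturally onto a bounded generic type. The sketch below assumes hypothetical names (`Options`, `SoakOptions`, `Runner`, `SoakRunner`) and a made-up `describe` method; only the shape of the type relationship is the point.

```java
/** Hypothetical sketch of runner-specific option interfaces; not the framework's real API. */
final class OptionsSketch {

    /** Configuration common to every kind of test run. */
    interface Options {
        String endpoint();

        int timeoutSeconds();
    }

    /** Adds the one setting that only soak testing needs. */
    interface SoakOptions extends Options {
        long maxRuntimeMinutes();
    }

    /** A runner declares, via its type parameter, which options it requires. */
    interface Runner<T extends Options> {
        String describe(T options);
    }

    /** Accepts only SoakOptions, so a missing runtime limit is a compile-time error. */
    static final class SoakRunner implements Runner<SoakOptions> {
        @Override
        public String describe(SoakOptions options) {
            return "soak " + options.endpoint() + " for " + options.maxRuntimeMinutes()
                    + " min (timeout " + options.timeoutSeconds() + "s)";
        }
    }
}
```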
  • 24.
      • Now using TSV as the file format
          • Still wanted it to be simple enough that someone with zero RDF/SPARQL knowledge can configure it
      • Each line is a series of parameters separated by a tab character
          • First parameter is an identifier for the type of the operation
          • Used to decide how to interpret the remaining parameters
      • Can define your own mix file format and register a loader for it
      • Possible to override the loader for a specific operation identifier since this has an API
          • Means you can do neat tricks like use a mix designed for remote endpoints against an in-memory dataset
  • 25.
        query 806670-warmup1.rq 806670 Warmup Query 1
        query 806670-warmup2.rq 806670 Warmup Query 2
        query 806670-nofilter.rq 806670 Query with No Filter
        query 806670-filter3.rq 806670 Query with Filter (Variant 3)
        param-query 806670-filter3-params.rq instances.tsv Parameterized Query with Filter (Variant 3)
        query 806670-filter4.rq 806670 Query with Filter (Variant 4)
        query 806670-filter4a.rq 806670 Query with Filter (Variant 4a - Zero Results)
        param-query 806670-filter4-params.rq instances.tsv Parameterized Query with Filter (Variant 4)
        query 806238-warmup1.rq 806238 Warmup Query 1
        query 806238-warmup2.rq 806238 Warmup Query 2
        query 806238-comment43.rq 806238 Query (Comment 43)
        query 806238-comment43a.rq 806238 Query (Comment 43 - SELECT * sub-query)
        query 806238-comment45.rq 806238 Query (Comment 45 - Multiple sub-queries)
        query 806238-comment54.rq 806238 Query (Comment 54)
        param-update load-full1m.ru graph-names.tsv Load 1M Dataset into named graph
        param-query count-loaded.rq graph-names.tsv Count named graph
        param-update drop-loaded.ru graph-names.tsv Drop named graph
        query count.rq Count quads
        checkpoint10 Checkpoint every 10 runs
        sleep 180 3 minute sleep
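The loading scheme described on slide 24 (split each line on tabs, use the first field to pick a loader, allow a loader to be overridden per identifier) can be sketched as a small registry. The class and method names are hypothetical, and loaders here simply return a description string instead of a real operation object. The assumed field layout of identifier, file, then friendly name is a guess based on the example mix, not taken from the tool's documentation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of a TSV mix loader registry; not the framework's real loader API. */
final class MixFileSketch {

    /** Maps an operation identifier (the first field) to a loader for the remaining fields. */
    private final Map<String, Function<String[], String>> loaders = new HashMap<>();

    /** Registers a loader; registering the same identifier again overrides the earlier loader. */
    void register(String identifier, Function<String[], String> loader) {
        loaders.put(identifier, loader);
    }

    /** Turns each non-blank line into an operation description using the registered loaders. */
    List<String> load(List<String> lines) {
        List<String> operations = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] fields = line.split("\t");
            Function<String[], String> loader = loaders.get(fields[0]);
            if (loader == null) {
                throw new IllegalArgumentException("No loader registered for operation type: " + fields[0]);
            }
            // The loader sees only the parameters after the identifier
            operations.add(loader.apply(Arrays.copyOfRange(fields, 1, fields.length)));
        }
        return operations;
    }
}
```

Re-registering `query` with a loader that targets an in-memory dataset is how the same mix file could be pointed at a different backend without editing it.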
  • 26.
      • Now provides notifications before and after operation and mix runs
      • Improvements to how some of the built-in implementations handle multi-threaded output
          • Makes it easier to distinguish where errors occurred when running multi-threaded benchmarks
  • 27.
      • Now based upon the powerful open source Airline library
          • https://github.com/airlift/airline
      • Provides a command line interface to each built-in runner
      • Also provides AbstractCommand with all standard options exposed
      • Standardized exit codes across all commands
      • Comprehensive built-in help
          • Can help you define operation mixes
          • ./operations
          • ./operation --op param-query
  • 28. 2 8
  • 29.
      • These are things we've done (or are currently doing) with the framework that aren't in the open source releases
      • However the 2.x framework makes these (hopefully) easy to replicate yourself
  • 30.
      • Many stores have rich REST APIs in addition to their SPARQL APIs
          • Can be useful to include testing of these in your mixes
      • Requires implementing two interfaces:
          • Operation
          • OperationCallable
          • Abstract implementations of both are available to give you the boilerplate bits
      • Internally we have 9 different custom operations defined which test a subset of our REST API:
          • Database Management
          • Asynchronous Queries
          • Import Management
  • 31.
      • One thing we're particularly interested in is how operations affect memory usage
      • We added custom progress listeners that track and monitor memory usage
          • Reports on min, max and average memory usage
      • We also have another progress listener that tracks processes to identify when a test run may have been impacted by other activity on the system
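A memory-tracking progress listener of the kind described on slide 31 could look roughly like the sketch below. The internal listeners are not part of the open source releases, so everything here is an assumption: the `ProgressListener` interface and its method names are hypothetical, and the default sampler reads the heap of the JVM running the test, whereas a real setup would more likely sample the system under test.

```java
import java.util.function.LongSupplier;

/** Hypothetical sketch of a memory-tracking listener; not the framework's real listener API. */
final class MemoryListenerSketch {

    /** Notified both before and after each operation, as the 2.x progress API allows. */
    interface ProgressListener {
        void beforeOperation(String name);

        void afterOperation(String name);
    }

    /** Samples memory usage at every notification and keeps min, max and average. */
    static final class MemoryTracker implements ProgressListener {
        private final LongSupplier usedMemory;
        private long min = Long.MAX_VALUE;
        private long max = Long.MIN_VALUE;
        private long total = 0;
        private int samples = 0;

        /** Default sampler (an assumption): used heap of the current JVM. */
        MemoryTracker() {
            this(() -> {
                Runtime runtime = Runtime.getRuntime();
                return runtime.totalMemory() - runtime.freeMemory();
            });
        }

        /** Allows plugging in any other source of memory readings. */
        MemoryTracker(LongSupplier usedMemory) {
            this.usedMemory = usedMemory;
        }

        private void sample() {
            long used = usedMemory.getAsLong();
            min = Math.min(min, used);
            max = Math.max(max, used);
            total += used;
            samples++;
        }

        @Override
        public void beforeOperation(String name) {
            sample();
        }

        @Override
        public void afterOperation(String name) {
            sample();
        }

        long min() {
            return min;
        }

        long max() {
            return max;
        }

        double average() {
            return samples == 0 ? 0.0 : (double) total / samples;
        }
    }
}
```

Sampling on both notifications is only possible because the 2.x API calls listeners before an operation starts as well as after it completes.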
  • 32.
        public class RetryOnAuthFailureOperationRunner extends RetryingOperationRunner {

            // Default to a single retry
            public RetryOnAuthFailureOperationRunner() {
                this(1);
            }

            public RetryOnAuthFailureOperationRunner(int maxRetries) {
                super(maxRetries);
            }

            // Retry only when the run failed with an authentication error
            @Override
            protected <T extends Options> boolean shouldRetry(Runner<T> runner, T options, Operation op, OperationRun run) {
                return run.getErrorCategory() == ErrorCategories.AUTHENTICATION;
            }
        }
      • Extends the built-in RetryingOperationRunner
      • Simply adds a constraint on retries by overriding the shouldRetry() method
  • 33. 3 3
  • 34.
      • Embrace Java 7 features fully
      • Use ServiceLoader to automatically discover new operations and mix formats
      • Make it even easier to customize runners
          • i.e. provide more abstraction of the current implementations
  • 35. Questions? rvesse@yarcdata.com @RobVesse

Editor's Notes

  1. Ask for a show of hands as to who has used the tool to get an idea of the audience
  2. SPARQL 1.1 standardized 21st March 2013