Practical SPARQL Benchmarking

Rob Vesse
Rob VesseSoftware Engineer at YarcData
Rob Vesse
rvesse@yarcdata.com
     @RobVesse




                      1
 Regardless of what technology your solution will be built on
  (RDBMS, RDF + SPARQL, NoSQL etc) you need to know it
  performs sufficiently to meet your goals
 You need to justify option X over option Y
   Business – Price vs Performance
   Technical – Does it perform sufficiently?
 No guarantee that a standard benchmark accurately
 models your usage




                                                           2
 Berlin SPARQL Benchmark (BSBM)
   Relational style data model
   Access pattern simulates replacing a traditional RDBMS with a Triple
      Store
 Lehigh University Benchmark (LUBM)
   More typical RDF data model
   Stores require reasoning to answer the queries correctly
 SPARQL2Bench (SP2B)
     Again typical RDF data model
     Queries designed to be hard – cross products, filters, etc.
     Generates artificially massive unrealistic results
     Tests clever optimization and join performance




                                                                      3
 Often no standardized       methodology
   E.g. only BSBM provides a test harness
 Lack of transparency as a result
   If I say I’m 10x faster than you is that really true or did I measure
    differently?
   Are the figures you’re comparing with even current?
 What actually got measured?
   Time to start responding
   Time to count all results
   Something else?
 Even if you run a benchmark does it actually tell you
  anything useful?



                                                                            4
 Java command line tool (and API) for benchmarking
 Designed to be highly configurable
   Runs any set of SPARQL queries you can devise against any HTTP
    based SPARQL endpoint
   Run single and multi-threaded benchmarks
   Generates a variety of statistics
 Methodology
   Runs some quick sanity tests to check the provided endpoint is up
    and working
   Optionally runs W warm up runs prior to actual benchmarking
   Runs a Query Mix N times
      Randomizes query order for each run
      Discards outliers (best and worst runs)
   Calculates averages, variances and standard deviations over the runs
   Generates reports as CSV and XML


                                                                        5
 Response Time
   Time from when query is issued to when results start being received
 Runtime
   Time from when query is issued to all results being received and
    counted
   Exact definition may vary according to configuration
 Queries per Second
   How many times a given query can be executed per second
 Query Mixed per Hour
   How many times a query mix can be executed per hour




                                                                       6
7
   SP2B at 10k, 50k and 250k run with 5 warm-ups and 25 runs
     All options left as defaults i.e. full result counting
     Runs for 50k and 250k skipped if store was incapable of performing the run
        in reasonable time
   Run on following systems
     *nix based stores run on late 2011 Mac Book Pro (quad core, 8GB RAM,
        SSD)
          Java heap space set to 4GB
     Windows based stores run on HP Laptop (dual core, 4GB RAM, HDD)
     Both low powered systems compared to servers
   Benchmarked Stores
       Jena TDB 0.9.1
       Sesame 2.6.5 (Memory and Native Stores)
       Bigdata 1.2 (WORM Store)
       Dydra
       Virtuoso 6.1.3 (Open Source Edition)
       dotNetRDF (In-Memory Store)
       Stardog 0.9.4 (In-Memory and Disk Stores)
       OWLIM


                                                                             8
9
1
0
1
1
 Code Release is management Approved
     Currently undergoing Legal and IP Clearance
     Should be open sourced shortly under a BSD license
     Will be available from https://sourceforge.net/p/sparql-query-bm
     Apologies this isn’t yet available at time of writing
 Example Results data available       from:
   https://dl.dropbox.com/u/590790/semtech2012.tar.gz




                                                                         1
                                                                         2
1
3
1 of 13

Recommended

Cassandra Day Chicago 2015: Top 5 Tips/Tricks with Apache Cassandra and DSE by
Cassandra Day Chicago 2015: Top 5 Tips/Tricks with Apache Cassandra and DSECassandra Day Chicago 2015: Top 5 Tips/Tricks with Apache Cassandra and DSE
Cassandra Day Chicago 2015: Top 5 Tips/Tricks with Apache Cassandra and DSEDataStax Academy
1.4K views20 slides
Load testing and performance tracing by
Load testing and performance tracingLoad testing and performance tracing
Load testing and performance tracingHans Höchtl
2.7K views29 slides
Cache optimization by
Cache optimizationCache optimization
Cache optimizationVani Kandhasamy
688 views61 slides
Lessons PostgreSQL learned from commercial databases, and didn’t by
Lessons PostgreSQL learned from commercial databases, and didn’tLessons PostgreSQL learned from commercial databases, and didn’t
Lessons PostgreSQL learned from commercial databases, and didn’tPGConf APAC
1.7K views21 slides
Ilya Kosmodemiansky - An ultimate guide to upgrading your PostgreSQL installa... by
Ilya Kosmodemiansky - An ultimate guide to upgrading your PostgreSQL installa...Ilya Kosmodemiansky - An ultimate guide to upgrading your PostgreSQL installa...
Ilya Kosmodemiansky - An ultimate guide to upgrading your PostgreSQL installa...PostgreSQL-Consulting
2.5K views31 slides
Scalable and Available, Patterns for Success by
Scalable and Available, Patterns for SuccessScalable and Available, Patterns for Success
Scalable and Available, Patterns for SuccessDerek Collison
20.2K views117 slides

More Related Content

What's hot

dh-slides-perf.ppt by
dh-slides-perf.pptdh-slides-perf.ppt
dh-slides-perf.ppthackday08
496 views13 slides
Webinar: Serie Operazioni per la vostra applicazione - Sessione 6 - Installar... by
Webinar: Serie Operazioni per la vostra applicazione - Sessione 6 - Installar...Webinar: Serie Operazioni per la vostra applicazione - Sessione 6 - Installar...
Webinar: Serie Operazioni per la vostra applicazione - Sessione 6 - Installar...MongoDB
671 views65 slides
Drupal meets PostgreSQL for DrupalCamp MSK 2014 by
Drupal meets PostgreSQL for DrupalCamp MSK 2014Drupal meets PostgreSQL for DrupalCamp MSK 2014
Drupal meets PostgreSQL for DrupalCamp MSK 2014Kate Marshalkina
3K views32 slides
WTF? by
WTF?WTF?
WTF?Andrey Karpov
101 views2 slides
re:dash is awesome by
re:dash is awesomere:dash is awesome
re:dash is awesomeHiroshi Toyama
19K views42 slides
OSGifying the repository by
OSGifying the repositoryOSGifying the repository
OSGifying the repositoryJukka Zitting
2.2K views11 slides

What's hot(16)

dh-slides-perf.ppt by hackday08
dh-slides-perf.pptdh-slides-perf.ppt
dh-slides-perf.ppt
hackday08496 views
Webinar: Serie Operazioni per la vostra applicazione - Sessione 6 - Installar... by MongoDB
Webinar: Serie Operazioni per la vostra applicazione - Sessione 6 - Installar...Webinar: Serie Operazioni per la vostra applicazione - Sessione 6 - Installar...
Webinar: Serie Operazioni per la vostra applicazione - Sessione 6 - Installar...
MongoDB671 views
Drupal meets PostgreSQL for DrupalCamp MSK 2014 by Kate Marshalkina
Drupal meets PostgreSQL for DrupalCamp MSK 2014Drupal meets PostgreSQL for DrupalCamp MSK 2014
Drupal meets PostgreSQL for DrupalCamp MSK 2014
Kate Marshalkina3K views
OSGifying the repository by Jukka Zitting
OSGifying the repositoryOSGifying the repository
OSGifying the repository
Jukka Zitting2.2K views
Troubleshooting redis by DaeMyung Kang
Troubleshooting redisTroubleshooting redis
Troubleshooting redis
DaeMyung Kang2.6K views
Gluster the ugly parts with Jeff Darcy by Gluster.org
Gluster  the ugly parts with Jeff DarcyGluster  the ugly parts with Jeff Darcy
Gluster the ugly parts with Jeff Darcy
Gluster.org819 views
Actions in QTP by Anish10110
Actions in QTPActions in QTP
Actions in QTP
Anish101108.4K views
Breaking the Sound Barrier with Persistent Memory by HBaseCon
Breaking the Sound Barrier with Persistent Memory Breaking the Sound Barrier with Persistent Memory
Breaking the Sound Barrier with Persistent Memory
HBaseCon1.6K views
CCI2018 - Benchmarking in the cloud by walk2talk srl
CCI2018 - Benchmarking in the cloudCCI2018 - Benchmarking in the cloud
CCI2018 - Benchmarking in the cloud
walk2talk srl304 views
J2EE Performance And Scalability Bp by Chris Adkin
J2EE Performance And Scalability BpJ2EE Performance And Scalability Bp
J2EE Performance And Scalability Bp
Chris Adkin3K views
Cassandra from tarball to production by Ron Kuris
Cassandra   from tarball to productionCassandra   from tarball to production
Cassandra from tarball to production
Ron Kuris710 views
Cloud Performance Benchmarking by Santanu Dey
Cloud Performance BenchmarkingCloud Performance Benchmarking
Cloud Performance Benchmarking
Santanu Dey1.4K views
Why we love pgpool-II and why we hate it! by PGConf APAC
Why we love pgpool-II and why we hate it!Why we love pgpool-II and why we hate it!
Why we love pgpool-II and why we hate it!
PGConf APAC6K views

Similar to Practical SPARQL Benchmarking

Ceph - High Performance Without High Costs by
Ceph - High Performance Without High CostsCeph - High Performance Without High Costs
Ceph - High Performance Without High CostsJonathan Long
385 views21 slides
How Many Slaves (Ukoug) by
How Many Slaves (Ukoug)How Many Slaves (Ukoug)
How Many Slaves (Ukoug)Doug Burns
802 views37 slides
UnConference for Georgia Southern Computer Science March 31, 2015 by
UnConference for Georgia Southern Computer Science March 31, 2015UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015Christopher Curtin
1K views76 slides
Big Linked Data ETL Benchmark on Cloud Commodity Hardware by
Big Linked Data ETL Benchmark on Cloud Commodity HardwareBig Linked Data ETL Benchmark on Cloud Commodity Hardware
Big Linked Data ETL Benchmark on Cloud Commodity HardwareLaurens De Vocht
482 views31 slides
Benchmarking Hadoop and Big Data by
Benchmarking Hadoop and Big DataBenchmarking Hadoop and Big Data
Benchmarking Hadoop and Big DataNicolas Poggi
3.2K views41 slides
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St... by
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...t_ivanov
385 views51 slides

Similar to Practical SPARQL Benchmarking(20)

Ceph - High Performance Without High Costs by Jonathan Long
Ceph - High Performance Without High CostsCeph - High Performance Without High Costs
Ceph - High Performance Without High Costs
Jonathan Long385 views
How Many Slaves (Ukoug) by Doug Burns
How Many Slaves (Ukoug)How Many Slaves (Ukoug)
How Many Slaves (Ukoug)
Doug Burns802 views
UnConference for Georgia Southern Computer Science March 31, 2015 by Christopher Curtin
UnConference for Georgia Southern Computer Science March 31, 2015UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015
Big Linked Data ETL Benchmark on Cloud Commodity Hardware by Laurens De Vocht
Big Linked Data ETL Benchmark on Cloud Commodity HardwareBig Linked Data ETL Benchmark on Cloud Commodity Hardware
Big Linked Data ETL Benchmark on Cloud Commodity Hardware
Laurens De Vocht482 views
Benchmarking Hadoop and Big Data by Nicolas Poggi
Benchmarking Hadoop and Big DataBenchmarking Hadoop and Big Data
Benchmarking Hadoop and Big Data
Nicolas Poggi3.2K views
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St... by t_ivanov
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
t_ivanov385 views
Thing you didn't know you could do in Spark by SnappyData
Thing you didn't know you could do in SparkThing you didn't know you could do in Spark
Thing you didn't know you could do in Spark
SnappyData4.2K views
Planning for-high-performance-web-application by Nguyễn Duy Nhân
Planning for-high-performance-web-applicationPlanning for-high-performance-web-application
Planning for-high-performance-web-application
Nguyễn Duy Nhân1.8K views
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010 by Bhupesh Bansal
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Bhupesh Bansal1.3K views
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ... by javier ramirez
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
javier ramirez18 views
Data Engineering for Data Scientists by jlacefie
Data Engineering for Data Scientists Data Engineering for Data Scientists
Data Engineering for Data Scientists
jlacefie1.2K views
scale_perf_best_practices by webuploader
scale_perf_best_practicesscale_perf_best_practices
scale_perf_best_practices
webuploader724 views
VMworld 2013: Virtualizing Databases: Doing IT Right by VMworld
VMworld 2013: Virtualizing Databases: Doing IT Right VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld1.3K views
Api testing libraries using java script an overview by vodQA
Api testing libraries using java script   an overviewApi testing libraries using java script   an overview
Api testing libraries using java script an overview
vodQA177 views
SQL on Hadoop benchmarks using TPC-DS query set by Kognitio
SQL on Hadoop benchmarks using TPC-DS query setSQL on Hadoop benchmarks using TPC-DS query set
SQL on Hadoop benchmarks using TPC-DS query set
Kognitio412 views
Planning For High Performance Web Application by Yue Tian
Planning For High Performance Web ApplicationPlanning For High Performance Web Application
Planning For High Performance Web Application
Yue Tian1.3K views
Hardware Provisioning by MongoDB
Hardware ProvisioningHardware Provisioning
Hardware Provisioning
MongoDB2.4K views

More from Rob Vesse

Challenges and patterns for semantics at scale by
Challenges and patterns for semantics at scaleChallenges and patterns for semantics at scale
Challenges and patterns for semantics at scaleRob Vesse
390 views15 slides
Apache Jena Elephas and Friends by
Apache Jena Elephas and FriendsApache Jena Elephas and Friends
Apache Jena Elephas and FriendsRob Vesse
2.4K views41 slides
Quadrupling your elephants - RDF and the Hadoop ecosystem by
Quadrupling your elephants - RDF and the Hadoop ecosystemQuadrupling your elephants - RDF and the Hadoop ecosystem
Quadrupling your elephants - RDF and the Hadoop ecosystemRob Vesse
5K views40 slides
Practical SPARQL Benchmarking Revisited by
Practical SPARQL Benchmarking RevisitedPractical SPARQL Benchmarking Revisited
Practical SPARQL Benchmarking RevisitedRob Vesse
1.1K views35 slides
Introducing JDBC for SPARQL by
Introducing JDBC for SPARQLIntroducing JDBC for SPARQL
Introducing JDBC for SPARQLRob Vesse
2.4K views27 slides
Everyday Tools for the Semantic Web Developer by
Everyday Tools for the Semantic Web DeveloperEveryday Tools for the Semantic Web Developer
Everyday Tools for the Semantic Web DeveloperRob Vesse
1.2K views10 slides

More from Rob Vesse(8)

Challenges and patterns for semantics at scale by Rob Vesse
Challenges and patterns for semantics at scaleChallenges and patterns for semantics at scale
Challenges and patterns for semantics at scale
Rob Vesse390 views
Apache Jena Elephas and Friends by Rob Vesse
Apache Jena Elephas and FriendsApache Jena Elephas and Friends
Apache Jena Elephas and Friends
Rob Vesse2.4K views
Quadrupling your elephants - RDF and the Hadoop ecosystem by Rob Vesse
Quadrupling your elephants - RDF and the Hadoop ecosystemQuadrupling your elephants - RDF and the Hadoop ecosystem
Quadrupling your elephants - RDF and the Hadoop ecosystem
Rob Vesse5K views
Practical SPARQL Benchmarking Revisited by Rob Vesse
Practical SPARQL Benchmarking RevisitedPractical SPARQL Benchmarking Revisited
Practical SPARQL Benchmarking Revisited
Rob Vesse1.1K views
Introducing JDBC for SPARQL by Rob Vesse
Introducing JDBC for SPARQLIntroducing JDBC for SPARQL
Introducing JDBC for SPARQL
Rob Vesse2.4K views
Everyday Tools for the Semantic Web Developer by Rob Vesse
Everyday Tools for the Semantic Web DeveloperEveryday Tools for the Semantic Web Developer
Everyday Tools for the Semantic Web Developer
Rob Vesse1.2K views
Everyday Tools for the Semantic Web Developer by Rob Vesse
Everyday Tools for the Semantic Web DeveloperEveryday Tools for the Semantic Web Developer
Everyday Tools for the Semantic Web Developer
Rob Vesse2.6K views
dotNetRDF - A Semantic Web/RDF Library for .Net Developers by Rob Vesse
dotNetRDF - A Semantic Web/RDF Library for .Net DevelopersdotNetRDF - A Semantic Web/RDF Library for .Net Developers
dotNetRDF - A Semantic Web/RDF Library for .Net Developers
Rob Vesse2.8K views

Recently uploaded

[2023] Putting the R! in R&D.pdf by
[2023] Putting the R! in R&D.pdf[2023] Putting the R! in R&D.pdf
[2023] Putting the R! in R&D.pdfEleanor McHugh
38 views127 slides
Roadmap to Become Experts.pptx by
Roadmap to Become Experts.pptxRoadmap to Become Experts.pptx
Roadmap to Become Experts.pptxdscwidyatamanew
11 views45 slides
Special_edition_innovator_2023.pdf by
Special_edition_innovator_2023.pdfSpecial_edition_innovator_2023.pdf
Special_edition_innovator_2023.pdfWillDavies22
16 views6 slides
.conf Go 2023 - Data analysis as a routine by
.conf Go 2023 - Data analysis as a routine.conf Go 2023 - Data analysis as a routine
.conf Go 2023 - Data analysis as a routineSplunk
93 views12 slides
Throughput by
ThroughputThroughput
ThroughputMoisés Armani Ramírez
36 views11 slides
Web Dev - 1 PPT.pdf by
Web Dev - 1 PPT.pdfWeb Dev - 1 PPT.pdf
Web Dev - 1 PPT.pdfgdsczhcet
55 views45 slides

Recently uploaded(20)

[2023] Putting the R! in R&D.pdf by Eleanor McHugh
[2023] Putting the R! in R&D.pdf[2023] Putting the R! in R&D.pdf
[2023] Putting the R! in R&D.pdf
Eleanor McHugh38 views
Special_edition_innovator_2023.pdf by WillDavies22
Special_edition_innovator_2023.pdfSpecial_edition_innovator_2023.pdf
Special_edition_innovator_2023.pdf
WillDavies2216 views
.conf Go 2023 - Data analysis as a routine by Splunk
.conf Go 2023 - Data analysis as a routine.conf Go 2023 - Data analysis as a routine
.conf Go 2023 - Data analysis as a routine
Splunk93 views
Web Dev - 1 PPT.pdf by gdsczhcet
Web Dev - 1 PPT.pdfWeb Dev - 1 PPT.pdf
Web Dev - 1 PPT.pdf
gdsczhcet55 views
How the World's Leading Independent Automotive Distributor is Reinventing Its... by NUS-ISS
How the World's Leading Independent Automotive Distributor is Reinventing Its...How the World's Leading Independent Automotive Distributor is Reinventing Its...
How the World's Leading Independent Automotive Distributor is Reinventing Its...
NUS-ISS15 views
Empathic Computing: Delivering the Potential of the Metaverse by Mark Billinghurst
Empathic Computing: Delivering  the Potential of the MetaverseEmpathic Computing: Delivering  the Potential of the Metaverse
Empathic Computing: Delivering the Potential of the Metaverse
Mark Billinghurst470 views
How to reduce cold starts for Java Serverless applications in AWS at JCON Wor... by Vadym Kazulkin
How to reduce cold starts for Java Serverless applications in AWS at JCON Wor...How to reduce cold starts for Java Serverless applications in AWS at JCON Wor...
How to reduce cold starts for Java Serverless applications in AWS at JCON Wor...
Vadym Kazulkin75 views
AMAZON PRODUCT RESEARCH.pdf by JerikkLaureta
AMAZON PRODUCT RESEARCH.pdfAMAZON PRODUCT RESEARCH.pdf
AMAZON PRODUCT RESEARCH.pdf
JerikkLaureta15 views
RADIUS-Omnichannel Interaction System by RADIUS
RADIUS-Omnichannel Interaction SystemRADIUS-Omnichannel Interaction System
RADIUS-Omnichannel Interaction System
RADIUS15 views
Future of Learning - Khoong Chan Meng by NUS-ISS
Future of Learning - Khoong Chan MengFuture of Learning - Khoong Chan Meng
Future of Learning - Khoong Chan Meng
NUS-ISS33 views
DALI Basics Course 2023 by Ivory Egg
DALI Basics Course  2023DALI Basics Course  2023
DALI Basics Course 2023
Ivory Egg14 views
Voice Logger - Telephony Integration Solution at Aegis by Nirmal Sharma
Voice Logger - Telephony Integration Solution at AegisVoice Logger - Telephony Integration Solution at Aegis
Voice Logger - Telephony Integration Solution at Aegis
Nirmal Sharma17 views
Perth MeetUp November 2023 by Michael Price
Perth MeetUp November 2023 Perth MeetUp November 2023
Perth MeetUp November 2023
Michael Price15 views
Emerging & Future Technology - How to Prepare for the Next 10 Years of Radica... by NUS-ISS
Emerging & Future Technology - How to Prepare for the Next 10 Years of Radica...Emerging & Future Technology - How to Prepare for the Next 10 Years of Radica...
Emerging & Future Technology - How to Prepare for the Next 10 Years of Radica...
NUS-ISS16 views

Practical SPARQL Benchmarking

  • 2.  Regardless of what technology your solution will be built on (RDBMS, RDF + SPARQL, NoSQL etc) you need to know it performs sufficiently to meet your goals  You need to justify option X over option Y  Business – Price vs Performance  Technical – Does it perform sufficiently?  No guarantee that a standard benchmark accurately models your usage 2
  • 3.  Berlin SPARQL Benchmark (BSBM)  Relational style data model  Access pattern simulates replacing a traditional RDBMS with a Triple Store  Lehigh University Benchmark (LUBM)  More typical RDF data model  Stores require reasoning to answer the queries correctly  SPARQL2Bench (SP2B)  Again typical RDF data model  Queries designed to be hard – cross products, filters, etc.  Generates artificially massive unrealistic results  Tests clever optimization and join performance 3
  • 4.  Often no standardized methodology  E.g. only BSBM provides a test harness  Lack of transparency as a result  If I say I’m 10x faster than you is that really true or did I measure differently?  Are the figures you’re comparing with even current?  What actually got measured?  Time to start responding  Time to count all results  Something else?  Even if you run a benchmark does it actually tell you anything useful? 4
  • 5.  Java command line tool (and API) for benchmarking  Designed to be highly configurable  Runs any set of SPARQL queries you can devise against any HTTP based SPARQL endpoint  Run single and multi-threaded benchmarks  Generates a variety of statistics  Methodology  Runs some quick sanity tests to check the provided endpoint is up and working  Optionally runs W warm up runs prior to actual benchmarking  Runs a Query Mix N times  Randomizes query order for each run  Discards outliers (best and worst runs)  Calculates averages, variances and standard deviations over the runs  Generates reports as CSV and XML 5
  • 6.  Response Time  Time from when query is issued to when results start being received  Runtime  Time from when query is issued to all results being received and counted  Exact definition may vary according to configuration  Queries per Second  How many times a given query can be executed per second  Query Mixed per Hour  How many times a query mix can be executed per hour 6
  • 7. 7
  • 8. SP2B at 10k, 50k and 250k run with 5 warm-ups and 25 runs  All options left as defaults i.e. full result counting  Runs for 50k and 250k skipped if store was incapable of performing the run in reasonable time  Run on following systems  *nix based stores run on late 2011 Mac Book Pro (quad core, 8GB RAM, SSD)  Java heap space set to 4GB  Windows based stores run on HP Laptop (dual core, 4GB RAM, HDD)  Both low powered systems compared to servers  Benchmarked Stores  Jena TDB 0.9.1  Sesame 2.6.5 (Memory and Native Stores)  Bigdata 1.2 (WORM Store)  Dydra  Virtuoso 6.1.3 (Open Source Edition)  dotNetRDF (In-Memory Store)  Stardog 0.9.4 (In-Memory and Disk Stores)  OWLIM 8
  • 9. 9
  • 10. 1 0
  • 11. 1 1
  • 12.  Code Release is management Approved  Currently undergoing Legal and IP Clearance  Should be open sourced shortly under a BSD license  Will be available from https://sourceforge.net/p/sparql-query-bm  Apologies this isn’t yet available at time of writing  Example Results data available from:  https://dl.dropbox.com/u/590790/semtech2012.tar.gz 1 2
  • 13. 1 3

Editor's Notes

  1. Introduce MyselfMay want to add a disclaimer here about views/opinions expressed primarily being my personal ones and not those of the company a la DVD extras disclaimers ;-)
  2. What is says on the slide ;-)
  3. Describe the benchmarks – shown on slidesDiscuss deficiencies of each benchmarkBSBMRelational – not really showing off the capabilities of a SPARQL engineLUBMNeed for reasoning – implementation thereof can make a huge difference in performanceForward vs Backward Chaining ReasoningSP2BQueries are unrealisticFocuses on optimization
  4. Self explanatory slide for the most partHighlight that just because the store you are interested in is good/bad at a particular benchmark doesn’t tell you whether the store is good/bad for your use case
  5. Describe the methodology in detailNote that this is based on an amalgamation of the BSBM style and Revelytix SP2B methodologies
  6. Key Point is to cover difference between Response Time and RuntimeNote that this stat can give some interesting information about how stores execute queries – almost instant response time but much longer runtime indicates streaming execution. Long response time with small difference to runtime indicates a batch execution.
  7. Run through a brief demo of the command line tool – make sure to have a running Stardog/Fuseki instance to run against – likely safer to use Fuseki as easier to ensure running and open source so no appearance of bias to a commercial productRun on SP2B 10k – will complete in reasonable time while I’m talking – suggest using a limited number of runs for demo purposes.Show the output data (CSV and XML)Key difference is CSV converts to seconds while XML uses raw nanosecondsXML is better for post processingCSV useful for quick import into Spreadsheet tools
  8. Discuss the setup for the example results – why the stores were chosen?Ease of availability (open source, runnable on *nix, personal interest etc)Ensure to highlight YMMVDisclaimer – Be sure to state that this is just a arbitrarily selected sample of stores and that performance indicated here may not be representative of the true performance of any store. Most importantly Cray/YarcData is not endorsing any specific store.Again point out the importance of people running their own benchmarks
  9. Note how as dataset size increases many stores can’t complete within reasonable time on the machines we usedLogarithmic ScaleMake sure to mention that the fact that many stores did not complete on the 50k and 250k sizes doesn’t mean they are defective, merely that with the machine resources available they couldn’t run in a timely fashion. This leads nicely to the point that it is important to benchmark on the hardware you actually intend to use.
  10. Discuss the variation in average runtime – some stores are way ahead of othersNote that some store’s results are heavily influenced by poor performance on certain queries – see next slideLogarithmic Scale
  11. Highlight the variation in performance both between stores and queries. Note how certain queries are just fundamentally hard even with clever optimisationIn-Memory trumps disk for relevant stores in most cases