SlideShare a Scribd company logo
Rob Vesse
rvesse@yarcdata.com
     @RobVesse




                      1
 Regardless of what technology your solution will be built on
  (RDBMS, RDF + SPARQL, NoSQL etc) you need to know it
  performs sufficiently to meet your goals
 You need to justify option X over option Y
   Business – Price vs Performance
   Technical – Does it perform sufficiently?
 No guarantee that a standard benchmark accurately
 models your usage




                                                           2
 Berlin SPARQL Benchmark (BSBM)
   Relational style data model
   Access pattern simulates replacing a traditional RDBMS with a Triple
      Store
 Lehigh University Benchmark (LUBM)
   More typical RDF data model
   Stores require reasoning to answer the queries correctly
 SPARQL2Bench (SP2B)
     Again typical RDF data model
     Queries designed to be hard – cross products, filters, etc.
     Generates artificially massive unrealistic results
     Tests clever optimization and join performance




                                                                      3
 Often no standardized       methodology
   E.g. only BSBM provides a test harness
 Lack of transparency as a result
   If I say I’m 10x faster than you is that really true or did I measure
    differently?
   Are the figures you’re comparing with even current?
 What actually got measured?
   Time to start responding
   Time to count all results
   Something else?
 Even if you run a benchmark does it actually tell you
  anything useful?



                                                                            4
 Java command line tool (and API) for benchmarking
 Designed to be highly configurable
   Runs any set of SPARQL queries you can devise against any HTTP
    based SPARQL endpoint
   Run single and multi-threaded benchmarks
   Generates a variety of statistics
 Methodology
   Runs some quick sanity tests to check the provided endpoint is up
    and working
   Optionally runs W warm up runs prior to actual benchmarking
   Runs a Query Mix N times
      Randomizes query order for each run
      Discards outliers (best and worst runs)
   Calculates averages, variances and standard deviations over the runs
   Generates reports as CSV and XML


                                                                        5
 Response Time
   Time from when query is issued to when results start being received
 Runtime
   Time from when query is issued to all results being received and
    counted
   Exact definition may vary according to configuration
 Queries per Second
   How many times a given query can be executed per second
 Query Mixed per Hour
   How many times a query mix can be executed per hour




                                                                       6
7
   SP2B at 10k, 50k and 250k run with 5 warm-ups and 25 runs
     All options left as defaults i.e. full result counting
     Runs for 50k and 250k skipped if store was incapable of performing the run
        in reasonable time
   Run on following systems
     *nix based stores run on late 2011 Mac Book Pro (quad core, 8GB RAM,
        SSD)
          Java heap space set to 4GB
     Windows based stores run on HP Laptop (dual core, 4GB RAM, HDD)
     Both low powered systems compared to servers
   Benchmarked Stores
       Jena TDB 0.9.1
       Sesame 2.6.5 (Memory and Native Stores)
       Bigdata 1.2 (WORM Store)
       Dydra
       Virtuoso 6.1.3 (Open Source Edition)
       dotNetRDF (In-Memory Store)
       Stardog 0.9.4 (In-Memory and Disk Stores)
       OWLIM


                                                                             8
9
1
0
1
1
 Code Release is management Approved
     Currently undergoing Legal and IP Clearance
     Should be open sourced shortly under a BSD license
     Will be available from https://sourceforge.net/p/sparql-query-bm
     Apologies this isn’t yet available at time of writing
 Example Results data available       from:
   https://dl.dropbox.com/u/590790/semtech2012.tar.gz




                                                                         1
                                                                         2
1
3

More Related Content

What's hot

dh-slides-perf.ppt
dh-slides-perf.pptdh-slides-perf.ppt
dh-slides-perf.ppt
hackday08
 
Webinar: Serie Operazioni per la vostra applicazione - Sessione 6 - Installar...
Webinar: Serie Operazioni per la vostra applicazione - Sessione 6 - Installar...Webinar: Serie Operazioni per la vostra applicazione - Sessione 6 - Installar...
Webinar: Serie Operazioni per la vostra applicazione - Sessione 6 - Installar...
MongoDB
 
Drupal meets PostgreSQL for DrupalCamp MSK 2014
Drupal meets PostgreSQL for DrupalCamp MSK 2014Drupal meets PostgreSQL for DrupalCamp MSK 2014
Drupal meets PostgreSQL for DrupalCamp MSK 2014
Kate Marshalkina
 
WTF?
WTF?WTF?
re:dash is awesome
re:dash is awesomere:dash is awesome
re:dash is awesome
Hiroshi Toyama
 
OSGifying the repository
OSGifying the repositoryOSGifying the repository
OSGifying the repository
Jukka Zitting
 
Troubleshooting redis
Troubleshooting redisTroubleshooting redis
Troubleshooting redis
DaeMyung Kang
 
Gluster the ugly parts with Jeff Darcy
Gluster  the ugly parts with Jeff DarcyGluster  the ugly parts with Jeff Darcy
Gluster the ugly parts with Jeff Darcy
Gluster.org
 
Actions in QTP
Actions in QTPActions in QTP
Actions in QTPAnish10110
 
Breaking the Sound Barrier with Persistent Memory
Breaking the Sound Barrier with Persistent Memory Breaking the Sound Barrier with Persistent Memory
Breaking the Sound Barrier with Persistent Memory
HBaseCon
 
CCI2018 - Benchmarking in the cloud
CCI2018 - Benchmarking in the cloudCCI2018 - Benchmarking in the cloud
CCI2018 - Benchmarking in the cloud
walk2talk srl
 
J2EE Performance And Scalability Bp
J2EE Performance And Scalability BpJ2EE Performance And Scalability Bp
J2EE Performance And Scalability Bp
Chris Adkin
 
Cassandra from tarball to production
Cassandra   from tarball to productionCassandra   from tarball to production
Cassandra from tarball to production
Ron Kuris
 
Compression talk
Compression talkCompression talk
Compression talk
Ilya Ganelin
 
Cloud Performance Benchmarking
Cloud Performance BenchmarkingCloud Performance Benchmarking
Cloud Performance Benchmarking
Santanu Dey
 
Why we love pgpool-II and why we hate it!
Why we love pgpool-II and why we hate it!Why we love pgpool-II and why we hate it!
Why we love pgpool-II and why we hate it!
PGConf APAC
 

What's hot (16)

dh-slides-perf.ppt
dh-slides-perf.pptdh-slides-perf.ppt
dh-slides-perf.ppt
 
Webinar: Serie Operazioni per la vostra applicazione - Sessione 6 - Installar...
Webinar: Serie Operazioni per la vostra applicazione - Sessione 6 - Installar...Webinar: Serie Operazioni per la vostra applicazione - Sessione 6 - Installar...
Webinar: Serie Operazioni per la vostra applicazione - Sessione 6 - Installar...
 
Drupal meets PostgreSQL for DrupalCamp MSK 2014
Drupal meets PostgreSQL for DrupalCamp MSK 2014Drupal meets PostgreSQL for DrupalCamp MSK 2014
Drupal meets PostgreSQL for DrupalCamp MSK 2014
 
WTF?
WTF?WTF?
WTF?
 
re:dash is awesome
re:dash is awesomere:dash is awesome
re:dash is awesome
 
OSGifying the repository
OSGifying the repositoryOSGifying the repository
OSGifying the repository
 
Troubleshooting redis
Troubleshooting redisTroubleshooting redis
Troubleshooting redis
 
Gluster the ugly parts with Jeff Darcy
Gluster  the ugly parts with Jeff DarcyGluster  the ugly parts with Jeff Darcy
Gluster the ugly parts with Jeff Darcy
 
Actions in QTP
Actions in QTPActions in QTP
Actions in QTP
 
Breaking the Sound Barrier with Persistent Memory
Breaking the Sound Barrier with Persistent Memory Breaking the Sound Barrier with Persistent Memory
Breaking the Sound Barrier with Persistent Memory
 
CCI2018 - Benchmarking in the cloud
CCI2018 - Benchmarking in the cloudCCI2018 - Benchmarking in the cloud
CCI2018 - Benchmarking in the cloud
 
J2EE Performance And Scalability Bp
J2EE Performance And Scalability BpJ2EE Performance And Scalability Bp
J2EE Performance And Scalability Bp
 
Cassandra from tarball to production
Cassandra   from tarball to productionCassandra   from tarball to production
Cassandra from tarball to production
 
Compression talk
Compression talkCompression talk
Compression talk
 
Cloud Performance Benchmarking
Cloud Performance BenchmarkingCloud Performance Benchmarking
Cloud Performance Benchmarking
 
Why we love pgpool-II and why we hate it!
Why we love pgpool-II and why we hate it!Why we love pgpool-II and why we hate it!
Why we love pgpool-II and why we hate it!
 

Similar to Practical SPARQL Benchmarking

Ceph - High Performance Without High Costs
Ceph - High Performance Without High CostsCeph - High Performance Without High Costs
Ceph - High Performance Without High Costs
Jonathan Long
 
How Many Slaves (Ukoug)
How Many Slaves (Ukoug)How Many Slaves (Ukoug)
How Many Slaves (Ukoug)
Doug Burns
 
UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015
Christopher Curtin
 
Big Linked Data ETL Benchmark on Cloud Commodity Hardware
Big Linked Data ETL Benchmark on Cloud Commodity HardwareBig Linked Data ETL Benchmark on Cloud Commodity Hardware
Big Linked Data ETL Benchmark on Cloud Commodity Hardware
Laurens De Vocht
 
Benchmarking Hadoop and Big Data
Benchmarking Hadoop and Big DataBenchmarking Hadoop and Big Data
Benchmarking Hadoop and Big Data
Nicolas Poggi
 
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
t_ivanov
 
Thing you didn't know you could do in Spark
Thing you didn't know you could do in SparkThing you didn't know you could do in Spark
Thing you didn't know you could do in Spark
SnappyData
 
Planning for-high-performance-web-application
Planning for-high-performance-web-applicationPlanning for-high-performance-web-application
Planning for-high-performance-web-applicationNguyễn Duy Nhân
 
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Bhupesh Bansal
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedIn
Hadoop User Group
 
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
javier ramirez
 
Data Engineering for Data Scientists
Data Engineering for Data Scientists Data Engineering for Data Scientists
Data Engineering for Data Scientists
jlacefie
 
scale_perf_best_practices
scale_perf_best_practicesscale_perf_best_practices
scale_perf_best_practiceswebuploader
 
FHIR Server internals - sqlonfhir
FHIR Server internals - sqlonfhirFHIR Server internals - sqlonfhir
FHIR Server internals - sqlonfhir
Brian Postlethwaite
 
VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld
 
Api testing libraries using java script an overview
Api testing libraries using java script   an overviewApi testing libraries using java script   an overview
Api testing libraries using java script an overview
vodQA
 
SQL on Hadoop benchmarks using TPC-DS query set
SQL on Hadoop benchmarks using TPC-DS query setSQL on Hadoop benchmarks using TPC-DS query set
SQL on Hadoop benchmarks using TPC-DS query set
Kognitio
 
Microsoft Openness Mongo DB
Microsoft Openness Mongo DBMicrosoft Openness Mongo DB
Microsoft Openness Mongo DBHeriyadi Janwar
 
Planning For High Performance Web Application
Planning For High Performance Web ApplicationPlanning For High Performance Web Application
Planning For High Performance Web Application
Yue Tian
 
Hardware Provisioning
Hardware ProvisioningHardware Provisioning
Hardware Provisioning
MongoDB
 

Similar to Practical SPARQL Benchmarking (20)

Ceph - High Performance Without High Costs
Ceph - High Performance Without High CostsCeph - High Performance Without High Costs
Ceph - High Performance Without High Costs
 
How Many Slaves (Ukoug)
How Many Slaves (Ukoug)How Many Slaves (Ukoug)
How Many Slaves (Ukoug)
 
UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015
 
Big Linked Data ETL Benchmark on Cloud Commodity Hardware
Big Linked Data ETL Benchmark on Cloud Commodity HardwareBig Linked Data ETL Benchmark on Cloud Commodity Hardware
Big Linked Data ETL Benchmark on Cloud Commodity Hardware
 
Benchmarking Hadoop and Big Data
Benchmarking Hadoop and Big DataBenchmarking Hadoop and Big Data
Benchmarking Hadoop and Big Data
 
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
 
Thing you didn't know you could do in Spark
Thing you didn't know you could do in SparkThing you didn't know you could do in Spark
Thing you didn't know you could do in Spark
 
Planning for-high-performance-web-application
Planning for-high-performance-web-applicationPlanning for-high-performance-web-application
Planning for-high-performance-web-application
 
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedIn
 
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
 
Data Engineering for Data Scientists
Data Engineering for Data Scientists Data Engineering for Data Scientists
Data Engineering for Data Scientists
 
scale_perf_best_practices
scale_perf_best_practicesscale_perf_best_practices
scale_perf_best_practices
 
FHIR Server internals - sqlonfhir
FHIR Server internals - sqlonfhirFHIR Server internals - sqlonfhir
FHIR Server internals - sqlonfhir
 
VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right
 
Api testing libraries using java script an overview
Api testing libraries using java script   an overviewApi testing libraries using java script   an overview
Api testing libraries using java script an overview
 
SQL on Hadoop benchmarks using TPC-DS query set
SQL on Hadoop benchmarks using TPC-DS query setSQL on Hadoop benchmarks using TPC-DS query set
SQL on Hadoop benchmarks using TPC-DS query set
 
Microsoft Openness Mongo DB
Microsoft Openness Mongo DBMicrosoft Openness Mongo DB
Microsoft Openness Mongo DB
 
Planning For High Performance Web Application
Planning For High Performance Web ApplicationPlanning For High Performance Web Application
Planning For High Performance Web Application
 
Hardware Provisioning
Hardware ProvisioningHardware Provisioning
Hardware Provisioning
 

More from Rob Vesse

Challenges and patterns for semantics at scale
Challenges and patterns for semantics at scaleChallenges and patterns for semantics at scale
Challenges and patterns for semantics at scale
Rob Vesse
 
Apache Jena Elephas and Friends
Apache Jena Elephas and FriendsApache Jena Elephas and Friends
Apache Jena Elephas and Friends
Rob Vesse
 
Quadrupling your elephants - RDF and the Hadoop ecosystem
Quadrupling your elephants - RDF and the Hadoop ecosystemQuadrupling your elephants - RDF and the Hadoop ecosystem
Quadrupling your elephants - RDF and the Hadoop ecosystem
Rob Vesse
 
Practical SPARQL Benchmarking Revisited
Practical SPARQL Benchmarking RevisitedPractical SPARQL Benchmarking Revisited
Practical SPARQL Benchmarking Revisited
Rob Vesse
 
Introducing JDBC for SPARQL
Introducing JDBC for SPARQLIntroducing JDBC for SPARQL
Introducing JDBC for SPARQL
Rob Vesse
 
Everyday Tools for the Semantic Web Developer
Everyday Tools for the Semantic Web DeveloperEveryday Tools for the Semantic Web Developer
Everyday Tools for the Semantic Web Developer
Rob Vesse
 
Everyday Tools for the Semantic Web Developer
Everyday Tools for the Semantic Web DeveloperEveryday Tools for the Semantic Web Developer
Everyday Tools for the Semantic Web Developer
Rob Vesse
 
dotNetRDF - A Semantic Web/RDF Library for .Net Developers
dotNetRDF - A Semantic Web/RDF Library for .Net DevelopersdotNetRDF - A Semantic Web/RDF Library for .Net Developers
dotNetRDF - A Semantic Web/RDF Library for .Net Developers
Rob Vesse
 

More from Rob Vesse (8)

Challenges and patterns for semantics at scale
Challenges and patterns for semantics at scaleChallenges and patterns for semantics at scale
Challenges and patterns for semantics at scale
 
Apache Jena Elephas and Friends
Apache Jena Elephas and FriendsApache Jena Elephas and Friends
Apache Jena Elephas and Friends
 
Quadrupling your elephants - RDF and the Hadoop ecosystem
Quadrupling your elephants - RDF and the Hadoop ecosystemQuadrupling your elephants - RDF and the Hadoop ecosystem
Quadrupling your elephants - RDF and the Hadoop ecosystem
 
Practical SPARQL Benchmarking Revisited
Practical SPARQL Benchmarking RevisitedPractical SPARQL Benchmarking Revisited
Practical SPARQL Benchmarking Revisited
 
Introducing JDBC for SPARQL
Introducing JDBC for SPARQLIntroducing JDBC for SPARQL
Introducing JDBC for SPARQL
 
Everyday Tools for the Semantic Web Developer
Everyday Tools for the Semantic Web DeveloperEveryday Tools for the Semantic Web Developer
Everyday Tools for the Semantic Web Developer
 
Everyday Tools for the Semantic Web Developer
Everyday Tools for the Semantic Web DeveloperEveryday Tools for the Semantic Web Developer
Everyday Tools for the Semantic Web Developer
 
dotNetRDF - A Semantic Web/RDF Library for .Net Developers
dotNetRDF - A Semantic Web/RDF Library for .Net DevelopersdotNetRDF - A Semantic Web/RDF Library for .Net Developers
dotNetRDF - A Semantic Web/RDF Library for .Net Developers
 

Recently uploaded

FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
Alex Pruden
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
SOFTTECHHUB
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 

Recently uploaded (20)

FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 

Practical SPARQL Benchmarking

  • 2.  Regardless of what technology your solution will be built on (RDBMS, RDF + SPARQL, NoSQL etc) you need to know it performs sufficiently to meet your goals  You need to justify option X over option Y  Business – Price vs Performance  Technical – Does it perform sufficiently?  No guarantee that a standard benchmark accurately models your usage 2
  • 3.  Berlin SPARQL Benchmark (BSBM)  Relational style data model  Access pattern simulates replacing a traditional RDBMS with a Triple Store  Lehigh University Benchmark (LUBM)  More typical RDF data model  Stores require reasoning to answer the queries correctly  SPARQL2Bench (SP2B)  Again typical RDF data model  Queries designed to be hard – cross products, filters, etc.  Generates artificially massive unrealistic results  Tests clever optimization and join performance 3
  • 4.  Often no standardized methodology  E.g. only BSBM provides a test harness  Lack of transparency as a result  If I say I’m 10x faster than you is that really true or did I measure differently?  Are the figures you’re comparing with even current?  What actually got measured?  Time to start responding  Time to count all results  Something else?  Even if you run a benchmark does it actually tell you anything useful? 4
  • 5.  Java command line tool (and API) for benchmarking  Designed to be highly configurable  Runs any set of SPARQL queries you can devise against any HTTP based SPARQL endpoint  Run single and multi-threaded benchmarks  Generates a variety of statistics  Methodology  Runs some quick sanity tests to check the provided endpoint is up and working  Optionally runs W warm up runs prior to actual benchmarking  Runs a Query Mix N times  Randomizes query order for each run  Discards outliers (best and worst runs)  Calculates averages, variances and standard deviations over the runs  Generates reports as CSV and XML 5
  • 6.  Response Time  Time from when query is issued to when results start being received  Runtime  Time from when query is issued to all results being received and counted  Exact definition may vary according to configuration  Queries per Second  How many times a given query can be executed per second  Query Mixed per Hour  How many times a query mix can be executed per hour 6
  • 7. 7
  • 8. SP2B at 10k, 50k and 250k run with 5 warm-ups and 25 runs  All options left as defaults i.e. full result counting  Runs for 50k and 250k skipped if store was incapable of performing the run in reasonable time  Run on following systems  *nix based stores run on late 2011 Mac Book Pro (quad core, 8GB RAM, SSD)  Java heap space set to 4GB  Windows based stores run on HP Laptop (dual core, 4GB RAM, HDD)  Both low powered systems compared to servers  Benchmarked Stores  Jena TDB 0.9.1  Sesame 2.6.5 (Memory and Native Stores)  Bigdata 1.2 (WORM Store)  Dydra  Virtuoso 6.1.3 (Open Source Edition)  dotNetRDF (In-Memory Store)  Stardog 0.9.4 (In-Memory and Disk Stores)  OWLIM 8
  • 9. 9
  • 10. 1 0
  • 11. 1 1
  • 12.  Code Release is management Approved  Currently undergoing Legal and IP Clearance  Should be open sourced shortly under a BSD license  Will be available from https://sourceforge.net/p/sparql-query-bm  Apologies this isn’t yet available at time of writing  Example Results data available from:  https://dl.dropbox.com/u/590790/semtech2012.tar.gz 1 2
  • 13. 1 3

Editor's Notes

  1. Introduce MyselfMay want to add a disclaimer here about views/opinions expressed primarily being my personal ones and not those of the company a la DVD extras disclaimers ;-)
  2. What is says on the slide ;-)
  3. Describe the benchmarks – shown on slidesDiscuss deficiencies of each benchmarkBSBMRelational – not really showing off the capabilities of a SPARQL engineLUBMNeed for reasoning – implementation thereof can make a huge difference in performanceForward vs Backward Chaining ReasoningSP2BQueries are unrealisticFocuses on optimization
  4. Self explanatory slide for the most partHighlight that just because the store you are interested in is good/bad at a particular benchmark doesn’t tell you whether the store is good/bad for your use case
  5. Describe the methodology in detailNote that this is based on an amalgamation of the BSBM style and Revelytix SP2B methodologies
  6. Key Point is to cover difference between Response Time and RuntimeNote that this stat can give some interesting information about how stores execute queries – almost instant response time but much longer runtime indicates streaming execution. Long response time with small difference to runtime indicates a batch execution.
  7. Run through a brief demo of the command line tool – make sure to have a running Stardog/Fuseki instance to run against – likely safer to use Fuseki as easier to ensure running and open source so no appearance of bias to a commercial productRun on SP2B 10k – will complete in reasonable time while I’m talking – suggest using a limited number of runs for demo purposes.Show the output data (CSV and XML)Key difference is CSV converts to seconds while XML uses raw nanosecondsXML is better for post processingCSV useful for quick import into Spreadsheet tools
  8. Discuss the setup for the example results – why the stores were chosen?Ease of availability (open source, runnable on *nix, personal interest etc)Ensure to highlight YMMVDisclaimer – Be sure to state that this is just a arbitrarily selected sample of stores and that performance indicated here may not be representative of the true performance of any store. Most importantly Cray/YarcData is not endorsing any specific store.Again point out the importance of people running their own benchmarks
  9. Note how as dataset size increases many stores can’t complete within reasonable time on the machines we usedLogarithmic ScaleMake sure to mention that the fact that many stores did not complete on the 50k and 250k sizes doesn’t mean they are defective, merely that with the machine resources available they couldn’t run in a timely fashion. This leads nicely to the point that it is important to benchmark on the hardware you actually intend to use.
  10. Discuss the variation in average runtime – some stores are way ahead of othersNote that some store’s results are heavily influenced by poor performance on certain queries – see next slideLogarithmic Scale
  11. Highlight the variation in performance both between stores and queries. Note how certain queries are just fundamentally hard even with clever optimisationIn-Memory trumps disk for relevant stores in most cases