
Assessing the performance of RDF Engines: Discussing RDF Benchmarks


ESWC 2016 Tutorial on RDF Benchmarks
(This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 688227.)



  1. 1. Assessing the performance of RDF Engines: Discussing RDF Benchmarks Irini Fundulaki Institute of Computer Science – FORTH, Greece Anastasios Kementsietsidis Google Research, USA 6/15/16 ESWC 2016: Assessing the performance of RDF Engines - Discussing RDF Benchmarks 1
  2. 2. Traditional Web: Web of Documents •  single information space: a global filesystem •  designed for human consumption •  documents are the primary objects, with a loose structure •  URLs are the globally unique IDs and part of the retrieval mechanism •  cannot ask expressive queries © Hartig, Cyganiak, Bizer, Hausenblas, Heath, How to Publish Linked Data on the Web [diagram: web browsers retrieving HTML documents connected by hyperlinks]
  3. 3. Going from the Web of Documents to the Web of Data •  A global database •  Designed for machines first, humans later •  Things are primary objects with a well-defined structure •  Typed links between things •  Ability to express structured queries Don't link the documents, link the things! [diagram: things connected by typed links] © Tom Heath, An Introduction to Linked Data: The Web of Linked Data
  4. 4. Linking Open Datasets (LOD) •  Publish open data as Linked Data on the Web •  Interlink entities between heterogeneous data sources 6/15/16 ESWC 2016: Assessing the performance of RDF Engines - Discussing RDF Benchmarks 4
  5. 5. Status of the Linked Open Data Cloud, 2007 6/15/16 ESWC 2016: Assessing the performance of RDF Engines - Discussing RDF Benchmarks 5
  6. 6. Status of the Linked Open Data Cloud, 2011 6/15/16 ESWC 2016: Assessing the performance of RDF Engines - Discussing RDF Benchmarks 6
  7. 7. Status of the Linked Open Data Cloud, 2014 [LOD cloud diagram; domains: Media, Government, Geographic, Publications, User-generated, Life sciences, Cross-domain] RDF, a common data model; more than 31B triples in the LOD cloud; 500M external links
  8. 8. Linked Data in numbers (2014) •  State of the LOD Cloud 2014, University of Mannheim

Access methods per domain:
| Domain | Datasets | % | Any | SPARQL | Dump |
| Government | 183 | 18.05 | 61 (32.80%) | 30.11% | 30.65% |
| Publications | 96 | 9.47 | 10 (10.58%) | 9.62% | 3.85% |
| Life Sciences | 83 | 8.19 | 19 (21.35%) | 20.22% | 16.85% |
| User-generated content | 48 | 4.73 | 3 (5.45%) | 5.45% | 1.82% |
| Cross-domain | 41 | 4.04 | 4 (9.09%) | 4.55% | 6.82% |
| Media | 22 | 2.17 | 1 (2.70%) | 0.00% | 2.70% |
| Geographic | 21 | 2.07 | 8 (19.51%) | 12.20% | 12.20% |
| Social Web | 520 | 51.28 | 6 (1.16%) | 1.16% | 0.39% |
| Total | 1014 | – | 48 (5.89%) | 4.54% | 3.80% |
  9. 9. Proliferation of Big Data Stores 6/15/16 ESWC 2016: Assessing the performance of RDF Engines - Discussing RDF Benchmarks 9
  10. 10. Many (not a lot) RDF Stores 6/15/16 ESWC 2016: Assessing the performance of RDF Engines - Discussing RDF Benchmarks 10
  11. 11. The Question(s) •  What are the problems that I wish to solve? •  What are the relevant key performance indicators? •  How do the existing engines behave w.r.t. the key performance indicators? Which tool(s) should I use for my data and for my use case?
  12. 12. The Answer: Benchmark your engines! •  A querying benchmark comprises –  datasets (synthetic or real) –  a set of software tools •  synthetic data generators •  query generators –  performance metrics, and –  a set of clear execution rules •  Standardized application scenario(s) that serve as a basis for testing systems •  Must include a clear set of factors to be measured and the conditions under which the systems should be measured
  13. 13. Importance of Benchmarking •  Benchmarks exist –  to allow adequate measurements of systems –  to provide evaluation of engines for real (or close to real) use cases •  They help –  designers and developers to assess the performance of their tools –  users to compare the available tools and evaluate their suitability for their needs –  researchers to compare their work to others' •  They lead to improvements: –  vendors can improve their technology –  researchers can address new challenges –  current benchmark designs can be improved to cover new needs and application domains
  14. 14. Tutorial Objective & Benefits •  Objectives: –  Discuss a set of principles and best practices for benchmark development –  Present an overview of the current work on benchmarks for RDF query engines –  Focus on identifying research challenges & unexplored research directions •  Benefits for the audience –  Academic: obtain a solid background, discover new research directions –  Practitioner: find out which benchmarks are available, along with their advantages and limitations
  15. 15. Purpose of the Tutorial •  Stimulate discussions on the following topics: 1.  How can one come up with the right benchmark that accurately captures use cases of interest? 2.  How can a benchmark capture the fact that RDF data originate from a multitude of formats? –  Structured: relational and/or XML data to RDF –  Unstructured 3.  How can a benchmark capture the different data and query patterns and provide a consistent picture of system behavior across different application settings? 4.  How can one select the right benchmark for her system, data and workload?
  16. 16. Overview •  Introducing Benchmarks •  A short discussion about Linked Data –  Resource Description Framework (Data Model) –  SPARQL (Query Language) •  Benchmarking Principles & Choke Points •  Benchmarks –  Synthetic –  Real –  Benchmark Generators •  Sum up: what did we learn today? 6/15/16 ESWC 2016: Assessing the performance of RDF Engines - Discussing RDF Benchmarks 16
  17. 17. A short discussion about Linked Data - Resource Description Framework (Data Model) - SPARQL (Query Language) 6/21/16 ESWC 2016: Assessing the performance of RDF Engines - Discussing RDF Benchmarks 17
  18. 18. Resource Description Framework (RDF) •  W3C standard to represent Web data and metadata •  generic and simple graph based model •  information from heterogeneous sources merges naturally: –  resources with the same URI denote the same non-information resource (leading to the Linked Data Cloud) •  structure is added using schema languages and is represented as RDF triples •  Web browsers use URIs to retrieve information 6/21/16 ESWC 2016: Assessing the performance of RDF Engines - Discussing RDF Benchmarks 18
  19. 19. Resource Description Framework (RDF) •  An RDF triple is of the form (s, p, o) where –  s is the subject: the URI identifying the described resource –  p is the predicate: the URI indicating the relation between subject and object –  o is the object: either a simple literal value or the URI of another resource •  An RDF graph is a set of triples –  Can be viewed as a node- and edge-labeled directed graph –  It is published in different formats •  RDF/XML, Turtle, N3, N-Triples, … (dbpedia:Good_Day_Sunshine, dbpedia-owl:artist, dbpedia:The_Beatles) Close to how people see the world (as a graph)!
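To make the triple model concrete, here is a minimal sketch using the Python rdflib library (a tool chosen for illustration, not one prescribed by the tutorial); it builds a two-triple graph around the slide's example and serializes it in two of the formats listed above.

```python
from rdflib import Graph, Namespace

# DBpedia namespaces matching the slide's example triple.
DBR = Namespace("http://dbpedia.org/resource/")
DBO = Namespace("http://dbpedia.org/ontology/")

g = Graph()
# (s, p, o): "Good Day Sunshine" has artist The Beatles and appears on Revolver.
g.add((DBR.Good_Day_Sunshine, DBO.artist, DBR.The_Beatles))
g.add((DBR.Good_Day_Sunshine, DBO.album, DBR.Revolver))

# The same graph, published in two different serialization formats.
print(g.serialize(format="turtle"))
print(g.serialize(format="nt"))
```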
  20. 20. Adding Semantics to RDF •  RDF is a generic, abstract data model for describing resources in the form of triples •  RDF does not provide ways of defining classes, properties, constraints •  W3C standard schema languages –  RDF Vocabulary Description Language (RDF Schema - RDFS) to define schema vocabularies –  Web Ontology Language (OWL) to define ontologies
  21. 21. Adding Semantics to RDF •  RDF Vocabularies are sets of terms used to describe notions in a domain of interest •  An RDF term is either a Class or a Property –  Object properties denote relationships between objects –  Data type properties denote attributes of resources •  RDFS is designed to introduce useful semantics to RDF triples •  RDFS schemas are represented as RDF triples "An RDF Vocabulary is a schema comprising classes, properties and relationships which can be used for describing data and metadata"
  22. 22. RDF Vocabulary Description Language (RDFS) •  Typing: defining classes, properties, instances •  Relationships between classes and properties: subsumption •  Constraints: domain and range of properties •  Inference rules to entail new, inferred knowledge

|    | Subject | Predicate | Object |
| t1 | dbo:MusicalWork | rdfs:subClassOf | dbo:Album |
| t2 | dbo:artist | rdfs:domain | dbo:MusicalWork |
| t3 | dbo:artist | rdfs:range | dbo:Agent |
| t4 | dbr:Seven_Seas_Of_Rye | rdf:type | dbo:MusicalWork |
| t5 | dbo:Album | rdf:type | rdfs:Class |
  23. 23. RDFS Inference •  Used to entail new information from what is explicitly stated in the dataset –  Transitive closure across class and property hierarchies –  Transitive closure along the type and class/property relations •  Two ways to implement it: Forward & Backward Reasoning –  Forward Reasoning: the closure is computed at loading time –  Backward Reasoning: the closure is computed on the fly, when needed

R1: (C1, rdfs:subClassOf, C2), (C2, rdfs:subClassOf, C3) → (C1, rdfs:subClassOf, C3)
R2: (C1, rdfs:subClassOf, C2), (r1, rdf:type, C1) → (r1, rdf:type, C2)
R3: (P1, rdfs:subPropertyOf, P2), (P2, rdfs:subPropertyOf, P3) → (P1, rdfs:subPropertyOf, P3)
R4: (P1, rdfs:subPropertyOf, P2), (r1, P1, r2) → (r1, P2, r2)
  24. 24. RDFS Inference •  Transitive closure along the type and class/property relations

R2: (C1, rdfs:subClassOf, C2), (r1, rdf:type, C1) → (r1, rdf:type, C2)

|    | Subject | Predicate | Object |
| t1 | dbo:MusicalWork | rdfs:subClassOf | dbo:Album |
| t2 | dbo:artist | rdfs:domain | dbo:MusicalWork |
| t3 | dbo:artist | rdfs:range | dbo:Agent |
| t4 | dbr:Seven_Seas_Of_Rye | rdf:type | dbo:MusicalWork |
| t5 | dbo:Album | rdf:type | rdfs:Class |
| t6 | dbr:Seven_Seas_Of_Rye | rdf:type | dbo:Album (inferred by R2 from t1 and t4) |
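A runnable sketch of rule R2, using rdflib together with the owlrl package (an assumption of this example, not a tool used in the tutorial): forward reasoning materializes the RDFS closure, after which the inferred triple t6 is part of the graph.

```python
from rdflib import Graph, Namespace, RDF, RDFS
from owlrl import DeductiveClosure, RDFS_Semantics

DBO = Namespace("http://dbpedia.org/ontology/")
DBR = Namespace("http://dbpedia.org/resource/")

g = Graph()
g.add((DBO.MusicalWork, RDFS.subClassOf, DBO.Album))       # t1
g.add((DBR.Seven_Seas_Of_Rye, RDF.type, DBO.MusicalWork))  # t4

# Forward reasoning: compute the RDFS closure at loading time.
DeductiveClosure(RDFS_Semantics).expand(g)

# Rule R2 has fired: the inferred triple t6 is now in the graph.
assert (DBR.Seven_Seas_Of_Rye, RDF.type, DBO.Album) in g
```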
  25. 25. SPARQL: Querying RDF Data •  SPARQL: W3C Standard Language for Querying Linked Data •  SPARQL 1.0 (2008) only allows accessing the data (query) •  SPARQL 1.1 (2013) introduces: –  Query Extensions: aggregates, sub-queries, negation, expressions in the SELECT clause, property paths, assignment, short form for CONSTRUCT, expanded set of functions and operators –  Updates: •  Data management: Insert, Delete, Delete/Insert •  Graph management: Create, Load, Clear, Drop, Copy, Move, Add –  Federation extension: Service, values, service variables (informative) 6/21/16 ESWC 2016: Assessing the performance of RDF Engines - Discussing RDF Benchmarks 25
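As a concrete taste of two of these SPARQL 1.1 additions, here is a small sketch using the Python rdflib library (the example.org class names are made up for illustration): an INSERT DATA update followed by a query with the transitive property path rdfs:subClassOf+.

```python
from rdflib import Graph

g = Graph()
g.parse(data="""
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix ex:   <http://example.org/> .
    ex:Single      rdfs:subClassOf ex:MusicalWork .
    ex:MusicalWork rdfs:subClassOf ex:Work .
""", format="turtle")

# SPARQL 1.1 Update: INSERT DATA adds triples to the store.
g.update("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX ex:   <http://example.org/>
    INSERT DATA { ex:EP rdfs:subClassOf ex:MusicalWork }
""")

# SPARQL 1.1 property path: rdfs:subClassOf+ walks the whole hierarchy.
q = """
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX ex:   <http://example.org/>
    SELECT ?parent WHERE { ex:Single rdfs:subClassOf+ ?parent }
"""
print([str(row.parent) for row in g.query(q)])  # ex:MusicalWork and ex:Work
```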
  26. 26. SPARQL Queries (1) •  Building Block is the Triple Pattern –  RDF triple with variables •  Group Graph Patterns –  Built through inductive construction combining smaller patterns into more complex ones using SPARQL operators •  Join - similar to relational join •  Union (UNION) – similar to relational union •  Optional (OPTIONAL) operators on triple patterns – similar to relational left outer join (introduces negation in the language) •  Filtering conditions (FILTER) •  Patterns on Named Graphs 6/21/16 ESWC 2016: Assessing the performance of RDF Engines - Discussing RDF Benchmarks 26
  27. 27. SPARQL Queries (2) •  Aggregates –  specify expressions over groups of solutions –  As in standard settings used when the result is computed over a group of solutions rather than a single solution •  Example: average value of a set of values, sum of a set –  Aggregates defined in SPARQL 1.1 are COUNT, SUM, MIN, MAX, AVG, GROUP_CONCAT, and SAMPLE. –  Solutions are grouped using the GROUP BY clause –  Pruning at group level is performed with the HAVING clause •  Additional Features –  duplicate elimination (DISTINCT) –  ordering results (ORDER BY) with an optional LIMIT clause 6/21/16 ESWC 2016: Assessing the performance of RDF Engines - Discussing RDF Benchmarks 27
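A minimal runnable example of the aggregate machinery described above, using rdflib with a made-up example.org vocabulary: solutions are grouped per album, AVG and COUNT are computed per group, and HAVING prunes whole groups.

```python
from rdflib import Graph

g = Graph()
g.parse(data="""
    @prefix : <http://example.org/> .
    :album1 :rating 8 . :album1 :rating 6 .
    :album2 :rating 9 .
""", format="turtle")

# Group solutions per album, average the ratings, and keep only
# groups whose average passes the HAVING condition.
q = """
    PREFIX : <http://example.org/>
    SELECT ?album (AVG(?r) AS ?avgRating) (COUNT(?r) AS ?n)
    WHERE { ?album :rating ?r }
    GROUP BY ?album
    HAVING (AVG(?r) > 5)
"""
for row in g.query(q):
    print(row.album, row.avgRating, row.n)
```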
  28. 28. SPARQL Semantics •  SPARQL semantics is based on Pattern Matching –  Queries describe subgraphs of the queried graph –  SPARQL graph patterns describe the subgraphs to match Intuitively, a triple pattern denotes the triples in an RDF graph that are of a specific form: TP1 = (?album, dbpedia-owl:artist, dbpedia:The_Beatles) matches all albums of The Beatles; TP2 = (dbpedia:The_Beatles, ?property, ?object) matches all information about The Beatles
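In rdflib the pattern-matching intuition is directly visible: None plays the role of a variable in a triple pattern, and matching returns every triple of that form (a small sketch over the slide's example data).

```python
from rdflib import Graph, Namespace, RDF

DBO = Namespace("http://dbpedia.org/ontology/")  # stands in for dbpedia-owl:
DBR = Namespace("http://dbpedia.org/resource/")

g = Graph()
g.add((DBR.Good_Day_Sunshine, DBO.artist, DBR.The_Beatles))
g.add((DBR.Yellow_Submarine, DBO.artist, DBR.The_Beatles))
g.add((DBR.The_Beatles, RDF.type, DBO.Band))

# TP1 = (?album, dbpedia-owl:artist, dbpedia:The_Beatles):
# None acts as the variable ?album.
for album, _, _ in g.triples((None, DBO.artist, DBR.The_Beatles)):
    print(album)

# TP2 = (dbpedia:The_Beatles, ?property, ?object):
# everything stated about The Beatles as a subject.
for _, prop, obj in g.triples((DBR.The_Beatles, None, None)):
    print(prop, obj)
```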
  29. 29. SPARQL Types of Queries •  SELECT returns an ordered multiset of variable bindings –  Bindings: mappings of variables to RDF terms in the dataset –  SQL-like syntax: SELECT ?v1 ?v2 … WHERE { GraphPattern } •  ASK checks whether a graph pattern has at least one solution - returns a Boolean value (true/false) •  CONSTRUCT returns a new RDF graph as specified by the graph template of the CONSTRUCT clause, using the computed bindings from the query's WHERE clause •  DESCRIBE returns the RDF graph containing the RDF data about the requested resource
  30. 30. Querying RDF Data with SPARQL (1)

Simple SELECT query:
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?title
WHERE { <http://example.org/book/book1> dc:title ?title }

JOIN query:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?mbox
WHERE { ?x foaf:name ?name .
        ?x foaf:mbox ?mbox . }

OPTIONAL operator:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?mbox
WHERE { ?x foaf:name ?name .
        OPTIONAL { ?x foaf:mbox ?mbox } }

REGEX in FILTER:
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?title
WHERE { ?x dc:title ?title .
        FILTER regex(?title, "^SPARQL") }
  31. 31. Querying RDF Data with SPARQL (2)

CONSTRUCT query:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX org: <http://example.com/ns#>
CONSTRUCT { ?x foaf:name ?name }
WHERE { ?x org:employeeName ?name }

ASK query:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
ASK { ?x foaf:name "Alice" }

"Find the people who live in "Palo Alto" and have founded or are board members of companies in the software industry. For each such company, find the products that were developed by it, its revenue, and optionally its number of employees."
SELECT *
WHERE { ?x home "Palo Alto" .
        { ?x founder ?y } UNION { ?x member ?y }
        { ?y industry "Software" .
          ?z developer ?y .
          ?y revenue ?n .
          OPTIONAL { ?y employees ?m } . } }

SPARQL 1.1: SPARQL plus aggregates, sub-queries, property paths, negation and more!
  32. 32. Storing and Querying RDF data •  Schema agnostic –  triples are stored in a large triple table whose attributes are (subject, predicate, object) - "monolithic" triple stores –  But it can get a bit more efficient

Triple table:
|    | Subject | Predicate | Object |
| t1 | dbr:Seven_Seas_Of_Rye | rdf:type | dbo:MusicalWork |
| t2 | dbr:Starman_(song) | rdf:type | dbo:MusicalWork |
| t3 | dbr:Seven_Seas_Of_Rye | dbo:artist | dbo:Queen |

Dictionary:
| id | URI/Literal |
| 1 | dbr:Seven_Seas_Of_Rye |
| 2 | dbr:Starman_(song) |
| 3 | dbo:MusicalWork |
| 4 | dbo:Queen |
| 5 | dbo:artist |
| 6 | rdf:type |

Encoded triple table:
| Subject | Predicate | Object |
| 1 | 6 | 3 |
| 2 | 6 | 3 |
| 1 | 5 | 4 |

RDF-3X maintains 6 indexes, namely SPO, SOP, OSP, OPS, PSO, POS. To avoid storage overhead, the indexes are compressed! [NW09]
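The dictionary-encoded layout can be sketched in a few lines of Python (an illustration of the idea, not any engine's actual code): terms are interned into integer ids, triples become integer tuples, and sorted permutations of those tuples stand in for index orderings such as RDF-3X's SPO and POS.

```python
from itertools import count

ids, terms, next_id = {}, {}, count(1)

def encode(term: str) -> int:
    """Intern a URI/literal into the dictionary and return its integer id."""
    if term not in ids:
        ids[term] = next(next_id)
        terms[ids[term]] = term
    return ids[term]

triples = [
    ("dbr:Seven_Seas_Of_Rye", "rdf:type", "dbo:MusicalWork"),
    ("dbr:Starman_(song)", "rdf:type", "dbo:MusicalWork"),
    ("dbr:Seven_Seas_Of_Rye", "dbo:artist", "dbo:Queen"),
]
table = [tuple(encode(t) for t in triple) for triple in triples]

# Two of the six RDF-3X orderings, kept here as sorted lists.
spo = sorted(table)
pos = sorted((p, o, s) for s, p, o in table)
print(table, spo, pos, sep="\n")
```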
  33. 33. Storing and Querying RDF data •  schema aware: –  one table is created per property, with subject and object attributes (Property Tables [Wilkinson06])

Triple table:
| Subject | Predicate | Object |
| ID1 | type | BookType |
| ID1 | title | "XYZ" |
| ID1 | author | "Fox, Joe" |
| ID1 | copyright | "2001" |
| ID2 | type | CDType |
| ID2 | title | "ABC" |
| ID2 | artist | "Orr, Tim" |
| ID2 | copyright | "1985" |
| ID2 | language | "French" |
| ID3 | type | BookType |
| ID3 | title | "MNO" |
| ID3 | language | "English" |
| ID4 | type | DVDType |
| ID4 | title | "DEF" |
| ID5 | type | CDType |
| ID5 | title | "GHI" |
| ID5 | copyright | "1995" |
| ID6 | type | BookType |
| ID6 | copyright | "2004" |

Clustered Property Table:
| Subject | Type | Title | copyright |
| ID1 | BookType | "XYZ" | "2001" |
| ID2 | CDType | "ABC" | "1985" |
| ID3 | BookType | "MNO" | NULL |
| ID4 | DVDType | "DEF" | NULL |
| ID5 | CDType | "GHI" | "1995" |
| ID6 | BookType | NULL | "2004" |

Left-over triples:
| Subject | Predicate | Object |
| ID1 | author | "Fox, Joe" |
| ID2 | artist | "Orr, Tim" |
| ID2 | language | "French" |
| ID3 | language | "English" |

Property-class table (BookType):
| Subject | Title | Author | copyright |
| ID1 | "XYZ" | "Fox, Joe" | "2001" |
| ID3 | "MNO" | NULL | NULL |
| ID6 | NULL | NULL | "2004" |

Property-class table (CDType):
| Subject | Title | artist | copyright |
| ID2 | "ABC" | "Orr, Tim" | "1985" |
| ID5 | "GHI" | NULL | "1995" |

Left-over triples:
| Subject | Predicate | Object |
| ID2 | language | "French" |
| ID3 | language | "English" |
| ID4 | type | DVDType |
| ID4 | title | "DEF" |

Multi-valued properties are kept in a separate (Subject, Object) table.
  34. 34. Storing and Querying RDF data •  Vertically partitioned RDF [AMM+07]

Triple table: (as on the previous slide)

type:
| Subject | Object |
| ID1 | BookType |
| ID2 | CDType |
| ID3 | BookType |
| ID4 | DVDType |
| ID5 | CDType |
| ID6 | BookType |

title:
| Subject | Object |
| ID1 | "XYZ" |
| ID2 | "ABC" |
| ID3 | "MNO" |
| ID4 | "DEF" |
| ID5 | "GHI" |

copyright:
| Subject | Object |
| ID1 | "2001" |
| ID2 | "1985" |
| ID5 | "1995" |
| ID6 | "2004" |

author:
| Subject | Object |
| ID1 | "Fox, Joe" |

artist:
| Subject | Object |
| ID2 | "Orr, Tim" |

language:
| Subject | Object |
| ID2 | "French" |
| ID3 | "English" |

To get the most out of this particular decomposition, a column-oriented DBMS is recommended.
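A toy Python sketch of the same decomposition (illustrative only): grouping the triple table by predicate yields one two-column (subject, object) table per property, so a triple pattern with a known predicate scans a single small table instead of the whole triple table.

```python
from collections import defaultdict

# Vertical partitioning [AMM+07]: one (subject, object) table per
# predicate, here modeled as a dict of lists.
triples = [
    ("ID1", "type", "BookType"), ("ID1", "title", "XYZ"),
    ("ID2", "type", "CDType"), ("ID2", "language", "French"),
    ("ID3", "language", "English"),
]
partitions = defaultdict(list)
for s, p, o in triples:
    partitions[p].append((s, o))

# The pattern (?s, language, ?o) now reads only the "language" partition.
print(partitions["language"])
```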
  35. 35. Comparison of Storage Techniques [BDK+13]

[figure: a sample graph about Larry Page and Google stored three ways]
Sample graph as triples: (Larry Page, born, "1973"), (Larry Page, founder, Google), (Google, HQ, "MTV"), (Google, employees, 50,000), (Google, industry, Internet), (Google, industry, Software), (Google, industry, Hardware), (Google, developer, Android), (Google, released, Android), (Apple, released, iPhone)

•  Triple store: one (subject, predicate, object) table; columns are overloaded; the schema does not change on updates
•  Type-oriented store: one table per type, e.g. person(born, founder) and company(HQ, employees), plus a triple table for multi-valued predicates such as industry; a static mix of overloaded and normal columns; the schema might change on updates
•  Predicate-oriented store: one (subject, object) table per predicate (born, founder, HQ, employees, industry, …); traditional relational column treatment
  36. 36. Storing Linked Data: Query Processing •  Schema Agnostic –  the algebraic plan obtained for a query involves a large number of self-joins –  queries where the predicate is a variable are handled naturally •  Hybrid Approach and Schema-aware –  the algebraic plan contains operations over the appropriate property/class tables (more in the spirit of existing relational schemas) –  saves many self-joins over triple tables –  if the predicate is a variable, then one query per property/class table must be expressed
  37. 37. Purpose of an RDF Querying Benchmark •  Test the performance of RDF stores –  independently of the underlying storage engine –  independently of the underlying logical and physical schema –  independently of the query language actually executed by the engine •  SPARQL for native stores •  SQL (SPARQL translated to SQL) for relational stores
  38. 38. Overview •  Introducing Benchmarks •  A short discussion about Linked Data –  Resource Description Framework (Data Model) –  SPARQL (Query Language) •  Benchmarking Principles & Choke Points •  Benchmarks –  Synthetic –  Real –  Benchmark Generators •  Sum up: what did we learn today? 6/21/16 ESWC 2016: Assessing the performance of RDF Engines - Discussing RDF Benchmarks 38
  39. 39. Benchmarking Principles & Choke Points 6/21/16 ESWC 2016: Assessing the performance of RDF Engines - Discussing RDF Benchmarks 39
  40. 40. Why Benchmarks? •  Performance Evaluation –  There is no single recipe for how to do it right –  There are many ways to do it wrong –  There are a number of best practices, but no broadly accepted standard on how to design and develop a benchmark •  Questions asked: –  What data/datasets should we use? –  Which workload/queries should we consider? –  What to measure and how to measure it?
  41. 41. Benchmark Categories •  Micro-benchmarks •  Standard benchmarks •  Real-life applications 6/21/16 ESWC 2016: Assessing the performance of RDF Engines - Discussing RDF Benchmarks 41
  42. 42. Micro Benchmarks •  Specialized, stand-alone piece of software •  Isolate one particular functionality of a larger system •  In databases a micro benchmark tests a single database operator –  Selection, Join (and all types thereof), Projection, Aggregates, Sub-Queries, … 6/21/16 ESWC 2016: Assessing the performance of RDF Engines - Discussing RDF Benchmarks 42
  43. 43. Micro Benchmarks: Advantages •  Very focused –  Test a specific operator of the system •  Controllable data & workload –  Synthetic and real datasets •  Different value ranges, value distributions and correlations (mostly applicable to structured data) –  Various data sizes to tackle scalability concerns •  Queries –  Workloads of different complexity & size •  Complexity: as to the types of query operators and patterns •  Size: as to the number of query operators involved –  Allow broad parameter range(s) –  Useful for detailed, in-depth analysis –  Low setup threshold; easy to run
  44. 44. Micro Benchmarks: Disadvantages •  Neglect the larger picture, since they do not test the whole system •  Do not show how the costs of specific operations flow into the cost of the whole system •  Do not measure the impact of the micro-benchmarked operations on real-life applications •  Results are difficult to generalize and cannot be applied in a straightforward manner •  Micro-benchmarks do not use standardized metrics
  45. 45. Standard Benchmarks •  Relational, Object-Oriented, Object-Relational Database Management Systems –  Family of TPC benchmarks for relational databases •  XML: XPath, XQuery –  MBench, XBench, XMach-1, XMark •  General computing –  SPEC
  46. 46. Standard Benchmarks: Advantages & Disadvantages •  Advantages –  Mimic real-life scenarios (respond to real needs) •  E.g., TPC is a business oriented benchmark –  Publicly available –  Well defined –  Provide scalable data sets and workloads –  Metrics are well defined •  Disadvantages –  Outdated (standardization is a lengthy process) •  XQuery took around 7 years to become a standard •  TPC benchmark definition is still an ongoing process –  Very large and complicated to run –  Limited dataset variation (target a specific type of data) –  Limited Workload (focuses on the application in mind) –  Systems are often optimized for the benchmark(s) 6/21/16 ESWC 2016: Assessing the performance of RDF Engines - Discussing RDF Benchmarks 46
  47. 47. •  Management and methodological activities performed by a group of people –  Management: Organizational protocols to control the process –  Methodological: principles, methods and steps for benchmark creation •  Benchmark Development –  Roles and bodies: people/groups involved in the development –  Design principles: fundamental rules that direct the development of a benchmark –  Development process: series of steps to develop a benchmark based on Choke Points Benchmark Development Methodology 6/21/16 ESWC 2016: Assessing the performance of RDF Engines - Discussing RDF Benchmarks 47 Choke Points: the set of technical difficulties that force systems to improve their performance
  48. 48. The Example Standard Benchmark: TPC •  Transaction Processing Performance Council (TPC) –  non-profit corporation focused on developing data-centric benchmark standards and disseminating objective, verifiable performance data to the industry –  goal is to «create, manage and maintain a set of fair and comprehensive benchmarks that enable end-users and vendors to objectively evaluate system performance under well defined, consistent and comparable workloads» [NPM+12]

Active TPC Benchmarks (2016):
| Benchmark | Explanation |
| TPC-C | Focuses on transactions |
| TPC-DI | Focuses on ETL processes |
| TPC-DS | Decision support solutions for, but not limited to, Big Data |
| TPC-E | On-Line Transaction Processing (OLTP) workload |
| TPC-H | Decision support benchmark; ad hoc queries and concurrent data modifications |
| TPC-VMS | Virtual Measurement Single System specification for running and reporting performance metrics for virtualized databases |
| TPCx-HS | Measures hardware, operating system and commercial Apache Hadoop File System API implementations |
| TPCx-V | Measures the performance of servers running database workloads in virtual machines |
  49. 49. Benchmark Development Process (1) •  Design Principles [L97]

| Principle | Comment |
| Relevant | The benchmark is meaningful for the target domain |
| Understandable | The benchmark is easy to understand and use |
| Good Metrics | The metrics defined by the benchmark are linear, orthogonal and monotonic |
| Scalable | The benchmark is applicable to a broad spectrum of hardware and software configurations |
| Coverage | The benchmark workload does not oversimplify the typical environment |
| Acceptance | The benchmark is recognized as relevant by the majority of vendors and users |
  50. 50. Benchmark Development Process (2) •  Benchmarking Metrics –  Performance –  Price/Performance –  Energy/Performance metrics: an energy metric to measure the energy consumption of system components •  TPC Pricing specification –  Provides consistent methodologies for computing the price of the benchmarked system, licensing of software, maintenance, …

| Benchmark | Metrics |
| TPC-C | Transaction rate (tpmC), price per transaction ($/tpmC) |
| TPC-E | Transactions per second (tpS) |
| TPC-H | Composite queries per hour performance metric (QphH@Size), price per composite query per hour performance metric ($/QphH@Size) |
  51. 51. Desirable Attributes of a Benchmark: •  “A good benchmark is written in a high-level language, making it portable across different machines; is representative of some programming style or application; can be measured easily; has wide distribution [W90]” •  “a domain specific benchmark must meet four important criteria: relevance, portability, simplicity, scalability [G93]” •  Six desirable attributes for TPC-C [L97]: relevance, understandability, good metrics, scalability, coverage, acceptance •  Five desirable attributes in Huppler [H09]: relevance, repeatability, fairness, verifiability, economy •  Big Data Benchmarking [1]: “a successful benchmark should be simple to implement and execute, cost effective, timely and verifiable”. 6/21/16 ESWC 2016: Assessing the performance of RDF Engines - Discussing RDF Benchmarks 51
  52. 52. Desirable Attributes of a Benchmark: 6/21/16 ESWC 2016: Assessing the performance of RDF Engines - Discussing RDF Benchmarks 52
  53. 53. Design Principles: Desirable Attributes of a Benchmark •  Relevant/Representative: based on realistic use case scenarios and must reflect the needs of the use case •  Understandable/Simple: the results and workload are easily understandable by users •  Portable/Fair/Repeatable: no system benefits from the benchmark; must be deterministic and provide a «gold standard» •  Metrics: should be well defined, to be able to assess and compare the systems •  Scalable: datasets should be in the order of billions of «objects» •  Verifiable: allow verifiable results in each execution
  54. 54. Design of Benchmark Workload [Grey93] •  Design the queries to test specific features of the query language or to test specific data management approaches (micro-benchmarks) •  Base the query mix on specific requirements of real-world use cases (domain-specific and standard benchmarks) –  Leads to complex queries that involve many (different) language features
  55. 55. Development Process: Choke Points •  A benchmark exposes a system to a workload and should identify the technical difficulties of the system under test •  Choke points [BNE14] are those technological challenges whose resolution will significantly improve the performance of a product •  TPC-H: a 20-year-old benchmark (superseded by TPC-DS) but still influential, using business-oriented queries and concurrent modifications •  22 queries capturing (most of) the aspects of relational query processing •  [BNE14] performed an analysis of the TPC-H workload and identified 28 choke points grouped into 6 categories
  56. 56. Choke Points à la TPC-H •  CP1: Aggregation Performance –  Ordered aggregation, small group-by keys, interesting orders, dependent group-by keys •  CP2: Join Performance –  Large joins, sparse foreign keys, rich join order optimization, late projection •  CP3: Data Access Locality (materialized views) –  Columnar locality, physical locality by key, detecting correlation •  CP4: Expression Calculation –  Raw Expression Arithmetic, Complex Boolean Expressions in Joins and Selections, String Matching Performance •  CP5: Correlated Sub-queries –  Flattening sub-queries, moving predicates to a sub-query, overlap between outer- and sub-query •  CP6: Parallelism and Concurrency –  Query plan parallelization, workload management, result re-use 6/21/16 ESWC 2016: Assessing the performance of RDF Engines - Discussing RDF Benchmarks 56
  57. 57. Choke Points à la RDF Choke Point Description CP1: JOIN ORDERING 1.  Tests if the engine can evaluate the trade-offs between the time spent to find the best execution plan and the quality of the output plan 2.  Tests the ability of the engine to consider cardinality constraints expressed by the different kinds of schema constraints (e.g., functional and inverse functional properties) CP2: AGGREGATION Aggregations are implemented with the use of sub-selects in the SPARQL query; the optimizer should recognize the operations included in the sub-selects and evaluate them first. CP3: OPTIONAL & NESTED OPTIONAL CLAUSES Tests the ability of the optimizer to produce a plan where the execution of the optional triple patterns is the last to be performed since optional clauses do not reduce the size of intermediate results. 6/21/16 ESWC 2016: Assessing the performance of RDF Engines - Discussing RDF Benchmarks 57
  58. 58. Choke Points in RDF Benchmarks Choke Point Description CP4: REASONING Tests the ability of the engine to handle efficiently RDFS and OWL constructs expressed in the schema CP5: PARALLEL EXECUTION OF UNIONS Tests the ability of the optimizer to produce plans where unions are executed in parallel CP6: FILTERS Tests the ability of the engines to execute as early as possible those filter expressions to eliminate a possibly large number of intermediate results CP7: ORDERING Tests the ability of the engine to choose query plan(s) that facilitate the ordering of results CP8: GEO-SPATIAL PREDICATES Tests the ability of the system to handle queries for geospatial data 6/21/16 ESWC 2016: Assessing the performance of RDF Engines - Discussing RDF Benchmarks 58
  59. 59. Choke Points in RDF Benchmarks Choke Point Description CP9: FULL TEXT Queries that involve the evaluation of regular expressions on data value properties of resources CP10: DUPLICATE ELIMINATION Tests the ability of the system to identify duplicate entries and eliminate them during the creation of intermediate results CP11: COMPLEX FILTER CONDITIONS Tests the ability of the engine to deal with negation, conjunction and disjunction efficiently (i.e., breaking the filters into a conjunction of filters and executing them in parallel).
  60. 60. Query Characteristics: Simple filters, Complex filters, Unbound predicates, Negation, LIMIT, ORDER BY, REGEX, UNION, CONSTRUCT, ASK, >= 9 triple patterns (TPs), OPTIONAL, DISTINCT, DESCRIBE
  61. 61. Overview •  Introducing Benchmarks •  A short discussion about Linked Data –  Resource Description Framework (Data Model) –  SPARQL (Query Language) •  Benchmarking Principles & Choke Points •  Benchmarks –  Synthetic –  Real –  Benchmark Generators •  Sum up: what did we learn today? 6/21/16 ESWC 2016: Assessing the performance of RDF Engines - Discussing RDF Benchmarks 61
  62. 62. A Survey of RDF Benchmarks Synthetic Benchmarks Real Benchmarks Benchmark Generators 6/21/16 ESWC 2016: Assessing the performance of RDF Engines - Discussing RDF Benchmarks 61
  63. 63. Benchmark Components •  Datasets •  The raw material of the benchmark against which the workload will be evaluated •  Synthetic & Real Datasets –  Synthetic: produced with a data generator (that hopefully produces data with interesting characteristics) –  Real: widely used datasets from a domain of interest •  Query Workload •  Sets of queries and/or updates to evaluate the system with •  Metrics •  The performance metric(s) that determine the system's behavior
  64. 64. Synthetic RDF Benchmarks 6/21/16 ESWC 2016: Assessing the performance of RDF Engines - Discussing RDF Benchmarks 63
  65. 65. Lehigh University Benchmark (LUBM) [GPH05] •  Benchmark intended to facilitate the evaluation of Semantic Web repositories •  Widely adopted by the data engineering and Semantic Web communities •  Focuses on evaluating the performance of query optimizers and not ontology reasoning as in DL systems •  Components: –  Scalable Synthetic data generator –  Ontology of moderate size and complexity –  Supports extensional queries (i.e., queries that request instances and not only schema information) –  Proposes Performance metrics 6/21/16 ESWC 2016: Assessing the performance of RDF Engines - Discussing RDF Benchmarks 64
  66. 66. LUBM Univ-Bench Ontology •  Describes universities and departments and related activities •  Expressed in OWL Lite (took into consideration the limitations of reasoning systems regarding completeness) Statistics: –  43 classes –  32 object type properties –  7 data type properties –  OWL Lite constructs used: inverseOf, TransitiveProperty, someValuesFrom, intersectionOf
  67. 67. LUBM Data Generation (1) •  Synthetically produced extensional data that conform to the LUBM Ontology •  Data are generated using the UBA (Univ-Bench Artificial Data Generator) •  Random and Repeatable Data Generation •  Minimum unit of data generation: University that has departments, employees, courses •  Instances of classes and properties are randomly produced •  To make data more realistic restrictions are applied: –  «Minimum 15 and maximum 25 departments per university» –  «Undergraduate student/faculty ratio between 8 and 14 inclusive» 6/21/16 ESWC 2016: Assessing the performance of RDF Engines - Discussing RDF Benchmarks 66
  68. 68. LUBM Data Generation (2) •  Assignment of identifiers is done using zero-based indexes –  University0, Department0, … •  Data generated by the tool are repeatable across universities –  The user enters a seed for the random number generator employed in the data generation process •  Data created are represented in OWL Lite •  Configurable serialization and representation model (RDF/XML in .owl files, DAML)
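A toy sketch of this repeatable generation scheme (the university naming and the 15-25 department range come from the previous slide; everything else is invented): seeding the random number generator makes every run produce the same universities.

```python
import random

def generate_university(index: int, seed: int):
    """Deterministically generate one university: same seed, same output."""
    rng = random.Random(seed + index)   # repeatable per university
    n_depts = rng.randint(15, 25)       # "minimum 15 and maximum 25 departments"
    return {
        "uri": f"http://www.University{index}.edu",
        "departments": [f"Department{d}" for d in range(n_depts)],
    }

print(generate_university(0, seed=42))  # identical on every run
```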
  69. 69. LUBM Queries (1) •  14 Realistic Queries •  Written in SPARQL 1.0 •  Query Design criteria –  Input Size: •  proportion of the class instances involved and entailed in the query to the total instances in the dataset –  Selectivity: •  estimated proportion of the class instances that satisfy the query criteria •  depends on the input dataset size 6/21/16 ESWC 2016: Assessing the performance of RDF Engines - Discussing RDF Benchmarks 68
  70. 70. LUBM Queries (2) –  Complexity: •  measured on the basis of the number of classes and properties involved in the query •  different complexity for the same query and for different implementations: relational vs RDF –  Hierarchy information: •  class and property hierarchies are used to obtain all query answers –  Logical inference: •  inference is required to obtain all query answers 6/21/16 ESWC 2016: Assessing the performance of RDF Engines - Discussing RDF Benchmarks 69
  71. 71. LUBM Queries (3): Characteristics. All 14 LUBM queries (Q1-Q14) are simple SPARQL SELECT queries: none of the characteristics in the matrix (simple filters, complex filters, >= 9 TPs, unbound predicates, negation, OPTIONAL, LIMIT, ORDER BY, DISTINCT, REGEX, UNION, DESCRIBE, CONSTRUCT, ASK) applies to any of them
  72. 72. LUBM Queries (4): Choke Points # CP1 CP2 CP3 CP4 CP5 CP6 CP7 CP8 CP9 CP10 CP11 Q1 Q2 ✓ Q3 ✓ Q4 ✓ ✓ Q5 ✓ Q6 ✓ Q7 ✓ Q8 ✓ Q9 ✓ Q10 ✓ Q11 ✓ Q12 ✓ ✓ Q13 ✓ Q14 6/21/16 ESWC 2016: Assessing the performance of RDF Engines - Discussing RDF Benchmarks 71 Join Ordering Most complex query contains 5 joins Reasoning Focus on subClass and subProperty hierarchies
  73. 73. LUBM Performance Metrics (1) •  Load Time: –  Time needed to parse, load and reason for a dataset –  Focuses on persistent stores •  Repository Size: –  For persistent storage only –  The size of all files that constitute the repository •  Query Response Time: –  Average time for executing a query 10 times (warm run) 6/21/16 ESWC 2016: Assessing the performance of RDF Engines - Discussing RDF Benchmarks 72
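The query-response-time metric can be sketched as below; run_query is a hypothetical stand-in for whatever client API the tested engine exposes, not part of LUBM itself.

```python
import time

def average_response_time(run_query, query, runs=10):
    """LUBM-style metric: average wall-clock time of ten warm executions."""
    run_query(query)                     # warm-up execution, not timed
    start = time.perf_counter()
    for _ in range(runs):
        run_query(query)
    return (time.perf_counter() - start) / runs
```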
  74. 74. LUBM Performance Metrics (2) •  Query Completeness and Soundness: –  Measures the degree of completeness of a query answer as the percentage of entailed unique answers •  Combined Metric: –  Combines query response time with answer completeness and answer soundness –  Measures the trade-off between query response time and completeness of results •  See how reasoning affects query performance –  Provides an absolute ranking of systems –  But hides details! 6/21/16 ESWC 2016: Assessing the performance of RDF Engines - Discussing RDF Benchmarks 73
  75. 75. SP2Bench [SHM+09] •  Proposes a language specific benchmark to test the most common SPARQL constructs, operator constellations and RDF access patterns •  Components: –  Scalable synthetic data generator •  Creation of DBLP documents in RDF mimicking key characteristics of the original DBLP dataset •  Produced datasets contain blank nodes and RDF containers –  Supports extensional queries (i.e., queries that request instances and not schema information) –  Proposes performance metrics 6/21/16 ESWC 2016: Assessing the performance of RDF Engines - Discussing RDF Benchmarks 74
  76. 76. SP2Bench Schema DBLP (1) •  Study of DBLP real data –  Determine the probability distribution for selected attributes per document class; this forms the basis for generating class instances –  Reveals that only few of the attributes are repeated for the same class

Extract of the DBLP DTD (2008):
<!ELEMENT dblp (article | inproceedings | proceedings | book | incollection | phdthesis | mastersthesis | www)*>
<!ENTITY % field "author | editor | title | booktitle | pages | year | address | journal | volume | number | month | url | ee | cdrom | cite | publisher | note | crossref | isbn | series | school | chapter">
<!ELEMENT article (%field;)*>
<!ELEMENT inproceedings (%field;)*>
  77. 77. SP2Bench Schema DBLP (2) •  Probability distribution for selected attributes per document class •  An additional assumption is that attributes are independent –  The existence of an attribute does not depend on another •  Use bell-shaped Gaussian curves to approximate the input data –  Typically used to model normal distributions •  Studied the number of class instances over time and modeled it with a power-law distribution

| Attribute | Article | Inproc. | Proc. | Book | WWW |
| author | 0.9895 | 0.9970 | 0.0001 | 0.8937 | 0.9973 |
| cite | 0.0048 | 0.0104 | 0.0001 | 0.0079 | 0.0000 |
| editor | 0.0000 | 0.0000 | 0.7992 | 0.1040 | 0.0004 |
| isbn | 0.0000 | 0.0000 | 0.8592 | 0.9294 | 0.0000 |
| … | … | … | … | … | … |
  78. 78. SP2Bench Data Generation 6/21/16 ESWC 2016: Assessing the performance of RDF Engines - Discussing RDF Benchmarks 77 •  Synthetically produced extensional data that conform to the DBLP Schema •  Use of existing external vocabularies to describe resources in a uniform way –  FOAF (persons) – Friend of A Friend [ FOAF], SWRC - Semantic Web for Research Communities (scientific publications) [SWRC], DC – Dublin Core [DC] •  Introduce blank nodes and RDF containers (rdf:Bag) to capture all aspects of the RDF data model •  Data generation takes into account data approximation as reflected in the Gaussian curves •  Data generator takes as input either the triple count, or year up to which the data is generated –  Always ending up in a consistent state! •  Random functions are based on a fixed seed making data generation deterministic
  79. 79. SP2Bench Queries (1): Characteristics •  17 queries –  12 main queries and modifications thereof •  Provided in natural language and in SPARQL 1.0; SQL translations are also available •  Query design criteria –  Focus on the SELECT and ASK SPARQL forms –  Aim at covering the majority of SPARQL constructs (including DISTINCT, ORDER BY, LIMIT, OFFSET)
  80. 80. SP2Bench Queries (2): Characteristics 6/21/16 ESWC 2016: Assessing the performance of RDF Engines - Discussing RDF Benchmarks 79 Characteristic Q1 Q2 Q3abc Q4 Q5ab Q6 Q7 Q8 Q9 Q10 Q11 Q12abc Simple filters ✔ ✔ ✔ ✔ Complex filters ✔ ✔ ✔ >= 9 TPs ✔ ✔ ✔ ✔ ✔ Unbound predicates ✔ ✔ Negation ✔ ✔ OPTIONAL ✔ ✔ ✔ LIMIT ✔ ORDER BY ✔ ✔ DISTINCT ✔ ✔ ✔ ✔ ✔ ✔ REGEX UNION ✔ ✔ ✔ DESCRIBE CONSTRUCT ASK ✔
  81. 81. SP2Bench Queries (3): Choke Points # CP1 CP2 CP3 CP4 CP5 CP6 CP7 CP8 CP9 CP10 CP11 Q1 ✓ Q2 ✓ ✓ Q3 ✓ Q4 ✓ ✓ ✓ Q5 ✓ ✓ ✓ Q6 ✓ ✓ ✓ ✓ Q7 ✓ ✓ ✓ ✓ Q8 ✓ ✓ ✓ ✓ ✓ Q9 ✓ ✓ Q10 Q11 ✓ Q12 ✓ ✓ ✓ 6/21/16 ESWC 2016: Assessing the performance of RDF Engines - Discussing RDF Benchmarks 80 Join Ordering: most complex query contains 8 joins Filters: most complex query contains 2 filters Duplicate Elimination
  82. 82. SP2Bench Performance Metrics •  Loading time: –  time needed to parse, load and reason over a dataset using the tested system –  focuses on persistent stores •  «Per-query» performance: –  performance of each individual query •  «Global» performance: –  report the arithmetic and geometric mean of the query execution times: 1.  multiply the execution times of all 17 queries 2.  penalize queries that fail with a 3600s penalty 3.  compute the 17th root of the result •  Memory consumption –  high watermark of main memory consumption –  average memory consumption over all queries
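The «global» score can be sketched in a few lines of Python (the query times below are invented for illustration); computing exp of the mean log is the numerically safe way to take the 17th root of the product.

```python
import math

def geometric_mean(times_s, penalty=3600.0):
    """SP2Bench-style global score: failed queries (None) cost 3600 s."""
    times = [t if t is not None else penalty for t in times_s]
    # exp(mean(log t)) == nth root of the product, without overflow.
    return math.exp(sum(math.log(t) for t in times) / len(times))

times = [0.4, 1.2, 0.1, None, 2.5] + [0.8] * 12  # 17 query times, one failure
print(geometric_mean(times))
```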
  83. 83. Berlin SPARQL Benchmark (BSBM) [BS09][BSBM] •  Built around an e-commerce use case •  The query mix emulates the search and navigation patterns of a user looking for a product of interest •  Goals –  Allow the comparison of SPARQL engines across different architectures (relational and/or RDF) –  Challenge forward- and backward-chaining reasoning engines –  Focus on an enterprise setting where multiple clients concurrently execute workloads –  Measure SPARQL query performance and not (so much) reasoning •  Components –  Data generator: supports the creation of arbitrarily large datasets –  Test driver: executes sequences of SPARQL queries
  84. 84. BSBM Schema (1) •  E-commerce use case: products are offered by several vendors and consumers post reviews for those products [schema diagram: classes Product, ProductType, ProductFeature, Producer, Vendor, Offer, Review and Person; a Product has a Producer, 9..22 ProductFeatures, textual and numeric properties, and a type from the ProductType hierarchy (rdfs:subClassOf); an Offer links a Product to a Vendor with price, validFrom, validTo, deliveryDays and offerWebpage; a Review links a reviewer (Person, with foaf:name, foaf:mbox_sha1sum and country) to a Product with a title, text, review date and up to four ratings]
  85. 85. BSBM Schema & Data Characteristics (1) •  Every product has a type from a product hierarchy •  The product hierarchy is not fixed (depends on the dataset size) –  Its depth and width depend on the chosen scale factor n –  Hierarchy depth: d = 1 + round(log10(n)) / 2 –  Branching factor: bfr = 1 + round(log10(n)) at the root level; 8 at all other levels •  Product types are assigned a variable number of product features –  for a type at depth i, computed as lowerBound = 35*i / (d*(d+1)/2 − 1) and upperBound = 75*i / (d*(d+1)/2 − 1) –  The set of possible features for a given product type is the union of those of the type and of all its "super-types"
  86. 86. BSBM Schema & Data Characteristics (2) •  Products, Vendors, Offers –  Products that share the same type also have the same set of features –  For a given product, its features are chosen from the set of possible features with a hard-coded probability of 25% –  A normal distribution with mean μ=50 and standard deviation σ=16.6 is employed to associate products with producers –  Vendors are associated with countries following hard-coded distributions –  The number of offers is n*20; offers are distributed over products following a normal distribution with «fixed parameters» μ=n/2 and σ=n/4 –  Offers are distributed over vendors following a normal distribution with «fixed parameters» μ=2000 and σ=667
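A sketch of one of these allocations in Python (the scale and seed are invented): 20*n offers are distributed over n products via a normal distribution with μ=n/2 and σ=n/4, with samples clamped to the valid index range.

```python
import random

def allocate_offers(n_products: int, seed: int = 1):
    """Distribute 20*n offers over n products, normally around the middle."""
    rng = random.Random(seed)
    mu, sigma = n_products / 2, n_products / 4
    counts = [0] * n_products
    for _ in range(20 * n_products):
        idx = min(max(int(rng.gauss(mu, sigma)), 0), n_products - 1)
        counts[idx] += 1
    return counts  # offers per product, peaked around the "middle" products

print(allocate_offers(10))
```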
  87. 87. BSBM Schema & Data Characteristics (3) •  Reviews –  10 times the scale factor n –  Data type property values (title and text) between 50 – 300 words –  Up to 4 ratings, each rating is a random integer between 1 and 10 –  Each rating is missing with hard-coded probability 10% –  Distributed over products with a normal distribution depending on dataset size and following μ=n/2 and σ=n/4 –  Number of reviews per reviewer follows normal distribution with μ=20 and σ=6.6 –  Reviews are generated until all reviews are assigned a reviewer –  Reviewer countries follow the same distribution as vendor countries 6/21/16 ESWC 2016: Assessing the performance of RDF Engines - Discussing RDF Benchmarks 86
  88. 88. BSBM Data Generation (1) •  Synthetically produces instances of class Product that conform to the BSBM schema

Indicative number of instances for different dataset sizes:
| Total #triples | 250K | 1M | 25M | 100M |
| #products | 666 | 2,785 | 70,812 | 284,826 |
| #product features | 2,860 | 4,745 | 23,833 | 47,884 |
| #product types | 55 | 151 | 731 | 2,011 |
| #producers | 14 | 60 | 1,422 | 5,618 |
| #vendors | 8 | 34 | 722 | 2,854 |
| #offers | 13,320 | 55,700 | 1,416,240 | 5,696,520 |
| #reviewers | 339 | 1,432 | 36,249 | 146,054 |
| #reviews | 6,660 | 27,850 | 708,120 | 2,848,260 |
| Total #instances | 23,922 | 92,757 | 2,258,129 | 9,034,027 |
  89. 89. BSBM Queries (1) •  12 queries •  The query mix emulates the search and navigation patterns of a customer looking for a product •  BSBM queries are given in natural language, SPARQL and SQL

| Query | Description |
| Q1 | Find products for a given set of generic features |
| Q2 | Retrieve basic information about a specific product for display purposes |
| Q3 | Find products having some specific features and not having one feature |
| Q4 | Find products matching two different sets of features |
| Q5 | Find products that are similar to a given product |
| Q6 | Find products having a label that contains a specific string |
| Q7 | Retrieve in-depth information about a product, including offers and reviews |
| Q8 | Give me recent language reviews for a specific product |
| Q9 | Get information about a reviewer |
| Q10 | Get cheap offers which fulfill the consumer's delivery requirements |
| Q11 | Get all information about an offer |
| Q12 | Export information about an offer into another schema |
  90. 90. Characteristic Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Simple filters ✔ ✔ ✔ ✔ ✔ ✔ ✔ Complex filters ✔ ✔ > 9 TPs ✔ ✔ ✔ ✔ ✔ Unbound predicates ✔ ✔ Negation ✔ OPTIONAL ✔ ✔ ✔ ✔ LIMIT ✔ ✔ ✔ ✔ ✔ ✔ ORDER BY ✔ ✔ ✔ ✔ ✔ ✔ DISTINCT ✔ ✔ ✔ REGEX ✔ UNION ✔ ✔ DESCRIBE ✔ CONSTRUCT ✔ ASK BSBM Queries (2): Characteristics 6/21/16 ESWC 2016: Assessing the performance of RDF Engines - Discussing RDF Benchmarks 89 11 JOINs, 3 OPTIONAL clauses, 3 Filters, 1 Unbound variable 4 OPTIONAL clauses
  91. 91. BSBM Queries (3): Choke Points 6/21/16 ESWC 2016: Assessing the performance of RDF Engines - Discussing RDF Benchmarks 90 # CP1 CP2 CP3 CP4 CP5 CP6 CP7 CP8 CP9 CP10 CP11 Q1 ✔ ✔ ✔ ✔ Q2 ✔ Q3 ✔ ✔ ✔ Q4 ✔ ✔ ✔ ✔ Q5 ✔ ✔ ✔ ✔ Q6 ✔ ✔ Q7 ✔ ✔ ✔ Q8 ✔ ✔ ✔ Q9 ✔ Q10 ✔ ✔ ✔ ✔ Q11 ✔ Q12 ✔ Join Ordering: most complex query contains 11 joins Filters: most complex query contains 3 filters and most complex filter contains arithmetic expressions Result Ordering
  92. 92. BSBM: Performance Metrics •  Query Mixes per Hour (QMpH) –  Measures the number of complete BSBM query mixes answered by a system under test and for a specific number of clients running concurrently against the system under test •  Queries per Second (QpS) –  Measures the number of queries of a specific type handled by the system under test in a second –  Calculated by dividing the number of queries of a specific type within a benchmark run by the total execution time of those queries •  Load Time: –  Time to load the dataset in the RDF or relational repositories •  Includes the time to create the appropriate data structures & indices 6/21/16 ESWC 2016: Assessing the performance of RDF Engines - Discussing RDF Benchmarks 91
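A worked example of the two throughput metrics, with invented numbers:

```python
# Query Mixes per Hour (QMpH): complete mixes over the whole run.
mixes_completed = 1200          # complete query mixes in a benchmark run
run_seconds = 1800.0            # a 30-minute run, all clients combined
qmph = mixes_completed / (run_seconds / 3600.0)

# Queries per Second (QpS) for one query type, e.g. Q5.
q5_executions = 4800            # executions of Q5 within the run
q5_total_seconds = 960.0        # total time spent executing Q5
q5_qps = q5_executions / q5_total_seconds

print(qmph, q5_qps)  # 2400.0 mixes/hour, 5.0 queries/second
```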
  93. 93. Semantic Publishing Benchmark (SPB) •  Developed in the context of the FP7 EU project LDBC (2012-2015) •  LDBC's goals: –  Develop querying benchmarks that will spur research & industry progress in large-scale graph and RDF data management •  scalability, storage, indexing and query optimization techniques for RDF and graph database solutions •  quantitatively and qualitatively assess different solutions for RDF data integration –  Establish an industry-neutral entity - the LDBC foundation - à la the Transaction Processing Performance Council (TPC)
  94. 94. Semantic Publishing Benchmark (SPB) •  Industry-motivated benchmark –  The scenario involves a media/publisher organization that maintains semantic metadata about its journalistic assets •  Components –  Scalable synthetic data generator •  Creates instances of the BBC ontologies, mimicking characteristics of the original real input datasets –  Supports extensional queries (i.e., queries that request instances and not schema information) –  Workload simulates consumption of RDF metadata •  Concurrent read and update queries –  Proposes performance metrics
  95. 95. SPB Design: Requirements •  Storing and processing RDF data –  Storing and isolating data in separate RDF graphs –  Supporting the following SPARQL standards: •  SPARQL 1.1 Protocol, Query, Update •  Support for schema languages –  Support for RDFS to obtain the correct answers –  Optional support for the RL profile of the Web Ontology Language (OWL2 RL), in order to pass the conformance test suite •  Loading data from RDF serialization formats –  N-Quads, TriG, Turtle, etc.
  96. 96. SPB Schema: BBC Ontologies (1) •  Core Ontologies: 7 ontologies describe basic concepts about entities and relationships in the domain of interest –  Basic concepts: Creative Works, Places, Persons, Provenance Information, Company Information, etc. [diagram of the BBC Creative Work ontology: class CreativeWork with subclasses NewsItem, BlogPost and Programme; properties such as cwork:title, cwork:shortTitle, cwork:description, cwork:altText, cwork:category, cwork:tag (with sub-properties about and mentions, ranging over Person, Place, Event, Organisation and Theme), cwork:audience (National/International Audience), cwork:primaryFormat (Textual, Video, Interactive, Image, Audio, PictureGallery Format), cwork:dateCreated, cwork:dateModified, and cwork:thumbnail (Thumbnail, with types such as StandardThumbnail, CloseUpThumbnail, FixedSize66/266/466Thumbnail)]
  97. 97. SPB Schema: BBC Ontologies (2) •  Domain Ontologies: 3 ontologies describe concepts and properties related to a specific domain –  sports (competitions, events) –  politics entities –  news (concepts that journalists tag annotations with) •  Statistics –  74 classes –  88 data type properties, 28 object type properties –  60 rdfs:subClassOf (maximum depth 3) and 17 rdfs:subPropertyOf (maximum depth 1) hierarchies –  105 rdfs:domain and 115 rdfs:range RDFS properties –  8 owl:oneOf class axioms, 1 owl:TransitiveProperty property
98. SPB: Reference Datasets
• Collections of entities describing various domains
  – Snapshots of the real datasets of BBC
    • Football competitions and teams
    • Formula One competitions and teams
    • UK Parliament members
  – Additional datasets
    • GeoNames: places, names and coordinates
    • DBPedia: person data
  – Reference dataset size: 25M triples
99. SPB Data Generation (1): Process
1. Loader
  – Loads the ontologies & reference data
2. Data Generator
  a. Retrieves instances from the reference datasets
  b. Generates Creative Works according to pre-defined allocations and models
  c. Writes the generated data to disk
[Diagram: SPB data generation workflow; the Ontology & Reference Data Set Loader loads the BBC ontologies and reference datasets into an RDF repository, and the Creative Works Generator, driven by data generation parameters, retrieves instances through the SPARQL endpoint and writes the generated Creative Works to disk]
100. SPB Data Generation (2)
• Produces synthetic data that mimics most of the characteristics of the real-world data provided by BBC
• Input: core & domain ontologies and reference datasets
• Output:
  – Instances that conform to the BBC core ontologies (class Creative Work)
  – Instances refer to entities in the reference datasets using the about & mentions schema properties
  – Tagging follows the user-defined distributions of SPB's Data Generator (see the sketch below)
[Figure: timeline of tagged entities from 01/2012 to 12/2012, illustrating clustering around events, correlations between entities and random distribution of tags]
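A rough sketch of the tag-allocation idea in Python (the entity pool, the Zipf-like skew and the property names are illustrative assumptions, not SPB's actual generator):

```python
import random

# Hypothetical pool of reference-dataset entities that Creative Works tag.
ENTITIES = [f"http://example.org/entity/{i}" for i in range(1000)]

# Zipf-like weights: a few "major event" entities attract most of the tags,
# mimicking the clustering/correlation behaviour described above.
WEIGHTS = [1.0 / rank ** 1.5 for rank in range(1, len(ENTITIES) + 1)]

def generate_creative_work(cw_id, rng=random.Random(42)):
    # Each Creative Work gets one 'about' tag and a few 'mentions' tags.
    cw = f"http://example.org/cw/{cw_id}"
    about = rng.choices(ENTITIES, weights=WEIGHTS, k=1)[0]
    mentions = rng.choices(ENTITIES, weights=WEIGHTS, k=rng.randint(1, 3))
    return [(cw, "cwork:about", about)] + \
           [(cw, "cwork:mentions", m) for m in mentions]

for triple in generate_creative_work(0):
    print(triple)
```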
101. SPB Operational Phases
• Data loading
  1. Initial loading of the reference datasets
    • BBC datasets enriched with DBPedia person and GeoNames place data
  2. Generation of Creative Works
    • Parallel generation (multi-threaded and multi-process)
  3. Loading of Creative Works into the RDF repository
• Running the benchmark
  1. Warm-up phase
  2. Run the benchmark using the Test Driver
  3. Run conformance tests (OWL 2 RL) [optional]
102. Benchmark Configuration
• Data Generator
  – Allocation of tags in Creative Works
    • Correlations of Creative Works with important entities (persons, places, events)
    • Clustering of Creative Works around major/minor events
  – Size of generated data (triples)
  – Parallel data generation
• Test Driver
  – Distribution of queries in the query mix
    • editorial operations (deletion/addition of RDF triples)
    • aggregate operations (complex SPARQL queries)
  – Number of editorial/aggregation agents
  – Duration of the warm-up and benchmark phases
  – Each operational phase can be enabled or disabled
103. SPB Base Workload Queries (2)
[Table: SPARQL features across queries Q1-Q12; number of queries using each feature: simple filters (1), complex filters (3), more than 9 triple patterns (4), unbound predicates (1), negation (0), OPTIONAL (4), LIMIT (6), ORDER BY (7), DISTINCT (9), COUNT (1), REGEX (0), UNION (3), GROUP BY (1), CONSTRUCT (5); some queries evaluate (parts of) the query on graphs]
104. SPB Queries (1)
• Base and advanced workloads
  – Base workload: 12 queries & update operations
  – Advanced workload: 24 queries
• Workloads are based on real queries used by BBC journalists during their editorial operations
• Editorial agents simulate the editorial work performed by journalists:
  – Insert, Update, Delete
• Aggregation agents simulate the retrieval operations performed by end-users
105. SPB Base Workload Queries (3): Choke Points
[Table: choke points CP1-CP11 covered per query; number of choke points hit by each query: Q1 (5), Q2 (3), Q3 (6), Q4 (5), Q5 (5), Q6 (4), Q7 (2), Q8 (3), Q9 (3), Q10 (4), Q11 (5), Q12 (2); annotated choke points include reasoning over class & property hierarchies, join ordering, and ordering & duplicate elimination]
106. SPB Performance Metrics
• SPB primary metrics
  – Query rate, interactive mix (queries per second)
  – Query rate, analytical mix (queries per second)
  – Update rate (operations per second)
  – Duration of bulk load (in ms)
  – Duration of measurement window (in minutes)
  – # complete analytical mixes (per second)
  – # complete interactive mixes (per second)
  – # complete update operations
• Query execution reports: per query, the arithmetic mean execution time, minimum execution time, 90th-percentile average execution time and number of executions
107. Real RDF Benchmarks
108. UniProt [RU09][UniprotKB]
• Comprehensive, high-quality and freely accessible resource of protein sequence and functional information
• UniProt schema
  – UniProt Core Vocabulary, BIBO (journals), ECO (evidence codes), Dublin Core (metadata)
  – UniProt Core Vocabulary: 124 classes, 113 properties
• The dataset contains approximately
  – 13 billion triples
  – 2.5 billion distinct subjects
  – 2 billion distinct objects
• Queries
  – No representative set of queries is offered
  – [NW09] offers a set of 8 queries used to test the RDF-3X engine
109. UniProt Queries (1) [NW09]: Characteristics
[Table: SPARQL features across queries Q1-Q8; the only feature used is a high number of triple patterns: 6 of the 8 queries contain more than 9 triple patterns; no filters, unbound predicates, negation, OPTIONAL, LIMIT, ORDER BY, DISTINCT, REGEX, UNION, DESCRIBE, CONSTRUCT or ASK]
• Join ordering: RDF-3X aims at optimizing join processing for RDF data
110. UniProt Queries (2) [NW09]: Choke Points
• Focus on discovering optimal or close-to-optimal join orders
[Table: choke points CP1-CP11 per query; each of Q1-Q8 hits exactly one choke point]
• Join ordering: the most complex query contains 12 joins; 7 queries contain more than 7 joins
111. YAGO (Yet Another Great Ontology) [SKW07]
• High-quality multilingual knowledge base derived from Wikipedia, WordNet and GeoNames
• Schema
  – Wikipedia entities, WordNet and GeoNames concepts and relationships: associates the WordNet taxonomy with the Wikipedia category system
  – 10 million schema entities
• Dataset
  – 120 million triples about schema entities
  – 2.625 million links to DBPedia
• Queries
  – No representative set of queries is offered by YAGO
  – [NW10] provides a representative set of 8 queries for the RDF-3X evaluation
112. YAGO Queries (1) [NW10]: Characteristics
• Simple SELECT queries that focus on join ordering, negation and duplicate elimination
[Table: SPARQL features across queries A1-A3, B1-B3, C1-C2; simple filters (1 query), more than 9 triple patterns (1), negation (3), DISTINCT (5), UNION (1); no complex filters, unbound predicates, OPTIONAL, LIMIT, ORDER BY or REGEX]
113. YAGO Queries (2) [NW10]: Choke Points
• Queries focus mostly on discovering optimal or close-to-optimal query evaluation plans, including negation in filters and duplicate elimination
[Table: choke points CP1-CP11 per query; choke points hit: A1 (1), A2 (1), A3 (4), B1 (3), B2 (2), B3 (3), C1 (4), C2 (2)]
• Join ordering: the most complex query contains 8 joins; all queries contain more than 5 joins
114. Barton Library [Barton]
• Data from the MIT Simile Project, which develops tools for library data management
  – contains records that compose an RDF-formatted dump of the MIT Libraries Barton catalog
  – converted from raw data stored in an old library format standard called MARC (Machine Readable Catalog)
• Schema
  – Common types include Record and Item, the latter being associated with instances of type Person and with instances of Description
  – Primitive types include Title and Date
• Dataset
  – Approximately 45 million RDF triples
• Queries
  – No representative queries are provided with the Barton Library dataset
  – [Abadi07] provides a workload of 7 queries ([NW10] expresses them in SPARQL)
115. Barton Queries (1) [NW10]: Characteristics
[Table: SPARQL features across queries Q1-Q7; simple filters (4 queries), negation (1), DISTINCT (3), UNION (1); no complex filters, more than 9 triple patterns, unbound predicates, OPTIONAL, LIMIT, ORDER BY or REGEX]
116. Barton Queries (2) [NW10]: Choke Points
• Queries focus mostly on discovering optimal or close-to-optimal query evaluation plans, including negation in filters and duplicate elimination
[Table: choke points CP1-CP11 per query; choke points hit: Q1 (1), Q2 (2), Q3 (1), Q4 (1), Q5 (2), Q6 (4), Q7 (1)]
• Join ordering: the most complex query contains 3 joins
117. Linked Sensor Dataset [PHS10]
• Expressive descriptions of approximately 20,000 weather stations in the US
• Divided into multiple subsets that reflect weather data for specific hurricanes or blizzards from the past (with a focus on hurricane Ike)
• Schema
  – Contains information about temperature, precipitation, pressure, wind speed and humidity
  – Contains links to GeoNames and links to observations provided by MesoWest (a meteorological service in the US)
• Dataset
  – More than 1 billion triples
• Queries
  – No representative set of queries is offered
118. WordNet [WordNet]
• Large lexical database of English, developed under the direction of George A. Miller (Emeritus)
• Schema
  – Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept
  – Synsets are interlinked by means of conceptual-semantic and lexical relations; the resulting network of meaningfully related words and concepts can be navigated with a browser
• Dataset
  – Approximately 1.9 million triples (300MB)
• Queries
  – No representative query workload
119. Publishing TPC-H as RDF [TPC-H]
• The benchmark targets decision support systems that examine large volumes of data, execute queries with a high degree of complexity, and provide answers to critical business questions
• The benchmark provides a suite of business-oriented ad-hoc queries and concurrent data modifications
• The queries and the data populating the database have been chosen to have broad industry-wide relevance
• Recipe:
  1. Use the DBGEN TPC-H generator to generate a TPC-H relational dataset
  2. Use the D2R tool, or another relational-to-RDF tool, to convert the relational dataset to an equivalent RDF one (a toy illustration follows below)
  3. Translate the TPC-H SQL queries to equivalent SPARQL queries
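A toy Python illustration of the conversion step (the table row, vocabulary URIs and mapping rules are simplified assumptions; D2R implements much richer, mapping-driven conversion):

```python
# One row of a (simplified) TPC-H customer table.
row = {"custkey": 42, "name": "Customer#000000042",
       "nationkey": 7, "acctbal": 711.56}

BASE = "http://example.org/tpch/"  # illustrative vocabulary, not D2R output

def customer_to_triples(row):
    subj = f"<{BASE}customer/{row['custkey']}>"
    return [
        (subj, "rdf:type", f"<{BASE}Customer>"),
        (subj, f"<{BASE}name>", f"\"{row['name']}\""),
        # Foreign keys become links between resources; SQL joins on keys
        # turn into SPARQL joins on shared variables.
        (subj, f"<{BASE}nation>", f"<{BASE}nation/{row['nationkey']}>"),
        (subj, f"<{BASE}acctbal>", f"\"{row['acctbal']}\""),
    ]

for s, p, o in customer_to_triples(row):
    print(s, p, o, ".")
```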
120. Benchmark Generators
121. DBPedia SPARQL Benchmark (DBSB) [MLA+14]
• Generic methodology for SPARQL benchmark creation
• Based on
  – Flexible data generation that mimics an input data source
  – Query-log mining
  – Clustering of queries
  – Analysis of SPARQL query features
• The methodology is schema-agnostic
  – Demonstrated using the DBPedia knowledge base
• The proposed approach is applied to various sizes of the DBPedia knowledge base
• The benchmark proposes a query workload based on real queries expressed against DBPedia
122. DBSB Data Generation (1)
• Working assumptions
  1. The output dataset should have similar characteristics to the input dataset
    • Number of classes, properties, value distributions, taxonomic structures (hierarchies)
  2. Varying output dataset sizes
  3. Characteristics such as the in- and out-degrees of nodes should be similar across datasets of varying sizes
  4. Easily repeatable data generation process
123. DBSB Data Generation (2)
• Idea
  1. Larger datasets are produced by
    • duplicating all triples and changing their namespace
  2. Smaller datasets are produced by
    • removing triples in a way that preserves the properties of the original graph
    • using a seed-based method built on the assumption that a representative set of resources is obtained by sampling across classes (sketched below):
      1. For each selected element in the dataset, its concise bounded description (CBD) is retrieved and added to the queue
      2. The process is repeated until the target number of triples is reached
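A condensed Python sketch of the seed-based shrinking idea (the CBD is simplified here to outgoing triples plus blank-node closure, and the seed choice and triple representation are assumptions of this sketch):

```python
from collections import deque

def concise_bounded_description(resource, triples):
    # Simplified CBD: all triples whose subject is the resource, expanded
    # recursively through blank nodes (modelled here as "_:"-prefixed ids).
    cbd, queue, seen = [], deque([resource]), set()
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        for s, p, o in triples:
            if s == node:
                cbd.append((s, p, o))
                if isinstance(o, str) and o.startswith("_:"):
                    queue.append(o)
    return cbd

def shrink(triples, seeds, target_size):
    # Add CBDs of seed resources (sampled across classes) until the
    # target triple count is reached.
    sample, kept = [], set()
    for seed in seeds:
        for t in concise_bounded_description(seed, triples):
            if t not in kept:
                kept.add(t)
                sample.append(t)
        if len(sample) >= target_size:
            break
    return sample[:target_size]
```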
124. DBSB Query Analysis (1)
• Goal: detect prototypical queries that were sent to the DBPedia SPARQL endpoint, using similarity measures
  – String similarity and graph similarity
• Idea: a 4-step query analysis and clustering approach
  1. Select queries executed frequently on the input data
  2. Strip common syntactic constructs (namespaces, prefixes)
  3. Compute query similarity using string matching
  4. Compute query clusters using a soft graph clustering algorithm
• The clusters are used to devise the benchmark's query generation patterns
125. DBSB Query Analysis (2)
• Query selection
  1. Use the DBPedia SPARQL query log (31.5 million queries over a 3-month period)
  2. Reduce the initial set of queries by considering
    • Query variations: use a standard way to name variables to reduce differences among queries (promoting query constructs such as DISTINCT and REGEX)
    • Query frequency: discard queries with low frequency, since they do not contribute to the overall query performance
    – Result: 35,965 queries
  3. String stripping: remove all SPARQL keywords and common prefixes
  4. Similarity computation: compute the similarity of the stripped queries
126. DBSB Query Analysis (3)
• Query selection (cont'd)
  4. Similarity computation
    • To reduce the time of benchmark compilation, use the LIMES [NS11] framework
    • Use the Levenshtein string similarity measure with a 0.9 threshold
    • This reduces by 16.6% the number of computations required relative to computing the full Cartesian product of queries
  5. Clustering
    • Apply graph clustering to the query similarity graph of step (4)
    • Goal: identify groups of similar queries out of which prototypical queries will be generated
    • Use the BorderFlow [NS09] algorithm, which follows a seed-based approach
    • Obtain 12,272 clusters; 24% contain a single query
    • Select the clusters with more than 5 queries
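A small Python sketch of steps 3 and 4 (the stripping rules and the normalized similarity are common choices and only approximate DBSB's exact procedure):

```python
import re

def strip_query(q):
    # Step 3: drop prefixes and keywords so only the query "shape" remains.
    q = re.sub(r"PREFIX\s+\S*\s*<[^>]*>", "", q, flags=re.IGNORECASE)
    q = re.sub(r"\b(SELECT|DISTINCT|WHERE|FILTER|OPTIONAL|UNION|LIMIT)\b",
               "", q, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", q).strip()

def levenshtein(a, b):
    # Classic dynamic-programming edit distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def similarity(q1, q2):
    # Normalized to [0, 1]; DBSB keeps pairs above a 0.9 threshold.
    s1, s2 = strip_query(q1), strip_query(q2)
    return 1.0 - levenshtein(s1, s2) / (max(len(s1), len(s2)) or 1)

print(similarity("SELECT ?x WHERE { ?x a ?type }",
                 "SELECT ?y WHERE { ?y a ?type }"))
```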
127. DBSB Query Generation (1)
• Select the most interesting SPARQL queries
  – Which are the most frequently asked SPARQL queries?
  – Which of those queries cover the most SPARQL features?
• SPARQL features
  – Overall number of triple patterns
    • Tests the efficiency of join operations (CP1)
  – SPARQL pattern constructors (UNION & OPTIONAL)
    • Handle parallel execution of UNIONs (CP5)
    • Perform OPTIONALs as late as possible in the query plan (CP3)
  – Solution sequences & modifiers (DISTINCT)
    • Efficiency of duplicate elimination (CP10)
  – Filter conditions and operators (FILTER, LANG, REGEX, STR)
    • Efficiency of engines in executing filters as early as possible (CP6)
128. DBSB Query Generation (2)
• 25 queries are selected
  – For each of the features, manually select the part of the query to be varied (an IRI or a filter condition)
  – The variability of each query template for the chosen values is sufficiently high (>= 1000 per query template)
• This method ensures that the queries executed during the benchmark differ and always return non-empty results
129. Apples and Oranges [DKS+11]
• Proposes structuredness to characterize datasets
  – The level of structuredness of a dataset D with respect to a type (class) T is determined by how well the instances of T conform to type T
  – If each instance of T has the properties defined in T, then the dataset has high structuredness with respect to T
[Figure: two bar charts plotting OC(p, I(T, D)) against OC(p, T) for each property p of T (name, office, ext, major, GPA) over 6 instances]
• Highly structured dataset: all instances have all attributes
• Less structured dataset: all instances have the name attribute; the ext & GPA properties are encountered in 50% of the instances; the office property is found in 20% of the instances; the major property in 10% of the instances
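One plausible formalization consistent with the slide's notation (the exact definition, including the weights used to aggregate coverage over types into the coherence CH, is given in [DKS+11]):

```latex
% OC(p, I(T,D)): number of instances of type T in dataset D in which
% property p occurs; P(T): the properties of T; I(T,D): the instances of T.
\[
  CV(T, D) \;=\; \frac{\sum_{p \in P(T)} OC\bigl(p,\, I(T, D)\bigr)}
                      {\lvert P(T) \rvert \cdot \lvert I(T, D) \rvert}
\]
% Coherence CH aggregates CV(T, D) over all types of the dataset,
% weighting each type by its share of properties and instances.
```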
130. Apples and Oranges [DKS+11]
• Structuredness is one of the key considerations when deciding on:
  – the appropriate data representation format (e.g., relational for structured and XML for semi-structured data)
  – the organization of data (e.g., dependency theory and normal forms for the relational model, and for XML)
  – data indexes (e.g., B+-tree indexes for relational data and numbering-scheme-based indexes for XML)
  – data querying (e.g., using SQL for the relational model and XPath/XQuery for XML)
• In other words, structuredness permeates every aspect of data management
131. Apples and Oranges [DKS+11]
[Figure: benchmark and real datasets placed on a structuredness scale from 0.0 to 1.0; synthetic datasets sit at the highly structured (relational-like) end, while real datasets range from less structured to highly structured]
132. Apples and Oranges [DKS+11]
Some important observations:
• Since TPC-H is a relational dataset, it should have high structuredness
• There is a difference between synthetic and real datasets
  – Synthetic datasets are fairly structured and relational-like
  – Real datasets cover the whole spectrum of structuredness
[Figure: structuredness of datasets on a 0.0-1.0 scale]
• Existing RDF stores are tested and compared against each other with respect to datasets that are not representative of most real RDF data
133. Apples and Oranges [DKS+11]
• Nothing can better represent data than the data itself!
• Idea: turn every dataset into a benchmark
  1. No need to synthetically generate values
    • Use the actual data values in the dataset
  2. No need to synthetically generate queries
    • The queries that are known to run on your data can be used in the benchmark
  3. But we need to cover the structuredness spectrum
    • to get data as close as possible to real-world data
    • to see how the systems perform when data goes from very structured to less structured
134. Counting Coins [DKS+11]
• Start with a dataset of size S and CH = 0.5
• Aim for a dataset of size S' and coherence CH', where S > S' and CH > CH'
Process:
• Assign a coin to each triple (s, p, o) and compute the impact on CH of its removal
  – The removal will reduce the size by 1
  – Example: consider (person1, ext, x5304); removing this triple from D gives a dataset with CH(T, D) = 0.467, therefore coin(person1, ext, x5304) = 0.5 - 0.467 = 0.033
• Formulate (automatically) an integer programming problem whose solutions tell us how many coins to remove to achieve the desired coherence CH' and size S' (a toy sketch of the coin computation follows below)
Example dataset (subject, predicate, object): (person0, name, Eric), (person0, office, BA7430), (person0, ext, x4401), (person1, name, Kenny), (person1, office, BA7349), (person1, office, BA5439), (person1, ext, x5304), (person2, name, Kyle), (person2, ext, x6281), (person3, name, Timmy), (person3, major, C.S.), (person3, GPA, 3.4), (person4, name, Stan), (person4, GPA, 3.8), (person5, name, Jimmy), (person5, GPA, 3.7)
• One of the few occasions in life where having too many coins is undesirable…
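A toy Python sketch of the coin computation (the coherence function below is a stand-in that averages property coverage over a single implicit type, without the paper's type weights, so the numbers are illustrative only):

```python
def coherence(triples):
    # Stand-in coherence: fraction of (subject, property) combinations
    # that are actually present, assuming one type for all subjects.
    subjects = {s for s, _, _ in triples}
    props = {p for _, p, _ in triples}
    present = {(s, p) for s, p, _ in triples}
    return len(present) / (len(props) * len(subjects))

def coin(triple, triples):
    # Impact on coherence of removing this one triple (size drops by 1).
    remaining = [t for t in triples if t != triple]
    return coherence(triples) - coherence(remaining)

D = [("person0", "name", "Eric"), ("person0", "ext", "x4401"),
     ("person1", "name", "Kenny"), ("person1", "ext", "x5304"),
     ("person2", "name", "Kyle")]

for t in D:
    print(t, round(coin(t, D), 3))
```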
135. Technical Challenges in Problem Formulation
• Compute coins that represent the impact on structuredness of removing all triples whose subjects are instances of a type T and whose property equals p
  – Therefore, one coin for each type/property combination
• Add constraints that set lower and upper bounds on the number of coins that can be removed, so as not to completely remove a property from a type
• Add constraints which guarantee that not all instances of a type are removed
• To deal with multi-valued properties, add constraints that introduce a relaxation parameter ρ
  – required because of the approximation introduced by using the average number of triples per coin
136. Waterloo SPARQL Diversity Test Suite (WatDiv) [AHO+14]
• Stresses existing RDF engines to reveal the wider range of query requirements established by web applications
• Contributions
  – Definition of 2 classes of query features used to evaluate the variability of workloads and datasets
    • Structural (e.g., number of triple patterns)
    • Data-driven (affecting selectivity and result cardinality)
  – In-depth analysis of existing SPARQL benchmarks using the structural and data-driven features
  – The WatDiv test suite, which stresses existing RDF engines with this wider range of query requirements
137. WatDiv Structural Features (1)
1. Triple pattern count
  – Number of triple patterns in SPARQL graph patterns
2. Join vertex count
  – Number of RDF terms (IRIs, literals, blank nodes) and variables that are subjects or objects of multiple triple patterns
3. Join vertex degree
  – The degree of a join vertex v is the number of triple patterns whose subject or object is v
Example (SP2Bench Q5a):
SELECT DISTINCT ?person ?name WHERE { ?article rdf:type bench:Article . ?article dc:creator ?person . ?inproc rdf:type bench:Inproceedings . ?inproc dc:creator ?person2 . ?person foaf:name ?name . ?person2 foaf:name ?name2 FILTER(?name = ?name2) }
  – Triple pattern count: 6
  – Join vertices: ?article, ?inproc, ?person, ?person2
  – Join vertex degrees: ?article: 2, ?inproc: 2, ?person: 2, ?person2: 2
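A small Python sketch of how these features can be computed from a parsed basic graph pattern (triple patterns are plain string tuples; treating every repeated subject/object term as a join vertex follows the definition above):

```python
from collections import Counter

def join_vertices(bgp):
    # Degree = number of triple patterns a term touches as subject/object;
    # a join vertex is any term with degree > 1.
    degree = Counter()
    for s, p, o in bgp:
        degree[s] += 1
        degree[o] += 1
    return {v: d for v, d in degree.items() if d > 1}

# SP2Bench Q5a's basic graph pattern from the slide above.
q5a = [("?article", "rdf:type", "bench:Article"),
       ("?article", "dc:creator", "?person"),
       ("?inproc", "rdf:type", "bench:Inproceedings"),
       ("?inproc", "dc:creator", "?person2"),
       ("?person", "foaf:name", "?name"),
       ("?person2", "foaf:name", "?name2")]

jv = join_vertices(q5a)
print("triple pattern count:", len(q5a))
print("join vertex count:", len(jv))
print("join vertex degrees:", jv)
```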
138. WatDiv Structural Features (2)
• Join vertex degree & count provide a good characterization of the structural complexity of a query
  – The number of triple patterns alone does not properly characterize the query: two queries with the same set of triple patterns can have different structures
[Figure: three example query shapes over the same triple patterns: a linear (path) query, a snowflake query and a star query]
139. WatDiv Structural Features (3)
• Join vertex type
  – Plays an important role in how RDF engines determine efficient query plans
    • E.g., star queries promote efficient merge joins
• 3 (mutually non-exclusive) types of join vertices, where the triple patterns (s, p, o) considered are those incident on x:
  – Vertex x is of type SS+ if for all incident triple patterns (s, p, o), x is the subject
  – Vertex x is of type OO+ if for all incident triple patterns (s, p, o), x is the object
  – Vertex x is of type SO+ if there are incident triple patterns (s, p, o) and (s', p', o') with x = s and x = o'
[Figure: example patterns for SS+, OO+ and SO+ join vertices]
140. WatDiv Data-driven Features (1)
• A system's choice of the most efficient query plan depends on
  – (a) the characteristics of the dataset, and
  – (b) the query
• If the system relies on selectivity estimations and result cardinality, the same query will have a different query plan for datasets of different sizes
• Different cases:
  – Queries have a diverse mix of result cardinalities
  – Some triple patterns are very selective, others are not
  – All triple patterns are equally selective
141. WatDiv Data-driven Features (2)
• Result cardinality CARD(Ā, G)
  – the number of solutions in the result of the evaluation of a graph pattern Ā = <A, F> over graph G
• Filtered triple pattern selectivity (f-TP selectivity) SEL^F_G(tp)
  – the ratio of distinct solution mappings of a triple pattern tp to the number of triples in graph G
• Measures
  1. Result cardinality
  2. Mean & standard deviation of the f-TP selectivities of triple patterns
    • Important for distinguishing queries whose triple patterns are almost equally selective from queries with varying f-TP selectivities
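In symbols (my notation, following the slide's wording; Ω^F_G(tp) stands for the distinct solution mappings of tp over G that satisfy the filter F):

```latex
% Result cardinality of a graph pattern \bar{A} = <A, F> over graph G:
\[
  \mathrm{CARD}(\bar{A}, G) \;=\; \bigl\lvert \{\, \mu \mid \mu
      \text{ is a solution mapping of } \bar{A} \text{ over } G \,\} \bigr\rvert
\]
% f-TP selectivity of a filtered triple pattern tp over G:
\[
  \mathrm{SEL}^{F}_{G}(tp) \;=\;
      \frac{\lvert \Omega^{F}_{G}(tp) \rvert}{\lvert G \rvert}
\]
```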
142. WatDiv Data-driven Features (3)
• Result cardinality & f-TP selectivity are not sufficient
  – Intermediate solution mappings may not make it to the final result (e.g., due to filters or more restrictive joins)
  – The overall selectivity of a graph pattern can be determined by a single very selective triple pattern
  – Run-time optimization techniques (e.g., sideways information passing) can prune intermediate results early
• 2 features are introduced to capture the above cases
  1. BGP-restricted f-TP selectivity
  2. Join-restricted f-TP selectivity
143. WatDiv Data-driven Features (4)
• BGP-restricted f-TP selectivity SEL^F_G(tp | Ā)
  – assesses how much a triple pattern contributes to the overall selectiveness of the query
  – the fraction of distinct solution mappings of a triple pattern that are compatible with some solution mapping in the query result
• Join-restricted f-TP selectivity SEL^F_G(tp | x)
  – assesses how much a filtered triple pattern contributes to the overall selectiveness of the joins it participates in
  – for x a join vertex and tp a triple pattern incident on x, the x-restricted f-TP selectivity of tp over graph G is the fraction of distinct solution mappings compatible with a solution mapping in the query result of the sub-query that contains all triple patterns incident to x
144. WatDiv Test Suite (1)
• Components: Data Generator and Query Generator
• Data Generator
  – Allows users to define their own dataset, controlling
    • the entities to include
    • the topology of the graph, allowing one to mimic the types of data distributions found on the Web
      – «well-structuredness» of entities
      – probability of entity associations
      – cardinality of property associations
  – Important: instances of the same entity do not all have the same set of attributes, breaking the «relational nature» of previous RDF benchmarks
145. WatDiv Test Suite (2)
• Query Template Generator
  – User-specified number of templates
  – User-specified template characteristics
    • Number of triple patterns
    • Types of joins and filters in the triple patterns
  – Traverses the WatDiv schema using a random walk and generates a set of query templates
• Query Generator
  – Instantiates the query templates with terms (IRIs, literals, etc.) from the RDF dataset
  – User-specified number of queries produced
146. WatDiv Test Suite (3)
• Query Template Generator (sketched below)
  – Random walk on an internal representation of the schema
    • Entity types in the schema correspond to graph vertices
    • Relationships (i.e., object-type properties) are graph edges
    • Vertices are annotated with datatype properties (i.e., attributes)
  – Produces a set of basic graph patterns with at most n triple patterns, with unbound subjects and objects
  – k uniformly randomly selected subjects/objects are replaced with placeholders
  – Placeholders are replaced with actual RDF terms randomly retrieved from the dataset
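A compact Python sketch of the random walk (the toy schema, walk policy and placeholder syntax are assumptions made for illustration; WatDiv's generator is considerably richer):

```python
import random

# Toy schema graph: entity type -> list of (object-type property, target type).
SCHEMA = {
    "User":    [("follows", "User"), ("likes", "Product")],
    "Product": [("hasReview", "Review")],
    "Review":  [("reviewer", "User")],
}

def generate_template(start_type, n, k, rng=random.Random(0)):
    # Random walk over the schema yields a BGP of at most n triple patterns
    # whose subjects and objects are all unbound variables.
    bgp, counter = [], iter(range(10_000))
    current_type, current_var = start_type, f"?v{next(counter)}"
    for _ in range(n):
        edges = SCHEMA.get(current_type)
        if not edges:
            break
        prop, target_type = rng.choice(edges)
        target_var = f"?v{next(counter)}"
        bgp.append([current_var, prop, target_var])
        current_type, current_var = target_type, target_var
    # Replace k randomly chosen subjects/objects with placeholders, to be
    # instantiated later by the Query Generator with terms from the dataset.
    positions = [(i, j) for i in range(len(bgp)) for j in (0, 2)]
    for i, j in rng.sample(positions, min(k, len(positions))):
        bgp[i][j] = "%placeholder%"
    return bgp

for tp in generate_template("User", n=3, k=1):
    print(tp)
```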
147. Comparison of WatDiv with Other RDF Benchmarks
• Query workload
  – Large range of queries
    • Mean join vertex degree distributed between 2 and 10
  – Join vertex types:
    • 18% of queries are star joins, vs. 4.4% in DBSB
    • 61.3% of queries are path queries, vs. 5.4% in DBSB
[Figure: feature distributions across benchmarks; copyright [AHO+14]]
148. Comparison of WatDiv with Other RDF Benchmarks
• Data-driven features
  – DBSB and BSBM cover the ends of the spectrum of mean join-restricted f-TP selectivity values
  – WatDiv covers the full spectrum of restricted f-TP selectivity values
  – WatDiv covers a lower range of values for mean f-TP selectivity when compared to DBSB
• General remarks
  – comparable to DBSB
  – more diverse than LUBM, SP2Bench and BSBM
[Figure: selectivity distributions across benchmarks; copyright [AHO+14]]
149. FEASIBLE [SNM15]
• Proposes a feature-based benchmark generation approach built from real queries
  – Structure-based features
  – Data-driven features
• The approach is similar to the WatDiv test suite
• Novel sampling approach for queries, based on exemplars and medoids
• Proposes SELECT, ASK, CONSTRUCT and DESCRIBE SPARQL queries
150. FEASIBLE Query Features
• Number of triple patterns
• Number of join vertices
  – Distinguishing between «star», «path», «hybrid» and «sink» vertices
• Join vertex degree
  – Sum of the incoming and outgoing edges of the vertex
• Triple pattern selectivity
  – Ratio of the triples that match the triple pattern over all triples in the dataset
[Figure: example patterns for star, path, hybrid and sink join vertices x]
151. FEASIBLE Benchmark Generation
• 3-step benchmark generation
• Dataset cleaning
  – Leads to practically reliable benchmarks
• Normalization of feature vectors
  – The query selection process requires distances between queries to be computed
  – Normalize the query representations so that all queries lie in a unit hypercube
• Query selection
  – Based on the idea of exemplars [NS11]
152. FEASIBLE Benchmark Generation
• Dataset cleaning
  – Remove erroneous and zero-result queries from the set of real queries used to generate the benchmark
  – Exclude all syntactically incorrect queries
  – Attach 9 SPARQL operators (UNION, DISTINCT, OPTIONAL, ...) and 7 query features (join vertices, join vertex count, etc.) to each of the queries
153. FEASIBLE Benchmark Generation
• Normalization of feature vectors (sketched below)
  – Queries are mapped to a vector of length 16 that stores the query features
    • For binary SPARQL clauses (e.g., UNION is either used or not used), store 1 if the clause is used and 0 otherwise
    • Every non-binary feature is normalized by dividing its value by the overall maximum value of that feature in the data set
    • Query representations thus take values between 0 and 1
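A short Python sketch of that normalization (the feature set and the two example queries are made up for illustration):

```python
# Binary SPARQL clauses are already 0/1; the rest are raw counts or
# selectivities that still need to be scaled into [0, 1].
BINARY = {"union", "distinct", "optional"}

queries = [
    {"union": 1, "distinct": 0, "optional": 1, "triple_patterns": 8,
     "join_vertices": 3, "mean_tp_selectivity": 0.02},
    {"union": 0, "distinct": 1, "optional": 0, "triple_patterns": 2,
     "join_vertices": 1, "mean_tp_selectivity": 0.40},
]

def normalize(queries):
    # Divide every non-binary feature by its maximum over the whole set,
    # so that each query vector lies inside the unit hypercube.
    features = queries[0].keys()
    maxima = {f: max(q[f] for q in queries) or 1 for f in features}
    return [{f: q[f] if f in BINARY else q[f] / maxima[f] for f in features}
            for q in queries]

for q in normalize(queries):
    print(q)
```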
