Challenges in the Design of a Graph Database Benchmark

  • Marcus Paradies: Challenges in the Design of a Graph Database Benchmark
    FOSDEM'12, Graph Processing DevRoom
    © Prof. Dr.-Ing. Wolfgang Lehner
  • Outline
      - Motivation
      - Challenges
      - Thoughts on Graph Data Generation
      - Thoughts on Query Workload
      - Summary and Outlook
      - Discussion
    Marcus Paradies | FOSDEM 2012 | 1
  • Motivation
      - Graph databases are gaining momentum
      - Enterprises are becoming interested
      - How can the available graph database vendors be compared?
      - Main issue: results from existing benchmarks are not comparable
          - Lack of standardization in the data model and query language
          - What are "typical" graph operations?
  • Challenges
  • Challenge #1: Application Domain
      - Graph data is not homogeneous: graph data from different domains follows different patterns
      - Examples:
          - Social Network Analysis (SNA)
          - Protein Interaction Analysis
          - Recommendation Systems
          - Supply Chain Management (Vehicle Routing, CRM)
          - Fraud Detection in Financial Systems
          - …
    Challenge: Find an application domain whose graph data pattern is common to many different scenarios.
  • Challenge #2: Graph Data Model
      - What flavours of graph data models are commonly used?
  • Challenge #2: Graph Data Model
      - Directed Graph
      - Undirected Graph
      - Mixed Graph
      - Multi Graph
      - (Plain) Property Graph
      - (Structured) Property Graph
      - Hyper Graph
    Challenge: Find a graph data model suited to the majority of use cases from various domains.
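The slide lists the model flavours without showing how they are represented. As a minimal sketch, assuming an in-memory Python representation (the classes `Node`, `Edge` and `PropertyGraph` are hypothetical and not taken from any vendor API), a (plain) property graph can be modelled as a directed multigraph whose nodes and edges carry labels and key/value properties. An undirected edge would be stored as two directed edges; hyperedges are not covered by this sketch.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    node_id: int
    label: str
    props: dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    edge_id: int
    src: int
    dst: int
    label: str
    props: dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical directed multigraph with properties on nodes and edges."""

    def __init__(self) -> None:
        self.nodes: dict[int, Node] = {}
        self.edges: dict[int, Edge] = {}
        self._out: dict[int, list[int]] = {}  # node id -> outgoing edge ids
        self._next_edge_id = 0

    def add_node(self, node_id: int, label: str, **props: Any) -> Node:
        node = Node(node_id, label, dict(props))
        self.nodes[node_id] = node
        self._out.setdefault(node_id, [])
        return node

    def add_edge(self, src: int, dst: int, label: str, **props: Any) -> Edge:
        # Parallel edges are allowed (multigraph): each one gets its own id.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must exist")
        edge = Edge(self._next_edge_id, src, dst, label, dict(props))
        self.edges[edge.edge_id] = edge
        self._out[src].append(edge.edge_id)
        self._next_edge_id += 1
        return edge

    def out_edges(self, node_id: int) -> list[Edge]:
        return [self.edges[e] for e in self._out.get(node_id, [])]


# Usage: two people connected by two parallel, differently labelled edges.
g = PropertyGraph()
g.add_node(1, "Person", name="Alice")
g.add_node(2, "Person", name="Bob")
g.add_edge(1, 2, "knows", since=2010)
g.add_edge(1, 2, "worksWith")
```

Keeping properties in plain dicts corresponds to the "plain" property graph; a "structured" variant would additionally enforce a schema per label.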
  • > Challenge #3: Querying Graph Data  Large variety in graph processing and manipulation languages  Each graph database vendor implements own query languages/APIs  Reason: No standardized graph query language available Marcus Paradies | FOSDEM 2012 | 14
  • Challenge #3: Querying Graph Data
      - Large variety of graph processing and manipulation languages
      - Each graph database vendor implements its own query language/API
      - Reason: no standardized graph query language is available
    Challenge: Find a way to abstract from the zoo of available query languages.
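One way to abstract from the query-language zoo, sketched here under the assumption that the benchmark driver is written in Python, is to specify the workload only against a small vendor-neutral interface and let each system supply an adapter that translates those calls into its own query language or API. `GraphBenchmarkAdapter`, `InMemoryAdapter` and `run_two_hop_workload` are hypothetical names chosen for illustration.

```python
from typing import Iterable, Protocol


class GraphBenchmarkAdapter(Protocol):
    """Hypothetical vendor-neutral operation set. A real adapter would map
    each call onto the vendor's own query language or API."""

    def add_node(self, node_id: int, props: dict) -> None: ...
    def add_edge(self, src: int, dst: int, props: dict) -> None: ...
    def neighbors(self, node_id: int) -> Iterable[int]: ...


class InMemoryAdapter:
    """Hypothetical reference adapter backed by Python dicts."""

    def __init__(self) -> None:
        self.node_props: dict[int, dict] = {}
        self.edge_props: dict[tuple[int, int], dict] = {}
        self.adj: dict[int, list[int]] = {}

    def add_node(self, node_id: int, props: dict) -> None:
        self.node_props[node_id] = dict(props)
        self.adj.setdefault(node_id, [])

    def add_edge(self, src: int, dst: int, props: dict) -> None:
        self.adj[src].append(dst)
        self.edge_props[(src, dst)] = dict(props)

    def neighbors(self, node_id: int) -> Iterable[int]:
        return list(self.adj.get(node_id, []))


def run_two_hop_workload(db: GraphBenchmarkAdapter, start: int) -> set[int]:
    """A workload written only against the abstract interface:
    collect all nodes reachable in exactly two hops from `start`."""
    result: set[int] = set()
    for n in db.neighbors(start):
        result.update(db.neighbors(n))
    return result
```

Because the workload never touches vendor syntax, the same definition could be replayed against every adapter; the cost is that any operation not expressible through the interface falls outside the benchmark.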
  • Challenge #4: Defining the Workload
      - The workload to be defined depends on the underlying query/manipulation language
      - Should complex (algorithmic) operations be part of a database benchmark?
          - Which algorithms to pick?
              - Social Network Analysis → find communities
              - Supply Chain Management → find maximum flow
              - Web of Data → find pattern matches
      - How are concurrent users represented?
      - What about transactionality?
  • Thoughts on Graph Data Generation
  • Graph Data Generation: Patterns
      - Understanding graph patterns (characteristics) is crucial for a good graph data generator
          - What are the distinguishing characteristics of graphs?
          - How can we identify graph patterns in large graphs?
      - Three main patterns [1]:
          - Power-law distributions
          - Small diameters
          - Community effects
  • Pattern 1: Power-Law Distribution
      - [Figures; source: [2]]
      - Most real-world graph data sets follow a power-law degree distribution
      - Examples:
          - Internet router graph
          - Subsets of the WWW
          - Citation graphs
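The slides do not name a generator for power-law graphs. One well-known option, swapped in here purely as an illustration, is preferential attachment (the Barabási-Albert model): each new node links to `m` existing nodes chosen with probability proportional to their current degree, which produces a heavy-tailed degree distribution. The function names and parameters (`n`, `m`, `seed`) are assumptions of this sketch.

```python
import random
from collections import Counter


def preferential_attachment(n: int, m: int, seed: int = 0) -> list[tuple[int, int]]:
    """Barabasi-Albert-style generator: n nodes, each new node attaches to m
    distinct existing nodes chosen proportionally to their degree."""
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    edges: list[tuple[int, int]] = []
    # Every node appears in `targets` once per incident edge, so a uniform
    # choice from this list is a degree-proportional choice of node.
    targets: list[int] = []
    # Seed graph: a clique on m + 1 nodes, so every node starts with degree m.
    for u in range(m + 1):
        for v in range(u):
            edges.append((u, v))
            targets.extend((u, v))
    for new in range(m + 1, n):
        chosen: set[int] = set()
        while len(chosen) < m:
            chosen.add(rng.choice(targets))
        for t in chosen:
            edges.append((new, t))
            targets.extend((new, t))
    return edges


def degree_histogram(edges: list[tuple[int, int]]) -> Counter:
    """Map each degree value to the number of nodes having that degree."""
    degree: Counter = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())
```

Plotting `degree_histogram(preferential_attachment(10_000, 3))` on log-log axes should show the roughly straight line typical of a power law. A benchmark generator would additionally need properties and community structure, which this sketch ignores.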
  • Pattern 2: Small Diameters
      - Effective diameter (eccentricity): the minimum number of hops within which a given fraction (e.g., 90%) of all connected pairs of nodes can reach each other
      - Other measures exist as well, but are not applicable to disconnected graphs (e.g., the classic diameter, the longest shortest path, is infinite for a disconnected graph)
      - In most use cases, the diameter is much smaller than the size of the graph
      - Examples:
          - 97% effective diameter (eccentricity) of around 16 for path lengths in the WWW
          - Average path length of around 6 for the Epinions social network
    Source: [1]
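The effective diameter defined above can be computed exactly on small graphs by running a breadth-first search from every node and taking the requested quantile of all finite shortest-path lengths. Pairs that cannot reach each other are simply left out, which is why this measure still works on disconnected graphs. This is a sketch assuming an adjacency-dict representation (`adj` is a hypothetical name; undirected edges are listed in both directions). All-pairs BFS costs O(n·(n+m)), so large graphs would need sampling or approximation instead.

```python
import math
from collections import deque


def bfs_distances(adj: dict[int, list[int]], source: int) -> dict[int, int]:
    """Hop distance from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def effective_diameter(adj: dict[int, list[int]], fraction: float = 0.9) -> int:
    """Smallest h such that at least `fraction` of all connected (ordered)
    node pairs are within h hops of each other."""
    lengths = []
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:  # only distinct pairs that are actually connected
                lengths.append(d)
    if not lengths:
        return 0
    lengths.sort()
    # The k-th smallest length (0-based) covers ceil(fraction * N) pairs.
    k = math.ceil(fraction * len(lengths)) - 1
    return lengths[max(k, 0)]


# Usage: an undirected path 0-1-2-3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
```

On this path, 10 of the 12 ordered pairs are within 2 hops, so the 90% effective diameter is 3, while the 50% value is 1. Some definitions interpolate between hop counts; this sketch reports whole hops, matching the "minimum number of hops" wording above.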
  • Pattern 3: Community Effects
      - Community: a set of nodes in which each node is closer to all other nodes in the community than to nodes outside the community
      - Communities can be found in many real-world graphs, especially social networks and collaboration networks
      - Clustering coefficient C: a measure that quantifies the "clumpiness" of a graph
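The slide does not give a formula for C. One common definition, assumed here, is the average of the local clustering coefficients C_v = 2 * e_v / (k_v * (k_v - 1)), where k_v is the degree of node v and e_v is the number of edges among its neighbors. Nodes with fewer than two neighbors are assigned C_v = 0, which is a convention of this sketch. The graph is an undirected adjacency dict of sets, and the function names are hypothetical.

```python
from itertools import combinations


def local_clustering(adj: dict[int, set[int]], v: int) -> float:
    """Fraction of pairs of v's neighbors that are themselves connected."""
    neigh = adj[v]
    k = len(neigh)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neigh, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj: dict[int, set[int]]) -> float:
    """Mean local clustering coefficient over all nodes."""
    if not adj:
        return 0.0
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


# Usage: a triangle 0-1-2 with an extra pendant node 3 attached to 0.
graph = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
```

For this graph node 0 has one linked neighbor pair out of three (C = 1/3), nodes 1 and 2 have C = 1, and node 3 has C = 0, giving an average of 7/12. A graph generator with community effects should produce values well above those of a random graph of the same density.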
  • Thoughts on Query Workload
  • Query Workload: Operations
      - Graph manipulation operations
          - Add/update/remove nodes
          - Add/update/remove edges
          - Add/update/remove edge attributes
          - Add/update/remove node attributes
      - Graph query operations
          - Retrieve a selection of nodes matching a given filter expression
          - Get the neighbors of a set of nodes (possibly with edge filter constraints)
      - Graph traversals
          - Based on basic query operations
          - Exploration of the neighborhood of a given set of start nodes
          - Terminated by the number of steps and/or edge/node filter constraints
      - Graph analytical operations
          - Aggregation operations such as sum, avg, min, max
          - Aggregations at node level and at edge level
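The query operations above can be sketched over a simple in-memory adjacency list, assuming nodes and edges carry property dicts. The function names (`select_nodes`, `neighbors`, `traverse`, `aggregate_node_property`) are hypothetical. The traversal is a breadth-first exploration from a set of start nodes that stops after `max_steps` hops and follows only edges accepted by `edge_filter`; only node-level aggregation is shown.

```python
from collections import deque
from typing import Any, Callable

Props = dict[str, Any]
# Hypothetical layout: node id -> list of (neighbor id, edge properties).
Adjacency = dict[int, list[tuple[int, Props]]]


def select_nodes(node_props: dict[int, Props], pred: Callable[[Props], bool]) -> set[int]:
    """Retrieve the ids of all nodes whose properties satisfy a filter."""
    return {n for n, p in node_props.items() if pred(p)}


def neighbors(adj: Adjacency, nodes: set[int],
              edge_filter: Callable[[Props], bool] = lambda p: True) -> set[int]:
    """Neighbors of a set of nodes, optionally restricted by an edge filter."""
    return {dst for n in nodes for dst, p in adj.get(n, []) if edge_filter(p)}


def traverse(adj: Adjacency, start: set[int], max_steps: int,
             edge_filter: Callable[[Props], bool] = lambda p: True) -> dict[int, int]:
    """BFS from the start nodes, terminated after max_steps hops.
    Returns node -> hop count at which it was first reached."""
    depth = {n: 0 for n in start}
    frontier = deque(start)
    while frontier:
        u = frontier.popleft()
        if depth[u] == max_steps:
            continue  # step limit reached: do not expand further
        for v, p in adj.get(u, []):
            if v not in depth and edge_filter(p):
                depth[v] = depth[u] + 1
                frontier.append(v)
    return depth


def aggregate_node_property(node_props: dict[int, Props], nodes: set[int],
                            key: str) -> dict[str, float]:
    """sum/avg/min/max of one node property over a set of nodes."""
    vals = [node_props[n][key] for n in nodes if key in node_props[n]]
    if not vals:
        return {}
    return {"sum": sum(vals), "avg": sum(vals) / len(vals),
            "min": min(vals), "max": max(vals)}


# Usage data: a tiny social graph with typed edges.
node_props = {1: {"age": 30}, 2: {"age": 40}, 3: {"age": 50}, 4: {"age": 20}}
adj: Adjacency = {1: [(2, {"type": "knows"}), (4, {"type": "blocks"})],
                  2: [(3, {"type": "knows"})], 3: [], 4: []}
```

For example, `traverse(adj, {1}, 2, lambda p: p["type"] == "knows")` follows only "knows" edges and reaches nodes 2 and 3, but not node 4.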
  • Query Workload: Measures
      - Closely related to benchmark capabilities
      - Measures from relational benchmarks apply, such as
          - Average query response time
          - Transactions per second (throughput)
      - Additional measures for graph traversals
          - Traversals per second
      - What about distributed scenarios?
      - What about concurrent users?
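The measures above can be collected by a small driver, sketched here assuming each benchmark operation is wrapped in a zero-argument Python callable (`run_benchmark` and its parameters are hypothetical). Several client threads stand in for concurrent users; the driver reports mean response time and throughput, and passing a traversal as `operation` turns throughput into traversals per second.

```python
import statistics
import threading
import time
from typing import Callable


def run_benchmark(operation: Callable[[], None], clients: int,
                  ops_per_client: int) -> dict[str, float]:
    """Run `operation` from `clients` concurrent threads and report the
    mean response time (seconds) and throughput (operations per second)."""
    if clients < 1 or ops_per_client < 1:
        raise ValueError("need at least one client and one operation")
    latencies: list[float] = []
    lock = threading.Lock()

    def client() -> None:
        local: list[float] = []
        for _ in range(ops_per_client):
            t0 = time.perf_counter()
            operation()
            local.append(time.perf_counter() - t0)
        with lock:  # merge per-thread measurements safely
            latencies.extend(local)

    threads = [threading.Thread(target=client) for _ in range(clients)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return {
        "operations": float(len(latencies)),
        "mean_response_time_s": statistics.fmean(latencies),
        "throughput_ops_per_s": len(latencies) / elapsed,
    }
```

Python threads only overlap while waiting on I/O, which is the usual situation for a client talking to a database server; a CPU-bound in-process workload would not scale across threads, and a distributed setup would need clients on several machines.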
  • Summary and Outlook
      - The graph data distribution is highly important for a graph database benchmark
      - Application domains have very specific graph characteristics
      - A graph database benchmark has to provide abstract, high-level descriptions of graph operations
      - Feel free to contact me if you want to contribute: marcus.paradies@gmail.com
  • Discussion
  • Theses
      - A benchmark based on social network data is nice, but might not be that representative of large enterprise applications
      - Algorithms should NOT be part of a graph database benchmark
          - Only support basic operations such as simple lookups and path traversals
      - The underlying graph data model should be a simple property graph
      - A graph database has to scale in terms of data size as well as number of concurrent users
      - ....
  • References
      [1] Graph Mining: Laws, Generators, and Algorithms (2006)
      [2] http://konect.uni-koblenz.de/
      [3] A Discussion on the Design of Graph Database Benchmarks (2010)