Graphalytics: A big data benchmark for graph processing platforms
Mihai Capotă
Problem
• Lack of benchmarks for generic graph processing platforms
• Graph500
  • BFS
  • Kronecker graph
• Several academic studies
  • Specific to graph or RDF databases
  • Ad-hoc setup, difficult to extend
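For context, the Graph500-style workload mentioned above (BFS on a Kronecker graph) can be sketched in a few lines. This is a simplified illustration, not the Graph500 reference code: the function names are hypothetical, the quadrant probabilities and edge factor follow the commonly cited Graph500 defaults, and the vertex relabeling step of the real generator is omitted.

```python
import random
from collections import deque


def kronecker_edges(scale, edge_factor=16, a=0.57, b=0.19, c=0.19, seed=1):
    """Sample edges of an R-MAT (stochastic Kronecker) graph with 2**scale vertices."""
    rng = random.Random(seed)
    edges = []
    for _ in range(edge_factor * 2 ** scale):
        src = dst = 0
        for _ in range(scale):
            r = rng.random()
            # Pick one quadrant of the adjacency matrix per bit:
            # [0, a) -> (0, 0), [a, a+b) -> (0, 1), [a+b, a+b+c) -> (1, 0), rest -> (1, 1).
            src_bit = 1 if r >= a + b else 0
            dst_bit = 1 if (a <= r < a + b) or r >= a + b + c else 0
            src = (src << 1) | src_bit
            dst = (dst << 1) | dst_bit
        edges.append((src, dst))
    return edges


def bfs_levels(edges, source):
    """Return the BFS depth of every vertex reachable from source (undirected view)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    level = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
    return level
```

A single algorithm on a single graph family exercises only a narrow slice of platform behavior, which is the gap this slide points at.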
Graphalytics
• Advanced benchmark harness
• Choke-point analysis
• Enhanced LDBC Datagen
• Sponsored by Oracle
Advanced benchmark harness
[Architecture diagram. Components: Benchmark Core, Report Generator, Platform-specific algorithm implementation, Output Validator, System Monitor, Dataset Generator, Datasets, User, Results, Graph processing platform, Configuration]
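The components named on this slide suggest a flow of roughly "load dataset, run platform-specific algorithm, validate output, report". A minimal sketch of that flow follows; every class, method, and field name here is a hypothetical stand-in chosen for illustration, not the actual Graphalytics API.

```python
import time


class ToyPlatform:
    """Hypothetical stand-in for a platform-specific algorithm implementation."""

    name = "toy"

    def load(self, edges):
        # Build an undirected adjacency structure from an edge list.
        self.adj = {}
        for u, v in edges:
            self.adj.setdefault(u, set()).add(v)
            self.adj.setdefault(v, set()).add(u)

    def run(self, algorithm):
        if algorithm != "degree":
            raise ValueError("unsupported algorithm: " + algorithm)
        return {v: len(nbrs) for v, nbrs in self.adj.items()}


def run_benchmark(platform, dataset, algorithm, reference):
    """Core loop: load, time the run, validate against a reference, build a report row."""
    platform.load(dataset)
    start = time.perf_counter()
    output = platform.run(algorithm)
    elapsed = time.perf_counter() - start
    return {
        "platform": platform.name,
        "algorithm": algorithm,
        "seconds": elapsed,
        "valid": output == reference,
    }
```

Keeping the platform behind a small interface is what would let the core, validator, and report code stay unchanged when a new platform is added.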
Monitoring & Logging
Choke-point analysis
• Choke points are crucial technological challenges that platforms are struggling with
• Select benchmark workloads based on real-world scenarios, but make sure they cover the identified choke points
• Examples:
  • Network traffic
  • Access locality
  • Skewed execution
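To make the "skewed execution" choke point concrete, the sketch below estimates load imbalance when vertices are assigned to workers round-robin and each vertex costs work proportional to its degree. Both the cost model and the function name are assumptions made for illustration; real platforms partition and schedule differently.

```python
def load_imbalance(degrees, workers):
    """Max per-worker work divided by mean per-worker work; 1.0 means perfectly balanced."""
    work = [0] * workers
    for vertex, degree in enumerate(degrees):
        # Assumed cost model: processing a vertex costs one unit per incident edge.
        work[vertex % workers] += degree
    mean = sum(work) / workers
    return max(work) / mean if mean else 1.0
```

With uniform degrees such as `[1, 1, 1, 1]` on two workers the ratio is 1.0, while a single hub as in `[100, 1, 1, 1]` pushes it to about 1.96, meaning one worker does nearly all the work.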
Enhanced LDBC Datagen
• Multiple node degree distributions
  • Previously Facebook only
  • Zeta and Geometric now added
• Different structural characteristics
  • Average clustering coefficient
  • Assortativity
• Improved graph generation
  • Generate only friendship graph
  • MapReduce optimizations
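As a rough illustration of the two added degree distributions and of one of the structural characteristics listed above, the sketch below samples degrees from a geometric and a truncated zeta distribution and computes the average local clustering coefficient of a small graph. It is not Datagen code: the function names, the truncation at `max_degree`, and the parameter values in the usage note are assumptions.

```python
import math
import random


def geometric_degrees(n, p, rng):
    """Sample n degrees with P(k) = (1 - p)**(k - 1) * p for k >= 1, with 0 < p < 1."""
    # Inverse-transform sampling; 1 - rng.random() lies in (0, 1], so log is defined.
    return [1 + int(math.log(1.0 - rng.random()) / math.log(1.0 - p)) for _ in range(n)]


def zeta_degrees(n, alpha, max_degree, rng):
    """Sample n degrees with P(k) proportional to k**-alpha, truncated at max_degree."""
    support = list(range(1, max_degree + 1))
    weights = [k ** -alpha for k in support]
    return rng.choices(support, weights=weights, k=n)


def average_clustering(adj):
    """Average local clustering coefficient of an undirected graph (dict of neighbor sets)."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # vertices with fewer than two neighbors contribute 0
        # Count edges among the neighbors of this vertex.
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj) if adj else 0.0
```

For example, `zeta_degrees(1000, 2.0, 50, random.Random(0))` yields mostly small degrees with a few large ones, while `geometric_degrees(1000, 0.3, random.Random(0))` decays much faster; a triangle has average clustering 1.0 and a three-vertex path has 0.0.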
Results
Discussion
• How much preprocessing should we allow in the ETL phase? How to choose a metric that captures the preprocessing?
• How should we assess the correctness of algorithms that produce approximate results?
• How to set up the platforms? Should we allow algorithm-specific platform setups or should we require only one setup to be used for all algorithms?
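On the question of approximate results, one possible starting point (offered as a discussion aid, not as the benchmark's chosen method) is to compare per-vertex values against a reference within a tolerance. The function name and the default tolerances below are assumptions.

```python
import math


def validate_approximate(output, reference, rel_tol=1e-4, abs_tol=1e-9):
    """Accept output if it covers the same vertices and every value is within tolerance."""
    if output.keys() != reference.keys():
        return False
    return all(
        math.isclose(output[v], reference[v], rel_tol=rel_tol, abs_tol=abs_tol)
        for v in reference
    )
```

A per-value tolerance suits numeric outputs such as rank scores; it does not address algorithms whose valid outputs differ structurally, for example community assignments that are equal up to relabeling.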
http://graphalytics.ewi.tudelft.nl
