Presentation on "Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing"

Mizan: A System for Dynamic Load
Balancing in Large-scale Graph Processing

Zuhair Khayyat1 Karim Awara1 Amani Alonazi1
Hani Jamjoom2 Dan Williams2 Panos Kalnis1
1
King Abdullah University of Science and Technology
Thuwal, Saudi Arabia
2
IBM Watson Research Center
Yorktown Heights, NY

1/34

Importance of Graphs

Graphs abstract application-speciﬁc algorithms into generic
problems represented as interactions using vertices and edges

Max ﬂow in road network Diameter of WWW

Ranking in social networks Simulating protein interactions

Algorithms vary in their computational requirements

2/34

How to Scale Graph Processing

Pregel was introduced by Google as a scalable abstraction for
graph processing
Overcomes the limitations of processing graphs on MapReduce
Based on vertex-centric computation
Utilizes bulk synchronous parallel (BSP)

System Programming Data Computations
Abstraction Exchange
map() Key based
MapReduce combine() grouping Disk based
reduce()
compute() Message
Pregel combine() passing In memory
aggregate()

3/34

Pregel’s BSP Graph Processing

Superstep 1 Superstep 2 Superstep 3

Worker 1 Worker 1 Worker 1



BSP Barrier

Balanced computation and communication is fundamental to
Pregel’s eﬃciency

4/34

Optimizing Graph Processing

Existing work focus on optimizing for graph structure (static
optimization):
Optimize graph partitioning:
Simple graph partitioning schemes (e.g., hash or range): Giraph
User-deﬁned partitioning function: Pregel
Sophisticated partitioning techniques (e.g., min-cuts): GraphLab,
PowerGraph, GoldenOrb and Surfer
Optimize graph data access:
Distributed data stores and graph indexing: GoldenOrb, Hama
and Trinity

5/34

Optimizing Graph Processing

Existing work focus on optimizing for graph structure (static
optimization):
Optimize graph partitioning:
Simple graph partitioning schemes (e.g., hash or range): Giraph
User-deﬁned partitioning function: Pregel
Sophisticated partitioning techniques (e.g., min-cuts): GraphLab,
PowerGraph, GoldenOrb and Surfer
Optimize graph data access:
Distributed data stores and graph indexing: GoldenOrb, Hama
and Trinity
What about algorithm behavior?
Pregel provides coarse-grained load balancing, is it enough?

5/34

Types of Algorithms

Algorithms behaves diﬀerently, we classify algorithms in two
categories depending on their behavior

Algorithm Type In/Out Vertices Examples
Messages State
PageRank,
Stationary Predictable Fixed Diameter estimation,
WCC
Distributed minimum
spanning tree,
Non-stationary Variable Variable Belief propagation,
Graph queries,
Ad propagation

6/34

Types of Algorithms – First Superstep

V2 V8 V2 V8

V4 V1 V6 V4 V1 V6

V7 V3 V5 V7 V3 V5

PageRank DMST

7/34

Types of Algorithms – Superstep k

V2 V8 V2 V8

V4 V1 V6 V4 V1 V6

V7 V3 V5 V7 V3 V5

PageRank DMST

8/34

Types of Algorithms – Superstep k + m

V2 V8 V2 V8

V4 V1 V6 V4 V1 V6

V7 V3 V5 V7 V3 V5

PageRank DMST

9/34

What Causes Computation Imbalance in
Non-stationary Algorithms?





BSP Barrier
-Vertex response time -Time to receive in messages
-Time to send out messages

10/34

Why to Optimize for Algorithm Behavior?

Diﬀerence between stationary and non-stationary algorithms

1000
In Messages (Millions)

100
10
1
0.1
0.01
0.001
0 10 20 30 40 50 60
SuperSteps

PageRank - Total DMST - Total
PageRank - Max/W DMST - Max/W

11/34

What is Mizan?

BSP-based graph processing framework

Uses runtime ﬁne-grained vertex migrations to balance
computation and communication

Follows the Pregel programming model

Open source, written in C++

12/34

PageRank’s compute() in Mizan

void compute(messageIterator * messages, userVertexObject * data,
messageManager * comm) {

double currVal = data->getVertexValue().getValue();
double newVal = 0; double c = 0.85;

if (data->getCurrentSS() > 1) {
while (messages->hasNext()) {
newVal = newVal + messages->getNext().getValue();
}
newVal = newVal * c + (1.0 - c) / ((double) vertexTotal);
data->setVertexValue(mDouble(newVal));
} else {newVal = currVal;}

if (data->getCurrentSS() <= maxSuperStep) {
mDouble outVal(newVal / ((double) data->getOutEdgeCount()));
for (int i = 0; i < data->getOutEdgeCount(); i++) {
comm->sendMessage(data->getOutEdgeID(i), outVal);
}
} else {data->voteToHalt();}
}

13/34

Migration Planning Objectives

Decentralized
Simple
Transparent
No need to change Pregel’s API
Does not assume any a priori knowledge to graph structure or
algorithm

14/34

Mizan’s Migration Barrier

Mizan performs both planning and migrations after all workers
reach the BSP barrier





Migration Planner
Migration Barrier
BSP Barrier

15/34

Mizan’s Migration Planning Steps
1 Identify the source of imbalance: By comparing the worker’s
execution time against a normal distribution and ﬂagging outliers

16/34

1 Identify the source of imbalance: By comparing the worker’s
execution time against a normal distribution and ﬂagging outliers

Mizan monitors for each vertex:
Remote outgoing messages
All incoming messages
Response time
High level summaries are broadcast to each worker
Vertex
Response Time
Remote Outgoing Messages

V6 V2 V1
V4 V3
V5

Mizan Mizan

Worker 1 Worker 2
Local Incoming Messages Remote Incoming Messages

16/34


1 Identify the source of imbalance
2 Select the migration objective:
Mizan ﬁnds the strongest cause of workload imbalance by
comparing statistics for outgoing messages and incoming
messages of all workers with the worker’s execution time

17/34


2 Select the migration objective:
Mizan ﬁnds the strongest cause of workload imbalance by
comparing statistics for outgoing messages and incoming
messages of all workers with the worker’s execution time
The migration objective is either:
Optimize for outgoing messages, or
Optimize for incoming messages, or
Optimize for response time

17/34


2 Select the migration objective
3 Pair over-utilized workers with under-utilized ones:
All workers create and execute the migration plan in parallel
without centralized coordination
Each worker is paired with one other worker at most

W7 W2 W1 W5 W8 W4 W0 W6 W9 W3

0 1 2 3 4 5 6 7 8

18/34


3 Pair over-utilized workers with under-utilized ones
4 Select vertices to migrate: Depending on the migration
objective

19/34


3 Pair over-utilized workers with under-utilized ones
4 Select vertices to migrate
5 Migrate vertices:
How to migrate vertices freely across workers while maintaining
vertex ownership and fast updates?
How to minimize migration costs for large vertices?

20/34

Vertex Ownership

Mizan uses distributed hash table (DHT) to implement a
distributed lookup service:
V can execute at any worker
V ’s home worker ID = (hash(ID) mod N )
Workers ask the home worker of V for its current location
The home worker is notiﬁed with the new location as V migrates

Where is V5 & V8

V1 V2 V8 V4 V6 V7 V8
Compute
V3 V5 V2 V5 V9 V1 V3 V9

DHT V1:W1 V4:W2 V7:W3 V2:W1 V5:W2 V8:W3 V3:W1 V6:W2 V9:W3


21/34

Migrating Vertices with Large Message Size
Introduce delayed migration for very large vertices:
At SS t: only ownership of vertex v is moved to workernew

Superstep t Superstep t+1 Superstep t+2
Worker_old Worker_old Worker_old
1 3 1 3 1 3

2 7 2 7 2

4 4 '7 4 7

5 6 5 6 5 6

Worker_new Worker_new Worker_new

22/34

At SS t + 1:
workernew receives vertex 7 messages
workerold process vertex 7

Superstep t Superstep t+1
1 3 1 3 1 3

2 7 2 7 2

4 4 '7 4 7

5 6 5 6 5 6


22/34

At SS t + 1:
workernew receives vertex 7 messages
workerold process vertex 7
After SS t + 1: workerold moves state of vertex 7 to workernew

Superstep t Superstep t+1 Superstep t+2
1 3 1 3 1 3

2 7 2 7 2

4 4 '7 4 7

5 6 5 6 5 6


22/34

Experimental Setup

Mizan is implemented on C++ with MPI with three variations:
Static Mizan: Emulates Giraph, disables dynamic migration
Work stealing (WS): Emulates Pregel’s coarse-grained
dynamic load balancing
Mizan: Activates dynamic migration

Local cluster of 21 machines:
Mix of i5 and i7 processors with 16GB RAM Each

IBM Blue Gene/P supercomputer with 1024 compute nodes:
Each is a 4 core PowerPC450 CPU at 850MHz with 4GB RAM

23/34

Datasets

Experiments on public datasets:
Stanford Network Analysis Project (SNAP)
The Laboratory for Web Algorithmics (LAW)
Kronecker generator

name |nodes| |edges|

kg1 (synthetic) 1,048,576 5,360,368
kg4m68m (synthetic) 4,194,304 68,671,566
web-Google 875,713 5,105,039
LiveJournal1 4,847,571 68,993,773
hollywood-2011 2,180,759 228,985,632
arabic-2005 22,744,080 639,999,458

24/34

Giraph vs. Static Mizan

40
Giraph 12 Giraph
35 Static Mizan Static Mizan
30 10

Run Time (Min)
Run Time (Min)

25 8
20 6
15
4
10
2
5
0
0
web k Live k 1 2 4 8 16
- Goo g1 Jour g4m68
nal1
gle m
Vertex count (Millions)

Figure: PageRank on social network Figure: PageRank on regular random
and random graphs graphs, each has around 17M edge

25/34

Eﬀectiveness of Dynamic Vertex Migration

PageRank on a social graph (LiveJournal1)
The shaded columns: algorithm runtime
The unshaded columns: initial partitioning cost

30

25
Run Time (Min)

20

15

10

5

0
Mizan

Mizan

Mizan
Static

Static

Static
WS

WS

WS
Hash Range Metis

26/34

Eﬀectiveness of Dynamic Vertex Migration

Comparing the performance on highly variable messaging pattern
algorithms, which are DMST and ad propagation simulation, on
a metis partitioned social graph (LiveJournal1)

300
Static
250 Work Stealing
Mizan
Run Time (Min)

200

150

100

50

0
Adve DMS
r tism T
ent

27/34

Mizan’s Overhead with Scaling

8 10
Mizan - hollywood-2011 Mizan - Arabic-2005
7
8
6

Speedup
5
Speedup

6
4
4
3
2 2
1
0 0

64
128

256

512

1024
0 2 4 6 8 10 12 14 16
Compute Nodes
Compute Nodes

Figure: Speedup on Linux Cluster
Figure: Speedup on IBM
Blue Gene/P supercomputer

28/34

Future Work

Work with skewed graphs
Improving Pregel’s fault tolerance

29/34

Conclusion

Mizan is a Pregel system that uses ﬁne-grained vertex migration
to load balance computation and communication across
supersteps
Mizan is an open source project developed within InfoCloud
group in KAUST in collaboration with IBM, programmed in
C++ with MPI
Mizan scales up to thousands of workers
Mizan improves the overall computation cost between 40% up to
two order of magnitudes with less than 10% migration overhead
Download it at: http://cloud.kaust.edu.sa
Try it on EC2 (ami-52ed743b)

30/34

Extra: Migration Details and Costs

4
Superstep Runtime
3.5 Average Workers Runtime
Run Time (Minutes)

3 Migration Cost
Balance Outgoing Messages
2.5 Balance Incoming Messages
2 Balance Response Time
1.5
1
0.5
0
5 10 15 20 25 30
Supersteps

31/34

Extra: Migrated Vertices per Superstep

4
Migrated Vertices (x1000)

500 Migration Cost

Migration Cost (Minutes)
Total Migrated Vertices
400 Max Vertices Migrated 3
by Single Worker
300
2
200
1
100

0 0
5 10 15 20 25 30
Supersteps

32/34

Extra: Mizan’s Migration Planning

Compute()

BSP
Barrier
Pair with
Underutilized
Worker

Migration No Superstep
Barrier Imbalanced?

Select
Yes Vertices
and Migrate
Identify Source
of Imbalance

No Worker Yes
Overutilized?

33/34

Extra: Architecture of Mizan

Vertex Compute()

Mizan Worker

BSP Processor
Migration
Communicator - DHT
Planner
Storage Manager

IO

HDFS/Local Disks

34/34

Presentation on "Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing"

Recommended

Recommended

More Related Content

What's hot

What's hot (17)

Similar to Presentation on "Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing"

Similar to Presentation on "Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing" (20)

More from Zuhair khayyat

More from Zuhair khayyat (9)

Recently uploaded

Recently uploaded (20)

Presentation on "Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing"