• Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
Presentation on "Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing"
 

Presentation on "Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing"

on

  • 879 views

This presentation is for the published paper "Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing" which was presented in EuroSys 2013

This presentation is for the published paper "Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing" which was presented in EuroSys 2013

Statistics

Views

Total Views
879
Views on SlideShare
874
Embed Views
5

Actions

Likes
1
Downloads
22
Comments
0

1 Embed 5

https://twitter.com 5

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

CC Attribution-NonCommercial-ShareAlike LicenseCC Attribution-NonCommercial-ShareAlike LicenseCC Attribution-NonCommercial-ShareAlike License

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    Presentation on "Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing" Presentation on "Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing" Presentation Transcript

    • Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing Zuhair Khayyat1 Karim Awara1 Amani Alonazi1 Hani Jamjoom2 Dan Williams2 Panos Kalnis1 1 King Abdullah University of Science and Technology Thuwal, Saudi Arabia 2 IBM Watson Research Center Yorktown Heights, NY1/34
    • Importance of Graphs Graphs abstract application-specific algorithms into generic problems represented as interactions using vertices and edges Max flow in road network Diameter of WWW Ranking in social networks Simulating protein interactions Algorithms vary in their computational requirements2/34
    • How to Scale Graph Processing Pregel was introduced by Google as a scalable abstraction for graph processing Overcomes the limitations of processing graphs on MapReduce Based on vertex-centric computation Utilizes bulk synchronous parallel (BSP) System Programming Data Computations Abstraction Exchange map() Key based MapReduce combine() grouping Disk based reduce() compute() Message Pregel combine() passing In memory aggregate()3/34
    • Pregel’s BSP Graph Processing Superstep 1 Superstep 2 Superstep 3 Worker 1 Worker 1 Worker 1 Worker 2 Worker 2 Worker 2 Worker 3 Worker 3 Worker 3 BSP Barrier Balanced computation and communication is fundamental to Pregel’s efficiency4/34
    • Optimizing Graph Processing Existing work focus on optimizing for graph structure (static optimization): Optimize graph partitioning: Simple graph partitioning schemes (e.g., hash or range): Giraph User-defined partitioning function: Pregel Sophisticated partitioning techniques (e.g., min-cuts): GraphLab, PowerGraph, GoldenOrb and Surfer Optimize graph data access: Distributed data stores and graph indexing: GoldenOrb, Hama and Trinity5/34
    • Optimizing Graph Processing Existing work focus on optimizing for graph structure (static optimization): Optimize graph partitioning: Simple graph partitioning schemes (e.g., hash or range): Giraph User-defined partitioning function: Pregel Sophisticated partitioning techniques (e.g., min-cuts): GraphLab, PowerGraph, GoldenOrb and Surfer Optimize graph data access: Distributed data stores and graph indexing: GoldenOrb, Hama and Trinity What about algorithm behavior? Pregel provides coarse-grained load balancing, is it enough?5/34
    • Types of Algorithms Algorithms behaves differently, we classify algorithms in two categories depending on their behavior Algorithm Type In/Out Vertices Examples Messages State PageRank, Stationary Predictable Fixed Diameter estimation, WCC Distributed minimum spanning tree, Non-stationary Variable Variable Belief propagation, Graph queries, Ad propagation6/34
    • Types of Algorithms – First Superstep V2 V8 V2 V8 V4 V1 V6 V4 V1 V6 V7 V3 V5 V7 V3 V5 PageRank DMST7/34
    • Types of Algorithms – Superstep k V2 V8 V2 V8 V4 V1 V6 V4 V1 V6 V7 V3 V5 V7 V3 V5 PageRank DMST8/34
    • Types of Algorithms – Superstep k + m V2 V8 V2 V8 V4 V1 V6 V4 V1 V6 V7 V3 V5 V7 V3 V5 PageRank DMST9/34
    • What Causes Computation Imbalance in Non-stationary Algorithms? Superstep 1 Superstep 2 Superstep 3 Worker 1 Worker 1 Worker 1 Worker 2 Worker 2 Worker 2 Worker 3 Worker 3 Worker 3 BSP Barrier -Vertex response time -Time to receive in messages -Time to send out messages10/34
    • Why to Optimize for Algorithm Behavior? Difference between stationary and non-stationary algorithms 1000 In Messages (Millions) 100 10 1 0.1 0.01 0.001 0 10 20 30 40 50 60 SuperSteps PageRank - Total DMST - Total PageRank - Max/W DMST - Max/W11/34
    • What is Mizan? BSP-based graph processing framework Uses runtime fine-grained vertex migrations to balance computation and communication Follows the Pregel programming model Open source, written in C++12/34
    • PageRank’s compute() in Mizan void compute(messageIterator * messages, userVertexObject * data, messageManager * comm) { double currVal = data->getVertexValue().getValue(); double newVal = 0; double c = 0.85; if (data->getCurrentSS() > 1) { while (messages->hasNext()) { newVal = newVal + messages->getNext().getValue(); } newVal = newVal * c + (1.0 - c) / ((double) vertexTotal); data->setVertexValue(mDouble(newVal)); } else {newVal = currVal;} if (data->getCurrentSS() <= maxSuperStep) { mDouble outVal(newVal / ((double) data->getOutEdgeCount())); for (int i = 0; i < data->getOutEdgeCount(); i++) { comm->sendMessage(data->getOutEdgeID(i), outVal); } } else {data->voteToHalt();} }13/34
    • Migration Planning Objectives Decentralized Simple Transparent No need to change Pregel’s API Does not assume any a priori knowledge to graph structure or algorithm14/34
    • Mizan’s Migration Barrier Mizan performs both planning and migrations after all workers reach the BSP barrier Superstep 1 Superstep 2 Superstep 3 Worker 1 Worker 1 Worker 1 Worker 2 Worker 2 Worker 2 Worker 3 Worker 3 Worker 3 Migration Planner Migration Barrier BSP Barrier15/34
    • Mizan’s Migration Planning Steps 1 Identify the source of imbalance: By comparing the worker’s execution time against a normal distribution and flagging outliers16/34
    • Mizan’s Migration Planning Steps 1 Identify the source of imbalance: By comparing the worker’s execution time against a normal distribution and flagging outliers Mizan monitors for each vertex: Remote outgoing messages All incoming messages Response time High level summaries are broadcast to each worker Vertex Response Time Remote Outgoing Messages V6 V2 V1 V4 V3 V5 Mizan Mizan Worker 1 Worker 2 Local Incoming Messages Remote Incoming Messages16/34
    • Mizan’s Migration Planning Steps 1 Identify the source of imbalance 2 Select the migration objective: Mizan finds the strongest cause of workload imbalance by comparing statistics for outgoing messages and incoming messages of all workers with the worker’s execution time17/34
    • Mizan’s Migration Planning Steps 1 Identify the source of imbalance 2 Select the migration objective: Mizan finds the strongest cause of workload imbalance by comparing statistics for outgoing messages and incoming messages of all workers with the worker’s execution time The migration objective is either: Optimize for outgoing messages, or Optimize for incoming messages, or Optimize for response time17/34
    • Mizan’s Migration Planning Steps 1 Identify the source of imbalance 2 Select the migration objective 3 Pair over-utilized workers with under-utilized ones: All workers create and execute the migration plan in parallel without centralized coordination Each worker is paired with one other worker at most W7 W2 W1 W5 W8 W4 W0 W6 W9 W3 0 1 2 3 4 5 6 7 818/34
    • Mizan’s Migration Planning Steps 1 Identify the source of imbalance 2 Select the migration objective 3 Pair over-utilized workers with under-utilized ones 4 Select vertices to migrate: Depending on the migration objective19/34
    • Mizan’s Migration Planning Steps 1 Identify the source of imbalance 2 Select the migration objective 3 Pair over-utilized workers with under-utilized ones 4 Select vertices to migrate 5 Migrate vertices: How to migrate vertices freely across workers while maintaining vertex ownership and fast updates? How to minimize migration costs for large vertices?20/34
    • Vertex Ownership Mizan uses distributed hash table (DHT) to implement a distributed lookup service: V can execute at any worker V ’s home worker ID = (hash(ID) mod N ) Workers ask the home worker of V for its current location The home worker is notified with the new location as V migrates Where is V5 & V8 V1 V2 V8 V4 V6 V7 V8 Compute V3 V5 V2 V5 V9 V1 V3 V9 DHT V1:W1 V4:W2 V7:W3 V2:W1 V5:W2 V8:W3 V3:W1 V6:W2 V9:W3 Worker 1 Worker 2 Worker 321/34
    • Migrating Vertices with Large Message Size Introduce delayed migration for very large vertices: At SS t: only ownership of vertex v is moved to workernew Superstep t Superstep t+1 Superstep t+2 Worker_old Worker_old Worker_old 1 3 1 3 1 3 2 7 2 7 2 4 4 7 4 7 5 6 5 6 5 6 Worker_new Worker_new Worker_new22/34
    • Migrating Vertices with Large Message Size Introduce delayed migration for very large vertices: At SS t: only ownership of vertex v is moved to workernew At SS t + 1: workernew receives vertex 7 messages workerold process vertex 7 Superstep t Superstep t+1 Worker_old Worker_old Worker_old 1 3 1 3 1 3 2 7 2 7 2 4 4 7 4 7 5 6 5 6 5 6 Worker_new Worker_new Worker_new22/34
    • Migrating Vertices with Large Message Size Introduce delayed migration for very large vertices: At SS t: only ownership of vertex v is moved to workernew At SS t + 1: workernew receives vertex 7 messages workerold process vertex 7 After SS t + 1: workerold moves state of vertex 7 to workernew Superstep t Superstep t+1 Superstep t+2 Worker_old Worker_old Worker_old 1 3 1 3 1 3 2 7 2 7 2 4 4 7 4 7 5 6 5 6 5 6 Worker_new Worker_new Worker_new22/34
    • Experimental Setup Mizan is implemented on C++ with MPI with three variations: Static Mizan: Emulates Giraph, disables dynamic migration Work stealing (WS): Emulates Pregel’s coarse-grained dynamic load balancing Mizan: Activates dynamic migration Local cluster of 21 machines: Mix of i5 and i7 processors with 16GB RAM Each IBM Blue Gene/P supercomputer with 1024 compute nodes: Each is a 4 core PowerPC450 CPU at 850MHz with 4GB RAM23/34
    • Datasets Experiments on public datasets: Stanford Network Analysis Project (SNAP) The Laboratory for Web Algorithmics (LAW) Kronecker generator name |nodes| |edges| kg1 (synthetic) 1,048,576 5,360,368 kg4m68m (synthetic) 4,194,304 68,671,566 web-Google 875,713 5,105,039 LiveJournal1 4,847,571 68,993,773 hollywood-2011 2,180,759 228,985,632 arabic-2005 22,744,080 639,999,45824/34
    • Giraph vs. Static Mizan 40 Giraph 12 Giraph 35 Static Mizan Static Mizan 30 10 Run Time (Min) Run Time (Min) 25 8 20 6 15 4 10 2 5 0 0 web k Live k 1 2 4 8 16 - Goo g1 Jour g4m68 nal1 gle m Vertex count (Millions) Figure: PageRank on social network Figure: PageRank on regular random and random graphs graphs, each has around 17M edge25/34
    • Effectiveness of Dynamic Vertex Migration PageRank on a social graph (LiveJournal1) The shaded columns: algorithm runtime The unshaded columns: initial partitioning cost 30 25 Run Time (Min) 20 15 10 5 0 Mizan Mizan Mizan Static Static Static WS WS WS Hash Range Metis26/34
    • Effectiveness of Dynamic Vertex Migration Comparing the performance on highly variable messaging pattern algorithms, which are DMST and ad propagation simulation, on a metis partitioned social graph (LiveJournal1) 300 Static 250 Work Stealing Mizan Run Time (Min) 200 150 100 50 0 Adve DMS r tism T ent27/34
    • Mizan’s Overhead with Scaling 8 10 Mizan - hollywood-2011 Mizan - Arabic-2005 7 8 6 Speedup 5 Speedup 6 4 4 3 2 2 1 0 0 64 128 256 512 1024 0 2 4 6 8 10 12 14 16 Compute Nodes Compute Nodes Figure: Speedup on Linux Cluster Figure: Speedup on IBM Blue Gene/P supercomputer28/34
    • Future Work Work with skewed graphs Improving Pregel’s fault tolerance29/34
    • Conclusion Mizan is a Pregel system that uses fine-grained vertex migration to load balance computation and communication across supersteps Mizan is an open source project developed within InfoCloud group in KAUST in collaboration with IBM, programmed in C++ with MPI Mizan scales up to thousands of workers Mizan improves the overall computation cost between 40% up to two order of magnitudes with less than 10% migration overhead Download it at: http://cloud.kaust.edu.sa Try it on EC2 (ami-52ed743b)30/34
    • Extra: Migration Details and Costs 4 Superstep Runtime 3.5 Average Workers Runtime Run Time (Minutes) 3 Migration Cost Balance Outgoing Messages 2.5 Balance Incoming Messages 2 Balance Response Time 1.5 1 0.5 0 5 10 15 20 25 30 Supersteps31/34
    • Extra: Migrated Vertices per Superstep 4 Migrated Vertices (x1000) 500 Migration Cost Migration Cost (Minutes) Total Migrated Vertices 400 Max Vertices Migrated 3 by Single Worker 300 2 200 1 100 0 0 5 10 15 20 25 30 Supersteps32/34
    • Extra: Mizan’s Migration Planning Compute() BSP Barrier Pair with Underutilized Worker Migration No Superstep Barrier Imbalanced? Select Yes Vertices and Migrate Identify Source of Imbalance No Worker Yes Overutilized?33/34
    • Extra: Architecture of Mizan Vertex Compute() Mizan Worker BSP Processor Migration Communicator - DHT Planner Storage Manager IO HDFS/Local Disks34/34