
Thorny Path to the Large Scale Graph Processing, Alexey Zinoviev (Tamtek)


Talk by Alexey Zinoviev at HighLoad++ 2014.



  1. Thorny Path to the Large-Scale Graph Processing. Zinoviev Alexey
  2. About
      • I am a <graph theory, machine learning, traffic jam prediction, Big Data algorithms> scientist
      • But I'm a <Java, JavaScript, Android, NoSQL, Hadoop, Spark> programmer
  3. Big Data & Graph Theory
  4. Big Data of old times
      • Astronomy
      • Weather
      • Trading
      • Sea routes
      • Battles
  5. And now ...
      • Web graph
      • Facebook friend network
      • Gmail email graph
      • EU road network
      • Citation graph
      • PayPal transaction graph
  6. Graph: number of vertices, number of edges, volume, data per day
      • Web graph: 1.5 × 10^12 vertices, 1.2 × 10^13 edges, 100 PB, 300 TB per day
      • Facebook (friends graph): 1.1 × 10^9 vertices, 160 × 10^9 edges, 1 PB, 15 TB per day
      • Road graph of EU: 18 × 10^6 vertices, 42 × 10^6 edges, 20 GB, 50 MB per day
      • Road graph of this city: 250 000 vertices, 460 000 edges, 500 MB, 100 KB per day
  7. Problems
      • Popularity rank (PageRank)
      • Determining popular users, news, jobs, etc.
      • Shortest paths
      • Max flow
      • How are users and groups connected?
      • Clustering, semi-clustering
      • Max clique, triangle closure, label propagation algorithms
      • Finding related people, groups, interests
  8. Node Centrality Problem
      • Vertices with high impact
      • Removal of important vertices reduces the reliability
      Cases:
      • Bioinformatics
      • Social connections
      • Road network
      • Spam detection
      • Recommendation system
  9. Small World Problem (average distance, vertices, edges)
      • Facebook: 4.74, 712 M, 69 G
      • Twitter: 3.67, ----, 5 G follows
      • MSN Messenger (1 month): 6.6, 180 M, 1.3 G arcs
  10. Large graph processing tools
  11. Think like a vertex…
      • The majority of graph algorithms are iterative and traverse the graph in some way
      • Classic MapReduce overheads (job startup/shutdown, reloading data from HDFS, shuffling)
      • Reducing a graph problem to the key-value model is highly complex
      • Iterative algorithms become multiple chained M/R jobs, with the full state saved and re-read at each step
  12. Why not use MapReduce/Hadoop?
      • Example: PageRank, Google's famous algorithm for measuring the authority of a webpage based on the underlying network of hyperlinks
      • Defined recursively: each vertex distributes its authority to its neighbors in equal proportions
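The recursive definition on this slide can be sketched as a plain in-memory power iteration. This is only an illustration, not code from the talk: the function name `pagerank`, the damping factor 0.85, the iteration count and the even redistribution of rank from vertices without out-links are all assumptions.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """out_links maps each vertex to the list of vertices it links to.

    Assumption: every vertex appears as a key, even if its list is empty.
    """
    n = len(out_links)
    rank = {v: 1.0 / n for v in out_links}
    for _ in range(iterations):
        # Rank held by vertices without out-links is spread over all vertices.
        dangling = sum(rank[v] for v, outs in out_links.items() if not outs)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in out_links}
        for v, outs in out_links.items():
            if outs:
                # Each vertex distributes its authority in equal proportions.
                share = damping * rank[v] / len(outs)
                for w in outs:
                    new_rank[w] += share
        rank = new_rank
    return rank


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
for vertex in sorted(ranks, key=ranks.get, reverse=True):
    print(vertex, round(ranks[vertex], 4))
```

Every iteration reads the whole rank vector and writes a new one. A chain of MapReduce jobs would pay for exactly this step with a round trip to HDFS per iteration.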
  13. Google Pregel
      • Distributed system developed especially for large-scale graph processing
      • Bulk Synchronous Parallel (BSP) as execution model
      • Supersteps are atomic units of parallel computation
      • Any superstep can be restarted from a checkpoint (need not be user defined)
      • A new superstep provides an opportunity for rebalancing of components among available resources
  14. Superstep in BSP
  15. Vertex-centric BSP
      • Each vertex has an id, a value, a list of its adjacent vertex ids and the corresponding edge values
      • Each vertex is invoked in each superstep, can recompute its value and send messages to other vertices, which are delivered over superstep barriers
      • Advanced features: termination votes, combiners, aggregators, topology mutations
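The vertex-centric model can be illustrated with a tiny single-process simulator. This is a sketch under stated assumptions, not the Pregel or Giraph API: `run_supersteps`, `make_sssp` and the `compute` signature are hypothetical names, and combiners, aggregators and topology mutations are left out. The example program is single-source shortest paths.

```python
import math
from collections import defaultdict


def run_supersteps(edges, compute, initial_value, max_supersteps=100):
    """edges maps each vertex to {neighbor: edge value}; every vertex is a key."""
    values = {v: initial_value for v in edges}
    active = set(edges)          # every vertex starts active
    inbox = defaultdict(list)    # messages delivered in the current superstep
    superstep = 0
    while superstep < max_supersteps and (active or inbox):
        outbox = defaultdict(list)

        def send(target, message):
            outbox[target].append(message)

        # A halted vertex is woken up again when a message arrives for it.
        for v in active | set(inbox):
            values[v], halt = compute(
                v, values[v], edges[v], inbox.get(v, []), superstep, send)
            if halt:
                active.discard(v)
            else:
                active.add(v)
        inbox = outbox           # barrier: messages become visible next superstep
        superstep += 1
    return values, superstep


def make_sssp(source):
    def compute(vertex, value, out_edges, messages, superstep, send):
        start = 0.0 if (superstep == 0 and vertex == source) else math.inf
        candidate = min([start] + messages)
        if candidate < value:
            value = candidate
            for target, weight in out_edges.items():
                send(target, value + weight)
        return value, True       # always vote to halt; messages reactivate
    return compute


edges = {"a": {"b": 1, "c": 4}, "b": {"c": 2, "d": 6}, "c": {"d": 1}, "d": {}}
distances, supersteps = run_supersteps(edges, make_sssp("a"), math.inf)
print(distances, supersteps)
```

The computation stops once every vertex has voted to halt and no messages are in flight, which is the termination rule the slide refers to as termination votes.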
  16. C++ API, Pregel
  17. Apache Giraph
  18. Why Apache Giraph
      Pregel is proprietary, but:
      • Apache Giraph is an open source implementation of Pregel
      • Runs on standard Hadoop infrastructure
      • Computation is executed in memory
      • Can be a job in a pipeline (MapReduce, Hive)
      • Uses Apache ZooKeeper for synchronization
  19. Why Apache Giraph
      • No locks: message-based communication
      • No semaphores: global synchronization
      • Iteration isolation: massively parallelizable
  20. ZooKeeper in Apache Giraph
      ZooKeeper: responsible for computation state
      • Partition/worker mapping
      • Global state: superstep
      • Checkpoint paths, aggregator values, statistics
  21. Master in Apache Giraph
      Master: responsible for coordination
      • Assigns partitions to workers
      • Coordinates synchronization
      • Requests checkpoints
      • Aggregates aggregator values
      • Collects health statuses
  22. Worker in Apache Giraph
      Worker: responsible for vertices
      • Invokes the compute() function of active vertices
      • Sends, receives and assigns messages
      • Computes local aggregation values
  23. Scaling Giraph to a trillion edges
  24. Fault tolerance
      No single point of failure from Giraph threads
      • With multiple master threads, if the current master dies, a new one will automatically take over
      • If a worker thread dies, the application is rolled back to a previously checkpointed superstep
      • If a ZooKeeper server dies, the application can proceed as long as a quorum remains
      Hadoop single points of failure still exist (NameNode, JobTracker)
  25. Worker scalability, 250m nodes
  26. Vertex scalability, 300 workers
  27. Vertex/workers scalability
  28. MapReduce vs Giraph
      • Setup: 6 machines with 2x8-core Opteron CPUs, 4x1 TB disks and 32 GB RAM each, running 1 Giraph worker per core
      • Data: Wikipedia page link graph (6 million vertices, 200 million edges)
      • PageRank on Hadoop/Mahout: 10 iterations took approx. 29 minutes; average time per iteration approx. 3 minutes
      • PageRank on Giraph: 30 iterations took approx. 15 minutes; average time per iteration approx. 30 seconds
      • 10x performance improvement
  29. Okapi
      • Apache Mahout for graphs
      • Graph-based recommenders: ALS, SGD, SVD++, etc.
      • Graph analytics: graph partitioning, community detection, K-Core, etc.
  30. Giraph's killer
  31. Spark
      • MapReduce in memory
      • Up to 50x faster than Hadoop
      • Support for Shark (like Hive), MLlib (machine learning), GraphX (graph processing)
      • RDD is the basic building block (an immutable distributed collection of objects)
  32. Spark in Hadoop old family
  33. GraphX
      Supported algorithms
      • PageRank
      • Connected components
      • Label propagation
      • SVD++
      • Strongly connected components
      • Triangle count
  34. GraphChi
      • Asynchronous disk-based version of GraphLab
      • Uses parallel sliding windows
      • Very small number of non-sequential accesses to the disk
      • Graph does not fit in memory
      • Input graph is split into P disjoint intervals to balance edges, each associated with a shard
      • For Home deals ...
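The interval/shard split mentioned on this slide can be sketched as follows. This is a simplified illustration, not GraphChi's implementation: it assumes that each shard stores the edges whose destination falls into its interval, sorted by source, and it uses a greedy cut on in-degree. The function name `build_shards` and the sample graph are hypothetical.

```python
def build_shards(edges, num_vertices, num_shards):
    """Split vertices 0..num_vertices-1 into at most num_shards intervals.

    edges is a list of (source, destination) pairs. Intervals are cut
    greedily so that each receives roughly the same number of in-edges.
    """
    in_degree = [0] * num_vertices
    for _, dst in edges:
        in_degree[dst] += 1

    target = len(edges) / num_shards
    intervals, start, count = [], 0, 0
    # The last vertex is excluded here so the final interval is never empty.
    for v in range(num_vertices - 1):
        count += in_degree[v]
        if count >= target and len(intervals) < num_shards - 1:
            intervals.append((start, v))
            start, count = v + 1, 0
    intervals.append((start, num_vertices - 1))

    # One shard per interval: its in-edges, ordered by source vertex.
    shards = [sorted(e for e in edges if lo <= e[1] <= hi)
              for lo, hi in intervals]
    return intervals, shards


edges = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2), (3, 0), (1, 3), (2, 3)]
intervals, shards = build_shards(edges, num_vertices=4, num_shards=2)
for interval, shard in zip(intervals, shards):
    print(interval, shard)
```

Sorting each shard by source is what would let the out-edges of one interval be read from every other shard as a single contiguous block, which keeps disk access mostly sequential.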
  35. GraphChi
  36. GraphChi
  37. Road Networks
  38. Definition
      • Edge weights > 0
      • A few classes of roads
      • Lat/Lon attributes for each vertex
      • Subgraphs for crossroads
      • Not as big as the web graph
      • Static
  39. Shortest path problem
  40. AI
  41. Full
  42. Dijkstra
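As a baseline for the road-network queries discussed next, here is a minimal sketch of Dijkstra's algorithm with a binary heap. The adjacency-dict representation, the function name and the sample graph are assumptions for illustration; the slide itself only shows a picture.

```python
import heapq


def dijkstra(graph, source):
    """graph maps each vertex to {neighbor: positive edge weight}.

    Returns the shortest distance from source to every reachable vertex.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue  # stale heap entry, a shorter path was already found
        for w, weight in graph[v].items():
            candidate = d + weight
            if candidate < dist.get(w, float("inf")):
                dist[w] = candidate
                heapq.heappush(heap, (candidate, w))
    return dist


roads = {"a": {"b": 1, "c": 4}, "b": {"c": 2, "d": 6}, "c": {"d": 1}, "d": {}}
shortest = dijkstra(roads, "a")
print(shortest)
```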
  43. Bi-Directional
  44. We need a fast system!
      • Response < 10 ms (with high accuracy)
      • Shortest path (SP) with O(n)
      • Preprocessing phase
      • Don't keep all SPs: O(n^2)
      • Use geo attributes
      • Use compression and recoding for disk storage
      • Network is stable
  45. EU road network
      • Dijkstra: 2 008 300
      • ALT: 24 656 ([Goldberg & Harrelson 05], [Delling & Wagner 07])
      • RE: 2444 ([Gutman 05], [Goldberg et al. 07])
      • HH: 462.0 ([Sanders & Schultes 06])
      • CH: 94.0 ([Geisberger et al. 08])
      • TN: 1.8 ([Geisberger et al. 08])
      • HL: 0.3 ([Abraham et al. 11])
  46. A* with landmarks (ALT)
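The idea behind ALT is A* search with lower bounds from the triangle inequality over precomputed landmark distances. The sketch below assumes an undirected, connected graph with positive weights. All names (`grid_graph`, `alt_query`), the grid test network and the choice of two corner landmarks are illustrative assumptions, not taken from the slides.

```python
import heapq
import math


def dijkstra(graph, source):
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue
        for w, weight in graph[v].items():
            if d + weight < dist.get(w, math.inf):
                dist[w] = d + weight
                heapq.heappush(heap, (dist[w], w))
    return dist


def grid_graph(n):
    """Undirected n x n grid with unit weights, a stand-in for a road network."""
    g = {(r, c): {} for r in range(n) for c in range(n)}
    for (r, c) in list(g):
        for nb in ((r, c + 1), (r + 1, c)):
            if nb in g:
                g[(r, c)][nb] = 1.0
                g[nb][(r, c)] = 1.0
    return g


def alt_query(graph, landmark_dists, source, target):
    """A* where h(v) = max over landmarks L of |d(L, target) - d(L, v)|."""
    def lower_bound(v):
        return max((abs(d[target] - d[v]) for d in landmark_dists), default=0.0)

    best = {source: 0.0}
    heap = [(lower_bound(source), source)]
    settled = set()
    while heap:
        _, v = heapq.heappop(heap)
        if v in settled:
            continue
        if v == target:
            return best[v], len(settled)
        settled.add(v)
        for w, weight in graph[v].items():
            candidate = best[v] + weight
            if candidate < best.get(w, math.inf):
                best[w] = candidate
                heapq.heappush(heap, (candidate + lower_bound(w), w))
    return math.inf, len(settled)


grid = grid_graph(5)
# Preprocessing: one full Dijkstra run per landmark.
landmark_dists = [dijkstra(grid, (0, 0)), dijkstra(grid, (4, 4))]
distance, settled_count = alt_query(grid, landmark_dists, (1, 4), (3, 0))
print(distance, settled_count)
```

Because the bound is consistent on an undirected graph, the first time the target is popped its distance is exact. The bound only reduces how many vertices are settled before that happens.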
  47. Reach (RE)
  48. Transit nodes (TN)
      • Divide graph G into subgraphs G_i
      • Find R (subset of G_i) for each G_i
      • All shortest paths in G_i go across R
      • Build pairs (v_i, r_k) for each v_i, where r_k is the closest transit node
      • Calculate shortest paths between transit nodes in R
      • Save it!
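The preprocessing steps listed above can be sketched in a deliberately simplified form. The assumptions here go beyond the slide: the graph is undirected, the partition into cells is given, the transit nodes of a cell are taken to be its boundary vertices, every vertex uses all transit nodes of its cell rather than only the closest one, and queries inside one cell fall back to plain Dijkstra. All names are hypothetical.

```python
import heapq
import math


def dijkstra(graph, source):
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue
        for w, weight in graph[v].items():
            if d + weight < dist.get(w, math.inf):
                dist[w] = d + weight
                heapq.heappush(heap, (dist[w], w))
    return dist


def grid_graph(n):
    """Undirected n x n grid with unit weights, a stand-in for a road network."""
    g = {(r, c): {} for r in range(n) for c in range(n)}
    for (r, c) in list(g):
        for nb in ((r, c + 1), (r + 1, c)):
            if nb in g:
                g[(r, c)][nb] = 1.0
                g[nb][(r, c)] = 1.0
    return g


def preprocess(graph, cell_of):
    """Pick transit nodes per cell and store their distances to all vertices."""
    transit = {}
    for v, neighbors in graph.items():
        # Boundary vertex: at least one neighbor lies in another cell.
        if any(cell_of[w] != cell_of[v] for w in neighbors):
            transit.setdefault(cell_of[v], []).append(v)
    # One Dijkstra per transit node covers both vertex-to-transit and
    # transit-to-transit distances (the graph is undirected).
    table = {t: dijkstra(graph, t) for nodes in transit.values() for t in nodes}
    return transit, table


def tn_query(graph, cell_of, transit, table, s, t):
    if cell_of[s] == cell_of[t]:
        return dijkstra(graph, s).get(t, math.inf)  # local query fallback
    best = math.inf
    for a in transit.get(cell_of[s], []):
        for b in transit.get(cell_of[t], []):
            total = (table[a].get(s, math.inf) + table[a].get(b, math.inf)
                     + table[b].get(t, math.inf))
            best = min(best, total)
    return best


grid = grid_graph(6)
cell_of = {(r, c): (r // 3, c // 3) for (r, c) in grid}  # four 3 x 3 cells
transit, table = preprocess(grid, cell_of)
print(tn_query(grid, cell_of, transit, table, (0, 0), (5, 5)))
```

A query between different cells only does table lookups, which is where the speed-up comes from. A path that leaves a cell has to pass through one of that cell's boundary vertices, so the minimum over boundary pairs is the exact distance in this simplified setting.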
  49. TN + ALT
  50. Special Cases
  51. Optimization problems
      • Unstable graph
      • Preprocessing phase is meaningless
      • How to invest $1B in the road network to minimize the time people spend in traffic jams
      • How to invest $1M in the road network to improve reliability before flooding
  52. Last steps ...
      • I/O-Efficient Algorithms and Data Structures
      • Graphs and Memory Errors
  53. Omsk
  54. Novosibirsk
  55. Novosibirsk, TN preprocessing
  56. Twitter + G+ + VK
