Talk by Alexey Zinoviev at HighLoad++ 2014.


- 1. Thorny Path to Large-Scale Graph Processing (Alexey Zinoviev)
- 2. About • I am a <graph theory, machine learning, traffic jam prediction, BigData algorithms> scientist • But I'm a <Java, JavaScript, Android, NoSQL, Hadoop, Spark> programmer
- 3. BigData & Graph Theory
- 4. Big Data of old times • Astronomy • Weather • Trading • Sea routes • Battles
- 5. And now ... • Web graph • Facebook friend network • Gmail email graph • EU road network • Citation graph • PayPal transaction graph
- 6. Graph sizes (vertices / edges / volume / data per day): • Web graph: 1.5 × 10^12 / 1.2 × 10^13 / 100 PB / 300 TB • Facebook (friends graph): 1.1 × 10^9 / 160 × 10^9 / 1 PB / 15 TB • EU road graph: 18 × 10^6 / 42 × 10^6 / 20 GB / 50 MB • Road graph of this city: 250,000 / 460,000 / 500 MB / 100 KB
- 7. Problems • Popularity rank (page rank) • Determining popular users, news, jobs, etc. • Shortest paths • Max flow • How are users, groups connected? • Clustering, semi-clustering • Max clique, triangle closure, label propagation algorithms • Finding related people, groups, interests
- 8. Node Centrality Problem • Vertices with high impact • Removal of important vertices reduces the reliability Cases: • Bioinformatics • Social connections • Road network • Spam detection • Recommendation system
- 9. Small World Problem • Facebook: avg. path length 4.74, 712 M users, 69 G edges • Twitter: 3.67, ----, 5 G follows • MSN Messenger (1 month): 6.6, 180 M users, 1.3 G arcs
- 10. Large graph processing tools
- 11. Think like a vertex… • The majority of graph algorithms are iterative and traverse the graph in some way • Classic map-reduce overheads (job startup/shutdown, reloading data from HDFS, shuffling) • High complexity of reducing graph problems to a key-value model • Iterative algorithms become multiple chained M/R jobs, with full saving and re-reading of state on every iteration
- 12. Why not use MapReduce/Hadoop? • Example: PageRank, Google's famous algorithm for measuring the authority of a webpage based on the underlying network of hyperlinks • PageRank is defined recursively: each vertex distributes its authority to its neighbors in equal proportions
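The recursive definition above can be sketched as a plain power iteration. This is an illustrative single-machine version (function names and the toy graph are mine), not the distributed Hadoop or Giraph implementation:

```python
# Minimal PageRank by power iteration over a dict-of-adjacency-lists graph.

def pagerank(graph, damping=0.85, iterations=30):
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}           # start with uniform authority
    for _ in range(iterations):
        contrib = {v: 0.0 for v in graph}
        for v, neighbors in graph.items():
            if neighbors:
                share = rank[v] / len(neighbors)  # split authority equally
                for u in neighbors:
                    contrib[u] += share
        rank = {v: (1 - damping) / n + damping * contrib[v] for v in graph}
    return rank

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
```

Each of the 30 iterations is one full pass over the edges, which is exactly why M/R is a poor fit: every pass becomes a separate job with its own startup and HDFS round-trip.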
- 13. Google Pregel • Distributed system especially developed for large scale graph processing • Bulk Synchronous Parallel (BSP) as execution model • Supersteps are atomic units of parallel computation • Any superstep can be restarted from a checkpoint (need not be user defined) • A new superstep provides an opportunity for rebalancing of components among available resources
- 14. Superstep in BSP
- 15. Vertex-centric BSP • Each vertex has an id, a value, a list of its adjacent vertex ids and the corresponding edge values • Each vertex is invoked in each superstep, can recompute its value and send messages to other vertices, which are delivered over superstep barriers • Advanced features: termination votes, combiners, aggregators, topology mutations
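A toy single-machine sketch of this model (the function and the simplifications are mine): in each superstep, active vertices send messages, the barrier delivers them, and a vertex stays active only while its value changes, which mimics voting to halt. The classic Pregel-style example propagates the maximum value through the graph:

```python
# Vertex-centric BSP sketch: propagate the global maximum value.
# values: vertex -> initial value; edges: vertex -> list of out-neighbors.

def bsp_max(values, edges):
    values = dict(values)
    active = set(values)               # superstep 0: everyone is active
    while active:
        inbox = {v: [] for v in values}
        for v in active:               # message sending phase
            for u in edges.get(v, []):
                inbox[u].append(values[v])
        active = set()                 # barrier: messages delivered, new superstep
        for v, msgs in inbox.items():
            if msgs and max(msgs) > values[v]:
                values[v] = max(msgs)  # vertex recomputes its value...
                active.add(v)          # ...and stays active to broadcast again
    return values

result = bsp_max({1: 5, 2: 1, 3: 3}, {1: [2], 2: [1, 3], 3: [2]})
```

On a connected graph with edges in both directions, every vertex converges to the global maximum, and the loop terminates when all vertices have effectively voted to halt.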
- 16. C++ API, Pregel
- 17. Apache Giraph
- 18. Why Apache Giraph Pregel is proprietary, but: • Apache Giraph is an open source implementation of Pregel • Runs on standard Hadoop infrastructure • Computation is executed in memory • Can be a job in a pipeline (MapReduce, Hive) • Uses Apache ZooKeeper for synchronization
- 19. Why Apache Giraph • No locks: message-based communication • No semaphores: global synchronization • Iteration isolation: massively parallelizable
- 20. ZooKeeper in Apache Giraph ZooKeeper: responsible for computation state • Partition/worker mapping • Global state: superstep • Checkpoint paths, aggregator values, statistics
- 21. Master in Apache Giraph Master: responsible for coordination • Assigns partitions to workers • Coordinates synchronization • Requests checkpoints • Aggregates aggregator values • Collects health statuses
- 22. Worker in Apache Giraph Worker: responsible for vertices • Invokes the compute() function of active vertices • Sends, receives and assigns messages • Computes local aggregation values
- 23. Scaling Giraph to a trillion edges
- 24. Fault tolerance No single point of failure from Giraph threads • With multiple master threads, if the current master dies, a new one will automatically take over • If a worker thread dies, the application is rolled back to a previously checkpointed superstep • If a ZooKeeper server dies, as long as a quorum remains, the application can proceed Hadoop single points of failure still exist (NameNode, JobTracker)
- 25. Worker Scalability, 250m nodes
- 26. Vertex scalability, 300 workers
- 27. Vertex/workers scalability
- 28. MapReduce vs Giraph 6 machines with 2x8core Opteron CPUs, 4x1TB disks and 32GB RAM each, ran 1 Giraph worker per core Wikipedia page link graph (6 million vertices, 200 million edges) PageRank on Hadoop/Mahout • 10 iterations approx. 29 minutes • average time per iteration: approx. 3 minutes PageRank on Giraph • 30 iterations took approx. 15 minutes • average time per iteration: approx. 30 seconds 10x performance improvement
- 29. Okapi • Apache Mahout for graphs • Graph-based recommenders: ALS, SGD, SVD++, etc. • Graph analytics: Graph partitioning, Community Detection, K-Core, etc.
- 30. Giraph’s killer
- 31. Spark • MapReduce in memory • Up to 50x faster than Hadoop • Support for Shark (like Hive), MLlib (Machine learning), GraphX (graph processing) • RDD is a basic building block (immutable distributed collections of objects)
- 32. Spark in the old Hadoop family
- 33. GraphX Supported algorithms • PageRank • Connected components • Label propagation • SVD++ • Strongly connected components • Triangle count
- 34. GraphChi • Asynchronous disk-based version of GraphLab • Utilizes a parallel sliding window • Very small number of non-sequential accesses to the disk • Graph does not fit in memory • Input graph is split into P disjoint intervals to balance edges, each associated with a shard
- 35. GraphChi
- 36. GraphChi
- 37. Road Networks
- 38. Definition • Edge weights > 0 • A few classes of roads • Lat/Lon attributes for each vertex • Subgraphs for crossroads • Not as big as the web graph • Static
- 39. Shortest path problem
- 40. AI
- 41. Full
- 42. Dijkstra
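Dijkstra's algorithm is the exact baseline on such a network. A minimal sketch of the point-to-point query (the graph encoding and toy example are mine):

```python
import heapq

# Point-to-point Dijkstra; graph: vertex -> list of (neighbor, positive weight).
def dijkstra(graph, s, t):
    dist = {s: 0}
    pq = [(0, s)]
    while pq:
        d, v = heapq.heappop(pq)
        if v == t:
            return d                       # target settled: distance is final
        if d > dist.get(v, float("inf")):
            continue                       # stale queue entry, skip
        for u, w in graph.get(v, []):
            nd = d + w
            if nd < dist.get(u, float("inf")):
                dist[u] = nd
                heapq.heappush(pq, (nd, u))
    return float("inf")                    # t unreachable from s

graph = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": []}
d = dijkstra(graph, "a", "c")
```

Correctness relies on the slide's first assumption, edge weights > 0; on a continental road network this settles millions of vertices per query, which motivates the preprocessing-based methods on the following slides.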
- 43. Bi-Directional
- 44. We need a fast system! • Response < 10 ms (with high accuracy) • Shortest path (SP) queries in O(n) • Preprocessing phase • Don't store all SPs: that is O(n^2) space • Use geo attributes • Use compression and recoding for disk storage • Network is stable
- 45. EU road network, query cost by method: • Dijkstra: 2,008,300 • ALT: 24,656 • RE: 2,444 • HH: 462.0 • CH: 94.0 • TN: 1.8 • HL: 0.3 • ALT: [Goldberg & Harrelson 05], [Delling & Wagner 07] • RE: [Gutman 05], [Goldberg et al. 07] • HH: [Sanders & Schultes 06] • CH: [Geisberger et al. 08] • TN: [Geisberger et al. 08] • HL: [Abraham et al. 11]
- 46. A* with landmarks (ALT)
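The core of ALT is a landmark-based lower bound used as the A* heuristic. A sketch of just that bound (assuming an undirected graph, so landmark distances are symmetric; the function name and example are mine): for any landmark L, the triangle inequality gives |d(L, t) - d(L, v)| <= d(v, t), so taking the maximum over landmarks yields an admissible heuristic.

```python
# ALT lower bound: landmark_dists holds one precomputed table per landmark,
# each mapping vertex -> shortest distance from that landmark.
def alt_heuristic(landmark_dists, v, t):
    return max(abs(ld[t] - ld[v]) for ld in landmark_dists)

# Path graph a-b-c-d with unit weights, one landmark at a:
dist_from_a = {"a": 0, "b": 1, "c": 2, "d": 3}
h = alt_heuristic([dist_from_a], "b", "d")
```

Here the bound is tight (h = 2, the exact b-to-d distance) because the shortest path lies on the landmark's shortest-path tree; well-chosen landmarks make this common, which is what drives the two-orders-of-magnitude speedup over plain Dijkstra in the table above.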
- 47. Reach (RE)
- 48. Transit nodes (TN) • Divide graph G into subgraphs G_i • Find R (subset of G_i) for each G_i • All shortest paths leaving G_i cross R • Build pairs (v_i, r_k) for each v_i where r_k is the closest transit node • Calculate shortest paths between transit nodes in R • Save it!
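With those tables precomputed, a long-distance query no longer touches the graph at all: it is a small minimization over access-node pairs, d(s, t) = min over (a, b) of d(s, a) + d(a, b) + d(b, t). A sketch with an assumed table layout (names and the toy numbers are mine):

```python
# access_s: access node -> distance from s; access_t: access node -> distance to t;
# tn_dist: (transit node, transit node) -> precomputed shortest distance.
def tn_query(access_s, access_t, tn_dist):
    return min(da + tn_dist[(a, b)] + db
               for a, da in access_s.items()
               for b, db in access_t.items())

# s has two access nodes, t has one; the longer local leg wins via a shorter
# transit-node distance:
d = tn_query({"a1": 2, "a2": 5}, {"b1": 3},
             {("a1", "b1"): 10, ("a2", "b1"): 4})
```

Since each vertex typically has only a handful of access nodes, the query is a few table lookups, which is how TN reaches microsecond-scale times; for nearby s and t a local fallback search is still needed.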
- 49. TN + ALT
- 50. Special Cases
- 51. Optimization problems • Unstable graph • Preprocessing phase is meaningless • How to invest $1B in a road network to minimize human time in traffic jams • How to invest $1M in a road network to improve reliability before a flood
- 52. Last steps ... • I/O-Efficient Algorithms and Data Structures • Graphs and Memory Errors
- 53. Omsk
- 54. Novosibirsk
- 55. Novosibirsk, TN preprocessing
- 56. twitter + G+ + VK
