2014.02.13 (Strata) Graph Analysis with One Trillion Edges on Apache Giraph

(Abstract from Strata talk)
http://strataconf.com/strata2014/public/schedule/detail/32137

Graph analytics has applications beyond large web-scale organizations. Many computing problems can be efficiently expressed and processed as a graph, leading to useful insights that drive product and business decisions.

While you can express graph algorithms as SQL queries in Hive or as Hadoop MapReduce programs, an API designed specifically for graph processing makes many iterative graph computations (such as page rank, connected components, label propagation, and graph-based clustering) simpler and easier to express in understandable code. Apache Giraph provides such a native graph processing API, runs on existing Hadoop infrastructure, and can directly access HDFS and/or Hive tables.

This talk describes our efforts at Facebook to scale Apache Giraph to very large graphs of up to one trillion edges and how we run Apache Giraph in production. We will also talk about several algorithms that we have implemented and their use cases.

  1. Graph Analysis with One Trillion Edges on Apache Giraph (Strata, 2/13/2014) Avery Ching, Facebook
  2. Motivation
  3. Apache Giraph
     • Inspired by Google’s Pregel but runs on Hadoop
     • “Think like a vertex”
     • Maximum value vertex example (diagram: two processors repeatedly exchange vertex values 1, 2, and 5 until every vertex holds the maximum value, 5)
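To make “think like a vertex” concrete, here is a minimal sketch of the maximum-value example written against Giraph’s BasicComputation API; the class name and Writable type choices are illustrative assumptions, not code from the talk:

```java
import java.io.IOException;

import org.apache.giraph.graph.BasicComputation;
import org.apache.giraph.graph.Vertex;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;

// Each vertex keeps the largest value it has seen and forwards it to its
// neighbors; the computation halts once no vertex learns a larger value.
public class MaxValueComputation extends
    BasicComputation<LongWritable, LongWritable, NullWritable, LongWritable> {

  @Override
  public void compute(Vertex<LongWritable, LongWritable, NullWritable> vertex,
                      Iterable<LongWritable> messages) throws IOException {
    boolean changed = (getSuperstep() == 0);  // everyone announces its value once
    for (LongWritable message : messages) {
      if (message.get() > vertex.getValue().get()) {
        vertex.setValue(new LongWritable(message.get()));
        changed = true;
      }
    }
    if (changed) {
      sendMessageToAllEdges(vertex, vertex.getValue());
    }
    vertex.voteToHalt();  // reactivated automatically if a new message arrives
  }
}
```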
  4. Giraph on Hadoop / YARN (diagram: Giraph runs alongside MapReduce on Hadoop 0.20.x, 0.20.203, 1.x, and YARN / Hadoop 2.0.x)
  5. Apache Giraph data flow (diagram): loading the graph (the master assigns input splits; workers read them through an input format and send graph parts to their owners), storing the graph in memory as partitions on each worker, then compute/iterate (workers compute and send messages, send stats to the master, and iterate); results are written through an output format.
  6. Beyond Pregel: sharded aggregators, master computation, composable computation
  7. Use case: k-means clustering. Cluster input vectors into k clusters:
     • Assign each input vector to the closest centroid
     • Update centroid locations based on assignments
     (diagram: random centroid locations, assignment of vectors to centroids c0/c1/c2, centroid update)
  8. k-means in Giraph: partitioning the problem
     • Input vectors → vertices, partitioned across machines
     • Centroids → aggregators, shared data across all machines
     Problem solved... right? (diagram: centroids c0, c1, c2 shared between Worker 0 and Worker 1)
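A minimal sketch of the centroids-as-aggregators pattern, assuming one-dimensional input vectors so Giraph’s built-in sum aggregators suffice; the aggregator names, the fixed K, and the closest-centroid placeholder are illustrative assumptions rather than code from the talk (real centroids would need a custom vector aggregator):

```java
import java.io.IOException;

import org.apache.giraph.aggregators.DoubleSumAggregator;
import org.apache.giraph.aggregators.LongSumAggregator;
import org.apache.giraph.graph.BasicComputation;
import org.apache.giraph.graph.Vertex;
import org.apache.giraph.master.DefaultMasterCompute;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;

public class KMeansSketch {
  static final int K = 3;  // assumed number of clusters

  /** Master registers one sum/count aggregator pair per cluster. */
  public static class KMeansMaster extends DefaultMasterCompute {
    @Override
    public void initialize() throws InstantiationException, IllegalAccessException {
      for (int c = 0; c < K; c++) {
        registerAggregator("sum_" + c, DoubleSumAggregator.class);
        registerAggregator("count_" + c, LongSumAggregator.class);
      }
    }

    @Override
    public void compute() {
      for (int c = 0; c < K; c++) {
        DoubleWritable sum = getAggregatedValue("sum_" + c);
        LongWritable count = getAggregatedValue("count_" + c);
        // New centroid location = sum / count; broadcasting it back to the
        // workers would go through another aggregator.
      }
    }
  }

  /** Each vertex assigns itself to the closest centroid and adds to its sums. */
  public static class KMeansComputation extends
      BasicComputation<LongWritable, DoubleWritable, NullWritable, NullWritable> {
    @Override
    public void compute(Vertex<LongWritable, DoubleWritable, NullWritable> vertex,
                        Iterable<NullWritable> messages) throws IOException {
      int closest = closestCentroid(vertex.getValue().get());
      aggregate("sum_" + closest, vertex.getValue());
      aggregate("count_" + closest, new LongWritable(1));
    }

    private int closestCentroid(double value) {
      return 0;  // placeholder: compare against the broadcast centroid locations
    }
  }
}
```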
  9. Problem 1: Massive dimensions. Cluster Facebook members by friendships?
     • 1 billion members (dimensions), k clusters
     • Each worker sends the master up to 1B * 2 bytes (max 5k friends) * k = 2 * k GB
     • Master receives up to 2 * k * (number of workers) GB
     • Saturated network link, OOM
  10. Sharded aggregators
     (diagram: when the master handles all aggregators, every worker sends its partial aggregates to the master, which computes and distributes the final values; with sharded aggregators, each aggregator is owned by one worker, which combines the partials for it and distributes the final value to the other workers and the master)
     • Share aggregator load across workers
     • Future work: tree-based optimizations (not yet a problem)
  11. Problem 2: Edge cut metric. Clusters should reduce the number of cut edges. Two phases:
     • Send your cluster id along all out edges
     • Aggregate edges whose endpoints have different cluster ids
     Calculate no more than once an hour?
  12. Master computation
     • Serial computation on the master
     • Communicates with workers via aggregators
     • Added to Giraph by the Stanford GPS team
     (diagram: timeline on Worker 0 and Worker 1 alternating k-means supersteps with start-cut / end-cut supersteps, as directed by the master)
  13. Problem 3: More phases, more problems. Add a stage to initialize the centroids: add random input vectors to the centroids, plus a few random friends. Two phases:
     • Randomly sample input vertices to add
     • Send messages to a few random neighbors
  14. Problem 3 (continued). Cannot easily support different message types or combiners per phase; the vertex compute code gets messy:
     if (phase == INITIALIZE_SELF)        // Randomly add to centroid
     else if (phase == INITIALIZE_FRIEND) // Add my vector to centroid if a friend selected me
     else if (phase == K_MEANS)           // Do k-means
     else if (phase == START_EDGE_CUT)    ...
  15. Composable computation: decouple the vertex from the computation. The master sets the computation and combiner classes. Reusable and composable:
     Computation                             In message        Out message       Combiner
     Add random centroid / random friends    Null              Centroid message  N/A
     Add to centroid                         Centroid message  Null              N/A
     K-means                                 Null              Null              N/A
     Start edge cut                          Null              Cluster           Cluster combiner
     End edge cut                            Cluster           Null              N/A
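A sketch of how a master might drive such phases with the composable computation API (assuming Giraph 1.1’s MasterCompute.setComputation); the phase schedule, class names, and empty compute bodies are illustrative assumptions, not the production code. Note how the start/end edge-cut phases use different incoming and outgoing message types, which is exactly what the single monolithic compute() above could not express cleanly:

```java
import java.io.IOException;

import org.apache.giraph.graph.AbstractComputation;
import org.apache.giraph.graph.BasicComputation;
import org.apache.giraph.graph.Vertex;
import org.apache.giraph.master.DefaultMasterCompute;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;

public class PhaseSwitchingSketch {

  /** Plain k-means step: no messages in or out (body elided; see the earlier sketch). */
  public static class KMeansComputation extends
      BasicComputation<LongWritable, DoubleWritable, NullWritable, NullWritable> {
    @Override
    public void compute(Vertex<LongWritable, DoubleWritable, NullWritable> vertex,
                        Iterable<NullWritable> messages) throws IOException {
      // assign to the closest centroid and aggregate the sums (omitted)
    }
  }

  /** Start edge cut: no messages in, cluster ids out. */
  public static class StartEdgeCutComputation extends
      AbstractComputation<LongWritable, DoubleWritable, NullWritable,
          NullWritable, LongWritable> {
    @Override
    public void compute(Vertex<LongWritable, DoubleWritable, NullWritable> vertex,
                        Iterable<NullWritable> messages) throws IOException {
      // send this vertex's cluster id along all out edges (omitted);
      // a cluster-id message combiner could also be set for this phase
    }
  }

  /** End edge cut: cluster ids in, nothing out. */
  public static class EndEdgeCutComputation extends
      AbstractComputation<LongWritable, DoubleWritable, NullWritable,
          LongWritable, NullWritable> {
    @Override
    public void compute(Vertex<LongWritable, DoubleWritable, NullWritable> vertex,
                        Iterable<LongWritable> messages) throws IOException {
      // count received cluster ids that differ from this vertex's cluster (omitted)
    }
  }

  /** Master picks which Computation the workers run in each superstep. */
  public static class PhaseMaster extends DefaultMasterCompute {
    private static final long CUT_EVERY = 5;  // assumed: measure the cut every 5 steps

    @Override
    public void compute() {
      long step = getSuperstep();
      if (step % CUT_EVERY == CUT_EVERY - 1) {
        setComputation(StartEdgeCutComputation.class);
      } else if (step > 0 && step % CUT_EVERY == 0) {
        setComputation(EndEdgeCutComputation.class);
      } else {
        setComputation(KMeansComputation.class);
      }
    }
  }
}
```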
  16. Composable computation (cont). Balanced label propagation:
     • Compute candidates to move to partitions
     • Probabilistically move vertices
     • Continue if the halting condition is not met (e.g. fewer than n vertices moved?)
  17. Composable computation (cont).
     Balanced label propagation: compute candidates to move to partitions, probabilistically move vertices, continue if the halting condition is not met (e.g. fewer than n vertices moved?)
     Affinity propagation: calculate and send responsibilities, calculate and send availabilities, update exemplars, continue if the halting condition is met (e.g. fewer than n vertices changed exemplars?)
  18. Faster than Hive?
     Application                     Graph size     CPU time speedup   Elapsed time speedup
     Page rank (single iteration)    400B+ edges    26x                120x
     Friends of friends score        71B+ edges     12.5x              48x
  19. Apache Giraph scalability (charts: scalability of workers at 200B edges, runtime in seconds vs. 50-300 workers, Giraph vs. ideal; scalability of edges at 50 workers, runtime in seconds vs. 1E+09-2E+11 edges, Giraph vs. ideal)
  20. A billion edges isn’t cool. You know what’s cool? A TRILLION edges.
  21. Page rank on 200 machines with 1 trillion (1,000,000,000,000) edges: under 4 minutes per iteration! (* Results from 6/30/2013 with one-to-all messaging + request processing improvements)
  22. Why balanced partitioning: random partitioning gives good balance BUT ignores entity affinity (diagram: vertices 0-11 scattered randomly across machines)
  23. Balanced partitioning application. Results from one service: cache hit rate grew from 70% to 85%, bandwidth cut in half (diagram: vertices 0-11 grouped by affinity onto machines)
  24. Balanced label propagation results (* Loosely based on Ugander and Backstrom, “Balanced label propagation for partitioning massive graphs,” WSDM '13)
  25. Avoiding out-of-core. Example: mutual friends calculation between neighbors
     1. Send your friends a list of your friends
     2. Intersect with your friend list
     With 1.23B members (as of 1/2014), 200+ average friends (2011 S1), and 8-byte ids (longs), each member sends its ~200-id list to each of its ~200 friends, roughly 1.23B * 200 * 200 * 8 bytes = 394 TB of messages; at 100 GB per machine that is 3,940 machines (not including the graph).
     (diagram: vertices A-E exchanging friend lists, e.g. D receives A:{C}, C:{A,E}, E:{C})
  26. Superstep splitting: process a subset of the source/destination edges per superstep (* currently manual; future work: automatic!) (diagram: four supersteps cycling through the combinations of source set A/B on or off and destination set A/B on or off)
  27. Debugging with GiraphicJam
  28. Giraph in production
     • Over 1.5 years in production
     • Over 100 jobs processed a week
     • 30+ applications in our internal application repository
     • Sample production job: 700B+ edges
     • Very stable: checkpointing disabled (highly loaded HDFS adds instability); retries handle intermittent failures
  29. Giraph roadmap: 2/12 - 0.1, 5/13 - 1.0, Spring 2014 - 1.1, relaxing BSP - 1.2?
     • Giraph++ (IBM Research)
     • Giraphx (University at Buffalo, SUNY)
  30. Future work: evaluate alternative computing models, performance, lowering the barrier to entry, applications
  31. Our team: Pavan Athivarapu, Avery Ching, Maja Kabiljo, Greg Malewicz, Sambavi Muthukrishnan
