
Scalable Distributed Graph Algorithms on Apache Spark


Lynx Analytics presented this talk at the Crunch Data Engineering and Analytics Conference, held in Budapest, Hungary, in October 2017.



  1. 1. András Németh, CRUNCH, Budapest, 20th October, 2017 Scalable Distributed Graph Algorithms on Apache Spark
  2. 2. Why scalable graph algorithms?
  3. 3. © Lynx Analytics Graphs are all around us … citations, social graphs, the Internet, transportation networks, protein structure, money transfers, viral infection patterns, electronic circuits, telecommunication networks, knowledge representations (e.g. Google’s Knowledge Graph), and neural networks (artificial and natural)
  4. 4. © Lynx Analytics … and they are full of hidden secrets If we look closely enough, they can: • Predict churn based on embeddedness in the call graph • Figure out demographics based on social relationships and communities • Find fraudsters in a bank’s transaction network • Help find influencers and design viral campaigns • Identify which bus routes are unnecessary and which ones need more capacity
  5. 5. © Lynx Analytics But they are large! Telco call graph: hundreds of millions of vertices and billions of edges. Google Knowledge Graph: 70 billion edges. Internet: tens of billions of vertices and hundreds of billions of edges. Brain: a hundred billion vertices and a hundred trillion edges.
  6. 6. Apache Spark – horizontal scaling to the rescue
  7. 7. © Lynx Analytics What is Apache Spark? Apache Spark is one of the world’s most popular scalable distributed data processing engines. • It takes care of the plumbing needed to run distributed algorithms on huge clusters: • breaking work down into tasks • scheduling tasks on workers • distributing input/output data and processing code • distributed file system and standard file format access • error recovery • etc. • Elegant, high-level yet powerful API • Scala, Python and R • Higher-level API add-ons: SQL, machine learning, graph processing
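     To give a feel for that high-level API, here is a minimal, self-contained Scala sketch of a Spark job (the object name and the toy data are illustrative, not part of the talk):

        import org.apache.spark.sql.SparkSession

        object WordCountSketch {
          def main(args: Array[String]): Unit = {
            // Start a local Spark session; on a cluster the master URL would point at the cluster instead.
            val spark = SparkSession.builder().appName("word-count-sketch").master("local[*]").getOrCreate()
            val sc = spark.sparkContext

            // A tiny in-memory dataset so the sketch runs without any external files.
            val lines = sc.parallelize(Seq(
              "spark makes distributed computing simple",
              "graphs on spark"))

            // The classic word count: split lines, map each word to (word, 1), sum per key.
            val counts = lines
              .flatMap(_.split("\\s+"))
              .map(word => (word, 1))
              .reduceByKey(_ + _)

            counts.collect().foreach { case (word, n) => println(s"$word -> $n") }
            spark.stop()
          }
        }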
  8. 8. © Lynx Analytics But graph algorithms are hard to parallelize • Distributed computation works by splitting the input data into partitions of manageable size • Graph algorithms are all about checking and modifying the state of neighboring vertices • An ideal partitioning would not cut through edges • Too bad that this is absolutely impossible for 99% of graphs • Methods exist to minimize edge cuts, but even one cut edge implies information exchange between partitions, which is very expensive
  9. 9. The Pregel Model
  10. 10. © Lynx Analytics Pregel model – definition Based on Google’s paper “Pregel: A System for Large-Scale Graph Processing”, Pregel is an algorithmic framework to manage (if not solve) the above difficulties. A Pregel algorithm is a repetition of the following steps: 1. Some vertex-local computation (also using the messages received – see the next point) 2. Sending messages to neighboring vertices
  11. 11. © Lynx Analytics Pregel example – shortest paths from multiple sources 1. All vertices start with an initial path length estimate of infinity, except the sources, which start with 0 2. Vertices send their current length estimate to all neighbors 3. All vertices update their estimate based on their current value and the values coming from neighbors 4. Iterate 2 and 3 until convergence, or stop after N iterations if we are only interested in paths of length at most N If vertices remember which neighbor produced the minimum in step 3, the paths themselves can be reconstructed. The algorithm is also easy to extend to different edge “lengths” and initial “starting costs”.
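     As a rough illustration, the per-vertex logic of this algorithm could be written as two small Scala functions, matching the messageGenerator / newState shapes used on the “Pregel on Spark” slide further below (the object name and the unit edge length of 1.0 are assumptions of the sketch):

        object ShortestPathLogic {
          type ID = Long
          type VertexState = Double   // current shortest-path estimate; infinity = not reached yet
          type Message = Double       // a neighbor's estimate plus the edge length

          // Step 2: offer our estimate (plus one hop) to every neighbor, once we have a finite one.
          def messageGenerator(
              sourceId: ID,
              sourceState: VertexState,
              neighbors: Iterable[ID]): Iterator[(ID, Message)] =
            if (sourceState.isInfinity) Iterator.empty
            else neighbors.iterator.map(n => (n, sourceState + 1.0))

          // Step 3: keep the minimum of the current estimate and all incoming offers.
          def newState(originalState: VertexState, messages: Iterable[Message]): VertexState =
            (Iterator(originalState) ++ messages.iterator).min
        }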
  18. 18. © Lynx Analytics Pregel example – PageRank 1. All vertices start with an initial PageRank estimate (say 1 for all) 2. All vertices send their current PageRank estimate to their out-neighbors 3. Based on the incoming PageRank estimates, all vertices recompute their own estimate 4. Repeat 2 and 3 until convergence, or until we get bored
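     A minimal sketch of what the per-vertex update could look like, assuming the usual damped PageRank formula with a damping factor of 0.85 (the slide itself does not spell out the formula):

        object PageRankLogic {
          type ID = Long

          // Step 2: split the current rank evenly among the out-neighbors.
          def messages(rank: Double, outNeighbors: Seq[ID]): Iterator[(ID, Double)] =
            if (outNeighbors.isEmpty) Iterator.empty
            else outNeighbors.iterator.map(n => (n, rank / outNeighbors.size))

          // Step 3: recompute the rank from the incoming contributions.
          def newRank(incoming: Iterable[Double], damping: Double = 0.85): Double =
            (1 - damping) + damping * incoming.sum
        }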
  19. 19. © Lynx Analytics Pregel on Spark

        // Contains the actual (vertex id, vertex state) pairs.
        var vertexStates: RDD[(ID, VertexState)] = … // code to initialize vertex states …

        while (… ! halting condition …) {
          // Returns an iterator of the (target vertex id, message) pairs sent by a given vertex.
          def messageGenerator(
              sourceId: ID,
              sourceState: VertexState,
              neighbors: Iterable[ID]): Iterator[(ID, Message)] = { … }

          val messages: RDD[(ID, Message)] =
            vertexStates.join(edgesBySource.groupByKey).flatMap {
              case (id, (state, neighbors)) => messageGenerator(id, state, neighbors)
            }

          // Returns the new state given the old state and the incoming messages.
          def newState(originalState: VertexState, messages: Iterable[Message]): VertexState = { … }

          vertexStates = vertexStates.join(messages.groupByKey).mapValues {
            case (originalState, messages) => newState(originalState, messages)
          }
        }
  20. 20. © Lynx Analytics Pregel on Spark Conceptually it’s super easy to represent a Pregel algorithm as a Spark program. There are some details to watch out for, though: • Lots of joins – they’d better be fast • Partitioning has to be controlled closely • The same partitioning should be used for the states throughout the algorithm • That partitioning must be fine-grained enough for the number of messages, not just the number of states • Potential hotspotting if a vertex generates or receives too many messages
  21. 21. © Lynx Analytics Fast joins – sorted RDDs • Built-in Spark join: • Repartitions both datasets by the hash of the join keys • Moves corresponding partition pairs to the same machine • Joins a single partition pair by collecting the key-value pairs in a map • This is somewhat slow and memory intensive • Merge joins: • much faster • constant memory overhead • Require both RDDs to be sorted by key within partitions • This is done via SortedRDD, an RDD subclass developed at Lynx
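     A minimal sketch of the core idea of a per-partition merge join, assuming both iterators are already sorted by key and keys are unique on each side (an illustration of the technique, not the SortedRDD implementation):

        object MergeJoinSketch {
          // Inner-joins two key-sorted iterators in a single pass with constant memory,
          // instead of loading one side into a hash map.
          def mergeJoin[K: Ordering, A, B](
              left: Iterator[(K, A)],
              right: Iterator[(K, B)]): Iterator[(K, (A, B))] = {
            val ord = implicitly[Ordering[K]]
            val l = left.buffered
            val r = right.buffered
            new Iterator[(K, (A, B))] {
              // Advance whichever side is behind until the keys match or one input runs out.
              private def align(): Unit =
                while (l.hasNext && r.hasNext && ord.compare(l.head._1, r.head._1) != 0) {
                  if (ord.lt(l.head._1, r.head._1)) l.next() else r.next()
                }
              def hasNext: Boolean = { align(); l.hasNext && r.hasNext }
              def next(): (K, (A, B)) = {
                align()
                val (k, a) = l.next()
                val (_, b) = r.next()
                (k, (a, b))
              }
            }
          }
        }

     For example, mergeJoin(Iterator((1, "a"), (3, "c")), Iterator((1, "x"), (2, "y"), (3, "z"))) yields (1, ("a", "x")) and (3, ("c", "z")).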
  22. 22. © Lynx Analytics Sorted joins – results
  23. 23. © Lynx Analytics Hotspots – what & why • Hotspotting means that the partitioning of the work has failed: some partitions end up with far more work than others • It causes serious performance hits even if the total amount of work is manageable • Overly large partitions can even cause OOM errors • High-degree vertices are notorious for causing hotspots in graph algorithms • This is a very typical problem with large, scale-free (in other words, realistic) graphs
  24. 24. © Lynx Analytics Hotspots – how to deal with them? Partition the work based on edges, not vertices! E.g. instead of using our original message generator on all vertices: def messageGenerator(sourceId: ID, sourceState: VertexState, neighbors: Iterable[ID]) use something like this on all edges: def messageGenerator(sourceId: ID, destinationId: ID, sourceState: VertexState) This way we never have to collect all the edges of a single vertex! A similar trick can be applied on the destination side: • Incoming messages can be pre-aggregated
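     To make the edge-based variant concrete, here is a hypothetical sketch of what it could look like for the earlier shortest-path example (the object name and the unit edge length are assumptions of the sketch):

        object EdgeBasedShortestPath {
          type ID = Long
          type VertexState = Double   // shortest-path estimate, as in the earlier sketch

          // Called once per edge, with only the source's state attached; no grouping of a
          // vertex's edges is needed, so a huge-degree vertex cannot overload a single task.
          def messageGenerator(
              sourceId: ID,
              destinationId: ID,
              sourceState: VertexState): Iterator[(ID, Double)] =
            if (sourceState.isInfinity) Iterator.empty
            else Iterator((destinationId, sourceState + 1.0))
        }

     On the receiving side, the resulting RDD of (destination id, estimate) pairs can be pre-aggregated with reduceByKey((a, b) => math.min(a, b)) before it is ever grouped per vertex.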
  25. 25. © Lynx Analytics Hotspots – join problems How exactly do you collect, say, the source states on all edges? Easy!

        val edges: RDD[(ID, ID)] // Edges represented as (src, dst) id pairs.
        val edgesWithStates: RDD[(ID, ID, VertexState)] =
          edges.groupByKey().join(vertexStates).flatMap {
            case (src, (dsts, vertexState)) => dsts.map(dst => (src, dst, vertexState))
          }

     Wait a second! That groupByKey in itself can create a hotspot! It does exactly what we pledged not to do: it collects all edges of a vertex into a single partition…
  26. 26. © Lynx Analytics Hybrid lookup – the task The technique we use to solve this problem is what we call a hybrid lookup. Problem statement: we are given two RDDs over the same keyspace: val hybrid: RDD[(K, V1)] and val lookupTable: RDD[(K, V2)]. In lookupTable we know that all keys are unique, but hybrid may contain the same key many, many times. The task is to look up every key of hybrid in lookupTable and return: val result: RDD[(K, (V1, V2))]
  27. 27. © Lynx Analytics Hybrid lookup – implementation 1. Split hybrid into two sets: • the entries with the really large keys (hybridLarges) • the entries with the rest of the keys (hybridSmalls) 2. For the small keys, use the standard, join-based lookup (this includes repartitioning hybridSmalls by key) 3. For the large keys, send their lookup values to all partitions of hybridLarges and use that map to perform the lookup (no repartitioning of hybridLarges!) 4. Take the union of the results from 2 and 3 The use of hybrid joins and the techniques explained above resolved a lot of performance instability and Spark crashes in LynxKite.
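     A rough Spark sketch of this idea, using a simple per-key frequency threshold to decide which keys count as “large” (the threshold, the names and the broadcast-based mechanics are assumptions of the sketch, not the LynxKite implementation):

        import org.apache.spark.SparkContext
        import org.apache.spark.rdd.RDD
        import scala.reflect.ClassTag

        object HybridLookupSketch {
          // Looks up every key of `hybrid` in `lookupTable` (whose keys are unique). Keys occurring
          // more than `threshold` times are handled via a broadcast map instead of a shuffle join,
          // so no single partition has to receive all copies of a hot key.
          def hybridLookup[K: ClassTag, V1: ClassTag, V2: ClassTag](
              sc: SparkContext,
              hybrid: RDD[(K, V1)],
              lookupTable: RDD[(K, V2)],
              threshold: Long): RDD[(K, (V1, V2))] = {
            // Find the hot ("large") keys by counting occurrences.
            val largeKeys = hybrid
              .mapValues(_ => 1L)
              .reduceByKey(_ + _)
              .filter { case (_, count) => count > threshold }
              .keys
              .collect()
              .toSet
            val largeKeysBc = sc.broadcast(largeKeys)

            val hybridSmalls = hybrid.filter { case (k, _) => !largeKeysBc.value.contains(k) }
            val hybridLarges = hybrid.filter { case (k, _) => largeKeysBc.value.contains(k) }

            // Small keys: the standard, join-based lookup (repartitions hybridSmalls by key).
            val smallResult = hybridSmalls.join(lookupTable)

            // Large keys: ship their lookup values to every partition; hybridLarges is not repartitioned.
            val largeLookupBc = sc.broadcast(
              lookupTable.filter { case (k, _) => largeKeysBc.value.contains(k) }.collect().toMap)
            val largeResult = hybridLarges.flatMap { case (k, v1) =>
              largeLookupBc.value.get(k).map(v2 => (k, (v1, v2)))
            }

            smallResult.union(largeResult)
          }
        }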
  28. 28. Monte Carlo for parallelization
  29. 29. © Lynx Analytics Yet another Pregel-compatible algorithm – connected components 1. All vertices use their own id as their starting state 2. Every vertex sends its current state to its neighbors 3. States are updated to the minimum of the current state and the received messages 4. Repeat 2 and 3 until convergence Notice that on termination each vertex’s state will be the lowest id in its connected component. Exactly what we need to differentiate the components! Great!
  30. 30. © Lynx Analytics Yet another Pregel-compatible algorithm – connected components 1. All vertices use their own id as their starting state 2. Every vertex sends its current state to its neighbors 3. States are updated to the minimum of the current state and the received messages 4. Repeat 2 and 3 until convergence Notice that on termination each vertex’s state will be the lowest id in its connected component. Exactly what we need to differentiate the components! Great! Or is it? We may need tons of iterations: the lowest id spreads only one hop per step, so the number of iterations can reach the diameter of the graph.
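     For reference, the per-vertex logic of this label-propagation variant could be sketched in the same hypothetical style as the earlier examples:

        object ConnectedComponentsLogic {
          type ID = Long

          // Step 2: advertise the lowest component id seen so far to every neighbor.
          def messageGenerator(sourceId: ID, state: ID, neighbors: Iterable[ID]): Iterator[(ID, ID)] =
            neighbors.iterator.map(n => (n, state))

          // Step 3: keep the minimum of the current state and everything received.
          def newState(originalState: ID, messages: Iterable[ID]): ID =
            (Iterator(originalState) ++ messages.iterator).min
        }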
  31. 31. © Lynx Analytics Randomness to the rescue – connected components, take 2 1. Let’s party! Each node organizes a party with probability ½. All neighbors are invited! 2. Non-organizers choose a neighboring party to attend (social pariahs with no partying neighbor start their own one-person party) 3. We contract each party into a single vertex, creating a new graph of parties 4. We recurse on the new party graph until we run out of edges This algorithm is expected to finish in O(log N) iterations. (Based on an algorithm from "A Model of Computation for MapReduce" by Karloff et al.)
  32. 32. © Lynx Analytics Randomness to the rescue – connected components, take 2 1. Let’s party! Each node organizes a party with probability ½. All neighbors are invited! 2. Non-organizers choose a neighboring party to attend (social pariahs with no partying neighbor start their own one-person party) 3. We contract each party into a single vertex, creating a new graph of parties 4. We recurse on the new party graph until we run out of edges This algorithm is expected to finish in O(log N) iterations. (Based on an algorithm from "A Model of Computation for MapReduce" by Karloff et al.) Small performance trick: switch to a single machine when the graph gets small.
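     A rough Spark sketch of a single contraction round, under simplifying assumptions: every undirected edge appears in both directions, the per-vertex coin flip is derived from a seed, a non-organizer attends the organizing neighbor with the smallest id, and a vertex with no organizing neighbor keeps its own one-person party (illustrative code, not the LynxKite implementation):

        import org.apache.spark.rdd.RDD

        object RandomContractionSketch {
          type ID = Long

          // Performs one "party" round: returns the (vertex -> party id) assignment and the
          // edges of the contracted party graph, on which the algorithm would then recurse.
          def contractOnce(
              vertices: RDD[ID],
              edges: RDD[(ID, ID)],   // assumed to contain each undirected edge in both directions
              seed: Long): (RDD[(ID, ID)], RDD[(ID, ID)]) = {
            // 1. Each vertex organizes a party with probability 1/2 (a deterministic per-vertex coin flip).
            val organizes: RDD[(ID, Boolean)] =
              vertices.map(v => (v, new scala.util.Random(seed ^ v).nextBoolean()))

            // 2. Each vertex finds one organizing neighbor, if it has any (here simply the smallest id).
            val organizingNeighbor: RDD[(ID, ID)] = edges
              .map { case (src, dst) => (dst, src) }     // key by the potential organizer
              .join(organizes)                           // (dst, (src, dstOrganizes))
              .flatMap { case (dst, (src, dstOrganizes)) =>
                if (dstOrganizes) Some((src, dst)) else None }
              .reduceByKey((a, b) => math.min(a, b))

            // Organizers host their own party; others attend a neighbor's party if they can.
            val party: RDD[(ID, ID)] = organizes
              .leftOuterJoin(organizingNeighbor)
              .map {
                case (v, (true, _))          => (v, v)
                case (v, (false, Some(org))) => (v, org)
                case (v, (false, None))      => (v, v)   // one-person party
              }

            // 3. Build the party graph: map both endpoints to their party, drop loops and duplicates.
            val halfMapped = edges.join(party).map { case (src, (dst, srcParty)) => (dst, srcParty) }
            val newEdges = halfMapped.join(party)
              .map { case (dst, (srcParty, dstParty)) => (srcParty, dstParty) }
              .filter { case (a, b) => a != b }
              .distinct()

            (party, newEdges)
          }
        }

     Recursing on (party, newEdges) until newEdges is empty, and then composing the per-round party maps, yields the final components; the "switch to a single machine" trick would replace the last few rounds of the recursion.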
  33. 33. © Lynx Analytics Connected component search – runtimes
  34. 34. Thank you!
