Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Flink Gelly - Karlsruhe - June 2015

1,462 views

Published on

  • Be the first to comment

Flink Gelly - Karlsruhe - June 2015

  1. 1. Andra Lungu Flink committer andra.lungu @campus.tu-berlin.de Large-Scale Graph Processing with Apache Flink
  2. 2. What is Gelly?  Large-scale graph processing API  On top of Flink’s Java API  Official release: Flink 0.9  Off-the shelf library methods  Supports record and graph analysis applications; iterative algorithms 2
  3. 3. The Growing Flink Stack 3
  4. 4. How to use Gelly? 4
  5. 5. Graph Creation 5
  6. 6. Graph Properties  getVertices()  getEdges()  getVertexIds()  getEdgeIds()  inDegrees()  outDegrees()  getDegrees()  numberOfVertices()  numberOfEdges()  getTriplets() 6
  7. 7. Graph Transformations  Map • mapVertices(final MapFunction<Vertex<K, VV>, NV> mapper) • mapEdges(final MapFunction<Edge<K, EV>, NV> mapper)  Filter • filterOnVertices(FilterFunction<Vertex<K, VV>> vertexFilter) • filterOnEdges(FilterFunction<Edge<K, EV>> edgeFilter) • subgraph(FilterFunction<Vertex<K, VV>> vertexFilter, FilterFunction<Edge<K, EV>> edgeFilter) 7
  8. 8. Filter Functions 8
  9. 9. Graph Transformations  Join • joinWithVertices(DataSet<Tuple2<K, T>> inputDataSet, final MapFunction<Tuple2<VV, T>, VV> mapper) • joinWithEdges(DataSet<Tuple3<K, K, T>> inputDataSet, final MapFunction<Tuple2<EV, T>, EV> mapper) • joinWithEdgesOnSource / joinWithEdgesOnTarget  Reverse  Undirected 9
  10. 10. Union 10
  11. 11. Graph Mutations  addVertex(final Vertex<K, VV> vertex)  addVertices(List<Vertex<K, VV>> verticesToAdd)  addEdge(Vertex<K, VV> source, Vertex<K, VV> target, EV edgeValue)  addEdges(List<Edge<K, EV>> newEdges)  removeVertex(Vertex<K, VV> vertex)  removeVertices(List<Vertex<K, VV>> verticesToBeRemoved)  removeEdge(Edge<K, EV> edge)  removeEdges(List<Edge<K, EV>> edgesToBeRemoved) 11
  12. 12. Neighborhood Methods  reduceOnNeighbors(reduceNeighborsFunc tion, direction)  reduceOnEdges  groupReduceOnNeighbors; groupReduceOnEdges 12
  13. 13. Graph Validation  Given criteria: • Edge IDs correspond to vertex IDs 13
  14. 14. Vertex-centric Iterations  Pregel [BSP] Execution Model  UDFs: • Messaging Function • VertexUpdateFunction  S-1: receive messages from neighbors  S: update vertex values  S+1: send new value to neighbors 14
  15. 15. Single Source Shortest Paths 15
  16. 16. SSSP – Second Superstep 16
  17. 17. SSSP - Result 17
  18. 18. SSSP – code snippet 18
  19. 19. Gather-Sum-Apply Iterations  UDFs: • GatherFunction • SumFunction • ApplyFunction  Back to SSSP: • Gather: neighbor value + edge weight • Sum/Accumulate: choose min • Apply: compare computed min and update 19
  20. 20. SSSP – Superstep 1 20
  21. 21. SSSP – Superstep 2 21
  22. 22. SSSP - Result 22
  23. 23. SSSP – code snippet 23
  24. 24. Vertex-centric or GSA?  Messaging = Gather + Sum  Update = Apply  Skewed graphs? – GSA (parallel gather)  coGroup vs. reduce  GSA gathers from immediate neighbors;  Vertex-centric send to any vertex 24
  25. 25. Library of Algorithms  Weakly Connected Components  Community Detection  Page Rank  Single Source Shortest Paths  Label Propagation 25
  26. 26. Music Profiles Example 26
  27. 27. Input Data  <user-id, song-id, play-count>  Set of bad records [IDs] 27
  28. 28. Filter out Bad Records 28
  29. 29. Compute Top Songs/User 29
  30. 30. Compute Top Songs/User 30
  31. 31. Create a user-user Graph 31
  32. 32. Create a user-user Graph 32
  33. 33. Cluster Similar Users 33
  34. 34. Coming up Next  Gelly Blog Post  Scala API  More Library Methods  Flink Streaming Integration  Graph Partitioning Techniques  Specialized Operators for Highly Skewed Graphs  Bipartite Graph Support Curious? Gelly Roadmap 34
  35. 35. flink.apache.org @ApacheFlink user@flink.apache.org dev@flink.apache.org

×