Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Upcoming SlideShare
Loading in …5
×

# Flink Gelly - Karlsruhe - June 2015

1,462 views

Published on

• Full Name
Comment goes here.

Are you sure you want to Yes No
Your message goes here
• Be the first to comment

### Flink Gelly - Karlsruhe - June 2015

1. 1. Andra Lungu Flink committer andra.lungu @campus.tu-berlin.de Large-Scale Graph Processing with Apache Flink
2. 2. What is Gelly?  Large-scale graph processing API  On top of Flink’s Java API  Official release: Flink 0.9  Off-the shelf library methods  Supports record and graph analysis applications; iterative algorithms 2
3. 3. The Growing Flink Stack 3
4. 4. How to use Gelly? 4
5. 5. Graph Creation 5
6. 6. Graph Properties  getVertices()  getEdges()  getVertexIds()  getEdgeIds()  inDegrees()  outDegrees()  getDegrees()  numberOfVertices()  numberOfEdges()  getTriplets() 6
7. 7. Graph Transformations  Map • mapVertices(final MapFunction<Vertex<K, VV>, NV> mapper) • mapEdges(final MapFunction<Edge<K, EV>, NV> mapper)  Filter • filterOnVertices(FilterFunction<Vertex<K, VV>> vertexFilter) • filterOnEdges(FilterFunction<Edge<K, EV>> edgeFilter) • subgraph(FilterFunction<Vertex<K, VV>> vertexFilter, FilterFunction<Edge<K, EV>> edgeFilter) 7
8. 8. Filter Functions 8
9. 9. Graph Transformations  Join • joinWithVertices(DataSet<Tuple2<K, T>> inputDataSet, final MapFunction<Tuple2<VV, T>, VV> mapper) • joinWithEdges(DataSet<Tuple3<K, K, T>> inputDataSet, final MapFunction<Tuple2<EV, T>, EV> mapper) • joinWithEdgesOnSource / joinWithEdgesOnTarget  Reverse  Undirected 9
10. 10. Union 10
11. 11. Graph Mutations  addVertex(final Vertex<K, VV> vertex)  addVertices(List<Vertex<K, VV>> verticesToAdd)  addEdge(Vertex<K, VV> source, Vertex<K, VV> target, EV edgeValue)  addEdges(List<Edge<K, EV>> newEdges)  removeVertex(Vertex<K, VV> vertex)  removeVertices(List<Vertex<K, VV>> verticesToBeRemoved)  removeEdge(Edge<K, EV> edge)  removeEdges(List<Edge<K, EV>> edgesToBeRemoved) 11
12. 12. Neighborhood Methods  reduceOnNeighbors(reduceNeighborsFunc tion, direction)  reduceOnEdges  groupReduceOnNeighbors; groupReduceOnEdges 12
13. 13. Graph Validation  Given criteria: • Edge IDs correspond to vertex IDs 13
14. 14. Vertex-centric Iterations  Pregel [BSP] Execution Model  UDFs: • Messaging Function • VertexUpdateFunction  S-1: receive messages from neighbors  S: update vertex values  S+1: send new value to neighbors 14
15. 15. Single Source Shortest Paths 15
16. 16. SSSP – Second Superstep 16
17. 17. SSSP - Result 17
18. 18. SSSP – code snippet 18
19. 19. Gather-Sum-Apply Iterations  UDFs: • GatherFunction • SumFunction • ApplyFunction  Back to SSSP: • Gather: neighbor value + edge weight • Sum/Accumulate: choose min • Apply: compare computed min and update 19
20. 20. SSSP – Superstep 1 20
21. 21. SSSP – Superstep 2 21
22. 22. SSSP - Result 22
23. 23. SSSP – code snippet 23
24. 24. Vertex-centric or GSA?  Messaging = Gather + Sum  Update = Apply  Skewed graphs? – GSA (parallel gather)  coGroup vs. reduce  GSA gathers from immediate neighbors;  Vertex-centric send to any vertex 24
25. 25. Library of Algorithms  Weakly Connected Components  Community Detection  Page Rank  Single Source Shortest Paths  Label Propagation 25
26. 26. Music Profiles Example 26
27. 27. Input Data  <user-id, song-id, play-count>  Set of bad records [IDs] 27
28. 28. Filter out Bad Records 28
29. 29. Compute Top Songs/User 29
30. 30. Compute Top Songs/User 30
31. 31. Create a user-user Graph 31
32. 32. Create a user-user Graph 32
33. 33. Cluster Similar Users 33
34. 34. Coming up Next  Gelly Blog Post  Scala API  More Library Methods  Flink Streaming Integration  Graph Partitioning Techniques  Specialized Operators for Highly Skewed Graphs  Bipartite Graph Support Curious? Gelly Roadmap 34
35. 35. flink.apache.org @ApacheFlink user@flink.apache.org dev@flink.apache.org