Vasia Kalavri – Training: Gelly School

Flink Forward 2015

Published in: Technology

  1. Gelly School
     Vasia Kalavri
     Apache Flink PMC, PhD student @KTH
     kalavri@kth.se, @vkalavri
  2. Meet Gelly
     ● Java & Scala Graph APIs on top of Flink
     ● Library of common Graph algorithms
     ● Iteration abstractions
     ● Can be seamlessly mixed with the DataSet Flink API → easily implement applications that use both record-based and graph-based analysis
  3. Gelly in the Flink stack
     [Diagram labels: Scala API (batch and streaming); Java API (batch and streaming); FlinkML; Gelly; Table API; Python API; Runtime and Execution Environments; Data storage. Gelly components: Transformations and Utilities; Iterative Graph Processing; Graph Library]
  4. Hello, Gelly!

     Java:

         ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
         DataSet<Edge<Long, NullValue>> edges = getEdgesDataSet(env);
         Graph<Long, Long, NullValue> graph = Graph.fromDataSet(edges, env);
         DataSet<Vertex<Long, Long>> verticesWithMinIds = graph.run(
             new ConnectedComponents(maxIterations));

     Scala:

         val env = ExecutionEnvironment.getExecutionEnvironment
         val edges: DataSet[Edge[Long, NullValue]] = getEdgesDataSet(env)
         val graph = Graph.fromDataSet(edges, env)
         val components = graph.run(new ConnectedComponents(maxIterations))
  5. Tasks for Today
     Go to http://gellyschool.com/flink-forward
     ● Tutorial#0: Why Graphs and Gelly Basics
     ● Tutorial#1: Calculate Degree Distributions
     ● Tutorial#2: PageRank
     ● Tutorial#3: People you Might Know
     Skeleton: http://github.com/vasia/gelly-school/tree/ff-skeleton
     Solutions: http://github.com/vasia/gelly-school/tree/ff-solutions
     ...or in the home folder of your VM :-)
  6. Today you will learn how to...
     ● Create a Graph from a file of edges
     ● Compute simple Graph properties
     ● Use Gelly’s neighborhood methods
     ● Run Gelly library algorithms
     ● Use DataSet and Gelly APIs together
  7. Today you will not learn how to...
     ● Use the Scala Gelly API
     ● Write your own vertex-centric or gather-sum-apply iterative programs
  8. Tutorial#0: Let’s get Started!
  9. Vertex<K, VV> takes an ID type and a value type. Edge<K, EV> takes one ID type, shared by the source and target IDs, and an edge value type.

         // create a vertex with ID=42 and value=0.8
         Vertex<Integer, Double> v = new Vertex<Integer, Double>(42, 0.8);

         // create an edge from 5 to 6 with value="foo"
         Edge<Integer, String> e = new Edge<Integer, String>(5, 6, "foo");

         // create an edge from 5 to 6 with no value
         Edge<Integer, NullValue> e2 = new Edge<Integer, NullValue>(5, 6, NullValue.getInstance());

  10. Creating a Graph from Vertex and Edge DataSets, or from a Tuple3 DataSet:

          ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

          // create a Graph from Vertex and Edge DataSets
          DataSet<Vertex<String, Long>> vertices = ...
          DataSet<Edge<String, Double>> edges = ...
          Graph<String, Long, Double> g1 = Graph.fromDataSet(vertices, edges, env);
          ...
          // create a Graph from a Tuple3 DataSet
          DataSet<Tuple3<String, String, Double>> edges = ...
          Graph<String, NullValue, Double> g2 = Graph.fromTupleDataSet(edges, env);

  11. Creating a Graph from a Tuple2 DataSet:

          // create a Graph from a Tuple2 DataSet
          DataSet<Tuple2<String, String>> input = ...
          DataSet<Edge<String, NullValue>> edges = input.map(
              new MapFunction<Tuple2<String, String>, Edge<String, NullValue>>() {
                  public Edge<String, NullValue> map(Tuple2<String, String> in) {
                      // use the parameterized Edge type instead of the raw type
                      return new Edge<String, NullValue>(in.f0, in.f1, NullValue.getInstance());
                  }
              });
          Graph<String, NullValue, NullValue> g3 = Graph.fromDataSet(edges, env);
  12. Tutorial#1: Degree Distributions
  13. [Figure: example graph with vertices 1 to 5]

          vertexID  in-degree  out-degree  degree
          1         0          2           2
          2         2          2           4
          3         2          1           3
          4         2          1           3
          5         1          1           2

          degree  #vertices  distribution
          2       2          2/5
          3       2          2/5
          4       1          1/5
  14. Tutorial#2: PageRank
  15. [Figure: example graph with vertices 1 to 5]

          vertexID  out-degree  transition probability
          1         2           1/2
          2         2           1/2
          3         0           -
          4         3           1/3
          5         1           1

      simplified PageRank: PR(3) = 0.5*PR(1) + 0.33*PR(4) + PR(5)
  16. Assigning the transition probabilities as edge weights:

          Graph<Long, Double, Double> network = ...
          DataSet<Tuple2<Long, Long>> vertexOutDegrees = network.outDegrees();

          // assign the transition probabilities as the edge weights
          Graph<Long, Double, Double> networkWithWeights = network.joinWithEdgesOnSource(vertexOutDegrees,
              new MapFunction<Tuple2<Double, Long>, Double>() {
                  // value.f0: current Edge value; value.f1: value from degrees; result: new Edge value
                  public Double map(Tuple2<Double, Long> value) {
                      return value.f0 / value.f1;
                  }
              });
  17. Interactions as weights
      Tom RT Wendy
      Tom RT Mary
      Tom RT Wendy
      Sarah RT James
      Tom RT Jim
      Tom RT Wendy
      Jim RT Obama
      ...
      [Figure: three graphs over Tom, Wendy, Mary, and Jim; Tom's edges carry the weights 3, 1, 1 and then 0.6, 0.2, 0.2]
  18. Tutorial#3: People you Might Know
  19. [Figure: graph of Steve, Wendy, Carol, Randy, and Shelly]
      Steve knows Wendy, Wendy knows Carol → Steve knows Carol?
      Steve knows Randy, Randy and Wendy know Shelly → Steve knows Shelly?
  20. Feeling Gelly?
      Gelly programming guide: https://ci.apache.org/projects/flink/flink-docs-master/libs/gelly_guide.html
      Blog post: https://flink.apache.org/news/2015/08/24/introducing-flink-gelly.html
      More Gelly School: http://gellyschool.com
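
The degree distribution from Tutorial#1 (slide 13) can be sketched in plain Java, without Flink or Gelly. This is a minimal illustration rather than the tutorial solution: the class `DegreeDistribution`, its method names, and `EXAMPLE_EDGES` are assumptions. The transcript lists only the degrees, so the edge list below is one hypothetical graph chosen to reproduce the in-degree and out-degree columns of the slide's table.

```java
import java.util.Map;
import java.util.TreeMap;

final class DegreeDistribution {

    // Hypothetical directed edges {source, target}; not taken from the slides.
    // They reproduce the degrees in the slide 13 table.
    static final long[][] EXAMPLE_EDGES = {
        {1, 2}, {1, 3}, {2, 3}, {2, 4}, {3, 4}, {4, 5}, {5, 2}
    };

    /** Total degree (in-degree + out-degree) of every vertex that appears in an edge. */
    static Map<Long, Long> degrees(long[][] edges) {
        Map<Long, Long> degrees = new TreeMap<>();
        for (long[] edge : edges) {
            degrees.merge(edge[0], 1L, Long::sum); // counts towards the source's out-degree
            degrees.merge(edge[1], 1L, Long::sum); // counts towards the target's in-degree
        }
        return degrees;
    }

    /** Fraction of vertices that have each degree value. */
    static Map<Long, Double> distribution(long[][] edges) {
        Map<Long, Long> degrees = degrees(edges);
        Map<Long, Long> verticesPerDegree = new TreeMap<>();
        for (long degree : degrees.values()) {
            verticesPerDegree.merge(degree, 1L, Long::sum);
        }
        double numVertices = degrees.size();
        Map<Long, Double> distribution = new TreeMap<>();
        for (Map.Entry<Long, Long> entry : verticesPerDegree.entrySet()) {
            distribution.put(entry.getKey(), entry.getValue() / numVertices);
        }
        return distribution;
    }

    public static void main(String[] args) {
        System.out.println(distribution(EXAMPLE_EDGES)); // prints {2=0.4, 3=0.4, 4=0.2}
    }
}
```

The result matches the 2/5, 2/5, 1/5 column of the slide. Vertices without any edge never appear in the edge list, so this sketch does not count them.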

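The "Interactions as weights" idea from slide 17 can also be sketched in plain Java: count how often each user retweets each other user, then divide by the source user's total so that the outgoing weights sum to 1. The class `InteractionWeights` and its method name are assumptions, and the input contains only the seven retweets spelled out on the slide; the entries elided by "..." are not filled in.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class InteractionWeights {

    // The retweets listed on slide 17 as {source, target} pairs.
    static final String[][] RETWEETS = {
        {"Tom", "Wendy"}, {"Tom", "Mary"}, {"Tom", "Wendy"}, {"Sarah", "James"},
        {"Tom", "Jim"}, {"Tom", "Wendy"}, {"Jim", "Obama"}
    };

    /** For each source, maps every target to its share of the source's interactions. */
    static Map<String, Map<String, Double>> normalizedWeights(String[][] interactions) {
        Map<String, Map<String, Integer>> counts = new LinkedHashMap<>();
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (String[] interaction : interactions) {
            String source = interaction[0];
            String target = interaction[1];
            // count interactions per (source, target) pair and per source
            counts.computeIfAbsent(source, k -> new LinkedHashMap<>())
                  .merge(target, 1, Integer::sum);
            totals.merge(source, 1, Integer::sum);
        }
        Map<String, Map<String, Double>> weights = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, Integer>> bySource : counts.entrySet()) {
            double total = totals.get(bySource.getKey());
            Map<String, Double> normalized = new LinkedHashMap<>();
            for (Map.Entry<String, Integer> byTarget : bySource.getValue().entrySet()) {
                normalized.put(byTarget.getKey(), byTarget.getValue() / total);
            }
            weights.put(bySource.getKey(), normalized);
        }
        return weights;
    }

    public static void main(String[] args) {
        // Tom retweets Wendy 3 times, Mary once, and Jim once.
        System.out.println(normalizedWeights(RETWEETS).get("Tom")); // prints {Wendy=0.6, Mary=0.2, Jim=0.2}
    }
}
```

In a Gelly program the same normalization would be expressed with DataSet and Graph operations; this version only illustrates the arithmetic behind the 3, 1, 1 and 0.6, 0.2, 0.2 weights on the slide.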