A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem Aliev and Russell Spitzer

Graph is on the rise and it’s time to start learning about scalable graph analytics! In this session we will go over two Spark-based graph analytics frameworks: Tinkerpop and GraphFrames. While both frameworks can express very similar traversals, they have different performance characteristics and APIs. In this deep dive by example, we will demonstrate some common traversals and explain how, at the Spark level, each traversal is actually computed under the hood. Learn both the fluent Gremlin API and the powerful GraphFrames Motif API as we show examples of both side by side. No need to be familiar with graphs or Spark for this presentation, as we’ll be explaining everything from the ground up!

Published in: Data & Analytics

  1. 1. Artem Aliev and Russell Spitzer, DataStax A Tale of Two Graph Frameworks on Spark: 
 GraphFrames and Tinkerpop OLAP #EUeco3
  2. 2. #EUeco3 Pierrot and Harlequin • Artem • Graph Analytics Expert • Earth • Russell • Distributed Systems Enthusiast • Earth 2
  3. 3. Tinkerpop and GraphFrames provide Complementary Approaches for Graph Analytics DataSet Catalyst GraphFrames 3#EUeco3
  4. 4. Graphs are Vertices and Edges 4 Vertices are things and edges represent their relations to one another #EUeco3
  5. 5. Graphs are Vertices and Edges 5 Registry: USS Enterprise (NCC-1701-C) Class: Ambassador Service: 2332[11] – 2344 (12 Years) Registry: USS Enterprise (NCC-1701-D) Class: Galaxy Service: 2363–2371 (8 Years) Registry: USS Enterprise (NCC-1701) Class: Constitution class[6] Service: 2245–2285 (40 Years) Registry: USS Enterprise (NCC-1701-A) Class: Enterprise class[8][9] Service: 2286–2293 (7 Years) #EUeco3
  6. 6. Graphs are Vertices and Edges 6 Registry: USS Enterprise (NCC-1701-C) Class: Ambassador Service: 2332[11] – 2344 (12 Years) Registry: USS Enterprise (NCC-1701-D) Class: Galaxy Service: 2363–2371 (8 Years) Registry: USS Enterprise (NCC-1701) Class: Constitution class[6] Service: 2245–2285 (40 Years) Registry: USS Enterprise (NCC-1701-A) Class: Enterprise class[8][9] Service: 2286–2293 (7 Years) Vertex Properties #EUeco3
  7. 7. Graphs are Vertices and Edges 7 Registry: USS Enterprise (NCC-1701-C) Class: Ambassador Service: 2332[11] – 2344 (12 Years) Registry: USS Enterprise (NCC-1701-D) Class: Galaxy Service: 2363–2371 (8 Years) Registry: USS Enterprise (NCC-1701) Class: Constitution class[6] Service: 2245–2285 (40 Years) Registry: USS Enterprise (NCC-1701-A) Class: Enterprise class[8][9] Service: 2286–2293 (7 Years) succeeded by succeeded by succeeded by #EUeco3
  8. 8. Graphs are Vertices and Edges 8 Registry: USS Enterprise (NCC-1701-C) Class: Ambassador Service: 2332[11] – 2344 (12 Years) Registry: USS Enterprise (NCC-1701-D) Class: Galaxy Service: 2363–2371 (8 Years) Registry: USS Enterprise (NCC-1701) Class: Constitution class[6] Service: 2245–2285 (40 Years) Registry: USS Enterprise (NCC-1701-A) Class: Enterprise class[8][9] Service: 2286–2293 (7 Years) Edge Edge Label succeeded by succeeded by succeeded by #EUeco3
  9. 9. Graphs are Vertices and Edges 9 Registry: USS Enterprise (NCC-1701-C) Class: Ambassador Service: 2332[11] – 2344 (12 Years) Registry: USS Enterprise (NCC-1701-D) Class: Galaxy Service: 2363–2371 (8 Years) Registry: USS Enterprise (NCC-1701) Class: Constitution class[6] Service: 2245–2285 (40 Years) Registry: USS Enterprise (NCC-1701-A) Class: Enterprise class[8][9] Service: 2286–2293 (7 Years) Ship Ship Ship Ship Vertex Label succeeded by succeeded by succeeded by #EUeco3
  10. 10. Graphs are Vertices and Edges 10 Registry: USS Enterprise (NCC-1701-C) Class: Ambassador Service: 2332[11] – 2344 (12 Years) Registry: USS Enterprise (NCC-1701-D) Class: Galaxy Service: 2363–2371 (8 Years) Registry: USS Enterprise (NCC-1701) Class: Constitution class Service: 2245–2285 (40 Years) Ship Ship Ship Ship Position: Captain
 Name: Kirk Position: Captain
 Name: Picard Crew Crew succeeded by succeeded by succeeded by #EUeco3
  11. 11. Graphs are Vertices and Edges 11 Registry: USS Enterprise (NCC-1701-C) Class: Ambassador Service: 2332[11] – 2344 (12 Years) Registry: USS Enterprise (NCC-1701-D) Class: Galaxy Service: 2363–2371 (8 Years) Registry: USS Enterprise (NCC-1701) Class: Constitution class Service: 2245–2285 (40 Years) Registry: USS Enterprise (NCC-1701-A) Class: Enterprise class Service: 2286–2293 (7 Years) Ship Ship Ship Ship Position: Captain
 Name: Kirk Position: Captain
 Name: Picard Crew Crew succeeded by succeeded by succeeded by served on served on served on served on #EUeco3
  12. 12. Graphs are Vertices and Edges 12 Registry: USS Enterprise (NCC-1701-C) Class: Ambassador Service: 2332[11] – 2344 (12 Years) Registry: USS Enterprise (NCC-1701-D) Class: Galaxy Service: 2363–2371 (8 Years) Registry: USS Enterprise (NCC-1701) Class: Constitution class Service: 2245–2285 (40 Years) Registry: USS Enterprise (NCC-1701-A) Class: Enterprise class Service: 2286–2293 (7 Years) Ship Ship Ship Ship Position: Captain
 Name: Kirk Position: Captain
 Name: Picard Crew Crew succeeded by succeeded by succeeded by served on served on served on served on But why do I want this? #EUeco3
  13. 13. Graphs let us ask questions about our data based on their relations 13 What Captain Served After Kirk? What Ship was two after the NCC-1701? #EUeco3
  14. 14. Traversals involve following paths through the Graph 14 Registry: USS Enterprise (NCC-1701-C) Class: Ambassador Service: 2332[11] – 2344 (12 Years) Registry: USS Enterprise (NCC-1701-D) Class: Galaxy Service: 2363–2371 (8 Years) Registry: USS Enterprise (NCC-1701) Class: Constitution class Service: 2245–2285 (40 Years) Registry: USS Enterprise (NCC-1701-A) Class: Enterprise class Service: 2286–2293 (7 Years) Ship Ship Ship Ship Position: Captain
 Name: Kirk Position: Captain
 Name: Picard Crew Crew succeeded by succeeded by succeeded by served on served on served on served on #EUeco3
  15. 15. What Captain was After Kirk? 15 Registry: USS Enterprise (NCC-1701-C) Class: Ambassador Service: 2332[11] – 2344 (12 Years) Registry: USS Enterprise (NCC-1701-A) Class: Enterprise class Service: 2286–2293 (7 Years) Ship Ship Position: Captain
 Name: Kirk Position: Captain
 Name: Picard Crew Crew succeeded by served on served on #EUeco3
  16. 16. What Ship was two after the NCC-1701? 16 Registry: USS Enterprise (NCC-1701-C) Class: Ambassador Service: 2332[11] – 2344 (12 Years) Registry: USS Enterprise (NCC-1701) Class: Constitution class Service: 2245–2285 (40 Years) Registry: USS Enterprise (NCC-1701-A) Class: Enterprise class Service: 2286–2293 (7 Years) Ship Ship Ship succeeded by succeeded by #EUeco3
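The two questions on slides 15 and 16 can be sketched as path-following over labeled edges. This is a plain-Python illustration, not TinkerPop or GraphFrames code. The helper names (`out`, `in_`, `ship_two_after`, `captains_after`) are hypothetical, and the full edge list is an assumption: the slides only confirm that Kirk served on the NCC-1701-A, that Picard served on the NCC-1701-C, and that NCC-1701 is followed by NCC-1701-A and then NCC-1701-C.

```python
# Edges grouped by label: label -> {source vertex: [target vertices]}.
# Assumption: the edges not drawn on slides 15-16 are filled in by guesswork.
edges = {
    "succeeded by": {
        "NCC-1701": ["NCC-1701-A"],
        "NCC-1701-A": ["NCC-1701-C"],
        "NCC-1701-C": ["NCC-1701-D"],
    },
    "served on": {
        "Kirk": ["NCC-1701", "NCC-1701-A"],
        "Picard": ["NCC-1701-C", "NCC-1701-D"],
    },
}


def out(vertices, label):
    """Follow outgoing edges with the given label from each vertex."""
    return [t for v in vertices for t in edges[label].get(v, [])]


def in_(vertices, label):
    """Follow edges with the given label backwards, from target to source."""
    return [s for v in vertices
            for s, targets in edges[label].items() if v in targets]


def ship_two_after(registry):
    """Two 'succeeded by' hops starting from one ship."""
    return out(out([registry], "succeeded by"), "succeeded by")


def captains_after(name):
    """Captain -> ships served on -> successor ships -> their captains."""
    ships = out([name], "served on")
    next_ships = out(ships, "succeeded by")
    found = in_(next_ships, "served on")
    # Drop duplicates and the captain we started from.
    return sorted(set(found) - {name})


print(ship_two_after("NCC-1701"))
print(captains_after("Kirk"))
```

Each hop is just "take the current set of vertices and move along one edge label", which is the idea both frameworks implement at scale in different ways.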
  17. 17. Tinkerpop is a Powerful and Flexible Graph Framework • Server, Language, Connectors • Graph Framework for 
 OLAP and OLTP • Node Centric Representations • Fluent API (Gremlin) • Fully Self Contained Framework 17#EUeco3
  18. 18. OLTP Examples 18#EUeco3 18
  19. 19. Movie Lens Example Schema 19 https://grouplens.org/datasets/movielens/ #EUeco3 19
  20. 20. 20
  21. 21. #EUeco3 What happens when you have too much data? 21
  22. 22. #EUeco3 Tinkerpop Spark OLAP Mechanism • Instead of one traversal we traverse starting from all nodes simultaneously 22
  23. 23. Distribution Requires Partitioning 23 ? Big Data Independent Chunks of Data#EUeco3
  24. 24. #EUeco3 Vertex Stored in a PairRDD Id -> StarVertex(Edge and Property Information) 24 1 A C D Star Vertex: Adjacency list representation
 1: "A", "Kirk"
 A: "C", "Kirk"
 C: "D", "Picard"
 D: "Picard"
 Just Id 
 Of Connected 
 Vertex
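The adjacency-list layout on slide 24 can be pictured as key/value pairs of vertex id and "star" (the vertex, its properties, and the ids of its connected vertices). The sketch below is plain Python rather than Spark. The `StarVertex` dataclass is a hypothetical stand-in, not TinkerPop's Java class, the `captain` property key is an assumed name for the value shown on the slide, and `partition` only imitates hash partitioning of a pair collection.

```python
from dataclasses import dataclass, field


@dataclass
class StarVertex:
    """Hypothetical stand-in: one vertex, its properties, and its out-edges."""
    vertex_id: str
    properties: dict
    out_edges: list = field(default_factory=list)  # ids of connected vertices only


# (id, StarVertex) pairs, mirroring the four rows on the slide.
pairs = [
    ("1", StarVertex("1", {"captain": "Kirk"}, ["A"])),
    ("A", StarVertex("A", {"captain": "Kirk"}, ["C"])),
    ("C", StarVertex("C", {"captain": "Picard"}, ["D"])),
    ("D", StarVertex("D", {"captain": "Picard"}, [])),
]


def partition(pairs, num_partitions):
    """Split pairs into independent chunks by a deterministic hash of the key."""
    parts = [[] for _ in range(num_partitions)]
    for key, star in pairs:
        index = sum(map(ord, key)) % num_partitions
        parts[index].append((key, star))
    return parts


for i, part in enumerate(partition(pairs, 2)):
    print(i, [key for key, _ in part])
```

Because a star carries all of its vertex's edges, the edges always land in the same partition as the vertex, which is why a very high-degree vertex makes a single record large.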
  25. 25. #EUeco3 Vertex Program Runs Initializing Traverser for every Vertex 25 1 A C D SparkMemory - Accumulator - Used for GlobalState
  26. 26. #EUeco3 Then we cycle through a message Passing Algorithm 26 1 A C D 1 A C D 1 A C D SparkMemory - Accumulator - Used for GlobalState
  27. 27. #EUeco3 Then we cycle through a message Passing Algorithm 27 1 A C D 1 A C D 1 A C D SparkMemory - Accumulator - Used for GlobalState Passes messages from one Vertex to another with a join
  28. 28. #EUeco3 Then we cycle through a message Passing Algorithm 28 1 A C D 1 A C D 1 A C D SparkMemory - Accumulator - Used for GlobalState Repeat
  29. 29. #EUeco3 Then we cycle through a message Passing Algorithm 29 1 A C D 1 A C D 1 A C D SparkMemory - Accumulator - Used for GlobalState All Traversers Halt
 Or Program Terminates Result!
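The cycle on slides 25 to 29 (initialize a traverser on every vertex, pass messages, repeat until all traversers halt) can be sketched as a small loop. This is a plain-Python illustration under assumptions, not the TinkerPop vertex program: `run_hops` is a hypothetical name, the adjacency list is taken from the slide-24 picture, and the `messages_sent` counter merely stands in for global state kept in an accumulator.

```python
from collections import Counter

# Vertex id -> ids of connected vertices (from the slide-24 picture).
adjacency = {"1": ["A"], "A": ["C"], "C": ["D"], "D": []}


def run_hops(adjacency, hops):
    """Return traverser counts per vertex after up to `hops` out-steps."""
    # Initialization: one traverser sits on every vertex.
    traversers = Counter({vertex: 1 for vertex in adjacency})
    messages_sent = 0  # stand-in for accumulator-backed global state
    for _ in range(hops):
        inbox = Counter()
        for vertex, count in traversers.items():
            for neighbor in adjacency[vertex]:
                # A message moves `count` traversers along one edge.
                inbox[neighbor] += count
                messages_sent += count
        # Grouping messages by destination plays the role of the join.
        traversers = inbox
        if not traversers:
            break  # all traversers have halted
    return dict(traversers), messages_sent


print(run_hops(adjacency, 1))
print(run_hops(adjacency, 4))
```

Every iteration regroups messages by destination vertex once, which matches the later claim that each message pass needs only a single shuffle.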
  30. 30. #EUeco3 Example OLAP Traversals 30
  31. 31. #EUeco3 Tinkerpop Spark OLAP Pros/Cons Pros • Every message pass requires only a single shuffle • Edges and edge properties are accessible without an extra step • Very flexible, many provider-specific shortcuts possible • Internal properties can be any Java type • All in one: the server is already set up for multiple clients Cons • Limited ability to connect to external sources or other Spark applications • Flexibility of the framework allows many platform-specific shortcuts to be added • Generality makes some optimizations difficult • Edges are co-partitioned with vertices, so high-degree nodes can cause memory issues 31
  32. 32. #EUeco3 GraphFrames Background • Third Party Package • https://graphframes.github.io/ • Integrates with Dataset/Dataframe in Spark • Relational under the hood 32
  33. 33. #EUeco3 GraphFrames are built of two DataFrames 33 Row Column
  34. 34. #EUeco3 GraphFrames are built of two DataFrames 34 id job species Geordi Chief Engineer Human Data Science Officer Android Vertex DataFrame src dst relationship Geordi Data Friend Edge DataFrame Friend
  35. 35. #EUeco3 GraphFrames are built of two DataFrames 35 id job species Geordi Chief Engineer Human Data Science Officer Android Vertex DataFrame src dst relationship Geordi Data Friend Edge DataFrame Friend Can Only Be Spark Types
  36. 36. #EUeco3 GraphFrames are built of two DataFrames 36 id job species Geordi Chief Engineer Human Data Science Officer Android Vertex DataFrame src dst relationship Geordi Data Friend Edge DataFrame Friend No Built in Labels
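The two-table layout on slides 33 to 36 can be sketched with plain rows. This is not GraphFrames code: lists of dicts stand in for the vertex and edge DataFrames, and the helpers `dangling_edges` and `edges_with` are hypothetical. The column names `id`, `src`, and `dst` are the ones shown on the slide. Treating a label as a filter on an ordinary column is one possible reading of "No Built in Labels", stated here as an assumption.

```python
# Vertex table: an "id" column plus free-form property columns.
vertices = [
    {"id": "Geordi", "job": "Chief Engineer", "species": "Human"},
    {"id": "Data", "job": "Science Officer", "species": "Android"},
]

# Edge table: "src" and "dst" reference vertex ids; the rest are properties.
edges = [
    {"src": "Geordi", "dst": "Data", "relationship": "Friend"},
]


def dangling_edges(vertices, edges):
    """Edges whose src or dst does not appear in the vertex table."""
    ids = {v["id"] for v in vertices}
    return [e for e in edges if e["src"] not in ids or e["dst"] not in ids]


def edges_with(edges, column, value):
    """With no built-in labels, a 'label' is a filter on a regular column."""
    return [e for e in edges if e[column] == value]


print(dangling_edges(vertices, edges))
print(edges_with(edges, "relationship", "Friend"))
```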
  37. 37. #EUeco3 Catalyst Optimizes any Requests • Simple requests using the DataFrame API don't do anything special • Some methods fall back to GraphX (RDD Based) • Others use pure DataFrame methods 37
  38. 38. #EUeco3 GraphFrames Motif Matching 38 GraphFrame (a)-[e]->(b) V E
  39. 39. #EUeco3 GraphFrames Motif Matching 39 GraphFrame (a)-[e]->(b) Vertex (a) Vertices as a UDT "A"V E A: <VertexRow>
  40. 40. #EUeco3 GraphFrames Motif Matching 40 GraphFrame (a)-[e]->(b) Vertex (a) Vertices as a UDT "A" Edge [e] 
 Edges as UDT "E"
 Join with edges where A.id = E.src V E A: <VertexRow> Join A: <VertexRow>, E: <EdgeRow>
  41. 41. #EUeco3 GraphFrames Motif Matching 41 GraphFrame (a)-[e]->(b) Vertex (a) Vertices as a UDT "A" Edge [e] 
 Edges as UDT "E"
 Join with edges where A.id = E.src Vertex (b) Vertices as UDT "B" Join with vertices where E.dst = B.id V E A: <VertexRow> Join A: <VertexRow>, E: <EdgeRow> Join A: <VertexRow>, E: <EdgeRow>, B: <VertexRow>
  42. 42. #EUeco3 GraphFrames Motif Matching 42 GraphFrame (a)-[e]->(b) Vertex (a) Vertices as a UDT "A" Edge [e] 
 Edges as UDT "E"
 Join with edges where A.id = E.src Vertex (b) Vertices as UDT "B" Join with vertices where E.dst = B.id V E A: <VertexRow> Join A: <VertexRow>, E: <EdgeRow> Join A: <VertexRow>, E: <EdgeRow>, B: <VertexRow> THAT'S SO MANY JOINS
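The motif `(a)-[e]->(b)` from slides 38 to 42 can be sketched as two joins over row dicts. This is a plain-Python illustration, not the GraphFrames implementation: `find_a_e_b` is a hypothetical name, nested loops stand in for whatever join strategy Spark would pick, and the sample rows are the ones from the earlier Geordi/Data slide.

```python
vertices = [
    {"id": "Geordi", "job": "Chief Engineer", "species": "Human"},
    {"id": "Data", "job": "Science Officer", "species": "Android"},
]
edges = [
    {"src": "Geordi", "dst": "Data", "relationship": "Friend"},
]


def find_a_e_b(vertices, edges):
    """Match (a)-[e]->(b): each result row nests a vertex, an edge, a vertex."""
    # Join 1: vertices AS a JOIN edges AS e ON a.id = e.src
    a_e = [{"a": a, "e": e}
           for a in vertices for e in edges if a["id"] == e["src"]]
    # Join 2: (a, e) JOIN vertices AS b ON e.dst = b.id
    return [{**row, "b": b}
            for row in a_e for b in vertices if row["e"]["dst"] == b["id"]]


for match in find_a_e_b(vertices, edges):
    print(match["a"]["id"], match["e"]["relationship"], match["b"]["id"])
```

A one-hop pattern already costs two joins here, so a pattern with more hops grows by two joins each time.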
  43. 43. #EUeco3 43 Vertex Edge Vertex A: <VertexRow> A: <VertexRow>, E: <EdgeRow> A: <VertexRow>, E: <EdgeRow>, B: <VertexRow> DataFrames means Optimizations are Automatic
  44. 44. #EUeco3 44 Vertex Edge Vertex A: <VertexRow> A: <VertexRow>, E: <EdgeRow> A: <VertexRow>, E: <EdgeRow>, B: <VertexRow> Select A.ID Columns Pruned and Predicates Pushed
  45. 45. 45 Vertex Edge Vertex A: <VertexRow> A: <VertexRow>, E: <EdgeRow> A: <VertexRow>, E: <EdgeRow>, B: <VertexRow> Select A.ID Columns Pruned and Predicates Pushed #EUeco3
  46. 46. 46 Vertex Edge Vertex A: <VertexRow> A: <VertexRow>, E: <EdgeRow> A: <VertexRow>, E: <EdgeRow>, B: <VertexRow> Select A.ID Columns Pruned and Predicates Pushed #EUeco3
  47. 47. 47 Vertex Edge Vertex A: <VertexRow> A: <VertexRow>, E: <EdgeRow> A: <VertexRow>, E: <EdgeRow>, B: <VertexRow> Select A.ID Columns Pruned and Predicates Pushed #EUeco3
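"Columns Pruned and Predicates Pushed" can be illustrated by running the same query two ways and comparing how many intermediate rows each builds. This is a hand-written plain-Python sketch, not Catalyst's optimizer. The query (select `a.id` where `a.species` is Human) is an assumed example, the function names are hypothetical, and the Worf vertex and Colleague edge are invented rows added only to make the difference visible.

```python
vertices = [
    {"id": "Geordi", "job": "Chief Engineer", "species": "Human"},
    {"id": "Data", "job": "Science Officer", "species": "Android"},
    {"id": "Worf", "job": "Security Chief", "species": "Klingon"},  # hypothetical row
]
edges = [
    {"src": "Geordi", "dst": "Data", "relationship": "Friend"},
    {"src": "Worf", "dst": "Data", "relationship": "Colleague"},  # hypothetical row
]


def naive(vertices, edges):
    """Join full rows first, then filter and project at the very end."""
    joined = [(a, e, b)
              for a in vertices for e in edges if a["id"] == e["src"]
              for b in vertices if e["dst"] == b["id"]]
    result = [a["id"] for a, e, b in joined if a["species"] == "Human"]
    return result, len(joined)


def optimized(vertices, edges):
    """Filter before joining and carry only the columns the query needs."""
    # Predicate pushed below the join; only the id column is kept.
    a_ids = [v["id"] for v in vertices if v["species"] == "Human"]
    b_ids = {v["id"] for v in vertices}
    joined = [(a, e["dst"])
              for a in a_ids for e in edges
              if a == e["src"] and e["dst"] in b_ids]
    return [a for a, _ in joined], len(joined)


print(naive(vertices, edges))
print(optimized(vertices, edges))
```

Both versions return the same answer; the second simply builds fewer and narrower intermediate rows, which is the kind of rewrite the slides attribute to the DataFrame optimizer.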
  48. 48. #EUeco3 All of the normal optimizations happen within this FrameWork 48 Vertex Edge Vertex A: <VertexRow> A: <VertexRow>, E: <EdgeRow> A: <VertexRow>, E: <EdgeRow>, B: <VertexRow> Broadcast? Broadcast?
  49. 49. #EUeco3 Code Generation and Internal Rows 49 Vertex Edge Vertex A: <VertexRow> A: <VertexRow>, E: <EdgeRow> A: <VertexRow>, E: <EdgeRow>, B: <VertexRow> Code Generation Code Generation Code Generation Code Generation Code Generation
  50. 50. #EUeco3 GraphFrames Examples 50
  51. 51. #EUeco3 GraphFrame Pros Cons Pros • Much Faster on basic counts • Powerful optimizations + CodeGen • Easy to connect to other sources 
 Cons • Slower on complex traversals (2 Joins per hop) • Relational Model not as Flexible 51
  52. 52. #EUeco3 Choosing the Right Framework 52
  53. 53. Choose TinkerPop OLAP For Long Paths • More complicated queries • Traversals that require many hops • g.V().out.out.out.out 
 • Avoid for simple counts and aggregations • Avoid if you have very high degree Vertices 53#EUeco3
  54. 54. Choose GraphFrames for Interoperability and Short Paths • General Edge/Vertex stats groupCount, min, max • Connecting to other sources • Short paths • High Degree Vertices • Avoid • Long path algorithms 54#EUeco3
  55. 55. #EUeco3 Choosing the Right Framework 55 Gremlin on
 Graphframes OLTP backed by DSE Graph Built in Spark We write it! Search Built In! Advanced Security
  56. 56. #EUeco3 Thanks for Listening 56 DataStax Academy Graph Course https://academy.datastax.com/resources/ds330-datastax-enterprise-graph
 Try out DataStax Enterprise! https://academy.datastax.com/quick-downloads
 
 Apache Tinkerpop
 http://tinkerpop.apache.org/ 
 GraphFrames Link https://graphframes.github.io/