Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
GRADOOP: Scalable Graph Analytics
with Apache Flink
Martin Junghanns @kc1s
Apache Flink and Neo4j Meetup Berlin
About the speaker and the team
Apache Flink and Neo4j Meetup Berlin 2
André
PhD Student
Martin
PhD Student
Kevin
M.Sc. Stu...
Apache Flink and Neo4j Meetup Berlin 3
Motivation
„Graphs are everywhere“
Apache Flink and Neo4j Meetup Berlin 4
𝑮𝑟𝑎𝑝ℎ = (𝑽𝑒𝑟𝑡𝑖𝑐𝑒𝑠, 𝑬𝑑𝑔𝑒𝑠)
„Graphs are everywhere“
Apache Flink and Neo4j Meetup Berlin 5
Alice
Bob
Eve
Dave
Carol
Mallory
Peggy
Trent
𝐺𝑟𝑎𝑝ℎ = (𝐔𝐬𝐞𝐫𝐬...
„Graphs are everywhere“
Apache Flink and Neo4j Meetup Berlin 6
𝐺𝑟𝑎𝑝ℎ = (𝐔𝐬𝐞𝐫𝐬, 𝐹𝑟𝑖𝑒𝑛𝑑𝑠ℎ𝑖𝑝𝑠)
Alice
Bob
Eve
Dave
Carol
Mallo...
Alice
Bob
AC/DC
Dave
Carol
Mallory
Peggy
Metallica
„Graphs are heterogeneous“
Apache Flink and Neo4j Meetup Berlin 7
𝐺𝑟𝑎𝑝ℎ...
Alice
Bob
AC/DC
Dave
Carol
Mallory
Peggy
Metallica
„Graphs can be analyzed“
Apache Flink and Neo4j Meetup Berlin 8
𝐺𝑟𝑎𝑝ℎ =...
0.2
0.28
0.26
0.33
0.25
0.26
Alice
Bob
AC/DC
Dave
Carol
Mallory
Peggy
Metallica
3.6
2.82
„Graphs can be analyzed“
Apache F...
„Graphs can be analyzed“
Apache Flink and Neo4j Meetup Berlin 10
Assuming a social network
„Graphs can be analyzed“
Apache Flink and Neo4j Meetup Berlin 11
Assuming a social network
1. Determine subgraph
„Graphs can be analyzed“
Apache Flink and Neo4j Meetup Berlin 12
Assuming a social network
1. Determine subgraph
„Graphs can be analyzed“
Apache Flink and Neo4j Meetup Berlin 13
Assuming a social network
1. Determine subgraph
2. Find c...
„Graphs can be analyzed“
Apache Flink and Neo4j Meetup Berlin 14
Assuming a social network
1. Determine subgraph
2. Find c...
„Graphs can be analyzed“
Apache Flink and Neo4j Meetup Berlin 15
Assuming a social network
1. Determine subgraph
2. Find c...
„Graphs can be analyzed“
Apache Flink and Neo4j Meetup Berlin 16
Assuming a social network
1. Determine subgraph
2. Find c...
„Graphs can be analyzed“
Apache Flink and Neo4j Meetup Berlin 17
Assuming a social network
1. Determine subgraph
2. Find c...
„Graphs can be analyzed“
Apache Flink and Neo4j Meetup Berlin 18
Assuming a social network
1. Determine subgraph
2. Find c...
„Graphs can be analyzed“
Apache Flink and Neo4j Meetup Berlin 19
Assuming a social network
• Heterogeneous data
1. Determi...
„Graphs can be analyzed“
Apache Flink and Neo4j Meetup Berlin 20
Assuming a social network
• Heterogeneous data
1. Determi...
„Graphs can be analyzed“
Apache Flink and Neo4j Meetup Berlin 21
Assuming a social network
• Heterogeneous data
1. Determi...
„Graphs can be analyzed“
Apache Flink and Neo4j Meetup Berlin 22
Assuming a social network
• Heterogeneous data
1. Determi...
„Graphs can be analyzed“
Apache Flink and Neo4j Meetup Berlin 23
Assuming a social network
• Heterogeneous data
1. Determi...
„And let‘s not forget …“
Apache Flink and Neo4j Meetup Berlin 24
“…Graphs are large”
Apache Flink and Neo4j Meetup Berlin 25
„A framework and research platform for efficient,
distributed and domain independent management
and analytics of heterogen...
High Level Architecture
Apache Flink and Neo4j Meetup Berlin 27
High Level Architecture
Apache Flink and Neo4j Meetup Berlin 27
HDFS/YARN
Cluster
High Level Architecture
Apache Flink and Neo4j Meetup Berlin 27
HDFS/YARN
Cluster
Apache HBase Distributed Graph Store
High Level Architecture
Apache Flink and Neo4j Meetup Berlin 27
HDFS/YARN
Cluster
Apache HBase Distributed Graph Store
Apa...
High Level Architecture
Apache Flink and Neo4j Meetup Berlin 27
HDFS/YARN
Cluster
Apache HBase Distributed Graph Store
Apa...
Apache Flink Third-party library
Apache Flink and Neo4j Meetup Berlin 28
Streaming Dataflow Runtime
DataSet DataStream
Had...
Apache Flink and Neo4j Meetup Berlin 29
Extended Property Graph Model (EPGM)
Extended Property Graph Model
• Vertices and directed Edges
Apache Flink and Neo4j Meetup Berlin 30
Extended Property Graph Model
• Vertices and directed Edges
• Logical Graphs
Apache Flink and Neo4j Meetup Berlin 31
Extended Property Graph Model
• Vertices and directed Edges
• Logical Graphs
• Identifiers
Apache Flink and Neo4j Meetup B...
Extended Property Graph Model
• Vertices and directed Edges
• Logical Graphs
• Identifiers
• Type Labels
Apache Flink and ...
Extended Property Graph Model
• Vertices and directed Edges
• Logical Graphs
• Identifiers
• Type Labels
• Properties
Apac...
Apache Flink and Neo4j Meetup Berlin 35
EPGM Operators
Basic Binary Operators
Apache Flink and Neo4j Meetup Berlin 36
Basic Binary Operators
Apache Flink and Neo4j Meetup Berlin 36
1 3
4
5
2
1
2
Basic Binary Operators
Apache Flink and Neo4j Meetup Berlin 36
1 3
4
5
2
1 3
4
5
2
1
2
Combination
3
Basic Binary Operators
Apache Flink and Neo4j Meetup Berlin 36
1 3
4
5
2
31 3
4
5
2
1
2
3
Combination
Overlap
3
Basic Binary Operators
Apache Flink and Neo4j Meetup Berlin 36
1 3
4
5
2
3
1 2
1 3
4
5
2
1
2
3
3
Combination
Overlap
Exclu...
Graph Aggregation
Apache Flink and Neo4j Meetup Berlin 37
Graph Aggregation
Apache Flink and Neo4j Meetup Berlin 37
1 3
4
5
2
3
Graph Aggregation
Apache Flink and Neo4j Meetup Berlin 37
1 3
4
5
2
3
UDF
Graph Aggregation
Apache Flink and Neo4j Meetup Berlin 37
1 3
4
5
2
3
1 3
4
5
2
3 | vertexCount: 5
UDF
Graph Aggregation
Apache Flink and Neo4j Meetup Berlin 37
1 3
4
5
2
3
1 3
4
5
2
3 | vertexCount: 5
1 3
4
5
2
3
revenue:700...
Graph Aggregation
Apache Flink and Neo4j Meetup Berlin 37
1 3
4
5
2
3
1 3
4
5
2
3 | vertexCount: 5
1 3
4
5
2
3
revenue:700...
Graph Aggregation
Apache Flink and Neo4j Meetup Berlin 37
1 3
4
5
2
3
1 3
4
5
2
3 | vertexCount: 5
1 3
4
5
2
3
revenue:700...
Graph Transformation
Apache Flink and Neo4j Meetup Berlin 38
Graph Transformation
Apache Flink and Neo4j Meetup Berlin 38
3 | vertexCount: 5
name:Alice
f_name:Bob1 3
4
5
2
Graph Transformation
Apache Flink and Neo4j Meetup Berlin 38
UDF
3 | vertexCount: 5
name:Alice
f_name:Bob1 3
4
5
2
3 | Com...
Subgraph Extraction
Apache Flink and Neo4j Meetup Berlin 39
Subgraph Extraction
Apache Flink and Neo4j Meetup Berlin 39
3
1 3
4
5
2
Subgraph Extraction
Apache Flink and Neo4j Meetup Berlin 39
3
1 3
4
5
2
UDF
Subgraph Extraction
Apache Flink and Neo4j Meetup Berlin 39
3
1 3
4
5
2
3
4
1 2UDF
Subgraph Extraction
Apache Flink and Neo4j Meetup Berlin 39
3
1 3
4
5
2
3
4
1 2
UDF
UDF
Subgraph Extraction
Apache Flink and Neo4j Meetup Berlin 39
3
1 3
4
5
2
3
4
1 2
3
4
1 2UDF
UDF
Subgraph Extraction
Apache Flink and Neo4j Meetup Berlin 39
3
1 3
4
5
2
3
4
1 2
3
4
1 2
UDF
UDF
UDF
Subgraph Extraction
Apache Flink and Neo4j Meetup Berlin 39
3
1 3
4
5
2
3
4
1 2
3
4
1 2
4
3
5
2UDF
UDF
UDF
Graph Pattern Matching
Apache Flink and Neo4j Meetup Berlin 40
Graph Pattern Matching
Apache Flink and Neo4j Meetup Berlin 40
3
1 3
4
5
2
Graph Pattern Matching
Apache Flink and Neo4j Meetup Berlin 40
3
1 3
4
5
2 Pattern
Graph Pattern Matching
Apache Flink and Neo4j Meetup Berlin 40
3
1 3
4
5
2 Pattern
4 5
1 3
4
2
Graph Pattern Matching
Apache Flink and Neo4j Meetup Berlin 40
3
1 3
4
5
2 Pattern
4 5
1 3
4
2
Graph Collection
Graph Grouping
Apache Flink and Neo4j Meetup Berlin 41
Graph Grouping
Apache Flink and Neo4j Meetup Berlin 41
3
1 3
4
5
2
Graph Grouping
Apache Flink and Neo4j Meetup Berlin 41
Keys
3
1 3
4
5
2
Graph Grouping
Apache Flink and Neo4j Meetup Berlin 41
Keys
3
1 3
4
5
2
4
6 7
Graph Grouping
Apache Flink and Neo4j Meetup Berlin 41
Keys
3
1 3
4
5
2
4
6 7
3
a:23 a:84
a:42
a:12
1 3
4
5
2
a:13
a:21
Graph Grouping
Apache Flink and Neo4j Meetup Berlin 41
Keys
3
1 3
4
5
2
4
6 7
+Aggregate
3
a:23 a:84
a:42
a:12
1 3
4
5
2
a...
Graph Grouping
Apache Flink and Neo4j Meetup Berlin 41
Keys
3
1 3
4
5
2
4
6 7
+Aggregate
3
a:23 a:84
a:42
a:12
1 3
4
5
2
a...
Apply (e.g. Aggregation)
Apache Flink and Neo4j Meetup Berlin 42
Apply (e.g. Aggregation)
Apache Flink and Neo4j Meetup Berlin 42
1
2
3
revenue:7000
expense:1000
expense:1000
revenue:2000...
Apply (e.g. Aggregation)
Apache Flink and Neo4j Meetup Berlin 42
Operator
1
2
3
revenue:7000
expense:1000
expense:1000
rev...
Apply (e.g. Aggregation)
Apache Flink and Neo4j Meetup Berlin 42
Operator
1
2
3
revenue:7000
expense:1000
expense:1000
rev...
Selection
Apache Flink and Neo4j Meetup Berlin 43
Selection
Apache Flink and Neo4j Meetup Berlin 43
1 | profit: 5000
2 | profit: -1000
3 | profit: 3000
revenue:7000
expense...
Selection
Apache Flink and Neo4j Meetup Berlin 43
UDF
profit > 0
1 | profit: 5000
2 | profit: -1000
3 | profit: 3000
reven...
Selection
Apache Flink and Neo4j Meetup Berlin 43
UDF
profit > 0
1 | profit: 5000
2 | profit: -1000
3 | profit: 3000
reven...
Call (e.g. Clustering)
Apache Flink and Neo4j Meetup Berlin 44
Call (e.g. Clustering)
Apache Flink and Neo4j Meetup Berlin 44
1
0 2
3
4
1
5 7 86
9 11 1210
Call (e.g. Clustering)
Apache Flink and Neo4j Meetup Berlin 44
Algorithm
1
0 2
3
4
1
5 7 86
9 11 1210
Call (e.g. Clustering)
Apache Flink and Neo4j Meetup Berlin 44
Algorithm
1
0 2
3
4
1
5 7 86
9 11 1210
2
3
4
0 2
3
4
1
5 7 ...
Call (e.g. PageRank)
Apache Flink and Neo4j Meetup Berlin 45
Call (e.g. PageRank)
Apache Flink and Neo4j Meetup Berlin 45
1
0 2
3
4
1
5 7 86
9 11 1210
Call (e.g. PageRank)
Apache Flink and Neo4j Meetup Berlin 45
Algorithm
1
0 2
3
4
1
5 7 86
9 11 1210
Call (e.g. PageRank)
Apache Flink and Neo4j Meetup Berlin 45
Algorithm
2
rank:0.11
rank:0.25
rank:0.11
rank:1.29
rank:1.29...
EPGM Operators Overview
Apache Flink and Neo4j Meetup Berlin 46
Operators
Unary Binary
GraphCollectionLogicalGraph
Algorit...
EPGM Operators Overview
Apache Flink and Neo4j Meetup Berlin 47
Operators
Unary Binary
GraphCollectionLogicalGraph
Algorit...
EPGM Operators Overview
Apache Flink and Neo4j Meetup Berlin 48
Operators
Unary Binary
GraphCollectionLogicalGraph
Algorit...
Apache Flink and Neo4j Meetup Berlin 49
EPGM on Apache Flink
Flink DataSet API
Apache Flink and Neo4j Meetup Berlin 50
Flink DataSet API
Apache Flink and Neo4j Meetup Berlin 50
• DataSet := Distributed Collection of Data Objects
DataSet
Data...
Flink DataSet API
Apache Flink and Neo4j Meetup Berlin 50
• DataSet := Distributed Collection of Data Objects
• Transforma...
Flink DataSet API
Apache Flink and Neo4j Meetup Berlin 50
• DataSet := Distributed Collection of Data Objects
• Transforma...
Flink DataSet API
Apache Flink and Neo4j Meetup Berlin 50
DataSetDataSetDataSet
DataSetDataSetDataSet
DataSetDataSetDataSe...
Graph Representation
Apache Flink and Neo4j Meetup Berlin 51
Graph Representation
Apache Flink and Neo4j Meetup Berlin 51
EPGMGraphHead
Id Label Properties POJO DataSet<EPGMGraphHead>
Graph Representation
Apache Flink and Neo4j Meetup Berlin 51
Id Label Properties Graphs
EPGMGraphHead
EPGMVertex
Id Label ...
Graph Representation
Apache Flink and Neo4j Meetup Berlin 51
Id Label Properties Graphs
Id Label Properties SourceId Targe...
Graph Representation
Apache Flink and Neo4j Meetup Berlin 51
Id Label Properties Graphs
Id Label Properties SourceId Targe...
Graph Representation
Apache Flink and Neo4j Meetup Berlin 51
Id Label Properties Graphs
Id Label Properties SourceId Targe...
Graph Representation
Apache Flink and Neo4j Meetup Berlin 51
Id Label Properties Graphs
Id Label Properties SourceId Targe...
Graph Representation
Apache Flink and Neo4j Meetup Berlin 51
Id Label Properties Graphs
Id Label Properties SourceId Targe...
Graph Representation
Apache Flink and Neo4j Meetup Berlin 51
Id Label Properties Graphs
Id Label Properties SourceId Targe...
Graph Representation
Apache Flink and Neo4j Meetup Berlin 52
Graph Representation
Apache Flink and Neo4j Meetup Berlin 52
1 3
4
5
2
1|Community|interest:Heavy Metal
2|Community|intere...
Graph Representation
Apache Flink and Neo4j Meetup Berlin 52
Id Label Properties
1 Community {interest:Heavy Metal}
2 Comm...
Graph Representation
Apache Flink and Neo4j Meetup Berlin 52
Id Label Properties
1 Community {interest:Heavy Metal}
2 Comm...
Graph Representation
Apache Flink and Neo4j Meetup Berlin 52
Id Label Properties
1 Community {interest:Heavy Metal}
2 Comm...
Flink DataSet Transformations
Apache Flink and Neo4j Meetup Berlin 53
Flink DataSet Transformations
Apache Flink and Neo4j Meetup Berlin 53
SQL-like Transformations
• filter
• project
• cross
...
Flink DataSet Transformations
Apache Flink and Neo4j Meetup Berlin 53
Hadoop-like Transformations
• map
• flatMap
• mapPar...
Operator Implementation
Apache Flink and Neo4j Meetup Berlin 54
1 3
4
5
2
1|Community|interest:Heavy Metal
2|Community|int...
Operator Implementation
Apache Flink and Neo4j Meetup Berlin 54
1 3
4
5
2
1|Community|interest:Heavy Metal
2|Community|int...
Operator Implementation
Apache Flink and Neo4j Meetup Berlin 54
1 3
4
5
2
1|Community|interest:Heavy Metal
2|Community|int...
Operator Implementation – Exclusion
Apache Flink and Neo4j Meetup Berlin 55
Operator Implementation – Exclusion
Apache Flink and Neo4j Meetup Berlin 55
graphId = secondGraph.getGraphHead()
Operator Implementation – Exclusion
Apache Flink and Neo4j Meetup Berlin 55
Id Label Properties
2 Community {interest:Hard...
Operator Implementation – Exclusion
Apache Flink and Neo4j Meetup Berlin 55
Id Label Properties
2 Community {interest:Hard...
Operator Implementation – Exclusion
Apache Flink and Neo4j Meetup Berlin 55
Id Label Properties
2 Community {interest:Hard...
Operator Implementation – Exclusion
Apache Flink and Neo4j Meetup Berlin 55
Id Label Properties
2 Community {interest:Hard...
Operator Implementation – Exclusion
Apache Flink and Neo4j Meetup Berlin 55
Id Label Properties
2 Community {interest:Hard...
Operator Implementation – Exclusion
Apache Flink and Neo4j Meetup Berlin 55
Id Label Properties
2 Community {interest:Hard...
Operator Implementation – Exclusion
Apache Flink and Neo4j Meetup Berlin 55
Id Label Properties
2 Community {interest:Hard...
Operator Implementation – Exclusion
Apache Flink and Neo4j Meetup Berlin 56
Operator Implementation – Exclusion
Apache Flink and Neo4j Meetup Berlin 56
newEdges = firstGraph.getEdges()
Operator Implementation – Exclusion
Apache Flink and Neo4j Meetup Berlin 56
newEdges = firstGraph.getEdges() Id Label Sour...
Operator Implementation – Exclusion
Apache Flink and Neo4j Meetup Berlin 56
newEdges = firstGraph.getEdges() Id Label Sour...
Operator Implementation – Exclusion
Apache Flink and Neo4j Meetup Berlin 56
newEdges = firstGraph.getEdges() Id Label Sour...
Operator Implementation – Exclusion
Apache Flink and Neo4j Meetup Berlin 56
newEdges = firstGraph.getEdges() Id Label Sour...
Operator Implementation – Exclusion
Apache Flink and Neo4j Meetup Berlin 56
newEdges = firstGraph.getEdges() Id Label Sour...
Operator Implementation – Exclusion
Apache Flink and Neo4j Meetup Berlin 56
newEdges = firstGraph.getEdges() Id Label Sour...
Operator Implementation – Exclusion
Apache Flink and Neo4j Meetup Berlin 56
newEdges = firstGraph.getEdges() Id Label Sour...
Operator Implementation – Exclusion
Apache Flink and Neo4j Meetup Berlin 56
newEdges = firstGraph.getEdges() Id Label Sour...
Operator Implementation – Exclusion
Apache Flink and Neo4j Meetup Berlin 56
newEdges = firstGraph.getEdges() Id Label Sour...
Operator Implementation – Exclusion
Apache Flink and Neo4j Meetup Berlin 56
newEdges = firstGraph.getEdges() Id Label Sour...
Operator Implementation – Exclusion
Apache Flink and Neo4j Meetup Berlin 56
newEdges = firstGraph.getEdges() Id Label Sour...
GrALa API
Apache Flink and Neo4j Meetup Berlin 57
GrALa API
Apache Flink and Neo4j Meetup Berlin 57
class LogicalGraph<G extends EPGMGraphHead,
V extends EPGMVertex,
E exte...
GrALa API
Apache Flink and Neo4j Meetup Berlin 57
class LogicalGraph<G extends EPGMGraphHead,
V extends EPGMVertex,
E exte...
GrALa API
Apache Flink and Neo4j Meetup Berlin 58
class EPGMDatabase<G extends EPGMGraphHead,
V extends EPGMVertex,
E exte...
GrALa API
Apache Flink and Neo4j Meetup Berlin 59
class EPGMDatabase<G extends EPGMGraphHead,
V extends EPGMVertex,
E exte...
Apache Flink and Neo4j Meetup Berlin 60
Performance
Social Network Benchmark
Apache Flink and Neo4j Meetup Berlin 61
Social Network Benchmark
Apache Flink and Neo4j Meetup Berlin 61
http://www.ldbcouncil.org/
Social Network Benchmark
Apache Flink and Neo4j Meetup Berlin 61
1. Extract subgraph containing only Persons and knows rel...
Social Network Benchmark
Apache Flink and Neo4j Meetup Berlin 61
1. Extract subgraph containing only Persons and knows rel...
Social Network Benchmark
Apache Flink and Neo4j Meetup Berlin 61
1. Extract subgraph containing only Persons and knows rel...
Social Network Benchmark
Apache Flink and Neo4j Meetup Berlin 61
1. Extract subgraph containing only Persons and knows rel...
Social Network Benchmark
Apache Flink and Neo4j Meetup Berlin 61
1. Extract subgraph containing only Persons and knows rel...
Social Network Benchmark
Apache Flink and Neo4j Meetup Berlin 61
1. Extract subgraph containing only Persons and knows rel...
Social Network Benchmark
Apache Flink and Neo4j Meetup Berlin 61
1. Extract subgraph containing only Persons and knows rel...
Social Network Benchmark
Apache Flink and Neo4j Meetup Berlin 61
1. Extract subgraph containing only Persons and knows rel...
Social Network Benchmark
Apache Flink and Neo4j Meetup Berlin 62
1. Extract subgraph containing only Persons and knows rel...
Social Network Benchmark
Apache Flink and Neo4j Meetup Berlin 63
Dataset # Vertices # Edges Disk size
Graphalytics.1 61,61...
Social Network Benchmark – Runtime
Apache Flink and Neo4j Meetup Berlin 64
Dataset # Vertices # Edges Disk size
Graphalyti...
1
2
4
8
16
1 2 4 8 16
Speedup
Number of workers
Graphalytics.100 Linear
Social Network Benchmark – Speedup
Apache Flink an...
1
10
100
1000
10000
Runtime[s]
Social Network Benchmark – Datasets
Apache Flink and Neo4j Meetup Berlin 66
Dataset # Verti...
Apache Flink and Neo4j Meetup Berlin 67
Demo
https://github.com/s1ck/neo4j-gradoop-demos
Apache Flink and Neo4j Meetup Berlin 68
Current State and Future Work
Current State – Operator Implementations
Apache Flink and Neo4j Meetup Berlin 69
Operators
Unary Binary
GraphCollectionLog...
Release History
Apache Flink and Neo4j Meetup Berlin 70
• 0.0.1 First Prototype (May 2015)
– Hadoop MapReduce and Giraph f...
Contributions to Flink
Apache Flink and Neo4j Meetup Berlin 71
• FLINK-2411 Add basic graph summarization algorithm
• FLIN...
Contributions Welcome
Apache Flink and Neo4j Meetup Berlin 72
• Code
– Operator implementations / improvement
– Performanc...
Apache Flink and Neo4j Meetup Berlin 73
Thank you!
www.gradoop.com
http://flink.apache.org
http://neo4j.com
http://ldbcoun...
Upcoming SlideShare
Loading in …5
×

Gradoop: Scalable Graph Analytics with Apache Flink @ Flink & Neo4j Meetup Berlin

2,047 views

Published on

Presentation of the Gradoop Framework at the Flink & Neo4j Meetup in Berlin (http://www.meetup.com/graphdb-berlin/events/228576494/). The talk is about the extended property graph model, its operators and how they are implemented on top of Apache Flink. The talk also includes some benchmark results on scalability and a demo involving Neo4j, Flink and Gradoop (see www.gradoop.com)

Published in: Data & Analytics

Gradoop: Scalable Graph Analytics with Apache Flink @ Flink & Neo4j Meetup Berlin

  1. 1. GRADOOP: Scalable Graph Analytics with Apache Flink Martin Junghanns @kc1s Apache Flink and Neo4j Meetup Berlin
  2. 2. About the speaker and the team Apache Flink and Neo4j Meetup Berlin 2 André PhD Student Martin PhD Student Kevin M.Sc. Student Niklas M.Sc. Student Prof. Dr. Erhard Rahm Database Chair
  3. 3. Apache Flink and Neo4j Meetup Berlin 3 Motivation
  4. 4. „Graphs are everywhere“ Apache Flink and Neo4j Meetup Berlin 4 𝑮𝑟𝑎𝑝ℎ = (𝑽𝑒𝑟𝑡𝑖𝑐𝑒𝑠, 𝑬𝑑𝑔𝑒𝑠)
  5. 5. „Graphs are everywhere“ Apache Flink and Neo4j Meetup Berlin 5 Alice Bob Eve Dave Carol Mallory Peggy Trent 𝐺𝑟𝑎𝑝ℎ = (𝐔𝐬𝐞𝐫𝐬, 𝐹𝑜𝑙𝑙𝑜𝑤𝑒𝑟𝑠)
  6. 6. „Graphs are everywhere“ Apache Flink and Neo4j Meetup Berlin 6 𝐺𝑟𝑎𝑝ℎ = (𝐔𝐬𝐞𝐫𝐬, 𝐹𝑟𝑖𝑒𝑛𝑑𝑠ℎ𝑖𝑝𝑠) Alice Bob Eve Dave Carol Mallory Peggy Trent
  7. 7. Alice Bob AC/DC Dave Carol Mallory Peggy Metallica „Graphs are heterogeneous“ Apache Flink and Neo4j Meetup Berlin 7 𝐺𝑟𝑎𝑝ℎ = (𝐔𝐬𝐞𝐫𝐬 ∪ 𝐁𝐚𝐧𝐝𝐬, 𝐹𝑟𝑖𝑒𝑛𝑑𝑠ℎ𝑖𝑝𝑠 ∪ 𝐿𝑖𝑘𝑒𝑠)
  8. 8. Alice Bob AC/DC Dave Carol Mallory Peggy Metallica „Graphs can be analyzed“ Apache Flink and Neo4j Meetup Berlin 8 𝐺𝑟𝑎𝑝ℎ = (𝐔𝐬𝐞𝐫𝐬 ∪ 𝐁𝐚𝐧𝐝𝐬, 𝐹𝑟𝑖𝑒𝑛𝑑𝑠ℎ𝑖𝑝𝑠 ∪ 𝐿𝑖𝑘𝑒𝑠)
  9. 9. 0.2 0.28 0.26 0.33 0.25 0.26 Alice Bob AC/DC Dave Carol Mallory Peggy Metallica 3.6 2.82 „Graphs can be analyzed“ Apache Flink and Neo4j Meetup Berlin 9 𝐺𝑟𝑎𝑝ℎ = (𝐔𝐬𝐞𝐫𝐬 ∪ 𝐁𝐚𝐧𝐝𝐬, 𝐹𝑟𝑖𝑒𝑛𝑑𝑠ℎ𝑖𝑝𝑠 ∪ 𝐿𝑖𝑘𝑒𝑠)
  10. 10. „Graphs can be analyzed“ Apache Flink and Neo4j Meetup Berlin 10 Assuming a social network
  11. 11. „Graphs can be analyzed“ Apache Flink and Neo4j Meetup Berlin 11 Assuming a social network 1. Determine subgraph
  12. 12. „Graphs can be analyzed“ Apache Flink and Neo4j Meetup Berlin 12 Assuming a social network 1. Determine subgraph
  13. 13. „Graphs can be analyzed“ Apache Flink and Neo4j Meetup Berlin 13 Assuming a social network 1. Determine subgraph 2. Find communities
  14. 14. „Graphs can be analyzed“ Apache Flink and Neo4j Meetup Berlin 14 Assuming a social network 1. Determine subgraph 2. Find communities
  15. 15. „Graphs can be analyzed“ Apache Flink and Neo4j Meetup Berlin 15 Assuming a social network 1. Determine subgraph 2. Find communities 3. Filter communities
  16. 16. „Graphs can be analyzed“ Apache Flink and Neo4j Meetup Berlin 16 Assuming a social network 1. Determine subgraph 2. Find communities 3. Filter communities
  17. 17. „Graphs can be analyzed“ Apache Flink and Neo4j Meetup Berlin 17 Assuming a social network 1. Determine subgraph 2. Find communities 3. Filter communities 4. Find common subgraph
  18. 18. „Graphs can be analyzed“ Apache Flink and Neo4j Meetup Berlin 18 Assuming a social network 1. Determine subgraph 2. Find communities 3. Filter communities 4. Find common subgraph
  19. 19. „Graphs can be analyzed“ Apache Flink and Neo4j Meetup Berlin 19 Assuming a social network • Heterogeneous data 1. Determine subgraph • Apply graph transformation 2. Find communities • Handle collections of graphs 3. Filter communities • Aggregation, Selection 4. Find common subgraph • Apply dedicated algorithm
  20. 20. „Graphs can be analyzed“ Apache Flink and Neo4j Meetup Berlin 20 Assuming a social network • Heterogeneous data 1. Determine subgraph • Apply graph transformation 2. Find communities • Handle collections of graphs 3. Filter communities • Aggregation, Selection 4. Find common subgraph • Apply dedicated algorithm
  21. 21. „Graphs can be analyzed“ Apache Flink and Neo4j Meetup Berlin 21 Assuming a social network • Heterogeneous data 1. Determine subgraph • Apply graph transformation 2. Find communities • Handle collections of graphs 3. Filter communities • Aggregation, Selection 4. Find common subgraph • Apply dedicated algorithm
  22. 22. „Graphs can be analyzed“ Apache Flink and Neo4j Meetup Berlin 22 Assuming a social network • Heterogeneous data 1. Determine subgraph • Apply graph transformation 2. Find communities • Handle collections of graphs 3. Filter communities • Aggregation, Selection 4. Find common subgraph • Apply dedicated algorithm
  23. 23. „Graphs can be analyzed“ Apache Flink and Neo4j Meetup Berlin 23 Assuming a social network • Heterogeneous data 1. Determine subgraph • Apply graph transformation 2. Find communities • Handle collections of graphs 3. Filter communities • Aggregation, Selection 4. Find common subgraph • Apply dedicated algorithm
  24. 24. „And let‘s not forget …“ Apache Flink and Neo4j Meetup Berlin 24
  25. 25. “…Graphs are large” Apache Flink and Neo4j Meetup Berlin 25
  26. 26. „A framework and research platform for efficient, distributed and domain independent management and analytics of heterogeneous graph data.“ Apache Flink and Neo4j Meetup Berlin 26
  27. 27. High Level Architecture Apache Flink and Neo4j Meetup Berlin 27
  28. 28. High Level Architecture Apache Flink and Neo4j Meetup Berlin 27 HDFS/YARN Cluster
  29. 29. High Level Architecture Apache Flink and Neo4j Meetup Berlin 27 HDFS/YARN Cluster Apache HBase Distributed Graph Store
  30. 30. High Level Architecture Apache Flink and Neo4j Meetup Berlin 27 HDFS/YARN Cluster Apache HBase Distributed Graph Store Apache Flink Distributed Operator Execution
  31. 31. High Level Architecture Apache Flink and Neo4j Meetup Berlin 27 HDFS/YARN Cluster Apache HBase Distributed Graph Store Apache Flink Operator Implementation Apache Flink Distributed Operator Execution Extended Property Graph Model (EPGM) Graph Analytical Language (GrALa)  Java 7  25K (33K) LOC  GPLv3
  32. 32. Apache Flink Third-party library Apache Flink and Neo4j Meetup Berlin 28 Streaming Dataflow Runtime DataSet DataStream HadoopMR Table Gelly ML Table Zeppelin Cascading MRQL Dataflow Storm Dataflow SAMOA GRADOOP Cluster (e.g. YARN)Local Cloud (e.g. EC2) Batch Stream Data Storage (e.g. Files, HDFS, S3, JDBC, Kafka, …)
  33. 33. Apache Flink and Neo4j Meetup Berlin 29 Extended Property Graph Model (EPGM)
  34. 34. Extended Property Graph Model • Vertices and directed Edges Apache Flink and Neo4j Meetup Berlin 30
  35. 35. Extended Property Graph Model • Vertices and directed Edges • Logical Graphs Apache Flink and Neo4j Meetup Berlin 31
  36. 36. Extended Property Graph Model • Vertices and directed Edges • Logical Graphs • Identifiers Apache Flink and Neo4j Meetup Berlin 32 1 3 4 5 21 2 3 4 5 1 2
  37. 37. Extended Property Graph Model • Vertices and directed Edges • Logical Graphs • Identifiers • Type Labels Apache Flink and Neo4j Meetup Berlin 33 1 3 4 5 21 2 3 4 5 Person Band Person Person Band likes likes likes knows likes 1|Community 2|Community
  38. 38. Extended Property Graph Model • Vertices and directed Edges • Logical Graphs • Identifiers • Type Labels • Properties Apache Flink and Neo4j Meetup Berlin 34 1 3 4 5 21 2 3 4 5 Person name : Alice born : 1984 Band name : Metallica founded : 1981 Person name : Bob Person name : Eve Band name : AC/DC founded : 1973 likes since : 2014 likes since : 2013 likes since : 2015 knows likes since : 2014 1|Community|interest:Heavy Metal 2|Community|interest:Hard Rock
  39. 39. Apache Flink and Neo4j Meetup Berlin 35 EPGM Operators
  40. 40. Basic Binary Operators Apache Flink and Neo4j Meetup Berlin 36
  41. 41. Basic Binary Operators Apache Flink and Neo4j Meetup Berlin 36 1 3 4 5 2 1 2
  42. 42. Basic Binary Operators Apache Flink and Neo4j Meetup Berlin 36 1 3 4 5 2 1 3 4 5 2 1 2 Combination 3
  43. 43. Basic Binary Operators Apache Flink and Neo4j Meetup Berlin 36 1 3 4 5 2 31 3 4 5 2 1 2 3 Combination Overlap 3
  44. 44. Basic Binary Operators Apache Flink and Neo4j Meetup Berlin 36 1 3 4 5 2 3 1 2 1 3 4 5 2 1 2 3 3 Combination Overlap Exclusion 3
  45. 45. Graph Aggregation Apache Flink and Neo4j Meetup Berlin 37
  46. 46. Graph Aggregation Apache Flink and Neo4j Meetup Berlin 37 1 3 4 5 2 3
  47. 47. Graph Aggregation Apache Flink and Neo4j Meetup Berlin 37 1 3 4 5 2 3 UDF
  48. 48. Graph Aggregation Apache Flink and Neo4j Meetup Berlin 37 1 3 4 5 2 3 1 3 4 5 2 3 | vertexCount: 5 UDF
  49. 49. Graph Aggregation Apache Flink and Neo4j Meetup Berlin 37 1 3 4 5 2 3 1 3 4 5 2 3 | vertexCount: 5 1 3 4 5 2 3 revenue:7000 expense:1000 expense:1000 UDF
  50. 50. Graph Aggregation Apache Flink and Neo4j Meetup Berlin 37 1 3 4 5 2 3 1 3 4 5 2 3 | vertexCount: 5 1 3 4 5 2 3 revenue:7000 expense:1000 expense:1000 UDF UDF
  51. 51. Graph Aggregation Apache Flink and Neo4j Meetup Berlin 37 1 3 4 5 2 3 1 3 4 5 2 3 | vertexCount: 5 1 3 4 5 2 3 revenue:7000 expense:1000 expense:1000 1 3 4 5 2 3 | profit: 5000 revenue:7000 expense:1000 expense:1000 UDF UDF
  52. 52. Graph Transformation Apache Flink and Neo4j Meetup Berlin 38
  53. 53. Graph Transformation Apache Flink and Neo4j Meetup Berlin 38 3 | vertexCount: 5 name:Alice f_name:Bob1 3 4 5 2
  54. 54. Graph Transformation Apache Flink and Neo4j Meetup Berlin 38 UDF 3 | vertexCount: 5 name:Alice f_name:Bob1 3 4 5 2 3 | Community| vCount: 5 f_name:Alice f_name:Bob1 3 4 5 2
  55. 55. Subgraph Extraction Apache Flink and Neo4j Meetup Berlin 39
  56. 56. Subgraph Extraction Apache Flink and Neo4j Meetup Berlin 39 3 1 3 4 5 2
  57. 57. Subgraph Extraction Apache Flink and Neo4j Meetup Berlin 39 3 1 3 4 5 2 UDF
  58. 58. Subgraph Extraction Apache Flink and Neo4j Meetup Berlin 39 3 1 3 4 5 2 3 4 1 2UDF
  59. 59. Subgraph Extraction Apache Flink and Neo4j Meetup Berlin 39 3 1 3 4 5 2 3 4 1 2 UDF UDF
  60. 60. Subgraph Extraction Apache Flink and Neo4j Meetup Berlin 39 3 1 3 4 5 2 3 4 1 2 3 4 1 2UDF UDF
  61. 61. Subgraph Extraction Apache Flink and Neo4j Meetup Berlin 39 3 1 3 4 5 2 3 4 1 2 3 4 1 2 UDF UDF UDF
  62. 62. Subgraph Extraction Apache Flink and Neo4j Meetup Berlin 39 3 1 3 4 5 2 3 4 1 2 3 4 1 2 4 3 5 2UDF UDF UDF
  63. 63. Graph Pattern Matching Apache Flink and Neo4j Meetup Berlin 40
  64. 64. Graph Pattern Matching Apache Flink and Neo4j Meetup Berlin 40 3 1 3 4 5 2
  65. 65. Graph Pattern Matching Apache Flink and Neo4j Meetup Berlin 40 3 1 3 4 5 2 Pattern
  66. 66. Graph Pattern Matching Apache Flink and Neo4j Meetup Berlin 40 3 1 3 4 5 2 Pattern 4 5 1 3 4 2
  67. 67. Graph Pattern Matching Apache Flink and Neo4j Meetup Berlin 40 3 1 3 4 5 2 Pattern 4 5 1 3 4 2 Graph Collection
  68. 68. Graph Grouping Apache Flink and Neo4j Meetup Berlin 41
  69. 69. Graph Grouping Apache Flink and Neo4j Meetup Berlin 41 3 1 3 4 5 2
  70. 70. Graph Grouping Apache Flink and Neo4j Meetup Berlin 41 Keys 3 1 3 4 5 2
  71. 71. Graph Grouping Apache Flink and Neo4j Meetup Berlin 41 Keys 3 1 3 4 5 2 4 6 7
  72. 72. Graph Grouping Apache Flink and Neo4j Meetup Berlin 41 Keys 3 1 3 4 5 2 4 6 7 3 a:23 a:84 a:42 a:12 1 3 4 5 2 a:13 a:21
  73. 73. Graph Grouping Apache Flink and Neo4j Meetup Berlin 41 Keys 3 1 3 4 5 2 4 6 7 +Aggregate 3 a:23 a:84 a:42 a:12 1 3 4 5 2 a:13 a:21
  74. 74. Graph Grouping Apache Flink and Neo4j Meetup Berlin 41 Keys 3 1 3 4 5 2 4 6 7 +Aggregate 3 a:23 a:84 a:42 a:12 1 3 4 5 2 a:13 a:21 4 count:2 count:2 max(a):42 max(a):84 max(a):13 max(a):21 6 7
  75. 75. Apply (e.g. Aggregation) Apache Flink and Neo4j Meetup Berlin 42
  76. 76. Apply (e.g. Aggregation) Apache Flink and Neo4j Meetup Berlin 42 1 2 3 revenue:7000 expense:1000 expense:1000 revenue:2000 revenue:4000 expense:3000 expense:1000 0 2 3 4 1 5 7 86 9 11 1210
  77. 77. Apply (e.g. Aggregation) Apache Flink and Neo4j Meetup Berlin 42 Operator 1 2 3 revenue:7000 expense:1000 expense:1000 revenue:2000 revenue:4000 expense:3000 expense:1000 0 2 3 4 1 5 7 86 9 11 1210
  78. 78. Apply (e.g. Aggregation) Apache Flink and Neo4j Meetup Berlin 42 Operator 1 2 3 revenue:7000 expense:1000 expense:1000 revenue:2000 revenue:4000 expense:3000 expense:1000 0 2 3 4 1 5 7 86 9 11 1210 1 | profit: 5000 2 | profit: -1000 3 | profit: 3000 revenue:7000 expense:1000 expense:1000 revenue:2000 revenue:4000 expense:3000 expense:1000 0 2 3 4 1 5 7 86 9 11 1210
  79. 79. Selection Apache Flink and Neo4j Meetup Berlin 43
  80. 80. Selection Apache Flink and Neo4j Meetup Berlin 43 1 | profit: 5000 2 | profit: -1000 3 | profit: 3000 revenue:7000 expense:1000 expense:1000 revenue:2000 revenue:4000 expense:3000 expense:1000 0 2 3 4 1 5 7 86 9 11 1210
  81. 81. Selection Apache Flink and Neo4j Meetup Berlin 43 UDF profit > 0 1 | profit: 5000 2 | profit: -1000 3 | profit: 3000 revenue:7000 expense:1000 expense:1000 revenue:2000 revenue:4000 expense:3000 expense:1000 0 2 3 4 1 5 7 86 9 11 1210
  82. 82. Selection Apache Flink and Neo4j Meetup Berlin 43 UDF profit > 0 1 | profit: 5000 2 | profit: -1000 3 | profit: 3000 revenue:7000 expense:1000 expense:1000 revenue:2000 revenue:4000 expense:3000 expense:1000 0 2 3 4 1 5 7 86 9 11 1210 1 | profit: 5000 3 | profit: 3000 revenue:7000 expense:1000 expense:1000 revenue:4000 expense:1000 0 2 3 4 1 9 11 1210
  83. 83. Call (e.g. Clustering) Apache Flink and Neo4j Meetup Berlin 44
  84. 84. Call (e.g. Clustering) Apache Flink and Neo4j Meetup Berlin 44 1 0 2 3 4 1 5 7 86 9 11 1210
  85. 85. Call (e.g. Clustering) Apache Flink and Neo4j Meetup Berlin 44 Algorithm 1 0 2 3 4 1 5 7 86 9 11 1210
  86. 86. Call (e.g. Clustering) Apache Flink and Neo4j Meetup Berlin 44 Algorithm 1 0 2 3 4 1 5 7 86 9 11 1210 2 3 4 0 2 3 4 1 5 7 86 9 11 1210
  87. 87. Call (e.g. PageRank) Apache Flink and Neo4j Meetup Berlin 45
  88. 88. Call (e.g. PageRank) Apache Flink and Neo4j Meetup Berlin 45 1 0 2 3 4 1 5 7 86 9 11 1210
  89. 89. Call (e.g. PageRank) Apache Flink and Neo4j Meetup Berlin 45 Algorithm 1 0 2 3 4 1 5 7 86 9 11 1210
  90. 90. Call (e.g. PageRank) Apache Flink and Neo4j Meetup Berlin 45 Algorithm 2 rank:0.11 rank:0.25 rank:0.11 rank:1.29 rank:1.29 rank:1.58rank:0.11rank:5.12 rank:0.11 rank:0.11 rank:0.26 rank:0.11 rank:2.47 0 2 3 4 1 5 7 86 9 11 1210 1 0 2 3 4 1 5 7 86 9 11 1210
  91. 91. EPGM Operators Overview Apache Flink and Neo4j Meetup Berlin 46 Operators Unary Binary GraphCollectionLogicalGraph Algorithms Aggregation Pattern Matching Transformation Grouping Equality Call Combination Overlap Exclusion Equality Union Intersection Difference Flink Gelly Library BTG Extraction Frequent Subgraphs Limit Selection Distinct Sort Apply Reduce Call Adaptive Partitioning Subgraph
  92. 92. EPGM Operators Overview Apache Flink and Neo4j Meetup Berlin 47 Operators Unary Binary GraphCollectionLogicalGraph Algorithms Aggregation Pattern Matching Transformation Grouping Equality Call Combination Overlap Exclusion Equality Union Intersection Difference Flink Gelly Library BTG Extraction Frequent Subgraphs Limit Selection Distinct Sort Apply Reduce Call Adaptive Partitioning Subgraph
  93. 93. EPGM Operators Overview Apache Flink and Neo4j Meetup Berlin 48 Operators Unary Binary GraphCollectionLogicalGraph Algorithms Aggregation Pattern Matching Transformation Grouping Equality Call Combination Overlap Exclusion Equality Union Intersection Difference Flink Gelly Library BTG Extraction Frequent Subgraphs Limit Selection Distinct Sort Apply Reduce Call Adaptive Partitioning Subgraph
  94. 94. Apache Flink and Neo4j Meetup Berlin 49 EPGM on Apache Flink
  95. 95. Flink DataSet API Apache Flink and Neo4j Meetup Berlin 50
  96. 96. Flink DataSet API Apache Flink and Neo4j Meetup Berlin 50 • DataSet := Distributed Collection of Data Objects DataSet DataSet DataSet
  97. 97. Flink DataSet API Apache Flink and Neo4j Meetup Berlin 50 • DataSet := Distributed Collection of Data Objects • Transformation := Operation on DataSets DataSet DataSet DataSet Transformation Transformation DataSet DataSet
  98. 98. Flink DataSet API Apache Flink and Neo4j Meetup Berlin 50 • DataSet := Distributed Collection of Data Objects • Transformation := Operation on DataSets • Flink Programm := Composition of Transformations DataSet DataSet DataSet Transformation Transformation DataSet DataSet Transformation DataSet Flink Program
  99. 99. Flink DataSet API Apache Flink and Neo4j Meetup Berlin 50 DataSetDataSetDataSet DataSetDataSetDataSet DataSetDataSetDataSet DataSetDataSetDataSet DataSetDataSetDataSet DataSetDataSetDataSet • DataSet := Distributed Collection of Data Objects • Transformation := Operation on DataSets • Flink Programm := Composition of Transformations DataSet DataSet DataSet Transformation Transformation DataSet DataSet Transformation DataSet Flink Program
  100. 100. Graph Representation Apache Flink and Neo4j Meetup Berlin 51
  101. 101. Graph Representation Apache Flink and Neo4j Meetup Berlin 51 EPGMGraphHead Id Label Properties POJO DataSet<EPGMGraphHead>
  102. 102. Graph Representation Apache Flink and Neo4j Meetup Berlin 51 Id Label Properties Graphs EPGMGraphHead EPGMVertex Id Label Properties POJO POJO DataSet<EPGMGraphHead> DataSet<EPGMVertex>
  103. 103. Graph Representation Apache Flink and Neo4j Meetup Berlin 51 Id Label Properties Graphs Id Label Properties SourceId TargetId Graphs EPGMGraphHead EPGMVertex EPGMEdge Id Label Properties POJO POJO POJO DataSet<EPGMGraphHead> DataSet<EPGMVertex> DataSet<EPGMEdge>
  104. 104. Graph Representation Apache Flink and Neo4j Meetup Berlin 51 Id Label Properties Graphs Id Label Properties SourceId TargetId Graphs EPGMGraphHead EPGMVertex EPGMEdge Id Label Properties POJO POJO POJO DataSet<EPGMGraphHead> DataSet<EPGMVertex> DataSet<EPGMEdge> Id Label Properties Graphs EPGMVertex
  105. 105. Graph Representation Apache Flink and Neo4j Meetup Berlin 51 Id Label Properties Graphs Id Label Properties SourceId TargetId Graphs EPGMGraphHead EPGMVertex EPGMEdge Id Label Properties POJO POJO POJO DataSet<EPGMGraphHead> DataSet<EPGMVertex> DataSet<EPGMEdge> Id Label Properties Graphs EPGMVertex GradoopId := UUID 128-bit
  106. 106. Graph Representation Apache Flink and Neo4j Meetup Berlin 51 Id Label Properties Graphs Id Label Properties SourceId TargetId Graphs EPGMGraphHead EPGMVertex EPGMEdge Id Label Properties POJO POJO POJO DataSet<EPGMGraphHead> DataSet<EPGMVertex> DataSet<EPGMEdge> Id Label Properties Graphs EPGMVertex GradoopId := UUID 128-bit String
  107. 107. Graph Representation Apache Flink and Neo4j Meetup Berlin 51 Id Label Properties Graphs Id Label Properties SourceId TargetId Graphs EPGMGraphHead EPGMVertex EPGMEdge Id Label Properties POJO POJO POJO DataSet<EPGMGraphHead> DataSet<EPGMVertex> DataSet<EPGMEdge> Id Label Properties Graphs EPGMVertex GradoopId := UUID 128-bit String PropertyList := List<Property> Property := (String, PropertyValue) PropertyValue := byte[]
  108. 108. Graph Representation Apache Flink and Neo4j Meetup Berlin 51 Id Label Properties Graphs Id Label Properties SourceId TargetId Graphs EPGMGraphHead EPGMVertex EPGMEdge Id Label Properties POJO POJO POJO DataSet<EPGMGraphHead> DataSet<EPGMVertex> DataSet<EPGMEdge> Id Label Properties Graphs EPGMVertex GradoopId := UUID 128-bit String PropertyList := List<Property> Property := (String, PropertyValue) PropertyValue := byte[] GradoopIdSet := Set<GradoopId>
  109. 109. Graph Representation Apache Flink and Neo4j Meetup Berlin 52
  110. 110. Graph Representation Apache Flink and Neo4j Meetup Berlin 52 1 3 4 5 2 1|Community|interest:Heavy Metal 2|Community|interest:Hard Rock Person name : Alice born : 1984 Band name : Metallica founded : 1981 Person name : Bob Person name : Eve Band name : AC/DC founded : 1973 likes since : 2014 likes since : 2013 likes since : 2015 knows likes since : 2014 1 2 3 4 5
  111. 111. Graph Representation Apache Flink and Neo4j Meetup Berlin 52 Id Label Properties 1 Community {interest:Heavy Metal} 2 Community {interest:Hard Rock} 1 3 4 5 2 1|Community|interest:Heavy Metal 2|Community|interest:Hard Rock Person name : Alice born : 1984 Band name : Metallica founded : 1981 Person name : Bob Person name : Eve Band name : AC/DC founded : 1973 likes since : 2014 likes since : 2013 likes since : 2015 knows likes since : 2014 1 2 3 4 5 DataSet<EPGMGraphHead>
  112. 112. Graph Representation Apache Flink and Neo4j Meetup Berlin 52 Id Label Properties 1 Community {interest:Heavy Metal} 2 Community {interest:Hard Rock} Id Label Properties Graphs 1 Person {name:Alice, born:1984} {1} 2 Band {name:Metallica,founded:1981} {1} 3 Person {name:Bob} {1,2} 4 Band {name:AC/DC,founded:1973} {2} 5 Person {name:Eve} {2} 1 3 4 5 2 1|Community|interest:Heavy Metal 2|Community|interest:Hard Rock Person name : Alice born : 1984 Band name : Metallica founded : 1981 Person name : Bob Person name : Eve Band name : AC/DC founded : 1973 likes since : 2014 likes since : 2013 likes since : 2015 knows likes since : 2014 1 2 3 4 5 DataSet<EPGMGraphHead> DataSet<EPGMVertex>
  113. 113. Graph Representation Apache Flink and Neo4j Meetup Berlin 52 Id Label Properties 1 Community {interest:Heavy Metal} 2 Community {interest:Hard Rock} Id Label Properties Graphs 1 Person {name:Alice, born:1984} {1} 2 Band {name:Metallica,founded:1981} {1} 3 Person {name:Bob} {1,2} 4 Band {name:AC/DC,founded:1973} {2} 5 Person {name:Eve} {2} Id Label Source Target Properties Graphs 1 likes 1 2 {since:2014} {1} 2 likes 3 2 {since:2013} {1} 3 likes 3 4 {since:2015} {2} 4 knows 3 5 {} {2} 5 likes 5 4 {since:2014} {2} 1 3 4 5 2 1|Community|interest:Heavy Metal 2|Community|interest:Hard Rock Person name : Alice born : 1984 Band name : Metallica founded : 1981 Person name : Bob Person name : Eve Band name : AC/DC founded : 1973 likes since : 2014 likes since : 2013 likes since : 2015 knows likes since : 2014 1 2 3 4 5 DataSet<EPGMGraphHead> DataSet<EPGMVertex> DataSet<EPGMEdge>
  114. 114. Flink DataSet Transformations Apache Flink and Neo4j Meetup Berlin 53
  115. 115. Flink DataSet Transformations Apache Flink and Neo4j Meetup Berlin 53 SQL-like Transformations • filter • project • cross • union • distinct • first-N (limit) • groupBy • aggregate • join • leftOuterJoin • rightOuterJoin • fullOuterJoin
  116. 116. Flink DataSet Transformations Apache Flink and Neo4j Meetup Berlin 53 Hadoop-like Transformations • map • flatMap • mapPartition • reduce • reduceGroup • coGroup Special Flink Operations • iterate • iterateDelta SQL-like Transformations • filter • project • cross • union • distinct • first-N (limit) • groupBy • aggregate • join • leftOuterJoin • rightOuterJoin • fullOuterJoin
  117. 117. Operator Implementation Apache Flink and Neo4j Meetup Berlin 54 1 3 4 5 2 1|Community|interest:Heavy Metal 2|Community|interest:Hard Rock Person name : Alice born : 1984 Band name : Metallica founded : 1981 Person name : Bob Person name : Eve Band name : AC/DC founded : 1973 likes since : 2014 likes since : 2013 likes since : 2015 knows likes since : 2014 1 2 3 4 5
  118. 118. Operator Implementation Apache Flink and Neo4j Meetup Berlin 54 1 3 4 5 2 1|Community|interest:Heavy Metal 2|Community|interest:Hard Rock Person name : Alice born : 1984 Band name : Metallica founded : 1981 Person name : Bob Person name : Eve Band name : AC/DC founded : 1973 likes since : 2014 likes since : 2013 likes since : 2015 knows likes since : 2014 1 2 3 4 5 Exclusion
  119. 119. Operator Implementation Apache Flink and Neo4j Meetup Berlin 54 1 3 4 5 2 1|Community|interest:Heavy Metal 2|Community|interest:Hard Rock Person name : Alice born : 1984 Band name : Metallica founded : 1981 Person name : Bob Person name : Eve Band name : AC/DC founded : 1973 likes since : 2014 likes since : 2013 likes since : 2015 knows likes since : 2014 1 2 3 4 5 // input: firstGraph (G[1]), secondGraph (G[2]) 1: DataSet<GradoopId> graphId = secondGraph.getGraphHead() 2: .map(new Id<G>()); 3: 4: DataSet<V> newVertices = firstGraph.getVertices() 5: .filter(new NotInGraphBroadCast<V>()) 6: .withBroadcastSet(graphId, GRAPH_ID); 7: 8: DataSet<E> newEdges = firstGraph.getEdges() 9: .filter(new NotInGraphBroadCast<E>()) 10: .withBroadcastSet(graphId, GRAPH_ID) 11: .join(newVertices) 12: .where(new SourceId<E>().equalTo(new Id<V>()) 13: .with(new LeftSide<E, V>()) 14: .join(newVertices) 15: .where(new TargetId<E>().equalTo(new Id<V>()) 16: .with(new LeftSide<E, V>()); Exclusion
  120. 120. Operator Implementation – Exclusion Apache Flink and Neo4j Meetup Berlin 55
  121. 121. Operator Implementation – Exclusion Apache Flink and Neo4j Meetup Berlin 55 graphId = secondGraph.getGraphHead()
  122. 122. Operator Implementation – Exclusion Apache Flink and Neo4j Meetup Berlin 55 Id Label Properties 2 Community {interest:Hard Rock} graphId = secondGraph.getGraphHead()
  123. 123. Operator Implementation – Exclusion Apache Flink and Neo4j Meetup Berlin 55 Id Label Properties 2 Community {interest:Hard Rock} graphId = secondGraph.getGraphHead() .map(new Id<G>());
  124. 124. Operator Implementation – Exclusion Apache Flink and Neo4j Meetup Berlin 55 Id Label Properties 2 Community {interest:Hard Rock} graphId = secondGraph.getGraphHead() Id 2 .map(new Id<G>());
  125. 125. Operator Implementation – Exclusion Apache Flink and Neo4j Meetup Berlin 55 Id Label Properties 2 Community {interest:Hard Rock} graphId = secondGraph.getGraphHead() Id 2 newVertices = firstGraph.getVertices() .map(new Id<G>());
  126. 126. Operator Implementation – Exclusion Apache Flink and Neo4j Meetup Berlin 55 Id Label Properties 2 Community {interest:Hard Rock} graphId = secondGraph.getGraphHead() Id 2 newVertices = firstGraph.getVertices() Id Label Properties Graphs 1 Person {name:Alice} {1} 2 Band {name:Metallica,founded:1981} {1} 3 Person {name:Bob} {1,2} .map(new Id<G>());
  127. 127. Operator Implementation – Exclusion Apache Flink and Neo4j Meetup Berlin 55 Id Label Properties 2 Community {interest:Hard Rock} graphId = secondGraph.getGraphHead() Id 2 newVertices = firstGraph.getVertices() Id Label Properties Graphs 1 Person {name:Alice} {1} 2 Band {name:Metallica,founded:1981} {1} 3 Person {name:Bob} {1,2} .map(new Id<G>()); .filter(new NotInGraphBroadCast<V>()) .withBroadcastSet(graphId, GRAPH_ID);
  128. 128. Operator Implementation – Exclusion Apache Flink and Neo4j Meetup Berlin 55 Id Label Properties 2 Community {interest:Hard Rock} graphId = secondGraph.getGraphHead() Id 2 newVertices = firstGraph.getVertices() Id Label Properties Graphs 1 Person {name:Alice} {1} 2 Band {name:Metallica,founded:1981} {1} 3 Person {name:Bob} {1,2} Id Label Properties Graphs 1 Person {name:Alice} {1} 2 Band {name:Metallica,founded:1981} {1} .map(new Id<G>()); .filter(new NotInGraphBroadCast<V>()) .withBroadcastSet(graphId, GRAPH_ID);
  129. 129. Operator Implementation – Exclusion Apache Flink and Neo4j Meetup Berlin 56
  130. 130. Operator Implementation – Exclusion Apache Flink and Neo4j Meetup Berlin 56 newEdges = firstGraph.getEdges()
  131. 131. Operator Implementation – Exclusion Apache Flink and Neo4j Meetup Berlin 56 newEdges = firstGraph.getEdges() Id Label Source Target Properties Graphs 1 likes 1 2 {since:2014} {1} 2 likes 3 2 {since:2013} {1}
  132. 132. Operator Implementation – Exclusion Apache Flink and Neo4j Meetup Berlin 56 newEdges = firstGraph.getEdges() Id Label Source Target Properties Graphs 1 likes 1 2 {since:2014} {1} 2 likes 3 2 {since:2013} {1} .filter(new NotInGraphBroadCast<E>()) .withBroadcastSet(graphId, GRAPH_ID)
  133. 133. Operator Implementation – Exclusion Apache Flink and Neo4j Meetup Berlin 56 newEdges = firstGraph.getEdges() Id Label Source Target Properties Graphs 1 likes 1 2 {since:2014} {1} 2 likes 3 2 {since:2013} {1} Id Label Source Target Properties Graphs 1 likes 1 2 {since:2014} {1} 2 likes 3 2 {since:2013} {1} .filter(new NotInGraphBroadCast<E>()) .withBroadcastSet(graphId, GRAPH_ID)
  134. 134. Operator Implementation – Exclusion Apache Flink and Neo4j Meetup Berlin 56 newEdges = firstGraph.getEdges() Id Label Source Target Properties Graphs 1 likes 1 2 {since:2014} {1} 2 likes 3 2 {since:2013} {1} Id Label Source Target Properties Graphs 1 likes 1 2 {since:2014} {1} 2 likes 3 2 {since:2013} {1} .join(newVertices) .where(new SourceId<E>().equalTo(new Id<V>()) .filter(new NotInGraphBroadCast<E>()) .withBroadcastSet(graphId, GRAPH_ID)
  135. 135. Operator Implementation – Exclusion Apache Flink and Neo4j Meetup Berlin 56 newEdges = firstGraph.getEdges() Id Label Source Target Properties Graphs 1 likes 1 2 {since:2014} {1} 2 likes 3 2 {since:2013} {1} Id Label Source Target Properties Graphs 1 likes 1 2 {since:2014} {1} 2 likes 3 2 {since:2013} {1} Id Label Source Target … Id Label … 1 likes 1 2 … 1 Person … .join(newVertices) .where(new SourceId<E>().equalTo(new Id<V>()) .filter(new NotInGraphBroadCast<E>()) .withBroadcastSet(graphId, GRAPH_ID)
  136. 136. Operator Implementation – Exclusion Apache Flink and Neo4j Meetup Berlin 56 newEdges = firstGraph.getEdges() Id Label Source Target Properties Graphs 1 likes 1 2 {since:2014} {1} 2 likes 3 2 {since:2013} {1} Id Label Source Target Properties Graphs 1 likes 1 2 {since:2014} {1} 2 likes 3 2 {since:2013} {1} Id Label Source Target … Id Label … 1 likes 1 2 … 1 Person … .with(new LeftSide<E, V>()) .join(newVertices) .where(new SourceId<E>().equalTo(new Id<V>()) .filter(new NotInGraphBroadCast<E>()) .withBroadcastSet(graphId, GRAPH_ID)
  137. 137. Operator Implementation – Exclusion Apache Flink and Neo4j Meetup Berlin 56 newEdges = firstGraph.getEdges() Id Label Source Target Properties Graphs 1 likes 1 2 {since:2014} {1} 2 likes 3 2 {since:2013} {1} Id Label Source Target Properties Graphs 1 likes 1 2 {since:2014} {1} 2 likes 3 2 {since:2013} {1} Id Label Source Target … Id Label … 1 likes 1 2 … 1 Person … Id Label Source Target … 1 likes 1 2 … .with(new LeftSide<E, V>()) .join(newVertices) .where(new SourceId<E>().equalTo(new Id<V>()) .filter(new NotInGraphBroadCast<E>()) .withBroadcastSet(graphId, GRAPH_ID)
  138. 138. Operator Implementation – Exclusion Apache Flink and Neo4j Meetup Berlin 56 newEdges = firstGraph.getEdges() Id Label Source Target Properties Graphs 1 likes 1 2 {since:2014} {1} 2 likes 3 2 {since:2013} {1} Id Label Source Target Properties Graphs 1 likes 1 2 {since:2014} {1} 2 likes 3 2 {since:2013} {1} Id Label Source Target … Id Label … 1 likes 1 2 … 1 Person … Id Label Source Target … 1 likes 1 2 … .join(newVertices) .where(new TargetId<E>().equalTo(new Id<V>()) .with(new LeftSide<E, V>()) .join(newVertices) .where(new SourceId<E>().equalTo(new Id<V>()) .filter(new NotInGraphBroadCast<E>()) .withBroadcastSet(graphId, GRAPH_ID)
  139. 139. Operator Implementation – Exclusion Apache Flink and Neo4j Meetup Berlin 56 newEdges = firstGraph.getEdges() Id Label Source Target Properties Graphs 1 likes 1 2 {since:2014} {1} 2 likes 3 2 {since:2013} {1} Id Label Source Target Properties Graphs 1 likes 1 2 {since:2014} {1} 2 likes 3 2 {since:2013} {1} Id Label Source Target … Id Label … 1 likes 1 2 … 1 Person … Id Label Source Target … 1 likes 1 2 … Id Label Source Target … Id Label … 1 likes 1 2 … 2 Band … .join(newVertices) .where(new TargetId<E>().equalTo(new Id<V>()) .with(new LeftSide<E, V>()) .join(newVertices) .where(new SourceId<E>().equalTo(new Id<V>()) .filter(new NotInGraphBroadCast<E>()) .withBroadcastSet(graphId, GRAPH_ID)
  140. 140. Operator Implementation – Exclusion Apache Flink and Neo4j Meetup Berlin 56 newEdges = firstGraph.getEdges() Id Label Source Target Properties Graphs 1 likes 1 2 {since:2014} {1} 2 likes 3 2 {since:2013} {1} Id Label Source Target Properties Graphs 1 likes 1 2 {since:2014} {1} 2 likes 3 2 {since:2013} {1} Id Label Source Target … Id Label … 1 likes 1 2 … 1 Person … Id Label Source Target … 1 likes 1 2 … Id Label Source Target … Id Label … 1 likes 1 2 … 2 Band … .with(new LeftSide<E, V>()); .join(newVertices) .where(new TargetId<E>().equalTo(new Id<V>()) .with(new LeftSide<E, V>()) .join(newVertices) .where(new SourceId<E>().equalTo(new Id<V>()) .filter(new NotInGraphBroadCast<E>()) .withBroadcastSet(graphId, GRAPH_ID)
  141. 141. Operator Implementation – Exclusion Apache Flink and Neo4j Meetup Berlin 56 newEdges = firstGraph.getEdges() Id Label Source Target Properties Graphs 1 likes 1 2 {since:2014} {1} 2 likes 3 2 {since:2013} {1} Id Label Source Target Properties Graphs 1 likes 1 2 {since:2014} {1} 2 likes 3 2 {since:2013} {1} Id Label Source Target … Id Label … 1 likes 1 2 … 1 Person … Id Label Source Target … 1 likes 1 2 … Id Label Source Target … Id Label … 1 likes 1 2 … 2 Band … Id Label Source Target … 1 likes 1 2 … .with(new LeftSide<E, V>()); .join(newVertices) .where(new TargetId<E>().equalTo(new Id<V>()) .with(new LeftSide<E, V>()) .join(newVertices) .where(new SourceId<E>().equalTo(new Id<V>()) .filter(new NotInGraphBroadCast<E>()) .withBroadcastSet(graphId, GRAPH_ID)
  142. 142. GrALa API Apache Flink and Neo4j Meetup Berlin 57
  143. 143. GrALa API Apache Flink and Neo4j Meetup Berlin 57 class LogicalGraph<G extends EPGMGraphHead, V extends EPGMVertex, E extends EPGMEdge> { fromCollections(...) : LogicalGraph<G, V, E> fromDataSets(...) : LogicalGraph<G, V, E> fromGellyGraph(...) : LogicalGraph<G, V, E> getGraphHead() : DataSet<G> getVertices() : DataSet<V> getEdges() : DataSet<E> aggregate(...) : LogicalGraph<G, V, E> match(...) : GraphCollection<G, V, E> groupBy(...) : LogicalGraph<G, V, E> subgraph(...) : LogicalGraph<G, V, E> combine(...) : LogicalGraph<G, V, E> // ... }
  144. 144. GrALa API Apache Flink and Neo4j Meetup Berlin 57 class LogicalGraph<G extends EPGMGraphHead, V extends EPGMVertex, E extends EPGMEdge> { fromCollections(...) : LogicalGraph<G, V, E> fromDataSets(...) : LogicalGraph<G, V, E> fromGellyGraph(...) : LogicalGraph<G, V, E> getGraphHead() : DataSet<G> getVertices() : DataSet<V> getEdges() : DataSet<E> aggregate(...) : LogicalGraph<G, V, E> match(...) : GraphCollection<G, V, E> groupBy(...) : LogicalGraph<G, V, E> subgraph(...) : LogicalGraph<G, V, E> combine(...) : LogicalGraph<G, V, E> // ... } class GraphCollection<G extends EPGMGraphHead, V extends EPGMVertex, E extends EPGMEdge > { fromCollections(...) : GraphCollection<G, V, E> fromDataSets(...) : GraphCollection<G, V, E> getGraphHeads() : DataSet<G> getVertices() : DataSet<V> getEdges() : DataSet<E> select(...) : GraphCollection<G, V, E> distinct( ) : GraphCollection<G, V, E> sortBy(...) : GraphCollection<G, V, E> union(...) : GraphCollection<G, V, E> difference(...) : GraphCollection<G, V, E> // ... }
  145. 145. GrALa API Apache Flink and Neo4j Meetup Berlin 58 class EPGMDatabase<G extends EPGMGraphHead, V extends EPGMVertex, E extends EPGMEdge> { fromCollections(...) : EPGMDatabase<G, V, E> fromDataSets(...) : EPGMDatabase<G, V, E> fromHBase(...) : EPGMDatabase<G, V, E> fromJSON(...) : EPGMDatabase<G, V, E> fromExternalGraph(...) : EPGMDatabase<G, V, E> writeAsJSON(...) : void writeToHBase(...) : void getDatabaseGraph( ) : LogicalGraph<G, V, E> getGraphById(...) : LogicalGraph<G, V, E> getGraphsById(...) : GraphCollection<G, V, E> // ... }
  146. 146. GrALa API Apache Flink and Neo4j Meetup Berlin 59 class EPGMDatabase<G extends EPGMGraphHead, V extends EPGMVertex, E extends EPGMEdge> { fromCollections(...) : EPGMDatabase<G, V, E> fromDataSets(...) : EPGMDatabase<G, V, E> fromHBase(...) : EPGMDatabase<G, V, E> fromJSON(...) : EPGMDatabase<G, V, E> fromExternalGraph(...) : EPGMDatabase<G, V, E> writeAsJSON(...) : void writeToHBase(...) : void getDatabaseGraph( ) : LogicalGraph<G, V, E> getGraphById(...) : LogicalGraph<G, V, E> getGraphsById(...) : GraphCollection<G, V, E> // ... }
  147. 147. Apache Flink and Neo4j Meetup Berlin 60 Performance
  148. 148. Social Network Benchmark Apache Flink and Neo4j Meetup Berlin 61
  149. 149. Social Network Benchmark Apache Flink and Neo4j Meetup Berlin 61 http://www.ldbcouncil.org/
  150. 150. Social Network Benchmark Apache Flink and Neo4j Meetup Berlin 61 1. Extract subgraph containing only Persons and knows relations http://www.ldbcouncil.org/
  151. 151. Social Network Benchmark Apache Flink and Neo4j Meetup Berlin 61 1. Extract subgraph containing only Persons and knows relations 2. Transform Persons to necessary information http://www.ldbcouncil.org/
  152. 152. Social Network Benchmark Apache Flink and Neo4j Meetup Berlin 61 1. Extract subgraph containing only Persons and knows relations 2. Transform Persons to necessary information 3. Find communities using Label Propagation http://www.ldbcouncil.org/
  153. 153. Social Network Benchmark Apache Flink and Neo4j Meetup Berlin 61 1. Extract subgraph containing only Persons and knows relations 2. Transform Persons to necessary information 3. Find communities using Label Propagation 4. Aggregate vertex count for each community http://www.ldbcouncil.org/
  154. 154. Social Network Benchmark Apache Flink and Neo4j Meetup Berlin 61 1. Extract subgraph containing only Persons and knows relations 2. Transform Persons to necessary information 3. Find communities using Label Propagation 4. Aggregate vertex count for each community 5. Select communities with more than 50K users http://www.ldbcouncil.org/
  155. 155. Social Network Benchmark Apache Flink and Neo4j Meetup Berlin 61 1. Extract subgraph containing only Persons and knows relations 2. Transform Persons to necessary information 3. Find communities using Label Propagation 4. Aggregate vertex count for each community 5. Select communities with more than 50K users 6. Combine large communities to a single graph http://www.ldbcouncil.org/
  156. 156. Social Network Benchmark Apache Flink and Neo4j Meetup Berlin 61 1. Extract subgraph containing only Persons and knows relations 2. Transform Persons to necessary information 3. Find communities using Label Propagation 4. Aggregate vertex count for each community 5. Select communities with more than 50K users 6. Combine large communities to a single graph 7. Group graph by Persons location and gender http://www.ldbcouncil.org/
  157. 157. Social Network Benchmark Apache Flink and Neo4j Meetup Berlin 61 1. Extract subgraph containing only Persons and knows relations 2. Transform Persons to necessary information 3. Find communities using Label Propagation 4. Aggregate vertex count for each community 5. Select communities with more than 50K users 6. Combine large communities to a single graph 7. Group graph by Persons location and gender 8. Aggregate vertex and edge count of grouped graph http://www.ldbcouncil.org/
  158. 158. Social Network Benchmark Apache Flink and Neo4j Meetup Berlin 62 1. Extract subgraph containing only Persons and knows relations 2. Transform Persons to necessary information 3. Find communities using Label Propagation 4. Aggregate vertex count for each community 5. Select communities with more than 50K users 6. Combine large communities to a single graph 7. Group graph by Persons location and gender 8. Aggregate vertex and edge count of grouped graph https://git.io/vgozj
  159. 159. Social Network Benchmark Apache Flink and Neo4j Meetup Berlin 63 Dataset # Vertices # Edges Disk size Graphalytics.1 61,613 2,026,082 570 MB Graphalytics.10 260,613 16,600,778 4.5 GB Graphalytics.100 1,695,613 147,437,275 40.2 GB Graphalytics.1000 12,775,613 1,363,747,260 372 GB Graphalytics.10000 90,025,613 10,872,109,028 2.9 TB • 16x Intel(R) Xeon(R) 2.50GHz 6 (12) • 16x 48 GB RAM • 1 Gigabit Ethernet • Hadoop 2.6.0 • Flink 1.0-SNAPSHOT • slots (per worker) 12 • jobmanager.heap.mb 2048 • taskmanager.heap.mb 40960
  160. 160. Social Network Benchmark – Runtime Apache Flink and Neo4j Meetup Berlin 64 Dataset # Vertices # Edges Disk size Graphalytics.1 61,613 2,026,082 570 MB Graphalytics.10 260,613 16,600,778 4.5 GB Graphalytics.100 1,695,613 147,437,275 40.2 GB Graphalytics.1000 12,775,613 1,363,747,260 372 GB Graphalytics.10000 90,025,613 10,872,109,028 2.9 TB • 16x Intel(R) Xeon(R) 2.50GHz 6 (12) • 16x 48 GB RAM • 1 Gigabit Ethernet • Hadoop 2.6.0 • Flink 1.0-SNAPSHOT • slots (per worker) 12 • jobmanager.heap.mb 2048 • taskmanager.heap.mb 40960 0 200 400 600 800 1000 1200 1 2 4 8 16 Runtime[s] Number of workers Graphalytics.100
  161. 161. 1 2 4 8 16 1 2 4 8 16 Speedup Number of workers Graphalytics.100 Linear Social Network Benchmark – Speedup Apache Flink and Neo4j Meetup Berlin 65 Dataset # Vertices # Edges Disk size Graphalytics.1 61,613 2,026,082 570 MB Graphalytics.10 260,613 16,600,778 4.5 GB Graphalytics.100 1,695,613 147,437,275 40.2 GB Graphalytics.1000 12,775,613 1,363,747,260 372 GB Graphalytics.10000 90,025,613 10,872,109,028 2.9 TB • 16x Intel(R) Xeon(R) 2.50GHz 6 (12) • 16x 48 GB RAM • 1 Gigabit Ethernet • Hadoop 2.6.0 • Flink 1.0-SNAPSHOT • slots (per worker) 12 • jobmanager.heap.mb 2048 • taskmanager.heap.mb 40960
  162. 162. 1 10 100 1000 10000 Runtime[s] Social Network Benchmark – Datasets Apache Flink and Neo4j Meetup Berlin 66 Dataset # Vertices # Edges Disk size Graphalytics.1 61,613 2,026,082 570 MB Graphalytics.10 260,613 16,600,778 4.5 GB Graphalytics.100 1,695,613 147,437,275 40.2 GB Graphalytics.1000 12,775,613 1,363,747,260 372 GB Graphalytics.10000 90,025,613 10,872,109,028 2.9 TB • 16x Intel(R) Xeon(R) 2.50GHz 6 (12) • 16x 48 GB RAM • 1 Gigabit Ethernet • Hadoop 2.6.0 • Flink 1.0-SNAPSHOT • slots (per worker) 12 • jobmanager.heap.mb 2048 • taskmanager.heap.mb 40960
  163. 163. Apache Flink and Neo4j Meetup Berlin 67 Demo https://github.com/s1ck/neo4j-gradoop-demos
  164. 164. Apache Flink and Neo4j Meetup Berlin 68 Current State and Future Work
  165. 165. Current State – Operator Implementations Apache Flink and Neo4j Meetup Berlin 69 Operators Unary Binary GraphCollectionLogicalGraph Algorithms Aggregation Pattern Matching Transformation Grouping Equality Call Combination Overlap Exclusion Equality Union Intersection Difference Flink Gelly Library BTG Extraction Frequent Subgraphs Limit Selection Distinct Sort Apply Reduce Call Adaptive Partitioning Subgraph
  166. 166. Release History Apache Flink and Neo4j Meetup Berlin 70 • 0.0.1 First Prototype (May 2015) – Hadoop MapReduce and Giraph for operator implementations – Too much complexity – Performance loss through serialization in HDFS/HBase • 0.0.2 Using Flink as execution layer (June 2015) – Basic operators • 0.1 December 2015 – System-side identifiers (UUID) – Improved property handling – More operator implementations (e.g., Equality, Bool operators) – Code refactoring • 0.2-SNAPSHOT – Graph Pattern Matching – Frequent Subgraph Mining – Memory optimization (96-bit ID, Dictionary Encoding, …) – Tuple Implementation
  167. 167. Contributions to Flink Apache Flink and Neo4j Meetup Berlin 71 • FLINK-2411 Add basic graph summarization algorithm • FLINK-2590 DataSetUtils.zipWithUniqueID creates duplicate Ids • FLINK-2905 Add intersect method to Graph class • FLINK-2910 Combine tests for binary graph operators • FLINK-2941 Implement a neo4j - Flink/Gelly connector • FLINK-2981 Update README for building docs • FLINK-3064 Missing size check in GroupReduceOperatorBase leads to NPE • FLINK-3118 Check if MessageFunction implements ResultTypeQueryable • FLINK-3122 Generalize value type in LabelPropagation • FLINK-3272 Generalize vertex value type in ConnectedComponents • Flink Forward (October 2015) • Meetup Big Data Usergroup Saxony (December 2015) • FOSDEM (January 2016)
  168. 168. Contributions Welcome Apache Flink and Neo4j Meetup Berlin 72 • Code – Operator implementations / improvement – Performance Tuning • People – Bachelor / Master Thesis – Open PhD positions in Leipzig, Germany • Use Cases and (Big) Data!
  169. 169. Apache Flink and Neo4j Meetup Berlin 73 Thank you! www.gradoop.com http://flink.apache.org http://neo4j.com http://ldbcouncil.org https://github.com/s1ck/neo4j-gradoop-demos https://github.com/s1ck/flink-neo4j https://github.com/s1ck/ldbc-flink-import https://github.com/s1ck/gdl

×