The slides contain an overview of Gradoop, our framework for end-to-end graph analytics. We present our extended property graph data model and give an introduction into Apache Flink and its DataSet API. We show, how our data model is mapped to Flink DataSets and how we implement graph operators using DataSet transformations. Furthermore the slides contain information about two useful tools we developed around Gradoop: Graph Definition Language (GDL) and ldbc-flink-import.
Aspirational Block Program Block Syaldey District - Almora
Meetup Big Data User Group Dresden: Gradoop - Scalable Graph Analytics with Apache Flink
1. GRADOOP: Scalable Graph
Analytics with Apache Flink
Martin Junghanns
Leipzig University
Big Data User Group Dresden / Graph Databases Sachsen
December 2015
2. About the speaker and the team
André, PhD StudentMartin, PhD Student
Kevin, M.Sc. StudentNiklas, M.Sc. Student
Prof. Dr. Erhard Rahm
Database Chair
3. Outline
Motivation
Gradoop Architecture
Extended Property Graph Model (EPGM)
Apache Flink
EPGM on Apache Flink
Business Intelligence Use Case
Tooling
Current State & Future Work
13. World Wide Web
ca. 1 billion websites
“Graphs are large”
Facebook
ca. 1.49 billion active users
ca. 340 friends per user
14. End-to-End Graph Analytics
Data Integration Graph Analytics Representation
Integrate data from one or more sources into a dedicated
graph storage with common graph data model
Definition of analytical workflows from operator algebra
Result representation in a meaningful way
15. Graph Data Management
Graph Database
Systems
Neo4j, OrientDB
Graph Processing
Systems
Pregel, Giraph
Distributed Workflow
Systems
Flink Gelly, Spark GraphX
Data Model Rich Graph
Models
Generic Graph Models Generic Graph Models
Focus Local ACID
Operations
Global Graph Operations Global Data and Graph
Operations
Query Language Yes No No
Persistency Yes No No
Scalability Vertical Horizontal Horizontal
Workflows No No Yes
Data Integration No No No
Graph Analytics No Yes Yes
Representation Yes No No
16. What‘s missing?
An end-to-end framework and research platform
for efficient, distributed and domain independent
graph data management and analytics.
18. High Level Architecture
HDFS/YARN
Cluster
HBase Distributed Graph Store
Extended Property Graph Model
Flink Operator Implementations
Data Integration
Flink Operator Execution
Workflow
Declaration
Visual
GrALa DSL
Representation
Data flow
Control flow
Graph Analytics Representation
Workflow Execution
19. [1] Community | interest : Hadoop| vertexCount : 3[0] Community | interest : Databases | vertexCount : 3
Extended Property Graph Model
[0] Tag
name : Databases
[1] Tag
name : Graphs
[2] Tag
name : Hadoop
[3] Forum
title : Graph Databases
[4] Forum
title : Graph Processing
[5] Person
name : Alice
gender : f
city : Leipzig
age : 23
[6] Person
name : Bob
gender : m
city : Leipzig
age : 30
[7] Person
name : Carol
gender : f
city : Dresden
age : 30
[8] Person
name : Dave
gender : m
city : Dresden
age : 42
[9] Person
name : Eve
gender : f
city : Dresden
age : 35
speaks : en
[10] Person
name : Frank
gender : m
city : Berlin
age : 23
IP: 169.32.1.3
0
1
2
3
4
5
6 7 8 9
10
11 12 13 14
15
16
17
18 19 20 21
22
23
knows
since : 2014
knows
since : 2014
knows
since : 2013
hasInterest
hasInterest hasInterest
hasInterest
hasModeratorhasModerator
hasMember hasMember
hasMember hasMember
hasTag hasTaghasTag hasTag
knows
since : 2013
knows
since : 2014
knows
since : 2014
knows
since : 2015
knows
since : 2015
knows
since : 2015
knows
since : 2013
20. [2] Community | interest : Graphs | vertexCount : 4
[1] Community | interest : Hadoop| vertexCount : 3[0] Community | interest : Databases | vertexCount : 3
Extended Property Graph Model
[0] Tag
name : Databases
[1] Tag
name : Graphs
[2] Tag
name : Hadoop
[3] Forum
title : Graph Databases
[4] Forum
title : Graph Processing
[5] Person
name : Alice
gender : f
city : Leipzig
age : 23
[6] Person
name : Bob
gender : m
city : Leipzig
age : 30
[7] Person
name : Carol
gender : f
city : Dresden
age : 30
[8] Person
name : Dave
gender : m
city : Dresden
age : 42
[9] Person
name : Eve
gender : f
city : Dresden
age : 35
speaks : en
[10] Person
name : Frank
gender : m
city : Berlin
age : 23
IP: 169.32.1.3
0
1
2
3
4
5
6 7 8 9
10
11 12 13 14
15
16
17
18 19 20 21
22
23
knows
since : 2014
knows
since : 2014
knows
since : 2013
hasInterest
hasInterest hasInterest
hasInterest
hasModeratorhasModerator
hasMember hasMember
hasMember hasMember
hasTag hasTaghasTag hasTag
knows
since : 2013
knows
since : 2014
knows
since : 2014
knows
since : 2015
knows
since : 2015
knows
since : 2015
knows
since : 2013
23. Combination
1: personGraph = db.G[0].combine(db.G[1]).combine(db.G[2])
[2] Community | interest : Graphs| vertexCount : 4
[1] Community | interest : Hadoop| vertexCount : 3[0] Community | interest : Databases | vertexCount : 3
[0] Tag
name : Databases
[1] Tag
name : Graphs
[2] Tag
name : Hadoop
[3] Forum
title : Graph Databases
[4] Forum
title : Graph Processing
[5] Person
name : Alice
gender : f
city : Leipzig
age : 23
[6] Person
name : Bob
gender : m
city : Leipzig
age : 30
[7] Person
name : Carol
gender : f
city : Dresden
age : 30
[8] Person
name : Dave
gender : m
city : Dresden
age : 42
[9] Person
name : Eve
gender : f
city : Dresden
age : 35
speaks : en
[10] Person
name : Frank
gender : m
city : Berlin
age : 23
IP: 169.32.1.3
0
1
2
3
4
5
6 7 8 9
10
11 12 13 14
15
16
17
18 19 20 21
22
23
knows
since : 2014
knows
since : 2014
knows
since : 2013
hasInterest
hasInterest hasInterest
hasInterest
hasModeratorhasModerator
hasMember hasMember
hasMember hasMember
hasTag hasTaghasTag hasTag
knows
since : 2013
knows
since : 2014
knows
since : 2014
knows
since : 2015
knows
since : 2015
knows
since : 2015
knows
since : 2013
DB
24. [0] Community | interest : Graphs| vertexCount : 4
[1] Community | interest : Hadoop| vertexCount : 3[0] Community | interest : Databases | vertexCount : 3
[0] Tag
name : Databases
[1] Tag
name : Graphs
[2] Tag
name : Hadoop
[3] Forum
title : Graph Databases
[4] Forum
title : Graph Processing
10
11 12 13 14
15
16
17
18 19 20 21
22
23
hasInterest
hasInterest hasInterest
hasInterest
hasModeratorhasModerator
hasMember hasMember
hasMember hasMember
hasTag hasTaghasTag hasTag
DB
Combination
1: personGraph = db.G[0].combine(db.G[1]).combine(db.G[2])
[4]
[5] Person
name : Alice
gender : f
city : Leipzig
age : 23
[6] Person
name : Bob
gender : m
city : Leipzig
age : 30
[7] Person
name : Carol
gender : f
city : Dresden
age : 30
[8] Person
name : Dave
gender : m
city : Dresden
age : 42
[9] Person
name : Eve
gender : f
city : Dresden
age : 35
speaks : en
[10] Person
name : Frank
gender : m
city : Berlin
age : 23
IP: 169.32.1.3
0
1
2
3
4
5
6 7 8 9
knows
since : 2014
knows
since : 2014
knows
since : 2013
knows
since : 2013
knows
since : 2014
knows
since : 2014
knows
since : 2015
knows
since : 2015
knows
since : 2015
knows
since : 2013
26. [0] Community | interest : Databases | vertexCount : 3
[1] Community | interest : Hadoop| vertexCount : 3[0] Community | interest : Databases | vertexCount : 3
[0] Tag
name : Databases
[1] Tag
name : Graphs
[2] Tag
name : Hadoop
[3] Forum
title : Graph Databases
[4] Forum
title : Graph Processing
10
11 12 13 14
15
16
17
18 19 20 21
22
23
hasInterest
hasInterest hasInterest
hasInterest
hasModeratorhasModerator
hasMember hasMember
hasMember hasMember
hasTag hasTaghasTag hasTag
DB
Combination + Summarization
[4]
[5] Person
name : Alice
gender : f
city : Leipzig
age : 23
[6] Person
name : Bob
gender : m
city : Leipzig
age : 30
[7] Person
name : Carol
gender : f
city : Dresden
age : 30
[8] Person
name : Dave
gender : m
city : Dresden
age : 42
[9] Person
name : Eve
gender : f
city : Dresden
age : 35
speaks : en
[10] Person
name : Frank
gender : m
city : Berlin
age : 23
IP: 169.32.1.3
0
1
2
3
4
5
6 7 8 9
knows
since : 2014
knows
since : 2014
knows
since : 2013
knows
since : 2013
knows
since : 2014
knows
since : 2014
knows
since : 2015
knows
since : 2015
knows
since : 2015
knows
since : 2013
1: personGraph = db.G[0].combine(db.G[1]).combine(db.G[2])
2: vertexGroupingKeys = {:LABEL, “city”}
3: edgeGroupingKeys = {:LABEL}
4: vertexAggFunc = (Vertex vSum, Set vertices => vSum[“count”] = |vertices|)
5: edgeAggFunc = (Edge eSum, Set edges => eSum[“count”] = |edges|)
6: sumGraph = personGraph.summarize(vertexGroupingKeys, vertexAggFunc,
edgeGroupingKeys, edgeAggFunc)
27. [0] Community | interest : Databases | vertexCount : 3
[1] Community | interest : Hadoop| vertexCount : 3[0] Community | interest : Databases | vertexCount : 3
[0] Tag
name : Databases
[1] Tag
name : Graphs
[2] Tag
name : Hadoop
[3] Forum
title : Graph Databases
[4] Forum
title : Graph Processing
10
11 12 13 14
15
16
17
18 19 20 21
22
23
hasInterest
hasInterest hasInterest
hasInterest
hasModeratorhasModerator
hasMember hasMember
hasMember hasMember
hasTag hasTaghasTag hasTag
DB
Combination + Summarization
[4]
[5] Person
name : Alice
gender : f
city : Leipzig
age : 23
[6] Person
name : Bob
gender : m
city : Leipzig
age : 30
[7] Person
name : Carol
gender : f
city : Dresden
age : 30
[8] Person
name : Dave
gender : m
city : Dresden
age : 42
[9] Person
name : Eve
gender : f
city : Dresden
age : 35
speaks : en
[10] Person
name : Frank
gender : m
city : Berlin
age : 23
IP: 169.32.1.3
0
1
2
3
4
5
6 7 8 9
knows
since : 2014
knows
since : 2014
knows
since : 2013
knows
since : 2013
knows
since : 2014
knows
since : 2014
knows
since : 2015
knows
since : 2015
knows
since : 2015
knows
since : 2013
1: personGraph = db.G[0].combine(db.G[1]).combine(db.G[2])
2: vertexGroupingKeys = {:LABEL, “city”}
3: edgeGroupingKeys = {:LABEL}
4: vertexAggFunc = (Vertex vSum, Set vertices => vSum[“count”] = |vertices|)
5: edgeAggFunc = (Edge eSum, Set edges => eSum[“count”] = |edges|)
6: sumGraph = personGraph.summarize(vertexGroupingKeys, vertexAggFunc,
edgeGroupingKeys, edgeAggFunc)
[5]
[11] Person
city : Leipzig
count : 2
[12] Person
city : Dresden
count : 3
[13] Person
city : Berlin
count : 1
24
25
26
27
28
knows
count : 3
knows
count : 1
knows
count : 2
knows
count : 2
knows
count : 2
28. [0] Community | interest : Databases | vertexCount : 3
[1] Community | interest : Hadoop| vertexCount : 3[0] Community | interest : Databases | vertexCount : 3
[0] Tag
name : Databases
[1] Tag
name : Graphs
[2] Tag
name : Hadoop
[3] Forum
title : Graph Databases
[4] Forum
title : Graph Processing
10
11 12 13 14
15
16
17
18 19 20 21
22
23
hasInterest
hasInterest hasInterest
hasInterest
hasModeratorhasModerator
hasMember hasMember
hasMember hasMember
hasTag hasTaghasTag hasTag
DB
Combination + Summarization + Aggregation
[4]
[5] Person
name : Alice
gender : f
city : Leipzig
age : 23
[6] Person
name : Bob
gender : m
city : Leipzig
age : 30
[7] Person
name : Carol
gender : f
city : Dresden
age : 30
[8] Person
name : Dave
gender : m
city : Dresden
age : 42
[9] Person
name : Eve
gender : f
city : Dresden
age : 35
speaks : en
[10] Person
name : Frank
gender : m
city : Berlin
age : 23
IP: 169.32.1.3
0
1
2
3
4
5
6 7 8 9
knows
since : 2014
knows
since : 2014
knows
since : 2013
knows
since : 2013
knows
since : 2014
knows
since : 2014
knows
since : 2015
knows
since : 2015
knows
since : 2015
knows
since : 2013
1: personGraph = db.G[0].combine(db.G[1]).combine(db.G[2])
2: vertexGroupingKeys = {:LABEL, “city”}
3: edgeGroupingKeys = {:LABEL}
4: vertexAggFunc = (Vertex vSum, Set vertices => vSum[“count”] = |vertices|)
5: edgeAggFunc = (Edge eSum, Set edges => eSum[“count”] = |edges|)
6: sumGraph = personGraph.summarize(vertexGroupingKeys, vertexAggFunc,
edgeGroupingKeys, edgeAggFunc)
7: aggFunc = (Graph g => |g.E|)
8: aggGraph = sumGraph.aggregate(“edgeCount”, aggFunc)
[5]
[11] Person
city : Leipzig
count : 2
[12] Person
city : Dresden
count : 3
[13] Person
city : Berlin
count : 1
24
25
26
27
28
knows
count : 3
knows
count : 1
knows
count : 2
knows
count : 2
knows
count : 2
29. [0] Community | interest : Databases | vertexCount : 3
[1] Community | interest : Hadoop| vertexCount : 3[0] Community | interest : Databases | vertexCount : 3
[0] Tag
name : Databases
[1] Tag
name : Graphs
[2] Tag
name : Hadoop
[3] Forum
title : Graph Databases
[4] Forum
title : Graph Processing
10
11 12 13 14
15
16
17
18 19 20 21
22
23
hasInterest
hasInterest hasInterest
hasInterest
hasModeratorhasModerator
hasMember hasMember
hasMember hasMember
hasTag hasTaghasTag hasTag
DB
Combination + Summarization + Aggregation
[4]
[5] Person
name : Alice
gender : f
city : Leipzig
age : 23
[6] Person
name : Bob
gender : m
city : Leipzig
age : 30
[7] Person
name : Carol
gender : f
city : Dresden
age : 30
[8] Person
name : Dave
gender : m
city : Dresden
age : 42
[9] Person
name : Eve
gender : f
city : Dresden
age : 35
speaks : en
[10] Person
name : Frank
gender : m
city : Berlin
age : 23
IP: 169.32.1.3
0
1
2
3
4
5
6 7 8 9
knows
since : 2014
knows
since : 2014
knows
since : 2013
knows
since : 2013
knows
since : 2014
knows
since : 2014
knows
since : 2015
knows
since : 2015
knows
since : 2015
knows
since : 2013
1: personGraph = db.G[0].combine(db.G[1]).combine(db.G[2])
2: vertexGroupingKeys = {:LABEL, “city”}
3: edgeGroupingKeys = {:LABEL}
4: vertexAggFunc = (Vertex vSum, Set vertices => vSum[“count”] = |vertices|)
5: edgeAggFunc = (Edge eSum, Set edges => eSum[“count”] = |edges|)
6: sumGraph = personGraph.summarize(vertexGroupingKeys, vertexAggFunc,
edgeGroupingKeys, edgeAggFunc)
7: aggFunc = (Graph g => |g.E|)
8: aggGraph = sumGraph.aggregate(“edgeCount”, aggFunc)
[5] edgeCount : 5
[11] Person
city : Leipzig
count : 2
[12] Person
city : Dresden
count : 3
[13] Person
city : Berlin
count : 1
24
25
26
27
28
knows
count : 3
knows
count : 1
knows
count : 2
knows
count : 2
knows
count : 2
31. Selection
1: resultColl = db.G[0,1,2].select((Graph g => g[“vertexCount”] > 3))
[2] Community | interest : Graphs | vertexCount : 4
[1] Community | interest : Hadoop| vertexCount : 3[0] Community | interest : Databases | vertexCount : 3
[0] Tag
name : Databases
[1] Tag
name : Graphs
[2] Tag
name : Hadoop
[3] Forum
title : Graph Databases
[4] Forum
title : Graph Processing
[5] Person
name : Alice
gender : f
city : Leipzig
age : 23
[6] Person
name : Bob
gender : m
city : Leipzig
age : 30
[7] Person
name : Carol
gender : f
city : Dresden
age : 30
[8] Person
name : Dave
gender : m
city : Dresden
age : 42
[9] Person
name : Eve
gender : f
city : Dresden
age : 35
speaks : en
[10] Person
name : Frank
gender : m
city : Berlin
age : 23
IP: 169.32.1.3
0
1
2
3
4
5
6 7 8 9
10
11 12 13 14
15
16
17
18 19 20 21
22
23
knows
since : 2014
knows
since : 2014
knows
since : 2013
hasInterest
hasInterest hasInterest
hasInterest
hasModeratorhasModerator
hasMember hasMember
hasMember hasMember
hasTag hasTaghasTag hasTag
knows
since : 2013
knows
since : 2014
knows
since : 2014
knows
since : 2015
knows
since : 2015
knows
since : 2015
knows
since : 2013
DB
32. Selection
1: resultColl = db.G[0,1,2].select((Graph g => g[“vertexCount”] > 3))
[1] Community | interest : Hadoop| vertexCount : 3[0] Community | interest : Databases | vertexCount : 3
[0] Tag
name : Databases
[1] Tag
name : Graphs
[2] Tag
name : Hadoop
[3] Forum
title : Graph Databases
[4] Forum
title : Graph Processing
[9] Person
name : Eve
gender : f
city : Dresden
age : 35
speaks : en
[10] Person
name : Frank
gender : m
city : Berlin
age : 23
IP: 169.32.1.3
6 7 8 9
10
11 12 13 14
15
16
17
18 19 20 21
22
23
hasInterest
hasInterest hasInterest
hasInterest
hasModeratorhasModerator
hasMember hasMember
hasMember hasMember
hasTag hasTaghasTag hasTag
knows
since : 2015
knows
since : 2015
knows
since : 2015
knows
since : 2013
DB
[2] Community | interest : Graphs | vertexCount : 4
[5] Person
name : Alice
gender : f
city : Leipzig
age : 23
[6] Person
name : Bob
gender : m
city : Leipzig
age : 30
[7] Person
name : Carol
gender : f
city : Dresden
age : 30
[8] Person
name : Dave
gender : m
city : Dresden
age : 42
0
1
2
3
4
5
knows
since : 2014
knows
since : 2014
knows
since : 2013
knows
since : 2013
knows
since : 2014
knows
since : 2014
38. The „Hello World“ of Big Data – Word Count
1: ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
2:
3: DataSet<String> text = env.fromElements( // or env.readTextFile(„hdfs://…“)
4: „He who controls the past controls the future.“,
5: „He who controls the present controls the past.“);
6:
7: DataSet<Tuple2<String, Integer>> wordCounts = text
8: .flatMap(new LineSplitter()) // splits the line and outputs (word, 1) tuples
9: .groupBy(0)
10: .sum(1);
11:
12: wordCounts.print(); // trigger execution
flatMap
„He who controls the past controls the future.“
„He who controls the present controls the past.“
(He,1)
(who,1)
(controls,1)
(the,1)
(past,1)
// ...
groupBy(0)
[(He,1),(He,1)]
[(who,1),(who,1)]
[(future,1)]
[(past,1),(past,1)]
[(present,1)]
// ...
sum(1)
(He,2)
(who,2)
(future,1)
(past,2)
(present,1)
// ...
42. EPGM in Apache Flink – Exclusion
// input: firstGraph (G[0]), secondGraph (G[2])
1: DataSet<GradoopId> graphId = secondGraph.getGraphHead()
2: .map(new Id<G>());
3:
4: DataSet<V> newVertices = firstGraph.getVertices()
5: .filter(new NotInGraphBroadCast<V>())
6: .withBroadcastSet(graphId, GRAPH_ID);
7:
8: DataSet<E> newEdges = firstGraph.getEdges()
9: .filter(new NotInGraphBroadCast<E>())
10: .withBroadcastSet(graphId, GRAPH_ID)
11: .join(newVertices)
12: .where(new SourceId<E>().equalTo(new Id<V>())
13: .with(new LeftSide<E, V>())
14: .join(newVertices)
15: .where(new TargetId<E>().equalTo(new Id<V>())
16: .with(new LeftSide<E, V>());
db.G[0].exclude(db.G[2])
[2] Community | interest : Graphs| vertexCount : 4
[0] Community | interest : Databases | vertexCount : 3
[5] Person
name : Alice
gender : f
city : Leipzig
age : 23
[6] Person
name : Bob
gender : m
city : Leipzig
age : 30
[7] Person
name : Carol
gender : f
city : Dresden
age : 30
[8] Person
name : Dave
gender : m
city : Dresden
age : 42
[9] Person
name : Eve
gender : f
city : Dresden
age : 35
speaks : en
0
1
2
3
4
5
6 7
knows
since : 2014
knows
since : 2014
knows
since : 2013
knows
since : 2013
knows
since : 2014
knows
since : 2014
knows
since : 2015
knows
since : 2013
43. EPGM in Apache Flink – Exclusion
Id Label Properties
2 Community interest: Graphs vertexCount: 4
graphId =
secondGraph.getGraphHead()
Id
2
newVertices =
firstGraph.getVertices() Id Label Properties Graphs
5 Person name: Alice gender: f … [0, 2]
6 Person name: Bob gender: m … [0, 2]
9 Person name: Eve gender: f … [0]
Id Label Properties Graphs
9 Person name: Eve gender: f … [0]
.map(new Id<G>());
.filter(new NotInGraphBroadCast<V>())
.withBroadcastSet(graphId, GRAPH_ID);
46. Use Case: Graph Business Intelligence
Business intelligence usually based on relational data
warehouses
Enterprise data is integrated within dimensional schema
Analysis limited to predefined relationships
No support for relationship-oriented data mining
Graph-based approach
Integrate data sources within an instance graph by preserving original
relationships between data objects (transactional and master data)
Determine subgraphs (business transaction graphs) related to business
activities
Analyze subgraphs or entire graphs with aggregation queries, mining
relationship patterns, etc.
Facts
Dim 1
Dim 2
Dim 3
47. Prerequisites: Data Integration
metadata
Data SourcesEnterprise
Service Bus
Unified Metadata Graph
Domain expert
(1) Metadata
aquisition
(2) Graph
integration
Integrated Instance Graph
data
Business Transaction Graphs
(3) Subgraph
Detection
48. Business Transaction Graphs
CIT ERP
Employee
Name: Dave
Employee
Name: Alice
Employee
Name: Bob
Employee
Name: Carol
Ticket
Expense: 500
SalesQuotation
SalesOrder
PurchaseOrder
PurchaseOrder
SalesInvoice
Revenue: 5,000
PurchaseInvoice
Expense: 2,000
PurchaseInvoice
Expense: 1,500
sentBy
createdBy
processedBy
createdBy
openedFor
processedBy
basedOn serves
serves
bills
bills
bills
processedBy
49. Business Transaction Graphs
CIT ERP
Employee
Name: Dave
Employee
Name: Alice
Employee
Name: Bob
Employee
Name: Carol
Ticket
Expense: 500
SalesQuotation
SalesOrder
PurchaseOrder
PurchaseOrder
SalesInvoice
Revenue: 5,000
PurchaseInvoice
Expense: 2,000
PurchaseInvoice
Expense: 1,500
sentBy
createdBy
processedBy
createdBy
openedFor
processedBy
processedBy
basedOn serves
serves
bills
bills
bills
50. Business Transaction Graphs
CIT ERP
Employee
Name: Dave
Employee
Name: Alice
Employee
Name: Bob
Employee
Name: Carol
Ticket
Expense: 500
SalesQuotation
SalesOrder
PurchaseOrder
PurchaseOrder
SalesInvoice
Revenue: 5,000
PurchaseInvoice
Expense: 2,000
PurchaseInvoice
Expense: 1,500
sentBy
createdBy
processedBy
createdBy
openedFor
processedBy
processedBy
basedOn serves
serves
bills
bills
bills
51. Business Transaction Graphs
CIT ERP
Employee
Name: Dave
Employee
Name: Alice
Employee
Name: Bob
Employee
Name: Carol
Ticket
Expense: 500
SalesQuotation
SalesOrder
PurchaseOrder
PurchaseOrder
SalesInvoice
Revenue: 5,000
PurchaseInvoice
Expense: 2,000
PurchaseInvoice
Expense: 1,500
sentBy
createdBy
processedBy
createdBy
openedFor
processedBy
processedBy
basedOn serves
serves
bills
bills
bills
52. Business Transaction Graphs
CIT ERP
Employee
Name: Dave
Employee
Name: Alice
Employee
Name: Bob
Employee
Name: Carol
Ticket
Expense: 500
SalesQuotation
SalesOrder
PurchaseOrder
PurchaseOrder
SalesInvoice
Revenue: 5,000
PurchaseInvoice
Expense: 2,000
PurchaseInvoice
Expense: 1,500
sentBy
createdBy
processedBy
createdBy
openedFor
processedBy
processedBy
basedOn serves
serves
bills
bills
bills
53. Business Transaction Graphs
CIT ERP
Employee
Name: Dave
Employee
Name: Alice
Employee
Name: Bob
Employee
Name: Carol
Ticket
Expense: 500
SalesQuotation
SalesOrder
PurchaseOrder
PurchaseOrder
SalesInvoice
Revenue: 5,000
PurchaseInvoice
Expense: 2,000
PurchaseInvoice
Expense: 1,500
sentBy
createdBy
processedBy
createdBy
openedFor
processedBy
processedBy
basedOn serves
serves
bills
bills
bills
79. Contributions welcome!
Code
Operator implementations
Performance Tuning
Extend HBase Storage
Data! and Use Cases
We are researchers, we assume ...
Getting real data (especially BI data) is nearly impossible