GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores

Marcus Paradies, Michael Rudolf, Christof Bornhoevd, Wolfgang Lehner
GRATIN: Accelerating Graph Traversals
in Main-Memory Column Stores
GRADES’14 Workshop
June 22, 2014

Graphs from an Enterprise View
© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 2

Relational
+ Application Logic
Application Layer
RDBMS
Graph Processing
Data Data
• Recursive Queries
• Chained Joins

Relational
+ Application Logic
Application Layer
RDBMS
Graph Processing
Data Data
• Chained Joins
Data already in RDBMS
SQL as interface
Data transfer to application

Relational
+ Application Logic
Application Layer
RDBMS
Graph Processing
Data Data
• Chained Joins
SQL as interface
Relational + Graph
+ Application Logic
Application Layer
RDBMS GDBMS
Replicate
Data Data Data Data

Relational
+ Application Logic
Application Layer
RDBMS
Graph Processing
Data Data
• Chained Joins
SQL as interface
Relational + Graph
+ Application Logic
Application Layer
RDBMS GDBMS
Replicate
Data Data Data Data
Efﬁcient processing in GDBMS
Processing on replicated data
No combination with relational data

Graph Processing (The New World)
Graph Representation

id=1
name=John
type=User
id=2
title=The Shining
type=Product
id=3
title=The Stand
type=Product
id=4
name=Horror
type=Category
id=5
name=Literature
type=Category
type=category
type=similar
type=belongs
type=belongs
type=rated
rating=4.0
type=rated
rating=5.0
Example graph
id type name . . . title
1 User John . . . ?
2 Product ? . . . The Shining
3 Product ? . . . The Stand
4 Category Horror . . . ?
5 Category Literature . . . ?
Vertex table
Vs Vt type . . . rating
3 2 similar . . . ?
2 3 similar . . . ?
2 4 belongs . . . ?
3 4 belongs . . . ?
1 3 rated . . . 5.0
1 2 rated . . . 4.0
4 5 category . . . ?
Edge table

id=1
name=John
type=User
id=2
title=The Shining
type=Product
id=3
title=The Stand
type=Product
id=4
name=Horror
type=Category
id=5
name=Literature
type=Category
type=category
type=similar
type=belongs
type=belongs
type=rated
rating=4.0
type=rated
rating=5.0
Example graph
id type name . . . title
1 User John . . . ?
2 Product ? . . . The Shining
3 Product ? . . . The Stand
4 Category Horror . . . ?
5 Category Literature . . . ?
Vertex table
Vs Vt type . . . rating
3 2 similar . . . ?
2 3 similar . . . ?
2 4 belongs . . . ?
3 4 belongs . . . ?
1 3 rated . . . 5.0
1 2 rated . . . 4.0
4 5 category . . . ?
Edge table
• Each vertex/edge represented as a single record in universal tables
• Support for transactions and compression
• Combination with other data models (spatial, text, temporal)

Query Execution

Query Execution
EDGES
σ (Selection)
(Traversal)
π (Projection)

Query Execution
EDGES
σ (Selection)
(Traversal)
π (Projection)
Graph Traversals
Example query:{ id:D }-a-(1,*)

Edge Clustering
• Clustering by edge type preserves
subgraph meaning
• Clustering by edge source preserves vertex
neighborhood
• Increases spatial locality in memory
• Allows reducing scan to range in column
Type Clustering
Edge Clustering
Vs Vt Type
D F a
A D a
A B a
A C a
E B a
E G a
D B b
B E b
F G b

GRATIN
Column
2
2
2
1
3
1
2
3
4
5
• gratin replaces full column scans by block scans

GRATIN
Column
2
2
2
1
3
1
2
3
4
5
Minimal blocksize: 2
• gratin indexes column with variable block size

GRATIN
Column
2
2
2
1
3
1
2
3
4
5
B1
B2

GRATIN
Column
2
2
2
1
3
1
2
3
4
5
B1
B2
2
1
2
1
2
3
Value Blocks

GRATIN
Column
2
2
2
1
3
1
2
3
4
5
B1
B2
2
1
2
1
2
3
Value Blocks
Block ranges
2
1 3
5

GRATIN
Column
2
2
2
1
3
1
2
3
4
5
B1
B2
2
1
2
1
2
3
Value Blocks
Block ranges
2
1 3
5
• Allows efﬁcient handling of vertices with high outdegree

Experiments on Static Graphs
ID |V| |E| ¯dout
Amazon 0.4 M 3.3 M 16.8
California-Roads 1.9 M 2.7 M 2.8
1 2 3 4 5 6 7
0
5
10
15
20
# of Traversal Iterations
ExecutionTime(ms)
Amazon
SCAN gratin-512
gratin-4096 gratin-32768
Figure: Comparison for different block sizes.
2 4 6 8 10
10
20
30
40
50
Traversal Iteration
Querytime(ms)
Amazon
2 4 6 8 10
5
10
15
20
25
Traversal Iteration
California-Roads
Figure: Query time for scan-based traversal ( ) and
gratin-based traversal ( ).

Handling Updates
Column
21
22
23
14
35
B1
B2
2
1
2
1
2
3
Value Blocks Block ranges
2
1 3
5

Handling Updates
Column
21
22
23
14
35
B1
B2
2
1
2
1
2
3
Value Blocks Health factor
hB1
= 1.0
hB2
= 1.0
hB3
= 1.0
GRATIN health
h = 1.0
Block ranges
2
1 3
5
• Health factor describes viability of gratin
• If global health factor below threshold → rebuild index

Handling Updates
Column
21
22
23
14
35
26
B1
B2
B3
2
1
2
3
1
2
3
hB1
= 1.0
hB2
= 1.0
hB3
= 1.0
GRATIN health
h = 1.0
Block ranges
3
2
1 3
5
6
• gratin allows updates in constant time (append-only)

Handling Updates
Column
21
22
23
14
35
26
B1
B2
B3
2
1
2
3
1
2
3
hB1
= 1.0
hB2
= 0.5
hB3
= 1.0
GRATIN health
h = 0.83
Block ranges
3
2
1 3
5
6
• gratin allows updates in constant time (append-only)

Experiments on Dynamic Graphs
0 +20K +20K +20K +20K +20K
1
1.2
1.4
1.6
1.8
2
2.2
∆Batch Insertions
SlowdownFactor()
Amazon
0
0.2
0.4
0.6
0.8
1
HealthFactor()
0 +20K +20K +20K +20K +20K
1
1.2
1.4
1.6
∆Batch Insertions
SlowdownFactor()
California-Roads
0
0.2
0.4
0.6
0.8
1
HealthFactor()
Figure: Query time for gratin-based traversal on dynamic graphs. Slowdown factor describes
the relative execution time in multiples of the execution time on a static graph.

Summary
• Tight integration of traversal operator
into main-memory column store
• gratin is a lightweight secondary
index structure
• Handles dynamic graphs in
predictable time
• Experiments show a diverse
spectrum of performance
improvements
• Performance of gratin depends on
graph topology

Contact
Marcus Paradies
PhD Student at Database Technology Group, TU Dresden
https://wwwdb.inf.tu-dresden.de/team/external-members/marcus-paradies/
marcus.paradies@gmail.com

GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores

Recommended

Recommended

More Related Content

Similar to GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores

Similar to GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores (20)

Recently uploaded

Recently uploaded (20)

GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores