Marcus Paradies, Michael Rudolf, Christof Bornhoevd, Wolfgang Lehner
GRATIN: Accelerating Graph Traversals
in Main-Memory Column Stores
GRADES’14 Workshop
June 22, 2014
© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 2
Graphs from an Enterprise View
Relational + Application Logic
Diagram labels: Application Layer, RDBMS, Graph Processing, Data
• Recursive Queries
• Chained Joins
• Data already in RDBMS
• SQL as interface
• Data transfer to application
Relational + Graph + Application Logic
Diagram labels: Application Layer, RDBMS, GDBMS, Replicate, Data
• Efficient processing in GDBMS
• Processing on replicated data
• No combination with relational data
Graph Processing (The New World)
Graph Representation
Example graph
Vertices:
  id=1, name=John, type=User
  id=2, title=The Shining, type=Product
  id=3, title=The Stand, type=Product
  id=4, name=Horror, type=Category
  id=5, name=Literature, type=Category
Edge labels: type=category; type=similar; type=belongs; type=belongs; type=rated, rating=4.0; type=rated, rating=5.0
Vertex table
id  type      name        ...  title
1   User      John        ...  ?
2   Product   ?           ...  The Shining
3   Product   ?           ...  The Stand
4   Category  Horror      ...  ?
5   Category  Literature  ...  ?
Edge table
Vs  Vt  type      ...  rating
3   2   similar   ...  ?
2   3   similar   ...  ?
2   4   belongs   ...  ?
3   4   belongs   ...  ?
1   3   rated     ...  5.0
1   2   rated     ...  4.0
4   5   category  ...  ?
• Each vertex/edge represented as a single record in universal tables
• Support for transactions and compression
• Combination with other data models (spatial, text, temporal)
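As a rough illustration of the universal-table layout, the following Python sketch stores the example vertex and edge tables column-wise, one list per attribute, with None standing in for the "?" (NULL) entries. The dictionary-of-lists layout and the helper name record are assumptions made for illustration, not the storage format of any particular column store.

```python
# Column-wise (universal table) storage of the example graph.
# None marks attributes that a record does not have ("?" on the slide).
vertex_table = {
    "id":    [1, 2, 3, 4, 5],
    "type":  ["User", "Product", "Product", "Category", "Category"],
    "name":  ["John", None, None, "Horror", "Literature"],
    "title": [None, "The Shining", "The Stand", None, None],
}

edge_table = {
    "Vs":     [3, 2, 2, 3, 1, 1, 4],
    "Vt":     [2, 3, 4, 4, 3, 2, 5],
    "type":   ["similar", "similar", "belongs", "belongs",
               "rated", "rated", "category"],
    "rating": [None, None, None, None, 5.0, 4.0, None],
}


def record(table, row):
    """Reassemble one record (hypothetical helper) from the columns, skipping NULLs."""
    return {col: values[row] for col, values in table.items()
            if values[row] is not None}


print(record(vertex_table, 1))  # the vertex for "The Shining"
print(record(edge_table, 4))    # the "rated" edge from vertex 1 to vertex 3
```

Every vertex and every edge occupies exactly one row position across all columns, which is what allows vertices and edges of different types to share a single table each.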
Graph Processing (The New World)
Query Execution
Query plan operators: EDGES, σ (Selection), Traversal, π (Projection)
Graph Traversals
Example query: { id:D }-a-(1,*)
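A scan-based traversal of this kind can be sketched in Python as follows. The sketch assumes the example query means "start at vertex D and follow edges of type a for one or more hops", and that edges are followed from source (Vs) to target (Vt); both readings are assumptions. The edge rows are borrowed from the edge-clustering example of this deck, and the function name traverse is hypothetical.

```python
# Edge columns (Vs, Vt, Type); rows taken from the edge-clustering example.
VS   = ["D", "A", "A", "A", "E", "E", "D", "B", "F"]
VT   = ["F", "D", "B", "C", "B", "G", "B", "E", "G"]
TYPE = ["a", "a", "a", "a", "a", "a", "b", "b", "b"]


def traverse(start, edge_type, max_hops=None):
    """Level-synchronous traversal: one full scan of the edge columns per hop.

    Returns the vertices reachable from `start` over 1..max_hops edges of
    `edge_type`; max_hops=None stands for the unbounded '*' upper limit.
    """
    frontier = {start}
    visited = set()  # vertices discovered so far (guards against cycles)
    hops = 0
    while frontier and (max_hops is None or hops < max_hops):
        next_frontier = set()
        for vs, vt, t in zip(VS, VT, TYPE):  # full column scan
            if t == edge_type and vs in frontier and vt not in visited:
                next_frontier.add(vt)
        visited |= next_frontier
        frontier = next_frontier
        hops += 1
    return visited


print(sorted(traverse("D", "a")))              # ['F']
print(sorted(traverse("A", "a")))              # ['B', 'C', 'D', 'F']
print(sorted(traverse("A", "a", max_hops=1)))  # ['B', 'C', 'D']
```

Each traversal iteration touches every row of the edge table, which is the cost that clustering and a secondary index aim to reduce.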
Edge Clustering
• Clustering by edge type preserves subgraph meaning
• Clustering by edge source preserves vertex neighborhood
• Increases spatial locality in memory
• Allows reducing scan to range in column
Edge table with type clustering and edge clustering applied:
Vs  Vt  Type
D   F   a
A   D   a
A   B   a
A   C   a
E   B   a
E   G   a
D   B   b
B   E   b
F   G   b
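How type clustering narrows a scan to a row range can be sketched in Python. The sketch assumes the Type column is not only clustered but also sorted (true for this example, where all a rows precede all b rows), so a binary search can locate the range; the function names type_range and neighbors are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Edge table from the slide, clustered by type and, within a type, by source.
VS   = ["D", "A", "A", "A", "E", "E", "D", "B", "F"]
VT   = ["F", "D", "B", "C", "B", "G", "B", "E", "G"]
TYPE = ["a", "a", "a", "a", "a", "a", "b", "b", "b"]


def type_range(edge_type):
    """Return the half-open row range [lo, hi) holding all edges of `edge_type`.

    Relies on TYPE being sorted, which holds for this clustered example.
    """
    return bisect_left(TYPE, edge_type), bisect_right(TYPE, edge_type)


def neighbors(vertex, edge_type):
    """Scan only the row range of one edge type instead of the whole column."""
    lo, hi = type_range(edge_type)
    return [VT[i] for i in range(lo, hi) if VS[i] == vertex]


print(type_range("a"))      # (0, 6)
print(type_range("b"))      # (6, 9)
print(neighbors("A", "a"))  # ['D', 'B', 'C']
```

Because edges with the same source sit next to each other inside a type range, the matching rows for one vertex are contiguous as well, which is the spatial locality the bullets refer to.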
GRATIN
Figure: example column holding the values 2, 2, 2, 1, 3 at positions 1 to 5, partitioned into blocks B1 and B2 (minimal block size: 2) and indexed through value blocks and block ranges.
• GRATIN replaces full column scans with block scans
• GRATIN indexes a column using variable block sizes
• Allows efficient handling of vertices with high outdegree
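The idea of replacing a full column scan with block scans can be sketched in Python. This is a simplified stand-in, not the GRATIN structure itself: it uses fixed-size blocks rather than the variable block sizes named on the slide, and the class name BlockIndex and its methods are hypothetical. Positions are 0-based here, whereas the slide counts from 1.

```python
from collections import defaultdict


class BlockIndex:
    """Toy value-to-blocks index over one column (fixed block size, an assumption)."""

    def __init__(self, column, block_size=2):
        self.column = list(column)
        self.block_size = block_size
        self.value_blocks = defaultdict(list)  # value -> ids of blocks containing it
        for pos, value in enumerate(self.column):
            self._register(value, pos // block_size)

    def _register(self, value, block_id):
        blocks = self.value_blocks[value]
        if not blocks or blocks[-1] != block_id:  # avoid duplicate block ids
            blocks.append(block_id)

    def block_range(self, block_id):
        """Half-open position range [lo, hi) covered by a block."""
        lo = block_id * self.block_size
        return lo, min(lo + self.block_size, len(self.column))

    def lookup(self, value):
        """Return positions of `value`, scanning only blocks known to contain it."""
        hits = []
        for block_id in self.value_blocks.get(value, []):
            lo, hi = self.block_range(block_id)
            hits.extend(p for p in range(lo, hi) if self.column[p] == value)
        return hits


index = BlockIndex([2, 2, 2, 1, 3], block_size=2)
print(index.lookup(2))  # [0, 1, 2]
print(index.lookup(3))  # [4]
```

A lookup for a value that occurs in few blocks skips the rest of the column entirely, while a frequently occurring value (for example, a source vertex with many outgoing edges) degrades gracefully to scanning the blocks that hold it.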
Experiments on Static Graphs
Data set          |V|    |E|    avg. out-degree
Amazon            0.4 M  3.3 M  16.8
California-Roads  1.9 M  2.7 M  2.8
Figure: Comparison for different block sizes on Amazon (x-axis: # of traversal iterations; y-axis: execution time in ms; series: SCAN, GRATIN-512, GRATIN-4096, GRATIN-32768).
Figure: Query time for scan-based traversal and GRATIN-based traversal on Amazon and California-Roads (x-axis: traversal iteration; y-axis: query time in ms).
Handling Updates
Figure: the example column (values 2, 2, 2, 1, 3 at positions 1 to 5, minimal block size: 2, blocks B1 and B2) after appending the value 2 at position 6, which adds block B3. Per-block health factors after the append: h_B1 = 1.0, h_B2 = 0.5, h_B3 = 1.0; GRATIN health h = 0.83 (h = 1.0 before the append).
• Health factor describes viability of GRATIN
• If global health factor below threshold → rebuild index
• GRATIN allows updates in constant time (append-only)
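The rebuild decision can be sketched in Python. The slide does not state how the global health factor is derived from the per-block factors; the arithmetic mean is assumed here because (1.0 + 0.5 + 1.0) / 3 ≈ 0.83 matches the value shown. The threshold of 0.5 and the function names are hypothetical.

```python
def global_health(block_health):
    """Arithmetic mean of the per-block health factors (assumed aggregation)."""
    return sum(block_health) / len(block_health)


def needs_rebuild(block_health, threshold=0.5):
    """True if the global health factor fell below the (hypothetical) threshold."""
    return global_health(block_health) < threshold


after_append = [1.0, 0.5, 1.0]  # h_B1, h_B2, h_B3 as shown on the slide
print(round(global_health(after_append), 2))  # 0.83
print(needs_rebuild(after_append))            # False
```

Under these assumptions a single degraded block lowers the global factor only slightly, so the index is rebuilt only after many appends have accumulated.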
Experiments on Dynamic Graphs
Figure: Query time for GRATIN-based traversal on dynamic graphs. Slowdown factor describes the relative execution time in multiples of the execution time on a static graph. (Panels: Amazon and California-Roads; x-axis: batch insertions in steps of +20K; left y-axis: slowdown factor; right y-axis: health factor.)
Summary
• Tight integration of traversal operator into main-memory column store
• GRATIN is a lightweight secondary index structure
• Handles dynamic graphs in predictable time
• Experiments show a diverse spectrum of performance improvements
• Performance of GRATIN depends on graph topology
Contact
Marcus Paradies
PhD Student at Database Technology Group, TU Dresden
https://wwwdb.inf.tu-dresden.de/team/external-members/marcus-paradies/
marcus.paradies@gmail.com
© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 11

More Related Content

Similar to GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores

Mastering MicroStation DGN: How to Integrate CAD and GIS
Mastering MicroStation DGN: How to Integrate CAD and GISMastering MicroStation DGN: How to Integrate CAD and GIS
Mastering MicroStation DGN: How to Integrate CAD and GIS
Safe Software
 
Map Reducec and Spark big data visualization and analytics
Map Reducec and Spark big data visualization and analyticsMap Reducec and Spark big data visualization and analytics
Map Reducec and Spark big data visualization and analytics
itesm
 
MapReduce
MapReduceMapReduce
MapReduce
SatyaHadoop
 
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
Amazon Web Services
 
Making sense of your data jug
Making sense of your data   jugMaking sense of your data   jug
Making sense of your data jug
Gerald Muecke
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between  CAD & GIS: 6 Ways to Automate Your  Data IntegrationBridging Between  CAD & GIS: 6 Ways to Automate Your  Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Safe Software
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
marketing932765
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
marketing932765
 
Exploration and 3D GIS Software - MapInfo Professional Discover3D 2015
Exploration and 3D GIS Software - MapInfo Professional Discover3D 2015Exploration and 3D GIS Software - MapInfo Professional Discover3D 2015
Exploration and 3D GIS Software - MapInfo Professional Discover3D 2015
Prakher Hajela Saxena
 
대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화
대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화
대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화
NAVER Engineering
 
MapReduce
MapReduceMapReduce
MapReduce
KavyaGo
 
Greenplum: A Pivotal Moment on Wall Street - Greenplum Summit 2018
Greenplum: A Pivotal Moment on Wall Street - Greenplum Summit 2018Greenplum: A Pivotal Moment on Wall Street - Greenplum Summit 2018
Greenplum: A Pivotal Moment on Wall Street - Greenplum Summit 2018
VMware Tanzu
 
Hybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGsHybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGs
Ali Hodroj
 
Dsm Presentation
Dsm PresentationDsm Presentation
Dsm Presentation
richoe
 
Very large scale distributed deep learning on BigDL
Very large scale distributed deep learning on BigDLVery large scale distributed deep learning on BigDL
Very large scale distributed deep learning on BigDL
DESMOND YUEN
 
Processing Large Graphs
Processing Large GraphsProcessing Large Graphs
Processing Large Graphs
Nishant Gandhi
 
AutoML for user segmentation: how to match millions of users with hundreds of...
AutoML for user segmentation: how to match millions of users with hundreds of...AutoML for user segmentation: how to match millions of users with hundreds of...
AutoML for user segmentation: how to match millions of users with hundreds of...
Institute of Contemporary Sciences
 
Fine tuning large LMs
Fine tuning large LMsFine tuning large LMs
Fine tuning large LMs
SylvainGugger
 
11. From Hadoop to Spark 1:2
11. From Hadoop to Spark 1:211. From Hadoop to Spark 1:2
11. From Hadoop to Spark 1:2
Fabio Fumarola
 

Similar to GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores (20)

Mastering MicroStation DGN: How to Integrate CAD and GIS
Mastering MicroStation DGN: How to Integrate CAD and GISMastering MicroStation DGN: How to Integrate CAD and GIS
Mastering MicroStation DGN: How to Integrate CAD and GIS
 
Map Reducec and Spark big data visualization and analytics
Map Reducec and Spark big data visualization and analyticsMap Reducec and Spark big data visualization and analytics
Map Reducec and Spark big data visualization and analytics
 
MapReduce
MapReduceMapReduce
MapReduce
 
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
 
Making sense of your data jug
Making sense of your data   jugMaking sense of your data   jug
Making sense of your data jug
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between  CAD & GIS: 6 Ways to Automate Your  Data IntegrationBridging Between  CAD & GIS: 6 Ways to Automate Your  Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
Exploration and 3D GIS Software - MapInfo Professional Discover3D 2015
Exploration and 3D GIS Software - MapInfo Professional Discover3D 2015Exploration and 3D GIS Software - MapInfo Professional Discover3D 2015
Exploration and 3D GIS Software - MapInfo Professional Discover3D 2015
 
대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화
대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화
대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화
 
MapReduce
MapReduceMapReduce
MapReduce
 
Greenplum: A Pivotal Moment on Wall Street - Greenplum Summit 2018
Greenplum: A Pivotal Moment on Wall Street - Greenplum Summit 2018Greenplum: A Pivotal Moment on Wall Street - Greenplum Summit 2018
Greenplum: A Pivotal Moment on Wall Street - Greenplum Summit 2018
 
Hybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGsHybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGs
 
Dsm Presentation
Dsm PresentationDsm Presentation
Dsm Presentation
 
Very large scale distributed deep learning on BigDL
Very large scale distributed deep learning on BigDLVery large scale distributed deep learning on BigDL
Very large scale distributed deep learning on BigDL
 
Processing Large Graphs
Processing Large GraphsProcessing Large Graphs
Processing Large Graphs
 
AutoML for user segmentation: how to match millions of users with hundreds of...
AutoML for user segmentation: how to match millions of users with hundreds of...AutoML for user segmentation: how to match millions of users with hundreds of...
AutoML for user segmentation: how to match millions of users with hundreds of...
 
Fine tuning large LMs
Fine tuning large LMsFine tuning large LMs
Fine tuning large LMs
 
11. From Hadoop to Spark 1:2
11. From Hadoop to Spark 1:211. From Hadoop to Spark 1:2
11. From Hadoop to Spark 1:2
 

Recently uploaded

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Aggregage
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
vikram sood
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
bopyb
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
AlessioFois2
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
Social Samosa
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
Sm321
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
nyfuhyz
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
aqzctr7x
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
AndrzejJarynowski
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
74nqk8xf
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
sameer shah
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
Timothy Spann
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdfUdemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Fernanda Palhano
 

Recently uploaded (20)

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdfUdemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
 

GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores

  • 1. Marcus Paradies, Michael Rudolf, Christof Bornhoevd, Wolfgang Lehner GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores GRADES’14 Workshop June 22, 2014
  • 2. Graphs from an Enterprise View © Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 2
  • 3. Graphs from an Enterprise View Relational + Application Logic Application Layer RDBMS Graph Processing Data Data • Recursive Queries • Chained Joins © Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 2
  • 4. Graphs from an Enterprise View Relational + Application Logic Application Layer RDBMS Graph Processing Data Data • Recursive Queries • Chained Joins Data already in RDBMS SQL as interface Data transfer to application © Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 2
  • 5. Graphs from an Enterprise View Relational + Application Logic Application Layer RDBMS Graph Processing Data Data • Recursive Queries • Chained Joins Data already in RDBMS SQL as interface Data transfer to application Relational + Graph + Application Logic Application Layer RDBMS GDBMS Replicate Data Data Data Data © Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 2
  • 6. Graphs from an Enterprise View Relational + Application Logic Application Layer RDBMS Graph Processing Data Data • Recursive Queries • Chained Joins Data already in RDBMS SQL as interface Data transfer to application Relational + Graph + Application Logic Application Layer RDBMS GDBMS Replicate Data Data Data Data Efficient processing in GDBMS Processing on replicated data No combination with relational data © Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 2
  • 7. Graph Processing (The New World) Graph Representation © Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 3
  • 8. Graph Processing (The New World) Graph Representation id=1 name=John type=User id=2 title=The Shining type=Product id=3 title=The Stand type=Product id=4 name=Horror type=Category id=5 name=Literature type=Category type=category type=similar type=belongs type=belongs type=rated rating=4.0 type=rated rating=5.0 Example graph id type name . . . title 1 User John . . . ? 2 Product ? . . . The Shining 3 Product ? . . . The Stand 4 Category Horror . . . ? 5 Category Literature . . . ? Vertex table Vs Vt type . . . rating 3 2 similar . . . ? 2 3 similar . . . ? 2 4 belongs . . . ? 3 4 belongs . . . ? 1 3 rated . . . 5.0 1 2 rated . . . 4.0 4 5 category . . . ? Edge table © Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 3
  • 9. Graph Processing (The New World) Graph Representation id=1 name=John type=User id=2 title=The Shining type=Product id=3 title=The Stand type=Product id=4 name=Horror type=Category id=5 name=Literature type=Category type=category type=similar type=belongs type=belongs type=rated rating=4.0 type=rated rating=5.0 Example graph id type name . . . title 1 User John . . . ? 2 Product ? . . . The Shining 3 Product ? . . . The Stand 4 Category Horror . . . ? 5 Category Literature . . . ? Vertex table Vs Vt type . . . rating 3 2 similar . . . ? 2 3 similar . . . ? 2 4 belongs . . . ? 3 4 belongs . . . ? 1 3 rated . . . 5.0 1 2 rated . . . 4.0 4 5 category . . . ? Edge table • Each vertex/edge represented as a single record in universal tables • Support for transactions and compression • Combination with other data models (spatial, text, temporal) © Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 3
  • 18. Graph Processing (The New World): Query Execution
    Operators over EDGES: σ (Selection), Traversal, π (Projection)
    Graph Traversals
    Example query: { id:D }-a-(1,*)
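A scan-based traversal for a query such as { id:D }-a-(1,*) can be sketched as follows. The reading of the query is an assumption (start at vertex D, follow outgoing edges of type a, between 1 and an unbounded number of hops), the function `scan_traverse` is a hypothetical name, and the edge data is taken from the Edge Clustering slide. This is not the operator implementation from the talk, only an illustration of why each traversal iteration costs one full scan of the edge columns.

```python
# Edge columns (Vs, Vt, Type), copied from the Edge Clustering slide.
VS = ["D", "A", "A", "A", "E", "E", "D", "B", "F"]
VT = ["F", "D", "B", "C", "B", "G", "B", "E", "G"]
TYPE = ["a", "a", "a", "a", "a", "a", "b", "b", "b"]


def scan_traverse(vs, vt, etype, start, edge_type, min_hops=1, max_hops=None):
    """Level-wise traversal; every iteration scans the full edge columns.

    Assumed semantics: outgoing edges only, each vertex reported at most
    once, and the start vertex itself is never reported.
    """
    frontier, visited, result = {start}, {start}, set()
    hops = 0
    while frontier and (max_hops is None or hops < max_hops):
        hops += 1
        next_frontier = set()
        for row in range(len(vs)):  # full column scan per iteration
            if etype[row] == edge_type and vs[row] in frontier:
                if vt[row] not in visited:
                    next_frontier.add(vt[row])
        visited |= next_frontier
        if hops >= min_hops:
            result |= next_frontier
        frontier = next_frontier
    return result


from_d = scan_traverse(VS, VT, TYPE, start="D", edge_type="a")
from_a = scan_traverse(VS, VT, TYPE, start="A", edge_type="a")
```

On this small edge table, the traversal from D stops after one productive iteration, while the traversal from A needs two.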
  • 19. Edge Clustering
    • Clustering by edge type preserves subgraph meaning
    • Clustering by edge source preserves vertex neighborhood
    • Increases spatial locality in memory
    • Allows reducing scan to range in column
    Example edge table (labels in the figure: Type Clustering, Edge Clustering)
      Vs   Vt   Type
      D    F    a
      A    D    a
      A    B    a
      A    C    a
      E    B    a
      E    G    a
      D    B    b
      B    E    b
      F    G    b
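The effect of clustering on scans can be sketched as follows. The sketch sorts the edges by (type, source), which is one simple way to obtain the clustering and an assumption of this example (the slide's table is clustered but not sorted by source). The names `clustered`, `keys`, and `neighbors` are hypothetical. Once edges with the same type and source are contiguous, a neighborhood lookup reduces to a range in the column, found here by binary search.

```python
from bisect import bisect_left, bisect_right

# Edges as (Vs, Vt, Type), copied from the slide.
edges = [
    ("D", "F", "a"), ("A", "D", "a"), ("A", "B", "a"),
    ("A", "C", "a"), ("E", "B", "a"), ("E", "G", "a"),
    ("D", "B", "b"), ("B", "E", "b"), ("F", "G", "b"),
]

# Cluster by edge type first, then by source vertex (stable sort).
clustered = sorted(edges, key=lambda e: (e[2], e[0]))
keys = [(etype, src) for src, _, etype in clustered]


def neighbors(source, edge_type):
    """Targets of `source` via `edge_type`, read from one contiguous range."""
    lo = bisect_left(keys, (edge_type, source))
    hi = bisect_right(keys, (edge_type, source))
    return [vt for _, vt, _ in clustered[lo:hi]]


a_neighbors_of_a = neighbors("A", "a")
```

Only the rows in the range `[lo, hi)` are touched, instead of every row of the edge table.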
  • 25. GRATIN
    Column (position: value): 1: 2, 2: 2, 3: 2, 4: 1, 5: 3
    Minimal blocksize: 2
    Blocks: B1, B2
    Value Blocks: 2 1 2 1 2 3
    Block ranges: 2 1 3 5
    • GRATIN replaces full column scans by block scans
    • GRATIN indexes column with variable block size
    • Allows efficient handling of vertices with high outdegree
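The idea of replacing a full column scan by block scans can be sketched with a simplified block index. This is not the GRATIN structure itself: the sketch assumes fixed-size blocks (the slides describe variable block sizes), uses 0-based positions and block numbers, and the names `build_block_index` and `block_scan` are hypothetical.

```python
from collections import defaultdict


def build_block_index(column, block_size):
    """Map each value to the ascending list of block numbers containing it."""
    index = defaultdict(list)
    for pos, value in enumerate(column):
        block = pos // block_size
        # Record each block at most once per value.
        if not index[value] or index[value][-1] != block:
            index[value].append(block)
    return dict(index)


def block_scan(column, index, block_size, value):
    """Positions of `value`, scanning only the blocks listed in the index."""
    hits = []
    for block in index.get(value, []):
        start = block * block_size
        for pos in range(start, min(start + block_size, len(column))):
            if column[pos] == value:
                hits.append(pos)
    return hits


column = [2, 2, 2, 1, 3]  # column values from the slide
index = build_block_index(column, block_size=2)
positions_of_2 = block_scan(column, index, block_size=2, value=2)
```

A value that occurs in few blocks is found by touching only those blocks, and a value absent from the index costs no scan at all.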
  • 26. Experiments on Static Graphs
      ID                 |V|     |E|     d̄out
      Amazon             0.4 M   3.3 M   16.8
      California-Roads   1.9 M   2.7 M   2.8
    [Figure: Amazon. x-axis: # of Traversal Iterations; y-axis: Execution Time (ms); series: SCAN, GRATIN-512, GRATIN-4096, GRATIN-32768.]
    Figure: Comparison for different block sizes.
    [Figure: two panels, Amazon and California-Roads. x-axis: Traversal Iteration; y-axis: Query time (ms).]
    Figure: Query time for scan-based traversal and GRATIN-based traversal.
  • 30. Handling Updates
    Before the insertion
      Column (position: value): 1: 2, 2: 2, 3: 2, 4: 1, 5: 3
      Minimal blocksize: 2
      Blocks: B1, B2
      Value Blocks: 2 1 2 1 2 3
      Block ranges: 2 1 3 5
      Health factor: hB1 = 1.0, hB2 = 1.0, hB3 = 1.0; GRATIN health h = 1.0
    After the insertion
      Column (position: value): 1: 2, 2: 2, 3: 2, 4: 1, 5: 3, 6: 2
      Minimal blocksize: 2
      Blocks: B1, B2, B3
      Value Blocks: 2 1 2 3 1 2 3
      Block ranges: 3 2 1 3 5 6
      Health factor: hB1 = 1.0, hB2 = 0.5, hB3 = 1.0; GRATIN health h = 0.83
    • Health factor describes viability of GRATIN
    • If global health factor below threshold → rebuild index
    • GRATIN allows updates in constant time (append-only)
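The rebuild rule can be sketched as follows. The slides do not state how the global health factor is derived from the per-block factors; the arithmetic mean used here is an assumption that happens to match the slide's numbers (1.0, 0.5, 1.0 giving 0.83). The function names and the threshold values are hypothetical.

```python
def global_health(block_health):
    """Arithmetic mean of per-block health factors (assumed aggregation)."""
    return sum(block_health) / len(block_health)


def needs_rebuild(block_health, threshold):
    """True if the global health factor has dropped below the threshold."""
    return global_health(block_health) < threshold


# Per-block health factors after the insertion, as shown on the slide.
after_insert = [1.0, 0.5, 1.0]
h = global_health(after_insert)  # 2.5 / 3

# Hypothetical thresholds, for illustration only.
rebuild_at_075 = needs_rebuild(after_insert, threshold=0.75)
rebuild_at_090 = needs_rebuild(after_insert, threshold=0.90)
```

With this rule the index keeps absorbing appends until enough blocks have degraded, and only then pays for a rebuild.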
  • 31. Experiments on Dynamic Graphs
    [Figure: two panels, Amazon and California-Roads. x-axis: Δ Batch Insertions (0, then five batches of +20K); y-axes: Slowdown Factor and Health Factor.]
    Figure: Query time for GRATIN-based traversal on dynamic graphs. Slowdown factor describes the relative execution time in multiples of the execution time on a static graph.
  • 32. Summary
    • Tight integration of traversal operator into main-memory column store
    • GRATIN is a lightweight secondary index structure
    • Handles dynamic graphs in predictable time
    • Experiments show a diverse spectrum of performance improvements
    • Performance of GRATIN depends on graph topology
  • 33. Contact
    Marcus Paradies, PhD Student at Database Technology Group, TU Dresden
    https://wwwdb.inf.tu-dresden.de/team/external-members/marcus-paradies/
    marcus.paradies@gmail.com