SlideShare a Scribd company logo
1 of 30
GRAPHITE: An Extensible Graph Traversal
Framework for Relational Database Management
Systems
Marcus Paradies*, Wolfgang Lehner*, and Christof Bornhoevd° | SSDBM‘
15
*TU Dresden °Risk Management Solutions, Inc.
2
Graph Processing on Enterprise Data
Relational + Application Logic Relational + Graph + Application Logic
Data already in RDBMS
SQL as the only interface/no graph abstraction
Data transfer to application
Efficient processing in GDBMS
Processing on replicated data
Data transfer to application
No combination with other data models possible
3
Integration of Graph Processing into an RDBMS
How could a deep integration of graph functionality into an RDBMS look like?
Graph operators can be seamlessly combined with other plan
operators
4
Columnar Graph Storage
All available data types can be used as vertex/edge attributes
5
Columnar Graph Storage
All available data types can be used as vertex/edge attributes
Lightweight
compression
techniques
(RLE)
6
Graph Traversal Workflow
Traversal
Configuration
Traversal
Kernels
• S – set of start vertices
• φ – edge predicate
• c – collection boundary
• r – recursion boundary
• d – traversal direction
Pluggable physical traversal kernels as
implementations of a logical traversal operator
7
Graph Traversal Formalism
FORMAL DESCRIPTION (SET-BASED)
• A traversal operation is a totally ordered set P of path steps
• Each path step pi receives a vertex set Di-1 discovered at level (i-1)
and returns a set of adjacent vertices Di (1 ≤ i ≤ r, r is recursion boundary)
• Initally, D0 = 𝑆
Di = 𝑣 ∃𝑢 ∈ 𝐷i−1 : 𝑒= (𝑢,𝑣) ∈ 𝐸 ˄ 𝑒𝑣𝑎𝑙(𝑒, φ)
• The final output R is defined as
R = 𝑖=𝑐
𝑟
𝐷𝑖  𝑖=0
𝑐−1
𝐷𝑖
Target
vertices
Visited
vertices
8
Graph Traversals by Example
Traversal Configuration Result
{ { A }, “type=‘a‘“, 1, 1,  } { B, C, D }
Root vertex
Discovered vertex
9
Graph Traversals by Example
Traversal Configuration Result
{ { E }, “type=‘b‘“, 2, 2,  } { D }
Root vertex
Discovered vertex
10
Graph Traversals by Example
Traversal Configuration Result
{ { A }, “type=‘a‘“, 1, ∞,  } { B, C, D, F}
Root vertex
Discovered vertex
11
Graph Traversals by Example
Traversal Configuration Result
{ {A}, “type=‘a‘ OR type=‘b‘“, 2, 2,  } { E, F }
Root vertex
Discovered vertex
12
Graph Traversals by Example
Traversal Configuration Result
{ { A }, “type=‘a‘“, 0, 1,  } { A, B, C, D }
Root vertex
Discovered vertex
13
Level-Synchronous (LS) Traversal
SCAN-BASED GRAPH TRAVERSAL
Scan partitions
(dictionary-encoded)
14
Level-Synchronous (LS) Traversal
SCAN-BASED GRAPH TRAVERSAL
Keep (intermediate)
results as bitsets
15
Level-Synchronous (LS) Traversal
SCAN-BASED GRAPH TRAVERSAL
Fetch neighbors
by position
16
Level-Synchronous (LS) Traversal
SCAN-BASED GRAPH TRAVERSAL
Set operations on
vectorized bitsets
17
Level-Synchronous (LS) Traversal
SCAN-BASED GRAPH TRAVERSAL EDGE CLUSTERING
Clustering by
edge type
Clustering by
source vertex
18
Fragmented-Incremental (FI) Traversal
• Partition column into fragments
• Track dependencies between fragments in index structure
• Goal: Minimize number of fragment reads
Fragment
For a fragment size equal to |E|, FI-traversal degenerates to LS-
traversal
19
Fragmented-Incremental (FI) Traversal
• Partition column into fragments
• Track dependencies between fragments in index structure
• Goal: Minimize number of fragment reads
Transition 8  13  12
For a fragment size equal to |E|, FI-traversal degenerates to LS-
traversal
20
Fragmented-Incremental (FI) Traversal
• Partition column into fragments
• Track dependencies between fragments in index structure
• Goal: Minimize number of fragment reads
Transition Graph
Probabilistic
Fragment Synopsis
For a fragment size equal to |E|, FI-traversal degenerates to LS-
traversal
21
Fragmented-Incremental (FI) Traversal
• Partition column into fragments
• Track dependencies between fragments in index structure
• Goal: Minimize number of fragment reads
Transition Graph
Fragment Queue
Execution Chain
For a fragment size equal to |E|, FI-traversal degenerates to LS-
traversal
22
Experimental Evaluation
EVALUATED REAL-WORLD DATA
SETS
• Six real-world data sets with different
topology characteristics
EVALUATED SYSTEMS
• Implementation in main-memory column store prototype (C++)
• Graph database (Neo4j)
• RDF DBMS (Virtuoso 7.0 with columnar storage layout)
• Commerical columnar RDBMS (via chained self-joins, with and without index support)
23
Experimental Evaluation
COMPARISON OF LS-TRAVERSAL AND FI-TRAVERSAL
Traversal performance depends on the traversal depth and the
topology
FI-Traversal outperforms
LS-Traversal by one order
of magnitude
24
Experimental Evaluation
COMPARISON OF LS-TRAVERSAL AND FI-TRAVERSAL
Traversal performance depends on the traversal depth and the
topology
LS-Traversal starts
outperforming
FI-Traversal
25
Experimental Evaluation
COMPARISON OF LS-TRAVERSAL AND FI-TRAVERSAL
Traversal performance depends on the traversal depth and the
topology
Different performance
characteristics for different
fragment sizes
26
Experimental Evaluation
COMPARISON OF LS-TRAVERSAL AND FI-TRAVERSAL
Traversal performance depends on the traversal depth and the
topology
27
Experimental Evaluation
COMPARISON OF LS-TRAVERSAL AND FI-TRAVERSAL
Traversal performance depends on the traversal depth and the
topology
Scan-based traversal outperforms
fragmented traversal depending on
traversal depth and graph topology
28
Experimental Evaluation
SYSTEM-LEVEL BENCHMARK
Combination of LS and FI-traversal outperforms native graph systems
by up to two orders of magnitude
29
Summary
GRAPHITE
• Graph processing tightly integrated into RDBMS
• Extensions of core components by graph extensions
(operators, cost model, index structures)
• Topology characteristics-aware traversal operators
GRAPH-SPECIFIC DATA STATISTICS AND
ALGORITHMS• Diverse graph topologies demand different algorithmic design
decisions
• Index scan versus full column scan decision also applies for graph
traversals
FUTURE WORK
• Integration with temporal, spatial, and text data
• Language extensions for custom code executed during graph traversal
Integration of GRAPHITE into RDBMS
Contact
Marcus Paradies, TU Dresden
m.paradies@sap.com
https://wwwdb.inf.tu-dresden.de/team/external-members/marcus-paradies/

More Related Content

What's hot

Generalized Linear Models in Spark MLlib and SparkR
Generalized Linear Models in Spark MLlib and SparkRGeneralized Linear Models in Spark MLlib and SparkR
Generalized Linear Models in Spark MLlib and SparkRDatabricks
 
Partitioning SKA Dataflows for Optimal Graph Execution
Partitioning SKA Dataflows for Optimal Graph ExecutionPartitioning SKA Dataflows for Optimal Graph Execution
Partitioning SKA Dataflows for Optimal Graph Execution Chen Wu
 
Semantic Parsing with Combinatory Categorial Grammar (CCG)
Semantic Parsing with Combinatory Categorial Grammar (CCG)Semantic Parsing with Combinatory Categorial Grammar (CCG)
Semantic Parsing with Combinatory Categorial Grammar (CCG)shakimov
 
Hadoop MapReduce framework - Module 3
Hadoop MapReduce framework - Module 3Hadoop MapReduce framework - Module 3
Hadoop MapReduce framework - Module 3Rohit Agrawal
 
Demonstration of how to read and write ESRI
Demonstration of how to read and write ESRIDemonstration of how to read and write ESRI
Demonstration of how to read and write ESRIRazimulseye
 
8085 arithmetic instructions
8085 arithmetic instructions8085 arithmetic instructions
8085 arithmetic instructionsprashant1271
 
Apache Flink Deep-Dive @ Hadoop Summit 2015 in San Jose, CA
Apache Flink Deep-Dive @ Hadoop Summit 2015 in San Jose, CAApache Flink Deep-Dive @ Hadoop Summit 2015 in San Jose, CA
Apache Flink Deep-Dive @ Hadoop Summit 2015 in San Jose, CARobert Metzger
 

What's hot (9)

Generalized Linear Models in Spark MLlib and SparkR
Generalized Linear Models in Spark MLlib and SparkRGeneralized Linear Models in Spark MLlib and SparkR
Generalized Linear Models in Spark MLlib and SparkR
 
Partitioning SKA Dataflows for Optimal Graph Execution
Partitioning SKA Dataflows for Optimal Graph ExecutionPartitioning SKA Dataflows for Optimal Graph Execution
Partitioning SKA Dataflows for Optimal Graph Execution
 
Semantic Parsing with Combinatory Categorial Grammar (CCG)
Semantic Parsing with Combinatory Categorial Grammar (CCG)Semantic Parsing with Combinatory Categorial Grammar (CCG)
Semantic Parsing with Combinatory Categorial Grammar (CCG)
 
Tx well data final
Tx well data finalTx well data final
Tx well data final
 
Hadoop MapReduce framework - Module 3
Hadoop MapReduce framework - Module 3Hadoop MapReduce framework - Module 3
Hadoop MapReduce framework - Module 3
 
Demonstration of how to read and write ESRI
Demonstration of how to read and write ESRIDemonstration of how to read and write ESRI
Demonstration of how to read and write ESRI
 
Grasp Hsc09
Grasp Hsc09Grasp Hsc09
Grasp Hsc09
 
8085 arithmetic instructions
8085 arithmetic instructions8085 arithmetic instructions
8085 arithmetic instructions
 
Apache Flink Deep-Dive @ Hadoop Summit 2015 in San Jose, CA
Apache Flink Deep-Dive @ Hadoop Summit 2015 in San Jose, CAApache Flink Deep-Dive @ Hadoop Summit 2015 in San Jose, CA
Apache Flink Deep-Dive @ Hadoop Summit 2015 in San Jose, CA
 

Similar to GRAPHITE: An Extensible Graph Traversal Framework for Relational Database Management Systems

Large Scale Machine Learning with Apache Spark
Large Scale Machine Learning with Apache SparkLarge Scale Machine Learning with Apache Spark
Large Scale Machine Learning with Apache SparkCloudera, Inc.
 
Graphs in data structures are non-linear data structures made up of a finite ...
Graphs in data structures are non-linear data structures made up of a finite ...Graphs in data structures are non-linear data structures made up of a finite ...
Graphs in data structures are non-linear data structures made up of a finite ...bhargavi804095
 
Apache Drill talk ApacheCon 2018
Apache Drill talk ApacheCon 2018Apache Drill talk ApacheCon 2018
Apache Drill talk ApacheCon 2018Aman Sinha
 
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...Debraj GuhaThakurta
 
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...Debraj GuhaThakurta
 
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...Databricks
 
Microsoft R - Data Science at Scale
Microsoft R - Data Science at ScaleMicrosoft R - Data Science at Scale
Microsoft R - Data Science at ScaleSascha Dittmann
 
Architectural_Synthesis_for_DSP_Structured_Datapaths
Architectural_Synthesis_for_DSP_Structured_DatapathsArchitectural_Synthesis_for_DSP_Structured_Datapaths
Architectural_Synthesis_for_DSP_Structured_DatapathsShereef Shehata
 
Dynamically Optimizing Queries over Large Scale Data Platforms
Dynamically Optimizing Queries over Large Scale Data PlatformsDynamically Optimizing Queries over Large Scale Data Platforms
Dynamically Optimizing Queries over Large Scale Data PlatformsINRIA-OAK
 
Analytics Beyond RAM Capacity using R
Analytics Beyond RAM Capacity using RAnalytics Beyond RAM Capacity using R
Analytics Beyond RAM Capacity using RAlex Palamides
 
ePOM - Intro to Ocean Data Science - Raster and Vector Data Formats
ePOM - Intro to Ocean Data Science - Raster and Vector Data FormatsePOM - Intro to Ocean Data Science - Raster and Vector Data Formats
ePOM - Intro to Ocean Data Science - Raster and Vector Data FormatsGiuseppe Masetti
 
Model Building with RevoScaleR: Using R and Hadoop for Statistical Computation
Model Building with RevoScaleR: Using R and Hadoop for Statistical ComputationModel Building with RevoScaleR: Using R and Hadoop for Statistical Computation
Model Building with RevoScaleR: Using R and Hadoop for Statistical ComputationRevolution Analytics
 
Stream processing: The Matrix Revolutions
Stream processing: The Matrix RevolutionsStream processing: The Matrix Revolutions
Stream processing: The Matrix RevolutionsRomanaPernischov
 
Data Analytics with R and SQL Server
Data Analytics with R and SQL ServerData Analytics with R and SQL Server
Data Analytics with R and SQL ServerStéphane Fréchette
 
Introduction to HBase - Phoenix HUG 5/14
Introduction to HBase - Phoenix HUG 5/14Introduction to HBase - Phoenix HUG 5/14
Introduction to HBase - Phoenix HUG 5/14Jeremy Walsh
 
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...Databricks
 
Generating Executable Mappings from RDF Data Cube Data Structure Definitions
Generating Executable Mappings from RDF Data Cube Data Structure DefinitionsGenerating Executable Mappings from RDF Data Cube Data Structure Definitions
Generating Executable Mappings from RDF Data Cube Data Structure DefinitionsChristophe Debruyne
 
Recent Developments In SparkR For Advanced Analytics
Recent Developments In SparkR For Advanced AnalyticsRecent Developments In SparkR For Advanced Analytics
Recent Developments In SparkR For Advanced AnalyticsDatabricks
 

Similar to GRAPHITE: An Extensible Graph Traversal Framework for Relational Database Management Systems (20)

Large Scale Machine Learning with Apache Spark
Large Scale Machine Learning with Apache SparkLarge Scale Machine Learning with Apache Spark
Large Scale Machine Learning with Apache Spark
 
Graphs in data structures are non-linear data structures made up of a finite ...
Graphs in data structures are non-linear data structures made up of a finite ...Graphs in data structures are non-linear data structures made up of a finite ...
Graphs in data structures are non-linear data structures made up of a finite ...
 
3DRepo
3DRepo3DRepo
3DRepo
 
Apache Drill talk ApacheCon 2018
Apache Drill talk ApacheCon 2018Apache Drill talk ApacheCon 2018
Apache Drill talk ApacheCon 2018
 
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
 
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
 
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...
 
Microsoft R - Data Science at Scale
Microsoft R - Data Science at ScaleMicrosoft R - Data Science at Scale
Microsoft R - Data Science at Scale
 
Architectural_Synthesis_for_DSP_Structured_Datapaths
Architectural_Synthesis_for_DSP_Structured_DatapathsArchitectural_Synthesis_for_DSP_Structured_Datapaths
Architectural_Synthesis_for_DSP_Structured_Datapaths
 
Graph databases
Graph databasesGraph databases
Graph databases
 
Dynamically Optimizing Queries over Large Scale Data Platforms
Dynamically Optimizing Queries over Large Scale Data PlatformsDynamically Optimizing Queries over Large Scale Data Platforms
Dynamically Optimizing Queries over Large Scale Data Platforms
 
Analytics Beyond RAM Capacity using R
Analytics Beyond RAM Capacity using RAnalytics Beyond RAM Capacity using R
Analytics Beyond RAM Capacity using R
 
ePOM - Intro to Ocean Data Science - Raster and Vector Data Formats
ePOM - Intro to Ocean Data Science - Raster and Vector Data FormatsePOM - Intro to Ocean Data Science - Raster and Vector Data Formats
ePOM - Intro to Ocean Data Science - Raster and Vector Data Formats
 
Model Building with RevoScaleR: Using R and Hadoop for Statistical Computation
Model Building with RevoScaleR: Using R and Hadoop for Statistical ComputationModel Building with RevoScaleR: Using R and Hadoop for Statistical Computation
Model Building with RevoScaleR: Using R and Hadoop for Statistical Computation
 
Stream processing: The Matrix Revolutions
Stream processing: The Matrix RevolutionsStream processing: The Matrix Revolutions
Stream processing: The Matrix Revolutions
 
Data Analytics with R and SQL Server
Data Analytics with R and SQL ServerData Analytics with R and SQL Server
Data Analytics with R and SQL Server
 
Introduction to HBase - Phoenix HUG 5/14
Introduction to HBase - Phoenix HUG 5/14Introduction to HBase - Phoenix HUG 5/14
Introduction to HBase - Phoenix HUG 5/14
 
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
 
Generating Executable Mappings from RDF Data Cube Data Structure Definitions
Generating Executable Mappings from RDF Data Cube Data Structure DefinitionsGenerating Executable Mappings from RDF Data Cube Data Structure Definitions
Generating Executable Mappings from RDF Data Cube Data Structure Definitions
 
Recent Developments In SparkR For Advanced Analytics
Recent Developments In SparkR For Advanced AnalyticsRecent Developments In SparkR For Advanced Analytics
Recent Developments In SparkR For Advanced Analytics
 

GRAPHITE: An Extensible Graph Traversal Framework for Relational Database Management Systems

  • 1. GRAPHITE: An Extensible Graph Traversal Framework for Relational Database Management Systems Marcus Paradies*, Wolfgang Lehner*, and Christof Bornhoevd° | SSDBM‘ 15 *TU Dresden °Risk Management Solutions, Inc.
  • 2. 2 Graph Processing on Enterprise Data Relational + Application Logic Relational + Graph + Application Logic Data already in RDBMS SQL as the only interface/no graph abstraction Data transfer to application Efficient processing in GDBMS Processing on replicated data Data transfer to application No combination with other data models possible
  • 3. 3 Integration of Graph Processing into an RDBMS How could a deep integration of graph functionality into an RDBMS look like? Graph operators can be seamlessly combined with other plan operators
  • 4. 4 Columnar Graph Storage All available data types can be used as vertex/edge attributes
  • 5. 5 Columnar Graph Storage All available data types can be used as vertex/edge attributes Lightweight compression techniques (RLE)
  • 6. 6 Graph Traversal Workflow Traversal Configuration Traversal Kernels • S – set of start vertices • φ – edge predicate • c – collection boundary • r – recursion boundary • d – traversal direction Pluggable physical traversal kernels as implementations of a logical traversal operator
  • 7. 7 Graph Traversal Formalism FORMAL DESCRIPTION (SET-BASED) • A traversal operation is a totally ordered set P of path steps • Each path step pi receives a vertex set Di-1 discovered at level (i-1) and returns a set of adjacent vertices Di (1 ≤ i ≤ r, r is recursion boundary) • Initally, D0 = 𝑆 Di = 𝑣 ∃𝑢 ∈ 𝐷i−1 : 𝑒= (𝑢,𝑣) ∈ 𝐸 ˄ 𝑒𝑣𝑎𝑙(𝑒, φ) • The final output R is defined as R = 𝑖=𝑐 𝑟 𝐷𝑖 𝑖=0 𝑐−1 𝐷𝑖 Target vertices Visited vertices
  • 8. 8 Graph Traversals by Example Traversal Configuration Result { { A }, “type=‘a‘“, 1, 1,  } { B, C, D } Root vertex Discovered vertex
  • 9. 9 Graph Traversals by Example Traversal Configuration Result { { E }, “type=‘b‘“, 2, 2,  } { D } Root vertex Discovered vertex
  • 10. 10 Graph Traversals by Example Traversal Configuration Result { { A }, “type=‘a‘“, 1, ∞,  } { B, C, D, F} Root vertex Discovered vertex
  • 11. 11 Graph Traversals by Example Traversal Configuration Result { {A}, “type=‘a‘ OR type=‘b‘“, 2, 2,  } { E, F } Root vertex Discovered vertex
  • 12. 12 Graph Traversals by Example Traversal Configuration Result { { A }, “type=‘a‘“, 0, 1,  } { A, B, C, D } Root vertex Discovered vertex
  • 13. 13 Level-Synchronous (LS) Traversal SCAN-BASED GRAPH TRAVERSAL Scan partitions (dictionary-encoded)
  • 14. 14 Level-Synchronous (LS) Traversal SCAN-BASED GRAPH TRAVERSAL Keep (intermediate) results as bitsets
  • 15. 15 Level-Synchronous (LS) Traversal SCAN-BASED GRAPH TRAVERSAL Fetch neighbors by position
  • 16. 16 Level-Synchronous (LS) Traversal SCAN-BASED GRAPH TRAVERSAL Set operations on vectorized bitsets
  • 17. 17 Level-Synchronous (LS) Traversal SCAN-BASED GRAPH TRAVERSAL EDGE CLUSTERING Clustering by edge type Clustering by source vertex
  • 18. 18 Fragmented-Incremental (FI) Traversal • Partition column into fragments • Track dependencies between fragments in index structure • Goal: Minimize number of fragment reads Fragment For a fragment size equal to |E|, FI-traversal degenerates to LS- traversal
  • 19. 19 Fragmented-Incremental (FI) Traversal • Partition column into fragments • Track dependencies between fragments in index structure • Goal: Minimize number of fragment reads Transition 8  13  12 For a fragment size equal to |E|, FI-traversal degenerates to LS- traversal
  • 20. 20 Fragmented-Incremental (FI) Traversal • Partition column into fragments • Track dependencies between fragments in index structure • Goal: Minimize number of fragment reads Transition Graph Probabilistic Fragment Synopsis For a fragment size equal to |E|, FI-traversal degenerates to LS- traversal
  • 21. 21 Fragmented-Incremental (FI) Traversal • Partition column into fragments • Track dependencies between fragments in index structure • Goal: Minimize number of fragment reads Transition Graph Fragment Queue Execution Chain For a fragment size equal to |E|, FI-traversal degenerates to LS- traversal
  • 22. 22 Experimental Evaluation EVALUATED REAL-WORLD DATA SETS • Six real-world data sets with different topology characteristics EVALUATED SYSTEMS • Implementation in main-memory column store prototype (C++) • Graph database (Neo4j) • RDF DBMS (Virtuoso 7.0 with columnar storage layout) • Commerical columnar RDBMS (via chained self-joins, with and without index support)
  • 23. 23 Experimental Evaluation COMPARISON OF LS-TRAVERSAL AND FI-TRAVERSAL Traversal performance depends on the traversal depth and the topology FI-Traversal outperforms LS-Traversal by one order of magnitude
  • 24. 24 Experimental Evaluation COMPARISON OF LS-TRAVERSAL AND FI-TRAVERSAL Traversal performance depends on the traversal depth and the topology LS-Traversal starts outperforming FI-Traversal
  • 25. 25 Experimental Evaluation COMPARISON OF LS-TRAVERSAL AND FI-TRAVERSAL Traversal performance depends on the traversal depth and the topology Different performance characteristics for different fragment sizes
  • 26. 26 Experimental Evaluation COMPARISON OF LS-TRAVERSAL AND FI-TRAVERSAL Traversal performance depends on the traversal depth and the topology
  • 27. 27 Experimental Evaluation COMPARISON OF LS-TRAVERSAL AND FI-TRAVERSAL Traversal performance depends on the traversal depth and the topology Scan-based traversal outperforms fragmented traversal depending on traversal depth and graph topology
  • 28. 28 Experimental Evaluation SYSTEM-LEVEL BENCHMARK Combination of LS and FI-traversal outperforms native graph systems by up to two orders of magnitude
  • 29. 29 Summary GRAPHITE • Graph processing tightly integrated into RDBMS • Extensions of core components by graph extensions (operators, cost model, index structures) • Topology characteristics-aware traversal operators GRAPH-SPECIFIC DATA STATISTICS AND ALGORITHMS• Diverse graph topologies demand different algorithmic design decisions • Index scan versus full column scan decision also applies for graph traversals FUTURE WORK • Integration with temporal, spatial, and text data • Language extensions for custom code executed during graph traversal Integration of GRAPHITE into RDBMS
  • 30. Contact Marcus Paradies, TU Dresden m.paradies@sap.com https://wwwdb.inf.tu-dresden.de/team/external-members/marcus-paradies/

Editor's Notes

  1. MOM: Message Oriented Middlewared Middleware bezeichnet Middleware, die auf der asynchronen oder synchronen Kommunikation, also der Übertragung von Nachrichten (Messages) beruht. WSMS: Web Service Management Syste
  2. MOM: Message Oriented Middlewared Middleware bezeichnet Middleware, die auf der asynchronen oder synchronen Kommunikation, also der Übertragung von Nachrichten (Messages) beruht. WSMS: Web Service Management Syste
  3. MOM: Message Oriented Middlewared Middleware bezeichnet Middleware, die auf der asynchronen oder synchronen Kommunikation, also der Übertragung von Nachrichten (Messages) beruht. WSMS: Web Service Management Syste
  4. MOM: Message Oriented Middlewared Middleware bezeichnet Middleware, die auf der asynchronen oder synchronen Kommunikation, also der Übertragung von Nachrichten (Messages) beruht. WSMS: Web Service Management Syste
  5. MOM: Message Oriented Middlewared Middleware bezeichnet Middleware, die auf der asynchronen oder synchronen Kommunikation, also der Übertragung von Nachrichten (Messages) beruht. WSMS: Web Service Management Syste
  6. MOM: Message Oriented Middlewared Middleware bezeichnet Middleware, die auf der asynchronen oder synchronen Kommunikation, also der Übertragung von Nachrichten (Messages) beruht. WSMS: Web Service Management Syste
  7. MOM: Message Oriented Middlewared Middleware bezeichnet Middleware, die auf der asynchronen oder synchronen Kommunikation, also der Übertragung von Nachrichten (Messages) beruht. WSMS: Web Service Management Syste
  8. MOM: Message Oriented Middlewared Middleware bezeichnet Middleware, die auf der asynchronen oder synchronen Kommunikation, also der Übertragung von Nachrichten (Messages) beruht. WSMS: Web Service Management Syste
  9. MOM: Message Oriented Middlewared Middleware bezeichnet Middleware, die auf der asynchronen oder synchronen Kommunikation, also der Übertragung von Nachrichten (Messages) beruht. WSMS: Web Service Management Syste
  10. MOM: Message Oriented Middlewared Middleware bezeichnet Middleware, die auf der asynchronen oder synchronen Kommunikation, also der Übertragung von Nachrichten (Messages) beruht. WSMS: Web Service Management Syste
  11. MOM: Message Oriented Middlewared Middleware bezeichnet Middleware, die auf der asynchronen oder synchronen Kommunikation, also der Übertragung von Nachrichten (Messages) beruht. WSMS: Web Service Management Syste
  12. MOM: Message Oriented Middlewared Middleware bezeichnet Middleware, die auf der asynchronen oder synchronen Kommunikation, also der Übertragung von Nachrichten (Messages) beruht. WSMS: Web Service Management Syste
  13. MOM: Message Oriented Middlewared Middleware bezeichnet Middleware, die auf der asynchronen oder synchronen Kommunikation, also der Übertragung von Nachrichten (Messages) beruht. WSMS: Web Service Management Syste
  14. MOM: Message Oriented Middlewared Middleware bezeichnet Middleware, die auf der asynchronen oder synchronen Kommunikation, also der Übertragung von Nachrichten (Messages) beruht. WSMS: Web Service Management Syste