DEX: a High-Performance Graph Database Management System




                                                           A High-Performance Graph
                                                           Database Management
                                                           System


                                                           Authors:
                                                           Norbert Martínez-Bazan
                                                           Sergio Gómez-Villamor
                                                           Francesc Escalé-Claveras


                                                           Outline




                                                              Introduction
                                                              DEX
                                                                  Logical graph model
                                                                  Internal representation
                                                              Software architecture
                                                              Experimental results
                                                              Conclusions
                                                              Future work


                                                           Introduction




                                                              [2006] DEX started by DAMA-UPC
                                                              [2010] Sparsity Technologies is a spin-out
                                                               from DAMA-UPC
                                                                   Sparsity commercializes DEX and provides
                                                                    services
                                                                   DAMA-UPC does development and
                                                                    research

                                                              DEX Versions
                                                                   V2.0   March/2009
                                                                   V3.0   October/2009
                                                                   V4.0   November/2010


                                                           DEX




                                                              DEX is a graph database:
                                                                  Both data and schema are represented as a graph
                                                                  Data operations are based on graph operations
                                                                  Integrity constraints are graph-based

                                                               Renzo Angles and Claudio Gutierrez. 2008. Survey of graph database
                                                                 models. ACM Comput. Surv. 40, 1, Article 1 (February 2008)


                                                              Focus:
                                                                  Management of very large graphs
                                                                  High performance on query operations


                                                               Logical graph model




                                                              Labeled: nodes and edges are “typed”
                                                              Directed: edges can have a fixed direction
                                                              Attributed: nodes and edges can have multiple single-valued attributes
                                                              Multigraph: two nodes can be connected by multiple edges
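
The four properties above can be illustrated with a small in-memory model. This is only a sketch of the logical model, not DEX's API: the class `PropertyMultigraph`, its method names, and the sample type labels (`PEOPLE`, `MOVIE`, `CAST`) are all hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class PropertyMultigraph:
    """Toy in-memory model; every name here is illustrative, not DEX's API."""
    node_type: dict = field(default_factory=dict)   # labeled: node id -> type
    edge_type: dict = field(default_factory=dict)   # labeled: edge id -> type
    endpoints: dict = field(default_factory=dict)   # directed: edge id -> (tail, head)
    attrs: dict = field(default_factory=dict)       # attributed: (id, name) -> one value
    _ids: count = field(default_factory=lambda: count(1))

    def add_node(self, label, **attributes):
        oid = next(self._ids)
        self.node_type[oid] = label
        for name, value in attributes.items():
            self.attrs[(oid, name)] = value
        return oid

    def add_edge(self, label, tail, head, **attributes):
        oid = next(self._ids)
        self.edge_type[oid] = label
        self.endpoints[oid] = (tail, head)
        for name, value in attributes.items():
            self.attrs[(oid, name)] = value
        return oid

    def edges_between(self, tail, head):
        return [e for e, ends in self.endpoints.items() if ends == (tail, head)]


g = PropertyMultigraph()
actor = g.add_node("PEOPLE", name="Actor A")
movie = g.add_node("MOVIE", title="Movie M")
e1 = g.add_edge("CAST", actor, movie, role="lead")
e2 = g.add_edge("CAST", actor, movie, role="narrator")  # parallel edge: multigraph
```

Because every edge has its own identifier, two edges with the same endpoints and type remain distinct objects with their own attributes.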


                                                           Internal representation




                                                              Requirements

                                                                  Split the graph into smaller structures
                                                                    • Favour caching
                                                                    • Move only the significant parts to main memory

                                                                  OIDs instead of objects
                                                                    • Reduce memory requirements

                                                                  Specific structures to improve traversals
                                                                    • Index edges of a node

                                                                  Attributes fully indexed
                                                                    • Improve queries based on value filters


                                                               Internal representation




                                                              Our approach:
                                                                      Map + Bitmaps → Link
                                                              Link: bidirectional association between values and OIDs
                                                                      Two functionalities:
                                                                        • Given a value → a set of OIDs (a bitmap)
                                                                        • Given an OID → the value

                                                              [Figure: a Link built from a Map (OID → value) and one bitmap per value.
                                                               Example with OIDs 1 to 5: a → 1 0 0 0 1 (OIDs 1 and 5), b → 0 1 1 (OIDs 2 and 3),
                                                               c → 0 0 0 1 (OID 4)]
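
A Link as described on this slide can be sketched in a few lines. This is a minimal illustration under stated assumptions, not DEX code: the class and method names are hypothetical, a Python dict stands in for the Map, and a Python integer stands in for each bitmap (bit number `oid` is set when that OID holds the value).

```python
class Link:
    """Bidirectional value <-> OID association (toy sketch, not DEX code)."""

    def __init__(self):
        self.value_of = {}    # Map: OID -> value
        self.bitmap_of = {}   # value -> bitmap of OIDs (a Python int)

    def set(self, oid, value):
        if oid in self.value_of:
            # Clear the OID's bit in the bitmap of its previous value.
            self.bitmap_of[self.value_of[oid]] &= ~(1 << oid)
        self.value_of[oid] = value
        self.bitmap_of[value] = self.bitmap_of.get(value, 0) | (1 << oid)

    def oids(self, value):
        """Given a value, return the OIDs whose bit is set in its bitmap."""
        bits = self.bitmap_of.get(value, 0)
        return [i for i in range(bits.bit_length()) if (bits >> i) & 1]

    def value(self, oid):
        """Given an OID, return its value (None if unset)."""
        return self.value_of.get(oid)


# Same data as the figure: OIDs 1 and 5 hold a, 2 and 3 hold b, 4 holds c.
link = Link()
for oid, val in [(1, "a"), (2, "b"), (3, "b"), (4, "c"), (5, "a")]:
    link.set(oid, val)
```

Keeping both directions means a value filter is answered from the bitmaps, while reading an attribute of a known object is a single map lookup.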


                                                           Internal representation




                                                           A Graph as a combination of Bitmaps:
                                                            1 Bitmap for each node or edge type
                                                            1 Link for each attribute
                                                            2 Links for each edge type:
                                                                   Outgoing and incoming edges


                                                           N. Martínez-Bazán, V. Muntés-Mulero, S. Gómez-Villamor, J. Nin, M. A.
                                                               Sánchez-Martínez, and J. Larriba-Pey, Dex: high-performance exploration
                                                               on large graphs for information retrieval. In Proceedings of the sixteenth
                                                               ACM conference on Conference on information and knowledge
                                                               management (CIKM '07)
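
The layout on this slide (one bitmap per type, one Link per attribute, two Links per edge type) can be sketched as follows. Everything here is an assumption made for illustration: the class `BitmapGraph` and its methods are hypothetical, Python sets stand in for bitmaps, and nested dicts stand in for Links.

```python
from collections import defaultdict


class BitmapGraph:
    """Toy layout: sets stand in for bitmaps, dicts stand in for Links."""

    def __init__(self):
        # 1 bitmap per node or edge type: type name -> OIDs of that type
        self.type_bitmap = defaultdict(set)
        # 1 Link per attribute: attribute -> value -> OIDs
        self.attr_link = defaultdict(lambda: defaultdict(set))
        # 2 Links per edge type: tail node -> edge OIDs, head node -> edge OIDs
        self.tails = defaultdict(lambda: defaultdict(set))
        self.heads = defaultdict(lambda: defaultdict(set))
        # OID -> value direction of the two edge Links, merged for brevity
        self.ends = {}
        self.next_oid = 1

    def _new(self, type_name):
        oid = self.next_oid
        self.next_oid += 1
        self.type_bitmap[type_name].add(oid)
        return oid

    def add_node(self, type_name, **attrs):
        oid = self._new(type_name)
        for name, value in attrs.items():
            self.attr_link[name][value].add(oid)
        return oid

    def add_edge(self, type_name, tail, head):
        oid = self._new(type_name)
        self.tails[type_name][tail].add(oid)
        self.heads[type_name][head].add(oid)
        self.ends[oid] = (tail, head)
        return oid

    def select(self, type_name, attr, value):
        # A value filter is the intersection of two bitmaps.
        return self.type_bitmap[type_name] & self.attr_link[attr][value]

    def out_neighbors(self, node, edge_type):
        return {self.ends[e][1] for e in self.tails[edge_type][node]}


bg = BitmapGraph()
a = bg.add_node("PEOPLE", name="A")
m1 = bg.add_node("MOVIE", year=2009)
m2 = bg.add_node("MOVIE", year=2010)
c1 = bg.add_edge("CAST", a, m1)
c2 = bg.add_edge("CAST", a, m2)
```

In this sketch a selection and a 1-hop traversal both reduce to set operations, which is the role the bitmaps play in the slide's design.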


                                                           Software architecture




                                                              DEXCORE
                                                                  Complete C++ library
                                                                    • Storage and query
                                                                  Linux / Windows / Mac OS X
                                                                  32-bit / 64-bit
                                                              JDEX
                                                                  DEXCORE functionality provided as a Java library


                                                               Software architecture




                                                           DEXCORE:
                                                               IO:
                                                                   Segment: Logical space of pages
                                                                   Pool: Groups of segments
                                                                   Storage: I/O device
                                                                   Cache: I/O management
                                                                     • Replacement policy
                                                               Data:
                                                                   Paged out-of-core structures
                                                                   Bitmaps, Maps, Links, …
                                                               Graph:
                                                                   A combination of structures
                                                                   DbGraph and RGraphs
                                                               DEX:
                                                                   Database and Session management


                                                           Software architecture




                                                           Implementation details:
                                                            37-bit unsigned integer OIDs
                                                                  More than 137 billion objects per graph (2^37)
                                                              Bitmaps are compressed
                                                                  Clusters of 32 consecutive bits
                                                                  Only existing clusters are stored
                                                              Groups of OIDs for each type
                                                                  Higher density of consecutive bits in the bitmaps
                                                              Maps are B+ trees
                                                                  Compressed UTF-8 storage for Unicode strings
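
The bitmap compression described above (32-bit clusters, with only existing clusters stored) can be sketched like this. It is an illustration under assumptions: the class `SparseBitmap` is hypothetical, and a Python dict keyed by cluster index stands in for whatever paged structure DEX actually uses.

```python
CLUSTER_BITS = 32


class SparseBitmap:
    """Stores only non-empty 32-bit clusters, keyed by cluster index (toy sketch)."""

    def __init__(self):
        self.clusters = {}   # cluster index -> 32-bit word

    def add(self, oid):
        index, offset = divmod(oid, CLUSTER_BITS)
        self.clusters[index] = self.clusters.get(index, 0) | (1 << offset)

    def discard(self, oid):
        index, offset = divmod(oid, CLUSTER_BITS)
        word = self.clusters.get(index, 0) & ~(1 << offset)
        if word:
            self.clusters[index] = word
        else:
            self.clusters.pop(index, None)   # drop clusters that became empty

    def __contains__(self, oid):
        index, offset = divmod(oid, CLUSTER_BITS)
        return bool((self.clusters.get(index, 0) >> offset) & 1)

    def __iter__(self):
        for index in sorted(self.clusters):
            word = self.clusters[index]
            for offset in range(CLUSTER_BITS):
                if (word >> offset) & 1:
                    yield index * CLUSTER_BITS + offset


MAX_OIDS = 2 ** 37   # 137,438,953,472 distinct 37-bit identifiers

bm = SparseBitmap()
for oid in (3, 4, 5, 2 ** 36):
    bm.add(oid)
```

Here three nearby OIDs share one cluster and the distant OID needs one more, so two words are stored regardless of the gap between them. This is also why grouping the OIDs of each type close together pays off: denser runs of bits mean fewer clusters.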


                                                            Experimental results




                                                               Load tests

                                                                                    IMDB      Wikipedia    RMAT (sf=28)
                                                           DB size                  2.4 GB    7.6 GB       83 GB
                                                           Physical memory          9 GB      9 GB         60 GB
                                                           Load time                21 min    2 h 6 min    15 h
                                                           Nodes                    13 M      19 M         230 M
                                                           Edges                    22 M      180 M        2147 M
                                                           Values                   48 M      283 M        230 M
                                                           Insertions per second    65 K      62 K         48 K
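
The insertion rates in the table can be roughly cross-checked from the other rows. The sketch below assumes, without confirmation from the slides, that an "insertion" means one node, one edge, or one attribute value, so the rate is (nodes + edges + values) divided by the load time.

```python
# Figures copied from the load-test table (counts in millions, time in seconds).
loads = {
    "IMDB":      {"nodes": 13,  "edges": 22,   "values": 48,  "seconds": 21 * 60},
    "Wikipedia": {"nodes": 19,  "edges": 180,  "values": 283, "seconds": (2 * 60 + 6) * 60},
    "RMAT":      {"nodes": 230, "edges": 2147, "values": 230, "seconds": 15 * 3600},
}


def insertions_per_second(row):
    # Assumption: one insertion = one node, edge, or attribute value.
    objects = (row["nodes"] + row["edges"] + row["values"]) * 1_000_000
    return objects / row["seconds"]


# Rates in thousands of insertions per second, rounded to the nearest integer.
rates = {name: round(insertions_per_second(row) / 1000) for name, row in loads.items()}
```

Under that assumption the sketch gives about 66 K, 64 K, and 48 K insertions per second, within a few percent of the reported 65 K, 62 K, and 48 K; the small differences are consistent with the table's inputs being rounded.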


                                                               Experimental results




                                                              Query tests
                                                                   IMDB database (2.4 GB)
                                                                   Queries:
                                                                     • A: full extraction of a movie, multiple 1-hop traversals [4K edges]
                                                                     • B: distance between two actors [8K edges]
                                                                     • C: extract all movies that match a given pattern [315K edges]

                                                                       Query    In-memory    128 MB
                                                                       A        0.13 sec     0.13 sec
                                                                       B        1.52 sec     1.79 sec
                                                                       C        384 sec      385 sec
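
Query B asks for the distance between two actors. The slides do not say which algorithm DEX uses; a breadth-first search is one common way to compute such a distance, sketched here on a hypothetical actor/movie graph stored as a plain adjacency map.

```python
from collections import deque


def distance(adjacency, source, target):
    """Number of hops on a shortest path, or None if target is unreachable."""
    if source == target:
        return 0
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, hops = frontier.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor == target:
                return hops + 1
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, hops + 1))
    return None


# Hypothetical data: actors are linked through the movies they appear in.
adjacency = {
    "actor1": ["movieA"],
    "movieA": ["actor1", "actor2"],
    "actor2": ["movieA", "movieB"],
    "movieB": ["actor2", "actor3"],
    "actor3": ["movieB"],
    "actor4": [],
}
```

Each level of the search is a batch of 1-hop traversals from the current frontier, the same primitive that query A exercises.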


                                                               Conclusions




                                                              We propose DEX, a high-performance graph database
                                                               querying system for labeled, directed, attributed
                                                               multigraphs

                                                              We propose a graph representation based on the
                                                               intensive use of bitmaps

                                                              We perform an experimental performance analysis to
                                                               show the ability of DEX to store and query very large
                                                               graphs


                                                               Future work




                                                              Trillions of objects
                                                              Transactional system
                                                              Distributed system
                                                              Query language
                                                                   Query optimization
                                                              High-level graph operations
                                                                   Pattern matching


                                                           Questions?




                                                                         sgomez@sparsity-technologies.com

                                                              Sparsity Technologies http://www.sparsity-technologies.com

                                                                        DAMA-UPC http://www.dama.upc.edu

Dex

  • 1. DEX: a High-Performance Graph Database Management System A High-Performance Graph Database Management System Authors: Norbert Martínez-Bazan Sergio Gómez-Villamor Francesc Escalé-Claveras
  • 2. DEX: a High-Performance Graph Database Management System Outline Nom e la presenatació o altra info (opcional)  Introduction  DEX  Logical graph model  Internal representation  Software architecture  Experimental results  Conclusions  Future work
  • 3. DEX: a High-Performance Graph Database Management System Introduction Nom e la presenatació o altra info (opcional)  [2006] DEX started by DAMA-UPC  [2010] Sparsity Technologies is a spin- out from DAMA-UPC  Sparsity comercializes and provides services  DAMA-UPC does development and research  DEX Versions  V2.0 March/2009  V3.0 October/2009  V4.0 November/2010
  • 4. DEX: a High-Performance Graph Database Management System DEX Nom e la presenatació o altra info (opcional)  DEX is a graph database:  Data and schema both are represented as a graph  Data operations are based on graph operations  Graph-based integrity restrictions Renzo Angles and Claudio Gutierrez. 2008. Survey of graph database models. ACM Comput. Surv. 40, 1, Article 1 (February 2008)  Focus:  Management of very large graphs  High-performance on query operations
  • 5. Logical graph model
      - Labeled: nodes and edges are "typed"
      - Directed: edges can have a fixed direction
      - Attributed: nodes and edges can have multiple single-valued attributes
      - Multigraph: two nodes can be connected by multiple edges
  • 6. Internal representation
      - Requirements
          - Split the graph into smaller structures
              - Favour caching
              - Move only the significant parts to main memory
          - OIDs instead of objects
              - Reduce memory requirements
          - Specific structures to improve traversals
              - Index the edges of a node
          - Attributes fully indexed
              - Improve queries based on value filters
  • 7. Internal representation
      - Our approach:
          - Map + Bitmaps → Link
          - Link: bidirectional association between values and OIDs
          - Two functionalities:
              - Given a value → a set of OIDs (a bitmap)
              - Given an OID → the value
      [Figure: a Link built from a Map and Bitmaps, shown for values a, b, c over OIDs 1 to 5]
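The two Link functionalities can be illustrated with a minimal Python sketch. This is not DEX code: the class name `Link`, its method names, and the use of a Python integer as an uncompressed bitmap are all assumptions made for illustration.

```python
class Link:
    """Toy bidirectional association between values and OIDs (hypothetical API)."""

    def __init__(self):
        self._bitmap_by_value = {}  # value -> int used as a bitmap of OIDs
        self._value_by_oid = {}     # OID -> value

    def set(self, oid, value):
        # An OID holds a single value, so clear its bit in the old value's bitmap.
        if oid in self._value_by_oid:
            old = self._value_by_oid[oid]
            self._bitmap_by_value[old] &= ~(1 << oid)
        self._value_by_oid[oid] = value
        self._bitmap_by_value[value] = self._bitmap_by_value.get(value, 0) | (1 << oid)

    def value_of(self, oid):
        """Given an OID, return its value (or None if unset)."""
        return self._value_by_oid.get(oid)

    def oids_of(self, value):
        """Given a value, return the set of OIDs whose bit is set in its bitmap."""
        bitmap = self._bitmap_by_value.get(value, 0)
        return {i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1}


link = Link()
link.set(1, "a")
link.set(2, "a")
link.set(3, "b")
link.set(2, "b")  # reassigning OID 2 moves its bit from "a" to "b"
```

Keeping both directions in one structure is what lets a value filter (value to OIDs) and an attribute lookup (OID to value) each run as a single access.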
  • 8. Internal representation
      - A Graph as a combination of Bitmaps:
          - 1 Bitmap for each node or edge type
          - 1 Link for each attribute
          - 2 Links for each edge type:
              - Out-going and in-going edges
      N. Martínez-Bazán, V. Muntés-Mulero, S. Gómez-Villamor, J. Nin, M. A. Sánchez-Martínez, and J. Larriba-Pey. Dex: high-performance exploration on large graphs for information retrieval. In Proceedings of the sixteenth ACM conference on Conference on information and knowledge management (CIKM '07).
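As a rough illustration of how such a combination can answer a 1-hop traversal, the sketch below keeps one bitmap per type and two edge indexes per edge type. The class `ToyGraph`, its fields, and the type names are hypothetical; Python integers and dicts stand in for the bitmaps and maps, and nothing here is taken from the DEX source.

```python
def bits(bitmap):
    """Yield the positions of the set bits of an int used as a bitmap."""
    pos = 0
    while bitmap:
        if bitmap & 1:
            yield pos
        bitmap >>= 1
        pos += 1


class ToyGraph:
    def __init__(self):
        self.next_oid = 1
        self.type_bitmap = {}  # type name -> bitmap of the OIDs of that type
        self.out_edges = {}    # edge type -> {tail node OID -> bitmap of edge OIDs}
        self.in_edges = {}     # edge type -> {head node OID -> bitmap of edge OIDs}
        self.head_of = {}      # edge OID -> head node OID
        self.tail_of = {}      # edge OID -> tail node OID

    def _new_oid(self, type_name):
        oid = self.next_oid
        self.next_oid += 1
        self.type_bitmap[type_name] = self.type_bitmap.get(type_name, 0) | (1 << oid)
        return oid

    def add_node(self, node_type):
        return self._new_oid(node_type)

    def add_edge(self, edge_type, tail, head):
        edge = self._new_oid(edge_type)
        out = self.out_edges.setdefault(edge_type, {})
        inc = self.in_edges.setdefault(edge_type, {})
        out[tail] = out.get(tail, 0) | (1 << edge)
        inc[head] = inc.get(head, 0) | (1 << edge)
        self.tail_of[edge] = tail
        self.head_of[edge] = head
        return edge

    def neighbors(self, node, edge_type):
        """Nodes reached from `node` over out-going edges of `edge_type`."""
        edges = self.out_edges.get(edge_type, {}).get(node, 0)
        return {self.head_of[e] for e in bits(edges)}


g = ToyGraph()
actor = g.add_node("actor")
m1 = g.add_node("movie")
m2 = g.add_node("movie")
e1 = g.add_edge("acts_in", actor, m1)
e2 = g.add_edge("acts_in", actor, m2)
e3 = g.add_edge("acts_in", actor, m1)  # multigraph: a second edge between the same pair
```

Because every edge has its own OID, parallel edges between the same two nodes are simply extra bits in the same bitmap.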
  • 9. Software architecture
      - DEXCORE
          - Complete C++ library
              - Storage and query
          - Linux / Windows / MacOSX
          - 32 / 64 bits
      - JDEX
          - DEXCORE functionality provided as a Java library
  • 10. Software architecture
      - DEXCORE:
          - IO
              - Segment: logical space of pages
              - Pool: groups of segments
              - Storage: I/O device
              - Cache: I/O management (replacement policy)
          - Data:
              - Paged out-of-core structures
              - Bitmaps, Maps, Links, …
          - Graph:
              - A combination of structures
              - DbGraph and RGraphs
          - DEX:
              - Database and Session management
  • 11. Software architecture
      - Implementation details:
          - 37-bit unsigned integer OIDs
              - More than 137 billion objects per graph
          - Bitmaps are compressed
              - Clusters of 32 consecutive bits
              - Only existing clusters are stored
          - Groups of OIDs for each type
              - Higher density of consecutive bits in bitmaps
          - Maps are B+ trees
          - A compressed UTF-8 storage for Unicode strings
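The OID range and the cluster idea can be checked with a short Python sketch. The 37-bit width and the 32-bit cluster size come from the slide; the class `ClusteredBitmap` and its dict of clusters are assumptions for illustration only, since the slide does not say how clusters are indexed.

```python
OID_BITS = 37
CLUSTER_BITS = 32

# 2**37 distinct OIDs, which is where "more than 137 billion" comes from.
max_oids = 2 ** OID_BITS


class ClusteredBitmap:
    """Sparse bitmap that stores only the 32-bit clusters containing a set bit."""

    def __init__(self):
        self.clusters = {}  # cluster index -> 32-bit word; all-zero clusters are absent

    def set(self, oid):
        idx, off = divmod(oid, CLUSTER_BITS)
        self.clusters[idx] = self.clusters.get(idx, 0) | (1 << off)

    def clear(self, oid):
        idx, off = divmod(oid, CLUSTER_BITS)
        word = self.clusters.get(idx, 0) & ~(1 << off)
        if word:
            self.clusters[idx] = word
        else:
            self.clusters.pop(idx, None)  # drop a cluster once it becomes empty

    def test(self, oid):
        idx, off = divmod(oid, CLUSTER_BITS)
        return bool((self.clusters.get(idx, 0) >> off) & 1)


bm = ClusteredBitmap()
for oid in (5, 6, 33, 10**9):
    bm.set(oid)
stored_before = len(bm.clusters)  # OIDs 5 and 6 share one cluster
bm.clear(33)
stored_after = len(bm.clusters)
```

Grouping the OIDs of one type into consecutive ranges, as the slide describes, means more set bits fall into the same cluster, so fewer clusters need to be stored.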
  • 12. Experimental results
      - Load tests

                              IMDB      Wikipedia   RMAT (sf=28)
        DB                    2.4 GB    7.6 GB      83 GB
        Physical Mem          9 GB      9 GB        60 GB
        Load time             21 min    2h 6min     15h
        Nodes                 13 M      19 M        230 M
        Edges                 22 M      180 M       2147 M
        Values                48 M      283 M       230 M
        Insertions per sec    65 K      62 K        48 K
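The slide does not say how the insertion rate is computed. One plausible reading, which is an assumption here, is (nodes + edges + values) divided by load time. The Python sketch below recomputes that ratio from the rounded table figures; under this reading the results land within about 2 K of the reported rates.

```python
# (nodes, edges, values, load time in seconds), taken from the rounded table figures
loads = {
    "IMDB": (13e6, 22e6, 48e6, 21 * 60),
    "Wikipedia": (19e6, 180e6, 283e6, (2 * 60 + 6) * 60),
    "RMAT (sf=28)": (230e6, 2147e6, 230e6, 15 * 3600),
}

# Assumed definition: every node, edge, and attribute value counts as one insertion.
rates = {
    name: (nodes + edges + values) / seconds
    for name, (nodes, edges, values, seconds) in loads.items()
}

for name, rate in rates.items():
    print(f"{name}: about {rate / 1e3:.0f} K insertions per second")
```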
  • 13. Experimental results
      - Query tests
          - IMDB database (2.4 GB)
          - Queries:
              - A: full extraction of a movie, multiple 1-hop traversal [4K edges]
              - B: distance between two actors [8K edges]
              - C: extract all movies that match a given pattern [315K edges]

        Query    In-memory    128 MB
        A        0.13 sec     0.13 sec
        B        1.52 sec     1.79 sec
        C        384 sec      385 sec
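Query B asks for the distance between two actors. The slides do not describe how DEX evaluates it; a standard way to get an unweighted distance is breadth-first search, sketched below in Python. The function `distance`, the adjacency dict, and the node names are all hypothetical.

```python
from collections import deque


def distance(adjacency, source, target):
    """Number of hops on a shortest path from source to target, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Tiny bipartite actor/movie graph; each hop crosses one "acts in" edge.
adjacency = {
    "actor_a": ["movie_1"],
    "actor_b": ["movie_1", "movie_2"],
    "actor_c": ["movie_2"],
    "actor_d": [],
    "movie_1": ["actor_a", "actor_b"],
    "movie_2": ["actor_b", "actor_c"],
}
hops = distance(adjacency, "actor_a", "actor_c")
```

In an actor/movie graph each pair of hops corresponds to one shared movie, so the hop count is twice the usual co-star distance.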
  • 14. Conclusions
      - We propose DEX, a high-performance graph database querying system for labeled and directed attributed multigraphs
      - We propose a graph representation based on the intensive use of bitmaps
      - We perform an experimental performance analysis to show the ability of DEX to store and query very large graphs
  • 15. Future work
      - Trillions of objects
      - Transactional system
      - Distributed system
      - Query language
          - Query optimization
      - High-level graph operations
          - Pattern matching
  • 16. Questions?
      - sgomez@sparsity-technologies.com
      - Sparsity Technologies: http://www.sparsity-technologies.com
      - DAMA-UPC: http://www.dama.upc.edu