DEX: a High-Performance Graph Database Management System




Authors:
  Norbert Martínez-Bazan
  Sergio Gómez-Villamor
  Francesc Escalé-Claveras


Outline
  • Introduction
  • DEX
      • Logical graph model
      • Internal representation
  • Software architecture
  • Experimental results
  • Conclusions
  • Future work


Introduction
  • [2006] DEX started by DAMA-UPC
  • [2010] Sparsity Technologies is a spin-out from DAMA-UPC
      • Sparsity commercializes DEX and provides services
      • DAMA-UPC does development and research
  • DEX versions
      • V2.0   March 2009
      • V3.0   October 2009
      • V4.0   November 2010


DEX
  • DEX is a graph database:
      • Both data and schema are represented as a graph
      • Data operations are based on graph operations
      • Graph-based integrity constraints
    Renzo Angles and Claudio Gutierrez. 2008. Survey of graph database models. ACM Comput. Surv. 40, 1, Article 1 (February 2008).
  • Focus:
      • Management of very large graphs
      • High performance on query operations


Logical graph model
  • Labeled: nodes and edges are “typed”
  • Directed: edges can have a fixed direction
  • Attributed: nodes and edges can have multiple single-valued attributes
  • Multigraph: two nodes can be connected by multiple edges
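The four properties above can be illustrated with a small in-memory sketch. This is not the DEX API: the class names (Node, Edge, Multigraph), the method names and the sample labels are all hypothetical, chosen only to show what a labeled, directed, attributed multigraph has to hold.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Node:
    oid: int
    label: str                                            # labeled: every node has a type
    attrs: Dict[str, Any] = field(default_factory=dict)   # attributed: one value per attribute


@dataclass
class Edge:
    oid: int
    label: str                                            # labeled: every edge has a type
    tail: int
    head: int
    directed: bool = True                                 # directed: direction may be fixed or not
    attrs: Dict[str, Any] = field(default_factory=dict)


class Multigraph:
    """Hypothetical container; edges are stored by OID, so parallel edges are allowed."""

    def __init__(self) -> None:
        self.nodes: Dict[int, Node] = {}
        self.edges: Dict[int, Edge] = {}
        self._next_oid = 1

    def _new_oid(self) -> int:
        oid = self._next_oid
        self._next_oid += 1
        return oid

    def add_node(self, label: str, **attrs: Any) -> int:
        oid = self._new_oid()
        self.nodes[oid] = Node(oid, label, dict(attrs))
        return oid

    def add_edge(self, label: str, tail: int, head: int,
                 directed: bool = True, **attrs: Any) -> int:
        oid = self._new_oid()
        self.edges[oid] = Edge(oid, label, tail, head, directed, dict(attrs))
        return oid

    def edges_between(self, a: int, b: int) -> List[Edge]:
        """Edges that lead from a to b; undirected edges match in both directions."""
        found = []
        for e in self.edges.values():
            forward = e.tail == a and e.head == b
            backward = not e.directed and e.tail == b and e.head == a
            if forward or backward:
                found.append(e)
        return found


g = Multigraph()
person = g.add_node("PERSON", name="P1")
movie = g.add_node("MOVIE", title="M1", year=2001)
g.add_edge("ACTS_IN", person, movie, role="R1")       # two parallel directed edges ...
g.add_edge("DIRECTS", person, movie)
g.add_edge("RELATED", movie, person, directed=False)  # ... and one undirected edge
print(len(g.edges_between(person, movie)))
```

The linear scan in edges_between is only for illustration; the slides that follow describe index structures that avoid it.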


Internal representation
  • Requirements
      • Split the graph into smaller structures
          • Favour caching
          • Move only the significant parts to main memory
      • Use OIDs instead of objects
          • Reduce memory requirements
      • Specific structures to improve traversals
          • Index the edges of a node
      • Attributes fully indexed
          • Improve queries based on value filters


Internal representation
  • Our approach:
      • Map + Bitmaps → Link
  • Link: bidirectional association between values and OIDs
      • Two functionalities:
          • Given a value → a set of OIDs (a bitmap)
          • Given an OID → the value

  [Figure: a Link built from a Map (OID → value, OIDs 1 to 5) and one Bitmap per value]
      value   bitmap (bit positions are OIDs, starting at 1)
      a       1 0 0 0 1
      b       0 1 1
      c       0 0 0 1
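A minimal sketch of the Link idea, using the values from the figure. It assumes a Python dict for the Map side and plain Python integers as uncompressed bitsets for the Bitmap side; the class and method names are hypothetical and do not come from DEX.

```python
class Link:
    """Bidirectional association between values and OIDs (illustrative only)."""

    def __init__(self):
        self._value_of = {}    # Map side: OID -> value
        self._bitmap_of = {}   # Bitmap side: value -> int whose bit i is set if OID i has the value

    def set(self, oid, value):
        # An OID holds a single value, so clear its bit in the old value's bitmap first.
        if oid in self._value_of:
            old = self._value_of[oid]
            self._bitmap_of[old] &= ~(1 << oid)
            if self._bitmap_of[old] == 0:
                del self._bitmap_of[old]
        self._value_of[oid] = value
        self._bitmap_of[value] = self._bitmap_of.get(value, 0) | (1 << oid)

    def value(self, oid):
        """Given an OID, return its value."""
        return self._value_of[oid]

    def oids(self, value):
        """Given a value, return the OIDs stored in its bitmap, in ascending order."""
        bits = self._bitmap_of.get(value, 0)
        return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


# Same content as the figure: a -> {1, 5}, b -> {2, 3}, c -> {4}
link = Link()
for oid, value in [(1, "a"), (2, "b"), (3, "b"), (4, "c"), (5, "a")]:
    link.set(oid, value)

print(link.oids("a"), link.oids("b"), link.oids("c"))
print(link.value(3))
```

Keeping both directions in one structure means a value filter (value → OIDs) and an attribute lookup (OID → value) are each a single access.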


Internal representation
  • A graph as a combination of Bitmaps:
      • 1 Bitmap for each node or edge type
      • 1 Link for each attribute
      • 2 Links for each edge type:
          • Outgoing and incoming edges
    N. Martínez-Bazán, V. Muntés-Mulero, S. Gómez-Villamor, J. Nin, M. A. Sánchez-Martínez, and J. Larriba-Pey. DEX: high-performance exploration on large graphs for information retrieval. In Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management (CIKM '07).
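The combination listed on this slide can be sketched as follows. The assumptions are mine, not the authors': Python integers stand in for bitmaps, and the two Links per edge type are modelled as "edge OID ↔ tail node" and "edge OID ↔ head node". All names (BitmapGraph, tail_link, head_link, the sample types) are hypothetical.

```python
from collections import defaultdict


class Link:
    """Value <-> OID association: a dict for OID -> value, an int bitset per value."""

    def __init__(self):
        self.value_of = {}
        self.bitmap_of = defaultdict(int)

    def set(self, oid, value):
        self.value_of[oid] = value
        self.bitmap_of[value] |= 1 << oid


def oids(bitmap):
    """Expand an int bitset into the ascending list of OIDs it contains."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


class BitmapGraph:
    def __init__(self):
        self.type_bitmap = defaultdict(int)   # 1 bitmap for each node or edge type
        self.attr_link = defaultdict(Link)    # 1 Link for each attribute
        self.tail_link = defaultdict(Link)    # per edge type: edge OID <-> tail node OID
        self.head_link = defaultdict(Link)    # per edge type: edge OID <-> head node OID
        self._next_oid = 1

    def _new_oid(self, type_name):
        oid = self._next_oid
        self._next_oid += 1
        self.type_bitmap[type_name] |= 1 << oid
        return oid

    def add_node(self, type_name, **attrs):
        oid = self._new_oid(type_name)
        for name, value in attrs.items():
            self.attr_link[name].set(oid, value)
        return oid

    def add_edge(self, type_name, tail, head):
        oid = self._new_oid(type_name)
        self.tail_link[type_name].set(oid, tail)
        self.head_link[type_name].set(oid, head)
        return oid

    def select(self, type_name, attr, value):
        """Value filter: AND the type bitmap with the attribute's bitmap for the value."""
        return oids(self.type_bitmap[type_name] & self.attr_link[attr].bitmap_of[value])

    def out_neighbors(self, node, edge_type):
        """Edges whose tail is `node` come from one bitmap; each head is one Map lookup."""
        edges = self.tail_link[edge_type].bitmap_of[node]
        return [self.head_link[edge_type].value_of[e] for e in oids(edges)]

    def in_neighbors(self, node, edge_type):
        edges = self.head_link[edge_type].bitmap_of[node]
        return [self.tail_link[edge_type].value_of[e] for e in oids(edges)]


g = BitmapGraph()
m = g.add_node("MOVIE", title="M1")
p1 = g.add_node("PERSON", name="P1")
p2 = g.add_node("PERSON", name="P2")
g.add_edge("CAST", m, p1)
g.add_edge("CAST", m, p2)

print(g.select("MOVIE", "title", "M1"))
print(g.out_neighbors(m, "CAST"))
print(g.in_neighbors(p1, "CAST"))
```

With this layout, a 1-hop traversal touches one bitmap plus one lookup per edge, and a type-plus-value filter reduces to a bitwise AND.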


Software architecture
  • DEXCORE
      • Complete C++ library
          • Storage and query
      • Linux / Windows / Mac OS X
      • 32 / 64 bits
  • JDEX
      • DEXCORE functionality provided as a Java library


Software architecture
  DEXCORE:
  • IO:
      • Segment: logical space of pages
      • Pool: groups of segments
      • Storage: I/O device
      • Cache: I/O management
          • Replacement policy
  • Data:
      • Paged out-of-core structures
      • Bitmaps, Maps, Links, …
  • Graph:
      • A combination of structures
      • DbGraph and RGraphs
  • DEX:
      • Database and Session management


Software architecture
  Implementation details:
  • 37-bit unsigned integer OIDs
      • More than 137 billion objects per graph
  • Bitmaps are compressed
      • Clusters of 32 consecutive bits
      • Only existing clusters are stored
  • OIDs are grouped by type
      • Higher density of consecutive bits in the bitmaps
  • Maps are B+ trees
      • Compressed UTF-8 storage for Unicode strings
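The OID range and the cluster-based compression can be sketched as below. The slide does not describe the actual on-disk layout, so this assumes the simplest reading: a bitmap is a mapping from cluster index to a 32-bit word, and all-zero clusters are simply absent. SparseBitmap and its methods are hypothetical names.

```python
OID_BITS = 37
CLUSTER_BITS = 32
MAX_OIDS = 1 << OID_BITS   # 137,438,953,472 distinct OIDs, i.e. "more than 137 billion"


class SparseBitmap:
    """Bitmap stored as clusters of 32 consecutive bits; empty clusters take no space."""

    def __init__(self):
        self._clusters = {}   # cluster index -> 32-bit word (never zero)

    def add(self, oid):
        if not 0 <= oid < MAX_OIDS:
            raise ValueError("OID does not fit in 37 bits")
        index, bit = divmod(oid, CLUSTER_BITS)
        self._clusters[index] = self._clusters.get(index, 0) | (1 << bit)

    def discard(self, oid):
        index, bit = divmod(oid, CLUSTER_BITS)
        word = self._clusters.get(index, 0) & ~(1 << bit)
        if word:
            self._clusters[index] = word
        else:
            self._clusters.pop(index, None)   # drop clusters that became empty

    def __contains__(self, oid):
        index, bit = divmod(oid, CLUSTER_BITS)
        return bool((self._clusters.get(index, 0) >> bit) & 1)

    def stored_clusters(self):
        return len(self._clusters)


# 32 consecutive OIDs (as when the OIDs of one type are grouped) share a single cluster ...
dense = SparseBitmap()
for oid in range(1024, 1056):
    dense.add(oid)

# ... while 32 OIDs spread 32 apart need one cluster each.
scattered = SparseBitmap()
for oid in range(1024, 1024 + 32 * 32, 32):
    scattered.add(oid)

print(MAX_OIDS, dense.stored_clusters(), scattered.stored_clusters())
```

The dense and scattered cases hold the same number of set bits, which is why grouping the OIDs of each type into consecutive ranges pays off under this kind of compression.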


Experimental results
  Load tests

                          IMDB      Wikipedia   RMAT (sf=28)
    Database size         2.4 GB    7.6 GB      83 GB
    Physical memory       9 GB      9 GB        60 GB
    Load time             21 min    2 h 6 min   15 h
    Nodes                 13 M      19 M        230 M
    Edges                 22 M      180 M       2147 M
    Values                48 M      283 M       230 M
    Insertions per sec    65 K      62 K        48 K
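The slide does not say how "Insertions per sec" is computed. One plausible reading, treated here purely as an assumption, is that every node, edge and attribute value counts as one insertion and the total is divided by the load time. The check below uses only the rounded figures from the table.

```python
# Figures copied from the load-test table (rounded, as printed on the slide).
datasets = {
    "IMDB":         {"nodes": 13e6,  "edges": 22e6,   "values": 48e6,  "seconds": 21 * 60},
    "Wikipedia":    {"nodes": 19e6,  "edges": 180e6,  "values": 283e6, "seconds": (2 * 60 + 6) * 60},
    "RMAT (sf=28)": {"nodes": 230e6, "edges": 2147e6, "values": 230e6, "seconds": 15 * 3600},
}


def insertions_per_second(d):
    # Assumption: one insertion per node, per edge and per attribute value.
    return (d["nodes"] + d["edges"] + d["values"]) / d["seconds"]


rates = {name: insertions_per_second(d) for name, d in datasets.items()}
for name, rate in rates.items():
    print(f"{name}: {rate / 1e3:.1f} K insertions/sec")
```

Under that assumption the results land within a few percent of the reported 65 K, 62 K and 48 K, a gap that the rounding of the table entries could explain.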


Experimental results
  Query tests
    • IMDB database (2.4 GB)
    • Queries:
        • A: full extraction of a movie, multiple 1-hop traversals [4K edges]
        • B: distance between two actors [8K edges]
        • C: extract all movies that match a given pattern [315K edges]

             In-memory    128 MB
    A        0.13 sec     0.13 sec
    B        1.52 sec     1.79 sec
    C        384 sec      385 sec
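The slides do not show how query B is implemented. As an illustration only, a distance query of this kind can be written as a breadth-first search that expands a whole frontier per hop; Python sets stand in here for the bitmaps, and the toy actor/movie graph and the function name are hypothetical.

```python
def distance(neighbors, source, target):
    """Number of hops on a shortest path from source to target, or None if unreachable."""
    if source == target:
        return 0
    visited = {source}
    frontier = {source}
    hops = 0
    while frontier:
        hops += 1
        next_frontier = set()
        for node in frontier:
            next_frontier |= neighbors.get(node, set())   # union of the 1-hop neighborhoods
        next_frontier -= visited                          # keep only newly reached nodes
        if target in next_frontier:
            return hops
        visited |= next_frontier
        frontier = next_frontier
    return None


# Toy bipartite graph: actors a1..a4, movies m1 and m2 (a4 appears in no movie).
neighbors = {
    "a1": {"m1"},
    "a2": {"m1", "m2"},
    "a3": {"m2"},
    "a4": set(),
    "m1": {"a1", "a2"},
    "m2": {"a2", "a3"},
}

print(distance(neighbors, "a1", "a3"))
print(distance(neighbors, "a1", "a4"))
```

Each hop is a union followed by a difference over sets of OIDs, which are the kinds of operations a bitmap-based representation supports directly.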


Conclusions
  • We propose DEX, a high-performance graph database querying system for labeled, directed, attributed multigraphs
  • We propose a graph representation based on the intensive use of bitmaps
  • We perform an experimental performance analysis to show the ability of DEX to store and query very large graphs


Future work
  • Trillions of objects
  • Transactional system
  • Distributed system
  • Query language
      • Query optimization
  • High-level graph operations
      • Pattern matching


Questions?
  sgomez@sparsity-technologies.com
  Sparsity Technologies: http://www.sparsity-technologies.com
  DAMA-UPC: http://www.dama.upc.edu
