Graph databases, the Web of Data
        storage engines


        Pere Urbón Bayes
          Senior Software Engineer
                Independent

             purbon@purbon.com
             purbon.com
             in/purbon
             @purbon

          February 2011
Graph databases, the Web of Data
        storage engines
●   We are going to talk about
       –   Graph databases, facts and definitions.
       –   Graph database vendors.
       –   Use cases and applications, graph theory.




                    The web of data storage engines - DataDevRoom - Fosdem 2011   2
Graph databases, the Web of Data
        storage engines

“A graph database is a database that uses graph
 structures with nodes, edges, and properties to
         represent and store information.

  General graph databases that can store any
   graph are distinct from specialized graph
  databases such as triple stores and network
                  databases.”
                                                                             Wikipedia


Graph Database
                          Property graph

●   Abstractions
          –   Nodes
          –   Relationships
          –   Properties on both.
    Example: John Smith liked http://www.example.com at 01/10/11
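To make the three abstractions concrete, here is a minimal Python sketch of a toy in-memory property graph that stores the slide's example: two nodes ("John Smith" and the URL) joined by a LIKED relationship whose date is a property of the relationship itself. The class names, the LIKED type and the property keys are hypothetical choices for illustration, not any vendor's API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(eq=False)  # identity semantics: two nodes are never "equal" by value
class Node:
    node_id: int
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass(eq=False)
class Relationship:
    start: Node
    end: Node
    rel_type: str
    properties: Dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    """Toy in-memory property graph: nodes and relationships both carry properties."""

    def __init__(self) -> None:
        self.nodes: List[Node] = []
        self.relationships: List[Relationship] = []

    def add_node(self, **properties: Any) -> Node:
        node = Node(len(self.nodes), dict(properties))
        self.nodes.append(node)
        return node

    def relate(self, start: Node, end: Node, rel_type: str, **properties: Any) -> Relationship:
        rel = Relationship(start, end, rel_type, dict(properties))
        self.relationships.append(rel)
        return rel

    def outgoing(self, node: Node, rel_type: str) -> List[Relationship]:
        # Linear scan for brevity; graph engines keep per-node adjacency instead.
        return [r for r in self.relationships if r.start is node and r.rel_type == rel_type]


# The slide's example: John Smith liked http://www.example.com at 01/10/11.
graph = PropertyGraph()
john = graph.add_node(name="John Smith")
page = graph.add_node(url="http://www.example.com")
graph.relate(john, page, "LIKED", at="01/10/11")

for rel in graph.outgoing(john, "LIKED"):
    print(rel.start.properties["name"], "liked", rel.end.properties["url"], "at", rel.properties["at"])
# prints: John Smith liked http://www.example.com at 01/10/11
```

The point of the design is that the date lives on the relationship, not on either node, which is exactly what "properties on both" allows.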




Graph databases
                                         Facts

[Figure: connectivity across the decades (1990s, 2010s, 2020s), rising from text files through blogs, social networks, tagging and folksonomies to RDF, ontologies and linked data, towards "everything connected".]

Graph databases
                                       Facts

[Figure: "Size of ..." plotted across the decades 1990s, 2010s and 2020s.]
Source: http://www.guardian.co.uk/business/2009/may/18/digital-content-expansion
Graph databases
                                 Facts

[Figure: performance plotted against data structure, from lists through graph-like structures (semantic web, semantic reasoning, linked data) to unstructured data, annotated "performance slowdown".]

Graph databases
                      Performance comparison

Query             RDBMS       OIM         GraphDB
Q1: count         20.38       17.35       0
Q2: projection    17.34       43.7        33.19
Q3: scan          32.76       174.64      3.14
Q4: values        12.28       20.77       0.01
Q5: select        7.34        5.43        0.84
Q6: hubs          >3 hours    >3 hours    624.68

                  RDBMS       OIM         GraphDB
Data size         27.36 GB    54 GB       9.69 GB
Overhead          10.9        21.51       3.86
Load time         52891 s     17543 s     95579 s


Graph databases
                             Vendors

●   Neo4j (neo4j.org)

●   Embedded, disk-based, fully transactional
    Java persistence engine that stores data
    structured in graphs rather than in tables.
●   Dual-licensed: AGPL and commercial.
●   High availability, scalability, concurrency, etc.

Graph databases
                              Vendors

●   InfiniteGraph



●   A distributed, scalable, high-performance
    commercial graph database written in Java,
    built on the experience of Objectivity Inc.
●   More info: http://www.infinitegraph.com/
Graph databases
                           Vendors

●   OrientDB

●   A fast, transactional, scalable document-graph
    storage engine, written in pure Java and embeddable.
●   Schema-free, ACID, support for SQL and JSON.
●   Apache License 2.0
●   More info: http://www.orientechnologies.com/

Graph databases
                     More Vendors

●   DEX: The high-performance graph database.
●   HyperGraphDB: An AI and semantic web graph
    database.
●   InfoGrid: The Internet graph database.
●   Sones: SaaS .NET graph database.
●   AllegroGraph: The semantic graph database.
●   VertexDB: High-performance database server.

Graph Theory
                            Analytics

●   Clustering                            ●   Task planning
    (Communities)                         ●   Scheduling
●   Social connections                    ●   Process assignment
●   Hubs                                  ●   Routing
●   Graph Mining                          ●   Logistics
●   Centrality measures                   ●   League planning
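Several of these analytics reduce to simple traversals once the graph is held as adjacency sets. The Python sketch below shows three of them on a hypothetical friendship graph: hubs (the best-connected vertices), degree centrality (the simplest centrality measure), and connected components. Connected components are used here only as the crudest stand-in for community detection; real clustering methods (for example label propagation or modularity-based algorithms) are more involved. All names and data are assumptions for illustration.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple

Adjacency = Dict[str, Set[str]]


def adjacency(edges: List[Tuple[str, str]]) -> Adjacency:
    """Build undirected adjacency sets from an edge list."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)


def degree_centrality(adj: Adjacency) -> Dict[str, float]:
    """Degree divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    if n <= 1:
        return {v: 0.0 for v in adj}
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def hubs(adj: Adjacency, k: int) -> List[str]:
    """The k best-connected vertices (ties broken alphabetically)."""
    return sorted(adj, key=lambda v: (-len(adj[v]), v))[:k]


def connected_components(adj: Adjacency) -> List[Set[str]]:
    """Breadth-first search from every vertex not yet visited."""
    seen: Set[str] = set()
    components: List[Set[str]] = []
    for start in sorted(adj):
        if start in seen:
            continue
        component: Set[str] = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            v = queue.popleft()
            component.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        components.append(component)
    return components


# Hypothetical friendship edges.
edges = [("ana", "bob"), ("ana", "carl"), ("ana", "dora"), ("bob", "carl"), ("eve", "fred")]
graph = adjacency(edges)
print(hubs(graph, 1))                    # ['ana']
print(degree_centrality(graph)["ana"])   # 0.6
print(connected_components(graph))
```

Each function touches every vertex and edge a bounded number of times, which is why these queries suit engines that can walk adjacency directly instead of joining tables.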


Graph Theory
                         Applications

●   Pattern Recognition
●   Dependency analysis
●   Impact analysis
●   Network flow
        –   Traffic analysis and optimization
        –   Delivery optimization
●   Optimization of tasks
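Dependency analysis and impact analysis are two directions of the same graph question. The Python sketch below, a minimal illustration under assumed data, uses Kahn's topological sort to order items after their dependencies (and to detect cycles), and a reverse reachability walk to list everything affected when one item changes. The `depends_on` mapping and component names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set


def topological_order(depends_on: Dict[str, List[str]]) -> List[str]:
    """Kahn's algorithm: every item appears after all of its dependencies."""
    nodes = set(depends_on)
    for deps in depends_on.values():
        nodes.update(deps)
    indegree = {n: 0 for n in nodes}
    dependents: Dict[str, List[str]] = defaultdict(list)
    for item, deps in depends_on.items():
        for d in deps:
            dependents[d].append(item)
            indegree[item] += 1
    ready = deque(sorted(n for n in nodes if indegree[n] == 0))
    order: List[str] = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for m in sorted(dependents[n]):
            indegree[m] -= 1
            if indegree[m] == 0:
                ready.append(m)
    if len(order) != len(nodes):
        raise ValueError("dependency cycle detected")
    return order


def impacted_by(changed: str, depends_on: Dict[str, List[str]]) -> Set[str]:
    """Everything that transitively depends on `changed` (reverse reachability)."""
    dependents: Dict[str, Set[str]] = defaultdict(set)
    for item, deps in depends_on.items():
        for d in deps:
            dependents[d].add(item)
    impacted: Set[str] = set()
    stack = [changed]
    while stack:
        n = stack.pop()
        for m in dependents[n]:
            if m not in impacted:
                impacted.add(m)
                stack.append(m)
    return impacted


# Hypothetical component dependencies: "app" depends on "web" and "db", etc.
depends_on = {"app": ["web", "db"], "web": ["http"], "db": ["driver"], "report": ["db"]}
print(topological_order(depends_on))  # ['driver', 'http', 'db', 'web', 'report', 'app']
print(sorted(impacted_by("driver", depends_on)))  # ['app', 'db', 'report']
```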

Graph-like
                          Applications

●   Recommendations
       –   Heuristics (PageRank)
       –   Local
               ●   Shortest Paths
               ●   Hammock Functions
               ●   Walks
               ●   Search algorithms
               ●   Shooting stars
               ●   K-nearest neighbours
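The list above contrasts a global heuristic (PageRank) with local techniques such as shortest paths. A minimal Python sketch of one from each side follows: PageRank by power iteration with the commonly used damping factor of 0.85, and an unweighted shortest path by breadth-first search. The link graph, the iteration count and the function names are assumptions for illustration; production recommenders add many refinements.

```python
from collections import deque
from typing import Dict, List, Optional

Links = Dict[str, List[str]]


def pagerank(links: Links, damping: float = 0.85, iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank; dangling vertices spread their rank uniformly."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = links.get(v, [])
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # No out-links: redistribute this vertex's rank over all vertices.
                for t in nodes:
                    new_rank[t] += damping * rank[v] / n
        rank = new_rank
    return rank


def shortest_path(links: Links, source: str, target: str) -> Optional[List[str]]:
    """Fewest-hops path found by breadth-first search, or None if unreachable."""
    parent: Dict[str, Optional[str]] = {source: None}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            path: List[str] = []
            node: Optional[str] = v
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for w in links.get(v, []):
            if w not in parent:
                parent[w] = v
                queue.append(w)
    return None


# Hypothetical "who links to whom" graph.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))        # c
print(shortest_path(links, "d", "b"))   # ['d', 'c', 'a', 'b']
```

PageRank scores the whole graph at once, while the breadth-first search only explores outward from one vertex, which is what makes local methods cheap enough to run per user.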

Graph-like
                      Applications




●   Location-based services
●   Hubs
●   Spatial databases
●   Logical (multi-)index construction
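One common way a location-based service uses a graph is to model roads as weighted edges and answer "what is nearest to me?" by road distance rather than straight-line distance. The Python sketch below assumes that model and uses Dijkstra's algorithm with a binary heap; the road network, the distances and the `nearest`/`route` helpers are hypothetical.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Roads = Dict[str, List[Tuple[str, float]]]


def dijkstra(roads: Roads, source: str) -> Tuple[Dict[str, float], Dict[str, Optional[str]]]:
    """Shortest distances from source over non-negative edge weights."""
    dist: Dict[str, float] = {source: 0.0}
    prev: Dict[str, Optional[str]] = {source: None}
    heap: List[Tuple[float, str]] = [(0.0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue  # stale entry: a shorter route to v was already found
        for w, length in roads.get(v, []):
            candidate = d + length
            if candidate < dist.get(w, float("inf")):
                dist[w] = candidate
                prev[w] = v
                heapq.heappush(heap, (candidate, w))
    return dist, prev


def route(prev: Dict[str, Optional[str]], target: str) -> Optional[List[str]]:
    """Rebuild the path to target from the predecessor map."""
    if target not in prev:
        return None
    path: List[str] = []
    node: Optional[str] = target
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


def nearest(roads: Roads, source: str, candidates: List[str]) -> Optional[Tuple[str, float]]:
    """Closest candidate by road distance, e.g. the nearest point of interest."""
    dist, _ = dijkstra(roads, source)
    reachable = [(dist[c], c) for c in candidates if c in dist]
    if not reachable:
        return None
    d, c = min(reachable)
    return c, d


# Hypothetical one-way road segments with lengths in km.
roads = {
    "home": [("square", 2.0), ("station", 5.0)],
    "square": [("station", 1.5), ("cafe_a", 4.0)],
    "station": [("cafe_b", 1.0)],
}
print(nearest(roads, "home", ["cafe_a", "cafe_b"]))  # ('cafe_b', 4.5)
dist, prev = dijkstra(roads, "home")
print(route(prev, "cafe_b"))  # ['home', 'square', 'station', 'cafe_b']
```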


Web
                       Trending Topics

●   Semantic web
        –   RDF (OWL) Store
        –   RDF-Sail
        –   SPARQL
●   Linked data (Open Data)
●   Link analysis
●   Structure mining
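An RDF store holds (subject, predicate, object) triples, and the core of a SPARQL query is a basic graph pattern: triple patterns with variables that must all match at once. The Python sketch below evaluates such a pattern by a naive nested-loop join. It is not a SPARQL implementation, only an illustration of the matching idea; the `ex:` resources are hypothetical, `foaf:name` and `foaf:knows` are borrowed from the FOAF vocabulary, and terms starting with "?" are treated as variables.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_variable(term: str) -> bool:
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`; None if it cannot."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if is_variable(p):
            if extended.get(p, t) != t:
                return None  # variable already bound to a different value
            extended[p] = t
        elif p != t:
            return None  # constant term does not match
    return extended


def query(triples: List[Triple], patterns: List[Triple]) -> List[Binding]:
    """Join the triple patterns left to right with nested loops."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for b in bindings:
            for t in triples:
                extended = match(pattern, t, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


triples = [
    ("ex:john", "foaf:name", "John Smith"),
    ("ex:john", "foaf:knows", "ex:mary"),
    ("ex:mary", "foaf:name", "Mary"),
    ("ex:mary", "foaf:knows", "ex:pere"),
    ("ex:pere", "foaf:name", "Pere"),
]
# Roughly: SELECT ?name WHERE { ex:john foaf:knows ?f . ?f foaf:knows ?fof . ?fof foaf:name ?name }
friends_of_friends = query(triples, [
    ("ex:john", "foaf:knows", "?f"),
    ("?f", "foaf:knows", "?fof"),
    ("?fof", "foaf:name", "?name"),
])
print(friends_of_friends)  # [{'?f': 'ex:mary', '?fof': 'ex:pere', '?name': 'Pere'}]
```

Real triple stores replace the nested loops with indexes over subject, predicate and object and reorder the patterns for selectivity.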

Graph databases
                                   Performance
                   HPC Scalable Graph Analysis Benchmark (IWGD 2010)

Scale 15
Kernel        DEX        Neo4j      Jena       HyperGraphDB
Load (s)      7.44       697        141        >24h
Scan (s)      0.0010     2.71       0.689
2-Hops (s)    0.0120     0.0260     0.443
BC (s)        14.8       8.24       138
Size (MB)     30         17         207

Scale 20
Kernel        DEX        Neo4j      Jena       HyperGraphDB
Load (s)      317        32094      4560       >24h
Scan (s)      0.005      751        18.6
2-Hops (s)    0.033      0.0230     0.4580
BC (s)        617        7027       59512
Size (MB)     893        539        6656
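Two of the benchmark kernels are easy to state in code. The Python sketch below shows a neighbourhood query within two hops (an assumed reading of the "2-Hops" kernel; the benchmark's exact definition may differ) and betweenness centrality (BC) computed with Brandes' algorithm for unweighted graphs. The star-shaped example graph and helper names are hypothetical. The sketch counts ordered (source, target) pairs, so on an undirected graph each pair contributes twice.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Adjacency = Dict[str, List[str]]


def undirected(edges: List[Tuple[str, str]]) -> Adjacency:
    """Symmetric adjacency lists from an undirected edge list."""
    adj: Adjacency = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def within_two_hops(adj: Adjacency, source: str) -> Set[str]:
    """Vertices reachable from source in one or two hops (source excluded)."""
    reached: Set[str] = set()
    for first in adj.get(source, []):
        reached.add(first)
        reached.update(adj.get(first, []))
    reached.discard(source)
    return reached


def betweenness(adj: Adjacency) -> Dict[str, float]:
    """Brandes' algorithm for unweighted graphs; counts ordered (s, t) pairs."""
    nodes = set(adj)
    for nbrs in adj.values():
        nodes.update(nbrs)
    bc = {v: 0.0 for v in nodes}
    for s in nodes:
        # Phase 1: BFS counting shortest paths (sigma) and recording predecessors.
        order: List[str] = []
        preds: Dict[str, List[str]] = {v: [] for v in nodes}
        sigma = {v: 0 for v in nodes}
        dist = {v: -1 for v in nodes}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj.get(v, []):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: accumulate pair dependencies in reverse BFS order.
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


# Hypothetical star: one hub connected to three leaves.
star = undirected([("hub", "x"), ("hub", "y"), ("hub", "z")])
print(sorted(within_two_hops(star, "x")))  # ['hub', 'y', 'z']
bc = betweenness(star)
print(bc["hub"] / 2)  # 3.0 (halved because the graph is undirected)
```

BC runs one breadth-first search per vertex, so its cost grows roughly with vertices times edges, which is consistent with it being one of the slowest kernels in the table.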

Graph databases
        XI FOSDEM Dinner

Interested in graph databases and NoSQL, and
        attending FOSDEM this year?

              Meeting point:
                 20:00
      In front of Le Roy d'Espagne
              Grand Place 1
                  Brussels

Graph databases
        Moviepilot is hiring


Interested in movies, data analytics, Ruby, Git
           and open source? Join us!

Moviepilot, based in Berlin, is a leading provider
 of discovery services for movies and TV series.

  Interested? Talk to @jannis or @purbon.
Graph databases, the Web of Data
        storage engines

             Questions?

        Pere Urbón Bayes
          Senior Software Engineer
                Independent

                purbon@purbon.com


          February 2011
