Graph ( Theory and Databases )


Pere Urbón Bayes
Senior Software Engineer, Independent

purbon@purbon.com
purbon.com
in/purbon
@purbon

December of 2010
Graph (Theory and Databases)

●   Graph Theory
       –   Definitions
       –   Applications
       –   Analytics
●   Graph Databases
       –   Definitions
       –   Facts
       –   Performance
       –   Vendors




Graph
                         Definitions

●   A graph G(V,E), where V = {v1, v2, ..., vN} is the set of vertices
    and E = {e1, e2, ..., eM} is the set of edges
       –   Directed / Undirected
       –   Mixed
       –   Multigraph
       –   Weighted
       –   ....
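The definition above can be made concrete with a small data structure. The following is a minimal sketch, not taken from the slides: the class name `Graph`, its fields and the sample vertices are all assumptions chosen for illustration. It stores V as a set and E as a weighted adjacency map, and a flag selects between directed and undirected behaviour. It keeps at most one edge per pair of vertices, so it does not cover multigraphs.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """A graph G(V, E) stored as a vertex set and a weighted adjacency map."""
    directed: bool = False
    vertices: set = field(default_factory=set)
    # adjacency[u][v] holds the weight of the edge u -> v
    adjacency: dict = field(default_factory=dict)

    def add_edge(self, u, v, weight=1.0):
        for vertex in (u, v):
            self.vertices.add(vertex)
            self.adjacency.setdefault(vertex, {})
        self.adjacency[u][v] = weight
        if not self.directed:
            # An undirected edge is stored in both directions.
            self.adjacency[v][u] = weight

    def edges(self):
        """Return the edge set E as (u, v, weight) tuples."""
        seen = set()
        result = []
        for u, neighbours in self.adjacency.items():
            for v, weight in neighbours.items():
                # Undirected edges are reported once, not once per direction.
                key = (u, v) if self.directed else frozenset((u, v))
                if key not in seen:
                    seen.add(key)
                    result.append((u, v, weight))
        return result


# Hypothetical sample data: one undirected weighted graph, one directed graph.
g = Graph(directed=False)
g.add_edge("v1", "v2", weight=2.5)
g.add_edge("v2", "v3")

d = Graph(directed=True)
d.add_edge("v1", "v2")
```

In the undirected graph `g`, the edge between v1 and v2 can be followed from either end; in the directed graph `d` it can only be followed from v1 to v2.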


Graph
                       Definitions

●   Directed graphs
●   Vertices
●   Edges
●   Each edge goes from a vertex V(N) to a vertex V(M)




Graph
                   Definitions

    Multigraph
●   More than one edge between two nodes.
●   Loops: edges that start and end at the same node.

    Labelling
●   The process of assigning labels to vertices and edges.
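As a rough illustration of both definitions, the sketch below keeps the edges in a list of (source, target, label) tuples, so the same pair of nodes can appear more than once and a node can be connected to itself. The node names and labels are hypothetical, and the edges are treated as directed pairs for simplicity.

```python
from collections import Counter

# Each edge is a (source, target, label) tuple; using a list rather than a
# set of pairs lets the same pair of nodes appear more than once.
edges = [
    ("a", "b", "road"),
    ("a", "b", "rail"),   # parallel edge: a second edge between a and b
    ("b", "c", "road"),
    ("c", "c", "loop"),   # loop: an edge from c to itself
]

# Vertex labelling: a mapping from each vertex to its label.
vertex_labels = {"a": "city", "b": "city", "c": "village"}

# Count how many edges connect each (source, target) pair.
multiplicity = Counter((source, target) for source, target, _ in edges)
parallel_pairs = [pair for pair, count in multiplicity.items() if count > 1]
loops = [(s, t, label) for s, t, label in edges if s == t]
```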



Graph Theory
                    Applications

●   Task planning
●   Scheduling
●   Process assignment
●   Routing
●   Logistics
●   League planning


Graph Theory
                      Applications

●   Pattern Recognition
●   Dependency analysis
●   Impact analysis
●   Network flow
    –   Traffic analysis and optimization
    –   Delivery optimization
●   Optimization of tasks
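Two of the applications listed above, task planning and dependency/impact analysis, can be sketched on one small graph. This is an illustrative example only: the task names and the helper `impacted_by` are hypothetical, while `graphlib.TopologicalSorter` is part of the Python standard library (3.9 and later).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical tasks: each key depends on the tasks in its set.
depends_on = {
    "deploy": {"test", "package"},
    "package": {"build"},
    "test": {"build"},
    "build": set(),
}

# Task planning: an order in which every task comes after its dependencies.
order = list(TopologicalSorter(depends_on).static_order())


def impacted_by(task):
    """Impact analysis: tasks that directly or indirectly depend on `task`."""
    impacted = set()
    frontier = [task]
    while frontier:
        current = frontier.pop()
        for candidate, deps in depends_on.items():
            if current in deps and candidate not in impacted:
                impacted.add(candidate)
                frontier.append(candidate)
    return impacted
```

A change to `build` affects every other task here, while a change to `deploy` affects nothing downstream.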

Graph Theory
                      Analytics

●   Clustering (Communities)
●   Social connections
●   Hubs
●   Graph Mining
●   Centrality measures
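Degree centrality is one of the simplest centrality measures and gives a first approximation of which vertices act as hubs. The sketch below assumes an undirected graph stored as an adjacency map; the names in `friends` are made up for the example, and other measures (betweenness, closeness, PageRank) would rank vertices differently.

```python
# Hypothetical undirected social graph as an adjacency map.
friends = {
    "ana": {"bob", "cai", "dee"},
    "bob": {"ana", "cai"},
    "cai": {"ana", "bob"},
    "dee": {"ana"},
}


def degree_centrality(graph):
    """Degree of each vertex divided by the maximum possible degree (n - 1)."""
    n = len(graph)
    return {vertex: len(neighbours) / (n - 1)
            for vertex, neighbours in graph.items()}


centrality = degree_centrality(friends)
# The vertex with the highest degree centrality is a candidate hub.
hub = max(centrality, key=centrality.get)
```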




Graph-like
                         Applications

●   Recommendations
       –   Heuristics (PageRank)
       –   Local
               ●   Shortest Paths
               ●   Hammock Functions
               ●   Walks
               ●   Search algorithms
               ●   Shooting stars
               ●   K-nearest neighbours
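Of the local techniques listed above, shortest paths and search algorithms are closely related: on an unweighted graph, breadth-first search finds a shortest path. The following sketch assumes a directed graph given as an adjacency map; the function name `shortest_path` and the sample `links` are illustrative, not from the slides.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path as a list, or None."""
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Walk the predecessor links back to the start.
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbour in graph.get(current, ()):
            if neighbour not in previous:
                previous[neighbour] = current
                queue.append(neighbour)
    return None


# Hypothetical directed graph.
links = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": ["e"],
    "e": [],
}
```

In a recommendation setting, a short path between a user and an item is one possible signal of relevance; weighted graphs would need an algorithm such as Dijkstra's instead.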

Graph-like
                    Applications

●   Location based services
●   Hubs
●   Spatial databases
●   Logical (multi-)index construction




Web
                       Trending Topics

●   Semantic web
        –   RDF (OWL) Store
        –   RDF-Sail
        –   SPARQL
●   Linked data (Open Data)
●   Link analysis
●   Structure mining
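RDF describes data as (subject, predicate, object) triples, and a SPARQL query is built from triple patterns in which some positions are variables. The sketch below imitates that idea in plain Python and does not use a real RDF store or SPARQL engine: the `ex:` identifiers and the `match` helper are assumptions made for illustration.

```python
# Hypothetical triples (subject, predicate, object); the names are illustrative.
triples = [
    ("ex:john", "ex:likes", "ex:page1"),
    ("ex:john", "ex:knows", "ex:mary"),
    ("ex:mary", "ex:likes", "ex:page1"),
]


def match(pattern, store):
    """Return the triples matching `pattern`.

    None acts as a wildcard, playing the role that a variable plays
    in a SPARQL triple pattern.
    """
    return [
        triple for triple in store
        if all(p is None or p == value for p, value in zip(pattern, triple))
    ]


# "Who likes ex:page1?"
who_likes_page1 = [s for s, _, _ in match((None, "ex:likes", "ex:page1"), triples)]
```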

Graph databases

“A graph database is a database that uses graph structures with nodes, edges,
and properties to represent and store information.

General graph databases that can store any graph are distinct from specialized
graph databases such as triple stores and network databases.”
                                                  Wikipedia


Graph databases
                          Property graph

●   Abstractions
          –   Nodes
          –   Relationships
          –   Properties on both.
    Example: John Smith liked http://www.example.com at 01/10/11
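The example sentence above maps naturally onto the three abstractions: two nodes, one relationship between them, and properties on each. The sketch below uses plain dictionaries rather than any particular graph database API; the keys (`kind`, `type`, `at`) and the helper `outgoing` are assumptions chosen for the illustration.

```python
# Nodes, keyed by a hypothetical numeric id, each with its own properties.
nodes = {
    1: {"kind": "person", "name": "John Smith"},
    2: {"kind": "page", "url": "http://www.example.com"},
}

# Relationships connect two nodes, have a type, and carry properties too.
relationships = [
    {"start": 1, "end": 2, "type": "LIKED", "properties": {"at": "01/10/11"}},
]


def outgoing(node_id, rel_type):
    """Return (target node, relationship properties) for matching relationships."""
    return [
        (nodes[rel["end"]], rel["properties"])
        for rel in relationships
        if rel["start"] == node_id and rel["type"] == rel_type
    ]


liked = outgoing(1, "LIKED")
```

The date is kept as the literal string from the slide, since the slide does not say whether it is day/month or month/day.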




Graph databases
                                    Facts

[Figure: connectivity (y-axis) over the decades (x-axis: 1990's, 2010's, 2020's). Labels, from bottom to top: text files; blogs, social networks; tagging, folksonomies; RDF, ontologies, linked data; everything connected.]

Graph databases
                                  Facts

[Figure: "Size of" (y-axis label truncated in the source) over the decades (x-axis: 1990's, 2010's, 2020's).]

          http://www.guardian.co.uk/business/2009/may/18/digital-content-expansion
Graph databases
                            Facts

[Figure: performance (y-axis) for data ranging from lists to unstructured data. Labels: lists; graph-like structures (semantic web, semantic reasoning, linked data); performance slowdown; unstructured.]

Graph databases
                              Performance
Scale 15
Kernel        DEX       Neo4j     Jena      HyperGraphDB
Load (s)      7.44      697       141       +24h
Scan (s)      0.0010    2.71      0.689
2-Hops (s)    0.0120    0.0260    0.443
BC (s)        14.8      8.24      138
Size (MB)     30        17        207

Scale 20
Kernel        DEX       Neo4j     Jena      HyperGraphDB
Load (s)      317       32094     4560      +24h
Scan (s)      0.005     751       18.6
2-Hops (s)    0.033     0.0230    0.4580
BC (s)        617       7027      59512
Size (MB)     893       539       6656

Source: HPC Scalable Graph Analysis Benchmark, IWGD 2010
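The slide does not define the kernels, so the following is only a generic sketch of what a 2-hops style query looks like: collecting every vertex reachable from a start vertex in at most two edges. The function name `within_hops` and the sample graph `follows` are hypothetical, and the benchmark's actual kernel may differ in its details.

```python
def within_hops(graph, start, hops):
    """Vertices reachable from `start` in at most `hops` edges (start excluded)."""
    visited = {start}
    frontier = {start}
    for _ in range(hops):
        # Expand one level: unvisited neighbours of the current frontier.
        frontier = {
            neighbour
            for vertex in frontier
            for neighbour in graph.get(vertex, ())
            if neighbour not in visited
        }
        visited |= frontier
    return visited - {start}


# Hypothetical directed graph.
follows = {"a": ["b"], "b": ["c", "d"], "c": ["e"], "d": [], "e": []}
two_hop_neighbourhood = within_hops(follows, "a", 2)
```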
Graph databases
                        Vendors

●   Neo4j: Open source NoSQL graph database.
●   Dex: The high performance graph database.
●   HyperGraphDB: An AI and semantic web graph
    database.
●   Infogrid: The Internet Graph database.
●   Sones: SaaS .NET graph database.
●   VertexDB: High performance database server.

Graph ( Theory and Databases )



              Thanks!
           purbon@purbon.com


         December of 2010



