On Graph Databases



  Pere Urbón Bayes
         purbon@purbon.com


        May 2010



   BcnOnRails May - 2010 - On...
On Graph Databases

●   NoSQL movement.
●   Graph databases.
●   Pros and cons.
●   Use cases.
●   Technology overview.
● ...
NoSQL Movement

●   Next Generation of Databases.
●   Innovative. (?)
●   Open Source. (?)
●   Non-Relational.
●   Schema-...
NoSQL Movement

●   Stores.                                  ●   More Stores.
        –   Document.                       ...
NoSQL Movement

●   NoSQL is not the holy grail, never forget it.
●   Precursors and roots began in the early '70s.
        ...
Graph Databases

●   Data strongly related.
        –   Social networks.
        –   GIS Systems.
        –   Transportati...
Graph Databases

●   The Property Graph.
        –   Labeled.
        –   Directed.
        –   Attributed.
        –   Mu...
Graph Databases

●   Graph storage.
       –   Adjacency Matrix.
       –   Adjacency List.
       –   Incidence Matrix.
 ...
Graph Databases

                                                        Query           MySQL      OIM      DEX

        ...
Graph Databases




Use cases

●   Network analysis.
●   Link analysis.
●   Graph mining.
●   Neural networks.
●   Bibliographic search.
●   S...
Use cases

●   Algorithmic recruitment with GitHub.
       –   Centrality: The importance of a vertex within a
           ...
Graph Databases

●   Shortest Paths.                            ●   Centrality.
        –   BFS/DFS.                      ...
Pros and cons

●   Data facts.                             ●   Relational model facts.
        –   Growths                ...
Technology overview

●   Neo4j: Open source NoSQL graph database.
●   Dex: The high performance graph database.
●   HyperG...
Benchmarking

Kernel      Scale 15   DEX        Neo4j        Jena      HypergraphDB
K1 Load (s)               7,44        ...
Technology overview




Technology overview
●   Neo4J.rb ( JRuby target )
        –   Active record integration.
        –   Dynamic and schema fr...
Technology overview
    Creating nodes                                Properties
 require "rubygems"                      ...
Technology overview
 Accessing relationships
node1.rels.empty? # => false

# The rels method returns an enumeration of rel...
Example



For the joy of someone, let's play a little with a
                graph database.




On Graph Databases




    Thank you!
    Pere Urbón Bayes
         purbon@purbon.com




Bcn On Rails May2010 On Graph Databases

Short introduction to graph databases at Bcn On Rails May 2010.

Transcript of "Bcn On Rails May2010 On Graph Databases"

  1. On Graph Databases
     Pere Urbón Bayes
     purbon@purbon.com
     May 2010
     BcnOnRails May - 2010 - On Graph Databases
  2. On Graph Databases
     ● NoSQL movement.
     ● Graph databases.
     ● Pros and cons.
     ● Use cases.
     ● Technology overview.
     ● Example.
  3. NoSQL Movement
     ● Next Generation of Databases.
     ● Innovative. (?)
     ● Open Source. (?)
     ● Non-Relational.
     ● Schema-less.
     ● Distributed.
     ● Scalable.
  4. NoSQL Movement
     ● Stores.
        – Document.
        – Key/Value.
        – Object oriented.
        – Column.
        – Graph database.
     ● More Stores.
        – Grid database.
        – XML Database.
        – RDF.
        – .....
  5. NoSQL Movement
     ● NoSQL is not the holy grail; never forget it.
     ● Precursors and roots began in the early '70s.
        – Network databases, Charles Bachman 1969.
     案ずるより産むが易し。
        – Giving birth to a baby is easier than worrying about it.
  6. Graph Databases
     ● Data strongly related.
        – Social networks.
        – GIS Systems.
        – Transportation.
        – Bibliographic.
        – File systems.
        – ........
     [Figure: GitHub Ruby community by country]
  7. Graph Databases
     ● The Property Graph.
        – Labeled.
        – Directed.
        – Attributed.
        – Multigraph.
     ● Talk about.
        – Nodes with types.
        – Edges with types.
        – Attributes.
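The property graph described on slide 7 can be illustrated with a few lines of plain Ruby. This is only a minimal in-memory sketch, not how any of the databases in this talk store data; `PropertyGraph`, `Node`, `Edge` and their methods are hypothetical names introduced for the example.

```ruby
# Minimal in-memory property graph (hypothetical names, illustration only):
# nodes and edges carry a type (label) and attributes, edges are directed,
# and parallel edges between the same two nodes are allowed (multigraph).
Node = Struct.new(:id, :type, :attrs)
Edge = Struct.new(:id, :type, :from, :to, :attrs)

class PropertyGraph
  attr_reader :nodes, :edges

  def initialize
    @nodes = {}
    @edges = []
  end

  def add_node(id, type, attrs = {})
    @nodes[id] = Node.new(id, type, attrs)
  end

  # Adds a directed edge from -> to. Nothing prevents a second edge between
  # the same pair of nodes, which is what makes this a multigraph.
  def add_edge(type, from, to, attrs = {})
    raise ArgumentError, "unknown node" unless @nodes.key?(from) && @nodes.key?(to)

    edge = Edge.new(@edges.size, type, from, to, attrs)
    @edges << edge
    edge
  end

  # Edges leaving the node, optionally filtered by edge type.
  def outgoing(id, type = nil)
    @edges.select { |e| e.from == id && (type.nil? || e.type == type) }
  end

  # Edges arriving at the node, optionally filtered by edge type.
  def incoming(id, type = nil)
    @edges.select { |e| e.to == id && (type.nil? || e.type == type) }
  end
end

graph = PropertyGraph.new
graph.add_node(:alice, :person, { name: "Alice" })
graph.add_node(:bob, :person, { name: "Bob" })
graph.add_edge(:friends, :alice, :bob, { since: 1982 })
graph.add_edge(:works_with, :alice, :bob) # parallel edge with another type

graph.outgoing(:alice).size                         # => 2
graph.outgoing(:alice, :friends).first.attrs[:since] # => 1982
```

Scanning the whole edge array on every lookup keeps the sketch short; a real store would index edges per node.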
  8. Graph Databases
     ● Graph storage.
        – Adjacency Matrix.
        – Adjacency List.
        – Incidence Matrix.
        – Incidence List.
     ● GraphDB's.
        – Bitmaps.
        – B+Trees.
        – RB Trees.
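To make the four textbook representations from slide 8 concrete, the sketch below builds each of them for the same small directed graph. The helper names and the sample edges are made up for the example, and the -1/1 sign convention in the incidence matrix is one common choice among several.

```ruby
# A small directed graph: 4 vertices (0..3) and 4 edges (sample data).
EDGES = [[0, 1], [0, 2], [1, 2], [2, 3]].freeze
VERTEX_COUNT = 4

# Adjacency list: one array of successors per vertex. Space O(V + E).
def adjacency_list(vertex_count, edges)
  list = Array.new(vertex_count) { [] }
  edges.each { |from, to| list[from] << to }
  list
end

# Adjacency matrix: cell [i][j] is 1 when an edge i -> j exists. Space O(V^2).
def adjacency_matrix(vertex_count, edges)
  matrix = Array.new(vertex_count) { Array.new(vertex_count, 0) }
  edges.each { |from, to| matrix[from][to] = 1 }
  matrix
end

# Incidence matrix: one row per vertex, one column per edge;
# -1 marks the source of the edge and 1 its target. Space O(V * E).
def incidence_matrix(vertex_count, edges)
  matrix = Array.new(vertex_count) { Array.new(edges.size, 0) }
  edges.each_with_index do |(from, to), column|
    matrix[from][column] = -1
    matrix[to][column] = 1
  end
  matrix
end

# Incidence list: for each vertex, the indices of the edges that touch it.
def incidence_list(vertex_count, edges)
  list = Array.new(vertex_count) { [] }
  edges.each_with_index do |(from, to), index|
    list[from] << index
    list[to] << index
  end
  list
end

adjacency_list(VERTEX_COUNT, EDGES)  # => [[1, 2], [2], [3], []]
incidence_list(VERTEX_COUNT, EDGES)  # => [[0, 1], [0, 2], [1, 2, 3], [3]]
```

The matrix forms answer "is there an edge i -> j?" in constant time but waste space on sparse graphs, which is one reason the list forms are usually preferred for large, sparse data.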
  9. Graph Databases

     Query            MySQL      OIM        DEX
     Q1:count         20,38      17,35      0
     Q2:scan          32,76      174,64     3,14
     Q3:select        7,34       5,43       0,84
     Q4:projection    17,34      43,7       33,19
     Q5:combine       0,74       2,61       0,01
     Q6:explode       0,07       202,07     0,01
     Q7:values        12,28      20,77      0,01
     Q8:hub           >3hours    >3hours    624,68

                      RDBMS      OIM        DEX
     data             27.36 GB   54 GB      9.69 GB
     ratio overhead   10,9       21,51      3,86
     load time        52891 s    17543 s    95579 s
  10. Graph Databases
  11. Use cases
      ● Network analysis.
      ● Link analysis.
      ● Graph mining.
      ● Neural networks.
      ● Bibliographic search.
      ● Semantic web.
  12. Use cases
      ● Algorithmic recruitment with GitHub.
         – Centrality: the importance of a vertex within a graph.
            ● Betweenness: vertices that occur on many shortest paths have higher centrality.
               – O(V^3) without any optimization.
      ● Other possible choices:
         – Closeness: vertices with a short geodesic distance to the other vertices have high closeness.
            ● Usually preferred in network analysis.
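Both centrality measures from slide 12 can be tried on a toy graph. The following is a sketch under stated assumptions: the graph is unweighted, undirected and connected; closeness is taken as (reachable vertices) / (sum of BFS distances); and betweenness uses Brandes' accumulation scheme, an optimization the slide does not name, instead of the unoptimized O(V^3) approach. `NETWORK` and all function names are made up for the example.

```ruby
# Toy undirected network stored as { vertex => [neighbours] }: a - b - c - d.
NETWORK = {
  "a" => ["b"],
  "b" => ["a", "c"],
  "c" => ["b", "d"],
  "d" => ["c"]
}.freeze

# Breadth-first search: hop distance from source to every reachable vertex.
def bfs_distances(graph, source)
  distances = { source => 0 }
  queue = [source]
  until queue.empty?
    current = queue.shift
    graph.fetch(current, []).each do |neighbour|
      next if distances.key?(neighbour)

      distances[neighbour] = distances[current] + 1
      queue << neighbour
    end
  end
  distances
end

# Closeness: number of other reachable vertices divided by the sum of their
# geodesic distances. Short distances give a high score.
def closeness(graph, vertex)
  distances = bfs_distances(graph, vertex)
  total = distances.values.sum
  return 0.0 if total.zero?

  (distances.size - 1).to_f / total
end

# Betweenness via Brandes' algorithm: one BFS per source, then dependencies
# are accumulated backwards along shortest-path predecessors.
def betweenness(graph)
  centrality = Hash.new(0.0)
  graph.each_key do |source|
    stack = []
    predecessors = Hash.new { |hash, key| hash[key] = [] }
    path_count = Hash.new(0)
    path_count[source] = 1
    distance = { source => 0 }
    queue = [source]
    until queue.empty?
      v = queue.shift
      stack << v
      graph[v].each do |w|
        unless distance.key?(w)
          distance[w] = distance[v] + 1
          queue << w
        end
        next unless distance[w] == distance[v] + 1

        path_count[w] += path_count[v]
        predecessors[w] << v
      end
    end
    dependency = Hash.new(0.0)
    while (w = stack.pop)
      predecessors[w].each do |v|
        dependency[v] += (path_count[v].to_f / path_count[w]) * (1 + dependency[w])
      end
      centrality[w] += dependency[w] unless w == source
    end
  end
  # Each unordered pair was counted from both ends in an undirected graph.
  centrality.transform_values { |score| score / 2.0 }
end

closeness(NETWORK, "a") # => 0.5
closeness(NETWORK, "b") # => 0.75
betweenness(NETWORK)["b"] # => 2.0
```

On this path graph the two inner vertices score highest on both measures, which matches the intuition that they sit between everyone else.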
  13. Graph Databases
      ● Shortest Paths.
         – BFS/DFS.
         – Dijkstra.
         – Floyd-Warshall.
         – Ford.
      ● Connectivity.
         – Strongly connected.
         – Weakly connected.
      ● Communities.
      ● Centrality.
         – Betweenness.
         – Closeness.
         – Diameter.
         – Radius.
      ● Traversals.
         – BFS/DFS.
      ● Staining.
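Of the shortest-path algorithms listed on slide 13, Dijkstra is the one most often reached for on weighted graphs. Below is a minimal sketch, assuming non-negative weights and a graph given as a hash in which every vertex appears as a key. It picks the next vertex with a linear scan (O(V^2) overall) rather than a priority queue, to stay within the Ruby standard library. `ROUTES`, `dijkstra` and `path_to` are names invented for the example.

```ruby
# Hypothetical weighted, directed transport network:
# { vertex => { neighbour => weight } }. Every vertex must appear as a key.
ROUTES = {
  "A" => { "B" => 7, "C" => 2 },
  "B" => { "D" => 1 },
  "C" => { "B" => 3, "D" => 8 },
  "D" => {}
}.freeze

# Dijkstra's algorithm with a linear scan for the closest unvisited vertex.
# Returns [distances, previous], where previous maps each reached vertex to
# its predecessor on a shortest path from source.
def dijkstra(graph, source)
  distances = Hash.new(Float::INFINITY)
  distances[source] = 0
  previous = {}
  unvisited = graph.keys

  until unvisited.empty?
    current = unvisited.min_by { |vertex| distances[vertex] }
    break if distances[current] == Float::INFINITY # the rest is unreachable

    unvisited.delete(current)
    graph[current].each do |neighbour, weight|
      candidate = distances[current] + weight
      next unless candidate < distances[neighbour]

      distances[neighbour] = candidate
      previous[neighbour] = current
    end
  end
  [distances, previous]
end

# Walks the predecessor map backwards; returns nil if target is unreachable.
def path_to(previous, source, target)
  path = [target]
  path.unshift(previous[path.first]) while previous.key?(path.first)
  path.first == source ? path : nil
end

distances, previous = dijkstra(ROUTES, "A")
distances["D"]               # => 6
path_to(previous, "A", "D")  # => ["A", "C", "B", "D"]
```

For an unweighted graph a plain BFS gives the same answer with less work; Dijkstra only pays off once edges carry costs such as distance or travel time.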
  14. Pros and cons
      ● Data facts.
         – Grows exponentially.
         – Huge interdependency and complexity.
         – Relationships are important.
         – Structure changes over time.
      ● Relational model facts.
         – E. F. Codd model.
         – Normalization.
         – Object-Relational impedance mismatch.
         – Joins don't scale.
         – Big tables.
         – Denormalization.
  15. Technology overview
      ● Neo4j: Open source NoSQL graph database.
      ● Dex: The high performance graph database.
      ● HyperGraphDB: An AI and semantic web graph database.
      ● Infogrid: The Internet Graph database.
      ● Sones: SaaS .NET graph database.
      ● VertexDB: High performance database server.
  16. Benchmarking
      Graph Database Performance on the HPC Scalable Graph Analysis Benchmark

      Kernel, Scale 15     DEX       Neo4j     Jena      HypergraphDB
      K1 Load (s)          7,44      697       141       +24h
      K2 Scan edges (s)    0,0010    2,71      0,689
      K3 2-hops (s)        0,0120    0,0260    0,443
      K4 BC (s)            14,8      8,24      138
      Db size (MB)         30        17        207

      Kernel, Scale 20     DEX       Neo4j     Jena      HypergraphDB
      K1 Load (s)          317       32.094    4.560     +24h
      K2 Scan edges (s)    0,005     751       18,6
      K3 2-hops (s)        0,033     0,0230    0,4580
      K4 BC (s)            617       7.027     59.512
      Db size (MB)         893       539       6.656
  17. Technology overview
  18. Technology overview
      ● Neo4j.rb (JRuby target)
         – Active Record integration.
         – Dynamic and schema free.
         – Fast traversal of relationships.
         – Transactions with rollback support.
         – Indexing and querying of Ruby objects.
         – Massive loaders.
         – Ruby on Rails integration.
         – Accessible through REST.
      http://wiki.neo4j.org/content/Ruby
  19. Technology overview

      Creating nodes

          require "rubygems"
          require 'neo4j'

          Neo4j::Transaction.run do
            node = Neo4j::Node.new
          end

      Properties

          node = Neo4j::Node.new
          node[:name] = 'foo'
          node[:age] = 123
          node[:hungry] = false
          node[4] = 3.14
          node[:age] # => 123

      Transactions over blocks

          Neo4j::Transaction.run do
            # neo4j operations go here
          end

      Creating relationships

          node1 = Neo4j::Node.new
          node2 = Neo4j::Node.new
          Neo4j::Relationship.new(:friends, node1, node2)
          # which is the same as
          node1.rels.outgoing(:friends) << node2
  20. Technology overview

      Accessing relationships

          node1.rels.empty? # => false

          # The rels method returns an enumeration of relationship objects.
          # The nodes method on the relationships returns the nodes instead.
          node1.rels.nodes.include?(node2) # => true

          node1.rels.first # => the first relationship this node1 has.
          node1.rels.nodes.first # => node2 first node of any relationship type
          node2.rels.incoming(:friends).nodes.first # => node1 first node of relationship type 'friends'
          node2.rels.incoming(:friends).first # => a relationship object between node1 and node2

      Properties on Relationships

          rel = node1.rels.outgoing(:friends).first
          rel[:since] = 1982
          node1.rels.first[:since] # => 1982
  21. Example
      For the joy of someone, let's play a little with a graph database.
  22. On Graph Databases
      Thank you!
      Pere Urbón Bayes
      purbon@purbon.com