Switching from the Relational to the Graph model

Luca Garulli
Founder and CEO @NuvolaBase Ltd
Author of OrientDB Doc/Graph DB
Nov 23rd 2012 in Oxford, UK
(c) Luca Garulli   Licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License     Page 1
                                                                                                 www.orientechnologies.com
One of the main reasons RDBMS users resist moving to a NoSQL product
is the complexity of the model:

OK, NoSQL products are great for Big Data and big scale,
but...
...what about the model?



What is the NoSQL answer for managing complex domains?

Key-Value stores?
Column-Based?
Document database?
Graph database!
CAUTION!
This presentation will not use a social-like domain with the classic
friend-of-friend^N paradigm, where graph databases are already widely used...
...Rather, we will explore how to think «graphically» in one of the
most common domains in the enterprise world:

The classic old CRM* domain

* today an RDBMS is used in 99% of cases

Every developer knows the Relational Model,
but who knows the Graph one?
Back to school:
       Graph Theory crash course




Basic Graph

[Figure: vertex «Luca» connected by a «Likes» edge to vertex «All Your Base Conference»]

Property Graph Model*

[Figure: vertex «Luca» (name: Luca, surname: Garulli, company: NuvolaBase) connected by a directed «Likes» edge (since: 2012) to vertex «All Your Base Conference» (date: Nov 23 2012)]

Edges are directed.
Vertices and Edges can have properties.

* https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model
Property Graph Model

[Figure: vertex «Luca» connected to vertex «All Your Base Conference» by two edges: «Likes» (since: 2012) and «Speaks» (title: «Switching...», abstract: «This talk presents...»)]

An Edge connects 2 vertices: use multiple edges to represent
1-N and N-M relationships.
Property Graph Model

[Figure: vertices «Luca», «John», «Oxford» and «All Your Base Conference», connected by edges labelled «Studies», «Likes», «FriendOf», «Organizes» and «located»]
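
The property graph model described in these slides can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the Blueprints or OrientDB API: the class names (`PropertyGraphSketch`, `Vertex`, `Edge`), the `outNeighbours` helper and the `name` property on the conference vertex are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PropertyGraphSketch {
    /** A vertex: a bag of properties plus its outgoing and incoming edges. */
    static final class Vertex {
        final Map<String, Object> properties = new LinkedHashMap<>();
        final List<Edge> out = new ArrayList<>();
        final List<Edge> in = new ArrayList<>();

        Vertex set(String key, Object value) {
            properties.put(key, value);
            return this;
        }
    }

    /** A directed, labelled edge between two vertices, with its own properties. */
    static final class Edge {
        final String label;
        final Vertex from;
        final Vertex to;
        final Map<String, Object> properties = new LinkedHashMap<>();

        Edge(String label, Vertex from, Vertex to) {
            this.label = label;
            this.from = from;
            this.to = to;
            from.out.add(this); // register on the source vertex
            to.in.add(this);    // register on the destination vertex
        }

        Edge set(String key, Object value) {
            properties.put(key, value);
            return this;
        }
    }

    /** Vertices reachable from start through outgoing edges with the given label. */
    static List<Vertex> outNeighbours(Vertex start, String label) {
        List<Vertex> result = new ArrayList<>();
        for (Edge e : start.out) {
            if (e.label.equals(label)) {
                result.add(e.to);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Vertex luca = new Vertex().set("name", "Luca").set("surname", "Garulli");
        Vertex conference = new Vertex().set("name", "All Your Base Conference");
        // Two edges between the same pair of vertices, each with its own properties
        new Edge("Likes", luca, conference).set("since", 2012);
        new Edge("Speaks", luca, conference).set("title", "Switching...");

        System.out.println(outNeighbours(luca, "Likes").get(0).properties.get("name"));
        System.out.println(luca.out.size() + " outgoing edges from Luca");
    }
}
```

Keeping both `out` and `in` lists on each vertex is one possible design choice: it lets an edge be followed in either direction without searching.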

Congratulations, this is your diploma in
        «Graph Theory»




Now let's go back to our domain:
the CRM
Domain: the super minimal CRM

[Figure: entities Customer, Address, Order and Stock, grouped into a Registry system and an Order system]

How does a Relational DBMS manage relationships?

Relational World: 1-1 Relationships

Customer                      Address
Id  Name   Address            Id  Location
10  Luca   34                 34  Rome
11  Mike   44                 44  London
34  John   54                 54  Oxford
56  Mark   66                 66  New Mexico
88  Steve  68                 68  Palo Alto

Customer.Id and Address.Id are primary keys; Customer.Address is a foreign key.

JOIN Customer.Address -> Address.Id

Relational World: 1-N Relationships

Customer               Address
Id  Name               Id  Customer  Location
10  Luca               24  10        Rome
11  Mike               33  10        London
34  John               44  34        Oxford
56  Mark               66  56        Cologne
88  Steve              68  88        Palo Alto

Inverse JOIN Address.Customer -> Customer.Id

Relational World: N-M Relationships

Customer            CustomerAddress        Address
Id  Name            Id  Address            Id  Location
10  Luca            10  24                 24  Rome
11  Mike            10  33                 33  London
34  John            11  44                 44  Oxford
56  Mark                                   66  Cologne
88  Steve                                  68  Palo Alto

Additional table with 2 JOINs:
(1) CustomerAddress.Id -> Customer.Id and
(2) CustomerAddress.Address -> Address.Id
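
The two JOINs through the additional table can be sketched in Java as follows. This is an illustration only, assuming each table's primary-key index behaves like a sorted map (a `TreeMap` stands in for it here); the class and method names are hypothetical, and the rows are the ones shown in the tables of this slide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class JunctionTableJoin {
    // Customer table keyed by primary key. TreeMap is a balanced (red-black) tree,
    // used here as a stand-in for a primary-key index.
    static Map<Integer, String> sampleCustomers() {
        Map<Integer, String> table = new TreeMap<>();
        table.put(10, "Luca");
        table.put(11, "Mike");
        table.put(34, "John");
        table.put(56, "Mark");
        table.put(88, "Steve");
        return table;
    }

    // Address table keyed by primary key.
    static Map<Integer, String> sampleAddresses() {
        Map<Integer, String> table = new TreeMap<>();
        table.put(24, "Rome");
        table.put(33, "London");
        table.put(44, "Oxford");
        table.put(66, "Cologne");
        table.put(68, "Palo Alto");
        return table;
    }

    // CustomerAddress rows: {customer id, address id}.
    static int[][] sampleLinks() {
        return new int[][] {{10, 24}, {10, 33}, {11, 44}};
    }

    /** Resolves both foreign keys of every CustomerAddress row: two index lookups per row. */
    static List<String> join(Map<Integer, String> customers,
                             Map<Integer, String> addresses,
                             int[][] links) {
        List<String> rows = new ArrayList<>();
        for (int[] link : links) {
            String name = customers.get(link[0]);     // JOIN (1): CustomerAddress.Id -> Customer.Id
            String location = addresses.get(link[1]); // JOIN (2): CustomerAddress.Address -> Address.Id
            if (name != null && location != null) {
                rows.add(name + " @ " + location);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String row : join(sampleCustomers(), sampleAddresses(), sampleLinks())) {
            System.out.println(row);
        }
    }
}
```

Every row of the relationship costs two key searches, which is the point the following slides build on.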
What’s wrong with the Relational Model?

The JOIN is evil!

Customer            CustomerAddress        Address
Id  Name            Id  Address            Id  Location
10  Luca            10  24                 24  Rome
11  Mike            10  33                 33  London
34  John            34  24                 44  Oxford
56  Mark                                   66  Cologne
88  Steve                                  68  Palo Alto

These are all JOINs executed every time you traverse a relationship!

A JOIN means searching for a key in another table.

The first rule for improving performance is to index all the keys.

An index speeds up searches, but slows down inserts, updates and deletes.

So in the best case a JOIN is a lookup into an index.

This is done for every single join!

If you traverse hundreds of relationships,
you’re executing hundreds of JOINs.

Index Lookup: is it really that fast?

Index Lookup: how does it work?

Think of an address book where we have to find Luca’s phone number.

Index algorithms are all similar and based on balanced trees:

A-Z
  A-L
    A-D
      A-B
      C-D
    E-L
      E-G
        E-F
        G
      H-L
        H-J
        K-L  ->  Luca  (Found!)
  M-Z
    M-R
    S-Z

This lookup took 5 steps (A-Z, A-L, E-L, H-L, K-L), and the number of steps grows with the index size!

An index lookup is executed for each JOIN.

Querying several tables can easily produce millions of JOINs/lookups!

Hence the rule: more entries = more lookup steps = slower JOIN
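
The rule above can be checked with a small Java sketch. It is an assumption-laden illustration: a binary search over a sorted array stands in for the balanced-tree index (real indexes are usually B-trees with many keys per node, so their absolute step counts are lower), and the names `IndexLookupSteps`, `stepsToFind` and `stepsForLastKey` are hypothetical.

```java
public class IndexLookupSteps {
    /**
     * Binary search over sorted keys.
     * Returns the number of comparison steps needed to find key, or -1 if it is absent.
     */
    static int stepsToFind(int[] sortedKeys, int key) {
        int low = 0;
        int high = sortedKeys.length - 1;
        int steps = 0;
        while (low <= high) {
            steps++;
            int mid = (low + high) >>> 1; // midpoint without int overflow
            if (sortedKeys[mid] == key) {
                return steps;
            } else if (sortedKeys[mid] < key) {
                low = mid + 1;  // continue in the upper half
            } else {
                high = mid - 1; // continue in the lower half
            }
        }
        return -1;
    }

    /** Steps needed to find the last key in an index holding the keys 0..n-1. */
    static int stepsForLastKey(int n) {
        int[] keys = new int[n];
        for (int i = 0; i < n; i++) {
            keys[i] = i;
        }
        return stepsToFind(keys, n - 1);
    }

    public static void main(String[] args) {
        for (int n : new int[] {1_000, 1_000_000}) {
            System.out.println(n + " entries: " + stepsForLastKey(n) + " steps");
        }
    }
}
```

The step count grows with the logarithm of the number of entries, so a thousand times more entries adds about ten steps to every single lookup, and that cost is paid once per JOIN.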
Oh! This is why the performance of my database
drops as it becomes bigger,
and bigger,
and bigger!
Is there a better way to manage relationships?

“A graph database is any storage system that provides index-free adjacency”
- Marko Rodriguez

How does a GraphDB manage index-free relationships?

OrientDB: an Open Source (Apache licensed) document-graph NoSQL DBMS
Supports: transactions, extended SQL, Multi-Master replication, etc.
OrientDB: traverse a relationship

The Record ID (RID) is the physical position.

Vertex «Luca»          Edge «Lives»           Vertex «Rome»
RID = #13:35           RID = #14:54           RID = #13:100
out: [#14:54]          out: [#13:35]          in: [#14:54]
label: 'Customer'      in: [#13:100]          label: 'Address'
name: 'Luca'           label: 'Lives'         name: 'Rome'

A GraphDB handles a relationship as a physical LINK to the record,
assigned when the edge is created.

An RDBMS, on the other hand, computes the relationship
every time you query the database.

Isn’t that crazy?!
This means jumping from an O(log N) algorithm to a near-O(1) one:

the cost of traversing is no longer affected by database size!

This is huge in the Big Data age
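
A minimal Java sketch of the idea, using the RIDs from the «traverse a relationship» slide. It rests on loud assumptions: clusters are modelled as plain arrays where the position is the array index, each record holds a single `out` and `in` link instead of a list, and none of these class or method names come from OrientDB.

```java
public class RidTraversal {
    /** A record ID: cluster number plus position inside that cluster, e.g. #13:35. */
    static final class Rid {
        final int cluster;
        final int position;

        Rid(int cluster, int position) {
            this.cluster = cluster;
            this.position = position;
        }
    }

    /** A vertex or an edge; links are stored as RIDs, not as keys to be searched. */
    static final class Record {
        final String label;
        final String name;
        final Rid out;
        final Rid in;

        Record(String label, String name, Rid out, Rid in) {
            this.label = label;
            this.name = name;
            this.out = out;
            this.in = in;
        }
    }

    /** Loading a record is two array offsets: no search, whatever the data size. */
    static Record load(Record[][] clusters, Rid rid) {
        return clusters[rid.cluster][rid.position];
    }

    /** Follows customer -> outgoing edge -> destination vertex and returns its name. */
    static String addressOf(Record[][] clusters, Rid customerRid) {
        Record customer = load(clusters, customerRid);
        Record edge = load(clusters, customer.out); // vertex.out points to the edge
        Record address = load(clusters, edge.in);   // edge.in points to the destination vertex
        return address.name;
    }

    /** The three records of the slide: Luca #13:35, Lives #14:54, Rome #13:100. */
    static Record[][] sampleStorage() {
        Record[][] clusters = new Record[16][128];
        Rid luca = new Rid(13, 35);
        Rid lives = new Rid(14, 54);
        Rid rome = new Rid(13, 100);
        clusters[13][35] = new Record("Customer", "Luca", lives, null);
        clusters[14][54] = new Record("Lives", null, luca, rome);
        clusters[13][100] = new Record("Address", "Rome", null, lives);
        return clusters;
    }

    public static void main(String[] args) {
        System.out.println(addressOf(sampleStorage(), new Rid(13, 35)));
    }
}
```

Each hop costs the same whether the cluster holds a hundred records or a hundred million, which is what separates it from the per-JOIN index lookup.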

OrientDB in the Blueprints micro-benchmark,
on commodity hardware, with a hot cache,
traverses 29.6 million records in less than 5 seconds:

about 6 million nodes traversed per second!

Do not try this at home with an RDBMS*!

* unless you live in Google’s server farm
Create the graph in SQL
$luca> cd bin
$luca> ./console.sh
OrientDB console v.1.3.0-SNAPSHOT (www.orientdb.org)
Type 'help' to display all the commands supported.

orientdb> create vertex Customer set name = 'Luca'
Created vertex #13:35 in 0.03 secs

orientdb> create vertex Address set name = 'Rome'
Created vertex #13:100 in 0.02 secs

orientdb> create edge Lives from #13:35 to #13:100
Created edge #14:54 in 0.02 secs
Create the graph in Java
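import com.orientechnologies.orient.core.db.graph.OGraphDatabase;
import com.orientechnologies.orient.core.record.impl.ODocument;

// Note: the slide omits the step that opens (or creates) the database before use.
OGraphDatabase graph = new OGraphDatabase("local:/tmp/db/graph");

// Create the two vertices and set their properties
ODocument luca = graph.createVertex("Customer");
luca.field("name", "Luca");

ODocument rome = graph.createVertex("Address");
rome.field("name", "Rome");

// Connect the vertices with a "Lives" edge and save it
ODocument edge = graph.createEdge(luca, rome, "Lives");
edge.save();

graph.close();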

Query the graph in SQL

orientdb> select in.out from Address where name = 'Rome'
---+------+---------+--------------------+--------------------+--------+
  #| RID  |@class   |label               |out                 |in      |
---+------+---------+--------------------+--------------------+--------+
  0| 13:35|Customer |Luca                |[#14:54]            |        |
---+------+---------+--------------------+--------------------+--------+
1 item(s) found. Query executed in 0.007 sec(s).

Result: the incoming vertices

More on query power
orientdb> select sum( orders.total ) from Customer
            where name = 'Luca'


orientdb> traverse friend from Customer while $depth <= 7


orientdb> select from (
            traverse friend from Customer while $depth <= 7
          ) where city.name = 'Oxford'
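
What a depth-limited traversal followed by a filter does can be sketched in plain Java. This is a breadth-first illustration, not a description of how OrientDB executes `traverse`: it starts from a single root rather than from every Customer, and the `Customer` class, the helper names and the sample cities are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DepthLimitedTraverse {
    static final class Customer {
        final String name;
        final String city;
        final List<Customer> friends = new ArrayList<>();

        Customer(String name, String city) {
            this.name = name;
            this.city = city;
        }
    }

    /** Visits root and everything reachable through friend links within maxDepth hops. */
    static List<Customer> traverse(Customer root, int maxDepth) {
        List<Customer> visited = new ArrayList<>();
        Set<Customer> seen = new HashSet<>();       // guards against cycles
        Deque<Customer> frontier = new ArrayDeque<>();
        Deque<Integer> depths = new ArrayDeque<>(); // depth of each queued customer
        frontier.add(root);
        depths.add(0);
        seen.add(root);
        while (!frontier.isEmpty()) {
            Customer current = frontier.poll();
            int depth = depths.poll();
            visited.add(current);
            if (depth == maxDepth) {
                continue; // do not expand beyond the depth limit
            }
            for (Customer friend : current.friends) {
                if (seen.add(friend)) {
                    frontier.add(friend);
                    depths.add(depth + 1);
                }
            }
        }
        return visited;
    }

    /** The outer "where city.name = ..." filter applied to the traversal result. */
    static List<String> namesInCity(List<Customer> customers, String city) {
        List<String> names = new ArrayList<>();
        for (Customer c : customers) {
            if (c.city.equals(city)) {
                names.add(c.name);
            }
        }
        return names;
    }

    /** Sample chain Luca -> John -> Sylvia -> Mark, with a cycle back from John to Luca. */
    static Customer sampleRoot() {
        Customer luca = new Customer("Luca", "Rome");
        Customer john = new Customer("John", "Oxford");
        Customer sylvia = new Customer("Sylvia", "Oxford");
        Customer mark = new Customer("Mark", "Oxford");
        luca.friends.add(john);
        john.friends.add(luca);
        john.friends.add(sylvia);
        sylvia.friends.add(mark);
        return luca;
    }

    public static void main(String[] args) {
        System.out.println(namesInCity(traverse(sampleRoot(), 2), "Oxford"));
    }
}
```

With a depth limit of 2, Mark is three hops away and is never visited, so he is left out even though his city matches.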


Query vs traversal

Once you have a well-connected database in the form of a Super Graph,
you can cross records instead of querying them!

All you need is some root vertices from which to start traversing.
Query vs traversal

[Figure: root vertices «Customers», «Special Customers» and «Stocks»; customer vertices «Luca», «John» and «Sylvia»; order vertices «Order 2332» and «Order 8834»; stock vertex «White Soap». One vertex is annotated «This is a root vertex».]

This is your database




Get the last customer who bought Whisky

select last(orders.customers) from Stock
   where name = 'Whisky'




Get its country



select city.country from #34:22




Get orders from that country




                   select orders from #55:12




NuvolaBase.com

[Figure: connections labelled HTTP/REST]

The first Graph Database as a Service on the Cloud
Do we have enough time for a demo?




Questions & (maybe) Answers

Luca Garulli
CEO at NuvolaBase Ltd, London UK
Author of OrientDB, the Document-Graph NoSQL Open Source project
www.twitter.com/lgarulli
Summary

1) JOIN is heavy, especially on large databases

2) GraphDB uses LINKs as direct pointers to records:
   traversal cost goes from O(log N) to near O(1)

3) GraphDB has a query language specialized for traversing relationships
Let’s move like a Spider on the web

Switching from relational to the graph model

  • 1.
    Switching from the Relational to the Graph model Luca Garulli – Founder and CEO @NuvolaBase Ltd Author of OrientDB Doc/Graph DB Nov 23rd 2012 in Oxford, UK (c) Luca Garulli Licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License Page 1 www.orientechnologies.com
  • 2.
    One of themain resistances of RDBMS users to pass to a NoSQL product are related to the complexity of the model: Ok, NoSQL products are super for BigData and BigScale but... (c) Luca Garulli Licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License Page 2
  • 3.
    ...what about themodel? (c) Luca Garulli Licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License Page 3
  • 4.
    What is theNoSQL answer about managing complex domains? Key-Value stores ? Column-Based ? Document database ? Graph database ! (c) Luca Garulli Licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License Page 4
  • 5.
    CAUTION! This presentation will not use a social like domain with the classic paradigm of friend-of-friendN where the graph databases are already widely used... (c) Luca Garulli Licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License Page 5
  • 6.
    ...But rather wewill explore how to think «graphically» with one of the most common domains in the enterprise world: The old-classic CRM* domain * today in 99% of the cases a RDBMS is used (c) Luca Garulli Licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License Page 6
  • 7.
    Every developer knows the Relational Model, but who knows the Graph one? (c) Luca Garulli Licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License Page 7
  • 8.
    Back to school: Graph Theory crash course (c) Luca Garulli Licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License Page 8
  • 9.
    Basic Graph All Your All Your Likes Luca Luca Base Base Conference Conference (c) Luca Garulli Licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License Page 9
  • 10.
    Property Graph Model* Vertices are directed Luca Luca All Your Base All Your Base Likes name: Luca name: Luca Conference Conference surname: Garulli surname: Garulli since: 2012 company: NuvolaBase company: NuvolaBase date: Nov 23 2012 date: Nov 23 2012 Vertices and Edges can have properties * https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model (c) Luca Garulli Licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License Page 10
  • 11.
    Property Graph Model Likes 012 since: 2 All Your All Your Luca Luca Base Base Speak Conference Conference s ti abstra tle: «Switch ct: «Th in is talk g...» presen ts...» An Edge connects 2 vertices: use multiple vertices to represents 1-N and N-M relationships (c) Luca Garulli Licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License Page 11
  • 12.
    Property Graph Model
    Vertices: Luca, John, Oxford, All Your Base Conference
    Edges: Studies, Likes, located, FriendOf, Organizes
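    The property graph model of the last slides can be sketched in a few lines of plain Java. This is only an illustration, assuming hypothetical class names (PropertyGraphSketch, Vertex, Edge) that belong to no library: vertices and edges both carry a property map, and every edge is directed from one vertex to another.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model, for illustration only (not the Blueprints or OrientDB API).
class PropertyGraphSketch {

    static class Vertex {
        final String name;
        final Map<String, Object> properties = new LinkedHashMap<>();
        final List<Edge> out = new ArrayList<>(); // edges leaving this vertex
        final List<Edge> in = new ArrayList<>();  // edges arriving at this vertex

        Vertex(String name) { this.name = name; }

        Vertex set(String key, Object value) { properties.put(key, value); return this; }
    }

    static class Edge {
        final String label;
        final Vertex from;
        final Vertex to;
        final Map<String, Object> properties = new LinkedHashMap<>();

        Edge(String label, Vertex from, Vertex to) {
            this.label = label;
            this.from = from;
            this.to = to;
        }

        Edge set(String key, Object value) { properties.put(key, value); return this; }
    }

    // An edge connects exactly two vertices and is registered on both ends.
    static Edge connect(Vertex from, String label, Vertex to) {
        Edge edge = new Edge(label, from, to);
        from.out.add(edge);
        to.in.add(edge);
        return edge;
    }

    // Rebuilds the Luca / conference example: two edges between the same pair of vertices.
    static Vertex buildExample() {
        Vertex luca = new Vertex("Luca").set("surname", "Garulli").set("company", "NuvolaBase");
        Vertex conference = new Vertex("All Your Base Conference").set("date", "Nov 23 2012");
        connect(luca, "Likes", conference).set("since", 2012);
        connect(luca, "Speaks", conference).set("title", "Switching...");
        return luca;
    }

    public static void main(String[] args) {
        Vertex luca = buildExample();
        for (Edge edge : luca.out) {
            System.out.println(luca.name + " --" + edge.label + edge.properties + "--> " + edge.to.name);
        }
    }
}
```

    Keeping both an out and an in list on each vertex lets a relationship be walked from either end without any search.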
  • 13.
    Congratulations, this is your diploma in «Graph Theory»
  • 14.
    Now let's go back to our domain: the CRM
  • 15.
    Domain: the superminimal CRM
    Registry system: Customer, Address
    Order system: Order, Stock
  • 16.
    Domain: the superminimal CRM
    Registry system: Customer, Address
    Order system: Order, Stock
    How does a Relational DBMS manage relationships?
  • 17.
    Relational World: 1-1 Relationships
    Customer (primary key: Id, foreign key: Address)
      Id | Name  | Address
      10 | Luca  | 34
      11 | Mike  | 44
      34 | John  | 54
      56 | Mark  | 66
      88 | Steve | 68
    Address (primary key: Id)
      Id | Location
      34 | Rome
      44 | London
      54 | Oxford
      66 | New Mexico
      68 | Palo Alto
    JOIN Customer.Address -> Address.Id
  • 18.
    Relational World: 1-N Relationships
    Customer
      Id | Name
      10 | Luca
      11 | Mike
      34 | John
      56 | Mark
      88 | Steve
    Address
      Id | Customer | Location
      24 | 10       | Rome
      33 | 10       | London
      44 | 34       | Oxford
      66 | 56       | Cologne
      68 | 88       | Palo Alto
    Inverse JOIN Address.Customer -> Customer.Id
  • 19.
    Relational World: N-M Relationships
    Customer
      Id | Name
      10 | Luca
      11 | Mike
      34 | John
      56 | Mark
      88 | Steve
    CustomerAddress
      Id | Address
      10 | 24
      10 | 33
      11 | 44
    Address
      Id | Location
      24 | Rome
      33 | London
      44 | Oxford
      66 | Cologne
      68 | Palo Alto
    Additional table with 2 JOINs: (1) CustomerAddress.Id -> Customer.Id and (2) CustomerAddress.Address -> Address.Id
  • 21.
    What’s wrong with the Relational Model?
  • 22.
    The JOIN is evil!
    Customer
      Id | Name
      10 | Luca
      11 | Mike
      34 | John
      56 | Mark
      88 | Steve
    CustomerAddress
      Id | Address
      10 | 24
      10 | 33
      34 | 24
    Address
      Id | Location
      24 | Rome
      33 | London
      44 | Oxford
      66 | Cologne
      68 | Palo Alto
    These are all JOINs executed every time you traverse a relationship!
  • 23.
    A JOIN means searching for a key in another table
    The first rule for improving performance is to index all the keys
    An index speeds up searches, but slows down inserts, updates and deletes
  • 24.
    So in the best case a JOIN is a lookup into an index
    This is done for every single JOIN!
    If you traverse hundreds of relationships you’re executing hundreds of JOINs
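    To make the "JOIN = index lookup" point concrete, here is a small Java sketch that resolves Customer.Address -> Address.Id for the rows of the 1-1 example. It assumes a java.util.TreeMap (a balanced tree) as a stand-in for the database index; the class and method names are hypothetical and this is not how any specific RDBMS is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Illustrative only: a JOIN resolved as one index lookup per row.
class JoinAsIndexLookup {

    // Customer table from the 1-1 example, stored column by column.
    static final int[] CUSTOMER_ID = {10, 11, 34, 56, 88};
    static final String[] CUSTOMER_NAME = {"Luca", "Mike", "John", "Mark", "Steve"};
    static final int[] CUSTOMER_ADDRESS = {34, 44, 54, 66, 68}; // foreign key

    // Index on Address.Id; TreeMap is a balanced (red-black) tree, so get() costs O(log N).
    static TreeMap<Integer, String> addressIndex() {
        TreeMap<Integer, String> index = new TreeMap<>();
        index.put(34, "Rome");
        index.put(44, "London");
        index.put(54, "Oxford");
        index.put(66, "New Mexico");
        index.put(68, "Palo Alto");
        return index;
    }

    // JOIN Customer.Address -> Address.Id: every row pays for its own lookup.
    static List<String> join(TreeMap<Integer, String> index) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < CUSTOMER_ID.length; i++) {
            String location = index.get(CUSTOMER_ADDRESS[i]); // one tree search per row
            rows.add(CUSTOMER_ID[i] + " " + CUSTOMER_NAME[i] + " -> " + location);
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String row : join(addressIndex())) {
            System.out.println(row);
        }
    }
}
```

    Five customers mean five tree searches; traversing hundreds of relationships repeats the same search hundreds of times.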
  • 25.
    Index Lookup: is it really that fast?
  • 26.
    Index Lookup: how does it work?
    A-Z
    A-L | M-Z
    Think of an address book where we have to find Luca’s phone number
  • 27.
    Index Lookup: how does it work?
    A-Z
    A-L | M-Z
    A-D | E-L | M-R | S-Z
    Index algorithms are all similar and based on balanced trees
  • 28.
    Index Lookup: how does it work?
    A-Z
    A-L | M-Z
    A-D | E-L | M-R | S-Z
    A-B | C-D | E-G | H-L
  • 29.
    Index Lookup: how does it work?
    A-Z
    A-L | M-Z
    A-D | E-L | M-R | S-Z
    A-B | C-D | E-G | H-L
    E-F | G | H-J | K-L
  • 30.
    Index Lookup: how does it work?
    A-Z
    A-L | M-Z
    A-D | E-L | M-R | S-Z
    A-B | C-D | E-G | H-L
    E-F | G | H-J | K-L
    Luca
    Found! This lookup took 5 steps, and the number of steps grows with the index size!
  • 31.
    An index lookup is executed for each JOIN
    Querying more tables can easily produce millions of JOINs/lookups!
    Here is the rule: more entries = more lookup steps = slower JOIN
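    The rule "more entries = more lookup steps" can be checked with a short Java sketch. It assumes a plain binary search over sorted keys as a simplified stand-in for a balanced-tree index (real indexes differ in fan-out and layout), and the class name IndexLookupSteps is hypothetical.

```java
// Illustrative only: counts how many probes a binary search needs as the index grows.
class IndexLookupSteps {

    // Searches for `key` among the sorted keys 0..size-1 and returns the number of probes.
    static int lookupSteps(long size, long key) {
        long lo = 0;
        long hi = size - 1;
        int steps = 0;
        while (lo <= hi) {
            long mid = (lo + hi) >>> 1; // halve the remaining range
            steps++;
            if (mid == key) {
                return steps;
            }
            if (mid < key) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return steps; // key not present
    }

    public static void main(String[] args) {
        long[] sizes = {1_000L, 1_000_000L, 1_000_000_000L};
        for (long size : sizes) {
            System.out.println(size + " entries: " + lookupSteps(size, 0) + " steps to find key 0");
        }
    }
}
```

    The step count grows with the logarithm of the number of entries: a thousand times more entries adds roughly ten more steps to every single lookup.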
  • 32.
    Oh! This is why the performance of my database drops when it becomes bigger, and bigger, and bigger!
  • 33.
    Is there a better way to manage relationships?
  • 34.
    “A graph database is any storage system that provides index-free adjacency” - Marko Rodriguez
  • 35.
    How does a GraphDB manage index-free relationships?
  • 36.
    OrientDB: an Open Source (Apache licensed) document-graph NoSQL DBMS that supports transactions, extended SQL, multi-master replication, etc.
  • 37.
    OrientDB: traverse a relationship
    The Record ID (RID) is the physical position
    Vertex Luca, RID = #13:35: label: 'Customer', name: 'Luca', out: [#14:54]
    Edge Lives, RID = #14:54: label: 'Lives', out: [#13:35], in: [#13:100]
    Vertex Rome, RID = #13:100: label: 'Address', name: 'Rome', in: [#14:54]
  • 38.
    A GraphDB handles relationships as a physical LINK to the record, assigned when the edge is created
    On the other side, an RDBMS computes the relationship every time you query the database
    Isn't that crazy?!
  • 39.
    This means jumping from an O(log N) algorithm to a near O(1) one: the traversal cost is no longer affected by the database size!
    This is huge in the Big Data age
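    The idea of a link as a physical position can be sketched in plain Java. This is a simplified illustration under stated assumptions, not OrientDB's actual storage engine: each cluster is modelled as a list, a RID is the pair (cluster, position), and the class names are hypothetical. Positions here start at 0, so the RIDs differ from the #13:35 style values on the slides.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: records addressed by position, links stored as RIDs.
class IndexFreeAdjacency {

    // Record ID: cluster number plus position inside that cluster.
    static final class Rid {
        final int cluster;
        final int position;

        Rid(int cluster, int position) {
            this.cluster = cluster;
            this.position = position;
        }

        @Override
        public String toString() { return "#" + cluster + ":" + position; }
    }

    static final class Record {
        final String label;
        final String name;
        final List<Rid> out = new ArrayList<>();
        final List<Rid> in = new ArrayList<>();

        Record(String label, String name) {
            this.label = label;
            this.name = name;
        }
    }

    private final Map<Integer, List<Record>> clusters = new HashMap<>();

    // Appends the record to its cluster; the returned RID is simply where it landed.
    Rid store(int cluster, Record record) {
        List<Record> records = clusters.computeIfAbsent(cluster, c -> new ArrayList<>());
        records.add(record);
        return new Rid(cluster, records.size() - 1);
    }

    // Loading by RID is direct positional access: no key search, whatever the cluster size.
    Record load(Rid rid) {
        return clusters.get(rid.cluster).get(rid.position);
    }

    // Builds Luca --Lives--> Rome and follows the links from Luca to the address name.
    static String whereDoesLucaLive() {
        IndexFreeAdjacency db = new IndexFreeAdjacency();
        Rid luca = db.store(13, new Record("Customer", "Luca"));
        Rid rome = db.store(13, new Record("Address", "Rome"));
        Rid lives = db.store(14, new Record("Lives", null));
        db.load(lives).out.add(luca); // edge.out points at the source vertex
        db.load(lives).in.add(rome);  // edge.in points at the target vertex
        db.load(luca).out.add(lives);
        db.load(rome).in.add(lives);

        Record edge = db.load(db.load(luca).out.get(0));
        return db.load(edge.in.get(0)).name;
    }

    public static void main(String[] args) {
        System.out.println("Luca lives in " + whereDoesLucaLive());
    }
}
```

    Each hop is a positional read, so the cost of following the relationship does not depend on how many records the clusters contain.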
  • 40.
    OrientDB in the Blueprints micro-benchmark, on common hardware with a hot cache, traverses 29.6 million records in less than 5 seconds: about 6 million nodes traversed per second!
    Do not try this at home with an RDBMS*!
    * unless you live in Google’s server farm
  • 41.
    Create the graph in SQL
    $luca> cd bin
    $luca> ./console.sh
    OrientDB console v.1.3.0-SNAPSHOT (www.orientdb.org)
    Type 'help' to display all the commands supported.
    orientdb> create vertex Customer set name = 'Luca'
    Created vertex #13:35 in 0.03 secs
    orientdb> create vertex Address set name = 'Rome'
    Created vertex #13:100 in 0.02 secs
    orientdb> create edge Lives from #13:35 to #13:100
    Created edge #14:54 in 0.02 secs
  • 42.
    Create the graph in Java
    OGraphDatabase graph = new OGraphDatabase("local:/tmp/db/graph");
    ODocument luca = graph.createVertex("Customer");
    luca.field("name", "Luca");
    ODocument rome = graph.createVertex("Address");
    rome.field("name", "Rome");
    ODocument edge = graph.createEdge(luca, rome, "Lives");
    edge.save();
    graph.close();
  • 43.
    Query the graph in SQL
    orientdb> select in.out from Address where name = 'Rome'
    ---+------+---------+--------------------+--------------------+--------+
      #| RID  |@class   |label               |out                 |in      |
    ---+------+---------+--------------------+--------------------+--------+
      0| 13:35|Customer |Luca                |[#14:54]            |        |
    ---+------+---------+--------------------+--------------------+--------+
    1 item(s) found. Query executed in 0.007 sec(s).
    in.out: incoming vertices
  • 44.
    More on query power
    orientdb> select sum( orders.total ) from Customer where name = 'Luca'
    orientdb> traverse friend from Customer while $depth <= 7
    orientdb> select from ( traverse friend from Customer while $depth <= 7 ) where city.name = 'Oxford'
  • 45.
    Query vs traversal
    Once you have a well-connected database in the form of a Super Graph, you can cross records instead of querying them!
    All you need is some root vertices from which to start traversing
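    Crossing records from a root vertex, as opposed to querying, can be sketched as a depth-limited breadth-first walk in Java. This is a minimal illustration in the spirit of "traverse ... while $depth <= 7", not OrientDB's implementation: it assumes the root sits at depth 0, and the friend links and the class name are made up for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: walks outgoing links from a root vertex up to a maximum depth.
class DepthLimitedTraversal {

    static List<String> traverse(Map<String, List<String>> links, String root, int maxDepth) {
        List<String> visited = new ArrayList<>();
        Map<String, Integer> depthOf = new HashMap<>(); // doubles as the "already seen" set
        Deque<String> queue = new ArrayDeque<>();
        depthOf.put(root, 0);
        queue.add(root);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            visited.add(current);
            int depth = depthOf.get(current);
            if (depth >= maxDepth) {
                continue; // do not follow links beyond the depth limit
            }
            for (String next : links.getOrDefault(current, Collections.<String>emptyList())) {
                if (!depthOf.containsKey(next)) {
                    depthOf.put(next, depth + 1);
                    queue.add(next);
                }
            }
        }
        return visited;
    }

    // Hypothetical friend links, including a cycle back to the root.
    static Map<String, List<String>> sampleFriends() {
        Map<String, List<String>> links = new HashMap<>();
        links.put("Luca", Arrays.asList("John"));
        links.put("John", Arrays.asList("Sylvia"));
        links.put("Sylvia", Arrays.asList("Luca"));
        return links;
    }

    public static void main(String[] args) {
        System.out.println(traverse(sampleFriends(), "Luca", 1)); // [Luca, John]
        System.out.println(traverse(sampleFriends(), "Luca", 7)); // [Luca, John, Sylvia]
    }
}
```

    No key is searched at any point: the walk only follows links out of records it has already reached, which is why a root vertex is all that is needed to start.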
  • 46.
    Query vs traversal
    Figure: graph with the vertices Special Customers, Customers, Stocks, Luca, John, Sylvia, White Soap, Order 2332 and Order 8834; callout: "This is a root vertex"
  • 47.
    This is your database
  • 48.
    Get the last customer who bought Whisky
    select last(orders.customers) from Stock where name = 'Whisky'
  • 49.
    Get its country
    select city.country from #34:22
  • 50.
    Get orders from that country
    select orders from #55:12
  • 51.
    NuvolaBase.com (HTTP/REST)
    The first Graph Database as a Service on the Cloud
  • 52.
    Do we have enough time for a demo?
  • 53.
    Questions & (maybe) Answers
    Luca Garulli, CEO at NuvolaBase Ltd, London UK; author of the OrientDB document-graph NoSQL open source project
    www.twitter.com/lgarulli
  • 54.
    Summary
    1) JOINs are heavy, especially on large databases
    2) A GraphDB uses LINKs as direct pointers to records: traversal cost goes from O(log N) to near O(1)
    3) A GraphDB has a query language specialized for traversing relationships
  • 55.
    Let’s move like a Spider on the web