Switching from relational to the graph model


Luca Garulli
Founder and CEO @NuvolaBase Ltd
Author of OrientDB Doc/Graph DB
Nov 23rd 2012 in Oxford, UK
(c) Luca Garulli. Licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License
www.orientechnologies.com

One of the main reasons
RDBMS users resist moving to a NoSQL product
is the
complexity of the model:

OK, NoSQL products are great for
BigData and BigScale
but...

...what about the model?


What is the NoSQL answer
about managing complex domains?

Key-Value stores?
Column-Based?
Document database?
Graph database!

CAUTION!
This presentation will not use a
social-like domain with
the classic paradigm of
friend-of-friendN
where the graph databases
are already widely used...

...But rather we will explore how
to think «graphically» with one of the
most common domains in the
enterprise world:

The good old CRM* domain

* today an RDBMS is used in 99% of cases


Every developer knows
the Relational Model,
but who knows the
Graph one?

Back to school:
Graph Theory crash course


Basic Graph

[Diagram: Luca --Likes--> All Your Base Conference]


Property Graph Model*
Edges are
directed

[Diagram: Luca --Likes--> All Your Base Conference]
Vertex Luca: name: Luca, surname: Garulli, company: NuvolaBase
Edge Likes: since: 2012
Vertex All Your Base Conference: date: Nov 23 2012

Vertices and Edges
can have properties

* https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model

Property Graph Model

[Diagram: two edges between the same pair of vertices]
Luca --Likes--> All Your Base Conference
  since: 2012
Luca --Speaks--> All Your Base Conference
  title: «Switching...»
  abstract: «This talk presents...»

An Edge connects 2
vertices: use multiple
edges to represent 1-N
and N-M relationships
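
The property graph described above can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions: the classes `Vertex` and `Edge` and the method `sampleLuca` are hypothetical names, not the Blueprints or OrientDB API, and the `name` property on the conference vertex is added here just to make the output readable.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical classes for illustration only; not the Blueprints or OrientDB API.
class PropertyGraphSketch {

    static class Vertex {
        final Map<String, Object> properties = new LinkedHashMap<>();
        final List<Edge> out = new ArrayList<>(); // edges leaving this vertex
        final List<Edge> in = new ArrayList<>();  // edges arriving at this vertex
    }

    static class Edge {
        final String label;
        final Vertex from; // an edge is directed: from -> to
        final Vertex to;
        final Map<String, Object> properties = new LinkedHashMap<>();

        Edge(String label, Vertex from, Vertex to) {
            this.label = label;
            this.from = from;
            this.to = to;
            from.out.add(this);
            to.in.add(this);
        }
    }

    // Builds the example from the slides: two edges between the same two vertices.
    static Vertex sampleLuca() {
        Vertex luca = new Vertex();
        luca.properties.put("name", "Luca");
        luca.properties.put("surname", "Garulli");
        luca.properties.put("company", "NuvolaBase");

        Vertex conference = new Vertex();
        conference.properties.put("name", "All Your Base Conference"); // assumed property
        conference.properties.put("date", "Nov 23 2012");

        Edge likes = new Edge("Likes", luca, conference);
        likes.properties.put("since", 2012);

        Edge speaks = new Edge("Speaks", luca, conference);
        speaks.properties.put("title", "Switching...");
        return luca;
    }

    public static void main(String[] args) {
        Vertex luca = sampleLuca();
        for (Edge edge : luca.out) {
            System.out.println(luca.properties.get("name") + " --" + edge.label + "--> "
                    + edge.to.properties.get("name") + " " + edge.properties);
        }
    }
}
```

A 1-N or N-M relationship needs no extra structure in this model: each additional relationship is just one more `Edge` in the `out` and `in` lists.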

Property Graph Model

[Diagram: vertices Luca, John, Oxford and All Your Base Conference, connected by the edges Studies, Likes, located, FriendOf and Organizes]


Congratulations, this is your diploma in
«Graph Theory»


Now let's go back
to our domain:
the CRM

Domain: the super minimal CRM

Registry system: Customer, Address
Order system: Order, Stock


How does a
Relational DBMS
manage relationships?


Relational World: 1-1 Relationships

Customer (primary key: Id, foreign key: Address)
Id   Name    Address
10   Luca    34
11   Mike    44
34   John    54
56   Mark    66
88   Steve   68

Address (primary key: Id)
Id   Location
34   Rome
44   London
54   Oxford
66   New Mexico
68   Palo Alto

JOIN Customer.Address -> Address.Id


Relational World: 1-N Relationships

Customer
Id   Name
10   Luca
11   Mike
34   John
56   Mark
88   Steve

Address
Id   Customer   Location
24   10         Rome
33   10         London
44   34         Oxford
66   56         Cologne
68   88         Palo Alto

Inverse JOIN Address.Customer -> Customer.Id


Relational World: N-M Relationships

Customer
Id   Name
10   Luca
11   Mike
34   John
56   Mark
88   Steve

CustomerAddress
Id   Address
10   24
10   33
11   44

Address
Id   Location
24   Rome
33   London
44   Oxford
66   Cologne
68   Palo Alto

Additional table with 2 JOINs
(1) CustomerAddress.Id -> Customer.Id and
(2) CustomerAddress.Address -> Address.Id


What’s wrong with the
Relational Model?


The JOIN is evil!

[Diagram: Customer 10 Luca -> CustomerAddress (10, 24) -> Address 24 Rome]

These are all JOINs executed
every time you traverse a
relationship!

A JOIN means searching for a key in
another table

The first rule to improve performance
is indexing all the keys

An index speeds up searches, but slows down
inserts, updates and deletes

So in the best case a JOIN is a lookup
into an index

This is done for every single JOIN!

If you traverse hundreds of relationships
you’re executing hundreds of JOINs
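
To make the cost concrete, here is a minimal in-memory sketch of the N-M example, assuming each table key is indexed with a balanced tree (Java's `TreeMap`). The class and field names are hypothetical and this is not how any particular RDBMS is implemented; the point is only that every hop across a relationship costs one more index lookup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

// Illustrative model only; names and structure are assumptions, not a real RDBMS engine.
class JoinAsIndexLookup {

    // Index on Address.Id (TreeMap is a balanced red-black tree).
    static final TreeMap<Integer, String> ADDRESS_BY_ID = new TreeMap<>();

    // Index on CustomerAddress.Id (the customer key) -> address ids.
    static final TreeMap<Integer, List<Integer>> ADDRESS_IDS_BY_CUSTOMER = new TreeMap<>();

    // Counts how many index lookups the "JOINs" perform.
    static int indexLookups = 0;

    static {
        ADDRESS_BY_ID.put(24, "Rome");
        ADDRESS_BY_ID.put(33, "London");
        ADDRESS_BY_ID.put(44, "Oxford");
        ADDRESS_BY_ID.put(66, "Cologne");
        ADDRESS_BY_ID.put(68, "Palo Alto");

        ADDRESS_IDS_BY_CUSTOMER.put(10, Arrays.asList(24, 33));
        ADDRESS_IDS_BY_CUSTOMER.put(11, Arrays.asList(44));
    }

    // Resolves Customer -> CustomerAddress -> Address for one customer.
    static List<String> locationsOf(int customerId) {
        List<String> locations = new ArrayList<>();

        indexLookups++; // JOIN (1): search the customer key in CustomerAddress
        List<Integer> addressIds = ADDRESS_IDS_BY_CUSTOMER.get(customerId);
        if (addressIds == null) {
            return locations; // customer has no addresses
        }
        for (int addressId : addressIds) {
            indexLookups++; // JOIN (2): search each address key in Address
            locations.add(ADDRESS_BY_ID.get(addressId));
        }
        return locations;
    }

    public static void main(String[] args) {
        System.out.println(locationsOf(10) + " after " + indexLookups + " index lookups");
    }
}
```

For customer 10 the sketch performs three lookups to return two addresses: one into the join table plus one per address.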


Index Lookup:
is it really that fast?


Index Lookup: how does it work?

Think of an
Address Book
where we have to find
Luca's phone
number

Index algorithms are all
similar and based on
balanced trees

[Diagram: the tree is narrowed one level at a time]
A-Z
A-L | M-Z
A-D | E-L | M-R | S-Z
A-B | C-D | E-G | H-L
E-F | G | H-J | K-L

Path followed: A-Z -> A-L -> E-L -> H-L -> K-L -> Luca

Found!
This lookup took 5
steps, and the number of steps
grows with the index
size!


An index lookup is executed
for each JOIN

Querying more tables can easily
produce millions of JOINs/Lookups!

Here is the rule: more entries
= more lookup steps = slower JOIN
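
The rule above can be checked with a small experiment. The sketch below uses a binary search over the sorted keys 0..size-1 as a stand-in for a balanced-tree index (a simplification, since real indexes are usually B-trees with wider nodes) and counts the steps needed to reach the last key. The class and method names are hypothetical.

```java
// Binary search stands in for a balanced-tree index; names are illustrative assumptions.
class IndexLookupSteps {

    // Number of comparisons needed to find `target` among the sorted keys 0..size-1,
    // or -1 if the key is not present.
    static int stepsToFind(int size, int target) {
        int low = 0;
        int high = size - 1;
        int steps = 0;
        while (low <= high) {
            steps++;
            int mid = (low + high) >>> 1; // middle key of the remaining range
            if (mid == target) {
                return steps;
            }
            if (mid < target) {
                low = mid + 1;  // continue in the upper half
            } else {
                high = mid - 1; // continue in the lower half
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] sizes = {1_000, 1_000_000, 1_000_000_000};
        for (int size : sizes) {
            // Look up the last key in an index of the given size.
            System.out.println(size + " entries -> " + stepsToFind(size, size - 1) + " steps");
        }
        // prints 10, 20 and 30 steps respectively
    }
}
```

Each thousandfold growth of the index adds about ten steps to every single lookup, and a query that crosses many relationships pays that price once per JOIN.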

Oh! This is why
the performance of my database
drops as
it becomes bigger,
and bigger,
and bigger!

Is there a better way to
manage relationships?


“A graph database is any
storage system
that provides
index-free adjacency”
- Marko Rodriguez


How does a GraphDB manage
index-free relationships?


OrientDB:
an Open Source (Apache licensed)
document-graph NoSQL DBMS
supports: transactions, extended SQL,
Multi-Master replication, etc.

OrientDB: traverse a relationship
The Record ID (RID)
is the physical position

[Diagram: Luca --Lives--> Rome]

Vertex Luca (RID = #13:35)
  label: 'Customer'
  name: 'Luca'
  out: [#14:54]

Edge Lives (RID = #14:54)
  label: 'Lives'
  out: [#13:35]
  in: [#13:100]

Vertex Rome (RID = #13:100)
  label: 'Address'
  name: 'Rome'
  in: [#14:54]


A GraphDB handles relationships as a
physical LINK to the record,
assigned when the edge is created

on the other hand

an RDBMS computes the
relationship every time you query the database

Isn't that crazy?!

This means jumping from an
O(log N) algorithm to a near O(1) one

traversal cost is no longer affected
by database size!
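
The idea of a link as a physical position can be sketched as follows. This is a toy model, assuming records live in in-memory lists and a RID is simply a (cluster, position) pair; the classes `Rid` and `Record` and the cluster numbers are hypothetical, and OrientDB's real storage engine is considerably more involved.

```java
import java.util.ArrayList;
import java.util.List;

// Toy storage to illustrate index-free adjacency; not OrientDB's actual implementation.
class IndexFreeAdjacency {

    // A record id: which cluster, and which position inside that cluster.
    static class Rid {
        final int cluster;
        final int position;

        Rid(int cluster, int position) {
            this.cluster = cluster;
            this.position = position;
        }
    }

    // Vertices and edges are both records holding links to other records.
    static class Record {
        final String label;
        final String name;
        final List<Rid> out = new ArrayList<>();
        final List<Rid> in = new ArrayList<>();

        Record(String label, String name) {
            this.label = label;
            this.name = name;
        }
    }

    // clusters.get(c).get(p) is the record whose RID is #c:p.
    static final List<List<Record>> clusters = new ArrayList<>();

    static Rid store(int cluster, Record record) {
        while (clusters.size() <= cluster) {
            clusters.add(new ArrayList<Record>());
        }
        List<Record> records = clusters.get(cluster);
        records.add(record);
        return new Rid(cluster, records.size() - 1);
    }

    // Direct positional access: no key search, so the cost does not depend on
    // how many records are stored.
    static Record load(Rid rid) {
        return clusters.get(rid.cluster).get(rid.position);
    }

    // The links are assigned once, when the edge is created.
    static Rid connect(Rid from, Rid to, String label) {
        Record edge = new Record(label, null);
        edge.out.add(from);
        edge.in.add(to);
        Rid edgeRid = store(1, edge); // cluster 1 holds edges in this sketch
        load(from).out.add(edgeRid);
        load(to).in.add(edgeRid);
        return edgeRid;
    }

    // Follows every outgoing edge of a vertex and collects the names on the other side.
    static List<String> outNeighbours(Rid vertex) {
        List<String> names = new ArrayList<>();
        for (Rid edgeRid : load(vertex).out) {
            for (Rid target : load(edgeRid).in) {
                names.add(load(target).name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Rid luca = store(0, new Record("Customer", "Luca"));
        Rid rome = store(0, new Record("Address", "Rome"));
        connect(luca, rome, "Lives");
        System.out.println("Luca lives in " + outNeighbours(luca));
    }
}
```

Traversing from Luca to Rome touches exactly three records by position, however many other records the clusters contain.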

This is huge in the BigData age


OrientDB in the Blueprints micro-benchmark,
on common hardware, with a hot cache,
traverses 29.6 million
records in less than 5 seconds

about 6 million nodes traversed per second!
Do not try this at home
with an RDBMS*!

*unless you live in Google's server farm

Create the graph in SQL
$luca> cd bin
$luca> ./console.sh
OrientDB console v.1.3.0-SNAPSHOT (www.orientdb.org)
Type 'help' to display all the commands supported.

orientdb> create vertex Customer set name = 'Luca'
Created vertex #13:35 in 0.03 secs

orientdb> create vertex Address set name = 'Rome'
Created vertex #13:100 in 0.02 secs

orientdb> create edge Lives from #13:35 to #13:100
Created edge #14:54 in 0.02 secs

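Create the graph in Java
import com.orientechnologies.orient.core.db.graph.OGraphDatabase;
import com.orientechnologies.orient.core.record.impl.ODocument;

// Open the graph database (assumes it already exists with the default admin user)
OGraphDatabase graph = new OGraphDatabase("local:/tmp/db/graph");
graph.open("admin", "admin");

// Create the two vertices and set their properties
ODocument luca = graph.createVertex("Customer");
luca.field("name", "Luca");

ODocument rome = graph.createVertex("Address");
rome.field("name", "Rome");

// Connect them with a "Lives" edge and save it
ODocument edge = graph.createEdge(luca, rome, "Lives");
edge.save();

graph.close();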


Query the graph in SQL

orientdb> select in.out from Address where name = 'Rome'
---+------+---------+--------------------+--------------------+--------+
  #| RID  |@class   |label               |out                 |in      |
---+------+---------+--------------------+--------------------+--------+
  0| 13:35|Customer |Luca                |[#14:54]            |        |
---+------+---------+--------------------+--------------------+--------+
1 item(s) found. Query executed in 0.007 sec(s).

(in.out = incoming vertices)


More on query power
orientdb> select sum( orders.total ) from Customer
          where name = 'Luca'

orientdb> traverse friend from Customer while $depth <= 7

orientdb> select from (
            traverse friend from Customer while $depth <= 7
          ) where city.name = 'Oxford'
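
Conceptually, a depth-limited traverse like the one above follows the `friend` links outward from a starting record and stops at a maximum depth. The sketch below shows that idea in plain Java using a breadth-first walk; it is an assumption-laden illustration, not OrientDB's traversal engine, and the visiting order of the real TRAVERSE command may differ. The friendship links in `sampleGraph` are invented for the example.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch of a depth-limited traversal; not OrientDB's implementation.
class DepthLimitedTraversal {

    // Returns every vertex reachable from `start` within `maxDepth` hops, mapped to
    // the depth at which it was first reached. The start vertex has depth 0.
    static Map<String, Integer> traverse(Map<String, List<String>> friends,
                                         String start, int maxDepth) {
        Map<String, Integer> depthOf = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        depthOf.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            int depth = depthOf.get(current);
            if (depth == maxDepth) {
                continue; // do not expand beyond the depth limit
            }
            List<String> next = friends.get(current);
            if (next == null) {
                continue; // no outgoing friend links
            }
            for (String friend : next) {
                if (!depthOf.containsKey(friend)) { // visit each vertex only once
                    depthOf.put(friend, depth + 1);
                    queue.add(friend);
                }
            }
        }
        return depthOf;
    }

    // Invented friendship links between the customers used in the slides.
    static Map<String, List<String>> sampleGraph() {
        Map<String, List<String>> friends = new HashMap<>();
        friends.put("Luca", Arrays.asList("John", "Mike"));
        friends.put("John", Arrays.asList("Mark"));
        friends.put("Mark", Arrays.asList("Steve", "Luca"));
        return friends;
    }

    public static void main(String[] args) {
        System.out.println(traverse(sampleGraph(), "Luca", 2));
    }
}
```

With a limit of 2, Steve (three hops away from Luca) is never visited, and the cycle back to Luca does not cause an endless loop because every vertex is expanded at most once.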


Query vs traversal

Once you have a well-connected database
in the form of a Super Graph, you can
traverse records instead of querying them!

All you need is a few root vertices
to start traversing from

Query vs traversal

[Diagram: root vertices (Special Customers, Customers, Stocks); customers (Luca, John, Sylvia); stock (White Soap); orders (Order 2332, Order 8834). Callout: "This is a root vertex"]


This is your database


Get the last customer who bought Whisky
select last(orders.customers) from Stock
where name = 'Whisky'


Get their country

select city.country from #34:22


Get orders from that country

select orders from #55:12


NuvolaBase.com

HTTP/REST

The first Graph Database as a Service
on the Cloud

Do we have enough time for a demo?


Questions & (maybe) Answers
Luca Garulli
CEO at NuvolaBase Ltd, London UK

OrientDB: Document-Graph NoSQL
Open Source project

www.twitter.com/lgarulli

Summary
1) JOINs are heavy, especially on large databases

2) GraphDB uses LINKs as
direct pointers to records:
traversal time goes from O(log N) to near O(1)

3) GraphDB has a query language specialized to
traverse relationships

Let’s move like a
Spider
on the web

