Switching from Relational 2 Graph - CloudConf.it
1. Switching from the
Relational to the
Graph model
Luca Garulli
Founder and CEO @NuvolaBase Ltd
Author of OrientDB
Cloud Conference, Apr 18th 2013 in Turin, Italy
(c) Luca Garulli Licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License Page 1
www.orientechnologies.com
2. 1979
First Relational DBMS available as product
2009
NoSQL movement
3. 1979
First Relational DBMS available as product
Hey, 30 years in the
IT field is a very long time!
2009
NoSQL movement
4. Before 2009 teams of developers
always fought to select:
Operating System
Programming Language
Middleware (App-Servers)
What about the Database?
5. One of the main reasons
RDBMS users resist moving to a NoSQL product
is the
complexity of the model:
OK, NoSQL products are great for
Big Data and big scale
but...
6. ...what about the model?
7. What is the NoSQL answer
for managing complex domains?
Key-Value stores?
Column-Based?
Document database?
Graph database!
8. CAUTION!
This presentation will not use a
social-like domain with
the classic paradigm of
friend-of-friend^N
where graph databases
are already widely used...
9. ...But rather we will explore how
to think «graphically» with one of the
most common domains in the
enterprise world:
The classic CRM* domain
* today an RDBMS is used in 99% of cases
10. Every developer knows
the Relational Model,
but who knows the
Graph one?
11. Back to school:
Graph Theory crash course
13. Property Graph Model*
Edges are
directed
[Diagram: vertex «Luca» (name: Luca, surname: Garulli, company: NuvolaBase) --Likes (since: 2013)--> vertex «Cloud Conference» (date: Oct 1st 2012)]
Vertices and Edges
can have properties
* https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model
14. Property Graph Model
[Diagram: «Luca» --Likes (since: 2013)--> «Cloud Conference»; «Luca» --Speaks (title: «Switching...», abstract: «This talk presents...»)--> «Cloud Conference»]
An Edge connects 2
vertices: use multiple edges
to represent 1-N and N-M
relationships
15. Property Graph Model
[Diagram: vertices «Luca», «Walter», «Turin» and «Cloud Conference», connected by the edges Studies, Likes, located, FriendOf and Organizes]
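The property graph model from the last three slides can be sketched in a few lines of Java. This is only an illustrative sketch, not OrientDB or Blueprints code: the class names `Vertex` and `Edge`, the helpers `connect` and `neighbours`, and the direction chosen for the FriendOf and Organizes edges are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PropertyGraphSketch {
    /** A vertex carries its own properties plus the list of its outgoing edges. */
    static class Vertex {
        final Map<String, Object> props = new LinkedHashMap<>();
        final List<Edge> out = new ArrayList<>();

        Vertex(String name) {
            props.put("name", name);
        }
    }

    /** An edge is directed (from -> to), has a label, and may carry properties too. */
    static class Edge {
        final String label;
        final Vertex from;
        final Vertex to;
        final Map<String, Object> props = new LinkedHashMap<>();

        Edge(String label, Vertex from, Vertex to) {
            this.label = label;
            this.from = from;
            this.to = to;
        }
    }

    /** Creates a directed edge and registers it on the source vertex. */
    static Edge connect(Vertex from, String label, Vertex to) {
        Edge edge = new Edge(label, from, to);
        from.out.add(edge);
        return edge;
    }

    /** Names of the vertices reached from v through outgoing edges with the given label. */
    static List<String> neighbours(Vertex v, String label) {
        List<String> names = new ArrayList<>();
        for (Edge edge : v.out) {
            if (edge.label.equals(label)) {
                names.add((String) edge.to.props.get("name"));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Vertex luca = new Vertex("Luca");
        Vertex walter = new Vertex("Walter");
        Vertex conference = new Vertex("Cloud Conference");

        connect(luca, "Likes", conference).props.put("since", 2013);
        // Edge directions below are assumed for illustration only.
        connect(luca, "FriendOf", walter);
        connect(walter, "Organizes", conference);

        System.out.println(neighbours(luca, "Likes"));
    }
}
```

Several edges can leave the same vertex, which is how 1-N and N-M relationships are expressed without any extra table.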
16. Congratulations, this is your diploma in
«Graph Theory»
17. Now let's go back
to our domain:
the CRM
18. Domain: the super minimal CRM
Registry system: Customer, Address
Order system: Order, Stock
19. Domain: the super minimal CRM
Registry system: Customer, Address
Order system: Order, Stock
How does a
Relational DBMS
manage relationships?
20. Relational World: 1-1 Relationships
Customer (primary key: Id, foreign key: Address)
Id | Name  | Address
10 | Luca  | 34
11 | Jill  | 44
34 | John  | 54
56 | Mark  | 66
88 | Steve | 68
Address (primary key: Id)
Id | Location
34 | Rome
44 | London
54 | Moscow
66 | New Mexico
68 | Palo Alto
JOIN Customer.Address -> Address.Id
21. Relational World: 1-N Relationships
Customer
Id | Name
10 | Luca
11 | Jill
34 | John
56 | Mark
88 | Steve
Address
Id | Customer | Location
24 | 10       | Rome
33 | 10       | London
44 | 34       | Moscow
66 | 56       | Cologne
68 | 88       | Palo Alto
Inverse JOIN Address.Customer -> Customer.Id
22. Relational World: N-M Relationships
Customer
Id | Name
10 | Luca
11 | Jill
34 | John
56 | Mark
88 | Steve
CustomerAddress
Id | Address
10 | 24
10 | 33
34 | 44
Address
Id | Location
24 | Rome
33 | London
44 | Moscow
66 | Cologne
68 | Palo Alto
Additional table with 2 JOINs
(1) CustomerAddress.Id -> Customer.Id and
(2) CustomerAddress.Address -> Address.Id
23. What’s wrong with the
Relational Model?
24. The JOIN is evil!
Customer
Id | Name
10 | Luca
11 | Jill
34 | John
56 | Mark
88 | Steve
CustomerAddress
Id | Address
10 | 24
10 | 33
34 | 24
Address
Id | Location
24 | Rome
33 | London
44 | Moscow
66 | Cologne
68 | Palo Alto
These are all JOINs executed
every time you traverse a
relationship!
25. A JOIN means searching for a key in
another table
The first rule to improve performance
is indexing all the keys
An index speeds up searches, but slows down
inserts, updates and deletes
26. So in the best case a JOIN is a lookup
into an index
This is done for every single JOIN!
If you traverse hundreds of relationships
you’re executing hundreds of JOINs
27. Index Lookup
is it really that fast?
28. Index Lookup: how does it work?
A-Z
A-L | M-Z
Think of an
address book
where we have to find
Luca’s phone
number
29. Index Lookup: how does it work?
A-Z
A-L | M-Z
A-D | E-L | M-R | S-Z
Index algorithms are all
similar and based on
balanced trees
30. Index Lookup: how does it work?
A-Z
A-L | M-Z
A-D | E-L | M-R | S-Z
A-B | C-D | E-G | H-L
31. Index Lookup: how does it work?
A-Z
A-L | M-Z
A-D | E-L | M-R | S-Z
A-B | C-D | E-G | H-L
E-F | G | H-J | K-L
32. Index Lookup: how does it work?
A-Z
A-L | M-Z
A-D | E-L | M-R | S-Z
A-B | C-D | E-G | H-L
E-F | G | H-J | K-L
Luca
Found!
This lookup took 5
steps, and the number of steps
grows with the index
size!
33. An index lookup is executed
for each JOIN
Querying more tables can easily
produce millions of JOINs/Lookups!
Here is the rule: more entries
= more lookup steps = slower JOIN
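The rule "more entries = more lookup steps" can be checked with a small sketch. As a simplifying assumption, the balanced-tree index is stood in for by a binary search over a sorted array of keys, which halves the search range at every step just as the address-book example does. The names `lookupSteps` and `keys` are hypothetical helpers for this example.

```java
public class IndexLookupSketch {
    /** Binary search that returns how many keys were inspected before the search ended. */
    static int lookupSteps(long[] sortedKeys, long key) {
        int lo = 0;
        int hi = sortedKeys.length - 1;
        int steps = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            steps++;
            if (sortedKeys[mid] == key) {
                return steps;          // found
            }
            if (sortedKeys[mid] < key) {
                lo = mid + 1;          // continue in the upper half
            } else {
                hi = mid - 1;          // continue in the lower half
            }
        }
        return steps;                  // key not present
    }

    /** Builds a sorted "index" holding the keys 0 .. n-1. */
    static long[] keys(int n) {
        long[] result = new long[n];
        for (int i = 0; i < n; i++) {
            result[i] = i;
        }
        return result;
    }

    public static void main(String[] args) {
        // The same lookup needs more steps as the index grows.
        for (int n : new int[] {1_000, 1_000_000}) {
            long[] index = keys(n);
            System.out.println(n + " entries -> " + lookupSteps(index, n - 1) + " steps");
        }
    }
}
```

The step count grows logarithmically with the number of entries, and that cost is paid again for every JOIN.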
34. Oh! This is why
the performance of my database
drops when
it becomes bigger,
and bigger,
and bigger!
35. Is there a better way to
manage relationships?
36. “A graph database is any
storage system
that provides
index-free adjacency”
- Marko Rodriguez
(author of TinkerPop Blueprints)
37. How does a GraphDB manage
index-free relationships?
38. OrientDB: an Open Source (Apache licensed)
document-graph NoSQL DBMS
39. Ø config
download, unzip, run!
cut & paste the db directory
40. 150,000 records per second
(flat records, no index, on commodity hw)
41. Schema-less
schema is not mandatory, relaxed model,
collect heterogeneous documents all together
42. Schema-full
schema with constraints on fields and validation rules
Customer.age > 17
Customer.address not null
Customer.surname is mandatory
Customer.email matches '\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b'
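To make the four constraints concrete, here is a minimal sketch of the same rules written as plain Java checks. It does not use the OrientDB schema API; the `violations` helper is hypothetical, the regular expression assumes the `\b` and `\.` escapes, and the case-insensitive flag is an added assumption so that lowercase addresses pass.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class CustomerRulesSketch {
    // Email rule from the slide; CASE_INSENSITIVE is an assumption of this sketch.
    static final Pattern EMAIL = Pattern.compile(
            "\\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}\\b", Pattern.CASE_INSENSITIVE);

    /** Returns the violated rules; an empty list means the record passes all checks. */
    static List<String> violations(Integer age, String address, String surname, String email) {
        List<String> errors = new ArrayList<>();
        if (age != null && age <= 17) {
            errors.add("Customer.age > 17");
        }
        if (address == null) {
            errors.add("Customer.address not null");
        }
        if (surname == null) {
            errors.add("Customer.surname is mandatory");
        }
        if (email == null || !EMAIL.matcher(email).matches()) {
            errors.add("Customer.email matches");
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(violations(36, "Rome", "Garulli", "luca@example.com"));
        System.out.println(violations(16, null, "Garulli", "not-an-email"));
    }
}
```

In schema-full mode the database enforces rules like these itself, so the application does not need to repeat them.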
43. Schema-mixed
schema with mandatory and optional fields + constraints
the best of schema-less and schema-full modes
44. ACID Transactions
db.begin();
try {
    // your code
    ...
    db.commit();
} catch (Exception e) {
    db.rollback();
}
45. Complex types
native support for collections, maps (key/value)
and embedded documents
no more additional tables to handle them
46. SQL
select * from employee where name like '%Jay%' and status=0
47. Java®
runs everywhere a JRE 1.6+ is available
robust engine
48. Language bindings
Java as native
JRuby, PHP, C, C++, Scala, .NET,
Ruby, Clojure, Node.js,
Python, JavaScript and more!
49. JPA (partial)
public class Customer {
    @Id
    private Object id;
    private String name;
    private String surname;
}
db.save( new Customer() );
50. Born for the Internet
Natively supports the HTTP/RESTful protocol
Documents are transferred in JSON
51. MVRB-Tree index
the best of B+Tree and RB-Tree
fast on browsing, low insertion cost
it's a new algorithm!
52. Security
users and roles, encrypted passwords
fine-grained privileges
(similar to what RDBMSs offer)
53. Cache
You can avoid using third-party caches
like Memcached
2 Levels of cache:
Level 1: Database level, 1 per thread
Level 2: Storage level, 1 per JVM
55. Polymorphic SQL Query
[Class diagram: OGraphVertex (V) is extended by Person (Address : Address) and Vehicle (brand : BRANDS); Person is extended by Customer (totSold : float) and Provider (totBuyed : float)]
select * from Person
where city.name = 'Rome'
Queries are polymorphic:
subclasses of Person can be
part of the result set
56. Let’s go back
to the Graph Stuff
How does OrientDB
manage relationships?
57. OrientDB: traverse a relationship
The Record ID (RID)
is the physical position
[Diagram: vertex «Luca» (RID = #13:35, label: 'Customer', name: 'Luca') and vertex «Rome» (RID = #13:100, label = 'Address', name = 'Rome')]
58. OrientDB: traverse a relationship
The Edge’s RID is saved
inside both vertices, as
«out» and «in»
[Diagram: vertex «Luca» (RID = #13:35, out: [#14:54], label: 'Customer', name: 'Luca') --edge «Lives» (RID = #14:54, out: [#13:35], in: [#13:100], label: 'Lives')--> vertex «Rome» (RID = #13:100, in: [#14:54], label = 'Address', name = 'Rome')]
59. OrientDB: traverse a relationship
[Diagram: vertex «Luca» (RID = #13:35, out: [#14:54], label: 'Customer', name: 'Luca') --edge «Lives» (RID = #14:54, out: [#13:35], in: [#13:100], label: 'Lives')--> vertex «Rome» (RID = #13:100, in: [#14:54], label = 'Address', name = 'Rome')]
60. OrientDB: traverse a relationship
[Diagram: vertex «Luca» (RID = #13:35, out: [#14:54], label: 'Customer', name: 'Luca') --edge «Lives» (RID = #14:54, out: [#13:35], in: [#13:100], label: 'Lives')--> vertex «Rome» (RID = #13:100, in: [#14:54], label = 'Address', name = 'Rome')]
61. A GraphDB handles a relationship as a
physical LINK to the record,
assigned when the edge is created
An RDBMS, on the other hand,
computes the
relationship every time you query the database
Isn't that crazy?!
62. This means jumping from an
O(log N) algorithm to a near O(1) one:
traversal cost is no longer affected
by database size!
This is huge in the Big Data age
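The difference between the two access paths can be sketched as follows. This is an illustration under stated assumptions, not OrientDB's storage engine: the relational side uses a `java.util.TreeMap` (a red-black tree) as the index on the address key, while the graph side treats the link as the record's position inside a list standing in for a cluster. The method names `joinLookup` and `followLink` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class LinkVsJoinSketch {
    /** Relational style: the foreign key must be searched for in a tree index, O(log N). */
    static String joinLookup(TreeMap<Integer, String> addressIndex, int addressId) {
        return addressIndex.get(addressId);
    }

    /** Graph style: the link is the record's position, so following it is a direct access. */
    static String followLink(List<String> addressCluster, int position) {
        return addressCluster.get(position);
    }

    public static void main(String[] args) {
        // Index on Address.Id, as in the relational slides.
        TreeMap<Integer, String> addressIndex = new TreeMap<>();
        addressIndex.put(34, "Rome");
        addressIndex.put(44, "London");

        // The same records stored by position, as a RID-like pointer would address them.
        List<String> addressCluster = new ArrayList<>();
        addressCluster.add("Rome");
        addressCluster.add("London");

        System.out.println(joinLookup(addressIndex, 34));
        System.out.println(followLink(addressCluster, 0));
    }
}
```

Both calls return the same record, but only the first one gets slower as the number of addresses grows.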
63. OrientDB in the Blueprints micro-benchmark,
on common hw, with a hot cache,
traverses 29.6 million
records in less than 5 seconds:
about 6 million nodes traversed per second!
Do not try this at home
with an RDBMS*!
*unless you live in Google’s server farm
64. Create the graph in SQL
$luca> cd bin
$luca> ./console.sh
OrientDB console v.1.3.0-SNAPSHOT (www.orientdb.org)
Type 'help' to display all the commands supported.
orientdb> create vertex Customer set name = 'Luca'
Created vertex #13:35 in 0.03 secs
orientdb> create vertex Address set name = 'Rome'
Created vertex #13:100 in 0.02 secs
orientdb> create edge Lives from #13:35 to #13:100
Created edge #14:54 in 0.02 secs
66. Query the graph in SQL
orientdb> select in_lives.out from Address where name = 'Rome'
---+------+---------+--------------------+--------------------+--------+
  #| RID  |@class   |label               |out_lives           |in      |
---+------+---------+--------------------+--------------------+--------+
  0| 13:35|Customer |Luca                |[#14:54]            |        |
---+------+---------+--------------------+--------------------+--------+
1 item(s) found. Query executed in 0.007 sec(s).
Incoming vertices
67. More on query power
orientdb> select sum( out_Order.in.total ) from Customer
          where name = 'Luca'
orientdb> traverse out_Friend.in, in_Friend.out
          from Customer while $depth <= 7
orientdb> select from (
            traverse out_Friend.in, in_Friend.out
            from Customer while $depth <= 7
          ) where @class='Customer' and city.name = 'Turin'
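What a depth-limited traversal such as `while $depth <= 7` computes can be sketched in plain Java. This is an assumption-laden illustration, not OrientDB's implementation: the graph is a hypothetical in-memory adjacency map that already lists friends in both directions, and the visit order is breadth-first purely for simplicity.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DepthLimitedTraversalSketch {
    /** Returns every vertex reachable from start in at most maxDepth hops (start included). */
    static Set<String> traverse(Map<String, List<String>> friends, String start, int maxDepth) {
        Set<String> visited = new LinkedHashSet<>();
        Map<String, Integer> depth = new HashMap<>();
        Deque<String> frontier = new ArrayDeque<>();

        visited.add(start);
        depth.put(start, 0);
        frontier.add(start);

        while (!frontier.isEmpty()) {
            String current = frontier.poll();
            int currentDepth = depth.get(current);
            if (currentDepth == maxDepth) {
                continue; // do not expand beyond the depth limit
            }
            for (String next : friends.getOrDefault(current, Collections.<String>emptyList())) {
                if (visited.add(next)) { // add() is false when the vertex was already seen
                    depth.put(next, currentDepth + 1);
                    frontier.add(next);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, List<String>> friends = new HashMap<>();
        friends.put("Luca", List.of("Jill"));
        friends.put("Jill", List.of("Luca", "Mark"));
        friends.put("Mark", List.of("Jill"));
        System.out.println(traverse(friends, "Luca", 1));
    }
}
```

Filtering the returned set by class or city afterwards corresponds to wrapping the traverse in an outer select, as in the last query on the slide.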
68. Query vs traversal
Once you have a well-connected database
in the form of a Super Graph you can
traverse records instead of querying them!
All you need is a few root vertices
to start traversing from
69. Query vs traversal
[Diagram: a graph linking the vertices «Special Customers», «Customers» and «Stocks» to the customers Luca, Jill and Mark, the stock item White Soap, and the orders 2332 and 8834. Callout: "This is a root vertex"]
70. Temporal based graph
[Diagram: Calendar -> Year 2013 -> Month April 2013 -> Day 9/4/2013 -> Hour 9/4/2013 09:00, Hour 9/4/2013 10:00 -> Order 2332, Order 2333, Order 2334]
71. Location based graph
[Diagram: Location -> Country Italy -> Region Lazio -> State RM -> City Fiumicino, City Rome -> Order 2332, Order 2333, Order 2334]
72. Mix & Merge graphs
[Diagram: the same orders are reachable from both graphs]
Location -> Country Italy -> Region Lazio -> State RM -> City Fiumicino, City Rome -> Order 2332, Order 2333, Order 2334
Calendar -> Year 2013 -> Month April 2013 -> Day 9/4/2013 -> Hour 9/4/2013 09:00, Hour 9/4/2013 10:00 -> Order 2332, Order 2333, Order 2334
73. This is your database
74. Get the last customer who bought 'Barolo'
select last(out_Order.in.out_Customer.in) from Stock
where name = 'Barolo'
#34:22
75. Get his country
select out_City.in from #34:22
Turin, Italy
#55:12
76. Get orders from that country
select in_Customer.out from #55:12
77. NuvolaBase.com
HTTP/REST
The first Graph Database as a Service
on the Cloud
78. Do we have enough time for a live demo?
80. Questions & (maybe) Answers
Luca Garulli
CEO at NuvolaBase Ltd, London UK
Author of OrientDB, Document-Graph NoSQL
Open Source project
www.twitter.com/lgarulli
Conclusions at the end ->
81. Summary
1) JOIN is heavy, especially on large databases
2) GraphDB uses LINKs as direct pointers to records:
traversal time goes from O(log N) to near O(1)
= ready for Big Data
3) GraphDB has a query language specialized to
traverse relationships
82. Let’s move like a
Spider
on the web
Editor's Notes
Good afternoon! Today I’d like to show you a new way to design a database. In 1970 Relational DBMS