Switching from relational to the graph model

Presentation at the "All Your Base" Conference (http://allyourbaseconf.com/)

  1. Switching from the Relational to the Graph model
     Luca Garulli, Founder and CEO @NuvolaBase Ltd, Author of OrientDB Doc/Graph DB
     Nov 23rd 2012 in Oxford, UK
     (c) Luca Garulli Licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License
     www.orientechnologies.com
  2. One of the main reasons RDBMS users resist moving to a NoSQL product is the complexity of the model: OK, NoSQL products are great for BigData and BigScale, but...
  3. ...what about the model?
  4. What is the NoSQL answer to managing complex domains? Key-value stores? Column-based? Document database? Graph database!
  5. CAUTION! This presentation will not use a social-network-like domain with the classic friend-of-a-friend paradigm, where graph databases are already widely used...
  6. ...but rather we will explore how to think «graphically» with one of the most common domains in the enterprise world: the old, classic CRM* domain. (* Today an RDBMS is used in 99% of cases.)
  7. Every developer knows the Relational Model, but who knows the Graph one?
  8. Back to school: Graph Theory crash course
  9. Basic Graph: Luca --Likes--> All Your Base Conference
  10. Property Graph Model*: edges are directed, and vertices and edges can have properties.
      Luca {name: Luca, surname: Garulli, company: NuvolaBase} --Likes {since: 2012}--> All Your Base Conference {date: Nov 23 2012}
      * https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model
  11. Property Graph Model
      Luca --Likes {since: 2012}--> All Your Base Conference
      Luca --Speaks {title: «Switching...», abstract: «This talk presents...»}--> All Your Base Conference
      An edge connects 2 vertices: use multiple edges to represent 1-N and N-M relationships.
  12. Property Graph Model: a richer graph with the vertices Luca, John, Oxford and All Your Base Conference, connected by the edges Studies, Likes, located, FriendOf and Organizes.
  13. Congratulations, this is your diploma in «Graph Theory»
  14. Now let's go back to our domain: the CRM
  15. Domain: the super-minimal CRM. Registry system: Customer, Address. Order system: Order, Stock.
  16. Domain: the super-minimal CRM (Registry system: Customer, Address; Order system: Order, Stock). How does a relational DBMS manage relationships?
  17. Relational World: 1-1 Relationships
      Customer (Id: primary key; Address: foreign key)
        Id | Name  | Address
        10 | Luca  | 34
        11 | Mike  | 44
        34 | John  | 54
        56 | Mark  | 66
        88 | Steve | 68
      Address (Id: primary key)
        Id | Location
        34 | Rome
        44 | London
        54 | Oxford
        66 | New Mexico
        68 | Palo Alto
      JOIN Customer.Address -> Address.Id
  18. Relational World: 1-N Relationships
      Customer
        Id | Name
        10 | Luca
        11 | Mike
        34 | John
        56 | Mark
        88 | Steve
      Address
        Id | Customer | Location
        24 | 10       | Rome
        33 | 10       | London
        44 | 34       | Oxford
        66 | 56       | Cologne
        68 | 88       | Palo Alto
      Inverse JOIN Address.Customer -> Customer.Id
  19. Relational World: N-M Relationships
      Customer
        Id | Name
        10 | Luca
        11 | Mike
        34 | John
        56 | Mark
        88 | Steve
      CustomerAddress
        Id | Address
        10 | 24
        10 | 33
        11 | 44
      Address
        Id | Location
        24 | Rome
        33 | London
        44 | Oxford
        66 | Cologne
        68 | Palo Alto
      Additional table with 2 JOINs: (1) CustomerAddress.Id -> Customer.Id and (2) CustomerAddress.Address -> Address.Id
  21. What’s wrong with the Relational Model?
  22. The JOIN is evil!
      Customer
        Id | Name
        10 | Luca
        11 | Mike
        34 | John
        56 | Mark
        88 | Steve
      CustomerAddress
        Id | Address
        10 | 24
        10 | 33
        34 | 24
      Address
        Id | Location
        24 | Rome
        33 | London
        44 | Oxford
        66 | Cologne
        68 | Palo Alto
      These are all JOINs executed every time you traverse a relationship!
  23. A JOIN means searching for a key in another table. The first rule to improve performance is to index all the keys. An index speeds up searches, but slows down inserts, updates and deletes.
  24. So in the best case a JOIN is a lookup into an index. This is done for every single join! If you traverse hundreds of relationships, you're executing hundreds of JOINs.
  25. Index lookup: is it really that fast?
  26. Index Lookup: how does it work? Think of an address book where we have to find Luca’s phone number. The search starts from the whole range A-Z, split into A-L and M-Z.
  27. Index Lookup: how does it work? A-Z splits into A-L and M-Z; A-L splits into A-D and E-L; M-Z splits into M-R and S-Z. Index algorithms are all similar and based on balanced trees.
  28. Index Lookup: how does it work? One level deeper: A-D splits into A-B and C-D; E-L splits into E-G and H-L.
  29. Index Lookup: how does it work? One level deeper: E-G splits into E-F and G; H-L splits into H-J and K-L.
  30. Index Lookup: how does it work? Found! The path A-Z, A-L, E-L, H-L, K-L leads to Luca. This lookup took 5 steps, and the number of steps grows with the index size!
  31. An index lookup is executed for each JOIN. Querying more tables can easily produce millions of JOINs/lookups! Here is the rule: more entries = more lookup steps = slower JOIN.
  32. Oh! This is why the performance of my database drops as it gets bigger, and bigger, and bigger!
  33. Is there a better way to manage relationships?
  34. “A graph database is any storage system that provides index-free adjacency” - Marko Rodriguez
  35. How does a GraphDB manage index-free relationships?
  36. OrientDB: an Open Source (Apache-licensed) document-graph NoSQL DBMS. It supports transactions, extended SQL, Multi-Master replication, etc.
  37. OrientDB: traverse a relationship. The Record ID (RID) is the physical position.
      Vertex #13:35 (Luca): label: 'Customer', name: 'Luca', out: [#14:54]
      Edge #14:54 (Lives): label: 'Lives', out: [#13:35], in: [#13:100]
      Vertex #13:100 (Rome): label: 'Address', name: 'Rome', in: [#14:54]
      Luca --Lives--> Rome
  38. A GraphDB handles relationships as a physical LINK to the record, assigned when the edge is created. On the other side, an RDBMS computes the relationship every time you query the database. Isn't that crazy?!
  39. This means jumping from an O(log N) algorithm to a near O(1) one: traversing cost is no longer affected by database size! This is huge in the BigData age.
  40. In the Blueprints micro-benchmark, on common hardware with a hot cache, OrientDB traverses 29.6 million records in less than 5 seconds: about 6 million nodes traversed per second! Do not try this at home with an RDBMS*! (* unless you live in Google’s server farm)
  41. Create the graph in SQL
      $luca> cd bin
      $luca> ./console.sh
      OrientDB console v.1.3.0-SNAPSHOT (www.orientdb.org)
      Type help to display all the commands supported.
      orientdb> create vertex Customer set name = 'Luca'
      Created vertex #13:35 in 0.03 secs
      orientdb> create vertex Address set name = 'Rome'
      Created vertex #13:100 in 0.02 secs
      orientdb> create edge Lives from #13:35 to #13:100
      Created edge #14:54 in 0.02 secs
  42. Create the graph in Java
      import com.orientechnologies.orient.core.db.graph.OGraphDatabase;
      import com.orientechnologies.orient.core.record.impl.ODocument;
      // OrientDB 1.x API; assumes the database already exists with the default admin/admin user
      OGraphDatabase graph = new OGraphDatabase("local:/tmp/db/graph");
      graph.open("admin", "admin");
      ODocument luca = graph.createVertex("Customer");
      luca.field("name", "Luca");
      ODocument rome = graph.createVertex("Address");
      rome.field("name", "Rome");
      ODocument edge = graph.createEdge(luca, rome, "Lives");
      edge.save();
      graph.close();
  43. Query the graph in SQL
      orientdb> select in.out from Address where name = 'Rome'
      ---+-------+----------+-------+----------+----
        #| RID   | @class   | label | out      | in
      ---+-------+----------+-------+----------+----
        0| 13:35 | Customer | Luca  | [#14:54] |
      ---+-------+----------+-------+----------+----
      1 item(s) found. Query executed in 0.007 sec(s).
      Result: the incoming vertices of Rome.
  44. More on query power
      orientdb> select sum( orders.total ) from Customer where name = 'Luca'
      orientdb> traverse friend from Customer while $depth <= 7
      orientdb> select from ( traverse friend from Customer while $depth <= 7 ) where city.name = 'Oxford'
  45. Query vs traversal: once you have a well-connected database in the form of a Super Graph, you can cross records instead of querying them! All you need is some root vertices from which to start traversing.
  46. Query vs traversal: root vertices such as Customers, Special Customers and Stocks are the starting points for reaching records such as the customers Luca, John and Sylvia, the stock item White Soap, and Orders 2332 and 8834.
  47. This is your database
  48. Get the last customer who bought Whisky:
      select last(orders.customers) from Stock where name = 'Whisky'
  49. Get its country:
      select city.country from #34:22
  50. Get orders from that country:
      select orders from #55:12
  51. NuvolaBase.com (HTTP/REST): the first Graph Database as a Service on the Cloud
  52. Do we have enough time for a demo?
  53. Questions & (maybe) Answers
      Luca Garulli, CEO at NuvolaBase Ltd, London UK; author of OrientDB, the Document-Graph NoSQL Open Source project
      www.twitter.com/lgarulli
  54. Summary
      1) JOIN is heavy, especially on large databases
      2) A GraphDB uses LINKs as direct pointers to records: cost goes from O(log N) to near O(1)
      3) A GraphDB has a query language specialized in traversing relationships
  55. Let’s move like a Spider on the web
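The relational mechanics on slides 17 to 24 can be sketched in Java, the language of slide 42. This is a minimal in-memory illustration, not how any particular RDBMS is implemented: it assumes the 1-N data of slide 18 and unique customer names, and it uses `java.util.TreeMap` (a balanced red-black tree) as a stand-in for a database index. The class names `RelationalSketch` and `RelationalSketchDemo` are hypothetical. Each traversal from a customer to its addresses costs index searches, and a counter makes that cost visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/**
 * Hypothetical in-memory model of the 1-N tables on slide 18. Each index is a
 * java.util.TreeMap (a balanced red-black tree), so every search below is one
 * O(log N) lookup. Customer names are assumed to be unique.
 */
final class RelationalSketch {
    static final class AddressRow {
        final int id;
        final int customerId;   // foreign key: Address.Customer -> Customer.Id
        final String location;

        AddressRow(int id, int customerId, String location) {
            this.id = id;
            this.customerId = customerId;
            this.location = location;
        }
    }

    // Index on Customer.Name, so the WHERE clause does not need a full scan.
    private final TreeMap<String, Integer> customerIdByName = new TreeMap<>();
    // Index on Address.Customer, the foreign key used by the inverse JOIN.
    private final TreeMap<Integer, List<AddressRow>> addressesByCustomer = new TreeMap<>();
    private int indexSearches = 0;

    void insertCustomer(int id, String name) {
        customerIdByName.put(name, id);             // every insert also updates the index
    }

    void insertAddress(int id, int customerId, String location) {
        addressesByCustomer.computeIfAbsent(customerId, k -> new ArrayList<>())
                .add(new AddressRow(id, customerId, location));
    }

    /** Roughly: SELECT Location FROM Customer JOIN Address ON Address.Customer = Customer.Id WHERE Name = ? */
    List<String> locationsOf(String customerName) {
        List<String> locations = new ArrayList<>();
        indexSearches++;                            // search 1: the Customer.Name index
        Integer customerId = customerIdByName.get(customerName);
        if (customerId == null) {
            return locations;
        }
        indexSearches++;                            // search 2: the JOIN on Address.Customer
        for (AddressRow row : addressesByCustomer.getOrDefault(customerId, List.of())) {
            locations.add(row.location);
        }
        return locations;
    }

    int indexSearches() {
        return indexSearches;
    }
}

class RelationalSketchDemo {
    public static void main(String[] args) {
        RelationalSketch db = new RelationalSketch();
        db.insertCustomer(10, "Luca");
        db.insertCustomer(11, "Mike");
        db.insertCustomer(34, "John");
        db.insertCustomer(56, "Mark");
        db.insertCustomer(88, "Steve");
        db.insertAddress(24, 10, "Rome");
        db.insertAddress(33, 10, "London");
        db.insertAddress(44, 34, "Oxford");
        db.insertAddress(66, 56, "Cologne");
        db.insertAddress(68, 88, "Palo Alto");
        System.out.println(db.locationsOf("Luca"));   // [Rome, London]
        System.out.println(db.indexSearches());       // 2
    }
}
```

Each call to `locationsOf` pays two tree searches, so a query that walks hundreds of relationships pays hundreds of them, which is the point slide 24 makes.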
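The address-book walk on slides 26 to 30 can be approximated with a binary search, which visits the nodes of an implicit balanced binary tree and halves the range at each step. Real database indexes are usually B-trees with a much larger fan-out, so the hypothetical `IndexLookup` class below only illustrates how the step count grows with the index size, not the exact number of page reads a given DBMS performs.

```java
/**
 * Hypothetical sketch of slides 26-30: a binary search over sorted keys walks an
 * implicit balanced binary tree (A-Z, then A-L or M-Z, then A-D or E-L, ...),
 * halving the range at each step. It counts how many nodes one lookup visits.
 */
final class IndexLookup {
    private IndexLookup() {}

    /** Returns the number of steps needed to find key, or -1 if it is absent. */
    static int lookupSteps(String[] sortedKeys, String key) {
        int low = 0;
        int high = sortedKeys.length - 1;
        int steps = 0;
        while (low <= high) {
            steps++;                                // visit one node of the tree
            int mid = (low + high) >>> 1;
            int cmp = sortedKeys[mid].compareTo(key);
            if (cmp == 0) {
                return steps;                       // found
            } else if (cmp < 0) {
                low = mid + 1;                      // continue in the right half
            } else {
                high = mid - 1;                     // continue in the left half
            }
        }
        return -1;
    }

    /** Worst-case steps for a successful lookup in an index of n synthetic keys. */
    static int worstCaseSteps(int n) {
        String[] keys = new String[n];
        for (int i = 0; i < n; i++) {
            keys[i] = String.format("key%09d", i);  // zero-padded, so already sorted
        }
        int worst = 0;
        for (String k : keys) {
            worst = Math.max(worst, lookupSteps(keys, k));
        }
        return worst;
    }
}

class IndexLookupDemo {
    public static void main(String[] args) {
        String[] addressBook = {"Anna", "John", "Luca", "Mark", "Mike", "Steve", "Sylvia"};
        System.out.println(IndexLookup.lookupSteps(addressBook, "Luca"));  // 3
        for (int n : new int[] {1_000, 100_000}) {
            System.out.println(n + " entries -> " + IndexLookup.worstCaseSteps(n) + " steps");
        }
    }
}
```

With these sizes the worst case is 10 steps for 1,000 entries and 17 for 100,000: one more step each time the index doubles, i.e. floor(log2 N) + 1. Traversing 100 relationships through the larger index can therefore cost up to 100 × 17 = 1,700 steps, while following stored links costs about 100 hops.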
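Index-free adjacency (slides 34 to 39) can be illustrated with plain Java object references standing in for RIDs. The hypothetical `GraphSketch` class below loosely mirrors the record layout of slide 37: each vertex keeps lists of its outgoing and incoming edges, and each edge points directly at its two vertices. It is not the OrientDB API and ignores persistence entirely; it only shows that traversal follows stored links instead of searching an index.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical in-memory sketch of index-free adjacency, modelled on slide 37.
 * Java object references stand in for OrientDB RIDs; this is not the OrientDB API.
 */
final class GraphSketch {
    static final class Vertex {
        final String label;
        final String name;
        final List<Edge> out = new ArrayList<>();   // like Luca's "out: [#14:54]"
        final List<Edge> in = new ArrayList<>();    // like Rome's "in: [#14:54]"

        Vertex(String label, String name) {
            this.label = label;
            this.name = name;
        }
    }

    static final class Edge {
        final String label;
        final Vertex out;                           // source vertex, like #13:35
        final Vertex in;                            // target vertex, like #13:100

        Edge(String label, Vertex out, Vertex in) {
            this.label = label;
            this.out = out;
            this.in = in;
        }
    }

    private GraphSketch() {}

    /** Like "create edge Lives from #13:35 to #13:100": links are stored once, at creation. */
    static Edge createEdge(String label, Vertex from, Vertex to) {
        Edge edge = new Edge(label, from, to);
        from.out.add(edge);
        to.in.add(edge);
        return edge;
    }

    /** Like "select in.out from Address where name = 'Rome'", starting from an already-found vertex. */
    static List<String> incomingVertexNames(Vertex vertex) {
        List<String> names = new ArrayList<>();
        for (Edge edge : vertex.in) {
            names.add(edge.out.name);               // pointer hops only, no index search
        }
        return names;
    }
}

class GraphSketchDemo {
    public static void main(String[] args) {
        GraphSketch.Vertex luca = new GraphSketch.Vertex("Customer", "Luca");
        GraphSketch.Vertex rome = new GraphSketch.Vertex("Address", "Rome");
        GraphSketch.createEdge("Lives", luca, rome);
        System.out.println(GraphSketch.incomingVertexNames(rome));  // [Luca]
    }
}
```

The cost of `incomingVertexNames` depends only on how many edges touch the vertex, not on how many vertices the graph holds, which is the near O(1) behaviour slide 39 describes. Finding the starting vertex (`where name = 'Rome'`) is a separate step that may still use an index.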
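The traversal queries on slides 44 to 50 start from some root records and then only follow links. The hypothetical `FriendTraversal` class below sketches `traverse friend from Customer while $depth <= 7` followed by the outer `where city.name = 'Oxford'` filter as a breadth-first walk with a visited set. OrientDB's actual traversal strategy and result ordering may differ, and the city is simplified to a string instead of a linked City vertex.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of slide 44's
 *   select from (traverse friend from Customer while $depth <= 7) where city.name = 'Oxford'
 * as a breadth-first walk over direct references. Not OrientDB's implementation.
 */
final class FriendTraversal {
    static final class Person {
        final String name;
        final String cityName;                      // simplification: a City vertex reduced to its name
        final List<Person> friends = new ArrayList<>();

        Person(String name, String cityName) {
            this.name = name;
            this.cityName = cityName;
        }
    }

    private FriendTraversal() {}

    /** Visits the roots (depth 0) and everyone reachable through "friend" links within maxDepth hops. */
    static Set<Person> traverse(List<Person> roots, int maxDepth) {
        Set<Person> visited = new LinkedHashSet<>();
        ArrayDeque<Person> frontier = new ArrayDeque<>();
        for (Person root : roots) {
            if (visited.add(root)) {
                frontier.add(root);
            }
        }
        for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
            ArrayDeque<Person> next = new ArrayDeque<>();
            for (Person person : frontier) {
                for (Person friend : person.friends) {
                    if (visited.add(friend)) {      // each hop follows a stored reference
                        next.add(friend);
                    }
                }
            }
            frontier = next;
        }
        return visited;
    }

    /** Filters the traversal result by city, like the outer "where city.name = ..." clause. */
    static List<String> namesInCity(List<Person> roots, int maxDepth, String cityName) {
        List<String> names = new ArrayList<>();
        for (Person person : traverse(roots, maxDepth)) {
            if (person.cityName.equals(cityName)) {
                names.add(person.name);
            }
        }
        return names;
    }
}

class FriendTraversalDemo {
    public static void main(String[] args) {
        FriendTraversal.Person luca = new FriendTraversal.Person("Luca", "Rome");
        FriendTraversal.Person john = new FriendTraversal.Person("John", "Oxford");
        FriendTraversal.Person sylvia = new FriendTraversal.Person("Sylvia", "Oxford");
        luca.friends.add(john);
        john.friends.add(sylvia);
        System.out.println(FriendTraversal.namesInCity(List.of(luca), 7, "Oxford"));  // [John, Sylvia]
        System.out.println(FriendTraversal.namesInCity(List.of(luca), 1, "Oxford"));  // [John]
    }
}
```

The visited set keeps cycles such as mutual friendships from being expanded twice, and the depth bound plays the role of `$depth <= N`.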
