Switching from Relational 2 Graph - CloudConf.it

  • Good afternoon! Today I’d like to show you a new way to design a database. In 1970 Relational DBMS

Presentation Transcript

  • Switching from the Relational to the Graph model. Luca Garulli, Founder and CEO @NuvolaBase Ltd, Author of OrientDB. Cloud Conference, Apr 18th 2013 in Turin, Italy. (c) Luca Garulli, Licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License. www.orientechnologies.com
  • 1979: first Relational DBMS available as a product. 2009: NoSQL movement. Hey, 30 years in the IT field is huge!
  • Before 2009, teams of developers always fought over which to select: Operating System, Programming Language, Middleware (App-Servers). What about the Database?
  • One of the main reasons RDBMS users resist moving to a NoSQL product is the complexity of the model: OK, NoSQL products are great for BigData and BigScale, but...
  • ...what about the model?
  • What is the NoSQL answer for managing complex domains? Key-Value stores? Column-based? Document database? Graph database!
  • CAUTION! This presentation will not use a social-like domain with the classic friend-of-friend^N paradigm, where graph databases are already widely used...
  • ...but rather we will explore how to think «graphically» about one of the most common domains in the enterprise world: the old classic CRM* domain. (* today an RDBMS is used in 99% of cases)
  • Every developer knows the Relational Model, but who knows the Graph one?
  • Back to school: Graph Theory crash course
  • Basic Graph: Luca --Likes--> Cloud Conference
  • Property Graph Model*: edges are directed; vertices and edges can have properties. Vertex Luca (name: Luca, surname: Garulli, company: NuvolaBase) --Likes (since: 2013)--> vertex Cloud Conference (date: Oct 1st 2012). * https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model
  • Property Graph Model: Luca --Likes (since: 2013)--> Cloud Conference; Luca --Speaks (title: «Switching...», abstract: «This talk presents...»)--> Cloud Conference. An edge connects 2 vertices: use multiple edges to represent 1-N and N-M relationships.
  • Property Graph Model: the vertices Luca, Walter, Turin and Cloud Conference, connected by the edges Studies, Likes, located, FriendOf and Organizes.
  • Congratulations, this is your diploma in «Graph Theory»
  • Now let's go back to our domain: the CRM
  • Domain: the super minimal CRM. Registry system: Customer, Address. Order system: Order, Stock.
  • How does a Relational DBMS manage relationships?
  • Relational World: 1-1 Relationships
      Customer (primary key: Id; foreign key: Address)
      Id  Name   Address
      10  Luca   34
      11  Jill   44
      34  John   54
      56  Mark   66
      88  Steve  68
      Address (primary key: Id)
      Id  Location
      34  Rome
      44  London
      54  Moscow
      66  New Mexico
      68  Palo Alto
      JOIN: Customer.Address -> Address.Id
  • Relational World: 1-N Relationships
      Customer
      Id  Name
      10  Luca
      11  Jill
      34  John
      56  Mark
      88  Steve
      Address
      Id  Customer  Location
      24  10        Rome
      33  10        London
      44  34        Moscow
      66  56        Cologne
      68  88        Palo Alto
      Inverse JOIN: Address.Customer -> Customer.Id
  • Relational World: N-M Relationships
      Customer
      Id  Name
      10  Luca
      11  Jill
      34  John
      56  Mark
      88  Steve
      CustomerAddress
      Id  Address
      10  24
      10  33
      34  44
      Address
      Id  Location
      24  Rome
      33  London
      44  Moscow
      66  Cologne
      68  Palo Alto
      Additional table with 2 JOINs: (1) CustomerAddress.Id -> Customer.Id and (2) CustomerAddress.Address -> Address.Id
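To make the two JOINs of the N-M case concrete, here is a minimal Java sketch that resolves a customer's locations by hand, using the CustomerAddress and Address rows from the slide. The class and method names are hypothetical, and the `HashMap` only stands in for a key index; a real RDBMS plans and executes joins differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class JoinTableSketch {
    // CustomerAddress rows from the slide: {customer id, address id}.
    static final int[][] CUSTOMER_ADDRESS = { {10, 24}, {10, 33}, {34, 44} };

    // Address table keyed by its primary key (a stand-in for the key index).
    static Map<Integer, String> addressTable() {
        Map<Integer, String> address = new HashMap<>();
        address.put(24, "Rome");
        address.put(33, "London");
        address.put(44, "Moscow");
        address.put(66, "Cologne");
        address.put(68, "Palo Alto");
        return address;
    }

    /** Resolves a customer's locations: filter the link rows, then look up each address. */
    static List<String> locationsOf(int customerId) {
        Map<Integer, String> address = addressTable();
        List<String> locations = new ArrayList<>();
        for (int[] row : CUSTOMER_ADDRESS) {
            if (row[0] == customerId) {                 // JOIN 1: CustomerAddress.Id -> Customer.Id
                String location = address.get(row[1]);  // JOIN 2: CustomerAddress.Address -> Address.Id
                if (location != null) {
                    locations.add(location);
                }
            }
        }
        return locations;
    }

    public static void main(String[] args) {
        System.out.println(locationsOf(10)); // prints [Rome, London]
        System.out.println(locationsOf(34)); // prints [Moscow]
    }
}
```

Every relationship hop costs one key search per matching row, which is the cost the following slides focus on.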
  • What's wrong with the Relational Model?
  • The JOIN is evil!
      Customer
      Id  Name
      10  Luca
      11  Jill
      34  John
      56  Mark
      88  Steve
      CustomerAddress
      Id  Address
      10  24
      10  33
      34  24
      Address
      Id  Location
      24  Rome
      33  London
      44  Moscow
      66  Cologne
      68  Palo Alto
      These are all JOINs executed every time you traverse a relationship!
  • A JOIN means searching for a key in another table. The first rule to improve performance is indexing all the keys. An index speeds up searches, but slows down inserts, updates and deletes.
  • So in the best case a JOIN is a lookup into an index. This is done for every single join! If you traverse hundreds of relationships, you're executing hundreds of JOINs.
  • Index lookup: is it really that fast?
  • Index lookup: how does it work? Think of an address book where we have to find Luca's phone number. Index algorithms are all similar and based on balanced trees: each level splits a range in two (A-Z into A-L and M-Z, A-L into A-D and E-L, E-L into E-G and H-L, H-L into H-J and K-L). The search path is A-Z -> A-L -> E-L -> H-L -> K-L -> Luca. Found! This lookup took 5 steps, and the number of steps grows with the index size!
  • An index lookup is executed for each JOIN. Querying more tables can easily produce millions of JOINs/lookups! Here is the rule: more entries = more lookup steps = slower JOIN
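The rule "more entries = more lookup steps" can be illustrated with a small Java sketch that counts the probes of a binary search over a sorted key array. This is only a stand-in for a balanced-tree index (real indexes use wider nodes and page caching), and the class and method names are hypothetical.

```java
final class IndexLookupSteps {
    /** Returns a sorted array holding the keys 0..n-1. */
    static int[] keys(int n) {
        int[] sorted = new int[n];
        for (int i = 0; i < n; i++) {
            sorted[i] = i;
        }
        return sorted;
    }

    /** Binary-searches sortedKeys for key and returns how many probes were made. */
    static int probesToFind(int[] sortedKeys, int key) {
        int lo = 0;
        int hi = sortedKeys.length - 1;
        int probes = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            probes++;
            if (sortedKeys[mid] == key) {
                return probes;
            }
            if (sortedKeys[mid] < key) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return probes; // key not present
    }

    public static void main(String[] args) {
        for (int n : new int[] {1_000, 1_000_000}) {
            System.out.println(n + " entries: " + probesToFind(keys(n), 0) + " probes");
        }
    }
}
```

Going from a thousand to a million entries roughly doubles the probes per lookup (about 10 versus about 20), and that cost is paid again for every JOIN.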
  • Oh! This is why the performance of my database drops when it becomes bigger, and bigger, and bigger!
  • Is there a better way to manage relationships?
  • “A graph database is any storage system that provides index-free adjacency” - Marko Rodriguez (author of TinkerPop Blueprints)
  • How does a GraphDB manage index-free relationships?
  • OrientDB: an Open Source (Apache licensed) document-graph NoSQL DBMS
  • Zero config: download, unzip, run! Cut & paste the db directory
  • 150,000 records per second (flat records, no index, on commodity hw)
  • Schema-less: schema is not mandatory, relaxed model, collect heterogeneous documents all together
  • Schema-full: schema with constraints on fields and validation rules. Customer.age > 17; Customer.address not null; Customer.surname is mandatory; Customer.email matches \b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b
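As an illustration of what these four rules check, here is a minimal plain-Java sketch that validates a customer document held in a `Map`. It is not OrientDB's schema API: the class, the `sample` helper and the violation messages are hypothetical, and a missing `age` is assumed to count as a violation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

final class CustomerRulesSketch {
    // The e-mail pattern from the slide, matched case-insensitively.
    static final Pattern EMAIL = Pattern.compile(
            "\\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}\\b", Pattern.CASE_INSENSITIVE);

    /** Returns one message per violated rule; an empty list means the document is valid. */
    static List<String> violations(Map<String, Object> customer) {
        List<String> found = new ArrayList<>();
        Object age = customer.get("age");
        if (!(age instanceof Integer) || (Integer) age <= 17) {
            found.add("age must be > 17"); // assumption: a missing age also fails
        }
        if (customer.get("address") == null) {
            found.add("address must not be null");
        }
        if (!customer.containsKey("surname")) {
            found.add("surname is mandatory");
        }
        Object email = customer.get("email");
        if (!(email instanceof String) || !EMAIL.matcher((String) email).matches()) {
            found.add("email is not valid");
        }
        return found;
    }

    /** Hypothetical helper: builds a document, skipping fields passed as null. */
    static Map<String, Object> sample(Integer age, String address, String surname, String email) {
        Map<String, Object> doc = new HashMap<>();
        if (age != null) doc.put("age", age);
        if (address != null) doc.put("address", address);
        if (surname != null) doc.put("surname", surname);
        if (email != null) doc.put("email", email);
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(violations(sample(30, "Rome", "Garulli", "luca@example.com")));
        System.out.println(violations(sample(16, null, null, "not-an-email")));
    }
}
```

In a schema-full database these checks live in the schema itself, so every client gets the same validation without repeating it in application code.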
  • Schema-mixed: schema with mandatory and optional fields + constraints; the best of schema-less and schema-full modes
  • ACID Transactions
      db.begin();
      try {
        // your code
        ...
        db.commit();
      } catch( Exception e ) {
        db.rollback();
      }
  • Complex types: native support for collections, maps (key/value) and embedded documents; no more additional tables to handle them
  • SQL
      select * from employee where name like '%Jay%' and status = 0
  • Java®: runs everywhere JRE 1.6+ is available; robust engine
  • Language bindings: Java as native; JRuby, PHP, C, C++, Scala, .NET, Ruby, Clojure, Node.js, Python, Javascript and more!
  • JPA (partial)
      public class Customer {
        @Id private Object id;
        private String name;
        private String surname;
      }
      db.save( new Customer() );
  • Born for the Internet: natively supports the HTTP/RESTful protocol. Documents are transferred in JSON.
  • MVRB-Tree index: the best of B+Tree and RB-Tree; fast on browsing, low insertion cost. It's a new algorithm!
  • Security: users and roles, encrypted passwords, fine-grained privileges (similar to what RDBMSs offer)
  • Cache: you can avoid using 3rd-party caches like Memcached. 2 levels of cache: Level 1: database level, 1 per thread; Level 2: storage level, 1 per JVM
  • Inheritance: OGraphVertex (V) is extended by Person (Address : Address) and Vehicle (brand : BRANDS); Person is extended by Customer (totSold : float) and Provider (totBuyed : float)
  • Polymorphic SQL Query (same class hierarchy as above)
      select * from Person where city.name = 'Rome'
      Queries are polymorphic: subclasses of Person (Customer, Provider) can be part of the result set
  • Let's go back to the Graph Stuff. How does OrientDB manage relationships?
  • OrientDB: traverse a relationship. The Record ID (RID) is the physical position. Vertex Luca: RID = #13:35, label: 'Customer', name: 'Luca'. Vertex Rome: RID = #13:100, label: 'Address', name: 'Rome'.
  • OrientDB: traverse a relationship. The Edge's RID is saved inside both vertices, as «out» and «in». Vertex Luca: RID = #13:35, out: [#14:54]. Edge Lives: RID = #14:54, label: 'Lives', out: [#13:35], in: [#13:100]. Vertex Rome: RID = #13:100, in: [#14:54].
  • A GraphDB handles a relationship as a physical LINK to the record, assigned when the edge is created. An RDBMS, on the other hand, computes the relationship every time you query the database. Isn't that crazy?!
  • This means jumping from an O(log N) algorithm to a near O(1) one: the traversal cost is no longer affected by the database size! This is huge in the BigData age
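The idea of following a RID straight to its record can be sketched in Java with a toy storage where `clusters[c][p]` holds the record whose RID is #c:p. This is a deliberately simplified, hypothetical model built from the Luca/Lives/Rome example, not OrientDB's actual storage layout or API.

```java
import java.util.ArrayList;
import java.util.List;

final class RidTraversalSketch {
    /** Record ID: cluster number plus position inside that cluster, e.g. #13:35. */
    static final class Rid {
        final int cluster;
        final int position;
        Rid(int cluster, int position) {
            this.cluster = cluster;
            this.position = position;
        }
    }

    /** A stored vertex or edge: a name plus the RIDs it links to. */
    static final class Rec {
        final String name;
        final List<Rid> out = new ArrayList<>();
        final List<Rid> in = new ArrayList<>();
        Rec(String name) {
            this.name = name;
        }
    }

    // Toy storage: the RID is the physical position, so loading is plain array indexing.
    static final Rec[][] clusters = new Rec[16][128];

    static void store(Rid rid, Rec rec) {
        clusters[rid.cluster][rid.position] = rec;
    }

    static Rec load(Rid rid) {
        return clusters[rid.cluster][rid.position]; // no key search, no index
    }

    /** Builds Luca --Lives--> Rome and follows the links from Luca to the address. */
    static String whereLucaLives() {
        Rid lucaRid = new Rid(13, 35);
        Rid romeRid = new Rid(13, 100);
        Rid livesRid = new Rid(14, 54);
        Rec luca = new Rec("Luca");
        Rec rome = new Rec("Rome");
        Rec lives = new Rec("Lives");
        luca.out.add(livesRid);  // vertex Luca: out = [#14:54]
        lives.out.add(lucaRid);  // edge Lives: out = [#13:35]
        lives.in.add(romeRid);   // edge Lives: in = [#13:100]
        rome.in.add(livesRid);   // vertex Rome: in = [#14:54]
        store(lucaRid, luca);
        store(romeRid, rome);
        store(livesRid, lives);

        Rec edge = load(load(lucaRid).out.get(0)); // hop 1: vertex -> edge
        Rec target = load(edge.in.get(0));         // hop 2: edge -> vertex
        return target.name;
    }

    public static void main(String[] args) {
        System.out.println(whereLucaLives()); // prints Rome
    }
}
```

Each hop is a constant-time position lookup, so the cost per hop stays the same no matter how many records the clusters hold.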
  • OrientDB in the Blueprints micro-benchmark, on common hw, with a hot cache, traverses 29.6 million records in less than 5 seconds: about 6 million nodes traversed per second! Do not try this at home with an RDBMS*! (* unless you live in Google's server farm)
  • Create the graph in SQL
      $luca> cd bin
      $luca> ./console.sh
      OrientDB console v.1.3.0-SNAPSHOT (www.orientdb.org)
      Type help to display all the commands supported.
      orientdb> create vertex Customer set name = 'Luca'
      Created vertex #13:35 in 0.03 secs
      orientdb> create vertex Address set name = 'Rome'
      Created vertex #13:100 in 0.02 secs
      orientdb> create edge Lives from #13:35 to #13:100
      Created edge #14:54 in 0.02 secs
  • Create the graph in Java
      Graph graph = new OrientGraph( "local:/tmp/db/graph" );
      // vertices are created by class name, then given properties
      Vertex luca = graph.addVertex( "class:Customer" );
      luca.setProperty( "name", "Luca" );
      Vertex rome = graph.addVertex( "class:Address" );
      rome.setProperty( "name", "Rome" );
      // the edge links the two vertices: Luca --Lives--> Rome
      Edge edge = luca.addEdge( "Lives", rome );
      graph.shutdown();
  • Query the graph in SQL (incoming vertices)
      orientdb> select in_lives.out from Address where name = 'Rome'
      ---+------+---------+--------------------+--------------------+--------+
        #| RID  |@class   |label               |out_lives           |in      |
      ---+------+---------+--------------------+--------------------+--------+
        0| 13:35|Customer |Luca                |[#14:54]            |        |
      ---+------+---------+--------------------+--------------------+--------+
      1 item(s) found. Query executed in 0.007 sec(s).
  • More on query power
      orientdb> select sum( out_Order.in.total ) from Customer where name = 'Luca'
      orientdb> traverse out_Friend.in, in_Friend.out from Customer while $depth <= 7
      orientdb> select from ( traverse out_Friend.in, in_Friend.out from Customer while $depth <= 7 ) where @class = 'Customer' and city.name = 'Turin'
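What a depth-limited traverse does can be sketched in Java as a breadth-first walk over an in-memory adjacency map that follows Friend links in both directions and stops after a given number of hops. The sample friendships and all names here are hypothetical, and counting the root as depth 0 is an assumption of this sketch rather than a statement about OrientDB's `$depth` semantics.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class DepthLimitedTraversal {
    /** Adds an undirected Friend link, so it can be followed from either side. */
    static void friend(Map<String, List<String>> graph, String a, String b) {
        graph.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        graph.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
    }

    /** Hypothetical sample data: Luca - Jill - Mark - Steve. */
    static Map<String, List<String>> sampleGraph() {
        Map<String, List<String>> graph = new HashMap<>();
        friend(graph, "Luca", "Jill");
        friend(graph, "Jill", "Mark");
        friend(graph, "Mark", "Steve");
        return graph;
    }

    /** Returns every vertex reachable from root within maxDepth hops (root is depth 0). */
    static Set<String> traverse(Map<String, List<String>> graph, String root, int maxDepth) {
        Set<String> visited = new LinkedHashSet<>();
        visited.add(root);
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(root);
        for (int depth = 1; depth <= maxDepth && !frontier.isEmpty(); depth++) {
            Deque<String> next = new ArrayDeque<>();
            for (String vertex : frontier) {
                for (String neighbour : graph.getOrDefault(vertex, Collections.<String>emptyList())) {
                    if (visited.add(neighbour)) { // add() is false for already-visited vertices
                        next.add(neighbour);
                    }
                }
            }
            frontier = next;
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(traverse(sampleGraph(), "Luca", 2)); // prints [Luca, Jill, Mark]
    }
}
```

Wrapping such a traversal in an outer filter, as the third command does with `where @class = 'Customer' and city.name = 'Turin'`, narrows the visited set after the walk.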
  • Query vs traversal: once you have a well-connected database in the form of a Super Graph, you can cross records instead of querying them! All you need is some root vertices from which to start traversing
  • Query vs traversal (diagram): the vertices Special Customers, Customers, Stocks, Mark, Luca, Jill, White Soap, Order 2332 and Order 8834; a callout marks one of them: «This is a root vertex»
  • Temporal based graph (diagram): Calendar -> Year 2013 -> Month April 2013 -> Day 9/4/2013 -> Hour 9/4/2013 09:00 and Hour 9/4/2013 10:00 -> Orders 2332, 2333, 2334
  • Location based graph (diagram): Location -> Country Italy -> Region Lazio -> State RM -> City Fiumicino and City Rome -> Orders 2332, 2333, 2334
  • Mix & Merge graphs (diagram): the Location hierarchy (Country Italy, Region Lazio, State RM, City Fiumicino, City Rome) and the Calendar hierarchy (Year 2013, Month April 2013, Day 9/4/2013, Hour 09:00, Hour 10:00) both link to the same Orders 2332, 2333, 2334
  • This is your database
  • Get the last customer who bought 'Barolo'
      select last(out_Order.in.out_Customer.in) from Stock where name = 'Barolo'
      Result: #34:22
  • Get his country
      select out_City.in from #34:22
      Result: Turin, Italy #55:12
  • Get orders from that country
      select in_Customer.out from #55:12
  • NuvolaBase.com: the first Graph Database as a Service on the Cloud (HTTP/REST)
  • Do we have enough time for a live demo?
  • Questions & (maybe) Answers. Luca Garulli, CEO at NuvolaBase Ltd, London UK; author of OrientDB, the Document-Graph NoSQL Open Source project. www.twitter.com/lgarulli Conclusions at the end ->
  • Summary: 1) JOIN is heavy, especially on large databases 2) GraphDB uses LINKs as direct pointers to records: traversal time goes from O(log N) to near O(1) = ready for BigData 3) GraphDB has a query language specialized for traversing relationships
  • Let's move like a Spider on the web