OrientDB distributed architecture 1.1

This is the official presentation of the new clustering Multi-Master architecture of OrientDB

Transcript

  • 1. rev 1.1. Distributed architecture with a Multi-Master approach. Available in version 1.0 (planned for December 2011). www.orientechnologies.com. Licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License.
  • 2. Where is the previous OrientDB Master/Slave architecture?
  • 4. After the first tests we decided to throw away the old Master-Slave architecture because it went against the OrientDB philosophy: it doesn't scale and it's hard to configure properly.
  • 5. So what's next? We've redesigned the entire distributed architecture to work as Multi-Master (http://en.wikipedia.org/wiki/Multi-master_replication), to be released in version 1.0 (December 2011).
  • 6. In the Multi-Master architecture any node can read from and write to the database, so it scales horizontally: adding nodes is straightforward. Say wow!
  • 8. ...but you have to fight with conflicts.
  • 9. Fortunately, we found some smart ways to resolve conflicts without ending up in a blood bath.
  • 10. The actors:
      • Leader Node: only one per cluster. It checks the other nodes and notifies the Peer Nodes of changes. It can be any server node in the cluster, usually the first to start.
      • Peer Node: any server node in the cluster. It has a permanent connection to the Leader Node.
      • Client: clients are connected to Server Nodes, no matter whether Leader or Peer.
      • Database: where the data is stored.
      • Synchronous mode replication: the server node propagates changes, waits for the response from the remote server, then sends the ACK to the client.
      • Asynchronous mode replication: the server node propagates changes and sends the ACK to the client without waiting for the response from the remote server.
  • 11. How is the cluster of nodes composed and managed?
  • 12. Cluster auto-discovery. At startup each Server Node sends an IP multicast message to discover whether a Leader Node is available, so that it can join the cluster. If one is available, the Leader Node connects to the new node, which becomes a Peer Node; otherwise the new node becomes the Leader Node. [Diagram: Server #1 (Leader) and Server #2 (Peer), each hosting several databases.]
  • 13. One Leader, multiple Peers. The first node to start is always the Leader, but if it fails any other node can be elected. The Leader Node polls all the servers to verify their status and alerts all the Peer Nodes on every change in the cluster composition. [Diagram: Server #1 (Leader) with Server #2 and Server #3 (Peers), each hosting several databases.]
  • 14. Asymmetric clustering. Each database can be clustered across multiple server nodes. Databases can be moved across servers. The replication strategy has per-database/server granularity: for example, Server #2 could replicate database B asynchronously to Server #3 and database A synchronously to Server #1. [Diagram: databases A, B and C distributed across Server #1 (Leader), Server #2 (Peer) and Server #3 (Peer).]
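One way to picture the per-database/server granularity is as a lookup table keyed by source server, database and target server. The sketch below is only an illustration of that idea: the `REPLICATION` table and the `replication_mode` function are hypothetical names, not OrientDB configuration syntax, and the entries simply restate the example from the slide.

```python
# Hypothetical replication table; keys are (source server, database, target server).
# This is NOT OrientDB's real configuration format, only a sketch of the granularity.
REPLICATION = {
    ("Server #2", "B", "Server #3"): "async",
    ("Server #2", "A", "Server #1"): "sync",
}


def replication_mode(source, database, target):
    """Return "sync", "async", or None when the database is not replicated on that link."""
    return REPLICATION.get((source, database, target))
```

A lookup such as `replication_mode("Server #2", "B", "Server #3")` yields the mode for that single link, so the same server can use different modes for different databases.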
  • 15. Distributed configuration. The cluster configuration is broadcast from the Leader Node to all the Peer Nodes. Peer Nodes broadcast it to all their connected clients. Everybody knows who has the database. [Diagram: Clients #1 to #3 connected to Server #1 (Leader), Server #2 (Peer) and Server #3 (Peer).]
  • 16. Security. To join a cluster, a Server Node must be configured with the cluster name and password. Broadcast messages are encrypted using the password. The password doesn't cross the network: it's stored in the configuration file. [Diagram: Server #2 (Peer) joins the cluster of Server #1 (Leader) ONLY if it knows the name and password.]
  • 17. Leader election. Each Peer Node continuously checks its connection to the Leader Node. If the connection is lost, the Peer tries to elect itself as the new Leader Node. A split network is resolved using a simple algorithm: the node with the lower ID takes the leadership, where ID = <ip-address>:<port>. Example: Server #1 (192.168.0.10:2424) and Server #2 (192.168.10.27:2424) both act as Leader; Server #1 takes the leadership because it has the lower ID.
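The split-network rule can be sketched in a few lines. The slide only says that the lower ID wins and that the ID is `<ip-address>:<port>`; how two IDs are compared is not stated, so the numeric comparison below (IPv4 address first, then port) is an assumption, and `node_key` and `resolve_split` are hypothetical names.

```python
import ipaddress


def node_key(node_id):
    """Turn "<ip-address>:<port>" into a sortable (ip, port) pair.

    Assumption: IDs are compared numerically, IPv4 address first, then port.
    """
    host, _, port = node_id.rpartition(":")
    return (int(ipaddress.IPv4Address(host)), int(port))


def resolve_split(leader_ids):
    """Pick the surviving Leader among nodes that each believe they are Leader."""
    return min(leader_ids, key=node_key)


winner = resolve_split(["192.168.10.27:2424", "192.168.0.10:2424"])
```

With the two addresses from the slide, `winner` is `192.168.0.10:2424`, matching the statement that Server #1 takes the leadership.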
  • 18. Multiple clusters. Multiple separate clusters can coexist in the same network. Clusters can't see each other: they are separate boxes. What identifies a cluster is name + password. [Diagram: Cluster A (password aaa) and Cluster B (password bbb), each with Server #1 (Leader), Server #2 (Peer) and Server #3 (Peer).]
  • 19. Fail-over. Clients know about the other nodes, so they transparently switch to good servers. No error is sent to the client app. Running transactions will be repeated transparently too (v1.2). [Diagram: Clients #1 to #4 connected to Server #1 (DB-1) and Server #2 (DB-2).]
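Client-side fail-over of this kind can be sketched as a loop over the known servers. This is a minimal illustration, not the OrientDB client: servers are modelled as plain callables, and `ServerDown` and `execute_with_failover` are hypothetical names.

```python
class ServerDown(Exception):
    """Raised by a (simulated) server that cannot be reached."""


def execute_with_failover(servers, request):
    """Try each known server in order and return the first successful response.

    The caller only sees an error if every server is down.
    """
    last_error = None
    for server in servers:
        try:
            return server(request)
        except ServerDown as err:
            last_error = err  # remember the failure and move on to the next node
    raise ServerDown("no server available") from last_error
```

Because the client already holds the list of nodes (distributed by the Leader and Peers), the application code calling `execute_with_failover` does not need to know which server answered.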
  • 20. How does replication work?
  • 21. Synchronous replication. Guarantees that the two databases are always consistent. More expensive than asynchronous replication because the first server waits for the second server's answer before sending the ACK back to the client. After the ACK, the client is sure the data is stored on multiple nodes. [Diagram: Server #1 (DB-1) and Server #2 (DB-2).]
  • 22. Synchronous replication steps:
      1) Client #1 sends an update record request to Server #1.
      2) Server #1 updates the record in DB-1.
      3) Server #1 propagates the update to Server #2.
      4) Server #2 updates the record in DB-2.
      5) Server #2 sends back OK to Server #1.
      6) Server #1 sends back OK to Client #1.
  • 23. Asynchronous replication. Changes are propagated without waiting for the answer. The two databases may be inconsistent for a few milliseconds; for this reason it's called "Eventually Consistent". It's much less expensive than synchronous replication. [Diagram: Server #1 (DB-1) and Server #2 (DB-2).]
  • 24. Asynchronous replication steps (4a and 4b are executed in parallel):
      1) Client #1 sends an update record request to Server #1.
      2) Server #1 updates the record in DB-1.
      3) Server #1 propagates the update to Server #2.
      4a) Server #1 sends back OK to Client #1.
      4b) Server #2 updates the record in DB-2.
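The difference between the two modes comes down to when the client receives its ACK. The sketch below simulates both step sequences in a single process, assuming in-memory dictionaries in place of real databases and a background thread in place of the network; `Node`, `update_sync` and `update_async` are hypothetical names, not OrientDB APIs.

```python
from concurrent.futures import ThreadPoolExecutor


class Node:
    """A simulated server node holding its database as a dict."""

    def __init__(self, name):
        self.name = name
        self.db = {}

    def apply(self, rid, value):
        self.db[rid] = value
        return "OK"


# Single background worker standing in for the replication channel.
_replicator = ThreadPoolExecutor(max_workers=1)


def update_sync(local, remote, rid, value):
    local.apply(rid, value)          # step 2: update the local database
    remote.apply(rid, value)         # steps 3-5: propagate and wait for the remote OK
    return "OK"                      # step 6: only now ACK the client


def update_async(local, remote, rid, value):
    local.apply(rid, value)                                   # step 2
    pending = _replicator.submit(remote.apply, rid, value)    # steps 3 and 4b, in background
    return "OK", pending                                      # step 4a: ACK without waiting
```

In the asynchronous case the returned `pending` future is only there so a caller (or a test) can observe when the replica has caught up; until it completes, the two databases may briefly differ, which is the eventual-consistency window described above.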
  • 25. Error management. During replication the second server could get an error due to a conflict (the record was modified at the same moment by another client) or an I/O problem. In this case the error is logged to disk to be fixed later.
      1) Client #1 sends an update record request to Server #1.
      2) Server #1 updates the record in DB-1.
      3) Server #1 propagates the update to Server #2.
      4) Server #1 sends back OK to Client #1.
      5) Server #2 updates the record in DB-2.
      6) Server #2 logs the error to the Synch Log.
  • 26. Conflict management. During replication, conflicts can happen if two clients update the same record at the same time. The conflict resolution strategy can be plugged in by providing an implementation of the OConflictResolver interface. [Diagram: Server #2 with its Conflict Strategy and DB-2.]
  • 27. Conflict management: default strategy. The default implementation merges the records: if the same fields were changed, the oldest document wins and the newest is written to the Synch Log. [Diagram: Server #2 with the Default Conflict Strategy, DB-2 and the Synch Log.]
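The merge rule can be illustrated with a small sketch. It is not the actual default OConflictResolver implementation, whose code and signature are not shown in the slides. It assumes each document carries a timestamp `ts` and a dict of `fields`, and it treats "the same fields were changed" as "a field is present in both documents with different values"; `merge_records` is a hypothetical name.

```python
def merge_records(a, b, synch_log):
    """Merge two versions of a document, letting the oldest win on conflicting fields.

    Assumption: each version looks like {"ts": <number>, "fields": {...}}.
    """
    older, newer = (a, b) if a["ts"] <= b["ts"] else (b, a)

    # Start from the newer fields, then let the older document overwrite overlaps.
    merged = dict(newer["fields"])
    merged.update(older["fields"])

    conflicting = [
        name
        for name, value in older["fields"].items()
        if name in newer["fields"] and newer["fields"][name] != value
    ]
    if conflicting:
        synch_log.append(newer)  # keep the losing version so it can be fixed later
    return merged
```

Fields touched by only one side are simply combined; only a genuine overlap sends the newer document to the log for later manual resolution.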
  • 28. Manual control of conflicts, like SVN/Git tools.
  • 29. Console commands for manual conflict control:
      • Display the diff of two databases: > compare database db1 db2
      • Copy a record across databases: > copy record #10:20@db1 to #10:20@db2
      • Copy an entire cluster across databases: > copy cluster city@db1 to city@db2
      • Merge two records across databases: > merge records #10:20@db1 #10:20@db2 to #10:20@db1
  • 30. How are nodes realigned once they are up again after a failure, shutdown or network problem?
  • 31. During replication all operations are logged using a unique op-id with the format <node>#<serial>. [Diagram: a client updates a record on Server #1; Server #1 (DB-1) and Server #2 (DB-2) both record op-id 192.168.0.10:2424#123232 in their Operation Logs.]
  • 32. On restart the node asks the Leader which servers it has to synchronize with. Op-ids are used to determine which operations were missed. [Diagram: Server #1 (DB-1) with op-id 192.168.1.11:2424#9569 and Server #2 (DB-2) with op-id 192.168.0.10:2424#123232 in their Operation Logs.]
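A rough sketch of how op-ids could drive the realignment follows. The slides give only the `<node>#<serial>` format; the assumption that serials grow monotonically per node, and the names `parse_op_id` and `missed_operations`, are illustrative and not taken from OrientDB.

```python
def parse_op_id(op_id):
    """Split "<node>#<serial>" into (node, serial)."""
    node, _, serial = op_id.rpartition("#")
    return node, int(serial)


def missed_operations(last_seen, peer_log):
    """Return the op-ids in a peer's log that the restarted node has not applied.

    last_seen maps each originating node to the highest serial already applied.
    Assumption: serials increase monotonically for each node.
    """
    missed = []
    for op_id in peer_log:
        node, serial = parse_op_id(op_id)
        if serial > last_seen.get(node, -1):
            missed.append(op_id)
    return missed
```

For example, a node that last applied serial 123230 from `192.168.0.10:2424` would fetch every later entry for that node, plus all entries from nodes it has never heard from.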
  • 33. To be consistent or not to be, that is the question.
  • 34. Always consistent: use it as a Master-Slave.
      • Server #1 (Master, read + write): Read/Write. All changes on this server, avoiding conflicts.
      • Server #2 (Synch Slave, read only): Read only, consistent. Leave it as a replica. Since it's always aligned, it's the best candidate as the new master if Server #1 is unavailable. Perfect for Analysis, Business Intelligence and Reports.
      • Replication is one-way only, from Server #1 to Server #2.
  • 35. Read-only scaling using many asynchronous replicas.
      • Server #1 (Master, read + write): Read/Write. All changes on this server, avoiding conflicts.
      • Server #2 (Synch Slave, read only).
      • Server #3 to Server #N (Asynch Slaves, read only): Read only, eventually consistent. Replication cost close to zero.
  • 36. Read/Write scaling: Multi-Master + handling conflicts. [Diagram: clients connected to Server #1, Server #2 and Server #3, each a Master with read + write.]
  • 37. Read/Write scaling + sharding: Multi-Master, no conflicts! :-) [Diagram: Server USA (Master, read + write) holds customers_usa and Server CHI (Master, read + write) holds customers_china; clients write on customers_usa and on customers_china.]
  • 38. Multi-Master + Sharding = big scale with high availability and no conflicts.
  • 40. NuvolaBase.com (beta): the first Graph Database on the Cloud. Always available. A few seconds to set it up. Use it from Web & Mobile apps.
  • 41. Luca Garulli. Author of the OrientDB and Roma <Meta> Framework Open Source projects. Member of JSR#12 (JDO 1.0) and JSR#243 (JDO 2.0). CEO at Nuvola Base Ltd. @London, UK and @Rome, Italy. www.twitter.com/lgarulli