OrientDB distributed architecture 1.1

rev 1.1

Distributed architecture
with a Multi-Master approach

Available in version 1.0
(planned for December 2011)
www.orientechnologies.com Licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License

Where is the previous
OrientDB
Master/Slave
architecture?


After first tests we decided to
throw away the old Master-Slave
architecture because it was
against the OrientDB philosophy:

it doesn't scale
and

it's hard to configure properly

So what's next?
We've re-designed the entire distributed
architecture to get it working as

Multi-Master*

to be released in version 1.0
(December 2011)

* http://en.wikipedia.org/wiki/Multi-master_replication

In the Multi-Master architecture

any node can read/write to the database

this scales out horizontally

adding nodes is straightforward

Say wow!


...but

you have to fight
with

conflicts

Fortunately we found some
smart ways to resolve conflicts without
falling into a

Blood Bath


The actors
Leader Node: only 1 Leader per cluster. Checks the other nodes and notifies changes to the other Peer Nodes. Can be any server node in the cluster, usually the first to start.

Peer Node: any server node in the cluster. Has a permanent connection to the Leader Node.

Client: clients are connected to Server Nodes, no matter if Leader or Peer.

Database: where data are stored.

Synchronous mode replication: the server node propagates changes, waits for the response from the remote server, then sends the ACK to the client.

Asynchronous mode replication: the server node propagates changes and sends the ACK to the client without waiting for the response from the remote server.


How is the cluster
of nodes
composed
and
managed?

Cluster auto-discovering
At startup each Server Node sends an IP Multicast message to discover whether a
Leader Node is available, so that it can join the cluster. If a Leader Node answers,
it connects to the new node, which becomes a Peer Node; otherwise the new node
becomes the Leader Node.

[Diagram: Server #1 (Leader) and Server #2 (Peer), each hosting several databases]
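
The start-up rule above can be sketched in a few lines. This is only an in-memory simulation under stated assumptions: the `Network` class stands in for the IP multicast group, and the class names, method names and node addresses are hypothetical, not OrientDB code.

```python
class Network:
    """In-memory stand-in for the IP multicast group (hypothetical)."""

    def __init__(self):
        self.nodes = []

    def multicast_discover(self):
        """Return the current Leader, or None if nobody answers."""
        for node in self.nodes:
            if node.role == "leader":
                return node
        return None


class ServerNode:
    def __init__(self, node_id):
        self.node_id = node_id
        self.role = None
        self.leader = None
        self.peers = []  # filled only on the Leader

    def start(self, network):
        leader = network.multicast_discover()
        if leader is None:
            # Nobody answered: this node becomes the Leader.
            self.role = "leader"
        else:
            # A Leader answered and connected to us: we become a Peer.
            self.role = "peer"
            self.leader = leader
            leader.peers.append(self)
        network.nodes.append(self)
        return self.role


net = Network()
server1 = ServerNode("192.168.0.10:2424")  # illustrative addresses
server2 = ServerNode("192.168.0.11:2424")
server1.start(net)
server2.start(net)
```

After both calls, `server1` holds the Leader role because it started first, and `server2` is registered as one of its Peers.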


One Leader Multiple Peers
The first node to start is always the Leader, but if it fails any other node can be
elected. The Leader Node polls all the servers to verify their status and alerts all
the Peer Nodes at every change in the cluster composition.

[Diagram: Server #1 (Leader) with Server #2 (Peer) and Server #3 (Peer), each hosting several databases]


Asymmetric clustering
Each database can be clustered across multiple server nodes. Databases can be moved
across servers. The replication strategy has per database/server granularity.
This means Server #2 could replicate database B asynchronously to Server #3
and database A synchronously to Server #1.

[Diagram: Server #1 (Leader) hosts databases A and C; Server #2 (Peer) hosts A and B; Server #3 (Peer) hosts C and B]
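
One way to picture the per database/server granularity is a lookup table keyed by source server and database. The sketch below encodes the example from this slide; the table layout and the `replication_mode` helper are assumptions made for illustration, not the actual OrientDB configuration format.

```python
# (source server, database) -> {target server: replication mode}
# Hypothetical layout encoding the example above.
replication = {
    ("Server #2", "A"): {"Server #1": "synch"},
    ("Server #2", "B"): {"Server #3": "asynch"},
}


def replication_mode(source, database, target):
    """Return 'synch', 'asynch', or None when no replication is configured."""
    return replication.get((source, database), {}).get(target)


mode_b = replication_mode("Server #2", "B", "Server #3")
mode_a = replication_mode("Server #2", "A", "Server #1")
```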


Distributed configuration
Cluster configuration is broadcast from the Leader Node to all the Peer Nodes.
Peer Nodes broadcast it to all the connected clients.
Everybody knows who has the database.

[Diagram: Client #1, Client #2 and Client #3 connected to Server #1 (Leader), Server #2 (Peer) and Server #3 (Peer)]


Security
To join a cluster, a Server Node must be configured with the cluster name and password.
Broadcast messages are encrypted using the password.
The password doesn't cross the network: it's stored in the configuration file.

[Diagram: Server #2 (Peer) joins the cluster of Server #1 (Leader) ONLY if it knows the name and password]


Leader election
Each Peer Node continuously checks the connection with the Leader Node.
If the connection is lost, the Peer Node tries to elect itself as the new Leader Node.
A split network is resolved using a simple algorithm.

Server #1 (Leader): 192.168.0.10:2424
Server #2 (Leader): 192.168.10.27:2424

Server #1 takes the leadership
because it has the lower ID
ID = <ip-address>:<port>
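
A minimal sketch of the "lower ID wins" rule follows. The slides do not say how two IDs are compared, so comparing the IPv4 octets and the port numerically is an assumption, and both function names are hypothetical.

```python
def parse_node_id(node_id):
    """Turn '<ip-address>:<port>' into a tuple of ints for comparison."""
    address, port = node_id.rsplit(":", 1)
    return tuple(int(part) for part in address.split(".")) + (int(port),)


def resolve_split(leader_ids):
    """Among several nodes that all claim leadership, keep the lowest ID."""
    return min(leader_ids, key=parse_node_id)


winner = resolve_split(["192.168.10.27:2424", "192.168.0.10:2424"])
```

With the two addresses from this slide, `winner` is the ID of Server #1. A plain string comparison gives the same answer here, but it would rank `192.168.0.10` before `192.168.0.9`, which is why the sketch compares numbers.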


Multiple clusters
Multiple separate clusters can coexist in the same network.
Clusters can't see each other: they are separate boxes.
What identifies a cluster is name + password.

Cluster 'A', password 'aaa': Server #1 (Leader), Server #2 (Peer), Server #3 (Peer)
Cluster 'B', password 'bbb': Server #1 (Leader), Server #2 (Peer), Server #3 (Peer)


Fail-over
Clients know about the other nodes, so they transparently switch
to good servers. No error is sent to the client app.
Running transactions will be repeated transparently too (v1.2).

[Diagram: Client #1 to Client #4 connected to Server #1 (DB-1) and Server #2 (DB-2)]
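
The client-side behaviour described above might look roughly like this. It is a sketch with hypothetical classes (`FakeServer`, `FailoverClient`, `ServerDown`), not the OrientDB client API: the client keeps the list of known servers and silently moves on to the next one when a server is unreachable.

```python
class ServerDown(Exception):
    """Raised when a server cannot be reached (hypothetical)."""


class FakeServer:
    def __init__(self, name, up=True):
        self.name = name
        self.up = up

    def execute(self, command):
        if not self.up:
            raise ServerDown(self.name)
        return f"{self.name} executed {command}"


class FailoverClient:
    def __init__(self, servers):
        # The list of known nodes, as received with the cluster configuration.
        self.servers = list(servers)

    def execute(self, command):
        for server in self.servers:
            try:
                return server.execute(command)
            except ServerDown:
                continue  # switch to the next good server, no error to the app
        # Only when every known server is down does the app see an error.
        raise ServerDown("no server available")


client = FailoverClient([FakeServer("Server #1", up=False), FakeServer("Server #2")])
result = client.execute("update")
```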


How does the replication work?

Synchronous Replication
Guarantees the two databases are always consistent.
More expensive than asynchronous because the First Server
waits for the Second Server's answer before sending back
the ACK to the client. After the ACK the Client is sure
the data is stored on multiple nodes.

[Diagram: Server #1 (DB-1) replicating to Server #2 (DB-2)]


Synchronous Replication
steps

1) Client #1 sends an update record request to Server #1
2) Server #1 updates the record in DB-1
3) Server #1 propagates the update to Server #2
4) Server #2 updates the record in DB-2
5) Server #2 sends back OK to Server #1
6) Server #1 sends back OK to Client #1
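
The six steps can be traced in a toy model. This is a sketch under stated assumptions: each database is a plain dict, the network call is a direct method call, and the `Server` class and its method names are hypothetical.

```python
class Server:
    """Toy server node holding one database as a dict (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.db = {}
        self.replica = None  # the remote server that receives our changes

    def apply_remote(self, rid, value):
        # Step 4: update the record in the local database.
        self.db[rid] = value
        # Step 5: send back OK to the propagating server.
        return "OK"

    def update(self, rid, value):
        # Step 2: update the record in the local database.
        self.db[rid] = value
        # Step 3: propagate the update and wait for the answer.
        answer = self.replica.apply_remote(rid, value)
        if answer != "OK":
            raise RuntimeError(f"replication to {self.replica.name} failed")
        # Step 6: only now acknowledge the client.
        return "OK"


server1, server2 = Server("Server #1"), Server("Server #2")
server1.replica = server2
ack = server1.update("#10:20", {"name": "Luca"})  # step 1: client request
```

By the time `ack` is returned, both dicts already contain the record, which is the guarantee the synchronous mode pays for.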


Asynchronous Replication
Changes are propagated without waiting for the answer.
The two databases could be inconsistent for a few ms.
For this reason it's called “Eventually Consistent”.
It's much less expensive than synchronous replication.

[Diagram: Server #1 (DB-1) replicating to Server #2 (DB-2)]


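Asynchronous Replication
steps
(4a and 4b are executed in parallel)

1) Client #1 sends an update record request to Server #1
2) Server #1 updates the record in DB-1
3) Server #1 propagates the update to Server #2
4a) Server #1 sends back OK to Client #1
4b) Server #2 updates the record in DB-2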
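
A rough sketch of the asynchronous flow, assuming an in-process queue and a background thread in place of the network. The `AsyncServer` class and its attributes are hypothetical; the point is only that the ACK does not wait for the remote update.

```python
import queue
import threading


class AsyncServer:
    """Toy server that replicates in the background (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.db = {}
        self.replica = None
        self.outbox = queue.Queue()
        # Daemon worker that drains the outbox and applies changes remotely.
        threading.Thread(target=self._propagate, daemon=True).start()

    def _propagate(self):
        while True:
            rid, value = self.outbox.get()
            self.replica.db[rid] = value  # step 4b: update record in DB-2
            self.outbox.task_done()

    def update(self, rid, value):
        self.db[rid] = value              # step 2: update record in DB-1
        self.outbox.put((rid, value))     # step 3: propagate, do not wait
        return "OK"                       # step 4a: ACK to the client


server1, server2 = AsyncServer("Server #1"), AsyncServer("Server #2")
server1.replica = server2
ack = server1.update("#10:20", {"name": "Luca"})
# Between the ACK and this point DB-2 may still be behind DB-1.
server1.outbox.join()  # wait here only to observe the eventual state
```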


Error Management
During replication the Second Server could get an error due to a
conflict (the record was modified at the same moment by another client)
or an I/O problem. In this case the error is logged to disk to be fixed later.

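1) Client #1 sends an update record request to Server #1
2) Server #1 updates the record in DB-1
3) Server #1 propagates the update to Server #2
4) Server #1 sends back OK to Client #1
5) Server #2 updates the record in DB-2
6) The error is logged to the Synch Log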


Conflict Management
During replication, conflicts can happen if two clients
update the same record at the same time.
The conflict resolution strategy is pluggable: provide
an implementation of the OConflictResolver interface.

[Diagram: Server #2 applies the Conflict Strategy before writing to DB-2]


Conflict Management
Default strategy

The default implementation merges the records: in case the same fields are
changed, the oldest document wins and the newest is written into the Synch Log.

[Diagram: Server #2 applies the Default Conflict Strategy, writing to DB-2 and to the Synch Log]
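
The default strategy could be sketched as below. This is not the OConflictResolver interface, whose signature the slides do not show; `Document`, `MergeOldestWins` and the use of a timestamp to decide which document is "oldest" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Document:
    rid: str
    timestamp: int  # assumed way to tell the oldest from the newest
    fields: dict


class MergeOldestWins:
    """Hypothetical resolver: merge fields, the oldest wins on clashes."""

    def __init__(self):
        self.synch_log = []  # stand-in for the on-disk Synch Log

    def resolve(self, current, incoming):
        oldest, newest = sorted((current, incoming), key=lambda d: d.timestamp)
        clashing = {
            key
            for key in oldest.fields.keys() & newest.fields.keys()
            if oldest.fields[key] != newest.fields[key]
        }
        if clashing:
            # Same fields changed: keep the newest aside for manual review.
            self.synch_log.append(newest)
        # Fields of the oldest document override those of the newest.
        merged = {**newest.fields, **oldest.fields}
        return Document(oldest.rid, newest.timestamp, merged)


resolver = MergeOldestWins()
doc_a = Document("#10:20", 1, {"name": "Luca", "city": "Rome"})
doc_b = Document("#10:20", 2, {"city": "London", "country": "UK"})
merged = resolver.resolve(doc_a, doc_b)
```

Here `city` was changed in both documents, so the value from `doc_a` (the oldest) is kept, `country` is merged in from `doc_b`, and `doc_b` ends up in the log.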


Manual control of conflicts
like SVN/GIT tools


Display the diff of 2 databases
> compare database db1 db2

Copy a record across databases
> copy record #10:20@db1 to #10:20@db2

Copy entire cluster across databases
> copy cluster city@db1 to city@db2

Merge two records across databases
> merge records #10:20@db1 #10:20@db2 to #10:20@db1


How are nodes re-aligned

once up again after a failure,
shutdown or network problem?

During replication all operations
are logged using

a unique op-id with the format <node>#<serial>

Client: update a record

Server #1 (DB-1), Operation Log: Op-id 192.168.0.10:2424#123232
Server #2 (DB-2), Operation Log: Op-id 192.168.0.10:2424#123232


On restart the node asks the Leader
which servers it has to synchronize with

op-ids are used to find the operations that were missed

Server #1 (DB-1), Operation Log: Op-id 192.168.1.11:2424#9569
Server #2 (DB-2), Operation Log: Op-id 192.168.0.10:2424#123232
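
A small sketch of how op-ids could drive the re-alignment. It assumes that serials grow monotonically per node and that a restarting node remembers the last serial it applied for each node; the slides state neither, and all function names are hypothetical.

```python
def make_op_id(node, serial):
    """Build an op-id in the <node>#<serial> format."""
    return f"{node}#{serial}"


def parse_op_id(op_id):
    """Split an op-id into (node, serial)."""
    node, serial = op_id.rsplit("#", 1)
    return node, int(serial)


def missed_operations(remote_log, last_applied):
    """Return the op-ids in remote_log that the restarting node never applied.

    remote_log: op-ids from another server's Operation Log, in order.
    last_applied: {node: last serial already applied locally} (assumed bookkeeping).
    """
    missed = []
    for op_id in remote_log:
        node, serial = parse_op_id(op_id)
        if serial > last_applied.get(node, -1):
            missed.append(op_id)
    return missed


remote_log = [make_op_id("192.168.0.10:2424", n) for n in (123230, 123231, 123232)]
to_replay = missed_operations(remote_log, {"192.168.0.10:2424": 123230})
```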


To be
consistent
or not to be,
that is
the question


Always consistent
use it as a Master-Slave
Server #1 (Master, read + write): Read/Write. All changes on this server, avoiding conflicts.

Server #2 (Synch Slave, read only): Read only, consistent. Leave it as replica. Since it's always aligned it's the best candidate as new master if Server #1 is unavailable. Perfect for Analysis, Business Intelligence and Reports.

Replication is synchronous and one-way only, from Server #1 to Server #2.


Read-only scaling
using many asynchronous replicas

Server #1 (Master, read + write): Read/Write. All changes on this server, avoiding conflicts.

Server #2 (Synch Slave, read only)

Server #3 ... Server #N (Asynch Slaves, read only): Read only, eventually consistent. Replication cost close to zero.


Read/Write scaling
Multi master + handling conflicts
Server #1, Server #2 and Server #3 are all Masters (read + write), each one serving its own Clients.


Read/Write scaling + sharding
Multi master, no conflict! :-)
Server USA (Master, read + write): hosts customers_usa. Writes on customers_usa go here.

Server CHI (Master, read + write): hosts customers_china. Writes on customers_china go here.
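
The reason sharding avoids conflicts is that every cluster has exactly one master accepting its writes. A sketch of the routing, with a hypothetical `SHARDS` table and `route_write` helper built from the two names on this slide:

```python
# cluster name -> the only master that accepts writes for it (from the slide)
SHARDS = {
    "customers_usa": "Server USA",
    "customers_china": "Server CHI",
}


def route_write(cluster_name):
    """Return the server that must receive writes for the given cluster."""
    try:
        return SHARDS[cluster_name]
    except KeyError:
        raise ValueError(f"no shard configured for {cluster_name!r}") from None


usa_target = route_write("customers_usa")
china_target = route_write("customers_china")
```

Since two masters never write to the same cluster, the same record cannot be updated on both sides at once.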


Multi-Master + Sharding
=
big scale in high-availability and no conflicts

NuvolaBase.com (beta)

The first
Graph Database
on the Cloud
always available
a few seconds to set it up
use it from Web & Mobile
apps


Luca Garulli
Author of OrientDB and Roma <Meta> Framework Open Source projects
Member of JSR#12 (jdo 1.0) and JSR#243 (jdo 2.0)
CEO at Nuvola Base Ltd
@London, UK and @Rome, Italy
www.twitter.com/lgarulli

