Ring or circular replication is not recommended for high availability and scalability in MySQL. A single failed node impacts the whole chain, so the topology has multiple points of failure. It also does not actually provide high write scalability, because every write must be applied by every node, so the replication load is duplicated across the ring. The solution proposed is an active-passive multi-master setup with sharding to split the data across nodes, providing independent backups for portions of the system while avoiding the issues of the ring topology.
Avoiding the ring of death: Why ring replication isn't optimal for MySQL
1. Avoiding the ring of death
Why you shouldn’t use MySQL Ring/Circular replication
Aishvarya Verma
2. What is Ring/Circular Replication?
• Ring or circular replication is a multi-master topology with the nodes of the cluster organized in a ring
• Each node is a master, i.e. it accepts writes, which are then propagated serially to all the other nodes
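The serial propagation described above can be sketched with a toy model (hypothetical node names, not MySQL code): a write accepted at any node is forwarded around the ring, one neighbour at a time, until it comes back to its origin.

```python
# Toy model of circular replication: each node forwards a write to the
# next node in the ring until the event returns to its originating node.

NODES = ["A", "B", "C"]  # hypothetical three-node ring

def propagate(origin: str, event: str) -> list[str]:
    """Return the order in which nodes apply the event."""
    applied = [origin]            # the origin applies its own write first
    i = NODES.index(origin)
    while True:
        i = (i + 1) % len(NODES)  # forward to the next node in the ring
        if NODES[i] == origin:    # back at the origin: stop forwarding
            break
        applied.append(NODES[i])
    return applied

print(propagate("B", "INSERT ..."))  # ['B', 'C', 'A']
```

Note that the propagation is strictly serial: the last node in the chain only sees the write after every intermediate node has applied it.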
3. What is the thought behind using this?
• It is thought that the multiple nodes provide High Availability
• Spreading the writes across multiple nodes should provide High Scalability
• Example: most enterprises serve data for different companies, many users/accounts and different geographies. So the idea is to spread the load across multiple servers and have the data available on all servers for high availability
• BUT DOES IT ACTUALLY PROVIDE THESE BENEFITS?
• No. It actually does the exact opposite of what it is believed to provide
4. CON#1 : Multiple points of failure
• Each time a server goes down, the whole chain is impacted. So availability is actually poorer than with multiple individual servers
• Things get tricky when a DB event from a failed node is still being replicated to other nodes: it can circle the ring indefinitely, because only the originating node (identified by its server_id) would have stopped the event from propagating further
• When a failed node is taken out of the chain, the load on the remaining servers increases. If the failed node is not recovered in time and traffic is high, this has a domino effect: as one or more further servers fail, the whole system can choke and completely crash
• There is no single source-of-truth master, so recovering a failed node is complex
• The chain is only as strong as its weakest link!
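MySQL breaks replication loops by tagging each event with the server_id of its origin and dropping an event once it arrives back at that server. The failure mode above can be sketched with a simplified model (not actual MySQL internals): once the originating node leaves the ring, no node ever drops the event.

```python
# Simplified model of loop prevention via server_id: a node drops an event
# whose server_id is its own. If the originating node has left the ring,
# nothing ever drops the event and it keeps circulating.

def hops_until_dropped(ring: list[int], origin_id: int, max_hops: int = 100) -> int:
    """Count hops until the event is dropped; max_hops means 'never'."""
    # The event starts at the node just after its origin (or anywhere,
    # if the origin is no longer in the ring).
    pos = (ring.index(origin_id) + 1) % len(ring) if origin_id in ring else 0
    for hop in range(1, max_hops + 1):
        if ring[pos] == origin_id:  # event returned to its origin: dropped
            return hop
        pos = (pos + 1) % len(ring)
    return max_hops                 # never dropped within the hop budget

healthy_ring = [1, 2, 3]
print(hops_until_dropped(healthy_ring, origin_id=1))  # 3: dropped after one lap

broken_ring = [2, 3]  # node 1 failed and was removed from the ring
print(hops_until_dropped(broken_ring, origin_id=1))   # 100: loops forever
```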
5. CON#2 : Write/read scalability mirage
• If a single server can handle W writes/sec, it seems fair to assume that using 3 servers gives 3x the write capacity = 3W. Does it?
• No, because replication puts extra load on the whole system
• A large share of each server's computing resources is now consumed applying the writes from the other 2 servers
• Example: assume W = 1000 w/s. Since each server must also process the writes of the other 2 servers to stay in sync, at peak each server can only originate ~333 w/s of its own, as it spends the remaining capacity applying ~667 w/s from the other 2. So the cluster's worst-case write throughput is still bounded by a single server's write capacity
• Even adding another node does more harm than good
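The arithmetic above generalizes: with N ring nodes each able to process W row-writes/sec, every node must also apply the writes of the other N-1 nodes, so each node can only originate about W/N writes/sec and the cluster total stays at W no matter how many nodes are added. A quick check:

```python
# Effective write throughput of an N-node ring where each node can
# process W row-writes/sec and must also apply every other node's writes.

def ring_write_capacity(w_per_node: float, n_nodes: int) -> float:
    """Writes/sec each node can originate; the cluster total is always W."""
    return w_per_node / n_nodes

W = 1000
for n in (1, 2, 3, 4):
    per_node = ring_write_capacity(W, n)
    print(f"{n} nodes: {per_node:.0f} w/s each, "
          f"{per_node * n:.0f} w/s cluster total")
```

The cluster-total column never moves: adding ring nodes only shrinks each node's share of the fixed budget.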
6. CON#3 : Write conflicts
• Duplicate key errors break the replication chain. These are typically caused by AUTO_INCREMENT keys generated concurrently on multiple masters, and can bring your replication ring to its knees
• Each time the replication chain breaks, there is the additional complexity of removing that node from the chain, recovering it, and adding it back
• Inconsistent data is also possible if multiple users are allowed to write/update the same row on multiple nodes. One way to avoid this is to map each user to a single server for writes and allow reads from other nodes only for load balancing. But even this doesn't guarantee that an incorrect data state will never be observed
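MySQL does offer the auto_increment_increment and auto_increment_offset variables so that concurrent masters generate disjoint key sequences; the collision and that mitigation can be simulated with a toy model (this is arithmetic on key sequences, not MySQL itself):

```python
# Toy simulation of AUTO_INCREMENT on two masters. With the default step
# of 1 both generate the same keys and collide when replicated; the
# interleaved sequences of auto_increment_increment/offset avoid this.

def keys(offset: int, increment: int, count: int) -> list[int]:
    """Keys a master would generate, mirroring auto_increment_offset/increment."""
    return [offset + i * increment for i in range(count)]

# Default settings: both masters start at 1 and step by 1 -> collisions.
a, b = keys(1, 1, 3), keys(1, 1, 3)
print(sorted(set(a) & set(b)))  # [1, 2, 3]: duplicate-key errors on replication

# Interleaved: increment = number of masters, distinct offsets -> disjoint.
a, b = keys(1, 2, 3), keys(2, 2, 3)  # odds on one master, evens on the other
print(sorted(set(a) & set(b)))       # []: no collisions
```

Even with disjoint key ranges, the other conflict class above (two nodes updating the same row) remains, which is why the deck's per-user write mapping is still needed.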
7. CON#4 : Under-utilized server resources
• Table and index sizes are huge: in a 3-node ring, roughly 67% of the data on each server is replicated from the other nodes rather than belonging to that node's own workload
• This drags down the performance of every DB server
• Server resources are wasted applying replicated data from the other servers, and are strained by the additional load that data creates
• In summary, server resources would be used far more efficiently on work related only to the data each node actively serves
8. Solution : Keep it simple, silly
• Sharding: split your data across multiple nodes, so that if a server does go down, only a subsection of the system is affected
• Use an active-passive multi-master setup: each node now holds all the data for a portion of your users/system, so that portion can be backed up independently
• On failure, the passive master becomes the active master, while the failed master is recovered and re-added as the passive one
• Note: this is not the only solution, but it shows how a simple setup can be better than a complex ring topology
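A minimal sketch of the routing layer such a sharded setup needs (the shard map and host names are hypothetical; a real deployment would also fail writes over to the passive master):

```python
# Minimal user -> shard routing for the sharded active-passive layout.
# Each shard is an (active, passive) master pair; all writes for a given
# user go to one pair, so no data ever needs to cross shards.

import hashlib

# Hypothetical shard map: shard index -> (active master, passive master)
SHARDS = [
    ("db-a1.internal", "db-p1.internal"),
    ("db-a2.internal", "db-p2.internal"),
    ("db-a3.internal", "db-p3.internal"),
    ("db-a4.internal", "db-p4.internal"),
]

def shard_for(user_id: str) -> int:
    """Stable hash so the same user always maps to the same pair."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return int(digest, 16) % len(SHARDS)

def write_host(user_id: str) -> str:
    """Host that accepts writes for this user (the pair's active master)."""
    active, _passive = SHARDS[shard_for(user_id)]
    return active

print(write_host("acme-corp"))  # same host on every call for this user
```

Because the mapping is deterministic, it also gives the per-user "single write server" rule from CON#3 for free.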
9. Active Passive Masters : Advantages
• Provides failover, and does not put extra load on other Master pairs
• Provides scalability: if a server X attracts heavy load, a new active-passive pair can be added and part of X's load split off and moved to the new pair
• Replication delay is reduced significantly, because of dedicated slaves
• Hardware upgrades can be different on different pairs, depending
upon the load experienced by each of them
• Queries will perform better on each node because of smaller table
and index sizes on each active master node
10. Will it multiply cost? Not really
• Let's say we currently have a cluster of 4 nodes, each with 32 CPU cores and 3.6 TB of storage (3x the actual DB size)
• If we split this into an active-passive configuration, we will need 8 nodes, i.e. 4 masters & 4 slaves
• But since we are sharding our data across these nodes, the DB size on each node drops by a factor of 4, to 0.9 TB. We can also halve the CPU cores and RAM, as each node now has less data to process
• So let's compare the resource cost of this topology change
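The back-of-the-envelope comparison, using the figures above (4 nodes of 32 cores/3.6 TB vs 8 nodes of 16 cores/0.9 TB):

```python
# Back-of-the-envelope resource comparison: 4-node ring vs 4 sharded
# active-passive pairs, using the figures from the slide.

ring = {"nodes": 4, "cores": 32, "storage_tb": 3.6}         # every node holds all data
sharded = {"nodes": 8, "cores": 16, "storage_tb": 3.6 / 4}  # 1/4 of the data per node

for name, c in (("ring", ring), ("sharded", sharded)):
    total_cores = c["nodes"] * c["cores"]
    total_storage = c["nodes"] * c["storage_tb"]
    print(f"{name}: {total_cores} cores, {total_storage:.1f} TB total")
```

The totals come out to the same 128 cores, with half the total storage (7.2 TB vs 14.4 TB), since the sharded layout stores each row on one pair instead of on every node.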
12. Performance comparison
• The cost of 4 new servers might seem like a deterrent, but on AWS EC2 infrastructure this setup need not cost more than the current one
• The storage on each node shrinks to 1/N of the current DB size, where N is the number of shards, because each node now stores only its own shard's data
• As storage size shrinks, so do the table and index sizes
• As the RAM-to-DB-size ratio increases, memory performance generally improves
• CPU cores available per TB of data increase, which gives better performance
• Using SSDs for storage improves performance even further
13. Conclusion
• Ring replication is not the right option for high availability or scalability, and is not recommended for these use cases
• The active-passive master configuration is not the only alternative; it is compared here to expose the inefficiencies of the ring replication strategy
• It shows that by simply sharding the data and re-allocating compute resources, we can achieve much better performance, with more stability and efficiency across the cluster