Galera Replicator IRL
Art van Scheppingen
Head of Database Engineering
  • so that may be the reason our name is not widely known.
  • Spil Games @ FOSDEM: Galera Replicator IRL

    1. Galera Replicator IRL
        Art van Scheppingen, Head of Database Engineering
    2. Overview
        1. Who are we?
        2. What is Galera?
        3. What is Spil Games using Galera for?
        4. What have we learned?
        5. Future technologies
        6. Conclusion
    3. Who are we? Who is Spil Games?
    4. Facts
        • Company founded in 2001
        • 350+ employees worldwide
        • 180M+ unique visitors per month
        • Over 60M registered users
        • 45 portals in 19 languages
        • Casual games
        • Social games
        • Real-time multiplayer games
        • Mobile games
        • 35+ MySQL clusters
        • 60k queries per second (3.5 billion queries per day)
    5. Geographic Reach
        180 Million Monthly Active Users(*)
        Source: (*) Google Analytics, August 2012
    6. What is Galera? How to get Highly Available and beyond
    7. What is Galera?
        1. Replication plugin for MySQL by Codership
            • Synchronous (parallel) replication
            • Supports InnoDB
            • MyISAM “works”
            • Committing transactions actually replicates data
        2. Allows clustering of nodes
            • Minimum of 3 nodes for HA
            • Galera Arbitrator allows 2 nodes
            • Nodes that hold quorum form the Primary Component
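
The node counts on the "What is Galera?" slide follow from majority voting. A minimal sketch of that rule, assuming every voter (data node or arbitrator) carries equal weight; the function name is illustrative and not part of Galera:

```python
def has_quorum(reachable_voters: int, total_voters: int) -> bool:
    """Return True if the reachable voters form a strict majority.

    Assumption: all voters have equal weight. Galera also supports
    weighted quorum, which this sketch ignores.
    """
    if total_voters <= 0 or not 0 <= reachable_voters <= total_voters:
        raise ValueError("voter counts out of range")
    return 2 * reachable_voters > total_voters


# Two data nodes, one fails: 1 of 2 is not a majority.
print(has_quorum(1, 2))
# Two data nodes plus an arbitrator, one data node fails: 2 of 3 is a majority.
print(has_quorum(2, 3))
```

With only two voters, losing either one leaves the survivor without a majority, which is why a third node or an arbitrator is needed for HA.
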
    8. How does Galera work?
        [Diagram: Server-1, Server-2 ... Server-n connect, read and write to any of three MySQL nodes kept in sync by Galera synchronous replication.]
    9. Galera replication
        [Diagram labels: commit; Galera replication; Client receives OK; Transaction applied to slaves.]
    10. High Availability (1)
        [Diagram: Server-1, Server-2 ... Server-n and three MySQL nodes joined by Galera.]
    11. High Availability (2)
        [Diagram: Server-1, Server-2 ... Server-n reach three MySQL nodes joined by Galera through a load balancer / query router.]
    12. High Availability (3)
        [Diagram: three load balancers sit between Server-1, Server-2 ... Server-n and three MySQL nodes joined by Galera.]
    13. High Availability (4)
        [Diagram: a load balancer (port 3306 read, 3307 write) in front of three MySQL nodes joined by Galera; one node is read+write, two are read only.]
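
The port split on the "High Availability (4)" slide can be pictured as two backend pools behind one load balancer. This is only a sketch of the idea, not the load balancer configuration used at Spil Games: the class, the node names and the round-robin read policy are all assumptions, and the ports are parameters because the deck uses both assignments.

```python
from itertools import cycle
from typing import Iterable


class PortRouter:
    """Toy router: one port goes to a single writer, the other to readers."""

    def __init__(self, writer: str, readers: Iterable[str],
                 read_port: int = 3306, write_port: int = 3307) -> None:
        self.writer = writer
        self.read_port = read_port
        self.write_port = write_port
        # Round-robin over the read-only nodes (an assumed policy).
        self._readers = cycle(list(readers))

    def backend_for(self, port: int) -> str:
        """Pick the backend node for a connection arriving on `port`."""
        if port == self.write_port:
            return self.writer
        if port == self.read_port:
            return next(self._readers)
        raise ValueError(f"no backend pool listens on port {port}")


# Hypothetical node names.
router = PortRouter(writer="node1", readers=["node2", "node3"])
print(router.backend_for(3307))
print([router.backend_for(3306) for _ in range(3)])
```

Sending all writes to one node avoids write conflicts between nodes, while reads still spread over the rest of the cluster.
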
    14. Node joining SST (State Snapshot Transfer)
        [Diagram labels: New node; Requests to join cluster; SST; Cluster drains node; read/write to two nodes; Synchronous replication.]
    15. Node joining IST (Incremental State Transfer)
        [Diagram labels: Existing node; Requests to join cluster; IST; Synchronous replication.]
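
The two joining slides differ in how much data the joiner needs. A simplified sketch of the choice between them, assuming the decision depends only on the joiner's last known state and on the oldest write-set the donor still caches; the function and parameter names are illustrative, not Galera's API:

```python
def choose_state_transfer(joiner_uuid: str, joiner_seqno: int,
                          cluster_uuid: str, donor_oldest_cached_seqno: int) -> str:
    """Return "IST" when the donor can replay the missing write-sets, else "SST".

    Simplification: real clusters take more factors into account.
    """
    # A node from a different cluster history needs a full snapshot.
    if joiner_uuid != cluster_uuid:
        return "SST"
    # A negative seqno stands for "position unknown", e.g. a brand-new node.
    if joiner_seqno < 0:
        return "SST"
    # The joiner needs every write-set after its own position.
    if joiner_seqno + 1 >= donor_oldest_cached_seqno:
        return "IST"
    return "SST"


print(choose_state_transfer("abc", 1500, "abc", 1000))  # short outage
print(choose_state_transfer("abc", 200, "abc", 1000))   # fell too far behind
print(choose_state_transfer("abc", -1, "abc", 1000))    # new node
```
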
    16. Galera replication over WAN
        [Diagram: MySQL nodes in two datacenters, DC1 and DC2, linked by Galera replication. Commit is delayed by RTT.]
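
"Commit is delayed by RTT" puts a ceiling on how fast a single connection can commit one transaction after another. A back-of-the-envelope sketch, assuming each commit waits for one full round trip between datacenters and ignoring all other costs; the 20 ms figure is a made-up example, not a measurement from the talk:

```python
def max_serial_commits_per_second(rtt_ms: float) -> float:
    """Upper bound on back-to-back commits from one connection."""
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    return 1000.0 / rtt_ms


# Hypothetical 20 ms round trip between DC1 and DC2.
print(max_serial_commits_per_second(20))  # 50.0
```

The bound applies per connection; many connections committing in parallel can still reach a much higher total throughput.
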
    17. WAN replication Galera 2.x
        [Diagram: Node 1 to Node 6 spread over two datacenters, DC1 and DC2.]
    18. WAN replication Galera 3.x
        [Diagram: Node 1 to Node 6 spread over two datacenters, DC1 and DC2.]
    19. What are we using Galera for? Synchronous replication for the masses
    20. Our systems
        1. Legacy services databases
            • MySQL Master-Master
        2. SSP (Spil Storage Platform)
            • MySQL Master-Master (to be phased out)
            • Galera
        3. ROAR (Read Often, Alter Rarely)
            • Galera
    21. Master-Master setup used at Spil Games
        [Diagram: MMM managing an active master and an inactive master with asynchronous replication. Addresses: db-something (192.168.1.1), db-something-r1 (192.168.1.2), db-something-r2 (192.168.1.3). Traffic labels: read+write; read only.]
    22. Master-Master setup used at Spil Games
        [Diagram: the MMM setup with an active master, an inactive master and two slaves. Addresses: db-something (192.168.1.1), db-something-r1 (192.168.1.2), db-something-r2 (192.168.1.3), db-something-r3 (192.168.1.4), db-something-r4 (192.168.1.5). Traffic labels: read+write; read only.]
    23. Migrating legacy dbs to Galera (lab)
        [Diagram: legacy1, legacy2 and legacy3 (inactive masters) and a Galera cluster. Step labels: Clone database (innobackupex); Feed database dump (mysqldump); Start slaving.]
    24. Scaling Galera (1)
        [Diagram: a load balancer (port 3306 write+read) between Server-1, Server-2 ... Server-n and MySQL nodes joined by Galera.]
    25. Scaling Galera (2)
        [Diagram: a load balancer (port 3306 write, 3307 read) in front of MySQL nodes joined by Galera, plus read-only MySQL nodes fed by asynchronous replication.]
    26. Why consolidate legacy systems?
        1. Around 20 legacy database clusters
            • 50 servers in total
        2. Maintenance
            • Master-Master requires a lot of (manual) maintenance
        3. Replacement is needed
            • 35 of them will be older than 3 years in 2014
        4. Current state: tested in lab
    27. SSP (Spil Storage Platform)
        • Storage API between application and databases
        • All data is sharded
            • User
            • Function
            • Location
        • Every cluster (two masters) will contain two shards
            • Data written interleaved
            • HA for both shards
            • Both masters active and “warmed up”
        [Diagram labels: SSP; Shard 1; Shard 2.]
    28. SSP Master-Master setup
        [Diagram: MMM managing two active masters with asynchronous replication, both read+write: db-ssp001 (192.168.2.1) and db-ssp002 (192.168.2.2).]
    29. SSP Master-Master setup
        [Diagram: the same MMM setup with one master broken; read+write traffic goes to the remaining active master. Addresses: db-ssp001 (192.168.2.1), db-ssp002 (192.168.2.2).]
    30. SSP Galera setup
        [Diagram: a load balancer in front of three MySQL nodes with synchronous Galera replication; read/write to any node.]
    31. Current state of the SSP
        1. Total of 4 old-style SSP shard nodes (2 clusters)
        2. Total of 6 Galera SSP shard nodes (2 clusters)
        3. Add Galera nodes/clusters when necessary
    32. What have we learned so far? Pitfalls, hurdles, etc.
    33. Creating backups
        1. Two ways to make backups:
            • Issue SST
                • Either mysqldump or innobackupex
            • Regular innobackupex
                • --galera-info
                • SET GLOBAL wsrep_desync=ON to take the node out of flow control while the backup runs
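
The "regular innobackupex" route on the "Creating backups" slide boils down to: desync the node, take the backup, resync. A minimal sketch of that ordering, assuming the caller supplies the two callables; `run_sql`, `run_backup` and the `/backups` path are hypothetical stand-ins, not part of any library:

```python
from typing import Callable


def backup_with_desync(run_sql: Callable[[str], None],
                       run_backup: Callable[[], None]) -> None:
    """Desync the node, run the backup, and always resync afterwards."""
    run_sql("SET GLOBAL wsrep_desync = ON")
    try:
        run_backup()
    finally:
        # Resync even if the backup fails, so the node does not stay desynced.
        run_sql("SET GLOBAL wsrep_desync = OFF")


# Recording stand-ins instead of a real server and a real backup run.
log = []
backup_with_desync(log.append,
                   lambda: log.append("innobackupex --galera-info /backups"))
print(log)
```

The `finally` clause is the point of the sketch: a failed backup should not leave the node desynced.
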
    34. Backup SST
        [Diagram labels: Backup receiver; Request SST; SST; Cluster drains node; read/write to two nodes; done; Synchronous replication.]
    35. Backup Innobackupex
        [Diagram labels: wsrep_desync=ON; Stream backup; BackupPC; wsrep_desync=OFF; read/write to two nodes; Synchronous replication.]
    36. Restoring backups
        1. Restored backup can be used to prevent SST of new joiners
        2. Automated backup verification
            • Restores (randomly) chosen backup
            • Installs necessary MySQL version (5.1/5.5)
            • Performs basic checks
            • Enables replication
            • Will not work fully as it needs a working cluster to join
    37. Monitoring
        1. Cluster
            • Nodes in the cluster
            • Warning at 2, critical at 1
            • Availability of the address
        2. Load balancer
            • Node checks
        3. Performance monitoring
            • Adding metrics to mysql_statsd is easy
            • wsrep_flow_control
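
The cluster thresholds on the "Monitoring" slide map naturally onto a three-state check. A sketch under two assumptions: the node count comes from the `wsrep_cluster_size` status variable, and the results follow the common 0/1/2 exit-code convention of Nagios-style plugins. The slide names neither, and the function name is illustrative.

```python
OK, WARNING, CRITICAL = 0, 1, 2


def check_cluster_size(cluster_size: int, warn_at: int = 2, crit_at: int = 1) -> int:
    """Map the number of nodes in the cluster to a monitoring state."""
    if cluster_size <= crit_at:
        return CRITICAL
    if cluster_size <= warn_at:
        return WARNING
    return OK


for size in (3, 2, 1):
    print(size, check_cluster_size(size))
```
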
    38. Flow control
        1. Usage of replication threads
            • Scale from 0.0 to 1.0
        2. Recommended to stay below 0.1 (10% blocked)
        3. Adding more nodes will not solve your problem
        4. Increase replication threads
            • Recommended 2*CPU cores
            • What if 64 is not enough?
            • How do you close flood gates?
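
The two rules of thumb on the "Flow control" slide are easy to encode. A sketch assuming the 0.0 to 1.0 value is read from the `wsrep_flow_control_paused` status variable and that "replication threads" refers to the `wsrep_slave_threads` setting; the slide names neither explicitly, and both function names are illustrative:

```python
def flow_control_healthy(paused_fraction: float, threshold: float = 0.1) -> bool:
    """True while replication is paused less than `threshold` of the time."""
    if not 0.0 <= paused_fraction <= 1.0:
        raise ValueError("paused_fraction must be between 0.0 and 1.0")
    return paused_fraction < threshold


def suggested_applier_threads(cpu_cores: int) -> int:
    """Rule of thumb from the slide: two replication threads per CPU core."""
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    return 2 * cpu_cores


print(flow_control_healthy(0.03))
print(flow_control_healthy(0.25))
print(suggested_applier_threads(32))  # 64
```
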
    39. Other things we bumped into…
        1. MySQL version updates
            • Update one by one
            • PXC SST changes
        2. Availability after restart
            • Joins cluster after IST/SST
            • LRU still loading
        3. Nondescriptive errors during SST
            • Local user authentication (after starting mysqld with sudo!)
        4. Schema changes
    40. Future for Galera at Spil Games: What will we do in the near future?
    41. OpenStack
        1. Offer DAAS to our (internal) customers
        2. Spawning (automated) database nodes and clusters when necessary
        3. Mix and match Galera and regular MySQL replication
    42. WAN Replication
        1. No immediate use case (yet)
            • No need for WAN in sharded environment
            • Game catalogue might need it in the future
        2. Wait for Galera 3.0
            • Datacenter awareness
    43. MaxScale
        1. Beta testing MaxScale for SkySQL
            • Works flawlessly in the lab (so far)
            • Not yet tested with mixed Galera/MySQL replication
        2. MaxScale itself is not HA (yet)
            • Keepalived?
    44. Conclusion: What is our verdict?
    45. Conclusion(s)
        1. Galera definitely lives up to expectations
        2. Decreased cluster-wide performance
        3. Increased replication performance
        4. High investment in time for initial setup/tools
        5. Maintenance is easier
        6. Well worth the investment for us
    46. Thank you!
        • Presentation can be found at: http://spil.com/fosdem2014
        • Mysql_statsd can be found at:
            http://spil.com/mysqlstatsd
            http://github.com/spilgames/mysql-statsd
        • If you wish to contact me:
            Email: art@spilgames.com
            Twitter: @banpei
        • Engineering @ Spil Games
            Blog: http://engineering.spilgames.com
            Twitter: @spilengineering
    47. Photo sources
        • Our current HA environment: http://thinkaurelius.com/2013/03/30/titan-server-from-a-single-server-to-a-highly-available-cluster/
        • What we have learned so far: http://renaissanceronin.wordpress.com/2009/10/05/playing-with-plasma-cutters/
        • Near future: http://www.example-infographics.com/envisioning-the-near-future-of-technology/
        • Conclusion: http://www.flickr.com/photos/louisephotography/5796499806/in/photostream/
