Migrate an
infrastructure to
Galera Cluster
Art van Scheppingen
Head of Database Engineering
2
1. Who are we?
2. What is Galera?
3. What is Spil Games using Galera for?
4. What have we learned?
5. Future technologie...
Who are we?
Who is Spil Games?
4
• Game publishers
• Company founded in 2001
• 350+ employees world wide
• 180M+ unique visitors per month
• Over 60M reg...
5
Geographic Reach
180 Million Monthly Active Users(*)
Source: (*) Google Analytics, August 2012
What is Galera?
How to get Highly Available
and beyond
1. Replication plugin for MySQL by Codership
• Synchronous (parallel) replication
• Supports InnoDB
• MyISAM “works”
• Com...
How does Galera work?
Galera
Server-1 Server-2 Server-n
MySQL MySQL MySQL
Connect/read/write
to any node
Synchronous repli...
Galera synchronous replication
Galera replication
Server-n
MySQL MySQL MySQL
commit
Client receives OK
Transaction applied...
Galera vs Master-Slave replication
Server-1 Server-2 Server-n
MySQL
master
MySQL
slave
read onlyread+write
Asynchronous
si...
High Availability (1)
Galera
Server-1 Server-2 Server-n
MySQL MySQL MySQL
High Availability (2)
Galera
Server-1 Server-2 Server-n
Load balancer / Query router
MySQL MySQL MySQL
High Availability (3)
Galera
Server-1
Load balancer
MySQL MySQL MySQL
Server-2
Load balancer
Server-n
Load balancer
High Availability (4)
Galera
Server-1 Server-2 Server-n
Load balancer (port 3306 read,3307 write)
MySQL MySQL MySQL
read+w...
GaleraGalera
Server-1 Server-2 Server-n
Load balancer
read/write to two nodes
New
node
SST
Requests to
join cluster
Cluste...
GaleraGalera
Server-1 Server-2 Server-n
Load balancer
Existing
node
IST
Requests to
join cluster
MySQL MySQL MySQL
Synchro...
Galera replication over WAN
Galera replication
Server-n
MySQL MySQL
DC1 DC2
Commit is delayed by RTT
WAN replication Galera 2.x
DC1 DC2
Node 1
Node 2 Node 3
Node 4
Node 5 Node 6
WAN replication Galera 3.x
DC1 DC2
Node 1
Node 2 Node 3
Node 4
Node 5 Node 6
What are we
using Galera for?
Synchronous replication for
the masses
1. Legacy services databases
• MySQL Master-Master
2. SSP (Spil Storage Platform)
• MySQL Master-Master (last pair remaini...
Master-Master setup used at Spil Games
Server-1 Server-2 Server-n
MySQL
active
master
MySQL
inactive
master
Asynchronous r...
Master-Master setup used at Spil Games
MySQL
active
master
MySQL
inactive
master
read onlyread+write
MMM
db-something (192...
Migrating legacy dbs to Galera (lab)
Galera
MySQL MySQL MySQL
legacy1
inactive
master
MySQL
Clone database
(innobackupex) ...
Scaling Galera (1)
Server-1 Server-2 Server-n
Load balancer (port 3306 write+read)
Galera GaleraGalera
MySQLMySQL MySQL My...
Scaling Galera (2)
Server-1 Server-2 Server-n
Load balancer (port 3306 write, 3307 read)
Galera
MySQL MySQL MySQLMySQL MyS...
1. Around 20 legacy database clusters
• 50 servers in total
2. Maintenance
• Master-Master requires a lot of (manual)
main...
28
• Storage API between application and databases
• All data is sharded
• User
• Function
• Location
• Initial design:
• ...
SSP Master-Master setup
Server-1 Server-2 Server-n
MySQL
active
master
MySQL
active
master
Asynchronous replication
read+w...
SSP Master-Master setup
Server-1 Server-2 Server-n
MySQL
broken
master
MySQL
active
master
read+write
MMM
db-ssp001 (192.1...
SSP Galera setup
Galera
Server-1 Server-2 Server-n
Load balancer
MySQL MySQL MySQL
read/write to any node
Synchronous repl...
1. Total of 2 old style SSP shard nodes (1 pair)
2. Total of 6 Galera SSP shard nodes (2 clusters)
3. 6 more nodes are pla...
What have we
learned so far?
Pitfalls, hurdles, etc
Creating backups
1. Two ways to make backups:
• Issue SST
• Either mysqldump or Innobackupex
• Regular Innobackupex
• --ga...
Backup SST
Galera
Server-1 Server-2 Server-n
Load balancer
MySQL MySQL MySQL
read/write to two nodes
Synchronous replicati...
Backup Innobackupex
Galera
Server-1 Server-2 Server-n
Load balancer
MySQL MySQL MySQL
read/write to two nodes
Synchronous ...
Restoring backups
1. Restored backup can be used to prevent SST of new
joiners
2. Automated backup verification
• Restores...
Monitoring
1. Cluster
• Nodes in the cluster
• Warning at 2, critical at 1
• Availability of the address
2. Load balancer
...
Flow control
1. Usage of replication threads
• Scale from 0.0 to 1.0
2. Recommended to stay below 0.1 (10% blocked)
3. Add...
Other things we bumped into…
1. MySQL version updates
• Update one by one
• PXC SST changes
2. Availability after restart
...
Future for Galera
at Spil Games
What will we do in the near
future?
Openstack
1. Offer DAAS to our (internal) customers
2. Spawning (automated) database nodes and clusters
when necessary
3. ...
WAN Replication
1. No immediate use case (yet)
• No need for WAN in sharded environment
• Game catalogue cluster needs it ...
MaxScale
1. Beta testing MaxScale for SkySQL
• Works flawless in the lab (so far)
• Not yet tested with mixed Galera/MySQL...
Conclusion
What is our verdict?
Conclusion(s)
1. Galera definitely live up to expectations
2. Decreased cluster wide performance
3. Increased replication ...
Questions?
• Building globally available storage layers
Friday 1:50PM - 2:40PM @ Ballroom F
• MaxScale, the pluggable router
Today 4:...
49
• Presentation can be found at:
http://spil.com/plsc2014galera
• If you wish to contact me:
Email: art@spilgames.com
Tw...
50
Our current HA environment:
http://thinkaurelius.com/2013/03/30/titan-server-from-a-single-server-to-a-highly-available...
Upcoming SlideShare
Loading in...5
×

Spil Games: Migrate an infrastructure to Galera Cluster

2,835

Published on

Galera Cluster is a synchronous multi-master cluster for MySQL that allows you to synchronously replicate your data to every node in the cluster. Galera Cluster makes the life of a DBA easier with features like automatic node joining, electing donor nodes and automatic node removal once a node has failed. There is also no need to distinguish master and slave relations in your application as all nodes in the cluster are writable. Consider all nodes in the cluster as one big MySQL database server.

In this talk I will explain how Spil Games strives to have their entire MySQL environment running on Galera Cluster. At Spil Games we have a large variety of MySQL implementations ranging from traditional functionally sharded database master-master pairs, single masters with many read slaves and user/location sharded database nodes. Migrating these different implementations to a Galera Cluster requires careful planning.

The biggest challenge is to migrate all our existing (ageing) asynchronous Master-Master setup to a synchronous Galera replicator. Because the ageing servers are less performant than the new servers used for Galera we merge multiple existing clusters into one cluster, saving us lots of space and power in our data center.

The session will include design choices, learnings and the pitfalls we fell into.

0 Comments
4 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
2,835
On Slideshare
0
From Embeds
0
Number of Embeds
3
Actions
Shares
0
Downloads
0
Comments
0
Likes
4
Embeds 0
No embeds

No notes for slide
  • so that may be the reason our name is not widely known.
  • Transcript of "Spil Games: Migrate an infrastructure to Galera Cluster"

    1. 1. Migrate an infrastructure to Galera Cluster Art van Scheppingen Head of Database Engineering
    2. 2. 2 1. Who are we? 2. What is Galera? 3. What is Spil Games using Galera for? 4. What have we learned? 5. Future technologies 6. Conclusion 7. Questions? Overview
    3. 3. Who are we? Who is Spil Games?
    4. 4. 4 • Game publishers • Company founded in 2001 • 350+ employees world wide • 180M+ unique visitors per month • Over 60M registered users • 45 portals in 19 languages • Casual games • Social games • Real time multiplayer games • Mobile games • 40+ MySQL clusters • 65k queries per second • 8 Sphinx servers • 8k queries per second Facts
    5. 5. 5 Geographic Reach 180 Million Monthly Active Users(*) Source: (*) Google Analytics, August 2012
    6. 6. What is Galera? How to get Highly Available and beyond
    7. 7. 1. Replication plugin for MySQL by Codership • Synchronous (parallel) replication • Supports InnoDB • MyISAM “works” • Committing transactions actually replicates data 2. Allows clustering of nodes • Minimum of 3 nodes for HA • Galera Arbitrator allows 2 nodes • Primary Component • Normally the full cluster is a PC • Split: several components, one PC is allowed What is Galera?
    8. 8. How does Galera work? Galera Server-1 Server-2 Server-n MySQL MySQL MySQL Connect/read/write to any node Synchronous replication
    9. 9. Galera synchronous replication Galera replication Server-n MySQL MySQL MySQL commit Client receives OK Transaction applied to slaves
    10. 10. Galera vs Master-Slave replication Server-1 Server-2 Server-n MySQL master MySQL slave read onlyread+write Asynchronous single-threaded replication
    11. 11. High Availability (1) Galera Server-1 Server-2 Server-n MySQL MySQL MySQL
    12. 12. High Availability (2) Galera Server-1 Server-2 Server-n Load balancer / Query router MySQL MySQL MySQL
    13. 13. High Availability (3) Galera Server-1 Load balancer MySQL MySQL MySQL Server-2 Load balancer Server-n Load balancer
    14. 14. High Availability (4) Galera Server-1 Server-2 Server-n Load balancer (port 3306 read,3307 write) MySQL MySQL MySQL read+write read only read only
    15. 15. GaleraGalera Server-1 Server-2 Server-n Load balancer read/write to two nodes New node SST Requests to join cluster Cluster drains node MySQL MySQL MySQL Synchronous replication MySQL Node joining SST (State Snapshot Transfer)
    16. 16. GaleraGalera Server-1 Server-2 Server-n Load balancer Existing node IST Requests to join cluster MySQL MySQL MySQL Synchronous replication MySQL Node joining IST (Incremental State Transfer)
    17. 17. Galera replication over WAN Galera replication Server-n MySQL MySQL DC1 DC2 Commit is delayed by RTT
    18. 18. WAN replication Galera 2.x DC1 DC2 Node 1 Node 2 Node 3 Node 4 Node 5 Node 6
    19. 19. WAN replication Galera 3.x DC1 DC2 Node 1 Node 2 Node 3 Node 4 Node 5 Node 6
    20. 20. What are we using Galera for? Synchronous replication for the masses
    21. 21. 1. Legacy services databases • MySQL Master-Master 2. SSP (Spil Storage Platform) • MySQL Master-Master (last pair remaining) • Galera 3. ROAR (Read Often, Alter Rarely) • Galera / Sphinx Our systems
    22. 22. Master-Master setup used at Spil Games Server-1 Server-2 Server-n MySQL active master MySQL inactive master Asynchronous replication read onlyread+write MMM db-something (192.168.1.1) db-something-r1 (192.168.1.2) db-something-r2 (192.168.1.3)
    23. 23. Master-Master setup used at Spil Games MySQL active master MySQL inactive master read onlyread+write MMM db-something (192.168.1.1) db-something-r1 (192.168.1.2) db-something-r2 (192.168.1.3) MySQL slave MySQL slave db-something-r3 (192.168.1.4) db-something-r4 (192.168.1.5)read only
    24. 24. Migrating legacy dbs to Galera (lab) Galera MySQL MySQL MySQL legacy1 inactive master MySQL Clone database (innobackupex) Feed database dump (mysqldump) Start slaving legacy3 inactive master legacy2 inactive master
    25. 25. Scaling Galera (1) Server-1 Server-2 Server-n Load balancer (port 3306 write+read) Galera GaleraGalera MySQLMySQL MySQL MySQLMySQL
    26. 26. Scaling Galera (2) Server-1 Server-2 Server-n Load balancer (port 3306 write, 3307 read) Galera MySQL MySQL MySQLMySQL MySQL read only read only asynchronous replication asynchronous replication
    27. 27. 1. Around 20 legacy database clusters • 50 servers in total 2. Maintenance • Master-Master requires a lot of (manual) maintenance 3. Replacement is needed • 35 of them will be older than 3 years in 2014 4. Current state: 1 pair migrated to a cluster in Openstack Why consolidate legacy systems?
    28. 28. 28 • Storage API between application and databases • All data is sharded • User • Function • Location • Initial design: • Every cluster (two masters) will contain two shards • Data written interleaved • HA for both shards • Both masters active and “warmed up” SSP (Spil Storage Platform) SSP Shard 1 Shard 2
    29. 29. SSP Master-Master setup Server-1 Server-2 Server-n MySQL active master MySQL active master Asynchronous replication read+writeread+write MMM db-ssp001 (192.168.2.1) db-ssp002 (192.168.2.2)
    30. 30. SSP Master-Master setup Server-1 Server-2 Server-n MySQL broken master MySQL active master read+write MMM db-ssp001 (192.168.2.1) db-ssp002 (192.168.2.2)
    31. 31. SSP Galera setup Galera Server-1 Server-2 Server-n Load balancer MySQL MySQL MySQL read/write to any node Synchronous replication
    32. 32. 1. Total of 2 old style SSP shard nodes (1 pair) 2. Total of 6 Galera SSP shard nodes (2 clusters) 3. 6 more nodes are planned for Q2 4. Add Galera nodes/clusters when necessary Current state of the SSP
    33. 33. What have we learned so far? Pitfalls, hurdles, etc
    34. 34. Creating backups 1. Two ways to make backups: • Issue SST • Either mysqldump or Innobackupex • Regular Innobackupex • --galera_info • set global wsrep_desync=on to remove node
    35. 35. Backup SST Galera Server-1 Server-2 Server-n Load balancer MySQL MySQL MySQL read/write to two nodes Synchronous replication Backup receiver SST Request SST Cluster drains node Backup done
    36. 36. Backup Innobackupex Galera Server-1 Server-2 Server-n Load balancer MySQL MySQL MySQL read/write to two nodes Synchronous replication BackupPC wsrep_desync=ON Stream backup wsrep_desync=OFF
    37. 37. Restoring backups 1. Restored backup can be used to prevent SST of new joiners 2. Automated backup verification • Restores (randomly) chosen backup • Installs necessary MySQL version (5.1/5.5) • Perform basic checks • Enable replication • Will not work fully as it needs a working cluster to join
    38. 38. Monitoring 1. Cluster • Nodes in the cluster • Warning at 2, critical at 1 • Availability of the address 2. Load balancer • Node checks • Write only on one node 3. Performance monitoring • Adding metrics to mysql_statsd is easy • wsrep_flow_control
    39. 39. Flow control 1. Usage of replication threads • Scale from 0.0 to 1.0 2. Recommended to stay below 0.1 (10% blocked) 3. Adding more nodes will not solve your problem 4. Increase replication threads • Recommended 2*CPU cores • What if 64 is not enough? • How do you close flood gates?
    40. 40. Other things we bumped into… 1. MySQL version updates • Update one by one • PXC SST changes 2. Availability after restart • Joins cluster after IST/SST • LRU still loading 3. In descriptive errors during SST • Local user authentication (after starting mysqld with sudo!) 4. Schema changes
    41. 41. Future for Galera at Spil Games What will we do in the near future?
    42. 42. Openstack 1. Offer DAAS to our (internal) customers 2. Spawning (automated) database nodes and clusters when necessary 3. Mix and match Galera and regular MySQL replication
    43. 43. WAN Replication 1. No immediate use case (yet) • No need for WAN in sharded environment • Game catalogue cluster needs it in the future 2. Test Galera 3.0 • Datacenter awareness
    44. 44. MaxScale 1. Beta testing MaxScale for SkySQL • Works flawless in the lab (so far) • Not yet tested with mixed Galera/MySQL replication 2. MaxScale itself is not HA (yet) • Keepalived?
    45. 45. Conclusion What is our verdict?
    46. 46. Conclusion(s) 1. Galera definitely live up to expectations 2. Decreased cluster wide performance 3. Increased replication performance 4. High investment in time for initial setup/tools 5. Maintenance is easier 6. Well worth the investment for us
    47. 47. Questions?
    48. 48. • Building globally available storage layers Friday 1:50PM - 2:40PM @ Ballroom F • MaxScale, the pluggable router Today 4:50PM - 5:40PM @ Ballroom A • Geographical Disaster recovery challenges and solutions Today 4:50PM - 5:40PM @ Ballroom C • Write Conflicts in Multi-Master Replication Topologies Thursday 4:30PM - 5:20PM @ Ballroom H Other (Galara related) talks to attend to
    49. 49. 49 • Presentation can be found at: http://spil.com/plsc2014galera • If you wish to contact me: Email: art@spilgames.com Twitter: @banpei • Engineering @ Spil Games Blog: http://engineering.spilgames.com Twitter: @spilengineering Thank you!
    50. 50. 50 Our current HA environment: http://thinkaurelius.com/2013/03/30/titan-server-from-a-single-server-to-a-highly-available-cluster/ What we have learned so far: http://renaissanceronin.wordpress.com/2009/10/05/playing-with-plasma-cutters/ Near future: http://www.example-infographics.com/envisioning-the-near-future-of-technology/ Conclusion: http://www.flickr.com/photos/louisephotography/5796499806/in/photostream/ Photo sources

    ×