Upgrade to MySQL 5.6 without downtime

MySQL upgrade in a context of high load with hundreds of MySQL instances and terabytes of data.

Published in: Technology

  1. 1. Olivier Dasini - @freshdaz Upgrade to MySQL 5.6 without downtime Meetup LeMug.fr @Dailymotion - Paris - Sept 17, 2015 1
  2. 2. Olivier Dasini - @freshdaz Agenda Me, Myself & I Technical background Why upgrade to 5.6? Performance testing Preprod upgrade Production upgrade Wrap-up 2
  3. 3. Olivier Dasini - @freshdaz Me, Myself & I Olivier DASINI - @freshdaz ● MySQL Geek & Data enthusiast ● Technical writer, blogger and speaker ● Insatiable hunger for learning ● Co-creator of the French MySQL User Group 3
  4. 4. Olivier Dasini - @freshdaz Agenda Me, Myself & I Technical background Why upgrade to 5.6? Performance testing Preprod upgrade Production upgrade Wrap-up 4
  5. 5. Olivier Dasini - @freshdaz Technical background 1/3 MySQL users can be split into 3 types by the order of magnitude of their working set: ● <= Tens of GBs : 20% ○ MySQL usage probably not (so) critical ○ Migration (quite) easy, could be manual ● <= Tens of TBs : 75% ○ MySQL is critical => strong production constraints ○ Migration should be carefully planned ○ Needs automation, although some parts could be manual ● >= Hundreds+ of TBs : 5% ○ MySQL highly critical. Think twice (or more) before upgrading. ○ Same as above, with automation everywhere 5
  6. 6. Olivier Dasini - @freshdaz Technical background 2/3 The company : ● Software development ● Provides a cloud-based customer service platform ○ ~ 1,000 people ○ ~ 60,000 paid customers in 150 countries 6
  7. 7. Olivier Dasini - @freshdaz Technical background 3/3 MySQL flavour : Percona Server 5.5 on Fusion IO Data size : ~ 30 TB | Daily growth rate : up to 40 GB # MySQL groups of replicas (1 Master / n Slaves) : ~ 50 # MySQL instances : ~ 200 Mostly OLTP oriented workload - InnoDB tables Thousands of qps, mostly reads (Selects) Replication lag sensitive No downtime allowed!!! 7
  8. 8. Olivier Dasini - @freshdaz Agenda Me, Myself & I Technical background Why upgrade to 5.6? Performance testing Preprod upgrade Production upgrade Wrap-up 8
  9. 9. Olivier Dasini - @freshdaz Why upgrade to 5.6? 1/3 Tons of cool new stuff : ● Security improvements ● InnoDB enhancements ● Partitioning ● Performance Schema ● Replication and logging ● Optimizer enhancements ● … Complete list : http://dev.mysql.com/doc/refman/5.6/en/mysql-nutshell.html 9
  10. 10. Olivier Dasini - @freshdaz Why upgrade to 5.6? 2/3 Choose which features we'd like to have. Team brainstorming... ● Define which added features will suit us ○ Schedule when we'll use them ○ Avoid too many changes at one time ● Pay attention to deprecated features ○ They'll probably be removed in a future version ○ Shouldn't be used anymore ● Pay extra attention to removed features ○ They'll break your server 10
  11. 11. Olivier Dasini - @freshdaz Why upgrade to 5.6? 3/3 Team brainstorming result : ● InnoDB enhancements ○ Persistent stats ○ Online DDL ○ New flushing algo ○ New checksum algo ● Performance Schema ● Replication ○ Smaller image for row-based replication ○ Crash safe Master ⇔ Crash safe binlog ○ Crash safe Slave ⇔ Table logging for master / slaves info ○ GTID (for automatic Switchover/Failover) : [Phase 2] ○ Parallel replication : [Phase 3] ● Optimizer enhancements... Upgrade Confidence Index : 60% 11
  12. 12. Olivier Dasini - @freshdaz Agenda Me, Myself & I Technical background Why upgrade to 5.6? Performance testing Preprod upgrade Production upgrade Wrap-up 12
  13. 13. Olivier Dasini - @freshdaz Performance testing 1/13 The 5.6 upgrade will be awesome (at least in theory). Many articles prove it, yeah! http://dimitrik.free.fr/blog/archives/2013/02/mysql-performance-mysql-56-vs-mysql-55-vs-mariadb-55.html https://blogs.oracle.com/MySQL/entry/mysql_5_6_is_a Benchmarks never lie :)… but is their truth ours? In real life, performance will depend on many factors like workload, hardware, configuration, … What about us? 13
  14. 14. Olivier Dasini - @freshdaz Performance testing 2/13 ● The plan is to get our own numbers ● Compare 5.5 and 5.6 performance in a production context ● Unfortunately we have customers !!! :) ● So: out of production, but in a context as similar as possible ○ Data ○ Queries ○ Workload ○ Hardware ○ Configuration... => Ad-hoc 5.6 upgrade on 1 server 14
  15. 15. Olivier Dasini - @freshdaz Performance testing 3/13 Build a 5.6 test server from a 5.5 slave. Choose a "small" cluster (1.5 TB). The ad-hoc upgrade is quite straightforward: Clone a 5.5 server -> Upgrade to 5.6 -> Set up replication Steps ● Take a binary backup (Xtrabackup) from db5.5 (5.5 instance) ● Restore the binary backup on the new server (5.6 candidate but still in 5.5) ● 5.6 binaries upgrade + New configuration (5.6 my.cnf) ● mysql_upgrade ● Start replication (master is still in 5.5) 15
  16. 16. Olivier Dasini - @freshdaz Performance testing 4/13 Issue : Fatal replication error 1/2 Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'log event entry exceeded max_allowed_packet; Increase max_allowed_packet on master; the first event 'db_master_5.5-bin-log.003440' at 974453835, the last event read from '/var/log/mysql/db_master_5.5-bin-log.003440' at 974453835, the last byte read from '/var/log/mysql/db_master_5.5-bin-log.003440' at 974453854.' On the master binary log: ERROR: Error in Log_event::read_log_event(): 'Event too big', data_len: 1852797793, event_type: 104 Could not read entry at offset 974453835: Error in log format or read error. #150318 18:09:39 server id 174326798 end_log_pos 107 Start: binlog v 4, server v 5.5.32-31.0-log created 150318 18:09:39 16
  17. 17. Olivier Dasini - @freshdaz Performance testing 5/13 Issue : Fatal replication error 2/2 ● We never found an explanation. ● We tried to increase max_allowed_packet dynamically on both the master and the 5.6 slave… but it had no effect. ● Only the 5.6 slave was impacted, i.e. no issues for the 5.5 slaves ● No fix except ignoring this binlog, i.e. switching to the next one. ○ Meaning a risk of losing events… ○ Also a high risk of inconsistency So we dropped the data and reloaded a fresh 5.5 dump + mysql_upgrade. 17
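
One detail of the error output on the previous slide can be checked with simple arithmetic: the reported data_len. The sketch below assumes the documented 1 GiB upper limit of max_allowed_packet and assumes that the length field is stored little-endian in the event header; the helper names are illustrative and not part of any MySQL tool.

```python
# Documented upper limit of max_allowed_packet in MySQL 5.5/5.6 (1 GiB).
MAX_ALLOWED_PACKET_CEILING = 1024 ** 3

# Value reported by mysqlbinlog in the error on the previous slide.
REPORTED_DATA_LEN = 1852797793


def can_fit(data_len: int, ceiling: int = MAX_ALLOWED_PACKET_CEILING) -> bool:
    """Return True if an event of data_len bytes could fit under the ceiling."""
    return data_len <= ceiling


def as_ascii(value: int, width: int = 4) -> str:
    """Render an integer as the bytes it would occupy on disk (little-endian)."""
    return value.to_bytes(width, "little").decode("ascii", errors="replace")


print(can_fit(REPORTED_DATA_LEN))   # False
print(as_ascii(REPORTED_DATA_LEN))  # ason
```

No value of max_allowed_packet could have accommodated the reported length, which is consistent with the tuning having had no effect. The length bytes being printable ASCII might hint that the reader interpreted some text as an event header, but that is only a guess; the slides give no root cause.
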
  18. 18. Olivier Dasini - @freshdaz Performance testing 6/13 The goal is to compare performance between 5.5 & 5.6 5.6 status : ○ Replicating data like any other 5.5 slave ○ Contains production data ○ Same hardware characteristics Ready to start our benchmarks o/ 18
  19. 19. Olivier Dasini - @freshdaz Performance testing 7/13 Tool pt-upgrade : https://www.percona.com/doc/percona-toolkit/2.2/pt-upgrade.html pt-upgrade executes queries in the given MySQL LOGS on each DSN, compares the results, and reports any significant differences. The tool can also save the results for later analyses. LOGS can be slow, general, binary, tcpdump and raw. Best practices ● Split your (slow) logs into small chunks : 200 ~ 500 MB of data ○ Easier to manage ○ Output easier to analyse ● Choose your data samples carefully ○ Capture queries at different times ○ Reduces the risk of missing important queries 19
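
The "split your logs into small chunks" advice can be sketched as follows. This is a minimal illustration, not the speaker's script: `split_slow_log` is a hypothetical helper, and it assumes an entry starts at a `#` header line that follows a non-header line, so it would cut in the wrong place if a query body itself contained a line starting with `#`.

```python
from typing import Iterable, Iterator, List


def split_slow_log(lines: Iterable[str], max_bytes: int) -> Iterator[List[str]]:
    """Yield chunks of slow-log lines, cutting only between entries.

    Assumption: an entry starts at a '#' header line that follows a
    non-header line (covers '# Time:' and '# User@Host:' headers).
    """
    chunk: List[str] = []
    size = 0
    prev_is_header = False
    for line in lines:
        is_header = line.startswith("#")
        starts_entry = is_header and not prev_is_header
        # Only cut at an entry boundary, once the size budget is reached.
        if starts_entry and chunk and size >= max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(line)
        size += len(line.encode("utf-8"))
        prev_is_header = is_header
    if chunk:
        yield chunk
```

With `max_bytes` set to something like `300 * 1024 * 1024`, each chunk lands near the 200 ~ 500 MB range from the slide, slightly over the budget because the cut waits for the next entry boundary.
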
  20. 20. Olivier Dasini - @freshdaz Performance testing 8/13 Phase 1 - Collect Slow Logs For each collection : ● Connect to a 5.5 slave in production ● Set long_query_time to 0 ○ mysql> SET GLOBAL long_query_time = 0; ● Clean the slow log ○ $ cp /dev/null /var/log/mysql/slow-log ● Wait for X mins or watch the slow log grow to ~300MB (whichever comes first) ● Set long_query_time back to its default value ○ mysql> SET GLOBAL long_query_time = <DEFAULT_VALUE>; ● Copy the slow log with a date suffix ○ $ cp /var/log/mysql/slow-log ./slow-log-$(date +"%F-%H-%M-%S") ● Clean the slow log ○ $ cp /dev/null /var/log/mysql/slow-log 20
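
The "wait for X mins or ~300MB, whichever comes first" step is the only part of this procedure that is not a one-liner. A possible sketch, with a hypothetical `wait_for_capture` helper whose clock, sleep and size probe are injectable so the logic can be exercised without a real server:

```python
import time
from typing import Callable


def wait_for_capture(
    get_size: Callable[[], int],
    max_seconds: float,
    max_bytes: int = 300 * 1024 * 1024,
    poll_seconds: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Block until the time budget or the size budget is reached.

    Returns "size" or "time" to say which limit ended the capture.
    """
    deadline = clock() + max_seconds
    while True:
        if get_size() >= max_bytes:
            return "size"
        if clock() >= deadline:
            return "time"
        sleep(poll_seconds)
```

In practice `get_size` could be something like `lambda: os.path.getsize("/var/log/mysql/slow-log")`, with the SET GLOBAL statements from the slide run before and after the call.
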
  21. 21. Olivier Dasini - @freshdaz Performance testing 9/13 Phase 2 - Benchmarks (cold & warm buffers) and Compare 1/2 1. Ensure both slaves - 5.5 & 5.6 - have no replication lag 2. Stop replication on db_5.5: a. mysql_5.5> STOP SLAVE; 3. Wait for a few seconds.... 4. Stop replication on db_5.6: a. mysql_5.6> STOP SLAVE; 5. Note down the master log file and position from step 4. 6. Both slaves should be in perfect sync: bring db_5.5 up to db_5.6's master log file and position, so that when pt-upgrade is run, both servers return the same set and number of rows a. mysql_5.5> START SLAVE SQL_THREAD UNTIL MASTER_LOG_FILE = '<log_file>', MASTER_LOG_POS = <log_position>; 21
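
Steps 5 and 6 come down to comparing two sets of binlog coordinates and building the UNTIL statement for the slave that is behind. A minimal sketch, assuming the coordinates are read from the Relay_Master_Log_File and Exec_Master_Log_Pos columns of SHOW SLAVE STATUS; the function names are illustrative.

```python
import re
from typing import Tuple

Coordinates = Tuple[str, int]  # (master log file, position)


def binlog_sequence(log_file: str) -> int:
    """Extract the numeric suffix of a binlog file name, e.g. 'bin-log.003440' -> 3440."""
    match = re.search(r"\.(\d+)$", log_file)
    if match is None:
        raise ValueError(f"unexpected binlog file name: {log_file!r}")
    return int(match.group(1))


def is_behind(a: Coordinates, b: Coordinates) -> bool:
    """True if coordinates a come strictly before coordinates b."""
    return (binlog_sequence(a[0]), a[1]) < (binlog_sequence(b[0]), b[1])


def start_until(target: Coordinates) -> str:
    """Build the statement from the slide for the given target coordinates."""
    log_file, log_pos = target
    return (
        "START SLAVE SQL_THREAD UNTIL "
        f"MASTER_LOG_FILE = '{log_file}', MASTER_LOG_POS = {int(log_pos)}"
    )
```

Because db_5.5 is stopped first, it should be the one behind, and `start_until` would be given db_5.6's coordinates. Note that with SQL_THREAD only the SQL thread is started, so the events up to the target presumably need to be in db_5.5's relay log already, or the I/O thread has to be started separately.
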
  22. 22. Olivier Dasini - @freshdaz Performance testing 10/13 Phase 2 - Benchmarks (cold & warm buffers) and Compare 2/2 7. Run pt-upgrade on db_5.5 (reference results) a. Cold bench (after a mysql restart) b. Warm bench (after the first run) 8. Run pt-upgrade on db_5.6 a. Cold bench (after a mysql restart) b. Warm bench (after the first run) 9. Put db_5.5 back into production 22
  23. 23. Olivier Dasini - @freshdaz Performance testing 11/13 Our tests were interesting. Query response time was usually equal or better in 5.6. However, we found 1 big query regression ● Query time: From (0.09 sec) to (16 min 40.35 sec) Upgrade Confidence Index : 75% 23
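
To put the regression on the slide above into numbers, here is a small worked calculation. The helper names are illustrative; only the two timings come from the slide.

```python
def to_seconds(minutes: float = 0.0, seconds: float = 0.0) -> float:
    """Convert a 'min sec' duration to seconds."""
    return minutes * 60 + seconds


def slowdown(before_s: float, after_s: float) -> float:
    """How many times slower the query became."""
    return after_s / before_s


before = 0.09                    # query time on 5.5, in seconds
after = to_seconds(16, 40.35)    # query time on 5.6

print(round(after, 2))                  # 1000.35
print(round(slowdown(before, after)))   # 11115
```

So this single query became roughly four orders of magnitude slower, which is why one regression was enough to block the rollout until it was fixed.
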
  24. 24. Olivier Dasini - @freshdaz Performance testing 12/13 Issue : Query regression ● Basically, the optimizer was choosing the wrong index. ● Bug reported to MySQL (by Percona) Possible fixes : ● Disable the index extensions algorithm (pre 5.6.9 behavior) ○ SET optimizer_switch="use_index_extensions=off"; ● Use a hint: IGNORE / FORCE INDEX ○ … IGNORE INDEX (bad_index) … || … FORCE INDEX (good_index) … ● Use the NULL-safe equal operator, i.e. replace "IS NULL" by "<=> NULL" ○ … column_id <=> NULL … ● Rewrite the query ○ The most sustainable choice ○ Many possibilities… worked out with the appropriate dev team 24
  25. 25. Olivier Dasini - @freshdaz Performance testing 13/13 As soon as the query was fixed and tested, we put the 5.6 server into production. ● 5.6 is used like the other 5.5 slaves ● Monitored closely for weeks ● Slow query log analysis showed good numbers ○ Fewer slow queries ○ Smaller amount of total slow query time ● Lower CPU usage So far so good… Upgrade Confidence Index : 90% 25
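
The two slow-log numbers mentioned above (count of slow queries, total slow query time) can be approximated with a few lines. This is only a sketch with a hypothetical `slow_log_totals` helper that looks at the `# Query_time:` header of each entry; a dedicated tool such as pt-query-digest gives a far more complete report.

```python
import re
from typing import Iterable, Tuple

QUERY_TIME = re.compile(r"^# Query_time: (\d+(?:\.\d+)?)")


def slow_log_totals(lines: Iterable[str], threshold: float = 0.0) -> Tuple[int, float]:
    """Count entries whose Query_time is at least threshold and sum their time."""
    count = 0
    total = 0.0
    for line in lines:
        match = QUERY_TIME.match(line)
        if match is None:
            continue
        query_time = float(match.group(1))
        if query_time >= threshold:
            count += 1
            total += query_time
    return count, total
```

Running it over logs captured on a 5.5 slave and on the 5.6 slave for the same period gives the two pairs to compare.
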
  26. 26. Olivier Dasini - @freshdaz Agenda Me, Myself & I Technical background Why upgrade to 5.6? Performance testing Preprod upgrade Production upgrade Wrap-up 26
  27. 27. Olivier Dasini - @freshdaz Preprod upgrade 1/9 ● Workload different from production : smaller ● Data size different from production : much smaller ● Hardware also different => Not relevant for performance tests But it is very important to : ● Test the upgrade process ○ Can't do it manually ○ Should be transparent for our customers ● Know how our internal tools / other apps will behave with 5.6 ○ Databases are used in so many different ways ○ Can't test them all, so if it breaks someone will shout! ● Raise awareness of this migration among other MySQL consumers ○ We need their feedback This step is also very important because an entire cluster downgrade (back to 5.5) is a painful operation 27
  28. 28. Olivier Dasini - @freshdaz Preprod upgrade 2/9 Preprod technical context Flavour : Percona Server 5.5 on VMs Data size : ~ GBs # MySQL groups of replicas : 4 # MySQL instances : 12 Mostly OLTP oriented workload - InnoDB tables Hundreds of qps, mostly reads (Selects) Replication lag sensitive - Preferably no downtime 28
  29. 29. Olivier Dasini - @freshdaz Preprod upgrade 3/9 Overall process - Upgrade the 1st slave ● Take one slave per cluster out of rotation (OOR) ● Upgrade the slave ⇔ [more details later] ● Put it back into rotation (as a replica) ● Checks / Tests / Monitor ● Back up the slave (binary backup w/ Xtrabackup) ○ Base backup for the other slaves Similar to what we'll use in production (obvious!) 29
  30. 30. Olivier Dasini - @freshdaz Preprod upgrade 4/9 Overall process - Upgrade the 2nd (other) slave(s) ● Take the 5.5 slave out of rotation (OOR) ● Drop the data ● Upgrade the binaries ● Restore the 5.6 binary backup on this slave. ● Put it back into rotation ● Checks / Tests / Monitor ● So far, a downgrade is still quite easy: ○ Binary backup from the master, restored to the slave after a binaries downgrade 30
  31. 31. Olivier Dasini - @freshdaz Preprod upgrade 5/9 Overall process - Upgrade the master Last step, easy but very sensitive ● Switch master failover ○ Promote a 5.6 slave to become the new master ○ Usually less than 1 second in read only mode ● Then upgrade the old master & restore it from the 5.6 backup ● We have our own internal tool for switch master failover ○ but 5.6 broke it… ○ Whole cluster in a read only state without a master, i.e. no writes allowed ○ Fortunately that happened in preprod :) 31
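
The three "overall process" slides describe a fixed order: first slave in place, other slaves from its backup, master last. A compact way to see that order for one replica group is sketched below. `upgrade_plan` is a hypothetical helper; promoting the first upgraded slave and having the old master rejoin as a slave are assumptions, since the slides only say "a 5.6 slave" and "restore it from 5.6 backup".

```python
from typing import List


def upgrade_plan(master: str, slaves: List[str]) -> List[str]:
    """Return the ordered steps for one replica group: slaves first, master last."""
    if not slaves:
        raise ValueError("at least one slave is needed to promote a new master")
    first, others = slaves[0], slaves[1:]
    steps = [
        f"{first}: out of rotation, upgrade to 5.6, back into rotation",
        f"{first}: checks, then Xtrabackup base backup",
    ]
    for slave in others:
        steps.append(
            f"{slave}: out of rotation, drop data, upgrade binaries, "
            "restore 5.6 backup, back into rotation"
        )
    # Assumption: the first upgraded slave is the one promoted.
    steps.append(f"{first}: promote to master (switchover from {master})")
    steps.append(f"{master}: upgrade binaries, restore 5.6 backup, rejoin as slave")
    return steps
```

For example, `upgrade_plan("db1", ["db2", "db3"])` yields five steps, with `db1` touched only in the last two.
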
  32. 32. Olivier Dasini - @freshdaz Preprod upgrade 6/9 Issue : Internal tools broken - Switch master failover The tool used the deprecated statements SLAVE START and SLAVE STOP instead of START SLAVE and STOP SLAVE, and they were removed in 5.6. From the 5.5 manual: "In old versions of MySQL (before 4.0.5), this statement was called SLAVE START. This usage is still accepted in MySQL 5.5 for backward compatibility, but is deprecated and is removed in MySQL 5.6" : https://dev.mysql.com/doc/refman/5.5/en/start-slave.html From the 5.6 list of removed features: "The SLAVE START and SLAVE STOP statements. Use the START SLAVE and STOP SLAVE statements" : http://dev.mysql.com/doc/refman/5.6/en/mysql-nutshell.html Fix: Use the right statements 32
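
A check like the following could catch this kind of breakage before the upgrade by scanning the SQL that internal tools emit. It is a naive sketch: `fix_slave_statements` is a hypothetical helper working on plain text, so it would also rewrite matches inside string literals or comments.

```python
import re

# Old word order removed in 5.6; the replacement works on both 5.5 and 5.6.
# Only spaces/tabs are allowed between the two words, so a 'SLAVE' at the end
# of one line is never paired with a 'START' at the beginning of the next.
REMOVED = re.compile(r"\bSLAVE[ \t]+(START|STOP)\b", re.IGNORECASE)


def fix_slave_statements(sql: str) -> str:
    """Rewrite 'SLAVE START' / 'SLAVE STOP' to 'START SLAVE' / 'STOP SLAVE'."""
    return REMOVED.sub(lambda m: f"{m.group(1).upper()} SLAVE", sql)


def uses_removed_statements(sql: str) -> bool:
    """True if the text contains a statement that 5.6 no longer accepts."""
    return REMOVED.search(sql) is not None
```

Statements already in the supported form, as well as SHOW SLAVE STATUS, are left untouched.
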
  33. 33. Olivier Dasini - @freshdaz Preprod upgrade 7/9 Issue : Internal tools broken - Internal usage Because of the new configuration, new information is logged in the binlog: "You can also cause the server to write checksums for the events using CRC32 checksums by setting the binlog_checksum system variable" : http://dev.mysql.com/doc/refman/5.6/en/mysql-nutshell.html http://dev.mysql.com/doc/refman/5.6/en/replication-options-binary-log.html#sysvar_binlog_checksum These tools parse the binlog… Fix : Development by the relevant team 33
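
For a tool that parses raw binlog events, the practical consequence is that each event gains a trailing checksum that has to be stripped (and ideally verified) before decoding. The sketch below assumes the checksum is a zlib-compatible CRC32 of the preceding event bytes, stored as the last 4 bytes in little-endian order; real events may need extra handling that is not shown here, and the function names are illustrative.

```python
import struct
import zlib
from typing import Tuple

CHECKSUM_LEN = 4  # bytes appended to each event when binlog_checksum=CRC32


def split_checksum(event: bytes) -> Tuple[bytes, int]:
    """Split a raw event into (payload, stored_checksum)."""
    if len(event) <= CHECKSUM_LEN:
        raise ValueError("event too short to carry a checksum")
    payload, tail = event[:-CHECKSUM_LEN], event[-CHECKSUM_LEN:]
    (stored,) = struct.unpack("<I", tail)
    return payload, stored


def checksum_ok(event: bytes) -> bool:
    """Recompute the CRC32 of the payload and compare it with the stored one."""
    payload, stored = split_checksum(event)
    return (zlib.crc32(payload) & 0xFFFFFFFF) == stored
```

A parser written for 5.5 that ignores these 4 bytes would read them as part of the event body, which is one plausible way such tools break.
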
  34. 34. Olivier Dasini - @freshdaz Preprod upgrade 8/9 Upgrade workflow 1/2 1. Extract schema and data + Pre-upgrade checks 2. Drop MySQL directories (datadir, logdir) [ binaries upgraded to 5.6 by OPS + Disk encryption ] : OPS tasks 3. Load schema + Post-upgrade checks 4. Load data + Post-upgrade checks 2 & Compare differences in "before" & "after" checks Checks: object count, charset,... 34
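
The "compare before and after checks" step can be as simple as diffing two dictionaries of collected values. A minimal sketch, where `diff_checks` and the check names used in the example are hypothetical:

```python
from typing import Any, Dict, Tuple


def diff_checks(before: Dict[str, Any], after: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {check_name: (before, after)} for every check whose value changed.

    A check present on only one side is reported with None on the other side.
    """
    missing = object()
    diffs: Dict[str, Tuple[Any, Any]] = {}
    for name in sorted(set(before) | set(after)):
        old = before.get(name, missing)
        new = after.get(name, missing)
        if old != new:
            diffs[name] = (
                None if old is missing else old,
                None if new is missing else new,
            )
    return diffs
```

For instance, with `{"tables": 120, "views": 4}` before and `{"tables": 120, "views": 3}` after, the result points at the lost view; an empty result means the checks match.
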
  35. 35. Olivier Dasini - @freshdaz Preprod upgrade 9/9 Upgrade workflow 2/2 ● The upgrade process was split into a dozen scripts ● These scripts were called by 4 main wrapper scripts for convenience ● 2 levels of granularity provide more flexibility ○ In case of an issue, DBAs can resume the process "manually" at any step ○ An extra step can easily be added (e.g. a schema modification) ● Automation is important ○ Tasks are pretty straightforward but time consuming ○ Lowers the risk of error ○ Hundreds of servers ● DBAs need to be aware of the status ● The scripts send emails to DBAs when ○ A task is completed ○ An error occurs Upgrade Confidence Index : 95% 35
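
The wrapper idea (run small steps in order, notify on completion or error, allow a manual resume at any step) might look like the following. This is a sketch under stated assumptions, not the scripts from the talk: `run_steps` is hypothetical, and `notify` stands in for whatever sends the email to the DBAs.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None]]


def run_steps(steps: List[Step], notify: Callable[[str], None], start_at: int = 0) -> int:
    """Run steps in order from start_at; notify on completion or on error.

    Returns the index of the failed step, or len(steps) if all succeeded,
    so a DBA can fix the problem and resume with start_at=<returned index>.
    """
    for index in range(start_at, len(steps)):
        name, action = steps[index]
        try:
            action()
        except Exception as exc:
            notify(f"FAILED at step {index} ({name}): {exc}")
            return index
    notify(f"completed steps {start_at}..{len(steps) - 1}")
    return len(steps)
```

Adding an extra step, such as a schema modification, is then just one more entry in the `steps` list.
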
  36. 36. Olivier Dasini - @freshdaz Agenda Me, Myself & I Technical background Why upgrade to 5.6? Performance testing Preprod upgrade Production upgrade Wrap-up 36
  37. 37. Olivier Dasini - @freshdaz Prod upgrade Final step(s), final tests ● Preprod is similar but not identical to prod. ● To be more comfortable we ○ Added extra slaves on our smaller clusters ○ Ran the full process on them ● Not possible to test the switch master failover ● But we were confident enough to start, so we started ○ In progress... Upgrade Confidence Index : 99% 37
  38. 38. Olivier Dasini - @freshdaz Agenda Me, Myself & I Technical background Why upgrade to 5.6? Performance testing Preprod upgrade Production upgrade Wrap-up 38
  39. 39. Olivier Dasini - @freshdaz Wrap-up ● Identify what's relevant for you in the new release ○ Understand the changes : added / removed features ○ Don't be an early adopter (if you don't have a proper support team) : let others clear the way ● Run your own tests ○ Performance : related to your workload / data set ○ Functional : do your apps depend on a removed/changed feature? ● Split the work into batches ○ Easier to manage/debug/... ● Automation ○ Manual things are error prone ○ Write it once, use it at will ● Communication ○ Explain / describe what you are going to do ○ Involve consumers and ask for their feedback 39
  40. 40. Olivier Dasini - @freshdaz Questions? 40 Thank you! Olivier DASINI Twitter : @freshdaz Mail : olivier@dasini.net Skype : olivier.dasini
