Reducing Risk When Upgrading
Your MySQL Environment
 
Kenny Gryp
MySQL Practice Manager
My Experience as MySQL
Consultant On Upgrading MySQL
it's quite complex...
Kenny Gryp
MySQL Practice Manager
Table of Contents
The O cial Documentation
Make Your Own Documentation
Potential Risks
Establish Upgrade Method For A Single Server
Rollback Scenario Testing
Test Writes
Test Individual Reads
Workload Testing
Establish (& Test) Migration Process
Migration In Production
(Rollback)
Post-Migration Assessment
3 / 77
The O cial Documentation
4 / 77
Oracle's Recommended Process
Backup your data
Read all release notes and assess
https://dev.mysql.com/doc/relnotes/mysql/5.7/en/
Read Changes Affecting Upgrades to MySQL 5.7
https://dev.mysql.com/doc/refman/5.7/en/upgrading-from-
previous-series.html
5 / 77
Oracle's Recommended Process
Upgrade Slaves First
6 / 77
Oracle's Recommended Process
Upgrade Slaves First
In-Place Upgrade:
Clean shutdown (innodb_fast_shutdown=0)
Run mysql_upgrade
Logical Upgrade:
mysqldump data
Import data again
Run mysql_upgrade to x mysql schema
http://dev.mysql.com/doc/refman/5.7/en/upgrading.html
7 / 77
Oracle's Recommended Process (cont.)
A Lot of Risk:
No guarantee queries will execute the same
No guarantee queries will be same speed or faster
No guarantee all your queries will still work (new default
stricter sql_mode)
There is no o cial support to upgrade from <5.6 to 5.7
but we might actually be able to do that
8 / 77
do-it-yourself
Documenting The Process
9 / 77
Documenting The Process
PEBKAC: Human errors happen and create issues
import data using wrong character set
setting up replica using wrong binlog le/pos
...
Document every step, we need to repeat it multiple times
10 / 77
making you afraid to upgrade by describing
Potential Risks
11 / 77
Optimizer Changes
Example: index_merge_intersection
Often seen during migrations to MySQL 5.6
Affects environments with sub-optimal indexing
Queries with c1='a' AND c2='b' when composite index
(c1,c2) is missing
Is often slower when selectivity with 1 of the 2 columns is
bad (and it happens frequently)
Result: a lot of queries were slower in new environment
Need SELECT performance tests between versions
https://www.percona.com/blog/2012/12/14/the-optimization-that-often-isnt-index-merge-intersection/
12 / 77
New Defaults In MySQL 5.7
The new defaults in MySQL 5.7 make a lot of sense:
More use of available features and performance
enhancements out of the box
More strictness with data/query validation
New Reserved words
Applications might not be ready for it.
Drupal 7 - https://www.drupal.org/node/2545480
They will/might break the application more easily:
sql_mode=ONLY_FULL_GROUP_BY,
STRICT_TRANS_TABLES, NO_ZERO_IN_DATE,
NO_ZERO_DATE, ERROR_FOR_DIVISION_BY_ZERO,
NO_AUTO_CREATE_USER, NO_ENGINE_SUBSTITUTION
innodb_strict_mode=1
Needs SELECT & DML query validity tests between versions 13 / 77
Other Changes in MySQL 5.7
Passwords that use the older pre-4.1 password hashing format
is removed.
14 / 77
MySQL 5.0.37
+-------+
| 0 |
+-------+
MySQL 5.0.45
+-------+
| 1 |
+-------+
Minor Versions Also At Risk
CREATE TABLE date (d DATE);
INSERT INTO date VALUES ('2017-04-19');
SELECT COUNT(*) FROM date
WHERE d < NOW()-INTERVAL 1 DAY;
Seen with DELETE FROM date WHERE d < NOW()-
INTERVAL 1 DAY in binlog_format=STATEMENT
environments.
Needs SELECT & DML query result tests between versions
15 / 77
Workload
SYNC_BINLOG=1 in MySQL 5.7
Can impact certain environments, might not be noticed when
looking at a single query
InnoDB LRU Flushing changes require tuning for heavy
workloads in 5.6
(innodb_lru_scan_depth)
When switching to MySQL 8.0 with the new data dictionary
...
Need to do Workload Testing between versions
http://mysqlentomologist.blogspot.com/2015/10/fun-with-bugs-38-regression-bugs-in.html
http://lefred.be/content/sync_binlog-1-in-5-7/
16 / 77
How Do We Reduce All This Risk?
17 / 77
Testing!
18 / 77
establish
Upgrade Method For A Single Server
19 / 77
Upgrade Method For A Single Server
Follow MySQL documentation:
http://dev.mysql.com/doc/refman/5.7/en/upgrading.html
Ensure to document every command
Restore from backup
Or take a replica you can miss
20 / 77
Upgrade Method For A Single Server
21 / 77
Upgrade Method For A Single Server
22 / 77
Replication Consistency
Testing Writes
23 / 77
Writes - Replication Consistency
pt-table-checksum:
validate consistency in a replication topology
Identify problems caused by PEBKAC
Ensure events replicate properly
(binlog_format=STATEMENT)
Upgrade a replica or add a replica which is using the modi ed
version.
Do it on production, will have no result in test/staging
https://www.percona.com/doc/percona-toolkit/3.0/pt-table-checksum.html
24 / 77
Writes - Replicate Test Server
25 / 77
often left behind is
Rollback Scenario Testing
26 / 77
Rollback Scenario Testing
Possibility to fall back in case something went wrong during
migration
Can be done using replication, but has to be tested!
27 / 77
Writes - Rollback Testing
28 / 77
Rollback Scenario Testing
You might need to change some settings to your new my.cnf to be
able to support replicating back.
Example:
binlog_checksum = NONE
binlog_row_image = FULL
binlog_rows_query_log_events = OFF
log_bin_use_v1_row_events = 1
gtid_mode = OFF
log_slave_updates=1
skip-slave-start
29 / 77
Writes - Checksums - GTID
30 / 77
Writes - Checksums - Non-GTID
31 / 77
Writes - Checkums - ROW
32 / 77
Writes - Checkums - ROW
33 / 77
Where To Run pt-table-checksum?
GTID:
pt-table-checksum can only be run on Master
(Errant Transactions)
Or scratch the pt-table-checksum host after tests
non-GTID:
pt-table-checksum can be run on intermediate master
binlog_format=ROW:
only 1 tier below can be checksummed
run on every tier that has a replica (for rollback)
pt-table-checksum can bring prod overhead when run on
active master
Let replication run for a while before checksumming
34 / 77
pt-table-checksum results
On every replica (including rollback):
SELECT db, tbl, SUM(this_cnt) AS total_rows,
COUNT(*) AS chunks
FROM percona.checksum
WHERE (master_cnt <> this_cnt
OR master_crc <> this_crc
OR ISNULL(master_crc) <> ISNULL(this_crc))
GROUP BY db, tbl;
+----+-----------------+------------+--------+
| db | tbl | total_rows | chunks |
+----+-----------------+------------+--------+
| db | telephone_debit | 44342 | 1 |
| db | orderline | 21451 | 3 |
| db | orders | 25125215 | 12 |
+----+-----------------+------------+--------+ 35 / 77
pt-table-checksum - Analysis
Troubleshooting starts now...
What went wrong?
36 / 77
pt-table-checksum - Analysis
Which chunks failed?
db: db
tbl: telephone_debit
chunk: 100
chunk_time: 0.4956125
chunk_index: PRIMARY
lower_boundary: 5014733
upper_boundary: 5059074
this_crc: 7fd37eb9
this_cnt: 44342
master_crc: b7babd94
master_cnt: 44342
ts: 2013-02-05 01:59:48
37 / 77
pt-table-checksum - Analysis
Which chunks failed?
db: db
tbl: telephone_debit
chunk: 100
chunk_time: 0.4956125
chunk_index: PRIMARY
lower_boundary: 5014733
upper_boundary: 5059074
this_crc: 7fd37eb9
this_cnt: 44342
master_crc: b7babd94
master_cnt: 44342
ts: 2013-02-05 01:59:48
38 / 77
pt-table-checksum - Analysis
SELECT *
INTO outfile '/tmp/telephone_debit_mysql56'
FROM db.telephone_debit
WHERE id BETWEEN 5014733 AND 5059074;
SELECT *
INTO outfile '/tmp/telephone_debit_mysql57'
FROM db.telephone_debit
WHERE id BETWEEN 5014733 AND 5059074;
# diff -u /tmp/telephone_debit_mysql5{6,7}
39 / 77
pt-table-checksum - Analysis
SELECT *
INTO outfile '/tmp/telephone_debit_mysql56'
FROM db.telephone_debit
WHERE id BETWEEN 5014733 AND 5059074;
SELECT *
INTO outfile '/tmp/telephone_debit_mysql57'
FROM db.telephone_debit
WHERE id BETWEEN 5014733 AND 5059074;
# diff -u /tmp/telephone_debit_mysql5{6,7}
Use twindb_table_compare!
https://github.com/twindb/twindb_table_compare
40 / 77
pt-table-checksum - Analysis
Wrong upgrade method
backups
wrong replication le/pos
...
binlog_format=STATEMENT using (UUID()...)
Common Seen Issues replicating older versions:
Floating point differences: Storing currencies in a DOUBLE
Temporal data types
Invalid dates converted to zero dates
Trailing spaces in CHAR elds
41 / 77
Testing Writes
Consistency Checks Process:
Checksum
Check for differences
On new environment
On rollback environment
For each inconsistency
Analyze diff
Find root cause
Fix problem
Document problem & solution
Repeat checksum again
42 / 77
Testing Individual Reads
43 / 77
Testing Reads - Collect Queries
44 / 77
Testing Reads - Collect Queries
Collection Techniques:
Slow Query Log
long_query_time=0
Careful when ~+10000 QPS
Percona Server: log_slow_rate_limit
tcpdump
'packets lost' in libpcap
Application/Load Balancer queries
Ensure:
Get the full workload (long enough)
Get data from Master & Replicas
Collect batchjob queries running at night
https://www.percona.com/doc/percona-server/5.7/diagnostics/slow_extended.html
45 / 77
Testing Reads - Setup 2 Environments
46 / 77
Testing Reads - Setup 2 Environments
Need 2 Test Servers:
Reuse servers from checksum + rollback
Ensure they have the same data
(break replication at same time)
Same HW speci cations
Similar Con gurations on buffer pool, flatc...
Fast enough to more or less resemble production
Optionally can be done using 1 machine
(pt-upgrade --save-results)
47 / 77
Testing Reads - pt-upgrade
48 / 77
Testing Reads - pt-upgrade
pt-upgrade:
runs one query at a time on both test environments
compares differences:
warnings/errors
resultset (even different order)
query response time
Run pt-upgrade on third host with similar network latency
Run twice to warm up buffer pool rst (need to be equal)
Can also compare writes for execution time & warnings
Filter slowlog initially to limit similar queries
pt-query-digest
--no-report --output slowlog --samples 20
https://www.percona.com/doc/percona-toolkit/3.0/pt-upgrade.html
49 / 77
Testing Reads - pt-upgrade
Reporting class because there are 1000 row diffs.
Total queries 10
Unique queries 10
Discarded queries 0
select ... from ...
##
## Row diffs: 10
##
-- 1.
@ row 2
< 13178,"dim0",37,2,21,,,0,0,0,1,NULL,NULL
> 13178,"dimø",37,2,21,,,0,0,0,1,NULL,NULL
...
50 / 77
Testing Reads - pt-upgrade
Reporting class because it has diffs,
but hasn't been reported yet.
SELECT * FROM `database`.table
WHERE treeid = '' AND productid='0'
## Warning diffs: 2
Code: 1366
Level: Warning
Message: Incorrect integer value: ''
for column 'treeid' at row 1
vs.
No warning 1366
51 / 77
Testing Reads - pt-upgrade
SELECT *
FROM `database`.client_orders
WHERE client=?
AND blacklist=? LIMIT ?
## Query time diffs: 1
-- 1.
0.000513 vs. 0.036395 seconds (70.9x increase)
SELECT *
FROM `database`.client_orders
WHERE client=57450
AND blacklist=1 LIMIT 1
52 / 77
Testing Reads Process
Collect queries
Run pt-upgrade (twice)
For each entry in report
Figure out why it is reported
Deploy x in Prod Application
Make schema changes
Document analysis
Run pt-upgrade again
53 / 77
one of the most challenging is
Testing Workload
54 / 77
Workload Testing - Percona Playback
55 / 77
Workload Testing - Query Playback
Uses slowlog to replay queries
Needs long_query_time=0 - challenging on busy servers
Enough data during peak workload
Tries to execute workload as realistically as possible
same connections, same transactions, same delays between
queries
Run against both environments, compare speed
Think about preloading buffer on both the same way
Active development by Marius Wachtler (ex)-DropBox! Thank
you!
(uno cal product of Percona, no support)
56 / 77
Workload Testing - ProxySQL Mirroring
57 / 77
Workload Testing - ProxySQL Mirroring
Mirror queries from Load Balancer to test environment
Good Blogpost:
https://www.pythian.com/blog/using-proxysql-validate-mysql-
updates/
58 / 77
establish (& test)
Migration Process
59 / 77
Migration Process
Create Migration Plan
Different for every environment/application
Upgrade a replica rst for a couple of days/weeks?
How to switch masters?
How is failover being handled nowadays?
MHA, Orchestrator, Manual, GTID/msyqlrpladmin...?
Test in staging!
60 / 77
the actual
Migration In Production
61 / 77
Migration - Create Slave Environments
62 / 77
Migration - Redirect Read Tra c
63 / 77
Migration - Application Switchover - 1
64 / 77
Migration - Application Switchover - 2
65 / 77
you (think you) will never need to do a
Rollback
66 / 77
Rollback
67 / 77
Rollback
What went wrong?
I did not follow the full process! (or I forgot to document it)
Do consistency checks again!
68 / 77
after all that testing, it's ok to spend time doing
Post-Migration Assessment
69 / 77
Post-Migration
Check trending for different behavior
more cpu load?
more disk IO?
higher amount of innodb_rows_* and handler_*
threads_running stability?
do some query optimization
If all looks good, scratch the 5.6 rollback & make it 5.7
Remove the rollback speci c con guration options
70 / 77
Post Migration Cleanup
71 / 77
small recap
Summary
72 / 77
Multi-Use
(Minor MySQL version upgrades)
Major MySQL version upgrades
Switching Hardware from Intel -> AMD archicture
Using a new kernel/libc/memory allocator
Switching storage engines
MariaDB/Percona Server/MySQL
...
73 / 77
Do I really have to go through this?
Many success stories:
Have done several MySQL upgrades from 4.1 -> 5.5
without intermediate slaves
Upgraded environments with major schema changes in the mix
(mssql-style environments using stored procedures only)
Found numerous application bugs using this process
Optimized many customers schemas/queries in the meantime
As long as you follow this process completely,
the risk of running into problems is quite small.
74 / 77
Do I really have to go through this?
It Depends:
Your business might be risk-averse:
every change has to be thoroughly tested
Other companies just upgrade a replica in production and see
how it goes
My suggestion to do this at least for:
Major MySQL version upgrades
Switching storage engines
75 / 77
Summary
Test Step Skip?
Document Upgrade Single Server Really? Why?
Rollback Scenarios Not Recommended
Consistency Checks Required, No Debate!
Read Tests Strongly Suggested
Workload Tests Possible (Early Adopter Alert)
Migration Tests Not Recommended To Skip
76 / 77
Reducing Risk When Upgrading
Your MySQL Environment
Q&A!
Kenny Gryp
MySQL Practice Manager

Reducing Risk When Upgrading MySQL

  • 1.
    Reducing Risk WhenUpgrading Your MySQL Environment   Kenny Gryp MySQL Practice Manager
  • 2.
    My Experience asMySQL Consultant On Upgrading MySQL it's quite complex... Kenny Gryp MySQL Practice Manager
  • 3.
    Table of Contents TheO cial Documentation Make Your Own Documentation Potential Risks Establish Upgrade Method For A Single Server Rollback Scenario Testing Test Writes Test Individual Reads Workload Testing Establish (& Test) Migration Process Migration In Production (Rollback) Post-Migration Assessment 3 / 77
  • 4.
    The O cialDocumentation 4 / 77
  • 5.
    Oracle's Recommended Process Backupyour data Read all release notes and assess https://dev.mysql.com/doc/relnotes/mysql/5.7/en/ Read Changes Affecting Upgrades to MySQL 5.7 https://dev.mysql.com/doc/refman/5.7/en/upgrading-from- previous-series.html 5 / 77
  • 6.
  • 7.
    Oracle's Recommended Process UpgradeSlaves First In-Place Upgrade: Clean shutdown (innodb_fast_shutdown=0) Run mysql_upgrade Logical Upgrade: mysqldump data Import data again Run mysql_upgrade to x mysql schema http://dev.mysql.com/doc/refman/5.7/en/upgrading.html 7 / 77
  • 8.
    Oracle's Recommended Process(cont.) A Lot of Risk: No guarantee queries will execute the same No guarantee queries will be same speed or faster No guarantee all your queries will still work (new default stricter sql_mode) There is no o cial support to upgrade from <5.6 to 5.7 but we might actually be able to do that 8 / 77
  • 9.
  • 10.
    Documenting The Process PEBKAC:Human errors happen and create issues import data using wrong character set setting up replica using wrong binlog le/pos ... Document every step, we need to repeat it multiple times 10 / 77
  • 11.
    making you afraidto upgrade by describing Potential Risks 11 / 77
  • 12.
    Optimizer Changes Example: index_merge_intersection Oftenseen during migrations to MySQL 5.6 Affects environments with sub-optimal indexing Queries with c1='a' AND c2='b' when composite index (c1,c2) is missing Is often slower when selectivity with 1 of the 2 columns is bad (and it happens frequently) Result: a lot of queries were slower in new environment Need SELECT performance tests between versions https://www.percona.com/blog/2012/12/14/the-optimization-that-often-isnt-index-merge-intersection/ 12 / 77
  • 13.
    New Defaults InMySQL 5.7 The new defaults in MySQL 5.7 make a lot of sense: More use of available features and performance enhancements out of the box More strictness with data/query validation New Reserved words Applications might not be ready for it. Drupal 7 - https://www.drupal.org/node/2545480 They will/might break the application more easily: sql_mode=ONLY_FULL_GROUP_BY, STRICT_TRANS_TABLES, NO_ZERO_IN_DATE, NO_ZERO_DATE, ERROR_FOR_DIVISION_BY_ZERO, NO_AUTO_CREATE_USER, NO_ENGINE_SUBSTITUTION innodb_strict_mode=1 Needs SELECT & DML query validity tests between versions 13 / 77
  • 14.
    Other Changes inMySQL 5.7 Passwords that use the older pre-4.1 password hashing format is removed. 14 / 77
  • 15.
    MySQL 5.0.37 +-------+ | 0| +-------+ MySQL 5.0.45 +-------+ | 1 | +-------+ Minor Versions Also At Risk CREATE TABLE date (d DATE); INSERT INTO date VALUES ('2017-04-19'); SELECT COUNT(*) FROM date WHERE d < NOW()-INTERVAL 1 DAY; Seen with DELETE FROM date WHERE d < NOW()- INTERVAL 1 DAY in binlog_format=STATEMENT environments. Needs SELECT & DML query result tests between versions 15 / 77
  • 16.
    Workload SYNC_BINLOG=1 in MySQL5.7 Can impact certain environments, might not be noticed when looking at a single query InnoDB LRU Flushing changes require tuning for heavy workloads in 5.6 (innodb_lru_scan_depth) When switching to MySQL 8.0 with the new data dictionary ... Need to do Workload Testing between versions http://mysqlentomologist.blogspot.com/2015/10/fun-with-bugs-38-regression-bugs-in.html http://lefred.be/content/sync_binlog-1-in-5-7/ 16 / 77
  • 17.
    How Do WeReduce All This Risk? 17 / 77
  • 18.
  • 19.
    establish Upgrade Method ForA Single Server 19 / 77
  • 20.
    Upgrade Method ForA Single Server Follow MySQL documentation: http://dev.mysql.com/doc/refman/5.7/en/upgrading.html Ensure to document every command Restore from backup Or take a replica you can miss 20 / 77
  • 21.
    Upgrade Method ForA Single Server 21 / 77
  • 22.
    Upgrade Method ForA Single Server 22 / 77
  • 23.
  • 24.
    Writes - ReplicationConsistency pt-table-checksum: validate consistency in a replication topology Identify problems caused by PEBKAC Ensure events replicate properly (binlog_format=STATEMENT) Upgrade a replica or add a replica which is using the modi ed version. Do it on production, will have no result in test/staging https://www.percona.com/doc/percona-toolkit/3.0/pt-table-checksum.html 24 / 77
  • 25.
    Writes - ReplicateTest Server 25 / 77
  • 26.
    often left behindis Rollback Scenario Testing 26 / 77
  • 27.
    Rollback Scenario Testing Possibilityto fall back in case something went wrong during migration Can be done using replication, but has to be tested! 27 / 77
  • 28.
    Writes - RollbackTesting 28 / 77
  • 29.
    Rollback Scenario Testing Youmight need to change some settings to your new my.cnf to be able to support replicating back. Example: binlog_checksum = NONE binlog_row_image = FULL binlog_rows_query_log_events = OFF log_bin_use_v1_row_events = 1 gtid_mode = OFF log_slave_updates=1 skip-slave-start 29 / 77
  • 30.
    Writes - Checksums- GTID 30 / 77
  • 31.
    Writes - Checksums- Non-GTID 31 / 77
  • 32.
    Writes - Checkums- ROW 32 / 77
  • 33.
    Writes - Checkums- ROW 33 / 77
  • 34.
    Where To Runpt-table-checksum? GTID: pt-table-checksum can only be run on Master (Errant Transactions) Or scratch the pt-table-checksum host after tests non-GTID: pt-table-checksum can be run on intermediate master binlog_format=ROW: only 1 tier below can be checksummed run on every tier that has a replica (for rollback) pt-table-checksum can bring prod overhead when run on active master Let replication run for a while before checksumming 34 / 77
  • 35.
    pt-table-checksum results On everyreplica (including rollback): SELECT db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks FROM percona.checksum WHERE (master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) GROUP BY db, tbl; +----+-----------------+------------+--------+ | db | tbl | total_rows | chunks | +----+-----------------+------------+--------+ | db | telephone_debit | 44342 | 1 | | db | orderline | 21451 | 3 | | db | orders | 25125215 | 12 | +----+-----------------+------------+--------+ 35 / 77
  • 36.
    pt-table-checksum - Analysis Troubleshootingstarts now... What went wrong? 36 / 77
  • 37.
    pt-table-checksum - Analysis Whichchunks failed? db: db tbl: telephone_debit chunk: 100 chunk_time: 0.4956125 chunk_index: PRIMARY lower_boundary: 5014733 upper_boundary: 5059074 this_crc: 7fd37eb9 this_cnt: 44342 master_crc: b7babd94 master_cnt: 44342 ts: 2013-02-05 01:59:48 37 / 77
  • 38.
    pt-table-checksum - Analysis Whichchunks failed? db: db tbl: telephone_debit chunk: 100 chunk_time: 0.4956125 chunk_index: PRIMARY lower_boundary: 5014733 upper_boundary: 5059074 this_crc: 7fd37eb9 this_cnt: 44342 master_crc: b7babd94 master_cnt: 44342 ts: 2013-02-05 01:59:48 38 / 77
  • 39.
    pt-table-checksum - Analysis SELECT* INTO outfile '/tmp/telephone_debit_mysql56' FROM db.telephone_debit WHERE id BETWEEN 5014733 AND 5059074; SELECT * INTO outfile '/tmp/telephone_debit_mysql57' FROM db.telephone_debit WHERE id BETWEEN 5014733 AND 5059074; # diff -u /tmp/telephone_debit_mysql5{6,7} 39 / 77
  • 40.
    pt-table-checksum - Analysis SELECT* INTO outfile '/tmp/telephone_debit_mysql56' FROM db.telephone_debit WHERE id BETWEEN 5014733 AND 5059074; SELECT * INTO outfile '/tmp/telephone_debit_mysql57' FROM db.telephone_debit WHERE id BETWEEN 5014733 AND 5059074; # diff -u /tmp/telephone_debit_mysql5{6,7} Use twindb_table_compare! https://github.com/twindb/twindb_table_compare 40 / 77
  • 41.
    pt-table-checksum - Analysis Wrongupgrade method backups wrong replication le/pos ... binlog_format=STATEMENT using (UUID()...) Common Seen Issues replicating older versions: Floating point differences: Storing currencies in a DOUBLE Temporal data types Invalid dates converted to zero dates Trailing spaces in CHAR elds 41 / 77
  • 42.
    Testing Writes Consistency ChecksProcess: Checksum Check for differences On new environment On rollback environment For each inconsistency Analyze diff Find root cause Fix problem Document problem & solution Repeat checksum again 42 / 77
  • 43.
  • 44.
    Testing Reads -Collect Queries 44 / 77
  • 45.
    Testing Reads -Collect Queries Collection Techniques: Slow Query Log long_query_time=0 Careful when ~+10000 QPS Percona Server: log_slow_rate_limit tcpdump 'packets lost' in libpcap Application/Load Balancer queries Ensure: Get the full workload (long enough) Get data from Master & Replicas Collect batchjob queries running at night https://www.percona.com/doc/percona-server/5.7/diagnostics/slow_extended.html 45 / 77
  • 46.
    Testing Reads -Setup 2 Environments 46 / 77
  • 47.
    Testing Reads -Setup 2 Environments Need 2 Test Servers: Reuse servers from checksum + rollback Ensure they have the same data (break replication at same time) Same HW speci cations Similar Con gurations on buffer pool, flatc... Fast enough to more or less resemble production Optionally can be done using 1 machine (pt-upgrade --save-results) 47 / 77
  • 48.
    Testing Reads -pt-upgrade 48 / 77
  • 49.
    Testing Reads -pt-upgrade pt-upgrade: runs one query at a time on both test environments compares differences: warnings/errors resultset (even different order) query response time Run pt-upgrade on third host with similar network latency Run twice to warm up buffer pool rst (need to be equal) Can also compare writes for execution time & warnings Filter slowlog initially to limit similar queries pt-query-digest --no-report --output slowlog --samples 20 https://www.percona.com/doc/percona-toolkit/3.0/pt-upgrade.html 49 / 77
  • 50.
    Testing Reads -pt-upgrade Reporting class because there are 1000 row diffs. Total queries 10 Unique queries 10 Discarded queries 0 select ... from ... ## ## Row diffs: 10 ## -- 1. @ row 2 < 13178,"dim0",37,2,21,,,0,0,0,1,NULL,NULL > 13178,"dimø",37,2,21,,,0,0,0,1,NULL,NULL ... 50 / 77
  • 51.
    Testing Reads -pt-upgrade Reporting class because it has diffs, but hasn't been reported yet. SELECT * FROM `database`.table WHERE treeid = '' AND productid='0' ## Warning diffs: 2 Code: 1366 Level: Warning Message: Incorrect integer value: '' for column 'treeid' at row 1 vs. No warning 1366 51 / 77
  • 52.
    Testing Reads -pt-upgrade SELECT * FROM `database`.client_orders WHERE client=? AND blacklist=? LIMIT ? ## Query time diffs: 1 -- 1. 0.000513 vs. 0.036395 seconds (70.9x increase) SELECT * FROM `database`.client_orders WHERE client=57450 AND blacklist=1 LIMIT 1 52 / 77
  • 53.
    Testing Reads Process Collectqueries Run pt-upgrade (twice) For each entry in report Figure out why it is reported Deploy x in Prod Application Make schema changes Document analysis Run pt-upgrade again 53 / 77
  • 54.
    one of themost challenging is Testing Workload 54 / 77
  • 55.
    Workload Testing -Percona Playback 55 / 77
  • 56.
    Workload Testing -Query Playback Uses slowlog to replay queries Needs long_query_time=0 - challenging on busy servers Enough data during peak workload Tries to execute workload as realistically as possible same connections, same transactions, same delays between queries Run against both environments, compare speed Think about preloading buffer on both the same way Active development by Marius Wachtler (ex)-DropBox! Thank you! (uno cal product of Percona, no support) 56 / 77
  • 57.
    Workload Testing -ProxySQL Mirroring 57 / 77
  • 58.
    Workload Testing -ProxySQL Mirroring Mirror queries from Load Balancer to test environment Good Blogpost: https://www.pythian.com/blog/using-proxysql-validate-mysql- updates/ 58 / 77
  • 59.
  • 60.
    Migration Process Create MigrationPlan Different for every environment/application Upgrade a replica rst for a couple of days/weeks? How to switch masters? How is failover being handled nowadays? MHA, Orchestrator, Manual, GTID/msyqlrpladmin...? Test in staging! 60 / 77
  • 61.
    the actual Migration InProduction 61 / 77
  • 62.
    Migration - CreateSlave Environments 62 / 77
  • 63.
    Migration - RedirectRead Tra c 63 / 77
  • 64.
    Migration - ApplicationSwitchover - 1 64 / 77
  • 65.
    Migration - ApplicationSwitchover - 2 65 / 77
  • 66.
    you (think you)will never need to do a Rollback 66 / 77
  • 67.
  • 68.
    Rollback What went wrong? Idid not follow the full process! (or I forgot to document it) Do consistency checks again! 68 / 77
  • 69.
    after all thattesting, it's ok to spend time doing Post-Migration Assessment 69 / 77
  • 70.
    Post-Migration Check trending fordifferent behavior more cpu load? more disk IO? higher amount of innodb_rows_* and handler_* threads_running stability? do some query optimization If all looks good, scratch the 5.6 rollback & make it 5.7 Remove the rollback speci c con guration options 70 / 77
  • 71.
  • 72.
  • 73.
    Multi-Use (Minor MySQL versionupgrades) Major MySQL version upgrades Switching Hardware from Intel -> AMD archicture Using a new kernel/libc/memory allocator Switching storage engines MariaDB/Percona Server/MySQL ... 73 / 77
  • 74.
    Do I reallyhave to go through this? Many success stories: Have done several MySQL upgrades from 4.1 -> 5.5 without intermediate slaves Upgraded environments with major schema changes in the mix (mssql-style environments using stored procedures only) Found numerous application bugs using this process Optimized many customers schemas/queries in the meantime As long as you follow this process completely, the risk of running into problems is quite small. 74 / 77
  • 75.
    Do I reallyhave to go through this? It Depends: Your business might be risk-averse: every change has to be thoroughly tested Other companies just upgrade a replica in production and see how it goes My suggestion to do this at least for: Major MySQL version upgrades Switching storage engines 75 / 77
  • 76.
    Summary Test Step Skip? DocumentUpgrade Single Server Really? Why? Rollback Scenarios Not Recommended Consistency Checks Required, No Debate! Read Tests Strongly Suggested Workload Tests Possible (Early Adopter Alert) Migration Tests Not Recommended To Skip 76 / 77
  • 77.
    Reducing Risk WhenUpgrading Your MySQL Environment Q&A! Kenny Gryp MySQL Practice Manager