Webinar slides: Top 9 Tips for building a stable MySQL Replication environment

Copyright 2016 Severalnines AB
1
Your host & some logistics
I'm Jean-Jérôme from the Severalnines Team
and I'm your host for today's webinar!
Feel free to ask any questions in the Questions
section of this application or via the Chat box.
You can also contact me directly via the chat
box or via email: jj@severalnines.com during
or after the webinar.

2
About Severalnines and ClusterControl

3
What we do
Manage, Scale, Monitor, Deploy

4
ClusterControl Automation & Management
! Provisioning
! Deploy a cluster in minutes
! On-premises or in the cloud (AWS)
! Monitoring
! Systems view
! 1sec resolution
! DB / OS stats & performance advisors
! Configurable dashboards
! Query Analyzer
! Real-time / historical
! Management
! Multi cluster/data-center
! Automate repair/recovery
! Database upgrades
! Backups
! Configuration management
! Cloning
! One-click scaling

5
Supported Databases

6
Customers

9 DevOps Tips for Going in Production with MySQL Replication
December 6, 2016
Krzysztof Książek
Severalnines
krzysztof@severalnines.com
7

8
Agenda
! 1. Sanity checks before migrating into
MySQL replication setup
! 2. Operating system configuration
! 3. Replication
! 4. Backup
! 5. Provisioning
! 6. Performance
! 7. Schema changes
! 8. Reporting
! 9. Disaster recovery

9
1. Sanity checks before migrating into MySQL replication
setup

10
1. Sanity checks before migrating into MySQL replication setup
! Use InnoDB - MyISAM will break data consistency when a query gets killed on the master
! The killed query won’t be replicated, but some of its changes have already been applied on the master - MyISAM has no rollback support
! Primary keys - define them on every table
! A primary key not only speeds up operations on InnoDB tables, it is also a requirement for pt-online-schema-change
! When in doubt - use an auto-incremented, unsigned integer
! Embrace lag as an inevitable element of working with MySQL replication
! Be aware of your app requirements - what is an acceptable replication lag for you?
! Understand your write patterns to know when lag will strike - batch loads?
! Understand your read patterns - heavy OLAP reads may evict data from the buffer pool, causing slowdown and lag

11
1. Sanity checks before migrating into MySQL replication setup
! ClusterControl Developer Studio
screen showing advisors
designed to find MyISAM tables
and tables lacking Primary Key
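
A minimal sketch of the same two sanity checks done by hand, assuming you run the query yourself and pass the result rows in. The query text and the function name `audit_tables` are illustrative and are not part of ClusterControl or of the original slides.

```python
# Illustrative query (an assumption, not taken from the slides): lists every
# user table with its storage engine and the number of PRIMARY KEY
# constraints defined on it (0 or 1).
AUDIT_SQL = """
SELECT t.table_schema, t.table_name, t.engine,
       COUNT(c.constraint_name) AS pk_count
FROM information_schema.tables AS t
LEFT JOIN information_schema.table_constraints AS c
       ON c.table_schema = t.table_schema
      AND c.table_name = t.table_name
      AND c.constraint_type = 'PRIMARY KEY'
WHERE t.table_type = 'BASE TABLE'
  AND t.table_schema NOT IN
      ('mysql', 'information_schema', 'performance_schema', 'sys')
GROUP BY t.table_schema, t.table_name, t.engine
"""


def audit_tables(rows):
    """Split (schema, table, engine, pk_count) rows into two problem lists."""
    not_innodb, no_primary_key = [], []
    for schema, table, engine, pk_count in rows:
        name = f"{schema}.{table}"
        if (engine or "").lower() != "innodb":
            not_innodb.append(name)
        if int(pk_count) == 0:
            no_primary_key.append(name)
    return not_innodb, no_primary_key


if __name__ == "__main__":
    # Hypothetical result rows, as a MySQL client library might return them.
    sample = [
        ("shop", "orders", "InnoDB", 1),
        ("shop", "legacy_log", "MyISAM", 0),
        ("shop", "events", "InnoDB", 0),
    ]
    print(audit_tables(sample))
```

Any table that shows up in either list is worth fixing before the replication setup goes live.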

12
2. Operating system configuration

13
2. Operating system configuration
! Ensure you have swap enabled
! You would rather see MySQL slow
down than get killed
! Reducing priority for OOM killer may not be
enough if you run out of memory:
! echo -1000 > /proc/$PID/oom_score_adj
! You can also reduce swappiness
(vm.swappiness)
! echo 1 > /proc/sys/vm/swappiness
! NUMA - make sure you use NUMA interleave
! Most of the time it’s already done in MySQL init
scripts
! Use EXT4 or XFS filesystems
! Benchmark to see which one works better with your
kernel
! Use noatime and nodiratime - no need to maintain this
metadata
! When running in a virtualized environment, keep an eye
on CPU steal on the VM
! VM snapshots may impact databases

14
3. Replication

15
3. Replication - binary log format
! Row-based replication (RBR) is the way to go - only this method will ensure data consistency
! Not 100%, though - problems may still happen
! Another alternative is the mixed format - RBR will hit (and expose) an inconsistency earlier
! To keep consistency under control, use pt-table-checksum
! Run it at regular intervals to make sure all slaves are in sync
! If you detect an inconsistent slave, use pt-table-sync to fix the problem
! In the worst-case scenario, rebuild the slave from the master
! Don’t be afraid of that - for more complex problems this is the fastest way to get a slave back in sync

16
3. Replication
! Automated failover in ClusterControl

17
3. Replication - GTID
! As long as you can use GTID, use it
! Gives you a great deal of flexibility in how you can change your replication topology
! CHANGE MASTER TO … MASTER_AUTO_POSITION=1
! Make sure you check for errant transactions before you switch the master
! An errant transaction is a transaction executed on a slave but not on the master
! It will break data consistency and may break replication
! If possible, try to use and benefit from multithreaded replication
! Schema-based multithreading in 5.6
! Logical clock-based multithreading in 5.7
! Multithreaded replication should always go together with GTID
! It’s not enforced, but you will run into problems if you stick to non-GTID replication
! Multithreaded replication is a great way to speed up your replication and reduce lag
18
4. Backup

19
4. Backup - backup types
! Two types of backup
! Logical backup
! Physical backup
! Logical backup - data is stored in plain text format: SQL, CSV or similar
! Easy to modify stored data - just edit a row entry
! Easy to restore single rows - just find the row and execute the SQL or LOAD DATA INFILE
! Physical backup - data is stored in binary form
! Xtrabackup
! SAN, EBS or LVM snapshot
! rsync
! Great for restoring the whole data set at once
! Not so great for restoring a subset of data
! Xtrabackup allows you to restore individual tablespaces and schemas
! No way to restore a single row

20
4. Backup - mysqldump
! Arguably the best known backup tool for MySQL
! Ability to dump tables, schemas
! Ability to recover even a single row (it’s easier with --extended-insert=0)
! Locking - yes, but it’s not as big a problem as you may think (for InnoDB tables)
! May generate large SQL files - dump separate tables, not the full schema
! Can be used to build a slave (--master-data)
! Long recovery time (need to parse all that SQL)
! Single-threaded (you can run it in parallel on a per-table basis, though - a little bit of scripting required)
! Character set may be tricky if you don't pay attention
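
The "little bit of scripting" for per-table dumps could start from something like the sketch below. It only builds the command lines; running them in parallel (for example with `subprocess` and a thread pool) is left out. The function names and the `utf8mb4` default are assumptions, and `--single-transaction` is added here on the assumption that the tables are InnoDB.

```python
def mysqldump_command(schema, table=None, for_slave=False,
                      one_row_per_insert=False, charset="utf8mb4"):
    """Build a mysqldump argument list for one schema or one table."""
    cmd = [
        "mysqldump",
        "--single-transaction",               # consistent InnoDB snapshot
        f"--default-character-set={charset}",  # be explicit about the charset
    ]
    if for_slave:
        cmd.append("--master-data")            # record binlog coordinates
    if one_row_per_insert:
        cmd.append("--skip-extended-insert")   # same effect as --extended-insert=0
    cmd.append(schema)
    if table:
        cmd.append(table)
    return cmd


def per_table_commands(schema, tables, **options):
    """One command per table, so the dumps can be run side by side."""
    return [mysqldump_command(schema, table, **options) for table in tables]


if __name__ == "__main__":
    # Hypothetical schema and table names.
    for command in per_table_commands("shop", ["orders", "customers"],
                                      one_row_per_insert=True):
        print(" ".join(command))
```

Note that separate per-table dumps are not taken at the same point in time, so they are not consistent with each other unless writes are stopped.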

21
4. Backup - mydumper/myloader
! Like mysqldump, only different - allows parallelization, splits tables into chunks
! Improved significantly over the last year
! Added support for dumping schema
! It’s now much easier to compile it from the GitHub repo
! Binary packages are available, but not always up to date
! Pretty nice dump time (1 TB ~ 4-6h, YMMV of course)
! Long loading time (but not as long as with mysqldump)
! The fastest logical backup tool I know - you may need to get familiar with it if you have a large data set and plan an upgrade

22
4. Backup - xtrabackup
! _The_ backup solution
! Online backup for InnoDB tables
! “Virtually” non-locking
! Works by copying the data files and logging transactions which happened in the meantime
! If you have MyISAM, you’ll get locked. Don’t use MyISAM
! Supports local backups and streaming over the network
! Supports incremental backups
! Backup needs to be prepared (transactions from the log have to be applied)
! innobackupex --apply-log /path/to/BACKUP-DIR
! Remember --use-memory when applying logs. More memory speeds things up
! Supports partial backups - per schema and per table
! This comes in handy when restoring missing data and speeds it up
! Can bundle replication information with the backup
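
A rough sketch of the prepare step, including the incremental case. The ordering of `--redo-only` steps follows my reading of the Percona documentation for innobackupex and should be verified against the version you run. The function name and the `1G` default are assumptions.

```python
def prepare_commands(base_dir, incremental_dirs=(), use_memory="1G"):
    """Build the innobackupex commands needed to prepare a backup.

    For a full backup this is a single --apply-log run. With incrementals,
    the base and every incremental except the last are applied with
    --redo-only, followed by a final plain --apply-log on the base.
    """
    common = ["innobackupex", "--apply-log", f"--use-memory={use_memory}"]
    if not incremental_dirs:
        return [common + [base_dir]]

    commands = [common + ["--redo-only", base_dir]]
    last = len(incremental_dirs) - 1
    for index, inc_dir in enumerate(incremental_dirs):
        step = list(common)
        if index != last:
            step.append("--redo-only")
        step += [base_dir, f"--incremental-dir={inc_dir}"]
        commands.append(step)
    commands.append(common + [base_dir])  # roll back uncommitted transactions
    return commands


if __name__ == "__main__":
    # Hypothetical backup directories.
    for command in prepare_commands("/backups/full",
                                    ["/backups/inc1", "/backups/inc2"],
                                    use_memory="4G"):
        print(" ".join(command))
```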

23
4. Backup - best practices
! Ensure you have some sort of backup policy defined
! How do you back up your data?
! How often do you back up your data?
! How long do you want to store backups?
! Do you need point-in-time recovery?
! Make sure you copy backups offsite, for disaster recovery
! Backups may have an impact on a MySQL host
! Additional CPU and I/O load on the system
! Locking within MySQL
! If possible, use a dedicated slave for backups
! Each backup is a Schrödinger’s backup unless you test it
! Test them on a regular basis
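
The "how long do you want to store backups" question can be turned into a small, testable rule. This is only a sketch: the 14-day window, the minimum of three kept backups and the function name are assumptions, not recommendations from the slides.

```python
from datetime import date, timedelta


def backups_to_delete(backup_dates, today, keep_days=14, keep_minimum=3):
    """Return backup dates that have aged out of the retention window.

    The newest `keep_minimum` backups are never returned, even if they are
    older than `keep_days`, so a stalled backup job cannot lead to an
    empty backup store.
    """
    newest_first = sorted(backup_dates, reverse=True)
    protected = set(newest_first[:keep_minimum])
    cutoff = today - timedelta(days=keep_days)
    return sorted(d for d in newest_first if d < cutoff and d not in protected)


if __name__ == "__main__":
    # Hypothetical backup dates.
    dates = [date(2016, 12, 1), date(2016, 11, 1),
             date(2016, 10, 1), date(2016, 9, 1)]
    print(backups_to_delete(dates, today=date(2016, 12, 6), keep_minimum=2))
```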

24
4. Backup
! Create backup in
ClusterControl

25
4. Backup
! Schedule backup in ClusterControl

26
5. Provisioning

27
5. Provisioning
! You will be provisioning slaves - often, for various reasons
! A new host has to be added to the cluster
! Data inconsistency detected on a slave
! Hardware upgrade
! Various testing purposes
! Build, test and maintain a provisioning system
! Make sure it’s easy to rebuild or create a new slave
! Leverage one of the existing backup solutions
! Physical backups work best for that

28
5. Provisioning
! Add a slave in ClusterControl

29
6. Performance

30
6. Performance
! While working with MySQL replication, keep in mind that the performance of a slave may impact replication lag
! Heavy writes will cause lag
! Split them if needed
! Try to distribute them in time
! Parallelize writes - leverage multithreaded replication
! Monitor your system’s performance
! Use monitoring and trending tools
! Cacti
! Grafana
! PMM
! VividCortex
! ClusterControl
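
If you script your own lag check next to those tools, it might look like this sketch. It takes one row of `SHOW SLAVE STATUS` as a dictionary. The thresholds and the function name are assumptions; pick values that match the lag your application can accept.

```python
def classify_lag(slave_status, warn_after=30, critical_after=300):
    """Classify one SHOW SLAVE STATUS row (as a dict) by replication lag.

    Seconds_Behind_Master is NULL (None here) when replication is not
    running, so that case is reported as 'broken' rather than as zero lag.
    """
    io_running = slave_status.get("Slave_IO_Running")
    sql_running = slave_status.get("Slave_SQL_Running")
    seconds = slave_status.get("Seconds_Behind_Master")
    if io_running != "Yes" or sql_running != "Yes" or seconds is None:
        return "broken"
    if seconds >= critical_after:
        return "critical"
    if seconds >= warn_after:
        return "warning"
    return "ok"


if __name__ == "__main__":
    # Hypothetical status row.
    status = {
        "Slave_IO_Running": "Yes",
        "Slave_SQL_Running": "Yes",
        "Seconds_Behind_Master": 45,
    }
    print(classify_lag(status))
```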

31
6. Performance
! Query performance graph in ClusterControl

32
6. Performance
! Replication lag graph in ClusterControl

33
6. Performance - cache and proxies
! Use a caching layer
! Redis, Memcached, Couchbase
! Good caching may reduce database load by more than 99%
! Helps to hide issues of the database tier from the application
! Make sure you handle cache refreshing properly
! Run one query, let other threads wait for the result
! Use a proxy layer
! ProxySQL, MaxScale, HAProxy
! Proxies, especially SQL-aware ones, give you great flexibility and control over the database
! Detect and handle failed cluster nodes and topology changes
! Reduce the number of direct connections to MySQL, helping to achieve better performance
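
The "run one query, let other threads wait for the result" idea can be sketched in-process as follows. This is an illustration only: a real deployment would put the cached values in Redis or Memcached and add expiry, both of which are left out here. The class name `SingleFlightCache` is made up.

```python
import threading


class SingleFlightCache:
    """Cache where only one thread loads a missing key; the rest wait."""

    def __init__(self, loader):
        self._loader = loader          # function(key) -> value, e.g. a query
        self._lock = threading.Lock()
        self._values = {}
        self._in_flight = {}           # key -> Event set when loading ends

    def get(self, key):
        while True:
            with self._lock:
                if key in self._values:
                    return self._values[key]
                event = self._in_flight.get(key)
                leader = event is None
                if leader:
                    event = threading.Event()
                    self._in_flight[key] = event
            if not leader:
                # Another thread is already running the query: wait, then
                # re-check the cache (or take over if the load failed).
                event.wait()
                continue
            try:
                value = self._loader(key)
                with self._lock:
                    self._values[key] = value
                return value
            finally:
                with self._lock:
                    del self._in_flight[key]
                event.set()


if __name__ == "__main__":
    def expensive_query(key):
        print("querying the database for", key)
        return key.upper()

    cache = SingleFlightCache(expensive_query)
    print(cache.get("report"), cache.get("report"))
```

Without this kind of guard, an expired hot key can send every waiting request to the database at once.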

34
6. Performance
! ProxySQL deployment in
ClusterControl

35
7. Schema changes

36
7. Schema changes
! In a replication environment, DDL = lag
! Even if DDL is online on the master, it will be serialized on slaves
! This is also true for multithreaded replication
! Unless the table is very small, direct DDL is not usable
! Use a rolling schema upgrade instead
! Implement the schema change on all slaves first, then on the master
! Make sure the schema change is compatible
! A compatible schema change, in short, is one which allows replication to keep working
! RBR is very strict about which changes are compatible and which are not
! This makes MIXED replication tricky - you need to be aware which tables are involved in RBR-stored events
! Consult the MySQL documentation to understand which changes are allowed
! http://dev.mysql.com/doc/refman/5.7/en/replication-features-differing-tables.html
! Test changes in your staging environment

37
7. Schema changes - online schema changes
! Instead of a rolling schema change you can use online schema change tools
! pt-online-schema-change from Percona
! gh-ost from GitHub
! Both allow non-compatible changes to be executed
! Online schema changes may take a while - make sure you test them on a staging host to assess the time needed to complete the change
! pt-online-schema-change is a well-known and tested solution
! Uses triggers to keep up with changes
! Uses INSERT LOW_PRIORITY to copy data
! Requires metadata locks to create triggers
! Foreign keys can be a problem to tackle

38
7. Schema changes - gh-ost
! gh-ost is a new tool, not yet widely used
! Make sure you test it before you use it in production
! gh-ost supports multiple test modes
! Skip --execute to do a dry run
! Use --test-on-replica to check the contents of both the old and the new table on a stopped slave
! gh-ost allows you to throttle the schema change if replication lag is too big
! It doesn’t use triggers - it uses binlogs instead
! This makes it possible to actually stop the whole schema change activity
! Requires RBR on the host where binlogs are scanned (by default it scans binlogs on a slave and executes changes on the master)
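
A small sketch of how the test modes map onto a gh-ost command line. It only assembles arguments; credentials and most other options are omitted. The function name, host, schema and the lag value are placeholders chosen for this example.

```python
def gh_ost_command(host, database, table, alter, max_lag_millis=1500,
                   test_on_replica=False, execute=False):
    """Build a gh-ost argument list; without execute=True it is a dry run."""
    cmd = [
        "gh-ost",
        f"--host={host}",
        f"--database={database}",
        f"--table={table}",
        f"--alter={alter}",
        f"--max-lag-millis={max_lag_millis}",  # throttle above this lag
    ]
    if test_on_replica:
        cmd.append("--test-on-replica")
    if execute:
        cmd.append("--execute")
    return cmd


if __name__ == "__main__":
    # Hypothetical host and table. First a dry run, then a test on a replica.
    alter = "ADD COLUMN note VARCHAR(64)"
    print(gh_ost_command("replica1", "shop", "orders", alter))
    print(gh_ost_command("replica1", "shop", "orders", alter,
                         test_on_replica=True, execute=True))
```

Keeping `execute` off by default mirrors the tool itself: nothing is changed until you ask for it explicitly.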

39
8. Reporting

40
8. Reporting
! OLAP processes can be heavy on a database node
! CPU and I/O-wise, reporting may use a serious amount of resources, making it hard for replication to keep up
! As a result, a good practice is to designate one slave as a reporting slave and direct all OLAP traffic there
! If you use a backup slave, it may also be used for OLAP, unless performance becomes unacceptable
! OLAP processes may also execute writes
! Those can also become a source of load and, in the end, replication lag
! Make sure the impact is monitored and the process can be throttled if needed

41
9. Disaster Recovery

42
9. Disaster Recovery - backups
! Disaster will happen - this is inevitable
! Plan for disaster, learn to live with it
! There are different ways you can minimize the impact - choose whatever suits your environment best
! You need to have backups - make sure you have them tested
! Store a copy of your backup outside of your datacenter
! Backups are a great safety measure but tend to be slow to restore
! If you need to have your systems up quickly, verify how long it takes to restore from backup
! Not that you’ll have a choice in every situation

43
9. Disaster Recovery - standby environment
! Another way of ensuring availability is to have a standby environment up and running at a separate location
! This could be a full-blown environment (much more expensive)
! Or it could be a stub which will allow you to build everything from scratch
! For example, a second backup server in another data center - you can provision the rest of the hosts using those backups

44
9. Disaster Recovery
! In any case, this may not be enough to save you from “DROP SCHEMA production;”
! Backups will still be essential
! Make sure you have a runbook covering the recovery process
! People can make mistakes under pressure - a runbook will help them execute the process
! Test your recovery process on a regular basis - you want to be 100% sure it works

45
Thank You!
! Related content:
! http://severalnines.com/blog/new-whitepaper-mysql-replication-high-availability
! http://severalnines.com/blog/become-mysql-dba-blog-series-common-operations-schema-changes
! http://severalnines.com/blog-tags/performance
! Install ClusterControl:
! http://severalnines.com/getting-started
! Contact: jj@severalnines.com
