MySQL provides hundreds of status counters, but how do you make sense of all that monitoring data?
If you’re in Operations and your job is to monitor the health of MySQL/MariaDB Galera Cluster or Percona XtraDB Cluster, then this webinar is for you. Setting up a Galera Cluster is fairly straightforward, but keeping it in good shape and knowing what to look for when it has production issues can be a challenge.
Status counters can be tricky to read …
Which of them are more important than others?
How do you find your way in a labyrinth of different variables?
Which of them can make a significant difference?
How might a host’s health impact MySQL performance?
How do you identify problematic nodes in your cluster?
To find out more, read these webinar slides (or watch the replay).
Our colleague Krzysztof Książek provided a deep-dive session on what to monitor in Galera Cluster for MySQL & MariaDB. Krzysztof is a MySQL DBA with experience in managing complex database environments for companies like Zendesk, Chegg, Pinterest and Flipboard.
Amongst other things, Krzysztof discussed why having a good monitoring system is a must, covering the following topics:
Galera monitoring
• cluster status
• flow control
Host metrics and their impact on MySQL
• CPU
• memory
• I/O
InnoDB metrics
• CPU-related
• I/O-related
Deep Dive Into How To Monitor MySQL or MariaDB Galera Cluster / Percona XtraDB Cluster
1. Copyright 2015 Severalnines AB
How to monitor your Galera Cluster?
April 21, 2015
Krzysztof Książek
Severalnines
krzysztof@severalnines.com
2. Who am I?
• My name is Krzysztof Książek
• MySQL DBA with 8 years of experience
• 2.5 years of work at PalominoDB/BlackbirdIT/Pythian
• Worked with, among others:
  • Flipboard
  • Pinterest
  • Zendesk
• Currently Senior Support Engineer at Severalnines
3. Agenda
• Why do you need a good trending system?
• Monitoring Galera Cluster metrics
• Monitoring host metrics and their impact on MySQL
• The most important InnoDB-related metrics
6. Monitoring vs. trending
• Monitoring system (e.g. Nagios)
  • Checks if services are healthy
  • Sends pages
• Trending system (e.g. Cacti, Graphite)
  • Collects metrics
  • Generates graphs
7. Why do we need a trending system?
• Periodic (daily/weekly) healthchecks
• Insight into all aspects of database operations
• Post mortem and proactive monitoring
• Capacity planning
8. Healthchecks
• Healthchecks are a pain
• You want to see aggregated data
• You want to be able to drill down to a particular host
• You want to see the most important data first and dig in later on
9. Insight into internals, capacity planning
• Graphs based on MySQL status counters
• Overall status and per-node graphs
• Ability to get time-shifted graphs, useful for comparing workload changes over time
10. Post mortem and proactive monitoring
• Ability to dig into past data
• Data granularity of even less than 5s (hardware-dependent)
• Fine granularity allows you to catch an issue as it evolves: no need to wait 5 minutes for a graph to refresh
12. What to monitor?
Important internals:
• Cluster status
• Flow control
• Send and receive queue
13. Cluster status monitoring
• Node IP
• Node state
  • Synced
  • Donor
  • Disconnected
• Cluster size
• Does the node take part in writeset replication?
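The cluster status checks on this slide can be sketched as a small function over one node's wsrep status values, as returned by SHOW GLOBAL STATUS LIKE 'wsrep_%'. This is only an illustration: the function name node_problems and the expected_size parameter are hypothetical and do not come from the webinar or from ClusterControl.

```python
def node_problems(status, expected_size):
    """Return a list of human-readable problems for one Galera node.

    `status` maps wsrep status variable names to their string values.
    `expected_size` is the number of nodes the cluster should have (assumption).
    """
    problems = []
    if status.get("wsrep_cluster_status") != "Primary":
        problems.append("not in the primary component")
    state = status.get("wsrep_local_state_comment", "unknown")
    if state != "Synced":
        problems.append("node state is " + state)
    size = int(status.get("wsrep_cluster_size", 0))
    if size < expected_size:
        problems.append("cluster size is %d, expected %d" % (size, expected_size))
    if status.get("wsrep_ready") != "ON":
        problems.append("wsrep_ready is not ON")
    return problems


if __name__ == "__main__":
    donor = {
        "wsrep_cluster_status": "Primary",
        "wsrep_local_state_comment": "Donor/Desynced",
        "wsrep_cluster_size": "3",
        "wsrep_ready": "ON",
    }
    print(node_problems(donor, expected_size=3))
```

An empty list means the node looks healthy by these four checks. A real monitoring check would also have to handle the node being unreachable.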
14. Flow control monitoring
• What percentage of the time does the node stall?
• How many flow control messages have been sent?
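Both flow control questions can be answered from two samples of a node's status counters. The sketch below assumes the samples are taken interval_seconds apart, that nobody ran FLUSH STATUS in between, and that the node exposes wsrep_flow_control_paused_ns (time spent paused, in nanoseconds) and wsrep_flow_control_sent. The function name flow_control_between is hypothetical.

```python
def flow_control_between(prev, curr, interval_seconds):
    """Compare two status samples taken `interval_seconds` apart.

    Returns the fraction of the interval the node spent paused by flow
    control and the number of flow control messages it sent.
    """
    paused_ns = (int(curr["wsrep_flow_control_paused_ns"])
                 - int(prev["wsrep_flow_control_paused_ns"]))
    sent = (int(curr["wsrep_flow_control_sent"])
            - int(prev["wsrep_flow_control_sent"]))
    return {
        "paused_fraction": paused_ns / (interval_seconds * 1e9),
        "fc_sent": sent,
    }


if __name__ == "__main__":
    before = {"wsrep_flow_control_paused_ns": "0", "wsrep_flow_control_sent": "3"}
    after = {"wsrep_flow_control_paused_ns": "500000000", "wsrep_flow_control_sent": "5"}
    print(flow_control_between(before, after, interval_seconds=10))
```

Working on deltas rather than on the raw wsrep_flow_control_paused value gives a per-interval figure that can be graphed over time.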
16. Send/Receive queue monitoring
• Average size of the send and receive queue
• If a queue is large, the question is: what caused it?
  • Node slowness?
  • Background operations?
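One way to spot the node that is falling behind is to rank all nodes by their average receive queue. The sketch below assumes you have already collected wsrep_local_recv_queue_avg from every node into a dictionary keyed by node name. The node names and the function name nodes_by_recv_queue are made up for the example.

```python
def nodes_by_recv_queue(cluster):
    """Rank nodes by average receive queue length, largest first.

    `cluster` maps a node name to that node's status values (strings).
    """
    pairs = [(node, float(status["wsrep_local_recv_queue_avg"]))
             for node, status in cluster.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    cluster = {
        "node1": {"wsrep_local_recv_queue_avg": "0.01"},
        "node2": {"wsrep_local_recv_queue_avg": "12.5"},
        "node3": {"wsrep_local_recv_queue_avg": "0.3"},
    }
    for node, queue in nodes_by_recv_queue(cluster):
        print(node, queue)
```

The same ranking works for wsrep_local_send_queue_avg. What counts as a "large" queue depends on the workload, so no threshold is hard-coded here.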
18. Other Galera-related data
• Cert Deps Distance: on average, how many writesets can be applied at the same time?
• Segment ID: do the nodes belong to the same segment?
• Last Committed: which sequence number was last applied?
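As an example of using the Last Committed value, the sketch below compares wsrep_last_committed across nodes and reports how many sequence numbers each node is behind the most advanced one. It assumes the values were sampled at roughly the same moment, so treat the result as an approximation. The function name commit_lag is hypothetical.

```python
def commit_lag(cluster):
    """Return {node: seqnos behind the most advanced node}.

    `cluster` maps a node name to that node's status values (strings).
    """
    seqnos = {node: int(status["wsrep_last_committed"])
              for node, status in cluster.items()}
    newest = max(seqnos.values())
    return {node: newest - seqno for node, seqno in seqnos.items()}


if __name__ == "__main__":
    cluster = {
        "node1": {"wsrep_last_committed": "1000"},
        "node2": {"wsrep_last_committed": "990"},
    }
    print(commit_lag(cluster))
```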
20. Host metrics - what for?
• Understand the utilization of the hardware
• Capacity planning
• Determine the type of an issue
  • I/O related?
  • CPU related?
  • Network related?
21. Host metrics - what to look at?
• CPU utilization (should I add more nodes to the cluster?)
• Network utilization (am I running out of bandwidth?)
• Ping (how badly does latency affect my Galera cluster?)
• Disk throughput and IOPS (am I within my hardware limits?)
• Disk space (do I have to plan for larger disks?)
• Memory utilization (do I suffer from a memory leak?)
25. InnoDB and CPU - what to look for
• Overall CPU utilization
  • 100%: maybe it’s time to scale up the cluster or tune some queries?
  • Low, but queries that should be fast are slow: maybe you are having locking issues?
  • High but not 100%, and queries are slow: maybe you are suffering from internal contention?
• A significant share of ‘system’ time in the CPU utilization: you are suffering from internal contention
26. InnoDB - internal contention debugging
• For Percona Server and MariaDB
• SHOW ENGINE INNODB STATUS
• SHOW ENGINE INNODB MUTEX
• Performance Schema (if you have mutex wait instrumentation enabled, which requires a MySQL restart)
27. SHOW ENGINE INNODB STATUS
• List of the current waits, not always the same as an “average” workload
• Points to the source code, which is very helpful for understanding what’s going on
• Performance -> InnoDB Status in ClusterControl
• Nice material for further googling
28. SHOW ENGINE INNODB MUTEX
• List of the most common waits since the server’s start
• The counters cannot be flushed, unfortunately
• Maybe I should disable Adaptive Hash Index or add some partitions to it?
29. Performance Schema (5.6)
• SQL reporting power: aggregate, sort, do whatever you like
• btr_search_latch again high on the list
• Even better visibility using “sys schema” from Mark Leith
30. InnoDB I/O metrics - why are they important?
• Galera Cluster runs on InnoDB
• You need to know your I/O to configure InnoDB correctly
• You need to know your I/O to pick the right hardware
• I/O getting out of control will result in unstable MySQL performance
31. InnoDB I/O metrics - what to monitor?
• Reads/writes/fsyncs
• Buffer pool dirty pages
• Checkpoint age
• InnoDB flushing
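The reads/writes/fsyncs figures are cumulative counters in SHOW GLOBAL STATUS, so a trending system has to turn them into per-second rates. A minimal sketch, assuming two samples taken interval_seconds apart with no server restart in between; the function name io_rates and the choice of these four counters are assumptions made for the example.

```python
# Cumulative InnoDB I/O counters from SHOW GLOBAL STATUS.
IO_COUNTERS = (
    "Innodb_data_reads",
    "Innodb_data_writes",
    "Innodb_data_fsyncs",
    "Innodb_os_log_fsyncs",
)


def io_rates(prev, curr, interval_seconds):
    """Convert two counter samples into per-second rates."""
    return {
        name: (int(curr[name]) - int(prev[name])) / interval_seconds
        for name in IO_COUNTERS
    }


if __name__ == "__main__":
    before = {"Innodb_data_reads": "100", "Innodb_data_writes": "200",
              "Innodb_data_fsyncs": "50", "Innodb_os_log_fsyncs": "30"}
    after = {"Innodb_data_reads": "150", "Innodb_data_writes": "500",
             "Innodb_data_fsyncs": "150", "Innodb_os_log_fsyncs": "80"}
    print(io_rates(before, after, interval_seconds=5))
```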
32. Understanding I/O metrics
• The difference between innodb_flush_log_at_trx_commit=2 and innodb_flush_log_at_trx_commit=1
• Increase in InnoDB log fsyncs and data fsyncs
33. Understanding I/O metrics
• Dirty pages graphed in linear mode
• Note that there are sufficient resources to keep them under control
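The dirty pages figure is easier to compare across hosts as a share of the buffer pool. A short sketch using the Innodb_buffer_pool_pages_dirty and Innodb_buffer_pool_pages_total status counters; the function name dirty_page_percent is hypothetical.

```python
def dirty_page_percent(status):
    """Return dirty pages as a percentage of all buffer pool pages."""
    dirty = int(status["Innodb_buffer_pool_pages_dirty"])
    total = int(status["Innodb_buffer_pool_pages_total"])
    return 100.0 * dirty / total


if __name__ == "__main__":
    sample = {"Innodb_buffer_pool_pages_dirty": "250",
              "Innodb_buffer_pool_pages_total": "1000"}
    print(dirty_page_percent(sample))
```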
34. InnoDB I/O graph
• Data about reads, writes, fsyncs
• Captured straight after server start: reads spike at the beginning
35. Checkpoint age data
• We want to have some data in the InnoDB redo logs to benefit from write merging
• We don’t want to have too much data in the InnoDB redo logs: it may result in spiky throughput and transient pauses in the workload
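Checkpoint age is the distance between the current log sequence number and the last checkpoint, both of which appear in the LOG section of SHOW ENGINE INNODB STATUS. The sketch below parses them out of the status text. It assumes the single-number LSN format ("Log sequence number 2735016708"); older versions that print the LSN as two numbers would need a different pattern. The function name checkpoint_age and the sample values are made up.

```python
import re


def checkpoint_age(innodb_status_text):
    """Return 'Log sequence number' minus 'Last checkpoint at', in bytes."""
    lsn = re.search(r"Log sequence number\s+(\d+)", innodb_status_text)
    checkpoint = re.search(r"Last checkpoint at\s+(\d+)", innodb_status_text)
    if lsn is None or checkpoint is None:
        raise ValueError("LOG section not found in the status text")
    return int(lsn.group(1)) - int(checkpoint.group(1))


if __name__ == "__main__":
    sample = (
        "---\nLOG\n---\n"
        "Log sequence number 2735016708\n"
        "Log flushed up to   2735016708\n"
        "Last checkpoint at  2733016708\n"
    )
    print(checkpoint_age(sample))
```

Graphing this value against the total redo log size (innodb_log_file_size multiplied by innodb_log_files_in_group) shows how close the server gets to the point where it has to flush aggressively.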
36. Thank You!
• Lots of monitoring features in the free community edition of ClusterControl
• http://www.severalnines.com/getting-started
• Easy to install:
$ wget http://www.severalnines.com/downloads/cmon/install-cc
$ chmod +x install-cc
# as root or sudo user
$ ./install-cc
• http://www.severalnines.com/blog-categories/db-ops
• Contact: krzysztof@severalnines.com