Alerting With MySQL and Nagios
http://bit.ly/nagios_mysql2013
Sheeri Cabral
Senior DB Admin/Architect
Mozilla
@sheeri
What is Monitoring?

Threshold alerting

Graphing/trending
Why Monitor?

Problem alerting

Find patterns
– capacity planning
– troubleshooting

Early warning for potential issues
What to Alert?

Problems you can fix
– max_connections
– long running queries
– locked queries
– backup disk space
Nagios is great because...
...anyone can write a plugin
The problem with Nagios...
...anyone can write a plugin
Official Nagios Plugins
for MySQL

check_mysql

check_mysql_query
check_mysql

db connectivity

slave running

slave lag using seconds_behind_master
check_mysql_query

Checks that the output of a query is within a
certain (numerical) range
Plugin                  System vars  Status vars  Caching  Calculations
check_mysql             yes          yes          no       no
check_mysql (standard)  no           yes          no       no
check_mysqld_st...
“Let us know what you'd like to see!”
What I wanted
Plugin                  System vars  Status vars  Caching  Calculations
mysql_health_check.pl   yes          yes          yes      flexible
mysql_health_check.pl
Caching

Save information to a file

--cache-dir /path/to/dir/

Use the file instead of connecting again
--max-cache-ag...
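The caching idea on this slide can be sketched roughly as follows. This is an illustration in Python, not the plugin's actual Perl code: the helper name cache_is_fresh and the use of the file's modification time are assumptions, and only the notion of a maximum cache age comes from the slides.

```python
import os
import time


def cache_is_fresh(cache_path, max_cache_age, now=None):
    """Return True if cache_path exists and is at most max_cache_age seconds old.

    Hypothetical helper: the real plugin may decide freshness differently.
    """
    if now is None:
        now = time.time()
    try:
        age = now - os.path.getmtime(cache_path)
    except OSError:
        # No cache file yet (or it is unreadable): the caller must query MySQL.
        return False
    return age <= max_cache_age
```

A check would then read the saved file when cache_is_fresh(...) is true and connect to MySQL otherwise, which is how several services can share one connection per interval.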
--mode=varcomp

%metadata{varstatus}

SHOW GLOBAL VARIABLES

SHOW GLOBAL STATUS

--expression allows word replacement
...
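As a rough illustration of the word replacement that --expression relies on, the sketch below substitutes variable names with their values and then evaluates the arithmetic. It is written in Python for readability and is not the plugin's Perl implementation; the function name evaluate_expression and the restriction to + - * / are assumptions.

```python
import ast
import operator
import re

# Arithmetic operators this sketch allows in an expression.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _eval_node(node):
    """Evaluate a parsed expression made only of numbers and + - * /."""
    if isinstance(node, ast.Expression):
        return _eval_node(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval_node(node.operand)
    raise ValueError("unsupported expression")


def evaluate_expression(expression, varstatus):
    """Replace each known variable name with its value, then do the arithmetic.

    varstatus is assumed to be a dict merging SHOW GLOBAL VARIABLES and
    SHOW GLOBAL STATUS (name -> value).
    """
    def substitute(match):
        name = match.group(0)
        # Unknown names are left in place and rejected by _eval_node.
        return str(varstatus.get(name, name))

    replaced = re.sub(r"[A-Za-z_][A-Za-z0-9_]*", substitute, expression)
    return _eval_node(ast.parse(replaced, mode="eval"))
```

For example, with Threads_connected at 50 and max_connections at 200, evaluate_expression("Threads_connected/max_connections*100", ...) returns 25.0, which a threshold such as ">80" can then be tested against.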
Sample Command Definition
define command {
command_name check_mysql_tmp_tables
command_line $USER1$/mysql_health_check.pl
...
Sample Command Definition
define command {
command_name check_mysql_cxns
command_line $USER1$/mysql_health_check.pl
--host...
Sample Command Definition
command_name check_mysql_cxns
--mode=varcomp
--expression=
"Max_used_connections/max_connections...
Sample Service Definition
define service {
use generic-service
host_name __HOSTNAME__
service_description MySQL Connection...
Rates

Compare to last run

--mode=lastrun-varcomp

current{expr}

lastrun{expr}

--comparison, no warn/crit
Query Rate
mysql_health_check.pl [host,user,pass]
--mode lastrun-varcomp
--expression "abs((current{Queries} - lastrun{Queries})
/ (c...
define command {
command_name check_mysql_query_rate
command_line $USER1$/mysql_health_check.pl
--hostname $HOSTADDRESS$ -...
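The query-rate calculation compares the current sample with the cached one from the last run. A minimal sketch of that arithmetic, in Python rather than the plugin's Perl, might look like this; the function name per_second_rate and the handling of a server restart are assumptions, while Queries and Uptime are the status variables used on the slides.

```python
def per_second_rate(current, lastrun, counter="Queries"):
    """Change in `counter` per second of server uptime between two samples.

    Returns None when no rate can be computed: no uptime has elapsed, or the
    server restarted between samples so Uptime went backwards.
    """
    elapsed = float(current["Uptime"]) - float(lastrun["Uptime"])
    if elapsed <= 0:
        return None
    return (float(current[counter]) - float(lastrun[counter])) / elapsed
```

With lastrun = {"Queries": "1000", "Uptime": "600"} and current = {"Queries": "4000", "Uptime": "900"}, the result is 10.0 queries per second. The slide's expression additionally wraps the result in abs() and multiplies by 100, which this sketch leaves out.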
Other Modes

--mode=long-query

--mode=locked-query

%metadata{proc_list}

SHOW FULL PROCESSLIST
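To make the long-query and locked-query modes more concrete, here is a hedged Python sketch of how rows from SHOW FULL PROCESSLIST could be counted. The column names (Command, Time, State) are the ones MySQL returns; the function names and the lock-detection heuristic are assumptions and may not match what the plugin does.

```python
def is_locked(state):
    """Assumed heuristic: the thread state reports waiting on a lock."""
    state = (state or "").lower()
    return state == "locked" or (state.startswith("waiting for") and "lock" in state)


def count_long_queries(proc_list, max_seconds):
    """Count non-idle threads whose Time exceeds max_seconds."""
    return sum(
        1
        for row in proc_list
        if row["Command"] != "Sleep" and int(row["Time"]) > max_seconds
    )


def count_locked_queries(proc_list):
    """Count threads that is_locked() considers blocked on a lock."""
    return sum(1 for row in proc_list if is_locked(row["State"]))
```

Either count can then be compared against the --warning and --critical thresholds. A production check would probably also skip replication and binlog dump threads, which this sketch does not attempt.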
Sample Command
Definition
define command {
command_name check_mysql_locked_queries
command_line $USER1$/mysql_health_check...
Extending Information
sub fetch_server_meta_data {}
add a new hash key to %metadata
$metadata{varstatus} =
$dbh->selectall...
Extending Information
sub fetch_server_meta_data {}
add a new hash key to %metadata
$metadata{innodb_status} =
$dbh->selec...
For example
• %metadata{innodb_status}
– SHOW ENGINE INNODB STATUS
• Already exists, unused
• %metadata{master_status}
– S...
“Standard” checks
% max connections
--expression
'Threads_connected/max_connections*100'
InnoDB enabled
--expression “have...
define command {
command_name check_mysql_connections
command_line $USER1$/mysql_health_check.pl
--hostname $HOSTADDRESS$ ...
Did MySQL Crash?
Nagios set to check every 5 minutes
Might miss a crash
Uptime!
--expression 'Uptime'
--comparison="<1800"
Did MySQL Crash?
define command {
command_name check_mysql_uptime
command_line $USER1$/mysql_health_check.pl
--hostname $H...
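The --comparison strings on these slides (for example "<1800" or "ne 'YES'") appear to describe the condition that should raise an alert. The Python sketch below shows one way such a string could be applied to a value. It is an assumption about the semantics, not the plugin's code, and the name comparison_matches is hypothetical.

```python
import operator
import re

# Operators this sketch understands; eq/ne compare strings, the rest numbers.
_COMPARATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
    "eq": operator.eq,
    "ne": operator.ne,
}


def comparison_matches(value, comparison):
    """Return True when `value` satisfies a comparison such as "<1800"."""
    match = re.fullmatch(r"\s*(<=|>=|==|!=|<|>|eq|ne)\s*(.+?)\s*", comparison)
    if match is None:
        raise ValueError(f"cannot parse comparison: {comparison!r}")
    op, operand = match.groups()
    if op in ("eq", "ne"):
        # String comparison; surrounding quotes on the operand are dropped.
        return _COMPARATORS[op](str(value), operand.strip("'\""))
    return _COMPARATORS[op](float(value), float(operand))
```

Under this reading, an Uptime of 120 seconds matches "<1800" and would raise the alert, which is how a restart between two five-minute checks still gets noticed.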
“Standard” checks
read_only for slaves
--expression "read_only"
--comparison="ne 'ON'"
% of sleeping connections
# connec...
define command {
command_name check_mysql_read_only
command_line $USER1$/mysql_health_check.pl
--hostname $HOSTADDRESS$ --...
Limitations

One check/calculation per Nagios service
– But, you can use many variables
– Cached output

Does not output...
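The transcript's version of this slide notes that the plugin does not emit performance data and that adding it would not be hard. For orientation, Nagios plugins conventionally append perfdata after a '|' separator in the form label=value[UOM];warn;crit. The Python sketch below formats such a line; the function name and the example label are assumptions, and nothing here is taken from the plugin itself.

```python
def format_plugin_output(status, message, label, value, warn="", crit="", uom=""):
    """Build a one-line plugin result with perfdata after the '|' separator."""
    perfdata = f"{label}={value}{uom};{warn};{crit}"
    return f"{status} - {message} | {perfdata}"
```

For instance, format_plugin_output("WARNING", "connections at 85%", "pct_connections", 85, 80, 90, "%") yields "WARNING - connections at 85% | pct_connections=85%;80;90", which graphing add-ons can parse.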
Where to get it
https://github.com/palominodb/PalominoDB-Public-Code-Repository/tree/master/nagios/
www.palominodb.com->C...
Other Nagios Plugins
Nagios Plugin for Partitions
table_partitions.pl --host
--user --pass
--database --table
--range [days|weeks|months]
--ver...
Nagios Plugin for Partitions
table_partitions.pl --host db1.mozilla.com
--user nagiosuser --pass nagiospass
--database add...
Nagios Plugin for Checksums
After using pt-table-checksum (master)
Slaves have a table with checksum
this_crc vs master_crc
Nagios Plugin for Checksums
check_table_checksums.pl -H host
--user username --pass password
-T checksum_table
-I checksum...
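The core of the checksum check is the this_crc versus master_crc comparison on each slave. A small Python sketch of that comparison is shown below. The db, tbl and chunk column names follow the checksum table that pt-table-checksum creates; the function name is hypothetical, and the real plugin may apply further conditions such as checksum freshness.

```python
def mismatched_chunks(rows):
    """Return (db, tbl, chunk) for every row whose slave and master CRCs differ."""
    return [
        (row["db"], row["tbl"], row["chunk"])
        for row in rows
        if row["this_crc"] != row["master_crc"]
    ]
```

An empty result means the slave's checksums agree with the master's; any entry identifies a table chunk worth investigating.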
More Resources
www.palominodb.com->Community->Projects
www.mysqlmarinate.com
scabral@mozilla.com
OurSQL podcast (oursql.co...
Nagios Conference 2013 - Sheeri Cabral - Alerting With MySQL and Nagios

Sheeri Cabral's presentation on Alerting With MySQL and Nagios.
The presentation was given during the Nagios World Conference North America held Sept 20-Oct 2nd, 2013 in Saint Paul, MN. For more information on the conference (including photos and videos), visit: http://go.nagios.com/nwcna

Published in: Technology, Economy & Finance
Transcript of "Nagios Conference 2013 - Sheeri Cabral - Alerting With MySQL and Nagios"

  1. 1. Alerting With MySQL and Nagios http://bit.ly/nagios_mysql2013 Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri
  2. 2. What is Monitoring? • Threshold alerting • Graphing/trending
  3. 3. Why Monitor? • Problem alerting • Find patterns – capacity planning – troubleshooting • Early warning for potential issues
  4. 4. What to Alert? • Problems you can fix – max_connections – long running queries – locked queries – backup disk space
  5. 5. Nagios is great because... ...anyone can write a plugin
  6. 6. The problem with Nagios... ...anyone can write a plugin
  7. 7. Official Nagios Plugins for MySQL • check_mysql • check_mysql_query
  8. 8. check_mysql • db connectivity • slave running • slave lag using seconds_behind_master
  9. 9. check_mysql_query • Checks that the output of a query is within a certain (numerical) range
  10. 10. Third party plugins
      Plugin                  System vars  Status vars  Caching  Calculations
      check_mysql             yes          yes          no       no
      check_mysql (standard)  no           yes          no       no
      check_mysqld_status     no           yes          no       no
      check_mysql_stats       yes          no           yes      no
      check_mysqld            no           many         no       no
      check_mysql_health      yes          yes          no       Hard-coded
      check_mysql             yes          yes          no       Hard-coded
  11. 11. “Let us know what you'd like to see!”
  12. 12. What I wanted
      Plugin                  System vars  Status vars  Caching  Calculations
      mysql_health_check.pl   yes          yes          yes      flexible
  13. 13. mysql_health_check.pl
  14. 14. Caching • Save information to a file
  15. 15. Caching • Save information to a file • --cache-dir /path/to/dir/
  16. 16. Caching • Save information to a file • --cache-dir /path/to/dir/ • Use the file instead of connecting again
  17. 17. Caching • Save information to a file • --cache-dir /path/to/dir/ • Use the file instead of connecting again --max-cache-age <seconds>
  18. 18. Caching • Save information to a file • --cache-dir /path/to/dir/ • Use the file instead of connecting again --max-cache-age <seconds> --no-cache to force connection
  19. 19. --mode=varcomp • %metadata{varstatus}
  20. 20. --mode=varcomp • %metadata{varstatus} • SHOW GLOBAL VARIABLES • SHOW GLOBAL STATUS
  21. 21. --mode=varcomp • %metadata{varstatus} • SHOW GLOBAL VARIABLES • SHOW GLOBAL STATUS • --expression allows word replacement
  22. 22. --mode=varcomp • %metadata{varstatus} • SHOW GLOBAL VARIABLES • SHOW GLOBAL STATUS • --expression allows word replacement • --warning --critical are flexible
  23. 23. Sample Command Definition define command { command_name check_mysql_tmp_tables command_line $USER1$/mysql_health_check.pl --hostname $HOSTADDRESS$ --user myuser --password mypass
  24. 24. Sample Command Definition define command { command_name check_mysql_tmp_tables command_line $USER1$/mysql_health_check.pl --hostname $HOSTADDRESS$ --user myuser --password mypass --cache-dir=/var/lib/nagios/mysql_cache --max-cache-age=300
  25. 25. Sample Command Definition define command { command_name check_mysql_cxns command_line $USER1$/mysql_health_check.pl --hostname $HOSTADDRESS$ --user myuser --password mypass --cache-dir=/var/lib/nagios/mysql_cache --max-cache-age=300 --mode=varcomp --expression="Max_used_connections/max_connections * 100" --warning=">80" --critical=">90" }
  26. 26. Sample Command Definition command_name check_mysql_cxns --mode=varcomp --expression="Max_used_connections/max_connections * 100" --warning=">80" --critical=">90" }
  27. 27. Sample Service Definition define service { use generic-service host_name __HOSTNAME__ service_description MySQL Connections check_command check_mysql_cxns }
  28. 28. Rates • Compare to last run
  29. 29. Rates • Compare to last run • --mode=lastrun-varcomp
  30. 30. Rates • Compare to last run • --mode=lastrun-varcomp • current{expr}
  31. 31. Rates • Compare to last run • --mode=lastrun-varcomp • current{expr} • lastrun{expr}
  32. 32. Rates • Compare to last run • --mode=lastrun-varcomp • current{expr} • lastrun{expr} • --comparison, no warn/crit
  33. 33. Query Rate mysql_health_check.pl [host,user,pass]
  34. 34. Query Rate mysql_health_check.pl [host,user,pass] --mode lastrun-varcomp
  35. 35. Query Rate mysql_health_check.pl [host,user,pass] --mode lastrun-varcomp --expression "(current{Queries} - lastrun{Queries})
  36. 36. Query Rate mysql_health_check.pl [host,user,pass] --mode lastrun-varcomp --expression "(current{Queries} - lastrun{Queries}) / (current{Uptime} - lastrun{Uptime})"
  37. 37. Query Rate mysql_health_check.pl [host,user,pass] --mode lastrun-varcomp --expression "abs((current{Queries} - lastrun{Queries}) / (current{Uptime} - lastrun{Uptime}))*100" --comparison ">80"
  38. 38. define command { command_name check_mysql_query_rate command_line $USER1$/mysql_health_check.pl --hostname $HOSTADDRESS$ --user myuser --password mypass --cache-dir=/var/lib/nagios/mysql_cache --max-cache-age=300 --mode=lastrun-varcomp --expression="abs((current{Queries} - lastrun{Queries}) / (current{Uptime} - lastrun{Uptime}))*100" --comparison=">100" }
  39. 39. Other Modes • --mode=long-query • --mode=locked-query • %metadata{proc_list} • SHOW FULL PROCESSLIST
  40. 40. Sample Command Definition define command { command_name check_mysql_locked_queries command_line $USER1$/mysql_health_check.pl --hostname $HOSTADDRESS$ --user myuser --password mypass --cache-dir=/var/lib/nagios/mysql_cache --max-cache-age=300 --mode=locked-query --warning=$ARG1$ --critical=$ARG2$ }
  41. 41. Extending Information sub fetch_server_meta_data {} add a new hash key to %metadata $metadata{varstatus} = $dbh->selectall_arrayref( q|SHOW GLOBAL VARIABLES|);
  42. 42. Extending Information sub fetch_server_meta_data {} add a new hash key to %metadata $metadata{innodb_status} = $dbh->selectall_arrayref( q|SHOW ENGINE INNODB STATUS|);
  43. 43. For example • %metadata{innodb_status} – SHOW ENGINE INNODB STATUS • Already exists, unused
  44. 44. For example • %metadata{innodb_status} – SHOW ENGINE INNODB STATUS • Already exists, unused • %metadata{master_status} – SHOW MASTER STATUS
  45. 45. For example • %metadata{innodb_status} – SHOW ENGINE INNODB STATUS • Already exists, unused • %metadata{master_status} – SHOW MASTER STATUS • %metadata{slave_status} – SHOW SLAVE STATUS
  46. 46. “Standard” checks % max connections --expression 'Threads_connected/max_connections*100'
  47. 47. “Standard” checks % max connections --expression 'Threads_connected/max_connections*100' InnoDB enabled --expression "have_innodb" --comparison="ne 'YES'"
  48. 48. define command { command_name check_mysql_connections command_line $USER1$/mysql_health_check.pl --hostname $HOSTADDRESS$ --port 3306 --user <user> --password <password> --mode=varcomp --expression="Threads_connected/max_connections * 100" --comparison=">80" } define command { command_name check_mysql_innodb command_line $USER1$/mysql_health_check.pl --hostname $HOSTADDRESS$ --port 3306 --user <user> --password <password> --mode=varcomp --expression="have_innodb" --comparison="ne 'YES'" }
  49. 49. Did MySQL Crash? Nagios set to check every 5 minutes
  50. 50. Did MySQL Crash? Nagios set to check every 5 minutes Might miss a crash
  51. 51. Did MySQL Crash? Nagios set to check every 5 minutes Might miss a crash Uptime! --expression 'Uptime' --comparison="<1800"
  52. 52. Did MySQL Crash? define command { command_name check_mysql_uptime command_line $USER1$/mysql_health_check.pl --hostname $HOSTADDRESS$ --port 3306 --user <user> --password <password> --mode=varcomp --expression="Uptime" --comparison="<1800" }
  53. 53. “Standard” checks read_only for slaves --expression "read_only" --comparison="ne 'ON'"
  54. 54. “Standard” checks read_only for slaves --expression "read_only" --comparison="ne 'ON'" % of sleeping connections # connected, # running, # max connections
  55. 55. “Standard” checks read_only for slaves --expression "read_only" --comparison="ne 'ON'" % of sleeping connections # connected, # running, # max connections --expression="(Threads_connected - Threads_running)/max_connections * 100" --comparison=">$ARG1$"
  56. 56. define command { command_name check_mysql_read_only command_line $USER1$/mysql_health_check.pl --hostname $HOSTADDRESS$ --port 3306 --user <user> --password <password> --mode=varcomp --expression="read_only" --comparison="ne 'ON'" } define command { command_name check_mysql_connections command_line $USER1$/mysql_health_check.pl --hostname $HOSTADDRESS$ --port 3306 --user <user> --password <password> --mode=varcomp --expression="(Threads_connected - Threads_running)/max_connections * 100" --comparison=">$ARG1$" }
  57. 57. Limitations • One check/calculation per Nagios service – But, you can use many variables – Cached output • Does not output performance data – Not hard to modify, just no need yet
  58. 58. Where to get it https://github.com/palominodb/PalominoDB-Public-Code-Repository/tree/master/nagios/ www.palominodb.com->Community->Projects
  59. 59. Other Nagios Plugins
  60. 60. Nagios Plugin for Partitions table_partitions.pl --host --user --pass --database --table --range [days|weeks|months] --verify #
  61. 61. Nagios Plugin for Partitions table_partitions.pl --host db1.mozilla.com --user nagiosuser --pass nagiospass --database addons --table user_addons --range months --verify 3
  62. 62. Nagios Plugin for Checksums After using pt-table-checksum (master) Slaves have a table with checksum this_crc vs master_crc
  63. 63. Nagios Plugin for Checksums check_table_checksums.pl -H host --user username --pass password -T checksum_table -I checksum_freshness -b dbs,to,skip
  64. 64. More Resources www.palominodb.com->Community->Projects www.mysqlmarinate.com scabral@mozilla.com OurSQL podcast (oursql.com) slides: http://bit.ly/nagios_mysql2013 MySQL Administrator's Bible youtube.com/tcation http://planet.mysql.com