Monitoring all Elements of Your Database Operations With Zabbix

31,354 views

An in-depth look at all aspects of Zabbix, from the history and origins of the software to an overview of the latest features introduced in Zabbix 3.2.
Presented by Alexei Vladishev, founder and CEO of Zabbix, at Percona Live 2016 Europe.

Published in: Technology

  1. The Universal Open Source Enterprise Level Monitoring Solution: Monitoring All Elements of Your Database Operations with Zabbix
  2. Who am I? Alexei Vladishev, Founder of Zabbix; CEO, Architect and Product Manager. Twitter: @avladishev. Email: alex@zabbix.com
  3. My plan: • Introduction to Zabbix • Zabbix capabilities • What’s possible?
  4. History of Zabbix
  5. It started with a simple script
  6. What’s wrong with the script? • Hard to maintain and extend • It did not scale well • Provided no advanced problem detection • Any change required script modifications • Etc.
  7. Then I redesigned everything and eventually named it Zabbix…
  8. Zabbix is a universal open source enterprise level monitoring solution
  9. Zabbix team: 45 members working full-time. Offices in Riga (headquarters), Tokyo and New York.
  10. All levels of infrastructure
  11. All platforms and OS: all platforms, all vendors, any metrics
  12. Zabbix Architecture
  13. Data collection
  14. Zabbix server: data collection; history (MySQL or PostgreSQL or Oracle or DB2); analysis
  15. Zabbix server: data collection; history; analysis. Reaction: alerts, automatic actions. Web UI: visualization, management.
  16. How to install Zabbix: official packages for Debian, Ubuntu, CentOS, RedHat, Oracle Linux
  17. Database monitoring
  18. Production environment: load balancer based on HA Proxy; 7 x MySQL nodes running on Linux
  19. What do we need to monitor? OS metrics: • CPU, memory and network utilization • Disk IO time • Available disk space. Database metrics: • Configuration related: max connections, buffer sizes, sync mode • Performance: QPS, query performance, cache hit rate, slow queries, buffer pool usage • Availability: DB is up, connections, log files • Consistency: DB encoding, replication • Security: SSL enabled, open ports, log files
  20. Linux (OS) metrics: Zabbix Agent. shell> apt-get install zabbix-agent or shell> rpm -Uvh zabbix-agent * For AIX, Solaris, HP-UX, *BSD, Windows: download pre-compiled agents from www.zabbix.com
  21. Active vs Passive. Pull: • Service checks • Passive agent • SSH and Telnet. Push: • Active agent • Zabbix Trapper and SNMP Traps • Monitoring of log files
  22. [Diagram labels: Zabbix Server; passive Zabbix Agent (pull) on MySQL; active Zabbix Agent (push) on MySQL]
  23. Add a new host to Zabbix
  24. Create a new host
  25. (no text on slide)
  26. (no text on slide)
  27. Monitoring MySQL metrics: show global status …; select * from sys.*; select * from performance_schema; mysql log file
  28. /etc/zabbix/agent/zabbix_mysql.conf: /etc/zabbix/zabbix_agentd.conf:
  29. (no text on slide)
  30. (no text on slide)
  31. Template App MySQL
  32. (no text on slide)
  33. Graphs
  34. Maps: any data
  35. Aggregated metrics, calculated metrics
      Buffer pool disk read percentage: 100 * Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests
      Number of QPS per cluster: grpsum["DB Cluster","mysql.status[Questions]",last]
      Buffer pool utilization: 100 * (Innodb_buffer_pool_pages_total - Innodb_buffer_pool_pages_free) / Innodb_buffer_pool_pages_total
  36. Also calculated: aggregated metrics for a group of hosts
  37. Custom dashboards
  38. How to detect problems in this data flow?
  39. Triggers!
  40. A trigger is a problem definition
  41. MySQL server is overloaded: {server:mysql.status[Questions].last()} > 5000. Tags: Datacenter: AM2, Env: Production, Service: DB Cluster
  42. Triggers. Example: {server:mysql.status[Com_update].last()} / {server:mysql.status[Questions].last()} > 0.1. Operators: - + / * < > = <> <= >= or and not. Functions: min max avg last count date time diff regexp and much more!
  43. Cluster is overloaded. Analyse everything: any metric and any hosts.
      {MySQL_001:mysql.status[Questions].last()} > 5000
      and
      {MySQL_002:mysql.status[Questions].last()} > 5000
      and
      {MySQL_003:mysql.status[Questions].last()} > 5000
  44. Problem view
  45. (no text on slide)
  46. (no text on slide)
  47. (no text on slide)
  48. Problems
  49. Junior level. Performance: MySQL is overloaded: {MySQL_001:mysql.status[Questions].last()} > 5000. Availability: MySQL is not available: {MySQL_001:mysql.ping.last()} = 0
  50. Flapping! [Chart: values on a 0 to 10000 axis, plotted from 10:00 to 10:50, against the trigger {MySQL_001:mysql.status[Questions].last()} > 5000]
  51. Too much sensitivity leads to false positives
  52. How to get rid of false positives?
  53. Properly define problem conditions and think carefully! What does each of these really mean? • MySQL is overloaded • MySQL is not available • Running out of disk space
  54. Take advantage of history. MySQL is overloaded: {MySQL_001:mysql.status[Questions].min(10m)} > 5000. MySQL node is not available: {MySQL_001:mysql.ping.max(#3)} = 0
  55. Problem disappeared != problem is resolved
  56. A few examples
      Problem: Queries per second > 5000. Now: 4999. Resolved?
      Problem: Disk space < 10%. Now: 9.95%. Resolved?
      Problem: MySQL is not available. Now: last check returned Up. Resolved?
  57. A few examples
      Problem: Queries per second > 5000. Now: 4999. Resolved?
      Problem: Disk space < 10%. Now: 10.05%. Resolved?
      Problem: MySQL is not available. Now: last check returned Up. Resolved?
  58. A few examples
      Problem: Queries per second > 5000. Now: 4999. Resolved?
      Problem: Disk space < 10%. Now: 10.05%. Resolved?
      Problem: MySQL is not available. Now: last check returned Up. Resolved?
  59. Different conditions for problem and recovery
      Before: {MySQL_001:mysql.status[Questions].last()} > 5000
      Better alternative:
      Problem: {MySQL_001:mysql.status[Questions].last()} > 5000
      Recovery: {MySQL_001:mysql.status[Questions].last()} < 3000
  60. Several examples
      System is overloaded
      Problem: {MySQL_001:mysql.status[Questions].min(2m)} > 5000
      Recovery: {MySQL_001:mysql.status[Questions].max(10m)} < 3000
      MySQL server is not available
      Problem: {MySQL_001:mysql.ping.max(#3)} = 0
      Recovery: {MySQL_001:mysql.ping.min(#10)} = 1
  61. No flapping. No false positives. Suddenly we trust our monitoring!
  62. Anomaly detection. Compare with a norm, where the norm is the system state in the past. Example: the average number of queries per second for the last hour is less than half of the average for the same period a week ago: 2 * {HA Proxy:Questions.avg(1h)} < {HA Proxy:Questions.avg(1h, 7d)}
  63. Problem forecasting
  64. Problem forecasting. [Chart: values on a 0 to 50 axis, plotted from 7:00 to 21:00, with linear trend y = -2.9455x + 48.309] Problem threshold: 10%. Trigger function timeleft(): ??? hours
  65. Trend prediction. [Chart: values on a 0 to 50 axis, plotted from 7:00 to 21:00, with linear trend y = -2.9455x + 48.309] Trigger function forecast(): ??? % in 4 hours
  66. How does Zabbix react to problems?
  67. (no text on slide)
  68. Possible reactions: • Automatic problem resolution • Sending alerts to users and user groups • Opening tickets in Helpdesk systems • Unlimited number of possible reactions
  69. Escalate! MySQL Cluster is down. [Timeline with marks at 0, 5, 10, 15 and 20 min; escalation steps: repeated email; SMS and ticket in Helpdesk system; restart HA Proxy; SMS to manager]
  70. Why Zabbix?
  71. All-in-one solution: trend prediction, data collection, problem detection, automatic actions, agent-based monitoring, encryption, anomaly detection, maintenance, event correlation, scalability, visualization, auto discovery, trigger dependencies, centralized management, service checks, IoT and embedded, distributed monitoring, Zabbix API, alerting, escalations, user permissions, integration with AD and OpenLDAP, LLD, agent-less monitoring … and more
  72. Focus on quality and ease of maintenance: • All components are compatible within one major release • Virtually no third-party dependencies • Zabbix Agents are backward compatible since Zabbix 1.0!
  73. Benefits of Zabbix: • Free and Open Source Software • Extremely flexible • Easy to adopt, use commercial services if needed • No license fees • Extremely low TCO • No vendor lock-in
  74. share.zabbix.com
  75. The Universal Open Source Enterprise Level Monitoring Solution. Thank you! Twitter: @avladishev. Email: alex@zabbix.com. Learn more at the Zabbix booth or www.zabbix.com
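
The calculated and aggregated metrics on slide 35 are plain arithmetic over MySQL status counters. The sketch below restates them in Python to make the formulas concrete. It is not Zabbix configuration: the function names and the sample numbers are made up for illustration, and the zero-denominator handling is an assumption that the slide does not address.

```python
def buffer_pool_disk_read_pct(status):
    """100 * Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests."""
    requests = status["Innodb_buffer_pool_read_requests"]
    if requests == 0:
        # Assumption: report 0% when no read requests were counted yet.
        return 0.0
    return 100.0 * status["Innodb_buffer_pool_reads"] / requests


def buffer_pool_utilization_pct(status):
    """100 * (pages_total - pages_free) / pages_total."""
    total = status["Innodb_buffer_pool_pages_total"]
    if total == 0:
        # Assumption: report 0% for an empty or unreported buffer pool.
        return 0.0
    free = status["Innodb_buffer_pool_pages_free"]
    return 100.0 * (total - free) / total


def cluster_qps(latest_qps_by_host):
    """Sum of the latest per-host values, in the spirit of grpsum[...,last]."""
    return sum(latest_qps_by_host.values())


# Hypothetical counter values, not taken from the slides.
sample_status = {
    "Innodb_buffer_pool_reads": 50,
    "Innodb_buffer_pool_read_requests": 10000,
    "Innodb_buffer_pool_pages_total": 8192,
    "Innodb_buffer_pool_pages_free": 2048,
}
sample_qps = {"MySQL_001": 1200.0, "MySQL_002": 800.0}

if __name__ == "__main__":
    print(buffer_pool_disk_read_pct(sample_status))
    print(buffer_pool_utilization_pct(sample_status))
    print(cluster_qps(sample_qps))
```

In Zabbix itself these would be calculated and aggregate items evaluated by the server; the Python version only shows what each formula computes.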

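
Slides 50 to 61 argue that a single threshold flaps and that separate problem and recovery conditions fix it. The following sketch simulates that idea on a short series of queries-per-second values. It only models the simple `last()` form from slide 59 (problem above 5000, recovery below 3000), not the windowed `min(2m)` / `max(10m)` form from slide 60, and it is not how the Zabbix server is implemented. The sample series and the function name are hypothetical.

```python
def count_state_changes(samples, problem_above, recover_below=None):
    """Count OK<->PROBLEM transitions for a series of values.

    With recover_below=None the trigger recovers as soon as the problem
    condition (value > problem_above) is false, which models a single
    threshold. Otherwise it recovers only when value < recover_below.
    """
    in_problem = False
    changes = 0
    for value in samples:
        if in_problem:
            if recover_below is None:
                recovered = value <= problem_above
            else:
                recovered = value < recover_below
            if recovered:
                in_problem = False
                changes += 1
        elif value > problem_above:
            in_problem = True
            changes += 1
    return changes


# Hypothetical QPS readings hovering around the 5000 threshold.
qps = [4800, 5200, 4900, 5300, 4700, 5100, 2900]

if __name__ == "__main__":
    print("single threshold:", count_state_changes(qps, 5000))
    print("with recovery condition:", count_state_changes(qps, 5000, 3000))
```

On this series the single threshold changes state on almost every sample, while the version with a recovery condition raises one problem and resolves it once, when the load has clearly dropped.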
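
Slides 64 and 65 show a linear trend, y = -2.9455x + 48.309, next to the trigger functions timeleft() and forecast(). The sketch below illustrates the underlying idea with an ordinary least-squares line: predict a future value, and estimate how long until the line crosses a threshold. It is an illustration only and not Zabbix's implementation. The Python function names are hypothetical, and treating x as hours is an assumption, since the slides do not state the unit.

```python
def linear_fit(points):
    """Least-squares line through (x, y) points; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(slope, intercept, x):
    """Value of the fitted line at x (the idea behind a forecast)."""
    return slope * x + intercept


def time_until(slope, intercept, now_x, threshold):
    """x-units from now_x until the line reaches threshold.

    Returns None when the line is flat or the crossing lies in the past.
    """
    if slope == 0:
        return None
    remaining = (threshold - intercept) / slope - now_x
    return remaining if remaining >= 0 else None


if __name__ == "__main__":
    # Coefficients as printed on slides 64 and 65; x = 0 is assumed to be
    # the start of the charted period.
    slope, intercept = -2.9455, 48.309
    print(time_until(slope, intercept, 0, 10))  # x-units until the 10% line
    print(predict(slope, intercept, 4))         # trend value at x = 4
```

With the slide's coefficients the line reaches 10% roughly 13 x-units after x = 0. How that maps to the "??? hours" on the slide depends on the chart's actual time scale, which the transcript does not give.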