Monitoring MySQL with Prometheus and Grafana

Talk given at Open Source Monitoring Conference 2017 in Nuremberg about Prometheus and Grafana, and how to use them to monitor mysql.

Published in: Technology

  1. 1. Monitoring MySQL with Prometheus & Grafana Julien Pivotto (@roidelapluie) Open Source Monitoring Conference November 22nd, 2017
  2. 2. SELECT USER(); Julien "roidelapluie" Pivotto @roidelapluie Sysadmin at inuits Automation, monitoring, HA MySQL/MariaDB user/admin/contributor Grafana and Prometheus user/contributor
  3. 3. inuits
  4. 4. Monitoring / Observability
  5. 5. Monitoring or Observability? Collecting lots of data Not only for alerting, also for understanding You don't know what you will need
  6. 6. Who decides what to monitor? Pre-define checks Check only what is needed OR Let the system you monitor decide what to expose Alert based on that data
  7. 7. Push or pull? Push: need to know and configure where your server is What if you have 2, 3, 10 servers? What about dev? Pull: Let any server take the metrics Use service discovery to know what to fetch
  8. 8. High availability Monitoring is very important Metamonitoring What is the price to get a highly available monitoring system?
  9. 9. Prometheus https://prometheus.io/
  10. 10. Prometheus Prometheus is a Cloud-Native Data-Centric Open-Source Performant Simple metrics collection, analysis and alerting tool. Nothing more.
  11. 11. Cloud Native Very easy to configure, deploy, maintain Designed in multiple services Container ready Orchestration ready (dynamic config)
  12. 12. Open Source Apache 2.0 CNCF Go Support for multiple OS Many "exporters": https://github.com/prometheus/prometheus/wiki/Default-port-allocations
  13. 13. Performance Prometheus is designed to fetch data in an interval measured in SECONDS Designed to handle lots of metrics New storage engine in 2.0 to scale even better and support use cases like Kubernetes
  14. 14. Data Centric A Metric in Prometheus has metadata: mysql_global_status_handlers_total{handler="tmp_write"} 1122 And lots of functions to filter, change, remove... those metadata while fetching them.
  15. 15. Metrics type Counters (always go up) Gauge (go up and down) Histograms (aggregate by buckets) Summary (percentiles) (most of the time useless)
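Because counters only go up, a drop between two samples can only mean that the counter restarted. A minimal sketch of how a per-second rate could be derived from raw counter samples under that assumption; the function name and the sample data are hypothetical, and Prometheus's own rate() additionally extrapolates to the window boundaries, which is left out here:

```python
def counter_rate(samples):
    """Per-second rate from (timestamp, value) counter samples.

    A decrease between consecutive samples is treated as a counter
    reset: the counter is assumed to have restarted from zero.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset the whole current value counts as increase.
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Hypothetical samples taken every 10 s; the server restarts after 20 s.
samples = [(0, 100), (10, 150), (20, 200), (30, 20), (40, 70)]
print(counter_rate(samples))
```

A gauge needs no such handling, since its current value is meaningful on its own.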
  16. 16. A word about Prometheus vs Graphite Prometheus does not see a metric as an "event". Metrics are current value until they are replaced. You can not see when a metric has been included in Prometheus. For Events, Prometheus refers to Elasticsearch.
  17. 17. How? Creative Commons Attribution 2.0 https://www.flickr.com/photos/75001512@N00/6824648824
  18. 18. Warning : this is complicated.
  19. 19. Prometheus uses HTTP and PLAIN TEXT. http(s) is supported basic auth / token also (exposing https is not Prometheus job)
  20. 20. $ curl http://127.0.0.1:9090/metrics
          # HELP prometheus_notifications_queue_length The number of alert notifications in the queue.
          # TYPE prometheus_notifications_queue_length gauge
          prometheus_notifications_queue_length 0
          # HELP prometheus_notifications_sent_total Total number of alerts successfully sent.
          # TYPE prometheus_notifications_sent_total counter
          prometheus_notifications_sent_total{alertmanager="127.0.0.1:9093"} 4.796464e+06
  21. 21. $ curl http://127.0.0.1:9090/metrics
          prometheus_notifications_queue_length 0
          prometheus_notifications_sent_total{alertmanager="127.0.0.1:9093"} 4.796464e+06
  22. 22. Anything that can embed a http server can serve prometheus metrics.
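To illustrate how little is needed, here is a minimal sketch using only the Python standard library. The metric names, the state dictionary and port 8000 are made up for the example; a real application would more likely use one of the client library bindings mentioned later in the talk.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical application state exposed as metrics.
STATE = {"myapp_requests_total": 0, "myapp_queue_length": 3}
TYPES = {"myapp_requests_total": "counter", "myapp_queue_length": "gauge"}


def render_metrics(state):
    """Render metrics in the plain-text exposition format."""
    lines = []
    for name in sorted(state):
        lines.append(f"# TYPE {name} {TYPES[name]}")
        lines.append(f"{name} {state[name]}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(STATE).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Serve /metrics until interrupted (port 8000 is arbitrary)."""
    HTTPServer(("127.0.0.1", port), MetricsHandler).serve_forever()


print(render_metrics(STATE), end="")
```

Calling serve() and then running curl against http://127.0.0.1:8000/metrics should return the same text that the final print shows.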
  23. 23. Native Prometheus integrations Kubernetes Ceph etcd Telegraf Grafana mgmt ...
  24. 24. What when you can't? Linux (not counting TUX web server) MySQL Third party software Third party services
  25. 25. Exporters Creative Commons Attribution 2.0 https://www.flickr.com/photos/fuzzy/563198182
  26. 26. Exporters Exporters expose metrics with an HTTP API They connect to the real target Bindings available for many languages Exporters do not save data ; they are not "proxies" and don't "cache" anything
  27. 27. Official exporters Consul, Memcached, MySQL, Node/System, HAProxy, AWS Cloudwatch, Collectd, Graphite, Statsd, JMX, influxdb, SNMP, Blackbox
  28. 28. Node exporter Linux Metrics Including: CPU, Memory, Disk space, networking, load, time...
  29. 29. Textfile As part of the node_exporter, you can write metrics in file Scripts output, etc ...
  30. 30. Blackbox exporter HTTP, DNS, TCP sockets, ICMP... http://127.0.0.1:9115/probe?target=https://inuits.eu&module=http_2xx probe_ssl_earliest_cert_expiry 1.52697083e+09
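The value of probe_ssl_earliest_cert_expiry is a Unix timestamp, so turning it into "days left" is simple arithmetic. A small sketch of that conversion; the helper name and the chosen "now" are assumptions for the example:

```python
from datetime import datetime, timezone


def days_until(expiry_ts, now_ts):
    """Days between now_ts and expiry_ts, both Unix timestamps."""
    return (expiry_ts - now_ts) / 86400


expiry = 1.52697083e+09  # value shown on the slide
print(datetime.fromtimestamp(expiry, tz=timezone.utc).date())

# Hypothetical "now": the day of the talk, 2017-11-22 00:00 UTC.
now = datetime(2017, 11, 22, tzinfo=timezone.utc).timestamp()
print(round(days_until(expiry, now)))
```

In PromQL the same calculation would be written against time(), for example (probe_ssl_earliest_cert_expiry - time()) / 86400.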
  31. 31. MySQL Creative Commons Attribution 2.0 https://www.flickr.com/photos/lurkerm/262541595
  32. 32. Frequency Some queries are expensive You can decide to fetch some data every 1m, others every 10s...
  33. 33. Wait .. Exporters don't cache?? How do I fetch some data every 10s and others every 1m?
  34. 34. scrape_configs:
          - job_name: 'mysql global status'
            scrape_interval: 10s
            static_configs:
              - targets:
                - '172.31.14.3:9104'
            params:
              collect[]:
                - global_status
          - job_name: 'mysql performance'
            scrape_interval: 1m
            static_configs:
              - targets:
                - '172.31.14.3:9104'
            params:
              collect[]:
                - perf_schema.tableiowaits
                - perf_schema.indexiowaits
                - perf_schema.tablelocks
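The trick in this config is that both jobs hit the same exporter, only with different collect[] query parameters and intervals. A sketch of the URLs that result, built with the standard library; the scrape_url helper is hypothetical and only mirrors how params end up in the query string:

```python
from urllib.parse import urlencode


def scrape_url(target, params, path="/metrics"):
    """Build the URL fetched for one target of a scrape job."""
    query = urlencode(params, doseq=True)
    return f"http://{target}{path}" + (f"?{query}" if query else "")


# Scraped every 10s in the config above.
fast = scrape_url("172.31.14.3:9104", {"collect[]": ["global_status"]})
# Scraped every 1m in the config above.
slow = scrape_url(
    "172.31.14.3:9104",
    {"collect[]": ["perf_schema.tableiowaits", "perf_schema.indexiowaits",
                   "perf_schema.tablelocks"]},
)
print(fast)
print(slow)
```

The exporter still caches nothing: each request simply runs only the collectors named in its query string.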
  35. 35. MySQL Replication MySQL Master <-> MySQL Master MySQL Master -> MySQL Slave MySQL Master -> MySQL Slave -> MySQL Slave MySQL Masters -> MySQL Slaves -> MySQL Slaves -> MySQL Slaves MySQL Master -> MySQL Slaves
  36. 36. pt-heartbeat pt-heartbeat is a daemon that updates an entry with the current timestamp on a MySQL server every second. On the replica, you can check the timestamp and compute NOW - timestamp to get the real lag.
          +----------------------------+-----------+
          | ts                         | server_id |
          +----------------------------+-----------+
          | 2017-08-17T16:55:01.001030 |         1 |
          +----------------------------+-----------+
  37. 37. pt-heartbeat GPL Perl Part of percona toolkit
  38. 38. wait, mysql has that natively
          mysql> SHOW SLAVE STATUS\G
          ...
          Seconds_Behind_Master: 0
          ...
          aka mysqld_exporter metric: mysql_slave_lag_seconds
  39. 39. Bugs Fixes for Seconds_Behind_Master in: 5.7.18, 5.6.36, 5.6.23, 5.6.16.
  40. 40. How it works Checks the heartbeat table (SQL query). It does not call the pt-heartbeat CLI, so it is independent from it.
  41. 41. CLI flags collect.heartbeat collect.heartbeat.database collect.heartbeat.table
  42. 42. Metrics mysql_heartbeat_stored_timestamp_seconds{server_id="1"} mysql_heartbeat_now_timestamp_seconds{server_id="1"}
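The lag is the difference between these two metrics. A minimal sketch that reads them from exposition text and subtracts them per server_id; the parser is deliberately naive, and the sample values are made up:

```python
import re

# Matches lines of the form: name{server_id="N"} value
SAMPLE = re.compile(r'^(\w+)\{server_id="(\d+)"\}\s+(\S+)$')


def heartbeat_lag(exposition):
    """Return {server_id: lag in seconds} from exposition text."""
    now, stored = {}, {}
    for line in exposition.splitlines():
        match = SAMPLE.match(line.strip())
        if not match:
            continue  # comments, blank lines, other metrics
        name, server_id, value = match.groups()
        if name == "mysql_heartbeat_now_timestamp_seconds":
            now[server_id] = float(value)
        elif name == "mysql_heartbeat_stored_timestamp_seconds":
            stored[server_id] = float(value)
    return {sid: now[sid] - stored[sid] for sid in now if sid in stored}


# Hypothetical scrape output; the values are made up.
text = """
mysql_heartbeat_stored_timestamp_seconds{server_id="1"} 1502988901.0
mysql_heartbeat_now_timestamp_seconds{server_id="1"} 1502988903.5
"""
print(heartbeat_lag(text))
```

In practice this subtraction is better done inside Prometheus, as a PromQL expression or a recording rule, rather than in a script.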
  43. 43. Back to Prometheus Creative Commons Attribution 2.0 https://www.flickr.com/photos/andrepierre/14843110667
  44. 44. Exploring Metrics
  45. 45. Exploring Metrics
  46. 46. Exploring Metrics
  47. 47. Exploring Metrics
  48. 48. PromQL mysql_global_status_commands_total
  49. 49. PromQL mysql_global_status_commands_total{command="select"}
  50. 50. PromQL mysql_global_status_commands_total {command=~"select|set_options"}
  51. 51. PromQL mysql_global_status_commands_total{command=~"select|set_options"}
  52. 52. PromQL deriv(mysql_global_status_connections[5m])
  53. 53. PromQL mysql_up == 0
  54. 54. PromQL sum(avg_over_time(mysql_info_schema_threads[10m])) by (instance) - sum(avg_over_time(mysql_info_schema_threads[10m] offset 1d)) by (instance)
  55. 55. PromQL
          {__name__=~".+innodb.+cache.*"}
          predict_linear(mysql_heartbeat_lag_seconds[5m], 60*2)
          sum(rate(mysql_global_status_commands_total{command=~"(commit|rollback)"}[5m]))
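predict_linear fits a straight line through the samples in the range and reads it off some seconds into the future. A sketch of that idea with an ordinary least-squares fit; this is an illustration written for this transcript, not Prometheus's code, and the samples are invented:

```python
def predict_linear(samples, seconds_ahead, now=None):
    """Least-squares prediction from (timestamp, value) samples.

    Returns the fitted value at now + seconds_ahead; "now" defaults
    to the timestamp of the last sample.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    if now is None:
        now = samples[-1][0]
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = cov / var
    intercept = mean_v - slope * mean_t
    return intercept + slope * (now + seconds_ahead)


# Hypothetical lag samples: growing by 1 s every 10 s.
lag = [(0, 5.0), (10, 6.0), (20, 7.0), (30, 8.0)]
print(predict_linear(lag, 60 * 2))
```

With 60*2 as the second argument, as on the slide, the question asked is "where will the lag be in two minutes if the current trend continues?".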
  56. 56. Recording and alerting rules
  57. 57. Rules Prometheus can record extra metrics Make expensive calculations Alert if value is present
  58. 58. groups:
          - name: mysql
            rules:
            - record: mysql_heartbeat_lag_seconds
              expr: |
                mysql_heartbeat_now_timestamp_seconds -
                mysql_heartbeat_stored_timestamp_seconds
            - alert: MysqlReplicationNotRunning
              expr: |
                mysql_slave_status_slave_io_running == 0 OR
                mysql_slave_status_slave_sql_running == 0
              for: 2m
              annotations:
                summary: "Slave replication is not running"
            - alert: MySQLReplicationLag
              expr: |
               (mysql_slave_lag_seconds > 30)
               AND on (instance)
               (predict_linear(
                 mysql_slave_lag_seconds[5m], 60*2) > 0)
              for: 5m
              labels:
                priority: immediate
  59. 59. Alertmanager When Prometheus has an alert, it sends it. Every minute by default, as long as the alert is ongoing Alertmanager, a separate daemon, does the rest of the work
  60. 60. What is alertmanager doing? Grouping Inhibition Silence Notify
  61. 61. grouping alerts 5 nodes are down. Do you want 5 emails? group_by: ['alertname', 'cluster', 'service']
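Grouping boils down to bucketing alerts by the values of the group_by labels and sending one notification per bucket. A sketch of that bucketing; the alert data is hypothetical and Alertmanager's timing options (waiting before sending, repeating) are ignored:

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Bucket alerts by the values of the group_by labels."""
    groups = defaultdict(list)
    for labels in alerts:
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(labels)
    return dict(groups)


# Hypothetical alerts: five nodes of one cluster are down.
alerts = [
    {"alertname": "NodeDown", "cluster": "db", "service": "mysql",
     "instance": f"db{i}"}
    for i in range(1, 6)
]
groups = group_alerts(alerts, ["alertname", "cluster", "service"])
for key, members in groups.items():
    print(key, len(members))
```

Because instance is not part of group_by, the five alerts collapse into a single group, and so into a single email.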
  62. 62. inhibition Datacenter is on fire. Do you want to know that switches, hosts, services are down?
          - source_match:
              severity: 'critical'
            target_match:
              severity: 'warning'
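The rule above reads as: while any alert matching source_match is firing, mute the alerts matching target_match. A simplified sketch of that logic; it assumes no extra label-equality condition between source and target (a real rule can restrict which alerts inhibit which), and the alerts are invented:

```python
def matches(labels, matcher):
    """True if every label in matcher has the same value in labels."""
    return all(labels.get(k) == v for k, v in matcher.items())


def apply_inhibition(alerts, source_match, target_match):
    """Drop target alerts while at least one source alert is firing."""
    if not any(matches(a, source_match) for a in alerts):
        return list(alerts)
    return [a for a in alerts if not matches(a, target_match)]


# Hypothetical firing alerts.
firing = [
    {"alertname": "DatacenterOnFire", "severity": "critical"},
    {"alertname": "SwitchDown", "severity": "warning"},
    {"alertname": "HostDown", "severity": "warning"},
]
kept = apply_inhibition(
    firing, {"severity": "critical"}, {"severity": "warning"}
)
print([a["alertname"] for a in kept])
```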
  63. 63. Silence I upgrade kernel and reboot the service. Do I want mails during that time?
  64. 64. Notifications Alertmanager can notify you with: Mails Webhooks 3rd party services SMS (sachet: 3rd party integration using webhook)
  65. 65. Alerting routes You can send alerts to defined people based on routes Everything to logs mailbox Critical alerts Network to net team SMS Svc to app team SMS Warning alerts Network to net team mail Svc to app team mail
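The routing tree on this slide can be read as a lookup from alert labels to receivers. A sketch under stated assumptions: the severity and team label names and all receiver names are hypothetical, and the real Alertmanager routing tree is nested and has more options than this flat list:

```python
ROUTES = [
    # (matcher, receiver); label names and receivers are hypothetical.
    ({"severity": "critical", "team": "network"}, "net-team-sms"),
    ({"severity": "critical", "team": "service"}, "app-team-sms"),
    ({"severity": "warning", "team": "network"}, "net-team-mail"),
    ({"severity": "warning", "team": "service"}, "app-team-mail"),
]


def receivers_for(labels):
    """Every alert goes to the logs mailbox, plus the first matching route."""
    receivers = ["logs-mailbox"]
    for matcher, receiver in ROUTES:
        if all(labels.get(k) == v for k, v in matcher.items()):
            receivers.append(receiver)
            break
    return receivers


print(receivers_for({"severity": "critical", "team": "network"}))
print(receivers_for({"severity": "warning", "team": "service"}))
```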
  66. 66. High Availability: Prometheus Have different prometheis servers with the same config They do not talk to each other All of them fetch the same data They monitor each other
  67. 67. High availability: Alertmanager Have multiple alertmanagers with the same config Prometheis send alerts to all alertmanagers Alertmanagers talk to each other to avoid sending the same notification twice
  68. 68. One tool does one job... Prometheus will collect data Alertmanager will send notifications Exporters will expose data Grafana will graph data
  69. 69. Grafana Open Source (Apache 2.0) Web app Specialized in visualization Pluggable Multiple datasources: prometheus, graphite, influxdb... Has an API!
  70. 70. History of Grafana Grafana is a fork of Kibana 3; used to be JS-driven. Now fully featured, requires a database, multi-projects/users support, etc...
  71. 71. Grafana and Prometheus Prometheus shipped its own consoles Now it recommends Grafana and deprecated its own consoles
  72. 72. Grafana Dashboards
  73. 73. Grafana Dashboards
  74. 74. Time Picker
  75. 75. Configure Prometheus in Grafana
  76. 76. Configure Prometheus in Grafana
  77. 77. Prometheus Dashboard
  78. 78. Multiple Prometheus instances You can add multiple prometheus instances on grafana You can add dropdown on the top to select which one you want to use Use case: prometheus HA, local prometheus (with access mode=direct)
  79. 79. Creating Grafana Dashboards Takes time Requires deep knowledge of the tools Improved over time Easy to share (json + online library)
  80. 80. Percona Grafana Dashboard Percona Open Sourced Grafana Dashboards Covering MySQL, Mongo and Linux monitoring Part of a bigger picture, PMM, but usable standalone Open Source (AGPL!) https://github.com/percona/grafana-dashboards
  81. 81. Installing Percona Graphs Method 1 Enable File dashboards in Grafana Clone grafana-dashboards to the configured location (or make a package) Method 2 Use the Grafana API to upload the JSON files.
  82. 82. MySQL Setup You'll need mysqld_exporter, with a user MySQL 5.1+ Performance Schema for full set of metrics mysqld_exporter -collect.binlog_size=true -collect.info_schema.processlist=true
  83. 83. node_exporter setup node_exporter -collectors.enabled= "diskstats,filefd,filesystem,loadavg,meminfo,netdev,stat,time,uname,vmstat"
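For Method 2, a sketch of what one upload could look like with only the standard library. It builds a POST to Grafana's /api/dashboards/db endpoint without sending it; the base URL, the API key and the tiny dashboard are placeholders, and a real dashboard would be loaded from one of the JSON files in the repository:

```python
import json
import urllib.request


def dashboard_upload_request(base_url, api_key, dashboard):
    """Build (but do not send) the request that stores one dashboard."""
    dashboard = dict(dashboard, id=None)  # let Grafana assign the id
    payload = {"dashboard": dashboard, "overwrite": True}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/dashboards/db",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical URL, key and dashboard.
req = dashboard_upload_request(
    "http://127.0.0.1:3000", "example-api-key", {"title": "MySQL Overview"}
)
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would perform the upload; looping over the JSON files covers the whole set.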
  84. 84. Prometheus (static file)
          scrape_configs:
            - job_name: prometheus
              static_configs:
                - targets: ['localhost:9090']
                  labels:
                    instance: prometheus
            - job_name: linux
              static_configs:
                - targets: ['10.0.98.43:9100']
                  labels:
                    instance: db1
            - job_name: mysql
              static_configs:
                - targets: ['10.0.98.43:9104']
                  labels:
                    instance: db1
  85. 85. Dashboards
  86. 86. Dashboards
  87. 87. Dashboards
  88. 88. We don't need all of them? Because Grafana is just viz, you can import only the one you want (e.g. exclude Mongo) You can import later any extra dashboard you need
  89. 89. MySQL Overview
  90. 90. MySQL Overview
  91. 91. MySQL Overview
  92. 92. InnoDB
  93. 93. InnoDB
  94. 94. InnoDB
  95. 95. Replication
  96. 96. Other Grafana features Multi tenant Deployments Folders for dashboards (coming soon)
  97. 97. Jsonnet library to create dashboard Brand new, still WIP https://github.com/grafana/grafonnet-lib
  98. 98. Conclusions Prometheus and Grafana are first-class monitoring tools Totally different approach than other tools Embeddable into your apps Percona Dashboards get your graphs ready in no time with minimal effort
  99. 99. Julien Pivotto roidelapluie roidelapluie@inuits.eu Inuits https://inuits.eu info@inuits.eu Contact
