DevOps Fest 2019. Mykola Marzhan. Monitoring Cloud Databases

Troubleshooting database problems is not a simple task even when you run the database on-premises. Performance debugging can become a nightmare when the database runs as a managed service in AWS, GCP, or Azure, because you have no access to the underlying OS, and the series of DB metrics gathered by your monitoring solution is the only material you have to explore.
The talk gives an overview of the monitoring options available for managed MySQL/PostgreSQL databases on the AWS, GCP, and Azure cloud providers. We will review what monitoring data can be gathered, talk about data granularity, and discuss ways to export these metrics to Prometheus for simplified representation and for broad, complex troubleshooting analysis of the whole instance.
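
As a rough illustration of what "exporting these metrics to Prometheus" can look like, the sketch below turns an RDS Enhanced Monitoring JSON document into Prometheus text exposition lines. The sample reuses the values from the Enhanced Monitoring payload shown in the deck; that payload is cut off in the transcript, so the closing braces and the choice of fields here are assumptions. The metric name `rds_enhanced_cpu_utilization_percent`, its labels, and the function name are hypothetical and are not taken from any of the exporters linked in the slides.

```python
import json

# Sample shaped like the Enhanced Monitoring payload from the deck.
# Assumption: only these fields are used, and the JSON is closed by hand
# because the transcript truncates the original document.
SAMPLE = """
{
  "engine": "Aurora",
  "instanceID": "mykola-test1",
  "timestamp": "2018-08-09T07:03:43Z",
  "numVCPUs": 1,
  "cpuUtilization": {"guest": 0, "irq": 0.03, "system": 1.47, "wait": 0.91,
                     "idle": 93.04, "user": 3.85, "total": 6.96,
                     "steal": 0.66, "nice": 0.04}
}
"""


def to_prometheus(payload: str, prefix: str = "rds_enhanced") -> str:
    """Render the cpuUtilization block as Prometheus text exposition lines.

    The metric and label names are hypothetical, chosen for this sketch.
    """
    doc = json.loads(payload)
    instance = doc["instanceID"]
    metric = f"{prefix}_cpu_utilization_percent"
    lines = [f"# TYPE {metric} gauge"]
    # One gauge sample per CPU mode, sorted for stable output.
    for mode, value in sorted(doc["cpuUtilization"].items()):
        lines.append(
            f'{metric}{{instance="{instance}",mode="{mode}"}} {float(value)}'
        )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_prometheus(SAMPLE), end="")
```

A real exporter would also need to fetch the payload and serve the result over HTTP; this sketch covers only the format conversion.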

  1. CLOUD DATABASES MONITORING. Iwo Panowicz, Mykola Marzhan. Version: 11.04.19
  2. Mykola Marzhan: AWS Certified Solutions Architect - Professional. Mykola has been developing monitoring systems since 2004.
  3. AGENDA ➤ OS Metrics ➤ AWS ➤ Google Cloud ➤ Azure ➤ Database Metrics ➤ MySQL ➤ PostgreSQL ➤ Query Analytics
  4. OS METRICS
  5. AMAZON WEB SERVICES: AMAZON RDS
  6. AMAZON AURORA
  7. AMAZON RELATIONAL DATABASE SERVICE: BASIC MONITORING
  8. BASIC MONITORING: 1-MINUTE SAMPLING
  9. RDS BASIC MONITORING: BinLogDiskUsage, BurstBalance, CPUUtilization, CPUCreditUsage, CPUCreditBalance, DatabaseConnections, DiskQueueDepth, FreeableMemory, FreeStorageSpace, NetworkReceiveThroughput, NetworkTransmitThroughput, ReadIOPS, WriteIOPS, ReadLatency, WriteLatency, ReadThroughput, WriteThroughput, ReplicaLag, SwapUsage
  10. BASIC MONITORING: BUT HOW MANY USEFUL OS METRICS?
  11. RDS BASIC MONITORING: BinLogDiskUsage, BurstBalance, CPUUtilization, CPUCreditUsage, CPUCreditBalance, DatabaseConnections, DiskQueueDepth, FreeableMemory, FreeStorageSpace, NetworkReceiveThroughput, NetworkTransmitThroughput, ReadIOPS, WriteIOPS, ReadLatency, WriteLatency, ReadThroughput, WriteThroughput, ReplicaLag, SwapUsage
  12. BASIC MONITORING: WHAT ABOUT RDS AURORA?
  13. AURORA BASIC MONITORING: BinLogDiskUsage, BurstBalance, CPUUtilization, CPUCreditUsage, CPUCreditBalance, DatabaseConnections, DiskQueueDepth, FreeableMemory, FreeStorageSpace, NetworkReceiveThroughput, NetworkTransmitThroughput, VolumeReadIOPS ?!?, VolumeWriteIOPS ?!?, ReadLatency, WriteLatency, ReadThroughput, WriteThroughput, ReplicaLag, SwapUsage
  14. AMAZON RELATIONAL DATABASE SERVICE: ENHANCED MONITORING
  15. ENHANCED MONITORING: engine, instanceID, instanceResourceID, numVCPUs, timestamp, uptime, version, cpuUtilization.guest, cpuUtilization.idle, cpuUtilization.irq, cpuUtilization.nice, cpuUtilization.steal, cpuUtilization.system, cpuUtilization.total, cpuUtilization.user, cpuUtilization.wait, diskIO.avgQueueLen, diskIO.avgReqSz, diskIO.await, diskIO.device, diskIO.readLatency ?!?, diskIO.writeLatency ?!?, diskIO.readIOsPS, diskIO.writeIOsPS, diskIO.readKb, diskIO.writeKb, diskIO.rrqmPS, diskIO.wrqmPS, diskIO.readKbPS, diskIO.writeKbPS, diskIO.tps, diskIO.util, fileSys.maxFiles, fileSys.mountPoint, fileSys.name, fileSys.total, fileSys.used, fileSys.usedFilePercent, fileSys.usedFiles, fileSys.usedPercent, loadAverageMinute.fiftee, loadAverageMinute.five, loadAverageMinute.one, memory.buffers, memory.cached, memory.dirty, memory.free, memory.total, memory.active, memory.inactive, memory.mapped, memory.pageTables, memory.slab, memory.hugePagesFree, memory.hugePagesRsvd, memory.hugePagesSize, memory.hugePagesSurp, memory.hugePagesTotal, memory.writeback, network.interface, network.rx, network.tx, processList.cpuUsedPc, processList.id, processList.memoryUsed, processList.name, processList.parentID, processList.rss, processList.tgid, processList.VIRT, swap.cached, swap.in ?!?, swap.out ?!?, swap.free, swap.total, tasks.blocked, tasks.running, tasks.sleeping, tasks.stopped, tasks.total, tasks.zombie
  16. ENHANCED MONITORING: engine, instanceID, instanceResourceID, numVCPUs, timestamp, uptime, version, cpuUtilization.guest, cpuUtilization.idle, cpuUtilization.irq, cpuUtilization.nice, cpuUtilization.steal, cpuUtilization.system, cpuUtilization.total, cpuUtilization.user, cpuUtilization.wait, diskIO.avgQueueLen, diskIO.avgReqSz, diskIO.await, diskIO.device, diskIO.readLatency ?!?, diskIO.writeLatency ?!?, diskIO.readIOsPS, diskIO.writeIOsPS, diskIO.readKb, diskIO.writeKb, diskIO.rrqmPS, diskIO.wrqmPS, diskIO.readKbPS, diskIO.writeKbPS, diskIO.tps, diskIO.util, fileSys.maxFiles, fileSys.mountPoint, fileSys.name, fileSys.total, fileSys.used, fileSys.usedFilePercent, fileSys.usedFiles, fileSys.usedPercent, loadAverageMinute.fiftee, loadAverageMinute.five, loadAverageMinute.one, memory.buffers, memory.cached, memory.dirty, memory.free, memory.total, memory.active, memory.inactive, memory.mapped, memory.pageTables, memory.slab, memory.hugePagesFree, memory.hugePagesRsvd, memory.hugePagesSize, memory.hugePagesSurp, memory.hugePagesTotal, memory.writeback, network.interface, network.rx, network.tx, processList.cpuUsedPc, processList.id, processList.memoryUsed, processList.name, processList.parentID, processList.rss, processList.tgid, processList.VIRT, swap.cached, swap.in ?!?, swap.out ?!?, swap.free, swap.total, tasks.blocked, tasks.running, tasks.sleeping, tasks.stopped, tasks.total, tasks.zombie
  17. AMAZON RELATIONAL DATABASE SERVICE: ENABLE ENHANCED MONITORING
  18. ENHANCED MONITORING: WHAT?!?
  19. ENHANCED MONITORING: { "engine": "Aurora", "instanceID": "mykola-test1", "instanceResourceID": "db-EPN53WC4JKM3GYOD7GNGZ67XBU", "timestamp": "2018-08-09T07:03:43Z", "version": 1, "uptime": "1:55:43", "numVCPUs": 1, "cpuUtilization": { "guest": 0, "irq": 0.03, "system": 1.47, "wait": 0.91, "idle": 93.04, "user": 3.85, "total": 6.96, "steal": 0.66, "nice": 0.04
  20. ENHANCED MONITORING: https://aws.amazon.com/premiumsupport/knowledge-center/custom-cloudwatch-metrics-rds/
  21. ENHANCED MONITORING: https://github.com/percona/rds_exporter
  22. GOOGLE CLOUD PLATFORM: GOOGLE CLOUD SQL
  23. STACKDRIVER
  24. STACKDRIVER: 1-MINUTE SAMPLING, 2.5-4 MINUTES DELAY
  25. STACKDRIVER: auto_failover_request_count, available_for_failover, cpu/reserved_cores, cpu/usage_time, cpu/utilization, disk/bytes_used, disk/quota, disk/read_ops_count, disk/utilization, disk/write_ops_count, memory/quota, memory/usage, memory/utilization, mysql/innodb_buffer_pool_pages_dirty, mysql/innodb_buffer_pool_pages_free, mysql/innodb_buffer_pool_pages_total, mysql/innodb_data_fsyncs, mysql/innodb_os_log_fsyncs, mysql/innodb_pages_read, mysql/innodb_pages_written, mysql/queries, mysql/questions, mysql/received_bytes_count, mysql/replication/seconds_behind_master, mysql/replication/slave_io_running, mysql/replication/slave_sql_running, mysql/sent_bytes_count, network/connections, network/received_bytes_count, network/sent_bytes_count, postgresql/num_backends, postgresql/replication/replica_byte_lag, postgresql/transaction_count, state, up, uptime
  26. BASIC MONITORING: BUT HOW MANY USEFUL OS METRICS?
  27. STACKDRIVER: auto_failover_request_count, available_for_failover, cpu/reserved_cores, cpu/usage_time, cpu/utilization, disk/bytes_used, disk/quota, disk/utilization, disk/read_ops_count, disk/write_ops_count, memory/quota, memory/usage, memory/utilization, mysql/innodb_buffer_pool_pages_dirty, mysql/innodb_buffer_pool_pages_free, mysql/innodb_buffer_pool_pages_total, mysql/innodb_data_fsyncs, mysql/innodb_os_log_fsyncs, mysql/innodb_pages_read, mysql/innodb_pages_written, mysql/queries, mysql/questions, mysql/received_bytes_count, mysql/replication/seconds_behind_master, mysql/replication/slave_io_running, mysql/replication/slave_sql_running, mysql/sent_bytes_count, network/connections, network/received_bytes_count, network/sent_bytes_count, postgresql/num_backends, postgresql/replication/replica_byte_lag, postgresql/transaction_count, state, up, uptime
  28. STACKDRIVER: https://github.com/frodenas/stackdriver_exporter
  29. MICROSOFT AZURE: AZURE DATABASE FOR MYSQL
  30. AZURE DATABASE MONITORING
  31. AZURE DATABASE MONITORING: 1-MINUTE SAMPLING
  32. AZURE DATABASE MONITORING: cpu_percent, memory_percent, io_consumption_percent, storage_percent, storage_used, storage_limit, serverlog_storage_percent, serverlog_storage_usage, serverlog_storage_limit, active_connections, connections_failed, seconds_behind_master, network_bytes_egress, network_bytes_ingress
  33. BASIC MONITORING: BUT HOW MANY USEFUL OS METRICS?
  34. AZURE DATABASE MONITORING: cpu_percent, memory_percent, io_consumption_percent, storage_percent, storage_used, storage_limit, serverlog_storage_percent, serverlog_storage_usage, serverlog_storage_limit, active_connections, connections_failed, seconds_behind_master, network_bytes_egress, network_bytes_ingress
  35. AZURE DATABASE MONITORING: https://github.com/RobustPerception/azure_metrics_exporter
  36. DATABASE METRICS
  37. DATABASE METRICS: MYSQL MONITORING
  38. MYSQL MONITORING: https://github.com/prometheus/mysqld_exporter
  39. DATABASE METRICS: POSTGRESQL MONITORING
  40. POSTGRESQL MONITORING: https://github.com/wrouesnel/postgres_exporter
  41. { QUERY ANALYTICS }
  42. QUERY ANALYTICS: ENABLE PERFORMANCE_SCHEMA
  43. WHAT IS PMM?
  44. PERCONA MONITORING AND MANAGEMENT: FREE, OPEN SOURCE DATABASE MONITORING PLATFORM FOR MYSQL, POSTGRESQL AND MONGODB
  45. PERCONA MONITORING AND MANAGEMENT: AGENTLESS SUPPORT IN PMM 1.16.0
  46. Iwo Panowicz, Mykola Marzhan
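
The RDS basic monitoring metrics listed in the slides are published to CloudWatch with 1-minute sampling. As a minimal sketch, assuming the standard `AWS/RDS` namespace and the `DBInstanceIdentifier` dimension, the helper below builds the keyword arguments for a CloudWatch `GetMetricStatistics` request for one of those metrics. The function name `rds_metric_request` and its parameters are hypothetical; the snippet only builds the request and does not call AWS.

```python
from datetime import datetime, timedelta, timezone


def rds_metric_request(instance_id, metric_name, minutes=10, now=None):
    """Build keyword arguments for a CloudWatch GetMetricStatistics call.

    Period=60 matches the 1-minute sampling of RDS basic monitoring.
    `now` is injectable so the time window can be fixed in tests.
    """
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/RDS",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": instance_id}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 60,
        "Statistics": ["Average"],
    }


if __name__ == "__main__":
    params = rds_metric_request("mykola-test1", "CPUUtilization")
    print(params)
    # With AWS credentials configured, the request could be sent via boto3:
    #   import boto3
    #   boto3.client("cloudwatch").get_metric_statistics(**params)
```

Keeping the request construction separate from the network call makes the time window and period easy to check without AWS access.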