Monitoring MySQL at scale


  1. 1. Monitoring MySQL at SCALE
  2. 2. Who We Are: Ilan Rabinovitch, Dir. Technical Community, Datadog • Ovais Tariq, Storage SRE, Uber (formerly at Lithium & Percona)
  3. 3. Agenda: 1. About Lithium and MySQL; 2. Background: Monitoring Challenges in a Dynamic World; 3. Theory: Monitoring 101; 4. Practical: Triaging a Real Incident at Lithium
  4. 4. About Lithium Technologies: Lithium’s platform helps brands connect with, engage, and understand their customers
  5. 5. MySQL Architecture / Data Flow • Multi-tenant SaaS applications • Typical master-slave replication setup • MySQL running: ○ On bare metal ○ In AWS public cloud ○ In OpenStack
  6. 6. Culture, Automation, Metrics, Sharing (Damon Edwards and John Willis, DevOps Day LA)
  7. 7. Culture, Automation, Metrics, Sharing (Damon Edwards and John Willis, DevOps Day LA)
  8. 8. You’re in the cloud and it's everything you dreamed of! Autoscaling • Infinite Storage • Managed Databases • Container Orchestration • Private Clouds
  9. 9. Collecting data is cheap; not having it when you need it can be expensive
  10. 10. Instrument all the things!
  11. 11. Operational Complexity Increases with: • Number of things to measure • Velocity of change
  12. 12. How much do we measure? • 1 instance: 10 metrics from CloudWatch • 1 operating system (e.g., Linux): 100 metrics • 1 MySQL instance: ~350 metrics
  13. 13. 460 metrics per host × 100 instances = 46,000 metrics
  14. 14. • Earlier: typical Nagios and Cacti setup • Static config and lack of context • No correlation between alerts and graphs • No self-service for developers • In-house tooling has high cost
  15. 15. When to let a sleeping engineer lie?
  16. 16. Recurse until you find root cause
  17. 17. • Query Time • Queries Per Second Data Sources: • Performance Schema • MySQL Status Variables
  18. 18. • Query Time • Queries Per Second Sources: • Performance Schema
  19. 19. • Disk Space Usage • Threads_connected • Threads_running • Connection_errors_internal • Aborted_connects • Connection_errors_max_connections Sources: • Server Status Variables
  20. 20. • Configuration Change • Code Deployment • Service Started / Stopped • MySQL Upgrades • Failovers • etc
  21. 21. Change in workload characteristics, without an increase in workload, affected the schema ‘groupecasino’ • Workload characteristics changed, making it more CPU bound • No increase in IO activity • Increase in number of read operations • No change in types of read operations • Similar number of range queries, each reading more rows
  22. 22. • Monitoring 101: Alerting (https://www.datadoghq.com/blog/monitoring-101-alerting/) • Monitoring 101: Collecting the Right Data (https://www.datadoghq.com/blog/monitoring-101-collecting-data/) • Monitoring 101: Investigating performance issues (https://www.datadoghq.com/blog/monitoring-101-investigation/) • Monitoring MySQL Performance Metrics (https://www.datadoghq.com/blog/monitoring-mysql-performance-metrics/) • Collecting MySQL Metrics (https://www.datadoghq.com/blog/collecting-mysql-statistics-and-metrics/)
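
The tally on slides 12 and 13 can be reproduced with a few lines of Python. This is only a minimal sketch: the per-source counts are the round figures quoted on the slides, and the names `METRICS_PER_SOURCE`, `metrics_per_host`, and `fleet_metric_count` are hypothetical, chosen for illustration.

```python
# Approximate metric counts per monitored host, taken from slide 12.
METRICS_PER_SOURCE = {
    "cloudwatch_instance": 10,
    "operating_system": 100,
    "mysql_instance": 350,
}


def metrics_per_host(sources=METRICS_PER_SOURCE):
    """Sum the metrics emitted by every source on a single host."""
    return sum(sources.values())


def fleet_metric_count(hosts, sources=METRICS_PER_SOURCE):
    """Total metrics across a fleet of identically instrumented hosts."""
    if hosts < 0:
        raise ValueError("hosts must be non-negative")
    return hosts * metrics_per_host(sources)


if __name__ == "__main__":
    print(metrics_per_host())       # 460
    print(fleet_metric_count(100))  # 46000
```

The total grows linearly with host count, which is why slide 11 lists the number of things to measure as a driver of operational complexity.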

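
Slides 17 to 19 name Queries Per Second and several connection-related server status variables. Several of these are cumulative counters, so a rate has to be derived from two samples taken some seconds apart. The sketch below assumes each sample is a plain dict of the name/value pairs that `SHOW GLOBAL STATUS` returns, already fetched by some client. The helpers `counter_rate` and `summarize` and the sample values are hypothetical, and using the `Questions` counter for queries per second is an assumption, not something the slides specify.

```python
# Cumulative counters named on slide 19; the per-second rate is what matters.
COUNTERS = (
    "Aborted_connects",
    "Connection_errors_internal",
    "Connection_errors_max_connections",
)
# Gauges named on slide 19; the latest value is reported as-is.
GAUGES = ("Threads_connected", "Threads_running")


def counter_rate(before, after, name, interval_s):
    """Per-second rate of a cumulative counter between two samples."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta = int(after[name]) - int(before[name])
    # A negative delta means the counter was reset (e.g. a server restart),
    # so fall back to the count accumulated since the reset.
    if delta < 0:
        delta = int(after[name])
    return delta / interval_s


def summarize(before, after, interval_s):
    """Build a flat dict of rates and gauges from two status samples."""
    summary = {
        "queries_per_second": counter_rate(before, after, "Questions", interval_s)
    }
    for name in COUNTERS:
        summary[name + "_per_second"] = counter_rate(before, after, name, interval_s)
    for name in GAUGES:
        summary[name] = int(after[name])
    return summary


if __name__ == "__main__":
    # Made-up sample values, taken 10 seconds apart.
    t0 = {"Questions": "1000", "Aborted_connects": "5",
          "Connection_errors_internal": "0",
          "Connection_errors_max_connections": "2",
          "Threads_connected": "40", "Threads_running": "3"}
    t1 = {"Questions": "3500", "Aborted_connects": "5",
          "Connection_errors_internal": "0",
          "Connection_errors_max_connections": "4",
          "Threads_connected": "42", "Threads_running": "6"}
    for key, value in summarize(t0, t1, 10).items():
        print(key, value)
```

Keeping counters and gauges in separate tuples makes the distinction explicit: a rising `Aborted_connects` total says little on its own, while its rate over the last interval is something an alert can act on.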