2. Who We Are
Ilan Rabinovitch
Dir. Technical Community
Datadog
Ovais Tariq
Storage SRE
Uber
(formerly at Lithium & Percona)
3. Agenda
1. About Lithium and MySQL
2. Background: Monitoring Challenges in a Dynamic World
3. Theory: Monitoring 101
4. Practical: Triaging a Real Incident at Lithium
5. MySQL Architecture / Data Flow
• Multi-tenant SaaS applications
• Typical master-slave replication setup
• MySQL running
○ On bare metal
○ In AWS public cloud
○ In OpenStack
25. • Earlier: typical Nagios and Cacti setup
• Static config and lack of context
• No correlation between alerts and graphs
• No self-service for developers
• In-house tooling has high cost
32. • Query Time
• Queries Per Second
Data Sources
• Performance Schema
• MySQL Status Variables
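As an illustration of the status-variable source, queries per second can be derived from two samples of a cumulative counter. The sketch below assumes the counter is `Questions` as reported by SHOW GLOBAL STATUS; the function name and the restart handling are illustrative choices, not something taken from the talk.

```python
def queries_per_second(prev_questions, curr_questions, interval_seconds):
    """Average QPS between two samples of the cumulative Questions counter.

    Assumption: both values come from SHOW GLOBAL STATUS LIKE 'Questions',
    taken interval_seconds apart.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = curr_questions - prev_questions
    if delta < 0:
        # The counter went backwards, most likely because the server
        # restarted; treat the current value as the count since restart.
        delta = curr_questions
    return delta / interval_seconds


if __name__ == "__main__":
    # Two hypothetical samples taken 60 seconds apart.
    print(queries_per_second(1000, 1600, 60))
```

Sampling the counter on a fixed interval and storing the rate, rather than the raw counter, keeps graphs comparable across restarts.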
33. • Query Time
• Queries Per Second
Sources:
• Performance Schema
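For query time, one possible approach is to read the Performance Schema digest summary table and convert its timers to milliseconds. This is a minimal sketch: the query text and the helper name are assumptions for illustration, and running the query requires a MySQL client connection that is not shown here. Performance Schema timer columns are reported in picoseconds.

```python
PICOSECONDS_PER_MILLISECOND = 1_000_000_000

# Hypothetical query: top statements by total time spent.
DIGEST_QUERY = """
SELECT SCHEMA_NAME, DIGEST_TEXT, COUNT_STAR, SUM_TIMER_WAIT
FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10
"""


def average_latency_ms(count_star, sum_timer_wait_ps):
    """Average statement latency in milliseconds for one digest row."""
    if count_star == 0:
        return 0.0
    return sum_timer_wait_ps / count_star / PICOSECONDS_PER_MILLISECOND


if __name__ == "__main__":
    # A made-up row: 4 executions totalling 8 ms of wait time.
    print(average_latency_ms(4, 8_000_000_000))
```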
34. • Disk Space Usage
• Threads_connected
• Threads_running
• Connection_errors_internal
• Aborted_connects
• Connection_errors_max_connections
Sources:
• Server Status Variables
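The connection metrics above could be combined into a simple saturation summary. The sketch below assumes `status` is a dictionary built from SHOW GLOBAL STATUS output and that `max_connections` is read separately from the server configuration; the function and key names in the result are hypothetical.

```python
def connection_report(status, max_connections):
    """Summarise connection usage from server status variables.

    Assumption: status maps variable names to values (strings or ints),
    as returned by SHOW GLOBAL STATUS.
    """
    connected = int(status["Threads_connected"])
    running = int(status["Threads_running"])
    return {
        # Share of the connection limit currently in use.
        "connection_utilization": connected / max_connections,
        # Share of connected threads that are actively executing.
        "running_ratio": running / connected if connected else 0.0,
        # Cumulative count of connections refused at the limit.
        "refused_at_limit": int(
            status.get("Connection_errors_max_connections", 0)
        ),
    }


if __name__ == "__main__":
    sample = {"Threads_connected": "150", "Threads_running": "30"}
    print(connection_report(sample, max_connections=200))
```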
35. • Configuration Change
• Code Deployment
• Service Started / Stopped
• MySQL Upgrades
• Failovers
• etc
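Events like these are most useful when they can be lined up against a metric anomaly. As a rough sketch, assuming events are stored as (timestamp, description) pairs, the following lists the events that occurred shortly before an incident; the data layout, function name, and 30-minute default window are all illustrative assumptions.

```python
from datetime import datetime, timedelta


def events_before(events, incident_time, window=timedelta(minutes=30)):
    """Return events within `window` before incident_time, oldest first."""
    start = incident_time - window
    return sorted(
        (e for e in events if start <= e[0] <= incident_time),
        key=lambda e: e[0],
    )


if __name__ == "__main__":
    incident = datetime(2016, 1, 1, 12, 0)
    log = [
        (datetime(2016, 1, 1, 11, 50), "Code Deployment"),
        (datetime(2016, 1, 1, 9, 0), "Configuration Change"),
        (datetime(2016, 1, 1, 11, 40), "Failover"),
    ]
    for when, what in events_before(log, incident):
        print(when.isoformat(), what)
```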
45. A change in workload characteristics, without an increase in overall load, affected the schema ‘groupecasino’
• Workload characteristics changed, making it more CPU-bound
• No increase in IO activity
• Increase in number of read operations
• No change in types of read operations
• Similar number of range queries reading more rows
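The last finding (a similar number of range queries, each reading more rows) can be approximated from status counters. The sketch below is an assumption about how one might check it, not the method used in the incident: it uses the deltas of `Handler_read_next` and `Select_range` between two SHOW GLOBAL STATUS samples as a rough proxy for rows read per range query.

```python
def rows_per_range_query(before, after):
    """Approximate rows read per range query between two status samples.

    Assumption: before/after are dicts of integer counters from
    SHOW GLOBAL STATUS. Handler_read_next also counts index scans,
    so the ratio is only a proxy.
    """
    range_queries = after["Select_range"] - before["Select_range"]
    rows_read = after["Handler_read_next"] - before["Handler_read_next"]
    if range_queries <= 0:
        return None  # no range queries in the interval (or a restart)
    return rows_read / range_queries


if __name__ == "__main__":
    # Made-up samples for illustration only.
    earlier = {"Select_range": 100, "Handler_read_next": 10_000}
    later = {"Select_range": 200, "Handler_read_next": 60_000}
    print(rows_per_range_query(earlier, later))
```

A rising ratio with a flat `Select_range` rate would be consistent with the same queries scanning more rows, which tends to raise CPU usage when the data is already in memory.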
47. Monitoring 101: Alerting
https://www.datadoghq.com/blog/monitoring-101-alerting/
Monitoring 101: Collecting the Right Data
https://www.datadoghq.com/blog/monitoring-101-collecting-data/
Monitoring 101: Investigating performance issues
https://www.datadoghq.com/blog/monitoring-101-investigation/
Monitoring MySQL Performance Metrics
https://www.datadoghq.com/blog/monitoring-mysql-performance-metrics/
Collecting MySQL Metrics
https://www.datadoghq.com/blog/collecting-mysql-statistics-and-metrics/