MySQL performance monitoring using Statsd and Graphite (PLUK2013)
Note: this is a placeholder for the presentation next Tuesday at Percona Live London.
This document provides an introduction and overview of StatsD, including:
- A brief history of StatsD, originally conceived at Flickr and later implemented and popularized by Etsy.
- An overview of the StatsD architecture which involves sending metrics from applications over UDP to the StatsD server, which then sends the data to Carbon over TCP.
- An explanation of the different metric types StatsD supports - counters, gauges, sets, and timings - and examples of common use cases.
- Instructions for installing and running a StatsD server as well as examples of using StatsD clients in Node.js and Java applications.
Initially presented at OpenWest 2014 conference.
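The four metric types listed above share one tiny wire format, which is what makes StatsD clients so cheap to write. Below is a minimal sketch of such a client in Python; the helper names are invented, and 8125 is StatsD's conventional default port.

```python
import socket

# StatsD's wire format is "<bucket>:<value>|<type>[|@<sample_rate>]",
# sent as a single UDP datagram so a down collector never blocks the app.
def statsd_packet(bucket, value, metric_type, sample_rate=None):
    packet = f"{bucket}:{value}|{metric_type}"
    if sample_rate is not None:
        packet += f"|@{sample_rate}"
    return packet

def send(packet, host="127.0.0.1", port=8125):
    # Fire-and-forget: UDP offers no delivery guarantee, by design.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(packet.encode("ascii"), (host, port))
    sock.close()

# The four metric types from the overview:
send(statsd_packet("logins", 1, "c"))            # counter
send(statsd_packet("queue.depth", 42, "g"))      # gauge
send(statsd_packet("user.ids", 1337, "s"))       # set (unique values)
send(statsd_packet("db.query_time", 12, "ms"))   # timing, milliseconds
```

Because the packets are plain text, any language with a UDP socket can emit them, which is why client libraries exist for Node.js, Java, and most other ecosystems.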
Graphite and StatsD gather time series data and offer a robust set of APIs to access that data. While the tools are robust, the dashboards are straight from 1992 and alerting off the data is nonexistent. Nark, an open-source project, solves both of these problems. It provides easy-to-use dashboards and readily available alerts and notifications to users. It has been used in production at Lucid Software for almost a year. Related to Nark are the tools required to make Graphite highly available.
(CMP310) Data Processing Pipelines Using Containers & Spot Instances (Amazon Web Services)
It's difficult to find off-the-shelf, open-source solutions for creating lean, simple, and language-agnostic data-processing pipelines for machine learning (ML). This session shows you how to use Amazon S3, Docker, Amazon EC2, Auto Scaling, and a number of open source libraries as cornerstones to build one. We also share our experience creating elastically scalable and robust ML infrastructure leveraging the Spot instance market.
ClickHouse Paris Meetup. ClickHouse Analytical DBMS, Introduction. By Alexand... (Altinity Ltd)
The document summarizes a ClickHouse meetup that took place on October 2, 2018 in Paris. It includes an agenda with presentations on ClickHouse and case studies from ContentSquare, Storetail, and Pragma Analytics. The document also provides an introduction to ClickHouse including that it is an open source, real-time, column-oriented database management system developed by Yandex in 2012-2015. It highlights ClickHouse's speed, flexibility, and cost advantages compared to other analytical database systems.
This presentation is an attempt to demystify the practice of building reliable data processing pipelines. We go through the necessary pieces needed to build a stable processing platform: data ingestion, processing engines, workflow management, schemas, and pipeline development processes. The presentation also includes component choice considerations and recommendations, as well as best practices and pitfalls to avoid, most learned through expensive mistakes.
How to measure everything - a million metrics per second with minimal develop... (Jos Boumans)
Krux is an infrastructure provider for many of the websites you use online today, like NYTimes.com, WSJ.com, Wikia and NBCU. For every request on those properties, Krux will get one or more as well. We grew from zero traffic to several billion requests per day in the span of 2 years, and we did so exclusively in AWS. To make the right decisions in such a volatile environment, we knew that data is everything; without it, you can't possibly make informed decisions. However, collecting it efficiently, at scale, at minimal cost and without burdening developers is a tremendous challenge.
Join me in this session to learn how we overcame this challenge at Krux; I will share with you the details of how we set up our global infrastructure, entirely managed by Puppet, to capture over a million data points every second on virtually every part of the system, including inside the web server, user apps and Puppet itself, for under $2000/month using off-the-shelf open source software and some code we've released as open source ourselves. In addition, I'll show you how you can take (a subset of) these metrics and send them to advanced analytics and alerting tools like Circonus or Zabbix.
This content will be applicable for anyone collecting, or desiring to collect, vast amounts of metrics in a cloud or datacenter setting and making sense of them.
Apache Gearpump is a lightweight, real-time streaming engine that was conceived at Intel in 2014 and became an Apache incubating project in 2016. It uses a message-driven architecture based on Akka actors to provide out-of-order processing, exactly-once semantics, flow control, and fault tolerance. Gearpump supports dynamic DAGs, local and distributed deployments, and compatibility with Storm APIs.
Vladislav Supalov introduces data pipeline architecture and workflow engines like Luigi. He discusses how custom scripts are problematic for maintaining data pipelines and recommends using workflow engines instead. Luigi is presented as a Python-based workflow engine that was created at Spotify to manage thousands of daily Hadoop jobs. It provides features like parameterization, email alerts, dependency resolution, and task scheduling through a central scheduler. Luigi aims to minimize boilerplate code and make pipelines testable, versioning-friendly, and collaborative.
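The scheduling core of an engine like Luigi, resolving declared dependencies into a safe execution order, can be sketched with nothing but the standard library; the task names below are made up for illustration.

```python
from graphlib import TopologicalSorter

# A Luigi-style pipeline as a dependency graph: each task lists the
# tasks whose output it requires, and the scheduler derives the order.
pipeline = {
    "extract_logs": set(),
    "sessionize": {"extract_logs"},
    "train_model": {"sessionize"},
    "report": {"sessionize", "train_model"},
}

def run_order(graph):
    # static_order() raises CycleError on circular dependencies,
    # which is exactly the check hand-rolled cron scripts lack.
    return list(TopologicalSorter(graph).static_order())

print(run_order(pipeline))
```

Real workflow engines layer parameterization, retries, and a central scheduler on top of this, but the dependency resolution itself is just a topological sort like the one shown.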
This document discusses using ClickHouse for experimentation and metrics at Spotify. It describes how Spotify built an experimentation platform using ClickHouse to provide teams interactive queries on granular metrics data with low latency. Key aspects include ingesting data from Google Cloud Storage to ClickHouse daily, defining metrics through a centralized catalog, and visualizing metrics and running queries using Superset connected to ClickHouse. The platform aims to reduce load on notebooks and BigQuery by serving common queries directly from ClickHouse.
This case study describes an approach to building quasi real-time OLAP cubes in Microsoft SQL Server Analysis Services to enable daily comparisons of production forecasts and outcomes. The cubes are partitioned by time to allow independent and frequent updates. Initial attempts failed due to deadlocks from simultaneous partition updates. The working solution takes advantage of a 6 day work week by switching partition dates on Saturdays only and reprocessing partitions then. This allows real-time and historical partition updates without gaps or overlaps in the data.
Care and Feeding of Large Scale Graphite Installations - DevOpsDays Austin 2013 (Nick Galbreath)
This document discusses the care and feeding of large scale Graphite installations. It begins with introductions and then discusses Graphite components like carbon-cache, carbon-aggregator, carbon-relay and StatsD. It covers Graphite storage, installation, documentation, middleware, backups, monitoring and the web UI. It provides tips on tuning, debugging and visualizing metrics in Graphite.
This document discusses InfluxDB, an open-source time series database. It stores time stamped numeric data in structures called time series. The document provides an overview of time series data, describes how to install and use InfluxDB, and discusses features like its HTTP API, client libraries, Grafana integration for visualization, and benchmark results showing it has better performance for time series data than other databases.
InfluxDB IOx Tech Talks: A Rusty Introduction to Apache Arrow and How it App... (InfluxData)
InfluxDB IOx Tech Talks - December 2020
A Rusty Introduction to Apache Arrow and How it Applies to a Time Series Database
This session will start with a tech talk from an InfluxDB IOx team member. This is your chance to interact directly with Influxers who are available to answer your questions about all things InfluxDB IOx and time series — including Paul Dix, Founder and CTO of InfluxData. This event will last about an hour and there will be time for live Q&A.
Building a reliable pipeline of data ingress, batch computation, and data egress with Hadoop can be a major challenge. Most folks start out with cron to manage workflows, but soon discover that doesn't scale past a handful of jobs. There are a number of open-source workflow engines with support for Hadoop, including Azkaban (from LinkedIn), Luigi (from Spotify), and Apache Oozie. Having deployed all three of these systems in production, Joe will talk about what features and qualities are important for a workflow system.
A primer on building real time data-driven productsLars Albertsson
This document provides an overview of building real-time data products using stream processing. It discusses why stream processing is useful for providing low-latency reactions to data from 1 second to 1 hour. Key aspects covered include using a unified log to decouple producers and consumers, common stream processing building blocks like filtering and joining, and technologies like Spark Streaming, Kafka Streams, and Flink. The document also addresses challenges like out-of-order events and software bugs, and architectural patterns for handling imperfections in streams.
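Two of the building blocks mentioned above, filtering and keyed joining, can be sketched over plain Python iterables. The function names and the unbounded buffering are illustrative; real stream engines bound this state with windows or TTLs.

```python
from collections import defaultdict

# Stateless building block: drop events that fail a predicate.
def stream_filter(events, predicate):
    for event in events:
        if predicate(event):
            yield event

# Stateful building block: join two streams on a key, emitting a pair
# once both sides for that key have arrived.
def keyed_join(left, right, key):
    waiting = defaultdict(lambda: [None, None])
    for side, event in [(0, e) for e in left] + [(1, e) for e in right]:
        slot = waiting[key(event)]
        slot[side] = event
        if None not in slot:
            yield tuple(slot)

clicks = [{"user": 1, "page": "/a"}, {"user": 2, "page": "/b"}]
buys = [{"user": 2, "amount": 10}]
print(list(keyed_join(clicks, buys, key=lambda e: e["user"])))
# one joined pair, for user 2 only
```

Out-of-order events are exactly why the buffering above cannot stay this naive in production: an event may arrive long after its join partner, so engines attach watermarks and windows to decide when state can be dropped.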
Graphite is a Python-based monitoring tool with a flexible data storage and retrieval system. It stores numeric time-series data in Whisper files with configurable precision. Data can be inserted and retrieved via its web interface or API. While not as full-featured as some competitors out of the box, it provides powerful aggregation, calculation, and visualization of metrics across many servers through its storage and processing architecture.
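Part of what makes inserting data into Graphite so approachable is Carbon's plaintext protocol: one metric per line over TCP. The sketch below assumes the conventional plaintext listener port (2003) and uses a placeholder hostname.

```python
import socket
import time

# Carbon plaintext protocol: "<dotted.path> <value> <unix_timestamp>\n"
def graphite_line(path, value, timestamp=None):
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {timestamp}\n"

def send_to_carbon(lines, host="graphite.example.com", port=2003):
    # One TCP connection can carry many metric lines at once.
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("ascii"))

line = graphite_line("servers.db1.mysql.threads_running", 7, 1380000000)
```

The dotted path doubles as the metric's location in Whisper's on-disk hierarchy, which is how Graphite groups and aggregates metrics across many servers.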
This document discusses techniques for scalable real-time processing and counting of streaming data. It outlines several approaches for counting distinct items and top items in a stream in real-time, including using hashes, bitmaps, Bloom filters, HyperLogLog counters, and Count-Min sketches. It also discusses using these techniques to power features like recommendations by analyzing item co-occurrence matrices from user activity streams.
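As one concrete instance of those sketches, a minimal Bloom filter (no false negatives, tunable false-positive rate) fits in a few lines; the sizes `m` and `k` below are illustrative, not tuned.

```python
import hashlib

class BloomFilter:
    """Space-bounded set membership: members are always found;
    non-members are occasionally reported present (false positives)."""

    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, item):
        # Derive k independent bit positions from salted hashes.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all(self.bits >> pos & 1 for pos in self._positions(item))

bf = BloomFilter()
for user in ["alice", "bob", "carol"]:
    bf.add(user)
print("bob" in bf)  # True: no false negatives, ever
```

HyperLogLog and Count-Min sketches follow the same spirit, trading exactness for constant memory, which is what makes them viable on unbounded streams.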
Spark Summit EU talk by Miha Pelko and Til Piffl (Spark Summit)
1) NorCom is an IT company that provides consulting services for big data and information management, with customers in automotive, public, media, and finance sectors.
2) Time series analysis of sensor data from vehicles is important for automotive R&D, but requires parallel processing due to the large volumes of data.
3) NorCom developed a Spark API called DaSense to simplify working with multi-sensor time series data at scale, and explored parallelizing state machine analysis on vehicle data using Spark.
Airflow - An Open Source Platform to Author and Monitor Data Pipelines (DataWorks Summit)
Airflow is an open source platform for authoring and monitoring data pipelines. It was developed at Airbnb to address challenges like opaque data lineage, steep learning curves as ecosystems grow, duplicated code, and scattered operational metadata. Airflow uses a Python-based DAG (directed acyclic graph) definition to programmatically author pipelines. It has a rich CLI and web UI and uses technologies like Python, Celery, Flask, SQLAlchemy, and Jinja. Operators allow running tasks like SQL queries, transfers, and sensors. Airflow has been scaled to process thousands of tasks daily across many teams and companies.
InfluxDB is an open source time series database that is written in Go. It is designed for storing large amounts of time series data and providing rapid query results. Data is stored in measurements, which contain tags, fields, and a timestamp. Queries use a SQL-like language to retrieve and aggregate time series data. Continuous queries allow data to be resampled and written to a different measurement on a periodic basis.
Building Better Data Pipelines using Apache Airflow (Sid Anand)
Apache Airflow is a platform for authoring, scheduling, and monitoring workflows or directed acyclic graphs (DAGs). It allows users to programmatically author DAGs in Python without needing to bundle many XML files. The UI provides a tree view to see DAG runs over time and Gantt charts to see performance trends. Airflow is useful for ETL pipelines, machine learning workflows, and general job scheduling. It handles task dependencies and failures, monitors performance, and enforces service level agreements. Behind the scenes, the scheduler distributes tasks from the metadata database to Celery workers via RabbitMQ.
This document compares the graph processing frameworks Dato and Spark GraphX. It details experiments run on four datasets from the Stanford Network Collection to test the frameworks' performance on triangle counting, PageRank, and connected components algorithms. The results show that Dato has faster execution times than GraphX for processing large graphs, though GraphX is free while Dato charges fees. Further experiments could compare the frameworks' abilities to combine graph and non-graph computations.
This document discusses metrics and the Graphite monitoring system. It describes the main components of Graphite including Carbon, which persists metrics to disk and supports replication and sharding. It also describes Whisper for data storage and aggregation, and the web interface for rendering graphs. The document provides an overview of how these components work together and tips for optimizing performance such as aggregating metrics before ingestion and controlling the metrics that can be sent. It also briefly mentions alternative time-series databases like InfluxDB and Cassandra that could be used in the future.
InfluxDB is an open source time series database written in Go that stores metric data and performs real-time analytics. It has no external dependencies. InfluxDB stores data as time series with measurements, tags, and fields. Data is written using a line protocol and can be visualized using Grafana, an open source metrics dashboard.
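A hedged sketch of building one line-protocol string follows; the measurement, tag, and field names are invented, and production clients additionally escape special characters and mark integer fields explicitly.

```python
# InfluxDB line protocol shape: measurement,tag=value field=value timestamp
# Tags are indexed metadata; fields carry the data; timestamp is optional.
def influx_line(measurement, tags, fields, timestamp=None):
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(
        f'{k}="{v}"' if isinstance(v, str) else f"{k}={v}"
        for k, v in sorted(fields.items())
    )
    line = f"{measurement}{tag_part} {field_part}"
    if timestamp is not None:
        line += f" {timestamp}"
    return line

influx_line("cpu", {"host": "db1", "region": "eu"},
            {"usage_idle": 93.2}, 1434055562000000000)
```

Because tags are indexed and fields are not, which keys go where determines query performance, the same design tension Graphite resolves with its dotted metric paths.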
Quick introduction about Apache Spark and how it fits in the cognitive world, how can we use it to help cognitive solutions as well as create distributed algorithms to predict and perform other machine learning tasks.
This document describes Artmosphere, a data analytics platform for artwork data. It contains information on Artmosphere's data sources, which include 26,000 artworks and 45,000 artists from Artsy.net. It also describes Artmosphere's data pipeline, cluster setup, and how it performs tasks like searching artwork by title using Spark and Elasticsearch and tracking trends in real-time using Spark Streaming and Cassandra. The document discusses challenges faced and provides background on the author.
This document discusses monitoring MySQL performance using StatsD and Graphite. It provides an overview of the tools and how they are used. StatsD collects metrics from applications and services and sends them to Graphite for storage and visualization. The document describes how a custom MySQL StatsD daemon was created to gather MySQL metrics and send them to StatsD in real-time for high granularity monitoring and graphing in Graphite.
MySQL performance monitoring using Statsd and Graphite (DB-Art)
This session will explain how you can leverage the MySQL-StatsD collector, StatsD and Graphite to monitor your database performance with metrics sent every second. In the past few years Graphite has become the de facto standard for monitoring large and scalable infrastructures.
This session will cover the architecture, functional basics and dashboard creation using Grafana. MySQL-StatsD is really easy to set up and configure. It will allow you to fetch your most important metrics from MySQL, run your own custom queries against your production data and, if necessary, transform this data into something that can be used as a metric. Having this data at a fine granularity allows you to correlate your production data and system metrics with your MySQL performance metrics.
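The fetch-and-transform step can be sketched as follows. The samples are hard-coded stand-ins for what a collector would read from MySQL's `SHOW GLOBAL STATUS`, and emitting the rates as StatsD gauges is one possible encoding, not necessarily the one MySQL-StatsD uses.

```python
# MySQL exposes most counters as ever-increasing totals; to graph them
# per second we diff successive samples taken at a known interval.
def per_second_rates(previous, current, interval_seconds=1):
    rates = {}
    for name, value in current.items():
        if name in previous:
            rates[name] = (value - previous[name]) / interval_seconds
    return rates

def to_statsd(rates, prefix="mysql"):
    # Encode each rate as a StatsD gauge line: "mysql.<counter>:<rate>|g"
    return [f"{prefix}.{k}:{v}|g" for k, v in sorted(rates.items())]

sample_t0 = {"Com_select": 500, "Questions": 1000}
sample_t1 = {"Com_select": 800, "Questions": 1450}
print(to_statsd(per_second_rates(sample_t0, sample_t1)))
# ['mysql.Com_select:300.0|g', 'mysql.Questions:450.0|g']
```

Polling every second and diffing like this is what gives the one-second granularity the session describes, without MySQL itself having to keep any rate state.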
MySQL-StatsD is a daemon written in Python that was created during one of the hackdays at my previous employer (Spil Games) to solve the problem of fetching data from MySQL with a lightweight client and sending metrics to StatsD. I currently maintain this open-source project on GitHub; as the creator of the project, it is my duty to look after it.
On June 11 Thomas Dinsmore gave a nice outline of the tools and technologies out there for handling analytics in Hadoop. It is a must watch for anyone looking into what advanced analytics Hadoop could deliver.
Please find video and slides below.
Synopsis
What is the state of play for advanced analytics in Hadoop? A year ago, options included "roll your own" and little else; today there are a number of serious open source and commercial options available, with new capabilities announced daily.
In this presentation, we begin with a brief overview of use cases for advanced analytics and a discussion of what types of analytics must run in Hadoop. We continue with an overview of available architectures. The presentation concludes with a hype-free survey of available open source and commercial software for advanced analytics in Hadoop.
Bio
Thomas W. Dinsmore is Director of Product Management for Revolution Analytics, a company that provides commercial support and services for open source R. In this role, Mr. Dinsmore closely tracks the market for commercial and open source software on all platforms, including Hadoop. Prior to joining Revolution Analytics, Mr. Dinsmore served as an Analytics Solution Architect for IBM Big Data, and as a Principal Consultant for Razorfish and SAS.
Mr. Dinsmore has hands-on experience with leading commercial and open source tools for advanced analytics, including SAS, SPSS, R, Oracle Data Mining across a range of platforms, including Hadoop, Netezza, Teradata and Oracle. He is certified in SAS 9.
In his career, Mr. Dinsmore has worked with more than 500 enterprises in the United States, Canada, Mexico, Venezuela, Chile, Brazil, the United Kingdom, Belgium, Italy, Turkey, Israel, Malaysia and Singapore.
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Databricks
Spark SQL is a highly scalable and efficient relational processing engine with easy-to-use APIs and mid-query fault tolerance. It is a core module of Apache Spark. Spark SQL can process, integrate and analyze data from diverse data sources (e.g., Hive, Cassandra, Kafka and Oracle) and file formats (e.g., Parquet, ORC, CSV, and JSON). This talk will dive into the technical details of Spark SQL, spanning the entire lifecycle of a query execution. The audience will get a deeper understanding of Spark SQL and learn how to tune its performance.
Gluent New World #02 - SQL-on-Hadoop : A bit of History, Current State-of-the...Mark Rittman
Hadoop and NoSQL platforms initially focused on Java developers and slow but massively-scalable MapReduce jobs as an alternative to high-end but limited-scale analytics RDBMS engines. Apache Hive opened-up Hadoop to non-programmers by adding a SQL query engine and relational-style metadata layered over raw HDFS storage, and since then open-source initiatives such as Hive Stinger, Cloudera Impala and Apache Drill along with proprietary solutions from closed-source vendors have extended SQL-on-Hadoop’s capabilities into areas such as low-latency ad-hoc queries, ACID-compliant transactions and schema-less data discovery – at massive scale and with compelling economics.
In this session we’ll focus on technical foundations around SQL-on-Hadoop, first reviewing the basic platform Apache Hive provides and then looking in more detail at how ad-hoc querying, ACID-compliant transactions and data discovery engines work along with more specialised underlying storage that each now work best with – and we’ll take a look to the future to see how SQL querying, data integration and analytics are likely to come together in the next five years to make Hadoop the default platform running mixed old-world/new-world analytics workloads.
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangDatabricks
As a general computing engine, Spark can process data from various data management/storage systems, including HDFS, Hive, Cassandra and Kafka. For flexibility and high throughput, Spark defines the Data Source API, which is an abstraction of the storage layer. The Data Source API has two requirements.
1) Generality: support reading/writing most data management/storage systems.
2) Flexibility: customize and optimize the read and write paths for different systems based on their capabilities.
Data Source API V2 is one of the most important features coming with Spark 2.3. This talk will dive into the design and implementation of Data Source API V2, with comparison to the Data Source API V1. We also demonstrate how to implement a file-based data source using the Data Source API V2 for showing its generality and flexibility.
ClickHouse Analytical DBMS. Introduction and usage, by Alexander ZaitsevAltinity Ltd
The document outlines an agenda for a ClickHouse event, including presentations on ClickHouse features and usage. ClickHouse is introduced as a fast, flexible, and free open source columnar database management system. It is described as being very fast for analytical queries, with the ability to ingest and aggregate data at low latency and perform distributed computations and fault-tolerant data warehousing at large scale. Successful production deployments are listed in domains like analytics, advertising, operations, and security. The growth and adoption of ClickHouse is discussed.
Pivotal OSS meetup - MADlib and PivotalRgo-pivotal
With the explosion of big data, the need for fast and inexpensive analytics solutions has become a key basis of competition in many industries. Extracting the value of big data with analytics can be complex, and requires advanced skills.
At Pivotal, we are building open-source solutions (MADlib, PivotalR, PyMadlib) to simplify this process for the user, while maintaining the efficiency necessary for big data analysis.
This talk will provide information about MADlib, an open source library of SQL-based algorithms for machine learning, data mining and statistics that run at large scale within a database engine, with no need for data import/export to other tools.
It provides an overview of the library’s architecture and compares various statistical methods with those available in Apache Mahout.
We also introduce PivotalR, an R-based wrapper for MADlib that gives data scientists and programmers access to the power of MADlib along with the ease of use of R.
This document summarizes and compares several open source monitoring tools: Nagios, Graphite, StatsD, Logstash, and Sensu. Nagios is introduced as a commonly used tool that some love and some find frustrating. Graphite is described as a tool for storing and graphing time-series data. StatsD aggregates counters and timers and sends them to backend services like Graphite. Logstash is a tool for managing logs and events that can input, filter, and output data. Sensu is a monitoring router that connects check scripts to handler scripts to alert on or process monitoring data. Examples are given for each tool, along with what types of metrics to collect.
This document summarizes machine learning concepts in Spark. It introduces Spark, its components including SparkContext, Resilient Distributed Datasets (RDDs), and common transformations and actions. Transformations like map, filter, join, and groupByKey are covered. Actions like collect, count, reduce are also discussed. A word count example in Spark using transformations and actions is provided to illustrate how to analyze text data in Spark.
Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...Wes McKinney
This document discusses pandas, a popular Python library for data analysis, and its limitations. It introduces Badger, a new project from DataPad that aims to address some of pandas' shortcomings like slow performance on large datasets and lack of tight database integration. The creator describes Badger as using compressed columnar storage, immutable data structures, and C kernels to perform analytics queries much faster than pandas or databases on benchmark tests of a multi-million row dataset. He envisions Badger becoming a distributed, multicore analytics platform that can also be used for ETL jobs.
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...Codemotion
Once you start working with Big Data systems, you discover a whole bunch of problems you won’t find in monolithic systems. Monitoring all of the components becomes a big data problem itself. In the talk, we’ll mention all of the aspects that you should take into consideration when monitoring a distributed system using tools like Web Services, Spark, Cassandra, MongoDB, and AWS. Beyond the tools, what should you monitor about the actual data that flows in the system? We’ll cover the simplest solution with your day-to-day open source tools; the surprising thing is that it comes not from an Ops guy.
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...Demi Ben-Ari
Once you start working with distributed Big Data systems, you start discovering a whole bunch of problems you won’t find in monolithic systems.
All of a sudden, monitoring all of the components becomes a big data problem in itself.
In the talk we’ll mention all of the aspects that you should take into consideration when monitoring a distributed system once you’re using tools like:
Web Services, Apache Spark, Cassandra, MongoDB, Amazon Web Services.
Beyond the tools, what should you monitor about the actual data that flows in the system?
We’ll cover the simplest solution with your day-to-day open source tools; the surprising thing is that it comes not from an Ops guy.
This document provides lessons learned from building the Dutch public broadcasting company's website omroep.nl. Key points include using Ruby on Rails, BDD with RSpec and Cucumber, caching everything possible, rescuing errors, testing extensively, and handling large amounts of external data from various XML/RSS feeds and APIs. Performance was optimized through techniques like moving static assets to a front proxy, page caching, fragment caching, and using Memcache. The team of 6 people built the CMS from scratch over 6 months.
This document summarizes lessons learned from building the Dutch public broadcasting company's website omroep.nl. Key points include:
- The site was built using Ruby on Rails with 6 developers over 6 months to handle 30,000-40,000 daily pageviews and traffic spikes.
- Extensive testing was done including over 2,000 RSpec tests and 410 Cucumber scenarios to help ensure quality.
- Caching was heavily used to improve performance including caching pages, fragments, and external data from feeds.
- Resilience was important given the large amounts of external data from various sources, and errors were rescued and logged.
- Ongoing monitoring and optimization was needed to
Running Airflow Workflows as ETL Processes on Hadoopclairvoyantllc
While working with Hadoop, you'll eventually encounter the need to schedule and run workflows to perform various operations like ingesting data or performing ETL. There are a number of tools available to assist you with this type of requirement and one such tool that we at Clairvoyant have been looking to use is Apache Airflow. Apache Airflow is an Apache Incubator project that allows you to programmatically create workflows through a python script. This provides a flexible and effective way to design your workflows with little code and setup. In this talk, we will discuss Apache Airflow and how we at Clairvoyant have utilized it for ETL pipelines on Hadoop.
Christian Coté is an ETL architect and developer with experience using tools like DTS/SSIS, Hummungbird Genio, Informatica, and Datastage. He has worked in domains including pharmaceuticals, finance, insurance, and manufacturing. He specializes in data warehousing and business intelligence and is a Microsoft MVP for SQL Server.
Christian Coté is an ETL architect and developer with experience using tools like DTS/SSIS, Hummungbird Genio, Informatica, and Datastage. He has worked in domains including pharmaceuticals, finance, insurance, and manufacturing. He specializes in data warehousing and business intelligence and is a Microsoft MVP for SQL Server. He is also the co-leader of the Montreal SQL Pass chapter.
Similar to MySQL performance monitoring using Statsd and Graphite (PLUK2013) (20)
Percona Live London 2014: Serve out any page with an HA Sphinx environmentspil-engineering
Sphinx is a full-text search engine that Spil Games uses to provide fast and complex search across their databases and indexes. Some key ways Spil Games uses Sphinx include searching for games by title or URL, finding friends across their networks, and filtering search results based on browser capabilities. To ensure high availability, Spil Games implements distributed and mirrored Sphinx indexes across multiple nodes and uses load balancers. Benchmarking shows Sphinx significantly outperforms MySQL for certain search queries.
This document discusses Spil Games' use of Galera Replicator to improve the high availability and scalability of their MySQL databases. Some key points:
- Spil Games is migrating legacy master-master MySQL databases to Galera for synchronous replication and high availability.
- Their SSP (Storage Platform) is being moved from asynchronous master-master to Galera to avoid single points of failure.
- Challenges included backup procedures, flow control during replication, and schema changes during upgrades.
- Future plans include using Galera with OpenStack for database-as-a-service and exploring WAN replication.
Retaining globally distributed high availabilityspil-engineering
This document summarizes a presentation about how Spil Games achieves high availability for their globally distributed databases. It discusses using master-slave replication, multi-master replication, database clustering, and geographic redundancy. It also covers scaling out through horizontal partitioning and federated partitioning. The key points are abstracting the storage layer using a platform built with Erlang, utilizing MySQL and other databases, and sharding data using a bucket model.
Outgrowing an internet startup: database administration in a fast growing com...spil-engineering
Spil Games is a social gaming company that was founded in 2001 and has grown to 350 employees with 170 million monthly active users. As the company has grown from a startup to a large organization, its database infrastructure has struggled to scale effectively. The presentation outlines Spil Games' journey to professionalize its database engineering practices and implement a new "Spil Storage Platform" to support continued growth through a global ID and bucket-based data model with functional and geographic sharding. This will allow horizontal scaling through a master-master configuration across multiple database clusters.
This document provides an overview of a disco workshop on parallel computing and MapReduce. The workshop covers an introduction to parallel computing including algorithms, programming models, and applications. It then introduces MapReduce, covering its history, examples, and execution overview. The workshop teaches how to write MapReduce jobs with Disco and includes an example of CDN log processing. It aims to provide attendees with the skills needed to get started with Disco for large-scale data processing.
This document discusses total cost of ownership (TCO) for database infrastructure. It begins by introducing Spil Games and defining TCO. The main costs that drive TCO are then outlined, including capital expenses (CAPEX) like hardware purchases, licensing fees, and high availability solutions, as well as operating expenses (OPEX) such as hosting, support, and power. Specific examples of these costs are provided for a cluster of 6 database nodes over 5 years. The document concludes by discussing potential improvements like moving from hard disk drives to solid state drives to reduce costs over the long term.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
Building RAG with self-deployed Milvus vector database and Snowpark Container...Zilliz
This talk will give hands-on advice on building RAG applications with an open-source Milvus database deployed as a docker container. We will also introduce the integration of Milvus with Snowpark Container Services.
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides an introduction to UiPath Communication Mining, its importance, and a platform overview. You will acquire a good understanding of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIVladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
A tale of scale & speed: How the US Navy is enabling software delivery from l...sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...Zilliz
Join us to introduce Milvus Lite, a vector database that can run on notebooks and laptops, share the same API with Milvus, and integrate with every popular GenAI framework. This webinar is perfect for developers seeking easy-to-use, well-integrated vector databases for their GenAI apps.
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on:
4. Facts
• Company founded in 2001
• 350+ employees worldwide
• 180M+ unique visitors per month
• Over 60M registered users
• 45 portals in 19 languages
• Casual games
• Social games
• Real-time multiplayer games
• Mobile games
• 35+ MySQL clusters
• 60k queries per second (3.5 billion qpd)
8. Existing monitoring systems we use(d)
• Opsview/Nagios (mainly availability)
• Cacti (using Baron Schwartz/Percona templates)
• MONYog
• Good ol’ RRD
9. Challenges
• Problems with existing systems
• Stats gathering through polling
• Data gets averaged out
• (Host) checks are run serially
• A slowdown in a run means less or no data
• Setting up an SSH connection is slow
• Low granularity (1 to 5 minutes)
• Hardly scalable
• Difficult to correlate metrics
10. Difficult to add a new metric
host065
bash-3.2# netstat -s | grep "listen queue"
26 times the listen queue of a socket overflowed
host066
bash-3.2# netstat -s | grep "listen queue"
33 times the listen queue of a socket overflowed
12. What is Collectd?
• Unix daemon that gathers system statistics
• Over 90 (input/output) plugins
• Plugin to send metrics to Graphite/Carbon
• Very useful for system metrics
14. What is StatsD?
• Front-end proxy for Graphite/Carbon (by Etsy)
• NodeJS daemon (implementations also exist in other languages)
• Receives metrics over UDP (on localhost)
• Buffers metrics locally
• Periodically flushes data to Graphite/Carbon (TCP)
• Client libraries available in about any language
• Send any metric you like!
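A StatsD packet is just `name:value|type` in a UDP datagram, which is why client libraries exist for about any language. Below is a minimal Python sketch of such a client; the metric names are illustrative and 8125 is the default StatsD port:

```python
import socket

def send_stat(name, value, metric_type, host="localhost", port=8125):
    """Send one metric to StatsD over UDP: fire-and-forget, non-blocking.

    metric_type is "c" (counter), "g" (gauge), "ms" (timing) or "s" (set).
    """
    packet = "%s:%s|%s" % (name, value, metric_type)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # No connection setup; if nothing is listening, the datagram is simply lost
        sock.sendto(packet.encode("utf-8"), (host, port))
    finally:
        sock.close()
    return packet

# Illustrative metric names:
send_stat("prd.mysql.db01.status.connections", 1, "c")       # counter
send_stat("prd.mysql.db01.status.threads_running", 12, "g")  # gauge
```

Because UDP never blocks on the receiver, the sending application pays almost nothing even when the StatsD daemon is down.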
18. What is Graphite?
• Highly scalable real-time graphing system
• Collects numeric time-series
• Backend daemon Carbon
• Carbon-cache: receives data
• Carbon-aggregator: aggregates data
• Carbon-relay: replication and sharding
• RRD or Whisper database
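When StatsD flushes to carbon-cache it uses Graphite's plaintext protocol: one `metric value timestamp` line per datapoint over TCP (port 2003 by default). A small sketch; the sending half assumes a reachable carbon-cache:

```python
import socket
import time

def carbon_lines(metrics, timestamp=None):
    """Format datapoints as Graphite plaintext: 'metric value timestamp' lines."""
    ts = int(timestamp if timestamp is not None else time.time())
    return "".join("%s %s %d\n" % (name, value, ts)
                   for name, value in sorted(metrics.items()))

def send_to_carbon(metrics, host="localhost", port=2003):
    # One TCP connection per flush; carbon-cache persists the points to Whisper
    with socket.create_connection((host, port), timeout=2) as sock:
        sock.sendall(carbon_lines(metrics).encode("utf-8"))
```

For example, `carbon_lines({"prd.mysql.db01.status.connections": 42}, timestamp=1380000000)` yields the line `prd.mysql.db01.status.connections 42 1380000000`.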
19. Graphite’s capabilities
• Each metric is in its own bucket
• Periods make folders
• prod.syseng.mmm.<hostname>.admin_offline
• Metric types
• Counters
• Gauges
• Retention can be set using a regex
• [mysql]
• pattern = ^prod.syseng.mysql..*$
• retentions = 2s:1d,1m:3d,5m:7d,1h:5y
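A retention line like `2s:1d,1m:3d,5m:7d,1h:5y` packs several precision:duration archive pairs; Whisper expands the s/m/h/d/y suffixes into seconds. A simplified sketch of that expansion (not Whisper's actual parser, which also accepts plain integers):

```python
# Seconds per unit suffix, mirroring Whisper's retention notation
UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}

def parse_retention(spec):
    """Expand '2s:1d,1m:3d,...' into (precision_seconds, duration_seconds) pairs."""
    def seconds(token):
        return int(token[:-1]) * UNITS[token[-1]]
    return [tuple(seconds(part) for part in archive.split(":"))
            for archive in spec.split(",")]

parse_retention("2s:1d,1m:3d,5m:7d,1h:5y")
# -> [(2, 86400), (60, 259200), (300, 604800), (3600, 157680000)]
```

So the example above keeps 2-second resolution for a day, then progressively coarser archives out to five years.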
24. Why use StatsD over Collectd?
• MySQL plugin for Collectd
• Sends SHOW STATUS
• No INNODB STATUS
• Plugin not flexible
• DBI plugin for Collectd
• Metrics based on columns
• Different granularity needed
• Separate daemon (with persistent connection)
• StatsD is easy as ABC
25. MySQL StatsD daemon
• Written in Python
• Rewritten and open sourced during a hackday
• Gathers data every 0.5 seconds
• Sends to StatsD (localhost) after every run
• Easy configuration
• Persistent connection
• Baron Schwartz’s InnoDB status parser (Cacti poller)
• Other interesting metrics and counters
• Information Schema
• Performance Schema
• MariaDB specific
• Galera specific
• If you can query it, you can use it as a metric!
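The gather-and-flush cycle above boils down to: query MySQL, turn each (name, value) row into a StatsD packet, push it over UDP, sleep, repeat. A condensed sketch of that loop, not the actual mysql-statsd code (the real daemon keeps a persistent MySQL connection and runs the InnoDB status parser over SHOW ENGINE INNODB STATUS):

```python
import socket
import time

def rows_to_packets(rows, prefix="prd.mysql", hostname="db01", metric_type="g"):
    """Turn SHOW GLOBAL STATUS style (name, value) rows into StatsD packets."""
    return ["%s.%s.status.%s:%s|%s" % (prefix, hostname, name.lower(), value, metric_type)
            for name, value in rows]

def run_forever(fetch_rows, addr=("localhost", 8125), sleep_interval=0.5):
    """fetch_rows is a callable that queries MySQL, e.g. SHOW GLOBAL STATUS."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while True:
        for packet in rows_to_packets(fetch_rows()):
            sock.sendto(packet.encode("utf-8"), addr)
        time.sleep(sleep_interval)  # 0.5 s between runs, as above
```

The UDP socket is created once and reused, so each half-second run costs only a handful of sendto calls.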
27. Example configuration
[daemon]
logfile = /var/log/mysql_statsd/daemon.log
pidfile = /var/run/mysql_statsd.pid
[statsd]
host = localhost
port = 8125
prefix = prd.mysql
include_hostname = true
[mysql]
host = localhost
username = mysqlstatsd
password = ub3rs3cr3tp@ss!
stats_types = status,variables,innodb,commit
query_variables = SHOW GLOBAL VARIABLES
interval_variables = 10000
query_status = SHOW GLOBAL STATUS
interval_status = 500
query_innodb = SHOW ENGINE INNODB STATUS
interval_innodb = 10000
query_commit = COMMIT
interval_commit = 5000
sleep_interval = 500
[metrics]
variables.max_connections = g
status.max_used_connections = g
status.connections = c
innodb.spin_waits = c
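The configuration is plain INI, so Python's standard library can read it; each key under [metrics] maps a metric name to the StatsD type it is sent as ("g" = gauge, "c" = counter). A small sketch using `configparser` (the embedded string just repeats part of the example above):

```python
import configparser

EXAMPLE = """\
[statsd]
host = localhost
port = 8125
prefix = prd.mysql

[metrics]
variables.max_connections = g
status.connections = c
innodb.spin_waits = c
"""

cfg = configparser.ConfigParser()
cfg.read_string(EXAMPLE)

# Each [metrics] entry decides how the daemon tags the value for StatsD:
# gauges report the current value, counters report a rate per flush interval.
metric_types = dict(cfg.items("metrics"))
```

Picking "g" versus "c" matters: max_used_connections is a high-water mark (gauge), while a monotonically increasing counter like Connections only makes sense as a rate.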
28. MySQL Multi Master patch
• Perl (Net::Statsd)
• Sends any status change to StatsD (localhost)
• Non-blocking (thanks to UDP)
• Draw as infinite in Graphite
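The "draw as infinite" trick: each state change fires a single counter increment, and in Graphite the series is wrapped in the drawAsInfinite() render function so every nonzero point becomes a full-height vertical line, which makes failovers easy to spot overlaid on any graph. A Python sketch of the sending side (the metric path is illustrative; the patch itself uses Perl's Net::Statsd):

```python
import socket

def report_state_change(metric, statsd_addr=("127.0.0.1", 8125)):
    # One counter tick per master/slave state change; UDP keeps it
    # non-blocking, so the database server never waits on monitoring.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(("%s:1|c" % metric).encode("ascii"), statsd_addr)

# Corresponding Graphite render target, e.g.:
#   drawAsInfinite(prd.mysql.db1.replication.state_change)
```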
31. What is important for you?
• Identify your KPIs
• Don’t graph everything
• More graphs == less overview
• Combine metrics
• Stack clusters
32. Correlate!
• Include other metrics into your graphs
• Deployments
• Failover(s)
• Combine application metrics with your database
• Other influences
• Launch of a new game
• Apple keynotes
33. Graphing
• Graphite Graphing Engine
• DIY
• Giraffe
• Readily available dashboards/tools
• Graph Explorer (vimeo)
• Team Dashboard
• Skyline (Etsy)
• Dashing (Shopify)
47. What challenges do we have?
• Improve MySQL-statsd (extensive issue list)
• No zoom in on graphs
• Get Skyline to work and not cry wolf
• Machine learning
• Eternal hunger for more metrics
• Abuse of the system
• Hitting limits of SSD write performance
• Virident? Fusion-IO?
• Carbon OpenTSDB Graphite-web?
48. What lessons have we learned?
• Persistent connections + REPEATABLE READ
• InnoDB history list length skyrocketed
• More hackdays are needed!
• Too many metrics slows down graphing
• Too many metrics can kill a host
• EstatsD for Erlang
51. Thank you!
• Presentation can be found at:
http://spil.com/pluk2013
• MySQL Statsd can be found at:
http://spil.com/mysqlstatsd
http://github.com/spilgames/mysql-statsd
• If you wish to contact me:
art@spilgames.com