MySQL performance monitoring using Statsd and Graphite (PLUK2013)

Note: this is a placeholder for the presentation next Tuesday at Percona Live London.

Comments:
  • Have you tried SSH ControlMaster to keep the SSH connection alive?
  • Valid point, and we could have saved a bit of performance there. However, it is only one of the reasons and it doesn't necessarily solve the others. We actually wish to step away from centralized polling for data and migrate to distributed pushing of data to our Graphite setup.


  1. MySQL Performance monitoring using Statsd and Graphite
      Art van Scheppingen, Head of Database Engineering
  2. Overview
      1. Who are we?
      2. What monitoring tools do we use?
      3. What are StatsD, Collectd and Graphite?
      4. How MySQL logs to StatsD
      5. Graphing examples
      6. Challenges
      7. Questions?
  3. Who are we?
      Who is Spil Games?
  4. Facts
      • Company founded in 2001
      • 350+ employees worldwide
      • 180M+ unique visitors per month
      • Over 60M registered users
      • 45 portals in 19 languages
          • Casual games
          • Social games
          • Real-time multiplayer games
          • Mobile games
      • 35+ MySQL clusters
      • 60k queries per second (3.5 billion queries per day)
  5. Geographic Reach
      180 Million Monthly Active Users(*)
      Source: (*) Google Analytics, August 2012
  6. Brands
      Girls, Teens and Family
      spielen.com, juegos.com, gamesgames.com, games.co.uk
  7. Monitoring
      We use(d) many, many, many monitoring tools so far!
  8. Existing monitoring systems we use(d)
      • Opsview/Nagios (mainly availability)
      • Cacti (using Baron Schwartz/Percona templates)
      • MONyog
      • Good ol’ RRD
  9. Challenges
      • Problems with existing systems
          • Stats gathering through polling
          • Data gets averaged out
          • (Host) checks are run serially
          • Slowdowns in a run mean no/less data
          • Setting up an SSH connection is slow
      • Low granularity (1 to 5 minutes)
      • Hardly scalable
      • Difficult to correlate metrics
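To make "data gets averaged out" concrete, the sketch below compares a 5-minute average with 2-second samples. The numbers are synthetic (a steady 100 qps with one 20-second burst to 1000 qps), not measurements from Spil Games.

```python
# Synthetic numbers for illustration only, not real measurements.
SAMPLE_INTERVAL_S = 2      # fine-grained resolution, like a 2 s StatsD flush
WINDOW_S = 5 * 60          # one coarse polling window (upper end of "1 to 5 minutes")


def synthetic_samples():
    """150 samples at 2 s: a steady 100 qps with a 20 s burst at 1000 qps."""
    samples = [100.0] * (WINDOW_S // SAMPLE_INTERVAL_S)
    for i in range(60, 70):  # 10 samples x 2 s = a 20 s burst
        samples[i] = 1000.0
    return samples


def summarize(samples):
    """Return (average over the whole window, highest single sample)."""
    return sum(samples) / len(samples), max(samples)


if __name__ == "__main__":
    avg, peak = summarize(synthetic_samples())
    print(f"5-minute average: {avg:.0f} qps, 2-second peak: {peak:.0f} qps")
```

Run as a script it prints `5-minute average: 160 qps, 2-second peak: 1000 qps`: a burst that a 2-second graph shows plainly barely moves a 5-minute average.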
  10. Difficult to add a new metric
      host065
      bash-3.2# netstat -s | grep "listen queue"
      26 times the listen queue of a socket overflowed
      host066
      bash-3.2# netstat -s | grep "listen queue"
      33 times the listen queue of a socket overflowed
  11. Statsd + Collectd + Graphite
      What are they?
  12. What is Collectd?
      • Unix daemon that gathers system statistics
      • Over 90 (input/output) plugins
      • Plugin to send metrics to Graphite/Carbon
      • Very useful for system metrics
  13. Collectd (diagram): data-gathering plugins (CPU, DISK, LOAD, …) feed Collectd, which sends the metrics to Carbon over TCP at a 30 second interval.
  14. What is StatsD?
      • Front-end proxy for Graphite/Carbon (by Etsy)
      • NodeJS daemon (also other languages)
      • Receives UDP (on localhost)
      • Buffers metrics locally
      • Periodically flushes data to Graphite/Carbon (TCP)
      • Client libraries available in almost any language
      • Send any metric you like!
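A StatsD client needs little more than a UDP socket. The Python sketch below assumes a StatsD daemon on localhost:8125, as on the slides; the class name and its methods are hypothetical, not an existing client library, and the `prd.mysql` prefix is borrowed from the example configuration later in the deck.

```python
import socket


class StatsdClient:
    """Fire-and-forget StatsD sender over UDP (illustrative, not a published library)."""

    def __init__(self, host="localhost", port=8125, prefix=""):
        self.addr = (host, port)
        self.prefix = prefix
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def send(self, metric, value, metric_type):
        """Format 'name:value|type' and send it as one UDP datagram."""
        name = f"{self.prefix}.{metric}" if self.prefix else metric
        line = f"{name}:{value}|{metric_type}"
        try:
            self.sock.sendto(line.encode("utf-8"), self.addr)
        except OSError:
            pass  # no daemon or no network: drop the metric, never break the caller
        return line

    def incr(self, metric, count=1):
        return self.send(metric, count, "c")

    def decr(self, metric, count=1):
        return self.send(metric, -count, "c")

    def gauge(self, metric, value):
        return self.send(metric, value, "g")

    def timing(self, metric, ms):
        return self.send(metric, ms, "ms")

    def close(self):
        self.sock.close()
```

Because UDP is connectionless, `sendto` returns as soon as the datagram is handed to the kernel, so a slow or absent StatsD never stalls the caller; the trade-off is that lost packets are silently dropped.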
  15. StatsD functions
      • update_stats
      • increment/decrement
      • set
      • gauge
      • timers
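What StatsD does with each type at flush time can be sketched as follows. This is a simplified model of Etsy StatsD's default ("legacy") namespacing, assumed rather than copied from its source: counters are summed (scaled up by any `@rate` sample rate) and reported both as a per-second rate under `stats.` and as a raw total under `stats_counts.`, gauges keep their last value, sets count distinct members, and timers are reduced to count/lower/upper/mean. Real StatsD also computes timer percentiles and supports relative `+`/`-` gauge updates, which this sketch ignores. The `stats.` and `stats_counts.` prefixes are consistent with the Graphite targets shown later in the deck.

```python
from collections import defaultdict


def parse_line(line):
    """Parse 'name:value|type' or 'name:value|type|@rate'."""
    name, rest = line.split(":", 1)
    parts = rest.split("|")
    value, mtype = parts[0], parts[1]
    rate = 1.0
    if len(parts) > 2 and parts[2].startswith("@"):
        rate = float(parts[2][1:])
    return name, value, mtype, rate


def aggregate(lines, flush_interval_s=2.0):
    """Fold one flush interval of packets into Graphite metric names and values."""
    counters = defaultdict(float)
    gauges, sets, timers = {}, defaultdict(set), defaultdict(list)
    for line in lines:
        name, value, mtype, rate = parse_line(line)
        if mtype == "c":
            counters[name] += float(value) / rate  # undo client-side sampling
        elif mtype == "g":
            gauges[name] = float(value)             # last value wins
        elif mtype == "s":
            sets[name].add(value)                   # distinct members only
        elif mtype == "ms":
            timers[name].append(float(value))

    out = {}
    for name, total in counters.items():
        out[f"stats_counts.{name}"] = total
        out[f"stats.{name}"] = total / flush_interval_s
    for name, value in gauges.items():
        out[f"stats.gauges.{name}"] = value
    for name, members in sets.items():
        out[f"stats.sets.{name}.count"] = len(members)
    for name, values in timers.items():
        out[f"stats.timers.{name}.count"] = len(values)
        out[f"stats.timers.{name}.lower"] = min(values)
        out[f"stats.timers.{name}.upper"] = max(values)
        out[f"stats.timers.{name}.mean"] = sum(values) / len(values)
    return out
```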
  16. StatsD Bash examples
      echo "some.metric:1|c" | nc -w 1 -u graphite.host 8125
      echo "some.metric:1|c" > /dev/udp/localhost/8125

      bash-3.2# netstat -s | grep "listen"
      26 times the listen queue of a socket overflowed

      netstat -s | grep "listen" | awk '{print "hostname.listen.queue.overflowed:"$1"|c"}' > /dev/udp/localhost/8125
      # sends: hostname.listen.queue.overflowed:26|c

      # -N (--skip-column-names) keeps the "Variable_name Value" header out of the metrics
      echo "show global status" | mysql -N -u root | awk '{print "hostname.mysql.status."$1":"$2"|c"}'
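One caveat with the awk examples: `netstat -s` and SHOW GLOBAL STATUS report cumulative totals, but StatsD sums every counter value it receives within a flush interval, so sending a total as `|c` reports it as if that many events happened in each interval. The sketch below is one way to handle this, an assumption rather than how mysql-statsd does it: it remembers the previous snapshot and sends the difference as a counter, while a hypothetical hand-picked set of point-in-time variables is sent as gauges. It expects the tab-separated output of `mysql -N -e "SHOW GLOBAL STATUS"`.

```python
def parse_status(text):
    """Parse tab-separated 'Variable_name<TAB>Value' lines into {name: int}."""
    status = {}
    for line in text.splitlines():
        if "\t" not in line:
            continue
        name, value = line.split("\t", 1)
        try:
            status[name.lower()] = int(value)
        except ValueError:
            continue  # skip non-numeric values such as ON/OFF or empty strings
    return status


# Hypothetical choice: variables that are point-in-time values, sent as gauges.
GAUGES = {"threads_connected", "threads_running", "max_used_connections"}


def to_statsd_lines(prefix, previous, current):
    """Gauges are sent as-is; cumulative counters are sent as deltas."""
    lines = []
    for name, value in sorted(current.items()):
        if name in GAUGES:
            lines.append(f"{prefix}.{name}:{value}|g")
        elif name in previous:
            delta = value - previous[name]
            # delta < 0: server restart or FLUSH STATUS; delta == 0: nothing to send
            if delta > 0:
                lines.append(f"{prefix}.{name}:{delta}|c")
    return lines
```

On the first run there is no previous snapshot, so only gauges are sent; counters start flowing from the second poll.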
  17. StatsD (diagram): application-level metrics (# of logins, cache hit/miss) and MySQL_Statsd (STATUS, INNODB STATUS) send UDP packets to StatsD on localhost:8125; StatsD flushes to Carbon over TCP at a 2 second interval.
  18. What is Graphite?
      • Highly scalable real-time graphing system
      • Collects numeric time-series
      • Backend daemon Carbon
          • Carbon-cache: receives data
          • Carbon-aggregator: aggregates data
          • Carbon-relay: replication and sharding
      • RRD or Whisper database
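Carbon's plaintext protocol, which both Collectd and StatsD can use to hand data to Carbon, is one line per data point, `<metric path> <value> <unix timestamp>`, sent over TCP (port 2003 by default, matching the load balancer port on the environment slide). A minimal Python sketch; the host and the metric path containing `db1` are hypothetical.

```python
import socket
import time


def carbon_lines(metrics, timestamp=None):
    """Format {path: value} as Carbon plaintext protocol lines."""
    ts = int(time.time() if timestamp is None else timestamp)
    return "".join(f"{path} {value} {ts}\n" for path, value in metrics.items())


def send_to_carbon(payload, host="localhost", port=2003, timeout=2.0):
    """Push one batch of lines over TCP (host and port are assumptions)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload.encode("utf-8"))
```

Batching many lines into one TCP write is cheaper than one connection per data point, which is part of what StatsD does for its clients.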
  19. Graphite’s capabilities
      • Each metric is in its own bucket
      • Periods make folders
          • prod.syseng.mmm.<hostname>.admin_offline
      • Metric types
          • Counters
          • Gauges
      • Retention can be set using a regex
          [mysql]
          pattern = ^prod.syseng.mysql..*$
          retentions = 2s:1d,1m:3d,5m:7d,1h:5y
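The retention string also fixes how much disk each metric costs, because Whisper preallocates every archive. The sketch below converts `2s:1d,1m:3d,5m:7d,1h:5y` into (seconds per point, number of points) pairs and estimates the file size from Whisper's on-disk layout (a 16-byte header, 12 bytes per archive descriptor, 12 bytes per point). It only handles the unit-suffixed form shown on the slide, using Whisper's convention that `y` means 365 days.

```python
import re

# Seconds per unit, following Whisper's unit conventions (y = 365 days).
UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def to_seconds(token):
    """Convert '5m', '7d', ... to seconds (bare point counts are not handled)."""
    m = re.fullmatch(r"(\d+)([smhdwy])", token)
    if not m:
        raise ValueError(f"unsupported retention token: {token!r}")
    return int(m.group(1)) * UNITS[m.group(2)]


def parse_retentions(spec):
    """'2s:1d,1m:3d' -> [(seconds_per_point, points), ...]"""
    archives = []
    for part in spec.split(","):
        precision, duration = part.strip().split(":")
        step = to_seconds(precision)
        archives.append((step, to_seconds(duration) // step))
    return archives


def whisper_file_size(archives):
    """Header (16 B) + one 12 B descriptor per archive + 12 B per point."""
    return 16 + 12 * len(archives) + 12 * sum(points for _, points in archives)
```

For this schema that is 93,336 points, about 1.1 MB per metric on disk, which is one reason why "too many metrics can kill a host" and why SSD write performance appears among the challenges.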
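  20. Our Graphite environment (diagram): clients requesting graphs reach the Graphite rendering cluster through a load balancer on port 443; servers (Server-1, Server-2 to Server-n) send metrics through a load balancer on port 2003 to the Carbon relays, which feed the DEV, SYSENG, SERVICES1 and SERVICES2 storage clusters and Skyline. Node counts in the diagram: 3 nodes, 2 nodes (24h retention), 1 node, 8 nodes.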
  21. Our Graphite cluster(s) (diagram with traffic figures): clients request 12 graphs/s from the Graphite rendering cluster through the load balancer on port 443, with 700 get/s on the rendering side; 3M metrics/s arrive through the load balancer on port 2003 at the Carbon relays, split across DEV (250K m/s), SYSENG (1M m/s), SERVICES1 (1.5M m/s) and SERVICES2 (500K m/s).
  22. Graphite Storage Clusters
  23. MySQL + StatsD
      How do we use them?
  24. Why use StatsD over Collectd?
      • MySQL plugin for Collectd
          • Sends SHOW STATUS
          • No INNODB STATUS
          • Plugin not flexible
      • DBI plugin for Collectd
          • Metrics based on columns
      • Different granularity needed
      • Separate daemon (with persistent connection)
      • StatsD is easy as ABC
  25. MySQL StatsD daemon
      • Written in Python
      • Rewritten and open sourced during a hackday
      • Gathers data every 0.5 seconds
      • Sends to StatsD (localhost) after every run
      • Easy configuration
      • Persistent connection
      • Baron Schwartz’ InnoDB status parser (cacti poller)
      • Other interesting metrics and counters
          • Information Schema
          • Performance Schema
          • MariaDB specific
          • Galera specific
      • If you can query it, you can use it as a metric!
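The daemon polls every 0.5 seconds, but not every query needs that resolution. The sketch below takes the interval values from the example configuration (in milliseconds) and simulates which query types fall due on each 500 ms tick. It illustrates per-query intervals and is not mysql-statsd's actual scheduling code; a real loop would take elapsed time from `time.monotonic()` and sleep between ticks.

```python
# Interval values (ms) taken from the example configuration.
INTERVALS_MS = {"status": 500, "variables": 10000, "innodb": 10000, "commit": 5000}
SLEEP_MS = 500


def due_queries(elapsed_ms, intervals, last_run):
    """Return the query types due at this tick and record when they ran."""
    due = []
    for name, interval in intervals.items():
        if name not in last_run or elapsed_ms - last_run[name] >= interval:
            due.append(name)
            last_run[name] = elapsed_ms
    return due


def simulate(total_ms, intervals=INTERVALS_MS, sleep_ms=SLEEP_MS):
    """Count how often each query type runs during total_ms of simulated time."""
    last_run = {}
    runs = {name: 0 for name in intervals}
    for tick in range(0, total_ms, sleep_ms):
        for name in due_queries(tick, intervals, last_run):
            runs[name] += 1
    return runs
```

Over 10 seconds this gives 20 status polls, 2 COMMITs and one run each of SHOW GLOBAL VARIABLES and SHOW ENGINE INNODB STATUS, so the expensive InnoDB status output is parsed far less often than the cheap counters.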
  26. MySQL StatsD overview (diagram): inside the MySQL StatsD daemon, a MySQL thread runs SHOW GLOBAL VARIABLES, SHOW GLOBAL STATUS and SHOW ENGINE INNODB STATUS against MySQL, and a StatsD thread sends the results to StatsD.
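The diagram splits the daemon into a MySQL thread and a StatsD thread. A common way to connect two such threads is a bounded queue; the sketch below assumes that design (the slides do not say how the threads communicate), with a hypothetical `fetch` callable standing in for the SHOW queries and `send` standing in for the UDP write.

```python
import queue
import threading


def mysql_worker(fetch, out_q, rounds):
    """Hypothetical MySQL thread: fetch() stands in for running the SHOW queries."""
    for _ in range(rounds):
        out_q.put(fetch())  # blocks if the queue is full
    out_q.put(None)         # sentinel: no more results


def statsd_worker(in_q, send):
    """Hypothetical StatsD thread: turns each {name: (value, type)} into StatsD lines."""
    while True:
        result = in_q.get()
        if result is None:
            break
        for name, (value, mtype) in sorted(result.items()):
            send(f"{name}:{value}|{mtype}")


def run_pipeline(fetch, rounds):
    # Bounded: if sending falls behind, the MySQL thread waits instead of
    # queueing results without limit.
    q = queue.Queue(maxsize=100)
    sent = []
    producer = threading.Thread(target=mysql_worker, args=(fetch, q, rounds))
    consumer = threading.Thread(target=statsd_worker, args=(q, sent.append))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return sent
```

Decoupling the two means a slow SHOW ENGINE INNODB STATUS delays only the next poll, not the sending of results that are already queued.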
  27. Example configuration
      [daemon]
      logfile = /var/log/mysql_statsd/daemon.log
      pidfile = /var/run/mysql_statsd.pid

      [statsd]
      host = localhost
      port = 8125
      prefix = prd.mysql
      include_hostname = true

      [mysql]
      host = localhost
      username = mysqlstatsd
      password = ub3rs3cr3tp@ss!
      stats_types = status,variables,innodb,commit
      query_variables = SHOW GLOBAL VARIABLES
      interval_variables = 10000
      query_status = SHOW GLOBAL STATUS
      interval_status = 500
      query_innodb = SHOW ENGINE INNODB STATUS
      interval_innodb = 10000
      query_commit = COMMIT
      interval_commit = 5000
      sleep_interval = 500

      [metrics]
      variables.max_connections = g
      status.max_used_connections = g
      status.connections = c
      innodb.spin_waits = c
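Reading such a file needs nothing beyond the standard library. The sketch below loads a subset of the example configuration with `configparser` and builds a (query, interval) schedule per stats type plus the metric-type map; the function name is hypothetical and mysql-statsd may parse the file differently.

```python
import configparser

# Subset of the example configuration; lines must not be indented,
# because configparser treats indented lines as value continuations.
EXAMPLE = """
[mysql]
password = ub3rs3cr3tp@ss!
stats_types = status,variables,innodb,commit
query_variables = SHOW GLOBAL VARIABLES
interval_variables = 10000
query_status = SHOW GLOBAL STATUS
interval_status = 500
query_innodb = SHOW ENGINE INNODB STATUS
interval_innodb = 10000
query_commit = COMMIT
interval_commit = 5000
sleep_interval = 500

[metrics]
variables.max_connections = g
status.max_used_connections = g
status.connections = c
innodb.spin_waits = c
"""


def load_config(text):
    # interpolation=None: a '%' in a password is taken literally.
    cp = configparser.ConfigParser(interpolation=None)
    cp.read_string(text)
    stats_types = [t.strip() for t in cp.get("mysql", "stats_types").split(",")]
    schedule = {
        t: (cp.get("mysql", f"query_{t}"), cp.getint("mysql", f"interval_{t}"))
        for t in stats_types
    }
    metric_types = dict(cp.items("metrics"))  # e.g. {"status.connections": "c"}
    sleep_ms = cp.getint("mysql", "sleep_interval")
    return schedule, metric_types, sleep_ms
```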
  28. MySQL Multi Master patch
      • Perl (Net::Statsd)
      • Sends any status change to StatsD (localhost)
      • Non-blocking (thanks to UDP)
      • Drawn as infinite lines in Graphite (drawAsInfinite)
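The patch itself is Perl; the idea carries over to any language. The Python sketch below (hypothetical class; metric names modelled on the `prod.syseng.mmm.<hostname>.admin_offline` example from the Graphite slide) sends a counter only when a host's state actually changes, so each event becomes a single non-zero data point.

```python
class StateChangeReporter:
    """Emit one StatsD counter per state change (illustrative sketch)."""

    def __init__(self, send, prefix="prod.syseng.mmm"):
        self.send = send      # e.g. a function that writes the line to UDP
        self.prefix = prefix
        self.last = {}        # host -> last reported state

    def observe(self, host, state):
        if self.last.get(host) == state:
            return None       # unchanged: send nothing
        self.last[host] = state
        line = f"{self.prefix}.{host}.{state}:1|c"
        self.send(line)
        return line
```

In Graphite, `drawAsInfinite()` turns every non-zero point into a full-height vertical line, so a target such as `drawAsInfinite(stats_counts.prod.syseng.mmm.*.admin_offline)` (assuming StatsD's `stats_counts.` prefix) marks each state change across the graph.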
  29. Other metrics
      • Deployments
      • User initiated actions
          • Logins
          • High scores
          • Comments / ratings
          • Images uploaded
          • Payments
      • Application metrics
          • Error counts
          • Cache statistics (cache hit/miss)
          • Request timers
          • Image sizes
  30. Start graphing!
      Now it starts to get interesting!
  31. What is important for you?
      • Identify your KPIs
      • Don’t graph everything
          • More graphs == less overview
      • Combine metrics
          • Stack clusters
  32. Correlate!
      • Include other metrics in your graphs
          • Deployments
          • Failover(s)
      • Combine application metrics with your database metrics
      • Other influences
          • Launch of a new game
          • Apple keynotes
  33. Graphing
      • Graphite Graphing Engine
          • DIY
          • Giraffe
      • Readily available dashboards/tools
          • Graph Explorer (vimeo)
          • Team Dashboard
          • Skyline (Etsy)
          • Dashing (Shopify)
  34. DIY
  35. Giraffe
  36. Graph Explorer
  37. Team Dashboard
  38. Skyline
  39. Dashing
  40. Graphite Graphing Engine
      • URI based rendering API
      • Support for wildcards
          • stats.prod.syseng.mysql.*.status.com_select
          • sumSeries(stats.prod.syseng.mysql.*.status.com_select)
          • aliasByNode(stats.prod.syseng.mysql.*.status.com_select, 4)
      • Many functions
          • Nth percentile
          • Holt-Winters Forecast
          • Timeshift
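To show what the wildcard and the two functions do, here is a toy re-implementation over in-memory data. It is not Graphite's code: `*` is matched one path node at a time (Graphite's `{a,b}` alternatives are not handled), `sumSeries` adds points position by position while skipping missing values, and `aliasByNode(…, 4)` keeps the node at index 4 counting from 0, which in `stats.prod.syseng.mysql.*.status.com_select` is the host name. The host names `db1` and `db2` are hypothetical.

```python
from fnmatch import fnmatchcase


def match(pattern, names):
    """Graphite-style match: '*' never crosses a '.', so compare node by node."""
    pnodes = pattern.split(".")
    hits = []
    for name in names:
        nodes = name.split(".")
        if len(nodes) == len(pnodes) and all(
            fnmatchcase(node, p) for node, p in zip(nodes, pnodes)
        ):
            hits.append(name)
    return sorted(hits)


def sum_series(series_list):
    """Point-wise sum; None (missing data) is skipped, all-None stays None."""
    result = []
    for points in zip(*series_list):
        present = [p for p in points if p is not None]
        result.append(sum(present) if present else None)
    return result


def alias_by_node(name, *nodes):
    """Keep only the given dot-separated nodes (0-based), joined by '.'."""
    parts = name.split(".")
    return ".".join(parts[n] for n in nodes)
```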
  41. Graphite web interface
  42. Graphite Example URL
      https://graphitehost/render/?width=722&height=357&_salt=1366550446.553&rightDashed=1&target=alias%28sumSeries%28stats.prod.services.profilar.request.total.count.*%29%2C%22Number%20of%20profile%20requests%22%29&target=alias%28secondYAxis%28sumSeries%28stats_counts.prod.syseng.mysql.<node1>.status.questions%2C%20stats_counts.prod.syseng.mysql.<node2>.status.questions%29%29%2C%22Number%20of%20queries%20profiles%20cluster%22%29&from=00%3A00_20130415&until=23%3A59_20130421
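The URL is easier to read, and to edit, when it is built from its parts. The sketch below rebuilds the render URL with `urllib.parse`; each repeated `target` parameter adds one series to the same graph, and `<node1>`/`<node2>` are the placeholders from the slide, not real host names. The `_salt` parameter from the slide is omitted.

```python
from urllib.parse import quote, urlencode

TARGETS = [
    'alias(sumSeries(stats.prod.services.profilar.request.total.count.*),'
    '"Number of profile requests")',
    'alias(secondYAxis(sumSeries('
    'stats_counts.prod.syseng.mysql.<node1>.status.questions, '
    'stats_counts.prod.syseng.mysql.<node2>.status.questions)),'
    '"Number of queries profiles cluster")',
]


def render_url(targets, params, base="https://graphitehost/render/"):
    """Build a Graphite render URL; repeated 'target' parameters share one graph."""
    query = list(params.items()) + [("target", t) for t in targets]
    # quote (not quote_plus) encodes spaces as %20, as in the slide's URL
    return base + "?" + urlencode(query, quote_via=quote)


PARAMS = {
    "width": 722,
    "height": 357,
    "rightDashed": 1,
    "from": "00:00_20130415",   # 'from' is a Python keyword, hence a dict
    "until": "23:59_20130421",
}
```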
  43. Graphite Example URL (same URL as slide 42)
  44. Examples: timeshift
  45. Examples: multiple weeks
  46. Challenges
      The road ahead
  47. What challenges do we have?
      • Improve MySQL-statsd (extensive issue list)
      • No zooming in on graphs
      • Get Skyline to work and not cry wolf
      • Machine learning
      • Eternal hunger for more metrics
      • Abuse of the system
      • Hitting limits of SSD write performance
          • Virident? Fusion-IO?
          • Carbon → OpenTSDB → Graphite-web?
  48. What lessons have we learned?
      • Persistent connections + repeatable read
          • History list skyrocketed
      • More hackdays are needed!
      • Too many metrics slow down graphing
      • Too many metrics can kill a host
      • EstatsD for Erlang
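Why a persistent connection plus REPEATABLE READ can make the history list skyrocket: under REPEATABLE READ, an InnoDB transaction that has performed a consistent read keeps its read view until it commits, and purge cannot remove undo records that view might still need, so the history list keeps growing. A long-lived connection with autocommit off (PEP 249 asks Python drivers to start that way) can hold such a transaction open indefinitely. The periodic `COMMIT` in the example configuration (`query_commit`, `interval_commit = 5000`) looks like a guard against this, though the slides do not say so. The history list length is also worth graphing; the sketch below pulls it out of SHOW ENGINE INNODB STATUS output and formats it as a StatsD gauge (function names and the metric path are hypothetical).

```python
import re

# SHOW ENGINE INNODB STATUS prints e.g. "History list length 42" in TRANSACTIONS.
HISTORY_RE = re.compile(r"^History list length (\d+)", re.MULTILINE)


def history_list_length(innodb_status):
    """Return the history list length, or None if the line is missing."""
    m = HISTORY_RE.search(innodb_status)
    return int(m.group(1)) if m else None


def history_gauge(prefix, innodb_status):
    """Format the value as a StatsD gauge line (hypothetical metric name)."""
    n = history_list_length(innodb_status)
    return None if n is None else f"{prefix}.innodb.history_list_length:{n}|g"
```

A gauge fits here because the history list length is a point-in-time value, not a cumulative counter.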
  49. Questions…
  50. Practical links
      • Graphite: http://graphite.readthedocs.org/en/latest/
      • Collectd: https://collectd.org/
      • StatsD on Github by Etsy: https://github.com/etsy/statsd/wiki
      • Etsy on StatsD: http://codeascraft.etsy.com/2011/02/15/measureanything-measure-everything/
  51. Thank you!
      • Presentation can be found at: http://spil.com/pluk2013
      • MySQL Statsd can be found at:
          http://spil.com/mysqlstatsd
          http://github.com/spilgames/mysql-statsd
      • If you wish to contact me: art@spilgames.com
