MySQL performance monitoring using Statsd and Graphite (PLUK2013)


Note: this is a placeholder for the presentation next Tuesday at Percona Live London.

  • Valid point, and we could have saved a bit of performance there.
    However, it is only one of the reasons, and it does not necessarily solve the others. We actually want to step away from centralized polling for data and move to distributed pushing of data to our Graphite setup.
  • Have you tried SSH ControlMaster to keep the SSH connection alive?

  • so that may be the reason our name is not widely known.
  • The three main brands:
      • Girls, aimed at girls ages from 8 to 12
      • Teens, aimed at boys and girls 10 to 15
      • Family, basically mothers playing with their children
    Strong domains localized over 19 different languages: spielen.com, juegos.com, gamesgames.com, games.co.uk, oyunonya.com
    All content is localized
  • Meeting Notes (30-11-12 12:00):
      • Abbreviations (try to pronounce)
      • Theory too long, second part too brief.
      • High Availability -> HA
      • What do we do? Games!
      • 180M+
      • Query numbers on DBs
      • Some examples of portal names
      • SSP is abstraction layer
      • SSP query example
      • Explain why horizontal instead of vertical
      • Functional sharding slide!
      • Explain why satellite DC
      • Introduction to satellite data centers (moving data to caching) but explain they do not own the data
      • Instead of example of migrating users, example of adding a new DC
      • Slide 23: leave out slide
      • Why we chose Erlang: remove pattern matching. Adds productivity: simpler
      • Add another example for buckets with a different backend
      • Slide 22: partition on users, bucket and GIDs.
      • It is not a mess in LAMP stack: the backend is just not scalable

    1. MySQL Performance monitoring using Statsd and Graphite
       Art van Scheppingen, Head of Database Engineering
    2. Overview
        1. Who are we?
        2. What monitoring tools do we use?
        3. What are StatsD, Collectd and Graphite?
        4. How MySQL logs to StatsD
        5. Graphing examples
        6. Challenges
        7. Questions?
    3. Who are we? Who is Spil Games?
    4. Facts
        • Company founded in 2001
        • 350+ employees worldwide
        • 180M+ unique visitors per month
        • Over 60M registered users
        • 45 portals in 19 languages
        • Casual games
        • Social games
        • Real time multiplayer games
        • Mobile games
        • 35+ MySQL clusters
        • 60k queries per second (3.5 billion qpd)
    5. Geographic Reach: 180 Million Monthly Active Users(*). Source: (*) Google Analytics, August 2012
    6. Brands: Girls, Teens and Family. spielen.com, juegos.com, gamesgames.com, games.co.uk
    7. Monitoring. We use(d) many, many, many monitoring tools so far!
    8. Existing monitoring systems we use(d)
        • Opsview/Nagios (mainly availability)
        • Cacti (using Baron Schwartz/Percona templates)
        • MONYog
        • Good ol’ RRD
    9. Challenges
        • Problems with existing systems
        • Stats gathering through polling
        • Data gets averaged out
        • (Host) checks are run serially
        • Slowdowns in a run mean no/less data
        • Setting up an SSH connection is slow
        • Low granularity (1 to 5 minutes)
        • Hardly scalable
        • Difficult to correlate metrics
    10. Difficult to add a new metric
        host065 bash-3.2# netstat -s | grep "listen queue"
        26 times the listen queue of a socket overflowed
        host066 bash-3.2# netstat -s | grep "listen queue"
        33 times the listen queue of a socket overflowed
    11. Statsd + Collectd + Graphite. What are they?
    12. What is Collectd?
        • Unix daemon that gathers system statistics
        • Over 90 (input/output) plugins
        • Plugin to send metrics to Graphite/Carbon
        • Very useful for system metrics
    13. Collectd (diagram): Collectd gathers data through plugins (CPU, DISK, LOAD, ...) and sends it to Carbon over TCP at a 30 second interval
    14. What is StatsD?
        • Front-end proxy for Graphite/Carbon (by Etsy)
        • NodeJS daemon (also other languages)
        • Receives UDP (on localhost)
        • Buffers metrics locally
        • Periodically flushes data to Graphite/Carbon (TCP)
        • Client libraries available in about any language
        • Send any metric you like!
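The "receives UDP" step can be sketched in a few lines of Python. This is a minimal illustration and not code from the talk: the function name `send_metric` is hypothetical, and the default host and port (127.0.0.1:8125) are assumptions based on the localhost:8125 endpoint used elsewhere in the slides.

```python
import socket


def send_metric(line: str, host: str = "127.0.0.1", port: int = 8125) -> int:
    """Send one StatsD line as a single UDP datagram and return the bytes sent.

    send_metric is a hypothetical helper; host and port are assumed defaults.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # UDP is fire-and-forget: no connection setup and no acknowledgement,
        # so the caller is not blocked if nothing is listening.
        return sock.sendto(line.encode("ascii"), (host, port))
    finally:
        sock.close()
```

Because nothing is acknowledged, a lost datagram simply means a lost sample. That trade-off is what keeps the sending application from stalling on its monitoring.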
    15. StatsD functions
        • update_stats
        • increment/decrement
        • set
        • gauge
        • timers
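Each of these functions ends up as a short text line on the wire. The sketch below formats such lines using the StatsD type suffixes `c` (counter), `g` (gauge), `ms` (timer) and `s` (set). The helper name `statsd_line` and the `kind` labels are hypothetical and only serve the illustration.

```python
from typing import Optional, Union

# StatsD wire-format type suffixes; the dictionary keys are illustrative labels.
TYPE_SUFFIX = {"counter": "c", "gauge": "g", "timer": "ms", "set": "s"}


def statsd_line(name: str, value: Union[int, float, str], kind: str,
                sample_rate: Optional[float] = None) -> str:
    """Format one metric as '<name>:<value>|<type>' with an optional '|@<rate>'."""
    line = f"{name}:{value}|{TYPE_SUFFIX[kind]}"
    if sample_rate is not None:
        # A sample rate tells StatsD that only a fraction of events was sent.
        line += f"|@{sample_rate}"
    return line
```

For example, `statsd_line("some.metric", 1, "counter")` produces the same `some.metric:1|c` string that the Bash examples on the next slide send.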
    16. StatsD Bash examples
        # Send a counter with netcat, or with bash's built-in /dev/udp redirection
        echo "some.metric:1|c" | nc -w 1 -u graphite.host 8125
        echo "some.metric:1|c" > /dev/udp/localhost/8125
        # Turn the netstat listen queue overflow count into a metric
        bash-3.2# netstat -s | grep "listen"
        26 times the listen queue of a socket overflowed
        netstat -s | grep "listen" | awk '{print "hostname.listen.queue.overflowed:"$1"|c"}' > /dev/udp/localhost/8125
        hostname.listen.queue.overflowed:26|c
        # Format every MySQL status variable as a metric line
        echo "show global status" | mysql -u root | awk '{print "hostname.mysql.status."$1":"$2"|c"}'
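The last Bash pipeline can also be written in Python, which makes it easier to skip the header row and non-numeric values. This is a sketch under assumptions: the function name `status_to_statsd` and the `hostname.mysql.status` prefix are illustrative, and the input is assumed to be the tab-separated output of the `mysql` client. Unlike the slide, it emits gauges (`g`), because the status values are cumulative totals and a StatsD counter adds up every value it receives within a flush interval.

```python
from typing import List


def status_to_statsd(output: str, prefix: str = "hostname.mysql.status") -> List[str]:
    """Convert tab-separated SHOW GLOBAL STATUS output into StatsD gauge lines."""
    lines = []
    for row in output.splitlines():
        parts = row.split("\t")
        if len(parts) != 2:
            continue  # not a name/value row
        name, value = parts
        try:
            float(value)  # only numeric values can become metrics
        except ValueError:
            continue  # skips the header row and string values such as "ON"
        lines.append(f"{prefix}.{name.lower()}:{value}|g")
    return lines
```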
    17. StatsD (diagram): application-level metrics (# of logins, cache hit/miss) and MySQL_Statsd (STATUS, INNODB STATUS) are sent over UDP to StatsD on localhost:8125, which flushes to Carbon over TCP at a 2 second interval
    18. What is Graphite?
        • Highly scalable real-time graphing system
        • Collects numeric time-series
        • Backend daemon Carbon
            • Carbon-cache: receives data
            • Carbon-aggregator: aggregates data
            • Carbon-relay: replication and sharding
        • RRD or Whisper database
    19. Graphite’s capabilities
        • Each metric is in its own bucket
        • Periods make folders
            • prod.syseng.mmm.<hostname>.admin_offline
        • Metric types
            • Counters
            • Gauge
        • Retention can be set using a regex
            [mysql]
            pattern = ^prod.syseng.mysql..*$
            retentions = 2s:1d,1m:3d,5m:7d,1h:5y
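To get a feel for what the retention line `2s:1d,1m:3d,5m:7d,1h:5y` means in storage terms, the sketch below works out how many datapoints each archive holds. The helper name `parse_retentions` is hypothetical. The 12 bytes per datapoint is Whisper's on-disk point size, and the small file header is ignored here.

```python
from typing import List, Tuple

# Seconds per unit; a year is counted as 365 days.
UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}
BYTES_PER_POINT = 12  # 4-byte timestamp plus 8-byte value in a Whisper file


def parse_retentions(spec: str) -> List[Tuple[int, int]]:
    """Return (seconds_per_point, number_of_points) for each archive in spec."""
    archives = []
    for part in spec.split(","):
        precision, duration = part.strip().split(":")
        step = int(precision[:-1]) * UNITS[precision[-1]]
        span = int(duration[:-1]) * UNITS[duration[-1]]
        archives.append((step, span // step))
    return archives


archives = parse_retentions("2s:1d,1m:3d,5m:7d,1h:5y")
total_points = sum(points for _, points in archives)
approx_bytes = total_points * BYTES_PER_POINT
```

Under these assumptions the four archives hold 43200, 4320, 2016 and 43800 points, so each metric takes roughly 1.1 MB on disk. Almost half of that is the 2 second archive, even though it only covers one day.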
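    20. Our Graphite environment (diagram): clients requesting graphs go through a load balancer (port 443) to the Graphite rendering cluster; Server-1, Server-2 ... Server-n send metrics through a load balancer (port 2003) to the Carbon relay, which feeds Skyline (24h retention) and the storage clusters DEV, SYSENG, SERVICES1 and SERVICES2. Node counts shown: 3 nodes, 2 nodes, 1 node, 8 nodes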
    21. Our Graphite cluster(s) (same diagram with throughput figures). Figures shown: 12 graphs/s, 700 get/s, 3M m(etrics)/s(econd), 250K m/s, 1M m/s, 1.5M m/s, 500K m/s. Clusters shown: DEV, SYSENG, SERVICES1, SERVICES2
    22. Graphite Storage Clusters
    23. MySQL + StatsD. How do we use them?
    24. Why use StatsD over Collectd?
        • MySQL plugin for Collectd
        • Sends SHOW STATUS
        • No INNODB STATUS
        • Plugin not flexible
        • DBI plugin for Collectd
        • Metrics based on columns
        • Different granularity needed
        • Separate daemon (with persistent connection)
        • StatsD is easy as ABC
    25. MySQL StatsD daemon
        • Written in Python
        • Rewritten and open sourced during a hackday
        • Gathers data every 0.5 seconds
        • Sends to StatsD (localhost) after every run
        • Easy configuration
        • Persistent connection
        • Baron Schwartz’ InnoDB status parser (cacti poller)
        • Other interesting metrics and counters
            • Information Schema
            • Performance Schema
            • MariaDB specific
            • Galera specific
        • If you can query it, you can use it as a metric!
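MySQL reports most status values as totals since server start, so a poller that runs every 0.5 seconds has to turn two consecutive readings into a per-run increase before sending them as counters. The following is a sketch of one way to do that, not necessarily how mysql-statsd does it. The function name `counter_deltas` and the restart handling are assumptions.

```python
from typing import Dict


def counter_deltas(previous: Dict[str, int], current: Dict[str, int]) -> Dict[str, int]:
    """Return the increase of each cumulative counter between two polls."""
    deltas = {}
    for name, value in current.items():
        if name not in previous:
            continue  # first sighting: nothing to compare against yet
        delta = value - previous[name]
        if delta < 0:
            # Assumption: a lower value means the server restarted and the
            # counter began again at zero, so the new value is the increase.
            delta = value
        deltas[name] = delta
    return deltas
```

For example, readings of 100 and then 103 for `connections` give a delta of 3, which could be sent as `connections:3|c`.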
    26. MySQL StatsD overview (diagram): inside the MySQL StatsD daemon, a MySQL thread runs SHOW GLOBAL VARIABLES, SHOW GLOBAL STATUS and SHOW ENGINE INNODB STATUS against MySQL, and a StatsD thread sends the results to StatsD
    27. Example configuration
        [daemon]
        logfile = /var/log/mysql_statsd/daemon.log
        pidfile = /var/run/mysql_statsd.pid
        [statsd]
        host = localhost
        port = 8125
        prefix = prd.mysql
        include_hostname = true
        [mysql]
        host = localhost
        username = mysqlstatsd
        password = ub3rs3cr3tp@ss!
        stats_types = status,variables,innodb,commit
        query_variables = SHOW GLOBAL VARIABLES
        interval_variables = 10000
        query_status = SHOW GLOBAL STATUS
        interval_status = 500
        query_innodb = SHOW ENGINE INNODB STATUS
        interval_innodb = 10000
        query_commit = COMMIT
        interval_commit = 5000
        sleep_interval = 500
        [metrics]
        variables.max_connections = g
        status.max_used_connections = g
        status.connections = c
        innodb.spin_waits = c
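The configuration pairs each entry in `stats_types` with a `query_<type>` and an `interval_<type>` key. The sketch below reads a shortened copy of it with Python's standard `configparser` and builds that pairing. It illustrates the file layout only and is not the daemon's own loader: `load_schedule` and `SAMPLE` are hypothetical names, and the intervals are presumably milliseconds, given the 0.5 second polling mentioned on slide 25.

```python
import configparser
from typing import Dict, Tuple

# Shortened copy of the slide's configuration, for illustration only.
SAMPLE = """
[mysql]
stats_types = status,variables
query_status = SHOW GLOBAL STATUS
interval_status = 500
query_variables = SHOW GLOBAL VARIABLES
interval_variables = 10000

[metrics]
variables.max_connections = g
status.connections = c
"""


def load_schedule(text: str) -> Tuple[Dict[str, Tuple[str, int]], Dict[str, str]]:
    """Return ({stats_type: (query, interval)}, {metric_name: statsd_type})."""
    # interpolation=None keeps characters such as '%' in values literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    mysql = parser["mysql"]
    schedule = {}
    for stats_type in mysql["stats_types"].split(","):
        stats_type = stats_type.strip()
        query = mysql[f"query_{stats_type}"]
        interval = mysql.getint(f"interval_{stats_type}")
        schedule[stats_type] = (query, interval)
    return schedule, dict(parser["metrics"])


schedule, metrics = load_schedule(SAMPLE)
```

The `[metrics]` section then acts as a whitelist: only the listed names are forwarded, each with the StatsD type (`g` for gauge, `c` for counter) given on the right-hand side.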
    28. MySQL Multi Master patch
        • Perl (Net::Statsd)
        • Sends any status change to StatsD (localhost)
        • Non-blocking (thanks to UDP)
        • Draw as infinite in Graphite
    29. Other metrics
        • Deployments
        • User initiated actions
            • Logins
            • High scores
            • Comments / ratings
            • Images uploaded
            • Payments
        • Application metrics
            • Error counts
            • Cache statistics (cache hit/miss)
            • Request timers
            • Image sizes
    30. Start graphing! Now it starts to get interesting!
    31. What is important for you?
        • Identify your KPIs
        • Don’t graph everything
        • More graphs == less overview
        • Combine metrics
        • Stack clusters
    32. Correlate!
        • Include other metrics into your graphs
            • Deployments
            • Failover(s)
        • Combine application metrics with your database
        • Other influences
            • Launch of a new game
            • Apple keynotes
    33. Graphing
        • Graphite Graphing Engine
        • DIY
        • Giraffe
        • Readily available dashboards/tools
        • Graph Explorer (vimeo)
        • Team Dashboard
        • Skyline (Etsy)
        • Dashing (Shopify)
    34. DIY
    35. Giraffe
    36. Graph Explorer
    37. Team Dashboard
    38. Skyline
    39. Dashing
    40. Graphite Graphing Engine
        • URI based rendering API
        • Support for wildcards
            • stats.prod.syseng.mysql.*.status.com_select
            • sumSeries(stats.prod.syseng.mysql.*.status.com_select)
            • aliasByNode(stats.prod.syseng.mysql.*.status.com_select, 4)
        • Many functions
            • Nth percentile
            • Holt-Winters Forecast
            • Timeshift
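Because the rendering API is URI based, a graph is just a query string with one or more `target` parameters. The sketch below assembles such a URL with Python's standard library. The helper name `render_url` is hypothetical, `https://graphitehost` is the placeholder host from the slides, and `from=-1h` is a relative time chosen for the example.

```python
from typing import List
from urllib.parse import urlencode


def render_url(base: str, targets: List[str], **params: str) -> str:
    """Build a Graphite /render/ URL with one 'target' parameter per series."""
    query = [("target", target) for target in targets]
    query += sorted(params.items())  # stable parameter order for readability
    # urlencode percent-escapes parentheses, commas and the '*' wildcard.
    return f"{base}/render/?{urlencode(query)}"


url = render_url(
    "https://graphitehost",
    ["sumSeries(stats.prod.syseng.mysql.*.status.com_select)"],
    **{"from": "-1h"},
)
```

Passing several targets repeats the `target` parameter, which is how the example URL on slide 42 puts two series into one graph.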
    41. Graphite web interface
    42. Graphite Example URL
        https://graphitehost/render/?width=722&height=357&_salt=1366550446.553&rightDashed=1&target=alias%28sumSeries%28stats.prod.services.profilar.request.total.count.*%29%2C%22Number%20of%20profile%20requests%22%29&target=alias%28secondYAxis%28sumSeries%28stats_counts.prod.syseng.mysql.<node1>.status.questions%2C%20stats_counts.prod.syseng.mysql.<node2>.status.questions%29%29%2C%22Number%20of%20queries%20profiles%20cluster%22%29&from=00%3A00_20130415&until=23%3A59_20130421
    43. Graphite Example URL (the same URL as on the previous slide)
    44. Examples: timeshift
    45. Examples: multiple weeks
    46. Challenges. The road ahead
    47. What challenges do we have?
        • Improve MySQL-statsd (extensive issue list)
        • No zoom in on graphs
        • Get Skyline to work and not cry wolf
        • Machine learning
        • Eternal hunger for more metrics
        • Abuse of the system
        • Hitting limits of SSD write performance
            • Virident? Fusion-IO?
            • Carbon -> OpenTSDB -> Graphite-web?
    48. What lessons have we learned?
        • Persistent connections + repeatable read
            • History list skyrocketed
        • More hackdays are needed!
        • Too many metrics slow down graphing
        • Too many metrics can kill a host
        • EstatsD for Erlang
    49. Questions…
    50. Practical links
        • Graphite: http://graphite.readthedocs.org/en/latest/
        • Collectd: https://collectd.org/
        • StatsD on Github by Etsy: https://github.com/etsy/statsd/wiki
        • Etsy on StatsD: http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/
    51. Thank you!
        • Presentation can be found at: http://spil.com/pluk2013
        • MySQL Statsd can be found at: http://spil.com/mysqlstatsd and http://github.com/spilgames/mysql-statsd
        • If you wish to contact me: art@spilgames.com