MySQL Performance Monitoring

MySQL Performance Monitoring using Statsd and Graphite

Published in Technology, Business

Transcript

  • 1. MySQL Performance Monitoring using StatsD and Graphite. Art van Scheppingen, Head of Database Engineering
  • 2. Overview
      1. Who are we?
      2. What monitoring tools do we use?
      3. What are StatsD, Collectd and Graphite?
      4. How MySQL logs to StatsD
      5. Graphing examples
      6. Challenges
      7. Questions?
  • 3. Who are we? Who is Spil Games?
  • 4. Facts
      • Company founded in 2001
      • 350+ employees worldwide
      • 180M+ unique visitors per month
      • Over 50M registered users
      • 45 portals in 19 languages
      • Casual games
      • Social games
      • Real-time multiplayer games
      • Mobile games
      • 35+ MySQL clusters
      • 60k queries per second (3.5 billion queries per day)
  • 5. Geographic Reach: 180 million monthly active users (*). Source: (*) Google Analytics, August 2012
  • 6. Brands
      • Girls, Teens and Family
      • spielen.com
      • juegos.com
      • gamesgames.com
      • games.co.uk
  • 7. Monitoring: We use(d) many, many, many monitoring tools so far!
  • 8. Existing monitoring systems we use(d)
      • Opsview/Nagios (mainly availability)
      • Cacti (using Baron Schwartz/Percona templates)
      • MONyog
      • Good ol’ RRD
  • 9. Opsview/Nagios
      • Strong points:
          • Easy to create (Nagios) plugins
          • Slaves for scaling out
      • Weak points:
          • Stats gathering through polling
          • Low granularity (1 to 5 minutes)
          • Difficult URIs for graphs
  • 10. Cacti
      • Strong points:
          • Awesome Percona templates
          • Great overviews and graphs
      • Weak points:
          • Hard to add new metrics (to 90+ servers)
          • Not scalable
          • Low granularity (1 to 5 minutes)
          • Hard to correlate
  • 11. MONyog
      • Strong points:
          • Easy to set up
          • Compare any server with another
          • Compare configurations
      • Weak points:
          • “Closed source”
          • Not scalable
          • Jack of all trades
  • 12. Poll limitations
      • Limited to a set interval
      • Data gets averaged out
      • (Host) checks are run serially
      • Slowdowns in a run mean no/less data
      • Scaling: add more masters/slaves
      • Setting up an SSH connection is slow
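The point that polled data gets averaged out can be made concrete with a small sketch. The numbers are invented for illustration: a query rate sampled every 2 seconds with one short spike, compared with the single averaged value a 5-minute poll would record.

```python
# Hypothetical data: queries per second, sampled every 2 seconds for 5 minutes.
SAMPLE_INTERVAL = 2    # seconds, the fine-grained interval
POLL_INTERVAL = 300    # seconds, a typical 5-minute poller

samples = [1000] * (POLL_INTERVAL // SAMPLE_INTERVAL)  # 150 samples of steady load
samples[70:75] = [9000] * 5                            # a 10-second spike

peak = max(samples)
average = sum(samples) / len(samples)

print(f"peak seen at 2s granularity: {peak}")
print(f"value stored by a 5-minute poll: {average:.0f}")
```

With these made-up values the spike is nine times the baseline, yet the averaged figure ends up only about a quarter above it, which is easy to overlook on a graph.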
  • 13. Difficult to add a new metric
      host065
      bash-3.2# netstat -s | grep "listen queue"
          26 times the listen queue of a socket overflowed

      host066
      bash-3.2# netstat -s | grep "listen queue"
          33 times the listen queue of a socket overflowed
  • 14. Other things you can’t do!
  • 15. StatsD + Collectd + Graphite: What are they?
  • 16. What is Graphite?
      • Highly scalable real-time graphing system
      • Collects numeric time-series
      • Backend daemon Carbon
          • Carbon-cache: receives data
          • Carbon-aggregator: aggregates data
          • Carbon-relay: replication and sharding
      • RRD or Whisper database
  • 17. Graphite’s capabilities
      • Each metric is in its own bucket
      • Periods make folders
          • prod.syseng.mmm.<hostname>.admin_offline
      • Metric types
          • Counters
          • Gauge
      • Retention can be set using a regex
          [mysql]
          pattern = ^prod.syseng.mysql..*$
          retentions = 2s:1d,1m:3d,5m:7d,1h:5y
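To get a feel for what the retention line on slide 17 implies, the following sketch parses the retention string and counts the data points each archive holds. It is an illustration, not Graphite's own parser; the function names are made up, and the unit table follows the Whisper convention of a 365-day year.

```python
# Seconds per unit, following the Whisper convention (1y = 365 days).
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}

def parse_duration(text):
    """Convert a string such as '2s' or '5y' into seconds."""
    return int(text[:-1]) * UNIT_SECONDS[text[-1]]

def parse_retentions(definition):
    """Return a (seconds_per_point, number_of_points) pair for each archive."""
    archives = []
    for archive in definition.split(","):
        precision, duration = archive.strip().split(":")
        step = parse_duration(precision)
        archives.append((step, parse_duration(duration) // step))
    return archives

archives = parse_retentions("2s:1d,1m:3d,5m:7d,1h:5y")
for step, points in archives:
    print(f"{step:>5}s per point, {points} points")
total_points = sum(points for _, points in archives)
print("total points per metric:", total_points)
```

Because Whisper preallocates every archive at a fixed size, the total point count is a rough indicator of the per-metric disk cost, which matters once the number of metrics grows.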
  • 18. What is Collectd?
      • Unix daemon that gathers system statistics
      • Over 90 (input/output) plugins
      • Plugin to send metrics to Graphite/Carbon
      • Very useful for system metrics
  • 19. What is StatsD?
      • Front-end proxy for Graphite/Carbon (by Etsy)
      • NodeJS daemon (also available in other languages)
      • Receives UDP (on localhost)
      • Buffers metrics locally
      • Periodically flushes data to Graphite/Carbon (TCP)
      • Client libraries available in just about any language
      • Send any metric you like!
  • 20. StatsD functions
      • update_stats
      • increment/decrement
      • set
      • gauge
      • timers
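Under the hood, the StatsD functions listed above all produce a short plain-text datagram of the form name:value|type. The sketch below shows that wire format using only the Python standard library. The helper names format_metric and send_metric are made up for this example, and the metric names are borrowed from the PHP slide.

```python
import socket

# Type suffixes of the plain-text StatsD line protocol.
TYPE_SUFFIX = {"counter": "c", "gauge": "g", "timer": "ms", "set": "s"}

def format_metric(name, value, kind):
    """Build one StatsD datagram payload, e.g. 'prod.app1.pages_rendered:1|c'."""
    return f"{name}:{value}|{TYPE_SUFFIX[kind]}"

def send_metric(name, value, kind, host="localhost", port=8125):
    """Fire-and-forget send: UDP needs no connection and waits for no reply."""
    payload = format_metric(name, value, kind).encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))

# Show the payloads without needing a running StatsD daemon.
print(format_metric("prod.app1.pages_rendered", 1, "counter"))
print(format_metric("prod.app1.page_concurrency", 10, "gauge"))
print(format_metric("prod.app1.rendering_time", 12.5, "timer"))
```

Calling send_metric("prod.app1.pages_rendered", 1, "counter") would emit the first payload to a StatsD daemon on localhost:8125. Since nothing is read back, a slow or absent daemon does not block the application.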
  • 21. StatsD PHP code examples
      PHP:
      $statsd = new StatsD();
      $statsd->increment("prod.app1.pages_rendered", 1);
      $statsd->gauge("prod.app1.page_concurrency", 10);
      $statsd->set("prod.app1.unique_users", $userid);
      …
      $start = microtime(true);
      serve_out_content_to_clients();
      $statsd->timing("prod.app1.rendering_time", (microtime(true) - $start) * 1000);

      Library:
      https://github.com/etsy/statsd/blob/master/examples/php-example.php
  • 22. Our Graphite cluster(s) [diagram; labels: client requesting graphs, Graphite rendering cluster, Carbon relay, loadbalancer (port 443), DEV, SYSENG, SERVICES1, SERVICES2, Server-1, Server-2, Server-n, loadbalancer (port 2003), 8 nodes, 3 nodes, 2 nodes]
  • 23. Graphite Storage Clusters
  • 24. Collectd [diagram; labels: Collectd, gather data plugins (CPU, DISK, LOAD, …), Carbon TCP, 30 second interval]
  • 25. StatsD [diagram; labels: StatsD, application level, # of logins, cache hit/miss, status, InnoDB status, Carbon TCP, 2 second interval, MySQL_Statsd, localhost:8125 UDP]
  • 26. Global scale?
  • 27. MySQL + StatsD: How do we use them?
  • 28. Why use StatsD over Collectd?
      • MySQL plugin for Collectd
          • Sends SHOW STATUS
          • No INNODB STATUS
          • Plugin not flexible
      • DBI plugin for Collectd
          • Metrics based on columns
      • Different granularity needed
      • Separate daemon (with persistent connection)
      • StatsD is easy as ABC
  • 29. MySQL StatsD daemon
      • Written in Python
      • Gathers data every 0.5 seconds
      • Sends to StatsD (localhost) after every run
      • Easy to set up: no configuration
      • Persistent connection
      • Baron Schwartz’ InnoDB status parser (Cacti poller)
      • Other interesting metrics and counters
          • Information Schema
          • MySQL 5.5/5.6 Performance Schema
          • MariaDB specific
          • Galera specific
  • 30. MySQL StatsD overview [diagram; labels: MySQLCollector, SHOW STATUS, SHOW INNODB STATUS, SHOW VARIABLES, persistent connection, StatsD, flushed every 0.5 seconds]
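The slides note that the MySQL StatsD daemon is not open source yet, so the following is only a guess at the core idea, not the author's code: take the rows a SHOW GLOBAL STATUS query would return and turn them into StatsD lines, sending gauges as-is and counters as the increase since the previous run. The function name, the metric prefix and the short list of gauges are assumptions made for this sketch.

```python
# Status variables that are point-in-time values rather than ever-growing counters.
# This short list is an illustrative assumption, not the daemon's real configuration.
GAUGES = {"Threads_connected", "Threads_running"}

def status_to_statsd(rows, previous, prefix="prod.syseng.mysql.host1.status"):
    """Turn SHOW GLOBAL STATUS rows into StatsD lines.

    rows: iterable of (variable_name, value) pairs, as a MySQL cursor would return.
    previous: dict of counter values from the last run, updated in place.
    """
    lines = []
    for name, raw in rows:
        try:
            value = int(raw)
        except ValueError:
            continue  # skip non-numeric variables such as 'ON' or empty strings
        metric = f"{prefix}.{name.lower()}"
        if name in GAUGES:
            lines.append(f"{metric}:{value}|g")
        else:
            # Counters only grow, so send the increase since the last run.
            if name in previous and value >= previous[name]:
                lines.append(f"{metric}:{value - previous[name]}|c")
            previous[name] = value
    return lines

previous = {}
first = status_to_statsd(
    [("Questions", "1000"), ("Threads_running", "3"), ("Ssl_cipher", "")], previous
)
second = status_to_statsd([("Questions", "1030"), ("Threads_running", "5")], previous)
print(first)
print(second)
```

The first run only records a baseline for counters, so nothing is sent for Questions until the second run. A real collector would run this in a loop over a persistent connection and push each line to StatsD over UDP.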
  • 31. MySQL Multi Master patch
      • Perl (Net::Statsd)
      • Sends any status change to StatsD (localhost)
      • Non-blocking (thanks to UDP)
      • Draw as infinite in Graphite
  • 32. MMM Perl code example
      use Net::Statsd;
      $Net::Statsd::HOST = 'localhost'; # Default
      $Net::Statsd::PORT = 8125; # Default

      …

      # ONLINE -> HARD_OFFLINE
      unless ($ping && $mysql) {
          Net::Statsd::update_stats('prod.syseng.mmm.'.$host.'.hard_offline', 1);
          FATAL sprintf("State of host '%s' changed from %s to HARD_OFFLINE (ping: %s, mysql: %s)", $host, $state, ($ping? 'OK' : 'not OK'), ($mysql? 'OK' : 'not OK'));
          $agent->state('HARD_OFFLINE');
      }

      …
  • 33. Other metrics
      • Deployments
      • User initiated actions
          • Logins
          • High scores
          • Comments / ratings
          • Images uploaded
          • Payments
      • Application metrics
          • Error counts
          • Cache statistics (cache hit/miss)
          • Request timers
          • Image sizes
  • 34. Start graphing! Now it starts to get interesting!
  • 35. What is important for you?
      • Identify your KPIs
      • Don’t graph everything
      • More graphs == less overview
      • Combine metrics
      • Stack clusters
  • 36. Correlate!
      • Include other metrics in your graphs
          • Deployments
          • Failover(s)
      • Combine application metrics with your database
      • Other influences
          • Solar flares
          • Start of the new Maya calendar
  • 37. Graphite Graphing Engine
      • URI based rendering API
      • Support for wildcards
          • stats.prod.syseng.mysql.*.status.com_select
          • sumSeries(stats.prod.syseng.mysql.*.status.com_select)
          • aliasByNode(stats.prod.syseng.mysql.*.status.com_select, 4)
      • Many functions
          • Nth percentile
          • Holt-Winters Forecast
          • Timeshift
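Because the rendering API is URI based, a graph is just a URL with one or more target parameters. The sketch below builds such a URL with the Python standard library. The render_url helper is made up for this example, and the host name, alias text and time range are placeholders. The target expressions are the ones from slide 37.

```python
from urllib.parse import urlencode

def render_url(base, targets, **params):
    """Build a Graphite render URL; each target becomes its own 'target' parameter."""
    query = [("target", t) for t in targets] + sorted(params.items())
    return f"{base}/render/?{urlencode(query)}"

url = render_url(
    "https://graphitehost",
    [
        'alias(sumSeries(stats.prod.syseng.mysql.*.status.com_select),"Selects, all hosts")',
        "aliasByNode(stats.prod.syseng.mysql.*.status.com_select, 4)",
    ],
    width=722,
    height=357,
    **{"from": "-1d"},  # 'from' is a Python keyword, so it is passed via a dict
)
print(url)
```

The first target sums the wildcard match into one line for the whole cluster. The second keeps one line per host and labels each with path element 4 (counting from 0), which is the host name in this naming scheme.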
  • 38. Graphite Aggregator
      syseng => {
          nodes => ["databasehost1", "databasehost2"],
          copying_relay_instances => 8,
          hashing_relay_instances => 8,
          cache_instances => 8,
          aggregation => {
              0 => {
                  name => "mysql",
                  pattern => '.*.mysql..*',
                  send_raw => 1,
              },
          }
      }

      stats.<env>.syseng.mysql.cluster1.status.questions.all (2) = sum stats.<env>.syseng.mysql.*.status.questions
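The aggregation rule at the end of slide 38 reads as: every 2 seconds, sum the questions metric of all hosts into one cluster-wide metric. The following sketch imitates that behaviour for a single window. It is not how carbon-aggregator is implemented, and the host names and values are invented.

```python
import re
from collections import defaultdict

# Hypothetical per-host values received during one 2-second aggregation window.
received = {
    "stats.prod.syseng.mysql.db1.status.questions": 120,
    "stats.prod.syseng.mysql.db2.status.questions": 80,
    "stats.prod.syseng.mysql.db1.status.com_select": 60,
}

# Input pattern and output template mirroring the rule on the slide.
INPUT = re.compile(r"^stats\.(?P<env>[^.]+)\.syseng\.mysql\.[^.]+\.status\.questions$")
OUTPUT = "stats.{env}.syseng.mysql.cluster1.status.questions.all"

def aggregate(metrics):
    """Sum every metric matching INPUT into the single OUTPUT metric."""
    totals = defaultdict(int)
    for name, value in metrics.items():
        match = INPUT.match(name)
        if match:
            totals[OUTPUT.format(env=match.group("env"))] += value
    return dict(totals)

print(aggregate(received))
```

Pre-aggregating on write means a cluster graph reads a single series instead of expanding a wildcard over every host at render time.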
  • 39. Graphite web interface
  • 40. Graphite Example URL
      https://graphitehost/render/?width=722&height=357&_salt=1366550446.553&rightDashed=1&target=alias%28sumSeries%28stats.prod.services.profilar.request.total.count.*%29%2C%22Number%20of%20profile%20requests%22%29&target=alias%28secondYAxis%28sumSeries%28stats_counts.prod.syseng.mysql.<node1>.status.questions%2C%20stats_counts.prod.syseng.mysql.<node2>.status.questions%29%29%2C%22Number%20of%20queries%20profiles%20cluster%22%29&from=00%3A00_20130415&until=23%3A59_20130421
  • 41. Graphite Example URL
      https://graphitehost/render/?width=722&height=357&_salt=1366550446.553&rightDashed=1&target=alias%28sumSeries%28stats.prod.services.profilar.request.total.count.*%29%2C%22Number%20of%20profile%20requests%22%29&target=alias%28secondYAxis%28sumSeries%28stats_counts.prod.syseng.mysql.<node1>.status.questions%2C%20stats_counts.prod.syseng.mysql.<node2>.status.questions%29%29%2C%22Number%20of%20queries%20profiles%20cluster%22%29&from=00%3A00_20130415&until=23%3A59_20130421
  • 42. Other examples: MMM
  • 43. Other examples: timeshift
  • 44. Other examples: multiple weeks
  • 45. Challenges: The road ahead
  • 46. What challenges do we have?
      • MySQL_statsd rewrite necessary (not open source yet)
      • No alerting through Graphite (yet)
      • Machine learning
      • Eternal hunger for more metrics
      • Abuse of the system
  • 47. What lessons have we learned?
      • Persistent connections + repeatable read
          • History list skyrocketed
      • Too many metrics slow down graphing
      • Too many metrics can kill a host
      • EstatsD for Erlang
  • 48. Questions…
  • 49. Practical links
      • Graphite: http://graphite.readthedocs.org/en/latest/
      • Collectd: https://collectd.org/
      • StatsD on Github by Etsy: https://github.com/etsy/statsd/wiki
      • Etsy on StatsD: http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/
  • 50. Thank you!
      • Presentation can be found at: http://spil.com/perconasc2013
      • If you wish to contact me: art@spilgames.com
      • Don’t forget to rate my talk!