MySQL Performance monitoring using StatsD and Graphite
Art van Scheppingen
Head of Database Engineering

Overview
1. Who are we?
2. What monitoring tools do we use?
3. What are StatsD, Collectd and Graphite?
4. How MySQL logs to StatsD
5. Graphing examples
6. Challenges
7. Questions?

Who are we?
Who is Spil Games?

Facts
• Company founded in 2001
• 350+ employees worldwide
• 180M+ unique visitors per month
• Over 50M registered users
• 45 portals in 19 languages
• Casual games
• Social games
• Real-time multiplayer games
• Mobile games
• 35+ MySQL clusters
• 60k queries per second (3.5 billion queries per day)

Geographic Reach
180 Million Monthly Active Users(*)
Source: (*) Google Analytics, August 2012

Brands
Girls, Teens and Family
• spielen.com
• juegos.com
• gamesgames.com
• games.co.uk

Monitoring
We use(d) many, many, many monitoring tools so far!

Existing monitoring systems we use(d)
• Opsview/Nagios (mainly availability)
• Cacti (using Baron Schwartz/Percona templates)
• MONyog
• Good ol’ RRD

Opsview/Nagios
• Strong points:
  • Easy to create (Nagios) plugins
  • Slaves for scaling out
• Weak points:
  • Stats gathering through polling
  • Low granularity (1 to 5 minutes)
  • Difficult URIs for graphs

Cacti
• Strong points:
  • Awesome Percona templates
  • Great overviews and graphs
• Weak points:
  • Hard to add new metrics (to 90+ servers)
  • Not scalable
  • Low granularity (1 to 5 minutes)
  • Hard to correlate

MONyog
• Strong points:
  • Easy to set up
  • Compare any server with another
  • Compare configurations
• Weak points:
  • “Closed source”
  • Not scalable
  • Jack of all trades

Poll limitations
• Limited to a set interval
• Data gets averaged out
• (Host) checks are run serially
• Slowdowns in a run mean less or no data
• Scaling: add more masters/slaves
• Setting up an SSH connection is slow
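
To make "data gets averaged out" concrete, here is a minimal sketch with made-up numbers: a steady 100 queries per second with a 10-second burst of 1000 inside one 5-minute polling window. The figures and function names are illustrative assumptions, not measurements from our systems.

```python
# Per-second query rates over one 5-minute polling window (300 samples).
# Hypothetical numbers: a steady 100 qps with a 10-second burst of 1000 qps.
samples = [100] * 300
samples[120:130] = [1000] * 10


def average(values):
    """Mean of the window, which is all a single 5-minute poll can report."""
    return sum(values) / len(values)


def bucket_max(values, width):
    """Peak per bucket of `width` seconds, as a finer-grained collector sees it."""
    return [max(values[i:i + width]) for i in range(0, len(values), width)]


five_minute_view = average(samples)        # one point for the whole window
two_second_view = bucket_max(samples, 2)   # 150 points for the same window

print(five_minute_view)       # 130.0
print(max(two_second_view))   # 1000
```

The 5-minute poll reports 130, which hides a tenfold burst that the 2-second buckets still show.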
  
Difficult to add a new metric
host065
bash-3.2# netstat -s | grep "listen queue"
    26 times the listen queue of a socket overflowed

host066
bash-3.2# netstat -s | grep "listen queue"
    33 times the listen queue of a socket overflowed
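
A counter like this becomes easy to ship once something can push it. The sketch below parses the `netstat -s` text shown above and formats it as a StatsD gauge line. The metric path `prod.syseng.system.<host>.listen_queue_overflows` and both function names are assumptions for illustration.

```python
import re

# Matches lines such as "26 times the listen queue of a socket overflowed".
LISTEN_OVERFLOW = re.compile(
    r"^\s*(\d+) times the listen queue of a socket overflowed"
)


def listen_queue_overflows(netstat_output):
    """Return the overflow counter from `netstat -s` text, or None if absent."""
    for line in netstat_output.splitlines():
        match = LISTEN_OVERFLOW.match(line)
        if match:
            return int(match.group(1))
    return None


def as_statsd_gauge(host, value):
    """Format the counter as a StatsD gauge line (hypothetical metric path)."""
    return "prod.syseng.system.%s.listen_queue_overflows:%d|g" % (host, value)


sample = "    26 times the listen queue of a socket overflowed\n"
count = listen_queue_overflows(sample)
line = as_statsd_gauge("host065", count)
print(line)
```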

Other things you can’t do!

StatsD + Collectd + Graphite
What are they?

What is Graphite?
• Highly scalable real-time graphing system
• Collects numeric time-series
• Backend daemon Carbon
  • Carbon-cache: receives data
  • Carbon-aggregator: aggregates data
  • Carbon-relay: replication and sharding
• RRD or Whisper database

Graphite’s capabilities
• Each metric is in its own bucket
• Periods make folders
  • prod.syseng.mmm.<hostname>.admin_offline
• Metric types
  • Counters
  • Gauge
• Retention can be set using a regex
  • [mysql]
  • pattern = ^prod.syseng.mysql..*$
  • retentions = 2s:1d,1m:3d,5m:7d,1h:5y
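
To see what that retention string costs in storage, the sketch below works out how many data points each archive keeps. It assumes Whisper's unit convention (a year counts as 365 days); the function names are made up for this example.

```python
# Seconds per unit, following Whisper's convention (a year counts as 365 days).
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def to_seconds(spec):
    """Convert a spec such as '5m' or '1d' into seconds."""
    return int(spec[:-1]) * UNIT_SECONDS[spec[-1]]


def parse_retentions(definition):
    """Return (seconds_per_point, points_kept) for each archive in the string."""
    archives = []
    for archive in definition.split(","):
        precision, duration = archive.strip().split(":")
        step = to_seconds(precision)
        archives.append((step, to_seconds(duration) // step))
    return archives


archives = parse_retentions("2s:1d,1m:3d,5m:7d,1h:5y")
for step, points in archives:
    print("%5d s per point, %6d points" % (step, points))
```

The 2-second archive alone holds 43,200 points per metric per day, which is why high-resolution data is only kept for a short time.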

What is Collectd?
• Unix daemon that gathers system statistics
• Over 90 (input/output) plugins
• Plugin to send metrics to Graphite/Carbon
• Very useful for system metrics

What is StatsD?
• Front-end proxy for Graphite/Carbon (by Etsy)
• NodeJS daemon (also available in other languages)
• Receives UDP (on localhost)
• Buffers metrics locally
• Periodically flushes data to Graphite/Carbon (TCP)
• Client libraries available in almost any language
• Send any metric you like!
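
The buffer-and-flush behaviour can be sketched in a few lines. This is a toy model, not StatsD's actual code: the `Buffer` class is hypothetical, it handles only counters and gauges, and it assumes the legacy output namespace (`stats.`, `stats_counts.`, `stats.gauges.`) that matches the metric paths used later in these slides.

```python
from collections import defaultdict


class Buffer:
    """Toy in-memory aggregator: sums counters, keeps the last gauge value."""

    def __init__(self, flush_interval):
        self.flush_interval = flush_interval  # seconds between flushes
        self.counters = defaultdict(float)
        self.gauges = {}

    def receive(self, datagram):
        """Parse one '<bucket>:<value>|<type>' line, as it would arrive over UDP."""
        bucket, rest = datagram.split(":", 1)
        value, kind = rest.split("|")[:2]
        if kind == "c":
            self.counters[bucket] += float(value)
        elif kind == "g":
            self.gauges[bucket] = float(value)

    def flush(self, timestamp):
        """Return Carbon plaintext lines ('path value timestamp'), reset counters."""
        lines = []
        for bucket, count in sorted(self.counters.items()):
            rate = count / self.flush_interval
            lines.append("stats.%s %s %d" % (bucket, rate, timestamp))
            lines.append("stats_counts.%s %s %d" % (bucket, count, timestamp))
        for bucket, value in sorted(self.gauges.items()):
            lines.append("stats.gauges.%s %s %d" % (bucket, value, timestamp))
        self.counters.clear()
        return lines


buf = Buffer(flush_interval=2)
for _ in range(3):
    buf.receive("prod.app1.pages_rendered:1|c")
buf.receive("prod.app1.page_concurrency:10|g")
flushed = buf.flush(timestamp=1366550446)
print("\n".join(flushed))
```

Three increments within one 2-second interval come out as a rate of 1.5 per second under `stats.` and a raw count of 3 under `stats_counts.`.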

StatsD functions
• update_stats
• increment/decrement
• set
• gauge
• timers
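
Each of these functions boils down to a one-line text datagram of the form `<bucket>:<value>|<type>`. The helper names below are made up for this sketch; the type suffixes (`c`, `g`, `s`, `ms`) are the ones the StatsD wire format uses.

```python
def counter(bucket, delta=1):
    """increment / decrement / update_stats: add `delta` (may be negative)."""
    return "%s:%d|c" % (bucket, delta)


def gauge(bucket, value):
    """gauge: record the current value as-is."""
    return "%s:%s|g" % (bucket, value)


def unique(bucket, member):
    """set: count the distinct members seen per flush interval."""
    return "%s:%s|s" % (bucket, member)


def timer(bucket, milliseconds):
    """timers: record a duration in whole milliseconds."""
    return "%s:%d|ms" % (bucket, round(milliseconds))


datagrams = [
    counter("prod.app1.pages_rendered"),
    counter("prod.app1.pages_rendered", -1),
    gauge("prod.app1.page_concurrency", 10),
    unique("prod.app1.unique_users", 4711),
    timer("prod.app1.rendering_time", 12.6),
]
for datagram in datagrams:
    print(datagram)
```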

StatsD PHP code examples
PHP:
$statsd = new StatsD();
$statsd->increment("prod.app1.pages_rendered", 1);
$statsd->gauge("prod.app1.page_concurrency", 10);
$statsd->set("prod.app1.unique_users", $userid);
…
$start = microtime(true);
serve_out_content_to_clients();
$statsd->timing("prod.app1.rendering_time", (microtime(true) - $start) * 1000);

Library:
https://github.com/etsy/statsd/blob/master/examples/php-example.php
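
Our Graphite cluster(s)
[Diagram: clients requesting graphs reach the Graphite rendering cluster through a load balancer (port 443). Server-1, Server-2 … Server-n send metrics through a load balancer (port 2003) to the Carbon relay, which feeds the storage clusters DEV, SYSENG, SERVICES1 and SERVICES2. Node counts shown: 8 nodes, 3 nodes, 2 nodes.]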

Graphite Storage Clusters

Collectd
[Diagram: Collectd's data-gathering plugins (CPU, DISK, LOAD, …) send to Carbon over TCP at a 30 second interval.]

StatsD
[Diagram: application-level metrics (# of logins, cache hit/miss) and MySQL_Statsd (STATUS, INNODB STATUS) are sent over UDP to StatsD on localhost:8125. StatsD sends to Carbon over TCP at a 2 second interval.]

Global scale?

MySQL + StatsD
How do we use them?

Why use StatsD over Collectd?
• MySQL plugin for Collectd
  • Sends SHOW STATUS
  • No INNODB STATUS
  • Plugin not flexible
• DBI plugin for Collectd
  • Metrics based on columns
• Different granularity needed
• Separate daemon (with persistent connection)
• StatsD is easy as ABC

MySQL StatsD daemon
• Written in Python
• Gathers data every 0.5 seconds
• Sends to StatsD (localhost) after every run
• Easy to set up: no configuration
• Persistent connection
• Baron Schwartz’ InnoDB status parser (Cacti poller)
• Other interesting metrics and counters
  • Information Schema
  • MySQL 5.5/5.6 Performance Schema
  • MariaDB specific
  • Galera specific
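
The daemon itself is not public, so the following is only a sketch of the core idea under stated assumptions: take two consecutive `SHOW STATUS` samples (represented here as plain dicts), turn the cumulative counters into deltas and format them as StatsD counter lines. The function names and the `db1` host are hypothetical, and real code would also need to treat gauge-like variables such as `Threads_connected` differently.

```python
def counter_deltas(previous, current):
    """Difference between two SHOW STATUS samples, skipping non-numeric values.

    A counter that went backwards (for example after a server restart) is
    reported from zero instead of as a negative delta.
    """
    deltas = {}
    for name, raw in current.items():
        if name not in previous:
            continue
        try:
            now, before = int(raw), int(previous[name])
        except ValueError:
            continue
        deltas[name.lower()] = now - before if now >= before else now
    return deltas


def to_statsd_lines(host, deltas, prefix="prod.syseng.mysql"):
    """Format deltas as StatsD counter lines under <prefix>.<host>.status.*"""
    return ["%s.%s.status.%s:%d|c" % (prefix, host, name, delta)
            for name, delta in sorted(deltas.items())]


# Two samples taken 0.5 seconds apart (values arrive from MySQL as strings).
first = {"Com_select": "1000", "Questions": "5000", "Ssl_cipher": ""}
second = {"Com_select": "1030", "Questions": "5100", "Ssl_cipher": ""}

lines = to_statsd_lines("db1", counter_deltas(first, second))
for line in lines:
    print(line)
```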

MySQL StatsD overview
[Diagram: MySQLCollector runs SHOW STATUS, SHOW INNODB STATUS and SHOW VARIABLES over a persistent connection; the results are flushed to StatsD every 0.5 seconds.]

MySQL Multi Master patch
• Perl (Net::Statsd)
• Sends any status change to StatsD (localhost)
• Non-blocking (thanks to UDP)
• Draw as infinite in Graphite

MMM Perl code example
use Net::Statsd;
$Net::Statsd::HOST = 'localhost'; # Default
$Net::Statsd::PORT = 8125; # Default

…

# ONLINE -> HARD_OFFLINE
unless ($ping && $mysql) {
    Net::Statsd::update_stats('prod.syseng.mmm.'.$host.'.hard_offline', 1);
    FATAL sprintf("State of host '%s' changed from %s to HARD_OFFLINE (ping: %s, mysql: %s)", $host, $state, ($ping? 'OK' : 'not OK'), ($mysql? 'OK' : 'not OK'));
    $agent->state('HARD_OFFLINE');
}

…

Other metrics
• Deployments
• User initiated actions
  • Logins
  • High scores
  • Comments / ratings
  • Images uploaded
  • Payments
• Application metrics
  • Error counts
  • Cache statistics (cache hit/miss)
  • Request timers
  • Image sizes

Start graphing!
Now it starts to get interesting!

What is important for you?
• Identify your KPIs
• Don’t graph everything
  • More graphs == less overview
• Combine metrics
• Stack clusters

Correlate!
• Include other metrics into your graphs
  • Deployments
  • Failover(s)
• Combine application metrics with your database
• Other influences
  • Solar flares
  • Start of the new Maya calendar

Graphite Graphing Engine
• URI based rendering API
• Support for wildcards
  • stats.prod.syseng.mysql.*.status.com_select
  • sumSeries(stats.prod.syseng.mysql.*.status.com_select)
  • aliasByNode(stats.prod.syseng.mysql.*.status.com_select, 4)
• Many functions
  • Nth percentile
  • Holt-Winters Forecast
  • Timeshift
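
Because the rendering API is just a URL, graphs can be generated from any script. The sketch below builds a render URL from the targets listed above. `render_url` is a hypothetical helper, `graphitehost` is a placeholder host name, and the `width`, `height` and `from` values are example choices.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def render_url(base, targets, params=None):
    """Build a Graphite render URL; each target becomes its own `target=` pair."""
    query = [("target", target) for target in targets]
    query += sorted((params or {}).items())
    return base.rstrip("/") + "/render/?" + urlencode(query)


targets = [
    "sumSeries(stats.prod.syseng.mysql.*.status.com_select)",
    # Node 4 of stats.prod.syseng.mysql.<host>... is the host name.
    "aliasByNode(stats.prod.syseng.mysql.*.status.com_select, 4)",
]
url = render_url(
    "https://graphitehost",
    targets,
    {"width": 722, "height": 357, "from": "-1h"},
)
print(url)

# Decoding the query string gives back the original targets.
decoded = parse_qs(urlsplit(url).query)
print(decoded["target"])
```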

Graphite Aggregator
syseng => {
    nodes => ["databasehost1", "databasehost2"],
    copying_relay_instances => 8,
    hashing_relay_instances => 8,
    cache_instances => 8,
    aggregation => {
        0 => {
            name => "mysql",
            pattern => '.*.mysql..*',
            send_raw => 1,
        },
    }
}

stats.<env>.syseng.mysql.cluster1.status.questions.all (2) = sum stats.<env>.syseng.mysql.*.status.questions
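
The aggregation rule above reads as: every 2 seconds, sum the `questions` counter of every host into a single cluster-wide series. The sketch below mimics that rule in plain Python to show the effect. It is not Carbon's implementation, and the `aggregate` function and the sample values are made up.

```python
import re
from collections import defaultdict

# Matches stats.<env>.syseng.mysql.<host>.status.<metric> and captures the parts.
RULE = re.compile(
    r"^stats\.(?P<env>[^.]+)\.syseng\.mysql\.(?P<host>[^.]+)"
    r"\.status\.(?P<metric>[^.]+)$"
)


def aggregate(points, cluster):
    """Sum per-host values into stats.<env>.syseng.mysql.<cluster>.status.<metric>.all"""
    totals = defaultdict(float)
    for path, value in points:
        match = RULE.match(path)
        if match:
            output = "stats.%s.syseng.mysql.%s.status.%s.all" % (
                match.group("env"), cluster, match.group("metric"))
            totals[output] += value
    return dict(totals)


# One 2-second interval of incoming points (hypothetical values).
points = [
    ("stats.prod.syseng.mysql.databasehost1.status.questions", 120.0),
    ("stats.prod.syseng.mysql.databasehost2.status.questions", 80.0),
    ("stats.prod.syseng.mmm.databasehost1.hard_offline", 1.0),  # no match
]
totals = aggregate(points, "cluster1")
print(totals)
```

Pre-aggregating like this means a cluster-wide graph reads one series instead of expanding a wildcard at render time.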

Graphite web interface

Graphite Example URL
https://graphitehost/render/?width=722&height=357&_salt=1366550446.553&rightDashed=1&target=alias%28sumSeries%28stats.prod.services.profilar.request.total.count.*%29%2C%22Number%20of%20profile%20requests%22%29&target=alias%28secondYAxis%28sumSeries%28stats_counts.prod.syseng.mysql.<node1>.status.questions%2C%20stats_counts.prod.syseng.mysql.<node2>.status.questions%29%29%2C%22Number%20of%20queries%20profiles%20cluster%22%29&from=00%3A00_20130415&until=23%3A59_20130421
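
The encoded URL is hard to read, so here is a small sketch that decodes it with Python's standard library and prints the two targets and the time range. `<node1>` and `<node2>` are the placeholders from the slide, not real host names.

```python
from urllib.parse import urlsplit, parse_qs

url = (
    "https://graphitehost/render/?width=722&height=357"
    "&_salt=1366550446.553&rightDashed=1"
    "&target=alias%28sumSeries%28"
    "stats.prod.services.profilar.request.total.count.*%29%2C"
    "%22Number%20of%20profile%20requests%22%29"
    "&target=alias%28secondYAxis%28sumSeries%28"
    "stats_counts.prod.syseng.mysql.<node1>.status.questions%2C"
    "%20stats_counts.prod.syseng.mysql.<node2>.status.questions%29%29%2C"
    "%22Number%20of%20queries%20profiles%20cluster%22%29"
    "&from=00%3A00_20130415&until=23%3A59_20130421"
)

# parse_qs percent-decodes values and collects repeated keys into lists.
params = parse_qs(urlsplit(url).query)
for target in params["target"]:
    print(target)
print(params["from"][0], "to", params["until"][0])
```

Decoded, the graph overlays the summed profile requests with the summed query counts of two database nodes, the latter on a second Y axis, for the week of 15 to 21 April 2013.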

Other examples: MMM
Other examples: timeshift
Other examples: multiple weeks

Challenges
The road ahead

What challenges do we have?
• MySQL_statsd rewrite necessary (not open source yet)
• No alerting through Graphite (yet)
• Machine learning
• Eternal hunger for more metrics
• Abuse of the system

What lessons have we learned?
• Persistent connections + repeatable read
  • History list skyrocketed
• Too many metrics slow down graphing
• Too many metrics can kill a host
• EstatsD for Erlang
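
The slides do not say how the history list problem was solved. One common mitigation, assuming the growth came from a transaction left open on the persistent monitoring connection, is to lower the isolation level and enable autocommit right after connecting. The sketch below uses a stand-in cursor so it runs without a database; `prepare_monitoring_session` and `RecordingCursor` are hypothetical names.

```python
# Statements to run once, right after the monitoring connection is opened.
SESSION_SETUP = (
    "SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED",
    "SET autocommit = 1",
)


def prepare_monitoring_session(cursor):
    """Run the session setup on any DB-API style cursor."""
    for statement in SESSION_SETUP:
        cursor.execute(statement)


class RecordingCursor:
    """Stand-in for a real cursor: records statements instead of running them."""

    def __init__(self):
        self.statements = []

    def execute(self, statement):
        self.statements.append(statement)


cursor = RecordingCursor()
prepare_monitoring_session(cursor)
print(cursor.statements)
```

With a real driver the same function would be called with the cursor of the long-lived connection before the first poll.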
Questions…

Practical links
• Graphite: http://graphite.readthedocs.org/en/latest/
• Collectd: https://collectd.org/
• StatsD on GitHub by Etsy: https://github.com/etsy/statsd/wiki
• Etsy on StatsD: http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/

Thank you!
• Presentation can be found at: http://spil.com/perconasc2013
• If you wish to contact me: art@spilgames.com
• Don’t forget to rate my talk!
