Metrics 2.0 @ Monitorama PDX 2014

Published in: Technology

  1. 1.    
  2. 2.     by niteroi @ panoramio.com
  3. 3.     vimeo.com/43800150
  4. 4.    
  5. 5.    
  6. 6.    
  7. 7.     Problems; Metrics 2.0 concepts; implementations & examples
  8. 8.     Mostly graphite
  9. 9.     terminology sync
  10. 10.     (1234567890, 82) (1234567900, 123) (1234567910, 109) (1234567920, 77); db15.mysql.queries_running; host=db15 mysql.queries_running
  11. 11.     Problems
  12. 12.     Vimeo.com page requests/s? Server X write perf?
  13. 13.     Finding metrics: browse hierarchies; dashboard search (which keywords?); search in source code/documentation? Ask around ...
  14. 14.     stats.hits.vimeo_com; stats_counts.hits.vimeo_com; stats.*.requesthostport.vimeo_com_80
  15. 15.     Understanding metrics: meaning, difference; unit?; where and how .. hard; prefixes
  16. 16.     collectd.db.disk.sda1.disk_time.write
  17. 17.     Understanding metrics: Terminology? Which field is where? Total so far? From zero per datapoint? Aggregate? Which? Point at t=x describes which timeframe?
  18. 18.     Change agent?
  19. 19.     Unclear, inconsistent terminology, format tightly coupled lack information
  20. 20.     O(S*P*A), where S = # sources, P = # people, A = # aggregators
  21. 21.    
  22. 22.    
  23. 23.     times N
  24. 24.     graph definitions are redundant and a time sink.
  25. 25.    
  26. 26.     http://litlquest.com/forest-trees/see-forest-trees-2
  27. 27.     metrics 2.0 concepts
  28. 28.     Self-describing; standardized; orthogonal dimensions
  29. 29.     stats.timers.dfs5.proxy-server.object.GET.200.timing.upper_90
  30. 30.     { server: dfvimeodfsproxy5, http_method: GET, http_code: 200, unit: ms, metric_type: gauge, stat: upper_90, swift_type: object }
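The contrast between the dotted name and the tag set on the two slides above can be sketched in a few lines of Python. The name and the tags are taken from the slides; the `describe` helper is a hypothetical illustration, not part of any tool mentioned in the talk.

```python
# Legacy name: the meaning of each field depends on its position.
legacy_name = "stats.timers.dfs5.proxy-server.object.GET.200.timing.upper_90"
http_method = legacy_name.split(".")[5]  # works only if you already know the layout

# Metrics 2.0 style: every dimension is an explicit key/value tag (from the slide).
metric = {
    "server": "dfvimeodfsproxy5",
    "http_method": "GET",
    "http_code": "200",
    "unit": "ms",
    "metric_type": "gauge",
    "stat": "upper_90",
    "swift_type": "object",
}


def describe(tags):
    """Hypothetical helper: build a readable label from the tags alone."""
    return (
        "{stat} of {http_method} {swift_type} requests "
        "with HTTP {http_code} on {server}, in {unit}"
    ).format(**tags)


print(describe(metric))
```

Because the unit and statistic are explicit tags, a tool could label axes or choose conversions without a per-metric rule, which is the point the "self-describing" slide makes.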
  31. 31.     Allow more characters: unit: Req/s, site: vimeo.com, ...
  32. 32.     Metadata: meta: { src: proxy.py:458, from: diamond }
  33. 33.     Conceptual model vs wire protocol vs storage
  34. 34.     metrics20.org
  35. 35.     SI + IEC; B, Err, Warn, Conn, Job, File, Req, ...; MB/s, Err/d, Req/h, ...
  36. 36.     Immediate understanding of metrics; minimize time to graphs, alerting rules, debugging; compatibility & flexibility in tooling
  37. 37.     Implementations & examples
  38. 38.    
  39. 39.     Carbon-tagger: … stats.gauges.host.foo 125 1234567890; service=foo instance=host target_type=gauge unit=B 123 1234567890 …
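A minimal sketch of reading such a tagged line, assuming the key=value pairs are whitespace-separated exactly as they appear in this transcript (the slide may simply have wrapped the text, and the real carbon-tagger wire format may differ). `parse_tagged_line` is a hypothetical name.

```python
def parse_tagged_line(line):
    """Split 'k=v k=v ... value timestamp' into (tags, value, timestamp).

    Assumption: tags are separated by whitespace, as printed in the transcript.
    """
    *pairs, value, timestamp = line.split()
    tags = dict(pair.split("=", 1) for pair in pairs)
    return tags, float(value), int(timestamp)


tags, value, ts = parse_tagged_line(
    "service=foo instance=host target_type=gauge unit=B 123 1234567890"
)
print(tags, value, ts)
```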
  40. 40.    
  41. 41.     Statsdaemon: unit=B, unit=B, ... unit=ms, unit=ms, ... become unit=B/s; unit=ms stat=mean; unit=ms stat=upper_90; ...
  42. 42.     Keep metric tags in sync with data
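One way to read "keep metric tags in sync with data": whenever an aggregator changes what the numbers mean, it updates the tags to match. The sketch below mirrors the unit changes listed on the Statsdaemon slide; both function names are hypothetical and this is not statsdaemon's actual code.

```python
def counter_to_rate(tags):
    """A counter flushed as a per-second rate: unit B becomes B/s."""
    out = dict(tags)
    out["unit"] = tags["unit"] + "/s"
    return out


def timer_to_stats(tags, stats=("mean", "upper_90")):
    """A timer summarized into several series: same unit, one stat tag each."""
    return [dict(tags, stat=s) for s in stats]


print(counter_to_rate({"unit": "B"}))
print(timer_to_stats({"unit": "ms"}))
```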
  43. 43.     Graph Explorer
  44. 44.    
  45. 45.     Graph Explorer queries 101: proxy-server swift server:regex unit=ms (AND)
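The query on this slide combines bare patterns, a `tag:regex` term and a `tag=value` term, all ANDed together. A rough sketch of that matching logic follows; it assumes a bare word matches when it occurs in any tag key or value, which may not be how Graph Explorer itself evaluates patterns, and the sample metrics are made up.

```python
import re


def matches(metric, query):
    """Return True if every whitespace-separated term of the query matches."""
    for term in query.split():
        if "=" in term:  # tag=value: exact match on one tag
            key, value = term.split("=", 1)
            ok = metric.get(key) == value
        elif ":" in term:  # tag:regex: regex search in one tag's value
            key, pattern = term.split(":", 1)
            ok = key in metric and re.search(pattern, metric[key]) is not None
        else:  # bare word: assumed to match any tag key or value
            ok = any(term in k or term in v for k, v in metric.items())
        if not ok:
            return False
    return True


# Made-up metrics for illustration only.
metrics = [
    {"server": "dfs5", "service": "proxy-server", "swift_type": "object", "unit": "ms"},
    {"server": "dfs6", "service": "proxy-server", "swift_type": "object", "unit": "B"},
    {"server": "db15", "service": "mysql", "unit": "ms"},
]
query = "proxy-server swift server:^dfs unit=ms"
selected = [m for m in metrics if matches(m, query)]
print(selected)
```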
  46. 46.    
  47. 47.    
  48. 48.    
  49. 49.    
  50. 50.    
  51. 51.    
  52. 52.    
  53. 53.     upper_90 (or stat=upper_90); from <datetime> to <datetime>; avg over <timespec> (5M, 1h, 3d, ...)
  54. 54.     Compare object put/get: stack … http_method:(PUT|GET) swift_type=object avg by http_code,server
  55. 55.    
  56. 56.     Comparing servers: http_method:(PUT|GET) group by unit,target_type avg by http_code,swift_type,http_method
  57. 57.    
  58. 58.     transcode unit=Job/s avg over <time> from <datetime> to <datetime>
  59. 59.     Note: data is obfuscated
  60. 60.     Bucketing: sum by zone:eu-west|us-east|ap-southeast|us-west|sa-east|vimeo-df|vimeo-lv group by state
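The bucketing query can be read as: map each zone onto the first listed pattern it matches, sum the series that land in the same bucket, and keep a separate group per state. The sketch below makes that concrete under two assumptions: patterns are matched as substrings, and the zone names and values in the sample data are invented (the slides note that the real data is obfuscated).

```python
BUCKETS = [
    "eu-west", "us-east", "ap-southeast", "us-west",
    "sa-east", "vimeo-df", "vimeo-lv",
]


def bucket_for(zone, buckets=BUCKETS):
    """Return the first bucket pattern contained in the zone name, else the zone."""
    for bucket in buckets:
        if bucket in zone:
            return bucket
    return zone


def sum_by_bucket(series):
    """series: list of (tags, values). Sum pointwise per (state, zone bucket)."""
    totals = {}
    for tags, values in series:
        key = (tags["state"], bucket_for(tags["zone"]))
        if key in totals:
            totals[key] = [a + b for a, b in zip(totals[key], values)]
        else:
            totals[key] = list(values)
    return totals


# Invented sample data.
series = [
    ({"zone": "eu-west-1a", "state": "running"}, [1, 2]),
    ({"zone": "eu-west-1b", "state": "running"}, [3, 4]),
    ({"zone": "us-east-1a", "state": "queued"}, [5, 6]),
]
totals = sum_by_bucket(series)
print(totals)
```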
  61. 61.     Note: data is obfuscated
  62. 62.     Compare job states per region (zones  bucket) group by zone
  63. 63.     Note: data is obfuscated
  64. 64.     Unit conversion: unit=Mb/s network server:regex sum by server
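Because every series carries its unit, a query can ask for a different unit and the tool can work out the scaling factor. Here is a small sketch of that arithmetic, assuming the stored network metrics are in B/s (not stated on the slide) and treating "b" as bit and "B" as byte with SI and IEC prefixes. The function names are hypothetical.

```python
SI = {"": 1, "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}
IEC = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
PREFIXES = {**SI, **IEC}
BITS = {"b": 1, "B": 8}  # bits per base unit


def size_in_bits(unit):
    """Number of bits in one 'unit', e.g. 'B' is 8 and 'Mb' is 1,000,000."""
    prefix, base = unit[:-1], unit[-1]
    return PREFIXES[prefix] * BITS[base]


def convert(value, src, dst):
    """Convert between data or data-rate units with the same time denominator."""
    src_size, _, src_time = src.partition("/")
    dst_size, _, dst_time = dst.partition("/")
    if src_time != dst_time:
        raise ValueError("time units differ: %r vs %r" % (src_time, dst_time))
    return value * size_in_bits(src_size) / size_in_bits(dst_size)


print(convert(125000, "B/s", "Mb/s"))
```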
  65. 65.    
  66. 66.    
  67. 67.     Integration: metric unit=B/s, query unit=TB
  68. 68.    
  69. 69.     Deriving: metric unit=B, query unit=GB/d
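The integration and deriving slides ask for a unit whose time dimension differs from the stored one: a rate (B/s) shown as a total (TB), or a total (B) shown as a rate (GB/d). The underlying arithmetic can be sketched as follows, assuming each point's value applies to the interval since the previous point; the sample numbers are invented and the function names are hypothetical.

```python
def integrate(points):
    """points: [(timestamp_s, rate_per_s)]. Running total over time."""
    total = 0.0
    out = []
    prev_ts = None
    for ts, rate in points:
        if prev_ts is not None:
            total += rate * (ts - prev_ts)
        out.append((ts, total))
        prev_ts = ts
    return out


def derive(points):
    """points: [(timestamp_s, total)]. Per-second rate between consecutive points."""
    return [
        (t1, (v1 - v0) / (t1 - t0))
        for (t0, v0), (t1, v1) in zip(points, points[1:])
    ]


# B/s integrated to bytes, then scaled to TB (invented data).
rate_points = [(0, 1e9), (10, 1e9), (20, 1e9)]
total_tb = integrate(rate_points)[-1][1] / 1e12

# B derived to B/s, then scaled to GB/d (invented data).
total_points = [(0, 0.0), (86400, 5e9)]
rate_gb_per_day = derive(total_points)[0][1] * 86400 / 1e9

print(total_tb, rate_gb_per_day)
```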
  70. 70.    
  71. 71.     Bonus round
  72. 72.    
  73. 73.    
  74. 74.    
  75. 75.    
  76. 76.    
  77. 77.    
  78. 78.    
  79. 79.    
  80. 80.     Dashboard definition: queries = ['cpu usage sum by core', 'mem unit=B !total group by type:swap', 'stack network unit=Mb/s', 'unit=B (free|used) group by =mountpoint']
  81. 81.    
  82. 82.    
  83. 83.    
  84. 84.     Future Work
  85. 85.     Storage aggregation rules; graphite API functions such as cumulative, summarize and smartSummarize; consolidateBy & graph renderers
  86. 86.    
  87. 87.     Self-describing & standardized stat=upper/lower/mean/... target_type=counter..
  88. 88.     Select your view
  89. 89.     From: dygraphs.com
  90. 90.     Facet based suggestions
  91. 91.     unit=Err/s
  92. 92.     Conclusion: structured, self-describing, standardized metrics = enabler
  93. 93.     Conclusion: manual composing should be last resort, not default
  94. 94.     Conclusion: "This sucks": tell me why; what should we do instead? "This is neat!": help me make it better; adopt native metrics 2.0, structured_metrics
  95. 95.     Seen in this presentation: metrics20.org, vimeo.github.io/graph-explorer, github.com/vimeo/timeserieswidget, github.com/vimeo/carbon-tagger, github.com/vimeo/statsdaemon, github.com/Dieterbe/anthracite, github.com/graphite-ng, github.com/vimeo/graphite-influxdb, github.com/vimeo/smoketcp, github.com/vimeo/tailgate, twitter.com/Dieter_be, dieter.plaetinck.be