   
   
by niteroi @ panoramio.com
   
vimeo.com/43800150
   
   
   
   
problems
Metrics 2.0 concepts
implementations
& examples
   
Mostly
graphite
   
terminology
sync
   
(1234567890, 82)
(1234567900, 123)
(1234567910, 109)
(1234567920, 77)
db15.mysql.queries_running
host=db15 mysql.queries_running
   
Problems
   
Vimeo.com
pagerequests/s?
server X write perf?
   
Finding metrics
Browse hierarchies
Dashboard search .. which keywords?
Search in source code/documentation?
Ask around...
   
stats.hits.vimeo_com
stats_counts.hits.vimeo_com
stats.*.requesthostport.vimeo_com_80
   
Meaning, difference
Unit?
Where and how.. hard
Prefixes
Understanding metrics
   
collectd.db.disk.sda1.disk_time.write
   
Terminology? Which field is where?
Total so far? From zero per datapoint?
Aggregate? Which?
Point at t=x describes which timeframe?
Understanding metrics
   
Change agent?
   
Unclear, inconsistent
terminology, format
tightly coupled
lack information
   
O(S*P*A)
S = # Sources
P = # People
A = # Aggregators
   
   
   
times
N
   
graph definitions are
redundant and a time sink.
   
   
http://litlquest.com/forest-trees/see-forest-trees-2
   
metrics 2.0
concepts
   
Self-describing
Standardized
Orthogonal dimensions
   
stats.timers.dfs5.proxy-server.object.GET.200.timing.upper_90
   
{
server: dfvimeodfsproxy5,
http_method: GET,
http_code: 200,
unit: ms,
metric_type: gauge,
stat: upper_90,
swift_type: object
}
   
allow more characters
unit: Req/s,
site: vimeo.com,
...
   
Metadata
meta: {
src: proxy.py:458,
from: diamond
}
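
The tag set and the metadata block shown on these slides can be read as two plain dictionaries. The sketch below is a minimal illustration, not an implementation from the talk: `metric_id` and `select` are hypothetical helpers, and leaving `meta` out of the metric's identity is an assumption.

```python
# Tags copied from the slide; values are kept as strings for simplicity.
tags = {
    "server": "dfvimeodfsproxy5",
    "http_method": "GET",
    "http_code": "200",
    "unit": "ms",
    "metric_type": "gauge",
    "stat": "upper_90",
    "swift_type": "object",
}
# Metadata from the slide: extra context, assumed not to be part of the identity.
meta = {"src": "proxy.py:458", "from": "diamond"}


def metric_id(tags):
    """Hypothetical helper: build an order-independent id from the tags only."""
    return " ".join(f"{key}={value}" for key, value in sorted(tags.items()))


def select(metrics, **wanted):
    """Return the tag sets that contain every wanted key/value pair."""
    return [t for t in metrics if all(t.get(k) == v for k, v in wanted.items())]


print(metric_id(tags))
print(select([tags], http_method="GET", unit="ms") == [tags])
```

Because each dimension is addressed by key, a lookup such as `select(metrics, unit="ms")` does not need to know where a field sits in a dotted name.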
   
Conceptual model vs
wire protocol vs
storage
   
metrics20.org
   
SI + IEC
B Err Warn Conn
Job File Req ...
MB/s Err/d
Req/h ...
   
Immediate understanding
of metrics
Minimize time to graphs,
alerting rules, debugging
compatibility & flexibility
in tooling
   
Implementations
& examples
   
   
Carbon-tagger
…
stats.gauges.host.foo 125 1234567890
service=foo instance=host
target_type=gauge unit=B 123 1234567890
…
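
A tagged line like the one on this slide can be split into its tags, value and timestamp in a few lines. This is a sketch that assumes whitespace-separated `key=value` tokens followed by a value and a Unix timestamp, as the slide displays them. The actual carbon-tagger wire format may differ, and `parse_tagged_line` is a hypothetical name.

```python
def parse_tagged_line(line):
    """Split 'key=value ... <value> <timestamp>' into (tags, value, timestamp)."""
    *tag_tokens, value, timestamp = line.split()
    # Split each token on the first '=' only, so values may contain '='.
    tags = dict(token.split("=", 1) for token in tag_tokens)
    return tags, float(value), int(timestamp)


line = "service=foo instance=host target_type=gauge unit=B 123 1234567890"
tags, value, timestamp = parse_tagged_line(line)
print(tags["unit"], value, timestamp)
```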
   
   
Statsdaemon
unit=B
unit=B
...
unit=ms
unit=ms
...
unit=B/s
unit=ms stat=mean
unit=ms stat=upper_90
...
   
Keep metric
tags in sync
with data
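
Keeping tags in sync with data means an aggregator rewrites the tags whenever it changes what the numbers mean. The sketch below illustrates that idea under stated assumptions and is not Statsdaemon's actual code: `summarize_timer` and `per_second` are hypothetical helpers, and `upper_90` is taken here to be the largest sample among the lowest 90%.

```python
def summarize_timer(tags, samples_ms):
    """Turn raw timer samples into summary series, adding a stat tag to each."""
    ordered = sorted(samples_ms)
    # Number of samples in the lowest 90% (rounded half up, at least one).
    kept = max(1, int(0.9 * len(ordered) + 0.5))
    stats = {
        "mean": sum(ordered) / len(ordered),
        "upper_90": ordered[kept - 1],
    }
    return [({**tags, "stat": name}, value) for name, value in stats.items()]


def per_second(tags, total, interval_s):
    """Turn a total counted over one flush interval into a rate, e.g. B to B/s."""
    return {**tags, "unit": tags["unit"] + "/s"}, total / interval_s


print(summarize_timer({"unit": "ms"}, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))
print(per_second({"unit": "B"}, 1000, 10))
```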
   
Graph
Explorer
   
   
Graph-Explorer queries 101
proxy-server swift
server:regex unit=ms
(AND)
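
The query on this slide combines bare words, a `key:regex` term and a `key=value` term, all of which must match. The sketch below shows one way such matching could work. It assumes bare words are substring matches against any tag key or value, and it is not Graph-Explorer's own implementation. The `service` tag in the example is also hypothetical.

```python
import re

# A term is either 'key=value', 'key:regex', or a bare word.
TERM = re.compile(r"([^=:]+)([=:])(.*)")


def term_matches(tags, term):
    parsed = TERM.fullmatch(term)
    if parsed is None:
        # Bare word: assumed to match as a substring of any tag key or value.
        return any(term in key or term in value for key, value in tags.items())
    key, operator, operand = parsed.groups()
    if key not in tags:
        return False
    if operator == "=":
        return tags[key] == operand
    return re.search(operand, tags[key]) is not None


def matches(tags, query):
    """All whitespace-separated terms must match (AND)."""
    return all(term_matches(tags, term) for term in query.split())


tags = {"service": "proxy-server", "swift_type": "object",
        "server": "dfvimeodfsproxy5", "unit": "ms"}
print(matches(tags, "proxy-server swift server:proxy[0-9]+ unit=ms"))
```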
   
   
   
   
   
   
   
   
upper_90 (or stat=upper_90)
from <datetime>
to <datetime>
avg over <timespec>
(5M, 1h, 3d, ...)
   
Compare object put/get
stack …
http_method:(PUT|GET)
swift_type=object
avg by http_code,server
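
An `avg by` clause such as the one above can be pictured as collapsing every series that differs only in the listed tags into a single averaged series. The sketch below is an illustration with made-up numbers. `avg_by` is a hypothetical helper and assumes all series share the same timestamps.

```python
from collections import defaultdict


def avg_by(series, keys):
    """Average series pointwise after dropping the tags named in keys."""
    groups = defaultdict(list)
    for tags, values in series:
        rest = tuple(sorted((k, v) for k, v in tags.items() if k not in keys))
        groups[rest].append(values)
    return [
        (dict(rest), [sum(column) / len(column) for column in zip(*members)])
        for rest, members in groups.items()
    ]


series = [
    ({"http_method": "GET", "server": "a"}, [1.0, 3.0]),
    ({"http_method": "GET", "server": "b"}, [3.0, 5.0]),
    ({"http_method": "PUT", "server": "a"}, [10.0, 10.0]),
]
print(avg_by(series, {"server"}))
```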
   
   
Comparing servers
http_method:(PUT|GET)
group by unit,target_type
avg by http_code,swift_type,http_method
   
   
transcode unit=Job/s
avg over <time>
from <datetime> to <datetime>
   
Note: data is obfuscated
   
Bucketing
sum by zone:eu-west|us-east|ap-southeast|us-west|sa-east|vimeo-df|vimeo-lv
group by state
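
Bucketing can be thought of as rewriting a tag value to the first bucket it matches and then summing within each bucket. The sketch below assumes a bucket matches when its name is a substring of the tag value. The zone names in the example, such as `us-east-1a`, are hypothetical, and the helpers are not part of Graph-Explorer.

```python
from collections import defaultdict

BUCKETS = ["eu-west", "us-east", "ap-southeast", "us-west",
           "sa-east", "vimeo-df", "vimeo-lv"]


def bucket_for(value, buckets=BUCKETS):
    """Return the first bucket contained in value, or value itself if none is."""
    for bucket in buckets:
        if bucket in value:
            return bucket
    return value


def sum_by_zone(points):
    """Sum values per (zone bucket, state) pair."""
    totals = defaultdict(float)
    for tags, value in points:
        totals[(bucket_for(tags["zone"]), tags["state"])] += value
    return dict(totals)


points = [
    ({"zone": "us-east-1a", "state": "running"}, 2),
    ({"zone": "us-east-1b", "state": "running"}, 3),
    ({"zone": "eu-west-1a", "state": "queued"}, 1),
]
print(sum_by_zone(points))
```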
   
Note: data is obfuscated
   
Compare job states per region (zones bucket)
group by zone
   
Note: data is obfuscated
   
Unit conversion
unit=Mb/s network
server:regex
sum by server
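
Once a metric carries its unit, converting to the unit named in the query is plain arithmetic. The sketch below assumes the stored series is in `B/s` (the slide does not say) and that `Mb` means 10^6 bits. `to_megabits_per_second` is a hypothetical helper.

```python
def to_megabits_per_second(tags, values):
    """Convert a B/s series to Mb/s and update the unit tag to match."""
    if tags["unit"] != "B/s":
        raise ValueError(f"expected unit=B/s, got unit={tags['unit']}")
    converted = [value * 8 / 1_000_000 for value in values]  # bytes to bits to Mb
    return {**tags, "unit": "Mb/s"}, converted


print(to_megabits_per_second({"unit": "B/s", "server": "web1"}, [125000.0, 250000.0]))
```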
   
   
   
Integration
Metric unit=B/s
Query unit=TB
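
Asking for `TB` from a `B/s` metric implies integrating the rate over time. The sketch below uses a left Riemann sum and assumes SI prefixes (1 TB = 10^12 B). `integrate_to_tb` is a hypothetical helper and the data points are made up.

```python
def integrate_to_tb(points):
    """points: (timestamp_s, bytes_per_second), sorted by time.

    Returns (timestamp_s, cumulative_TB) for every point after the first.
    """
    total_bytes = 0.0
    result = []
    for (t0, rate), (t1, _next_rate) in zip(points, points[1:]):
        total_bytes += rate * (t1 - t0)  # rate is held constant until the next point
        result.append((t1, total_bytes / 1e12))
    return result


print(integrate_to_tb([(0, 1e9), (100, 1e9), (200, 1e9)]))
```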
   
   
Deriving
Metric unit=B
Query unit=GB/d
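
Going the other way, asking for `GB/d` from a `B` metric implies taking the difference between consecutive points and rescaling. The sketch below assumes SI prefixes (1 GB = 10^9 B) and 86,400 seconds per day. `derive_to_gb_per_day` is a hypothetical helper and the data points are made up.

```python
SECONDS_PER_DAY = 86_400


def derive_to_gb_per_day(points):
    """points: (timestamp_s, bytes), sorted by time.

    Returns (timestamp_s, GB_per_day) for every point after the first.
    """
    result = []
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        bytes_per_second = (v1 - v0) / (t1 - t0)
        result.append((t1, bytes_per_second * SECONDS_PER_DAY / 1e9))
    return result


print(derive_to_gb_per_day([(0, 0.0), (3600, 1e9), (7200, 3e9)]))
```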
   
   
Bonus round
   
   
   
   
   
   
   
   
   
Dashboard definition
queries = [
    'cpu usage sum by core',
    'mem unit=B !total group by type:swap',
    'stack network unit=Mb/s',
    'unit=B (free|used) group by =mountpoint'
]
   
   
   
   
Future
Work
   
● Storage aggregation rules
● graphite API functions such as cumulative, summarize and smartSummarize
● consolidateBy & Graph renderers
   
   
Self-describing &
standardized
stat=upper/lower/mean/...
target_type=counter..
   
Select your view
   
From: dygraphs.com
   
Facet based suggestions
   
unit=Err/s
   
Conclusion
structured
self-describing
standardized
metrics = enabler
   
Conclusion
Manual composing should be last resort, not default
   
Conclusion
This sucks
– Tell me why
– What should we do instead?
This is neat!
– Help me make it better
– Adopt native metrics 2.0, structured_metrics
   
Seen in this presentation:
metrics20.org
vimeo.github.io/graph-explorer
github.com/vimeo/timeserieswidget
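github.com/vimeo/carbon-tagger
github.com/vimeo/statsdaemon
github.com/Dieterbe/anthracite
github.com/graphite-ng
github.com/vimeo/graphite-influxdb
github.com/vimeo/smoketcp
github.com/vimeo/tailgate
twitter.com/Dieter_be
dieter.plaetinck.be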

Metrics 2.0 @ Monitorama PDX 2014


Published in: Technology


