Metrics stack 2.0

Dieter Plaetinck, Principal Software Engineer at Grafana Labs
   
   
Credit: user niteroi @ panoramio.com
   
vimeo.com/43800150
   
   
   
   
   
   
   
   
1  Metrics 2.0 concepts
2  Implementation
3  Advanced stuff
   
“Dieter” ?
   
Peter → Deter
   
Terminology sync
   
(1234567890, 82)
(1234567900, 123)
(1234567910, 109)
(1234567920, 77)
db15.mysql.queries_running
host=db15 mysql.queries_running
   
   
How many page requests/s is vimeo.com doing?
   
● stats.hits.vimeo_com
● stats_counts.hits.vimeo_com
   
   
stats.<host>.requesthostport.vimeo_com_443
   
stats.timers.dfs5.proxy-server.object.GET.200.timing.upper_90
   
O(X*Y*Z)
X = # apps                
Y = # people             
Z = # aggregators     
   
How long does it take to retrieve an object from swift?
   
stats.timers.<host>.proxy-server.<swift_type>.<http_method>.<http_code>.timing.<stat>
stats.timers.<host>.object-server.<http_method>.timing.<stat>
target=stats.timers.dfs*.object*GET*timing.mean ?
target=groupByNode(stats.timers.dfs*.proxy-server.object.GET.*.timing.mean,2,"avg")
target=stats.timers.dfs*.object-server.GET.timing.mean
   
swift_type=object stat=mean timing GET avg by http_code
   
   
   
O((DxV)^2)
D = # dimensions             
V = # values per dim             
   
collectd.db.disk.sda1.disk_time.write
   
   
   
What should I name my metric?
   
10
100
1000
10000
100000
1000000
   
   
Metrics 2.0
   
Old:
● information lacking
● fields unclear & inconsistent
● cumbersome strings / trees
● forbidden characters
New:
● Self-describing
● Standardized
● All dimensions in orthogonal tag-space
● Allow some useful characters
   
stats.timers.dfs5.proxy-server.object.GET.200.timing.upper_90
{
    "server": "dfvimeodfsproxy5",
    "http_method": "GET",
    "http_code": "200",
    "unit": "ms",
    "target_type": "gauge",
    "stat": "upper_90",
    "swift_type": "object",
    "plugin": "swift_proxy_server"
}
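A minimal sketch of how a legacy dotted name like the one above could be split into a tag set. It assumes the positional layout stats.timers.<host>.proxy-server.<swift_type>.<http_method>.<http_code>.timing.<stat> shown earlier. The function name is hypothetical, and copying the host field verbatim into "server" is an assumption; this is not Graph-Explorer's actual plugin code.

```python
def swift_proxy_timer_to_tags(name):
    """Split a statsd timer key for swift's proxy-server into tags.

    Hypothetical helper; assumes the positional layout
    stats.timers.<host>.proxy-server.<swift_type>.<http_method>.<http_code>.timing.<stat>
    """
    parts = name.split(".")
    looks_right = (
        len(parts) == 9
        and parts[:2] == ["stats", "timers"]
        and parts[3] == "proxy-server"
        and parts[7] == "timing"
    )
    if not looks_right:
        raise ValueError("not a swift proxy-server timer: %r" % name)
    _, _, host, _, swift_type, http_method, http_code, _, stat = parts
    return {
        "server": host,                # assumption: host field used as-is
        "http_method": http_method,
        "http_code": http_code,
        "unit": "ms",                  # statsd timers are reported in milliseconds
        "target_type": "gauge",
        "stat": stat,
        "swift_type": swift_type,
        "plugin": "swift_proxy_server",
    }


tags = swift_proxy_timer_to_tags(
    "stats.timers.dfs5.proxy-server.object.GET.200.timing.upper_90"
)
print(tags["swift_type"], tags["http_method"], tags["stat"])
```

Once the fields are tags, a query can select on any of them without knowing their position in the string.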
   
Main advantages:
● Immediate understanding of metric meaning (ideally)
● Minimize time to graphs, dashboards, alerting rules 
   
github.com/vimeo/graph-explorer/wiki
   
SI + IEC
B   Err   Warn   Conn   Job   File   Req    ...
MB/s   Err/d   Req/h   ...
   
{
    "site": "vimeo.com",
    "port": 80,
    "unit": "Req/s",
    "direction": "in",
    "service": "webapp_php",
    "server": "webxx"
}
   
   
Carbon-tagger:
...
service=foo.instance=host.target_type=gauge.type=calculation.unit=B 123 1234567890
...
Statsdaemon:
..unit=B..unit=B...     → unit=B/s
..unit=ms..unit=ms..    → unit=ms stat=mean
                        → unit=ms stat=upper_90
                        → ...
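A rough sketch of the tag rewriting an aggregator could do at flush time: counts tagged unit=B come out as unit=B/s, and timings tagged unit=ms come out as one series per statistic with a stat tag added. The function names are hypothetical, and tagging the rate with target_type=rate is an assumption; this is not statsdaemon's actual code.

```python
def flush_counter(tags, values, interval_s):
    """Turn raw counts collected during one flush interval into a per-second rate."""
    out = dict(tags)
    out["unit"] = tags["unit"] + "/s"   # e.g. B -> B/s
    out["target_type"] = "rate"         # assumption: the result is tagged as a rate
    return out, sum(values) / interval_s


def flush_timer(tags, values):
    """Emit one series per summary statistic; the unit stays the same."""
    stats = {
        "lower": min(values),
        "mean": sum(values) / len(values),
        "upper": max(values),
    }
    return [(dict(tags, stat=name), value) for name, value in stats.items()]


print(flush_counter({"unit": "B"}, [100, 200, 300], 10))
print(flush_timer({"unit": "ms"}, [10, 20, 60]))
```

The point is that the aggregator updates the metadata along with the values, so downstream tools never have to guess what a series means.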
   
   
   
Graph-Explorer queries 101
site:api.vimeo.com unit=Req/s
requesthostport api_vimeo_com
   
   
Smoothing
avg over 10M
avg over ...
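As an illustration of what "avg over" does to a series, here is a small trailing moving average. The window is given in points, so with one point every 10 seconds "avg over 10M" would correspond to a window of 60. The function name is hypothetical, and averaging over a shorter window at the start of the series is an assumption about edge handling.

```python
def moving_average(points, window):
    """Trailing moving average; the first few outputs use the points seen so far."""
    out = []
    running = 0.0
    for i, value in enumerate(points):
        running += value
        if i >= window:
            running -= points[i - window]   # drop the point that left the window
        out.append(running / min(i + 1, window))
    return out


# Values from the terminology slide, smoothed over 2 points.
print(moving_average([82, 123, 109, 77], 2))
```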
   
   
Aggregation, compare port 80 vs 443
avg by <dimension>
sum by <dimension>
sum by server
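A sketch of "sum by" and "avg by" over tagged series, assuming they mean that series which differ only in the listed tags are collapsed into one. The function name and the sample data are made up for illustration; this is not Graph-Explorer's implementation.

```python
from collections import defaultdict


def aggregate_by(series, by, how="sum"):
    """series: list of (tags, points) pairs with equal-length point lists.

    Series whose tags are identical once the tags in `by` are ignored
    are combined point by point.
    """
    groups = defaultdict(list)
    for tags, points in series:
        key = tuple(sorted((k, v) for k, v in tags.items() if k not in by))
        groups[key].append(points)
    result = []
    for key, members in groups.items():
        combined = [sum(column) for column in zip(*members)]
        if how == "avg":
            combined = [total / len(members) for total in combined]
        result.append((dict(key), combined))
    return result


series = [
    ({"server": "web1", "port": "80"}, [1, 2]),
    ({"server": "web2", "port": "80"}, [3, 4]),
    ({"server": "web1", "port": "443"}, [10, 20]),
]
print(aggregate_by(series, {"server"}))          # one series per port
print(aggregate_by(series, {"server"}, "avg"))
```

With names as plain strings this needs a hand-written pattern per case; with tags the same function works for any dimension.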
   
   
Compare port 80 traffic among servers
site:api.vimeo.com unit=Req/s port=80 group by none avg over 10M
   
   
Graph-Explorer queries 201
proxy-server swift server:regex upper_90 unit=ms from <datetime> to <datetime> avg over <timespec>
   
   
   
   
   
Compare object put/get
Stack .. http_method:(PUT|GET) swift_type=object avg by http_code,server
   
   
Comparing servers
http_method:(PUT|GET) avg by http_code,swift_type,http_method group by none
   
   
Compare http codes for GET, per swift type
http_method=GET avg by server group by swift_type
   
   
transcode unit=Job/s avg over <time> from <datetime> to <datetime>
    Note: data is obfuscated
   
Bucketing
!queue sum by zone:ap-southeast|eu-west|us-east|us-west|sa-east|vimeo-df|vimeo-lv group by state
    Note: data is obfuscated
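A sketch of the bucketing idea, assuming a tag value lands in the first bucket whose pattern it contains, and that values matching no pattern keep their own bucket. The function names and the zone names in the example (such as us-east-1a) are hypothetical.

```python
def bucket_of(value, patterns):
    """Return the first pattern contained in value, or value itself if none match."""
    for pattern in patterns:
        if pattern in value:
            return pattern
    return value  # assumption: unmatched values are left as their own bucket


def sum_by_bucket(counts, patterns):
    """counts maps a zone name to a number; returns totals per bucket."""
    totals = {}
    for zone, count in counts.items():
        bucket = bucket_of(zone, patterns)
        totals[bucket] = totals.get(bucket, 0) + count
    return totals


patterns = ["ap-southeast", "eu-west", "us-east", "us-west",
            "sa-east", "vimeo-df", "vimeo-lv"]
jobs = {"us-east-1a": 3, "us-east-1b": 4, "eu-west-1a": 5, "vimeo-df": 1}
print(sum_by_bucket(jobs, patterns))
```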
   
Compare job states per region (zones bucket)
group by zone
    Note: data is obfuscated
   
Unit conversion
unit=Mb/s network dfvimeorpc sum by server
   
   
   
unit=MB
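Because every series carries a standard unit, converting for display can be mechanical. The sketch below handles SI and IEC prefixes on bits (b) and bytes (B) with an optional, unchanged time divisor; the function names are hypothetical, and converting between time divisors (for example /s to /d) is left out.

```python
PREFIXES = {
    "": 1,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,      # SI
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,   # IEC
}
BITS_PER_BASE = {"b": 1, "B": 8}


def parse_unit(unit):
    """Split e.g. 'Mb/s' into (size in bits, time divisor)."""
    numerator, _, divisor = unit.partition("/")
    prefix, base = numerator[:-1], numerator[-1:]
    if prefix not in PREFIXES or base not in BITS_PER_BASE:
        raise ValueError("unsupported unit: %r" % unit)
    return PREFIXES[prefix] * BITS_PER_BASE[base], divisor


def convert(value, from_unit, to_unit):
    """Convert value between data units that share the same time divisor."""
    from_bits, from_div = parse_unit(from_unit)
    to_bits, to_div = parse_unit(to_unit)
    if from_div != to_div:
        raise ValueError("cannot convert %s to %s" % (from_unit, to_unit))
    return value * from_bits / to_bits


print(convert(1_000_000, "B/s", "Mb/s"))   # bytes per second to megabits per second
print(convert(5 * 10**9, "B", "GB"))
```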
   
   
   
{
    server=dfvimeodfs1
    plugin=diskspace
    mountpoint=_srv_node_dfs5
    unit=B
    type=used
    target_type=gauge
}
   
server:dfvimeodfs unit=GB type=free srv node
   
   
unit=GB/d group by mountpoint
   
   
   
   
   
   
   
Dashboard definition
 queries = [
   'cpu usage sum by core',
   'mem unit=B !total group by type:swap',
   'stack network unit=b/s',
   'unit=B (free|used) group by =mountpoint'
 ]
   
   
stats.dfvimeocliapp2.twitter.error
{
    "n1": "dfvimeocliapp2",
    "n2": "twitter",
    "n3": "error",
    "plugin": "catchall_statsd",
    "source": "statsd",
    "target_type": "rate",
    "unit": "unknown/s"
}
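A sketch of a catch-all mapping like the one shown above: when nothing is known about a statsd name, each node after the "stats" prefix becomes a numbered tag n1, n2, ... and the unit is marked unknown. The function name is hypothetical and the details are assumptions, not the actual catchall_statsd plugin.

```python
def catchall_statsd(name):
    """Map an unrecognized 'stats.<a>.<b>...' name to numbered tags."""
    parts = name.split(".")
    if parts[0] != "stats" or len(parts) < 2:
        raise ValueError("not a statsd rate metric: %r" % name)
    tags = {"n%d" % i: node for i, node in enumerate(parts[1:], start=1)}
    tags.update(
        plugin="catchall_statsd",
        source="statsd",
        target_type="rate",
        unit="unknown/s",   # nothing in the name says what is being counted
    )
    return tags


print(catchall_statsd("stats.dfvimeocliapp2.twitter.error"))
```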
   
Two hard things in computer science
   
stats.gauges.files.id_boundary_7day
stats.gauges.files.id_boundary_ceil
   
unit=File id_boundary_7d 
{
    "unit": "File",
    "n1": "id_boundary_7d"
}
   
{
    "intrinsic": {
        "site": "vimeo.com",
        "unit": "Req/s"
    },
    "extrinsic": {
        "agent": "diamond",
        "processed_by": "statsd1",
        "src": "index.php:135",
        "replaces": "vimeo_com_reqps"
    }
}
   
site=vimeo.com unit=Req/s  processed_by=statsd1 src=index.php:135 added_by=dieter 123 1234567890
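A sketch of parsing a line in this shape, assuming the intrinsic and extrinsic tags are separated by two spaces and the line ends with a value and a Unix timestamp. The function name and the returned structure are hypothetical.

```python
def parse_line(line):
    """Parse '<intrinsic tags>  <extrinsic tags> <value> <timestamp>'."""
    body, value, timestamp = line.rsplit(" ", 2)
    # Assumption: a double space separates intrinsic from extrinsic tags.
    intrinsic_part, _, extrinsic_part = body.partition("  ")

    def to_tags(part):
        return dict(pair.split("=", 1) for pair in part.split())

    return {
        "intrinsic": to_tags(intrinsic_part),
        "extrinsic": to_tags(extrinsic_part),
        "value": float(value),
        "timestamp": int(timestamp),
    }


line = ("site=vimeo.com unit=Req/s  processed_by=statsd1 "
        "src=index.php:135 added_by=dieter 123 1234567890")
print(parse_line(line))
```

Only the intrinsic tags identify the series; changing an extrinsic tag such as processed_by should not create a new one.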
   
   
Equivalence
servers.host.cpu.total.iowait → "core": "_sum_"
servers.host.cpu.<core-number>.iowait
servers.host.loadavg.15
   
Rollups & aggregation
   
/etc/carbon/storage-aggregation.conf
[min]
pattern = \.min$
aggregationMethod = min
[max]
pattern = \.max$
aggregationMethod = max
[sum]
pattern = \.count$
aggregationMethod = sum
[default_average]
pattern = .*
aggregationMethod = average
   
   
2 kinds of graphite users
   
Self-describing metrics
stat=upper/lower/mean/...
target_type=counter..
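With stat and target_type tags on every series, the rollup method no longer has to be guessed from a name suffix. A sketch of such a lookup, returning Graphite aggregation method names: the function name is hypothetical and the exact mapping (in particular "last" for an ever-increasing counter) is an assumption.

```python
def rollup_method(tags):
    """Pick a consolidation method from tags instead of a name regex."""
    stat = tags.get("stat")
    if stat in ("lower", "min"):
        return "min"
    if stat in ("upper", "max"):
        return "max"
    target_type = tags.get("target_type")
    if target_type == "count":
        return "sum"       # counts per interval add up when intervals are merged
    if target_type == "counter":
        return "last"      # assumption: keep the latest value of a running total
    return "average"


print(rollup_method({"unit": "ms", "stat": "upper"}))
print(rollup_method({"unit": "Req", "target_type": "count"}))
print(rollup_method({"unit": "B", "target_type": "gauge"}))
```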
   
●    stats.timers.render_time.histogram.bin_0.01
●    stats.timers.render_time.histogram.bin_0.1
●    stats.timers.render_time.histogram.bin_1           → unit=Freq_abs bin_upper=1
●    stats.timers.render_time.histogram.bin_10
●    stats.timers.render_time.histogram.bin_50
●    stats.timers.render_time.histogram.bin_inf
●    stats.timers.render_time.lower                     → unit=ms stat=lower
●    stats.timers.render_time.mean                      → unit=ms stat=mean
●    stats.timers.render_time.mean_90                   → ...
●    stats.timers.render_time.median
●    stats.timers.render_time.std
●    stats.timers.render_time.upper
●    stats.timers.render_time.upper_90
   
Also..
● graphite API functions such as "cumulative", "summarize" and "smartSummarize"
● Graph renderers
   
   
From: dygraphs.com
   
   
   
   
   
   
Facet based suggestions
   
   
Metric types
● gauge
● count & rate
● counter
● timer
   
   
   
   
   
gauge
● Multiple values in same interval
● “sticky”
   
   
Count & Rate
   
Counter
   
Timer..
   
   
http://janabeck.com/blog/2012/10/12/lessons-learned-from-100/
   
Timer..
   
● What should a metric be?
● Stickiness?
● Behavior on no packets received
● Behavior on multiple packets received
   
My personal takeaways
   
Conclusion
● Building graphs, setting up alerting is cumbersome
● Esp. changing information needs (troubleshooting, exploring, ..)
● Esp. complicated information needs
  → PAIN
● Structuring metrics
● Self-describing metrics
● Standardized metrics
● Native metrics 2.0
  → BREEZE
   
Conclusion
● Metrics can be so much more usable and useful. Let's talk about tagging, standardisation, retaining information throughout the pipeline.
● Converting information needs into graph defs, alerting rules
● Graph-Explorer, carbon-tagger, statsdaemon, …
● Graphite-ng (native metrics 2.0)
● Metrics 2.0 in your apps, agents, aggregators?
● Build out structured metrics library
   
github.com/vimeo
github.com/Dieterbe
twitter.com/Dieter_be
dieter.plaetinck.be
   