Prometheus design and philosophy

Prometheus Design and Philosophy by Julius Volz at Docker Distributed System Summit
Prometheus - https://github.com/Prometheus
Liveblogging: http://canopy.mirage.io/Liveblog/MonitoringDDS2016

  1. Prometheus Design and Philosophy: Why It Is the Way It Is (Julius Volz, October 2016)
  2. Opinionated Monitoring System
     ● Prometheus is different
     ● potentially surprising decisions
     ● disclaimer: our opinion
  3. Origin
     ● Started in 2012 at SoundCloud by Matt and Julius
     ● Inspired by Google’s monitoring tools
     ● Motivation
       ○ needed to monitor dynamic cloud environment
       ○ unsatisfying data models, querying, and efficiency in existing approaches
     ● Now in CNCF, many other companies and people
  4. What is Prometheus?
     Monitoring system and TSDB:
     ● instrumentation
     ● metrics collection and storage
     ● querying
     ● alerting
     ● dashboarding / graphing / trending
     Focus on:
     ● operational systems monitoring
     ● dynamic cloud environments
  5. Architecture
  6. Four main selling points
     1. Dimensional data model
     2. Powerful query language
     3. Efficiency
     4. Operational simplicity
  7. What does Prometheus NOT do?
     ● raw log / event collection
     ● request tracing
     ● “magic” anomaly detection
     ● durable long-term storage
     ● automatic horizontal scaling
     ● user / auth management
  8. Expression browser
  9. Built-in graphing
  10. Grafana Support
  11. Data Model: Labels > Hierarchy
      Labels:
        api_http_requests_total
        method=POST, method=GET, method=...
        path=/tracks, path=/users, path=...
        status=200, status=404, status=...
        job=api-server, job=node, job=...
        instance=1.2.3.4:80, instance=1.2.3.4:81, instance=...
      Hierarchy:
        api-server
          1.2.3.4:80
            /tracks
              GET
                200
                404
                […]
              POST
                […]
            /users
              […]
          1.2.3.4:81
            /tracks
              GET
                200
                […]
              [...]
            /users
              [...]
          [...]
  12. Labels > Hierarchy
      ● more flexible
      ● more efficient
      ● explicit dimensions
      api_http_requests_total{method="post"} vs. api-server.*.*.post.*
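
The contrast on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample series, the segment order in `HIERARCHY_ORDER`, and the helpers `to_dotted`, `select_by_labels`, and `select_by_glob` are all hypothetical and not part of Prometheus.

```python
from fnmatch import fnmatchcase

# Hypothetical series, each described by explicit label dimensions.
SERIES = [
    {"job": "api-server", "instance": "1.2.3.4:80", "path": "/tracks", "method": "post", "status": "200"},
    {"job": "api-server", "instance": "1.2.3.4:80", "path": "/tracks", "method": "get", "status": "200"},
    {"job": "api-server", "instance": "1.2.3.4:81", "path": "/users", "method": "post", "status": "404"},
]

# Assumed segment order for the hierarchical naming scheme.
HIERARCHY_ORDER = ["job", "instance", "path", "method", "status"]


def to_dotted(labels):
    """Flatten labels into a positional dotted name (dots in values are replaced)."""
    return ".".join(labels[k].replace(".", "_") for k in HIERARCHY_ORDER)


def select_by_labels(series, **matchers):
    """Keep series whose labels equal every matcher; label order is irrelevant."""
    return [s for s in series if all(s.get(k) == v for k, v in matchers.items())]


def select_by_glob(series, pattern):
    """Keep series whose dotted name matches a positional glob pattern."""
    return [s for s in series if fnmatchcase(to_dotted(s), pattern)]


by_label = select_by_labels(SERIES, method="post")
by_glob = select_by_glob(SERIES, "api-server.*.*.post.*")
```

Both selections return the same two series here, but the glob only works if the reader knows that the method is the fourth segment, while the label matcher names the dimension explicitly.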
  13. Non-SQL Query Language
      PromQL: rate(api_http_requests_total[5m])
      SQL:    SELECT job, instance, method, status, path, rate(value, 5m) FROM api_http_requests_total
      PromQL: avg by(city) (temperature_celsius{country="germany"})
      SQL:    SELECT city, AVG(value) FROM temperature_celsius WHERE country="germany" GROUP BY city
      PromQL: rate(errors{job="foo"}[5m]) / rate(total{job="foo"}[5m])
      SQL:    SELECT errors.job, errors.instance, […more labels…], rate(errors.value, 5m) / rate(total.value, 5m) FROM errors JOIN total ON […all the label equalities…] WHERE errors.job="foo" AND total.job="foo"
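
To make the PromQL semantics above more concrete, here is a rough Python sketch of two of the operations: grouping by one label (as in `avg by(city)`) and dividing two sets of samples whose label sets are identical (as in the errors / total ratio). The function names `avg_by` and `divide` and all sample data are assumptions for illustration; the real engine also handles time ranges, counter resets, and richer matching options that are left out here.

```python
from collections import defaultdict


def avg_by(samples, label):
    """Average values grouped by one label, loosely like `avg by(label) (...)`."""
    groups = defaultdict(list)
    for labels, value in samples:
        groups[labels[label]].append(value)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


def divide(left, right):
    """Divide samples whose full label sets are identical, loosely like `left / right`."""
    right_by_labels = {frozenset(labels.items()): value for labels, value in right}
    out = []
    for labels, value in left:
        key = frozenset(labels.items())
        if key in right_by_labels:  # unmatched samples are simply left out
            out.append((labels, value / right_by_labels[key]))
    return out


# Hypothetical instant values, each a (labels, value) pair.
temperatures = [
    ({"city": "berlin", "country": "germany", "sensor": "a"}, 10.0),
    ({"city": "berlin", "country": "germany", "sensor": "b"}, 14.0),
    ({"city": "hamburg", "country": "germany", "sensor": "a"}, 8.0),
    ({"city": "paris", "country": "france", "sensor": "a"}, 16.0),
]
german = [(l, v) for l, v in temperatures if l["country"] == "germany"]
avg_per_city = avg_by(german, "city")

# Hypothetical per-second rates that a rate(...[5m]) step would have produced.
error_rates = [({"job": "foo", "instance": "a"}, 2.0), ({"job": "foo", "instance": "b"}, 1.0)]
total_rates = [({"job": "foo", "instance": "a"}, 100.0), ({"job": "foo", "instance": "b"}, 50.0)]
error_ratio = divide(error_rates, total_rates)
```

The join that the SQL version spells out as label equalities is implicit in `divide`: samples pair up only when every label matches.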
  14. PromQL
      ● better for metrics computation
      ● only does reads
  15. Pull vs. Push
      ● automatic upness monitoring
      ● horizontal monitoring
      ● more flexible
      ● simpler HA
      ● less configuration
      ● yes, it scales!
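
The "automatic upness monitoring" point can be illustrated with a small Python sketch: because the monitoring system initiates every scrape, a failed scrape is itself a signal. Prometheus records this as an `up` series (1 for a successful scrape, 0 otherwise). Everything else below is an assumption for illustration: the `scrape` function, the injected `fetch` callable, and the target names are hypothetical and do not mirror Prometheus internals.

```python
import time


def scrape(target, fetch, now=time.time):
    """Pull one target and always record `up`, even when the fetch fails."""
    timestamp = now()
    try:
        samples = dict(fetch(target))
        up = 1.0
    except Exception:
        # The target did not answer: no samples, but we still learn it is down.
        samples = {}
        up = 0.0
    samples["up"] = up
    return {"target": target, "timestamp": timestamp, "samples": samples}


def fake_fetch(target):
    """Stand-in for an HTTP GET of a metrics endpoint (hypothetical)."""
    if target == "down.example:80":
        raise ConnectionError("connection refused")
    return {"api_http_requests_total": 5.0}


healthy = scrape("ok.example:80", fake_fetch, now=lambda: 0.0)
broken = scrape("down.example:80", fake_fetch, now=lambda: 0.0)
```

With push, a silent target is indistinguishable from one that has nothing to report unless extra bookkeeping is added; with pull, the `up` value comes for free.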
  16. But but...
  17. Alternatives and Workarounds
      ● run Prometheus on same network segment
      ● open port(s) in firewall / router
      ● open tunnel / VPN
  18. Uber-Exporters or... Per-Process Exporters?
  19. Per-Machine Uber-Exporters
  20. Drawbacks
      ● operational bottleneck
      ● SPOF, no isolation
      ● can’t scrape selectively
      ● harder up-ness monitoring
      ● harder to associate metadata
  21. One Exporter per Process
  22. No Clustering?
      ● really hard to get right
      ● first thing to fail when you need it (e.g. during network outage)
      ● keep it simple, focus on operational monitoring
      ● HA for alerting still easy
  23. Conclusion
      ● it’s not about Prometheus
      ● it’s about monitoring best practices
      ● it’s our opinion
  24. Demo Time!
  25. Thanks!
  26. Why not JSON?
      We optimized for two extremes:
      Text format
      ● easy to construct
      ● relatively efficient
      ● readable
      ● streamable
      Protobuf format
      ● very efficient
      ● robust
      ● streamable
      JSON? Worse in all categories.
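
As a rough illustration of "easy to construct" and "streamable", the following Python sketch renders one metric family in the line-oriented text exposition format. The helper names and the sample values are assumptions; the sketch covers only `# HELP`, `# TYPE`, and sample lines with label-value escaping, and leaves out timestamps and escaping of help text.

```python
def escape_label_value(value):
    """Escape backslash, double quote, and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_sample(name, labels, value):
    """Render one sample as `name{label="value",...} value`."""
    if labels:
        pairs = ",".join(
            f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
        )
        return f"{name}{{{pairs}}} {value}"
    return f"{name} {value}"


def render_family(name, help_text, metric_type, samples):
    """Render HELP and TYPE comments followed by one line per sample."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    lines += [render_sample(name, labels, value) for labels, value in samples]
    return "\n".join(lines) + "\n"


text = render_family(
    "api_http_requests_total",
    "Total HTTP requests.",
    "counter",
    [({"method": "post", "path": "/tracks"}, 1027), ({"method": "get", "path": "/tracks"}, 3)],
)
```

Each sample is a complete line on its own, so a producer can write and a consumer can parse the output one line at a time without buffering a whole document.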
  27. Relabeling - WTF?
  28. Relabeling - OK...
      ● a new DSL
      ● steep learning curve
      ● ...but very flexible
      The alternative: many special config options.
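
To give a feel for why a small rule language can replace many special-purpose options, here is a deliberately simplified Python sketch of relabeling. It models only three actions (`replace`, `keep`, `drop`) with fully anchored regular expressions over the joined source labels. The rule dictionaries, the `relabel` function, and the `__meta_env` label are hypothetical, and the replacement uses Python's `\1` syntax rather than the `$1` syntax of real Prometheus configuration.

```python
import re


def relabel(labels, rules):
    """Apply rules in order; return the new labels, or None if the target is dropped."""
    labels = dict(labels)
    for rule in rules:
        # Join the values of the source labels, treating missing labels as empty.
        joined = rule.get("separator", ";").join(
            labels.get(name, "") for name in rule.get("source_labels", [])
        )
        match = re.fullmatch(rule.get("regex", "(.*)"), joined)
        action = rule.get("action", "replace")
        if action == "keep":
            if match is None:
                return None
        elif action == "drop":
            if match is not None:
                return None
        elif action == "replace":
            if match is not None:
                labels[rule["target_label"]] = match.expand(rule.get("replacement", r"\1"))
        else:
            raise ValueError(f"unsupported action: {action}")
    return labels


rules = [
    # Keep only targets whose (hypothetical) environment label is "prod".
    {"source_labels": ["__meta_env"], "regex": "prod", "action": "keep"},
    # Copy the host part of the address into a new "host" label.
    {"source_labels": ["__address__"], "regex": r"([^:]+):\d+", "target_label": "host"},
]

kept = relabel({"__address__": "1.2.3.4:80", "__meta_env": "prod"}, rules)
dropped = relabel({"__address__": "1.2.3.4:81", "__meta_env": "dev"}, rules)
```

Filtering targets and rewriting labels are expressed with the same rule shape, which is the flexibility the slide trades against the learning curve.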
  29. Stateful Client Libraries
      ● client libs keep state
      ● but not much
      ● manage metrics for you
      ● pre-aggregation is more efficient
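
The idea of a stateful client library can be sketched as follows: the process keeps one running total per label combination in memory and hands out the current totals when asked, instead of emitting one event per increment. The `Counter` class below is a hypothetical minimal version written for this illustration, not the API of any official client library.

```python
import threading


class Counter:
    """Hypothetical in-process counter with one running total per label combination."""

    def __init__(self, name, label_names):
        self.name = name
        self.label_names = tuple(label_names)
        self._values = {}
        self._lock = threading.Lock()

    def inc(self, amount=1.0, **labels):
        """Add to the running total for this label combination."""
        if amount < 0:
            raise ValueError("counters can only go up")
        key = tuple(labels[name] for name in self.label_names)
        with self._lock:
            self._values[key] = self._values.get(key, 0.0) + amount

    def collect(self):
        """Return the current totals as (labels, value) pairs, e.g. for a scrape."""
        with self._lock:
            return [
                (dict(zip(self.label_names, key)), value)
                for key, value in sorted(self._values.items())
            ]


requests = Counter("api_http_requests_total", ["method"])
for _ in range(1000):
    requests.inc(method="post")
requests.inc(method="get")
snapshot = requests.collect()
```

A thousand increments collapse into a single number per series, so the state kept is small and the cost of collection does not depend on the request rate.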
  30. Everything is a float64...
      This is crazy! I only need integers! What about precision?
  31. Everything is a float64...
      ● it’s simpler
      ● we compress it incredibly well
      ● float64 integer precision until 2^53
      To run into trouble: Increment counter 1 million times per second for over 285 years
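
The "285 years" figure can be checked with a few lines of Python arithmetic. The only assumption added here is a year length of 365.25 days; the variable names are illustrative.

```python
# Largest range in which every integer is exactly representable as a float64.
LIMIT = 2 ** 53
INCREMENTS_PER_SECOND = 1_000_000
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # assumed year length

# Time needed for a counter incremented a million times per second to reach 2^53.
years_to_limit = LIMIT / INCREMENTS_PER_SECOND / SECONDS_PER_YEAR

# Below the limit, adding 1 is exact; at the limit, the increment is lost to rounding.
exact_below_limit = float(LIMIT - 1) + 1.0 == float(LIMIT)
stuck_at_limit = float(LIMIT) + 1.0 == float(LIMIT)

print(f"{years_to_limit:.1f} years")
```

Both comparisons evaluate to `True`, and the computed duration falls between 285 and 286 years, in line with the slide.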
  32. Auth or Multi-User?
      ● focus on great monitoring
      ● too many different ways
      ● solve auth externally
      ● multitenancy is more difficult