Monitoring in a fast-changing world with Prometheus

Prometheus is an open source monitoring project used to gather metrics.
It has many capabilities built in, such as service discovery, which make it well suited to automated environments.

This talk gives a brief introduction to Prometheus, covers the latest developments, and then offers practical tips and examples of how to use it in an automated world.

  1. 1. Julien Pivotto @roidelapluie Monitoring in a fast-changing world with Prometheus October 2021
  2. 2. Monitoring @roidelapluie
  3. 3. • Applications are short-lived • Updated often • Infrastructure changes (Nothing new...) A fast-changing world @roidelapluie
  4. 4. • Monitoring an infrastructure • Monitoring user experience ... together (dev&ops) Monitoring in a "fast changing world" @roidelapluie
  5. 5. CPU Usage, Disk space, Memory, Open file descriptors, ... Infrastructure monitoring @roidelapluie
  6. 6. Request Rate, Request Errors, Request Duration Utilization, Saturation, Errors User experience monitoring @roidelapluie RED method by Tom Wilkie, USE method by Brendan Gregg
  7. 7. • High level overview of the state of a service/component • Performance • Availability • Technical components What is going on? What is monitoring? @roidelapluie
  8. 8. • Understand how your services behave • As if you were in their place • Without specific code Why is this going on? What's observability? @roidelapluie
  9. 9. • Monitoring is required • Some monitoring systems are designed for observability • If lucky, monitoring is enough • Observability is removing luck How do monitoring and observability connect? @roidelapluie
  10. 10. Three pillars: • Metrics • Logs • Traces What's observability - in Practice? @roidelapluie
  11. 11. Metrics @roidelapluie https://play.grafana.org/
  12. 12. Logs @roidelapluie
  13. 13. Traces @roidelapluie https://www.jaegertracing.io/img/trace-detail-ss.png
  14. 14. • Culture: building together • Automation • Measurement • Sharing A DevOps world @roidelapluie John Willis and Damon Edwards, 2010
  15. 15. Prometheus @roidelapluie
  16. 16. • Open Source monitoring solution • Graduated CNCF Project • Born in 2012, publicly announced in 2015 • Collects metrics • Plenty of integrations • Service discovery, e.g. Kubernetes • Easy to use query language • Built-in alerting Prometheus @roidelapluie
  17. 17. • A community • A server and many other components • An ecosystem What "Prometheus" means @roidelapluie
  18. 18. • Open Source • Pull-based Monitoring over HTTP • Powerful query language • Optimized TSDB The Prometheus server @roidelapluie
  19. 19. How does it work? @roidelapluie
  20. 20. How does it work? @roidelapluie
  21. 21. How does it work? @roidelapluie
  22. 22. How does it work? @roidelapluie
  23. 23. How does it work? @roidelapluie
  24. 24. Metrics @roidelapluie
  25. 25. Prometheus scrapes metrics over HTTP. caddy_http_requests_total{code="200",method="POST",path="/load"} 1 Dimensional data model, for filtering and aggregation. Metrics and Labels @roidelapluie
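
To make the dimensional data model concrete, here is a minimal Python sketch that splits the sample line from the slide above into a metric name, a label set, and a value. It is an illustration only: the helper name `parse_sample` is hypothetical, and the sketch ignores parts of the real exposition format such as comments, timestamps, and escape sequences inside label values.

```python
import re

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
# One label pair inside the braces: name="value".
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Split a simple exposition line into (name, labels, value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    name, raw_labels, raw_value = match.groups()
    labels = dict(LABEL_RE.findall(raw_labels or ""))
    return name, labels, float(raw_value)


sample = 'caddy_http_requests_total{code="200",method="POST",path="/load"} 1'
name, labels, value = parse_sample(sample)
print(name, labels, value)
```

Each label is one dimension, which is what lets a query filter on `code` or aggregate across `path`.
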
  26. 26. sum(rate(caddy_http_requests_total{code=~"5.."}[5m])) Gets the rate of all 5xx HTTP responses (server errors). Querying metrics and Labels @roidelapluie
  27. 27. sum(rate(caddy_http_requests_total{code=~"5.."}[5m])) without(code) / sum(rate(caddy_http_requests_total[5m])) without(code) Gets the fraction of 5xx HTTP responses (server errors); multiply by 100 for a percentage. Querying metrics and Labels @roidelapluie
  28. 28. • Metrics do not represent problems • Metrics represent a state, give insights • Metrics can be graphed • You can alert based on them Metrics and monitoring @roidelapluie
  29. 29. In general you can just expose counters, and let the monitoring server do the real maths. That keeps the overhead on apps very low. Exposed metrics are "raw" @roidelapluie
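
As a rough illustration of "the monitoring server does the maths", the Python sketch below derives a per-second rate from raw counter samples, treating a drop in value as a counter reset. It is a simplification and not how PromQL's `rate()` is implemented (the real function also extrapolates to the edges of the time window); the function names and the sample values are assumptions made for the example.

```python
def increase(samples):
    """Total increase of a counter over (timestamp, value) samples.

    A drop in value is treated as a counter reset (for example an
    application restart), so counting resumes from the new value.
    """
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        total += (cur - prev) if cur >= prev else cur
    return total


def simple_rate(samples):
    """Per-second average increase between the first and last sample."""
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("need at least two samples with increasing timestamps")
    return increase(samples) / elapsed


# Hypothetical scrapes every 15 seconds; the app restarts before the last one.
scrapes = [(0, 100.0), (15, 130.0), (30, 160.0), (45, 10.0)]
print(simple_rate(scrapes))
```

The application only ever increments a number; everything else happens on the server side, at query time.
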
  30. 30. Architecture @roidelapluie
  31. 31. • Prometheus server • Alertmanager • Exporters Prometheus components @roidelapluie
  32. 32. • Single binary • No clustering • No dependency on distributed FS Prometheus server @roidelapluie
  33. 33. • Single binary • Clustering (gossip protocol) Alertmanager @roidelapluie
  34. 34. Automation @roidelapluie
  35. 35. Let's see what makes Prometheus play nicely with automation tools. Automating Prometheus @roidelapluie
  36. 36. • Works on your machine • Container ready • Not tied to Kubernetes (see prometheus-operator) Deploy anywhere @roidelapluie
  37. 37. • Reloads on SIGHUP • /-/reload endpoint (--web.enable-lifecycle) • Working to have less and less overhead on reloads Reloading Prometheus @roidelapluie
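
A minimal Python sketch of triggering a reload through the `/-/reload` endpoint, assuming Prometheus was started with `--web.enable-lifecycle` and listens on `http://localhost:9090`. The function names are hypothetical, and the example only builds the request; calling `reload_prometheus` would actually send it.

```python
import urllib.request


def build_reload_request(base_url):
    """Build the POST request for the /-/reload lifecycle endpoint."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/-/reload",
        data=b"",
        method="POST",
    )


def reload_prometheus(base_url, timeout=10):
    """Send the reload request and return the HTTP status code."""
    request = build_reload_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Assumed address of a local Prometheus server.
request = build_reload_request("http://localhost:9090")
print(request.get_method(), request.full_url)
```

An automation tool can call this after it has written and validated a new configuration file.
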
  38. 38.
      - template:
          src: prometheus.yml.j2
          dest: /etc/prometheus/prometheus.yml
          validate: /usr/bin/promtool check config %s
      Also: check rules, check web-config.
      Ansible @roidelapluie
  39. 39. Plenty of situations do not require a reload of Prometheus: • Password files • TLS certificates Prometheus will read them before use, no reload needed! Not reloading Prometheus @roidelapluie
  40. 40. HashiCorp Vault enables retrieving temporary secrets and writing them to a file. ./vault agent -config vault-agent.hcl Using vault @roidelapluie https://inuits.eu/blog/prometheus-consul-vault-228/
  41. 41.
      scrape_configs:
        - job_name: consul-services
          consul_sd_configs:
            - server: localhost:8500
              authorization:
                credentials_file: consul_token
      Reading token from vault @roidelapluie
  42. 42. Prometheus offers native TLS and basic auth.
      tls_server_config:
        cert_file: server.crt
        key_file: server.key
      basic_auth_users:
        alice: $2y$10$mDwo.lAisC94iLAyP8
        bob: $2y$10$hLqFl9jSjoAAy95Z/zw8
      TLS and basic auth @roidelapluie
  43. 43. The "web-config" file is read on every request: • No need to reload • Instantly change passwords, cert files Shared config format between Prometheus and exporters! TLS and basic auth @roidelapluie
  44. 44. Prometheus has a snapshot API. Enable with --web.enable-admin-api curl -d{} http://localhost:9090/api/v1/admin/tsdb/snapshot Prometheus TSDB is made of immutable blocks. Snapshots use hard links. Backups @roidelapluie
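
After the snapshot call returns, a backup script needs to know which directory to copy. The Python sketch below assumes the JSON response carries the snapshot name under `data.name` and that snapshots are written to a `snapshots/` directory under the TSDB data directory; the response body and the `/var/lib/prometheus` path are hypothetical values used for illustration.

```python
import json
from pathlib import PurePosixPath


def snapshot_dir(data_dir, response_body):
    """Return the directory holding the snapshot named in an API response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError(f"snapshot failed: {payload}")
    return PurePosixPath(data_dir) / "snapshots" / payload["data"]["name"]


# Hypothetical response body and data directory, for illustration only.
body = '{"status": "success", "data": {"name": "20211001T120000Z-0123456789abcdef"}}'
print(snapshot_dir("/var/lib/prometheus", body))
```

Because the snapshot is made of hard links to immutable blocks, copying that directory elsewhere does not interfere with the running server.
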
  45. 45. Service Discovery @roidelapluie
  46. 46. • Easier to know what's down with Pull • Easy debugging (curl) • Easier to spread the load • Central configuration point • High Availability Prometheus pull model @roidelapluie
  47. 47. • Prometheus must know what to pull • Source of Truth • Service Discovery != Auto discovery • Event based when possible Service Discovery @roidelapluie
  48. 48. • Kubernetes • Consul • Cloud providers (Azure, AWS, GCP, DigitalOcean, Hetzner, Scaleway, Linode) • Docker & Docker Swarm • And more! 20+ external SD in total. Sources of Truth @roidelapluie
  49. 49. • Static SD: into Prometheus main config • File SD: Files on disk • HTTP SD: HTTP endpoints Generic Service Discovery @roidelapluie https://inuits.eu/blog/prometheus-http-service-discovery/
  50. 50. [ { "targets": ["10.0.10.2:9100"], "labels": { "__meta_datacenter": "london" } } ] Generic SD format (file & http SD) @roidelapluie
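
A minimal Python sketch of producing this format from your own source of truth. The inventory dictionary, the function names, and the grouping by datacenter are assumptions for the example; the file is written to a temporary name and then renamed, so Prometheus should not pick up a half-written file.

```python
import json
import os
import tempfile


def build_target_groups(inventory):
    """Group host:port targets by datacenter into the generic SD format."""
    groups = {}
    for host, datacenter in sorted(inventory.items()):
        groups.setdefault(datacenter, []).append(host)
    return [
        {"targets": targets, "labels": {"__meta_datacenter": datacenter}}
        for datacenter, targets in sorted(groups.items())
    ]


def write_file_sd(path, target_groups):
    """Write the JSON next to the destination, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    with tempfile.NamedTemporaryFile(
        "w", dir=directory, suffix=".tmp", delete=False
    ) as tmp:
        json.dump(target_groups, tmp, indent=2)
    os.replace(tmp.name, path)


# Hypothetical inventory: address -> datacenter.
inventory = {
    "10.0.10.2:9100": "london",
    "10.0.10.3:9100": "london",
    "10.0.20.2:9100": "paris",
}
groups = build_target_groups(inventory)
print(json.dumps(groups, indent=2))
```

The same list could be returned as the JSON body of an HTTP SD endpoint instead of being written to disk.
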
  51. 51. • Both integrate your own SD systems into prometheus • File SD is event based (inotify) • HTTP SD can be integrated in your apps File SD vs HTTP SD @roidelapluie
  52. 52. Labels can be used to configure targets. • __address__: 127.0.0.1:9090 • __metrics_path__: /metrics • __scheme__: http or https • __param_<name>: http parameter • __scrape_interval__, __scrape_timeout__: 1m Labels @roidelapluie
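
To show how these special labels fit together, the Python sketch below assembles the URL that would be scraped for a given label set. It is an approximation of the behaviour described on the slide, not Prometheus code; the function name is hypothetical, and the defaults (`http`, `/metrics`) are assumed to apply when a label is absent.

```python
from urllib.parse import urlencode


def scrape_url(labels):
    """Assemble the URL a target would be scraped at from its special labels."""
    scheme = labels.get("__scheme__", "http")
    path = labels.get("__metrics_path__", "/metrics")
    # Every __param_<name> label becomes an HTTP query parameter <name>.
    params = {
        key[len("__param_"):]: value
        for key, value in labels.items()
        if key.startswith("__param_")
    }
    url = f"{scheme}://{labels['__address__']}{path}"
    return f"{url}?{urlencode(params)}" if params else url


print(scrape_url({"__address__": "127.0.0.1:9090"}))
print(scrape_url({
    "__address__": "127.0.0.1:9115",
    "__metrics_path__": "/probe",
    "__param_module": "http_2xx",
    "__param_target": "http://prometheus.io",
}))
```
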
  53. 53. Additionally, extra labels are added by SD. • __meta_kubernetes_pod_label_app • __meta_digitalocean_region • __meta_linode_public_ipv6 • __meta_scaleway_instance_status Meta labels @roidelapluie https://prometheus.io/docs/prometheus/latest/configuration/configuration/
  54. 54. A fundamental principle in Prometheus. Transform input labels into a new set of labels. Relabeling @roidelapluie
  55. 55. • Rename, merge, replace labels • Conditionally drop label sets • Only keep matching label sets Relabeling actions @roidelapluie
  56. 56. • Gets lots of labels as input • Turns them into targets • Removes labels prefixed with __ • Can use "special labels" Target relabeling @roidelapluie
  57. 57.
      - job_name: 'blackbox'
        metrics_path: /probe
        params:
          module: [http_2xx]
        static_configs:
          - targets:
              - http://prometheus.io
        relabel_configs:
          - source_labels: [__address__]
            target_label: __param_target
          - source_labels: [__param_target]
            target_label: instance
          - target_label: __address__
            replacement: 127.0.0.1:9115
      Target relabeling example @roidelapluie
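
The three relabeling rules in the blackbox example can be traced step by step with a small Python simulation. This is a simplified model that handles only the default `replace` action, with the default regex `(.*)`, separator `;`, and replacement `$1` assumed; the `relabel` function is a hypothetical helper, not part of Prometheus.

```python
import re


def relabel(labels, rules):
    """Apply simplified 'replace' relabeling rules to a label set."""
    labels = dict(labels)
    for rule in rules:
        # Source label values are joined with ';' before matching.
        source = ";".join(
            labels.get(name, "") for name in rule.get("source_labels", [])
        )
        match = re.fullmatch(rule.get("regex", "(.*)"), source)
        if match is None:
            continue  # a non-matching rule leaves the labels untouched
        # Rules reference capture groups as $1; Python's re expects \g<1>.
        template = re.sub(r"\$(\d+)", r"\\g<\1>", rule.get("replacement", "$1"))
        labels[rule["target_label"]] = match.expand(template)
    return labels


rules = [
    {"source_labels": ["__address__"], "target_label": "__param_target"},
    {"source_labels": ["__param_target"], "target_label": "instance"},
    {"target_label": "__address__", "replacement": "127.0.0.1:9115"},
]
result = relabel({"__address__": "http://prometheus.io"}, rules)
print(result)
```

The original address ends up as the `target` query parameter and as the `instance` label, while the scrape itself is redirected to the exporter at `127.0.0.1:9115`.
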
  58. 58. https://relabeler.promlabs.com/ Relabeler @roidelapluie
  59. 59.
      puppetdb_sd_configs:
        - url: http://127.0.0.1:8080/
          query: 'resources { type = "Apache::Vhost" }'
          include_parameters: true
      relabel_configs:
        - source_labels: [__meta_puppetdb_parameter_servername]
          target_label: __param_target
        - source_labels: [__param_target]
          target_label: instance
        - target_label: __address__
          replacement: 127.0.0.1:9115
      Relabel from SD @roidelapluie
  60. 60. Recap @roidelapluie
  61. 61. • Simple to deploy • Behaves as you expect • Easy to reload Prometheus is automation friendly @roidelapluie
  62. 62. • Easy password/cert rotation • Service discovery to keep up with infrastructure changes Prometheus is change friendly @roidelapluie
  63. 63. https://prometheus.io/community Prometheus is open, join us! @roidelapluie
  64. 64. Julien Pivotto @roidelapluie roidelapluie@inuits.eu Essensteenweg 31 2930 Brasschaat Belgium Contact: info@inuits.eu +32-3-8082105
