
Prometheus on Kubernetes


The talk at R+V Versicherung's "Talk4Nerds" conference gave an introduction to Prometheus as a monitoring solution. inovex's Christoph covered the requirements for a modern monitoring tool, how Prometheus meets them, and why it has become the de facto standard in the Kubernetes ecosystem. He closed by looking at Prometheus's non-goals and how they can still be achieved with additional tools.
Speaker: Christoph Petrausch (inovex)
Event: Talk4Nerds
Date: 29.04.2019

More tech talks: inovex.de/vortraege
More tech articles: inovex.de/blog

Published in: Software


  1. Prometheus on Kubernetes. Christoph Petrausch, 29.4.2019
  2. Christoph Petrausch, Linux Systems Engineer
      ● Golang
      ● Kubernetes
  3. Environment (image: https://www.flickr.com/photos/noaaphotolib/5578039998/)
  4. Volatile infrastructure (image: https://www.flickr.com/photos/burgtender/4052169876/)
  5. Service discovery (diagram: API for pods, nodes, services; targets.yml; API for VMs; DNS with SRV, A, AAAA records)
  6. Simple (image: https://www.flickr.com/photos/nicokaiser/4667377944)
  7. Collecting and storing metrics (diagram: tsdb, target, GET /metrics)
  8. High availability (image: https://www.flickr.com/photos/rob-sinclair/2553517053)
  9. Two is better than one (diagram: the same targets, repeated)
  10. Scalable (image: https://www.flickr.com/photos/rob-sinclair/2553517053)
  11. Sharding (diagram: Prometheus, targets)
  12. Services (image: https://www.flickr.com/photos/bigshock/363611248)
  13.
  14. Instrumentation (image: https://www.flickr.com/photos/nasa_ice/15163970050)
  15. Exporters
      ● Operating system: node-exporter
      ● Database: mysql-exporter
      ● Application
      ● Kubelet, Kube API, Kube Scheduler, etc.
  16. Metric format: name and value
      http_count{} 731321
  17. Metric format: documentation
      # HELP http_count Total Number of HTTP Requests
      # TYPE http_count counter
      http_count{} 731321
  18. Metric format: labels
      # HELP http_count Total Number of HTTP Requests
      # TYPE http_count counter
      http_count{handler="/ui/static",instance="website-jas1kg1d-adjkm1",job="pods",service="website"} 731321
  19. Metric format: sample exporter output
      # HELP node_disk_discard_time_seconds_total This is the total number of seconds spent by all discards.
      # TYPE node_disk_discard_time_seconds_total counter
      node_disk_discard_time_seconds_total{device="dm-0"} 0
      node_disk_discard_time_seconds_total{device="dm-1"} 0
      node_disk_discard_time_seconds_total{device="nvme0n1"} 0
      node_disk_discard_time_seconds_total{device="sda"} 0
      # HELP node_disk_discarded_sectors_total The total number of sectors discarded successfully.
      # TYPE node_disk_discarded_sectors_total counter
      node_disk_discarded_sectors_total{device="dm-0"} 0
      node_disk_discarded_sectors_total{device="dm-1"} 0
      node_disk_discarded_sectors_total{device="nvme0n1"} 0
      node_disk_discarded_sectors_total{device="sda"} 0
      # HELP node_cpu_seconds_total Seconds the cpus spent in each mode.
      # TYPE node_cpu_seconds_total counter
      node_cpu_seconds_total{cpu="0",mode="idle"} 100327.11
      node_cpu_seconds_total{cpu="0",mode="iowait"} 167.2
      node_cpu_seconds_total{cpu="0",mode="irq"} 1211.28
      node_cpu_seconds_total{cpu="0",mode="nice"} 5762.09
  20. Counter (chart, x-axis: time)
  21. Gauge (chart, x-axis: time)
  22. SLAs
      ● 99.9% of all requests complete in under 50 ms
      ● 99.9% availability
      ● 99.99% of all requests must succeed
  23. Long tail
  24. Long tail (chart markers: average, 99.9%)
  25. PromQL
      ● Modeled on SQL
      ● Aggregation over labels
      ● Mathematical functions
      ● Range and offset selectors
  26. PromQL aggregations
      Query: sum by (service) (http_count{handler="/ui/static"})
      Example:
        http_count{handler="/ui/static",service="a",i="a"} 5
        http_count{handler="/ui/static",service="a",i="y"} 5
        http_count{handler="/ui/static",service="b",i="z"} 5
      Result: {service="a"} = 10, {service="b"} = 5
  27. PromQL functions
      Request rate per instance:
        rate(http_count{handler="/ui/static"}[1m])
      Aggregated per service:
        sum by (service) (rate(http_count{handler="/ui/static"}[1m]))
  28. Histogram
  29. PromQL offset
      Difference between now and yesterday:
        http_count - (http_count offset 24h)
  30. PromQL alert rules
      groups:
      - name: website
        rules:
        - alert: HighErrorRate
          expr: sum by (service)(rate(http_error{handler="/ui/static"}[1m])) > 0.5
          for: 10m
          labels:
            severity: page
          annotations:
            summary: High error rate
  31. PromQL usage
      ● Grafana dashboards
      ● Alert rules
      ● Ad hoc queries
        ○ UI
        ○ API
  32. Prometheus ecosystem: Grafana
  33. Prometheus ecosystem: Alertmanager
      ● Alert routing
      ● Silences
      ● Inhibition
      ● Grouping
  34. Architecture from 1000 feet (diagram: API for pods, nodes, services; targets; Grafana; Alertmanager)
  35. Prometheus: open problems
      ● Long-term storage
      ● Aggregation across multiple Prometheus instances
      ● Deduplication when sharding
  36. Thanos (diagram: targets (T), sidecars, blob storage, Alertmanager, Thanos Query, Thanos Ruler, Thanos Compact, Grafana)
  37. Thank you. Christoph Petrausch, Twitter: @hikhvar, Mail: christoph.petrausch@inovex.de
  38. Sources
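
The label aggregation from slide 26 can be sketched in plain Python to show what `sum by (service)` does with the three sample series: it filters by the label matcher, drops every label except `service`, and adds up the values that end up in the same group. The helper `sum_by` and its tuple-based result keys are assumptions made for illustration; this is not how Prometheus implements the operator.

```python
from collections import defaultdict

# The three sample series from slide 26 as (labels, value) pairs.
samples = [
    ({"handler": "/ui/static", "service": "a", "i": "a"}, 5.0),
    ({"handler": "/ui/static", "service": "a", "i": "y"}, 5.0),
    ({"handler": "/ui/static", "service": "b", "i": "z"}, 5.0),
]


def sum_by(series, keep, match):
    """Hypothetical helper mimicking `sum by (<keep>) (metric{<match>})`."""
    totals = defaultdict(float)
    for labels, value in series:
        # Equality matchers only; regex matchers are left out of this sketch.
        if all(labels.get(name) == wanted for name, wanted in match.items()):
            # Group key: only the labels listed in `keep` survive.
            key = tuple((name, labels.get(name, "")) for name in keep)
            totals[key] += value
    return dict(totals)


result = sum_by(samples, keep=["service"], match={"handler": "/ui/static"})
for key, total in sorted(result.items()):
    print(dict(key), total)
```

The two series for service "a" collapse into one group with value 10 and service "b" keeps its single value of 5, which matches the result shown on the slide.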

×
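
The counter on slide 20 and the `rate()` queries on slide 27 fit together as follows: a counter only grows, except when the process restarts and it starts again from zero, and `rate()` turns the growth inside a time window into a per-second value. The sketch below is a simplified model under stated assumptions: `simple_rate` and the sample values are hypothetical, and the window-boundary extrapolation that Prometheus also applies is omitted.

```python
def simple_rate(points):
    """Per-second increase of a counter over a window.

    `points` is a list of (timestamp_seconds, counter_value), oldest first.
    Simplified model: handles counter resets, does no extrapolation.
    """
    if len(points) < 2:
        return None  # a rate needs at least two samples
    duration = points[-1][0] - points[0][0]
    if duration <= 0:
        return None
    increase = 0.0
    previous = points[0][1]
    for _, value in points[1:]:
        if value < previous:
            # Counter reset: the process restarted, so it counted up from 0.
            increase += value
        else:
            increase += value - previous
        previous = value
    return increase / duration


# Hypothetical scrapes every 15 s; the counter resets between t=30 and t=45.
window = [(0, 100.0), (15, 130.0), (30, 160.0), (45, 10.0), (60, 40.0)]
requests_per_second = simple_rate(window)
print(requests_per_second)
```

Because the reset is treated as a restart from zero, the drop from 160 to 10 contributes an increase of 10 rather than a negative value. This is why a rate is computed on the raw counter first and only then summed across instances, as in the second query on slide 27.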