Time Series Denver: An Introduction to Prometheus

  1. 1. An Introduction to Prometheus Time Series Denver - May 30, 2018
  2. 2. Introduction ● CTO & Co-Founder - FreshTracks.io - A CA Accelerator Incubation ○ “Simplifying Kubernetes Visibility” ● bob@freshtracks.io ● @bob_cotton ● Father, Fly Fisher & Avid Homebrewer
  3. 3. Agenda ● What is a Cloud Native Application? ● Cloud Native Application Challenges ● The 5 Pillars of Monitoring ● An Introduction to Prometheus ● What FreshTracks Provides
  4. 4. What is a Cloud Native Application?
  5. 5. Cloud Native Application ● Follows 12 Factor Application Practices ● Packaged into containers ● Follows a micro-service architecture ● Managed by a container orchestrator ○ Kubernetes, Docker Swarm, Mesos ● Usually deployed on dynamic infrastructure ○ VMware ○ Cloud providers ● Application lifecycle allows for ○ Auto-provisioning ○ Auto-scaling ○ Auto-redundancy
  6. 6. Cloud Native Applications Challenges
  7. 7. Cloud Native Challenges ● Containers are ephemeral ○ Scheduled on any node in the cluster ○ Move Frequently on restarts and deployments ● Kubernetes needs to be monitored ● Kubernetes brings additional complexities ○ Resource Quotas ○ Pod and Cluster Scaling ● Challenges traditional tools
  8. 8. 5 Pillars of Monitoring
  9. 9. The 5 Pillars of Monitoring ● Metrics and Alerting ● Log Analytics ● Distributed Tracing ● Application Performance Monitoring ● Real User Monitoring
  10. 10. Enter Prometheus
  11. 11. Prometheus ● Started in 2012 at SoundCloud by ex-Google Engineers ○ Open Sourced in 2015 ● Patterned after “BorgMon” - Google’s Container monitoring system ● Second project accepted into the CNCF after Kubernetes ● Adoption surge is tracking Kubernetes ○ 63% of teams using Kubernetes use Prometheus
  12. 12. Prometheus Major Features ● Label/value based time series data model ● “Pull based” metrics collection ● Service discovery mechanism ● Simple metrics format with a rich set of “exporters” ● Extremely high-performance TSDB ● Extensive query language - PromQL ● Alert Manager ● Easily installable from Helm ○ Single, statically linked binary ● Open Source Grafana used for visualization
  13. 13. Time Series Data Model ● <identifier> → [(t0, v0), (t1, v1), (t2, v2) …] ● Identifier is a collection of label/value pairs ● Time stored as int64: milliseconds since the epoch ● Values stored as float64 ● Efficient storage on disk: 1.3 bytes/sample
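
The data model on this slide can be sketched in a few lines of Python. This is a toy illustration, not how the Prometheus TSDB is implemented: `TinyTSDB` and `series_id` are hypothetical names, and nothing here attempts the compression behind the 1.3 bytes/sample figure. It relies on one fact about Prometheus: the metric name is itself stored as the `__name__` label.

```python
from typing import Dict, FrozenSet, List, Tuple

# An identifier is an unordered, hashable set of (label, value) pairs.
Labels = FrozenSet[Tuple[str, str]]
# A sample is (timestamp in milliseconds since the epoch, float value).
Sample = Tuple[int, float]


def series_id(name: str, **labels: str) -> Labels:
    """Build a series identifier; the metric name becomes the __name__ label."""
    return frozenset({"__name__": name, **labels}.items())


class TinyTSDB:
    """Hypothetical in-memory store mapping identifier -> list of samples."""

    def __init__(self) -> None:
        self.series: Dict[Labels, List[Sample]] = {}

    def append(self, ident: Labels, t_ms: int, value: float) -> None:
        self.series.setdefault(ident, []).append((int(t_ms), float(value)))


db = TinyTSDB()
home_ok = series_id("http_requests_total", job="apache", path="/home", status="200")
db.append(home_ok, 1527700000000, 5439)
db.append(home_ok, 1527700015000, 5461)
```

Because the identifier is a set, the order in which labels are written does not matter: the same label/value pairs always address the same series.
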
  14. 14. Label/Value Based Data Model ● Graphite/StatsD ○ apache.192-168-5-1.home.200.http_requests_total ○ apache.192-168-5-1.home.500.http_requests_total ○ apache.192-168-5-1.about.200.http_requests_total ● Prometheus ○ http_requests_total{job="apache", instance="192.168.5.1", path="/home", status="200"} ○ http_requests_total{job="apache", instance="192.168.5.1", path="/home", status="500"} ○ http_requests_total{job="apache", instance="192.168.5.1", path="/about", status="200"} ● Selecting Series ○ Graphite/StatsD: *.*.home.200.http_requests_total ○ Prometheus: http_requests_total{status="200", path="/home"}
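
To make the "Selecting Series" comparison concrete, here is a minimal sketch of label-based selection, assuming series are plain dictionaries and only equality matchers are needed. The `select` helper is hypothetical; real PromQL selectors also support `!=`, `=~` and `!~`.

```python
from typing import Dict, List

Series = Dict[str, str]


def select(series: List[Series], **matchers: str) -> List[Series]:
    """Return the series whose labels equal every given matcher."""
    return [s for s in series if all(s.get(k) == v for k, v in matchers.items())]


all_series = [
    {"__name__": "http_requests_total", "job": "apache",
     "instance": "192.168.5.1", "path": "/home", "status": "200"},
    {"__name__": "http_requests_total", "job": "apache",
     "instance": "192.168.5.1", "path": "/home", "status": "500"},
    {"__name__": "http_requests_total", "job": "apache",
     "instance": "192.168.5.1", "path": "/about", "status": "200"},
]

# Equivalent in spirit to http_requests_total{status="200", path="/home"}
hits = select(all_series, __name__="http_requests_total", status="200", path="/home")
```

Unlike a positional Graphite path, the matcher does not need to know how many other dimensions a series carries or in what order they were declared.
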
  15. 15. Client Data Model ● Counters ○ Always go up or get reset to 0 ● Gauge ○ Tracks a real value e.g. temperature ● Histogram and Summary ○ Used for percentiles
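
The three client-side types can be sketched as follows. These classes are hypothetical stand-ins, not the API of any official Prometheus client library. The quantile estimate uses linear interpolation inside the bucket that contains the requested rank, which is the general idea behind computing percentiles from histogram buckets; treat the details as an assumption of this sketch.

```python
import math
from bisect import bisect_left
from typing import List


class Counter:
    """Only goes up; a process restart is the only way back to 0."""

    def __init__(self) -> None:
        self.value = 0.0

    def inc(self, amount: float = 1.0) -> None:
        if amount < 0:
            raise ValueError("counters cannot decrease")
        self.value += amount


class Gauge:
    """Tracks a value that can move in either direction, e.g. temperature."""

    def __init__(self) -> None:
        self.value = 0.0

    def set(self, value: float) -> None:
        self.value = value


class Histogram:
    """Counts observations per bucket; buckets are defined by upper bounds."""

    def __init__(self, upper_bounds: List[float]) -> None:
        self.bounds = sorted(upper_bounds) + [math.inf]
        self.counts = [0] * len(self.bounds)  # per-bucket, not cumulative

    def observe(self, value: float) -> None:
        # First bucket whose upper bound is >= value.
        self.counts[bisect_left(self.bounds, value)] += 1

    def quantile(self, q: float) -> float:
        total = sum(self.counts)
        if total == 0:
            return math.nan
        rank = q * total
        seen = 0
        for i, count in enumerate(self.counts):
            if count and seen + count >= rank:
                if math.isinf(self.bounds[i]):
                    return self.bounds[i - 1]  # cannot interpolate into +Inf
                lower = self.bounds[i - 1] if i > 0 else 0.0
                return lower + (self.bounds[i] - lower) * (rank - seen) / count
            seen += count
        return self.bounds[-2]


requests = Counter()
requests.inc()
requests.inc(2)

latency = Histogram([0.1, 0.5, 1.0])
for seconds in (0.05, 0.2, 0.3, 0.4):
    latency.observe(seconds)
median = latency.quantile(0.5)
```

The estimate is only as precise as the bucket boundaries: all the sketch knows is that three of the four observations fell somewhere between 0.1 and 0.5.
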
  16. 16. Prometheus Service Discovery and Target Scrape (diagram: Prometheus, with its TSDB, uses service discovery against the K8s API Server to find scrape targets: Kubelet (cAdvisor), node_exporter, kube-state-metrics, app containers and other exporters)
  17. 17. Prometheus Exposition Format and Exporters ● The Prometheus exposition format: text over HTTP. Simple, human readable ● Supported by Sysdig and the TICK collector ○ Efforts to make it a standard ● Close to 100 exporters for various technologies ● The jmx_exporter can cover any Java/JMX application ● https://prometheus.io/docs/instrumenting/exporters/ ● Official Exporters: ○ node_exporter ○ jmx_exporter ○ snmp_exporter ○ haproxy_exporter ○ cloudwatch_exporter ○ collectd_exporter ○ mysql_exporter ○ memcached_exporter
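
As an illustration of how simple the text format is, the sketch below renders one metric family by hand. The `render` and `escape_label_value` helpers are hypothetical; in practice an official client library would generate this output. The sample values are made up.

```python
from typing import Dict, List, Tuple


def escape_label_value(value: str) -> str:
    """Escape backslash, double quote and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render(name: str, help_text: str, metric_type: str,
           samples: List[Tuple[Dict[str, str], float]]) -> str:
    """Render one metric family: HELP line, TYPE line, then one line per sample."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        pairs = ",".join(
            f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
        )
        label_part = "{" + pairs + "}" if pairs else ""
        lines.append(f"{name}{label_part} {value}")
    return "\n".join(lines) + "\n"


text = render(
    "http_requests_total", "Total HTTP requests.", "counter",
    [({"path": "/home", "status": "200"}, 5439),
     ({"path": "/home", "status": "500"}, 12)],
)
```

An exporter is essentially a small HTTP server that returns text like this on each scrape.
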
  18. 18. Querying Series with PromQL ● PromQL is a functional query language, nothing like SQL ● PromQL: rate(http_requests_total[5m]) ● A hypothetical SQL equivalent: SELECT job, instance, path, status, rate(value, 5m) FROM http_requests_total;
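
What `rate(...[5m])` computes can be approximated as "per-second increase of a counter over the window, tolerating resets". The sketch below is a deliberate simplification under that assumption: `simple_rate` is a hypothetical helper, and it omits the extrapolation to the window boundaries that the real PromQL function performs.

```python
from typing import List, Tuple

Sample = Tuple[int, float]  # (timestamp in ms, counter value)


def simple_rate(samples: List[Sample], now_ms: int, window_ms: int) -> float:
    """Per-second increase over the window, treating any drop as a counter reset."""
    window = [(t, v) for t, v in samples if now_ms - window_ms < t <= now_ms]
    if len(window) < 2:
        raise ValueError("need at least two samples in the window")
    increase = 0.0
    for (_, prev), (_, cur) in zip(window, window[1:]):
        # After a reset the counter restarted from 0, so the new value is the increase.
        increase += (cur - prev) if cur >= prev else cur
    seconds = (window[-1][0] - window[0][0]) / 1000.0
    return increase / seconds


# One sample per minute; the counter resets between the second and third sample.
scraped = [(0, 100.0), (60_000, 160.0), (120_000, 10.0), (180_000, 70.0)]
per_second = simple_rate(scraped, now_ms=180_000, window_ms=300_000)
```

The reset handling is why counters are preferred for request totals: a restart loses the absolute value but not the rate.
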
  19. 19. Querying Series with PromQL ● Calculate the ratio of failed requests to total requests, per path: sum(rate(http_requests_total{status="500"}[5m])) by (path) / sum(rate(http_requests_total[5m])) by (path) ● Result: {path="/home"} 0.014, {path="/about"} 0.027
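
The aggregation in this query can be traced by hand. The sketch below assumes the per-series rates have already been computed and applies the same two steps: sum by `path`, then divide the error sum by the total. The helper names and the input rates are made up (chosen so the results match the slide), and `/health` is a hypothetical extra path with no errors.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

RateSample = Tuple[Dict[str, str], float]  # (labels, per-second rate)


def sum_by(samples: List[RateSample], label: str) -> Dict[str, float]:
    """Sum rates across all series that share the same value for `label`."""
    totals: Dict[str, float] = defaultdict(float)
    for labels, rate in samples:
        totals[labels.get(label, "")] += rate
    return dict(totals)


def error_ratio_by_path(samples: List[RateSample]) -> Dict[str, float]:
    """Ratio of status=500 rate to total rate, for paths present on both sides."""
    errors = sum_by([s for s in samples if s[0].get("status") == "500"], "path")
    totals = sum_by(samples, "path")
    return {path: errors[path] / totals[path] for path in errors if totals.get(path)}


rates = [
    ({"path": "/home", "status": "200"}, 9.86),
    ({"path": "/home", "status": "500"}, 0.14),
    ({"path": "/about", "status": "200"}, 4.865),
    ({"path": "/about", "status": "500"}, 0.135),
    ({"path": "/health", "status": "200"}, 1.0),  # no 500s: absent from the result
]
ratios = error_ratio_by_path(rates)
```

As with PromQL's binary operators, a path appears in the result only when both sides of the division have a value for it.
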
  20. 20. Graphing
  21. 21. Dashboards with Grafana
  22. 22. Labels, Re-Label and Recording Rules Oh My...
  23. 23. Label/Value Based Data Model ● Graphite/StatsD ○ apache.192-168-5-1.home.200.http_requests_total ○ apache.192-168-5-1.home.500.http_requests_total ○ apache.192-168-5-1.about.200.http_requests_total ● Prometheus ○ http_requests_total{job="apache", instance="192.168.5.1", path="/home", status="200"} ○ http_requests_total{job="apache", instance="192.168.5.1", path="/home", status="500"} ○ http_requests_total{job="apache", instance="192.168.5.1", path="/about", status="200"} ● Selecting Series ○ Graphite/StatsD: *.*.home.200.http_requests_total ○ Prometheus: http_requests_total{status="200", path="/home"}
  24. 24. Kubernetes Labels ● Kubernetes gives us labels on all the things ● Our scrape targets live in the context of the K8s labels ○ This comes from service discovery ● We want to enhance the scraped metric labels with K8s labels ● This is why we need relabel rules in Prometheus
  25. 25. (diagram: K8s API Server, Service Discovery, Prometheus, Scrape Target, TSDB)
      Labels discovered for the target, before <relabel_config>:
        0: __address__ 300.196.17.41
        1: __meta_kubernetes_namespace default
        2: __meta_kubernetes_pod_annotation_freshtracks_io_data_sidecar true
        3: __meta_kubernetes_pod_annotation_freshtracks_io_path /metrics2
        4: __meta_kubernetes_pod_annotation_kubernetes_io_created_by "kind":"SerializedReference"?
        5: __meta_kubernetes_pod_annotation_kubernetes_io_limit_ranger LimitRanger plugin set: cpu request for container prometheus-configmap-reload; cpu request for container data-sidecar
        6: __meta_kubernetes_pod_annotation_prometheus_io_port 8077
        7: __meta_kubernetes_pod_annotation_prometheus_io_scrape false
        8: __meta_kubernetes_pod_container_name prometheus-configmap-reload
        9: __meta_kubernetes_pod_host_ip 172.20.42.119
        10: __meta_kubernetes_pod_ip 100.96.17.41
        11: __meta_kubernetes_pod_label_freshtracks_io_cluster bowl.freshtracks.io
        12: __meta_kubernetes_pod_label_pod_template_hash 1636686694
        13: __meta_kubernetes_pod_label_run data-sidecar
        14: __meta_kubernetes_pod_name data-sidecar-1636686694-83crm
        15: __meta_kubernetes_pod_node_name ip-xx-xxx-xx-xxx.us-west-2.compute.internal
        16: __meta_kubernetes_pod_ready false
        17: __metrics_path__ /metrics
        18: __scheme__ http
        19: job ftio-data-sidecar-calc
      Target labels after <relabel_config>:
        __address__ 300.196.17.41:8077
        __scheme__ http
        __metrics_path__ /metrics
        job ftio-data-sidecar-calc
        kubernetes_namespace default
        container_name prometheus-configmap-reload
      Sample as scraped:
        http_requests_total{region="us-east", az="us-east-1", instance_type="m2.xlarge", instance="i-3582k8", hostname="host1"} = 5439
      Sample with target labels attached (<metric_relabel_config> applies at this stage):
        http_requests_total{region="us-east", az="us-east-1", instance_type="m2.xlarge", instance="i-3582k8", hostname="host1", instance="300.196.17.41:8077", job="ftio-data-sidecar-calc", kubernetes_namespace="default", container_name="prometheus-configmap-reload"} = 5439
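
The relabeling step on this slide can be sketched as a small function. Everything here is illustrative: the slide does not show the actual `<relabel_config>`, so the four rules below are hypothetical ones that would produce a similar result, only the `keep` and `replace` actions are modelled, and Python's `re` stands in for the RE2 syntax Prometheus uses. The sketch does rely on documented behaviour: source label values are joined with `;`, the regex must match the whole string, `instance` defaults to `__address__`, and labels starting with `__` are dropped once relabeling is done. The address uses the pod IP from the slide.

```python
import re
from typing import Any, Dict, List, Optional, Tuple

Rule = Dict[str, Any]


def relabel(discovered: Dict[str, str],
            rules: List[Rule]) -> Optional[Tuple[str, Dict[str, str]]]:
    """Apply rules in order; return (scrape URL, target labels), or None if dropped."""
    labels = dict(discovered)
    for rule in rules:
        source = ";".join(labels.get(name, "") for name in rule["source_labels"])
        match = re.fullmatch(rule.get("regex", "(.*)"), source)
        action = rule.get("action", "replace")
        if action == "keep":
            if match is None:
                return None  # target is not scraped at all
        elif action == "replace" and match is not None:
            # Translate $1-style references into Python's \g<1> form.
            template = re.sub(r"\$(\d+)", r"\\g<\1>", rule.get("replacement", "$1"))
            labels[rule["target_label"]] = match.expand(template)
    scheme = labels.get("__scheme__", "http")
    path = labels.get("__metrics_path__", "/metrics")
    url = f"{scheme}://{labels['__address__']}{path}"
    labels.setdefault("instance", labels["__address__"])
    return url, {k: v for k, v in labels.items() if not k.startswith("__")}


discovered = {
    "__address__": "100.96.17.41",
    "__meta_kubernetes_namespace": "default",
    "__meta_kubernetes_pod_annotation_freshtracks_io_data_sidecar": "true",
    "__meta_kubernetes_pod_annotation_prometheus_io_port": "8077",
    "__meta_kubernetes_pod_container_name": "prometheus-configmap-reload",
    "__metrics_path__": "/metrics",
    "__scheme__": "http",
    "job": "ftio-data-sidecar-calc",
}

# Hypothetical rules, not the configuration from the talk.
rules = [
    {"action": "keep", "regex": "true",
     "source_labels": ["__meta_kubernetes_pod_annotation_freshtracks_io_data_sidecar"]},
    {"source_labels": ["__address__", "__meta_kubernetes_pod_annotation_prometheus_io_port"],
     "regex": r"([^:;]+)(?::\d+)?;(\d+)", "replacement": "$1:$2",
     "target_label": "__address__"},
    {"source_labels": ["__meta_kubernetes_namespace"], "target_label": "kubernetes_namespace"},
    {"source_labels": ["__meta_kubernetes_pod_container_name"], "target_label": "container_name"},
]

url, target_labels = relabel(discovered, rules)
```

The `__meta_*` labels exist only during relabeling, which is why any Kubernetes metadata worth keeping has to be copied into a regular label by a rule.
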
  26. 26. Recording Rules - Derivative Series ● New series can be generated by querying existing series and storing the result ● path:request_failures_per_requests:ratio_rate5m = sum(rate(http_requests_total{status="500"}[5m])) by (path) / sum(rate(http_requests_total[5m])) by (path)
  27. 27. High Availability (diagram: two Prometheus instances)
  28. 28. Federation (diagram: eight Prometheus instances, annotated "Subset of Metrics")
  29. 29. Long Term Storage and External Integrations ● Prometheus remote_write and remote_read integrations: ○ AppOptics: write ○ Chronix: write ○ Cortex: read and write ○ CrateDB: read and write ○ Elasticsearch: write ○ Gnocchi: write ○ Graphite: write ○ InfluxDB: read and write ○ OpenTSDB: write ○ PostgreSQL/TimescaleDB: read and write ○ SignalFx: write
  30. 30. Alerting
  31. 31. Alert Definition
      ALERT <alert name>
        EXPR <expression>
        [ FOR <duration> ]
        [ LABELS <label set> ]
        [ ANNOTATIONS <label set> ]
      Example:
      ALERT: IngesterCrowding
      EXPR: count by(ft_cluster, node) (cortex_ingester_ingested_samples_total) > 1
      FOR: 30m
      LABELS:
        severity: critical
      ANNOTATIONS:
        description: https://github.com/Fresh-Tracks/gke-configs/blob/master/docs/alerts.md#ingestercrowding
        summary: Node {{ $labels.node }} is hosting {{ $value }} ingester pods
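
The effect of the optional FOR clause can be sketched as a small state machine: the expression has to stay true for the whole duration before the alert fires. `AlertRule` below is a hypothetical class for illustration, with time passed in explicitly as seconds so the behaviour is easy to follow.

```python
from typing import Optional


class AlertRule:
    """Tracks one alert through the inactive -> pending -> firing states."""

    def __init__(self, name: str, for_seconds: float) -> None:
        self.name = name
        self.for_seconds = for_seconds
        self.active_since: Optional[float] = None

    def evaluate(self, condition_true: bool, now: float) -> str:
        if not condition_true:
            self.active_since = None  # any false evaluation resets the timer
            return "inactive"
        if self.active_since is None:
            self.active_since = now
        held_for = now - self.active_since
        return "firing" if held_for >= self.for_seconds else "pending"


rule = AlertRule("IngesterCrowding", for_seconds=30 * 60)
states = [
    rule.evaluate(True, now=0),       # condition first seen
    rule.evaluate(True, now=900),     # 15 minutes in
    rule.evaluate(True, now=1800),    # held for 30 minutes
    rule.evaluate(False, now=1860),   # condition cleared
    rule.evaluate(True, now=1920),    # timer starts again
]
```

A FOR duration trades notification latency for fewer alerts on short-lived spikes.
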
  32. 32. Alert Manager ● Deduplication ● Grouping ● Routing ● Suppression
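
The four responsibilities listed here can be sketched in one pass over incoming alerts. This is a loose illustration, not Alertmanager's implementation: `process` and `is_silenced` are hypothetical helpers, routing is reduced to a lookup on the `severity` label, and the `DiskFull` alerts and receiver names are made-up inputs.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Alert = Dict[str, str]
GroupKey = Tuple[str, Tuple[str, ...]]  # (receiver, values of the group_by labels)


def is_silenced(alert: Alert, silences: List[Alert]) -> bool:
    """Suppression: an alert is muted if it matches every label of any silence."""
    return any(all(alert.get(k) == v for k, v in s.items()) for s in silences)


def process(alerts: List[Alert], group_by: List[str], silences: List[Alert],
            routes: Dict[str, str], default_receiver: str) -> Dict[GroupKey, List[Alert]]:
    seen = set()
    grouped: Dict[GroupKey, List[Alert]] = defaultdict(list)
    for alert in alerts:
        fingerprint = frozenset(alert.items())
        if fingerprint in seen:
            continue  # deduplication: same label set sent more than once
        seen.add(fingerprint)
        if is_silenced(alert, silences):
            continue
        receiver = routes.get(alert.get("severity", ""), default_receiver)  # routing
        key = tuple(alert.get(label, "") for label in group_by)             # grouping
        grouped[(receiver, key)].append(alert)
    return dict(grouped)


incoming = [
    {"alertname": "IngesterCrowding", "node": "n1", "severity": "critical"},
    {"alertname": "IngesterCrowding", "node": "n1", "severity": "critical"},  # duplicate
    {"alertname": "IngesterCrowding", "node": "n2", "severity": "critical"},
    {"alertname": "DiskFull", "node": "n3", "severity": "warning"},
    {"alertname": "DiskFull", "node": "n4", "severity": "warning"},
]
notifications = process(
    incoming, group_by=["alertname"], silences=[{"node": "n4"}],
    routes={"critical": "pagerduty"}, default_receiver="slack",
)
```

Five incoming alerts collapse into two notifications: one page covering both crowded nodes, and one chat message for the remaining disk alert.
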
  33. 33. Alert Manager (diagram: two Prometheus instances, two Alert Manager instances; receivers: PagerDuty, VictorOps, Slack)
  34. 34. FreshTracks.io Simplifying Kubernetes Visibility
  35. 35. Filling the Gaps ● A small Kubernetes cluster generates > 500K unique samples ○ Which metrics are important? ● Performance of any one container is easy ○ How is the whole microservice behaving? Node? Cluster? ● Prometheus has no anomaly detection ● Dashboard creation is tedious, even if you know what to watch ● How is my service behaving in the context of the cluster? ○ How do node/container/application metrics correlate to each other?
  36. 36. Kubernetes Hierarchy Visibility Namespace Workload Pod Container (Workload can be a deployment, replicaSet, statefulSet, daemonSet or similar)
  37. 37. Demo
  38. 38. Thanks! We’re Hiring!
