Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

of

How to monitor your micro-service with Prometheus? Slide 1 How to monitor your micro-service with Prometheus? Slide 2 How to monitor your micro-service with Prometheus? Slide 3 How to monitor your micro-service with Prometheus? Slide 4 How to monitor your micro-service with Prometheus? Slide 5 How to monitor your micro-service with Prometheus? Slide 6 How to monitor your micro-service with Prometheus? Slide 7 How to monitor your micro-service with Prometheus? Slide 8 How to monitor your micro-service with Prometheus? Slide 9 How to monitor your micro-service with Prometheus? Slide 10 How to monitor your micro-service with Prometheus? Slide 11 How to monitor your micro-service with Prometheus? Slide 12 How to monitor your micro-service with Prometheus? Slide 13 How to monitor your micro-service with Prometheus? Slide 14 How to monitor your micro-service with Prometheus? Slide 15 How to monitor your micro-service with Prometheus? Slide 16 How to monitor your micro-service with Prometheus? Slide 17 How to monitor your micro-service with Prometheus? Slide 18 How to monitor your micro-service with Prometheus? Slide 19 How to monitor your micro-service with Prometheus? Slide 20 How to monitor your micro-service with Prometheus? Slide 21 How to monitor your micro-service with Prometheus? Slide 22 How to monitor your micro-service with Prometheus? Slide 23 How to monitor your micro-service with Prometheus? Slide 24 How to monitor your micro-service with Prometheus? Slide 25 How to monitor your micro-service with Prometheus? Slide 26 How to monitor your micro-service with Prometheus? Slide 27 How to monitor your micro-service with Prometheus? Slide 28 How to monitor your micro-service with Prometheus? Slide 29 How to monitor your micro-service with Prometheus? Slide 30 How to monitor your micro-service with Prometheus? Slide 31 How to monitor your micro-service with Prometheus? Slide 32 How to monitor your micro-service with Prometheus? Slide 33 How to monitor your micro-service with Prometheus? Slide 34 How to monitor your micro-service with Prometheus? Slide 35 How to monitor your micro-service with Prometheus? Slide 36 How to monitor your micro-service with Prometheus? Slide 37 How to monitor your micro-service with Prometheus? Slide 38 How to monitor your micro-service with Prometheus? Slide 39 How to monitor your micro-service with Prometheus? Slide 40 How to monitor your micro-service with Prometheus? Slide 41 How to monitor your micro-service with Prometheus? Slide 42 How to monitor your micro-service with Prometheus? Slide 43 How to monitor your micro-service with Prometheus? Slide 44 How to monitor your micro-service with Prometheus? Slide 45 How to monitor your micro-service with Prometheus? Slide 46 How to monitor your micro-service with Prometheus? Slide 47 How to monitor your micro-service with Prometheus? Slide 48 How to monitor your micro-service with Prometheus? Slide 49 How to monitor your micro-service with Prometheus? Slide 50 How to monitor your micro-service with Prometheus? Slide 51 How to monitor your micro-service with Prometheus? Slide 52 How to monitor your micro-service with Prometheus? Slide 53 How to monitor your micro-service with Prometheus? Slide 54 How to monitor your micro-service with Prometheus? Slide 55 How to monitor your micro-service with Prometheus? Slide 56 How to monitor your micro-service with Prometheus? Slide 57 How to monitor your micro-service with Prometheus? Slide 58 How to monitor your micro-service with Prometheus? Slide 59 How to monitor your micro-service with Prometheus? Slide 60 How to monitor your micro-service with Prometheus? Slide 61 How to monitor your micro-service with Prometheus? Slide 62 How to monitor your micro-service with Prometheus? Slide 63 How to monitor your micro-service with Prometheus? Slide 64 How to monitor your micro-service with Prometheus? Slide 65 How to monitor your micro-service with Prometheus? Slide 66 How to monitor your micro-service with Prometheus? Slide 67 How to monitor your micro-service with Prometheus? Slide 68 How to monitor your micro-service with Prometheus? Slide 69 How to monitor your micro-service with Prometheus? Slide 70 How to monitor your micro-service with Prometheus? Slide 71 How to monitor your micro-service with Prometheus? Slide 72 How to monitor your micro-service with Prometheus? Slide 73 How to monitor your micro-service with Prometheus? Slide 74 How to monitor your micro-service with Prometheus? Slide 75 How to monitor your micro-service with Prometheus? Slide 76 How to monitor your micro-service with Prometheus? Slide 77 How to monitor your micro-service with Prometheus? Slide 78 How to monitor your micro-service with Prometheus? Slide 79 How to monitor your micro-service with Prometheus? Slide 80
Upcoming SlideShare
What to Upload to SlideShare
Next
Download to read offline and view in fullscreen.

0 Likes

Share

Download to read offline

How to monitor your micro-service with Prometheus?

Download to read offline


How to monitor your micro-service with Prometheus? How to design metrics, what is USE and RED? Metrics for a REST service with Prometheus, AlertManager, and Grafana.

Related Books

Free with a 30 day trial from Scribd

See all

Related Audiobooks

Free with a 30 day trial from Scribd

See all
  • Be the first to like this

How to monitor your micro-service with Prometheus?

  1. 1. How to monitor your micro-service with Prometheus? How to design the metrics? WOJCIECH BARCZYŃSKI - SMACC.IO | 2 OCTOBER 2018
  2. 2. ABOUT ME Lead So ware Developer - SMACC (FinTech/AI) Before: System Engineer i Developer Lyke Before: 1000+ nodes, 20 data centers with Openstack Point of view: Startups, fast-moving environment
  3. 3. WHY? MONOLIT ;)
  4. 4. WHY? MICROSERVICES ;)
  5. 5. OBSERVABILITY Monitoring Logging Tracing
  6. 6. OBSERVABILITY Go for Industrial Programming by Peter Bourgon
  7. 7. NOT A SILVER-BULLET but: Easy to setup Immediately value Suprisengly: the last one implemented
  8. 8. CENTRALIZED LOGGING Usually much too late Post-mortem Hard to find the needle Like a debugging vs testing
  9. 9. MONITORING Numbers Trends Dependencies + Actions
  10. 10. METRIC Name Label Value traefik_requests_total code="200", method="GET" 3001
  11. 11. MONITORING Demo app
  12. 12. MONITORING Example from couchbase blog
  13. 13. HOW TO FIND THE RIGHT METRIC?
  14. 14. HOW TO FIND THE RIGHT METRIC? USE RED
  15. 15. USE Utilization the average time that the resource was busy servicing work Saturation extra work which it can't service, o en queued Errors the count of error events Documented and Promoted by Berdan Gregg
  16. 16. USE Utilization: as a percent over a time interval: "one disk is running at 90% utilization". Saturation: Errors:
  17. 17. USE Utilization: Saturation: as a queue length. eg, "the CPUs have an average run queue length of four". Errors:
  18. 18. USE utilization: saturation: errors: scalar counts. eg, "this network interface drops packages".
  19. 19. USE traditionaly more instance oriented still useful in the microservices world
  20. 20. RED Rate How busy is your service? Error Errors Duration What is the latency of my service? .Tom Wilkie's guideline for instrumenting applications
  21. 21. RED Rate - how many request per seconds handled Error Duration (distribution)
  22. 22. RED Rate Error - how many request per seconds handled we failed Duration
  23. 23. RED Rate Error Duration - how long the requests took
  24. 24. RED Follow Four Golden Signals by Google SREs [1] Focus on what matters for end-users [1] Latency, Traffic, Errors, Saturation ( )src
  25. 25. RED Not recommended for: batch-oriented streaming services
  26. 26. PROMETHEUS
  27. 27. WHAT PROMETHEUS IS? Aggregation of time-series data Not an event-based system
  28. 28. PROMETHEUS STACK Prometheus - collect Alertmanager - alerts Grafana - visualize
  29. 29. PROMETHEUS Wide support for languages Metrics collected over HTTP Pull model (see scrape time), push-mode possible integration with k8s PromQL metrics/
  30. 30. METRICS IN PLAIN TEXT # HELP order_mgmt_audit_duration_seconds Multiprocess metric # TYPE order_mgmt_audit_duration_seconds summary order_mgmt_audit_duration_seconds_count{status_code="200"} 41. order_mgmt_audit_duration_seconds_sum{status_code="200"} 27.44 order_mgmt_audit_duration_seconds_count{status_code="500"} 1.0 order_mgmt_audit_duration_seconds_sum{status_code="500"} 0.716 # HELP order_mgmt_duration_seconds Multiprocess metric # TYPE order_mgmt_duration_seconds summary order_mgmt_duration_seconds_count{method="GET",path="/complex" order_mgmt_duration_seconds_sum{method="GET",path="/complex",s order_mgmt_duration_seconds_count{method="GET",path="/",status order_mgmt_duration_seconds_sum{method="GET",path="/",status_c order_mgmt_duration_seconds_count{method="GET",path="/complex" order_mgmt_duration_seconds_sum{method="GET",path="/complex",s
  31. 31. METRICS IN PLAIN TEXT # HELP go_gc_duration_seconds A summary of the GC invocation d # TYPE go_gc_duration_seconds summary go_gc_duration_seconds{quantile="0"} 9.01e-05 go_gc_duration_seconds{quantile="0.25"} 0.000141101 go_gc_duration_seconds{quantile="0.5"} 0.000178902 go_gc_duration_seconds{quantile="0.75"} 0.000226903 go_gc_duration_seconds{quantile="1"} 0.006099658 go_gc_duration_seconds_sum 18.749046756 go_gc_duration_seconds_count 89273
  32. 32. EXPORTERS Mongodb Mysql Postgresql rabbitmq ... also Blackbox exporter examples: ,memcached psql
  33. 33. CLOUD-NATIVE PROJECTS INTEGRATION API BACKOFFICE 1 DATA WEB ADMIN BACKOFFICE 2 BACKOFFICE 3 API.DOMAIN.COM DOMAIN.COM/WEB BACKOFFICE.DOMAIN.COM ORCHESTRATOR PRIVATE NETWORKINTERNET API LISTEN (DOCKER, SWARM, MESOS...) - --web.metrics.prometheus
  34. 34. PROMETHEUS PromQL working with historams: rates: more complex: histogram_quantile(0.9, rate(http_req_duration_seconds_bucket[10m] rate(http_requests_total{job="api-server"}[5 irate(http_requests_total{job="api-server"} redict_linear() holt_winters()
  35. 35. PROMETHEUS PromQL Alarming: ALERT ProductionAppServiceInstanceDown IF up { environment = "production", app =~ ".+"} == 0 FOR 4m ANNOTATIONS { summary = "Instance of {{$labels.app}} is down", description = " Instance {{$labels.instance}} of app }
  36. 36. METRICS Counter - just up Gauge - up/down Histogram Summary
  37. 37. HISTOGRAM traefik_duration_seconds_bucket {method="GET,code="200"} {le="0.1"} 2229 {le="0.3"} 107 {le="1.2"} 100 {le="5"} 4 {le="+Inf"} 2 _sum _count 2342
  38. 38. SUMMARY http_request_duration_seconds {quantile="0.5"} 4 {quantile="0.9"} 5 http_request_duration_seconds_sum 9 http_request_duration_seconds_count 3
  39. 39. HISTOGRAM / SUMMARY: Latency of services Request or Request size Histograms recommended
  40. 40. RED Metric + PromQL: sum(irate(order_mgmt_duration_seconds_count {job=~".*"}[1m])) by (status_code)
  41. 41. METRIC AND LABEL NAMING Best practises on : service name is your prefix user_ state the bae unit _seconds and _bytes metric names
  42. 42. PROMETHEUS + PYTHON
  43. 43. PYTHON CLIENT client_python Counter Gauge Summary Histogram
  44. 44. DEMO: SIMPLE REST SERVICE ----------- --------------- | App | ----->| Audit Service | | OrderMgmt | | | ----------- --------------- | | --------------- -------->| Database | ---------------
  45. 45. DEMO: - service - prometheus - grafana - alertmanager http://127.0.0.1:8080 http://127.0.0.1:8080/metrics/ http://127.0.0.1:9090 http://127.0.0.1:3000 http://127.0.0.1:9093
  46. 46. DEMO ☁ src ⚡ make docker_run ☁ src ⚡ docker ps CONTAINER ID IMAGE PORTS 5f824d1bc789 grafana/grafana:5.2.2 0.0.0.0:3000->3 d681a414a8b6 prom/prometheus:v2.1.0 0.0.0.0:9090->9 ea0d9233e159 prom/alertmanager:v0.15.1 0.0.0.0:9093->9
  47. 47. DEMO: GENERATE CALLS With error injection ☁ src ⚡ make srv_wrk_random
  48. 48. GRAFANA
  49. 49. PROMETHEUS
  50. 50. PROMETHEUS
  51. 51. PROMETHEUS
  52. 52. KILL THE SERVICE ☁ src ⚡ docker stop pycode-prom-flask_order-manager_1
  53. 53. PROMETHEUS
  54. 54. PROMETHEUS
  55. 55. ALERTMANAGER
  56. 56. GRAFANA
  57. 57. GRAFANA
  58. 58. GITHUB
  59. 59. DEMO: PYTHON CODE Metric Definition Metric Collection
  60. 60. DEMO: SIMULATING CALLS make docker_build make docker_run
  61. 61. DEMO: SIMULATING CALLS curl 127.0.0.1:8080/hello curl 127.0.0.1:8080/world curl 127.0.0.1:8080/complex
  62. 62. DEMO: SIMULATING CALLS curl 127.0.0.1:8080/complex?is_srv_error=True curl 127.0.0.1:8080/complex?is_db_error=True curl 127.0.0.1:8080/complex?db_sleep=3&srv_sleep=2 # load generator make srv_wrk_random
  63. 63. DEMO: PROM STACK Prometheus dashboard and config AlertManager dashboard and config Simulate the successful and failed calls Simple Queries for rate
  64. 64. PromQL sum(irate(order_mgmt_duration_seconds_count{job=~".*"}[1m])) by (status_code)
  65. 65. PromQL order_mgmt_duration_seconds_sum{job=~".*"} or order_mgmt_database_duration_seconds_sum{job=~".*"} or order_mgmt_audit_duration_seconds_sum{job=~".*"}
  66. 66. BEST PRACTISES Py: higher load requires muliprocessing Start simple (up/down), later add more complex rules Sum over Summaries with Q leads to incorrect results, see prom docs
  67. 67. SUMMARY Monitoring saves your time Checking logs Kibana vs Grafana is like debuging vs having tests Logging -> high TCO
  68. 68. SUMMARY Testing Testing in Production Smoke tests / Acceptance Tests Monitoring Simple (up/down + KPI) Monitoring Explorations / Logs
  69. 69. THANK YOU
  70. 70. QUESTIONS? ps. We are hiring.
  71. 71. BACKUP SLIDES
  72. 72. PROMETHUS - LABELS IN ALERT RULES The labels are propageted to alert rules: see ../src/prometheus/etc/alert.rules ALERT ProductionAppServiceInstanceDown IF up { environment = "production", app =~ ".+"} == 0 FOR 4m ANNOTATIONS { summary = "Instance of {{$labels.app}} is down", description = " Instance {{$labels.instance}} of app }
  73. 73. ALERTMANGER - LABELS IN ALERTMANGER Call somebody if the label is severity=page: see ../src/alertmanager/*.conf --- group_by: [cluster] # If an alert isn't caught by a route, send it to the pager. receiver: team-pager routes: - match: severity: page receiver: team-pager receivers: - name: team-pager opsgenie_configs: - api_key: $API_KEY teams: example_team
  74. 74. PROMETHEUS - PUSH MODEL See: Good for short living jobs in your cluster. https://prometheus.io/docs/instrumenting/pushing/
  75. 75. DESIGNING METRIC NAMES Which one is better? request_duration{app=my_app} my_app_request_duration see documentation on best practises for andmetric naming instrumentation
  76. 76. DESIGNING METRIC NAMES Which one is better? order_mgmt_db_duration_seconds_sum order_mgmt_duration_seconds_sum{dep_name='db
  77. 77. PROMETHEUS + K8S = <3 LABELS ARE PROPAGATED FROM K8S TO PROMETHEUS
  78. 78. INTEGRATION WITH PROMETHEUS cat memcached-0-service.yaml https://github.com/skarab7/kubernetes-memcached --- apiVersion: v1 kind: Service metadata: name: memcached-0 labels: app: memcached kubernetes.io/name: "memcached" role: shard-0 annotations: prometheus.io/scrape: "true" prometheus.io/scheme: "http" prometheus.io/path: "metrics" prometheus.io/port: "9150" spec:

How to monitor your micro-service with Prometheus? How to design metrics, what is USE and RED? Metrics for a REST service with Prometheus, AlertManager, and Grafana.

Views

Total views

568

On Slideshare

0

From embeds

0

Number of embeds

0

Actions

Downloads

29

Shares

0

Comments

0

Likes

0

×