Code instrumentation in Py with Prometheus and Grafana

Published in: Engineering

  1. 1. Code instrumentation in Py with Prometheus & Grafana. Francois SCHMIDTS, Vlad ZLOTEANU, DOLEAD
  2. 2. Contents - Prometheus & Grafana - Code instrumentation example - 3 use cases - (Dolead’s) push client
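  3. 3. Prometheus + Grafana = ❤ [Architecture diagram. Metrics retrieval: Prometheus pulls from Target 1, Target 2 ... Target N, each exposing metrics through an exporter or through code instrumentation, and stores them in its time series DB (multidimensional data model). Querying: Grafana queries Prometheus with PromQL to build dashboards. Alert Manager sits alongside Prometheus.]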
  4. 4. Prometheus - Time series database (TSDB) - Open source - Incubated by CNCF (after Kubernetes) - Suited to VM/container monitoring - Autodiscovery - Pull model - Multidimensional data - Includes alerting
  5. 5. Grafana - Open source metric analytics / visualisation - Multiple data providers: CloudWatch, Prometheus, InfluxDB, Elasticsearch, ... - Many ready-made dashboards already available - Works hand in hand with Prometheus exporters
  6. 6. Node exporter + Grafana dashboard
  7. 7. MongoDB exporter + Grafana dashboard
  8. 8. Case study: RR Stats import ● Metric: duration of execution ● Labels: Result (success/failure); Source (Google Ads, Fb Ads, Bing Ads, Taboola, etc.); Category (Account, Campaign, Keyword, ...; Today vs Past); Node
  9. 9. Instrumentation - Code example
  10. 10. 1. Debugging / Gain insight: "Where does the problem come from? What is going on?" ● Segment by source (Google Ads, Fb Ads, Bing Ads, Taboola, etc.): did they slow down? Has the error rate gone up? Are they unavailable? ● Segment by category: did we introduce a bug in that code? ● Segment by node: do I have a problem on that node?
  11. 11. All successful stats downloads
  12. 12. All successful stats downloads - vs Bing
  13. 13. 1. Debugging / Gain insight: combination with external data / corroboration - Deployments - CPU/RAM/load on the node - "Can we corroborate with an increase in slow queries in MongoDB?"
  14. 14. Example: Sync activity vs machine load
  15. 15. 2. Alerting - Grafana alerts: alerts based on the configured data sources - Prometheus AlertManager: can alert based on a PromQL query; Infrastructure as Code - Instrument now, decide later
  16. 16. 2. Alerting - Example
  17. 17. 2. Alerting - Graph
  18. 18. 3. Trends / Scale ● Trends over time, drive scale (technical) / business decisions ○ Capacity planning ○ "Will I (when will I) have a problem in the future?" ● SLA / QoS
  19. 19. And all this is available thanks to this code:
  20. 20. Push (vs pull) - For async, short-lived processes - The Prometheus way => send metrics to a push gateway: one push gateway per process! More infrastructure to set up - Our way, the prometheus-distributed-client => send metrics to a database: available from everywhere; consistent in case of concurrent calls - Use either
  21. 21. Conclusion - Try to always instrument your code - Limit the cardinality of the metrics you use - Make nice graphs! - Use our lib: https://github.com/dolead/prometheus-distributed-client
  22. 22. Thank you! Questions?
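
The code shown on slides 9 and 19 is not part of this transcript. As a rough idea of what such instrumentation can look like, here is a minimal sketch using the official `prometheus_client` library. The metric name `stats_import_duration_seconds`, the helper `instrumented_import` and the use of the hostname as the node label are assumptions made for illustration; only the label names (result, source, category, node) come from the case study on slide 8.

```python
import socket
import time

from prometheus_client import CollectorRegistry, Histogram, generate_latest

# Dedicated registry so the sketch does not touch the global default one.
registry = CollectorRegistry()

# Hypothetical metric name; the label names follow the case study (slide 8).
IMPORT_DURATION = Histogram(
    "stats_import_duration_seconds",
    "Duration of a stats import run",
    ["result", "source", "category", "node"],
    registry=registry,
)

# Assumption: the node label is simply the local hostname.
NODE = socket.gethostname()


def instrumented_import(source, category, do_import):
    """Run do_import() and record its duration, labelled by outcome."""
    start = time.perf_counter()
    result = "failure"
    try:
        value = do_import()
        result = "success"
        return value
    finally:
        # Runs on both success and exception, so failures are timed too.
        IMPORT_DURATION.labels(
            result=result, source=source, category=category, node=NODE
        ).observe(time.perf_counter() - start)


if __name__ == "__main__":
    instrumented_import("bing_ads", "campaign", lambda: time.sleep(0.01))
    # Text exposition format, i.e. what Prometheus would scrape.
    print(generate_latest(registry).decode())
```

In a long-running service the same metric would typically be exposed over HTTP (for example with `prometheus_client.start_http_server`) so that Prometheus can pull it.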

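The segmentation described on slides 10 to 12 and the alert conditions of slide 15 both come down to PromQL queries over the labelled metric. The sketch below builds such query strings in plain Python; the helper names and the metric name `stats_import_duration_seconds_count` are hypothetical, and label values are assumed not to contain quotes (no escaping is done).

```python
def selector(metric, **labels):
    """Build a PromQL selector such as name{a="x",b="y"}."""
    if not labels:
        return metric
    matchers = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{metric}{{{matchers}}}"


def rate_by(metric, by, window="5m", **labels):
    """Per-second rate over `window`, summed per value of the `by` label."""
    return f"sum by ({by}) (rate({selector(metric, **labels)}[{window}]))"


def failure_ratio(metric, by, window="5m"):
    """Share of failed runs per value of `by`, usable as an alert expression."""
    failures = rate_by(metric, by, window, result="failure")
    total = rate_by(metric, by, window)
    return f"{failures} / {total}"


if __name__ == "__main__":
    metric = "stats_import_duration_seconds_count"  # hypothetical name
    print(rate_by(metric, "source", result="success"))
    print(failure_ratio(metric, "node"))
```

Swapping the `by` argument between `source`, `category` and `node` gives the three segmentations listed on slide 10 without touching the instrumentation itself.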
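
For the "Prometheus way" of slide 20, a short-lived process pushes its metrics to a push gateway before exiting. A minimal sketch with `prometheus_client` follows; the gateway address `localhost:9091`, the job name and the metric names are assumptions, and the Dolead prometheus-distributed-client alternative is not shown here because its API is not described in the slides.

```python
from prometheus_client import CollectorRegistry, Gauge, push_to_gateway


def build_job_registry(duration_seconds):
    """Collect the metrics of one batch run in a fresh registry."""
    registry = CollectorRegistry()
    # Hypothetical metric names, chosen for illustration only.
    Gauge(
        "job_duration_seconds",
        "Duration of the last batch run",
        registry=registry,
    ).set(duration_seconds)
    Gauge(
        "job_last_success_unixtime",
        "Unix time of the last successful run",
        registry=registry,
    ).set_to_current_time()
    return registry


def push(registry, gateway="localhost:9091", job="stats_import"):
    """Send the registry to a push gateway (address and job are assumptions)."""
    push_to_gateway(gateway, job=job, registry=registry)


if __name__ == "__main__":
    # Requires a reachable push gateway at the address above.
    push(build_job_registry(12.5))
```

Prometheus then scrapes the gateway like any other target, which is the extra piece of infrastructure the slide refers to.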