Prometheus for the traditional datacenter


Prometheus is best known in cloud environments, but did you know it is also a good fit for traditional datacenters?

Published in: Technology

  1. 1. Prometheus for the traditional datacenter Julien Pivotto (@roidelapluie) Loadays April 22nd, 2018
  2. 2. user{name="roidelapluie"} 1 I like Open Source I like monitoring I like automation ... and all of that is my job
  3. 3. The DevOps principles: CAMS (a definition of DevOps) Culture Automation Measurement Sharing (Damon Edwards and John Willis, 2010)
  4. 4. Culture Break down the silos - make 1 team Enable direct communication Work towards a single goal Share responsibilities
  5. 5. Automation Automate everything Infra as Code Release and deploy consistently More about that later...
  6. 6. Measurement Measure all the things (even DEV/ACC/PIL/...) Business metrics & technical metrics Take metrics into account when making decisions Do alerting right; avoid alert fatigue
  7. 7. Sharing Share Metrics, lessons learned Share success, celebrate failure Share with outside world as well, learn from the industry
  8. 8. We are in the cloud era. Here are some buzzwords for you: cloud, API, openstack, devops, docker, bimodal, stateless, kubernetes, orchestration, automation, serverless, docker, humanops, ansible, continuous deployment, cri-o, jenkins, agile, docker, red hat, containers, virtualization, provisioning, monitoring, observability...
  9. 9. What is the cloud Scale Velocity Change
  10. 10. On Premise looks like the cloud Nowadays you have no choice. Scale Velocity Change
  11. 11. What are the needs Automation Scalability
  12. 12. TIME TO GO Bye bye all-in-one tools Bye bye tools that don't scale Bye bye tools you cannot automate
  13. 13. We deserve better tools Our customers ask us to respond fast, in seconds We make hundreds of operations per second What is your frequency... 5 minutes?
  14. 14. Time to get better tools / protocols
  15. 15. Prometheus
  16. 16. Cloud Native Easy to configure, deploy, maintain Designed as multiple services Container ready Orchestration ready (dynamic config) Fuzziness
  17. 17. Data Centric A metric in Prometheus has metadata: mysql_global_status_handlers_total{handler="tmp_write"} 1122 And lots of functions to filter, change, remove... that metadata while fetching it.
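
To make the "metric plus metadata" idea concrete, here is a minimal sketch that splits one sample line of the shape shown on the slide above into its name, labels, and value. It is an illustration only, not Prometheus' own parser: the function name `parse_sample` is hypothetical, and the sketch assumes a single sample with no timestamp and no escaped characters to decode.

```python
import re

# One sample in the text format: metric_name{label="value",...} value
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(?P<key>[a-zA-Z_][a-zA-Z0-9_]*)="(?P<val>(?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Split one sample line into (name, labels, value).

    Simplified on purpose: timestamps and escape decoding are not handled.
    """
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = {
        m.group("key"): m.group("val")
        for m in LABEL_RE.finditer(match.group("labels") or "")
    }
    return match.group("name"), labels, float(match.group("value"))


name, labels, value = parse_sample(
    'mysql_global_status_handlers_total{handler="tmp_write"} 1122'
)
print(name, labels, value)
```

The labels come back as a plain dictionary, which is the part of the sample that PromQL filters and aggregates on.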
  18. 18. Open Source Apache 2.0 Go Support for multiple OS Many "exporters": ult-port-allocations
  19. 19. Performance Prometheus is designed to fetch data at intervals measured in SECONDS You can fine-tune its memory usage and when it flushes to disk It can also adapt its scraping frequency dynamically
  20. 20. How does it work?
  21. 21. How does it work?
  22. 22. How does it work?
  23. 23. How does it work?
  24. 24. How does it work?
  25. 25. Exporters Exporters expose metrics over an HTTP API Bindings available for many languages Exporters do not save data; they are not "proxies" and don't "cache" anything Which exporters for your datacenter?
  26. 26. OS node_exporter wmi_exporter
  27. 27. Apache apache_exporter grok_exporter
  28. 28. DNS bind_exporter blackbox_exporter
  29. 29. Network snmp_exporter netscaler_exporter
  30. 30. Databases sql_exporter sqlagent+prometheus-sql
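
As a rough illustration of what an exporter does, the sketch below serves one metric over HTTP in the Prometheus text format, using only the Python standard library. The value is computed when the scrape arrives, so nothing is stored or cached. The metric name `demo_uptime_seconds`, the port, and the function names are hypothetical, and a real exporter would normally be built on an official client library instead.

```python
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

START_TIME = time.time()


def render_metrics():
    """Compute current values at scrape time and render them as text."""
    uptime = time.time() - START_TIME
    lines = [
        "# HELP demo_uptime_seconds Seconds since this exporter started.",
        "# TYPE demo_uptime_seconds gauge",
        f"demo_uptime_seconds {uptime:.3f}",
    ]
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        # Content type of the Prometheus text exposition format.
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, answering scrapes on http://localhost:<port>/metrics."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the endpoint; Prometheus would then be pointed at it as a scrape target.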
  31. 31. Security Prometheus supports TLS as a client (also with client authentication) We use it with Traefik (a reverse proxy written in Go with native metrics) We manage certs with Ansible
  32. 32. Exploring Metrics
  33. 33. Exploring Metrics
  34. 34. Exploring Metrics
  35. 35. Exploring Metrics
  36. 36. PromQL mysql_global_status_commands_total
  37. 37. PromQL mysql_global_status_commands_total{command="select"}
  38. 38. PromQL mysql_global_status_commands_total{command=~"select|set_options"}
  39. 39. PromQL mysql_global_status_commands_total{command=~"select|set_options"}
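
The `=~` matcher in the selectors above compares a label value against a regular expression that must match the whole value, not just part of it. The sketch below mimics that with `re.fullmatch`. It is an approximation: Prometheus uses Go's RE2 syntax, not Python's `re`, and the `select_for_update` value is a made-up label value added only to show the anchoring.

```python
import re


def matches(labels, name, pattern):
    """True if the label value fully matches the pattern, like a =~ matcher.

    A missing label is treated as an empty value, as Prometheus does.
    """
    return re.fullmatch(pattern, labels.get(name, "")) is not None


series = [
    {"command": "select"},
    {"command": "set_options"},
    {"command": "select_for_update"},  # hypothetical value, not matched
    {"command": "insert"},
]

selected = [s["command"] for s in series if matches(s, "command", "select|set_options")]
print(selected)
```

Only the first two series are selected, because the pattern has to cover the entire label value.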
  40. 40. PromQL deriv(mysql_global_status_connections[5m])
  41. 41. PromQL {__name__=~".+innodb.+cache.*"} predict_linear(mysql_heartbeat_lag_seconds[5m], 60*2) sum(rate(mysql_global_status_commands_total{command=~"(commit|rollback)"}[5m])) without (command)
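
The last expression above combines `rate()` with `sum ... without (command)`. The sketch below shows the idea behind both steps on made-up data. It is deliberately simplified: the real `rate()` also extrapolates to the edges of the range window, which is left out here, and the function names and sample values are hypothetical.

```python
from collections import defaultdict


def simple_rate(samples):
    """Per-second increase of a counter over (timestamp, value) samples.

    Simplification: no extrapolation to the window boundaries.
    """
    if len(samples) < 2:
        return None
    span = samples[-1][0] - samples[0][0]
    if span <= 0:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter was reset and restarted from zero.
        increase += cur if cur < prev else cur - prev
    return increase / span


def sum_without(values, dropped):
    """Sum series that become identical once the `dropped` label is removed."""
    totals = defaultdict(float)
    for labels, value in values:
        key = tuple(sorted((k, v) for k, v in labels.items() if k != dropped))
        totals[key] += value
    return dict(totals)


# Made-up 5-minute windows for two commands on one instance.
window = {
    ("db1", "commit"): [(0, 0), (100, 50), (200, 10), (300, 70)],
    ("db1", "rollback"): [(0, 5), (300, 35)],
}
rates = [
    ({"instance": i, "command": c}, simple_rate(s)) for (i, c), s in window.items()
]
result = sum_without(rates, "command")
print(result)
```

The `commit` series contains a counter reset (50 down to 10), which the rate calculation counts as a restart from zero instead of a negative increase.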
  42. 42. Storage Prometheus 2.x uses prometheus/tsdb 2-hour blocks, later compacted into blocks of up to 10 days
  43. 43. Alerting
  44. 44. A word about Prometheus vs Graphite Prometheus does not see a metric as an "event". A metric keeps its current value until it is replaced. You cannot see when a metric was added to Prometheus. For events, Prometheus refers you to Elasticsearch.
  45. 45. One tool does one job... Prometheus collects data Exporters expose data Grafana graphs data Alertmanager sends alerts
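
One way to picture "current value until replaced" is a lookup that returns the most recent sample at or before the query time. The sketch below does that with a 5-minute lookback, which is Prometheus' default. It is a simplification under stated assumptions: staleness markers are ignored, boundary handling is approximate, and the function name `value_at` is hypothetical.

```python
from bisect import bisect_right

LOOKBACK_SECONDS = 300  # Prometheus' default lookback is 5 minutes


def value_at(samples, t, lookback=LOOKBACK_SECONDS):
    """Return the latest sample value at or before t, or None.

    samples: list of (timestamp, value) pairs sorted by timestamp.
    """
    timestamps = [ts for ts, _ in samples]
    i = bisect_right(timestamps, t)
    if i == 0:
        return None  # nothing scraped yet at time t
    ts, value = samples[i - 1]
    # Too old: the series is considered gone rather than frozen forever.
    return value if t - ts <= lookback else None


samples = [(0, 1.0), (15, 2.0), (30, 3.0)]
print(value_at(samples, 20), value_at(samples, 400))
```

A query at t=20 sees the value scraped at t=15, with no record of an "event" at t=20 itself.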
  46. 46. Alerting and recording rules Prometheus (p): YAML files; queries run at specific intervals; alerts sent to Alertmanager. Alertmanager (a): receives, groups, inhibits, dispatches
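
The Alertmanager side of the slide above (group, inhibit) can be sketched in a few lines. The parameter names `source_match`, `target_match`, `equal`, and `group_by` mirror keys from Alertmanager's configuration, but the logic here is only an approximation of the idea, and the alert names and labels are invented for the example.

```python
from collections import defaultdict


def _has(alert, wanted):
    """True if the alert carries every label/value pair in `wanted`."""
    return all(alert.get(k) == v for k, v in wanted.items())


def inhibit(alerts, source_match, target_match, equal):
    """Drop target alerts while a source alert with equal labels is firing."""
    sources = [a for a in alerts if _has(a, source_match)]
    kept = []
    for alert in alerts:
        muted = _has(alert, target_match) and any(
            s is not alert and all(alert.get(l) == s.get(l) for l in equal)
            for s in sources
        )
        if not muted:
            kept.append(alert)
    return kept


def group(alerts, group_by):
    """Bundle alerts that share the same values for the group_by labels."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[tuple(alert.get(l, "") for l in group_by)].append(alert)
    return dict(groups)


alerts = [
    {"alertname": "HostDown", "severity": "critical", "instance": "db1"},
    {"alertname": "MySQLDown", "severity": "warning", "instance": "db1"},
    {"alertname": "MySQLDown", "severity": "warning", "instance": "db2"},
    {"alertname": "DiskFull", "severity": "warning", "instance": "db2"},
]
remaining = inhibit(
    alerts,
    source_match={"severity": "critical"},
    target_match={"severity": "warning"},
    equal=["instance"],
)
notifications = group(remaining, group_by=["instance"])
for key, members in notifications.items():
    print(key, [a["alertname"] for a in members])
```

Here the warning on `db1` is suppressed because the host itself is already reported down, and the two `db2` warnings go out as one notification. This is one way to limit alert fatigue.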
  47. 47. Grafana Open Source (Apache 2.0) Web app Specialized in visualization Pluggable Multiple datasources: prometheus, graphite, influxdb... Has an API!
  48. 48. History of Grafana Grafana is a fork of Kibana 3; used to be JS-driven. Now fully featured, requires a database, multi-project/user support, etc.
  49. 49. Grafana and Prometheus Prometheus used to ship its own consoles Now it recommends Grafana and has deprecated its own consoles
  50. 50. Grafana Dashboards
  51. 51. Grafana Dashboards
  52. 52. Time Picker
  53. 53. Configure Prometheus in Grafana
  54. 54. Configure Prometheus in Grafana
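
Because Grafana has an API, the data source configuration shown on the slides above can also be automated. The sketch below builds, without sending, a request to Grafana's data source endpoint. The host names, the API key, and the function name are placeholders, and the exact fields your Grafana version expects should be checked against its documentation.

```python
import json
from urllib.request import Request


def datasource_request(grafana_url, api_key, prometheus_url, name="Prometheus"):
    """Build a POST request that would register Prometheus as a data source."""
    payload = {
        "name": name,
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",  # Grafana's backend queries Prometheus for the browser
        "isDefault": True,
    }
    return Request(
        url=grafana_url.rstrip("/") + "/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Placeholder values; nothing is sent over the network here.
req = datasource_request(
    "http://grafana.example.com:3000",
    "hypothetical-api-key",
    "http://prometheus.example.com:9090",
)
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen(req)` would perform the call; the same payload could just as well be templated by a configuration management tool such as Ansible.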
  55. 55. Prometheus Dashboard
  56. 56. Creating Grafana Dashboards Takes time Requires deep knowledge of the tools Improves over time Easy to share (JSON + online library)
  57. 57. Conclusion Open Source Flexible and dynamic Low on resources Rich ecosystem For Cloud and On Premise
  58. 58. Julien Pivotto roidelapluie Inuits Contact I will give a prometheus workshop at!