Prometheus and Docker (Docker Galway, November 2015)

Founder at Robust Perception
Nov. 25, 2015

  1. Prometheus and Docker: Monitoring and Management. Brian Brazil, Founder
  2. Who am I? Engineer passionate about running software reliably in production. ● TCD CS Degree ● Google SRE for 7 years, working on high-scale reliable systems such as AdWords, AdSense, Ad Exchange, Billing, Database ● Boxever TL Systems & Infrastructure, applied processes and technology to allow the company to scale and reduce operational load ● Contributor to many open source projects, including Prometheus, Ansible, Python, Aurora and Zookeeper. ● Founder of Robust Perception, making scalability and efficiency available to everyone
  3. Why monitor? ● Know when things go wrong ○ To call in a human to prevent a business-level issue, or prevent an issue in advance ● Be able to debug and gain insight ● Trending to see changes over time, and drive technical/business decisions ● To feed into other systems/processes (e.g. QA, security, automation)
  4. Common Monitoring Challenges Themes common among companies I’ve talked to: ● Monitoring tools are limited, both technically and conceptually ● Tools don’t scale well and are unwieldy to manage ● Operational practices don’t align with the business For example: Your customers care about increased latency and it’s in your SLAs. You can only alert on individual machine CPU usage. Result: Engineers are continuously woken up for non-issues and get fatigued
  5. Fundamental Challenge is Limited Visibility
  6. Prometheus Inspired by Google’s Borgmon monitoring system. Started in 2012 by ex-Googlers working at SoundCloud as an open source project, mainly written in Go. Publicly launched in early 2015, and continues to be independent of any one company. Over 100 companies have started relying on it since then.
  7. What does Prometheus offer? ● Inclusive Monitoring ● Manageable and Reliable ● Easy to integrate with ● Dashboards ● Powerful data model ● Powerful query language ● Efficient ● Scalable
  8. Services have Internals
  9. Monitor the Internals
  10. Monitor as a Service, not as Machines
  11. Inclusive Monitoring Don’t monitor just at the edges: ● Instrument client libraries ● Instrument server libraries (e.g. HTTP/RPC) ● Instrument business logic Library authors get information about usage. Application developers get monitoring of common components for free. Dashboards and alerting can be provided out of the box, customised for your organisation!
  12. Python Instrumentation Example

      pip install prometheus_client

      from prometheus_client import Summary, start_http_server

      REQUEST_DURATION = Summary('request_duration_seconds',
                                 'Request duration in seconds')

      @REQUEST_DURATION.time()
      def my_handler(request):
          pass  # Your code here

      start_http_server(8000)
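To make concrete what the Summary in the slide above records, here is a minimal stdlib-only sketch. `TinySummary` is a hypothetical stand-in written for illustration, not the `prometheus_client` implementation; it assumes a summary without quantiles, which exposes a `_count` and a `_sum` series in the Prometheus text format.

```python
import time
from functools import wraps


class TinySummary:
    """Hypothetical stand-in for a Summary: a count and a sum of observations."""

    def __init__(self, name, documentation):
        self.name = name
        self.documentation = documentation
        self.count = 0
        self.total = 0.0

    def observe(self, amount):
        """Record one observation (for example, a request duration in seconds)."""
        self.count += 1
        self.total += amount

    def time(self):
        """Return a decorator that observes how long each call takes."""
        def decorator(func):
            @wraps(func)
            def wrapper(*args, **kwargs):
                start = time.perf_counter()
                try:
                    return func(*args, **kwargs)
                finally:
                    # Observe even if the handler raises.
                    self.observe(time.perf_counter() - start)
            return wrapper
        return decorator

    def render(self):
        """Render both series in the Prometheus text exposition format."""
        return (
            f"# HELP {self.name} {self.documentation}\n"
            f"# TYPE {self.name} summary\n"
            f"{self.name}_count {self.count}\n"
            f"{self.name}_sum {self.total}\n"
        )
```

Dividing the increase of `_sum` by the increase of `_count` over a time window gives the average request duration for that window, which is why exposing just these two numbers is already useful.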
  13. Manageable and Reliable Core Prometheus server is a single binary. Doesn’t depend on Zookeeper, Consul, Cassandra, Hadoop or the Internet. Only requires local disk (SSD recommended). No potential for cascading failure. Pull based, so easy to run on a workstation for testing and rogue servers can’t push bad metrics. Advanced service discovery finds what to monitor.
  14. Running Prometheus with Docker

      Docker images at https://hub.docker.com/r/prom/

      To run Prometheus:

      docker run -p 9090:9090 prom/prometheus:latest

      This monitors itself. http://localhost:9090/consoles/prometheus.html has a console.
  15. Easy to integrate with Many existing integrations: Java, JMX, Python, Go, Ruby, .Net, Machine, Cloudwatch, EC2, MySQL, PostgreSQL, Haskell, Bash, Node.js, SNMP, Consul, HAProxy, Mesos, Bind, CouchDB, Django, Mtail, Heka, Memcached, RabbitMQ, Redis, RethinkDB, Rsyslog, Meteor.js, Minecraft... Graphite, Statsd, Collectd, Scollector, Munin, Nagios integrations aid transition. It’s so easy, most of the above were written without the core team even knowing about them!
  16. cAdvisor natively supports Prometheus

      Run cAdvisor:

      docker run -v /var/run/:/var/run -v /sys:/sys -p 8080:8080 google/cadvisor

      http://localhost:8080/metrics will have metrics for all your containers
  17. Service Discovery It’s vital to know where each of your instances is meant to be running. Systems such as Docker Swarm spread applications across machines, so we need some form of service discovery to find them. Both Swarm and Prometheus support Consul, so let’s try that.
  18. Setting up Swarm on localhost (1)

      MY_IP=1.2.3.4

      # - Run consul
      docker run -d --name=consul --net=host gliderlabs/consul-server -bootstrap

      # - Pass the docker daemon -H tcp://0.0.0.0:2375 so it’ll do TCP

      # - Register services in Consul
      docker run -d --net=host -v /var/run/docker.sock:/tmp/docker.sock gliderlabs/registrator consul://localhost:8500

      # - Join the swarm
      docker run -d swarm join --addr=$MY_IP:2375 consul://$MY_IP:8500/swarm
  19. Setting up Swarm on localhost (2)

      # - Run some machine monitoring
      docker run -d -v /var/run/:/var/run -v /sys:/sys -p 8080:8080 google/cadvisor
      docker run -d -v /:/rootfs:ro -p 9100:9100 --net=host -e SERVICE_NAME=node prom/node-exporter

      # - Start the Swarm Manager
      docker run -d -p 2376:2375 swarm manage consul://$MY_IP:8500/swarm

      # - Check it’s working
      docker -H tcp://127.0.0.1:2376 info
  20. Prometheus configuration to use Consul

      scrape_configs:
        - job_name: 'consul'
          consul_sd_configs:
            - server: 127.0.0.1:8500
          relabel_configs:
            - source_labels: [__meta_consul_service]
              regex: (.*)        # Default next release
              replacement: $1    # Default next release
              target_label: job
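The relabel rule in the slide above copies the Consul service name into the `job` label. A rough Python sketch of that one action follows. The `relabel` function is hypothetical and covers only the replace case; it assumes the regex must match the whole joined value and that source labels are joined with `;`, as in current Prometheus. Prometheus itself uses RE2 syntax, which Python's `re` only approximates.

```python
import re


def relabel(labels, source_labels, regex, replacement, target_label, separator=";"):
    """Hypothetical sketch of a 'replace' relabel rule applied to one target."""
    # Join the source label values; missing labels count as empty strings.
    value = separator.join(labels.get(name, "") for name in source_labels)

    # Assumption: the regex has to match the entire value.
    match = re.fullmatch(regex, value)
    if match is None:
        return dict(labels)  # No match: labels are left unchanged.

    # Expand $1, $2, ... in the replacement with the captured groups.
    new_value = re.sub(
        r"\$(\d+)",
        lambda m: match.group(int(m.group(1))) or "",
        replacement,
    )

    result = dict(labels)
    result[target_label] = new_value
    return result


discovered = {"__meta_consul_service": "node", "__address__": "1.2.3.4:9100"}
target = relabel(discovered, ["__meta_consul_service"], "(.*)", "$1", "job")
```

With this rule, every service registered in Consul becomes its own job, so a new service shows up in Prometheus under its own name without any configuration change.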
  21. Run Prometheus

      docker run -d --net=host -p 9090:9090 -e SERVICE_NAME=prometheus robustperception/prometheus_local_consul

      Visit http://localhost:9090/status to see it running
  22. Automatic service discovery!
  23. Dashboards
  24. Many Dashboarding Options ● Grafana ○ Latest release has full Prometheus support ● Promdash ○ Prometheus-specific Ruby-on-Rails dashboarding solution ○ Direction is moving towards Grafana instead ● Console templates ○ Templating language inside Prometheus ○ Good for having your dashboards checked in, and for fully-custom dashboards ● Expression Browser ○ Included in Prometheus, good for ad-hoc debugging ● Roll your own ○ JSON API
  25. Powerful Data Model All metrics have arbitrary multi-dimensional labels. No need to force your model into dotted strings. Can aggregate, cut, and slice along them. Supports any double value, labels support full unicode.
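As an illustration of aggregating along labels, the sketch below models each time series as a set of labels plus a value and sums over whichever labels are kept. The sample data and the `sum_by` helper are made up for this example and only loosely mirror what a `sum by (...)` aggregation does.

```python
from collections import defaultdict

# Hypothetical samples: (labels, current value of a request counter).
samples = [
    ({"method": "GET", "code": "200", "instance": "a:8000"}, 120.0),
    ({"method": "GET", "code": "500", "instance": "a:8000"}, 3.0),
    ({"method": "POST", "code": "200", "instance": "a:8000"}, 40.0),
    ({"method": "GET", "code": "200", "instance": "b:8000"}, 100.0),
]


def sum_by(samples, *keep):
    """Sum values across all series, grouping by the labels named in keep."""
    totals = defaultdict(float)
    for labels, value in samples:
        # The group key is the kept labels only; all other labels are dropped.
        key = tuple((name, labels.get(name, "")) for name in keep)
        totals[key] += value
    return dict(totals)


by_code = sum_by(samples, "code")
by_method = sum_by(samples, "method")
```

The same samples answer both "how many errors?" and "how many POSTs?" without the naming hierarchy having been decided up front, which is the limitation of dotted strings.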
  26. Example: Labels from Node Exporter
  27. Powerful Query Language Can multiply, add, aggregate, join, predict, take quantiles across many metrics in the same query. Can evaluate right now, and graph back in time. Answer questions like: ● What’s the 95th percentile latency in the European datacenter? ● How full will the disks be in 4 hours? ● Which services are the top 5 users of CPU? Can alert based on any query.
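The disk question above comes down to fitting a line through recent samples and extending it. Here is a minimal sketch using least-squares regression, in the spirit of PromQL's `predict_linear`. The `linear_forecast` function and the sample values are hypothetical, and the real function differs in details such as which point in time it predicts from.

```python
def linear_forecast(samples, seconds_ahead):
    """Fit a least-squares line to (timestamp, value) samples and extrapolate.

    Returns the predicted value seconds_ahead after the last sample.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a line")

    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n

    variance = sum((t - mean_t) ** 2 for t, _ in samples)
    if variance == 0:
        raise ValueError("samples must span more than one timestamp")
    covariance = sum((t - mean_t) * (v - mean_v) for t, v in samples)

    slope = covariance / variance
    intercept = mean_v - slope * mean_t

    last_t = samples[-1][0]
    return intercept + slope * (last_t + seconds_ahead)


# Hypothetical free disk space in GB, sampled once an hour.
free_gb = [(0, 100.0), (3600, 90.0), (7200, 80.0)]
in_four_hours = linear_forecast(free_gb, 4 * 3600)
```

Here the disk loses 10 GB per hour, so the forecast four hours after the last sample is about 40 GB free. An alert on the forecast dropping below zero fires before the disk actually fills.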
  28. Example: Top 5 Docker images by CPU

      topk(5,
        sum by (image)(
          rate(container_cpu_usage_seconds_total{id=~"/system.slice/docker.*"}[5m])
        )
      )
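Read from the inside out, the query above takes a per-container rate, sums the rates per image, and keeps the five largest. The Python sketch below walks through the same three steps on made-up data. `simple_rate` is a deliberate simplification: it ignores the counter resets and window extrapolation that PromQL's `rate()` handles.

```python
from collections import defaultdict


def simple_rate(points):
    """Per-second increase between the first and last (timestamp, value) point."""
    (t0, v0), (t1, v1) = points[0], points[-1]
    return (v1 - v0) / (t1 - t0)


def top_images_by_cpu(series, k=5):
    """Sum per-container CPU rates by image, then keep the k largest."""
    per_image = defaultdict(float)
    for labels, points in series:
        per_image[labels["image"]] += simple_rate(points)
    ranked = sorted(per_image.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Hypothetical counters of CPU seconds used, sampled over a 5 minute window.
series = [
    ({"image": "nginx", "id": "/system.slice/docker-a"}, [(0, 0.0), (300, 75.0)]),
    ({"image": "nginx", "id": "/system.slice/docker-b"}, [(0, 10.0), (300, 160.0)]),
    ({"image": "redis", "id": "/system.slice/docker-c"}, [(0, 5.0), (300, 305.0)]),
]

top = top_images_by_cpu(series)
```

The result is in CPU cores: a rate of 1.0 means the containers of that image together kept one core busy over the window.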
  29. Efficient Instrumenting everything means a lot of data. Prometheus is best in class for lossless storage efficiency, 3.5 bytes per datapoint. A single server can handle: ● millions of metrics ● hundreds of thousands of datapoints per second
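As a back-of-the-envelope check on what 3.5 bytes per datapoint means for disk usage, the sketch below multiplies it out. The ingestion rate of 100,000 datapoints per second is an assumed figure chosen for illustration, and the estimate ignores any index or metadata overhead.

```python
BYTES_PER_DATAPOINT = 3.5  # Figure quoted in the talk.
SECONDS_PER_DAY = 86_400


def storage_bytes(datapoints_per_second, retention_seconds,
                  bytes_per_datapoint=BYTES_PER_DATAPOINT):
    """Rough sample storage needed for a given ingestion rate and retention."""
    return datapoints_per_second * retention_seconds * bytes_per_datapoint


# Assumed load: 100,000 datapoints per second, kept for one day.
per_day = storage_bytes(100_000, SECONDS_PER_DAY)
per_day_gb = per_day / 1e9
```

Under these assumptions a day of data takes roughly 30 GB, so a few weeks of retention fits on a single commodity SSD.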
  30. Scalable Prometheus is easy to run, can give one to each team in each datacenter. Federation allows pulling key metrics from other Prometheus servers. When one job is too big for a single Prometheus server, can use sharding+federation to scale out. Needed with thousands of machines.
  31. Final Thought: Instrument Once, Use Everywhere When an application is instrumented, tends to be limited in which monitoring systems it supports. You’re left with a choice: Have N monitoring systems, or run extra services to act as shims to translate. Prometheus clients aren’t just limited to outputting metrics in the Prometheus format, can also output to Graphite and other monitoring systems. Prometheus clients can also take in metrics from other monitoring systems. Prometheus clients can be your instrumentation and metrics interchange, even when not using Prometheus itself!
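To sketch what translating a labelled metric for Graphite could involve, the function below flattens the labels into a dotted path and emits one line of Graphite's plaintext protocol (`path value timestamp`). The flattening scheme and the `to_graphite_line` name are assumptions for illustration; the bridge shipped with a given client library may lay out the path differently.

```python
def to_graphite_line(name, labels, value, timestamp):
    """Flatten a labelled sample into one Graphite plaintext protocol line."""
    parts = [name]
    for key in sorted(labels):
        # Dots and spaces would break the dotted path, so replace them.
        safe_value = str(labels[key]).replace(".", "_").replace(" ", "_")
        parts.append(key)
        parts.append(safe_value)
    path = ".".join(parts)
    return f"{path} {value} {int(timestamp)}\n"


line = to_graphite_line(
    "http_requests_total",
    {"method": "GET", "code": "200"},
    1027,
    1448409600,
)
```

Sorting the label names keeps the path stable between scrapes, which matters because Graphite treats every distinct path as a separate series.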
  32. What do we do? Robust Perception provides consulting and training to give you confidence in your production service's ability to run efficiently and scale with your business. We can help you: ● Decide if Prometheus is for you ● Manage your transition to Prometheus and resolve issues that arise ● With capacity planning, debugging, infrastructure etc. We are proud to be among the core contributors to the Prometheus project.
  33. Resources

      Official Project Website: prometheus.io
      Official Mailing List: prometheus-developers@googlegroups.com
      Demo: demo.robustperception.io
      Dockerfiles: https://github.com/RobustPerception/docker_examples
      Robust Perception Website: www.robustperception.io
      Queries: prometheus@robustperception.io