Monitoring in 2017 - TIAD Camp Docker

by Charly Fontaine, Datadog
TIAD Camp Docker, October 6, 2017

Published in: Technology

  1. 1. Monitoring in 2017: Challenges in monitoring containers and dynamic infrastructure. TIAD, Oct 6, 2017. Charly Fontaine, Software Engineer - Containers team, Datadog
  2. 2. CharlyF [charly@datadoghq.com] Name: Charly Fontaine Role: Software Engineer 
 Interests: * Containerized Infrastructures * Monitoring all the things * Motorbikes
  3. 3. • SaaS based infrastructure and app monitoring • Open Source Agent • Time series data (metrics and events) • Processing nearly a trillion data points per day • Intelligent Alerting • We’re hiring! (www.datadoghq.com/careers/) Datadog Overview
  4. 4. Operating Systems, Cloud Providers, Containers, Web Servers, Datastores, Caches, Queues and more... Monitor Everything
  5. 5. $ cat ~/.plan 1. Intro: The Importance of Monitoring 2. The Challenge: Monitoring Dynamic Infrastructure 3. Finding the Signal: How do we know what to monitor? 4. Implementation: Applying it to Containerized Workloads 5. Demo: Monitoring of a containerized web app deployment
  6. 6. Collecting data is cheap;
 not having it when you need it can be expensive
  7. 7. Instrument all the things!
  8. 8. Sharing: Using and sharing the same metrics and measurements across teams is key to avoiding misunderstandings.
  9. 9. Why do we focus on Docker and Containers?
  10. 10. Source: http://bit.ly/1RQRsXW
  11. 11. Hacker News Driven Development: When the choice of technology is determined by what is popular on Hacker News that week.
  12. 12. https://www.datadoghq.com/docker-adoption/
  13. 13. Docker Adoption Growth: We’ve seen a 5x increase in Docker adoption over the last year.
  14. 14. Source: Datadog
  15. 15. Source: http://bit.ly/1qFylWK
  16. 16. Open Questions • Where is my container running? • What is the capacity of my cluster? • What’s the total throughput of my app? • What’s its response time per tag? (app, version, region) • What’s the distribution of 5xx errors per container?
  17. 17. More Details at: http://www.datadoghq.com/blog/monitoring-101-alerting/
  18. 18. Monitoring VS Observing
  19. 19. Examples: NGINX - Metrics Resource Metrics:
 • Disk I/O • Memory • CPU • Queue Length Work Metrics: 
 • Requests Per Second • Request Time • Error Rates (4xx or 5xx) • Success (2xx)
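 A minimal sketch of how the work metrics above could be derived from a window of HTTP status codes. The function name `work_metrics` and the sample data are hypothetical and only for illustration; a real setup would read these values from NGINX itself or from its logs.

```python
from collections import Counter


def work_metrics(status_codes, window_seconds):
    """Summarize a window of HTTP status codes into work metrics."""
    # Group codes by class: 2 for 2xx, 4 for 4xx, 5 for 5xx, and so on.
    classes = Counter(code // 100 for code in status_codes)
    total = len(status_codes)

    def share(status_class):
        return classes[status_class] / total if total else 0.0

    return {
        "requests_per_second": total / window_seconds,
        "success_rate": share(2),
        "error_rate_4xx": share(4),
        "error_rate_5xx": share(5),
    }


# Hypothetical 10-second window: 90 successes, 6 client errors, 4 server errors.
sample_codes = [200] * 90 + [404] * 6 + [500] * 4
metrics = work_metrics(sample_codes, window_seconds=10)
```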
  20. 20. Examples: NGINX - Events • Configuration Change • Code Deployment • Service Started / Stopped
  21. 21. Examples: Events
  22. 22. What to demand from our monitoring tooling?
  23. 23. Cryptic Alerts W H A T ?
  24. 24. EVERY ALERT MUST BE ACTIONABLE
  25. 25. Query Based Monitoring “What’s the average throughput of application:nginx per version?” “Alert me when one of my pods from replication controller:foo is not behaving like the others.” “Show me the rate of HTTP 500 responses from nginx” “… across all data centers” “… running my app version 2 …”
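 To make the first query concrete, here is a rough sketch of what "average throughput of application:nginx per version" means over tagged data points. The data layout, the metric values, and the helper `average_by` are assumptions for illustration, not any particular product's query engine.

```python
from collections import defaultdict


def average_by(points, where, group_by):
    """Average point values that match all `where` tags, grouped by one tag."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for point in points:
        tags = point["tags"]
        # Keep only points whose tags match every filter, e.g. application:nginx.
        if all(tags.get(key) == value for key, value in where.items()):
            group = tags.get(group_by)
            sums[group] += point["value"]
            counts[group] += 1
    return {group: sums[group] / counts[group] for group in sums}


# Hypothetical throughput samples (requests per second), one per container.
points = [
    {"value": 120.0, "tags": {"application": "nginx", "version": "1"}},
    {"value": 80.0, "tags": {"application": "nginx", "version": "1"}},
    {"value": 200.0, "tags": {"application": "nginx", "version": "2"}},
    {"value": 50.0, "tags": {"application": "redis", "version": "2"}},
]
result = average_by(points, where={"application": "nginx"}, group_by="version")
```

 The query is written against tags rather than host or container names, so it keeps working as containers come and go.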
  26. 26. Getting at the metrics…
  27. 27. Resource Metrics Utilization: • CPU (user + system) • memory • i/o • network traffic Saturation: • throttling • swap Errors: • Network errors (receive vs transmit)
  28. 28. Container Events • Starting / Stopping Containers • Scaling Events for Underlying Instances • Deploying a new container build
  29. 29. Pseudo-files • Provide visibility into container metrics via the file system. • Generally under: 
 /cgroup/<resource>/docker/$CONTAINER_ID/ 
 or
 /sys/fs/cgroup/<resource>/docker/$CONTAINER_ID/

  30. 30. Pseudo-files: CPU Metrics
 $ cat /sys/fs/cgroup/cpuacct/docker/$CONTAINER_ID/cpuacct.stat
 > user 2451 # time spent running processes since boot
 > system 966 # time spent executing system calls since boot
 Pseudo-files: CPU Throttling
 $ cat /sys/fs/cgroup/cpu/docker/$CONTAINER_ID/cpu.stat
 > nr_periods 565 # Number of enforcement intervals that have elapsed
 > nr_throttled 559 # Number of times the group has been throttled
 > throttled_time 12119585961 # Total time that members of the group were throttled (12.12 seconds)
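 A small sketch of turning the cpu.stat output above into throttling figures. It assumes the file holds "key value" lines and that throttled_time is in nanoseconds, as the slide's 12.12 seconds implies; the function names are hypothetical.

```python
def parse_cgroup_stat(text):
    """Parse 'key value' lines, as found in cpu.stat or cpuacct.stat."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttling_summary(stats):
    """Share of enforcement periods that were throttled, and total throttled time."""
    periods = stats.get("nr_periods", 0)
    throttled = stats.get("nr_throttled", 0)
    return {
        "throttled_ratio": throttled / periods if periods else 0.0,
        # Assumption: throttled_time is reported in nanoseconds.
        "throttled_seconds": stats.get("throttled_time", 0) / 1e9,
    }


# Values copied from the slide. On a real host the text would be read from
# /sys/fs/cgroup/cpu/docker/$CONTAINER_ID/cpu.stat instead.
SAMPLE = """nr_periods 565
nr_throttled 559
throttled_time 12119585961
"""
summary = throttling_summary(parse_cgroup_stat(SAMPLE))
```

 With the slide's numbers, nearly every enforcement period was throttled, a saturation signal that CPU utilization alone would not show.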
  31. 31. Docker API • Detailed streaming metrics as JSON over an HTTP socket
 $ curl -v --unix-socket /var/run/docker.sock http://localhost/containers/28d7a95f468e/stats
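 The same endpoint can be queried from code. Below is a sketch using only the Python standard library. It assumes the Docker socket lives at /var/run/docker.sock and that the stats payload carries the `cpu_stats` and `precpu_stats` fields described in the Docker Engine API documentation. The helper names and the sample payload are hypothetical.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP connection that talks to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path):
        super().__init__("localhost")
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.connect(self.socket_path)
        self.sock = sock


def fetch_stats(container_id, socket_path="/var/run/docker.sock"):
    """Fetch one stats sample (stream=false) for a container."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", f"/containers/{container_id}/stats?stream=false")
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()


def cpu_percent(stats):
    """CPU usage as a percentage, from the delta between two readings."""
    cpu, pre = stats["cpu_stats"], stats["precpu_stats"]
    cpu_delta = cpu["cpu_usage"]["total_usage"] - pre["cpu_usage"]["total_usage"]
    system_delta = cpu["system_cpu_usage"] - pre["system_cpu_usage"]
    if system_delta <= 0 or cpu_delta < 0:
        return 0.0
    return cpu_delta / system_delta * cpu.get("online_cpus", 1) * 100.0


# Hypothetical payload fragment, shaped like the fields used above.
SAMPLE_STATS = {
    "cpu_stats": {
        "cpu_usage": {"total_usage": 200_000_000},
        "system_cpu_usage": 2_000_000_000,
        "online_cpus": 2,
    },
    "precpu_stats": {
        "cpu_usage": {"total_usage": 100_000_000},
        "system_cpu_usage": 1_000_000_000,
    },
}
percent = cpu_percent(SAMPLE_STATS)
```

 On a host running Docker, `cpu_percent(fetch_stats("28d7a95f468e"))` would combine the two steps for the container ID shown in the curl example.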

  32. 32. Side Car Containers
  33. 33. Service Discovery [Diagram. Labels: Docker API, Kubernetes, Monitoring Agent, Containers, Containers List & Metadata, Additional Metadata (Tags, etc.), Config Backends, Integration Configurations, Host Level Metrics]
  34. 34. Custom Metrics • Instrument custom applications
 • You know your key transactions best.
 • Use async protocols like Etsy’s StatsD or DogStatsD
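 As a rough illustration of why these protocols are cheap to adopt: a StatsD metric is a short text datagram sent over UDP, fire-and-forget. The sketch below builds one by hand. It assumes an agent listening on 127.0.0.1:8125 (the usual StatsD default) and uses the DogStatsD `|#tag` extension for tags. The metric name and helper functions are hypothetical, and a real application would normally use an existing client library.

```python
import socket


def format_metric(name, value, metric_type="c", tags=None):
    """Build a StatsD datagram; tags use the DogStatsD '|#' extension."""
    payload = f"{name}:{value}|{metric_type}"
    if tags:
        payload += "|#" + ",".join(tags)
    return payload


def send_metric(payload, host="127.0.0.1", port=8125):
    """Send one datagram over UDP. Nothing is awaited, so the app never blocks on it."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload.encode("utf-8"), (host, port))
    finally:
        sock.close()


# Hypothetical counter for a key transaction, tagged by app and version.
datagram = format_metric("checkout.completed", 1, "c", ["app:web", "version:2"])
```

 Calling `send_metric(datagram)` emits the counter. If no agent is listening, the datagram is simply dropped.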
  35. 35. My friend Martin The demo
  36. 36. Resources
 Monitoring 101: Alerting https://www.datadoghq.com/blog/monitoring-101-alerting/
 Monitoring 101: Collecting the Right Data https://www.datadoghq.com/blog/monitoring-101-collecting-data/
 Monitoring 101: Investigating performance issues https://www.datadoghq.com/blog/monitoring-101-investigation/
 The Power of Tagged Metrics https://www.datadoghq.com/blog/the-docker-monitoring-problem/
 How to Collect Docker Metrics https://www.datadoghq.com/blog/how-to-collect-docker-metrics/
 8 surprising facts about Docker Adoption https://www.datadoghq.com/docker-adoption/
