
(APP309) Running and Monitoring Docker Containers at Scale | AWS re:Invent 2014


If you have tried Docker but are unsure about how to run it at scale, you will benefit from this session. Like virtualization before, containerization (à la Docker) is increasing the elastic nature of cloud infrastructure by an order of magnitude. But maybe you still have questions: How many containers can you run on a given Amazon EC2 instance type? Which metric should you look at to measure contention? How do you manage fleets of containers at scale?
Datadog is a monitoring service for IT, operations, and development teams who write and run applications at scale. In this session, the cofounder of Datadog presents the challenges and benefits of running containers at scale and how to use quantitative performance patterns to monitor your infrastructure at this magnitude and complexity. Sponsored by Datadog.



  1. 1. © 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc. November 12, 2014 | Las Vegas, NV. APP309: Monitoring and Running Docker Containers at Scale. Alexis Lê-Quôc, Datadog
  2. 2. @alq, CTO at Datadog
  3. 3. Datadog • Monitoring service • Made for the cloud • Aggregates everything • Support for Docker (since 1.0)
  4. 4. Goals 1. Present key Docker metrics 2. Explain operational complexity 3. Rethink monitoring of Docker containers
  5. 5. Agenda • A (very) brief history of containers • Docker containers on AWS • Key Docker metrics • Operational complexity • Monitoring Docker effectively • Demo
  6. 6. A brief history of containers
  7. 7. Containers in a nutshell • Been around for a long time – jails, zones, cgroups • No full-virtualization overhead • Used for runtime isolation (e.g., jails) • Docker: escape from dependency hell
  8. 8. Escape from dependency hell a.out shared libs packages omnibus Docker ~
  9. 9. Container ~ single static binary Process Container Host Source Dockerfile Chef/Puppet Kickstart .TEXT /var/lib/docker Full distro PID Name/ID Hostname
  10. 10. Docker on AWS: some numbers
  11. 11. (Some) Docker use cases • Continuous integration – eliminate dependency variance – same code from dev laptop to production – Git-like workflow • Continuous delivery – (quasi) stateless components – web workers, video encoders, etc. – not for data stores (Amazon RDS a better fit)
  12. 12. Instance types • c3.2xl: 20% • m3.medium: 20% • m3.large: 19% • m3.xlarge: 13% • m1.large: 8% • the rest: 21% Source: Datadog, October 2014
  13. 13. Containers per instance • Average: 5 (October 2014) • Highly dependent on the workload • This is just the beginning… • Expect higher container density going forward Source: Datadog, October 2014
  14. 14. Key Docker metrics
  15. 15. Docker containers consume… • Memory • CPU • I/O • Network
  16. 16. Memory (name: why it matters) • pgmajfault: Paging to/from disk is slow • pgfault: Context switches hurt application performance • resident set size (rss): Too much RSS causes paging and swapping • swap: Swapping in/out is slow
  17. 17. CPU (name: why it matters) • user: Measures work being done • system: System calls, a necessary evil
  18. 18. Block I/O (name: why it matters) • blkio.io_service_bytes: I/O is (often) bottleneck • blkio.io_queued: Measures saturation
  19. 19. Network (name: why it matters) • tx/rx_errors: Because…errors are bad • tx/rx_dropped: Measures contention • tx/rx_bytes: Measures traffic
  20. 20. How to collect metrics • https://github.com/google/cadvisor
  21. 21. Operational complexity
  22. 22. Combinatorial multiplication Hardware OS Off-the-shelf Your Application Hardware Hypervisor Off-the-shelf App OS OS Off-the-shelf App Hardware Hypervisor OS OS A A A A Containers O O O O
  23. 23. Operational complexity • Average containers per instance: N (N=5, 10/2014) • N-times as many “hosts” to manage • Affects – provisioning: prep’ing & building containers – configuration: passing config to containers – orchestration: deciding where/when containers run – monitoring: making sure containers run properly
  24. 24. Monitoring: metric counts on Amazon EC2 • 1 Amazon EC2 instance – 10 Amazon CloudWatch metrics • 1 operating system (e.g., Linux) – 100 metrics • 1 container – 50 metrics • 1 off-the-shelf application – ~50 metrics
  25. 25. Combinatorial multiplication • 100 instances • 500 containers • Assuming only 5 containers per instance
  26. 26. Combinatorial multiplication • 160 metrics per instance • 410 metrics per instance • Assuming only 5 containers per instance
  27. 27. Velocity • EC2 instance half-life: hours, days, months • Container half-life: minutes, hours, days
  28. 28. Aggravating factors • Hub-based provisioning – new images every day • Autonomic orchestration – from imperative to declarative – automated – individual containers don’t matter – e.g., Kubernetes, Mesos
  29. 29. A lot more, A lot faster.
  30. 30. If your monitoring is still centered on individual hosts or instances…
  31. 31. Host-centric monitoring Monitor Monitor GAP Hypervisor OS OS A A A A Containers O O O O
  32. 32. A lot more pain, A lot faster.
  33. 33. Monitoring containers effectively
  34. 34. A new approach to container monitoring
  35. 35. Layers + Tags
  36. 36. Layers of monitoring Monitor Hypervisor OS OS A A A A Containers O O O O
  37. 37. Layers of monitoring CloudWatch Infrastructure Monitoring APM Hypervisor OS OS A A A A Containers O O O O
  38. 38. Layers of monitoring cpu/net/io filesystem docker mem docker cpu db queries web requests app throughput CloudWatch Infrastructure Monitoring APM e.g. Hypervisor OS OS A A A A Containers O O O O
  39. 39. Layers of monitoring • Access to metrics from all the layers • Amazon CloudWatch, OS metrics, Docker metrics, app metrics in 1 place • Shared timeline
  40. 40. If your monitoring does not cover all layers, pain.
  41. 41. Tags You use them already
  42. 42. Tags • Monitoring is like Auto-Scaling Groups • Monitoring is like Docker orchestration • From imperative to declarative • Query-based • Queries operate on tags
  43. 43. Monitoring with tags and queries “Monitor all Docker containers running image web” “… in region us-west-2 across all availability zones” “… and make sure resident set size < 1GB on c3.xl”
  44. 44. Monitoring with tags and queries “Monitor all Docker containers running image web” “… in region us-west-2 across all availability zones” “… and make sure resident set size < 1GB on c3.xl”
  45. 45. Monitoring with tags and queries “Monitor all Docker containers running image web” “… in region us-west-2 across all availability zones” “… that use more than 1.5x the average on c3.xl”
  46. 46. “Dude, where’s my server?”
  47. 47. “Dude, where’s my container?”
  48. 48. If your monitoring is not tag-based, pain.
  49. 49. Demo
  50. 50. Take-aways 1. Docker increases operational complexity by an order of magnitude unless… 2. You have layered monitoring, from the instance to the container and to the application, and… 3. You monitor using tags and queries
  51. 51. http://bit.ly/awsevals
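
The metric counts on slides 24 to 26 can be reproduced with a short calculation. The sketch below is one reading of those slides, not the speaker's own arithmetic: it assumes the 160 figure is 10 Amazon CloudWatch + 100 OS + 50 off-the-shelf application metrics, and that each container adds 50 more on top. The function and constant names are made up for illustration.

```python
# Per-source metric counts quoted on slide 24.
CLOUDWATCH_METRICS = 10   # per Amazon EC2 instance
OS_METRICS = 100          # per operating system
APP_METRICS = 50          # per off-the-shelf application (slide says ~50)
CONTAINER_METRICS = 50    # per container


def metrics_per_instance(containers: int) -> int:
    """Metrics emitted by one instance running `containers` containers.

    Assumption (not stated on the slides): one OS and one off-the-shelf
    application per instance, with every container adding 50 metrics.
    """
    base = CLOUDWATCH_METRICS + OS_METRICS + APP_METRICS
    return base + containers * CONTAINER_METRICS


def fleet_metrics(instances: int, containers_per_instance: int) -> int:
    """Total metrics across a fleet of identical instances."""
    return instances * metrics_per_instance(containers_per_instance)


if __name__ == "__main__":
    print(metrics_per_instance(0))  # 160: no containers
    print(metrics_per_instance(5))  # 410: five containers, as on slide 26
    print(fleet_metrics(100, 5))    # 41000: the 100 instances of slide 25
```

Under these assumptions, five containers per instance roughly multiply the per-instance metric count by 2.5, and the gap widens as container density grows.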

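
The tag-and-query idea on slides 43 to 45 can be illustrated with a few lines of plain Python. This is a minimal sketch under stated assumptions, not Datadog's query language or API: the `ContainerSample` record, the tag keys (`image`, `region`, `az`, `instance_type`), the `c3.xlarge` tag value, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean

MB = 1024 * 1024


@dataclass
class ContainerSample:
    """One container's latest RSS reading plus the tags attached to it."""
    container_id: str
    tags: dict
    rss_bytes: int


def select(samples, **wanted):
    """Return samples whose tags match every key=value pair in `wanted`."""
    return [s for s in samples
            if all(s.tags.get(k) == v for k, v in wanted.items())]


def above_average(samples, factor=1.5):
    """Return samples whose RSS exceeds `factor` times the group mean."""
    if not samples:
        return []
    threshold = factor * mean(s.rss_bytes for s in samples)
    return [s for s in samples if s.rss_bytes > threshold]


def tags(image, region, az, instance_type="c3.xlarge"):
    """Build a tag dictionary for the hypothetical fleet below."""
    return {"image": image, "region": region, "az": az,
            "instance_type": instance_type}


# Hypothetical fleet: three "web" containers in us-west-2 plus two others.
fleet = [
    ContainerSample("c1", tags("web", "us-west-2", "us-west-2a"), 300 * MB),
    ContainerSample("c2", tags("web", "us-west-2", "us-west-2b"), 320 * MB),
    ContainerSample("c3", tags("web", "us-west-2", "us-west-2c"), 1200 * MB),
    ContainerSample("c4", tags("worker", "us-west-2", "us-west-2a"), 2000 * MB),
    ContainerSample("c5", tags("web", "us-east-1", "us-east-1a"), 1500 * MB),
]

# "All containers running image web, in us-west-2 across all AZs, on c3.xl"
web_west = select(fleet, image="web", region="us-west-2",
                  instance_type="c3.xlarge")

# "... that use more than 1.5x the average"
flagged = above_average(web_west, factor=1.5)
print([s.container_id for s in flagged])
```

The query never names an individual container or host. New containers that carry the same tags fall into the same check automatically, which is the declarative property the slides compare to Auto Scaling groups and orchestration.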