PAC 2020 Santorin - Edoardo Varani

Seeing is knowing, Measuring CPU throttling in containerized environments

Published in: Engineering

  1. PERFORMANCE IS NOT A MYTH • Performance Advisory Council, Santorini, Greece, February 26-27, 2020 • Seeing is knowing: Measuring CPU throttling in containerized environments • Edoardo Varani
  2. Organizations love containers
  3. Complexity is growing
  4. Bottleneck analysis complexity: --cpus = 1
  5. First outcomes • Avg response times increase with the load • Bad spikes up to 4x • CPU??
  6. Jpetstore bottleneck • 0.65 avg CPUs, limit set to 1 CPU • CPU utilization is far from critical limits (80-90%) • So? • We start to look at other resources to find the bottleneck
  7. Bottleneck analysis complexity • What if the bottleneck is exactly in the jpetstore CPU?
  8. CPU limits under the hood • Based on Linux kernel cgroups • Completely Fair Scheduler (CFS) bandwidth control: Quota, Period, Shares (soft limit) • A cgroup can use at most its CPU time Quota in each wall-clock Period • Each cgroup can still run on all the physical CPUs • If the Quota is used up before the Period ends, the cgroup is throttled until the next Period starts
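
The quota and period mechanics on this slide can be sketched in a few lines of Python. This is a simplified model, not the kernel's implementation: it assumes the default 100 ms CFS period that Docker uses when only `--cpus` is given, and the function names are hypothetical.

```python
CFS_PERIOD_US = 100_000  # assumed default CFS period: 100 ms, in microseconds


def quota_us_for_cpus(cpus, period_us=CFS_PERIOD_US):
    """CPU time (microseconds) the cgroup may consume in each period."""
    return int(round(cpus * period_us))


def wall_time_until_throttled_us(quota_us, busy_threads, period_us=CFS_PERIOD_US):
    """Wall-clock time before the quota is exhausted within one period.

    Threads can run on several physical CPUs at once, so N busy threads
    burn the quota N times faster than wall-clock time passes.
    """
    return min(quota_us / busy_threads, period_us)


quota = quota_us_for_cpus(1)                     # --cpus = 1
runnable = wall_time_until_throttled_us(quota, busy_threads=4)
print(f"quota per period: {quota} us")
print(f"4 busy threads run for {runnable:.0f} us, then wait for the next period")
```

In this model a container limited to 1 CPU with four busy threads exhausts its quota a quarter of the way through each period and sits throttled for the rest, even though its average utilization stays below the limit.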
  9. Throttling metrics from cgroup files • CFS native metrics: CPU periods, CPU throttled periods, CPU throttling time • (figure: cgroup folder for the container)
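
As a rough illustration of reading these metrics, the sketch below parses the contents of a cgroup v1 `cpu.stat` file, which exposes `nr_periods`, `nr_throttled`, and `throttled_time` (in nanoseconds). The sample values are made up, the helper name is hypothetical, and the location of the file depends on the container runtime, so it is not hard-coded here.

```python
def parse_cpu_stat(text):
    """Turn 'key value' lines from a cgroup cpu.stat file into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:          # skip blank or malformed lines
            stats[parts[0]] = int(parts[1])
    return stats


# Illustrative sample only; real values come from the container's cpu.stat file.
SAMPLE = """nr_periods 1200
nr_throttled 300
throttled_time 45000000000
"""

stats = parse_cpu_stat(SAMPLE)
throttled_seconds = stats["throttled_time"] / 1e9   # nanoseconds to seconds
print(stats, throttled_seconds)
```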
  10. Best APMs show throttling (in seconds)
  11. A metric from KubeCon • Throttled% = Δnr_throttled / Δnr_periods • By Dave Chiluk @ Indeed • Gives an approximate indication • It does not tell how bad the throttling spikes are
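
The ratio on this slide can be computed from two successive readings of the cgroup counters. The following is a minimal sketch; the function name and the sample readings are illustrative assumptions.

```python
def throttled_percent(prev, curr):
    """Share of CFS periods in which the cgroup was throttled between two samples.

    `prev` and `curr` are dicts holding the cumulative nr_periods and
    nr_throttled counters taken at two points in time.
    """
    d_periods = curr["nr_periods"] - prev["nr_periods"]
    d_throttled = curr["nr_throttled"] - prev["nr_throttled"]
    if d_periods <= 0:               # no elapsed periods (or counter reset)
        return 0.0
    return 100.0 * d_throttled / d_periods


before = {"nr_periods": 1000, "nr_throttled": 100}   # made-up readings
after = {"nr_periods": 1600, "nr_throttled": 250}
print(throttled_percent(before, after))
```

Note that the result only counts how many periods hit the limit. A period throttled for 1 ms and one throttled for 90 ms weigh the same, which is why the slide calls the metric approximate.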
  12. A metric from KubeCon • Commits: de53fd7aedb1 & 763a9ec06c40 • Applied to the 5.4 kernel • Backported to: 4.14.154+, 4.19.84+, 5.3.9+ • Distro kernels: Ubuntu 5.3.0-24+, Ubuntu 4.15.0-67+, RHEL7 kernel-3.10.0-1062.8.1.el7, RHEL8.2 (WIP)
  13. So what? • OK, so we probably need to increase the jpetstore quota. BUT how much more?
  14. Hard choices • Container performance vs. cluster reliability • The higher the limits, the higher the risk • We want to avoid multiple performance tests just to know the right container quota • We need to know how far we are from the desired CPUs
  15. We need to see the Throttled CPUs • (chart: Throttled CPUs, max +1.3 Throttled CPUs)
  16. Just add the 1.3 Throttled CPUs to the quota • --cpus = 1, +1.3 desired CPUs • --cpus = 2.3
  17. Quota increased – no more throttling • Quota = 2.3 CPUs • Throttling < 0.1 CPUs
  18. New Sizing – CPU footprint is similar • 1 CPU quota: 0.7 CPUs • 2.3 CPU quota: 0.7 CPUs
  19. Benefits • Any real benefit from this sizing?
  20. New Sizing – Response time cut • Before: about 200 ms on average • After: about 105 ms on average
  21. New Sizing – GC pauses cut • Before: about 230 ms for a Full GC • After: about 100 ms for a Full GC
  22. How is it calculated? • (chart: Throttled CPUs, max +1.3 Throttled CPUs)
  23. Throttled CPUs - PromQL • rate(container_cpu_cfs_throttled_seconds_total[interval]) • It's just the "throttled seconds per second" rate • Analogous to the standard "top-like" container CPU utilization • You want this to be close to 0 but not 0
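
What the `rate()` expression computes can be approximated by hand from two samples of the cumulative throttled-seconds counter. The sketch below is a simplified stand-in (Prometheus also handles counter resets and extrapolation), and the function names and sample numbers are hypothetical, chosen to reproduce the 1.3 Throttled CPUs from the jpetstore example.

```python
def throttled_cpus(prev_seconds, curr_seconds, interval_seconds):
    """Throttled seconds per wall-clock second between two counter samples."""
    return (curr_seconds - prev_seconds) / interval_seconds


def suggested_quota(current_quota_cpus, max_throttled_cpus):
    """Sizing rule from the talk: add the peak Throttled CPUs to the current quota."""
    return current_quota_cpus + max_throttled_cpus


# Made-up samples: the counter grew by 78 s of throttled time over 60 s.
peak = throttled_cpus(prev_seconds=120.0, curr_seconds=198.0, interval_seconds=60.0)
print(f"throttled CPUs: {peak:.1f}")
print(f"suggested --cpus: {suggested_quota(1.0, peak):.1f}")
```

Because the value is expressed in CPUs, it can be added directly to the existing limit, which is what makes it more actionable than a throttled-periods percentage.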
  24. Stress-ng – Batch Case Scenario • docker run lorel/docker-stress-ng stress-ng --cpu 2 • Easy workload generator • It fully uses the number of CPUs provided • Three experiments: --cpus = 0.5, --cpus = 1, --cpus = 1.9
  25. Batch workloads – 0.5 CPU limit • 0.5 Used CPUs • 1.5 Throttled CPUs • 100% Throttled periods
  26. Batch workloads – 1 CPU limit • 1 Used CPUs • 1 Throttled CPUs • 100% Throttled periods
  27. Batch workloads – 1.9 CPU limit • 1.9 Used CPUs • 0.1 Throttled CPUs • Still 100% Throttled periods
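
The three experiments follow a simple pattern: with a steady demand of 2 CPUs, Used CPUs equal the limit and Throttled CPUs equal the unmet demand. The sketch below reproduces the numbers from the slides under that idealized steady-workload assumption; the function name is hypothetical.

```python
def batch_model(demand_cpus, limit_cpus):
    """Idealized steady workload: returns (used CPUs, throttled CPUs, throttled periods %)."""
    used = min(demand_cpus, limit_cpus)
    throttled = max(demand_cpus - limit_cpus, 0.0)
    # A constant demand above the limit exhausts the quota in every period.
    throttled_periods_pct = 100.0 if demand_cpus > limit_cpus else 0.0
    return used, throttled, throttled_periods_pct


for limit in (0.5, 1.0, 1.9):
    used, throttled, pct = batch_model(demand_cpus=2.0, limit_cpus=limit)
    print(f"--cpus = {limit}: used {used:.1f}, throttled {throttled:.1f}, periods {pct:.0f}%")
```

The throttled-periods percentage stays at 100% in all three runs, while Throttled CPUs shrink from 1.5 to 0.1, which is the case for preferring the latter metric.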
  28. Quota dependency - Renaissance • Open source • 30+ benchmarks: Akka actors, Spark batches, genetic algorithms, Jdk-streams, reactive streams, … • Committee from universities and Oracle Labs • Cool logo
  29. JVM Ergonomics – Quota dependency • Increasing the quota caused a demand increase • Why?
  30. JVM Ergonomics – Quota dependency • More threads are created • Parallel work is offloaded to the new threads • The JVM sees more availableProcessors
  31. availableProcessors • Runtime.availableProcessors is used to set: compile threads, GC threads, fork-join pool size, libraries / app servers threads • Prior to JDK 8u131, there was no container awareness • In JDK 11, PreferContainerQuotaForCPUCount is ON by default • If OFF, the JVM will use the CPU Shares
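
To see why a larger quota raises the processor count the JVM reports, the following Python sketch models the selection logic as understood for HotSpot around JDK 11. It is an approximation for illustration, not the JVM source: the function name is hypothetical, `None` stands for an unset quota or shares value, and later JDK releases may behave differently.

```python
import math

PER_CPU_SHARES = 1024  # CPU shares that count as one CPU


def active_processor_count(host_cpus, quota_us, period_us, shares, prefer_quota=True):
    """Approximate CPU count a container-aware JVM derives from cgroup settings."""
    quota_count = 0
    if quota_us is not None and period_us:
        quota_count = math.ceil(quota_us / period_us)   # fractional CPUs round up
    share_count = math.ceil(shares / PER_CPU_SHARES) if shares is not None else 0

    if quota_count and share_count:
        # prefer_quota models PreferContainerQuotaForCPUCount
        limit = quota_count if prefer_quota else min(quota_count, share_count)
    else:
        limit = quota_count or share_count or host_cpus
    return min(host_cpus, limit)


# Assumed 8-CPU host, 100 ms period: --cpus = 1 versus --cpus = 2.3
print(active_processor_count(8, 100_000, 100_000, None))
print(active_processor_count(8, 230_000, 100_000, None))
```

In this model the move from a quota of 1 to 2.3 CPUs takes the reported count from 1 to 3, so thread pools sized from availableProcessors grow and CPU demand grows with them.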
  32. Takeaways • Choose a meaningful throttling metric • Give JVMs some room to spike, BUT try to downscale your threads if you are spiking too much • Compare throttling across releases to see how spikiness changes • Tailor the limits around your workload to preserve the cluster • Upgrade your kernels and JDKs • Hang tight for cgroup v2
  33. Questions?
