GDG Cloud Southlake #20: Stefano Doni: Kubernetes performance tuning dilemma: How to solve it with AI

1. Kubernetes performance tuning dilemma: How to solve it with AI
Stefano Doni, CTO
2. Agenda
1 The problem
2 Tuning challenges for modern K8s apps
3 AI-powered optimization
4 Demo
3. Who Am I
● Obsessed with performance optimization
● 18+ years of capacity & performance work
● CMG speaker since 2014; best paper on Java performance & efficiency in 2015
● Co-founder and CTO @ Akamas, the software platform for autonomous optimization, powered by AI
4. Kubernetes has become the operating system of the cloud
96% of organizations are either using or evaluating Kubernetes
(Cloud Native Computing Foundation, Annual Survey 2021)
5. The dark side of Kubernetes
Three recurring pain points: cost efficiency, apps reliability, apps performance
Sources: Kubernetes FinOps Report (June 2021); Kubernetes failure stories: k8s.af; youtu.be/watch?v=4CT0cI62YHk; youtu.be/QXApVwRBeys
6. New challenges for cloud-native apps
100s-1000s of microservices, each with 10s-100s of inter-dependent configurations, across two layers:
Application runtime resource management:
● Memory sizing
● Garbage collection
● Compiler & thread settings
Kubernetes resource management:
● Container resource requests & limits
● Number of replicas
● Horizontal auto-scaling settings
7. Why is K8s so hard? K8s resource management
8. Resource requests drive K8s cluster costs
● Requests are the resources a container is guaranteed to get
● Cluster capacity is based on pod resource requests - there is no overcommitment!
● Resource requests != resource utilization: a cluster can be full even if utilization is 10%
[Diagram: a node with 4 CPUs and 8 GB of memory is filled by the resource requests of Pod A and Pod B (2 cores, 2 GB each), regardless of the resources actually used]
Resource requests come from the pod manifest:
apiVersion: v1
kind: Pod
metadata:
  name: pod-a
spec:
  containers:
  - name: app
    image: nginx:1.1
    resources:
      requests:
        memory: "2Gi"
        cpu: "2"
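To see how much of a node's capacity is already reserved by requests, one quick check (the node name is a placeholder; "Allocated resources" is the section kubectl prints for each node):
$ kubectl describe node <node-name> | grep -A 8 'Allocated resources'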
9. Resource limits may strongly impact application performance and stability
● A container can consume more resources than it has requested
● Resource limits let you specify the maximum resources a container can use (e.g. CPU = 2)
● When a container hits its resource limits, bad things can happen:
When hitting CPU limits: K8s throttles the container CPU -> application performance slowdown
When hitting memory limits: K8s kills the container -> application stability issues
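A minimal sketch of limits alongside requests, extending the manifest from the previous slide (the values are illustrative, not recommendations):
    resources:
      requests:
        memory: "2Gi"
        cpu: "2"
      limits:
        memory: "2Gi"
        cpu: "2"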
10. CPU throttling impacts cost & performance in surprising ways
[Chart: significant CPU throttling, with performance impact, even though container CPU usage stays below 40% of the limit]
SRE: "Why do I have CPU throttling if I'm using less than 40% of my CPU limit? Must be a K8s issue…"
"The container's CPU use is being throttled, because the container is attempting to use more CPU resources than its limit"
https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource
It is not a K8s issue: CPU limits are enforced per CFS scheduling period (100 ms by default), so a multi-threaded container can exhaust its whole quota in a fraction of each period and get throttled, even though its average utilization looks low.
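One way to check throttling directly from the container's cgroup, a sketch assuming cgroup v1 (under cgroup v2 the file is /sys/fs/cgroup/cpu.stat; the pod name is a placeholder):
$ kubectl exec <pod-name> -- cat /sys/fs/cgroup/cpu/cpu.stat
# nr_periods: elapsed CFS scheduling periods
# nr_throttled: periods in which the container hit its CPU quota
# throttled_time: total time throttled, in nanoseconds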
11. Setting resource requests and limits is required to ensure Kubernetes stability
"While your Kubernetes cluster might work fine without setting resource requests and limits, you will start running into stability issues as your teams and projects grow" (Google, Kubernetes best practices)
https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-resource-requests-and-limits
12. Why is K8s so hard? Application runtime resource management
13. App runtimes are highly configurable engines
"Because Java is so often deployed on servers, this kind of performance tuning is an essential activity for many organizations. The JVM is highly configurable, with literally hundreds of command-line options and switches. These switches provide performance engineers a gold mine of possibilities to explore in the pursuit of the optimal configuration for a given workload on a given platform."
$ docker run eclipse-temurin:11-alpine java -XX:+PrintFlagsFinal
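To get a sense of the scale, you can count the flags that command prints (the exact number varies by JVM version and platform):
$ docker run eclipse-temurin:11-alpine java -XX:+PrintFlagsFinal 2>/dev/null | wc -l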
14. Why is heap size tuning important? The JVM uses all of the available memory
[Chart: shrinking the JVM max heap from 2 GiB to 1.2 GiB cuts memory used by 40%, with no impact on app response time]
Key takeaways:
● The JVM tends to use all of the memory it has been configured with
● Sizing based on K8s container memory usage is going to miss a lot of savings
● Experiment with the JVM max heap size to see how much you can save - while monitoring app performance!
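One quick way to compare actual heap usage against the configured max on a running JVM (jcmd ships with the JDK; <pid> is a placeholder for the JVM process id):
$ jcmd <pid> GC.heap_info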
15. How does the JVM set the max heap size in K8s? JVM container-aware ergonomics
Max heap size is set by default to 25% of the container memory limit:
$ docker run --memory 1G eclipse-temurin:11-alpine java -XX:+PrintFlagsFinal 2>&1 | grep -w MaxHeapSize
size_t MaxHeapSize = 268435456 {product} {ergonomic}
You can tune the 25% via the -XX:MaxRAMPercentage parameter:
$ docker run --memory 1G eclipse-temurin:11-alpine java -XX:MaxRAMPercentage=50 -XX:+PrintFlagsFinal 2>&1 | grep -w MaxHeapSize
size_t MaxHeapSize = 536870912 {product} {ergonomic}
Alternatively, you can always set a fixed max heap size with the -Xmx parameter:
$ docker run --memory 1G eclipse-temurin:11-alpine java -Xmx1024M -XX:+PrintFlagsFinal 2>&1 | grep -w MaxHeapSize
size_t MaxHeapSize = 1073741824 {product} {command line}
16. JVM ergonomics in K8s are tricky
[Chart source: Microsoft]
Key takeaways:
● JVM ergonomics do a lot of magic, but they are tricky to understand and may do the wrong thing!
● The MaxRAMPercentage default is very conservative: increase it, but watch out for out-of-memory kills by K8s
● Do not trust JVM ergonomics: it's best to set JVM flags explicitly to avoid surprises (see the sketch below)
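A minimal sketch of setting the flags explicitly in a container spec, via the standard JAVA_TOOL_OPTIONS environment variable (the 75% value is illustrative - leave headroom for off-heap memory, covered two slides below):
    containers:
    - name: app
      image: eclipse-temurin:11-alpine
      env:
      - name: JAVA_TOOL_OPTIONS
        value: "-XX:MaxRAMPercentage=75"
      resources:
        limits:
          memory: "1Gi"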
17. OOM kills hurt your app reliability - but better heap sizing can fix them
Context: Java microservices getting restarted due to out-of-memory kills by K8s
[Chart: container memory used hits the container memory limit, triggering the K8s out-of-memory killer, with availability impact]
SRE: "My containers keep getting OOM killed… Is this a memory leak or a misconfiguration? Let's increase the memory limit just in case…"
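To confirm that restarts really are OOM kills rather than crashes, the container's last state carries the reason (the pod name is a placeholder; a Reason of OOMKilled means the container hit its memory limit):
$ kubectl describe pod <pod-name> | grep -A 3 'Last State'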
18. App runtime memory management
[Diagram: the K8s container memory limit must cover the JVM heap (initial and max heap size) plus the JVM off-heap areas: threads, classes, compiler, garbage collector]
Key takeaways:
● Max heap size is the main memory tuning parameter (e.g. JVM -Xmx or -XX:MaxRAMPercentage)
● Off-heap memory cannot be sized via configuration options - its usage depends on your application (200 MB up to 1 GB is common for the JVM)
● You need to monitor your app in production and take both spaces into account when sizing memory to achieve cost-efficient and reliable microservices
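To see where off-heap memory actually goes, one option is the JVM's Native Memory Tracking (it adds some overhead, so treat it as a diagnostic rather than an always-on setting; app.jar and <pid> are placeholders):
$ java -XX:NativeMemoryTracking=summary -jar app.jar
$ jcmd <pid> VM.native_memory summary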
19. GC tuning can lead to big cost benefits
[Chart: switching from G1 GC (-XX:+UseG1GC) to Parallel GC (-XX:+UseParallelGC) cut CPU used from 1500 to 600 millicores (-60%), with no impact on app response time]
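In K8s the switch is just a flag, for example reusing the JAVA_TOOL_OPTIONS approach sketched on slide 16 (whether Parallel GC wins depends on your workload - measure response time, as in the chart):
      env:
      - name: JAVA_TOOL_OPTIONS
        value: "-XX:+UseParallelGC"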
20. JVM default ergonomics in K8s: garbage collector
[Chart: default GC selection by container size - the JVM picks Serial GC when the container has fewer than 2 CPUs or too little memory (the chart marks 1791 MB as the largest Serial GC size), and G1 GC otherwise]
Key takeaways:
● Default GC selection is based on hard-coded thresholds defined decades ago
● You may end up paying the cost of a suboptimal GC, and you may not even know it!
● Other good collectors like Parallel GC are not considered
● Do not trust JVM ergonomics - always set your JVM options!
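A quick way to check which collector ergonomics actually picked for a given container size (the container sizes are illustrative):
$ docker run --cpus 1 --memory 1G eclipse-temurin:11-alpine java -XX:+PrintFlagsFinal 2>/dev/null | grep -E 'Use(Serial|Parallel|G1)GC '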
21. Golang: CPU reduction with GOGC tuning
[Chart: tuning GOGC cut CPU used from 400 to 180 millicores (-55%)]
Node.js has a lot of tuning flags as well (flaviocopes.com/node-runtime-v8-options)
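For reference, GOGC is set as an environment variable (the default is 100; higher values trade memory for fewer GC cycles - the value below is illustrative, not a recommendation):
      env:
      - name: GOGC
        value: "400"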
22. How to solve this problem? Performance engineering to the rescue!
23. The industry-standard performance tuning process
Analyze system performance -> Identify tuning parameters -> Change one parameter -> Test system with new config -> repeat
It's manual, slow and error-prone, requires deep skills, doesn't scale, and is not continuous…
Optimizing cloud-native applications requires a better approach!
24. Enter AI-driven optimization
25. Autonomous optimization key capabilities
26. Autonomous optimization process
27. The Akamas Platform: Optimization Studies and Live Optimizations
28. Demo: Reducing the cost of a Kubernetes microservice, while preserving app performance & reliability
29. Key takeaways
1. K8s enables unprecedented scalability & efficiency, but it's not automatic
2. Tuning is your responsibility - if you don't tune, you don't save!
3. The biggest cost & reliability wins lie in the K8s workload and app runtime layers - don't rely on ergonomics!
4. AI-powered optimization enables you to automate tuning and achieve savings at scale
31. Contacts
Email: info@akamas.io
Twitter: @AkamasLabs
LinkedIn: @akamaslabs
Italy HQ: Via Schiaffino 11, Milan, 20158 • +39-02-4951-7001
USA East: 211 Congress Street, Boston, MA 02110 • +1-617-936-0212
USA West: 12130 Millennium Drive, Los Angeles, CA 90094 • +1-323-524-0524
Singapore: 5 Temasek Blvd, Singapore 038985
© 2023 Akamas • All Rights Reserved • Confidential