© 2023 Akamas • All Rights Reserved • Confidential
Kubernetes performance
tuning dilemma: How to
solve it with AI
Stefano Doni, CTO
Agenda
1 The problem
2 Tuning challenges for modern K8s apps
3 AI-powered optimization
4 Demo
Who Am I
● Obsessed with performance optimization
● 18+ years of capacity & performance work
● CMG speaker since 2014, Best paper on Java performance & efficiency in 2015
● Co-founder and CTO @ Akamas, the software platform for autonomous optimization, powered by AI
Kubernetes has become the operating system of the cloud
96% of organizations are either using or evaluating Kubernetes
Source: Cloud Native Computing Foundation, Annual Survey 2021
The dark side of Kubernetes
youtu.be/watch?v=4CT0cI62YHk youtu.be/QXApVwRBeys
Cost efficiency Apps reliability Apps performance
Kubernetes FinOps Report, June 2021
Kubernetes failure stories: k8s.af
New challenges for cloud-native apps
100s-1000s microservices
10s-100s inter-dependent configurations
Application runtime resource management
● Memory sizing
● Garbage collection
● Compiler & thread settings
Kubernetes resource management
● Container resource requests & limits
● Number of replicas
● Horizontal auto-scaling settings
Why is K8s so hard?
K8s resource management
Resource requests drive K8s cluster costs
● Requests are the resources a container is guaranteed to get
● Cluster capacity is allocated based on pod resource requests - there is no overcommitment!
● Resource requests != resource utilization: a cluster can be full even if utilization is 10%
[Figure: a node (4 CPU, 8 GB memory) hosting Pod A and Pod B; axes: CPU (cores) and memory (GB); Pod A requests 2 cores and 2 GB of memory; the chart contrasts resource requests with resources used]
Resource requests from the pod manifest (Pod A):
apiVersion: v1
kind: Pod
metadata:
  name: pod-a
spec:
  containers:
  - name: app
    image: nginx:1.1
    resources:
      requests:
        memory: "2Gi"
        cpu: "2"
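The gap between requests and utilization can be made concrete with a small sketch. It assumes the 4-CPU node from the slide; the pod list, the usage figures, and the function names are hypothetical and only illustrate the accounting, not the real Kubernetes scheduler.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    cpu_request: float  # cores requested in the pod manifest
    cpu_used: float     # cores actually consumed (hypothetical measurement)


def fits(node_cpu: float, scheduled: list[Pod], candidate: Pod) -> bool:
    """A pod fits only if the sum of CPU requests stays within node capacity."""
    requested = sum(p.cpu_request for p in scheduled)
    return requested + candidate.cpu_request <= node_cpu


def cpu_utilization(node_cpu: float, scheduled: list[Pod]) -> float:
    """Fraction of node CPU actually in use, regardless of requests."""
    return sum(p.cpu_used for p in scheduled) / node_cpu


node_cpu = 4.0
pods = [Pod("pod-a", 2.0, 0.2), Pod("pod-b", 2.0, 0.2)]
new_pod = Pod("pod-c", 0.5, 0.1)

# Requests already add up to 4 cores, so the node is full for scheduling...
print("pod-c fits:", fits(node_cpu, pods, new_pod))
# ...even though only a small fraction of the CPU is actually used.
print("utilization:", cpu_utilization(node_cpu, pods))
```

In this made-up case the node cannot accept another pod while running at about 10% CPU utilization, which is the situation the third bullet describes.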
Resource limits may strongly impact application performance and stability
● A container can consume more resources than it has requested
● Resource limits let you specify the maximum resources a container can use (e.g. CPU = 2)
● When a container hits its resource limits, bad things can happen
● When hitting CPU limits: K8s throttles the container CPU -> application performance slowdown
● When hitting memory limits: K8s kills the container -> application stability issues
[Figure: CPU usage against the container CPU limit; memory usage against the container memory limit]
CPU throttling impacts cost & performance in surprising ways
[Chart: significant CPU throttling… with CPU < 40%; performance impact]
SRE: “Why do I have CPU throttling if I’m using less than 40% of my CPU limit? Must be a K8s issue…”
“The container's CPU use is being throttled, because the container is attempting to use more CPU resources than its limit”
https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource
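One way to see how throttling can coexist with low average CPU usage is to look at short enforcement periods rather than averages. The sketch below assumes that CPU limits are enforced as a quota per scheduling period (100 ms is the usual Linux CFS default); the workload numbers and the function name are hypothetical.

```python
def simulate_cpu_quota(demand_ms, cpu_limit, period_ms=100):
    """Simulate CPU quota enforcement over consecutive scheduling periods.

    demand_ms: CPU time (ms) the container newly wants in each period.
    cpu_limit: CPU limit in cores; quota per period is cpu_limit * period_ms.
    Returns (number of throttled periods, average utilization of the limit).
    """
    quota_ms = cpu_limit * period_ms
    backlog = 0.0      # work that did not fit into earlier periods
    used_total = 0.0
    throttled = 0
    for demand in demand_ms:
        wanted = backlog + demand
        used = min(wanted, quota_ms)
        if wanted > quota_ms:
            throttled += 1  # quota exhausted before the period ended
        backlog = wanted - used
        used_total += used
    return throttled, used_total / (quota_ms * len(demand_ms))


# Hypothetical bursty, multi-threaded workload: short spikes, long idle gaps.
demand = [300, 0, 0, 0, 250, 0, 0, 0, 40, 40]
throttled, utilization = simulate_cpu_quota(demand, cpu_limit=2)
print(f"throttled periods: {throttled}, average utilization: {utilization:.1%}")
```

With these made-up numbers the container is throttled in two periods while its average usage stays well below 40% of the limit, so a dashboard that averages over seconds or minutes would not show the bursts that cause the throttling.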
Fact #4: Setting resource requests and limits is required to ensure Kubernetes stability
“While your Kubernetes cluster might work fine without setting resource requests and limits, you will start running into stability issues as your teams and projects grow”
(Google, Kubernetes best practices)
https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-resource-requests-and-limits
Why is K8s so hard?
Application runtime resource
management
App runtimes are highly configurable engines
“Because Java is so often deployed on servers, this kind of
performance tuning is an essential activity for many
organizations.
The JVM is highly configurable with literally hundreds of
command-line options and switches. These switches provide
performance engineers a gold mine of possibilities to explore in
the pursuit of the optimal configuration for a given workload
on a given platform.”
$ docker run eclipse-temurin:11-alpine java -XX:+PrintFlagsFinal
Why is heap size tuning important? The JVM uses all of the available memory
[Chart: JVM heap used vs. JVM max heap, with app response time; 2 GiB vs. 1.2 GiB, -40% memory used]
Key Takeaways
● The JVM tends to use all of the memory it has been configured with
● Sizing based on K8s container memory usage is going to miss a lot of savings
● Experiment with JVM max heap size to see how much you can save - while monitoring app performance!
How does the JVM set the max heap size in K8s? JVM container-aware ergonomics
Max heap size is set by default to 25% of the container memory limit:
$ docker run --memory 1G eclipse-temurin:11-alpine java -XX:+PrintFlagsFinal 2>&1 | grep -w MaxHeapSize
size_t MaxHeapSize = 268435456 {product} {ergonomic}
You can tune the 25% via the -XX:MaxRAMPercentage parameter:
$ docker run --memory 1G eclipse-temurin:11-alpine java -XX:MaxRAMPercentage=50 -XX:+PrintFlagsFinal 2>&1 | grep -w MaxHeapSize
size_t MaxHeapSize = 536870912 {product} {ergonomic}
Alternatively, you can always set a fixed max heap size with the -Xmx parameter:
$ docker run --memory 1G eclipse-temurin:11-alpine java -Xmx1024M -XX:+PrintFlagsFinal 2>&1 | grep -w MaxHeapSize
size_t MaxHeapSize = 1073741824 {product} {command line}
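The MaxHeapSize values printed above follow from simple arithmetic on the container memory limit. The sketch below reproduces them; the function name is hypothetical, and the calculation ignores heap alignment and any other adjustments the JVM may apply.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def ergonomic_max_heap_bytes(container_limit_bytes: int,
                             max_ram_percentage: float = 25.0) -> int:
    """Max heap derived from the container memory limit and MaxRAMPercentage."""
    return int(container_limit_bytes * max_ram_percentage / 100)


# --memory 1G with the default percentage (25%)
print(ergonomic_max_heap_bytes(1 * GIB))

# --memory 1G with -XX:MaxRAMPercentage=50
print(ergonomic_max_heap_bytes(1 * GIB, 50))

# -Xmx1024M sets the value directly, independent of the container limit
print(1024 * MIB)
```

The three printed values correspond to the three MaxHeapSize lines in the docker output: a quarter of 1 GiB, half of 1 GiB, and exactly 1024 MiB.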
JVM ergonomics in K8s are tricky
Source: Microsoft
Key Takeaways
● JVM ergonomics make many decisions automatically, but they are tricky to understand and may do the wrong thing!
● The MaxRAMPercentage default is very conservative: increase it, but watch out for out-of-memory kills by K8s
● Do not trust JVM ergonomics: it’s best to explicitly set JVM flags to avoid surprises
OOM kills your app reliability - but better heap sizing can fix them
Context: Java microservices getting restarted due to out-of-memory kills by K8s
[Chart: container memory used hits the container memory limit, triggering the K8s out-of-memory killer; availability impact]
SRE: “My containers keep getting OOM killed… Is this a memory leak or a misconfiguration? Let’s increase the memory limit just in case…”
App runtime memory management
[Diagram: within the K8s container memory limit, JVM memory is split into heap (initial heap up to the JVM max heap size) and JVM off-heap (threads, classes, compiler, garbage collector)]
Key Takeaways
● Max heap size is the main memory tuning parameter (e.g. JVM -Xmx or -XX:MaxRAMPercentage)
● Off-heap memory cannot be capped by a single configuration option - its usage depends on your application (200 MB up to 1 GB is common for the JVM)
● You need to monitor your app in production and take both spaces into account when sizing memory to achieve cost-efficient and reliable microservices
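Taking both memory spaces into account can be sketched as a simple sum. The function name, the 10% headroom default, and the example figures below are assumptions for illustration; real off-heap usage has to come from monitoring the application.

```python
def container_memory_limit_mib(max_heap_mib: int,
                               off_heap_mib: int,
                               headroom_pct: int = 10) -> int:
    """Heap plus off-heap plus a safety margin, rounded up to a whole MiB."""
    total = max_heap_mib + off_heap_mib
    scaled = total * (100 + headroom_pct)
    return -(-scaled // 100)  # integer ceiling division


def risks_oom_kill(limit_mib: int, max_heap_mib: int, off_heap_mib: int) -> bool:
    """True if a full heap plus observed off-heap would exceed the limit."""
    return max_heap_mib + off_heap_mib > limit_mib


# Hypothetical service: 1024 MiB max heap, 300 MiB off-heap seen in production.
limit = container_memory_limit_mib(1024, 300)
print("suggested limit (MiB):", limit)

# Setting the limit equal to the max heap leaves no room for off-heap memory.
print("limit == heap risks OOM kill:", risks_oom_kill(1024, 1024, 300))
```

The second check illustrates why raising the heap up to the container limit tends to cause OOM kills: the off-heap part still needs its own space inside the same limit.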
GC tuning can lead to big cost benefits
[Chart: CPU used and app response time with G1 GC (-XX:+UseG1GC) vs. Parallel GC (-XX:+UseParallelGC); CPU used: 1500 millicores vs. 600 millicores, -60% CPU used]
JVM default ergonomics in K8s: garbage collector
[Chart: default collector by number of CPUs (1 to 8) and memory (MB): Serial GC or G1 GC, with a threshold at 1791 MB]
Key Takeaways
● Default GC selection is based on hard-coded thresholds defined decades ago
● You may end up paying the cost of a suboptimal GC, and you may not even know it!
● Other good collectors like Parallel GC are not considered
● Do not trust JVM ergonomics - always set your JVM options!
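The threshold-based selection described in the takeaways can be sketched as a two-condition rule. The thresholds below are read off the slide's chart (2 CPUs, 1791 MB); treating 1792 MB as the first value that selects G1 is an assumption, and the function name is hypothetical.

```python
def default_gc(cpus: int, memory_mb: int) -> str:
    """Approximate the JVM's default collector choice inside a container.

    Assumption: G1 is chosen only when the container has at least 2 CPUs
    and at least 1792 MB of memory; otherwise Serial GC is used.
    """
    if cpus >= 2 and memory_mb >= 1792:
        return "G1"
    return "Serial"


for cpus, memory_mb in [(1, 4096), (2, 1791), (2, 1792), (4, 8192)]:
    print(f"{cpus} CPU, {memory_mb} MB -> {default_gc(cpus, memory_mb)}")
```

Note that no input ever yields Parallel GC: it only comes into play when selected explicitly, for example with -XX:+UseParallelGC.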
Golang CPU reduction with GOGC tuning
[Chart: CPU used: 400 millicores vs. 180 millicores, -55% CPU used]
Node.js has a lot of tuning flags as well (flaviocopes.com/node-runtime-v8-options)
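As a rough intuition for why GOGC affects CPU usage: GOGC sets how much the heap may grow over the live heap before the next collection starts (the default is 100, i.e. 100% growth). The sketch below uses a simplified version of that pacing rule; the function name and the heap figures are hypothetical, and the real Go runtime takes more inputs into account.

```python
def next_gc_heap_target_mib(live_heap_mib: float, gogc: int = 100) -> float:
    """Simplified pacing rule: collect when the heap reaches live * (1 + GOGC/100)."""
    return live_heap_mib * (1 + gogc / 100)


live = 100.0  # MiB still reachable after the previous collection (made-up value)
for gogc in (50, 100, 400):
    target = next_gc_heap_target_mib(live, gogc)
    print(f"GOGC={gogc}: next collection at about {target:.0f} MiB")
```

A higher GOGC lets more garbage accumulate between collections, so the collector runs less often and uses less CPU, at the price of a larger memory footprint.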
How to solve this problem?
Performance Engineering to
the rescue!
The industry standard performance tuning process
Analyze system performance -> Identify tuning parameters -> Change one parameter -> Test system with new config -> repeat
It’s manual, slow and error-prone, requires deep skills, doesn’t scale, and is not continuous…
Optimizing cloud-native applications requires a better approach!
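The manual cycle can be written down as a loop, which makes its cost visible. This is a toy sketch: the parameter names, candidate values, and the `toy_test` cost function are all hypothetical stand-ins for a real load test.

```python
from typing import Callable, Dict, List

Config = Dict[str, int]


def tune_one_at_a_time(baseline: Config,
                       candidates: Dict[str, List[int]],
                       run_test: Callable[[Config], float]) -> Config:
    """Change one parameter per trial; keep the change only if the cost improves."""
    best = dict(baseline)
    best_cost = run_test(best)
    for name, values in candidates.items():
        for value in values:
            trial = {**best, name: value}
            cost = run_test(trial)  # in practice: deploy, load test, measure
            if cost < best_cost:
                best, best_cost = trial, cost
    return best


def toy_test(config: Config) -> float:
    # Stand-in for a real test: distance from a made-up optimum.
    return abs(config["heap_mib"] - 1024) + abs(config["cpu_millicores"] - 600)


baseline = {"heap_mib": 2048, "cpu_millicores": 1500}
candidates = {"heap_mib": [1536, 1024, 512],
              "cpu_millicores": [1000, 600, 300]}
print(tune_one_at_a_time(baseline, candidates, toy_test))
```

Even this tiny example needs seven test runs for two parameters with three candidate values each, and it only works because the toy cost function has no interactions between parameters. With 10s-100s of inter-dependent settings across many microservices, the number of runs grows quickly and one-at-a-time changes can miss the best combination.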
Enter AI-driven
Optimization
Autonomous optimization key capabilities
Autonomous optimization process
Optimization
Studies
Live
Optimizations
The Akamas Platform
Reducing cost of a Kubernetes
microservice, while preserving
app performance & reliability
Demo
Key takeaways
● K8s enables unprecedented scalability & efficiency, but it’s not automatic
● Tuning is your responsibility - if you don’t tune, you don’t save!
● The biggest cost & reliability wins lie in K8s workload and app runtime layers - don’t rely on ergonomics!
● AI-powered optimization enables you to automate tuning and achieve savings at scale
Q&A
Contacts
info@akamas.io
@AkamasLabs
@akamaslabs
Italy HQ
Via Schiaffino 11
Milan, 20158
+39-02-4951-7001
USA East
211 Congress Street
Boston, MA 02110
+1-617-936-0212
Singapore
5 Temasek Blvd
Singapore 038985
USA West
12130 Millennium Drive
Los Angeles, CA 90094
+1-323-524-0524
LinkedIn Twitter
Email
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 

Recently uploaded (20)

New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 

GDG Cloud Southlake #20: Stefano Doni: Kubernetes performance tuning dilemma: How to solve it with AI

  • 1. Kubernetes performance tuning dilemma: How to solve it with AI. Stefano Doni, CTO
  • 2. Agenda: 1. The problem 2. Tuning challenges for modern K8s apps 3. AI-powered optimization 4. Demo
  • 3. Who Am I ● Obsessed with performance optimization ● 18+ years of capacity & performance work ● CMG speaker since 2014, Best paper on Java performance & efficiency in 2015 ● Co-founder and CTO @ Akamas, the software platform for autonomous optimization, powered by AI
  • 4. Kubernetes has become the operating system of the cloud. 96% of organizations are either using or evaluating Kubernetes (Cloud Native Computing Foundation, Annual Survey 2021)
  • 5. The dark side of Kubernetes: cost efficiency, apps reliability, apps performance. Sources: Kubernetes FinOps Report, 2021 June; Kubernetes failure stories: k8s.af; youtu.be/watch?v=4CT0cI62YHk; youtu.be/QXApVwRBeys
  • 6. New challenges for cloud-native apps: 100s-1000s microservices, 10s-100s inter-dependent configurations. Application runtime resource management: ● Memory sizing ● Garbage collection ● Compiler & thread settings. Kubernetes resource management: ● Container resource requests & limits ● Number of replicas ● Horizontal auto-scaling settings
  • 7. Why is K8s so hard? K8s resource management
  • 8. Resource requests drive K8s cluster costs ● Requests are resources the container is guaranteed to get ● Cluster capacity is based on pod resource requests - there is no overcommitment! ● Resource requests != resource utilization: a cluster can be full even if utilization is 10%
      [Figure: a node (4 CPU, 8 GB memory) hosting Pod A and Pod B, comparing CPU and memory requested with resources used. Pod A requests 2 cores and 2 GB memory.]
      Resource requests from the pod manifest (Pod A):
      apiVersion: v1
      kind: Pod
      metadata:
        name: pod-a
      spec:
        containers:
        - name: app
          image: nginx:1.1
          resources:
            requests:
              memory: "2Gi"
              cpu: "2"
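
The point that a node can be "full" at 10% utilization can be checked with a little arithmetic. The following is a minimal sketch, not the scheduler's actual implementation: the `Pod` class, the `fits` helper, Pod B's request, and the usage figures are all assumptions made for illustration, and the node's full 4 CPUs are treated as allocatable (real nodes reserve some capacity for system components).

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    cpu_request: float  # cores requested in the pod manifest
    cpu_used: float     # cores actually consumed (assumed figure)


def node_summary(node_cpu, pods):
    """Return (fraction of node CPU requested, fraction actually used)."""
    requested = sum(p.cpu_request for p in pods)
    used = sum(p.cpu_used for p in pods)
    return requested / node_cpu, used / node_cpu


def fits(node_cpu, pods, new_pod):
    """A pod is placed only if the sum of requests still fits; usage is ignored."""
    requested = sum(p.cpu_request for p in pods)
    return requested + new_pod.cpu_request <= node_cpu


# Node with 4 CPUs; both pods request 2 cores but use only 0.2 cores each.
pods = [Pod("pod-a", 2.0, 0.2), Pod("pod-b", 2.0, 0.2)]
requested, used = node_summary(4.0, pods)
print(f"requested={requested:.0%} used={used:.0%}")

# A third, small pod does not fit even though the node is almost idle.
print(fits(4.0, pods, Pod("pod-c", 0.5, 0.1)))
```

In this toy setup the node is 100% requested while only about 10% used, which is why oversized requests translate directly into cluster cost.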
  • 9. Resource limits may strongly impact application performance and stability ● A container can consume more resources than it has requested ● Resource limits specify the maximum resources a container can use (e.g. CPU = 2) ● When a container hits its resource limits, bad things can happen. When hitting CPU limits: K8s throttles the container CPU -> application performance slowdown. When hitting memory limits: K8s kills the container -> application stability issues. [Figure: CPU usage against the container CPU limit; memory usage against the container memory limit.]
  • 10. CPU throttling impacts cost & performance in surprising ways. Significant CPU throttling… with CPU < 40%, and a performance impact. SRE: "Why do I have CPU throttling if I'm using less than 40% of my CPU limit? Must be a K8s issue…" Kubernetes documentation: "The container's CPU use is being throttled, because the container is attempting to use more CPU resources than its limit" (https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource)
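
One common explanation for throttling at low average utilization is that the CPU limit is enforced as a quota per short scheduling period (100 ms by default in the Linux CFS bandwidth controller), so a multi-threaded burst can exhaust the quota inside one period even when the average over a second is small. The sketch below is a simplified model under that assumption; the function name and the workload figures are hypothetical and are not taken from the talk.

```python
PERIOD_MS = 100  # default CFS enforcement period


def simulate_burst(limit_cpus, threads, cpu_ms_per_thread, periods):
    """Simulate one burst arriving at t=0 and return (avg utilization, throttled periods)."""
    quota_ms = limit_cpus * PERIOD_MS              # CPU time allowed per period
    remaining_ms = threads * cpu_ms_per_thread     # total CPU work in the burst
    used_ms = 0
    throttled_periods = 0
    for _ in range(periods):
        # CPU time the threads would burn in this period if nothing stopped them
        wanted_ms = min(remaining_ms, threads * PERIOD_MS)
        if wanted_ms > quota_ms:
            throttled_periods += 1                 # quota exhausted before the period ends
        ran_ms = min(wanted_ms, quota_ms)
        used_ms += ran_ms
        remaining_ms -= ran_ms
    return used_ms / (quota_ms * periods), throttled_periods


# Limit of 1 CPU, 4 threads needing 50 ms of CPU each, observed over 1 second.
avg_util, throttled = simulate_burst(limit_cpus=1, threads=4,
                                     cpu_ms_per_thread=50, periods=10)
print(f"average utilization={avg_util:.0%}, throttled periods={throttled}")
```

Here the container averages 20% of its limit over the second, yet the burst is still throttled in its first period, which delays the requests being served at that moment.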
  • 11. Fact #4: Setting resource requests and limits is required to ensure Kubernetes stability. “While your Kubernetes cluster might work fine without setting resource requests and limits, you will start running into stability issues as your teams and projects grow” (Google, Kubernetes best practices) https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-resource-requests-and-limits
  • 12. Why is K8s so hard? Application runtime resource management
  • 13. App runtimes are highly configurable engines. “Because Java is so often deployed on servers, this kind of performance tuning is an essential activity for many organizations. The JVM is highly configurable with literally hundreds of command-line options and switches. These switches provide performance engineers a gold mine of possibilities to explore in the pursuit of the optimal configuration for a given workload on a given platform.” To list the options: $ docker run eclipse-temurin:11-alpine java -XX:+PrintFlagsFinal
  • 14. Why is heap size tuning important? JVM uses all of the available memory. [Figure: JVM heap used, JVM max heap and app response time, at 2 GiB and 1.2 GiB: -40% memory used.] Key Takeaways ● The JVM tends to use all of the memory it has been configured with ● Sizing based on K8s container memory usage is going to miss a lot of savings ● Experiment with JVM max heap size to see how much you can save - while monitoring app performance!
  • 15. How does the JVM set the max heap size in K8s? JVM container-aware ergonomics
      Max heap size is set by default to 25% of the container memory limit:
      $ docker run --memory 1G eclipse-temurin:11-alpine java -XX:+PrintFlagsFinal 2>&1 | grep -w MaxHeapSize
      size_t MaxHeapSize = 268435456 {product} {ergonomic}
      You can tune the 25% via the -XX:MaxRAMPercentage parameter:
      $ docker run --memory 1G eclipse-temurin:11-alpine java -XX:MaxRAMPercentage=50 -XX:+PrintFlagsFinal 2>&1 | grep -w MaxHeapSize
      size_t MaxHeapSize = 536870912 {product} {ergonomic}
      Alternatively, you can always set a fixed max heap size with the -Xmx parameter:
      $ docker run --memory 1G eclipse-temurin:11-alpine java -Xmx1024M -XX:+PrintFlagsFinal 2>&1 | grep -w MaxHeapSize
      size_t MaxHeapSize = 1073741824 {product} {command line}
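
The MaxHeapSize values printed above follow from simple percentage arithmetic on the container memory limit. A minimal sketch that reproduces them is shown here; the function name is an assumption, and the model ignores heap alignment and the JVM's special handling of very small containers.

```python
GIB = 1024 ** 3


def expected_max_heap_bytes(container_limit_bytes, max_ram_percentage=25.0):
    """Approximate the ergonomic max heap: a percentage of the container memory limit."""
    return int(container_limit_bytes * max_ram_percentage / 100)


print(expected_max_heap_bytes(1 * GIB))        # default 25%
print(expected_max_heap_bytes(1 * GIB, 50.0))  # -XX:MaxRAMPercentage=50
```

With a 1 GiB limit this yields 268435456 and 536870912 bytes, matching the two ergonomic values in the docker output.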
  • 16. JVM ergonomics in K8s are tricky (source: Microsoft). Key Takeaways ● JVM ergonomics do a lot of magic stuff, but they are tricky to understand and may do the wrong thing! ● MaxRAMPercentage default is very conservative: increase it, but watch out for out-of-memory kills by K8s ● Do not trust JVM ergonomics: it’s best to explicitly set JVM flags to avoid surprises
  • 17. OOM kills your app reliability - but better heap sizing can fix them. Context: Java microservices getting restarted due to out-of-memory kill by K8s. Container memory used hits the memory limit, triggering the K8s out-of-memory killer (availability impact). SRE: "My containers keep getting OOM killed… Is this a memory leak or a misconfiguration? Let’s increase the memory limit just in case…" [Figure: container memory used against the container memory limit.]
  • 18. App runtime memory management. Key Takeaways ● Max heap size is the main memory tuning parameter (e.g. JVM -Xmx or -XX:MaxRAMPercentage) ● Off-heap cannot be sized via configuration options - memory usage depends on your application (200 MB up to 1 GB is common for the JVM) ● You need to monitor your app in production and take both spaces into account when sizing memory to achieve cost-efficient and reliable microservices. [Figure: JVM memory inside the K8s container memory limit: heap (initial heap up to JVM max heap size) plus JVM off-heap (threads, classes, compiler, garbage collector).]
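
Taking both memory spaces into account can be sketched as a small sizing calculation. This is an illustration under stated assumptions, not a sizing rule from the talk: the function names, the 10% headroom, and the 300 MiB off-heap estimate are hypothetical, and the off-heap figure in particular has to come from monitoring the real application.

```python
import math


def container_memory_limit_mib(max_heap_mib, off_heap_mib, headroom_pct=10):
    """Memory limit that covers heap plus observed off-heap, with some headroom."""
    total = max_heap_mib + off_heap_mib
    return math.ceil(total * (100 + headroom_pct) / 100)


def equivalent_max_ram_percentage(max_heap_mib, limit_mib):
    """Heap as a percentage of the limit, usable as a -XX:MaxRAMPercentage value."""
    return round(100 * max_heap_mib / limit_mib, 1)


# Example: 1024 MiB heap, 300 MiB off-heap measured in production (assumed).
limit = container_memory_limit_mib(1024, 300)
pct = equivalent_max_ram_percentage(1024, limit)
print(f"memory limit={limit}Mi, heap share={pct}%")
```

Sizing the limit from the heap alone would leave no room for off-heap memory, which is one way containers end up OOM killed without any leak.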
  • 19. GC tuning can lead to big cost benefits. [Figure: CPU used and app response time for G1 GC (-XX:+UseG1GC) and Parallel GC (-XX:+UseParallelGC): 1500 millicores and 600 millicores, -60% CPU used.]
  • 20. JVM default ergonomics in K8s: garbage collector. [Figure: default collector by number of CPUs and memory (MB), with Serial GC and G1 GC regions separated at 1 CPU and 1791 MB.] Key Takeaways ● Default GC selection is based on hard-coded thresholds defined decades ago ● You may end up paying the cost of a suboptimal GC, and you may not even know it! ● Other good collectors like Parallel GC are not considered ● Do not trust JVM ergonomics - always set your JVM options!
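
The hard-coded thresholds can be written down as a two-line rule. The sketch below assumes the commonly described HotSpot "server-class machine" check (at least 2 CPUs and at least 1792 MB of memory, consistent with the 1 CPU / 1791 MB boundaries on the slide); the function and constant names are hypothetical, and the result is worth confirming on your own JDK by checking UseSerialGC and UseG1GC in the -XX:+PrintFlagsFinal output.

```python
SERVER_CLASS_MIN_CPUS = 2          # assumed threshold
SERVER_CLASS_MIN_MEMORY_MB = 1792  # assumed threshold


def default_gc(cpus, memory_mb):
    """Predict the collector picked by ergonomics when no GC flag is set."""
    if cpus >= SERVER_CLASS_MIN_CPUS and memory_mb >= SERVER_CLASS_MIN_MEMORY_MB:
        return "G1 GC"
    return "Serial GC"


# Typical small container sizes and the collector each would get by default.
for cpus, memory_mb in [(1, 4096), (2, 1024), (2, 1791), (2, 1792), (4, 8192)]:
    print(f"{cpus} CPU, {memory_mb} MB -> {default_gc(cpus, memory_mb)}")
```

Under this rule a container with a 1-CPU limit gets Serial GC no matter how much memory it has, which is easy to miss when limits are tuned for cost.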
  • 21. Golang CPU reduction with GOGC tuning: 400 millicores -> 180 millicores, -55% CPU used. Node.js has a lot of tuning flags as well (flaviocopes.com/node-runtime-v8-options)
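
Why GOGC affects CPU can be seen from a simplified model of the Go collector's pacing: the next collection is triggered when the heap reaches roughly live heap × (1 + GOGC/100), so a higher GOGC means fewer collections at the price of more memory. The sketch below uses that approximation only; it ignores GC roots and any memory limit setting, the function names and workload numbers are assumptions, and it does not attempt to reproduce the 55% figure above.

```python
def heap_target_mib(live_heap_mib, gogc=100):
    """Approximate heap size at which the next GC cycle starts."""
    return live_heap_mib * (1 + gogc / 100)


def gc_cycles_per_second(alloc_rate_mib_s, live_heap_mib, gogc=100):
    """Collections per second for a steady allocation rate under this model."""
    headroom_mib = heap_target_mib(live_heap_mib, gogc) - live_heap_mib
    return alloc_rate_mib_s / headroom_mib


# Assumed workload: 100 MiB live heap, allocating 50 MiB/s.
for gogc in (50, 100, 200):
    target = heap_target_mib(100, gogc)
    cycles = gc_cycles_per_second(50, 100, gogc)
    print(f"GOGC={gogc}: heap target={target:.0f} MiB, {cycles:.2f} GC cycles/s")
```

Doubling GOGC halves the GC frequency in this model while raising the heap target, so the CPU saving has to be weighed against the container memory limit.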
  • 22. How to solve this problem? Performance Engineering to the rescue!
  • 23. The industry standard performance tuning process: Analyze system performance -> Identify tuning parameters -> Change one parameter -> Test system with new config. It’s manual, slow and error-prone, requires deep skills, doesn’t scale, is not continuous… Optimizing cloud-native applications requires a better approach!
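
The manual loop above can be sketched in a few lines to show where the effort goes. Everything here is hypothetical: `measure_cost` is a toy stand-in for a real load test, and the parameter names and candidate values are invented for illustration. The sketch is of the one-parameter-at-a-time process itself, not of any AI-driven optimizer.

```python
from math import prod


def measure_cost(config):
    """Hypothetical stand-in for a load test; returns a cost score (lower is better)."""
    heap, cpu = config["max_heap_mib"], config["cpu_limit_millicores"]
    # Toy model in which the two parameters interact with each other.
    return (heap - 1200) ** 2 / 1000 + (cpu - 600) ** 2 / 1000 + abs(heap - 2 * cpu) / 10


def tune_one_at_a_time(baseline, candidates):
    """Change one parameter per test, keeping a value only if the cost improves."""
    best = dict(baseline)
    best_cost = measure_cost(best)
    tests_run = 1  # the baseline measurement
    for name, values in candidates.items():
        for value in values:
            trial = {**best, name: value}
            cost = measure_cost(trial)
            tests_run += 1
            if cost < best_cost:
                best, best_cost = trial, cost
    return best, best_cost, tests_run


baseline = {"max_heap_mib": 2048, "cpu_limit_millicores": 1500}
candidates = {
    "max_heap_mib": [1024, 1200, 1536, 2048],
    "cpu_limit_millicores": [400, 600, 1000, 1500],
}
best, best_cost, tests_run = tune_one_at_a_time(baseline, candidates)
full_grid = prod(len(values) for values in candidates.values())
print(best, tests_run, full_grid)
```

Each iteration of the inner loop is a full load test in practice, and the full grid grows multiplicatively with every added parameter, which is why the manual process stops scaling once tens of inter-dependent settings are involved.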
  • 24. Enter AI-driven Optimization
  • 25. Autonomous optimization key capabilities
  • 26. Autonomous optimization process
  • 27. The Akamas Platform: Optimization Studies, Live Optimizations
  • 28. Demo: Reducing cost of a Kubernetes microservice, while preserving app performance & reliability
  • 29. Key takeaways ● K8s enables unprecedented scalability & efficiency, but it’s not automatic ● Tuning is your responsibility - if you don’t tune, you don’t save! ● The biggest cost & reliability wins lie in K8s workload and app runtime layers - don’t rely on ergonomics! ● AI-powered optimization enables you to automate tuning and achieve savings at scale
  • 30. Q&A
  • 31. Contacts: info@akamas.io, @AkamasLabs, @akamaslabs (LinkedIn, Twitter, Email). Italy HQ: Via Schiaffino 11, Milan, 20158, +39-02-4951-7001. USA East: 211 Congress Street, Boston, MA 02110, +1-617-936-0212. USA West: 12130 Millennium Drive, Los Angeles, CA 90094, +1-323-524-0524. Singapore: 5 Temasek Blvd, Singapore 038985.