Optimizing Application Performance on Kubernetes
Dinakar Guniguntala (@dinogun)
Runtimes Cloud Architect, Red Hat
About Me
● Architect, Runtime Cloud Optimization
● Former Maintainer, AdoptOpenJDK Community Docker Images
● Interested in every aspect of running Java apps in K8s, including cloud native as well as legacy migration to the cloud
● Ex Linux kernel and glibc hacker
Kubernetes is a portable, extensible, open-source platform for managing containerized workloads and services, that facilitates both declarative configuration … blah blah blah
Kitna Deti Hai?*
Any questions?
* What's the mileage?
● Throughput
● Response Time
● Utilization
Public
Private
Public
Public
Lower My Response Time!
What is the granularity of observation?
● Trade-off between accurate info and overhead
Additional Operational Info
● Quarkus Micrometer
● Spring Actuator
● Liberty MicroProfile
● Node.js prom-client
Observability
BIOS
● CPU Power and Performance Policy: <Performance>
OS / Hypervisor
● CPU Scaling governor: <Performance>
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
performance powersave
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
performance
Hyperthreading
● Do not count hyperthreads as full cores when capacity planning
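The governor check above looks at cpu0 only. A minimal sketch of the same check across every CPU follows, assuming the standard Linux sysfs layout shown in the commands; the function names are illustrative, not part of any tool.

```python
from pathlib import Path


def read_governors(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_name: governor} for every CPU that exposes cpufreq."""
    governors = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for gov_file in sorted(Path(cpu_root).glob(pattern)):
        cpu_name = gov_file.parent.parent.name  # e.g. "cpu0"
        governors[cpu_name] = gov_file.read_text().strip()
    return governors


def cpus_not_in_performance(governors):
    """List the CPUs whose governor is anything other than "performance"."""
    return [cpu for cpu, gov in governors.items() if gov != "performance"]


if __name__ == "__main__":
    offenders = cpus_not_in_performance(read_governors())
    if offenders:
        print("Not using the performance governor:", ", ".join(offenders))
    else:
        print("No CPU reports a non-performance governor")
```

On virtual machines the cpufreq directory is often absent, in which case the dictionary is simply empty and the setting has to be checked on the hypervisor host.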
Don’t Forget The Hardware
Node Affinity
SRE
Lower My Response Time!
Pod Affinity
Node Affinity
● Helps to match workloads to the right resources
Pod Affinity
● Helps to schedule related pods together
Node and Pod Affinities
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: security
            operator: In
            values:
            - S1
        topologyKey: topology.kubernetes.io/zone
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: security
              operator: In
              values:
              - S2
          topologyKey: topology.kubernetes.io/zone
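To make the matchExpressions in the affinity rules above concrete, here is a small sketch of how a label selector is evaluated against a pod's labels. It is an illustration of the selector semantics only, not the scheduler's code, and the function names are hypothetical.

```python
def expression_matches(labels, expr):
    """Evaluate one matchExpressions entry against a dict of pod labels."""
    key, op = expr["key"], expr["operator"]
    values = expr.get("values", [])
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        # A missing key also satisfies NotIn
        return key not in labels or labels[key] not in values
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    raise ValueError(f"unsupported operator: {op}")


def selector_matches(labels, match_expressions):
    """All expressions must hold for the selector to match."""
    return all(expression_matches(labels, e) for e in match_expressions)


# The podAffinity term above selects pods labelled security=S1
affinity_term = [{"key": "security", "operator": "In", "values": ["S1"]}]
print(selector_matches({"security": "S1"}, affinity_term))
print(selector_matches({"security": "S2"}, affinity_term))
```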
CPU Request / Limit Memory Request / Limit
SRE
Lower My Response Time!
Node Affinity Pod Affinity
K8s QoS classes: Guaranteed, Burstable, BestEffort
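As a rough illustration of how the three QoS classes follow from requests and limits, the sketch below classifies a pod from its containers' resource settings. It is a simplification: quantities are compared as plain strings (so "1" and "1000m" would not be treated as equal), and the function name is hypothetical.

```python
def qos_class(containers):
    """containers: list of {"requests": {...}, "limits": {...}} dicts."""
    any_set = False
    guaranteed = True
    for c in containers:
        requests = c.get("requests", {})
        limits = c.get("limits", {})
        if requests or limits:
            any_set = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # A missing request defaults to the limit
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


print(qos_class([{"requests": {"cpu": "2", "memory": "500M"},
                  "limits": {"cpu": "3", "memory": "1024M"}}]))
```

Requests below limits, as in the example call, put the pod in the Burstable class; setting requests equal to limits for both CPU and memory makes it Guaranteed.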
Right Size
apiVersion: apps/v1
kind: Deployment
metadata:
  name: acmeair
  labels:
    app: acmeair-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: acmeair-deployment
  template:
    metadata:
      labels:
        name: acmeair-deployment
        app: acmeair-deployment
        app.kubernetes.io/name: "acmeair-mono"
        version: v1
    spec:
      volumes:
      - name: test-volume
        hostPath:
          path: "/root/icp/jLogs"
          type: ""
      containers:
      - name: acmeair-libertyapp
        image: dinogun/acmeair-monolithic
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
        resources:
          requests:
            memory: 500M
            cpu: 2
          limits:
            memory: 1024M
            cpu: 3
        volumeMounts:
        - name: "test-volume"
          mountPath: "/opt/jLogs"
Ensure LimitRange does not get in the way of your deployment!
apiVersion: v1
kind: LimitRange
metadata:
  name: limit-range
spec:
  limits:
  - default:
      cpu: 1
      memory: 512Mi
    defaultRequest:
      cpu: 0.5
      memory: 256Mi
    type: Container
Requests → Should cover the observed peaks
Limits → Handle any spikes!
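One way to read the rule above is sketched here: take the request from the observed peak and add headroom on top for the limit. The 1.5x headroom factor and the function name are assumptions for illustration only; the right factor depends on the workload.

```python
def suggest_cpu(samples, spike_headroom=1.5):
    """Suggest a CPU request/limit (in cores) from observed usage samples.

    spike_headroom is a hypothetical multiplier, not a recommended value.
    """
    if not samples:
        raise ValueError("need at least one usage sample")
    peak = max(samples)
    return {
        "request": round(peak, 2),                  # covers the observed peak
        "limit": round(peak * spike_headroom, 2),   # room for spikes
    }


# Usage samples in cores, e.g. scraped from a metrics system
print(suggest_cpu([0.8, 1.2, 2.0, 1.4]))
```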
CPU Request / Limit Memory Request / Limit
SRE
Java Heap Size / Ratio
Lower My Response Time!
Node Affinity Pod Affinity
Container Aware JVM
Use -XX:MaxRAMPercentage and -XX:InitialRAMPercentage instead of -Xmx and -Xms.
Don’t Hardcode the Java Heap!
Comparing a fixed heap size with a “MaxRAMPercentage” setting (here “-XX:MaxRAMPercentage=80”)
Container Mem    Fixed heap    With -XX:MaxRAMPercentage=80
2G               -Xmx = 2G     Heap = 1.6G
3G               -Xmx = 2G     Heap = 2.4G
4G               -Xmx = 2G     Heap = 3.2G
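The heap figures above are simple arithmetic on the container memory limit. A minimal sketch that reproduces them, and shows how much memory each approach leaves for non-heap use, might look like this (function names are illustrative):

```python
def heap_from_percentage(container_mem_gb, max_ram_percentage):
    """Max heap the JVM would pick for a given container memory limit."""
    return container_mem_gb * max_ram_percentage / 100


def non_heap_headroom(container_mem_gb, heap_gb):
    """Memory left for metaspace, thread stacks, code cache and so on."""
    return container_mem_gb - heap_gb


for mem in (2, 3, 4):
    pct_heap = heap_from_percentage(mem, 80)
    print(f"container={mem}G  "
          f"fixed -Xmx2G leaves {non_heap_headroom(mem, 2):.1f}G  "
          f"MaxRAMPercentage=80 gives heap {pct_heap:.1f}G, "
          f"leaves {non_heap_headroom(mem, pct_heap):.1f}G")
```

With a hardcoded 2G heap, the 2G container has nothing left for non-heap memory while the 4G container leaves half its memory unused by the heap; the percentage setting scales with the container.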
Beware of Default HotSpot Settings
If the container has fewer than 2 CPUs or less than about 1792 MB of memory, the JVM treats it as a “client-class” machine and the default is “serial GC”!
CPU Request / Limit Memory Request / Limit
SRE
Java Heap Size / Ratio
Lower My Response Time!
Node Affinity Pod Affinity
VPA HPA CA
It’s All About the Scaling
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Pods
    pods:
      metric:
        name: packets-per-second
      target:
        type: AverageValue
        averageValue: 1k
  - type: Object
    object:
      metric:
        name: requests-per-second
      describedObject:
        apiVersion: networking.k8s.io/v1beta1
        kind: Ingress
        name: main-route
      target:
        type: Value
        value: 10k
Set HPA with app specific metrics
  - type: External
    external:
      metric:
        name: concurrent_connections
        selector:
          matchLabels:
            connection: current
      target:
        type: Value
        value: 1200
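For intuition on how the HPA turns a metric target into a replica count, the sketch below applies the scaling rule described in the Kubernetes HPA documentation: scale the current replica count by the ratio of current to target metric, round up, and clamp to the min/max. The 0.1 tolerance is the documented default; the function name is illustrative, and real HPA behaviour also involves stabilization windows and pod readiness that are left out here.

```python
import math


def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas, max_replicas, tolerance=0.1):
    """Approximate the HPA replica calculation for a single metric."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: no scaling
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# 4 replicas averaging 90% CPU against the 50% target used above
print(desired_replicas(4, 90, 50, min_replicas=1, max_replicas=10))
```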
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: zk-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: zookeeper
Use PodDisruptionBudget with CA to ensure no service disruption
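To see what maxUnavailable: 1 means when the Cluster Autoscaler wants to drain a node, here is a small sketch of the bookkeeping: the budget allows an eviction only while enough healthy pods remain. This is a simplified model with a hypothetical function name, not the disruption controller itself.

```python
def disruptions_allowed(expected_pods, healthy_pods, max_unavailable):
    """How many voluntary evictions the budget currently permits."""
    desired_healthy = expected_pods - max_unavailable
    return max(0, healthy_pods - desired_healthy)


# A 3-pod ensemble with maxUnavailable: 1
print(disruptions_allowed(expected_pods=3, healthy_pods=3, max_unavailable=1))
print(disruptions_allowed(expected_pods=3, healthy_pods=2, max_unavailable=1))
```

Once one pod is already down, the budget drops to zero and further voluntary evictions wait until the pod is healthy again.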
SRE
Optimize the App Stack?
Let's Take a Step Back
Life of an SRE?!
Finance
Developer
User
So what do we need here?
● Multiple stakeholders to express requirements as an “Objective Function”
● Autonomously detect all the right options that try to match the “Objective Function”
● Try options intelligently and provide a recommendation
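The idea of searching a space of options against an objective function can be sketched in a few lines. The example below uses plain random search from the standard library as a stand-in technique; it is not how Autotune searches. The search space, the measure callback and all names are hypothetical.

```python
import random


def random_search(search_space, measure, direction="minimize",
                  trials=20, seed=0):
    """Try random configs and keep the best one per the objective.

    search_space: {tunable_name: (low, high)} with integer bounds.
    measure: callable returning the objective value for a config
             (in practice, the result of running an experiment).
    """
    if direction not in ("minimize", "maximize"):
        raise ValueError("direction must be 'minimize' or 'maximize'")
    rng = random.Random(seed)
    history = []
    for _ in range(trials):
        config = {name: rng.randint(low, high)
                  for name, (low, high) in search_space.items()}
        history.append((config, measure(config)))
    pick = min if direction == "minimize" else max
    best_config, best_score = pick(history, key=lambda item: item[1])
    return best_config, best_score, history


# Toy objective, made up purely so the example runs
space = {"threads": (1, 32), "cpu": (1, 4)}
toy_response_time = lambda c: abs(c["threads"] - 16) + 1.0 / c["cpu"]
best, score, _ = random_search(space, toy_response_time, "minimize")
print(best, score)
```

Smarter samplers replace the random draw with a model of which regions of the space look promising, which matters when each trial is a full benchmark run.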
Introducing
Kruize Autotune
https://github.com/kruize/autotune
Autotune Architecture
Example Autotune YAML
apiVersion: "recommender.com/v1"
kind: "Autotune"
metadata:
  name: "quarkusapp-autotune"
  namespace: "quarkusapp-autotune-ns"
spec:
  slo:
    objective_function: "performedChecks_total"
    direction: "maximize"
    slo_class: "throughput"
    hpo_algo_impl: optuna_tpe
    function_variables:
    - name: "performedChecks_total"
      query: "metrics_QuarkusApp_performedChecks_total"
      datasource: "prometheus"
      value_type: "double"
  mode: "show"
  selector:
    matchLabel: "app.kubernetes.io/name"
    matchLabelValue: "quarkusApp-deployment"
  datasource:
    name: "prometheus"
    value: "prometheus_URL"
[Figure: Autotune architecture diagram]
Components: Dependency Analyzer, Autotune Operator, Experiment Manager, Recommendation Manager, App Operator(s), Metric Providers
Workloads: App Pods (Production), App Pods (Training) deployed with Experimental Config, Incoming App Load
Data exchanged: Config experiment, Experiment Results, App Metrics, Micrometer Metrics, Layer Info, Tuning Sets, Results Summary, Config Recommendation
Search Space: Objective function + Tunables (Container + Runtime + App Server + App) + Ranges
Hyper-Parameter Optimization: optuna_tpe, tpemultivariate, optuna_scikit
Demo
Objective Fn: Reduce Response Time
[Layer] [Tunable] [Default, Range]
[Quarkus] quarkus.thread-pool.core-threads [1, 3-256]
[Quarkus] quarkus.thread-pool.queue-size [unbounded, 0-10000]
[Quarkus] quarkus.datasource.jdbc.min-size [0, 2-31]
[Quarkus] quarkus.datasource.jdbc.max-size [20, 32-100]
[Hotspot] FreqInlineSize [325, 325-1000]
[Hotspot] MaxInlineLevel [9, 9-50]
[Hotspot] MinInliningThreshold [250, 0-500]
[Hotspot] CompileThreshold [1500, 1000-20000]
[Hotspot] CompileThresholdScaling [1, 1-20]
[Hotspot] ConcGCThreads [0, 0-32]
[Hotspot] InlineSmallCode [1000, 500-5000]
[Hotspot] LoopUnrollLimit [50, 20-250]
[Hotspot] LoopUnrollMin [4, 0-20]
[Hotspot] MinSurvivorRatio [3, 3-48]
[Hotspot] NewRatio [2, 1-20]
[Hotspot] TieredStopAtLevel [4, 0-4]
[Hotspot] TieredCompilation [false, ]
[Hotspot] AllowParallelDefineClass [false, ]
[Hotspot] AllowVectorizeOnDemand [true, ]
[Hotspot] AlwaysCompileLoopMethods [false, ]
[Hotspot] AlwaysPreTouch [false, ]
[Hotspot] AlwaysTenure [false, ]
[Hotspot] BackgroundCompilation [true, ]
[Hotspot] DoEscapeAnalysis [true, ]
[Hotspot] UseInlineCaches [true, ]
[Hotspot] UseLoopPredicate [true, ]
[Hotspot] UseStringDeduplication [false, ]
[Hotspot] UseSuperWord [true, ]
[Hotspot] UseTypeSpeculation [true, ]
[Container] cpuRequest [None, 1-32]
[Container] memoryRequest [None, 270M-8192M]
OpenShift version 4.8.13
3 Master
6 Worker
32C – 32GB Each
RHEL 8.3
4C – 8GB
Benchmark → TechEmpower Framework – Quarkus RESTEasy
K8s resource requests = limits
Incoming load is constant = 512 users
Objective Fn: Reduce Response Time
Be careful what you wish for!
Autotune vs Default Config – Take 1
[ Obj Fn = Minimal Response Time ]
Response time: 0.28 ms (Autotune) vs 0.83 ms (Default)
60% better response time, 19% better throughput
Summary: Better performance, at the cost of a higher hardware configuration
For full results please see
https://github.com/kruize/autotune-results/tree/main/techempower/experiment-4
Autotune vs Default Config – Take 2
[ Obj Fn = Minimal Response Time + Fixed Resources (4C, 4GB) ]
Response time: 1.82 ms (Autotune) vs 5.01 ms (Default)
64% better response time, 6% better throughput
Summary: Better performance, but slightly higher tail latencies
For full results please see
https://github.com/kruize/autotune-results/tree/main/techempower/experiment-6
Autotune vs Default Config – Take 3
[ Obj Fn = Minimal Response Time + Fixed Resources (4C, 4GB) + Low Tail Latency ]
Response time: 1.91 ms (Autotune) vs 5.01 ms (Default)
62% better response time, 7% better throughput
Best performance, taking all requirements into account!
For full results please see
https://github.com/kruize/autotune-results/tree/main/techempower/experiment-7
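The percentage figures quoted for Take 2 and Take 3 follow from the response times shown on those slides (1.82 ms and 1.91 ms against 5.01 ms). A quick check of the arithmetic, assuming the lower number in each pair is the tuned configuration:

```python
def percent_improvement(baseline_ms, tuned_ms):
    """Relative reduction in response time, as a percentage."""
    return (baseline_ms - tuned_ms) / baseline_ms * 100


print(round(percent_improvement(5.01, 1.82)))  # Take 2
print(round(percent_improvement(5.01, 1.91)))  # Take 3
```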
Cost for handling 1 million transactions / sec
For full results please see
https://github.com/kruize/autotune-results/tree/main/techempower/experiment-7
Autotune vs Default Config – Take 3 - COST
[ Obj Fn = Minimal Response Time + Fixed Resources (4C, 4GB) + Low Tail Latency ]
8% cost reduction
Objective Fn: Reduce Response Time
[Layer] [Tunable] [Default, Range] Best Config (1.91 ms)
[Quarkus] quarkus.thread-pool.core-threads [1, 0-32] = 19
[Quarkus] quarkus.thread-pool.queue-size [unbounded, 0-10000] = 3700
[Quarkus] quarkus.datasource.jdbc.min-size [0, 1-12] = 10
[Quarkus] quarkus.datasource.jdbc.max-size [12, 12-90] = 86
[Hotspot] FreqInlineSize [325, 325-500] = 340
[Hotspot] MaxInlineLevel [9, 9-50] = 50
[Hotspot] MinInliningThreshold [250, 0-200] = 55
[Hotspot] CompileThreshold [1500, 1000-10000] = 6930
[Hotspot] CompileThresholdScaling [1, 1-15] = 8.3
[Hotspot] ConcGCThreads [0, 0-8] = 6
[Hotspot] InlineSmallCode [1000, 500-5000] = 1416
[Hotspot] LoopUnrollLimit [50, 20-250] = 128
[Hotspot] LoopUnrollMin [4, 0-20] = 13
[Hotspot] MinSurvivorRatio [3, 3-48] = 12
[Hotspot] NewRatio [2, 1-10] = 9
[Hotspot] TieredStopAtLevel [4, 0-4] = 4
[Hotspot] TieredCompilation [false, ] = true
[Hotspot] AllowParallelDefineClass [false, ] = false
[Hotspot] AllowVectorizeOnDemand [true, ] = true
[Hotspot] AlwaysCompileLoopMethods [false, ] = false
[Hotspot] AlwaysPreTouch [false, ] = false
[Hotspot] AlwaysTenure [false, ] = true
[Hotspot] BackgroundCompilation [true, ] = true
[Hotspot] DoEscapeAnalysis [true, ] = true
[Hotspot] UseInlineCaches [true, ] = false
[Hotspot] UseLoopPredicate [true, ] = false
[Hotspot] UseStringDeduplication [false, ] = false
[Hotspot] UseSuperWord [true, ] = true
[Hotspot] UseTypeSpeculation [true, ] = true
[Container] cpuRequest [None, 1-4] = 4
[Container] memoryRequest [None, 270M-4096M] = 3319M
Autotune Roadmap
● Autotune MVP expected 1H 2022
● Currently single service only
● For Dev / QA environments
● Different load conditions = multiple recommended configs
● HPA recommendation
Summary
● Observability is key
● Do not forget to tune the hardware
● Set Node and Pod Affinities
● Ensure requests and limits are set for all app pods and are right-sized
● Do not hardcode the Java heap
● Use app-specific scaling metrics
● Ensure no disruption with PDB
● Check out Autotune for autonomous tuning and stay tuned(!) for updates.
Repos and Contributing
● Kruize Project - https://github.com/kruize
● Autotune - https://github.com/kruize/autotune
● Autotune Demo - https://github.com/kruize/autotune-demo
● Benchmarks - https://github.com/kruize/benchmarks
● Autotune Results - https://github.com/kruize/autotune-results
Call for collaboration!
Kruize Slack
@dinogun
Questions

DevoxxUK: Optimizing Application Performance on Kubernetes

Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 

DevoxxUK: Optimizing Application Performance on Kubernetes

  • 1. Optimizing Application Performance on Kubernetes Dinakar Guniguntala @dinogun
  • 2. About Me ● Architect, Runtime Cloud Optimization ● Former Maintainer, AdoptOpenJDK Community Docker Images ● Interested in every aspect of running Java Apps in K8s including Cloud Native as well as Legacy migration to Cloud ● Ex Linux Kernel and glibc hacker Dinakar Guniguntala (@dinogun) Runtimes Cloud Architect, Red Hat
  • 3. Kubernetes is a portable, extensible, open-source platform for managing containerized workloads and services, that facilitates both declarative configuration … blah blah blah Kitna Deti Hai ?* Any questions ? * What's the mileage ?
  • 7. What is the granularity of observation ? ● Trade-off between accurate info and overhead Additional Operational Info ● Quarkus Micrometer ● Spring Actuator ● Liberty MicroProfile ● Node.js prom-client Observability
  • 8. Don’t Forget The Hardware
      BIOS
      ● CPU Power and Performance Policy: <Performance>
      OS / Hypervisor
      ● CPU Scaling governor: <Performance>
          $ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
          performance powersave
          $ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
          performance
      Hyperthreading
      ● Do not count hyperthreading while capacity planning
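The governor check on this slide reads a single CPU (cpu0). A minimal sketch of the same check across every CPU on a node follows; the function name `non_performance_cpus` and its `sysfs_root` parameter are illustrative assumptions, and only the sysfs path layout comes from the slide.

```python
from pathlib import Path


def non_performance_cpus(sysfs_root="/sys/devices/system/cpu"):
    """Return {cpu_name: governor} for every CPU not using 'performance'.

    Hypothetical helper: it reads the same scaling_governor files that the
    slide inspects with `cat`, one per CPU directory (cpu0, cpu1, ...).
    """
    offenders = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for gov_file in sorted(Path(sysfs_root).glob(pattern)):
        governor = gov_file.read_text().strip()
        if governor != "performance":
            cpu_name = gov_file.parent.parent.name  # e.g. "cpu0"
            offenders[cpu_name] = governor
    return offenders


if __name__ == "__main__":
    bad = non_performance_cpus()
    if bad:
        for cpu, governor in bad.items():
            print(f"{cpu}: {governor}")
    else:
        print("No CPU reports a governor other than 'performance'.")
```

An empty result can also mean the machine exposes no cpufreq files at all (common inside VMs), so it is worth confirming that the files exist before trusting it.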
  • 9. Node Affinity SRE Lower My Response Time! Pod Affinity
  • 10. Node and Pod Affinities
      Node Affinity
      ● Helps to match workloads to the right resources
      Pod Affinity
      ● Helps to schedule related pods together

      spec:
        affinity:
          podAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                - key: security
                  operator: In
                  values:
                  - S1
              topologyKey: topology.kubernetes.io/zone
          podAntiAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                  - key: security
                    operator: In
                    values:
                    - S2
                topologyKey: topology.kubernetes.io/zone
  • 11. CPU Request / Limit Memory Request / Limit SRE Lower My Response Time! Node Affinity Pod Affinity
  • 12. Right Size
      K8s QoS classes: Guaranteed, Burstable, BestEffort
      Requests → Should cover the observed peaks
      Limits → Handle any spikes!

      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: acmeair
        labels:
          app: acmeair-app
      spec:
        replicas: 1
        selector:
          matchLabels:
            app: acmeair-deployment
        template:
          metadata:
            labels:
              name: acmeair-deployment
              app: acmeair-deployment
              app.kubernetes.io/name: "acmeair-mono"
              version: v1
          spec:
            volumes:
            - name: test-volume
              hostPath:
                path: "/root/icp/jLogs"
                type: ""
            containers:
            - name: acmeair-libertyapp
              image: dinogun/acmeair-monolithic
              imagePullPolicy: Always
              ports:
              - containerPort: 8080
              resources:
                requests:
                  memory: 500M
                  cpu: 2
                limits:
                  memory: 1024M
                  cpu: 3
              volumeMounts:
              - name: "test-volume"
                mountPath: "/opt/jLogs"

      Ensure LimitRange does not get in the way of your deployment!

      apiVersion: v1
      kind: LimitRange
      metadata:
        name: limit-range
      spec:
        limits:
        - default:
            cpu: 1
            memory: 512Mi
          defaultRequest:
            cpu: 0.5
            memory: 256Mi
          type: Container
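The three QoS classes named on this slide follow from how requests and limits are set. The sketch below classifies a pod from its containers' `resources` sections, following the rules as described in the Kubernetes documentation. The function `qos_class` is a hypothetical helper, and it compares quantities as plain strings, which is a simplifying assumption (it would not treat "1" and "1000m" as equal).

```python
QOS_RESOURCES = ("cpu", "memory")  # only these two decide the QoS class


def qos_class(containers):
    """Classify a pod from a list of container 'resources' dicts.

    Simplification (assumption): quantities are compared as strings,
    not parsed into canonical Kubernetes quantities.
    """
    any_set = False
    all_guaranteed = True
    for res in containers:
        limits = {k: str(v) for k, v in res.get("limits", {}).items()
                  if k in QOS_RESOURCES}
        explicit = {k: str(v) for k, v in res.get("requests", {}).items()
                    if k in QOS_RESOURCES}
        # A missing request defaults to the limit when a limit is set.
        requests = {**limits, **explicit}
        if limits or requests:
            any_set = True
        for resource in QOS_RESOURCES:
            if resource not in limits or requests[resource] != limits[resource]:
                all_guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if all_guaranteed else "Burstable"


# The acmeair container from the slide: requests differ from limits.
acmeair = {"requests": {"memory": "500M", "cpu": 2},
           "limits": {"memory": "1024M", "cpu": 3}}
print(qos_class([acmeair]))
```

A container that sets nothing would normally be BestEffort, but a LimitRange like the one on the slide injects default requests and limits at admission, which changes the class the pod ends up in.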
  • 13. CPU Request / Limit Memory Request / Limit SRE Java Heap Size / Ratio Lower My Response Time! Node Affinity Pod Affinity
  • 14. Container Aware JVM: Don’t Hardcode the Java Heap!
      Use -XX:MaxRAMPercentage and -XX:InitialRAMPercentage instead of -Xmx and -Xms.
      Comparing a fixed heap size with a “MaxRAMPercentage” setting (here “-XX:MaxRAMPercentage=80”):
        Container Mem = 2G: -Xmx = 2G, Heap = 1.6G
        Container Mem = 3G: -Xmx = 2G, Heap = 2.4G
        Container Mem = 4G: -Xmx = 2G, Heap = 3.2G
      Beware of Default Hotspot Settings
      If container “mem < 1G”, the JVM assumes a “client-class” machine and the default is “serial GC”!
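The heap figures on this slide are simply 80% of the container memory limit. The arithmetic can be sketched as below; `max_heap_bytes` is an illustrative helper, "G" is assumed to mean GiB, and the real JVM applies extra alignment and uses a different percentage for very small containers, so treat the numbers as approximate.

```python
GIB = 1024 ** 3


def max_heap_bytes(container_mem_bytes, max_ram_percentage=80.0):
    """Approximate max heap the JVM derives from -XX:MaxRAMPercentage."""
    return int(container_mem_bytes * max_ram_percentage / 100)


FIXED_XMX = 2 * GIB  # the hardcoded -Xmx=2G from the slide

for mem_gib in (2, 3, 4):
    container = mem_gib * GIB
    percent_heap = max_heap_bytes(container)
    print(f"container={mem_gib}G  "
          f"fixed -Xmx uses {FIXED_XMX / container:.0%} of the limit  "
          f"MaxRAMPercentage=80 heap={percent_heap / GIB:.1f}G")
```

With the fixed setting, the 2G container has no headroom left for non-heap memory, while the 4G container leaves half of its limit unused by the heap. The percentage setting scales with the limit in both cases.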
  • 15. CPU Request / Limit Memory Request / Limit SRE Java Heap Size / Ratio Lower My Response Time! Node Affinity Pod Affinity VPA HPA CA
  • 16. It’s All About the Scaling

      apiVersion: autoscaling/v2beta2
      kind: HorizontalPodAutoscaler
      metadata:
        name: php-apache
      spec:
        scaleTargetRef:
          apiVersion: apps/v1
          kind: Deployment
          name: php-apache
        minReplicas: 1
        maxReplicas: 10
        metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 50
        - type: Pods
          pods:
            metric:
              name: packets-per-second
            target:
              type: AverageValue
              averageValue: 1k
        - type: Object
          object:
            metric:
              name: requests-per-second
            describedObject:
              apiVersion: networking.k8s.io/v1beta1
              kind: Ingress
              name: main-route
            target:
              type: Value
              value: 10k

      Set HPA with app specific metrics

        - type: External
          external:
            metric:
              name: concurrent_connections
              selector: "connection=current"
            target:
              type: Value
              value: 1200

      Use PodDisruptionBudget with CA to ensure no service disruption

      apiVersion: policy/v1beta1
      kind: PodDisruptionBudget
      metadata:
        name: zk-pdb
      spec:
        maxUnavailable: 1
        selector:
          matchLabels:
            app: zookeeper
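To see what a target such as `averageUtilization: 50` does in practice, here is a sketch of the replica calculation described in the Kubernetes HPA documentation: desired = ceil(current × currentMetric / targetMetric), skipped when the ratio is within a tolerance (0.1 by default). The function name and its defaults are illustrative; `min_replicas` and `max_replicas` mirror the slide's minReplicas/maxReplicas.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Replica count an HPA would propose for a single metric."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: do not scale
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# 4 pods averaging 100% CPU against a 50% target: scale out to 8.
print(desired_replicas(4, 100, 50))
```

When several metrics are configured, as on the slide, the HPA evaluates each one separately and uses the largest proposed replica count.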
  • 18. Let’s Take a Step Back
  • 19. Life of an SRE?! Finance Developer User
  • 20. So what do we need here ? ● Multiple stakeholders to express requirements as an “Objective Function” ● Autonomously detect all the right options that try to match the “Objective Function” ● Try options intelligently and provide a recommendation
  • 22. Autotune Architecture
      Example Autotune yaml

      apiVersion: "recommender.com/v1"
      kind: "Autotune"
      metadata:
        name: "quarkusapp-autotune"
        namespace: "quarkusapp-autotune-ns"
      spec:
        slo:
          objective_function: "performedChecks_total"
          direction: "maximize"
          slo_class: "throughput"
          hpo_algo_impl: optuna_tpe
          function_variables:
          - name: "performedChecks_total"
            query: "metrics_QuarkusApp_performedChecks_total"
            datasource: "prometheus"
            value_type: "double"
        mode: "show"
        selector:
          matchLabel: "app.kubernetes.io/name"
          matchLabelValue: "quarkusApp-deployment"
        datasource:
          name: "prometheus"
          value: "prometheus_URL"

      [Architecture diagram. Components: Dependency Analyzer, Autotune Operator, Experiment Manager, Recommendation Manager, App Operator(s), Metric Providers, Micrometer Metrics, Layer Info, Tuning Sets. Flow: Incoming App Load → App Pods (Production) → App Metrics; Config experiment → Deploy App Pods with Experimental Config → App Pods (Training) → Experiment Results → Results Summary → Config Recommendation. Search Space: Objective function + Tunables (Container + Runtime + App Server + App) + Ranges. Hyper-Parameter Optimization: optuna_tpe, tpemultivariate, optuna_scikit.]
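The experiment loop on this slide (pick a config from the search space, run it, score it against the objective function, keep the best) can be illustrated with a toy search. This is deliberately not Autotune's method: the slide lists Optuna-based hyper-parameter optimization, whereas the sketch below uses plain random search from the standard library. The objective `toy_throughput` and the tunable names and ranges are made up for illustration; in a real experiment the score would come from metrics of a deployed pod.

```python
import random


def random_search(objective, search_space, direction="maximize",
                  trials=50, seed=0):
    """Try random integer configs and return (best_config, best_score)."""
    rng = random.Random(seed)
    best_config, best_score = None, None
    for _ in range(trials):
        config = {name: rng.randint(low, high)
                  for name, (low, high) in search_space.items()}
        score = objective(config)
        if best_score is None:
            better = True
        elif direction == "maximize":
            better = score > best_score
        else:
            better = score < best_score
        if better:
            best_config, best_score = config, score
    return best_config, best_score


def toy_throughput(config):
    """Stand-in objective (assumption): peaks at 19 threads, queue 3700."""
    return (-(config["core_threads"] - 19) ** 2
            - ((config["queue_size"] - 3700) / 100) ** 2)


SPACE = {"core_threads": (1, 32), "queue_size": (0, 10000)}

if __name__ == "__main__":
    config, score = random_search(toy_throughput, SPACE)
    print(config, score)
```

A model-based optimizer such as the ones named on the slide would use earlier trial results to choose the next config, which matters when each trial means deploying and load-testing a pod.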
  • 23. Demo
  • 24. Objective Fn: Reduce Response Time
      [Layer] [Tunable] [Default, Range]
      [Quarkus] quarkus.thread-pool.core-threads [1, 3-256]
      [Quarkus] quarkus.thread-pool.queue-size [unbounded, 0-10000]
      [Quarkus] quarkus.datasource.jdbc.min-size [0, 2-31]
      [Quarkus] quarkus.datasource.jdbc.max-size [20, 32-100]
      [Hotspot] FreqInlineSize [325, 325-1000]
      [Hotspot] MaxInlineLevel [9, 9-50]
      [Hotspot] MinInliningThreshold [250, 0-500]
      [Hotspot] CompileThreshold [1500, 1000-20000]
      [Hotspot] CompileThresholdScaling [1, 1-20]
      [Hotspot] ConcGCThreads [0, 0-32]
      [Hotspot] InlineSmallCode [1000, 500-5000]
      [Hotspot] LoopUnrollLimit [50, 20-250]
      [Hotspot] LoopUnrollMin [4, 0-20]
      [Hotspot] MinSurvivorRatio [3, 3-48]
      [Hotspot] NewRatio [2, 1-20]
      [Hotspot] TieredStopAtLevel [4, 0-4]
      [Hotspot] TieredCompilation [false, ]
      [Hotspot] AllowParallelDefineClass [false, ]
      [Hotspot] AllowVectorizeOnDemand [true, ]
      [Hotspot] AlwaysCompileLoopMethods [false, ]
      [Hotspot] AlwaysPreTouch [false, ]
      [Hotspot] AlwaysTenure [false, ]
      [Hotspot] BackgroundCompilation [true, ]
      [Hotspot] DoEscapeAnalysis [true, ]
      [Hotspot] UseInlineCaches [true, ]
      [Hotspot] UseLoopPredicate [true, ]
      [Hotspot] UseStringDeduplication [false, ]
      [Hotspot] UseSuperWord [true, ]
      [Hotspot] UseTypeSpeculation [true, ]
      [Container] cpuRequest [None, 1-32]
      [Container] memoryRequest [None, 270M-8192M]
      OpenShift version 4.8.13
      3 Master, 6 Worker; 32C – 32GB Each; RHEL 8.3; 4C – 8GB
      Benchmark → TechEmpower Framework – Quarkus RestEasy
      K8s resource requests = limits
      Incoming load is constant = 512 users
  • 25. Objective Fn: Reduce Response Time Be careful what you wish for !
  • 26. Autotune vs Default Config – Take 1 [ Obj Fn = Minimal Response Time ]: 0.28 ms vs Default 0.83 ms
  • 27. Autotune vs Default Config – Take 1 [ Obj Fn = Minimal Response Time ]: 60% better response time, 19% better throughput. Summary: Better perf at a cost of higher hardware config. For full results please see https://github.com/kruize/autotune-results/tree/main/techempower/experiment-4
  • 28. Autotune vs Default Config – Take 2 [ Obj Fn = Minimal Response Time + Fixed Resources (4C, 4GB) ]: 1.82 ms vs Default 5.01 ms
  • 29. Autotune vs Default Config – Take 2 [ Obj Fn = Minimal Response Time + Fixed Resources (4C, 4GB) ]: 64% better response time, 6% better throughput. Summary: Better perf but slightly higher tail latencies. For full results please see https://github.com/kruize/autotune-results/tree/main/techempower/experiment-6
  • 30. Autotune vs Default Config – Take 3 [ Obj Fn = Minimal Response Time + Fixed Resources (4C, 4GB) + Low Tail Latency ]: 1.91 ms vs Default 5.01 ms
  • 31. Autotune vs Default Config – Take 3 [ Obj Fn = Minimal Response Time + Fixed Resources (4C, 4GB) + Low Tail Latency ]: 62% better response time, 7% better throughput. Best perf taking into account all requirements! For full results please see https://github.com/kruize/autotune-results/tree/main/techempower/experiment-7
  • 32. Autotune vs Default Config – Take 3 - COST [ Obj Fn = Minimal Response Time + Fixed Resources (4C, 4GB) + Low Tail Latency ]: Cost for handling 1 million transactions / sec: 8% cost reduction. For full results please see https://github.com/kruize/autotune-results/tree/main/techempower/experiment-7
  • 33. Objective Fn: Reduce Response Time
      [Layer] [Tunable] [Default, Range] = Best Config (1.91 ms)
      [Quarkus] quarkus.thread-pool.core-threads [1, 0-32] = 19
      [Quarkus] quarkus.thread-pool.queue-size [unbounded, 0-10000] = 3700
      [Quarkus] quarkus.datasource.jdbc.min-size [0, 1-12] = 10
      [Quarkus] quarkus.datasource.jdbc.max-size [12, 12-90] = 86
      [Hotspot] FreqInlineSize [325, 325-500] = 340
      [Hotspot] MaxInlineLevel [9, 9-50] = 50
      [Hotspot] MinInliningThreshold [250, 0-200] = 55
      [Hotspot] CompileThreshold [1500, 1000-10000] = 6930
      [Hotspot] CompileThresholdScaling [1, 1-15] = 8.3
      [Hotspot] ConcGCThreads [0, 0-8] = 6
      [Hotspot] InlineSmallCode [1000, 500-5000] = 1416
      [Hotspot] LoopUnrollLimit [50, 20-250] = 128
      [Hotspot] LoopUnrollMin [4, 0-20] = 13
      [Hotspot] MinSurvivorRatio [3, 3-48] = 12
      [Hotspot] NewRatio [2, 1-10] = 9
      [Hotspot] TieredStopAtLevel [4, 0-4] = 4
      [Hotspot] TieredCompilation [false, ] = true
      [Hotspot] AllowParallelDefineClass [false, ] = false
      [Hotspot] AllowVectorizeOnDemand [true, ] = true
      [Hotspot] AlwaysCompileLoopMethods [false, ] = false
      [Hotspot] AlwaysPreTouch [false, ] = false
      [Hotspot] AlwaysTenure [false, ] = true
      [Hotspot] BackgroundCompilation [true, ] = true
      [Hotspot] DoEscapeAnalysis [true, ] = true
      [Hotspot] UseInlineCaches [true, ] = false
      [Hotspot] UseLoopPredicate [true, ] = false
      [Hotspot] UseStringDeduplication [false, ] = false
      [Hotspot] UseSuperWord [true, ] = true
      [Hotspot] UseTypeSpeculation [true, ] = true
      [Container] cpuRequest [None, 1-4] = 4
      [Container] memoryRequest [None, 270M-4096M] = 3319M
      OpenShift version 4.8.13
      3 Master, 6 Worker; 32C – 32GB Each; RHEL 8.3; 4C – 8GB
      Benchmark → TechEmpower Framework – Quarkus RestEasy
      K8s resource requests = limits
      Incoming load is constant = 512 users
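The [Hotspot] rows of a recommended config have to reach the JVM as command-line options. A small sketch of that conversion follows, using the standard HotSpot syntax: `-XX:Name=value` for numeric tunables and `-XX:+Name` / `-XX:-Name` for booleans. The helper `hotspot_flags` is hypothetical and not part of Autotune; the values are a subset of the best config listed on the slide, and some of these flags may not be accepted by every JDK version.

```python
def hotspot_flags(tunables):
    """Render a {name: value} dict as a HotSpot -XX option string."""
    flags = []
    for name, value in tunables.items():
        # bool must be tested first: in Python, bool is a subclass of int.
        if isinstance(value, bool):
            flags.append(f"-XX:{'+' if value else '-'}{name}")
        else:
            flags.append(f"-XX:{name}={value}")
    return " ".join(flags)


# A few entries from the "Best Config (1.91 ms)" column.
best = {
    "FreqInlineSize": 340,
    "MaxInlineLevel": 50,
    "CompileThresholdScaling": 8.3,
    "TieredCompilation": True,
    "UseInlineCaches": False,
}
print(hotspot_flags(best))
```

One way to apply the resulting string without rebuilding the image is to set it in the container's `JAVA_TOOL_OPTIONS` environment variable, which HotSpot reads at startup.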
  • 34. Autotune Roadmap ● Autotune MVP expected 1H 2022 ● Currently single service only ● For Dev / QA environments ● Different load conditions = multiple recommended configs ● HPA recommendation
  • 35. Summary ● Observability is Key ● Do not forget to tune the hardware ● Set Node and Pod Affinities ● Ensure requests and limits are set for all app pods and right sized ● Do not hardcode the Java heap ● Use app specific scaling metrics ● Ensure no disruption with PDB ● Check out Autotune for autonomous tuning and stay tuned(!) for updates.
  • 36. Repos and Contributing ● Kruize Project - https://github.com/kruize ● Autotune - https://github.com/kruize/autotune ● Autotune Demo - https://github.com/kruize/autotune-demo ● Benchmarks - https://github.com/kruize/benchmarks ● Autotune Results - https://github.com/kruize/autotune-results Call for collaboration! Kruize Slack @dinogun