Autoscaling in Kubernetes
Marian Soltys
DevOps Engineers
www.pixelfederation.com
Introduction
We are a game studio based in Slovakia developing free-to-play mobile games.
● Trainstation
● Diggy’s Adventure
● Seaport
● Trainstation 2
● AFK Cats
● Emporea
TL;DR Summary
● Autoscaling - do we need it?
● Three levels of scaling - VPA, HPA, CA
● Cluster Autoscaler
● Horizontal Pod Autoscaler
● Vertical Pod Autoscaler
● Custom and External metrics for scaling
● Real life example
Autoscaling - do we need it?
● The number of active players changes with the time of day and the day of the week
● Batch processing
● Combination of both
● Cost optimization
Three levels of scaling - VPA, HPA, CA
● VPA - Vertical Pod Autoscaler
● HPA - Horizontal Pod Autoscaler
● CA - Cluster Autoscaler
Node Autoscaling
● Cluster Autoscaler
(https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler)
● Escalator
(https://github.com/atlassian/escalator)
● Cerebral
(https://github.com/containership/cerebral)
Cluster Autoscaler
Cluster Autoscaler automatically adjusts the size of the Kubernetes cluster across availability zones (AZs):
● Watches for pods stuck in the Pending state due to insufficient resources.
● Periodically checks for underutilized nodes whose pods can be placed on other existing nodes.
● Respects PodDisruptionBudget, affinity, annotations, ...
How we use it:
● MinReplicas of 2 with podAntiAffinity to hostname
● Parameters:
○ scale-down-delay-after-add: 10m
○ scale-down-delay-after-delete: 10s
○ scale-down-unneeded-time: 10m
○ scale-down-utilization-threshold: 0.65
● Spot instances
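To make the scale-down parameters above more concrete, here is a minimal Python sketch of the idea behind scale-down-utilization-threshold and scale-down-unneeded-time. It is a simplified illustration, not the Cluster Autoscaler's actual code: the Node class and the function names are hypothetical, and utilization is assumed to be the larger of the CPU and memory request ratios relative to what the node can allocate.

```python
from dataclasses import dataclass

# Values taken from the slide; the real flags are passed to cluster-autoscaler.
SCALE_DOWN_UTILIZATION_THRESHOLD = 0.65
SCALE_DOWN_UNNEEDED_TIME_S = 10 * 60  # 10m


@dataclass
class Node:  # hypothetical, simplified view of a node
    name: str
    cpu_requested: float    # sum of pod CPU requests (cores)
    cpu_allocatable: float
    mem_requested: float    # sum of pod memory requests (GiB)
    mem_allocatable: float
    unneeded_for_s: float   # how long the node has been below the threshold


def utilization(node: Node) -> float:
    """Assumed definition: the larger of the CPU and memory request ratios."""
    return max(node.cpu_requested / node.cpu_allocatable,
               node.mem_requested / node.mem_allocatable)


def is_scale_down_candidate(node: Node) -> bool:
    """A node qualifies only if it has been underutilized for long enough."""
    return (utilization(node) < SCALE_DOWN_UTILIZATION_THRESHOLD
            and node.unneeded_for_s >= SCALE_DOWN_UNNEEDED_TIME_S)


nodes = [
    Node("node-a", 1.0, 4.0, 4.0, 16.0, 900),  # 25% utilized for 15 min
    Node("node-b", 3.0, 4.0, 4.0, 16.0, 900),  # 75% CPU: too busy
    Node("node-c", 1.0, 4.0, 4.0, 16.0, 60),   # idle, but only for 1 min
]
candidates = [n.name for n in nodes if is_scale_down_candidate(n)]
print(candidates)
```

In a real cluster a candidate node is removed only if its pods can also be rescheduled elsewhere without violating constraints such as PodDisruptionBudgets; the sketch leaves that check out.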
Vertical Pod Autoscaler
● Can automatically adjust pod requests and limits
● Calculation based on current and historical metrics
● Modes: Auto, Recreate, Initial, Off
Cons:
● Pod restarts when requests change (Auto/Recreate modes)
● All pod start events go through VPA
● Could conflict with HPA (on CPU and memory)
Pros:
● Recommender
● Can fix under- or over-provisioned pods
How we use it: We don’t.
Horizontal Pod Autoscaler
● Scales deployments (number of pods) based on metrics
○ Container resources - CPU/memory
○ Object
○ Custom/External metrics
● Think twice before using it with stateful deployments
Take into consideration:
● Default metrics sync loop: 15 seconds
● Metric tolerance: 10%
● Downscale stabilization window: the default value is 5 minutes (5m0s).
● Up to Kubernetes 1.17, these parameters are configured at the cluster level
● Since Kubernetes 1.18, some parameters can be tuned per HPA under .spec.behavior
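The downscale stabilization window can be pictured with the short Python sketch below. It is a simplified model rather than the controller's real code: the DownscaleStabilizer class is hypothetical, and it assumes the behavior described in the Kubernetes documentation, where the controller remembers recent replica recommendations and acts on the highest one seen within the window. Scale-up is therefore immediate, while scale-down waits until the higher recommendations have aged out.

```python
from collections import deque


class DownscaleStabilizer:  # hypothetical helper, for illustration only
    def __init__(self, window_s: float = 300.0):
        self.window_s = window_s          # default window: 5 minutes
        self._history = deque()           # (timestamp, recommended replicas)

    def stabilize(self, now_s: float, recommendation: int) -> int:
        """Record the new recommendation and return the highest one
        that is still inside the stabilization window."""
        self._history.append((now_s, recommendation))
        while self._history and self._history[0][0] < now_s - self.window_s:
            self._history.popleft()
        return max(replicas for _, replicas in self._history)


stabilizer = DownscaleStabilizer(window_s=300)
timeline = [(0, 10), (60, 6), (120, 4), (300, 4), (301, 4), (361, 4)]
results = [stabilizer.stabilize(t, rec) for t, rec in timeline]
print(results)
```

In this run the replica count stays at 10 until the recommendation of 10 is more than 300 seconds old, then steps down to 6 and finally to 4.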
Metrics types (autoscaling/v2beta2)
Resource:
● CPU/memory
● Container request(s) must be set
● API: metrics.k8s.io
External:
● Metrics not related to Kubernetes objects
● AWS SQS, RDS, …
● API: external.metrics.k8s.io
Custom:
● Pod/Object (in same namespace)
● Time series DB required (e.g. Prometheus)
● API: custom.metrics.k8s.io
Example with target average 65%:
desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
ceil[4 * (0.781 / 0.650)] = ceil[4.8] = 5 (means +1 pod)
Note: With multiple metrics, the highest resulting replica count is chosen.
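The formula and the note above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the HPA controller's code: the function names are made up for the example, and the 10% tolerance is applied as a simple band around a ratio of 1.0, inside which no scaling happens.

```python
import math
from typing import Sequence, Tuple

TOLERANCE = 0.10  # default HPA tolerance


def desired_replicas(current_replicas: int, current_value: float,
                     target_value: float, tolerance: float = TOLERANCE) -> int:
    """ceil(currentReplicas * currentMetricValue / desiredMetricValue),
    skipping the change when the ratio is within the tolerance band."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


def desired_replicas_multi(current_replicas: int,
                           metrics: Sequence[Tuple[float, float]]) -> int:
    """With several (current, target) metrics, the largest proposal wins."""
    return max(desired_replicas(current_replicas, cur, tgt)
               for cur, tgt in metrics)


slide_example = desired_replicas(4, 0.781, 0.650)      # 78.1% vs. 65% target
inside_band = desired_replicas(4, 0.680, 0.650)        # ratio ~1.05, no change
two_metrics = desired_replicas_multi(4, [(0.781, 0.650), (0.300, 0.650)])
print(slide_example, inside_band, two_metrics)
```

The first call reproduces the example above (4 pods become 5), the second shows a small deviation being ignored, and the third shows the higher of two proposals being used.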
Custom and External metrics
Limitation: only one adapter can be registered per metrics API (Custom or External), or a single adapter can serve both
% kubectl get APIService v1beta1.external.metrics.k8s.io -o yaml
kind: APIService
metadata:
  labels: …
  name: v1beta1.external.metrics.k8s.io
spec:
  group: external.metrics.k8s.io
  service:
    name: k8s-cloudwatch-adapter
    namespace: ...
    port: …
  ...
Custom and External metrics
● Check available metrics:
% kubectl get --raw /apis/external.metrics.k8s.io/v1beta1 | jq .
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "ts2-numberOfMessagesSent",
      "singularName": "",
      "namespaced": true,
      "kind": "ExternalMetricValueList",
      "verbs": ["get"]
    }
  ]
}
● Check the current value of a metric:
% kubectl get --raw /apis/external.metrics.k8s.io/v1beta1/namespaces/<ns name>/<metric name> | jq .
{
  "kind": "ExternalMetricValueList",
  "apiVersion": "external.metrics.k8s.io/v1beta1",
  "metadata": {...},
  "items": [
    {
      "metricName": "ts2-numberOfMessagesSent",
      "metricLabels": null,
      "timestamp": "2021-02-17T11:05:20Z",
      "value": "27"
    }
  ]
}
Tested Adapters for HPA
● Prometheus adapter - we no longer use it
○ Doesn’t fit our needs any more - redesign of infrastructure needed
○ (https://github.com/kubernetes-sigs/prometheus-adapter)
● K8s-cloudwatch adapter - in use now
○ (https://github.com/awslabs/k8s-cloudwatch-adapter)
● Kube-metrics-adapter - evaluated
○ Collectors: Pod, Prometheus, AWS, HTTP, …
○ (https://github.com/zalando-incubator/kube-metrics-adapter)
● KEDA - evaluating now
○ (https://keda.sh)
Trainstation 2 - Real Life Example
Types of workloads:
● Live traffic - changes throughout the day
● Start/End of the Event - once per month
● Start/End of the Competitions - twice per week
● Event cleanup - a few days after event ends
Goals:
● Scaling backend - based on live traffic
● Batch/Asynchronous workers scaling - based on queue size
● Limit batch processing under DB pressure
● Maximize off peak hours batch processing
● Start of the Event/Competitions - scale ahead
Trainstation 2 - Real Life Example
Cloudwatch adapter
Common approach:
1. Monitor the queue
2. Calculate the optimal number of workers
3. Return metrics
Advanced approach:
1. Monitor the queue
2. Calculate the optimal number of workers
3. Monitor RDS utilization
4. Tune the number of workers so as not to overload the live workload
5. Return metrics
Trainstation 2 - Real Life Example
apiVersion: metrics.aws/v1alpha1
kind: ExternalMetric
metadata:
  name: "ts2-numberOfMessagesSent"
spec:
  name: "ts2-numberOfMessagesSent"
  resource:
    resource: "deployment"
  queries:
    - id: queue_metric
      metricStat:
        metric:
          namespace: "AWS/SQS"
          metricName: "NumberOfMessagesSent"
          dimensions:
            - name: QueueName
              value: "ts2-demo"
        period: 30
        stat: Sum
        unit: Count
      returnData: false
    - id: db_cpuutilization
      metricStat:
        metric:
          namespace: "AWS/RDS"
          metricName: "CPUUtilization"
          dimensions:
            - name: DBClusterIdentifier
              value: "ts2-demo-cluster"
            - name: Role
              value: WRITER
        period: 300
        stat: Average
        unit: Percent
      returnData: false
    - id: workers_calculated
      expression: "IF((queue_metric / 300) > 100, 100, queue_metric / 300)"
      returnData: false
    - id: workers_desired
      expression: "IF(db_cpuutilization < 80, workers_calculated, IF(db_cpuutilization < 90, workers_calculated * 80 / 100, 0))"
      returnData: true
Desired pods: get queue size -> calculate desired pods (queue size / 300) -> limit to 100 max -> if DB CPU utilization is 80% or more, reduce by 20%; if it is 90% or more, return 0
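The two metric-math expressions above can be read as the following Python sketch. It mirrors only the arithmetic of workers_calculated and workers_desired from the ExternalMetric definition; the function signatures are an illustration, and the input values in the example calls are made up.

```python
def workers_calculated(queue_metric: float) -> float:
    """IF((queue_metric / 300) > 100, 100, queue_metric / 300)"""
    per_worker = queue_metric / 300
    return 100 if per_worker > 100 else per_worker


def workers_desired(queue_metric: float, db_cpuutilization: float) -> float:
    """IF(db_cpuutilization < 80, workers_calculated,
          IF(db_cpuutilization < 90, workers_calculated * 80 / 100, 0))"""
    calculated = workers_calculated(queue_metric)
    if db_cpuutilization < 80:
        return calculated                 # database is healthy: full speed
    if db_cpuutilization < 90:
        return calculated * 80 / 100      # database under pressure: -20%
    return 0                              # database overloaded: back off


healthy = workers_desired(6000, 50)       # 6000 messages, DB CPU at 50%
pressured = workers_desired(6000, 85)     # same queue, DB CPU at 85%
overloaded = workers_desired(6000, 95)    # same queue, DB CPU at 95%
capped = workers_desired(60000, 50)       # large backlog hits the 100 cap
print(healthy, pressured, overloaded, capped)
```

Note that a returned value of 0 does not remove all workers: the HPA still enforces its own minReplicas.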
Trainstation 2 - Real Life Example
Horizontal Pod Autoscaler config:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  labels:
    app: ts2-demo
  name: ts2-demo-worker
spec:
  behavior:
    scaleDown:
      policies:
      - periodSeconds: 15
        type: Percent
        value: 100
      selectPolicy: Max
      stabilizationWindowSeconds: 120
    scaleUp:
      ...
      stabilizationWindowSeconds: 0
  minReplicas: 4
  maxReplicas: 100
  metrics:
  - external:
      metric:
        name: ts2-numberOfMessagesVisible
      target:
        averageValue: "1"
        type: AverageValue
    type: External
  - external:
      metric:
        name: ts2-numberOfMessagesSent
      target:
        averageValue: "1"
        type: AverageValue
    type: External
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ts2-demo-worker
Trainstation 2 - Real Life Example
% kubectl get hpa ts2-demo-worker
NAME              REFERENCE                    TARGETS                    MINPODS   MAXPODS   REPLICAS   AGE
ts2-demo-worker   Deployment/ts2-demo-worker   0/1 (avg), 1038m/1 (avg)   4         100       27         2d4h
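As a reading aid for the TARGETS column, the sketch below interprets the value 1038m/1 (avg) in Python. The parse_milli helper is hypothetical and handles only plain numbers and the "m" (milli) suffix; the 10% tolerance is the HPA default and is assumed to apply here unchanged.

```python
def parse_milli(quantity: str) -> float:
    """Parse a plain or milli-suffixed quantity such as '1038m' or '1'."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


current_avg = parse_milli("1038m")   # current average value per pod
target_avg = parse_milli("1")        # target averageValue from the HPA spec
replicas = 27

ratio = current_avg / target_avg
within_tolerance = abs(ratio - 1.0) <= 0.10
print(ratio, within_tolerance)
```

Here 1038m means 1.038 per pod against a target of 1. That ratio is inside the tolerance band, so under these assumptions this metric alone would not trigger a change from 27 replicas.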
Trainstation 2 - Real Life Example
It’s working :) but you have to take the stabilization window and the metrics delay into account.
Questions?
msoltys@pixelfederation.com
linkedin.com/in/mariansoltys
https://portal.pixelfederation.com/sk/career

Comptia N+ Standard Networking lesson guide
 
APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024
APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024
APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024
 
一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理
一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理
一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理
 
The+Prospects+of+E-Commerce+in+China.pptx
The+Prospects+of+E-Commerce+in+China.pptxThe+Prospects+of+E-Commerce+in+China.pptx
The+Prospects+of+E-Commerce+in+China.pptx
 
1.Wireless Communication System_Wireless communication is a broad term that i...
1.Wireless Communication System_Wireless communication is a broad term that i...1.Wireless Communication System_Wireless communication is a broad term that i...
1.Wireless Communication System_Wireless communication is a broad term that i...
 
Latest trends in computer networking.pptx
Latest trends in computer networking.pptxLatest trends in computer networking.pptx
Latest trends in computer networking.pptx
 
Multi-cluster Kubernetes Networking- Patterns, Projects and Guidelines
Multi-cluster Kubernetes Networking- Patterns, Projects and GuidelinesMulti-cluster Kubernetes Networking- Patterns, Projects and Guidelines
Multi-cluster Kubernetes Networking- Patterns, Projects and Guidelines
 
一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理
一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理
一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理
 

Autoscaling in kubernetes v1

  • 1. Autoscaling in Kubernetes Marian Soltys DevOps Engineers
  • 2. www.pixelfederation.com Autoscaling in Kubernetes Introduction
      We are a game studio based in Slovakia developing free-to-play mobile games.
      ● Trainstation
      ● Diggy’s Adventure
      ● Seaport
      ● Trainstation 2
      ● AFK Cats
      ● Emporea
  • 3. TL;DR Summary
      ● Autoscaling - do we need it?
      ● Three levels of scaling - VPA, HPA, CA
      ● Cluster Autoscaler
      ● Horizontal Pod Autoscaler
      ● Vertical Pod Autoscaler
      ● Custom and External metrics for scaling
      ● Real life example
  • 4. Autoscaling - do we need it?
      ● Number of active players changes with the time of day and the day of the week
      ● Batch processing
      ● Combination of both
      ● Cost optimization
  • 5. Three levels of scaling - VPA, HPA, CA
      [diagram: Kubernetes]
      ● Vertical Pod Autoscaler
      ● CA - Cluster Autoscaler
      ● Horizontal Pod Autoscaler
  • 6. Three levels of scaling - VPA, HPA, CA
      [diagram: Kubernetes]
      ● Horizontal Pod Autoscaler
  • 7. Three levels of scaling - VPA, HPA, CA
      [diagram: Kubernetes]
      ● CA - Cluster Autoscaler
      ● Vertical Pod Autoscaler
      ● Horizontal Pod Autoscaler
  • 8. Node Autoscaling
      ● Cluster Autoscaler (https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler)
      ● Escalator (https://github.com/atlassian/escalator)
      ● Cerebral (https://github.com/containership/cerebral)
  • 9. Cluster Autoscaler
      Cluster Autoscaler automatically adjusts the size of the Kubernetes cluster across AZs:
      ● Watches for pods stuck in the Pending state due to insufficient resources.
      ● Periodically checks for underutilized nodes whose pods can be placed on other existing nodes.
      ● Respects PodDisruptionBudget, affinity, annotations, ...
      How we use it:
      ● MinReplicas of 2 with podAntiAffinity to hostname
      ● Parameters:
          ○ scale-down-delay-after-add: 10m
          ○ scale-down-delay-after-delete: 10s
          ○ scale-down-unneeded-time: 10m
          ○ scale-down-utilization-threshold: 0.65
      ● Spot instances
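To make the scale-down-utilization-threshold setting on this slide concrete, here is a minimal Python sketch of the utilization check. It is a simplification, not the Cluster Autoscaler's code: the `Node` type, its field names, and both function names are hypothetical, and the sketch ignores scale-down-unneeded-time, PodDisruptionBudgets, and whether the node's pods actually fit elsewhere. It assumes a node counts as underutilized only when both its CPU and memory requests are below the threshold fraction of allocatable capacity.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical summary of one node's capacity and pod requests."""
    cpu_allocatable_m: int    # allocatable CPU in millicores
    mem_allocatable_mi: int   # allocatable memory in MiB
    cpu_requested_m: int      # sum of pod CPU requests in millicores
    mem_requested_mi: int     # sum of pod memory requests in MiB


def utilization(node: Node) -> float:
    """Return the higher of the CPU and memory request ratios."""
    cpu = node.cpu_requested_m / node.cpu_allocatable_m
    mem = node.mem_requested_mi / node.mem_allocatable_mi
    return max(cpu, mem)


def is_scale_down_candidate(node: Node, threshold: float = 0.65) -> bool:
    """True when requests are below the threshold on both resources."""
    return utilization(node) < threshold


if __name__ == "__main__":
    quiet = Node(4000, 16384, 2000, 4096)   # 50% CPU, 25% memory
    busy = Node(4000, 16384, 3000, 4096)    # 75% CPU, 25% memory
    print(is_scale_down_candidate(quiet), is_scale_down_candidate(busy))
```

With the 0.65 threshold from the slide, the first node would be a removal candidate and the second would not; the node would still have to stay unneeded for the configured scale-down-unneeded-time before being removed.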
  • 10. Vertical Pod Autoscaler
      ● Can automatically adjust pod requests and limits
      ● Calculation based on current and historical metrics
      ● Modes: Auto, Recreate, Initial, Off
      Cons:
      ● Pods restart when requests change (Auto/Recreate modes)
      ● All pod start events go through VPA
      ● Could conflict with HPA (on CPU and memory)
      Pros:
      ● Recommender
      ● Can fix under- or over-provisioned pods
      How we use it: We don’t.
  • 11. Horizontal Pod Autoscaler
      ● Scales deployments (number of pods) based on metrics
          ○ Container resources - CPU/Memory
          ○ Object
          ○ Custom/External metrics
      ● Think twice before using it with stateful deployments
      Take into consideration:
      ● Default metrics sync loop: 15 seconds
      ● Metric tolerance: 10%
      ● Downscale stabilization window: the default value is 5 minutes (5m0s).
      ● Up to Kubernetes 1.17, these parameters are configured at the cluster level
      ● Since Kubernetes 1.18, some parameters can be tuned per HPA under .spec.behavior
  • 13. Metrics types (autoscaling/v2beta2)
      Resource:
      ● CPU/memory
      ● Container resource requests must be set
      ● API: metrics.k8s.io
      External:
      ● Metrics not related to Kubernetes objects
      ● AWS SQS, RDS, …
      ● API: external.metrics.k8s.io
      Custom:
      ● Pod/Object (in the same namespace)
      ● Time series DB required (e.g. Prometheus)
      ● API: custom.metrics.k8s.io
      Example with a target average of 65%:
        desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
        ceil[4 * (0.781 / 0.650)] = ceil[4.8] = 5 (i.e. +1 pod)
      Note: With multiple metrics, the highest resulting replica count is chosen.
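The replica formula on this slide can be checked with a few lines of Python. This is a sketch, not the HPA controller itself: the function names are hypothetical, and the tolerance check assumes the 10% metric tolerance quoted on the Horizontal Pod Autoscaler slide, meaning no scaling happens while the current/desired ratio stays within 10% of 1.0.

```python
import math


def desired_replicas(current_replicas: int,
                     current_value: float,
                     desired_value: float,
                     tolerance: float = 0.10) -> int:
    """ceil(currentReplicas * currentMetricValue / desiredMetricValue),
    skipped when the ratio is within the tolerance band around 1.0."""
    ratio = current_value / desired_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


def desired_replicas_multi(current_replicas: int,
                           metrics: list[tuple[float, float]],
                           tolerance: float = 0.10) -> int:
    """With several (current, desired) metric pairs, take the highest count."""
    return max(desired_replicas(current_replicas, cur, tgt, tolerance)
               for cur, tgt in metrics)


if __name__ == "__main__":
    # Slide example: 4 replicas at 78.1% average against a 65% target.
    print(desired_replicas(4, 0.781, 0.650))
    # 70% against 65% is inside the 10% tolerance, so nothing changes.
    print(desired_replicas(4, 0.70, 0.65))
```

The first call reproduces the slide's result of 5 pods; the second shows why small deviations from the target do not trigger scaling.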
  • 14. Custom and External metrics
      Limitation: one adapter per metrics type (Custom or External), or a single adapter for both
      % kubectl get APIService v1beta1.external.metrics.k8s.io -o yaml
        kind: APIService
        metadata:
          labels: …
          name: v1beta1.external.metrics.k8s.io
        spec:
          group: external.metrics.k8s.io
          service:
            name: k8s-cloudwatch-adapter
            namespace: ...
            port: …
          ...
  • 15. Custom and External metrics
      ● Check available metrics:
      % kubectl get --raw /apis/external.metrics.k8s.io/v1beta1 | jq .
        {
          "kind": "APIResourceList",
          "apiVersion": "v1",
          "groupVersion": "external.metrics.k8s.io/v1beta1",
          "resources": [
            {
              "name": "ts2-numberOfMessagesSent",
              "singularName": "",
              "namespaced": true,
              "kind": "ExternalMetricValueList",
              "verbs": ["get"]
            }
          ]
        }
      % kubectl get --raw /apis/external.metrics.k8s.io/v1beta1/namespaces/<ns name>/<metric name> | jq .
        {
          "kind": "ExternalMetricValueList",
          "apiVersion": "external.metrics.k8s.io/v1beta1",
          "metadata": {...},
          "items": [
            {
              "metricName": "ts2-numberOfMessagesSent",
              "metricLabels": null,
              "timestamp": "2021-02-17T11:05:20Z",
              "value": "27"
            }
          ]
        }
  • 16. Tested Adapters for HPA
      ● Prometheus adapter - we no longer use it
          ○ Doesn’t fit our needs any more - redesign of infrastructure needed
          ○ (https://github.com/kubernetes-sigs/prometheus-adapter)
      ● K8s-cloudwatch adapter - in use now
          ○ (https://github.com/awslabs/k8s-cloudwatch-adapter)
      ● Kube-metrics-adapter - evaluated
          ○ Collectors: Pod, Prometheus, AWS, HTTP, …
          ○ (https://github.com/zalando-incubator/kube-metrics-adapter)
      ● KEDA - evaluating now
          ○ (https://keda.sh)
  • 17. Trainstation 2 - Real Life Example
      Types of workloads:
      ● Live traffic - changes over the course of the day
      ● Start/End of the Event - once per month
      ● Start/End of the Competitions - twice per week
      ● Event cleanup - a few days after the event ends
      Goals:
      ● Scaling backend - based on live traffic
      ● Batch/Asynchronous workers scaling - based on queue size
      ● Limit batch processing under DB pressure
      ● Maximize off-peak hours batch processing
      ● Start of the Event/Competitions - scale ahead
  • 18. Trainstation 2 - Real Life Example
      Cloudwatch adapter
      Common approach:
        1. Monitor queue
        2. Calculate the optimal number of workers
        3. Return metrics
      Advanced approach:
        1. Monitor queue
        2. Calculate the optimal number of workers
        3. Monitor RDS utilization
        4. Tune the number of workers so the live workload is not overloaded
        5. Return metrics
  • 19. Trainstation 2 - Real Life Example
        apiVersion: metrics.aws/v1alpha1
        kind: ExternalMetric
        metadata:
          name: "ts2-numberOfMessagesSent"
        spec:
          name: "ts2-numberOfMessagesSent"
          resource:
            resource: "deployment"
          queries:
            - id: queue_metric
              metricStat:
                metric:
                  namespace: "AWS/SQS"
                  metricName: "NumberOfMessagesSent"
                  dimensions:
                    - name: QueueName
                      value: "ts2-demo"
                period: 30
                stat: Sum
                unit: Count
              returnData: false
            - id: db_cpuutilization
              metricStat:
                metric:
                  namespace: "AWS/RDS"
                  metricName: "CPUUtilization"
                  dimensions:
                    - name: DBClusterIdentifier
                      value: "ts2-demo-cluster"
                    - name: Role
                      value: WRITER
                period: 300
                stat: Average
                unit: Percent
              returnData: false
            - id: workers_calculated
              expression: "IF((queue_metric / 300) > 100, 100, queue_metric / 300)"
              returnData: false
            - id: workers_desired
              expression: "IF(db_cpuutilization < 80, workers_calculated, IF(db_cpuutilization < 90, workers_calculated * 80 / 100, 0))"
              returnData: true
      Desired pods: get queue size -> calculate desired pods -> limit to 100 max -> if DB utilization is 80% or more, reduce by 20% -> if DB utilization is 90% or more, return 0
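The two metric-math expressions on this slide are easier to follow when written out as ordinary functions. The Python below is only a transcription of the `workers_calculated` and `workers_desired` expressions for illustration; it does not call CloudWatch, and the parameter names `per_worker` and `cap` are labels chosen here for the constants 300 and 100 from the slide.

```python
def workers_calculated(queue_metric: float,
                       per_worker: float = 300,
                       cap: float = 100) -> float:
    """IF((queue_metric / 300) > 100, 100, queue_metric / 300)"""
    workers = queue_metric / per_worker
    return cap if workers > cap else workers


def workers_desired(queue_metric: float, db_cpuutilization: float) -> float:
    """IF(db < 80, calculated, IF(db < 90, calculated * 80 / 100, 0))"""
    calculated = workers_calculated(queue_metric)
    if db_cpuutilization < 80:
        return calculated               # DB is healthy: use the full count
    if db_cpuutilization < 90:
        return calculated * 80 / 100    # DB under pressure: cut by 20%
    return 0                            # DB close to saturation: stop batch work


if __name__ == "__main__":
    for cpu in (50, 85, 95):
        print(cpu, workers_desired(9000, cpu))
```

For 9000 messages in the period this yields 30 workers at 50% DB CPU, 24 at 85%, and 0 at 95%. Since the HPA on the next slide sets minReplicas to 4, a returned 0 would presumably still leave 4 worker pods running.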
  • 20. Trainstation 2 - Real Life Example
      Horizontal Pod Autoscaler config:
        apiVersion: autoscaling/v2beta2
        kind: HorizontalPodAutoscaler
        metadata:
          labels:
            app: ts2-demo
          name: ts2-demo-worker
        spec:
          behavior:
            scaleDown:
              policies:
              - periodSeconds: 15
                type: Percent
                value: 100
              selectPolicy: Max
              stabilizationWindowSeconds: 120
            scaleUp:
              ...
              stabilizationWindowSeconds: 0
          minReplicas: 4
          maxReplicas: 100
          metrics:
          - external:
              metric:
                name: ts2-numberOfMessagesVisible
              target:
                averageValue: "1"
                type: AverageValue
            type: External
          - external:
              metric:
                name: ts2-numberOfMessagesSent
              target:
                averageValue: "1"
                type: AverageValue
            type: External
          scaleTargetRef:
            apiVersion: apps/v1
            kind: Deployment
            name: ts2-demo-worker
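The effect of `stabilizationWindowSeconds: 120` on scale-down can be illustrated with a small Python sketch. It is a simplified model, not the HPA controller's implementation: the class name and its method are hypothetical, and it ignores the Percent policy and the min/max replica bounds. It assumes that within the window the controller acts on the highest recent recommendation, so a lower recommendation only takes effect once every higher one has aged out.

```python
from collections import deque


class ScaleDownStabilizer:
    """Keeps recent replica recommendations and returns the highest one
    seen inside the stabilization window."""

    def __init__(self, window_seconds: int = 120) -> None:
        self.window_seconds = window_seconds
        self.history: deque[tuple[float, int]] = deque()  # (timestamp, replicas)

    def stabilize(self, now: float, recommended: int) -> int:
        self.history.append((now, recommended))
        # Drop recommendations that are older than the window.
        while self.history[0][0] < now - self.window_seconds:
            self.history.popleft()
        return max(replicas for _, replicas in self.history)


if __name__ == "__main__":
    stabilizer = ScaleDownStabilizer(window_seconds=120)
    for t, rec in [(0, 27), (60, 10), (119, 10), (121, 10)]:
        print(t, rec, stabilizer.stabilize(t, rec))
```

In this run the recommendation drops from 27 to 10 at t=60, but the stabilized value stays at 27 until the original recommendation leaves the 120-second window at t=121. A higher recommendation is returned immediately, which matches the scale-up window of 0 in the config.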
  • 21. Trainstation 2 - Real Life Example
      % k get hpa ts2-demo-worker
      NAME              REFERENCE                    TARGETS                    MINPODS   MAXPODS   REPLICAS   AGE
      ts2-demo-worker   Deployment/ts2-demo-worker   0/1 (avg), 1038m/1 (avg)   4         100       27         2d4h
  • 22. Trainstation 2 - Real Life Example
      It’s working :) but you have to take into account the stabilization window and the metrics delay
  • 23. Questions?
      msoltys@pixelfederation.com
      linkedin.com/in/mariansoltys