Karpenter
Migrating from EKS Cluster Autoscaler to Karpenter
Ashish Gajjar
AWS Community Builder
Member of Technical Staff
eInfochips
Outline
 What is Autoscaling?
 HPA and VPA
 Cluster Autoscaler
 Karpenter
 Autoscaler to Karpenter Migration
 Demo
What is Autoscaling?
Scales Amazon EC2 capacity out or in automatically according to load patterns:
 Expands the number of instances from 1 to 100+ automatically during load peaks.
 Reduces the number of instances from 100+ to 1 automatically during load valleys.
Horizontal Pod Autoscaler
• HPA dynamically adjusts the number of pod replicas based on CPU or memory utilization metrics.
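For reference, a minimal HorizontalPodAutoscaler manifest as a sketch (uses the autoscaling/v2 API; the target Deployment name web and the 50% CPU goal are illustrative, not from the slides):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                      # Deployment being scaled (assumed to exist)
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # add replicas when average CPU exceeds 50%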
Vertical Pod Autoscaler
 Automatically adjusts the CPU and memory reservations for your pods to help "right-size" your applications.
 Improves cluster resource utilization and frees up CPU and memory for other pods.
(Diagram: VPA resizes a pod's memory request from 64Mi to 128Mi.)
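A minimal VerticalPodAutoscaler manifest as a sketch (assumes the VPA components are installed in the cluster; the target Deployment name is illustrative):

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    updateMode: "Auto"   # VPA evicts pods and recreates them with right-sized requests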
Kubernetes Autoscaling
1. Horizontal Pod Autoscaling (HPA)
2. Vertical Pod Autoscaling (VPA)
3. Cluster Autoscaler (CAS)
(Diagram: HPA and VPA consume the metrics store, while pending pods that cannot be scheduled trigger CAS.)
Without Autoscaling
What is Cluster Autoscaler?
 Automatically adjusts the number of worker nodes in a cluster based on workload.
 Scales down nodes when they are no longer needed.
 Monitors utilization of resources, such as CPU and memory, and adjusts the size of the cluster by adding or removing nodes.
 Reduces costs associated with overprovisioning of resources.
(Diagram: pending pods are detected by CAS, which makes an API call to scale up the ASG.)
Limitations
• Scaling Delays:
  • Takes time to adjust scaling based on pending pods.
  • Limited granularity when optimizing resource utilization.
• Complex Node Group Management:
  • Requires manual configuration of Auto Scaling Groups and instance types.
• Limited Flexibility:
  • Doesn't support rapid scaling of diverse workloads or smaller, heterogeneous instances.
• Suboptimal Utilization:
  • Can lead to resource fragmentation or inefficient use of EC2 instances.
EKS Autoscaler Demo
Create Cluster
eksctl create cluster \
  --name <Name of Cluster> \
  --version <Version Number> \
  --region <Region Name> \
  --nodegroup-name <Node Group Name> \
  --node-type <Instance Type> \
  --nodes 2 \
  --nodes-min 1 \
  --nodes-max 4 \
  --managed

kubectl get pod -A
 aws-node
 coredns
 kube-proxy
[ec2-user@ip-172-31-0-244 ~]$ eksctl create cluster --name ashish --version 1.30 --region ap-south-1 --nodegroup-name ashish-workers --node-type t3.medium --nodes 1 --nodes-min 1 --nodes-max 3 --managed
[root@ip-172-31-18-24 ~]# kubectl get po -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system aws-node-kdmxh 1/1 Running 0 51m
kube-system aws-node-m7x56 1/1 Running 0 51m
kube-system coredns-79df7fff65-9c4z7 1/1 Running 0 56m
kube-system coredns-79df7fff65-lz9t9 1/1 Running 0 56m
kube-system kube-proxy-bfkfk 1/1 Running 0 51m
kube-system kube-proxy-m4ckw 1/1 Running 0 51m
IAM
• Identity Provider
  • An identity provider (IdP) is responsible for user authentication, while a service provider (SP), such as a service or an application, controls access to resources.
  • Amazon EKS supports OpenID Connect (OIDC) identity providers as a method to authenticate users to your cluster.
• IAM Policy
• IAM Roles
  • A web identity role is an IAM role that a user assumes when they log in to AWS using an identity provider (IdP).
  • When a user logs in, they receive temporary security credentials that are associated with an IAM role.
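Before a controller can assume a web identity role through IAM roles for service accounts (IRSA), the cluster needs an IAM OIDC provider. A hedged one-liner with eksctl, using the demo cluster name:

eksctl utils associate-iam-oidc-provider --cluster ashish --approve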
Cluster Autoscaler
• Automates the management of your
cluster's node resources
• Functionality:
• Scaling Up: Adds nodes to the cluster when pods
cannot be scheduled due to insufficient resources.
• Scaling Down: Removes nodes when they are
underutilized and have no running pods.
Cluster Autoscaler and metrics server
Metrics server
- Collects and aggregates resource usage data for a Kubernetes cluster
- kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Cluster Autoscaler
Modify the IAM role, then run the Cluster Autoscaler with --node-group-auto-discovery configured for the cluster name.
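A sketch of the flags the Cluster Autoscaler container is typically started with when using tag-based auto-discovery (this fragment of the Deployment's container spec is illustrative; the two ASG tags are the conventional ones and ashish is the demo cluster name):

command:
  - ./cluster-autoscaler
  - --cloud-provider=aws
  - --balance-similar-node-groups
  - --skip-nodes-with-system-pods=false
  - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/ashish

Auto-discovery only finds Auto Scaling Groups that carry both tags, so the managed node group's ASG must be tagged accordingly.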
 Configure resource limits
 Monitor the cluster regularly
 Consider auto-scaling policies
 Optimize node sizes
 Use multiple availability zones
 Plan for node termination
 Test the Autoscaler
How does it work?
Simple Application
Application Name: Chitti

kubectl run chitti \
  --image=k8s.gcr.io/hpa-example \
  --requests=cpu=200m \
  --expose \
  --port=80

kubectl autoscale deployment chitti \
  --cpu-percent=50 \
  --min=1 \
  --max=10
Load Generator
kubectl run -i --tty load-generator --image=busybox -- /bin/sh
Inside the busybox shell, generate load against the service:
while true; do wget -q -O - http://chitti; done
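While the load generator runs, the scale-out can be watched with standard kubectl commands:

kubectl get hpa chitti --watch   # replica count climbs as CPU crosses the 50% target
kubectl get nodes --watch        # Cluster Autoscaler adds nodes once pods go Pending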
Karpenter
What is Karpenter?
Karpenter is an intelligent, high-performance Kubernetes compute provisioning and management solution.
• Dynamic, groupless node provisioning
• Open source and native to Kubernetes
• Rapid scaling
• Automatic node sizing
• Consolidates instance orchestration responsibilities within a single system
How Karpenter works?
(Diagram: with the Cluster Autoscaler, pending pods from pod autoscaling cause CAS to scale an Auto Scaling Group, which calls the EC2 API; Karpenter provisions capacity directly through EC2 Fleet (instant), with no node groups in between.)
Why Karpenter?
• Faster Scaling:
  • Scales faster and more efficiently in response to demand changes.
• Cost Optimization:
  • Proactively selects the most cost-effective instance types and sizes.
  • Right-sizes instances based on real-time workloads.
• Simplified Cluster Management:
  • No need for predefined Auto Scaling Groups.
  • Automatically handles instance provisioning, scaling, and decommissioning.
• Enhanced Flexibility:
  • Works with a broader range of instance types and other cloud providers.
• Support for Diverse Workloads:
  • Better suited for mixed workloads, GPU instances, and burstable resources.
Difference

Feature           | EKS Cluster Autoscaler                                   | Karpenter
Scaling Mechanism | Scales based on node groups and limits.                  | Dynamic provisioning of EC2 instances on demand.
Node Groups       | Requires predefined node groups.                         | No need for predefined node groups.
Instance Types    | Limited to predefined instance types.                    | Chooses optimal instance types dynamically.
Spot Instances    | Supports Spot Instances, but with pre-configured groups. | Extensive support for Spot Instances and cost optimization.
Scaling Speed     | Slower scaling due to Auto Scaling Group delays.         | Faster scaling, better suited for on-demand scaling.
Cost Efficiency   | Limited by node group configuration.                     | Optimized for cost with dynamic instance selection.
Amazon EC2 Fleet
Consolidates instance orchestration responsibilities within a single system.
(Diagram: an EC2 Fleet spanning AZ1 and AZ2, mixing Spot, On-Demand, and Reserved Instances.)
EC2 Fleet allows you to synchronously provision capacity across different instance types, Availability Zones (AZs), and purchase options with a single API (the "Builders API").
Use all three purchase options to optimize costs.
Benefits
• Reduce costs
• Increase operational efficiency
• Reduce development effort
Key features
• Flexible capacity allocation
• Massive scale
• Simplified provisioning
• Instant Fleets: Drop-In
replacement for RunInstances
Groupless, Flexible, Simple
Node Autoscaling with a NodePool (CRD)
(Diagram: a Karpenter provisioner in the EKS cluster launching g4, g5, and p4 instances across AZ 1, AZ 2, and AZ 3.)
Node template CRD
• AWS node template – configures cloud
provider-specific parameters, such as
tags, subnets, and AMIs
• Supports custom user-data
Strategies for defining NodePools

Single: a single NodePool can manage compute for multiple teams and workloads.
Example use cases:
• A single NodePool for a mix of Graviton and x86, while a pending pod has a requirement for a specific processor type

Multiple: isolate compute for different purposes.
Example use cases:
• Expensive hardware
• Security isolation
• Team separation
• Different AMIs

Weighted: define an order across your NodePools so that the scheduler will attempt to schedule with one NodePool before another.
Example use cases:
• Prioritize Spot and RIs ahead of other instance types
• Default cluster-wide configuration
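As a sketch of the weighted strategy: the spec.weight field orders NodePools, and a higher weight is tried first. The name, weight value, and Spot-only requirement below are illustrative, and the nodeClassRef assumes an EC2NodeClass named default like the one shown later:

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: spot-first
spec:
  weight: 100                 # evaluated before lower-weight NodePools
  template:
    spec:
      nodeClassRef:
        name: default         # EC2NodeClass to use (assumed)
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]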
Pod scheduling constraints must fall within a NodePool's constraints.
• Provision nodes using Kubernetes scheduling constraints
• Track nodes using native Kubernetes labels
Native to Kubernetes: node selectors, node affinity, taints and tolerations, topology spread.
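A sketch of pod-level constraints that Karpenter honors; the labels used (karpenter.sh/capacity-type, kubernetes.io/arch, topology.kubernetes.io/zone) are standard well-known labels, while the Deployment name and image are illustrative:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      nodeSelector:
        karpenter.sh/capacity-type: spot   # only run on Spot capacity
        kubernetes.io/arch: arm64          # ask for Graviton nodes
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: web
      containers:
        - name: web
          image: public.ecr.aws/nginx/nginx:latest   # illustrative image
          resources:
            requests:
              cpu: 500m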
EC2 Allocation strategy
Spot allocation
• Price-Capacity-Optimized
• Reduces the cost of the instances
• Reduces the frequency of Spot terminations
On-demand allocation
• Lowest-Price
• Reduces the cost of the instances
• Built-in Spot instance lifecycle management
• Supports Spot to On-Demand fallback
Node Consolidation
• Deletes a node when its pods can run on free capacity of other nodes in the cluster
• Deletes a node when the node is empty
• Replaces a node when its pods can run on a combination of free capacity of other nodes in the cluster and a cheaper replacement node
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: my-nodepool
spec:
  disruption:
    consolidationPolicy: WhenUnderutilized
    consolidateAfter: 30s
Karpenter Demo
● Kubernetes Cluster (1.21+).
● Cloud Provider Access (AWS, GCP, Azure, etc.).
● IAM Permissions for Cloud Providers.
● Helm for Deployment.
Prerequisites for Using Karpenter
kubectl get pod -A
Create IAM Roles
1. KarpenterNodeRole-${CLUSTER_NAME}
2. KarpenterControllerRole-${CLUSTER_NAME}
KarpenterNodeRole-ashish
• AmazonEKSWorkerNodePolicy
  • Allows Amazon EKS worker nodes to connect to Amazon EKS clusters.
• AmazonEKS_CNI_Policy
  • Grants the permissions required by the Amazon VPC CNI plugin.
• AmazonEC2ContainerRegistryReadOnly
  • Provides read-only access to Amazon EC2 Container Registry repositories.
• AmazonSSMManagedInstanceCore
  • Enables AWS Systems Manager core functionality.
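A hedged sketch of creating the node role and attaching the four managed policies with the AWS CLI (node-trust-policy.json is assumed to contain a standard trust policy allowing ec2.amazonaws.com to assume the role):

aws iam create-role --role-name "KarpenterNodeRole-ashish" \
  --assume-role-policy-document file://node-trust-policy.json

for policy in AmazonEKSWorkerNodePolicy AmazonEKS_CNI_Policy \
  AmazonEC2ContainerRegistryReadOnly AmazonSSMManagedInstanceCore; do
  aws iam attach-role-policy --role-name "KarpenterNodeRole-ashish" \
    --policy-arn "arn:aws:iam::aws:policy/${policy}"
done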
KarpenterControllerRole-ashish
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:${AWS_PARTITION}:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_ENDPOINT#*//}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${OIDC_ENDPOINT#*//}:aud": "sts.amazonaws.com",
          "${OIDC_ENDPOINT#*//}:sub": "system:serviceaccount:${KARPENTER_NAMESPACE}:karpenter"
        }
      }
    }
  ]
}
• Add the trust relationship policy and attach the Karpenter controller policy.
ControllerRole Policy
• Substitute ${AWS_REGION}
• Substitute ${CLUSTER_NAME}
Add tags to subnets and security groups

[ec2-user@ip-172-31-0-244 ~]$ aws eks describe-nodegroup --cluster-name "ashish" --nodegroup-name "ashish-workers" --query 'nodegroup.subnets' --output text
subnet-0a968db0a4c73858d subnet-0bcd684f5878c3282 subnet-061e107c1f8ebc361

[ec2-user@ip-172-31-0-244 ~]$ aws eks describe-cluster --name "ashish" --query "cluster.resourcesVpcConfig.clusterSecurityGroupId" --output text
sg-0e0ac4fa44824e1aa

aws ec2 create-tags --tags "Key=karpenter.sh/discovery,Value=ashish" --resources "sg-0e0ac4fa44824e1aa"

• Collect the subnet details
• Collect the security group
• Create tags for security groups and subnets (see the sketch below)
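The subnets collected above can be tagged the same way as the security group, so Karpenter's discovery selectors find them (a sketch reusing the subnet IDs returned by describe-nodegroup):

aws ec2 create-tags \
  --tags "Key=karpenter.sh/discovery,Value=ashish" \
  --resources subnet-0a968db0a4c73858d subnet-0bcd684f5878c3282 subnet-061e107c1f8ebc361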
Update aws-auth ConfigMap
• Allow nodes that use the node IAM role we just created to join the cluster. To do that, modify the aws-auth ConfigMap in the cluster.
• Add a section to mapRoles that looks something like this:

kubectl edit configmap aws-auth -n kube-system

- groups:
  - system:bootstrappers
  - system:nodes
  # - eks:kube-proxy-windows
  rolearn: arn:aws:iam::${AWS_ACCOUNT_ID}:role/KarpenterNodeRole-ashish
  username: system:node:{{EC2PrivateDNSName}}
Deploy Karpenter
helm install karpenter oci://public.ecr.aws/karpenter/karpenter --namespace "karpenter" \
  --create-namespace \
  --set "settings.clusterName=ashish" \
  --set "serviceAccount.annotations.eks.amazonaws.com/role-arn=arn:aws:iam::${AWS_ACCOUNT_ID}:role/KarpenterControllerRole-ashish" \
  --set controller.resources.requests.cpu=1 \
  --set controller.resources.requests.memory=1Gi \
  --set controller.resources.limits.cpu=1 \
  --set controller.resources.limits.memory=1Gi
Set node affinity
• Edit the karpenter.yaml file and find the Karpenter deployment affinity rules. Modify the affinity so Karpenter will run on one of the existing node group nodes.

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: karpenter.sh/nodepool
              operator: DoesNotExist
            - key: eks.amazonaws.com/nodegroup
              operator: In
              values:
                - ${NODEGROUP}
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - topologyKey: "kubernetes.io/hostname"
NodePool
• Multi-CPU architecture
• Mixed purchase options: On-Demand/Spot
• We need to create a default NodePool so Karpenter knows what types of nodes we want for unscheduled workloads.

Karpenter resources:
https://github.com/aws/karpenter-provider-aws
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2   # Amazon Linux 2
  role: "KarpenterNodeRole-${CLUSTER_NAME}"
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"   # replace with your cluster name
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
  amiSelectorTerms:
    - id: "${ARM_AMI_ID}"
    - id: "${AMD_AMI_ID}"
    # - id: "${GPU_AMI_ID}"   # GPU-optimized AMI
    # - name: "amazon-eks-node-${K8S_VERSION}-*"
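The EC2NodeClass above is referenced from a NodePool. A minimal sketch of a default NodePool for this setup (field names follow the karpenter.sh/v1 API to match the EC2NodeClass version; adjust the apiVersion to your installed Karpenter release, and treat the requirements and CPU limit as illustrative):

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default                     # the EC2NodeClass defined above
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]   # mixed purchase options
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]      # multi-CPU architecture
  limits:
    cpu: 100                              # cap total provisioned vCPUs
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m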
Remove CAS
kubectl scale deploy/cluster-autoscaler -n kube-system --replicas=0

Verify logs
kubectl logs -f -n karpenter -c controller -l app.kubernetes.io/name=karpenter
Cost
• Assume you're running the m5.large instance type at $0.096 per hour for 5 nodes, i.e. $0.48 per hour, or $11.52 per day.

Cost comparison for 30 days:
• Cluster Autoscaler (with On-Demand EC2 instances):
  • $11.52 * 30 = $345.60 per month
• Karpenter (with Spot Instances, roughly 60% cheaper per day):
  • $4.61 * 30 = $138.30 per month

Total cost calculation (for one month):
• Cluster Autoscaler total:
  • EKS fee ($0.10 * 24 hours * 30 days) = $72
  • EC2 cost = $345.60
  • Total = $72 + $345.60 = $417.60 per month
• Karpenter total:
  • EKS fee ($0.10 * 24 hours * 30 days) = $72
  • EC2 cost = $138.30
  • Total = $72 + $138.30 = $210.30 per month

Cost savings with Karpenter:
By migrating to Karpenter, you can save approximately:
• $417.60 - $210.30 = $207.30 per month
Karpenter Demo
Thank you


Editor's Notes

  • #5 HPA scales the pod replicas up or down to maintain desired performance levels. This ensures that your application can efficiently handle varying levels of traffic without manual intervention. Horizontal scaling means that the response to increased load is to deploy more Pods. This is different from vertical scaling, which for Kubernetes would mean assigning more resources (for example: memory or CPU) to the Pods that are already running for the workload. If the load decreases, and the number of Pods is above the configured minimum, the HorizontalPodAutoscaler instructs the workload resource (the Deployment, StatefulSet, or other similar resource) to scale back down.
  • #6 Kubernetes provides a shared pool of resources allocated based on how you configure your containerized application. The allocation process is handled by a Scheduler, which checks the resource requirements of each container and selects an appropriate node to deploy the container’s pod. Resource requests specify the resources that have to be reserved for a container. A Scheduler has to ensure that the container’s pod is placed in a node that guarantees the availability of requested resources. Resource limit specifies the maximum amount of resources a container is allowed to use. If a container’s resource needs exceed the set limit, the kubelet automatically restarts it.
  • #7 The cluster autoscaler is a popular open source solution responsible for ensuring that your cluster has enough nodes to schedule your pods without wasting resources.
  • #24 So let’s see now how it works for Karpenter and how it’s different from cluster autoscaler. What if we remove the concept of node groups? Karpenter's goal is to improve the efficiency and cost of running workloads on Kubernetes clusters. Karpenter works by: Kube Scheduler gets the first crack at scheduling pending pods. Tries to schedule on existing capacity Karpenter observes for pods that the Kubernetes scheduler has marked as unschedulable Aggregate resource requests and Evaluating scheduling constraints (resource requests, nodeselectors, affinities, tolerations, and topology spread constraints) requested by the pods (By default Karpenter uses C, M, and R >= Gen 3 instance types) Select 60 most efficient instance types, starting with smallest for fitting pods Applies On-demand and Spot allocation strategies (prefers Spot) – We will talk about it on later slide Calls EC2 Fleet API and launches instance Scheduling the pods to run on the new nodes Removing the nodes when the nodes are no longer needed Scale-in- Terminations - Deprovisioning & Upgrading Nodes Replace underutilized nodes with more efficient compute Node Expiration TTL - Karpenter deletes nodes when they are no longer needed ttlSecondsAfterEmpty value configures when to terminate empty nodes this is only applicable to nodes provisioned via Karpenter Karpenter respects Pod disruption budgets ttlSecondsUntilExpired defines a set period of time the Nodes will be terminated and replaced with newer nodes consolidation actively seeking to reduce cost kubectl delete node with graceful draining
  • #29 By default, Karpenter uses Amazon Linux 2 images. So this is the kubernetes custom object called AWSNodeTemplate that actually looks like Launch Template Node Templates provides the specific configuration that applies to that cloud provider, here we enable configuration of AWS specific settings. Each provisioner must reference an AWSNodeTemplate using spec.providerRef. Multiple provisioners may point to the same AWSNodeTemplate. Support for: subnetSelector securityGroupSelector instanceProfile amiFamily amiSelector !!! userData Tags metadataOptions blockDeviceMappings detailedMonitoring Karpenter also adds the following tags to resources it creates: Name: karpenter.sh/provisioner-name/<provisioner-name> karpenter.sh/provisioner-name: <provisioner-name> kubernetes.io/cluster/<cluster-name>: owned
  • #30 Use provisioners to ensure you are scaling using best practices Use default provisioner with diverse instance types and availability zones Use additional provisioners for different compute constraints Control scheduling of your application pods with node selectors, topologySpreadConstraints, taints and tolerations You can change your Provisioner or add other Provisioners to Karpenter. Here are things you should know about Provisioners: Karpenter won’t do anything if there is not at least one Provisioner configured. Each Provisioner that is configured is looped through by Karpenter. If Karpenter encounters a taint in the Provisioner that is not tolerated by a Pod, Karpenter won’t use that Provisioner to provision the pod. If Karpenter encounters a startup taint in the Provisioner it will be applied to nodes that are provisioned, but pods do not need to tolerate the taint. Karpenter assumes that the taint is temporary and some other system will remove the taint. It is recommended to create Provisioners that are mutually exclusive. So no Pod should match multiple Provisioners. If multiple Provisioners are matched, Karpenter will use the Provisioner with the highest weight.
  • #31 Supports all scheduling constraints: Topology Spread, Node/Pod Affinity and Anti-Affinity, etc. The Provisioner sets constraints on the nodes that can be created by Karpenter and the pods that can run on those nodes. The Provisioner can be set to do things like: Define taints to limit the pods that can run on nodes Karpenter creates Define any startup taints to inform Karpenter that it should taint the node initially, but that the taint is temporary. Limit node creation to certain zones, instance types, and computer architectures Set defaults for node expiration There is no native support for namespaced based provisioning.
  • #32 Karpenter only uses the Node deletion consolidation mechanism. It will not replace a spot node with a cheaper spot node. For Karpenter to do a Spot to On-demand fallback, Provisioner need to have Flexible with both Spot & On-damand, and at least 5 instance family-type/size in the requirements. If there is a fallback to OD, if consolidation is enabled, there is a return to Spot once capacity exists! priceCapacityOptimized  - helps improve workload stability, and reduce interruption rates by choosing instances that are from the highest capacity availability for the number of instances that are launching, as well as those that are from the lowest priced of these pools - the pools that we believe have the lowest chance of interruption in the near term. lowest-price allocation strategy. So fleet will provision the lowest priced instance type it can get from the 60 instance types Karpenter passed to the EC2 fleet API. If the instance type is unavailable for some reason, then fleet will move on to the next cheapest instance type. Spot lifecycle management includes handling of interruption events, moving to other spot pools that are available, and if there isn’t available capacity on spot provision the required on demand capacity. When later spot capacity availability resumes, karpenter will move back to spot as part of the consolidation feature. As of today, Karpenter does not allow customizing the Spot allocation strategy.  Instead, it enforces using the Spot price-capacity-optimized strategy. There are workloads in which a customer may prefer the price optimized or capacity optimized at an expense of one another. If this is also relevant for your customers, have them +1 the below issue. https://github.com/aws/karpenter/issues/1240 Spot Interruptions - https://karpenter.sh/preview/concepts/deprovisioning/#interruption
  • #33 Essentially we choose to delete nodes when that node's pods will run on other nodes in your cluster. If that isn't possible, we will replace a node with a cheaper node if the node's pods can run on a combination of the other nodes in your cluster and the cheaper replacement node. If there are multiple nodes that could be potentially deleted or replaced, we choose to consolidate the node that overall disrupts your workloads the least by preferring to terminate: nodes with fewer pods nodes that will expire soon nodes with lower priority pods **For spot nodes, Karpenter only uses the deletion consolidation mechanism. It will not replace a spot node with a cheaper spot node. Spot instance types are selected with the price-capacity-optimized strategy and often the cheapest spot instance type is not launched due to the likelihood of interruption Delete Consolidation does also include events where instances are moved from on-demand to spot, however karpenter does not trigger Replace to make Spot node smaller as this can have an impact on the level of interruptions.   To avoid racing between consolidation and the existing empty node removal, we need one single mechanism that is responsible for for both general consolidation and eliminating empty nodes To do this, we will treat ttlSecondsAfterEmpty and consolidation as mutually exclusive and check this via our validation webhook. If ttlSecondsAfterEmpty is set and consolidation is turned off, it continues to work as it does now. If consolidation is turned on, then ttlSecondsAfterEmpty must not be set and consolidation is responsible for empty nodes. This doesn’t break existing customers that use ttlSecondsAfterEmpty but don’t turn on consolidation. It also allows for customers that are concerned about workload disruption to continue to only have nodes removed if they are entirely unused for a period of time. Polling Period - This is how often we examine the cluster for consolidation. Currently it’s set to a few seconds with an optimization built in that if we've examined the cluster and found no actions that can be performed, we will pause cluster examination for a period of time unless the cluster state has changed in some way (e.g. pods or nodes added/removed). Stabilization Window - This is the time period after a node has been deleted before we consider consolidating again. This is needed as controllers that replace evicted pods take a small amount of time to act. As we are only looking at node capacities with respect to the pods bound to them, we need to wait for those pods to be recreated and bind. This value is currently dynamic and is set to 5 minutes if there are pending pods or un-ready standard controllers and zero seconds if there are no pending pods and all standard controllers are ready. Minimum Node Lifetime - We use a minimum node lifetime of five minutes. If the node has been initialized for less than this period of time, we don’t consider it for consolidation. This time period can’t be too small as it sometimes takes a few minutes for dynamic PVCs to bind. If it is too large, then a rapid scale-up/scale-down will be delayed as empty nodes sit idle until they reach the minimum node lifetime.