Kubernetes Cluster Upgrade Strategies and Data: Best Practices for your Stateful Workload

Peter Schuurman (@pwschuurman)
Software Engineer / Google
2022-08-12
K8s Cluster Upgrade Strategies
Best Practices for your Stateful Workload

Agenda
● Why Upgrade?
● Stateful Workloads and Upgrades
● Nodepool Upgrade Strategies
● Control Plane Upgrade Strategies
● Upgrade Strategy and Workload Selection

Why Upgrade: Kubernetes Version Lifecycle

Why Upgrade: Modern and Protected
● Version Skew: The Kubernetes version skew policy supports nodes running up to 2 minor versions behind the control plane
● New Features: New features are introduced in upcoming Kubernetes versions, e.g. StatefulSet maxUnavailable was introduced in 1.24
● Security Compliance: Organizations following compliance protocols (PCI, HIPAA, FedRAMP) are required to apply security patches within 30 days of availability
● Patch Support: Kubernetes minor versions are maintained for 1 year
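The version skew rule can be checked mechanically before planning a control plane upgrade. The following is a minimal sketch, assuming that nodes may trail the control plane by at most 2 minor versions and may never be newer than it. The names `parse_minor`, `node_within_skew` and `MAX_NODE_SKEW` are hypothetical helpers for illustration, not part of any Kubernetes client.

```python
from typing import Tuple

# Assumption: nodes may be at most this many minor versions behind the
# control plane, and never ahead of it.
MAX_NODE_SKEW = 2


def parse_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as 'v1.24.3' or '1.24'."""
    parts = version.lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def node_within_skew(control_plane: str, node: str,
                     max_skew: int = MAX_NODE_SKEW) -> bool:
    """True if the node version is inside the assumed skew window."""
    cp_major, cp_minor = parse_minor(control_plane)
    node_major, node_minor = parse_minor(node)
    if cp_major != node_major:
        return False
    # The node must not be newer than the control plane, nor too far behind.
    return 0 <= cp_minor - node_minor <= max_skew


if __name__ == "__main__":
    for node_version in ["v1.24.1", "v1.22.9", "v1.21.14", "v1.25.0"]:
        ok = node_within_skew("v1.24.3", node_version)
        print(node_version, "ok" if ok else "outside supported skew")
```

A check like this suggests an ordering: upgrade the control plane first, then bring the nodepools forward before they fall out of the window.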

Why Upgrade: Modern Applications
MariaDB has modernized its architecture by bringing SkySQL to the cloud on Kubernetes. Built using the Kubernetes operator pattern, MariaDB leverages resiliency and maintains high availability during upgrades.
"We have been using containers for many years … Our goal was to simplify the implementation and focus less on lower-level infrastructure, dependencies and instance life-cycle. With Kubernetes, our engineers could leverage the strong momentum from the open source community to drive infrastructure logic and security." (Reference)

Why Upgrade: Upgrade Dimensions
Application Developer
Kubernetes Administrator
Cloud Platform

Why Upgrade: Upgrade Dimensions
● Application Compatibility: Ensuring your application is compatible with an upgraded Kubernetes version (node or control plane)
● Nodes: Upgrading the operating system, dependent libraries and Kubernetes software of your cluster’s data plane
● Control Plane: Upgrading the operating system and Kubernetes software of your cluster’s orchestration layer

Why Upgrade: Key Concerns
● Application Availability
● Cost
● Speed

Nodes: Surge Upgrades
● Application Availability: Suitable for fault-tolerant workloads. Control availability by specifying node maxUnavailable
● Cost: Cost-effective
● Speed: Increase upgrade velocity with parallel node surge
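The trade-off among the three bullets above can be estimated with a small calculation. This is a rough sketch under a stated assumption: the number of nodes upgraded in parallel equals the surge setting plus the unavailable setting, which is how some managed platforms document it, so verify this against your provider. The function `surge_upgrade_plan` and its parameter names are hypothetical.

```python
import math


def surge_upgrade_plan(node_count, max_surge, max_unavailable):
    """Estimate batch size, batch count, extra cost and availability floor.

    Assumption: max_surge + max_unavailable nodes are upgraded in parallel.
    """
    if node_count < 0 or max_surge < 0 or max_unavailable < 0:
        raise ValueError("values must be non-negative")
    batch_size = max_surge + max_unavailable
    if batch_size == 0:
        raise ValueError("max_surge + max_unavailable must be at least 1")
    return {
        "batch_size": batch_size,
        # Speed: fewer batches means a faster nodepool upgrade.
        "batches": math.ceil(node_count / batch_size),
        # Cost: temporary nodes that exist only while the upgrade runs.
        "extra_nodes_at_peak": max_surge,
        # Availability: existing nodes guaranteed to stay in service.
        "min_available_nodes": node_count - min(max_unavailable, node_count),
    }


if __name__ == "__main__":
    print(surge_upgrade_plan(node_count=10, max_surge=1, max_unavailable=0))
    print(surge_upgrade_plan(node_count=10, max_surge=2, max_unavailable=1))
```

Setting the unavailable value to 0 keeps full capacity at the price of surge nodes, while raising either value shortens the upgrade.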

Nodes: Blue/Green Upgrades
● Application Availability: Granular control during migration
● Cost: Increased cost with resource pre-provisioning
● Speed: Slow and controlled

Node Upgrade Takeaways
Surge Upgrades
● Application Availability: Rollback scenarios may take more time
● Cost: Lower cost, upgraded node creation occurs just in time
● Speed: Nodes can be upgraded in batches for increased speed
Blue/Green Upgrades
● Application Availability: High degree of application availability
● Cost: Higher cost, upgraded nodes are pre-provisioned
● Speed: Higher control over node migration reduces speed

Control Plane: Upgrades
● Kubernetes maintains API versions with each minor release
● API schema may change with new minor versions
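Because served API versions change between minor releases, it helps to scan stored manifests before a control plane upgrade. The sketch below uses a small, non-exhaustive sample of API removals (Ingress in `extensions/v1beta1` at 1.22, CronJob in `batch/v1beta1` and PodDisruptionBudget in `policy/v1beta1` at 1.25). The table `REMOVED_IN`, the function `blocked_by_upgrade` and the input shape are hypothetical; a real check should consult the deprecation guide for the target version.

```python
# Non-exhaustive sample: (apiVersion, kind) -> first (major, minor) version
# in which that API version is no longer served.
REMOVED_IN = {
    ("extensions/v1beta1", "Ingress"): (1, 22),
    ("batch/v1beta1", "CronJob"): (1, 25),
    ("policy/v1beta1", "PodDisruptionBudget"): (1, 25),
}


def blocked_by_upgrade(objects, target):
    """Return the objects whose API version is not served at `target`.

    objects: iterable of dicts with 'apiVersion', 'kind' and 'name' keys.
    target:  (major, minor) tuple of the version being upgraded to.
    """
    blocked = []
    for obj in objects:
        removed = REMOVED_IN.get((obj["apiVersion"], obj["kind"]))
        if removed is not None and target >= removed:
            blocked.append(obj)
    return blocked


if __name__ == "__main__":
    manifests = [
        {"apiVersion": "policy/v1beta1", "kind": "PodDisruptionBudget", "name": "db-pdb"},
        {"apiVersion": "apps/v1", "kind": "StatefulSet", "name": "db"},
    ]
    for obj in blocked_by_upgrade(manifests, target=(1, 25)):
        print("migrate before upgrading:", obj["kind"], obj["name"])
```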

Control Plane: Surge Upgrade
● Application Availability: HA control plane setups limit disruptions. Rollback of a Kubernetes minor version is not supported
● Cost: Cost-effective
● Speed: Fast

Control Plane: Blue/Green Upgrade
● Application Availability: Granular control over application upgrade. Safe minor version rollback
● Cost: Increased cost over in-place upgrades with cluster pre-provisioning
● Speed: Slow and controlled

Control Plane: Blue/Green Upgrade
● KEP-3335: Introduces building blocks to the StatefulSet API to enable StatefulSet replicas to be moved across clusters.
● With Kubernetes Multi-Cluster Services (KEP-1645), applications can maintain connectivity
● Demo
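The replica-by-replica move that KEP-3335 aims to enable can be pictured as a sequence of paired scaling steps. This is a minimal planning sketch, assuming the start-ordinal field proposed by the KEP (`spec.ordinals.start`) is available in the destination cluster and that pods move one at a time, highest ordinal first. The function `migration_steps` and the dictionary layout are hypothetical, and the code does not call any Kubernetes API or handle data migration.

```python
def migration_steps(name, replicas):
    """Yield one step per pod, moving the highest ordinal first.

    At every step the source owns ordinals [0, boundary) and the destination
    owns [boundary, replicas), so each ordinal exists in exactly one cluster.
    """
    for moved in range(1, replicas + 1):
        boundary = replicas - moved  # first ordinal owned by the destination
        yield {
            "pod": f"{name}-{boundary}",
            # Scale the source down so it no longer runs this ordinal.
            "source": {"replicas": boundary, "start_ordinal": 0},
            # Grow the destination downward from the top ordinal.
            "destination": {"replicas": moved, "start_ordinal": boundary},
        }


if __name__ == "__main__":
    for step in migration_steps("db", 3):
        print(step)
```

Stopping the loop part-way leaves a valid split across the two clusters, which is what makes rolling back to the original cluster feasible in this model.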

Control Plane Upgrade Takeaways
Surge Upgrades
● Application Availability: Rollback is not possible
● Cost: Lower cost, upgraded control plane creation occurs just in time
● Speed: Control Plane upgrade is fast and scales sub-linearly as cluster size increases
Blue/Green Upgrades
● Application Availability: Applications can be rolled back to a cluster with a known compatible Control Plane
● Cost: Higher cost, cluster pre-provisioned
● Speed: Upgrade speed scales with application migration speed

Takeaways
● Trade-off between business requirements: application availability, speed and cost
● Modern applications update consistently and often
● Kubernetes has the tools to support safe stateful upgrades today, and the community is building new tools to increase this margin of safety
