Link: https://youtu.be/qUW8LkxYayc
https://go.dok.community/slack
https://dok.community/
ABSTRACT OF THE TALK
How do you make sure your Stateful Workloads remain available when your Kubernetes infrastructure updates? This talk will discuss different strategies of upgrading a Kubernetes cluster, and how you can manage risk for your workload. The talk will showcase demos of each upgrade strategy.
BIO
Peter is a Senior Software Engineer on GKE at Google. He works on improving Kubernetes for Stateful workloads. His main focus is on enhancing the Kubernetes ecosystem for high availability applications.
KEY TAKE-AWAYS FROM THE TALK
The mechanics of different upgrade strategies, when to apply a particular upgrade strategy depending on your Stateful workload and how to mitigate risk to your application’s availability.
4. Why Upgrade: Modern and Protected
● Version Skew: The Kubernetes Version Skew Policy supports nodes running up to 2 minor versions behind the control plane
● New Features: New features are introduced in upcoming Kubernetes versions, e.g. StatefulSet MaxUnavailable was introduced in 1.24
● Security Compliance: Organizations following compliance protocols (PCI, HIPAA, FedRAMP) are required to apply security patches within 30 days of availability
● Patch Support: Kubernetes minor versions are maintained for 1 year
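A minimal sketch of the version skew rule on this slide, assuming the skew is counted as minor versions that a node trails the control plane. The function names and the `max_skew` parameter are hypothetical and are not part of any Kubernetes client library.

```python
def parse_minor(version: str) -> tuple[int, int]:
    """Return (major, minor) from a version string such as 'v1.24.3' or '1.24'."""
    parts = version.lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def node_within_skew(control_plane: str, node: str, max_skew: int = 2) -> bool:
    """True if the node is not newer than the control plane and trails it
    by at most max_skew minor versions (assumed value from the slide: 2)."""
    cp_major, cp_minor = parse_minor(control_plane)
    node_major, node_minor = parse_minor(node)
    if cp_major != node_major:
        return False
    return 0 <= cp_minor - node_minor <= max_skew


if __name__ == "__main__":
    # Compare a few node versions against a 1.24 control plane.
    for node_version in ("v1.24.3", "v1.22.9", "v1.21.14", "v1.25.0"):
        print(node_version, node_within_skew("v1.24.3", node_version))
```

A check like this can run before a control plane upgrade to flag node pools that would fall outside the supported skew afterwards.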
5. Why Upgrade: Modern Applications
MariaDB has modernized their architecture by bringing SkySQL to the cloud on Kubernetes. Built using the Kubernetes operator pattern, MariaDB leverages resiliency and maintains high availability during upgrades.
"We have been using containers for many years … Our goal was to simplify the implementation and focus less on lower-level infrastructure, dependencies and instance life-cycle. With Kubernetes, our engineers could leverage the strong momentum from the open source community to drive infrastructure logic and security." (Reference)
8. Why Upgrade: Upgrade Dimensions
● Application Compatibility: Ensuring your application is compatible with an upgraded Kubernetes version (node or control plane)
● Nodes: Upgrading the operating system, dependent libraries and Kubernetes software of your cluster’s data plane
● Control Plane: Upgrading the operating system and Kubernetes software of your cluster’s orchestration layer
10. Nodes: Surge Upgrades
● Application Availability: Suitable for fault-tolerant workloads.
Control availability by specifying node maxUnavailable
● Cost: Cost effective
● Speed: Increase upgrade velocity with parallel node surge
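The batching behind a surge upgrade can be sketched as follows. This assumes that the number of nodes upgraded in parallel is the sum of a surge setting and the maxUnavailable setting. The slide names only maxUnavailable, so `max_surge` and the function name are assumptions made for illustration.

```python
def plan_surge_batches(nodes, max_surge=1, max_unavailable=0):
    """Split a node pool into upgrade batches.

    Assumption: each batch holds max_surge + max_unavailable nodes, where
    surge nodes are extra capacity created just in time and unavailable
    nodes are existing capacity taken out of service during the batch.
    """
    batch_size = max_surge + max_unavailable
    if batch_size < 1:
        raise ValueError("max_surge + max_unavailable must be at least 1")
    return [nodes[i:i + batch_size] for i in range(0, len(nodes), batch_size)]


if __name__ == "__main__":
    pool = ["node-1", "node-2", "node-3", "node-4", "node-5"]
    for number, batch in enumerate(plan_surge_batches(pool, 1, 1), start=1):
        print(f"batch {number}: {batch}")
```

Raising either setting shortens the upgrade by enlarging each batch. Raising maxUnavailable does so at the expense of serving capacity, which is why it suits fault-tolerant workloads.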
11. Nodes: Blue/Green Upgrades
● Application Availability: Granular
control during migration
● Cost: Increased cost with resource
pre-provisioning
● Speed: Slow and controlled
12. Node Upgrade Takeaways
● Application Availability
   Surge Upgrades: Rollback scenarios may take more time
   Blue/Green Upgrades: High degree of application availability
● Cost
   Surge Upgrades: Lower cost, upgraded node creation occurs just in time
   Blue/Green Upgrades: Higher cost, upgraded nodes are pre-provisioned
● Speed
   Surge Upgrades: Nodes can be upgraded in batches for increased speed
   Blue/Green Upgrades: Higher control over node migration reduces speed
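The cost difference can be made concrete with a rough peak node count. This is a sketch under two assumptions: a surge upgrade only ever adds `max_surge` extra nodes at a time, and a blue/green upgrade pre-provisions a full copy of the pool before migration starts. Both function names are hypothetical.

```python
def peak_nodes_surge(pool_size: int, max_surge: int) -> int:
    """Peak node count when upgraded nodes are created just in time."""
    return pool_size + max_surge


def peak_nodes_blue_green(pool_size: int) -> int:
    """Peak node count when a full green pool is pre-provisioned (assumed)."""
    return 2 * pool_size


if __name__ == "__main__":
    size = 10
    print("surge peak:", peak_nodes_surge(size, max_surge=2))
    print("blue/green peak:", peak_nodes_blue_green(size))
```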
13. Control Plane: Upgrades
● Kubernetes maintains API versions with each minor release
● API schema may change with new minor versions
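One way to act on this before a control plane upgrade is to scan manifests for API versions that the target release no longer serves. The sketch below hard-codes a small illustrative subset of removals and is not a complete list. The table name, function name, and manifest shape (plain dicts with `apiVersion` and `kind`) are assumptions.

```python
# Illustrative subset: (apiVersion, kind) -> first minor release that stops serving it.
REMOVED_IN = {
    ("networking.k8s.io/v1beta1", "Ingress"): (1, 22),
    ("policy/v1beta1", "PodDisruptionBudget"): (1, 25),
    ("batch/v1beta1", "CronJob"): (1, 25),
}


def removed_apis(manifests, target):
    """Return (apiVersion, kind) pairs that are no longer served at `target`.

    manifests: iterable of dicts with 'apiVersion' and 'kind' keys.
    target: (major, minor) tuple of the control plane version to upgrade to.
    """
    found = []
    for manifest in manifests:
        key = (manifest["apiVersion"], manifest["kind"])
        if key in REMOVED_IN and target >= REMOVED_IN[key]:
            found.append(key)
    return found


if __name__ == "__main__":
    manifests = [
        {"apiVersion": "apps/v1", "kind": "StatefulSet"},
        {"apiVersion": "policy/v1beta1", "kind": "PodDisruptionBudget"},
    ]
    print(removed_apis(manifests, (1, 25)))
```

For stateful workloads the PodDisruptionBudget entry matters most, since a PodDisruptionBudget is often what protects quorum during node drains.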
14. Control Plane: Surge Upgrade
● Application Availability: HA control plane setups limit disruptions. Kubernetes minor version rollback is not supported
● Cost: Cost effective
● Speed: Fast
15. Control Plane: Blue/Green Upgrade
● Application Availability: Granular control over application upgrade. Safe minor version
rollback
● Cost: Increased cost over in-place upgrades with cluster pre-provisioning
● Speed: Slow and controlled
16. Control Plane: Blue/Green Upgrade
● KEP-3335: Introduces building blocks to the StatefulSet API to enable StatefulSet
replicas to be moved across clusters.
● With Kubernetes Multi-Cluster Services (KEP-1645), applications can maintain
connectivity
● Demo
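The replica movement that KEP-3335 enables can be sketched as ordinal arithmetic. The sketch assumes the building block is a configurable start ordinal on the StatefulSet (`.spec.ordinals.start`) and that replicas are moved one at a time, highest ordinal first. The function name and dictionary keys are hypothetical, and moving the underlying volumes and data is out of scope.

```python
def migration_steps(replicas: int):
    """Plan a one-replica-at-a-time move from a source to a destination cluster.

    After each step the source serves ordinals [0, source_replicas) and the
    destination serves [dest_ordinals_start, dest_ordinals_start + dest_replicas),
    so every ordinal exists in exactly one cluster.
    """
    steps = []
    for moved in range(1, replicas + 1):
        source_replicas = replicas - moved
        steps.append({
            "source_replicas": source_replicas,      # scale the source StatefulSet down
            "dest_ordinals_start": source_replicas,  # destination starts where the source ends
            "dest_replicas": moved,                  # destination grows by one replica
        })
    return steps


if __name__ == "__main__":
    for step in migration_steps(3):
        print(step)
```

Because the ordinal ranges never overlap, pod names stay unique across both clusters, which is what lets peers keep addressing each other through multi-cluster services during the move.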
17. Control Plane Upgrade Takeaways
● Application Availability
   Surge Upgrades: Rollback is not possible
   Blue/Green Upgrades: Applications can be rolled back to a cluster with a known compatible Control Plane
● Cost
   Surge Upgrades: Lower cost, upgraded control plane creation occurs just in time
   Blue/Green Upgrades: Higher cost, cluster pre-provisioned
● Speed
   Surge Upgrades: Control Plane upgrade is fast and scales sub-linearly as cluster size increases
   Blue/Green Upgrades: Upgrade speed scales with application migration speed
18. Takeaways
● Trade-off between business requirements: application availability, speed and cost
● Modern applications update consistently and often
● Kubernetes has the tools to support safe stateful upgrades today, and the community is
building new tools to increase this margin of safety