Peter Schuurman (@pwschuurman)
Software Engineer / Google
2022-08-12
K8s Cluster Upgrade Strategies
Best Practices for your Stateful Workload
Agenda
● Why Upgrade?
● Stateful Workloads and Upgrades
● Nodepool Upgrade Strategies
● Control Plane Upgrade Strategies
● Upgrade Strategy and Workload Selection
Why Upgrade: Kubernetes Version Lifecycle
[Figure: Kubernetes version lifecycle]
Why Upgrade: Modern and Protected
● Version Skew: The Kubernetes Version Skew Policy supports nodes running up to 2 minor versions behind the control plane
● New Features: New features are introduced in upcoming Kubernetes versions, e.g. StatefulSet MaxUnavailable was introduced in 1.24
● Security Compliance: Organizations following compliance protocols (PCI, HIPAA, FedRAMP) are required to apply security patches within 30 days of availability
● Patch Support: Kubernetes minor versions are maintained for 1 year
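The version skew rule can be checked before an upgrade is scheduled. The following is a minimal sketch, assuming version strings of the form "v1.24.3" and the 2-minor-version node skew stated on the slide; the function names are hypothetical and not part of any Kubernetes client library.

```python
def parse_minor(version):
    """Return (major, minor) from a version string such as 'v1.24.3' or '1.24'."""
    parts = version.lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def node_skew_ok(control_plane, node, max_skew=2):
    """True if the node is not newer than the control plane and is at most
    max_skew minor versions behind it (max_skew=2 is taken from the slide)."""
    cp_major, cp_minor = parse_minor(control_plane)
    node_major, node_minor = parse_minor(node)
    if cp_major != node_major:
        return False
    # Nodes must never be newer than the control plane, hence the lower bound of 0.
    return 0 <= cp_minor - node_minor <= max_skew
```

A check like this suggests the ordering used later in the deck: upgrade the control plane first, then bring the nodes forward before the skew limit is exceeded.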
Why Upgrade: Modern Applications
MariaDB has modernized their architecture by bringing SkySQL to the cloud on Kubernetes. Built using the Kubernetes operator pattern, MariaDB leverages resiliency and maintains high availability during upgrades.
“We have been using containers for many years … Our goal was to simplify the implementation and focus less on lower-level infrastructure, dependencies and instance life-cycle. With Kubernetes, our engineers could leverage the strong momentum from the open source community to drive infrastructure logic and security.” (Reference)
Why Upgrade: Upgrade Dimensions
Application Developer
Kubernetes Administrator
Cloud Platform
Why Upgrade: Upgrade Dimensions
● Application Compatibility: Ensuring your application is compatible with an upgraded Kubernetes version (node or control plane)
● Nodes: Upgrading the operating system, dependent libraries and Kubernetes software of your cluster’s data plane
● Control Plane: Upgrading the operating system and Kubernetes software of your cluster’s orchestration layer
Why Upgrade: Key Concerns
● Application Availability
● Cost
● Speed
Nodes: Surge Upgrades
● Application Availability: Suitable for fault-tolerant workloads. Control availability by specifying node maxUnavailable
● Cost: Cost effective
● Speed: Increase upgrade velocity with parallel node surge
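How surge settings affect upgrade speed can be sketched with a simplified model. The assumption here, which is not stated on the slide, is that up to max_surge + max_unavailable nodes are upgraded in parallel per batch; the function and parameter names are hypothetical.

```python
def plan_surge_batches(nodes, max_surge=1, max_unavailable=0):
    """Split a node pool into upgrade batches.

    Assumed model: each batch upgrades max_surge + max_unavailable nodes
    in parallel. max_surge nodes are added temporarily; max_unavailable
    nodes may be taken out of service without a replacement.
    """
    batch_size = max_surge + max_unavailable
    if batch_size < 1:
        raise ValueError("max_surge + max_unavailable must be at least 1")
    return [nodes[i:i + batch_size] for i in range(0, len(nodes), batch_size)]
```

Under this model, raising max_surge shortens the upgrade at the cost of extra temporary nodes, while raising max_unavailable shortens it at the cost of capacity.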
Nodes: Blue/Green Upgrades
● Application Availability: Granular control during migration
● Cost: Increased cost with resource pre-provisioning
● Speed: Slow and controlled
Node Upgrade Takeaways
Surge Upgrades
● Application Availability: Rollback scenarios may take more time
● Cost: Lower cost, upgraded node creation occurs just in time
● Speed: Nodes can be upgraded in batches for increased speed
Blue/Green Upgrades
● Application Availability: High degree of application availability
● Cost: Higher cost, upgraded nodes are pre-provisioned
● Speed: Higher control over node migration reduces speed
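The cost difference between the two node strategies can be illustrated with peak node counts. This is a rough sketch, assuming blue/green pre-provisions a full replacement pool and surge adds only max_surge temporary nodes; real platforms may differ, and the function name is hypothetical.

```python
def peak_node_count(pool_size, strategy, max_surge=1):
    """Return the largest number of nodes that exist at once during an upgrade.

    Assumptions: 'surge' creates at most max_surge extra nodes just in time;
    'blue-green' pre-provisions a complete second pool of the same size.
    """
    if strategy == "surge":
        return pool_size + max_surge
    if strategy == "blue-green":
        return 2 * pool_size
    raise ValueError(f"unknown strategy: {strategy}")
```

For a 10-node pool, this model gives a peak of 12 nodes for a surge upgrade with max_surge=2 and 20 nodes for blue/green.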
Control Plane: Upgrades
● Kubernetes maintains API versions with each minor release
● API schema may change with new minor versions
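Because API versions can be removed in a new minor release, manifests can be screened before the control plane is upgraded. The sketch below assumes manifests are already parsed into dictionaries; the table lists only three well-known removals and is illustrative rather than complete, and the function name is hypothetical.

```python
# (apiVersion, kind) -> first Kubernetes (major, minor) that no longer serves it.
# Illustrative subset only; consult the deprecated API migration guide for the full list.
REMOVED_IN = {
    ("extensions/v1beta1", "Ingress"): (1, 22),
    ("batch/v1beta1", "CronJob"): (1, 25),
    ("policy/v1beta1", "PodDisruptionBudget"): (1, 25),
}


def find_removed_apis(manifests, target):
    """Return (apiVersion, kind) pairs that the target version no longer serves.

    manifests: list of dicts with 'apiVersion' and 'kind' keys.
    target: (major, minor) tuple of the version being upgraded to.
    """
    hits = []
    for manifest in manifests:
        key = (manifest.get("apiVersion"), manifest.get("kind"))
        removed = REMOVED_IN.get(key)
        if removed is not None and target >= removed:
            hits.append(key)
    return hits
```

An empty result does not prove compatibility; it only means none of the listed removals apply.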
Control Plane: Surge Upgrade
● Application Availability: HA control plane setups limit disruptions. Kubernetes minor rollback is not supported
● Cost: Cost effective
● Speed: Fast
Control Plane: Blue/Green Upgrade
● Application Availability: Granular control over application upgrade. Safe minor version rollback
● Cost: Increased cost over in-place upgrades with cluster pre-provisioning
● Speed: Slow and controlled
Control Plane: Blue/Green Upgrade
● KEP-3335: Introduces building blocks to the StatefulSet API to enable StatefulSet replicas to be moved across clusters.
● With Kubernetes Multi-Cluster Services (KEP-1645), applications can maintain connectivity
● Demo
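A replica-by-replica move across clusters can be planned as a sequence of spec changes. The sketch below assumes the building block from KEP-3335 is a configurable start ordinal on the target StatefulSet (written here as ordinals_start, a hypothetical key name); it models only replica counts and ordinals, not data or volume movement.

```python
def plan_statefulset_migration(total_replicas):
    """Plan moving replicas one at a time, highest ordinal first.

    After each step the source cluster keeps ordinals [0, boundary) and the
    target cluster runs ordinals [boundary, total_replicas), so every ordinal
    exists in exactly one cluster.
    """
    steps = []
    for moved in range(1, total_replicas + 1):
        boundary = total_replicas - moved
        steps.append({
            "source": {"replicas": boundary},
            "target": {"ordinals_start": boundary, "replicas": moved},
        })
    return steps
```

Reversing the steps moves replicas back to the original cluster, which is what makes a rollback to a known compatible control plane possible in this model.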
Control Plane Upgrade Takeaways
Surge Upgrades
● Application Availability: Rollback is not possible
● Cost: Lower cost, upgraded control plane creation occurs just in time
● Speed: Control Plane upgrade is fast and scales sub-linearly as cluster size increases
Blue/Green Upgrades
● Application Availability: Applications can be rolled back to a cluster with a known compatible Control Plane
● Cost: Higher cost, cluster pre-provisioned
● Speed: Upgrade speed scales with application migration speed
Takeaways
● Trade-off between business requirements: application availability, speed and cost
● Modern applications update consistently and often
● Kubernetes has the tools to support safe stateful upgrades today, and the community is building new tools to increase this margin of safety
