ZERO-DOWNTIME DEPLOYMENT on K8S: the missing part. Bảo Huỳnh, Site Reliability Engineering, 12-Jun-2020
AGENDA ● Deployment & Replicas: are we really safe ? ● Understand Pod Eviction Lifecycle ● Avoid Outages ● Beyond the Outa...
1. Deployment & Replicas: really safe ? We have: ● Replicas : 2 ● RollingUpdate Strategy ● maxUnavailable: 1 * Everything ...
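The setup on this slide can be sketched as a manifest. The example below builds it as a Python dict and prints JSON, which `kubectl apply -f -` accepts alongside YAML. Only the replica count, strategy, and maxUnavailable come from the slide; the name `web`, the `app: web` label, and the `nginx:1.19` image are assumptions for illustration.

```python
import json

# Deployment matching the slide: 2 replicas, RollingUpdate, maxUnavailable 1.
# The name, labels, and image are hypothetical placeholders.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "web"},
    "spec": {
        "replicas": 2,
        "strategy": {
            "type": "RollingUpdate",
            "rollingUpdate": {"maxUnavailable": 1},
        },
        "selector": {"matchLabels": {"app": "web"}},
        "template": {
            "metadata": {"labels": {"app": "web"}},
            "spec": {
                "containers": [{"name": "nginx", "image": "nginx:1.19"}],
            },
        },
    },
}

# Print the manifest so it can be piped into `kubectl apply -f -`.
print(json.dumps(deployment, indent=2))
```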
2. Understand Pod Eviction Lifecycle ● kubectl delete / drain / upgrade ● A request → nodes where pod is located ● kubelet...
1. Deployment & Replicas: really safe ? Downtime will occur IF: - Existing traffic is not handled properly - Appli...
Add a preStop hook to gracefully shut down nginx → make sure the app finishes handling existing connections before it quits 2. Understand...
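A preStop hook of the kind described here might look like the container fragment below, written as a Python dict that mirrors the manifest fields. `nginx -s quit` is nginx's graceful-shutdown command; the binary path and the image tag are assumptions and may differ in a given image.

```python
# Container spec fragment: the kubelet runs the preStop command before it
# sends SIGTERM to the container. Path and image tag are assumed.
nginx_container = {
    "name": "nginx",
    "image": "nginx:1.19",
    "lifecycle": {
        "preStop": {
            # Ask nginx to finish in-flight connections, then exit.
            "exec": {"command": ["/usr/sbin/nginx", "-s", "quit"]},
        },
    },
}
```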
2. Understand Pod Eviction Lifecycle - Drain “node 1” - preStop hook is executed (nginx quit) - SIGTERM is then sent to the nginx pod
2. Understand Pod Eviction Lifecycle + A new request comes in + It is routed to the stopping nginx + Error…
2. Understand Pod Eviction Lifecycle - Why does this sh*t happen ? - Why is stupid K8S still routing traffic to a “term...
2. Avoid the Outages Recall pod shutdown sequence ● kubectl delete / drain / upgrade ● A request → nodes where pod is loca...
2. Avoid the Outages Figure 1: Sequences occur when pod is deleted
2. Avoid the Outages Figure 2: Timeline “version” of the pod deletion events - Two flows run in parallel - No guarantee [A]...
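The race between the two parallel flows can be illustrated with a toy model. This is only a sketch: it assumes one flow shuts the pod down and the other deregisters it from routing, and the timestamps are made-up numbers rather than measurements.

```python
def failure_window(shutdown_done_at: float, deregistered_at: float) -> float:
    """Seconds during which traffic can still reach a pod that already stopped.

    Both arguments are seconds since the delete request was issued.
    """
    return max(0.0, deregistered_at - shutdown_done_at)


def requests_can_fail(shutdown_done_at: float, deregistered_at: float) -> bool:
    """True when the pod stops serving before it is removed from routing."""
    return failure_window(shutdown_done_at, deregistered_at) > 0.0


# Shutdown wins the race: requests arriving in the gap hit a dead pod.
print(requests_can_fail(shutdown_done_at=1.0, deregistered_at=3.0))
# Deregistration wins the race: no request is routed to the stopped pod.
print(requests_can_fail(shutdown_done_at=5.0, deregistered_at=3.0))
```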
2. Avoid the Outages BUT HOW ???
2. Avoid the Outages ● don’t work, just SLEEP ● … & wait for deregister flow (B) to complete before graceful shutdown
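The sleep-then-quit idea could be expressed as the pod spec fragment below, again as a Python dict mirroring the manifest fields. The 15-second sleep is an assumed value that would need tuning to how long deregistration takes in a given cluster; 30 seconds is the Kubernetes default for `terminationGracePeriodSeconds`. The binary path and image tag are also assumptions.

```python
SLEEP_SECONDS = 15          # assumed; should outlast the deregister flow (B)
GRACE_PERIOD_SECONDS = 30   # Kubernetes default terminationGracePeriodSeconds

pod_spec = {
    "terminationGracePeriodSeconds": GRACE_PERIOD_SECONDS,
    "containers": [
        {
            "name": "nginx",
            "image": "nginx:1.19",
            "lifecycle": {
                "preStop": {
                    "exec": {
                        # Do nothing while deregistration propagates,
                        # then ask nginx to shut down gracefully.
                        "command": [
                            "/bin/sh",
                            "-c",
                            f"sleep {SLEEP_SECONDS} && /usr/sbin/nginx -s quit",
                        ],
                    },
                },
            },
        },
    ],
}

# The preStop hook runs inside the grace period, so the sleep must leave
# time for the graceful shutdown before the kubelet sends SIGKILL.
sleep_fits_in_grace_period = SLEEP_SECONDS < GRACE_PERIOD_SECONDS
```

If the sleep needs to be longer than the grace period, `terminationGracePeriodSeconds` has to be raised with it.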
3. Beyond the Outages - Introducing: PodDisruptionBudgets - An indicator of the number of disruptions that can be tolerate...
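A PodDisruptionBudget for a two-replica workload might look like the sketch below. The name `web-pdb`, the `app: web` selector, and `minAvailable: 1` are assumptions for illustration. The helper shows the arithmetic behind the budget: voluntary evictions are allowed only while the healthy pod count stays above the minimum.

```python
# PodDisruptionBudget manifest as a Python dict (hypothetical names/labels).
pdb = {
    "apiVersion": "policy/v1",  # policy/v1beta1 on clusters older than 1.21
    "kind": "PodDisruptionBudget",
    "metadata": {"name": "web-pdb"},
    "spec": {
        "minAvailable": 1,
        "selector": {"matchLabels": {"app": "web"}},
    },
}


def disruptions_allowed(healthy_pods: int, min_available: int) -> int:
    """How many pods may be evicted voluntarily right now."""
    return max(0, healthy_pods - min_available)


# With 2 healthy replicas and minAvailable 1, one pod may be evicted;
# with 1 healthy replica, a drain has to wait.
print(disruptions_allowed(2, pdb["spec"]["minAvailable"]))
print(disruptions_allowed(1, pdb["spec"]["minAvailable"]))
```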
Summary Application: - Handle SIGTERM for graceful shutdown System: - Apply preStop lifecycle - Apply Sleep to make sure p...
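On the application side, handling SIGTERM could look like the sketch below. It is a generic pattern rather than anything specific to the talk: `serve` and `handle_one_request` are hypothetical names, and a real server would plug in its own accept loop.

```python
import signal
import threading

# Set when SIGTERM arrives; the serving loop checks it between requests.
shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Only record the request; the loop finishes in-flight work and exits."""
    shutdown_requested.set()


# Must be registered from the main thread.
signal.signal(signal.SIGTERM, handle_sigterm)


def serve(handle_one_request, poll_seconds: float = 0.1) -> int:
    """Process requests until SIGTERM; return how many were handled.

    handle_one_request() returns True if it processed a request and
    False if nothing was waiting.
    """
    handled = 0
    while not shutdown_requested.is_set():
        if handle_one_request():
            handled += 1
        else:
            # Idle: wait briefly, waking early if shutdown is requested.
            shutdown_requested.wait(poll_seconds)
    return handled
```

The handler only sets a flag, so a request that is already being processed runs to completion before the loop exits.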
Questions & Answers
Appendix: Service Disruption Involuntary disruptions Voluntary disruptions HW failure, node disappears from cluster deploym...
K8s ZeroDowntime - The missing part

+ The missing part for K8S deployment zerodowntime
+ Step-by-step exploration
+ Solution & fix
