Demystifying etcd failure scenarios for Kubernetes
By William Caban (@williamcaban)
etcd 101
Kubernetes Control-Plane & etcd
[Figure: Kubernetes Architectures, four layouts labeled A to D: Multi Node Cluster, Compact Cluster, All-in-One K8s, and a second Multi Node Cluster variant. Nodes are marked S (K8s Control Plane, Supervisor role) and W (Worker role). Components shown: kube-apiserver, kube-scheduler, kube-controller-manager, cloud-controller-manager, container runtime, kubelet.]
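Etcd Redundancy vs Performance
Members (quorum + tolerance)   Required Active Quorum Size   Failure Tolerance
5                              3                             x 2
3                              2                             x 1
2                              2                             x 0
1                              1                             x 0
As members are added, Redundancy and Required Active Quorum Size go from Low to High, while Write Performance goes from High to Low.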
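
The quorum and failure-tolerance figures above follow from simple majority arithmetic. The sketch below reproduces them; the function names `quorum_size` and `failure_tolerance` are illustrative and not part of any etcd API.

```python
def quorum_size(members: int) -> int:
    """Smallest majority of a cluster with `members` voting members."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def failure_tolerance(members: int) -> int:
    """Number of members that can fail while a majority stays available."""
    return members - quorum_size(members)


if __name__ == "__main__":
    # Print members, quorum size, and failure tolerance for common sizes.
    for n in (1, 2, 3, 5):
        print(n, quorum_size(n), failure_tolerance(n))
```

Note that going from 1 to 2 members (or from 3 to 4) raises the quorum size without raising the failure tolerance, which is why odd cluster sizes are the usual choice.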
The life of a write on etcd
1. No leader
2. The election & vote
3. Leader coordinates the writes
4. For “Set Foo=bar”, the Leader writes Foo=bar into a log entry
5. Leader replicates “Foo=bar” to the follower nodes
6. Leader waits for a majority to write the entry before committing it
7. Leader notifies followers that the entry is committed
8. Leader sends regular role notifications (heartbeats) to followers
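
Steps 4 to 6 can be sketched as a toy model. This is not etcd's or Raft's actual implementation: there are no terms, log indexes, or retries, and the `Member` class and `leader_write` function are hypothetical names used only to show the majority rule.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    """A toy cluster member with an in-memory log (hypothetical model)."""
    name: str
    reachable: bool = True
    log: list = field(default_factory=list)

    def append(self, entry: str) -> bool:
        """Write the entry to the local log; True stands for the acknowledgement."""
        if not self.reachable:
            return False
        self.log.append(entry)
        return True


def leader_write(leader: Member, followers: list, entry: str) -> bool:
    """Return True if the entry is stored on a majority and can be committed."""
    cluster_size = len(followers) + 1
    majority = cluster_size // 2 + 1
    # Step 4: the leader writes the entry into its own log.
    if not leader.append(entry):
        return False
    acks = 1  # the leader counts toward the majority
    # Step 5: replicate to followers and collect acknowledgements.
    for follower in followers:
        if follower.append(entry):
            acks += 1
    # Step 6: commit only if a majority holds the entry. Otherwise the entry
    # stays in the leader's log uncommitted.
    return acks >= majority


if __name__ == "__main__":
    leader = Member("C")
    followers = [Member("A"), Member("B", reachable=False)]
    print(leader_write(leader, followers, "Foo=bar"))
```

With three members and one unreachable follower the write still commits; with both followers unreachable it does not.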
Writing to etcd via a Leader
[Figure: an etcd client sends (write “foo=bar”) to member C (Leader); members A and B are Followers. Step labels shown in the figure:]
● Leader: “Wait while I work…”
● Leader: Write to my Raft log, then Send to Followers
● Each Follower: Write to my Raft log, then Send acknowledgement
● Leader: Wait for ack, then Ack to client
● Leader: Send acknowledgement to client and close session
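Writing to etcd via a Follower
[Figure: an etcd client sends (write “foo=bar”) to a Follower; members A and B are Followers and C is the Leader. Step labels shown in the figure:]
● Follower: “I’m not the leader. Let me forward that to “C”.”
● The Follower forwards the request to the Leader (proxied write requests)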
Myths & Realities
Why the Critical ETCD Timers?
● Critical etcd timer settings:
○ HEARTBEAT_INTERVAL (100ms)
■ Frequency with which the Leader notifies Followers that it is still the Leader
○ ELECTION_TIMEOUT (1000ms)
■ How long a Follower node waits without hearing a heartbeat before attempting to become Leader itself
Best Practices
Heartbeat Interval
❏ < max(RTT) between members
❏ Too low increases CPU and network usage
❏ Too high leads to a high election timeout
   ❏ slower to detect and recover from failures
Election Timeout
❏ 10 times the HEARTBEAT_INTERVAL
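
The 10x rule above is easy to check in code. The sketch below is a minimal example: the helper names `recommended_election_timeout` and `timer_warnings` are made up for illustration. The values map onto etcd's `--heartbeat-interval` and `--election-timeout` flags, both expressed in milliseconds.

```python
def recommended_election_timeout(heartbeat_ms: int) -> int:
    """Election timeout suggested by the '10 times the heartbeat' rule."""
    return 10 * heartbeat_ms


def timer_warnings(heartbeat_ms: int, election_ms: int) -> list:
    """Return human-readable warnings for a heartbeat/election timer pair."""
    if heartbeat_ms <= 0 or election_ms <= 0:
        raise ValueError("timers must be positive")
    warnings = []
    if election_ms < recommended_election_timeout(heartbeat_ms):
        warnings.append(
            f"election timeout {election_ms}ms is below "
            f"10x the heartbeat interval ({heartbeat_ms}ms)"
        )
    return warnings


if __name__ == "__main__":
    print(timer_warnings(100, 1000))  # the defaults quoted on the slide
    print(timer_warnings(100, 500))
```

With the defaults of 100ms and 1000ms, a Follower tolerates roughly ten missed heartbeats before it starts an election.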
Why the Hardware Specifications?
             CPU             RAM             DISK
MINIMUM      2 to 4 cores    8 GB            < 30ms latency
PRODUCTION   8 to 16 cores   16GB to 64GB    < 10ms latency
Introducing the Magic Latency Formula for ETCD latency profiles…
Effective Latency = Disk Latency + Max(Jitter(Disk Latency)) + Network RTT + Max(Network Jitter)
Note: To maintain etcd stability at scale, the Effective Latency must be well below the Election Timeout
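
A worked example of the formula, as a rough sketch. The sample measurements are invented for illustration, and the function names are hypothetical. The source does not say how far "well below" is, so the code only reports the fraction of the election timeout that is consumed.

```python
def effective_latency_ms(disk_latency_ms, disk_jitter_ms, network_rtt_ms, network_jitter_ms):
    """Effective Latency = Disk Latency + Max(disk jitter) + Network RTT + Max(network jitter).

    The two jitter arguments are sequences of observed jitter samples in ms.
    """
    return (
        disk_latency_ms
        + max(disk_jitter_ms)
        + network_rtt_ms
        + max(network_jitter_ms)
    )


def election_timeout_fraction(effective_ms, election_timeout_ms=1000):
    """Share of the election timeout consumed by the effective latency."""
    return effective_ms / election_timeout_ms


if __name__ == "__main__":
    # Made-up measurements: 8ms disk latency, 2ms network RTT, plus jitter samples.
    latency = effective_latency_ms(8, [1, 3, 2], 2, [0.5, 1.5])
    print(latency, election_timeout_fraction(latency))
```

Here the effective latency is 8 + 3 + 2 + 1.5 = 14.5ms, a small fraction of the 1000ms election timeout. A stretched cluster with tens of milliseconds of RTT and jitter consumes that margin far more quickly.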
Myth Collection 1
Myth: We can use a stretched control plane for Kubernetes:
● without impact on performance
● for high availability
● as a highly available Kubernetes design
What happens with failures?
❏ High Network Latency
❏ High Disk Latency
❏ Client to Leader Latency
❏ Cross-site Disconnection
❏ Kube-apiserver transaction rate?
❏ Memory utilization due to etcd fragmentation?
Myth Collection 2
Myth: We can use backups of etcd to:
● Restore Kubernetes in case of disaster recovery
● Roll back Kubernetes
● Recover the applications running in the cluster
What happens with failures?
❏ Cluster identity?
❏ Certificates?
❏ ETCD peer certificates?
❏ ETCD identity?
❏ Persistent storage?
❏ API Schema Version?
[Figure: “Kubernetes Application Stack (Pods, Manifests, Storage mappings, etc)” VS. a layered stack: Manifest and other K8s objects, Container image, PersistentVolumeClaim, PersistentVolume, CSI-enabled storage backend]
ETCD Failure Modes
https://etcd.io/docs/v3.5/op-guide/failures/
● Leader failure
● Follower failure
● Majority failure
● Network Partition
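
The majority-failure and network-partition cases both come down to whether a group of members can still form a quorum. The sketch below shows only that arithmetic; `has_quorum` and `partition_outcome` are hypothetical helpers, not etcd calls.

```python
def has_quorum(cluster_size: int, reachable: int) -> bool:
    """True if `reachable` members form a majority of the whole cluster."""
    return reachable >= cluster_size // 2 + 1


def partition_outcome(cluster_size: int, partition_sizes: list) -> list:
    """For each side of a partition, report whether it can still commit writes."""
    if sum(partition_sizes) > cluster_size:
        raise ValueError("partitions contain more members than the cluster")
    return [has_quorum(cluster_size, size) for size in partition_sizes]


if __name__ == "__main__":
    print(has_quorum(3, 2))               # one member lost out of three
    print(has_quorum(3, 1))               # majority lost
    print(partition_outcome(5, [3, 2]))   # five members split 3/2
```

In a 3/2 split of five members, only the three-member side keeps a quorum. If no side holds a majority, for example an even split across two sites, no side can commit writes.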
What to Remember about etcd?
Enjoy the rest of the event!
Image by https://www.opsramp.com/guides/why-kubernetes/who-made-kubernetes/
