1
Monitor Disk
Space
and other ways to keep Kafka
happy
Gwen Shapira, @gwenshap, Software Engineer
Me
• Software engineer @ Confluent
• Committer on Apache Kafka
• Co-author of
“Kafka - the Definitive Guide”
• Tweets a lot: @gwenshap
• Learning to devops
3
In which disk-related failure
scenarios are discussed in
an unprecedented level of detail
4
Apache Kafka in 3 slides
Producer
Consumer
Kafka Cluster
Stream Processing Apps
Connectors Connectors
Partitions
• Kafka organizes messages into
topics
• Each topic has a set of
partitions
• Each partition is a replicated log
of messages, referenced by
sequential offset
Partition 0
Partition 1
Partition 2
0 1 2 3 4 5
0 1 2 3 4 5 6 7
0 1 2 3 4
Offset
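A minimal sketch of the idea on this slide, not Kafka's actual storage code: each partition is modeled as an append-only list, and a message's offset is simply its position in that list. The class and variable names are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Append-only log; a message's offset is its position in the log."""
    messages: list = field(default_factory=list)

    def append(self, message: str) -> int:
        self.messages.append(message)
        return len(self.messages) - 1  # offset assigned to this message

    def read(self, offset: int) -> str:
        return self.messages[offset]


# A topic is a set of partitions, each with its own independent offsets.
topic = {p: PartitionLog() for p in range(3)}
first = topic[0].append("a")
second = topic[0].append("b")
other = topic[1].append("c")  # offsets restart at 0 in every partition
```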
Replication
• Each Partition is replicated 3
times
• Each replica lives on a separate
broker
• Leader handles all reads and
writes.
• Followers replicate events
from leader.
01234567
Replica 1 Replica 2 Replica 3
01234567
01234567
Producer
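The replication flow above can be sketched as a toy model, assuming a follower simply asks the leader for everything past its own end offset. This leaves out batching, acknowledgements, and in-sync tracking; `Replica` and `fetch_from_leader` are illustrative names, not Kafka APIs.

```python
class Replica:
    def __init__(self, broker_id):
        self.broker_id = broker_id
        self.log = []

    @property
    def end_offset(self):
        # Offset the next appended event would receive.
        return len(self.log)


def fetch_from_leader(leader, follower):
    """Follower asks for anything past its end offset and appends the answer."""
    missing = leader.log[follower.end_offset:]
    follower.log.extend(missing)
    return len(missing)


leader = Replica(100)
followers = [Replica(101), Replica(102)]

# Producer writes go to the leader only.
for event in ["e0", "e1", "e2"]:
    leader.log.append(event)

# Followers pull what they are missing.
copied = [fetch_from_leader(leader, f) for f in followers]
```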
8
The way failures SHOULD go
9
Partition 1
Replica 100
Leader
Partition 1
Replica 101
Broker 100 Broker 101
Broker 102
Controller
Partition 1
Replica 102
Partition 2
Replica 100
Partition 2
Replica 101
Leader
Partition 2
Replica 102
Zookeeper: /brokers/100, 101, 102
10
Partition 1
Replica 100
Leader
Partition 1
Replica 101
Broker 100 Broker 101
Broker 102
Controller
Partition 1
Replica 102
Partition 2
Replica 100
Partition 2
Replica 101
Leader
Partition 2
Replica 102
Zookeeper: /brokers/101, 102
✗
11
Partition 1
Replica 100
Leader
Partition 1
Replica 101
Broker 100 Broker 101
Broker 102
Controller
Partition 1
Replica 102
Partition 2
Replica 100
Partition 2
Replica 101
Leader
Partition 2
Replica 102
Zookeeper: /brokers/101, 102
✗
Oh no.
Broker 100
is missing.
12
Partition 1
Replica 100
Leader
Partition 1
Replica 101
Broker 100 Broker 101
Broker 102
Controller
Partition 1
Replica 102
Partition 2
Replica 100
Partition 2
Replica 101
Leader
Partition 2
Replica 102
Zookeeper: /brokers/101, 102
✗
Broker 102: you
now lead
partition 1
Broker 101: you
now follow
broker 102 for
partition 1
13
Partition 1
Replica 102
Leader
Partition 1
Replica 101
Broker 100 Broker 101
Broker 102
Controller
Partition 1
Replica 100
Partition 2
Replica 100
Partition 2
Replica 101
Leader
Partition 2
Replica 102
Zookeeper: /brokers/101, 102
✗
14
Partition 1
Replica 102
Leader
Partition 1
Replica 101
Broker 100 Broker 101
Broker 102
Controller
Partition 1
Replica 100
Partition 2
Replica 100
Partition 2
Replica 101
Leader
Partition 2
Replica 102
Zookeeper: /brokers/100, 101, 102
15
Partition 1
Replica 102
Leader
Partition 1
Replica 101
Broker 100 Broker 101
Broker 102
Controller
Partition 1
Replica 100
Partition 2
Replica 100
Partition 2
Replica 101
Leader
Partition 2
Replica 102
Zookeeper: /brokers/100, 101, 102
Broker 100
is back!
Broker 100:
Note the
new
leaders: 101
and 102
16
Partition 1
Replica 102
Leader
Partition 1
Replica 101
Broker 100 Broker 101
Broker 102
Controller
Partition 1
Replica 100
Partition 2
Replica 100
Partition 2
Replica 101
Leader
Partition 2
Replica 102
Zookeeper: /brokers/100, 101, 102
What did I
miss?
17
Partition 1
Replica 102
Leader
Partition 1
Replica 101
Broker 100 Broker 101
Broker 102
Controller
Partition 1
Replica 100
Partition 2
Replica 100
Partition 2
Replica 101
Leader
Partition 2
Replica 102
Zookeeper: /brokers/100, 101, 102
What did I
miss?
Lots of
events!
18
Partition 1
Replica 102
Leader
Partition 1
Replica 101
Broker 100 Broker 101
Broker 102
Controller
Partition 1
Replica 100
Partition 2
Replica 100
Partition 2
Replica 101
Leader
Partition 2
Replica 102
Zookeeper: /brokers/100, 101, 102
Thanks
guys, I
caught up!
19
Partition 1
Replica 102
Leader
Partition 1
Replica 101
Broker 100 Broker 101
Broker 102
Controller
Partition 1
Replica 100
Partition 2
Replica 100
Partition 2
Replica 101
Leader
Partition 2
Replica 102
Zookeeper: /brokers/100, 101, 102
Broker 100,
you are
preferred
leader for
partition 1
Broker 101,
follow broker
100 for
partition 1
Broker 102,
follow broker
100 for
partition 1
20
Partition 1
Replica 100
Leader
Partition 1
Replica 101
Broker 100 Broker 101
Broker 102
Controller
Partition 1
Replica 102
Partition 2
Replica 100
Partition 2
Replica 101
Leader
Partition 2
Replica 102
Zookeeper: /brokers/100, 101, 102
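The walkthrough above can be condensed into a small sketch. It assumes the controller picks the first replica in the assignment list that is both alive and in sync, and that the first replica in the list is the preferred leader. The replica order `[100, 102, 101]` is an assumption chosen so the result matches the slides; the function name is hypothetical.

```python
def elect_leader(replicas, live_brokers, in_sync):
    """Return the first assigned replica that is alive and in sync, else None."""
    for broker in replicas:
        if broker in live_brokers and broker in in_sync:
            return broker
    return None  # no eligible replica: the partition is offline


replicas = [100, 102, 101]  # assumed assignment order for partition 1
everyone = {100, 101, 102}

# Healthy cluster: the preferred replica leads.
leader_before = elect_leader(replicas, everyone, everyone)
# Broker 100 drops out of Zookeeper: the next eligible replica takes over.
leader_during = elect_leader(replicas, {101, 102}, {101, 102})
# Broker 100 is back but still catching up: it is not eligible yet.
leader_catching_up = elect_leader(replicas, everyone, {101, 102})
# Broker 100 has caught up: preferred leader election moves leadership back.
leader_after = elect_leader(replicas, everyone, everyone)
```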
21
What could possibly go wrong?
22
When Kafka runs out of disk
space
23
Best case scenario:
Broker ran out of disk space
and crashed.
24
Solution:
1. Get bigger disks
2. Store less data
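Either fix is easier if you see the disk filling up before the broker crashes. A minimal monitoring sketch, assuming a plain threshold check on the log directory; the 70% and 85% thresholds are illustrative, not recommendations from the talk.

```python
import shutil


def classify(used_fraction, warn_at=0.70, page_at=0.85):
    """Map a used-space fraction to an alert level (thresholds are assumptions)."""
    if used_fraction >= page_at:
        return "page"
    if used_fraction >= warn_at:
        return "warn"
    return "ok"


def check_disk(path):
    """Check the filesystem that holds `path`, e.g. the Kafka log directory."""
    usage = shutil.disk_usage(path)
    return classify(usage.used / usage.total)


# "." keeps the example runnable anywhere; point it at the log directory in practice.
status = check_disk(".")
```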
25
What not to do. Ever:
cat /dev/null > /data/log/my_topic-15/00000000000001548736.log
While Kafka is up and running.
26
When you are in a hole
Stop digging.
Don’t know where the holes are?
Walk slowly.
27
General Tips for
Stable Kafka
● Over-provision
● Upgrade to latest bug-fixes
● Don’t mess with stuff you don’t
understand.
● Call support when you have to
28
Bad Scenario
https://issues.apache.org/jira/browse/KAFKA-7151
29
Partition 1
Replica 100
Leader
Partition 1
Replica 101
Broker 100 Broker 101 Broker 102
Partition 1
Replica 102
Latest offset 1000 Latest offset 1000 Latest offset 1000
Latest offset 1000 Latest offset 1000 Latest offset 1000
30
Partition 1
Replica 100
Leader
Partition 1
Replica 101
Broker 100 Broker 101 Broker 102
Partition 1
Replica 102
Latest offset 1010 Latest offset 1000 Latest offset 1000
Latest offset 1000 Latest offset 1000 Latest offset 1000
What did
I miss?
Anything
after
1000?
31
Partition 1
Replica 100
Leader
Partition 1
Replica 101
Broker 100 Broker 101 Broker 102
Partition 1
Replica 102
Latest offset 1010 Latest offset 1000 Latest offset 1000
Latest offset 1000 Latest offset 1000 Latest offset 1000
Here is
1001 to
1010
32
Partition 1
Replica 100
Leader
Partition 1
Replica 101
Broker 100 Broker 101 Broker 102
Partition 1
Replica 102
Latest offset 1010 Latest offset 1010 Latest offset 1010
Latest offset 1000 Latest offset 1000 Latest offset 1010
33
Partition 1
Replica 100
Leader
Partition 1
Replica 101
Broker 100 Broker 101 Broker 102
Partition 1
Replica 102
Latest offset 1020 Latest offset 1010 Latest offset 1010
Latest offset 1000 Latest offset 1000 Latest offset 1010
34
Partition 1
Replica 100
Leader
Partition 1
Replica 101
Broker 100 Broker 101 Broker 102
Partition 1
Replica 102
Latest offset 1020 Latest offset 1010 Latest offset 1010
Latest offset 1000 Latest offset 1010 Latest offset 1010
What did
I miss?
Anything
after
1010?
35
Partition 1
Replica 100
Leader
Partition 1
Replica 101
Broker 100 Broker 101 Broker 102
Partition 1
Replica 102
Latest offset 1020 Latest offset 1010 Latest offset 1010
Latest offset 1000 Latest offset 1010 Latest offset 1010
Too busy
trying to
access
disk
36
Partition 1
Replica 100
Leader
Partition 1
Replica 101
Broker 100 Broker 101 Broker 102
Partition 1
Replica 102
Latest offset 1020 Latest offset 1010 Latest offset 1010
Latest offset 1000 Latest offset 1010 Latest offset 1010
IO
ERROR
37
Partition 1
Replica 100
Leader
Partition 1
Replica 101
Broker 100 Broker 101 Broker 102
Partition 1
Replica 102
Latest offset 1020 Latest offset 1010 Latest offset 1010
Latest offset 1000 Latest offset 1010 Latest offset 1010
✗
Too far
behind to
be leader
Too far
behind to
be leader
38
Downtime.
39
Partition 1
Replica 100
Leader
Partition 1
Replica 101
Broker 100 Broker 101 Broker 102
Partition 1
Replica 102
Latest offset 1000 Latest offset 1010 Latest offset 1010
Latest offset 1000 Latest offset 1010 Latest offset 1010
I’m back.
As you know, I’m the
leader.
Based on my disk,
latest event is 1000
40
Partition 1
Replica 100
Leader
Partition 1
Replica 101
Broker 100 Broker 101 Broker 102
Partition 1
Replica 102
Latest offset 1000 Latest offset 1010 Latest offset 1010
Latest offset 1000 Latest offset 1010 Latest offset 1010
What did I
miss?
LOL. No.
Latest is 1010.
We can’t follow
you.
41
Solution:
Enable unclean leader election.
Lose messages 1010-1020.
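In Kafka this is controlled by the `unclean.leader.election.enable` setting. The cost can be made concrete with a small sketch, assuming "latest offset" in the diagrams means the last offset a replica has stored. Under that reading, the messages lost are the ten offsets after 1010. The function name is hypothetical.

```python
def offsets_lost_by_unclean_election(old_leader_latest, new_leader_latest):
    """Offsets that existed only on the old leader and vanish when a
    lagging replica is allowed to take over."""
    return range(new_leader_latest + 1, old_leader_latest + 1)


# Old leader (broker 100) had reached 1020; the followers stopped at 1010.
lost = offsets_lost_by_unclean_election(1020, 1010)
lost_count = len(lost)
```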
42
Solution:
https://issues.apache.org/jira/browse/KAFKA-7151
43
Solution
45
Systems Hierarchy of Needs
CPU
Bandwidth
Disk
RAM
46
Most common Symptom:
Under-replicated partitions
You basically can’t alert on that.
We monitor the resources,
act early and add resources.
47
How to add CPU / Bandwidth?
Normally by adding brokers
And rebalancing partitions
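A rough sketch of what a rebalancing plan looks like after adding a broker. It builds a naive round-robin assignment in the JSON shape used by reassignment files for `kafka-reassign-partitions`. Broker 103, the topic name, and the partition count are made up, and a real plan would also weigh rack placement and how much data has to move.

```python
import json


def round_robin_plan(topic, num_partitions, brokers, replication_factor):
    """Spread replicas over brokers so each partition starts on the next broker."""
    partitions = []
    for p in range(num_partitions):
        replicas = [brokers[(p + i) % len(brokers)]
                    for i in range(replication_factor)]
        partitions.append({"topic": topic, "partition": p, "replicas": replicas})
    return {"version": 1, "partitions": partitions}


# Brokers 100-102 from the slides plus a hypothetical new broker 103.
plan = round_robin_plan("my_topic", 4, [100, 101, 102, 103], 3)
plan_json = json.dumps(plan, indent=2)
```

The first replica of each partition is its preferred leader, so this layout also spreads leadership evenly across the four brokers.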
48
When good EBS volumes go bad
49
Partition 1
Replica 100
Leader
Partition 1
Replica 101
Broker 100 Broker 101 Broker 102
Partition 1
Replica 102
Hanging.
Not talking to
anyone
Zookeeper: /brokers/100, 101, 102
50
What will happen?
Let's zoom in
51
Zookeeper: /brokers/100, 101, 102
Network Threads
Also reading from disk
Partition 1
Replica 101
Broker 100 Broker 101
Partition 2
Replica 101
Leader
Partition 1
Replica 100
Leader
Partition 2
Replica 100
Request Threads
Writing to disk
Replica Fetchers
Reading from leader
Writing to disk
Zookeeper Client
No disks involved
Hanging.
Not talking to
anyone
52
Partition 1
Replica 100
Leader
Partition 1
Replica 101
Broker 100 Broker 101
Broker 102
Controller
Partition 1
Replica 102
Partition 2
Replica 100
Partition 2
Replica 101
Leader
Partition 2
Replica 102
Zookeeper: /brokers/101, 102
✗
Only part
of the
broker
that is
alive!
53
Partition 1
Replica 100
Leader
Partition 1
Replica 101
Broker 100 Broker 101
Broker 102
Controller
Partition 1
Replica 102
Partition 2
Replica 100
Partition 2
Replica 101
Leader
Partition 2
Replica 102
Zookeeper: /brokers/101, 102
✗
Broker 100 is
totally alive!
No need to
elect leaders!
54
Downtime.
55
Solution:
Stop the broker ASAP.
Open ticket to replace disk
56
How to detect this?
● Broker is up
● Logs look fine
● Request Handler idle% is 0
● Network Handler idle% is 0
● Clients time out
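The symptoms above can be combined into one check. This is a sketch under assumptions: the idle ratios are taken to be already collected (for example from Kafka's `RequestHandlerAvgIdlePercent` and `NetworkProcessorAvgIdlePercent` metrics) and expressed as fractions between 0 and 1. The `BrokerSnapshot` type and the `idle_floor` threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BrokerSnapshot:
    process_up: bool             # the broker process is still running
    request_handler_idle: float  # 0.0 (saturated) to 1.0 (fully idle)
    network_handler_idle: float  # 0.0 (saturated) to 1.0 (fully idle)
    client_timeouts: int         # client timeouts seen in the sample window


def looks_hung(s, idle_floor=0.01):
    """Broker is 'up' yet every handler thread is stuck and clients time out."""
    return (s.process_up
            and s.request_handler_idle <= idle_floor
            and s.network_handler_idle <= idle_floor
            and s.client_timeouts > 0)


hung = looks_hung(BrokerSnapshot(True, 0.0, 0.0, 42))
healthy = looks_hung(BrokerSnapshot(True, 0.6, 0.8, 0))
```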
57
58
Canary
● Lead partition on every broker
● Produce and Consume
● Every 10 seconds
● Yell if 3 consecutive misses
Partition 1
Replica 100
Leader
Partition 1
Replica 101
Broker 100 Broker 101
Partition 2
Replica 100
Partition 2
Replica 101
Leader
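The canary rules above can be sketched as a small loop. The Kafka produce-and-consume round trip is left out: `probe` is a hypothetical callable that returns True when a message written to the partition led by a given broker was read back in time. In this sketch the alert keeps firing on every miss from the third one onward.

```python
import time


class Canary:
    def __init__(self, probe, alert, max_misses=3):
        self.probe = probe          # probe(broker_id) -> bool, supplied by the caller
        self.alert = alert          # alert(broker_id), e.g. page someone
        self.max_misses = max_misses
        self.misses = {}            # consecutive misses per broker

    def tick(self, broker_id):
        try:
            ok = bool(self.probe(broker_id))
        except Exception:
            ok = False              # an error counts as a miss
        if ok:
            self.misses[broker_id] = 0
            return
        self.misses[broker_id] = self.misses.get(broker_id, 0) + 1
        if self.misses[broker_id] >= self.max_misses:
            self.alert(broker_id)


def run(canary, broker_ids, rounds, interval_s=10, sleep=time.sleep):
    """Probe every broker once per round, waiting interval_s between rounds."""
    for _ in range(rounds):
        for broker_id in broker_ids:
            canary.tick(broker_id)
        sleep(interval_s)


# Dry run: broker 100 never answers; sleeping is stubbed out to keep it instant.
alerts = []
canary = Canary(probe=lambda b: b != 100, alert=alerts.append)
run(canary, [100, 101], rounds=3, sleep=lambda s: None)
```

Swapping in a probe that fails on purpose is one way to reuse the same loop for simple failure injection tests.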
59
You can reuse canary for simple
failure injection testing
60
Summary
61
You don’t really know how your
software will behave until it is in
production for quite a while.
62
More Key Points
● Keep an eye on your key resources
● Tread carefully in unknown territory
● Sometimes a crashed broker is GOOD
● Monitor user scenarios -
especially for SLAs

Velocity 2019 - Kafka Operations Deep Dive

  • 1. 1 Monitor Disk Space and other ways to keep Kafka happy Gwen Shapira, @gwenshap, Software Engineer
  • 2. Me • Software engineer @ Confluent • Committer on Apache Kafka • Co-author of “Kafka - the Definitive Guide” • Tweets a lot: @gwenshap • Learning to devops
  • 3. 3 In which disk-related failure scenarios are discussed in unprecedented level of detail
  • 4. 4 Apache Kafka in 3 slides
  • 6. Partitions • Kafka organizes messages into topics • Each topics have a set of partitions • Each partition is a replicated log of messages, referenced by sequential offset Partition 0 Partition 1 Partition 2 0 1 2 3 4 5 0 1 2 3 4 5 6 7 0 1 2 3 4 Offset
  • 7. Replication • Each Partition is replicated 3 times • Each replica lives on separate broker • Leader handles all reads and writes. • Followers replicate events from leader. 01234567 Replica 1 Replica 2 Replica 3 01234567 01234567 Producer
  • 8. 8 The way failures SHOULD go
  • 9. 9 Partition 1 Replica 100 Leader Partition 1 Replica 101 Broker 100 Broker 101 Broker 102 Controller Partition 1 Replica 102 Partition 2 Replica 100 Partition 2 Replica 101 Leader Partition 2 Replica 102 Zookeeper: /brokers/100, 101, 102
  • 10. 10 Partition 1 Replica 100 Leader Partition 1 Replica 101 Broker 100 Broker 101 Broker 102 Controller Partition 1 Replica 102 Partition 2 Replica 100 Partition 2 Replica 101 Leader Partition 2 Replica 102 Zookeeper: /brokers/101, 102 ✗
  • 11. 11 Partition 1 Replica 100 Leader Partition 1 Replica 101 Broker 100 Broker 101 Broker 102 Controller Partition 1 Replica 102 Partition 2 Replica 100 Partition 2 Replica 101 Leader Partition 2 Replica 102 Zookeeper: /brokers/101, 102 ✗ Oh no. Broker 100 is missing.
  • 12. 12 Partition 1 Replica 100 Leader Partition 1 Replica 101 Broker 100 Broker 101 Broker 102 Controller Partition 1 Replica 102 Partition 2 Replica 100 Partition 2 Replica 101 Leader Partition 2 Replica 102 Zookeeper: /brokers/101, 102 ✗ Broker 102: you now lead partition 1 Broker 101: you now follow broker 102 for partition 1
  • 13. 13 Partition 1 Replica 102 Leader Partition 1 Replica 101 Broker 100 Broker 101 Broker 102 Controller Partition 1 Replica 100 Partition 2 Replica 100 Partition 2 Replica 101 Leader Partition 2 Replica 102 Zookeeper: /brokers/101, 102 ✗
  • 14. 14 Partition 1 Replica 102 Leader Partition 1 Replica 101 Broker 100 Broker 101 Broker 102 Controller Partition 1 Replica 100 Partition 2 Replica 100 Partition 2 Replica 101 Leader Partition 2 Replica 102 Zookeeper: /brokers/100, 101, 102
  • 15. 15 Partition 1 Replica 102 Leader Partition 1 Replica 101 Broker 100 Broker 101 Broker 102 Controller Partition 1 Replica 100 Partition 2 Replica 100 Partition 2 Replica 101 Leader Partition 2 Replica 102 Zookeeper: /brokers/100, 101, 102 Broker 100 is back! Broker 100: Note the new leaders: 101 and 102
  • 16. 16 Partition 1 Replica 102 Leader Partition 1 Replica 101 Broker 100 Broker 101 Broker 102 Controller Partition 1 Replica 100 Partition 2 Replica 100 Partition 2 Replica 101 Leader Partition 2 Replica 102 Zookeeper: /brokers/100, 101, 102 What did I miss?
  • 17. 17 Partition 1 Replica 102 Leader Partition 1 Replica 101 Broker 100 Broker 101 Broker 102 Controller Partition 1 Replica 100 Partition 2 Replica 100 Partition 2 Replica 101 Leader Partition 2 Replica 102 Zookeeper: /brokers/100, 101, 102 What did I miss? Lots of events!
  • 18. 18 Partition 1 Replica 102 Leader Partition 1 Replica 101 Broker 100 Broker 101 Broker 102 Controller Partition 1 Replica 100 Partition 2 Replica 100 Partition 2 Replica 101 Leader Partition 2 Replica 102 Zookeeper: /brokers/100, 101, 102 Thanks guys, I caught up!
  • 19. 19 Partition 1 Replica 102 Leader Partition 1 Replica 101 Broker 100 Broker 101 Broker 102 Controller Partition 1 Replica 100 Partition 2 Replica 100 Partition 2 Replica 101 Leader Partition 2 Replica 102 Zookeeper: /brokers/100, 101, 102 Broker 100, you are preferred leader for partition 1 Broker 101, follow broker 100 for partition 1 Broker 102, follow broker 100 for partition 1
  • 20. 20 Partition 1 Replica 100 Leader Partition 1 Replica 101 Broker 100 Broker 101 Broker 102 Controller Partition 1 Replica 102 Partition 2 Replica 100 Partition 2 Replica 101 Leader Partition 2 Replica 102 Zookeeper: /brokers/100, 101, 102
  • 22. 22 When Kafka runs out of disk space
  • 23. 23 Best case scenario: Broker ran out of disk space and crashed.
  • 24. 24 Solution: 1. Get bigger disks 2. Store less data
  • 25. 25 What not to do. Ever: cat /dev/null > /data/log/my_topic- 15/00000000000001548736.log While Kafka is up and running.
  • 26. 26 When you are in a hole Stop digging. Don’t know where the holes are? Walk slowly.
  • 27. 27 General Tips for Stable Kafka ● Over-provision ● Upgrade to latest bug-fixes ● Don’t mess with stuff you don’t understand. ● Call support when you have to
  • 29. 29 Partition 1 Replica 100 Leader Partition 1 Replica 101 Broker 100 Broker 101 Broker 102 Partition 1 Replica 102 Latest offset 1000 Latest offset 1000 Latest offset 1000 Latest offset 1000 Latest offset 1000 Latest offset 1000
  • 30. 30 Partition 1 Replica 100 Leader Partition 1 Replica 101 Broker 100 Broker 101 Broker 102 Partition 1 Replica 102 Latest offset 1010 Latest offset 1000 Latest offset 1000 Latest offset 1000 Latest offset 1000 Latest offset 1000 What did I miss? Anything after 1000?
  • 31. 31 Partition 1 Replica 100 Leader Partition 1 Replica 101 Broker 100 Broker 101 Broker 102 Partition 1 Replica 102 Latest offset 1010 Latest offset 1000 Latest offset 1000 Latest offset 1000 Latest offset 1000 Latest offset 1000 Here is 1001 to 1010
  • 32. 32 Partition 1 Replica 100 Leader Partition 1 Replica 101 Broker 100 Broker 101 Broker 102 Partition 1 Replica 102 Latest offset 1010 Latest offset 1010 Latest offset 1010 Latest offset 1000 Latest offset 1000 Latest offset 1010
  • 33. 33 Partition 1 Replica 100 Leader Partition 1 Replica 101 Broker 100 Broker 101 Broker 102 Partition 1 Replica 102 Latest offset 1020 Latest offset 1010 Latest offset 1010 Latest offset 1000 Latest offset 1000 Latest offset 1010
  • 34. 34 Partition 1 Replica 100 Leader Partition 1 Replica 101 Broker 100 Broker 101 Broker 102 Partition 1 Replica 102 Latest offset 1020 Latest offset 1010 Latest offset 1010 Latest offset 1000 Latest offset 1010 Latest offset 1010 What did I miss? Anything after 1010?
  • 35. 35 Partition 1 Replica 100 Leader Partition 1 Replica 101 Broker 100 Broker 101 Broker 102 Partition 1 Replica 102 Latest offset 1020 Latest offset 1010 Latest offset 1010 Latest offset 1000 Latest offset 1010 Latest offset 1010 Too busy trying to access disk
  • 36. 36 Partition 1 Replica 100 Leader Partition 1 Replica 101 Broker 100 Broker 101 Broker 102 Partition 1 Replica 102 Latest offset 1020 Latest offset 1010 Latest offset 1010 Latest offset 1000 Latest offset 1010 Latest offset 1010 IO ERROR
  • 37. 37 Partition 1 Replica 100 Leader Partition 1 Replica 101 Broker 100 Broker 101 Broker 102 Partition 1 Replica 102 Latest offset 1020 Latest offset 1010 Latest offset 1010 Latest offset 1000 Latest offset 1010 Latest offset 1010 ✗ Too far behind to be leader Too far behind to be leader
  • 39. 39 Partition 1 Replica 100 Leader Partition 1 Replica 101 Broker 100 Broker 101 Broker 102 Partition 1 Replica 102 Latest offset 1000 Latest offset 1010 Latest offset 1010 Latest offset 1000 Latest offset 1010 Latest offset 1010 I’m back. As you know, I’m the leader. Based on my disk, latest event is 1000
  • 40. 40 Partition 1 Replica 100 Leader Partition 1 Replica 101 Broker 100 Broker 101 Broker 102 Partition 1 Replica 102 Latest offset 1000 Latest offset 1010 Latest offset 1010 Latest offset 1000 Latest offset 1010 Latest offset 1010 What did I miss? LOL. No. Latest is 1010. We can’t follow you.
  • 41. 41 Solution: Enable unclean leader election. Lose messages 1010-1020.
  • 44. 45 Systems Hierarchy of Needs CPU Bandwidth Disk RAM
  • 45. 46 Most common symptom: under-replicated partitions. You basically can’t alert on that. We monitor the resources, act early, and add resources.
  • 46. 47 How to add CPU / bandwidth? Normally by adding brokers and rebalancing partitions.
  • 47. 48 When good EBS volumes go bad
  • 48. 49 Partition 1 Replica 100 Leader Partition 1 Replica 101 Broker 100 Broker 101 Broker 102 Partition 1 Replica 102 Hanging. Not talking to anyone Zookeeper: /brokers/100, 101, 102
  • 50. 51 Zookeeper: /brokers/100, 101, 102 Network Threads Also reading from disk Partition 1 Replica 101 Broker 100 Broker 101 Partition 2 Replica 101 Leader Partition 1 Replica 100 Leader Partition 2 Replica 100 Request Threads Writing to disk Replica Fetchers Reading from leader Writing to disk Zookeeper Client No disks involved Hanging. Not talking to anyone
  • 51. 52 Partition 1 Replica 100 Leader Partition 1 Replica 101 Broker 100 Broker 101 Broker 102 Controller Partition 1 Replica 102 Partition 2 Replica 100 Partition 2 Replica 101 Leader Partition 2 Replica 102 Zookeeper: /brokers/101, 102 ✗ Only part of the broker that is alive!
  • 52. 53 Partition 1 Replica 100 Leader Partition 1 Replica 101 Broker 100 Broker 101 Broker 102 Controller Partition 1 Replica 102 Partition 2 Replica 100 Partition 2 Replica 101 Leader Partition 2 Replica 102 Zookeeper: /brokers/101, 102 ✗ Broker 100 is totally alive! No need to elect leaders!
  • 54. 55 Solution: Stop the broker ASAP. Open ticket to replace disk
  • 55. 56 How to detect this? ● Broker is up ● Logs look fine ● Request Handler idle% is 0 ● Network Handler idle% is 0 ● Client time-out
  • 56. 57
  • 57. 58 Canary ● Lead partition on every broker ● Produce and Consume ● Every 10 seconds ● Yell if 3 consecutive misses Partition 1 Replica 100 Leader Partition 1 Replica 101 Broker 100 Broker 101 Partition 2 Replica 100 Partition 2 Replica 101 Leader
  • 58. 59 You can reuse canary for simple failure injection testing
  • 60. 61 You don’t really know how your software will behave until it is in production for quite a while.
  • 61. 62 More Key Points ● Keep an eye on your key resources ● Tread carefully in unknown territory ● Sometimes a crashed broker is GOOD ● Monitor user scenarios - especially for SLAs
  • 62. 63
  • 63. 64
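
The canary slide (produce and consume against a partition led by every broker, every 10 seconds, yell after 3 consecutive misses) can be sketched roughly as below. This is a minimal illustration, not the speaker's implementation: the `Canary` class and the `probe` callable are hypothetical names, and `probe(broker_id)` is assumed to perform one produce-and-consume round trip against a partition led by that broker, returning `True` on success.

```python
from typing import Callable, Dict, Iterable, List


class Canary:
    """Counts consecutive failed round trips per broker (hypothetical helper)."""

    def __init__(self, probe: Callable[[int], bool], max_misses: int = 3) -> None:
        # probe is an assumption: it should produce to and consume from a
        # partition led by the given broker and return True on success.
        self.probe = probe
        self.max_misses = max_misses
        self.misses: Dict[int, int] = {}

    def check(self, broker_ids: Iterable[int]) -> List[int]:
        """Run one probe round; return brokers at or above max_misses."""
        alerts = []
        for broker_id in broker_ids:
            try:
                ok = self.probe(broker_id)
            except Exception:
                ok = False  # a client timeout or error counts as a miss
            if ok:
                self.misses[broker_id] = 0  # any success resets the streak
            else:
                self.misses[broker_id] = self.misses.get(broker_id, 0) + 1
            if self.misses[broker_id] >= self.max_misses:
                alerts.append(broker_id)
        return alerts
```

In use, a scheduler would call `check([100, 101, 102])` every 10 seconds and page someone whenever the returned list is non-empty. Swapping in a `probe` that deliberately fails for one broker is one way to reuse the same canary for simple failure injection testing.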

Editor's Notes

  1. When you download Apache Kafka, you get a cluster of brokers, Java clients, a connector framework and stream processing libraries. We are going to talk about the cluster bits, although all bits need to be monitored.
  2. Those two messages are sent async, so you may see a few log messages where broker 101 tries to follow broker 102 and fails because broker 102 doesn’t know it is a leader yet. This should resolve itself in a few ms.
  3. As you may have noticed, the “best scenario” is somewhat complex. There are lots of moving parts. So today we are taking a look at cases where things did not go as expected.
  4. You know things are bad when the BEST CASE is a crash. But this case is relatively easy: Get bigger disks. Easy on AWS, GCP and on most advanced storage systems. Adjust retention policy and delete some large partitions by deleting the entire directory (make sure they are not leaders!)
  5. You know things are bad when the BEST CASE is a crash. But this case is relatively easy: Get bigger disks. Easy on AWS, GCP and on most advanced storage systems. Adjust retention policy and delete some large partitions by deleting the entire directory (make sure they are not leaders!)
  6. This is snatching defeat from the jaws of victory. If you are close to running out of space, but not there yet, you can make a lot of adjustments to retention or you can cleanly shut down the broker. The above “solution” will go undetected by the broker until someone tries to access an offset with non-existing data. Since the index still points to the now-empty file, this will be seen as corruption and can lead to crashes (depending on the exact scenario).
  7. If this has already happened, you don’t have many options.
  8. Note that this is a good solution because the alert is very actionable. Look at the awesome runbook! You can do this even at 2am.
  10. When you are close to the limit in any of those, the brokers will be unhealthy. Small things can tip Kafka over the edge. The most stable systems are overprovisioned. Brendan Gregg’s USE method is useful to understand the current state: http://www.brendangregg.com/usemethod.html
  11. A somewhat related problem, except worse.
  12. The moment you stop the broker, downtime is over.
  13. If you don’t want to be an expert… there are options.
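
The notes on disk space argue for acting before a disk fills up, with an alert that is actionable. A minimal sketch of such a check follows; it assumes a single Kafka log directory on one mount, and the function names and the 70% / 85% thresholds are illustrative assumptions, not values from the talk.

```python
import shutil


def classify_usage(used_pct: float, warn_pct: float = 70.0, crit_pct: float = 85.0) -> str:
    """Map a used-space percentage to an alert level (thresholds are assumptions)."""
    if used_pct >= crit_pct:
        return "critical"  # e.g. page someone and follow the runbook
    if used_pct >= warn_pct:
        return "warning"   # e.g. plan a retention change or a bigger disk
    return "ok"


def used_percent(path: str) -> float:
    """Return the percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def check_log_dir(path: str) -> str:
    """Classify disk usage for a (hypothetical) Kafka log directory."""
    return classify_usage(used_percent(path))
```

Calling `check_log_dir` on the broker's log directory from a periodic job gives a signal that fires while there is still room to adjust retention or shut the broker down cleanly.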