Monitor Disk Space
and other ways to keep Kafka happy
Gwen Shapira, @gwenshap, Software Engineer
Me
• Software engineer @ Confluent
• Committer on Apache Kafka
• Co-author of “Kafka - the Definitive Guide”
• Tweets a lot: @gwenshap
• Learning to devops
In which disk-related failure scenarios are discussed in unprecedented detail
Apache Kafka in 3 slides
Producer / Consumer / Kafka Cluster / Stream Processing Apps / Connectors
Partitions
• Kafka organizes messages into topics
• Each topic has a set of partitions
• Each partition is a replicated log of messages, referenced by sequential offset
Partition 0: 0 1 2 3 4 5
Partition 1: 0 1 2 3 4 5 6 7
Partition 2: 0 1 2 3 4
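The log-and-offset model above can be sketched in a few lines of Python (a toy model for illustration, not Kafka’s implementation):

```python
# Toy model of Kafka topics/partitions: each partition is an append-only
# list, and a message's offset is simply its index in that list.
class Topic:
    def __init__(self, name, num_partitions=3):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, message):
        """Append a message and return its offset in the partition."""
        log = self.partitions[partition]
        log.append(message)
        return len(log) - 1  # offsets are sequential, starting at 0

    def read(self, partition, offset):
        """Messages are referenced by (partition, offset)."""
        return self.partitions[partition][offset]

t = Topic("clicks")
first = t.append(0, "msg-a")
second = t.append(0, "msg-b")
print(first, second, t.read(0, 1))  # -> 0 1 msg-b
```

Because offsets are just positions in the log, a consumer’s entire state is one number per partition.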
Replication
• Each partition is replicated 3 times
• Each replica lives on a separate broker
• The leader handles all reads and writes
• Followers replicate events from the leader
Replica 1, Replica 2, Replica 3: 0 1 2 3 4 5 6 7 (written by the Producer)
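The leader/follower flow can be sketched the same way (again a toy model; the real replication protocol also tracks the in-sync replica set):

```python
# Toy sketch of Kafka replication: the leader takes all writes; followers
# fetch anything past their own log-end offset from the leader.
class Replica:
    def __init__(self):
        self.log = []

    def log_end_offset(self):
        return len(self.log)

def fetch(follower, leader):
    """Follower asks the leader for everything after its log-end offset."""
    start = follower.log_end_offset()
    follower.log.extend(leader.log[start:])

leader = Replica()
followers = [Replica(), Replica()]

# Producer writes go to the leader only.
for msg in ["a", "b", "c"]:
    leader.log.append(msg)

# Followers replicate by fetching from the leader.
for f in followers:
    fetch(f, leader)

assert all(f.log == leader.log for f in followers)  # fully in sync
```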
The way failures SHOULD go
Broker 100: Partition 1 Replica 100 (Leader), Partition 2 Replica 100
Broker 101: Partition 1 Replica 101, Partition 2 Replica 101 (Leader)
Broker 102 (Controller): Partition 1 Replica 102, Partition 2 Replica 102
Zookeeper: /brokers/100, 101, 102
Broker 100 ✗: Partition 1 Replica 100 (Leader), Partition 2 Replica 100
Broker 101: Partition 1 Replica 101, Partition 2 Replica 101 (Leader)
Broker 102 (Controller): Partition 1 Replica 102, Partition 2 Replica 102
Zookeeper: /brokers/101, 102
Broker 100 ✗: Partition 1 Replica 100 (Leader), Partition 2 Replica 100
Broker 101: Partition 1 Replica 101, Partition 2 Replica 101 (Leader)
Broker 102 (Controller): Partition 1 Replica 102, Partition 2 Replica 102
Zookeeper: /brokers/101, 102
Controller: “Oh no. Broker 100 is missing.”
Broker 100 ✗: Partition 1 Replica 100 (Leader), Partition 2 Replica 100
Broker 101: Partition 1 Replica 101, Partition 2 Replica 101 (Leader)
Broker 102 (Controller): Partition 1 Replica 102, Partition 2 Replica 102
Zookeeper: /brokers/101, 102
Controller: “Broker 102: you now lead partition 1. Broker 101: you now follow broker 102 for partition 1.”
Broker 100 ✗: Partition 1 Replica 100, Partition 2 Replica 100
Broker 101: Partition 1 Replica 101, Partition 2 Replica 101 (Leader)
Broker 102 (Controller): Partition 1 Replica 102 (Leader), Partition 2 Replica 102
Zookeeper: /brokers/101, 102
Broker 100: Partition 1 Replica 100, Partition 2 Replica 100
Broker 101: Partition 1 Replica 101, Partition 2 Replica 101 (Leader)
Broker 102 (Controller): Partition 1 Replica 102 (Leader), Partition 2 Replica 102
Zookeeper: /brokers/100, 101, 102
Broker 100: Partition 1 Replica 100, Partition 2 Replica 100
Broker 101: Partition 1 Replica 101, Partition 2 Replica 101 (Leader)
Broker 102 (Controller): Partition 1 Replica 102 (Leader), Partition 2 Replica 102
Zookeeper: /brokers/100, 101, 102
Controller: “Broker 100 is back! Broker 100: note the new leaders, 101 and 102.”
Broker 100: Partition 1 Replica 100, Partition 2 Replica 100
Broker 101: Partition 1 Replica 101, Partition 2 Replica 101 (Leader)
Broker 102 (Controller): Partition 1 Replica 102 (Leader), Partition 2 Replica 102
Zookeeper: /brokers/100, 101, 102
Broker 100: “What did I miss?”
Broker 100: Partition 1 Replica 100, Partition 2 Replica 100
Broker 101: Partition 1 Replica 101, Partition 2 Replica 101 (Leader)
Broker 102 (Controller): Partition 1 Replica 102 (Leader), Partition 2 Replica 102
Zookeeper: /brokers/100, 101, 102
Broker 100: “What did I miss?” Leaders: “Lots of events!”
Broker 100: Partition 1 Replica 100, Partition 2 Replica 100
Broker 101: Partition 1 Replica 101, Partition 2 Replica 101 (Leader)
Broker 102 (Controller): Partition 1 Replica 102 (Leader), Partition 2 Replica 102
Zookeeper: /brokers/100, 101, 102
Broker 100: “Thanks guys, I caught up!”
Broker 100: Partition 1 Replica 100, Partition 2 Replica 100
Broker 101: Partition 1 Replica 101, Partition 2 Replica 101 (Leader)
Broker 102 (Controller): Partition 1 Replica 102 (Leader), Partition 2 Replica 102
Zookeeper: /brokers/100, 101, 102
Controller: “Broker 100, you are preferred leader for partition 1. Broker 101, follow broker 100 for partition 1. Broker 102, follow broker 100 for partition 1.”
Broker 100: Partition 1 Replica 100 (Leader), Partition 2 Replica 100
Broker 101: Partition 1 Replica 101, Partition 2 Replica 101 (Leader)
Broker 102 (Controller): Partition 1 Replica 102, Partition 2 Replica 102
Zookeeper: /brokers/100, 101, 102
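The controller’s reaction in the walkthrough above can be sketched as a toy model (the real controller also consults the in-sync replica set before electing a leader):

```python
# Toy controller: when a broker disappears from Zookeeper, move
# leadership of its partitions to the first remaining live replica.
def elect_leaders(assignments, leaders, live_brokers):
    """assignments: partition -> ordered replica list (first = preferred)."""
    for partition, replicas in assignments.items():
        if leaders[partition] not in live_brokers:
            # current leader is gone: pick the first live replica
            leaders[partition] = next(b for b in replicas if b in live_brokers)
    return leaders

assignments = {"p1": [100, 101, 102], "p2": [101, 102, 100]}
leaders = {"p1": 100, "p2": 101}

# Broker 100 crashes: p1 leadership moves to broker 101; p2 is untouched.
leaders = elect_leaders(assignments, leaders, live_brokers={101, 102})
print(leaders)  # -> {'p1': 101, 'p2': 101}
```

Preferred leader election is the reverse step: once broker 100 catches back up, leadership of p1 is handed back to the first-listed replica.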
What could possibly go wrong?
When Kafka runs out of disk space
Best case scenario:
Broker ran out of disk space and crashed.
Solution:
1. Get bigger disks
2. Store less data
What not to do. Ever:
cat /dev/null > /data/log/my_topic-15/00000000000001548736.log
While Kafka is up and running.
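If you are low on space, the supported route is to tighten retention instead of touching segment files by hand. For example, broker-wide defaults in server.properties (the values below are illustrative, not recommendations):

```properties
# server.properties - broker-wide retention defaults (example values)
log.retention.hours=48            # delete segments older than 2 days
log.retention.bytes=107374182400  # or cap each partition at ~100 GB
log.segment.bytes=1073741824      # 1 GB segments, so deletion is granular
```

Unlike the `cat /dev/null` trick, this deletes whole segments through the broker, so indexes and replicas stay consistent.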
When you are in a hole, stop digging.
Don’t know where the holes are? Walk slowly.
General Tips for Stable Kafka
● Over-provision
● Upgrade to the latest bug-fix release
● Don’t mess with stuff you don’t understand
● Call support when you have to
Bad Scenario
https://issues.apache.org/jira/browse/KAFKA-7151
Broker 100: Partition 1 Replica 100 (Leader) | Broker 101: Partition 1 Replica 101 | Broker 102: Partition 1 Replica 102
Latest offset: 1000 | 1000 | 1000
Latest offset on disk: 1000 | 1000 | 1000
Broker 100: Partition 1 Replica 100 (Leader) | Broker 101: Partition 1 Replica 101 | Broker 102: Partition 1 Replica 102
Latest offset: 1010 | 1000 | 1000
Latest offset on disk: 1000 | 1000 | 1000
Followers: “What did I miss? Anything after 1000?”
Broker 100: Partition 1 Replica 100 (Leader) | Broker 101: Partition 1 Replica 101 | Broker 102: Partition 1 Replica 102
Latest offset: 1010 | 1000 | 1000
Latest offset on disk: 1000 | 1000 | 1000
Leader: “Here is 1001 to 1010”
Broker 100: Partition 1 Replica 100 (Leader) | Broker 101: Partition 1 Replica 101 | Broker 102: Partition 1 Replica 102
Latest offset: 1010 | 1010 | 1010
Latest offset on disk: 1000 | 1000 | 1010
Broker 100: Partition 1 Replica 100 (Leader) | Broker 101: Partition 1 Replica 101 | Broker 102: Partition 1 Replica 102
Latest offset: 1020 | 1010 | 1010
Latest offset on disk: 1000 | 1000 | 1010
Broker 100: Partition 1 Replica 100 (Leader) | Broker 101: Partition 1 Replica 101 | Broker 102: Partition 1 Replica 102
Latest offset: 1020 | 1010 | 1010
Latest offset on disk: 1000 | 1010 | 1010
Followers: “What did I miss? Anything after 1010?”
Broker 100: Partition 1 Replica 100 (Leader) | Broker 101: Partition 1 Replica 101 | Broker 102: Partition 1 Replica 102
Latest offset: 1020 | 1010 | 1010
Latest offset on disk: 1000 | 1010 | 1010
Broker 100: “Too busy trying to access disk”
Broker 100: Partition 1 Replica 100 (Leader) | Broker 101: Partition 1 Replica 101 | Broker 102: Partition 1 Replica 102
Latest offset: 1020 | 1010 | 1010
Latest offset on disk: 1000 | 1010 | 1010
Broker 100: IO ERROR
Broker 100 ✗: Partition 1 Replica 100 (Leader) | Broker 101: Partition 1 Replica 101 | Broker 102: Partition 1 Replica 102
Latest offset: 1020 | 1010 | 1010
Latest offset on disk: 1000 | 1010 | 1010
Brokers 101 and 102: “Too far behind to be leader”
Downtime.
Broker 100: Partition 1 Replica 100 (Leader) | Broker 101: Partition 1 Replica 101 | Broker 102: Partition 1 Replica 102
Latest offset: 1000 | 1010 | 1010
Latest offset on disk: 1000 | 1010 | 1010
Broker 100: “I’m back. As you know, I’m the leader. Based on my disk, the latest event is 1000.”
Broker 100: Partition 1 Replica 100 (Leader) | Broker 101: Partition 1 Replica 101 | Broker 102: Partition 1 Replica 102
Latest offset: 1000 | 1010 | 1010
Latest offset on disk: 1000 | 1010 | 1010
Followers: “What did I miss? … LOL. No. Latest is 1010. We can’t follow you.”
Solution:
Enable unclean leader election.
Lose messages 1010-1020.
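The relevant setting is below. This is a deliberate availability-over-durability trade, which is why the cluster-wide default has been false since Kafka 0.11; it can also be overridden per topic.

```properties
# server.properties (cluster-wide default; enable only if losing the
# unreplicated tail of the log is acceptable for your data)
unclean.leader.election.enable=true
```

With this enabled, an out-of-sync replica (here, broker 101 or 102 at offset 1010) is allowed to take leadership, and the partition comes back online minus the lost tail.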
Solution:
https://issues.apache.org/jira/browse/KAFKA-7151
Systems Hierarchy of Needs: CPU, Bandwidth, Disk, RAM
Most common symptom: under-replicated partitions.
You basically can’t alert on that.
We monitor the resources, act early, and add resources.
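For reference, a partition is under-replicated when its in-sync replica (ISR) set is smaller than its full replica assignment; brokers expose the count as the `UnderReplicatedPartitions` metric on `ReplicaManager`. A toy version of the computation:

```python
# Sketch: a partition is under-replicated when the in-sync replica (ISR)
# set is smaller than its full replica assignment.
def under_replicated_count(partitions):
    """partitions: list of (replicas, isr) tuples of broker-id lists."""
    return sum(1 for replicas, isr in partitions if len(isr) < len(replicas))

state = [
    ([100, 101, 102], [100, 101, 102]),  # healthy
    ([100, 101, 102], [101, 102]),       # broker 100 fell out of the ISR
]
print(under_replicated_count(state))  # -> 1
```

The metric is noisy precisely because any transient resource pressure can push a replica out of the ISR, which is why the slide recommends alerting on the underlying resources instead.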
How to add CPU / bandwidth?
Normally by adding brokers and rebalancing partitions.
When good EBS volumes go bad
Broker 100: Partition 1 Replica 100 (Leader) — disk hanging, not talking to anyone
Broker 101: Partition 1 Replica 101
Broker 102: Partition 1 Replica 102
Zookeeper: /brokers/100, 101, 102
What will happen? Let’s zoom in.
Zookeeper: /brokers/100, 101, 102
Broker 100 (Partition 1 Replica 100 Leader, Partition 2 Replica 100) — disk hanging, not talking to anyone:
● Network threads — also reading from disk
● Request threads — writing to disk
● Replica fetchers — reading from the leader, writing to disk
● Zookeeper client — no disks involved
Broker 101: Partition 1 Replica 101, Partition 2 Replica 101 (Leader)
Broker 100: Partition 1 Replica 100 (Leader), Partition 2 Replica 100 — the Zookeeper client is the only part of the broker that is alive!
Broker 101: Partition 1 Replica 101, Partition 2 Replica 101 (Leader)
Broker 102 (Controller): Partition 1 Replica 102, Partition 2 Replica 102
Zookeeper: /brokers/100, 101, 102
Broker 100: Partition 1 Replica 100 (Leader), Partition 2 Replica 100
Broker 101: Partition 1 Replica 101, Partition 2 Replica 101 (Leader)
Broker 102 (Controller): Partition 1 Replica 102, Partition 2 Replica 102
Zookeeper: /brokers/100, 101, 102
Controller: “Broker 100 is totally alive! No need to elect leaders!”
Downtime.
Solution:
Stop the broker ASAP.
Open a ticket to replace the disk.
How to detect this?
● Broker is up
● Logs look fine
● Request handler idle% is 0
● Network handler idle% is 0
● Clients time out
Canary
● Lead a partition on every broker
● Produce and consume
● Every 10 seconds
● Yell after 3 consecutive misses
Broker 100: Partition 1 Replica 100 (Leader), Partition 2 Replica 100
Broker 101: Partition 1 Replica 101, Partition 2 Replica 101 (Leader)
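The canary loop above can be sketched as follows (hypothetical helper names: `produce_consume_ok` stands in for a real round-trip through a canary partition led by that broker, and `alert` for your paging system):

```python
import time

# Canary sketch: probe each broker on a fixed interval and alert after
# 3 consecutive misses, so one slow probe doesn't wake anyone up.
PROBE_INTERVAL = 10  # seconds
MAX_MISSES = 3

def run_canary(brokers, produce_consume_ok, alert, probes, interval=PROBE_INTERVAL):
    misses = {b: 0 for b in brokers}
    for _ in range(probes):
        for b in brokers:
            if produce_consume_ok(b):
                misses[b] = 0          # any success resets the counter
            else:
                misses[b] += 1
                if misses[b] == MAX_MISSES:
                    alert(b)           # yell: broker looks wedged
        time.sleep(interval)

# Demo: a healthy broker never triggers an alert.
alerts = []
run_canary(["broker-100"], lambda b: True, alerts.append, probes=3, interval=0)
print(alerts)  # -> []
```

Because the canary exercises the full produce/consume path, it catches the "broker is up but its disk is hung" case that process-level checks miss.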
You can reuse the canary for simple failure injection testing.
Summary
You don’t really know how your software will behave until it is in production for quite a while.
More Key Points
● Keep an eye on your key resources
● Tread carefully in unknown territory
● Sometimes a crashed broker is GOOD
● Monitor user scenarios - especially for SLAs


Velocity 2019 - Kafka Operations Deep Dive


Editor's Notes

  1. When you download Apache Kafka, you get a cluster of brokers, Java clients, a connector framework, and stream processing libraries. We are going to talk about the cluster bits, although all the bits need to be monitored.
  2. Those two messages are sent async, so you may see a few log messages where broker 101 tries to follow broker 102 and fails because broker 102 doesn’t know it is a leader yet. This should resolve itself in a few ms.
  3. As you may have noticed, the “best scenario” is somewhat complex. There are lots of moving parts. So today we are taking a look at cases where things did not go as expected.
  4. You know things are bad when the BEST CASE is a crash. But this case is relatively easy: get bigger disks (easy on AWS, GCP, and most advanced storage systems), adjust the retention policy, and delete some large partitions by deleting the entire directory (make sure they are not leaders!).
  5. This is snatching defeat from the jaws of victory. If you are close to running out of space, but not there yet, you can make a lot of adjustments to retention, or you can cleanly shut down the broker. The above “solution” will go undetected by the broker until someone tries to access an offset with non-existing data. Since the index still points to the now-empty file, this will be seen as corruption and can lead to crashes (depending on the exact scenario).
  6. If this already happened, you don’t have lots of options.
  7. Note that this is a good solution because the alert is very actionable. Look at the awesome runbook! You can do this even at 2am.
  8. When you are close to the limit on any of those resources, the brokers will be unhealthy. Small things can tip Kafka over the edge. The most stable systems are overprovisioned. Brendan Gregg’s USE method is useful for understanding the current state: http://www.brendangregg.com/usemethod.html
  9. Somewhat related problem, except worse.
  10. The moment you stop the broker, the downtime is over.
  11. If you don’t want to be an expert… there are options.