Kafka Needs No Keeper
Colin McCabe
Watch the video with slide synchronization on InfoQ.com!
https://www.infoq.com/presentations/kafka-zookeeper/
Presented at QCon San Francisco
www.qconsf.com
Introduction
● Kafka has gotten a lot of mileage out of ZooKeeper
● But it is still a second system to deploy and manage
● KIP-500 has been adopted by the community
● This is not a 1:1 replacement
● We’ve been heading in this direction for years
Evolution of Apache Kafka Clients
Producer
- write to topics
Consumer
- read from topics
- offset fetch/commit
- group partition assignment
Admin Tools
- topic create/delete
Consumer Group Coordinator
Consumer
- read from topics
- offset fetch/commit (stored in __offsets)
- group partition assignment
Consumer APIs:
● Fetch
● OffsetCommit
● OffsetFetch
● JoinGroup
● SyncGroup
● Heartbeat
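The OffsetCommit / OffsetFetch pair can be pictured as a coordinator that appends each commit to an offsets log and serves fetches from an in-memory view rebuilt by replaying that log. This is a minimal sketch, assuming a single offsets partition and hypothetical class names (`OffsetsLog`, `GroupCoordinator`); compaction, group generations, and retention are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# (group id, topic, partition): the key of a committed-offset record.
OffsetKey = Tuple[str, str, int]


@dataclass
class OffsetsLog:
    """Hypothetical stand-in for the __offsets topic: an append-only record list."""
    records: List[Tuple[OffsetKey, int]] = field(default_factory=list)


class GroupCoordinator:
    """Toy coordinator: OffsetCommit appends to the log, OffsetFetch reads a cache."""

    def __init__(self, log: OffsetsLog) -> None:
        self.log = log
        self.cache: Dict[OffsetKey, int] = {}
        # A new coordinator (e.g. after failover) rebuilds its cache by replaying
        # the log; later records for the same key overwrite earlier ones.
        for key, offset in log.records:
            self.cache[key] = offset

    def offset_commit(self, group: str, topic: str, partition: int, offset: int) -> None:
        key = (group, topic, partition)
        self.log.records.append((key, offset))  # persist to the log first
        self.cache[key] = offset                # then update the in-memory view

    def offset_fetch(self, group: str, topic: str, partition: int) -> Optional[int]:
        # None means the group has never committed an offset for this partition.
        return self.cache.get((group, topic, partition))


log = OffsetsLog()
coordinator = GroupCoordinator(log)
coordinator.offset_commit("g1", "foo", 0, 42)
coordinator.offset_commit("g1", "foo", 0, 57)

# A replacement coordinator recovers the latest committed position from the log.
replacement = GroupCoordinator(log)
print(replacement.offset_fetch("g1", "foo", 0))  # 57
```

Because the log is the only durable state, any broker that takes over the coordinator role can serve OffsetFetch once it has replayed the log.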
Kafka Security and the Admin Client
Producer, Consumer, Admin Tools
- Admin Tools: create/delete topics
ACL Enforcement
AdminClient
Admin APIs:
● CreateTopics
● DeleteTopics
● AlterConfigs
● ...
Client APIs (Producer, Consumer, AdminClient):
● Produce
● Fetch
● Metadata
● CreateTopics
● DeleteTopics
● ...
Benefits:
● Encapsulation
● Security
● Validation
● Compatibility
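Routing topic creation through a broker API puts the security and validation checks in one place. The sketch below is a hypothetical broker-side handler (`ToyBroker`, `CreateTopicRequest` are illustrative names, not Kafka classes); its error strings borrow Kafka error-code names, but the ACL model is simplified to (principal, operation) pairs.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class CreateTopicRequest:
    principal: str
    topic: str
    partitions: int
    replication_factor: int


class ToyBroker:
    """Hypothetical broker-side handler; error strings borrow Kafka error-code names."""

    def __init__(self, allowed: Set[Tuple[str, str]], live_brokers: int) -> None:
        self.allowed = allowed            # (principal, operation) pairs granted by ACLs
        self.live_brokers = live_brokers  # used to validate the replication factor
        self.topics: Dict[str, int] = {}  # topic name -> partition count

    def handle_create_topic(self, req: CreateTopicRequest) -> str:
        # Security: the ACL check happens on the broker, not in the client tool.
        if (req.principal, "CREATE") not in self.allowed:
            return "TOPIC_AUTHORIZATION_FAILED"
        # Validation: reject requests that would leave the cluster inconsistent.
        if req.topic in self.topics:
            return "TOPIC_ALREADY_EXISTS"
        if req.partitions < 1:
            return "INVALID_PARTITIONS"
        if not 1 <= req.replication_factor <= self.live_brokers:
            return "INVALID_REPLICATION_FACTOR"
        # Encapsulation: only the broker decides how the topic is stored.
        self.topics[req.topic] = req.partitions
        return "NONE"


broker = ToyBroker(allowed={("alice", "CREATE")}, live_brokers=3)
print(broker.handle_create_topic(CreateTopicRequest("alice", "foo", 6, 3)))  # NONE
print(broker.handle_create_topic(CreateTopicRequest("bob", "bar", 6, 3)))    # TOPIC_AUTHORIZATION_FAILED
print(broker.handle_create_topic(CreateTopicRequest("alice", "baz", 6, 5)))  # INVALID_REPLICATION_FACTOR
```

A tool that wrote topic state straight into ZooKeeper would skip every one of these checks, which is the gap the broker-side APIs close.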
Inter Broker Communication
Broker Registration
ACL Management
Dynamic Configuration
ISR Management
Controller Election
Controller APIs:
● LeaderAndIsr (Leader/ISR Push)
● UpdateMetadata (Update Metadata)
● StopReplica (Stop/Delete Replica)
● AlterIsr (ISR Management)
● Encapsulation
● Compatibility
● Ownership
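With AlterIsr, the controller owns ISR state and the partition leader asks for changes instead of writing them itself. A minimal sketch of that ownership check, assuming a hypothetical `ToyController` that versions each partition's ISR; the validation rules are simplified and only loosely follow Kafka's behavior (the error names are real Kafka error codes).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PartitionState:
    leader: int
    leader_epoch: int
    isr: List[int]
    version: int  # bumped on every accepted ISR change


class ToyController:
    """Hypothetical controller that owns ISR state; leaders request changes via AlterIsr."""

    def __init__(self) -> None:
        self.partitions: Dict[str, PartitionState] = {}

    def alter_isr(self, partition: str, broker_id: int, leader_epoch: int,
                  version: int, new_isr: List[int]) -> str:
        state = self.partitions[partition]
        # Only the current leader, at the current epoch, may change the ISR.
        if broker_id != state.leader or leader_epoch != state.leader_epoch:
            return "FENCED_LEADER_EPOCH"
        # The request must be based on the latest ISR version the controller knows.
        if version != state.version:
            return "INVALID_UPDATE_VERSION"
        # The leader itself must remain in the ISR.
        if state.leader not in new_isr:
            return "INVALID_REQUEST"
        state.isr = list(new_isr)
        state.version += 1
        return "NONE"


controller = ToyController()
controller.partitions["foo-0"] = PartitionState(leader=1, leader_epoch=5, isr=[1, 2, 3], version=0)
print(controller.alter_isr("foo-0", 1, 5, 0, [1, 2]))  # NONE: broker 3 leaves the ISR
print(controller.alter_isr("foo-0", 1, 5, 0, [1]))     # INVALID_UPDATE_VERSION: stale version
print(controller.alter_isr("foo-0", 2, 5, 1, [2]))     # FENCED_LEADER_EPOCH: not the leader
```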
Broker Liveness
ZK Session
/brokers/1 -> {
host: 10.10.10.1:9092
rack: rack-1
}
Watch trigger: Broker 1 is offline
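The liveness mechanism above ties a broker's registration to its ZooKeeper session: the node disappears when the session expires, and a watch tells the controller. The sketch is a hypothetical in-memory model (`ToyZooKeeper` is not a real ZooKeeper client API); unlike real ZooKeeper watches, which fire once and must be re-registered, these callbacks stay registered.

```python
from typing import Callable, Dict, List, Tuple


class ToyZooKeeper:
    """Hypothetical in-memory model of ephemeral znodes and delete watches."""

    def __init__(self) -> None:
        # path -> (data, id of the session that owns the ephemeral node)
        self.nodes: Dict[str, Tuple[dict, int]] = {}
        self.delete_watchers: List[Callable[[str], None]] = []

    def create_ephemeral(self, path: str, data: dict, session_id: int) -> None:
        self.nodes[path] = (data, session_id)

    def watch_deletes(self, callback: Callable[[str], None]) -> None:
        self.delete_watchers.append(callback)

    def expire_session(self, session_id: int) -> None:
        # When a session expires, every ephemeral node it owns is deleted,
        # and each deletion fires the registered watches.
        owned = [path for path, (_, owner) in self.nodes.items() if owner == session_id]
        for path in owned:
            del self.nodes[path]
            for callback in self.delete_watchers:
                callback(path)


events: List[str] = []
zk = ToyZooKeeper()
# The controller watches for broker znodes disappearing.
zk.watch_deletes(lambda path: events.append(f"Broker {path.rsplit('/', 1)[1]} is offline"))
# Broker 1 registers itself under its own session.
zk.create_ephemeral("/brokers/1", {"host": "10.10.10.1:9092", "rack": "rack-1"}, session_id=101)
zk.expire_session(101)
print(events)  # ['Broker 1 is offline']
```

Note that liveness here means "can talk to ZooKeeper", not "can talk to the controller", which is exactly what the partition cases that follow probe.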
Network Partition Resilience
Case 1: Total partition
Case 2: Broker partition
Case 3: ZK partition
Case 4: Controller partition
Metadata Inconsistency
Metadata Source of Truth
Controller Metadata Cache
- sync writes
- async updates
Broker Metadata Caches
- async update
Last Resort:
> rmr /controller
- New controller!
- Load ALL Metadata
- Push ALL Metadata
How do you know the metadata has diverged?
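Without a version attached to each pushed update, one generic way to check for divergence is to compare every entry of every broker cache against the source of truth. A minimal sketch, assuming metadata is a plain dict from partition name to a hypothetical state string; the function name `find_divergence` is illustrative.

```python
from typing import Dict, List

Metadata = Dict[str, str]  # partition name -> state, e.g. "leader=1"


def find_divergence(source: Metadata, caches: Dict[int, Metadata]) -> Dict[int, List[str]]:
    """Return, per broker, the partitions whose cached state differs from the source of truth."""
    diverged: Dict[int, List[str]] = {}
    for broker_id, cache in caches.items():
        # Check every partition known to either side on every broker: O(N * M) work.
        # Partitions missing on one side (e.g. a deleted topic) also count as divergent.
        partitions = set(source) | set(cache)
        stale = sorted(p for p in partitions if cache.get(p) != source.get(p))
        if stale:
            diverged[broker_id] = stale
    return diverged


source = {"foo-0": "leader=1", "foo-1": "leader=2"}
caches = {
    1: {"foo-0": "leader=1", "foo-1": "leader=2"},
    2: {"foo-0": "leader=1", "foo-1": "leader=3"},  # missed an async update
}
print(find_divergence(source, caches))  # {2: ['foo-1']}
```

The full scan is the same shape as "Push ALL Metadata": there is no cheap summary that says where a broker stopped being correct.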
Performance of Controller Initialization
New controller!
Load ALL Metadata
- Complexity: O(N)
- N = number of partitions
Push ALL Metadata
- Complexity: O(N*M)
- N = number of partitions
- M = number of brokers
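To put the two bounds side by side, here is a tiny calculator of how many partition entries each phase touches. The cluster sizes are made-up examples, and entry counts say nothing about bytes on the wire or wall-clock time.

```python
def controller_init_cost(partitions: int, brokers: int) -> dict:
    """Rough entry counts for the two phases of controller initialization."""
    load = partitions            # Load ALL metadata: read every partition once, O(N)
    push = partitions * brokers  # Push ALL metadata: every partition to every broker, O(N*M)
    return {"load": load, "push": push}


print(controller_init_cost(100_000, 50))  # {'load': 100000, 'push': 5000000}
```

Doubling the broker count leaves the load phase unchanged but doubles the push phase.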
Metadata as an Event Log
- Each change becomes a message
- Changes are propagated to all brokers
- Clear ordering
- Can send deltas
- Offset tracks consumer position
- Easy to measure lag
...
924 Create topic “foo”
925 Delete topic “bar”
926 Add node 4 to the cluster
927 Create topic “baz”
928 Alter ISR for “foo-0”
929 Add node 5 to the cluster
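The properties listed for the event log (ordering, deltas, offset as position, lag) fall out of a very small model. A minimal sketch with hypothetical class names (`MetadataLog`, `BrokerMetadataView`); offsets start at 0 here rather than at 924 as in the example log.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MetadataLog:
    """Hypothetical controller metadata log: each change is one record with an offset."""
    records: List[str] = field(default_factory=list)

    def append(self, change: str) -> int:
        self.records.append(change)
        return len(self.records) - 1  # offset of the new record

    def end_offset(self) -> int:
        return len(self.records)  # offset the next record will get


@dataclass
class BrokerMetadataView:
    """A broker consumes the log in order and only needs its next offset to resume."""
    next_offset: int = 0
    applied: List[str] = field(default_factory=list)

    def fetch(self, log: MetadataLog, max_records: int) -> None:
        # Deltas: only records after our position are sent, never the full state.
        batch = log.records[self.next_offset:self.next_offset + max_records]
        self.applied.extend(batch)
        self.next_offset += len(batch)

    def lag(self, log: MetadataLog) -> int:
        return log.end_offset() - self.next_offset


log = MetadataLog()
for change in ['Create topic "foo"', 'Delete topic "bar"', "Add node 4 to the cluster",
               'Create topic "baz"', 'Alter ISR for "foo-0"', "Add node 5 to the cluster"]:
    log.append(change)

broker = BrokerMetadataView()
broker.fetch(log, max_records=4)
print(broker.lag(log))  # 2
broker.fetch(log, max_records=4)
print(broker.lag(log))  # 0
```

Answering "has this broker diverged?" becomes a comparison of two integers instead of a full scan.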
Consumers: offset=3, offset=1, offset=2
Brokers: offset=3, offset=1, offset=2 (reading from ?)
Brokers: offset=3, offset=1, offset=2 (reading from the Controller)
Implementing the Controller Log
Can we use the existing Kafka log replication protocol?
- How do we elect the leader?
We need a self-managed quorum.
Enter Raft.
Leader election is by simple majority.
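Majority-based leader election can be sketched with Raft's vote rules: one vote per term, and a vote only for a candidate whose log is at least as up to date as the voter's. This is a minimal sketch under those textbook rules, with hypothetical names (`RaftNode`, `run_election`); RPCs, timeouts, and log replication are omitted, and it is not Kafka's implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RaftNode:
    node_id: int
    current_term: int = 0
    voted_for: Optional[int] = None
    last_log_term: int = 0   # term of the last entry in this node's log
    last_log_index: int = 0  # index of the last entry in this node's log

    def handle_vote_request(self, term: int, candidate_id: int,
                            cand_last_term: int, cand_last_index: int) -> bool:
        if term < self.current_term:
            return False  # stale candidate
        if term > self.current_term:
            self.current_term = term  # newer term: forget any earlier vote
            self.voted_for = None
        # Up-to-date rule: compare last log term first, then last index.
        up_to_date = (cand_last_term, cand_last_index) >= (self.last_log_term, self.last_log_index)
        if up_to_date and self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id  # at most one vote per term
            return True
        return False


def run_election(candidate: RaftNode, reachable: List[RaftNode], cluster_size: int) -> bool:
    """Candidate bumps its term, votes for itself, and wins with a strict majority."""
    candidate.current_term += 1
    candidate.voted_for = candidate.node_id
    votes = 1
    for peer in reachable:
        if peer.handle_vote_request(candidate.current_term, candidate.node_id,
                                    candidate.last_log_term, candidate.last_log_index):
            votes += 1
    return votes > cluster_size // 2


candidate = RaftNode(1, last_log_term=1, last_log_index=5)
peers = [RaftNode(2, last_log_term=1, last_log_index=3), RaftNode(3, last_log_term=1, last_log_index=5)]
print(run_election(candidate, peers, cluster_size=3))  # True
```

No external coordinator is involved: the quorum members decide among themselves, which is what makes the quorum self-managed.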
Property | Kafka | Raft
Writes | Single leader | Single leader
Fencing | Monotonically increasing epoch | Monotonically increasing term
Log reconciliation | Offset and epoch | Term and index
Push/Pull | Pull | Push
Commit semantics | ISR | Majority
Leader election | From ISR through ZooKeeper | Majority
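The "Commit semantics" row can be made concrete. Kafka-style commit waits for every ISR member, while Raft-style commit needs only a majority of voters. A simplified sketch with hypothetical function names: each number is treated simply as how far that replica's log extends, and details such as `min.insync.replicas` and Raft's rule about committing only current-term entries are ignored.

```python
from typing import Dict, Set


def isr_high_watermark(log_end_offsets: Dict[int, int], isr: Set[int]) -> int:
    """Kafka-style: a record is committed once every ISR member has it."""
    return min(log_end_offsets[replica] for replica in isr)


def majority_commit_index(match_index: Dict[int, int]) -> int:
    """Raft-style: the highest position that a majority of voters (leader included) have stored."""
    ordered = sorted(match_index.values(), reverse=True)
    # In descending order, the value at position n // 2 is held by at least n // 2 + 1 voters.
    return ordered[len(ordered) // 2]


replicas = {1: 10, 2: 8, 3: 5}  # replica id -> how far its log extends
print(isr_high_watermark(replicas, isr={1, 2, 3}))  # 5: waits for the slowest ISR member
print(isr_high_watermark(replicas, isr={1, 2}))     # 8: replica 3 was dropped from the ISR
print(majority_commit_index(replicas))              # 8: two of three voters have it
```

The ISR approach can keep committing with a single replica left but needs an external party (ZooKeeper) to elect from the ISR; the majority approach tolerates a minority of slow nodes without changing membership.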
The Controller Quorum
Brokers (offset=1, offset=2) and three Controller nodes
The Controller Raft Quorum
- The leader is the active controller
- Controls reads / writes to the log
- Typically 3 or 5 nodes, like ZK
Instant Failover
- Low-latency failover via Raft election
- Standbys contain all data in memory
- Brokers do not need to re-fetch
Metadata Caching
- Brokers can persist metadata to disk (/mnt/logs/kafka/metadata, at offset=1 and offset=2 in the diagram)
- Only fetch what they need
- Use snapshots if we’re too far behind
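Catching up from a local copy versus from a snapshot can be sketched as a single decision: if the broker's saved offset is older than the oldest record the controller still retains, it loads the snapshot first, then applies only the newer records. A minimal sketch, assuming a hypothetical metadata model of topic name to partition count and illustrative names (`ControllerLog`, `catch_up`); writing to disk is not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

State = Dict[str, int]  # topic name -> partition count (a hypothetical, tiny metadata model)


@dataclass
class ControllerLog:
    snapshot_offset: int   # the snapshot reflects every record below this offset
    snapshot_state: State
    # Records still retained, as (offset, topic, partition count), all >= snapshot_offset.
    records: List[Tuple[int, str, int]] = field(default_factory=list)


def catch_up(local_offset: int, local_state: State, log: ControllerLog) -> Tuple[int, State, bool]:
    """Bring a broker's cached metadata up to date; returns (next offset, state, used_snapshot)."""
    state = dict(local_state)
    used_snapshot = False
    if local_offset < log.snapshot_offset:
        # Too far behind: the records we need were already folded into the snapshot.
        state = dict(log.snapshot_state)
        local_offset = log.snapshot_offset
        used_snapshot = True
    for offset, topic, partitions in log.records:
        if offset >= local_offset:  # only fetch what we do not have yet
            state[topic] = partitions
            local_offset = offset + 1
    return local_offset, state, used_snapshot


log = ControllerLog(snapshot_offset=100, snapshot_state={"foo": 3},
                    records=[(100, "bar", 1), (101, "foo", 6)])
print(catch_up(101, {"foo": 3, "bar": 1}, log))  # (102, {'foo': 6, 'bar': 1}, False)
print(catch_up(40, {"foo": 3}, log))             # (102, {'foo': 6, 'bar': 1}, True)
```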
Broker Registration
- Building a map of the cluster
- What brokers exist in the cluster?
- How can they be reached?
- Brokers send heartbeats to the active controller
- The controller uses this to build a map of the cluster
- The controller also tells brokers if they should be fenced or shut down
Fencing
- Brokers need to be fenced if they’re partitioned from the controller or can’t keep up
- Brokers self-fence if they can’t talk to the controller
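Heartbeat-based registration and fencing can be sketched from both sides: the controller builds its cluster map from heartbeats and fences brokers that go quiet or fall behind the metadata log, while each broker fences itself if the controller stops answering. All names and thresholds (`session_timeout_ms`, `max_lag`, the second broker's address) are hypothetical illustration values, not Kafka configuration keys.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class BrokerRecord:
    host: str               # how the broker can be reached
    last_heartbeat_ms: int  # when the controller last heard from it
    metadata_offset: int    # how far it has read the metadata log


class ToyControllerRegistry:
    """Hypothetical active-controller view of the cluster, built from broker heartbeats."""

    def __init__(self, session_timeout_ms: int, max_lag: int) -> None:
        self.session_timeout_ms = session_timeout_ms
        self.max_lag = max_lag
        self.brokers: Dict[int, BrokerRecord] = {}  # the map of the cluster

    def heartbeat(self, broker_id: int, host: str, now_ms: int,
                  metadata_offset: int, log_end_offset: int) -> str:
        self.brokers[broker_id] = BrokerRecord(host, now_ms, metadata_offset)
        # The reply tells the broker whether it may serve traffic.
        return "FENCED" if log_end_offset - metadata_offset > self.max_lag else "ACTIVE"

    def fenced(self, now_ms: int, log_end_offset: int) -> Set[int]:
        """Brokers that stopped heartbeating (partitioned) or cannot keep up."""
        return {
            broker_id for broker_id, rec in self.brokers.items()
            if now_ms - rec.last_heartbeat_ms > self.session_timeout_ms
            or log_end_offset - rec.metadata_offset > self.max_lag
        }


def should_self_fence(now_ms: int, last_ack_ms: int, session_timeout_ms: int) -> bool:
    """Broker side: stop serving if the controller has not answered for a full session timeout."""
    return now_ms - last_ack_ms > session_timeout_ms


registry = ToyControllerRegistry(session_timeout_ms=9_000, max_lag=100)
print(registry.heartbeat(1, "10.10.10.1:9092", now_ms=0, metadata_offset=500, log_end_offset=500))  # ACTIVE
print(registry.heartbeat(2, "10.10.10.2:9092", now_ms=0, metadata_offset=300, log_end_offset=500))  # FENCED
print(sorted(registry.fenced(now_ms=5_000, log_end_offset=500)))   # [2]
print(sorted(registry.fenced(now_ms=10_000, log_end_offset=500)))  # [1, 2]
```

Because both sides use the same timeout, a broker cut off from the controller stops serving at roughly the point the controller starts treating it as fenced.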
Handling network partitions
Case 1: Total partition
Case 2: Broker partition
Case 3: Controller partition
Deployment | Current | KIP-500
Configuration File | Kafka and ZooKeeper | Kafka
Metrics | Kafka and ZK | Kafka
Administrative Tools | ZK Shell, four-letter words, Kafka tools | Kafka tools
Security | Kafka and ZK | Kafka
Shared Controller Nodes
- Fewer resources used
- Single-node clusters (eventually)
Separate Controller Nodes
- Better resource isolation
- Good for big clusters
Roadmap
Remove Client-side ZK dependencies: Incremental KIP-4 Improvements
- Create new APIs
- Deprecate direct ZK access
Remove Broker-side ZK dependencies: Broker-Side Fixes
- Remove deprecated direct ZK access for tools
- Create broker-side APIs
- Centralize ZK access in the controller
Controller Quorum: First Release without ZooKeeper
- Raft
- Controller quorum
Upgrade Issues
Older Kafka Release → KIP-500 Release
- Tools using ZK
- Brokers accessing ZK
- State in ZK
Bridge Release
Older Kafka Release → Bridge Release → KIP-500 Release
- No ZK access from tools or brokers (except the controller)
Upgrading
- Start from the bridge release
- Start new controller nodes (possibly combined with brokers)
- The quorum elects a leader
- The leader claims leadership in ZK
- Roll nodes one by one as usual
- The controller continues sending LeaderAndIsr, etc. to old nodes
- When all brokers have been rolled, decommission the ZK nodes
Conclusion
Apache ZooKeeper has served us well
- KIP-500 is not a 1:1 replacement, but a different paradigm
We have already started removing ZK from clients
- Consumer, AdminClient
- Improved encapsulation, security, upgradability
Metadata should be managed as a log
- Deltas, ordering, caching
- Controller Failover, Fencing
- Improved scalability, robustness, easier deployment
The metadata log must be self-managed
- Raft
- Controller quorum
It will take a few releases to implement KIP-500
- Additional KIPs for APIs, Raft, Metadata, etc.
Rolling upgrades will be supported
- Bridge release
- Post-ZK release
Kafka needs no Keeper
cnfl.io/meetups cnfl.io/blog cnfl.io/slack
THANK YOU
Colin McCabe
cmccabe@confluent.io