A Deep Dive into Kafka Controller
Jun Rao
VP of Apache Kafka
Co-founder of Confluent
Apache Kafka overview
• Core: pub/sub
• Connect: integration
• Streams: processing
Kafka adoption in enterprises
• 6 of the top 10 travel companies
• 8 of the top 10 insurance companies
• 7 of the top 10 global banks
• 9 of the top 10 telecom companies
Kafka Replication
• Configurable replication factor
• Tolerating f – 1 failures with f replicas
• Automated failover
[Figure: brokers 1 to 4, each holding logs; partitions topic1-part1, topic1-part2, topic2-part1, and topic2-part2 each have three replicas spread across the four brokers]
High Level Data Flow in Replication
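[Figure: a producer writes to the leader replica of topic1-part1 on broker 1; follower replicas on broker 2 and broker 3 copy the data; the leader commits and acks the producer; a consumer reads the partition. The steps are numbered 1 to 4 in the figure.]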
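The flow on this slide can be sketched in a few lines of Python. This is a minimal in-memory illustration, not Kafka code: the `Replica` class and `produce` function are hypothetical names, and it assumes the figure's numbering maps to write, replicate, commit, ack, with every follower in sync.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of a partition log (hypothetical, in-memory only)."""
    broker_id: int
    log: list = field(default_factory=list)


def produce(leader, followers, message):
    """Walk one message through the four steps; return the committed offset."""
    # Step 1: the producer writes the message to the leader's log.
    leader.log.append(message)

    # Step 2: each follower copies whatever it is missing from the leader.
    for follower in followers:
        follower.log.extend(leader.log[len(follower.log):])

    # Step 3: the message counts as committed once every replica has it.
    # Assumption: all followers are in sync; real brokers track that set.
    committed = min(len(r.log) for r in [leader, *followers]) - 1

    # Step 4: the leader acks the producer with the committed offset.
    return committed


leader = Replica(broker_id=1)
followers = [Replica(broker_id=2), Replica(broker_id=3)]
first = produce(leader, followers, "m0")
second = produce(leader, followers, "m1")
```

After both calls, all three replicas hold the same two messages, which is what lets a follower take over if the leader's broker fails.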
What is the controller?
• One broker in a cluster acts as the controller
• Monitors the liveness of brokers
• Elects new leaders on broker failure
• Communicates new leaders to brokers
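As a rough illustration of the election duty listed above, the sketch below picks a new leader for every partition whose leader is no longer alive. The function name `elect_leaders`, the dictionary layout, and the "first live replica" rule are assumptions made for this example; the real controller applies its own selection rules.

```python
def elect_leaders(partitions, live_brokers):
    """Return {partition: new_leader} for partitions whose leader is down.

    partitions maps a partition name to {"leader": id, "replicas": [ids]}.
    """
    changes = {}
    for name, state in partitions.items():
        if state["leader"] in live_brokers:
            continue  # leader is still alive, nothing to do
        # Assumption: pick the first live replica in assignment order.
        candidates = [b for b in state["replicas"] if b in live_brokers]
        new_leader = candidates[0] if candidates else None
        state["leader"] = new_leader
        changes[name] = new_leader
    return changes


partitions = {
    "t-0": {"leader": 1, "replicas": [1, 2]},
    "t-1": {"leader": 2, "replicas": [2, 3]},
}
# Broker 1 has failed; only brokers 2 and 3 are still alive.
changes = elect_leaders(partitions, live_brokers={2, 3})
```

The returned `changes` map is what the controller would then communicate to the brokers.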
Controller election
ZooKeeper: /controller → broker 0
[Figure: brokers 0 to 3; broker 0 is the controller]
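The election can be pictured as a race to create the `/controller` node, which in ZooKeeper is an ephemeral node that disappears when its owner's session ends. The sketch below uses a tiny in-memory stand-in, `FakeZk`, which is hypothetical and not a real ZooKeeper client; the session handling is simplified for illustration.

```python
class FakeZk:
    """Tiny in-memory stand-in for ZooKeeper (hypothetical, not a real client)."""

    def __init__(self):
        self.nodes = {}  # path -> (value, owning session)

    def create_ephemeral(self, path, value, session):
        """Create the node if it is absent; return True when this caller won."""
        if path in self.nodes:
            return False
        self.nodes[path] = (value, session)
        return True

    def expire_session(self, session):
        """Ephemeral nodes vanish when their owner's session ends."""
        self.nodes = {p: v for p, v in self.nodes.items() if v[1] != session}

    def get(self, path):
        entry = self.nodes.get(path)
        return entry[0] if entry else None


def try_become_controller(zk, broker_id):
    # Assumption: each broker uses its id as its session identifier.
    return zk.create_ephemeral("/controller", broker_id, session=broker_id)


zk = FakeZk()
winners = [b for b in (0, 1, 2, 3) if try_become_controller(zk, b)]
# Broker 0 loses its session; the remaining brokers race again.
zk.expire_session(0)
new_winners = [b for b in (2, 1, 3) if try_become_controller(zk, b)]
```

Only the first broker to create the node wins each round, so there is at most one registered controller at a time.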
Partition state: stored in ZK, cached in controller
ZooKeeper:
/topic/t1/0 → leader:1
/topic/t1/1 → leader:3
…
/topic/t1/9 → leader:2
[Figure: brokers 0 to 3; the controller keeps a cached copy of this state]
Controlled shutdown
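[Figure, steps 1 and 2: broker 1 receives SIGTERM; ZooKeeper, the controller, and brokers 0 to 2 are shown]
• broker 1: part t-0 leader, part t-1 leader
• broker 2: part t-0 follower, part t-1 follower
[Figure, steps 3 to 5: leadership moves from broker 1 to broker 2]
• broker 1: no longer leads part t-0 or part t-1
• broker 2: part t-0 leader, part t-1 leader
• ZooKeeper: /topics/t/0 → 2, /topics/t/1 → 2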
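The leadership hand-off during a controlled shutdown can be sketched as follows. This is an illustrative model only: `controlled_shutdown`, the plain dictionaries standing in for controller and ZK state, and the rule of choosing the first other replica are all assumptions made for this example.

```python
def controlled_shutdown(broker_id, leaders, replicas, zk_state):
    """Move leadership off broker_id; return partitions to announce per broker."""
    notify = {}
    for partition, leader in list(leaders.items()):
        if leader != broker_id:
            continue  # this partition is led elsewhere already
        # Assumption: hand leadership to the first other replica.
        new_leader = next(
            (b for b in replicas[partition] if b != broker_id), None
        )
        if new_leader is None:
            continue  # no other replica available; leave as is
        leaders[partition] = new_leader
        zk_state[f"/topics/t/{partition}"] = new_leader  # persist the change
        notify.setdefault(new_leader, []).append(partition)
    return notify


leaders = {0: 1, 1: 1}            # partitions t-0 and t-1 are led by broker 1
replicas = {0: [1, 2], 1: [1, 2]}
zk_state = {}
notify = controlled_shutdown(1, leaders, replicas, zk_state)
```

The result matches the slide: broker 2 leads both partitions, and the stored state reads `/topics/t/0 → 2` and `/topics/t/1 → 2`.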
Issues with controlled shutdown (pre 1.1)
• Writes to ZK are serial. Impact: longer shutdown time
• Communication of new leaders not batched. Impact: client timeout
[Figure: steps 3 to 5 of controlled shutdown; broker 2 becomes leader of part t-0 and part t-1, and ZooKeeper is updated to /topics/t/0 → 2, /topics/t/1 → 2]
Controller failover
[Figure: brokers 0 to 3, shown in three steps]
• Step 1: broker 0 is the controller (ZooKeeper: /controller → broker 0)
• Step 2: broker 2 becomes the new controller (ZooKeeper: /controller → broker 2)
• Step 3: the new controller uses the partition state held in ZooKeeper:
/topic/t1/0 → leader:1
/topic/t1/1 → leader:3
…
/topic/t1/9 → leader:2
Issues with controller failover (pre 1.1)
• Reads from ZK are serial. Impact: availability
• Zombie old controller. Impact: inconsistency
[Figure: failover steps 1 to 3 across brokers 0 to 3; ZooKeeper holds /controller → broker 2 and the partition state /topic/t1/0 → leader:1, /topic/t1/1 → leader:3, …, /topic/t1/9 → leader:2]
Performance improvements in 1.1
• Controller uses async ZK api for reads/writes
• Controller communicates new leaders to brokers in batches
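[Figure: ZK requests for part 1 to part 4, for example /topics/t/0 → 2 and /topics/t/1 → 2]
• Old (serial): part 1, part 2, part 3, part 4 handled one after another
• New (pipelined): part 1 to part 4 in flight at the same time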
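The difference between serial and pipelined requests can be demonstrated with a small `asyncio` simulation. Nothing here talks to ZooKeeper: `zk_write` is a hypothetical stand-in that just sleeps for an assumed round-trip time (`ZK_LATENCY`), so the numbers only illustrate the shape of the improvement.

```python
import asyncio
import time

ZK_LATENCY = 0.02  # assumed round-trip time per request, in seconds


async def zk_write(path, value, store):
    """Pretend ZooKeeper write: wait one round trip, then store the value."""
    await asyncio.sleep(ZK_LATENCY)
    store[path] = value


async def write_serial(updates, store):
    # Old behaviour: wait for each response before sending the next request.
    for path, value in updates.items():
        await zk_write(path, value, store)


async def write_pipelined(updates, store):
    # New behaviour: issue every request, then wait for all the responses.
    await asyncio.gather(*(zk_write(p, v, store) for p, v in updates.items()))


def timed(coro_fn, updates):
    """Run one strategy and return (elapsed seconds, resulting store)."""
    store = {}
    start = time.perf_counter()
    asyncio.run(coro_fn(updates, store))
    return time.perf_counter() - start, store


updates = {f"/topics/t/{i}": 2 for i in range(10)}
serial_time, serial_store = timed(write_serial, updates)
pipelined_time, pipelined_store = timed(write_pipelined, updates)
```

Both strategies end with the same stored state; the serial run pays roughly one round trip per request, while the pipelined run overlaps them.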
Controlled shutdown (post 1.1)
• Writes to ZK pipelined
• Communication of new leaders batched
[Figure: steps 3 to 5 of controlled shutdown; broker 2 becomes leader of part t-0 and part t-1 while broker 1 gives up leadership]
Controller failover (post 1.1)
• Reads from ZK pipelined
[Figure: failover steps 1 to 3 across brokers 0 to 3; ZooKeeper holds /controller → broker 2 and the partition state /topic/t1/0 → leader:1, /topic/t1/1 → leader:3, …, /topic/t1/9 → leader:2]
Results for controlled shutdown
• 5 ZK nodes and 5 brokers on different racks
• 25K topics, each with 1 partition and 2 replicas
• 10K partitions per broker
                            Kafka 1.0.0    Kafka 1.1.0
Controlled shutdown time    6.5 minutes    3 seconds
Results for controller failover
• 5 ZK nodes and 5 brokers on different racks
• 2K topics, each with 50 partitions and 1 replica
• Controller failover: reload 100K partitions from ZK
                     Kafka 1.0.0    Kafka 1.1.0
State reload time    28 seconds     14 seconds
Fencing zombie controller
• ZK session expiration
• Better handling in the controller (1.1)
• Controller path deletion
• Writes to ZK conditioned on controller epoch (to be in 2.1)
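Conditioning writes on the controller epoch can be sketched as a compare-before-write check. The class below is a hypothetical in-memory model, not the ZooKeeper API or Kafka's implementation: `EpochGuardedStore`, `elect`, and `conditional_write` are names invented for this example.

```python
class EpochGuardedStore:
    """In-memory stand-in for ZK state guarded by a controller epoch."""

    def __init__(self):
        self.controller_epoch = 0
        self.data = {}

    def elect(self):
        """A newly elected controller bumps the epoch and keeps its value."""
        self.controller_epoch += 1
        return self.controller_epoch

    def conditional_write(self, path, value, writer_epoch):
        """Apply the write only if the writer holds the current epoch."""
        if writer_epoch != self.controller_epoch:
            return False  # stale (zombie) controller is fenced off
        self.data[path] = value
        return True


store = EpochGuardedStore()
old_epoch = store.elect()  # first controller elected
new_epoch = store.elect()  # it stalls, and another broker takes over
zombie_ok = store.conditional_write("/topic/t1/0", "leader:3", old_epoch)
fresh_ok = store.conditional_write("/topic/t1/0", "leader:1", new_epoch)
```

The old controller's late write is rejected because its epoch is stale, so only the current controller's update reaches the stored state.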
Controller failover (expected in 2.1)
[Figure: failover steps 1 and 2 across brokers 0 to 3; ZooKeeper holds /controller → broker 2]
• Zombie old controller: fenced
Summary
• Significant performance improvement in controller in 1.1
• Allow 10X more partitions in a Kafka cluster
• Better fencing of zombie controller in 1.1 and 2.1
• More details in KAFKA-5027
Future work in controller
• Further improvement on controller failover
• Standby controller
• Better handling of quick broker restart (KAFKA-1120)
• Broker generation
Q/A
• Acknowledgment: Onur Karaman, Manikumar Reddy, Prasanna Gautam, Ismael Juma, Mickael Maison, Sandor Murakozi, Rajini Sivaram, Ted Yu, Zhanxiang Huang
• Apache Kafka: http://kafka.apache.org/
• Confluent: http://confluent.io/
