The Silver Bullet for Endless Rebalancing

A. Sophie Blee-Goldman, Guozhang Wang
Bay Area Kafka Meetup, Dec. 5, 2019
The Silver Bullet for Endless Rebalances
Introduction to the Incremental Cooperative Protocol

Outline
• Review of the current eager rebalance algorithm
• Identify the known issues with common scenarios
• A new proposal: incremental cooperative rebalancing

A Short History of Consumer Groups
[Diagram: Producers write to the Partitions of Topic 1 and Topic 2 on the Brokers; Consumers read from them]

Kafka 0.8.2-
[Diagram: Consumers fetch from the brokers; group state to keep track of:]
1) assignment (who owns what)
2) offset (consumed up to where)

Kafka 0.9.0+
[Diagram: Consumers fetch from the brokers; a Group Coordinator is added for the group state:]
1) assignment (who owns what)
2) offset (consumed up to where)

Consumer Rebalance Protocol
• A rebalance happens when:
  • Membership changes
    • Member crash: failure of a consumer
    • Scaling in: member leaves the group
    • Scaling out: new member joins
  • Partition resources change
    • Topics are created or deleted
    • More partitions are added to topics

Member Crash: Failure Detection (heartbeat)
[Diagram: C1 and C2 each send heartbeat requests to the Group Coordinator (broker side), which answers ok. When one member's heartbeats stop, the coordinator waits session.timeout.ms before treating it as failed]
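The failure detection pictured above can be sketched as a toy model. This is a minimal Python sketch, not broker code: the `Coordinator` class, its method names, and the 10,000 ms timeout are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Coordinator:
    """Toy group coordinator that remembers each member's last heartbeat."""
    session_timeout_ms: int  # stands in for session.timeout.ms
    last_heartbeat_ms: dict = field(default_factory=dict)

    def heartbeat(self, member: str, now_ms: int) -> str:
        self.last_heartbeat_ms[member] = now_ms
        return "ok"

    def expired_members(self, now_ms: int) -> list:
        # A member is considered failed once it has been silent for
        # longer than the session timeout.
        return sorted(
            m for m, t in self.last_heartbeat_ms.items()
            if now_ms - t > self.session_timeout_ms
        )


coordinator = Coordinator(session_timeout_ms=10_000)  # arbitrary value
coordinator.heartbeat("C1", now_ms=0)
coordinator.heartbeat("C2", now_ms=0)
coordinator.heartbeat("C1", now_ms=9_000)  # C1 keeps going, C2 goes silent
print(coordinator.expired_members(now_ms=12_000))  # ['C2']
```

In this model an expired member is what would trigger the "member crash" rebalance.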

Scaling In: Consumer Shutdown (leave-group)
[Diagram: C1 and C2; one consumer keeps heartbeating (heartbeat / ok) while the other shuts down and sends leave-group]

Scaling Out: Consumer Startup (join-group)
[Diagram: a new consumer C3 starts next to C1 and C2 and sends join-group]

Resources Change: Re-Subscribe (join-group)
[Diagram: C1, C2, and C3; a consumer that re-subscribes sends join-group]

Consumer Rebalance Protocol
• During the rebalance:
  • Existing consumers re-join the group
  • A single member is chosen as group leader
  • The leader determines the partition assignment (user customizable)

Consumers Re-join Group
[Diagram: C1 owns partitions 1 2 3 and C2 owns 4 5 6. C3 sends join-group, and C1 and C2 are asked to re-join. Before re-joining they call #onPartitionsRevoked(1,2,3) and #onPartitionsRevoked(4,5,6), then send join-group. The coordinator holds a sync. barrier, bounded by rebalance.timeout.ms, while the members re-join]


Partition Reassignment (sync-group)
[Diagram: after the join-group / re-join round, C1 is selected as leader. C1 runs #assign(…) and the result is distributed with sync-group:]
C1: {1, 2}
C2: {4, 5}
C3: {3, 6}
[Each member then calls #onPartitionsAssigned with its partitions, e.g. #onPartitionsAssigned(1,2) on C1]


Summary of Rebalance Protocol
• ConsumerRebalanceListener
  • #onPartitionsRevoked (before sending join-group)
  • #onPartitionsAssigned (after receiving sync-group)
• ConsumerPartitionAssignor
  • #assign (only triggered by the leader)
Built-in: {range, round-robin, sticky}; Custom: {streams, …}
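The callback order summarized above can be traced with a toy simulation of one eager round. This is a minimal Python sketch, not the Kafka client: `Member`, `eager_rebalance`, and the `round_robin` leader function are hypothetical names, and the leader is simply assumed to be the first member in the list.

```python
class Member:
    """Toy consumer that records the rebalance callbacks it receives."""

    def __init__(self, name, owned=()):
        self.name = name
        self.owned = set(owned)
        self.events = []

    def on_partitions_revoked(self, partitions):
        self.events.append(("revoked", sorted(partitions)))
        self.owned -= set(partitions)

    def on_partitions_assigned(self, partitions):
        self.events.append(("assigned", sorted(partitions)))
        self.owned |= set(partitions)


def round_robin(member_names, partitions):
    """Hypothetical leader-side assign: deal partitions out in turn."""
    result = {name: [] for name in member_names}
    for i, p in enumerate(sorted(partitions)):
        result[member_names[i % len(member_names)]].append(p)
    return result


def eager_rebalance(members, partitions, assign=round_robin):
    # 1) Every member revokes everything before sending join-group.
    for m in members:
        m.on_partitions_revoked(set(m.owned))
    # 2) Only the leader (assumed here: the first member) runs assign.
    assignment = assign([m.name for m in members], partitions)
    # 3) Every member learns its partitions from the sync-group response.
    for m in members:
        m.on_partitions_assigned(assignment[m.name])
    return assignment


c1, c2, c3 = Member("C1", {1, 2, 3}), Member("C2", {4, 5, 6}), Member("C3")
eager_rebalance([c1, c2, c3], range(1, 7))
print(c1.events)  # [('revoked', [1, 2, 3]), ('assigned', [1, 4])]
```

The round-robin function is only a stand-in; the assignment shown on the slides ({1, 2}, {4, 5}, {3, 6}) is what a sticky-style assignor would produce instead.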

Known Issue #1: Stop-the-world Rebalance
[Diagram: C1 (1 2 3), C2 (4 5 6), C3, and the Group Coordinator (broker side). C3 sends join-group; C1 and C2 re-join after #onPartitionsRevoked(all partitions); the leader runs #assign(…); after sync-group each member calls #onPartitionsAssigned(given partitions). Final ownership: C1 (1 2), C2 (4 5), C3 (3 6)]
Eager rebalance: all partitions are revoked before the rebalance, yet most of them are assigned back to their previous owners afterwards.
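The gap between "revoked all" and "re-assigned most" can be counted directly from the assignments on this slide. A small Python sketch; the helper name `owners` is made up for illustration.

```python
def owners(assignment):
    """Invert {member: partitions} into {partition: member}."""
    return {p: m for m, parts in assignment.items() for p in parts}


before = {"C1": {1, 2, 3}, "C2": {4, 5, 6}, "C3": set()}
after = {"C1": {1, 2}, "C2": {4, 5}, "C3": {3, 6}}

# Eager protocol: every owned partition is revoked up front.
revoked_eagerly = sum(len(parts) for parts in before.values())

# Partitions that actually end up with a different owner.
old, new = owners(before), owners(after)
moved = sorted(p for p in new if old.get(p) != new[p])

print(revoked_eagerly, moved)  # 6 [3, 6]
```

Six partitions stop being processed during the rebalance although only two of them change hands.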

Known Issue #2: Back-and-forth Rebalance
[Diagram: bounce a consumer. C3 sends leave-group, which triggers #onPartitionsRevoked(all partitions) and #assign(…) on the remaining members; when C3 comes back, its join-group triggers #onPartitionsRevoked(all partitions) and #assign(…) again. Ownership before and after the bounce: C1 (1 2), C2 (4 5), C3 (3 6)]
Unnecessary rebalances: the first one moves partitions from C3 to C1/C2, the second one moves them back from C1/C2 to C3.

Let’s Revisit:
When to trigger a rebalance,
Who participates in a rebalance,
What to reassign during a rebalance

Rebalance Protocols
Current Protocol (Eager)
• When: Immediately
• Who: Everyone
• What: Everything
Proposed Protocol (Cooperative)
• When: After determining what needs to be reassigned
• Who: Only those whose assignment will be changed
• What: Only those partitions that change ownership

Rebalance Protocols
• [KIP-415]: incremental rebalance in Connect (2.3+)
• [KIP-345]: static membership in Consumer / Streams (2.3+)
• [KIP-429]: incremental rebalance in Consumer / Streams (2.4+)

Incremental Assignment in the Consumer
[Diagram: owned-partitions and assigned-partitions overlap. The overlap is unchanged-partitions; owned but no longer assigned are partitions-to-be-revoked (#onPartitionsRevoked); assigned but not yet owned are partitions-to-be-added (#onPartitionsAssigned)]
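The three regions of this diagram are plain set operations on what a consumer owns and what it is assigned. A minimal Python sketch; the function name `incremental_diff` and partition 7 are hypothetical.

```python
def incremental_diff(owned, assigned):
    """Split one consumer's rebalance result into the three regions."""
    owned, assigned = set(owned), set(assigned)
    return {
        "unchanged": owned & assigned,      # keep processing, no callback
        "to_be_revoked": owned - assigned,  # passed to #onPartitionsRevoked
        "to_be_added": assigned - owned,    # passed to #onPartitionsAssigned
    }


diff = incremental_diff(owned={1, 2, 3}, assigned={1, 2, 7})
print(diff)
```

Here partitions 1 and 2 are untouched, 3 is revoked, and 7 is added.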

Cooperative Protocol
[Diagram, first rebalance: C1 owns 1 2 3 and C2 owns 4 5 6. C3 sends join-group, and C1 and C2 re-join. Each join-group carries the member's owned partitions:]
C1: {1, 2, 3}
C2: {4, 5, 6}
C3: { }
[The leader runs #assign(…). Partitions 3* and 6* are meant for C3 but are still owned by C1 and C2, so C3 is assigned nothing in this round:]
C3: {3*, 6*} -> { }
[After sync-group, C1 calls #onPartitionsRevoked(3) and keeps 1 2; C2 keeps 4 5]

[Diagram, second rebalance: the members send join-group again with their owned partitions:]
C1: {1, 2}
C2: {4, 5}
C3: { }
[The leader runs #assign(…) and sync-group delivers:]
C1: {1, 2}
C2: {4, 5}
C3: {3, 6}
Cooperative Rebalance
[Diagram: both rounds side by side. Round 1: C1 and C2 join with {1, 2, 3} and {4, 5, 6}; #assign(…) withholds 3* and 6* from C3 (C3: {3*, 6*} -> {}). Round 2: C1 and C2 join with {1, 2} and {4, 5}; #assign(…) gives C3: {3, 6}. Final ownership: C1 (1 2), C2 (4 5), C3 (3 6)]
• Trade-off: more rebalances, but each one is much cheaper
• Works better with a “sticky” assignor: fewer partitions to migrate
• Consumers can continue to fetch during a rebalance event (2.5+)
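The two rounds above can be reproduced with a toy model. This is a minimal Python sketch under stated assumptions, not Kafka's assignor: `sticky_target`, `cooperative_assign`, and `rebalance_until_stable` are hypothetical names, and the sticky logic is deliberately simplistic (it is only meant to balance this six-partition example).

```python
def sticky_target(owned, partitions):
    """Keep owned partitions up to an even quota, then give the
    leftovers to whichever member currently has the fewest."""
    members = sorted(owned)
    quota = -(-len(partitions) // len(members))  # ceiling division
    target = {m: sorted(owned[m])[:quota] for m in members}
    taken = {p for parts in target.values() for p in parts}
    for p in sorted(set(partitions) - taken):
        least_loaded = min(members, key=lambda m: len(target[m]))
        target[least_loaded].append(p)
    return target


def cooperative_assign(owned, partitions):
    """Withhold any partition that a different member still owns
    (the 3* and 6* on the slide)."""
    target = sticky_target(owned, partitions)
    current_owner = {p: m for m, parts in owned.items() for p in parts}
    return {
        m: [p for p in parts if current_owner.get(p, m) == m]
        for m, parts in target.items()
    }


def rebalance_until_stable(owned, partitions):
    rounds = []
    while True:
        assignment = cooperative_assign(owned, partitions)
        rounds.append(assignment)
        revoked_any = any(set(owned[m]) - set(assignment[m]) for m in owned)
        owned = {m: set(assignment[m]) for m in owned}
        if not revoked_any:  # nobody gave anything up: no follow-up round
            return rounds


rounds = rebalance_until_stable(
    {"C1": {1, 2, 3}, "C2": {4, 5, 6}, "C3": set()}, range(1, 7)
)
for r in rounds:
    print(r)
```

The first round only takes 3 and 6 away from C1 and C2; the second round hands them to C3. Partitions 1, 2, 4, and 5 never leave their owners, which is why the extra round is cheap.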

Benchmark Results
• 10 streams instances rolling bounce, measuring process rate
• …and pause time: 3522 ms vs. 37138 ms
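As a quick worked figure, the ratio between the two pause times quoted above can be computed as follows. The variable names are hypothetical, and this text does not label which protocol each number belongs to, so the sketch only compares the shorter and the longer pause.

```python
shorter_pause_ms = 3522
longer_pause_ms = 37138

ratio = longer_pause_ms / shorter_pause_ms
print(f"{ratio:.1f}x")  # 10.5x
```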

Augmented Assignor Interface
ConsumerPartitionAssignor
• #assign (subscription now includes “owned-partitions”)
• #supportedProtocols (eager and/or cooperative)
Built-in: {range, round-robin, sticky : eager}; {sticky-cooperative : cooperative}
Custom: {streams, … : eager and cooperative}

Augmented Listener Interface
ConsumerRebalanceListener
• #onPartitionsRevoked (will not be triggered if there is nothing to revoke)
• #onPartitionsAssigned (triggered at completion of rebalance, regardless of newly added partitions)
• #onPartitionsLost (triggered instead of #onPartitionsRevoked when a member falls out of the group)
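The three rules above can be sketched as a small dispatcher. This is a minimal Python sketch, not the Java interface: `LoggingListener`, `complete_rebalance`, and `fall_out_of_group` are hypothetical names, and passing only the newly added partitions to the assigned callback is an assumption of this model.

```python
class LoggingListener:
    """Hypothetical listener that records which callbacks fire."""

    def __init__(self):
        self.calls = []

    def on_partitions_revoked(self, partitions):
        self.calls.append(("revoked", sorted(partitions)))

    def on_partitions_assigned(self, partitions):
        self.calls.append(("assigned", sorted(partitions)))

    def on_partitions_lost(self, partitions):
        self.calls.append(("lost", sorted(partitions)))


def complete_rebalance(listener, owned, assigned):
    revoked = set(owned) - set(assigned)
    if revoked:  # skipped entirely when there is nothing to revoke
        listener.on_partitions_revoked(revoked)
    # Fires at the end of every rebalance, even with nothing new.
    listener.on_partitions_assigned(set(assigned) - set(owned))
    return set(assigned)


def fall_out_of_group(listener, owned):
    # The member can no longer hand its partitions over cleanly, so the
    # lost callback replaces the revoked callback.
    listener.on_partitions_lost(owned)
    return set()


listener = LoggingListener()
owned = complete_rebalance(listener, owned={1, 2, 3}, assigned={1, 2})
owned = complete_rebalance(listener, owned=owned, assigned={1, 2})
owned = fall_out_of_group(listener, owned)
print(listener.calls)
```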

Switch to Cooperative Rebalancing
In Consumer
• first rolling bounce: add “sticky-cooperative” / “my-cooperative” to [partition.assignment.strategy]
• second rolling bounce: remove the old assignor (e.g., “range”) from the config
In Streams
• first rolling bounce: set [upgrade.from = old version (“2.3”)]
• second rolling bounce: remove the [upgrade.from] config
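Why the consumer upgrade takes two bounces can be illustrated with a toy protocol-selection rule. This is a Python sketch under stated assumptions: the rule "use the highest protocol that every configured assignor supports" and the idea that the cooperative assignor also accepts eager are assumptions of this model, and `SUPPORTED` and `select_protocol` are hypothetical names. The short assignor names are the ones used on the slides; in the Java client the cooperative sticky assignor is `org.apache.kafka.clients.consumer.CooperativeStickyAssignor`.

```python
EAGER, COOPERATIVE = 0, 1  # a higher id means a more advanced protocol

# Assumed protocol support per assignor (slide short names).
SUPPORTED = {
    "range": {EAGER},
    "round-robin": {EAGER},
    "sticky": {EAGER},
    "sticky-cooperative": {COOPERATIVE, EAGER},
}


def select_protocol(strategies):
    """Pick the highest protocol that every configured assignor supports."""
    common = set.intersection(*(SUPPORTED[s] for s in strategies))
    return max(common)


# Values of partition.assignment.strategy at each stage of the upgrade.
before_upgrade = ["range"]
first_bounce = ["sticky-cooperative", "range"]
second_bounce = ["sticky-cooperative"]

for strategies in (before_upgrade, first_bounce, second_bounce):
    protocol = select_protocol(strategies)
    print(strategies, "->", "cooperative" if protocol == COOPERATIVE else "eager")
```

In this model the group stays on eager throughout the first bounce, because "range" is still configured everywhere, and flips to cooperative only once the old assignor has been removed.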

Take-aways
• We have extended the rebalance protocol to enable smarter assignment (when, who, and what)
• No more stop-the-world rebalances with the incremental cooperative protocol!

THANKS!
Guozhang Wang | guozhang@confluent.io | @guozhangwang
A. Sophie Blee-Goldman | sophie@confluent.io | @ableegoldman
