In this presentation, we introduce static membership (KIP-345) and share the story of adopting it at Pinterest. Static membership aims to improve the availability of stream applications, consumer groups, and other applications built on top of the group rebalance protocol. The original rebalance protocol relies on the group coordinator to allocate entity IDs to group members. These generated IDs are ephemeral and change when members restart and rejoin. For consumer-based apps, this "dynamic membership" can cause a large percentage of tasks to be reassigned to different instances during administrative operations such as code deploys, configuration updates, and periodic restarts. For applications with large state, shuffled tasks need a long time to recover their local state before processing, which leaves applications partially or entirely unavailable.

At Pinterest, group membership is stable between administrative operations. Motivated by this observation, we modified Kafka's group management protocol to allow group members to provide persistent entity IDs. Group membership remains unchanged based on those IDs, so no rebalance is triggered. We can conveniently leverage Kubernetes or other cloud management frameworks to provide entity IDs. With static membership adopted in the real-time infrastructure at Pinterest, applications resume processing only a few seconds after administrative operations finish. Previously, with dynamic membership, it could take more than 30 minutes before applications resumed.

The talk is organized as follows: we first review Kafka's group management protocol and demonstrate high-availability use cases that dynamic membership is unable to support. Then we share the design and adoption story of static membership. At the end, we do a live demo to show the impact of static membership.
2. 2
Software Engineer at Confluent
Kafka Summit SF 2018: Building Pinterest Real-Time Ads Platform Using Kafka Streams
Senior Software Engineer, Tech Lead at Pinterest
28. Task recovery time breakdown
Our Goal
● Session timeout (failure detection)
● Rebalance time (time to shuffle tasks)
● Task assignment time
● State restore time (replay from changelog)
29. 29
availability = 1 - (∑task recovery time / ∑task online time)
task recovery time = session timeout + rebalance time + task assignment time + state restore time
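As a rough worked example of the two formulas above, the following Python sketch plugs in made-up numbers. The `Recovery` class, the function name, and every duration are illustrative assumptions, not measurements from the talk.

```python
from dataclasses import dataclass


@dataclass
class Recovery:
    """One task recovery, split into the four components (seconds)."""
    session_timeout_s: float
    rebalance_s: float
    assignment_s: float
    state_restore_s: float

    def total(self) -> float:
        # task recovery time = session timeout + rebalance time
        #                      + task assignment time + state restore time
        return (self.session_timeout_s + self.rebalance_s
                + self.assignment_s + self.state_restore_s)


def availability(recoveries, online_times_s):
    """availability = 1 - (sum of recovery time / sum of online time)."""
    return 1 - sum(r.total() for r in recoveries) / sum(online_times_s)


# Hypothetical numbers: one recovery of 300 s against 30,000 s of online time.
example = Recovery(session_timeout_s=10, rebalance_s=5,
                   assignment_s=1, state_restore_s=284)
avail = availability([example], [30_000])
print(f"recovery={example.total()}s availability={avail:.4f}")
```

In this made-up case the state restore term dominates the recovery time, which is the situation the abstract describes for large-state applications.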
31. Kafka Rebalance
1. Group is stable with three consumers and eight tasks at Generation 1
2. Members rejoin the group
3. All members have rejoined
4. Generation ID bumps up by one
5. Reply to the join request with a new generation and nominate a member as leader
7. Leader performs assignment while other members wait
8. Leader sends back new assignment
9. Coordinator propagates assignment to members
10. Group is stable at Generation 2!
Diagram: a coordinator and three consumers. At Generation 1 the consumers hold (T1, T2, T3), (T4, T5, T6), and (T7, T8). At Generation 2 they hold (T1, T4, T7), (T2, T5, T8), and (T3, T6).
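The rebalance walk-through above can be mimicked with a small in-memory model. This is only a sketch under simplifying assumptions: the `Coordinator` class, the consumer names `c1` to `c3`, and the round-robin assignor are hypothetical stand-ins chosen because they reproduce the Generation 2 layout on the slides; no network calls or real Kafka APIs are involved.

```python
def round_robin(tasks, members):
    """Deal tasks to members one at a time, in member order."""
    assignment = {m: [] for m in members}
    for i, task in enumerate(tasks):
        assignment[members[i % len(members)]].append(task)
    return assignment


class Coordinator:
    """Toy group coordinator that only tracks generation and assignment."""

    def __init__(self):
        self.generation = 1
        self.assignment = {}

    def rebalance(self, members, tasks, assignor):
        # Steps 2-4: every member has rejoined, so bump the generation.
        self.generation += 1
        # Step 5: nominate a leader (assumed here: the first member to join).
        leader = members[0]
        # Steps 7-8: the leader computes the assignment and sends it back.
        proposed = assignor(tasks, members)
        # Step 9: the coordinator propagates the assignment to all members.
        self.assignment = proposed
        return leader


tasks = [f"T{i}" for i in range(1, 9)]
coordinator = Coordinator()
leader = coordinator.rebalance(["c1", "c2", "c3"], tasks, round_robin)
print(leader, coordinator.generation, coordinator.assignment)
```

While the leader computes the assignment, the other members in this model do nothing, which mirrors the "other members wait" part of step 7.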
48. Task recovery time breakdown
● Session timeout (failure detection)
● Rebalance time (time to shuffle tasks)
● Task assignment time
● State restore time (replay from changelog)
Legend: Configurable, Semi-configurable, Non-configurable
Better 100% availability
Worse recovery time!
54. Transient unavailability
1. One member couldn’t connect to the coordinator
2. Session timeout is reached
3. Require other members to revoke tasks/rejoin
4. Perform assignment
5. Propagate…
6. Done! However, one member has now become a zombie
7. Zombie member rejoins
8. Zombie resets generation and rejoins
9. Coordinator requires all members to revoke tasks/rejoin
10. Perform assignment (different from last time)
11. Propagate, and done!
Diagram: the group starts with members holding (T1, T2, T3), (T4, T5, T6), and (T7, T8). After the second rebalance the members hold (T1, T2, T7), (T4, T5, T8), and (T3, T6).
65. Task recovery time breakdown
● Session timeout (failure detection)
● Rebalance time (time to shuffle tasks)
● Task assignment time
● State restore time (replay from changelog)
Legend: Configurable, Semi-configurable, Non-configurable
Better stickiness
Not necessarily better recovery time
69. Rolling bounce
1. Restart member fleet
2. Some member sends leave group request
3. Members rejoin
4. Member assignment gets shuffled
5. Perform assignment and issue new member IDs
6. Propagate…
7. Done!
● Another unnecessary assignment change
● No persistence of member identity. After restart, the member is unknown to the coordinator.
Diagram: before the bounce the coordinator knows members m1, m2, m3, which hold (T1, T2, T3), (T4, T5, T6), and (T7, T8). After the bounce the members are m4, m5, m6, which hold (T3, T4, T7), (T2, T5, T8), and (T1, T6).
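To put a number on the shuffle in the rolling bounce example, the sketch below counts how many tasks changed host. It assumes the three hosts keep their positions across the bounce, so the first host is m1 before and m4 after (and so on); the host labels and the `moved_tasks` helper are illustrative, not part of the talk.

```python
def moved_tasks(before, after):
    """Return the tasks whose host differs between two assignments."""
    owner_before = {t: host for host, ts in before.items() for t in ts}
    owner_after = {t: host for host, ts in after.items() for t in ts}
    return sorted(t for t in owner_before
                  if owner_after.get(t) != owner_before[t])


# Assignment keyed by physical host (assumed mapping: host1 = m1 -> m4, etc.).
before = {"host1": ["T1", "T2", "T3"],
          "host2": ["T4", "T5", "T6"],
          "host3": ["T7", "T8"]}
after = {"host1": ["T3", "T4", "T7"],
         "host2": ["T2", "T5", "T8"],
         "host3": ["T1", "T6"]}

moved = moved_tasks(before, after)
print(f"{len(moved)} of 8 tasks moved: {moved}")
```

Every moved task is one whose local state has to be restored on a different host, even though the set of hosts did not change.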
86. Static Membership
● Give each member a unique ID
○ Configuration: group.instance.id
○ Remember assignment info on the coordinator
○ A static member never sends a “leave group” request
○ No rebalance upon known static member rejoin
Diagram: members with IDs w1, w2, and w3 hold (T1, T2, T3), (T4, T5, T6), and (T7, T8).
99. Static Membership
1. …
8. Member w3 rejoins
9. Coordinator gets w3’s assignment
10. Member w3 gets the same assignment
Done!
Diagram: w1, w2, and w3 still hold (T1, T2, T3), (T4, T5, T6), and (T7, T8).
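The rejoin behaviour on this slide can be illustrated with a toy coordinator that remembers assignments by `group.instance.id`. This is a minimal sketch, assuming a simplified in-memory model: the `StaticAwareCoordinator` class and its `join` method are hypothetical and do not mirror the broker's actual code, and the rebalance path for unknown members is not modelled.

```python
import itertools


class StaticAwareCoordinator:
    """Toy coordinator keyed by group.instance.id (e.g. "w1")."""

    def __init__(self, assignment_by_instance):
        self.assignment = dict(assignment_by_instance)
        self.member_ids = {}  # group.instance.id -> current ephemeral member id
        self.generation = 1
        self._ids = itertools.count(1)

    def join(self, instance_id=None):
        member_id = f"m{next(self._ids)}"  # ephemeral id, new on every join
        if instance_id is not None and instance_id in self.assignment:
            # Known static member: swap in the new ephemeral id and hand
            # back the remembered assignment without a rebalance.
            self.member_ids[instance_id] = member_id
            return member_id, self.assignment[instance_id], self.generation
        # Unknown or dynamic member: a rebalance would follow (not modelled).
        self.generation += 1
        return member_id, [], self.generation


coordinator = StaticAwareCoordinator({
    "w1": ["T1", "T2", "T3"],
    "w2": ["T4", "T5", "T6"],
    "w3": ["T7", "T8"],
})
first = coordinator.join("w3")   # w3 restarts and rejoins
second = coordinator.join("w3")  # and again after another restart
print(first, second)
```

Both joins return the same tasks at the same generation, while the ephemeral member ID differs each time.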
100. Task recovery time breakdown
● Session timeout (failure detection)
● Rebalance time (time to shuffle tasks)
● Task assignment time
● State restore time (replay from changelog)
Legend: Configurable, Semi-configurable, Non-configurable
Dynamic Membership: better stickiness, not necessarily better recovery time
Static Membership: better stickiness, and the rate of unnecessary rebalances drops!
103. To opt into Static Membership
● Set broker version >= 2.3
● Set config group.instance.id
● Set session timeout long enough
Diagram: members with IDs w1, w2, and w3 hold (T1, T2, T3), (T4, T5, T6), and (T7, T8).
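The client-side part of the opt-in list might look like the sketch below, written as a plain Python dict of consumer settings. `group.instance.id` and `session.timeout.ms` are real Kafka consumer configuration keys; the helper name, the group name `ads-stream`, the pod-style instance name, and the 5-minute timeout are assumptions for illustration only. The chosen timeout also has to fit within the broker's `group.max.session.timeout.ms`.

```python
def static_member_config(group_id, instance_name, session_timeout_ms=300_000):
    """Build consumer settings for a static member.

    instance_name must be unique within the group and stable across restarts,
    for example a Kubernetes StatefulSet pod name (assumption).
    """
    return {
        "group.id": group_id,
        # Persistent identity that survives restarts.
        "group.instance.id": instance_name,
        # Long enough to cover a restart, so the coordinator does not
        # evict the member while it is bouncing.
        "session.timeout.ms": session_timeout_ms,
    }


config = static_member_config("ads-stream", "ads-stream-0")
for key, value in config.items():
    print(f"{key}={value}")
```

Deriving the instance name from the deployment framework follows the abstract's suggestion to let Kubernetes or a similar system provide the persistent IDs.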
109. Scaling Static Members
1. Member w2 steps down
2. It won’t call leave group, so the coordinator won’t rebalance, and T4~T6 make no progress
3. Only when the session times out does the coordinator remove w2 and start rebalancing
Diagram: the coordinator maps w1 -> m1 and w2 -> m2; w1 holds (T1, T2, T3) and w2 holds (T4, T5, T6).
113. Scaling Static Members
1. Member w2 steps down
2. Use admin client to remove static members
3. Group starts rebalancing immediately
4. T4 ~ T6 have no excessive downtime!
Diagram: an admin client asks the coordinator to remove w2, and the coordinator's member map shrinks from w1 -> m1, w2 -> m2 to w1 -> m1.
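The difference between the two scale-down paths comes down to whether the group has to wait out the session timeout. The arithmetic can be sketched as follows; the function name and the durations are hypothetical, and the model ignores state restore time.

```python
def scale_down_downtime_s(session_timeout_s, rebalance_s, removed_by_admin):
    """Seconds that the departed member's tasks (T4~T6) make no progress."""
    # Without an explicit removal, the coordinator only notices the missing
    # static member once its session times out.
    wait_s = 0 if removed_by_admin else session_timeout_s
    return wait_s + rebalance_s


# Hypothetical values: a 5-minute session timeout and a 5-second rebalance.
without_admin = scale_down_downtime_s(300, 5, removed_by_admin=False)
with_admin = scale_down_downtime_s(300, 5, removed_by_admin=True)
print(without_admin, with_admin)
```

The longer the session timeout chosen for static membership, the more an explicit removal matters when scaling down.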
128. Incremental Cooperative Rebalancing
1. Member m3 restarts
2. m3 loses its identity and rejoins
3. m1 and m2 rejoin without revoking tasks
4. Compute new assignment
5. Propagate new assignment
6. Member m1’s new assignment conflicts with its previous assignment
7. Member m1 rejoins with T3 revoked
8. Members m2 and m4 rejoin without revoking any task
9. Finalize assignment by second rebalance!
Diagram: m1, m2, and m3 start with (T1, T2, T3), (T4, T5, T6), and (T7, T8). The restarted m3 rejoins as m4, and the new assignment is m1: (T1, T2, T7), m2: (T4, T5, T6), m4: (T3, T8).
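The conflict in steps 6 and 7 can be expressed as a set difference: a member only gives up the tasks it currently owns that are missing from its new assignment. The sketch below is a simplified illustration using the task layout from the slides; `tasks_to_revoke` is a hypothetical helper, not a function from the Kafka client.

```python
def tasks_to_revoke(owned, target):
    """Per member, the owned tasks that are absent from the new assignment."""
    return {member: sorted(set(tasks) - set(target.get(member, [])))
            for member, tasks in owned.items()}


# What each member still holds when the new assignment arrives.
# m4 is the restarted m3, so it currently owns nothing.
owned = {"m1": ["T1", "T2", "T3"],
         "m2": ["T4", "T5", "T6"],
         "m4": []}
# The new assignment computed by the leader.
target = {"m1": ["T1", "T2", "T7"],
          "m2": ["T4", "T5", "T6"],
          "m4": ["T3", "T8"]}

revocations = tasks_to_revoke(owned, target)
print(revocations)
```

Only m1 has anything to revoke, so m2 keeps processing throughout, and the second rebalance exists to finalize the hand-over of the revoked task.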
137. Task recovery time breakdown
● Session timeout (failure detection)
● Rebalance time (time to shuffle tasks)
● Task assignment time
● State restore time (replay from changelog)
Legend: Configurable, Semi-configurable, Non-configurable
Incremental Cooperative Rebalancing
Highly efficient rebalance
A sticky assignment protocol
141. Range assignor
1. Assign tasks based on relative order
2. On generation 2, relative order based on ephemeral ID will change
3. So does the assignment
4. Honor static instance ID over ephemeral member ID
5. Assignment won’t change in a rebalance
6. Same for round robin assignment
Diagram (dynamic membership): member IDs m1, m2, m3 at Generation 1 become m4, m6, m5 at Generation 2.
Diagram (static membership): instance IDs w1, w2, w3 map to m1, m2, m3 at Generation 1 and to m4, m6, m5 at Generation 2, while w1, w2, and w3 keep (T1, T2, T3), (T4, T5, T6), and (T7, T8).
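The effect of the sort key can be reproduced with a simplified range-style assignor. This is a sketch under stated assumptions: it splits one flat task list into contiguous ranges, whereas Kafka's real range assignor works per topic, and the `host` labels and dict layout are hypothetical. The member and instance IDs are the ones shown on the slides.

```python
def range_assign(tasks, members, key):
    """Give contiguous task ranges to members, ordered by the given key."""
    ordered = sorted(members, key=key)
    base, extra = divmod(len(tasks), len(ordered))
    assignment, start = {}, 0
    for i, member in enumerate(ordered):
        size = base + (1 if i < extra else 0)  # first members get one extra
        assignment[member["host"]] = tasks[start:start + size]
        start += size
    return assignment


TASKS = [f"T{i}" for i in range(1, 9)]
gen1 = [{"host": "host1", "member_id": "m1", "instance_id": "w1"},
        {"host": "host2", "member_id": "m2", "instance_id": "w2"},
        {"host": "host3", "member_id": "m3", "instance_id": "w3"}]
# After a bounce the ephemeral ids change, the instance ids do not.
gen2 = [{"host": "host1", "member_id": "m4", "instance_id": "w1"},
        {"host": "host2", "member_id": "m6", "instance_id": "w2"},
        {"host": "host3", "member_id": "m5", "instance_id": "w3"}]


def by_member(m):
    return m["member_id"]


def by_instance(m):
    return m["instance_id"]


dynamic_gen1 = range_assign(TASKS, gen1, by_member)
dynamic_gen2 = range_assign(TASKS, gen2, by_member)
static_gen1 = range_assign(TASKS, gen1, by_instance)
static_gen2 = range_assign(TASKS, gen2, by_instance)
print(dynamic_gen2)
print(static_gen2)
```

Sorting by the ephemeral member ID swaps the ranges of the second and third hosts in generation 2, while sorting by the instance ID leaves every host with the same tasks.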
153. Takeaways
● Understand the concepts
○ Session timeout
○ Member ID
○ Group instance ID
● Why does reducing the session timeout not necessarily improve overall recovery time?
● Why could Static Membership improve the rolling bounce behavior?
● Do you understand how to enable Static Membership?
● How does Static Membership compare with Incremental Cooperative Rebalancing?
157. Resources
• Static Membership blog: https://www.confluent.io/blog/kafka-rebalance-protocol-static-membership
• KIP-62: Allow consumer to send heartbeats from a background thread
• KIP-345: Introduce Static Membership protocol to reduce consumer rebalances
• Incremental Cooperative Rebalancing blog: https://www.confluent.io/blog/incremental-cooperative-rebalancing-in-kafka
• KIP-415: Incremental Cooperative Rebalancing in Kafka Connect
• KIP-429: Kafka Consumer Incremental Rebalance Protocol