The document discusses the current Apache Kafka consumer rebalance protocol and proposes a next-generation protocol. The current protocol uses a global synchronization barrier during rebalances that stops all processing. The next-gen protocol assigns this coordination responsibility to the group coordinator to avoid stopping all consumers. It introduces a declarative assignment process where consumers report their subscriptions and the coordinator assigns partitions, rather than consumers negotiating assignments.
The Next Generation of the Consumer Rebalance Protocol With David Jacot | Current 2022
1. The Next Generation of the
Consumer Rebalance Protocol
in Apache Kafka
David Jacot (@davidjacot)
Apache Kafka PMC
Staff Software Engineer II, Confluent
4. What is the Rebalance Protocol used for?
[Diagram: Consumer Group (group.id = foo). Consumer A, Consumer B, and Consumer C share Partition 1 to Partition 4, which are hosted by the Kafka Cluster.]
5. Current Consumer Rebalance Protocol
[Diagram: Consumer A talks to the Group Coordinator through the JoinGroup, SyncGroup, and Heartbeat APIs. The diagram labels what the protocol manages: Membership, Resources, and Assignment.]
6. A Rebalance is a Synchronization Barrier
[Diagram: every change moves the whole group to a new generation.]
Generation 1 (Members A, B): A and B join the group.
Generation 2 (Members A, B, C): C joins the group.
Generation 3 (Members B, C): A fails.
Generation 4 (Members B, C): B updates its subscriptions.
7. Rebalance’s Two Phases: JoinGroup & SyncGroup
[Sequence diagram: Member A and Member B each send JoinGroup(metadata) to the Group Coordinator. The requests are held at the Synchronization Barrier, bounded by the Rebalance Timeout.]
8. Rebalance’s Two Phases: JoinGroup & SyncGroup
[Sequence diagram, next step: once both JoinGroup(metadata) requests have reached the Synchronization Barrier, the Group Coordinator replies JoinGroup(members) to Member A and JoinGroup() to Member B. Member A is the leader and computes the assignment for the group. The phase is bounded by the Rebalance Timeout.]
9. Rebalance’s Two Phases: JoinGroup & SyncGroup
[Sequence diagram, next step: after the JoinGroup phase, Member A (the leader, which computes the assignment for the group) sends SyncGroup(assignments) and Member B sends SyncGroup() to the Group Coordinator. Each phase is bounded by the Rebalance Timeout.]
10. Rebalance’s Two Phases: JoinGroup & SyncGroup
[Sequence diagram, final step: the Group Coordinator replies SyncGroup(assignment) to Member A and to Member B, each response carrying that member's own assignment. Member A is the leader and computes the assignment for the group. Each phase is bounded by the Rebalance Timeout.]
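The two phases can be sketched as a small in-memory simulation. This is only an illustration of the message flow shown on the slides, not the broker implementation: the function names are hypothetical, the leader is assumed to be the first joiner, and the range-style split covers a single list of partitions.

```python
from typing import Dict, List


def range_assign(members: List[str], partitions: List[str]) -> Dict[str, List[str]]:
    """Leader-side assignor: contiguous ranges, earlier members get any extras."""
    members = sorted(members)
    base, extra = divmod(len(partitions), len(members))
    assignment: Dict[str, List[str]] = {}
    start = 0
    for i, member in enumerate(members):
        size = base + (1 if i < extra else 0)
        assignment[member] = partitions[start:start + size]
        start += size
    return assignment


def rebalance(join_requests: List[str], partitions: List[str]) -> Dict[str, List[str]]:
    # Phase 1 (JoinGroup): the coordinator waits at the barrier for every
    # member, then returns the member list to the leader only.
    members = list(join_requests)
    leader = members[0]  # assumption: the first joiner becomes the leader
    join_responses = {m: (members if m == leader else []) for m in members}

    # Phase 2 (SyncGroup): the leader computes and sends all assignments;
    # the coordinator relays to each member its own assignment.
    assignments = range_assign(join_responses[leader], partitions)
    return {m: assignments[m] for m in members}
```

Calling rebalance(["A", "B"], ["P1", "P2", "P3", "P4", "P5", "P6"]) gives A the first three partitions and B the last three. No member gets anything until every member has passed both phases, which is the barrier the slides describe.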
11. Failure Detection - Liveness
[Sequence diagram: Member A and Member B each exchange Heartbeat requests/responses with the Group Coordinator once per Heartbeat Interval*. A member that sends no heartbeat within the Session Timeout* is considered failed.]
* They are defined by the client!
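A minimal sketch of the liveness check, assuming the coordinator records the time of each member's last heartbeat. The function and parameter names are hypothetical.

```python
from typing import Dict, List


def expired_members(last_heartbeat_ms: Dict[str, int],
                    now_ms: int,
                    session_timeout_ms: int) -> List[str]:
    """Return members whose last heartbeat is older than the session timeout."""
    return sorted(
        member
        for member, seen_at in last_heartbeat_ms.items()
        if now_ms - seen_at > session_timeout_ms
    )
```

With a 45 s session timeout, a member last seen 49 s ago is reported as expired, while a member seen 10 s ago is not.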
12. Eager Assignment (e.g. range)
[Sequence diagram: Member A owns P1 P2 P3 and Member B owns P4 P5 P6. Member C joins. The assigned partitions are self-revoked, and processing is paused during the JoinGroup and SyncGroup request/response rounds. Partitions are then reassigned: A gets P1 P2, B gets P4 P5, and P3 and P6 move to the new member.]
13. Cooperative Assignment (e.g. cooperative-sticky)
[Sequence diagram: Member A owns P1 P2 P3 and Member B owns P4 P5 P6. Member C joins. 1st rebalance to revoke: after one JoinGroup and SyncGroup round, A keeps P1 P2 and B keeps P4 P5. P3 and P6 are revoked.]
14. Cooperative Assignment (e.g. cooperative-sticky)
[Sequence diagram: Member A owns P1 P2 P3 and Member B owns P4 P5 P6. Member C joins. 1st rebalance to revoke: A keeps P1 P2 and B keeps P4 P5. P3 and P6 are revoked. 2nd rebalance to assign: a second JoinGroup and SyncGroup round hands out P3 and P6, while A and B keep P1 P2 and P4 P5.]
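The two cooperative rounds can be sketched as a set difference between what each member owns and what the target assignment gives it. This is an illustration under simplified assumptions (one target, no failures), and the function name is hypothetical.

```python
from typing import Dict, List, Tuple

Assignment = Dict[str, List[str]]


def cooperative_rounds(owned: Assignment, target: Assignment) -> Tuple[Assignment, List[str], Assignment]:
    # 1st rebalance: every member keeps only what it already owns AND is
    # still assigned; everything else it owns is revoked and stays unassigned.
    first = {m: [p for p in owned.get(m, []) if p in target[m]] for m in target}
    revoked = sorted(
        p for m, parts in owned.items() for p in parts if p not in target.get(m, [])
    )
    # 2nd rebalance: the revoked partitions are free, so the full target applies.
    second = {m: list(parts) for m, parts in target.items()}
    return first, revoked, second
```

For the example on the slides, the first round leaves A with P1 P2, B with P4 P5, and C with nothing. P3 and P6 are revoked, and the second round gives them to C.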
16. Group-Wide Synchronization Barrier
● “Stop the World” is a fundamental issue.
● Cooperative assignors improved the situation, but consumers still cannot commit offsets while a rebalance is in flight.
● Faulty or slow members have a large impact on the entire group.
● Stability and scalability are limited.
17. With Great Power Comes Great Responsibility
● The consumer drives key configurations (e.g. session.timeout.ms).
● Client-side assignors are complicated and also depend on local metadata. The group leader is responsible for monitoring metadata.
● Wildcard subscriptions are computed locally by all members.
● Adoption of newer clients is slow, so bugs and regressions tend to stay for a while.
● Implementing the protocol in other clients is not trivial.
18. Embedded Consumer Protocol
● Interoperability between consumer libraries, tooling, etc. relies on a correct definition of the Consumer Protocol.
● Visibility on the broker side is poor. We only see bytes exchanged between members. This makes debugging difficult.
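20. Empower the Group Coordinator, and Simplify the Consumer
[Diagram: Consumer A and the Group Coordinator communicate through a single Heartbeat API. The consumer sends (Subscriptions, Owned Partitions) and the coordinator replies with (Assign/Revoke Partitions). The Group Coordinator handles Membership and Declarative Assignment; the consumer keeps its Member Assignment.]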
25. Reconciliation Process
[Diagram: Member A owns P1 P2 P3 and Member B owns P4 P5 P6, matching the Group Coordinator's Assignment at epoch 0. Member C joins, and the coordinator computes a new Assignment at epoch 1: A = P1 P2, B = P4 P5, C = P3 P6.]
26. Reconciliation Process
[Diagram, next step: Member C has joined and the new Assignment is A = P1 P2, B = P4 P5, C = P3 P6. The coordinator tells A to keep P1 P2, so A has to revoke P3.]
27. Reconciliation Process
[Diagram, next step: the coordinator tells B to keep P4 P5, so B has to revoke P6. A still has to revoke P3.]
28. Reconciliation Process
[Diagram, next step: the revocations of P3 (from A) and P6 (from B) are pending. Neither partition can be handed to C yet.]
29. Reconciliation Process
[Diagram, next step: A reports owning only P1 P2. A acks that P3 is released, and A moves to epoch 1.]
30. Reconciliation Process
[Diagram, next step: B reports P4 P5 P6. B still owns P6, so B stays at epoch 0.]
31. Reconciliation Process
[Diagram, next step: C receives P3 at epoch 1. P6 is not handed over yet because B still owns it.]
33. Reconciliation Process
[Diagram: the group now has Members A, B, and C. The Group Coordinator's Assignment at epoch 1 is A = P1 P2, B = P4 P5, C = P3 P6. C holds P3 and is still waiting for P6.]
34. Reconciliation Process
[Diagram, next step: B still reports P4 P5 P6 at epoch 0, so P6 remains unavailable to C.]
35. Reconciliation Process
[Diagram, next step: C continues with P3 at epoch 1 while it waits for P6.]
36. Reconciliation Process
[Diagram, next step: Member D joins. The coordinator computes a new Assignment at epoch 2: A = P1 P2, B = P4 P5, C = P3, D = P6.]
37. Reconciliation Process
[Diagram, next step: A has nothing to revoke and moves to epoch 2 with P1 P2.]
38. Reconciliation Process
[Diagram, next step: B reports owning only P4 P5. B acks that P6 is released, and B moves to epoch 2.]
39. Reconciliation Process
[Diagram, next step: C moves to epoch 2 with P3.]
40. Reconciliation Process
[Diagram, final step: P6 is assigned to D. All members are at epoch 2 with A = P1 P2, B = P4 P5, C = P3, D = P6.]
41. Reconciliation Process’ Key Principles
● Members are not globally synchronized anymore.
● The group coordinator reconciles members towards the desired
group assignment and resolves dependencies between them.
● A member is moved to a newer epoch only when it has revoked
partitions.
● A member can jump epochs.
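The principles above can be sketched as a tiny in-memory coordinator. This is a simplified illustration, not the KIP-848 state machine: the class and method names are hypothetical, and one heartbeat call stands for a full request/response exchange.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class Member:
    epoch: int = 0
    owned: Set[str] = field(default_factory=set)


class Coordinator:
    def __init__(self, members: Dict[str, Member]):
        self.group_epoch = 0
        self.members = members
        self.target: Dict[str, Set[str]] = {m: set(s.owned) for m, s in members.items()}

    def set_target(self, target: Dict[str, Set[str]]) -> None:
        """Install a new target assignment and bump the group epoch."""
        self.group_epoch += 1
        self.target = {m: set(parts) for m, parts in target.items()}
        for member_id in self.target:
            self.members.setdefault(member_id, Member())

    def heartbeat(self, member_id: str, owned: Iterable[str]) -> Set[str]:
        """Record what the member owns and return what it may own next."""
        me = self.members[member_id]
        me.owned = set(owned)
        target = self.target[member_id]
        if me.owned - target:
            # Revocation comes first: the member stays on its old epoch
            # until it reports that the extra partitions are released.
            return me.owned & target
        # Nothing left to revoke: jump straight to the current group epoch.
        me.epoch = self.group_epoch
        held_by_others: Set[str] = set()
        for other_id, other in self.members.items():
            if other_id != member_id:
                held_by_others |= other.owned
        # Only hand out partitions that no other member still owns.
        me.owned |= target - held_by_others
        return set(me.owned)
```

Replaying the walk-through with this sketch: A is told to keep P1 P2 and moves to epoch 1 only after it reports P3 released, C first receives P3 alone because B still owns P6, and B later jumps from epoch 0 directly to epoch 2.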
42. Declarative Server-Side Assignor
● The assignor is declarative! It tells the consumer what the end state
should be.
● Range & Uniform assignors provided by default.
● Assignors are pluggable with the group.consumer.assignors
configuration and the PartitionAssignor interface.
● The consumer selects its assignor with the group.remote.assignor
configuration.
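A declarative assignor can be pictured as a pure function from subscriptions and topic metadata to an end state. The sketch below is a hypothetical balanced assignor written for illustration. It is not the Range or Uniform assignor shipped with Kafka and does not use the PartitionAssignor interface.

```python
from typing import Dict, List, Set, Tuple

TopicPartition = Tuple[str, int]


def balanced_assign(subscriptions: Dict[str, Set[str]],
                    partition_counts: Dict[str, int]) -> Dict[str, List[TopicPartition]]:
    """Return the desired end state: member id -> list of (topic, partition)."""
    assignment: Dict[str, List[TopicPartition]] = {m: [] for m in subscriptions}
    for topic in sorted(partition_counts):
        subscribers = sorted(m for m, topics in subscriptions.items() if topic in topics)
        if not subscribers:
            continue  # nobody subscribed to this topic
        for partition in range(partition_counts[topic]):
            # Give the partition to the least-loaded subscriber (ties by name).
            member = min(subscribers, key=lambda m: (len(assignment[m]), m))
            assignment[member].append((topic, partition))
    return assignment
```

The function only states what each member should end up with. How members get there (revocations first, then assignments) is left to the reconciliation process.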
43. Rebalance Timeout (or Revocation Timeout)
● When revoking partitions, the group coordinator gives the member up to the rebalance timeout to complete the revocation.
● The rebalance timeout is provided by the consumer.
● It is derived from max.poll.interval.ms (the maximum delay between invocations of Consumer#poll).
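A minimal sketch of the timeout check, assuming the coordinator records when it asked a member to revoke partitions. The function name is hypothetical, and what the coordinator does once the timeout expires is not covered here.

```python
def revocation_timed_out(now_ms: int,
                         revocation_requested_at_ms: int,
                         rebalance_timeout_ms: int) -> bool:
    """True when the member has exceeded the time allowed to revoke partitions."""
    return now_ms - revocation_requested_at_ms > rebalance_timeout_ms
```

With the client default of max.poll.interval.ms = 300000 (5 minutes) used as the rebalance timeout, a revocation requested at t = 0 is still within bounds at t = 299 s and has timed out at t = 301 s.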
45. Dynamic Group Configurations
● All configurations are defined on the server side (except the rebalance timeout):
Name | Default | Doc
group.consumer.heartbeat.interval.ms | 5s | The heartbeat interval given to the members.
group.consumer.session.timeout.ms | 45s | The timeout to detect client failures when using the consumer group protocol.
● They are defined per group id and dynamically updatable with the Admin APIs.
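The per-group override idea can be sketched as defaults merged with dynamic values keyed by group id. The class below is a hypothetical in-memory model, not the Kafka Admin API. Only the two configuration names and their defaults come from the slide.

```python
from typing import Dict

DEFAULTS: Dict[str, int] = {
    "group.consumer.heartbeat.interval.ms": 5_000,
    "group.consumer.session.timeout.ms": 45_000,
}


class GroupConfigs:
    def __init__(self) -> None:
        self._overrides: Dict[str, Dict[str, int]] = {}

    def alter(self, group_id: str, name: str, value: int) -> None:
        """Dynamically override one configuration for one group."""
        if name not in DEFAULTS:
            raise KeyError(f"unknown group configuration: {name}")
        self._overrides.setdefault(group_id, {})[name] = value

    def effective(self, group_id: str) -> Dict[str, int]:
        """Defaults, with the group's dynamic overrides applied on top."""
        return {**DEFAULTS, **self._overrides.get(group_id, {})}
```

Overriding the session timeout for group foo changes only that group's effective value. Every other group keeps the defaults.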
46. Wildcard Subscriptions & Metadata Monitoring
● Wildcard subscriptions are managed centrally by the group
coordinator.
● The group coordinator monitors topics & partitions and triggers a
rebalance if necessary.
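A minimal sketch of central wildcard resolution: the coordinator matches the subscription pattern against the cluster's topic list and triggers a rebalance only when the resolved set changes. Python's re module stands in for the broker's regular-expression engine here, and the function names are hypothetical.

```python
import re
from typing import Iterable, List


def resolve(pattern: str, topics: Iterable[str]) -> List[str]:
    """Topics whose full name matches the subscription pattern."""
    return sorted(t for t in topics if re.fullmatch(pattern, t))


def needs_rebalance(pattern: str, old_topics: Iterable[str], new_topics: Iterable[str]) -> bool:
    """A metadata change matters only if the resolved topic set changes."""
    return resolve(pattern, old_topics) != resolve(pattern, new_topics)
```

Creating a topic that matches the pattern triggers a rebalance. Creating an unrelated topic does not.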
47. Topic Identifiers (KIP-516)
● Fetch & Metadata APIs use Topic IDs since KIP-516.
● Current Rebalance Protocol, OffsetCommit & OffsetFetch APIs do
not.
● The Next-Gen Protocol uses Topic IDs and OffsetCommit &
OffsetFetch APIs will be updated to support them as well.
50. Client-Side Assignment Delegation Process
[Diagram: Member A owns P1 P2 P3 and Member B owns P4 P5 P6, matching the Group Coordinator's Assignment. Everything is at epoch 0.]
51. Client-Side Assignment Delegation Process
[Diagram, next step: A heartbeats with P1 P2 P3 at epoch 0.]
52. Client-Side Assignment Delegation Process
[Diagram, next step: B heartbeats with P4 P5 P6 at epoch 0.]
53. Client-Side Assignment Delegation Process
[Diagram, next step: Member C joins. A new Assignment is required.]
54. Client-Side Assignment Delegation Process
[Diagram, next step: Member C has joined and a new Assignment is required. A is requested to compute a new assignment.]
55. Client-Side Assignment Delegation Process
[Diagram, next step: a new Assignment is still required. B heartbeats with P4 P5 P6 and stays at epoch 0 in the meantime.]
56. Client-Side Assignment Delegation Process
[Diagram, next step: A gets the assignor's input from the Group Coordinator.]
57. Client-Side Assignment Delegation Process
[Diagram, next step: A computes the assignment.]
58. Client-Side Assignment Delegation Process
[Diagram, final step: A installs the assignment. The new Assignment at epoch 1 is A = P1 P2, B = P4 P5, C = P3 P6.]
59. Declarative Client-Side Assignor
● Assignors are still declarative!
● The assignment is delegated to one member of the group. The
member has to provide it within the rebalance timeout.
● The client-side assignor receives its full input from the group
coordinator. No local dependency.
● The assignors are configured by setting group.local.assignors. They
have to implement the PartitionAssignor interface.
● The group coordinator selects a client-side assignor commonly used
in the group.
61. How to upgrade to the new protocol?
1. Upgrade the cluster to a software version that supports the new
protocol.
2. Roll the cluster to enable the IBP (ZK) or update the Metadata
Version (KRaft).
3. Upgrade the consumers to a software version that supports the
new protocol.
4. Roll the consumers to enable the new protocol.
62. Live Consumer Group Upgrade, How?
Current Protocol:
JoinGroup API
● Updates Subscriptions
● Updates Owned Partitions
SyncGroup API
● Collects Assignment
Heartbeat API
● Maintains Session
● Gets Rebalance Notification
Next-Gen Protocol:
ConsumerGroupHeartbeat API
● Updates Subscriptions
● Updates Owned Partitions
● Collects Assignment
● Maintains Session
63. Live Consumer Group Upgrade, How?
[Diagram: Group Foo on the Group Coordinator. Consumer A, Consumer B, and Consumer C all use the Old Protocol.]
64. Live Consumer Group Upgrade, How?
[Diagram: Group Foo (Assignment) on the Group Coordinator. One of Consumers A, B, and C has switched to the New Protocol; the other two still use the Old Protocol. Old Protocol is proxied to the new protocol.]
65. Live Consumer Group Upgrade, How?
[Diagram: Group Foo (Assignment) on the Group Coordinator. Two of Consumers A, B, and C use the New Protocol; one still uses the Old Protocol. Old Protocol is proxied to the new protocol.]
66. Live Consumer Group Upgrade, How?
[Diagram: Group Foo (Assignment) on the Group Coordinator. Consumer A, Consumer B, and Consumer C all use the New Protocol.]
68. Takeaways
● The new rebalance protocol is tailored to the Consumer.
● The group-wide synchronization barrier is gone.
● The complexity moves away from the clients to the group
coordinator. Implementing clients should be easier.
● Server side assignor by default. Client side assignor for power users
(e.g. Kafka Streams).
● Live upgrade path from the current protocol to next-gen protocol.
69. KIP-848: The Next Generation of the
Consumer Rebalance Protocol
Thank you!
David Jacot
@davidjacot
dajac@apache.org