The purpose of this session is a deep dive into:
- Apache Kafka
- Data streaming
- Kafka in the cloud
5. developer.confluent.io
Record Schema

Record =>
  timestamp
  key
  value
  headers

An event stream is a sequence of such records; the key and value travel as bytes.

Wire format of a key or value serialized with the Confluent Schema Registry:

Bytes  Area        Description
0      Magic Byte  Confluent serialization format version number; currently always 0.
1-4    Schema ID   4-byte schema ID as returned by Schema Registry.
5-...  Data        Serialized data for the specified schema format.
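The table above maps directly to a few lines of parsing code. A minimal sketch in Python; the function name and example bytes are illustrative:

```python
import struct

def parse_sr_payload(payload: bytes):
    """Split a Schema Registry-framed message into its parts: byte 0 is the
    magic byte (currently always 0), bytes 1-4 the big-endian schema ID,
    and the remainder the serialized data."""
    if len(payload) < 5:
        raise ValueError("payload too short for Schema Registry framing")
    magic, schema_id = struct.unpack(">bI", payload[:5])
    if magic != 0:
        raise ValueError(f"unknown magic byte: {magic}")
    return schema_id, payload[5:]

# Example: magic byte 0, schema ID 42, then the serialized data
schema_id, data = parse_sr_payload(b"\x00\x00\x00\x00\x2a" + b"serialized-bytes")
```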
Records Accumulated Into Record Batches

Record =>
  timestamp
  key
  value
  headers

RecordBatch =>
  ...
  attributes: int16
    bit 0~2 (compression codec):
      0: no compression
      1: gzip
      2: snappy
      3: lz4
      4: zstd
  ...
  records: [Records]

A batch holds Record 1 ... Record n, and compression is applied to the batch as a whole.
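The codec can be recovered from the low three bits of the attributes field; a minimal sketch:

```python
# Codec names for bits 0-2 of the RecordBatch attributes field
CODECS = {0: "none", 1: "gzip", 2: "snappy", 3: "lz4", 4: "zstd"}

def compression_codec(attributes: int) -> str:
    """Mask off the low three bits of the int16 attributes field."""
    return CODECS[attributes & 0b111]

codec = compression_codec(0b0100)  # bit pattern 100 -> zstd
```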
Record Batches Drained Into Produce Requests

Produce Request => acks [topic_data]
  acks => INT16
  topic_data => topic [data]
    topic => STRING
    data => partition record_set
      partition => INT32
      record_set => BYTES

A single produce request can carry several topic_data entries, and each record_set (BYTES)
contains the accumulated record batches (Record Batch 1 ... Record Batch n), each compressed
as a unit. How full the batches get before being drained is governed by linger.ms and
batch.size.
Producer

Default/typical configuration: acks=1, enable.idempotence=false, max.request.size=1MB,
retries=MAX_INT, delivery.timeout.ms=2min, max.in.flight.requests.per.connection=5,
batch.size=16KB, linger.ms=0, buffer.memory=32MB, max.block.ms=60s, compression.type=none

Serializer
- Retrieves and caches schemas from Schema Registry

Partitioner
- The Java client uses murmur2 for hashing
- If no key is provided, performs round robin
- If keys are unbalanced, it will overload one leader
- Upcoming changes in KIP-794

Record accumulator
- One buffer per partition; seldom-used partitions may not achieve high batching
- If many producers are in the same JVM, memory and GC could become important
- The sticky partitioner can be used to increase batching in the round-robin case
  (KIP-480/KIP-794)

Sender thread
- Batches are grouped by destination broker into requests
- Multiple batches to different partitions can travel in the same produce request

Compression
- At the batch level
- Allows faster transfer to the broker
- Reduces the inter-broker replication load
- Reduces page cache and disk space utilization on brokers
- Gzip is more CPU intensive, Snappy is lighter, LZ4/ZStd are a good balance*
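The default partitioning logic can be sketched as follows. Two loud assumptions: zlib.crc32 stands in for the Java client's murmur2 hash (a real client must use murmur2 to keep key-to-partition mappings compatible), and the plain round-robin counter simplifies the sticky behavior:

```python
import itertools
import zlib

# zlib.crc32 is a stand-in for the Java client's murmur2 hash; real clients
# must use murmur2 so key-to-partition mappings stay compatible.
def hash_key(key: bytes) -> int:
    return zlib.crc32(key)

round_robin = itertools.count()  # simplification of the sticky partitioner

def choose_partition(key, num_partitions: int) -> int:
    """Keyed records always hash to the same partition; unkeyed ones rotate."""
    if key is None:
        return next(round_robin) % num_partitions
    return hash_key(key) % num_partitions

p1 = choose_partition(b"alice", 6)
p2 = choose_partition(b"alice", 6)  # same key -> same partition
```

This is why unbalanced keys overload one leader: every record with a hot key lands on the same partition, whatever the other partitions are doing.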
Group Startup: Step 1 - Find Group Coordinator

Consumers 1 and 2 (both with group.id=1) locate their group coordinator with a
FindCoordinator request, which can be sent to any broker. Every broker hosts partitions of
the __consumer_offsets topic; the broker leading the partition that stores this group's
offsets acts as the group coordinator.
Group Startup: Step 2 - Members Join

Both consumers send JoinGroup requests to the group coordinator. The coordinator designates
one member (here Consumer 1) as group leader, which will compute the assignment of the
topic-a partitions hosted on the other brokers.
Group Startup: Step 3 - Partitions Assigned

The group leader sends the computed assignment to the group coordinator via the SyncGroup
exchange, and the coordinator distributes to each member its assigned topic-a partitions.
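In application code, all of these startup steps happen inside the client library. A minimal sketch; the consumer object, its method signatures, and the process callback are assumptions standing in for any concrete Kafka client (confluent-kafka / kafka-python style), not a specific API:

```python
# Hypothetical consumer loop; `consumer` stands in for any Kafka client
# object exposing subscribe()/poll().
GROUP_CONFIG = {
    "group.id": "1",  # members sharing a group.id split the partitions
}

def consume_loop(consumer, topics, process):
    """Finding the coordinator, joining, assignment, and heartbeats all
    happen inside the client library; application code only polls."""
    consumer.subscribe(topics)
    while True:
        record = consumer.poll(1.0)  # poll() also drives group membership
        if record is not None:
            process(record)
```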
Determining Starting Offset to Consume

On the first poll(), each consumer fetches the committed offsets for its assigned topic-a
partitions from the group coordinator (backed by __consumer_offsets). If no committed
offset is available, the auto.offset.reset value determines the starting offset.
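The accepted auto.offset.reset values, sketched as a consumer config dict (illustrative; exact config syntax varies by client):

```python
# auto.offset.reset applies only when the group has no committed offset:
#   "latest"   - start at the end of the partition (the default)
#   "earliest" - start at the beginning of the partition
#   "none"     - raise an error instead of choosing a position
consumer_config = {
    "group.id": "1",
    "auto.offset.reset": "earliest",
}
```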
Group Coordinator Failover

The __consumer_offsets topic is replicated across brokers. If the broker acting as group
coordinator fails, the group coordinator role fails over to the new leader of the group's
__consumer_offsets partition, and the consumers in the group reconnect to it.
Consumer Group Rebalance Triggers

Each consumer in the group (group.id=1) runs a poll() loop plus a background
HeartbeatThread. With topic_a (P0-P3) and topic_b (P0-P1), a rebalance is triggered when:

- A topic is added or deleted that matches the subscription, e.g.
  consumer.subscribe(Pattern.compile("topic_.*"));
- The number of partitions increases, e.g. from 3 to 4
- A consumer instance joins or leaves the group, e.g. on heartbeat timeout
Stop-the-world Rebalance

Consumer 1 owns p0 and p2, Consumer 2 owns p1, and Consumer 3 joins. All consumers must
pass a synchronization barrier at the group coordinator. Consumers:
1) Revoke their current partition assignment and clean up the partition states
2) Join the group
3) Sync with the group
4) Receive new partition assignments
   a) Build the partition state
   b) Resume consumption
Stop-the-world Problem 1 - Rebuilding the State

Since partitions p0 and p1 end up assigned to the same consumer instances as before,
revoking them and rebuilding their state is unnecessary work.
Stop-the-world Problem 2 - Paused Processing

Processing pauses for all subscribed partitions for the duration of the rebalance, even
though only p2 actually moves to Consumer 3; the pausing for p0 and p1 is unnecessary.
Avoid Needless State Rebuild with StickyAssignor

The StickyAssignor keeps as many partitions as possible with their previous owners when
Consumer 3 joins. Only the partition actually reassigned (p2, moving to Consumer 3)
requires state cleanup and a state build on its new owner; partitions that are self-revoked
and then returned to the same owner need only their state cleaned up. Processing is still
paused during the rebalance.
Avoid Processing Pause with CooperativeStickyAssignor

With the CooperativeStickyAssignor, consumption on p0 and p1 continues while Consumer 3
joins. In the first rebalance, the SyncGroupResponse revokes only Consumer 1's p2
assignment; a follow-up rebalance then assigns p2 to Consumer 3. Only p2 pauses at the
synchronization barrier. The world does not stop!
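Choosing the assignor is a consumer-side configuration, sketched here as a config dict. The string value "cooperative-sticky" follows librdkafka-style clients; the Java client instead takes the CooperativeStickyAssignor class name in partition.assignment.strategy:

```python
# Consumer config selecting the cooperative rebalance protocol.
consumer_config = {
    "group.id": "1",
    "partition.assignment.strategy": "cooperative-sticky",
}
```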
Avoid Rebalance with Static Group Membership

Each consumer sets a unique group.instance.id (1, 2, 3) alongside group.id=1; its
HeartbeatThread runs as before. This establishes static group membership: members do not
send a LeaveGroup request when they are stopped, and the group coordinator triggers no
rebalance if a member rejoins prior to session.timeout.ms.
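Static membership is enabled purely through configuration; a sketch (the session.timeout.ms value is illustrative, not a recommendation):

```python
# Static membership: a stable, unique group.instance.id per consumer.
consumer_config = {
    "group.id": "1",
    "group.instance.id": "1",      # must differ for each member
    "session.timeout.ms": 300000,  # rejoin within this window => no rebalance
}
```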
Why Are Transactions Needed?

Alice pays Bob $10. The "transfer $10, Alice → Bob" event is written to the transfers topic
(partition p0). A Funds Transfer App uses the Consumer API and Producer API to process it,
writing to the balances topic (partitions p0 and p1) and committing offsets to the
__consumer_offsets topic (partition p7):

1. Transfer event is fetched by the consumer
2. Debit event is written: Alice, ($10)
3. Credit event is written: Bob, $10
4. Transfer event offset is committed

A Downstream App consumes the balance events with the Consumer API.
Kafka Transactions Deliver Exactly Once

With an atomic transaction, the same four steps (fetch the transfer event, write the debit
Alice, ($10), write the credit Bob, $10, commit the offset) succeed or fail as a unit: the
transaction is only committed if all parts succeed, and is aborted if any part fails.

Using transactions with Kafka Streams is quite simple:
1) Set processing.guarantee to exactly_once_v2 in StreamsConfig
2) Set isolation.level to read_committed in the Consumer configuration
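The two settings above, sketched as config maps. Kafka Streams itself is a Java library configured via Properties; these are shown Python-style for illustration, and the application.id / group.id values are made-up names:

```python
# Kafka Streams side: make the read-process-write cycle transactional.
streams_config = {
    "application.id": "funds-transfer-app",
    "processing.guarantee": "exactly_once_v2",
}

# Downstream consumer side: only deliver committed transactional records.
downstream_consumer_config = {
    "group.id": "downstream",
    "isolation.level": "read_committed",
}
```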
System Failure Without Transactions

First application instance:
1. Event fetched by consumer
2. Alice's account debited: Alice, ($10)

The application instance fails without committing the offset, and a new application
instance starts:
1. Event fetched by consumer again
2. Alice's account is debited a second time: Alice, ($10)
3. Bob's account is credited: Bob, $10
4. Consumer offset committed
5. Two debit events are processed by the downstream consumer
System Failure with Transactions

The Funds Transfer App now sets transactional.id='fund-tr'. A Transaction Coordinator
(a broker) coordinates the txn and persists txn metadata in the __transaction_state topic,
recording 'fund-tr' => pid, epoch e0, partition P0.

1. Producer requests its txn ID and is returned a PID and txn epoch
2. Event fetched by consumer
3. Producer notifies the coordinator of the partition being written to
4. Alice's account debited: Alice, ($10)

The Downstream App consumes with isolation.level='read_committed'.
System Failure with Transactions (continued)

The application instance fails without committing the offset, and a new application
instance starts with the same transactional.id='fund-tr':

1. The new instance requests its txn ID
   a. The coordinator fences the previous instance by aborting the pending txn (writing an
      abort marker A to the partition) and bumping up the epoch: 'fund-tr' => pid, epoch e1
2. The downstream consumer with read_committed discards the aborted events
System with Successful Committed Transaction

1. Producer requests its txn ID and is assigned a PID and epoch
2. Event fetched by consumer
3. Producer notifies the coordinator of each partition being written to
4. Alice's account debited: Alice, ($10)
5. Bob's account credited: Bob, $10
6. Consumer offset committed to __consumer_offsets (p7) as part of the transaction
7. Producer notifies the coordinator that the transaction is complete
8. Coordinator writes commit markers (C) to p0, p1, and p7 and records the commit in
   __transaction_state: 'fund-tr' => pid e0 P0 P1 P7 C
9. Downstream consumer with read_committed processes the committed events
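In client code, the happy-path flow above collapses to a few producer calls. A sketch assuming a confluent-kafka-style transactional API; the transfer_loop name, topic names, and payloads are illustrative:

```python
# Read-process-write transaction sketch; `producer`, `consumer`, and
# `group_metadata` stand in for confluent-kafka-style client objects.
def transfer_loop(producer, consumer, group_metadata):
    producer.init_transactions()      # step 1: obtain PID/epoch, fence old instances
    while True:
        event = consumer.poll(1.0)    # step 2: fetch the transfer event
        if event is None:
            continue
        producer.begin_transaction()  # step 3: written partitions get registered
        producer.produce("balances", key=b"alice", value=b"($10)")  # step 4: debit
        producer.produce("balances", key=b"bob", value=b"$10")      # step 5: credit
        # step 6: the consumed offset is committed as part of the transaction
        producer.send_offsets_to_transaction(
            consumer.position(consumer.assignment()), group_metadata)
        producer.commit_transaction() # steps 7-8: coordinator writes commit markers
```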
Consuming Transactions with read_committed

- The partition leader maintains the last stable offset (LSO), the smallest offset of any
  open transaction
- A fetch response includes
  - only records up to the LSO
  - metadata for skipping aborted records

Contents of a balances topic partition (A = abort marker, C = commit marker):

Offset  Producer  Record
57      pid 1     ($10)
58      pid 2     ($8)
60      pid 2     $8
61      pid 1     A
62      pid 1     ($7)
63      pid 2     C
64      pid 2     ($9)   <- LSO: pid 2's second transaction is still open
65      pid 1     $7
66      pid 2     $9
67      pid 1     C
                         <- HW

The fetch response contains only the records below the LSO. The consumer is able to read
those records, except offset 57, which it discards because that transaction (pid 1) was
aborted by the marker at offset 61.
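The consumer-side skipping can be sketched as follows. This is a simplification: real clients use the aborted-transaction metadata carried in the fetch response, and the offsets below mirror the partition example above:

```python
# Sketch of client-side filtering under read_committed.
def readable_records(records, lso, aborted_ranges):
    """Return records visible to a read_committed consumer.

    records: list of (offset, pid, value), markers omitted for brevity.
    aborted_ranges: {pid: [(first_offset, marker_offset), ...]} per aborted txn.
    """
    out = []
    for offset, pid, value in records:
        if offset >= lso:
            break  # beyond the last stable offset: outcome not yet known
        aborted = any(first <= offset < marker
                      for first, marker in aborted_ranges.get(pid, []))
        if not aborted:
            out.append((offset, value))
    return out

# pid 1's first transaction (starting at offset 57) was aborted at offset 61
visible = readable_records(
    [(57, 1, "($10)"), (58, 2, "($8)"), (60, 2, "$8"), (62, 1, "($7)")],
    lso=64,
    aborted_ranges={1: [(57, 61)]},
)
```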
Interacting with External Systems

Atomic writes to Kafka and external systems are not supported.
- Instead, write the transactional output to a Kafka topic first
- Rely on idempotence to propagate the data from the output topic to the external system,
  e.g. via Kafka Connect