7. Partitions, Tasks, and Consumer Groups
[Diagram: an input topic with 4 partitions is processed and written to a result topic]
• 4 input topic partitions => 4 tasks
• Each task executes the processor topology
• One consumer group: can be executed with 1 to 4 threads on 1 to 4 machines
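The 1-to-4 elasticity above can be sketched in plain Java: the 4 tasks get spread over however many threads are available. This is a simplified round-robin illustration, not Kafka's actual StreamsPartitionAssignor logic:

```java
import java.util.ArrayList;
import java.util.List;

public class TaskAssignmentSketch {
    // Simplified illustration: distribute numTasks tasks over numThreads
    // threads round-robin; each inner list holds one thread's tasks.
    static List<List<Integer>> assign(int numTasks, int numThreads) {
        List<List<Integer>> threads = new ArrayList<>();
        for (int t = 0; t < numThreads; t++) threads.add(new ArrayList<>());
        for (int task = 0; task < numTasks; task++) {
            threads.get(task % numThreads).add(task);
        }
        return threads;
    }

    public static void main(String[] args) {
        // 4 partitions => 4 tasks, runnable on 1 to 4 threads
        System.out.println(assign(4, 1)); // [[0, 1, 2, 3]]  -- one thread does everything
        System.out.println(assign(4, 4)); // [[0], [1], [2], [3]]  -- fully scaled out
    }
}
```

Adding a thread (or a machine) just moves some tasks over; the topology itself never changes.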
19. Recovery Time
• Changelog topics are log compacted
• Size of the changelog topic is linear in the size of the state
• Large state implies long recovery times
20. Recovery Overhead
[Diagram: changelog topic after compaction, split into segments (default size 1 GB) plus the active segment]
• State size: 20 GB (per shard)
• Min topic size: 21 GB (per shard): 20 GB of compacted segments plus 1 GB active segment
• Topic size can grow larger if not compacted
• Recovery overhead about 5%
21. Recovery Overhead
[Diagram: changelog topic with 1 GB segments and an active segment; after compaction only one 100 MB segment of live data remains]
• State size: 100 MB (per shard)
• Min topic size: 1.1 GB
• Recovery overhead about 1000%: each key is stored up to 11 times…
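The two scenarios above follow from simple arithmetic; a minimal model, assuming the minimum changelog size is the compacted state plus one active segment that the log cleaner never touches:

```java
public class RecoveryOverhead {
    // Minimum changelog topic size: compacted state plus one (active)
    // segment that the log cleaner cannot touch.
    static double minTopicSizeMb(double stateMb, double segmentMb) {
        return stateMb + segmentMb;
    }

    // Extra data read during recovery, relative to the state actually restored.
    static double overheadPercent(double stateMb, double segmentMb) {
        return 100.0 * segmentMb / stateMb;
    }

    public static void main(String[] args) {
        // 20 GB state, 1 GB segments
        System.out.println(minTopicSizeMb(20_000, 1_000));  // 21000.0 => 21 GB
        System.out.println(overheadPercent(20_000, 1_000)); // 5.0 %
        // 100 MB state, 1 GB segments
        System.out.println(minTopicSizeMb(100, 1_000));     // 1100.0 => 1.1 GB
        System.out.println(overheadPercent(100, 1_000));    // 1000.0 %
    }
}
```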
22. Recovery Overhead
• Recovery overhead is proportional to segment-size / state-size
• A segment size smaller than the state size => reduced overhead
• Update the changelog topic segment size accordingly
• Topic config: segment.bytes (the broker-wide default is log.segment.bytes)
• The log cleaner interval is important, too
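The segment size can be changed per changelog topic. A hedged example, assuming an `application.id` of `my-app` and a store named `my-store` (Kafka Streams names changelog topics `<application.id>-<store-name>-changelog`):

```shell
# Shrink the segment size of one changelog topic to 100 MB.
# Topic-level segment.bytes overrides the broker default log.segment.bytes.
kafka-configs --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name my-app-my-store-changelog \
  --alter --add-config segment.bytes=104857600
```

Alternatively, the Streams application itself can pass topic configs for a store's changelog when it is created, e.g. via `Materialized#withLoggingEnabled(Map)`.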
36. “But I’ll want to scale out and back anyway. Besides, I don’t really trust my storage admin.”
37. Recommendations:
● Keep changelog shards small
● If you trust your storage: use StatefulSets
● Use anti-affinity when possible
● Use “parallel” pod management
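The last two recommendations map directly onto StatefulSet fields; a minimal sketch, with placeholder names and labels (not a complete, production-ready manifest):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: streams-app
spec:
  serviceName: streams-app
  replicas: 4
  podManagementPolicy: Parallel      # "parallel" pod management: start/stop pods concurrently
  selector:
    matchLabels:
      app: streams-app
  template:
    metadata:
      labels:
        app: streams-app
    spec:
      affinity:
        podAntiAffinity:             # spread replicas across nodes
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: streams-app
              topologyKey: kubernetes.io/hostname
      containers:
        - name: streams-app
          image: example/streams-app:latest
```

With `Parallel` pod management, a scale-out does not wait for each pod to become Ready before starting the next, which matters when state restoration makes startup slow.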
41. Automate Deployment and Management of Apache Kafka®
Confluent Operator enables you to:
• Automate provisioning of Kafka pods in minutes
• Monitor SLAs through Confluent Control Center or Prometheus
• Scale your Kafka clusters elastically
• Operate at scale with enterprise support from Confluent
Want to learn more about running Kafka on Kubernetes? confluent.io/kubernetes
43. Summary
• Kafka Streams has recoverable state, which gives streams apps easy elasticity and high availability
• Kubernetes makes it easy to scale applications
• It also offers StatefulSets for applications with state
• Now you know how to deploy Kafka Streams on Kubernetes and take advantage of all its scalability and high-availability capabilities