Deploying Kafka Streams Applications with Docker and Kubernetes

Gwen Shapira | Principal Data Architect
Matthias J. Sax | Software Engineer

Agenda
• Kafka Streams 101
• How do Kafka Streams applications scale?
• Kubernetes 101
• Recommendations for Kafka Streams
• Demo: https://github.com/gwenshap/kafka-streams-stockstats

Kafka Streams – 101
[Diagram labels: Your App, Kafka Streams, Kafka Connect, Other Systems]

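Stock Trade Stats Example
import org.apache.kafka.streams.KafkaStreams
import org.apache.kafka.streams.kstream.{TimeWindows, Windowed}
import org.apache.kafka.streams.scala.ImplicitConversions._
import org.apache.kafka.streams.scala.StreamsBuilder
import org.apache.kafka.streams.scala.kstream.{KStream, KTable}

// Assumes `props` and implicit Serdes for String, Trade, and TradeStats are in scope
val builder = new StreamsBuilder()
val source: KStream[String, Trade] = builder.stream("stocks-topic")
val stats: KTable[Windowed[String], TradeStats] = source
  .groupByKey
  .windowedBy(TimeWindows.of(5000L).advanceBy(1000L)) // 5 s windows, advancing every 1 s
  .aggregate(new TradeStats)                          // init
    ((_, trade, tradeStats) => tradeStats.add(trade)) // agg
stats.toStream // get changelog stream from table
  .mapValues(tradeStats => tradeStats.computeAvgPrice())
  .to("stockstats-output")
val streams: KafkaStreams = new KafkaStreams(builder.build(), props)
streams.start()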

Topologies
Processor Topology (nodes connected by streams; processor nodes can use state stores):
• Source Node: builder.stream
• Processor Node: source.groupByKey.windowedBy(…).aggregate(…)
• Processor Node: mapValues()
• Sink Node: to(…)

How Do Kafka Streams Applications Scale?

Partitions, Tasks, and Consumer Groups
input topic
result topic
4 input topic partitions => 4 tasks
Task executes processor topology
One consumer group: can be executed with 1 to 4 threads on 1 to 4 machines
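
The relationship between partitions, tasks, and threads can be sketched with a little arithmetic. This is a simplified model and not the actual Kafka Streams assignor: it assumes a single input topic, one task per partition, and an even round-robin spread. `TaskModel` and its method names are made up for illustration.

```scala
object TaskModel {
  // One task per input partition (single-input-topic assumption).
  def taskCount(inputPartitions: Int): Int = inputPartitions

  // Spread tasks round-robin over all threads of all instances.
  // Returns how many tasks each thread would run.
  def tasksPerThread(inputPartitions: Int, totalThreads: Int): Vector[Int] = {
    require(inputPartitions > 0 && totalThreads > 0,
      "need at least one partition and one thread")
    val tasks = taskCount(inputPartitions)
    Vector.tabulate(totalThreads) { t =>
      tasks / totalThreads + (if (t < tasks % totalThreads) 1 else 0)
    }
  }

  // Threads beyond the task count have nothing to do.
  def idleThreads(inputPartitions: Int, totalThreads: Int): Int =
    math.max(0, totalThreads - taskCount(inputPartitions))
}
```

With the 4 partitions from the slide, `TaskModel.tasksPerThread(4, 3)` gives `Vector(2, 1, 1)`, and `TaskModel.idleThreads(4, 6)` gives `2`: threads beyond the fourth add no parallelism.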

Scaling with State
Instance 1: Trade Stats App

Scaling with State
Instance 1: Trade Stats App
Instance 2: Trade Stats App

Scaling with State
Instance 1: Trade Stats App
Instance 2: Trade Stats App
Instance 3: Trade Stats App

Scaling and Fault-Tolerance
Two Sides of Same Coin

Fault-Tolerance
Instance 1
Trade Stats App

Fault-Tolerant State
Input Topic
Result Topic
Changelog Topic
State Updates

Migrate State
Changelog Topic
Instance 1: Trade Stats App

Migrate State
Changelog Topic
Instance 1: Trade Stats App
Instance 2: Trade Stats App (restore)

Migrate State
Changelog Topic
Instance 1
Instance 2: Trade Stats App

Recovery Time
• Changelog topics are log compacted
• Size of changelog topic is linear in size of state
• Large state implies high recovery times
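
Why a compacted changelog is enough to rebuild state can be shown with a small replay sketch. It models a changelog as an ordered sequence of key/value updates, where `None` stands in for a tombstone (delete). `Record`, `restore`, and `compact` are illustrative names, and the model ignores windowing, offsets, and RocksDB entirely.

```scala
object ChangelogReplay {
  // One changelog entry; value = None models a tombstone (delete).
  final case class Record(key: String, value: Option[Long])

  // Rebuild a key-value store by replaying the changelog in order: last write wins.
  def restore(changelog: Seq[Record]): Map[String, Long] =
    changelog.foldLeft(Map.empty[String, Long]) {
      case (state, Record(k, Some(v))) => state.updated(k, v)
      case (state, Record(k, None))    => state - k
    }

  // Idealized compaction: keep only the latest record per key, preserving order.
  def compact(changelog: Seq[Record]): Seq[Record] = {
    val lastIndex = changelog.zipWithIndex.map { case (r, i) => r.key -> i }.toMap
    changelog.zipWithIndex.collect { case (r, i) if lastIndex(r.key) == i => r }
  }
}
```

Replaying the full log and replaying the compacted log produce the same state; the difference is how many records have to be read, which is what drives recovery time.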

Recovery Overhead
Changelog topic: segments (default size 1GB) plus an active segment
Topic size can grow larger if not compacted
After compaction: segments (default size 1GB) plus an active segment
State size: 20 GB (per shard)
Min Topic Size: 21 GB (per shard)
Recovery overhead about 5%

Recovery Overhead
Changelog topic: segments (default size 1GB) plus an active segment
After compaction: one segment (only 100 MB) plus an active segment
State size: 100 MB (per shard)
Min Topic Size: 1.1 GB
Recovery overhead about 1000%
Each key is stored up to 11 times…
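
The two examples above follow from simple arithmetic. The sketch below assumes, as the slides appear to, that the minimum topic size is the compacted state plus one full segment of not-yet-compacted updates, and it treats 1 GB as 1000 MB. `RecoveryOverhead` and its method names are illustrative.

```scala
object RecoveryOverhead {
  // Topic size modeled as compacted state plus one full, not-yet-compacted segment.
  def topicSizeMb(stateMb: Double, segmentMb: Double): Double =
    stateMb + segmentMb

  // Extra data read during restore, relative to the state size, in percent.
  def overheadPercent(stateMb: Double, segmentMb: Double): Double = {
    require(stateMb > 0, "state size must be positive")
    segmentMb * 100.0 / stateMb
  }
}
```

For 20 GB of state and 1 GB segments, `overheadPercent(20000, 1000)` is `5.0`; for 100 MB of state it is `1000.0`, with a topic of `topicSizeMb(100, 1000) = 1100.0` MB, which is 11 times the state size.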

Recovery Overhead
• Recovery overhead is proportional to segment-size / state-size
• Segment-size smaller than state-size => reduced overhead
• Update changelog topic segment size accordingly
• topic config: segment.bytes (broker-level default: log.segment.bytes)
• log cleaner interval important, too
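
One way to apply this from the application side is to pass a topic-level override through the Streams configuration. The sketch below assumes Kafka Streams forwards settings carrying the `topic.` prefix (the value of `StreamsConfig.TOPIC_PREFIX`) to the internal topics it creates. It uses plain `java.util.Properties` so that it stands alone. The helper name and the 100 MB figure are illustrative, and the setting is expected to take effect only when the internal topics are first created; existing changelog topics would need to be altered separately.

```scala
import java.util.Properties

object ChangelogSegmentConfig {
  // Prefix for configs forwarded to internal topics
  // (assumed equal to StreamsConfig.TOPIC_PREFIX).
  val TopicPrefix = "topic."

  // Returns a copy of `base` with a topic-level segment.bytes override added.
  def withSegmentBytes(base: Properties, segmentBytes: Int): Properties = {
    require(segmentBytes > 0, "segment size must be positive")
    val props = base.clone().asInstanceOf[Properties]
    props.setProperty(TopicPrefix + "segment.bytes", segmentBytes.toString)
    props
  }
}
```

A call such as `ChangelogSegmentConfig.withSegmentBytes(props, 100 * 1024 * 1024)` would produce the properties to hand to `new KafkaStreams(builder.build(), ...)`.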

Container: Stock Trade Stats App

Pod:
Container: Stock Trade Stats App
Container: Prometheus JMX Exporter Sidecar
Storage
IP

ReplicaSet = pod selection + # of replicas
Deployment = ReplicaSet + update policy
Pod: Container, Container
Pod: Container, Container
Pod: Container, Container
Kubernetes Pod replica, not Kafka partition replica…

Affinity = place pods on same host
Node 1: Pod (Streams), Pod (Streams)
Anti-Affinity = place pods on different hosts
Node 1: Pod (Streams)
Node 2: Pod (Streams)

Service
Frontend Pods (3)
Service: Static IP / Name
Backend Pods (3): Ephemeral IP

StatefulSet = deployment + identity per pod
Pod 1: Container, Container (Pod1 Storage)
Pod 2: Container, Container (Pod2 Storage)
Pod 3: Container, Container (Pod3 Storage)
Headless Service: servicename.namespace.svc.cluster.local
Pod-1 Pod-2 Pod-3

Recommendations for Kafka Streams

Instance 1: Stock Stats App (Kafka Streams)
Instance 2: Stock Stats App (Kafka Streams)
Instance 3: Stock Stats App (Kafka Streams)

This sure looks stateful.
I see state right there!

Instance 1: WordCount App (Kafka Streams)
Instance 2: WordCount App (Kafka Streams)
Instance 3: WordCount App (Kafka Streams)

StatefulSets are new and complicated.
We don’t need them.

Recovering state takes time.
Stateful is faster.

But I’ll want to scale-out and back anyway.
Besides, I don’t really trust my storage admin.

Recommendations:
● Keep change-log shards small
● If you trust your storage: Use StatefulSets
● Use anti-affinity when possible
● Use “parallel” pod management

Goats are cool.
But does it work?

kind: "Deployment"
metadata:
  name: "streams-stock-stats"
  labels:
    app: "streams-stock-stats"
spec:
  replicas: 1
  selector:
    matchLabels:
  template:
    metadata:
      labels:
    spec:
      affinity:
        podAntiAffinity:
          …
      containers:
        - name: "kafka-streams-stockstat"
          image: "gcr.io/gwen-test-202722/kafka-streams-stockstat:latest"

kind: Service
spec:
  clusterIP: None
---
kind: StatefulSet
spec:
  replicas: 2
  podManagementPolicy: "Parallel"
  template:
    spec:
      containers:
        - name: kafka-streams-stockstat
          image: gcr.io/…kafka-streams-stockstat:latest
          volumeMounts:
            - name: rocksdb
              mountPath: /var/lib/kafka-streams
  volumeClaimTemplates:
    - metadata:
        name: rocksdb
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 1Gi

Automate Deployment and Management of Apache Kafka®
Confluent Operator enables you to:
• Automate provisioning of Kafka pods in minutes
• Monitor SLAs through Confluent Control Center or Prometheus
• Scale your Kafka clusters elastically
• Operate at scale with enterprise support from Confluent
Want to learn more about running Kafka on Kubernetes?
confluent.io/kubernetes

Summary
• Kafka Streams has recoverable state, which gives streams apps easy elasticity and high availability
• Kubernetes makes it easy to scale applications
• It also has StatefulSets for applications with state
• Now you know how to deploy Kafka Streams on Kubernetes and take advantage of all the scalability and high-availability capabilities
