https://strimzi.io
Everything you ever needed to know
about Kafka on Kubernetes
Jakub Scholz

Kafka Summit Europe 2021
About me
● Principal Software Engineer @ Red Hat
● Maintainer of the Strimzi project (https://strimzi.io)
● Apache Kafka contributor
@scholzj

https://github.com/scholzj

https://www.linkedin.com/in/scholzj/
Kafka on Kubernetes
● Many different ways to run Kafka on Kubernetes
○ A bunch of YAML files
○ Helm Charts
○ Operators
● You should still understand how Kubernetes works
Resources
Resources
● Configure which resources are available to Pods
○ CPU, Memory, Hugepages
● Requests and Limits
○ Requests are guaranteed
○ Usage up to the limit is possible only when the node has spare resources
○ When only a limit is configured, it is automatically used as the request too
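The requests and limits described above could be declared roughly as follows. This is a minimal sketch of a Pod spec fragment: the container name, image and sizes are hypothetical placeholders, not values from the talk.

```yaml
# Fragment of a Pod spec; name, image and sizes are illustrative assumptions.
containers:
  - name: kafka
    image: example.com/kafka:latest   # placeholder image
    resources:
      requests:          # guaranteed to the container, used for scheduling
        cpu: "2"
        memory: 8Gi
      limits:            # upper bound, usable only if the node has spare capacity
        cpu: "4"
        memory: 8Gi      # equal to the request, so the memory is really available
```

Setting the memory limit equal to the request is one way to avoid depending on memory that the node might not actually have.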
Resources
● CPU
○ Pods are not killed for exceeding CPU usage
● Memory
○ Pods are killed when they exceed the memory limit
○ Pods might be killed when they exceed the memory request and the node runs out of memory
Kafka and Memory
● Recent JVMs correctly detect the memory available to the container
○ The JVM auto-configures itself from the memory limit, not the request
○ Be careful, because the limit might not be really available
● The disk page cache is counted towards the memory request / limit
Key takeaways
● Always configure container resources
○ More stable and predictable performance, better scheduling results
● Configure Java memory
○ Control how much memory should be used by the JVM and how much by the disk cache
○ Configure Java to use only the requested memory
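One possible way to apply the Java memory takeaway is to pin the heap size explicitly and leave the rest of the container memory to the page cache. The sketch below assumes the container image starts the broker through the standard Kafka scripts, which read the KAFKA_HEAP_OPTS environment variable; the image name and sizes are hypothetical.

```yaml
# Fragment of a Pod spec; image and sizes are illustrative assumptions.
containers:
  - name: kafka
    image: example.com/kafka:latest   # placeholder; assumed to use Kafka's start scripts
    env:
      - name: KAFKA_HEAP_OPTS
        value: "-Xms4g -Xmx4g"        # fixed 4 GiB heap, well below the request
    resources:
      requests:
        memory: 8Gi                   # the rest is left for the disk page cache
      limits:
        memory: 8Gi
```

With a fixed heap, the JVM no longer sizes itself from the limit, and the split between heap and page cache is explicit.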
Pod Scheduling
Affinity
● Defines relationships between different resources
○ Between different Pods
○ Between Pods and Nodes
● Affinity versus Anti-affinity
● Required versus Preferred
Node affinity
● Defines on which worker nodes your broker pods will be scheduled
● Uses node labels to express where the pods should be placed
○ Built-in labels or custom labels
○ Labels might describe node features (node type, network performance, ...)
○ But also the cluster topology, including the zones / racks in which the node is running
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: node.kubernetes.io/instance-type
              operator: In
              values:
                - m5.8xlarge
                - m5.16xlarge
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        # preference is a single node selector term, not a list
        preference:
          matchExpressions:
            - key: custom-label
              operator: In
              values:
                - my-value
Pod (anti-)affinity
● Defines which pods should or should not be co-located in the same topology
● Configurable topology to which it applies
○ Worker node
○ Availability zone
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          # labelSelector is a single object, not a list
          labelSelector:
            matchExpressions:
              - key: app-type
                operator: In
                values:
                  - database
                  - storage
          topologyKey: kubernetes.io/hostname
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      # labelSelector is a single object, not a list
      - labelSelector:
          matchExpressions:
            - key: app
              operator: In
              values:
                - kafka
        topologyKey: kubernetes.io/hostname
Topology Spread
● Affinity supports only preferred or required scheduling
○ No guarantees when you have more pods than topologies
○ Problem when spreading pods across racks / availability zones
● Topology Spread Constraints come to the rescue
○ Define how pods are spread across the topology
[Figure: pods P1, P2 and P3 spread across three availability zones A, B and C]
Topology Spread
● Maximal skew
○ Defines how unevenly the pods can be spread
● Configures the behaviour when the maximal skew cannot be satisfied
○ Do not schedule the pod versus schedule it anyway
● Label selector defines which pods are included in the topology spread
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: kafka
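For comparison, the "schedule it anyway" behaviour mentioned earlier might look like the following sketch. It reuses the same hypothetical app: kafka label; only whenUnsatisfiable differs.

```yaml
# Soft variant: the scheduler tries to respect the skew but still places the pod.
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway   # prefer an even spread, never block scheduling
    labelSelector:
      matchLabels:
        app: kafka
```

The trade-off is availability of the pod versus a guaranteed even spread: DoNotSchedule leaves a pod pending rather than violate the skew, ScheduleAnyway accepts an uneven layout.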
Stability
● Assignment of a pod to a worker node is by default not permanent
○ After the pods are deleted, they might be scheduled to different nodes or zones
○ Use tools such as Cruise Control to regularly check that your topic replicas are distributed across the racks / zones
Storage
● Storage might have its own scheduling limitations
○ AWS EBS volumes are bound to a single availability zone
○ Affects how Pods can be scheduled
● Pro-tip: Use the allowedTopologies field in the StorageClass to control where volumes are provisioned
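The allowedTopologies tip from the Storage slide could be sketched as below. The StorageClass name, provisioner and zone names are assumptions for illustration, and the topology key that a given provisioner understands may differ from the one shown.

```yaml
# Hypothetical StorageClass restricting volumes to two zones.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: kafka-zonal                        # placeholder name
provisioner: ebs.csi.aws.com               # assumed provisioner
volumeBindingMode: WaitForFirstConsumer    # provision the volume where the pod lands
allowedTopologies:
  - matchLabelExpressions:
      - key: topology.kubernetes.io/zone   # key depends on the provisioner
        values:
          - eu-west-1a                     # example zones
          - eu-west-1b
```

WaitForFirstConsumer delays volume creation until a pod is scheduled, so the volume's zone follows the pod's scheduling constraints instead of the other way round.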
Dedicated nodes
● Worker nodes dedicated only to Kafka
○ Will still run Kubernetes components, log / metrics collectors etc.
○ Less competition for resources with other applications
○ Better isolation and more predictable performance
Dedicated nodes
● Taint the nodes to prevent other apps from being scheduled there
● In your Kafka Pods
○ Configure tolerations to allow them on the tainted nodes
○ Configure node affinity to make sure they are not scheduled on any other nodes
Dedicated Node
kubectl taint nodes node1 dedicated=kafka:NoSchedule
tolerations:
  - key: dedicated
    operator: Equal
    value: kafka
    effect: NoSchedule
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: dedicated
              operator: In
              values:
                - kafka
Key takeaways
● Schedule brokers to the right type of nodes
● Avoid sharing nodes with other I/O intensive workloads or other Kafka brokers
● Spread brokers equally over all zones and use Kafka rack-awareness
● Check the distribution of topic replicas over racks regularly
● Consider using dedicated nodes for big clusters
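As one illustration of the rack-awareness takeaway, and assuming the cluster is managed by Strimzi, the zone label can be mapped to the broker rack in the Kafka custom resource. This is only a fragment: the cluster name is hypothetical and the other required fields are left out.

```yaml
# Fragment of a Strimzi Kafka custom resource; not a complete definition.
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster                              # placeholder name
spec:
  kafka:
    rack:
      topologyKey: topology.kubernetes.io/zone  # node label used as the broker rack
    # replicas, listeners, storage etc. omitted for brevity
```

Other deployment methods would need to set the broker's rack configuration themselves from the node's zone label.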
Disruptions
Disruptions
● Can impact any environment; Kubernetes is no exception
● Involuntary disruptions
○ Hardware failures, network issues, kernel panics
● Voluntary disruptions
○ Node draining (node repair, upgrades or scaling), bin-packing
Disruptions
● PodDisruptionBudgets define how much disruption your cluster can handle
○ Limit the maximum number of unavailable pods / minimum number of available pods
○ Defined as an absolute number or a percentage
○ Selector selects the pods to which the budget applies
● Any voluntary disruption should check the PDBs before disrupting your cluster
# policy/v1 is available from Kubernetes 1.21; policy/v1beta1 was removed in 1.25
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: kafka-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: kafka
Key takeaways
● Configure Pod Disruption Budgets
● Set maxUnavailable to 1 to minimize the disruptions
● Set maxUnavailable to 0 to avoid voluntary disruptions
○ Pods will need to be restarted manually when needed
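The stricter option from the takeaways above might be written as in this sketch. The budget name is hypothetical, and the app: kafka label is assumed to match the broker pods.

```yaml
# Blocks all voluntary evictions of the selected pods.
apiVersion: policy/v1          # use policy/v1beta1 on clusters older than Kubernetes 1.21
kind: PodDisruptionBudget
metadata:
  name: kafka-pdb-strict       # placeholder name
spec:
  maxUnavailable: 0            # node drains cannot evict these pods
  selector:
    matchLabels:
      app: kafka
```

With this budget in place, a node drain cannot evict the broker pods, so an operator or administrator has to delete or restart them deliberately.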
Others
Others
● Local Persistent Volumes
● Pod Priority and Preemption
● Scheduling Framework
● Horizontal Pod Autoscaler
Thank you
http://jsch.cz/kafkasummiteurope2021

Everything you ever needed to know about Kafka on Kubernetes but were afraid to ask | Jakub Scholz, Red Hat