Everything you ever needed to know about Kafka on Kubernetes but were afraid to ask | Jakub Scholz, Red Hat
The document discusses the important aspects of running Apache Kafka on Kubernetes, covering configurations, resource management, and scheduling strategies to ensure stability and performance. Key topics include setting CPU and memory limits, pod affinity and anti-affinity, using dedicated nodes, and managing disruptions through pod disruption budgets. The document provides best practices for deploying Kafka on Kubernetes effectively.
About me
" Principal Software Engineer @ Red Hat
" Maintainer of Strimzi project (https://strimzi.io)
" Apache Kafka contributor
@scholzj
https://github.com/scholzj
https://www.linkedin.com/in/scholzj/
Everything you ever needed to know about Kafka on Kubernetes
Kafka on Kubernetes
" Many different ways to run Kafka on Kubernetes
○ Bunch of YAML files
○ Helm Charts
○ Operators
" You should still understand how Kubernetes works
Resources
" Configure which resources are available to Pods
○ CPU, Memory, Hugepages
" Requests and Limits
○ Requests are guaranteed
○ Resources up to the limit are available only if the node has spare capacity
○ When only Limit is configured, it is used automatically as Request
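The request and limit rules above can be sketched as a container `resources` fragment, written here as a plain Python dict. This is a minimal illustration, not a recommended sizing: the CPU and memory values are hypothetical, and `effective_requests` is a hypothetical helper that only mimics the defaulting rule from the last bullet. It is not a Kubernetes API.

```python
import json


def effective_requests(resources):
    """Return the requests that would apply to a container.

    Mimics the defaulting rule: a resource that has a limit but no
    explicit request gets its request set to the limit.
    """
    requests = dict(resources.get("requests", {}))
    for name, limit in resources.get("limits", {}).items():
        requests.setdefault(name, limit)
    return requests


# Hypothetical sizes for a Kafka broker container.
broker_resources = {
    "requests": {"cpu": "2", "memory": "8Gi"},
    "limits": {"cpu": "4", "memory": "8Gi"},
}

# Only limits are configured: the requests are derived from them.
limits_only = {"limits": {"cpu": "2", "memory": "8Gi"}}

print(json.dumps({"resources": broker_resources}, indent=2))
print(effective_requests(limits_only))
```

Setting the memory request equal to the memory limit, as in `broker_resources`, avoids relying on memory that is not guaranteed.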
Resources
" CPU
○ Pods are not killed for exceeding the CPU limit; they are throttled instead
" Memory
○ Pods are killed when they exceed the memory limit
○ Pods might be killed when they exceed the memory request and the node runs out of memory
Kafka and Memory
" Newer JVMs can correctly detect the memory available to the container
○ The JVM auto-configures itself from the memory limit, not the request
○ Be careful, because memory up to the limit might not really be available
" Disk page cache is counted towards the memory request / limit
Key takeaways
" Always configure container resources
○ More stable and predictable performance, better scheduling results
" Configure Java memory
○ Control how much memory should be used by JVM and how much by disk cache
○ Configure Java to use only the requested memory
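One way to keep the JVM inside the requested memory is to set the heap explicitly instead of relying on auto-detection. The sketch below derives heap flags from the memory request and leaves the remainder for the page cache. The 50% split and the 8 GiB request are assumptions for illustration only, `kafka_heap_opts` is a hypothetical helper, and whether your container image reads the `KAFKA_HEAP_OPTS` variable (used by Kafka's own start scripts) should be checked for your setup.

```python
def kafka_heap_opts(memory_request_mib, heap_fraction=0.5):
    """Build JVM heap flags sized from the memory request.

    heap_fraction is an assumed split: the rest of the request is left
    for the disk page cache and JVM off-heap usage.
    """
    if not 0 < heap_fraction < 1:
        raise ValueError("heap_fraction must be between 0 and 1")
    heap_mib = int(memory_request_mib * heap_fraction)
    # Equal -Xms and -Xmx keep the heap size fixed.
    return f"-Xms{heap_mib}m -Xmx{heap_mib}m"


# Hypothetical 8 GiB request; half of it goes to the heap.
container_env = [{"name": "KAFKA_HEAP_OPTS", "value": kafka_heap_opts(8192)}]
print(container_env)
```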
Affinity
" Defines relationships between different resources
○ Between different Pods
○ Between Pods and Nodes
" Affinity versus Anti-affinity
" Required versus Preferred
Node affinity
" Defines on which worker nodes your broker pods will be scheduled
" Uses node labels to express where the pods should be placed
○ Built-in labels or custom labels
○ Labels might describe node features (node type, network performance, ...)
○ But also the cluster topology including zones / racks in which the node is running
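A required node affinity rule based on a node label could look roughly like the fragment below, built as a Python dict that mirrors the Pod `affinity` structure. `required_node_affinity` is a hypothetical helper, and the instance type value is an assumption chosen only for illustration. `node.kubernetes.io/instance-type` is one of the built-in node labels.

```python
import json


def required_node_affinity(label_key, allowed_values):
    """Build a 'required' node affinity rule for one node label."""
    return {
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                    {
                        "matchExpressions": [
                            {
                                "key": label_key,
                                "operator": "In",
                                "values": list(allowed_values),
                            }
                        ]
                    }
                ]
            }
        }
    }


# Hypothetical instance type; any node label can be used the same way.
affinity = required_node_affinity(
    "node.kubernetes.io/instance-type", ["m5.2xlarge"]
)
print(json.dumps(affinity, indent=2))
```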
Pod (anti-)affinity
" Defines which pods should or should not be co-located in the same topology
" Configurable topology to which it applies
○ Worker node
○ Availability zone
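As a sketch of the idea, the fragment below builds a required pod anti-affinity rule that keeps two brokers off the same worker node. The `app: kafka` label and the helper name `broker_anti_affinity` are assumptions. Switching `topology_key` to `topology.kubernetes.io/zone` would apply the same rule per availability zone.

```python
import json


def broker_anti_affinity(pod_labels, topology_key="kubernetes.io/hostname"):
    """Build a 'required' anti-affinity rule against pods with pod_labels."""
    return {
        "podAntiAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": [
                {
                    "labelSelector": {"matchLabels": dict(pod_labels)},
                    "topologyKey": topology_key,
                }
            ]
        }
    }


# Hypothetical label carried by every broker pod.
anti_affinity = broker_anti_affinity({"app": "kafka"})
print(json.dumps(anti_affinity, indent=2))
```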
Topology Spread
" Affinity supports only preferred or required scheduling
○ No guarantees when you have more pods than topologies
○ Problem when spreading pods across racks / availability zones
" Topology Spread Constraints come to the rescue
○ Define how pods are spread across the topology
Topology Spread
" Maximal skew
○ Defines how unevenly the pods can be spread
" Configures the behaviour when the maximal skew cannot be satisfied
○ Do not schedule the pod versus schedule it anyway
" Label selector to define which pods should be included in the topology spread
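The topology spread settings (maximal skew, behaviour when unsatisfiable, label selector) map onto a `topologySpreadConstraints` entry. The sketch below builds one as a Python dict and adds a simplified skew check. `spread_constraint` and `can_place` are hypothetical helpers, the `app: kafka` label and zone names are assumptions, and the check only approximates what the scheduler does.

```python
def spread_constraint(pod_labels, max_skew=1, strict=True):
    """Build one topology spread constraint across availability zones."""
    return {
        "maxSkew": max_skew,
        "topologyKey": "topology.kubernetes.io/zone",
        # DoNotSchedule leaves the pod pending; ScheduleAnyway is best effort.
        "whenUnsatisfiable": "DoNotSchedule" if strict else "ScheduleAnyway",
        "labelSelector": {"matchLabels": dict(pod_labels)},
    }


def can_place(pods_per_domain, domain, max_skew):
    """Simplified check: would one more pod in `domain` respect max_skew?

    Skew is taken as the pod count in the target domain (including the
    new pod) minus the smallest current count across all domains.
    """
    lowest = min(pods_per_domain.values())
    return pods_per_domain[domain] + 1 - lowest <= max_skew


# Hypothetical current spread of broker pods over three zones.
current = {"zone-a": 2, "zone-b": 2, "zone-c": 1}
print(spread_constraint({"app": "kafka"}))
print(can_place(current, "zone-a", 1), can_place(current, "zone-c", 1))
```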
Stability
" Assignment of a pod to a worker node is by default not permanent
○ After the pods are deleted, they might be scheduled to different nodes or zones
○ Use tools such as Cruise Control to regularly check that your topic replicas are distributed across the racks / zones
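The kind of check meant here can be illustrated with a few lines of Python. This is not Cruise Control's API, just a hypothetical stand-alone function that flags partitions whose replicas all sit in the same rack or zone. The broker IDs, zone names and topic names are made up for the example.

```python
def partitions_on_single_rack(assignments, broker_racks):
    """Return partitions whose replicas all live in one rack / zone.

    assignments maps a partition name to its list of broker IDs,
    broker_racks maps a broker ID to its rack / zone.
    """
    offenders = []
    for partition, replicas in assignments.items():
        racks = {broker_racks[broker] for broker in replicas}
        if len(replicas) > 1 and len(racks) < 2:
            offenders.append(partition)
    return sorted(offenders)


# Hypothetical cluster: brokers 0 and 2 ended up in the same zone.
broker_racks = {0: "zone-a", 1: "zone-b", 2: "zone-a"}
assignments = {"orders-0": [0, 1], "orders-1": [0, 2]}
print(partitions_on_single_rack(assignments, broker_racks))
```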
Storage
" Storage might have its own scheduling limitations
○ AWS EBS volumes are bound to a single availability zone
○ Affects how Pods can be scheduled
" Pro-tip: Use the allowedTopologies field in the StorageClass to control where volumes are provisioned
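A StorageClass restricted to selected zones with `allowedTopologies` might be sketched as follows. `zonal_storage_class` is a hypothetical helper, and the class name, zone name and provisioner are assumptions. The topology key in particular depends on the CSI driver in use, so treat `topology.kubernetes.io/zone` as a placeholder. `volumeBindingMode: WaitForFirstConsumer` is not from the slides; it is included because it delays provisioning until the pod is scheduled.

```python
import json


def zonal_storage_class(name, provisioner, zones,
                        topology_key="topology.kubernetes.io/zone"):
    """Build a StorageClass that only provisions volumes in `zones`."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": provisioner,
        # Provision the volume only once a pod using it is scheduled.
        "volumeBindingMode": "WaitForFirstConsumer",
        "allowedTopologies": [
            {
                "matchLabelExpressions": [
                    {"key": topology_key, "values": list(zones)}
                ]
            }
        ],
    }


# Hypothetical names; adjust the provisioner and zones to your cluster.
storage_class = zonal_storage_class(
    "kafka-zone-a", "ebs.csi.aws.com", ["us-east-1a"]
)
print(json.dumps(storage_class, indent=2))
```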
Dedicated nodes
" Worker nodes dedicated only to Kafka
○ Will still run Kubernetes components, log / metrics collectors etc.
○ Less competition for resources with other applications
○ Better isolation and more predictable performance
Dedicated nodes
" Taint the nodes to prevent other apps from being scheduled there
" In your Kafka Pods
○ Configure tolerations to allow them on the tainted nodes
○ Configure node affinity to make sure they are not scheduled on any other nodes
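The combination of taint, toleration and node affinity can be sketched as below. The `dedicated=kafka` taint and the matching node label are assumptions (taints and labels are separate, so the nodes would need both), and `tolerates` is a hypothetical, simplified version of the matching rules rather than the scheduler's implementation.

```python
# Hypothetical taint placed on the dedicated Kafka nodes.
DEDICATED_TAINT = {"key": "dedicated", "value": "kafka", "effect": "NoSchedule"}


def toleration_for(taint):
    """Build a toleration that matches exactly one taint."""
    return {
        "key": taint["key"],
        "operator": "Equal",
        "value": taint["value"],
        "effect": taint["effect"],
    }


def tolerates(toleration, taint):
    """Simplified matching of a toleration against a taint."""
    effect = toleration.get("effect")
    if effect and effect != taint["effect"]:
        return False
    key = toleration.get("key")
    if toleration.get("operator", "Equal") == "Exists":
        # An empty key with Exists tolerates every taint.
        return key is None or key == taint["key"]
    return key == taint["key"] and toleration.get("value") == taint.get("value")


# Pod spec fragment: tolerate the taint and require the dedicated nodes.
kafka_pod_spec = {
    "tolerations": [toleration_for(DEDICATED_TAINT)],
    "affinity": {
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                    {"matchExpressions": [
                        {"key": "dedicated", "operator": "In", "values": ["kafka"]}
                    ]}
                ]
            }
        }
    },
}
print(tolerates(kafka_pod_spec["tolerations"][0], DEDICATED_TAINT))
```

The toleration only allows the brokers onto the tainted nodes; the node affinity is what keeps them off all other nodes.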
Key takeaways
" Schedule brokers to the right type of nodes
" Avoid sharing nodes with other I/O-intensive workloads or other Kafka brokers
" Spread brokers evenly over all zones and use Kafka rack-awareness
" Check the distribution of topic replicas over racks regularly
" Consider using dedicated nodes for big clusters
Disruptions
" Can impact any environment => Kubernetes is not an exception
" Involuntary disruptions
○ Hardware failures, Network issues, Kernel panics
" Voluntary disruptions
○ Node draining (node repair, upgrades or scaling), bin-packing
Disruptions
" PodDisruptionBudgets define how much disruption your cluster can handle
○ Limit the maximum number of unavailable pods or the minimum number of available pods
○ Defined as an absolute number or a percentage
○ Selector selects pods to which the budget applies
" Any voluntary disruptions should check PDBs before disrupting your cluster
Key takeaways
" Configure Pod Disruption Budgets
" Set maxUnavailable to 1 to minimize the disruptions
" Set maxUnavailable to 0 to avoid voluntary disruptions
○ Pods will need to be restarted manually when needed
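A PodDisruptionBudget for the brokers could be sketched as the dict below. The budget name and the `app: kafka` selector label are assumptions, and `eviction_allowed` is a hypothetical, simplified model of how a voluntary eviction is checked against the budget, not the actual eviction API logic.

```python
import json


def kafka_pdb(name, match_labels, max_unavailable=1):
    """Build a PodDisruptionBudget limiting unavailable broker pods."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name},
        "spec": {
            "maxUnavailable": max_unavailable,
            "selector": {"matchLabels": dict(match_labels)},
        },
    }


def eviction_allowed(desired, healthy, max_unavailable):
    """Simplified check: may one more healthy pod be evicted?

    After the eviction, the number of missing pods must not exceed
    max_unavailable.
    """
    return desired - (healthy - 1) <= max_unavailable


# Hypothetical three-broker cluster.
print(json.dumps(kafka_pdb("kafka-pdb", {"app": "kafka"}), indent=2))
print(eviction_allowed(3, 3, 1), eviction_allowed(3, 2, 1), eviction_allowed(3, 3, 0))
```

With `max_unavailable=0` the check never passes, which matches the last takeaway: voluntary disruptions are blocked and pods have to be restarted manually.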
Others
" Local PersistentVolumes
" Pod Priority and Preemption
" Scheduling Framework
" Horizontal Pod Autoscaler