Slides from the talk on lessons learned from running Kafka on Kubernetes by Pavan Keshavamurthy and Avinash Upadhyaya of Platformatory at the Apache Kafka Mumbai July 2023 meetup.
The talk looks at the tooling available for running Apache Kafka on Kubernetes and covers best practices for running a distributed system such as Kafka on Kubernetes.
5. You want to run Kafka on K8S?
- More gluttony for torture
- Surprisingly simpler than configuring server.properties by hand (or with Ansible), if done well
6. The Operator Pattern in Summary
- A Kubernetes operator watches a custom resource (CR) type and takes application-specific actions to make the current state match the desired state declared in that resource
- Implements domain-specific operational knowledge on top of Kubernetes
- Allows complex applications to be managed using the Kubernetes API and the kubectl interface
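The reconcile idea above can be sketched in a few lines of Python. This is a toy model, not any real operator's code: `KafkaClusterSpec`, `ObservedCluster`, `reconcile`, and the action tuples are hypothetical names, and a real operator would watch the CR through a Kubernetes client library and apply the actions rather than return them.

```python
from dataclasses import dataclass, field


@dataclass
class KafkaClusterSpec:
    """Desired state, as a user might declare it in a CR (hypothetical fields)."""
    replicas: int
    config: dict = field(default_factory=dict)


@dataclass
class ObservedCluster:
    """Current state, as the operator might observe it (hypothetical fields)."""
    broker_ids: list
    config: dict = field(default_factory=dict)


def reconcile(spec: KafkaClusterSpec, observed: ObservedCluster) -> list:
    """Return the actions that move the observed state toward the spec."""
    actions = []
    current = sorted(observed.broker_ids)
    # Scale out: add brokers using the next free IDs.
    next_id = (max(current) + 1) if current else 0
    for _ in range(spec.replicas - len(current)):
        actions.append(("add_broker", next_id))
        next_id += 1
    # Scale in: remove the highest IDs first (a simplifying assumption).
    for broker_id in reversed(current[spec.replicas:]):
        actions.append(("remove_broker", broker_id))
    # Config drift: roll the brokers so they pick up changed properties.
    changed = {k: v for k, v in spec.config.items() if observed.config.get(k) != v}
    if changed:
        actions.append(("rolling_restart", changed))
    return actions
```

Because `reconcile` compares desired and observed state on every pass instead of reacting to individual events, running it again after its actions have been applied yields an empty list; that idempotence is what lets an operator safely re-run the loop.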
8. Scope of coverage: a mental model for Kubernetes operators for Kafka
- Operator Core
  - Custom Resources
  - Workload Type
  - Networking
  - Storage
- Security
  - Authentication
  - Authorization
- Operational Features
  - Balancing
  - Monitoring
  - Disaster Recovery
  - Scale up/out
  - Deployments & Rollouts
- Extensibility
9. Security: What are typical requirements for Kafka?
● Auto-generate certificates for TLS and mTLS between brokers and other internal components
● Natively support authentication mechanisms such as SASL/PLAIN, SASL/SCRAM, SASL/OAUTHBEARER, and SASL/GSSAPI
● Authorization with ACLs, with user management through the k8s API
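As a rough illustration of what supporting SASL means on the client side, the sketch below builds the standard Kafka client properties (`security.protocol`, `sasl.mechanism`, `sasl.jaas.config`) for SASL/PLAIN and SASL/SCRAM. The helper name is hypothetical, and the assumption is that an operator's user management hands credentials to clients in roughly this shape; OAUTHBEARER and GSSAPI need additional properties and are deliberately left out.

```python
# JAAS login modules that ship with Apache Kafka for the mechanisms covered here.
LOGIN_MODULES = {
    "PLAIN": "org.apache.kafka.common.security.plain.PlainLoginModule",
    "SCRAM-SHA-256": "org.apache.kafka.common.security.scram.ScramLoginModule",
    "SCRAM-SHA-512": "org.apache.kafka.common.security.scram.ScramLoginModule",
}


def client_sasl_properties(mechanism: str, username: str, password: str,
                           tls: bool = True) -> dict:
    """Build Kafka client properties for SASL/PLAIN or SASL/SCRAM (hypothetical helper).

    OAUTHBEARER and GSSAPI need extra settings (callback handlers, Kerberos
    service name, keytabs) and are not handled here. Credentials are not
    escaped, so they must not contain double quotes.
    """
    if mechanism not in LOGIN_MODULES:
        raise ValueError(f"unsupported mechanism in this sketch: {mechanism}")
    jaas = f'{LOGIN_MODULES[mechanism]} required username="{username}" password="{password}";'
    return {
        # SASL over TLS by default; SASL_PLAINTEXT only suits trusted networks.
        "security.protocol": "SASL_SSL" if tls else "SASL_PLAINTEXT",
        "sasl.mechanism": mechanism,
        "sasl.jaas.config": jaas,
    }
```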
10. Operations: What are typical requirements for Kafka?
● Rebalancing partitions when load across brokers is uneven or when a broker is added or removed
● Monitoring cluster health with JMX metrics
● Rolling upgrades with no downtime
● Replicating data across clusters
● Rack awareness for durability
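Rack awareness boils down to placing each partition's replicas on brokers in different failure domains. The sketch below is a simplified round-robin placement over a rack-interleaved broker list; it is not Kafka's actual assignment algorithm (which also shifts follower positions), and the function names are hypothetical. It only guarantees distinct racks per partition when every rack has the same number of brokers and the replication factor does not exceed the number of racks.

```python
from collections import defaultdict
from itertools import zip_longest


def interleave_by_rack(broker_racks: dict) -> list:
    """Order brokers so consecutive entries alternate between racks."""
    by_rack = defaultdict(list)
    for broker, rack in sorted(broker_racks.items()):
        by_rack[rack].append(broker)
    columns = zip_longest(*(by_rack[r] for r in sorted(by_rack)))
    # zip_longest pads shorter racks with None; drop the padding.
    return [b for column in columns for b in column if b is not None]


def assign_replicas(broker_racks: dict, partitions: int, replication_factor: int) -> dict:
    """Assign replicas round-robin over the interleaved list (simplified sketch)."""
    brokers = interleave_by_rack(broker_racks)
    if replication_factor > len(brokers):
        raise ValueError("replication factor exceeds broker count")
    return {
        p: [brokers[(p + i) % len(brokers)] for i in range(replication_factor)]
        for p in range(partitions)
    }
```

Because the starting offset advances by one per partition, leadership (the first replica) also rotates across brokers, which is the same even spread a rebalancer tries to restore when load becomes uneven.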
11. Confluent for Kubernetes (CFK)
● Confluent Platform on Kubernetes
● Based on Confluent's experience running Kafka on Kubernetes for Confluent Cloud
● Uses StatefulSets to restore a Kafka pod with the same broker ID, configuration, and persistent storage volumes if a failure occurs
● Provides server properties, JVM, and Log4j configuration overrides for customizing all Confluent Platform components
● Complete, granular RBAC
● Supports credential management systems, such as HashiCorp Vault, to inject sensitive configuration in memory into Confluent deployments
● Supports tiered storage
● Supports multi-region clusters
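The StatefulSet guarantee described above comes from Kubernetes' naming rules: pods are named `<statefulset>-<ordinal>`, PVCs created from a volume claim template are named `<claim-template>-<statefulset>-<ordinal>`, and the governing headless Service gives each pod a stable DNS name. The sketch below derives these names; the function is hypothetical, and mapping the ordinal directly to a broker ID is an assumption for illustration, not necessarily how CFK computes broker IDs.

```python
def stateful_identity(statefulset: str, service: str, namespace: str,
                      claim_template: str, ordinal: int,
                      cluster_domain: str = "cluster.local") -> dict:
    """Derive the stable identity Kubernetes gives the StatefulSet pod at `ordinal`."""
    pod = f"{statefulset}-{ordinal}"
    return {
        "pod": pod,
        # PVCs from volumeClaimTemplates are named <template>-<pod>, so a
        # replacement pod with the same ordinal reattaches to the same volume.
        "pvc": f"{claim_template}-{pod}",
        # Stable per-pod DNS record provided by the governing headless Service.
        "dns": f"{pod}.{service}.{namespace}.svc.{cluster_domain}",
        # Assumption for illustration: broker ID taken directly from the ordinal.
        "broker_id": ordinal,
    }
```

For example, with the hypothetical names `stateful_identity("kafka", "kafka-headless", "confluent", "data", 2)` yields pod `kafka-2`, PVC `data-kafka-2`, and DNS name `kafka-2.kafka-headless.confluent.svc.cluster.local`, so a rescheduled pod comes back under the same name with the same volume.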
12. Strimzi
● Open-source CNCF sandbox project
● Implements security in a Kubernetes-native fashion
● Uses StrimziPodSets to overcome the limitations of StatefulSets, allowing you to:
  ○ Add/remove arbitrary brokers
  ○ Stretch a cluster across k8s clusters
  ○ Use different configurations and volumes for different brokers
● KafkaBridge for a RESTful HTTP interface to Kafka
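The StatefulSet limitation behind the first sub-point is that scaling down always removes the highest ordinals, so you cannot retire, say, broker 1 while keeping brokers 2 and 3. The toy comparison below models that difference; both functions are hypothetical and do not reflect how the StrimziPodSet controller is actually implemented.

```python
def statefulset_scale_down(ordinals: list, new_replicas: int) -> tuple:
    """StatefulSet semantics: only the highest ordinals can be removed."""
    ordered = sorted(ordinals)
    return ordered[:new_replicas], ordered[new_replicas:]


def podset_remove(broker_ids: list, to_remove: list) -> tuple:
    """Pod-set semantics: any named broker can be removed, leaving gaps in the IDs."""
    unknown = set(to_remove) - set(broker_ids)
    if unknown:
        raise ValueError(f"unknown broker IDs: {sorted(unknown)}")
    removing = set(to_remove)
    kept = [b for b in sorted(broker_ids) if b not in removing]
    return kept, sorted(removing)
```

`statefulset_scale_down([0, 1, 2, 3], 3)` can only return `([0, 1, 2], [3])`, whereas `podset_remove([0, 1, 2, 3], [1])` returns `([0, 2, 3], [1])`.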
13. Koperator (Banzai Cloud)
● Open-source core component of Banzai Cloud Supertubes
  ○ Most of the compelling features and integrations are only available as part of the Supertubes Core or Supertubes Pro product suites
● Envoy-based load balancing for external access
● Uses plain pods instead of StatefulSets in order to:
  ○ Modify the configuration of individual brokers
  ○ Remove specific brokers from clusters
  ○ Use multiple persistent volumes for each broker
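External access needs per-broker routing because Kafka clients use the bootstrap address only for discovery and then connect directly to each partition leader at that broker's advertised listener, so a single load-balanced endpoint is not enough. The sketch below assumes a hypothetical scheme in which a shared load balancer (such as Envoy) exposes one port per broker, computed as a starting port plus the broker ID; the function, host name, port, and listener name are illustrative.

```python
def external_access_plan(lb_host: str, starting_port: int, broker_ids: list,
                         listener_name: str = "EXTERNAL") -> dict:
    """Map each broker to a dedicated load-balancer port and advertised listener.

    Hypothetical scheme: port = starting_port + broker_id. The load balancer
    would route traffic on that port only to the matching broker pod.
    """
    plan = {}
    for broker_id in broker_ids:
        port = starting_port + broker_id
        plan[broker_id] = {
            "lb_port": port,
            # Value for this broker's advertised.listeners entry.
            "advertised_listener": f"{listener_name}://{lb_host}:{port}",
        }
    return plan
```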
15. Prescriptive Advice
- As with all things k8s, it is important to set up resource constraints (CPU and memory limits)
- It is generally advised to taint Kafka nodes with NoSchedule so that they run Kafka on a dedicated basis
  - i.e., no bin-packing of other workloads onto Kafka nodes
- For most real-life use cases, CRs are only a starting point. They will need to be packaged into “platform recipes” with different components, organizing some level of tenancy around the brokers as well as the components
  - Typically a higher-order Helm chart, preferably with GitOps-style deployments
- Prospective users must also think about the tenancy of the operator itself: it could be a global (cluster-wide) operator or a namespaced operator
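The dedicated-node advice relies on Kubernetes taint/toleration matching. The sketch below re-implements the core matching rules in Python to show why a pod without a matching toleration stays off a node tainted `NoSchedule`. It is a simplification (it ignores `tolerationSeconds` and treats `NoExecute` like `NoSchedule` for placement), and the `dedicated=kafka` taint is a conventional example, not something the operators mandate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str = ""
    effect: str = "NoSchedule"


@dataclass(frozen=True)
class Toleration:
    key: str = ""          # empty key with operator "Exists" matches every taint
    operator: str = "Equal"
    value: str = ""
    effect: str = ""       # empty effect matches all effects


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Core Kubernetes toleration-matching rules (simplified)."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.operator == "Exists":
        return tol.key == "" or tol.key == taint.key
    return tol.key == taint.key and tol.value == taint.value


def schedulable(tolerations: list, taints: list) -> bool:
    """A pod may land on a node only if every hard taint is tolerated."""
    hard = [t for t in taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in hard)
```

With a node tainted `dedicated=kafka:NoSchedule`, `schedulable([], [Taint("dedicated", "kafka")])` is `False`, while `schedulable([Toleration(key="dedicated", value="kafka")], [Taint("dedicated", "kafka")])` is `True`. A toleration only permits placement; keeping Kafka on those nodes additionally needs node affinity or a nodeSelector.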
16. Key Takeaways
- Running Kafka on K8S without an operator can be a lot of toil. If you are running Kafka at scale (and not on a managed service), consider using one. It will save you time, money, and sanity
- Choose an operator based on your environment, features (or the lack thereof), licensing, and other specialized needs
- YMMV with operator CRs: each operator is opinionated, shaped by the realities it was designed for
- Kafka is ultimately not “k8s native”. An operator can only provide so much operational sugar
  - As a result, there are several shoehorning mechanisms (such as built-in config overrides to inject component properties); full expressivity of the workload doesn’t quite exist
- All operators provide comparable performance
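The config-override shoehorning mentioned in the takeaways typically works by layering user-supplied properties over operator-generated defaults when rendering each component's properties file. The sketch below assumes a hypothetical set of keys that the operator manages itself and refuses to let users override; real operators differ in which keys (if any) they protect and in how overrides are expressed in their CRs.

```python
# Hypothetical keys the operator owns (identity, networking, storage layout).
OPERATOR_OWNED = frozenset({"broker.id", "listeners", "advertised.listeners", "log.dirs"})


def render_server_properties(defaults: dict, overrides: dict,
                             owned: frozenset = OPERATOR_OWNED) -> str:
    """Merge user overrides onto operator defaults and render server.properties text."""
    rejected = sorted(k for k in overrides if k in owned)
    if rejected:
        raise ValueError(f"cannot override operator-managed keys: {rejected}")
    merged = {**defaults, **overrides}  # overrides win over defaults
    # Sorted output keeps the rendered file stable across reconcile passes.
    return "\n".join(f"{k}={v}" for k, v in sorted(merged.items())) + "\n"
```

In this sketch, overriding `num.partitions` succeeds, while attempting to override `listeners` raises `ValueError`, since the operator keeps ownership of the networking keys.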