Stateful, Stateless and Serverless - Running Apache Kafka® on Kubernetes

Housekeeping Items
● This session will last about an hour.
● This session will be recorded.
● You can submit your questions by entering them into the GoToWebinar panel.
● The last 10-15 minutes will consist of Q&A.
● The slides and recording will be available after the talk.

Joe Beda started his career at Microsoft working on
Internet Explorer (he was young and naive). Throughout
7 years at Microsoft and 10 at Google, Joe has worked
on GUI frameworks, real-time voice and chat, telephony,
machine learning for ads, and cloud computing. Most
notably, while at Google Joe started the Google
Compute Engine and, along with Brendan Burns and Craig
McLuckie, created Kubernetes. Joe proudly calls Seattle
home.

Gwen Shapira is a principal data architect at Confluent, helping
customers achieve success with their Apache Kafka®
implementation. She has 15 years of experience
working with code and customers to build scalable data
architectures. Gwen is a committer on the Apache
Kafka project, author of “Kafka: The Definitive Guide,”
and a frequent presenter at industry conferences.

Kubernetes:
Zero to Operators
Joe Beda
@jbeda
CTO, Co-Founder

What is Kubernetes?
Cluster Orchestrator
Decide which container goes on which node
Integrate with networking and service discovery
Value Prop?
Developer and operator productivity via improved workflows
Better resource efficiency via bin packing
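
To make "bin packing" concrete, here is a minimal sketch of the idea using the classic first-fit-decreasing heuristic. It only illustrates why packing workloads onto shared nodes saves capacity; it is not the Kubernetes scheduler's algorithm, and the request sizes and node capacity are made-up numbers. It also assumes no single request exceeds the node capacity.

```python
def first_fit_decreasing(requests, node_capacity):
    """Pack resource requests onto as few equally sized nodes as this heuristic finds."""
    nodes = []  # each node is the list of requests placed on it
    for req in sorted(requests, reverse=True):   # place the largest requests first
        for node in nodes:
            if sum(node) + req <= node_capacity:  # first node with enough room wins
                node.append(req)
                break
        else:
            nodes.append([req])                   # nothing fits, so start a new node
    return nodes


# Hypothetical CPU requests (in cores) packed onto 4-core nodes.
placement = first_fit_decreasing([2, 1, 3, 2], node_capacity=4)
print(f"{len(placement)} nodes used: {placement}")
```

Running one workload per node would need four nodes here; packing them needs two.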

What is Kubernetes, really?
Database
API and policy (authN/Z, schemas, defaults, etc.)
“Controllers”
Implement container orchestration logic
Node agent

Controllers
Small piece of code that implements domain logic
Uses exact same APIs as user has access to
Declarative reconciliation loop
Start with supplied desired state
Compare to “real world”
Work to reconcile. Relentless forward progress.
Stable distributed programming pattern
Deals well with changes to desired state
Deals well with unexpected “real world” conditions
Handles software crashes, restarts, upgrades
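
The reconciliation loop described above can be sketched in a few lines. This is a toy model under stated assumptions, not Kubernetes code: the World class stands in for the cluster, and pod names such as "web-0" are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class World:
    """Stand-in for the 'real world': names of the pods currently running."""
    pods: set = field(default_factory=set)


def reconcile(desired_replicas, world, name="web"):
    """Run one pass of the loop: observe, diff against desired state, act."""
    actions = []
    wanted = {f"{name}-{i}" for i in range(desired_replicas)}
    for pod in sorted(wanted - world.pods):   # missing pods get created
        world.pods.add(pod)
        actions.append(f"create {pod}")
    for pod in sorted(world.pods - wanted):   # surplus pods get deleted
        world.pods.remove(pod)
        actions.append(f"delete {pod}")
    return actions


world = World()
print(reconcile(3, world))      # first pass creates three pods
world.pods.discard("web-1")     # simulate an unexpected crash
print(reconcile(3, world))      # next pass recreates only the missing pod
print(reconcile(3, world))      # converged, nothing left to do
```

Because each pass starts from what is observed rather than from a remembered plan, the same code handles a changed desired state, a crashed pod, and a restarted controller.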

More like Jazz Improv than Orchestration
Each controller plays off of each other and the real world.
The actions aren’t pre-planned. Emergent behavior can deal with
unexpected conditions.
Based on goals and not a predefined set of actions.

Kubernetes Objects and Layering
Primitive: Pod
Most primitive thing in Kubernetes.
The set of resources (containers, volumes, networking) that is put on a node
Controller: ReplicaSet
Manages creation/deletion of Pods
Has target number of replicas and pod template. Creates/deletes pods to match.
Controller: Deployment
Layers on top of ReplicaSet to manage rolling upgrades
When upgrading, creates a new ReplicaSet.
Over time, adjusts # of replicas to move to new version.
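
A rough sketch of how a Deployment might shift replicas from the old ReplicaSet to the new one during a rolling upgrade. It is simplified on purpose: the max_surge parameter mirrors the idea of the Deployment maxSurge setting, but the real controller also honors maxUnavailable and waits for pod readiness, which this sketch ignores.

```python
def rolling_update(replicas, max_surge=1):
    """Yield (old_count, new_count) after each step of a simplified rollout."""
    old, new = replicas, 0
    yield old, new
    while old > 0:
        step = min(max_surge, old)
        new += step          # scale the new ReplicaSet up first
        yield old, new
        old -= step          # then scale the old ReplicaSet down
        yield old, new


for old, new in rolling_update(3):
    print(f"old ReplicaSet: {old}  new ReplicaSet: {new}")
```

In this sketch the total number of pods never drops below the target and never exceeds it by more than max_surge.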

Kubernetes Objects and Layering
Controller: Job
Creates and manages Pods for “run to completion”
Has target number to run in parallel and success condition
Restarts on failure
Controller: CronJob
Creates and manages Jobs based on time schedule

Stateful Services
Primitive: PersistentVolume (PV)
Abstraction over infrastructure volume
Referenced by Pod
Kubernetes connects PV to Node where Pod gets scheduled
Wide support: AWS EBS, AzureDisk, iSCSI, NFS, etc.
Controller: StatefulSet
Replication and management of Pods and PVs in predictable way
More careful and deliberate around scaling and updating
Feature: Headless Service
Name isn’t super helpful here :(
Way to publish discovery of individual Pods for direct access
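
As an illustration of the "predictable way" and of what a headless Service publishes: each StatefulSet Pod gets a stable name made of the set name plus an ordinal, and a per-Pod DNS record under the headless Service. The sketch below only builds those names; the set name "kafka" and service name "kafka-headless" are hypothetical, and "cluster.local" is the common default cluster domain, which a given cluster may override.

```python
def pod_dns_names(statefulset, service, replicas,
                  namespace="default", cluster_domain="cluster.local"):
    """Return the stable per-Pod DNS names for a StatefulSet behind a headless Service."""
    return [
        f"{statefulset}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


for name in pod_dns_names("kafka", "kafka-headless", replicas=3):
    print(name)
```

The names stay the same when a Pod is rescheduled, which is what lets clients address one specific replica.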

Custom Resource Definitions (CRDs)
Key to allowing customized controllers
Controller consists of a schema/type along with control code
Anyone can write code to interact with API, CRDs allow schema extension
Allows users to create their own controllers
Extend/replace built in controllers (BlueGreenDeployment?)
Usually take a CRD and create/manage other Kubernetes resources
Can interface with external systems outside of Kubernetes
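
For a sense of what "schema extension" looks like, here is a sketch that builds a minimal CustomResourceDefinition manifest as a Python dict and prints it as JSON, which kubectl accepts alongside YAML. The group "example.com" and the single "replicas" field are assumptions chosen for illustration; the layout follows the apiextensions.k8s.io/v1 API, which is newer than the Kubernetes release discussed in this talk.

```python
import json


def crd_manifest(kind, plural, group):
    """Build a minimal CRD manifest for one custom kind with a tiny schema."""
    return {
        "apiVersion": "apiextensions.k8s.io/v1",
        "kind": "CustomResourceDefinition",
        "metadata": {"name": f"{plural}.{group}"},  # must be <plural>.<group>
        "spec": {
            "group": group,
            "scope": "Namespaced",
            "names": {"kind": kind, "plural": plural, "singular": kind.lower()},
            "versions": [{
                "name": "v1",
                "served": True,
                "storage": True,
                "schema": {"openAPIV3Schema": {
                    "type": "object",
                    "properties": {"spec": {
                        "type": "object",
                        # Hypothetical field; a real controller defines its own.
                        "properties": {"replicas": {"type": "integer"}},
                    }},
                }},
            }],
        },
    }


manifest = crd_manifest("BlueGreenDeployment", "bluegreendeployments", "example.com")
print(json.dumps(manifest, indent=2))
```

The CRD only registers the type with the API server. A controller still has to watch objects of that kind and act on them.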

Operators
A piece of software; not a job title!
Domain specific controller
Built for managing a specific software application (e.g. Kafka)
Manages Kubernetes objects and software system
Coordinates between the two
Handles common operational tasks
Captures operational knowledge as code

Future of Kubernetes
More and more automation built on top of Kubernetes
Higher and higher level abstractions
Software will become natively operable
Controllers will get easier to write
Currently too difficult!
Great projects emerging: metacontroller, kubebuilder, operator framework
Example: BlueGreenDeployment in 120 lines of JavaScript with metacontroller
Kubernetes controller pattern used outside containers
API Server being made “generic” so it can be used in other situations
Example: Heptio Gimbal for multi-cluster load balancing

Thank you!
Joe Beda
CTO, Co-Founder
joe@heptio.com
@jbeda

- Developer and operator productivity via improved workflows
- Better resource efficiency via bin packing

• Brokers need identity
• Brokers need to find each other
• Brokers need persistent storage

Stateful sets!

            Persistent                       Ephemeral
Local       Still beta in Kubernetes 1.10    Relies on Kafka replication. Slow and high traffic.
Shared      Recommended                      N/A

My app needs to talk to Kafka
Is the app running on Kubernetes?
    Yes: Use ClusterIP and service name
    No: Can I use a load balancer?
        Yes: Use LoadBalancer
        No: Use NodePort
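
The decision flow above fits in a small function. The function and parameter names are made up for this sketch; the returned strings are the Kubernetes Service types named on the slide.

```python
def kafka_access_method(app_on_kubernetes, load_balancer_available):
    """Pick the Service type an application would use to reach Kafka."""
    if app_on_kubernetes:
        return "ClusterIP"      # in-cluster clients use the service name
    if load_balancer_available:
        return "LoadBalancer"   # external clients behind a load balancer
    return "NodePort"           # external clients without one


print(kafka_access_method(app_on_kubernetes=True, load_balancer_available=False))
print(kafka_access_method(app_on_kubernetes=False, load_balancer_available=True))
print(kafka_access_method(app_on_kubernetes=False, load_balancer_available=False))
```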

[Diagram: NodePort access. Kafka Pods spread across two Nodes, one Service per Pod; external traffic arrives on node ports 31091, 31092 and 31093.]
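
With NodePort, each broker sits behind its own Service and node port, so each broker has to advertise a distinct external address. A sketch of that mapping, assuming broker ids 1 to 3 line up with ports 31091 to 31093 as the diagram suggests; the host name is hypothetical, and any node's address works because a NodePort is opened on every node by default.

```python
def external_listeners(node_host, brokers=3, base_port=31090):
    """Map each broker id to the external host:port clients would connect to."""
    return {
        broker_id: f"{node_host}:{base_port + broker_id}"
        for broker_id in range(1, brokers + 1)
    }


for broker_id, address in external_listeners("node1.example.com").items():
    print(f"broker {broker_id} -> {address}")
```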

[Diagram: LoadBalancer access. Kafka Pods spread across two Nodes, one Service per Pod on ports 9091, 9092 and 9093; external traffic enters through a load balancer.]

[Diagram: a Node running Kafka Connect and Schema Registry, each with a Prometheus agent, plus a Log Collector Agent. The agents feed a Metrics Service and a Logs Aggregation Service.]

• Basic provisioning and un-provisioning of components
• Storage - persistent vs ephemeral, local vs shared
• Traffic - how do external services communicate with Kafka
• Log aggregation
• Metrics collection

• DIY - describe deployments, pod definitions, ReplicaSets, etc.
• Some tools can make it easier:
• Operator: Continuously watch the state of the cluster and adjust as needed.

apiVersion: "v1"
kind: "ConfluentCluster"
metadata:
  name: "confluent-cluster"
spec:
  size: 3
  version: "5.0.0"
# kubectl apply -f myspec.yaml

apiVersion: "v1"
kind: "ConfluentCluster"
metadata:
  name: "confluent-cluster"
spec:
  size: 5
  version: "5.0.0"
# kubectl apply -f myspec.yaml
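
When the spec changes from size 3 to size 5, an operator has to work out what to do to get there. The sketch below shows one way that diff could be computed. It is an illustration only: the slides do not describe the operator's internals, and the action strings and broker naming are hypothetical.

```python
def plan(current, desired, name="confluent-cluster"):
    """List the steps that would move the cluster from the current spec to the desired one."""
    actions = []
    cur, want = current["size"], desired["size"]
    for i in range(cur, want):                 # scale up: add the missing brokers
        actions.append(f"add broker {name}-{i}")
    for i in reversed(range(want, cur)):       # scale down: remove highest ordinals first
        actions.append(f"remove broker {name}-{i}")
    if current["version"] != desired["version"]:
        actions.append(f"rolling upgrade {current['version']} -> {desired['version']}")
    return actions


before = {"size": 3, "version": "5.0.0"}
after = {"size": 5, "version": "5.0.0"}
for action in plan(before, after):
    print(action)
```

Applying the same spec twice yields an empty plan, which is the declarative behavior the controller pattern relies on.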
