Housekeeping Items
● This session will last about an hour.
● This session will be recorded.
● You can submit your questions by entering them into the GoToWebinar panel.
● The last 10-15 minutes will consist of Q&A.
● The slides and recording will be available after the talk.
Joe Beda started his career at Microsoft working on
Internet Explorer (he was young and naive). Throughout
7 years at Microsoft and 10 at Google, Joe has worked
on GUI frameworks, real-time voice and chat, telephony,
machine learning for ads, and cloud computing. Most
notably, while at Google Joe started the Google
Compute Engine and, along with Brendan Burns and Craig
McLuckie, created Kubernetes. Joe proudly calls Seattle
home.
Gwen is a principal data architect at Confluent helping
customers achieve success with their Apache Kafka®
implementation. She has 15 years of experience
working with code and customers to build scalable data
architectures. Gwen is a committer on the Apache
Kafka project, author of “Kafka: The Definitive Guide,”
and a frequent presenter at industry conferences.
Kubernetes:
Zero to Operators
Joe Beda
@jbeda
CTO, Co-Founder
What is Kubernetes?
Cluster Orchestrator
Decide which container goes on which node
Integrate with networking and service discovery
Value Prop?
Developer and operator productivity via improved workflows
Better resource efficiency via bin packing
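To make the bin-packing point concrete, here is a minimal sketch of the classic first-fit-decreasing heuristic. It is an illustration only: the real Kubernetes scheduler weighs many more factors than CPU, and the function name and the single-resource model are assumptions made for this example.

```python
def first_fit_decreasing(pod_cpu_requests, node_capacity):
    """Pack pod CPU requests onto as few equally sized nodes as this heuristic finds.

    Illustrative only: this is a textbook heuristic, not the scheduler's algorithm.
    """
    nodes = []  # each node is the list of requests placed on it
    for request in sorted(pod_cpu_requests, reverse=True):
        if request > node_capacity:
            raise ValueError(f"request {request} exceeds node capacity {node_capacity}")
        for node in nodes:
            # Place the pod on the first node that still has room for it.
            if sum(node) + request <= node_capacity:
                node.append(request)
                break
        else:
            # No existing node had room, so start a new one.
            nodes.append([request])
    return nodes


if __name__ == "__main__":
    packed = first_fit_decreasing([2, 1, 3, 2, 1, 1], node_capacity=4)
    print(len(packed), "nodes:", packed)
```

Packing the largest requests first tends to leave fewer half-empty nodes than placing pods in arrival order.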
What is Kubernetes, really?
Database
API and policy (authN/Z, schemas, defaults, etc.)
“Controllers”
Implement container orchestration logic
Node agent
Controllers
Small piece of code that implements domain logic
Uses exact same APIs as user has access to
Declarative reconciliation loop
Start with supplied desired state
Compare with the “real world”
Work to reconcile. Relentless forward progress.
Stable distributed programming pattern
Deals well with changes to desired state
Deals well with unexpected “real world” conditions
Handles software crashes, restarts, upgrades
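The reconciliation loop described above can be sketched in a few lines. This is a toy model under stated assumptions: desired and actual state are plain dictionaries, and `reconcile`, `apply_actions`, and `run_until_converged` are hypothetical names, not Kubernetes APIs.

```python
def reconcile(desired, actual):
    """Return the actions needed to move `actual` toward `desired`."""
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in actual:
        if name not in desired:
            actions.append(("delete", name))
    return actions


def apply_actions(actions, desired, actual):
    """Mutate `actual` in place; a real controller would call the API here."""
    for verb, name in actions:
        if verb == "delete":
            del actual[name]
        else:
            actual[name] = desired[name]


def run_until_converged(desired, actual, max_rounds=10):
    """Loop until nothing is left to do; return the number of rounds that did work."""
    for rounds in range(max_rounds):
        actions = reconcile(desired, actual)
        if not actions:
            return rounds
        apply_actions(actions, desired, actual)
    raise RuntimeError("did not converge")


if __name__ == "__main__":
    desired = {"a": 1, "b": 2}
    actual = {"b": 3, "c": 4}
    print(reconcile(desired, actual))
    print(run_until_converged(desired, actual), actual)
```

Because each pass recomputes the difference from scratch, the loop copes with a crash or a change in desired state between passes: the next pass simply picks up from whatever the world looks like.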
More like Jazz Improv than Orchestration
Each controller plays off of each other and the real world.
The actions aren’t pre-planned. Emergent behavior can deal with
unexpected conditions.
Based on goals and not a predefined set of actions.
Kubernetes Objects and Layering
Primitive: Pod
Most primitive thing in Kubernetes.
The set of resources (containers, volumes, networking) that is put on a node
Controller: ReplicaSet
Manages creation/deletion of Pods
Has target number of replicas and pod template. Creates/deletes pods to match.
Controller: Deployment
Layers on top of ReplicaSet to manage rolling upgrades
When upgrading, creates a new ReplicaSet.
Over time, adjusts # of replicas to move to new version.
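A rough sketch of how replica counts could shift between the old and new ReplicaSet during a rolling upgrade. It is a simplification: it models only a surge budget (loosely inspired by the Deployment `maxSurge` setting) and ignores pod readiness and `maxUnavailable`; the function name is hypothetical.

```python
def rolling_update(replicas, max_surge=1):
    """Return the (old_count, new_count) pairs seen while moving to the new version."""
    if max_surge < 1:
        raise ValueError("this sketch needs a surge budget of at least 1")
    old, new = replicas, 0
    steps = [(old, new)]
    while old > 0:
        # Scale the new ReplicaSet up, staying within replicas + max_surge pods in total.
        grow = min(replicas - new, replicas + max_surge - (old + new))
        new += grow
        steps.append((old, new))
        # Scale the old ReplicaSet down to bring the total back to the target.
        shrink = min(old, old + new - replicas)
        old -= shrink
        steps.append((old, new))
    return steps


if __name__ == "__main__":
    for old, new in rolling_update(3):
        print(f"old={old} new={new}")
```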
Kubernetes Objects and Layering
Controller: Job
Creates and manages Pods for “run to completion”
Has target number to run in parallel and success condition
Restarts on failure
Controller: CronJob
Creates and manages Jobs based on time schedule
Stateful Services
Primitive: PersistentVolume (PV)
Abstraction over infrastructure volume
Referenced by Pod
Kubernetes connects PV to Node where Pod gets scheduled
Wide support: AWS EBS, AzureDisk, iSCSI, NFS, etc.
Controller: StatefulSet
Replication and management of Pods and PVs in predictable way
More careful and deliberate around scaling and updating
Feature: Headless Service
Name isn’t super helpful here :(
Way to publish discovery of individual Pods for direct access
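As an illustration of the predictable identity a StatefulSet provides, the sketch below builds the per-Pod DNS names that a headless Service publishes, following the `<pod>.<service>.<namespace>.svc.<cluster domain>` pattern. The names `kafka` and `kafka-headless` are made up for the example, and `cluster.local` is only the common default cluster domain.

```python
def stateful_pod_dns_names(statefulset, service, namespace, replicas,
                           cluster_domain="cluster.local"):
    """Build the stable DNS name of each Pod in a StatefulSet.

    Pods are named <statefulset>-<ordinal>, with ordinals starting at 0.
    """
    return [
        f"{statefulset}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


if __name__ == "__main__":
    # Hypothetical names, chosen only for illustration.
    for name in stateful_pod_dns_names("kafka", "kafka-headless", "default", 3):
        print(name)
```

Each name stays the same when its Pod is rescheduled, which is what lets clients address individual Pods directly.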
Custom Resource Definitions (CRDs)
Key to allowing customized controllers
Controller consists of a schema/type along with control code
Anyone can write code to interact with API, CRDs allow schema extension
Allows users to create controllers
Extend/replace built-in controllers (BlueGreenDeployment?)
Usually take a CRD and create/manage other Kubernetes resources
Can interface with external systems outside of Kubernetes
Operators
A piece of software; not a job title!
Domain-specific controller
Built for managing a specific software application (e.g. Kafka)
Manages Kubernetes objects and software system
Coordinates between the two
Handles common operational tasks
Captures operational knowledge as code
Future of Kubernetes
More and more automation built on top of Kubernetes
Higher and higher level abstractions
Software will become natively operable
Controllers will get easier to write
Currently too difficult!
Great projects emerging: metacontroller, kubebuilder, operator framework
Example: BlueGreenDeployment in 120 lines of JavaScript with metacontroller
Kubernetes controller pattern used outside containers
API Server being made “generic” so it can be used in other situations
Example: Heptio Gimbal for multi-cluster load balancing
Thank you!
Joe Beda
CTO, Co-Founder
joe@heptio.com
@jbeda
a Jazz band
Don’t ask:
But rather:
• Brokers need identity
• Brokers need to find each other
• Brokers need persistent storage
StatefulSets!
          Persistent                       Ephemeral
Local     Still beta in Kubernetes 1.10    Relies on Kafka replication. Slow and high traffic.
Shared    Recommended                      N/A
My app needs to talk to Kafka
Is the app running on Kubernetes?
• Yes: Use ClusterIP and service name
• No: Can I use a load balancer?
  • Yes: Use LoadBalancer
  • No: Use NodePort
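The decision flow above is small enough to write down as a function. This is a sketch only; the function and parameter names are hypothetical, and the returned strings are the Kubernetes Service types named on the slide.

```python
def choose_service_type(app_runs_on_kubernetes, load_balancer_available):
    """Pick how a client application should reach Kafka, following the slide's flow."""
    if app_runs_on_kubernetes:
        # In-cluster clients can use the Service name directly.
        return "ClusterIP"
    if load_balancer_available:
        return "LoadBalancer"
    # External client and no load balancer: fall back to a port on each node.
    return "NodePort"


if __name__ == "__main__":
    print(choose_service_type(True, False))
    print(choose_service_type(False, True))
    print(choose_service_type(False, False))
```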
[Diagram: traffic reaches three Kafka Pods, spread across two Nodes, through one Service per Pod on ports 31091, 31092, and 31093]
[Diagram: traffic passes through a LoadBalancer to three Kafka Pods, spread across two Nodes, through one Service per Pod on ports 9091, 9092, and 9093]
[Diagram: a Node running Kafka Connect and Schema Registry, each with a Prometheus agent, plus a Log Collector Agent; a Logs Aggregation Service and a Metrics Service]
• Basic provisioning and un-provisioning of components
• Storage - persistent vs ephemeral, local vs shared
• Traffic - how do external services communicate with Kafka
• Log aggregation
• Metrics collection
• DIY - describe Deployments, Pod definitions, ReplicaSets, etc.
• Some tools can make it easier:
• Operator: Continuously watch the state of the cluster and adjust as needed.
apiVersion: "v1"
kind: "ConfluentCluster"
metadata:
  name: "confluent-cluster"
spec:
  size: 3
  version: "5.0.0"

# kubectl apply -f myspec.yaml

apiVersion: "v1"
kind: "ConfluentCluster"
metadata:
  name: "confluent-cluster"
spec:
  size: 5
  version: "5.0.0"

# kubectl apply -f myspec.yaml
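When the spec changes from `size: 3` to `size: 5`, an operator has to work out what to add or remove. The sketch below shows one way that comparison might look. It is not the behavior of any real Confluent operator: the broker naming scheme (`<cluster name>-<ordinal>`) and the `plan_scaling` function are assumptions made for illustration.

```python
def plan_scaling(current_brokers, name, size):
    """Compare running brokers with the desired size and return (to_add, to_remove).

    Assumes brokers are named <name>-<ordinal>; this naming is hypothetical.
    """
    desired = [f"{name}-{ordinal}" for ordinal in range(size)]
    to_add = [broker for broker in desired if broker not in current_brokers]
    to_remove = [broker for broker in current_brokers if broker not in desired]
    return to_add, to_remove


if __name__ == "__main__":
    running = [f"confluent-cluster-{i}" for i in range(3)]  # state after size: 3
    to_add, to_remove = plan_scaling(running, "confluent-cluster", size=5)
    print("add:", to_add)
    print("remove:", to_remove)
```

The user only edits the desired size and re-applies the spec; deciding which brokers to create is left to the operator's control loop.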
Questions?
Thank you for joining us!

Stateful, Stateless and Serverless - Running Apache Kafka® on Kubernetes
