Why should you care about Operators?
Any application in any system must be installed, configured, managed, and upgraded over time.
Patching is critical to security.
“Anything that isn’t automated is slowing you down”
$ kubectl scale deploy/staticweb --replicas=3
Deploying a database is easy
$ kubectl create deployment db --image=quay.io/my/db
Running a database over time is harder
● Resize/Upgrade
● Reconfigure
● Backup
● Healing
If only Kubernetes knew...
Extending the Kubernetes API
1. Application-specific custom controllers
2. Custom resource definitions (CRDs)
How Does an Operator Work?
[Diagram labels: Developer / Kubernetes User; Custom Resource; K8s API; Kubernetes Operator = Custom Kubernetes Controller + Custom Resource Definition; Watch Events; Reconciliation; Native Kubernetes Resources: Deployments, StatefulSets, Autoscalers, Secrets, Config maps, PersistentVolume]
kind: ProductionReadyDatabase
apiVersion: database.example.com/v1alpha1
metadata:
  name: my-important-database
spec:
  connectionPoolSize: 300
  readReplicas: 2
  version: v4.0.1
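The watch-and-reconcile idea can be sketched in a few lines of Python. This is a toy, in-memory model under stated assumptions, not the Kubernetes client API: `DatabaseSpec`, `DatabaseState`, and `reconcile` are hypothetical names invented for illustration, and a real Operator would normally be built with an SDK rather than by hand.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatabaseSpec:
    """Desired state, mirroring the spec fields of the custom resource."""
    connection_pool_size: int
    read_replicas: int
    version: str


@dataclass
class DatabaseState:
    """Observed state of the running database (hypothetical, in-memory)."""
    connection_pool_size: int = 0
    read_replicas: int = 0
    version: str = ""


def reconcile(desired: DatabaseSpec, actual: DatabaseState) -> List[str]:
    """Compare desired and actual state and apply the differences.

    Returns the actions taken. An empty list means the states already
    match, so running reconcile again changes nothing.
    """
    actions = []
    if actual.version != desired.version:
        actions.append(f"upgrade to {desired.version}")
        actual.version = desired.version
    if actual.read_replicas != desired.read_replicas:
        actions.append(f"scale read replicas to {desired.read_replicas}")
        actual.read_replicas = desired.read_replicas
    if actual.connection_pool_size != desired.connection_pool_size:
        actions.append(f"set connection pool size to {desired.connection_pool_size}")
        actual.connection_pool_size = desired.connection_pool_size
    return actions


spec = DatabaseSpec(connection_pool_size=300, read_replicas=2, version="v4.0.1")
state = DatabaseState()
first_pass = reconcile(spec, state)   # fresh install: every field differs
second_pass = reconcile(spec, state)  # already converged: nothing to do
```

The point of the design is idempotence: the controller can be woken by any event, as often as needed, and only acts when the observed state has drifted from the spec.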
Custom Resource (CR)
kind: ProductionReadyDatabase
apiVersion: database.example.com/v1alpha1
metadata:
  name: my-production-ready-database
spec:
  clusterSize: 3
  readReplicas: 2
  version: v4.0.1
[...]
Operators are automated software managers, or software SREs, that manage the entire lifecycle of Kubernetes applications.
controllers
Hausenblas, Schimanski. Programming Kubernetes. O’Reilly, 2019.
Value of Operators
● Improve the “time to first value” for your customers
● Minimize software upgrade risk and associated operational costs
● Embed best practices from the experts – you – into the Operator
● Provide a cloud-like "As a Service" experience
OPERATOR HUB
Operator Hub allows administrators to selectively make Operators available from curated sources to users in the cluster.
Red Hat Products, ISV Partners, Community
OPERATORS ACROSS THE INDUSTRY
...and many more
Operator Maturity Model
● Phase I, Basic Install: Automated application provisioning and configuration management
● Phase II, Seamless Upgrades: Patch and minor version upgrades supported
● Phase III, Full Lifecycle: App lifecycle, storage lifecycle (backup, failure recovery)
● Phase IV, Deep Insights: Metrics, alerts, log processing and workload analysis
● Phase V, Auto Pilot: Horizontal/vertical scaling, auto config tuning, abnormal detection, scheduling tuning
Site Reliability Engineering (SRE)
● O’Reilly “SRE Book” (Beyer et al.)
● Carla Geisser: “Human intervention… is a bug”
● SREs write code to fix those bugs
● SREs write software to run other software
● SREs write Kubernetes Operators
Level 1: Installation - Deployment
● Can you set Operand configuration in the CR?
● Do CR changes cause non-disruptive updates to the Operand?
● Does CR status show what has and hasn’t been applied?
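One way to answer the "what has and hasn’t been applied" question is to compare `metadata.generation` with a `status.observedGeneration` field, a common Kubernetes convention. The sketch below models the CR as a plain Python dict; the helper name `pending_changes` and the field values are assumptions made for illustration.

```python
def pending_changes(cr: dict) -> bool:
    """Return True when the spec changed after the Operator last reconciled it.

    Assumes the Operator writes status.observedGeneration after each
    successful reconcile; a CR with no status is treated as never applied.
    """
    generation = cr["metadata"]["generation"]
    observed = cr.get("status", {}).get("observedGeneration", 0)
    return observed < generation


cr = {
    "kind": "ProductionReadyDatabase",
    "metadata": {"name": "my-production-ready-database", "generation": 3},
    "spec": {"clusterSize": 3, "readReplicas": 2, "version": "v4.0.1"},
    "status": {"observedGeneration": 2},
}

before = pending_changes(cr)  # spec is at generation 3, status saw only 2

# The Operator finishes reconciling and records what it has applied.
cr["status"]["observedGeneration"] = cr["metadata"]["generation"]
after = pending_changes(cr)
```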
Level 2: Upgrades
● Can the Operator upgrade its Operand?
● Without disruption?
● Does CR status show what has and hasn’t been upgraded?
Level 3: Full Lifecycle Management
● Can your Operator back up its Operand?
● Can your Operator restore from a previous Operand backup?
● Readiness/liveness probes? Active monitoring of basic execution state?
● Are CPU and other resource requests and limits set for the Operand?
Level 4: Deep Insights
● Does the Operator expose metrics about its own health?
● Metrics and alerts for the Operand?
● Does CR status show what has and hasn’t been applied?
RED: Rate (aka Traffic) - Errors - Duration (aka Latency)
The RED Method defines the three key metrics for every service in your architecture.
● Rate (the number of requests per second)
● Errors (the number of those requests that are failing)
● Duration (the amount of time those requests take)
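As a rough illustration, the three RED numbers can be computed from a window of request records. The `Request` record and `red_metrics` function are hypothetical names; in practice these values usually come from a metrics system rather than hand-rolled code, and duration is often tracked as percentiles rather than the mean used here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    duration_s: float  # how long the request took, in seconds
    failed: bool       # whether the request ended in an error


def red_metrics(requests: List[Request], window_s: float) -> Dict[str, float]:
    """Summarize a window of requests as Rate, Errors, and Duration."""
    count = len(requests)
    errors = sum(1 for r in requests if r.failed)
    total_duration = sum(r.duration_s for r in requests)
    return {
        "rate_per_s": count / window_s,
        "errors": errors,
        "mean_duration_s": total_duration / count if count else 0.0,
    }


sample = [
    Request(duration_s=0.25, failed=False),
    Request(duration_s=0.5, failed=False),
    Request(duration_s=0.25, failed=True),
    Request(duration_s=1.0, failed=False),
]
metrics = red_metrics(sample, window_s=2.0)
```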
Level 5: Auto Pilot
● Marine autopilots are reasonable models, especially with rudder position feedback
● Auto scaling, healing, tuning
  ○ Detect conditions from metrics, then scale horizontally (Replicas) or vertically (Requests/Limits)
  ○ Think especially about scaling back down; resource savings
  ○ Detect deterioration in Operand(s) (based on Level 4’s metrics) and take action to redeploy or reconfigure
● CR Status, custom Events: Clear status and especially error conditions
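A minimal sketch of the horizontal scaling decision, assuming the Operator already has a request rate from its Level 4 metrics. The function names, the per-replica target, and the replica bounds are all hypothetical; the same calculation covers scaling up and scaling back down.

```python
import math


def desired_replicas(rate_per_s: float, target_per_replica: float,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Replicas needed to serve the observed rate, clamped to the allowed range."""
    needed = math.ceil(rate_per_s / target_per_replica)
    return max(min_replicas, min(max_replicas, needed))


def scaling_action(current: int, desired: int) -> str:
    """Describe the change the Operator should make, if any."""
    if desired > current:
        return "scale up"
    if desired < current:
        return "scale down"
    return "no change"


busy = desired_replicas(rate_per_s=450, target_per_replica=100)
quiet = desired_replicas(rate_per_s=50, target_per_replica=100)
action_when_quiet = scaling_action(current=busy, desired=quiet)
```

Clamping to a minimum and maximum keeps a noisy metric from scaling the Operand to zero or without bound.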
Level 5 (cont.): Auto Pilot
“Toil Not, Neither Spin” (Kubernetes Operators, Dobies & Wood)
SRE defines “toil” as:
● Automatable - your computer would enjoy it!
● Without enduring value - needs to be done but doesn’t change the system
● Grows linearly with growth of the system
Operator Maturity Model
● Phase I, Basic Install: Automated application provisioning and configuration management
● Phase II, Seamless Upgrades: Patch and minor version upgrades supported
● Phase III, Full Lifecycle: App lifecycle, storage lifecycle (backup, failure recovery)
● Phase IV, Deep Insights: Metrics, alerts, log processing and workload analysis
● Phase V, Auto Pilot: Horizontal/vertical scaling, auto config tuning, abnormal detection, scheduling tuning
Experiments/Challenge Coins
“...left as an exercise for the reader…”
● SRE stuff: Add metrics awareness and tuning to your Operator
● Other APIs / API representations: k8fs?
● K8fs presents Kubernetes API as a synthetic file hierarchy
● % cp manifest.yaml /mnt/k8s/ns/default/deployments/
● % echo 3 >/mnt/k8s/ns/default/deployments/myapp/replicas
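As a starting point for the k8fs exercise, here is a sketch of how a path in the synthetic hierarchy might map to an API object. The layout `ns/<namespace>/<resource>/<name>/<field>` is inferred only from the two example commands; `ResourceRef` and `parse_k8fs_path` are hypothetical names, not part of any existing tool.

```python
from pathlib import PurePosixPath
from typing import NamedTuple, Optional


class ResourceRef(NamedTuple):
    namespace: str
    resource: str
    name: Optional[str]   # None when the path names a whole collection
    field: Optional[str]  # None when the path names a whole object


def parse_k8fs_path(path: str, mount: str = "/mnt/k8s") -> ResourceRef:
    """Map a file path under the mount point to a Kubernetes resource reference."""
    # relative_to raises ValueError if the path is outside the mount point.
    parts = PurePosixPath(path).relative_to(mount).parts
    if not 3 <= len(parts) <= 5 or parts[0] != "ns":
        raise ValueError(f"unsupported k8fs path: {path}")
    name = parts[3] if len(parts) > 3 else None
    field = parts[4] if len(parts) > 4 else None
    return ResourceRef(namespace=parts[1], resource=parts[2], name=name, field=field)


collection = parse_k8fs_path("/mnt/k8s/ns/default/deployments/")
replicas = parse_k8fs_path("/mnt/k8s/ns/default/deployments/myapp/replicas")
```

Under this assumed layout, writing to the `replicas` path would translate into an update of that field on the `myapp` Deployment, which is the same change `kubectl scale` makes.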
Resources
https://operatorframework.io
https://operatorhub.io
https://learn.openshift.com/operatorframework/
http://bit.ly/kubernetes-operators
SRE principles and (Kubernetes) Operator practice | DevNation Tech Talk
