This talk was given at Boston Cloud Native Meetup on Feb 9th 2023
DORA’s Four Key DevOps have gained much attention as they provide critical insights into an organization’s maturity in automating the delivery of high-quality software. Google provides a blueprint implementation which requires extending your existing delivery pipelines (Jenkins, Argo, Flux, GitHub, GitLab …) to push those metrics to an external database. While doable, many platform engineers we spoke to are seeking an alternative solution and more cloud-native approach.
The CNCF project Keptn saw this as an opportunity to provide a K8s- & Cloud-Native solution that provides 100% coverage, WITHOUT changing pipelines and using OpenTelemetry as standard collection framework.
Join this talk where Andi (Andreas) Grabner, DevRel at Keptn, will show you how you can use Keptn’s Lifecyle Toolkit to get your DORA metrics within 5 minutes. Andi also covers how the Lifecycle Toolkit brings application-awareness into your deployments and allows you to execute pre- and post-deployment checks as serverless functions – all declaratively as part of your existing K8s CRDs.
Don't Deploy Into the Dark: DORA Metrics for your K8s GitOps Deployments
1. Andreas Grabner
DevOps Activist @ Dynatrace
DevRel & Maintainer @ Keptn
@grabnerandi, https://linkedin.at/grabnerandi
@keptnProject, https://www.keptn.sh
“Don’t Deploy into the Dark: DORA Metrics for K8s” aka
Observability and Orchestration of your Deployments
Taming the operational complexity of GitOps with Keptn
2. What is DORA?
DORA: Measuring DevOps Efficiency
?
Deployment Frequency
How often an organization successfully releases to production
Lead Time for Changes
The amount of time it takes a commit to get into production
Change Failure Rate
The percentage of deployments causing a failure in production
Time to Restore Service
How long it takes an organization to recover from a failure in production
Read More: https://www.allstacks.com/blog/dora-metrics
3. Google’s 4 Keys BluePrint: 4 Challenges to address when providing a K8s native implementation
#1: Data Retrieval
PUSH: needs pipeline updates
Or PULL: needs access to API
Hard to achieve 100% coverage!
#3: Storage
Requires a Custom data store
#4: Analytics
Custom dashboarding.
Not using any open standards!
#2: App-Context
Manual mapping of pipeline,
ticket, PR … data to “an app”
https://cloud.google.com/blog/products/devops-sre/using-the-four-keys-to-measure-your-devops-performance
4. KeptnV1: General Purpose Orchestration for Cloud and Non-Cloud-Native Tooling on K8s
2019 2023
2022
2021
2020
KeptnV1
Release
Keptn 1.x LTS
Releases
Strengthen core use cases
(Quality Gates)
Core Use Case: Quality Gates & WebHook Core Use Case: Delivery Orchestration Core Use Case: Auto Remediation
5. But in the recent past, Cloud Native GitOps has shifted complexity from Dev to Ops
Complex Dev & Continuous Integration (CI): Composition, Validation, Security
Feature A
Feature B
Backend
Svc
Authentication
Service
Frontend
Components
Service A (v1)
Service B (v2, v3)
Service C (v1)
Service D (v3,v4)
Infra & Cloud (vY)
Svc A (v1)
Svc B (v2)
Svc C (v1)
Svc D (v3)
Istio/Prom/
xxx X
Monolithic
World
Cloud
Native
World
Simpler Ops: Deploy, observe and operate a well-defined app
Simpler Dev: Service-based Continuous Delivery (CD) Complex Ops: Composition, Validation, Security, Observability, Orchestration
Svc A (v1)
Svc B (v3)
Svc C (v1)
Svc D (v3)
Istio/Prom/
xxx Y
Well-defined Infra, e.g: VMWare, EC2 …
with diff flavors (EKS, AKS, OpenShift …)
Test Security
Test, security, validation not part of GitOps
Validate
6. This shift also brought us a lot of feedback for KeptnV1
Single Pod-Ready != Application Ready
That’s why 90% Keptn Users adopted our SLO-based Validation
Zero Integration
It’s hard integrating new tools into EVERY pipeline. Especially in
organizations with > 800 Global Dev Teams. Need a zero-integration
approach
Configuration as CRDs
Many teams are adopting ArgoCD, Flux … non CRDs in external
repository doesn’t work in a “GitOps world”
OpenObservability
The CNCF community has agreed on OpenTelemetry and
Prometheus for observability data. Why store it in MongoDB?
Application Awareness
While we deploy individual workloads (=microservices) we need
visibility into applications that are made up of one or many services
7. Solution: A K8s Operator* to Observe and Orchestrate App-aware Workload Lifecycle
My-Application:2.0 **
Frontend-Svc:2.0
Backend-Svc:1.5
Storage-Svc:1.0
Post
Pre
Post
Pre
Post
Pre
Timespan & Result for each single deployment
Pre-App-Deployment
Post-App-Deployment
Timespan Time & Result for whole app deployment
Tasks: Dependency, Env Health,
Certificates, Approval, ...
Evaluations: SLOs, Error
Budgets ...
Timespan & result of
each task / evaluation
Tasks: Tests, Security Scans,
Cleanups, Promote ...
Evaluations: SLOs, User
Experience, Adoption ...
Timespan & result of
each task / evaluation
* K8s Operators can leverage K8s webhooks and extend K8s scheduler for pre- and post-deployment hooks
** K8s doesn’t yet have a standard application concept but Delivery SIG is working on it
Observe: Metrics (DORA) & Traces
Orchestrate: Pre-Deploy Orchestrate: Post-Deploy
8. Introducing Keptn Lifecycle Toolkit:
App-aware Workload Lifecycle Observability and Orchestration
kind: Deployment
name: simplenode
...
template:
metadata:
annotations:
keptn.sh/workload: simplenode
keptn.sh/version: 3.0.1
keptn.sh/pre-tasks: notify
keptn.sh/pre-eval: check-error-budget
keptn.sh/post-tasks: api-tests, notify
keptn.sh/post-eval: evaluate-slo
...
kind: KeptnApp
name: simplenode-app
spec:
workloads:
- name: simplenode-frontend-svc
version: 3.0.1
- name: simplenode-backend-svc
version: 2.0.1
pre-task: check-approval
pre-eval: check-error-budget
post-tasks: functests, notify
post-eval: evaluate-slo
Example from upcoming Live Demo: Single deployment with Keptn annotations, Keptn Tasks and optional Keptn App
Step 1: Annotate your Deployment & StatefulSets! Step 2: Define Keptn Tasks and Evaluations!
kind: KeptnTaskDefinition
name: notify
...
spec:
function: |
let text = Deno.env.get("SECURE_DATA");
let context = Deno.env.get("CONTEXT");
let resp = await fetch("https://hooks.slack.com/xxxx");
console.log("Sending slack message")
Step 3: (optional) KeptnApp for app-awareness !
kind: KeptnEvaluation
name: evaluate-slo
Spec:
source: prometheus
objectives:
- name: cpu capacity
query: "sum(kube_node_status_capacity{resource='cpu’} "
evaluationTarget: ">4"
14. Keptn Lifecycle Toolkit tames the complexity that GitOps shifted towards Ops
Cloud
Native
World
Simpler Dev: Service-based Continuous Delivery (CD) Simpler Ops: Deploy, observe and operate a well-defined app
Service A (v1)
Service B (v2, v3)
Service C (v1)
Service D (v3,v4)
Infra & Cloud (vY)
Svc A (v1)
Svc B (v2)
Svc C (v1)
Svc D (v3)
Istio/Prom/
xxx X
Svc A (v1)
Svc B (v3)
Svc C (v1)
Svc D (v3)
Istio/Prom/
xxx Y
-native on any flavor and tooling
Extends GitOps with Test, security, validation
Declarative in Git
15. Where is the journey heading for the Keptn project and which Keptn to choose?
2023
If your toolstack is … … and you need …
Automated Observability into your K8s Deployments
Deployment policy enforcements natively in K8s
Observability-based based application Health Checks
A custom DevOps orchestration engine
Integrate Quality Gates into your existing CI/CD
Application- and not just workload-awareness
… then go with
Keptn 1.x LTS
Releases
KLT
Keptn Lifecycle Toolkit
16. Andreas Grabner
DevOps Activist @ Dynatrace
DevRel & Maintainer @ Keptn
@grabnerandi, https://linkedin.at/grabnerandi
@keptnProject, https://www.keptn.sh
Observability and Orchestration of your Deployments
Taming the operational complexity of GitOps with Keptn