Continuous Lifecycle London 2018 Event Keynote

GitOps
Git push all the things
Alexis Richardson
CEO, Weaveworks
TOC Chair, CNCF
@monadic
May 2018

Hello
● WTF is GitOps
● Why is Cloud Native relevant
● How does GitOps work and in what ways is it different from $MY_DEVOPS
● Tools
● Recap

Meet Qordoba
● SF-based team uses machine learning
to create “local” marketing UX for big
brands
● Rapid iteration while obeying SOC2
compliance
● Google Cloud – Kubernetes & CI
● Weave Cloud – single cont. delivery
& observability pipeline

Impact
Over 30 releases per day per team, up from 1-2 per week across all teams
1) Estimated time needed to fix prod software bugs: ~60% less
2) Estimated time to respond to customer requests: ~43% less
3) Uptime 99% → 100% (so far…!)

Kubernetes: declarative infrastructure & orchestration

At least a decade of DevOps best practices
Image credit: Helen Beal, Ranger4

GitOps is
Automation for
Cloud Native
Describe the system
& build to that plan

New ways of working
cloud led us to devops
cloud native leads us to gitops
“push code, not containers”
“operations by pull request”

GitOps follows the Logic of DevOps
• Config is code
• Code must be version controlled
• Config must be version controlled too
• What can be described can be automated
• Describe everything: code, config,
monitoring & policy; and then keep it in
version control

GitOps
• Git as the source of truth for the desired state of the whole system (yes, really,
the whole system)
• A control loop compares desired with actual state to pull changes,
enforce convergent atomic updates and write back to the log in Git
• Diff alerts, e.g. when the cluster drifts from what is in Git
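
The compare-and-converge loop described on this slide can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how any particular operator is implemented: `desired` stands in for state read from Git, `actual` for state observed in the cluster, and the names `diff_state` and `reconcile` are hypothetical.

```python
from typing import Dict, List, Tuple

State = Dict[str, str]  # resource name -> spec (here simply an image reference)


def diff_state(desired: State, actual: State) -> List[Tuple[str, str]]:
    """Return the (action, resource) pairs needed to move actual to desired."""
    actions = []
    for name in sorted(desired):
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != desired[name]:
            actions.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))
    return actions


def reconcile(desired: State, actual: State) -> State:
    """One pass of the loop: apply the diff and return the converged state."""
    converged = dict(actual)
    for action, name in diff_state(desired, actual):
        if action == "delete":
            del converged[name]
        else:
            converged[name] = desired[name]
    return converged


# Hypothetical example data: what Git declares vs. what the cluster runs.
desired = {"web": "web:v2", "api": "api:v1"}
actual = {"web": "web:v1", "worker": "worker:v1"}

drift = diff_state(desired, actual)  # a non-empty diff is what would raise an alert
converged = reconcile(desired, actual)
```

A non-empty result from `diff_state` corresponds to a diff alert; after `reconcile`, the diff against `desired` is empty.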

Atomic updates for declarative stack
Developer experience is just Git push
Best practice for Continuous Delivery with Kubernetes
[Diagram labels: Kubernetes; Current State via Observability Tools; Desired State in Git; Diff; Control & Operations; Observe, Orient, Decide, Act; Release]

What this gets us
• Any developer can use GitHub
• Anyone can join team and ship a new
app or make changes easily
• All changes can be triggered, stored,
audited and validated in Git
And we didn’t have to do anything very
new or clever

Kubernetes ❤ GitOps
“The world is envisioned
as a repo and not as a
Kubernetes installation”
- Kelsey Hightower

Kubernetes is complex, ideally you’d like to…
Make a pull request & just go to a URL to see app change
Avoid kubectl
Have “Bonus points for Metrics… If you give people visibility, they will
stop asking for tools like kubectl to do their job, because now they can
actually observe what’s happening in the cluster”

Who is talking about or doing GitOps?
Weaveworks
Cloudbees
Bitnami
OpenFaaS
Hasura
Ocado
Financial Times
& more!

About Weaveworks
● Founded in 2014, backed by Google Ventures &
Accel Partners
● Mission: help software teams go faster by
providing technologies that support cloud native
development

Team
● 40 people
● Berlin
● London
● San Francisco

Team
Some of us are known for...

About Weaveworks
● Building cloud-native OSS since 2014
(Weave Net, Moby, Kubernetes, Prometheus)
● Founding member of CNCF
● Alexis Richardson (Weaveworks CEO) is chair of
the CNCF Technical Oversight Committee
● Weave Cloud has run on Kubernetes since 2015

GitOps at Weaveworks
• We use declarative infrastructure, i.e.
Kubernetes, Docker, Terraform, … and we
“diff all the things”
• Our entire system, including code, config,
monitoring rules and dashboards, is described
in GitHub with a full audit trail
• We roll out major or minor changes as pull
requests for any updates, outages and D/R

CNCF is building a cloud platform
● Goal of a Cloud Platform for era of ubiquitous services
→ a bigger deal than the Web
→ open like Linux
→ everyone is on board this time
● Business Peeps TL;DR: Cloud Native is Cloud
● Outcome: Innovation and new Business Models for make profit

Hadoop
Typical Hadoop Project 2013

No platform?
Who wants to build a toaster?

Platforms enable Velocity
● Higher speed
● Lower barriers to entry
● Explosion of higher order systems

Velocity is a key metric in Continuous Delivery
High-performing teams deploy more frequently and have much faster lead times
They make changes with fewer failures, and recover faster from failures
● 200x more frequent deployments
● 2,555x shorter lead times
● 3x lower change failure rate
● 24x faster recovery from failures
Source: 2016 State of DevOps Report (Puppet Labs)

Make me a Velocity
Developers write code
that powers Applications
and integrates Services
deployed to a Cloud Platform that is easy, stable & operable
using best practices for Continuous Delivery at high velocity

New Cloud Platform
“Just run my code”
[Diagram labels: Code >>; Containers >>; Kubernetes; Other CNCF Projects; Local Services & Data; Infra - Cloud & DCs & Edge]

1000s of ways to “Just Run My Code”
● Serverless: OpenFaaS, Kubeless, OpenEvents, AWS Lambda…
● PaaS (OpenShift, Cloud Foundry…), MBaaS, KMaaS…
● Kubeflow, Istio, Pachyderm and other k8s native app f/works
● Declarative app def eg compose, ksonnet, ballerina
● Native general frameworks: metaparticle
● Ports: Laravel (PHP!) and other app frameworks to Kube
● Tools: Cert-manager, ChaosIQ, ..
● Explosion of higher order systems is caused by platform

Serverless & Kubernetes will converge
● Ubiquity of Kubernetes will pull serverless into the story - from “run my
containers” to “run my code”
● Consumption and packaging of services is where serverless and functions
add value today, and will be part of the Platform. AWS Lambda is a “clue”
not the “answer”.
● Commonly used programming tools will unify Kubernetes, containers,
“serverless”, managed services / APIs
● These models will be cloud agnostic
● The “pay per call” serverless business model will just be a feature of the
cloud platform management layer (eg: AWS Fargate)

Getting to a Cloud Platform
2017 2018-20 2020+
Core Platform
- Kubernetes & containers
Observability / Operability
- monitoring (prom.)
- logging (fluentd)
- tracing (jaeger, OT)
Routing
- mesh (envoy, linkerd)
- messaging (nats)
Security:
Spiffe, OPA, SAFE
Storage:
- orchestration
- CSI
- other
Interfaces:
- OpenMetrics
- OpenEvents
Developer On Ramp:
CICD, Helm packaging, &c
Marketplace of Services
and other Add-ons
“Just run my code” user
experiences for 1000s of
different use cases
>> Towards Ubiquity

Cloud native – just run my code

New ways of working
cloud led us to devops
cloud native leads to gitops
“push code not containers”
“operations by pull request”

Summary
● Cloud Platform powered by CNCF tools, Kubernetes at the core
● Multi Cloud support: Amazon, Azure, OSS
● Explosion of higher order tools and services
● GitOps for high velocity delivery pipeline

GitOps in depth
● Why Git
● Examples of what’s in Git (and image repo)
● CICD pipeline
● Security, Compliance & Audit
● Observability & Control
● Tools Overview

GitOps builds on DevOps with Git as a single source of truth for the
desired state of the system
● The entire system state is under version control and described in Git (trunk best)
● Operational changes on production clusters are made by pull request
● Rollback and audit logs are provided via Git
● When disaster strikes, the whole infrastructure can be quickly restored from Git

Canonical source of truth
[Diagram: People and Software Agents around the canonical source of truth]
● Clear model with strong separation of concerns (safety)
● Easy rollbacks and reverts (velocity)
● Taps into existing code review tools and processes
● Great compliance tool
● Collaboration point between software and humans

Dashboards
Alerts
Playbook
Kubernetes Manifests
Application configuration
Provisioning scripts
Application checklists
Recording Rules
Sealed Secrets

Grafanalib dashboard library
https://github.com/weaveworks/grafanalib

Destination config
apiVersion: config.istio.io/v1beta1
kind: DestinationPolicy
metadata:
  name: ratings-lb-policy
  namespace: default
spec:
  destination:
    name: reviews
    labels:
      version: v1
  loadBalancing:
    name: ROUND_ROBIN  # or RANDOM, LEAST_CONN
  circuitBreaker:
    simpleCb:
      maxConnections: 100
      httpMaxRequests: 1000
      httpMaxRequestsPerConnection: 10
      httpConsecutiveErrors: 7
      sleepWindow: 15m
      httpDetectionInterval: 5m
Limits outgoing connections to “v1” of the reviews service
● 100 connections
● 1000 concurrent requests
● 10 requests per connection
Load-balances in round-robin fashion across all reviews “v1” endpoints
Configures host ejection
● 7 consecutive 5xx errors
● Period of 15 minutes
● Scanned every 5 minutes
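
To make the host ejection settings concrete, here is a minimal sketch of the behaviour they describe: an endpoint that returns 7 consecutive 5xx responses is taken out of rotation for 15 minutes. This is an illustration only, not the proxy's implementation. The `Endpoint` class and both functions are hypothetical, and the sketch ejects as soon as the threshold is reached rather than modelling the 5-minute scan interval.

```python
from dataclasses import dataclass

CONSECUTIVE_ERRORS = 7      # mirrors httpConsecutiveErrors: 7
SLEEP_WINDOW_S = 15 * 60    # mirrors sleepWindow: 15m, in seconds


@dataclass
class Endpoint:
    """Hypothetical per-endpoint bookkeeping for the sketch."""
    consecutive_errors: int = 0
    ejected_until: float = 0.0  # timestamp in seconds; 0 means never ejected


def record_response(ep: Endpoint, status: int, now: float) -> None:
    """Count consecutive 5xx responses and eject the endpoint at the threshold."""
    if 500 <= status <= 599:
        ep.consecutive_errors += 1
    else:
        ep.consecutive_errors = 0  # any non-5xx response resets the streak
    if ep.consecutive_errors >= CONSECUTIVE_ERRORS:
        ep.ejected_until = now + SLEEP_WINDOW_S
        ep.consecutive_errors = 0


def is_available(ep: Endpoint, now: float) -> bool:
    """An endpoint receives traffic again once its sleep window has passed."""
    return now >= ep.ejected_until
```

Passing `now` explicitly, instead of reading the clock inside the functions, keeps the sketch deterministic and easy to test.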

Egress config
kind: EgressRule
metadata:
  name: foo-egress-rule
spec:
  destination:
    service: "*.foo.com"
  ports:
    - port: 80
      protocol: http
    - port: 443
      protocol: https
Provides access to a set of services under the foo.com domain.
The sidecar will automatically handle upgrading the connection to TLS, if desired.
● Must access as HTTP
● Example: http://mail.foo.com:443

Routing config
kind: RouteRule
metadata:
  name: reviews-rating-jason-rule
  namespace: default
spec:
  destination:
    name: ratings
  route:
    - labels:
        version: v1
      weight: 100
  match:
    source:
      name: reviews
      labels:
        version: v2
    request:
      headers:
        cookie:
          regex: "^(.*?;)?(user=jason)(;.*)?"
        uri:
For traffic going to the ratings service, send all of it to “v1” if:
● It is coming from “v2” of the reviews service
● And the URL path starts with /ratings/v2
● And the request contains a cookie with the value “user=jason”
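
The three match conditions can be read as a single predicate. The sketch below is a rough illustration rather than the mesh's own matching code: `matches_rule` is a hypothetical function, the URI prefix is taken from the slide annotation, and Python's `re.match` may differ in detail from the regex semantics the proxy uses.

```python
import re
from typing import Dict

# Cookie regex copied from the rule above.
COOKIE_RE = re.compile(r"^(.*?;)?(user=jason)(;.*)?")
# Prefix taken from the slide annotation ("URL path starts with /ratings/v2").
URI_PREFIX = "/ratings/v2"


def matches_rule(source_name: str, source_labels: Dict[str, str],
                 uri: str, headers: Dict[str, str]) -> bool:
    """Return True only if all three conditions of the routing rule hold."""
    # Condition 1: the caller is "v2" of the reviews service.
    if source_name != "reviews" or source_labels.get("version") != "v2":
        return False
    # Condition 2: the URL path starts with the configured prefix.
    if not uri.startswith(URI_PREFIX):
        return False
    # Condition 3: the cookie header carries user=jason.
    return COOKIE_RE.match(headers.get("cookie", "")) is not None
```

Requests for which the predicate is true would be sent entirely (weight 100) to “v1” of the ratings service.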

Redirect Config
Fault Injection
# HTTP Redirect snippet
spec:
  destination:
    name: ratings
  match:
    request:
      headers:
        uri: /v1/getProductRatings
  redirect:
    uri: /v1/bookRatings
    authority: bookratings.default.svc.cluster.local
---
# Fault injection snippet
spec:
  destination:
    name: reviews
  route:
    - labels:
        version: v1
  httpFault:
    abort:
      percent: 10
      httpStatus: 400
HTTP Redirection
● For all requests to /v1/getProductRatings, return a 302 with a location
of /v1/bookRatings and overwrite the host/authority header.
HTTP Fault injection
● For 10% of requests to v1 of the reviews service, fail with
a status code of 400
Timeouts, retries, request rewrites, delays configured similarly
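
What the fault injection rule asks for can be simulated in a few lines: roughly one request in ten is answered with a 400 instead of reaching the service. This is a sketch of the idea only. The `handle` function is hypothetical, and the random generator is seeded so the simulation is repeatable.

```python
import random

ABORT_PERCENT = 10   # mirrors httpFault.abort.percent
ABORT_STATUS = 400   # mirrors httpFault.abort.httpStatus


def handle(rng: random.Random, upstream_status: int = 200) -> int:
    """Return the injected status for about ABORT_PERCENT% of calls."""
    if rng.random() * 100 < ABORT_PERCENT:
        return ABORT_STATUS   # fault injected, the upstream is never called
    return upstream_status    # otherwise pass the upstream response through


rng = random.Random(0)  # fixed seed, for a repeatable simulation
statuses = [handle(rng) for _ in range(10_000)]
aborted = statuses.count(ABORT_STATUS)
print(f"{aborted} of {len(statuses)} requests aborted")
```

Over many requests the aborted share settles near 10%, which is the kind of controlled failure a team can use to check how callers of the reviews service behave.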

Pipelines & Control Loops
Deployment
App Dev Build (CI) Containers
Execution
(CD + Release
Automation)
Observe & Control

Typical CICD pipeline
[Diagram: Dev, Code Repo, CI, Image Repo and Cluster, joined by edges marked RW or RO]

There should be a firewall between CI and CD
CI CD

GitOps separation of concerns
CI tooling
Scope: test, build, publish artifacts
● Runs outside the production cluster
● Read access to code repo
● Read/Write access to image repo
● Read/Write access to integration env
● “Push” based
CD tooling
Scope: reconciliation between git and the cluster
● Runs inside the production cluster
● Read/Write access to config repo
● Read access to image repo
● Read/Write access to production cluster
● “Pull” based
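
The two permission lists above can be written down as data and checked mechanically. The sketch below is one hypothetical way to do that; the `ACCESS` table copies the slide, while the `can` helper and the resource names are assumptions made for illustration.

```python
from typing import Dict

# Access granted to each tool, as listed on the slide ("ro" = read, "rw" = read/write).
ACCESS: Dict[str, Dict[str, str]] = {
    "ci": {"code repo": "ro", "image repo": "rw", "integration env": "rw"},
    "cd": {"config repo": "rw", "image repo": "ro", "production cluster": "rw"},
}


def can(tool: str, resource: str, mode: str) -> bool:
    """Return True if the tool holds at least the requested access mode."""
    granted = ACCESS.get(tool, {}).get(resource)
    if granted is None:
        return False              # no entry means no access at all
    return mode == "ro" or granted == "rw"


# The separation of concerns in one line: CI cannot touch production.
print(can("ci", "production cluster", "ro"))
```

Keeping the table in version control alongside the rest of the config would let a review catch any change that gives CI a path into production.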

GitOps CICD pipeline
[Diagram: Dev, Code Repo, CI, Image Repo, Config Repo, CD Operator and Kubernetes API, joined by edges marked RW or RO]

GitOps enables security
● The CI tooling can be push based but has no production system
access
● The CD tooling is pull based and retains the production
credentials inside the cluster
● Developers can’t push directly to image registry
● Cluster API & credentials are never exposed/cross boundary
● Encrypted API keys and data storage credentials can be stored in
Git and decrypted at deploy time inside the cluster

Kubernetes: operator pattern
Git
Config
Kubernetes Cluster
Deployment
Service
Deploy
Operator

Write back from Kubernetes to maintain TX audit log
○ Config is code & everything is config (‘declarative infra’)
○ Code (& config!) must be version controlled
○ Anything that does not record changes in version
control is harmful – Git as Audit Log

Atomic Updates
○ Groups of changes are hard
○ Partial success / failure → redeploy cluster?
○ Want atomic update-in-place
○ Operators can do this. It’s really hard with CI scripts.
○ Git as Transaction Log
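
The all-or-nothing behaviour asked for here can be illustrated with an in-memory model. This is a toy sketch under strong assumptions: the cluster is a plain dictionary, a change with a missing spec stands in for any failure, and `apply_atomically` and `update` are hypothetical names. A real operator has to converge a live cluster instead of swapping a copy.

```python
import copy
from typing import Dict, List, Optional, Tuple

Spec = Dict[str, str]
Change = Tuple[str, Optional[Spec]]  # (resource name, new spec)


class ApplyError(Exception):
    """Raised when any change in a group cannot be applied."""


def apply_atomically(cluster: Dict[str, Spec], changes: List[Change]) -> Dict[str, Spec]:
    """Stage every change on a copy, so a failure leaves the original untouched."""
    staged = copy.deepcopy(cluster)
    for name, spec in changes:
        if spec is None:
            raise ApplyError(f"invalid spec for {name}")
        staged[name] = spec
    return staged


def update(cluster: Dict[str, Spec], changes: List[Change]) -> Tuple[Dict[str, Spec], bool]:
    """Return (new state, True) on success or (old state, False) on failure."""
    try:
        return apply_atomically(cluster, changes), True
    except ApplyError:
        return cluster, False
```

Either the whole group lands or none of it does, which is the property that is hard to get from a CI script applying changes one by one.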

Example pipeline
Git
Code
Git
Config
Container
Registry
Build
Container
(CI)
Update image in staging config
1/ Code change
2/ Merge
Staging to
Prod
Config Updater
Kubernetes Cluster
Deployment
Service
Deploy
Operator
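
The “update image in staging config” step amounts to rewriting one field of a manifest and committing the result. A minimal sketch of that rewrite follows, with the manifest held as a Python dictionary; the `update_image` helper, the registry host and the image tags are hypothetical.

```python
import copy
from typing import Any, Dict


def update_image(manifest: Dict[str, Any], container: str, new_image: str) -> Dict[str, Any]:
    """Return a copy of a Deployment manifest with one container's image replaced."""
    updated = copy.deepcopy(manifest)
    for c in updated["spec"]["template"]["spec"]["containers"]:
        if c["name"] == container:
            c["image"] = new_image
            return updated
    raise KeyError(f"container {container!r} not found")


# Hypothetical staging config, as it might be stored in the config repo.
staging = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "web"},
    "spec": {"template": {"spec": {"containers": [
        {"name": "web", "image": "registry.example.com/web:1.0.0"},
    ]}}},
}

new_staging = update_image(staging, "web", "registry.example.com/web:1.0.1")
```

Only the config changes; the deploy operator then picks up the new commit and rolls it out, so the updater itself never needs cluster credentials.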

Typical (not mandatory) Structure of a GitOps repository
● At least 1 repository per application/service
● Config & code in separate repos. Images named via labels.
● Use a separate branch per environment (maps to a Kubernetes
namespace, or cluster)
● Push changes such as the image name, health checks, etc to
staging (or feature) branches first.
● Rolling out to production involves a merge. (use `git merge -s
ours branchname` to skip a set of staging-only changes).
● Use protected branches to enforce code review requirements.

Summary: Three core principles of GitOps
1) Use declarative configuration to define your application and services.
2) All changes need to go through your Git review process: no one should be using
kubectl directly (also: don’t push from CI to prod).
3) Use an operator in the cluster to drive the observed cluster state to the desired
state, as declared by your configuration in Git.

Summary: Three technical benefits of GitOps
1) Cluster updates are a sequence of atomic transactions which succeed or fail
cleanly, and are so easy to do that your team velocity will rocket up.
2) Git provides a transaction log for rollback, audit, and team work.
3) Config and image repos act as a “firewall” between dev and prod, e.g. so that CI
cannot “own production” if hacked.

Bitnami Sealed Secrets: Encrypt Kubernetes Secrets
❯ GitOps operational mindset, all
k8s applications stored in Git.
❯ Securely automate & share
secrets publicly
❯ Asymmetric (public key)
cryptography
❯ Encrypt data up to (and inside)
K8s cluster

Validating what happened is PART OF THE DEPLOYMENT

[Diagram, built up over several slides: a loop of Declare, Implement, Monitor / Observe and Plan, automated by software agents, with default dashboards; the final build labels it Continuous Deployment]

Improving UX is PART OF DEPLOYMENT
• End user happiness is all
• Integrate GitOps CD pipeline with
tools to observe results of PRs
• Developers have to correlate UX
to operational concepts like
monitoring, tracing, logs
• Like doctors, we must be able to
validate health as well as
diagnose problems

Every service should have a unified interactive dash
(e.g. metrics + events + actions; image is from Lyft)

Fundamental
Theorem
ONLY what can be
described and
observed can be
automated and
controlled

Three GitOps Takeaways
• Git push is a great DX: “push code not containers” is best
practice for Kubernetes, Cloud Native & Serverless…
• GitOps is about more than triggering cluster deployment via a
PR, it is a full transactional operating model for the whole
stack. It is “scale invariant” and it uses a control loop to
implement a “joined up” pipeline for delivery and observability
• GitOps is different from CI ops. It is based on ‘firewall’ between
Dev and Ops, it guarantees deployments are correct or fail
cleanly, it integrates with Observability & Control tools

Choices
● DIY
● CI ops
● PaaS (Heroku, Cloud Foundry …)
● Dedicated modern CD tools

Dedicated tools for app dev and/or CICD
Not EITHER / OR
● Spinnaker
● Helm
● Weave Flux / Weave Cloud
● Jenkins X
● Skaffold
● Gitkube
● Harness

Spinnaker
● Created by Netflix for Netflix
● Jenkins++ CICD tool, with Pipeline Management and Release Management
● Pipelines GUI, nested pipelines, canary as pipeline…
● Designed for VMs – doesn’t “speak Kubernetes” (also: Terraform?)
● Good if your Release model is “Deploy my VMs and start my cluster”
● “CI Ops”, so Not Good if your Release model is atomic updates pulled by operator
● Does not use Git, uses external DB.
● Audit log & desired state not complete
● Generally complicated with lots of moving parts. Operationally burdensome even if
run in Kubernetes

Helm
● V2 of Kubernetes templating system
● Writes a group of changes as a “chart” – so can be a packaging tool for Kubernetes
● De facto “app API” for Kubernetes – great for getting started
● *** IS NOT A CD TOOL ***
● CI + Helm is a dangerous pattern
● Non-atomic
● Non-deterministic
● Non-compositional
● Tiller

Weave Flux
● Created for Kubernetes by Weaveworks, will go to CNCF
● Only does Release Management: pull based CD, policy, staging, audit trail
● Works with any CI but *** does not connect to CI ***
● Watches repos. Updates on label & config change, no need for a “full rebuild”
● Kubernetes native – all Kube objects, also Helm, CRDs – make Helm do GitOps
● Secure (if cluster is)
● Orchestrator forces convergent atomic updates on cluster even for group of
changes – succeeds or fails cleanly, no need for full cluster reboot
● COMPLETE record in Git kept in sync. Rollback & roll forward
● Diffs – continually monitors cluster & repo to spot drift

Gitkube
● Simple GitOps model for DEV with Kubernetes
● Push to gitkube remote server that lives in your cluster (i.e. runs custom git server
inside Kubernetes cluster)
● Runs build for you, instead of CI. Couples continuous build of Docker images &
continuous deployment to the cluster. These should be decoupled.
● Pushes container into Kubernetes, but not Kube objects, not Helm, not CRDs
● Not atomic or idempotent
● No built in monitoring, so deployments may not converge
● Does not track changes in Git

GitOps developer toolkit?
● Skaffold
● Weave Flux
● Jenkins X
● Minikube
● Docker

Anything missing?
Developers
(That means YOU)

Thank you!
Alexis Richardson
alexis@weave.works
@monadic
facebook.com/WeaveworksInc/
twitter.com/weaveworks
slack.weave.works/
youtube.com/c/WeaveWorksInc
linkedin.com/company/weaveworks
@weaveworks
https://weave.works
