Continuous Lifecycle London 2018 Event Keynote

Today it’s all about delivering velocity without compromising on quality, yet it’s becoming increasingly difficult for organisations to keep up with the challenges of current release management and traditional operations. The demand for developers to own the end-to-end delivery, including operational ownership, is increasing. A “you build it, you own it” development process requires tools that developers know and understand. So I’d like to introduce “GitOps”: an agile software lifecycle for modern applications.

In this session, I will discuss these industry challenges, including current CICD trends and how they’re converging with operations and monitoring. I’ll also illustrate the GitOps model, identify best practices and tools to use, and explain how you can benefit from adopting this methodology, which builds on best practices going back 10-15 years.

  1. 1. GitOps Git push all the things Alexis Richardson CEO, Weaveworks TOC Chair, CNCF @monadic May 2018
  2. 2. Hello
  3. 3. Hello ● WTF is GitOps ● Why is Cloud Native relevant ● How does GitOps work and in what ways is it different from $MY_DEVOPS ● Tools ● Recap
  4. 4. Meet Qordoba ● SF-based team uses machine learning to create “local” marketing UX for big brands ● Rapid iteration while obeying SOC2 compliance ● Google Cloud – Kubernetes & CI ● Weave Cloud – single continuous delivery & observability pipeline
  5. 5. Impact ● Over 30 releases per day per team, up from 1-2 per week across all teams 1) Estimated time needed to fix prod software bugs: ~60% less time 2) Estimated time to respond to customer requests: ~43% less time 3) Uptime: 99% → 100% (so far…!)
  6. 6. Kubernetes: declarative infrastructure & orchestration
  7. 7. Image credit: Helen Beal, Ranger4 At least a decade of DevOps best practices
  8. 8. GitOps is Automation for Cloud Native Describe the system & build to that plan
  9. 9. New ways of working ● Cloud led us to DevOps ● Cloud native leads us to GitOps ● “Push code, not containers” ● “Operations by pull request”
  10. 10. GitOps follows the Logic of DevOps • Config is code • Code must be version controlled • Config must be version controlled too
  11. 11. GitOps follows the Logic of DevOps • Config is code • Code must be version controlled • Config must be version controlled too • What can be described can be automated • Describe everything: code, config, monitoring & policy; and then keep it in version control
  12. 12. GitOps • Git as a source of truth for the desired state of the whole system (yes, really the whole system) • Control loop compares desired with actual state to pull changes, enforce convergent atomic updates, and write back to a log in Git • Diff alerts, e.g.:
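The control loop on this slide can be sketched in a few lines of Python. This is a minimal illustration only, assuming the desired state (from Git) and the actual state (from the cluster) can each be modelled as a mapping from resource name to image; the names `diff_state` and `reconcile` are hypothetical and do not come from any particular tool.

```python
from typing import Dict, List, Tuple

State = Dict[str, str]  # resource name -> image reference (illustrative model)


def diff_state(desired: State, actual: State) -> List[Tuple[str, str]]:
    """Return the actions needed to move `actual` towards `desired`."""
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))
    return actions


def reconcile(desired: State, actual: State) -> State:
    """One pass of the loop: apply every action and return the new actual state."""
    new_state = dict(actual)
    for action, name in diff_state(desired, actual):
        if action == "delete":
            del new_state[name]
        else:
            new_state[name] = desired[name]
    return new_state


desired = {"web": "web:v2", "api": "api:v1"}        # what Git says
actual = {"web": "web:v1", "worker": "worker:v1"}   # what the cluster runs
print(diff_state(desired, actual))
print(reconcile(desired, actual) == desired)
```

A non-empty diff is also the natural trigger for a "diff alert": it means the cluster has drifted from what Git declares.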
  13. 13. Best practice for Continuous Delivery with Kubernetes • Atomic updates for declarative stack • Developer experience is just Git push • Diagram labels: Desired State in Git; Kubernetes Current State via Observability Tools; Diff; Control & Operations; Observe, Orient, Decide, Act; Release
  14. 14. What this gets us • Any developer can use GitHub • Anyone can join team and ship a new app or make changes easily • All changes can be triggered, stored, audited and validated in Git And we didn’t have to do anything very new or clever
  15. 15. Kubernetes ❤ GitOps: “The world is envisioned as a repo and not as a Kubernetes installation” - Kelsey Hightower
  16. 16. Kubernetes is complex; ideally you’d like to… • Make a pull request & just go to a URL to see app change • Avoid kubectl • Have “Bonus points for Metrics… If you give people visibility, they will stop asking for tools like kubectl to do their job, because now they can actually observe what’s happening in the cluster”
  17. 17. Who is talking about or doing GitOps? Weaveworks, CloudBees, Bitnami, OpenFaaS, Hasura, Ocado, Financial Times & more!
  18. 18. About Weaveworks ● Founded in 2014, backed by Google Ventures & Accel Partners ● Mission: help software teams go faster by providing technologies that support cloud native development
  19. 19. Team ● 40 people ● Berlin ● London ● San Francisco
  20. 20. Team: Some of us are known for...
  21. 21. About Weaveworks ● Building cloud-native OSS since 2014 (Weave Net, Moby, Kubernetes, Prometheus) ● Founding member of CNCF ● Alexis Richardson (Weaveworks CEO) is chair of the CNCF Technical Oversight Committee ● Weave Cloud has run on Kubernetes since 2015
  22. 22. GitOps at Weaveworks • We use declarative infrastructure, i.e. Kubernetes, Docker, Terraform, … and we “diff all the things” • Our entire system, including code, config, monitoring rules and dashboards, is described in GitHub with full audit trail • We roll out major or minor changes as pull requests for any updates, outages and D/R
  23. 23. Cloud Native
  24. 24. Cloud Native
  25. 25. Cloud Native
  26. 26. Copenhagen: Home of Lego
  27. 27. Home of Lego
  28. 28. CNCF in 2016
  29. 29. CNCF in 2018
  30. 30. CNCF is building a cloud platform ● Goal of a Cloud Platform for era of ubiquitous services → a bigger deal than the Web → open like Linux → everyone is on board this time ● Business Peeps TL;DR: Cloud Native is Cloud ● Outcome: Innovation and new Business Models for make profit
  31. 31. Velocity
  32. 32. Hadoop Typical Hadoop Project 2013
  33. 33. 2018: Kubeflow
  34. 34. Componentisation
  35. 35. Componentisation
  36. 36. No platform? Who wants to build a toaster?
  37. 37. Platforms enable Velocity ● Higher speed ● Lower barriers to entry ● Explosion of higher order systems
  38. 38. Velocity is a key metric in Continuous Delivery. High-performing teams deploy more frequently and have much faster lead times. They make changes with fewer failures, and recover faster from failures. ● 200x more frequent deployments ● 2,555x shorter lead times ● 3x lower change failure rate ● 24x faster recovery from failures. Source: 2016 State of DevOps Report (Puppet Labs)
  39. 39. Make me a Velocity Developers write code that powers Applications and integrates Services deployed to a Cloud Platform that is easy, stable & operable using best practices for Continuous Delivery at high velocity
  40. 40. New Cloud Platform “Just run my code” Kubernetes Infra - Cloud & DCs & Edge Other CNCF Projects Local Services & Data Code >> Containers >>
  41. 41. 1000s of ways to “Just Run My Code” ● Serverless: OpenFaaS, Kubeless, OpenEvents, AWS Lambda… ● PaaS (OpenShift, Cloud Foundry…), MBaaS, KMaaS… ● Kubeflow, Istio, Pachyderm and other k8s native app frameworks ● Declarative app definitions, e.g. Compose, ksonnet, Ballerina ● Native general frameworks: Metaparticle ● Ports: Laravel (PHP!) and other app frameworks to Kube ● Tools: cert-manager, ChaosIQ… ● Explosion of higher order systems is caused by platform
  42. 42. Serverless & Kubernetes will converge ● Ubiquity of Kubernetes will pull serverless into the story - from “run my containers” to “run my code” ● Consumption and packaging of services is where serverless and functions add value today, and will be part of the Platform. AWS Lambda is a “clue” not the “answer”. ● Commonly used programming tools will unify Kubernetes, containers, “serverless”, managed services / APIs ● These models will be cloud agnostic ● The “pay per call” serverless business model will just be a feature of the cloud platform management layer (eg: AWS Fargate)
  43. 43. Getting to a Cloud Platform (timeline: 2017, 2018-20, 2020+ >> Towards Ubiquity) ● Core Platform: Kubernetes & containers ● Observability / Operability: monitoring (Prometheus), logging (Fluentd), tracing (Jaeger, OT) ● Routing: mesh (Envoy, Linkerd), messaging (NATS) ● Security: SPIFFE, OPA, SAFE ● Storage: orchestration, CSI, other ● Interfaces: OpenMetrics, OpenEvents ● Developer On Ramp: CICD, Helm packaging, &c ● Marketplace of Services and other Add-ons ● “Just run my code” user experiences for 1000s of different use cases
  44. 44. Cloud native – just run my code
  45. 45. Practice Tribes gotta tribe
  46. 46. New ways of working ● Cloud led us to DevOps ● Cloud native leads to GitOps ● “Push code not containers” ● “Operations by pull request”
  47. 47. Summary ● Cloud Platform powered by CNCF tools, Kubernetes at the core ● Multi Cloud support: Amazon, Azure, OSS ● Explosion of higher order tools and services ● GitOps for high velocity delivery pipeline
  48. 48. So about GitOps
  49. 49. GitOps in depth: Overview ● Why Git ● Examples of what’s in Git (and image repo) ● CICD pipeline ● Security, Compliance & Audit ● Observability & Control ● Tools
  50. 50. GitOps builds on DevOps with Git as a single source of truth for the desired state of the system ● The entire system state is under version control and described in Git (trunk best) ● Operational changes on production clusters are made by pull request ● Rollback and audit logs are provided via Git ● When disaster strikes, the whole infrastructure can be quickly restored from Git
  51. 51.
  52. 52. Canonical source of truth
  53. 53. Canonical source of truth People
  54. 54. Canonical source of truth People Software Agents
  55. 55. Canonical source of truth People Software Agents Software Agents
  56. 56. Canonical source of truth People Software Agents Software Agents
  57. 57. Canonical source of truth ● Clear model with strong separations of concerns (safety) ● Easy rollbacks and reverts (velocity) ● Tapping into existing code review tools and processes ● Great compliance tool ● Collaboration point between software and humans
  58. 58. ?
  59. 59. Dashboards, Alerts, Playbook, Kubernetes Manifests, Application configuration, Provisioning scripts, Application checklists, Recording Rules, Sealed Secrets
  60. 60.
  61. 61. Grafanalib dashboard library
  62. 62. YAML Service Checklist
  63. 63. Destination config
        apiVersion:
        kind: DestinationPolicy
        metadata:
          name: ratings-lb-policy
          namespace: default
        spec:
          destination:
            name: reviews
            labels:
              version: v1
          loadBalancing:
            name: ROUND_ROBIN
          circuitBreaker:
            simpleCb:
              maxConnections: 100
              httpMaxRequests: 1000
              httpMaxRequestsPerConnection: 10
              httpConsecutiveErrors: 7
              sleepWindow: 15m
              httpDetectionInterval: 5m
      Load-balances in round-robin fashion across all reviews “v1” endpoints (other options: RANDOM, LEAST_CONN)
      Limits outgoing connections to “v1” of the reviews service ● 100 connections ● 1000 concurrent requests ● 10 requests per connection
      Configures host ejection ● 7 consecutive 5xx errors ● Period of 15 minutes ● Scanned every 5 minutes
  64. 64. Egress config
        apiVersion:
        kind: EgressRule
        metadata:
          name: foo-egress-rule
        spec:
          destination:
            service: *
          ports:
          - port: 80
            protocol: http
          - port: 443
            protocol: https
      Provides access to a set of services under the domain. Sidecar will handle automatically upgrading connection to TLS, if desired. ● Must access as HTTP ● Example:
  65. 65. Routing config
        apiVersion:
        kind: RouteRule
        metadata:
          name: reviews-rating-jason-rule
          namespace: default
        spec:
          destination:
            name: ratings
          route:
          - labels:
              version: v1
            weight: 100
          match:
            source:
              name: reviews
              labels:
                version: v2
            request:
              headers:
                cookie:
                  regex: "^(.*?;)?(user=jason)(;.*)?"
                uri:
      For traffic going to the ratings service, send all of it to “v1” if: ● It is coming from “v2” of the reviews service ● And the URL path starts with /ratings/v2 ● And the request contains a cookie with the value “user=jason”
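The three match conditions of this rule can be sketched as a plain Python predicate. This is only an illustration of the matching logic, not how the service mesh evaluates rules: the function name `matches_rule` is hypothetical, the `/ratings/v2` prefix is taken from the slide's description rather than from the (elided) `uri` field, and whole-string regex matching is an assumption.

```python
import re

# Cookie pattern copied from the RouteRule on the slide.
COOKIE_RE = re.compile(r"^(.*?;)?(user=jason)(;.*)?")


def matches_rule(source_name: str, source_version: str, path: str, cookie: str) -> bool:
    """True if a request to the ratings service should be sent to "v1"."""
    return (
        source_name == "reviews"
        and source_version == "v2"
        and path.startswith("/ratings/v2")            # prefix from the slide text
        and COOKIE_RE.fullmatch(cookie) is not None   # assumed: full-string match
    )


print(matches_rule("reviews", "v2", "/ratings/v2/42", "session=abc;user=jason"))
print(matches_rule("reviews", "v1", "/ratings/v2/42", "user=jason"))
```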
  66. 66. Redirect Config & Fault Injection
        # HTTP Redirect snippet
        spec:
          destination:
            name: ratings
          match:
            request:
              headers:
                uri: /v1/getProductRatings
          redirect:
            uri: /v1/bookRatings
            authority: bookratings.default.svc.cluster.local
        ---
        # Fault injection snippet
        spec:
          destination:
            name: reviews
          route:
          - labels:
              version: v1
          httpFault:
            abort:
              percent: 10
              httpStatus: 400
      HTTP Redirection ● For all requests to /v1/getProductRatings, return a 302 with a location of /v1/bookRatings and overwrite the host/authority header.
      HTTP Fault injection ● For 10% of requests to v1 of the reviews service, fail with a status code of 400
      Timeouts, retries, request rewrites and delays are configured similarly
  67. 67. Pipelines & Security
  68. 68. Pipelines & Control Loops Deployment App Dev Build (CI) Containers Execution (CD + Release Automation) Observe & Control
  69. 69. Typical CICD pipeline ● Diagram labels: Dev, Code Repo, CI, Image Repo, Cluster ● Access labels on the arrows: RW, RW, RW, RW, RO, RW, RO
  70. 70. There should be a firewall between CI and CD CI CD
  71. 71. GitOps separation of concerns. CI tooling (scope: test, build, publish artifacts) ● Runs outside the production cluster ● Read access to code repo ● Read/Write access to image repo ● Read/Write access to integration env ● “Push” based. CD tooling (scope: reconciliation between git and the cluster) ● Runs inside the production cluster ● Read/Write access to config repo ● Read access to image repo ● Read/Write access to production cluster ● “Pull” based
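The access split on this slide can be written down as a small permission table. The sketch below is illustrative only: the resource keys and the helper `can` are hypothetical names, and the table simply encodes the read and read/write grants listed above.

```python
# Access grants as listed on the slide: "ro" = read only, "rw" = read/write.
ACCESS = {
    "ci": {"code_repo": "ro", "image_repo": "rw", "integration_env": "rw"},
    "cd": {"config_repo": "rw", "image_repo": "ro", "production_cluster": "rw"},
}


def can(tool: str, resource: str, mode: str) -> bool:
    """Return True if `tool` may access `resource` in `mode` ("read" or "write")."""
    granted = ACCESS.get(tool, {}).get(resource)
    if granted is None:
        return False                      # no grant at all
    return mode == "read" or granted == "rw"


print(can("ci", "production_cluster", "read"))   # CI never touches production
print(can("cd", "production_cluster", "write"))  # only the in-cluster CD tooling does
```

The point of the table is what is absent: CI has no entry for the production cluster, so a compromised CI system has nothing to push with.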
  72. 72. GitOps CICD pipeline ● Diagram labels: Dev, Code Repo, CI, Image Repo, Config Repo, CD Operator, Kubernetes API ● Access labels on the arrows: RO, RO, RO, RW, RW, RW, RW
  73. 73. GitOps enables security ● The CI tooling can be push based but has no production system access ● The CD tooling is pull based and retains the production credentials inside the cluster ● Developers can’t push directly to image registry ● Cluster API & credentials are never exposed/cross boundary ● Encrypted API keys and data storage credentials can be stored in Git and decrypted at deploy time inside the cluster
  74. 74. CI ops
  75. 75. Kubernetes: operator pattern Git Config Kubernetes Cluster Deployment Service Deploy Operator
  76. 76. Write back from Kubernetes to maintain TX audit log ○ Config is code & everything is config (‘declarative infra’) ○ Code (& config!) must be version controlled ○ Anything that does not record changes in version control is harmful – Git as Audit Log
  77. 77. Atomic Updates ○ Groups of changes are hard ○ Partial success / failure → redeploy cluster? ○ Want atomic update-in-place ○ Operators can do this. It’s really hard with CI scripts. ○ Git as Transaction Log
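The "succeed or fail cleanly" idea can be sketched as an all-or-nothing apply over a group of changes. This is a toy in-memory model, assuming the cluster can be represented as a dictionary; real operators work against a live API and cannot simply copy the cluster. The names `set_image` and `apply_atomically` are hypothetical.

```python
import copy
from typing import Callable, Dict, List

Cluster = Dict[str, str]              # resource name -> image (illustrative model)
Change = Callable[[Cluster], None]


def set_image(name: str, image: str) -> Change:
    """Build a change that updates the image of an existing resource."""
    def change(cluster: Cluster) -> None:
        if name not in cluster:
            raise KeyError(f"unknown resource: {name}")
        cluster[name] = image
    return change


def apply_atomically(cluster: Cluster, changes: List[Change]) -> bool:
    """Apply all changes or none of them; return True on success."""
    staged = copy.deepcopy(cluster)   # work on a copy, never on the live state
    try:
        for change in changes:
            change(staged)
    except Exception:
        return False                  # live state is untouched
    cluster.clear()
    cluster.update(staged)            # commit the whole group at once
    return True


cluster = {"web": "web:v1", "api": "api:v1"}
ok = apply_atomically(cluster, [set_image("web", "web:v2"), set_image("db", "db:v9")])
print(ok, cluster)
```

In the example the second change fails, so the first one is not applied either; there is no partial state to clean up.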
  78. 78. Example pipeline ● Steps: 1/ Code change 2/ Merge Staging to Prod ● Diagram labels: Git Code; Build Container (CI); Container Registry; Update image in staging config; Config Updater; Git Config; Deploy Operator; Kubernetes Cluster (Deployment, Service)
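The two config steps in this pipeline (update the image in the staging config, then merge staging to prod) can be sketched as pure functions over config data. This is a simplified illustration, assuming each environment's config is a mapping from deployment name to image reference; `update_image` and `promote` are hypothetical names, and in practice both steps would be Git commits and merges.

```python
from typing import Dict

Config = Dict[str, str]  # deployment name -> image reference (illustrative model)


def update_image(config: Config, deployment: str, image: str) -> Config:
    """Return a new config with the deployment pointing at the freshly built image."""
    updated = dict(config)
    updated[deployment] = image
    return updated


def promote(staging: Config, prod: Config) -> Config:
    """Merge staging to prod: prod takes over staging's image references."""
    merged = dict(prod)
    merged.update(staging)
    return merged


staging = {"web": "registry.example/web:1.0"}   # hypothetical image names
prod = {"web": "registry.example/web:1.0"}

staging = update_image(staging, "web", "registry.example/web:1.1")  # 1/ code change built by CI
prod = promote(staging, prod)                                       # 2/ merge staging to prod
print(prod)
```

Neither function talks to the cluster; the deploy operator picks up the changed config on its own.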
  79. 79. Typical (not mandatory) Structure of a GitOps repository ● At least 1 repository per application/service ● Config & code in separate repos. Images named via labels. ● Use a separate branch per environment (maps to a Kubernetes namespace, or cluster) ● Push changes such as the image name, health checks, etc to staging (or feature) branches first. ● Rolling out to production involves a merge. (use `git merge -s ours branchname` to skip a set of staging-only changes). ● Use protected branches to enforce code review requirements.
  80. 80. Staging
  81. 81. Summary: Three core principles of GitOps 1) Use declarative configuration to define your application and services. 2) All changes need to go through your git review process – no one should be using kubectl directly (also: don’t push from CI to prod). 3) Use an operator in the cluster to drive the observed cluster state to the desired state, as declared by your configuration in git.
  82. 82. Summary: Three technical benefits of GitOps 1) Cluster updates are a sequence of atomic transactions which succeed or fail cleanly, and are so easy to do that your team velocity will rocket up. 2) Git provides a transaction log for rollback, audit, and team work. 3) Config and image repos act as a “firewall” between dev and prod, e.g. so that CI cannot “own production” if hacked.
  83. 83. Sealed Secrets (Bitnami): Encrypt Kubernetes Secrets ❯ GitOps operational mindset, all k8s applications stored in Git ❯ Securely automate & share secrets publicly ❯ Asymmetric (public key) cryptography ❯ Encrypt data up to (and inside) K8s cluster
  84. 84. Observability & Control
  85. 85. Validating what happened is PART OF THE DEPLOYMENT
  86. 86. Declare
  87. 87. Declare Implement
  88. 88. Declare Implement Automated by software agents
  89. 89. Declare Implement Monitor / Observe Automated by software agents
  90. 90. Declare Implement Monitor / Observe Default dashboards Automated by software agents
  91. 91. Declare Implement Monitor / Observe Plan Automated by software agents Default dashboards
  92. 92. Declare Implement Monitor / Observe Plan Automated by software agents Default dashboards
  93. 93. Declare Implement Monitor / Observe Plan Automated by software agents Default dashboards
  94. 94. Declare Implement Monitor Plan Continuous Deployment Default dashboards Automated by software agents
  95. 95. Improving UX is PART OF DEPLOYMENT • End user happiness is all • Integrate GitOps CD pipeline with tools to observe results of PRs • Developers have to correlate UX to operational concepts like monitoring, tracing, logs • Like doctors, we must be able to validate health as well as diagnose problems
  96. 96. Every service should have a unified interactive dash (eg. metrics + events + actions; image is from Lyft)
  97. 97. Fundamental Theorem ONLY what can be described and observed can be automated and controlled
  98. 98. Three GitOps Takeaways • Git push is a great DX – “push code not containers” – best practice for Kubernetes, Cloud Native & Serverless… • GitOps is about more than triggering cluster deployment via a PR; it is a full transactional operating model for the whole stack. It is “scale invariant” and it uses a control loop to implement a “joined up” pipeline for delivery and observability • GitOps is different from CI ops. It is based on a “firewall” between Dev and Ops, it guarantees deployments are correct or fail cleanly, and it integrates with Observability & Control tools
  99. 99. FASTER, BETTER & SAFER
  100. 100. Tools?
  101. 101. Choices ● DIY ● CI ops ● PaaS (Heroku, Cloud Foundry …) ● Dedicated modern CD tools
  102. 102. Dedicated tools for app dev and/or CICD (not EITHER / OR) ● Spinnaker ● Helm ● Weave Flux / Weave Cloud ● Jenkins X ● Skaffold ● Gitkube ● Harness
  103. 103. Spinnaker ● Created by Netflix for Netflix ● Jenkins++ CICD tool, with Pipeline Management and Release Management ● Pipelines GUI, nested pipelines, canary as pipeline… ● Designed for VMs – doesn’t “speak Kubernetes” (also: Terraform?) ● Good if your Release model is “Deploy my VMs and start my cluster” ● “CI Ops”, so Not Good if your Release model is atomic updates pulled by operator ● Does not use Git, uses external DB ● Audit log & desired state not complete ● Generally complicated with lots of moving parts. Operationally burdensome even if run in Kubernetes
  104. 104. Helm ● V2 of Kubernetes templating system ● Writes a group of changes as a “chart” – so can be a packaging tool for Kubernetes ● De facto “app API” for Kubernetes – great for getting started ● *** IS NOT A CD TOOL *** ● CI + Helm is a dangerous pattern ● Non-atomic ● Non-deterministic ● Non-compositional ● Tiller
  105. 105. Weave Flux ● Created for Kubernetes by Weaveworks, will go to CNCF ● Only does Release Management: pull based CD, policy, staging, audit trail ● Works with any CI but *** does not connect to CI *** ● Watches repos. Updates on label & config change, no need for a “full rebuild” ● Kubernetes native – all Kube objects, also Helm, CRDs – make Helm do GitOps ● Secure (if cluster is) ● Orchestrator forces convergent atomic updates on cluster even for group of changes – succeeds or fails cleanly, no need for full cluster reboot ● COMPLETE record in Git kept in sync. Rollback & roll forward ● Diffs – continually monitors cluster & repo to spot drift
  106. 106. Gitkube ● Simple GitOps model for DEV with Kubernetes ● Push to gitkube remote server that lives in your cluster (i.e. runs custom git server inside Kubernetes cluster) ● Runs build for you, instead of CI. Couples continuous build of Docker images & continuous deployment to the cluster. These should be decoupled. ● Pushes container into Kubernetes, but not Kube objects, not Helm, not CRDs ● Not atomic or idempotent ● No built in monitoring, so deployments may not converge ● Does not track changes in Git
  107. 107. GitOps developer toolkit? ● Skaffold ● Weave Flux ● Jenkins X ● Minikube ● Docker
  108. 108. Weave Cloud
  109. 109. Commercial
  110. 110.
  111. 111.
  112. 112.
  113. 113.
  114. 114. Anything missing?
  115. 115. Anything missing? Developers (That means YOU)
  116. 116. Thank you! Alexis Richardson @monadic @weaveworks