Why Kubernetes? Cloud Native and Developer Experience at Zalando - OWL Tech & Innovation Day

Talk held on 2019-09-26 in Paderborn:

The keynote:
Why Kubernetes? Cloud Native and Developer Experience at Zalando

Kubernetes has established itself as the de facto standard for cloud native platforms. Why? What advantages and pitfalls arise in practice?
Using Zalando as an example, Henning Jacobs shows how Kubernetes serves as infrastructure for 1200+ developers, which aspects make Kubernetes unique despite its complexity, and what that means for the developer experience.

Henning Jacobs is Head of Developer Productivity at Zalando and thus responsible for the developer experience of more than 200 Zalando delivery teams.
With his list of Kubernetes Failure Stories, Henning shows that Kubernetes is an excellent platform for sharing experience.

https://teuto.net/owl-tech-innovation-day/


  1. OWL TECH & INNOVATION DAY 2019-09-26 HENNING JACOBS @try_except_ Why Kubernetes?
  2. ROLLING OUT KUBERNETES? "We are rolling out Kubernetes to production next month and I'm interested to hear from people who made that step already."
  3. DON'T USE IT !!!!!
  4. DON'T USE IT !!!!!
  5. (no text on slide)
  6. KUBERNETES FAILURE STORIES
  7. ZALANDO AT A GLANCE
      • ~ 5.4 billion EUR revenue 2018
      • > 250 million visits per month
      • > 15,000 employees in Europe
      • > 79% of visits via mobile devices
      • > 26 million active customers
      • > 300,000 product choices
      • ~ 2,000 brands
      • 17 countries
  8. A BRIEF HISTORY OF ZALANDO TECH
  9. 2010 "Sysop-Test" "QA-Test"
  10. 2013: SELF SERVICE
  11. 2015: RADICAL AGILITY AWS STUPS DOCKER DEPLOY SSH ACCESS AUDIT REPORTS FULL AWS ACCESS Teams have admin access & full responsibility
  12. 2015: ISOLATED AWS ACCOUNTS (diagram labels: Internet, *.abc.example.org, *.xyz.example.org, Team ABC, Team XYZ, ELB, EC2)
  13. 2019: SCALE 140 clusters, 396 accounts
  14. 2019: DEVELOPERS USING KUBERNETES
  15. Platform: > 1100 developers, > 200 development teams
  16. YOU BUILD IT, YOU RUN IT "The traditional model is that you take your software to the wall that separates development and operations, and throw it over and then forget about it. Not at Amazon. You build it, you run it. This brings developers into contact with the day-to-day operation of their software. It also brings them into day-to-day contact with the customer." - A Conversation with Werner Vogels, ACM Queue, 2006
  17. ON-CALL: YOU OWN IT, YOU RUN IT "When things are broken, we want people with the best context trying to fix things." - Blake Scrivener, Netflix SRE Manager
  18. DEVELOPER JOURNEY Consistent story that models all aspects of SW dev
  19. Developer Journey
  20. Developer Journey: Correctness, Compliance, GDPR, Security, Cost Efficiency, 24x7 On Call, Governance, Resilience, Capacity, ...
  21. DEVELOPER PRODUCTIVITY (diagram labels: Code, Build, Test, Deploy, Operate, Setup, Cloud Native Application Runtime)
  22. CLOUD NATIVE ".. uses an open source software stack to deploy applications as microservices, packaging each part into its own container, and dynamically orchestrating those containers to optimize resource utilization. Cloud native technologies enable software developers to build great products faster." - https://www.cncf.io/
  23. CONTAINERS END-TO-END (diagram labels: Code, Build, Test, Deploy, Operate, Setup, Cloud Native Application Runtime)
  24. CONTAINERS
  25. CONTAINERS
  26. PLAN & SETUP
  27. Plan: Stories, Rules of Play, Tech Radar
  28. Setup: Application Bootstrapping
  29. BUILD & TEST
  30. CONTINUOUS DELIVERY PLATFORM: BUILD (diagram labels: Git code push, CDP)
  31. DEPLOY
  32. Deploy: Kubernetes
  33. DEPLOYMENT CONFIGURATION
      ├── deploy/apply
      │   ├── deployment.yaml
      │   ├── credentials.yaml   # Zalando IAM
      │   ├── ingress.yaml
      │   └── service.yaml
      └── delivery.yaml          # Zalando CI/CD
  34. INGRESS.YAML
      kind: Ingress
      metadata:
        name: "..."
      spec:
        rules:
          # DNS name your application should be exposed on
          - host: "myapp.foo.example.org"
            http:
              paths:
                - backend:
                    serviceName: "myapp"
                    servicePort: 80
  35. TEMPLATING: MUSTACHE
      kind: Ingress
      metadata:
        name: "..."
      spec:
        rules:
          # DNS name your application should be exposed on
          - host: "{{{APPLICATION}}}.example.org"
            http:
              paths:
                - backend:
                    serviceName: "{{{APPLICATION}}}"
                    servicePort: 80
  36. CONTINUOUS DELIVERY PLATFORM
  37. CDP: DEPLOY "glorified kubectl apply"
  38. CDP: OPTIONAL APPROVAL
  39. STACKSET: TRAFFIC SWITCHING github.com/zalando-incubator/stackset-controller
  40. STACKSET CRD
      kind: StackSet
      ...
      spec:
        ingress:
          hosts: ["foo.example.org"]
          backendPort: 8080
        stackLifecycle:
          scaledownTTLSeconds: 1800
          limit: 5
        stackTemplate:
          spec:
            podTemplate:
              ...
      github.com/zalando-incubator/stackset-controller
  41. TRAFFIC SWITCHING STEPS IN CDP github.com/zalando-incubator/stackset-controller
  42. Deploy: You build it, you run it!
  43. EMERGENCY ACCESS SERVICE
      Emergency access by referencing Incident:
        zkubectl cluster-access request --emergency -i INC REASON
      Privileged production access via 4-eyes:
        zkubectl cluster-access request REASON
        zkubectl cluster-access approve USERNAME
  44. KUBERNETES WEB VIEW kubectl get pods,stacks,deploys,..
  45. SEARCHING ACROSS 140+ CLUSTERS codeberg.org/hjacobs/kube-web-view
  46. INTEGRATIONS
  47. CLOUD FORMATION VIA CI/CD ("Infrastructure as Code")
      ├── deploy/apply
      │   ├── deployment.yaml     # Kubernetes
      │   ├── cf-iam-role.yaml    # AWS IAM Role
      │   ├── cf-rds.yaml         # AWS RDS Database
      │   ├── kube-ingress.yaml
      │   ├── kube-secret.yaml
      │   └── kube-service.yaml
      └── delivery.yaml           # CI/CD config
  48. POSTGRES OPERATOR Application to manage PostgreSQL clusters on Kubernetes. >500 clusters running on Kubernetes. github.com/zalando/postgres-operator
  49. Elasticsearch in Kubernetes: Elasticsearch, 2,500 vCPUs, 1 TB RAM. github.com/zalando-incubator/es-operator/
  50. SUMMARY
      • Application Bootstrapping
      • Git as source of truth and UI
      • 4-eyes principle for master/production
      • Extensible Kubernetes API as primary interface
      • OAuth/IAM credentials
      • PostgreSQL, Elasticsearch
      • CloudFormation for proprietary AWS services
  51. MONITORING & COST EFFICIENCY
  52. KUBERNETES RESOURCE REPORT github.com/hjacobs/kube-resource-report
  53. RESOURCE REPORT: TEAMS Sorting teams by Slack Costs github.com/hjacobs/kube-resource-report
  54. KUBERNETES APPLICATION DASHBOARD
  55. https://github.com/hjacobs/kube-ops-view
  56. VERTICAL POD AUTOSCALER limit/requests adapted by VPA
  57. DOWNSCALING DURING OFF-HOURS github.com/hjacobs/kube-downscaler (chart label: Weekend)
  58. KUBERNETES JANITOR
      ● TTL and expiry date annotations, e.g.
        ○ set time-to-live for your test deployment
      ● Custom rules, e.g.
        ○ delete everything without "app" label after 7 days
      github.com/hjacobs/kube-janitor
  59. EC2 SPOT NODES 72% savings
  60. STABILITY ↔ EFFICIENCY (diagram labels: Slack, Autoscaling Buffer, Disable Overcommit, Cluster Overhead, Resource Report, HPA, VPA, Downscaler, Janitor, EC2 Spot)
  61. DELIVERY PERFORMANCE METRICS
      • Lead Time
      • Release Frequency
      • Time to Restore Service
      • Change Fail Rate
      srcco.de/posts/accelerate-software-delivery-performance.html
  62. CONTAINERS From "Accelerate: The Science of Lean Software and DevOps"
  63. DELIVERY PERFORMANCE METRICS
      • Lead Time ≙ Commit to Prod
      • Release Frequency ≙ Deploys/week/dev
      • Time to Restore Service ≙ MTRS from incidents
      • Change Fail Rate ≙ n/a
  64. “.. means establishing empathy with internal consumers (read: developers) and collaborating with them on the design. Platform product managers establish roadmaps and ensure the platform delivers value to the business and enhances the developer experience.” - ThoughtWorks Technology Radar
  65. DEVELOPER SATISFACTION
  66. DOCUMENTATION
      • "Documentation is hard to find"
      • "Documentation is not comprehensive enough"
      • "Remove unnecessary complexity and obstacles."
      • "Get the documentation up to date and prepare use cases"
      • "More and more clear documentation"
      • "More detailed docs, example repos with more complicated deployments."
  67. DOCUMENTATION
      • Restructure following https://www.divio.com/en/blog/documentation/
      • Concepts
      • How Tos
      • Tutorials
      • Reference
      • Global Search
      • Weekly Health Check: Support → Documentation
  68. WHY KUBERNETES?
  69. WHY KUBERNETES?
      • provides enough abstractions (StatefulSet, CronJob, ..)
      • provides consistency (API spec/status)
      • is extensible (annotations, CRDs, API aggregation)
      • certain compatibility guarantee (versioning)
      • widely adopted (all cloud providers)
      • works across environments and implementations
      srcco.de/posts/why-kubernetes.html
  70. WHY KUBERNETES?
      • Efficiency
      • Common Operational Model
      • Developer Experience
      • Cloud Provider Independent
      • Compliance and Security
      • Talent (for Zalando)
  71. KUBERNETES FAILURE STORIES
      • Learning about production pitfalls!
      • Availability bias?
      https://k8s.af
  72. FACTFULNESS Things can be both better and bad! What would failure stories for your non-K8s infra look like? https://k8s.af
  73. COMPLEXITY FOR GOOGLE-SCALE INFRA?
      • Managed DO cluster: 4 minutes
      • K3s single node: 2 minutes
      demo.j-serv.de
  74. DE-FACTO STANDARD, EXTENSIBLE API
  75. (no text on slide)
  76. MAYBE THAT'S GOOD?
  77. OPEN SOURCE & MORE
      • Kubernetes on AWS: github.com/zalando-incubator/kubernetes-on-aws
      • Skipper HTTP Router & Ingress controller: github.com/zalando/skipper
      • External DNS: github.com/kubernetes-incubator/external-dns
      • Postgres Operator: github.com/zalando-incubator/postgres-operator
      • More Zalando Tech Talks: github.com/zalando/public-presentations
  78. QUESTIONS? HENNING JACOBS, SENIOR PRINCIPAL, henning@zalando.de, @try_except_ Illustrations by @01k
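
Slide 35 (TEMPLATING: MUSTACHE) shows `{{{APPLICATION}}}` placeholders in the Ingress manifest. The following is a minimal sketch of how such a placeholder could be expanded before the manifest is applied. It is not the CDP implementation: the `render` helper is hypothetical and covers only the triple-brace (unescaped variable) subset of Mustache, not sections or partials.

```python
import re

# Matches {{{NAME}}}, the unescaped-variable form used on the slide.
PLACEHOLDER = re.compile(r"\{\{\{\s*(\w+)\s*\}\}\}")

# Manifest text as shown on the slide; the elided name stays elided.
INGRESS_TEMPLATE = """\
kind: Ingress
metadata:
  name: "..."
spec:
  rules:
    # DNS name your application should be exposed on
    - host: "{{{APPLICATION}}}.example.org"
      http:
        paths:
          - backend:
              serviceName: "{{{APPLICATION}}}"
              servicePort: 80
"""


def render(template: str, variables: dict) -> str:
    """Replace every {{{NAME}}} with variables[NAME] (hypothetical helper)."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            # Fail loudly rather than deploy a manifest with an empty host.
            raise KeyError(f"undefined template variable: {name}")
        return str(variables[name])

    return PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    print(render(INGRESS_TEMPLATE, {"APPLICATION": "myapp"}))
```

Raising on an undefined variable is a deliberate choice in this sketch: a silently empty host name would be harder to notice than a failed render.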
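
Slide 58 (KUBERNETES JANITOR) describes two cleanup mechanisms: a time-to-live set on a resource, and custom rules such as deleting everything without an "app" label after 7 days. The decision logic might look roughly like the sketch below. All names here are hypothetical, the TTL unit suffixes are an assumption, and this is not the kube-janitor source code.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed unit suffixes for this sketch; the real tool may accept others.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}
TTL_PATTERN = re.compile(r"^(\d+)([smhdw])$")


def parse_ttl(ttl: str) -> timedelta:
    """Turn a string such as '7d' or '24h' into a timedelta."""
    match = TTL_PATTERN.match(ttl)
    if not match:
        raise ValueError(f"invalid TTL: {ttl!r}")
    return timedelta(seconds=int(match.group(1)) * UNIT_SECONDS[match.group(2)])


def effective_ttl(labels: dict, ttl_annotation: Optional[str]) -> Optional[str]:
    """An explicit TTL wins; otherwise apply the slide's example rule."""
    if ttl_annotation is not None:
        return ttl_annotation
    if "app" not in labels:
        return "7d"  # custom rule: no "app" label means delete after 7 days
    return None  # no TTL applies, keep the resource


def should_delete(created: datetime, labels: dict,
                  ttl_annotation: Optional[str], now: datetime) -> bool:
    """True when the resource has outlived its effective TTL."""
    ttl = effective_ttl(labels, ttl_annotation)
    return ttl is not None and now - created > parse_ttl(ttl)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(should_delete(now - timedelta(days=8), {}, None, now))
```

Passing `now` explicitly keeps the check deterministic, which makes dry runs and unit tests straightforward.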
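
Slide 53 (RESOURCE REPORT: TEAMS) mentions sorting teams by slack costs. Assuming "slack" means the gap between requested and actually used resources, the ranking could be sketched as follows. The `TeamUsage` type, the price parameter, and the team names and figures are made up for illustration and are not taken from kube-resource-report or from Zalando data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TeamUsage:
    team: str
    requested_cpu: float  # vCPUs requested by the team's pods
    used_cpu: float       # vCPUs actually consumed on average


def slack_cost(usage: TeamUsage, price_per_cpu: float) -> float:
    """Cost of requested-but-unused CPU; never negative."""
    slack = max(usage.requested_cpu - usage.used_cpu, 0.0)
    return slack * price_per_cpu


def rank_by_slack_cost(usages: List[TeamUsage], price_per_cpu: float) -> List[TeamUsage]:
    """Teams with the most expensive slack come first."""
    return sorted(usages, key=lambda u: slack_cost(u, price_per_cpu), reverse=True)


if __name__ == "__main__":
    # Illustrative numbers only.
    teams = [
        TeamUsage("team-abc", requested_cpu=10.0, used_cpu=4.0),
        TeamUsage("team-xyz", requested_cpu=20.0, used_cpu=19.0),
    ]
    for usage in rank_by_slack_cost(teams, price_per_cpu=20.0):
        print(usage.team, slack_cost(usage, 20.0))
```

Clamping slack at zero keeps a team that bursts above its requests from showing a negative cost in the ranking.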
