How Zalando runs Kubernetes clusters at scale on AWS - AWS re:Invent

Many clusters, many problems? Having many clusters has benefits: reduced blast radius, less vertical scaling of cluster components, and a natural trust boundary. In this session, Zalando shows its approach for running 140+ clusters on AWS, how it does continuous delivery for its cluster infrastructure, and how it created open-source tooling to manage cost efficiency and improve developer experience. The company openly shares its failures and the learnings collected during three years of Kubernetes in production.

AWS re:Invent session OPN211 on 2019-12-05

  1. How Zalando runs Kubernetes clusters at scale on AWS. Henning Jacobs, Senior Principal, Zalando SE. Session OPN211. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  2. THE EUROPEAN ONLINE PLATFORM FOR FASHION
  3. ZALANDO AT A GLANCE: ~5.4 billion EUR revenue (2018); > 300 million visits per month; ~14,000 employees in Europe; > 80% of visits via mobile devices; > 28 million active customers; > 400,000 product choices; > 2,000 brands; 17 countries; as of June 2019
  4. 2015: JOURNEY INTO THE CLOUD (diagram: AWS, STUPS, Docker deploy, SSH access, audit reports, full AWS access). Teams have admin access & full responsibility.
  5. 2015: ISOLATED AWS ACCOUNTS (diagram: Internet; *.abc.example.org for Team ABC and *.xyz.example.org for Team XYZ; ELB and EC2 instances in each account)
  6. INFRASTRUCTURE @ ZALANDO. STUPS (toolset around AWS): AWS accounts per team; all instances must run the same AMI; PowerUser access to production; you build it, you run EVERYTHING. Kubernetes: clusters per product (multiple teams); instances are not managed by teams; hands-off approach; a lot of stuff out of the box.
  7. 2019: SCALE. 140 clusters, 396 accounts
  8. 2019: DEVELOPERS USING KUBERNETES
  9. Platform: > 1100 developers, > 200 development teams
  10. YOU BUILD IT, YOU RUN IT. "The traditional model is that you take your software to the wall that separates development and operations, and throw it over and then forget about it. Not at Amazon. You build it, you run it. This brings developers into contact with the day-to-day operation of their software. It also brings them into day-to-day contact with the customer." (A Conversation with Werner Vogels, ACM Queue, 2006)
  11. ON-CALL: YOU OWN IT, YOU RUN IT. "When things are broken, we want people with the best context trying to fix things." (Blake Scrivener, Netflix SRE Manager)
  12. GOALS • No manual operations • No pet clusters • Reliability • Autoscaling • Latest Kubernetes • Cost efficient
  13. ARCHITECTURE: pairs of clusters, each cluster in an isolated account (AWS account foobar-test with cluster foobar-test; AWS account foobar with cluster foobar)
  14. ARCHITECTURE: CloudFormation stacks, node pools with self-baked Ubuntu AMI (diagram: etcd, master nodes, worker nodes)
  15. ARCHITECTURE (diagram: https://cluster-id.example.org, AWS ELB, master nodes and worker nodes across AZ a, AZ b, AZ c)
  16. CLUSTER METADATA (CLUSTER-REGISTRY)
      clusters:
      - id: "cluster-id"
        api_server_url: "https://cluster-id.example.org"
        config_items:
          Key: "value"
        environment: "test"
        region: "eu-central-1"
        lifecycle_status: "ready"
        node_pools:
        - name: "worker-pool"
          instance_type: "m5.large"
          min_size: 3
          max_size: 20
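To make the shape of such a registry entry concrete, here is a minimal Python sketch that models the fields shown on the slide and runs two basic sanity checks. The class names and the `validate` helper are hypothetical and only illustrate the data layout; they are not the Cluster Registry API.

```python
from dataclasses import dataclass, field


@dataclass
class NodePool:
    name: str
    instance_type: str
    min_size: int
    max_size: int


@dataclass
class Cluster:
    id: str
    api_server_url: str
    environment: str
    region: str
    lifecycle_status: str
    config_items: dict = field(default_factory=dict)
    node_pools: list = field(default_factory=list)


def validate(cluster):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if not cluster.api_server_url.startswith("https://"):
        problems.append("api_server_url should use https")
    for pool in cluster.node_pools:
        if pool.min_size < 0 or pool.min_size > pool.max_size:
            problems.append(f"node pool {pool.name}: invalid size bounds")
    return problems


# The entry from the slide, expressed with the hypothetical classes above.
example = Cluster(
    id="cluster-id",
    api_server_url="https://cluster-id.example.org",
    environment="test",
    region="eu-central-1",
    lifecycle_status="ready",
    config_items={"Key": "value"},
    node_pools=[NodePool("worker-pool", "m5.large", 3, 20)],
)
```

Calling `validate(example)` returns an empty list for the entry from the slide.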
  17. CLUSTER CONFIGURATION (github.com/zalando-incubator/kubernetes-on-aws)
      cluster
      ├── cluster.yaml        # Kubernetes cluster stack
      ├── etcd-cluster.yaml   # etcd cluster stack
      ├── manifests
      │   ├── ...
      └── node-pools          # master/worker nodes
          ├── ...
  18. KUBERNETES CLUSTER MANIFESTS (github.com/zalando-incubator/kubernetes-on-aws)
  19. CLUSTER LIFECYCLE MANAGER (CLM) (github.com/zalando-incubator/cluster-lifecycle-manager)
  20. CLUSTER UPGRADE FLOW
  21. CLUSTER CHANNELS (github.com/zalando-incubator/kubernetes-on-aws)
      Channel   Description                                      Clusters
      dev       Development and playground clusters              3
      alpha     Main infrastructure cluster (important to us)    1
      beta      Non-prod clusters for the rest of the org        65+
      stable    Production clusters                              65+
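The channel table suggests a staged rollout. The sketch below assumes that a change moves through the channels in the order listed on the slide (dev, alpha, beta, stable); that ordering and both helper functions are illustrative assumptions, not code from the Cluster Lifecycle Manager.

```python
# Channel names come from the slide; the promotion order is an assumption.
CHANNELS = ("dev", "alpha", "beta", "stable")


def next_channel(channel):
    """Return the channel a change would reach next, or None after stable."""
    index = CHANNELS.index(channel)  # raises ValueError for unknown channels
    return CHANNELS[index + 1] if index + 1 < len(CHANNELS) else None


def promotion_path(channel):
    """Return the given channel and every channel a change still has to pass."""
    return list(CHANNELS[CHANNELS.index(channel):])
```

Under this assumption a change reaches the single alpha cluster and the 65+ non-production clusters before it touches any production cluster.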
  22. E2E TESTS ON EVERY PR (github.com/zalando-incubator/kubernetes-on-aws)
  23. E2E TESTS
      Test suite               Description                                                   Tests
      Conformance Tests        Upstream Kubernetes e2e conformance tests                     159 ✓
      Zalando Tests (custom)   Custom tests for ingress, external-dns, PSP etc.              17 ✓
      StatefulSet Tests        Rolling update of stateful sets including volume mounting     2 ✓
  24. RUNNING E2E TESTS: testing dev to alpha upgrade. Steps: Create Cluster, Update Cluster, Run e2e tests, Delete Cluster. (Diagram: control plane and nodes at branch alpha (base) and at branch dev (head).)
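The four steps on this slide can be sketched as a small driver function. It assumes the cluster is created from the base branch and then updated to the head branch, which is how the diagram reads; the four callables are hypothetical stand-ins for whatever actually provisions and tests a cluster.

```python
def run_upgrade_test(create, update, run_e2e, delete, base="alpha", head="dev"):
    """Create a cluster at `base`, update it to `head`, run e2e tests, clean up.

    All four callables are hypothetical placeholders supplied by the caller.
    """
    cluster = create(base)
    try:
        update(cluster, head)
        return run_e2e(cluster)
    finally:
        # Delete the throwaway cluster even if the update or the tests fail.
        delete(cluster)
```

The `finally` block reflects the "no pet clusters" goal: a failed run should not leave a test cluster behind.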
  25. UPGRADING NODES
  26. NAÏVE NODE UPGRADE STRATEGY. Auto Scaling Group: Min 3, Max 9, Current 5, Desired 5
  27. NAÏVE NODE UPGRADE STRATEGY. Set ASG size to current + 1. Auto Scaling Group: Min 6, Max 6, Current 5, Desired 6
  28. NAÏVE NODE UPGRADE STRATEGY. Get a new instance, drain. Auto Scaling Group: Min 6, Max 6, Current 6, Desired 6
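The three slides above show the Auto Scaling Group moving through three states. The following sketch reproduces exactly those numbers with a small immutable model; the class and function names are made up for illustration and do not call any AWS API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AutoScalingGroup:
    min_size: int
    max_size: int
    current: int
    desired: int


def pin_to_current_plus_one(asg):
    """Pin min, max and desired to current + 1, as on the second slide."""
    target = asg.current + 1
    return replace(asg, min_size=target, max_size=target, desired=target)


def instance_launched(asg):
    """Model the group launching one instance while below the desired size."""
    if asg.current >= asg.desired:
        return asg
    return replace(asg, current=asg.current + 1)


before = AutoScalingGroup(min_size=3, max_size=9, current=5, desired=5)
pinned = pin_to_current_plus_one(before)   # Min 6, Max 6, Current 5, Desired 6
ready = instance_launched(pinned)          # Min 6, Max 6, Current 6, Desired 6
```

Once the extra instance is available, an old node can be drained without reducing capacity below the original five nodes.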
  29. PROBLEMS WITH THE NAÏVE STRATEGY. What about stateful applications like Postgres? (Diagram: nodes running the master and two replicas; drain.) Postgres cluster unavailable :(
  30. STATEFUL WORKLOADS (POSTGRES)
  31. POSTGRES OPERATOR (github.com/zalando-incubator/postgres-operator). Diagram: nodes running pg role=master, pg role=replica and pg role=replica, plus a node running the postgres operator. Steps shown: Evict ✘; evict pg role=replica; promote (role=master, role=replica); drain ✓
  32. POSTGRES OPERATOR (github.com/zalando-incubator/postgres-operator)
      apiVersion: policy/v1beta1
      kind: PodDisruptionBudget
      metadata:
        name: "postgres-cluster"
      spec:
        minAvailable: 1
        selector:
          matchLabels:
            application: "postgres-cluster"
            role: "master"
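To see why this budget blocks eviction of the master but not of a replica, consider the simplified model below. It is a sketch of the `minAvailable` rule only, assuming pods are plain dictionaries with `labels` and `ready` keys; it is not the Kubernetes eviction implementation and ignores details such as unhealthy-pod handling.

```python
def matches(labels, selector):
    """True if every key/value pair in the selector is present in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def eviction_allowed(pod, pods, selector, min_available):
    """Simplified minAvailable check for evicting `pod` out of `pods`."""
    if not matches(pod["labels"], selector):
        return True  # the budget does not cover this pod
    healthy = sum(1 for p in pods if p["ready"] and matches(p["labels"], selector))
    # Evicting a covered pod must still leave min_available healthy pods.
    return healthy - 1 >= min_available
```

With a single pod labelled `role: "master"` and `minAvailable: 1`, evicting that pod is never allowed, so a node drain has to wait until the master role has moved to a pod on another node.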
  33. ROLLING UPGRADE OF NODES (diagram: node pool spanning az-1a, az-1b and az-1c, each with PVs; nodes marked PreferNoSchedule; drain)
  34. POSTGRES OPERATOR: application to manage PostgreSQL clusters on Kubernetes. > 500 clusters running on Kubernetes. github.com/zalando/postgres-operator
  35. Elasticsearch in Kubernetes: 2,500 vCPUs, 1 TB RAM. github.com/zalando-incubator/es-operator/
  36. SLAs FOR CLUSTER UPDATES • Respect PodDisruptionBudgets • Force-terminate Pods after 3 days (or 8h on test) • Cluster updates can be blocked anytime! zkubectl cluster-update block [+ REASON]
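The force-termination rule can be expressed as a deadline calculation. The sketch below assumes the clock starts when the drain of a node begins and that an environment is either "test" or production; both assumptions and all names are illustrative, since the slide only gives the two durations.

```python
from datetime import datetime, timedelta

# Durations from the slide: 3 days in general, 8 hours on test clusters.
FORCE_TERMINATE_AFTER = timedelta(days=3)
FORCE_TERMINATE_AFTER_TEST = timedelta(hours=8)


def force_terminate_at(drain_started, environment):
    """Return the time after which remaining pods would be force-terminated."""
    if environment == "test":
        return drain_started + FORCE_TERMINATE_AFTER_TEST
    return drain_started + FORCE_TERMINATE_AFTER


def should_force_terminate(now, drain_started, environment):
    """True once the grace period for respecting PodDisruptionBudgets is over."""
    return now >= force_terminate_at(drain_started, environment)
```

Until the deadline passes, PodDisruptionBudgets are respected, which gives application owners a bounded window to let their pods move gracefully.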
  37. DEPLOY & USER INTERFACE
  38. APP DEPLOYMENT CONFIGURATION
      ├── deploy/apply
      │   ├── deployment.yaml
      │   ├── credentials.yaml   # Zalando IAM
      │   ├── ingress.yaml
      │   └── service.yaml
      └── delivery.yaml          # Zalando CI/CD
  39. APP INGRESS.YAML
      kind: Ingress
      metadata:
        name: "..."
      spec:
        rules:
        # DNS name your application should be exposed on
        - host: "myapp.foo.example.org"
          http:
            paths:
            - backend:
                serviceName: "myapp"
                servicePort: 80
  40. CONTINUOUS DELIVERY PLATFORM
  41. CDP: DEPLOY. "glorified kubectl apply"
  42. EMERGENCY ACCESS SERVICE
      Emergency access by referencing incident:
        zkubectl cluster-access request --emergency -i INC REASON
      Privileged production access via 4-eyes:
        zkubectl cluster-access request REASON
        zkubectl cluster-access approve USERNAME
  43. KUBERNETES WEB VIEW: kubectl get pods,stacks,deploys,..
  44. SEARCHING ACROSS 140+ CLUSTERS (codeberg.org/hjacobs/kube-web-view)
  45. codeberg.org/hjacobs/kube-web-view
  46. UPGRADE TO KUBERNETES 1.14: "Found 1223 rows for 1 resource type in 148 clusters in 3.301 seconds."
  47. SOME USE CASES: all Pending Pods across all clusters
  48. AVOIDING CONFIGURATION DRIFT
  49. CLUSTER CONFIGURATION. Clusters look mostly the same, except: • secrets, e.g. credentials for external logging provider • node pools and their instance sizes. Cluster-specific config items are stored in Cluster Registry.
  50. CLUSTER AUTOSCALER
  51. VERTICAL POD AUTOSCALER (CPU/memory) • Prometheus • External DNS • Heapster / Metrics Server • our ALB Ingress Controller
  52. VERTICAL POD AUTOSCALER
  53. MONITORING & COST EFFICIENCY
  54. MONITORING SYSTEM - ZMON • Dynamic entity registration (clusters, pods, ..) • Generic checks on entity attributes, e.g. for all production clusters "Less than 60% of worker nodes are ready" • OpsGenie alerts
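The example check "Less than 60% of worker nodes are ready" can be written as a small predicate. This is a plain Python sketch, not ZMON check syntax; the node dictionaries with `role` and `ready` keys are an assumed input format, and treating a cluster with no worker nodes as alerting is a choice made here, not something the slide states.

```python
def too_few_workers_ready(nodes, threshold_percent=60):
    """Return True if fewer than `threshold_percent` of worker nodes are ready."""
    workers = [node for node in nodes if node["role"] == "worker"]
    if not workers:
        return True  # assumption: no workers at all should also alert
    ready = sum(1 for node in workers if node["ready"])
    # Integer arithmetic avoids floating-point edge cases at exactly 60%.
    return ready * 100 < threshold_percent * len(workers)
```

Because the check is generic over entity attributes, the same predicate could be evaluated for every production cluster without per-cluster configuration.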
  55. OPENTRACING
  56. KUBERNETES RESOURCE REPORT (github.com/hjacobs/kube-resource-report)
  57. RESOURCE REPORT: TEAMS. Sorting teams by Slack Costs (github.com/hjacobs/kube-resource-report)
  58. KUBERNETES APPLICATION DASHBOARD
  59. VERTICAL POD AUTOSCALER: limit/requests adapted by VPA
  60. DOWNSCALING DURING OFF-HOURS (github.com/hjacobs/kube-downscaler; chart label: Weekend)
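The idea of scaling down outside working hours can be sketched as follows. The weekday window of 07:30 to 20:30 is a made-up default for illustration, time zones are ignored, and the functions are hypothetical; they are not taken from kube-downscaler.

```python
from datetime import datetime, time


def is_uptime(now, start=time(7, 30), end=time(20, 30)):
    """True on Monday to Friday between `start` and `end` (naive local time)."""
    if now.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return False
    return start <= now.time() < end


def target_replicas(original_replicas, now):
    """Keep the original replica count during uptime, otherwise scale to zero."""
    return original_replicas if is_uptime(now) else 0
```

Combined with the cluster autoscaler, deployments scaled to zero over the weekend free up worker nodes, which is where the cost saving comes from.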
  61. KUBERNETES JANITOR ● TTL and expiry date annotations, e.g. ○ set time-to-live for your test deployment ● Custom rules, e.g. ○ delete everything without "app" label after 7 days. github.com/hjacobs/kube-janitor
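Both kinds of rule from the slide can be sketched in a few lines. The annotation key `janitor/ttl`, the `7d`-style duration format and the resource dictionaries are assumptions made for this example; consult the kube-janitor documentation for the real annotation names and rule syntax.

```python
import re
from datetime import datetime, timedelta

_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days", "w": "weeks"}


def parse_ttl(ttl):
    """Parse strings such as "30m", "8h" or "7d" into a timedelta."""
    match = re.fullmatch(r"(\d+)([smhdw])", ttl)
    if match is None:
        raise ValueError(f"unsupported TTL: {ttl!r}")
    return timedelta(**{_UNITS[match.group(2)]: int(match.group(1))})


def should_delete(resource, now):
    """Apply the TTL annotation first, then the 'no app label after 7 days' rule."""
    age = now - resource["created"]
    ttl = resource.get("annotations", {}).get("janitor/ttl")  # assumed key
    if ttl is not None:
        return age > parse_ttl(ttl)
    if "app" not in resource.get("labels", {}):
        return age > timedelta(days=7)
    return False
```

A test deployment annotated with a short TTL is removed once it is older than that TTL, while labelled long-running resources are left alone.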
  62. EC2 SPOT NODES: 72% savings
  63. OUR SETUP VS VANILLA KUBERNETES
  64. HOW MUCH DO WE DIVERGE? • API access via Zalando OAuth • CPU throttling disabled via Kubelet flag • No memory overcommit (requests == limits) • Ingress: External DNS, Skipper, AWS ALB • Custom CRDs: Zalando OAuth, Postgres, StackSet • Kubernetes Downscaler • DNS setup (CoreDNS DaemonSet, ndots: 2)
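The "no memory overcommit (requests == limits)" rule lends itself to a mechanical check. The sketch below compares the memory request and limit of a container spec given as a dictionary; the quantity parser covers only integer values with common suffixes, and how Zalando actually enforces the rule is not stated on the slide.

```python
_FACTORS = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
}


def parse_memory(quantity):
    """Convert an integer quantity such as "512Mi" or "1G" to bytes."""
    for suffix, factor in _FACTORS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain number of bytes


def memory_is_guaranteed(container):
    """True if the memory request and limit are both set and equal."""
    resources = container.get("resources", {})
    request = resources.get("requests", {}).get("memory")
    limit = resources.get("limits", {}).get("memory")
    if request is None or limit is None:
        return False
    return parse_memory(request) == parse_memory(limit)
```

Comparing parsed byte counts rather than raw strings means `1Gi` and `1024Mi` are treated as the same amount.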
  65. INGRESS: ALB + SKIPPER. Diagram: ALB (:443 TLS; :80 redirect) in the EC2 network; HTTP to Skipper (:9999) on each node; K8s network to the MyApp pods (10.2.1.2:8080, 10.2.0.2:8080, 10.2.0.3:8080) behind a Service (list of pod IPs, endpoints). github.com/zalando/skipper, github.com/zalando-incubator/kube-ingress-aws-controller
  66. DNS: COREDNS AS DAEMONSET. github.com/zalando-incubator/kubernetes-on-aws/blob/dev/docs/postmortems/jan-2019-dns-outage.md
  67. NON-PROD VS PROD • Non-production: similar to plain hosted Kubernetes • Production: no write access (only via CI/CD); compliance webhooks; require production-ready Docker images
  68. COMPLIANCE FOR PRODUCTION • Pods require application label pointing to application registry ⇒ establishes link to owning team • Docker images must be built from master via CDP. NOTE: teams can freely choose their namespace(s)
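The label requirement can be pictured as an admission-style check. The function below is a hypothetical sketch: it assumes the label key is `application`, that pods arrive as dictionaries shaped like a Kubernetes manifest, and that the application registry can be represented as a set of known IDs. It is not Zalando's compliance webhook.

```python
def admit_pod(pod, known_applications, production=True):
    """Return (allowed, reason) for a pod under the production label rule."""
    if not production:
        return True, ""  # the rule on the slide applies to production only
    labels = pod.get("metadata", {}).get("labels", {})
    application = labels.get("application")  # assumed label key
    if not application:
        return False, "missing 'application' label"
    if application not in known_applications:
        return False, f"application {application!r} is not registered"
    return True, ""
```

Rejecting unlabelled pods at admission time is what guarantees that every production workload can be traced back to an owning team.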
  69. MONTHLY DEVELOPER NEWSLETTER
  70. SUMMARY • Seamless updates • Avoid pet clusters • Small disruptions are normal • Automated cluster e2e tests • Documentation & communication
  71. FUTURE • API version updates (1.16+) • Improved Autoscaling • Improved StackSet, Gradual Rollout • Migrations • Cost efficiency • Looking at VPC CNI, AWS IAM, EKS, ...
  72. KUBERNETES FAILURE STORIES • Zalando's Failure Stories - KubeCon EU 2019 • Build Errors of Continuous Delivery Platform • Total DNS outage in Kubernetes cluster. https://k8s.af
  73. COMMON PITFALLS • Insufficient e2e tests • Readiness & Liveness Probes • Resource Requests & Limits • DNS
  74. OPEN SOURCE & MORE • Cluster Config: github.com/zalando-incubator/kubernetes-on-aws • Skipper HTTP Router & Ingress controller: github.com/zalando/skipper • Ingress Controller for AWS: github.com/zalando-incubator/kube-ingress-aws-controller • Kubernetes Web View: codeberg.org/hjacobs/kube-web-view • More Zalando Tech Talks: github.com/zalando/public-presentations
  75. Thank you! Henning Jacobs, @try_except_
