From AWS/STUPS to Kubernetes on AWS @Zalando - Berlin Kubernetes Meetup

This talk highlights the challenges we faced while migrating from our STUPS infrastructure (Docker on EC2, CloudFormation) to Kubernetes on AWS.

The talk was held at the Berlin Kubernetes Meetup on 2017-05-18: https://www.meetup.com/Berlin-Kubernetes-Meetup/events/239313998/

  1. FROM AWS/STUPS TO KUBERNETES: Berlin Kubernetes Meetup, 2017-05-18, Henning Jacobs (@try_except_)
  2. ZALANDO: 15 markets, 6 fulfillment centers, 20 million active customers, 3.6 billion € net sales in 2016, 165 million visits per month, 12,000 employees in Europe
  3. ZALANDO TECHNOLOGY: home-brewed, cutting-edge & scalable technology solutions; >1,600 employees from 77 nations; 6 tech locations + HQs in Berlin; help our brand to WIN ONLINE
  4. AWS/STUPS: SOME HISTORY
  5. STUPS ON AWS: (diagram) STUPS on top of AWS: Docker deploy, SSH access, audit reports, full AWS access
  6. ISOLATED AWS ACCOUNTS: (diagram) Internet → *.abc.example.org (Team ABC) and *.xyz.example.org (Team XYZ), each account with its own load balancer (LB) and EC2 instances. https://stups.io/
  7. IMMUTABLE STACKS: (diagram) myapp.example.org → ELB myapp-v1 → 3 × EC2 + Docker
  8. IMMUTABLE STACKS: (diagram) a second stack, ELB myapp-v2 → 2 × EC2 + Docker, is created next to ELB myapp-v1 → 3 × EC2 + Docker, both behind myapp.example.org
  9. KUBERNETES: ARCHITECTURE
  10. KUBERNETES ON AWS: CONTEXT • 200 engineering teams • 30 production clusters • AWS/STUPS • Dockerized apps • No manual operations • Reliability • Autoscaling • Seamless migration
  11. ISOLATED AWS ACCOUNTS: (diagram) Internet → *.abc.example.org (Product ABC) and *.xyz.example.org (Product XYZ), each account with its own LB and EC2 instances
  12. KUBERNETES ON AWS
  13. DEPLOYMENT
  14. DEPLOYMENT CONFIGURATION:

          .
          ├── apply
          │   ├── credentials.yaml       # K8s TPR (ThirdPartyResource)
          │   ├── ingress.yaml           # K8s Ingress
          │   ├── redis-deployment.yaml  # K8s Deployment
          │   ├── redis-service.yaml     # K8s Service
          │   └── service.yaml           # K8s Service
          ├── deployment.yaml            # K8s Deployment
          └── pipeline.yaml              # proprietary config

  15. INGRESS.YAML:

          apiVersion: extensions/v1beta1
          kind: Ingress
          metadata:
            name: "..."
          spec:
            rules:
            # DNS name your application should be exposed on
            - host: "myapp.foo.example.org"
              http:
                paths:
                - backend:
                    serviceName: "myapp"
                    servicePort: 80
  16. JENKINS DEPLOY PIPELINE
  17. AWS INTEGRATION
  18. CLOUDFORMATION VIA CI/CD:

          .
          ├── apply
          │   ├── cf-iam-role.yaml   # AWS IAM Role
          │   ├── cf-rds.yaml        # AWS RDS Database
          │   ├── kube-ingress.yaml  # K8s Ingress
          │   ├── kube-secret.yaml   # K8s Secret
          │   └── kube-service.yaml  # K8s Service
          ├── deployment.yaml        # K8s Deployment
          └── pipeline.yaml          # CI/CD config

  19. ASSIGNING AN AWS IAM ROLE TO A POD:

          kind: Deployment
          spec:
            template:
              metadata:
                annotations:
                  # annotation for kube2iam
                  iam.amazonaws.com/role: "app-myapp-role"
              spec:
                containers:
                - name: ...
                  ...

      https://github.com/jtblin/kube2iam ⇒ AWS SDKs just work as expected
  20. CLUSTER AUTOSCALING
  21. CLUSTER AUTOSCALING: control the number of worker nodes in the Auto Scaling group (ASG) • Satisfy all resource requests • One spare node per AZ • No manual config “tweaking” • Scale down, but not too fast ⇒ we want to be “elastic” https://github.com/hjacobs/kube-aws-autoscaler
  22. CHALLENGES
  23. CHALLENGES: 1. Getting Started, 2. Stability, 3. Onboarding, 4. User Experience, 5. Operations
  24. CHALLENGE 1: GETTING STARTED
  25. GETTING STARTED: https://github.com/hjacobs/kubernetes-on-aws-users
  26. GETTING STARTED: https://github.com/hjacobs/kubernetes-on-aws-users
  27. CLUSTER PROVISIONING
  28. CLUSTER PROVISIONING • Two CloudFormation stacks • Master & worker ASGs + etcd • Nodes with Container Linux • K8s manifests applied separately • kube-system Deployments • DaemonSets
  29. GETTING STARTED: Goal: use the Kubernetes API as the primary interface for AWS • Mate, External DNS • Kubernetes Ingress Controller for AWS • kube2iam ⇒ we wrote new components to achieve our goal
  30. INGRESS CONTROLLER: https://github.com/zalando-incubator/kube-ingress-aws-controller, https://github.com/kubernetes-incubator/external-dns
  31. GETTING STARTED: other questions we asked ourselves... • Single AZ vs. Multi AZ? • Federation? • Overlay network? • Authnz (authentication & authorization)?
  32. GETTING STARTED: other questions we asked ourselves... • Single AZ vs. Multi AZ? ⇒ Multi AZ • Federation? ⇒ No, not ready yet • Overlay network? ⇒ Flannel, “rock solid” • Authnz? ⇒ OAuth, webhook
  33. CHALLENGE 2: STABILITY
  34. STABILITY • Cluster updates • Docker • AWS rate limits
  35. CLUSTER UPDATES
  36. STABILITY: AWS RATE LIMITS • Ran into the same trap twice (Mate & Ingress Controller) • Kubernetes core makes many AWS API calls (e.g. for EBS) • Monitoring (ZMON) needs to poll AWS ⇒ One of our biggest pain points with AWS (and all workarounds are hard and/or ugly)
  37. STABILITY: LIMIT RANGE

          $ kubectl describe limitrange
          Name:       limits
          Namespace:  default
          Type       Resource  Min  Max   Default Request  Default Limit  Max Limit/Request Ratio
          ----       --------  ---  ---   ---------------  -------------  -----------------------
          Container  memory    -    64Gi  100Mi            1Gi            -
          Container  cpu       -    16    100m             3              -

      http://kubernetes-on-aws.readthedocs.io/en/latest/admin-guide/kubernetes-in-production.html#resources ⇒ Mitigate errors on OSI layer 8 (the user) ;-)
  38. CHALLENGE 3: ONBOARDING
  39. ONBOARDING • Many new concepts to grasp, for 200 teams • Kubernetes training (2h) • Documentation • Recorded Friday demos • Support channels (chat, mail)
  40. CHALLENGE 4: USER EXPERIENCE
  41. USER EXPERIENCE • Jenkins deployment only covers the “happy case” • Juggling YAML files • Weighted traffic switching is missing
  42. UX: WEIGHTED TRAFFIC SWITCHING • STUPS uses weighted Route53 DNS records • Allows canary, blue/green, slow ramp-up • Current proposal: add weights to Ingress backends https://github.com/zalando/skipper/issues/324
  43. UX: WEIGHTED TRAFFIC SWITCHING: https://github.com/zalando/skipper/issues/324
  44. CHALLENGE 5: OPERATIONS
  45. OPERATIONS • Team autonomy? • Platform as a Service • Convergence • Emergency operator access ⇒ Hard challenges...
  46. https://github.com/hjacobs/kube-ops-view
  47. LINKS • Running Kubernetes in Production on AWS: http://kubernetes-on-aws.readthedocs.io/en/latest/admin-guide/kubernetes-in-production.html • Kube AWS Ingress Controller: https://github.com/zalando-incubator/kube-ingress-aws-controller • External DNS: https://github.com/kubernetes-incubator/external-dns • PostgreSQL Operator: https://github.com/zalando-incubator/postgres-operator • Zalando Cluster Configuration: https://github.com/zalando-incubator/kubernetes-on-aws • List of Organizations using Kubernetes on AWS: https://github.com/hjacobs/kubernetes-on-aws-users
  48. QUESTIONS? Henning Jacobs, Tech Infrastructure Cloud Engineer, henning@zalando.de, @try_except_. Illustrations by @01k
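
The INGRESS.YAML slide shows the only piece a team writes to get a public hostname. As a hedged illustration of that object, and of one way to reduce the "juggling with YAMLs" mentioned under user experience, the sketch below builds the same Ingress as a Python dict and lists its hostnames, which is the information an External DNS style controller turns into DNS records. The functions `ingress_manifest` and `hosts` and the values `myapp` / `myapp.foo.example.org` are hypothetical; the field layout mirrors the slide.

```python
import json
from typing import List


def ingress_manifest(name: str, host: str, service_name: str, service_port: int) -> dict:
    """Build an Ingress shaped like the INGRESS.YAML slide (extensions/v1beta1, 2017 era)."""
    return {
        "apiVersion": "extensions/v1beta1",
        "kind": "Ingress",
        "metadata": {"name": name},
        "spec": {
            "rules": [
                {
                    # DNS name the application should be exposed on
                    "host": host,
                    "http": {
                        "paths": [
                            {"backend": {"serviceName": service_name,
                                         "servicePort": service_port}}
                        ]
                    },
                }
            ]
        },
    }


def hosts(manifest: dict) -> List[str]:
    """Hostnames an External DNS style controller would create records for."""
    return [rule["host"] for rule in manifest["spec"].get("rules", []) if "host" in rule]


if __name__ == "__main__":
    # Hypothetical values; kubectl accepts JSON as well as YAML on `kubectl apply -f -`
    print(json.dumps(ingress_manifest("myapp", "myapp.foo.example.org", "myapp", 80), indent=2))
```

Note that the `extensions/v1beta1` Ingress used in 2017 was removed in Kubernetes 1.22; the `networking.k8s.io/v1` version nests the backend as `service.name` / `service.port.number` and requires a `pathType` on each path.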
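
kube2iam runs on every node, intercepts pod requests to the EC2 metadata API and returns credentials for the role named in the pod's `iam.amazonaws.com/role` annotation; this is why the kube2iam slide can say AWS SDKs "just work". The minimal sketch below covers only the lookup half: reading the annotation from a Deployment's pod template and expanding a bare role name into a full ARN. The prefixing is loosely modeled on kube2iam's `--base-role-arn` option; the function `pod_role_arn` and the account ID `123456789012` are hypothetical.

```python
from typing import Optional

# Annotation key used in the Deployment on the kube2iam slide
ROLE_ANNOTATION = "iam.amazonaws.com/role"


def pod_role_arn(pod: dict, base_role_arn: str) -> Optional[str]:
    """Return the IAM role ARN a pod asks for, or None if it is not annotated.

    Simplified illustration: a bare role name is prefixed with `base_role_arn`,
    while a value that is already an ARN is returned unchanged.
    """
    annotations = pod.get("metadata", {}).get("annotations", {})
    role = annotations.get(ROLE_ANNOTATION)
    if not role:
        return None
    if role.startswith("arn:"):
        return role
    return base_role_arn + role


if __name__ == "__main__":
    deployment = {
        "kind": "Deployment",
        "spec": {"template": {
            "metadata": {"annotations": {ROLE_ANNOTATION: "app-myapp-role"}},
            "spec": {"containers": []},
        }},
    }
    # Pods created from the template carry the same annotations
    print(pod_role_arn(deployment["spec"]["template"], "arn:aws:iam::123456789012:role/"))
```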
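
The CLUSTER AUTOSCALING slide states rules rather than an algorithm. A deliberately simplified sizing function that follows them could look like the sketch below; it is not the code of kube-aws-autoscaler. Assumptions: requests are summed cluster-wide instead of per AZ, all workers share one hypothetical `NodeType`, and "scale down, but not too fast" is modeled as removing at most `max_scale_down` nodes per run.

```python
import math
from dataclasses import dataclass


@dataclass
class NodeType:
    cpu: float         # allocatable cores per worker node (hypothetical)
    memory_gib: float  # allocatable memory per worker node (hypothetical)


def desired_nodes(cpu_requested: float, memory_requested_gib: float,
                  node: NodeType, zones: int, current: int,
                  max_scale_down: int = 1) -> int:
    """Hypothetical ASG size following the rules on the CLUSTER AUTOSCALING slide."""
    # Satisfy all resource requests: the scarcer resource decides
    needed = max(math.ceil(cpu_requested / node.cpu),
                 math.ceil(memory_requested_gib / node.memory_gib))
    # Spread evenly over the availability zones, plus one spare node per AZ
    per_zone = math.ceil(needed / zones) + 1
    target = per_zone * zones
    # Scale down, but not too fast: remove at most `max_scale_down` nodes per run
    if target < current:
        target = max(target, current - max_scale_down)
    return target


if __name__ == "__main__":
    m4 = NodeType(cpu=4, memory_gib=16)
    print(desired_nodes(10, 20, m4, zones=3, current=3))   # scale up in one step
    print(desired_nodes(10, 20, m4, zones=3, current=10))  # shrink by one node only
```

The asymmetry is the design choice: growing immediately keeps pending pods short-lived, while shrinking slowly avoids churn when load briefly drops, and the spare node per AZ presumably gives new pods room to land before the ASG catches up.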
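
The AWS RATE LIMITS slide notes that every workaround is hard or ugly. One common, generic mitigation (not necessarily what Zalando used) is to retry throttled calls with exponential backoff and full jitter, so that many controllers polling the same account do not retry in lockstep. In the sketch, `ThrottledError` and `call_with_backoff` are hypothetical; with boto3 the throttling signal would arrive as a `botocore.exceptions.ClientError` whose error code names the condition. `sleep` is injectable so tests run instantly.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical stand-in for an AWS throttling error (e.g. 'Throttling')."""


def call_with_backoff(call: Callable[[], T], max_attempts: int = 5,
                      base: float = 0.5, cap: float = 20.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `call` on throttling, using exponential backoff with full jitter."""
    attempt = 0
    while True:
        try:
            return call()
        except ThrottledError:
            attempt += 1
            if attempt >= max_attempts:
                raise  # give up and surface the throttling error
            # Full jitter: wait a random time in [0, min(cap, base * 2^(attempt-1))]
            sleep(random.uniform(0, min(cap, base * 2 ** (attempt - 1))))
```

Backoff only spreads bursts out; it does not reduce the total number of calls, so a component that polls AWS too often still needs caching or batching of its lookups.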
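
The LimitRange on the STABILITY: LIMIT RANGE slide is what lets the platform mitigate "errors on OSI layer 8": a container that specifies nothing still gets a 100Mi / 100m request and a 1Gi / 3-core limit, so the scheduler and the autoscaler can account for it. The sketch below is a simplified model of that defaulting and validation with the slide's values, expressed in MiB and millicores to avoid parsing quantity strings. It ignores the Min and Max Limit/Request Ratio columns (unset on the slide); `apply_limit_range` is a hypothetical name, not a Kubernetes API.

```python
from typing import Dict, Tuple

# Values from the `kubectl describe limitrange` slide, in MiB and millicores
DEFAULT_REQUEST = {"memory": 100, "cpu": 100}   # 100Mi, 100m
DEFAULT_LIMIT = {"memory": 1024, "cpu": 3000}   # 1Gi, 3 cores
MAX = {"memory": 64 * 1024, "cpu": 16 * 1000}   # 64Gi, 16 cores


def apply_limit_range(requests: Dict[str, int],
                      limits: Dict[str, int]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Simplified model of how a LimitRange fills in and checks container resources."""
    requests, limits = dict(requests), dict(limits)
    for resource in ("memory", "cpu"):
        # A limit without a request: the request defaults to the limit,
        # not to the LimitRange's default request
        if resource in limits and resource not in requests:
            requests[resource] = limits[resource]
        # Anything still missing is filled from the LimitRange defaults
        requests.setdefault(resource, DEFAULT_REQUEST[resource])
        limits.setdefault(resource, DEFAULT_LIMIT[resource])
        if limits[resource] > MAX[resource]:
            raise ValueError(f"{resource} limit {limits[resource]} exceeds max {MAX[resource]}")
        if requests[resource] > limits[resource]:
            raise ValueError(f"{resource} request {requests[resource]} exceeds limit {limits[resource]}")
    return requests, limits
```

A consequence worth knowing: a container that asks for a 2Gi memory request but sets no limit is rejected, because the default 1Gi limit would be below its request.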
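
STUPS switches traffic with weighted Route53 records, where each record receives a share of DNS answers proportional to its weight divided by the sum of all weights. The proposal on the weighted traffic switching slide moves those weights onto Ingress backends, so the HTTP proxy would make the same proportional choice per request instead of per DNS lookup. Below is a minimal sketch of that choice plus a hypothetical slow ramp-up schedule from `myapp-v1` to `myapp-v2`; `pick_backend` and `ramp_up` are illustrative names, and a real proxy may implement this differently.

```python
import bisect
import itertools
from typing import Dict, List


def pick_backend(weights: Dict[str, float], r: float) -> str:
    """Map a uniform random number r in [0, 1) to a backend, proportionally to its weight."""
    names = sorted(weights)
    cumulative = list(itertools.accumulate(weights[n] for n in names))
    total = cumulative[-1]
    if total <= 0:
        raise ValueError("at least one backend needs a positive weight")
    # bisect_right skips backends with weight 0
    return names[bisect.bisect_right(cumulative, r * total)]


def ramp_up(steps: int) -> List[Dict[str, int]]:
    """Hypothetical schedule shifting traffic from myapp-v1 to myapp-v2 in equal steps."""
    return [{"myapp-v1": 100 - 100 * i // steps, "myapp-v2": 100 * i // steps}
            for i in range(steps + 1)]
```

In a proxy, `pick_backend(weights, random.random())` would run per request. A canary is then just a small weight such as `{"myapp-v1": 95, "myapp-v2": 5}`, blue/green is a jump from all traffic on v1 to all traffic on v2, and a slow ramp-up walks through the `ramp_up` steps.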
