CONTAINER DAYS HAMBURG
2017-06-20
HENNING JACOBS
@try_except_
Kubernetes on AWS
@ZalandoTech
Large Scale Kubernetes on AWS at Europe's Leading Online Fashion Platform - Container Days Hamburg
Bootstrapping a Kubernetes cluster is easy; rolling it out to nearly 200 engineering teams and operating it at scale is a challenge. In this talk, we present our approach to Kubernetes provisioning on AWS, operations, and developer experience for our growing Zalando Technology department. In the context of Kubernetes, we highlight AWS service integrations, our IAM/OAuth infrastructure, cluster autoscaling, continuous delivery, and general developer experience. The talk covers our most important learnings, and we openly share failure stories.

Talk given at Container Days HH (https://containerdays.io/) on 2017-06-20.


  1. CONTAINER DAYS HAMBURG 2017-06-20 HENNING JACOBS @try_except_ Kubernetes on AWS @ZalandoTech
  2. ZALANDO
     • 15 markets
     • 6 fulfillment centers
     • 20 million active customers
     • 3.6 billion € net sales 2016
     • 165 million visits per month
     • 12,000 employees in Europe
  3. ZALANDO TECHNOLOGY
     • HOME-BREWED, CUTTING-EDGE & SCALABLE technology solutions
     • >1,700 employees from 77 nations
     • 6 tech locations + HQs in Berlin
     • help our brand to WIN ONLINE
  4. ZALANDO TECH’S INFRASTRUCTURE
  5. FOUR ERAS AT ZALANDO TECH (timeline markers: 2010, 2015, 2016)
     • PHP: Data center, PHP files
     • ZOMCAT: Data center, WAR
     • STUPS: AWS, Docker, Cloud Formation, Low level (AWS API)
     • KUBERNETES: AWS, Docker, Kubernetes manifest, High abstraction level
  6. LARGE SCALE?
  7. KUBERNETES: ARCHITECTURE
  8. KUBERNETES ON AWS: CONTEXT
     • 200 engineering teams
     • 30 prod. clusters
     • AWS/STUPS
     • Dockerized apps
     • No manual operations
     • Reliability
     • Autoscaling
     • Seamless migration
  9. ISOLATED AWS ACCOUNTS
     [Diagram: Internet; *.abc.example.org, *.xyz.example.org; Product ABC, Product XYZ; EC2; LB, LB]
  10. KUBERNETES ON AWS
  11. DEPLOYMENT
  12. DEPLOYMENT CONFIGURATION

          .
          ├── apply
          │   ├── credentials.yaml       # K8s TPR
          │   ├── ingress.yaml           # K8s Ingress
          │   ├── redis-deployment.yaml  # K8s Deployment
          │   ├── redis-service.yaml     # K8s Service
          │   └── service.yaml           # K8s Service
          ├── deployment.yaml            # K8s Deployment
          └── pipeline.yaml              # proprietary config
  13. INGRESS.YAML

          apiVersion: extensions/v1beta1
          kind: Ingress
          metadata:
            name: "..."
          spec:
            rules:
              # DNS name your application should be exposed on
              - host: "myapp.foo.example.org"
                http:
                  paths:
                    - backend:
                        serviceName: "myapp"
                        servicePort: 80
  14. JENKINS DEPLOY PIPELINE
  15. AWS INTEGRATION
  16. CLOUD FORMATION VIA CI/CD

          .
          ├── apply
          │   ├── cf-iam-role.yaml   # AWS IAM Role
          │   ├── cf-rds.yaml        # AWS RDS Database
          │   ├── kube-ingress.yaml  # K8s Ingress
          │   ├── kube-secret.yaml   # K8s Secret
          │   └── kube-service.yaml  # K8s Service
          ├── deployment.yaml        # K8s Deployment
          └── pipeline.yaml          # CI/CD config
  17. ASSIGNING AWS IAM ROLE TO POD

          kind: Deployment
          spec:
            template:
              metadata:
                annotations:
                  # annotation for kube2iam
                  iam.amazonaws.com/role: "app-myapp-role"
              spec:
                containers:
                  - name: ...
                    ...

      https://github.com/jtblin/kube2iam
      ⇒ AWS SDKs just work as expected
  18. CLUSTER AUTOSCALING
  19. CLUSTER AUTOSCALING
      Control # of worker nodes in ASG:
      • Satisfy all resource requests
      • One spare node per AZ
      • No manual config “tweaking”
      • Scale down, but not too fast
      ⇒ we want to be “elastic”
      https://github.com/hjacobs/kube-aws-autoscaler
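
The autoscaling goals on the slide above can be illustrated with a small sketch. This is not the kube-aws-autoscaler algorithm; `ZoneUsage`, `desired_nodes`, and all parameter names are hypothetical, and the sketch assumes identical worker nodes and considers only CPU and memory requests.

```python
import math
from dataclasses import dataclass


@dataclass
class ZoneUsage:
    """Hypothetical per-AZ summary of what the scheduler must fit."""
    cpu_requested: float     # sum of pod CPU requests in this AZ (cores)
    memory_requested: float  # sum of pod memory requests in this AZ (GiB)
    current_nodes: int       # worker nodes currently running in this AZ


def desired_nodes(zone, node_cpu, node_memory, spare_nodes=1, max_scale_down_step=1):
    """Return the node count for one AZ.

    Assumptions (not from the talk): every node offers node_cpu cores and
    node_memory GiB, and bin-packing overhead is ignored.
    """
    # Satisfy all resource requests: the scarcer resource decides.
    needed_for_cpu = math.ceil(zone.cpu_requested / node_cpu)
    needed_for_memory = math.ceil(zone.memory_requested / node_memory)
    # One spare node per AZ on top of what the requests need.
    target = max(needed_for_cpu, needed_for_memory) + spare_nodes
    # Scale down, but not too fast: shrink by a bounded step per run.
    if target < zone.current_nodes:
        target = max(target, zone.current_nodes - max_scale_down_step)
    return target
```

Scaling up happens in one step, while scaling down is capped per run, so a short dip in requested resources does not immediately remove most of the capacity.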
  20. OAUTH / IAM INTEGRATION
  21. SERVICE TO SERVICE AUTHNZ
      [Diagram: Kubernetes Cluster; request to https://resource-server.example.org/protected]

          HTTP/1.1 401 Unauthorized
          { "message": "Authorization required" }
  22. CREDENTIAL PROVIDER
  23. USING THE OAUTH CREDENTIALS

          #!/bin/bash
          secret=$(cat /creds/mytok-token-secret)
          curl -H "Authorization: Bearer $secret" https://resource-server.example.org/protected
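
For services not written in shell, the same pattern (read the token from the mounted file, send it as a Bearer header) might look like the sketch below. The path and URL are the ones shown on the slide; the function name `build_request` is hypothetical, and the sketch assumes the file contains only the raw token.

```python
from pathlib import Path
import urllib.request


def build_request(url, token_path="/creds/mytok-token-secret"):
    """Build an HTTP request carrying the OAuth token from the mounted file.

    The file is read on every call, so an updated token is picked up
    on the next request (assuming the file is replaced in place).
    """
    token = Path(token_path).read_text().strip()
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


# Usage (performs a real HTTP call, so it is left commented out):
# request = build_request("https://resource-server.example.org/protected")
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```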
  24. CHALLENGES
  25. CHALLENGES
      1. Getting Started
      2. Stability
      3. Onboarding
      4. User Experience
      5. Operations
  26. CHALLENGE 1: GETTING STARTED
  27. GETTING STARTED https://github.com/hjacobs/kubernetes-on-aws-users
  28. GETTING STARTED https://github.com/hjacobs/kubernetes-on-aws-users
  29. CLUSTER PROVISIONING
  30. CLUSTER PROVISIONING
      • Two Cloud Formation stacks
      • Master & worker ASGs + etcd
      • Nodes w/ Container Linux
      • K8s manifests applied separately
      • kube-system Deployments
      • DaemonSets
  31. GETTING STARTED
      Goal: use Kubernetes API as primary interface for AWS
      • Mate, External DNS
      • Kubernetes Ingress Controller for AWS
      • kube2iam
      ⇒ we wrote new components to achieve our goal
  32. INGRESS CONTROLLER
      https://github.com/zalando-incubator/kube-ingress-aws-controller / https://github.com/kubernetes-incubator/external-dns
  33. GETTING STARTED
      Other questions we asked ourselves..
      • Single AZ vs. Multi AZ?
      • Federation?
      • Overlay network?
      • Authnz?
  34. GETTING STARTED
      Other questions we asked ourselves..
      • Single AZ vs. Multi AZ? ⇒ Multi AZ
      • Federation? ⇒ No, not ready yet
      • Overlay network? ⇒ Flannel, “rock solid”
      • Authnz? ⇒ OAuth, webhook
  35. CHALLENGE 2: STABILITY
  36. STABILITY
      • Cluster Updates
      • Docker
      • AWS Rate Limits
  37. CLUSTER UPDATES
  38. STABILITY: AWS RATE LIMITS
      • Ran into the same trap twice (Mate & Ingress Ctrl)
      • Kubernetes core causes many calls (e.g. EBS)
      • Monitoring (ZMON) needs to poll AWS
      ⇒ One of our biggest pain points with AWS (and all workarounds are hard and/or ugly)
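
The talk does not say which workarounds were used. One common client-side mitigation for API throttling is retrying with exponential backoff and jitter, sketched below as an assumption rather than as what Mate or the Ingress controller do. `ThrottledError` and `call_with_backoff` are hypothetical names; a real client would catch its SDK's own throttling error instead.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a rate-limit error returned by a cloud API."""


def call_with_backoff(call, max_attempts=5, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep, rng=random.random):
    """Call `call()` and retry on ThrottledError with exponential backoff.

    The wait before retry k is a random fraction ("full jitter") of
    min(max_delay, base_delay * 2**k), so many clients do not retry in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(rng() * ceiling)
```

Backoff only spreads calls out over time; it does not reduce how many calls are needed, which is consistent with the slide's remark that every workaround has a cost.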
  39. STABILITY: LIMIT RANGE

          kubectl describe limitrange
          Name:       limits
          Namespace:  default
          Type       Resource  Min  Max   Default Req  Default Limit  Max Limit/Request Ratio
          ----       --------  ---  ---   -----------  -------------  -----------------------
          Container  memory    -    64Gi  100Mi        1Gi            -
          Container  cpu       -    16    100m         3              -

      http://kubernetes-on-aws.readthedocs.io/en/latest/admin-guide/kubernetes-in-production.html#resources
      ⇒ Mitigate errors on OSI layer 8 ;-)
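
To make the effect of these defaults concrete, the sketch below mimics how a LimitRange fills in missing container requests and limits, using the values from the slide. It is a simplified illustration, not Kubernetes code: the function name is hypothetical, quantities stay as strings, and the Max column is not enforced.

```python
# Defaults taken from the "Default Req" and "Default Limit" columns on the slide.
DEFAULT_REQUEST = {"memory": "100Mi", "cpu": "100m"}
DEFAULT_LIMIT = {"memory": "1Gi", "cpu": "3"}


def apply_limit_range_defaults(resources):
    """Return a copy of a container's resources with LimitRange defaults applied.

    Simplified rules: a missing limit gets the default limit; a missing
    request gets the container's own limit if one was set, otherwise the
    default request.
    """
    requests = dict(resources.get("requests", {}))
    limits = dict(resources.get("limits", {}))
    for name in DEFAULT_LIMIT:
        has_request = name in requests
        has_limit = name in limits
        if not has_limit:
            limits[name] = DEFAULT_LIMIT[name]
        if not has_request:
            requests[name] = limits[name] if has_limit else DEFAULT_REQUEST[name]
    return {"requests": requests, "limits": limits}
```

A container deployed without any `resources` section therefore still ends up with a 100Mi/100m request and a 1Gi/3 limit, which is the kind of human error the slide's closing line alludes to.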
  40. Recommended: The 5 Whys https://en.wikipedia.org/wiki/5_Whys
  41. CHALLENGE 3: ONBOARDING
  42. ONBOARDING
      • Many new concepts to grasp vs. 200 teams
      • Kubernetes Training (2h)
      • Documentation
      • Recorded Friday Demos
      • Support Channels (chat, mail)
  43. CHALLENGE 4: USER EXPERIENCE
  44. USER EXPERIENCE
      • Jenkins deployment only covers “happy case”
      • Juggling with YAMLs
      • Weighted traffic switching missing
  45. UX: WEIGHTED TRAFFIC SWITCHING
      • STUPS uses weighted Route53 DNS records
      • Allows canary, blue/green, slow ramp up
      • Approach: add weights to Ingress backends
      https://github.com/zalando/skipper/issues/324
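
The idea of weighted backends can be sketched in a few lines: each request is routed to a backend with probability proportional to its weight. This is only an illustration of the concept, not how Skipper or Route53 implement it; `pick_backend` and the backend names are hypothetical.

```python
import random


def pick_backend(weights, rng=random):
    """Pick one backend name at random, proportionally to its weight.

    `weights` maps backend name to a non-negative number; backends with
    weight 0 never receive traffic.
    """
    backends = [name for name, weight in weights.items() if weight > 0]
    if not backends:
        raise ValueError("at least one backend needs a positive weight")
    return rng.choices(backends, weights=[weights[name] for name in backends], k=1)[0]


# A canary rollout would shift the weights step by step, e.g.
# {"myapp-v1": 90, "myapp-v2": 10}, then 50/50, then 0/100.
```

With weights of 100 and 0 this degenerates to a blue/green switch, so the same mechanism covers canary, blue/green, and slow ramp up.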
  46. UX: WEIGHTED TRAFFIC SWITCHING https://github.com/zalando/skipper/issues/324
  47. CHALLENGE 5: OPERATIONS
  48. OPERATIONS
      • Team Autonomy?
      • Platform as a Service
      • Convergence
      • Emergency Operator Access
      ⇒ Hard challenges..
  49. https://github.com/hjacobs/kube-ops-view
  50. LINKS
      • Running Kubernetes in Production on AWS: http://kubernetes-on-aws.readthedocs.io/en/latest/admin-guide/kubernetes-in-production.html
      • Kube AWS Ingress Controller: https://github.com/zalando-incubator/kube-ingress-aws-controller
      • External DNS: https://github.com/kubernetes-incubator/external-dns
      • PostgreSQL Operator: https://github.com/zalando-incubator/postgres-operator
      • Zalando Cluster Configuration: https://github.com/zalando-incubator/kubernetes-on-aws
      • List of Organizations using Kubernetes on AWS: https://github.com/hjacobs/kubernetes-on-aws-users
  51. QUESTIONS? HENNING JACOBS, TECH INFRASTRUCTURE, CLOUD ENGINEER, henning@zalando.de, @try_except_. Illustrations by @01k