Kubernetes on AWS
AT EUROPE’S LEADING
ONLINE FASHION PLATFORM
HENNING JACOBS
@try_except_
2017-03-27
ZALANDO
15 markets
6 fulfillment centers
20 million active customers
3.6 billion € net sales 2016
165 million visits per month
12,000 employees in Europe
ZALANDO TECHNOLOGY
HOME-BREWED,
CUTTING-EDGE
& SCALABLE
technology solutions
>1,600 employees from 77 nations
6 tech locations + HQs in Berlin
help our brand to
WIN ONLINE
KUBERNETES ON AWS: CONTEXT
200 engineering teams
30 prod. clusters
AWS
Dockerized apps
No manual operations
Reliability
Autoscaling
Seamless migration
ARCHITECTURE
ISOLATED AWS ACCOUNTS
[Diagram: Internet → *.abc.example.org (Product ABC) and *.xyz.example.org (Product XYZ); each product has its own LB in front of EC2]
KUBERNETES ON AWS
ARCHITECTURE DECISIONS
• API server behind SSL ELB
• Webhook for authn & authz
• OAuth Bearer token
• Group membership lookup
• Read-only access to production
• CI/CD for write access
• etcd running separately on EC2
• Multi-AZ clusters
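The webhook-related decisions above can be sketched as two small handlers. This is only an illustration, assuming the standard Kubernetes TokenReview and SubjectAccessReview request/response shapes. The token table, the `cicd` group name, and the exact read-only policy are hypothetical and not taken from the talk.

```python
# Hypothetical token table: a real webhook would validate the OAuth Bearer
# token against the identity provider and look up group membership there.
KNOWN_TOKENS = {
    "token-alice": {"username": "alice", "uid": "alice", "groups": ["team-abc"]},
}
READ_ONLY_VERBS = {"get", "list", "watch"}
CICD_GROUP = "cicd"  # assumed name of the group used by the CI/CD pipeline


def review_token(token_review: dict) -> dict:
    """Authentication webhook: answer a TokenReview request."""
    token = token_review.get("spec", {}).get("token", "")
    user = KNOWN_TOKENS.get(token)
    status = {"authenticated": user is not None}
    if user is not None:
        status["user"] = user
    return {
        "apiVersion": token_review.get("apiVersion", "authentication.k8s.io/v1beta1"),
        "kind": "TokenReview",
        "status": status,
    }


def review_access(access_review: dict) -> dict:
    """Authorization webhook: answer a SubjectAccessReview request."""
    spec = access_review.get("spec", {})
    # v1 uses "groups", v1beta1 uses "group"; accept either.
    groups = spec.get("groups") or spec.get("group") or []
    verb = spec.get("resourceAttributes", {}).get("verb", "")
    # Illustrative policy: everyone may read, only CI/CD may write.
    allowed = CICD_GROUP in groups or verb in READ_ONLY_VERBS
    return {
        "apiVersion": access_review.get("apiVersion", "authorization.k8s.io/v1beta1"),
        "kind": "SubjectAccessReview",
        "status": {"allowed": allowed},
    }
```

In this sketch a team member can `get` resources but not `delete` them, while a request carrying the assumed `cicd` group is allowed to write.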
CLUSTER PROVISIONING
CLUSTER PROVISIONING
• Two CloudFormation stacks
• Master & worker ASGs + etcd
• Nodes w/ Container Linux
• K8s manifests applied separately
• kube-system Deployments
• DaemonSets
DEPLOYMENT
DEPLOYMENT CONFIGURATION
.
├── apply
│   ├── credentials.yaml       # K8s TPR
│   ├── ingress.yaml           # K8s Ingress
│   ├── redis-deployment.yaml  # K8s Deployment
│   ├── redis-service.yaml     # K8s Service
│   └── service.yaml           # K8s Service
├── deployment.yaml            # K8s Deployment
└── pipeline.yaml              # proprietary config
JENKINS DEPLOY PIPELINE
INGRESS
INGRESS.YAML
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: "{{ application }}"
  annotations:
    # optional: SSL certificate ARN to use for the ALB (auto discovery for ACM)
    zalando.org/aws-load-balancer-ssl-cert: "arn:aws:iam:..:..:..1a"
spec:
  rules:
    # DNS name your application should be exposed on
    - host: "myapp.foo.example.org"
      http:
        paths:
          - backend:
              serviceName: "{{ application }}"
              servicePort: 80
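The `{{ application }}` placeholders in the manifest have to be filled in before it is applied. The talk does not show how the pipeline does this, so the following is only a minimal sketch of mustache-style substitution; the `render` helper is hypothetical.

```python
import re

# Matches "{{ name }}" with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def render(template: str, values: dict) -> str:
    """Replace every {{ name }} placeholder with its value (hypothetical helper)."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            # Fail loudly instead of applying a half-rendered manifest.
            raise KeyError(f"no value for placeholder {name!r}")
        return str(values[name])

    return PLACEHOLDER.sub(substitute, template)


manifest = 'metadata:\n  name: "{{ application }}"\n'
print(render(manifest, {"application": "myapp"}))
```

Raising on an unknown placeholder is a deliberate choice in this sketch: a manifest with a leftover `{{ ... }}` would otherwise reach the cluster unnoticed.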
INGRESS CONTROLLER
AWS INTEGRATION
CLOUDFORMATION VIA CI/CD
.
├── apply
│   ├── cf-iam-role.yaml   # AWS IAM Role
│   ├── cf-rds.yaml        # AWS RDS Database
│   ├── kube-ingress.yaml  # K8s Ingress
│   ├── kube-secret.yaml   # K8s Secret
│   └── kube-service.yaml  # K8s Service
├── deployment.yaml        # K8s Deployment
└── pipeline.yaml          # CI/CD config
ASSIGNING AWS IAM ROLE TO POD
kind: Deployment
spec:
  template:
    metadata:
      annotations:
        # annotation for kube2iam
        iam.amazonaws.com/role: "app-{{ application }}-1"
    spec:
      containers:
        - name: ...
          ...
CLUSTER
AUTOSCALING
CLUSTER AUTOSCALING
Control # of worker nodes in ASG:
• Satisfy all resource requests
• One spare node per AZ
• No manual config “tweaking”
• Scale down, but not too fast
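The goals above can be illustrated with a small sizing calculation. This is a rough sketch, not the logic of any particular autoscaler: it ignores bin-packing, and the function names, the per-AZ input format, and the `max_scale_down_step` parameter are assumptions made for the example.

```python
import math


def nodes_needed(requested: dict, node_capacity: dict) -> int:
    """Nodes needed so that every requested resource fits (ignores bin-packing)."""
    return max(
        math.ceil(requested.get(resource, 0) / capacity)
        for resource, capacity in node_capacity.items()
    )


def desired_asg_size(requests_by_az: dict, node_capacity: dict, current_size: int,
                     spare_per_az: int = 1, max_scale_down_step: int = 1) -> int:
    """Target number of worker nodes for the ASG.

    Scale-up happens immediately; scale-down is limited to
    max_scale_down_step nodes per call ("not too fast").
    """
    target = sum(
        nodes_needed(requested, node_capacity) + spare_per_az
        for requested in requests_by_az.values()
    )
    if target < current_size:
        target = max(target, current_size - max_scale_down_step)
    return target


# Assumed example: nodes with 4 CPU cores and 16 GiB of memory.
capacity = {"cpu": 4, "memory": 16}
requests = {
    "eu-central-1a": {"cpu": 9.5, "memory": 20},
    "eu-central-1b": {"cpu": 2, "memory": 30},
}
print(desired_asg_size(requests, capacity, current_size=5))   # scale up to 7
print(desired_asg_size(requests, capacity, current_size=10))  # scale down by one, to 9
```

Deriving the size purely from resource requests is what keeps the setup free of manual "tweaking": there are no utilization thresholds to tune.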
CURRENT SETUP
• https://github.com/hjacobs/kube-aws-autoscaler
• Node draining via systemd unit
Open topic: node “readiness” during scale out
OAUTH / IAM
INTEGRATION
DECLARING NEEDED CREDENTIALS
# apply/credentials.yaml
apiVersion: "zalando.org/v1"
kind: PlatformCredentialsSet
metadata:
  name: "{{ application }}"
spec:
  application: "{{ application }}"
  tokens: # OAuth service tokens
    mytok: # the token name used in application code
      privileges:
        - com.zalando::foobar.write
  clients: # OAuth clients
    implicit:
      grant: implicit # grant type according to RFC-6749
      realm: users
      redirectUri: https://myapp.foo.example.org/oauth
MOUNTING THE OAUTH CREDENTIALS
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: ...
          ...
          volumeMounts:
            - name: "{{ application }}-credentials"
              mountPath: /meta/credentials
              readOnly: true
      volumes:
        - name: "{{ application }}-credentials"
          secret:
            secretName: "{{ application }}"
USING THE OAUTH CREDENTIALS
#!/bin/bash
type=$(cat /meta/credentials/read-only-token-type)
secret=$(cat /meta/credentials/read-only-token-secret)
curl -H "Authorization: $type $secret" \
     https://resource-server.example.org/protected
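The same lookup can be done from application code. The sketch below assumes the file naming shown in the shell script (`<token name>-token-type` and `<token name>-token-secret` under `/meta/credentials`); the `authorization_header` helper is hypothetical.

```python
from pathlib import Path


def authorization_header(token_name: str,
                         credentials_dir: str = "/meta/credentials") -> dict:
    """Build an Authorization header from the mounted credential files.

    The files are read on every call rather than cached, on the assumption
    that the mounted secret may be refreshed while the pod is running.
    """
    base = Path(credentials_dir)
    token_type = (base / f"{token_name}-token-type").read_text().strip()
    secret = (base / f"{token_name}-token-secret").read_text().strip()
    return {"Authorization": f"{token_type} {secret}"}
```

The returned dictionary can be passed as request headers to whichever HTTP client the application uses, for example `authorization_header("read-only")` for the token read by the script above.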
OPERATIONS
&
MONITORING
OPERATIONS
• Cluster updates are automated via the Cluster Lifecycle Manager (CLM)
• CronJob is great, but needs cleanup
• Docker can be a PITA
CLUSTER
UPDATES
LIMIT RANGE
kubectl describe limitrange
Name:       limits
Namespace:  default
Type       Resource  Min  Max   Default Request  Default Limit  Max Limit/Request Ratio
----       --------  ---  ---   ---------------  -------------  -----------------------
Container  memory    -    64Gi  100Mi            1Gi            -
Container  cpu       -    16    100m             3              -
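To make the effect of these defaults concrete, here is a simplified sketch of how missing requests and limits get filled in for a container. It mirrors the values in the table above but is not the actual admission code: it performs no validation against the Min/Max columns, and the `apply_defaults` helper is hypothetical.

```python
# Values taken from the LimitRange shown above.
DEFAULT_REQUEST = {"memory": "100Mi", "cpu": "100m"}
DEFAULT_LIMIT = {"memory": "1Gi", "cpu": "3"}


def apply_defaults(resources: dict) -> dict:
    """Fill in missing container requests/limits (simplified, no validation)."""
    requests = dict(resources.get("requests", {}))
    limits = dict(resources.get("limits", {}))
    for resource in DEFAULT_REQUEST:
        user_set_limit = resource in limits
        if resource not in requests:
            # A limit without a request makes the request equal to the limit;
            # otherwise the LimitRange default request applies.
            requests[resource] = (
                limits[resource] if user_set_limit else DEFAULT_REQUEST[resource]
            )
        if not user_set_limit:
            limits[resource] = DEFAULT_LIMIT[resource]
    return {"requests": requests, "limits": limits}


print(apply_defaults({}))
print(apply_defaults({"limits": {"memory": "2Gi"}}))
```

A container that declares nothing therefore ends up requesting 100Mi of memory and 100m of CPU, which is also what a request-based cluster autoscaler would count for it.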
MONITORING
SIMPLE ZMON CHECK/ALERT EXAMPLE
MONITORING
• Each cluster contains ZMON appliance
• K8s resources are available as ZMON entities
• Users can create app checks/alerts via UI
https://github.com/hjacobs/kube-ops-view
OPEN SOURCE
OPEN SOURCE
Kube AWS Ingress Controller
https://github.com/zalando-incubator/kube-ingress-aws-controller
External DNS
https://github.com/kubernetes-incubator/external-dns
Zalando Cluster Config & Docs
https://github.com/zalando-incubator/kubernetes-on-aws
more to come...
QUESTIONS?
HENNING JACOBS
TECH INFRASTRUCTURE
CLOUD ENGINEER
henning@zalando.de
@try_except_
