Kubernetes
Learning from Zero to Production
October 11, 2018
@joatmon08 @TWTechTalksNYC
Who are we?
Rosemary Wang
@joatmon08
Infra @Thoughtworks
Abel Tamrat
Dev @Thoughtworks
What’s the Story?
“Build a Platform”
1. Must be hosted on public cloud.
Use AWS (existing skills & compliance).
2. Avoid vendor lock-in when possible.
Use Kubernetes (cross-cloud).
3. Use SaaS products when available.
Use ??? since EKS was not GA at the time.
Building Capability is Difficult
Where do I start?
What’s actually useful?
Running it in production!?
Constraints
Time
Developer Capacity
Active Development
I don’t need to be an expert…
I just need to know enough
& pivot if things don’t work or change.
Table of Contents
1. Start with a local tutorial. (Minikube)
2. What does it do?! (Deployments)
3. Put it on the cloud. (kops)
4. Somehow, reach my application. (Service Ingress)
5. Manage logging & metrics agents. (DaemonSets)
6. Manage stateful stuff like Consul. (StatefulSets)
7. We need more resources! (Autoscaling)
8. Testing, testing. 1, 2, 3. (Jobs)
9. “Good enough” security. (Secrets & More)
10. The cluster needs to use the new image. (Cluster Rolling Upgrade)
11. Goodbye, cluster! (Backup & Restore)
12. What’s still in backlog after go-live?
Suspend Your Disbelief
1. Business Case for Containers / Kubernetes
2. Kubernetes Basics, A to Z
3. Kubernetes Troubleshooting
4. Kubernetes Internals by the Bits
5. Advanced Secrets Management
6. kops internals by the Bits
7. Advanced Key-Value Stores
8. Building Operational Knowledge (with Chaos Pygmy Marmoset)
Questions? Please wait until the end!
Want the slides? Check Meetup & Twitter!
Prologue
How do we manage these containers?
a. Group them.
b. Write some code to schedule them on resources.
c. Build some connectivity to bridge them all together.
d. Identify them in a human-friendly way.
e. All of the above! ✓
Master: kube-apiserver, etcd, kube-scheduler, kube-controller-manager
Nodes 1…N: kubelet, kube-proxy, container runtime (docker)
https://kubernetes.io/docs/concepts/overview/components/#master-components
Start with a local tutorial.
(Minikube)
What is Minikube?
a. A smaller-than-average Kubernetes cluster.
b. A tool to deploy a local Kubernetes cluster. ✓
c. A packing cube that compresses garments.
d. An alias for famous hip-hop star, Lil Kube.
https://kubernetes.io/docs/tutorials/hello-minikube/
What does it do?
(Deployments)
Pods
Smallest unit of service
Groups of containers and/or volumes
Shared storage / network
PodSpec: "I want container xyz on port 123 & container abc on port 456."
https://kubernetes.io/docs/tutorials/k8s101/
https://kubernetes.io/docs/concepts/workloads/controllers/deployment/
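That PodSpec might be written as the following manifest (the images and names are illustrative placeholders, not from the deck):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  containers:
    - name: xyz
      image: example/xyz:latest   # illustrative image
      ports:
        - containerPort: 123
    - name: abc
      image: example/abc:latest   # illustrative image
      ports:
        - containerPort: 456
```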
Deployments
(Our) commonly used construct
Consists of pods
Reconciliation loop (always maintain desired state)
[Diagram: a Deployment with two identical pods, "And I want 2 pods."]
https://kubernetes.io/docs/tutorials/k8s101/
https://kubernetes.io/docs/concepts/workloads/controllers/deployment/
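As a manifest, the "2 pods" Deployment might look like this (names and images are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: helloworld
spec:
  replicas: 2                      # "And I want 2 pods."
  selector:
    matchLabels:
      app: helloworld
  template:
    metadata:
      labels:
        app: helloworld
    spec:
      containers:
        - name: xyz
          image: example/xyz:latest   # illustrative image
          ports:
            - containerPort: 123
```

The reconciliation loop keeps the observed pod count converging on `replicas`.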
Put it on the cloud.
(kops)
What to use?
https://kubernetes.io/docs/setup/pick-right-solution/#table-of-solutions
43 on the list
(as of October 2018)
kops
https://github.com/kubernetes/kops
AWS
CRUD operations
Configurable (Multi-Master, Bastion for SSH Access)
Dry Runs
Deployment Pipeline
1. Dry run: kops update cluster
2. Apply: kops update cluster --yes
3. Test: our own tests + Kubernetes Conformance (Sonobuoy)
Community tools can give the
illusion of managed services.
Watch out for versioning.
Go straight to the source code
for documentation.
Somehow, reach my application.
(Service Ingress)
Service
"I want to reach my app at helloworld on Port 80."
[Diagram: the kube-apiserver and Nodes 1…N, with the Service routing to pods across nodes]
[Diagram: three ways to expose helloworld.default. Type: ClusterIP (reachable only inside the cluster); Type: LoadBalancer (DNS alias helloworld.com pointing at a cloud load balancer); Ingress Controller (mycluster.com/hello, routed by path /hello behind a shared load balancer)]
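A minimal Service manifest for the LoadBalancer case might look like this (names and ports are illustrative, not from the deck):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: helloworld
spec:
  type: LoadBalancer    # use ClusterIP instead to keep it cluster-internal
  selector:
    app: helloworld     # matches the pod labels (assumed)
  ports:
    - port: 80          # "reach my app at helloworld on Port 80"
      targetPort: 8080  # container port (illustrative)
```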
Should we go for an ingress controller?
a. Yes! It’s a standard for microservices.
b. Yes! We need a reverse proxy.
c. Not now, it will add complexity (and scale isn’t a concern). ✓
d. I searched it and it’s an augmented reality game…?
Less complexity is less to debug.
Only expose the services you need.
Manage logging & metrics agents.
(DaemonSets)
Our Goal
Keep it simple for the applications.
Push… nah.
[Diagram: the Application pushes logs to the Logging System]
Pull!
[Diagram: the Logging System pulls from the Application]
[Diagram: Sidecars, an agent container inside every app pod, vs. DaemonSets, one agent pod per node serving all app pods on that node]
Metrics
The node agent scrapes each app pod's metrics endpoint (Prometheus-formatted) and forwards metrics to the metrics backend.
Autodiscovery via pod annotations:
annotations:
  prometheus.io/scrape: 'true'
  prometheus.io/port: "8080"
  prometheus.io/path: "/metrics"
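On the application side, those autodiscovery annotations sit in the pod template metadata. A sketch (deployment name and image are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: helloworld
spec:
  replicas: 1
  selector:
    matchLabels:
      app: helloworld
  template:
    metadata:
      labels:
        app: helloworld
      annotations:
        prometheus.io/scrape: 'true'   # node agent autodiscovers this pod
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"
    spec:
      containers:
        - name: helloworld
          image: example/helloworld:latest  # illustrative image
```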
Logging
The node agent tails /var/log/containers/*.log (the kubelet writes each container's stdout and stderr to log files there) and forwards logs to the logging backend.
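A log-forwarding DaemonSet sketch that mounts the node's log directory (agent image is illustrative):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-agent
spec:
  selector:
    matchLabels:
      app: log-agent
  template:
    metadata:
      labels:
        app: log-agent
    spec:
      containers:
        - name: agent
          image: example/log-agent:latest   # illustrative image
          volumeMounts:
            - name: varlog
              mountPath: /var/log/containers
              readOnly: true                # tails the *.log files written on the node
      volumes:
        - name: varlog
          hostPath:
            path: /var/log/containers
```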
Manage stateful stuff like Consul.
(StatefulSets)
What do statefulsets allow that deployments don’t?
a. Scaling.
b. Rolling upgrades.
c. Persistent storage.
d. Sticky identities for pods. ✓
Pod Identity Can Be Important
[Diagram: Member-1, Member-2, Member-3, with replication between members and requests forwarded to the leader]
Deployments Can Have Persistence
[Diagram: a Deployment managing a single pod]
Statefulsets Make Persistence Easier
[Diagram: Member-1, Member-2, Member-3, with replication between members and requests forwarded to the leader]
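A StatefulSet sketch for a member cluster like Consul (names, image tag, and sizes are illustrative):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: consul
spec:
  serviceName: consul          # a headless Service gives each pod a stable DNS name
  replicas: 3                  # pods come up in order as consul-0, consul-1, consul-2
  selector:
    matchLabels:
      app: consul
  template:
    metadata:
      labels:
        app: consul
    spec:
      containers:
        - name: consul
          image: consul:1.2    # illustrative tag
  volumeClaimTemplates:        # one persistent volume per pod, reattached on reschedule
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
```

The stable names plus per-pod volumes are what give each member its sticky identity.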
We need more resources!
(Autoscaling)
Cluster Autoscaling
https://github.com/kubernetes/autoscaler
Pending pod: "Schedule me!" → Cluster Autoscaler: "Let me add more. I'll scale up by 1." → Pod: "I'm scheduled!"
Caution: possibly conflicting permissions with backup...
Pod Autoscaling
https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/
https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale
Lots of requests = I need more pods.
Horizontal Pod Autoscaler to the rescue!
(Scales on CPU by default.)
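A HorizontalPodAutoscaler sketch targeting CPU (target name and thresholds are illustrative):

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: helloworld
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: helloworld
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 80  # add pods when average CPU exceeds 80%
```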
Metrics source by version:
<1.8: heapster
>1.11: metrics-server
1.9??
Why don’t my queue consumers scale up?
a. Cookie monster is consuming my queue consumers.
b. They don’t process messages fast enough.
c. They are broken.
d. They pull one message at a time from the queue. ✓
https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#support-for-metrics-APIs
Autoscaling reduces not only nodes but also alerts.
Release notes are very valuable.
Application behavior can dictate scaling metrics.
Testing, testing. 1, 2, 3.
(Jobs)
Scenario
[Diagram: a CI server outside the cluster reaches the k8s API server through a load balancer; "My App" runs inside the cluster]
How do you test?
How can we test it (realistically)?
Test it the same way as the applications that run on it.
(Our CI framework is not inside of the cluster.)
We tried...
kube-proxy
kubectl exec
jobs, jobs, jobs
How do you test?
[Diagram: the CI server asks the k8s API server (through the load balancer) to run a Test Job inside the cluster, alongside "My App"]
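A test Job sketch (image, command, and name are illustrative):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: smoke-test
spec:
  backoffLimit: 0            # fail fast; CI checks the Job status
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: tests
          image: example/app-tests:latest   # illustrative test image
          command: ["./run-tests.sh"]       # hits the app over the cluster network
```

Because the Job runs inside the cluster, the tests exercise the same network path the applications use.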
“Good Enough” Security.
(Secrets & More)
The Usual Suspects
● Security Groups (Thanks kops!)
● Load Balancer Restrictions
● Internal Load Balancers
● Private Networks
● Identity & Access Management
● API Authorization for Services
● (Primitive) Kubernetes RBAC
● NO Dashboard
Secrets Management (Initially)
● Operational Complexity
● Time Constraints
Secrets Management
KMS + Parameter Store (access control, encrypted backend) → injected via pipeline → Kubernetes Secret → environment variable
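That flow might be wired up like this (the secret name, key, and placeholder value are illustrative):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: app-credentials
type: Opaque
stringData:
  DB_PASSWORD: "<value pulled from Parameter Store by the pipeline>"
---
# In the pod spec, the secret surfaces as an environment variable:
# env:
#   - name: DB_PASSWORD
#     valueFrom:
#       secretKeyRef:
#         name: app-credentials
#         key: DB_PASSWORD
```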
Vulnerability Management
Containers: everything except anything with kernel-level dependencies
Virtual Machines: CIS-hardened; the kops Debian image had vulnerabilities
Security: Discover → Remediate → Test (Functions)
The cluster needs to
use the new image.
(Cluster Rolling Upgrade)
kops to the rescue!
kops rolling-update cluster <cluster name>
kops rolling-update cluster <cluster name> --yes
How it works
[Diagram: one node at a time is cordoned, drained, then replaced, until the whole cluster runs the new image]
DIY!
Cordon → Drain → Delete → Validate
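The DIY steps can be sketched with kubectl and kops (node name is a placeholder; this assumes the cloud autoscaling group replaces the deleted VM):

```shell
NODE=ip-10-0-1-23.ec2.internal             # placeholder node name
kubectl cordon "$NODE"                     # stop scheduling new pods here
kubectl drain "$NODE" --ignore-daemonsets  # evict pods onto other nodes
kubectl delete node "$NODE"                # remove it; the ASG brings up a replacement
kops validate cluster                      # confirm the cluster is healthy again
```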
Goodbye, cluster!
(Backup & Restore)
Problem
When things reeeeally go south,
how can we recover quickly?
Solutions
Cluster Backups
https://github.com/heptio/ark
1. ark asks the k8s API server: "What's running in the cluster?"
2. ark stores the API objects in S3
Cluster Restores
https://github.com/heptio/ark
1. CI server: "Restore from latest backup"
2. ark: "Give me the latest backup" (from S3)
3. ark tells the k8s API server: "Recreate these API objects"
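With ark's CLI (as of 2018; the project later became Velero), the backup/restore flow looked roughly like this (backup name is a placeholder):

```shell
ark backup create nightly                  # snapshot the cluster's API objects to S3
ark backup get                             # list available backups
ark restore create --from-backup nightly   # recreate those objects in the cluster
```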
What’s still in backlog after go-live?
More Scale!
● Ingress Controller
● Sidecars for Handling Certificates
● Kubernetes OIDC
● Secrets Management
● Secrets Injection via Init Containers
Which of the following would NOT give us more security?
a. Ephemeral VMs (Bastions, Masters, and Nodes)
b. Cats on a Turntable ✓
c. PodSecurityPolicy
d. NetworkPolicy
Managed Kubernetes?
Maybe some struggles go
away…
● Cluster Autoscaler
● Virtual Machine Images
● CI in the Cluster
Questions don’t!
● How do we do scalable RBAC?
● Init container with secrets?
● Resiliency testing when upgrading?
● How do we test our applications & components?
Epilogue (for now)
Did we learn enough?
The good news:
We didn’t hear...
“You’re blocking a deployment to production”.
The good news:
We made it to production.
There is a lot in the Kubernetes ecosystem.
A community is a powerful resource.
Be hands-on.
Expertise is in the ability to learn.
https://github.com/kubernetes/kubernetes/issues/44308
Thank you!