Kubernetes: The Very Hard Way

@roboll_
Kubernetes:
The Very Hard Way
A tale of early adoption.
Rob Boll
Compute Lead

A bit of background
It’s 2016:
● Datadog is running entirely in AWS in one region
● Our EC2 hosts are configured with Chef
● Software is deployed using Capistrano and Chef

A bit of background
The challenge:
● Replicate Datadog
○ In a second region
○ On a different cloud provider

An opportunity
Provide a proper platform:
● Native support for multiple cloud providers
● Native support for stateful workloads
● API driven and automation friendly
● Meet our projected scale

So, what is this talk about?
This is the talk we wish someone had given us at the beginning.
● What works?
● What’s broken?
● How can I avoid surprises?
● Hard-earned lessons

Toolbox Pattern
● A toolbox is a pod that does nothing
○ Deployed alongside workloads
○ Image contains tools for ops
● Allow operators to use familiar tools
○ Access a shell using kubectl exec
Allow operators to gradually build cloud native tools
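A minimal sketch of what a toolbox pod could look like, written as a Python helper that emits a JSON manifest (kubectl accepts JSON as well as YAML). The pod name, namespace, label, and image are hypothetical, and `sleep infinity` assumes the image ships a `sleep` that accepts that argument (GNU coreutils does).

```python
import json


def toolbox_pod(name, namespace, image):
    """Build a Pod manifest whose only container idles forever."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "labels": {"role": "toolbox"},  # hypothetical label
        },
        "spec": {
            "containers": [
                {
                    "name": "toolbox",
                    "image": image,  # image that bundles the ops tools
                    # Do nothing; the pod exists only so operators can exec in.
                    "command": ["sleep", "infinity"],
                }
            ]
        },
    }


def exec_shell_command(pod):
    """Return the kubectl command that opens a shell in the toolbox."""
    meta = pod["metadata"]
    return ["kubectl", "exec", "-it", "-n", meta["namespace"],
            meta["name"], "--", "/bin/sh"]


if __name__ == "__main__":
    pod = toolbox_pod("toolbox", "my-team",
                      "registry.example.com/ops/toolbox:1.0")
    print(json.dumps(pod, indent=2))  # could be piped into: kubectl apply -f -
    print(" ".join(exec_shell_command(pod)))
```

The manifest is deliberately tiny: everything useful lives in the image, so operators keep their familiar tools while the pod itself stays trivial to deploy next to any workload.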

Native Pod Routing
● Overlay networks are expensive!
○ Encapsulated traffic (VXLAN, IPIP, etc.)
○ Bridging from host to container
● CNI provides flexibility in networking implementation
○ Plugins configure networking
Put pods on the native network for performance and simplicity
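To make part of the encapsulation cost concrete, here is a small sketch of the per-packet header overhead for the two encapsulations named above, assuming an IPv4 underlay. It covers only bytes on the wire; the CPU cost of encapsulation and bridging is not modeled. The function name and the 1500-byte link MTU are illustrative assumptions.

```python
# Bytes of extra headers added to each packet, for an IPv4 underlay.
ENCAP_OVERHEAD_BYTES = {
    "native": 0,  # pod IPs routed directly on the host network
    "ipip": 20,   # one extra IPv4 header
    "vxlan": 50,  # outer IPv4 (20) + UDP (8) + VXLAN (8) + inner Ethernet (14)
}


def pod_mtu(link_mtu, mode):
    """Largest IP packet a pod can send without fragmenting on the underlay."""
    return link_mtu - ENCAP_OVERHEAD_BYTES[mode]


if __name__ == "__main__":
    for mode in ENCAP_OVERHEAD_BYTES:
        print(f"{mode:>6}: pod MTU {pod_mtu(1500, mode)} on a 1500-byte link")
```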

Container Runtime
● Containerd offers a simpler alternative to Docker
○ Smaller codebase that is more accessible
○ Less real world (independent) use
● Some bad bugs
○ Zombie process causing hung shim
○ Maintainers are very responsive
Containerd has less surface area, but is less mature

Control Plane Topology
● Kubernetes control plane has four components
○ Datastore: etcd
○ API Server
○ Scheduler
○ Controller Manager
● By default, they run colocated
○ On large clusters, this is problematic
○ To scale independently, they can be separated


What’s broken?
Hope is not an option.

Load Balancer Services
● Cloud provider load balancers are integrated tightly into Kubernetes
○ A LoadBalancer service creates a load balancer and attaches every host
○ The kube-proxy on each host forwards traffic to the right pod
● ExternalTrafficPolicy determines which hosts receive traffic
○ With Local, only hosts with local pods receive traffic
○ With Cluster, all hosts in the cluster receive traffic
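The difference between the two policies can be sketched as a small model of which hosts end up receiving load balancer traffic. This is a simplification of the behavior described above, not the cloud provider integration itself; the function name and host names are hypothetical.

```python
def hosts_receiving_traffic(pods_per_host, policy):
    """Return the hosts that get load balancer traffic for a Service.

    pods_per_host maps host name -> number of the Service's pods on that host.
    """
    if policy == "Cluster":
        # Every host is a target; kube-proxy may forward to a pod elsewhere.
        return sorted(pods_per_host)
    if policy == "Local":
        # Only hosts running at least one of the Service's pods are targets.
        return sorted(h for h, count in pods_per_host.items() if count > 0)
    raise ValueError(f"unknown externalTrafficPolicy: {policy!r}")


if __name__ == "__main__":
    placement = {"host-a": 2, "host-b": 0, "host-c": 1}
    print("Cluster:", hosts_receiving_traffic(placement, "Cluster"))
    print("Local:  ", hosts_receiving_traffic(placement, "Local"))
```

With Cluster, the target set grows with the cluster rather than with the Service, which is worth keeping in mind on large clusters.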

Pod Native Ingress
● Pod Native Ingress means that traffic is sent directly to pods
○ Requires routable pod IPs and a cloud provider abstraction
● No support for TCP
○ We’re working on support using L4 load balancers and custom resources

PKI
● PKI is used everywhere
○ Control plane, kubelet, webhook configurations, aggregated APIs, etc.
● No proper support for rotating credentials
etcd-io/etcd#9541 - etcd doesn’t reload certificates for connections to ip addresses
kubernetes/kubernetes#4672 - key/certificate rotation for kubernetes clients

PKI Workarounds
● Careful orchestration to enable rotation
● No solution from the community yet
● Several issues with work in progress

Ecosystem
● Dynamic community that is very eager to engage
● Many components lack production use and testing at scale
kubernetes/autoscaler - issues with more than 50 node groups
kubernetes/kube-state-metrics - huge payload, not easily partitioned
kubernetes-incubator/external-dns - batch size, headless services, rate limits

Carefully vet your dependencies
● Kubernetes is highly automatable
○ Which means everyone is producing something
● Be careful what you pick up off the shelf

Cargo Culting
How can I keep a container running on Kubernetes?
https://stackoverflow.com/questions/31870222/how-can-i-keep-container-running-on-kubernetes

Invest in training
● The technology is new! For everyone!
● Engineers will find a way, and it may not be pretty
Give teams the tools and resources they need to succeed!

Namespace Organization
● “A single namespace is simpler...”
● Not concerned with isolation (for now)
● Data in etcd is organized by path
○ Performance degrades with poor distribution
Single namespace is a Bad Idea™

Namespaces are more than just access control
● Large namespaces are difficult to deal with
○ API responses are slow
○ CLI output is unreadable
● How big should a Namespace be?
○ Rough guideline: ~3k pods per namespace
○ Large clusters support hundreds of Namespaces
Organize Namespaces to limit the number of objects
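As a rough illustration of applying the ~3k pods guideline, the sketch below flags namespaces that exceed it and estimates how many namespaces a workload of a given size would need. The function names and the sample pod counts are hypothetical, and the limit is the rough guideline above, not a hard Kubernetes limit.

```python
import math

PODS_PER_NAMESPACE = 3000  # rough guideline from the talk, not a hard limit


def oversized_namespaces(pod_counts, limit=PODS_PER_NAMESPACE):
    """Return the namespaces whose pod count exceeds the guideline."""
    return {ns: n for ns, n in pod_counts.items() if n > limit}


def namespaces_needed(total_pods, limit=PODS_PER_NAMESPACE):
    """Minimum number of namespaces that keeps each one within the guideline."""
    return math.ceil(total_pods / limit)


if __name__ == "__main__":
    counts = {"default": 12500, "team-a": 800, "team-b": 2900}
    print("over guideline:", oversized_namespaces(counts))
    print("namespaces needed for 'default':", namespaces_needed(counts["default"]))
```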

“One of my pods isn’t running...”
● Pods fail scheduling with an error:
○ Image tag “latest” is not allowed
○ Where is the error coming from?
○ Why is it surfaced at runtime?
● Validating admission webhook registered on all pods
○ When pods are rescheduled, they fail the validation
○ Pods are often rescheduled when no user is present

Avoid Pod admission webhooks
● Admission webhooks are great for giving users feedback
○ Only at deploy time, never at runtime
● Pods are not controlled by users directly!
○ Usually driven by a workload controller
○ Unpredictable life cycle
Admission webhooks on pods give unactionable feedback
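One way to read this advice is to validate the objects users actually submit, such as a Deployment, so that a rejection shows up at deploy time. The sketch below is an assumption about how such a check could look: it only builds the AdmissionReview response and leaves out the HTTPS server and the webhook registration. `image_tag` and `review_deployment` are hypothetical names, and init containers are not checked.

```python
def image_tag(image):
    """Return the tag of an image reference, or None if pinned by digest."""
    if "@" in image:
        return None
    last_segment = image.rsplit("/", 1)[-1]  # skip any registry host:port
    if ":" in last_segment:
        return last_segment.rsplit(":", 1)[1]
    return "latest"  # an untagged reference resolves to "latest"


def review_deployment(admission_review):
    """Build an AdmissionReview response for a Deployment create or update."""
    request = admission_review["request"]
    containers = request["object"]["spec"]["template"]["spec"]["containers"]
    bad = [c["image"] for c in containers if image_tag(c["image"]) == "latest"]
    response = {"uid": request["uid"], "allowed": not bad}
    if bad:
        # The user who submitted the Deployment sees this message immediately.
        response["status"] = {
            "message": 'Image tag "latest" is not allowed: ' + ", ".join(bad)
        }
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }


if __name__ == "__main__":
    review = {
        "request": {
            "uid": "example-uid",
            "object": {"spec": {"template": {"spec": {"containers": [
                {"name": "app", "image": "registry.example.com/app:latest"},
            ]}}}},
        }
    }
    print(review_deployment(review)["response"])
```

Because the check runs when the Deployment is created or updated, pods that are later rescheduled by the controller never hit it.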

Stampede!
● We’re alerted by a sustained increase in image pulls
● A DaemonSet is crash looping on all clusters in a region
● Things escalate: all image pulls start failing
● We’re rate limited by our image registry

Avoid imagePullPolicy: Always
● The image was present on all hosts
○ Each crash triggered a new pull because of the imagePullPolicy
● imagePullPolicy: Always is useful for dynamic tags
○ Dynamic tags are unpredictable
Avoid dynamic image tags and imagePullPolicy: Always
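The stampede can be approximated with a toy model that counts how many times hosts contact the registry during a crash loop under each policy. This is a simplified sketch: restart backoff is ignored, and the host and restart counts are made-up numbers for illustration.

```python
def registry_requests(policy, container_starts, image_cached=True):
    """Count registry requests one host makes for one container."""
    if policy not in ("Always", "IfNotPresent"):
        raise ValueError(f"unsupported imagePullPolicy: {policy!r}")
    requests = 0
    cached = image_cached
    for _ in range(container_starts):
        # Always contacts the registry on every start, even with a local copy.
        if policy == "Always" or not cached:
            requests += 1
            cached = True
    return requests


if __name__ == "__main__":
    hosts, restarts = 1000, 20  # hypothetical fleet size and crash count
    for policy in ("Always", "IfNotPresent"):
        total = hosts * registry_requests(policy, restarts)
        print(f"{policy}: {total} registry requests")
```

With the image already on every host, IfNotPresent generates no registry traffic at all, while Always scales with hosts times restarts.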

Paying the early adopter tax
● It’s expensive!
○ Progress slows down
○ Users can get frustrated

Communicate with your customers
● Communicate clearly!
○ If users don’t understand the situation, they become frustrated
● Share successes, challenges, and plans

Incidents as an early adopter
● Two fundamental approaches
○ Restore service immediately, debug with forensics
■ Requires a high level of confidence in forensic data
○ Investigate causes in real time
■ Can extend disruption, not always an option
As an early adopter, forensics aren’t always reliable

The Very Hard Way
In summary:
● Kubernetes is extremely flexible and powerful
● Many parts of this ecosystem are still very immature
● The community is accessible and eager to help

Bye!
Thanks for listening!
