@roboll_
Kubernetes:
The Very Hard Way
A tale of early adoption.
Rob Boll
Compute Lead
@roboll_
A bit of background
It’s 2016:
● Datadog is running entirely in AWS in one region
● Our EC2 hosts are configured with Chef
● Software is deployed using Capistrano and Chef
A bit of background
The challenge:
● Replicate Datadog
○ In a second region
○ On a different cloud provider
An opportunity
Provide a proper platform:
● Native support for multiple cloud providers
● Native support for stateful workloads
● API driven and automation friendly
● Meet our projected scale
So, what is this talk about?
This is the talk we wish someone had given us at the beginning.
● What works?
● What’s broken?
● How can I avoid surprises?
● Hard-earned lessons
What works?
Toolbox Pattern
● A toolbox is a pod that does nothing
○ Deployed alongside workloads
○ Image contains tools for ops
● Allow operators to use familiar tools
○ Access a shell using kubectl exec
Allow operators to gradually build cloud native tools
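A minimal sketch of what a toolbox pod manifest might look like, built as a Python dict and printed as JSON (kubectl accepts JSON as well as YAML). The pod name, namespace, image reference, and the `sleep infinity` command are assumptions for illustration; the talk does not show the actual manifest, and `sleep infinity` assumes the image ships a `sleep` that accepts it (GNU coreutils does).

```python
import json


def toolbox_pod(name, namespace, image):
    """Build a Pod manifest whose only job is to stay alive for `kubectl exec`."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "containers": [
                {
                    "name": "toolbox",
                    # Hypothetical image that bundles the operators' usual tools.
                    "image": image,
                    # The pod "does nothing": it sleeps until someone execs in.
                    "command": ["sleep", "infinity"],
                }
            ]
        },
    }


if __name__ == "__main__":
    # All three values are placeholders, not taken from the talk.
    manifest = toolbox_pod("toolbox", "my-team", "example.com/ops/toolbox:1.0")
    print(json.dumps(manifest, indent=2))
```

With a manifest like this applied, an operator could open a shell with something like `kubectl exec -it toolbox -n my-team -- sh`.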
Native Pod Routing
● Overlay networks are expensive!
○ Encapsulated traffic (VXLAN, IPIP, etc.)
○ Bridging from host to container
● CNI provides flexibility in networking implementation
○ Plugins configure networking
Put pods on the native network for performance and simplicity
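One measurable part of the overlay cost is the header space that encapsulation takes out of every packet. The sketch below computes the largest packet a pod can send without fragmentation, assuming an IPv4 underlay and the standard header sizes; it says nothing about the CPU cost of encapsulation or bridging, and the 1500-byte underlay MTU is only an example value.

```python
# Per-packet header overhead in bytes, assuming an IPv4 underlay.
ENCAP_OVERHEAD = {
    "native": 0,  # pod IPs routed directly, no encapsulation
    "ipip": 20,   # one extra outer IPv4 header
    "vxlan": 50,  # outer IPv4 (20) + UDP (8) + VXLAN (8) + inner Ethernet (14)
}


def pod_mtu(underlay_mtu, mode):
    """Largest IP packet a pod can send without fragmentation."""
    return underlay_mtu - ENCAP_OVERHEAD[mode]


if __name__ == "__main__":
    for mode in ENCAP_OVERHEAD:
        print(f"{mode:>6}: pod MTU {pod_mtu(1500, mode)}")
```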
Container Runtime
● Containerd offers a simpler alternative to Docker
○ Smaller codebase that is more accessible
○ Less real world (independent) use
● Some bad bugs
○ Zombie process causing hung shim
○ Maintainers are very responsive
Containerd has less surface area, but is less mature
Control Plane Topology
● Kubernetes control plane has four components
○ Datastore: etcd
○ API Server
○ Scheduler
○ Controller Manager
● By default, they run colocated
○ On large clusters, this is problematic
○ To scale independently, they can be separated
Control Plane Topology
What’s broken?
Hope is not an option.
Load Balancer Services
● Cloud provider load balancers are integrated tightly into Kubernetes
○ A LoadBalancer service creates a load balancer and attaches every host
○ The kube-proxy on each host forwards traffic to the right pod
● ExternalTrafficPolicy determines which hosts receive traffic
○ With Local, only hosts with local pods receive traffic
○ With Cluster, all hosts in the cluster receive traffic
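The difference between the two policies can be sketched as a small function. This is a simplified model of which hosts end up receiving traffic, not Kubernetes code; the host names and the `pods_by_host` mapping are hypothetical inputs.

```python
def backend_hosts(policy, all_hosts, pods_by_host):
    """Hosts that receive load balancer traffic for a service.

    policy       -- "Local" or "Cluster" (the externalTrafficPolicy value)
    all_hosts    -- every host in the cluster
    pods_by_host -- number of the service's pods running on each host
    """
    if policy == "Cluster":
        # Every host receives traffic; kube-proxy forwards it to a pod,
        # possibly on another host.
        return sorted(all_hosts)
    if policy == "Local":
        # Only hosts that run at least one of the service's pods receive traffic.
        return sorted(h for h in all_hosts if pods_by_host.get(h, 0) > 0)
    raise ValueError(f"unknown externalTrafficPolicy: {policy}")


if __name__ == "__main__":
    hosts = ["node-a", "node-b", "node-c"]
    pods = {"node-a": 2, "node-c": 1}
    print("Cluster:", backend_hosts("Cluster", hosts, pods))
    print("Local:  ", backend_hosts("Local", hosts, pods))
```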
Load Balancers
Pod Native Ingress
● Pod Native Ingress means that traffic is sent directly to pods
○ Requires routable pod IPs and a cloud provider abstraction
● No support for TCP
○ We’re working on support using L4 load balancers and custom resources
Pod Native Ingress
PKI
● PKI is used everywhere
○ Control plane, kubelet, webhook configurations, aggregated APIs, etc.
● No proper support for rotating credentials
etcd-io/etcd#9541 - etcd doesn’t reload certificates for connections to ip addresses
kubernetes/kubernetes#4672 - key/certificate rotation for kubernetes clients
PKI Workarounds
● Careful orchestration to enable rotation
● No solution from the community yet
● Several issues with work in progress
Ecosystem
● Dynamic community that is very eager to engage
● Many components lack production use and testing at scale
kubernetes/autoscaler - issues with more than 50 node groups
kubernetes/kube-state-metrics - huge payload, not easily partitioned
kubernetes-incubator/external-dns - batch size, headless services, rate limits
Carefully vet your dependencies
● Kubernetes is highly automatable
○ Which means everyone is producing something
● Be careful what you pick up off the shelf
Surprises!
Cargo Culting
How can I keep a container running on Kubernetes?
https://stackoverflow.com/questions/31870222/how-can-i-keep-container-running-on-kubernetes
Invest in training
● The technology is new! For everyone!
● Engineers will find a way, and it may not be pretty
Give teams the tools and resources they need to succeed!
Namespace Organization
● “A single namespace is simpler...”
● Not concerned with isolation (for now)
● Data in etcd is organized by path
○ Performance degrades with poor distribution
Single namespace is a Bad Idea™
Namespaces are more than just access control
● Large namespaces are difficult to deal with
○ API responses are slow
○ CLI output is unreadable
● How big should a Namespace be?
○ Rough guideline: ~3k pods per namespace
○ Large clusters support hundreds of Namespaces
Organize Namespaces to limit the number of objects
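A rough sketch of how the ~3k guideline could be checked. It assumes you already have one namespace name per pod, for example extracted from the output of `kubectl get pods --all-namespaces -o json`; the function name and the way the input is gathered are illustrative, not from the talk.

```python
from collections import Counter

POD_GUIDELINE = 3000  # the talk's rough guideline of ~3k pods per namespace


def oversized_namespaces(pod_namespaces, limit=POD_GUIDELINE):
    """Return {namespace: pod_count} for namespaces above the limit.

    pod_namespaces -- iterable with one namespace name per pod
    """
    counts = Counter(pod_namespaces)
    return {ns: n for ns, n in counts.items() if n > limit}


if __name__ == "__main__":
    # Synthetic data: one namespace over the guideline, one well under it.
    pods = ["everything"] * 4200 + ["team-a"] * 150
    for ns, n in oversized_namespaces(pods).items():
        print(f"{ns}: {n} pods (guideline is ~{POD_GUIDELINE})")
```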
“One of my pods isn’t running...”
● Pods fail scheduling with an error:
○ Image tag “latest” is not allowed
○ Where is the error coming from?
○ Why is it surfaced at runtime?
● Validating admission webhook registered on all pods
○ When pods are rescheduled, they fail the validation
○ Pods are often rescheduled when no user is present
Avoid Pod admission webhooks
● Admission webhooks are great for giving users feedback
○ Only at deploy time, never at runtime
● Pods are not controlled by users directly!
○ Usually driven by a workload controller
○ Unpredictable life cycle
Admission webhooks on pods give unactionable feedback
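One way to keep the feedback at deploy time is to validate the workload object the user actually submits (a Deployment here) instead of the Pods its controller creates later. That choice is an assumption drawn from the advice above, not something the talk spells out. The sketch shows only the validation logic and the shape of an `admission.k8s.io/v1` AdmissionReview response; the HTTP server, TLS, and webhook registration are left out.

```python
def uses_latest(image):
    """True if the image reference resolves to the mutable `latest` tag."""
    if "@" in image:                 # pinned by digest
        return False
    last = image.rsplit("/", 1)[-1]  # skip any registry host:port
    tag = last.split(":", 1)[1] if ":" in last else "latest"  # no tag means latest
    return tag == "latest"


def review_deployment(review):
    """Build an AdmissionReview response for a Deployment create or update."""
    request = review["request"]
    containers = request["object"]["spec"]["template"]["spec"]["containers"]
    bad = [c["image"] for c in containers if uses_latest(c["image"])]
    response = {"uid": request["uid"], "allowed": not bad}
    if bad:
        # The user sees this message when they apply the Deployment.
        response["status"] = {
            "message": 'Image tag "latest" is not allowed: ' + ", ".join(bad)
        }
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }
```

Rejecting the Deployment means the person running the deploy sees the error immediately, rather than a controller hitting it later when a Pod is recreated.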
Stampede!
● We’re alerted by a sustained increase in image pulls
● A DaemonSet is crash looping on all clusters in a region
● Things escalate: all image pulls start failing
● We’re rate limited by our image registry
Avoid imagePullPolicy: Always
● The image was present on all hosts
○ Each crash triggered a new pull because of the imagePullPolicy
● imagePullPolicy: Always is useful for dynamic tags
○ Dynamic tags are unpredictable
Avoid dynamic image tags and imagePullPolicy: Always
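A small lint along these lines could flag the problem before it reaches a cluster. The sketch below takes a pod spec as a dict and lists the containers that would contact the registry on every start. It accounts for the Kubernetes default: when `imagePullPolicy` is unset, an image with the `latest` tag or no tag gets `Always`, and anything else gets `IfNotPresent`. The function names are illustrative.

```python
def effective_pull_policy(container):
    """Pull policy Kubernetes applies, including its default when unset."""
    explicit = container.get("imagePullPolicy")
    if explicit:
        return explicit
    image = container["image"]
    if "@" in image:                 # pinned by digest
        return "IfNotPresent"
    last = image.rsplit("/", 1)[-1]  # skip any registry host:port
    tag = last.split(":", 1)[1] if ":" in last else "latest"
    return "Always" if tag == "latest" else "IfNotPresent"


def always_pull_containers(pod_spec):
    """Names of containers that contact the registry on every container start."""
    return [
        c["name"]
        for c in pod_spec.get("containers", [])
        if effective_pull_policy(c) == "Always"
    ]


if __name__ == "__main__":
    # Hypothetical pod spec for illustration.
    spec = {
        "containers": [
            {"name": "agent", "image": "example.com/agent:7.1.0"},
            {"name": "sidecar", "image": "example.com/sidecar"},
            {"name": "init-sync", "image": "example.com/sync:2.0", "imagePullPolicy": "Always"},
        ]
    }
    print(always_pull_containers(spec))
```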
Lessons Learned
Paying the early adopter tax
● It’s expensive!
○ Progress slows down
○ Users can get frustrated
Communicate with your customers
● Communicate clearly!
○ If users don’t understand the situation, they become frustrated
● Share successes, challenges, and plans
Incidents as an early adopter
● Two fundamental approaches
○ Restore service immediately, debug with forensics
■ Requires a high level of confidence in forensic data
○ Investigate causes in real time
■ Can extend disruption, not always an option
As an early adopter, forensics aren’t always reliable
The Very Hard Way
In summary:
● Kubernetes is extremely flexible and powerful
● Many parts of this ecosystem are still very immature
● The community is accessible and eager to help
Bye!
Thanks for listening!
