SlideShare a Scribd company logo
1 of 28
Download to read offline
11.12.2018
Hardening Kubernetes
Setups: War Stories from
the Trenches of Production
Puja Abbassi - @puja108
- Developer Advocate /
Product Owner @ Giant
Swarm
- #CKA #Security
#Operators
- Data & Network
Science “Almost-PhD”
Puja
@puja108
@puja108 2
customer
product community
1. On running 100+
clusters
2. Postmortems - Lots of
them!
3. Hardening and
Best-Practices
Agenda
@puja108 3
On running 100+
clusters
@puja108 4
- Different Clouds
- Different Regions
- On-Premise
- China
100+ clusters
@puja108 5
- Companies
- Industries
- Users
- Use Cases
Diversity
@puja108 6
- Opt for Freedom
- Educate Users
- Harden up
Freedom vs.
Control
@puja108 7
Postmortems - Lots of them!
@puja108 8
“The primary goals of writing a postmortem are to
ensure that the incident is documented, that all
contributing root cause(s) are well understood, and,
especially, that effective preventive actions are put
in place to reduce the likelihood and/or impact of
recurrence.”
- Google SRE book
Postmortem Philosophy
@puja108 9
1. Gather Issues
2. Fix in Code
3. Roll out continuously
4. Profit 😉
Single Product
@puja108 10
- Issue Template
- High Priority
- Assigned to
x-functional team
Postmortem
Practice
@puja108 11
Load Balancing
Postmortems
@puja108
Team 1
PM Team 2
Team 3
12
500+ Postmortems
@puja108 13
War Stories
@puja108 14
@puja108
Kubernetes
upstream issue:
#57992
(fixed in 1.11.4
and 1.12.0)
15
- Faulty ingress
objects can break
controller
- Lots of teams + lots
of freedom
= lots of issues
Ingress
Controller
Misconfiguration
@puja108 16
@puja108
Ever built a
full-mesh IPIP
tunnels ICMP
pinger?
17
@puja108
Customer Load
Test goes bad?
You take the
blame!
- “Must be Calico,
kube-proxy, IC!”
- Turns out EC2 network
saturation was the
bottleneck
- Solution: More
workers!
18
Hardening and Best-Practices
@puja108 19
- Old versions
- Ingress (~15%)
- Networking & DNS
- Resource Pressure
- Multi-tenancy
Postmortem
Hotspots
@puja108 20
- Issues might have
been solved already
- CVEs
- Test Upgrades
extensively
- Automate Upgrades (or
have a process)
Old versions
@puja108 21
- NGINX IC: Newer
versions are less
prone to
misconfiguration
- Separate controllers
- Load- and
failover-testing
- Last resort:
SVC of type LB
Ingress
@puja108 22
- Monitor network
health
- Monitor DNS latency
- Check for known
issues
- Apply best practices
Networking & DNS
@puja108 23
- Resource Management!
- Include Buffers (lots
of them)
- Protect K8s and
critical addons
(priority)
Resource
Pressure
@puja108 24
Multi-tenancy
@puja108 25
- Separate and isolate
namespaces with RBAC
- No cluster-admins!
- Separate clusters if
possible
- Automate with CI/CD
- Minimize manual ops
Best Practices
@puja108 26
- Preemptive Monitoring &
Alerting are key!
- Logging (and Tracing)
help debugging
- Fix issues fast
- Educate users
- Have a postmortem process
- Train Recovery
Stand on the Shoulders of
Giants!
@puja108 27
- Kubernetes the very hard way - Datadog
- Scaling Kubernetes to 2,500 Nodes - OpenAI
- 5 - 15s DNS lookups on Kubernetes? - BitMEX
- Scaling CoreDNS in Kubernetes Clusters
- Inside Kubernetes Resource Management (QoS) -
Michael Gasch
- List of Kubernetes Best Practice talks/blogs
- Kubernetes Office Hours
Questions?
Stay in touch
- Twitter: @puja108
- Github: puja108
- Slack/Discuss: puja
@puja108 28
Thank you!

More Related Content

Similar to Hardening Kubernetes Setups: War Stories from the Trenches of Production - Puja Abbassi, Giant Swarm

Kubernetes Monitoring & Best Practices
Kubernetes Monitoring & Best PracticesKubernetes Monitoring & Best Practices
Kubernetes Monitoring & Best PracticesAjeet Singh Raina
 
Strata Conference + Hadoop World NY 2016: Lessons learned building a scalable...
Strata Conference + Hadoop World NY 2016: Lessons learned building a scalable...Strata Conference + Hadoop World NY 2016: Lessons learned building a scalable...
Strata Conference + Hadoop World NY 2016: Lessons learned building a scalable...Sumeet Singh
 
Dataservices based on mesos and kafka kostiantyn bokhan dataconf 21 04 18
Dataservices based on mesos and kafka kostiantyn bokhan dataconf 21 04 18Dataservices based on mesos and kafka kostiantyn bokhan dataconf 21 04 18
Dataservices based on mesos and kafka kostiantyn bokhan dataconf 21 04 18Olga Zinkevych
 
Desplegar en la nube y no morir en el intento - Plain Concepts Dev Day
Desplegar en la nube y no morir en el intento - Plain Concepts Dev DayDesplegar en la nube y no morir en el intento - Plain Concepts Dev Day
Desplegar en la nube y no morir en el intento - Plain Concepts Dev DayPlain Concepts
 
Practical Petabyte Pushing
Practical Petabyte PushingPractical Petabyte Pushing
Practical Petabyte PushingChris Dagdigian
 
Systemd evolution revolution_regression
Systemd evolution revolution_regressionSystemd evolution revolution_regression
Systemd evolution revolution_regressionSusant Sahani
 
Kubernetes Cairo Meetup_dec_2019
Kubernetes Cairo Meetup_dec_2019Kubernetes Cairo Meetup_dec_2019
Kubernetes Cairo Meetup_dec_2019Ahmed Atef
 
Journey to Containerized Application / Google Container Engine
Journey to Containerized Application / Google Container EngineJourney to Containerized Application / Google Container Engine
Journey to Containerized Application / Google Container EngineGoogle Cloud Platform - Japan
 
DataStax | Best Practices for Securing DataStax Enterprise (Matt Kennedy) | C...
DataStax | Best Practices for Securing DataStax Enterprise (Matt Kennedy) | C...DataStax | Best Practices for Securing DataStax Enterprise (Matt Kennedy) | C...
DataStax | Best Practices for Securing DataStax Enterprise (Matt Kennedy) | C...DataStax
 
Choosing the right parallel compute architecture
Choosing the right parallel compute architecture Choosing the right parallel compute architecture
Choosing the right parallel compute architecture corehard_by
 
3450 - Writing and optimising applications for performance in a hybrid messag...
3450 - Writing and optimising applications for performance in a hybrid messag...3450 - Writing and optimising applications for performance in a hybrid messag...
3450 - Writing and optimising applications for performance in a hybrid messag...Timothy McCormick
 
Unleashing k8 s to reduce complexities of an entire middleware platform
Unleashing k8 s to reduce complexities of an entire middleware platformUnleashing k8 s to reduce complexities of an entire middleware platform
Unleashing k8 s to reduce complexities of an entire middleware platformLakmal Warusawithana
 
The SaltStack Pub Crawl - Fosscomm 2016
The SaltStack Pub Crawl - Fosscomm 2016The SaltStack Pub Crawl - Fosscomm 2016
The SaltStack Pub Crawl - Fosscomm 2016effie mouzeli
 
Macy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-FlightMacy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-FlightDataStax Academy
 
Evolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server SolutionEvolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server SolutionNVIDIA Taiwan
 
Dataweave Libraries and ObjectStore
Dataweave Libraries and ObjectStoreDataweave Libraries and ObjectStore
Dataweave Libraries and ObjectStoreVikalp Bhalia
 
Kubernetes the deltatre way the basics - introduction to containers and orc...
Kubernetes the deltatre way   the basics - introduction to containers and orc...Kubernetes the deltatre way   the basics - introduction to containers and orc...
Kubernetes the deltatre way the basics - introduction to containers and orc...Rauno De Pasquale
 
Introduction To Apache Mesos
Introduction To Apache MesosIntroduction To Apache Mesos
Introduction To Apache MesosTimothy St. Clair
 

Similar to Hardening Kubernetes Setups: War Stories from the Trenches of Production - Puja Abbassi, Giant Swarm (20)

Kubernetes Monitoring & Best Practices
Kubernetes Monitoring & Best PracticesKubernetes Monitoring & Best Practices
Kubernetes Monitoring & Best Practices
 
Strata Conference + Hadoop World NY 2016: Lessons learned building a scalable...
Strata Conference + Hadoop World NY 2016: Lessons learned building a scalable...Strata Conference + Hadoop World NY 2016: Lessons learned building a scalable...
Strata Conference + Hadoop World NY 2016: Lessons learned building a scalable...
 
Dataservices based on mesos and kafka kostiantyn bokhan dataconf 21 04 18
Dataservices based on mesos and kafka kostiantyn bokhan dataconf 21 04 18Dataservices based on mesos and kafka kostiantyn bokhan dataconf 21 04 18
Dataservices based on mesos and kafka kostiantyn bokhan dataconf 21 04 18
 
Desplegar en la nube y no morir en el intento - Plain Concepts Dev Day
Desplegar en la nube y no morir en el intento - Plain Concepts Dev DayDesplegar en la nube y no morir en el intento - Plain Concepts Dev Day
Desplegar en la nube y no morir en el intento - Plain Concepts Dev Day
 
Scaling MLOps on NVIDIA DGX Systems
Scaling MLOps on NVIDIA DGX SystemsScaling MLOps on NVIDIA DGX Systems
Scaling MLOps on NVIDIA DGX Systems
 
Practical Petabyte Pushing
Practical Petabyte PushingPractical Petabyte Pushing
Practical Petabyte Pushing
 
Systemd evolution revolution_regression
Systemd evolution revolution_regressionSystemd evolution revolution_regression
Systemd evolution revolution_regression
 
Kubernetes Cairo Meetup_dec_2019
Kubernetes Cairo Meetup_dec_2019Kubernetes Cairo Meetup_dec_2019
Kubernetes Cairo Meetup_dec_2019
 
Journey to Containerized Application / Google Container Engine
Journey to Containerized Application / Google Container EngineJourney to Containerized Application / Google Container Engine
Journey to Containerized Application / Google Container Engine
 
DataStax | Best Practices for Securing DataStax Enterprise (Matt Kennedy) | C...
DataStax | Best Practices for Securing DataStax Enterprise (Matt Kennedy) | C...DataStax | Best Practices for Securing DataStax Enterprise (Matt Kennedy) | C...
DataStax | Best Practices for Securing DataStax Enterprise (Matt Kennedy) | C...
 
Choosing the right parallel compute architecture
Choosing the right parallel compute architecture Choosing the right parallel compute architecture
Choosing the right parallel compute architecture
 
3450 - Writing and optimising applications for performance in a hybrid messag...
3450 - Writing and optimising applications for performance in a hybrid messag...3450 - Writing and optimising applications for performance in a hybrid messag...
3450 - Writing and optimising applications for performance in a hybrid messag...
 
Unleashing k8 s to reduce complexities of an entire middleware platform
Unleashing k8 s to reduce complexities of an entire middleware platformUnleashing k8 s to reduce complexities of an entire middleware platform
Unleashing k8 s to reduce complexities of an entire middleware platform
 
The SaltStack Pub Crawl - Fosscomm 2016
The SaltStack Pub Crawl - Fosscomm 2016The SaltStack Pub Crawl - Fosscomm 2016
The SaltStack Pub Crawl - Fosscomm 2016
 
Macy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-FlightMacy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-Flight
 
Evolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server SolutionEvolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server Solution
 
Dataweave Libraries and ObjectStore
Dataweave Libraries and ObjectStoreDataweave Libraries and ObjectStore
Dataweave Libraries and ObjectStore
 
Kubernetes the deltatre way the basics - introduction to containers and orc...
Kubernetes the deltatre way   the basics - introduction to containers and orc...Kubernetes the deltatre way   the basics - introduction to containers and orc...
Kubernetes the deltatre way the basics - introduction to containers and orc...
 
sun solaris
sun solarissun solaris
sun solaris
 
Introduction To Apache Mesos
Introduction To Apache MesosIntroduction To Apache Mesos
Introduction To Apache Mesos
 

Recently uploaded

Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 

Recently uploaded (20)

Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 

Hardening Kubernetes Setups: War Stories from the Trenches of Production - Puja Abbassi, Giant Swarm