Staying out of trouble with
K8S on AWS
Adam Hamsik
DevOps/Cloud Engineer
www.pixelfederation.com
TL;DR Summary
1. Know your enemy: build deep knowledge of
   a. AWS
   b. Kubernetes
      i. Choose your CNI wisely
      ii. Be aware of the scheduler
   c. Your applications
2. Trust your tools
   a. Monitoring
   b. ELK
   c. Deployment tools
AWS Gotchas
1. Standard AWS HA procedures
2. Cluster Autoscaler
3. EBS volumes
   a. EBS volumes don’t work across AZs
   b. Kubernetes sometimes can’t find a place for a pod if all instances in a given AZ are full
4. Choose the right instance type for your application
Kubernetes AWS architecture
K8s on AWS
Cluster Autoscaler
1. The Cluster Autoscaler (CA) doesn’t understand AZs when autoscaling your cluster
   a. Sometimes a pod needs to run only in a particular zone, but CA will start the new node in another one.
2. Use a PodDisruptionBudget to make sure that the required number of pods stays running
3. Use podAntiAffinity to spread your replicas across multiple AZs and nodes
4. CA and the AWS ASG rebalance policy can work against each other and get the cluster into a failure loop
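
Points 2 and 3 can be sketched as manifests. The snippet below is a minimal illustration, not the setup used in the talk: the names `web-pdb` and `web`, the `app` label, and the `minAvailable` value are hypothetical. It assumes a cluster recent enough to serve `policy/v1` and the `topology.kubernetes.io/zone` node label; older clusters used `policy/v1beta1` and `failure-domain.beta.kubernetes.io/zone`.

```python
import json


def pod_disruption_budget(name, app_label, min_available):
    """Build a PodDisruptionBudget manifest as a plain dict."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name},
        "spec": {
            # Voluntary disruptions (drains, scale-down) must leave this many pods up.
            "minAvailable": min_available,
            "selector": {"matchLabels": {"app": app_label}},
        },
    }


def zone_anti_affinity(app_label):
    """Build an affinity stanza that prefers one replica per availability zone."""
    return {
        "podAntiAffinity": {
            # "preferred" is a soft rule: pods still schedule if a zone is full.
            "preferredDuringSchedulingIgnoredDuringExecution": [
                {
                    "weight": 100,
                    "podAffinityTerm": {
                        "labelSelector": {"matchLabels": {"app": app_label}},
                        "topologyKey": "topology.kubernetes.io/zone",
                    },
                }
            ]
        }
    }


# Hypothetical names and values, for illustration only.
pdb = pod_disruption_budget("web-pdb", "web", 2)
affinity = zone_anti_affinity("web")

# kubectl accepts JSON as well as YAML, so this output can be applied directly.
print(json.dumps(pdb, indent=2))
```

The `affinity` dict would go under `spec.template.spec.affinity` of the Deployment. Swapping `topologyKey` for `kubernetes.io/hostname` spreads replicas across nodes instead of zones.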
Real Life example
Cluster Autoscaler
1. Create an application Deployment with multiple replicas and EBS volumes, using RollingUpdate as the update strategy
2. Change the version and run the upgrade
3. During the upgrade, CA will have to scale your cluster up, based on the maxSurge RollingUpdate parameter
4. There is a 1 in 3 probability that the new node will not be in the same AZ as the original one.
5. The new pod can’t attach its EBS volume from a different AZ, so the upgrade can’t move forward and is blocked
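
The blocked upgrade can be reproduced with a toy model. This is only a sketch of the zone constraint, not the real scheduler: the `Node` class, the node names, and the `eu-west-1*` zones are hypothetical, and "free pod slots" stands in for all resource checks.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    zone: str
    free_pod_slots: int


def candidate_nodes(volume_zone, nodes):
    """Nodes that have room AND sit in the same AZ as the pod's EBS volume."""
    return [n.name for n in nodes if n.zone == volume_zone and n.free_pod_slots > 0]


# Existing nodes are full, so the rolling update needs a new node.
cluster = [
    Node("node-1", "eu-west-1a", 0),
    Node("node-2", "eu-west-1b", 0),
]
before = candidate_nodes("eu-west-1a", cluster)

# The autoscaler adds capacity, but in a zone the volume cannot reach.
cluster.append(Node("node-3", "eu-west-1c", 10))
after_wrong_zone = candidate_nodes("eu-west-1a", cluster)

# Only a node in the volume's own zone unblocks the pod.
cluster.append(Node("node-4", "eu-west-1a", 10))
after_right_zone = candidate_nodes("eu-west-1a", cluster)

print(before, after_wrong_zone, after_right_zone)
```

A common way to avoid the mismatch is to run one node group per AZ, so that scale-up can target the zone where the volume lives.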
Kubernetes CA Multi AZ setup
K8s Node troubles
1. The K8s scheduler wants to utilize your node as much as possible
   a. It will schedule more pods on it than its physical resources can manage
2. Use kubelet limits (eviction thresholds and reserved resources) to make sure pods are evicted from a node when it is utilized too much
3. Node Problem Detector is a daemon that runs as a DaemonSet on each node and checks whether the node is in a correct state
   a. Infrastructure daemon issues: NTP service down
   b. Hardware issues: bad CPU, memory or disk
   c. Kernel issues: kernel deadlock, corrupted file system
   d. Container runtime issues: unresponsive runtime daemon
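
To make point 2 concrete, the arithmetic behind the kubelet settings `kubeReserved`, `systemReserved` and `evictionHard` (signal `memory.available`) can be sketched as follows. The Kubernetes documentation describes allocatable memory as capacity minus these three values. The numbers below are hypothetical, and the eviction check is a simplification of what the kubelet does.

```python
def allocatable_mib(capacity_mib, kube_reserved_mib, system_reserved_mib, eviction_hard_mib):
    """Memory the scheduler may hand out to pods on this node, in MiB."""
    return capacity_mib - kube_reserved_mib - system_reserved_mib - eviction_hard_mib


def should_evict(available_mib, eviction_hard_mib):
    """Hard eviction starts once available memory drops below the threshold."""
    return available_mib < eviction_hard_mib


# Hypothetical 16 GiB node: 512 MiB for Kubernetes daemons, 512 MiB for the OS,
# and a 500 MiB hard eviction threshold.
node_allocatable = allocatable_mib(16384, 512, 512, 500)
print(node_allocatable)  # 14860

print(should_evict(available_mib=350, eviction_hard_mib=500))  # True
print(should_evict(available_mib=2048, eviction_hard_mib=500))  # False
```

Reserving memory this way keeps system services alive when pods misbehave, at the cost of slightly less schedulable capacity per node.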
Real Life example
K8s Node troubles
1. We created multiple Deployments on our cluster with containers that did not set resource requests or limits
2. Without them, the Kubernetes scheduler has no idea how many resources each pod will need, so it can end up running all pods on one node.
3. As the resource usage of the pods grows, the node runs out of hardware resources
4. The kernel OOM killer kills various system services and the node becomes unresponsive
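
Why missing requests lead to an overloaded node can be shown with a toy fit check. This is a simplified sketch, not the real scheduler: it looks at memory only, and the node size and request values are hypothetical.

```python
def fits(node_allocatable_mib, scheduled_requests_mib, pod_request_mib):
    """Scheduler-style fit check: it compares requests, never actual usage."""
    return sum(scheduled_requests_mib) + pod_request_mib <= node_allocatable_mib


# Ten pods without requests: each counts as 0 MiB, so all of them fit.
no_requests = []
for _ in range(10):
    if fits(4096, no_requests, 0):
        no_requests.append(0)

# The same pods requesting 1024 MiB each: only four fit on a 4096 MiB node.
with_requests = []
for _ in range(10):
    if fits(4096, with_requests, 1024):
        with_requests.append(1024)

print(len(no_requests), len(with_requests))  # 10 4
```

With requests set, the remaining six pods stay pending until another node is available, instead of overcommitting the first one.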
Node Size vs Pod Size
K8s Node troubles
Not everything has to run in Kubernetes. Some things are better managed in VMs.
If your application pods are almost as big as the servers they run on, it’s better to use VMs.
Plan your InstanceGroups accordingly: there is no need for beefy servers if you only run small pods.
K8s POD troubles
It is essential to understand your workload and how your application behaves under traffic.
1. Pod resource limits and requests
   a. Some applications need more RAM/CPU during startup and can work with less later; plan accordingly.
   b. Provide the necessary information to the K8s scheduler. Without it, the scheduler works on a best-effort basis.
2. If your application goes over its memory limit, it will be killed by the kernel and the pod will be restarted.
3. Set limits and requests relatively close together so that the pod is not the prime suspect when the node needs to free resources.
K8s POD troubles examples
1. The deployed application needs more RAM during startup (Logstash, Elasticsearch)
2. During startup the application exhausts its resource limits
3. The kernel OOM killer kills Logstash because it ran out of memory inside its cgroup
4. The kubelet restarts the application pod
K8s POD QoS
When Kubernetes creates a Pod, it assigns one of these QoS classes:
1. Guaranteed
   a. Every container in the Pod must have a memory and CPU limit and a memory and CPU request, and they must be the same.
2. Burstable
   a. The Pod does not meet the criteria for QoS class Guaranteed, and at least one container has a memory or CPU request or limit.
3. BestEffort
   a. For a Pod to be given a QoS class of BestEffort, the containers in the Pod must not have any memory or CPU limits or requests.
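
The three rules above can be written down as a small classifier. This is a sketch under stated assumptions, not the kubelet's implementation: containers are plain dicts, quantities are compared as literal strings (so `"500m"` and `"0.5"` would count as different), and a missing request falls back to the limit, which is how Kubernetes defaults it.

```python
def qos_class(containers):
    """Classify a pod from a list of container resource dicts.

    Each container looks like {"requests": {...}, "limits": {...}} with
    optional "cpu" and "memory" keys.
    """
    any_set = False
    all_guaranteed = True
    for container in containers:
        limits = container.get("limits", {})
        requests = container.get("requests", {})
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # A request that is not set defaults to the limit.
            request = requests.get(resource, limit)
            if limit is not None or request is not None:
                any_set = True
            if limit is None or request != limit:
                all_guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if all_guaranteed else "Burstable"


guaranteed = qos_class([{
    "requests": {"cpu": "500m", "memory": "1Gi"},
    "limits": {"cpu": "500m", "memory": "1Gi"},
}])
burstable = qos_class([{
    "requests": {"memory": "512Mi"},
    "limits": {"memory": "1Gi"},
}])
best_effort = qos_class([{}])

print(guaranteed, burstable, best_effort)  # Guaranteed Burstable BestEffort
```

On a live cluster the assigned class can be read from the pod's `status.qosClass` field.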
K8s Application troubleshooting
1. If your application is down, start as close as possible to the pod and work outwards from there.
   a. Is your application healthy? Does the application pod have many restarts?
2. Can you access your application directly on the pod? Does it work?
   a. kubectl port-forward pod/pod-name local_port:remote_port
3. Can you access your application through its Service?
   a. kubectl port-forward svc/service-name local_port:remote_port
4. If everything above works and your ingress still doesn’t, check the ingress manifest.
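
The checklist above maps onto a short sequence of `kubectl` commands. The helper below only builds and prints the commands, it does not run them. The pod name `web-0`, the Service name `web`, the ports and the namespace are hypothetical placeholders.

```python
import shlex


def triage_commands(pod, service, local_port, remote_port, namespace="default"):
    """Return kubectl commands ordered from the pod outwards to the ingress."""
    ns = ["-n", namespace]
    ports = f"{local_port}:{remote_port}"
    return [
        # 1. Pod health and restart count.
        ["kubectl", "get", "pod", pod, *ns],
        ["kubectl", "describe", "pod", pod, *ns],
        # 2. Talk to the pod directly, bypassing Service and ingress.
        ["kubectl", "port-forward", f"pod/{pod}", ports, *ns],
        # 3. Go through the Service to check selectors and ports.
        ["kubectl", "port-forward", f"svc/{service}", ports, *ns],
        # 4. Inspect ingress rules last.
        ["kubectl", "describe", "ingress", *ns],
    ]


commands = triage_commands("web-0", "web", 8080, 80)
for command in commands:
    print(shlex.join(command))
```

While a port-forward is running, the application should answer on `localhost:8080` in this example. If the pod answers but the Service does not, the Service selector or port mapping is the first thing to check.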
When something goes wrong
Kubernetes is a distributed application with many moving parts. Be aware that any troubleshooting is a complicated process.
1. Have your monitoring ready
   a. Prometheus + Grafana works great
   b. Prometheus can dynamically discover new services/pods and scrape them for metrics based on their annotations.
2. Gather Kubernetes events and logs
   a. EFK
      i. Gather Kubernetes logs from nodes/masters and push them to your own Elasticsearch cluster
   b. Gather Kubernetes events and store them in the Elasticsearch cluster
      i. https://github.com/haad/event-exporter
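
Annotation-based scraping (point 1b) can be illustrated with the widely used `prometheus.io/scrape`, `prometheus.io/port` and `prometheus.io/path` annotations. These are a convention implemented through relabeling rules in the Prometheus scrape configuration, not something Prometheus enforces, so the sketch assumes the cluster follows it. Requiring the port annotation is a simplification made here, and the pod IP is hypothetical.

```python
def scrape_target(pod_ip, annotations, default_path="/metrics"):
    """Return the URL to scrape, or None if the pod did not opt in."""
    if annotations.get("prometheus.io/scrape") != "true":
        return None
    port = annotations.get("prometheus.io/port")
    if port is None:
        # Simplification of this sketch: without a port we skip the pod.
        return None
    path = annotations.get("prometheus.io/path", default_path)
    return f"http://{pod_ip}:{port}{path}"


opted_in = scrape_target(
    "10.0.3.17",
    {"prometheus.io/scrape": "true", "prometheus.io/port": "9102"},
)
opted_out = scrape_target("10.0.3.18", {})

print(opted_in)   # http://10.0.3.17:9102/metrics
print(opted_out)  # None
```

Annotation values are always strings in Kubernetes, which is why the check compares against `"true"` rather than a boolean.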
Grafana + Prometheus
Questions ?
Thanks !
ahamsik@pixelfederation.com
