Kubernetes Day 2 @ ZSE Energia, a.s.
Miro Toma
November 10th, 2021
About Me
• IT nerd since the dawn of time
• 25 years professional experience
• Held various positions covering most functions in the IT stack
• Passionate about tech & new trends
• Stirring the IT pot in the utilities sector since 2014
ZSE Energia, a.s.
• Major energy supplier in Slovakia
• Part of larger ZSE group
• Commercial company (not state managed!)
• Small internal IT unit
• Heavy reliance on vendors (not a dev shop)
The (somewhat) accelerated journey
Day 0 & 1 - Now or never
• K8s introduced as the target platform for an ongoing high-profile project
• Severely limited (human) infrastructure support capacity at the time [couldn’t deploy on ‘classic’ VMs]
• Anticipated uptime requirements
Day 2 start – Apr 2019
• ingress
• logs (Fluentd, Elasticsearch, Kibana)
• 1 app namespace
• no native monitoring* (!DON’T!)
* trivial heartbeat monitoring with Zabbix
Later that (2nd) day..
• elasticsearch->opendistro->opensearch
• fluentd->fluent-bit
• vendor namespaces (SaaS model with ‘our’ infrastructure)
• calico (cluster reinstall)
• cert-manager
• prometheus/alert manager/grafana
• real backups (!)
• zookeeper
• kafka
Day 0 to Day 2 in <6 months
Backups
• “CI/CD pipeline will take care of the cluster rebuild“
• Until it won’t:
• persistent volumes
• manual tweaks (don’t !)
• ..
• Solutions exist to take whole-cluster backups, including volumes
• Use-case – migrate cluster between cloud subscriptions
• migration supported by cloud vendor for majority of resources
• but not Kubernetes (!)
• 4 hours vs. multi-month project
Don’t Question Your Vendor’s Infrastructure Sizing
• Obscene asks for CPU and memory
• Questioning never led to a significant difference
Example project ask – two-machine cluster with 4 CPUs and 16 GB RAM each. Real life: 0.1 CPU (10% of a single CPU), ~1.2 GB RAM
Deploy and set real quotas afterwards
• real-world usage is a fraction of the original ask (no exceptions yet)
• should things go south, you can tune on the fly
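A minimal sketch of "deploy, then set real quotas", assuming the observed usage from the example above (about 0.1 CPU and roughly 1200 MiB of RAM). The namespace `vendor-app`, the quota name and the headroom factor are hypothetical. The manifest uses the standard `ResourceQuota` kind and is printed as JSON, which `kubectl apply -f -` accepts.

```python
import json


def quota_from_observed(namespace, cpu_millicores, memory_mib, headroom=2.0):
    """Build a ResourceQuota manifest from observed usage plus headroom.

    The namespace, quota name and headroom factor are illustrative
    assumptions, not values from the talk.
    """
    cpu = int(cpu_millicores * headroom)
    mem = int(memory_mib * headroom)
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "observed-usage-quota", "namespace": namespace},
        "spec": {
            "hard": {
                # Sum of all pod requests allowed in the namespace.
                "requests.cpu": f"{cpu}m",
                "requests.memory": f"{mem}Mi",
                # Limits: CPU may burst to twice the requests, memory may not.
                "limits.cpu": f"{cpu * 2}m",
                "limits.memory": f"{mem}Mi",
            }
        },
    }


quota = quota_from_observed("vendor-app", cpu_millicores=100, memory_mib=1200)
print(json.dumps(quota, indent=2))
```

Because the quota is an ordinary API object, it can be re-applied with larger numbers while the workload keeps running, which is the "tune on the fly" part.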
Budget for Disruptions, Promote ‘Aversion’
• Define disruption budgets (religiously)
• beta since 1.5; GA (policy/v1) from 1.21
• so your app won’t disappear entirely on a node drain
• Strive to distribute pods across multiple nodes
• use podAntiAffinity as a rule
• consider using descheduler
• Sample scenario (real life):
1. all ingress pods eventually ended up running on a single node
2. drain the specific node hosting all ingress pods
3. no ingress (i.e. ‘cluster is down’) for a noticeable period
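The ingress scenario above could be guarded against roughly as follows. This is a sketch, not the configuration used at ZSE: the names `ingress-pdb`, `ingress` and `ingress-controller` are hypothetical, and clusters older than 1.21 would need `policy/v1beta1` instead of `policy/v1`.

```python
import json


def disruption_budget(name, namespace, app_label, min_available,
                      api_version="policy/v1"):
    """PodDisruptionBudget keeping at least min_available pods up during
    voluntary disruptions such as a node drain.

    Pass api_version="policy/v1beta1" on clusters older than 1.21.
    """
    return {
        "apiVersion": api_version,
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "minAvailable": min_available,
            "selector": {"matchLabels": {"app": app_label}},
        },
    }


def anti_affinity(app_label):
    """Pod-spec fragment: never schedule two pods with this label on one node."""
    return {
        "affinity": {
            "podAntiAffinity": {
                "requiredDuringSchedulingIgnoredDuringExecution": [
                    {
                        "labelSelector": {"matchLabels": {"app": app_label}},
                        "topologyKey": "kubernetes.io/hostname",
                    }
                ]
            }
        }
    }


# Hypothetical names, chosen only for illustration.
pdb = disruption_budget("ingress-pdb", "ingress", "ingress-controller", 1)
affinity = anti_affinity("ingress-controller")
print(json.dumps(pdb, indent=2))
print(json.dumps(affinity, indent=2))
```

The `required...` form refuses to schedule a second replica onto the same node. The `preferredDuringSchedulingIgnoredDuringExecution` variant is softer and may suit clusters with fewer nodes than replicas.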
Let Them Die Peacefully
• 30 secs default timeout to terminate may not be good for all
• Long running consumer queries
• Lengthy cleanup processes (e.g. to keep PVs consistent)
• Hooks delaying the TERM signal eat into the total budget
• Use rather generous terminationGracePeriodSeconds
• should the container terminate earlier, the control plane will notice
• Not everyone plays nice with TERM
• Use preStop hooks
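One way to arrive at a "rather generous" value is to add up every phase that must finish before the container is force-killed, since the preStop hook and the shutdown after TERM share the same grace period. The durations below are hypothetical and only illustrate the arithmetic.

```python
def grace_period_seconds(pre_stop_s, longest_request_s, cleanup_s, margin_s=10):
    """Sum the phases that must complete within terminationGracePeriodSeconds.

    All inputs are estimates supplied by the operator; margin_s is an
    arbitrary safety margin assumed for this sketch.
    """
    for value in (pre_stop_s, longest_request_s, cleanup_s, margin_s):
        if value < 0:
            raise ValueError("durations must be non-negative")
    return pre_stop_s + longest_request_s + cleanup_s + margin_s


# Hypothetical workload: 8 s preStop delay, 60 s queries, 20 s cleanup.
total = grace_period_seconds(pre_stop_s=8, longest_request_s=60, cleanup_s=20)
pod_spec_patch = {"terminationGracePeriodSeconds": total}
print(pod_spec_patch)
```

Overestimating costs little: a container that exits earlier is not held for the rest of the period.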
Dying Containers Won’t Accept New Work
• Updating deployments, stateful-sets, kubectl delete pod xxx & co
• ‘Terminating’ a pod:
• containers receive TERM signal -> stop accepting new requests
• network (CNI) – in parallel - starts converging endpoints/services
• until converged, the terminating pods will deny new requests
• preStop hooks to delay TERM, thus giving time for network to converge
• we don’t want to, and can’t really, make shutdown depend on the pod being isolated first (split-brain situations)
• 8 secs worked fine so far (with exceptions)
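The delay described above can be sketched as a container fragment with a preStop hook that simply sleeps. It assumes the container image ships a `sleep` binary; the container name `app` is hypothetical, and the 8-second default is the value quoted above.

```python
import json


def pre_stop_delay(container_name, delay_s=8):
    """Container fragment whose preStop hook sleeps before TERM is sent,
    giving endpoints and services time to converge.

    Assumes the image contains a `sleep` executable.
    """
    if delay_s <= 0:
        raise ValueError("delay must be positive")
    return {
        "name": container_name,
        "lifecycle": {
            "preStop": {"exec": {"command": ["sleep", str(delay_s)]}}
        },
    }


container = pre_stop_delay("app")
print(json.dumps(container, indent=2))
```

The sleep counts against terminationGracePeriodSeconds, so the grace period should be raised by at least the same amount.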
Cluster upgrades
• Started @1.15, now on 1.20
• Upgrading a managed cluster is a breeze – until it isn’t
• fairly complex process - on a managed cluster you don’t get all the knobs and buttons to comfortably identify/fix an issue
• two incidents so far:
• medium 1.16 -> 1.17 (upgrade stopped in the middle; documented fix/workaround)
• huge 1.19 -> 1.20 (internal cluster network went south, node pool ‘failed’)
• Both issues traced to node drain timeouts
• provider’s upgrade scripts define (weakly documented) node drain timeout for upgrades
• longer termination periods, multiplied by disruption budgets prolong node drains
• Current approach:
• upgrade control plane first (separately)
• create new node-pool(s) at the upgraded version
• manually drain old nodes
• delete old pools
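To see why termination periods "multiplied by" disruption budgets can exceed a provider's drain timeout, here is a deliberately simplified estimate. It assumes all pods on the node sit under one budget, that every pod uses its full grace period, and that the next eviction waits for a replacement to become ready; none of these assumptions comes from the talk, and real drains are usually faster.

```python
import math


def worst_case_drain_seconds(pods_on_node, grace_period_s,
                             max_parallel_evictions=1, replacement_ready_s=0):
    """Rough upper bound on node drain time under a disruption budget.

    Pods are evicted in waves of max_parallel_evictions; each wave takes
    the full grace period plus the time for replacements to become ready.
    """
    if max_parallel_evictions < 1:
        raise ValueError("at least one eviction at a time must be allowed")
    waves = math.ceil(pods_on_node / max_parallel_evictions)
    return waves * (grace_period_s + replacement_ready_s)


# Hypothetical node: 6 pods, 120 s grace period, 30 s until a replacement is ready.
strict = worst_case_drain_seconds(6, 120, max_parallel_evictions=1,
                                  replacement_ready_s=30)
relaxed = worst_case_drain_seconds(6, 120, max_parallel_evictions=3,
                                   replacement_ready_s=30)
print(strict, relaxed)
```

Comparing such an estimate with the provider's drain timeout indicates whether an in-place upgrade is likely to stall, or whether the new-pool-and-manual-drain approach is the safer route.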
Some Major Roads NOT Taken
Helm
• Initial eval with v2 (might take different twist now)
• Many charts ‘opinionated’
• Some charts drag-in dependencies we didn’t want
Operator frenzy (i.e. operator for everything)
• Many operators undergoing major revisions (would be hard to keep up)
• Many offerings for the same use-case, frequently none matching all of our requirements
• Single manifest modification/deletion may evaporate your service in an instant
Pre-packaged pipelines (e.g. Banzai)
• Very early in development at the time
Note: These decisions were based on the situation around 2018/2019. Some will be revisited in due course.
Some More Takeaways
• Don’t rush Day 2
• Dedicate resources for day 0 & 1
• Day-to-day ops are surprisingly modest
• Adoption by ‘traditional’ IT departments may be a journey on its own…
• Local market uptake for K8s (still) lagging
• pushing & training vendors for adoption of k8s
• some vendors still ‘resist’, but some became proponents
• Stay cloud-agnostic
• minimize utilization of cloud-specific services
Thanks
