Lobby
Agenda
● K8s Scheduler
○ Architecture
○ Decision Tree
○ Best Fit Algorithm
■ Filtering Sub-Algorithm
■ Scoring Sub-Algorithm
● Expectations vs. Reality
● Job Scheduling Problem
○ Google’s Guys Solutions
○ Other Proposed Solutions
● Key Takeaways
Before we start… the sources…
● ❌ Medium.com “look-at-me” posts.
● ❌ Vendor marketing mumbo-jumbo.
● ✅ Peer-reviewed CS publications.
● ✅ K8s failure stories (https://k8s.af/).
● ✅ Golang code review.
● ✅ Memes.
Kubernetes Scheduler
kube-scheduler is responsible for selecting a worker
node for each newly created (unscheduled) pod and binding
the pod to that node according to pre-defined rules;
the kubelet on the target node then starts the pod.
Scheduler Decision Tree
● Scheduling Policies:
○ No longer supported since Kubernetes v1.23 (superseded by scheduler configuration profiles and plugins).
[Diagram label: Metrics Server]
func schedulePod
https://github.com/kubernetes/kubernetes/blob/e4c8802407fbaffad126685280e72145d89b125e/pkg/scheduler/schedule_one.go#L335
● Best Fit Algorithm:
○ YAML rules specification.
○ Predicates => Filtering => Candidates
○ Priorities => Scoring => Ranking
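The two-phase flow above (predicates filter, priorities score, best score wins) can be sketched in a few lines. This is a minimal illustration only, not the code behind `schedulePod`; the `schedule` function, the dict-based node and pod records, and the `free_cpu` / `cpu` fields are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Node = Dict[str, float]   # hypothetical node record, e.g. {"free_cpu": 2.0}
Pod = Dict[str, float]    # hypothetical pod record, e.g. {"cpu": 0.5}
Predicate = Callable[[Pod, Node], bool]
Priority = Callable[[Pod, Node], float]


def schedule(pod: Pod,
             nodes: Dict[str, Node],
             predicates: List[Predicate],
             priorities: List[Tuple[Priority, float]]) -> Optional[str]:
    """Return the name of the best node, or None if no node passes filtering."""
    # Filtering: keep only nodes that satisfy every predicate.
    candidates = {name: node for name, node in nodes.items()
                  if all(pred(pod, node) for pred in predicates)}
    if not candidates:
        return None
    # Scoring: weighted sum of all priority functions per candidate.
    scores = {name: sum(weight * prio(pod, node) for prio, weight in priorities)
              for name, node in candidates.items()}
    # Ranking: highest score wins (name order breaks ties here, for determinism).
    return max(sorted(scores), key=scores.get)


def fits_cpu(pod: Pod, node: Node) -> bool:
    return node["free_cpu"] >= pod["cpu"]


def most_free_cpu(pod: Pod, node: Node) -> float:
    return node["free_cpu"]
```

Calling `schedule({"cpu": 0.5}, nodes, [fits_cpu], [(most_free_cpu, 1.0)])` picks the node with the most free CPU among those that fit the pod.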
Filtering Algorithm => Candidates
● ✅ General predicates:
○ e.g. PodFitsResources, PodFitsNodeSelector
● ✅ Storage predicates:
○ e.g. NoDiskConflict, MaxCSIVolumeCount
● ✅ Compute predicates:
○ e.g. PodToleratesNodeTaints
● ✅ Runtime predicates:
○ e.g. CheckNodeCondition, CheckNodeMemoryPressure
func findNodesThatFitPod
https://github.com/kubernetes/kubernetes/blob/e4c8802407fbaffad126685280e72145d89b125e/pkg/scheduler/schedule_one.go#L388
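To make the idea of a predicate concrete, here is a rough sketch of two checks in the spirit of PodFitsResources and the taint-toleration predicate. The `Node` and `Pod` classes and their fields are hypothetical simplifications (taints are reduced to plain strings), and this is not how `findNodesThatFitPod` is implemented.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass
class Node:
    name: str
    allocatable_cpu: float            # cores
    allocatable_mem: float            # GiB
    requested_cpu: float = 0.0        # sum of requests of pods already bound
    requested_mem: float = 0.0
    taints: FrozenSet[str] = frozenset()


@dataclass
class Pod:
    cpu: float                        # requested cores
    mem: float                        # requested GiB
    tolerations: FrozenSet[str] = frozenset()


def pod_fits_resources(pod: Pod, node: Node) -> bool:
    """True if the pod's requests still fit into the node's allocatable capacity."""
    return (node.requested_cpu + pod.cpu <= node.allocatable_cpu
            and node.requested_mem + pod.mem <= node.allocatable_mem)


def pod_tolerates_taints(pod: Pod, node: Node) -> bool:
    """True if every taint on the node is tolerated by the pod."""
    return node.taints <= pod.tolerations


def find_nodes_that_fit(pod: Pod, nodes: List[Node]) -> List[str]:
    """Filtering phase: names of the nodes that pass all predicates."""
    return [n.name for n in nodes
            if pod_fits_resources(pod, n) and pod_tolerates_taints(pod, n)]
```

Note that the resource check works on requests, not on live usage, which is one reason the result can drift from reality.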
Scoring Algorithm => Ranking
● Each priority function returns a score from 0 to 10 (legacy policies; the current scheduling framework uses 0 to 100).
● The final score of a node is the weighted sum of all priority function results.
● Nodes are ranked (sorted) and the highest-scoring node becomes the target.
func prioritizeNodes
https://github.com/kubernetes/kubernetes/blob/e4c8802407fbaffad126685280e72145d89b125e/pkg/scheduler/schedule_one.go#L635
● Priority functions (there are many; some examples):
○ ✅ SelectorSpreadPriority: is the node in the desired topology domain?
○ ✅ CalculateNodeLabelPriority: does the node match the specified label(s)?
○ ✅ *AffinityPriority: is the node attracting or repelling?
○ ❗LeastRequestedPriority: is the node “least-loaded” (by requests, not by live usage)?
○ ❗BalancedResourceAllocation: will CPU/Memory be balanced afterwards? (A bet)
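The two priorities flagged with ❗ can be approximated as follows. This is a simplified sketch on the legacy 0 to 10 scale, modeled on how these functions are commonly described; the function names and signatures are hypothetical and the real plugins handle more resources and edge cases. The inputs are requests after the incoming pod is counted in.

```python
def least_requested(requested: float, capacity: float) -> float:
    """More free (unrequested) capacity gives a higher 0-10 score."""
    if capacity <= 0 or requested > capacity:
        return 0.0
    return (capacity - requested) * 10.0 / capacity


def least_requested_priority(cpu_req: float, cpu_cap: float,
                             mem_req: float, mem_cap: float) -> float:
    """Average of the CPU and memory scores."""
    return (least_requested(cpu_req, cpu_cap)
            + least_requested(mem_req, mem_cap)) / 2


def balanced_resource_allocation(cpu_req: float, cpu_cap: float,
                                 mem_req: float, mem_cap: float) -> float:
    """Score is highest when CPU and memory request fractions are close."""
    cpu_fraction = cpu_req / cpu_cap
    mem_fraction = mem_req / mem_cap
    if cpu_fraction >= 1 or mem_fraction >= 1:
        return 0.0
    return 10.0 - abs(cpu_fraction - mem_fraction) * 10.0
```

Both functions only see requests declared in the pod spec, so a node can score well while its real CPU is saturated; that is the "bet" mentioned above.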
Expectations vs. Reality
● Expectation of Rebalance on:
○ Natural resource usage/demand.
○ Deployments, Restarts, Terminations.
○ Vertical/horizontal scaling operations.
● Reality at Compute Level:
○ UNEVEN DISTRIBUTION:
■ for Node in Cluster.NodeGroup:
Node.podCount() !=
NodeGroup.podCount()
/ NodeGroup.nodeCount()
● Reality at Storage / Network Level:
○ TL;DR here: this would take several long ☕🍩 breaks.
○ AGGLOMERATION:
■ HIGH: “Too many pods” => “Overloaded”.
■ LOW: “Too few pods” => “Underutilized”.
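The uneven-distribution condition above can be checked with a small helper. This is only a sketch under the assumption that you already have a pod-to-node mapping (for example, extracted from the API); the `pod_distribution` name and the returned keys are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def pod_distribution(pod_to_node: Dict[str, str], nodes: List[str]) -> dict:
    """Compare per-node pod counts with the ideal even share."""
    counts = Counter({node: 0 for node in nodes})   # include empty nodes
    counts.update(pod_to_node.values())             # count pods per node
    expected = len(pod_to_node) / len(nodes)        # ideal pods per node
    return {
        "counts": dict(counts),
        "expected_per_node": expected,
        "skew": max(counts.values()) - min(counts.values()),
        "uneven_nodes": sorted(n for n, c in counts.items() if c != expected),
    }
```

A skew of 0 or 1 means the pods are spread as evenly as the numbers allow; anything larger matches the "uneven distribution" case.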
Scheduling Problem - Fact 1 of 2
Job Scheduling Uncertainty:
● The input assumed for the score calculation changes
before the calculation is done.
● The output should be a sort of probability conditioned on an
unchanged context, but in reality it is not.
Static Score != Dynamic Score ~= Chaos & Entropy
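A toy illustration of the static-versus-dynamic score gap: the decision is taken on a snapshot, and the live state moves afterwards. All names and numbers here (`TOTAL_CPU`, `score`, `best_node`, `demo`, the free-CPU values) are made up for the example.

```python
from typing import Dict, Tuple

TOTAL_CPU = 4.0  # hypothetical: every node has 4 cores


def score(free_cpu: float) -> float:
    """Static 0-10 score computed from a point-in-time view."""
    return 10.0 * free_cpu / TOTAL_CPU


def best_node(free_cpu_by_node: Dict[str, float]) -> str:
    """Pick the node with the highest score (name order breaks ties)."""
    return max(sorted(free_cpu_by_node),
               key=lambda name: score(free_cpu_by_node[name]))


def demo() -> Tuple[str, str]:
    snapshot = {"node-a": 3.0, "node-b": 2.0}   # view used for the decision
    decision = best_node(snapshot)              # chosen on the snapshot
    live = dict(snapshot)
    live["node-a"] -= 2.5                       # another workload spikes later
    return decision, best_node(live)            # what would be chosen now
```

The first element of `demo()` is the node the pod was bound to; the second is the node that would win if the score were recomputed on live data.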
Scheduling Problem - Fact 2 of 2
Optimal job scheduling is an NP-complete (nondeterministic
polynomial-time complete, NPC) problem, which means:
✅ A guessed solution can be verified in polynomial (P) time.
❗There is no known efficient rule for making the right guess.
😲 It’s not known whether any polynomial-time
algorithms will ever be found for NPC problems; this
remains one of the most important open questions in
computer science.
Note: generalized (n×n) chess is not a P problem either; it is EXPTIME-complete.
In practice, no efficient P-time
algorithm has been found
for job scheduling, so you
have to use best-guess (heuristic)
solutions.
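The gap between an exact answer and a best guess can be seen on a tiny instance of a classic scheduling problem (assign jobs to identical machines, minimize the longest machine load). This is a generic textbook sketch, not kube-scheduler logic: the brute force tries every assignment, the greedy heuristic places the longest job first on the least-loaded machine.

```python
from itertools import product
from typing import List, Sequence


def makespan(assignment: Sequence[int], jobs: Sequence[int], machines: int) -> int:
    """Load of the busiest machine for a given job -> machine assignment."""
    loads = [0] * machines
    for job, machine in zip(jobs, assignment):
        loads[machine] += job
    return max(loads)


def optimal_makespan(jobs: Sequence[int], machines: int) -> int:
    """Exact answer by brute force: machines ** len(jobs) assignments."""
    return min(makespan(a, jobs, machines)
               for a in product(range(machines), repeat=len(jobs)))


def greedy_makespan(jobs: Sequence[int], machines: int) -> int:
    """Heuristic: longest job first, always onto the least-loaded machine."""
    loads: List[int] = [0] * machines
    for job in sorted(jobs, reverse=True):
        loads[loads.index(min(loads))] += job
    return max(loads)
```

For jobs `[3, 3, 2, 2, 2]` on 2 machines the brute force finds a makespan of 6, while the greedy guess ends at 7. The brute force is fine for 5 jobs and hopeless for thousands of pods, which is why schedulers settle for heuristics.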
Uneven Distribution
● “Solution”: Pod Topology Spread Constraints
○ maxSkew: 1
■ Allow a difference of at most 1 matching pod between topology domains (as even as possible)
○ topologyKey: kubernetes.io/hostname
■ Use the hostname as topology domain
○ whenUnsatisfiable: ScheduleAnyway
■ Still schedule the pod even if the constraint can’t be satisfied (low skew is only preferred)
○ labelSelector
■ Only act on Pods that match this selector
Example: “Heavy Cron Jobs”
● Drawbacks:
○ Static scheduling
■ Evaluated only at scheduling time (triggered by deployment or failure); running pods are not moved
○ Conflicts with other strategies (to come next).
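The four fields above can be written down as a constraint object, together with a simplified model of how the skew check treats candidate nodes. The field names are the real ones from the slide; the `app: heavy-cron` label, the `candidate_nodes` helper, and the skew formula (matching pods on the node after placement minus the current global minimum) are assumptions made for this sketch.

```python
from typing import Dict, List

# Constraint as it would appear under spec.topologySpreadConstraints.
constraint = {
    "maxSkew": 1,
    "topologyKey": "kubernetes.io/hostname",
    "whenUnsatisfiable": "ScheduleAnyway",
    "labelSelector": {"matchLabels": {"app": "heavy-cron"}},  # hypothetical label
}


def candidate_nodes(counts: Dict[str, int], constraint: dict) -> List[str]:
    """Order nodes by the skew that placing one more matching pod would cause.

    counts maps node name -> number of matching pods already on that node.
    """
    global_min = min(counts.values())
    skew = {node: (count + 1) - global_min for node, count in counts.items()}
    ranked = sorted(skew, key=lambda node: (skew[node], node))
    if constraint["whenUnsatisfiable"] == "DoNotSchedule":
        # Hard constraint: nodes that would exceed maxSkew are filtered out.
        return [node for node in ranked if skew[node] <= constraint["maxSkew"]]
    # ScheduleAnyway: nothing is filtered, low skew is only preferred.
    return ranked
```

With `ScheduleAnyway` the busiest node stays in the candidate list, so other scoring rules can still push the pod there; that is one way the distribution ends up uneven despite the constraint.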
Agglomeration
● “Solution”: Inter-Pod affinity and anti-affinity
○ podAffinity: attracts pods to topology domains (e.g. nodes) that already run matching pods.
○ podAntiAffinity: repels pods from topology domains (e.g. nodes) that already run matching pods.
[Diagram labels: On-Demand (us-east-1) / Spot (eu-central-1)]
Example: “Geo Spot Instances”
● Drawbacks:
○ Static scheduling (same story).
○ Conflicts with the previous Topology Constraints:
■ requiredDuringSchedulingIgnoredDuringExecution => Only a single Pod per domain (with anti-affinity).
■ preferredDuringSchedulingIgnoredDuringExecution => Best effort, not enforced (e.g. Termination).
○ Not for real-time rebalancing, just for “(...)different topology
domains to achieve either high availability or cost-saving”.
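The "only a single Pod per domain" drawback is easy to reproduce on paper. Below, the anti-affinity rule is written as the structure that would sit under `spec.affinity`, followed by a toy placement loop; the `app: web` label, the pod names, and `place_with_required_anti_affinity` are hypothetical, and the loop models only this one rule.

```python
from typing import Dict, List, Tuple

# Required anti-affinity on the hostname domain: no two "web" pods on one node.
affinity = {
    "podAntiAffinity": {
        "requiredDuringSchedulingIgnoredDuringExecution": [{
            "labelSelector": {"matchLabels": {"app": "web"}},  # hypothetical label
            "topologyKey": "kubernetes.io/hostname",
        }]
    }
}


def place_with_required_anti_affinity(
        replicas: int, nodes: List[str]) -> Tuple[Dict[str, str], List[str]]:
    """Place replicas one per node; the rest stay pending (unschedulable)."""
    free_nodes = list(nodes)          # nodes without a matching pod yet
    placed: Dict[str, str] = {}
    pending: List[str] = []
    for i in range(replicas):
        pod = f"web-{i}"
        if free_nodes:
            placed[pod] = free_nodes.pop(0)
        else:
            pending.append(pod)       # every node already runs a matching pod
    return placed, pending
```

With 3 replicas and 2 nodes, the third pod stays pending until a node is added, regardless of how much capacity the existing nodes have left.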
Other Proposals
● Descheduler:
Promising multi-strategy (re-)scheduling:
https://github.com/kubernetes-sigs/descheduler
🤡 Note: Descheduler and K8s Plugins are
SIG (Special Interest Group) projects.
● Winter Soldier by DevTron Labs:
Downscale to 0 pods. Conflicts w/AutoScaler?
https://github.com/devtron-labs/winter-soldier
● Refined Balanced Resource Allocation (New):
Promising dynamic metrics for schedulers (China).
https://ceur-ws.org/Vol-3304/paper07.pdf
● Kube Scheduler Plugins:
DIY: Just go and f@#ing write your own stuff?
https://github.com/kubernetes/enhancements/tree/master/keps/sig-scheduling/624-scheduling-framework
● Low Carbon Kubernetes Scheduler:
“Heliotropic multicountry” scheduler to save electricity 🚀
https://ceur-ws.org/Vol-2382/ICT4S2019_paper_28.pdf
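To give a feel for what re-scheduling by eviction means, here is a rough sketch loosely inspired by the idea of evicting pods from overutilized nodes when underutilized nodes exist. It is not the Descheduler's code or configuration; `plan_evictions`, the thresholds, and the CPU-only model are assumptions for illustration.

```python
from typing import Dict, List


def plan_evictions(node_pods: Dict[str, Dict[str, float]],
                   capacity: Dict[str, float],
                   low: float = 0.2,
                   high: float = 0.7) -> List[str]:
    """Pick pods to evict from overutilized nodes.

    node_pods maps node -> {pod name: CPU request}; capacity maps node -> cores.
    Evictions happen only if at least one node is below the `low` threshold.
    """
    util = {node: sum(pods.values()) / capacity[node]
            for node, pods in node_pods.items()}
    if not any(u < low for u in util.values()):
        return []                                   # nowhere to move pods to
    evictions: List[str] = []
    for node in sorted(util):
        if util[node] <= high:
            continue
        used = sum(node_pods[node].values())
        # Evict the smallest pods first until the node drops to `high` or below.
        for pod, request in sorted(node_pods[node].items(), key=lambda kv: kv[1]):
            if used / capacity[node] <= high:
                break
            evictions.append(pod)
            used -= request
    return evictions
```

The evicted pods still go through the normal scheduler afterwards, so the outcome depends on the same static scores discussed earlier.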
Key Takeaways
● Usability ~> Maybe?:
○ Better suited to steady workload sets than to
highly spiky ones (e.g. node overload => pods unschedulable).
○ Good for HA and cost saving, bad for real-time (RT) balancing.
○ Adds complexity in unforeseen situations
(e.g. YAML hotfixes?, inter-dependencies).
● Reliability ~> Definitely not!:
○ No standard solution that fits all apps / teams.
○ Requires a lot of edge-case testing between
co-existing applications (same node group).
○ No easy, human-friendly way to trace the decisions
taken by the scheduler (e.g. during a live issue).
○ Too many siloed metric/rule inputs for decisions:
i. AWS Auto Scaling (e.g. Fixed, Scheduled).
=> EC2 On-Demand vs. Spot => AWS guess.
ii. K8s AutoScaler (even without metrics).
=> “Rescheduler” => Dev/DevOps guess.
iii. Capacity right-sizing (i.e. EC2, limits/requests guess).
● Costs? Well, that is another story of me(gue)ssing with AWS RI/SP…
The End

Kubernetes Workload Rebalancing
