This document discusses research on improving the stability and resource management of geo-distributed Kubernetes federations.
The researchers found that Kubernetes Federation (KubeFed) deployments became unstable under poor network conditions when run with default, static configuration parameters. They developed a proportional controller that adjusts the cluster health check timeout at run time, raising system stability from 83–92% to 99.5–100%.
They then addressed resource availability and network traffic awareness by developing mck8s, a platform that deploys applications across clusters based on resource load and network traffic. The platform also bursts to a public cloud when local clusters are overloaded and retracts those resources as early as possible. This improves resource management for geo-distributed multi-cluster Kubernetes environments.
Container orchestration in geo-distributed cloud computing platforms
1. TRAINING THE NEXT GENERATION OF EUROPEAN FOG COMPUTING EXPERTS
Keynote at HotCloudPerf
April 20th 2021
Mulugeta Ayalew Tamiru, Guillaume Pierre, Johan Tordsson and Erik Elmroth
Elastisys AB & Université de Rennes 1
3. Goal: reliably deploy software across the full platform
▪ Containers everywhere
• To abstract away the heterogeneity of the host hardware and hypervisors
▪ Deploy potentially large numbers of containers
• If necessary: burst to a public cloud
▪ Control container placements
• Manually
• Semi-automatically: “as close as possible to X”
• Automatically: load-balanced across all locations
4. Kubernetes Federation (KubeFed)
▪ Resource management and application deployment on multiple Kubernetes clusters (member clusters) from a single control plane (host cluster)
▪ BUT: KubeFed was not specifically designed for worldwide geo-distribution
5. Experimental setup
▪ 1 host cluster and 5 member clusters with Kubernetes 1.14
▪ Each cluster with a master and five worker nodes
▪ Host cluster nodes: 4 vCPUs, 16 GB RAM
▪ Member cluster nodes: 4 vCPUs, 4 GB RAM
▪ Simple nginx web server app
7. Impact of network configuration on stability
Table: average number of timeout errors per minute (N) and stability (υ) of the uncontrolled system for the three evaluation scenarios.
Figure annotations: network delay / packet loss rate increased; cluster failure; network delay / packet loss rate restored; cluster restored.
8. KubeFed configuration parameters
Parameter                                Default
Cluster Available Delay                  20s
Cluster Unavailable Delay                60s
Leader Elect Lease Duration              15s
Leader Elect Renew Deadline              10s
Leader Elect Retry Period                5s
Cluster Health Check Timeout             3s
Cluster Health Check Period              10s
Cluster Health Check Failure Threshold   3
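To make the role of these defaults concrete, the sketch below encodes the table as a small Python structure and models how a health check could turn slow responses into an "unhealthy" verdict. The class and function names are hypothetical and are not part of the KubeFed API. The semantics (a probe fails when it exceeds the timeout, and a cluster is marked unhealthy after a number of consecutive failures equal to the failure threshold) are an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KubeFedDefaults:
    """Default values copied from the table; seconds unless noted."""
    cluster_available_delay: float = 20.0
    cluster_unavailable_delay: float = 60.0
    leader_elect_lease_duration: float = 15.0
    leader_elect_renew_deadline: float = 10.0
    leader_elect_retry_period: float = 5.0
    cluster_health_check_timeout: float = 3.0
    cluster_health_check_period: float = 10.0
    cluster_health_check_failure_threshold: int = 3  # a count, not seconds


def probe_fails(response_time_s, cfg=KubeFedDefaults()):
    # Assumed semantics: a probe fails if no answer arrives within the timeout.
    return response_time_s > cfg.cluster_health_check_timeout


def cluster_marked_unhealthy(response_times_s, cfg=KubeFedDefaults()):
    # Assumed semantics: unhealthy after `failure_threshold` consecutive failures.
    consecutive = 0
    for rt in response_times_s:
        consecutive = consecutive + 1 if probe_fails(rt, cfg) else 0
        if consecutive >= cfg.cluster_health_check_failure_threshold:
            return True
    return False
```

Under these assumptions, three probes in a row that each take 4 s would flag a cluster as unhealthy even though it is merely slow, which is the kind of false alarm a static 3 s timeout can produce on a degraded wide-area link.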
14. (Temporary) conclusion
▪ We observe significant instability in KubeFed-based geo-distributed fog platforms due to:
• poor network conditions
• default / static configuration parameters
▪ We designed a proportional controller to adjust the Cluster Health Check Timeout (CHCT) at run-time
• Improves the system stability from 83–92% with no controller to 99.5–100% using the controller
Mulugeta Tamiru, Guillaume Pierre, Johan Tordsson, Erik Elmroth. Instability in Geo-Distributed Kubernetes Federation: Causes and Mitigation. In Proceedings of IEEE MASCOTS, Nov 2020.
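The slides do not give the control law, so the following is only a minimal sketch of what a proportional controller for CHCT could look like. It assumes the measured signal is the number of timeout errors per minute (N), that the setpoint is zero errors, and that the output is clamped between the 3 s default and an upper bound. The gain, the bounds, and the function name are illustrative assumptions, not values from the paper.

```python
def proportional_chct(timeout_errors_per_min, *, setpoint=0.0, gain=0.5,
                      base_chct_s=3.0, max_chct_s=30.0):
    """Return a CHCT value (seconds) from a proportional control law.

    output = base + gain * error, clamped to [base_chct_s, max_chct_s].
    All parameter values are hypothetical.
    """
    error = timeout_errors_per_min - setpoint
    chct = base_chct_s + gain * error
    return max(base_chct_s, min(max_chct_s, chct))
```

With these assumed values, a quiet network keeps CHCT at its default, while a burst of timeout errors lengthens the timeout in proportion to the error, up to the cap. Clamping keeps the timeout from growing so large that real cluster failures go undetected for a long time.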
15. Now that we fixed the instability problem, is KubeFed ready to manage large-scale geo-distributed platforms?
Not quite: in KubeFed, any deployment request is pushed to the requested cluster regardless of the resource availability in that cluster.
Let’s replay 1 hour of the Google cluster trace and distribute each job to one of 5 clusters according to a binomial distribution:
▪ 3 overloaded clusters
▪ 2 mostly idle clusters
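The job-to-cluster assignment above can be sketched as follows. This is an illustration only: the slides do not state the binomial parameters, so the choice of Binomial(4, p) with p = 0.5 (giving a cluster index between 0 and 4) and the function names are assumptions.

```python
import random
from collections import Counter


def pick_cluster(rng, n_clusters=5, p=0.5):
    # Draw from Binomial(n_clusters - 1, p) by counting Bernoulli successes;
    # the result is a cluster index in 0 .. n_clusters - 1.
    return sum(rng.random() < p for _ in range(n_clusters - 1))


def distribute_jobs(n_jobs, n_clusters=5, p=0.5, seed=0):
    # Return the number of jobs assigned to each cluster index.
    rng = random.Random(seed)
    counts = Counter(pick_cluster(rng, n_clusters, p) for _ in range(n_jobs))
    return [counts.get(i, 0) for i in range(n_clusters)]
```

With p = 0.5, the expected shares are 1/16, 4/16, 6/16, 4/16 and 1/16, so the three middle clusters receive most jobs and the two outer ones stay lightly loaded. That is consistent with the 3 overloaded / 2 mostly idle split, although the parameters actually used are not given here.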
16. Problems to address
▪ Make sure applications are not deployed in overloaded clusters
• Even if this requires choosing another cluster automatically…
▪ Support application autoscaling in multi-cluster environments
• Vary the number of replicas within a single cluster…
• … or across multiple clusters
▪ Allow the system to burst out to a public cloud in case of resource overload
• And retract public-cloud resources as early as possible
▪ Seamlessly integrate in existing KubeFed platforms
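As a rough illustration of the first and third requirements, the sketch below picks a cluster for a deployment while avoiding overloaded clusters, and falls back to a public-cloud cluster only when no local cluster has room. It is not the mck8s scheduling algorithm. The `Cluster` type, the CPU-only capacity model, and the "most free CPU" tie-break are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    cpu_capacity: float  # cores (hypothetical capacity model)
    cpu_used: float
    is_cloud: bool = False

    @property
    def cpu_free(self):
        return self.cpu_capacity - self.cpu_used


def place(requested_cluster, cpu_request, clusters):
    """Return the name of the cluster to deploy to, or None if nothing fits."""
    by_name = {c.name: c for c in clusters}
    preferred = by_name.get(requested_cluster)
    # 1. Honour the request if the requested cluster has enough free CPU.
    if preferred is not None and preferred.cpu_free >= cpu_request:
        return preferred.name
    # 2. Otherwise choose the local cluster with the most free CPU.
    local = [c for c in clusters if not c.is_cloud and c.cpu_free >= cpu_request]
    if local:
        return max(local, key=lambda c: c.cpu_free).name
    # 3. Burst to the public cloud only as a last resort.
    cloud = [c for c in clusters if c.is_cloud and c.cpu_free >= cpu_request]
    if cloud:
        return max(cloud, key=lambda c: c.cpu_free).name
    return None
```

Trying the local clusters before the cloud reflects the stated goal of using public-cloud resources only under overload. Retraction could follow the same ordering in reverse by moving workloads back once a local cluster has free capacity again.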
17.
Deploy mcd-app-1 across the two clusters that receive the most network traffic
Make sure end-user requests are distributed across both clusters
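Selecting the clusters that receive the most traffic can be sketched as a simple ranking. The traffic figures, cluster names, and function name below are hypothetical. How mck8s actually measures traffic is not described on this slide.

```python
def top_traffic_clusters(traffic_by_cluster, k=2):
    """Return the names of the k clusters with the highest observed traffic.

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(traffic_by_cluster.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]
```

For example, with observed traffic `{"rennes": 10, "umea": 50, "paris": 30}` (made-up numbers), the function would select `umea` and `paris` as the two placement targets.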
18.
Autoscale the application deployment to maintain reasonable CPU usage
Dynamically provision more resources from the public cloud if necessary
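One way to picture CPU-based autoscaling with cloud bursting is the sketch below. The replica formula is the one used by the standard Kubernetes Horizontal Pod Autoscaler (desired = ceil(current × currentMetric / targetMetric)), shown here without its tolerance band. Whether mck8s uses exactly this rule is an assumption. The split between local and cloud replicas is a hypothetical illustration.

```python
import math


def desired_replicas(current_replicas, current_cpu_util, target_cpu_util,
                     min_replicas=1, max_replicas=10):
    # Standard HPA-style rule, clamped to the allowed replica range.
    desired = math.ceil(current_replicas * current_cpu_util / target_cpu_util)
    return max(min_replicas, min(max_replicas, desired))


def split_replicas(desired, local_capacity):
    # Hypothetical bursting rule: fill local clusters first and send the
    # overflow to the public cloud.
    local = min(desired, local_capacity)
    return local, desired - local
```

For instance, 4 replicas running at 90% CPU against a 60% target would scale to 6 replicas. If the local clusters can host only 4, the remaining 2 would be provisioned in the public cloud and could be retracted once utilisation drops.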
20. Conclusion
Geo-distributed Kubernetes federations are now:
▪ Stable
▪ Resource availability aware
▪ Network traffic and network latency aware
▪ Burstable between available clusters, and to the public cloud
mck8s is available: https://github.com/moule3053/mck8s
Mulugeta Tamiru, Guillaume Pierre, Johan Tordsson, Erik Elmroth. mck8s: an orchestration platform for geo-distributed multi-cluster environments. In Proceedings of ICCCN, Jul 2021.
21. The FogGuru project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant 765452.
www.fogguru.eu