TRAINING THE NEXT GENERATION OF EUROPEAN FOG COMPUTING EXPERTS
Container orchestration in
geo-distributed cloud computing platforms
Keynote at HotCloudPerf
April 20th 2021
Mulugeta Ayalew Tamiru, Guillaume Pierre, Johan Tordsson and Erik Elmroth
Elastisys AB & Université de Rennes 1
1
Geo-distributed cloud platforms
2
▪ Fault tolerance
▪ Proximity
▪ Resource aggregation
▪ Regulatory compliance
Goal: reliably deploy software across the full platform
▪ Containers everywhere
• To abstract away the heterogeneity of host hardware and hypervisors
▪ Deploy potentially large numbers of containers
• If necessary: burst to a public cloud
▪ Control container placements
• Manually
• Semi-automatically: “as close as possible to X”
• Automatically: load-balanced across all locations (see the placement sketch below)
3
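To make the three placement modes concrete, here is a minimal, purely illustrative Python sketch of a cluster-selection function; the cluster names, coordinates, and load metric are invented for the example and are not part of KubeFed or mck8s.

```python
# Hypothetical illustration (not a real KubeFed API): three placement modes
# for choosing which cluster should run a container.
from dataclasses import dataclass

@dataclass
class Cluster:
    name: str
    location: tuple          # (latitude, longitude), simplified
    running_containers: int

def distance(a, b):
    # Naive Euclidean distance; a real scheduler would use RTT or geo-distance.
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

def place(clusters, mode, pinned=None, near=None):
    if mode == "manual":                       # operator picks the cluster
        return next(c for c in clusters if c.name == pinned)
    if mode == "proximity":                    # "as close as possible to X"
        return min(clusters, key=lambda c: distance(c.location, near))
    if mode == "load-balanced":                # spread across all locations
        return min(clusters, key=lambda c: c.running_containers)
    raise ValueError(f"unknown placement mode: {mode}")

clusters = [Cluster("rennes", (48.1, -1.7), 12),
            Cluster("umea", (63.8, 20.3), 3),
            Cluster("public-cloud", (50.1, 8.7), 0)]
print(place(clusters, "proximity", near=(48.8, 2.3)).name)   # -> rennes
```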
Kubernetes Federation (KubeFed)
▪ Resource management and application deployment on multiple Kubernetes clusters (member clusters) from a single control plane (host cluster)
▪ BUT: KubeFed was not specifically designed for worldwide geo-distribution
4
Experimental setup
5
▪ 1 host cluster and 5 member clusters with Kubernetes 1.14
▪ Each cluster with a master and five worker nodes
▪ Host cluster nodes: 4 vCPUs, 16 GB RAM
▪ Member cluster nodes: 4 vCPUs, 4 GB RAM
▪ Simple nginx web server app
Problem -- Instability
6
[Figure: stability]
Impact of network configuration on stability
7
Average no. of timeout errors per minute (N) and stability (υ) of the uncontrolled system for the three evaluation scenarios.
[Plot annotations: network delay / packet loss rate increased · cluster failure · network delay / packet loss rate restored · cluster restored]
KubeFed configuration parameters
8
Parameter                                  Default
Cluster Available Delay                    20s
Cluster Unavailable Delay                  60s
Leader Elect Lease Duration                15s
Leader Elect Renew Deadline                10s
Leader Elect Retry Period                  5s
Cluster Health Check Timeout               3s
Cluster Health Check Period                10s
Cluster Health Check Failure Threshold     3
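These parameters are exposed through KubeFed's KubeFedConfig custom resource. As a rough sketch, the cluster health check settings above could be patched with the official Kubernetes Python client; the API group, namespace, resource names, and field names below are assumptions from memory of KubeFed v1beta1 and should be checked against the installed CRD.

```python
# Rough sketch: adjust the cluster health check settings via the KubeFedConfig
# custom resource. Group/version/field names are assumptions (KubeFed v1beta1);
# verify them against your installed CRD before use.
from kubernetes import client, config

config.load_kube_config()                      # credentials for the host cluster
api = client.CustomObjectsApi()

patch = {"spec": {"clusterHealthCheck": {
    "timeout": "3s",            # Cluster Health Check Timeout (CHCT)
    "period": "10s",            # Cluster Health Check Period
    "failureThreshold": 3,      # Cluster Health Check Failure Threshold
}}}

api.patch_namespaced_custom_object(
    group="core.kubefed.io", version="v1beta1",
    namespace="kube-federation-system",
    plural="kubefedconfigs", name="kubefed", body=patch)
```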
Stability vs. failure detection delay
9
Solution -- Controller to adjust the Cluster Health Check Timeout (CHCT) at run-time
10
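The sketch below only illustrates the idea of a proportional controller for CHCT; the control law, gain, and bounds are assumptions for illustration, not the exact controller designed in the MASCOTS paper.

```python
# Minimal sketch of a proportional controller that adapts the Cluster Health
# Check Timeout (CHCT) at run time. Gain and bounds are illustrative only.
K_P = 3.0                          # assumed gain: timeout = K_P * observed RTT
CHCT_MIN, CHCT_MAX = 3.0, 60.0     # clamp to sensible bounds (seconds)

def next_chct(observed_rtt_s: float) -> float:
    """Return the next health-check timeout given the latest RTT sample."""
    return min(CHCT_MAX, max(CHCT_MIN, K_P * observed_rtt_s))

def control_step(member_clusters, measure_rtt, apply_timeout):
    # Pseudo-driver: each health-check period, measure the RTT to every member
    # cluster and re-apply the timeout, e.g. via the KubeFedConfig patch above.
    for cluster in member_clusters:
        rtt = measure_rtt(cluster)             # seconds
        apply_timeout(cluster, next_chct(rtt))
```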
Results -- Stationary scenario
11
Results -- Network variability scenario
12
[Plot annotations: network delay / packet loss rate increased · network delay / packet loss rate restored]
Results -- Cluster failure scenario
13
[Plot annotations: cluster failure · cluster restored]
(Temporary) conclusion
▪ We observe significant instability in KubeFed-based geo-distributed fog platforms due to:
• poor network conditions
• default / static configuration parameters
▪ We designed a proportional controller to adjust CHCT at run-time
• Improves the system stability from 83–92% with no controller to 99.5–100% using the controller
Mulugeta Tamiru, Guillaume Pierre, Johan Tordsson, Erik Elmroth. Instability in Geo-Distributed Kubernetes Federation:
Causes and Mitigation. In Proceedings of IEEE MASCOTS, Nov 2020.
14
Now that we have fixed the instability problem, is KubeFed ready to manage large-scale geo-distributed platforms?
Not quite: in KubeFed, any deployment request is pushed to the requested cluster regardless of the resource availability in that cluster.
15
Let’s replay 1 hour of the Google cluster trace, distributing each job to one of 5 clusters according to a binomial distribution (a replay sketch follows below):
▪ 3 overloaded clusters
▪ 2 mostly idle clusters
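A toy version of this replay is sketched below; the job count and the binomial parameters are invented for illustration and only mimic the skew described above (a few clusters get most jobs, the rest stay mostly idle).

```python
# Toy replay: jobs from a trace are assigned to one of 5 clusters, with the
# target index drawn from a binomial distribution so that low-index clusters
# end up overloaded while the others stay mostly idle. Parameters are
# illustrative, not the exact experimental setup.
import random
from collections import Counter

N_CLUSTERS = 5
jobs = range(1000)                     # stand-in for 1 hour of trace jobs
assignment = Counter()

for job in jobs:
    # Binomial(n=4, p=0.35) yields values 0..4 skewed towards low indices,
    # so clusters 0-2 receive most jobs and clusters 3-4 stay mostly idle.
    target = sum(random.random() < 0.35 for _ in range(N_CLUSTERS - 1))
    assignment[f"cluster-{target}"] += 1

print(assignment.most_common())
```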
Problems to address
▪ Make sure applications are not deployed in overloaded clusters (see the placement sketch below)
• Even if this requires choosing another cluster automatically…
▪ Support application autoscaling in multi-cluster environments
• Vary the number of replicas within a single cluster…
• … or across multiple clusters
▪ Allow the system to burst out to a public cloud in case of resource overload
• And retract public-cloud resources as early as possible
▪ Seamlessly integrate into existing KubeFed platforms
16
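The first requirement boils down to a placement policy of the following shape; this is only a sketch under assumed names and thresholds, not the actual mck8s scheduler.

```python
# Sketch of the placement logic called for above: honour the requested cluster
# if it has capacity, otherwise fall back to another on-premises cluster, and
# only burst to the public cloud as a last resort. Names and numbers invented.
def pick_cluster(requested, clusters, cpu_request, burst_cluster):
    """clusters maps name -> free CPU (cores); requested is the user's choice."""
    if clusters.get(requested, 0) >= cpu_request:
        return requested                               # honour the request
    candidates = {n: free for n, free in clusters.items()
                  if free >= cpu_request and n != burst_cluster}
    if candidates:                                     # least-loaded alternative
        return max(candidates, key=candidates.get)
    return burst_cluster                               # last resort: cloud burst

clusters = {"paris": 0.5, "rennes": 6.0, "umea": 1.0, "gce-burst": 64.0}
print(pick_cluster("paris", clusters, cpu_request=2.0,
                   burst_cluster="gce-burst"))         # -> rennes
```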
17
Deploy mcd-app-1 across the two clusters that receive the most network traffic
Make sure end-user requests are distributed across both clusters
18
Autoscale the application deployment to maintain reasonable CPU usage
Dynamically provision more resources from the public cloud if necessary (see the autoscaling sketch below)
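As a rough illustration of this step, the sketch below applies the standard Kubernetes HPA scaling rule to the total replica count and spills the overflow to a public cloud cluster; the capacity split and threshold are assumptions, not mck8s's actual autoscaler.

```python
# Sketch of multi-cluster autoscaling with cloud bursting: grow or shrink the
# total replica count to keep average CPU near a target, place as many replicas
# as possible on-premises, and push the remainder to a public cloud cluster.
import math

def desired_replicas(current, avg_cpu, target_cpu=0.6):
    # Standard HPA rule: replicas * observed / target, rounded up.
    return max(1, math.ceil(current * avg_cpu / target_cpu))

def spread(total, onprem_capacity):
    """Split replicas between on-prem clusters and the public cloud."""
    onprem = min(total, onprem_capacity)
    return {"on-prem": onprem, "public-cloud": total - onprem}

current = {"on-prem": 8, "public-cloud": 0}
total = desired_replicas(sum(current.values()), avg_cpu=0.9)   # 8*0.9/0.6 -> 12
print(spread(total, onprem_capacity=10))   # -> {'on-prem': 10, 'public-cloud': 2}
```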
19
Conclusion
Geo-distributed Kubernetes federations are now:
▪ Stable
▪ Resource availability aware
▪ Network traffic and network latency aware
▪ Burstable between available clusters, and to the public cloud
mck8s is available: https://github.com/moule3053/mck8s
Mulugeta Tamiru, Guillaume Pierre, Johan Tordsson, Erik Elmroth. mck8s: an orchestration platform for geo-distributed
multi-cluster environments. In Proceedings of ICCCN, Jul 2021.
20
The FogGuru project has received funding from the European Union’s
Horizon 2020 research and innovation programme under the Marie
Skłodowska-Curie grant 765452.
TRAINING THE NEXT GENERATION
OF EUROPEAN FOG COMPUTING EXPERTS
www.fogguru.eu
21
