Juan Vicente Herrera - Red Hat Cloud Architect
Stretched cluster or
Multi-cluster?
Beyond your region...
1
Agenda
● HA Layers
● Multicluster / GitOps
● Stretched Cluster
● Multicluster
● Disaster Recovery
● Conclusion
2
High Availability vs. Disaster Recovery
3
● HA, High Availability: a characteristic of a system that aims to ensure
an agreed level of operational performance, usually uptime, for a
higher-than-normal period even if some components of the overall design
are not functional (degraded). It is generally based on Active/Active
redundancy.
● DR, Disaster Recovery: a set of policies, tools and procedures that
enable the recovery or continuation of vital technology infrastructure
and systems following a natural or human-induced disaster. It generally
involves secondary sites and Active/Passive redundancy.
HA layers: Applications
4
●Application availability usually involves running
several replicas of each application pod.
●Whenever one of the application pods fails, all
requests are redirected to the pods that are still
alive, without affecting the overall service level.
●Requests must be redirected to other application
pods transparently to the end user. The
application must not keep any local data that
could be lost if the application instance fails
(stateless).
(Diagram: a Kubernetes/OpenShift cluster in a DataCenter, with Masters 1-3 and worker Nodes 1..n running Pods A, B and C.)
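To make the pod layer concrete, below is a minimal sketch of a stateless Deployment with several replicas behind a Service; the names, image and health endpoint are illustrative assumptions, not part of the original deck.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-frontend                 # hypothetical application name
spec:
  replicas: 3                        # several pod replicas absorb a single pod failure
  selector:
    matchLabels:
      app: web-frontend
  template:
    metadata:
      labels:
        app: web-frontend
    spec:
      containers:
      - name: web
        image: registry.example.com/web-frontend:1.0   # hypothetical image
        ports:
        - containerPort: 8080
        readinessProbe:              # only ready pods receive Service traffic
          httpGet:
            path: /healthz           # assumed health endpoint
            port: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: web-frontend
spec:
  selector:
    app: web-frontend
  ports:
  - port: 80
    targetPort: 8080
```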
HA layers: Node
5
●Worker nodes are responsible for hosting
application pods.
●When one of the worker nodes fails, the K8s
cluster redirects network traffic to the application
pods on other worker nodes. If necessary, additional
application pods are deployed automatically.
●The cluster must have enough system resources
(CPU/MEM) to redistribute application workloads
upon worker node failures.
In a 5-node Kubernetes/OpenShift cluster, application workloads should not
consume more than 80% of the worker node system resources in order to be
able to allocate new pods upon worker node failures.
(Diagram: the same Kubernetes/OpenShift cluster in a DataCenter, with Masters 1-3 and worker Nodes 1..n running Pods A, B and C.)
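As a sketch of how replicas can be kept on different worker nodes, the hypothetical web-frontend Deployment from the previous slide could add a preferred pod anti-affinity rule and explicit resource requests; these values are assumptions for illustration only.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-frontend
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-frontend
  template:
    metadata:
      labels:
        app: web-frontend
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              topologyKey: kubernetes.io/hostname   # keep replicas on different nodes when possible
              labelSelector:
                matchLabels:
                  app: web-frontend
      containers:
      - name: web
        image: registry.example.com/web-frontend:1.0   # hypothetical image
        resources:
          requests:                  # explicit requests help keep the ~80% headroom described above
            cpu: 250m
            memory: 256Mi
```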
HA layers: Control plane
6
● Master nodes host administration and
management services (such as the API and Console
pods).
● HA is provided through quorum. Generally three master
nodes are deployed; upon a node failure, two master
nodes remain alive and the service is not disrupted.
● Losing the entire control plane does not affect
running application services; only OpenShift management
and provisioning are affected (read-only
operations).
(Diagram: Kubernetes/OpenShift cluster in a DataCenter, with Masters 1-3 and worker Nodes 1..n running Pods A, B and C.)
HA layers: Data
7
●Ceph introduction
○ Ceph is a Software Defined Storage system
deployed on standard x86 servers, using the
CRUSH algorithm to distribute data evenly
across the cluster.
○ Ceph provides 3-in-1 interfaces for object,
block and file-level storage. Ceph aims
primarily at completely distributed operation
without a single point of failure, scalable to
the exabyte level.
○ Ceph (by default) stores 3 replicas of each
client object.
(Diagram: a Ceph cluster made of several Ceph nodes, exposing Block, FileSystem and Object interfaces.)
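As an illustration of the 3-replica default, a Rook/ODF-style block pool can declare the replica count explicitly. This is a minimal sketch assuming a Rook-managed Ceph cluster in the rook-ceph namespace; it is not prescribed by the deck.

```yaml
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool            # illustrative pool name
  namespace: rook-ceph         # assumes a Rook-operated Ceph cluster
spec:
  failureDomain: host          # place each replica on a different node
  replicated:
    size: 3                    # three copies of every object, matching the Ceph default
```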
HA layers: Conclusions
8
Up to this point, everything works perfectly.
Combining K8s/OCP with persistent storage
such as Ceph (or equivalent) in a single site, an
extraordinary service level can be
guaranteed,
but…
How can I protect my applications from
natural or human-induced disasters
affecting the entire DataCenter?
(Diagram: single-DataCenter cluster with Masters 1-3, worker Nodes 1..n and Storage Nodes 1-3.)
Protection upon disasters
9
● Two different protection models:
○ Active/Active
■ Stretched cluster or multi-cluster.
■ Distributed applications between
clusters/DCs.
■ Data is accessible from any cluster/DC.
○ Active/Passive
■ Applications
■ Data
(Diagram: single-DataCenter cluster with Masters 1-3, worker Nodes 1..n and Storage Nodes 1-3.)
Multi-DataCenter deployments
What are the different alternatives?
10
1. GitOps: Two synchronized independent clusters (Active/Active)
2. Stretched Cluster (Active/Active)
3. Disaster Recovery (Active/Passive)
4. Recap and Conclusions
1.- Multicluster /
GitOps
● Two independent clusters synchronized directly by the
applications (Active/Active)
● Data HA directly managed by the applications
11
GitOps: Synchronized Clusters
Configuration and application synchronization
12
GitOps is an approach in which a Git repository always contains declarative descriptions of the
infrastructure desired in the production environment, and an automated process makes the
production environment match the state described in the repository.
The Git authorization mechanism can be used to restrict who is allowed to perform
deployments. This has a huge impact in terms of security, as CI/CD tools do not need to
interact with the OpenShift cluster in the production environment.
(Diagram: GitOps flow. Source code passes through Continuous Integration (build, test, etc.) and produces images in a Container Registry plus configuration in Git; an application controller (AppCtrl) in each cluster monitors the configuration, applies changes and downloads containers from the registry.)
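A common way to hold those declarative descriptions is one Git repository with a shared base and one overlay per cluster/environment. The layout and file names below are an illustrative sketch using Kustomize, not taken from the deck.

```yaml
# Suggested repository layout (illustrative):
#   apps/web-frontend/base/deployment.yaml
#   apps/web-frontend/base/service.yaml
#   apps/web-frontend/base/kustomization.yaml
#   apps/web-frontend/overlays/cluster-dc1/kustomization.yaml
#   apps/web-frontend/overlays/cluster-dc2/kustomization.yaml
#
# apps/web-frontend/overlays/cluster-dc1/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../../base                   # the shared, cluster-agnostic manifests
patches:
- patch: |-                    # per-cluster tweak, e.g. replica count for DC1
    - op: replace
      path: /spec/replicas
      value: 3
  target:
    kind: Deployment
    name: web-frontend
```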
GitOps: Synchronized Clusters
13
ArgoCD is an "Application Controller" specifically
designed for Kubernetes that actively monitors
running applications and compares the current state
with the desired state (specified in the Git
repository).
An application that deviates from the desired state is
considered OutOfSync. ArgoCD reports and
displays the differences, offering different
options to synchronize (automatically or
manually) the current state with the desired state.
Any modification made to the desired state in the Git repository can be applied in
order to automatically synchronize the application state (per cluster/environment).
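For reference, an Argo CD Application pointing at the hypothetical overlay sketched earlier could look like the following minimal example; the repository URL, path and namespaces are illustrative assumptions.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: web-frontend-dc1
  namespace: argocd                      # namespace where Argo CD runs
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/gitops-config.git   # hypothetical config repo
    targetRevision: main
    path: apps/web-frontend/overlays/cluster-dc1
  destination:
    server: https://kubernetes.default.svc   # the cluster Argo CD itself runs in
    namespace: web-frontend
  syncPolicy:
    automated:
      prune: true                        # remove resources deleted from Git
      selfHeal: true                     # revert manual drift back to the Git state
```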
GitOps: Synchronized Clusters
DATA management
14
GitOps can offer an application management model where High Availability (with no
service disruption) is guaranteed in a Multi-DC environment (RPO=0, RTO=0).
Requirement: application data must be shared across the different clusters/environments/sites.
● Multi-master design where all the DB instances are active, RW, and maintain exactly the
same information (RPO=0, RTO=0).
● Single-master design where all applications access a single node, RW, and the
information is replicated to other DB instances, RO or not accessible at all. When a failure
occurs in the master instance, a secondary instance is promoted to master (RPO=0,
RTO≈0).
● Whether the DBs are deployed in containers or externally, access to exactly the same shared
data must be ensured for the applications deployed in the different sites.
Further info: En.wikipedia.org EDB Crunchy MongoDB CouchBase Percona Microsoft
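One simple way to give the applications in both clusters the same data endpoint is an ExternalName Service deployed identically in each cluster and pointing at the shared or replicated database; the names below are hypothetical.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: orders-db                               # hostname applications resolve in either cluster
  namespace: web-frontend
spec:
  type: ExternalName
  externalName: orders-db.shared.example.com    # hypothetical shared/replicated DB endpoint
```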
GitOps: Synchronized Clusters
Conclusion
15
As a recap, in a multi-DC environment, GitOps + data replication (in the application or DB
layer) may offer Active/Active services with RPO=0 and RTO=0 upon a natural or
human-induced disaster.
The main challenge is identifying the DB technologies (SQL and NoSQL) to use and
standardizing how to deploy each of them.
The main advantages are:
- Deploy in two different sites/DCs (max RTT between sites defined by the DB layer).
- Simplicity, as there are no dependencies on the underlying infrastructure.
- Cost, as this solution can be deployed on bare-metal nodes, reducing licensing costs.
- Cost, as being Active/Active there are no underutilized resources in standby environments.
- Ideal solution if applications are stateless or use external data.
- RPO=0, RTO=0
GitOps: Synchronized Clusters
Conclusion
Multi-DataCenter
16
The main disadvantages are:
- Added complexity in designing and deploying applications.
- Distributed DBs usually work with 3 object replicas and 3 DCs.
- External unification of incoming network traffic (routers, DNS, LBs, ...).
- Submariner is not included in this solution, so network traffic between
clusters/sites/DCs must leave the OpenShift SDN.
- It is not possible to share Persistent Volumes in ODF between OpenShift clusters.
- There is no unified logging, monitoring, user access, security management, ...
External tools must be used.
- Max RTT is defined per application/DB.
2.- Stretched Cluster
(Active/Active)
17
2.- Stretched Cluster (Active/Active)
18
● In the GitOps model we proposed the deployment of several
independent and self-managed clusters that remain in a synchronized
state.
● A Stretched Cluster is a K8s/OpenShift deployment model in which the
nodes of the cluster are distributed among several DataCenters.
● Although it is unlikely, the total loss of the master nodes does not imply
a loss of service; it only limits the management of the cluster.
2.- Stretched Cluster (Active/Active)
19
Pros:
● By definition, a Stretched Cluster provides RTO=0 and RPO=0.
● With 3 symmetrical DCs (same size), assuming 66% load, one DC could go down without
impacting service availability: the load of the failed DC is split between the remaining DCs,
each rising from ~66% to ~100% utilization (66/66/66 => 0/100/100).
● The same balancing happens internally between the worker nodes of each DataCenter.
Network latency requirements:
Latency between DataCenters ≤ 2ms (driven by the OCP masters requirements)
● ≤ 2ms between OCP masters
● ≤ 4ms between Ceph storage nodes (including arbiter) (Internal mode)
● ≤ 200ms between Ceph nodes and Arbiter node (External mode)
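In a stretched cluster, the nodes of each DataCenter are typically labeled with a zone (for example topology.kubernetes.io/zone set to dc1/dc2/dc3), and workloads can then be spread across those zones. The following is an assumed sketch reusing the hypothetical web-frontend application.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-frontend
spec:
  replicas: 6                                    # e.g. two replicas per DataCenter
  selector:
    matchLabels:
      app: web-frontend
  template:
    metadata:
      labels:
        app: web-frontend
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone # one zone label per DataCenter (assumed dc1/dc2/dc3)
        whenUnsatisfiable: ScheduleAnyway        # prefer an even spread, still schedule if a DC is down
        labelSelector:
          matchLabels:
            app: web-frontend
      containers:
      - name: web
        image: registry.example.com/web-frontend:1.0   # hypothetical image
```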
2.1- K8s/OCP + internal storage in 3 DC
(rooms/FD)
20
(Diagram: one K8s/OpenShift cluster stretched across three DataCenters, each hosting a master, worker nodes and Ceph storage.)
● DataCenter 1: Application Nodes, Master (it can be virtualized), Ceph with local physical storage.
● DataCenter 2: with/without Application Nodes, Master (it can be virtualized), Ceph with local physical storage.
● DataCenter 3: Application Nodes, Master (it can be virtualized), Ceph with local physical storage.
PROS: Autonomous and IaaS-agnostic model. Minimum underutilization of computing and memory (66/66/66 > 0/100/100).
CONS: Requires 3 DCs, each of them well connected to the other 2 DCs (latency ≤ 2ms).
- Check etcd behavior under network isolation between DCs.
- Applications may break as a result of etcd not being writable.
2.2- K8s/OCP + internal storage in 2 DC +
Arbiter
21
PROS: Autonomous and IaaS-agnostic model. The DC hosting the arbiter requires very few infrastructure resources.
CONS:
- Minimal resources in a 3rd room or DataCenter, well connected to the other 2 DCs, are required.
- ≤ 4ms between Ceph nodes and the Arbiter node (Internal mode); the K8s/OCP masters require even lower RTT values.
- Underutilization of computing and memory (50/0/50 > 0/0/100).
(Diagram: one K8s/OpenShift cluster stretched across two main DataCenters plus a third arbiter site.)
● Main DataCenters (x2): Application Nodes, Master (it can be virtualized), Ceph with local physical storage, 2 Ceph monitors each.
● Arbiter site: no Application Nodes, Master (it can be virtualized), Ceph Arbiter Node (metadata only) without local physical storage.
2.3. HA based on IaaS - 2 DC’s Virtual
22
(Diagram: a fully virtualized OpenShift cluster, with masters, workers, ODF/Ceph nodes and a virtualized Ceph arbiter node (metadata only), running on IaaS across DataCenter 1 and DataCenter 2.)
● 100% virtualized environments in both DCs: masters, workers and the Ceph/ODF arbiter are all VMs.
● In the event of a DC1 crash, all VMs are started in DC2.
PROS: In the event of a DataCenter crash, all nodes are moved to the other DataCenter.
CONS: Replication bandwidth and time are required to replicate and synchronize the migrated nodes (approx. 1 to 5 min).
- Requires IaaS (RHV / VMware) + synchronous storage-array replication of all VMs from DC1 to DC2.
3.- Conclusions
23
Conclusion, decision criteria
24
As described in the previous slides, there are multiple deployment options
and alternatives on the table. A key point to be considered in the
decision-making is the number of DC's, their capacities and the network
latency between them.
The first option to be considered should be a Stretched Cluster deployed
across 3 DC's with similar computational capabilities (CPU, memory and
storage) and network latency between them ≤ 2ms.
This solution significantly reduces the costs of infrastructure, operation
and software licensing with the highest degree of service availability.
25
In scenarios where a Stretched Cluster deployment across 3 DCs
with similar computational capabilities is not an option, an alternative
may be a Stretched Cluster in 2 DCs plus an arbiter DC with limited resources.
This solution reduces the infrastructure resources needed for the arbiter DC, which
could be located in a corporate building or technical room separate from
the 2 main DCs, as long as the network latency requirements are met (≤
2ms).
Conclusion, decision criteria
26
When a Stretched Cluster deployment in 3 DCs is not possible but we can
have latency ≤ 2ms, we can consider a Stretched Cluster in 2 DCs with DR
based on IaaS.
This model introduces a caveat, as it keeps 2 of the 3 nodes of the OCP
and OCS control plane in one of the DCs. If this "primary" DC goes down, 1
node of each type must be transferred to the secondary DC for the cluster
to regain quorum.
It requires designing and implementing the VM migration process.
Conclusion, decision criteria
27
When a Stretched Cluster deployment on 3 DC’s is not an option or the
network latency will always be higher than 2ms, we can consider a
GitOps-based deployment on 2 DC’s.
This model requires that the instances of the same application, running in
any of the clusters, share the same data. Either because the architecture of
the applications allows it (for example, event-oriented), or because the
database is multimaster, or because there is a master-slave model that
allows promoting the slave without loss of service.
Conclusion, decision criteria
28
Finally, when a Stretched Cluster deployment in 3 DCs can't be
accomplished, the latency is not ≤ 2ms and the data access model of the
applications cannot be standardized, we can only consider a single DC with
DR.
This approach is possibly the least flexible and the one with the highest cost
per computing unit. It requires designing and implementing the DR policies
and processes tied to it, as well as the associated operations and
maintenance.
Conclusion, decision criteria
Conclusions, requirements per model
29
● Three identical DCs with available scalability (CPU, memory and storage): 3 DC Stretched Cluster.
● Two identical DCs with available scalability (CPU, memory and storage) plus a technical room to deploy two bare-metal servers: 2 DC Stretched Cluster + Arbiter site.
● Two identical DCs with available scalability (CPU, memory and storage), no technical room available: 2 DC managed by GitOps; 2 DC Stretched Cluster + IaaS.
● RTT latency between DCs ≤ 2ms: 3 DC Stretched Cluster; 2 DC Stretched Cluster + Arbiter site; 2 DC Stretched Cluster + IaaS.
● Applications deployed in two different OCP clusters able to share the same external storage (or replicated by the application/DB): 2 DC managed by GitOps.
● IaaS infrastructure capable of migrating VMs between DCs: 2 DC Stretched Cluster + IaaS.
Conclusions, advantages per model
30
● Active/Active (RTO=0, RPO=0): all four models.
● Self-sufficient, does not require an IaaS environment: 3 DC Stretched Cluster; 2 DC Stretched Cluster + Arbiter site; 2 DC managed by GitOps.
● Simple DR testing (no service availability impact): all four models.
● Efficient use of system resources (CPU and memory) (66/66/66 > 0/100/100): 3 DC Stretched Cluster.
● No additional effort to implement/maintain DR: 3 DC Stretched Cluster; 2 DC Stretched Cluster + Arbiter site.
● No manual intervention required upon a DC disaster: 3 DC Stretched Cluster; 2 DC Stretched Cluster + Arbiter site; partially for 2 DC Stretched Cluster + IaaS.
● Can be deployed on bare metal: 3 DC Stretched Cluster; 2 DC Stretched Cluster + Arbiter site; 2 DC managed by GitOps; partially for 2 DC Stretched Cluster + IaaS.
Thanks for your attention!
31
LinkedIn: https://www.linkedin.com/in/jvherrera/
Twitter: https://twitter.com/jvicenteherrera
Email: juanvi@redhat.com