Juan Vicente Herrera - Red Hat Cloud Architect
Stretched cluster or
Multi-cluster?
Beyond your region...
1
Agenda
● HA Layers
● Multicluster / GitOps
● Stretched Cluster
● Multicluster
● Disaster Recovery
● Conclusion
2
High Availability vs. Disaster Recovery
3
● HA, High Availability: a characteristic of a system that aims to ensure
an agreed level of operational performance, usually uptime, for a
higher-than-normal period even if some components of the overall design
are not functional (degraded). It is generally based on Active/Active
redundancy.
● DR, Disaster Recovery: a set of policies, tools and procedures that
enable the recovery or continuation of vital technology infrastructure
and systems following a natural or human-induced disaster. It generally
involves secondary sites and Active/Passive redundancy.
HA layers: Applications
4
●Application availability usually involves running
several replicas of each application pod.
●Whenever one of the application pods fails, all
requests are redirected to the pods that are still
alive, without affecting the overall service level.
●Requests must be redirected to other application
pods transparently to the end user. The
application must not keep any local data that
could be lost if the application instance fails
(stateless).
(Diagram: a Kubernetes/OpenShift cluster in a DataCenter, with Masters 1-3 and worker Nodes 1..n running Pods A, B and C.)
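To make the pod layer concrete, below is a minimal sketch of a stateless Deployment with several replicas behind a Service; the names, image and health endpoint are illustrative assumptions, not part of the original deck.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-frontend                 # hypothetical application name
spec:
  replicas: 3                        # several pod replicas absorb a single pod failure
  selector:
    matchLabels:
      app: web-frontend
  template:
    metadata:
      labels:
        app: web-frontend
    spec:
      containers:
      - name: web
        image: registry.example.com/web-frontend:1.0   # hypothetical image
        ports:
        - containerPort: 8080
        readinessProbe:              # only ready pods receive Service traffic
          httpGet:
            path: /healthz           # assumed health endpoint
            port: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: web-frontend
spec:
  selector:
    app: web-frontend
  ports:
  - port: 80
    targetPort: 8080
```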
HA layers: Node
5
●Worker nodes are responsible for hosting
application pods.
●When one of the worker nodes fails, the K8s
cluster redirects network traffic to the application
pods on other worker nodes. If necessary, additional
application pods are deployed automatically.
●The cluster must have enough system resources
(CPU/MEM) to redistribute application workloads
upon worker node failures.
In a 5-node Kubernetes/OpenShift cluster, application workloads should not
consume more than 80% of the worker node system resources in order to be
able to allocate new pods upon worker node failures.
(Diagram: the same Kubernetes/OpenShift cluster in a DataCenter, with Masters 1-3 and worker Nodes 1..n running Pods A, B and C.)
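As a sketch of how replicas can be kept on different worker nodes, the hypothetical web-frontend Deployment from the previous slide could add a preferred pod anti-affinity rule and explicit resource requests; these values are assumptions for illustration only.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-frontend
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-frontend
  template:
    metadata:
      labels:
        app: web-frontend
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              topologyKey: kubernetes.io/hostname   # keep replicas on different nodes when possible
              labelSelector:
                matchLabels:
                  app: web-frontend
      containers:
      - name: web
        image: registry.example.com/web-frontend:1.0   # hypothetical image
        resources:
          requests:                  # explicit requests help keep the ~80% headroom described above
            cpu: 250m
            memory: 256Mi
```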
HA layers: Control plane
6
● Master nodes host administration and
management services (such as the API and Console
pods).
● HA is provided through quorum. Generally three master
nodes are deployed; upon a node failure, two master
nodes remain alive and the service is not disrupted.
● Losing the entire control plane does not affect
running application services; only OpenShift management
and provisioning are affected (read-only
operations).
(Diagram: Kubernetes/OpenShift cluster in a DataCenter, with Masters 1-3 and worker Nodes 1..n running Pods A, B and C.)
HA layers: Data
7
●Ceph introduction
○ Ceph is a Software Defined Storage system
deployed on standard x86 servers, using the
CRUSH algorithm to distribute data evenly
across the cluster.
○ Ceph provides 3-in-1 interfaces for object,
block and file-level storage. Ceph aims
primarily at completely distributed operation
without a single point of failure, scalable to
the exabyte level.
○ Ceph (by default) stores 3 replicas of each
client object.
(Diagram: a Ceph cluster made of several Ceph nodes, exposing Block, FileSystem and Object interfaces.)
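As an illustration of the 3-replica default, a Rook/ODF-style block pool can declare the replica count explicitly. This is a minimal sketch assuming a Rook-managed Ceph cluster in the rook-ceph namespace; it is not prescribed by the deck.

```yaml
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool            # illustrative pool name
  namespace: rook-ceph         # assumes a Rook-operated Ceph cluster
spec:
  failureDomain: host          # place each replica on a different node
  replicated:
    size: 3                    # three copies of every object, matching the Ceph default
```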
HA layers: Conclusions
8
Up to this point, everything works perfectly.
Combining K8s/OCP with persistent storage
such as Ceph (or equivalent) in a single site, an
extraordinary service level can be
guaranteed,
but…
How can I protect my applications from
natural or human-induced disasters
affecting the entire DataCenter?
(Diagram: single-DataCenter cluster with Masters 1-3, worker Nodes 1..n and Storage Nodes 1-3.)
Protection upon disasters
9
● Two different protection models:
○ Active/Active
■ Stretched cluster or multi-cluster.
■ Distributed applications between
clusters/DCs.
■ Data is accessible from any cluster/DC.
○ Active/Passive
■ Applications
■ Data
(Diagram: single-DataCenter cluster with Masters 1-3, worker Nodes 1..n and Storage Nodes 1-3.)
Multi-DataCenter deployments
What are the different alternatives?
10
1. GitOps: Two synchronized independent clusters (Active/Active)
2. Stretched Cluster (Active/Active)
3. Disaster Recovery (Active/Passive)
4. Recap and Conclusions
1.- Multicluster /
GitOps
● Two independent clusters synchronized directly by the
applications (Active/Active)
● Data HA directly managed by the applications
11
GitOps: Synchronized Clusters
Configuration and application synchronization
12
GitOps is an approach in which a Git repository always contains declarative descriptions of the
infrastructure desired in the production environment, and an automated process makes the
production environment match the state described in the repository.
The Git authorization mechanism can be used to restrict who is allowed to perform
deployments. This has a huge impact in terms of security, as CI/CD tools do not need to
interact with the OpenShift cluster in the production environment.
(Diagram: GitOps flow. Source code passes through Continuous Integration (build, test, etc.) and produces images in a Container Registry plus configuration in Git; an application controller (AppCtrl) in each cluster monitors the configuration, applies changes and downloads containers from the registry.)
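A common way to hold those declarative descriptions is one Git repository with a shared base and one overlay per cluster/environment. The layout and file names below are an illustrative sketch using Kustomize, not taken from the deck.

```yaml
# Suggested repository layout (illustrative):
#   apps/web-frontend/base/deployment.yaml
#   apps/web-frontend/base/service.yaml
#   apps/web-frontend/base/kustomization.yaml
#   apps/web-frontend/overlays/cluster-dc1/kustomization.yaml
#   apps/web-frontend/overlays/cluster-dc2/kustomization.yaml
#
# apps/web-frontend/overlays/cluster-dc1/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../../base                   # the shared, cluster-agnostic manifests
patches:
- patch: |-                    # per-cluster tweak, e.g. replica count for DC1
    - op: replace
      path: /spec/replicas
      value: 3
  target:
    kind: Deployment
    name: web-frontend
```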
GitOps: Synchronized Clusters
13
ArgoCD is an "Application Controller" specifically
designed for Kubernetes that actively monitors
running applications and compares the current state
with the desired state (specified in the Git
repository).
An application that deviates from the desired state is
considered OutOfSync. ArgoCD reports and
displays the differences, offering different
options to synchronize (automatically or
manually) the current state with the desired state.
Any modification made to the desired state in the Git repository can be applied in
order to automatically synchronize the application state (per cluster/environment).
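For reference, an Argo CD Application pointing at the hypothetical overlay sketched earlier could look like the following minimal example; the repository URL, path and namespaces are illustrative assumptions.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: web-frontend-dc1
  namespace: argocd                      # namespace where Argo CD runs
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/gitops-config.git   # hypothetical config repo
    targetRevision: main
    path: apps/web-frontend/overlays/cluster-dc1
  destination:
    server: https://kubernetes.default.svc   # the cluster Argo CD itself runs in
    namespace: web-frontend
  syncPolicy:
    automated:
      prune: true                        # remove resources deleted from Git
      selfHeal: true                     # revert manual drift back to the Git state
```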
GitOps: Synchronized Clusters
DATA management
14
GitOps can offer an application management model where High Availability (with no
service disruption) is guaranteed in a Multi-DC environment (RPO=0, RTO=0).
Requirement: application data must be shared across the different clusters/environments/sites.
● Multi-master design where all the DB instances are active, RW, and maintain exactly the
same information (RPO=0, RTO=0).
● Single-master design where all applications access a single node, RW, and the
information is replicated to other DB instances, RO or not accessible at all. When a failure
occurs in the master instance, a secondary instance is promoted to master (RPO=0,
RTO≈0).
● Whether the DBs are deployed in containers or externally, access to exactly the same shared
data must be ensured for the applications deployed in the different sites.
Further info: En.wikipedia.org EDB Crunchy MongoDB CouchBase Percona Microsoft
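One simple way to give the applications in both clusters the same data endpoint is an ExternalName Service deployed identically in each cluster and pointing at the shared or replicated database; the names below are hypothetical.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: orders-db                               # hostname applications resolve in either cluster
  namespace: web-frontend
spec:
  type: ExternalName
  externalName: orders-db.shared.example.com    # hypothetical shared/replicated DB endpoint
```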
GitOps: Synchronized Clusters
Conclusion
15
As a recap, in a multi-DC environment, GitOps + data replication (in the application or DB
layer) may offer Active/Active services with RPO=0 and RTO=0 upon a natural or
human-induced disaster.
The main challenge is identifying the DB technologies (SQL and NoSQL) to use and
standardizing how to deploy each of them.
The main advantages are:
- Deploy in two different sites/DCs (max RTT between sites defined by the DB layer).
- Simplicity, as there are no dependencies on the underlying infrastructure.
- Cost, as this solution can be deployed on bare-metal nodes, reducing licensing costs.
- Cost, as being Active/Active there are no underutilized resources in standby environments.
- Ideal solution if applications are stateless or use external data.
- RPO=0, RTO=0
GitOps: Synchronized Clusters
Conclusion
Multi-DataCenter
16
The main disadvantages are:
- Added complexity in designing and deploying applications.
- Distributed DBs usually work with 3 object replicas and 3 DCs.
- External unification of incoming network traffic (routers, DNS, LBs, ...).
- Submariner is not included in this solution, so network traffic between
clusters/sites/DCs must leave the OpenShift SDN.
- It is not possible to share Persistent Volumes in ODF between OpenShift clusters.
- There is no unified logging, monitoring, user access, security management, ...
External tools must be used.
- Max RTT is defined per application/DB.
2.- Stretched Cluster
(Active/Active)
17
2.- Stretched Cluster (Active/Active)
18
● In the GitOps model we proposed the deployment of several
independent and self-managed clusters that remain in a synchronized
state.
● A Stretched Cluster is a K8s/OpenShift deployment model in which the
nodes of the cluster are distributed among several DataCenters.
● Although it is unlikely, the total loss of the master nodes does not imply
a loss of service; it only limits the management of the cluster.
2.- Stretched Cluster (Active/Active)
19
Pros:
● By definition, a Stretched Cluster provides RTO=0 and RPO=0.
● With 3 symmetrical DCs (same size), assuming 66% load, one DC could go down without
impacting service availability: the load of the failed DC is split between the remaining DCs,
each rising from ~66% to ~100% utilization (66/66/66 => 0/100/100).
● The same balancing happens internally between the worker nodes of each DataCenter.
Network latency requirements:
Latency between DataCenters ≤ 2ms (driven by the OCP masters requirements)
● ≤ 2ms between OCP masters
● ≤ 4ms between Ceph storage nodes (including arbiter) (Internal mode)
● ≤ 200ms between Ceph nodes and Arbiter node (External mode)
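In a stretched cluster, the nodes of each DataCenter are typically labeled with a zone (for example topology.kubernetes.io/zone set to dc1/dc2/dc3), and workloads can then be spread across those zones. The following is an assumed sketch reusing the hypothetical web-frontend application.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-frontend
spec:
  replicas: 6                                    # e.g. two replicas per DataCenter
  selector:
    matchLabels:
      app: web-frontend
  template:
    metadata:
      labels:
        app: web-frontend
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone # one zone label per DataCenter (assumed dc1/dc2/dc3)
        whenUnsatisfiable: ScheduleAnyway        # prefer an even spread, still schedule if a DC is down
        labelSelector:
          matchLabels:
            app: web-frontend
      containers:
      - name: web
        image: registry.example.com/web-frontend:1.0   # hypothetical image
```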
2.1- K8s/OCP + internal storage in 3 DC
(rooms/FD)
20
(Diagram: one K8s/OpenShift cluster stretched across three DataCenters, each hosting a master, worker nodes and Ceph storage.)
● DataCenter 1: Application Nodes, Master (it can be virtualized), Ceph with local physical storage.
● DataCenter 2: with/without Application Nodes, Master (it can be virtualized), Ceph with local physical storage.
● DataCenter 3: Application Nodes, Master (it can be virtualized), Ceph with local physical storage.
PROS: Autonomous and IaaS-agnostic model. Minimum underutilization of computing and memory (66/66/66 > 0/100/100).
CONS: Requires 3 DCs, each of them well connected to the other 2 DCs (latency ≤ 2ms).
- Check etcd behavior under network isolation between DCs.
- Applications may break as a result of etcd not being writable.
2.2- K8s/OCP + internal storage in 2 DC +
Arbiter
21
PROS: Autonomous and IaaS-agnostic model. The DC hosting the arbiter requires very few infrastructure resources.
CONS:
- Minimal resources in a 3rd room or DataCenter, well connected to the other 2 DCs, are required.
- ≤ 4ms between Ceph nodes and the Arbiter node (Internal mode); the K8s/OCP masters require even lower RTT values.
- Underutilization of computing and memory (50/0/50 > 0/0/100).
(Diagram: one K8s/OpenShift cluster stretched across two main DataCenters plus a third arbiter site.)
● Main DataCenters (x2): Application Nodes, Master (it can be virtualized), Ceph with local physical storage, 2 Ceph monitors each.
● Arbiter site: no Application Nodes, Master (it can be virtualized), Ceph Arbiter Node (metadata only) without local physical storage.
2.3. HA based on IaaS - 2 DC’s Virtual
22
(Diagram: a fully virtualized OpenShift cluster, with masters, workers, ODF/Ceph nodes and a virtualized Ceph arbiter node (metadata only), running on IaaS across DataCenter 1 and DataCenter 2.)
● 100% virtualized environments in both DCs: masters, workers and the Ceph/ODF arbiter are all VMs.
● In the event of a DC1 crash, all VMs are started in DC2.
PROS: In the event of a DataCenter crash, all nodes are moved to the other DataCenter.
CONS: Replication bandwidth and time are required to replicate and synchronize the migrated nodes (approx. 1 to 5 min).
- Requires IaaS (RHV / VMware) + synchronous storage-array replication of all VMs from DC1 to DC2.
3.- Conclusions
23
Conclusion, decision criteria
24
As described in the previous slides, there are multiple deployment options
and alternatives on the table. A key point to be considered in the
decision-making is the number of DC's, their capacities and the network
latency between them.
The first option to be considered should be a Stretched Cluster deployed
across 3 DC's with similar computational capabilities (CPU, memory and
storage) and network latency between them ≤ 2ms.
This solution significantly reduces the costs of infrastructure, operation
and software licensing with the highest degree of service availability.
25
In scenarios where a Stretched Cluster deployment across 3 DCs
with similar computational capabilities is not an option, an alternative
may be a Stretched Cluster in 2 DCs plus an arbiter DC with limited resources.
This solution reduces the infrastructure resources needed for the arbiter DC, which
could be located in a corporate building or technical room separate from
the 2 main DCs, as long as the network latency requirements are met (≤
2ms).
Conclusion, decision criteria
26
When a Stretched Cluster deployment in 3 DCs is not possible but we can
have latency ≤ 2ms, we can consider a Stretched Cluster in 2 DCs with DR
based on IaaS.
This model introduces a caveat, as it keeps 2 of the 3 nodes of the OCP
and OCS control plane in one of the DCs. If this "primary" DC goes down, 1
node of each type must be transferred to the secondary DC for the cluster
to regain quorum.
It requires designing and implementing the VM migration process.
Conclusion, decision criteria
27
When a Stretched Cluster deployment on 3 DC’s is not an option or the
network latency will always be higher than 2ms, we can consider a
GitOps-based deployment on 2 DC’s.
This model requires that the instances of the same application, running in
any of the clusters, share the same data. Either because the architecture of
the applications allows it (for example, event-oriented), or because the
database is multimaster, or because there is a master-slave model that
allows promoting the slave without loss of service.
Conclusion, decision criteria
28
Finally, when a Stretched Cluster deployment in 3 DCs can't be
accomplished, the latency is not ≤ 2ms and the data access model of the
applications cannot be standardized, we can only consider a single DC with
DR.
This approach is possibly the least flexible and the one with the highest cost
per computing unit. It requires designing and implementing the DR policies
and processes tied to it, as well as the associated operations and
maintenance.
Conclusion, decision criteria
Conclusions, requirements per model
29
● Three identical DCs with available scalability (CPU, memory and storage): 3 DC Stretched Cluster.
● Two identical DCs with available scalability (CPU, memory and storage) plus a technical room to deploy two bare-metal servers: 2 DC Stretched Cluster + Arbiter site.
● Two identical DCs with available scalability (CPU, memory and storage), no technical room available: 2 DC managed by GitOps; 2 DC Stretched Cluster + IaaS.
● RTT latency between DCs ≤ 2ms: 3 DC Stretched Cluster; 2 DC Stretched Cluster + Arbiter site; 2 DC Stretched Cluster + IaaS.
● Applications deployed in two different OCP clusters able to share the same external storage (or replicated by the application/DB): 2 DC managed by GitOps.
● IaaS infrastructure capable of migrating VMs between DCs: 2 DC Stretched Cluster + IaaS.
Conclusions, advantages per model
30
● Active/Active (RTO=0, RPO=0): all four models.
● Self-sufficient, does not require an IaaS environment: 3 DC Stretched Cluster; 2 DC Stretched Cluster + Arbiter site; 2 DC managed by GitOps.
● Simple DR testing (no service availability impact): all four models.
● Efficient use of system resources (CPU and memory) (66/66/66 > 0/100/100): 3 DC Stretched Cluster.
● No additional effort to implement/maintain DR: 3 DC Stretched Cluster; 2 DC Stretched Cluster + Arbiter site.
● No manual intervention required upon a DC disaster: 3 DC Stretched Cluster; 2 DC Stretched Cluster + Arbiter site; partially for 2 DC Stretched Cluster + IaaS.
● Can be deployed on bare metal: 3 DC Stretched Cluster; 2 DC Stretched Cluster + Arbiter site; 2 DC managed by GitOps; partially for 2 DC Stretched Cluster + IaaS.
Thanks for your attention!
31
LinkedIn: https://www.linkedin.com/in/jvherrera/
Twitter: https://twitter.com/jvicenteherrera
Email: juanvi@redhat.com