SlideShare a Scribd company logo
Stop Worrying and Keep Querying Using
Automated Multi-Region Disaster Recovery
Sergey Pronin sergey.pronin@percona.com
Shivani Gupta shivani@elotl.co
Jan Baraniewski jan@elotl.co
Agenda
1. Problem space
2. Solution
Problem space
Agenda
1. Problem space
a. Why Disaster Recovery (DR)
b. PostgreSQL on Kubernetes
c. DR setup in Percona Operator for PostgreSQL
2. Solution
a. Multi-cluster control planes
b. DR orchestration architecture
c. Demo w/ Elotl Nova
Why Disaster Recovery
“Disaster Recovery is an organization's plan to protect its IT systems and
data from disasters and recover quickly to minimize downtime and losses.”
1. Business continuity
a. SLA requirements
2. Compliance and standards
Disaster recovery
PostgreSQL on Kubernetes with Operators
● Operator controls database and
k8s primitives
● Day-1 simplified to one step
● Day-2 operations automated
DR through Backups
DR through Replication
Automated failover - problem space
Why Automation of Disaster Recovery?
Myth: ‘...but DR is rarely ever needed’
• Cloud Regions do fail often enough and for long enough to disrupt
business
• On-prem data centers do fail
When it happens: Need close to zero RTO for mission critical applications
• With manual steps, runbooks often cannot be found or are not
up-to-date
• Manual process comes with risk of human error
Should be regularly tested:
• Important to regularly fire-drill Disaster Recovery as part of regular QA
process (say once a month)
Solution
Agenda
1. Problem space
a. Why Disaster Recovery (DR)
b. PostgreSQL on Kubernetes
c. DR setup in Percona Operator for PostgreSQL
2. Solution
a. Multi-cluster control planes
b. DR orchestration architecture
c. Demo w/ Elotl Nova and Percona PostgreSQL
Multi-Cluster Control Plane aka Multi-cluster Orchestrator
• Deploy workloads to one or
more clusters from a central
scheduler
• Aggregate view of workload
topologies
• Orchestrate actions across
workloads
Multi-cluster Control Plane
Workload Clusters
Karmada, Admiralty, Elotl Nova follow similar architecture.
Policy Based Scheduling
Multi-cluster Control Plane
Workload Clusters
Decouple placement from workload definition
App
Manifest
Schedule
Policy
Schedule Policy Custom Resource
• Namespace selector
• Resource selector
• Cluster selector
spec:
namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: microsvc-demo
clusterSelector:
matchLabels:
nova.elotl.co/cluster.region: "us-east-1"
resourceSelectors:
labelSelectors:
- matchLabels:
microServicesDemo: "yes"
Spread Specification
“Cloning” a workload (e.g. ReplicaSet) from Control Plane cluster to the selected workload clusters
• Mode: Divide - each clusters runs a % of the replicas specified in the Control Plane workload
• Mode: Duplicate - each cluster runs the same number of replicas as specified in the Control Plane workload
apiVersion: policy.elotl.co/v1alpha1
kind: SchedulePolicy
metadata:
name: postgres-spread
spec:
spreadConstraints:
spreadMode: Duplicate
topologyKey: kubernetes.io/metadata.name
namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: psql-operator
clusterSelector:
matchExpressions:
-key: kubernetes.io/metadata.name
operator: In
values:
-cluster-1
-cluster-2
Components of Disaster Recovery
• Setup database on multiple K8s clusters (different cloud regions or different
clouds or different data centers)
• Challenge: getting the setup right is error-prone. E.g. same configuration, same secrets
for backup repository (S3) or TLS secrets;
• Solution: Central scheduler w/ spread scheduling
• Data Replication
• Taken care of by PostgreSQL native methods
• Failure Detection
• Needs to be flexible depending on business requirements
• Failover
• Needs to be flexible based on business requirements. E.g. a simplistic scenario for
PostgreSQL is re-configure standby database and redirect application traffic.
• Failback (optional)
DR Orchestration Architecture
Scheduler
Failure
Webhook
Failover
Controller
Nova Control Plane
Workload Cluster
Nova Agent
Monitoring
Tool
Workload Cluster
Nova Agent
Configurations:
● Register Nova
Webhook as an alert
receiver in your
monitoring tool.
● Supply a mapping of
alert labels to docker
image w/ failover
logic.
Demo Layout: PostgreSQL automated failover to Standby
Nova Control Plane
S3
Bucket
Workload Cluster 1
Primary
Workload Cluster 2
StandBy
Workload Cluster 3
HAProxy
PSQL Client
AWS Region 1 AWS Region 2
AWS Region 3
Job for failover:
● Changes manifest of cluster-2
postgres to ‘primary’
● Re-configures HAProxy to
point to postgres on cluster-2
DB Monitoring
Nova agent to CP
Demo
Takeaways
• To survive widespread outages, your database requires deployment to
multiple clusters in different regions.
• Use of K8s, along with operators, makes DR setup easier and opens up
opportunities for automation, in turn enabling better RTO.
• Automation of recovery can be done in a simple, low-friction way using a
multi-cluster control plane such as Nova.
Future Work
• CRD based definition for failure detection and failover
• Eliminate out-of-band configuration and specify everything by deploying a
manifest
• High Availability of the Nova Control Plane
• Provide option to install Nova in active-active HA mode
Resources
• Learn more about Percona operators: https://per.co.na/operators
• Learn more about Elotl Nova: https://www.elotl.co/nova.html
• Free trial of Elotl Nova: https://www.elotl.co/free-trial.html
• Nova HADR beta coming soon!
Thank you!
Please feel free to
provide feedback using
this QR code.

More Related Content

Similar to Stop Worrying and Keep Querying, Using Automated Multi-Region Disaster Recovery

AskTom: How to Make and Test Your Application "Oracle RAC Ready"?
AskTom: How to Make and Test Your Application "Oracle RAC Ready"?AskTom: How to Make and Test Your Application "Oracle RAC Ready"?
AskTom: How to Make and Test Your Application "Oracle RAC Ready"?
Markus Michalewicz
 
Jacopo Nardiello - Monitoring Cloud-Native applications with Prometheus - Cod...
Jacopo Nardiello - Monitoring Cloud-Native applications with Prometheus - Cod...Jacopo Nardiello - Monitoring Cloud-Native applications with Prometheus - Cod...
Jacopo Nardiello - Monitoring Cloud-Native applications with Prometheus - Cod...
Codemotion
 
Rook - cloud-native storage
Rook - cloud-native storageRook - cloud-native storage
Rook - cloud-native storage
Karol Chrapek
 
Kubernetes intro
Kubernetes introKubernetes intro
Kubernetes intro
Pravin Magdum
 
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kevin Lynch
 
An introduction to_rac_system_test_planning_methods
An introduction to_rac_system_test_planning_methodsAn introduction to_rac_system_test_planning_methods
An introduction to_rac_system_test_planning_methods
Ajith Narayanan
 
Dragonflow Austin Summit Talk
Dragonflow Austin Summit Talk Dragonflow Austin Summit Talk
Dragonflow Austin Summit Talk
Eran Gampel
 
Container Orchestration from Theory to Practice
Container Orchestration from Theory to PracticeContainer Orchestration from Theory to Practice
Container Orchestration from Theory to Practice
Docker, Inc.
 
Cloud Composer workshop at Airflow Summit 2023.pdf
Cloud Composer workshop at Airflow Summit 2023.pdfCloud Composer workshop at Airflow Summit 2023.pdf
Cloud Composer workshop at Airflow Summit 2023.pdf
Leah Cole
 
Introduction to kubernetes
Introduction to kubernetesIntroduction to kubernetes
Introduction to kubernetes
Rishabh Indoria
 
Monitoring on Kubernetes using prometheus
Monitoring on Kubernetes using prometheusMonitoring on Kubernetes using prometheus
Monitoring on Kubernetes using prometheus
Chandresh Pancholi
 
Monitoring on Kubernetes using Prometheus - Chandresh
Monitoring on Kubernetes using Prometheus - Chandresh Monitoring on Kubernetes using Prometheus - Chandresh
Monitoring on Kubernetes using Prometheus - Chandresh
CodeOps Technologies LLP
 
Scaling Jakarta EE Applications Vertically and Horizontally with Jelastic PaaS
Scaling Jakarta EE Applications Vertically and Horizontally with Jelastic PaaSScaling Jakarta EE Applications Vertically and Horizontally with Jelastic PaaS
Scaling Jakarta EE Applications Vertically and Horizontally with Jelastic PaaS
Jelastic Multi-Cloud PaaS
 
Distributed tracing in OpenStack
Distributed tracing in OpenStackDistributed tracing in OpenStack
Distributed tracing in OpenStack
Ilya Shakhat
 
Kubernetes #1 intro
Kubernetes #1   introKubernetes #1   intro
Kubernetes #1 intro
Terry Cho
 
Introduction to LAVA Workload Scheduler
Introduction to LAVA Workload SchedulerIntroduction to LAVA Workload Scheduler
Introduction to LAVA Workload Scheduler
Nopparat Nopkuat
 
PaaSTA: Autoscaling at Yelp
PaaSTA: Autoscaling at YelpPaaSTA: Autoscaling at Yelp
PaaSTA: Autoscaling at Yelp
Nathan Handler
 
High Throughput Analytics with Cassandra & Azure
High Throughput Analytics with Cassandra & AzureHigh Throughput Analytics with Cassandra & Azure
High Throughput Analytics with Cassandra & AzureDataStax Academy
 
Performance testing in scope of migration to cloud by Serghei Radov
Performance testing in scope of migration to cloud by Serghei RadovPerformance testing in scope of migration to cloud by Serghei Radov
Performance testing in scope of migration to cloud by Serghei Radov
Valeriia Maliarenko
 
VMworld 2013: Automated Management of Tier-1 Applications on VMware
VMworld 2013: Automated Management of Tier-1 Applications on VMware VMworld 2013: Automated Management of Tier-1 Applications on VMware
VMworld 2013: Automated Management of Tier-1 Applications on VMware
VMworld
 

Similar to Stop Worrying and Keep Querying, Using Automated Multi-Region Disaster Recovery (20)

AskTom: How to Make and Test Your Application "Oracle RAC Ready"?
AskTom: How to Make and Test Your Application "Oracle RAC Ready"?AskTom: How to Make and Test Your Application "Oracle RAC Ready"?
AskTom: How to Make and Test Your Application "Oracle RAC Ready"?
 
Jacopo Nardiello - Monitoring Cloud-Native applications with Prometheus - Cod...
Jacopo Nardiello - Monitoring Cloud-Native applications with Prometheus - Cod...Jacopo Nardiello - Monitoring Cloud-Native applications with Prometheus - Cod...
Jacopo Nardiello - Monitoring Cloud-Native applications with Prometheus - Cod...
 
Rook - cloud-native storage
Rook - cloud-native storageRook - cloud-native storage
Rook - cloud-native storage
 
Kubernetes intro
Kubernetes introKubernetes intro
Kubernetes intro
 
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
 
An introduction to_rac_system_test_planning_methods
An introduction to_rac_system_test_planning_methodsAn introduction to_rac_system_test_planning_methods
An introduction to_rac_system_test_planning_methods
 
Dragonflow Austin Summit Talk
Dragonflow Austin Summit Talk Dragonflow Austin Summit Talk
Dragonflow Austin Summit Talk
 
Container Orchestration from Theory to Practice
Container Orchestration from Theory to PracticeContainer Orchestration from Theory to Practice
Container Orchestration from Theory to Practice
 
Cloud Composer workshop at Airflow Summit 2023.pdf
Cloud Composer workshop at Airflow Summit 2023.pdfCloud Composer workshop at Airflow Summit 2023.pdf
Cloud Composer workshop at Airflow Summit 2023.pdf
 
Introduction to kubernetes
Introduction to kubernetesIntroduction to kubernetes
Introduction to kubernetes
 
Monitoring on Kubernetes using prometheus
Monitoring on Kubernetes using prometheusMonitoring on Kubernetes using prometheus
Monitoring on Kubernetes using prometheus
 
Monitoring on Kubernetes using Prometheus - Chandresh
Monitoring on Kubernetes using Prometheus - Chandresh Monitoring on Kubernetes using Prometheus - Chandresh
Monitoring on Kubernetes using Prometheus - Chandresh
 
Scaling Jakarta EE Applications Vertically and Horizontally with Jelastic PaaS
Scaling Jakarta EE Applications Vertically and Horizontally with Jelastic PaaSScaling Jakarta EE Applications Vertically and Horizontally with Jelastic PaaS
Scaling Jakarta EE Applications Vertically and Horizontally with Jelastic PaaS
 
Distributed tracing in OpenStack
Distributed tracing in OpenStackDistributed tracing in OpenStack
Distributed tracing in OpenStack
 
Kubernetes #1 intro
Kubernetes #1   introKubernetes #1   intro
Kubernetes #1 intro
 
Introduction to LAVA Workload Scheduler
Introduction to LAVA Workload SchedulerIntroduction to LAVA Workload Scheduler
Introduction to LAVA Workload Scheduler
 
PaaSTA: Autoscaling at Yelp
PaaSTA: Autoscaling at YelpPaaSTA: Autoscaling at Yelp
PaaSTA: Autoscaling at Yelp
 
High Throughput Analytics with Cassandra & Azure
High Throughput Analytics with Cassandra & AzureHigh Throughput Analytics with Cassandra & Azure
High Throughput Analytics with Cassandra & Azure
 
Performance testing in scope of migration to cloud by Serghei Radov
Performance testing in scope of migration to cloud by Serghei RadovPerformance testing in scope of migration to cloud by Serghei Radov
Performance testing in scope of migration to cloud by Serghei Radov
 
VMworld 2013: Automated Management of Tier-1 Applications on VMware
VMworld 2013: Automated Management of Tier-1 Applications on VMware VMworld 2013: Automated Management of Tier-1 Applications on VMware
VMworld 2013: Automated Management of Tier-1 Applications on VMware
 

More from DoKC

Distributed Vector Databases - What, Why, and How
Distributed Vector Databases - What, Why, and HowDistributed Vector Databases - What, Why, and How
Distributed Vector Databases - What, Why, and How
DoKC
 
Is It Safe? Security Hardening for Databases Using Kubernetes Operators
Is It Safe? Security Hardening for Databases Using Kubernetes OperatorsIs It Safe? Security Hardening for Databases Using Kubernetes Operators
Is It Safe? Security Hardening for Databases Using Kubernetes Operators
DoKC
 
Transforming Data Processing with Kubernetes: Journey Towards a Self-Serve Da...
Transforming Data Processing with Kubernetes: Journey Towards a Self-Serve Da...Transforming Data Processing with Kubernetes: Journey Towards a Self-Serve Da...
Transforming Data Processing with Kubernetes: Journey Towards a Self-Serve Da...
DoKC
 
The State of Stateful on Kubernetes
The State of Stateful on KubernetesThe State of Stateful on Kubernetes
The State of Stateful on Kubernetes
DoKC
 
Colocating Data Workloads and Web Services on Kubernetes to Improve Resource ...
Colocating Data Workloads and Web Services on Kubernetes to Improve Resource ...Colocating Data Workloads and Web Services on Kubernetes to Improve Resource ...
Colocating Data Workloads and Web Services on Kubernetes to Improve Resource ...
DoKC
 
Make Your Kafka Cluster Production-Ready
Make Your Kafka Cluster Production-ReadyMake Your Kafka Cluster Production-Ready
Make Your Kafka Cluster Production-Ready
DoKC
 
Dynamic Large Scale Spark on Kubernetes: Empowering the Community with Argo W...
Dynamic Large Scale Spark on Kubernetes: Empowering the Community with Argo W...Dynamic Large Scale Spark on Kubernetes: Empowering the Community with Argo W...
Dynamic Large Scale Spark on Kubernetes: Empowering the Community with Argo W...
DoKC
 
Run PostgreSQL in Warp Speed Using NVMe/TCP in the Cloud
Run PostgreSQL in Warp Speed Using NVMe/TCP in the CloudRun PostgreSQL in Warp Speed Using NVMe/TCP in the Cloud
Run PostgreSQL in Warp Speed Using NVMe/TCP in the Cloud
DoKC
 
The Kubernetes Native Database
The Kubernetes Native DatabaseThe Kubernetes Native Database
The Kubernetes Native Database
DoKC
 
ING Data Services hosted on ICHP DoK Amsterdam 2023
ING Data Services hosted on ICHP DoK Amsterdam 2023ING Data Services hosted on ICHP DoK Amsterdam 2023
ING Data Services hosted on ICHP DoK Amsterdam 2023
DoKC
 
Implementing data and databases on K8s within the Dutch government
Implementing data and databases on K8s within the Dutch governmentImplementing data and databases on K8s within the Dutch government
Implementing data and databases on K8s within the Dutch government
DoKC
 
StatefulSets in K8s - DoK Talks #154
StatefulSets in K8s - DoK Talks #154StatefulSets in K8s - DoK Talks #154
StatefulSets in K8s - DoK Talks #154
DoKC
 
Running PostgreSQL in Kubernetes: from day 0 to day 2 with CloudNativePG - Do...
Running PostgreSQL in Kubernetes: from day 0 to day 2 with CloudNativePG - Do...Running PostgreSQL in Kubernetes: from day 0 to day 2 with CloudNativePG - Do...
Running PostgreSQL in Kubernetes: from day 0 to day 2 with CloudNativePG - Do...
DoKC
 
Analytics with Apache Superset and ClickHouse - DoK Talks #151
Analytics with Apache Superset and ClickHouse - DoK Talks #151Analytics with Apache Superset and ClickHouse - DoK Talks #151
Analytics with Apache Superset and ClickHouse - DoK Talks #151
DoKC
 
Overcoming challenges with protecting and migrating data in multi-cloud K8s e...
Overcoming challenges with protecting and migrating data in multi-cloud K8s e...Overcoming challenges with protecting and migrating data in multi-cloud K8s e...
Overcoming challenges with protecting and migrating data in multi-cloud K8s e...
DoKC
 
Evaluating Cloud Native Storage Vendors - DoK Talks #147
Evaluating Cloud Native Storage Vendors - DoK Talks #147Evaluating Cloud Native Storage Vendors - DoK Talks #147
Evaluating Cloud Native Storage Vendors - DoK Talks #147
DoKC
 
Kubernetes Cluster Upgrade Strategies and Data: Best Practices for your State...
Kubernetes Cluster Upgrade Strategies and Data: Best Practices for your State...Kubernetes Cluster Upgrade Strategies and Data: Best Practices for your State...
Kubernetes Cluster Upgrade Strategies and Data: Best Practices for your State...
DoKC
 
We will Dok You! - The journey to adopt stateful workloads on k8s
We will Dok You! - The journey to adopt stateful workloads on k8sWe will Dok You! - The journey to adopt stateful workloads on k8s
We will Dok You! - The journey to adopt stateful workloads on k8s
DoKC
 
Mastering MongoDB on Kubernetes, the power of operators
Mastering MongoDB on Kubernetes, the power of operators Mastering MongoDB on Kubernetes, the power of operators
Mastering MongoDB on Kubernetes, the power of operators
DoKC
 
Leveraging Running Stateful Workloads on Kubernetes for the Benefit of Develo...
Leveraging Running Stateful Workloads on Kubernetes for the Benefit of Develo...Leveraging Running Stateful Workloads on Kubernetes for the Benefit of Develo...
Leveraging Running Stateful Workloads on Kubernetes for the Benefit of Develo...
DoKC
 

More from DoKC (20)

Distributed Vector Databases - What, Why, and How
Distributed Vector Databases - What, Why, and HowDistributed Vector Databases - What, Why, and How
Distributed Vector Databases - What, Why, and How
 
Is It Safe? Security Hardening for Databases Using Kubernetes Operators
Is It Safe? Security Hardening for Databases Using Kubernetes OperatorsIs It Safe? Security Hardening for Databases Using Kubernetes Operators
Is It Safe? Security Hardening for Databases Using Kubernetes Operators
 
Transforming Data Processing with Kubernetes: Journey Towards a Self-Serve Da...
Transforming Data Processing with Kubernetes: Journey Towards a Self-Serve Da...Transforming Data Processing with Kubernetes: Journey Towards a Self-Serve Da...
Transforming Data Processing with Kubernetes: Journey Towards a Self-Serve Da...
 
The State of Stateful on Kubernetes
The State of Stateful on KubernetesThe State of Stateful on Kubernetes
The State of Stateful on Kubernetes
 
Colocating Data Workloads and Web Services on Kubernetes to Improve Resource ...
Colocating Data Workloads and Web Services on Kubernetes to Improve Resource ...Colocating Data Workloads and Web Services on Kubernetes to Improve Resource ...
Colocating Data Workloads and Web Services on Kubernetes to Improve Resource ...
 
Make Your Kafka Cluster Production-Ready
Make Your Kafka Cluster Production-ReadyMake Your Kafka Cluster Production-Ready
Make Your Kafka Cluster Production-Ready
 
Dynamic Large Scale Spark on Kubernetes: Empowering the Community with Argo W...
Dynamic Large Scale Spark on Kubernetes: Empowering the Community with Argo W...Dynamic Large Scale Spark on Kubernetes: Empowering the Community with Argo W...
Dynamic Large Scale Spark on Kubernetes: Empowering the Community with Argo W...
 
Run PostgreSQL in Warp Speed Using NVMe/TCP in the Cloud
Run PostgreSQL in Warp Speed Using NVMe/TCP in the CloudRun PostgreSQL in Warp Speed Using NVMe/TCP in the Cloud
Run PostgreSQL in Warp Speed Using NVMe/TCP in the Cloud
 
The Kubernetes Native Database
The Kubernetes Native DatabaseThe Kubernetes Native Database
The Kubernetes Native Database
 
ING Data Services hosted on ICHP DoK Amsterdam 2023
ING Data Services hosted on ICHP DoK Amsterdam 2023ING Data Services hosted on ICHP DoK Amsterdam 2023
ING Data Services hosted on ICHP DoK Amsterdam 2023
 
Implementing data and databases on K8s within the Dutch government
Implementing data and databases on K8s within the Dutch governmentImplementing data and databases on K8s within the Dutch government
Implementing data and databases on K8s within the Dutch government
 
StatefulSets in K8s - DoK Talks #154
StatefulSets in K8s - DoK Talks #154StatefulSets in K8s - DoK Talks #154
StatefulSets in K8s - DoK Talks #154
 
Running PostgreSQL in Kubernetes: from day 0 to day 2 with CloudNativePG - Do...
Running PostgreSQL in Kubernetes: from day 0 to day 2 with CloudNativePG - Do...Running PostgreSQL in Kubernetes: from day 0 to day 2 with CloudNativePG - Do...
Running PostgreSQL in Kubernetes: from day 0 to day 2 with CloudNativePG - Do...
 
Analytics with Apache Superset and ClickHouse - DoK Talks #151
Analytics with Apache Superset and ClickHouse - DoK Talks #151Analytics with Apache Superset and ClickHouse - DoK Talks #151
Analytics with Apache Superset and ClickHouse - DoK Talks #151
 
Overcoming challenges with protecting and migrating data in multi-cloud K8s e...
Overcoming challenges with protecting and migrating data in multi-cloud K8s e...Overcoming challenges with protecting and migrating data in multi-cloud K8s e...
Overcoming challenges with protecting and migrating data in multi-cloud K8s e...
 
Evaluating Cloud Native Storage Vendors - DoK Talks #147
Evaluating Cloud Native Storage Vendors - DoK Talks #147Evaluating Cloud Native Storage Vendors - DoK Talks #147
Evaluating Cloud Native Storage Vendors - DoK Talks #147
 
Kubernetes Cluster Upgrade Strategies and Data: Best Practices for your State...
Kubernetes Cluster Upgrade Strategies and Data: Best Practices for your State...Kubernetes Cluster Upgrade Strategies and Data: Best Practices for your State...
Kubernetes Cluster Upgrade Strategies and Data: Best Practices for your State...
 
We will Dok You! - The journey to adopt stateful workloads on k8s
We will Dok You! - The journey to adopt stateful workloads on k8sWe will Dok You! - The journey to adopt stateful workloads on k8s
We will Dok You! - The journey to adopt stateful workloads on k8s
 
Mastering MongoDB on Kubernetes, the power of operators
Mastering MongoDB on Kubernetes, the power of operators Mastering MongoDB on Kubernetes, the power of operators
Mastering MongoDB on Kubernetes, the power of operators
 
Leveraging Running Stateful Workloads on Kubernetes for the Benefit of Develo...
Leveraging Running Stateful Workloads on Kubernetes for the Benefit of Develo...Leveraging Running Stateful Workloads on Kubernetes for the Benefit of Develo...
Leveraging Running Stateful Workloads on Kubernetes for the Benefit of Develo...
 

Recently uploaded

Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
Fwdays
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
CatarinaPereira64715
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
Abida Shariff
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 

Recently uploaded (20)

Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 

Stop Worrying and Keep Querying, Using Automated Multi-Region Disaster Recovery

  • 1. Stop Worrying and Keep Querying Using Automated Multi-Region Disaster Recovery Sergey Pronin sergey.pronin@percona.com Shivani Gupta shivani@elotl.co Jan Baraniewski jan@elotl.co
  • 4. Agenda 1. Problem space a. Why Disaster Recovery (DR) b. PostgreSQL on Kubernetes c. DR setup in Percona Operator for PostgreSQL 2. Solution a. Multi-cluster control planes b. DR orchestration architecture c. Demo w/ Elotl Nova
  • 5. Why Disaster Recovery “Disaster Recovery is an organization's plan to protect its IT systems and data from disasters and recover quickly to minimize downtime and losses.” 1. Business continuity a. SLA requirements 2. Compliance and standards
  • 7. PostgreSQL on Kubernetes with Operators ● Operator controls database and k8s primitives ● Day-1 simplified to one step ● Day-2 operations automated
  • 10. Automated failover - problem space
  • 11. Why Automation of Disaster Recovery? Myth: ‘...but DR is rarely ever needed’ • Cloud Regions do fail often enough and for long enough to disrupt business • On-prem data centers do fail When it happens: Need close to zero RTO for mission critical applications • With manual steps, runbooks often cannot be found or are not up-to-date • Manual process comes with risk of human error Should be regularly tested: • Important to regularly fire-drill Disaster Recovery as part of regular QA process (say once a month)
  • 13. Agenda 1. Problem space a. Why Disaster Recovery (DR) b. PostgreSQL on Kubernetes c. DR setup in Percona Operator for PostgreSQL 2. Solution a. Multi-cluster control planes b. DR orchestration architecture c. Demo w/ Elotl Nova and Percona PostgreSQL
  • 14. Multi-Cluster Control Plane aka Multi-cluster Orchestrator • Deploy workloads to one or more clusters from a central scheduler • Aggregate view of workload topologies • Orchestrate actions across workloads Multi-cluster Control Plane Workload Clusters Karmada, Admiralty, Elotl Nova follow similar architecture.
  • 15. Policy Based Scheduling Multi-cluster Control Plane Workload Clusters Decouple placement from workload definition App Manifest Schedule Policy
  • 16. Schedule Policy Custom Resource • Namespace selector • Resource selector • Cluster selector spec: namespaceSelector: matchLabels: kubernetes.io/metadata.name: microsvc-demo clusterSelector: matchLabels: nova.elotl.co/cluster.region: "us-east-1" resourceSelectors: labelSelectors: - matchLabels: microServicesDemo: "yes"
  • 17. Spread Specification “Cloning” a workload (e.g. ReplicaSet) from Control Plane cluster to the selected workload clusters • Mode: Divide - each clusters runs a % of the replicas specified in the Control Plane workload • Mode: Duplicate - each cluster runs the same number of replicas as specified in the Control Plane workload apiVersion: policy.elotl.co/v1alpha1 kind: SchedulePolicy metadata: name: postgres-spread spec: spreadConstraints: spreadMode: Duplicate topologyKey: kubernetes.io/metadata.name namespaceSelector: matchLabels: kubernetes.io/metadata.name: psql-operator clusterSelector: matchExpressions: -key: kubernetes.io/metadata.name operator: In values: -cluster-1 -cluster-2
  • 18. Components of Disaster Recovery • Setup database on multiple K8s clusters (different cloud regions or different clouds or different data centers) • Challenge: getting the setup right is error-prone. E.g. same configuration, same secrets for backup repository (S3) or TLS secrets; • Solution: Central scheduler w/ spread scheduling • Data Replication • Taken care of by PostgreSQL native methods • Failure Detection • Needs to be flexible depending on business requirements • Failover • Needs to be flexible based on business requirements. E.g. a simplistic scenario for PostgreSQL is re-configure standby database and redirect application traffic. • Failback (optional)
  • 19. DR Orchestration Architecture Scheduler Failure Webhook Failover Controller Nova Control Plane Workload Cluster Nova Agent Monitoring Tool Workload Cluster Nova Agent Configurations: ● Register Nova Webhook as an alert receiver in your monitoring tool. ● Supply a mapping of alert labels to docker image w/ failover logic.
  • 20. Demo Layout: PostgreSQL automated failover to Standby Nova Control Plane S3 Bucket Workload Cluster 1 Primary Workload Cluster 2 StandBy Workload Cluster 3 HAProxy PSQL Client AWS Region 1 AWS Region 2 AWS Region 3 Job for failover: ● Changes manifest of cluster-2 postgres to ‘primary’ ● Re-configures HAProxy to point to postgres on cluster-2 DB Monitoring Nova agent to CP
  • 21. Demo
  • 22. Takeaways • To survive widespread outages, your database requires deployment to multiple clusters in different regions. • Use of K8s, along with operators, makes DR setup easier and opens up opportunities for automation, in turn enabling better RTO. • Automation of recovery can be done in a simple, low-friction way using a multi-cluster control plane such as Nova.
  • 23. Future Work • CRD based definition for failure detection and failover • Eliminate out-of-band configuration and specify everything by deploying a manifest • High Availability of the Nova Control Plane • Provide option to install Nova in active-active HA mode
  • 24. Resources • Learn more about Percona operators: https://per.co.na/operators • Learn more about Elotl Nova: https://www.elotl.co/nova.html • Free trial of Elotl Nova: https://www.elotl.co/free-trial.html • Nova HADR beta coming soon!
  • 25. Thank you! Please feel free to provide feedback using this QR code.