High Availability and Disaster Recovery in PostgreSQL - EQUNIX

Plaza Semanggi 9 Fl, Unit 9
Jl. Jend Sudirman Kav 50, Jakarta - 12930
+6221-22866662 | info@equnix.asia
INDONESIA
High Availability and Disaster Recovery
in RDBMS/PostgreSQL
By: Julyanto SUTANDANG

Copyright Notice
This document is owned by Equnix Business Solutions PTE LTD. This document
contains confidential information, which is protected by law. No part of this
publication may be photocopied, reproduced or translated into another language
without the express written consent of Equnix Business Solutions PTE LTD.
Data and information regarding the proposal and its offer are for limited use and
are not to be disclosed. The information contained in this document is subject to
change at any time without prior notice.
All rights reserved, © Copyright 2019 - Equnix Business Solutions, PTE Ltd

Service Availability
No  Uptime Level  Max Tolerable Downtime  Supporting Technology
    Guarantee     (hours per year)
1   95.00%        438                     Cold Standby
2   98.00%        175                     Warm Standby
3   99.00%        87.6                    Hot Standby
4   99.90%        8.76                    High Availability
5   99.999%       ~0                      Fault Tolerant
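The downtime column follows directly from the uptime percentage. A minimal sketch of the arithmetic, assuming a 365-day year (8760 hours); the function name max_downtime_hours is made up for this illustration:

```shell
#!/bin/bash
# Maximum tolerable downtime per year, in hours, for a given uptime percentage.
# Assumption: a year has 365 days = 8760 hours.
max_downtime_hours() {
    LC_ALL=C awk -v uptime="$1" 'BEGIN { printf "%.2f\n", (100 - uptime) / 100 * 8760 }'
}

# Print the downtime budget for each level in the table.
for level in 95 98 99 99.9 99.999; do
    printf '%s%% -> %s hours\n' "$level" "$(max_downtime_hours "$level")"
done
```

For 99.9% this prints 8.76 hours. For 99.999% the result is under a tenth of an hour (about five minutes a year), which the table rounds to ~0.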

What is...
What is High Availability (HA)?
➢ A constellation of systems and effort for achieving Service Availability of up
to 99.9% (max: 8.76 hours of accumulated downtime in a year)
➢ HA is NOT LB (Load Balancing)
➢ HA is NOT DR (Disaster Recovery)
➢ It involves OS-level configuration and setup.
➢ HA mitigates: hardware failure, facility outage
➢ HA does NOT mitigate: human error
➢ There are many technologies that enable HA; some of them overlap, and some
are not actually intended for HA.

High Availability Constellation

HA - PostgreSQL
❖ HA is not built into the PostgreSQL core
❖ PostgreSQL has native replication (binary and logical) to support HA
❖ PostgreSQL has a PROMOTION mechanism that turns a STANDBY into a MASTER.
❖ Split brain is the most important problem to avoid in database HA.
❖ Uses OS-level heartbeat:
➢ Pacemaker and Corosync
➢ Patroni
❖ Uses a virtual / floating IP address for failover.
❖ Replication: binary OR logical
❖ Replication: synchronous OR asynchronous
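The promotion mechanism can be driven from the shell. A minimal sketch, assuming psql and pg_ctl are on the PATH and connection settings come from the usual PG* environment variables. The function names are hypothetical; pg_is_in_recovery() and pg_ctl promote are standard PostgreSQL.

```shell
#!/bin/bash
# Ask the local server whether it is in recovery.
# Prints "t" on a standby and "f" on a master; prints nothing if the query fails.
query_recovery_state() {
    psql -At -c "SELECT pg_is_in_recovery()" 2>/dev/null
}

# Translate the query result into a role name.
node_role() {
    case "$(query_recovery_state)" in
        t) echo standby ;;
        f) echo master ;;
        *) echo unknown ;;
    esac
}

# Promote only when the node really is a standby.
# $1: data directory (assumption: passed in by the caller)
promote_if_standby() {
    local pgdata="$1"
    if [ "$(node_role)" = standby ]; then
        pg_ctl promote -D "$pgdata"
    else
        echo "not a standby, nothing to promote" >&2
        return 1
    fi
}
```

Checking the role first keeps a cluster manager from promoting a node that is already master. It does not by itself prevent split brain, which still needs fencing of the old master.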

HA Constellation
❖ Master - Active
❖ Standby - Passive (Why Passive?)
❖ Failover with downtime of less than 10 seconds
❖ Master and Standby have a private direct connection
❖ Each host has public network access
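A failover budget of under 10 seconds means the master's failure must be detected within a few seconds. A minimal sketch of a heartbeat probe, assuming pg_isready is installed. MASTER_HOST, MASTER_PORT and both function names are hypothetical, and a real cluster would rely on Pacemaker/Corosync or Patroni rather than a hand-written loop.

```shell
#!/bin/bash
# Probe the master once; returns 0 when it accepts connections.
# Assumption: MASTER_HOST (and optionally MASTER_PORT) are set by the caller.
probe_master() {
    pg_isready -h "$MASTER_HOST" -p "${MASTER_PORT:-5432}" -t 1 >/dev/null 2>&1
}

# Block until the master has failed max_failures probes in a row.
# $1: seconds between probes, $2: consecutive failures required
wait_for_master_failure() {
    local interval="$1" max_failures="$2" failures=0
    while :; do
        if probe_master; then
            failures=0                      # any success resets the count
        else
            failures=$((failures + 1))
            [ "$failures" -ge "$max_failures" ] && return 0
        fi
        sleep "$interval"
    done
}
```

With a 2-second interval, a 1-second probe timeout and three required failures, detection takes at most roughly 9 seconds. Requiring consecutive failures keeps a single lost probe from triggering a failover.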

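HA - bridging OS into PostgreSQL
Create PostgreSQL Trigger Script (activate_standby.sh)
# vi /etc/init.d/activate_standby.sh
#!/bin/bash
# Called by the cluster manager with "start" on failover and "stop" on release.
case "$1" in
start)
    # (disabled) Create the trigger file that promotes this standby to master.
    #touch /equnix/data/trigger_file
    # (disabled) Comment out synchronous_standby_names so commits do not wait for a standby.
    #sed -i "s/^synchronous_standby_names/#synchronous_standby_names/" /equnix/data/postgresql.conf
    /etc/init.d/postgres.service reload
    exit 0
    ;;
stop)
    # (disabled) Restore synchronous_standby_names.
    #sed -i "s/^#synchronous_standby_names/synchronous_standby_names/" /equnix/data/postgresql.conf
    /etc/init.d/postgres.service reload
    ;;
*)
    exit 0
    ;;
esac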
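The synchronous_standby_names toggle in the script can be tried on a scratch copy of postgresql.conf before touching the live file. A minimal sketch, assuming GNU sed (for -i); the function names are hypothetical:

```shell
#!/bin/bash
# Comment out synchronous_standby_names in the given config file.
# Anchored at line start, so a second run leaves an already commented line alone.
disable_sync_standby() {
    sed -i 's/^[[:space:]]*synchronous_standby_names/#&/' "$1"
}

# Remove the leading "#" again to restore the setting.
enable_sync_standby() {
    sed -i 's/^[[:space:]]*#[[:space:]]*\(synchronous_standby_names\)/\1/' "$1"
}
```

Both functions are idempotent, so a cluster manager can call them repeatedly without stacking "#" characters. The server still needs a reload afterwards for the change to take effect.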

HA - Master Recovery and Follow
What happens when the Master recovers?
❖ The Master does not fail back (auto_failback = off)
❖ The Master becomes the new slave (the slave has already become master)
❖ The Master follows the new master (through the proper steps)
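One possible shape of those steps, as a dry-run sketch that only prints the commands. It assumes PostgreSQL 12 or later (standby.signal; older releases use recovery.conf instead) and that pg_rewind's prerequisites are met (wal_log_hints or data checksums enabled). The function name, the users and the connection strings are hypothetical.

```shell
#!/bin/bash
# Print (do not run) the commands that re-attach a recovered master as a standby.
# $1: data directory of the old master, $2: host of the new master
rejoin_plan() {
    local pgdata="$1" new_master="$2"
    cat <<EOF
/etc/init.d/postgres.service stop
pg_rewind --target-pgdata=$pgdata --source-server='host=$new_master user=postgres dbname=postgres'
touch $pgdata/standby.signal
echo "primary_conninfo = 'host=$new_master user=replicator'" >> $pgdata/postgresql.conf
/etc/init.d/postgres.service start
EOF
}
```

For example, rejoin_plan /equnix/data 10.0.0.2 lists the five steps for review. pg_rewind resynchronizes the old master's data directory with the new master's timeline, so a full base backup is usually not needed.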

HA Constellation
Post Failover
❖ The old Master becomes a Standby and follows the new master (the former Standby)

Load Balancing - just for comparison
PostgreSQL supports load balancing, but it works only under certain conditions.
OLTP requires Scale Up, NOT Scale Out.

HA - Cycle Mode
Cycle Mode (3 or More Replicas)
Same Sites

HA - Cycle Mode
Master down; Replica 1 takes over and becomes Master
Same Sites

HA - Cases
Tricky cases in HA Implementation:
1. Master node down: the Standby should notice and become Master in less than
10 seconds. This is the normal operation of HA.
2. Standby node down: the Master should notice, ensure the Standby is shut
down, and send an alert.
3. Master's public network down: the Master can be configured to fail over to
the Standby and shut itself down gracefully.
4. Private network down: the Standby either ensures through another connection
that the Master is down and then fails over, OR it just shuts replication off.
Send an alert.
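Cases 1 and 4 can be read as a decision the Standby makes from what it can observe. A minimal sketch of one possible policy; the function name, arguments and action names are hypothetical, and a real cluster manager would also fence the old Master before promoting:

```shell
#!/bin/bash
# Decide what the Standby should do.
#   $1: Master reachable over the private (replication) link: yes|no
#   $2: Master reachable over another path, e.g. the public network: yes|no
standby_action() {
    local private="$1" other="$2"
    if [ "$private" = yes ]; then
        echo "none"                        # healthy: keep replicating
    elif [ "$other" = yes ]; then
        # Case 4: only the private link failed and the Master is still alive.
        # Promoting now would create a split brain.
        echo "stop-replication-and-alert"
    else
        # Case 1: the Master is unreachable on every path.
        echo "promote"
    fi
}
```

Checking a second path before promoting is what separates a dead Master from a dead cable.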

What is...
What is Disaster Recovery (DR)?
➢ A configuration of sites and effort to mitigate NATURAL DISASTERS / force
majeure.
➢ Requires 2 different sites, at least 70 or 80 km apart (it depends)
➢ DR swingover is definitely manual and involves higher-level decision or
policy makers.
➢ There is an escalation protocol that should be followed; therefore, it is
not a technical decision.

What is...
What is Disaster Recovery (DR)?
➢ Swingover is done on a per-site basis, NOT on a per-host or per-service basis.
➢ DR mitigates: natural disasters (flood, fire, earthquake, meteor (?), etc.)
and other force majeure: riots, facility shutdown, war, etc.
➢ DR does NOT mitigate: hardware failure, human mistakes, facility outage, ...

Disaster Recovery
Disaster Recovery Configuration

Disaster Recovery
The DC site suffers a disaster; manual swingover by management decision.

Who we are
As an IT Solutions Provider, we are committed to delivering services:
Maintenance Services
1. PostgreSQL
2. Linux OS
3. Open Source
Managed Services
1. PostgreSQL
2. Linux OS
3. Open Source
System Optimization Expertise (assessment, consultation, advice, solutions)
1. Any kind of IT system which requires performance fixes and solutions
2. Network
3. Security
4. etc.
High Performance Software Development
1. World-class quality
2. High throughput and high transactions
Hands-On Training
1. Linux Administration (Basic and Advanced)
2. PostgreSQL (Basic and Advanced)
3. High Performance Transaction System
