High Availability and Disaster Recovery in PostgreSQL - EQUNIX
1. Plaza Semanggi 9 Fl, Unit 9
Jl. Jend Sudirman Kav 50, Jakarta - 12930
+6221-22866662 | info@equnix.asia
INDONESIA
High Availability and Disaster Recovery
in RDBMS/PostgreSQL
By: Julyanto SUTANDANG
3. Service Availability
No  Uptime Level   Max Tolerable Downtime   Supporting
    Guarantee      (hours per year)         Technology
1   95.00%         438                      Cold Standby
2   98.00%         175                      Warm Standby
3   99.00%         87.6                     Hot Standby
4   99.90%         8.76                     High Availability
5   99.999%        ~0                       Fault Tolerant
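The downtime column follows from the uptime guarantee: a year has 8760 hours, so the tolerable downtime is (100 - uptime) / 100 x 8760. A minimal sketch of that arithmetic follows; the function name max_downtime_hours is made up for illustration and is not part of the slides.

```shell
#!/bin/bash
# Hypothetical helper: hours of tolerable downtime per year for an uptime percentage.
max_downtime_hours() {
    # 8760 = 24 hours x 365 days; LC_ALL=C keeps the decimal separator a dot.
    LC_ALL=C awk -v uptime="$1" 'BEGIN { printf "%.2f\n", (100 - uptime) / 100 * 8760 }'
}
```

For example, `max_downtime_hours 99.9` prints 8.76, which matches level 4 in the table.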
4. What is...
What is High Availability (HA)?
➢ A constellation of systems and effort for achieving Service Availability of up to
99.9% (max: 8.76 hours of accumulated downtime in a year)
➢ HA is NOT LB (Load Balancing)
➢ HA is NOT DR (Disaster Recovery)
➢ It involves OS-level configuration and setup.
➢ HA mitigates: hardware failure, facility outage
➢ HA does NOT mitigate: human error
➢ There are many technologies that enable HA; some of them overlap OR
were not actually intended for HA.
6. HA - PostgreSQL
❖ HA is not built into the PostgreSQL core
❖ PostgreSQL has native replication (binary and logical) to support HA
❖ PostgreSQL has a PROMOTION mechanism that turns a STANDBY into a MASTER.
❖ Split brain is the most important problem to avoid in database HA.
❖ Uses an OS-level heartbeat:
➢ Pacemaker and Corosync
➢ Patroni
❖ Uses a virtual / floating IP address for failover.
❖ Replication: Binary OR Logical
❖ Replication: Synchronous OR Asynchronous
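To make the STANDBY/MASTER distinction concrete: PostgreSQL reports whether a node is still replaying WAL through the built-in function pg_is_in_recovery(), and pg_ctl promote performs the promotion. The sketch below is an illustration only. The function names, and the assumption that psql can connect to the local server without a password, are not from the slides.

```shell
#!/bin/bash
# Map the output of "SELECT pg_is_in_recovery()" (psql -At prints t or f) to a role.
role_from_recovery_flag() {
    case "$1" in
        t) echo standby ;;   # still in recovery: replaying WAL from a master
        f) echo master ;;    # not in recovery: accepts writes
        *) echo unknown ;;   # no usable answer, e.g. server down or unreachable
    esac
}

# Hypothetical wrapper: ask the local server for its current role.
node_role() {
    local flag
    flag=$(psql -At -c 'SELECT pg_is_in_recovery()' 2>/dev/null)
    role_from_recovery_flag "$flag"
}

# Hypothetical wrapper: promote only if this node is currently a standby.
# $1 is the data directory, e.g. /equnix/data.
promote_if_standby() {
    [ "$(node_role)" = standby ] && pg_ctl promote -D "$1"
}
```

Checking the role before promoting is a small guard against promoting a node that is already a master; it does not replace the fencing that a cluster manager provides against split brain.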
7. HA Constellation
❖ Master - Active
❖ Standby - Passive (Why
Passive?)
❖ Failover with downtime
less than 10 seconds
❖ Master and Standby have a
private direct connection
❖ Each host has public
network access
8. HA - bridging OS into PostgreSQL
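Create the PostgreSQL trigger script (activate_standby.sh)
# vi /etc/init.d/activate_standby.sh
#!/bin/bash
# Intended to be called with "start" when this node takes over and "stop" when it releases the role.
case "$1" in
start)
    # Promote the standby by creating its trigger file (uncomment to enable):
    #touch /equnix/data/trigger_file
    # Stop waiting for a synchronous standby (uncomment to enable):
    #sed -i "s/^synchronous_standby_names/#synchronous_standby_names/" /equnix/data/postgresql.conf
    /etc/init.d/postgres.service reload
    exit 0
    ;;
stop)
    # Re-enable synchronous replication by uncommenting the setting (uncomment to enable):
    #sed -i "s/^#synchronous_standby_names/synchronous_standby_names/" /equnix/data/postgresql.conf
    /etc/init.d/postgres.service reload
    ;;
*)
    exit 0
    ;;
esac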
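The synchronous_standby_names toggle can be tried safely on a copy of postgresql.conf before it goes into the trigger script. The sketch below is one way to do it, not the author's method: the function name set_sync_standby is hypothetical, and anchoring the pattern at the start of the line is an added precaution so that repeated runs do not stack # markers.

```shell
#!/bin/bash
# Hypothetical helper. Usage: set_sync_standby on|off /path/to/postgresql.conf
set_sync_standby() {
    local mode=$1 conf=$2 tmp
    tmp=$(mktemp) || return 1
    case "$mode" in
        off) # comment the setting out so commits stop waiting for a standby
             sed 's/^synchronous_standby_names/#synchronous_standby_names/' "$conf" > "$tmp" ;;
        on)  # uncomment the setting to restore synchronous replication
             sed 's/^#synchronous_standby_names/synchronous_standby_names/' "$conf" > "$tmp" ;;
        *)   rm -f "$tmp"; return 2 ;;
    esac
    # Write back through cat so the original file keeps its owner and permissions.
    cat "$tmp" > "$conf" && rm -f "$tmp"
}
```

PostgreSQL still has to reload its configuration afterwards for the change to take effect, as the script does with its reload call.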
9. HA - Master Recovery and Follow
What happens when the old Master recovers?
❖ The old Master does not fail back (auto_failback = off)
❖ The old Master becomes the new slave (the slave has already become Master)
❖ The old Master follows the new Master (after the proper steps)
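One possible reading of the "proper steps", assuming PostgreSQL 13 or later and that the prerequisites of pg_rewind are met (wal_log_hints = on or data checksums): stop the old master, rewind its data directory against the new master, and start it again as a standby. The sketch only prints the commands so they can be reviewed before anything runs. The function name, the host argument, and the postgres user are placeholders, not taken from the slides.

```shell
#!/bin/bash
# Hypothetical helper: print the commands that would turn the old master into a standby.
# $1 = data directory of the old master, $2 = host name or address of the new master.
rejoin_commands() {
    local pgdata=$1 new_master=$2
    printf '%s\n' \
        "pg_ctl -D $pgdata stop -m fast" \
        "pg_rewind --target-pgdata=$pgdata --source-server='host=$new_master user=postgres dbname=postgres' --write-recovery-conf" \
        "pg_ctl -D $pgdata start"
}
```

With --write-recovery-conf, pg_rewind creates standby.signal and records the connection settings, so the node comes back following the new master instead of competing with it.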
11. Load Balancing - just for comparison
PostgreSQL supports
load balancing, but it works
only under certain
conditions.
OLTP requires Scale Up,
NOT Scale Out.
12. HA - Cycle Mode
Cycle Mode (3 or More Replicas)
Same Sites
13. HA - Cycle Mode
Master down; Replica 1 takes over and becomes Master
Same Sites
14. HA - Cases
Tricky cases in HA Implementation:
1. Master node down: the Standby should
notice and become Master in less
than 10 seconds. This is the normal operation
of HA.
2. Standby node down: the Master should
notice, ensure the Standby
is shut down, and send an alert.
3. Master's public network down: the Master
can be configured to fail over to the Standby
and shut itself down gracefully.
4. Private network down: the Standby
confirms through another connection that
the Master is down and then fails over, OR the Standby just
shuts replication off. Send an alert.
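Cases 1 and 4 look the same from the Standby's side at first: the heartbeat over the private link stops. The second check over another connection is what tells them apart. A minimal sketch of that decision follows, with a hypothetical function name and hypothetical up/down inputs; how the two links are probed is left out.

```shell
#!/bin/bash
# Hypothetical helper: decide what the standby should do.
# $1 = state of the private link (up|down)
# $2 = whether the master answers over another connection (up|down)
standby_action() {
    local private_link=$1 master_other_path=$2
    if [ "$private_link" = up ]; then
        echo none                          # heartbeat is fine, nothing to do
    elif [ "$master_other_path" = up ]; then
        # Master is alive and only the private link failed: promoting now
        # would create two masters (split brain), so only stop replicating.
        echo stop-replication-and-alert
    else
        echo failover-and-alert            # master confirmed down on both paths
    fi
}
```

For instance, `standby_action down up` prints stop-replication-and-alert, while `standby_action down down` prints failover-and-alert.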
18. What is...
What is Disaster Recovery (DR)?
➢ A configuration of sites and effort to
mitigate NATURAL DISASTER / force
majeure.
➢ Requires 2 different sites, at least 70
km or 80 km apart (it depends)
➢ DR swingover is definitely manual and
involves higher-level decision or policy
makers.
➢ There is an escalation protocol that should be
followed; therefore, it is not a technical decision.
19. What is...
What is Disaster Recovery (DR)?
➢ Swingover is done on a per-site basis, NOT
per host or per service.
➢ DR mitigates: natural disasters (flood,
fire, earthquake, meteor (?), etc.) and
other force majeure: riots, facility
shutdown, war, etc.
➢ DR does NOT mitigate: hardware failure,
human mistakes, facility outage, ...
22. Who we are
As an IT Solutions Provider, we are committed to delivering services
Maintenance
Services
1. PostgreSQL
2. Linux OS
3. Open Source
Managed
Services
1. PostgreSQL
2. Linux OS
3. Open Source
System Optimization
Expertise (assessment,
consultation, advice,
solutions)
1. Any kind of IT
system which
requires performance
fixes and solutions
2. Network
3. Security
4. etc
High Performance
Software
Development
1. World class quality
2. High throughput
and high transactions
Hands-On Training
1. Linux Administration
(Basic and Advanced)
2. PostgreSQL
(Basic and Advanced)
3. High Performance
Transaction System