Plaza Semanggi 9 Fl, Unit 9
Jl. Jend Sudirman Kav 50, Jakarta - 12930
+6221-22866662 | info@equnix.asia
INDONESIA
High Availability and Disaster Recovery
in RDBMS/PostgreSQL
By: Julyanto SUTANDANG
This document is owned by Equnix Business Solutions PTE LTD. This document
contains confidential information, which is protected by law. No part of this
publication may be photocopied, reproduced or translated into another language
without the permission or express written consent of Equnix Business Solutions PTE
LTD.
Data and information regarding the proposal and its offer are for limited use and
must not be disclosed. The information contained in this document is subject to change at
any time without prior notice.
All rights reserved, © Copyright 2019 - Equnix Business Solutions, PTE Ltd
Copyright Notice
Service Availability
No   Uptime Level   Max Tolerable Downtime     Supporting
     Guarantee      (hours per year)           Technology
1    95.00%         438                        Cold Standby
2    98.00%         175.2                      Warm Standby
3    99.00%         87.6                       Hot Standby
4    99.90%         8.76                       High Availability
5    99.999%        ~0.09 (about 5 minutes)    Fault Tolerant
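The downtime column follows directly from the uptime guarantee over a 365-day year (8,760 hours): max downtime = 8760 × (100 − uptime%) / 100. A minimal shell sketch that reproduces the column, assuming a non-leap year; `downtime_hours` is a hypothetical helper name:

```shell
#!/bin/sh
# downtime_hours PCT: maximum accumulated downtime (hours per 365-day year)
# allowed by an uptime guarantee of PCT percent. Assumes 8760 hours/year.
downtime_hours() {
  awk -v pct="$1" 'BEGIN { printf "%.2f\n", 8760 * (100 - pct) / 100 }'
}

# Print the table's rows.
for pct in 95 98 99 99.9 99.999; do
  printf '%s%% -> %s hours/year\n' "$pct" "$(downtime_hours "$pct")"
done
```

For 99.999% the result is 0.09 hours, roughly 5 minutes per year, which is why that row is treated as practically zero.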
What is...
What is High Availability (HA)?
➢ A constellation of systems and effort aimed at achieving service availability
of up to 99.9% (max 8.76 hours of accumulated downtime per year)
➢ HA is NOT LB (Load Balancing)
➢ HA is NOT DR (Disaster Recovery)
➢ It involves OS-level configuration and setup.
➢ HA mitigates: hardware failure, facility outage
➢ HA does NOT mitigate: human error
➢ There are many technologies that enable HA; some of them overlap, OR were
not actually intended for HA.
High Availability Constellation
HA - PostgreSQL
❖ HA is not built into the PostgreSQL core
❖ PostgreSQL has native replication (binary/physical and logical) to support HA
❖ PostgreSQL has a PROMOTION mechanism from STANDBY to MASTER.
❖ Split brain is the main problem to avoid in database HA.
❖ Uses an OS-level heartbeat / cluster manager:
➢ Pacemaker and Corosync
➢ Patroni
❖ Uses a virtual (floating) IP address for failover.
❖ Replication: binary OR logical
❖ Replication: synchronous OR asynchronous
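The promotion mechanism can be driven with the standard client tools: `pg_is_in_recovery()` returns `t` on a standby and `f` on a master, and `pg_ctl promote` turns a standby into a master. A minimal sketch, assuming psql can reach the local instance with its default connection settings and that PGDATA points at the data directory (defaulting to the path used in these slides); the function names are hypothetical:

```shell
#!/bin/sh
# Assumption: PGDATA is the local data directory (default taken from the slides).
PGDATA=${PGDATA:-/equnix/data}

# role_of VALUE: map the output of pg_is_in_recovery() to a role name.
role_of() {
  case "$1" in
    t) echo standby ;;
    f) echo master ;;
    *) echo unknown ;;   # e.g. the server is down and psql printed nothing
  esac
}

# current_role: ask the local PostgreSQL instance whether it is in recovery.
current_role() {
  role_of "$(psql -XAtc 'SELECT pg_is_in_recovery()' 2>/dev/null)"
}

# promote_if_standby: promote only when this node is currently a standby,
# so calling it on a node that is already master does nothing.
promote_if_standby() {
  if [ "$(current_role)" = standby ]; then
    pg_ctl promote -D "$PGDATA"
  fi
}
```

In a Pacemaker or Patroni cluster this step is normally performed by the cluster manager itself; the sketch only shows the underlying PostgreSQL calls.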
HA Constellation
❖ Master - Active
❖ Standby - Passive (why passive?)
❖ Failover with less than 10 seconds
of downtime
❖ Master and Standby have a
private, direct connection
❖ Each host has its own public
access network
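HA - Bridging the OS into PostgreSQL
Create the PostgreSQL trigger script (activate_standby.sh); the cluster manager calls it with start or stop.
# vi /etc/init.d/activate_standby.sh
#!/bin/bash
# start: this node takes over as master; stop: restore synchronous replication.
PGDATA=/equnix/data
case "$1" in
  start)
    # Promote the standby: PostgreSQL watches for this file
    # (trigger_file in recovery.conf, or promote_trigger_file in PostgreSQL 12-15)
    touch "$PGDATA/trigger_file"
    # Comment out synchronous_standby_names so commits stop waiting for the lost peer
    sed -i 's/^synchronous_standby_names/#synchronous_standby_names/' "$PGDATA/postgresql.conf"
    /etc/init.d/postgres.service reload
    exit 0
    ;;
  stop)
    # Uncomment synchronous_standby_names again (the original sed was a no-op)
    sed -i 's/^#synchronous_standby_names/synchronous_standby_names/' "$PGDATA/postgresql.conf"
    /etc/init.d/postgres.service reload
    exit 0
    ;;
  *)
    exit 0
    ;;
esac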
HA - Master Recovery and Follow
What happens when the old Master recovers?
❖ The Master does not fail back (auto_failback = off)
❖ The Master becomes the new Standby (the Standby has already become Master)
❖ The Master follows the new Master (after the proper resynchronization steps)
HA Constellation
Post Failover
❖ The old Master becomes a Standby and follows the new Master (the former Standby)
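On PostgreSQL 12 or later, one common way to make the recovered master follow the new master is pg_rewind, followed by a standby.signal file and a primary_conninfo setting. A minimal sketch, assuming the old master runs with wal_log_hints=on or data checksums (pg_rewind requires one of them), that a replication role named `replicator` exists (hypothetical), and that `DRY_RUN` is a hypothetical switch that only prints the commands:

```shell
#!/bin/sh
# Assumptions: PGDATA is the old master's data directory; "replicator" is a
# hypothetical replication role; PostgreSQL 12+ (standby.signal).
PGDATA=${PGDATA:-/equnix/data}

# run: execute a command, or only print it when DRY_RUN=1 (hypothetical switch).
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# conninfo HOST: primary_conninfo value pointing at the new master.
conninfo() {
  echo "host=$1 port=5432 user=replicator"
}

# rejoin_as_standby NEW_MASTER: turn the old master into a standby of NEW_MASTER.
rejoin_as_standby() {
  new_master=$1
  # 1. pg_rewind requires the target server to be shut down.
  run pg_ctl stop -D "$PGDATA" -m fast
  # 2. Resynchronize the data directory with the new master, undoing changes
  #    the old master made after the timelines diverged.
  run pg_rewind --target-pgdata="$PGDATA" \
    --source-server="host=$new_master user=postgres dbname=postgres"
  # 3. Mark the node as a standby and point it at the new master.
  run touch "$PGDATA/standby.signal"
  line="primary_conninfo = '$(conninfo "$new_master")'"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "append to postgresql.auto.conf: $line"
  else
    echo "$line" >> "$PGDATA/postgresql.auto.conf"
  fi
  # 4. Start the node; it now streams WAL from the new master.
  run pg_ctl start -D "$PGDATA"
}
```

Running `DRY_RUN=1 rejoin_as_standby <new-master-ip>` first lists the steps without touching the data directory.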
Load Balancing - just for comparison
PostgreSQL supports
load balancing, but it works
only under certain conditions
(for example, read-only
queries on standbys).
OLTP requires scaling up,
NOT scaling out.
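One case where load balancing does work is routing read-only traffic to standbys. libpq can pick the node itself from a multi-host connection string through `target_session_attrs` (`read-write` since PostgreSQL 10, `prefer-standby` since 14). A minimal sketch; the host names, the `dbname=app` database and `conn_for` are hypothetical:

```shell
#!/bin/sh
# conn_for KIND HOSTS: libpq connection string for "write" or "read" traffic.
# HOSTS is a comma-separated list; by default libpq tries them in order.
conn_for() {
  case "$1" in
    write) attrs=read-write ;;      # only accept the node that allows writes (master)
    read)  attrs=prefer-standby ;;  # use a standby when one is up (libpq 14+)
    *) echo "unknown kind: $1" >&2; return 1 ;;
  esac
  echo "host=$2 port=5432 dbname=app target_session_attrs=$attrs"
}

# Example: psql "$(conn_for read node1,node2,node3)" -c 'SELECT 1'
```

Only reads can be spread this way; every write still lands on the single master, which is why OLTP workloads scale up rather than out.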
HA - Cycle Mode
Cycle Mode (3 or more replicas)
All nodes in the same site
HA - Cycle Mode
Master down: Replica 1 takes over and becomes Master
All nodes in the same site
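When the master goes down in Cycle Mode, the replica that has replayed the most WAL has lost the least data and is the safest one to promote. The slides simply name Replica 1; choosing by LSN is one possible policy, not necessarily the one used there. A minimal sketch, assuming every replica is reachable with psql and reports `pg_last_wal_replay_lsn()` (PostgreSQL 10+); the host and function names are hypothetical:

```shell
#!/bin/sh
# lsn_to_num LSN: turn a PostgreSQL LSN "HI/LO" (two hex numbers, upper and
# lower 32 bits) into one integer so LSNs can be compared numerically.
lsn_to_num() {
  hi=${1%/*}
  lo=${1#*/}
  echo $(( (0x$hi << 32) + 0x$lo ))
}

# pick_candidate: read "host lsn" lines on stdin and print the host with the
# highest replayed LSN (the most up-to-date replica).
pick_candidate() {
  best_host=
  best=-1
  while read -r host lsn; do
    [ -n "$lsn" ] || continue   # skip replicas that did not answer
    n=$(lsn_to_num "$lsn")
    if [ "$n" -gt "$best" ]; then
      best=$n
      best_host=$host
    fi
  done
  echo "$best_host"
}

# replay_positions HOST...: query each replica for its replayed LSN.
replay_positions() {
  for h in "$@"; do
    echo "$h $(psql -h "$h" -XAtc 'SELECT pg_last_wal_replay_lsn()' 2>/dev/null)"
  done
}

# Example: replay_positions replica1 replica2 replica3 | pick_candidate
```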
HA - Cases
Tricky cases in HA implementation:
1. Master node down: the Standby should
notice and become Master in less than
10 seconds. This is the normal operation
of HA.
2. Standby node down: the Master should
notice, ensure the Standby is shut down,
and send an alert.
3. Master's public network down: the Master
can be configured to fail over to the Standby
and shut itself down gracefully.
4. Private network down: the Standby verifies
through another connection that the Master is
down and then fails over, OR the Standby simply
switches replication off; send an alert.
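Cases 1 and 4 hinge on telling a dead master apart from a broken private link. A minimal sketch of the standby-side check, assuming the master can be probed with `pg_isready` on both its private and its public address; the addresses and function names are hypothetical, and in practice the cluster manager makes this call:

```shell
#!/bin/sh
# Hypothetical addresses of the master on the private and public networks.
MASTER_PRIVATE=${MASTER_PRIVATE:-192.168.10.1}
MASTER_PUBLIC=${MASTER_PUBLIC:-10.0.0.1}

# probe HOST: succeed if PostgreSQL on HOST accepts connections within 3 s.
probe() {
  pg_isready -q -h "$1" -p 5432 -t 3
}

# decide PRIVATE_OK PUBLIC_OK: choose the standby's action ("yes"/"no" inputs).
decide() {
  if [ "$1" = yes ]; then
    echo none          # replication link is fine: nothing to do
  elif [ "$2" = yes ]; then
    echo alert-only    # case 4: only the private link failed; promoting would cause split brain
  else
    echo failover      # case 1: master unreachable on both networks
  fi
}

# check_master: probe both paths and print the decision.
check_master() {
  if probe "$MASTER_PRIVATE"; then p=yes; else p=no; fi
  if probe "$MASTER_PUBLIC"; then q=yes; else q=no; fi
  decide "$p" "$q"
}
```

Both probes failing can also mean the standby itself is isolated, so production setups usually add fencing (STONITH) or a third witness node before promoting.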
Disaster Recovery
Disaster Recovery Sites
What is...
What is Disaster Recovery (DR)?
➢ A configuration of sites, plus the supporting
effort, to mitigate NATURAL DISASTERS / force
majeure.
➢ Requires 2 different sites, at least 70 to 80
km apart (depending on requirements).
➢ DR swingover is always manual and
involves higher-level decision or policy
makers.
➢ There is an escalation protocol that must be
followed; therefore, it is not a technical decision.
What is...
What is Disaster Recovery (DR)?
➢ Swingover is done on a per-site basis, NOT
per host or per service.
➢ DR mitigates: natural disasters (flood,
fire, earthquake, meteor (?), etc.) and
other force majeure: riots, facility
shutdown, war, etc.
➢ DR does NOT mitigate: hardware failure,
human mistakes, facility outages, ...
Disaster Recovery
Disaster Recovery Configuration
Disaster Recovery
The DC site is hit by a disaster: manual swingover by management decision.
Who we are
As an IT solutions provider, we are committed to delivering services
Maintenance
Services
1. PostgreSQL
2. Linux OS
3. Open Source
Managed
Services
1. PostgreSQL
2. Linux OS
3. Open Source
System Optimization
Expertise (assessment,
consultation, advice,
solutions)
1. Any IT system that
requires performance
fixes and solutions
2. Network
3. Security
4. etc.
High Performance
Software
Development
1. World-class quality
2. High throughput
and high transaction volumes
Hands-On Training
1. Linux Administration
(Basic and Advanced)
2. PostgreSQL
(Basic and Advanced)
3. High Performance
Transaction System
Our Proud Clients
Questions?
