Hot Streaming Replication in Postgres v9.3
Setup, Failover and Rebuilding the Master made easy with Postgres v9.3
20/6/2014
PREPARE THE INSTANCE
• Install Postgres on the servers which are going to hold the primary and secondary databases
• Set up and configure the database cluster on the primary server
• In this example (both clusters run on the same host, on different ports):
  - Primary DB server: Dbserver1, 192.168.160.155
    Data directory: /opt/PostgresPlus/9.3AS/data
    Port: 5445
  - Standby DB server: Dbserver2, 192.168.160.155
    Data directory: /opt/PostgresPlus/9.3AS/data2
    Port: 5446
EDIT postgresql.conf AND pg_hba.conf ON MASTER
• wal_level = hot_standby (mandatory)
• max_wal_senders = 3 (mandatory; must be a positive integer)
• wal_keep_segments = 128 (optional, depending on load)
• wal_sender_timeout = 5s (optional; this setting was called replication_timeout before 9.3)
• hot_standby = on (takes effect only on the hot standby server)
• Add an entry in pg_hba.conf:
  host replication enterprisedb 192.168.160.155/32 trust
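Put together, the changes above look like this (values as on this slide; trust authentication is for demo purposes only):

```conf
# postgresql.conf on the master
wal_level = hot_standby        # mandatory for streaming replication
max_wal_senders = 3            # mandatory; one per standby plus some spare
wal_keep_segments = 128        # optional; size according to load
wal_sender_timeout = 5s        # optional; called replication_timeout before 9.3

# postgresql.conf on the standby
hot_standby = on               # allow read-only queries on the standby

# pg_hba.conf on the master
host  replication  enterprisedb  192.168.160.155/32  trust
```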
TAKE A BACKUP AND RESTORE ON SECONDARY
• Take a backup of the primary DB instance/cluster
• pg_basebackup can take a hot backup directly from the running primary; an archived base backup plus WALs is an alternative source
• Restore it to create the DB cluster/instance on the secondary server
• Change the port number in the new DB cluster if required
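A minimal sketch of the backup step, using the host, port, and paths from the example setup. The command is only assembled and echoed here, since actually running it needs a live primary:

```shell
# Values from the example setup above.
PRIMARY_HOST=192.168.160.155
PRIMARY_PORT=5445
STANDBY_DATA=/opt/PostgresPlus/9.3AS/data2

# -X stream pulls the WAL generated during the backup over the same
# connection, so no archived WAL is needed for a consistent copy.
CMD="pg_basebackup -h $PRIMARY_HOST -p $PRIMARY_PORT -U enterprisedb -D $STANDBY_DATA -X stream"
echo "$CMD"
```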
CREATE recovery.conf ON THE SECONDARY SERVER
• standby_mode = 'on' # mandatory
• primary_conninfo = 'host=192.168.160.155 port=5445 user=enterprisedb password=password'
• recovery_target_timeline = 'latest' # optional, but important for rebuilding
• trigger_file = '/opt/PostgresPlus/9.3AS/data2/recover.trigger' # optional
• Note: pg_basebackup has the -R option to create a default recovery.conf file while taking the backup:
  pg_basebackup -h 192.168.160.155 -p 5445 -U enterprisedb -D /opt/PostgresPlus/9.3AS/data2 -R
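The settings above assembled into a recovery.conf (the password is a placeholder):

```conf
# recovery.conf in the standby's data directory (data2)
standby_mode = 'on'                                             # mandatory
primary_conninfo = 'host=192.168.160.155 port=5445 user=enterprisedb password=password'
recovery_target_timeline = 'latest'                             # important for re-mastering later
trigger_file = '/opt/PostgresPlus/9.3AS/data2/recover.trigger'  # standby promotes when this file appears
```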
START THE SERVERS
• Start the secondary server
• Its log will warn that the primary server is not available; ignore that
• Start the primary server
TEST REPLICATION
• On the primary (assuming a test table replication_test, already holding the value 1):
edb=# insert into replication_test values (2);
INSERT 0 1
• On the secondary:
edb=# select * from replication_test;
 test_column
-------------
           1
           2
(2 rows)
• The secondary server is read-only:
edb=# insert into replication_test values (3);
ERROR:  cannot execute INSERT in a read-only transaction
TRIGGERING THE FAILOVER
• After a failure on the primary, create the recovery trigger file on the standby (manually here, but this can be scripted too), or promote it with pg_ctl:
touch /opt/PostgresPlus/9.3AS/data2/recover.trigger
# or:
pg_ctl promote -D /opt/PostgresPlus/9.3AS/data2
• Logic to script the above step (the probe must use psql; pg_ctl cannot run queries):
while psql -h 192.168.160.155 -p 5445 -c "select 1"; do
  sleep $connection_wait_time
done
touch /opt/PostgresPlus/9.3AS/data2/recover.trigger
• Once promotion completes, recovery.conf is renamed to recovery.done
• Connect to the secondary DB and execute an insert to confirm the failover:
edb=# insert into replication_test values (4);
INSERT 0 1
• Or execute select pg_is_in_recovery(); (output must be "f") to confirm recovery is complete
• Point the application/virtual IP to the new database server
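The scripted step above can be sketched as a small watcher. check_primary is stubbed to fail immediately so the sketch runs without a server; in production it would be the psql probe shown in the comment, and TRIGGER would be the trigger_file path from recovery.conf:

```shell
PRIMARY_HOST=192.168.160.155
PRIMARY_PORT=5445
TRIGGER=/tmp/recover.trigger   # stand-in for /opt/PostgresPlus/9.3AS/data2/recover.trigger
WAIT=5

check_primary() {
  # Real probe would be:
  # psql -h "$PRIMARY_HOST" -p "$PRIMARY_PORT" -U enterprisedb -c "select 1" >/dev/null 2>&1
  false   # stub: pretend the primary is already down
}

# Poll until the primary stops answering, then drop the trigger file;
# the standby notices it and promotes itself.
while check_primary; do
  sleep "$WAIT"
done
touch "$TRIGGER"
echo "created $TRIGGER"
```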
TRIGGERING THE SWITCHOVER
• Disconnect all applications from the primary node
• Shut down the primary database
• Create the recovery trigger file on the standby (or promote it with pg_ctl):
touch /opt/PostgresPlus/9.3AS/data2/recover.trigger
• Once promotion completes, recovery.conf is renamed to recovery.done
• Connect to the secondary DB and execute an insert:
edb=# insert into replication_test values (4);
INSERT 0 1
• Or execute select pg_is_in_recovery(); (output must be "f") to confirm recovery is complete
• Point the application/virtual IP to the new database server
HANDLING MULTIPLE REPLICAS
• In v9.3, re-mastering does not require rebuilding the slaves
• In v9.3, timeline switches are part of the WAL, so they can be replicated as well
• Timeline switches happen during PITR or when a slave is promoted
• Other replicas can be reconfigured and restarted to receive WAL from the new primary, without rebuilding them from scratch
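Reconfiguring a surviving replica to follow the new primary is then just a primary_conninfo change and a restart. The host and port below are illustrative, assuming the promoted standby from this example (port 5446):

```conf
# recovery.conf on a remaining replica, after the standby's promotion
standby_mode = 'on'
primary_conninfo = 'host=192.168.160.155 port=5446 user=enterprisedb password=password'
recovery_target_timeline = 'latest'   # follow the new timeline created by the promotion
```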
RE-MASTERING
• Before failover: Srv1 is the master; Srv2 and Srv3 are slaves
• After failover: Srv1 has crashed; Srv2 is the new master; Srv3 stays a slave
• Reconfigure Srv3 to pull WAL from Srv2 and restart it
• No need to rebuild Srv3 or restore archives: the timeline switch info is received from Srv2 via streamed WAL
REBUILDING THE MASTER
• If the old primary needs to be added back to the cluster as a slave, it need not be rebuilt
• Prior to v9.3 you either needed WAL archives to add a lost primary back as a slave, or you had to rebuild it
• In v9.3, timeline switches are part of the WAL, so they can be replicated as well
• As long as all WAL since the failure is available, you can add the lost master back without downtime or rebuilding
• Copy recovery.done from the new primary into the lost primary's data directory as recovery.conf
• Update the connection information and start the old primary instance as the new hot standby
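For the last two steps, the old primary's recovery.conf ends up pointing at the promoted standby. The values below assume the single-host example from earlier slides (new primary on port 5446):

```conf
# recovery.conf in /opt/PostgresPlus/9.3AS/data on the old primary
standby_mode = 'on'
primary_conninfo = 'host=192.168.160.155 port=5446 user=enterprisedb password=password'
recovery_target_timeline = 'latest'   # essential: the old master must follow the new timeline
```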
MONITORING THE REPLICATION
• Check whether the current node is a master or a slave:
  SELECT pg_is_in_recovery();
• See the current snapshot on master and slave:
  SELECT txid_current_snapshot();
• Get the latest replication status from the pg_stat_replication view
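A typical check on the master, selecting a few of the columns the 9.3 pg_stat_replication view provides (run via psql; the output depends on your setup):

```sql
-- On either node: returns 'f' on the master, 't' on a standby
SELECT pg_is_in_recovery();

-- On the master: one row per connected standby
SELECT application_name, client_addr, state,
       sent_location, replay_location, sync_state
FROM pg_stat_replication;
```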
Sameer Kumar
Ashnik Pte Ltd, Singapore
www.ashnik.com | sameer.kumar@ashnik.com
www.slideshare.net/sameerkasi200x |
www.twitter.com/sameerkasi200x
Follow my blog: pgpen.blogspot.com
