PGConf APAC 2018 - Managing replication clusters with repmgr, Barman and PgBouncer

https://www.2ndQuadrant.com
PGConf APAC 2018
Singapore, March 22nd
Managing replication clusters with repmgr, Barman and PgBouncer

Disclaimer/copyright
All content and images are created by/owned by 2ndQuadrant.

TOPICS
Using these three applications to manage
PostgreSQL replication and HA:
● Barman
● repmgr
● PgBouncer

What is HA?
● Recovery Point Objective (RPO)
● Recovery Time Objective (RTO)
Both as close to zero as possible/feasible
● HA needs:
– planning
– documentation
– practice
– monitoring
● HA involves tradeoffs
● Many ways of implementing it
● Keep systems up-to-date!!!

Replication and all that
● replication: not HA by itself
● replication: sounds simple, is complicated
● archiving: sounds boring, but essential

Types of replication
● streaming replication
● logical replication
● multi-master replication

Terminology
● archive-centric
SELECT pg_is_in_recovery()
● Master, primary...
● Standby, slave, replica...
● Streaming replication, binary replication, physical replication...
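
Whatever name is used for the roles, pg_is_in_recovery() tells them apart: it returns true on a standby and false on a primary. A minimal sketch of turning that into a role name, assuming psql is run with -At so the result arrives as a bare t or f. The function name node_role is made up for this illustration:

```shell
# Hypothetical helper (not part of repmgr): map the output of
#   psql -At -c "SELECT pg_is_in_recovery()"
# to a role name. "t" means the server is in recovery, i.e. a standby.
node_role() {
    case "$1" in
        t) echo "standby" ;;
        f) echo "primary" ;;
        *) echo "unknown" ;;
    esac
}

# Usage (needs a reachable server, so left commented out):
#   node_role "$(psql -At -h node1 -d repmgr -U repmgr \
#                     -c 'SELECT pg_is_in_recovery()')"
```

Anything other than t or f (for example an empty string after a failed connection) is reported as unknown rather than guessed.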

Overview
What are
– repmgr
– barman
– pgbouncer

repmgr
● replication manager
● two main roles
– set up and manage a replication cluster
– automatic failover and monitoring
● License: GPL
● current version: 4.0.4
● https://repmgr.org

simple repmgr setup

complex repmgr setup

repmgr commands
● primary register/unregister
● standby clone
● standby register/unregister
● standby promote
● standby follow
● standby switchover
● node rejoin
● node status
● witness register
● witness unregister
● cluster show

barman
● backup and recovery manager
● among others:
– takes (incremental) backups
– archives WAL
– serves WAL
● License: GPL
● current version: 2.3
● http://www.pgbarman.org/

Main Barman commands
● archive-wal
● backup
● check
● cron
● delete
● get-wal
● list-backup
● list-files
● list-server
● receive-wal
● recover
● show-backup
● show-server
● replication-status

pgbouncer
● lightweight connection pooler
● useful functionality
– lowers performance impact of PostgreSQL connections
– can divert connections to other servers
● License: ISC (BSD-ish)
● current version: 1.8.1 (Dec 2017)
● https://pgbouncer.github.io/

barman - a quick demo (1)
● sample configuration file (Barman server only)
[barman]
barman_home = /home/barman
barman_user = barman
log_file = /var/log/barman/barman.log
compression = gzip
reuse_backup = link
minimum_redundancy = 2
retention_policy = RECOVERY WINDOW OF 4 WEEKS
streaming_archiver = on
[test_cluster]
description = "Repmgr Test Cluster"
ssh_command = ssh -q localhost
conninfo = host=127.0.0.1 user=postgres port=5501

● take a backup:
$ barman backup test_cluster
Starting backup using rsync-exclusive method for server test in /home/barman/test_cluster/base/20161028T151425
Backup start at xlog location: 0/5000028 (000000010000000000000005, 00000028)
Copying files.
Copy done.
Asking PostgreSQL server to finalize the backup.
Backup size: 29.2 MiB. Actual size on disk: 158.4 KiB (-99.47% deduplication ratio).
Backup end at xlog location: 0/5000130 (000000010000000000000005, 00000130)
Backup completed
Processing xlog segments from file archival for test
000000010000000000000004
000000010000000000000005
000000010000000000000005.00000028.backup

● list backup(s)
$ barman list-backup test_cluster
test_cluster 20180301T124344 - Thu Mar 1 12:43:47 2018 - Size: 28.5 MiB - WAL Size: 0 B
test_cluster 20180301T121110 - Thu Mar 1 12:11:21 2018 - Size: 28.4 MiB - WAL Size: 54.8 KiB

● backup detail
$ barman show-backup test 20180301T124344
Backup 20180301T124344:
Server Name : test
Status : DONE
PostgreSQL Version : 90605
PGDATA directory : /tmp/repmgr-test/node_1/data
Base backup information:
Disk usage : 28.4 MiB (28.5 MiB with WALs)
Incremental size : 142.3 KiB (-99.51%)
Timeline : 1
Begin WAL : 000000010000000000000005
End WAL : 000000010000000000000005
WAL number : 1
WAL compression ratio: 99.84%
Begin time : 2018-03-01 12:43:44.721340+09:00
End time : 2018-03-01 12:43:47.314113+09:00
Begin Offset : 40
End Offset : 248
Begin XLOG : 0/5000028
End XLOG : 0/50000F8
WAL information:
No of files : 0
Disk usage : 0 B
Last available : 000000010000000000000005
Catalog information:
Retention Policy : VALID
Previous Backup : 20180301T121110
Next Backup : - (this is the latest base backup)

● restore from backup
● various kinds of PITR recovery also available
$ barman recover --remote-ssh-command "ssh postgres@remotehost" test_cluster last \
    /var/lib/postgresql/data
Starting remote restore for server test using backup 20180301T124344
Destination directory: /var/lib/postgresql/data
Copying the base backup.
Copying required WAL segments.
Generating archive status files
Identify dangerous settings in destination directory.
WARNING
You are required to review the following options as potentially dangerous
postgresql.conf line 643: include = 'postgresql.replication.conf'
postgresql.conf line 644: include = 'postgresql.local.conf'
Your PostgreSQL server has been successfully prepared for recovery!

repmgr - a quick demo (1)
● sample configuration file (per-node)
node_id=2
node_name=node2
conninfo='host=localhost dbname=repmgr user=repmgr port=5502'
data_directory='/var/lib/pgsql/data'
# barman settings
barman_server=barman.local

● clone a standby... from the Barman backup!
$ repmgr -D /tmp/repmgr-test/node_2/data \
    -f /tmp/repmgr-test/node_2/repmgr.conf \
    -h localhost -p 5501 -d repmgr -U repmgr --verbose -LINFO \
    standby clone
[2016-11-01 12:12:25] [NOTICE] using configuration file "/tmp/repmgr-test/node_2/repmgr.conf"
[2016-11-01 12:12:25] [NOTICE] destination directory '/tmp/repmgr-test/node_2/data' provided
[2016-11-01 12:12:25] [INFO] Connecting to Barman server to verify backup for test_cluster
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
[2016-11-01 12:12:25] [INFO] creating directory "/tmp/repmgr-test/node_2/data/repmgr"...
[2016-11-01 12:12:25] [INFO] Connecting to Barman server to fetch server parameters
[2016-11-01 12:12:26] [INFO] connecting to upstream node
[2016-11-01 12:12:26] [INFO] connected to upstream node, checking its state
[2016-11-01 12:12:26] [INFO] Successfully connected to upstream node. Current installation size is 28 MB
[2016-11-01 12:12:26] [NOTICE] getting backup from Barman...
[2016-11-01 12:12:28] [NOTICE] standby clone (from Barman) complete
[2016-11-01 12:12:28] [NOTICE] you can now start your PostgreSQL server
[2016-11-01 12:12:28] [HINT] for example : pg_ctl -D /tmp/repmgr-test/node_2/data start
[2016-11-01 12:12:28] [HINT] After starting the server, you need to register this standby with "repmgr standby register"

● recovery.conf looks like this:
● barman-wal-restore script does the “heavy lifting” of fetching WAL
– from Barman 2.0, part of the barman-cli package
standby_mode = 'on'
primary_conninfo = 'user=repmgr port=5501 host=localhost application_name=node2'
recovery_target_timeline = 'latest'
restore_command = 'barman-wal-restore barman.local test_cluster %f %p'

back to barman - keep the WAL flowing
● WAL retention management - tricky
● use barman in the restore command
● removes need to manage:
– wal_keep_segments
– replication slots
– archive_cleanup_command

repmgrd demo (1)
● repmgrd-specific configuration in repmgr.conf
node_id=2
node_name=node2
conninfo='host=node2 dbname=repmgr user=repmgr connect_timeout=2'
data_directory='/var/lib/pgsql/data'
# barman settings
barman_server=barman
# repmgrd settings
failover=automatic
promote_command='repmgr standby promote'
follow_command='repmgr standby follow --wait --upstream-node-id=%n'

repmgrd demo (2)
● Time to say goodbye to the primary...
$ pg_ctl -D /var/lib/pgsql/data/ -m immediate stop
● Standby promotes itself
[2018-03-20 13:54:02] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in normal state
[2018-03-20 13:54:14] [WARNING] unable to connect to upstream node "node1" (node ID: 1)
[2018-03-20 13:54:14] [INFO] checking state of node 1, 1 of 5 attempts
[2018-03-20 13:54:14] [INFO] sleeping 1 seconds until next reconnection attempt
[2018-03-20 13:54:18] [WARNING] unable to reconnect to node 1 after 5 attempts
[2018-03-20 13:54:18] [NOTICE] this node is the only available candidate and will now promote itself
NOTICE: promoting standby to primary
DETAIL: promoting server "node2" (ID: 2) using "/home/ibarwick/devel/builds/94/bin/pg_ctl -l /tmp/postgres.5502.log -m fast -w -D '/space/sda1/ibarwick/repmgr-test/node_2/data' promote"
NOTICE: STANDBY PROMOTE successful
DETAIL: server "node2" (ID: 2) was successfully promoted to primary
[2018-03-20 13:54:19] [NOTICE] 0 followers to notify
[2018-03-20 13:54:19] [INFO] switching to primary monitoring mode

repmgrd failover mechanism
● quorum vote
● individual standbys can have different priorities
● witness server to establish qualified majority
● concept of “locations”
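
The ideas of a majority vote and per-standby priorities can be sketched in a few lines of shell. This is an illustration only and is not repmgrd's actual election code: the function names, the "name:priority" argument format and the rule that priority 0 means "never promote" are assumptions made for this sketch.

```shell
# Illustrative only: NOT repmgrd's real election logic.

# has_majority VISIBLE TOTAL
# Succeeds if VISIBLE nodes form a strict majority of TOTAL nodes.
# A witness server would be counted in both numbers.
has_majority() {
    local visible=$1 total=$2
    [ $((visible * 2)) -gt "$total" ]
}

# pick_candidate "name:priority" ...
# Prints the name with the highest priority. Entries with priority 0
# are never selected (assumption: 0 means "do not promote").
# On a tie, the entry listed first wins.
pick_candidate() {
    local best_name="" best_prio=0 entry name prio
    for entry in "$@"; do
        name=${entry%%:*}
        prio=${entry##*:}
        if [ "$prio" -gt "$best_prio" ]; then
            best_name=$name
            best_prio=$prio
        fi
    done
    echo "$best_name"
}

# Usage:
#   has_majority 2 3 && pick_candidate node2:100 node3:50
```

With two of three nodes visible the majority check passes; with two of four it does not, which is the situation a witness server is meant to resolve.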

repmgrd event notifications
● repmgr/repmgrd generate “event notifications”
● “after” trigger for certain events
● recorded in the repmgr metadatabase
● can be used to execute custom scripts
● the following events can be generated when repmgrd is running:
– repmgrd_start
– repmgrd_shutdown
– repmgrd_failover_promote
– repmgrd_failover_follow
– standby_promote
– standby_follow
– standby_disconnect_manual
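
Custom scripts are hooked in through the event_notification_command setting in repmgr.conf, which passes details of the event as placeholders (%n node ID, %e event type, %s success flag, %t timestamp, %d details). A minimal sketch of a handler that turns those arguments into one log line. The script path, the function name format_event and the log format are assumptions for illustration:

```shell
# Hypothetical handler; it could be wired up in repmgr.conf with e.g.:
#   event_notification_command='/usr/local/bin/repmgr-event.sh %n %e %s "%t" "%d"'
# (the script path above is made up for this sketch)

# format_event NODE_ID EVENT SUCCESS TIMESTAMP DETAILS
# SUCCESS is expected to be 1 (succeeded) or 0 (failed).
format_event() {
    local node_id=$1 event=$2 success=$3 timestamp=$4 details=$5
    local status="FAILED"
    if [ "$success" = "1" ]; then
        status="OK"
    fi
    printf '%s node=%s event=%s status=%s details=%s\n' \
        "$timestamp" "$node_id" "$event" "$status" "$details"
}

# In a real handler script the last line might be:
#   format_event "$@" >> /var/log/repmgr/events.log
```

Keeping the formatting in a function makes it easy to reuse the same line for a log file, a mail body or a chat notification.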

repmgr event log
● Logs info about events of note
$ repmgr cluster event --terse
 Node ID | Name  | Event                    | OK | Timestamp
---------+-------+--------------------------+----+---------------------
 2       | node2 | repmgrd_reload           | t  | 2018-03-20 13:54:19
 2       | node2 | repmgrd_failover_promote | t  | 2018-03-20 13:54:19
 2       | node2 | standby_promote          | t  | 2018-03-20 13:54:19
 2       | node2 | repmgrd_start            | t  | 2018-03-20 13:53:30
 1       | node1 | repmgrd_start            | t  | 2018-03-20 13:53:27
 2       | node2 | standby_register_sync    | t  | 2018-03-20 13:53:24
 2       | node2 | standby_register         | t  | 2018-03-20 13:53:24
 2       | node2 | standby_clone            | t  | 2018-03-20 13:53:23
 1       | node1 | primary_register         | t  | 2018-03-20 13:53:11
 1       | node1 | cluster_created          | t  | 2018-03-20 13:53:11

PgBouncer - concepts
● PgBouncer intercepts incoming PostgreSQL connections
● Re-routes these to local or remote databases
– acts as both pooler and proxy
● Provides a “virtual database” for issuing commands

PgBouncer - configuration
● simple example
[pgbouncer]
listen_addr = *
listen_port = 6432
[databases]
appdb-rw= host=node1 dbname=repmgr
appdb-ro= host=node2 dbname=repmgr

pgbouncer connections
● connect to the primary:
$ psql 'host=node2 user=repmgr port=6432 dbname=repmgr-rw'
psql (9.6.5)
repmgr-rw=# CREATE TABLE foo (id INT);
CREATE TABLE
Time: 4.636 ms
● connect to the standby:
$ psql 'host=node2 user=repmgr port=6432 dbname=repmgr-ro'
psql (9.6.5)
repmgr-ro=# CREATE TABLE foo (id INT);
ERROR: cannot execute CREATE TABLE in a read-only transaction
Time: 0.867 ms

Putting it all together
● Custom promote_command
● Performs following steps:
– pauses PgBouncer
– promotes the standby
– dynamically rewrites PgBouncer config file
– reloads PgBouncer config
– resumes PgBouncer
● barman reconfiguration not implemented

PgBouncer – use include file
● [databases] section as include file:
Note: %include directive available from PgBouncer 1.6
[pgbouncer]
listen_addr = *
listen_port = 6432
%include /etc/pgbouncer.database.ini
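
The included file then only has to contain the [databases] section, which makes it easy to regenerate. A minimal sketch of producing its contents, assuming the -rw/-ro naming used in this deck. The function name render_databases_ini and its arguments are hypothetical, and in practice the conninfo strings would come from the repmgr metadata:

```shell
# Hypothetical helper: print a [databases] section with one read-write
# entry (the primary) and one read-only entry (the local node).
#
# render_databases_ini DBNAME PRIMARY_CONNINFO LOCAL_CONNINFO HOST
render_databases_ini() {
    local db=$1 primary_conninfo=$2 local_conninfo=$3 host=$4
    printf '[databases]\n'
    printf '%s-rw = %s application_name=pgbouncer_%s\n' \
        "$db" "$primary_conninfo" "$host"
    printf '%s-ro = %s application_name=pgbouncer_%s\n' \
        "$db" "$local_conninfo" "$host"
}

# Usage:
#   render_databases_ini appdb 'host=node1 dbname=repmgr' \
#       'host=node2 dbname=repmgr' node2 > /tmp/pgbouncer.database.ini
```

Writing to a temporary file first and copying it into place afterwards avoids PgBouncer reloading a half-written file.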

Script (1) - variables
● Assuming PgBouncer is running on the DB nodes:
#!/usr/bin/env bash
set -u
set -e
# Configurable items
PGBOUNCER_HOSTS="node1 node2 node3"
PGBOUNCER_DATABASE_INI="/etc/pgbouncer.database.ini"
PGBOUNCER_DATABASE="appdb"
PGBOUNCER_PORT=6432
REPMGR_DB="repmgr"
REPMGR_USER="repmgr"

Script (2) – pause/promote
● PgBouncer paused while standby promoted
# 1. Pause running pgbouncer instances
for HOST in $PGBOUNCER_HOSTS
do
    psql -t -c "pause" -h $HOST -p $PGBOUNCER_PORT \
        -U postgres pgbouncer
done
# 2. Promote this node from standby to primary
repmgr standby promote -f /etc/repmgr.conf

Script (3) – reconfigure
● Rewrite [databases] section
# 3. Reconfigure pgbouncer instances
PGBOUNCER_DATABASE_INI_NEW="/tmp/pgbouncer.database.ini"
for HOST in $PGBOUNCER_HOSTS
do
    # Recreate the pgbouncer config file
    echo -e "[databases]\n" > $PGBOUNCER_DATABASE_INI_NEW
    # Read-write entry: the currently active primary
    psql -d $REPMGR_DB -U $REPMGR_USER -t -A \
        -c "SELECT '${PGBOUNCER_DATABASE}-rw= ' || conninfo ||
                   ' application_name=pgbouncer_${HOST}'
              FROM repmgr.nodes
             WHERE active = TRUE AND type='primary'" \
        >> $PGBOUNCER_DATABASE_INI_NEW
    # Read-only entry: the node running on this host
    psql -d $REPMGR_DB -U $REPMGR_USER -t -A \
        -c "SELECT '${PGBOUNCER_DATABASE}-ro= ' || conninfo ||
                   ' application_name=pgbouncer_${HOST}'
              FROM repmgr.nodes
             WHERE node_name='${HOST}'" \
        >> $PGBOUNCER_DATABASE_INI_NEW
    rsync $PGBOUNCER_DATABASE_INI_NEW $HOST:$PGBOUNCER_DATABASE_INI

Script (4) – reload and resume
● Reload and resume PgBouncer
    psql -tc "reload" -h $HOST -p $PGBOUNCER_PORT -U postgres pgbouncer
    psql -tc "resume" -h $HOST -p $PGBOUNCER_PORT -U postgres pgbouncer
done
# Clean up generated file
rm $PGBOUNCER_DATABASE_INI_NEW
echo "Reconfiguration of pgbouncer complete"

Notes
● This is an example implementation
● Use provisioning!

Advantages/disadvantages
● Advantages
– can be set up on an existing cluster
– application level solution
● Disadvantages
– depends on SSH being available

Outlook
● Make replication simpler
● Support for BDR, pglogical
● Further integration between applications

Thank you for your kind attention
Ian Barwick
ian@2ndquadrant.com
