Overview
When you use PostgreSQL, you can set up streaming replication quite easily in order
to keep your data redundant on two or more nodes. But we can also leverage the copies
of your data to balance the load of your primary database server. This document explains
how to set up a redundant PostgreSQL database with load balancing.
Goals
1. Achieve redundancy using the built-in streaming replication of PostgreSQL.
2. Use the slave data nodes to load balance read queries.
Specifications
To achieve redundancy, I’ve chosen to use the built-in streaming replication of
PostgreSQL. Another tool, repmgr, will be used for easier control of the replication
between the redundant PostgreSQL instances. The tool doesn’t replicate data itself, but
it allows you to easily control the replication and the standby server(s), and to monitor
the status of the whole replication process.
In order to use both copies of the database data (or more, if you would like), I will use
pgpool. Pgpool can pool the connections to the nodes, monitor their status, and
trigger failover (via repmgr) if needed. Pgpool can load balance traffic
based on the type of SQL query. For example, a SELECT query can perfectly well be executed
on a slave (read-only) node, saving resources on the master (read-write) node.
Milestones
I. Setup
We will set up two nodes as a redundant PostgreSQL DB (a primary and a standby)
and one server which will do the pooling and distribution. The pool is a single
point of failure and ideally should also be redundant, but for this case the most
important thing is that all data stays available in case of failure.
As we can see in the above scheme, SQL queries will be distributed to the primary and
standby based on the type of query. Data written to the master should get replicated to the
standby.
In case of a failure of the primary, the standby should replace the primary and become
read-write. This will make sure that the database stays available for all the applications.
Meanwhile, one can investigate what happened to the old primary and, after a
checkup, start it as the new standby.
The version of PostgreSQL used in this document is 9.4, but one can easily replace
that with any other supported (recent) PostgreSQL version.
II. Prerequisites
Firewall and SELinux
We assume that iptables/firewalld and SELinux are disabled on the servers; we will not
cover those parts, as the objective of this document is only PostgreSQL failover and
replication.
Hostname
Before we get started with the setup, it’s important that all nodes in the setup can
connect to each other via their hostnames. If that’s not an option, you could use IPs
instead of hostnames, but that would make this explanation less clear.
Public Key Authentication
Another prerequisite is that all nodes can connect to each other over SSH as the
postgres user without a password prompt. SSH will be used to rsync the data from the
primary to the standby and to initiate a failover from pgpool. Passwordless SSH can be
achieved with public key authentication.
To setup the SSH authentication for the postgres user, we first need to create the user.
Now, we will need to generate a new RSA keypair on all the nodes.
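On each node, the commands might look like this (a sketch; the postgres user is normally created by the PostgreSQL packages, so the useradd is only needed if you set up SSH before installing them):

```bash
# as root, only if the postgres user doesn't exist yet
useradd -m -d /var/lib/pgsql postgres

# as the postgres user, generate a keypair without a passphrase
su - postgres
ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub
```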
As you can see, I did a cat of the public part of the key pair on each node. To allow all
machines to connect to each other and accept each other’s keys, we’ll need to add the
generated public keys of all hosts to /var/lib/pgsql/.ssh/authorized_keys:
In the snapshot below, I added each of the generated public RSA keys to the
authorized_keys file on node1. Don’t forget the last line, which changes ownership to the
postgres user, or SSH will not accept this file.
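A sketch of that step on node1 (the three example key lines stand in for the real public keys printed on each node):

```bash
# as root on node1: collect the public keys of all nodes
mkdir -p /var/lib/pgsql/.ssh
cat >> /var/lib/pgsql/.ssh/authorized_keys << 'EOF'
ssh-rsa AAAA... postgres@node1
ssh-rsa AAAA... postgres@node2
ssh-rsa AAAA... postgres@pgpool
EOF
chown -R postgres:postgres /var/lib/pgsql/.ssh
chmod 600 /var/lib/pgsql/.ssh/authorized_keys
```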
Since I really want unattended access (so no password or any other question), I will also
add all hosts to the known_hosts file. This prevents the question about accepting each
host’s fingerprint on the first connection:
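One way to pre-populate known_hosts is with ssh-keyscan (the hostnames node1, node2 and pgpool are the ones assumed throughout this document):

```bash
# as the postgres user
ssh-keyscan -t rsa node1 node2 pgpool >> ~/.ssh/known_hosts
```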
Since the authorized_keys and known_hosts files on node1 are fine for all other hosts, I’ll
copy these files to the other nodes.
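Something along these lines, run as postgres on node1:

```bash
scp ~/.ssh/authorized_keys ~/.ssh/known_hosts postgres@node2:~/.ssh/
scp ~/.ssh/authorized_keys ~/.ssh/known_hosts postgres@pgpool:~/.ssh/
```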
It’s also a good idea to test the SSH connection between all hosts using the postgres user
from everywhere:
Also, the root user on the pgpool server executes commands on the postgres nodes. To
allow passwordless authentication there too, we will copy the postgres user’s public key
to the root user:
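The original screenshot is lost; one reading of this step is that root on the pgpool server reuses the postgres keypair, for example:

```bash
# as root on the pgpool server (a sketch, not the document's exact commands)
mkdir -p /root/.ssh
cp /var/lib/pgsql/.ssh/id_rsa /var/lib/pgsql/.ssh/id_rsa.pub /root/.ssh/
cp /var/lib/pgsql/.ssh/known_hosts /root/.ssh/
ssh postgres@node1 hostname   # should succeed without any prompt
```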
III. Repository
The standard CentOS repositories do not contain pgpool or repmgr, and for CentOS 7 the
supported PostgreSQL version is 9.2. Because we want to use PostgreSQL 9.4,
repmgr and pgpool, we need to add the PGDG repository to Yum.
Let’s add the repo on all nodes:
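The repo can be added like this (the RPM URL is version- and mirror-dependent; verify it against yum.postgresql.org before use):

```bash
yum install -y http://yum.postgresql.org/9.4/redhat/rhel-7-x86_64/pgdg-centos94-9.4-3.noarch.rpm
```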
Setup the primary PostgreSQL node
Now that we have finished all prerequisites, it’s time to really get started. The first step is
to install the PostgreSQL database on the master and configure it for replication.
Let’s start by installing the required packages:
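The installation might look like this (a sketch; the repmgr RPM URL is the 2.0.2-4 one given below, used instead of the repository's latest version):

```bash
yum install -y postgresql94-server postgresql94-contrib
yum install -y https://mirror.its.sfu.ca/mirror/CentOS-Third-Party/pgrpm/pgrpm-94/redhat/rhel-7-x86_64/repmgr94-2.0.2-4.rhel7.x86_64.rpm
```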
I experienced some problems with the latest version of repmgr (repmgr-3.1.5-1 at
the time of writing), especially when recovering a failed node, so I used a previous
version of repmgr (repmgr94-2.0.2-4); with the older version, all problems seemed to be
resolved.
One can find the older version link here:
https://mirror.its.sfu.ca/mirror/CentOS-Third-Party/pgrpm/pgrpm-94/redhat/rhel-7-x86_6
4/repmgr94-2.0.2-4.rhel7.x86_64.rpm
After the installation of PostgreSQL and repmgr, let’s initialize PostgreSQL:
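For PostgreSQL 9.4 on CentOS 7 the initialization should be:

```bash
/usr/pgsql-9.4/bin/postgresql94-setup initdb
```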
Configure database access by editing /var/lib/pgsql/9.4/data/pg_hba.conf
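A fragment along these lines (the lost screenshot is reconstructed here as an assumption; the addresses follow the 10.0.10.0/24 range used later for pgpool):

```
# TYPE  DATABASE      USER      ADDRESS          METHOD
local   all           all                        trust
host    repmgr        repmgr    node1            trust
host    repmgr        repmgr    node2            trust
host    replication   repmgr    node1            trust
host    replication   repmgr    node2            trust
host    all           pgpool    10.0.10.0/24     md5
host    all           all       10.0.10.0/24     md5
```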
The above configuration allows DB nodes to connect from the other DB nodes, as well as
from the pgpool server with the pgpool user. The trust method allows those users to
connect without a password, while the remaining users have to authenticate with a
password (since their method is md5). With this configuration, the pgpool server can
communicate with the DB nodes using a password for DB operations.
Configure the PostgreSQL and streaming replication by editing
/var/lib/pgsql/9.4/data/postgresql.conf
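The effective streaming-replication settings for 9.4 might look like this (the exact values, such as max_wal_senders and wal_keep_segments, are reasonable assumptions rather than the lost screenshot's contents):

```
listen_addresses = '*'
wal_level = hot_standby
max_wal_senders = 10
wal_keep_segments = 64
hot_standby = on
archive_mode = on
archive_command = 'cd .'                    # dummy command; just enables archive_mode
shared_preload_libraries = 'repmgr_funcs'   # required if you run the repmgrd daemon
```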
It’s good to merge the changed parameters with the existing default content of the
PostgreSQL configuration file, since it contains many useful comments. I have only listed
the effective settings here to keep things clear.
Create a directory for repmgr configuration files:
Configure the repmgr by editing the /var/lib/pgsql/repmgr/repmgr.conf
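A minimal repmgr 2.0 configuration for node1 (the cluster name and connection string are assumptions):

```
cluster=pg_cluster
node=1
node_name=node1
conninfo='host=node1 user=repmgr dbname=repmgr'
pg_bindir=/usr/pgsql-9.4/bin
```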
Now set the owner of the files as postgres-user
Enable and start PostgreSQL on the master(node1):
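On CentOS 7 with the PGDG packages this is:

```bash
systemctl enable postgresql-9.4
systemctl start postgresql-9.4
```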
Create required users for replication and repmgr and the repmgr DB:
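A sketch of the user and database creation (the pgpool password secret matches the one referenced later in this document; repmgr 2.0 expects a superuser):

```bash
# as the postgres user on node1
psql -c "CREATE USER repmgr SUPERUSER LOGIN;"
psql -c "CREATE USER pgpool LOGIN PASSWORD 'secret';"
createdb repmgr --owner=repmgr
```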
The last step is to register node1 as the master node in repmgr:
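With repmgr 2.0 the registration should look like:

```bash
# as the postgres user on node1
/usr/pgsql-9.4/bin/repmgr -f /var/lib/pgsql/repmgr/repmgr.conf master register
```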
Setup the Standby PostgreSQL node
Setting up the standby node doesn’t require a lot of configuration. On the first sync with
the primary, most of the configuration is taken from there.
As with the primary, the first step is to install the necessary packages:
Once all the packages are installed, we can sync the configuration and content of the
primary with the standby.
In case you experience problems with the synchronization: the PostgreSQL
backup/restore process doesn’t overwrite existing content, so you first need to delete the
contents of the data directory and then run the given command again.
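The sync can be done with repmgr's standby clone (a sketch, run as postgres on node2, clearing the data directory first as described above):

```bash
# as the postgres user on node2
rm -rf /var/lib/pgsql/9.4/data/*
/usr/pgsql-9.4/bin/repmgr -D /var/lib/pgsql/9.4/data -d repmgr -U repmgr \
    --verbose standby clone node1
```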
After the synchronization with the primary, configure repmgr similarly to how we did on
the primary, by creating the directory for the configuration files:
Also configure /var/lib/pgsql/repmgr/repmgr.conf, and don’t forget the last step:
change the owner of the file.
Now let’s start and enable the PostgreSQL service on the standby server:
As the last step in the standby node setup, register the standby in repmgr:
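The registration mirrors the one on the primary:

```bash
# as the postgres user on node2
/usr/pgsql-9.4/bin/repmgr -f /var/lib/pgsql/repmgr/repmgr.conf standby register
```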
Test the replication
Before we test, let’s first look at the status of repmgr:
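The status can be checked with repmgr's cluster show command (run as postgres on either node); it should list node1 as master and node2 as standby:

```bash
/usr/pgsql-9.4/bin/repmgr -f /var/lib/pgsql/repmgr/repmgr.conf cluster show
```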
This looks good. At this point, updates on the master should be replicated to the
slave, which we can verify by creating a new database on the primary
node (node1).
Before we create a database on the primary, let’s take a look at the list of databases on the
standby node:
Now, let’s create a database on primary node:
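For example (as postgres on node1; the database name test is the one mentioned below):

```bash
psql -c "CREATE DATABASE test;"
```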
List the databases on the slave node using the same command we ran on the primary.
As you can see, the new database test got replicated to the standby:
We will do one more test to check if the standby is in read-only mode; we will actually
try to create a database on the standby:
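Attempting the same CREATE DATABASE on node2 should fail, since the standby only accepts read-only transactions:

```bash
psql -c "CREATE DATABASE test2;"
# ERROR:  cannot execute CREATE DATABASE in a read-only transaction
```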
The above result looks good: we don’t want anybody to update the standby directly;
everything should pass via the primary and get replicated to the standby.
Setup Pgpool
Set up pgpool in order to distribute the queries to the primary and standby and to make
sure that we fail over to the standby in case the primary isn’t available anymore.
As with the DB-nodes, we will start with installing the necessary packages:
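From the PGDG repository the package should be pgpool-II-94:

```bash
yum install -y pgpool-II-94
```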
Configure pgpool by copying the sample configuration file:
Edit the newly copied file /etc/pgpool-II-94/pgpool.conf; don’t replace the content
of the file, but edit everything listed below:
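The copy and the key settings might look like this (the parameter names are standard pgpool-II ones; the sample filename, backend hostnames, directories and passwords are this document's assumptions):

```bash
cp /etc/pgpool-II-94/pgpool.conf.sample-stream /etc/pgpool-II-94/pgpool.conf
```

```
listen_addresses = '*'
port = 9999

backend_hostname0 = 'node1'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/var/lib/pgsql/9.4/data'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_hostname1 = 'node2'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/var/lib/pgsql/9.4/data'
backend_flag1 = 'ALLOW_TO_FAILOVER'

enable_pool_hba = on
pool_passwd = 'pool_passwd'

master_slave_mode = on
master_slave_sub_mode = 'stream'
load_balance_mode = on

sr_check_user = 'pgpool'
sr_check_password = 'secret'
health_check_period = 10
health_check_user = 'pgpool'
health_check_password = 'secret'

failover_command = '/etc/pgpool-II-94/failover.sh %d %H %P'
```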
As mentioned in the above settings, we have specified that failover.sh should be
executed on failover, so we need to create this file as /etc/pgpool-II-94/failover.sh.
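A sketch of such a script, assuming the failover_command passes the failed node id (%d), the new master host (%H) and the old primary id (%P), and that repmgr performs the promotion over SSH; the log path matches the one referenced later:

```bash
#!/bin/bash
# /etc/pgpool-II-94/failover.sh -- called by pgpool on failover
failed_node=$1   # %d: id of the node that went down
new_master=$2    # %H: hostname of the new master candidate
old_primary=$3   # %P: id of the old primary node
log=/tmp/pgpool_failover.log

echo "$(date) node $failed_node went down" >> $log

# Promote the standby only when it was the primary that failed
if [ "$failed_node" = "$old_primary" ]; then
    echo "$(date) promoting $new_master" >> $log
    ssh postgres@"$new_master" \
      "/usr/pgsql-9.4/bin/repmgr -f /var/lib/pgsql/repmgr/repmgr.conf standby promote" \
      >> $log 2>&1
fi
```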
Make sure that this script is executable:
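For example:

```bash
chmod 755 /etc/pgpool-II-94/failover.sh
```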
We also need to use pool_hba.conf for access control, so create this file in
/etc/pgpool-II-94 with the content below.
The configuration allows all users to connect from the IP range 10.0.10.0/24 using a
password.
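A fragment matching that description (pool_hba.conf uses the same syntax as pg_hba.conf):

```
# /etc/pgpool-II-94/pool_hba.conf
local   all   all                   trust
host    all   all   10.0.10.0/24    md5
```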
Every user that needs to connect via pgpool needs to be added to pool_passwd. We need
to create this file and change its owner to postgres.
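With pgpool's pg_md5 tool this can be done roughly as follows (the pgpool user and the password secret are the ones created earlier; the -f option points pg_md5 at our configuration so it finds pool_passwd):

```bash
cd /etc/pgpool-II-94
pg_md5 -f /etc/pgpool-II-94/pgpool.conf -m -u pgpool secret
chown postgres:postgres /etc/pgpool-II-94/pool_passwd
```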
In the above snapshot, we added the pgpool user, which we created earlier, to the file.
We need to repeat this step for every user that needs access to the database.
The last step is to allow connections via PCP to manage pgpool. This requires a similar
approach as with pool_passwd.
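The PCP user file holds username:md5hash pairs; for example (the PCP username postgres and password secret are assumptions):

```bash
echo "postgres:$(pg_md5 secret)" >> /etc/pgpool-II-94/pcp.conf
```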
This completes the pgpool configuration.
Let’s start and enable the pgpool service:
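With the PGDG package the service name should be pgpool-II-94:

```bash
systemctl enable pgpool-II-94
systemctl start pgpool-II-94
```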
Test pgpool
Hopefully all has gone well up to now. We should have a working installation of pgpool
that will execute queries on the replicated PostgreSQL nodes.
Let’s test if pgpool is working by executing a query on the database via pgpool:
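For example, connecting to pgpool's port (9999 in our configuration) instead of PostgreSQL directly:

```bash
psql -h pgpool -p 9999 -U pgpool -d postgres -c "SELECT version();"
```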
We should get the same result from any other host that can access the pgpool server.
Pgpool has a range of SHOW commands available that allow us to get the status of
pgpool itself. They are executed like normal SQL but are intercepted by pgpool.
Check the status of nodes connected to pgpool:
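The node status can be queried with show pool_nodes:

```bash
psql -h pgpool -p 9999 -U pgpool -d postgres -c "show pool_nodes;"
```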
As we can see, node1 is the primary and node2 is the secondary node. The status 2
means that the node is online.
Test failover
To test the failover scenario, we bring down the primary node and check whether the
standby node becomes the master:
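The simplest way to simulate the failure is to stop PostgreSQL on the primary:

```bash
# on node1, as root
systemctl stop postgresql-9.4
```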
Post failover, new state of our replication will be like this:
Status from pgpool-server:
In the above screenshot we can see that the former slave (node2) has now become the
new primary, and the former primary (node1) has become a standby with an offline
status (3).
Result in repmgr:
The failover.sh script we defined was called when the primary became unavailable; in it,
we specified /tmp/pgpool_failover.log as the log file.
Below are the contents of that file after the failover:
As we can see, the database stays available via pgpool and the former standby became
read-write.
Recover after failover to master-slave replication
After the failover, we will be in non-redundant situation. Everything keeps working for
users of the database but the we need to go back to primary-standby configuration.
The new state of replication will be:
The first thing is to troubleshoot why the primary DB became unavailable; once the
problem is resolved, we can start using that node as the new standby server.
Make sure that database is stopped on node1:
Then we will sync the data from the new primary (node2) to the previously failed node
(node1). In the meantime the database may have been updated, so we need to make sure
those changes get replicated to node1 before we start it as a standby.
Also, as I mentioned earlier, the postgres backup/restore procedure doesn’t overwrite the
data in the /var/lib/pgsql/9.4/data directory. I simply deleted all the data in that directory
and, with the help of a repmgr command, synced all the updated data from the new
primary node (node2):
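The commands mirror the initial standby setup, now cloning from node2 (a sketch, run on node1):

```bash
# as the postgres user on node1
rm -rf /var/lib/pgsql/9.4/data/*
/usr/pgsql-9.4/bin/repmgr -D /var/lib/pgsql/9.4/data -d repmgr -U repmgr \
    --verbose standby clone node2
```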
Now that the new standby node1 is ready, let’s start the service; repmgr automatically
notices that the new standby has become available:
For pgpool, we need to re-attach the failed node in order for it to be visible and usable as
a standby node. The password requested by the command below is secret, which we set
while creating the pgpool user.
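With the old-style PCP syntax shipped with this pgpool-II generation, the arguments are timeout, host, PCP port, PCP user, password and node id (node1 is assumed to be backend node 0):

```bash
pcp_attach_node 10 localhost 9898 postgres secret 0
```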
At this point we are back in a redundant state, where our old standby (node2) functions
as the primary and the old primary (node1) functions as a standby. Both machines are
now equal; we can leave the situation as it is and continue to use it this way, or we can
revert to the original roles.
Recover the original situation
In case we want to go back to the initial setup (node1 as primary and node2 as standby),
we can initiate a manual failover by stopping the current primary (node2) and recovering
it for use as a standby:
Once the master (node2) is stopped, a failover is triggered and node1 gets assigned as
primary again:
Now, sync node2 with new primary(node1) and start:
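A sketch of the failback sequence (stopping node2 triggers the failover to node1; then node2 is re-synced exactly as in the earlier recovery):

```bash
# on node2, as root: stop the current primary, triggering failover to node1
systemctl stop postgresql-9.4

# on node2, as postgres: re-sync from the new primary (node1)
rm -rf /var/lib/pgsql/9.4/data/*
/usr/pgsql-9.4/bin/repmgr -D /var/lib/pgsql/9.4/data -d repmgr -U repmgr \
    --verbose standby clone node1

# on node2, as root: start it as the new standby
systemctl start postgresql-9.4
```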
Re-attach node2 to pgpool in order to use it as a standby:
After going through all the above steps, we are back to the initial situation where
node1 acts as the primary and node2 acts as the standby.