Table of contents
1 Overview
  • The mission
  • Before the migration
2 PostgreSQL 9.0
  • Intro
  • Streaming replication
  • Master configuration
  • Slave configuration
  • PostgreSQL specific tricks
  • Setting up a slave
3 Clustering
  • Set up of corosync
  • OCF resource
4 Backups
  • Cron jobs
  • BackupPC
5 Monitoring
  • Nagios
  • Munin
6 Automation
  • Puppet module
  • The node file
  • #TODO
7 The end
Julien Pivotto PostgreSQL 9.0 HA
A.R.S.I.A.
• Association Régionale de Santé et d’Identification Animales
• 30 Linux servers in several locations
• A lot of Open Source
• CentOS, Samba, Open-Xchange, MailScanner, Cyrus,
• Puppet, Jenkins, Foreman, OpenVPN, GLPI, RabbitMQ,
• BackupPC, CUPS, Icinga, Trac, Zope, Plone,
• Solr, Pentaho, Funambol, Munin, Squid, Asterisk,
• and PostgreSQL, ...
C.E.R.I.S.E
• A web application
• Plone (python)
• 15k+ visits, 500k+ pages and 2.000.000+ hits each month
• Developed by Affinitic
• Several databases
• PostgreSQL 9.0
• Oracle database
• Several servers/services
• Two reverse proxies in failover HA
• Two application servers in load balancing HA
• Two PostgreSQL servers in failover HA
• An Oracle database server
• A development server
• A Pentaho server
• Being integrated in Jenkins (to be continued...)
PostgreSQL before the migration
• PostgreSQL 8.3.7
• No native support of HA
• High availability with heartbeat 2 and DRBD
• Installed on the application servers
• Nothing automated
• Failover: the passive node is not even available read-only
• Installed in November 2008
Monitoring before the installation
• Icinga
• Check of the DRBD
• Simple connection check to PostgreSQL
• Graphing with Cacti
• Size of the databases
• Connections to the database
• Checkpoints
Backups before the installation
• Backups were made every hour on the same machine
• External backups once a day, on disk and on tape
• Backups are made with the pg_dump command
• BackupPC fetches those files
PostgreSQL 9.0
• PostgreSQL 9.0 was released in September 2010
• It brings native replication to PostgreSQL
• There is no native failover tool
• So we need to use PostgreSQL + Corosync
• The setup of PostgreSQL is managed by Puppet
Write-Ahead Logging
• Every change to a data file must first be written to a log file
• Fewer disk writes: only the log file needs to be flushed to disk to
guarantee that a transaction is committed, rather than every
data file changed by the transaction
What is streaming replication
• Streaming replication provides the capability to ship WAL records
(xlogs) to standby servers and apply them there
• It’s possible to have multiple standby servers
• Standby servers can be read-only ("Hot standby")
Specifications of streaming replication
• Streaming replication supports only asynchronous log-shipping
• But while the database is in active use, the delay is close to
that of synchronous log-shipping
• Adding a standby server requires manual action
• But in our case we will only have one standby server
• PostgreSQL does not provide an HA feature
• But Corosync does
• It is a single-threaded replication
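To put a number on "close to synchronous", the replication delay can be expressed in bytes of WAL. The following is a minimal sketch, not part of the original setup. It assumes the 9.0 location format "logid/offset" (both hexadecimal) and 0xFF000000 bytes per log id, which is my understanding of the layout before PostgreSQL 9.3. The helper names xlog_to_bytes and xlog_lag are hypothetical.

```shell
#!/bin/bash
# Convert a 9.0-style WAL location ("logid/offset", both hexadecimal)
# into an absolute byte position. Assumption: one log id spans
# 0xFF000000 bytes (255 segments of 16 MB), as in PostgreSQL before 9.3.
xlog_to_bytes() {
    logid=${1%%/*}
    offset=${1##*/}
    echo $(( 0x$logid * 0xFF000000 + 0x$offset ))
}

# Delay in bytes between a master location and a standby location.
xlog_lag() {
    echo $(( $(xlog_to_bytes "$1") - $(xlog_to_bytes "$2") ))
}

# Example with made-up locations: prints 96
xlog_lag 0/3000060 0/3000000
```

In practice the two locations would come from SELECT pg_current_xlog_location() on the master and SELECT pg_last_xlog_receive_location() on the standby.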
Master configuration
The master only needs one configuration file.
Configuration not related to SR (streaming replication)
# PostgreSQL configuration
# http://www.postgresql.org/docs/9.0/interactive/index.html
listen_addresses = '*'
max_connections = 200
shared_buffers = 4096MB
work_mem = 4096MB
effective_cache_size = 10024MB
commit_delay = 100000
# effective_cache_size is set a second time here. PostgreSQL keeps the last
# value it reads, so this one (2560 pages of 8 kB) overrides 10024MB above.
effective_cache_size = 2560
log_destination = 'stderr'
log_directory = 'pg_log'
logging_collector = on
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_truncate_on_rotation = on
log_rotation_age = 1d
log_rotation_size = 0
log_min_messages = notice
log_min_duration_statement = 1000
log_line_prefix = '%t %u '
log_statement = 'none'
datestyle = 'iso, dmy'
Master configuration
• wal_level = hot_standby
Allows standby servers to accept read-only queries
• max_wal_senders = 2
We allow up to 2 standby nodes
• wal_keep_segments = 128
The minimum number of WAL segments to keep for the standby servers
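As a rough sizing aid, the disk space retained by wal_keep_segments can be worked out directly. This is a sketch that assumes the default WAL segment size of 16 MB; wal_keep_mb is a hypothetical helper name.

```shell
#!/bin/bash
# Minimum disk space (in MB) that wal_keep_segments retains in pg_xlog,
# assuming the default WAL segment size of 16 MB.
wal_keep_mb() {
    echo $(( $1 * 16 ))
}

# With the value from the slide: prints 2048
wal_keep_mb 128
```

So with wal_keep_segments = 128, at least 2 GB of WAL stays on the master, and a standby can fall about that far behind before it needs a fresh base backup.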
Slave configuration
• The slave requires at least two configuration files
• A postgresql.conf file
• A recovery.conf file, used to apply the WAL records shipped by
the master
• A trigger file to stop replication can be specified
postgresql.conf - Configuration related to SR
wal_level = hot_standby
hot_standby = on
Note that this file also starts with the same non-SR settings as the
master configuration.
Slave configuration
recovery.conf
standby_mode = 'on'
primary_conninfo = 'host=192.168.177.2 user=replicuser'
• standby_mode means that this is a standby server
• primary_conninfo is the connection to the master
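The slave configuration mentions an optional trigger file. The sketch below generates a recovery.conf that includes one through the trigger_file setting. The function name write_recovery_conf and the trigger path used in the usage line are illustrative assumptions, not the author's actual setup.

```shell
#!/bin/bash
# Write a recovery.conf for a standby server.
# Assumptions: the function name and the trigger file location are
# hypothetical; replicuser is the replication user from the slides.
write_recovery_conf() {
    target=$1      # where to write recovery.conf
    master=$2      # address of the master
    trigger=$3     # creating this file later ends standby mode
    cat > "$target" <<EOF
standby_mode = 'on'
primary_conninfo = 'host=$master user=replicuser'
trigger_file = '$trigger'
EOF
}
```

A call such as write_recovery_conf "$PGDATA/recovery.conf" 192.168.177.2 /tmp/pgsql.trigger would produce the file shown above plus the trigger_file line. Creating the trigger file on the standby later makes it leave recovery and accept writes.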
Replication user
• A superuser for replication has to be created (replicuser in our
configuration files)
• The SQL command to create it is
CREATE USER replicuser SUPERUSER LOGIN CONNECTION LIMIT 1 ENCRYPTED PASSWORD 'foobar';
pg_hba.conf
• pg_hba.conf is the file that contains the access rules (a kind of
ACL) for PostgreSQL connections
• In that file we add both nodes as 'trusted', and the
replication user as trusted too
pg_hba.conf
hostnossl all all 10.0.10.8/32 trust
hostnossl all all 10.0.10.9/32 trust
hostnossl replication replicuser 192.168.177.2/24 trust
Setting up a slave
• You have to run a few commands on the master when
you add a new standby server
Adding a standby server
psql -c "SELECT pg_start_backup('label', true)"
rsync -a "${PGDATA}/" standby:/srv/pgsql/standby/ --exclude postmaster.pid --exclude '*-master' --exclude '*-slave'
psql -c "SELECT pg_stop_backup()"
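If the rsync step fails, the master stays in backup mode until pg_stop_backup() is called. One way to guard against that is to wrap the three commands in a function that always runs the last one. This is only a sketch: the function name base_backup and the RUN hook (set RUN=echo for a dry run) are assumptions added for illustration.

```shell
#!/bin/bash
# Copy the master's data directory to a new standby, making sure that
# pg_stop_backup() is called even when rsync fails.
# Assumption: RUN is a hypothetical hook; RUN=echo prints the commands
# instead of executing them.
RUN=${RUN:-}

base_backup() {
    target=$1   # e.g. standby:/srv/pgsql/standby/
    $RUN psql -c "SELECT pg_start_backup('label', true)" || return 1
    $RUN rsync -a "${PGDATA}/" "$target" \
        --exclude postmaster.pid --exclude '*-master' --exclude '*-slave'
    status=$?
    # Always leave backup mode, then report the rsync result.
    $RUN psql -c "SELECT pg_stop_backup()"
    return $status
}
```

Running RUN=echo with PGDATA set and then base_backup standby:/srv/pgsql/standby/ prints the three commands without touching the database, which is a cheap way to review them first.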
Corosync configuration
• The goal of Corosync is to switch the master and slave roles
when needed
• It ensures that a master is online and connected to the
router
• The two servers are connected to each other on eth1
• Corosync is installed by Puppet
• We take it from the Clusterlabs repositories
• We use a custom master/slave OCF resource to manage
the PostgreSQL M/S
crm.conf
The main configuration file of Corosync is
/etc/corosync/crm.conf. It contains all the
resources, nodes, etc.
Defining the nodes
node babar.interne.arsia.be \
    attributes standby="off"
node dumbo.interne.arsia.be \
    attributes standby="off"
In this code, the two nodes are defined, and we tell Corosync that
neither of them is in standby, so both are active at launch.
crm.conf
Defining the primitives
primitive pgsql ocf:inuits:pgsql-ms
primitive virt_ip ocf:heartbeat:IPaddr2 \
    params nic="eth0" iflabel="0" ip="10.0.10.10" cidr_netmask="24" broadcast="10.0.10.255" \
    meta target-role="Started" is-managed="true"
primitive ping ocf:pacemaker:ping \
    params host_list="10.0.10.1" \
    op monitor interval="10s" timeout="10s" \
    op start interval="0" timeout="45s" \
    op stop interval="0" timeout="50s"
• We define 3 primitives:
• pgsql, the PostgreSQL primitive
• virt_ip, the floating IP address
• ping, the primitive that will check that the servers are
connected to the router
crm.conf
Configuring the primitives
ms pgsql-ms pgsql \
    params pgsqlconfig="/var/lib/pgsql/data/postgresql.conf" \
        lsb_script="/etc/init.d/postgresql-9.0" \
        pgsqlrecovery="/var/lib/pgsql/data/recovery.conf" \
    meta clone-max="2" clone-node-max="1" master-max="1" master-node-max="1" notify="false"
clone clone-ping ping \
    meta globally-unique="false"
• We configure the PostgreSQL M/S: the init script, the
configuration files. . .
• We also configure the ping resource as a clone (it will be
launched on both servers)
crm.conf
Defining the constraints and cluster properties
group PSQL virt_ip
location connected PSQL \
    rule $id="connected-rule" -inf: not_defined pingd or pingd lte 0
colocation ip_psql inf: PSQL pgsql-ms:Master
property $id="cib-bootstrap-options" \
    cluster-infrastructure="openais" \
    expected-quorum-votes="2" \
    stonith-enabled="false" \
    no-quorum-policy="ignore" \
    default-resource-stickiness="INFINITY"
rsc_defaults $id="rsc_defaults-options" \
    migration-threshold="INFINITY" \
    failure-timeout="10" \
    resource-stickiness="INFINITY"
• These lines will ensure that the master is always on the same
node as the floating IP address
• And also that the master is connected to the router
OCF resource
• There is a custom OCF resource to manage the master/slave
PostgreSQL
• It is based on an example resource written by Andrew
Beekhof from Clusterlabs
• The file has to be in
/usr/lib/ocf/resource.d/inuits/pgsql-ms
OCF resource
• The script does the following:
• It moves postgresql.conf-master to
postgresql.conf when a node is promoted to master
• It moves postgresql.conf-slave to postgresql.conf
when a node is demoted to slave
• It ensures that recovery.conf-slave is in place as recovery.conf
on the slave and absent on the master
• It starts/restarts PostgreSQL when needed.
• I will post that file on GitHub soon
Backups of the databases
• Sometimes, you need backups (especially when you don't have
any...)
• We do one backup per hour on each node (one at minute 0 and
one at minute 30)
• We do one backup per day on each node
• We do one more backup per day on each node, just before the
BackupPC run
• We keep 24 hourly backups and 7 daily backups on disk
• With BackupPC we keep months of backups
Hourly backup script
/usr/local/bin/backup_hourly.sh
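#!/bin/bash
# Dump each database into a file named after the current hour (00-23),
# so each file is overwritten 24 hours later.
DATE=$(date +%H)
BACKUP_PATH=/var/lib/backups/hourly
for db in foobar_db foobar2_db
do
    /usr/bin/pg_dump "$db" | gzip > "$BACKUP_PATH/${db}_$DATE.pgsql.gz"
    # Keep a stable name pointing at the most recent dump
    ln -fs "$BACKUP_PATH/${db}_$DATE.pgsql.gz" "$BACKUP_PATH/${db}_current.pgsql.gz"
done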
The daily script is almost the same.
BackupPC script
/usr/local/bin/backup_backuppc.sh
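#!/bin/bash
# Dump each database into a file named after the day of the week (1-7),
# so each file is overwritten one week later.
DATE=$(date +%u)
BACKUP_PATH=/var/lib/backups/backuppc
for db in cerise trackitquality trackit zodb_cerise
do
    /usr/bin/pg_dump -U postgres "$db" | gzip > "$BACKUP_PATH/${db}_$DATE.pgsql.gz"
    # Keep a stable name pointing at the most recent dump
    ln -fs "$BACKUP_PATH/${db}_$DATE.pgsql.gz" "$BACKUP_PATH/${db}_current.pgsql.gz"
done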
In the BackupPC config, I added the following:
BackupPC config
$Conf{DumpPreUserCmd} = '$sshPath -t -q -x -l backuppc $host /usr/local/bin/backup_backuppc.sh';
Check hot_standby latency
• The check_postgres.pl script has a check for hot_standby
delay
• But the script must be told which node is the master and which
is the slave, and we do not know that in advance
• So, here is a bash script I wrote to find out the M/S order
Master/slave replication check
#!/bin/bash
# $1 is the database name. Pacemaker is asked which node is the master
# and which is the slave; "master,slave" is passed as the host list.
/usr/lib64/nagios/plugins/check_postgres.pl --db="$1" \
    --action hot_standby_delay -w 300 -c 600 --host="$(
        crm_resource --resource pgsql-ms --locate |
        awk '/Master/ {master=$6} / $/ {slave=$6} END {print master","slave}'
    )"
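The awk part of the check can be tried without a cluster. In the sketch below, the two sample lines are an assumption reconstructed from the awk patterns (a master line ending in "Master", a slave line ending in a space), not output captured from a real cluster. The function names are hypothetical.

```shell
#!/bin/bash
# Sample input in the shape the awk program expects. Assumption: this
# imitates the output of "crm_resource --resource pgsql-ms --locate".
sample_locate_output() {
    printf '%s\n' \
        'resource pgsql-ms is running on: babar.interne.arsia.be Master' \
        'resource pgsql-ms is running on: dumbo.interne.arsia.be '
}

# Same awk program as in the check: field 6 is the node name.
master_slave_pair() {
    awk '/Master/ {master=$6} / $/ {slave=$6} END {print master","slave}'
}

# Prints: babar.interne.arsia.be,dumbo.interne.arsia.be
sample_locate_output | master_slave_pair
```

The master always comes first in the result, whatever the order of the input lines, which is the order the hot_standby_delay action is given in the check.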
Puppet module
• The Puppet postgres module is forked from Kris Buytaert's
GitHub page
• It is modified to remove all references to services, because we
want Corosync to manage them
• It creates the users, the superusers and the databases
• It is a parameterized class, with a "cluster" parameter, so we
can also install a standalone PostgreSQL
• The cache sizes are parameterized too, so we can also use it
in Vagrant boxes
• Here are some examples from the module, which I will upload to
GitHub ASAP
Class postgres
The postgres class installs the packages and runs initdb.
init.pp
class postgres (
  $cluster = 'no',
  $running_ip = '127.0.0.1'
){ ...
• The cluster parameter indicates whether we want clustering or not
• running_ip is used for the SQL commands. In case of a
cluster, you have to put the cluster's IP address here.
Example in the node file
Here is the result in the node file:
dumbo.pp
node babar {
  class {
    'postgres':
      cluster    => 'yes',
      running_ip => '10.0.10.10',
  }
  include postgres::munin
  include postgres::backup
  include cluster::node
  postgres::config{
    $::fqdn: listen => '*',
  }
  postgres::hba {
    $::fqdn:
      allowedrules => [
        "host all all ${::ipaddress}/32 trust",
        'hostnossl all all 10.0.10.8/32 trust',
        'hostnossl all all 10.0.10.9/32 trust',
        'hostnossl all all 10.0.10.10/32 trust',
        'hostnossl replication replicuser 192.168.177.2/24 trust',
      ],
  }
  postgres::createsuperuser{
    'replicuser':
      passwd => 'foobar',
  }
  postgres::createuser{
    'cerise':
      passwd => 'foobar',
  }
  postgres::createdb{
    'zodb_cerise':
      owner   => 'cerise',
      require => Postgres::Createuser['cerise'],
  }
}
#TODO
• The first synchronisation is not puppetized
• More advanced checks on the database #monitoringsucks
(e.g. slow queries)
• A disaster recovery procedure
• Improve the OCF script
• Check the content of the backups
• ...
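As a possible first step for the "check the content of the backups" item, a dump file can at least be tested for integrity and completeness. This is a sketch, not something the current setup does. It relies on plain-text pg_dump output ending with a "PostgreSQL database dump complete" comment; verify_backup is a hypothetical name.

```shell
#!/bin/bash
# Return 0 if a gzipped pg_dump file looks complete, non-zero otherwise.
# Checks: the file is non-empty, the gzip stream is intact, and the
# dump ends with pg_dump's closing comment.
verify_backup() {
    file=$1
    [ -s "$file" ] || return 1
    gzip -t "$file" 2>/dev/null || return 1
    gzip -dc "$file" | tail -n 5 | grep -q 'PostgreSQL database dump complete'
}
```

For example, verify_backup /var/lib/backups/hourly/foobar_db_current.pgsql.gz could run from the backup cron job or from a monitoring check. It does not replace a real test restore, which is the only way to be sure that a dump can be loaded.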