SlideShare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our User Agreement and Privacy Policy.
SlideShare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our Privacy Policy and User Agreement for details.
Successfully reported this slideshow.
Activate your 30 day free trial to unlock unlimited reading.
1.
;
PostgreSQL 9.0 HA
Julien Pivotto
April, 1 2012 @ Loadays
2.
;
Overview
PostgreSQL 9.0
Clustering
Backups
Monitoring
Automation
The end
The mission
Before the migration
Table of content
1 Overview
The mission
Before the migration
2 PostgreSQL 9.0
Intro
Streaming replication
Master configuration
Slave configuration
PostgreSQL specific tricks
Setting up a slave
3 Clustering
Set up of corosync
OCF resource
4 Backups
Cron jobs
BackupPC
5 Monitoring
Nagios
Munin
6 Automation
Puppet module
The node file
#TODO
7 The end
Julien Pivotto PostgreSQL 9.0 HA
3.
;
Overview
PostgreSQL 9.0
Clustering
Backups
Monitoring
Automation
The end
The mission
Before the migration
Who am I
• Julien Pivotto
• Consultant at Inuits since May 2011
• FOSS defender since 2005
Julien Pivotto PostgreSQL 9.0 HA
4.
;
Overview
PostgreSQL 9.0
Clustering
Backups
Monitoring
Automation
The end
The mission
Before the migration
A.R.S.I.A.
• Association Régionale de Santé et d’Identification Animales
• 30 linux servers in several locations
• A lot of Open Source
• CentOS, Samba, Open-xchange, mailscanner, Cyrus,
• . . . Puppet, jenkins, foreman, OpenVPN, GLPI, rabbitmq,
• . . . BackupPC, CUPS, icinga, trac, zope, plone,
• . . . solr, pentaho, funambol, munin, squid, asterisk,
• . . . and PostgreSQL, . . .
Julien Pivotto PostgreSQL 9.0 HA
5.
;
Overview
PostgreSQL 9.0
Clustering
Backups
Monitoring
Automation
The end
The mission
Before the migration
C.E.R.I.S.E
• A web application
• Plone (python)
• 15k+ visits, 500k+ pages and 2.000.000+ hits each month
• Developped by Affinitic
• Several databases
• PostgreSQL 9.0
• Oracle database
• Several servers/services
• Two reverse proxies in failover HA
• Two application servers in load balancing HA
• Two PostgreSQL servers in failover HA
• An oracledb server
• A development server
• A pentaho server
• Being integrated in jenkins (to be continued. . . )
Julien Pivotto PostgreSQL 9.0 HA
6.
;
Overview
PostgreSQL 9.0
Clustering
Backups
Monitoring
Automation
The end
The mission
Before the migration
PostgreSQL before the migration
• PostgreSQL 8.3.7
• No native support of HA
• High availability with heartbeat 2 and DRBD
• Installed on the application servers
• Nothing automated
• Failover: Passive node is not even read only
• Installed in November 2008
Julien Pivotto PostgreSQL 9.0 HA
7.
;
Overview
PostgreSQL 9.0
Clustering
Backups
Monitoring
Automation
The end
The mission
Before the migration
Monitoring before the installation
• Icinga
• Check of the DRBD
• Simple connection check to PostgreSQL
• Graphing with Cacti
• Size of the databases
• Connexions to the database
• Checkpoints
Julien Pivotto PostgreSQL 9.0 HA
8.
;
Overview
PostgreSQL 9.0
Clustering
Backups
Monitoring
Automation
The end
The mission
Before the migration
Backups before the installation
• Backups were done every hour one the same machine
• External backups once a day on disk and on tape
• Backups are made with pg_dump command
• BackupPC get those files
Julien Pivotto PostgreSQL 9.0 HA
9.
;
Overview
PostgreSQL 9.0
Clustering
Backups
Monitoring
Automation
The end
Intro
Streaming replication
Master configuration
Slave configuration
PostgreSQL specific tricks
Setting up a slave
PostgreSQL 9.0
• PostgreSQL 9.0 was out in september 2010
• It brings to the world native replication in PostgresSQL
• There is not any native failover tool
• So we need to use PostgreSQL + Corosync
• The setup of PostgreSQL is managed by Puppet
Julien Pivotto PostgreSQL 9.0 HA
10.
;
Overview
PostgreSQL 9.0
Clustering
Backups
Monitoring
Automation
The end
Intro
Streaming replication
Master configuration
Slave configuration
PostgreSQL specific tricks
Setting up a slave
Write-Ahead Logging
• It means that every change to datafile must first be written
into a log file
• Less disk writes: only the log file needs to be flushed to disk to
guarantee that a transaction is committed, rather than every
data file changed by the transaction
Julien Pivotto PostgreSQL 9.0 HA
11.
;
Overview
PostgreSQL 9.0
Clustering
Backups
Monitoring
Automation
The end
Intro
Streaming replication
Master configuration
Slave configuration
PostgreSQL specific tricks
Setting up a slave
What is streaming replication
• Streaming replication provides the capability to ship and apply
WAL XLOGS to standby servers
• It’s possible to have multiple standby servers
• Standby servers can be read-only ("Hot standby")
Julien Pivotto PostgreSQL 9.0 HA
12.
;
Overview
PostgreSQL 9.0
Clustering
Backups
Monitoring
Automation
The end
Intro
Streaming replication
Master configuration
Slave configuration
PostgreSQL specific tricks
Setting up a slave
DisadvantagesSpecifications of streaming replication
• Streaming replication supports only asynchronous log-shipping
• But when the database is used, the delay is close to
synchronous log-shipping
• Adding a standby server requires manual action
• But in our case we will only have one standby server
• PostgreSQL does not provide HA feature
• But Corosync does
• It is a single-threaded replication
• It is a single-threaded replication
Julien Pivotto PostgreSQL 9.0 HA
13.
;
Overview
PostgreSQL 9.0
Clustering
Backups
Monitoring
Automation
The end
Intro
Streaming replication
Master configuration
Slave configuration
PostgreSQL specific tricks
Setting up a slave
Master configuration
The master only needs one configuration file.
Configuration non-related to SR
#Postgresql configuration
#http://www.postgresql.org/docs/9.0/interactive/index.html
listen_addresses = ’*’
max_connections = 200
shared_buffers = 4096MB
work_mem = 4096MB
effective_cache_size = 10024MB
commit_delay = 100000
effective_cache_size = 2560
log_destination = ‘stderr’
log_directory = ‘pg_log’
logging_collector = on
log_filename = ‘postgresql-%Y-%m-%d_%H%M%S.log’
log_truncate_on_rotation = on
log_rotation_age = 1d
log_rotation_size = 0
log_min_messages = notice
log_min_duration_statement = 1000
log_line_prefix = ‘%t %u ’
log_statement = ‘none’
datestyle = ‘iso, dmy’
Julien Pivotto PostgreSQL 9.0 HA
14.
;
Overview
PostgreSQL 9.0
Clustering
Backups
Monitoring
Automation
The end
Intro
Streaming replication
Master configuration
Slave configuration
PostgreSQL specific tricks
Setting up a slave
Master configuration
Configuration related to SR
wal_level = hot_standby
max_wal_senders = 2
wal_keep_segments = 128
Julien Pivotto PostgreSQL 9.0 HA
15.
;
Overview
PostgreSQL 9.0
Clustering
Backups
Monitoring
Automation
The end
Intro
Streaming replication
Master configuration
Slave configuration
PostgreSQL specific tricks
Setting up a slave
Master configuration
• wal_level = hot_standby
Allows stanby server to be readable
• max_wal_senders = 2
We allow up to 2 standby nodes
• wal_keep_segments = 128
The minimum wal segments to keep
Julien Pivotto PostgreSQL 9.0 HA
16.
;
Overview
PostgreSQL 9.0
Clustering
Backups
Monitoring
Automation
The end
Intro
Streaming replication
Master configuration
Slave configuration
PostgreSQL specific tricks
Setting up a slave
Slave configuration
• The slave requires at least two configuration files
• A postgreSQL.conf file
• A recovery.conf file, used to apply the WAL XLOGS shipped by
the master
• A trigger file to stop replication can be specified
PostgreSQL.conf - Configuration related to SR
wal_level = hot_standby
hot_standby = on
Note that the file also have the same first part of the config file
than the master configuration.
Julien Pivotto PostgreSQL 9.0 HA
17.
;
Overview
PostgreSQL 9.0
Clustering
Backups
Monitoring
Automation
The end
Intro
Streaming replication
Master configuration
Slave configuration
PostgreSQL specific tricks
Setting up a slave
Slave configuration
recovery.conf
standby_mode = ‘on’
primary_conninfo = ‘host=192.168.177.2 user=replicuser’
• standby_mode means that this is a standby server
• primary_conninfo is the connection to the master
Julien Pivotto PostgreSQL 9.0 HA
18.
;
Overview
PostgreSQL 9.0
Clustering
Backups
Monitoring
Automation
The end
Intro
Streaming replication
Master configuration
Slave configuration
PostgreSQL specific tricks
Setting up a slave
Replication user
• A super user called replication has to be created
• The SQL command to create it is
CREATE USER replication SUPERUSER LOGIN CONNECTION
LIMIT 1 ENCRYPTED PASSWORD ‘foobar’;
Julien Pivotto PostgreSQL 9.0 HA
19.
;
Overview
PostgreSQL 9.0
Clustering
Backups
Monitoring
Automation
The end
Intro
Streaming replication
Master configuration
Slave configuration
PostgreSQL specific tricks
Setting up a slave
pg_hba.conf
• pg_hba.conf is the file that contains some kind of ACLs for
the PostgreSQL connections
• In that file we will add both nodes as ‘trusted’ and the
replication user as trusted too
pg_hba.conf
hostnossl all all 10.0.10.8/32 trust
hostnossl all all 10.0.10.9/32 trust
hostnossl replication replicuser 192.168.177.2/24 trust
Julien Pivotto PostgreSQL 9.0 HA
20.
;
Overview
PostgreSQL 9.0
Clustering
Backups
Monitoring
Automation
The end
Intro
Streaming replication
Master configuration
Slave configuration
PostgreSQL specific tricks
Setting up a slave
Setting up a slave
• You have to type a bunch of commands on the master when
you add a new standby server
Adding a standby server
psql -c "SELECT pg_start_backup(’label’, true)"
rsync -a ${PGDATA}/ standby:/srv/pgsql/standby/ --exclude postmaster.pid --exclude ‘*-master’
--exclude ‘*-slave’
psql -c "SELECT pg_stop_backup()"
Julien Pivotto PostgreSQL 9.0 HA
21.
;
Overview
PostgreSQL 9.0
Clustering
Backups
Monitoring
Automation
The end
Set up of corosync
OCF resource
Corosync configuration
• The goal of corosync is to make the switch between
master/slave when needed
• It will ensure that a master is online and connected to the
router
• The two servers are connected to each other on eth1
• Corosync is installed by Puppet
• We take it from the clusterlabs repositories
• We use a personalized master/slave ocf resource to manage
the PostgreSQL M/S
Julien Pivotto PostgreSQL 9.0 HA
22.
;
Overview
PostgreSQL 9.0
Clustering
Backups
Monitoring
Automation
The end
Set up of corosync
OCF resource
crm.conf
The main configuration file of corosync is
/etc/corosync/crm.conf. It contains all the
resources/nodes/etc. . .
Defining the nodes
node babar.interne.arsia.be
attributes standby="off"
node dumbo.interne.arsia.be
attributes standby="off"
In this code, the two nodes are defined, and we tell corosync that
they should be started at launch.
Julien Pivotto PostgreSQL 9.0 HA
23.
;
Overview
PostgreSQL 9.0
Clustering
Backups
Monitoring
Automation
The end
Set up of corosync
OCF resource
crm.conf
Defining the primitives
primitive pgsql ocf:inuits:pgsql-ms
primitive virt_ip ocf:heartbeat:IPaddr2
params nic="eth0" iflabel="0" ip="10.0.10.10" cidr_netmask="24" broadcast="10.0.10.255"
meta target-role="Started" is-managed="true"
primitive ping ocf:pacemaker:ping
params host_list="10.0.10.1"
op monitor interval="10s" timeout="10s"
op start interval="0" timeout="45s"
op stop interval="0" timeout="50s"
• We define 3 primitives:
• pgsql, the PostgreSQL primitive
• virt_ip, the floating IP address
• ping, the primitive that will check that the servers are
connected to the router
Julien Pivotto PostgreSQL 9.0 HA
24.
;
Overview
PostgreSQL 9.0
Clustering
Backups
Monitoring
Automation
The end
Set up of corosync
OCF resource
crm.conf
Configuring the primitives
ms pgsql-ms pgsql
params pgsqlconfig="/var/lib/pgsql/data/postgresql.conf"
lsb_script="/etc/init.d/postgresql-9.0"
pgsqlrecovery="/var/lib/pgsql/data/recovery.conf"
meta clone-max="2" clone-node-max="1" master-max="1" master-node-max="1" notify="false"
clone clone-ping ping
meta globally-unique="false"
• We configure the PostgreSQL M/S: the init script, the
configuration files. . .
• We also configure the ping resource as a clone (it will be
launched on both servers)
Julien Pivotto PostgreSQL 9.0 HA
25.
;
Overview
PostgreSQL 9.0
Clustering
Backups
Monitoring
Automation
The end
Set up of corosync
OCF resource
crm.conf
Defining the nodes
group PSQL virt_ip
location connected PSQL
rule $id="connected-rule" -inf: not_defined pingd or pingd lte 0
colocation ip_psql inf: PSQL pgsql-ms:Master
property $id="cib-bootstrap-options"
cluster-infrastructure="openais"
expected-quorum-votes="2"
stonith-enabled="false"
no-quorum-policy="ignore"
default-resource-stickiness="INFINITY"
rsc_defaults $id="rsc_defaults-options"
migration-threshold="INFINITY"
failure-timeout="10"
resource-stickiness="INFINITY"
• These lines will ensure that the master is always on the same
node as the floating IP address
• And also that the master is connected to the router
Julien Pivotto PostgreSQL 9.0 HA
26.
;
Overview
PostgreSQL 9.0
Clustering
Backups
Monitoring
Automation
The end
Set up of corosync
OCF resource
OCF resource
• There is a custom OCF resource to manage the master/slave
PostgreSQL
• It is based on an example of resource written by Andrew
Beekhof from Clusterlabs
• The file has to be in
/usr/lib/ocf/resource.d/inuits/pgsql-ms
Julien Pivotto PostgreSQL 9.0 HA
27.
;
Overview
PostgreSQL 9.0
Clustering
Backups
Monitoring
Automation
The end
Set up of corosync
OCF resource
OCF resource
• The script does the following:
• It moves the postgresql.conf-master to
postgresql.conf when a node is promoted/master
• It moves the postgresql.conf-slave to postgresql.conf
when a node is depromoted/slave
• It ensure that recovery.conf-slave is on recovery.conf
on slave and absent on master
• It starts/restarts PostgreSQL when needed.
• I will post that file on Github soon
Julien Pivotto PostgreSQL 9.0 HA
28.
;
Overview
PostgreSQL 9.0
Clustering
Backups
Monitoring
Automation
The end
Cron jobs
BackupPC
Backups of the databases
• Sometimes, you need backups (especially when you don’t have
backups. . . )
• We do a backup per hour on each node (one at minute 0 and
one at minute 30)
• We do a backup per day on each node
• We do a backup per day on before BackupPC backup on each
node.
• We keep 24 hourly backups and 7 daily backups on disk
• With BackupPC we keep months of backups
Julien Pivotto PostgreSQL 9.0 HA
29.
;
Overview
PostgreSQL 9.0
Clustering
Backups
Monitoring
Automation
The end
Cron jobs
BackupPC
Hourly backup script
/usr/local/bin/backup_hourly.sh
#!/bin/bash
DATE=$(date +%H)
BACKUP_PATH=/var/lib/backups/hourly
for db in foobar_db foobar2_db
do
/usr/bin/pg_dump $db | gzip > $BACKUP_PATH/${db}_$DATE.pgsql.gz
ln -fs $BACKUP_PATH/${db}_$DATE.pgsql.gz $BACKUP_PATH/${db}_current.pgsql.gz
done
The daily script is almost the same.
Julien Pivotto PostgreSQL 9.0 HA
30.
;
Overview
PostgreSQL 9.0
Clustering
Backups
Monitoring
Automation
The end
Cron jobs
BackupPC
BackupPC script
/usr/local/bin/backup_backuppc.sh
#!/bin/bash
DATE=$(date +%u)
BACKUP_PATH=/var/lib/backups/backuppc
for db in cerise trackitquality trackit zodb_cerise
do
/usr/bin/pg_dump -U postgres $db | gzip > $BACKUP_PATH/${db}_$DATE.pgsql.gz
ln -fs $BACKUP_PATH/${db}_$DATE.pgsql.gz $BACKUP_PATH/${db}_current.pgsql.gz
done
In the backupPC config, I added the following:
BackupPC config
$Conf{DumpPreUserCmd} = ‘$sshPath -t -q -x -l backuppc $host /usr/local/bin/backup_backuppc.sh’;
Julien Pivotto PostgreSQL 9.0 HA
31.
;
Overview
PostgreSQL 9.0
Clustering
Backups
Monitoring
Automation
The end
Nagios
Munin
check_postgres script
• The check_postgres.pl is a nagios-compatible perl script
• Available on http://www.bucardo.org/check_postgres/
and on Github
• What we check with it:
• The current connections
• The status of the replication (the delay)
Julien Pivotto PostgreSQL 9.0 HA
32.
;
Overview
PostgreSQL 9.0
Clustering
Backups
Monitoring
Automation
The end
Nagios
Munin
Check hot_standby latency
• The check_postgres.pl script has a check for hot_standby
delay
• But we do not know who is the master and the slave, and it is
required to launch the script
• So, here is a bash script I wrote to know the M/S order
Master/slave replication check
#!/bin/bash
/usr/lib64/nagios/plugins/check_postgres.pl --db="$1"
--action hot_standby_delay -w 300 -c 600 --host=$(
crm_resource --resource pgsql-ms --locate|
awk ‘/Master/ {master=$6} / $/ {slave=$6} END {print master","slave}’
)
Julien Pivotto PostgreSQL 9.0 HA
33.
;
Overview
PostgreSQL 9.0
Clustering
Backups
Monitoring
Automation
The end
Nagios
Munin
Munin postgres scripts
• Munin is shipped with perl plugins for postgresql
• We use four of them:
• postgres_size,
• postgres_checkpoints,
• postgres_connections_db,
• postgres_cache
Julien Pivotto PostgreSQL 9.0 HA
34.
;
Overview
PostgreSQL 9.0
Clustering
Backups
Monitoring
Automation
The end
Nagios
Munin
Munin postgres scripts
Julien Pivotto PostgreSQL 9.0 HA
35.
;
Overview
PostgreSQL 9.0
Clustering
Backups
Monitoring
Automation
The end
Puppet module
The node file
#TODO
Puppet module
• The puppet postgres module is forked from Kris Buytaert’s
github page
• It is modified to remove all references to services, because we
want corosync to manage them
• It creates the users, the super users, the databases
• It is a parameterized class, with a "cluster" parameter. So we
can also install simple PostgreSQL
• The cache sizes are parameterized too, so we can also use that
in Vagrant boxes
• Here are some examples from the module I will upload on
Github ASAP
Julien Pivotto PostgreSQL 9.0 HA
36.
;
Overview
PostgreSQL 9.0
Clustering
Backups
Monitoring
Automation
The end
Puppet module
The node file
#TODO
Class postgres
The postgres class installs the packages and makes the initdb stuff.
init.pp
class postgres (
$cluster = ‘no’,
$running_ip = ‘127.0.0.1’
){ ...
• The cluster parameter indicates if we want or not clustering
• running_ip is used for the SQL commands. In case of a
cluster, you have to put cluste’s IP address here.
Julien Pivotto PostgreSQL 9.0 HA
38.
;
Overview
PostgreSQL 9.0
Clustering
Backups
Monitoring
Automation
The end
Puppet module
The node file
#TODO
Example in the node file
Here is the result in the node file:
dumbo.pp
node babar {
class {
’postgres’:
cluster => ’yes’,
running_ip => ’10.0.10.10’,
}
include postgres::munin
include postgres::backup
include cluster::node
postgres::config{
$::fqdn: listen => ’*’,
}
Julien Pivotto PostgreSQL 9.0 HA
39.
;
Overview
PostgreSQL 9.0
Clustering
Backups
Monitoring
Automation
The end
Puppet module
The node file
#TODO
Example in the node file
dumbo.pp
postgres::hba {
$::fqdn:
allowedrules => [
"host all all $::ipaddress/32 trust",
’hostnossl all all 10.0.10.8/32 trust’,
’hostnossl all all 10.0.10.9/32 trust’,
’hostnossl all all 10.0.10.10/32 trust’,
’hostnossl replication replicuser 192.168.177.2/24 trust’,
],
}
Julien Pivotto PostgreSQL 9.0 HA
40.
;
Overview
PostgreSQL 9.0
Clustering
Backups
Monitoring
Automation
The end
Puppet module
The node file
#TODO
Example in the node file
dumbo.pp
postgres::createsuperuser{
’replicuser’:
passwd => ’foobar’,
}
postgres::createuser{
’cerise’:
passwd => ’foobar’;
}
postgres::createdb{
’zodb_cerise’:
owner => ’cerise’,
require => Postgres::Createuser[’cerise’],
}
}
Julien Pivotto PostgreSQL 9.0 HA
41.
;
Overview
PostgreSQL 9.0
Clustering
Backups
Monitoring
Automation
The end
Puppet module
The node file
#TODO
#TODO
• The first synchronisation is not puppetized
• More advanced checks on the database #monitoringsucks
(e.g. slow queries)
• A disaster recovery
• Improve the ocf script
• Check the content of the backups
• . . .
Julien Pivotto PostgreSQL 9.0 HA
42.
;
Overview
PostgreSQL 9.0
Clustering
Backups
Monitoring
Automation
The end
Any questions?
Julien Pivotto PostgreSQL 9.0 HA