PostgreSQL 9.0 HA at LOADAYS 2012
 


    • PostgreSQL 9.0 HA
      Julien Pivotto
      April 1, 2012 @ Loadays
    • Table of contents
      1. Overview: The mission; Before the migration
      2. PostgreSQL 9.0: Intro; Streaming replication; Master configuration; Slave configuration; PostgreSQL-specific tricks; Setting up a slave
      3. Clustering: Setup of Corosync; OCF resource
      4. Backups: Cron jobs; BackupPC
      5. Monitoring: Nagios; Munin
      6. Automation: Puppet module; The node file; #TODO
      7. The end
    • Who am I
      • Julien Pivotto
      • Consultant at Inuits since May 2011
      • FOSS defender since 2005
    • A.R.S.I.A.
      • Association Régionale de Santé et d'Identification Animales
      • 30 Linux servers in several locations
      • A lot of Open Source:
        • CentOS, Samba, Open-Xchange, MailScanner, Cyrus, ...
        • Puppet, Jenkins, Foreman, OpenVPN, GLPI, RabbitMQ, ...
        • BackupPC, CUPS, Icinga, Trac, Zope, Plone, ...
        • Solr, Pentaho, Funambol, Munin, Squid, Asterisk, ...
        • ... and PostgreSQL, ...
    • C.E.R.I.S.E.
      • A web application
        • Plone (Python)
        • 15k+ visits, 500k+ pages and 2,000,000+ hits each month
        • Developed by Affinitic
      • Several databases
        • PostgreSQL 9.0
        • Oracle database
      • Several servers/services
        • Two reverse proxies in failover HA
        • Two application servers in load-balancing HA
        • Two PostgreSQL servers in failover HA
        • An Oracle DB server
        • A development server
        • A Pentaho server
      • Being integrated into Jenkins (to be continued...)
    • PostgreSQL before the migration
      • PostgreSQL 8.3.7
      • No native support for HA
      • High availability with Heartbeat 2 and DRBD
      • Installed on the application servers
      • Nothing automated
      • Failover only: the passive node is not even read-only
      • Installed in November 2008
    • Monitoring before the installation
      • Icinga
        • Check of the DRBD
        • Simple connection check to PostgreSQL
      • Graphing with Cacti
        • Size of the databases
        • Connections to the database
        • Checkpoints
    • Backups before the installation
      • Backups were taken every hour on the same machine
      • External backups once a day, on disk and on tape
      • Backups are made with the pg_dump command
      • BackupPC fetches those files
    • PostgreSQL 9.0
      • PostgreSQL 9.0 was released in September 2010
      • It brings native replication to PostgreSQL
      • There is no native failover tool
      • So we use PostgreSQL + Corosync
      • The setup of PostgreSQL is managed by Puppet
    • Write-Ahead Logging
      • Every change to a data file must first be written to a log file
      • Fewer disk writes: only the log file needs to be flushed to disk to guarantee that a transaction is committed, rather than every data file changed by the transaction
    • What is streaming replication
      • Streaming replication ships WAL records (XLOG) to standby servers, which apply them
      • It is possible to have multiple standby servers
      • Standby servers can be read-only ("hot standby")
    • Specifications of streaming replication
      • Streaming replication supports only asynchronous log shipping
        • But when the database is in use, the delay is close to that of synchronous log shipping
      • Adding a standby server requires manual action
        • But in our case we will only have one standby server
      • PostgreSQL does not provide an HA feature
        • But Corosync does
      • It is single-threaded replication
    • Master configuration
      The master only needs one configuration file.
      Configuration not related to SR:

        # PostgreSQL configuration
        # http://www.postgresql.org/docs/9.0/interactive/index.html
        listen_addresses = '*'
        max_connections = 200
        shared_buffers = 4096MB
        work_mem = 4096MB               # per sort/hash operation; one query may use several
        effective_cache_size = 10024MB
        commit_delay = 100000           # microseconds
        effective_cache_size = 2560     # NOTE: set a second time; the last value read wins
        log_destination = 'stderr'
        log_directory = 'pg_log'
        logging_collector = on
        log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
        log_truncate_on_rotation = on
        log_rotation_age = 1d
        log_rotation_size = 0
        log_min_messages = notice
        log_min_duration_statement = 1000
        log_line_prefix = '%t %u '
        log_statement = 'none'
        datestyle = 'iso, dmy'
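The listing above assigns effective_cache_size twice. PostgreSQL applies the last assignment it reads, so the effective value is 2560 (in 8 kB pages, about 20 MB) and the 10024MB line has no effect. A minimal shell sketch for catching this kind of slip follows; find_duplicate_settings is a hypothetical helper, and it assumes simple "name = value" lines and does not follow include directives.

```shell
#!/usr/bin/env bash
# Sketch: list postgresql.conf parameters that are assigned more than once.
# Assumptions: one "name = value" per line; include directives are not followed;
# a '#' inside a quoted value would be treated as a comment (rare in practice).
find_duplicate_settings() {
  local conf="$1"
  sed -e 's/#.*//' "$conf" |
    awk -F'=' 'NF >= 2 {
      name = tolower($1)        # parameter names are case-insensitive
      gsub(/[ \t]/, "", name)   # drop surrounding whitespace
      if (name != "") print name
    }' |
    sort | uniq -d
}

# Usage: find_duplicate_settings /var/lib/pgsql/data/postgresql.conf
```

Run against the master configuration above, it would report effective_cache_size.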
    • Master configuration
      Configuration related to SR:

        wal_level = hot_standby
        max_wal_senders = 2
        wal_keep_segments = 128
    • Master configuration
      • wal_level = hot_standby: allows the standby server to be readable
      • max_wal_senders = 2: we allow up to 2 standby nodes
      • wal_keep_segments = 128: the minimum number of WAL segments the master keeps for standbys that fall behind
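As a worked example of wal_keep_segments, assuming the default 16 MB WAL segment size: 128 segments keep at least 128 × 16 MB = 2048 MB (2 GB) of WAL in pg_xlog on the master, and that space must be available on disk. A standby that falls further behind than this can no longer catch up by streaming and needs a new base backup, unless WAL archiving is configured (the slides do not show it). wal_keep_mb is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Worked example: minimum WAL kept on the master by wal_keep_segments.
# Assumes the default 16 MB segment size (fixed at compile time in 9.0).
wal_keep_mb() {
  local segments="$1" segment_mb="${2:-16}"
  echo $(( segments * segment_mb ))
}

# wal_keep_segments = 128 from the master configuration
echo "wal_keep_segments=128 keeps at least $(wal_keep_mb 128) MB of WAL"
# prints: wal_keep_segments=128 keeps at least 2048 MB of WAL
```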
    • Slave configuration
      • The slave requires at least two configuration files:
        • A postgresql.conf file
        • A recovery.conf file, used to apply the WAL records shipped by the master
      • A trigger file to stop replication can be specified
      postgresql.conf - configuration related to SR:

        wal_level = hot_standby
        hot_standby = on

      Note that the file also contains the same general settings as the master configuration.
    • Slave configuration
      recovery.conf:

        standby_mode = 'on'
        primary_conninfo = 'host=192.168.177.2 user=replicuser'

      • standby_mode means that this is a standby server
      • primary_conninfo is the connection to the master
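The slave configuration mentions an optional trigger file. The sketch below writes the recovery.conf above together with a trigger_file line (a real PostgreSQL 9.0 recovery.conf parameter). Once that file exists, the standby stops replaying WAL and becomes a read/write master. The trigger path and the helper name write_recovery_conf are hypothetical. The OCF resource described later appears to promote nodes by swapping configuration files instead, so here the trigger file would only be a manual escape hatch.

```shell
#!/usr/bin/env bash
# Sketch: write a standby recovery.conf for PostgreSQL 9.0.
# Arguments: data directory, master address, replication user, trigger file path.
# The trigger file path is only an example; creating that file promotes the standby.
write_recovery_conf() {
  local datadir="$1" master_host="$2" repl_user="$3" trigger="$4"
  cat > "$datadir/recovery.conf" <<EOF
standby_mode = 'on'
primary_conninfo = 'host=$master_host user=$repl_user'
trigger_file = '$trigger'
EOF
}

# Usage (hypothetical trigger path):
#   write_recovery_conf /var/lib/pgsql/data 192.168.177.2 replicuser /var/lib/pgsql/failover.trigger
```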
    • Replication user
      • A superuser called replicuser has to be created (in PostgreSQL 9.0, the role used for streaming replication must be a superuser)
      • The SQL command to create it is:

        CREATE USER replicuser SUPERUSER LOGIN CONNECTION LIMIT 1 ENCRYPTED PASSWORD 'foobar';
    • pg_hba.conf
      • pg_hba.conf is the file that contains a kind of ACL for PostgreSQL connections
      • In that file we add both nodes as trusted, and the replication user as trusted too (the keyword replication in the database column matches replication connections)
      pg_hba.conf:

        hostnossl  all          all         10.0.10.8/32      trust
        hostnossl  all          all         10.0.10.9/32      trust
        hostnossl  replication  replicuser  192.168.177.2/24  trust
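Note that 192.168.177.2/24 is a network, not a single host. PostgreSQL compares addresses under the prefix mask, so the replication rule trusts every client in 192.168.177.0/24 (which covers both nodes' eth1 addresses, assuming they share that subnet). The sketch below reproduces that masking logic for IPv4 so a rule can be checked before it is deployed. ip_to_int and cidr_match are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: does an IPv4 address fall inside a pg_hba.conf-style CIDR entry?
# Host bits in the entry are masked away, so 192.168.177.2/24 behaves like 192.168.177.0/24.
ip_to_int() {
  local IFS=.
  local -a o
  read -r -a o <<< "$1"
  echo $(( (o[0] << 24) | (o[1] << 16) | (o[2] << 8) | o[3] ))
}

cidr_match() {
  local ip="$1" net="${2%/*}" bits="${2#*/}"
  # Build a 32-bit mask with the top $bits bits set.
  local mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( ( $(ip_to_int "$ip") & mask ) == ( $(ip_to_int "$net") & mask ) ))
}

# Usage: cidr_match 192.168.177.3 192.168.177.2/24 && echo "rule applies"
```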
    • Setting up a slave
      • You have to type a few commands on the master when you add a new standby server
      Adding a standby server:

        psql -c "SELECT pg_start_backup('label', true)"
        rsync -a ${PGDATA}/ standby:/srv/pgsql/standby/ --exclude postmaster.pid --exclude '*-master' --exclude '*-slave'
        psql -c "SELECT pg_stop_backup()"
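The three commands for adding a standby can be wrapped in one function that always calls pg_stop_backup(), even when rsync fails, so the master is never left in backup mode. This is a minimal sketch. STANDBY and STANDBY_DIR are hypothetical overrides that default to the host and path used above. rsync exit code 24 ("some source files vanished") is treated as success, on the assumption that files disappearing from a live PGDATA during the copy are covered by WAL replay on the standby.

```shell
#!/usr/bin/env bash
# Sketch: clone the master's data directory to a standby (PostgreSQL 9.0).
# Run on the master as the postgres user, with PGDATA set.
# STANDBY / STANDBY_DIR are hypothetical overrides; the defaults match the slide.
clone_to_standby() {
  local standby="${STANDBY:-standby}"
  local target="${STANDBY_DIR:-/srv/pgsql/standby/}"
  local rc

  if [ -z "${PGDATA:-}" ]; then
    echo "PGDATA is not set" >&2
    return 1
  fi

  # 'true' requests an immediate (fast) checkpoint.
  psql -c "SELECT pg_start_backup('label', true)" || return 1

  rsync -a "${PGDATA}/" "${standby}:${target}" \
    --exclude postmaster.pid --exclude '*-master' --exclude '*-slave'
  rc=$?
  # 24 = some source files vanished during the copy (expected on a live cluster).
  [ "$rc" -eq 24 ] && rc=0

  # Always leave backup mode, whatever rsync did.
  psql -c "SELECT pg_stop_backup()" || rc=1
  return "$rc"
}
```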
    • Corosync configuration
      • The goal of Corosync is to switch between master and slave when needed
      • It ensures that a master is online and connected to the router
      • The two servers are connected to each other on eth1
      • Corosync is installed by Puppet
        • We take it from the Clusterlabs repositories
      • We use a custom master/slave OCF resource to manage the PostgreSQL M/S
    • crm.conf
      The main configuration file of Corosync is /etc/corosync/crm.conf. It contains all the resources, nodes, etc.
      Defining the nodes:

        node babar.interne.arsia.be attributes standby="off"
        node dumbo.interne.arsia.be attributes standby="off"

      In this code, the two nodes are defined, and standby="off" tells the cluster that neither node is in standby mode, so both can run resources.
    • crm.conf
      Defining the primitives:

        primitive pgsql ocf:inuits:pgsql-ms
        primitive virt_ip ocf:heartbeat:IPaddr2 \
            params nic="eth0" iflabel="0" ip="10.0.10.10" cidr_netmask="24" broadcast="10.0.10.255" \
            meta target-role="Started" is-managed="true"
        primitive ping ocf:pacemaker:ping \
            params host_list="10.0.10.1" \
            op monitor interval="10s" timeout="10s" \
            op start interval="0" timeout="45s" \
            op stop interval="0" timeout="50s"

      • We define 3 primitives:
        • pgsql, the PostgreSQL primitive
        • virt_ip, the floating IP address
        • ping, the primitive that checks that the servers are connected to the router
    • crm.conf
      Configuring the primitives:

        ms pgsql-ms pgsql \
            params pgsqlconfig="/var/lib/pgsql/data/postgresql.conf" lsb_script="/etc/init.d/postgresql-9.0" pgsqlrecovery="/var/lib/pgsql/data/recovery.conf" \
            meta clone-max="2" clone-node-max="1" master-max="1" master-node-max="1" notify="false"
        clone clone-ping ping \
            meta globally-unique="false"

      • We configure the PostgreSQL M/S: the init script, the configuration files...
      • We also configure the ping resource as a clone (it will run on both servers)
    • crm.conf
      Defining the constraints and cluster properties:

        group PSQL virt_ip
        location connected PSQL \
            rule $id="connected-rule" -inf: not_defined pingd or pingd lte 0
        colocation ip_psql inf: PSQL pgsql-ms:Master
        property $id="cib-bootstrap-options" \
            cluster-infrastructure="openais" \
            expected-quorum-votes="2" \
            stonith-enabled="false" \
            no-quorum-policy="ignore" \
            default-resource-stickiness="INFINITY"
        rsc_defaults $id="rsc_defaults-options" \
            migration-threshold="INFINITY" \
            failure-timeout="10" \
            resource-stickiness="INFINITY"

      • These lines ensure that the master always runs on the same node as the floating IP address
      • And also that the master is connected to the router
    • OCF resource
      • There is a custom OCF resource to manage the master/slave PostgreSQL
      • It is based on an example resource written by Andrew Beekhof from Clusterlabs
      • The file has to be in /usr/lib/ocf/resource.d/inuits/pgsql-ms
    • OCF resource
      • The script does the following:
        • It moves postgresql.conf-master to postgresql.conf when a node is promoted to master
        • It moves postgresql.conf-slave to postgresql.conf when a node is demoted to slave
        • It ensures that recovery.conf-slave is in place as recovery.conf on the slave, and that recovery.conf is absent on the master
        • It starts/restarts PostgreSQL when needed
      • I will post that file on GitHub soon
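A minimal sketch of the configuration-file handling listed for the OCF resource, assuming the role-specific files live next to postgresql.conf in the data directory. This is not the inuits pgsql-ms agent itself, which is not reproduced here. pgsql_set_role is a hypothetical name, and it copies rather than moves the templates so that a node can switch roles again later. PostgreSQL still has to be restarted afterwards for the new files to take effect.

```shell
#!/usr/bin/env bash
# Sketch (not the real pgsql-ms agent): put the files for a role in place.
# Expects postgresql.conf-master, postgresql.conf-slave and recovery.conf-slave
# in the data directory. A PostgreSQL restart is still needed afterwards.
pgsql_set_role() {
  local datadir="$1" role="$2"
  case "$role" in
    master)
      cp -f "$datadir/postgresql.conf-master" "$datadir/postgresql.conf" || return 1
      # A master must not have a recovery.conf, or it would start as a standby.
      rm -f "$datadir/recovery.conf"
      ;;
    slave)
      cp -f "$datadir/postgresql.conf-slave" "$datadir/postgresql.conf" || return 1
      cp -f "$datadir/recovery.conf-slave" "$datadir/recovery.conf" || return 1
      ;;
    *)
      echo "unknown role: $role" >&2
      return 2
      ;;
  esac
}

# Usage: pgsql_set_role /var/lib/pgsql/data master
```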
    • Backups of the databases
      • Sometimes you need backups (especially when you don't have backups...)
      • We take one backup per hour on each node (one at minute 0 and the other at minute 30)
      • We take one backup per day on each node
      • We take one more backup per day on each node, just before the BackupPC backup
      • We keep 24 hourly backups and 7 daily backups on disk
      • With BackupPC, we keep months of backups
    • Hourly backup script
      /usr/local/bin/backup_hourly.sh

        #!/bin/bash
        # The hour (00-23) in the file name means each run overwrites
        # the dump taken 24 hours earlier.
        set -o pipefail   # fail if pg_dump fails, not only if gzip does
        DATE=$(date +%H)
        BACKUP_PATH=/var/lib/backups/hourly
        for db in foobar_db foobar2_db
        do
            if /usr/bin/pg_dump "$db" | gzip > "$BACKUP_PATH/${db}_$DATE.pgsql.gz"
            then
                # Only point the "current" link at dumps that completed
                ln -fs "$BACKUP_PATH/${db}_$DATE.pgsql.gz" "$BACKUP_PATH/${db}_current.pgsql.gz"
            fi
        done

      The daily script is almost the same.
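The file names implement the retention described for the backups. date +%H only takes 24 values (00 to 23), so each hourly run overwrites the dump from the same hour the day before. The weekday format date +%u used in the BackupPC script (1 to 7) gives seven rotating daily files. Assuming the daily script uses the weekday format too (it is not shown), no separate cleanup job is needed. The sketch below turns that naming into a testable function. dump_name is a hypothetical helper, and it requires GNU date for -d.

```shell
#!/usr/bin/env bash
# Sketch: the dump file name used by the backup scripts, as a function.
# +%H (00-23) gives 24 rotating hourly names; +%u (1-7, Monday=1) gives 7 daily ones.
# Requires GNU date for -d.
dump_name() {
  local db="$1" fmt="$2" when="${3:-now}"
  echo "${db}_$(date -d "$when" "+$fmt").pgsql.gz"
}

# Usage: dump_name foobar_db %H    (name of this hour's dump)
#        dump_name cerise %u       (name of today's daily dump)
```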
    • BackupPC script
      /usr/local/bin/backup_backuppc.sh

        #!/bin/bash
        set -o pipefail   # fail if pg_dump fails, not only if gzip does
        DATE=$(date +%u)  # day of the week, 1-7
        BACKUP_PATH=/var/lib/backups/backuppc
        for db in cerise trackitquality trackit zodb_cerise
        do
            if /usr/bin/pg_dump -U postgres "$db" | gzip > "$BACKUP_PATH/${db}_$DATE.pgsql.gz"
            then
                ln -fs "$BACKUP_PATH/${db}_$DATE.pgsql.gz" "$BACKUP_PATH/${db}_current.pgsql.gz"
            fi
        done

      In the BackupPC config, I added the following:

        $Conf{DumpPreUserCmd} = '$sshPath -t -q -x -l backuppc $host /usr/local/bin/backup_backuppc.sh';
    • check_postgres script
      • check_postgres.pl is a Nagios-compatible Perl script
      • Available at http://www.bucardo.org/check_postgres/ and on GitHub
      • What we check with it:
        • The current connections
        • The status of the replication (the delay)
    • Check hot_standby latency
      • The check_postgres.pl script has a check for hot standby delay
      • But the check needs to know which node is the master and which is the slave, and that changes whenever the cluster fails over
      • So here is a bash script I wrote to find out the M/S order
      Master/slave replication check:

        #!/bin/bash
        # $1: database name. crm_resource tells us where the master runs;
        # awk prints "master,slave" for --host.
        /usr/lib64/nagios/plugins/check_postgres.pl --db="$1" --action hot_standby_delay -w 300 -c 600 \
            --host=$(crm_resource --resource pgsql-ms --locate | awk '/Master/ {master=$6} / $/ {slave=$6} END {print master","slave}')
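The awk program relies on the layout of the crm_resource --locate output. The master's line ends with "Master", the slave's line ends with a space, and the node name is the sixth field. It prints the master first, which as far as I know is the order the hot_standby_delay check expects in --host. Wrapping the program in a function makes it easy to test against saved output. The sample lines in the comments are an assumption consistent with those regexes, not captured output, and master_slave_hosts is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: the awk program from the check above, wrapped in a function.
# Assumed input (one line per node; the slave line ends with a space):
#   resource pgsql-ms is running on: babar.interne.arsia.be Master
#   resource pgsql-ms is running on: dumbo.interne.arsia.be 
master_slave_hosts() {
  awk '/Master/ {master=$6} / $/ {slave=$6} END {print master","slave}'
}

# Live use: crm_resource --resource pgsql-ms --locate | master_slave_hosts
```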
    • Munin postgres scripts
      • Munin ships with Perl plugins for PostgreSQL
      • We use four of them:
        • postgres_size
        • postgres_checkpoints
        • postgres_connections_db
        • postgres_cache
    • Puppet module
      • The Puppet postgres module is forked from Kris Buytaert's GitHub page
      • It is modified to remove all references to services, because we want Corosync to manage them
      • It creates the users, the superusers and the databases
      • It is a parameterized class with a "cluster" parameter, so we can also install a standalone PostgreSQL
      • The cache sizes are parameterized too, so we can also use it in Vagrant boxes
      • Here are some examples from the module, which I will upload to GitHub ASAP
    • Class postgres
      The postgres class installs the packages and runs initdb.
      init.pp:

        class postgres (
          $cluster    = 'no',
          $running_ip = '127.0.0.1'
        ) {
          ...
        }

      • The cluster parameter indicates whether we want clustering or not
      • running_ip is used for the SQL commands. In a cluster, you have to put the cluster's IP address here.
    • Sqlexec definition
      sqlexec.pp:

        define postgres::sqlexec($username, $database, $sql, $sqlcheck) {
          exec { "psql -h ${postgres::running_ip} --username=${username} ${database} -c \"${sql}\" >> /var/log/puppet-postgresql.sql.log 2>&1 && /bin/sleep 5":
            environment => "PGPASSWORD=${postgres_password}",
            path        => $::path,
            timeout     => 600,
            # $sqlcheck is inserted unquoted: the caller must supply any quoting
            unless      => "psql -h ${postgres::running_ip} -U ${username} ${database} -c ${sqlcheck}",
            require     => Service['postgresql-9.0'],
          }
        }
    • Example in the node file
      Here is the result in the node file:
      dumbo.pp:

        node babar {
          class { 'postgres':
            cluster    => 'yes',
            running_ip => '10.0.10.10',
          }
          include postgres::munin
          include postgres::backup
          include cluster::node
          postgres::config { $::fqdn:
            listen => '*',
          }
    • Example in the node file
      dumbo.pp:

          postgres::hba { $::fqdn:
            allowedrules => [
              "host all all ${::ipaddress}/32 trust",
              'hostnossl all all 10.0.10.8/32 trust',
              'hostnossl all all 10.0.10.9/32 trust',
              'hostnossl all all 10.0.10.10/32 trust',
              'hostnossl replication replicuser 192.168.177.2/24 trust',
            ],
          }
    • Example in the node file
      dumbo.pp:

          postgres::createsuperuser { 'replicuser':
            passwd => 'foobar',
          }
          postgres::createuser { 'cerise':
            passwd => 'foobar',
          }
          postgres::createdb { 'zodb_cerise':
            owner   => 'cerise',
            require => Postgres::Createuser['cerise'],
          }
        }
    • #TODO
      • The first synchronisation is not puppetized
      • More advanced checks on the database #monitoringsucks (e.g. slow queries)
      • Disaster recovery
      • Improve the OCF script
      • Check the content of the backups
      • ...
    • Any questions?