PostgreSQL 9.0 HA at LOADAYS 2012

Transcript

  • 1. PostgreSQL 9.0 HA, Julien Pivotto, April 1, 2012 @ Loadays
  • 2. Table of contents
      1 Overview: The mission; Before the migration
      2 PostgreSQL 9.0: Intro; Streaming replication; Master configuration; Slave configuration; PostgreSQL specific tricks; Setting up a slave
      3 Clustering: Set up of corosync; OCF resource
      4 Backups: Cron jobs; BackupPC
      5 Monitoring: Nagios; Munin
      6 Automation: Puppet module; The node file; #TODO
      7 The end
  • 3. Who am I
      • Julien Pivotto
      • Consultant at Inuits since May 2011
      • FOSS defender since 2005
  • 4. A.R.S.I.A.
      • Association Régionale de Santé et d’Identification Animales
      • 30 Linux servers in several locations
      • A lot of Open Source: CentOS, Samba, Open-xchange, mailscanner, Cyrus, Puppet, jenkins, foreman, OpenVPN, GLPI, rabbitmq, BackupPC, CUPS, icinga, trac, zope, plone, solr, pentaho, funambol, munin, squid, asterisk, and PostgreSQL, ...
  • 5. C.E.R.I.S.E
      • A web application
          • Plone (Python)
          • 15k+ visits, 500k+ pages and 2,000,000+ hits each month
          • Developed by Affinitic
      • Several databases
          • PostgreSQL 9.0
          • Oracle database
      • Several servers/services
          • Two reverse proxies in failover HA
          • Two application servers in load balancing HA
          • Two PostgreSQL servers in failover HA
          • An Oracle database server
          • A development server
          • A Pentaho server
      • Being integrated in Jenkins (to be continued...)
  • 6. PostgreSQL before the migration
      • PostgreSQL 8.3.7
      • No native support for HA
      • High availability with heartbeat 2 and DRBD
      • Installed on the application servers
      • Nothing automated
      • Failover only: the passive node cannot even serve read-only queries
      • Installed in November 2008
  • 7. Monitoring before the installation
      • Icinga
          • Check of the DRBD
          • Simple connection check to PostgreSQL
      • Graphing with Cacti
          • Size of the databases
          • Connections to the database
          • Checkpoints
  • 8. Backups before the installation
      • Backups were done every hour on the same machine
      • External backups once a day on disk and on tape
      • Backups are made with the pg_dump command
      • BackupPC fetches those files
  • 9. PostgreSQL 9.0
      • PostgreSQL 9.0 was released in September 2010
      • It brought native replication to PostgreSQL
      • There is no native failover tool
      • So we need to use PostgreSQL + Corosync
      • The setup of PostgreSQL is managed by Puppet
  • 10. Write-Ahead Logging
      • It means that every change to a data file must first be written to a log file
      • Fewer disk writes: only the log file needs to be flushed to disk to guarantee that a transaction is committed, rather than every data file changed by the transaction
  • 11. What is streaming replication
      • Streaming replication provides the capability to ship WAL (XLOG) records to standby servers and apply them there
      • It is possible to have multiple standby servers
      • Standby servers can accept read-only queries ("Hot standby")
  • 12. Disadvantages / specifications of streaming replication
      • Streaming replication supports only asynchronous log-shipping
          • But when the database is in use, the delay is close to synchronous log-shipping
      • Adding a standby server requires manual action
          • But in our case we will only have one standby server
      • PostgreSQL does not provide an HA feature
          • But Corosync does
      • It is a single-threaded replication
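Because log-shipping is asynchronous, the standby can lag behind the master. The following is a minimal sketch of how that lag could be expressed in bytes, assuming you compare the output of `pg_current_xlog_location()` on the master with `pg_last_xlog_replay_location()` on the standby. The helper names `xlog_to_bytes` and `xlog_lag_bytes` are hypothetical, and the `0xFF000000` multiplier assumes the WAL layout used before PostgreSQL 9.3.

```bash
#!/bin/bash
# Hypothetical helper: convert a 9.0-style XLOG location ("hi/lo", both hex)
# into an absolute byte position. Assumes the pre-9.3 layout, in which each
# log id covers 0xFF000000 bytes.
xlog_to_bytes() {
    local hi=${1%%/*} lo=${1##*/}
    echo $(( 16#FF000000 * 16#$hi + 16#$lo ))
}

# Hypothetical helper: lag in bytes between a master location ($1)
# and a standby location ($2).
xlog_lag_bytes() {
    echo $(( $(xlog_to_bytes "$1") - $(xlog_to_bytes "$2") ))
}
```

The two locations could be collected with something like `psql -Atc 'SELECT pg_current_xlog_location()'` on the master and `psql -Atc 'SELECT pg_last_xlog_replay_location()'` on the standby, then passed to `xlog_lag_bytes`.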
  • 13. Master configuration
      The master only needs one configuration file.
      Configuration not related to SR:
        #Postgresql configuration
        #http://www.postgresql.org/docs/9.0/interactive/index.html
        listen_addresses = '*'
        max_connections = 200
        shared_buffers = 4096MB
        work_mem = 4096MB
        effective_cache_size = 10024MB
        commit_delay = 100000
        # duplicate of the setting above; the last occurrence in the file takes effect
        effective_cache_size = 2560
        log_destination = 'stderr'
        log_directory = 'pg_log'
        logging_collector = on
        log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
        log_truncate_on_rotation = on
        log_rotation_age = 1d
        log_rotation_size = 0
        log_min_messages = notice
        log_min_duration_statement = 1000
        log_line_prefix = '%t %u '
        log_statement = 'none'
        datestyle = 'iso, dmy'
  • 14. Master configuration
      Configuration related to SR:
        wal_level = hot_standby
        max_wal_senders = 2
        wal_keep_segments = 128
  • 15. Master configuration
      • wal_level = hot_standby
        Allows the standby server to accept read-only queries
      • max_wal_senders = 2
        We allow up to 2 standby nodes
      • wal_keep_segments = 128
        The minimum number of WAL segments to keep on the master for the standbys
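As a rough worked example of what `wal_keep_segments = 128` means on disk, the sketch below multiplies the segment count by the segment size. It assumes the default 16 MB WAL segment size; the function name `wal_keep_mb` is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: approximate disk space (in MB) retained by
# wal_keep_segments. The second argument is the segment size in MB and
# defaults to 16, the stock PostgreSQL build default (an assumption here).
wal_keep_mb() {
    local segments=$1 segment_mb=${2:-16}
    echo $(( segments * segment_mb ))
}
```

With the value from the slide, `wal_keep_mb 128` gives 2048, so about 2 GB of WAL stays available for a standby that falls behind.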
  • 16. Slave configuration
      • The slave requires at least two configuration files
          • A postgresql.conf file
          • A recovery.conf file, used to apply the WAL (XLOG) records shipped by the master
      • A trigger file to stop replication can be specified
      postgresql.conf, configuration related to SR:
        wal_level = hot_standby
        hot_standby = on
      Note that this file also starts with the same non-SR settings as the master configuration.
  • 17. Slave configuration
      recovery.conf:
        standby_mode = 'on'
        primary_conninfo = 'host=192.168.177.2 user=replicuser'
      • standby_mode means that this is a standby server
      • primary_conninfo is the connection string used to reach the master
  • 18. Replication user
      • A superuser for replication has to be created (it is called replicuser in recovery.conf, pg_hba.conf and the Puppet node file)
      • The SQL command to create it is:
        CREATE USER replicuser SUPERUSER LOGIN CONNECTION LIMIT 1 ENCRYPTED PASSWORD 'foobar';
  • 19. pg_hba.conf
      • pg_hba.conf is the file that contains the access rules (a kind of ACL) for PostgreSQL connections
      • In that file we add both nodes with the trust method, and the replication user with trust too
      pg_hba.conf:
        hostnossl all         all        10.0.10.8/32     trust
        hostnossl all         all        10.0.10.9/32     trust
        hostnossl replication replicuser 192.168.177.2/24 trust
  • 20. Setting up a slave
      • You have to run a few commands on the master when you add a new standby server
      Adding a standby server:
        psql -c "SELECT pg_start_backup('label', true)"
        rsync -a ${PGDATA}/ standby:/srv/pgsql/standby/ \
            --exclude postmaster.pid --exclude '*-master' --exclude '*-slave'
        psql -c "SELECT pg_stop_backup()"
  • 21. Corosync configuration
      • The goal of corosync is to switch the master and slave roles when needed
      • It will ensure that a master is online and connected to the router
      • The two servers are connected to each other on eth1
      • Corosync is installed by Puppet
      • We take it from the clusterlabs repositories
      • We use a customized master/slave OCF resource to manage the PostgreSQL M/S
  • 22. crm.conf
      The main configuration file of corosync is /etc/corosync/crm.conf. It contains all the resources, nodes, etc.
      Defining the nodes:
        node babar.interne.arsia.be attributes standby="off"
        node dumbo.interne.arsia.be attributes standby="off"
      Here the two nodes are defined with standby="off", so both are active and can run resources as soon as the cluster starts.
  • 23. crm.conf
      Defining the primitives:
        primitive pgsql ocf:inuits:pgsql-ms
        primitive virt_ip ocf:heartbeat:IPaddr2 \
            params nic="eth0" iflabel="0" ip="10.0.10.10" cidr_netmask="24" broadcast="10.0.10.255" \
            meta target-role="Started" is-managed="true"
        primitive ping ocf:pacemaker:ping \
            params host_list="10.0.10.1" \
            op monitor interval="10s" timeout="10s" \
            op start interval="0" timeout="45s" \
            op stop interval="0" timeout="50s"
      • We define 3 primitives:
          • pgsql, the PostgreSQL primitive
          • virt_ip, the floating IP address
          • ping, the primitive that checks that the servers are connected to the router
  • 24. crm.conf
      Configuring the primitives:
        ms pgsql-ms pgsql \
            params pgsqlconfig="/var/lib/pgsql/data/postgresql.conf" lsb_script="/etc/init.d/postgresql-9.0" pgsqlrecovery="/var/lib/pgsql/data/recovery.conf" \
            meta clone-max="2" clone-node-max="1" master-max="1" master-node-max="1" notify="false"
        clone clone-ping ping \
            meta globally-unique="false"
      • We configure the PostgreSQL M/S: the init script, the configuration files...
      • We also configure the ping resource as a clone (it will be launched on both servers)
  • 25. crm.conf
      Defining the constraints and the cluster properties:
        group PSQL virt_ip
        location connected PSQL \
            rule $id="connected-rule" -inf: not_defined pingd or pingd lte 0
        colocation ip_psql inf: PSQL pgsql-ms:Master
        property $id="cib-bootstrap-options" \
            cluster-infrastructure="openais" \
            expected-quorum-votes="2" \
            stonith-enabled="false" \
            no-quorum-policy="ignore" \
            default-resource-stickiness="INFINITY"
        rsc_defaults $id="rsc_defaults-options" \
            migration-threshold="INFINITY" \
            failure-timeout="10" \
            resource-stickiness="INFINITY"
      • These lines ensure that the master is always on the same node as the floating IP address
      • And also that the master is connected to the router
  • 26. OCF resource
      • There is a custom OCF resource to manage the master/slave PostgreSQL
      • It is based on an example resource written by Andrew Beekhof from Clusterlabs
      • The file has to be in /usr/lib/ocf/resource.d/inuits/pgsql-ms
  • 27. OCF resource
      • The script does the following:
          • It moves postgresql.conf-master to postgresql.conf when a node is promoted to master
          • It moves postgresql.conf-slave to postgresql.conf when a node is demoted to slave
          • It ensures that recovery.conf-slave is installed as recovery.conf on the slave and that recovery.conf is absent on the master
          • It starts/restarts PostgreSQL when needed
      • I will post that file on Github soon
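The file handling described on this slide can be sketched as two small shell functions. This is only an illustration under stated assumptions, not the author's OCF script: the function names `pg_files_promote` and `pg_files_demote` are hypothetical, the data directory is passed as an argument, and the sketch copies the role-specific files instead of moving them so that they remain available for the next role change.

```bash
#!/bin/bash
# Hypothetical sketch of the promote/demote file handling; not the real
# pgsql-ms resource agent. $1 is the PostgreSQL data directory.

pg_files_promote() {
    local dir=$1
    # Install the master configuration.
    cp -f "$dir/postgresql.conf-master" "$dir/postgresql.conf" || return 1
    # A master must not have a recovery.conf, otherwise it starts as a standby.
    rm -f "$dir/recovery.conf"
}

pg_files_demote() {
    local dir=$1
    # Install the slave configuration and the recovery settings.
    cp -f "$dir/postgresql.conf-slave" "$dir/postgresql.conf" || return 1
    cp -f "$dir/recovery.conf-slave" "$dir/recovery.conf"
}
```

A real resource agent would call such functions from its promote and demote actions and then start or restart PostgreSQL, which this sketch leaves out.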
  • 28. Backups of the databases
      • Sometimes, you need backups (especially when you don't have backups...)
      • We do a backup per hour on each node (one at minute 0 and one at minute 30)
      • We do a backup per day on each node
      • We do a backup per day on each node just before the BackupPC backup
      • We keep 24 hourly backups and 7 daily backups on disk
      • With BackupPC we keep months of backups
  • 29. Hourly backup script
      /usr/local/bin/backup_hourly.sh:
        #!/bin/bash
        # File names carry the hour (00-23), so each dump overwrites the one from 24 hours earlier
        DATE=$(date +%H)
        BACKUP_PATH=/var/lib/backups/hourly
        for db in foobar_db foobar2_db
        do
            /usr/bin/pg_dump "$db" | gzip > "$BACKUP_PATH/${db}_$DATE.pgsql.gz"
            # Keep a stable name pointing at the latest dump
            ln -fs "$BACKUP_PATH/${db}_$DATE.pgsql.gz" "$BACKUP_PATH/${db}_current.pgsql.gz"
        done
      The daily script is almost the same.
  • 30. BackupPC script
      /usr/local/bin/backup_backuppc.sh:
        #!/bin/bash
        # File names carry the day of the week (1-7), so dumps rotate weekly
        DATE=$(date +%u)
        BACKUP_PATH=/var/lib/backups/backuppc
        for db in cerise trackitquality trackit zodb_cerise
        do
            /usr/bin/pg_dump -U postgres "$db" | gzip > "$BACKUP_PATH/${db}_$DATE.pgsql.gz"
            ln -fs "$BACKUP_PATH/${db}_$DATE.pgsql.gz" "$BACKUP_PATH/${db}_current.pgsql.gz"
        done
      In the BackupPC config, I added the following:
        $Conf{DumpPreUserCmd} = '$sshPath -t -q -x -l backuppc $host /usr/local/bin/backup_backuppc.sh';
  • 31. check_postgres script
      • check_postgres.pl is a Nagios-compatible Perl script
      • Available on http://www.bucardo.org/check_postgres/ and on Github
      • What we check with it:
          • The current connections
          • The status of the replication (the delay)
  • 32. Check hot_standby latency
      • The check_postgres.pl script has a check for the hot_standby delay
      • But the check must be given the master and the slave in that order, and we do not know in advance which node currently holds which role
      • So, here is a bash script I wrote to find the M/S order
      Master/slave replication check:
        #!/bin/bash
        /usr/lib64/nagios/plugins/check_postgres.pl --db="$1" --action hot_standby_delay -w 300 -c 600 \
            --host=$( crm_resource --resource pgsql-ms --locate | awk '/Master/ {master=$6} / $/ {slave=$6} END {print master","slave}' )
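To make the awk step easier to follow, here is a small sketch that isolates it in a function and feeds it sample input. The sample lines are an assumption about what `crm_resource --locate` prints, inferred from the fact that the script above reads the host name from field 6 and looks for the word `Master`. The function name `locate_to_hosts` is hypothetical, and this variant matches the slave as "any line without `Master`" instead of relying on a trailing space.

```bash
#!/bin/bash
# Hypothetical helper: read `crm_resource --resource pgsql-ms --locate`
# output on stdin and print "master,slave".
# Assumed input format (host name in field 6):
#   resource pgsql-ms is running on: babar.interne.arsia.be Master
#   resource pgsql-ms is running on: dumbo.interne.arsia.be
locate_to_hosts() {
    awk '
        /Master/             { master = $6 }   # line for the promoted node
        !/Master/ && NF >= 6 { slave = $6 }    # any other located node
        END                  { print master "," slave }
    '
}
```

The result can be passed to the `--host` option of check_postgres.pl in the same way as in the script on the slide, for example `--host=$(crm_resource --resource pgsql-ms --locate | locate_to_hosts)`.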
  • 33. Munin postgres scripts
      • Munin is shipped with Perl plugins for PostgreSQL
      • We use four of them:
          • postgres_size
          • postgres_checkpoints
          • postgres_connections_db
          • postgres_cache
  • 34. Munin postgres scripts
  • 35. Puppet module
      • The puppet postgres module is forked from Kris Buytaert's github page
      • It is modified to remove all references to services, because we want corosync to manage them
      • It creates the users, the superusers and the databases
      • It is a parameterized class with a "cluster" parameter, so we can also install a standalone PostgreSQL
      • The cache sizes are parameterized too, so we can also use it in Vagrant boxes
      • Here are some examples from the module, which I will upload on Github ASAP
  • 36. Class postgres
      The postgres class installs the packages and runs initdb.
      init.pp:
        class postgres (
          $cluster    = 'no',
          $running_ip = '127.0.0.1'
        ){ ...
      • The cluster parameter indicates whether we want clustering
      • running_ip is used for the SQL commands. In case of a cluster, you have to put the cluster's IP address here.
  • 37. Sqlexec definition
      sqlexec.pp:
        define postgres::sqlexec($username, $database, $sql, $sqlcheck) {
          exec { "psql -h ${postgres::running_ip} --username=${username} ${database} -c \"${sql}\" >> /var/log/puppet-postgresql.sql.log 2>&1 && /bin/sleep 5":
            environment => "PGPASSWORD=${postgres_password}",
            path        => $::path,
            timeout     => 600,
            unless      => "psql -h ${postgres::running_ip} -U ${username} ${database} -c ${sqlcheck}",
            require     => Service['postgresql-9.0'],
          }
        }
  • 38. Example in the node file
      Here is the result in the node file.
      dumbo.pp:
        node babar {
          class { 'postgres':
            cluster    => 'yes',
            running_ip => '10.0.10.10',
          }
          include postgres::munin
          include postgres::backup
          include cluster::node
          postgres::config{ $::fqdn:
            listen => '*',
          }
  • 39. Example in the node file
      dumbo.pp (continued):
          postgres::hba { $::fqdn:
            allowedrules => [
              "host all all $::ipaddress/32 trust",
              'hostnossl all all 10.0.10.8/32 trust',
              'hostnossl all all 10.0.10.9/32 trust',
              'hostnossl all all 10.0.10.10/32 trust',
              'hostnossl replication replicuser 192.168.177.2/24 trust',
            ],
          }
  • 40. Example in the node file
      dumbo.pp (continued):
          postgres::createsuperuser{ 'replicuser':
            passwd => 'foobar',
          }
          postgres::createuser{ 'cerise':
            passwd => 'foobar';
          }
          postgres::createdb{ 'zodb_cerise':
            owner   => 'cerise',
            require => Postgres::Createuser['cerise'],
          }
        }
  • 41. #TODO
      • The first synchronisation is not puppetized
      • More advanced checks on the database #monitoringsucks (e.g. slow queries)
      • A disaster recovery
      • Improve the OCF script
      • Check the content of the backups
      • ...
  • 42. Any questions?