Patroni
Sébastien Lardière
Loxodata
March 2019
slardiere PATR – mars 2019 1 / 28
PATRONI
High Availability
Managing high availability:
  PostgreSQL: data comes first, so no automatic failover
  if service availability comes first: automatic failover
  tools:
    PAF (Pacemaker and Corosync)
    Repmgr (ad hoc)
    Patroni (DCS, Raft) by Zalando
WHAT IS IT?
What?
High-availability management for the PostgreSQL service:
  automatic role change: failover
  controlled role change: switchover
  relies on a DCS (distributed configuration store):
    etcd, Consul, ZooKeeper and even Kubernetes
    Raft consensus: http://thesecretlivesofdata.com/raft/
  integration with physical backups (pgBackRest)
Source: https://github.com/zalando/patroni
Doc: https://patroni.readthedocs.io/en/latest/
Ansible: https://github.com/IrisNetwork/ansible-patroni
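
The leader-lock idea behind a DCS-based failover can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `FakeDCS` and `acquire_or_renew` are hypothetical names for an in-memory stand-in, not Patroni or etcd APIs, and the 30-second lease is an illustrative value. A real DCS offers the same "create if absent, expire after a TTL" behaviour over the network, with consensus between its nodes.

```python
class FakeDCS:
    """In-memory stand-in for a DCS leader key with a time-to-live (hypothetical)."""

    def __init__(self):
        self.holder = None      # name of the member holding the leader key
        self.expires_at = 0.0   # time at which the key expires

    def acquire_or_renew(self, member, now, ttl):
        """Take the key if it is free or expired, or renew it if we already hold it."""
        expired = now >= self.expires_at
        if self.holder is None or expired or self.holder == member:
            self.holder = member
            self.expires_at = now + ttl
            return True
        return False


dcs = FakeDCS()
ttl = 30  # seconds, illustrative lease duration

# pg2 takes the leader key first; pg1 cannot take it while the lease is valid.
assert dcs.acquire_or_renew("pg2", now=0, ttl=ttl)
assert not dcs.acquire_or_renew("pg1", now=10, ttl=ttl)

# pg2 renews at t=10, so the lease now runs until t=40.
assert dcs.acquire_or_renew("pg2", now=10, ttl=ttl)

# pg2 stops renewing (crash); once the lease expires, pg1 can take over.
assert not dcs.acquire_or_renew("pg1", now=39, ttl=ttl)
assert dcs.acquire_or_renew("pg1", now=40, ttl=ttl)
print(dcs.holder)
```

The point of the sketch is that a failover needs no direct communication between PostgreSQL nodes: a node that stops renewing its lease simply loses the key.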
HOW DOES IT WORK?
How?
Components:
  DCS cluster: one node per PostgreSQL instance
  a daemon (Python) controlling the PostgreSQL instance
    and its wonderful YAML configuration file
    including a REST API
  a patronictl command
  a physical backup mechanism, ideally pgBackRest
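
As an illustration of how a client can use the REST API, here is a minimal Python sketch. It assumes that the `/master` endpoint (the one used by the HAProxy check later in this deck) answers 200 on the leader and a non-200 status on other nodes. The function name `is_primary` and the injectable `opener` parameter are hypothetical, introduced only to keep the example testable without a running cluster.

```python
import urllib.error
import urllib.request


def is_primary(base_url, opener=urllib.request.urlopen, timeout=2.0):
    """Return True if GET <base_url>/master answers 200, False otherwise."""
    try:
        with opener(base_url + "/master", timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        # Non-2xx answers (for example 503 on a replica) are raised as HTTPError.
        return False
    except (urllib.error.URLError, OSError):
        # Node unreachable: treat it as "not primary".
        return False


# Usage against a real node (not executed here):
# is_primary("http://192.168.122.30:8008")
```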
[Figure: architecture diagram showing Clients, HAProxy and Backups around five nodes, each with a PostgreSQL instance (PG1 to PG5), a Patroni daemon (Patroni 1 to 5) and a DCS node (DCS1 to DCS5); arrows are labelled Connect, Ask, Backup and Restore.]
How?
Patroni configuration:
  YAML syntax
  bootstrap:
    creating the PostgreSQL data directory: initdb
    restoring from pgBackRest
    configuring the PostgreSQL instance
  replica bootstrap:
    copying the primary with pg_basebackup
    restoring from pgBackRest
YAML
Example
scope: ats
name: pg1
restapi:
  listen: 192.168.122.30:8008
  connect_address: 192.168.122.30:8008
etcd:
  host: 192.168.122.30:2379
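
Only this first part of the file differs from node to node, so it lends itself to generation. The Python sketch below is a hypothetical helper (`render_node_config` is not part of Patroni) that renders the node-specific keys shown above. It assumes that every node runs its own etcd member on port 2379 and exposes the REST API on port 8008, as in the example.

```python
def render_node_config(name, ip, scope="ats"):
    """Return the node-specific part of a Patroni YAML file as text."""
    lines = [
        f"scope: {scope}",
        f"name: {name}",
        "restapi:",
        f"  listen: {ip}:8008",
        f"  connect_address: {ip}:8008",
        "etcd:",
        f"  host: {ip}:2379",
    ]
    return "\n".join(lines) + "\n"


# Render the header for the second node of the example cluster.
print(render_node_config("pg2", "192.168.122.102"))
```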
Example
bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: false
      use_slots: true
      parameters:
        log_destination: "syslog"
        log_checkpoints: "on"
        archive_mode: "on"
        archive_timeout: 1800s
        archive_command: /usr/bin/pgbackrest --stanza=ats archive-push %p
      recovery_conf:
        restore_command: /usr/bin/pgbackrest --stanza=ats archive-get %f "%p"
  pg_hba:
  - host replication postgres 192.168.122.30/24 trust
  - host all all 192.168.122.30/24 trust
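
The `maximum_lag_on_failover` setting above is expressed in bytes (1048576 bytes is 1 MiB): a replica lagging further behind than this is not supposed to take part in the leader race. A minimal Python sketch of that filter follows. The function `eligible_candidates` and the integer WAL positions are hypothetical and only illustrate the rule, they are not Patroni's implementation.

```python
MAXIMUM_LAG_ON_FAILOVER = 1048576  # bytes, value from the example (1 MiB)


def eligible_candidates(leader_position, replica_positions,
                        max_lag=MAXIMUM_LAG_ON_FAILOVER):
    """Return the replicas whose byte lag behind the leader is within max_lag."""
    return sorted(
        name
        for name, position in replica_positions.items()
        if leader_position - position <= max_lag
    )


# Hypothetical WAL positions in bytes: pg4 is about 2 MB behind the leader.
replicas = {"pg1": 5_000_000, "pg3": 4_900_000, "pg4": 3_000_000}
print(eligible_candidates(5_000_000, replicas))
```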
Example
  # continuation of the bootstrap: section
  method: pgbackrest
  pgbackrest:
    command: /usr/local/bin/pgbackrest.sh
    keep_existing_recovery_conf: False
    recovery_conf:
      recovery_target_action: promote
      recovery_target_timeline: latest
      restore_command: /usr/bin/pgbackrest --stanza=ats archive-get %f "%p"
  initdb:
  - encoding: UTF8
  - data-checksums
Example
postgresql:
  listen: 192.168.122.30:5432
  connect_address: 192.168.122.30:5432
  data_dir: /var/lib/postgresql/ats
  bin_dir: /usr/lib/postgresql/11/bin
  authentication:
    replication:
      username: postgres
      password: azerty
    superuser:
      username: postgres
      password: azerty
  parameters:
    work_mem: "16MB"
Example
  # continuation of the postgresql: section
  create_replica_methods:
  - pgbackrest
  - basebackup
  pgbackrest:
    command: /usr/bin/pgbackrest --stanza=ats --delta restore
    no_params: True
    no_master: 1
    keep_data: True
  basebackup:
    max-rate: '100M'
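
The methods listed under `create_replica_methods` are tried in order, so here pgBackRest comes first and `pg_basebackup` acts as the fallback. A minimal Python sketch of that ordering logic, with hypothetical stand-in functions instead of the real commands:

```python
def create_replica(methods, runners):
    """Try each replica creation method in order; return the first that succeeds."""
    for method in methods:
        if runners[method]():
            return method
    return None


def pgbackrest_restore():
    return False  # simulate a failed restore (for example, no backup available yet)


def basebackup():
    return True   # simulate a successful pg_basebackup


runners = {"pgbackrest": pgbackrest_restore, "basebackup": basebackup}
print(create_replica(["pgbackrest", "basebackup"], runners))
```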
Example
tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
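
Tags let a node opt out of a role: for instance, `nofailover: true` keeps a node out of promotion and `noloadbalance: true` keeps it out of read load balancing. The Python sketch below only illustrates that filtering. The `members` dictionary and the helper `members_without_tag` are hypothetical, with tag values chosen to show the effect.

```python
members = {
    "pg1": {"nofailover": False, "noloadbalance": False},
    "pg3": {"nofailover": True, "noloadbalance": False},
    "pg4": {"nofailover": False, "noloadbalance": True},
}


def members_without_tag(members, tag):
    """Return the names of the members for which the given tag is not set to true."""
    return sorted(name for name, tags in members.items() if not tags.get(tag, False))


print(members_without_tag(members, "nofailover"))     # promotion candidates
print(members_without_tag(members, "noloadbalance"))  # read load-balancing targets
```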
HOW DO WE DO IT?
Operations:
  starting the DCS
  starting the Patroni daemon
  checking with patronictl list
Operations with patronictl:
  maintenance mode: pause/resume
  restart of PostgreSQL instances
  reinit of PostgreSQL instances
  switchover, failover of instances
  others: dsn, query, show-config
Patronictl
Example
postgres@pg10_03:~$ /usr/bin/patronictl -c /etc/patroni/config.yml list
+---------+--------+-----------------+--------+---------+----+-----------+
| Cluster | Member | Host | Role | State | TL | Lag in MB |
+---------+--------+-----------------+--------+---------+----+-----------+
| ats | pg1 | 192.168.122.30 | | running | 47 | 0 |
| ats | pg2 | 192.168.122.102 | Leader | running | 47 | 0 |
| ats | pg3 | 192.168.122.32 | | running | 47 | 0 |
| ats | pg4 | 192.168.122.61 | | running | 47 | 0 |
| ats | pg5 | 192.168.122.8 | | running | 47 | 0 |
+---------+--------+-----------------+--------+---------+----+-----------+
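
For scripting, the table above can be parsed to find the current leader. The following Python sketch is a hypothetical helper (`parse_patronictl_list` is not shipped with Patroni) and assumes the column layout shown in this example.

```python
def parse_patronictl_list(text):
    """Parse the ASCII table into a list of dicts keyed by column header."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")
    ]
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


sample = """\
+---------+--------+-----------------+--------+---------+----+-----------+
| Cluster | Member | Host | Role | State | TL | Lag in MB |
+---------+--------+-----------------+--------+---------+----+-----------+
| ats | pg1 | 192.168.122.30 | | running | 47 | 0 |
| ats | pg2 | 192.168.122.102 | Leader | running | 47 | 0 |
| ats | pg3 | 192.168.122.32 | | running | 47 | 0 |
+---------+--------+-----------------+--------+---------+----+-----------+
"""

members = parse_patronictl_list(sample)
leaders = [m["Member"] for m in members if m["Role"] == "Leader"]
print(leaders)
```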
Patronictl
Example
postgres@pg10_03:~$ /usr/bin/patronictl -c /etc/patroni/config.yml switchover
Master [pg2]:
Candidate ['pg1', 'pg3', 'pg4', 'pg5'] []: pg1
When should the switchover take place (e.g. 2015-10-01T14:30) [now]:
Current cluster topology
+---------+--------+-----------------+--------+---------+----+-----------+
| Cluster | Member | Host | Role | State | TL | Lag in MB |
+---------+--------+-----------------+--------+---------+----+-----------+
| ats | pg1 | 192.168.122.30 | | running | 47 | 0 |
| ats | pg2 | 192.168.122.102 | Leader | running | 47 | 0 |
| ats | pg3 | 192.168.122.32 | | running | 47 | 0 |
| ats | pg4 | 192.168.122.61 | | running | 47 | 0 |
| ats | pg5 | 192.168.122.8 | | running | 47 | 0 |
+---------+--------+-----------------+--------+---------+----+-----------+
Patronictl
Example
Are you sure you want to switchover cluster ats, demoting current master pg2? [y/N]: y
2019-03-19 17:29:45.45659 Successfully switched over to "pg1"
+---------+--------+-----------------+--------+---------+----+-----------+
| Cluster | Member | Host | Role | State | TL | Lag in MB |
+---------+--------+-----------------+--------+---------+----+-----------+
| ats | pg1 | 192.168.122.30 | Leader | running | 47 | |
| ats | pg2 | 192.168.122.102 | | stopped | | unknown |
| ats | pg3 | 192.168.122.32 | | running | 47 | 0 |
| ats | pg4 | 192.168.122.61 | | running | 47 | 0 |
| ats | pg5 | 192.168.122.8 | | running | 47 | 0 |
+---------+--------+-----------------+--------+---------+----+-----------+
Patronictl
Example
postgres@pg10_03:~$ /usr/bin/patronictl -c /etc/patroni/config.yml list
+---------+--------+-----------------+--------+---------+----+-----------+
| Cluster | Member | Host | Role | State | TL | Lag in MB |
+---------+--------+-----------------+--------+---------+----+-----------+
| ats | pg1 | 192.168.122.30 | Leader | running | 48 | 0 |
| ats | pg2 | 192.168.122.102 | | running | 48 | 0 |
| ats | pg3 | 192.168.122.32 | | running | 48 | 0 |
| ats | pg4 | 192.168.122.61 | | running | 48 | 0 |
| ats | pg5 | 192.168.122.8 | | running | 48 | 0 |
+---------+--------+-----------------+--------+---------+----+-----------+
HOW DO WE USE IT?
Connections:
  HAProxy and PgBouncer templates
  confd generates the configuration files
    using the DCS
  HAProxy uses the REST API to find the primary
  the application connects to HAProxy
/etc/haproxy/haproxy.cfg
Example
listen master
    bind *:5000
    option httpchk OPTIONS /master
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server pg1 192.168.122.30:5432 maxconn 100 check port 8008
    server pg2 192.168.122.102:5432 maxconn 100 check port 8008
    server pg3 192.168.122.32:5432 maxconn 100 check port 8008
    server pg4 192.168.122.61:5432 maxconn 100 check port 8008
    server pg5 192.168.122.8:5432 maxconn 100 check port 8008
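
The `server` lines are the only part of this file that depends on cluster membership, which is why a template fed from the DCS works well here. confd uses its own template files; the Python sketch below is only a stand-in for the same idea, with a hypothetical function name and a hard-coded member list in place of a DCS lookup.

```python
def render_haproxy_listen(members, listen_port=5000, pg_port=5432, api_port=8008):
    """Render the 'listen master' section from a list of (name, ip) pairs."""
    lines = [
        "listen master",
        f"    bind *:{listen_port}",
        "    option httpchk OPTIONS /master",
        "    http-check expect status 200",
        "    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions",
    ]
    for name, ip in members:
        # Traffic goes to PostgreSQL, the health check goes to the Patroni REST API.
        lines.append(f"    server {name} {ip}:{pg_port} maxconn 100 check port {api_port}")
    return "\n".join(lines) + "\n"


cluster = [("pg1", "192.168.122.30"), ("pg2", "192.168.122.102")]
print(render_haproxy_listen(cluster))
```

Because every server is checked against `/master`, only the current leader is marked up, so clients connecting to port 5000 always reach the primary.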
Backups:
  integration with pgBackRest
  automatically follows role changes (failover, switchover)
  used for initialization
/etc/pgbackrest/conf.d/ats.conf
Example
[ats]
pg1-path=/var/lib/postgresql/ats
pg1-host=192.168.122.30
pg1-host-user=postgres
pg2-path=/var/lib/postgresql/ats
pg2-host=192.168.122.102
pg2-host-user=postgres
pg3-path=/var/lib/postgresql/ats
pg3-host=192.168.122.32
pg3-host-user=postgres
pg4-path=/var/lib/postgresql/ats
pg4-host=192.168.122.61
pg4-host-user=postgres
pg5-path=/var/lib/postgresql/ats
pg5-host=192.168.122.8
pg5-host-user=postgres
Questions?