Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Patroni: Kubernetes-native PostgreSQL companion


Published on

Kubernetes is a solid leader among different cloud orchestration engines and its adoption rate is growing on a daily basis. Naturally people want to run both their applications and databases on the same infrastructure.

There are a lot of ways to deploy and run PostgreSQL on Kubernetes, but most of them are not cloud-native. Around one year ago Zalando started to run HA setup of PostgreSQL on Kubernetes managed by Patroni. Those experiments were quite successful and produced a Helm chart for Patroni. That chart was useful, albeit a single problem: Patroni depended on Etcd, ZooKeeper or Consul.

Few people look forward to deploy two applications instead of one and support them later on. In this talk I would like to introduce Kubernetes-native Patroni. I will explain how Patroni uses Kubernetes API to run a leader election and store the cluster state. I’m going to live-demo a deployment of HA PostgreSQL cluster on Minikube and share our own experience of running more than 130 clusters on Kubernetes.

Patroni is a Python open-source project developed by Zalando in cooperation with other contributors on GitHub:

How to do a LIVE-demo with minikube:
1. git clone
2. cd patroni
3. git checkout feature/demo
4. cd kubernetes
5. open and edit line #4 (specify the minikube context )
6. docker build -t patroni .
7. may be docker push patroni
8. may be edit patroni_k8s.yaml line #22 and put the name of patroni image you build there
9. install tmux
10. run tmux in one terminal
11. run bash in another terminal and press Enter from time to time

Published in: Technology
  • Be the first to comment

Patroni: Kubernetes-native PostgreSQL companion

  1. 1. Patroni: Kubernetes-native PostgreSQL companion PGConf APAC 2018 Singapore ALEXANDER KUKUSHKIN 23-03-2018
  2. 2. 2 ABOUT ME Alexander Kukushkin Database Engineer @ZalandoTech Email: Twitter: @cyberdemn
  3. 3. 3 ZALANDO 15 markets 6 fulfillment centers 20 million active customers 3.6 billion € net sales 2016 165 million visits per month 12,000 employees in Europe
  4. 4. 4 FACTS & FIGURES > 300 databases on premise > 150 on AWS EC2 > 200 on K8S
  5. 5. 5 Bot pattern and Patroni Postgres-operator Patroni on Kubernetes, first attempt Kubernetes-native Patroni Live-demo AGENDA
  6. 6. 6 ● small python daemon ● implements “bot” pattern ● runs next to PostgreSQL ● decides on promotion/demotion ● uses DCS to run leader election and keep cluster state Bot pattern and Patroni
  7. 7. 7 ● Distributed Consensus/Configuration Store (Key-Value) ● Uses RAFT (Etcd, Consul) or ZAB (ZooKeeper) ● Write succeed only if majority of nodes acknowledge it (quorum) ● Supports Atomic operations (CompareAndSet) ● Can expire objects after TTL DCS
  8. 8. 8 Bot pattern: leader alive Primary NODE A Standby NODE B Standby NODE C UPDATE(“/leader”, “A”, ttl=30, prevValue=”A”)Success WATCH (/leader) WATCH (/leader) /leader: “A”, ttl: 30
  9. 9. 9 Bot pattern: master dies, leader key holds Primary Standby Standby WATCH (/leader) WATCH (/leader) /leader: “A”, ttl: 17 NODE A NODE B NODE C
  10. 10. 10 Bot pattern: leader key expires Standby Standby Notify (/leader, expired=true) Notify (/leader, expired=true) /leader: “A”, ttl: 0 NODE B NODE C
  11. 11. 11 Bot pattern: who will be the next master? Standby Standby Node B: GET A:8008/patroni -> failed/timeout GET C:8008/patroni -> wal_position: 100 Node C: GET A:8008/patroni -> failed/timeout GET B:8008/patroni -> wal_position: 100 NODE B NODE C
  12. 12. 12 Bot pattern: leader race among equals Standby Standby /leader: “C”, ttl: 30 CREATE (“/leader”, “C”, ttl=30, prevExists=False) CREATE (“/leader”, “B”, ttl=30, prevExists=False) FAIL SUCCESS NODE B NODE C
  13. 13. 13 Bot pattern: promote and continue replication Standby Primary /leader: “C”, ttl: 30WATCH(/leader ) promote NODE B NODE C
  14. 14. 14 DCS STRUCTURE ● /service/cluster-name/ ○ config {"postgresql":{"parameters":{"max_connections":300}}} ○ initialize ”6303731710761975832” (database system identifier) ○ members/ ■ dbnode1 {"role":"replica","state":"running”,"conn_url":"postgres://"} ■ dbnode2 {"role":"master","state":"running”,"conn_url":"postgres://"} ○ leader dbnode2 ○ optime/ ■ leader “67393608” # ← absolute wal positition
  15. 15. 15 AWS DEPLOYMENT
  16. 16. 16 “Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications. It groups containers that make up an application into logical units (Pods) for easy management and discovery. Kubernetes builds upon 15 years of experience of running production workloads at Google, combined with best-of-breed ideas and practices from the community.” KUBERNETES
  17. 17. 17 Spilo & Patroni on K8S v1 Node Pod: demo-0 role: replica PersistentVolume PersistentVolume Node Pod: demo-1 role: master StatefulSet: demo Secret: demoUPDATE() WATCH() Service: demo-replica labelSelector: role=replica Service: demo labelSelector: role=master
  18. 18. 18 Spilo & Patroni on K8S v1 ● We will deploy Etcd on Kubernetes ● Depoy Spilo with PetSet (old name for StatefulSet) ● And quickly hack a callback script for Patroni, which will label the Pod we are running in with the current role (master, replica) ● And use Services with labelSelectors for traffic routing
  19. 19. 19 Can we get rid from Etcd? ● Use labelSelector to find all Kubernetes objects associated with the given cluster ○ Pods - cluster members ○ ConfigMaps or Endpoints to keep configuration ● Every iteration of HA loop we will update labels and metadata on the objects (the same way as we updating keys in Etcd) ● It is even possible to do CAS operation using K8S API
  20. 20. 20 No K8S API for expiring objects How to do leader election?
  21. 21. 21 Do it on the client side! ● Leader should periodically update ConfigMap or Endpoint ○ Update must happen as CAS operation ○ Demote to read-only in case of failure ● All other members should check that leader ConfigMap (or Endpoint) is being updated ○ If there are no updates during TTL => do leader election
  22. 22. 22 Kubernetes-native Patroni Node Pod: demo-0 role: replica PersistentVolume PersistentVolume Node Pod: demo-1 role: master StatefulSet: demo Endpoint: demo Service: demo Secret: demo UPDATE() W ATCH() Endpoint: demo-config Service: demo-replica labelSelector: role=replica
  23. 23. 23 DEMO TIME
  24. 24. 24 ● No dependency on Etcd ● When using Endpoint for leader election we can also maintain subsets with the IP of the leader Pod ● 100% Kubernetes-native solution Kubernetes API as DCS CONSPROS ● Can’t tolerante arbitrary clock skew rate ● OpenShift doesn’t allow to put IP from the Pods rage into the Endpoint ● SLA for K8S API on GCE prommiss only 99.5% availability
  25. 25. 25 DEPLOYMENT
  26. 26. 26 How to deploy it ● kubectl create -f your-cluster.yaml ● Use Patroni Helm Chart + Spilo ● Use postgres-operator
  27. 27. 27 POSTGRES-OPERATOR ● Creates CustomResourceDefinition Postgresql and watches it ● When new Postgresql object is created - deploys a new cluster ○ Creates Secrets, Endpoints, Services and StatefulSet ● When Postgresql object is updated - updates StatefulSet ○ and does a rolling upgrade ● Periodically syncs running clusters with the manifests ● When Postgresql object is deleted - cleans everything up
  29. 29. 29 CLUSTER STATUS
  30. 30. 30 PostgreSQL manifest Stateful set Spilo pod Kubernetes cluster PATRONI Postgres operator pod Endpoint Service Client application Postgres operator config mapCluster secrets Database deployer create create create watch deploy Update with actual master role
  31. 31. 31 Monitoring & Backups ● Things to monitor: ○ Pods status (via K8S API) ○ Patroni & PostgreSQL state ○ Replication state and lag ● Always do Backups! ○ And always test them! GET http://$POD_IP:8008/patroni for every Pod in the cluster, check that state=running and compare xlog_position with the master
  32. 32. 32 Our learnings ● We run Kubernetes on top of AWS infrastructure ○ Availability of K8S API in our case is very close to 100% ○ PersistentVolume (EBS) attach/detach sometimes buggy and slow ● Kubernetes cluster upgrade ○ Require rotating all nodes and can cause multiple switchovers ■ Thanks to postgres-operator it is solved, now we need only one ● Kubernetes node autoscaler ○ Sometimes terminates the nodes were Spilo/Patroni/PostgreSQL runs ■ Patroni handles it gracefully, by doing a switchover
  33. 33. 33 LINKS ● Patroni: ● Patroni Documentation: ● Spilo: ● Helm chart: ● Postgres-operator:
  34. 34. Thank you!