Harder, better, faster, stronger.
HA with RelStorage and Postgres
abstract @ PLOG 2014 / simone.deponti@abstract.it
HA?
From the not-very-reliable Wikipedia page
about it:
A = Ut / Tt
Where Ut is the uptime and Tt the total time
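
As a worked example of the formula above, the following minimal sketch computes availability from uptime and total time, and the downtime a given availability target allows per year. The function names and the 99.9% target are illustrative assumptions, not figures from the talk.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # non-leap year, for illustration


def availability(uptime: float, total_time: float) -> float:
    """A = Ut / Tt, with both arguments in the same time unit."""
    if total_time <= 0:
        raise ValueError("total_time must be positive")
    return uptime / total_time


def allowed_downtime(target: float, total_time: float = SECONDS_PER_YEAR) -> float:
    """Downtime (same unit as total_time) permitted by an availability target."""
    return (1.0 - target) * total_time


if __name__ == "__main__":
    # 99 hours up out of 100 hours observed
    print(availability(99, 100))
    # Hours of downtime per year allowed by a hypothetical 99.9% target
    print(allowed_downtime(0.999) / 3600)
```

Inverting the formula this way is often more useful in practice: it turns an abstract percentage into a downtime budget that can be compared against how long a failover actually takes.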
Three rules of HA
1. Eliminate single points of failure
2. Have a reliable failover
3. Detect failures as they occur
It’s a short way to HA
Two elephants in a cluster
1. Use PostgreSQL’s native Streaming
Replication
2. Means explicit master/slave roles at any
gi...
repmgr
1. Developed by phoenicians 2ndQuadrant
2. Works in addition to streaming replication
3. Acts as watchdog and can t...
How it works
1. Continuously and compulsively checks
twitter that the master is alive
2. If the master is unreachable for ...
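
The behaviour outlined on this slide (the transcript gives it in full: if the master is unreachable for more than N seconds, a script is run) can be sketched as a timeout-based watchdog. This is not repmgr's implementation; `MasterWatchdog`, `check`, and `on_failure` are hypothetical names, and the injected clock exists only to make the logic easy to exercise.

```python
import time
from typing import Callable


class MasterWatchdog:
    """Fires on_failure once when check() has failed for longer than timeout."""

    def __init__(
        self,
        check: Callable[[], bool],        # returns True if the master answered
        on_failure: Callable[[], None],   # e.g. would launch a promote script
        timeout: float,                   # the "N seconds" from the slide
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._check = check
        self._on_failure = on_failure
        self._timeout = timeout
        self._clock = clock
        self._last_ok = clock()
        self._fired = False

    def poll(self) -> bool:
        """Run one check; return True only when the failure action fired."""
        now = self._clock()
        if self._check():
            # Master is alive: reset the timer and re-arm the action.
            self._last_ok = now
            self._fired = False
            return False
        if not self._fired and now - self._last_ok > self._timeout:
            self._fired = True
            self._on_failure()
            return True
        return False
```

In a real deployment `poll()` would be called in a loop with a short sleep, and `on_failure` would both run the failover action and send a notification. Firing only once per outage avoids promoting repeatedly while the master stays down.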
repmgr’s gotchas
1. Create a database to store replication info
(do not follow the bad example of the
documentation)
2. Th...
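
The second gotcha (given in full in the transcript: the suggested wal_keep_segments setting can use up to 78GB) comes down to simple arithmetic. The sketch below assumes PostgreSQL's default WAL segment size of 16 MB and a suggested value of 5000 segments; the 5000 figure is inferred from the 78GB number and is not stated on the slide. The function names are illustrative.

```python
WAL_SEGMENT_MIB = 16  # PostgreSQL's default WAL segment size


def wal_retention_gib(wal_keep_segments: int, segment_mib: int = WAL_SEGMENT_MIB) -> float:
    """Approximate disk space (GiB) that retained WAL segments can occupy."""
    return wal_keep_segments * segment_mib / 1024


def segments_for_budget(budget_gib: int, segment_mib: int = WAL_SEGMENT_MIB) -> int:
    """Largest wal_keep_segments value that fits in a disk budget (GiB)."""
    return budget_gib * 1024 // segment_mib


if __name__ == "__main__":
    # Assumed suggested value of 5000 segments
    print(wal_retention_gib(5000))
    # Value to pass instead for a hypothetical 8 GiB budget
    print(segments_for_budget(8))
```

Sizing the value from a disk budget is a trade-off: fewer retained segments save space, but a slave that falls further behind than the retained WAL has to be re-cloned.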
Fail scenarios
1. One node has a catastrophic failure
(easy)
2. There is a total network outage (when the
network goes up ...
Our work is never over
1. Always notify when a failure is detected
2. Investigate ASAP, even if automatic
action was taken...
RelStorage and PostgreSQL
It has a small issue with connections IDLE IN
TRANSACTION.
Check with:
SELECT datname, usename,
q...
Simone Deponti
simone.deponti@abstract.it

HA with RelStorage and Postgres

How to improve the availability of your Plone site using RelStorage and PostgreSQL, with the help of repmgr.

A brief introduction to HA is given, before delving deep into the setup of RelStorage and PostgreSQL, the use of repmgr, and how to avoid common pitfalls and unexpected traps.

Published in: Software, Technology

HA with RelStorage and Postgres

  1. Harder, better, faster, stronger.
     HA with RelStorage and Postgres
     abstract @ PLOG 2014 / simone.deponti@abstract.it
  2. HA?
     From the not-very-reliable Wikipedia page about it:
     A = Ut / Tt
     Where Ut is the uptime and Tt the total time
  3. Three rules of HA
     1. Eliminate single points of failure
     2. Have a reliable failover
     3. Detect failures as they occur
  4. It’s a short way to HA
  5. Two elephants in a cluster
     1. Use PostgreSQL’s native Streaming Replication
     2. Means explicit master/slave roles at any given time
     3. Manual failover (slave promotion) or use repmgr
  6. repmgr
     1. Developed by phoenicians 2ndQuadrant
     2. Works in addition to streaming replication
     3. Acts as watchdog and can take automatic actions (run bash scripts)
     4. You run it on the slave node
     (https://github.com/2ndQuadrant/repmgr)
  7. How it works
     1. Continuously and compulsively checks twitter that the master is alive
     2. If the master is unreachable for more than N seconds, runs a bash script
     It also offers convenient command line tools to check the status of the cluster, promote nodes, and synchronize.
  8. repmgr’s gotchas
     1. Create a database to store replication info (do not follow the bad example of the documentation)
     2. The suggested wal_keep_segments setting is too high and will use up to 78GB; it can be reduced with the -w option
     3. Use custom promote and follow scripts
     4. Launch the daemon with --monitoring-history
  9. Fail scenarios
     1. One node has a catastrophic failure (easy)
     2. There is a total network outage (when the network goes up again, you have a split brain)
     3. There is network partitioning (similar to the above, can be worse)
     repmgr and streaming replication do not perform too well in cases 2 and 3
  10. Our work is never over
     1. Always notify when a failure is detected
     2. Investigate ASAP, even if automatic action was taken
     3. Have the slave try to exclude the master upon promotion
     Example of #3: upon promotion, the slave contacts all the clients and tells them to avoid talking to the master
  11. RelStorage and PostgreSQL
     It has a small issue with connections IDLE IN TRANSACTION. Check with:
     SELECT datname, usename, query_start, state_change, state, query FROM pg_stat_activity;
     It’s not fatal, but might result in locks during backups.
  12. Simone Deponti
     simone.deponti@abstract.it
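
The pg_stat_activity check from slide 11 can be turned into a small monitoring helper. The sketch below is an assumption-laden illustration, not part of the talk: the `WHERE` clause, the `long_idle_transactions` function, and the threshold are additions, and the rows are expected as dictionaries keyed by column name (as a dict-style database cursor would return them). No database connection is made here.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List

# Same columns as the query on the slide, narrowed to the problem state.
IDLE_IN_TRANSACTION_QUERY = """
SELECT datname, usename, query_start, state_change, state, query
FROM pg_stat_activity
WHERE state = 'idle in transaction';
"""


def long_idle_transactions(
    rows: Iterable[Dict],
    now: datetime,
    max_idle: timedelta,
) -> List[Dict]:
    """Return rows that have been idle in transaction longer than max_idle."""
    return [
        row
        for row in rows
        if row["state"] == "idle in transaction"
        and now - row["state_change"] > max_idle
    ]
```

Running such a check shortly before a backup window, and alerting on a non-empty result, gives some warning of the locks mentioned on the slide.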