Fail Over, Fail Back

Talk from pgDay NYC about how to plan and manage failover in a PostgreSQL replication system.


  1. Fail Over, Fail Back. Josh Berkus, PostgreSQL Experts Inc. NYC PgDay 2014
  2. Mozilla logo is a trademark of the Mozilla Corporation. Used here under fair use.
  3. 2 servers, 1 command
  4. (flowchart) Admin executes failover → can we connect to the master? If yes: shut down the master (success, or fail to shut down). If no: what error? (no response vs. other error). Then: is the standby standing by? If yes, bring up the standby.
  5. Automated Failover
  6. Image from libarynth.com, used under Creative Commons Share-Alike.
  7. www.handyrep.org
  8. Fail over
  9. Goals
  10. 1. Minimize Downtime
  11. 2. Minimize data loss
  12. 3. Don't make it worse!
  13. Planned vs. Emergency
  14. Failover once a quarter ● Postgres updates ● Kernel updates ● Disaster Recovery drills ● Just for fun!
  15. Automated or Not? Automated: < 1 hr to recover, risk of false failover, testing testing testing, complex software. Manual: >= 1 hr, the 2am call, training, a simple script.
  16. sysadmin > software
  17. Failover in 3 parts: (1) Detecting Failure (2) Failing Over the DB (3) Failing Over the App
  18. 1. Detecting Failure
  19. Can't connect to master: could not connect to server: Connection refused Is the server running on host "192.168.0.1" and accepting TCP/IP connections on port 5432?
  20. Can't connect to master ● down? ● too busy? ● network problem? ● configuration error?
  21. Can't connect to master ● down? › fail over ● too busy? › don't fail over
  22. pg_isready: pg_isready -h 192.168.1.150 -p 6433 -t 15 → 192.168.1.150:6433 - accepting connections
  23. pg_isready exit codes: 0 == running and accepting connections (even if too busy); 1 == running but rejecting connections (security settings); 2 == not responding (down?)
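A failover controller can shell out to pg_isready and branch on its exit status. A minimal sketch of that branching, assuming the semantics from slide 23 (the decide() helper and its decision strings are my own naming, not from the talk):

```python
import subprocess

def decide(exit_status: int) -> str:
    """Map a pg_isready exit status to a failover decision."""
    if exit_status == 0:
        return "no failover"   # accepting connections, even if too busy
    if exit_status == 1:
        return "no failover"   # running but rejecting (security settings)
    if exit_status == 2:
        return "investigate"   # no response: maybe down, run more checks
    return "error"             # 3 == no attempt made (bad parameters)

def check_master(host: str, port: int, timeout: int = 15) -> str:
    # e.g. pg_isready -h 192.168.1.150 -p 6433 -t 15
    result = subprocess.run(
        ["pg_isready", "-h", host, "-p", str(port), "-t", str(timeout)])
    return decide(result.returncode)
```

Note that exit status 2 alone does not justify failover; it only triggers the deeper checks on the next slides.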
  24. More checks (flowchart): can we ssh to the master? no → master is down; fail over. yes → are there postgres processes on the master? If not, attempt a restart: success → master is OK; no failover; failure → exit with error. Otherwise → exit with error.
  25. Check the replica (flowchart): pg_isready? Is it actually a replica? If both checks pass → OK to fail over; if either fails → exit with error.
  26. Some rules ● don't just ping port 5432 ● misconfiguration is worse than downtime ● tradeoff: › confidence › time to failover
  27. Failover time (seconds): master poll fail: 1–10; ssh to master: 1–10; attempt restart: 3–15; verify replica: 1–5; failover: 3–20; total: 9–60
  28.–29. (diagrams: AppServer One and AppServer Two separated by a network PARTITION)
  30. (diagram: AppServer One and AppServer Two connecting through a Broker)
  31. (diagram: AppServer One and AppServer Two connecting through a Proxy)
  32. Failing Over the DB
  33. Failing Over the DB: 1. choose a replica target 2. shut down the master 3. promote the replica 4. verify the replica 5. remaster the other replicas
  34. Choosing a replica: A. one replica B. designated replica C. furthest-ahead replica
  35. One replica: fail over to it, or don't. Well, that's easy.
  36. Designated replica ● load-free replica, or ● cascade master, or ● synchronous replica
  37. “Furthest ahead” ● pool of replicas ● least data loss ● least downtime ● other replicas can remaster … but what's “furthest ahead”?
  38. receive vs. replay ● receive == the data it has ● replay == the data it has applied
  39. receive vs. replay ● receive == the data it has › “furthest ahead” ● replay == the data it has applied › “most caught up”
  40. receive vs. replay: “get the furthest ahead, but not more than 2 hours behind on replay”
  41. receive vs. replay: “get the furthest ahead, but not more than 1 GB behind on replay”
  42. Timestamp? pg_last_xact_replay_timestamp() ● last transaction commit ● not the last data ● same timestamp, different receive positions
  43. Position? pg_xlog_location_diff() ● compares two XLOG locations ● byte position ● fine-grained comparison
  44. Position? select pg_xlog_location_diff( pg_current_xlog_location(), '0/0000000' ); → 701505732608
  45. Position? ● rep1: 701505732608 ● rep2: 701505737072 ● rep3: 701312124416
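The byte positions above come from interpreting an XLOG location 'X/Y' as a 64-bit offset: the part before the slash is the high 32 bits, the part after is the low 32, both in hex. A client-side sketch of the same arithmetic pg_xlog_location_diff performs (function names are mine):

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert an XLOG location like '0/A2C4D5E8' to an absolute byte offset.
    High 32 bits before the slash, low 32 bits after, both hexadecimal."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def lsn_diff(a: str, b: str) -> int:
    """Client-side equivalent of pg_xlog_location_diff(a, b):
    the number of bytes by which position a leads position b."""
    return lsn_to_bytes(a) - lsn_to_bytes(b)

# Crossing one 32-bit boundary is exactly 4294967296 bytes, which is
# why the badly lagged replica on slide 48 reads as roughly 4 GB behind:
# lsn_diff('1/0', '0/0') == 4294967296
```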
  46. Replay? ● more unreplayed WAL == slower promotion ● figure out the maximum acceptable lag ● “sacrifice” the delayed replica
  47. Replay? SELECT pg_xlog_location_diff( pg_last_xlog_receive_location(), pg_last_xlog_replay_location() ); → 1232132
  48. Replay? The same query on a badly lagged replica: → 4294967296
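Putting receive and replay together, the “furthest ahead, but not too far behind on replay” rule from slides 40–41 might look like this. A pure-Python sketch using the receive positions from slide 45; the replay lags and field names are illustrative assumptions, not from the talk:

```python
def choose_replica(replicas, max_replay_lag):
    """Pick the replica with the highest receive position whose replay
    lag (receive minus replay, in bytes) is within the allowed bound."""
    eligible = [r for r in replicas
                if r["receive"] - r["replay"] <= max_replay_lag]
    if not eligible:
        raise RuntimeError("no promotable replica within the replay-lag limit")
    return max(eligible, key=lambda r: r["receive"])

# Receive positions from slide 45, with illustrative replay lags:
replicas = [
    {"name": "rep1", "receive": 701505732608, "replay": 701505732608 - 1232132},
    {"name": "rep2", "receive": 701505737072, "replay": 701505737072 - 4294967296},
    {"name": "rep3", "receive": 701312124416, "replay": 701312124416},
]
# With a 1 GB replay cap, rep2 is "sacrificed" despite being furthest
# ahead on receive, and rep1 is chosen instead.
```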
  49. Master shutdown ● STONITH ● make sure the master can't restart ● or can't be reached
  50. Terminate or Isolate
  51. Promotion: pg_ctl promote ● make sure it worked ● may have to wait › how long?
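After pg_ctl promote, “make sure it worked” usually means polling until the replica reports it is out of recovery. A sketch with the check injected as a callable so the loop can be scripted and tested; the helper names are mine, and in practice the check might run psql -c "SELECT pg_is_in_recovery()" against the promoted server:

```python
import time

def wait_for_promotion(is_promoted, timeout=30.0, interval=1.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll is_promoted() until it returns True or the timeout expires.
    Returns True on successful promotion, False if we gave up."""
    deadline = clock() + timeout
    while clock() < deadline:
        if is_promoted():
            return True
        sleep(interval)
    return False
```

Injecting clock and sleep keeps the loop deterministic under test; the "how long?" question from the slide becomes an explicit timeout parameter.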
  52. Remastering
  53. Remastering pre-9.3 ● all replicas are set to: recovery_target_timeline = 'latest' ● change primary_conninfo to the new master ● all must pull from a common WAL archive ● restart the replicas
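Concretely, the pre-9.3 recipe above amounts to editing recovery.conf on each surviving replica before restarting it. A sketch of the result; the hostname, user, and archive path are placeholders:

```
# recovery.conf on each remaining replica (pre-9.3)
standby_mode = 'on'
recovery_target_timeline = 'latest'
# repoint at the newly promoted master
primary_conninfo = 'host=new-master port=5432 user=replicator'
# pre-9.3: the timeline history must come from a common WAL archive,
# since it is not streamed until 9.3
restore_command = 'cp /var/lib/postgresql/wal_archive/%f "%p"'
```

The post-9.3 version on slide 61 is the same minus the shared-archive requirement.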
  54.–57. Remastering pre-9.3 (diagram sequence)
  58.–60. Remastering post-9.3 (diagram sequence)
  61. Remastering post-9.3 ● all replicas are set to: recovery_target_timeline = 'latest' ● change primary_conninfo to the new master ● restart the replicas
  62. The restart problem ● must restart to remaster › not likely to change soon ● break connections vs. fall behind
  63. 3. Application Failover
  64. 3. Application Failover ● old master → new master for read-write ● old replicas → new replicas for load balancing ● be fast: prevent split-brain
  65. CMS method: 1. update the Configuration Management System 2. push the change to all application servers
  66. CMS method ● slow ● asynchronous ● hard to confirm 100% complete ● network split?
  67. ZooKeeper method: 1. write the new connection config to ZooKeeper 2. application servers pull connection info from ZooKeeper
  68. ZooKeeper method ● asynchronous › or poor response time ● delay to verify ● network split?
  69. Pacemaker method: 1. the master holds a virtual IP 2. applications connect to the VIP 3. Pacemaker reassigns the VIP on failure
  70. Pacemaker advantages ● 2-node solution (mostly) ● synchronous ● fast ● absolute isolation
  71. Pacemaker drawbacks ● really hard to configure ● poor integration with load-balancing ● automated failure detection too simple › can't be disabled
  72. Proxy method: 1. application servers connect to the DB via proxies 2. change the proxy config 3. restart/reload the proxies
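With pgBouncer (first on the proxy list below) as the proxy, step 2 is a one-line change in the [databases] section followed by a reload. A sketch; the database name and hostnames are placeholders:

```
; pgbouncer.ini before failover
[databases]
appdb = host=old-master port=5432 dbname=appdb

; after failover, repoint the entry and reload pgBouncer
; (SIGHUP, or RELOAD on the admin console):
; appdb = host=new-master port=5432 dbname=appdb
```

Application servers keep their connections to the proxy; only the proxy's backend target changes.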
  73.–74. (diagram sequence: AppServer One and AppServer Two connecting through a Proxy)
  75. Proxies ● pgBouncer ● pgPool ● HAProxy ● Zeus, BigIP, Cisco ● FEMEBE
  76. Failback
  77. What? ● after failover, make the old master the master again
  78. Why? ● the old master is a better machine? ● some server locations are hardcoded? ● doing maintenance on both servers?
  79. Why not? ● bad infrastructure design? ● takes a while? ● need to verify the old master? ● just spin up a new instance?
  80.–81. pg_basebackup (diagram sequence)
  82. rsync ● reduces time/data for the old-master recopy ● doesn't work as well as you'd expect › hint bits
  83. pg_rewind ++ ● uses XLOG + data files to drive the rsync ● super-fast master resync
  84. pg_rewind -- ● not yet stable ● need to have all the XLOGs › doesn't yet support archives ● need checksums › or 9.4's wal_log_hints
  85. Automated Failback
  86. www.handyrep.org. Fork it on Github!
  87. Questions? ● github.com/pgexperts/HandyRep › fork it! ● Josh Berkus: josh@pgexperts.com › PGX: www.pgexperts.com › Blog: www.databasesoup.com. Copyright 2014 PostgreSQL Experts Inc. Released under the Creative Commons Share-Alike 3.0 License. All images, logos, and trademarks are the property of their respective owners and are used under principles of fair use unless otherwise noted.
