Deploying Maximum HA Architecture With PostgreSQL


  1. 1. Deploying Maximum HA architecture with PostgreSQL / Denish Patel Database Architect
  2. 2. Who am I? • Denish Patel • Database Architect with OmniTI for more than 5 years • Expertise in PostgreSQL, Oracle, MySQL, MS SQL Server • Contact: denish@omniti.com • Blog: http://denishjpatel.blogspot.com/ • Providing solutions for business problems to deliver • Scalability • Reliability • High Availability • Consistency • Security • We are hiring!! Apply @ l42.org/lg
  3. 3. Agenda • Why do you need HA architecture? • Why PostgreSQL? • Traditional HA Architecture • Goals for Maximum HA • Maximum HA Solution
  4. 4. Assumptions • Consistency and Availability matter (CAP theorem) • It is good to increase MTTF, but MTTR is what you have “real” control over.
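To make the MTTF/MTTR trade-off concrete, the standard steady-state formula is availability = MTTF / (MTTF + MTTR). The sketch below is illustrative only: the function name and the sample figures are assumptions, not numbers from the talk.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the service is up."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical figures: one failure every 1000 hours on average.
slow_recovery = availability(1000, 4)    # 4 hours to recover
fast_recovery = availability(1000, 0.1)  # 6 minutes to recover
print(f"slow: {slow_recovery:.5f}  fast: {fast_recovery:.5f}")
```

With the failure rate held constant, shortening recovery alone raises availability, which is why the rest of the deck concentrates on MTTR.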
  5. 5. Why do you need HA architecture? • Application downtime • Unavailability of data • Loss of productivity • Loss of revenue • Dissatisfied customers
  6. 6. Why do you need HA architecture? • Unplanned outages: system failures, data failures • Planned outages: system changes, data changes • Prevent • Tolerate • Recover fast
  7. 7. Why PostgreSQL? • Best protection at lowest cost • No additional software costs for providing maximum availability compared to closed source databases • Provides free feature sets to prevent outages, tolerate them and recover fast.
  8. 8. Traditional HA Architecture (PostgreSQL 8, diagram): Master Database, Standby Database, Copy WAL files
  9. 9. Traditional HA Architecture (PostgreSQL 9, diagram): Master Database, Hot Standby Database, Streaming Replication, Copy WAL files
  10. 10. Goals for Maximum HA Architecture • 99.99% uptime of application • Reduce MTTR • Planned outages • Unplanned outages
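A quick way to see what a 99.99% uptime goal allows is to convert it into a downtime budget. The helper below is a minimal sketch; the function name and the choice of a 365-day year are assumptions for illustration.

```python
def downtime_budget_minutes(uptime_pct: float, days: float = 365.0) -> float:
    """Minutes of downtime allowed over `days` at the given uptime percentage."""
    if not 0 <= uptime_pct <= 100:
        raise ValueError("uptime_pct must be between 0 and 100")
    total_minutes = days * 24 * 60
    return total_minutes * (1 - uptime_pct / 100)


per_year = downtime_budget_minutes(99.99)            # 365-day year
per_month = downtime_budget_minutes(99.99, days=30)  # 30-day month
print(f"{per_year:.2f} min/year, {per_month:.2f} min/month")
```

At 99.99% the budget is roughly 52.6 minutes per year, and it has to cover planned and unplanned outages together.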
  11. 11. Plan to reduce MTTR • How do you manage failover? • Is it transparent to your application? • Hot backups/dumps • Are you running them on the production server? • Schema backups • How often? Are they under revision control? • WAL file copy scripts • Do all of your prod servers use the same copy of the script? • Where are your reporting queries pointing? • Production DB?
  12. 12. System Failures (Unplanned Outages) • Server node fails • Storage fails • Site fails
  13. 13. Handle System Failures (diagram): inet, App Server, Floating IP/VIP, Master, Failover
  14. 14. Site Failures (Unplanned Outages) • Server node fails • Storage fails • Site fails
  15. 15. Handle Site Failures (diagram): inet, App Server, Floating IP/VIP, Master, Failover, SRHS (streaming replication hot standby), Ship WAL Files, WAL apply, Offsite Bkp
  16. 16. Data Failures (Unplanned Outages) • Human error • Data corruption
  17. 17. Handle Data Failures • PITR slave lag using OMNIpitr • 1 hour lag on WAL apply • Periodic pg_dump of tables from slave • Run pg_extractor • https://github.com/omniti-labs/pg_extractor • Track schema changes in subversion/git
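The point of a lagged PITR slave is that a human error on the master has not yet been replayed there, so there is a window in which to react. The sketch below models that window under a simplifying assumption: the slave applies each WAL record exactly one lag interval after it was generated. The function name and timestamps are hypothetical, and this does not use OMNIpitr itself.

```python
from datetime import datetime, timedelta

APPLY_LAG = timedelta(hours=1)  # from the slide: 1 hour lag on WAL apply


def time_left_to_react(error_at: datetime, now: datetime,
                       lag: timedelta = APPLY_LAG) -> timedelta:
    """Time remaining before the lagged slave replays the erroneous change.

    Returns a zero timedelta once the change has already been replayed.
    """
    remaining = (error_at + lag) - now
    return max(remaining, timedelta(0))


# Hypothetical incident: a bad DELETE ran at 12:00 and was noticed at 12:20.
error_at = datetime(2012, 1, 1, 12, 0)
print(time_left_to_react(error_at, datetime(2012, 1, 1, 12, 20)))
```

Once the window reaches zero the slave holds the same damaged data as the master, which is why the slide pairs the lag with periodic pg_dump backups.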
  18. 18. Data Corruption (Unplanned Outages) • Human error • Data corruption
  19. 19. Handle Data Corruption • File system level backups • Backups on slave database using OMNIpitr • Regular recovery testing • Snapshot backups for faster recovery • Solaris ZFS is recommended! • Monthly pg_dump backups • Backups on slave
  20. 20. System Changes (Planned Outages) • OS upgrade • Database upgrade • Network changes
  21. 21. Handle OS Upgrades (diagram, step 1): Floating IP, Master, SRHS Failover, Read Slave 1, WAL Copy, NAS
  22. 22. Handle OS Upgrades (diagram, step 2): Floating IP, Master, Upgrade OS, SRHS Failover becomes New Master, Read Slave 1, WAL Copy, NAS
  23. 23. Handle OS Upgrades (diagram, step 3): Floating IP, New Master, New SRHS Failover, Read Slave 1, WAL Copy, NAS
  24. 24. System Changes (Planned Outages) • OS upgrade • Database upgrade • Network changes
  25. 25. Handle Database Upgrade (flowchart) • PG 8.3+ and outage acceptable: run pg_upgrade --check • Check passes: pg_upgrade • Check fails: drop incompatible tables before upgrade and restore after • Older than PG 8.3 and outage acceptable: pg_dump / pg_restore • Outage not acceptable: third party replication, i.e. Slony • * Only showing recommended options
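The flowchart on this slide survives only as flattened text, so the function below encodes one plausible reading of it rather than the author's exact diagram. The function name, parameter names, and returned strings are assumptions chosen for illustration.

```python
def recommend_upgrade_path(pg_83_or_later: bool,
                           outage_acceptable: bool,
                           check_passes: bool = True) -> str:
    """Return the upgrade approach suggested by one reading of the flowchart.

    `check_passes` stands for the result of `pg_upgrade --check` and only
    matters when pg_upgrade is an option (PG 8.3+ with an acceptable outage).
    """
    if not outage_acceptable:
        # No downtime window: replicate into the new version instead.
        return "third-party replication (e.g. Slony)"
    if not pg_83_or_later:
        return "pg_dump / pg_restore"
    if check_passes:
        return "pg_upgrade"
    return "drop incompatible tables, run pg_upgrade, restore tables after"


print(recommend_upgrade_path(pg_83_or_later=True, outage_acceptable=True))
```

As the slide notes, only the recommended options are shown; other combinations are possible.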
  26. 26. Handle Data Changes (Planned Outages) • Alter schemas • Data growth
  27. 27. Handle Alter Schemas • Transactional DDL • CREATE OR REPLACE VIEW • NOT VALID constraints • Checks • FKs • Add column without scanning entire table • Nullable • No default
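The NOT VALID and nullable-column techniques can be sketched as SQL generated by small helpers. The helpers below only build statement strings; the function names, the default constraint name, and the sample table names are hypothetical, and identifiers are interpolated without quoting, so this is not safe for untrusted input.

```python
from typing import List, Optional


def add_fk_in_two_steps(table: str, column: str, ref_table: str,
                        ref_column: str, name: Optional[str] = None) -> List[str]:
    """Add a foreign key as NOT VALID first, then validate it separately.

    The first statement skips checking existing rows; the second checks
    them later, at a time of your choosing.
    """
    name = name or f"{table}_{column}_fkey"
    return [
        f"ALTER TABLE {table} ADD CONSTRAINT {name} "
        f"FOREIGN KEY ({column}) REFERENCES {ref_table} ({ref_column}) NOT VALID;",
        f"ALTER TABLE {table} VALIDATE CONSTRAINT {name};",
    ]


def add_nullable_column(table: str, column: str, sql_type: str) -> str:
    """Add a nullable column with no default, as the slide recommends."""
    return f"ALTER TABLE {table} ADD COLUMN {column} {sql_type};"


for stmt in add_fk_in_two_steps("orders", "customer_id", "customers", "id"):
    print(stmt)
print(add_nullable_column("orders", "note", "text"))
```

Splitting the constraint into two statements keeps the expensive check of existing rows out of the initial schema change.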
  28. 28. Handle Data Changes (Planned Outages) • Alter schemas • Data growth
  29. 29. Handle Data Growth: PostgreSQL bloat removal • Offline • VACUUM FULL • CLUSTER • Online • Rebuild index CONCURRENTLY • Rebuild table online using pg_reorg • http://denishjpatel.blogspot.com/2011/03/extreme-training-session-at-pgeast-p90x.html
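One common way to rebuild a bloated index online is to build a replacement with CREATE INDEX CONCURRENTLY and then swap it in. The helper below generates those statements as a minimal sketch; the function name, the `_new` suffix, and the sample index are assumptions, and identifiers are interpolated without quoting.

```python
from typing import List, Sequence


def rebuild_index_online(index: str, table: str,
                         columns: Sequence[str]) -> List[str]:
    """Return SQL that replaces `index` with a freshly built copy.

    CREATE INDEX CONCURRENTLY cannot run inside a transaction block, so
    the first statement has to be executed on its own.
    """
    if not columns:
        raise ValueError("at least one column is required")
    tmp = f"{index}_new"
    cols = ", ".join(columns)
    return [
        f"CREATE INDEX CONCURRENTLY {tmp} ON {table} ({cols});",
        f"DROP INDEX {index};",
        f"ALTER INDEX {tmp} RENAME TO {index};",
    ]


for stmt in rebuild_index_online("orders_created_idx", "orders", ["created_at"]):
    print(stmt)
```

This sketch covers plain indexes only; an index that backs a primary key or unique constraint cannot simply be dropped this way.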
  30. 30. Now we have … • PostgreSQL 9 • PITR • Floating IP • pg_extractor • pg_reorg
  31. 31. Maximum HA Architecture (diagram): App, LB, Floating IP, Master, SRHS Failover Master, SRHS Read Slave 1, WAL apply Read Slave 2, NAS, Bkp
  32. 32. References • PostgreSQL Documentation • http://www.postgresql.org/docs/ • OmniTI Labs • https://labs.omniti.com/ • OMNIpitr • pg_extractor
  33. 33. Thanks • PG Day NYC Conference Committee • OmniTI • You!!
  34. 34. Questions?
