PostgreSQL Scaling And Failover

Overview of PostgreSQL scaling and high availability options.

  1. PostgreSQL High Availability & Scaling
     John Paulett, October 26, 2009
  2. Overview
     – Scaling Overview: Horizontal & Vertical Options
     – High Availability Overview
     – Other Options
     – Suggested Architecture
     – Hardware Discussion
  3–4. What are we trying to solve?
       Survive server failure?
         – Support an uptime SLA (e.g. 99.9999%)?
       Application scaling?
         – Support additional application demand
       → Many options, each optimized for different constraints
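To make the uptime target concrete, the sketch below converts an availability percentage into a yearly downtime budget. It assumes a 365-day year and counts all downtime, planned or unplanned; the function name is illustrative only.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a 365-day year (no leap day)


def downtime_budget_seconds(availability_percent: float) -> float:
    """Maximum total downtime per year allowed by an availability target."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999, 99.9999):
        budget = downtime_budget_seconds(target)
        print(f"{target}%: {budget:,.0f} seconds/year ({budget / 60:,.1f} minutes)")
```

At 99.9999% the budget works out to roughly 32 seconds a year, which leaves very little room for any failover that needs a human in the loop.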
  5. Scaling Overview
  6. How To Scale
     Horizontal Scaling
       – "Google" approach
       – Distribute load across multiple servers
       – Requires appropriate application architecture
     Vertical Scaling
       – "Big Iron" approach
       – Single, massive machine (lots of fast processors, RAM, & hard drives)
  7. Horizontal DB Scaling
     Load Balancing
       – Distribute operations to multiple servers
     Partitioning
       – Cut up the data (horizontal) or tables (vertical) and put them on separate servers
       – aka "sharding"
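The two kinds of partitioning on this slide can be sketched as routing functions: horizontal partitioning picks a server from a hash of a row's key, while vertical partitioning maps whole tables to servers. All server and table names here are hypothetical, and MD5 is used only because it gives the same answer in every process (Python's built-in `hash()` of strings is randomized per process), not for security.

```python
import hashlib

# Horizontal partitioning ("sharding"): rows of one table are spread across
# servers by a hash of the partition key. Server names are hypothetical.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]

# Vertical partitioning: whole tables live on different servers (hypothetical names).
TABLE_HOME = {"users": "db-users", "orders": "db-orders"}


def shard_for(key: str, shards: list[str] = SHARDS) -> str:
    """Map a partition key to a shard using a hash that is stable across processes."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


def server_for_table(table: str) -> str:
    """Vertical partitioning: look up which server owns a table."""
    return TABLE_HOME[table]
```

One design caveat: with plain modulo hashing, changing the number of shards moves most keys to a different server; consistent hashing is a common alternative when shards are expected to be added later.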
  8. Basic Problem when Load Balancing
     Difficult to keep state consistent across servers (remember ACID), especially when dealing with writes
     4 PostgreSQL load-balancing methods:
       – Master-Slave Replication
       – Statement-Based Replication Middleware
       – Asynchronous Multimaster Replication
       – Synchronous Multimaster Replication
  9. Master-Slave Replication
     Master handles writes, slaves handle reads
     Asynchronous replication
       – Possible data loss on master failure
     Slony-I
       – Does not automatically propagate schema changes
       – Does not offer a single connection point
       – Requires a separate solution for master failures
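Because only the master accepts writes, something has to decide where each statement goes. Below is a minimal sketch of application-side routing, assuming plain `SELECT`s are the only statements safe to send to a slave; the class and server names are hypothetical and no real database is contacted.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the master and spread plain reads across slaves (round robin)."""

    def __init__(self, master: str, slaves: list[str]):
        self.master = master
        self._slaves = itertools.cycle(slaves) if slaves else None

    def route(self, sql: str) -> str:
        tokens = sql.split()
        verb = tokens[0].upper() if tokens else ""
        upper = sql.upper()
        # Row-locking reads must run on the master, since slaves are read-only.
        locking = any(clause in upper for clause in ("FOR UPDATE", "FOR SHARE"))
        is_plain_read = verb == "SELECT" and not locking
        if not is_plain_read or self._slaves is None:
            return self.master
        return next(self._slaves)
```

This is deliberately conservative: anything not recognized as a plain read goes to the master. A `SELECT` that calls a function with side effects (e.g. `nextval()`) would still be misrouted, and because replication is asynchronous, a read sent to a slave right after a write may not see that write yet.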
  10. Statement-Based Replication Middleware
      Intercept SQL queries; send writes to all servers, reads to any server
      Possible issues using random(), CURRENT_TIMESTAMP, & sequences
      pgpool-II
        – Connection Pooling, Replication, Load Balancing, Parallel Queries, Failover
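The `random()` problem can be shown with a toy simulation: when the middleware forwards the same statement text to every server, each server evaluates the non-deterministic function itself and the copies diverge. `CURRENT_TIMESTAMP` (each server's own clock) and sequences (the order of concurrent `nextval` calls) fail in the same way. Evaluating the value once and sending a literal is one mitigation, assumed here for illustration; whether a particular middleware does this depends on the product and version. All class and function names are hypothetical.

```python
import random


class Replica:
    """A toy database node that evaluates random() locally, as a real server would."""

    def __init__(self, seed: int):
        self.rng = random.Random(seed)  # each server has its own random state
        self.rows: list[float] = []

    def execute_insert_random(self) -> None:
        # Stands in for: INSERT INTO t VALUES (random())
        self.rows.append(self.rng.random())

    def execute_insert_literal(self, value: float) -> None:
        # Stands in for: INSERT INTO t VALUES (<constant>)
        self.rows.append(value)


def replicate_statement(replicas: list[Replica]) -> None:
    """Naive statement replication: every node runs the same SQL text."""
    for replica in replicas:
        replica.execute_insert_random()


def replicate_with_literal(replicas: list[Replica], rng: random.Random) -> None:
    """Resolve the non-deterministic value once, then send the same constant everywhere."""
    value = rng.random()
    for replica in replicas:
        replica.execute_insert_literal(value)
```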
  11. pgpool-II
  12. Synchronous Multimaster Replication
      Writes & reads on any server
      Not implemented in PostgreSQL, but application code can mimic it via two-phase commit
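Two-phase commit can be sketched as a coordinator that first asks every node to prepare and commits only if all of them succeed. PostgreSQL exposes the per-node steps as `PREPARE TRANSACTION`, `COMMIT PREPARED`, and `ROLLBACK PREPARED` (which need `max_prepared_transactions` set above zero); the simulation below only records those commands as strings, and `will_prepare` is a hypothetical switch for simulating a node that cannot commit.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """One database node; `will_prepare` simulates whether PREPARE TRANSACTION succeeds."""
    name: str
    will_prepare: bool = True
    state: str = "idle"
    log: list[str] = field(default_factory=list)

    def prepare(self, gid: str) -> bool:
        if self.will_prepare:
            self.state = "prepared"
            self.log.append(f"PREPARE TRANSACTION '{gid}'")
            return True
        self.state = "aborted"  # a failed prepare rolls the local transaction back
        self.log.append("ROLLBACK")
        return False

    def commit(self, gid: str) -> None:
        self.state = "committed"
        self.log.append(f"COMMIT PREPARED '{gid}'")

    def rollback(self, gid: str) -> None:
        if self.state == "prepared":
            self.log.append(f"ROLLBACK PREPARED '{gid}'")
        self.state = "aborted"


def two_phase_commit(participants: list[Participant], gid: str) -> bool:
    # Phase 1: every node must promise that it can commit.
    votes = [p.prepare(gid) for p in participants]
    if all(votes):
        # Phase 2a: unanimous "yes", so commit everywhere.
        for p in participants:
            p.commit(gid)
        return True
    # Phase 2b: at least one "no", so undo the nodes that did prepare.
    for p in participants:
        p.rollback(gid)
    return False
```

The hard part this sketch leaves out is coordinator failure: if the coordinator crashes between the two phases, prepared transactions stay open (holding their locks) until something resolves them.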
  13. Load Balancing Issue
      Scaling writes breaks down at a certain point: every server still has to apply every write
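Why write scaling breaks down can be shown with a deliberately simple model: N fully replicated servers, each able to handle C operations per second; reads are split evenly across servers, but every write must be applied on every server. The per-server load is then T(1 - w)/N + T·w for total throughput T and write fraction w, so the ceiling is C / ((1 - w)/N + w), which approaches C / w as N grows. The model assumes reads and writes cost the same and ignores replication overhead; the function name is illustrative.

```python
def max_throughput(capacity: float, nodes: int, write_fraction: float) -> float:
    """Max total ops/s a fully replicated cluster can sustain under the simple model above."""
    if not 0 <= write_fraction <= 1 or nodes < 1:
        raise ValueError("write_fraction must be in [0, 1] and nodes must be >= 1")
    # Each client operation costs (1 - w)/N of a node for reads plus w on every node for writes.
    per_node_cost = (1 - write_fraction) / nodes + write_fraction
    return capacity / per_node_cost


if __name__ == "__main__":
    for n in (1, 2, 4, 10, 100):
        print(n, round(max_throughput(1000, n, 0.2)))
```

With 20% writes the ceiling never exceeds five times a single server's capacity, however many servers are added; with 100% writes, extra servers add nothing.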
  14. Partitioning
      Requires heavy application modification
      Queries that span partitions are problematic (not supported natively)
      PL/Proxy can help
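The difference between a single-partition query and a cross-partition query can be sketched in Python: the first touches only the partition that owns the key, while the second must ask every partition and merge the partial results. PL/Proxy applies the same idea to database functions that run on one partition or on all of them; this sketch is not PL/Proxy, and the in-memory partitions and function names are hypothetical.

```python
import zlib

# Hypothetical in-memory partitions; each holds (user_id, amount) rows.
PARTITIONS: list[list[tuple[int, float]]] = [[], [], [], []]


def partition_for(user_id: int) -> int:
    # crc32 is used only as a convenient deterministic hash.
    return zlib.crc32(str(user_id).encode("utf-8")) % len(PARTITIONS)


def insert_payment(user_id: int, amount: float) -> None:
    PARTITIONS[partition_for(user_id)].append((user_id, amount))


def user_total(user_id: int) -> float:
    # Single-partition query: only the owning partition is read.
    rows = PARTITIONS[partition_for(user_id)]
    return sum(amount for uid, amount in rows if uid == user_id)


def grand_total() -> float:
    # Cross-partition query: run on every partition, then merge the partial sums.
    partials = [sum(amount for _, amount in rows) for rows in PARTITIONS]
    return sum(partials)
```

Merging is easy for sums and counts, but an average has to be rebuilt from per-partition sums and counts rather than by averaging the per-partition averages; joins across partitions are harder still.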
  15. Vertical DB Scaling
      "Buying a bigger box is quick(ish). Redesigning software is not."
        – Cal Henderson, Flickr
      37signals Basecamp upgraded to a 128 GB DB server: "don't need to pay the complexity tax yet"
        – David Heinemeier Hansson, Ruby on Rails
  16. Sites Running on a Single DB
      Stack Overflow
        – MS SQL, 48GB RAM, RAID 1 for OS, RAID 10 for data
      37signals Basecamp
        – MySQL, 128GB RAM
        – Dell R710 or Dell 2950
  17. High Availability Overview
  18. High Availability
      Application still up even after node failure
        – (Also try to prevent failure with appropriate hardware)
      PostgreSQL High Availability Options
        – pgpool-II
        – Shared Disk Failover
        – File System Replication
        – Warm Standby with Point-In-Time Recovery (PITR)
      Often still need a heartbeat application
  19. Shared Disk Failover
      Use a single disk array to hold the database's data files
        – Network Attached Storage (NAS)
        – Network File System (NFS)
      Disk array is a single point of failure
      Need heartbeat to bring 2nd server online
  20. File System Replication
      File system is mirrored to another computer
      DRBD
        – Linux block-device replication (mirrors the disk underneath the file system)
      Need heartbeat to bring 2nd server online
  21. Point in Time Recovery
      "Log shipping"
        – Write-Ahead Log (WAL) files sent to and replayed on standby
        – Included in PostgreSQL 8.0+
        – Asynchronous: potential loss of data
      Warm Standby
        – Standby's hardware should be very similar to the primary's
        – Need heartbeat to bring 2nd server online
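The data-loss window of asynchronous log shipping can be seen in a toy model: the primary ships WAL only in completed segments (handed to `archive_command`), and the warm standby replays whatever segments it has received. Anything still in the unfinished segment when the primary dies never reaches the standby. Real segments are 16 MB files and `archive_timeout` (8.2+) can force a segment switch to bound this window; the three-record segment size and all class names below are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Primary:
    segment_size: int = 3  # records per segment in this toy model (real segments are 16 MB)
    current: list[str] = field(default_factory=list)
    archive: list[list[str]] = field(default_factory=list)

    def write(self, record: str) -> None:
        self.current.append(record)
        if len(self.current) == self.segment_size:
            # Segment full: ship it (the role archive_command plays in PostgreSQL).
            self.archive.append(self.current)
            self.current = []


@dataclass
class Standby:
    applied: list[str] = field(default_factory=list)
    replayed_segments: int = 0

    def replay(self, archive: list[list[str]]) -> None:
        # Apply every shipped segment not yet replayed, in order.
        for segment in archive[self.replayed_segments:]:
            self.applied.extend(segment)
            self.replayed_segments += 1
```

After seven writes, the standby has replayed six records; if the primary fails at that moment, the seventh (still in `Primary.current`) is lost.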
  22. Heartbeat
      "STONITH" (Shoot The Other Node In The Head)
        – Prevents multiple nodes from each thinking they are the master ("split brain")
      Linux-HA
        – Creates a cluster, takes nodes out when they fail
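The heartbeat-plus-STONITH logic can be sketched as a monitor that declares the primary dead after a missed-heartbeat timeout, fences it, and only then promotes the standby. The `fence` and `promote` callbacks are hypothetical stand-ins; in a Linux-HA setup they would correspond to a STONITH device (such as a remotely controlled power switch) and whatever starts the standby as the new master. This is a sketch of the ordering, not Linux-HA's API.

```python
from typing import Callable


class FailoverMonitor:
    """Toy heartbeat monitor: fence the old primary before promoting the standby."""

    def __init__(self, timeout: float, fence: Callable[[], bool], promote: Callable[[], None]):
        self.timeout = timeout
        self.fence = fence      # hypothetical: returns True once the old primary is confirmed off
        self.promote = promote  # hypothetical: turns the standby into the new primary
        self.last_heartbeat = 0.0
        self.failed_over = False

    def heartbeat(self, now: float) -> None:
        self.last_heartbeat = now

    def check(self, now: float) -> bool:
        """Return True if a failover happened during this check."""
        if self.failed_over or now - self.last_heartbeat <= self.timeout:
            return False
        # Promote only after fencing succeeds; otherwise two masters could accept writes.
        if not self.fence():
            return False
        self.promote()
        self.failed_over = True
        return True
```

The ordering is the point of STONITH: a primary that merely stopped answering heartbeats (for example, behind a network partition) may still be accepting writes, so it must be shut off before another node takes over.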
  23. Additional Options
  24. Additional Options
      Tune PostgreSQL
        – Defaults designed to "run anywhere"
        – pgbench, VACUUM/ANALYZE
      Tune Queries
        – EXPLAIN
      Caching (avoid the database)
        – memcached
        – Ehcache
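"Avoid the database" usually means the cache-aside pattern: look in the cache first, and on a miss load from the database and store the result with an expiry. The in-process `TTLCache` below is only a stand-in for memcached or Ehcache, and `get_user` / `load_from_db` are hypothetical names; the injectable clock exists so expiry can be tested.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """In-process stand-in for memcached/Ehcache: key -> (value, expiry time)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data: dict[str, tuple[Any, float]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._data[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._data[key] = (value, self.clock() + self.ttl)


def get_user(cache: TTLCache, user_id: int, load_from_db: Callable[[int], Any]) -> Any:
    """Cache-aside: try the cache, fall back to the database, then populate the cache."""
    key = f"user:{user_id}"
    value = cache.get(key)
    if value is None:
        value = load_from_db(user_id)
        cache.set(key, value)
    return value
```

Two limits of this sketch: a `None` result from the database is never cached, and nothing invalidates the entry when the row changes, so readers can see stale data for up to the TTL unless write paths also delete the key.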
  25. Radical Additional Options
      "NoSQL" databases
        – CouchDB, MongoDB, HBase, Cassandra, Redis
        – Document store
        – Map/Reduce querying
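Map/Reduce querying replaces a SQL `GROUP BY` with two user-supplied functions: a map step that emits key/value pairs from each document and a reduce step that folds the values for each key. CouchDB views, for example, are defined this way (in JavaScript); the Python version below only mirrors the idea and is not any database's API. The documents and function names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Iterable, Iterator


def map_reduce(
    documents: Iterable[dict],
    mapper: Callable[[dict], Iterator[tuple[Any, Any]]],
    reducer: Callable[[list], Any],
) -> dict:
    """Minimal map/reduce: mapper emits (key, value) pairs, reducer folds the values per key."""
    groups: defaultdict = defaultdict(list)
    for doc in documents:
        for key, value in mapper(doc):
            groups[key].append(value)
    return {key: reducer(values) for key, values in groups.items()}


# Hypothetical schema-less documents, as a document store might hold them.
posts = [
    {"author": "alice", "tags": ["postgres", "ha"]},
    {"author": "bob", "tags": ["postgres"]},
    {"author": "alice", "tags": ["caching"]},
]


def tags_mapper(doc: dict) -> Iterator[tuple[str, int]]:
    for tag in doc.get("tags", []):
        yield tag, 1


tag_counts = map_reduce(posts, tags_mapper, sum)
```

Here `tag_counts` plays the role of `SELECT tag, count(*) ... GROUP BY tag` over a normalized tags table.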
  26. Suggested Architecture
  27. Current Production Setup
      DB and web server on the same machine
      No failover
  28. Suggested Architecture
      2 well-equipped machines
      Point in Time Recovery with Heartbeat
      Tune PostgreSQL
      Monitor & improve slow queries
      Add in Ehcache as we touch code
      → Leave horizontal scaling for another day
  29. Initial Architecture
      High Availability
  30. Future Architecture
      Scale application servers out horizontally as needed
      Improve DB hardware
  31. Hardware Options
      PostgreSQL typically constrained by RAM & disk I/O, not processor
      64-bit, as much memory as possible
      Data Array
        – RAID 10 with 4 drives (not RAID 5), 15k RPM
      Separate OS Drive / Array
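The RAID 10 over RAID 5 recommendation is a trade of capacity for random-write speed, and the common textbook write-penalty model makes it concrete: each logical random write costs about 2 back-end I/Os on RAID 10 (both mirror copies) and about 4 on RAID 5 (read data, read parity, write data, write parity). The model ignores controller write caches, and the per-drive IOPS figure you pass in is an assumption, so treat the results as rough comparisons only. Function names are illustrative.

```python
WRITE_PENALTY = {"raid10": 2, "raid5": 4}  # back-end I/Os per logical random write


def usable_drives(level: str, drives: int) -> int:
    """Number of drives' worth of capacity left after mirroring or parity."""
    if level == "raid10":
        if drives < 4 or drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, at least 4")
        return drives // 2
    if level == "raid5":
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return drives - 1
    raise ValueError(f"unknown RAID level: {level}")


def random_iops(level: str, drives: int, per_drive_iops: float, write_fraction: float) -> float:
    """Logical random IOPS the array can serve for a given read/write mix."""
    backend_ios_per_request = (1 - write_fraction) + write_fraction * WRITE_PENALTY[level]
    return drives * per_drive_iops / backend_ios_per_request
```

With 4 drives at an assumed 180 IOPS each and 30% writes, the model gives about 554 random IOPS for RAID 10 versus about 379 for RAID 5, while RAID 5 keeps 3 drives of capacity instead of 2.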
  32. Dell R710
      Processor: Xeon
      4x 15k RPM HD in RAID 10
      24GB (3x 8GB) RAM (up to 6x 16GB)
      Price: $6,905
  33. Other Considerations
      Test environment should mimic Production
        – Same database setup
        – Provides an environment for experimentation
      Can host multiple DBs on a single cluster
  34. References
      http://37signals.com/svn/posts/1509-mr-moore-gets-to-punt-on-sharding
      http://37signals.com/svn/posts/1819-basecamp-now-with-more-vroom
      http://anchor.com.au/hosting/dedicated/Tuning_PostgreSQL_on_your_Dedicated_Server
      http://blogs.amd.co.at/robe/2009/05/testing-postgresql-replication-solutions-log-shipping-with-pg-standby.html
      http://blog.stackoverflow.com/2009/01/new-stack-overflow-servers-ready/
      http://developer.postgresql.org/pgdocs/postgres/high-availability.html
      http://developer.postgresql.org/pgdocs/postgres/pgbench.html
      https://developer.skype.com/SkypeGarage/DbProjects/PlProxy
      http://wiki.postgresql.org/wiki/Performance_Optimization
      http://www.postgresql.org/docs/8.4/static/warm-standby.html
      http://www.postgresql.org/files/documentation/books/aw_pgsql/hw_performance/
      http://www.slony.info/
  35. Additional Links
      http://ehcache.org/
      http://highscalability.com/skype-plans-postgresql-scale-1-billion-users
      http://www.25hoursaday.com/weblog/2009/01/16/BuildingScalableDatabasesProsAndConsOfVariousDatabaseShardingSchemes.aspx
      http://www.danga.com/memcached/
      http://www.mysqlperformanceblog.com/2009/08/06/why-you-dont-want-to-shard/
      http://www.slideshare.net/iamcal/scalable-web-architectures-common-patterns-and-approaches-web-20-expo-nyc-presentation