1.
PostgreSQL
High Availability & Scaling
John Paulett
October 26, 2009
2.
Overview
Scaling Overview
– Horizontal & Vertical Options
High Availability Overview
Other Options
Suggested Architecture
Hardware Discussion
4.
What are we trying to solve?
Survive server failure?
– Support an uptime SLA (e.g. 99.9999%)?
Application scaling?
– Support additional application demand
→ Many options, each optimized for different constraints
6.
How To Scale
Horizontal Scaling
– “Google” approach
– Distribute load across multiple servers
– Requires appropriate application architecture
Vertical Scaling
– “Big Iron” approach
– Single, massive machine (lots of fast processors, RAM, & hard drives)
7.
Horizontal DB Scaling
Load Balancing
– Distribute operations to multiple servers
Partitioning
– Cut up the data (horizontal) or tables (vertical) and put them on separate servers
– aka “sharding”
8.
Basic Problem when Load Balancing
Difficult to maintain consistent state between servers (remember ACID), especially when dealing with writes
4 PostgreSQL Load Balancing Methods:
– Master-Slave Replication
– Statement-Based Replication Middleware
– Asynchronous Multimaster Replication
– Synchronous Multimaster Replication
9.
Master-Slave Replication
Master handles writes, slaves handle reads (application-level routing sketch below)
Asynchronous replication
– Possible data loss on master failure
Slony-I
– Does not automatically propagate schema changes
– Does not offer single connection point
– Requires separate solution for master failures
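Slony-I replicates the data but leaves read/write routing to the application. Below is a minimal sketch of that routing in Python with psycopg2; the host names, credentials, and the visits table are hypothetical, and reads from slaves can lag the master because replication is asynchronous.

```python
# Minimal read/write routing sketch for a master-slave setup (e.g. Slony-I).
# Host names, credentials, and the visits table are hypothetical; Slony-I
# offers no single connection point, so the application picks the server.
import random
import psycopg2

MASTER_DSN = "host=db-master dbname=app user=app password=secret"
SLAVE_DSNS = [
    "host=db-slave1 dbname=app user=app password=secret",
    "host=db-slave2 dbname=app user=app password=secret",
]

master = psycopg2.connect(MASTER_DSN)

def read_conn():
    """Reads can go to any slave (replication lag permitting)."""
    return psycopg2.connect(random.choice(SLAVE_DSNS))

def record_visit(user_id):
    """Writes always go to the master."""
    with master:
        with master.cursor() as cur:
            cur.execute("INSERT INTO visits (user_id) VALUES (%s)", (user_id,))

def recent_visits(user_id):
    """Reads are served by a slave; results may trail the master slightly."""
    conn = read_conn()
    try:
        with conn.cursor() as cur:
            cur.execute(
                "SELECT visited_at FROM visits WHERE user_id = %s "
                "ORDER BY visited_at DESC LIMIT 10",
                (user_id,),
            )
            return cur.fetchall()
    finally:
        conn.close()
```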
10.
Statement-Based Replication Middleware
Intercept SQL queries, send writes to all servers, reads to any server
Possible issues using random(), CURRENT_TIMESTAMP, & sequences (see the sketch below)
pgpool-II
– Connection Pooling, Replication, Load Balancing, Parallel Queries, Failover
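Because statement-based middleware such as pgpool-II replays the SQL text on every server, volatile expressions like random() or CURRENT_TIMESTAMP may evaluate differently on each node. One common workaround, sketched below with psycopg2, is to compute those values once in the application and send them as bind parameters; the connection string and events table are hypothetical.

```python
# Sketch of working around non-deterministic SQL under statement-based
# replication: compute volatile values client-side and pass them as
# parameters, so every replicated statement carries identical values.
# The connection string and events table are hypothetical.
import uuid
from datetime import datetime, timezone

import psycopg2

conn = psycopg2.connect("host=pgpool dbname=app user=app password=secret")

# Risky under statement replication: each node evaluates random() and
# CURRENT_TIMESTAMP independently, so rows may diverge between servers.
#   INSERT INTO events (id, created_at) VALUES (random(), CURRENT_TIMESTAMP)

# Safer: generate the values once in the application.
event_id = str(uuid.uuid4())
created_at = datetime.now(timezone.utc)

with conn, conn.cursor() as cur:
    cur.execute(
        "INSERT INTO events (id, created_at) VALUES (%s, %s)",
        (event_id, created_at),
    )
```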
12.
Synchronous Multimaster Replication
Writes & reads on any server
Not implemented in PostgreSQL, but application code can mimic it via two-phase commit (sketch below)
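A hedged sketch of the application-level mimicry mentioned above: PostgreSQL's two-phase commit (PREPARE TRANSACTION / COMMIT PREPARED) is exposed through psycopg2's tpc_* methods, so the application can prepare a write on every node and commit only if all prepares succeed. Host names and the accounts table are hypothetical, and every participating server needs max_prepared_transactions > 0.

```python
# Sketch of application-level synchronous "multimaster" behavior using
# two-phase commit. Hosts and the accounts table are hypothetical; a real
# implementation also needs recovery handling for transactions left
# prepared after a crash.
import uuid
import psycopg2

DSNS = [
    "host=db-a dbname=app user=app password=secret",
    "host=db-b dbname=app user=app password=secret",
]

def write_everywhere(sql, params):
    conns = [psycopg2.connect(dsn) for dsn in DSNS]
    gtrid = str(uuid.uuid4())  # global transaction id shared by all nodes
    try:
        # Run the statement inside a two-phase transaction on every node.
        for i, conn in enumerate(conns):
            conn.tpc_begin(conn.xid(0, gtrid, "node-%d" % i))
            with conn.cursor() as cur:
                cur.execute(sql, params)
        # Phase 1: PREPARE TRANSACTION on every node.
        for conn in conns:
            conn.tpc_prepare()
        # Phase 2: COMMIT PREPARED, only reached if every prepare succeeded.
        for conn in conns:
            conn.tpc_commit()
    except Exception:
        for conn in conns:
            try:
                conn.tpc_rollback()
            except Exception:
                pass  # node may never have begun the transaction
        raise
    finally:
        for conn in conns:
            conn.close()

write_everywhere("UPDATE accounts SET balance = balance - %s WHERE id = %s",
                 (100, 7))
```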
13.
Load Balancing Issue
Scaling writes breaks down at a certain point
14.
Partitioning
Requires heavy application modification
Performing queries across partitions is problematic, often not possible without extra tooling
PL/Proxy can help (routing sketch below)
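A minimal illustration of the application changes sharding forces: the code below hashes a key to pick the shard that owns it, and shows why a cross-shard query degenerates into a fan-out plus an application-side merge. Shard DSNs and the users table are hypothetical; PL/Proxy can perform similar routing inside the database instead.

```python
# Minimal hash-based sharding sketch: the application picks the shard that
# owns a given key. Shard DSNs and the users table are hypothetical.
import zlib
import psycopg2

SHARD_DSNS = [
    "host=shard0 dbname=app user=app password=secret",
    "host=shard1 dbname=app user=app password=secret",
    "host=shard2 dbname=app user=app password=secret",
]

def shard_for(user_id):
    """Stable hash of the key chooses the shard, so all queries for one
    user land on the same server."""
    bucket = zlib.crc32(str(user_id).encode()) % len(SHARD_DSNS)
    return psycopg2.connect(SHARD_DSNS[bucket])

def get_user(user_id):
    conn = shard_for(user_id)
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT name, email FROM users WHERE id = %s", (user_id,))
            return cur.fetchone()
    finally:
        conn.close()

# Cross-shard queries (e.g. "all users named Ann") now require fanning out
# to every shard and merging the results in the application.
def find_by_name(name):
    rows = []
    for dsn in SHARD_DSNS:
        conn = psycopg2.connect(dsn)
        try:
            with conn.cursor() as cur:
                cur.execute("SELECT id, name FROM users WHERE name = %s", (name,))
                rows.extend(cur.fetchall())
        finally:
            conn.close()
    return rows
```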
15.
Vertical DB Scaling
“Buying a bigger box is quick(ish). Redesigning software is not.”
● Cal Henderson, Flickr
37signals Basecamp upgraded to a 128 GB DB server: “don’t need to pay the complexity tax yet”
● David Heinemeier Hansson, Ruby on Rails
16.
Sites Running on Single DB
StackOverflow
– MS SQL, 48GB RAM, RAID 1 OS, RAID 10 for data
37signals Basecamp
– MySQL, 128GB RAM. Dell R710 or Dell 2950
18.
High Availability
Application still up even after node failure
– (Also try to prevent failure with appropriate hardware)
PostgreSQL High Availability Options
– pgpool-II
– Shared Disk Failover
– File System Replication
– Warm Standby with Point-In-Time Recovery (PITR)
Often still need heartbeat application
19.
Shared Disk Failover
Use single disk array to hold database's data files.
– Network Attached Storage (NAS)
– Network File System (NFS)
Disk array is a single point of failure
Need heartbeat to bring 2nd server online
20.
File System Replication
File system is mirrored to another computer
DRBD
– Linux filesystem replication
Need heartbeat to bring 2nd server online
21.
Point in Time Recovery
“Log shipping”
– Write Ahead Logs sent to and replayed on standby (archive_command sketch below)
– Included in PostgreSQL 8.0+
– Asynchronous - Potential loss of data
Warm Standby
– Standby's hardware should closely match the primary's
– Need heartbeat to bring 2nd server online
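For context, log shipping is driven by the primary's archive_command, which PostgreSQL runs once per completed WAL segment (with %p and %f substituted); the standby replays the archived segments via its restore_command. Below is a rough sketch of a script that could serve as the archive_command; the script path and archive directory are hypothetical.

```python
#!/usr/bin/env python
# Sketch of a WAL archiving script for log shipping. The primary would call
# it via something like (script path is hypothetical):
#   archive_command = '/usr/local/bin/archive_wal.py "%p" "%f"'
# PostgreSQL substitutes %p (path of the WAL segment) and %f (file name).
# A non-zero exit code tells PostgreSQL to retry the segment later.
import os
import shutil
import sys

ARCHIVE_DIR = "/mnt/standby_wal"  # hypothetical mount the standby can read

def main():
    wal_path, wal_name = sys.argv[1], sys.argv[2]
    dest = os.path.join(ARCHIVE_DIR, wal_name)
    if os.path.exists(dest):
        # Never overwrite an already-archived segment.
        sys.exit(1)
    shutil.copy2(wal_path, dest)
    sys.exit(0)

if __name__ == "__main__":
    main()
```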
22.
Heartbeat
“STONITH” (Shoot the Other Node In The Head)
– Prevent multiple nodes thinking they are the master
Linux-HA
– Creates cluster, takes nodes out when they fail
27.
Current Production Setup
DB and Web server on same machine
No failover
28.
Suggested Architecture
2 nice machines
Point in Time Recovery with Heartbeat
Tune PostgreSQL
Monitor & improve slow queries
Add in Ehcache as we touch code
→ Leave horizontal scaling for another day
29.
Initial Architecture
High Availability
30.
Future Architecture
Scale application servers out horizontally as needed
Improve DB Hardware
31.
Hardware Options
PostgreSQL typically constrained by RAM & Disk IO, not processor
64-bit, as much memory as possible
Data Array
– RAID10 with 4 drives (not RAID 5), 15k RPM
Separate OS Drive / Array
32.
Dell R710
Processor: Xeon
4x 15k HD in RAID10
24GB (3x 8GB) RAM (up to 6x 16GB)
=$6,905
33.
Other Considerations
Test environment should mimic Production
– Same database setup
– Provides environment for experimentation
Can host multiple DBs on single cluster