SlideShare a Scribd company logo
1 of 47
© Continuent
1/9/2024
PostgreSQL replication strategies
Understanding High Availability and
choosing the right solution
emmanuel.cecchet@continuent.com
emmanuel.cecchet@epfl.ch
Slides available at http://sequoia.continuent.org/Resources
1 © Continuent www.continuent.com
What Drives Database Replication?
/ Availability – Ensure applications remain up and running when
there are hardware/software failures as well as during scheduled
maintenance on database hosts
/ Read Scaling – Distribute queries, reports, and I/O-intensive
operations like backup, e.g., on media or forum web sites
/ Write Scaling – Distribute updates across multiple databases, for
example to support telco message processing or document/web
indexing
/ Super Durable Commit – Ensure that valuable transactions such
as financial or medical data commit to multiple databases to
avoid loss
/ Disaster Recovery – Maintain data and processing resources in
a remote location to ensure business continuity
/ Geo-cluster – Allow users in different geographic locations to
use a local database for processing with automatic
synchronization to other hosts
2 © Continuent www.continuent.com
High availability
/ The magic nines
Percent uptime Downtime/month Downtime/year
99.0% 7.2 hours 3.65 days
99.9% 43.2 minutes 8.76 hours
99.99% 4.32 minutes 52.56 minutes
99.999% 0.43 minutes 5.26 minutes
99.9999% 2.6 seconds 31 seconds
3 © Continuent www.continuent.com
Few definitions
/ MTBF
• Mean Time Between Failure
• Total MTBF of a cluster must combine MTBF of its
individual components
• Consider mean-time-between-system-abort (MTBSA)
or mean-time-between-critical-failure (MTBCF)
/ MTTR
• Mean Time To Repair
• How is the failure detected?
• How is it notified?
• Where are the spare parts for hardware?
• What does your support contract say?
4 © Continuent www.continuent.com
Outline
/ Database replication strategies
/ PostgreSQL replication solutions
/ Building HA solutions
/ Management issues in production
5 © Continuent www.continuent.com
/ Clients connect to the application server
/ Application server builds web pages with data coming from the
database
/ Application server clustering solves application server failure
/ Database outage causes overall system outage
Internet
Database
Database
Disk
Application
servers
Problem: Database is the weakest link
6 © Continuent www.continuent.com
Disk replication/clustering
/ Eliminates the single point of failure (SPOF) on the disk
/ Disk failure does not cause database outage
/ Database outage problem still not solved
Internet
Database
Database disks
Application
servers
7 © Continuent www.continuent.com
/ Multiple database instances share the same disk
/ Disk can be replicated to prevent SPOF on disk
/ No dynamic load balancing
/ Database failure not transparent to users (partial outage)
/ Manual failover + manual cleanup needed
Internet
Databases Database
Disks
Application
servers
Database clustering with shared disk
8 © Continuent www.continuent.com
Master/slave replication
/ Lazy replication at the disk or database level
/ No scalability
/ Data lost at failure time
/ System outage during failover to slave
/ Failover requires client reconfiguration
Internet
Master
Database
Database
Disks
Application
servers
Slave Database
log shipping
hot standby
9 © Continuent www.continuent.com
Internet
Web
frontend
App.
server Master
Scaling the database tier
Master-slave replication
/ Pros
• Good solution for disaster recovery with remote slaves
/ Cons
• failover time/data loss on master failure
• read inconsistencies
• master scalability
10 © Continuent www.continuent.com
Internet
/ Pros
• consistency provided by multi-master replication
/ Cons
• atomic broadcast scalability
• no client side load balancing
• heavy modifications of the database engine
Atomic
broadcast
Scaling the database tier
Atomic broadcast
11 © Continuent www.continuent.com
Scaling the database tier – SMP
Internet
Web
frontend
App.
server
Well-known
database
vendor here
Database
Well-known hardware +
database vendors here
/ Pros
• Performance
/ Cons
• Scalability limit
• Limited reliability
• Cost
12 © Continuent www.continuent.com
Internet
/ Pros
• no client application modification
• database vendor independent
• heterogeneity support
• pluggable replication algorithm
• possible caching
/ Cons
• latency overhead
• might introduce new deadlocks
Middleware-based replication
13 © Continuent www.continuent.com
/ Failures can happen
• in any component
• at any time of a request execution
• in any context (transactional, autocommit)
/ Transparent failover
• masks all failures at any time to the client
• perform automatic retry and preserves consistency
Internet
Sequoia
Transparent failover
14 © Continuent www.continuent.com
Outline
/ Database replication strategies
/ PostgreSQL replication solutions
/ Building HA solutions
/ Management issues in production
15 © Continuent www.continuent.com
Feature pgpool-I pgpool-II PGcluster-I PGcluster-II Slony-I Sequoia
Replication
type
Hot standby Multi-
master
Multi-master Shared disk Master/Slave Multi-
master
Commodity
hardware
Yes Yes Yes No Yes Yes
Application
modifications
No No Yes if
reading from
slaves
No Yes if
reading from
slaves
Client
driver
update
Database
modifications
No No Yes Yes No No
PG support >=7.4 Unix >=7.4 Unix 7.3.9, 7.4.6,
8.0.1 Unix
8.? Unix
only?
>= 7.3.3 All
versions
Data loss on
failure
Yes Yes? No? No Yes No
Failover on
DB failure
Yes Yes Yes No if due to
disk
Yes Yes
Transparent
failover
No No No No No Yes
Disaster
recovery
Yes Yes Yes No if disk Yes Yes
Queries load
balancing
No Yes Yes Yes Yes Yes
PostgreSQL replication solutions compared
16 © Continuent www.continuent.com
Feature pgpool-I pgpool-II PGcluster-I PGcluster-II Slony-I Sequoia
Read scalability Yes Yes Yes Yes Yes Yes
Write
scalability
No No No Yes No No
Query
parallelization
No Yes No No? No No
Replicas 2 up to 128 LB or
replicator
limit
SAN limit unlimited unlimited
Super durable
commit
No Yes No Yes No Yes
Add node on
the fly
No Yes Yes Yes Yes (slave) Yes
Online
upgrades
No No Yes No Yes (small
downtime)
Yes
Heterogeneous
clusters
PG >=7.4
Unix only
PG >=7.4
Unix only
PG PG PG>=7.3.3 Yes
Geo-cluster
support
No No Possible but
don’t use
No Yes Yes
PostgreSQL replication solutions compared
17 © Continuent www.continuent.com
Performance vs Scalability
/ Performance
• latency different from throughput
/ Most solutions don’t provide parallel query execution
• No parallelization of query execution plan
• Query do not go faster when database is not loaded
/ What a perfect load distribution buys you
• Constant response time when load increases
• Better throughput when load surpasses capacity of a single
database
18 © Continuent www.continuent.com
Understanding scalability (1/2)
Performance vs. Time
0
50
100
150
200
250
300
350
400
450
500
00:00:00 01:12:00 02:24:00 03:36:00 04:48:00 06:00:00 07:12:00 08:24:00 09:36:00 10:48:00
Time (sec.)
Response
time
1 Database - Load in users
1 Database - Response time
Sequoia 2 DBs - Load in users
Sequoia 2 DBs - Response time
20 users
Single DB
Sequoia
19 © Continuent www.continuent.com
Understanding scalability (2/2)
Performance vs. Time
0
500
1000
1500
2000
2500
00:00:00 00:28:48 00:57:36 01:26:24 01:55:12 02:24:00 02:52:48
Time (sec.)
Response
time
1 DB - Load in users
1 DB - Response time
Sequoia 2DB - Load in users
Sequoia 2 DB - Response time
90 users
Single DB
Sequoia
20 © Continuent www.continuent.com
RAIDb Concept: Redundant Array of Inexpensive Databases
/ RAIDb controller – creates single virtual db, balances load
/ RAIDb 0,1,2: various performance/fault tolerance tradeoffs
/ New combinations easy to implement
tables
2 & 3
table ...
RAIDb controller
table n-1
table 1 table n
SQL
• partitioning (whole tables)
• no duplication
• no fault tolerance
• at least 2 nodes
RAIDb-0
• mirroring
• performance bounded by
write broadcast
• at least 2 nodes
• uni/cluster certifies only RAIDb-1
RAIDb-1
Full DB
RAIDb controller
SQL
Full DB Full DB Full DB Full DB
• partial replication
• at least 2 copies of each
table for fault tolerance
• at least 3 nodes
table x table y
tables
x & y
Full DB table z
SQL
RAIDb controller
RAIDb-2
21 © Continuent www.continuent.com
JVM
Sequoia
JDBC driver
Sequoia
controller
JVM
PostgreSQL
JDBC Driver PostgreSQL
Sequoia architectural overview
/ Middleware implementing RAIDb
• 100% Java implementation
• open source (Apache v2 License)
/ Two components
• Sequoia driver (JDBC, ODBC, native lib)
• Sequoia Controller
/ Database neutral
22 © Continuent www.continuent.com
Sequoia Controller
Derby
Sequoia driver
Derby
Virtual database 1
Database
Backend
Connection
Manager
Database
Backend
Connection
Manager
Request Manager
Query result cache
Scheduler
Load balancer
Derby JDBC
driver
Derby JDBC
driver
Recovery
Log
Authentication Manager
Derby
Database
Backend
Connection
Manager
Derby JDBC
driver
Sequoia driver
Client application
(Servlet, EJB, ...)
Client application
(Servlet, EJB, ...)
connect myDB
connect login, password
execute SELECT * FROM t
ordering
exec
RR, WRR, LPRF, …
get connection
from pool
update cache
(if available)
Sequoia read request
23 © Continuent www.continuent.com
Sequoia Controller
Distributed Request Manager
Sequoia Controller
Distributed Request Manager
Sequoia driver
Virtual database 1
Database
Backend
Connection
Manager
Database
Backend
Connection
Manager
Derby JDBC
driver
Derby JDBC
driver
Virtual database 2
Database
Backend
Connection
Manager
Database
Backend
Connection
Manager
Request Manager
Query result cache
Scheduler
Load balancer
Authentication Manager
Derby JDBC
driver
Sequoia driver
Client application
(Servlet, EJB, ...)
Sequoia driver
Client application
(Servlet, EJB, ...)
Client application
(Servlet, EJB, ...)
Request Manager
Query result cache
Scheduler
Load balancer
Authentication Manager
Recovery
Log
Recovery
Log
Derby Derby Derby
Recovery
Database
Embedded
Derby
Derby JDBC
driver
Derby
Recovery
Database
Embedded
Derby
Database
Backend
Connection
Manager
Database
Backend
Connection
Manager
Derby JDBC
driver
Derby
Derby JDBC
driver
Derby
Database
Backend
Connection
Manager
Database
Backend
Connection
Manager
Derby JDBC
driver
Derby JDBC
driver
Derby Derby
jdbc:sequoia://node1,node2/myDB
Total order reliable
multicast
Sequoia write request
24 © Continuent www.continuent.com
Alternative replication algorithms
/ GORDA API
• European consortium defining API for pluggable replication
algorithms
/ Sequoia 3.0 GORDA compliant prototype for
PostgreSQL
• Uses triggers to compute write-sets
• Certifies transaction at commit time
• Propagate write-sets to other nodes
/ Tashkent/Tashkent+
• Research prototype developed at EPFL
• Uses workload information for improved load balancing
/ More information
• http://sequoia.continuent.org
• http://gorda.di.uminho.pt/
25 © Continuent www.continuent.com
PostgreSQL specific issues
/ Indeterminist queries
• Macros in queries (now(), current_timestamp, rand(), …)
• Stored procedures, triggers, …
• SELECT … LIMIT can create non-deterministic results in UPDATE statements if
the SELECT does not have an ORDER BY with a unique index:
UPDATE FOO SET KEYVALUE=‘x’ WHERE ID IN
(SELECT ID FROM FOO WHERE KEYVALUE IS NULL LIMIT 10)
/ Sequences
• setval() and nextval() are not rollback
• nextval() can also be called within SELECT
/ Serial type
/ Large objects and OIDs
/ Schema changes
/ User access control
• not stored in database (pg_hba.conf)
• host-based control might be fooled by proxy
• backup/restore with respect to user rights
/ VACUUM
26 © Continuent www.continuent.com
Outline
/ Database replication strategies
/ PostgreSQL replication solutions
/ Building HA solutions
/ Management issues in production
27 © Continuent www.continuent.com
Simple hot-standby solution (1/3)
/ Virtual IP address + Heartbeat for failover
/ Slony-I for replication
28 © Continuent www.continuent.com
Simple hot-standby solution (2/3)
/ Virtual IP address + Heartbeat for failover
/ Linux DRDB for replication
/ Only 1 node serving requests Client Applications
Client Applications
Virtual IP
Postgres
Linux OS
DRBD
Heartbeat
/dev
/drbd0
/dev
/drbd0
Postgres
Linux OS
DRBD
Heartbeat
29 © Continuent www.continuent.com
Simple hot-standby solution (3/3)
/ pgpool for failover
/ proxy might become bottleneck
• requires 3 sockets per client connection
• increased latency
/ Only 1 node serving requests
Client Applications
Client Applications
pgpool
Postgres
1
Postgres
2
30 © Continuent www.continuent.com
Internet
/ Apache clustering
• L4 switch, RR-DNS, One-IP techniques, LVS, Linux-HA, …
/ Web tier clustering
• mod_jk (T4), mod_proxy/mod_rewrite (T5), session replication
/ PostgreSQL multi-master clustering solution
Highly available web site
mod-jk
RR-DNS
31 © Continuent www.continuent.com
Internet
/ Consider MTBF (Mean time between failure) of every hardware and
software component
/ Take MTTR (Mean Time To Repair) into account to prevent long outages
/ Tune accordingly to prevent trashing
Sequoia
Highly available web applications
32 © Continuent www.continuent.com
Building Geo-Clusters
Database
Database
Database
Database
Database
Database
America master
Europe slave
Asia slave
America slave
Europe master
Asia slave
America slave
Europe slave
Asia master
asynchronous
WAN replication
33 © Continuent www.continuent.com
Split brain problem (1/2)
/ This is what you should NOT do:
• At least 2 network adapters in controller
• Use a dedicated network for controller communication
Database
Database
Database
Database
Client servers
Controllers Databases
Network
switch
eth0
eth0
eth1
eth1
eth2
eth2
34 © Continuent www.continuent.com
Split brain problem (2/2)
/ When controllers lose connectivity clients may update
inconsistently each half of the cluster
/ No way to detect this scenario (each half thinks that the other
half has simply failed)
Database
Database
Database
Database
Client servers
Controllers Databases
Network
switch
eth0
eth0
eth1
eth1
eth2
eth2
35 © Continuent www.continuent.com
Avoiding network failure and split-brain
/ Collocate all network traffic using Linux Bonding
/ Replicate all network components (mirror the network configuration)
/ Various configuration options available for bonding (active-backup or
trunking)
Database
Database
Database
Database
Client servers
Controllers Databases
eth1
eth0
bond0
eth1
eth0
bond0
bond0 eth0
eth1
bond0 eth0
eth1
bond0
eth0
eth1
bond0
eth0
eth1
bond0
eth0
eth1
bond0
eth0
eth1
36 © Continuent www.continuent.com
Synchronous GeoClusters
/ Multi-master replication
requires group communication
optimized for WAN
environments
/ Split-brain issues will happen
unless expensive reliable
dedicated links are used
/ Reconciliation procedures are
application dependent
DB 6
DB 5
DB native JDBC driver
DB 7
Sequoia driver
DB 1 DB 2
DB native
JDBC driver
DB 3
DB native
JDBC driver
DB 4
Sequoia controller
Full replication
Sequoia controller
Full replication
Sequoia controller
Full replication
Sequoia controller
Full replication
Sequoia
driver
JVM
Client
program
Sequoia
driver
JVM
Client
program
Sequoia
driver
JVM
Client
program
Sequoia
driver
Sequoia
driver
DB 9
DB native JDBC driver
DB 10
Sequoia controller
Full replication
DB 8
DB 12
DB native JDBC driver
DB 13
Sequoia controller
Full replication
DB 11
37 © Continuent www.continuent.com
Outline
/ Database replication strategies
/ PostgreSQL replication solutions
/ Building HA solutions
/ Management issues in production
38 © Continuent www.continuent.com
Managing a cluster in production
/ Diagnosing reliably cluster status
/ Getting proper notifications/alarms when something
goes wrong
• Standard email or SNMP traps
• Logging is key for diagnostic
/ Minimizing downtime
• Migrating from single database to cluster
• Expanding cluster
• Staging environment is key to test
/ Planned maintenance operations
• Vacuum
• Backup
• Software maintenance (DB, replication software, …)
• Node maintenance (reboot, power cycle, …)
• Site maintenance (in GeoCluster case)
39 © Continuent www.continuent.com
Dealing with failures
/ Sotfware vs Hardware failures
• client application, database, replication software, OS, VM, …
• power outage, node, disk, network, Byzantine failure, …
• Admission control to prevent trashing
/ Detecting failures require proper timeout settings
/ Automated failover procedures
• client and cluster reconfiguration
• dealing with multiple simultaneous failures
• coordination required between different tiers or admin scripts
/ Automatic database resynchronization / node repair
/ Operator errors
• automation to prevent manual intervention
• always keep backups and try procedures on staging environment first
/ Disaster recovery
• minimize data loss but preserve consistency
• provisioning and planning are key
/ Split brain or GeoCluster failover
• requires organization wide coordination
• manual diagnostic/reconfiguration often required
40 © Continuent www.continuent.com
Summary
/ Different replication strategies for different needs
/ Performance ≠ Scalability
/ Manageability becomes THE major issue in
production
41 © Continuent www.continuent.com
/ pgpool: http://pgpool.projects.postgresql.org/
/ PGcluster: http://pgcluster.projects.postgresql.org/
/ Slony: http://slony.info/
/ Sequoia: http://sequoia.continuent.org
/ GORDA: http://gorda.di.uminho.pt/
/ Slides: http://sequoia.continuent.org/Resources
http://www.continuent.org
Links
© Continuent
1/9/2024
Bonus slides
43 © Continuent www.continuent.com
RAIDb-2 for scalability
/ limit replication of heavily written tables to subset of
nodes
/ dynamic replication of temp tables
/ reduces disk space requirements
DB native JDBC driver
DB native JDBC driver DB native JDBC driver
Sequoia controller
RAIDb-2
Sequoia controller
RAIDb-2
Sequoia controller
RAIDb-2
Sequoia
driver
Client
program
Sequoia
driver
Client
program
Sequoia
driver
Client
program
RO + temp
tables
All tables RO tables RO tables
All tables
WO sub1
tables
RO + temp
tables
WO sub2
tables
44 © Continuent www.continuent.com
RAIDb-2 for heterogeneous clustering
/ Migrating from MySQL to Oracle
/ Migrating from Oracle x to Oracle x+1
DB native JDBC driver
DB native JDBC driver Oracle 11h driver
Sequoia controller
RAIDb-2
Sequoia controller
RAIDb-2
Sequoia controller
RAIDb-2
Sequoia
driver
Client
program
Sequoia
driver
Client
program
Sequoia
driver
Client
program
Oracle
migrated
tables
MySQL
Old tables
Oracle
new apps
MySQL
Old tables
Oracle driver
MySQL driver MySQL driver Oracle driver
Oracle
new apps
Oracle
migrated
+ new apps
45 © Continuent www.continuent.com
Server farms with master/slave db replication
/ No need for group communication between controller
/ Admin. operations broadcast to all controllers
RW
Client application
node 1
Sequoia controller 1
ParallelDB
Sequoia driver
...
RO RO
RO
MySQL
master
MySQL
slave
MySQL
slave
MySQL
slave
MySQL JDBC driver
Client application
node 2
Sequoia driver
Client application
node 3
Sequoia driver
Client application
node n-1
Sequoia driver
Client application
node n
Sequoia driver
Sequoia controller 2
ParallelDB
MySQL JDBC driver
...
Sequoia controller x
ParallelDB
MySQL JDBC driver
46 © Continuent www.continuent.com
Composing Sequoia controllers
/ Sequoia controller viewed as single database by client (app. or
other Sequoia controller)
/ No technical limit on composition deepness
/ Backends/controller cannot be shared by multiple controllers
/ Can be expanded dynamically
RO RO
RO
RAC RAC MySQL
master
MySQL
slave
MySQL
slave
MySQL
slave
RAC RAC
SAN
DB native JDBC driver
Sequoia controller
ParallelDB
DB native JDBC driver
Sequoia controller
ParallelDB
Sequoia driver
Sequoia controller
RAIDb-1
DB native JDBC driver
Sequoia controller
RAIDb-2
Sequoia driver
Sequoia controller
RAIDb-1
DB native driver
DB
DB DB DB

More Related Content

Similar to 2007-05-23 Cecchet_PGCon2007.ppt

NetBackup Appliance Family presentation
NetBackup Appliance Family presentationNetBackup Appliance Family presentation
NetBackup Appliance Family presentationSymantec
 
Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5Chris Nauroth
 
Hadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldHadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldDataWorks Summit
 
OVH Lab - Enterprise Cloud Databases
OVH Lab - Enterprise Cloud DatabasesOVH Lab - Enterprise Cloud Databases
OVH Lab - Enterprise Cloud DatabasesOVHcloud
 
Management and Automation of MongoDB Clusters - Slides
Management and Automation of MongoDB Clusters - SlidesManagement and Automation of MongoDB Clusters - Slides
Management and Automation of MongoDB Clusters - SlidesSeveralnines
 
Apache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NY
Apache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NYApache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NY
Apache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NYWangda Tan
 
DB2 pureScale Overview Sept 2010
DB2 pureScale Overview Sept 2010DB2 pureScale Overview Sept 2010
DB2 pureScale Overview Sept 2010Laura Hood
 
Webinar: All-Flash For Databases: 5 Reasons Why Current Systems Are Off Target
Webinar: All-Flash For Databases: 5 Reasons Why Current Systems Are Off TargetWebinar: All-Flash For Databases: 5 Reasons Why Current Systems Are Off Target
Webinar: All-Flash For Databases: 5 Reasons Why Current Systems Are Off TargetStorage Switzerland
 
Start Counting: How We Unlocked Platform Efficiency and Reliability While Sav...
Start Counting: How We Unlocked Platform Efficiency and Reliability While Sav...Start Counting: How We Unlocked Platform Efficiency and Reliability While Sav...
Start Counting: How We Unlocked Platform Efficiency and Reliability While Sav...VMware Tanzu
 
Top10 list planningpostgresdeployment.2014
Top10 list planningpostgresdeployment.2014Top10 list planningpostgresdeployment.2014
Top10 list planningpostgresdeployment.2014EDB
 
MySQL Scalability and Reliability for Replicated Environment
MySQL Scalability and Reliability for Replicated EnvironmentMySQL Scalability and Reliability for Replicated Environment
MySQL Scalability and Reliability for Replicated EnvironmentJean-François Gagné
 
Database failover from client perspective
Database failover from client perspectiveDatabase failover from client perspective
Database failover from client perspectivePriit Piipuu
 
Db As Behaving Badly... Worst Practices For Database Administrators Rod Colledge
Db As Behaving Badly... Worst Practices For Database Administrators Rod ColledgeDb As Behaving Badly... Worst Practices For Database Administrators Rod Colledge
Db As Behaving Badly... Worst Practices For Database Administrators Rod Colledgesqlserver.co.il
 
Veeam webinar - Deduplication best practices
Veeam webinar - Deduplication best practicesVeeam webinar - Deduplication best practices
Veeam webinar - Deduplication best practicesJoep Piscaer
 
1049: Best and Worst Practices for Deploying IBM Connections - IBM Connect 2016
1049: Best and Worst Practices for Deploying IBM Connections - IBM Connect 20161049: Best and Worst Practices for Deploying IBM Connections - IBM Connect 2016
1049: Best and Worst Practices for Deploying IBM Connections - IBM Connect 2016panagenda
 
The Forefront of the Development for NVDIMM on Linux Kernel
The Forefront of the Development for NVDIMM on Linux KernelThe Forefront of the Development for NVDIMM on Linux Kernel
The Forefront of the Development for NVDIMM on Linux KernelYasunori Goto
 

Similar to 2007-05-23 Cecchet_PGCon2007.ppt (20)

NetBackup Appliance Family presentation
NetBackup Appliance Family presentationNetBackup Appliance Family presentation
NetBackup Appliance Family presentation
 
Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5
 
Hadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldHadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the Field
 
OVH Lab - Enterprise Cloud Databases
OVH Lab - Enterprise Cloud DatabasesOVH Lab - Enterprise Cloud Databases
OVH Lab - Enterprise Cloud Databases
 
Management and Automation of MongoDB Clusters - Slides
Management and Automation of MongoDB Clusters - SlidesManagement and Automation of MongoDB Clusters - Slides
Management and Automation of MongoDB Clusters - Slides
 
Apache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NY
Apache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NYApache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NY
Apache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NY
 
DB2 pureScale Overview Sept 2010
DB2 pureScale Overview Sept 2010DB2 pureScale Overview Sept 2010
DB2 pureScale Overview Sept 2010
 
Webinar: All-Flash For Databases: 5 Reasons Why Current Systems Are Off Target
Webinar: All-Flash For Databases: 5 Reasons Why Current Systems Are Off TargetWebinar: All-Flash For Databases: 5 Reasons Why Current Systems Are Off Target
Webinar: All-Flash For Databases: 5 Reasons Why Current Systems Are Off Target
 
Maximizing Business Continuity and Minimizing Recovery Time Objectives in Win...
Maximizing Business Continuity and Minimizing Recovery Time Objectives in Win...Maximizing Business Continuity and Minimizing Recovery Time Objectives in Win...
Maximizing Business Continuity and Minimizing Recovery Time Objectives in Win...
 
Start Counting: How We Unlocked Platform Efficiency and Reliability While Sav...
Start Counting: How We Unlocked Platform Efficiency and Reliability While Sav...Start Counting: How We Unlocked Platform Efficiency and Reliability While Sav...
Start Counting: How We Unlocked Platform Efficiency and Reliability While Sav...
 
Top10 list planningpostgresdeployment.2014
Top10 list planningpostgresdeployment.2014Top10 list planningpostgresdeployment.2014
Top10 list planningpostgresdeployment.2014
 
MySQL Scalability and Reliability for Replicated Environment
MySQL Scalability and Reliability for Replicated EnvironmentMySQL Scalability and Reliability for Replicated Environment
MySQL Scalability and Reliability for Replicated Environment
 
Database failover from client perspective
Database failover from client perspectiveDatabase failover from client perspective
Database failover from client perspective
 
Db As Behaving Badly... Worst Practices For Database Administrators Rod Colledge
Db As Behaving Badly... Worst Practices For Database Administrators Rod ColledgeDb As Behaving Badly... Worst Practices For Database Administrators Rod Colledge
Db As Behaving Badly... Worst Practices For Database Administrators Rod Colledge
 
Scaling PHP apps
Scaling PHP appsScaling PHP apps
Scaling PHP apps
 
SQL Saturday San Diego
SQL Saturday San DiegoSQL Saturday San Diego
SQL Saturday San Diego
 
Veeam webinar - Deduplication best practices
Veeam webinar - Deduplication best practicesVeeam webinar - Deduplication best practices
Veeam webinar - Deduplication best practices
 
1049: Best and Worst Practices for Deploying IBM Connections - IBM Connect 2016
1049: Best and Worst Practices for Deploying IBM Connections - IBM Connect 20161049: Best and Worst Practices for Deploying IBM Connections - IBM Connect 2016
1049: Best and Worst Practices for Deploying IBM Connections - IBM Connect 2016
 
The Forefront of the Development for NVDIMM on Linux Kernel
The Forefront of the Development for NVDIMM on Linux KernelThe Forefront of the Development for NVDIMM on Linux Kernel
The Forefront of the Development for NVDIMM on Linux Kernel
 
Case Studies on PostgreSQL
Case Studies on PostgreSQLCase Studies on PostgreSQL
Case Studies on PostgreSQL
 

Recently uploaded

DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Celine George
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementmkooblal
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxUnboundStockton
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaVirag Sontakke
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfMr Bounab Samir
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.arsicmarija21
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfUjwalaBharambe
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 

Recently uploaded (20)

DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of management
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docx
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of India
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 

2007-05-23 Cecchet_PGCon2007.ppt

  • 1. © Continuent 1/9/2024 PostgreSQL replication strategies Understanding High Availability and choosing the right solution emmanuel.cecchet@continuent.com emmanuel.cecchet@epfl.ch Slides available at http://sequoia.continuent.org/Resources
  • 2. 1 © Continuent www.continuent.com What Drives Database Replication? / Availability – Ensure applications remain up and running when there are hardware/software failures as well as during scheduled maintenance on database hosts / Read Scaling – Distribute queries, reports, and I/O-intensive operations like backup, e.g., on media or forum web sites / Write Scaling – Distribute updates across multiple databases, for example to support telco message processing or document/web indexing / Super Durable Commit – Ensure that valuable transactions such as financial or medical data commit to multiple databases to avoid loss / Disaster Recovery – Maintain data and processing resources in a remote location to ensure business continuity / Geo-cluster – Allow users in different geographic locations to use a local database for processing with automatic synchronization to other hosts
  • 3. 2 © Continuent www.continuent.com High availability / The magic nines Percent uptime Downtime/month Downtime/year 99.0% 7.2 hours 3.65 days 99.9% 43.2 minutes 8.76 hours 99.99% 4.32 minutes 52.56 minutes 99.999% 0.43 minutes 5.26 minutes 99.9999% 2.6 seconds 31 seconds
  • 4. 3 © Continuent www.continuent.com Few definitions / MTBF • Mean Time Between Failure • Total MTBF of a cluster must combine MTBF of its individual components • Consider mean-time-between-system-abort (MTBSA) or mean-time-between-critical-failure (MTBCF) / MTTR • Mean Time To Repair • How is the failure detected? • How is it notified? • Where are the spare parts for hardware? • What does your support contract say?
  • 5. 4 © Continuent www.continuent.com Outline / Database replication strategies / PostgreSQL replication solutions / Building HA solutions / Management issues in production
  • 6. 5 © Continuent www.continuent.com / Clients connect to the application server / Application server builds web pages with data coming from the database / Application server clustering solves application server failure / Database outage causes overall system outage Internet Database Database Disk Application servers Problem: Database is the weakest link
  • 7. 6 © Continuent www.continuent.com Disk replication/clustering / Eliminates the single point of failure (SPOF) on the disk / Disk failure does not cause database outage / Database outage problem still not solved Internet Database Database disks Application servers
  • 8. 7 © Continuent www.continuent.com / Multiple database instances share the same disk / Disk can be replicated to prevent SPOF on disk / No dynamic load balancing / Database failure not transparent to users (partial outage) / Manual failover + manual cleanup needed Internet Databases Database Disks Application servers Database clustering with shared disk
  • 9. 8 © Continuent www.continuent.com Master/slave replication / Lazy replication at the disk or database level / No scalability / Data lost at failure time / System outage during failover to slave / Failover requires client reconfiguration Internet Master Database Database Disks Application servers Slave Database log shipping hot standby
  • 10. 9 © Continuent www.continuent.com Internet Web frontend App. server Master Scaling the database tier Master-slave replication / Pros • Good solution for disaster recovery with remote slaves / Cons • failover time/data loss on master failure • read inconsistencies • master scalability
  • 11. 10 © Continuent www.continuent.com Internet / Pros • consistency provided by multi-master replication / Cons • atomic broadcast scalability • no client side load balancing • heavy modifications of the database engine Atomic broadcast Scaling the database tier Atomic broadcast
  • 12. 11 © Continuent www.continuent.com Scaling the database tier – SMP Internet Web frontend App. server Well-known database vendor here Database Well-known hardware + database vendors here / Pros • Performance / Cons • Scalability limit • Limited reliability • Cost
  • 13. 12 © Continuent www.continuent.com Internet / Pros • no client application modification • database vendor independent • heterogeneity support • pluggable replication algorithm • possible caching / Cons • latency overhead • might introduce new deadlocks Middleware-based replication
  • 14. 13 © Continuent www.continuent.com / Failures can happen • in any component • at any time of a request execution • in any context (transactional, autocommit) / Transparent failover • masks all failures at any time to the client • perform automatic retry and preserves consistency Internet Sequoia Transparent failover
  • 15. 14 © Continuent www.continuent.com Outline / Database replication strategies / PostgreSQL replication solutions / Building HA solutions / Management issues in production
  • 16. 15 © Continuent www.continuent.com Feature pgpool-I pgpool-II PGcluster-I PGcluster-II Slony-I Sequoia Replication type Hot standby Multi- master Multi-master Shared disk Master/Slave Multi- master Commodity hardware Yes Yes Yes No Yes Yes Application modifications No No Yes if reading from slaves No Yes if reading from slaves Client driver update Database modifications No No Yes Yes No No PG support >=7.4 Unix >=7.4 Unix 7.3.9, 7.4.6, 8.0.1 Unix 8.? Unix only? >= 7.3.3 All versions Data loss on failure Yes Yes? No? No Yes No Failover on DB failure Yes Yes Yes No if due to disk Yes Yes Transparent failover No No No No No Yes Disaster recovery Yes Yes Yes No if disk Yes Yes Queries load balancing No Yes Yes Yes Yes Yes PostgreSQL replication solutions compared
  • 17. 16 © Continuent www.continuent.com Feature pgpool-I pgpool-II PGcluster-I PGcluster-II Slony-I Sequoia Read scalability Yes Yes Yes Yes Yes Yes Write scalability No No No Yes No No Query parallelization No Yes No No? No No Replicas 2 up to 128 LB or replicator limit SAN limit unlimited unlimited Super durable commit No Yes No Yes No Yes Add node on the fly No Yes Yes Yes Yes (slave) Yes Online upgrades No No Yes No Yes (small downtime) Yes Heterogeneous clusters PG >=7.4 Unix only PG >=7.4 Unix only PG PG PG>=7.3.3 Yes Geo-cluster support No No Possible but don’t use No Yes Yes PostgreSQL replication solutions compared
  • 18. 17 © Continuent www.continuent.com Performance vs Scalability / Performance • latency different from throughput / Most solutions don’t provide parallel query execution • No parallelization of query execution plan • Query do not go faster when database is not loaded / What a perfect load distribution buys you • Constant response time when load increases • Better throughput when load surpasses capacity of a single database
  • 19. 18 © Continuent www.continuent.com Understanding scalability (1/2) Performance vs. Time 0 50 100 150 200 250 300 350 400 450 500 00:00:00 01:12:00 02:24:00 03:36:00 04:48:00 06:00:00 07:12:00 08:24:00 09:36:00 10:48:00 Time (sec.) Response time 1 Database - Load in users 1 Database - Response time Sequoia 2 DBs - Load in users Sequoia 2 DBs - Response time 20 users Single DB Sequoia
  • 20. 19 © Continuent www.continuent.com Understanding scalability (2/2) Performance vs. Time 0 500 1000 1500 2000 2500 00:00:00 00:28:48 00:57:36 01:26:24 01:55:12 02:24:00 02:52:48 Time (sec.) Response time 1 DB - Load in users 1 DB - Response time Sequoia 2DB - Load in users Sequoia 2 DB - Response time 90 users Single DB Sequoia
  • 21. 20 © Continuent www.continuent.com RAIDb Concept: Redundant Array of Inexpensive Databases / RAIDb controller – creates single virtual db, balances load / RAIDb 0,1,2: various performance/fault tolerance tradeoffs / New combinations easy to implement tables 2 & 3 table ... RAIDb controller table n-1 table 1 table n SQL • partitioning (whole tables) • no duplication • no fault tolerance • at least 2 nodes RAIDb-0 • mirroring • performance bounded by write broadcast • at least 2 nodes • uni/cluster certifies only RAIDb-1 RAIDb-1 Full DB RAIDb controller SQL Full DB Full DB Full DB Full DB • partial replication • at least 2 copies of each table for fault tolerance • at least 3 nodes table x table y tables x & y Full DB table z SQL RAIDb controller RAIDb-2
  • 22. 21 © Continuent www.continuent.com JVM Sequoia JDBC driver Sequoia controller JVM PostgreSQL JDBC Driver PostgreSQL Sequoia architectural overview / Middleware implementing RAIDb • 100% Java implementation • open source (Apache v2 License) / Two components • Sequoia driver (JDBC, ODBC, native lib) • Sequoia Controller / Database neutral
  • 23. 22 © Continuent www.continuent.com Sequoia Controller Derby Sequoia driver Derby Virtual database 1 Database Backend Connection Manager Database Backend Connection Manager Request Manager Query result cache Scheduler Load balancer Derby JDBC driver Derby JDBC driver Recovery Log Authentication Manager Derby Database Backend Connection Manager Derby JDBC driver Sequoia driver Client application (Servlet, EJB, ...) Client application (Servlet, EJB, ...) connect myDB connect login, password execute SELECT * FROM t ordering exec RR, WRR, LPRF, … get connection from pool update cache (if available) Sequoia read request
  • 24. 23 © Continuent www.continuent.com Sequoia Controller Distributed Request Manager Sequoia Controller Distributed Request Manager Sequoia driver Virtual database 1 Database Backend Connection Manager Database Backend Connection Manager Derby JDBC driver Derby JDBC driver Virtual database 2 Database Backend Connection Manager Database Backend Connection Manager Request Manager Query result cache Scheduler Load balancer Authentication Manager Derby JDBC driver Sequoia driver Client application (Servlet, EJB, ...) Sequoia driver Client application (Servlet, EJB, ...) Client application (Servlet, EJB, ...) Request Manager Query result cache Scheduler Load balancer Authentication Manager Recovery Log Recovery Log Derby Derby Derby Recovery Database Embedded Derby Derby JDBC driver Derby Recovery Database Embedded Derby Database Backend Connection Manager Database Backend Connection Manager Derby JDBC driver Derby Derby JDBC driver Derby Database Backend Connection Manager Database Backend Connection Manager Derby JDBC driver Derby JDBC driver Derby Derby jdbc:sequoia://node1,node2/myDB Total order reliable multicast Sequoia write request
  • 25. 24 © Continuent www.continuent.com Alternative replication algorithms / GORDA API • European consortium defining API for pluggable replication algorithms / Sequoia 3.0 GORDA compliant prototype for PostgreSQL • Uses triggers to compute write-sets • Certifies transaction at commit time • Propagate write-sets to other nodes / Tashkent/Tashkent+ • Research prototype developed at EPFL • Uses workload information for improved load balancing / More information • http://sequoia.continuent.org • http://gorda.di.uminho.pt/
  • 26. 25 © Continuent www.continuent.com PostgreSQL specific issues / Indeterminist queries • Macros in queries (now(), current_timestamp, rand(), …) • Stored procedures, triggers, … • SELECT … LIMIT can create non-deterministic results in UPDATE statements if the SELECT does not have an ORDER BY with a unique index: UPDATE FOO SET KEYVALUE=‘x’ WHERE ID IN (SELECT ID FROM FOO WHERE KEYVALUE IS NULL LIMIT 10) / Sequences • setval() and nextval() are not rollback • nextval() can also be called within SELECT / Serial type / Large objects and OIDs / Schema changes / User access control • not stored in database (pg_hba.conf) • host-based control might be fooled by proxy • backup/restore with respect to user rights / VACUUM
  • 27. 26 © Continuent www.continuent.com Outline / Database replication strategies / PostgreSQL replication solutions / Building HA solutions / Management issues in production
  • 28. 27 © Continuent www.continuent.com Simple hot-standby solution (1/3) / Virtual IP address + Heartbeat for failover / Slony-I for replication
  • 29. 28 © Continuent www.continuent.com Simple hot-standby solution (2/3) / Virtual IP address + Heartbeat for failover / Linux DRDB for replication / Only 1 node serving requests Client Applications Client Applications Virtual IP Postgres Linux OS DRBD Heartbeat /dev /drbd0 /dev /drbd0 Postgres Linux OS DRBD Heartbeat
  • 30. 29 © Continuent www.continuent.com Simple hot-standby solution (3/3) / pgpool for failover / proxy might become bottleneck • requires 3 sockets per client connection • increased latency / Only 1 node serving requests Client Applications Client Applications pgpool Postgres 1 Postgres 2
  • 31. 30 © Continuent www.continuent.com Internet / Apache clustering • L4 switch, RR-DNS, One-IP techniques, LVS, Linux-HA, … / Web tier clustering • mod_jk (T4), mod_proxy/mod_rewrite (T5), session replication / PostgreSQL multi-master clustering solution Highly available web site mod-jk RR-DNS
  • 32. 31 © Continuent www.continuent.com Internet / Consider MTBF (Mean time between failure) of every hardware and software component / Take MTTR (Mean Time To Repair) into account to prevent long outages / Tune accordingly to prevent trashing Sequoia Highly available web applications
  • 33. 32 © Continuent www.continuent.com Building Geo-Clusters Database Database Database Database Database Database America master Europe slave Asia slave America slave Europe master Asia slave America slave Europe slave Asia master asynchronous WAN replication
  • 34. 33 © Continuent www.continuent.com Split brain problem (1/2) / This is what you should NOT do: • At least 2 network adapters in controller • Use a dedicated network for controller communication Database Database Database Database Client servers Controllers Databases Network switch eth0 eth0 eth1 eth1 eth2 eth2
  • 35. 34 © Continuent www.continuent.com Split brain problem (2/2) / When controllers lose connectivity clients may update inconsistently each half of the cluster / No way to detect this scenario (each half thinks that the other half has simply failed) Database Database Database Database Client servers Controllers Databases Network switch eth0 eth0 eth1 eth1 eth2 eth2
  • 36. 35 © Continuent www.continuent.com Avoiding network failure and split-brain / Collocate all network traffic using Linux Bonding / Replicate all network components (mirror the network configuration) / Various configuration options available for bonding (active-backup or trunking) Database Database Database Database Client servers Controllers Databases eth1 eth0 bond0 eth1 eth0 bond0 bond0 eth0 eth1 bond0 eth0 eth1 bond0 eth0 eth1 bond0 eth0 eth1 bond0 eth0 eth1 bond0 eth0 eth1
  • 37. 36 © Continuent www.continuent.com Synchronous GeoClusters / Multi-master replication requires group communication optimized for WAN environments / Split-brain issues will happen unless expensive reliable dedicated links are used / Reconciliation procedures are application dependent DB 6 DB 5 DB native JDBC driver DB 7 Sequoia driver DB 1 DB 2 DB native JDBC driver DB 3 DB native JDBC driver DB 4 Sequoia controller Full replication Sequoia controller Full replication Sequoia controller Full replication Sequoia controller Full replication Sequoia driver JVM Client program Sequoia driver JVM Client program Sequoia driver JVM Client program Sequoia driver Sequoia driver DB 9 DB native JDBC driver DB 10 Sequoia controller Full replication DB 8 DB 12 DB native JDBC driver DB 13 Sequoia controller Full replication DB 11
  • 38. 37 © Continuent www.continuent.com Outline / Database replication strategies / PostgreSQL replication solutions / Building HA solutions / Management issues in production
  • 39. 38 © Continuent www.continuent.com Managing a cluster in production / Diagnosing reliably cluster status / Getting proper notifications/alarms when something goes wrong • Standard email or SNMP traps • Logging is key for diagnostic / Minimizing downtime • Migrating from single database to cluster • Expanding cluster • Staging environment is key to test / Planned maintenance operations • Vacuum • Backup • Software maintenance (DB, replication software, …) • Node maintenance (reboot, power cycle, …) • Site maintenance (in GeoCluster case)
  • 40. 39 © Continuent www.continuent.com Dealing with failures / Sotfware vs Hardware failures • client application, database, replication software, OS, VM, … • power outage, node, disk, network, Byzantine failure, … • Admission control to prevent trashing / Detecting failures require proper timeout settings / Automated failover procedures • client and cluster reconfiguration • dealing with multiple simultaneous failures • coordination required between different tiers or admin scripts / Automatic database resynchronization / node repair / Operator errors • automation to prevent manual intervention • always keep backups and try procedures on staging environment first / Disaster recovery • minimize data loss but preserve consistency • provisioning and planning are key / Split brain or GeoCluster failover • requires organization wide coordination • manual diagnostic/reconfiguration often required
  • 41. 40 © Continuent www.continuent.com Summary / Different replication strategies for different needs / Performance ≠ Scalability / Manageability becomes THE major issue in production
  • 42. 41 © Continuent www.continuent.com / pgpool: http://pgpool.projects.postgresql.org/ / PGcluster: http://pgcluster.projects.postgresql.org/ / Slony: http://slony.info/ / Sequoia: http://sequoia.continuent.org / GORDA: http://gorda.di.uminho.pt/ / Slides: http://sequoia.continuent.org/Resources http://www.continuent.org Links
  • 44. 43 © Continuent www.continuent.com RAIDb-2 for scalability / limit replication of heavily written tables to subset of nodes / dynamic replication of temp tables / reduces disk space requirements DB native JDBC driver DB native JDBC driver DB native JDBC driver Sequoia controller RAIDb-2 Sequoia controller RAIDb-2 Sequoia controller RAIDb-2 Sequoia driver Client program Sequoia driver Client program Sequoia driver Client program RO + temp tables All tables RO tables RO tables All tables WO sub1 tables RO + temp tables WO sub2 tables
  • 45. 44 © Continuent www.continuent.com RAIDb-2 for heterogeneous clustering / Migrating from MySQL to Oracle / Migrating from Oracle x to Oracle x+1 DB native JDBC driver DB native JDBC driver Oracle 11h driver Sequoia controller RAIDb-2 Sequoia controller RAIDb-2 Sequoia controller RAIDb-2 Sequoia driver Client program Sequoia driver Client program Sequoia driver Client program Oracle migrated tables MySQL Old tables Oracle new apps MySQL Old tables Oracle driver MySQL driver MySQL driver Oracle driver Oracle new apps Oracle migrated + new apps
  • 46. 45 © Continuent www.continuent.com Server farms with master/slave db replication / No need for group communication between controller / Admin. operations broadcast to all controllers RW Client application node 1 Sequoia controller 1 ParallelDB Sequoia driver ... RO RO RO MySQL master MySQL slave MySQL slave MySQL slave MySQL JDBC driver Client application node 2 Sequoia driver Client application node 3 Sequoia driver Client application node n-1 Sequoia driver Client application node n Sequoia driver Sequoia controller 2 ParallelDB MySQL JDBC driver ... Sequoia controller x ParallelDB MySQL JDBC driver
  • 47. 46 © Continuent www.continuent.com Composing Sequoia controllers / Sequoia controller viewed as single database by client (app. or other Sequoia controller) / No technical limit on composition deepness / Backends/controller cannot be shared by multiple controllers / Can be expanded dynamically RO RO RO RAC RAC MySQL master MySQL slave MySQL slave MySQL slave RAC RAC SAN DB native JDBC driver Sequoia controller ParallelDB DB native JDBC driver Sequoia controller ParallelDB Sequoia driver Sequoia controller RAIDb-1 DB native JDBC driver Sequoia controller RAIDb-2 Sequoia driver Sequoia controller RAIDb-1 DB native driver DB DB DB DB