Beginner's Guide to High Availability for Postgres

Presented by:
Borys Neselovskyi, Sales Engineer, Region DACH
16 February 2020

Welcome – Housekeeping Items
• Slides and a recording will be available within the next 48 hours.
• Please submit your questions through the chat; answers will follow during the Q&A.
• We will share information about EDB and Postgres later in the session.

© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
Agenda
1. Concepts of High Availability
2. RPO, RTO and Uptime in High Availability
3. How does High Availability work?
4. High Availability for Postgres using
   • Streaming Replication
   • Logical Replication
5. Postgres parameters for High Availability (Streaming Replication)
6. EDB tools for High Availability management and monitoring

High Availability
High availability (HA) is a characteristic of a system, which aims to ensure an agreed level of
operational performance, usually uptime, for a higher than normal period.
Key principles:
• Eliminate single point of failure
• Reliable crossover
• Detection of failures
Ref: https://en.wikipedia.org/wiki/High_availability

Scheduled/Unscheduled downtime
• Scheduled (planned) downtime is the result of maintenance that disrupts system operation and usually cannot be avoided with the currently installed system design.
• It includes patches to system software that require a reboot, and system configuration changes that only take effect upon a reboot.
• Unscheduled (unplanned) downtime is the result of physical failures or events, such as a hardware or software failure or an environmental anomaly.
• Examples include power outages, failed CPU or RAM components (or other hardware failures), network failures, security breaches, and failures of applications, middleware, or the operating system.

Availability calculation
Availability is expressed as a percentage of uptime in a given year, based on the service level agreement. Some companies exclude planned outages (scheduled downtime) from the calculation, depending on their agreements with customers on the availability of their services.

Availability %                       Downtime per year   Downtime per month   Downtime per week   Downtime per day
99.99% ("four nines")                52.60 minutes       4.38 minutes         1.01 minutes        8.64 seconds
99.995% ("four and a half nines")    26.30 minutes       2.19 minutes         30.24 seconds       4.32 seconds
99.999% ("five nines")               5.26 minutes        26.30 seconds        6.05 seconds        864.00 milliseconds
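
The downtime figures can be reproduced with a few lines of arithmetic. The following is a minimal Python sketch, not part of the original slides; the function name `allowed_downtime` is hypothetical, and it assumes an average year of 365.25 days and a month of one twelfth of that year.

```python
from datetime import timedelta

# Assumption: an average year of 365.25 days; a month is one twelfth of it.
PERIODS = {
    "year": timedelta(days=365.25),
    "month": timedelta(days=365.25 / 12),
    "week": timedelta(weeks=1),
    "day": timedelta(days=1),
}


def allowed_downtime(availability_percent: float) -> dict[str, timedelta]:
    """Return the maximum downtime per period for an availability target."""
    unavailable_fraction = 1 - availability_percent / 100
    return {name: period * unavailable_fraction for name, period in PERIODS.items()}


if __name__ == "__main__":
    for target in (99.99, 99.995, 99.999):
        budget = allowed_downtime(target)
        # Print the downtime budget in seconds for each period.
        print(target, {name: round(d.total_seconds(), 2) for name, d in budget.items()})
```

If planned maintenance is excluded by the service level agreement, only unplanned outages count against this budget.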

Recovery Point Objective (RPO)
RPO is a measurement of time counted back from the failure, disaster, or comparable loss-causing event.
RPO can be used to measure:
• How far back you must go, from the disaster to the last point where data is in a usable format
• How frequently you need to back up your data, although an RPO does not cover additional needs such as restore time and recovery time
• How much data is lost following a disaster or loss-causing event
• Example: RPO = 2 hours
  In case of a crash, I may lose everything I did in the last 2 hours.

Recovery Time Objective (RTO)
The amount of time an application can be down without causing significant damage to the business, and the time it takes for the system to go from loss to recovery.
The recovery process includes:
• The steps that IT must take to return the application and its data to its pre-disaster state

RPO vs. RTO
RPOs and RTOs are key concepts for maintaining business continuity. They function as business metrics for calculating how often your business needs to perform data backups.
• RTOs go hand in hand with recovery point objectives (RPOs), a measurement of time counted back from the failure, disaster, or similar loss-causing event.
• RPOs look back in time to when your data was last usable, typically the most recent backup.
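
To make the difference concrete, the two objectives can be checked against the timeline of a single incident. This is a minimal Python sketch under stated assumptions: the function `check_objectives` and its argument names are hypothetical, and the last usable data is taken to be the most recent backup.

```python
from datetime import datetime, timedelta


def check_objectives(last_backup: datetime, failure: datetime,
                     restored: datetime, rpo: timedelta, rto: timedelta) -> dict:
    """Compare one incident's timeline against the agreed RPO and RTO."""
    # RPO looks backwards from the failure: how much work is lost?
    data_loss_window = failure - last_backup
    # RTO looks forwards from the failure: how long was the service down?
    downtime = restored - failure
    return {
        "data_loss_window": data_loss_window,
        "downtime": downtime,
        "rpo_met": data_loss_window <= rpo,
        "rto_met": downtime <= rto,
    }


if __name__ == "__main__":
    result = check_objectives(
        last_backup=datetime(2021, 2, 16, 8, 0),
        failure=datetime(2021, 2, 16, 9, 30),
        restored=datetime(2021, 2, 16, 10, 15),
        rpo=timedelta(hours=2),
        rto=timedelta(minutes=30),
    )
    print(result)
```

In this illustrative incident the data loss window (1.5 hours) stays within a 2-hour RPO, while the 45-minute outage exceeds a 30-minute RTO.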

Mean Time To Recover (MTTR)
The average time that a device will take to recover from any failure.
• Examples of such devices range from self-resetting fuses (where the MTTR would be very short, probably seconds) up to whole systems which have to be repaired or replaced.
• MTTR is usually part of a maintenance contract, where the user pays more for a system with an MTTR of 24 hours than for one with an MTTR of, say, 7 days.
• This does not mean the supplier guarantees to have the system up and running again within 24 hours (or 7 days) of being notified of the failure.
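
Because MTTR is an average, it can be computed directly from a list of past recovery durations. A minimal Python sketch follows; the function name `mean_time_to_recover` and the sample durations are illustrative assumptions.

```python
from datetime import timedelta


def mean_time_to_recover(recovery_durations: list[timedelta]) -> timedelta:
    """Return the arithmetic mean of the observed recovery durations."""
    if not recovery_durations:
        raise ValueError("at least one recovery duration is required")
    # sum() needs a timedelta start value because the default start is the int 0.
    return sum(recovery_durations, timedelta()) / len(recovery_durations)


if __name__ == "__main__":
    # Hypothetical incident history.
    history = [timedelta(hours=2), timedelta(hours=4), timedelta(hours=6)]
    print(mean_time_to_recover(history))
```

A single long outage can sit well above the mean, which is why an MTTR of 24 hours is not a guarantee for any individual failure.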

Geography Recovery Objectives (GRO)
If a datacenter becomes unavailable, GRO describes how long it takes for the service to become available again.
• It covers the RPO and RTO for making services available across geographies.

High Availability for Postgres

Eliminate Single Point of Failure
• WAL shipping based replication
  • Replication based on the archived WAL
• Streaming replication (SR)
  • Streaming WAL files to one or more standbys
• Logical replication
  • Streaming logical data modifications from the WAL

Hot Standby
• Identical to the primary system
• Data is mirrored in real time
• Allows read queries
• On failure, can replace the primary
• Approaches
  • WAL shipping based
  • Streaming WAL (widely used since 9.0)

Hot Standby: WAL shipping

Monitor: WAL shipping
• Functions on standby
  • pg_is_in_recovery()
  • pg_last_wal_replay_lsn() (pg_last_xlog_replay_location() before PostgreSQL 10)
  • pg_last_xact_replay_timestamp()
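
The standby functions listed on this slide can be combined into one status query. The following Python sketch is an illustration only: the helper names `standby_status_query` and `fetch_standby_status` are hypothetical, and `cursor` is assumed to be any DB-API 2.0 cursor connected to the standby (for example from psycopg2).

```python
def standby_status_query(server_version_num: int) -> str:
    """Build a status query using the function names valid for the server version."""
    # PostgreSQL 10 renamed pg_last_xlog_replay_location() to pg_last_wal_replay_lsn().
    if server_version_num >= 100000:
        replay_fn = "pg_last_wal_replay_lsn"
    else:
        replay_fn = "pg_last_xlog_replay_location"
    return (
        "SELECT pg_is_in_recovery() AS in_recovery, "
        f"{replay_fn}() AS replay_position, "
        "pg_last_xact_replay_timestamp() AS last_replay_time, "
        "now() - pg_last_xact_replay_timestamp() AS replay_delay"
    )


def fetch_standby_status(cursor, server_version_num: int):
    """Run the status query on a DB-API cursor and return the single result row."""
    cursor.execute(standby_status_query(server_version_num))
    return cursor.fetchone()


if __name__ == "__main__":
    print(standby_status_query(120005))  # a PostgreSQL 12 server
    print(standby_status_query(90624))   # a PostgreSQL 9.6 server
```

Note that `replay_delay` keeps growing while the primary is idle, because no new transactions arrive to be replayed, so it should be read together with the replay position.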

Hot Standby: Streaming Replication

Streaming Replication
• Asynchronous Streaming Replication
• Synchronous Streaming Replication
  • synchronous_standby_names, e.g.
    • FIRST 1 (standby_east, standby_west)
    • ANY 3 (standby_east, standby_west, eu_standby_east, eu_standby_west)
    • 'standby_east, standby_west'
  • synchronous_commit: off / local / remote_write / on / remote_apply
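
The difference between the FIRST (priority) and ANY (quorum) forms of synchronous_standby_names can be illustrated with a simplified model. This Python sketch is not how the server is implemented; the function `commit_can_return` and its arguments are hypothetical, and what counts as an acknowledgement depends on synchronous_commit (for example remote_write, on, or remote_apply). A plain list such as 'standby_east, standby_west' behaves like FIRST 1.

```python
def commit_can_return(mode: str, num_sync: int, listed: list[str],
                      connected: set[str], acknowledged: set[str]) -> bool:
    """Simplified model: may a synchronous commit return to the client?"""
    # Only listed standbys that are currently connected can be synchronous.
    candidates = [name for name in listed if name in connected]
    if mode == "FIRST":
        # Priority based: the first num_sync connected standbys, in list order,
        # are the synchronous ones, and every one of them must acknowledge.
        sync_standbys = candidates[:num_sync]
        return len(sync_standbys) == num_sync and all(
            name in acknowledged for name in sync_standbys
        )
    if mode == "ANY":
        # Quorum based: any num_sync of the connected listed standbys suffice.
        return sum(1 for name in candidates if name in acknowledged) >= num_sync
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    names = ["standby_east", "standby_west"]
    both = {"standby_east", "standby_west"}
    # Only standby_west has acknowledged so far.
    print(commit_can_return("FIRST", 1, names, both, {"standby_west"}))
    print(commit_can_return("ANY", 1, names, both, {"standby_west"}))
```

With both standbys connected and only standby_west acknowledging, the FIRST form keeps waiting for standby_east, while the ANY form is already satisfied. If fewer than the required number of standbys are connected, the commit waits in both forms.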

Monitor: Streaming Replication
• Views
  • Master: pg_stat_replication
  • Standby: pg_stat_wal_receiver
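
A common use of these views is to measure how far a standby lags behind the primary in bytes of WAL. The Python sketch below shows the arithmetic on WAL positions (LSNs) as they are displayed, for example `0/3000060`; the helper names are hypothetical. The query string assumes PostgreSQL 10 or later, where `pg_current_wal_lsn()`, `pg_wal_lsn_diff()` and the `replay_lsn` column are available, and lets the server do the same subtraction.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN such as '0/3000060' to a byte position."""
    # An LSN is printed as two hexadecimal numbers: the high and low 32 bits.
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lag_bytes(primary_lsn: str, standby_lsn: str) -> int:
    """Return how many bytes of WAL the standby is behind the primary."""
    return lsn_to_int(primary_lsn) - lsn_to_int(standby_lsn)


# Server-side equivalent, to be run on the primary (PostgreSQL 10 or later).
LAG_QUERY = (
    "SELECT application_name, state, sync_state, "
    "pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes "
    "FROM pg_stat_replication"
)

if __name__ == "__main__":
    print(lag_bytes("0/5000A10", "0/5000000"))
```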

Reliable Crossover & Detection
• In a redundant system, the crossover point itself becomes a single point of failure.
• Fault-tolerant systems must provide a reliable crossover or automatic switchover
mechanism to avoid failure.
• Detection of failures:
• If the above two principles are proactively monitored, then a user may never see a
system failure.

EDB Postgres Failover Manager:
• Continuously monitors your PostgreSQL service to automatically detect failures.
• After an outage is confirmed, Failover Manager automatically promotes the most up-to-date standby database as the new master.
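
The idea of "most up-to-date standby" can be sketched as picking the standby with the highest replayed WAL position. This Python example only illustrates that idea and is not Failover Manager's implementation; the function name `choose_promotion_candidate` and the standby names are hypothetical.

```python
def lsn_key(lsn: str) -> tuple[int, int]:
    """Turn a textual LSN such as 'A/5000000' into a numerically sortable key."""
    high, low = lsn.split("/")
    # Compare as numbers: plain string comparison would rank 'A/0' above '10/0'.
    return (int(high, 16), int(low, 16))


def choose_promotion_candidate(replay_lsn_by_standby: dict[str, str]) -> str:
    """Return the name of the standby that has replayed the most WAL."""
    if not replay_lsn_by_standby:
        raise ValueError("no standby available for promotion")
    return max(replay_lsn_by_standby,
               key=lambda name: lsn_key(replay_lsn_by_standby[name]))


if __name__ == "__main__":
    standbys = {
        "standby_east": "0/5000000",
        "standby_west": "0/5000A10",
        "eu_standby_east": "0/4FFFFFF",
    }
    print(choose_promotion_candidate(standbys))
```

Promoting the standby with the highest replay position minimizes the amount of committed data lost during failover, which ties the choice back to the RPO.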


RPO/RTO/MTTR/GRO
Backup And Recovery Tool

High Availability Monitoring
Postgres Enterprise Manager

Maintenance Window/Planned Downtime
Software Updates/Patching
• Three reasons for software updates
  • Remedy known software issues
  • Improve general stability and reliability of the software
  • Fix security problems

Maintenance Window/Planned Downtime
Software Updates: Strategies
• Three strategies
  • All Nodes Patching
  • Rolling Patching
  • Minimum Downtime Patching

Conclusion
• High Availability components
  • Hot Standby (Streaming Replication)
  • EDB Postgres Failover Manager
  • Postgres Enterprise Manager
  • Backup And Recovery Tool
• Design considerations
  • Near zero downtime software maintenance
  • RPO/RTO/GRO

Resources
• Blog series
• What Does High Availability Really Mean
• Patching Minor Version in Postgres High Availability (HA) Database Cluster
• Plans & Strategies for DBAs
• Key Parameters and Configuration for Streaming Replication in Postgres 12
• Quick and Reliable Failure Detection with EDB Postgres Failover Manager

Expertise
We’re database fanatics who care deeply about PostgreSQL
• Enterprise PostgreSQL innovations
• 4,000+ global customers
• Recognized by Gartner Magic Quadrant for 7 years in a row
• One of the only sub-$1bn revenue companies
• PostgreSQL community leadership
Timeline: 1986 The Design of PostgreSQL; 1996 Birth of PostgreSQL; 2004 EDB is founded; 2020 Today
Features shown: Materialized Views, Parallel Query, JIT Compilation, Heap Only Tuples (HOT), Serializable
[Figure: Gartner Magic Quadrant 2019; quadrants: Challengers, Leaders, Niche Players, Visionaries; axes: ability to execute, completeness of vision]

Market Success

EDB Open Source Leadership
Named EDB open source committers and contributors (grouped on the slide as core team, major contributors, and contributors):
Akshay Joshi, Amul Sul, Ashesh Vashi, Ashutosh Sharma, Jeevan Chalke, Dilip Kumar, Jeevan Ladhe, Mithun Cy, Rushabh Lathia, Amit Khandekar, Amit Langote, Devrim Gündüz, Robert Haas, Bruce Momjian, Dave Page

Learn More
• Next Webinar: March 23, Best Practices in Security with PostgreSQL (German)
• Other resources: Postgres Pulse, EDB YouTube Channel
Thank You
