Clustering in PostgreSQL
Because one database server is never enough
(and neither is two)
Chicago PostgreSQL User Group
15th May 2024
postgres=# select * from umair;
-[ RECORD 1 ]-----------------------------
name | Umair Shahid
description | 20+ year PostgreSQL veteran
company | Stormatics
designation | Founder
location | Islamabad, Pakistan
family | Mom, Wife & 2 kids
kid1 | Son, 17 year old
kid2 | Daughter, 14 year old
Our mission is to help businesses scale
PostgreSQL reliably for critical data
On to the topic now!
What is High Availability?
● Remain operational even in the face of hardware or
software failure
● Minimize downtime
● Essential for mission-critical applications that
require 24/7 availability
● Measured in ‘Nines of Availability’
Nines of Availability
Availability            Downtime per year
90% (one nine)          36.53 days
99% (two nines)         3.65 days
99.9% (three nines)     8.77 hours
99.99% (four nines)     52.60 minutes
99.999% (five nines)    5.26 minutes
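The figures in this table follow from simple arithmetic. The sketch below reproduces them, assuming an average year of 365.25 days (the assumption that matches the rounding shown); the function name is illustrative.

```python
# Minutes in an average year (365.25 days), which matches the table's rounding.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for a given availability."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for pct in (90, 99, 99.9, 99.99, 99.999):
        minutes = downtime_minutes_per_year(pct)
        print(f"{pct}%: {minutes:.2f} minutes ({minutes / 60:.2f} hours)")
```

The same function can be applied to any SLA percentage to see what downtime it permits.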
But my database resides
in the cloud, and the
cloud is always available
Right?
Wrong!
Amazon RDS Service Level Agreement
Multi-AZ configurations for MySQL, MariaDB, Oracle, and PostgreSQL are
covered by the Amazon RDS Service Level Agreement ("SLA"). The RDS SLA
affirms that AWS will use commercially reasonable efforts to make Multi-AZ
instances of Amazon RDS available with a Monthly Uptime Percentage of at
least 99.95% during any monthly billing cycle. In the event Amazon RDS does
not meet the Monthly Uptime Percentage commitment, affected customers
will be eligible to receive a service credit.*
99.95% = 4.38 hours of downtime per year!
22 minutes of downtime per month!
* https://aws.amazon.com/rds/ha/
So - what do I do if I want
better reliability for my
mission-critical data?
Clustering!
What is clustering?
[Diagram: Application, Primary, Standby 1, and Standby 2, with arrows labeled Write, Read, and Replicate]
● Multiple database servers work
together to provide redundancy
● Gives the appearance of a single
database server
● Application communicates with
the primary PostgreSQL instance
● Data is replicated to standby
instances
● Auto failover in case the primary
node goes down
What is auto failover?
[Diagram: three numbered panels (1, 2, 3) showing the Application, Primary, Standby 1, Standby 2, and a New Standby as failover proceeds]
● Primary node goes down
● Standby 1 gets promoted to Primary
● Standby 2 becomes subscriber to Standby 1
● New Standby is added to the cluster
● Application talks to the new Primary
Clusters with load balancing
[Diagram: Application, Primary, Standby 1, and Standby 2, with arrows labeled Write, Read, and Replicate]
● Write to the primary PostgreSQL instance and read from standbys
● Data redundancy through replication to two standbys
● Auto failover in case the primary node goes down
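A minimal sketch of the read/write split on this slide, assuming a deliberately naive rule (statements starting with SELECT are reads) and round-robin selection among standbys. The class and node names are hypothetical and do not represent any particular load balancer.

```python
from itertools import cycle


class ReadWriteRouter:
    """Send writes to the primary and spread reads across standbys."""

    def __init__(self, primary: str, standbys: list[str]):
        self.primary = primary
        # Fall back to the primary for reads if no standby is configured.
        self._read_targets = cycle(standbys or [primary])

    def route(self, sql: str) -> str:
        """Return the name of the node that should receive this statement."""
        words = sql.split(None, 1)
        first_word = words[0].upper() if words else ""
        if first_word == "SELECT":
            # Reads rotate through the standbys in round-robin order.
            return next(self._read_targets)
        # Everything else (including empty input) goes to the primary.
        return self.primary
```

A real router needs more care than this: for example, `SELECT ... FOR UPDATE` or a SELECT that calls a data-modifying function must still go to the primary.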
Clusters with backups and disaster recovery
● Off-site backups
● RTO and RPO requirements
dictate configuration
● Point-in-time recovery
It is extremely important to periodically test your backups
[Diagram: Application, Primary, Standby 1, and Standby 2, with arrows labeled Write, Read, and Replicate, plus two Backup targets]
Multi-node clusters with Active-Active configuration*
[Diagram: Application, Active 1, Active 2, and Active 3, with arrows labeled Write, Read, and Replicate]
● Shared-Everything architecture
● Load balancing for read as well as write operations
● Database redundancy to achieve high availability
● Asynchronous replication between nodes for better efficiency
* with conflict resolution at the application layer
Multi-node clusters with data sharding and horizontal scaling
[Diagram: Application, Coordinator, Node 1, Node 2, and Node 3, with arrows labeled Write and Read]
● Shared-Nothing architecture
● Automatic data sharding based
on defined criteria
● Read and write operations are
auto directed to the relevant
node
● Each node can have its own
standbys for high availability
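As an illustration of how a coordinator might direct an operation to the relevant node, the sketch below assumes hash-based sharding on a single key. This is only one possible "defined criteria" (range or list partitioning are alternatives), and the function and node names are hypothetical.

```python
import hashlib


def shard_for_key(key: str, nodes: list[str]) -> str:
    """Pick a node for a row by hashing its sharding key."""
    if not nodes:
        raise ValueError("at least one node is required")
    # Use a stable hash rather than Python's built-in hash(), which is salted
    # per process, so that every coordinator maps a key to the same node.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]
```

Note that plain modulo hashing remaps most keys whenever the number of nodes changes, which is one reason production systems tend to use more elaborate placement schemes.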
Globally distributed clusters
● Spin up clusters on the cloud,
on-prem, bare metal, VMs, or
a hybrid of the above
● Geo fencing for regulatory
compliance and better local
performance
● High availability across data
centers and geographies
Replication - Synchronous vs Asynchronous
Synchronous
● Data is transferred immediately
● Transaction waits for confirmation from replica before it commits
● Ensures data consistency across all nodes
● Performance overhead caused by latency
● Used where data accuracy is critical, even at the expense of performance
Asynchronous
● Data may not be transferred immediately
● Transaction commits without waiting for confirmation from replica
● Data may be inconsistent across nodes
● Faster and more scalable
● Used where performance matters more than data accuracy
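The trade-off can be illustrated with a toy model. This is not how PostgreSQL implements replication; it is a sketch, with hypothetical names, of why an asynchronous commit can be lost on failover while a synchronous one cannot.

```python
class Primary:
    """Toy primary that ships each committed record to one replica."""

    def __init__(self, synchronous: bool):
        self.synchronous = synchronous
        self.log: list[str] = []          # records committed on the primary
        self.replica_log: list[str] = []  # records the replica has received
        self._in_flight: list[str] = []   # sent but not yet received (async only)

    def commit(self, record: str) -> None:
        self.log.append(record)
        if self.synchronous:
            # Synchronous: the commit returns only after the replica has the record.
            self.replica_log.append(record)
        else:
            # Asynchronous: the commit returns immediately; delivery happens later.
            self._in_flight.append(record)

    def deliver_pending(self) -> None:
        """Simulate the network catching up for asynchronous replication."""
        self.replica_log.extend(self._in_flight)
        self._in_flight.clear()

    def records_lost_on_failover(self) -> list[str]:
        """Records the replica would be missing if the primary died right now."""
        return self.log[len(self.replica_log):]
```

In the asynchronous case, whatever has been committed but not yet delivered is exactly what a failover would lose; the synchronous case pays for that guarantee with the replica round trip on every commit.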
#AI
Challenges in
Clustering
● Split brain
● Network latency
● False alarms
● Data inconsistency
Split Brain
Defined
Nodes in a highly available cluster lose connectivity with each other but continue to function independently
Challenge
More than one node believes that it is the primary, leading to inconsistencies and possible data loss
Split Brain - Prevention
● Use a reliable cluster manager
○ Algorithms and heartbeat mechanisms to monitor node availability
○ Make decisions about failovers and promotions
● Quorum-based decision making
○ Majority of nodes must agree on primary node’s status
○ Requires an odd number of nodes
● Witness server
○ Used to achieve a majority in an even-node cluster
○ Does not store data
● Network reliability and redundancy
○ Minimize the risk of partitions due to connectivity issues
○ Redundant network hardware and paths between nodes
○ Reliable cross-datacenter connectivity
● Miscellaneous
○ Monitoring and alerts
○ Regular testing
○ Clear and precise documentation
○ Training
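The quorum rule and the role of a witness can be sketched as follows. This assumes a strict-majority rule in which a node counts itself among the reachable nodes; the function names are hypothetical and not taken from any cluster manager.

```python
def has_quorum(votes_for: int, voting_members: int) -> bool:
    """True when a strict majority of voting members agree."""
    if voting_members <= 0 or not 0 <= votes_for <= voting_members:
        raise ValueError("invalid vote counts")
    return votes_for > voting_members // 2


def may_promote(reachable_data_nodes: int, total_data_nodes: int,
                witness_reachable: bool = False, has_witness: bool = False) -> bool:
    """Decide whether this side of a partition may hold or promote a primary.

    reachable_data_nodes includes the node making the decision.
    The witness votes but stores no data.
    """
    voters = total_data_nodes + (1 if has_witness else 0)
    votes = reachable_data_nodes + (1 if has_witness and witness_reachable else 0)
    return has_quorum(votes, voters)
```

With two data nodes and no witness, a partition leaves each side with one vote out of two, so neither side has a majority. Adding a witness gives the side that can still reach it two votes out of three.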
Split Brain - Resolution
1. Identify the situation
○ Monitoring and alerting are crucial
2. Stop traffic
○ Application will need to pause
3. Determine the most up-to-date node
○ Compare transaction logs, timestamps, transaction IDs, etc.
4. Isolate the nodes from each other
○ Prevent further replication so outdated data does not overwrite the latest data
5. Restore data consistency
○ Apply missed transactions
○ Resolve data conflicts
6. Reconfigure replication
○ Make the most up-to-date node the primary
○ Reinstate the remaining nodes as replicas
7. Confirm integrity of the cluster
○ Monitor and double-check replication
8. Re-enable traffic
○ Allow read-only traffic, confirm reliability, then allow write operations
9. Run a retrospective
○ Thorough analysis of the incident to prevent future occurrences
○ Update docs and training to capture the cause of split brain
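One input to step 3 is each node's write-ahead log position (LSN), which PostgreSQL reports through functions such as `pg_current_wal_lsn()` and `pg_last_wal_replay_lsn()`. The sketch below assumes those values have already been collected as text; the helper names and node names are hypothetical.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to a 64-bit integer."""
    # An LSN is printed as two hexadecimal numbers: high 32 bits / low 32 bits.
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def most_up_to_date(node_lsns: dict[str, str]) -> str:
    """Return the node name whose reported LSN is furthest ahead."""
    if not node_lsns:
        raise ValueError("no nodes supplied")
    # Compare numerically; comparing the text form would rank '9/0' above '10/0'.
    return max(node_lsns, key=lambda name: parse_lsn(node_lsns[name]))
```

Treat the result as a starting point only. After a split brain, the nodes have diverged, so the node with the highest LSN does not necessarily contain the other node's transactions; those still have to be reconciled in step 5.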
Network Latency
Defined
Time delay when data is transmitted from one point to another
Challenge
Delayed replication can result in data loss. Delayed signals can trigger a false positive for failover.
Network Latency - Causes
● Network congestion
● Low quality network hardware
● Distance between nodes
● Virtualization overheads
● Bandwidth limitations
● Security devices and policies
● Transmission medium
Network Latency - Prevention of False Positive
● Adjust heartbeat & timeout settings
○ Fine tune frequency of heartbeat and timeout to match typical network behavior
● High speed & low latency network
○ Investing in high quality networking pays dividends
● Quorum-based decision making
○ Majority of nodes must agree on primary node’s status
○ Requires odd number of nodes or a witness node for tie-breaker
● Employ redundancy
○ Network paths as well as health checks
● Best practices
○ Test and simulate various network conditions
○ Monitoring and alerting for early detection of problems
○ Documentation of rationale behind values chosen
○ Periodic training
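One way to match a heartbeat timeout to typical network behavior is to derive it from measured round-trip times. The sketch below assumes a simple "mean plus k standard deviations" rule with a minimum floor; the multiplier `k=4.0` and the 100 ms floor are illustrative assumptions, not recommendations from the talk.

```python
from statistics import mean, pstdev


def suggest_timeout_ms(rtt_samples_ms: list[float], k: float = 4.0,
                       floor_ms: float = 100.0) -> float:
    """Suggest a heartbeat timeout from observed round-trip times."""
    if not rtt_samples_ms:
        raise ValueError("need at least one RTT sample")
    # Allow for normal jitter: the average plus k standard deviations.
    candidate = mean(rtt_samples_ms) + k * pstdev(rtt_samples_ms)
    # Never go below the floor, even on a very quiet network.
    return max(candidate, floor_ms)
```

Whatever rule is used, recording how the value was derived supports the "documentation of rationale" practice listed on this slide.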
False Alarms
Defined
A problem is reported, but in reality, there is no issue
Challenge
Can trigger a failover when one isn’t required, leading to unnecessary disruptions and impacting performance
False Alarms - Causes
● Network issues
○ Latency, congestion, misconfiguration
● Configuration errors
○ Thresholds set too low?
● Resource constraints
○ High CPU load, memory pressure, I/O bottleneck
● Human error
○ Misreading information, miscommunication of scheduled maintenance, …
● Database locks
○ Long running queries with exclusive locks
False Alarms - Prevention
● Optimized thresholds
○ Best practices, past experience, and some trial and error are required to ensure that the thresholds are configured appropriately
● Regular upgrades and testing
○ Use the latest versions of software and firmware
○ Testing of various use cases can help identify possible misconfigurations
● Resource and performance optimization
○ Regularly monitor resource utilization and tune queries and database for performance
○ Maintenance tasks like vacuum, analyze, …
● Comprehensive monitoring and alerting
○ Monitoring can help with early detection of anomalies
○ Alerts can give early warnings as the database approaches defined thresholds
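A common way to keep a single slow or dropped health check from triggering failover is to require several consecutive failures first. The sketch below assumes that approach; the class name and the default of three failures are hypothetical.

```python
class FailureDetector:
    """Declare a node down only after several consecutive failed health checks."""

    def __init__(self, failures_required: int = 3):
        if failures_required < 1:
            raise ValueError("failures_required must be at least 1")
        self.failures_required = failures_required
        self._consecutive_failures = 0

    def record(self, check_succeeded: bool) -> bool:
        """Record one health check; return True when failover should be considered."""
        if check_succeeded:
            # Any success resets the streak, so isolated blips are ignored.
            self._consecutive_failures = 0
        else:
            self._consecutive_failures += 1
        return self._consecutive_failures >= self.failures_required
```

Raising `failures_required` reduces false alarms but lengthens the time to detect a real outage, so the value is a threshold that needs the tuning described above.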
Data Inconsistency
Defined
Situations where data in different nodes of a cluster becomes out of sync, leading to inconsistent results and potential data corruption
Challenge
Inaccurate query results that vary based on which node is queried. Such issues are very hard to debug.
Data Inconsistency - Causes
● Replication lag
○ Network latency and high workloads can be big contributors
○ Data loss in case of failover
● Split brain
● Incorrect configuration
○ Log shipping configurations
○ Replication slots setup
○ Replication filters
Data Inconsistency - Prevention
● Closely manage asynchronous replication
○ Closely monitor pg_stat_replication for replication lag
○ Place nodes in close proximity and use high quality network hardware
● Regularly check XID across the cluster
● Monitor replication conflicts and resolve promptly
● Regular maintenance and performance optimization
○ Vacuum, analyze, …
○ XID wraparound
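Monitoring `pg_stat_replication` for lag could look something like the sketch below. The query uses standard PostgreSQL names (version 10 or later), but the alias, the helper function, and the idea of passing in already-fetched rows are assumptions made to keep the example independent of any database driver.

```python
from typing import Optional, Sequence, Tuple

# Query an operator might run on the primary. The view, columns, and functions
# are standard PostgreSQL; the replay_lag_bytes alias is illustrative.
LAG_QUERY = """
SELECT application_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication
"""


def lagging_standbys(rows: Sequence[Tuple[str, Optional[int]]],
                     max_lag_bytes: int) -> list:
    """Return the names of standbys whose replay lag exceeds the threshold."""
    flagged = []
    for name, lag_bytes in rows:
        # A NULL lag (None) means the position is unknown, so flag it as well.
        if lag_bytes is None or lag_bytes > max_lag_bytes:
            flagged.append(name)
    return flagged
```

For example, with an assumed threshold of 16 MiB, `lagging_standbys(rows, 16 * 1024 * 1024)` would list every standby that is further behind than that or whose position could not be determined.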
This all sounds really hard
Open source clustering tools for PostgreSQL
● Repmgr
○ https://repmgr.org/
○ GPL v3
○ Provides automatic failover
○ Manage and monitor replication
● pgpool-II
○ https://pgpool.net/
○ Similar to BSD & MIT
○ Middleware between PostgreSQL and client applications
○ Connection pooling, load balancing, caching, and automatic failover
● Patroni
○ https://patroni.readthedocs.io/en/latest/
○ MIT
○ Template for PostgreSQL high availability clusters
○ Automatic failover, configuration management, & cluster management
Questions?
pg_umair

More Related Content

Similar to 20240515 - Chicago PUG - Clustering in PostgreSQL: Because one database server is never enough (and neither is two)

Megastore by Google
Megastore by GoogleMegastore by Google
Megastore by Google
Ankita Kapratwar
 
Design patterns for scaling web applications
Design patterns for scaling web applicationsDesign patterns for scaling web applications
Design patterns for scaling web applications
Ivan Dimitrov
 
Concurrency, Parallelism And IO
Concurrency,  Parallelism And IOConcurrency,  Parallelism And IO
Concurrency, Parallelism And IO
Piyush Katariya
 
Zero Downtime JEE Architectures
Zero Downtime JEE ArchitecturesZero Downtime JEE Architectures
Zero Downtime JEE Architectures
Alexander Penev
 
MongoDB Operational Best Practices (mongosf2012)
MongoDB Operational Best Practices (mongosf2012)MongoDB Operational Best Practices (mongosf2012)
MongoDB Operational Best Practices (mongosf2012)
Scott Hernandez
 
Using galera replication to create geo distributed clusters on the wan
Using galera replication to create geo distributed clusters on the wanUsing galera replication to create geo distributed clusters on the wan
Using galera replication to create geo distributed clusters on the wan
Codership Oy - Creators of Galera Cluster
 
Best Practices Using RTI Connext DDS
Best Practices Using RTI Connext DDSBest Practices Using RTI Connext DDS
Best Practices Using RTI Connext DDS
Real-Time Innovations (RTI)
 
Using galera replication to create geo distributed clusters on the wan
Using galera replication to create geo distributed clusters on the wanUsing galera replication to create geo distributed clusters on the wan
Using galera replication to create geo distributed clusters on the wan
Sakari Keskitalo
 
Using galera replication to create geo distributed clusters on the wan
Using galera replication to create geo distributed clusters on the wanUsing galera replication to create geo distributed clusters on the wan
Using galera replication to create geo distributed clusters on the wan
Sakari Keskitalo
 
Data Management in Cloud Platforms
Data Management in Cloud PlatformsData Management in Cloud Platforms
Data Management in Cloud Platforms
shnkoc
 
MariaDB Server Performance Tuning & Optimization
MariaDB Server Performance Tuning & OptimizationMariaDB Server Performance Tuning & Optimization
MariaDB Server Performance Tuning & Optimization
MariaDB plc
 
Cassandra - A Decentralized Structured Storage System
Cassandra - A Decentralized Structured Storage SystemCassandra - A Decentralized Structured Storage System
Cassandra - A Decentralized Structured Storage System
Varad Meru
 
kranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High loadkranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High load
Krivoy Rog IT Community
 
Using Galera Cluster to Power Geo-distributed Applications on the WAN
Using Galera Cluster to Power Geo-distributed Applications on the WANUsing Galera Cluster to Power Geo-distributed Applications on the WAN
Using Galera Cluster to Power Geo-distributed Applications on the WAN
philip_stoev
 
Maximizing performance via tuning and optimization
Maximizing performance via tuning and optimizationMaximizing performance via tuning and optimization
Maximizing performance via tuning and optimization
MariaDB plc
 
Maximizing performance via tuning and optimization
Maximizing performance via tuning and optimizationMaximizing performance via tuning and optimization
Maximizing performance via tuning and optimization
MariaDB plc
 
Taking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout SessionTaking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout Session
Splunk
 
Exploring Alluxio for Daily Tasks at Robinhood
Exploring Alluxio for Daily Tasks at RobinhoodExploring Alluxio for Daily Tasks at Robinhood
Exploring Alluxio for Daily Tasks at Robinhood
Alluxio, Inc.
 
Building real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark StreamingBuilding real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark Streaming
datamantra
 
Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...
Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...
Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...
Spark Summit
 

Similar to 20240515 - Chicago PUG - Clustering in PostgreSQL: Because one database server is never enough (and neither is two) (20)

Megastore by Google
Megastore by GoogleMegastore by Google
Megastore by Google
 
Design patterns for scaling web applications
Design patterns for scaling web applicationsDesign patterns for scaling web applications
Design patterns for scaling web applications
 
Concurrency, Parallelism And IO
Concurrency,  Parallelism And IOConcurrency,  Parallelism And IO
Concurrency, Parallelism And IO
 
Zero Downtime JEE Architectures
Zero Downtime JEE ArchitecturesZero Downtime JEE Architectures
Zero Downtime JEE Architectures
 
MongoDB Operational Best Practices (mongosf2012)
MongoDB Operational Best Practices (mongosf2012)MongoDB Operational Best Practices (mongosf2012)
MongoDB Operational Best Practices (mongosf2012)
 
Using galera replication to create geo distributed clusters on the wan
Using galera replication to create geo distributed clusters on the wanUsing galera replication to create geo distributed clusters on the wan
Using galera replication to create geo distributed clusters on the wan
 
Best Practices Using RTI Connext DDS
Best Practices Using RTI Connext DDSBest Practices Using RTI Connext DDS
Best Practices Using RTI Connext DDS
 
Using galera replication to create geo distributed clusters on the wan
Using galera replication to create geo distributed clusters on the wanUsing galera replication to create geo distributed clusters on the wan
Using galera replication to create geo distributed clusters on the wan
 
Using galera replication to create geo distributed clusters on the wan
Using galera replication to create geo distributed clusters on the wanUsing galera replication to create geo distributed clusters on the wan
Using galera replication to create geo distributed clusters on the wan
 
Data Management in Cloud Platforms
Data Management in Cloud PlatformsData Management in Cloud Platforms
Data Management in Cloud Platforms
 
MariaDB Server Performance Tuning & Optimization
MariaDB Server Performance Tuning & OptimizationMariaDB Server Performance Tuning & Optimization
MariaDB Server Performance Tuning & Optimization
 
Cassandra - A Decentralized Structured Storage System
Cassandra - A Decentralized Structured Storage SystemCassandra - A Decentralized Structured Storage System
Cassandra - A Decentralized Structured Storage System
 
kranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High loadkranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High load
 
Using Galera Cluster to Power Geo-distributed Applications on the WAN
Using Galera Cluster to Power Geo-distributed Applications on the WANUsing Galera Cluster to Power Geo-distributed Applications on the WAN
Using Galera Cluster to Power Geo-distributed Applications on the WAN
 
Maximizing performance via tuning and optimization
Maximizing performance via tuning and optimizationMaximizing performance via tuning and optimization
Maximizing performance via tuning and optimization
 
Maximizing performance via tuning and optimization
Maximizing performance via tuning and optimizationMaximizing performance via tuning and optimization
Maximizing performance via tuning and optimization
 
Taking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout SessionTaking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout Session
 
Exploring Alluxio for Daily Tasks at Robinhood
Exploring Alluxio for Daily Tasks at RobinhoodExploring Alluxio for Daily Tasks at Robinhood
Exploring Alluxio for Daily Tasks at Robinhood
 
Building real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark StreamingBuilding real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark Streaming
 
Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...
Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...
Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...
 

More from Umair Shahid

20240518 - VixulCon 2024 - The Rise of PostgreSQL_ Historic Trends and Modern...
20240518 - VixulCon 2024 - The Rise of PostgreSQL_ Historic Trends and Modern...20240518 - VixulCon 2024 - The Rise of PostgreSQL_ Historic Trends and Modern...
20240518 - VixulCon 2024 - The Rise of PostgreSQL_ Historic Trends and Modern...
Umair Shahid
 
20221019 - Singapore Roadshow - Open source licenses, the impact on PostgreSQ...
20221019 - Singapore Roadshow - Open source licenses, the impact on PostgreSQ...20221019 - Singapore Roadshow - Open source licenses, the impact on PostgreSQ...
20221019 - Singapore Roadshow - Open source licenses, the impact on PostgreSQ...
Umair Shahid
 
Driving the future of PostgreSQL adoption
Driving the future of PostgreSQL adoptionDriving the future of PostgreSQL adoption
Driving the future of PostgreSQL adoption
Umair Shahid
 
Islamabad PUG - 7th Meetup - performance tuning
Islamabad PUG - 7th Meetup - performance tuningIslamabad PUG - 7th Meetup - performance tuning
Islamabad PUG - 7th Meetup - performance tuning
Umair Shahid
 
Islamabad PUG - 7th meetup - performance tuning
Islamabad PUG - 7th meetup - performance tuningIslamabad PUG - 7th meetup - performance tuning
Islamabad PUG - 7th meetup - performance tuning
Umair Shahid
 
Logical replication with pglogical
Logical replication with pglogicalLogical replication with pglogical
Logical replication with pglogical
Umair Shahid
 

More from Umair Shahid (6)

20240518 - VixulCon 2024 - The Rise of PostgreSQL_ Historic Trends and Modern...
20240518 - VixulCon 2024 - The Rise of PostgreSQL_ Historic Trends and Modern...20240518 - VixulCon 2024 - The Rise of PostgreSQL_ Historic Trends and Modern...
20240518 - VixulCon 2024 - The Rise of PostgreSQL_ Historic Trends and Modern...
 
20221019 - Singapore Roadshow - Open source licenses, the impact on PostgreSQ...
20221019 - Singapore Roadshow - Open source licenses, the impact on PostgreSQ...20221019 - Singapore Roadshow - Open source licenses, the impact on PostgreSQ...
20221019 - Singapore Roadshow - Open source licenses, the impact on PostgreSQ...
 
Driving the future of PostgreSQL adoption
Driving the future of PostgreSQL adoptionDriving the future of PostgreSQL adoption
Driving the future of PostgreSQL adoption
 
Islamabad PUG - 7th Meetup - performance tuning
Islamabad PUG - 7th Meetup - performance tuningIslamabad PUG - 7th Meetup - performance tuning
Islamabad PUG - 7th Meetup - performance tuning
 
Islamabad PUG - 7th meetup - performance tuning
Islamabad PUG - 7th meetup - performance tuningIslamabad PUG - 7th meetup - performance tuning
Islamabad PUG - 7th meetup - performance tuning
 
Logical replication with pglogical
Logical replication with pglogicalLogical replication with pglogical
Logical replication with pglogical
 

Recently uploaded

GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
Rohit Gautam
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
Kumud Singh
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Zilliz
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Vladimir Iglovikov, Ph.D.
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
Claudio Di Ciccio
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 

Recently uploaded (20)

GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
 
20240607 QFM018 Elixir Reading List May 2024

20240515 - Chicago PUG - Clustering in PostgreSQL: Because one database server is never enough (and neither is two)

  • 1. Clustering in PostgreSQL Because one database server is never enough (and neither is two) Chicago PostgreSQL User Group 15th May 2024
  • 2. postgres=# select * from umair; -[ RECORD 1 ]----------------------------- name | Umair Shahid description | 20+ year PostgreSQL veteran company | Stormatics designation | Founder location | Islamabad, Pakistan family | Mom, Wife & 2 kids kid1 | Son, 17 year old kid2 | Daughter, 14 year old
  • 3. Our mission is to help businesses scale PostgreSQL reliably for critical data
  • 4. On to the topic now!
  • 5. What is High Availability? ● Remain operational even in the face of hardware or software failure ● Minimize downtime ● Essential for mission-critical applications that require 24/7 availability ● Measured in ‘Nines of Availability’
  • 6. Nines of Availability Availability Downtime per year 90% (one nine) 36.53 days 99% (two nines) 3.65 days 99.9% (three nines) 8.77 hours 99.99% (four nines) 52.60 minutes 99.999% (five nines) 5.26 minutes
  • 7. But my database resides in the cloud, and the cloud is always available. Right?
  • 8. Wrong!
  • 9. Amazon RDS Service Level Agreement Multi-AZ configurations for MySQL, MariaDB, Oracle, and PostgreSQL are covered by the Amazon RDS Service Level Agreement ("SLA"). The RDS SLA affirms that AWS will use commercially reasonable efforts to make Multi-AZ instances of Amazon RDS available with a Monthly Uptime Percentage of at least 99.95% during any monthly billing cycle. In the event Amazon RDS does not meet the Monthly Uptime Percentage commitment, affected customers will be eligible to receive a service credit.* 99.95% = 4.38 hours of downtime per year! 22 minutes of downtime per month! * https://aws.amazon.com/rds/ha/
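The downtime figures on slide 6 and the 99.95% numbers on slide 9 follow from simple arithmetic. The sketch below assumes a 365.25-day year, which is what the rounded slide values appear to use; the function names are illustrative and not from the talk.

```python
# Sketch: allowed downtime for a given availability percentage.
# Assumption: a 365.25-day year, which matches the rounded slide figures.

HOURS_PER_YEAR = 365.25 * 24


def downtime_hours_per_year(availability_pct: float) -> float:
    """Hours per year a service may be down at the given availability."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR


def describe(availability_pct: float) -> str:
    """Render the yearly downtime in days, hours, or minutes."""
    hours = downtime_hours_per_year(availability_pct)
    if hours >= 24:
        return f"{hours / 24:.2f} days"
    if hours >= 1:
        return f"{hours:.2f} hours"
    return f"{hours * 60:.2f} minutes"


if __name__ == "__main__":
    for pct in (90, 99, 99.9, 99.95, 99.99, 99.999):
        print(f"{pct}% -> {describe(pct)} per year")
    # Monthly figure for the 99.95% SLA, in minutes.
    print(f"{downtime_hours_per_year(99.95) * 60 / 12:.0f} minutes per month")
```

Dividing the yearly figure by 12 gives the per-month number quoted for the RDS SLA.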
  • 10. So - what do I do if I want better reliability for my mission-critical data? Clustering!
  • 11. What is clustering? ● Multiple database servers work together to provide redundancy ● Gives the appearance of a single database server ● Application communicates with the primary PostgreSQL instance ● Data is replicated to standby instances ● Auto failover in case the primary node goes down [Diagram: Application, Primary, Standby 1, Standby 2; labels: Write, Read, Replicate]
  • 12. What is auto failover? * Primary node goes down * Standby 1 gets promoted to Primary * Standby 2 becomes subscriber to Standby 1 * New Standby is added to the cluster * Application talks to the new Primary [Diagram: three numbered panels (1, 2, 3) showing the Application, Primary, Standby 1, Standby 2, and the New Standby]
  • 13. Clusters with load balancing ● Write to the primary PostgreSQL instance and read from standbys ● Data redundancy through replication to two standbys ● Auto failover in case the primary node goes down [Diagram: Application, Primary, Standby 1, Standby 2; labels: Write, Read, Replicate]
  • 14. Clusters with backups and disaster recovery ● Off-site backups ● RTO and RPO requirements dictate configuration ● Point-in-time recovery. It is extremely important to periodically test your backups. [Diagram: Application, Primary, Standby 1, Standby 2; labels: Write, Read, Replicate, Backup]
  • 15. Multi-node clusters with Active-Active configuration* ● Shared-Everything architecture ● Load balancing for read as well as write operations ● Database redundancy to achieve high availability ● Asynchronous replication between nodes for better efficiency * with conflict resolution at the application layer [Diagram: Application, Active 1, Active 2, Active 3; labels: Write, Read, Replicate]
  • 16. Multi-node clusters with data sharding and horizontal scaling ● Shared-Nothing architecture ● Automatic data sharding based on defined criteria ● Read and write operations are automatically directed to the relevant node ● Each node can have its own standbys for high availability [Diagram: Application, Coordinator, Node 1, Node 2, Node 3; labels: Write, Read]
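One way to picture how a coordinator could direct reads and writes to the relevant node is hash-based routing on a sharding key. This is only a sketch under assumptions: the node names and the `shard_for` helper are hypothetical, and the talk does not say which sharding criteria or tool it has in mind.

```python
import hashlib
from typing import Sequence

# Hypothetical node names, matching the three data nodes in the slide diagram.
NODES = ("node1", "node2", "node3")


def shard_for(key: str, nodes: Sequence[str] = NODES) -> str:
    """Map a sharding key to a node using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then pick a node by modulo.
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


if __name__ == "__main__":
    for customer in ("customer-1", "customer-2", "customer-3"):
        print(customer, "->", shard_for(customer))
```

The same key always lands on the same node, so reads find what writes stored. Plain modulo routing reshuffles most keys when the node count changes, which is one reason real systems often shard by range or use a lookup table instead.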
  • 17. Globally distributed clusters ● Spin up clusters on the cloud, on-prem, bare metal, VMs, or a hybrid of the above ● Geo fencing for regulatory compliance and better local performance ● High availability across data centers and geographies
  • 18. Replication - Synchronous vs Asynchronous Synchronous ● Data is transferred immediately ● Transaction waits for confirmation from replica before it commits ● Ensures data consistency across all nodes ● Performance overhead caused by latency ● Used where data accuracy is critical, even at the expense of performance Asynchronous ● Data may not be transferred immediately ● Transaction commits without waiting for confirmation from replica ● Data may be inconsistent across nodes ● Faster and more scalable ● Used where performance matters more than data accuracy
  • 19. #AI
  • 20. Challenges in Clustering ● Split brain ● Network latency ● False alarms ● Data inconsistency
  • 21. Challenges in Clustering ● Split brain ● Network latency ● False alarms ● Data inconsistency
  • 22. Split Brain Defined: Nodes in a highly available cluster lose connectivity with each other but continue to function independently. Challenge: More than one node believes that it is the primary, leading to inconsistencies and possible data loss.
  • 23. Split Brain - Prevention ● Use a reliable cluster manager ○ Algorithms and heartbeat mechanisms to monitor node availability ○ Make decisions about failovers and promotions ● Quorum-based decision making ○ Majority of nodes must agree on primary node’s status ○ Requires odd number of nodes ● Witness server ○ Used to achieve a majority in an even-node cluster ○ Does not store data ● Network reliability and redundancy ○ Minimize the risk of partitions due to connectivity issues ○ Redundant network hardware and paths between nodes ○ Reliable cross datacenter connectivity ● Miscellaneous ○ Monitoring and alerts ○ Regular testing ○ Clear and precise documentation ○ Training
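The quorum rule on this slide reduces to a strict-majority check. The following is a minimal sketch, not any particular cluster manager's implementation; `has_quorum` is a hypothetical helper name.

```python
def has_quorum(votes: int, cluster_size: int) -> bool:
    """True when strictly more than half of the voting members agree."""
    return votes > cluster_size // 2


if __name__ == "__main__":
    # Four data nodes split 2/2 by a network partition: neither side wins.
    print(has_quorum(2, 4))
    # Adding a witness makes five voters; the side that can still reach
    # the witness holds three votes and may keep or promote a primary.
    print(has_quorum(3, 5))
    print(has_quorum(2, 5))
```

This shows why an even-node cluster needs a witness: with an even number of voters a clean split leaves both halves short of a majority, while an odd number guarantees at most one side can reach it.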
  • 24. Split Brain - Resolution 1. Identify the situation ○ Monitoring and alerting is crucial 2. Stop traffic ○ Application will need to pause 3. Determine the most up-to-date node ○ Compare transaction logs, timestamps, transaction IDs, etc. 4. Isolate the nodes from each other ○ Prevent further replication so outdated data does not overwrite the latest data 5. Restore data consistency ○ Apply missed transactions ○ Resolve data conflicts 6. Reconfigure replication ○ Make the most up-to-date node the primary ○ Reinstate the remaining nodes as replicas 7. Confirm integrity of the cluster ○ Monitor and double-check replication 8. Re-enable traffic ○ Allow read-only traffic, confirm reliability, then allow write operations 9. Run a retrospective ○ Thorough analysis of the incident to prevent future occurrences ○ Update docs and training to capture the cause of split brain
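For step 3, one input to "most up-to-date" is each node's write-ahead log position (LSN), which PostgreSQL prints as two hexadecimal numbers separated by a slash. The sketch below compares such values; the node names and LSNs are made up for illustration. It is an aid, not a full answer: after a real split brain both nodes have accepted independent writes, so a higher LSN does not mean the other node's changes are contained in it.

```python
from typing import Dict


def parse_lsn(lsn: str) -> int:
    """Convert a textual LSN such as '0/3000060' to a 64-bit integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def most_advanced(node_lsns: Dict[str, str]) -> str:
    """Return the name of the node with the highest LSN."""
    return max(node_lsns, key=lambda name: parse_lsn(node_lsns[name]))


if __name__ == "__main__":
    # Hypothetical values collected from each node during the incident.
    observed = {"node_a": "0/3000060", "node_b": "0/30000A8", "node_c": "0/2FFFF00"}
    print(most_advanced(observed))
```

Comparing the parsed integers avoids the trap of comparing LSN strings lexically, where "0/FFFFFFFF" would sort after "1/0".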
  • 25. Challenges in Clustering ● Split brain ● Network latency ● False alarms ● Data inconsistency
  • 26. Network Latency Defined: Time delay when data is transmitted from one point to another. Challenge: Delayed replication can result in data loss. Delayed signals can trigger a false positive for failover.
  • 27. Network Latency - Causes ● Network congestion ● Low quality network hardware ● Distance between nodes ● Virtualization overheads ● Bandwidth limitations ● Security devices and policies ● Transmission medium
  • 28. Network Latency - Prevention of False Positive ● High speed & low latency network ○ Investing in high quality networking pays dividends ● Adjust heartbeat & timeout settings ○ Fine tune frequency of heartbeat and timeout to match typical network behavior ● Quorum-based decision making ○ Majority of nodes must agree on primary node’s status ○ Requires odd number of nodes or a witness node for tie-breaker ● Employ redundancy ○ Network paths as well as health checks ● Best practices ○ Test and simulate various network conditions ○ Monitoring and alerting for early detection of problems ○ Documentation of rationale behind values chosen ○ Periodic training
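The heartbeat and timeout tuning mentioned on this slide can be illustrated with a detector that only suspects a node after several consecutive missed heartbeats. This is a sketch under assumptions: the class, its field names, and the default values are hypothetical and do not correspond to any specific tool's settings.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    interval_s: float = 2.0   # expected seconds between heartbeats (assumed)
    max_missed: int = 3       # missed intervals tolerated before suspicion
    last_seen: float = 0.0    # timestamp of the most recent heartbeat

    def record_heartbeat(self, now: float) -> None:
        """Note that a heartbeat arrived at time `now`."""
        self.last_seen = now

    def missed(self, now: float) -> int:
        """Whole heartbeat intervals elapsed since the last heartbeat."""
        return int((now - self.last_seen) // self.interval_s)

    def is_suspect(self, now: float) -> bool:
        """True once at least `max_missed` intervals have passed silently."""
        return self.missed(now) >= self.max_missed


if __name__ == "__main__":
    monitor = HeartbeatMonitor()
    monitor.record_heartbeat(100.0)
    for t in (103.0, 105.0, 106.0):
        print(t, monitor.is_suspect(t))
```

Raising `max_missed` or `interval_s` tolerates brief latency spikes at the cost of slower detection of a real failure, which is the trade-off the slide asks you to tune and document.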
  • 29. Challenges in Clustering ● Split brain ● Network latency ● False alarms ● Data inconsistency
  • 30. False Alarms Defined: A problem is reported, but in reality, there is no issue. Challenge: Can trigger a failover when one isn’t required, leading to unnecessary disruptions and impacting performance.
  • 31. False Alarms - Causes ● Network issues ○ Latency, congestion, misconfiguration ● Configuration errors ○ Thresholds set too low? ● Resource constraints ○ High CPU load, memory pressure, I/O bottleneck ● Human error ○ Misreading information, miscommunication of scheduled maintenance, … ● Database locks ○ Long running queries with exclusive locks
  • 32. False Alarms - Prevention ● Optimized thresholds ○ Best practices, past experience, and some trial and error are required to ensure that the thresholds are configured appropriately ● Regular upgrades and testing ○ Use the latest versions of software and firmware ○ Testing of various use cases can help identify possible misconfigurations ● Resource and performance optimization ○ Regularly monitor resource utilization and tune queries and database for performance ○ Maintenance tasks like vacuum, analyze, … ● Comprehensive monitoring and alerting ○ Monitoring can help with early detection of anomalies ○ Alerts can give early warnings as the database approaches defined thresholds
  • 33. Challenges in Clustering ● Split brain ● Network latency ● False alarms ● Data inconsistency
  • 34. Data Inconsistency Defined: Situations where data in different nodes of a cluster becomes out of sync, leading to inconsistent results and potential data corruption. Challenge: Inaccurate query results that vary based on which node is queried. Such issues are very hard to debug.
  • 35. Data Inconsistency - Causes ● Replication lag ○ Network latency and high workloads can be big contributors ○ Data loss in case of failover ● Split brain ● Incorrect configuration ○ Log shipping configurations ○ Replication slots setup ○ Replication filters
  • 36. Data Inconsistency - Prevention ● Closely manage asynchronous replication ○ Closely monitor pg_stat_replication for replication lag ○ Place nodes in close proximity and use high quality network hardware ● Regularly check XID across the cluster ● Monitor replication conflicts and resolve promptly ● Regular maintenance and performance optimization ○ Vacuum, analyze, … ○ XID wraparound
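Monitoring `pg_stat_replication` for lag could look roughly like the sketch below. The SQL uses column and function names from PostgreSQL 10 and later and is meant to run on the primary through whatever client you already use; the Python helper, the 16 MiB threshold, and the sample rows are assumptions for illustration only.

```python
from typing import Iterable, List, Optional, Tuple

# Run on the primary; returns one row per connected standby.
LAG_QUERY = """
SELECT application_name,
       state,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;
"""

Row = Tuple[str, str, Optional[int]]


def lagging_standbys(rows: Iterable[Row],
                     max_lag_bytes: int = 16 * 1024 * 1024) -> List[str]:
    """Names of standbys that are not streaming or exceed the lag threshold."""
    flagged = []
    for name, state, lag_bytes in rows:
        # A NULL lag (None) means the position is unknown, so flag it too.
        if state != "streaming" or lag_bytes is None or lag_bytes > max_lag_bytes:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    # Hypothetical rows, shaped like the result of LAG_QUERY.
    sample = [
        ("standby1", "streaming", 1024),
        ("standby2", "streaming", 64 * 1024 * 1024),
        ("standby3", "catchup", 0),
    ]
    print(lagging_standbys(sample))
```

Keeping the threshold check separate from the query makes it easy to unit test and to feed into whatever alerting system is already in place.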
  • 37. This all sounds really hard
  • 38. Open source clustering tools for PostgreSQL ● Repmgr ○ https://repmgr.org/ ○ GPL v3 ○ Provides automatic failover ○ Manage and monitor replication ● pgpool-II ○ https://pgpool.net/ ○ Similar to BSD & MIT ○ Middleware between PostgreSQL and client applications ○ Connection pooling, load balancing, caching, and automatic failover ● Patroni ○ https://patroni.readthedocs.io/en/latest/ ○ MIT ○ Template for PostgreSQL high availability clusters ○ Automatic failover, configuration management, & cluster management