This document discusses clustering in PostgreSQL to provide high availability and redundancy. It begins by defining high availability and different "nines of availability". It notes that cloud databases like Amazon RDS only provide three and a half nines of availability, leaving room for hours of downtime per year. The document then discusses different clustering architectures like primary-standby replication, active-active multi-node clusters, and globally distributed clusters. It also covers challenges like split-brain situations, network latency, false alarms, and data inconsistency. Finally, it recommends open-source tools like repmgr, pgpool-II and Patroni that can help implement PostgreSQL clustering.
20240515 - Chicago PUG - Clustering in PostgreSQL: Because one database serve...Umair Shahid
In the world of database management, high availability and disaster recovery are key considerations for any organization that wants to ensure reliable access to its critical data. Clustering in PostgreSQL is one of the most effective ways to achieve both. In this talk, we will explore the ins and outs of PostgreSQL clustering, including the benefits and challenges of different clustering approaches, and how to set up a highly available and disaster-resilient PostgreSQL cluster.
We will dive into topics such as synchronous vs. asynchronous replication, load balancing, failover, and disaster recovery. We will also touch on some open source tools that are readily available to aid in PostgreSQL cluster management.
Clustering in PostgreSQL - Because one database server is never enough (and n...Umair Shahid
In the world of database management, high availability and disaster recovery are key considerations for any organization that wants to ensure reliable access to its critical data. Clustering in PostgreSQL is one of the most effective ways to achieve both. In this talk, we explore the ins and outs of PostgreSQL clustering, including the benefits and challenges of different clustering approaches, and how to set up a highly available and disaster-resilient PostgreSQL cluster.
We dive into topics such as synchronous vs. asynchronous replication, load balancing, failover, and disaster recovery. We also touch on some open source tools that are readily available to aid in PostgreSQL cluster management.
2. postgres=# select * from umair;
-[ RECORD 1 ]--------------------------------------------
name | Umair Shahid
description | 20 year veteran of the PostgreSQL community
company | Stormatics
designation | Founder
location | Islamabad, Pakistan
family | Mom, Wife & 2 kids
kid1 | Son, 16 year old
kid2 | Daughter, 13 year old
3. PostgreSQL Solutions for the Enterprise
Clustering & Analytics, backed by 24/7 Support and Professional Services
4. 10 years of popularity
[Chart: popularity trend of PostgreSQL, Oracle, MySQL, and SQL Server, as measured by https://db-engines.com/en/ranking_trend]
5. Loved by Developers
2022 Developer Survey: Most Loved, Most Used (Professional Developers), Most Wanted
7. What is High Availability?
● Remain operational even in the face of hardware or software failure
● Minimize downtime
● Essential for mission-critical applications that require 24/7 availability
● Measured in ‘Nines of Availability’
8. Nines of Availability
Availability            Downtime per year
90% (one nine)          36.53 days
99% (two nines)         3.65 days
99.9% (three nines)     8.77 hours
99.99% (four nines)     52.60 minutes
99.999% (five nines)    5.26 minutes
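The downtime figures above follow from simple arithmetic. A minimal sketch, assuming a 365.25-day year (an assumption that appears consistent with the table's values); the function name is illustrative only:

```python
# Convert an availability percentage into allowed downtime per year.
HOURS_PER_YEAR = 365.25 * 24  # 8766 hours, averaging in leap years (assumption)


def downtime_hours_per_year(availability_percent: float) -> float:
    """Hours of downtime per year permitted by the given availability."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


for pct in (90, 99, 99.9, 99.99, 99.999):
    hours = downtime_hours_per_year(pct)
    print(f"{pct}% -> {hours:.2f} hours ({hours * 60:.2f} minutes)")
```

The same function can be applied to any SLA percentage to see how much downtime it actually permits.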
9. But my database resides
in the cloud, and the
cloud is always available
Right?
11. Amazon RDS Service Level Agreement
Multi-AZ configurations for MySQL, MariaDB, Oracle, and PostgreSQL are covered by the
Amazon RDS Service Level Agreement ("SLA"). The RDS SLA affirms that AWS will use
commercially reasonable efforts to make Multi-AZ instances of Amazon RDS available
with a Monthly Uptime Percentage of at least 99.95% during any monthly billing cycle. In
the event Amazon RDS does not meet the Monthly Uptime Percentage commitment,
affected customers will be eligible to receive a service credit.*
99.95% = three and a half nines = 4.38 hours of downtime per year!!!
* https://aws.amazon.com/rds/ha/
12. So - what do I do if I want
better reliability for my
mission-critical data?
Clustering!
13. What is clustering?
● Multiple database servers work together to provide redundancy
● Gives the appearance of a single database server
● Application communicates with the primary PostgreSQL instance
● Data is replicated to standby instances
● Auto failover in case the primary node goes down
[Diagram: Application, Primary, Standby 1, Standby 2; arrows labeled Write, Read, Replicate]
14. What is auto failover?
[Diagram: three numbered stages (1, 2, 3) showing the Application, the Primary, Standby 1, Standby 2, and a New Standby]
- Primary node goes down
- Standby 1 gets promoted to Primary
- Standby 2 becomes subscriber to Standby 1
- New Standby is added to the cluster
- Application talks to the new Primary
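The failover sequence can be sketched as a small state change. This is a toy model, not how any particular tool implements it; the `Cluster` class, `fail_over` function, and node names are all hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy model of a primary with standbys; all names here are hypothetical."""
    primary: str
    standbys: list[str] = field(default_factory=list)


def fail_over(cluster: Cluster, new_standby: str) -> Cluster:
    """Promote the first standby and add a replacement standby."""
    if not cluster.standbys:
        raise RuntimeError("no standby available to promote")
    promoted, *remaining = cluster.standbys
    # Remaining standbys now follow the promoted node; a fresh standby
    # restores the original level of redundancy.
    return Cluster(primary=promoted, standbys=remaining + [new_standby])


before = Cluster(primary="primary", standbys=["standby1", "standby2"])
after = fail_over(before, new_standby="standby3")
print(after.primary, after.standbys)
```

A real failover also has to make sure the old primary cannot keep accepting writes, which this sketch leaves out.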
15. Clusters with load balancing
● Write to the primary PostgreSQL instance and read from standbys
● Data redundancy through replication to two standbys
● Auto failover in case the primary node goes down
[Diagram: Application, Primary, Standby 1, Standby 2; arrows labeled Write, Read, Replicate]
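One way to picture the read/write split is a router that sends writes to the primary and rotates reads across the standbys. A minimal sketch with hypothetical names; the SELECT check is deliberately naive and would misroute statements such as `SELECT ... FOR UPDATE`:

```python
import itertools


class Router:
    """Sends writes to the primary and spreads reads across standbys."""

    def __init__(self, primary: str, standbys: list[str]):
        self.primary = primary
        # Round-robin over standbys; fall back to the primary if there are none.
        self._readers = itertools.cycle(standbys or [primary])

    def route(self, sql: str) -> str:
        # Naive classification: only statements starting with SELECT are reads.
        is_read = sql.lstrip().lower().startswith("select")
        return next(self._readers) if is_read else self.primary


router = Router("primary", ["standby1", "standby2"])
targets = [router.route(q) for q in (
    "SELECT * FROM orders",
    "INSERT INTO orders VALUES (1)",
    "SELECT count(*) FROM orders",
)]
print(targets)
```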
16. Clusters with backups and disaster recovery
● Incremental backups
● Redundancy introduced from primary as well as standbys
● RTO and RPO balance achieved per organizational requirements
● Point-in-time recovery
[Diagram: Application, Primary, Standby 1, Standby 2; arrows labeled Write, Read, Replicate; three Incremental Backup labels and a Backup store]
17. Multi-node clusters with Active-Active configuration*
● Shared-Everything architecture
● Load balancing for read as well as write operations
● Database redundancy to achieve high availability
● Asynchronous replication between nodes for better efficiency
*with conflict resolution at the application layer
[Diagram: Application, Active 1, Active 2, Active 3; arrows labeled Write, Read, Replicate]
18. Multi-node clusters with data sharding and horizontal scaling
● Shared-Nothing architecture
● Automatic data sharding based on defined criteria
● Read and write operations are auto directed to the relevant node
● Each node can have its own standbys for high availability
[Diagram: Application, Coordinator, Node 1, Node 2, Node 3; arrows labeled Write, Read]
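The coordinator's routing decision can be illustrated with hash-based sharding, which is only one possible "defined criteria" (range or list based schemes are equally valid). The node names and `node_for` function are assumptions made for this sketch:

```python
import hashlib

NODES = ["node1", "node2", "node3"]  # hypothetical shard nodes


def node_for(shard_key: str, nodes: list[str] = NODES) -> str:
    """Pick a node from a stable hash of the shard key."""
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


for customer in ("alice", "bob", "carol"):
    print(customer, "->", node_for(customer))
```

Plain modulo hashing moves most keys when the node count changes, so production systems typically use a more stable mapping.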
19. Globally distributed clusters
● Spin up clusters on the public cloud, private cloud, on-prem, bare metal, VMs, or a hybrid of all the above
● Geo fencing for regulatory compliance
● High availability across data centers and geographies
20. Replication - synchronous vs asynchronous
Synchronous
● Data is transferred immediately
● Transaction waits for confirmation from replica before it commits
● Ensures data consistency across all nodes
● Performance overhead caused by latency
● Used where data accuracy is critical, even at the expense of performance
Asynchronous
● Data may not be transferred immediately
● Transaction commits without waiting for confirmation from replica
● Data may be inconsistent across nodes
● Faster and more scalable
● Used where performance matters more than data accuracy
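The difference between the two modes can be shown with a toy model in which a synchronous commit updates every replica before returning, while an asynchronous commit returns first and ships the change later. The classes below are hypothetical and model only the ordering, not real network waits:

```python
class Replica:
    def __init__(self):
        self.rows = []


class Primary:
    """Toy primary that ships each committed row to its replicas."""

    def __init__(self, replicas, synchronous):
        self.rows = []
        self.replicas = replicas
        self.synchronous = synchronous
        self.pending = []  # rows committed locally but not yet shipped

    def commit(self, row):
        self.rows.append(row)
        if self.synchronous:
            # Every replica receives the row before the commit returns.
            for replica in self.replicas:
                replica.rows.append(row)
        else:
            # Return immediately; replicas catch up later.
            self.pending.append(row)

    def ship_pending(self):
        for row in self.pending:
            for replica in self.replicas:
                replica.rows.append(row)
        self.pending.clear()


sync_replica, async_replica = Replica(), Replica()
sync_primary = Primary([sync_replica], synchronous=True)
async_primary = Primary([async_replica], synchronous=False)
sync_primary.commit("order-1")
async_primary.commit("order-1")
print(sync_replica.rows, async_replica.rows)  # the async replica is still behind
async_primary.ship_pending()
print(async_replica.rows)
```

Rows still sitting in `pending` are what would be lost if the asynchronous primary failed before shipping them.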
23. Challenges in Clustering
● Split brain
● Network Latency
● False alarms
● Data inconsistency
24. Split Brain
● Defined: Nodes in a highly available cluster lose connectivity with each other but continue to function independently
● Challenge: More than one node believes that it is the primary, leading to inconsistencies and possible data loss
25. Split Brain - Prevention
● Quorum based voting
○ A majority of nodes must agree on the primary
○ Cluster stops if quorum is not achieved
● Redundancy and failover
○ Prevents a single point of failure
● Proper configuration
● Split brain resolver
○ Software based detection & resolution
○ Can shut down nodes in case of a split-brain scenario
● Network segmentation
○ Physical network separation
○ Avoid congestion
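Quorum based voting comes down to a strict-majority check. A minimal sketch with hypothetical function names, showing that two partitions of the same cluster can never both reach a majority:

```python
def has_quorum(votes: int, cluster_size: int) -> bool:
    """True when strictly more than half of the cluster agrees."""
    return votes > cluster_size // 2


def elect_primary(votes_by_candidate: dict[str, int], cluster_size: int):
    """Return the candidate backed by a majority, or None if there is none."""
    for candidate, votes in votes_by_candidate.items():
        if has_quorum(votes, cluster_size):
            return candidate
    return None  # no quorum: the cluster should not accept a primary


print(elect_primary({"node1": 2, "node2": 1}, cluster_size=3))
print(elect_primary({"node1": 1, "node2": 1}, cluster_size=3))
```

This is also why clusters are usually sized with an odd number of voters: an even split leaves neither side with a majority.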
26. Split Brain - Resolution
● Witness node
○ Third node that acts as a tie-breaker
○ The winner acts as primary, the loser is shut down
● Consensus algorithm
○ Paxos and Raft protocols are popular
● Manual resolution
○ DBA observes which nodes are competing to be the primary
○ Takes action based on best practices learnt from experience
28. Network Latency
● Defined: Time delay when data is transmitted from one point to another
● Challenge: Delayed replication can result in data loss. Delayed signals can trigger a false positive.
29. Network Latency - Preventing False Positives
● Deploy nodes in proximity
○ Reduces time delay and network hops
● Implement quorum-based system
○ If quorum isn’t achieved failover won’t occur
● High speed, low latency networking
○ High quality hardware and associated software
● Optimize database configuration
○ Parameter tuning based on workloads
○ max_connections, tcp_keepalives_idle, …
● Alerting system
○ Detect and alert admins of possible issues to preempt false positives
31. False Alarms
● Defined: A problem is reported, but in reality, there is no issue
● Challenge: Can trigger a failover when one isn’t required, leading to unnecessary disruptions and impacting performance
32. False Alarms - Prevention
● Proper configuration
○ Best practices, past experience, and some trial and error are required to ensure that the thresholds are configured appropriately
● Regular maintenance
○ Latest version of software and firmware to be used in order to avoid known bugs and exploits
● Regular testing
○ Testing of various use cases can help identify bottlenecks and possible misconfigurations
● Multiple monitoring tools
○ Multiple tools can help cross-reference alerts and confirm if failover is required
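Thresholds and cross-referencing can be combined in a single decision rule: a monitor only reports an outage after several consecutive failed checks, and failover proceeds only when a majority of monitors agree. The function, the monitor names, and the default threshold of 3 are all assumptions for illustration:

```python
def should_fail_over(monitor_reports: dict[str, list[bool]],
                     consecutive_failures: int = 3) -> bool:
    """Decide on failover from several monitors' recent health checks.

    monitor_reports maps a monitor name to its check history, oldest first,
    where True means the primary answered that check.
    """
    def sees_outage(history: list[bool]) -> bool:
        recent = history[-consecutive_failures:]
        return len(recent) == consecutive_failures and not any(recent)

    outages = sum(sees_outage(h) for h in monitor_reports.values())
    # Require a strict majority of monitors to agree before acting.
    return outages > len(monitor_reports) // 2


reports = {
    "monitor_a": [True, False, False, False],
    "monitor_b": [True, True, False, True],   # a single missed check
    "monitor_c": [False, False, False, False],
}
print(should_fail_over(reports))
```

Raising `consecutive_failures` reduces false alarms at the cost of slower detection, which is the trade-off the threshold tuning above refers to.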
33. False Alarms - Resolution
In case a false alarm is triggered …
● Check logs
○ Check the logs of cluster nodes, network devices, and other infrastructure components to identify any anomalies
● Notify stakeholders
○ Notify all stakeholders involved in the cluster’s operations to prevent any unnecessary action
● Monitor cluster health
○ Monitor the cluster’s health closely to ensure that it is functioning correctly and no further false alarms are triggered
35. Data Inconsistency
● Defined: Situations where data in different nodes of a cluster becomes out of sync, leading to inconsistent results and potential data corruption
● Challenge: Inaccurate query results that vary based on which node is queried. Such issues are very hard to debug.
36. Data Inconsistency - Prevention
● Synchronous replication
○ Ensures data is synchronized across all nodes before it is committed
○ Induces a performance overhead
● Load balancer
○ Ensure that all queries from a specific application are sent to the same node
○ Relies on eventual consistency of the cluster
● Monitoring tools
○ Help with early identification of possible issues so you can take preventive measures
● Maintenance windows
○ Minimize data disruption with planned downtime
● Regular testing
○ Test production use cases regularly to ensure the cluster is behaving as expected
37. Data Inconsistency - Resolution
● Resynchronization
○ Manual sync between nodes to correct data inconsistencies
● Rollback
○ Point-in-time recovery using tools like pg_rewind and pgBackRest
● Monitoring
○ Diligent monitoring to ensure that the issue doesn’t recur
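One concrete thing to monitor is how far a standby trails the primary. PostgreSQL reports WAL positions as LSNs written as two hexadecimal numbers separated by a slash; the sketch below converts them to byte positions and subtracts. In practice the values might come from `pg_current_wal_lsn()` on the primary and `pg_last_wal_replay_lsn()` on the standby; the sample LSNs here are made up:

```python
def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replication_lag_bytes(primary_lsn: str, standby_lsn: str) -> int:
    """Bytes of WAL the standby has not yet replayed."""
    return parse_lsn(primary_lsn) - parse_lsn(standby_lsn)


# Hypothetical sample values
lag = replication_lag_bytes("0/3000060", "0/3000000")
print(lag)
```

A lag that keeps growing is an early sign that reads from that standby will return stale data.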