Improve database performance with connection pooling and load balancing
A complete review of load balancing solutions in popular RDBMS
We tend to rely on caching solutions to improve database performance. Caching frequently-
accessed queries in memory or via a database can optimize write/read performance and reduce
network latency, especially for heavy-workload applications, such as gaming services and Q&A
portals. But you can further improve performance by pooling users' connections to a database.
Client users need to create a connection to a web service before they can perform CRUD
operations. Most web services are backed by relational database servers such as Postgres or
MySQL. With PostgreSQL, each new connection can take up to 1.3MB in memory. In a
production environment where we expect to receive thousands or millions of concurrent
connections to the backend service, this can quickly exceed your memory resources (or if you
have a scalable cloud, it can get very expensive very quickly).
Each time a client attempts to access a backend service, the operating system must spend resources to create, maintain, and close a connection to the datastore. This creates a large amount of overhead, causing database performance to deteriorate.
Consumers of your service expect fast response times. If that performance deteriorates, it can
lead to poor user experiences, revenue losses, and even unscheduled downtime. If you expose
your backend service as an API, repeated slowdowns and failures could cause cascading
problems and lose you customers.
Instead of opening and closing connections for every request, connection pooling uses a cache of database connections that can be reused when future requests to the database are required. It lets your database scale effectively as the data stored there and the number of clients accessing it grow. Traffic is never constant, so pooling can better manage traffic peaks without causing outages. Your production database shouldn't be your bottleneck.
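The core idea above can be sketched in a few lines. The sketch below is plain Python with a stand-in connection factory rather than a real database driver (the `connect` callable and pool size are illustrative assumptions); real applications would use the pooling built into their driver or middleware.

```python
# Minimal illustrative connection pool. The connect() factory is a stand-in
# supplied by the caller; a real pool would hand out database handles.
import queue

class ConnectionPool:
    def __init__(self, connect, size):
        self._pool = queue.Queue(maxsize=size)
        # Open all connections up front so requests never pay creation cost.
        for _ in range(size):
            self._pool.put(connect())

    def acquire(self, timeout=None):
        # Block until a pooled connection is free instead of opening a new one.
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        # Return the connection for reuse rather than closing it.
        self._pool.put(conn)

# Demo with dummy "connections": creation happens exactly `size` times,
# no matter how many acquire/release cycles follow.
created = []
pool = ConnectionPool(lambda: created.append(1) or object(), size=3)
c1 = pool.acquire()
pool.release(c1)
c2 = pool.acquire()   # reuses a pooled connection; nothing new is created
print(len(created))   # prints 3
```

The point of the sketch is the invariant: connection creation cost is paid once at startup, and every later request pays only the cost of a queue operation.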
Oracle load balancing & pooling solutions:
Oracle Real Application Clusters (RAC)
When using RAC, you can configure a SCAN IP to achieve load balancing. However, because of the behavior and limitations of some applications, you cannot always use the SCAN IP; instead you have to create services to minimize traffic on the interconnect. In that case you lose the load-balancing feature of RAC and retain only its scalability and high availability.
1-Oracle Traffic Director (TDM)
Oracle Traffic Director is a fast, reliable, and scalable layer-7 software load balancer. You can set up Oracle Traffic Director to serve as the reliable entry point for all HTTP, HTTPS, and TCP traffic to application servers and web servers in the back end. Depending on the needs of your IT environment, you can configure Oracle Traffic Director to apply multiple, complex rules when distributing requests to the back-end servers and when forwarding responses to clients.
Oracle Traffic Director distributes the requests that it receives from clients to servers in the back end based on the specified load-balancing method, routes the requests based on specified rules, caches frequently accessed data, prioritizes traffic, and controls the quality of service.
On engineered systems platforms, you can set up pairs of Oracle Traffic Director instances and leverage its built-in high-availability capability to set up either active-passive or active-active failover. As the volume of traffic to your network grows, you can easily scale the environment by reconfiguring Oracle Traffic Director with additional back-end servers to which it can route requests.
Oracle Traffic Director provides the following features:
• Advanced methods for load distribution
Configure Oracle Traffic Director to distribute client requests to servers in the back end using one of these methods:
◦ Round robin
◦ Least connection count
◦ Least response time
◦ Weighted round robin
◦ Weighted least connection count
• Flexible routing and load control on back-end servers
◦ Request-based routing
◦ Content-based routing
◦ Request rate acceleration
◦ Connection limiting
• Controlling the request load and quality of service
◦ Request rate limiting
◦ Quality of service tuning
• Support for WebSocket connections
• Integration with Oracle Fusion Middleware
• Easy-to-use administration interfaces
• Security
Oracle Traffic Director enables and enhances security for your IT infrastructure in the following ways:
◦ Reverse proxy
◦ Support for TLS 1.0, 1.1, and 1.2
◦ Web Application Firewall
◦ HTTP forward proxy support in origin-server pools
• High availability
Oracle Traffic Director provides high availability for your enterprise applications and services through the following mechanisms:
◦ Health checks for the back end
◦ Backup servers in the back end
◦ Failover for load balancing
◦ Dynamic reconfiguration
• Monitoring statistics
Administrators can monitor a wide range of statistics pertaining to the performance of Oracle Traffic Director instances through several methods: the administration console, the command-line interface, and a report in XML format.
• High performance
◦ SSL/TLS offloading
◦ Content caching
◦ HTTP compression
Oracle Traffic Director Terminology
An Oracle Traffic Director configuration is a collection of elements that define the run-time behavior of an Oracle Traffic Director instance. An Oracle Traffic Director configuration contains information about various elements of an Oracle Traffic Director instance such as listeners, origin servers, failover groups, and logs.
2-Oracle Connection Manager (CMAN)
CMAN is a useful piece of software that helps you manage connectivity easily. Oracle Connection Manager (CMAN) is a multi-purpose database connection proxy server used in Oracle deployments. CMAN is available as part of the Oracle Database Enterprise Edition (EE) client package.
CMAN in Traffic Director Mode (TDM), or CMAN-TDM, brings intelligence to managing incoming client connections through session multiplexing and dynamic load balancing, which can optimize the usage of database resources.
Session multiplexing can be handled efficiently through the Proxy Resident Connection Pooling (PRCP) feature in CMAN-TDM. When PRCP is enabled on CMAN-TDM, connection pools are created on the CMAN-TDM server that enable mid-tier clients to share connections to the database.
Oracle Database 21c and beyond supports the multi-tenant architecture model with multiple Pluggable Databases (PDBs) in a single Container Database (CDB). Until recently, PRCP pools could only be created per database service, so if a PDB instance had multiple services, a PRCP pool was created for each of those services.
Starting with the Oracle EE 23c release, CMAN-TDM also provides a per-PDB PRCP option. With this feature, a single PRCP pool is created at the PDB level and shared across all the services associated with that PDB.
Sample per-PDB PRCP architecture
To enable per-PDB PRCP, the user has to set the PRCP connection factor (TDM_PERPDB_PRCP_CONNFACTOR) parameter to a non-zero positive value in cman.ora (the configuration file for CMAN).
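As a rough illustration, a cman.ora entry enabling Traffic Director Mode with per-PDB PRCP might look like the fragment below. The alias, host, port, and connection-factor value are assumptions for the example, not values from the original article.

```
# Hypothetical cman.ora fragment: CMAN instance running in Traffic Director
# Mode with per-PDB PRCP enabled. Names and values are illustrative.
cman_tdm1 =
  (CONFIGURATION=
    (ADDRESS=(PROTOCOL=tcp)(HOST=cman-host.example.com)(PORT=1521))
    (PARAMETER_LIST=
      (tdm=yes)                            # run CMAN in Traffic Director Mode
      (tdm_perpdb_prcp_connfactor=5)       # non-zero value enables per-PDB PRCP
    ))
```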
The per-PDB PRCP feature dynamically configures the PRCP pool by adjusting the maximum size based on the Oracle Compute Unit (OCPU) value, an internal parameter, and the PRCP connection factor parameter at the PDB level. CMAN-TDM periodically checks the OCPU value of
3-Galera Cluster plus NGINX or HAProxy
Galera Cluster:
Galera Cluster for MySQL/MariaDB is a true multi-master cluster based on synchronous replication. It's an easy-to-use, high-availability solution which provides high system uptime, no data loss, and scalability for future growth.
MariaDB with Galera and HAProxy is a great option if you need to scale your MariaDB or even MySQL backend. With a minimum of three nodes, a single node can go down and your application and users won't ever notice it. Galera offers great protection against data loss, inconsistent databases, and future scalability issues.
HAProxy:
High Availability Proxy, or HAProxy, is a single-threaded, event-driven, non-blocking engine that combines a fast I/O layer with a priority-based scheduler. You can use it to balance TCP connections between application servers and a Galera Cluster.
HAProxy can be integrated with the Keepalived service to achieve an HA solution. You can also use MySQL MaxScale.
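A minimal HAProxy setup for a three-node Galera cluster could look like the fragment below. The node addresses and the least-connections balancing choice are assumptions for the example.

```
# Hypothetical haproxy.cfg fragment: TCP load balancing across a
# three-node Galera cluster. Hosts and ports are illustrative.
listen galera
    bind *:3306
    mode tcp
    option tcplog
    balance leastconn
    server galera1 10.0.0.11:3306 check
    server galera2 10.0.0.12:3306 check
    server galera3 10.0.0.13:3306 check
```

The `check` keyword enables health checks, so a failed node is taken out of rotation automatically, which is what lets a single node go down without clients noticing.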
Nginx:
Nginx is well known for its ability to act as a reverse proxy with a small memory footprint. It usually sits in the front-end web tier to redirect connections to available backend services, provided these pass some health checks. Using a reverse proxy is common when you are running a critical application or service that requires high availability. It also distributes the load equally among the backend services.
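For balancing database traffic specifically, Nginx's stream module can proxy raw TCP to the cluster nodes. The fragment below is a sketch; the upstream addresses and the least-connections method are assumptions.

```
# Hypothetical nginx.conf fragment: TCP proxying of MySQL traffic to a
# Galera cluster via the stream module. Addresses are illustrative.
stream {
    upstream galera_backend {
        least_conn;
        server 10.0.0.11:3306;
        server 10.0.0.12:3306;
        server 10.0.0.13:3306;
    }
    server {
        listen 3306;
        proxy_pass galera_backend;
    }
}
```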
Over the last few years, a few products have tried to tackle the read/write split challenge. MySQL Proxy was the first attempt I am aware of at solving this problem, but it ended up with many limitations. ScaleArc does a much better job and is very usable, but it still has some limitations. The latest contender is MaxScale from MariaDB.
4-MySQL Router
MySQL Router is lightweight middleware that provides transparent routing between your application and any backend MySQL servers. It can be used for a wide variety of use cases, such as providing high availability and scalability by effectively routing database traffic to appropriate backend MySQL servers. The pluggable architecture also enables developers to extend MySQL Router for custom use cases.
MySQL Router Features:
Failover
Typically, a highly available MySQL setup consists of a single primary and multiple replicas and it
is up to the application to handle failover, in case the MySQL primary becomes unavailable. Using
MySQL Router, application connections will be transparently routed based on load balancing
policy, without implementing custom application code.
Load Balancing
MySQL Router provides additional scalability and performance by distributing database connections across a pool of servers. For example, if you have a replicated set of MySQL servers, MySQL Router can distribute application connections to them in a round-robin fashion.
Pluggable Architecture
MySQL Router's pluggable architecture allows MySQL developers to easily extend the product with additional features, and gives MySQL users the ability to create their own custom plugins, providing endless possibilities:
• The Connection Routing plugin, which does connection-based routing, meaning that it forwards MySQL packets to the backend server without inspecting or modifying them, thus providing maximum throughput.
• The Metadata Cache plugin, which provides transparent client load balancing, routing, and failover for Group Replication and InnoDB Clusters.
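A simple connection-routing configuration might look like the fragment below: one port for writes pointed at the primary first, and one port that round-robins reads across the replicas. All host names, ports, and strategy choices here are assumptions for the sketch.

```
# Hypothetical mysqlrouter.conf fragment: split read and write traffic
# across a replicated set. Addresses and ports are illustrative.
[routing:primary]
bind_address = 0.0.0.0
bind_port = 7001
destinations = 10.0.0.11:3306,10.0.0.12:3306,10.0.0.13:3306
routing_strategy = first-available

[routing:replicas]
bind_address = 0.0.0.0
bind_port = 7002
destinations = 10.0.0.12:3306,10.0.0.13:3306
routing_strategy = round-robin
```

The application then connects to port 7001 for writes and 7002 for reads, and failover happens in the router rather than in application code.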
5-MySQL HeatWave Read Replicas with a Load Balancer
The MySQL HeatWave Database Service added new capabilities that enable applications to scale database read operations to millions of queries per second: more than 5 million queries per second were achieved in OCI MySQL HeatWave using 16 replicas with the AMD E4 shape (16 OCPUs, 256 GB RAM).
Users can add and remove read replicas and load balancers to DB Systems with just a few clicks, and the service takes care of deployment, configuration, monitoring, and maintenance. Applications can connect directly to read replicas or to a read-only endpoint that automatically balances the queries across all the replicas.
Postgres solutions for connection pooling & load balancing
Two ways to integrate a connection pooler
There are two ways of implementing connection pooling for a PostgreSQL application:
1-As an external service or middleware such as pgbouncer or pgpool
Connection poolers such as pgbouncer and pgpool-II can be used to pool connections from clients to a PostgreSQL database. The connection pooler sits between the application and the database server, and pgbouncer or pgpool-II can be configured to relay requests from the application to the database server.
2-Client-side libraries such as c3p0
There exist libraries such as c3p0 which extend database driver functionality to include connection pooling support.
However, the best way to implement connection pooling for applications is to make use of an external service or middleware, since it is easier to set up and manage. In addition, external middleware like pgpool-II provides other features, such as load balancing, apart from pooling connections.
Scaling database performance with pgbouncer
Pgbouncer comes with three types of pooling:
1. Session pooling: one of the connections in the pool is assigned to a client until the client disconnects or the timeout is reached.
2. Transaction pooling: similar to session pooling, it gets a connection from the pool, but keeps it only until the transaction is done. If the same client wants to run another transaction, it has to wait until another connection is assigned to it.
3. Statement pooling: the connection is returned to the pool as soon as the first query is completed.
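The pooling mode is a single setting in pgbouncer's configuration. The fragment below is a sketch of a transaction-pooling setup; the database name, host, and sizing values are assumptions for the example.

```
; Hypothetical pgbouncer.ini fragment: transaction pooling in front of a
; single PostgreSQL server. Host, database, and sizes are illustrative.
[databases]
appdb = host=10.0.0.11 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
pool_mode = transaction      ; session | transaction | statement
max_client_conn = 1000       ; clients pgbouncer will accept
default_pool_size = 20       ; server connections per database/user pair
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
```

Note the ratio: a thousand client connections can be served by twenty server connections, which is exactly the memory saving described earlier.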
Pgpool-II
Pgpool-II is proxy software that sits between PostgreSQL servers and a PostgreSQL database client. It provides the following features:
Connection Pooling
Pgpool-II maintains established connections to the PostgreSQL servers and reuses them whenever a new connection with the same properties (i.e. user name, database, protocol version, and other connection parameters, if any) comes in. This reduces the connection overhead and improves the system's overall throughput.
Load Balancing
If a database is replicated (running in either replication mode or native replication mode), performing a SELECT query on any server will return the same result. Pgpool-II takes advantage of the replication feature in order to reduce the load on each PostgreSQL server. It does that by distributing SELECT queries among the available servers, improving the system's overall throughput. In an ideal scenario, read performance improves proportionally to the number of PostgreSQL servers. Load balancing works best in a scenario where many users execute many read-only queries at the same time.
Automated Failover
If one of the database servers goes down or becomes unreachable, Pgpool-II will detach it and continue operating using the remaining database servers. Some sophisticated features, including timeouts and retries, help with the automated failover.
Online Recovery
Pgpool-II can perform online recovery of a database node by executing one command. When online recovery is used together with automated failover, a node detached by failover can automatically be reattached as a standby node. It is also possible to synchronize and attach a new PostgreSQL server.
Replication
Pgpool-II can manage multiple PostgreSQL servers. Activating the replication feature makes it possible to create a real-time backup on two or more PostgreSQL clusters, so that the service can continue without interruption if one of those clusters fails. Pgpool-II has built-in replication (native replication); however, users can also rely on external replication features, including PostgreSQL streaming replication.
Limiting Exceeding Connections
There is a limit on the maximum number of concurrent connections with PostgreSQL, and new connections are rejected when this number is reached. Raising this maximum, however, increases resource consumption and has a negative impact on overall system performance. Pgpool-II also has a limit on the maximum number of connections, but extra connections are queued instead of being rejected immediately. You can, however, configure Pgpool-II to return an error when the connection limit is exceeded (4.1 or later).
Watchdog
Watchdog can coordinate multiple Pgpool-II nodes, creating a robust cluster system and avoiding both a single point of failure and split brain. To avoid split brain, you need at least three Pgpool-II nodes. Watchdog performs life checks against the other Pgpool-II nodes to detect faults. If the active Pgpool-II goes down, a standby Pgpool-II can be promoted to active and take over the virtual IP.
In-Memory Query Cache
The in-memory query cache saves pairs of SELECT statements and their results. If an identical SELECT comes in, Pgpool-II returns the value from the cache. Since neither SQL parsing nor access to PostgreSQL is involved, using the in-memory cache is extremely fast. On the other hand, it might be slower than the normal path in some cases, because it adds the overhead of storing cache data.
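Pooling and load balancing in Pgpool-II are both driven by pgpool.conf. The fragment below sketches a two-node setup with load balancing and connection caching enabled; the host addresses, weights, and pool sizes are assumptions for the example.

```
# Hypothetical pgpool.conf fragment: two PostgreSQL backends with
# load balancing and connection pooling. Values are illustrative.
backend_hostname0 = '10.0.0.21'
backend_port0 = 5432
backend_weight0 = 1          # share of SELECT traffic sent to node 0

backend_hostname1 = '10.0.0.22'
backend_port1 = 5432
backend_weight1 = 1

load_balance_mode = on       # distribute SELECTs across backends
connection_cache = on        # reuse backend connections (pooling)
num_init_children = 32       # pgpool child processes (concurrent clients)
max_pool = 4                 # cached connections per child
```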
Choosing a connection pooler: pgpool-II or pgbouncer?
There are several factors to consider when choosing a connection pooler. Although pgbouncer and pgpool-II are both great solutions for connection pooling, each tool has its strengths and weaknesses.
Memory/resource consumption
If you are interested in a lightweight connection pooler for your backend service, then pgbouncer is the right tool for you. Unlike pgpool-II, which by default forks 32 child processes, pgbouncer uses only one process. Thus pgbouncer consumes less memory than pgpool-II.
Streaming Replication
Apart from pooling connections, you can also manage your Postgres cluster with streaming replication using pgpool-II. Streaming replication copies data from a primary node to a secondary node and is the best way to achieve high availability and prevent data loss. Pgpool-II supports Postgres streaming replication, while pgbouncer does not.
Centralized password management
In a production environment where you expect many clients/applications to connect to the database through a connection pooler concurrently, it is necessary to use a centralized password management system to manage clients' credentials.
You can make use of auth_query in pgbouncer to load clients' credentials from the database instead of storing them in a userlist.txt file and comparing credentials from the connection string against that file.
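A sketch of that auth_query approach is shown below. The dedicated lookup role and the query against pg_shadow are illustrative; production setups commonly use a SECURITY DEFINER function instead of granting direct access to pg_shadow.

```
; Hypothetical pgbouncer.ini fragment: credentials fetched from the
; database via auth_query rather than from userlist.txt.
[pgbouncer]
auth_type = md5
auth_user = pgbouncer_auth   ; role pgbouncer uses to run the lookup query
auth_query = SELECT usename, passwd FROM pg_shadow WHERE usename = $1
```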
Load balancing and high availability
Finally, if you want to add load balancing and high availability to your pooled connections, then pgpool-II is the right tool to use. pgpool-II supports Postgres high availability through its built-in watchdog process. This sub-process monitors the health of the pgpool-II nodes participating in the watchdog cluster and coordinates between them.
Conclusion
Database performance can be improved beyond connection pooling. Replication, load balancing, and in-memory caching can all contribute to efficient database performance.
If a web service is designed to make a lot of read and write queries to a database, you can place multiple instances of a Postgres database behind a load balancer such as pgpool-II to take care of write queries from clients, while in-memory caching can be used to optimize read queries.
Despite pgpool-II's ability to function as a load balancer and connection pooler, pgbouncer is the preferred middleware solution for connection pooling because it is easy to set up, not too difficult to manage, and primarily serves as a connection pooler without any other functions.
When using Oracle, you have these options: set up Oracle RAC, configure the SCAN IP, and use new features such as TAF, FAN, and especially TAC (Transparent Application Continuity). Another integrated solution is CMAN.
If these options cannot meet your load-balancing requirements, use F5 appliances, which give you hardware load balancing and security options.
For an Oracle environment where you face challenges with web applications, use WebLogic or Oracle Traffic Director to achieve connection pooling and load balancing.