Transaction processing systems are generally considered easier to scale than data warehouses. Relational databases were designed for this type of workload, and there are no esoteric hardware requirements. Mostly, it is just a matter of normalizing to the right degree and getting the indexes right. The major challenge in these systems is their extreme concurrency, which means that small temporary slowdowns can escalate into major issues very quickly.
In this presentation, Gwen Shapira will explain how application developers and DBAs can work together to build a scalable and stable OLTP system, using application queues, connection pools, and strategic use of caches at different layers of the system.
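The connection-pool idea mentioned in the abstract can be sketched in a few lines (an illustration of the general technique, not code from the talk; the pool size and factory are made up):

```python
import queue

class ConnectionPool:
    """Hand a fixed set of connections out to many threads.

    Capping the pool size turns a flood of concurrent requests into
    bounded waiting, so a brief slowdown queues up instead of cascading.
    """

    def __init__(self, connect, size=5):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(connect())

    def acquire(self, timeout=2.0):
        # Block briefly; failing fast beats piling up waiting threads.
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        self._pool.put(conn)

# Usage with a stand-in "connection" factory:
pool = ConnectionPool(lambda: object(), size=2)
c1 = pool.acquire()
c2 = pool.acquire()
pool.release(c1)
c3 = pool.acquire()  # reuses the released connection
```

The short `acquire` timeout is the point: under a temporary slowdown, callers get a quick error they can handle instead of silently joining an ever-growing backlog.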
As the shift to cloud native spreads, the microservices architecture, which breaks an application into minimal, mutually independent components, has been gaining attention.
MSA makes applications easy to scale and shortens the time to market for new features,
but as the application grows and multiple instances of the same service run concurrently, communication between services becomes complex.
A service mesh is a technology born to address these MSA traffic problems:
a networking model focused on managing network traffic between services.
By recording how smoothly different applications interact, it can optimize communication and prevent downtime as applications scale.
This talk introduces the background and features of service meshes, along with the service mesh solutions currently available as open source.
Step 1. Cloud Native Trail Map
Step 2. Service Proxy, Discovery, & Mesh
Step 3. Service Mesh solutions
Step 4. Service Mesh in action - Istio / linkerd
Step 5. Multi-cluster (linkerd)
[AWS re:Invent 2022 re:Cap for Financial Customers] 3. AWS re:Invent 2022 Technical Highlights (AWS Korea Financial Services Team)
AWS re:Invent 2022 Technical Highlights: the innovation continues.
This session summarizes the major services announced at AWS re:Invent 2022 that are especially useful in the financial sector. In a rapidly changing market, continuous innovation is more important than ever. The session covers the technical details of the IT innovation that AWS is driving.
송규호, Solutions Architect, AWS
From cache to in-memory data grid. Introduction to Hazelcast. (Taras Matyashovsky)
This presentation:
* covers basics of caching and popular cache types
* explains evolution from simple cache to distributed, and from distributed to IMDG
* does not describe the use of NoSQL solutions for caching
* is not intended as a product comparison or a promotion of Hazelcast as the best solution
Efficient data access is key to a high-performing application. Amazon Web Services provides several database options to support modern data-driven apps, along with software frameworks that make developing against them easy. We look at the design of a modern serverless web app using Amazon DynamoDB, the DynamoDB Mapper, AWS Lambda, Amazon API Gateway, and the SDKs, and tackle the move from relational to NoSQL data models.
Speaker: Clayton Brown, Solutions Architect, Amazon Web Services
Batch computing is a common way to run a series of programs, called batch jobs, on a large pool of shared compute resources, such as servers, virtual machines, and containers. But running batch workloads at scale is a challenging task: configuring and scaling a cluster of virtual machines to process complex batch jobs is difficult and resource intensive. In this session, we’ll discuss options and best practices for running batch jobs on AWS, including AWS Batch, a fully managed batch-processing service, and building batch-processing architectures with Amazon EC2 Container Service. We’ll also discuss best practices for ensuring efficient and opportunistic scheduling, fine-grained monitoring, compute resource auto-scaling, and security for batch jobs.
DDD SoCal: Decompose your monolith: Ten principles for refactoring a monolith (Chris Richardson)
This is a talk I gave at DDD SoCal.
1. Make the most of your monolith
2. Adopt microservices for the right reasons
3. It’s not just architecture
4. Get the support of the business
5. Migrate incrementally
6. Know your starting point
7. Begin with the end in mind
8. Migrate high-value modules first
9. Success is improved velocity and reliability
10. If it hurts, don’t do it
Featuring a brief overview of fault-tolerance mechanisms across various Big Data systems such as the Google File System (GFS), Amazon Dynamo, Bigtable, Hadoop MapReduce, and Facebook's Cassandra, along with a description of an existing fault-tolerant model.
Integration architecture for the hybrid and multi-cloud enterprise
It is a given that most enterprises are now spread between on-premises and cloud, resulting in a need to perform integration across this hybrid architecture. Furthermore, most customers are seeing, or at least predicting, a multi-cloud architecture: multiple clouds from multiple vendors, providing a variety of different platforms, which brings a whole new set of integration challenges.
We will look at how integration architecture has evolved from service-oriented architecture to take advantage of cloud-native technologies and microservices principles. We will also discuss how integration is affected by multi-cloud issues and what the typical resolutions are. Also available as a webinar: http://ibm.biz/MultiCloudIntegrationArchitectureWebinar
OLTPBenchmark is a multi-threaded load generator. The framework is designed to produce variable-rate, variable-mixture load against any JDBC-enabled relational database. It also provides data collection features, e.g., per-transaction-type latency and throughput logs.
Together with the framework we provide the following OLTP/Web benchmarks:
TPC-C
Wikipedia
Synthetic Resource Stresser
Twitter
Epinions.com
TATP
AuctionMark
SEATS
YCSB
JPAB (Hibernate)
CH-benCHmark
Voter (Japanese "American Idol")
SIBench (Snapshot Isolation)
SmallBank
LinkBench
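The variable-rate idea behind such a load generator can be sketched as follows (a hedged illustration of the general technique, not OLTPBenchmark's actual code; the stand-in "transaction" is just a callable):

```python
import threading
import time

def run_load(txn, rate_per_sec, duration_sec, workers=4):
    """Fire `txn` at roughly `rate_per_sec` across several worker threads.

    Each worker sleeps workers/rate between calls, so the combined rate
    approximates the target; per-call latencies are collected, mirroring
    the per-transaction-type logs a benchmark framework would keep.
    """
    latencies = []
    lock = threading.Lock()
    interval = workers / rate_per_sec
    deadline = time.monotonic() + duration_sec

    def worker():
        while time.monotonic() < deadline:
            start = time.monotonic()
            txn()
            with lock:
                latencies.append(time.monotonic() - start)
            time.sleep(interval)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return latencies

# Usage: a no-op "transaction" at ~50 tx/s for half a second.
lats = run_load(lambda: None, rate_per_sec=50, duration_sec=0.5, workers=2)
```

Changing `rate_per_sec` mid-run (or drawing `txn` from a weighted mix of transaction types) is what turns this skeleton into a variable-rate, variable-mixture generator.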
Building Real-time Push APIs Using Kafka as the Customer Facing Interface... (HostedbyConfluent)
At Mercedes-Benz, we offer APIs for business customers that provide them with real-time diagnostics and status data from their Mercedes-Benz vehicles. The talk covers how we have successfully used Kafka as the customer-facing interface for our Push API, earning the Best Automotive API 2022 award (API:World), and our lessons learned after two years in production.
Moreover, we will discuss how we used OAuth authentication in Kafka to offer the same authentication mechanisms for our Push APIs as for our ordinary REST APIs.
We will also look at our unified API documentation using OpenAPI and AsyncAPI, and how we documented the Kafka servers, topics, and payloads using AsyncAPI.
Finally, we will discuss some of the challenges and decisions involved in keeping our Kafka-based Push API running at scale, e.g. topic architecture (number of topics, partitioning, etc.) and allowed consumer groups.
This session introduces Aurora, the Amazon managed database service (RDS) that guarantees scalability at five times the performance and one tenth the price of existing commercial databases, along with how to use it and how to migrate from an existing database. As of re:Invent in October, Aurora is available in the Tokyo region.
CSI – IT2020, IIT Mumbai, October 6th 2017
Computer Society of India, Mumbai Chapter
The presentation focuses on microservices architecture and compares microservices with standard monolithic apps and SOA-based apps. It also gives a quick outline of Domain-Driven Design, Event Sourcing and CQRS, and Functional Reactive Programming, and compares the saga pattern with two-phase commit.
http://www.csimumbai.org/it2020-17/index.html
CMP315: Optimizing Network Performance for Amazon EC2 Instances (Amazon Web Services)
Many customers are using Amazon EC2 instances to run applications with high performance networking requirements. In this session, we provide an overview of Amazon EC2 network performance features—such as enhanced networking, ENA, and placement groups—and discuss how we are innovating on behalf of our customers to improve networking performance in a scalable and cost-effective manner. We share best practices and performance tips for getting the best networking performance out of your Amazon EC2 instances.
Multi-Tenancy Development Challenges and Solutions (using ASP.NET Core, EF Core, and other Microsoft technologies). Based on experience developing the aspnetboilerplate.com framework.
Event-Driven Microservices architecture has gained a lot of attention recently. The trend in the industry is to move away from Monolithic applications to Microservices to innovate faster. While Microservices have their benefits, implementing them is hard. This talk focuses on the challenges faced and how to solve them.
It covers topics like using Domain Driven Design to break functionality into small parts. Various communication patterns among Microservices are also discussed.
One major drawback is the problem of distributed data management, as each microservice has its own database. Event-Driven Architecture enables microservices to work together, and the talk shows how to use architectural patterns like Event Sourcing and CQRS to implement them.
Another implementation challenge is managing transactions that update entities owned by multiple services in an eventually consistent fashion. This challenge is solved using sagas, which can be thought of as long-running transactions that use compensating actions to handle failures.
The objective of the talk is to show how to implement highly distributed Event Driven Microservices architecture that are scalable and easy to maintain.
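The compensating-action mechanism behind sagas can be sketched in a few lines (a minimal illustration; the order/payment step names are hypothetical, not from the talk):

```python
def run_saga(steps):
    """Run (action, compensation) pairs; on failure, undo completed steps.

    Actions run in order. If one raises, the compensations of the steps
    that already succeeded run in reverse order, restoring an eventually
    consistent state without a global distributed transaction.
    """
    done = []
    try:
        for action, compensate in steps:
            action()
            done.append(compensate)
    except Exception:
        for compensate in reversed(done):
            compensate()
        return False
    return True

def fail_payment():
    raise RuntimeError("payment declined")

# Usage with hypothetical order/payment steps; the payment step fails,
# so the already-created order is compensated (cancelled).
log = []
ok = run_saga([
    (lambda: log.append("create order"), lambda: log.append("cancel order")),
    (fail_payment, lambda: log.append("refund")),
])
```

In a real system each action and compensation would be a call to a different service (often triggered by events), but the bookkeeping is the same: remember what succeeded and undo it in reverse on failure.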
For a long time, in order to achieve mutual TLS between Kafka brokers and their clients, we had to use long-lived certificates, which are a nightmare to manage at large scale. At TransferWise, we have around 300 microservices, and most of them use Kafka for async communication, stream processing, event sourcing, etc. We wanted to implement Kafka security in a way that reduced the maintenance burden on platform teams while making migration of diverse clients as simple as possible. In this talk, we will describe how we achieved that goal using SPIFFE with SPIRE and Envoy, requiring zero code changes on the client side.
Reactive programming is an asynchronous programming paradigm concerned with streams of information and the propagation of changes. It differs from imperative programming, which uses statements to change a program's state. Reactive architecture is nothing more than the combination of reactive programming and software architecture. Also known as reactive systems, the goal is to make the system responsive, resilient, elastic, and message-driven.
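The "propagation of changes" idea can be sketched with a tiny observable value (an illustration of the paradigm, not any particular reactive library's API):

```python
class Observable:
    """A value that pushes changes to subscribers instead of being polled."""

    def __init__(self, value):
        self._value = value
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)
        callback(self._value)  # emit the current value on subscription

    def set(self, value):
        self._value = value
        for callback in self._subscribers:
            callback(value)  # propagate the change downstream

# Usage: a derived value stays in sync without explicit re-reads.
celsius = Observable(0)
seen = []
celsius.subscribe(lambda c: seen.append(c * 9 / 5 + 32))
celsius.set(100)
# seen collects the derived Fahrenheit values as celsius changes
```

The imperative alternative would recompute the derived value at every site that reads it; here the change flows to dependents automatically, which is the essence of the reactive style.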
Operating a large-scale microservices infrastructure on AWS with Terraform - 이용욱, Samsung Electronics :: AWS Summit Seoul (Amazon Web Services Korea)
Operating a large-scale microservices infrastructure on AWS with Terraform
이용욱, Samsung Electronics
We share how to use Terraform modules to build a large-scale service infrastructure composed of dozens of microservices that run across diverse compute environments such as EC2 and ECS/EKS and use a variety of AWS services, how to organize the Terraform code needed to operate and manage it smoothly, and our experience testing Terraform code with Kitchen.
Enhanced Dynamic Web Caching: For Scalability & Metadata Management (Deepak Bagga)
Abstract: These days, web caching suffers from many problems, such as scalability, robustness, and metadata management. These problems degrade the performance of the network and can create frustrating situations for clients. This paper discusses several web caching schemes, such as Distributed Web Caching (DWC), Distributed Web Caching with Clustering (DWCC), Robust Distributed Web Caching (RDWC), and Distributed Web Caching for Robustness, Low Latency & Disconnection Handling (DWCRLD). Clustering improves retrieval latency and also helps provide load balancing in a distributed environment, but it cannot resolve the scalability issues, handle frequent disconnections of proxy servers easily, or manage the cluster's metadata in the network. This paper presents a strategy that enhances the clustering scheme to provide scalability even as the cluster grows, easy handling of frequent proxy-server disconnections, and a structure for proper management of the cluster's metadata. A comparative table then shows how it compares with these schemes.
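One widely used technique for keeping a distributed cache scalable as proxy servers join and leave is consistent hashing, sketched below (a general technique offered for context, not the paper's specific scheme; the proxy names are made up):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Map cache keys to proxy nodes so that adding or removing a node
    only remaps a small fraction of keys, instead of reshuffling all."""

    def __init__(self, nodes, replicas=100):
        # Each node gets many virtual points on the ring to spread load.
        self._ring = []  # sorted (hash, node) points
        for node in nodes:
            for i in range(replicas):
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        # Walk clockwise to the first virtual node at or after the key's hash.
        h = self._hash(key)
        idx = bisect.bisect(self._ring, (h,)) % len(self._ring)
        return self._ring[idx][1]

# Usage: the same URL always lands on the same proxy.
ring = ConsistentHashRing(["proxy-a", "proxy-b", "proxy-c"])
owner = ring.node_for("/index.html")
```

When a proxy disconnects, only the keys that hashed to its ring segments move to neighbors, which is why this family of techniques is popular for the disconnection-handling problem the paper targets.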
Starting Your DevOps Journey – Practical Tips for Ops (Dynatrace)
To watch, please see:
https://info.dynatrace.com/apm_wc_getting_started_with_devops_na_registration.html
Starting Your DevOps Journey: Practical Tips for Ops
In this webinar, Andreas Grabner, Chief DevOps Activist at Dynatrace, shares practical tips that all IT groups from Dev to Ops can use to start their DevOps journey quickly. With experience from hundreds of DevOps deployments, Andi provides insights it would take your team months or years to learn firsthand.
- Learn how everyone on your Ops team can use APM to better understand and monitor SLAs, Performance and End User Impact of their applications.
- Foster better collaboration between Ops and architects by extending basic system monitoring to monolith and microservices architectures.
- Shift-left your testing and QA by working with metrics that you and the architects agreed on up front, resulting in early relevant feedback and faster code deployments.
- Hear why changing the cultural mindset from “fear of change” to “Continuous Innovation and Optimization” is critical for success.
Andi is joined by guest speaker Brian Chandler, Systems Engineer at Raymond James, who shares commonly used Ops dashboards that increase collaboration across IT teams and proactively break down silos.
It’s impossible to overlook system design when it comes to tech interviews. In this article, we've covered the most frequently asked system design interview questions at almost every IT giant.
Exploring the problem of Microservices communication and how both Kafka and Service Mesh solutions address it. We then look at some approaches for combining both.
Presentation for Papers We Love at QCON NYC 17. I didn't write the paper, good people at Facebook did. But I sure enjoyed reading it and presenting it.
Streaming Data Integration - For Women in Big Data Meetup (Gwen "Chen" Shapira)
A stream processing platform is not an island unto itself; it must be connected to all of your existing data systems, applications, and sources. In this talk, we will provide different options for integrating systems and applications with Apache Kafka, with a focus on the Kafka Connect framework and the ecosystem of Kafka connectors. We will discuss the intended use cases for Kafka Connect and share our experience and best practices for building large-scale data pipelines using Apache Kafka.
Modern data systems don't just process massive amounts of data, they need to do it very fast. Using fraud detection as a convenient example, this session will include best practices on how to build real-time data processing applications using Apache Kafka. We'll explain how Kafka makes real-time processing almost trivial, discuss the pros and cons of the famous lambda architecture, help you choose a stream processing framework and even talk about deployment options.
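The kind of real-time fraud check described above can be sketched as a sliding-window velocity counter (an illustration of the general idea, not the session's actual code; the threshold and field names are made up):

```python
from collections import deque

class VelocityCheck:
    """Flag a card that makes too many transactions within a time window."""

    def __init__(self, max_events, window_sec):
        self.max_events = max_events
        self.window_sec = window_sec
        self._events = {}  # card_id -> deque of recent timestamps

    def is_suspicious(self, card_id, ts):
        q = self._events.setdefault(card_id, deque())
        # Drop events that have slid out of the window.
        while q and ts - q[0] > self.window_sec:
            q.popleft()
        q.append(ts)
        return len(q) > self.max_events

# Usage: more than 3 transactions in 60 seconds is flagged.
check = VelocityCheck(max_events=3, window_sec=60)
flags = [check.is_suspicious("card-1", t) for t in (0, 10, 20, 30, 200)]
```

In a Kafka deployment this per-key windowed state would live in a stream processor consuming the transaction topic, but the core logic is exactly this: keep recent events per key, expire old ones, compare against a threshold.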
This session will go into best practices and detail on how to architect a near real-time application on Hadoop using an end-to-end fraud detection case study as an example. It will discuss various options available for ingest, schema design, processing frameworks, storage handlers and others, available for architecting this fraud detection application and walk through each of the architectural decisions among those choices.
Many architectures include both real-time and batch processing components. This often results in two separate pipelines performing similar tasks, which can be challenging to maintain and operate. We'll show how a single, well designed ingest pipeline can be used for both real-time and batch processing, making the desired architecture feasible for scalable production use cases.
"Analyzing Twitter Data with Hadoop - Live Demo", presented at Oracle Open World 2014. The repository for the slides is in https://github.com/cloudera/cdh-twitter-example
A tale of scale & speed: How the US Navy is enabling software delivery from l... (sonjaschweigert1)
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATOs (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
Essentials of Automations: The Art of Triggers and Actions in FME (Safe Software)
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Climate Impact of Software Testing at Nordic Testing Days (Kari Kakkonen)
My slides at Nordic Testing Days 6.6.2024
The talk discusses the climate impact and sustainability of software testing. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be extended with sustainability and then measured continuously. Test environments can be used less, at smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
Epistemic Interaction - tuning interfaces to provide information for AI support (Alan Dix)
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many of those features provide convenience and capability while sacrificing security. This best-practices guide outlines steps users can take to better protect their personal devices and information.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
UiPath Test Automation using UiPath Test Suite series, part 4 (DianaGray10)
Welcome to the UiPath Test Automation using UiPath Test Suite series, part 4. In this session, we will cover a Test Manager overview along with the SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the use of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. The webinar also delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimizing testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides introduction to UiPath Communication Mining, importance and platform overview. You will acquire a good understand of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfPeter Spielvogel
Building better applications for business users with SAP Fiori.
• What is SAP Fiori and why it matters to you
• How a better user experience drives measurable business benefits
• How to get started with SAP Fiori today
• How SAP Fiori elements accelerates application development
• How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
• How SAP Fiori paves the way for using AI in SAP apps
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
GridMate - End to end testing is a critical piece to ensure quality and avoid...ThomasParaiso2
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
Queues, Pools and Caches -
Everything a DBA Should Know About Scaling Modern OLTP
Gwen (Chen) Shapira, Senior Consultant
The Pythian Group
cshapi@gmail.com
Scalability Problems in Highly Concurrent Systems
When we drive through a particularly painful traffic jam, we tend to assume that the jam has a cause:
road maintenance or an accident blocked traffic and created the slowdown. However, we often
reach the end of the traffic jam without seeing any visible cause.
Traffic researcher Prof. Sugiyama and his team showed that with sufficient traffic density, traffic jams
will occur with no discernible root cause. Traffic jams will form even when cars drive at a constant speed
on a circular one-lane track.[i]
“When a large number of vehicles, beyond the road capacity, are successively injected into the
road, the density exceeds the critical value and the free flow state becomes unstable.”[ii]
OLTP systems are built to handle a large number of small transactions. In these systems the main
requirement is servicing a large number of concurrent requests with low and predictable latency. Good
scalability for an OLTP system can be defined as “Achieving maximum useful concurrency from a shared
system”.[iii]
OLTP systems often behave exactly like the traffic in Prof. Sugiyama’s experiments: more and more
traffic is loaded onto the database until, inevitably, a traffic jam occurs, and we may not be able to find
any visible root cause for it. In a wonderful video, Andrew Holdsworth of Oracle’s Real World
Performance group shows how increasing traffic on a database server can dramatically increase latency
without any improvement in throughput, and how reducing the number of connections to the
database can improve performance.[iv]
In this presentation, I’ll discuss several design patterns and frameworks that are used to improve
scalability by controlling concurrency in modern OLTP systems and web-based architectures.
All the patterns and frameworks I’ll discuss are considered part of the software architecture. DBAs often
take little interest in the design and architecture of the applications that use the database. But
databases never operate in a vacuum: DBAs who understand application design can have a better dialog
with the software team when it comes to scalability, and progress beyond finger pointing and “The
database is slow” blaming. These frameworks require sizing, capacity planning and monitoring, tasks
that DBAs are better qualified for than software developers. I’ll go into detail on how DBAs can help
size and monitor these systems with database performance in mind.
Connection Pools
The Problem:
Scaling application servers is a well understood problem. Through use of horizontal scaling and stateless
interactions it is relatively easy to deploy enough application capacity to support even thousands of
simultaneous user requests. This scalability, however, does not extend to the database layer.
Opening and closing a database connection is a high-latency operation, due to the network protocol
used between the application server and the database and the significant database resources involved
in establishing a session. Web applications and OLTP systems can't afford this latency on every user request.
The Solution:
Instead of opening a new connection for each application request, the application engine prepares a
certain number of open database connections and caches them in a connection pool.
In Java, the DataSource class is a factory for creating database connections and the preferred way of getting
a connection. Java defines a generic DataSource interface, and many vendors provide
their own DataSource implementations. Many, but not all, of the implementations also include connection
pooling.[v]
Using the generic DataSource interface, developers call getConnection(), and the DataSource class
provides the connection. Since developers write the same code regardless of whether the
DataSource class they are using implements pooling or not, asking a developer whether he is using
connection pooling is not a reliable way to determine if connection pooling is used.
To make things more complicated, the developer is often unaware of which DataSource class he is using.
The DataSource implementation will be registered with the Java Naming and Directory Interface (JNDI)
and can be deployed and managed separately from the application that is using it. Finding out which
DataSource is used and how the connection pool is configured can take some digging and creativity.
Most application servers contain a configuration file called "server.xml" or "context.xml" that holds the
various resource descriptions. Searching for a resource with type "javax.sql.DataSource" will find the
configuration of the DataSource class and the connection pool minimum and maximum sizes.
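For illustration, a minimal sketch of that search, assuming a Tomcat-style context.xml with DBCP attribute names; the resource name, URL and pool sizes below are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class FindDataSources {
    // Hypothetical context.xml fragment; attribute names follow Tomcat's DBCP pool.
    static final String CONTEXT_XML =
        "<Context>" +
        "  <Resource name=\"jdbc/orders\" type=\"javax.sql.DataSource\"" +
        "            initialSize=\"10\" maxActive=\"40\"" +
        "            url=\"jdbc:oracle:thin:@dbhost:1521:ORCL\"/>" +
        "</Context>";

    /** Returns "name maxActive" for the first javax.sql.DataSource resource, or null. */
    static String findPooledDataSource(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList resources = doc.getElementsByTagName("Resource");
            for (int i = 0; i < resources.getLength(); i++) {
                Element r = (Element) resources.item(i);
                if ("javax.sql.DataSource".equals(r.getAttribute("type"))) {
                    return r.getAttribute("name") + " " + r.getAttribute("maxActive");
                }
            }
            return null;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(findPooledDataSource(CONTEXT_XML)); // jdbc/orders 40
    }
}
```

In a real deployment the same search is usually done with grep over the deployed configuration files rather than code, but the type attribute is what identifies the pool either way.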
The Architecture:
[Diagram: the application business layer calls the application data layer, which looks up the DataSource
interface through JNDI; the DataSource implementation manages a connection pool on top of the JDBC driver.]
New problems:
1. When connection pools are used, all users share the same schema and the same sessions, so tracing
can be difficult. We advise developers to use DBMS_APPLICATION_INFO to set extra information
such as username (typically in the client_info field), module and action to assist in future
troubleshooting.
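A sketch of how that tagging might look from JDBC. The DBMS_APPLICATION_INFO.SET_MODULE and SET_CLIENT_INFO procedures are standard Oracle packages; the module and action values are hypothetical:

```java
public class SessionTagging {
    /** PL/SQL block that tags the pooled session for later tracing. */
    static String tagSessionSql() {
        return "BEGIN "
             + "DBMS_APPLICATION_INFO.SET_MODULE(?, ?); "
             + "DBMS_APPLICATION_INFO.SET_CLIENT_INFO(?); "
             + "END;";
    }
    // Usage sketch - run once after borrowing a connection from the pool:
    // try (CallableStatement cs = conn.prepareCall(tagSessionSql())) {
    //     cs.setString(1, "orders");    // module (hypothetical value)
    //     cs.setString(2, "checkout");  // action (hypothetical value)
    //     cs.setString(3, appUser);     // end-user name: all sessions share one schema
    //     cs.execute();
    // }
}
```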
2. Deciding on the size of a connection pool is the biggest challenge in using connection pools to
increase scalability. As always, the thing that gets us into trouble is the thing we don’t know
that we don’t know.
Most developers are well aware that if the connection pool is too small, the database will sit idle
while users are either waiting for connections or are being turned away. Since the scalability
limitation of small connection pools are known, developers tend to avoid them by creating large
connection pools, and increasing their size at the first hint of performance problems.
However, an oversized connection pool is a much greater risk to application scalability. Here is
what the scalability of an OLTP system typically looks like:[vi]
Amdahl’s Law says that the scalability of the system is constrained by its serial component, as the
users wait for shared resources such as IO and CPU (this is the contention delay). But
according to the Universal Scalability Law there is a second delay, the “coherency delay”:
the cost of maintaining data consistency in the system, which models waits on latches and
mutexes. After a certain point, adding more users to the system will decrease throughput.
Even while throughput is still growing, once it stops growing linearly,
requests start to queue and response times suffer proportionally:
If you check the wait events for a system that is past the point of saturation, you will see very
high CPU utilization, a high “log file sync” event as a result of the CPU contention, and high waits
for concurrency events such as “buffer busy waits” and “library cache latch”.
3. Even when the negative effects of too many concurrent users on the system are made clear,
developers still argue for oversized connection pools with the excuse that most of the
connections will be idle most of the time. There are two significant problems with this approach:
a. While we believe that most of the connections will be idle most of the time, we can’t be
certain that this will be the case. In fact, the worst performance issues I’ve seen were
caused by the application actually using the entire connection pool allocated.
This often happens when response times at the database already suffer for some
reason, and the application does not receive a response in a timely manner. At this point
the application or users rerun the operation, using another connection to run the exact
same query. Soon there are hundreds of connections to the database, all attempting to
run the same queries and waiting for the same latches.
b. Oversized connection pools have to be re-established during failover events or
database restarts. The larger the connection pool, the longer the application will take
to recover from a failover event, decreasing the availability of the application.
4. Connection pools typically allow setting minimum and maximum sizes for the pool. When the
application starts it will open connections until the minimum number of connections is met.
Whenever it runs out of connections, it will open new connections until it reaches the maximum
level. If connections are idle for too long, they will be closed, but never below the minimum
level. This sounds fairly reasonable, until you ask yourself - if we set the minimum to the
number of connections usually needed, when will the pool run out of connections?
A connection pool can be seen as a queue. Users arrive and are serviced by the database while
holding a connection. According to Little’s Law, the average number of connections in use
is (avg. DB response time) × (avg. user arrival rate). It is easy to see that you will run out of
connections if the rate at which users use your site increases, or if database performance
degrades and response times increase.
If your connection pool can grow at these times, it means that it will open new connections, a
resource-intensive operation as we previously noted, to a database that is already abnormally
busy. This will further slow things down, which can lead to a vicious cycle known as a "connection
storm". It is much safer to configure the connection pool to a specific size: the
maximum number of concurrent users that can run queries on the database with acceptable
performance. We’ll discuss later how to determine this size. This will ensure that during peak
times you will have enough connections to maximize throughput at acceptable latency, and no
more.
5. Unfortunately, even if you decide on a proper number of database connections, there is the
problem of multiple application servers. In most web architectures there are multiple web
servers, each with a separate connection pool, all connecting to the same database server. In
this case, it seems appropriate to divide the number of connections the DB will sustain by the
number of servers and size the individual pools by that number. The problem with this approach
is that load balancing is never perfect, so it is expected that some app servers will run out of
connections while others still have spare connections. In some cases the number of application
servers is so large that dividing the number of connections leaves less than one connection per
server.
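The Little's Law arithmetic from point 4 can be sketched directly; the traffic numbers below are invented for illustration:

```java
public class PoolSizing {
    /** Little's Law: average connections in use = arrival rate * DB response time. */
    static double avgConnectionsInUse(double arrivalsPerSec, double dbResponseSec) {
        return arrivalsPerSec * dbResponseSec;
    }

    public static void main(String[] args) {
        // 200 requests/sec at 50 ms average DB time: about 10 connections busy.
        System.out.printf("%.1f%n", avgConnectionsInUse(200, 0.050));
        // A slowdown that triples response time triples connection demand too,
        // which is exactly when a growable pool opens a flood of new connections.
        System.out.printf("%.1f%n", avgConnectionsInUse(200, 0.150));
    }
}
```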
Solutions to new problems:
As we discussed in the previous section, the key to scaling OLTP systems is limiting the number of
concurrent connections to a number that the database can reasonably support even when they are all
active. The challenge is in determining this number.
Keeping in mind that OLTP workloads are typically CPU-bound, the number of concurrent users the
system can support is limited by the number of cores on the database server. A database with 12 cores
can typically only run 12 concurrent CPU-bound sessions.
The best way to size the connection pool is by simulating the load generated by the application.
Running a load test on the database is a great way of figuring out the maximum number of concurrent
active sessions that can be sustained by the database. This should usually be done with assistance from
the QA department, as they probably already determined the mix of various transactions that simulates
the normal operations load.
It is important to test the number of concurrently active connections the database can support at its
peak. Therefore, while testing, it is critical to make sure that the database is indeed at full capacity and is
the bottleneck at the point when we decide the number of connections is maximal. This can be
reasonably validated by checking the CPU and IO queues on the database server and correlating with the
response times of the virtual users.
In usual performance tests, you try to decide on the maximum number of users the application can
support. Therefore, you run the test with an increasing number of virtual users until the response times
become unacceptable. However, when attempting to determine the maximum number of connections
in the pool, you should run the test with a fixed number of users and keep increasing the number of
connections in the connection pool until the database CPU utilization goes above 60%, the wait events
go from “CPU” to concurrency events, and response times become unacceptable. Typically all three of
these symptoms will start occurring at approximately the same time.
If a QA department and load testing tools are not available, it is possible to use the methodology
described by James Morle in his paper "Brewing Benchmarks" and generate load testing scripts from
trace files, which can later be replayed by SwingBench.
When running a load test is impractical, you will need to estimate the number of connections based on
available data. The factors to consider are:
1. How many cores are available on the database server?
2. How many concurrent users or threads does the application need to support?
3. When an application thread takes a connection from the pool, how much of the time is spent
holding the connection without actually running database queries? The more time the
application spends “just holding” the connection, the larger the pool will need to be to support
the application workload.
4. How much of the database workload is IO-bound? You can check IOWAIT on the database server
to determine this. The more IO-bound your workload is, the more concurrent users you can run
without running into concurrency contention (You will see a lot of IO contention though).
Four times the number of cores is a good starting point for the connection pool size: less if the
connections are heavily utilized by the application and there is little IO activity, more if the opposite is true.
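A sketch that codifies this rule of thumb. The 4x factor comes from the text; the exact adjustment weights below are arbitrary illustrations, not measured constants, and any real pool size should be validated with a load test:

```java
public class PoolHeuristic {
    /**
     * Starting point for pool size: cores x 4, shrunk when connections are busy
     * most of the time they are held, grown when the workload is IO-bound.
     * busyFraction and ioWaitFraction are both in [0, 1]; the 0.5 weight is
     * an assumed illustration, not a tuned value.
     */
    static int startingPoolSize(int cores, double busyFraction, double ioWaitFraction) {
        double size = cores * 4.0;
        size *= 1.0 - 0.5 * busyFraction;  // heavily used connections: fewer needed
        size *= 1.0 + ioWaitFraction;      // IO-bound sessions tolerate more concurrency
        return Math.max(cores, (int) Math.round(size));
    }

    public static void main(String[] args) {
        System.out.println(startingPoolSize(12, 0.0, 0.0)); // 48, the plain cores x 4
    }
}
```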
The remaining problem is what to do if the number of application servers is large and it is inefficient to
divide the connection pool limit among the application servers. Well-architected systems usually have a
separate data layer that can be deployed on separate set of servers. This data layer should be the only
component of the application allowed to open connections to the database, and it provides data objects
to the various application server components. In this architecture, the connections are divided between
the data-layer servers, of which there are typically much fewer.
This design has three great advantages: First, the data layer usually grows much slower than the
application and rarely requires new servers to be added, which means that pools rarely require resizing.
Second, application requests can be balanced between the data servers based on the remaining pool
capacity and third, if there is a need to add application-side caching to the system (such as Memcached),
only the data layer needs modification.
Application Message Queues
The Problem:
By limiting the number of connections from the application servers to the database, we are preventing a
large number of queries from queuing at the database server. If the total number of connections
allowed from application servers to the database is limited to 400, the run queue on the database will
not exceed 400 (at least not by much).
We discussed in the previous section why preventing excessive concurrency in the database layer is
critical for database scalability and latency. However, we still need to discuss how the application can
deal with the user requests that arrive when there is no free database connection to handle them.
Let’s assume that we limited the connection pool to 50 connections, and due to a slow-down in the
database, all 50 connections are currently busy servicing user requests. However, new user requests are
still arriving into the system at their usual rate. What shall we do with these requests?
1. Throw away the database request and return an error or static content to the user.
Some requests have to be serviced immediately. If the front page of your website can't load
within a few seconds, it is not worth servicing at all. Hopefully, the database is not a critical
component in displaying these pages (we'll discuss the options when we discuss caches). If it
does depend on the database and your connection pool is currently busy, you will want to
display a static page and hope the customer will try again later.
2. Place the request in queue for later processing.
Some requests can be put aside for later processing, giving the user the impression of
immediate return. For example, if your system allows the user to request reports by email, the
request can certainly be acknowledged and queued for off-line processing. This option can be
mixed with the first option – limit the size of the queue to N requests and display error
messages for the rest.
3. Give the request extra-high priority. The application can recognize that the request arrived from
the CIO and make sure it gets to the database ahead of any other user, perhaps cancelling
several user requests to get this done.
4. Give the request extra-low priority. Some requests are so non-critical that there is no reason to
even attempt serving them with low latency. If a user uses your application to send a message
to another user, and there is no guarantee on how soon the message will arrive, it makes sense
to tell the user the message was sent while in effect waiting until a connection in the pool is idle
before attempting to serve the message. Recurring events are almost always lower priority than
one-time events: a user signing up for the service is a one-time event and, if lost, will have
immediate business impact. Auditing user activity, on the other hand, is a recurring event, and a
delay will have lower business impact.
5. Some requests are actually a mix of requests from different sources, such as a dashboard. In
these cases it is best to display the different dashboard components as the data arrives, with
some components taking longer than others to show up.
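Options 1 and 2 above can be combined in a few lines; the queue bound and return values here are hypothetical:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class RequestTriage {
    // Hypothetical bound: at most this many deferred requests before we degrade.
    static final int MAX_QUEUED = 1000;
    static final BlockingQueue<String> deferred = new ArrayBlockingQueue<>(MAX_QUEUED);

    /** Queue the request for later processing, or fall back to static content. */
    static String handle(String request) {
        if (deferred.offer(request)) {  // non-blocking; returns false when full
            return "ACCEPTED";          // user sees an immediate acknowledgement
        }
        return "STATIC_FALLBACK";       // queue full: serve a static page instead
    }
}
```

Background consumer threads would drain the deferred queue as database connections become available.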
In all those cases, the application is able to prioritise requests and decide on a course of action based on
information that the database did not have at the time. It makes sense to shift the queuing to the
application when the database is highly loaded, because the application is better able to deal with
the excess load.
Databases are not the only constrained resource; application servers have their own limitations
when dealing with excess load. Typically, application servers have a limited number of threads. This is
done for the same reason we limit the number of connections to the database servers: the server only
has a limited number of cores, and an excessive number of threads will overload the server without
improving throughput. Since database requests are usually the highest-latency action performed by an
application thread, when the database is slow to respond, all the application server threads can be busy
waiting for the database. The CPU on the application server will be idle while the application cannot
respond to additional user requests.
All this leads to the conclusion that from both the database perspective and the application perspective,
it is preferable to decouple the application requests from the database requests. This allows the
application to prioritise requests, hide latency and keep the application server and database server busy
but not overloaded.
The Solution:
Message queues provide an asynchronous communications protocol, meaning that the sender and
receiver of the message do not need to interact with the message queue at the same time. They can be
used by web applications and OLTP systems as a way to hide latency or variance in latency.
Java defines a common messaging API, JMS. There are multiple implementations of this API, both open
source and commercial. Oracle Advanced Queuing is bundled with Oracle RDBMS, both SE and EE, at no
extra cost. These implementations differ in their feature set, supported operations, reliability and
stability. The API supports queues for point-to-point messaging with a single publisher and a single
consumer. It also supports topics for the publish-subscribe model, where multiple consumers can
subscribe to various topics and receive the messages broadcast to the topic.
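The queue/topic distinction can be sketched with plain java.util.concurrent primitives rather than a real JMS provider; this toy topic ignores persistence, acknowledgements and failure handling:

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

public class TinyTopic {
    // Each subscriber gets its own queue; publishing broadcasts to all of them.
    private final List<BlockingQueue<String>> subscribers = new CopyOnWriteArrayList<>();

    BlockingQueue<String> subscribe() {
        BlockingQueue<String> q = new LinkedBlockingQueue<>();
        subscribers.add(q);
        return q;
    }

    void publish(String message) {
        for (BlockingQueue<String> q : subscribers) {
            q.offer(message);  // every subscriber receives its own copy
        }
    }
}
```

A point-to-point queue is the degenerate case: a single shared BlockingQueue that competing consumers poll, so each message is processed exactly once.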
Message queues are typically installed by system administrators as a separate server or component, just
like databases are installed and maintained. The message queue server is called "Broker", and is usually
backed by a database to ensure that messages are persistent even when the broker fails. The application
server then connects to the broker by a URL, and can publish and consume from queues by the queue
name.
The Architecture:
[Diagram: as before, with a message queue between the application business layer and the application
data layer; the data layer still reaches the database through the JNDI-registered DataSource interface,
the connection pool and the JDBC driver.]
New Problems:
There are some common myths related to queue management, which may make developers
reluctant to use queues when necessary:[vii]
1. It is impossible to reliably monitor queues
2. Queues are not necessary if you do proper capacity planning
3. Message queues are unnecessarily complicated. There must be a simpler way to achieve the
same goals.
Solutions to New Problems:
While queues are undeniably useful to improve throughput both at the database and application server
layers, they do complicate the architecture. Let’s tackle the myths one by one:
1. If it were indeed impossible to monitor queues, you would not be able to monitor the CPU, load
average, average active sessions, blocking sessions, disk IO waits or latches, because
all systems have many queues. The only question is where each queue is managed and how
easy it will be to monitor.
If you use Oracle Advanced Queues, V$AQ will show you the number of messages in the queue
and the average wait for messages in the queue, which is usually all you need to determine the
status of the queue. For the more paranoid, I'd recommend adding a heartbeat monitor: insert
a monitoring message into the queue at regular intervals, and check that your process can read it
from the queue and how long it took to arrive.
The more interesting question is what do you do with the monitoring information - at what
point will you send an alert to the on-call SA and what will you want her to do when she receives
the alert?
Any queuing system will have high variance in service times and arrival rates of work. If the
service time and arrival rates were constant, there would be no need for queues. The high variance
is expected to lead to spikes in system utilization, which can cause false alarms: the system is
behaving as it should, but messages are accumulating in the queue. Our goal is to give as early
notice as possible that there is a genuine issue with the system that should be resolved, and not to
send warnings when the system is behaving as expected.
To this end, I recommend monitoring the following parameters:
• Service time - this will be monitored at the consumer thread. The thread should track
(i.e. instrument) and log at regular intervals the average time it took to process a
message from the queue. If service time increases significantly (compared to a known
baseline, taking into account the known variance in response times), it can indicate a
slowdown in processing and should be investigated.
• Arrival rate should be monitored at the processes that are writing to the queue. How
many messages are inserted to the queue every second? This should be tracked for long
term capacity planning and to determine peak usage periods.
• Queue size - the number of messages in the queue. Using Little's Law we can measure
the amount of time a message spends in the queue (wait time) instead.
If queue size or wait time increase significantly, this can indicate a "business issue", i.e. an
impending breach of SLA. If the wait time frequently climbs to the point where SLAs are
breached, it indicates that the system does not have enough capacity to serve the
current workloads. In this case either service times should be reduced (i.e. tuning), or
more processing servers should be added. Note that queue size can and should go up
for short periods of time, and recovering from bursts can take a while (depending on the
service utilization), so this is only an issue if the queue size is high and does not start
declining within a few minutes, which would indicate that the system is not recovering.
• Service utilization - what percent of the time the consumer threads are busy. This can be
calculated as (arrival rate × service time) / (number of consumers).
The more the service is utilized, the higher the probability that when a new message
arrives, it will have other messages ahead of it in the queue, and since R = S + W, the
response times will suffer. Since we already measure the queue size directly, the main use
of service utilization is capacity planning, and in particular detection of over-provisioned
systems. For a known utilization and fixed service times, if we know the arrival rates will
grow by 50% tomorrow, you can calculate the expected effect on response times:[viii]
Note that by replacing many small queues on the database server with one (or a few)
centralized queues in the application, you are in a much better position to calculate
utilization and predict the effect on response times.
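As an illustration of that calculation, a sketch using the single-server M/M/1 approximation R = S/(1 − ρ). The real multi-consumer formula (Erlang C) is more involved, and the rates below are invented:

```java
public class QueueMath {
    /** Utilization: (arrival rate x service time) / number of consumers. */
    static double utilization(double arrivalRate, double serviceTime, int consumers) {
        return arrivalRate * serviceTime / consumers;
    }

    /** M/M/1 approximation of response time, R = S / (1 - rho); illustration only. */
    static double responseTime(double serviceTime, double rho) {
        return serviceTime / (1.0 - rho);
    }

    public static void main(String[] args) {
        double rho = utilization(40, 0.1, 8);    // 40 msg/s, 100 ms each, 8 consumers
        double grown = utilization(60, 0.1, 8);  // arrival rate up 50%
        // At rho = 0.5 response time is about 2x service time; at rho = 0.75, about 4x.
        System.out.printf("%.2f -> %.2f%n", responseTime(0.1, rho), responseTime(0.1, grown));
    }
}
```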
2. Queues are inevitable. Capacity planning or not, the fact that arrival rates and service times are
random will ensure that there will be times when requests will be queued, unless you plan to
turn away a large percentage of your business.
I suspect that what is really meant by "capacity planning will eliminate need for queues" is that
it is possible to over-provision a system in a way that the queue servers (consumers) will have
very low utilization. In this case queues will be exceedingly rare so it may make sense to throw
the queue away and have the application threads communicate with the consumers directly.
The application will then have to throw away any request that arrives when the consumers are
busy, but in this system it will almost never happen. This is “capacity planning by
overprovisioning”. I've worked on many databases that rarely exceeded 5% CPU. You'll still need
to closely monitor the service utilization to make sure you increase your capacity to keep
utilization low. I would not call this type of capacity planning "proper", though.
On the other hand, the introduction of a few well-defined and well-understood queues will help
capacity planning. If we assume fixed server utilization, the size of the queue is proportional to
the number of servers, so on some systems it is possible to do capacity planning just by
examining the queue sizes.
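The relationship between queue size, arrival rate and time in the system is Little's law, which can be sketched in a few lines (illustrative numbers only, not measurements from this paper):

```python
# Little's law: L = lambda * W (the number of items in the system equals
# the arrival rate times the average time each item spends in the system).

def items_in_system(arrival_rate_per_s, time_in_system_s):
    return arrival_rate_per_s * time_in_system_s

# One consumer absorbing 100 requests/s, each spending 50 ms in the system:
print(items_in_system(100, 0.05))    # 5.0

# Four consumers at the same per-server utilization absorb 4x the arrival
# rate, so at an unchanged time-in-system the item count scales with the
# number of servers, as the text argues:
print(items_in_system(400, 0.05))    # 20.0
```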
3. Message queues are indeed complicated and not always stable beasts. Queues are a simple
concept. How did we get to a point where we need all those servers, protocols and applications
simply to create a queue?
Depending on your problem definition, it is possible that message queues are excessive
overhead. Sometimes all you need is a memory structure and a few pointers. My colleague Marc
Fielding created a high-performance queue system with a database table and two jobs. Some
developers consider even the database excessive overhead and prefer to implement their queues with
a file, split and xargs. If this satisfies your requirements, then by all means use those solutions.
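A database-table queue of this kind can be sketched as follows. This is a hypothetical minimal version using SQLite, not Marc Fielding's actual implementation (which the paper does not show): producers INSERT rows, and a consumer job claims the oldest pending row and marks it done.

```python
import sqlite3

# A minimal database-table queue: one table, a 'pending'/'done' status column.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE queue (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    payload TEXT,
    status TEXT DEFAULT 'pending')""")

def enqueue(payload):
    conn.execute("INSERT INTO queue (payload) VALUES (?)", (payload,))
    conn.commit()

def dequeue():
    # Claim the oldest pending message. In a real multi-consumer system this
    # SELECT-then-UPDATE must be made atomic (e.g. SELECT ... FOR UPDATE
    # SKIP LOCKED on databases that support it).
    row = conn.execute(
        "SELECT id, payload FROM queue WHERE status = 'pending' "
        "ORDER BY id LIMIT 1").fetchone()
    if row is None:
        return None
    conn.execute("UPDATE queue SET status = 'done' WHERE id = ?", (row[0],))
    conn.commit()
    return row[1]

enqueue("msg-1")
enqueue("msg-2")
print(dequeue())  # msg-1
print(dequeue())  # msg-2
print(dequeue())  # None
```

Even this toy version hints at the requirements creep the text describes next: crash recovery, multiple consumers and acknowledgements all complicate it quickly.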
In other cases, I've attempted to implement a simple queueing solution, but the requirements
kept piling up: What if we want to add more consumers? What if the consumer crashed and
only processed some of the messages it retrieved? By the time I finished tweaking my system to
address all the new requirements, it would have been far easier to use an existing solution. So I advise
using home-grown solutions only if you are reasonably certain the requirements will remain simple. If
you suspect that you'll have to start dealing with multiple subscribers, which may or may not
need to retrieve the same message multiple times, may or may not want to acknowledge messages,
and may or may not want to filter specific message types, then I recommend using an
existing solution.
ActiveMQ and RabbitMQ (acquired by SpringSource) are popular open source implementations, and
Oracle Advanced Queuing is free if you already have an Oracle RDBMS license. When choosing an
off-the-shelf message queue, it is important to understand how the system can be monitored and
to make sure that queue size, wait times and availability of the queue can be tracked by your
favorite monitoring tool. If high availability is a requirement, this should also be taken into
account when choosing a message queue provider, since different queue systems support
different HA options.
Application Caching:
The Problem:
The database is a sophisticated and well-optimized caching machine, but as we saw when we discussed
connection pools, it has its limitations when it comes to scaling. One of those limitations is that a single
database machine is limited in the amount of RAM it has, so if your data working set is larger than the
amount of memory available, your application will have to access the disk occasionally. Disk access is
roughly 10,000 times slower than memory access. Even a slight increase in the amount of disk access your
queries have to perform, the kind that happens naturally as your system grows, can have a devastating
impact on database performance.
With Oracle RAC, more cache memory is available by pooling memory from multiple machines
into a global cache. However, the performance improvement from the additional servers is not
proportional to what you'd see if you added more memory to the same machine. Oracle has to
maintain cache consistency between the servers, and this introduces significant overhead. RAC can
scale, but not in every case, and it requires careful application design to make this happen.
The Solution:
Memcached is a distributed, memory-only, key-value store. It can be used by the application server to
cache results of database queries that are used multiple times. The great benefit of Memcached is
that it is distributed and can use free memory on any server, allowing caching to be done outside of
Oracle's scarce buffer cache. If you have 5 application servers and allocate 1G of RAM to Memcached
on each server, you have 5G of additional cache.
Memcached's cache is an LRU, just like the buffer cache. If the application tries to store a new key and
there is no free memory, the oldest item in the cache is evicted and its memory used for the new
key.
According to the documentation, Memcached scales very well when adding servers because
the servers do not communicate with each other at all. Each client has a list of available servers and a
hash function that tells it which server holds the value for which key. When the
application requests data from the cache, it connects to a single server and accesses exactly one key. When
a single cache node crashes, there will be more cache misses and therefore more database requests,
but the rest of the nodes will continue operating as usual.
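The key-to-server mapping can be sketched like this (a simplified illustration of client-side hashing, not the code of any particular Memcached client; real clients typically use consistent hashing so that adding a node remaps only a fraction of the keys):

```python
import hashlib

# Each client hashes the key and picks one server from its own list;
# the servers themselves never talk to each other.
servers = ["cache1:11211", "cache2:11211", "cache3:11211"]

def server_for(key, servers):
    """Map a key to exactly one server via a hash of the key."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return servers[h % len(servers)]

for key in ["username:42", "username:43", "order:7"]:
    print(key, "->", server_for(key, servers))
```

Because every client applies the same hash to the same server list, all clients agree on which node holds each key without any coordination.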
I was unable to find any published benchmarks that confirm this claim, so I ran my own unofficial
benchmark, using Amazon's ElastiCache, a service which allows one to create a Memcached cluster and
add nodes to it.
A few comments regarding the use of Amazon's ElastiCache and how I ran the tests:
1. Amazon's ElastiCache is only usable from servers on Amazon's EC2 cloud. To run the test, I
created an ElastiCache cluster with two small servers (1.3G RAM, 1 virtual core), and one EC2
micro node (613 MB, up to two virtual cores for short bursts) running Amazon's Linux
distribution.
2. I ran the test using Brutis [ix], a Memcached load test framework written in PHP. The test is fairly
configurable, and I ran it as follows:
• 7 gets to 3 sets read/write mix; all reads and writes were random. Values were limited
to 256 bits.
• The first test ran with a key space of 10K keys, which fits easily in the memory of one
Memcached node. The node was pre-warmed with the keys.
• The second test ran with the same key space and two nodes, both pre-warmed.
• The third test was one node again, with 1M keys, which do not fit in the memory of one or two
nodes, and no pre-warming of the cache.
• The fourth test used two nodes and 1M keys. The second node was added after the first node
was already active.
• The first 3 tests ran for 5 minutes each; the fourth ran for 15 minutes.
• The single-node tests ran with 2 threads, and the two-node tests ran with four.
3. Amazon's cloud monitoring framework was used to monitor Memcached's statistics. It had two
annoying properties: it did not automatically refresh, and the values it showed were always 5
minutes old. In the future, it would be worth the time to install my own monitoring software on an
EC2 node to track Memcached performance.
Here is a chart of the total number of gets we could run on each node:
[chart: total gets per node]
Number of hits and misses per node:
[chart: hits and misses per node]
A few conclusions from the tests I ran:
1. In the tests I ran, get latency was 2ms on the AWS cluster and 0.0068ms on my desktop. It appears
that essentially the only latency you'll experience with Memcached is the network latency.
2. The ratio of hits to misses did not affect the total throughput of the cluster. The throughput was
somewhat better with a larger key space, possibly due to fewer get collisions.
3. Throughput dropped when I added the second server, and total throughput never exceeded 60K
gets per minute. It is likely that in the configuration I ran, the client could not sustain more than
60K gets per minute.
4. 60K random reads per minute at 2ms latency is pretty impressive for two very small servers,
rented at 20 cents an hour. You would need a fairly high-end configuration to get the same
performance from your database.
By using Memcached (or other application-side caching), the load on the database is reduced, since
there are fewer connections and fewer reads. Database slowdowns will have less impact on
application responsiveness: since on many pages most of the data arrives from the cache, the page can
display gradually without users feeling that they are waiting forever for results. Even better, if the
database is unavailable, you can still maintain partial availability of the application by displaying cached
results; in the best case, only write operations will be unavailable while the database is down.
The Architecture:
[Diagram: the Application Business Layer communicates with the Application Data Layer through a
Message Queue; the Data Layer consults Memcached, and reaches the database through the
DataSource Interface (a DataSource registered in JNDI), the Connection Pool and the JDBC Driver.]
New Problems:
Unlike Oracle's buffer cache, which is used by queries automatically, use of the application cache does
not happen automatically and requires code changes to the application. In this sense it is somewhat
similar to Oracle's result cache: it stores results on request, rather than caching data blocks
automatically. The changes required to use Memcached are usually made in the data layer: the code
that queries the database is replaced by code that queries the database only if the result was not found
in the cache first.
This places the burden of using the cache properly on the developers. It is said that the only difficult
problems in computer science are naming things and cache invalidation. The purpose of this paper is not
to solve the most difficult problems in computer science, but we will offer some advice on proper use of
Memcached.
In addition, Memcached presents the usual operational questions: how big should it be, and how can it
be monitored? We will discuss capacity planning and monitoring of Memcached as well.
Solutions to new problems:
The first step in integrating Memcached into your application is to rewrite the functions in your data
layer so that they look for data in the cache before querying the database.
For example, the following:
function get_username(int userid) {
    username = db_select("SELECT username FROM users WHERE userid = ?",
                         userid);
    return username;
}
Will be replaced by:
function get_username(int userid) {
    /* first try the cache */
    username = memcached_fetch("username:" + userid);
    if (!username) {
        /* not found: query the database */
        username = db_select("SELECT username FROM users WHERE userid = ?",
                             userid);
        /* then store in cache until the next get */
        memcached_add("username:" + userid, username);
    }
    return username;
}
We will also need to change the code that updates the database so it updates the cache as well;
otherwise we risk serving stale data:
function update_username(int userid, string username) {
    /* first update the database */
    result = db_execute("UPDATE users SET username = ? WHERE userid = ?",
                        username, userid);
    if (result) {
        /* database update successful: update cache */
        memcached_set("username:" + userid, username);
    }
}
Of course, not every function should be cached. The cache has limited size, and there is overhead in
checking the cache for data that is not actually there. The main benefit comes from caching the results
of expensive or highly redundant queries.
To use the cache effectively without risking data corruption, keep the following in mind:
1. Use ASH data to find the queries that consume the most database time. Queries that take a
significant amount of time to execute, and short queries that execute very often, are good candidates
for caching. Of course, many of these queries use bind variables and return different results for each
user. As we showed in the example, the bind variables can be used as part of the cache key to
store and retrieve results for each combination of binds separately. Due to the LRU nature of the
cache, commonly used combinations will remain in the cache and get reused, while infrequently used
combinations will be evicted.
2. Memcached takes large amounts of memory (the more the merrier!), but there is evidence [x] that
it does not scale well across a large number of cores. This makes Memcached a good candidate to
share a server with an application that makes intensive use of the CPU and doesn't require much
memory. Another option is to create multiple virtual machines on a single multi-core
server and install Memcached on each of them. However, this configuration means
that you will lose most of your caching capacity if that single physical server crashes.
3. Memcached is not durable. If you can't afford to lose specific information, store it in the
database before you store it in Memcached. This seems to imply that you can't use Memcached
to scale a system that primarily performs a large number of writes. In fact, it depends on the
exact bottlenecks: if your top wait event is "log file sync", you can use Memcached to reduce
the total amount of work the database does, reduce the CPU load, and therefore potentially
reduce the "log file sync" waits.
4. Some data should eventually be stored but can be lost without critical impact on the system.
Instrumentation and logging information is definitely in this category. Such information can be
kept in Memcached and written to the database infrequently, in batches.
5. Consider pre-populating the cache: if you rely on Memcached to keep your performance
predictable, the crash of a Memcached server will send significant amounts of traffic to the
database, and the effect on performance will be noticeable. When the server comes back, it can
take a while until all the data is loaded into the cache again, prolonging the period of reduced
performance. To shorten the slow period after a restart, consider a script that
pre-loads data into the cache when the Memcached server starts.
6. Consider very carefully what to do when the data is updated:
Sometimes it is easy to update the cache simultaneously: if a user changes their address and the
address is stored in the cache, update the cache immediately after updating the database. This
is the best-case scenario, as the cache remains useful through the update. The Memcached API contains
functions that allow changing data atomically and avoiding race conditions.
When the data in the cache is aggregated data, it may not be possible to update it in place, but
it will be possible to evict the current information as stale and reload it into the cache when it
is next needed. This can make the cache useless when the data is updated and reloaded very
frequently.
Sometimes it isn't even possible to figure out which keys should be evicted from the cache when a
specific field is updated, especially if the cache contains results of complex queries. This
situation is best avoided, but it can be dealt with by setting an expiration time on the data and
preparing to serve possibly-stale data for that period of time.
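The expiration-time fallback can be illustrated with a minimal in-process TTL cache. This is a sketch of the idea only, not Memcached itself (Memcached accepts an expiration time as a parameter of its set operation):

```python
import time

# Minimal TTL cache: entries expire after ttl_seconds, so possibly-stale
# data is served for at most that long after an update we failed to track.
class TTLCache:
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}   # key -> (value, expires_at)

    def set(self, key, value):
        self.store[key] = (value, time.time() + self.ttl)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.time() >= expires_at:
            del self.store[key]   # expired: force a reload from the database
            return None
        return value

cache = TTLCache(ttl_seconds=30)
cache.set("report:totals", {"orders": 1200})
print(cache.get("report:totals"))  # {'orders': 1200}
```

The TTL bounds the staleness window: after an untracked update, readers see the old value for at most ttl_seconds before a miss forces a fresh database read.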
How big should the cache be?
• It is better to have many servers with less memory each than a few servers with a lot of memory.
This minimizes the impact of one crashed Memcached server. Remember that there is no
performance penalty for a large number of nodes.
• Losing a Memcached instance will always send additional traffic to the database. You need
enough Memcached servers to make sure the extra traffic will not cause unacceptable
latency in the application.
• There is no downside to a cache that is too large, so in general allocate to Memcached all the
memory you can afford.
• If the average number of gets per item is very low, you can safely reduce the amount of memory
allocated.
• There is no "cache size advisor" for Memcached, and it is impossible to predict the effect of
growing or shrinking the cache based on the monitoring data available from Memcached alone.
SimCache is a tool that, based on detailed hit/miss logs from the existing Memcached, can simulate
an LRU cache and predict the hit/miss ratio at various cache sizes. In many environments
keeping such a detailed log is impractical, but tracking a sample of the requests may be possible
and can still be used to predict cache effects.
• Knowing the average latency of database reads under various loads, and the latency of
Memcached reads, should allow you to predict changes in response time as the Memcached size and
its hit ratio change. For example: suppose SimCache shows that with a cache size of 10G you will have
a hit ratio of 95% in Memcached, and that Memcached has a latency of 1ms in your system. With 5%
of the queries hitting the database, you expect the database CPU utilization to be around 20%, almost
100% of the DB Time on CPU, and almost no wait time on the queue between the business and data
layers (you tested this separately when sizing your connection pool). In this case the database
latency will be 5ms, so the expected average latency for the data layer is
0.95*1 + 0.05*5 = 1.2ms.
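The SimCache-style estimation and the latency arithmetic above can be sketched together: replay a request trace through a toy LRU cache of a candidate size to estimate the hit ratio, then blend the cache and database latencies. This is a toy model with made-up trace data, not SimCache itself:

```python
from collections import OrderedDict

def simulate_lru(trace, cache_size):
    """Replay a trace of keys through an LRU cache; return the hit ratio."""
    cache = OrderedDict()
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)          # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)   # evict the least recently used
    return hits / len(trace)

def data_layer_latency(hit_ratio, cache_ms, db_ms):
    """Expected latency = hit_ratio * cache + (1 - hit_ratio) * database."""
    return hit_ratio * cache_ms + (1 - hit_ratio) * db_ms

# The example from the text: 95% hits at 1ms, misses served by the DB at 5ms.
print(round(data_layer_latency(0.95, 1, 5), 2))   # 1.2

# A toy trace with a few hot keys; compare two candidate cache sizes.
trace = ["a", "b", "a", "c", "a", "b", "d", "a"]
for size in (2, 4):
    hr = simulate_lru(trace, size)
    print(size, hr, round(data_layer_latency(hr, 1, 5), 2))
```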
How do I monitor Memcached?
• Monitor the number of items, gets, sets and misses. An increase in the number of cache misses
means that the database load is increasing at the same time, and can indicate that more
memory is needed. Make sure that the number of gets is higher than the number of sets: if
you are setting more than getting, the cache is a waste of space. If the number of gets per item
is very low, the cache may be oversized. There is no downside to an oversized cache, but you
may want to use the memory for another purpose.
• Monitor the number of evictions. Data is evicted when the application attempts to store a new
item and there is no memory left. An increase in the number of evictions can also indicate that
more memory is needed. The eviction time shows the time between the last get of an item and its
eviction; if this period is short, it is a good indication that a memory shortage is making the cache
less effective.
• It is important to note that a low hit rate and a high number of evictions do not immediately mean
you should buy more memory. It is possible that your application is misusing the cache:
o Maybe the application sets large numbers of keys, most of which are never read again.
In this case you should reconsider the way you use the cache.
o Maybe the TTL for the keys is too short. In this case you will see a low hit rate but not
many evictions.
o Maybe the application frequently tries to get items that don't exist, perhaps due to data
purging of some sort. Consider setting the key with a "null" value, to make sure the
invalid searches do not hit the database over and over.
• Monitor for swapping. Memcached is intended to speed up performance by caching data in
memory; if that data spills to disk, it is doing more harm than good.
• Monitor the average response time. You should see very few requests that take over 1-2ms;
longer wait times can indicate that you are hitting the maximum connection limit for the server,
or that CPU utilization on the server is too high.
• Monitor that the number of connections to the server does not approach Memcached's
(configurable) max connections setting.
• Do not monitor "stats sizes" for statistics about the size of items in the cache; this command
locks up the entire cache.
All the values mentioned above can be read from Memcached using the stats command in its protocol.
You can run this command and get the results directly by connecting to port 11211 with telnet. Many
monitoring systems, including Cacti and Ganglia, include monitoring templates for Memcached.
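A monitoring script can send the plain-text stats command to port 11211 and parse the lines of the form "STAT <name> <value>" that come back. Here is a sketch of the parsing step, run against captured example output rather than a live server (the counter values below are made up for illustration):

```python
# Parse memcached `stats` output: each line is "STAT <name> <value>",
# terminated by "END". Sample output stands in for a live connection.
sample = """STAT curr_items 10432
STAT cmd_get 502311
STAT cmd_set 120473
STAT get_hits 451208
STAT get_misses 51103
STAT evictions 1842
END"""

def parse_stats(text):
    """Turn `stats` output into a dict of counter name -> integer value."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = int(parts[2])
    return stats

stats = parse_stats(sample)
hit_ratio = stats["get_hits"] / (stats["get_hits"] + stats["get_misses"])
print(f"hit ratio: {hit_ratio:.1%}")
```

From these counters you can derive the metrics discussed above: the hit ratio, the gets-to-sets ratio (cmd_get vs cmd_set), and the eviction rate between samples.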
i. Yuki Sugiyama, Minoru Fukui, Macoto Kikuchi, Katsuya Hasebe, Akihiro Nakayama, Katsuhiro Nishinari, Shin-ichi Tadaki, Satoshi Yukawa, "Traffic jams without bottlenecks - experimental evidence for the physical mechanism of the formation of a jam", New Journal of Physics, Vol. 10 (2008), 033001
ii. http://www.telegraph.co.uk/science/science-news/3334754/Too-many-cars-cause-traffic-jams.html
iii. James Morle, Scaling Oracle8i™: Building Highly Scalable OLTP System Architectures
iv. http://www.youtube.com/watch?v=xNDnVOCdvQ0
v. http://docs.oracle.com/javase/1.4.2/docs/guide/jdbc/getstart/datasource.html
vi. http://www.perfdynamics.com/Manifesto/USLscalability.html
vii. http://teddziuba.com/2011/02/the-case-against-queues.html
viii. http://www.cmg.org/measureit/issues/mit62/m_62_15.html
ix. http://code.google.com/p/brutis/
x. http://assets.en.oreilly.com/1/event/44/Hidden%20Scalability%20Gotchas%20in%20Memcached%20and%20Friends%20Presentation.pdf