Introduction to NUMA (Non-Uniform Memory Access)
This is a primer on the NUMA hardware architecture...
In a typical SMP (symmetric multiprocessing) architecture, all memory accesses are posted to the
same shared memory bus. This works fine for a relatively small number of CPUs, but the problem
with the shared bus appears when you have dozens, even hundreds, of CPUs competing for access
to it. The result is a major performance bottleneck caused by the extremely high contention rate
among the many CPUs sharing the single memory bus.
The NUMA architecture was designed to surpass these scalability limits of the SMP architecture.
NUMA computers offer the scalability of MPP (Massively Parallel Processing), in that processors
can be added and removed at will without loss of efficiency, combined with the programming ease
of SMP.
Understanding Non-uniform Memory Access
Updated: 5 December 2005
Microsoft SQL Server 2005 is non-uniform memory access (NUMA) aware, and performs
well on NUMA hardware without special configuration. As clock speed and the number of
processors increase, it becomes increasingly difficult to reduce the memory latency
required to use this additional processing power. To circumvent this, hardware vendors
provide large L3 caches, but this is only a limited solution. NUMA architecture provides a
scalable solution to this problem. SQL Server 2005 has been designed to take advantage of
NUMA-based computers without requiring any application changes.
NUMA Concepts
The trend in hardware has been towards more than one system bus, each serving a small
set of processors. Each group of processors has its own memory and possibly its own I/O
channels. However, each CPU can access memory associated with the other groups in a
coherent way. Each group is called a NUMA node. The number of CPUs within a NUMA node
depends on the hardware vendor. It is faster to access local memory than the memory
associated with other NUMA nodes. This is the reason for the name, non-uniform memory
access architecture.
On NUMA hardware, some regions of memory are on physically different buses from other
regions. Because NUMA uses local and foreign memory, it will take longer to access some
regions of memory than others. Local memory and foreign memory are typically used in
reference to a currently running thread. Local memory is the memory that is on the same
node as the CPU currently running the thread. Any memory that does not belong to the
node on which the thread is currently running is foreign. Foreign memory is also known as
remote memory. The ratio of the cost to access foreign memory over that for local memory
is called the NUMA ratio. If the NUMA ratio is 1, it is symmetric multiprocessing (SMP). The
greater the ratio, the more it costs to access the memory of other nodes. Windows
applications that are not NUMA aware (including SQL Server 2000 SP3 and earlier)
sometimes perform poorly on NUMA hardware.
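The NUMA ratio defined above is simple arithmetic, and a small sketch makes the SMP boundary case concrete. The latency figures below are assumptions chosen for illustration, not measurements from any particular machine:

```python
# Illustrative only: the latency figures are assumed, not measured.
def numa_ratio(local_ns: float, foreign_ns: float) -> float:
    """Ratio of foreign (remote) memory access cost to local access cost."""
    return foreign_ns / local_ns

# A hypothetical machine: 100 ns local access, 160 ns remote access.
print(numa_ratio(local_ns=100, foreign_ns=160))  # 1.6 -> NUMA
print(numa_ratio(local_ns=100, foreign_ns=100))  # 1.0 -> effectively SMP
```

The higher the ratio, the more a thread pays for touching memory owned by another node, which is why NUMA-unaware placement of threads and allocations can hurt performance.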
The main benefit of NUMA is scalability. The NUMA architecture was designed to surpass
the scalability limits of the SMP architecture. With SMP, all memory access is posted to the
same shared memory bus. This works fine for a relatively small number of CPUs, but not
when you have dozens, even hundreds, of CPUs competing for access to the shared
memory bus. NUMA alleviates these bottlenecks by limiting the number of CPUs on any one
memory bus and connecting the various nodes by means of a high speed interconnection.
Hardware-NUMA vs. Soft-NUMA
NUMA can match memory with CPUs through specialized hardware (hardware NUMA) or by
configuring SQL Server memory (soft-NUMA). During startup, SQL Server configures itself
based on underlying operating system and hardware configuration or the soft-NUMA
setting. For both hardware and soft-NUMA, when SQL Server starts in a NUMA
configuration, the SQL Server log records a multinode configuration message for each
node, along with the CPU mask.
Hardware NUMA
Computers with hardware NUMA have more than one system bus, each serving a small set
of processors. Each group of processors has its own memory and possibly its own I/O
channels, but each CPU can access memory associated with other groups in a coherent
way. Each group is called a NUMA node. The number of CPUs within a NUMA node depends
on the hardware vendor. Your hardware manufacturer can tell you if your computer
supports hardware NUMA.
If you have hardware NUMA, it may be configured to use interleaved memory instead of
NUMA. In that case, Windows and therefore SQL Server will not recognize it as NUMA. Run
the following query to find the number of memory nodes available to SQL Server:
SELECT DISTINCT memory_node_id FROM sys.dm_os_memory_clerks
If SQL Server returns only a single memory node (node 0), either you do not have
hardware NUMA, or the hardware is configured as interleaved (non-NUMA). If you think
your hardware NUMA is configured incorrectly, contact your hardware vendor to enable
NUMA. SQL Server ignores NUMA configuration when hardware NUMA has four or fewer CPUs
and at least one node has only one CPU.
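The rule just stated can be expressed as a small predicate. This is only a sketch of the condition described in the text, with hypothetical node layouts; it is not SQL Server's actual startup code:

```python
# Sketch of the rule above: SQL Server ignores hardware NUMA when there
# are four or fewer CPUs in total and at least one node has only one CPU.
# Node layouts here are hypothetical examples.
def ignores_numa(cpus_per_node: list) -> bool:
    total = sum(cpus_per_node)
    return total <= 4 and any(n == 1 for n in cpus_per_node)

print(ignores_numa([3, 1]))  # True: 4 CPUs total, one single-CPU node
print(ignores_numa([2, 2]))  # False: no node has only one CPU
print(ignores_numa([4, 4]))  # False: more than four CPUs in total
```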
Soft-NUMA
SQL Server 2005 allows you to group CPUs into nodes referred to as soft-NUMA. You
usually configure soft-NUMA when you have many CPUs and do not have hardware NUMA,
but you can also use soft-NUMA to subdivide hardware NUMA nodes into smaller groups.
Only the SQL Server scheduler and SQL Server Network Interface (SNI) are soft-NUMA
aware. Memory nodes are created based on hardware NUMA and therefore not impacted by
soft-NUMA. So, for example, if you have an SMP computer with eight CPUs and you create
four soft-NUMA nodes with two CPUs each, you will only have one memory node serving all
four NUMA nodes. Soft-NUMA does not provide memory to CPU affinity.
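The eight-CPU example above can be written out as data to make the scheduler/memory split visible. The node numbering is illustrative; the point is that soft-NUMA groups schedulers, while memory nodes follow the hardware:

```python
# The SMP example above: eight CPUs split into four soft-NUMA nodes of
# two CPUs each. With no hardware NUMA there is only memory node 0, so
# every soft-NUMA node maps to that single memory node. Numbering is
# illustrative.
soft_numa_nodes = {0: [0, 1], 1: [2, 3], 2: [4, 5], 3: [6, 7]}
memory_node_of = {node: 0 for node in soft_numa_nodes}

print(len(soft_numa_nodes))          # 4 scheduler groups
print(set(memory_node_of.values()))  # {0}: one memory node serves all four
```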
The benefits of soft-NUMA include reducing I/O and lazy writer bottlenecks on computers
with many CPUs and no hardware NUMA. There is a single I/O thread and a single lazy
writer thread for each NUMA node. Depending on the usage of the database, these single
threads may be a significant performance bottleneck. Configuring four soft-NUMA nodes
provides four I/O threads and four lazy writer threads, which could increase performance.
You cannot create a soft-NUMA node that includes CPUs from different hardware NUMA nodes.
For example, if your hardware has eight CPUs (0..7) and you have two hardware NUMA
nodes (0-3 and 4-7), you can create soft-NUMA by combining CPU(0,1) and CPU(2,3). You
cannot create soft-NUMA using CPU(1, 5), but you can use CPU affinity to affinitize an
instance of SQL Server to CPUs from different NUMA nodes. In the previous example, if
SQL Server uses CPUs 0-3, you will have one I/O thread and one lazy writer thread. If
instead SQL Server uses CPUs 1, 2, 5, and 6, it will access two NUMA
nodes and have two I/O threads and two lazy writer threads.
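The affinity examples above can be sketched as code. The hardware layout is the one assumed in the text (CPUs 0-3 on node 0, CPUs 4-7 on node 1); the mask format mirrors the per-node CPU mask that SQL Server logs at startup:

```python
# Hardware layout assumed from the example: CPUs 0-3 on node 0,
# CPUs 4-7 on node 1.
NODE_OF = {cpu: (0 if cpu < 4 else 1) for cpu in range(8)}

def nodes_touched(cpus):
    """Distinct hardware NUMA nodes covered by an affinity set.
    SQL Server gets one I/O thread and one lazy writer thread per node."""
    return {NODE_OF[c] for c in cpus}

def affinity_mask(cpus):
    """CPU affinity bitmask: bit i set means CPU i is in the set."""
    mask = 0
    for c in cpus:
        mask |= 1 << c
    return mask

print(len(nodes_touched([0, 1, 2, 3])))  # 1 -> one I/O thread, one lazy writer
print(len(nodes_touched([1, 2, 5, 6])))  # 2 -> two of each
print(hex(affinity_mask([1, 2, 5, 6])))  # 0x66 (bits 1, 2, 5, 6 set)
```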
Non-Uniform Memory Access
From Wikipedia, the free encyclopedia
Non-Uniform Memory Access or Non-Uniform Memory Architecture (NUMA) is a
computer memory design used in multiprocessors, where the memory access time
depends on the memory location relative to a processor. Under NUMA, a processor can
access its own local memory faster than non-local memory, that is, memory local to
another processor or memory shared between processors.
NUMA architectures logically follow in scaling from symmetric multiprocessing (SMP)
architectures. Their commercial development came in work by Burroughs, Convex
Computer (later HP), SGI, Sequent and Data General during the 1990s. Techniques
developed by these companies later featured in a variety of Unix-like operating systems,
as well as to some degree in Windows NT and in later versions of Microsoft Windows.