2. What is a NUMA System?
Non-uniform memory access is a computer memory design used in
multiprocessing, where the memory access time depends on the
memory location relative to the processor. Under NUMA, a processor
can access its own local memory faster than non-local memory.
(Figure: NUMA topology; P indicates a processor.)
3. Main Characteristics
• Consists of several nodes.
• Each node contains a subset of the system's CPUs and a portion of its RAM.
• Programs can transparently access memory on local and remote nodes without changes to the code.
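On Linux, the kernel exposes this node structure through sysfs. As a minimal sketch (assuming a Linux system; on other platforms the directory simply does not exist and an empty list is returned), the nodes of a NUMA machine can be enumerated like this:

```python
import glob
import os
import re

def list_numa_nodes(sysfs_root="/sys/devices/system/node"):
    """Return the NUMA node IDs the Linux kernel exposes (empty if none)."""
    nodes = []
    for path in glob.glob(os.path.join(sysfs_root, "node[0-9]*")):
        m = re.search(r"node(\d+)$", path)
        if m:
            nodes.append(int(m.group(1)))
    return sorted(nodes)

# A single-socket desktop typically reports one node; a multi-socket
# server reports one node per socket (or more).
print(list_numa_nodes())
```

Each `nodeN` directory in turn lists the CPUs and memory belonging to that node, matching the characteristics above.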
4. In Brief
Modern CPUs can generate an immense load on the memory subsystem. This causes congestion on memory controllers and interconnect links. When multiple cores access a single node, memory latencies can increase to up to 1200 cycles.
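Latencies like these are usually measured with a pointer-chasing microbenchmark: each load depends on the previous one, so the hardware prefetcher cannot hide the memory latency. A minimal sketch of the technique follows; note that in pure Python the interpreter overhead dominates, so this illustrates the method rather than reproducing cycle-accurate numbers like the 1200-cycle figure above.

```python
import random
import time

def chase_latency_ns(n=1_000_000):
    """Time one step of a dependent pointer chase, in nanoseconds."""
    # Build a random cyclic permutation: following nxt[i] visits every
    # index once, and each load depends on the previous result, which
    # defeats hardware prefetching.
    order = list(range(n))
    random.shuffle(order)
    nxt = [0] * n
    for i in range(n - 1):
        nxt[order[i]] = order[i + 1]
    nxt[order[-1]] = order[0]

    i = 0
    t0 = time.perf_counter()
    for _ in range(n):
        i = nxt[i]  # dependent access: cannot start until the last one finishes
    dt = time.perf_counter() - t0
    return dt / n * 1e9

print(f"{chase_latency_ns():.0f} ns per dependent access")
</parameter>```

Running the same chase against memory on a local node versus a remote node (e.g. via `numactl --membind`) is one way to expose the latency gap the slides describe.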
5. Local vs. Remote Differences for Single-Threaded Applications
Performance never degraded by more than 20 percent, even when all memory requests were remote.
6. Local vs. Remote Differences for Multithreaded Applications
The figure compares the two policies by showing the performance difference between the best and worst policy for each benchmark.
(F) indicates first touch
(I) indicates interleave
(-) indicates a negligible difference
7. Observations
The first observation to make is that no one policy is best for all applications. Several applications perform best with the first-touch policy, but many prefer interleaving. The second observation is that NUMA effects beyond the remote-access penalty can indeed severely affect performance.
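Why neither policy always wins can be seen in a toy model (my own illustration, not the study's experiment): first touch places each page on the node of the thread that touches it first, while interleave spreads pages round-robin across nodes. With shared data that one thread initializes, first touch maximizes that thread's locality but concentrates all traffic on one node; interleave lowers the local-access ratio but balances the load.

```python
def first_touch(accesses, thread_node):
    """Place each page on the node of the first thread that touches it."""
    placement = {}
    for tid, page in accesses:
        placement.setdefault(page, thread_node[tid])
    return placement

def interleave(pages, nodes):
    """Place pages round-robin across nodes."""
    return {p: nodes[i % len(nodes)] for i, p in enumerate(sorted(pages))}

def local_ratio(accesses, placement, thread_node):
    """Fraction of accesses that hit the accessing thread's own node."""
    local = sum(1 for tid, page in accesses if placement[page] == thread_node[tid])
    return local / len(accesses)

# Two threads on two nodes; thread 0 first-touches every page during
# initialization, then both threads share all pages.
thread_node = {0: 0, 1: 1}
accesses = [(0, p) for p in range(8)] + [(t, p) for p in range(8) for t in (0, 1)]
pages = {p for _, p in accesses}

ft = first_touch(accesses, thread_node)
il = interleave(pages, [0, 1])
print(local_ratio(accesses, ft, thread_node))  # higher: all pages local to thread 0
print(local_ratio(accesses, il, thread_node))  # lower, but traffic is balanced
```

First touch wins on the local-access ratio here, yet every access lands on node 0's memory controller; interleave trades locality for balance, which is exactly the tension the observations describe.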
8. Further Investigated Characteristics
• Local access ratio
• Memory latency
• Memory-controller imbalance
• Average interconnect usage
• Average interconnect imbalance
• IPC (instructions per cycle)
9. How to Make It Better
Avoiding performance pitfalls on NUMA systems requires considering how the nodes are connected, where the program's memory is placed, and how it accesses that memory.
A NUMA memory-management algorithm should place importance on congestion management, rather than focusing solely on reducing remote accesses.
The effects of imbalance and the local access ratio are reflected in the memory-access latency.
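One practical lever for controlling where memory is placed is CPU affinity: under the Linux kernel's default first-touch policy, pages are allocated on the node of the CPU that first touches them, so pinning a process to one node's CPUs keeps its memory local. A minimal sketch follows (Linux-only; it also assumes CPU 0 belongs to node 0, which is common but not guaranteed):

```python
import os

def bind_to_cpus(cpus):
    """Restrict the current process to the given CPUs, so that pages it
    first-touches are allocated from those CPUs' local node.

    Returns the resulting affinity set, or None where the call is
    unavailable (non-Linux platforms).
    """
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, set(cpus))  # 0 = the calling process
        return sorted(os.sched_getaffinity(0))
    return None

# Pin to CPU 0 (assumed here to sit on node 0) before initializing data.
print(bind_to_cpus({0}))
```

Tools like `numactl` offer the same control externally (e.g. binding both CPUs and memory to a node, or interleaving memory across all nodes), which is useful when the congestion-versus-locality trade-off above favors spreading the traffic instead.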
10. Conclusion
NUMA architecture enables scaling the processor count of today's server-class systems. In the near future, expect systems to have even more NUMA nodes and more complicated NUMA topologies. The two NUMA concerns of congestion and locality are hard to reconcile, and for any particular application we can't know the best memory placement beforehand.