Overview on NUMA

NUMA
MAATALLA Abed
abedmaatalla@gmail.com

• What is NUMA?
• History of processors.
• Close look on NUMA.
• UMA, NUMA & NUMA SMP architectures.
• Barriers of NUMA.
• Solutions.
• Existing simulators.
• Benefits of NUMA

What is NUMA?
• Non-Uniform Memory Access: some regions of memory
take longer to access than others.
• Designed to improve scalability on large SMP systems.
• A processor can access its own local memory faster than
non-local memory.
SMP: symmetric multiprocessing

What is NUMA?
• Groups of processors (NUMA node) have their own local
memory
– Any processor can access any memory, including the
one not "owned" by its group (remote memory)
– Non-uniform: accessing local memory is faster than
accessing remote memory
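
The gap between local and remote access can be put into numbers with a simple weighted average. The sketch below is only illustrative: the function name and the latency figures (100 ns local, 200 ns remote) are assumptions, not measurements from any machine.

```python
def average_access_latency(local_fraction, local_ns, remote_ns):
    """Expected memory latency when a fraction of accesses hit local memory."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


# Hypothetical latencies, chosen only for illustration.
LOCAL_NS = 100.0
REMOTE_NS = 200.0

for fraction in (1.0, 0.75, 0.5):
    latency = average_access_latency(fraction, LOCAL_NS, REMOTE_NS)
    print(f"{fraction:.0%} local -> {latency:.0f} ns on average")
```

Under these assumed figures, every remote access that replaces a local one adds to the average, which is why placement of tasks and data matters.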

What is NUMA?
• Nodes are linked to each other by a high-speed interconnect.
• NUMA limits the number of CPUs that share each memory bus.
• Each group of processors has its own memory and possibly its own I/O
channels.
• The number of CPUs within a NUMA node depends on the hardware
vendor.

What is NUMA?
• Facts:
– (Most of) the memory is allocated at task startup.
– Tasks are (usually) free to run on any processor.
• As a result, both local and remote accesses can happen
during a task's life.

History of processors.
• The common mental model of CPUs is stuck in the 1980s:
basically boxes that do arithmetic, logic, bit twiddling and
shifting, and loading and storing things in memory. That
model misses newer developments such as vector instructions
(SIMD) and hardware support for virtualization.
• Many supercomputer designs of the 1980s and 1990s
focused on providing high-speed memory access as
opposed to faster processors, allowing the computers to
work on large data sets at speeds other systems could
not approach.

History of processors.
• The first commercial implementation of a NUMA-based
Unix system was the Symmetrical Multi Processing XPS-
100 family of servers, designed by Dan Gielan of VAST
Corporation for Honeywell Information Systems Italy.

Close look on NUMA.
• One can view NUMA as a tightly coupled form of cluster
computing. The addition of virtual memory paging to a
cluster architecture can allow the implementation of
NUMA entirely in software. However, the inter-node
latency of software-based NUMA remains several orders
of magnitude greater (slower) than that of hardware-
based NUMA.
• NUMA addresses performance problems by providing
separate memory for each processor, which avoids the
performance hit when several processors attempt to
address the same memory.

Close look on NUMA
• Threads that share memory should be on the same
socket, and a memory-mapped I/O heavy thread should
make sure it’s on the socket that’s closest to the I/O
device it’s talking to.
• There are multiple levels of memory, such as CC and LLC
(last-level cache), because CPUs have become faster and
need faster memory access. This hierarchy is called the
memory tree.
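
The first point above (keeping cooperating threads on one socket) can be tried from user space on Linux. This is a minimal sketch, assuming a Linux system that exposes NUMA topology under `/sys/devices/system/node`; the helper names `parse_cpulist`, `cpus_of_node` and `pin_to_node` are made up for this example. It only restricts where the process may run; where its memory lands is still left to the operating system's policy.

```python
import os


def parse_cpulist(text):
    """Turn a Linux cpulist string such as "0-3,8-11" into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def cpus_of_node(node):
    """Read the CPUs that belong to one NUMA node from sysfs (Linux only)."""
    path = f"/sys/devices/system/node/node{node}/cpulist"
    with open(path) as handle:
        return parse_cpulist(handle.read())


def pin_to_node(node):
    """Restrict the calling process to the CPUs of the given NUMA node."""
    os.sched_setaffinity(0, cpus_of_node(node))  # pid 0 means "this process"


if __name__ == "__main__":
    try:
        pin_to_node(0)
        print("now restricted to CPUs", sorted(os.sched_getaffinity(0)))
    except (OSError, AttributeError, ValueError) as error:
        # Not Linux, no NUMA information in sysfs, or CPUs not permitted.
        print("could not pin to node 0:", error)
```

Threads that share data would all call `pin_to_node` with the same node number; an I/O-heavy thread would pick the node closest to its device.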

Close look on NUMA
• NUMA VS ccNUMA: The difference is almost
nonexistent at this point. ccNUMA stands for Cache-
Coherent NUMA, but NUMA and ccNUMA have really
come to be synonymous. The applications for non-cache
coherent NUMA machines are almost non-existent, and
they are a real pain to program for, so unless specifically
stated otherwise, NUMA actually means ccNUMA.

Close look on NUMA
• When a processor looks for data at a certain memory
address, it first looks in the L1 cache on the
microprocessor itself, then in a somewhat larger L2
cache nearby, and then in a third level of cache
that the NUMA configuration provides, before seeking the
data in the "remote memory" located near the other
microprocessors. Each of these groups of processors is a
node in the interconnection network. NUMA maintains a
hierarchical view of the data on all the nodes.
• Interconnection network (ICN): as mentioned above, the
ICN links the nodes so that they can exchange data, in the
same way that physical links allow the machines of a
cluster to exchange data.
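
The lookup order described above can be sketched as a search through an ordered list of levels. This is a toy model only: the level contents and addresses are hypothetical, and the "local memory" step between the last cache and remote memory is an assumption added to complete the picture.

```python
def find_level(address, levels):
    """Return the name of the first level that holds the address.

    `levels` is an ordered list of (name, set_of_addresses) pairs, searched
    from the closest level to the farthest.
    """
    for name, contents in levels:
        if address in contents:
            return name
    raise KeyError(f"address {address:#x} is not mapped anywhere")


# Hypothetical contents for one processor's view of the hierarchy.
levels = [
    ("L1 cache", {0x10}),
    ("L2 cache", {0x10, 0x20}),
    ("L3 cache", {0x10, 0x20, 0x30}),
    ("local memory", {0x10, 0x20, 0x30, 0x40}),
    ("remote memory", {0x50}),
]

for address in (0x10, 0x30, 0x50):
    print(f"{address:#x} found in {find_level(address, levels)}")
```

The farther down the list a lookup has to go, the more expensive it is; the remote-memory step is the one that crosses the interconnection network.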

UMA, NUMA & NUMA SMP architectures
• Uniform Memory Access (UMA): all
processors have the same latency when
accessing memory. This architecture
scales only to a limited number of
processors.
• Non-Uniform Memory Access (NUMA):
each processor has its own local
memory. The memory of other
processors is accessible, but the
latency to access it is not the same.
Such an access is called a "remote
memory access".

UMA, NUMA & NUMA SMP architectures
• NUMA SMP: the hardware
trend is to use NUMA systems
with several NUMA nodes, as
shown in the figure. A NUMA node
has a group of processors
that share memory. A
NUMA node can use its local
bus to interact with its local
memory. Multiple NUMA
nodes can be combined to form an
SMP. A common SMP bus can
interconnect all NUMA nodes.

Barriers of NUMA.
• Spread data between memories.
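
One simple way to spread data, shown here only as an illustration, is to interleave pages round-robin across the nodes. The function name and the page and node counts are assumptions made for this sketch; it is not presented as the policy of any particular system.

```python
from collections import Counter


def interleave_pages(num_pages, num_nodes):
    """Assign page i to node i % num_nodes (round-robin interleaving)."""
    if num_nodes < 1:
        raise ValueError("need at least one node")
    return [page % num_nodes for page in range(num_pages)]


placement = interleave_pages(num_pages=10, num_nodes=4)
print(placement)           # node chosen for each page
print(Counter(placement))  # how many pages each node received
```

Interleaving balances the load on the memories, at the price that most accesses from any single processor become remote.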

Barriers of NUMA.
• Spread tasks between sockets.

Barriers of NUMA.
• I/O NUMA: the location of I/O devices needs to be
considered during placement and scheduling.

Barriers of NUMA.
• In the 1980s there was just memory. Then CPUs got fast
enough relative to memory that people wanted to add a
cache. It’s bad news if the cache is inconsistent with the
backing store (memory), so the cache has to keep some
information about what it’s holding on to so it knows
if/when it needs to write things to the backing store.
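
The bookkeeping mentioned above is often a "dirty" flag per cached line. The class below is a minimal sketch of a write-back cache under that assumption; the class name, the oldest-first eviction rule and the dictionary used as backing store are all choices made for this example.

```python
class WriteBackCache:
    """Tiny write-back cache: each cached line carries a dirty flag."""

    def __init__(self, backing_store, capacity):
        self.backing_store = backing_store  # dict: address -> value
        self.capacity = capacity
        self.lines = {}  # address -> [value, dirty]; dicts keep insertion order

    def _evict_oldest(self):
        address = next(iter(self.lines))
        value, dirty = self.lines.pop(address)
        if dirty:  # only modified lines need to be written back
            self.backing_store[address] = value

    def _load(self, address):
        if address not in self.lines:
            if len(self.lines) >= self.capacity:
                self._evict_oldest()
            self.lines[address] = [self.backing_store.get(address, 0), False]
        return self.lines[address]

    def read(self, address):
        return self._load(address)[0]

    def write(self, address, value):
        line = self._load(address)
        line[0] = value
        line[1] = True  # the cache now differs from the backing store

    def flush(self):
        for address, line in self.lines.items():
            if line[1]:
                self.backing_store[address] = line[0]
                line[1] = False


memory = {0: 1, 1: 2, 2: 3}
cache = WriteBackCache(memory, capacity=2)
cache.write(0, 42)
print(memory[0])  # still the old value: the write is only in the cache
cache.read(1)
cache.read(2)     # evicts address 0, which is dirty, so it is written back
print(memory[0])
```

With several processors, each with its own cache, the same flag is what a coherence mechanism has to track across caches.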

Barriers of NUMA.
• Data requested by more
than one processor.
• How far apart the
processors are from their
associated memory
banks.

Solutions
• Some hardware implementations exist that solve some of
these problems. However, buying a high-end server just to
test new approaches is expensive, and such a server needs
special conditions such as cooling and space.
• As developers, we could create a simulator that implements
different approaches in order to analyse and improve
performance and scalability. This means the simulator
needs to handle both the software and the hardware side,
for example by reporting remote memory access events and
calculating the execution time of each process and of I/O
events.
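
A very small version of such a simulator can be sketched as a replay of a memory access trace. Everything here is an assumption made for illustration: the trace format, the two cost constants and the function name. I/O events are not modelled.

```python
from collections import defaultdict

# Hypothetical costs in nanoseconds; a real simulator would calibrate these.
LOCAL_COST_NS = 100
REMOTE_COST_NS = 200


def simulate(trace):
    """Replay memory accesses and collect per-process statistics.

    `trace` is an iterable of (process, cpu_node, memory_node) tuples: the
    process issuing the access, the node it runs on, and the node that owns
    the memory it touches.
    """
    stats = defaultdict(lambda: {"accesses": 0, "remote": 0, "time_ns": 0})
    for process, cpu_node, memory_node in trace:
        entry = stats[process]
        entry["accesses"] += 1
        if cpu_node == memory_node:
            entry["time_ns"] += LOCAL_COST_NS
        else:
            entry["remote"] += 1  # a remote memory access event
            entry["time_ns"] += REMOTE_COST_NS
    return dict(stats)


trace = [
    ("A", 0, 0),
    ("A", 0, 1),
    ("B", 1, 1),
    ("A", 0, 0),
    ("B", 1, 0),
    ("B", 1, 0),
]

for process, entry in sorted(simulate(trace).items()):
    print(process, entry)
```

Different placement or scheduling approaches can then be compared by generating a trace for each and looking at the remote counts and total times.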

Existing simulators
A number of existing projects can be named, such as
RSIM, SICOSYS, SIMT and simNUMA. These projects have
done good work, and each has its strengths and
weaknesses, but the work has only started: there is much
more to cover and to implement in this field.
Many approaches and theories still need to be tested
and proved or disproved.
For these reasons, simulators will play an important role
in the near future.

Benefits of NUMA
As mentioned above, the main benefit is scalability. It is
extremely difficult to scale SMP beyond a certain number
of CPUs, because past that point the memory bus is under
heavy contention. NUMA is one way of reducing the number
of CPUs competing for access to a shared memory bus. This
is accomplished by having several memory buses and only
a small number of CPUs on each of those buses.
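
The effect of adding buses is simple arithmetic, sketched below. The CPU and bus counts are hypothetical, and the function name is made up for this example.

```python
def cpus_per_bus(total_cpus, num_buses):
    """Largest number of CPUs sharing one bus when CPUs are spread evenly."""
    if num_buses < 1:
        raise ValueError("need at least one bus")
    return -(-total_cpus // num_buses)  # ceiling division


for buses in (1, 2, 4, 8):
    print(f"32 CPUs on {buses} bus(es): up to {cpus_per_bus(32, buses)} CPUs per bus")
```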

I’m interested in things that
CPUs can’t do yet but will be
able to do in the near future.
