
Understanding Non-uniform Memory Access – NUMA

Microprocessor challenge. As clock speed and the number of processors increase, it becomes increasingly difficult to reduce the memory latency required to use this additional processing power. To work around this, hardware vendors provide large L3 caches, but this is only a limited solution. The NUMA architecture provides a scalable answer to the problem. In a NUMA system, the nodes consist of interconnected processors and memory. Each processor can access its own local memory as well as memory on the other nodes, but accessing remote memory incurs greater latency than accessing local memory; hence the name non-uniform memory access.

NUMA Concepts

The trend in hardware has been towards more than one system bus, each serving a small set
of processors. Each group of processors has its own memory and possibly its own I/O channels.
However, each CPU can access memory associated with the other groups in a coherent way.
Each group is called a NUMA node. The number of CPUs within a NUMA node depends on the
hardware vendor. It is faster to access local memory than the memory associated with other
NUMA nodes. This is the reason for the name, non-uniform memory access architecture.

On NUMA hardware, some regions of memory are on physically different buses from other
regions. Because NUMA uses local and foreign memory, it will take longer to access some
regions of memory than others. Local memory and foreign memory are typically used in
reference to a currently running thread. Local memory is the memory that is on the same node
as the CPU currently running the thread. Any memory that does not belong to the node on
which the thread is currently running is foreign. Foreign memory is also known as remote
memory. The ratio of the cost to access foreign memory over that for local memory is called
the NUMA ratio. If the NUMA ratio is 1, it is symmetric multiprocessing (SMP). The greater the
ratio, the more it costs to access the memory of other nodes. Windows applications that are
not NUMA aware (including SQL Server 2000 SP3 and earlier) sometimes perform poorly on
NUMA hardware.
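The effect of the NUMA ratio on a running thread can be sketched with a small illustrative model (not a measurement; the function name and the cost units are hypothetical). It shows why a ratio of 1 behaves like SMP and why a larger ratio penalizes threads whose memory lands on a remote node:

```python
def effective_access_cost(local_cost: float, numa_ratio: float,
                          remote_fraction: float) -> float:
    """Average memory-access cost for a thread, given the NUMA ratio
    (remote cost divided by local cost) and the fraction of its
    accesses that land on a remote node. Illustrative model only."""
    remote_cost = local_cost * numa_ratio
    return (1 - remote_fraction) * local_cost + remote_fraction * remote_cost

# With a NUMA ratio of 1, cost is uniform regardless of placement (SMP).
print(effective_access_cost(100, 1.0, 0.5))   # 100.0
# With a ratio of 2, a thread doing half its accesses remotely pays 50% more.
print(effective_access_cost(100, 2.0, 0.5))   # 150.0
```

This is also why NUMA-aware software tries to keep a thread's allocations on the node where the thread runs: driving `remote_fraction` toward zero keeps the effective cost at the local latency.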

The main benefit of NUMA is scalability. The NUMA architecture was designed to surpass the
scalability limits of the SMP architecture. With SMP, all memory access is posted to the same
shared memory bus. This works fine for a relatively small number of CPUs, but not when you
have dozens, even hundreds, of CPUs competing for access to the shared memory bus. NUMA
alleviates these bottlenecks by limiting the number of CPUs on any one memory bus and
connecting the various nodes by means of a high speed interconnection.

Hardware NUMA
Computers with hardware NUMA have more than one system bus, each serving a small set of processors, with the node layout described above. Your hardware manufacturer can tell you whether your computer supports hardware NUMA.

The SMP and MPP Machine Architectures

This discussion requires a baseline understanding of the symmetric multiprocessing (SMP) and massively parallel processing (MPP) machine architectures. SMP systems allow any processor to work on any task no matter where the data for that task is located in memory; with proper operating system support, SMP systems can easily move tasks between processors to balance the workload efficiently.

SMP – Symmetric Multiprocessing System (Max 8 Processors)

Stated simply, an SMP machine has memory and disk that are equally accessible to any processor (hence the term "symmetric"). And symmetric really means symmetric: all the processors must be the same speed, the same stepping, and from the same manufacturer. They must be identical in every way. If you break any of these rules, you will get strange results, and strange results from QueryPerformanceCounter will be the least of your problems.

Entry-Level Systems – Before about 2006, entry-level servers and workstations with two processors dominated the SMP market. With the introduction of dual-core devices, SMP is found in most new desktop machines and in many laptop machines. The most popular entry-level SMP systems use the x86 instruction set architecture and are based on Intel's Xeon, Pentium D, Core Duo, and Core 2 Duo processors, or on AMD's Athlon64 X2, Quad FX, or Opteron 200 and 2000 series processors. Servers use those processors and other readily available non-x86 choices, including the Sun Microsystems UltraSPARC, Fujitsu SPARC64 III and later, SGI MIPS, Intel Itanium, Hewlett-Packard PA-RISC, DEC Alpha (Digital Equipment Corporation was acquired by Compaq, which later merged with Hewlett-Packard), IBM POWER, and Apple Computer PowerPC (specifically the G4 and G5 series, as well as the earlier PowerPC 604 and 604e series) processors. In all cases, these systems are also available in uniprocessor versions.

Mid-Range Servers – The Burroughs B5500 first implemented SMP in 1961; SMP was later implemented on other mainframes. Mid-level servers, using between four and eight processors, can be found using the Intel Xeon MP, the AMD Opteron 800 and 8000 series, and the above-mentioned UltraSPARC, SPARC64, MIPS, Itanium, PA-RISC, Alpha, and POWER processors.

MPP – Massively Parallel Processing Architecture (Max 32 Processors)


Each MPP machine's processors have dedicated disk and memory. A processor cannot access another processor's dedicated memory except by making a request through a special inter-processor memory link that acts like a token ring network. This link is very slow compared to an SMP machine's hardware memory bus. Some MPP architectures overcome this limitation, to an extent, by using a crossbar switch: a special hardware switch that connects the processors' memory in a matrix. This speeds up inter-memory access, but such switches are expensive and are normally found only on premium machines. MPP machines such as Teradata and Tandem can have crossbar switches. IBM's SP2 and NUMA-Q come with either high-speed fiber links or crossbar switches (the latter being more expensive).

The Difference Between SMP and MPP

This does not mean that SMP machines are better. The physics behind a hardware bus plated
into a circuit board limits how close they can be before electromagnetic interference becomes
unmanageable. Most SMP machines are limited to about 32 processors and cannot scale
beyond that.
MPP machines, since they share nothing and have no common hardware bus, can theoretically scale to a very large size (hence the name massively parallel processing). Some vendors, such as Sun, have scaled beyond 32 processors by replacing the hardware bus with a crossbar switch. The Sun UE10000 is one such example; it can scale up to 64 processors.

Conclusion

Both MPP and SMP machines have their own limitations. SMP machines cannot scale beyond the limits of the shared bus, and MPP machines require that applications be partitionable so the work can be evenly distributed across the MPP nodes. This also means that operations on MPP machines need to be fairly atomic, so that they do not need data from other nodes to complete.
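What "partitionable" means in practice can be sketched with a hash-partitioning scheme, the usual way MPP systems spread data so that per-key operations stay local to one node. The function, field names, and data below are hypothetical illustrations, not any particular vendor's API:

```python
import zlib

def partition_by_key(rows: list[dict], key: str, n_nodes: int) -> list[list[dict]]:
    """Hash-partition rows across n_nodes so that all rows sharing a
    partition key land on the same node; per-key operations can then
    complete locally, without cross-node requests. Illustrative sketch."""
    partitions = [[] for _ in range(n_nodes)]
    for row in rows:
        # crc32 gives a stable hash, unlike Python's randomized hash().
        node = zlib.crc32(str(row[key]).encode()) % n_nodes
        partitions[node].append(row)
    return partitions

orders = [{"customer": c, "amount": a}
          for c, a in [("ana", 10), ("bob", 20), ("ana", 5), ("cid", 7)]]
parts = partition_by_key(orders, "customer", 4)
# Every order for a given customer sits on one node, so a per-customer
# total is an atomic, node-local operation.
```

An operation that instead joins rows across different keys would force traffic over the slow inter-processor link, which is exactly the workload MPP handles poorly.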
