Flynn's Taxonomy

{SI, MI} x {SD, MD} = {SISD, SIMD, MISD, MIMD}

SISD = Single Instruction Single Data: classical von Neumann machines.
SIMD = Single Instruction Multiple Data: also called array processors or data-parallel machines.
MISD = Multiple Instruction Single Data: does not exist in practice.
MIMD = Multiple Instruction Multiple Data: control parallelism.
Grains of Parallelism
Example 1: Fine-grained parallelism (SIMD).
[Figure: processors P0-P3 execute in lockstep, separated by frequent synchronization points.]

Example 2: Coarse-grained parallelism (MIMD).
[Figure: processors P0-P3 each execute an arbitrary number of instructions between synchronization points.]
A taxonomy of parallel computers:

SIMD
    Vector
    Array
MIMD
    Multiprocessors (shared memory)
        UMA
        COMA
        NUMA: CC-NUMA, NC-NUMA
    Multicomputers (message passing)
        MPP: Grid, Cube
        COW

UMA = Uniform Memory Access
NUMA = Non-Uniform Memory Access
COMA = Cache Only Memory Access
MPP = Massively Parallel Processor
COW = Cluster Of Workstations
CC-NUMA = Cache-Coherent NUMA
NC-NUMA = No-Cache NUMA
Function pipe: the CRAY-1 defined the style of vector processing. Let n be the size of each vector. Then the time to compute f(V1, V2) is k + (n - 1) clocks, where k is the length (number of stages) of the pipe for f: the first result emerges after the pipe fills in k clocks, and each of the remaining n - 1 results emerges one per clock.
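The timing formula can be checked with a small sketch (the 6-stage pipe and 64-element vectors below are illustrative numbers, not taken from the text):

```python
def pipeline_time(n, k):
    """Clocks to compute f(V1, V2) element-wise on a k-stage pipe.

    The first result appears after the pipe fills (k clocks); each of
    the remaining n - 1 results then emerges one per clock.
    """
    return k + (n - 1)

# e.g. a 6-stage functional pipe over 64-element vectors:
print(pipeline_time(64, 6))  # 69 clocks
```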
Array Processing
In array processing, each of the operations f(v1j, v2j) on the components of the two vectors is carried out simultaneously, in one step.
In the early 1960s, ILLIAC IV defined the style of array processing. More recent efforts are the Connection Machines CM-2 and CM-5, known as data-parallel machines. The style is now virtually dead, but the SIMD concept survives in MMX / SSE instructions.
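To make the contrast concrete, here is a minimal Python sketch of the element-by-element (SISD) loop next to the one-step element-wise (SIMD) view; the vectors and the operation (addition) are illustrative assumptions:

```python
v1 = [1.0, 2.0, 3.0, 4.0]
v2 = [5.0, 6.0, 7.0, 8.0]

# SISD: one f(v1[j], v2[j]) per step, n steps in total.
out = []
for j in range(len(v1)):
    out.append(v1[j] + v2[j])

# SIMD view: conceptually all n operations happen in one step.
# The comprehension only expresses the element-wise form; real
# hardware (array processors, MMX/SSE) performs them in parallel.
out_simd = [a + b for a, b in zip(v1, v2)]
print(out_simd)  # [6.0, 8.0, 10.0, 12.0]
```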
Shared Memory
[Figure: processors P connected to a common shared memory.]
Processors are connected to memory via the memory bus (which is fast) or via switches.
The address spaces of the processes overlap; interprocess communication is via shared memory.
Multi-computers
[Figure: processor-memory pairs (P, M) connected by an interconnection network.]
Processes have private address spaces; interprocess communication is via message passing only.
1. Processor-Memory Interconnection
2. Cache Coherence
3. Synchronization Support
4. Scalability
5. Distributed Shared Memory
What is a switch?
[Figure: processors P0-P7 connected to memory modules M0-M7 through a switching network.]
Blocking vs. Non-blocking Networks
Non-blocking network: all possible one-to-one connections among the PEs are feasible.
Blocking network: some one-to-one mappings are not feasible. This is reminiscent of telephone connections.
The Omega network is a blocking network. Why? How many distinct mappings are possible?
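One way to answer the counting question: an N x N Omega network has (N/2) log2 N two-state (straight/cross) switches, so it can realize at most 2^((N/2) log2 N) = N^(N/2) distinct mappings, whereas a non-blocking network must support all N! permutations. A short sketch of this argument for N = 8:

```python
import math

def omega_settings(n):
    """Number of distinct switch settings of an n x n Omega network.

    It has (n/2) * log2(n) two-state (straight/cross) switches, so
    2 ** ((n/2) * log2(n)) = n ** (n/2) settings in total.
    """
    stages = int(math.log2(n))
    return 2 ** ((n // 2) * stages)

n = 8
print(omega_settings(n))   # 4096 realizable settings
print(math.factorial(n))   # 40320 permutations requested
# 4096 < 40320, so some permutations are infeasible: blocking.
```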
Cross-bar switch
An (N x N) cross-bar is a non-blocking network, but it needs N^2 switches, so its cost is prohibitive for large N.
[Figure: a 4 x 4 cross-bar connecting PE0-PE3 to M0-M3.]
Limited bus bandwidth is a bottleneck, so processors use private caches to hold shared variables, which reduces bus traffic. However, due to bus saturation, the architecture does not scale beyond a certain limit (say, 8 processors).
The Sun Enterprise 10000 uses a 16 x 16 cross-bar switch, called the Gigaplane-XB. Each P is a cluster of four 64-bit UltraSPARC CPUs; each M is 4 GB. At this scale there is no resource-saturation problem, but the switch cost is O(N^2).
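A quick sketch of why the O(N^2) cost matters: compare the crosspoint count of a cross-bar with the 2 x 2 switch count of an Omega network (sizes chosen for illustration):

```python
import math

def crossbar_switches(n):
    """An n x n cross-bar needs one crosspoint per (input, output)
    pair: n**2 switches."""
    return n * n

def omega_switches(n):
    """An n x n Omega network needs (n/2) * log2(n) 2x2 switches."""
    return (n // 2) * int(math.log2(n))

for n in (16, 64, 1024):
    print(n, crossbar_switches(n), omega_switches(n))
# The cross-bar grows quadratically; the Omega network grows
# only as (n/2) log2 n, at the price of being a blocking network.
```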
[Figure: a 2 x 2 cross-bar switch with inputs 0, 1 and outputs 0, 1.]
This switch, or a variation of it, is used in connecting processors with memories. The resulting Omega network is self-routing: no central router is necessary. Switch complexity is (N/2) log2 N, so the architecture is scalable; but since the network is blocking, it leads to switch conflicts.
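Self-routing can be sketched as follows: at each stage the line number is perfect-shuffled, and the switch then steers the message according to the next destination bit, most significant bit first. Below is a minimal simulation of that scheme (the bit conventions follow the standard Omega construction, not a specific figure in the text):

```python
def omega_route(src, dst, n):
    """Trace the self-routing path from input src to output dst
    in an n x n Omega network (n a power of two).

    Each stage: perfect-shuffle (rotate the log2(n)-bit address
    left by one), then let the 2x2 switch set the low bit to the
    next destination bit. The destination address alone steers
    the message; no central router is consulted.
    """
    bits = n.bit_length() - 1                  # log2(n) stages
    line, path = src, [src]
    for i in range(bits):
        line = ((line << 1) | (line >> (bits - 1))) & (n - 1)  # shuffle
        d = (dst >> (bits - 1 - i)) & 1                        # next dst bit
        line = (line & ~1) | d                                 # switch exchange
        path.append(line)
    return path

print(omega_route(0b010, 0b110, 8))  # [2, 5, 3, 6]: arrives at output 6
```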
NUMA Multiprocessors
[Figure: processor-memory pairs (P, M) connected by a shared interconnect.]
Cm* is the oldest known example of a NUMA architecture. It implemented a shared virtual address space. Processors have faster access to the memory modules closest to them; access to remote memory modules is slower. Today's large-scale NUMA machines use multistage interconnection networks (instead of a bus) for better scalability.
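The local-versus-remote penalty can be captured by a toy average-access-time model; the 1:10 latency ratio below is an illustrative assumption, not a figure from the text:

```python
def mean_access(local_frac, t_local=1.0, t_remote=10.0):
    """Average memory access time in a toy NUMA model: a fraction
    local_frac of references hit the processor's own module."""
    return local_frac * t_local + (1 - local_frac) * t_remote

print(mean_access(0.9))  # 1.9: even 10% remote traffic dominates
```

This is why NUMA software tries to place each process's data in the memory module closest to the processor running it.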