Chapter 2

3 STRUCTURES FOR ARRAY PROCESSORS
A synchronous array of parallel processors is called an array processor. It consists of multiple processing elements (PEs) under the supervision of one control unit. An array processor handles a single instruction stream and multiple data streams; in this sense, array processors are also known as SIMD computers. SIMD machines are especially designed to perform vector computations over matrices or arrays of data. In this book, the terms array processor, parallel processor, and SIMD computer are used interchangeably.

SIMD computer organization
In general, an array processor may assume one of two slightly different configurations, as illustrated below.
[Figure: the two array processor configurations. Recoverable labels: shared memory; processing elements PE1, PE2, ..., PEN; memory units MU1, ..., MUN; interconnection network.]
In the first configuration, an array processor consists of N processors linked to a common memory, which all processors share. If two or more processors need to fetch information from the same memory section, the control unit forms a queue of the requests and serves them in order of arrival time.

The second configuration differs from the first in two respects. First, each processor has its own local memory, so the processors do not need to fetch information from a shared common memory. Second, the system is enhanced with an interconnection network: any processor may send data to, or receive data from, any other processor via the interconnection network. The second configuration is faster than the first because the processors communicate via the interconnection network instead of waiting in a long queue to read from or write to the common memory.
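To make the queueing behaviour of the first configuration concrete, here is a minimal Python sketch that serves requests to one shared memory section in order of arrival time. The function name `serve_in_arrival_order`, the `(pe, arrival_time)` request format, and the fixed `service_time` of one time unit are illustrative assumptions, not details given in the text.

```python
from collections import deque


def serve_in_arrival_order(requests, service_time=1):
    """Serve memory requests one at a time, earliest arrival first.

    requests: list of (pe_name, arrival_time) pairs (hypothetical format).
    Returns a list of (pe_name, start_time, finish_time) tuples.
    """
    # sorted() is stable, so requests with equal arrival times keep their order.
    queue = deque(sorted(requests, key=lambda request: request[1]))
    clock = 0
    schedule = []
    while queue:
        pe_name, arrival = queue.popleft()
        start = max(clock, arrival)      # wait until the memory section is free
        clock = start + service_time     # the section is busy until this time
        schedule.append((pe_name, start, clock))
    return schedule


if __name__ == "__main__":
    for entry in serve_in_arrival_order([("PE1", 0), ("PE2", 0), ("PE3", 0)]):
        print(entry)
```

Under these assumptions, three processing elements that request the same section at time 0 finish at times 1, 2, and 3, which illustrates how the queue grows with the number of competing processors.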
3.1 Interconnection networks
An interconnection network is an integral part of the architecture of an SIMD computer. The performance of an SIMD computer depends essentially on the type, topology, and characteristics of the interconnection network chosen.
Operation mode
Two types of communication can be identified in interconnection networks: synchronous and asynchronous. Synchronous communication is needed when communication paths are established synchronously, either for a data manipulation function or for a data or instruction broadcast. Asynchronous communication is needed for multiprocessing, in which connection requests are issued dynamically. A system may also support both synchronous and asynchronous processing, which is called combined processing. SIMD interconnection networks usually operate synchronously.

Control strategy
A typical interconnection network consists of switching elements and interconnection links. Interconnection functions are realized by properly setting the controls of the switching elements. The control settings can be managed by a centralized controller or by the individual switching elements. The former strategy is called centralized control and the latter distributed control. Most existing SIMD interconnection networks employ centralized control of all switching elements by the control unit.

Network topology
A network topology can be depicted as a graph in which nodes represent switching points and edges represent communication links.

[Figure: Network topologies. Recoverable labels: static; dynamic; crossbar; single-stage; multistage; blocking; nonblocking; rearrangeable.]
The topologies are classified into two categories: static and dynamic. In a static topology, the links between two processors are passive, and dedicated buses cannot be reconfigured for direct connections to other processors. In a dynamic topology, on the other hand, links can be reconfigured by setting the network's active switching elements.

Topologies in static networks can be classified according to the number of dimensions required for layout. One-dimensional, two-dimensional, and three-dimensional examples are shown below. One-dimensional topologies include the linear array used for pipeline architectures. Two-dimensional topologies include the ring, star, tree, mesh, and systolic array. Three-dimensional topologies include the 3-cube, cube-connected cycles, etc.

[Figure: static interconnection networks. One-dimensional: (a) linear array. Two-dimensional: (b) ring, (c) mesh, (d) star, (e) tree, (f) systolic array. Three-dimensional: (g) 3-cube.]
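As a small illustration of how static topologies fix each processor's neighbours, the sketch below lists the direct neighbours of a node in a linear array, a ring, and a 3-cube. It assumes nodes are numbered from 0 and, for the cube, that two nodes are linked exactly when their binary labels differ in one bit; the function names are hypothetical.

```python
def linear_array_neighbors(node, size):
    """Neighbours in a linear array: the end nodes have only one neighbour."""
    return [other for other in (node - 1, node + 1) if 0 <= other < size]


def ring_neighbors(node, size):
    """Neighbours in a ring: a linear array whose two ends are joined."""
    return sorted({(node - 1) % size, (node + 1) % size})


def cube_neighbors(node, dimensions=3):
    """Neighbours in a cube: flip each bit of the node label in turn."""
    return [node ^ (1 << bit) for bit in range(dimensions)]


if __name__ == "__main__":
    print(linear_array_neighbors(0, 8))
    print(ring_neighbors(0, 8))
    print(cube_neighbors(5))
```

Because these links are fixed, a message between nodes that are not neighbours must be relayed through intermediate nodes, which is the limitation that dynamic networks address with reconfigurable switches.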
We consider two classes of dynamic networks: single-stage and multistage.

A single-stage network, also called a recirculating network, is a network with N inputs and N outputs. It consists of a single stage of switches whose outputs are connected back to their own (or the processors') inputs through wraparound connections. Data items may have to recirculate through the single stage several times before reaching their final destinations. The number of recirculations needed depends on the connectivity of the single-stage network: in general, the higher the connectivity, the fewer recirculations are needed. The crossbar network is the extreme case, in which only one circulation is needed to establish any connection path. However, a fully connected crossbar network has cost $O(N^2)$, which may be prohibitive for large N. Most recirculating networks have cost $O(N \log N)$ or lower, which is more cost-effective for large N.

Many stages of interconnected switches form a multistage network, which uses many switch boxes. Each switch is essentially an interconnection device with two inputs and two outputs. The four possible states of a switch are illustrated below: straight, exchange, upper broadcast, and lower broadcast. A four-function switch may be in any of these four legitimate states. In SIMD interconnection networks, a crossbar switch can be set to one of two states: straight or exchange.
[Figure: (a) a crossbar switch with inputs and outputs labeled 0 and 1, and its four states: (b) straight, (c) exchange, (d) upper broadcast, (e) lower broadcast.]
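The four switch states can be modelled as a function from two inputs to two outputs. The sketch below is one possible reading, assuming that "upper broadcast" copies the upper input to both outputs and "lower broadcast" copies the lower input to both; the function name `apply_switch` and the state strings are illustrative.

```python
def apply_switch(state, upper_in, lower_in):
    """Return (upper_out, lower_out) of a 2x2 switch in the given state."""
    if state == "straight":
        return upper_in, lower_in        # each input goes to the same-side output
    if state == "exchange":
        return lower_in, upper_in        # the two inputs swap sides
    if state == "upper broadcast":
        return upper_in, upper_in        # assumed: upper input sent to both outputs
    if state == "lower broadcast":
        return lower_in, lower_in        # assumed: lower input sent to both outputs
    raise ValueError(f"unknown switch state: {state!r}")


if __name__ == "__main__":
    for state in ("straight", "exchange", "upper broadcast", "lower broadcast"):
        print(state, apply_switch(state, "a", "b"))
```

Only the straight and exchange states are one-to-one, which is consistent with restricting switches to those two states when the network is used to realize permutations.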
Blocking networks
In blocking networks, simultaneous connections of more than one terminal pair may result in conflicts in the use of network communication links. Examples of blocking networks are $\Omega$ (omega), $\Omega^{-1}$ (flip), BL (baseline), and BF (butterfly). Banyan networks form a special class of blocking networks. A banyan network is a full-access, unique-path network. A network is a full-access network if there is a path in the network connecting any input with any output. A network is a unique-path network if there is exactly one path connecting any input-output pair. The $\Omega$, $\Omega^{-1}$, BL, and BF networks are banyan networks.

Any full-access, unique-path blocking network is self-routing. In a self-routing network, routing can be performed in a distributed manner, using the destination address as a routing tag. That is, any output of the network can be reached from any input by simply following the binary address of the output. A self-routing algorithm uses the ith most significant bit of the destination address to set the switch in the ith stage, selecting the upper output of the switch if this bit is 0 and the lower output otherwise. For example, in the network shown below, output 5 (101 in binary) can be reached from any input by choosing the lower output in stage 1, the upper output in stage 2, and the lower output in stage 3. A similar self-routing algorithm exists for almost all banyan networks.
[Figure: blocking networks with N=8, inputs and outputs numbered 0 to 7: (a) Omega network, (b) Flip network, (c) Butterfly network.]
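The destination-tag rule described above can be traced in a few lines of Python. This is a sketch under one common model of the Omega network, assumed here rather than taken from the figure: before every stage the line number undergoes a left circular shift of its n bits, and the switch then replaces the lowest bit with the routing bit (0 for the upper output, 1 for the lower output). The name `omega_route` is hypothetical.

```python
def omega_route(source, destination, n):
    """Trace a path through an n-stage Omega network with 2**n inputs.

    Returns one (stage, output_port, line_number) tuple per stage.
    """
    mask = (1 << n) - 1
    position = source
    hops = []
    for stage in range(1, n + 1):
        # Routing bit: the stage-th most significant bit of the destination.
        bit = (destination >> (n - stage)) & 1
        # Interstage wiring: left circular shift of the n-bit line number.
        shuffled = ((position << 1) | (position >> (n - 1))) & mask
        # The switch output chosen by the routing bit becomes the low bit.
        position = (shuffled & ~1) | bit
        hops.append((stage, "lower" if bit else "upper", position))
    return hops


if __name__ == "__main__":
    for hop in omega_route(source=2, destination=5, n=3):
        print(hop)
```

For destination 5 (101 in binary) the ports chosen are lower, upper, lower regardless of the source, and after n stages every bit of the source address has been replaced by a destination bit, so the path always ends at the requested output.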
A direct consequence of the self-routing property is that the path from an input to an output within the network is completely specified once the addresses of the input and the output are known. Let the inputs and outputs of the network be numbered in binary from 0 to $N-1$, with an input x represented by $x_n x_{n-1} \cdots x_1$ and an output y by $y_n y_{n-1} \cdots y_1$. Then the 2n-bit string $x_n x_{n-1} \cdots x_1 y_n y_{n-1} \cdots y_1$ specifies the path from x to y completely. This 2n-bit string is referred to as the path identifier. The position of the path at any intermediate stage of the network can be found by observing an n-bit string within the path identifier.

Before studying the networks mentioned above, we define some useful permutations. Let $x \in \{0, 1, \ldots, N-1\}$, where $N = 2^n$ and $x = x_n x_{n-1} \cdots x_2 x_1$.

The perfect shuffle $\sigma$ is the permutation such that $\sigma(x) = x_{n-1} x_{n-2} \cdots x_1 x_n$; that is, $\sigma(x)$ is a left circular shift of the bits in the binary representation of x. The inverse $\sigma^{-1}$ of the perfect shuffle is called the unshuffle.

For $1 \le k \le n$, the kth subshuffle $\sigma_k$ is defined by
$\sigma_k(x) = x_n x_{n-1} \cdots x_{k+1} \, x_{k-1} x_{k-2} \cdots x_1 x_k$,
which is a perfect shuffle on the k least significant bits of x. Similarly, the kth supershuffle $\sigma^k$ is defined by
$\sigma^k(x) = x_{n-1} x_{n-2} \cdots x_{n-k+1} x_n \, x_{n-k} \cdots x_2 x_1$,
that is, a perfect shuffle on the k most significant bits of x.

The bit reversal $\rho$ is the permutation such that $\rho(x) = x_1 x_2 \cdots x_{n-1} x_n$. The butterfly $\beta$ is the permutation such that $\beta(x) = x_1 x_{n-1} x_{n-2} \cdots x_3 x_2 x_n$. For $1 \le k \le n$, the kth sub and super bit reversals $\rho_k(x)$ and $\rho^k(x)$ are defined as the bit reversal operations on the k least significant and k most significant bits of x, respectively. A similar definition leads to $\beta_k(x)$ and $\beta^k(x)$.
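The permutations defined above are simple bit manipulations on n-bit labels. The following Python sketch implements the perfect shuffle, unshuffle, subshuffle, supershuffle, bit reversal, and butterfly for an integer x with n bits, where bit $x_n$ is the most significant. The function names are illustrative choices, not notation from the text.

```python
def shuffle(x, n):
    """Perfect shuffle: left circular shift of the n-bit label."""
    return ((x << 1) | (x >> (n - 1))) & ((1 << n) - 1)


def unshuffle(x, n):
    """Inverse perfect shuffle: right circular shift of the n-bit label."""
    return (x >> 1) | ((x & 1) << (n - 1))


def subshuffle(x, n, k):
    """Perfect shuffle applied to the k least significant bits only."""
    low_mask = (1 << k) - 1
    return (x & ~low_mask) | shuffle(x & low_mask, k)


def supershuffle(x, n, k):
    """Perfect shuffle applied to the k most significant bits only."""
    low_bits = n - k
    return (shuffle(x >> low_bits, k) << low_bits) | (x & ((1 << low_bits) - 1))


def bit_reversal(x, n):
    """Reverse the order of the n bits."""
    return int(format(x, f"0{n}b")[::-1], 2)


def butterfly(x, n):
    """Swap the most and least significant bits, leaving the rest in place."""
    msb = (x >> (n - 1)) & 1
    lsb = x & 1
    if msb != lsb:
        x ^= (1 << (n - 1)) | 1   # flipping both bits swaps them
    return x


if __name__ == "__main__":
    n = 3
    for x in range(1 << n):
        print(format(x, "03b"), format(shuffle(x, n), "03b"),
              format(bit_reversal(x, n), "03b"), format(butterfly(x, n), "03b"))
```

With k equal to n, both `subshuffle` and `supershuffle` reduce to the full perfect shuffle, which matches the definitions.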
The baseline network admits a recursive characterization. Let $N = 2^n$. $BL_2$ is the single-stage network consisting of one $2 \times 2$ switch. For $N \ge 4$, assuming that copies of $BL_{N/2}$ are available, $BL_N$ is the serial cascade of a single stage of $2 \times 2$ switches and two copies of $BL_{N/2}$, with the unshuffle permutation between them. The $\Omega_N$ network has $\log_2 N$ stages of $2 \times 2$ switches, where the interstage connection pattern is the perfect shuffle. The $\Omega^{-1}$ network is just the inverse of $\Omega$, so its interstage connection pattern between consecutive stages is the unshuffle. The BF network employs the $\beta_k$ connection pattern between stages $k-1$ and $k$, for $2 \le k \le n$.

A blocking network can realize only some of the $N!$ one-to-one mappings of its inputs onto its outputs. In general, a $(\log_2 N)$-stage network has $(N/2)\log_2 N$ two-state switches, so it can realize only $2^{(N/2)\log_2 N} = N^{N/2}$ out of the $N!$ mappings. The ratio of the number of permutations realized by a network to the number of all permutations is called the combinatorial power (CP) of the network. The combinatorial power of the $\Omega$, $\Omega^{-1}$, BL, and BF networks is

$CP = \dfrac{N^{N/2}}{N!}$.

Balanced Matrices Characterization
The functional characterization is useful for checking permutations that can be expressed in algebraic form. We first define the function $L(x_1, x_2)$ to be the number of least significant bits that agree in the binary representations of $x_1$ and $x_2$. Similarly, the function $M(y_1, y_2)$ equals the number of most significant bits that agree in the binary representations of $y_1$ and $y_2$. The $\Omega$ network with N inputs and outputs has a conflict for a permutation $\pi$ if there are two distinct inputs $x_1$ and $x_2$, with outputs $y_1 = \pi(x_1)$ and $y_2 = \pi(x_2)$, such that

$L(x_1, x_2) + M(y_1, y_2) \ge n$.

That is, a conflict is caused by two inputs $x_1$ and $x_2$ if the number of their common least significant bits plus the number of common most significant bits of their outputs equals or exceeds n. Conversely, if this sum is less than n for all pairs of inputs, the permutation is realizable by the Omega network. This idea is the basis of the method called balanced matrices characterization. We illustrate the method with some examples. Consider the following permutation, expressed as input-output pairs in binary representation.

Inputs   Outputs
000      010
001      110
010      100
011      101
100      111
101      011
110      001
111      111
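The pairwise conflict condition $L + M \ge n$ can be checked mechanically. The sketch below is a minimal Python version; the function names and the two example permutations for N = 8 (the identity and the bit reversal) are illustrative choices rather than examples from the text. It also counts the conflict-free permutations for N = 4, which gives a small check against the $N^{N/2}$ figure.

```python
from itertools import combinations, permutations


def common_low_bits(a, b, n):
    """L(a, b): agreeing bits counted upward from the least significant bit."""
    count = 0
    while count < n and ((a >> count) & 1) == ((b >> count) & 1):
        count += 1
    return count


def common_high_bits(a, b, n):
    """M(a, b): agreeing bits counted downward from the most significant bit."""
    count = 0
    while count < n and ((a >> (n - 1 - count)) & 1) == ((b >> (n - 1 - count)) & 1):
        count += 1
    return count


def has_omega_conflict(perm, n):
    """True if some pair of distinct inputs satisfies L + M >= n."""
    return any(
        common_low_bits(x1, x2, n) + common_high_bits(perm[x1], perm[x2], n) >= n
        for x1, x2 in combinations(range(len(perm)), 2)
    )


def count_conflict_free(n):
    """Count the permutations of 2**n elements with no conflicting pair."""
    size = 1 << n
    return sum(not has_omega_conflict(p, n) for p in permutations(range(size)))


if __name__ == "__main__":
    identity = list(range(8))
    bit_reversed = [0, 4, 2, 6, 1, 5, 3, 7]
    print(has_omega_conflict(identity, 3))      # no pair reaches L + M = 3
    print(has_omega_conflict(bit_reversed, 3))  # inputs 0 and 4 conflict
    print(count_conflict_free(2))
```

For N = 4 the count is 16, which equals $N^{N/2} = 4^2$ out of the $4! = 24$ permutations. The exhaustive count grows factorially, so it is only practical for very small N.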
The above table can be viewed as a matrix of binary values with N rows and 2n columns. The permutation can be tested for Omega-passability by verifying the above condition for every pair of rows in this matrix. This representation, however, makes it unnecessary to test the condition for all pairs of inputs. Instead, we start from the leftmost column of this $N \times 2n$ matrix and examine n consecutive columns at a time, moving towards the right. We refer to the $N \times n$ matrix formed by the n consecutive columns starting at column i as the ith window. The first window is merely the matrix of the N input labels, and the (n+1)th window is the matrix of the outputs. Now consider the $N \times n$ matrix corresponding to any window. All numbers in the first window are distinct, since they represent the inputs. The numbers occurring in the intermediate windows are determined by the permutation. If the rows in every window of the matrix are distinct, then the permutation is realizable by the Omega network. An $N \times n$ matrix satisfying this condition is called a balanced matrix. Conversely, if two identical rows appear in any window, the matrix is not balanced, and the corresponding permutation is not realizable by the Omega network.

Rearrangeable and nonblocking networks
A network is called rearrangeable if it can perform all possible connections between inputs and outputs by rearranging its existing connections, so that a connection path for a new input-output pair can always be established. It is well known that a rearrangeable network consists of at least $2\log_2 N - 1$ stages. A serial cascade of any two of the $\Omega$, $\Omega^{-1}$, BL, $BL^{-1}$, BF, and $BF^{-1}$ networks, in which the last stage of the first network is merged with the first stage of the second network, is a rearrangeable network.

A network that can handle all possible connections without blocking is called a nonblocking network. Although both rearrangeable and nonblocking networks have combinatorial power equal to 1, there is an essential difference between them. In a nonblocking network, whatever the existing connection paths, it is always possible to establish a path between a new input-output pair, because there is more than one path between any input-output pair. In a rearrangeable network, by contrast, a path can sometimes be established only by going back and changing the states of switches.

Historically, the first economical design of an interconnection network with CP = 1 but requiring fewer than $N^2$ switches was given by Clos. Let $N = n^2$ for some $n > 1$. The three-stage symmetric Clos network consists of n copies of $n \times m$ crossbar switches in the first stage and n copies of $m \times n$ crossbar switches in the third stage, where $m = 2n - 1$. The middle stage is made up of m copies of $n \times n$ crossbar switches. In such a network, any switch of the first stage can reach any switch in the second stage. Similarly, any switch of the second stage is directly connected to any switch in the third stage.

3.2 Problems

1. Use the path identifier to find a path connecting input 5 to output 7 in an Omega network with 16 inputs and outputs.
Solution. The path identifier is 01010111. So a path connecting 5 to 7 passes through output $(1010)_2 = 10$ of the first stage, output $(0101)_2 = 5$ of the second stage, output $(1011)_2 = 11$ of the third stage, and output $(0111)_2 = 7$ of the fourth stage.

2. Calculate the combinatorial power of the network in problem 1.
Solution. $CP = \dfrac{2^{32}}{16!}$.

3. Use the balanced matrix characterization to determine whether or not the permutation

$\pi = \begin{pmatrix} 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 7 & 5 & 6 & 4 & 2 & 0 & 1 & 3 \end{pmatrix}$

can be realized by an Omega network with 8 inputs and outputs.
Solution. The matrix of input and output indices in binary is:

000 111
001 101
010 110
011 100
100 010
101 000
110 001
111 011

It can be easily observed that in the 3rd window 011 appears twice. So the Omega network cannot realize this permutation.
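The window test used in Problem 3 is easy to automate. The sketch below builds the $N \times 2n$ matrix of input and output labels as bit strings, slides an n-column window across it, and reports the first window that contains a repeated row. The function names are hypothetical, and windows are numbered from 1 as in the text.

```python
def path_matrix(perm, n):
    """Rows of the N x 2n matrix: input label followed by output label."""
    return [format(x, f"0{n}b") + format(y, f"0{n}b") for x, y in enumerate(perm)]


def window(rows, i, n):
    """The ith window (1-based): n consecutive columns starting at column i."""
    return [row[i - 1:i - 1 + n] for row in rows]


def first_unbalanced_window(perm, n):
    """Return (window_number, repeated_row), or None if every window is balanced."""
    rows = path_matrix(perm, n)
    for i in range(1, n + 2):
        seen = set()
        for row in window(rows, i, n):
            if row in seen:
                return i, row
            seen.add(row)
    return None


if __name__ == "__main__":
    problem_3 = [7, 5, 6, 4, 2, 0, 1, 3]
    print(first_unbalanced_window(problem_3, 3))
    print(first_unbalanced_window(list(range(8)), 3))
```

For the permutation of Problem 3 the function returns `(3, '011')`, matching the solution above, while the identity permutation passes every window and yields `None`.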
