Marc Snir
1
Switching Networks, Permutations and FFT Graphs
1-to-1 connection).
• SN is configured to connect 0 → 1, 1 → 2, 2 → 3, 3 → 0.
• SN is a permutation network if it can be configured to connect any permutation of inputs to outputs.
2
Benes Network
• A network with N inputs/outputs (N a power of 2) is built recursively from two columns of 2x2 switches and two N/2xN/2 Benes networks.
• Each switch connects to the top and bottom subnetworks.
[Figure: top and bottom N/2xN/2 Benes networks]
3
Benes Networks Permute
• Construct a bipartite graph: the nodes are the switches of the left and right columns. Two nodes u, v are connected by an edge if an input at switch u has to reach an output at switch v.
• Color the edges with two colors so that the two edges at each switch get different colors. This is always possible (all cycles have even length).
• Route the connections of one color through the top subnetwork and the connections of the other color through the bottom subnetwork.
• Each switch routes one connection to the top subnetwork and one connection to the bottom subnetwork.
• Use the same approach recursively to set up each subnetwork.
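The steps above can be sketched in Python. This is a minimal sketch, not taken from the slides. It assumes that inputs 2i and 2i+1 share input switch i, that outputs 2j and 2j+1 share output switch j, and that a straight switch sends its even port through the top subnetwork. The names `color_connections`, `benes_settings` and `benes_apply` are hypothetical.

```python
def color_connections(perm):
    """Two-color the connections: 0 = top subnetwork, 1 = bottom subnetwork."""
    n = len(perm)
    inv = [0] * n
    for x, y in enumerate(perm):
        inv[y] = x
    color = [None] * n
    for start in range(n):
        x = start
        while color[x] is None:          # walk one even-length cycle
            color[x] = 0
            mate = inv[perm[x] ^ 1]      # input whose output shares x's output switch
            color[mate] = 1
            x = mate ^ 1                 # input that shares mate's input switch
    return color, inv


def benes_settings(perm):
    """Recursively compute switch settings; perm[x] is the output of input x."""
    n = len(perm)
    if n == 2:
        return {"cross": perm[0] == 1}
    color, inv = color_connections(perm)
    half = n // 2
    sub = [[0] * half, [0] * half]       # permutations for top and bottom halves
    for x in range(n):
        sub[color[x]][x // 2] = perm[x] // 2
    return {
        "in": [color[2 * i] == 1 for i in range(half)],
        "out": [color[inv[2 * j]] == 1 for j in range(half)],
        "top": benes_settings(sub[0]),
        "bottom": benes_settings(sub[1]),
    }


def benes_apply(settings, values):
    """Push values through the configured network and return the outputs."""
    n = len(values)
    if n == 2:
        return list(values[::-1]) if settings["cross"] else list(values)
    half = n // 2
    top_in, bottom_in = [], []
    for i in range(half):
        a, b = values[2 * i], values[2 * i + 1]
        if settings["in"][i]:
            a, b = b, a
        top_in.append(a)
        bottom_in.append(b)
    top_out = benes_apply(settings["top"], top_in)
    bottom_out = benes_apply(settings["bottom"], bottom_in)
    out = []
    for j in range(half):
        a, b = top_out[j], bottom_out[j]
        if settings["out"][j]:
            a, b = b, a
        out += [a, b]
    return out
```

For a permutation `perm`, `benes_apply(benes_settings(perm), list(range(len(perm))))` should place value x at position `perm[x]`.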
4
Example
• Permute: 0->4, 1->2, 2->0, 3->7, 4->5, 5->3, 6->1, 7->6
• Coloring: 0->4, 1->2, 2->0, 3->7, 4->5, 5->3, 6->1, 7->6 (each connection drawn in one of the two colors)
[Figure: 8-input Benes network with inputs and outputs 0 to 7, showing the colored connections]
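One valid coloring for this permutation can be computed with a short sketch. It assumes inputs 2i, 2i+1 share an input switch and outputs 2j, 2j+1 share an output switch. The function name `two_color` is hypothetical. Each cycle can be colored in two ways, so the coloring shown on the slide may differ from the one printed here.

```python
def two_color(perm):
    """Label each connection 'top' or 'bottom' so that every switch sees both."""
    n = len(perm)
    inv = [0] * n
    for x, y in enumerate(perm):
        inv[y] = x
    color = [None] * n
    for start in range(n):
        x = start
        while color[x] is None:
            color[x] = "top"
            mate = inv[perm[x] ^ 1]   # connection ending at the same output switch
            color[mate] = "bottom"
            x = mate ^ 1              # connection starting at mate's input switch
    return color


perm = [4, 2, 0, 7, 5, 3, 1, 6]       # 0->4, 1->2, 2->0, 3->7, 4->5, 5->3, 6->1, 7->6
color = two_color(perm)
for name in ("top", "bottom"):
    print(name, [f"{x}->{perm[x]}" for x in range(8) if color[x] == name])
# top ['0->4', '2->0', '5->3', '7->6']
# bottom ['1->2', '3->7', '4->5', '6->1']
```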
5
FFT Network
[Figure: 8-input FFT network, rows labeled with 3-bit binary addresses]
6
Homework 1
7
Homework 2
• Prove: an FFT SN, with all switches straight, performs the bit reversal permutation ($a_n \ldots a_1 \to a_1 \ldots a_n$)
[Figure: 8-input FFT network, rows labeled 000 to 111]
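The claim can be checked numerically for small N before attempting a proof. The sketch below assumes a particular wiring that is not recoverable from this text: a column of 2x2 switches whose top outputs feed a top N/2 network and whose bottom outputs feed a bottom N/2 network, as in the left half of a Benes network. The function names are hypothetical.

```python
def straight_fft_network(values):
    """Route values through the assumed network with every switch straight."""
    n = len(values)
    if n == 1:
        return list(values)
    top = values[0::2]      # even ports leave through the top subnetwork
    bottom = values[1::2]   # odd ports leave through the bottom subnetwork
    return straight_fft_network(top) + straight_fft_network(bottom)


def bit_reverse(x, bits):
    """Reverse the lowest `bits` bits of x."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (x & 1)
        x >>= 1
    return r


out = straight_fft_network(list(range(8)))
print(out)  # [0, 4, 2, 6, 1, 5, 3, 7]
print(all(out[bit_reverse(x, 3)] == x for x in range(8)))  # True
```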
8
Homework 3
9
Conclusion
10
Transpose
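• Specific, hard permutation: transpose of a $\sqrt{N} \times \sqrt{N}$ matrix.
• Assume $N = 2^n$, $n$ even. Transpose is $X_{a_n \ldots a_1} \leftrightarrow X_{a_{n/2} \ldots a_1 a_n \ldots a_{n/2+1}}$ (rotate binary address).
• Algorithm, for $B^2 < M$: processors read $B \times B$ submatrices, transpose them in memory and store back.
• $T = N/(PB)$
[Figure: $\sqrt{N} \times \sqrt{N}$ matrix with a $B \times B$ submatrix]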
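A sequential sketch of the tiled algorithm, with a check that transposition rotates the binary address. It assumes row-major storage, a matrix side divisible by B, and lines of B consecutive words. The counter only counts line reads. The names `rotate_address` and `blocked_transpose` are hypothetical.

```python
def rotate_address(k, n):
    """Swap the high and low n/2 bits of an n-bit address (n even)."""
    half = n // 2
    low = k & ((1 << half) - 1)
    high = k >> half
    return (low << half) | high


def blocked_transpose(x, side, b):
    """Transpose a side x side row-major matrix tile by tile; count line reads."""
    assert side % b == 0
    out = [None] * (side * side)
    line_reads = 0
    for ti in range(0, side, b):
        for tj in range(0, side, b):
            tile = []
            for r in range(b):                 # read b lines of b words each
                start = (ti + r) * side + tj
                tile.append(x[start:start + b])
                line_reads += 1
            for c in range(b):                 # write the transposed tile back
                start = (tj + c) * side + ti
                out[start:start + b] = [tile[r][c] for r in range(b)]
    return out, line_reads


x = list(range(64))                            # N = 64, n = 6, side = 8
out, reads = blocked_transpose(x, side=8, b=2)
print(reads)                                   # 32, i.e. N / B
print(all(out[rotate_address(k, 6)] == x[k] for k in range(64)))  # True
```

With P processors working on disjoint tiles, the N/B line reads would be shared evenly, which matches the N/(PB) figure above.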
11
Cont.
• $B^2 > M$ (a $B \times B$ submatrix no longer fits in memory). Assume $M = 2^m$, $B = 2^b$.
• Basic operation: read $2^{m-b}$ aligned blocks of $2^b$ words, permute in cache, write back. Can permute
$X_{a_n \ldots a_1} \leftrightarrow X_{a_n \ldots a_{n/2+m-b+1} \, a_{m-b} \ldots a_1 \, a_{n/2} \ldots a_{m-b+1} \, a_{n/2+m-b} \ldots a_{n/2+1}}$
(permute $M/B \times B$ submatrices) in one pass, in time $N/(BP)$.
• Problem essentially solved when each block (of size B) contains the right set of elements; can then move the blocks to the right place in one pass.
• Can complete transposition in $b / (2(m-b))$ passes
$T = O\left( \frac{N}{PB} \cdot \frac{\lg B}{\lg(M/B)} \right)$
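The one-pass address permutation can be written as a bit-field swap. The sketch below reads the formula as exchanging the low m−b address bits with the m−b bits just above position n/2. That reading is an assumption, and the name `swap_fields` is hypothetical.

```python
def swap_fields(k, n, m, b):
    """Swap bits a_{m-b}..a_1 of k with bits a_{n/2+m-b}..a_{n/2+1}."""
    w = m - b                      # width of each field
    half = n // 2
    assert 0 < w <= half
    mask = (1 << w) - 1
    low = k & mask
    upper = (k >> half) & mask
    k &= ~(mask | (mask << half))  # clear both fields
    return k | upper | (low << half)


# n = 8, M = 2**3, B = 2**2: one address bit moves per pass.
print(swap_fields(0b00000001, n=8, m=3, b=2))   # 16
# When m - b = n/2 the swap is the full transpose rotation.
print(swap_fields(0b000111, n=6, m=5, b=2))     # 56
```

Applying the function twice returns the original address, so it describes a valid permutation of the N words.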
12
Lower Bound
• Need to “gather” words from distinct lines into each line. Show this cannot be done too fast.
• “Step”: one I/O operation done by one processor.
• $x^t_{i,j}$ is the number of words in line $i$ that have to go to line $j$ at end of step $t$ ($0 < i, j \le N/B$).
• $y^t_{i,j}$ is the number of words in cache $i$ that have to go to line $j$ at end of step $t$ ($0 < i \le P$, $0 < j \le N/B$).
• $\Phi^t = \sum_{i,j} x^t_{i,j} \lg x^t_{i,j} + \sum_{i,j} y^t_{i,j} \lg y^t_{i,j}$ (entropy-like function, with $0 \lg 0 = 0$)
• Initially, $x^0_{i,j} = 0$ or $x^0_{i,j} = 1$, $y^0_{i,j} = 0$, so $\Phi^0 = 0$
• Finally, $x^T_{i,i} = B$, $x^T_{i,j} = 0$ if $i \ne j$ and $y^T_{i,j} = 0$, so
$\Phi^T = (N/B) \cdot (B \lg B) = N \lg B$.
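The two boundary values of the potential can be checked on a small transpose. The sketch assumes a row-major matrix whose side is a multiple of B, lines of B consecutive words, and empty caches (all $y_{i,j} = 0$). The names `potential` and `x_counts` are hypothetical.

```python
from collections import Counter
from math import log2


def potential(counts):
    """Sum of c * lg(c) over the given counts, with 0 lg 0 taken as 0."""
    return sum(c * log2(c) for c in counts if c > 0)


def x_counts(side, b, done):
    """Count words per (current line, destination line) pair for a transpose."""
    counts = Counter()
    for r in range(side):
        for c in range(side):
            src = r * side + c          # where the word starts
            dst = c * side + r          # where the transpose puts it
            here = dst if done else src
            counts[(here // b, dst // b)] += 1
    return counts


side, b = 8, 4                                         # N = 64, B = 4
print(potential(x_counts(side, b, False).values()))    # 0.0
print(potential(x_counts(side, b, True).values()))     # 128.0 = N lg B
```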
13
I/O
• Easy to check that a write does not increase $\Phi$
• Read by processor $i$ from line $k$ at step $t + 1$: let $y_j = y^t_{i,j}$ and $x_j = x^t_{k,j}$. $\sum_j y_j = M - B$ and $\sum_j x_j = B$. The change in potential is
$\nabla\Phi = \sum_j \left( (y_j + x_j) \lg(y_j + x_j) - y_j \lg y_j - x_j \lg x_j \right)$
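The change in potential for a single read can be evaluated directly from this formula. A minimal sketch, with 0 lg 0 taken as 0; the names `xlgx` and `read_delta` are hypothetical, and the vectors below are made-up inputs for illustration only.

```python
from math import log2


def xlgx(v):
    """v * lg(v), with the convention 0 lg 0 = 0."""
    return v * log2(v) if v > 0 else 0.0


def read_delta(y, x):
    """Potential change when a line with counts x is merged into a cache with counts y."""
    return sum(xlgx(yj + xj) - xlgx(yj) - xlgx(xj) for yj, xj in zip(y, x))


print(read_delta([0, 0], [1, 1]))   # 0.0: nothing is gathered yet
print(read_delta([1, 0], [1, 0]))   # 2.0: two words for the same line meet
```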