Documente Academic
Documente Profesional
Documente Cultură
Router Microarchitecture
VC Allocator
Route
Computation
Switch Allocator
VC 1
VC 2
Input 1 Output 1
VC 3
VC 4
Input buffers
VC 1
VC 2
Input 5 VC 3 Output 5
VC 4
• Logical stages
– Fit into physical stages depending on frequency
Head
Body 1
Body 2
Tail
N
1
1b
A S
Eject N S E
2 W
• No buffered flits when A arrives
Winter 2011 ECE 1749H: Interconnection Networks (Enright Jerger) 14
Speculation
A succeeds in VA
Virtual Channel 2a but fails in SA,
Allocation retry SA
1a Lookahead Routing
Computation Switch Allocation 2b
3
Inject Port conflict
1 1c 1b detected
B N
1 1c 1b
A S
Eject N S E W
4
Winter 2011 3
ECE 1749H: Interconnection Networks (Enright Jerger) 15
Buffer Organization
Physical
channels Virtual
channels
VC 1 tail head
• Light traffic
– Many shallow VCs – underutilized
• Heavy traffic
– Few deep VCs – less efficient, packets blocked due to lack
of VCs
o0 o1 o2 o3 o4
Winter 2011 ECE 1749H: Interconnection Networks (Enright Jerger) 19
Switch Organization: Crosspoint
Inject w columns
N
w rows
S
Eject N S E W
• Area and power scale at O((pw)2)
– p: number of ports (function of topology)
– w: port width in bits (determines phit/flit size and
impacts packet energy and delay)
Winter 2011 ECE 1749H: Interconnection Networks (Enright Jerger) 20
Crossbar speedup
10:5 5:10 10:10
crossbar crossbar crossbar
Inject
E-in E-out
W-in W-out
N-in N-out
S-in S-out
Eject
• Exhibits fairness
Grant 2
01 02
Request 1
Grant 1
10 12
Request 2
Grant 2
20 21
0
01 0
02
Request 1
Grant 1
1
10 0
01 A2 A1
Request 2 B1
Grant 2
20
1 1
01 C2 C 1
0
01 1
02
Request 1
Grant 1
1
10 1
01 A2 A1
Request 2 B1
Grant 2
20
0 0
01 C2
1
01 1
02
Request 1
Grant 1
0
10 0
01 A2 A1
Request 2
Grant 2
20
0 1
01 C2
0
01 0
02
Request 1
Grant 1
1
10 0
01 A2
Request 2
Grant 2
20
1 1
01 C2
0
01 1
02
Request 1
Grant 1
1
10 1
01 A2
Request 2
Grant 2
20
0 0
01
• Cells that cannot use tokens pass row tokens to right and
column tokens down
Remaining tokens
B requesting pass down and
p1 1 1 1 1 right
Resources 0, 1
0 1 2 3
C requesting
Resource 0 p2 2 2 2 2
0 1 2 3 [3,2] receives 2
tokens and is
D requesting granted
Resources 0, p3
2 3 3 3 3
0 1 2 3
Winter 2011 ECE 1749H: Interconnection Networks (Enright Jerger) 35
Wavefront Allocator Example
p0
0 0 0 0
0 1 2 3
p1 [1,1] receives 2
1 1 1 1
tokens and granted
0 1 2 3
p2 All wavefronts
2 2 2 2 propagated
0 1 2 3
p3
3 3 3 3
0 1 2 3
Winter 2011 ECE 1749H: Interconnection Networks (Enright Jerger) 36
Separable Allocator
• Need for pipelineable allocators
• A 3:4 allocator
• First stage: 3:1 – ensures only one grant for each input
• Second stage: 4:1 – only one grant asserted for each output
Winter 2011 ECE 1749H: Interconnection Networks (Enright Jerger) 38
Separable Allocator Example
Requestor 1
A wins A
B 3:1
arbiter 4:1
C
arbiter
A
3:1
B arbiter
4:1
A arbiter
3:1
arbiter Requestor 4
4:1 wins C
A arbiter
3:1
C arbiter
• 4 requestors, 3 resources
• Arbitrate locally among requests
– Local winners passed to second stage
Winter 2011 ECE 1749H: Interconnection Networks (Enright Jerger) 39
Virtual Channel Allocator Organization
• Depends on routing function
– If routing function returns single VC
• VCA need to arbitrate between input VCs contending
for same output VC
– Returns multiple candidate VCs (for same physical
channel)
• Needs to arbitrate among v first stage requests before
forwarding winning request to second stage)
pov:1 piv:1
arbiter arbiter
piv po v
• Adaptive routing
– Returns multiple candidate output ports
• Switch allocator can bid for all ports
• Granted port must match VC granted
– Return single output port
• Reroute if packet fails VC allocation
• Second stage:
– Po pi:1 arbiters
• Winners of v:1 arbiters select output port request of
winning VC
• Forward output port request to pi:1 arbiters
SA + BFC + VA
(0.016 mm2)
(0.041 mm2
Req Grant + misc control lines (0.043 mm2)
16 Bytes 5x5
P2 Crossbar P3
(0.763 mm2)
P4
West Output
West Output Module
Switch
M6
East Output
WR
Latency Throughput
Zero load latency
(topology+routing+flo given by
w control) routing
Throughput
Min latency
given by
given by routing
topology
algorithm
Min latency
given by
Offered Traffic (bits/sec)
topology
D L
Tactual H Trouter Tc
v b
50
Throughput gap
Latency (cycles)
40
30
20
Latency gap
10
0
0.1 0.3 0.5 0.7 0.9
Injected load (fraction of capacity)
L
E ideal D Pwire
b
– D = Distance
– Pwire = transmission power per unit length
Winter 2011 ECE 1749H: Interconnection Networks (Enright Jerger) 56
Energy Gap
6
5
Network energy (mJ)
0
0 0.2 0.4 0.6 0.8
Injected load (fraction of capacity)
baseline ideal