Outline
Introduction to Network-on-Chip
New challenges
Scenario
Cache implications
Routing algorithms
Types
Limitations
Router microarchitecture
Flit based
Optimization dimensions
Principles and Practices of Interconnection Networks, William J. Dally and Brian Towles
Other people
Li-Shiuan Peh, MIT
Network scenario
Why networks?
On-chip vs off-chip
Significant research in multi-chassis interconnection networks (off-chip)
Internet routers
Pin-limited bandwidth
New research area to meet performance, area, thermal, power and reliability needs (on-chip)
Some examples
BLUEGENE/L
- Huge power consumption
- One million Watts
- Complicated network structure
IP Routers
- Constrained by costs + regulatory limits
- ~200W line card
- ~60W interconnection network
Alpha 21364
- Packaging and cooling costs
- Dell's law <= $25
- Router+link ~25W
MIT Raw CMP
- Complicated communication networks
- On-chip network consumes about 36% of total chip power
On-chip Networks
[Figure: 4x4 grid of processing elements (PE)]
Topology
Routing
Properties
Deadlock avoidance
Router microarchitecture
Baseline model
Optimizations
Metrics
Power
Performance
General-purpose multi-cores
- Shared memory
- Distributed memory (or message passing)
Here we are
Message Passing
Shared Memory
Logically
Practically...
Intel SCC
Requires:
Data requests
Data responses
Coherence permissions
Rough goal:
Two solutions:
Broadcast-based protocol:
All processors see all requests at the same time, same order.
Broadcast-based coherence
Dirty replacements
Some parameters:
Some results:
Bus-based interconnect:
Directory protocol
Broadcast protocol
Private caches
Multiple L2 copies
Private caches
Shared caches
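[Figure: private L2 cache hit. The core issues LD A, A misses in the L1 I/D cache (step 2) and hits in the private L2 cache (step 3). Blocks shown: core, L1 I/D cache, private L2 cache (tags, data, controller logic), router, memory controller.]
Source: Chita Das, ACACES Summer School, 2011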
[Figure: private L2 cache miss. The core issues LD A, A misses in the L1 I/D cache (step 2) and in the private L2 cache (step 3), the request is sent off-chip through the memory controller, and the data is received and sent to the L2 (step 6).]
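[Figure: shared L2 cache hit (on-chip). The core issues LD A (step 1), A misses in the L1 I/D cache, the node holding the line receives the message and sends it to its shared L2 cache (step 4), the L2 hits, and the data is sent to the requestor (step 6). Blocks shown per node: core, L1 I/D cache, shared L2 cache (tags, data, controller logic), router; plus the memory controller.]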
Network-on-Chip details
Topology nomenclature 1
Direct
Indirect
Source: Natalie Jerger, ACACES Summer School, 2012
Higher degree requires more links and port counts at each router
2,3,4
Max=4
Avg=2.2
Max=4
Avg=1.77
Max=2
Avg=1.33
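The three Max/Avg pairs above are consistent with nine-node networks when the average is taken over all ordered source-destination pairs, self-pairs included. The sketch below assumes a 9-node ring, a 3x3 mesh and a 3x3 torus (an assumption: the extracted slide text does not name the topologies) and recomputes both metrics with a breadth-first search. The helper names `ring`, `grid` and `hop_stats` are illustrative.

```python
from collections import deque
from itertools import product


def ring(n):
    """Adjacency list of an n-node bidirectional ring."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def grid(k, wrap):
    """Adjacency list of a k x k mesh (wrap=False) or torus (wrap=True)."""
    adj = {}
    for x, y in product(range(k), repeat=2):
        nbrs = set()
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if wrap:
                nbrs.add((nx % k, ny % k))
            elif 0 <= nx < k and 0 <= ny < k:
                nbrs.add((nx, ny))
        adj[(x, y)] = sorted(nbrs)
    return adj


def hop_stats(adj):
    """Return (maximum hop count, average hop count over all ordered pairs)."""
    total, worst = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        worst = max(worst, max(dist.values()))
    return worst, total / len(adj) ** 2


for name, adj in (("ring", ring(9)),
                  ("mesh", grid(3, wrap=False)),
                  ("torus", grid(3, wrap=True))):
    worst, avg = hop_stats(adj)
    print(f"{name}: max={worst} avg={avg:.2f}")
```

Under these assumptions the averages are 20/9, 16/9 and 4/3, which match the slide values once 16/9 is truncated to 1.77.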
Abstract metrics are just proxies: they do not always correlate with the real metric they represent
Example:
Network A with 2 hops, 5 stage pipeline, 4 cycle link traversal vs.
Network B with 3 hops, 1 stage pipeline, 1 cycle link traversal
Hop Count says A is better than B
But A has 18 cycle latency vs. 6 cycle latency for B
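The 18-cycle and 6-cycle figures follow if every hop costs one pass through the router pipeline plus one link traversal. A minimal sketch under that assumption (the function name is illustrative, and serialization and contention delays are ignored):

```python
def zero_load_latency(hops, pipeline_stages, link_cycles):
    """Cycles to cross `hops` hops, each costing a router pipeline pass plus a link traversal."""
    return hops * (pipeline_stages + link_cycles)


network_a = zero_load_latency(hops=2, pipeline_stages=5, link_cycles=4)
network_b = zero_load_latency(hops=3, pipeline_stages=1, link_cycles=1)
print(network_a, network_b)  # 18 6
```

Hop count alone ranks A ahead of B, while the per-hop cost (9 cycles against 2) reverses the ranking.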
Traffic patterns
Arbitration
Problems
Starvation
Switching
Switching techniques
Deadlock
Packets
Routing algorithm
Flow control
Router/switch
Throughput
Low power
Limited resources
High performance
High reliability
Thermal issues
On-chip network criticalities
Routing overview
Types
Routing path
Minimal: all packets use the shortest path from source to destination
Non-minimal: packets may be routed along a longer path depending, for example, on network state
Number of destinations
Leads to deadlock
Deterministic routing
Aka XY routing
Cons:
Pros:
Deadlock-free (why???)
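XY routing can be sketched in a few lines: a packet first travels along X until its column matches the destination, then along Y. The function below is an illustrative sketch on a 2D mesh, assuming coordinates are (x, y) tuples with E as +x and N as +y; it is not taken from the slides. Because a packet never turns from a Y heading back to an X heading, the turns needed to close a cycle of waiting packets are never taken, which is one way to answer the "why???" above.

```python
def xy_route(src, dst):
    """Return the hops from src to dst as (heading, node reached) pairs."""
    x, y = src
    path = []
    # Phase 1: correct the X coordinate.
    while x != dst[0]:
        step = 1 if dst[0] > x else -1
        x += step
        path.append(("E" if step == 1 else "W", (x, y)))
    # Phase 2: correct the Y coordinate. No X move can follow a Y move.
    while y != dst[1]:
        step = 1 if dst[1] > y else -1
        y += step
        path.append(("N" if step == 1 else "S", (x, y)))
    return path


print(xy_route((0, 0), (2, 1)))
# [('E', (1, 0)), ('E', (2, 0)), ('N', (2, 1))]
```

The route depends only on source and destination, so every packet between the same pair of nodes follows the same path.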
Adaptive routing
Fully adaptive
XY routing prohibits four turns: N to E, N to W, S to E, S to W
No adaptivity
Is it possible to do better?
Turn model
Basic steps
Identify the cycles formed by combining turns, i.e. the simplest cycles
Example on a 2D-mesh
2 simple cycles
Not every choice of turns to prohibit removes the cycles and preserves the deadlock-free property
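One way to check a candidate set of prohibited turns is to build the channel dependency graph of a small mesh and test it for cycles. The sketch below does this for a 4x4 mesh; the mesh size, the function names and the (current heading, next heading) encoding of a turn are assumptions made for illustration. A channel is written (x, y, heading), meaning the link leaving node (x, y) in that direction, and 180-degree turns are excluded.

```python
from itertools import product

DIRS = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}
OPPOSITE = {"N": "S", "S": "N", "E": "W", "W": "E"}
# The two simple cycles on a 2D mesh, as (current heading, next heading) turns.
CLOCKWISE = [("E", "S"), ("S", "W"), ("W", "N"), ("N", "E")]
COUNTERCLOCKWISE = [("E", "N"), ("N", "W"), ("W", "S"), ("S", "E")]


def channel_dependencies(k, prohibited):
    """Map each channel of a k x k mesh to the channels a packet on it may request next."""
    def inside(x, y):
        return 0 <= x < k and 0 <= y < k

    graph = {}
    for x, y, d in product(range(k), range(k), DIRS):
        nx, ny = x + DIRS[d][0], y + DIRS[d][1]
        if not inside(nx, ny):
            continue  # no link leaves the mesh here
        graph[(x, y, d)] = [
            (nx, ny, d2)
            for d2, (dx, dy) in DIRS.items()
            if d2 != OPPOSITE[d]            # no 180-degree turns
            and (d, d2) not in prohibited   # going straight is always allowed
            and inside(nx + dx, ny + dy)
        ]
    return graph


def has_cycle(graph):
    """Iterative three-colour depth-first search for a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = dict.fromkeys(graph, WHITE)
    for root in graph:
        if color[root] != WHITE:
            continue
        color[root] = GREY
        stack = [(root, iter(graph[root]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if color[nxt] == GREY:
                    return True  # back edge: a cycle of waiting channels
                if color[nxt] == WHITE:
                    color[nxt] = GREY
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                color[node] = BLACK
                stack.pop()
    return False


def deadlock_free(prohibited, k=4):
    return not has_cycle(channel_dependencies(k, prohibited))


XY = {("N", "E"), ("N", "W"), ("S", "E"), ("S", "W")}
WEST_FIRST = {("N", "W"), ("S", "W")}
valid_pairs = [(a, b) for a in CLOCKWISE for b in COUNTERCLOCKWISE
               if deadlock_free({a, b})]
print(deadlock_free(XY), deadlock_free(WEST_FIRST), len(valid_pairs))  # True True 12
```

Of the 16 ways to prohibit one turn from each simple cycle, the four that fail are those where the two prohibited turns are reverses of each other, for example E to S together with S to E: the three remaining turns of each cycle can then combine into a figure-eight dependency cycle.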
An example
North: top-right
West: top-left
South: bottom-left
East: bottom-right
circuit switching
Packet-based: allocation made to whole packets
Wormhole
Virtual channels
Pipelined in 4 stages
BW (buffer write), RC (route computation), SA (switch allocation), ST (switch traversal), LT (link traversal)
Buffer states:
G: idle, routing, active, waiting
Router components
Input buffers, route computation logic, virtual channel allocator, switch allocator, crossbar switch
Body and tail flits inherit this info from head flit
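The inheritance rule can be illustrated with a toy input virtual channel. This is a sketch, assuming "this info" means the output port chosen by route computation for the head flit; the `Flit` and `InputVC` classes and the modulo route function are hypothetical and only serve the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Flit:
    kind: str                   # "head", "body" or "tail"
    dest: Optional[int] = None  # only the head flit carries the destination


class InputVC:
    """Per-virtual-channel state: the head flit sets it, later flits reuse it."""

    def __init__(self, route_fn):
        self.route_fn = route_fn
        self.state = "idle"
        self.out_port = None

    def accept(self, flit):
        """Return the output port this flit must use."""
        if flit.kind == "head":
            self.state = "routing"
            self.out_port = self.route_fn(flit.dest)  # route computation, once per packet
            self.state = "active"
        elif self.state != "active":
            raise RuntimeError("body/tail flit arrived without a head flit")
        port = self.out_port
        if flit.kind == "tail":
            self.state, self.out_port = "idle", None  # release the channel
        return port


vc = InputVC(route_fn=lambda dest: dest % 4)  # hypothetical routing function
packet = [Flit("head", dest=6), Flit("body"), Flit("tail")]
ports = [vc.accept(f) for f in packet]
print(ports, vc.state)  # [2, 2, 2] idle
```

Only the head flit pays for route computation; the body and tail flits read the stored port, and the tail flit frees the channel for the next packet.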
Router performance
Overlap with BW
Li-Shiuan Peh and William J. Dally. 2001. A Delay Model and Speculative Architecture for Pipelined Routers
Flow control
Bufferless, buffered
No buffers
Two modes
William Dally and Brian Towles. 2003. Principles and Practices of Interconnection Networks. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
Buffers
Two modes
How to manage buffers between neighbors (i.e. how can I know whether the downstream router's buffer is full?)
Three ways:
Credit based
On/off
Ack/nack
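Credit-based flow control, the first scheme listed, can be sketched as a counter kept by the upstream router: it starts at the downstream buffer depth, drops by one for every flit sent, and rises by one whenever the downstream router frees a slot. The `CreditLink` class below is an illustrative model, not code from the cited book, and it assumes that flits and credits cross the link with zero delay.

```python
from collections import deque


class CreditLink:
    """Upstream sender and downstream input buffer joined by a credit counter."""

    def __init__(self, buffer_depth):
        self.credits = buffer_depth  # free downstream slots as seen by the sender
        self.buffer = deque()        # downstream input buffer

    def send(self, flit):
        """Upstream side: transmit only when a credit is available."""
        if self.credits == 0:
            return False             # downstream buffer is full, so stall
        self.credits -= 1
        self.buffer.append(flit)
        return True

    def drain(self):
        """Downstream side: forward one flit and return a credit upstream."""
        if not self.buffer:
            return None
        flit = self.buffer.popleft()
        self.credits += 1
        return flit


link = CreditLink(buffer_depth=2)
results = [link.send("f0"), link.send("f1"), link.send("f2")]
forwarded = link.drain()
retry = link.send("f2")
print(results, forwarded, retry)  # [True, True, False] f0 True
```

The sender never needs to inspect the downstream buffer: a credit count of zero is enough to know it is full, so the buffer cannot overflow.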
Network power breakdown
Bibliography 2
Dally, W. J., and B. Towles [2004]. Principles and Practices of Interconnection Networks, Morgan Kaufmann Publishers, San Francisco.
C. A. Nicopoulos, N. Vijaykrishnan, and C. R. Das, Network-on-Chip Architectures: A Holistic Design Exploration, Lecture Notes in Electrical Engineering Book Series, Springer, October 2009.
G. De Micheli, L. Benini, Networks on Chips: Technology and Tools, Morgan Kaufmann, 2006.
J. Duato, S. Yalamanchili, and L. Ni, Interconnection Networks: An Engineering Approach, Morgan Kaufmann, 2002.
R. Marculescu, U. Y. Ogras, L.-S. Peh, N. E. Jerger, Y. Hoskote, "Outstanding Research Problems in NoC Design: System, Microarchitecture, and Circuit Perspectives", IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 28, pp. 3-21, Jan. 2009.
T. Bjerregaard and S. Mahadevan, "A survey of research and practices of network-on-chip", ACM Comput. Surv., vol. 38, no. 1, pp. 1-51, Mar. 2006.
Natalie Enright-Jerger and Li-Shiuan Peh, "On-Chip Networks", Synthesis Lecture, Morgan & Claypool Publishers, Aug. 2009.
Agarwal, A. [1991]. Limits on interconnection network performance, IEEE Trans. on Parallel and Distributed Systems 2:4 (April), 398-412.
Dally, W. J., and B. Towles [2001]. Route packets, not wires: On-chip interconnection networks, Proc. of the Design Automation Conference, Las Vegas (June).
Ho, R., K. W. Mai, and M. A. Horowitz [2001]. The future of wires, Proc. of the IEEE 89:4 (April).
Hangsheng Wang, Xinping Zhu, Li-Shiuan Peh and Sharad Malik, "Orion: A Power-Performance Simulator for Interconnection Networks", in Proceedings of MICRO 35, Istanbul, November 2002.
D. Brooks, R. Dick, R. Joseph, and L. Shang, "Power, thermal, and reliability modeling in nanometer-scale microprocessors", IEEE Micro, 2007.
Thank you
Any questions?