A scaling primer
[Figure: wire geometry with length l, width w, and spacing S]
Interconnect role
• Short (local) interconnect
– Used to connect nearby cells
– Minimize wire C, i.e., use short min-width wires
• Medium to long-distance (global) interconnect
– Size wires to tradeoff area vs. delay
– Increasing width: capacitance increases, resistance decreases; an acceptable tradeoff must be found (the wire-sizing problem)
• “Fat” wires
– Thicker cross-sections in higher metal layers
– Useful for reducing delays for global wires
– Inductance issues, sharing of limited resource
Cross-Section of a Chip
Block scaling
• Local interconnects (length scales to ls, where s < 1 is the scaling factor):
$\tau_{int} = (r/s^2)(c)(ls)^2 = rcl^2$
• Global interconnects (length stays l):
$\tau_{int} = (r/s^2)(c)(l)^2 = rcl^2/s^2$
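A small sketch of the two scaling rules, assuming (as the formulas above do) that resistance per unit length becomes r/s², capacitance per unit length is unchanged, local wire length shrinks to ls, and global wire length stays l. The function names are illustrative, and constant factors such as 0.69 are dropped.

```python
def interconnect_delay(r: float, c: float, length: float) -> float:
    """Distributed-RC delay proxy r * c * length^2 (constant factors omitted)."""
    return r * c * length ** 2


def scaled_delays(r: float, c: float, l: float, s: float) -> tuple[float, float]:
    """Return (local, global) delays after scaling dimensions by s < 1.

    Resistance per unit length becomes r / s**2 (smaller cross-section);
    capacitance per unit length is assumed unchanged. Local wires shrink
    to l * s, global wires keep their length l.
    """
    r_scaled = r / s ** 2
    local = interconnect_delay(r_scaled, c, l * s)
    global_ = interconnect_delay(r_scaled, c, l)
    return local, global_


if __name__ == "__main__":
    base = interconnect_delay(1.0, 1.0, 1.0)
    local, global_ = scaled_delays(1.0, 1.0, 1.0, 0.7)
    # Ratios relative to the unscaled wire.
    print(local / base, global_ / base)
```

The local ratio stays at 1 while the global ratio grows as 1/s², which is why global wires become the delay bottleneck.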
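• For a step of amplitude v0, an RC circuit charges as $v(t) = v_0(1 - e^{-t/RC})$
– $v(t) = 0.5v_0 \Rightarrow t = 0.69RC$, i.e., delay = 0.69RC (50% delay)
– $v(t) = 0.1v_0 \Rightarrow t = 0.1RC$
– $v(t) = 0.9v_0 \Rightarrow t = 2.3RC$
– i.e., rise time = 2.2RC (if defined as the time from 10% to 90% of Vdd)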
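Each threshold time follows from solving $v(t) = f v_0$ for t, which gives $t = -RC\ln(1 - f)$. A short sketch (the function names are hypothetical):

```python
import math


def crossing_time(fraction: float, rc: float) -> float:
    """Time for v(t) = v0 * (1 - exp(-t / RC)) to reach fraction * v0."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    return -rc * math.log(1.0 - fraction)


def rise_time(rc: float) -> float:
    """10%-to-90% rise time of the RC step response."""
    return crossing_time(0.9, rc) - crossing_time(0.1, rc)


if __name__ == "__main__":
    rc = 1.0
    for f in (0.1, 0.5, 0.9):
        print(f"{f:.0%} point: {crossing_time(f, rc):.2f} RC")
    print(f"rise time: {rise_time(rc):.2f} RC")
```

The exact values are ln 2 ≈ 0.693, ln(10/9) ≈ 0.105, ln 10 ≈ 2.303, and ln 9 ≈ 2.197, matching the rounded constants above.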
Delay
Elmore Delay
• Driver is modeled as R
• Driver intrinsic gate delay t(B)
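• Elmore delay at a node = $\sum_{R_i} \sum_{C_j \text{ downstream of } R_i} R_i C_j$, where the outer sum runs over the resistances $R_i$ on the path from the driver to that node
• Elmore delay at n2 = R(B)(C1 + C2) + R(w)C2
• Elmore delay at n1 = R(B)(C1 + C2)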
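[Figure: driver B with resistance R(B) drives node n1 (capacitance C1); wire resistance R(w) connects n1 to node n2 (capacitance C2)]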
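A minimal Elmore-delay sketch for an RC tree, assuming each node stores the resistance of the edge from its parent and its own capacitance. The driver resistance R(B) is modeled as the edge into n1, and the driver intrinsic delay t(B) is left out. Downstream capacitance is recomputed at every node for clarity, so this is not the linear-time version.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """RC-tree node: edge resistance from its parent plus its own capacitance."""
    name: str
    r: float  # resistance between the parent and this node
    c: float  # capacitance at this node
    children: list["Node"] = field(default_factory=list)


def downstream_cap(node: Node) -> float:
    """Total capacitance at and below node."""
    return node.c + sum(downstream_cap(ch) for ch in node.children)


def elmore_delays(root: Node) -> dict[str, float]:
    """Elmore delay from the source to every node.

    The delay at a node is its parent's delay plus the edge resistance
    times all capacitance downstream of that edge.
    """
    delays: dict[str, float] = {}

    def visit(node: Node, parent_delay: float) -> None:
        d = parent_delay + node.r * downstream_cap(node)
        delays[node.name] = d
        for ch in node.children:
            visit(ch, d)

    visit(root, 0.0)
    return delays


if __name__ == "__main__":
    # R(B) = 2, C1 = 1, R(w) = 3, C2 = 4 (illustrative values)
    tree = Node("n1", 2.0, 1.0, [Node("n2", 3.0, 4.0)])
    print(elmore_delays(tree))
```

With these values, n1 gets R(B)(C1 + C2) = 10 and n2 gets 10 + R(w)C2 = 22, matching the two formulas above.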
Elmore Delay
• For a uniform wire
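[Figure: uniform wire of length x from u to v driving load C, without and with a buffer b at the midpoint; each half is modeled as resistance rx/2 with capacitance cx/4 at each end]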
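$t_{unbuf} = R(cx + C) + rx(cx/2 + C)$
$t_{buf} = 2R(cx/2 + C) + rx(cx/4 + C) + t_b$
$\Delta t = t_{unbuf} - t_{buf} = rcx^2/4 - RC - t_b$, so the midpoint buffer helps when $rcx^2/4 > RC + t_b$ (the buffer is assumed to have the same output resistance R and input capacitance C as the driver and load)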
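The two delays and the break-even length can be checked numerically. This sketch uses the same assumptions as the formulas (a midpoint buffer with output resistance R and input capacitance C); `break_even_length` is a hypothetical helper that solves Δt = 0.

```python
import math


def t_unbuf(R: float, C: float, r: float, c: float, x: float) -> float:
    """Elmore delay of a driver R driving a wire of length x into load C."""
    return R * (c * x + C) + r * x * (c * x / 2 + C)


def t_buf(R: float, C: float, r: float, c: float, x: float, tb: float) -> float:
    """Same path with one buffer (resistance R, input cap C) at x/2."""
    return 2 * R * (c * x / 2 + C) + r * x * (c * x / 4 + C) + tb


def break_even_length(R: float, C: float, r: float, c: float, tb: float) -> float:
    """Length where delta_t = r*c*x^2/4 - R*C - tb becomes zero."""
    return math.sqrt(4 * (R * C + tb) / (r * c))


if __name__ == "__main__":
    R = C = r = c = tb = 1.0
    for x in (2.0, break_even_length(R, C, r, c, tb), 4.0):
        print(x, t_unbuf(R, C, r, c, x), t_buf(R, C, r, c, x, tb))
```

Below the break-even length the buffer only adds delay; above it the quadratic wire term dominates and buffering wins.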
[Figure: combinational logic between registers, from primary inputs to primary outputs, driven by a common clock; a long wire divided into segments l1, l2, l3, ..., ln]
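$T_{opt} = L\left(\sqrt{2 R_d C_g r c} + r C_g + R_d c\right)$, where L is the line length, $R_d$ and $C_g$ are the buffer output resistance and input (gate) capacitance, and r, c are the wire resistance and capacitance per unit length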
Delay grows linearly with L (instead of quadratically)
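A sketch under the assumptions that make this closed form work: the line is cut into k equal segments, each driven by a buffer with output resistance Rd and input capacitance Cg; buffer intrinsic delay is ignored; and k may take non-integer values. Minimizing the Elmore delay $k R_d C_g + R_d c L + r C_g L + rcL^2/(2k)$ over k gives $k_{opt} = L\sqrt{rc/(2R_dC_g)}$, and substituting back yields $T_{opt}$.

```python
import math


def line_delay(k: float, L: float, Rd: float, Cg: float, r: float, c: float) -> float:
    """Elmore delay of k equal buffered segments (intrinsic buffer delay ignored).

    Each stage: Rd * (c*L/k + Cg) + (r*L/k) * (c*L/(2k) + Cg).
    """
    seg = L / k
    stage = Rd * (c * seg + Cg) + r * seg * (c * seg / 2 + Cg)
    return k * stage


def optimal_segments(L: float, Rd: float, Cg: float, r: float, c: float) -> float:
    """Continuous optimum k = L * sqrt(r*c / (2*Rd*Cg))."""
    return L * math.sqrt(r * c / (2 * Rd * Cg))


def t_opt(L: float, Rd: float, Cg: float, r: float, c: float) -> float:
    """Closed form T_opt = L * (sqrt(2*Rd*Cg*r*c) + r*Cg + Rd*c)."""
    return L * (math.sqrt(2 * Rd * Cg * r * c) + r * Cg + Rd * c)


if __name__ == "__main__":
    for L in (10.0, 20.0, 40.0):
        k = optimal_segments(L, 1.0, 1.0, 1.0, 1.0)
        print(L, round(k, 2), line_delay(k, L, 1.0, 1.0, 1.0, 1.0))
```

Because $k_{opt}$ grows in proportion to L, each segment has a fixed length and the total delay is linear in L.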
[Figure: total buffer count, including clock buffers (clk-buf), for the 90nm, 65nm, 45nm, and 32nm nodes]
• An ever-increasing fraction of the total cell count will be buffers
– 70% at the 32nm node
ITRS projections
• RAT = Required Arrival Time; Slack = RAT - Delay
• Without buffering:
– Sink 1: RAT = 300, Delay = 350, Slack = -50
– Sink 2: RAT = 700, Delay = 600, Slack = 100
– slack_min = -50
• With a buffer that decouples the capacitive load from the critical path:
– Sink 1: RAT = 300, Delay = 250, Slack = 50
– Sink 2: RAT = 700, Delay = 400, Slack = 300
– slack_min = 50
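A tiny check of the slack arithmetic in this example; the sink names and the dictionary layout are hypothetical.

```python
def slacks(sinks: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Slack = RAT - Delay for each sink; sinks maps name -> (RAT, delay)."""
    return {name: rat - delay for name, (rat, delay) in sinks.items()}


def min_slack(sinks: dict[str, tuple[float, float]]) -> float:
    """Worst (smallest) slack over all sinks."""
    return min(slacks(sinks).values())


if __name__ == "__main__":
    unbuffered = {"sink1": (300, 350), "sink2": (700, 600)}
    buffered = {"sink1": (300, 250), "sink2": (700, 400)}
    print(min_slack(unbuffered), min_slack(buffered))  # -50 50
```

The buffer improves the worst slack from -50 to 50 even though it is the critical sink that benefits most.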
Timing Driven Buffering
Problem Formulation
• Given
– A Steiner tree
– RAT at each sink
– A buffer type
– RC parameters
– Candidate buffer locations
• Find a buffer insertion solution that maximizes the slack at the driver
Candidate Buffering Solutions
Candidate Solution Characteristics
• Each candidate solution is a triple (vi, ci, qi):
– vi: a node
– ci: downstream capacitance
– qi: RAT
Van Ginneken’s Algorithm
Dynamic Programming
Solution Propagation: Add Wire
• $c_2 = c_1 + cx$
• $q_2 = q_1 - rcx^2/2 - rxc_1$
• r: wire resistance per unit length
• c: wire capacitance per unit length
• x: wire length
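A sketch of the add-wire step, representing a candidate as a (c, q) pair at the current node, with the node v left implicit. The `Candidate` type is an illustrative choice, not something defined in the slides.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    c: float  # downstream capacitance
    q: float  # required arrival time (RAT)


def add_wire(cand: Candidate, x: float, r: float, c: float) -> Candidate:
    """Propagate a candidate upstream through a wire of length x.

    The wire adds c*x of load, and q drops by the wire's Elmore delay:
    r*c*x^2/2 for its own distributed capacitance plus r*x*cand.c for
    the downstream load.
    """
    return Candidate(
        c=cand.c + c * x,
        q=cand.q - r * c * x ** 2 / 2 - r * x * cand.c,
    )
```

With r = c = 1, pushing the sink candidate (1, 20) through a wire of length 2 gives (3, 16).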
Solution Propagation: Insert Buffer
• $c_{1b} = C_b$
• $q_{1b} = q_1 - R_b c_1 - t_b$
• Cb: buffer input capacitance
• Rb: buffer output resistance
• tb: buffer intrinsic delay
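The insert-buffer step in the same (c, q) representation; again, `Candidate` is a hypothetical helper type. The buffer isolates the downstream load, so the new candidate presents only Cb, while q pays for the buffer driving the old load.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    c: float  # downstream capacitance seen at the node
    q: float  # required arrival time (RAT) at the node


def insert_buffer(cand: Candidate, Rb: float, Cb: float, tb: float) -> Candidate:
    """Drive the candidate's load with a buffer placed at the same node."""
    return Candidate(c=Cb, q=cand.q - Rb * cand.c - tb)
```

For example, (3, 16) buffered with Rb = Cb = tb = 1 becomes (1, 12).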
Solution Propagation: Merge
• $c_{merge} = c_l + c_r$
• $q_{merge} = \min(q_l, q_r)$
Solution Propagation: Add Driver
• $q_{1d} = q_1 - R_d c_1$: slack at the driver
• Rd: driver output resistance
• Example: r = 1, c = 1; two wire segments of length 2 each
• Rb = 1, Cb = 1, tb = 1
• Rd = 1
• Sink candidate: (v1, 1, 20)
• Add wire: (v2, 3, 16)
• Insert buffer at v2: (v2, 1, 12)
• Add wire to each: (v3, 5, 8) and (v3, 3, 8)
[Figure: left candidates and right candidates combined into merged candidates]
Solution Pruning
• Two candidate solutions
– (v, c1, q1)
– (v, c2, q2)
• Solution 1 is inferior if
– c1 ≥ c2 : larger (or equal) load
– and q1 ≤ q2 : tighter (or equal) timing
– with at least one of the two inequalities strict
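The dominance test as a predicate, in the same hypothetical (c, q) representation. Two identical candidates are not reported as inferior to each other; a pruning pass simply keeps one of them.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    c: float  # downstream capacitance
    q: float  # required arrival time (RAT)


def is_inferior(a: Candidate, b: Candidate) -> bool:
    """True if a is dominated by b: at least as much load, no better RAT."""
    return a.c >= b.c and a.q <= b.q and a != b
```

In the running example, (v3, 5, 8) is inferior to (v3, 3, 8): it has more load and the same RAT.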
Pruning When Insert Buffer
Candidate Example Continued
Candidate Example Continued
After pruning
Merging Branches
[Figure: left candidates and right candidates at a branch node]
Pruning Merged Branches
[Figure: merged branch candidates with pruning, with the critical branch marked]
Van Ginneken Example
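Candidates are (C, q) pairs: downstream capacitance and RAT; d is the delay added by each buffer or wire.
• Start at the sink: (20, 400)
• Add wire (C = 10, d = 150): (30, 250); insert buffer (C = 5, d = 30): (5, 220)
• Add the next wire (C = 15): (30, 250) → (45, 50) with d = 200; (5, 220) → (20, 100) with d = 120
• Insert buffer (C = 5): (45, 50) → (5, 0) with d = 50; (20, 100) → (5, 70) with d = 30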
Van Ginneken Example Cont’d
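• Current candidates: (45, 50), (20, 100), (5, 0), (5, 70)
• (45, 50) is inferior to (20, 100), and (5, 0) is inferior to (5, 70), so both are pruned
• Add wire (C = 10) to the remaining candidates: (20, 100) → (30, 10); (5, 70) → (15, -10)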
Basic Data Structure
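• Candidates are kept in a list sorted by increasing capacitance (c1 < c2 < c3 < c4)
• Scan the list starting from the smallest load:
– q1 < q2 ? N: prune 2, then test q1 < q3 ?; Y: keep 2, then test q2 < q3 ?
– Continue the same way: each comparison either prunes the next candidate or advances to it
• At most n - 1 comparisons for n candidates, so pruning takes linear time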
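A sketch of the single-scan pruning, in the same hypothetical (c, q) representation. It sorts first for safety, which costs O(n log n); when the list is already maintained in sorted order, as the slide assumes, the scan alone is linear.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    c: float  # downstream capacitance
    q: float  # required arrival time (RAT)


def prune(cands: list[Candidate]) -> list[Candidate]:
    """Remove inferior candidates in one pass.

    Candidates are ordered by increasing c (ties: larger q first). A
    candidate is kept only if its q is strictly larger than the q of the
    last kept candidate; otherwise it has at least as much load and no
    better RAT, so it is pruned.
    """
    ordered = sorted(cands, key=lambda s: (s.c, -s.q))
    kept: list[Candidate] = []
    for cand in ordered:
        if not kept or cand.q > kept[-1].q:
            kept.append(cand)
    return kept
```

Applied to the Van Ginneken example candidates (45, 50), (20, 100), (5, 0), (5, 70), this keeps (5, 70) and (20, 100). The surviving list has c and q both strictly increasing.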
Pruning In Merging
• Left candidates (sorted by increasing c and q): (cl1, ql1), (cl2, ql2), (cl3, ql3)
• Right candidates: (cr1, qr1), (cr2, qr2)
• Assume ql1 < ql2 < qr1 < ql3 < qr2
• Pair the current left and right candidates, then advance the side with the smaller q; the merged candidates are:
– (cl1+cr1, ql1)
– (cl2+cr1, ql2)
– (cl3+cr1, qr1)
– (cl3+cr2, ql3)
• Each step advances at least one side, so at most nl + nr - 1 merged candidates are produced in linear time
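A sketch of the linear-time branch merge, assuming both input lists are already pruned and sorted by increasing c (and therefore increasing q). After pairing the current candidates, the side with the smaller q can be dropped: pairing it with a larger-load candidate on the other side would keep the same q with more capacitance.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    c: float  # downstream capacitance
    q: float  # required arrival time (RAT)


def merge(left: list[Candidate], right: list[Candidate]) -> list[Candidate]:
    """Merge two pruned, c-sorted candidate lists at a branch node."""
    merged: list[Candidate] = []
    i = j = 0
    while i < len(left) and j < len(right):
        a, b = left[i], right[j]
        merged.append(Candidate(a.c + b.c, min(a.q, b.q)))
        # Advance the side that limits q; on a tie, advance both.
        if a.q <= b.q:
            i += 1
        if b.q <= a.q:
            j += 1
    return merged
```

With illustrative numbers satisfying ql1 < ql2 < qr1 < ql3 < qr2, e.g. left = (1, 10), (2, 20), (3, 40) and right = (1, 30), (2, 50), the result is (2, 10), (3, 20), (4, 30), (5, 40), matching the four merged candidates listed above.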
Van Ginneken Complexity
• Generate candidates from the sinks to the source
• Quadratic runtime in the number of candidate buffer locations
– Adding a wire does not change the number of candidates
– Adding a buffer adds only one new candidate
– Merging branches is additive, not multiplicative
– Solution list pruning takes linear time
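Putting the steps together, a compact bottom-up sketch of the dynamic program for a routing tree with one buffer type. Assumptions, not taken from the slides: sinks are leaves, each node's `wire_len` is the length of the wire to its parent, buffers may only go at nodes flagged `buffer_ok`, and the driver slack is the best q - Rd·c. The classes `TreeNode` and `Tech` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, NamedTuple, Optional


class Candidate(NamedTuple):
    c: float  # downstream capacitance
    q: float  # required arrival time (RAT)


@dataclass
class Tech:
    r: float   # wire resistance per unit length
    c: float   # wire capacitance per unit length
    Rb: float  # buffer output resistance
    Cb: float  # buffer input capacitance
    tb: float  # buffer intrinsic delay
    Rd: float  # driver output resistance


@dataclass
class TreeNode:
    wire_len: float = 0.0             # length of the wire up to the parent
    sink: Optional[Candidate] = None  # (load capacitance, RAT) for a sink
    buffer_ok: bool = False           # is this a candidate buffer location?
    children: List["TreeNode"] = field(default_factory=list)


def add_wire(s: Candidate, x: float, t: Tech) -> Candidate:
    """Add-wire step: c grows by c*x, q drops by the wire's Elmore delay."""
    return Candidate(s.c + t.c * x, s.q - t.r * t.c * x * x / 2 - t.r * x * s.c)


def prune(cands: List[Candidate]) -> List[Candidate]:
    """Keep non-inferior candidates: sorted by c, q strictly increasing."""
    kept: List[Candidate] = []
    for s in sorted(cands, key=lambda s: (s.c, -s.q)):
        if not kept or s.q > kept[-1].q:
            kept.append(s)
    return kept


def merge(left: List[Candidate], right: List[Candidate]) -> List[Candidate]:
    """Linear-time merge of two pruned, c-sorted lists at a branch node."""
    out: List[Candidate] = []
    i = j = 0
    while i < len(left) and j < len(right):
        a, b = left[i], right[j]
        out.append(Candidate(a.c + b.c, min(a.q, b.q)))
        if a.q <= b.q:
            i += 1
        if b.q <= a.q:
            j += 1
    return out


def candidates(node: TreeNode, t: Tech) -> List[Candidate]:
    """Bottom-up dynamic program: pruned candidate list at `node`."""
    cands: List[Candidate] = [node.sink] if node.sink is not None else []
    for k, child in enumerate(node.children):
        up = prune([add_wire(s, child.wire_len, t) for s in candidates(child, t)])
        cands = up if k == 0 else merge(cands, up)
    if node.buffer_ok and cands:
        # Every buffered option has load Cb, so only the best q is worth keeping.
        best_q = max(s.q - t.Rb * s.c - t.tb for s in cands)
        cands = prune(cands + [Candidate(t.Cb, best_q)])
    return cands


def driver_slack(root: TreeNode, t: Tech) -> float:
    """Add-driver step: best slack q - Rd*c over the root's candidates."""
    return max(s.q - t.Rd * s.c for s in candidates(root, t))
```

On the two-segment example (r = c = 1, Rb = Cb = tb = 1, Rd = 1, sink (1, 20), buffer allowed at v2), only (3, 8) survives at the driver, giving a slack of 5 versus 3 for the pruned unbuffered candidate (5, 8).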
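• Example with two buffer types: r = 1, c = 1; two wire segments of length 2 each
• Rb1 = 1, Cb1 = 1, tb1 = 1
• Rb2 = 0.5, Cb2 = 2, tb2 = 0.5
• Rd = 1
• Sink candidate: (v1, 1, 20); after adding a wire: (v2, 3, 16)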
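One natural way to extend the insert-buffer step to a buffer library, assumed here rather than taken from these slides: each buffer type contributes one new candidate (all candidates buffered by that type share its input capacitance, so only the best q matters), and the combined list is pruned. `BufferType` and `insert_buffers` are hypothetical names.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    c: float  # downstream capacitance
    q: float  # required arrival time (RAT)


class BufferType(NamedTuple):
    Rb: float  # output resistance
    Cb: float  # input capacitance
    tb: float  # intrinsic delay


def insert_buffers(cands: List[Candidate], lib: List[BufferType]) -> List[Candidate]:
    """Add one best-q candidate per buffer type, then prune."""
    new = [
        Candidate(b.Cb, max(s.q - b.Rb * s.c - b.tb for s in cands))
        for b in lib
    ]
    kept: List[Candidate] = []
    for s in sorted(cands + new, key=lambda s: (s.c, -s.q)):
        if not kept or s.q > kept[-1].q:
            kept.append(s)
    return kept
```

At v2 in this example, (3, 16) with both buffer types yields (1, 12) from type 1 and (2, 14) from type 2; none dominates another, so all three candidates survive.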