
VLSI Design

Chapter 4:
Delay

Delay Definitions
When an input changes, the output will retain its old value for at least the contamination delay and take on its new value in at most the propagation delay.

Timing Optimization
Most digital circuits contain a number of critical paths that limit the operating speed of the system and require attention to timing details.

The critical paths can be affected at four main levels:
1. The architectural/microarchitectural level (how many gate delays fit in a clock cycle, how quickly addition occurs, how fast memories are accessed, how long signals take to propagate along a wire, number of pipeline stages, parallelism, size of memories)

2. The logic level (types of functional blocks used, e.g., ripple-carry or lookahead adders, the number of stages of gates in a clock cycle, and the fan-in and fan-out of the gates)

3. The circuit level (delay can be tuned by choosing transistor sizes or using other styles of CMOS logic)

4. The layout level (the floorplan, whether manual or automated, determines the wire lengths that can dominate delay; good cell layouts can also reduce parasitic capacitance)

Delay Estimation
We would like to be able to easily estimate delay
Not as accurate as simulation
The step response usually looks like a first-order RC response with a decaying exponential
Use RC delay models to estimate delay
C = total capacitance on output node
Use effective resistance R, so that tpd = RC
Characterize transistors by finding their effective R
Depends on average current as gate switches

Effective Resistance
Shockley models have limited value

Not accurate enough for modern transistors


Too complicated for much hand analysis
Simplification: treat transistor as resistor
Replace Ids(Vds, Vgs) with effective resistance R

Ids = Vds/R

R averaged across switching of digital gate


Too inaccurate to predict current at any given time
But good enough to predict RC delay

RC Delay Model
RC delay models approximate the nonlinear transistor I-V and C-V characteristics with an average resistance and capacitance over the switching range of the gate.
Use equivalent circuits for MOS transistors
Ideal switch + capacitance and ON resistance
Unit nMOS has resistance R, capacitance C
Unit pMOS has resistance 2R, capacitance C
Capacitance proportional to width
Resistance inversely proportional to width
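[Figure: equivalent circuits for transistors of width k. nMOS: resistance R/k between source and drain, with capacitance kC on the gate, source, and drain. pMOS: resistance 2R/k between source and drain, with capacitance kC on the gate, source, and drain.]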

RC Values
Capacitance

C = Cg = Cs = Cd = 2 fF/µm of gate width
Values similar across many processes
Resistance
R ≈ 6 kΩ·µm in 0.6 µm process
Improves with shorter channel lengths
Unit transistors
May refer to minimum contacted device (4/2 λ)
Or maybe 1 µm wide device
Doesn't matter as long as you are consistent

Inverter Delay Estimate

Estimate the delay of a fanout-of-1 inverter
[Figure: a unit inverter (pMOS width 2, nMOS width 1) driving an identical inverter with output Y, and its equivalent circuit: resistance R driving the 2C + C of diffusion capacitance on the output plus the 2C + C of gate capacitance of the load]
d = 6RC

Example
Sketch a 3-input NAND gate with transistor widths chosen to achieve effective rise and fall resistance equal to that of a unit inverter (R).
Annotate the gate with its gate and diffusion capacitances. Assume all diffusion nodes are contacted.
Then sketch equivalent circuits for the falling output transition and for the worst-case rising output transition.

Example: 3-input NAND


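A 3-input NAND with transistor widths chosen to achieve effective rise and fall resistances equal to a unit inverter (R).
[Figure: 3-input NAND schematic with three parallel pMOS transistors of width 2 and three series nMOS transistors of width 3]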

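3-input NAND Caps

Annotate the 3-input NAND gate with gate and diffusion capacitance.
[Figure: 3-input NAND annotated with individual capacitances: each width-2 pMOS has 2C on its gate and on each diffusion terminal, and each width-3 nMOS has 3C on its gate and on each diffusion terminal]
[Figure: the same gate with capacitances lumped per node: 5C on each input, 9C on the output, and 3C on each internal node of the nMOS stack]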

3-input NAND Falling and Rising


Then sketch equivalent circuits for the falling output transition and for the worst-case rising output transition.

Elmore Delay Model


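Estimates the propagation delay as the sum over each node in the ladder of the resistance between that node and the source, multiplied by the capacitance on the node:
$t_{pd} \approx \sum_{\text{nodes } i} R_{i\text{-to-source}} C_i = R_1 C_1 + (R_1 + R_2) C_2 + \dots + (R_1 + R_2 + \dots + R_N) C_N$
[Figure: RC ladder with series resistors R1, R2, R3, ..., RN and capacitors C1, C2, C3, ..., CN from each node to ground]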
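The ladder sum can be checked numerically. The following is a minimal sketch, assuming a simple ladder (no side branches) with values listed from the source outward; the function name and the example values are illustrative and not part of the slides.

```python
def elmore_ladder_delay(resistances, capacitances):
    """Elmore delay of an RC ladder driven from the source end.

    resistances[i] is the series resistor feeding node i and
    capacitances[i] is the capacitance from node i to ground.
    Any consistent units work (for example multiples of R and C).
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one resistance per capacitance")
    delay = 0.0
    r_to_source = 0.0
    for r, c in zip(resistances, capacitances):
        r_to_source += r           # R1 + R2 + ... + Ri
        delay += r_to_source * c   # add (R1 + ... + Ri) * Ci
    return delay


# Three identical sections: R*C + 2R*C + 3R*C = 6RC (hypothetical values)
print(elmore_ladder_delay([1, 1, 1], [1, 1, 1]))
```

Accumulating the resistance to the source in a running total keeps the computation linear in the number of nodes.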

Example: Elmore Delay Model

Find the Elmore delay of the nodes Vout3 and Vout4 in the RC tree:

Propagation Delay
Example: 2-input NAND
Estimate worst-case rising and falling propagation delays of a 2-input NAND driving h identical gates.
[Figure: 2-input NAND with pMOS and nMOS transistors of width 2, internal node x between the series nMOS transistors, and output Y driving h copies. Capacitances: 6C of diffusion on Y, 2C on x, and 4hC of gate load]
Rising: for the input combination A = 0 and B = 1, a single pMOS of resistance R charges the output capacitance (6 + 4h)C:
$t_{pdr} = (6 + 4h) RC$
Falling: the two series nMOS transistors, each of resistance R/2, discharge node x (2C) and the output ((6 + 4h)C):
$t_{pdf} = (2C) \left( \frac{R}{2} \right) + [(6 + 4h) C] \left( \frac{R}{2} + \frac{R}{2} \right) = (7 + 4h) RC$

Contamination Delay
Best-case (contamination) delay can be substantially less than propagation delay.
Ex. (rising case): if both inputs fall simultaneously, the two pMOS transistors (R each) pull the output up in parallel:
$t_{cdr} = (3 + 2h) RC$
Ex. (falling case): if B = 1 already and A rises to 1, node x is already discharged and can be ignored:
$t_{cdf} = (6 + 4h) RC$
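The four NAND2 estimates above can be tabulated against fanout. This is a minimal sketch in units of RC, assuming the capacitances from the example (6C of diffusion on Y, 2C on node x, 4C of gate load per driven copy); the function name is illustrative.

```python
def nand2_delays(h):
    """Return (t_pdr, t_pdf, t_cdr, t_cdf) in units of RC for fanout h."""
    c_out = 6 + 4 * h   # diffusion on Y plus gate load of h copies, in C
    c_x = 2             # diffusion on internal node x, in C

    t_pdr = 1.0 * c_out                      # one pMOS (R) charges Y
    t_pdf = 0.5 * c_x + (0.5 + 0.5) * c_out  # Elmore sum through two R/2 nMOS
    t_cdr = 0.5 * c_out                      # both pMOS in parallel: R/2
    t_cdf = (0.5 + 0.5) * c_out              # node x already discharged
    return t_pdr, t_pdf, t_cdr, t_cdf


for h in (1, 2, 4):
    print(h, nand2_delays(h))
```

For every h the contamination delays come out no larger than the corresponding propagation delays, as the slide states.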

Normalized Delay Inverter


Express delay in a process-independent form
Circuits can be compared based on topology rather than speed of the manufacturing process
Observe that the delay of an ideal fanout-of-1 inverter with no parasitic capacitance is τ = 3RC
Normalized delay (d) relative to this inverter delay: $d = t_{pd} / \tau$
Hence, assuming diffusion capacitance equals gate capacitance, the delay of a fanout-of-h inverter can be written in normalized form as
d = h + 1

Linear Delay Model


The RC model showed that delay is a linear function of the fanout of a gate
Designers simplify delay analysis by characterizing a gate by the slope and y-intercept of this function
In the linear delay model (LDM), normalized delay is expressed in terms of two components of delay:
d = f + p
1. The parasitic delay (p) is the time for a gate to drive its own internal diffusion capacitance (no load)
2. The effort delay or stage effort (f) depends on h and g:
f = gh
h (fanout or electrical effort) is the ratio of the capacitance of the external load to the input capacitance of the gate
g (logical effort) is the complexity of the gate. Complex gates have greater logical efforts, indicating that they take longer to drive a given fanout
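The linear delay model is small enough to write out directly. The sketch below assumes only d = gh + p from this slide and the inverter values implied by d = h + 1 (g = 1, p = 1); the function names are illustrative.

```python
def electrical_effort(c_load, c_in):
    """h: external load capacitance divided by the gate's input capacitance."""
    return c_load / c_in


def normalized_delay(g, h, p):
    """Linear delay model: effort delay f = g*h plus parasitic delay p."""
    return g * h + p


# Inverter (g = 1, p = 1) at several fanouts: d = h + 1
for h in (1, 2, 4):
    print(h, normalized_delay(1, h, 1))
```

Multiplying the result by τ for a given process converts the normalized delay back to seconds.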

Linear Delay Model


A gate driving h identical copies of itself is said to have a fanout or electrical effort of h.
If the load does not contain identical copies of the gate, the electrical effort can be computed as:
$h = C_{out} / C_{in}$

Computing Logical Effort


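DEF: Logical effort is the ratio of the input capacitance of a gate to the input capacitance of an inverter delivering the same output current.
Measure from delay vs. fanout plots
Or estimate by counting transistor widths
[Figure: transistor widths of three gates sized for unit drive. Inverter (pMOS 2, nMOS 1): Cin = 3, g = 3/3. 2-input NAND (pMOS 2, nMOS 2): Cin = 4, g = 4/3. 2-input NOR (pMOS 4, nMOS 1): Cin = 5, g = 5/3]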

Catalog of Gates
Logical effort of common gates

Gate type        Number of inputs
                 1      2       3           4               n
Inverter         1
NAND                    4/3     5/3         6/3             (n+2)/3
NOR                     5/3     7/3         9/3             (2n+1)/3
Tristate / mux   2      2       2           2               2
XOR, XNOR               4, 4    6, 12, 6    8, 16, 16, 8
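The NAND and NOR entries follow from counting transistor widths on one input. A minimal sketch, assuming gates sized for the same drive as a unit inverter (nMOS width 1, pMOS width 2, so Cin = 3); the function names are illustrative.

```python
from fractions import Fraction

INVERTER_CIN = 3  # unit inverter: pMOS width 2 + nMOS width 1


def nand_logical_effort(n):
    """n series nMOS of width n, parallel pMOS of width 2: Cin = n + 2."""
    return Fraction(n + 2, INVERTER_CIN)


def nor_logical_effort(n):
    """Parallel nMOS of width 1, n series pMOS of width 2n: Cin = 2n + 1."""
    return Fraction(2 * n + 1, INVERTER_CIN)


for n in (2, 3, 4):
    print(n, nand_logical_effort(n), nor_logical_effort(n))
```

`Fraction` reduces automatically, so 6/3 and 9/3 print as 2 and 3.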

Parasitic Delay
The parasitic delay of a gate is the delay of the gate when it drives zero load. It can be estimated with RC delay models.
The inverter has three units of diffusion capacitance on the output, so the parasitic delay is 3RC = τ.
The 3-input NAND and NOR each have 9 units of diffusion capacitance on the output, so the parasitic delay is three times as great.

Catalog of Gates
Parasitic delay of common gates
In multiples of pinv (≈ 1)

Gate type        Number of inputs
                 1      2      3      4      n
Inverter         1
NAND                    2      3      4      n
NOR                     2      3      4      n
Tristate / mux   2      4      6      8      2n
XOR, XNOR               4      6      8

Example: Ring Oscillator

Estimate the frequency of an N-stage ring oscillator (odd number of inverters)
Logical Effort: g = 1
Electrical Effort: h = 1
Parasitic Delay: p = 1
Stage Delay: d = 2
Frequency: fosc = 1/(2*N*d*τ) = 1/(4Nτ)
31 stage ring oscillator in 0.6 µm process has frequency of ~ 200 MHz
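The 200 MHz figure can be reproduced from the stage delay. This is a rough sketch, assuming τ = 40 ps for the 0.6 µm process (inferred from the FO4 delay of about 200 ps = 5τ quoted for the same process); the function name and default arguments are illustrative.

```python
def ring_oscillator_frequency(n_stages, tau_seconds, g=1.0, h=1.0, p=1.0):
    """Oscillation frequency of an N-stage inverter ring, in Hz.

    One period is two trips around the ring (a rising and a falling edge),
    so f_osc = 1 / (2 * N * d * tau) with normalized stage delay d = g*h + p.
    """
    if n_stages % 2 == 0:
        raise ValueError("a ring oscillator needs an odd number of inverters")
    d = g * h + p
    return 1.0 / (2 * n_stages * d * tau_seconds)


tau = 200e-12 / 5  # assumption: FO4 delay of 200 ps equals 5 tau
print(ring_oscillator_frequency(31, tau) / 1e6, "MHz")
```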

Example: FO4 Inverter

Estimate the delay of a fanout-of-4 (FO4) inverter
Logical Effort: g = 1
Electrical Effort: h = 4
Parasitic Delay: p = 1
Stage Delay: d = 5
The FO4 delay is about
200 ps in 0.6 µm process
60 ps in a 180 nm process
f/3 ns in an f µm process

Multistage Logic Networks

Logical effort generalizes to multistage networks
Path Logical Effort: $G = \prod g_i$
Path Electrical Effort: $H = \frac{C_{\text{out-path}}}{C_{\text{in-path}}}$
Path Effort: $F = \prod f_i = \prod g_i h_i$
[Figure: four-stage path with stage input capacitances 10, x, y, z and a load of 20: g1 = 1, h1 = x/10; g2 = 5/3, h2 = y/x; g3 = 4/3, h3 = z/y; g4 = 1, h4 = 20/z]
Can we write F = GH?

Paths that Branch

No! Consider paths that branch:
[Figure: an inverter with input capacitance 5 drives two inverters of input capacitance 15 each; each of those drives a load of 90]
G = 1
H = 90 / 5 = 18
GH = 18
h1 = (15 + 15) / 5 = 6
h2 = 90 / 15 = 6
F = g1g2h1h2 = 36 = 2GH

Branching Effort
Introduce branching effort
Accounts for branching between stages in path
$b = \frac{C_{\text{on path}} + C_{\text{off path}}}{C_{\text{on path}}}$
$B = \prod b_i$
Note: $\prod h_i = BH$
Now we compute the path effort
F = GBH
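The branching example can be rechecked with F = GBH. A minimal sketch, using the capacitances from the two-inverter branch (input 5, two branches of 15, load 90); the function name is illustrative.

```python
def branching_effort(c_on_path, c_off_path):
    """b = (C_on-path + C_off-path) / C_on-path for one stage."""
    return (c_on_path + c_off_path) / c_on_path


G = 1 * 1                      # two inverters, g = 1 each
H = 90 / 5                     # path electrical effort
B = branching_effort(15, 15)   # first stage drives two equal branches

h1 = (15 + 15) / 5             # stage electrical efforts
h2 = 90 / 15

print("GBH =", G * B * H)
print("g1*g2*h1*h2 =", 1 * 1 * h1 * h2)
```

Both products give 36, so including B closes the factor-of-2 gap between F and GH.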

Multistage Delays
Path Effort Delay: $D_F = \sum f_i$
Path Parasitic Delay: $P = \sum p_i$
Path Delay: $D = \sum d_i = D_F + P$

Designing Fast Circuits


$D = \sum d_i = D_F + P$
Delay is smallest when each stage bears same effort
$\hat{f} = g_i h_i = F^{1/N}$
Thus minimum delay of N stage path is
$D = N F^{1/N} + P$
This is a key result of logical effort
Find fastest possible delay
Doesn't require calculating gate sizes

Gate Sizes
How wide should the gates be for least delay?
$\hat{f} = gh = g \frac{C_{out}}{C_{in}} \Rightarrow C_{in_i} = \frac{g_i C_{out_i}}{\hat{f}}$
Working backward, apply capacitance transformation to find input capacitance of each gate given load it drives.
Check work by verifying input cap spec is met.

Example: 3-stage path

Select gate sizes x and y for least delay from A to B
[Figure: three-stage path from A to B. A NAND2 with input capacitance 8 drives three NAND3 gates of size x; each NAND3 drives two NOR2 gates of size y; each NOR2 drives a load of 45]
Logical Effort: G = (4/3)*(5/3)*(5/3) = 100/27
Electrical Effort: H = 45/8
Branching Effort: B = 3*2 = 6
Path Effort: F = GBH = 125
Best Stage Effort: $\hat{f} = \sqrt[3]{F} = 5$
Parasitic Delay: P = 2 + 3 + 2 = 7
Delay: D = 3*5 + 7 = 22 = 4.4 FO4
Work backward for sizes, using $\hat{f} = gh = g \frac{C_{out}}{C_{in}} \Rightarrow C_{in_i} = \frac{g_i C_{out_i}}{\hat{f}}$
y = 45 * (5/3) / 5 = 15
x = (15*2) * (5/3) / 5 = 10
Resulting transistor widths: first stage (from A) P: 4, N: 4; second stage P: 4, N: 6; third stage P: 12, N: 3
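The whole example, including the backward sizing pass, fits in one short routine. This is a sketch under the assumptions of the example (equal stage effort, branching effort listed per stage, 1 where nothing branches); `design_path` and its argument names are illustrative.

```python
from math import prod


def design_path(gs, bs, ps, c_in, c_load):
    """Apply the logical-effort recipe to one path.

    gs, ps: logical effort and parasitic delay per stage, input to output.
    bs: branching effort at the output of each stage.
    c_in, c_load: path input capacitance and final load (same units).
    Returns (F, f_hat, D, input capacitance of each stage).
    """
    n = len(gs)
    F = prod(gs) * prod(bs) * (c_load / c_in)  # F = G * B * H
    f_hat = F ** (1.0 / n)                     # best stage effort
    D = n * f_hat + sum(ps)                    # minimum path delay

    # Work backward: C_in_i = g_i * C_out_i / f_hat, where C_out_i counts
    # the on-path gate plus the off-path copies given by bs[i].
    sizes = [0.0] * n
    c_next = c_load
    for i in reversed(range(n)):
        sizes[i] = gs[i] * (bs[i] * c_next) / f_hat
        c_next = sizes[i]
    return F, f_hat, D, sizes


F, f_hat, D, sizes = design_path(
    gs=[4 / 3, 5 / 3, 5 / 3], bs=[3, 2, 1], ps=[2, 3, 2], c_in=8, c_load=45)
print(round(F, 3), round(f_hat, 3), round(D, 3), [round(s, 2) for s in sizes])
```

The first entry of `sizes` should come back as the specified input capacitance (8 here), which is the "check your work" step; the other two entries are x and y.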

Best Number of Stages

How many stages should a path use?
Minimizing number of stages is not always fastest
Example: drive 64-bit datapath with unit inverter
$D = N F^{1/N} + P = N (64)^{1/N} + N$
[Figure: inverter chains of 1 to 4 stages between the initial driver (size 1) and the datapath load (64), with stage sizes 1; 1, 8; 1, 4, 16; and 1, 2.8, 8, 23]

N:   1     2     3              4
f:   64    8     4              2.8
D:   65    18    15 (fastest)   15.3
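The table can be regenerated by sweeping N. A minimal sketch, assuming a chain of inverters only (g = 1, p = pinv = 1 per stage) and path effort F = 64; the function name is illustrative.

```python
def inverter_chain_delay(path_effort, n_stages, p_inv=1.0):
    """D = N * F^(1/N) + N * p_inv for a chain of N inverters."""
    stage_effort = path_effort ** (1.0 / n_stages)
    return n_stages * stage_effort + n_stages * p_inv


delays = {n: inverter_chain_delay(64, n) for n in range(1, 5)}
best_n = min(delays, key=delays.get)

for n, d in delays.items():
    print(n, round(64 ** (1.0 / n), 1), round(d, 1))
print("fastest:", best_n, "stages")
```

Extending the range beyond 4 shows the delay rising again, since each extra inverter adds parasitic delay.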

Derivation
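Consider adding inverters to end of path
How many give least delay?
[Figure: logic block of n_1 stages with path effort F, followed by N - n_1 extra inverters]
$D = N F^{1/N} + \sum_{i=1}^{n_1} p_i + (N - n_1) p_{inv}$
$\frac{\partial D}{\partial N} = -F^{1/N} \ln F^{1/N} + F^{1/N} + p_{inv} = 0$
Define best stage effort $\rho = F^{1/N}$
$p_{inv} + \rho (1 - \ln \rho) = 0$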

Best Stage Effort


$p_{inv} + \rho (1 - \ln \rho) = 0$ has no closed-form solution
Neglecting parasitics (pinv = 0), we find ρ = 2.718 (e)
For pinv = 1, solve numerically for ρ = 3.59
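The numerical solution is easy to reproduce. The sketch below uses plain bisection (one possible root finder, not necessarily the one used for the slide) on g(ρ) = pinv + ρ(1 - ln ρ), which is decreasing for ρ > 1; the bracket [e, 20] and the function name are assumptions made for illustration.

```python
import math


def best_stage_effort(p_inv, lo=math.e, hi=20.0, tol=1e-9):
    """Solve p_inv + rho * (1 - ln(rho)) = 0 for rho by bisection.

    At rho = e the left side equals p_inv >= 0, and it is negative at
    rho = 20 for any realistic p_inv, so the root lies in [e, 20].
    """
    def g(rho):
        return p_inv + rho * (1.0 - math.log(rho))

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid   # root is to the right of mid
        else:
            hi = mid   # root is at or to the left of mid
    return 0.5 * (lo + hi)


print(round(best_stage_effort(0.0), 3))  # no parasitics
print(round(best_stage_effort(1.0), 2))  # p_inv = 1
```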

Sensitivity Analysis
How sensitive is delay to using exactly the best number of stages?
[Figure: plot of D(N)/D(N̂) (vertical axis, 1.0 to 1.6) versus N/N̂ (horizontal axis, 0.0 to 2.0), with labeled values 1.15, 1.26, and 1.51 and markers at ρ = 2.4 and ρ = 6]
2.4 < ρ < 6 gives delay within 15% of optimal
We can be sloppy!
I like ρ = 4

Example, Revisited
Ben Bitdiddle is the memory designer for the Motoroil 68W86, an embedded automotive processor. Help Ben design the decoder for a register file.
[Figure: 4:16 decoder with true and complementary address inputs A[3:0] driving the 16 word lines of a register file of 16 words by 32 bits]
Decoder specifications:
16 word register file
Each word is 32 bits wide
Each bit presents load of 3 unit-sized transistors
True and complementary address inputs A[3:0]
Each input may drive 10 unit-sized transistors
Ben needs to decide:
How many stages to use?
How large should each gate be?
How fast can decoder operate?

Number of Stages
Decoder effort is mainly electrical and branching
Electrical Effort: H = (32*3) / 10 = 9.6
Branching Effort: B = 8
If we neglect logical effort (assume G = 1)
Path Effort: F = GBH = 76.8
Number of Stages: N = log4 F = 3.1
Try a 3-stage design
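The stage-count estimate is a one-line logarithm. A minimal sketch, assuming a target stage effort of 4 and the decoder numbers from this slide (B = 8 is taken as given); the function name is illustrative.

```python
import math


def estimate_stage_count(path_effort, target_stage_effort=4.0):
    """N = log base rho of F, with rho the target stage effort."""
    return math.log(path_effort) / math.log(target_stage_effort)


H = (32 * 3) / 10   # 32 bits of 3 units each, driven from 10 units of input
B = 8               # branching effort, as given on the slide
G = 1               # logical effort neglected for this first estimate
F = G * B * H

n_est = estimate_stage_count(F)
print(round(F, 1), round(n_est, 1), "->", round(n_est), "stages")
```

The estimate is rounded to an integer only at the end; the true path effort is larger once G is included, so nearby stage counts are worth comparing as well.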

Gate Sizes & Delay

Logical Effort: G = 1 * 6/3 * 1 = 2
Path Effort: F = GBH = 154
Stage Effort: $\hat{f} = F^{1/3} = 5.36$
Path Delay: $D = 3\hat{f} + 1 + 4 + 1 = 22.1$
Gate sizes: z = 96*1/5.36 = 18, y = 18*2/5.36 = 6.7
[Figure: 3-stage decoder (INV-NAND4-INV). The address inputs A[3:0] and their complements drive inverters with 10 units of input capacitance, followed by 4-input NANDs of size y and inverters of size z driving word[0] through word[15], each with 96 units of wordline capacitance]

Comparison
Compare many alternatives with a spreadsheet

Design                         N    G       P    D
NAND4-INV                      2    2       5    29.8
NAND2-NOR2                     2    20/9    4    30.1
INV-NAND4-INV                  3    2       6    22.1
NAND4-INV-INV-INV              4    2       7    21.1
NAND2-NOR2-INV-INV             4    20/9    6    20.5
NAND2-INV-NAND2-INV            4    16/9    6    19.7
INV-NAND2-INV-NAND2-INV        5    16/9    7    20.4
NAND2-INV-NAND2-INV-INV-INV    6    16/9    8    21.6
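A short script can stand in for the spreadsheet. This is a sketch, assuming BH = 8 * 9.6 from the decoder example and the per-gate logical effort and parasitic delay values used earlier in these slides; the `GATES` table and `evaluate` function are illustrative names.

```python
from fractions import Fraction

BH = 8 * 9.6  # branching effort times electrical effort of the decoder

# gate name -> (logical effort g, parasitic delay p)
GATES = {
    "INV": (Fraction(1), 1),
    "NAND2": (Fraction(4, 3), 2),
    "NOR2": (Fraction(5, 3), 2),
    "NAND4": (Fraction(6, 3), 4),
}


def evaluate(design):
    """Return (N, G, P, D) for a design written like 'INV-NAND4-INV'."""
    stages = design.split("-")
    n = len(stages)
    G = Fraction(1)
    P = 0
    for name in stages:
        g, p = GATES[name]
        G *= g
        P += p
    F = float(G) * BH
    D = n * F ** (1.0 / n) + P   # minimum delay with equal stage efforts
    return n, G, P, D


designs = [
    "NAND4-INV", "NAND2-NOR2", "INV-NAND4-INV", "NAND4-INV-INV-INV",
    "NAND2-NOR2-INV-INV", "NAND2-INV-NAND2-INV",
    "INV-NAND2-INV-NAND2-INV", "NAND2-INV-NAND2-INV-INV-INV",
]
for name in designs:
    n, G, P, D = evaluate(name)
    print(f"{name:30s} {n}  {str(G):5s} {P}  {D:.1f}")

fastest = min(designs, key=lambda name: evaluate(name)[3])
print("fastest:", fastest)
```

Adding a row is a matter of appending another design string, which makes it cheap to explore alternatives before committing to sizes.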

Review of Definitions
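Term                Stage                                                                        Path
number of stages    1                                                                            N
logical effort      g                                                                            $G = \prod g_i$
electrical effort   $h = \frac{C_{out}}{C_{in}}$                                                 $H = \frac{C_{\text{out-path}}}{C_{\text{in-path}}}$
branching effort    $b = \frac{C_{\text{on-path}} + C_{\text{off-path}}}{C_{\text{on-path}}}$    $B = \prod b_i$
effort              $f = gh$                                                                     $F = GBH$
effort delay        f                                                                            $D_F = \sum f_i$
parasitic delay     p                                                                            $P = \sum p_i$
delay               $d = f + p$                                                                  $D = \sum d_i = D_F + P$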

Method of Logical Effort


1) Compute path effort: $F = GBH$
2) Estimate best number of stages: $N = \log_4 F$
3) Sketch path with N stages
4) Estimate least delay: $D = N F^{1/N} + P$
5) Determine best stage effort: $\hat{f} = F^{1/N}$
6) Find gate sizes: $C_{in_i} = \frac{g_i C_{out_i}}{\hat{f}}$

Limits of Logical Effort


Chicken and egg problem
Need path to compute G
But don't know number of stages without G
Simplistic delay model
Neglects input rise time effects
Interconnect
Iteration required in designs with wire
Maximum speed only
Not minimum area/power for constrained delay

Summary
Logical effort is useful for thinking of delay in circuits
Numeric logical effort characterizes gates
NANDs are faster than NORs in CMOS
Paths are fastest when effort delays are ~4
Path delay is weakly sensitive to stages, sizes
But using fewer stages doesn't mean faster paths
Delay of path is about log4 F FO4 inverter delays
Inverters and NAND2 best for driving large caps
Provides language for discussing fast circuits
But requires practice to master