
Introduction to CMOS VLSI Design

Design for Skew

Outline
- Clock Distribution
- Clock Skew
- Skew-Tolerant Static Circuits
- Traditional Domino Circuits
- Skew-Tolerant Domino Circuits

Design for Skew

CMOS VLSI Design

Slide 2

Clocking
- Synchronous systems use a clock to keep operations in sequence
  - Distinguish the current operation from the previous or next one
  - Determine the speed at which the machine operates
- The clock must be distributed to all the sequencing elements
  - Flip-flops and latches
- The clock is also distributed to other elements
  - Domino circuits and memories


Clock Distribution
- On a small chip, the clock distribution network is just a wire
  - And possibly an inverter for clkb
- On practical chips, the RC delay of the wire resistance and gate load is very long
  - Variations in this delay cause the clock to reach different elements at different times
  - This is called clock skew
- Most chips use repeaters to buffer the clock and equalize the delay
  - Reduces but doesn't eliminate skew

Example
- Skew comes from differences in gate and wire delay
  - With the right buffer sizing, clk1 and clk2 could ideally arrive at the same time
  - But power supply noise changes buffer delays
  - clk2 and clk3 will always see RC skew

[Figure: clock distribution example driven by gclk. Labels in order: 3 mm, clk1, 1.3 pF; 3.1 mm, clk2, 0.4 pF; 0.5 mm, clk3, 0.4 pF.]


Review: Skew Impact


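- Ideally the full cycle is available for work
- Skew adds sequencing overhead
- Increases hold time too

Setup (max-delay) constraint:

$t_{pd} \le T_c - \underbrace{(t_{pcq} + t_{setup} + t_{skew})}_{\text{sequencing overhead}}$

Hold (min-delay) constraint:

$t_{cd} \ge t_{hold} - t_{ccq} + t_{skew}$

[Figure: flip-flops F1 and F2 separated by combinational logic (CL), with nodes Q1 and D2. Timing diagrams show t_pcq, t_pdq, t_setup and t_skew within T_c for the setup case, and t_ccq, t_cd, t_skew and t_hold for the hold case.]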
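The two flip-flop constraints can be tried out numerically. The sketch below is only an illustration: the function names and all delay values (in picoseconds) are assumptions chosen for the example, not figures from the slides.

```python
def flop_max_logic_delay(t_c, t_pcq, t_setup, t_skew):
    """Largest t_pd that still meets setup: Tc - (t_pcq + t_setup + t_skew)."""
    return t_c - (t_pcq + t_setup + t_skew)


def flop_min_contamination_delay(t_hold, t_ccq, t_skew):
    """Smallest t_cd that avoids a hold violation: t_hold - t_ccq + t_skew."""
    return t_hold - t_ccq + t_skew


# Illustrative values in picoseconds (assumed, not taken from the slides).
T_C, T_PCQ, T_CCQ, T_SETUP, T_HOLD = 1000, 80, 40, 60, 30

for t_skew in (0, 50, 100):
    t_pd = flop_max_logic_delay(T_C, T_PCQ, T_SETUP, t_skew)
    t_cd = flop_min_contamination_delay(T_HOLD, T_CCQ, t_skew)
    print(f"skew={t_skew:3d} ps  max t_pd={t_pd} ps  min t_cd={t_cd} ps")
```

With these assumed numbers, every picosecond of skew removes a picosecond from the time available for logic and adds a picosecond to the minimum delay needed to avoid a hold failure.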

Cycle Time Trends


- Much of CPU performance comes from higher clock frequency f
- f is improving faster than simple process shrinks
- Sequencing overhead is a bigger part of the cycle
[Figure: four plots against year (1985 to 2000) for the 80386, 80486, Pentium and Pentium II / III: SpecInt95; clock frequency (MHz); fanout-of-4 (FO4) inverter delay (ps), annotated with process (2.0, 1.2, 0.8, 0.6, 0.35, 0.25) and VDD = 5, 3.3, 2.5; and FO4 inverter delays per cycle.]

Solutions
- Reduce clock skew
  - Careful clock distribution network design
  - Plenty of metal wiring resources
- Analyze clock skew
  - Only budget actual, not worst-case, skews
  - Local vs. global skew budgets
- Tolerate clock skew
  - Choose circuit structures insensitive to skew


Clock Dist. Networks


- Ad hoc
- Grids
- H-tree
- Hybrid


Clock Grids
- Use grid on two or more levels to carry clock
- Make wires wide to reduce RC delay
- Ensures low skew between nearby points
- But possibly large skew across die


Alpha Clock Grids


- Alpha 21064
- Alpha 21164
- Alpha 21264

[Figure: clock grids of the Alpha 21064, Alpha 21164 and Alpha 21264; labels: PLL, gclk grid.]

H-Trees
- Fractal structure
  - Gets clock arbitrarily close to any point
  - Matched delay along all paths
- Delay variations cause skew
  - A and B might see big skew

[Figure: H-tree with two points labeled A and B.]
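The fractal construction can be sketched in a few lines. The code below is a minimal geometric model, assuming ideal wires where matched length stands in for matched delay; the function name and the dimensions are hypothetical. It checks that every leaf is reached through the same total wire length, which is the "matched delay along all paths" property. It does not model the delay variations that cause real skew.

```python
def h_tree_leaves(x, y, half_width, half_height, levels, path_len=0.0):
    """Return (x, y, wire_length_from_root) for every leaf of an H-tree.

    Each level draws one 'H' centered at (x, y): a horizontal run of
    half_width to either side, then a vertical run of half_height up or
    down, reaching four corner points. Each corner becomes the center of
    a half-size 'H' at the next level.
    """
    if levels == 0:
        return [(x, y, path_len)]
    leaves = []
    for dx in (-half_width, half_width):
        for dy in (-half_height, half_height):
            leaves += h_tree_leaves(
                x + dx, y + dy,
                half_width / 2, half_height / 2,
                levels - 1,
                path_len + abs(dx) + abs(dy),
            )
    return leaves


# Three levels on an assumed 16 x 16 unit die centered at the origin.
leaves = h_tree_leaves(0.0, 0.0, 4.0, 4.0, levels=3)
lengths = {length for _, _, length in leaves}
print(len(leaves), "leaves, distinct path lengths:", lengths)
```

Adding levels brings a leaf closer to any given point while keeping all root-to-leaf lengths equal, at the cost of more wire and more buffers.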


Itanium 2 H-Tree
- Four levels of buffering:
  - Primary driver
  - Repeater
  - Second-level clock buffer
  - Gater
- Route around obstructions

[Figure: Itanium 2 H-tree; labels: Primary Buffer, Repeaters, Typical SLCB Locations.]

Hybrid Networks
- Use H-tree to distribute clock to many points
- Tie these points together with a grid
- Ex: IBM Power4, PowerPC
  - H-tree drives 16-64 sector buffers
  - Buffers drive total of 1024 points
  - All points shorted together with grid


Skew Tolerance
- Flip-flops are sensitive to skew because of hard edges
  - Data launches at latest rising edge of clock
  - Must set up before earliest next rising edge of clock
  - Overhead would shrink if we could soften the edge
- Latches tolerate moderate amounts of skew
  - Data can arrive anytime the latch is transparent


Skew: Latches
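2-Phase Latches

$t_{pd} \le T_c - \underbrace{2t_{pdq}}_{\text{sequencing overhead}}$

$t_{cd1}, t_{cd2} \ge t_{hold} - t_{ccq} - t_{nonoverlap} + t_{skew}$

$t_{borrow} \le \frac{T_c}{2} - (t_{setup} + t_{nonoverlap} + t_{skew})$

[Figure: latches L1, L2 and L3 clocked by φ1, φ2 and φ1, with Combinational Logic 1 between L1 and L2 and Combinational Logic 2 between L2 and L3; nodes D1, Q1, D2, Q2, D3, Q3.]

Pulsed Latches

$t_{pd} \le T_c - \underbrace{\max(t_{pdq},\; t_{pcq} + t_{setup} - t_{pw} + t_{skew})}_{\text{sequencing overhead}}$

$t_{cd} \ge t_{hold} + t_{pw} - t_{ccq} + t_{skew}$

$t_{borrow} \le t_{pw} - (t_{setup} + t_{skew})$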
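To see how latches absorb skew, the latch constraints can be evaluated with sample numbers. This is a sketch under stated assumptions: the function names are hypothetical, the delay values (in picoseconds) are made up for illustration, and the signs follow the usual textbook form of these constraints.

```python
def two_phase_max_logic_delay(t_c, t_pdq):
    """t_pd <= Tc - 2*t_pdq: skew does not appear in this bound."""
    return t_c - 2 * t_pdq


def two_phase_min_contamination_delay(t_hold, t_ccq, t_nonoverlap, t_skew):
    """t_cd1, t_cd2 >= t_hold - t_ccq - t_nonoverlap + t_skew."""
    return t_hold - t_ccq - t_nonoverlap + t_skew


def two_phase_max_borrow(t_c, t_setup, t_nonoverlap, t_skew):
    """t_borrow <= Tc/2 - (t_setup + t_nonoverlap + t_skew)."""
    return t_c / 2 - (t_setup + t_nonoverlap + t_skew)


def pulsed_max_logic_delay(t_c, t_pdq, t_pcq, t_setup, t_pw, t_skew):
    """t_pd <= Tc - max(t_pdq, t_pcq + t_setup - t_pw + t_skew)."""
    return t_c - max(t_pdq, t_pcq + t_setup - t_pw + t_skew)


def pulsed_min_contamination_delay(t_hold, t_pw, t_ccq, t_skew):
    """t_cd >= t_hold + t_pw - t_ccq + t_skew."""
    return t_hold + t_pw - t_ccq + t_skew


def pulsed_max_borrow(t_pw, t_setup, t_skew):
    """t_borrow <= t_pw - (t_setup + t_skew)."""
    return t_pw - (t_setup + t_skew)


# Illustrative values in picoseconds (assumed, not taken from the slides).
T_C, T_PDQ, T_PCQ, T_CCQ = 1000, 70, 80, 40
T_SETUP, T_HOLD, T_NONOVERLAP, T_PW, T_SKEW = 60, 30, 20, 150, 50

print("2-phase max t_pd:", two_phase_max_logic_delay(T_C, T_PDQ))
print("2-phase min t_cd:",
      two_phase_min_contamination_delay(T_HOLD, T_CCQ, T_NONOVERLAP, T_SKEW))
print("2-phase max borrow:",
      two_phase_max_borrow(T_C, T_SETUP, T_NONOVERLAP, T_SKEW))
print("pulsed max t_pd:",
      pulsed_max_logic_delay(T_C, T_PDQ, T_PCQ, T_SETUP, T_PW, T_SKEW))
print("pulsed min t_cd:",
      pulsed_min_contamination_delay(T_HOLD, T_PW, T_CCQ, T_SKEW))
print("pulsed max borrow:", pulsed_max_borrow(T_PW, T_SETUP, T_SKEW))
```

In this example the skew reduces the borrowing window and tightens the hold constraint, but it does not reduce the time available for logic: the two-phase bound has no skew term, and the pulsed-latch bound is still set by t_pdq because the pulse is wide enough to cover setup time and skew.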

Dynamic Circuit Review


- Static circuits are slow because fat pMOS transistors load the inputs
- Dynamic gates use precharge to remove pMOS transistors from the inputs
  - Precharge: φ = 0, output forced high
  - Evaluate: φ = 1, output may pull low

[Figure: static and dynamic gate schematics with inputs A, B, C, D and output Y.]


Domino Circuits
- Dynamic inputs must monotonically rise during evaluation
  - Place inverting stage between each dynamic gate
  - Dynamic / static pair called domino gate
- Domino gates can be safely cascaded

[Figure: domino AND gate built from a dynamic NAND followed by a static inverter; labels: A, B, W.]
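The monotonicity rule follows from the precharge and evaluate behavior: once a dynamic output has been pulled low during evaluation, nothing pulls it back high until the next precharge. The behavioral model below is a simplified sketch (the class name is hypothetical, and it ignores leakage, charge sharing and keepers) of a two-input dynamic NAND that shows how an input that rises and then falls leaves a wrong result.

```python
class DynamicNand2:
    """Behavioral model of a two-input dynamic NAND gate."""

    def __init__(self):
        self.y = 1  # treat the output as precharged initially

    def step(self, phi, a, b):
        if phi == 0:
            self.y = 1      # precharge: output forced high
        elif a and b:
            self.y = 0      # evaluate: pulldown stack conducts
        # Otherwise the output floats and keeps its previous value.
        return self.y


gate = DynamicNand2()
print(gate.step(phi=0, a=0, b=0))  # precharge, output high
print(gate.step(phi=1, a=1, b=1))  # evaluate, output pulled low
print(gate.step(phi=1, a=0, b=1))  # input a falls, output stays low
print(gate.step(phi=0, a=0, b=1))  # next precharge restores the output
```

In the third step a static NAND would return 1, but the dynamic gate stays at 0. The inverter in a domino gate ensures that the next dynamic gate only ever sees inputs that start low and rise during evaluation.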

Domino Timing
- Domino gates are 1.5 to 2x faster than static CMOS
  - Lower logical effort because of reduced Cin
- Challenge is to keep precharge off the critical path
- Look at clocking schemes for precharge and eval
  - Traditional schemes have severe overhead
  - Skew-tolerant domino hides this overhead


Traditional Domino Ckts


- Hide precharge time by ping-ponging between half-cycles
  - One evaluates while other precharges
  - Latches hold results during precharge

$t_{pd} \le T_c - 2t_{pdq}$

[Figure: traditional domino pipeline over one cycle T_c. Dynamic and static gates are clocked by clk in one half-cycle and by its complement in the other, with a latch (delay t_pdq) at the end of each half-cycle.]

Clock Skew
- Skew increases sequencing overhead
  - Traditional domino has hard edges
  - Evaluate at latest rising edge
  - Set up at latch by earliest falling edge

$t_{pd} \le T_c - 2t_{setup} - 2t_{skew}$

[Figure: traditional domino half-cycle of dynamic and static gates ending in a latch; the timing diagram marks t_setup and t_skew before the latch closes.]

Time Borrowing
- Logic may not exactly fit half-cycle
  - No flexibility to borrow time to balance logic between half-cycles
- Traditional domino sequencing overhead is about 25% of cycle time in fast systems!

[Figure: traditional domino half-cycles of dynamic and static gates ending in latches; the timing diagram marks t_setup and t_skew.]
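A quick calculation shows how an overhead of about a quarter of the cycle can arise. The sketch below uses the traditional domino bound with skew, t_pd <= Tc - 2*t_setup - 2*t_skew; the function names and delay values (in picoseconds) are assumptions chosen only to illustrate the arithmetic.

```python
def traditional_domino_max_logic_delay(t_c, t_setup, t_skew):
    """Time left for logic per cycle: Tc - 2*t_setup - 2*t_skew."""
    return t_c - 2 * t_setup - 2 * t_skew


def overhead_fraction(t_c, t_setup, t_skew):
    """Share of the cycle lost to setup time and skew at both latches."""
    return (2 * t_setup + 2 * t_skew) / t_c


# Illustrative values in picoseconds (assumed, not taken from the slides).
T_C, T_SETUP, T_SKEW = 1000, 60, 65

print("max t_pd:", traditional_domino_max_logic_delay(T_C, T_SETUP, T_SKEW))
print("overhead:", overhead_fraction(T_C, T_SETUP, T_SKEW))
```

Because the penalty is paid at both half-cycle boundaries, setup time and skew each count twice per cycle, and the fraction grows as the cycle time shrinks.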

Relaxing the Timing


- Sequencing overhead caused by hard edges
  - Data departs dynamic gate on late rising edge
  - Must set up at latch on early falling edge
- Latch functions
  - Prevent glitches on inputs of domino gates
  - Hold results during precharge
- Is the latch really necessary?
  - No glitches if inputs come from other domino gates
  - Can we hold the results in another way?


Skew-Tolerant Domino
- Use overlapping clocks to eliminate latches at phase boundaries
  - Second phase evaluates using results of first

[Figure: dynamic and static gates clocked by φ1 and φ2 with no latch at the phase boundary; waveforms for φ1, φ2 and nodes a, b, c.]

Full Keeper
- After second phase evaluates, first phase precharges
  - Input to second phase falls
  - Violates monotonicity?
- But we no longer need the value
- Now the second gate has a floating output
  - Need full keeper to hold it either high or low

[Figure: dynamic gate with weak full keeper transistors; labels: H, X, f.]


Time Borrowing
- Overlap can be used to
  - Tolerate clock skew
  - Permit time borrowing
- No sequencing overhead

$t_{pd} \le T_c$

[Figure: two-phase skew-tolerant domino pipeline (Phase 1, Phase 2) of dynamic and static gates clocked by φ1 and φ2; the clock waveforms mark t_overlap, t_borrow and t_skew.]

Multiple Phases
- With more clock phases, each phase overlaps more
  - Permits more skew tolerance and time borrowing

[Figure: four-phase skew-tolerant domino pipeline (Phase 1 through Phase 4) of dynamic and static gates clocked by φ1, φ2, φ3 and φ4, with the four clock waveforms.]
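The benefit of extra phases can be estimated with a simple model. The sketch below assumes N phases spaced evenly by Tc/N, each high for a fixed fraction of the cycle (50% by default); these assumptions, the function names and the numbers are illustrative and not taken from the slides. It also treats the overlap as shared only between skew and time borrowing, ignoring any hold-time margin.

```python
def phase_overlap(t_c, n_phases, duty_cycle=0.5):
    """Overlap between consecutive phases: high time minus phase spacing."""
    return duty_cycle * t_c - t_c / n_phases


def borrow_budget(overlap, t_skew):
    """Overlap left for time borrowing once skew is covered (simplified)."""
    return max(0.0, overlap - t_skew)


# Illustrative values in picoseconds (assumed, not taken from the slides).
T_C, T_SKEW = 1000, 50

for n in (2, 3, 4, 6):
    overlap = phase_overlap(T_C, n)
    print(f"{n} phases: overlap={overlap:.1f} ps, "
          f"borrow budget={borrow_budget(overlap, T_SKEW):.1f} ps")
```

Under these assumptions two 50% duty-cycle phases do not overlap at all, so a two-phase system needs a duty cycle above 50%, while four phases overlap by a quarter of the cycle.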

Clock Generation
[Figure: clock generator circuit; labels: en, clk, φ1, φ2, φ3, φ4.]

Summary
- Clock skew effectively increases setup and hold times in systems with hard edges
- Managing skew
  - Reduce: good clock distribution network
  - Analyze: local vs. global skew
  - Tolerate: use systems with soft edges
- Flip-flops and traditional domino are costly
- Latches and skew-tolerant domino perform at full speed even with moderate clock skews
