
Introduction to CMOS VLSI Design

Design for Skew


Outline
• Clock Distribution
• Clock Skew
• Skew-Tolerant Static Circuits
• Traditional Domino Circuits
• Skew-Tolerant Domino Circuits

Design for Skew CMOS VLSI Design Slide 2


Clocking
• Synchronous systems use a clock to keep operations in sequence
– Distinguish the current operation from the previous or next one
– Determine the speed at which the machine operates
• The clock must be distributed to all the sequencing elements
– Flip-flops and latches
• The clock is also distributed to other elements
– Domino circuits and memories


Clock Distribution
• On a small chip, the clock distribution network is just a wire
– And possibly an inverter for clkb
• On practical chips, the RC delay from the wire resistance and gate load is very long
– Variations in this delay cause the clock to reach different elements at different times
– This is called clock skew
• Most chips use repeaters to buffer the clock and equalize the delay
– Reduces but doesn't eliminate skew

Example
• Skew comes from differences in gate and wire delay
– With the right buffer sizing, clk1 and clk2 could ideally arrive at the same time
– But power supply noise changes buffer delays
– clk2 and clk3 will always see RC skew
[Figure: gclk buffered out to clk1, clk2, and clk3; wire segments are labeled 3 mm, 3.1 mm, and 0.5 mm, and loads are labeled 1.3 pF, 0.4 pF, and 0.4 pF.]


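Review: Skew Impact
• Ideally, the full cycle Tc is available for work
• Skew adds sequencing overhead
$t_{pd} \le T_c - \underbrace{(t_{pcq} + t_{setup} + t_{skew})}_{\text{sequencing overhead}}$
• Skew increases the hold time requirement too
$t_{cd} \ge t_{hold} - t_{ccq} + t_{skew}$
[Figure: flip-flops F1 and F2, both clocked by clk, with combinational logic (CL) between Q1 and D2. The setup timing diagram marks Tc, tpcq, tpdq, tsetup, and tskew; the hold timing diagram marks tccq, tcd, thold, and tskew.]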
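
The two flip-flop constraints on this slide can be checked numerically. The following is a minimal sketch, not part of the original slides: the function names and all delay values (in picoseconds) are hypothetical, chosen only to show how skew shrinks the time left for logic and raises the minimum contamination delay.

```python
def flop_max_logic_delay(t_c, t_pcq, t_setup, t_skew):
    """Setup constraint: largest t_pd allowed, t_pd <= Tc - (t_pcq + t_setup + t_skew)."""
    return t_c - (t_pcq + t_setup + t_skew)


def flop_min_contamination_delay(t_hold, t_ccq, t_skew):
    """Hold constraint: smallest t_cd allowed, t_cd >= t_hold - t_ccq + t_skew.

    A negative result means the hold constraint is met by any logic.
    """
    return t_hold - t_ccq + t_skew


# Hypothetical flip-flop and clock parameters, in picoseconds.
T_C, T_PCQ, T_SETUP, T_HOLD, T_CCQ = 1000, 80, 60, 30, 50

for t_skew in (0, 50, 100):
    t_pd = flop_max_logic_delay(T_C, T_PCQ, T_SETUP, t_skew)
    t_cd = flop_min_contamination_delay(T_HOLD, T_CCQ, t_skew)
    print(f"skew={t_skew:3d} ps  max t_pd={t_pd} ps  min t_cd={t_cd} ps")
```

With these assumed numbers, every picosecond of skew is subtracted directly from the logic budget and added directly to the hold requirement.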



Cycle Time Trends
• Much of CPU performance comes from higher clock frequency f
– f is improving faster than simple process shrinks
– Sequencing overhead is a bigger part of the cycle
[Figure: four plots covering the 80386, 80486, Pentium, and Pentium II / III. Clock frequency (MHz) vs. year and SpecInt95 vs. year, 1985 to 2000; fanout-of-4 (FO4) inverter delay (ps) vs. process, 2.0 down to 0.25, at VDD = 5, 3.3, and 2.5; FO4 inverter delays per cycle vs. year, 1985 to 2000.]


Solutions
• Reduce clock skew
– Careful clock distribution network design
– Plenty of metal wiring resources
• Analyze clock skew
– Budget only the actual skew, not the worst case
– Local vs. global skew budgets
• Tolerate clock skew
– Choose circuit structures insensitive to skew


Clock Distribution Networks
• Ad hoc
• Grids
• H-tree
• Hybrid


Clock Grids
• Use a grid on two or more levels to carry the clock
• Make wires wide to reduce RC delay
• Ensures low skew between nearby points
• But possibly large skew across the die


Alpha Clock Grids
[Figure: clock grid layouts for the Alpha 21064, Alpha 21164, and Alpha 21264, with the PLL and gclk grids labeled.]


H-Trees
• Fractal structure
– Gets the clock arbitrarily close to any point
– Matched delay along all paths
• Delay variations cause skew
• A and B might see big skew
[Figure: H-tree with points A and B marked.]


Itanium 2 H-Tree
• Four levels of buffering:
– Primary driver
– Repeater
– Second-level clock buffer (SLCB)
– Gater
• Route around obstructions
[Figure: Itanium 2 H-tree with the primary buffer, repeaters, and typical SLCB locations labeled.]


Hybrid Networks
• Use an H-tree to distribute the clock to many points
• Tie these points together with a grid
• Example: IBM Power4, PowerPC
– H-tree drives 16-64 sector buffers
– Buffers drive a total of 1024 points
– All points shorted together with a grid


Skew Tolerance
• Flip-flops are sensitive to skew because of hard edges
– Data launches at the latest rising edge of the clock
– Must set up before the earliest next rising edge of the clock
– Overhead would shrink if we could soften the edge
• Latches tolerate moderate amounts of skew
– Data can arrive any time the latch is transparent


Skew: Latches
2-Phase Latches
$t_{pd} \le T_c - \underbrace{2t_{pdq}}_{\text{sequencing overhead}}$
$t_{cd1}, t_{cd2} \ge t_{hold} - t_{ccq} - t_{nonoverlap} + t_{skew}$
$t_{borrow} \le \frac{T_c}{2} - (t_{setup} + t_{nonoverlap} + t_{skew})$
[Figure: latches L1, L2, and L3 clocked by φ1, φ2, and φ1, with Combinational Logic 1 between Q1 and D2 and Combinational Logic 2 between Q2 and D3.]

Pulsed Latches
$t_{pd} \le T_c - \underbrace{\max(t_{pdq},\ t_{pcq} + t_{setup} - t_{pw} + t_{skew})}_{\text{sequencing overhead}}$
$t_{cd} \ge t_{hold} + t_{pw} - t_{ccq} + t_{skew}$
$t_{borrow} \le t_{pw} - (t_{setup} + t_{skew})$
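
The latch constraints above translate directly into a few one-line functions. This is a rough sketch under assumed values: the function names are illustrative and every delay (in picoseconds) is hypothetical. It is meant only to show that skew drops out of the two-phase latch setup budget and instead eats into the time available for borrowing.

```python
def two_phase_max_logic_delay(t_c, t_pdq):
    """t_pd <= Tc - 2*t_pdq (skew does not appear in this bound)."""
    return t_c - 2 * t_pdq


def two_phase_min_contamination_delay(t_hold, t_ccq, t_nonoverlap, t_skew):
    """t_cd1, t_cd2 >= t_hold - t_ccq - t_nonoverlap + t_skew."""
    return t_hold - t_ccq - t_nonoverlap + t_skew


def two_phase_max_borrow(t_c, t_setup, t_nonoverlap, t_skew):
    """t_borrow <= Tc/2 - (t_setup + t_nonoverlap + t_skew)."""
    return t_c / 2 - (t_setup + t_nonoverlap + t_skew)


def pulsed_max_logic_delay(t_c, t_pdq, t_pcq, t_setup, t_pw, t_skew):
    """t_pd <= Tc - max(t_pdq, t_pcq + t_setup - t_pw + t_skew)."""
    return t_c - max(t_pdq, t_pcq + t_setup - t_pw + t_skew)


def pulsed_min_contamination_delay(t_hold, t_pw, t_ccq, t_skew):
    """t_cd >= t_hold + t_pw - t_ccq + t_skew."""
    return t_hold + t_pw - t_ccq + t_skew


def pulsed_max_borrow(t_pw, t_setup, t_skew):
    """t_borrow <= t_pw - (t_setup + t_skew)."""
    return t_pw - (t_setup + t_skew)


# Hypothetical parameters in picoseconds.
print(two_phase_max_logic_delay(1000, 60))                  # logic budget, two-phase
print(two_phase_max_borrow(1000, 60, 20, 50))               # borrowing budget, two-phase
print(pulsed_max_logic_delay(1000, 60, 80, 60, 150, 50))    # logic budget, pulsed
print(pulsed_min_contamination_delay(30, 150, 50, 50))      # hold requirement, pulsed
```

In the pulsed-latch case with these assumed numbers, the pulse width is wide enough that the max() term is set by t_pdq, so the skew is hidden from the setup budget, while the hold requirement grows by the full pulse width.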



Dynamic Circuit Review
• Static circuits are slow because wide pMOS transistors load the inputs
• Dynamic gates use precharge to remove pMOS transistors from the inputs
– Precharge: φ = 0, output forced high
– Evaluate: φ = 1, output may pull low
[Figure: gate with inputs A, B, C, D, clock φ, and output Y.]


Domino Circuits
• Dynamic inputs must monotonically rise during evaluation
– Place an inverting stage between each dynamic gate
– Dynamic / static pair is called a domino gate
• Domino gates can be safely cascaded
[Figure: domino AND gate: a dynamic NAND with inputs A and B and output W, followed by a static inverter with output X.]
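
A small behavioral model can make the monotonicity rule concrete. The sketch below is an illustration only, not a circuit simulation: the class name `DominoAnd` and its `step` interface are hypothetical, and charge leakage, keepers, and delays are ignored. It models the dynamic node W as a value that is set high during precharge and can only fall during evaluation.

```python
class DominoAnd:
    """Behavioral model of a domino AND: dynamic NAND (node W) plus static inverter (X)."""

    def __init__(self):
        self.w = 1  # dynamic node W starts precharged high

    def step(self, phi, a, b):
        """Apply clock phi and inputs a, b; return the inverter output X."""
        if phi == 0:
            self.w = 1      # precharge: W forced high
        elif a and b:
            self.w = 0      # evaluate: pulldown stack conducts, W discharges
        # Otherwise W holds its value: once discharged, it cannot
        # recover until the next precharge.
        return 1 - self.w   # static inverter


gate = DominoAnd()
print(gate.step(0, 1, 1))  # precharge: X = 0
print(gate.step(1, 1, 1))  # evaluate with A = B = 1: X rises to 1
print(gate.step(1, 0, 1))  # A falls during evaluate: X stays 1
```

The last call shows why a falling input during evaluation is harmful: the gate cannot undo the discharge, so a glitch on an input would leave a wrong result. Because X is low during precharge and can only rise during evaluation, it is itself a monotonically rising signal and can drive another dynamic gate.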



Domino Timing
• Domino gates are 1.5 to 2x faster than static CMOS
– Lower logical effort because of reduced Cin
• Challenge is to keep precharge off the critical path
• Look at clocking schemes for precharge and evaluation
– Traditional schemes have severe overhead
– Skew-tolerant domino hides this overhead


Traditional Domino Circuits
• Hide precharge time by ping-ponging between half-cycles
– One evaluates while the other precharges
– Latches hold results during precharge
$t_{pd} = T_c - 2t_{pdq}$
[Figure: one cycle Tc of traditional domino logic. Each half-cycle is a chain of alternating dynamic and static gates ending in a latch, with the two halves on opposite clock phases; the latch delay tpdq is marked at each latch.]


Clock Skew
• Skew increases sequencing overhead
– Traditional domino has hard edges
– Evaluate at the latest rising edge
– Set up at the latch by the earliest falling edge
$t_{pd} = T_c - 2t_{setup} - 2t_{skew}$
[Figure: traditional domino half-cycles (alternating dynamic and static gates ending in a latch) with tsetup and tskew marked on the clock waveforms.]


Time Borrowing
• Logic may not exactly fit a half-cycle
– No flexibility to borrow time to balance logic between half-cycles
• Traditional domino sequencing overhead is about 25% of cycle time in fast systems!
[Figure: traditional domino half-cycles (alternating dynamic and static gates ending in a latch) with tsetup and tskew marked on the clock waveforms.]
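
To get a feel for the size of this overhead, the hard-edge relation for traditional domino (time for logic = Tc - 2 tsetup - 2 tskew) can be evaluated for sample numbers. The values below are hypothetical and were picked so that the result lands at 25%; they are not taken from the slides, and the sketch ignores the additional loss from logic that does not balance evenly between half-cycles.

```python
def traditional_domino_logic_time(t_c, t_setup, t_skew):
    """Time left for logic with hard edges: Tc - 2*t_setup - 2*t_skew."""
    return t_c - 2 * t_setup - 2 * t_skew


def overhead_fraction(t_c, t_setup, t_skew):
    """Fraction of the cycle lost to latch setup time and clock skew."""
    lost = t_c - traditional_domino_logic_time(t_c, t_setup, t_skew)
    return lost / t_c


# Hypothetical values in picoseconds.
t_c, t_setup, t_skew = 1000, 50, 75
print(traditional_domino_logic_time(t_c, t_setup, t_skew))  # time left for logic
print(overhead_fraction(t_c, t_setup, t_skew))              # fraction of cycle lost
```

Because setup and skew are each paid twice per cycle, even modest per-edge penalties add up quickly as the cycle time shrinks.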



Relaxing the Timing
• Sequencing overhead is caused by hard edges
– Data departs the dynamic gate on the late rising edge
– Must set up at the latch on the early falling edge
• Latch functions
– Prevent glitches on inputs of domino gates
– Hold results during precharge
• Is the latch really necessary?
– No glitches if inputs come from other domino gates
– Can we hold the results in another way?


Skew-Tolerant Domino
• Use overlapping clocks to eliminate latches at phase boundaries
– Second phase evaluates using results of the first
[Figure: two dynamic/static gate pairs clocked by φ1 and φ2, with nodes a, b, c, d and no latch at the phase boundary; waveforms are shown for φ1, φ2, a, b, and c.]


Full Keeper
• After the second phase evaluates, the first phase precharges
• Input to the second phase falls
– Violates monotonicity?
• But we no longer need the value
• Now the second gate has a floating output
– Need a full keeper to hold it either high or low
[Figure: dynamic gate with clock φ, pulldown network f, and nodes X and H, with a full keeper built from weak transistors.]


Time Borrowing
• Overlap can be used to
– Tolerate clock skew
– Permit time borrowing
• No sequencing overhead
$t_{pd} = T_c$
[Figure: overlapping clocks φ1 and φ2, with toverlap split into tborrow and tskew. Phase 1 and Phase 2 are each a chain of alternating dynamic and static gates with no latches between them.]


Multiple Phases
• With more clock phases, each phase overlaps more
– Permits more skew tolerance and time borrowing
[Figure: four overlapping clocks φ1, φ2, φ3, φ4 driving Phase 1 through Phase 4, each phase a pair of dynamic/static gate stages.]
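
The effect of adding phases can be estimated with simple waveform arithmetic. This sketch rests on assumptions that are not stated on the slides: the N phases are evenly spaced by Tc/N and share the same duty cycle, so two consecutive phases are both high for duty * Tc - Tc/N. It also follows the earlier timing diagram in splitting the overlap only between skew and borrowing, ignoring hold time. Function names and numbers are hypothetical.

```python
def phase_overlap(t_c, n_phases, duty=0.5):
    """Time during which two consecutive phases are both high.

    Assumes evenly spaced phases (spacing Tc / n_phases) that all have
    the same duty cycle.
    """
    return duty * t_c - t_c / n_phases


def max_borrow(t_overlap, t_skew):
    """Overlap left for time borrowing once skew is budgeted (hold time ignored)."""
    return t_overlap - t_skew


T_C = 1000    # hypothetical cycle time in picoseconds
T_SKEW = 100  # hypothetical skew budget in picoseconds

for n in (2, 3, 4, 6):
    overlap = phase_overlap(T_C, n)
    print(f"{n} phases: overlap={overlap:.1f} ps, "
          f"borrow budget={max_borrow(overlap, T_SKEW):.1f} ps")
```

Under these assumptions, two phases at 50% duty cycle have no overlap at all, so a two-phase system would need a duty cycle above 50%, while four phases overlap by a quarter of the cycle.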



Clock Generation
[Figure: clock generator producing φ1, φ2, φ3, and φ4 from clk and an enable signal en.]


Summary
• Clock skew effectively increases setup and hold times in systems with hard edges
• Managing skew
– Reduce: good clock distribution network
– Analyze: local vs. global skew
– Tolerate: use systems with soft edges
• Flip-flops and traditional domino are costly
• Latches and skew-tolerant domino perform at full speed even with moderate clock skews
