VLSI Timing

EECS150 - Digital Design
Lecture 13 - Circuit Timing

Feb 28, 2012
John Wawrzynek
Spring 2012
EECS150 - Lec13-timing1
Performance, Cost, Power
How do we measure performance?

operations/sec? cycles/sec?
Performance is directly proportional to clock frequency.
Although it may not be the entire story:
Ex: CPU execution time
= # instructions × CPI × clock period
(performance is the inverse of execution time)
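As a rough numeric illustration of the product above, the sketch below uses a made-up workload (the instruction count, CPI values, and clock rates are assumptions, not figures from the lecture). It shows that doubling the clock rate halves execution time only if CPI stays the same.

```python
def cpu_time_seconds(instruction_count, cpi, clock_hz):
    """Execution time = # instructions x CPI x clock period."""
    clock_period = 1.0 / clock_hz
    return instruction_count * cpi * clock_period

# Hypothetical workload: 1e9 instructions.
base = cpu_time_seconds(1e9, 1.5, 100e6)        # 100 MHz, CPI 1.5
faster_clock = cpu_time_seconds(1e9, 1.5, 200e6)  # 200 MHz, same CPI
worse_cpi = cpu_time_seconds(1e9, 3.0, 200e6)     # 200 MHz, but CPI doubled

print(base, faster_clock, worse_cpi)
```

The third case is why clock frequency alone is not the entire story: a faster clock bought with a higher CPI may give no speedup at all.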
Timing Analysis
What is the smallest T that produces correct operation?
T ≥ time(clk→Q) + …
T ≥ τ_clk→Q + …
Clock frequency and period:
1 MHz: 1 µs
10 MHz: 100 ns
100 MHz: 10 ns
1 GHz: 1 ns
[Figure: ARM processor microarchitecture. Fig. 2. Microprocessor pipeline organization. From IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 11, NOVEMBER 2001.]
Timing Analysis and Logic Delay
Output cannot change instantaneously at the trigger clock edge.
Similar to delay in logic gates, two components:
- Internal Clock-to-Q
- Load-dependent Clock-to-Q

Clocking Methodology
[Figure: storage elements clocked by Clk, separated by combinational logic.]
All storage elements are clocked by the same clock edge.
The combinational logic blocks:
- Inputs are updated at each clock tick
- All outputs MUST be stable before the next clock tick

Critical Path & Cycle Time
[Figure: storage elements clocked by Clk, separated by combinational logic.]
Critical path: the slowest path between any two storage devices
Cycle time is a function of the critical path
must be greater than:
Clock-to-Q + Longest Path through Combinational Logic + Setup
If T > worst-case delay through CL, does this ensure correct operation?
(Clocking slides from CS152 / Kubiatowicz, Lec3.9 to Lec3.12, 1/28/04, UCB Spring 2004.)
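A minimal sketch of the cycle-time bound stated above (Clock-to-Q + longest path through combinational logic + setup). The delay values are hypothetical and only illustrate the arithmetic.

```python
def min_clock_period_ns(t_clk_to_q, t_longest_logic, t_setup):
    """Smallest clock period that lets data launched at one edge
    be captured correctly at the next edge (all values in ns)."""
    return t_clk_to_q + t_longest_logic + t_setup

# Hypothetical delays in nanoseconds.
t_min = min_clock_period_ns(t_clk_to_q=0.5, t_longest_logic=3.0, t_setup=0.5)
f_max_mhz = 1e3 / t_min  # a period in ns corresponds to 1000/period MHz

print(t_min, f_max_mhz)
```

This also answers the question on the slide: a period that only covers the worst-case logic delay (3.0 ns here) is not enough, because clock-to-Q and setup time must fit in the same cycle.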
Limitations on Clock Rate

1 Logic Gate Delay
What are typical delay values?
2 Delays in flip-flops
Both times contribute to limiting the clock period.
What must happen in one clock cycle for correct operation?
All signals connected to FF (or memory) inputs must be ready and set up before the rising edge of the clock.
For now we assume perfect clock distribution (all flip-flops see the clock at the same time).
Example
Parallel-to-serial converter circuit
T ≥ time(clk→Q) + time(mux) + time(setup)
T ≥ τ_clk→Q + τ_mux + τ_setup
In General ...
For correct operation:
T ≥ τ_clk→Q + τ_CL + τ_setup
for all paths.
How do we enumerate all paths?
Any circuit input or register output to any register input or circuit output?
Note:
- setup time for outputs is a function of what it connects to.
- clk-to-q for circuit inputs depends on where it comes from.
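One way to picture the enumeration is a walk over the fan-out graph from every start point (circuit input or register output) to every end point (register input or circuit output). The netlist, node names, and delays below are hypothetical. The sketch assumes acyclic combinational logic, and for simplicity it uses one clk-to-Q value and one setup value for every path, although, as noted above, both depend on what the input comes from and what the output connects to.

```python
CLK_TO_Q_PS = 300          # assumed, same for every start point
SETUP_PS = 200             # assumed, same for every end point
GATE_DELAY_PS = {"add": 1200, "mux": 400}

# Hypothetical netlist: node -> nodes it drives.
FANOUT = {
    "regA.Q": ["add"],
    "regB.Q": ["add", "mux"],
    "in0": ["mux"],
    "add": ["mux", "out0"],
    "mux": ["regC.D"],
}
STARTS = ["regA.Q", "regB.Q", "in0"]
ENDS = {"regC.D", "out0"}

def enumerate_paths(fanout, starts, ends):
    """Yield every path (list of node names) from a start to an end point."""
    def walk(node, path):
        path = path + [node]
        if node in ends:
            yield path
            return
        for nxt in fanout.get(node, []):
            yield from walk(nxt, path)
    for start in starts:
        yield from walk(start, [])

def path_delay_ps(path):
    """Sum the delays of the gates on the path; endpoints add no delay."""
    return sum(GATE_DELAY_PS.get(node, 0) for node in path)

paths = list(enumerate_paths(FANOUT, STARTS, ENDS))
worst = max(paths, key=path_delay_ps)
t_min_ps = CLK_TO_Q_PS + path_delay_ps(worst) + SETUP_PS

print(len(paths), worst, t_min_ps)
```

Exhaustive enumeration grows quickly with circuit size, which is one reason timing tools are used for real designs.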
Basic Components: CMOS Inverter
[Figure: CMOS inverter symbol and circuit. A PMOS transistor connects Vdd to Out and an NMOS transistor connects Out to GND; both gates are driven by In. Vdd = 5V, GND = 0V.]
Inverter Operation
[Figure: inverter operation. Vin and Vout for the two input values; one transistor is open while the other charges or discharges the output.]
CL Delay: Transistors as water valves
If electrons are water molecules, and a capacitor a bucket ...
An "on" p-FET fills up the capacitor with charge.
An "on" n-FET empties the bucket.
[Figure: water level vs. time while the bucket fills and empties.]
This model is often good enough ...
Transistors as Conductors
Improved Transistor Model:
[Figure: nFET and pFET modeled as conductors.]
We refer to transistor "strength" as the amount of current that flows for a given Vds and Vgs.
The strength is linearly proportional to the ratio of W/L.
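The bucket picture and the W/L rule can be combined in a simple first-order RC sketch. This is an assumed model, not one stated on the slides: the "on" transistor is treated as a resistor whose value is inversely proportional to W/L, charging a load capacitor from 0 V toward Vdd. The resistance and capacitance values are hypothetical.

```python
import math

VDD = 5.0  # volts, as on the inverter slide

def effective_resistance(r_unit_ohms, w_over_l):
    """Assumed model: a stronger (wider) transistor looks like a smaller resistor."""
    return r_unit_ohms / w_over_l

def v_out(t_seconds, r_ohms, c_farads):
    """Capacitor voltage while charging from 0 V toward VDD through R."""
    return VDD * (1.0 - math.exp(-t_seconds / (r_ohms * c_farads)))

def time_to_half_vdd(r_ohms, c_farads):
    """Time for the output to cross VDD/2: t = R * C * ln(2)."""
    return r_ohms * c_farads * math.log(2.0)

R_UNIT = 10e3    # hypothetical resistance of a W/L = 1 transistor, ohms
C_LOAD = 10e-15  # hypothetical load capacitance, farads

t_narrow = time_to_half_vdd(effective_resistance(R_UNIT, 1.0), C_LOAD)
t_wide = time_to_half_vdd(effective_resistance(R_UNIT, 2.0), C_LOAD)

print(t_narrow, t_wide)
```

Under these assumptions, doubling W/L halves the time the output needs to reach the switching threshold of the next gate.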
Gate Delay is the Result of Cascading
Cascaded gates:
transfer curve for inverter.

Delay in Flip-flops
Setup time results from delay through first latch.
Clock to Q delay results from delay through second latch.
[Figure: flip-flop built from two clocked latches.]
Wire Delay
Ideally, wires behave as
transmission lines:
signal wave-front moves close
to the speed of light
~1ft/ns
Time from source to
destination is called the
transit time.
In ICs most wires are short,
and the transit times are
relatively short compared to
the clock period and can be
ignored.
Not so on PC boards.
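A quick estimate of transit time relative to the clock period, using the ~1 ft/ns figure above. The wire lengths and the 100 MHz clock are hypothetical choices for illustration.

```python
SPEED_FT_PER_NS = 1.0  # roughly the speed of light, per the slide
MM_PER_FT = 304.8

def transit_time_ns(length_mm):
    """Ideal transmission-line transit time for a wire of the given length."""
    return (length_mm / MM_PER_FT) / SPEED_FT_PER_NS

def fraction_of_period(length_mm, clock_hz):
    """Transit time as a fraction of one clock period."""
    period_ns = 1e9 / clock_hz
    return transit_time_ns(length_mm) / period_ns

chip_wire = fraction_of_period(1.0, 100e6)      # 1 mm on-chip wire
board_trace = fraction_of_period(304.8, 100e6)  # 1 ft PC board trace

print(chip_wire, board_trace)
```

With these numbers the on-chip wire uses a negligible part of the period, while the board trace uses about a tenth of it.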
Wire Delay
Even in those cases where the transmission line effect is negligible:
Wires possess distributed resistance and capacitance.
Time constant associated with distributed RC is proportional to the square of the length.
[Figure: distributed RC wire with taps v1, v2, v3, v4, and the waveforms at each tap over time.]
For short wires on ICs, resistance is insignificant (relative to effective R of transistors), but C is important. Typically around half of C of gate load is in the wires.
For long wires on ICs:
- busses, clock lines, global control signals, etc.
- Resistance is significant, therefore distributed RC effect dominates.
- signals are typically rebuffered to reduce delay.
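Why rebuffering helps follows from the square-law dependence on length. The sketch below assumes a simple model that is not from the slides: an unbuffered wire has delay k·L², and splitting it into n equal segments with n−1 repeaters gives n·k·(L/n)² plus the repeater delays. The constants are hypothetical.

```python
K_NS_PER_MM2 = 0.01   # hypothetical distributed-RC constant, ns per mm^2
T_BUFFER_NS = 0.1     # hypothetical delay of one repeater, ns

def wire_delay_ns(length_mm, segments=1):
    """Delay of a wire split into equal segments, with a repeater between segments."""
    segment_length = length_mm / segments
    rc_delay = segments * K_NS_PER_MM2 * segment_length ** 2
    buffer_delay = (segments - 1) * T_BUFFER_NS
    return rc_delay + buffer_delay

LENGTH_MM = 20.0
unbuffered = wire_delay_ns(LENGTH_MM)
four_segments = wire_delay_ns(LENGTH_MM, segments=4)
best_n = min(range(1, 11), key=lambda n: wire_delay_ns(LENGTH_MM, n))

print(unbuffered, four_segments, best_n)
```

Adding repeaters pays off only up to a point: each one shortens the quadratic term but adds its own delay, so there is a best segment count for a given wire.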
Delay and Fan-out

[Figure: gate 1 driving gates 2 and 3.]
The delay of a gate is proportional to its output capacitance.
Connecting the output of a gate to more than one other gate increases its output capacitance. It takes increasingly longer for the output of a gate to reach the switching threshold of the gates it drives as we add more output connections.
Driving wires also contributes to fan-out delay.
What can be done to remedy this problem in large fan-out situations?
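One common remedy is to insert buffers so that no single gate drives all the loads. The sketch below assumes a linear delay model (internal delay plus a load-dependent term per driven input) with hypothetical constants, and assumes each buffer behaves like the driving gate and presents one unit of load.

```python
T_INTERNAL_PS = 100  # hypothetical internal gate delay, ps
T_PER_LOAD_PS = 50   # hypothetical added delay per driven gate input, ps

def gate_delay_ps(fanout):
    """Delay grows linearly with the number of inputs the gate drives."""
    return T_INTERNAL_PS + T_PER_LOAD_PS * fanout

def buffered_delay_ps(total_loads, loads_per_buffer):
    """Two-level drive: the gate drives buffers, each buffer drives some loads."""
    n_buffers = -(-total_loads // loads_per_buffer)  # ceiling division
    return gate_delay_ps(n_buffers) + gate_delay_ps(loads_per_buffer)

direct = gate_delay_ps(16)            # one gate drives all 16 loads
buffered = buffered_delay_ps(16, 4)   # gate -> 4 buffers -> 4 loads each

print(direct, buffered)
```

With these numbers the extra buffer stage costs its own internal delay but still shortens the total, because each stage sees a much smaller load.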

Critical Path
Critical Path: the path in the entire design with the maximum
delay.
This could be from state element to state element, or from
input to state element, or state element to output, or from input
to output (unregistered paths).
For example, what is the critical path in this circuit?
Why do we care about the critical path?
Searching for processor critical path
Must consider all connected register pairs, paths from input to register, register to output. Don't forget the controller.
Design tools help in the search.
- Synthesis tools report delays on paths,
- Special static timing analyzers accept a design netlist and report path delays,
- and, of course, simulators can be used to determine timing performance.
Tools that are expected to do something about the timing behavior (such as synthesizers), also include provisions for specifying input arrival times (relative to the clock), and output requirements (set-up times of next stage).
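The role of input arrival times and output requirements can be sketched as a slack calculation per path. This is an illustrative model, not the interface of any particular tool; the path names and all delay values (in ps) are hypothetical.

```python
def slack_ps(clock_period, arrival_at_start, logic_delay, required_before_edge):
    """Slack = time available minus time used on one path (all values in ps).

    arrival_at_start: clk-to-Q for a register, or the input arrival time
                      relative to the clock for a circuit input.
    required_before_edge: setup time of the capturing register, or the
                          set-up requirement of the next stage for an output.
    """
    arrival_at_end = arrival_at_start + logic_delay
    required_time = clock_period - required_before_edge
    return required_time - arrival_at_end

T_PS = 2000
slacks = {
    "in0 -> regA": slack_ps(T_PS, 500, 1200, 200),
    "regA -> regB": slack_ps(T_PS, 300, 1600, 200),
    "regB -> out0": slack_ps(T_PS, 300, 900, 400),
}
worst_path = min(slacks, key=slacks.get)

print(slacks, worst_path)
```

A negative slack marks a path that does not meet timing at the chosen period; the path with the smallest slack is the critical one.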
Real Stuff: Timing Analysis
Most paths have hundreds of picoseconds to spare.
[Figure 26: Histogram of the POWER4 processor path delays. x-axis: Timing slack (ps), −40 to 280; y-axis: Late-mode timing checks (thousands), 0 to 200. Annotation: "The critical path".]
From "The circuit and physical design of the POWER4 microprocessor", IBM J Res and Dev, 46:1, Jan 2002, J.D. Warnock et al.
Clock Skew
[Figure: clock skew, delay in distribution.]
Unequal delay in distribution of the clock signal to various parts of a circuit:
- if not accounted for, can lead to erroneous behavior.
- Comes about because:
  - clock wires have delay,
  - circuit is designed with a different number of clock buffers from the clock source to the various clock loads, or
  - buffers have unequal delay.
- All synchronous circuits experience some clock skew:
  - more of an issue for high-performance designs operating with very little extra time per clock cycle.
Clock Skew (cont.)
[Figure: two registers separated by combinational logic (CL), with skewed CLK.]
If clock period T = T_CL + T_setup + T_clkQ, circuit will fail.
Therefore:
1. Control clock skew
a) Careful clock distribution. Equalize path delay from clock source to all clock loads by controlling wire delay and buffer delay.
b) Don't gate clocks in a non-uniform way.
2. T ≥ T_CL + T_setup + T_clkQ + worst-case skew.
Most modern large high-performance chips (microprocessors) control end-to-end clock skew to a small fraction of the clock period.
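A small numeric check of the skew-adjusted bound above. The delay and skew values (in ps) are hypothetical.

```python
def min_period_ps(t_cl, t_setup, t_clk_q, worst_skew=0):
    """Minimum clock period including a worst-case skew margin."""
    return t_cl + t_setup + t_clk_q + worst_skew

def meets_timing(period, t_cl, t_setup, t_clk_q, worst_skew=0):
    """True if the chosen period satisfies the bound."""
    return period >= min_period_ps(t_cl, t_setup, t_clk_q, worst_skew)

T_CL, T_SETUP, T_CLK_Q, SKEW = 1500, 200, 300, 100  # hypothetical, in ps

no_margin = min_period_ps(T_CL, T_SETUP, T_CLK_Q)          # ignores skew
with_margin = min_period_ps(T_CL, T_SETUP, T_CLK_Q, SKEW)  # includes skew

print(no_margin, with_margin)
print(meets_timing(no_margin, T_CL, T_SETUP, T_CLK_Q, SKEW))
print(meets_timing(with_margin, T_CL, T_SETUP, T_CLK_Q, SKEW))
```

A period chosen with no skew margin fails as soon as any skew works against the path, which is the failure case described on the slide.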
Clock Skew (cont.)

[Figure: the same circuit with the clock buffer reversed.]
Note reversed buffer.
In this case, clock skew actually provides extra time (adds to the effective clock period).
This effect has been used to help run circuits at higher clock rates. Risky business!
Real Stuff: Floorplanning Intel XScale 80200
Clock Tree Delays, IBM Power CPU
[Figure: clock tree delays across the chip: buffer level 1, buffer level 2, sector buffers, tuned sector trees, grid.]
[Figure 9: Global clock waveforms showing 20 ps of measured skew. Axes: Volts (V) vs. Time (ps).]
