
Lecture 8: Latch and Flip-Flop Design

Slides originally from:
Vladimir Stojanovic & Vojin G. Oklobdzija

Computer Systems Laboratory
Stanford University
horowitz@stanford.edu

4/24/02 EE371 1
Outline

• Recent interest in latches and flip-flops
• Timing and power metrics
• Design and optimization tradeoffs
• Master-slave vs. pulse-triggered latches
• Representative designs
• Comparison

Recent Interest in Flip-Flops
• Trends in high-performance systems
  - Higher clock frequency
  - More transistors on chip
• Consequences
  - Increased flip-flop overhead relative to cycle time
    • Cycle time 10-20 FO4 delays, flop overhead 2-4 FO4
  - Difficult to control both edges of the clock
  - Higher impact of clock skew
  - Higher crosstalk and substrate coupling
  - Higher power consumption
    • Expensive packages and cooling systems
    • Limits performance
  - Clock burns up to 40%, flops up to 20% of total power
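The FO4 ranges above translate directly into the fraction of each cycle lost to the flop. A minimal Python sketch; the function name `overhead_fraction` is hypothetical, and only the 10-20 FO4 cycle and 2-4 FO4 overhead figures come from this slide:

```python
def overhead_fraction(cycle_fo4: float, flop_overhead_fo4: float) -> float:
    """Return flop overhead as a fraction of the cycle time (both in FO4 delays)."""
    if cycle_fo4 <= 0:
        raise ValueError("cycle time must be positive")
    return flop_overhead_fo4 / cycle_fo4


# Corners of the ranges quoted on the slide.
best = overhead_fraction(20, 2)   # long cycle, fast flop
worst = overhead_fraction(10, 4)  # short cycle, slow flop
print(f"flop overhead: {best:.0%} to {worst:.0%} of the cycle")
```

With these corners the flop consumes between 10% and 40% of the cycle, which is why its overhead weighs more as cycle times shrink.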

Requirements for Flip-Flop Design

• Small Clk-Output delay, narrow sampling window
• Low power
• Small clock load
• High driving capability (increased levels of parallelism)
  - Typical flip-flop load in a 0.18µm CMOS process ranges from 50fF to over 200fF, with typical values of 100-150fF in critical paths
• Integration of logic into the flop
• Multiplexed or clock scan
• Crosstalk insensitivity
  - dynamic/high-impedance nodes are affected

Flip-Flop Delay
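• The sum of setup time and Clk-output delay is the only true measure of flip-flop performance with respect to system speed
• $T = T_{Clk\text{-}Q} + T_{Logic} + T_{setup} + T_{skew}$

[Figure: a flip-flop (D, Q, Clk) drives logic into a second flip-flop (D, Q, Clk); the cycle is split into TClk-Q, TLogic and TSetup]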
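The cycle-time equation can be used in either direction: to find the minimum period for a given logic depth, or the logic budget left in a fixed period. A small Python sketch; the function names and the example numbers (1000 ps period, 150 ps Clk-Q, 50 ps setup, 50 ps skew) are hypothetical:

```python
def min_cycle_time(t_clk_q: float, t_logic: float, t_setup: float, t_skew: float) -> float:
    """T = TClk-Q + TLogic + Tsetup + Tskew (all in ps)."""
    return t_clk_q + t_logic + t_setup + t_skew


def logic_budget(period: float, t_clk_q: float, t_setup: float, t_skew: float) -> float:
    """Time left for logic once flop overhead (Clk-Q + setup) and skew are paid."""
    return period - (t_clk_q + t_setup + t_skew)


# Hypothetical numbers: 1 GHz clock, 150 ps Clk-Q, 50 ps setup, 50 ps skew.
budget = logic_budget(1000.0, 150.0, 50.0, 50.0)
print(f"logic budget: {budget:.0f} ps")
```

Only the sum TClk-Q + Tsetup enters the budget, so a flop that shortens Clk-Q by lengthening its setup time gains nothing; this is why the sum is treated as the true performance measure.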


Delay vs. Setup/Hold Times
[Figure: Clk-Output delay [ps] vs. Data-Clk [ps] (-200 to 200 ps), marking the setup time, the hold time, the sampling window between them, and the minimum Data-Output point]

Timing parameters, details
[Figure: delay [ps] vs. D-Clk delay [ps] (-80 to 100 ps) showing the D-Q and Clk-Q curves, the failure region, the unstable and stable Clk-Q regions, the minimum D-Q point and the optimum setup time]

The best point to pick on the delay curve is the minimum D-Q delay.
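Picking the minimum D-Q point can be sketched numerically: sweep the D-Clk delay, add the resulting Clk-Q delay, and keep the minimum. The Clk-Q curve below is a made-up exponential model, not data from the plots above, and the sign convention (positive D-Clk means data arrives before the clock edge) is an assumption of this sketch:

```python
import math

# Hypothetical Clk-Q model (NOT measured data): Clk-Q grows sharply as data
# arrives closer to the end of the capture window. t_dc is the D-Clk delay in
# ps; by the convention assumed here, t_dc > 0 means data arrives before the edge.
T_CQ0 = 250.0   # ps, Clk-Q delay when data arrives early (assumed)
A = 100.0       # ps, size of the degradation term (assumed)
TAU = 15.0      # ps, how quickly Clk-Q degrades (assumed)
T_FAIL = -40.0  # ps, edge of the failure region (assumed)


def clk_q(t_dc: float) -> float:
    return T_CQ0 + A * math.exp(-(t_dc - T_FAIL) / TAU)


def d_q(t_dc: float) -> float:
    # Delay seen by the data: time waiting before the edge plus Clk-Q.
    return t_dc + clk_q(t_dc)


def optimum_setup(lo: float = -39.0, hi: float = 100.0, step: float = 0.1):
    """Sweep the D-Clk delay (starting just outside the failure region) and
    return (t_dc, D-Q) at the minimum D-Q point."""
    n = int(round((hi - lo) / step))
    points = [lo + i * step for i in range(n + 1)]
    best = min(points, key=d_q)
    return best, d_q(best)


t_opt, dq_min = optimum_setup()
print(f"optimum setup {t_opt:.1f} ps, minimum D-Q {dq_min:.1f} ps")
```

At the optimum, the D-Q delay equals Clk-Q plus the (in this model negative) setup time, the same sum used for maximum-delay analysis. Moving data later saves setup time but costs more in Clk-Q; moving it earlier wastes time waiting for the edge.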

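Types of State Elements

[Figure: master-slave latch (two latches L1 and L2 in series, with Data and Clk inputs) vs. pulse-triggered latch (a single latch L with Data and Clk inputs; alternatively an S-R latch with inputs S and R and output Q)]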

Master-Slave Latches
• Positive setup times
• Two clock phases:
  - distributed globally
  - generated locally
• Small delay penalty for incorporating a MUX
• Some circuit tricks are needed to reduce the overall delay
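A zero-delay behavioral model shows how two latches clocked on opposite phases behave as an edge-triggered flip-flop. A Python sketch under simplifying assumptions (ideal non-overlapping phases, no setup or hold modeling); the class names are hypothetical:

```python
class Latch:
    """Level-sensitive D latch: transparent while `enable` is true, holds otherwise."""

    def __init__(self) -> None:
        self.q = 0

    def update(self, d: int, enable: bool) -> int:
        if enable:
            self.q = d
        return self.q


class MasterSlaveFF:
    """Positive-edge flip-flop from two latches on opposite clock phases.

    L1 (master) is transparent while clk is low; L2 (slave) is transparent
    while clk is high, so Q can only change right after the rising edge.
    """

    def __init__(self) -> None:
        self.master = Latch()
        self.slave = Latch()

    def step(self, d: int, clk: int) -> int:
        m = self.master.update(d, enable=(clk == 0))
        return self.slave.update(m, enable=(clk == 1))


ff = MasterSlaveFF()
trace = [(1, 0), (1, 1), (0, 1), (0, 0), (0, 1), (1, 1)]  # (d, clk) samples
print([ff.step(d, clk) for d, clk in trace])  # [0, 1, 1, 1, 0, 0]
```

Data changes while the clock is high (third and last samples) are ignored. Data must pass through the master before it closes at the rising edge, which is the source of the positive setup time noted above.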

T-G Master-Slave Latch
• PowerPC 603 (Gerosa, JSSC 12/94)

[Schematic: transmission-gate master-slave latch, input D, output Q, clocked by Clk and Clkb]

T-G Master-Slave Latch

• Low-power feedback
• Unbuffered input
  - input capacitance depends on the clock phase
  - overshoot and undershoot with long routes
  - wire length at the input must be restricted
• High clock load
• Low power
• Small Clk-Q delay, but positive setup time
• Easily embedded scan or mux

C2MOS MS Latches
Y. Suzuki, “Clocked CMOS Calculator Circuitry”, IEEE J. Solid-State Circuits, Dec. 1973
[Schematic: clocked-CMOS (C2MOS) master-slave latch, input D, output Q, clocked by Ck and Ckb, with Ck/Ckb generated locally from Clk]

• Low-power feedback
• Locally generated second phase
• Poor driving capability
• Robustness to clock slope

Single-Transistor-Clocked MS latches
[Schematics: DSTC and SSTC master-slave latches, input D, output Q, single clock Clk]

• Yuan and Svensson, JSSC Jan. '97
• Ratioed DCVS- and SRPL-based designs
• Relatively small clock load
• Very sensitive to input glitching
• Speed and power problems related to capacitive coupling and charge sharing
Pulse-Triggered Latches

• First stage is a pulse generator
  - generates a pulse (glitch) on the rising edge of the clock
• Second stage is a latch
  - captures the pulse generated in the first stage
• Pulse generation results in a negative setup time
• Frequently exhibit a soft-edge property
• Must check for hold-time violations

Note: power is always consumed in the clocked pulse generator
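The negative setup time and the hold-time concern both follow from the transparency window created by the pulse. A timing-only Python sketch; the class, its parameters, and the example values (pulse starting 30 ps after the edge, 60 ps wide, 40 ps internal delay) are hypothetical, and the model collapses the sampling window to a single point:

```python
from dataclasses import dataclass


@dataclass
class PulsedLatch:
    """Timing sketch of a pulse-triggered latch (ps, rising clock edge at t = 0).

    The pulse generator keeps the latch transparent from `pulse_start` to
    `pulse_start + pulse_width`; `latch_delay` is the D-to-storage delay.
    All parameters are assumed, not taken from a real design.
    """
    pulse_start: float
    pulse_width: float
    latch_delay: float

    @property
    def cutoff(self) -> float:
        # Latest data arrival that still reaches storage before the latch closes.
        return self.pulse_start + self.pulse_width - self.latch_delay

    @property
    def setup_time(self) -> float:
        # Negative: data may arrive up to `cutoff` ps AFTER the clock edge.
        return -self.cutoff

    @property
    def hold_time(self) -> float:
        # Next-cycle data must not arrive before the cutoff, or it overwrites the sample.
        return self.cutoff

    def captures(self, t_data: float) -> bool:
        """True if a data transition at t_data is stored in this cycle."""
        return t_data <= self.cutoff


latch = PulsedLatch(pulse_start=30.0, pulse_width=60.0, latch_delay=40.0)
print(latch.setup_time, latch.hold_time)           # -50.0 50.0
print(latch.captures(20.0), latch.captures(60.0))  # True False
```

Widening the pulse buys a more negative setup time (more skew absorption) at the cost of an equally longer hold time, so short paths must be checked.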

Hybrid Latch Flip-Flop (H. Partovi, ISSCC’96)
[Schematic: pulse generator (inputs D and Clk) feeding a second-stage latch with output Q; waveforms of the signal at node X for D=1 and D=0]

HLFF – pulse generation
[Schematic: HLFF pulse generator and second-stage latch with keepers, inputs Data and Clk; waveforms of the signal at node X for D=1 and D=0]

HLFF Operation
• 1-0 and 0-1 input transitions with 0 ps setup time

Hybrid Latch Flip-Flop
Skew absorption

Partovi et al., ISSCC'96


Hybrid Latch Flip-Flop
• Flip-flop features:
  - single-phase clock
  - edge triggered, on one clock edge
• Latch features: soft clock edge property
  - brief transparency window, equal to 3 inverter delays
  - negative setup time
  - allows slack passing
  - absorbs skew
• Hold time is comparable to the HLFF delay
  - minimum delay between flip-flops must be controlled
• Fully static
• Possible to incorporate logic
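Because the hold time is comparable to the flip-flop delay, short paths between HLFFs must be checked explicitly. A Python sketch; the function names and the numbers (20 ps inverter delay, 80 ps minimum Clk-Q, 30 ps skew) are hypothetical, and taking the 3-inverter transparency window as the hold time is a simplifying assumption:

```python
def transparency_window(t_inv: float, stages: int = 3) -> float:
    """HLFF transparency window: about `stages` inverter delays (ps)."""
    return stages * t_inv


def hold_slack(t_clk_q_min: float, t_logic_min: float, t_hold: float, t_skew: float) -> float:
    """Positive slack means the fastest path arrives after the receiver's hold window."""
    return (t_clk_q_min + t_logic_min) - (t_hold + t_skew)


def padding_needed(t_clk_q_min: float, t_logic_min: float, t_hold: float, t_skew: float) -> float:
    """Delay (ps) to add to the short path; 0 if the hold check already passes."""
    return max(0.0, -hold_slack(t_clk_q_min, t_logic_min, t_hold, t_skew))


# Assumption: hold time taken equal to the transparency window.
t_hold = transparency_window(t_inv=20.0)  # 60 ps
# Back-to-back flops with no logic between them:
print(padding_needed(80.0, 0.0, t_hold, 30.0))  # 10.0
```

A path with enough logic between the flops needs no padding; the check only bites for direct flop-to-flop connections and very short logic paths.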

Semi-Dynamic Flip-Flop (SDFF)
• Sun UltraSPARC III, Klass, VLSI Circuits'98

[Schematic: SDFF with a precharged first stage and an output latch, clocked by Clk, output Q]

• Soft edge conditioned by data, since the first stage is precharged; a cross-coupled latch is added for robustness
• Small penalty for adding logic
• The latch has one fewer transistor in the stack, so it is faster than HLFF, but a 1-1 glitch exists
Sense-amplifier-based flip-flop
Madden & Bowhill, 1990, Matsui et al. 1994.
DEC Alpha 21264, StrongARM 110
• First stage is a sense amplifier
• On the rising clock edge, a monotonic transition on S_b or R_b triggers the S-R latch
• Cross-coupled NAND latch is the speed bottleneck
• Big power savings in reduced-swing designs
• Nice interface to/from domino logic
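The second stage is a cross-coupled NAND S-R latch with active-low inputs. A logic-level Python sketch (no timing); the class name is hypothetical, and the assumption that the sense amplifier holds S_b and R_b high between evaluations reflects typical precharged operation rather than a detail given on this slide:

```python
def nand(a: int, b: int) -> int:
    return 0 if (a and b) else 1


class NandSRLatch:
    """Cross-coupled NAND S-R latch with active-low inputs S_b and R_b."""

    def __init__(self) -> None:
        self.q, self.qb = 0, 1

    def apply(self, s_b: int, r_b: int) -> int:
        if s_b == 0 and r_b == 0:
            # The sense amplifier never pulls both inputs low.
            raise ValueError("S_b = R_b = 0 is not a valid input")
        # Re-evaluate the two cross-coupled gates until the outputs settle.
        for _ in range(4):
            q = nand(s_b, self.qb)
            qb = nand(r_b, q)
            if (q, qb) == (self.q, self.qb):
                break
            self.q, self.qb = q, qb
        return self.q


latch = NandSRLatch()
# set, hold, reset, hold
print([latch.apply(s_b, r_b) for s_b, r_b in [(0, 1), (1, 1), (1, 0), (1, 1)]])  # [1, 1, 0, 0]
```

With both inputs high (the precharged state) the latch simply holds its value, so only the evaluation edge can change Q.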

Modified Sense Amplifier-Based Flip-Flop
• The first stage (sense amplifier) is unchanged
• The second stage is sized to provide maximum switching speed
• Driver transistors are large
• Keeper transistors are small and disengaged during transitions

Nikolic & Stojanovic, ISSCC ‘99

Modified Sense Amplifier-Based Flip-Flop
• The delay of each output is independent of the load on the other output
• The delays of Q and Qb are symmetrical, as opposed to the NAND-based design
• Convenient for dual-rail logic; for standard CMOS, the driving strength is effectively doubled
• SAFF presents a small clock load, a small setup time, and all the advantages of the original design
• Possible tradeoff between speed and robustness to crosstalk

K-6 Dual-Rail ETL
[Schematic: dual-rail edge-triggered latch with inputs Clk and D]

• Self-reset property
  - increases dynamic power
  - drives domino logic
• Precharge increases speed
• Very fast, but burns a lot of power
• Small clock load

Power and Delay Definitions
• All power related to the state element (SE) can be divided into:
  - Input power
    • Data power (PD)
    • Clock power (PCLK)
  - Internal power (PINT)
  - Load power (PLOAD)
• PLOAD can be merged into PINT
• Internal power is a function of:
  - data activity ratio (α): number of captured data transitions with respect to the number of clock transitions (αmax = 100%)
    • no activity (0000… and 1111…)
    • maximum activity (0101010…)
    • average activity (random sequence)
  - glitching activity
• Total power: $P_{tot} = P_{internal} + \sum_{inputs\,(D,\,CLK)} P_{driver}$
• Delay is the minimum D-Q delay (Clk-Q + setup time)

[Figure: test bench with drivers on D and CLK (powers PD and PCLK) feeding the state element (PINT), with loads on Q and Qb (PLOAD), all supplied from VDD]
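The activity ratio and the total-power sum can be computed directly from a data sequence and per-component power numbers. A Python sketch; the function names are hypothetical, α is counted per clock cycle (one captured sample per cycle), and the power values in the last line are made up:

```python
import random


def activity_ratio(bits: list[int]) -> float:
    """Captured data transitions per clock cycle (0.0 to 1.0, i.e. 0% to 100%)."""
    if len(bits) < 2:
        return 0.0
    transitions = sum(1 for prev, cur in zip(bits, bits[1:]) if prev != cur)
    return transitions / (len(bits) - 1)


def total_power(p_internal: float, driver_power: dict[str, float]) -> float:
    """Ptot = Pinternal + sum of the driver powers of the inputs (D, CLK)."""
    return p_internal + sum(driver_power.values())


rng = random.Random(0)  # fixed seed so the random pattern is reproducible
patterns = {
    "no activity (0000...)": [0] * 64,
    "no activity (1111...)": [1] * 64,
    "maximum activity (0101...)": [i % 2 for i in range(64)],
    "average activity (random)": [rng.randint(0, 1) for _ in range(10_000)],
}
for name, bits in patterns.items():
    print(f"{name}: alpha = {activity_ratio(bits):.2f}")

# Hypothetical numbers in uW: internal 60, data driver 10, clock driver 25.
print(total_power(60.0, {"D": 10.0, "CLK": 25.0}))
```

The two constant patterns both give α = 0, yet they can differ in internal power (for example through precharge activity), which is why both are simulated.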

State Element Performance Metrics

It is always possible to trade power for speed.

Common metrics:
• Power-Delay Product (PDP)
  - Misleading measure
  - Good only if measured at constant frequency, where it is proportional to EDP
• Energy-Delay Product (EDP)
  - More accurate measure (Gonzalez & Horowitz)
• ED²P: Energy-Delay²-Product
  - A new measure, being justified by new results (Hofstee, Nowka, IBM)
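The metrics differ only in how heavily they weight delay, and they can rank the same designs differently. A Python sketch with two hypothetical state elements (the energy and delay values are invented for illustration); it also shows that PDP at a fixed frequency f is EDP scaled by f, since power equals energy per cycle times f:

```python
import math
from dataclasses import dataclass


@dataclass
class StateElement:
    name: str
    energy_fj: float  # energy per clock cycle, fJ (hypothetical)
    delay_ps: float   # minimum D-Q delay, ps (hypothetical)

    def power_uw(self, freq_ghz: float) -> float:
        # fJ per cycle * GHz = uW
        return self.energy_fj * freq_ghz

    def pdp_fj(self, freq_ghz: float) -> float:
        # uW * ps = 1e-18 J = 1e-3 fJ
        return self.power_uw(freq_ghz) * self.delay_ps * 1e-3

    def edp(self) -> float:
        return self.energy_fj * self.delay_ps  # fJ*ps

    def ed2p(self) -> float:
        return self.energy_fj * self.delay_ps ** 2  # fJ*ps^2


fast = StateElement("fast", energy_fj=30.0, delay_ps=150.0)
frugal = StateElement("frugal", energy_fj=15.0, delay_ps=250.0)
designs = [fast, frugal]

for metric in ("edp", "ed2p"):
    best = min(designs, key=lambda d: getattr(d, metric)())
    print(f"{metric}: {best.name} wins")
```

Here EDP favors the slower, lower-energy design, while ED²P, which weights delay more heavily, favors the faster one.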

Design & optimization tradeoffs
• Opposite goals
  - Minimal total power consumption
  - Minimal delay
• Power-delay tradeoff
• Minimize power-delay product (PDPtot) @ f = const.

[Figures: PDPtot [fJ] (0 to 90 fJ) vs. total power [uW], vs. transistor width [um], and vs. delay [ps]; each curve has a marked optimum (Opt.)]
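The PDP-versus-width curve has an interior optimum because wider devices drive the load faster but burn more power. A Python sketch with a made-up first-order model (all four coefficients are assumptions, and power is taken at a fixed clock frequency); widths are swept over the 0.36-10µm range used in the simulation conditions:

```python
import math

# Hypothetical first-order model of a flip-flop sized by one width W (um).
P0, C_W = 20.0, 4.0          # uW fixed power, uW per um of width (assumed)
D0, K_LOAD = 100.0, 1000.0   # ps intrinsic delay, ps*um to drive the load (assumed)


def power_uw(w: float) -> float:
    return P0 + C_W * w


def delay_ps(w: float) -> float:
    return D0 + K_LOAD / w


def pdp_fj(w: float) -> float:
    return power_uw(w) * delay_ps(w) * 1e-3  # uW * ps = 1e-3 fJ


widths = [0.36 + 0.01 * i for i in range(965)]  # 0.36 um ... 10.0 um
w_opt = min(widths, key=pdp_fj)
w_analytic = math.sqrt(P0 * K_LOAD / (C_W * D0))  # from d(PDP)/dW = 0
print(f"optimum W = {w_opt:.2f} um (analytic {w_analytic:.2f} um), "
      f"PDP = {pdp_fj(w_opt):.1f} fJ")
```

In a real design each transistor width is a separate variable, but the same sweep-and-minimize idea applies.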

Overall Results
Delay Comparison (50% activity)

[Figure: bar chart of delay [FO4] (0 to 5 FO4), grouped as MS latch, pulsed latch and differential designs: PowerPC, C2MOS, HLFF, SDFF, StrongArm, SAbFF]

Conventional Clk-Q vs. minimum D-Q

[Figure, top: total power [uW] vs. delay [FO4] (minimum D-Q) for HLFF, SDFF, PowerPC, StrongArm FF, SA-F/F, mC2MOS latch, K6 ETL, SSTC and DSTC, grouped into pulsed and MS designs]
[Figure, bottom: total power [uW] vs. Clk-Q delay [FO4] for the same designs]

• Hidden positive setup time
• Degradation of total delay

Older 0.22µm comparison results
Overall Results
Single-Edge-Triggered Structures: Power Consumption Comparison (50% activity)

[Figure: stacked bar chart of power consumption [uW] (0 to 250 uW), split into internal, clock and data power, for single-ended (including MS latch) and dual-ended designs such as HLFF, SDFF, SSTC, DSTC, C2MOS, StrongArm and SAbFF]
Internal Power distribution
[Figure: internal power [uW] (0 to 400 uW) of HLFF, SDFF, PowerPC 603 latch, mC2MOS latch, StrongARM FF, Alpha 21264 FF and K6 ETL for four data patterns: random (activity=0.5), …01010101… (activity=1), …11111111… (activity=0) and …00000000… (activity=0)]

• Four sequences characterize the boundaries for internal power consumption:
  - …010101…: maximum
  - random, equal transition probability: average
  - …111111…: precharge activity
  - …000000…: leakage + internal clock processing

Older 0.22µm comparison results
Comparison of Clock power consumption

DSTC MS latch
SSTC MS latch
K6 ETL
StrongArm FF
SA-F/F
2
mC MOS
PowerPC MS latch
SDFF
HLFF

0 10 20 30 40 50
Local Clock power consumption [? W]
Older 0.22u comparison results
Design goals
• Apply
  - Small clock load
  - Short direct path
  - Reduced node swing
  - Low-power feedback
  - Pulsed design
  - Optimization of both master and slave latch
• Avoid
  - Positive setup time
  - Sensitivity to clock slope and skew
  - Dynamic (floating) nodes
  - Dynamic master latch

Conduct energy-delay optimizations
Take into account all sources of power dissipation
ALWAYS use Clk-Q + setup time for max delay

For more details on storage elements, see Prof. Oklobdzija's ISSCC'02 talk:
http://www.ece.ucdavis.edu/acsel under Presentations
Simulation Conditions:
• Power supply voltage: VDD = 1.8V nominal
• Temperature: T = 27°C nominal
• Technology: 0.18µm Fujitsu
• Fan-out-of-4 (FO4) delay = 75ps
• Transistor widths
  - Minimal 0.36µm
  - Maximal 10µm
• Load: 14 minimal inverters in the technology used
• Clock frequency: 500MHz (250MHz for dual-edge)
• Data/clock slopes of ideal signal: 100ps
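With FO4 = 75 ps, the delays quoted in FO4 on the comparison slides convert directly to picoseconds and to a fraction of the 500 MHz clock period. A Python sketch; the helper names and the 3 FO4 example delay are hypothetical:

```python
FO4_PS = 75.0      # fan-out-of-4 delay from the simulation conditions
CLOCK_MHZ = 500.0  # single-edge clock frequency


def fo4_to_ps(fo4: float) -> float:
    return fo4 * FO4_PS


def ps_to_fo4(ps: float) -> float:
    return ps / FO4_PS


period_ps = 1e6 / CLOCK_MHZ     # 1 / 500 MHz = 2000 ps
flop_delay_ps = fo4_to_ps(3.0)  # hypothetical 3 FO4 flip-flop delay
print(f"period = {period_ps:.0f} ps = {ps_to_fo4(period_ps):.1f} FO4")
print(f"3 FO4 = {flop_delay_ps:.0f} ps = {flop_delay_ps / period_ps:.1%} of the period")
```

Expressing results in FO4 keeps comparisons meaningful across process generations, such as the older 0.22µm results shown earlier.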

