
Power Dissipation in CMOS Circuits

Static CMOS gates are power-efficient because they dissipate nearly zero
power while idle.

However, as transistor counts and clock frequencies have increased and
transistor dimensions have decreased, power consumption has skyrocketed
and is now a primary design constraint.

Power dissipation in CMOS circuits comes from

- Static dissipation due to
  - subthreshold conduction through OFF transistors
  - tunneling current through the gate oxide (both ON and OFF transistors)
  - leakage through reverse-biased diodes
  - contention current in ratioed circuits
- Dynamic dissipation due to
  - charging and discharging of load capacitances
  - short-circuit current while both nMOS and pMOS are ON

Advanced VLSI EEE 6405 Slide1

ABM HARUN-UR RASHID

Charging a Capacitor

When the gate output rises

- Energy stored in the capacitor is

  $E_C = \frac{1}{2} C_L V_{DD}^2$

- But energy drawn from the supply is

  $E_{VDD} = \int_0^\infty I(t)\, V_{DD}\, dt = \int_0^\infty C_L \frac{dV}{dt} V_{DD}\, dt = C_L V_{DD} \int_0^{V_{DD}} dV = C_L V_{DD}^2$

- Half the energy from VDD is dissipated in the pMOS transistor
  as heat, the other half is stored in the capacitor

When the gate output falls

- Energy in the capacitor is dumped to GND
- Dissipated as heat in the nMOS transistor
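
The half-and-half energy split can be checked numerically. The sketch below is only an illustration: it assumes the pMOS pull-up behaves like a fixed resistor `r_on` (a simplification not made in the slides; the split does not depend on its value) and uses CL = 150 fF and VDD = 1.0 V as sample values.

```python
import math

def charging_energies(c_load, vdd, r_on=1e3, steps=200_000, n_tau=25.0):
    """Integrate energies while an RC pull-up charges c_load from 0 V to vdd.

    r_on is an assumed linear stand-in for the pMOS transistor.
    Returns (energy drawn from supply, energy dissipated in r_on, energy stored).
    """
    tau = r_on * c_load
    dt = n_tau * tau / steps
    e_supply = 0.0
    e_heat = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt                      # midpoint rule
        i = (vdd / r_on) * math.exp(-t / tau)   # charging current I(t)
        e_supply += vdd * i * dt                # integral of I(t) * VDD dt
        e_heat += i * i * r_on * dt             # integral of I(t)^2 * R dt
    return e_supply, e_heat, e_supply - e_heat

e_supply, e_heat, e_cap = charging_energies(c_load=150e-15, vdd=1.0)
print(f"supply {e_supply:.3e} J, heat {e_heat:.3e} J, stored {e_cap:.3e} J")
```

The supply energy should come out close to CL·VDD², with the heat and stored terms each close to half of it.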


Switching Waveforms

Example: VDD = 1.0 V, CL = 150 fF, f = 1 GHz


Activity Factor

Suppose the system clock frequency = f

Let fsw = αf, where α = activity factor

- If the signal is a clock, α = 1
- If the signal switches once per cycle, α = 1/2

Dynamic power:

$P_{switching} = \alpha C V_{DD}^2 f$
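
As a quick worked instance of the switching-power formula, the sketch below evaluates it for a clock node and for a data node that switches once per cycle. The load, supply and frequency (150 fF, 1.0 V, 1 GHz) are sample values; the function name is illustrative.

```python
def switching_power(alpha, c_load, vdd, f_clk):
    """Dynamic switching power P = alpha * C * VDD^2 * f, in watts."""
    return alpha * c_load * vdd ** 2 * f_clk

# Clock node (alpha = 1) versus a node that switches once per cycle (alpha = 1/2).
p_clock = switching_power(alpha=1.0, c_load=150e-15, vdd=1.0, f_clk=1e9)
p_data = switching_power(alpha=0.5, c_load=150e-15, vdd=1.0, f_clk=1e9)
print(f"clock node: {p_clock * 1e6:.1f} uW, data node: {p_data * 1e6:.1f} uW")
```

With these values the clock node dissipates about 150 µW and the data node about 75 µW.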


Review: Energy & Power Equations


In time period T the total energy consumed is

$E = C_L V_{DD}^2 + t_{sc} V_{DD} I_{peak} + V_{DD} I_{leakage} T$

and, with f = 1/T, the average power is

$P = C_L V_{DD}^2 f + (t_{sc}/T) V_{DD} I_{peak} + V_{DD} I_{leakage}$

- First term: dynamic power (~90% today and decreasing relatively)
- Second term: short-circuit power (~8% today and decreasing absolutely)
- Third term: leakage power (~2% today and increasing)
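
The three terms can be evaluated side by side. This is a minimal sketch; every numeric value below (short-circuit time, peak current, leakage current) is a hypothetical placeholder chosen only to exercise the formulas, not data from the slides.

```python
def energy_and_power(c_load, vdd, t_period, t_sc, i_peak, i_leak):
    """Evaluate the three-term energy and power expressions for one period T."""
    f_clk = 1.0 / t_period
    energy = c_load * vdd**2 + t_sc * vdd * i_peak + vdd * i_leak * t_period
    power = {
        "dynamic": c_load * vdd**2 * f_clk,
        "short_circuit": (t_sc / t_period) * vdd * i_peak,
        "leakage": vdd * i_leak,
    }
    return energy, power

# Hypothetical operating point (all values are assumptions for illustration).
energy, power = energy_and_power(
    c_load=150e-15, vdd=1.0, t_period=1e-9,
    t_sc=20e-12, i_peak=500e-6, i_leak=3e-6,
)
total = sum(power.values())
for name, value in power.items():
    print(f"{name:>13}: {value * 1e6:7.2f} uW ({100 * value / total:4.1f}%)")
```

Dividing `energy` by the period reproduces the sum of the three power terms, which is a useful consistency check between the two equations.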

Power and Energy Design Space


Energy     Constant Throughput/Latency                       Variable Throughput/Latency
           Design Time            Non-active Modules         Run Time
Active     Logic Design           Clock Gating               DFS, DVS
           Reduced Vdd                                       (Dynamic Freq, Voltage
           Sizing                                            Scaling)
           Multi-Vdd
Leakage    + Multi-VT             Sleep Transistors          + Variable VT
                                  Multi-Vdd
                                  Variable VT

Bus Multiplexing

Buses are a significant source of power dissipation due to
high switching activities and large capacitive loading

- 15% of total power in the Alpha 21064
- 30% of total power in the Intel 80386

Share long data buses with time multiplexing (S1 uses even
cycles, S2 odd)

[Figure: sources S1, S2 and destinations D1, D2 connected by dedicated buses and by one shared, time-multiplexed bus]

But what if data samples are correlated (e.g., sign bits)?

Correlated Data Streams

[Figure: bit switching probabilities versus bit position, from MSB to LSB]

- For a shared (multiplexed) bus, the advantages of data
  correlation are lost (the bus carries samples from two
  uncorrelated data streams)
- Bus sharing should not be used for positively
  correlated data streams
- Bus sharing may prove advantageous in a negatively
  correlated data stream (where successive samples switch
  sign bits), since the switching is more random
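
The cost of sharing a bus between two positively correlated streams can be illustrated by counting bit toggles. The sketch below uses two made-up 16-bit two's-complement streams (one slowly rising and positive, one slowly falling and negative); the stream values and the helper name `toggles` are assumptions for illustration.

```python
def toggles(words, width=16):
    """Count bit transitions between successive words on a bus of `width` bits."""
    mask = (1 << width) - 1
    return sum(bin((a ^ b) & mask).count("1") for a, b in zip(words, words[1:]))

# Each stream changes slowly, so its upper (sign) bits rarely switch.
s1 = [10, 11, 12, 13]
s2 = [-10, -11, -12, -13]

# Dedicated buses: each stream only sees its own successive samples.
dedicated = toggles(s1) + toggles(s2)

# Shared bus: S1 uses even cycles, S2 odd cycles, so samples interleave.
shared_sequence = [w for pair in zip(s1, s2) for w in pair]
shared = toggles(shared_sequence)

print(f"dedicated buses: {dedicated} toggles, shared bus: {shared} toggles")
```

On the shared bus the sign-extension bits flip on nearly every cycle, so the toggle count is far higher than on the two dedicated buses combined.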

Glitch Reduction by Pipelining


- Glitches depend on the logic depth of the circuit: gates
  deeper in the logic network are more prone to glitching
  - arrival times of the gate inputs are more spread due to delay
    imbalances
  - usually affected more by primary input switching
- Reduce logic depth by adding pipeline registers
  - additional energy used by the clock and pipeline registers

[Figure: pipelined datapath with Fetch, Decode, Execute, Memory and WriteBack stages (PC, I$, Instruction, MAR, MDR, D$) separated by pipeline stage isolation registers driven by clk]


Clock Gating

- Most popular method for power reduction of clock signals
  and functional units
- Gate off clock to idle functional units
  - e.g., floating point units
  - need logic to generate the disable signal
    - increases complexity of control logic
    - consumes power
    - timing critical to avoid clock glitches at the OR gate output
  - additional gate delay on clock signal
    - gating OR gate can replace a buffer in the clock distribution tree

[Figure: clock and disable feed an OR gate whose output clocks the register (Reg) at the input of the functional unit]

Clock Gating

The best way to reduce the activity is to turn off the clock
to registers in unused blocks

Saves clock activity (α = 1)

Eliminates all switching activity in the block

Requires determining if block will be used
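
A small behavioral sketch of OR-based clock gating is given below. It models the clock and a disable signal as sampled 0/1 waveforms and counts the rising edges that reach the register; the waveform lengths and the window in which `disable` is asserted are arbitrary choices for illustration. The disable signal is changed only while the clock is high, which is the timing condition that keeps an OR gate output free of glitches.

```python
def gated_clock(clk, disable):
    """OR-gate clock gating: the output is held high while disable is 1."""
    return [c | d for c, d in zip(clk, disable)]

def rising_edges(wave):
    """Count 0 -> 1 transitions in a sampled waveform."""
    return sum(1 for prev, cur in zip(wave, wave[1:]) if prev == 0 and cur == 1)

# Eight clock cycles sampled twice per cycle: 0, 1, 0, 1, ...
clk = [i % 2 for i in range(16)]
# Block idle for part of the run; disable changes only while clk is high.
disable = [1 if 3 <= i <= 10 else 0 for i in range(16)]

gclk = gated_clock(clk, disable)
print(f"edges without gating: {rising_edges(clk)}, with gating: {rising_edges(gclk)}")
```

Every suppressed edge is one clock transition that the gated registers, and the logic they drive, never see.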


Clock Gating in a Pipelined Datapath


For idle units (e.g., floating point units in the Exec stage, or the WB
stage for instructions with no write-back operation)

[Figure: pipelined datapath (Fetch, Decode, Execute, Memory, WriteBack; PC, I$, Instruction, MAR, MDR, D$) in which "No FP" and "No WB" signals gate clk to the idle stages]


Voltage / Frequency

Run each block at the lowest possible voltage and
frequency that meets performance requirements

Voltage Domains

- Provide separate supplies to different blocks
- Level converters required when crossing
  from low to high VDD domains

Level Converter

The standard method to handle voltage domain crossings is a level
converter. When A = 0, N1 is OFF and N2 is ON. N2 pulls Y down to
0, which turns on P1, pulling X up to VDDH and ensuring that P2
turns OFF. When A = 1, N1 is ON and N2 is OFF. N1 pulls X down to
0, which turns on P2, pulling Y up to VDDH. In either case, the level
converter behaves as a buffer and properly drives Y between 0 and
VDDH without risk of transistors remaining partially ON.
Unfortunately, the level converter costs delay (about 2 FO4) and
power at each domain crossing.

Voltage / Frequency

Many systems have time-varying performance requirements. For
example, a video decoder requires more computation for rapidly moving
scenes than for static scenes. Such systems can save large amounts of
energy by reducing the clock frequency to the minimum sufficient to
complete the task on schedule, then reducing the supply voltage to the
minimum necessary to operate at that frequency. This is called dynamic
voltage scaling (DVS) or dynamic voltage/frequency scaling (DVFS).

The DVS controller takes information from the system about the
workload and/or the die temperature. It determines the supply voltage
and clock frequency sufficient to complete the workload on schedule or
to maximize performance without overheating.
A switching voltage regulator efficiently steps
down Vin from a high value to the necessary
VDD. The core logic contains a phase-locked
loop or other clock synthesizer to generate
the specified clock frequency. Dynamic
voltage scaling adjusts VDD and f according to
workload.
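
The controller's decision can be sketched as a table lookup. The code below is a minimal illustration, not the controller of any particular chip: the table of (frequency, voltage) pairs and the function names are hypothetical, and it assumes a workload expressed as a cycle count with a deadline.

```python
# Hypothetical operating points: (clock frequency in Hz, supply voltage in V).
OPERATING_POINTS = [(200e6, 0.8), (400e6, 1.0), (600e6, 1.2), (800e6, 1.4)]

def pick_operating_point(cycles, deadline_s, points=OPERATING_POINTS):
    """Return the slowest (f, VDD) pair that finishes `cycles` by the deadline."""
    for f_clk, vdd in sorted(points):
        if cycles / f_clk <= deadline_s:
            return f_clk, vdd
    return max(points)  # deadline cannot be met; run at the fastest point

def relative_energy(vdd, vdd_max):
    """Switching energy per cycle scales with VDD squared."""
    return (vdd / vdd_max) ** 2

# A light frame: 6 million cycles of work in a 1/60 s frame time.
f_clk, vdd = pick_operating_point(cycles=6e6, deadline_s=1 / 60)
saving = 1.0 - relative_energy(vdd, vdd_max=1.4)
print(f"run at {f_clk / 1e6:.0f} MHz, {vdd:.1f} V, "
      f"about {100 * saving:.0f}% less switching energy than at 1.4 V")
```

Because the cycle count is fixed by the task, the energy saving comes from the lower supply voltage; the lower frequency is what makes that voltage usable.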


Review: Dynamic Power as a Function of VDD

- Decreasing the VDD decreases dynamic energy consumption
  (quadratically)
- But, increases gate delay (decreases performance)

[Figure: normalized propagation delay tp versus VDD (V)]

Determine the critical path(s) at design time and use high
VDD for the transistors on those paths for speed. Use a
lower VDD on the other logic to reduce dynamic energy
consumption.

Dynamic Frequency and Voltage Scaling

Intel's SpeedStep

- Hardware that steps down the clock frequency (dynamic frequency
  scaling, DFS) when the user unplugs from AC power
  - PLL from 650 MHz to 500 MHz
- CPU stalls during SpeedStep adjustment

Transmeta LongRun

- Hardware that applies both DFS and DVS (dynamic supply
  voltage scaling)
  - 32 levels of VDD from 1.1 V to 1.6 V
  - PLL from 200 MHz to 700 MHz in increments of 33 MHz
- Triggered when CPU load change is detected by software
  - heavier load: ramp up VDD, when stable speed up clock
  - lighter load: slow down clock, when PLL locks onto new rate,
    ramp down VDD
- CPU stalls only during PLL relock (< 20 microsec)
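
The ordering rule for LongRun transitions (voltage first when speeding up, clock first when slowing down) can be written as a short sketch. The step names `set_vdd` and `set_pll` are hypothetical labels, and pairing 200 MHz with 1.1 V and 700 MHz with 1.6 V is an assumption made only for the example.

```python
def plan_transition(current, target):
    """Order the steps between two (f_mhz, vdd) points so that the supply is
    never too low for the clock frequency that is running."""
    f_now, v_now = current
    f_new, v_new = target
    steps = []
    if f_new > f_now:
        # Heavier load: ramp up VDD, and only when stable speed up the clock.
        if v_new != v_now:
            steps.append(("set_vdd", v_new))
        steps.append(("set_pll", f_new))
    elif f_new < f_now:
        # Lighter load: slow the clock, and after PLL lock ramp down VDD.
        steps.append(("set_pll", f_new))
        if v_new != v_now:
            steps.append(("set_vdd", v_new))
    return steps

speed_up = plan_transition(current=(200, 1.1), target=(700, 1.6))
slow_down = plan_transition(current=(700, 1.6), target=(200, 1.1))
print("speed up: ", speed_up)
print("slow down:", slow_down)
```

In both directions the circuit only ever runs at a frequency its present supply voltage can support.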



Speculated Power of a 15 mm µP

[Figure: four panels of Power (Watts) versus Temp (C), each bar split into Active and Leakage power. The leakage share of total power printed on the bars, in order of increasing share, is listed below.]

Process, die, VDD             Leakage share of total power
0.25 µm, 15 mm die, 2 V       0%  0%  0%  0%  1%  1%  1%  2%  3%
0.18 µm, 15 mm die, 1.4 V     0%  0%  1%  1%  2%  3%  5%  7%  9%
0.13 µm, 15 mm die, 1 V       1%  2%  3%  5%  8%  11% 15% 20% 26%
0.1 µm, 15 mm die, 0.7 V      6%  9%  14% 19% 26% 33% 41% 49% 56%

Review: Leakage as a Function of Design Time VT

- Reducing the VT increases the subthreshold leakage
  current (exponentially)
- But, reducing VT decreases gate delay
  (increases performance)

Determine the critical path(s) at design time and use low
VT devices on the transistors on those paths for speed.
Use a high VT on the other logic for leakage control.
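
The exponential dependence of subthreshold leakage on VT can be sketched with a simple decade-per-swing model. The subthreshold swing of 100 mV per decade and the threshold values used here are assumed, illustrative numbers; real values depend on the process and temperature.

```python
def relative_leakage(vt, vt_ref, swing_v_per_decade=0.1):
    """Subthreshold leakage relative to a device with threshold vt_ref.

    Assumes leakage changes by 10x for every `swing_v_per_decade` volts of VT.
    """
    return 10 ** ((vt_ref - vt) / swing_v_per_decade)

# Hypothetical thresholds: nominal 0.4 V, low-VT 0.3 V, high-VT 0.5 V.
low_vt = relative_leakage(vt=0.3, vt_ref=0.4)
high_vt = relative_leakage(vt=0.5, vt_ref=0.4)
print(f"low-VT device leaks {low_vt:.1f}x, high-VT device leaks {high_vt:.2f}x nominal")
```

Under this assumption, each 100 mV of threshold reduction bought for speed costs a tenfold increase in leakage, which is why low-VT devices are reserved for critical paths.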


Review: Variable VT (ABB) at Run Time

$V_T = V_{T0} + \gamma\left(\sqrt{|-2\phi_F + V_{SB}|} - \sqrt{|-2\phi_F|}\right)$

where VT0 is the threshold voltage at VSB = 0,
VSB is the source-bulk (substrate) voltage,
γ is the body-effect coefficient, and
φF is the Fermi potential

- For an n-channel device, the substrate is normally
  tied to ground
- A negative substrate bias causes VT
  to increase from 0.45 V to 0.85 V
- Adjusting the substrate bias at run time is called
  adaptive body-biasing (ABB)

[Figure: VT (V) versus VSB (V)]
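
The body-effect equation can be evaluated directly. In the sketch below, VT0 = 0.45 V comes from the slide, while γ = 0.4 V^0.5 and φF = -0.3 V are assumed, illustrative parameters. `v_sb` is the source-to-bulk voltage, so a substrate biased 2.5 V below the source corresponds to `v_sb = 2.5`.

```python
import math

def threshold_voltage(v_sb, vt0=0.45, gamma=0.4, phi_f=-0.3):
    """Body-effect threshold: VT0 + gamma*(sqrt|-2*phiF + VSB| - sqrt|-2*phiF|).

    gamma (V^0.5) and phi_f (V) are assumed illustrative values.
    """
    return vt0 + gamma * (
        math.sqrt(abs(-2 * phi_f + v_sb)) - math.sqrt(abs(-2 * phi_f))
    )

for v_sb in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5):
    print(f"VSB = {v_sb:.1f} V -> VT = {threshold_voltage(v_sb):.3f} V")
```

With these assumed parameters, VT rises from 0.45 V at zero bias to roughly 0.84 V at a 2.5 V reverse body bias, in line with the 0.45 V to 0.85 V range quoted on the slide.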

Multi-Threshold CMOS (MTCMOS) Sleep Transistor

- Adds a high-Vth sleep transistor between the
  pull-up network and Vdd and between the
  pull-down network and ground
- The logic circuit uses low-Vth transistors for
  speed
- The sleep transistors are turned off by the sleep
  signal when the logic circuit is not in use
- The additional sleep transistors increase
  area and delay. Furthermore, the
  pull-up and pull-down networks will have
  floating values and thus will lose state
  during sleep mode.


Stacked transistor

- The stack effect results in a substantial subthreshold
  leakage current reduction when
  two or more stacked transistors are turned
  off together.
- The stack effect reduces subthreshold leakage by a
  factor of 10.

[Figure: series OFF transistors demonstrating the stack effect]

Forced Stack

- Transistor stacking exploits the stack effect,
  which results in a substantial subthreshold
  leakage current reduction when two or
  more stacked transistors are turned off
  together.
- Stacking increases delay.

Sleepy stack

- The sleepy stack technique divides each
  existing transistor into two transistors,
  each typically with the same width W1,
  half the width W0 of the original single
  transistor (i.e., W1 = W0/2)
- During active mode all sleep transistors
  are turned on.
- High-Vth transistors can be used for the
  sleep transistors and the transistors
  parallel to the sleep transistors without
  incurring a large delay.

Sleepy Keeper

- A PMOS transistor is placed in parallel to
  the sleep transistor (Sleep) and an NMOS transistor
  is placed in parallel to the sleep transistor
  (Sleep)
- During sleep mode the sleep transistors are
  turned OFF, and one of the transistors in
  parallel to the sleep transistors keeps the
  connection with the appropriate power
  rail to maintain a value of 1 in sleep
  mode, given that the 1 has already
  been calculated.
- Similarly, a value of 0 is maintained in
  sleep mode, given that the 0 has
  already been calculated.
