
Low-Power Logic Design

Christian Piguet
CSEM & EPFL, Neuchâtel & Lausanne, Switzerland
Logic Families and Standard Cells

Logic Families
Static CMOS Logic, Branch-based Logic, Transmission Gates, N-Pass Logic,
Dynamic Precharged Logic

Low-Power and Standard Cells


Gated Clocks, Latch-based Designs, Cell Drives, Complex Gate
Decomposition, Standard Cell Libraries

Short PPT title | Author | Page 1


CMOS Gates: Conduction Functions

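CMOS Combinatorial Circuit

[Figure: a P-ch network and an N-ch network connected to the output Z, both driven by the inputs X1, X2, ..., Xn.]

Two networks of opposite type, whose two conduction functions are complementary.

Conduction functions:
• $C_0(X_1, X_2, \ldots, X_n)$ for the N-ch network, which conducts (pulls Z to ‘0’) when $C_0 = 1$
• $C_1(X_1, X_2, \ldots, X_n)$ for the P-ch network, which conducts (pulls Z to ‘1’) when $C_1 = 1$

Complementarity, $C_1 = \overline{C_0}$, guarantees that exactly one network conducts for every input combination.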



Separated Simplification Method

• One has to derive the symmetrical logic equation using the Karnaugh map of the logic function.
• This method, i.e. performing the Boolean simplification twice, once for the N-ch network and once for the P-ch network, is called the « separated simplification » method, because:
  • in a first step, one considers the blocks of ‘0’ in the Karnaugh map to generate the N-ch network;
  • in a second step, one considers the blocks of ‘1’ in the Karnaugh map to generate the P-ch network.
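The two steps can be sketched in Python, assuming the logic function is given as a Python function of 0/1 inputs. This is a minimal illustration, not the author's tool: prime implicants come from Quine-McCluskey merging, the smallest cover is found by brute force (adequate for a few inputs only), and the helper names and the `x'` notation for a complemented literal are hypothetical. A literal `a'` in the P-ch result means that the network conducts when a = 0, i.e. a P-ch transistor driven by a.

```python
from itertools import combinations, product

def prime_implicants(minterms, n):
    """Quine-McCluskey merging; cubes are strings over '0', '1' and '-'."""
    terms = {format(m, f"0{n}b") for m in minterms}
    primes = set()
    while terms:
        merged, used = set(), set()
        for a, b in combinations(sorted(terms), 2):
            diff = [i for i in range(n) if a[i] != b[i]]
            # Two cubes merge only if they differ in exactly one fixed bit.
            if len(diff) == 1 and "-" not in (a[diff[0]], b[diff[0]]):
                i = diff[0]
                merged.add(a[:i] + "-" + a[i + 1:])
                used.update((a, b))
        primes |= terms - used          # cubes that could not grow are prime
        terms = merged
    return sorted(primes)

def covers(cube, m, n):
    return all(c in ("-", bit) for c, bit in zip(cube, format(m, f"0{n}b")))

def minimal_cover(minterms, n):
    """Smallest set of prime implicants covering all minterms (brute force)."""
    primes = prime_implicants(minterms, n)
    for k in range(1, len(primes) + 1):
        for subset in combinations(primes, k):
            if all(any(covers(p, m, n) for p in subset) for m in minterms):
                return list(subset)
    return []

def to_product(cube, names):
    lits = [v if c == "1" else v + "'" for c, v in zip(cube, names) if c != "-"]
    return "".join(lits) or "1"

def separated_simplification(f, names):
    """Return (N-ch terms from the blocks of '0', P-ch terms from the blocks of '1')."""
    n = len(names)
    rows = list(product((0, 1), repeat=n))      # row index = minterm number
    zeros = [m for m, x in enumerate(rows) if f(*x) == 0]
    ones = [m for m, x in enumerate(rows) if f(*x) == 1]
    n_net = [to_product(c, names) for c in minimal_cover(zeros, n)]
    p_net = [to_product(c, names) for c in minimal_cover(ones, n)]
    return n_net, p_net

# NAND gate: expected z_N = ab and z_P = a' + b'
print(separated_simplification(lambda a, b: 1 - (a & b), ["a", "b"]))
```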



Synthesis of an inverter with the “separated
simplification method”

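Example: inverter

  a   z
  0   1
  1   0

$z = (a)[0] + (\bar{a})[1]$

Notation: (condition)[value] means that the output takes the value in brackets when the condition in parentheses is true.

$z_N = a$, $z_P = \bar{a}$

[Figure: resulting inverter, with one P-ch and one N-ch transistor driven by a, output z.]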



Synthesis of a NAND gate with the separated
simplification method

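Other example: NAND gate

Karnaugh map of z (columns: a, rows: b):
         a=0  a=1
  b=0     1    1
  b=1     1    0

$z = (a\,b)[0] + (\bar{a} + \bar{b})[1]$

$z_N = a\,b$ (two N-ch transistors in series), $z_P = \bar{a} + \bar{b}$ (two P-ch transistors in parallel)

[Figure: resulting NAND gate with its P-ch and N-ch networks driven by a and b, output z.]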



Complex CMOS gate generated by the separated
simplification method

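Karnaugh map of z (columns: ab, rows: c):
         ab=00  01  11  10
  c=0       1    1   0   1
  c=1       0    1   1   1

$z_N = a\,b\,\bar{c} + \bar{a}\,\bar{b}\,c$
$z_P = \bar{a}\,b + a\,c + \bar{b}\,\bar{c}$

$z = (a\,b\,\bar{c} + \bar{a}\,\bar{b}\,c)[0] + (\bar{a}\,b + a\,c + \bar{b}\,\bar{c})[1]$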
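A quick consistency check of this gate, as a Python sketch: for all eight input combinations it verifies that exactly one network conducts and that the P-ch network conducts on the ‘1’ cells of the Karnaugh map. The overbars of z_P had to be reconstructed; the cover used here, ā b + a c + b̄ c̄, is one of two equally small covers, and the other one, a b̄ + ā c̄ + b c, passes the same check. Function names are illustrative.

```python
from itertools import product

# Karnaugh map of the slide: z for each (a, b, c)
KMAP = {(0, 0, 0): 1, (0, 1, 0): 1, (1, 1, 0): 0, (1, 0, 0): 1,   # c = 0
        (0, 0, 1): 0, (0, 1, 1): 1, (1, 1, 1): 1, (1, 0, 1): 1}   # c = 1

def z_n(a, b, c):
    """N-ch conduction function: a b c' + a' b' c."""
    return bool((a and b and not c) or (not a and not b and c))

def z_p(a, b, c):
    """P-ch conduction function (one minimal cover): a' b + a c + b' c'."""
    return bool((not a and b) or (a and c) or (not b and not c))

def check(zn, zp):
    for a, b, c in product((0, 1), repeat=3):
        n_on, p_on = zn(a, b, c), zp(a, b, c)
        assert n_on != p_on, f"both or neither network conducts at {(a, b, c)}"
        assert p_on == bool(KMAP[(a, b, c)]), f"wrong output at {(a, b, c)}"
    return True

print(check(z_n, z_p))   # True
```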



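Branch-based Static Logic

Karnaugh map of S (columns: AB, rows: CD):
          AB=00  01  11  10
  CD=00      1    1   0   1
  CD=01      1    0   0   1
  CD=11      0    0   1   1
  CD=10      1    1   1   1

$S_{N\text{-ch}} = \bar{A}\,C\,D + A\,B\,\bar{C} + \bar{A}\,B\,D$
$S_{P\text{-ch}} = A\,C + \bar{B}\,\bar{C} + \bar{A}\,\bar{D}$ (in the P-ch branches each variable is applied inverted, since a P-ch transistor conducts on a ‘0’)

[Figure: transistor schematics of S between Vdd and Vss, without branches and with branches (e.g. AC, BC, AD): in the branch-based version each product term is a branch of series transistors between S and Vdd or Vss.]

Synthesis method: both the N-ch and P-ch networks are sums of products.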
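The same kind of check for the branch-based gate S, written so that each product term is one branch (a list of series transistors). This Python sketch assumes the expressions S_N-ch = Ā C D + A B C̄ + Ā B D and S_P-ch = A C + B̄ C̄ + Ā D̄ read from the Karnaugh map; a branch conducts when every transistor in it conducts, and a network conducts when any of its branches does. It also reports the number of series transistors in each branch. Names are illustrative.

```python
from itertools import product

# Karnaugh map of S: rows CD, columns AB (both in Gray order 00, 01, 11, 10)
GRAY = [(0, 0), (0, 1), (1, 1), (1, 0)]
KMAP = [[1, 1, 0, 1],   # CD = 00
        [1, 0, 0, 1],   # CD = 01
        [0, 0, 1, 1],   # CD = 11
        [1, 1, 1, 1]]   # CD = 10

def s_value(a, b, c, d):
    return KMAP[GRAY.index((c, d))][GRAY.index((a, b))]

# Each branch = one product term = list of (variable, value that makes it conduct).
N_BRANCHES = [[("A", 0), ("C", 1), ("D", 1)],
              [("A", 1), ("B", 1), ("C", 0)],
              [("A", 0), ("B", 1), ("D", 1)]]
P_BRANCHES = [[("A", 1), ("C", 1)],
              [("B", 0), ("C", 0)],
              [("A", 0), ("D", 0)]]

def conducts(branches, env):
    """Parallel branches of series transistors: any branch fully on."""
    return any(all(env[v] == val for v, val in br) for br in branches)

def check():
    for a, b, c, d in product((0, 1), repeat=4):
        env = {"A": a, "B": b, "C": c, "D": d}
        n_on, p_on = conducts(N_BRANCHES, env), conducts(P_BRANCHES, env)
        assert n_on != p_on                       # never both, never neither
        assert p_on == bool(s_value(a, b, c, d))  # P-ch on exactly for S = 1
    return True

print(check(), [len(b) for b in N_BRANCHES], [len(b) for b in P_BRANCHES])
```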



Branch-Based Logic

[Figure: branch-based transistor schematics of S: series branches (e.g. BC, AC, AD) connected between the output S and Vdd, and between S and Vss.]



Advantages

• The « separated simplification » method provides several advantages, such as:
  • the possibility to have both structural expressions as sums of products;
  • the transistor networks can therefore be implemented as branches of series transistors connected between the output and Vdd or Vss;
  • a more regular layout;
  • very simple timing models, since the output capacitance is charged or discharged through a single branch;
  • better testability.



Dual Topological Method

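[Figure: complex gate z with a P-ch network and an N-ch network, both made of transistors driven by a, b and c.]

For a simple combinatorial circuit, the N-ch and P-ch networks are dual.
One can obtain the N-ch network by Karnaugh simplification and then derive the P-ch network by topological duality (series connections become parallel connections and vice versa).
It is the most used method.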
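Topological duality can be sketched in Python by describing a network as a tree of series and parallel connections and swapping the two node types. The example network (a in series with b ∥ c) is hypothetical, not the one drawn on the slide; the loop checks that the N-ch network and its dual P-ch network never conduct at the same time and never both block.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class T:
    """One transistor, driven by the input named `gate`."""
    gate: str

@dataclass(frozen=True)
class Series:
    parts: tuple

@dataclass(frozen=True)
class Parallel:
    parts: tuple

def dual(net):
    """Topological dual: series <-> parallel; transistors keep their inputs."""
    if isinstance(net, T):
        return net
    kind = Parallel if isinstance(net, Series) else Series
    return kind(tuple(dual(p) for p in net.parts))

def conducts(net, env, p_channel=False):
    """True if the network connects its two terminals for the inputs in env."""
    if isinstance(net, T):
        # An N-ch transistor conducts on a '1' at its gate, a P-ch on a '0'.
        return env[net.gate] == (0 if p_channel else 1)
    states = [conducts(p, env, p_channel) for p in net.parts]
    return all(states) if isinstance(net, Series) else any(states)

# Hypothetical N-ch network: a in series with (b parallel c), so z = NOT(a (b + c)).
n_net = Series((T("a"), Parallel((T("b"), T("c")))))
p_net = dual(n_net)   # a in parallel with (b series c)

for a, b, c in product((0, 1), repeat=3):
    env = {"a": a, "b": b, "c": c}
    # Exactly one of the two networks conducts for every input combination.
    assert conducts(n_net, env) != conducts(p_net, env, p_channel=True)
print(p_net)
```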



CMOS logic styles

[Figure: CMOS logic styles: static CMOS (p-ch and n-ch networks, inputs a, b, c), C2MOS logic and transmission gate logic.]



Synthesis with Transmission Gates

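[Figure: a) 2:1 MUX with select a, input b on 0, input c on 1, output w; b) its implementation with two transmission gates (P-ch and N-ch transistors driven by a and its complement).]

Analysis: example with a 2:1 mux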



Gate-Matrix-like Electrical 2:1 Mux

[Figure: c) gate-matrix-like layout of the 2:1 mux (columns w, a, ā, c, b; one P-ch row and one N-ch row).]

$w_N = (a)[c] + (\bar{a})[b]$
$w_P = (a)[c] + (\bar{a})[b]$

Symmetrical equation: $w = (a)[c] + (\bar{a})[b]$



Karnaugh Maps of 2:1 Mux

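d) Karnaugh map of w split into the two blocks $(\bar{a})$ (row a = 0) and $(a)$ (row a = 1).
e) Karnaugh map of w (columns: bc, rows: a):
        bc=00  01  11  10
  a=0      0    0   1   1     → $(\bar{a})[b]$
  a=1      0    1   1   0     → $(a)[c]$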
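The symmetrical-equation notation can be made executable: each term is a (condition, value) pair, and exactly one condition must be true for each input combination. The Python helper below is a hypothetical illustration applied to the mux equation w = (a)[c] + (ā)[b]; it also regenerates the rows of Karnaugh map e).

```python
from itertools import product

def evaluate(terms, env):
    """Evaluate a symmetrical equation, a sum of (condition)[value] terms.

    Exactly one condition must hold for each input combination; the output
    then takes the bracketed value of that term.
    """
    values = [value(env) for cond, value in terms if cond(env)]
    assert len(values) == 1, f"conditions not exclusive/exhaustive at {env}"
    return values[0]

# w = (a)[c] + (a')[b]: 2:1 mux with select a, input b on 0 and c on 1
MUX = [(lambda e: e["a"] == 1, lambda e: e["c"]),
       (lambda e: e["a"] == 0, lambda e: e["b"])]

def kmap_row(a):
    """One row of the Karnaugh map of w, columns bc = 00, 01, 11, 10."""
    return [evaluate(MUX, {"a": a, "b": b, "c": c})
            for b, c in ((0, 0), (0, 1), (1, 1), (1, 0))]

for a, b, c in product((0, 1), repeat=3):
    assert evaluate(MUX, {"a": a, "b": b, "c": c}) == (c if a else b)
print(kmap_row(0), kmap_row(1))   # [0, 0, 1, 1] [0, 1, 1, 0]
```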



Synthesis of a complex CMOS gate with transmission
MOS

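[Figure: gate-matrix implementation of x with P-ch and N-ch transmission transistors (columns x, c, c̄, b, b̄, d, d̄, Vss).]

Karnaugh map of x (columns: ab, rows: cd):
         ab=00  01  11  10
  cd=00     0    0   0   0
  cd=01     0    1   1   0
  cd=11     0    0   0   0
  cd=10     0    1   1   0

$x = (\bar{b})[0] + (b\,\bar{c})[d] + (b\,c)[\bar{d}]$

$x_N = (\bar{b})[0] + (b\,\bar{c})[d] + (b\,c)[\bar{d}]$
$x_P = (b\,\bar{c})[d] + (b\,c)[\bar{d}]$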
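A Python check of the equations for x, assuming the Karnaugh map reads x = 1 exactly for b = 1 and cd = 01 or 10: x_N = (b̄)[0] + (b c̄)[d] + (b c)[d̄] must give the value of x for every input, and x_P, which omits the (b̄)[0] term, must still deliver every ‘1’. Leaving the ‘0’ to the N-ch transistors is consistent with P-ch pass transistors transmitting a good ‘1’ but a degraded ‘0’. The helper names are hypothetical.

```python
from itertools import product

def active(terms, env):
    """Bracketed values of all (condition)[value] terms whose condition holds."""
    return [value(env) for cond, value in terms if cond(env)]

# x_N = (b')[0] + (b c')[d] + (b c)[d']
X_N = [(lambda e: e["b"] == 0, lambda e: 0),
       (lambda e: e["b"] == 1 and e["c"] == 0, lambda e: e["d"]),
       (lambda e: e["b"] == 1 and e["c"] == 1, lambda e: 1 - e["d"])]
X_P = X_N[1:]   # x_P = (b c')[d] + (b c)[d'], without the (b')[0] term

def x_kmap(a, b, c, d):
    """Karnaugh map of x: '1' only for b = 1 and cd = 01 or 10."""
    return int(b == 1 and c != d)

def check():
    for a, b, c, d in product((0, 1), repeat=4):
        env = {"a": a, "b": b, "c": c, "d": d}
        n_vals = active(X_N, env)
        assert len(n_vals) == 1 and n_vals[0] == x_kmap(a, b, c, d)
        if x_kmap(a, b, c, d) == 1:
            # every '1' of x is also delivered through a term of x_P
            assert active(X_P, env) == [1]
    return True

print(check())   # True
```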



Gate-Matrix Incrementer (pass-MOS)

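[Figure: gate-matrix layout of the incrementer (columns Co, s, cin and its complement, a and its complement, Vss; P-ch and N-ch rows).]

Karnaugh maps (columns: a, rows: cin):
  s         a=0  a=1          Co        a=0  a=1
  cin=0      0    1           cin=0      0    0
  cin=1      1    0           cin=1      0    1

$s = (c_{in})[\bar{a}] + (\bar{c}_{in})[a]$

$C_o = (\bar{c}_{in})[0] + (c_{in})[a]$

The sum s is an XOR gate with only 4 transistors!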
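The two cell equations can be exercised in Python. The sketch assumes that an n-bit incrementer is obtained by chaining the cells, with Co of bit i driving cin of bit i+1 and cin = 1 at the least significant bit; the function names and the 8-bit width are illustrative.

```python
def inc_cell(a, cin):
    """One incrementer bit from the symmetrical equations
    s = (cin)[a'] + (cin')[a] and Co = (cin')[0] + (cin)[a]."""
    s = (1 - a) if cin else a
    co = a if cin else 0
    return s, co

def increment(x, width=8, cin=1):
    """Ripple the carry through `width` cells, LSB first."""
    result, carry = 0, cin
    for i in range(width):
        s, carry = inc_cell((x >> i) & 1, carry)
        result |= s << i
    return result, carry

# Each cell is a half adder: s = a XOR cin, Co = a AND cin.
assert all(inc_cell(a, c) == (a ^ c, a & c) for a in (0, 1) for c in (0, 1))
print(increment(0b0111))   # (8, 0)
print(increment(0xFF))     # (0, 1)
```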
cin



N-ch Transmission Transistor

[Figure: N-ch transmission transistor (gate s, input a, output z) passing a «1»: Out = Vdd - VT. Annotations: «VTP: P-ch off, N-ch on», «P-ch off, VGS = 0!», «VTN: N-ch off».]



Gate Matrix Circuit (CSEM 1985)



C2MOS gate versus transmission gate
[Figure: master and slave of a D flip-flop (input D, clock CK, 26 MOS) built with C2MOS gates, versus a version with transmission gates.]
Drawbacks of Transmission Gates

• However, in our opinion, it is not good practice to use transmission MOS or transmission gates.
• They have been extensively used in N-MOS technologies and are often used in CMOS technologies, but they present some problems for modeling and simulation.
• One has to know which gate drives a transmission gate to determine its gate delay.
• If the gate delay depends on the previous driver gate, this is a major problem for cell characterization.
• This is why one has to place inverters or buffers at each input of a standard cell containing transmission gates.
• The advantage of the reduced transistor count is then lost.



Other Logic Families
• The static CMOS style is very simple and generally the most efficient in speed and power consumption.
• Dynamic and precharged logic styles, as well as dual-rail logic, consume much more.
• N-MOS pass-transistor logic (quite good for MUX and XOR) has a problem at low Vdd: a ‘1’ only reaches Vdd - VT at the output.
• This is why one has to add a MOS keeper; this is SPL logic, i.e. Single Pass-Transistor Logic.

[Figure: XOR with N-MOS pass transistors: only an N-ch network → 2 MOS!]
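A first-order Python illustration of why the Vdd - VT level hurts at low Vdd: the absolute loss stays VT while the supply shrinks, so the relative output swing falls. VT = 0.45 V is a hypothetical value, and the body effect, which makes the loss larger in practice, is ignored.

```python
def pass_high(vdd, vt):
    """Highest level an N-ch pass transistor delivers for a logic '1'
    (first-order model, body effect ignored)."""
    return max(vdd - vt, 0.0)

VT = 0.45   # hypothetical threshold voltage (V)
for vdd in (3.3, 1.8, 1.2, 0.9):
    high = pass_high(vdd, VT)
    print(f"Vdd = {vdd:.1f} V: '1' = {high:.2f} V ({100 * high / vdd:.0f} % of Vdd)")
```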



SPL: Single Rail Pass Transistor

SPL could compete with static CMOS at high Vdd, but not at low Vdd, where SPL is slower and consumes much more than static CMOS.

[Figure: SPL full adder.]



Dual Rail Logic

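[Figure: dual-rail logic: a precharged gate (P-MOS precharge) and a Voltage Switch Logic gate, each with dual N-MOS networks driven by the inputs and producing S and $\bar{S}$.]

- precharged logic
- dual rail (S and $\bar{S}$)
- dual networks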



CPL (Dual-Rail) versus CMOS

[Figure: MUX in Complementary Pass-Transistor Logic (CPL) versus static CMOS.]

Two N-MOS networks, one for each rail.



Differential Cascode Voltage Switch Logic

DCVSL (dual-rail) is ratioed: the N-ch networks have to fight against the P-ch loads.



Precharged Logic

[Figure: precharged gates a) to d), with PR-controlled precharge and evaluation transistors around an N-ch or a P-ch network, output S; precharge and evaluation phases.]



Problem: Logic Gates in Series

• Consider precharged gates in series with the same precharge signal PR.
• Such a circuit cannot work.
• At the end of the precharge, the output S1 of the first gate is ‘1’, and all the outputs of the other gates that are also inputs to the second gate are at ‘1’ as well.
• As a result, the N-ch network of the second gate is conducting.
• In the evaluation phase, before some inputs such as S1 can switch to ‘0’ to cut off the N-ch network of the second gate (assuming that the correct final output is S2 = 1), this N-ch network is conducting and discharges the output S2.
• This N-ch network will be cut off later on, but it is too late: there is no P-ch network to charge S2 back to ‘1’, so the output S2 is wrong.
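The failure can be reproduced with a small time-stepped Python model under simplifying, hypothetical assumptions: gate 1 needs one time step to discharge, gate 2 has a single N-ch transistor driven by x, and a dynamic node, once discharged, cannot recover during evaluation. With a direct connection x = S1, the second node is discharged by the precharged ‘1’ before S1 falls; inserting an inverter (x = NOT S1, the principle of Domino logic) gives the settled result for every input.

```python
from itertools import product

def evaluate_pair(gate1_inputs, domino, steps=3):
    """Time-stepped evaluation phase of two precharged gates sharing PR.

    Gate 1: dynamic node n1, N-ch network = its inputs in series (AND),
            which needs one time step to discharge n1.
    Gate 2: dynamic node n2, N-ch network = one transistor driven by x,
            with x = n1 (direct connection) or x = NOT n1 (Domino inverter).
    Dynamic nodes start precharged to '1' and can only be discharged.
    """
    n1 = n2 = 1
    gate1_on = all(gate1_inputs)
    for _ in range(steps):
        x = (1 - n1) if domino else n1
        if x:                 # gate-2 N-ch network conducts: charge on n2 is lost
            n2 = 0
        if gate1_on:          # gate 1 has discharged after the first step
            n1 = 0
    return n2

def expected(gate1_inputs, domino):
    """Value of n2 once all signals have settled (static reasoning)."""
    n1 = 0 if all(gate1_inputs) else 1
    x = (1 - n1) if domino else n1
    return 1 - x

for inputs in product((0, 1), repeat=2):
    print(inputs,
          "series:", evaluate_pair(inputs, False), "expected:", expected(inputs, False),
          "| domino:", evaluate_pair(inputs, True), "expected:", expected(inputs, True))
```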



Precharged gates connected in series

[Figure: precharged gates connected in series with the same PR: two N-ch gates in series (impossible), and an N-ch gate followed by a P-ch gate (ideal); outputs S1 and S2.]



Domino Logic

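[Figure: Domino logic: two precharged N-ch gates (PR) with outputs S1 and S2, each followed by a static inverter.]

Inverters are added so that, after the precharge, the inputs of the next N-ch network are at ‘0’ and this network stays cut off.

Krambeck and Law, authors of the first paper on Domino logic, received an award at ISSCC 2000, although this paper on Domino logic had been rejected by the same conference 20 years earlier.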



Precharged NORA gate

[Figure: NORA gates: PR-precharged stages alternating N-ch and P-ch networks; each N-ch stage drives a P-ch network and vice versa.]

The name NORA means NO RAce. This logic style is similar to Domino logic and to the logic using alternate N-ch and P-ch networks; the only difference is a dynamic latch (C2MOS gate) at the output.



Problems in Dynamic Logic

• An important problem of precharged logic is the dynamic output node, which presents a smaller and smaller parasitic capacitance in deep submicron technologies. This means that the lowest operating frequency (several tens of kHz in 2 to 4 µm technologies) becomes larger and larger and gets quite close to the highest frequency.

• Another problem is the activity: an output that stays at ‘0’ for a long period is precharged to ‘1’ at each clock cycle, resulting in 100% activity. This means that the power consumption of precharged circuits is higher than that of static circuits.
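Counting output transitions makes the activity argument concrete. In this Python sketch (an idealized model: the node starts precharged and glitches are ignored; function names are illustrative), a static output that stays at ‘0’ never switches, while a precharged output toggles twice per clock cycle after the first one.

```python
def static_transitions(values):
    """Output transitions of a static gate: one per change of the value."""
    return sum(a != b for a, b in zip(values, values[1:]))

def precharged_transitions(values):
    """Output transitions of a precharged gate: the node is set to '1' in each
    precharge phase and discharged in the evaluation phase when the result is '0'."""
    node, count = 1, 0           # assume the node starts precharged
    for v in values:
        if node == 0:            # precharge: 0 -> 1
            node, count = 1, count + 1
        if v == 0:               # evaluation: 1 -> 0
            node, count = 0, count + 1
    return count

held_zero = [0] * 10             # output stays at '0' for 10 cycles
print(static_transitions(held_zero), precharged_transitions(held_zero))  # 0 19
```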



Charge Sharing

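[Figure: charge sharing in a precharged gate (inputs a, b, c, d; output capacitance Cout; internal capacitances Cn and Cb) and in a clocked (CK) gate followed by an inverter I.]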
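A first-order estimate of the charge-sharing loss uses charge conservation between the precharged output capacitance Cout and an internal node capacitance such as Cn, assuming the internal node starts at Vn (0 V by default) and ignoring the threshold drop of the connecting transistor: V = (Cout·Vdd + Cn·Vn) / (Cout + Cn). The Python function and the capacitance values are hypothetical.

```python
def shared_voltage(vdd, c_out, c_int, v_int=0.0):
    """Voltage after the precharged output node (c_out at vdd) is connected
    to an internal node (c_int at v_int): charge conservation, first order."""
    return (c_out * vdd + c_int * v_int) / (c_out + c_int)

# Hypothetical values: Vdd = 1.8 V, Cout = 20 fF, Cn = 10 fF
print(f"{shared_voltage(1.8, 20e-15, 10e-15):.2f} V")   # 1.20 V
```

A larger Cout relative to the internal capacitances keeps the output closer to Vdd.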



Low-Power and Standard Cells

• To write low-power VHDL code:


• Gated Clock
• Latch based VHDL
• Standard Cell Libraries
• Gate Sizing
• Cell Categories
• Gate decomposition
• Clock Skew, Clock Tree, Asynchronous
• Low-Power Standard Cell Library (for synthesis)



ALU and Gated Clocks

See also A. Amara: Power Aware Design Techniques

• To minimize the activity of a combinational circuit (ALU), data registers are located at the inputs of the ALU. They are loaded at the same time → very few transitions in the ALU.
• These registers are at the same time pipeline registers (a pipeline for free!).
• The pipeline gated clock mechanism does not result in a more complex architecture, but reduces the power.

[Figure: 8-bit datapath: ABus<8> and BBus<8> loaded into the gated-clock registers REG0 and REG1 at the ALU inputs (gated clock control ctr), ALU<8>, ACCU, RAM Index H/L, ROM Index H/L, Status Register (CY, Z), S Bus<8>.]
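A toy Python model of why loading both ALU input registers in the same cycle saves transitions: the ALU is reduced to an 8-bit adder and only its settled output words are compared (internal glitches are ignored; operand values and names are hypothetical). Loading A and then B produces an intermediate result; loading both at once goes straight to the final sum.

```python
def toggles(words, width=9):
    """Total number of bit transitions between consecutive output words."""
    mask = (1 << width) - 1
    return sum(bin((a ^ b) & mask).count("1") for a, b in zip(words, words[1:]))

def alu_outputs(a_regs, b_regs):
    """Settled ALU output (here: an 8-bit add with carry) for each cycle."""
    return [a + b for a, b in zip(a_regs, b_regs)]

# Operands go from (0, 0) to (5, 3).
separate = alu_outputs([0, 5, 5], [0, 0, 3])   # A loaded, then B one cycle later
together = alu_outputs([0, 5], [0, 3])         # both registers loaded together
print(separate, toggles(separate))   # [0, 5, 8] 5
print(together, toggles(together))   # [0, 8] 1
```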



Latch-based Clocking Scheme Principle

Latch-based:
- I.P. cores more robust
- better clock skew tolerance
- smaller, but 2 clock trees
- more temporal barriers, fewer glitches
- fewer MOS than master-slave flip-flops

[Figure: combinational circuit with inputs, followed by latches (DQ) clocked alternately by Ø1 and Ø2, compared with master-slave flip-flops clocked by CK.]

The skew between Ø1 pulses has to be less than 1/2 period: very robust.



Clock Gating for Latch-based design

[Figure: a chain of combinational circuits separated by latches clocked alternately by Clock1 and Clock2.]

- logic circuit between each pair of latches, better performance
- very natural and safe clock gating methodology
- gives glitch-free clock signals without adding memory elements (generally required for DFF)
- Synopsys handles the latch-based design method very nicely (time borrowing, clock speed, …)
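A sampled-waveform Python sketch of why clock gating is safe in a latch-based design, assuming that the enable of a Clock1 gate is produced by a latch transparent on Clock2, so it can only change while Clock1 is low. An AND-gated Clock1 then contains only complete Clock1 pulses, while an enable that changes during a Clock1 pulse produces shortened pulses (glitches). All waveforms and names are hypothetical.

```python
CK1 = [1, 1, 0, 0, 0, 0] * 3   # Clock1: high 2 samples out of 6
CK2 = [0, 0, 0, 1, 1, 0] * 3   # Clock2: non-overlapping with Clock1

# Enable for periods 0, 1, 2 = 1, 0, 1.
# Latch-based: the enable only changes while CK2 is high (samples 3 and 9).
EN_LATCHED = [1, 1, 1, 0, 0, 0,  0, 0, 0, 1, 1, 1,  1, 1, 1, 1, 1, 1]
# Counter-example: an enable that changes while CK1 is high (samples 1 and 7).
EN_RACY = [1, 0, 0, 0, 0, 0,  0, 1, 1, 1, 1, 1,  1, 1, 1, 1, 1, 1]

def gate(clock, enable):
    """AND-gate clock gating, sample by sample."""
    return [c & e for c, e in zip(clock, enable)]

def pulses(wave):
    """(start, end) index pairs of the runs of '1' in a waveform."""
    runs, start = [], None
    for i, v in enumerate(wave + [0]):   # trailing 0 closes a final run
        if v and start is None:
            start = i
        elif not v and start is not None:
            runs.append((start, i))
            start = None
    return runs

def glitch_free(clock, gated):
    """True if every gated pulse is a complete clock pulse (none shortened)."""
    return set(pulses(gated)) <= set(pulses(clock))

def changes_only_when_high(enable, clock):
    """True if the enable only changes at samples where `clock` is high."""
    return all(clock[i] == 1 for i in range(1, len(enable)) if enable[i] != enable[i - 1])

print(glitch_free(CK1, gate(CK1, EN_LATCHED)))   # True
print(glitch_free(CK1, gate(CK1, EN_RACY)))      # False
```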



Transistor Sizing

[Figure: delay, rise time and fall time versus the N-ch transistor width W.]



Several Cell Categories

[Figure: a frequency divider (ƒ → ƒ/2, :2 stages) and a mux (C mux), implemented with a very fast cell versus a set of cells with smaller transistors.]



NAND6 decomposition

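[Figure: a) NAND6 with inputs A1 to A6 and output ZN; b) NAND6 decomposed into two NAND3 gates, a NOR2 and an inverter.]

a) Delay: $36\,\delta + 6\,\Delta$
b) Delay: $9\,\delta + 4\,\delta + \delta + \Delta = 14\,\delta + \Delta$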
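The two delay expressions follow a simple pattern, stated here as an assumption that reproduces the numbers above: a stage with n transistors in series costs n²·δ, and the last stage, which drives the output load, adds n·Δ. A Python sketch with δ = Δ = 1 (arbitrary units); the function name is hypothetical.

```python
def chain_delay(stage_sizes, delta=1.0, big_delta=1.0):
    """Assumed model: a stage with n series transistors costs n**2 * delta,
    and the last stage, which drives the output load, adds n * big_delta."""
    intrinsic = sum(n ** 2 * delta for n in stage_sizes)
    return intrinsic + stage_sizes[-1] * big_delta

print(chain_delay([6]))         # NAND6: 36 delta + 6 Delta -> 42.0
print(chain_delay([3, 2, 1]))   # NAND3 -> NOR2 -> inverter: 14 delta + Delta -> 15.0
```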



Delays

Gate (at 3.3 V)                       Delay
_________________________________________________________
NAND6 not decomposed                  0.70 ns
NAND6 decomposed                      0.42 ns
NOR6 not decomposed                   1.81 ns
NOR6 decomposed (2 NOR3 + NAND2)      0.65 ns
NOR6 decomposed (3 NOR2 + NAND3)      0.53 ns



BUS Driver

[Figure: BUS driver for 8 sources (N0 to N7, Z0 to Z7): individual drivers connected to the BUS, and the same function as one CMOS complex gate between Vdd and Vss.]



Hierarchical Bus Driver

[Figure: hierarchical BUS driver built from 2:1 muxes (mx) with inputs 0 and 1.]

The hierarchical BUS driver contains 4 hierarchical levels with smaller and smaller transistors. Rise and fall delays are not symmetrical, but there are 4 delays in series. If the supply voltage is reduced to 3.0 V, the energy per transition is reduced by about a factor of 3.

16 sources, 64 MOS, Cload = 0.5 pF, very large transistors
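The factor of 3 follows from the quadratic dependence of switching energy on the supply voltage. The slide does not give the original supply; assuming 5 V (an assumption), the ratio is (5/3)² ≈ 2.8. The Python sketch uses the 0.5 pF bus load from the slide and the energy C·Vdd² drawn from the supply per charging transition; with the other common convention, C·Vdd²/2, the ratio is the same.

```python
C_LOAD = 0.5e-12    # bus load from the slide (F)
V_HIGH = 5.0        # ASSUMPTION: original supply, not given on the slide
V_LOW = 3.0         # reduced supply from the slide

def energy_per_transition(c, vdd):
    """Energy drawn from the supply to charge c from 0 to vdd: C * Vdd**2."""
    return c * vdd ** 2

e_high = energy_per_transition(C_LOAD, V_HIGH)
e_low = energy_per_transition(C_LOAD, V_LOW)
print(f"{e_high * 1e12:.1f} pJ -> {e_low * 1e12:.1f} pJ, ratio {e_high / e_low:.2f}")
# 12.5 pJ -> 4.5 pJ, ratio 2.78
```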



Fewer Cells and Better Speed Performance

                 CSEL_LIB 4.x           CSEL_LIB 5.0
                 delay      area        delay      area
32-b Multiply    16.43 ns   907K µm²    12.15 ns   999K µm²
fp Adder         27.73 ns   510K µm²    21.15 ns   548K µm²
CoolRISC ALU     10.79 ns   140K µm²     7.66 ns   170K µm²

Low-Power Standard Cell Libraries:

- CSEL_LIB 4.x: 60 functions, 220 layouts
- CSEL_LIB 5.0: 22 functions, 92 layouts



Fewer Cells and Similar Silicon Area

                 CSEL_LIB 4.x           CSEL_LIB 5.0
                 delay      area        delay      area
32-b Multiply    17.09 ns   868K µm²    17.0 ns    830K µm²
fp Adder         28.09 ns   484K µm²    28.0 ns    472K µm²
CoolRISC ALU     11.0 ns    139K µm²    10.97 ns   118K µm²

CSEL_LIB 4.x: 60 functions, 220 layouts

CSEL_LIB 5.0: 22 functions, 92 layouts

CSEM library available in TSMC 0.18 µm
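The relative changes between the two libraries can be computed directly from the two comparison tables. A small Python sketch; the dictionary layout is an illustrative choice and the numbers are copied from the tables (delays in ns, areas in K µm²).

```python
# (delay in ns, area in K um^2) for CSEL_LIB 4.x and CSEL_LIB 5.0
SPEED = {"32-b Multiply": ((16.43, 907), (12.15, 999)),
         "fp Adder": ((27.73, 510), (21.15, 548)),
         "CoolRISC ALU": ((10.79, 140), (7.66, 170))}
AREA = {"32-b Multiply": ((17.09, 868), (17.0, 830)),
        "fp Adder": ((28.09, 484), (28.0, 472)),
        "CoolRISC ALU": ((11.0, 139), (10.97, 118))}

def change(old, new):
    """Relative change in percent (negative = reduction)."""
    return 100.0 * (new - old) / old

for title, table in (("Speed comparison", SPEED), ("Area comparison", AREA)):
    print(title)
    for name, ((d4, a4), (d5, a5)) in table.items():
        print(f"  {name:14s} delay {change(d4, d5):+6.1f} %  area {change(a4, a5):+6.1f} %")
```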



Thank you for your attention.
