Logic Design Part 2

Low-Power Logic Design
Christian Piguet
CSEM & EPFL, Neuchâtel & Lausanne, Switzerland
Logic Families and Standard Cells
Logic Families
Static CMOS Logic, Branch-based Logic, Transmission Gates, N-Pass Logic,
Dynamic Precharged Logic
Low-Power and Standard Cells

Gated Clocks, Latch-based Designs, Cell Drives, Complex Gate
Decomposition, Standard Cell Libraries
Short PPT title | Author | Page 1

CMOS Gates: Conduction Functions
CMOS Combinatorial Circuit

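[Figure: inputs X1, X2, ..., Xn drive a P-ch network and an N-ch network that share the output Z]

Two networks of opposite type, in which the two conduction functions are complementary.

Conduction functions:
• $C_0(X_1, X_2, \ldots, X_n)$ for the N-ch network
• $C_1(X_1, X_2, \ldots, X_n)$ for the P-ch network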

Separated Simplification Method
• One has to derive the symmetrical logical equation from the Karnaugh map of the logic function
• This method, which performs the Boolean simplification twice, once for the N-ch network and once for the P-ch network, is called the « separated simplification » method:
   • in a first step, one considers the blocks of ‘0’ in the Karnaugh map to generate the N-ch network
   • in a second step, one considers the blocks of ‘1’ in the Karnaugh map to generate the P-ch network
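
The two steps above can be sketched in a few lines of Python. This is only an illustration, not the author's tool: the helper name `separated_cells` and the choice of a 2-input NAND as the example function are assumptions. The sketch lists the ‘0’ cells (which feed the N-ch network) and the ‘1’ cells (which feed the P-ch network); the Boolean simplification of each group is left to the designer.

```python
from itertools import product

def separated_cells(func, n_inputs):
    """Split the truth table of func into its '0' cells and its '1' cells.

    The '0' cells are the ones the N-ch network must cover,
    the '1' cells are the ones the P-ch network must cover.
    """
    zeros, ones = [], []
    for bits in product((0, 1), repeat=n_inputs):
        (ones if func(*bits) else zeros).append(bits)
    return zeros, ones

def nand2(a, b):
    """Example function (hypothetical choice): 2-input NAND."""
    return int(not (a and b))

zeros, ones = separated_cells(nand2, 2)
print("cells for the N-ch network:", zeros)
print("cells for the P-ch network:", ones)
```

For the NAND, the single ‘0’ cell gives a series N-ch network and the three ‘1’ cells give a parallel P-ch network.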

Synthesis of an inverter with the “separated simplification method”
Example: inverter

  a | z
  0 | 1
  1 | 0

$z = (a)[0] + (\bar{a})[1]$
$z_N = a$ (N-ch transistor between z and Vss)
$z_P = \bar{a}$ (P-ch transistor between z and Vdd)

Synthesis of a NAND gate with the separated simplification method
Other example: NAND gate

  z     | b = 0 | b = 1
  a = 0 |   1   |   1
  a = 1 |   1   |   0

$z = (a\,b)[0] + (\bar{a} + \bar{b})[1]$
$z_N = a\,b$ (two N-ch transistors in series)
$z_P = \bar{a} + \bar{b}$ (two P-ch transistors in parallel)

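Complex CMOS gate generated by the separated simplification method

Karnaugh map of z (columns ab, rows c):

  c \ ab | 00 | 01 | 11 | 10
  0      |  1 |  1 |  0 |  1
  1      |  0 |  1 |  1 |  1

$z_N = a\,b\,\bar{c} + \bar{a}\,\bar{b}\,c$
$z_P = \bar{a}\,b + a\,c + \bar{b}\,\bar{c}$
$z = (a\,b\,\bar{c} + \bar{a}\,\bar{b}\,c)[0] + (\bar{a}\,b + a\,c + \bar{b}\,\bar{c})[1]$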
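
A quick way to check a pair of conduction functions is to verify that, for every input combination, exactly one network conducts and that the conducting network agrees with the Karnaugh map. The sketch below does this for the 3-input gate of this slide. The map values are taken from the slide; the placement of the complement bars in `z_n` and `z_p` is an assumption (the bars are not legible in the source), chosen so that the covers match the map.

```python
from itertools import product

# Karnaugh map of z from the slide, keyed by (a, b, c).
K_MAP = {
    (0, 0, 0): 1, (0, 1, 0): 1, (1, 1, 0): 0, (1, 0, 0): 1,
    (0, 0, 1): 0, (0, 1, 1): 1, (1, 1, 1): 1, (1, 0, 1): 1,
}

def z_n(a, b, c):
    """N-ch conduction function: covers the '0' cells (assumed bar placement)."""
    return bool((a and b and not c) or (not a and not b and c))

def z_p(a, b, c):
    """P-ch conduction function: one cover of the '1' cells (assumed bar placement)."""
    return bool((not a and b) or (a and c) or (not b and not c))

def networks_match_map():
    """True if the two networks are complementary and reproduce the map."""
    for cell in product((0, 1), repeat=3):
        pulls_low, pulls_high = z_n(*cell), z_p(*cell)
        if pulls_low == pulls_high:  # both on (short circuit) or both off (floating)
            return False
        if K_MAP[cell] != int(pulls_high):
            return False
    return True

print(networks_match_map())
```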

Branch-based Static Logic

Karnaugh map of S (columns AB, rows CD):

  CD \ AB | 00 | 01 | 11 | 10
  00      |  1 |  1 |  0 |  1
  01      |  1 |  0 |  0 |  1
  11      |  0 |  0 |  1 |  1
  10      |  1 |  1 |  1 |  1

[Figure: the same gate drawn without branches and with branches, between Vdd, the output S and Vss]

Synthesis method: both N-ch and P-ch networks are sums of products

$S_{N\text{-ch}} = \bar{A}\,C\,D + A\,B\,\bar{C} + \bar{A}\,B\,D$
$S_{P\text{-ch}} = A\,C + \bar{B}\,\bar{C} + \bar{A}\,\bar{D}$
(each variable is inverted)
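
A branch-based gate can be modelled as two lists of branches, each branch being a set of serial transistors that all have to conduct. The sketch below is an illustration under assumptions: the complement bars of the two sums of products are reconstructed from the Karnaugh map of this slide, and each branch is described by the input levels for which it conducts (for the P-ch network, the physical gate signals are the inverted variables). It checks the branches against the map and counts the transistors.

```python
GRAY = ("00", "01", "11", "10")  # Karnaugh column order for AB

# Karnaugh map from the slide: row CD -> values of S for AB in GRAY order.
S_MAP = {"00": "1101", "01": "1001", "11": "0011", "10": "1111"}

# Each branch lists the input levels for which all its serial transistors conduct.
N_BRANCHES = [{"A": 0, "C": 1, "D": 1}, {"A": 1, "B": 1, "C": 0}, {"A": 0, "B": 1, "D": 1}]
P_BRANCHES = [{"A": 1, "C": 1}, {"B": 0, "C": 0}, {"A": 0, "D": 0}]

def conducts(branch, inputs):
    """A branch conducts only if every serial transistor in it conducts."""
    return all(inputs[name] == level for name, level in branch.items())

def output(inputs):
    """Return S: 0 if an N-ch branch pulls to Vss, 1 if a P-ch branch pulls to Vdd."""
    pull_down = any(conducts(b, inputs) for b in N_BRANCHES)
    pull_up = any(conducts(b, inputs) for b in P_BRANCHES)
    if pull_down == pull_up:
        raise ValueError("networks are not complementary for %r" % inputs)
    return 1 if pull_up else 0

def check_against_map():
    for cd, row in S_MAP.items():
        for ab, expected in zip(GRAY, row):
            inputs = dict(zip("ABCD", map(int, ab + cd)))
            if output(inputs) != int(expected):
                return False
    return True

transistor_count = sum(len(b) for b in N_BRANCHES + P_BRANCHES)
print(check_against_map(), transistor_count)
```

Because each branch is independent, the worst-case charge or discharge path is always a single branch, which is what makes the timing model simple.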

Branch-Based Logic
[Figure: branch-based networks drawn as parallel branches of serial transistors between Vdd and the output S]

Advantages
• The « separated simplification » method provides several advantages:
   • the two structural expressions can both be written as sums of products
   • the transistor networks can therefore be implemented as branches of serial transistors connected between the output and Vdd or Vss
   • a more regular layout
   • very simple timing models, i.e. the output capacitance is charged or discharged through one branch
   • better testability

Dual Topological Method
[Figure: complex gate with inputs a, b, c and output z; the P-ch network is the topological dual of the N-ch network]

• For a simple combinatorial circuit, the N-ch and P-ch networks are dual
• One can obtain the N-ch network by using Karnaugh simplification and then derive the P-ch network by topological duality
• It is the most used method
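
Topological duality can be sketched as a recursive swap of series and parallel connections. In the illustration below, a network is a nested tuple `("ser", ...)` or `("par", ...)` whose leaves are input names; this representation and the example network a·(b + c) are assumptions, not taken from the slide. The check confirms that the dual P-ch network (whose transistors conduct on ‘0’) conducts exactly when the N-ch network does not.

```python
from itertools import product

def dual(net):
    """Swap series and parallel connections; leaves (transistors) are kept."""
    if isinstance(net, str):
        return net
    kind, *children = net
    return ("par" if kind == "ser" else "ser", *(dual(c) for c in children))

def conducts(net, inputs, on_level):
    """Evaluate a network whose transistors conduct when their input equals on_level."""
    if isinstance(net, str):
        return inputs[net] == on_level
    kind, *children = net
    combine = all if kind == "ser" else any
    return combine(conducts(c, inputs, on_level) for c in children)

def is_complementary(n_net, p_net, names):
    """True if exactly one of the two networks conducts for every input vector."""
    for values in product((0, 1), repeat=len(names)):
        inputs = dict(zip(names, values))
        if conducts(n_net, inputs, 1) == conducts(p_net, inputs, 0):
            return False
    return True

# Hypothetical example: N-ch network a in series with (b parallel c).
N_NET = ("ser", "a", ("par", "b", "c"))
P_NET = dual(N_NET)

print(P_NET)
print(is_complementary(N_NET, P_NET, ("a", "b", "c")))
```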

CMOS logic styles
[Figure: three CMOS logic styles built from P-ch and N-ch transistors: static logic, C2MOS gate and transmission gate]

Synthesis with Transmission Gates
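[Figure: a) 2:1 mux built from P-ch/N-ch transmission gates controlled by a, with data inputs b and c and output w; b) MUX symbol with b on input 0, c on input 1, select a and output w]
Analysis: example with a 2:1 mux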

Gate-Matrix-like Electrical 2:1 Mux
c) [Figure: gate-matrix layout of the mux with columns for w, a (true and complemented), c and b, and P-ch and N-ch rows]
$w_N = (a)[c] + (\bar{a})[b]$
$w_P = (a)[c] + (\bar{a})[b]$
Symmetrical equation: $w = (a)[c] + (\bar{a})[b]$

Karnaugh Maps of 2:1 Mux
[Figure: d) and e) Karnaugh maps of w with rows a and columns bc (00, 01, 11, 10); the row a = 0 is covered by $(\bar{a})[b]$ and the row a = 1 by $(a)[c]$]
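
The symmetrical equation w = (a)[c] + (ā)[b] reads as "when a conducts, pass c; when ā conducts, pass b". The sketch below evaluates it in that form and prints the two Karnaugh rows. It is an illustration only; the function names are hypothetical, and the check that exactly one branch conducts reflects the requirement that the output is never floating and never driven by two sources.

```python
# (b, c) pairs in Karnaugh column order 00, 01, 11, 10.
GRAY = ((0, 0), (0, 1), (1, 1), (1, 0))

def mux_w(a, b, c):
    """w = (a)[c] + (not a)[b]: the conducting branch passes its data input."""
    branches = [(a == 1, c), (a == 0, b)]
    driven = [value for conducting, value in branches if conducting]
    if len(driven) != 1:
        raise ValueError("output floating or driven twice")
    return driven[0]

def karnaugh_rows():
    """Return {a: [w for bc = 00, 01, 11, 10]}."""
    return {a: [mux_w(a, b, c) for b, c in GRAY] for a in (0, 1)}

for a, row in karnaugh_rows().items():
    print("a =", a, row)
```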

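Synthesis of a complex CMOS gate with transmission MOS

Karnaugh map of x (columns ab, rows cd):

  cd \ ab | 00 | 01 | 11 | 10
  00      |  0 |  0 |  0 |  0
  01      |  0 |  1 |  1 |  0
  11      |  0 |  0 |  0 |  0
  10      |  0 |  1 |  1 |  0

[Figure: gate-matrix layout with columns for x, c, b and d (true and complemented) and Vss, and P-ch and N-ch rows]

$x = (\bar{b})[0] + (b\,\bar{c})[d] + (b\,c)[\bar{d}]$
$x_N = (\bar{b})[0] + (b\,\bar{c})[d] + (b\,c)[\bar{d}]$
$x_P = (b\,\bar{c})[d] + (b\,c)[\bar{d}]$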

Gate-Matrix Incrementer (pass-MOS)
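[Figure: gate-matrix layout with columns for Co, s, cin and a (true and complemented) and Vss, and P-ch and N-ch rows]

Karnaugh maps (columns a, rows cin):

  s       | a = 0 | a = 1        Co      | a = 0 | a = 1
  cin = 0 |   0   |   1          cin = 0 |   0   |   0
  cin = 1 |   1   |   0          cin = 1 |   0   |   1

$s = (\overline{cin})[a] + (cin)[\bar{a}]$
$Co = (\overline{cin})[0] + (cin)[a]$

Sum s is a XOR gate with 4 transistors!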
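
The two pass-MOS equations of one incrementer bit can be chained into a multi-bit incrementer. The sketch below is a behavioural illustration only (function names are hypothetical, and the complement bars in the equations are reconstructed from the Karnaugh maps): each cell passes a or its complement to s depending on cin, and passes 0 or a to Co.

```python
def incrementer_cell(a, cin):
    """One bit: s = (not cin)[a] + (cin)[not a], Co = (not cin)[0] + (cin)[a]."""
    s = a if cin == 0 else 1 - a
    co = 0 if cin == 0 else a
    return s, co

def increment(bits_lsb_first):
    """Ripple the carry through the cells; the first cell gets cin = 1 (add one)."""
    carry = 1
    result = []
    for a in bits_lsb_first:
        s, carry = incrementer_cell(a, carry)
        result.append(s)
    return result, carry

# 3 (binary 011, written LSB first) plus one gives 4 (binary 100).
print(increment([1, 1, 0]))
```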

N-ch Transmission Transistor
[Figure: an N-ch transmission transistor with «1» applied at its input; the output only reaches Vdd-VT. Annotations on the figure: VTP: P-ch off, N-ch on; VTN: N-ch off; P-ch off, VGS = 0!]

Gate Matrix Circuit (CSEM 1985)

C2MOS gate versus transmission gate
[Figure: master and slave of a D flip-flop (input D, clock CK, 26 MOS), drawn with C2MOS gates and with transmission gates]

Drawbacks of Transmission Gates
• However, in our opinion, it is not good practice to use transmission MOS or transmission gates.
• They have been used extensively in N-MOS technologies and often in CMOS technologies, but they cause problems for modeling and simulation.
• One has to know which gate drives a transmission gate to determine its delay.
• If the delay depends on the preceding driver gate, cell characterization becomes a major problem.
• This is why one has to place inverters or buffers on each input of a standard cell that contains transmission gates.
• The advantage of a reduced transistor count is then lost.

Other Logic Families
• The static CMOS style is very simple and generally the most efficient in terms of speed versus power consumption
• Dynamic and precharged logic styles, as well as dual-rail styles, consume much more power
• N-MOS pass-transistor logic (quite good for MUX & XOR) has a problem at low Vdd: a ‘1’ at the output only reaches Vdd-VT
• This is why one has to add a MOS keeper; the result is SPL logic, i.e. Single-rail Pass-transistor Logic

XOR with N-MOS pass transistors: only an N-ch network → 2 MOS!
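
The Vdd-VT problem becomes more severe as the supply is lowered, because the lost threshold is a growing fraction of the swing. The sketch below is a first-order illustration only: it ignores the body effect, and the threshold value of 0.5 V and the list of supply voltages are hypothetical numbers, not taken from the slides.

```python
def pass_transistor_high(vdd, vt):
    """First-order high level at the output of an N-ch pass transistor."""
    return vdd - vt

def swing_fraction(vdd, vt):
    """Fraction of the full supply swing that survives the pass transistor."""
    return pass_transistor_high(vdd, vt) / vdd

VT = 0.5  # hypothetical threshold voltage in volts

for vdd in (3.3, 1.8, 1.0):
    print(f"Vdd = {vdd:.1f} V -> high level = {pass_transistor_high(vdd, VT):.1f} V "
          f"({100 * swing_fraction(vdd, VT):.0f}% of Vdd)")
```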

SPL: Single-Rail Pass-Transistor Logic
• Could compete with static CMOS at high Vdd.
• But not at low Vdd, for which SPL is slower and consumes much more than static CMOS.
[Figure: SPL full adder]

Dual Rail Logic
Voltage Switch Logic:
- precharged logic
- dual rail (S and $\bar{S}$)
[Figure: P-MOS precharge transistors connected to Vdd above two dual N-MOS networks driven by the inputs, with outputs S and $\bar{S}$]

CPL (Dual-Rail) versus CMOS
Complementary Pass Transistor Logic
Two N-MOS networks for each rail
[Figure: MUX implemented in CPL and in CMOS]

Differential Cascode Voltage Switch Logic
DCVSL, dual-rail
Ratioed: the N-ch networks have to fight against the P-ch loads

Precharged Logic
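[Figure: four precharged gate variants a) to d), with precharge and evaluation phases; the precharge signal PR drives P-ch or N-ch precharge transistors around an N-ch or P-ch logic network with output S]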

Problem: Logic Gates in Series
• Consider precharged gates connected in series with the same precharge signal PR.
• Such a circuit cannot work.
• At the end of the precharge, the output S1 of the first gate is at ‘1’, and so are the outputs of all the other gates that are also inputs to the second gate.
• As a result, the N-ch network of the second gate is conducting.
• Assume that the final output state is S2 = 1. In the evaluation phase, before an input such as S1 can switch to ‘0’ and cut off the N-ch network of the second gate, this network conducts and discharges the output S2.
• The N-ch network is cut off later on, but too late: there is no P-ch network to charge S2 back to ‘1’, so the output S2 is false.
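
The failure described above can be reproduced with a very small discrete-time model. The sketch is an illustration under assumptions: each gate is reduced to a precharged inverter (a single N-ch transistor as logic network), every evaluation step uses the node values from the previous step, and a discharged node can never recover during evaluation. The function names are hypothetical.

```python
def evaluate_series(a, steps=3):
    """Two precharged inverters in series sharing the same PR signal.

    Gate 1 computes S1 = not a, gate 2 should compute S2 = not S1.
    """
    s1, s2 = 1, 1  # both outputs are at '1' at the end of the precharge
    for _ in range(steps):
        n1_on = (a == 1)   # N-ch network of gate 1
        n2_on = (s1 == 1)  # N-ch network of gate 2 still sees the old S1
        if n1_on:
            s1 = 0
        if n2_on:
            s2 = 0         # no P-ch network can charge S2 again
    return s1, s2

def ideal(a):
    """Expected static result for the same two inverters."""
    s1 = 1 - a
    return s1, 1 - s1

for a in (0, 1):
    print("a =", a, "dynamic:", evaluate_series(a), "expected:", ideal(a))
```

For a = 1 the expected result is S2 = 1, but the model ends with S2 = 0: the second gate has discharged before S1 fell.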

Precharged gates connected in series
[Figure: a) precharged gates with outputs S1 and S2 connected in series with the same PR signal, labelled «impossible»; b) a second arrangement of precharged gates, labelled «IDEAL»]

Domino Logic
Inverters are inserted to cut off the next N-ch network.
[Figure: two precharged N-ch gates in series with outputs S1 and S2 and the same PR signal]
Krambeck and Law, authors of the first paper on Domino logic, received an award at ISSCC 2000, although this paper had been rejected by the same conference 20 years earlier.
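
The effect of the inverters can be illustrated with a small discrete-time model of two Domino stages. This is a sketch under assumptions: each stage is a precharged node with a single N-ch transistor followed by an inverter, the primary input a is stable during evaluation, and every step uses the values of the previous step. The function name is hypothetical.

```python
def evaluate_domino(a, steps=3):
    """Two Domino stages in series; each stage output is the inverted dynamic node."""
    node1, node2 = 1, 1  # dynamic nodes precharged to '1', so both outputs start at '0'
    for _ in range(steps):
        out1 = 1 - node1        # inverter after the first dynamic node
        n1_on = (a == 1)        # N-ch network of stage 1
        n2_on = (out1 == 1)     # stage 2 is cut off until out1 rises
        if n1_on:
            node1 = 0
        if n2_on:
            node2 = 0
    return 1 - node1, 1 - node2

for a in (0, 1):
    print("a =", a, "outputs:", evaluate_domino(a))
```

Since every stage output starts at ‘0’, the next N-ch network stays off until its input really rises, and the stages fall one after the other like dominoes.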

Precharged NORA gate
[Figure: NORA gate chain with precharged N-ch and P-ch networks clocked by PR and its complement, followed by a C2MOS output latch whose output goes to N-ch or to P-ch networks]
The name NORA means NO RAce. This logic style is similar to Domino logic and to the logic using alternating N-ch and P-ch networks. The only difference is a dynamic latch (C2MOS gate) at the output.

Problems in Dynamic Logic
• An important problem of precharged logic is the dynamic output node, whose parasitic capacitance becomes smaller and smaller in deep submicron technologies. As a result, the lowest operating frequency (several tens of kHz in 2 to 4 µm technologies) becomes higher and higher and comes quite close to the highest frequency.
• Another problem is the activity: an output that stays at ‘0’ for a long period is precharged to ‘1’ at each clock cycle, resulting in 100% activity. The power consumption of precharged circuits is therefore higher than that of static circuits.
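
The activity argument can be made concrete by counting how often the output node has to be charged from the supply. The sketch below is an illustration only; the function names and the example sequence are hypothetical. A static output is charged only on a 0-to-1 change of the logical value, whereas a precharged output is recharged after every cycle in which it evaluated to ‘0’.

```python
def static_charge_events(outputs):
    """Number of 0 -> 1 transitions of a static gate output."""
    return sum(1 for prev, cur in zip(outputs, outputs[1:]) if prev == 0 and cur == 1)

def precharged_charge_events(outputs):
    """A precharged node is discharged in each cycle that evaluates to 0
    and must be recharged during the following precharge phase."""
    return sum(1 for value in outputs if value == 0)

# Logical output value per clock cycle: the output stays at '0' for a long period.
outputs = [0, 0, 0, 0, 0, 0, 0, 0]

print("static:", static_charge_events(outputs))
print("precharged:", precharged_charge_events(outputs))
```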

Charge Sharing
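[Figure: charge sharing in precharged gates clocked by PR and CK: the charge stored on the output capacitance Cout of node OUT is shared with internal node capacitances (Cn, Cb) of the logic network with inputs a, b, c, d]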
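
The size of the voltage drop caused by charge sharing follows from charge conservation between the precharged output capacitance and an internal node that becomes connected to it. The sketch below is an illustration with hypothetical capacitance and supply values; only the symbol Cout appears on the slide, and the internal node is assumed to be fully discharged before the sharing.

```python
def shared_voltage(vdd, c_out, c_internal, v_internal=0.0):
    """Voltage on OUT after its charge is shared with an internal node.

    Charge conservation: (c_out + c_internal) * V = c_out * vdd + c_internal * v_internal
    """
    return (c_out * vdd + c_internal * v_internal) / (c_out + c_internal)

VDD = 3.3          # volts (hypothetical)
C_OUT = 30e-15     # farads (hypothetical)
C_INTERNAL = 10e-15

v_after = shared_voltage(VDD, C_OUT, C_INTERNAL)
print(f"OUT drops from {VDD:.2f} V to {v_after:.3f} V without any path to Vss")
```

The smaller Cout is relative to the internal capacitances, the larger the drop, which is why shrinking output capacitances make the problem worse.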

Low-Power and Standard Cells
• To write low-power VHDL code:

• Gated Clock
• Latch based VHDL
• Standard Cell Libraries
• Gate Sizing
• Cell Categories
• Gate decomposition
• Clock Skew, Clock Tree, Asynchronous
• Low-Power Standard Cell Library (for synthesis)

ALU and Gated Clocks
See also A. Amara: Power Aware Design Techniques

• To minimize the activity of a combinational circuit (ALU), registers are located at the inputs of the ALU. They are loaded at the same time --> very few transitions in the ALU.
• These registers are at the same time pipeline registers (a pipeline for free!).
• The pipeline mechanism does not result in a more complex architecture, but reduces the power.

[Figure: datapath with A Bus<8>, B Bus<8> and S Bus<8>; the data registers REG0 and REG1, loaded by gated clocks (ctr), feed the ALU<8>; also shown: RAM Index H/L, ROM Index H/L, control, CY, Z, ACCU and Status Register]
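
The benefit of input registers with gated clocks can be illustrated by counting bit toggles at the ALU inputs. The sketch below is a simplified model with hypothetical bus values: without registers, the ALU inputs follow every change on the two buses, whereas with gated-clock registers they change only in the cycles where both registers are loaded together.

```python
def toggles(prev, cur):
    """Number of bits that differ between two bus values."""
    return bin(prev ^ cur).count("1")

def alu_input_toggles(bus_values, load_enables):
    """Return (toggles without input registers, toggles with gated-clock registers).

    bus_values: (a, b) pair present on the buses in each cycle.
    load_enables: True in the cycles where the gated clock loads both registers.
    """
    prev_bus = (0, 0)
    registers = (0, 0)
    direct = gated = 0
    for (a, b), load in zip(bus_values, load_enables):
        direct += toggles(prev_bus[0], a) + toggles(prev_bus[1], b)
        prev_bus = (a, b)
        if load:  # the gated clock only pulses in this cycle
            gated += toggles(registers[0], a) + toggles(registers[1], b)
            registers = (a, b)
    return direct, gated

bus_values = [(0x0F, 0x01), (0xF0, 0x02), (0x0F, 0x03), (0xAA, 0x55)]
load_enables = [False, False, False, True]
print(alu_input_toggles(bus_values, load_enables))
```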

Latch-based Clocking Scheme Principle
[Figure: combinational circuits between master and slave latches clocked by Ø1 and Ø2, with the Ø1 and Ø2 waveforms derived from CK]
- I.P. cores are more robust in latch-based design
- better clock skew tolerance
- smaller, but 2 clock trees
- more temporal barriers, fewer glitches
- reduced MOS count (master-slave)
Very robust: the skew between Ø1 pulses has to be less than 1/2 period.

Clock Gating for Latch-based design
[Figure: chain of latches clocked alternately by Clock1 and Clock2, with a combinational circuit between each pair of latches]
- logic circuit between each latch, more performance
- very natural and safe clock gating methodology
- gives glitch-free clock signals without adding memory elements (generally required for DFF)
- Synopsys handles the latch-based design method very nicely (time borrowing, clock speed, …)

Transistor Sizing
[Figure: delay, rise time and fall time as a function of the N-ch transistor width W]

Several Cell Categories
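[Figure: chain of divide-by-2 stages running at ƒ, ƒ/2, ...; a very fast cell is used at the highest frequency and a set of cells with smaller transistors at the lower frequencies; C mux]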

NAND6 decomposition
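[Figure: a) single NAND6 gate with inputs A1 to A6 and output ZN; b) the same function decomposed into smaller gates]
a) Delay: $36\,\delta + 6\,\Delta$
b) Delay: $9\,\delta + 4\,\delta + \delta + \Delta = 14\,\delta + \Delta$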
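
The δ coefficients on this slide (36, 9, 4, 1) are the squares of the number of serial transistors (6, 3, 2, 1). The sketch below uses that observation as its model; the quadratic rule is read off the slide's numbers and is an assumption of this sketch, and the Δ coefficients are copied from the slide rather than derived.

```python
def stack_delay(n_series):
    """Delay of a stack of n serial transistors, in units of delta (assumed n**2 rule)."""
    return n_series ** 2

# (delta coefficient, Delta coefficient) for each version.
undecomposed = (stack_delay(6), 6)
decomposed = (stack_delay(3) + stack_delay(2) + stack_delay(1), 1)

print("NAND6 not decomposed: %d delta + %d Delta" % undecomposed)
print("NAND6 decomposed:     %d delta + %d Delta" % decomposed)
```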

Delays

  gate (at 3.3 V)                  | delay
  ---------------------------------+---------
  NAND6 not decomposed             | 0.70 ns
  NAND6 decomposed                 | 0.42 ns
  NOR6 not decomposed              | 1.81 ns
  NOR6 decomposed (2NOR3+NAND2)    | 0.65 ns
  NOR6 decomposed (3NOR2+NAND3)    | 0.53 ns
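
The gain from decomposition is easier to compare as a speed-up factor. The short sketch below computes it from the delays listed on this slide; the dictionary layout and the function name are choices of this sketch.

```python
# Delays in ns at 3.3 V, copied from the slide.
DELAYS_NS = {
    "NAND6": {"not decomposed": 0.70, "decomposed": 0.42},
    "NOR6": {
        "not decomposed": 1.81,
        "decomposed (2NOR3+NAND2)": 0.65,
        "decomposed (3NOR2+NAND3)": 0.53,
    },
}

def speedups(delays):
    """Speed-up of each decomposed version relative to the undecomposed gate."""
    base = delays["not decomposed"]
    return {name: base / d for name, d in delays.items() if name != "not decomposed"}

for gate, delays in DELAYS_NS.items():
    for name, factor in speedups(delays).items():
        print(f"{gate} {name}: {factor:.2f}x faster")
```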

BUS Driver
[Figure: bus driver with sources 0 to 7 (signals N0 to N7 and Z0 to Z7) driving the BUS, implemented as a CMOS complex gate between Vdd and Vss]

Hierarchical Bus Driver
[Figure: tree of 2:1 multiplexers (mx) driving the BUS]
The hierarchical bus driver contains 4 hierarchical levels with smaller and smaller transistors. Rise and fall delays are not symmetrical, but one has 4 delays in series. If the supply voltage is reduced to 3.0 V, the energy per transition is reduced by about a factor of 3.
16 sources, 64 MOS, Cload = 0.5 pF, very large transistors

Fewer Cells and More Speed Performance

                | CSEL_LIB 4.x          | CSEL_LIB 5.0
                | delay    | area (µm²) | delay    | area (µm²)
  32-b Multiply | 16.43 ns | 907K       | 12.15 ns | 999K
  fp Adder      | 27.73 ns | 510K       | 21.15 ns | 548K
  CoolRISC ALU  | 10.79 ns | 140K       | 7.66 ns  | 170K

Low-Power Standard Cell Libraries:
- CSEL_LIB 4.x: 60 functions, 220 layouts
- CSEL_LIB 5.0: 22 functions, 92 layouts

Fewer Cells and Similar Silicon Area

                | CSEL_LIB 4.x          | CSEL_LIB 5.0
                | delay    | area (µm²) | delay    | area (µm²)
  32-b Multiply | 17.09 ns | 868K       | 17.0 ns  | 830K
  fp Adder      | 28.09 ns | 484K       | 28.0 ns  | 472K
  CoolRISC ALU  | 11.0 ns  | 139K       | 10.97 ns | 118K

- CSEL_LIB 4.x: 60 functions, 220 layouts
- CSEL_LIB 5.0: 22 functions, 92 layouts
CSEM Library in TSMC 0.18 µm available

Thank you for your attention.
