Low-Power Digital VLSI Design

1
LOW-POWER VLSI DESIGN: AN OVERVIEW
1.1 WHY LOW-POWER?

Historically, VLSI designers have used circuit speed as the "performance" metric. Large gains, in terms of performance and silicon area, have been made for digital processors, microprocessors, DSPs (Digital Signal Processors), ASICs (Application Specific ICs), etc. In general, "small area" and "high performance" are two conflicting constraints, and the IC designers' activities have been devoted to trading off these constraints. Power dissipation was not a design criterion but an afterthought. In fact, power considerations have long been the ultimate design criteria only in special portable applications such as wristwatches and pacemakers, where the objective was minimum power for maximum battery lifetime.
Recently, power dissipation has become an important design constraint.

Several reasons underlie the emergence of this issue. Among them we cite:
■ Battery-powered systems such as laptop/notebook computers, electronic organizers, etc. The need for low power in these systems arises from the need to extend battery life. Many portable electronics use rechargeable Nickel Cadmium (NiCd) batteries. Although the battery industry has been making efforts to develop batteries with higher energy capacity than that of NiCd, a dramatic increase does not seem imminent. The expected improvement of the energy density is 40% by the turn of the century. With recent NiCd batteries, the energy density is around 20 Watt-hour/pound and the voltage is around 1.2 V. So, for example, for a notebook consuming a typical power of 10 Watts and using 1.5 pounds of batteries, the time of operation between recharges is 3 hours. Even with advanced battery technologies such as Nickel-Metal Hydride (Ni-MH), which provide a larger energy density (~30 Watt-hour/pound), the lifetime of the battery is still low. Since battery technology has offered only limited improvement, low-power design techniques are essential for portable devices.
■ Low-power design is needed not only for portable applications but also to reduce the power of high-performance systems. With large integration density and improved speed of operation, systems with high clock frequencies are emerging. These systems use high-speed products such as microprocessors. The cost of the packaging, cooling and fans these systems require to remove the heat is increasing significantly. Table 1.1 shows the power consumption of various microprocessors that operate in the frequency range of 66 to 300 MHz. This table demonstrates that, at higher frequencies, the power dissipation becomes excessive.
■ Another issue related to high power dissipation is reliability. High on-chip temperatures provoke failure mechanisms [8], among them silicon interconnect fatigue, package-related failure, electrical parameter shift, electromigration, junction fatigue, etc. In addition, there is a trend to keep computers from using more than a 5% share of the total US power budget [9]. Note that 50% of office power is used by PCs. Since processor frequencies keep increasing, which results in increased power, low-power design techniques are a prerequisite.
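The 3-hour notebook figure quoted above follows from a simple energy balance: stored energy (energy density × battery mass) divided by the average load power. The sketch below reproduces it. It assumes the full rated energy is usable and the load is constant; the Ni-MH case simply reuses the 1.5 lb mass for comparison.

```python
def runtime_hours(energy_density_wh_per_lb: float, battery_mass_lb: float,
                  load_w: float) -> float:
    """Ideal time between recharges: stored energy / average load power."""
    stored_energy_wh = energy_density_wh_per_lb * battery_mass_lb
    return stored_energy_wh / load_w


# Notebook example from the text: 1.5 lb of batteries feeding a 10 W load.
nicd_hours = runtime_hours(20.0, 1.5, 10.0)   # NiCd at about 20 Wh/lb
nimh_hours = runtime_hours(30.0, 1.5, 10.0)   # Ni-MH at about 30 Wh/lb (same mass assumed)
print(f"NiCd: {nicd_hours:.1f} h, Ni-MH: {nimh_hours:.1f} h")
```

Even a 50% gain in energy density only stretches the runtime from 3 to 4.5 hours, which is why reducing the load power is the more effective lever.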
The power dissipation issues and the reliability problems of devices scaled down to 0.5 µm and below have driven the electronics industry to adopt a supply voltage lower than the old 5 V standard. The new industry standard for IC operating voltage is 3.3 V (±10%). Lowering the voltage to even smaller values can yield impressive power savings. Not only is the power reduced, but so are the weight and volume of the batteries in battery-operated systems.
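The size of these savings comes from the quadratic dependence of CMOS switching power on the supply voltage. The sketch below uses the standard first-order model P = activity × C × VDD² × f (not stated in this section; later chapters develop power models). The activity, capacitance and clock values are hypothetical.

```python
def dynamic_power_w(activity: float, c_switched_f: float, vdd_v: float,
                    f_clk_hz: float) -> float:
    """First-order CMOS switching power: P = activity * C * VDD**2 * f."""
    return activity * c_switched_f * vdd_v ** 2 * f_clk_hz


# Hypothetical chip: activity 0.1, 1 nF total switched capacitance, 50 MHz clock.
p_5v = dynamic_power_w(0.1, 1e-9, 5.0, 50e6)
p_3v3 = dynamic_power_w(0.1, 1e-9, 3.3, 50e6)
print(f"5 V: {p_5v * 1e3:.1f} mW, 3.3 V: {p_3v3 * 1e3:.1f} mW, "
      f"ratio {p_3v3 / p_5v:.3f}")
```

The ratio (3.3/5)² ≈ 0.44 does not depend on the hypothetical activity, capacitance or clock values; it assumes the same clock frequency can still be met at 3.3 V.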
1.2 LOW-POWER APPLICATIONS

Low-power design is opening a new era in VLSI technology, as it impacts many applications, such as:
■ Battery-powered portable systems, for example notebooks, palmtops, CDs, language translators, etc. These systems represent an important and growing market in the computer industry, and they demand high-performance capabilities comparable to those of desktops. Several low-power microprocessors have been designed for these computers. Table 1.2 shows some examples of these low-power processors. However, these circuits still consume significant power, on the order of 1 to 3 Watts. The power dissipation of these systems is dominated by I/O devices such as hard disk drives and LCD displays. The total expected power dissipation of notebooks is 2 Watts, with 4 pounds weight and daily recharge.

Table 1.2 Low-power microprocessors
Processor      Clock (MHz)   Technology (µm)   VDD (V)   Power (W)   Ref.
PowerPC 603    80            0.5               3.3       2.2         [10]
IBM 486SLC2    66            0.8               3.3       1.8         [11]
MIPS R4200     80            0.64              3.3       1.8         [12]
■ Electronic pocket communication products such as cordless and cellular telephones, PDAs (Personal Digital Assistants), pagers, etc. Table 1.3 shows a battery analysis for a handheld cellular system. Low power is crucial for extending the battery life of these systems; battery improvement is also needed. PDAs require a large amount of data processing with multimedia capabilities. The expected power of PDAs is around 0.5 Watt with 0.5 pound weight, and the expected power for pagers is 10 mW with 0.125 pound weight.

Table 1.3 Battery analysis for a handheld cellular system
                     Handheld Cellular
Example              Motorola Microtac
RF power             600 mW
Battery              750 mAh secondary NiCd
Battery life         75 minutes talk time
                     20 hours standby
Total power load     650 mA x 6 V = 3900 mW
■ Sub-GHz processors for high-performance workstations and computers. Systems of 100 MHz and over are emerging, and 500 MHz and higher will be common by the end of the decade. Since the power consumed increases with frequency, processors with new architectures and circuits optimized for low power are crucial.
■ Other applications such as WLANs (Wireless Local Area Networks) and electronic goods (calculators, hearing aids, watches, etc.).
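The entries of Table 1.3 can be cross-checked with the same kind of energy balance used for the notebook: the total load is current × pack voltage, and the ideal talk time is capacity divided by load current. This sketch assumes a constant load current, which is a simplification.

```python
def talk_time_min(capacity_mah: float, load_ma: float) -> float:
    """Ideal discharge time in minutes at a constant load current."""
    return 60.0 * capacity_mah / load_ma


load_ma = 650.0      # total load current from Table 1.3
battery_v = 6.0      # pack voltage from Table 1.3
power_mw = load_ma * battery_v            # 650 mA x 6 V
minutes = talk_time_min(750.0, load_ma)   # 750 mAh NiCd pack
print(f"Load: {power_mw:.0f} mW, ideal talk time: {minutes:.0f} min")
```

The ideal estimate of about 69 minutes is close to the 75 minutes listed in the table; the gap suggests the average current during a call is somewhat below 650 mA, but that is an inference, not a figure from the table.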
1.3 LOW-POWER DESIGN METHODOLOGY

In order to optimize the power dissipation of digital systems, a low-power methodology should be applied throughout the design process, from the system level down to the process level, while keeping in mind that performance is still essential. During optimization, it is very important to know the power distribution within a processor, so that the parts or blocks consuming a significant fraction of the power can be properly optimized for power saving. Fig. 1.1 shows the different design levels of an integrated system. The process technology is under the control of the device/process designer, whereas the other levels are controlled by the circuit designer.
1.3.1 Power Reduction Through Process Technology

[Figure 1.1 Power reduction design space; legible level labels: LOGIC/CIRCUIT, DEVICE/PROCESS]

One way to reduce the power dissipation is to reduce the power supply voltage. However, the delay increases significantly, particularly when VDD approaches the threshold voltage. To overcome this problem, the devices should be scaled properly. The advantages of scaling for low-power operation are the following:
■ Improved device characteristics for low-voltage operation, due to the improvement of the current drive capabilities;
■ Reduced capacitances through small geometries and reduced junction capacitances;
■ Improved interconnect technology;
■ Availability of multiple and variable threshold devices, which results in good management of the active and standby power trade-off; and
■ Higher density of integration. It has been shown that integrating a whole system into a single chip provides orders of magnitude of power savings.
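To see why the delay penalty grows as VDD approaches the threshold voltage, the sketch below uses the alpha-power-law delay model of Sakurai and Newton, t_d ∝ VDD / (VDD − VT)^α. This model is not introduced in this chapter; the values VT = 0.7 V and α = 1.3 are illustrative assumptions.

```python
def relative_delay(vdd: float, vt: float = 0.7, alpha: float = 1.3) -> float:
    """Gate delay up to a constant factor: t_d ~ VDD / (VDD - VT)**alpha."""
    if vdd <= vt:
        raise ValueError("the model is only meaningful for VDD > VT")
    return vdd / (vdd - vt) ** alpha


reference = relative_delay(5.0)
for vdd in (5.0, 3.3, 2.0, 1.5, 1.0):
    # Delay normalized to the 5 V case.
    print(f"VDD = {vdd:.1f} V -> delay x {relative_delay(vdd) / reference:.2f}")
```

With these assumed parameters, going from 5 V to 3.3 V costs roughly 27% in speed, while going to 1 V costs more than a factor of five. Scaling VT down together with VDD, as the list above suggests, moves this knee to lower voltages.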
Table 1.4 shows the effect of scaling on microprocessor performance [14]. The power dissipation can be reduced by one order of magnitude at a fixed frequency of operation.
Table 1.4 Effect of scaling on microprocessor performance [14]
L (µm)          0.50        0.35       0.25        0.15
Leff (µm)       0.35        0.25       0.15        0.10
VDD (V)         3.3         2.5        1.8         1.5
Area (mm²)      8 x 10      5.6 x 7    4 x 5       2.5 x 3
Clock (MHz)     100         150        225         330
Power (W)       5.0         3.3        2.35        1.5
Fixed clock frequency:
Area (mm²)      6.4 x 8.4   4.5 x 6    3.2 x 4.2   2 x 2.5
Power (W)       5.0         2.2        1           0.45
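A quick way to check the "one order of magnitude" claim against Table 1.4 is to normalize the scaled-clock power by the clock frequency (energy per cycle) and to compare the end points of the second block of power values, read here as the fixed-frequency case the text describes.

```python
# Values transcribed from Table 1.4.
clock_mhz = [100, 150, 225, 330]          # scaled-clock rows
power_w = [5.0, 3.3, 2.35, 1.5]
power_fixed_clock_w = [5.0, 2.2, 1.0, 0.45]

# Energy per clock cycle: P / f. With P in W and f in MHz, P / f is in uJ,
# so multiply by 1e3 to get nJ.
energy_nj = [p / f * 1e3 for p, f in zip(power_w, clock_mhz)]

for node, e in zip(["0.50", "0.35", "0.25", "0.15"], energy_nj):
    print(f"L = {node} um: {e:.1f} nJ/cycle")
print(f"energy/cycle reduction: {energy_nj[0] / energy_nj[-1]:.1f}x")
print(f"fixed-clock power reduction: "
      f"{power_fixed_clock_w[0] / power_fixed_clock_w[-1]:.1f}x")
```

Both views give roughly an 11x reduction from the 0.50 µm to the 0.15 µm generation.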
1.3.2 Power Reduction Through Circuit/Logic Design

To minimize the power at the circuit/logic level, many techniques can be used, such as:
■ Use of static style rather than dynamic style;
■ Reduction of the switching activity by logic optimization;
■ Optimization of clock and bus loading;
■ Clever circuit techniques that minimize device count and internal swing;
■ Custom design, which may improve the power at the expense of increased design cost;
■ Reduction of VDD in non-critical paths, and proper transistor sizing;
■ Use of multi-VT logic circuits; and
■ Re-encoding of sequential circuits.
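The "switching activity" targeted by logic optimization and state re-encoding can be measured directly from a sequence of node values. The sketch below counts 0→1 transitions (the ones that draw charge from the supply) per bit per cycle and compares binary against Gray-code state assignment for a 3-bit counter. Gray coding is used here only as an illustrative re-encoding, not as the specific method of this book.

```python
def switching_activity(values: list[int], width: int) -> float:
    """Fraction of bit-cycles with a 0 -> 1 transition over a sequence of
    register/bus values."""
    mask = (1 << width) - 1
    rising = sum(bin(~prev & cur & mask).count("1")
                 for prev, cur in zip(values, values[1:]))
    return rising / ((len(values) - 1) * width)


# A 3-bit counter cycling 0..7 and wrapping to 0, encoded two ways.
binary_seq = list(range(8)) + [0]
gray_seq = [i ^ (i >> 1) for i in range(8)] + [0]
print(f"binary: {switching_activity(binary_seq, 3):.3f}")
print(f"Gray:   {switching_activity(gray_seq, 3):.3f}")
```

Because exactly one bit changes per Gray-coded state transition, the state register switches about 43% less than with plain binary encoding in this example.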
1.3.3 Power Reduction Through Architectural Design

At the architecture level, several approaches can be applied to the design:
■ Power management techniques, where unused blocks are shut down;
■ Low-power architectures based on parallelism, pipelining, etc.;
■ Memory partitioning with selectively enabled blocks;
■ Reduction of the number of global busses; and
■ Minimization of the instruction set for simple decoding and execution.
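Parallelism saves power only because it allows the supply voltage to be lowered while throughput is maintained. The sketch below evaluates P = C·VDD²·f relative to a reference datapath. The factors used (2.15x capacitance for a duplicated datapath with output multiplexing, 0.58x supply, half clock per copy) are illustrative assumptions of the kind often quoted for this trade-off, not results from this book.

```python
def relative_power(c_ratio: float, v_ratio: float, f_ratio: float) -> float:
    """P_new / P_ref for P = C * VDD**2 * f, each factor given relative to the reference."""
    return c_ratio * v_ratio ** 2 * f_ratio


# Two-way parallel datapath: each copy runs at half the clock, so the supply
# can be lowered while throughput is preserved; hardware duplication plus
# output multiplexing increases the switched capacitance (assumed numbers).
parallel = relative_power(c_ratio=2.15, v_ratio=0.58, f_ratio=0.5)
# Same duplication without voltage scaling: slightly more power than the reference.
parallel_same_vdd = relative_power(c_ratio=2.15, v_ratio=1.0, f_ratio=0.5)
print(f"parallel + lowered VDD: {parallel:.2f} x P_ref")
print(f"parallel, same VDD:     {parallel_same_vdd:.2f} x P_ref")
```

The comparison shows that the architectural change and the voltage reduction must go together; duplication alone does not save power.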
1.3.4 Power Reduction Through Algorithm Selection

Among the techniques to minimize the power at the algorithmic level, we cite:
■ Minimizing the number of operations and hence the number of hardware resources; and
■ Data coding for minimum switching activity.
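One well-known form of data coding for minimum switching activity is bus-invert coding: when more than half of the bus lines would toggle, the complemented word is sent instead, and one extra line flags the inversion. The sketch below is a minimal illustration of that scheme; it is one example of the idea, not necessarily the coding discussed later in the book.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words: list[int], width: int) -> list[tuple[int, int]]:
    """Return (bus_value, invert_flag) pairs; the bus starts at 0."""
    mask = (1 << width) - 1
    prev_bus, out = 0, []
    for w in words:
        w &= mask
        if hamming(prev_bus, w) > width // 2:
            bus, inv = ~w & mask, 1   # fewer lines toggle if the complement is sent
        else:
            bus, inv = w, 0
        out.append((bus, inv))
        prev_bus = bus
    return out


def bus_invert_decode(pairs: list[tuple[int, int]], width: int) -> list[int]:
    """Undo the inversion using the flag line."""
    mask = (1 << width) - 1
    return [(~bus & mask) if inv else bus for bus, inv in pairs]


def toggles(seq: list[int]) -> int:
    """Total line transitions for a sequence driven onto a bus that starts at 0."""
    return sum(hamming(a, b) for a, b in zip([0] + seq, seq))


data = [0x00, 0xFF, 0x00, 0xFF]
enc = bus_invert_encode(data, 8)
assert bus_invert_decode(enc, 8) == data
print("raw toggles:", toggles(data))
print("coded toggles:", toggles([b for b, _ in enc]) + toggles([i for _, i in enc]))
```

For this worst-case alternating pattern, 24 line transitions become 3, all on the invert line; for random data the saving is smaller.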
1.3.5 Power Reduction in System Integration

The system level is also important to the whole process of power optimization. Some techniques are:
■ Use of low system clocks, with higher frequencies generated by an on-chip phase-locked loop; and
■ A high level of integration: integrating off-chip memories (ROM, RAM, etc.) and other ICs such as digital and analog peripherals.
1.4 THIS BOOK
This book is an early contribution to the field of low-power digital VLSI circuit and system design. It targets two types of audiences: senior undergraduate and postgraduate university students, and VLSI circuit and system designers working in industry. In this book we have tried to cover the basics of VLSI systems, from the process technologies and device modeling up to the architecture level. The fundamentals of power dissipation in CMOS circuits are presented to give the readers sufficient background to become familiar with the low-power design world. Several practical circuit examples and low-power techniques, mainly in CMOS technology, are discussed. Low-voltage issues for digital CMOS and BiCMOS circuits are also emphasized. This book also provides an extensive study of advanced CMOS subsystem design. Various power minimization techniques at the circuit, logic, architecture and algorithm levels are presented. Finally, the book includes a rich list of references, treating advanced topics, at the end of each chapter. This allows the readers to study in depth any topics they find interesting.
This book is organized into eight chapters. The first chapter is an introduction to low-power design. The other chapters are presented in the following sections.
1.4.1 Low-Voltage Process Technology

Chapter 2 deals with CMOS bulk, bipolar, BiCMOS and CMOS Silicon On Insulator (SOI) process technologies. Several CMOS technologies (N-well and twin-tub) and low-voltage CMOS enhancements are reviewed. Bipolar technology, with emphasis on advanced structures, is considered. The topic of the isolation techniques used for both bipolar and CMOS is addressed. Three BiCMOS technologies, with different performance/cost, are presented, and a complementary BiCMOS structure, where a vertical isolated PNP transistor is merged with an NPN transistor in a CMOS process, is introduced. The design rules of a 0.8 µm BiCMOS process are supplied. Finally, SOI technology is reviewed for low-voltage and low-power applications.
1.4.2 Low-Voltage Device Modeling

Chapter 3 addresses the topic of device modeling. This topic is of interest to those readers who need to analyze, design and/or simulate circuits. It introduces commonly used models of both MOS and bipolar devices. In this chapter we consider simple analytical models which can be used for circuit analysis and design of deep-submicrometer MOSFETs at low voltage. A simple model to compute the leakage current, and hence the static power dissipation, of MOSFETs is also discussed. The SPICE¹ device models of a 0.8 µm CMOS/BiCMOS process are also presented. This should help the reader to appreciate the meaning of the model parameters as well as to analyze the power and delay of the low-voltage circuits presented throughout the book. Supply voltage scaling, driven by reliability and power dissipation issues, is presented.
1.4.3 Low-Voltage Low-Power VLSI CMOS Circuit Design

Chapter 4 focuses on CMOS logic circuit design. The sources of power dissipation in these circuits are reviewed. Simple models for delay and power dissipation estimation are presented. The concept of switching activity is introduced and examples are given. The power dissipation due to spurious transitions is described. Several CMOS design styles, such as pseudo-NMOS, dynamic and NO RAce (NORA) logics, are studied. Guidelines for low-power physical design are presented. Other circuit variations of static complementary CMOS which are suitable for low-power applications are discussed. These include the pass-transistor logic family, such as Complementary Pass-transistor Logic (CPL), Dual Pass-transistor Logic (DPL), and Swing Restored Pass-transistor Logic (SRPL). An overview of clocking strategy in VLSI systems is also covered. This chapter also includes one important area, the I/O circuits, whose power dissipation is analyzed. Finally, techniques to reduce the static and dynamic power components in CMOS design are reviewed. This chapter is intended to provide the readers with sufficient background in low-power circuit design.
1.4.4 Low-Voltage VLSI BiCMOS Circuit Design

A variety of BiCMOS logic circuits suitable for 3.3 V and sub-3.3 V operation are presented in Chapter 5. The chapter starts with the introduction of the conventional BiCMOS (totem-pole) gate, which was used in 5 V applications. The degradation of this gate with supply voltage scaling is demonstrated. The BiNMOS family, suitable for low-voltage applications (3.3-2 V range), is introduced. It is shown that it provides better performance and delay-power product than CMOS at these voltages, even at low fan-out. Other logic families for low power supply voltage operation are also discussed. Finally, this chapter presents several low-voltage applications of BiCMOS.
¹SPICE is the most commonly used circuit simulator.
1.4.5 Low-Power CMOS Random Access Memory Circuits

The objective of Chapter 6 is two-fold: to present circuit techniques for active and standby power reduction in static and dynamic RAMs, and to apply the concepts behind these techniques to other applications, since RAMs have seen remarkable and rapid progress in power reduction. These techniques are applied at the architectural and circuit levels. Several advanced circuit structures and memory organizations are described. Circuits operating at a power supply as low as 1 V are also discussed. The Voltage Down Converters (VDCs) used as DC-DC converters are also treated, and their low-power aspects are investigated.
1.4.6 VLSI CMOS SubSystem Design

Chapter 7 presents a subsystem view of CMOS design. A variety of building blocks of VLSI systems, such as adders, multipliers, ALUs, data paths, ROMs, PLAs, etc., are discussed. Several options for each subsystem are presented with emphasis on power dissipation. The use of a PLL in high-speed CMOS systems for deskewing the internal clock is also examined. Low-power issues of CMOS subsystems are also included.
1.4.7 Low-Power VLSI Design Methodology

In Chapter 8, advanced techniques to reduce the dynamic power component at several levels of design are presented. Lowering the power supply voltage while maintaining performance is one power reduction technique addressed extensively in this chapter. It is shown that low-power techniques at the high level (algorithmic and architectural) of the design lead to power savings of several orders of magnitude. Several examples are included to give the reader a clear picture of low-power design aspects. In addition, power estimation techniques at the circuit, logic, architectural and behavioral levels are reviewed. The goal of power estimation is to optimize power, meet requirements and know the power distribution across the chip.
REFERENCES
[1] Special Report, "The New Contenders," IEEE Spectrum, pp. 20-25, December 1993.
[2] D. W. Dobberpuhl et al., "A 200-MHz 64-b Dual-Issue CMOS Microprocessor," IEEE J. Solid-State Circuits, vol. 27, no. 11, pp. 1555-1567, November 1992.
[3] W. J. Bowhill et al., "A 300MHz 64b Quad-Issue CMOS RISC Microprocessor," IEEE International Solid-State Circuits Conf., Tech. Dig., pp. 182-183, February 1995.
[4] Technology 1995: Solid State, IEEE Spectrum, pp. 35-39, January 1995.
[5] D. Bearden et al., "A 133 MHz 64b Four-Issue CMOS Microprocessor," IEEE International Solid-State Circuits Conf., Tech. Dig., pp. 174-175, February 1995.
[6] MIPS Press release, 1994.
[7] A. Charnas et al., "A 64b Microprocessor with Multimedia Support," IEEE International Solid-State Circuits Conf., Tech. Dig., pp. 178-179, February 1995.
[8] C. Small, "Shrinking Devices Put the Squeeze on System Packaging," EDN, vol. 39, no. 4, pp. 41-46, February 1994.
[9] P. Verhofstadt, "Keynote Address," IEEE Symposium on Low Power Electronics, Tech. Dig., October 1994.
[10] G. Gerosa et al., "A 2.2 W, 80 MHz Superscalar RISC Microprocessor," IEEE Journal of Solid-State Circuits, vol. 29, no. 12, pp. 1440-1454, December 1994.
[11] R. Bechade et al., "A 32b 66MHz Microprocessor," IEEE International Solid-State Circuits Conference, Tech. Dig., pp. 208-209, February 1994.
[12] N. K. Yeung, Y-H. Sutu, T. Y-F. Su, E. T. Pak, C-C Chao, S. Akki, D. D. Yau, and R. Ladenquai, "The Design of a 55SPECint92 RISC Processor under 2W," IEEE International Solid-State Circuits Conference, Tech. Dig., pp. 206-207, February 1994.
[13] S. Lipoff and A. D. Little, "Evaluation of New Battery Technology in Selected Applications," IEEE Workshop on Low-Power Electronics, Phoenix, AZ, August 1993.
[14] J. M. C. Stork, "Technology Leverage for Ultra-Low Power Information Systems," IEEE Symposium on Low Power Electronics, Tech. Dig., pp. 52-55, October 1994.
2
LOW-VOLTAGE PROCESS TECHNOLOGY
This chapter serves as an introduction to the IC fabrication of CMOS bulk, bipolar, BiCMOS and CMOS SOI devices, including submicron devices for low-voltage applications. Section 2.1 is a review of CMOS process technologies, in which examples of an N-well CMOS process and a twin-tub CMOS process are considered. Section 2.2 deals with bipolar technology, with emphasis on advanced bipolar structures. The topic of the isolation techniques used for both bipolar and CMOS is addressed in Section 2.3. In Section 2.4 we discuss the similarities between advanced CMOS and advanced bipolar transistor structures to demonstrate how both technologies are indeed converging. BiCMOS technologies are introduced in Section 2.5, with emphasis on CMOS-based processes; three BiCMOS technologies with different performance/cost are presented. Section 2.6 introduces a complementary BiCMOS structure, where a vertical isolated PNP transistor is merged with an NPN transistor in a CMOS process. In Section 2.7, a table with the design rules of a generic 0.8 µm BiCMOS process is supplied. Finally, in Section 2.8, SOI technology is reviewed for low-voltage applications.
2.1 CMOS PROCESS TECHNOLOGY

The idea of CMOS was first proposed by Wanlass and Sah [1]. In the 1980s, it was widely acknowledged that CMOS is the technology for VLSI because of its unique advantages, such as low power, high noise margin, wider temperature and voltage operating range, overall circuit simplification and layout ease. The development of VLSI in the 1980s drove the integration density to millions of transistors on a single chip.
In this section we review two CMOS bulk technologies: the N-well and twin-tub processes. Other processes, such as retrograde-well technology, are not discussed.
2.1.1 N-well CMOS Process

In the N-well CMOS process, the P-channel transistor is formed in the N-well itself and the N-channel transistor in the P-substrate. Fig. 2.1 illustrates cross-sectional views and process steps of a typical N-well process.
The process starts by growing an oxide on the wafer. The oxide is then patterned to open N-well windows. Phosphorus atoms are implanted into the silicon, followed by a high-temperature anneal to diffuse the well [Fig. 2.1(a)]. The LOCOS (Local Oxidation of Silicon)¹ technique is used to isolate the different active areas. After removing the nitride used in the LOCOS process, a photoresist layer is deposited and then patterned with a P-well mask (a new mask). This is followed by a low-energy ion implantation of boron (B I/I) to adjust the threshold voltage of the N-channel transistor [Fig. 2.1(b)]. A second ion implantation can be applied to eliminate punchthrough in the short-channel device. Similarly, the threshold voltage of the P-channel transistor is adjusted [Fig. 2.1(c)]. A thin gate oxide is then grown, and a layer of polysilicon is deposited and doped with phosphorus. The polysilicon is patterned to form the gates of all the transistors and an interconnection layer [Fig. 2.1(d)]. The source and drain regions are then implanted using a photoresist mask: boron is used for the P+ regions of the P-channel transistors and arsenic for the N+ regions of the N-channel transistors [Fig. 2.1(e)]. The N+ and P+ regions are also used as N-well and P-well contacts, respectively. The photoresist is removed and a thick oxide is deposited by Chemical Vapor Deposition (CVD) as an isolation layer between the polysilicon layer and the subsequent metal layer. Contact holes are opened in the oxide layer and metal (usually aluminum) is deposited on the whole wafer. At this stage, the metal is patterned and annealed at a relatively low temperature (450 °C) [Fig. 2.1(f)]. One or two additional metal layers are usually added. Finally, the wafer is passivated and windows are patterned over the metal bonding pads to provide electrical contacts with the pins.
¹For more details on LOCOS isolation, see Section 2.3.1.
[Figure 2.1 (continued) N-well CMOS process steps: strip resist/oxide; grow gate oxide; deposit polysilicon; apply photoresist and pattern; strip resist; apply photoresist; pattern S/D regions for P-channel; implant P+ S/D; strip photoresist; repeat for N+ S/D; strip photoresist; grow oxide; etch contact holes; deposit metal; pattern metal; metal anneal]
2.1.2 Twin-Tub CMOS Process

An alternative approach for CMOS device fabrication is to use two separate wells (tubs) for the N- and P-channel transistors in a lightly doped N- or P-type substrate. This "twin-tub" CMOS technology uses a single mask to form two independently doped, self-aligned tubs [2]; hence both CMOS device types can be optimized independently. This flexibility in selecting the substrate type with no change in the process flow is the major advantage of twin-tub CMOS. This technology is also more attractive when the devices are scaled down to submicron dimensions.
Fig. 2.2 shows the major steps involved in a typical twin-tub process. The starting material is a lightly doped P-epitaxial layer over a P+ substrate, to reduce latch-up. In addition to the conventional N-tub process, another N-type (arsenic) shallow implant is used to increase the surface doping of the N-tub to prevent punchthrough (for short-channel devices). It is also used to form the channel-stoppers² for the P-channel transistors [Fig. 2.2(a)]. The photoresist is stripped and a selective oxidation of the N-tub is performed. The nitride/pad oxide layers are removed to implant boron, which is driven in to form the P-tub. This is followed by a second boron ion implantation for the channel-stoppers of the N-channel device [Fig. 2.2(b)]. The N-tub oxide is then stripped. So far, only one mask (the N-tub mask, MASK#1) is required for the self-aligned wells and channel-stopper processes. Both tubs are driven in. LOCOS isolation is developed to isolate the devices using MASK#2, which defines the active areas. After the LOCOS process, boron is implanted through the pad oxide (used in the LOCOS) to reduce the threshold voltage of the P-channel transistor, using MASK#3. This process results in a buried-channel PMOS transistor. The pad oxide is then removed. The remaining steps are similar to those used in the N-well process: MASK#4 is needed to pattern the polysilicon [Fig. 2.2(c)], MASK#5 and MASK#6 are required to form the N+ and P+ sources/drains (S/D), respectively, MASK#7 is used for contact openings, and MASK#8 for patterning the metal [Fig. 2.2(d)].
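The mask count is easy to lose track of in the prose above. The sketch below simply records the eight masks of this twin-tub flow, as described in the text, as a small data structure; the short descriptions are paraphrases, not mask names from the book.

```python
# Masks of the twin-tub flow described above, in process order.
TWIN_TUB_MASKS = [
    (1, "N-tub; also self-aligns the P-tub and both channel-stopper implants"),
    (2, "Active areas for LOCOS isolation"),
    (3, "Boron VT-adjust implant for the buried-channel PMOS"),
    (4, "Polysilicon gates"),
    (5, "N+ source/drain"),
    (6, "P+ source/drain"),
    (7, "Contact openings"),
    (8, "Metal patterning"),
]

for number, purpose in TWIN_TUB_MASKS:
    print(f"MASK#{number}: {purpose}")
```

Listing the flow this way makes the key point of the twin-tub process visible: both wells and both channel-stopper implants share MASK#1.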
The fabrication of submicron MOS transistors requires additional process steps to avoid hot-carrier effects. Fig. 2.3 illustrates a CMOS twin-tub structure with a Lightly Doped Drain (LDD). Both NMOS and PMOS devices have lightly doped extensions to the source and drain regions. The light doping reduces the electric field near the drain, which prevents the generation of hot carriers. The major process steps to fabricate the LDD structure are shown in Fig. 2.4.
2.1.3 Low-Voltage CMOS Technology

Scaled CMOS has been recognized as the technology suitable for low-power battery-operated systems demanding high-speed operation. Conventional scaled CMOS technology undergoes a drastic reduction in speed when the power supply is reduced to 1 V and below. If the threshold voltage is scaled aggressively, the subthreshold leakage current increases drastically, which causes limitations for battery applications. Hence, a high-performance, low-power scaled CMOS technology is needed for ultra-low-voltage operation. One key to achieving low-power CMOS devices is the reduction of the junction capacitances as well as other parasitic capacitances. Also, the subthreshold current should be reduced when a low threshold voltage (VT ≤ 0.3 V) is used.

²For more details on the channel-stoppers, refer to Section 2.3.

[Figure 2.2 Twin-tub process sequence: strip resist; grow selective thick oxide; remove nitride/pad oxide; B I/I (P-well); B anneal (P-well); 2nd B I/I (channel-stoppers); N+ S/D; P+ S/D; contacts; metallization. Regions shown: P-tub, N-tub, P epi-layer]
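The leakage penalty of an aggressively scaled threshold can be quantified with the usual exponential subthreshold model, I_off ∝ 10^(−VT/S), where S is the subthreshold slope in mV/decade. The sketch below assumes the prefactor is unchanged when VT is lowered; the VT and S values are illustrative.

```python
def leakage_increase(vt_old_v: float, vt_new_v: float, s_mv_per_decade: float) -> float:
    """Factor by which subthreshold off-current grows when VT is lowered,
    assuming I_off ~ 10**(-VT / S) with an unchanged prefactor."""
    return 10 ** ((vt_old_v - vt_new_v) * 1e3 / s_mv_per_decade)


for s in (100.0, 80.0):
    factor = leakage_increase(0.7, 0.3, s)
    print(f"S = {s:.0f} mV/dec: lowering VT from 0.7 V to 0.3 V -> {factor:.0e}x leakage")
```

Every S millivolts of threshold reduction costs a decade of standby current, which is why a low VT must be paired with measures that keep the subthreshold current under control.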
Extensions and variations of standard CMOS process have been proposed to

enhance the performance of devices at low-voltage [3, 41. There devices have
good short channel behavior, low junction eapadtbnce and ledwed parasitic
resistance. The power supply choice depends on performhnce/reliabity/power
trade-offs. Reduced power supply is needed far low-power applications, but
8 deeprubmicron CMOS device with ultrathin gate oxide and low threshold
voltage should be used to improve performance. Table 2.1 shows the speed
achieved at low-voltages using deepsubmicron processes.
Table 2.1 Performance comparison at low voltage.
Name [Ref.]    CMOS Process   Voltage (V)   Delay (ps)
IBM [3]        0.10 µm
AT&T [4]       0.10 µm
NEC [5]        0.15 µm
Fujitsu [6]    0.10 µm                      21.0
               0.15 µm                      50.0
Toshiba [8]    0.35 µm                      52.0
An example of an improved-performance CMOS technology suitable for low voltage is the one proposed by Toshiba [8], called CMOS Shallow Junction Well FET (SJET). Fig. 2.5 shows the cross-sectional view of the CMOS-SJET process. The N-well and P-well depths are very shallow and comparable to the maximum depletion layer width in the channel. With this CMOS-SJET structure, the depletion layer of the NMOS device, for example, is extended compared to the original one and reaches the depletion layer of the P-well/N-type substrate junction. As a result, the total depletion layer width is increased and a low depletion capacitance, C_D, is obtained. This reduces the subthreshold slope (see Section 3.3.2). Thus, the threshold voltage can be reduced at low power supply voltage compared to the conventional CMOS process. Furthermore, the wells are designed to reduce the junction capacitance of the S/D regions by 40 to 55% compared to the conventional structure. The structure of Fig. 2.5 also uses dual N+ and P+ polysilicon gates to optimize the threshold voltages of the MOS devices, and W-polycide gates are used to reduce the poly sheet resistance. The delay of the CMOS-SJET inverter is 2.5 times better than that of conventional CMOS using the same gate size (0.5 µm technology) at a 1.5 V power supply. The power-delay product of a CMOS-SJET gate at 1.5 V using 0.35 µm technology is 1.3 fJ, about one third of that of conventional CMOS devices. However, the main drawback of the CMOS-SJET is the large body effect due to its retrograde doping profile.

[Figure 2.5 Cross-sectional view of the CMOS-SJET structure (P MOSFET and N MOSFET on an N-substrate)]
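The link between a smaller depletion capacitance and a steeper subthreshold slope follows from the standard long-channel expression S = (kT/q)·ln(10)·(1 + C_D/C_ox), treated in detail in Section 3.3.2. The C_D/C_ox ratios below are hypothetical values chosen only to show the trend.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # elementary charge, C


def subthreshold_slope_mv(cd_over_cox: float, temp_k: float = 300.0) -> float:
    """S = (kT/q) * ln(10) * (1 + C_D/C_ox), returned in mV/decade."""
    thermal_voltage_v = K_B * temp_k / Q_E
    return thermal_voltage_v * math.log(10) * (1.0 + cd_over_cox) * 1e3


for ratio in (0.5, 0.3, 0.1):   # hypothetical: C_D shrinking as in the SJET structure
    print(f"C_D/C_ox = {ratio:.1f}: S = {subthreshold_slope_mv(ratio):.1f} mV/dec")
```

Because the off-current scales as 10^(−VT/S), a smaller S reaches the same leakage with a lower VT, which is the route to a reduced threshold that the SJET structure exploits.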
2.2 BIPOLAR PROCESS TECHNOLOGY

The technology of epitaxial growth gave rise to the economical manufacturing of monolithic bipolar ICs, as it allows a high-quality thin film of semiconductor to be grown on top of a substrate. Junction-isolation and epitaxy techniques triggered the progress of bipolar technology. Although most of the focus has been on the development of CMOS for the last ten years, bipolar technology has achieved significant progress as well. Impressive high-speed results were demonstrated at the 1985 ISSCC (International Solid-State Circuits Conference) and thereafter, and ECL (Emitter Coupled Logic) gate delays of 15 ps have been reported [9]. It was shown that advanced silicon bipolar technologies, although quite complex, could be integrated at the LSI level and operate at frequencies above those of CMOS circuits. Since then, interest in advanced bipolar processes has increased. The key features of such technologies are: i) a self-aligned base, ii) advanced isolation techniques such as deep trench, and iii) a polysilicon emitter contact.
[Figure 2.7 Cross-sectional view of the SICOS bipolar device structure [11]]
have been replaced by the sidewall base electrodes. This allows the base area to be almost the same size as the emitter. The SICOS structure is suitable for VLSI applications because of its density and low parasitics.
One of the features of advanced bipolar transistors is the replacement of aluminum by polysilicon for the emitter contact. This step has led to a noticeable improvement in the current gain of bipolar transistors. For further reading on polysilicon emitter BJTs, refer to [10, 12, 13].
In this section, we introduce a typical Double-Polysilicon Self-Aligned (DPSA) process technology as an example of the advanced bipolar technologies³.
Any bipolar process typically starts with creating the buried layers and the epitaxial layer. Fig. 2.8 illustrates the major steps of the epitaxial growth with an N+ buried layer (BL). This buried layer is introduced to reduce the collector resistance of the bipolar transistor, while the epitaxial layer offers a high-quality silicon host for the bipolar transistor. The steps involved in Fig. 2.8 are the following. First, an oxide layer is grown on the substrate and then patterned using the buried layer mask. The photoresist on the oxide serves as a mask against etching and ion implantation. After etching the oxide, the exposed regions of the silicon surface are implanted with arsenic or antimony to form the N+ buried layers. The photoresist is then removed and an annealing step is carried out. All oxide is then stripped. An N-epitaxial layer is grown on the substrate as shown in Fig. 2.8(b). The thickness of this epitaxial layer can be as low as 0.8 µm for advanced digital bipolar technology. The problems limiting the scaling down of the epitaxial layer thickness are autodoping and out-diffusion of the buried layer.

³A review of conventional bipolar technology using the junction isolation technique can be found in [18].

[Figure 2.8 Epitaxial growth with N+ buried layer: grow oxide; apply photoresist; pattern/etch N+ BL mask; implant Sb; strip resist; anneal; strip oxide; epitaxy (intrinsic layer). Labels: photoresist, Si epitaxial layer]
Fig. 2.9 illustrates the sequence of a DPSA process assuming a starting struc-
ture with N+ buried layer, N-epitaxial layer and isolation oxide as shown in
Fig. 2.9(a). First, photoresist is deposited and patterned to define the col-
lector contact region (deep N+ collector sink). This region is then implanted
with phosphorus to increase its doping level. The photoresist is stripped and
a P-type base is implanted through a pre-implantation oxide as shown in Fig.
2.9(b). The resist and the oxide are then removed. A combination of P+
polysilicon and oxide layers is deposited on the wafer. These layers are then
etched as shown in Fig. 2.9(c). A CVD oxide is deposited over the wafer. The
oxide is then dry etched using reactive ion etching (RIE), so that the P+ polysilicon
is walled with the oxide (called sidewall spacers) [Fig. 2.9(d)]. The second level
of polysilicon is deposited and implanted with phosphorus that will ultimately
form the diffused emitter junction. At this stage, the wafer is annealed to drive
the dopants from the P+ and N+ polysilicon layers. Fig. 2.9(e) illustrates
the structure after patterning the N+ polysilicon. The P+ diffusion under the
polysilicon forms the extrinsic base. The contact openings to the P+ and N+
polysilicon, and to the collector, are etched. This is followed by the metallization step.
At the end, the metal is patterned as shown in Fig. 2.9(f).

[Figure 2.9 DPSA process sequence (steps shown: initial structure with oxide isolation; apply and pattern photoresist (N+ collector mask); P implant for the N+ sink; CVD oxide; strip photoresist/oxide; deposit P+ polysilicon/oxide; pattern/etch oxide/polysilicon; deposit CVD oxide; RIE etch of oxide; deposit the second level of polysilicon; P implant (N+ poly); anneal; pattern/etch N+ poly; deposit oxide; open contact holes; deposit metal; pattern/etch metal).]
The advantage of bipolar devices is their high-speed performance. However,
they are not suitable for battery backup systems because they consume high
DC current. Many logic circuit techniques have been proposed for low-power
and low-voltage operation, particularly for telecommunications applications [15,
16].
2.3 ISOLATION IN CMOS AND BIPOLAR TECHNOLOGIES
2.3.1 CMOS Device Isolation Techniques

Isolation in an integrated circuit means electrically isolating similar or different
transistors. In a CMOS chip, where more than one million transistors can be
integrated, 1 µA/transistor of leakage current due to a bad isolation can lead to
a few watts of DC power consumption. Moreover, this leakage current provokes
susceptibility to latch-up, as will be discussed in Section 3.1.6.
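The "few watts" figure follows from simple multiplication: one million transistors at 1 µA each give about 1 A of leakage, which dissipates watts at a few volts. The sketch below is a minimal check of that arithmetic; the function name and the 3.3 V supply are illustrative assumptions, and it pessimistically assumes every transistor leaks the same current across the full supply.

```python
def isolation_leakage_power(n_transistors: int, i_leak_a: float, vdd_v: float) -> float:
    """Static power in watts: P = N * I_leak * VDD.

    Assumes every transistor leaks the same current and that this current
    flows across the full supply voltage (a worst-case simplification).
    """
    total_current_a = n_transistors * i_leak_a
    return total_current_a * vdd_v


if __name__ == "__main__":
    # One million transistors at 1 uA each, on an assumed 3.3 V supply.
    p_w = isolation_leakage_power(1_000_000, 1e-6, 3.3)
    print(f"Leakage power: {p_w:.2f} W")
```

With a 5 V supply the same leakage would dissipate about 5 W, which is why good isolation matters even before any switching activity is considered.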
Isolation in CMOS is required to separate the devices electrically by eliminat-
ing the inversion layers, which might be induced by the interconnection layer
between the transistors. The principle of isolation in CMOS is based on a field
oxide formation between two active areas [Fig. 2.10]. The width of the isolation
region should be minimized to attain a dense layout, particularly for VLSI
circuits.
Figure 2.10 Field oxide isolation in MOS integrated circuits (two active areas separated by field oxide on the substrate).
Several isolation techniques have been proposed and used. The most popular
are LOCOS (Local Oxidation of Silicon) [17], trench isolation [18, 19, 20, 21],
and selective epitaxy [22]. Selective epitaxy is not studied in this chapter.
2.3.1.1 Local Oxidation of Silicon (LOCOS)
LOCOS is a relatively simple process for the isolation of active devices in
CMOS technology. It is realized by forming a thick field oxide (FOX) between
the active areas. FOX is very thick (0.4 - 0.6 µm), hence the corresponding
field threshold voltage is high. The condition for preventing an inversion layer
under FOX and between two active regions is that this field threshold voltage
should be higher than the highest power supply voltage used on chip. The field
threshold voltage can be further increased by raising the doping level under the
FOX. This can be achieved by selectively implanting the regions over which the
FOX is subsequently grown. These regions are commonly known as channel-
stoppers.
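Why both a thick FOX and a channel-stop implant raise the field threshold can be seen from the textbook long-channel threshold expression applied to the parasitic "field MOSFET" formed by an interconnect over the field region. The sketch below is an illustration under stated assumptions, not the author's method: the flat-band voltage of -0.9 V, the doping levels and the 20 nm gate oxide used for comparison are hypothetical, and fixed oxide charge and body bias are ignored.

```python
import math

# Physical constants (units: cm, F/cm, C, V)
Q = 1.602e-19            # elementary charge, C
EPS0 = 8.854e-14         # vacuum permittivity, F/cm
EPS_SI = 11.7 * EPS0     # silicon permittivity, F/cm
EPS_OX = 3.9 * EPS0      # SiO2 permittivity, F/cm
N_I = 1.0e10             # intrinsic carrier density of Si at 300 K, cm^-3 (approx.)
KT_Q = 0.0259            # thermal voltage at 300 K, V


def field_threshold(t_ox_cm: float, n_a_cm3: float, v_fb: float = -0.9) -> float:
    """Long-channel threshold of the parasitic field MOSFET over a P-type region.

    V_T = V_FB + 2*phi_F + sqrt(2*q*eps_Si*N_A*2*phi_F) / C_ox
    v_fb is an assumed flat-band voltage; oxide charge and body bias are ignored.
    """
    phi_f = KT_Q * math.log(n_a_cm3 / N_I)                    # Fermi potential of the P region
    c_ox = EPS_OX / t_ox_cm                                   # oxide capacitance per unit area
    q_dep = math.sqrt(2 * Q * EPS_SI * n_a_cm3 * 2 * phi_f)   # depletion charge per unit area
    return v_fb + 2 * phi_f + q_dep / c_ox


if __name__ == "__main__":
    cases = [
        ("thin gate oxide, 20 nm, 1e16 cm^-3", 20e-7, 1e16),
        ("FOX, 0.5 um, 1e16 cm^-3", 0.5e-4, 1e16),
        ("FOX, 0.5 um, channel-stop 1e17 cm^-3", 0.5e-4, 1e17),
    ]
    for label, t_ox, n_a in cases:
        print(f"{label}: V_T ~ {field_threshold(t_ox, n_a):.1f} V")
```

Under these assumptions the thin oxide gives a threshold near 0.1 V, the 0.5 µm FOX alone roughly 7 V, and the FOX with a tenfold higher channel-stop doping roughly 24 V, comfortably above any on-chip supply.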
The steps of the LOCOS process are illustrated in Fig. 2.11. A pad oxide
of 40 nm is grown and is followed by chemical vapor deposition of a 100 nm
thick nitride layer, which masks the active region. The pad oxide is called
stress-relief oxide (SRO) because it protects the silicon from stress caused by
the nitride during subsequent high-temperature processes. Silicon nitride is
used as a mask to protect the active region from oxidation. A layer of pho-
toresist is applied to the wafer and then patterned using the mask of the active
areas. The nitride/oxide layers are etched [Fig. 2.11(a)]. A P-type dopant is
implanted to form the channel-stoppers [Fig. 2.11(b)]. The photoresist, which
is used for protection against ion implantation, is stripped and a thick thermal
oxide, i.e. the FOX, is grown. Only local oxidation is realized because the nitride
masks the regions beneath it. At the end, the nitride/oxide layers are removed [Fig.
2.11(c)]. During this LOCOS process, 56% of the FOX thickness is under the
silicon surface because the oxidation consumes some of the silicon. This pro-
cess is called semi-recessed LOCOS isolation. One problem associated with this
process is the lateral extension of the field oxide under the nitride during the
oxidation, forming what is called bird's beak encroachment [Fig. 2.11(c)]. A
typical value of this encroachment is 0.5 µm/side. This encroachment limits the
scaling of the active areas and the channel width of the MOS device. Moreover,
this bird's beak introduces imprecise channel widths.

[Figure 2.11 LOCOS process steps, showing the P channel-stop regions in the substrate.]

Figure 2.12 Poly-buffered LOCOS process.
The Poly-Buffered LOCOS process was developed to reduce the bird's beak en-
croachment [23]. In this modified LOCOS process, the nitride mask thickness
has been increased to 240 nm and a 50 nm polysilicon stress-relief buffer layer
has been added between the nitride and a 10 nm pad oxide [Fig. 2.12(a)]. This
arrangement prevents deep lateral extension of the field oxide under the nitride
layer [Fig. 2.12(b)]. A 0.8 µm field oxide thickness results in 0.15 µm/side of
encroachment and a 2.2 µm minimum isolation pitch. Other techniques to solve
the problem of the bird's beak encroachment can be found in [24, 25, 26].
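The practical effect of the two encroachment figures quoted above (0.5 µm/side for conventional LOCOS, 0.15 µm/side for poly-buffered LOCOS) can be seen by subtracting the encroachment from both edges of a drawn active region. The sketch below assumes symmetric encroachment on both sides and uses a hypothetical 2.0 µm drawn width; it only illustrates the geometry, not a full process model.

```python
def effective_width_um(drawn_um: float, encroach_per_side_um: float) -> float:
    """Active width left after bird's beak encroachment from both edges."""
    width = drawn_um - 2.0 * encroach_per_side_um
    if width <= 0:
        raise ValueError("encroachment consumes the entire drawn width")
    return width


if __name__ == "__main__":
    drawn = 2.0  # hypothetical drawn active width, um
    for name, beak in [("conventional LOCOS", 0.5), ("poly-buffered LOCOS", 0.15)]:
        w = effective_width_um(drawn, beak)
        print(f"{name}: {w:.2f} um effective ({100 * w / drawn:.0f}% of drawn)")
```

With these assumptions, conventional LOCOS leaves only half of the drawn width, while the poly-buffered variant keeps 85% of it, which is why narrow devices are so sensitive to the bird's beak.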
2.3.1.2 Trench Isolation
Trench isolation is another alternative to the LOCOS isolation process. This
technology has been accepted relatively quickly by the industry [27]. It ad-
dresses the isolation problem between opposite-type devices (like N-channel
and P-channel MOSFETs in CMOS technology). The advantages of trench
isolation are: i) no bird's beak encroachment, ii) a latch-up-free structure, and
iii) a planar surface.
Fig. 2.13 illustrates the steps of the trench isolation process. First, the pad
oxide, the nitride and the thick oxide layers are patterned using the mask of
the active areas. The thick oxide serves as a mask in the trench processing
[Fig. 2.13(a)]. A deep trench is formed by dry etching (RIE). This is followed
by a boron implant to create the P+ channel-stoppers at the bottom of the
trench. The top thick oxide is removed, and the trench sidewalls are oxidized
[Fig. 2.13(b)]. Polysilicon is deposited over the whole wafer, filling the
trenches. Polysilicon is used as the trench filling material because it
fills the trenches more uniformly than other materials. The surface polysilicon is then
etched to yield the structure shown in Fig. 2.13(c). The wafer is oxidized
using the nitride as a mask. The nitride is finally removed as illustrated in Fig.
2.13(d). At this stage, conventional processing can be used to integrate the
CMOS devices.
Although trench isolation permits a reduction of the separation between the ac-
tive regions, it has drawbacks: i) it is a costly process because of the
large number of processing steps, and ii) it cannot be used as an isolation
region for the inactive parts of the chip. In this case, LOCOS is usually used.
Other trench isolation processes are described in [28].
2.3.2 Bipolar Device Isolation Techniques

The first technique used for bipolar isolation was based on collector/substrate
junction isolation [Fig. 2.14]. The N-wells (N collectors) of the adjacent transis-
tors were separated by P+ isles, which are deeply diffused to reach the P-type
substrate. By tying these isles and the substrate to the most negative voltage,
the junctions between them and the N-type collectors are reverse biased. Thus,
all the components in different N-wells (N collectors) are isolated. The area
consumed by the isolation isles is large relative to the transistor area.

[Figure 2.13 Trench isolation process (steps shown: grow oxide/nitride/oxide; pattern active region; RIE trench; implant boron; remove thick oxide; oxidize trench walls; oxidize; remove nitride).]

Figure 2.15 Cross-sectional view of an NPN bipolar transistor with LOCOS
isolation.
The packing density of the bipolar technology can be improved by replacing the
junction isolation with LOCOS isolation. An additional advantage of LOCOS
isolation is the reduction of the parasitic collector-substrate capacitance. Fig.
2.15 illustrates the cross-sectional view of an NPN bipolar transistor with LO-
COS isolation. The area occupied by the oxide isolation is proportional to the
epitaxial layer thickness. As the epitaxial thickness is reduced for higher
device performance, the oxide isolation area becomes smaller, which means that
LOCOS may become a practical isolation technique for advanced bipolar and
BiCMOS technologies.
Fig. 2.16 illustrates the process steps for oxide isolation in a bipolar process.
After epitaxial growth, a thin layer of SiO2 is grown and a layer of Si3N4 is
deposited. A photoresist layer is applied and patterned with an isolation mask
[Fig. 2.16(a)]. Then the nitride/pad oxide layers and approximately half of
the epitaxial layer are dry etched. A boron implant is performed to form the
channel-stopper [Fig. 2.16(b)]. The photoresist is then removed and the wafer
is oxidized to grow the thick isolation oxide. This oxide is called recessed oxide.
The Si3N4 and the pad oxide are stripped at this stage. The resulting structure
is almost planar. In this structure the bird's beak is formed as in the MOS case
[Fig. 2.16(c)].
In the early 1980s, new isolation techniques such as grooves and trenches [29,
30, 31] were demonstrated. These techniques reduced the collector-substrate
capacitance and increased the packing density, and hence improved circuit
speeds. The fabrication process is the same as the one described for CMOS
trench isolation.
2.4 CMOS AND BIPOLAR PROCESSES CONVERGENCE

An interesting exchange of process technology know-how between the CMOS
and the bipolar domains has taken place over the years. We have seen that
epitaxial and buried layers have been used in CMOS to suppress latch-up. At
the same time, LOCOS, which was originally developed for CMOS, has been
used for isolating bipolar transistors. The use of polysilicon for creating self-
aligned MOS transistors was later adapted for self-aligned poly emitter bipolar
transistors. Another example of the convergence between bipolar and CMOS
is the use of oxide spacers: in CMOS they form the LDD regions, while in bipolar
they reduce the separation between the base contact and
the emitter. The convergence of both technologies made the attractive idea of
merging bipolar and CMOS seem more rational and feasible than ever.
[Figure 2.16 Oxide isolation in a bipolar process (steps shown: N+ BL process; grow N-type epi-layer; grow pad oxide; deposit nitride/resist; pattern resist; strip resist; grow selective oxide; remove nitride/oxide).]

Many of the steps of the advanced CMOS and bipolar processes are similar;
hence, they can be shared for the fabrication of MOS and bipolar transistors
when they are integrated in a BiCMOS process. Some examples of these steps
are:
1. The N-well, which can be used as the body of the PMOS transistor and
as the N-collector of the NPN transistor;
2. The N+ buried layer of the NPN, which can be used to form a retrograde well for
the PMOS to reduce the latch-up susceptibility;
3. The polysilicon, which can be used for the CMOS gates and for the emitter con-
tacts;
4. The shallow P-type implantation, which can be shared by the PMOS S/D and
the self-aligned extrinsic base of the NPN transistor;
5. The shallow N-type implantation, which can be shared by the NMOS S/D and
the emitter of the NPN transistor; and
6. The final annealing step.
However, as more steps are shared by the different devices, the device
characteristics have to be compromised. There is a tradeoff between the process
complexity and device quality.
2.5 BICMOS TECHNOLOGY

Although the idea of merging bipolar and CMOS on the same chip originated 20
years ago [32], it was not feasible from a practical point of view because of the
lack of adequate process technology. With the technological progress achieved
in recent years, this idea has been revived. There are many techniques to merge
bipolar and CMOS devices, as reported in the literature [33, 34, 35, 36, 37, 38].
There are two ways of classifying BiCMOS processes. One way is to classify
them according to the baseline process. A CMOS-based BiCMOS process is
a CMOS baseline process to which a bipolar transistor is added. Similarly, a
bipolar-based BiCMOS process is a bipolar baseline process to which CMOS
transistors are added. In both cases, the added device has to be compro-
mised, which means that its characteristics cannot be optimized. Alternatively,
BiCMOS processes can be classified according to their cost/performance. In
this regard, three categories can be identified:
1. Low-cost;
2. Medium-performance; and
3. High-performance (high-speed).
In this section, we present three examples of BiCMOS processes. The first
one represents a low-cost process. It needs only one mask to incorporate the
bipolar device in a CMOS-based process. The second example shows a medium-
performance BiCMOS process, which requires 3 extra masks beyond a CMOS pro-
cess. The third example illustrates a high-performance process in which polysil-
icon emitter and self-aligned structures are used.
2.5.1 Example 1: Low-Cost BiCMOS Process
In a low-cost BiCMOS process, a bipolar transistor is added to a CMOS pro-
cess with minimum additional process steps. A typical N-well CMOS/bipolar
process sequence is listed in Fig. 2.17(a). The N-well of the PMOS is used for
the collector of the vertical NPN. The base is implanted in a separate step using
an additional mask. The P+ S/D and the extrinsic base share the same im-
plantation step. The emitter and the N+ S/D of the NMOS are also implanted
in the same step. Fig. 2.17(b) illustrates the cross-section of an N-well BiC-
MOS structure. The process complexity is comparable to that of CMOS.
However, there are many tradeoffs in designing the emitter, base, and collector
of the NPN. If the CMOS process is optimized, some of the bipolar device pa-
rameters, such as the breakdown voltage and the gain, may be satisfactory, but
many others are degraded. For example, due to the absence of the buried layer
and the deep N+ collector in the NPN, the collector resistance is high. Hence,
the cut-off frequency is low, the current drive is poor, and the collector-emitter
saturation voltage is high.
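The step sharing in this low-cost flow can be made explicit by encoding each step together with what it also builds in the NPN; the single bipolar-only step is then the one extra mask. The sketch below is only a bookkeeping illustration: the `Step` type and helper names are hypothetical, and the step list is a simplified reading of the sequence described above and in Fig. 2.17(a).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Step:
    cmos_step: Optional[str]    # baseline CMOS step, or None if bipolar-only
    bipolar_use: Optional[str]  # what the step also builds in the NPN, if anything


# Simplified encoding of the low-cost N-well BiCMOS flow (Example 1).
LOW_COST_FLOW = [
    Step("N-well", "NPN collector"),
    Step("LOCOS isolation", None),
    Step("NMOS/PMOS channel implants", None),
    Step("gate oxide / polysilicon gate", None),
    Step("N+ S/D implant", "NPN emitter"),
    Step("P+ S/D implant", "NPN extrinsic base"),
    Step(None, "P base implant (extra mask)"),
    Step("contacts / metallization", None),
]


def extra_bipolar_steps(flow: List[Step]) -> List[str]:
    """Steps that exist only for the bipolar device, i.e. that cost extra masks."""
    return [s.bipolar_use for s in flow if s.cmos_step is None]


def shared_steps(flow: List[Step]) -> List[Tuple[str, str]]:
    """CMOS steps that are reused to build part of the NPN."""
    return [(s.cmos_step, s.bipolar_use) for s in flow if s.cmos_step and s.bipolar_use]


if __name__ == "__main__":
    for cmos, bip in shared_steps(LOW_COST_FLOW):
        print(f"shared: {cmos} -> {bip}")
    print("bipolar-only:", extra_bipolar_steps(LOW_COST_FLOW))
```

The same encoding makes the cost of the compromise visible: nothing in the list builds a buried layer or a deep N+ collector, which is exactly why the collector resistance of this NPN is high.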
2.5.2 Example 2: Medium-Performance BiCMOS Process

Fig. 2.18 shows a cross-sectional view of a BiCMOS structure, which can
be realized by adding an NPN to a baseline twin-tub CMOS process. This
structure has an N+ buried layer and a deep N+ collector sink, which enhance
the collector conductivity. The N+ buried layer under the PMOS, together with the
uniform N-well, forms the desired retrograde N-well. Similarly, the P+ buried layer
creates a retrograde P-well for the NMOS transistor. It also acts as an isolation
region between the N+ buried layers. A thin epitaxial layer (1 µm - 2 µm)
is used to increase the cutoff frequency of the NPN transistor and to reduce
the required width of the isolation isles between the bipolar transistors. The N
collector is formed at the same time as the N-well of the PMOS transistor. After
the formation of LOCOS, a deep N+ sink is implanted and driven in. The P+
extrinsic base is implanted at the same time as the P+ S/D regions of the PMOS
transistor. The N+ emitter and the N+ S/D share the same implantation
step. In this process an aluminum emitter contact is used. Therefore, the size
of the emitter is larger than in the case where a self-aligned polysilicon
emitter contact is used. This process uses only 3 extra masks to form the
bipolar transistor. The first mask is needed for the N+ buried layer. The second
mask is used to implant the N+ deep collector, and the third one for the base
implantation.

[Figure 2.17 (a) Process sequence of a low-cost N-well BiCMOS, CMOS (base) and bipolar (addition) columns: P-substrate; N-well (shared as collector); LOCOS isolation; NMOS channel implantation; PMOS channel implantation; gate oxide; polysilicon gate; S/D N+ implantation; S/D P+ implantation (shared as P+ extrinsic base); base P implantation; contact opening; metallization. (b) Cross-section of the NPN, NMOS and PMOS devices.]
The BiCMOS process described above can be optimized to be used for high-
performance circuits. The collector resistance is low in comparison to the low-
cost process (Example 1). For a 0.8 µm process, the cut-off frequency (fT) of a
bipolar transistor can be as high as 5 GHz.
2.5.3 Example 3: High-Performance BiCMOS Process

A high-performance BiCMOS process can be achieved by replacing the N+
S/D implant, used to form the emitter in Example 2, with a doped polysilicon
emitter. One extra mask is required to open the emitter window of the bipolar
transistor. The ion implantation of the poly emitter and the MOS gates is per-
formed simultaneously. As shown in Fig. 2.19, four additional mask levels (N+
buried layer, N+ deep collector, P-base, and emitter window) are required to
obtain an advanced BiCMOS.
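The mask counts of the three examples can be tallied directly from the masks named in the text. The sketch below encodes those names (the dictionary keys are illustrative labels) and confirms that the high-performance flow differs from the medium-performance one by exactly the emitter-window mask.

```python
from typing import Dict, List

# Bipolar-only masks added to the CMOS baseline, as named in Examples 1-3.
EXTRA_MASKS: Dict[str, List[str]] = {
    "Example 1 (low-cost)": ["P base"],
    "Example 2 (medium-performance)": ["N+ buried layer", "N+ deep collector", "P base"],
    "Example 3 (high-performance)": [
        "N+ buried layer", "N+ deep collector", "P base", "emitter window",
    ],
}


def added_over(prev: str, nxt: str) -> List[str]:
    """Masks present in flow `nxt` but not in flow `prev`."""
    return [m for m in EXTRA_MASKS[nxt] if m not in EXTRA_MASKS[prev]]


if __name__ == "__main__":
    for name, masks in EXTRA_MASKS.items():
        print(f"{name}: {len(masks)} extra mask(s): {', '.join(masks)}")
    print("Example 3 adds:",
          added_over("Example 2 (medium-performance)", "Example 3 (high-performance)"))
```

Listing masks by name rather than storing only counts makes it easy to see which process feature each added mask pays for.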
After the formation of the N+/P+ buried layers, the conventional twin-tub
process is carried out. LOCOS is used to isolate the devices. The deep
N+ collector is implanted and driven in, and the P-base is then patterned
and implanted. The threshold voltages of the MOS transistors are adjusted by
additional ion implantations. After the gate oxide growth, a thin polysilicon layer is
deposited as shown in Fig. 2.20(a). The emitter window is then patterned and
a second polysilicon layer is deposited [Fig. 2.20(b)]. The polysilicon is then
doped by implantation and patterned to define the CMOS gates and the polysilicon
emitter [Fig. 2.20(c)]. Next, implants are selectively carried out to form the
LDD regions for CMOS. Before implanting the N+/P+ S/D regions, a sidewall
oxide is formed near the emitter and gate edges. Fig. 2.19(b) shows the final
cross-section of this BiCMOS process.

[Figure 2.20 Polysilicon emitter formation (steps shown: apply photoresist; pattern emitter; etch poly/oxide; strip resist; deposit LPCVD poly (250 nm), the second part of the split poly; implant As; apply photoresist; pattern poly; dry etch poly; strip resist; anneal).]
The BJTs realized in the presented high-performance BiCMOS process have
low collector resistance (because of the buried layer and deep sink), high cur-
rent gain (because of the poly emitter contact) and low parasitic capacitances
(because of the self-alignment). With this BiCMOS process, fT values greater than 5
GHz can be achieved.
BiCMOS technology has a relatively high cost and complexity, because it re-
quires a total of 15 masks for a submicron process. Several solutions have been
proposed to reduce the number of process steps in order to lower process complexity
and cost. Recently, one idea [40] has resulted in a low-cost 0.35 µm BiCMOS
technology which needs only 11 masks by using a W-plug trench collector sink.
This technology is suitable for a 3.3 V power supply voltage and promising for
low-power mixed-signal applications.
Recently, BiCMOS technologies with high-fT NPN transistors, from 10 to 30
GHz, have been reported [38, 40, 41]. The applications of these technologies
include, for example, low-voltage (3 V and sub-3 V) and high-speed logic cir-
cuits. Another application of BiCMOS is mixed analog/digital ICs, ranging
from telecommunication circuits and high-speed networks to wireless systems.
Among these applications, BiCMOS can be used for low-power high-frequency
portable systems. Bipolar devices can be used for the high-frequency and high-
speed parts with low-power innovative circuits, and CMOS can be used for the
low-speed ultra-low-power parts.
2.6 COMPLEMENTARY BICMOS TECHNOLOGY

In a Complementary BiCMOS (CBiCMOS) process, both vertical NPN and
PNP transistors are merged with CMOS on the same chip. Recent investiga-
tions indicate that CBiCMOS allows for improving the performance of BiCMOS
gates at low supply voltages [42, 43, 44]. Moreover, for wireless applications,
where high-speed and low-power characteristics are required, CBiCMOS tech-
nology is one of the solutions. The PNP device added to conventional BiCMOS
can be used to efficiently design low-voltage circuits. Further discussion of
CBiCMOS circuits can be found in Section 5.3.2. Although, to date, the NPN
has shown superior performance to that of the PNP, the trend indicates that
PNP performance is approaching that of the NPN. Some of the problems associ-
ated with the PNP transistor are its high collector resistance, low current gain,
and high base transit time.
It has been recently reported that CBiCMOS processes can offer NPNs with
fT values of 8-20 GHz and PNPs with 2-7 GHz [45, 46, 47, 48, 49, 50]. Fig. 2.21
shows a cross-sectional view and process flow of a CBiCMOS [46]. The N+
buried layer of the NPN transistor creates a retrograde well for the PMOS
transistor. The P+ buried layer is only used for isolation isles between NPN
transistors. After the epitaxial layer growth, twin-well and LOCOS processes
are performed. The P-well of the NMOS device is used as the collector of the
PNP transistor. A second high-energy (600 keV) boron ion implantation is
carried out to form the retrograde well (2nd P-well) for the NMOS and the P+
buried layer for the PNP device. The S/D implants of the MOS transistors are used
simultaneously for the extrinsic bases of the NPN and the PNP transistors.
The emitters of the NPN and the PNP are formed by the self-aligned contact
doping technique to simplify the process flow. Finally, the metal is deposited
and patterned.
Complementary BiCMOS offers a technology with versatile devices. It adds
flexibility to mixed bipolar/MOS circuit design. The CBiCMOS technology
promises further improvements to BiCMOS circuit performance.
2.7 BICMOS DESIGN RULES

In this section, a set of lambda-based design rules of a typical BiCMOS process²
(for 0.8 µm, λ = 0.4 µm) is presented. The corresponding device parameters
are presented in Chapter 3.
In this technology, the minimum length of the MOS gate is 2λ, and the minimum length and width
of the bipolar emitter contact are 2λ and 4λ, respectively. Table 2.2 describes
the basic masks used in the layout design of BiCMOS devices. The rest of the
masks are generated automatically.
Table 2.3 lists the design rules for the (design) masks only of a typical BiCMOS
technology in terms of the parameter λ. The corresponding graphical repre-
sentation of the design rules is illustrated in Plate I. Plate II shows the layouts
of minimum size PMOS, NMOS and bipolar transistors in a 0.8 µm BiCMOS
technology.

²The given design rules are typical of a generic 0.8 µm high-performance BiCMOS process.
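Lambda-based rules are scale-free: every dimension is an integer multiple of λ, and the physical size follows by multiplying by λ. The sketch below converts the rules quoted above for λ = 0.4 µm; the second λ value is purely hypothetical and only shows how the same layout would rescale.

```python
LAMBDA_UM = 0.4  # lambda for the 0.8 um process described in the text


def to_um(n_lambda: float, lam_um: float = LAMBDA_UM) -> float:
    """Convert a lambda-based dimension to micrometres."""
    return n_lambda * lam_um


if __name__ == "__main__":
    rules = {
        "min MOS gate length": 2,
        "min emitter contact length": 2,
        "min emitter contact width": 4,
    }
    for name, n in rules.items():
        print(f"{name}: {n} lambda = {to_um(n):.1f} um")
    # Minimum emitter contact area, 2 lambda x 4 lambda:
    print(f"min emitter contact area: {to_um(2) * to_um(4):.2f} um^2")
    # The same layout rescaled to a hypothetical lambda of 0.3 um:
    print(f"gate length at lambda = 0.3 um: {to_um(2, 0.3):.1f} um")
```

At λ = 0.4 µm the minimum gate length is 0.8 µm, which is where the "0.8 µm process" name comes from, and the minimum emitter contact is 0.8 µm × 1.6 µm.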
[Process flow steps: P-substrate; N+/P+ buried layer; N-type epitaxial layer; N/P twin well (P-well for PNP); field isolation; collector deep N+; deep P+ I/I for NMOS retrograde well and 2nd P-well for PNP (P+ buried layer); gate (CMOS); NMOS S/D (N+ extrinsic base for PNP); PMOS S/D (P+ extrinsic base for NPN); NPN base; PNP base; contact holes; N+ and P+ emitter implant; metallization.]

Figure 2.21 (a) Fabrication process flow; (b) cross-sectional view of CBiCMOS [48].
Table 2.2 Basic BiCMOS Design Masks.

N-well (NW): The NW mask is used to define the N substrate (bulk) of the PMOS and the N-collector of the NPN transistor.
N+ deep collector (CN): The CN mask defines the area which is exposed for the N+ sink implantation.
P base (CP): The CP mask defines the region which is to receive a P-implant to create the base diffusion.
Polysilicon (PO): The PO mask defines the gate and the emitter electrodes, and the polysilicon interconnect layer.
Emitter window (EW): The EW mask defines the opening for the emitter window.
N+ and P+ (DN and DP): The DN (DP) mask defines the N+ (P+) source and drain regions of the N-channel (P-channel) device within the P-well (N-well), and the body contact regions in the N-well (P-well), respectively.
Contact (CO): The CO mask defines the contact openings.
Metal 1 (M1): The M1 mask defines the metal 1 interconnects.
Via (VIA): The VIA mask defines the openings of the via that connects metal 1 to metal 2.
Metal 2 (M2): The M2 mask defines the metal 2 interconnects.
1. N-well (NW)
1.1 minimum width 12λ
1.2 minimum spacing 12λ
2. N+ diffusion (DN)
2.1 minimum width 3λ
2.3 minimum NW overlap of DN
2.4 minimum NW to external DN spacing 6λ
3. P+ diffusion (DP)
3.3 minimum NW overlap of DP 4λ
3.4 minimum NW to external DP spacing 4λ
3.5 minimum space to DN (same potential)
3.6 minimum space to DN (different potential) 3λ
4. N-collector plug (CN)
4.3 minimum space to NW 10λ
4.4 minimum NW overlap of CN 3λ
4.5 minimum space to DN 6λ
4.6 minimum space to DP 5λ
5. P-base diffusion (CP)
5.3 minimum NW overlap of CP 3λ
5.4 minimum space to CN 5λ
5.5 minimum space to DN 3λ
5.6 minimum space to DP 3λ
6. Polysilicon (PO)
6.2 minimum spacing 3λ
6.3 minimum space to DP or DN 2λ
6.4 gate overhang of DP or DN 2λ
6.5 minimum space to CN or CP 1λ
7. Emitter window (EW)
7.2 minimum length 4λ
7.4 minimum CP overlap of EW 2λ
7.5 minimum poly overlap of EW 2λ
8. Contact (CO)
8.1 minimum size (single)
8.2 minimum size (double)
8.3 minimum spacing
8.4 minimum DN or DP overlap of CO
8.5 minimum space to gate
8.6 minimum PO overlap of CO 1λ
8.7 minimum CN or CP overlap of CO 1λ
8.8 minimum PO to CO spacing in P-base 2λ
8.9 minimum poly emitter CO to CP spacing 2λ
9. Metal 1 (M1)
9.2 minimum spacing 3λ
9.3 minimum M1 overlap of CO 1λ
9.4 maximum current density 1 mA/µm
Table 2.3 (continued)
10. Metal 2 (M2)
10.1 minimum width
10.2 minimum spacing
10.3 maximum current density
11. Via (VIA)
11.1 minimum size
11.2 minimum spacing
11.3 minimum M1 or M2 overlap of VIA
11.4 minimum VIA to CO spacing
11.5 minimum PO to VIA spacing
11.6 minimum PO overlap of VIA
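Rules such as those in Table 2.3 are normally enforced by a design rule checker (DRC). The sketch below is a toy illustration of the idea, not a production DRC: it checks only a handful of legible width and spacing rules (1.1, 1.2, 6.2, 9.2), represents shapes as axis-aligned rectangles in units of λ, measures corner-to-corner spacing as a Euclidean distance (real rule decks may use other conventions), and treats touching or overlapping shapes on the same layer as merged. The `Rect` type and `check` function are hypothetical names.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List

# A few width/spacing rules from Table 2.3, in units of lambda.
MIN_WIDTH = {"NW": 12}                       # rule 1.1
MIN_SPACING = {"NW": 12, "PO": 3, "M1": 3}   # rules 1.2, 6.2, 9.2


@dataclass(frozen=True)
class Rect:
    layer: str
    x0: float  # all coordinates in lambda
    y0: float
    x1: float
    y1: float


def spacing(a: Rect, b: Rect) -> float:
    """Edge-to-edge Euclidean distance between two rectangles; 0 if they touch or overlap."""
    dx = max(b.x0 - a.x1, a.x0 - b.x1, 0.0)
    dy = max(b.y0 - a.y1, a.y0 - b.y1, 0.0)
    return (dx * dx + dy * dy) ** 0.5


def check(shapes: List[Rect]) -> List[str]:
    """Return a list of human-readable rule violations."""
    violations = []
    for s in shapes:
        w = min(s.x1 - s.x0, s.y1 - s.y0)
        rule = MIN_WIDTH.get(s.layer)
        if rule is not None and w < rule:
            violations.append(f"{s.layer} width {w} < {rule} lambda")
    for a, b in combinations(shapes, 2):
        rule = MIN_SPACING.get(a.layer)
        if a.layer == b.layer and rule is not None:
            d = spacing(a, b)
            if 0 < d < rule:  # d == 0 means the shapes merge into one
                violations.append(f"{a.layer} spacing {d:g} < {rule} lambda")
    return violations


if __name__ == "__main__":
    layout = [
        Rect("PO", 0, 0, 2, 20),
        Rect("PO", 4, 0, 6, 20),    # 2 lambda from the first gate: violates 6.2
        Rect("NW", 20, 0, 30, 40),  # only 10 lambda wide: violates 1.1
    ]
    for v in check(layout):
        print(v)
```

Keeping the rules in λ, as the table does, means the same checker works unchanged if the process is rescaled to a different λ.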
Plate I: Design rules of Table 2.3.

Plate II: Layouts of minimum size PMOS, NMOS and bipolar transistors.
2.8 SILICON ON INSULATOR

Silicon On Insulator (SOI) has recently received renewed interest for low-
voltage and low-power applications. This is due to the reduction of its cost
and the improvement of its performance at lower voltage. The emergence of thin-
film SOI CMOS processes has demonstrated excellent characteristics for deep
submicron ULSI applications.
Many techniques exist to grow silicon on insulator [51]. The most mature
technique is the epitaxial growth of Silicon On Sapphire (SOS). Many LSI/VLSI
circuits have been fabricated using SOS technology. SOI can also be produced
by using what is called SIMOX (Separation by IMplanted OXygen) [52] tech-
nology. It is fabricated simply by the formation of buried oxide (SiO2) by
implantation of oxygen underneath the surface of the silicon, as illustrated in
Fig. 2.22. The dose and energy of the oxygen ions are as high as 2 × 10^18 cm^-2 and
200 keV, respectively. A subsequent thermal annealing at high temperature
is performed to improve the quality of the silicon overlayer. The buried oxide
can be several hundreds of nm thick and the thin silicon layer can be several
tens of nm thick. Compared to SOS, SOI SIMOX materials have better
defect density and thin silicon layer control. The dislocation density can be
lower than 10^… cm^-2. One important phenomenon which exists in CMOS SOI
devices is the kink effect. It consists of a "kink" which appears in the out-
put characteristics of an SOI MOSFET, as illustrated in Fig. 2.23. It is due
mainly to the floating substrate of an NMOS device. An explanation of this
phenomenon can be found in [51].
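The quoted oxygen dose and the "several hundreds of nm" buried oxide are consistent, as a back-of-the-envelope estimate shows. The sketch below assumes that every implanted oxygen atom ends up in stoichiometric SiO2 with about 2.2 × 10^22 molecules/cm³ (4.4 × 10^22 oxygen atoms/cm³); it ignores sputtering, oxygen left in the silicon overlayer and other losses, so the result is an upper bound.

```python
N_OXYGEN_IN_SIO2 = 4.4e22  # O atoms per cm^3 in SiO2 (about 2.2e22 molecules/cm^3 x 2)


def buried_oxide_thickness_nm(dose_cm2: float) -> float:
    """Upper-bound BOX thickness if every implanted O atom forms stoichiometric SiO2."""
    thickness_cm = dose_cm2 / N_OXYGEN_IN_SIO2
    return thickness_cm * 1e7  # cm -> nm


if __name__ == "__main__":
    # Dose quoted in the text for SIMOX: 2 x 10^18 cm^-2.
    print(f"Estimated BOX thickness: {buried_oxide_thickness_nm(2e18):.0f} nm")
```

The estimate comes out at roughly 450 nm, in line with a buried oxide several hundred nanometres thick.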
Figure 2.23 Kink effect in the output characteristics of an SOI MOS device (drain current versus drain voltage).
SOI SIMOX is now a mature material and represents a potential technology
for low-power applications. Several LSI/VLSI circuits have been fabricated
in SOI/SIMOX, particularly for low-power applications. Such circuits include
a PLL (Phase-Locked Loop) for wireless terminal applications [54] and a 1.2-
GHz frequency divider operating from a 1-V power supply [55]. SOI technology was
also applied to design a fully pipelined 512-Kb SRAM [53]. This SRAM worked
successfully down to 0.… V with an access time of less than 5 ns.
Fig. 2.24 shows a thin-film SOI/SIMOX CMOS process cross-section. The pro-
cess starts with the formation of the buried oxide in the silicon wafer, as explained above
[Fig. 2.24(a)]. Then, an oxide is grown on the surface silicon and a nitride
layer is deposited. Silicon nitride is used as a mask to protect the active region
from oxidation. The nitride/oxide layers are patterned and LOCOS isolation
is applied [Fig. 2.24(b)]. At the end, the nitride/oxide layers are removed. This
is followed by a P-type ion implantation (I/I) to adjust the threshold voltage of the N-channel transistor.
Similarly, the threshold voltage of the P-channel transistor is adjusted by I/I.
A thin gate oxide is then grown and a layer of polysilicon is deposited and
doped with phosphorus. Then the P+ source and drain regions of the PMOS
are patterned and implanted with boron [Fig. 2.24(c)]. Similarly, the N+ S/D
regions of the NMOS are patterned and implanted with phosphorus. A thick
oxide is then deposited as an isolation layer between the polysilicon and the
subsequent metal layer. The oxide is etched at contact locations. Next, the
metal layer (aluminum) is deposited over the whole surface. Finally, the metal
is etched and annealed.

Figure 2.24 Process steps of CMOS thin-film SOI/SIMOX devices (steps shown include: strip nitride and oxide; P-channel VT implant; N-channel VT pattern and implant; grow gate oxide; deposit and pattern polysilicon).
This simple process description shows that the SOI process is much simpler than
bulk CMOS. For instance, the wells are no longer needed, and the punchthrough
implants are also unnecessary if thin-film SOI is used. Fig. 2.25 shows a …
Due to the dielectric isolation, the MOS devices have several advantages over
bulk CMOS, such as absence of latch-up, high packing density and lower para-
sitic capacitances. SOI reduces the circuit capacitance by 30% [57]. It has been
discovered that if the silicon (containing the devices) is made sufficiently thin
(< 100 nm), the MOSFET devices are fully depleted [51] even when VGS = 0.
Fully depleted thin-film SOI MOS devices offer attractive characteristics for
CMOS applications, such as immunity from short-channel effects, absence of the
kink effect, superior subthreshold leakage characteristics and high drain saturation current
(due to low channel doping) [58, 59, 60].
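The roughly 100 nm film thickness at which devices become fully depleted can be related to the maximum depletion depth of the body, a standard textbook expression. The sketch below uses the simple criterion "film thinner than the maximum depletion width", which ignores the back interface, the buried-oxide charge and body bias; the doping levels and film thicknesses are hypothetical examples, not values from the text.

```python
import math

Q = 1.602e-19               # elementary charge, C
EPS_SI = 11.7 * 8.854e-14   # silicon permittivity, F/cm
N_I = 1.0e10                # intrinsic carrier density of Si at 300 K, cm^-3 (approx.)
KT_Q = 0.0259               # thermal voltage at 300 K, V


def max_depletion_width_nm(n_a_cm3: float) -> float:
    """Maximum depletion depth at strong inversion: W = sqrt(2*eps_Si*(2*phi_F) / (q*N_A))."""
    phi_f = KT_Q * math.log(n_a_cm3 / N_I)
    w_cm = math.sqrt(2 * EPS_SI * 2 * phi_f / (Q * n_a_cm3))
    return w_cm * 1e7  # cm -> nm


def is_fully_depleted(t_si_nm: float, n_a_cm3: float) -> bool:
    """Simplified criterion: the silicon film is thinner than the maximum depletion width."""
    return t_si_nm < max_depletion_width_nm(n_a_cm3)


if __name__ == "__main__":
    for n_a in (1e16, 1e17, 1e18):
        print(f"N_A = {n_a:.0e} cm^-3: W_max ~ {max_depletion_width_nm(n_a):.0f} nm")
    print("80 nm film at 1e17 fully depleted?", is_fully_depleted(80, 1e17))
    print("150 nm film at 1e17 fully depleted?", is_fully_depleted(150, 1e17))
```

For a body doping around 10^17 cm^-3 the maximum depletion width is close to 100 nm, so films thinner than that deplete completely; the lower channel doping such films allow is also what raises the drain saturation current.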
Unfortunately, the technology has minor disadvantages, such as floating-body
effects, which result in i) floating-body-induced threshold voltage lowering and
ii) low drain-to-source breakdown voltage. For a 1 V power supply this is not
a problem. However, for 3 V operation this could be an important limitation.
Also, the threshold voltage is very sensitive to the thickness uniformity of the
superficial silicon. In addition, the low thermal conductivity of the oxide un-
derneath the thin-film silicon layer is a severe problem when the SOI circuit
is operating at high frequency. Therefore, technological improvements are still
needed to solve these limitations.
2.9 CHAPTER SUMMARY

In this chapter, we have studied the process technologies of CMOS and bipolar
devices. We have shown that advanced CMOS and bipolar processes are converging,
and that many process techniques can be shared for the fabrication of both
devices. The different options for merging bipolar and CMOS devices were then
discussed. Three examples of BiCMOS processes with different complexities were
presented, and the complementary BiCMOS process was also considered. A table of
design rules for a state-of-the-art BiCMOS technology was given for layout
exercises. Several advanced technologies, such as CMOS SOI/SIMOX and CMOS-SJET,
were reviewed for low-voltage operation.
REFERENCES
[1] F.M. Wanlass and C.T. Sah, "Nanowatt Logic Using Field-Effect MOS Triodes,"
International Solid-State Circuits Conference Tech. Dig., pp. 32-33, 1963.
[2] L.C. Parrillo, R.S. Payne, R.E. Davis, G.W. Reutlinger, and R.L. Field,
"Twin-Tub CMOS: A Technology for VLSI Circuits," International Electron Devices
Meeting Tech. Dig., pp. 752-755, December 1980.
[3] Y. Taur et al., "High-Performance 0.1 µm CMOS Devices with 1.5 V Power
Supply," International Electron Devices Meeting Tech. Dig., pp. 127-130,
December 1993.
[4] K.F. Lee et al., "Room Temperature 0.1 µm CMOS Technology with 11.8 ps Gate
Delay," International Electron Devices Meeting Tech. Dig., pp. 131-134,
December 1993.
[5] K. Takeuchi et al., "0.15 µm CMOS with High Reliability and Performance,"
International Electron Devices Meeting Tech. Dig., pp. 883-886, December 1993.
[6] T. Yamazaki, K. Goto, T. Fukano, Y. Nara, T. Sugii, and T. Ito, "21 ps
Switching 0.1 µm-CMOS at Room Temperature using High Performance Co Salicide
Process," International Electron Devices Meeting Tech. Dig., pp. 906-908,
December 1993.
[7] A. Oyamatsu, K. Kinugawa, and M. Kakumu, "Design Methodology of Deep
Submicron CMOS Devices for 1 V Operation," Symposium on VLSI Technology Tech.
Dig., pp. 89-90, 1993.
[8] H. Yoshimura, F. Matsuoka, and M. Kakumu, "New CMOS Shallow Junction Well
FET Structure (CMOS-SJET) for Low Power-Supply Voltage," International Electron
Devices Meeting Tech. Dig., pp. 909-912, December 1992.
[9] T. Uchino, T. Shiba, T. Kikuchi, Y. Tamaki, A. Watanabe, Y. Kiyota, and
M. Honda, "15-ps ECL/74-GHz fT Bipolar Technology," International Electron
Devices Meeting Tech. Dig., pp. 67-70, December 1993.
[10] T.H. Ning and D.D. Tang, "Bipolar Trends," Proc. IEEE, vol. 74, no. 12,
pp. 1669-1671, December 1986.
[11] T. Nakamura, T. Miyazaki, S. Takahashi, T. Kure, T. Okabe, and M. Nagata,
"Self-Aligned Bipolar Transistor with Polysilicon Sidewall Base Electrode for
High Packing Density and High Speed," IEEE Journal of Solid-State Circuits,
vol. 17, no. 2, pp. 226-230, April 1982.
[12] T.H. Ning and R.D. Isaac, "Effect of Emitter Contact on Current Gain of
Silicon Bipolar Devices," IEEE Trans. on Electron Devices, vol. ED-27,
pp. 2051-2055, November 1980.
[13] A.K. Kapoor and D.J. Roulston, "Polysilicon Emitter Bipolar Transistors,"
IEEE Press Book, 1989.
[14] M.I. Elmasry, "Digital Bipolar Integrated Circuits," John Wiley & Sons,
New York, 1983.
[15] B. Razavi, Y. Ota, and R.G. Swartz, "Design Techniques for Low-Voltage
High-Speed Digital Bipolar Circuits," IEEE Journal of Solid-State Circuits,
vol. 29, no. 3, pp. 332-339, March 1994.
[16] W. Wilhelm and P. Weger, "Low-Power Bipolar Logic," International
Solid-State Circuits Conference Tech. Dig., pp. 94-95, February 1994.
[17] E. Kooi, J.G. van Lierop, and J.A. Appels, "Formation of Silicon Nitride at
a Si-SiO2 Interface during Local Oxidation of Silicon and during Heat Treatment
of Oxidized Silicon in NH3 Gas," J. Electrochem. Soc., vol. 123, p. 1117, 1976.
[18] R.D. Rung, H. Momose, and Y. Nagakubo, "Deep-Trench Isolated CMOS Devices,"
International Electron Devices Meeting Tech. Dig., pp. 6-9, December 1982.
[19] T. Yamaguchi, S. Morimoto, G. Kawamoto, H.K. Park, and G.C. Eiden,
"High-Speed Latch-up Free 0.5 µm-Channel CMOS using Self-Aligned TiSi2 and
Deep-Trench Isolation Technologies," International Electron Devices Meeting
Tech. Dig., pp. 522-525, December 1983.
[20] R.D. Rung, "Trench Isolation Prospects for Application in CMOS VLSI,"
International Electron Devices Meeting Tech. Dig., pp. 574-577, December 1984.
[21] A. Mikashiba, T. Homma, and K. Hamano, "A New Trench Isolation Technology
as a Replacement for LOCOS," International Electron Devices Meeting Tech. Dig.,
pp. 578-581, December 1984.
[22] P. Singer, "Selective Epitaxial Growth Finds New Applications,"
Semiconductor International, p. 15, January 1988.
[23] R.A. Chapman et al., "An 0.8 µm CMOS Technology for High-Performance Logic
Applications," International Electron Devices Meeting Tech. Dig., pp. 362-365,
December 1981.
[24] K.Y. Chiu, R. Fang, J. Lin, and J.L. Moll, "The SWAMI: A Defect Free and
Near-Zero Bird's Beak Local Oxidation Technology for VLSI," Symp. on VLSI
Technology Tech. Dig., pp. 28-29, 1982.
[25] K.Y. Chiu, J.L. Moll, and J. Manoliu, "A Bird's Beak Free Local Oxidation
Technology Feasible for VLSI Circuits Fabrication," IEEE Trans. on Electron
Devices, vol. ED-29, pp. 536-540, 1982.
[26] J. Hui, P. Vande Voorde, and J. Moll, "Scaling Limitations of Submicron
Local Oxidation Technology," International Electron Devices Meeting.
[27] H.B. Pogge, "Trench Isolation Technology," Bipolar Circuits and Technology
Meeting Tech. Dig., pp. 18-25, September 1990.
[28] Y. Niitsu, "Latch-up Free CMOS Structure using Shallow Trench Isolation,"
International Electron Devices Meeting Tech. Dig., pp. 509-512, December 1985.
[29] H. Yamamoto, O. Mizuno, T. Kubota, M. Nakamae, A. Shiraki, and Y. Ikushima,
"High-Speed Performance of a Basic ECL Gate with 1.25 Micron Design Rule,"
Symp. on VLSI Technology Tech. Dig., pp. 38-39, 1981.
[30] Y. Tamaki, T. Shiba, N. Honma, S. Mizuo, and A. Hayasaka, "New U-Groove
Isolation Technology for High-Speed Bipolar Memory," Symp. on VLSI Technology
Tech. Dig., pp. 24-25, 1983.
[31] D.D. Tang, P.M. Solomon, T.H. Ning, R.D. Isaac, and R.E. Burger, "1.25 µm
Deep-Groove-Isolated Self-Aligned Bipolar Circuits," IEEE Journal of Solid-State
Circuits, vol. SC-17, pp. 925-931, 1982.
[32] H.C. Lin, J.C. Ho, R.R. Iyer, and K. Kwong, "CMOS-Bipolar Transistor
Structure," IEEE Trans. Electron Devices, vol. ED-16, no. 11, pp. 945-951,
November 1969.
[33] T. Ikeda, A. Watanabe, Y. Nishio, I. Masuda, N. Tamba, M. Okada, and
K. Ogiue, "High-Speed BiCMOS Technology with a Buried Twin Well Structure,"
IEEE Trans. on Electron Devices, vol. ED-34, no. 6, pp. 1304-1309, June 1987.
[34] H. Momose, K.M. Cham, C.I. Drowley, H.R. Grinolds, and R.S. Fu, "0.5 Micron
BiCMOS Technology," International Electron Devices Meeting.
[35] A.R. Alvarez, J. Teplik, D.W. Schucker, T. Hulseweh, H.B. Liang, M. Dydyk,
and I. Rahim, "Second Generation BiCMOS Gate Array Technology," Bipolar Circuits
and Technology Meeting Tech. Dig., pp. 113-117, 1987.
[36] B. Bastani, C. Lage, L. Wong, J. Small, R. Lahri, L. Bouknight, T. Bowman,
J. Manoliu, and T. Tuntasood, "Advanced 1 Micron BiCMOS Technology for High Speed
256K SRAMs," Symp. on VLSI Technology Tech. Dig., pp. 41-42, 198?.
[37] T. Yamaguchi and T.H. Yuzuriha, "Process Integration and Device Performance
of a Submicron BiCMOS with 16-GHz fT Double Poly-Bipolar Devices," IEEE Trans.
on Electron Devices, vol. 36, no. 5, pp. 890-896, May 1989.
[38] C.K. Lau, C-H Lin, and D.L. Packwood, "Sub-micron BiCMOS Process Design for
Manufacturing," Bipolar/BiCMOS Circuits and Technology Meeting Tech. Dig.,
pp. 76-83, 1992.
[39] C.H. Wang and J. Van Der Velden, "A Single-Poly BiCMOS Technology with a
30 GHz Bipolar fT," Bipolar/BiCMOS Circuits and Technology Meeting Tech. Dig.,
pp. 234-237, October 1994.
[40] S. Yoshida, H. Suzuki, Y. Kinoshita, K. Imai, T. Akimoto, K. Tokashiki, and
T. Yamazaki, "Process Integration Technology for Low Process Complexity BiCMOS
using Trench Collector Sink," Bipolar/BiCMOS Circuits and Technology Meeting
Tech. Dig., pp. 230-233, October 1994.
[41] J.M. Sung et al., "BESTP: A High Performance Super-Aligned 3V/5V BiCMOS
Technology with Extremely Low Parasitics for Low-Power Mixed-Signal
Applications," IEEE Custom Integrated Circuits Conf. Tech. Dig., pp. 15-18,
May 1994.
[42] H.J. Shin, "Performance Comparison of Driver Configurations and Full-Swing
Techniques for BiCMOS Logic Circuits," IEEE Journal of Solid-State Circuits,
vol. 25, no. 3, pp. 863-865, June 1990.
[43] S.H.K. Embabi, A. Bellaouar, M.I. Elmasry, and R.A. Hadaway, "New
Full-Voltage-Swing BiCMOS Buffers," IEEE Journal of Solid-State Circuits,
vol. SC-26, pp. 150-153, February 1991.
[44] M. Hiraki, K. Yano, M. Minami, K. Sato, N. Matsuzaki, A. Watanabe,
T. Nishida, K. Sasaki, and K. Seki, "A 1.5-V Full-Swing BiCMOS Logic Circuit,"
IEEE Journal of Solid-State Circuits, vol. 27, no. 11, pp. 1568-1574,
November 1992.
[45] Y. Kobayashi, C. Yamaguchi, Y. Amemiya, and T. Sakai, "High Performance LSI
Process Technology: SST CBiCMOS," International Electron Devices Meeting Tech.
Dig., pp. 760-763, December 1988.
[46] K. Higashitani, H. Honda, K. Ueda, M. Hatanaka, and S. Nagao, "A Novel
CBi-CMOS Technology by DIP Process," Symp. on VLSI Technology Tech. Dig.,
pp. 77-78, 1990.
[47] T. Maeda, K. Ishimaru, and H. Momose, "Lower Submicron FCBiMOS (Fully
Complementary BiMOS) Process with RTP and MeV Implanted 5 GHz Vertical PNP
Transistor," Symp. on VLSI Technology Tech. Dig., pp. 79-80, 1990.
[48] W.R. Burger, C. Lage, B. Landau, M. DeLong, and J. Small, "An Advanced
0.8 Micron Complementary BiCMOS Technology for Ultra-High Speed Circuit
Performance," Bipolar Circuits and Technology Meeting.
[49] S.W. Sun et al., "A Fully Complementary BiCMOS Technology for
Sub-Half-Micrometer Microprocessor Applications," IEEE Trans. Electron Devices,
vol. 39, no. 12, pp. 2733-2739, December 1992.
[50] T. Ikeda, T. Nakashima, S. Kubo, A. Jouba, and M. Yamawaki, "A High
Performance CBiCMOS with Novel Self-Aligned Vertical PNP," Bipolar/BiCMOS
Circuits and Technology Meeting Tech. Dig., pp. 238-240, October 1994.
[51] J.P. Colinge, "SOI Technology: Materials to VLSI," Kluwer Academic
Publishers, 1991.
[52] K. Izumi, M. Doken, and H. Ariyoshi, "CMOS Devices Fabricated on Buried SiO2
Layers Formed by Oxygen Implantation into Silicon," Electron. Lett., vol. 14,
pp. 593-594, 1978.
[53] G.G. Shahidi, T.H. Ning, R.H. Dennard, and B. Davari, "SOI for Low-Voltage
and High-Speed CMOS," International Conf. SSDM, Japan, pp. 265-267, 1994.
[54] Y. Kado, T. Ohno, M. Harada, K. Deguchi, and T. Tsuchiya, "Enhanced
Performance of Multi-GHz PLL LSIs using Sub-1/4-micron Gate Ultrathin Film
CMOS/SIMOX Technology with Synchrotron X-ray Lithography," IEDM Tech. Digest,
pp. 243-246, December 1993.
[55] M. Fujishima, K. Asada, Y. Omura, and K. Izumi, "Low-Power 1/2 Frequency
Dividers using 0.1-µm CMOS Circuits Built with Ultrathin SIMOX Substrate,"
IEEE Journal of Solid-State Circuits, vol. 28, no. 4, pp. 510-512, April 1993.
[56] T. Ohno, Y. Kado, M. Harada, and T. Tsuchiya, "A High-Performance
Ultra-Thin Quarter-Micron CMOS/SIMOX Technology," IEEE Symposium on VLSI
Technology Tech. Dig., pp. 25-26, 1993.
[57] Y. Yamaguchi, A. Ishibashi, M. Shimizu, T. Nishimura, K. Tsukamoto,
K. Horie, and Y. Akasaka, "A High-Speed 0.6-µm 16K CMOS Gate Array on a Thin
SIMOX Film," IEEE Trans. Electron Devices, vol. 40, no. 1, pp. 179-186,
January 1993.
[58] J.P. Colinge, "Subthreshold Slope of Thin-Film SOI MOSFET's," IEEE Electron
Device Letters, pp. 274-276, September 1988.
[59] J.C. Sturm, K. Tokunaga, and J.P. Colinge, "Increased Drain Saturation
Current in Ultrathin SOI MOS Transistors," IEEE Electron Device Letters, vol. 9,
no. 9, pp. 460-?, September 1988.
[60] Y. Omura, S. Nakashima, K. Izumi, and T. Ishii, "0.1-µm-Gate Ultrathin Film
CMOS Devices using SIMOX Substrate with 80-nm Thick Buried Oxide Layer," IEDM
Tech. Dig., pp. 675-678, December 1991.
3
LOW-VOLTAGE DEVICE
MODELING
The objective of this chapter is two-fold. It is intended to review the basics
of the MOS transistor, which is a prerequisite for Chapters 4 to 7, and to
introduce commonly used models of both MOS and bipolar devices [Sections 3.1,
3.2, and 3.6]. In this chapter we consider simple analytical models which can be
used for circuit analysis and design of deep-submicrometer MOSFETs at low
voltage. Also, a simple model to compute the leakage current of MOSFETs is
presented [Section 3.3]. The more sophisticated SPICE device models are also
presented to allow the reader to appreciate the meaning of the model parameters
as well as the capabilities and limitations of these models. The SPICE
parameters for the 0.8 µm CMOS/BiCMOS process presented in Chapter 2 are included
in this chapter for readers who are interested in designing and simulating
low-voltage CMOS circuits as well as BiCMOS circuits. In Section 3.4, supply
voltage scaling due to reliability and power dissipation issues is presented.
3.1 MOSFET STRUCTURE AND OPERATION

Fig. 3.1 shows cross-sections and views of an N-channel MOS transistor; W and L
denote the channel width and length, respectively. By applying a positive gate
voltage $V_{GS}$, a depletion layer is induced in the channel. A further increase
in $V_{GS}$ results in a surface inversion layer. The surface charge of the
semiconductor ($Q_S$, C/cm$^2$) is equal in magnitude to the charge of the gate
electrode ($Q_G$, C/cm$^2$). Thus, we have

$$Q_S = -Q_G = -(V_{GS} - V_{FB} - \phi_s)\,C_{ox} \tag{3.1}$$
where $V_{GS}$ is the gate-source voltage and $\phi_s$ is the semiconductor
surface potential. $C_{ox}$ is the gate oxide capacitance per unit area and is
given by

$$C_{ox} = \frac{\epsilon_{ox}}{t_{ox}} \tag{3.2}$$

where $\epsilon_{ox}$ is the oxide permittivity and $t_{ox}$ is the gate oxide
thickness. The flatband voltage $V_{FB}$ is given by

$$V_{FB} = \phi_{ms} - \frac{Q_o}{C_{ox}} \tag{3.3}$$
$Q_o$ is the total of all charges in the oxide and near the oxide/silicon
interface. This charge is positive. The work function difference $\phi_{ms}$
between the gate electrode and the semiconductor depends on the type of the
electrode and the doping concentration of the semiconductor. For an aluminum
electrode, we have (in volts)

$$\phi_{ms} = -0.61 + \phi_f \tag{3.4}$$

For an N$^+$ polysilicon electrode, we have

$$\phi_{ms} = -0.55 + \phi_f \tag{3.5}$$

The Fermi potential $\phi_f$ in Equations (3.4) and (3.5) is given by

$$\phi_{fp} = -V_t \ln\left(\frac{N_a}{n_i}\right) \quad \text{for P-type Si} \tag{3.6}$$

$$\phi_{fn} = +V_t \ln\left(\frac{N_d}{n_i}\right) \quad \text{for N-type Si} \tag{3.7}$$
where $V_t = kT/q$ is the thermal voltage. The charge $Q_S$ is the sum of the
charge in the depletion layer, $Q_B$, and the charge in the inversion layer,
$Q_I$. Therefore,

$$V_{GS} = V_{FB} + \phi_s - \frac{Q_B + Q_I}{C_{ox}} \tag{3.8}$$

The bulk depletion charge (per unit area) consists of ionized acceptors (P-type
substrate) or donors (N-type substrate). The depletion charge of a P-type bulk,
with zero bulk-source bias ($V_{BS} = 0$), is given by

$$Q_{B0} = -qN_aW_D \tag{3.9}$$

Figure 3.1 (a) The layout and cross-sections of an NMOS transistor; (b) Symbols
of different types of MOS transistors (NMOS enhancement mode, NMOS depletion
mode, PMOS enhancement mode).
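where $q$ is the electron charge and $N_a$ is the acceptor concentration. The
width of the depletion layer in the bulk ($W_D$) is given by

$$W_D = \sqrt{\frac{2\epsilon_{si}\phi_s}{qN_a}} \tag{3.10}$$

where $\epsilon_{si}$ is the silicon permittivity. The turn-on (or threshold)
voltage of an NMOS transistor is defined as the gate-source voltage at which the
surface potential $\phi_s$ is equal to $2|\phi_f|$. This condition also defines
what is known as strong inversion. At the onset of strong inversion we can
assume that $Q_S \approx Q_B$. Using Equation (3.8), we can write the following
expression for the threshold voltage

$$V_{T0} = V_{FB} + \phi_s - \frac{Q_{B0}}{C_{ox}} \tag{3.11}$$

$Q_{B0}$ is equal to $-qN_aW_{D0}$, where $W_{D0} = W_D(\phi_s = 2|\phi_f|)$.
Thus, the threshold voltage can be rewritten as

$$V_{T0} = V_{FB} + 2|\phi_f| + \frac{\sqrt{2q\epsilon_{si}N_a\,(2|\phi_f|)}}{C_{ox}} \tag{3.12}$$

If the bulk-source junction is reverse biased ($|V_{BS}| > 0$), the threshold
voltage becomes

$$V_T = V_{FB} + 2|\phi_f| + \frac{\sqrt{2q\epsilon_{si}N_a\,(|V_{BS}| + 2|\phi_f|)}}{C_{ox}} \tag{3.13}$$

This equation can be rewritten as

$$V_T = V_{T0} + \gamma\left(\sqrt{2|\phi_f| + |V_{BS}|} - \sqrt{2|\phi_f|}\right) \tag{3.14}$$

where the body effect coefficient $\gamma$ is given by

$$\gamma = \frac{\sqrt{2q\epsilon_{si}N_a}}{C_{ox}} \tag{3.15}$$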
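As a numerical illustration of Equations (3.2), (3.6), (3.9), (3.10), and (3.15), the following Python sketch evaluates the Fermi potential, oxide capacitance, depletion charge, and body effect coefficient of an N-channel device. The doping $N_a = 3.23 \times 10^{16}$ cm$^{-3}$ (NSUB) and the 17.5 nm oxide (TOX) are taken from the N-channel LEVEL 3 parameters of Table 3.2; the physical constants and $n_i = 1.45 \times 10^{10}$ cm$^{-3}$ at 300 K are standard values assumed here. The last line checks that $-Q_{B0}/C_{ox}$ equals $\gamma\sqrt{2|\phi_f|}$, i.e., that Equations (3.12) and (3.15) agree.

```python
import math

# Physical constants in cm-based units; standard values assumed for this sketch.
Q = 1.602e-19              # electron charge (C)
K_B = 1.381e-23            # Boltzmann constant (J/K)
EPS0 = 8.854e-14           # vacuum permittivity (F/cm)
EPS_SI = 11.7 * EPS0       # silicon permittivity (F/cm)
EPS_OX = 3.9 * EPS0        # SiO2 permittivity (F/cm)
NI = 1.45e10               # intrinsic carrier density at 300 K (cm^-3), assumed


def thermal_voltage(temp_k=300.0):
    """V_t = kT/q in volts."""
    return K_B * temp_k / Q


def fermi_potential_p(na, temp_k=300.0):
    """Eq. (3.6): phi_fp = -V_t ln(Na/ni); negative for P-type Si."""
    return -thermal_voltage(temp_k) * math.log(na / NI)


def oxide_capacitance(tox_cm):
    """Eq. (3.2): C_ox = eps_ox / t_ox in F/cm^2."""
    return EPS_OX / tox_cm


def depletion_width(phi_s, na):
    """Eq. (3.10): W_D = sqrt(2 eps_si phi_s / (q Na)) in cm."""
    return math.sqrt(2 * EPS_SI * phi_s / (Q * na))


def body_coefficient(na, cox):
    """Eq. (3.15): gamma = sqrt(2 q eps_si Na) / C_ox in V^0.5."""
    return math.sqrt(2 * Q * EPS_SI * na) / cox


na = 3.23e16                       # NSUB, N-channel (cm^-3)
cox = oxide_capacitance(17.5e-7)   # TOX = 17.5 nm = 17.5e-7 cm
phi_f = fermi_potential_p(na)
two_phi_f = 2 * abs(phi_f)
qb0 = -Q * na * depletion_width(two_phi_f, na)   # Eq. (3.9) at phi_s = 2|phi_f|
gamma = body_coefficient(na, cox)

print(f"phi_f = {phi_f:.3f} V, C_ox = {cox:.3e} F/cm^2, gamma = {gamma:.3f} V^0.5")
print(f"-Q_B0/C_ox = {-qb0 / cox:.3f} V, gamma*sqrt(2|phi_f|) = {gamma * math.sqrt(two_phi_f):.3f} V")
```

The $\gamma$ obtained this way differs from the value used in the example that follows, because that example refers to a different device; the sketch only shows how the formulas fit together.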
This value is negative and is not suitable for digital circuits, where a positive
$V_{T0}$ is required for switching. To get a reasonable $V_{T0}$, the device
surface is implanted with boron. The implanted dose $D_I$ causes $V_{T0}$ to
increase by the amount $qD_I/C_{ox}$. The threshold voltage is hence given by

$$V_{T0} = V_{FB} + 2|\phi_f| + \gamma\sqrt{2|\phi_f|} + \frac{qD_I}{C_{ox}} \tag{3.16}$$

Consider now the previous example with $D_I = 1.725 \times 10^{12}$ cm$^{-2}$ and
$\gamma = 0.238$ V$^{1/2}$: we find that $V_T$ is equal to 0.7 V when
$|V_{BS}| = 0$ V and is equal to 0.98 V when $|V_{BS}| = 3.3$ V.
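The two threshold values in this example follow from Equation (3.14). The short Python check below reproduces them. It assumes $2|\phi_f| = 0.7$ V, a value not stated in this section but consistent with the quoted results.

```python
import math


def threshold_with_body_bias(vt0, gamma, two_phi_f, vbs):
    """Eq. (3.14): V_T = V_T0 + gamma*(sqrt(2|phi_f| + |V_BS|) - sqrt(2|phi_f|))."""
    return vt0 + gamma * (math.sqrt(two_phi_f + abs(vbs)) - math.sqrt(two_phi_f))


VT0 = 0.7        # V, threshold after the boron implant (from the example)
GAMMA = 0.238    # V^0.5 (from the example)
TWO_PHI_F = 0.7  # V, ASSUMED value of 2|phi_f|

for vbs in (0.0, 3.3):
    vt = threshold_with_body_bias(VT0, GAMMA, TWO_PHI_F, vbs)
    print(f"|V_BS| = {vbs:.1f} V -> V_T = {vt:.2f} V")   # 0.70 V, then 0.98 V
```

Because only $|V_{BS}|$ enters the expression, the same function applies whether the bulk bias is written as a positive or a negative number.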
The symbols of the NMOS and PMOS transistors are shown in Fig. 3.1(b).
Typical values of $V_T$ are -2.5 V to -4 V for depletion-mode NMOS devices. For
low-voltage CMOS they are 0.3 V to 0.8 V for enhancement-mode NMOS devices and
-0.3 V to -0.8 V for enhancement-mode PMOS devices.
When $V_{GS} < V_{T0}$, the transistor is in the cutoff region, since no
inversion layer exists, as shown in Fig. 3.2(a). The drain current is,
therefore, approximately zero. When $V_{GS} > V_{T0}$, the channel is formed and
a drain current flows from the drain to the source [Fig. 3.2(b)]. The transistor
is in the linear region (also called the ohmic region) when $V_{GD}$ (i.e.,
$V_{GS} - V_{DS}$) $\geq V_T$. When $V_{GS} > V_T$ and $V_{DS} > V_{GS} - V_T$
(i.e., $V_{GD} < V_T$), the channel is pinched off, as illustrated in
Fig. 3.2(c), and the device enters the saturation region. The drain-source
voltage which causes the channel to pinch off at the drain edge is commonly known
as the saturation drain-source voltage $V_{DS,sat}$ and is equal to
$V_{GS} - V_T$.
The voltage drop between the pinch-off point and the source is $V_{DS,sat}$.
Any portion of $V_{DS}$ in excess of $V_{DS,sat}$ appears between the pinch-off
point and the drain. If we assume that the distance between the pinch-off point
and the drain is extremely small compared with the overall channel length, then
for $V_{DS} > V_{DS,sat}$ the drain current is constant. The carriers which reach
the pinch-off point are swept across to the drain by the potential
($V_{DS} - V_{DS,sat}$) between the drain and the end of the channel.
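The region boundaries described above can be captured in a few lines. This is a sketch for an NMOS device with $V_{DS} \geq 0$, under the long-channel assumptions of this section; the function names are ours, and the boundary case $V_{GS} = V_T$ is assigned to cutoff.

```python
def nmos_region(vgs, vds, vt):
    """Classify an NMOS operating point (vds >= 0 assumed) using the text's conditions."""
    if vgs <= vt:
        return "cutoff"          # no inversion layer, I_DS ~ 0
    if vgs - vds >= vt:          # V_GD >= V_T: channel extends to the drain
        return "linear"
    return "saturation"          # V_GD < V_T: channel pinched off at the drain


def vds_sat(vgs, vt):
    """Saturation drain-source voltage V_DS,sat = V_GS - V_T (zero in cutoff)."""
    return max(vgs - vt, 0.0)


print(nmos_region(0.5, 1.0, 0.7))   # cutoff
print(nmos_region(3.3, 0.2, 0.7))   # linear
print(nmos_region(3.3, 3.3, 0.7))   # saturation
```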
3.2 SPICE MODELS OF THE MOS TRANSISTOR
3.2.1 The Simple MOS DC Model

Let us now analyze the simple DC model describing the I-V characteristics of
an MOS transistor.
From Fig. 3.3 it can be shown that the element $dx$ has a resistance

$$dR = \frac{dx}{W\mu\,|Q_I(x)|} \tag{3.17}$$

We assume that the mobility ($\mu$) of the electrons in the channel of an NMOS
device is constant. A current $I_{DS}$ crossing the incremental resistance $dR$
causes a voltage drop of

$$dV = I_{DS}\,dR \tag{3.18}$$

Substituting from Equation (3.17) in Equation (3.18) and integrating from the
source ($V = 0$) to the drain ($V = V_{DS}$), we obtain

$$I_{DS} = \frac{W\mu}{L}\int_0^{V_{DS}} |Q_I(V)|\,dV \tag{3.19}$$

To solve this integration, we need to express the electron inversion charge
density $Q_I(x)$ in terms of $V$. From Equation (3.8), we have

$$Q_I(x) = -\left(V_{GS} - V_{FB} - \phi_s(x) + \frac{Q_{B0}}{C_{ox}}\right)C_{ox} \tag{3.20}$$

The surface potential $\phi_s$ at any point $x$ along the channel is equal to
$2|\phi_f| + V(x)$. By substituting $V_{T0}$ for $V_{FB} - Q_{B0}/C_{ox} +
2|\phi_f|$ [Equation (3.11)] in Equation (3.20), we get

$$Q_I(x) = -\left[V_{GS} - V_{T0} - V(x)\right]C_{ox} \tag{3.21}$$

The surface potential at the drain is larger than that at the source by $V_{DS}$.
Therefore, the magnitude of $Q_I$ decreases with the distance along the channel.
This is why the inversion layer is triangular, as illustrated in Fig. 3.3.
Assuming that $Q_{B0}$ is constant across the channel and substituting for $Q_I$
from Equation (3.21) into Equation (3.19), we obtain

$$I_{DS} = k_p\frac{W}{L}\left[(V_{GS} - V_{T0})V_{DS} - \frac{V_{DS}^2}{2}\right] \tag{3.24}$$
where $k_p$ is a process-dependent parameter defined as $k_p = \mu C_{ox}$.
Equation (3.24) is valid only for $V_{DS} \leq V_{DS,sat}$ (ohmic region). When
$V_{DS}$ exceeds $V_{DS,sat}$, the drain-source current saturates. The saturation
current can be found by substituting $V_{DS,sat} = V_{GS} - V_{T0}$ for $V_{DS}$
in Equation (3.24) and is hence given by

$$I_{DS,sat} = \frac{k_p}{2}\frac{W}{L}(V_{GS} - V_{T0})^2 \tag{3.25}$$

The characteristics of an MOS transistor based on Equations (3.24) and (3.25)
are shown in Fig. 3.4. The current equations (3.24) and (3.25) have to be
modified if the bulk-source voltage is nonzero, by replacing $V_{T0}$ by $V_T$
[see Equation (3.14)]. Note that when $V_{DS}$ is small (say 60 mV), Equation
(3.24) can be approximated by

$$I_{DS} \approx k_p\frac{W}{L}(V_{GS} - V_{T0})V_{DS} \tag{3.26}$$

This equation expresses a linear relationship between $I_{DS}$ and $V_{GS}$.
Using linear extrapolation, $V_{T0}$ and $k_pW/L$ can be determined as shown in
Fig. 3.4(b).
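The measured I-V characteristics show that the drain current in the saturation
region is a weak function of $V_{DS}$. This is due to the channel length
modulation phenomenon, which can be explained as follows. Let us define

$$L'_{eff} = L_{eff} - \Delta L \tag{3.27}$$

where $\Delta L$ is the width of the depletion layer between the pinch-off point
and the drain, as shown in Fig. 3.5. The voltage across this depletion layer is
$V_{DS} - V_{DS,sat}$; therefore, $\Delta L$ can be written as

$$\Delta L = \sqrt{\frac{2\epsilon_{si}(V_{DS} - V_{DS,sat})}{qN_a}} \tag{3.28}$$

The corrected saturation current becomes

$$I'_{DS,sat} = I_{DS,sat}\,\frac{L_{eff}}{L_{eff} - \Delta L} \tag{3.29}$$

If we assume that $\Delta L/L_{eff} \ll 1$, then we can rewrite the current as

$$I'_{DS,sat} \approx I_{DS,sat}\left(1 + \frac{\Delta L}{L_{eff}}\right) \tag{3.30}$$

The ratio $\Delta L/L_{eff}$ can be related to $V_{DS}$ by the following empirical
relation

$$\frac{\Delta L}{L_{eff}} = \lambda V_{DS} \tag{3.31}$$

The channel-length modulation factor $\lambda$ is very small. A typical value of
$\lambda$ is 0.01 V$^{-1}$.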
The drain current model described so far is known as the LEVEL 1 (MOS1) model in
SPICE⁴. This model is also called the Shichman-Hodges model. However, this model
is too simple⁵ to account for state-of-the-art CMOS devices and might lead to a
100% error in the current, particularly for low-voltage deep-submicrometer CMOS
devices. Nevertheless, $k_p$ (or $\mu$) can be used as a fitting parameter to
reduce this error. This model is most suitable for preliminary analysis.
⁴ SPICE 2G6, 3B1, or 3C1.
⁵ This model was used in the 1970s.
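Equations (3.24), (3.25), (3.30), and (3.31) together form the LEVEL 1 drain-current model. The Python sketch below evaluates it; the parameter values ($k_p = 100$ µA/V², $W/L = 10$, $V_{T0} = 0.7$ V, $\lambda = 0.01$ V$^{-1}$) are hypothetical. Following the text, the $(1 + \lambda V_{DS})$ factor is applied only in saturation, which leaves a small step in $I_{DS}$ at $V_{DS,sat}$; the SPICE LEVEL 1 code applies the factor in both regions to keep the current continuous.

```python
def level1_ids(vgs, vds, vt0, kp, w_over_l, lam=0.0):
    """LEVEL 1 (Shichman-Hodges) NMOS drain current in amperes; assumes vds >= 0."""
    vov = vgs - vt0                      # gate overdrive V_GS - V_T0
    if vov <= 0:
        return 0.0                       # cutoff (subthreshold current ignored)
    if vds <= vov:                       # ohmic region, Eq. (3.24)
        return kp * w_over_l * (vov * vds - vds ** 2 / 2)
    ids_sat = 0.5 * kp * w_over_l * vov ** 2   # Eq. (3.25)
    return ids_sat * (1 + lam * vds)           # Eqs. (3.30)-(3.31)


KP = 100e-6   # A/V^2, hypothetical
WL = 10.0     # W/L, hypothetical
VT0 = 0.7     # V

for vds in (0.05, 1.0, 2.6, 3.3):
    ids = level1_ids(3.3, vds, VT0, KP, WL, lam=0.01)
    print(f"V_DS = {vds:.2f} V -> I_DS = {ids * 1e3:.3f} mA")
```

For very small $V_{DS}$ the result approaches the linear approximation of Equation (3.26), which is what the extrapolation method of Fig. 3.4(b) relies on.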
3.2.2 Semi-Empirical Short-Channel Model (LEVEL 3)

The MOS3 model (or MOS LEVEL 3) has been developed for short- and
narrow-channel MOS devices ($L \leq 2$ µm, $W \leq 2$ µm) [1]. The MOS3 model has
the following features (compared to MOS1):
• A model for mobility degradation with the vertical and the horizontal
electric fields;
• A model for the threshold voltage of short- and narrow-channel devices
(the Drain-Induced Barrier Lowering (DIBL) effect is accounted for);
• An improved model for the channel length modulation phenomenon;
• Weak inversion conduction (subthreshold conduction).
The threshold voltage expression is given by [1]

$$V_T = V_{FB} + 2|\phi_f| - \sigma V_{DS} + \gamma F_s\sqrt{2|\phi_f| + |V_{BS}|} + F_N\left(2|\phi_f| + |V_{BS}|\right) \tag{3.32}$$

$\gamma$ in this expression is given by Equation (3.15). This expression includes:
• The static feedback effect coefficient $\sigma$ (due to the DIBL effect) [2]
(3.33)
where $\eta$ is an empirical coefficient;
• The correction factor for the short-channel effect, $F_s$, which is based on a
modified trapezoidal approach for calculating the charge $Q_B$ [Fig. 3.6]. The
correction factor can be obtained from [3]
where $W_c$, the depletion layer width of a cylindrical junction, is given by

$$\frac{W_c}{x_j} = 0.0631353 + 0.8013292\,\frac{W_D}{x_j} - 0.01110777\left(\frac{W_D}{x_j}\right)^2 \tag{3.35}$$

• The correction factor for narrow-channel MOS, $F_N$, is given by
3.2.2.1 Mobility degradation:
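The mobility degradation due to the vertical electric field is modeled by the
following simple equation [4]

$$\mu_s = \frac{\mu_0}{1 + \theta(V_{GS} - V_T)} \tag{3.37}$$

where $\theta$ is an empirical constant which depends on the oxide thickness. A
typical value of $\theta$ is 0.05 V$^{-1}$. To account for the effect of the
lateral average electric field, the effective mobility is related to the
drain-source voltage and the channel length by [4]

$$\mu_{eff} = \frac{\mu_s}{1 + \dfrac{\mu_s V_{DS}}{v_{max}L_{eff}}} \tag{3.38}$$

where $v_{max}$ is the maximum drift velocity of the carriers. In this
expression, when the device operates in saturation, $V_{DS}$ is replaced by
$V_{DS,sat}$.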
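The two-step mobility reduction of Equations (3.37) and (3.38) is easy to evaluate numerically. The sketch below uses $\mu_0 = 503$ cm²/V·s (UO, N-channel, Table 3.2), the typical $\theta = 0.05$ V$^{-1}$ quoted above, $L_{eff} = 0.8$ µm, $V_T = 0.8$ V, and an assumed $v_{max} = 1.5 \times 10^7$ cm/s; all inputs are illustrative only.

```python
def surface_mobility(mu0, theta, vgs, vt):
    """Eq. (3.37): vertical-field degradation, mu_s = mu0 / (1 + theta (V_GS - V_T))."""
    return mu0 / (1 + theta * max(vgs - vt, 0.0))


def effective_mobility(mu_s, vds, vmax, leff):
    """Eq. (3.38): lateral-field reduction, mu_eff = mu_s / (1 + mu_s V_DS / (v_max L_eff))."""
    return mu_s / (1 + mu_s * vds / (vmax * leff))


MU0 = 503.0      # cm^2/(V s), UO for the N-channel device
THETA = 0.05     # 1/V, typical value from the text
VMAX = 1.5e7     # cm/s, ASSUMED maximum drift velocity
LEFF = 0.8e-4    # cm (0.8 um), assumed effective length
VT = 0.8         # V

for vgs in (1.5, 3.3):
    mu_s = surface_mobility(MU0, THETA, vgs, VT)
    mu_eff = effective_mobility(mu_s, 1.0, VMAX, LEFF)
    print(f"V_GS = {vgs} V: mu_s = {mu_s:.0f}, mu_eff(V_DS = 1 V) = {mu_eff:.0f} cm^2/Vs")
```

With these numbers the lateral-field term removes a larger share of the mobility than the vertical-field term, which is why short-channel models cannot ignore it.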
3.2.2.2 Channel length modulation

When $V_{DS} \geq V_{DS,sat}$, the channel length is reduced by an amount
$\Delta L$. This channel length reduction is formulated in MOS3 by Baum's model
[5]. In this model, the voltage across the depletion region of length $\Delta L$
is modeled by $\kappa(V_{DS} - V_{DS,sat})$, where $\kappa$ is a fitting
parameter.
3.2.2.3 Drain current

In the LEVEL 1 model of SPICE, the drain current in the weak inversion region is
assumed to be zero. The modeling of the subthreshold current in LEVEL 3 is based
on the analysis by Swanson and Meindl [6]. The drain current in weak inversion,
which is basically a diffusion current, is given by

$$I_{DS} = I_{on}\,e^{(V_{GS} - V_{on})/(nV_t)} \tag{3.39}$$

where

$$V_{on} = V_T + nV_t \tag{3.40}$$

and

$$n = 1 + \frac{qN_{FS} + C_D}{C_{ox}} \tag{3.41}$$

where

$$C_D = \frac{dQ_B}{dV_{BS}} \tag{3.42}$$

$V_t = kT/q$ is the thermal voltage and $N_{FS}$ (the surface fast-state density)
is a curve-fitting parameter. $V_{on}$ marks the boundary between the weak and
strong inversion modes. Typical values of $n$ range from 1.0 to 2.5. $I_{on}$ is
the strong-inversion current evaluated at $V_{GS} = V_{on}$.
Fig. 3.7 illustrates the transfer characteristics of the weak-inversion and drift
models. The voltage $V_{on}$ ensures the continuity of the current, but it is
clear from the figure that at $V_{GS} = V_{on}$ a discontinuity exists in the
derivative. Therefore, the MOS3 model is not precise in simulating the
intermediate region where the diffusion and drift currents are comparable.
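Equations (3.39) and (3.40) imply that, below $V_{on}$, the current falls by one decade for every $nV_t\ln 10$ of gate-voltage reduction (the subthreshold swing). The sketch below illustrates this; $I_{on} = 1$ µA, $n = 1.5$, and $V_T = 0.8$ V are hypothetical values.

```python
import math

K_B = 1.381e-23   # Boltzmann constant (J/K)
Q = 1.602e-19     # electron charge (C)


def subthreshold_ids(vgs, vt, n, i_on, temp_k=300.0):
    """Eqs. (3.39)-(3.40): weak-inversion current for V_GS below V_on = V_T + n V_t."""
    vt_th = K_B * temp_k / Q           # thermal voltage V_t = kT/q
    v_on = vt + n * vt_th
    return i_on * math.exp((vgs - v_on) / (n * vt_th))


def subthreshold_swing(n, temp_k=300.0):
    """Gate-voltage change (V) per decade of weak-inversion current: n V_t ln 10."""
    return n * (K_B * temp_k / Q) * math.log(10)


N, VT, I_ON = 1.5, 0.8, 1e-6    # hypothetical values
s = subthreshold_swing(N)
print(f"swing = {s * 1e3:.1f} mV/decade")
for vgs in (0.4, 0.6, 0.8):
    print(f"V_GS = {vgs} V -> I_DS = {subthreshold_ids(vgs, VT, N, I_ON):.3e} A")
```

A larger $n$ (more fast surface states or a larger depletion capacitance) gives a larger swing, i.e., a device that is harder to turn off at a given $V_T$.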
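In strong inversion, the drain current can be expressed as

$$I_{DS} = \mu_{eff}C_{ox}\frac{W_{eff}}{L_{eff}}\int_0^{V_{DS}}\left[V_{GS} - V_T(x) - V\right]dV \tag{3.43}$$

The threshold voltage along the channel is given by

$$V_T(x) = V_T + \gamma F_s\left(\sqrt{2|\phi_f| + |V_{BS}| + V(x)} - \sqrt{2|\phi_f| + |V_{BS}|}\right) + F_N V(x) \tag{3.44}$$

Using a Taylor series expansion, we have

$$V_T(x) + V(x) \approx V_T + (1 + F_B)\,V(x) \tag{3.45}$$

where $F_B$ is the coefficient of the linearized dependence of $V_T(x)$ on
$V(x)$. By substituting for $V_T(x)$ from Equation (3.45) in Equation (3.43) and
integrating, we obtain the following expression for the drain current

$$I_{DS} = \mu_{eff}C_{ox}\frac{W_{eff}}{L_{eff}}\left[V_{GS} - V_T - \frac{1 + F_B}{2}V_{DS}\right]V_{DS} \tag{3.47}$$

The saturation voltage, which takes into account the carrier velocity saturation
effect, is given by

$$V_{DS,sat} = V_{sat} + V_c - \sqrt{V_{sat}^2 + V_c^2} \tag{3.48}$$

where

$$V_{sat} = \frac{V_{GS} - V_T}{1 + F_B} \tag{3.49}$$

$$V_c = \frac{v_{max}L_{eff}}{\mu_s} \tag{3.50}$$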
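Equations (3.48)-(3.50) can be checked numerically: as $v_{max}$ grows, $V_c$ grows and $V_{DS,sat}$ approaches the long-channel value $V_{sat}$. The sketch below uses hypothetical inputs ($F_B = 0.2$, $\mu_s = 450$ cm²/V·s, $L_{eff} = 0.8$ µm, $v_{max} = 1.5 \times 10^7$ cm/s) and, for simplicity, uses $\mu_s$ in place of $\mu_{eff}$ when evaluating Equation (3.47) at $V_{DS,sat}$.

```python
import math


def level3_vdsat(vgs, vt, fb, vmax, leff, mu_s):
    """Eqs. (3.48)-(3.50): saturation voltage including velocity saturation."""
    v_sat = (vgs - vt) / (1 + fb)        # Eq. (3.49), long-channel value
    v_c = vmax * leff / mu_s             # Eq. (3.50): (cm/s)*cm/(cm^2/Vs) = V
    return v_sat + v_c - math.sqrt(v_sat ** 2 + v_c ** 2)


def level3_ids(vgs, vds, vt, fb, mu_eff, cox, w_over_l):
    """Eq. (3.47): strong-inversion drain current (A) for V_DS <= V_DS,sat."""
    return mu_eff * cox * w_over_l * (vgs - vt - 0.5 * (1 + fb) * vds) * vds


VGS, VT, FB = 3.3, 0.8, 0.2
MU_S, LEFF = 450.0, 0.8e-4          # cm^2/Vs, cm (hypothetical)
COX, WL = 1.97e-7, 10.0             # F/cm^2 (about 17.5 nm oxide), W/L (hypothetical)

vdsat = level3_vdsat(VGS, VT, FB, 1.5e7, LEFF, MU_S)
print(f"V_sat = {(VGS - VT) / (1 + FB):.3f} V, V_DS,sat = {vdsat:.3f} V")
# mu_s stands in for mu_eff here (simplification, see lead-in)
print(f"I_DS(V_DS,sat) = {level3_ids(VGS, vdsat, VT, FB, MU_S, COX, WL) * 1e3:.3f} mA")
```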
Table 3.1 shows the correspondence between the CMOS device parameters and the
HSPICE keywords. Typical values for the LEVEL 3 parameters are shown in Table 3.2
for the MOS devices of the 0.8 µm BiCMOS process described in Chapter 2.
The LEVEL 3 model approximates the device physics and relies on the proper
choice of the empirical parameters to accurately reproduce the device
characteristics.
3.2.3 BSIM Model (LEVEL 4)

BSIM (Berkeley Short-Channel IGFET Model) is a simple and accurate short-channel
MOS transistor model [7]. It is implemented in SPICE as LEVEL 4. The model was
tested for effective channel lengths down to 1 µm. This model includes:
• Vertical field dependence of carrier mobility;
• Carrier velocity saturation;
• Drain-induced barrier lowering effect;
• Non-uniform doping effects in the channel surface and sub-surface regions;
• Depletion charge sharing by the drain and source;
• Channel-length modulation;
• Dependence of some electrical parameters on drain and substrate biases;
• Better modeling of the weak-, medium-, and strong-inversion regions and
elimination of the discontinuity problem in the drain current; and
• Geometric dependencies.

Table 3.1 CMOS device parameters and HSPICE correspondence

SPICE Keyword   Parameter Description
LEVEL           Model level
VTO             Zero-bias threshold voltage
TOX             Gate oxide thickness
NSUB            Substrate doping
NFS             Surface fast state density
UO              Surface mobility
VMAX            Maximum drift velocity of carriers
ETA             Static feedback on threshold voltage
KAPPA           Saturation field factor
THETA           Mobility degradation factor
DELTA           Width effect on threshold voltage
XJ              Junction depth
CJ              Zero-bias bulk junction capacitance
JS              Bulk junction saturation current
JSW             Sidewall bulk junction saturation current
MJ              Bulk junction grading coefficient
PB              Junction potential
CJSW            Zero-bias sidewall capacitance
MJSW            Sidewall capacitance grading coefficient
CGDO            Gate-drain overlap capacitance
CGSO            Gate-source overlap capacitance
CGBO            Gate-bulk overlap capacitance
RD              Drain ohmic resistance
RS              Source ohmic resistance
LD              Lateral diffusion from drain or source
WD              Lateral diffusion along the width
XL              Masking and etching effects on L
XW              Masking and etching effects on W
ACM             Area calculation method
LDIF            Lateral diffusion beyond the gate

Table 3.2 HSPICE MOSFET model parameters (LEVEL 3) (0.8 µm BiCMOS process)

SPICE Keyword   N-Channel        P-Channel        Units
LEVEL           3                3
VTO             0.8              -0.9             V
TOX             17.5 × 10^-9     17.5 × 10^-9     m
NSUB            3.23 × 10^16     3.37 × 10^16     cm^-3
NFS             820 × 10^?       764 × 10^?
UO              503              165              cm^2/V·s
VMAX            150 × 10^?       190 × 10^?
ETA             45 × 10^?        121 × 10^?
KAPPA           6.7 × 10^-3      1.45
THETA           63.4 × 10^?      135 × 10^-3
DELTA           0.728            0.336
XJ              275 × 10^-9      230 × 10^?       m
CJ              250 × 10^?       450 × 10^?
JS              5 × 10^-4        5 × 10^-4
JSW             5.5 × 10^?       5.5 × 10^?
MJ              ?                0.50
PB              0.92             0.92             V
CJSW            205 × 10^?       212 × 10^?
MJSW            0.30             0.30
CGDO            274 × 10^?       215 × 10^?
CGSO            274 × 10^-12     215 × 10^?
CGBO            571 × 10^?       571 × 10^?
RD              596              1189
RS              596              1189
LD              59.5 × 10^-9     0                m
WD              0                0
XL              0                0
XW              0                0
ACM             2                2
LDIF            940 × 10^?       1 × 10^-8        m
3.2.3.1 Threshold voltage:

The threshold voltage is given by

$$V_T = V_{FB} + \phi_s + K_1\sqrt{\phi_s + |V_{BS}|} - K_2\left(\phi_s + |V_{BS}|\right) - \eta V_{DS} \tag{3.51}$$

The two parameters $K_1$ and $K_2$ model the effect of non-uniform doping of the
substrate on the threshold voltage. Typical values for $K_1$ and $K_2$ are
1 V$^{1/2}$ and 0.12, respectively. The factor $\eta$ models the DIBL effect and
accounts for the channel-length modulation effect. It is a function of $V_{DS}$
and $V_{BS}$.
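A direct evaluation of Equation (3.51) shows the DIBL term at work: raising $V_{DS}$ lowers $V_T$ by $\eta V_{DS}$. The sketch uses the typical $K_1$ and $K_2$ quoted above; $V_{FB} = -0.9$ V, $\phi_s = 0.75$ V, and $\eta = 0.02$ are hypothetical, and $\eta$ is held constant here although BSIM makes it bias dependent.

```python
import math


def bsim_vt(vfb, phi_s, k1, k2, eta, vbs, vds):
    """Eq. (3.51): BSIM threshold voltage with non-uniform doping (K1, K2) and DIBL (eta)."""
    x = phi_s + abs(vbs)
    return vfb + phi_s + k1 * math.sqrt(x) - k2 * x - eta * vds


VFB, PHI_S, K1, K2, ETA = -0.9, 0.75, 1.0, 0.12, 0.02   # K1, K2 typical; others hypothetical

for vds in (0.1, 3.3):
    print(f"V_DS = {vds} V -> V_T = {bsim_vt(VFB, PHI_S, K1, K2, ETA, 0.0, vds):.3f} V")
```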
3.2.3.2 Drain current.

When $V_{DS} \leq V_{DS,sat}$, we have

$$I_{DS} = \frac{\mu_0}{1 + U_0(V_{GS} - V_T)}\;\frac{C_{ox}\,W/L}{1 + \dfrac{U_1}{L}V_{DS}}\left[(V_{GS} - V_T)V_{DS} - \frac{a}{2}V_{DS}^2\right] \tag{3.52}$$

where

$$a = 1 + \frac{gK_1}{2\sqrt{\phi_s + |V_{BS}|}} \tag{3.53}$$

and

$$g = 1 - \frac{1}{1.744 + 0.836\,(\phi_s + |V_{BS}|)} \tag{3.54}$$

The parameters $U_0 = U_0(V_{BS})$, $U_1 = U_1(V_{BS})$, and
$\mu_0 = \mu_0(V_{DS}, V_{BS})$ are bias sensitive. For $V_{DS} > V_{DS,sat}$, the
drain current is given by
where

$$K = \frac{1 + v_c + \sqrt{1 + 2v_c}}{2} \tag{3.56}$$

and
(3.57)
The drain-source saturation voltage is given by
(3.58)
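The linear-region expression (3.52)-(3.54) can be sketched as follows. All parameter values are hypothetical, and $U_0$, $U_1$, and $\mu_0$ are treated as constants rather than the bias-dependent quantities BSIM uses. With $U_0 = U_1 = 0$ and $a = 1$ the expression reduces to the LEVEL 1 form of Equation (3.24), which serves as a sanity check.

```python
import math


def bsim_body_factor(k1, phi_s, vbs):
    """Eqs. (3.53)-(3.54): bulk-charge factor a (dimensionless)."""
    x = phi_s + abs(vbs)
    g = 1 - 1 / (1.744 + 0.836 * x)
    return 1 + g * k1 / (2 * math.sqrt(x))


def bsim_ids_linear(vgs, vds, vt, mu0, cox, w, l, u0, u1, a):
    """Eq. (3.52): BSIM drain current for V_DS <= V_DS,sat (consistent units assumed)."""
    vov = vgs - vt
    mobility_term = mu0 / (1 + u0 * vov)          # vertical-field degradation
    velocity_term = 1 / (1 + (u1 / l) * vds)      # lateral-field (velocity) degradation
    return mobility_term * cox * (w / l) * velocity_term * (vov * vds - 0.5 * a * vds ** 2)


# Hypothetical values in cm-based units: mu0 [cm^2/Vs], cox [F/cm^2], w, l [cm], u1 [cm/V].
a = bsim_body_factor(k1=1.0, phi_s=0.75, vbs=0.0)
ids = bsim_ids_linear(3.3, 0.5, 0.6, mu0=500.0, cox=1.97e-7, w=10e-4, l=1e-4,
                      u0=0.05, u1=0.01e-4, a=a)
print(f"a = {a:.3f}, I_DS = {ids * 1e3:.3f} mA")
```

The factor $a > 1$ captures the bulk-charge effect: it increases the quadratic term and therefore lowers the current compared with the simple LEVEL 1 expression.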
3.2.3.3 Subthreshold current:

In BSIM, the total drain current is modeled as the linear sum of a
strong-inversion component and a weak-inversion component. The weak-inversion
component is expressed as
(3.59)
and
(3.61)
The factor 1.8 is empirical, chosen to achieve the best fit. The subthreshold
parameter $n$ is a function of $V_{DS}$ and $V_{BS}$.
3.2.3.4 Sensitivity Factors of Model Parameters:

BSIM uses the following formula to account for the sensitivity of each parameter
to the width and length of the channel

$$P = P_0 + \frac{LP_0}{L_{eff}} + \frac{WP_0}{W_{eff}} \tag{3.62}$$

where $P_0$ is an arbitrary parameter, and $LP_0$ and $WP_0$ are the $L$ and $W$
sensitivity factors of $P_0$.
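Equation (3.62) is a simple geometry-scaling rule. The sketch below applies it to a hypothetical parameter ($P_0 = 0.7$, $LP_0 = 0.05$, $WP_0 = -0.02$, with $LP_0$ and $WP_0$ in units of $P_0 \times$ µm) for a few device sizes, lengths being expressed in micrometers.

```python
def scaled_parameter(p0, lp0, wp0, leff_um, weff_um):
    """Eq. (3.62): P = P0 + LP0/L_eff + WP0/W_eff (LP0, WP0 in units of P0 x um)."""
    return p0 + lp0 / leff_um + wp0 / weff_um


P0, LP0, WP0 = 0.7, 0.05, -0.02   # hypothetical values
for leff, weff in ((0.8, 10.0), (2.0, 10.0), (0.8, 2.0)):
    print(f"L = {leff} um, W = {weff} um -> P = {scaled_parameter(P0, LP0, WP0, leff, weff):.4f}")
```

For large devices both correction terms vanish and $P$ tends to $P_0$, so the sensitivity factors matter mainly for the short and narrow devices of interest here.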
Another deep-submicrometer MOSFET model, called BSIM3 [8], has been developed for
circuit simulation. It uses improved threshold voltage, drain current, and
channel-length modulation models. The model is also simple and has a small
number of parameters (≈ 25).
3.2.4 MOS Capacitances

In transient simulation, MOS capacitances are very important for CMOS and BiCMOS
circuit analysis. The MOS capacitances can be divided into two types of lumped
capacitors:
• the depletion capacitors of the bulk-drain and bulk-source pn junctions
($C_{BD}$ and $C_{BS}$) [Fig. 3.8];
• the capacitors associated with the gate ($C_{GS}$, $C_{GD}$, $C_{GB}$,
$C_{GSov}$, $C_{GDov}$, and $C_{GBov}$) [see Fig. 3.8, except for $C_{GBov}$].
3.2.4.1 Junction Depletion Capacitances

The bulk-source and the bulk-drain junctions have a bottom area $A_S$ and $A_D$,
respectively, and a sidewall with a perimeter $P_S$ and $P_D$, respectively.
Both the bottom area and the sidewall contribute to the total depletion
capacitance. The bottom-area capacitance is measured per unit area, while the
sidewall capacitance is measured per unit perimeter. Both of these components are
voltage dependent. As these junctions are normally reverse biased, we will
consider the case when the bulk-source and bulk-drain voltages ($V_{BS}$ and
$V_{BD}$) are less than or equal to $0.5\phi_j$ ($\phi_j$ is the junction
built-in potential).
The total bulk-source and bulk-drain capacitances can be expressed by the
following relations [1]

$$C_{BS} = \frac{C_jA_S}{\left(1 - V_{BS}/\phi_j\right)^{M_j}} + \frac{C_{jsw}P_S}{\left(1 - V_{BS}/\phi_j\right)^{M_{jsw}}} \tag{3.63}$$

$$C_{BD} = \frac{C_jA_D}{\left(1 - V_{BD}/\phi_j\right)^{M_j}} + \frac{C_{jsw}P_D}{\left(1 - V_{BD}/\phi_j\right)^{M_{jsw}}} \tag{3.64}$$

The grading exponents $M_j$ and $M_{jsw}$ are in the order of 0.3-0.5. $C_j$ is
the zero-bias capacitance of the bottom junction per unit area and $C_{jsw}$ is
the zero-bias capacitance per unit perimeter.
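Equations (3.63) and (3.64) are straightforward to evaluate. The sketch below computes $C_{BS}$ for a hypothetical 5 µm × 10 µm source region with a 30 µm perimeter, using assumed values CJ = 2.5 × 10^-4 F/m², CJSW = 2.05 × 10^-10 F/m, MJ = 0.5, MJSW = 0.3, and PB = 0.92 V (SI units). It also enforces the $V \leq 0.5\phi_j$ restriction stated above.

```python
def junction_cap(v, area, perim, cj, cjsw, mj, mjsw, phi_j):
    """Eqs. (3.63)/(3.64): bottom-area plus sidewall depletion capacitance (F).

    v is the bulk-source (or bulk-drain) voltage; negative values mean reverse bias.
    """
    if v > 0.5 * phi_j:
        raise ValueError("model used here only for V <= 0.5*phi_j")
    factor = 1 - v / phi_j
    return cj * area / factor ** mj + cjsw * perim / factor ** mjsw


CJ, CJSW, MJ, MJSW, PB = 2.5e-4, 2.05e-10, 0.5, 0.3, 0.92   # SI units, assumed values
AS, PS = 5e-6 * 10e-6, 30e-6                               # m^2, m (hypothetical geometry)

for vbs in (0.0, -1.0, -3.3):
    c = junction_cap(vbs, AS, PS, CJ, CJSW, MJ, MJSW, PB)
    print(f"V_BS = {vbs:5.1f} V -> C_BS = {c * 1e15:.2f} fF")
```

Increasing the reverse bias widens the depletion region and lowers both components, which is one reason source-bulk reverse bias reduces parasitic loading.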
3.2.4.2 Gate Capacitances

The gate capacitances can be divided into two categories:
• The fixed overlap capacitances: gate-drain ($C_{GDov}$), gate-source
($C_{GSov}$), and gate-bulk ($C_{GBov}$) overlap capacitances. Both $C_{GSov}$
and $C_{GDov}$ exist due to the lateral diffusion of the source and drain under
the gate. They are usually given per unit width as $C_{GSO}$ and $C_{GDO}$. The
total gate-source and gate-drain overlap capacitances are given by:

$$C_{GSov} = C_{GSO}W_{eff} \tag{3.65}$$

$$C_{GDov} = C_{GDO}W_{eff} \tag{3.66}$$

where $C_{GSO}$ and $C_{GDO}$ are equal to $C_{ox}L_D$. The capacitor $C_{GBov}$
is due to the overlap of the gate oxide and the bulk along the channel length at
both ends of the active area of the transistor. This capacitance is typically
normalized to the effective channel length; the total $C_{GBov}$ is hence given
by

$$C_{GBov} = C_{GBO}L_{eff} \tag{3.67}$$
a4 CHAPTER3
. where Ccao is equal to C,,Wd

• The nonlinear capacitance due to the charge of the bulk or the channel. This capacitance is actually distributed but can be modeled by lumped capacitances. When the channel does not exist, the capacitance can be expressed as

$$C_{GB} = C_{ox} W_{eff} L_{eff} \qquad (3.68)$$

When the device is in the linear region, the channel extends uniformly from the source to the drain. The channel shields the bulk, and capacitance exists only between the gate and the channel; the gate-bulk capacitance goes to zero. The gate-channel capacitance can be expressed in terms of two equal lumped capacitances, a gate-source and a gate-drain capacitance, which are denoted $C_{GS}$ and $C_{GD}$ and are given by

$$C_{GS} = C_{GD} = \frac{1}{2} C_{ox} W_{eff} L_{eff} \qquad (3.69)$$

Finally, when the device enters saturation, the channel at the drain pinches off, and hence the gate-drain capacitance component becomes zero, while the gate-source capacitance can be expressed by

$$C_{GS} = \frac{2}{3} C_{ox} W_{eff} L_{eff} \qquad (3.70)$$
Fig. 3.9 depicts the change of the capacitance components as a function of the gate-source voltage (assuming that the source-bulk voltage is zero). The total gate-source capacitance is given by the sum of $C_{GSm}$ and $C_{GS}$, and similarly, the total gate-drain capacitance is given by the sum of $C_{GDm}$ and $C_{GD}$.
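The region-by-region values of Eqs. (3.65)-(3.70) can be collected into one helper. The sketch below is a simplified lookup under stated assumptions: abrupt transitions between regions (the actual curves of Fig. 3.9 vary smoothly), hypothetical parameter values in fF and μm, and an illustrative function name.

```python
def gate_caps(region, cox, w_eff, l_eff, cgso, cgdo, cgbo):
    """Total (C_GS, C_GD, C_GB) = intrinsic part (Eqs. 3.68-3.70) + overlap part (Eqs. 3.65-3.67).

    region: "cutoff", "linear" or "saturation"; cox is per unit area, overlap terms per unit length.
    """
    c_gate = cox * w_eff * l_eff
    intrinsic = {
        "cutoff": (0.0, 0.0, c_gate),                   # no channel: gate couples to the bulk
        "linear": (0.5 * c_gate, 0.5 * c_gate, 0.0),    # channel shields the bulk
        "saturation": (2.0 / 3.0 * c_gate, 0.0, 0.0),   # channel pinched off at the drain
    }
    if region not in intrinsic:
        raise ValueError(f"unknown region: {region}")
    cgs, cgd, cgb = intrinsic[region]
    return cgs + cgso * w_eff, cgd + cgdo * w_eff, cgb + cgbo * l_eff


# Hypothetical device: Cox = 3.45 fF/um^2, W = 10 um, L = 1 um, C_GSO = C_GDO = 0.3 fF/um, C_GBO = 0.1 fF/um.
for r in ("cutoff", "linear", "saturation"):
    print(r, gate_caps(r, 3.45, 10.0, 1.0, 0.3, 0.3, 0.1))
```

Note that in saturation only the overlap term remains in $C_{GD}$, which is why the drain-side Miller coupling is smallest in that region.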
The capacitance model described above can be used for circuit analysis and circuit design. SPICE uses a charge-control model, which was developed by Ward and Dutton [9]. This model is based on the distribution of charge in the MOS structure and on charge conservation.
3.3 CMOS LOW-VOLTAGE ANALYTICAL MODEL

The MOS models discussed previously have been developed for circuit simulators. These models (e.g., BSIM) involve large numbers of parameters whose values must be derived from device measurements. With these models it is difficult to develop an intuitive understanding of the device behavior. Therefore, an analytical drain current model, valid for submicrometer MOSFETs operating at low voltage, is needed for hand calculation and first-order circuit analysis with reasonable accuracy.
3.3.1 Threshold Voltage Definitions

The threshold voltage, $V_T$, has several definitions, which are important for the estimation of the static power dissipation. The first definition is the threshold voltage extrapolated from the $I_{DS}$-$V_{GS}$ characteristic [see Section 3.2.1]. Another one is the constant-current threshold voltage (i.e., the gate voltage at which the drain current is 10 nA per unit width). These voltages do not have the same value [10, 11]. The extrapolated $V_T$ is approximately 0.2 V higher than the constant-current one [11]. The extrapolated threshold voltage should be scaled down proportionally to the supply voltage, because the drive (saturation) current depends on $(V_{DD} - V_{T(extrapolated)})$.
3.3.2 Subthreshold Current

When the threshold voltage is scaled for low-supply-voltage operation, the subthreshold current increases significantly. This current is a limiting factor for battery-operated circuits. As shown in Fig. 3.10, the drain current in the subthreshold region can be modeled by

$$I_{DS,sub} = \frac{W_{eff}}{W_0}\, I_0\, 10^{(V_{GS} - V_T)/S} \qquad (3.71)$$

where $V_T$ here is the constant-current threshold voltage. $I_0$ and $W_0$ are the drain current and the gate width used to define $V_T$. $S$ is the subthreshold swing parameter, which is the gate voltage swing required to reduce the drain current by one decade. The current $I_0$ is related to $V_{DS}$ by

$$I_0 = I_0'\left(1 - e^{-V_{DS}/V_t}\right) \qquad (3.72)$$

where $V_t = kT/q$ is the thermal voltage. The subthreshold swing is given by [12]

$$S \approx 2.3\,\frac{kT}{q}\left(1 + \frac{C_d}{C_{ox}}\right) \ \text{V/decade} \qquad (3.73)$$

where $C_d$ is the depletion-layer capacitance under the gate. Thus, $S$ has a theoretical minimum limit of about 60 mV/decade at room temperature.
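As a quick numerical check of Eq. (3.73), the sketch below evaluates the subthreshold swing for a given temperature and ratio $C_d/C_{ox}$. It uses $\ln 10 \approx 2.303$ in place of the rounded 2.3; the function name and the ratio 0.25 are illustrative assumptions.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
Q_E = 1.602176634e-19  # elementary charge, C


def subthreshold_swing(temp_k: float, cd_over_cox: float = 0.0) -> float:
    """Subthreshold swing S in V/decade: S = ln(10) * (kT/q) * (1 + Cd/Cox)."""
    thermal_voltage = K_B * temp_k / Q_E
    return math.log(10.0) * thermal_voltage * (1.0 + cd_over_cox)


# Ideal limit (Cd/Cox -> 0) at 300 K: about 59.5 mV/decade.
print(f"{subthreshold_swing(300.0) * 1e3:.1f} mV/decade")
# A hypothetical Cd/Cox = 0.25 gives about 74 mV/decade, close to the 75 mV/decade used below.
print(f"{subthreshold_swing(300.0, 0.25) * 1e3:.1f} mV/decade")
```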
The leakage current due to subthreshold conduction is computed from $I_{DS,sub}$ when $V_{GS} = 0$. Then

$$I_{leak} = \frac{W_{eff}}{W_0}\, I_0\, 10^{-V_T/S} \qquad (3.74)$$

Using the examples of Fig. 3.10, typical values for the constant-current and extrapolated threshold voltages are 0.3 V and 0.5 V, respectively. The parameter $S$ is equal to 75 mV/decade and the leakage current is equal to 1 pA/μm.
When estimating the static power dissipation, the worst-case leakage current has to be evaluated. In this case, the worst-case threshold voltage, $V_{Tw}$, has to be used, where

$$V_{Tw} = V_T - \Delta V_T \qquad (3.75)$$

$\Delta V_T$ is the variation of the threshold voltage due to process-parameter fluctuations in the oxide thickness, doping profile, junction depth, gate length and width, etc. $\Delta V_T$ can be as high as 50 mV on the same wafer and 150 mV for different wafers. This results in an increase of almost two decades in leakage current. The temperature effect also has to be considered when the leakage current is computed. The temperature affects both $V_T$ and $S$. A typical value of the temperature coefficient of the threshold voltage is a 1.6 mV decrease per degree Celsius. The subthreshold swing $S$ increases by 0.25 mV/(decade·°C) [see Equation (3.73)]. For example, if the temperature increases from 25 °C to 75 °C, the threshold voltage decreases by 80 mV and the leakage current equals 30 pA/μm (initial extrapolated $V_T$ = 0.5 V). This value is 30 times higher than that at 25 °C. Both the temperature and process effects can result in a drastic increase of the worst-case static power dissipation. Note that this variation of $V_T$ greatly affects the delay of CMOS circuits at low supply voltage, since the drive current is proportional to $(V_{DD} - V_T)$.
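The leakage figures above (1 pA/μm at 25 °C, about 30 pA/μm at 75 °C, and two decades for a 150 mV shift) follow directly from Eq. (3.74). The sketch below reproduces them, assuming $W_{eff} = W_0 = 1$ μm, $I_0$ = 10 nA/μm (the constant-current definition), and linear temperature coefficients; the function names are illustrative.

```python
I0_PER_UM = 10e-9  # constant-current V_T criterion: 10 nA per um of width


def leakage_per_um(vt_cc: float, s: float) -> float:
    """Eq. (3.74) per micrometer of width: I_leak = I0 * 10**(-V_T / S)."""
    return I0_PER_UM * 10.0 ** (-vt_cc / s)


def shift_with_temperature(vt25, s25, temp_c, dvt_dt=-1.6e-3, ds_dt=0.25e-3):
    """Linear shift of V_T (V per degC) and S (V/decade per degC) from their 25 degC values."""
    dt = temp_c - 25.0
    return vt25 + dvt_dt * dt, s25 + ds_dt * dt


leak_25 = leakage_per_um(0.30, 0.075)                                 # about 1 pA/um
leak_75 = leakage_per_um(*shift_with_temperature(0.30, 0.075, 75.0))  # about 30 pA/um
leak_worst = leakage_per_um(0.30 - 0.15, 0.075)                       # Delta V_T = 150 mV
print(leak_25, leak_75, leak_worst / leak_25)
```

At 75 °C the sketch uses $V_T$ = 0.22 V and $S$ = 87.5 mV/decade, which gives about 30.6 pA/μm.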
3.3.3 Low-Voltage Drain Current

Part of this model is based on the one proposed by [13]. For long-channel devices, the carrier drift velocity $v$ is related to the horizontal electric field $E$ by a simple linear relation ($v = \mu E$), where the carrier mobility is constant. For short-channel devices, the mobility is no longer constant; it is a function of the vertical electric field in the inversion layer. At this point we prefer to use the symbol $\mu_{eff}$ for the mobility to denote its dependence on the vertical electric field. Also, the velocity ($v$) is no longer proportional to $E$ but is given by the following two-region piecewise empirical model [14]

$$v = \begin{cases} \dfrac{\mu_{eff} E}{1 + E/E_c} & E \le E_c \\[4pt] v_{sat} & E \ge E_c \end{cases} \qquad (3.76)$$

where

$$E_c = \frac{2 v_{sat}}{\mu_{eff}} \qquad (3.77)$$

and the saturation velocity $v_{sat}$ is equal to $8 \times 10^6$ cm/s for electrons (NMOS device) and $6.5 \times 10^6$ cm/s for holes (PMOS device).
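The drain current in the triode region ($V_{DS} \le V_{DS,sat}$) is given by [13]

$$I_{DS} = \mu_{eff} C_{ox} \frac{W_{eff}}{L_{eff}} \frac{(V_{GS} - V_T - V_{DS}/2)\, V_{DS}}{1 + V_{DS}/(E_c L_{eff})} \qquad (3.78)$$

The saturation current can be expressed by

$$I_{DS,sat} = v_{sat} C_{ox} W_{eff} (V_{GS} - V_T - V_{DS,sat}) \qquad (3.79)$$

By equating (3.78) and (3.79) at $V_{DS} = V_{DS,sat}$, we can derive the following expression for $V_{DS,sat}$

$$V_{DS,sat} = (1 - K)(V_{GS} - V_T) \qquad (3.80)$$

where

$$K = \frac{V_{GS} - V_T}{V_{GS} - V_T + E_c L_{eff}} \qquad (3.81)$$

The drain current in saturation can be rewritten as

$$I_{DS,sat} = K v_{sat} C_{ox} W_{eff} (V_{GS} - V_T) \qquad (3.82)$$

Note that $V_T$ in the current equations is the extrapolated threshold voltage.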
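The triode and saturation expressions can be combined into a single drain-current function, with $E_c = 2v_{sat}/\mu_{eff}$ and $K = (V_{GS} - V_T)/(V_{GS} - V_T + E_c L_{eff})$. The sketch below assumes centimeter-based units, a constant $\mu_{eff}$ (the bias dependence of the mobility is ignored), hypothetical device values, and zero current below threshold (subthreshold conduction is not modeled). It also checks that the two regions meet at $V_{DS,sat}$.

```python
def ids_short_channel(vgs, vds, vt, mu_eff, cox, w, l, vsat):
    """Drain current (A) from Eqs. (3.77)-(3.82).

    Units: cm, F/cm^2, cm^2/(V s), cm/s. Returns 0 below threshold (subthreshold ignored).
    """
    vgt = vgs - vt
    if vgt <= 0.0:
        return 0.0
    ec_l = (2.0 * vsat / mu_eff) * l        # E_c * L, with E_c = 2 v_sat / mu_eff
    k = vgt / (vgt + ec_l)                  # Eq. (3.81)
    vds_sat = (1.0 - k) * vgt               # Eq. (3.80)
    if vds <= vds_sat:                      # triode region, Eq. (3.78)
        return mu_eff * cox * (w / l) * (vgt - vds / 2.0) * vds / (1.0 + vds / ec_l)
    return k * vsat * cox * w * vgt         # saturation region, Eq. (3.82)


# Hypothetical NMOS: L = 0.5 um, W = 10 um, t_ox = 10 nm (Cox ~ 3.45e-7 F/cm^2), mu_eff = 300.
dev = dict(vt=0.5, mu_eff=300.0, cox=3.45e-7, w=10e-4, l=0.5e-4, vsat=8e6)
for vds in (0.5, 1.0, 2.0, 3.3):
    print(vds, ids_short_channel(3.3, vds, **dev))
```

Here $E_c L_{eff} \approx 2.7$ V, so $V_{DS,sat}$ is about 1.37 V at $V_{GS}$ = 3.3 V, well below the long-channel value of 2.8 V.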
The mobility $\mu_{eff}$ for electrons can be expressed as [15]

$$\mu_n = 240\sqrt{\frac{0.06\, t_{ox}}{V_{GS} + V_T}} \quad \text{for N}^+\text{ poly-gate} \qquad (3.83)$$

and for holes

$$\mu_p = \begin{cases} 65\,[0.06\, t_{ox}/(V_{GS} - V_T)]^{\dots} & \text{for P}^+\text{ poly-gate} \\ 65\,[0.06\, t_{ox}/(V_{GS} - V_T - 1)]^{\dots} & \text{for N}^+\text{ poly-gate} \end{cases} \qquad (3.84)$$

where $t_{ox}$ is in Å and the mobility is in cm²/(V·s). This analytical model can be used for gate lengths down to the deep-submicrometer range.
3.4 CMOS POWER SUPPLY VOLTAGE SCALING

Scaling the device feature size has been used to increase packing density and speed. MOSFET scaling can follow three theories:
1. Constant Electric Field (CE) scaling [16].
2. Constant Voltage (CV) scaling [17].
3. Quasi-Constant Voltage (QCV) scaling [17].
[Table 3.3: scaling of device parameters (dimensions, gate oxide, doping, voltage, capacitance, current, gate delay, dynamic power, dynamic energy) under the CE, CV and QCV schemes]
In the CE scheme, all horizontal and vertical dimensions and voltages scale linearly by the same factor. In the CV scheme, the dimensions are scaled while the voltages are kept constant. This scenario has been the most commonly used. While constant electric field scaling is natural from the device physics point of view, constant voltage scaling is more practical from the systems standpoint. Changing the supply voltage every technology generation (when the feature sizes are scaled) is too expensive, because multiple power supply generators would be required for each PC board. However, as the channel length scales below about 0.6 μm, the 5 V supply voltage must be reduced for reliability reasons (e.g., hot-carrier effects, breakdown, etc.). Quasi-constant voltage scaling is an intermediate scheme between the CE and CV views. The scaling factors of the horizontal dimensions and of the voltage are denoted by $k_h$ and $k_v$, respectively. Table 3.3 summarizes the scaling of the important device parameters according to the three theories as a function of the horizontal scaling factor ($k_h$). Note that in the QCV scheme, the dimensions scale more aggressively than the voltage ($k_v = k_h^{0.5}$).
For the drain current, the following average value is used

$$I_{DS} \propto \frac{W}{L} C_{ox} (V_{GS} - V_T)^{1.5} \qquad (3.85)$$

This expression is not far from the one proposed by [El]. Table 3.3 shows the effect of device scaling on the delay, power and energy. It is assumed that a gate drives other gates, so that the load is mainly the gate capacitance. The threshold voltage is scaled proportionally to $V_{DD}$. The gate delays improve with scaling in all the scenarios, but at a better rate in the CV scheme. However, the dynamic power of the gate at maximal frequency increases by a factor $k_h$ in the case of CV. For the CE scheme, the power is reduced by a large factor: it scales as $k_h^{-1.5}$. The table also reports the dynamic energy dissipated by a gate, which is independent of frequency. For all schemes it improves significantly, particularly for the CE case.
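The delay, power and energy trends quoted above can be re-derived from Eq. (3.85) by tracking exponents of $k_h$. The sketch below assumes that every dimension, including $t_{ox}$, is divided by $k_h$ in all three schemes, that $V_T$ tracks $V_{DD}$, and, for QCV, that voltages scale as $k_h^{-0.5}$. These are assumptions of this sketch, not values read from Table 3.3.

```python
def scaling_exponents(dim_exp: float, volt_exp: float) -> dict:
    """Exponents of k_h for a gate whose load is other gates' input capacitance.

    dim_exp:  exponent applied to all dimensions, including t_ox (-1 means divided by k_h)
    volt_exp: exponent applied to V_DD and V_T (V_T tracks V_DD)
    Drain current follows Eq. (3.85): I ~ (W/L) * Cox * (V_GS - V_T)**1.5.
    """
    cox = -dim_exp                        # Cox ~ 1 / t_ox
    cap = cox + 2.0 * dim_exp             # C_gate = Cox * W * L
    current = cox + 1.5 * volt_exp        # W/L is unchanged
    delay = cap + volt_exp - current      # tau ~ C * V / I
    power = cap + 2.0 * volt_exp - delay  # P = C * V**2 / tau at maximum frequency
    energy = cap + 2.0 * volt_exp         # E = C * V**2
    return {"C": cap, "I": current, "delay": delay, "power": power, "energy": energy}


for name, v_exp in [("CE", -1.0), ("CV", 0.0), ("QCV", -0.5)]:
    print(name, scaling_exponents(-1.0, v_exp))
```

Under these assumptions the delay exponents are -1.5 (CE), -2 (CV) and -1.75 (QCV), consistent with CV giving the fastest gates, while CV is the only scheme whose dynamic power grows.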
Scaling the supply voltage is an efficient way to reduce the power consumption. However, to get better performance at low voltage, the device sizes and the threshold voltage have to be properly scaled. For a fixed submicrometer technology, the supply voltage cannot be reduced aggressively; otherwise the speed is degraded. For each technology generation there is a lower-limit power supply voltage $V_{DD,min}$ [18]. For $V_{DD}$ values higher than this minimum limit, the speed does not improve significantly. Typical values for $V_{DD,min}$ are 3.3 V and 2.5 V for $L_{eff}$ of 0.5 μm and 0.3 μm, respectively. On the other hand, the upper limit of $V_{DD}$ is driven by reliability and power dissipation limitations. This upper limit is proportional to the square root of the design rule ($\delta$) [19]. For 0.6 μm and 0.3 μm design rules with an LDD structure, these upper limits are 4.5 V and 3.3 V, respectively.
3.5 MODELING OF THE BIPOLAR TRANSISTOR
3.5.1 BJT Structure and Operation

Fig. 3.11 shows a cross-sectional view of an NPN bipolar junction transistor with its geometrical layout and the corresponding symbols for NPN and PNP. To understand the basic operation of the bipolar transistor, a one-dimensional representation of the active region can be used. Fig. 3.12(a) illustrates a typical profile of the one-dimensional section of the active region [Fig. 3.12(b)]. The N+PN- sandwich forms the heart of the BJT.
Consider an NPN transistor with $V_{BE}$ > 0.5 V and $V_{BC}$ < 0 V (forward-active mode). The corresponding energy band diagram is shown in Fig. 3.12(c). When the N+P (emitter-base) junction is forward biased, electrons are injected from the emitter into the base (current $I_{nE}$). A small fraction of these electrons recombine in the neutral base ($I_{rB}$). The rest of the electrons, which constitute the current $I_{nC}$, diffuse through the base towards the reverse-biased base-collector junction, where they are swept by the electric field into the base-collector depletion layer. On the other hand, some of the holes in the base are injected into the N+ emitter region, resulting in a current $I_{pE}$. This component is small compared to $I_{nE}$ because the hole concentration in the base is much smaller than the electron concentration in the emitter. The emitter-base depletion layer can be a site for recombination between the injected electrons and holes, resulting in a current $I_{rg}$. Moreover, some holes are swept into the base due to generation in the base-collector depletion region, but this component is very small (≈ $10^{-17}$ A/μm²). The terminal currents can be written as follows

$$I_C = I_{nC} \qquad (3.86)$$
$$I_B = I_{rB} + I_{rg} + I_{pE} \qquad (3.87)$$
$$I_E = I_{nE} + I_{rg} + I_{pE} \qquad (3.88)$$

Note that it has been assumed that the base and collector currents are flowing into the device, while the emitter current is flowing out of it [Fig. 3.12]. The emitter injection efficiency, which is defined as the ratio of the electron current injected into the base to the total emitter current, is given by

$$\gamma = \frac{I_{nE}}{I_E} \qquad (3.89)$$
This ratio has to be near unity; that is, the emitter current of an NPN transistor should be due mostly to electrons. The ratio

$$\beta = \frac{I_C}{I_B} \qquad (3.90)$$

is defined as the DC current gain.

When the emitter-base junction is reverse biased and the collector-base junction is forward biased, the transistor is in the inverse region, where the roles of the emitter and collector are exchanged. When both junctions are reverse biased, the transistor is in the cutoff region. When both are forward biased, the device is said to be in the saturation region. In this situation, both junctions inject into the base, and the small electric fields in the two depletion regions sweep the carriers into the emitter and collector regions. Both junctions collect as well as emit.
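The four operating regions map directly onto the bias state of the two junctions. The sketch below classifies an NPN device from $V_{BE}$ and $V_{BC}$. It assumes a hypothetical turn-on voltage `v_on` = 0.5 V for deciding whether a junction counts as forward biased, which is a simplification (the text's forward-active example uses $V_{BC}$ < 0 V).

```python
def bjt_region(v_be: float, v_bc: float, v_on: float = 0.5) -> str:
    """Classify an NPN operating region from its two junction voltages.

    v_on is an assumed turn-on voltage: a junction counts as forward biased when V >= v_on.
    """
    be_forward = v_be >= v_on
    bc_forward = v_bc >= v_on
    if be_forward and bc_forward:
        return "saturation"
    if be_forward:
        return "forward-active"
    if bc_forward:
        return "inverse-active"
    return "cutoff"


print(bjt_region(0.8, -2.0))  # forward-active
print(bjt_region(0.8, 0.7))   # saturation
```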
3.5.2 Ebers-Moll Model

In this section, we present the Ebers-Moll (EM) model, which is a simple DC model of the bipolar transistor. The Ebers-Moll model can be used for hand calculations and first-order circuit analysis. The derivation of the model equations in this section is based on the analysis by Roulston [20]. In Section 3.5.1, we discussed the device operation in the forward-active region only. For a general analysis, we assume that both the base-emitter and the base-collector junctions are forward biased. In the following discussion we neglect the currents due to recombination in the space-charge layers and in the base. This implies that $I_{nE} = I_{nC}$; hence, Equation (3.88) reduces to

$$I_E = I_{nC} + I_{pE} \qquad (3.91)$$
The current due to the holes injected from the base into the emitter is given by [20]

$$I_{pE} = \frac{q A_E D_{pE}\, p_{nE0}}{W_E}\left(e^{V_{BE}/V_t} - 1\right) \qquad (3.92)$$

where $p_{nE0}$ is the equilibrium hole concentration in the emitter and $W_E$ is the neutral emitter width. The current $I_{nC}$ is dominated by the diffusion current in the base and is proportional to the gradient of the minority carriers (electrons) in the neutral base. Because the neutral base ($W_B$) is very thin, this gradient is approximately constant. Therefore, we can write $I_{nC}$ as [20]

$$I_{nC} = q A_E D_{nB}\left[\frac{n_B(0) - n_B(W_B)}{W_B}\right] \qquad (3.93)$$

where $n_B(0)$ and $n_B(W_B)$ are the electron concentrations at the edges of the emitter-base and collector-base depletion regions, respectively [see Fig. 3.13]. Note that the slope of the electron profile in the base is given by the term between the brackets, as demonstrated by Fig. 3.13.
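Using the junction law, the electron concentrations $n_B(0)$ and $n_B(W_B)$ can be expressed in terms of $V_{BE}$ and $V_{BC}$, respectively. The current $I_{nC}$ can hence be given by [20]

$$I_{nC} = \frac{q A_E D_{nB}\, n_i^2}{N_B W_B}\left(e^{V_{BE}/V_t} - e^{V_{BC}/V_t}\right) \qquad (3.94)$$

where $N_B$ is the base impurity concentration and $n_i$ is the intrinsic carrier concentration.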
The collector current is given by

$$I_C = I_{nC} - I_{pC} \qquad (3.95)$$

The current $I_{pC}$ is due to the holes injected from the base into the collector.⁸

⁸ Note that $I_{pC}$ was not included in Equation (3.88) because in deriving Equation (3.86) we assumed that the collector-base junction was reverse biased.

The base-collector junction is basically a P+NN+ structure, as shown in Fig. 3.12(a). An expression for $I_{pC}$ can be derived from the analysis of a P+NN+ diode. The reader is advised to consult reference [20] for the details of this analysis. The current $I_{pC}$ is given by
where $p_{nC0}$ is the equilibrium hole concentration in the collector, $W_C$ is the epitaxial thickness under the base and $\tau_p$ is the hole lifetime in the epitaxial layer. By substituting Equations (3.92) and (3.94) in Equation (3.91), and Equations (3.94) and (3.96) in Equation (3.95), we get the following equations for $I_E$ and $I_C$

$$I_E = I_F - \alpha_r I_R \qquad (3.97)$$
$$I_C = \alpha_f I_F - I_R \qquad (3.98)$$
Equations (3.97) and (3.98) are called the Ebers-Moll equations. Fig. 3.14 shows the equivalent circuit of the BJT based on the Ebers-Moll equations. The Ebers-Moll model described above is general and can be used for any region of operation by substituting the appropriate values for $V_{BE}$ and $V_{BC}$. In the forward-active region, assuming that $V_{BE}$ = 0.8 V and $V_{BC}$ < 0.3 V, the emitter and collector currents of Equations (3.97) and (3.98) reduce to

$$I_E \approx I_F \approx I_{ES}\, e^{V_{BE}/V_t} \qquad (3.102)$$
$$I_C \approx \alpha_f I_{ES}\, e^{V_{BE}/V_t} \qquad (3.103)$$

where the reverse saturation current of the base-emitter junction, $I_{ES}$, can be derived from Equation (3.99) and is given by
Figure 3.14 Equivalent DC circuit of the BJT based on the Ebers-Moll model
It can easily be shown that the base current can be expressed as

$$I_B = \frac{1 - \alpha_f}{\alpha_f}\, I_C \qquad (3.105)$$

Equations (3.102), (3.103) and (3.105) are the well-known current equations of a forward-biased bipolar transistor. Note that Equation (3.105) yields the famous relation between $\alpha_f$ and the DC forward current gain $\beta_f$: $\beta_f = \alpha_f/(1 - \alpha_f)$.
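The Ebers-Moll equations can be evaluated for any bias point once the diode currents are specified. The sketch below assumes the standard forms $I_F = I_{ES}(e^{V_{BE}/V_t} - 1)$ and $I_R = I_{CS}(e^{V_{BC}/V_t} - 1)$ together with the reciprocity relation $\alpha_f I_{ES} = \alpha_r I_{CS}$; the parameter values and the function name are hypothetical. In the forward-active region the ratio $I_C/I_B$ comes out at $\alpha_f/(1 - \alpha_f)$.

```python
import math

VT_THERMAL = 0.02585  # kT/q near 300 K (V)


def ebers_moll(v_be, v_bc, i_es, alpha_f, alpha_r):
    """Terminal currents (I_E out of the emitter; I_C, I_B into the device), Eqs. (3.97)-(3.98)."""
    i_cs = alpha_f * i_es / alpha_r                   # reciprocity: alpha_f*I_ES = alpha_r*I_CS
    i_f = i_es * (math.exp(v_be / VT_THERMAL) - 1.0)  # forward diode current
    i_r = i_cs * (math.exp(v_bc / VT_THERMAL) - 1.0)  # reverse diode current
    i_e = i_f - alpha_r * i_r                         # Eq. (3.97)
    i_c = alpha_f * i_f - i_r                         # Eq. (3.98)
    i_b = i_e - i_c                                   # KCL: I_E = I_C + I_B
    return i_e, i_c, i_b


i_e, i_c, i_b = ebers_moll(0.7, -2.0, 1e-16, 0.99, 0.5)
print(i_c / i_b)  # close to alpha_f / (1 - alpha_f) = 99 in the forward-active region
```

Because both junction voltages are inputs, the same function also covers saturation and the inverse region without any case analysis.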
The simple Ebers-Moll model lacks accuracy for the following three reasons:
1. It does not account for the parasitic resistors of the emitter, base and collector.
2. It does not account for the Early effect, which causes the collector current to increase as the collector-emitter voltage increases.
3. It does not account for the effect of high collector currents on the current gain.

Next, we discuss the modeling of each phenomenon separately.
3.5.2.1 The Parasitic Resistors of a Bipolar Transistor

Fig. 3.15 shows the modification of the EM model by the addition of the base resistance $R_B$, the collector resistance $R_C$ and the emitter resistance $R_E$. These extrinsic components represent the transistor's parasitic resistances from the active region to the base, collector and emitter terminals, respectively.

The effect of the parasitic resistances is important because the voltage drops across them contribute to the external base-emitter and collector-emitter voltages $V_{B'E'}$ and $V_{C'E'}$, respectively, as shown by the following two equations

$$V_{B'E'} = V_{BE} + R_B I_B + R_E I_E \qquad (3.106)$$
$$V_{C'E'} = V_{CE} + R_C I_C + R_E I_E \qquad (3.107)$$

The drops across the parasitic resistors have to be accounted for to get more accurate results from the EM model. Neglecting these drops may even lead to erroneous results. For example, if the external collector-emitter voltage is found to be 2 V, one may deduce that the BJT operates in the active region. However, if $R_C$ = 1.8 kΩ, $R_E$ = 0.1 kΩ and $I_C \approx I_E$ = 1 mA, then the intrinsic collector-emitter voltage ($V_{CE}$) is only 0.1 V. This implies that the bipolar transistor is actually saturated. This phenomenon is known as quasi-saturation.
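The quasi-saturation example is a direct inversion of Eq. (3.107). The sketch below recomputes the intrinsic $V_{CE}$ from the terminal value, using the example's numbers; the function name is illustrative.

```python
def intrinsic_vce(vce_ext, ic, ie, rc, re):
    """Intrinsic V_CE obtained from the terminal value by inverting Eq. (3.107)."""
    return vce_ext - rc * ic - re * ie


v_int = intrinsic_vce(2.0, 1e-3, 1e-3, 1.8e3, 0.1e3)
print(f"intrinsic V_CE = {v_int:.2f} V")  # 0.10 V: the device is quasi-saturated
```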
3.5.2.2 The Early Effect

The Early effect refers to the base width modulation due to a change of the collector-base reverse voltage (in the forward-active region). As the collector-base reverse voltage increases, the base-collector depletion layer widens. The resulting reduction in the neutral base width causes the current gain to increase, which in turn leads to an increase in the collector current [see Fig. 3.16]. This effect can be modeled by introducing the Early voltage ($V_{Af}$) in the expression of the collector current as follows

$$I_C = \alpha_f I_{ES}\, e^{V_{BE}/V_t}\left(1 + \frac{V_{CE}}{V_{Af}}\right) \qquad (3.108)$$

The inverse of the forward Early voltage, $1/V_{Af}$, is analogous to the coefficient $\lambda$ of an MOS transistor. A typical value of $V_{Af}$ is 50 V. The AC output resistance of the BJT in the forward-active region is related to the Early voltage and is given by

$$r_o \approx \frac{V_{Af}}{I_C} \qquad (3.109)$$

The Early effect in the inverse active region can be modeled by using the reverse Early voltage ($V_{Ar}$), which characterizes the slope of the collector current in that region.
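The output resistance follows from the slope of the collector current versus $V_{CE}$. The sketch below models the forward-active collector current as $I_S e^{V_{BE}/V_t}(1 + V_{CE}/V_{Af})$, with $I_S$ standing for $\alpha_f I_{ES}$ and hypothetical values, and confirms that the slope gives $r_o = V_{Af}/I_{C0}$, where $I_{C0}$ is the current at $V_{CE} = 0$. Eq. (3.109) uses $I_C \approx I_{C0}$, which holds when $V_{CE} \ll V_{Af}$.

```python
import math

VT_THERMAL = 0.02585  # kT/q near 300 K (V)


def collector_current(v_be, v_ce, i_s, v_af):
    """Forward-active I_C with the Early effect: I_S * exp(V_BE/V_t) * (1 + V_CE/V_Af)."""
    return i_s * math.exp(v_be / VT_THERMAL) * (1.0 + v_ce / v_af)


def output_resistance(i_c, v_af):
    """Small-signal output resistance r_o ~ V_Af / I_C, Eq. (3.109)."""
    return v_af / i_c


print(output_resistance(1e-3, 50.0))  # about 50 kOhm at 1 mA with V_Af = 50 V
```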
3.5.2.3 High Current Effects

The current gain and the cutoff frequency are degraded at high collector currents. Fig. 3.17 shows the effect of the collector current on the gain. This degradation can be attributed to high-level injection in the base (Webster effect) and/or base pushout (Kirk effect). For a detailed discussion of these phenomena, the reader is advised to consult reference [20]. When the injection level in the base is high (Webster effect), the collector current can be expressed as [21]

$$I_C = \sqrt{I_S I_{KF}}\; e^{V_{BE}/2V_t} \qquad (3.110)$$

where $I_S$ is the collector saturation current and the forward knee current $I_{KF}$ is defined as the collector current at which the slope of $I_C$ in the Gummel plot changes from 1 to 1/2 [see Fig. 3.18]. This current marks the onset of high-level injection. The degradation of the current gain when $I_C > I_{KF}$ can be described by the following relation [20]

$$\beta = \frac{I_C}{I_B} = \beta_0\, \frac{I_{KF}}{I_C} \qquad (3.111)$$

where $\beta_0$ is the value of the gain when $I_C < I_{KF}$. The modeling of the Kirk effect is very complex. However, a simple model for the current gain, which can be used in first-order circuit analysis, is given below [21]

(3.112)

Figure 3.18 The I-V characteristics of a BJT

The accuracy of the simple EM model can be enhanced by accounting for the parasitic resistors, the Early effect and the high-current effects, which can be modeled by simple analytical expressions as shown above.
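The knee-current definition can be checked numerically. The low-injection current $I_S e^{V_{BE}/V_t}$ and the high-injection current of Eq. (3.110) cross exactly where both equal $I_{KF}$, and the gain of Eq. (3.111) falls as $1/I_C$ beyond that point. The sketch below assumes hypothetical values of $I_S$, $I_{KF}$ and $\beta_0$, and treats Eq. (3.111) as a piecewise model.

```python
import math

VT_THERMAL = 0.02585  # kT/q near 300 K (V)


def ic_low_injection(v_be, i_s):
    """Ideal collector current below the knee: I_S * exp(V_BE / V_t)."""
    return i_s * math.exp(v_be / VT_THERMAL)


def ic_high_injection(v_be, i_s, i_kf):
    """Eq. (3.110): I_C = sqrt(I_S * I_KF) * exp(V_BE / (2 V_t))."""
    return math.sqrt(i_s * i_kf) * math.exp(v_be / (2.0 * VT_THERMAL))


def current_gain(i_c, beta0, i_kf):
    """Piecewise form of Eq. (3.111): beta0 below I_KF, beta0 * I_KF / I_C above it."""
    return beta0 if i_c <= i_kf else beta0 * i_kf / i_c


# Hypothetical device: I_S = 1e-16 A, I_KF = 5 mA, beta0 = 100.
v_knee = VT_THERMAL * math.log(5e-3 / 1e-16)  # V_BE where the two branches meet
print(ic_low_injection(v_knee, 1e-16), ic_high_injection(v_knee, 1e-16, 5e-3))
print(current_gain(1e-3, 100.0, 5e-3), current_gain(1e-2, 100.0, 5e-3))  # 100 and about 50
```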
3.5.3 Bipolar Models in SPICE

Two BJT models are implemented in SPICE: the Ebers-Moll model and a more sophisticated one based on the Gummel-Poon (GP) model [22]. The second model includes the following second-order effects:

• Very-low-current effect on the gain.
• Base width modulation effect.
• High-level injection effects (the Kirk effect is not included).
• Base resistance variation with current.
The GP model is based on one-dimensional analysis. It is valid for all regions of operation: cutoff, forward-active, inverse-active and saturation. The GP-based bipolar model is illustrated by the equivalent circuit shown in Fig. 3.19.

* A typical value of $I_{KF}$ per unit emitter area is 1 mA/μm².
Figure 3.19 The GP-based model of a bipolar transistor

The two back-to-back diodes on the right represent the intrinsic base-emitter and base-collector junctions, and their currents are given by [23]

$$I_{CC} = \frac{I_S}{q_b}\left(e^{V_{BE}/n_f V_t} - 1\right) \qquad (3.113)$$
$$I_{EC} = \frac{I_S}{q_b}\left(e^{V_{BC}/n_r V_t} - 1\right) \qquad (3.114)$$

where $I_S$ is given by [23]

(3.115)

The forward and reverse current emission coefficients ($n_f$ and $n_r$), which are introduced in Equations (3.113) and (3.114), are used to model the low currents. The parameter $q_b$ (base charge factor) accounts for the high-current and base width modulation effects. It is given by [23]

$$q_b = \frac{q_1}{2} + \sqrt{\frac{q_1^2}{4} + q_2} \qquad (3.116)$$

$q_1$ models the effects of base width modulation and can be expressed as

The general expression of $q_b$ [Equation (3.116)] can be simplified for low-level and high-level injection conditions:

$$q_b \approx \begin{cases} q_1 & \text{if } q_2 \ll q_1^2/4 \ \text{(low-level injection)} \\ \sqrt{q_2} & \text{if } q_2 \gg q_1^2/4 \ \text{(high-level injection)} \end{cases} \qquad (3.119)$$
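Taking the base charge factor in the form $q_b = q_1/2 + \sqrt{q_1^2/4 + q_2}$, the two injection limits can be checked numerically. The sketch below assumes $q_1 = 1$ (no base width modulation) and hypothetical values of $q_2$; the function name is illustrative.

```python
import math


def base_charge_factor(q1: float, q2: float) -> float:
    """GP base charge factor: q_b = q1/2 + sqrt(q1**2/4 + q2)."""
    return q1 / 2.0 + math.sqrt(q1 * q1 / 4.0 + q2)


print(base_charge_factor(1.0, 1e-6))  # about 1: low-level injection, q_b ~ q1
print(base_charge_factor(1.0, 1e4))   # about 100.5: high-level injection, q_b ~ sqrt(q2)
```

Since the transport currents are divided by $q_b$, the high-level limit $q_b \approx \sqrt{q_2}$ is what reduces the slope of $I_C$ in the Gummel plot at high currents.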
The two back-to-back diodes on the left [Fig. 3.19] account for the currents caused by the recombination of carriers in the emitter-base and collector-base space-charge layers and by other recombination mechanisms. These currents can be modeled by [23]

$$I_{LE} = C_2 I_S\left(e^{V_{BE}/n_e V_t} - 1\right) \qquad (3.120)$$
$$I_{LC} = C_4 I_S\left(e^{V_{BC}/n_c V_t} - 1\right) \qquad (3.121)$$

where $C_2$, $C_4$, $n_e$ and $n_c$ have been introduced to fit the measured currents. Further improvements to this model are possible by including three parasitic resistances ($R_C$, $R_E$, $R_B$), three junction capacitances ($C_E$, $C_C$, $C_S$) and two diffusion capacitances ($C_{DE}$, $C_{DC}$), as shown in Fig. 3.19.
The model of the base resistance takes into account the effect of the current (current crowding) through the following expression [24]

$$R_B(I) = R_{BM} + 3\,(R_B - R_{BM})\,\frac{\tan z - z}{z \tan^2 z} \qquad (3.122)$$

where the variable $z$ is given by

$$z = \frac{-1 + \sqrt{1 + \dfrac{144\, I_B}{\pi^2 I_{RB}}}}{\dfrac{24}{\pi^2}\sqrt{\dfrac{I_B}{I_{RB}}}} \qquad (3.123)$$

$R_B$ represents the low-current maximum resistance and $R_{BM}$ the high-current minimum resistance.
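The current-crowding expression can be evaluated directly. As $I_B \to 0$, $z \to 0$ and the bracketed term tends to 1/3, so the resistance returns to $R_B$; at $I_B = I_{RB}$ it falls roughly halfway toward $R_{BM}$. The sketch below uses hypothetical values ($R_B$ = 600 Ω, $R_{BM}$ = 100 Ω, $I_{RB}$ = 100 μA) and requires $I_{RB} > 0$; the function name is illustrative.

```python
import math


def base_resistance(i_b, rb, rbm, irb):
    """Current-dependent base resistance (current crowding), Eqs. (3.122)-(3.123).

    rb: low-current (maximum) resistance; rbm: high-current minimum; irb must be > 0.
    """
    if i_b <= 0.0:
        return rb  # zero-current limit: z -> 0 and the bracket term -> 1/3
    ratio = i_b / irb
    z = (-1.0 + math.sqrt(1.0 + 144.0 * ratio / math.pi ** 2)) / (
        (24.0 / math.pi ** 2) * math.sqrt(ratio))
    return rbm + 3.0 * (rb - rbm) * (math.tan(z) - z) / (z * math.tan(z) ** 2)


for i_b in (1e-9, 1e-5, 1e-4, 1e-3, 1e-2):
    print(i_b, round(base_resistance(i_b, 600.0, 100.0, 1e-4), 1))
```

The variable $z$ stays below $\pi/2$ for any finite current, so $\tan z$ remains finite and the expression approaches $R_{BM}$ smoothly.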
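The junction depletion capacitance is a function of the junction voltage ($V$). This function can be approximated by the following two expressions

$$C_j = C_j(0)\left(1 - \frac{V}{\phi_j}\right)^{-M_j} \quad \text{if } V < FC\,\phi_j \qquad (3.124)$$
$$C_j = \frac{C_j(0)}{(1 - FC)^{1 + M_j}}\left[1 - FC\,(1 + M_j) + \frac{M_j V}{\phi_j}\right] \quad \text{if } V \ge FC\,\phi_j \qquad (3.125)$$

The empirical factor $FC$ has a value between 0 and 1. Its default value in SPICE is 0.5. Note that Equations (3.124) and (3.125) apply to a reverse-biased and a forward-biased junction, respectively.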
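The linear branch used above $FC\,\phi_j$ avoids the singularity of the power law at $V = \phi_j$ and joins it continuously at $V = FC\,\phi_j$. The sketch below checks that continuity. It uses the base-emitter values $V_{JE}$ = 0.87 V and $M_{JE}$ = 0.265 from Table 3.5, with $C_j(0)$ normalized to 1 because the tabulated CJE exponent is not legible. The function name is illustrative.

```python
def depletion_cap(v, cj0, phi_j, m_j, fc=0.5):
    """Depletion capacitance with SPICE-style linearization in forward bias, Eqs. (3.124)-(3.125)."""
    if v < fc * phi_j:
        return cj0 * (1.0 - v / phi_j) ** (-m_j)
    return cj0 / (1.0 - fc) ** (1.0 + m_j) * (1.0 - fc * (1.0 + m_j) + m_j * v / phi_j)


# Base-emitter junction: VJE = 0.87 V, MJE = 0.265, C_j(0) normalized to 1.
for v in (-2.0, 0.0, 0.435, 0.8):
    print(v, depletion_cap(v, 1.0, 0.87, 0.265))
```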
The diffusion capacitances model the charge associated with injected carriers. For example, the electrons injected into the base have a corresponding storage charge

$$Q_{DE} = \tau_f I_{CC} \qquad (3.126)$$

The forward transit time $\tau_f$ is current dependent and is given by an empirical expression [24]

$$\tau_f = TF\left[1 + XTF\left(\frac{I_{C0}}{I_{C0} + ITF}\right)^2 e^{V_{BC}/(1.44\,VTF)}\right] \qquad (3.127)$$

where $VTF$ is a fitting parameter that models the change of $\tau_f$ as a function of $V_{BC}$ (or $V_{CB}$), $ITF$ models the change due to $I_{C0}$, and $XTF$ controls the increase of $\tau_f$. $I_{C0}$ is the collector current in the absence of high-current effects, which corresponds to that of the Ebers-Moll model.

The diffusion capacitance associated with the electrons injected from the emitter into the base, when the base-emitter junction is forward biased, is given by

$$C_{DE} = \frac{\partial Q_{DE}}{\partial V_{BE}} \qquad (3.128)$$

Similarly, the base-collector junction has a diffusion capacitance, which is given by

$$C_{DC} = \frac{\partial Q_{DC}}{\partial V_{BC}} \qquad (3.129)$$

where

$$Q_{DC} = \tau_r I_{EC} \qquad (3.130)$$
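With a constant transit time and $q_b = 1$, the derivative of $Q_{DE} = \tau_f I_{CC}$ has the closed form $\tau_f I_S e^{V_{BE}/n_f V_t}/(n_f V_t)$, which is roughly $\tau_f I_C/V_t$. The sketch below compares this form with a central finite difference of the stored charge. The constant $\tau_f$ (12.5 ps), $q_b = 1$, and the value of $I_S$ are assumptions for illustration; the bias dependence of Eq. (3.127) is not modeled.

```python
import math

VT_THERMAL = 0.02585  # kT/q near 300 K (V)


def q_de(v_be, tau_f, i_s, n_f=1.0):
    """Stored charge Q_DE = tau_f * I_CC, Eq. (3.126), with q_b = 1 and constant tau_f."""
    i_cc = i_s * (math.exp(v_be / (n_f * VT_THERMAL)) - 1.0)
    return tau_f * i_cc


def c_de_numeric(v_be, tau_f, i_s, n_f=1.0, dv=1e-6):
    """Eq. (3.128) evaluated by a central finite difference."""
    return (q_de(v_be + dv, tau_f, i_s, n_f) - q_de(v_be - dv, tau_f, i_s, n_f)) / (2.0 * dv)


def c_de_analytic(v_be, tau_f, i_s, n_f=1.0):
    """Closed form of the same derivative: tau_f * I_S * exp(V_BE/(n_f V_t)) / (n_f V_t)."""
    return tau_f * i_s * math.exp(v_be / (n_f * VT_THERMAL)) / (n_f * VT_THERMAL)


# Hypothetical: tau_f = 12.5 ps, I_S = 2e-18 A, V_BE = 0.9 V (I_C of a few mA).
print(c_de_numeric(0.9, 12.5e-12, 2e-18), c_de_analytic(0.9, 12.5e-12, 2e-18))
```

Because $C_{DE}$ grows in proportion to the collector current, it dominates the emitter-base capacitance at high bias.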
Although the SPICE models account for most of the first- and second-order effects, they are not highly accurate. This originates from weaknesses in the theory on which the models are based. As device features are scaled down, the currently available models become less accurate, because the physics of scaled devices is more complex and accurate modeling becomes very difficult. One way around this problem is to choose the model parameters such that the simulated device characteristics agree with measurements. In practice, the model parameters are extracted automatically using parameter analyzers with software tools that obtain the best fit. As a result, the values of the extracted parameters may not correspond to their actual physical values. For example, it is common to find a discrepancy of 20% between the measured current gain of a bipolar transistor and that listed in the SPICE file. Another approach, which is equivalent to tweaking the parameters, is to use empirical models (e.g., the BSIM model), in which the empirical (fitting) parameters can be optimized to get the best fit between simulation and measurements.

Typical GP parameters for the 0.8 μm BiCMOS process presented in Chapter 2 are shown in Tables 3.4 and 3.5.
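The fitting idea can be illustrated on the simplest pair of parameters. In the ideal region of a Gummel plot, $\ln I_C = \ln IS + V_{BE}/(NF\,V_t)$ is a straight line, so a least-squares line fit yields IS and NF. The sketch below is a minimal illustration, not how commercial extraction tools work: it uses synthetic, noise-free data from hypothetical values IS = 2 × 10⁻¹⁸ A and NF = 1.02, and it neglects the −1 term, which is valid for $V_{BE} \gg V_t$.

```python
import math

VT_THERMAL = 0.02585  # kT/q near 300 K (V)


def fit_is_nf(v_be, i_c):
    """Least-squares fit of ln(I_C) = ln(IS) + V_BE / (NF * V_t) over ideal-region Gummel data."""
    n = len(v_be)
    y = [math.log(i) for i in i_c]
    mean_x = sum(v_be) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in v_be)
    sxy = sum((x - mean_x) * (yy - mean_y) for x, yy in zip(v_be, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), 1.0 / (slope * VT_THERMAL)


# Synthetic "measurements" from hypothetical IS = 2e-18 A and NF = 1.02 (no noise).
v_data = [0.60 + 0.02 * k for k in range(11)]
i_data = [2e-18 * math.exp(v / (1.02 * VT_THERMAL)) for v in v_data]
is_fit, nf_fit = fit_is_nf(v_data, i_data)
print(is_fit, nf_fit)
```

With real data the fit window must exclude the low-current leakage region and the high-injection region, which otherwise bias IS and NF. This is one reason why extracted values can differ from physical ones.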
Table 3.4 Bipolar device parameters and HSPICE correspondence

SPICE keyword   Description
IS              Saturation current
BF              Ideal maximum forward gain
BR              Ideal maximum reverse gain
NF              Forward current emission coefficient
NR              Reverse current emission coefficient
VAF             Forward Early voltage
VAR             Reverse Early voltage
IKF             Forward knee current
IKR             Reverse knee current
ISE             Base-emitter leakage saturation current
ISC             Base-collector leakage saturation current
NE              Base-emitter leakage emission coefficient
NC              Base-collector leakage emission coefficient
RE              Emitter resistance
RC              Collector resistance
RB              Base resistance at zero current
IRB             Base current at which the base resistance falls halfway to RBM
RBM             Minimum high-current base resistance
CJE             Base-emitter zero-bias depletion capacitance
VJE             Base-emitter built-in potential
MJE             Base-emitter junction grading factor
CJC             Base-collector zero-bias depletion capacitance
VJC             Base-collector built-in potential
MJC             Base-collector junction grading factor
CJS             Collector-substrate zero-bias capacitance
VJS             Collector-substrate built-in potential
MJS             Collector-substrate junction grading factor
XCJC            Internal base fraction of base-collector capacitance
FC              Coefficient for forward-bias depletion capacitance
TF              Forward transit time (τf)
XTF             TF bias-dependence coefficient
VTF             TF base-collector voltage dependence coefficient
ITF             TF high-current parameter
TR              Reverse transit time (τr)
XTB             Forward and reverse beta temperature exponent
XTI             Saturation current temperature exponent
EG              Energy gap
KF              Flicker noise coefficient
AF              Flicker noise exponent
Table 3.5 HSPICE BJT model parameters (0.8 μm BiCMOS process)

SPICE keyword   Value            Units
IS              2×10^…           A
BF              100
BR              1
NF              1
NR              1
VAF             …                V
VAR             5                V
IKF             …                A
IKR             0                A
ISE             0                A
RE              30               Ω
RC              87               Ω
RB              650              Ω
IRB             0                A
RBM             650              Ω
CJE             1.51×10^…        F
VJE             0.87             V
MJE             0.265
CJC             1.15×10^-14      F
VJC             0.713            V
FC              0.5
TF              12.5×10^…        s
XTF             916.2
VTF             1.6
ITF             …
TR              4×10^-8          s
XTB             1.4
XTI             3.5
EG              1.11             eV
KF              2.9×10^…
AF              2.0
3.5.4 Chapter Summary

In this chapter, we have reviewed the fundamentals of MOS and bipolar devices. The most common device models used in SPICE have been presented. The key device parameters of each model have been defined and explained, so that the reader is familiar with the details of these models and can appreciate the importance of the different model parameters. The reader has been given a list of model parameters for a typical 0.8 μm BiCMOS process that can be used for circuit simulation. These models can be used even for low-voltage operation. Moreover, a simple analytical model valid for submicrometer MOSFETs has been presented.
REFERENCES
[I] A. Vlrudimirescu, and S. Lio, "The simulation of MOS Integrated Circaits

using SPICEZ," M m o . No. UCB/ERL M80/7, Univ. Cdifomia, Berkeley,
October 1980.
[Z] H. Masuda, M. Nakai and M, Kubo, "Characteristics and Limitations of
Scaled Down MOSFET's Due to Two Dimensional Field Effect," IEEE
Trans. on Electron Devices. Vol. ED-26, pp. 980-986, 1979.
[3] R.L.M. D u g , "A Simple Current Model for Short-Channel IGFET and Its
Application to Circuit Simulation," IEEE Journal of Solid-State Circuits,
vol. SC-14, pp. 358-367,1979.
(41 G. Merkd, J . Bore1 and N.Z. Cupces. "An Accurate Large Signal MOS
Transistor Model for Use in Computer-Aided Design," IEEE Trans. an
Electron Devices, vol. ED-IS, 1972.
[5] G. Baum and 8 . Beneking, 'Drift Velocity Saturation in MOS Tranris-
tors," IEEE Trans. on Electron Devices, YOI.
ED-17, pp. 481-482, 1970.
[6] R.M. Swanson and J.D. Meindl, "Ion-Implanted Complementary MOS
Transistors in Lou-Voltage Circuits," IEEE Journal of Solid-state Cir-
cuits, vol. SC-7, pp. 146-153, 1972.
171 B.J. Sheu, D.L. Scharfetter, P.-K. KO, and M.C. Jeng, "BSIM Berke-
ley Short-Channel IGFET Model for MOS Transistors," IEEE Journal of
Solid-state Circuits, vol. SC-22, pp. 558-566, 1987.
[8] J. 8. Huang, Z. H. Liu, M. C. Jeng, P. K. KO,and C. Ha, "A Robust
physical and Predictive Model for Deep-Snbmicmmeter MOS Circuit Sim-
ulation," IEEE Custom Integrated Circuits Conf., Tech. Dig., pp. 14.2.1-
14.2.4, May 1993.
[9] D.E. Ward and R.W. Dutton, "A Chargeoriented Model for MOS Tran-
sistors Capacitances," IEEE Journal of Solid-State Circuits, vol. SC-13,
pp. 703-707, 1978.
112 LOW-POWERDIGITALVLSI DESIGN
[10] Y. P. Tsividis, "Operation and Modeling of the MOS Transistor," McGraw-Hill, 1988.
[11] T. Sakata et al., "Subthreshold-Current Reduction Circuits for Multi-Gigabit DRAM's," IEEE Journal of Solid-State Circuits, vol. 29, no. 7, pp. 761-769, July 1994.
[12] S. M. Sze, "Physics of Semiconductor Devices," John Wiley & Sons, 1981.
[13] C. G. Sodini, P.-K. Ko, and J. L. Moll, "The Effect of High Fields on MOS Device and Circuit Performance," IEEE Trans. on Electron Devices, vol. ED-31, no. 10, pp. 1386-1393, October 1984.
[14] B. Hoefflinger, H. Sibbert, and G. Zimmer, "Model and Performance of Hot-Electron MOS Transistors for VLSI," IEEE Trans. on Electron Devices, vol. ED-26, pp. 513, 1979.
[15] C. Hu, "Low-Voltage CMOS Device Scaling," IEEE International Solid-State Circuits Conf., Tech. Dig., pp. 86-87, 1994.
[16] R. H. Dennard et al., "Design of Ion-Implanted MOSFETs with Very Small Physical Dimensions," IEEE Journal of Solid-State Circuits, vol. SC-9, pp. 256-266, October 1974.
[17] P. K. Chatterjee et al., "The Impact of Scaling Laws on the Choice of N-Channel or P-Channel for MOS VLSI," IEEE Electron Device Letters, vol. EDL-1, pp. 220-223, October 1980.
[18] M. Kakumu, "Process and Device Technologies of CMOS Devices for Low-Voltage Operation," IEICE Trans. Electron., vol. E76-C, no. 5, pp. 672-680, May 1993.
[19] M. Kakumu, M. Kinugawa, and K. Hashimoto, "Choice of Power-Supply Voltage for Half-Micrometer and Lower Submicrometer CMOS Devices," IEEE Trans. on Electron Devices, vol. 37, no. 6, pp. 1334-1342, May 1990.
[20] D. J. Roulston, "Bipolar Semiconductor Devices," McGraw-Hill Publishing Company, 1990.
[21] K. Nakazato et al., "Characteristics and Scaling Properties of n-p-n Transistors with a Sidewall Base Contact Structure," IEEE Trans. on Electron Devices, vol. ED-32, no. 2, pp. 328-332, February 1985.
[22] H. K. Gummel and H. C. Poon, "An Integral Charge Control Model of Bipolar Transistors," Bell Syst. Tech. J., vol. 49, 1970.
[23] I. Getreu, "Modeling the Bipolar Transistor," Tektronix, Inc., 1976.
[24] P. Antognetti and G. Massobrio, "Semiconductor Device Modeling with SPICE," McGraw-Hill, 1988.
4
LOW-VOLTAGE LOW-POWER
VLSI CMOS CIRCUIT DESIGN
In this chapter we introduce the CMOS logic gate and develop simple models for estimating delay and power dissipation. This analysis lets us understand the mechanisms that control the performance of a logic circuit, particularly its power dissipation. Several CMOS design styles, such as pseudo-NMOS, dynamic logic and NORA, are presented. Other circuit variations of static complementary CMOS that are suitable for low-power applications are discussed. These include pass-transistor logic families such as Complementary Pass-transistor Logic (CPL), Dual Pass-transistor Logic (DPL), and Swing Restored Pass-transistor Logic (SRPL). An overview of clocking strategy in VLSI systems is also given. This chapter also covers an important topic, the I/O circuits, and analyzes their power dissipation. Finally, low-power techniques for CMOS design are reviewed at the transistor level. Low-power issues at the subsystem, system and architecture levels are covered in more detail in Chapters 6, 7 and 8.
Several books treat other aspects of CMOS circuit design in detail [1, 2, 3], and the reader can refer to them.
Many issues that exist in today's advanced CMOS circuit structures are considered, such as:
• Power dissipation components of a CMOS gate and their importance;
• Concept of switching activity;
• Power dissipation in I/O circuits;
• Single-phase clocking strategy;
• Clock skew issue;
• Clock distribution in VLSI systems;
• Ground bouncing; and
• Low-power circuit techniques and design guidelines.
4.1 CMOS INVERTER DC CHARACTERISTICS

Fig. 4.1 shows the basic complementary MOS inverter. Before deriving the DC transfer characteristic of this inverter (the output voltage versus the input voltage), let us look at how the circuit operates.
• When the input is HIGH, that is, at $V_{DD}$, we have
$$V_{GSn} = V_{in} = V_{DD} \tag{4.1}$$
$$V_{GSp} = V_{in} - V_{DD} = 0\ \text{V} \tag{4.2}$$
In this case, $V_{GSn} > V_{Tn}$ and $|V_{GSp}| < |V_{Tp}|$. The PMOS is OFF and the NMOS is ON. The NMOS transistor N provides a current path to ground. The final stable value of the output voltage $V_o$ is
$$V_o = 0 \tag{4.3}$$
At steady state, the DC current from $V_{DD}$ to ground is set by the subthreshold current of the PMOS P, because this device is OFF and the NMOS N has $V_{DS} = 0$. We assume that junction leakage is negligible. If $V_{Tp}$* is low enough (for example, lower than -0.5 V), the subthreshold current is negligible (< 1 pA/µm of width). If $V_{Tp}$ (negative) is high, the subthreshold current is not negligible and can be as high as 1 µA/µm for $V_{Tp} = -0.05$ V [see Section 3.3.2]. In this case the output is not exactly at zero and can have a value of tens of mV. In this section we assume that the subthreshold current is not important. Low-$V_T$ CMOS circuits are treated in Section 4.10.
• Similarly, when $V_{in}$ is low (0 V), $V_{GSn} < V_{Tn}$ and $|V_{GSp}| > |V_{Tp}|$. The PMOS transistor is ON and the NMOS transistor is OFF. The output voltage is given by
$$V_o = V_{DD} \tag{4.4}$$
Here too we assume that the leakage current is negligible.
*Extrapolated threshold voltage.
Figure 4.1 A CMOS inverter
The logic levels of the CMOS inverter are close to $V_{DD}$ and ground, and the logic swing is equal to $V_{DD}$. This is a main feature of CMOS gates.
4.1.1 Transfer Characteristics

In this section we discuss the DC characteristics of the CMOS inverter of Fig. 4.1. Fig. 4.2 shows the DC transfer characteristic with the different regions of operation. For simplicity, we use the simple current models presented in Section 3.2.1 for the MOS devices. The circuit operation can be divided into five regions:
Region (A): $0 \le V_{in} < V_{Tn}$

The NMOS transistor is operating in the subthreshold region and its current is assumed to be zero. Hence the PMOS current is also zero. The PMOS transistor is in the linear region. Thus, $V_o = V_{DD}$.
Region (B): $V_{Tn} < V_{in} < V_{inv}$

$V_{inv}$ is defined as the input voltage at which the gain of the inverter is maximum. It is also called the gate threshold voltage. In this region, the NMOS transistor operates in the saturation region and the PMOS transistor operates in the linear region. The current in both devices is the same in absolute value, so we have
$$I_{DSp} = -I_{DSn} \tag{4.5}$$
The PMOS current is given by
$$I_{DSp} = -\beta_p\left[(V_{in} - V_{DD} - V_{Tp})(V_o - V_{DD}) - \frac{1}{2}(V_o - V_{DD})^2\right] \tag{4.6}$$
where
$$\beta_p = k_p\frac{W_{eff,p}}{L_{eff,p}} \tag{4.7}$$
$$k_p = \mu_p C_{ox} \tag{4.8}$$
The saturation current of the NMOS is given by
$$I_{DSn} = \frac{\beta_n}{2}(V_{GSn} - V_{Tn})^2 \tag{4.10}$$
where
$$\beta_n = k_n\frac{W_{eff,n}}{L_{eff,n}} \tag{4.11}$$
and
$$V_{GSn} = V_{in} \tag{4.12}$$
Using equations (4.5), (4.6) and (4.10), the output voltage is given by
$$V_o = (V_{in} - V_{Tp}) + \sqrt{(V_{in} - V_{Tp})^2 - 2\left(V_{in} - \frac{V_{DD}}{2} - V_{Tp}\right)V_{DD} - \frac{\beta_n}{\beta_p}(V_{in} - V_{Tn})^2} \tag{4.13}$$
This relation between $V_o$ and $V_{in}$ is plotted in region (B) of Fig. 4.2.

Region (C): $V_{in} = V_{inv}$
Both the NMOS and PMOS transistors are in the saturation region. In this case, the PMOS current can be given by
$$I_{DSp} = -\frac{\beta_p}{2}(V_{in} - V_{DD} - V_{Tp})^2 \tag{4.14}$$
The NMOS saturation current is given in Equation (4.10). Equating the absolute values of the two drain currents gives
$$V_{inv} = \frac{V_{DD} + V_{Tp} + V_{Tn}\sqrt{\beta}}{1 + \sqrt{\beta}} \tag{4.15}$$
where
$$\beta = \frac{\beta_n}{\beta_p} \tag{4.16}$$
This equation is very useful from a design point of view. It shows that the designer sets the logic threshold voltage of the gate, because the parameters $\beta_n$ and $\beta_p$ depend on $W_{eff}$ and $L_{eff}$. Moreover, region (C) is defined for only one value of $V_{in}$.
For symmetrical NMOS and PMOS devices we have
$$V_{Tn} = -V_{Tp} \tag{4.17}$$
If the designer also sets
$$\beta_n = \beta_p \tag{4.18}$$
that is, chooses the size ratio
$$\frac{W_{eff,p}/L_{eff,p}}{W_{eff,n}/L_{eff,n}} = \frac{k_n}{k_p} \tag{4.20}$$
we obtain
$$V_{in} = V_{inv} = \frac{V_{DD}}{2} \tag{4.21}$$
Because $k_n > k_p$, this requires a wider PMOS than NMOS. The examples in this chapter use $W_p = 2W_n$. An inverter with this $V_{inv}$ is sometimes called a symmetrical gate. The output voltage in this case is not necessarily equal to $V_{DD}/2$ and is bounded by the following inequality:
$$V_{in} - V_{Tn} < V_o < V_{in} - V_{Tp} \tag{4.22}$$
In reality, $V_o$ is set by the slight dependence of $I_{DS}$ on $V_{DS}$.
Region (D): $V_{inv} < V_{in} \le V_{DD} + V_{Tp}$
In this region the NMOS is in the linear region and the PMOS is in the saturation region. An analysis similar to the one used for region (B) can be applied. The output voltage is given by
$$V_o = (V_{in} - V_{Tn}) - \sqrt{(V_{in} - V_{Tn})^2 - \frac{\beta_p}{\beta_n}(V_{in} - V_{DD} - V_{Tp})^2} \tag{4.23}$$

Region (E): $V_{DD} + V_{Tp} < V_{in} \le V_{DD}$

In this region the NMOS transistor is ON and operates in the linear region, while the PMOS operates in the subthreshold region. If we assume that this subthreshold current is negligible, then
$$V_o = 0 \tag{4.24}$$
The current flowing from $V_{DD}$ to ground is plotted against the input voltage in Fig. 4.2(b). It reaches its maximum when both MOS transistors are in saturation. It is important to note that the DC power dissipation is maximal for $V_{in} = V_{inv}$.
Figure 4.3 Effect of the ratio β on (a) the DC transfer characteristic and (b) the threshold voltage of the CMOS inverter
4.1.2 Effect of β
As discussed above, the ratio β controls the threshold voltage of the CMOS inverter. The circuit designer sets this parameter through the transistor sizes. Other parameters, such as the mobility and the threshold voltage of the devices, are set during fabrication and the circuit designer cannot change them. Fig. 4.3 illustrates how the DC transfer characteristic and the threshold voltage of the CMOS inverter depend on the ratio β. Increasing β decreases $V_{inv}$. $V_{inv}$ has a practical maximum below $V_{DD} + V_{Tp}$ and a practical minimum above $V_{Tn}$. "Practical" here means that β can be neither zero nor infinite. In general, the circuit designer tries to set β = 1 for symmetrical operation, unless the gate is used to switch an input swing different from a CMOS swing (from ground to $V_{DD}$).
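Equation (4.15) can be evaluated directly to show how the sizing ratio β moves the switching threshold. The short Python sketch below assumes the square-law model of Section 3.2.1 and uses illustrative values ($V_{DD} = 3.3$ V, $V_{Tn} = 0.5$ V, $V_{Tp} = -0.5$ V). The function name is ours and does not come from the text.

```python
import math


def switching_threshold(vdd, vtn, vtp, beta):
    """Gate threshold V_inv from Eq. (4.15), with beta = beta_n / beta_p (Eq. 4.16).

    vtp is the (negative) PMOS threshold voltage.
    """
    r = math.sqrt(beta)
    return (vdd + vtp + vtn * r) / (1.0 + r)


if __name__ == "__main__":
    # Illustrative values (assumed): symmetric 0.5 V thresholds at V_DD = 3.3 V.
    for beta in (0.25, 1.0, 4.0):
        v_inv = switching_threshold(3.3, 0.5, -0.5, beta)
        print(f"beta = {beta:4.2f}  ->  V_inv = {v_inv:.3f} V")
```

With β = 1 the threshold sits at $V_{DD}/2$, as in (4.21). Letting β grow pulls $V_{inv}$ toward $V_{Tn}$, and letting it shrink pushes $V_{inv}$ toward $V_{DD} + V_{Tp}$, which are the two practical limits mentioned above.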
4.1.3 Noise Margins
Noise margin is an important parameter in logic design. It is defined as the noise voltage that can be tolerated on the input without affecting the output. In other words, we would like to define the valid logic levels so that they are restored as they propagate through a digital circuit. The logic levels can be extracted from the DC characteristic. As illustrated in Fig. 4.4, we define the levels at the input by
• Logic 0: $0 \le V_{in} \le V_{IL}$
• Logic 1: $V_{IH} \le V_{in} \le V_{DD}$
and at the output by
• Logic 0: $0 \le V_o \le V_{OL}$
• Logic 1: $V_{OH} \le V_o \le V_{DD}$
The LOW noise margin is defined by
$$NM_L = |V_{IL} - V_{OL}| \tag{4.25}$$
and the HIGH noise margin is defined by
$$NM_H = |V_{OH} - V_{IH}| \tag{4.26}$$
The $V_{IL}$ and $V_{IH}$ levels can be defined as the points where the slope of the DC transfer characteristic is -1, i.e.,
$$\frac{dV_o}{dV_{in}} = -1 \tag{4.27}$$
These values can be deduced using equations (4.13) and (4.23). For good noise margins, $V_{IL}$ and $V_{IH}$ should be close to each other, around $V_{DD}/2$.
For CMOS circuits, the output levels can be defined by letting $V_{OH} = V_{DD}$ and $V_{OL} = 0$. The CMOS logic inverter has a fairly ideal transfer function and tends to have very good noise margins. In some applications, either $NM_H$ or $NM_L$ is compromised to obtain good speed of operation.
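Finding the slope = -1 points of (4.13) and (4.23) by hand is tedious, so the sketch below finds them numerically. It assumes the same square-law device model. The PMOS is handled by reusing the n-type equation with $V_{SG}$, $V_{SD}$ and $|V_{Tp}|$. The sketch solves the current balance for $V_o$ by bisection, scans $dV_o/dV_{in}$ on a 1 mV grid, and adopts the text's $V_{OH} = V_{DD}$, $V_{OL} = 0$ convention. All function names are illustrative.

```python
def square_law_current(vgs, vds, beta, vt):
    """Drain current of an n-type square-law device (cutoff, linear, saturation)."""
    vov = vgs - vt
    if vov <= 0.0:
        return 0.0  # subthreshold current neglected, as in regions (A) and (E)
    if vds < vov:
        return beta * (vov * vds - 0.5 * vds * vds)  # linear region
    return 0.5 * beta * vov * vov  # saturation region


def output_voltage(vin, vdd, beta_n, beta_p, vtn, vtp, iterations=60):
    """Solve I_n(V_o) = I_p(V_o) for the static output voltage by bisection."""
    lo, hi = 0.0, vdd
    for _ in range(iterations):
        vo = 0.5 * (lo + hi)
        i_n = square_law_current(vin, vo, beta_n, vtn)
        # PMOS written with source-referred magnitudes: V_SG, V_SD and |V_Tp|.
        i_p = square_law_current(vdd - vin, vdd - vo, beta_p, -vtp)
        if i_n > i_p:
            hi = vo  # NMOS sinks more than the PMOS sources: V_o must be lower
        else:
            lo = vo
    return 0.5 * (lo + hi)


def noise_margins(vdd, beta_n, beta_p, vtn, vtp, step=1e-3, h=1e-4):
    """Return (V_IL, V_IH, NM_L, NM_H) using the slope = -1 definition (4.27)."""
    def slope(vin):
        up = output_voltage(vin + h, vdd, beta_n, beta_p, vtn, vtp)
        down = output_voltage(vin - h, vdd, beta_n, beta_p, vtn, vtp)
        return (up - down) / (2.0 * h)

    n = int(round(vdd / step))
    # First grid point from the left (region B) and from the right (region D)
    # where the transfer curve is at least as steep as -1.
    v_il = next(i * step for i in range(1, n) if slope(i * step) <= -1.0)
    v_ih = next(i * step for i in range(n - 1, 0, -1) if slope(i * step) <= -1.0)
    v_oh, v_ol = vdd, 0.0  # CMOS output levels, as assumed in the text
    return v_il, v_ih, v_il - v_ol, v_oh - v_ih


if __name__ == "__main__":
    v_il, v_ih, nml, nmh = noise_margins(3.3, 1.0, 1.0, 0.5, -0.5)
    print(f"V_IL = {v_il:.3f} V, V_IH = {v_ih:.3f} V, "
          f"NM_L = {nml:.3f} V, NM_H = {nmh:.3f} V")
```

For a symmetric inverter, the result can be checked against the closed-form square-law values $V_{IL} = (3V_{DD} + 2V_T)/8$ and $V_{IH} = (5V_{DD} - 2V_T)/8$, which are about 1.36 V and 1.94 V here. These idealized numbers use β = 1 and a simplified model, so they need not match the values quoted in Section 4.1.5.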
4.1.4 Minimum Power Supply

To obtain the maximum power saving in CMOS logic circuits, the power supply voltage should be reduced. So what is the lowest practical supply voltage at which CMOS will operate? In 1972, Swanson and Meindl [4] showed that the minimum supply voltage is given by
$$V_{DD,min} = 8\frac{kT}{q} \tag{4.28}$$
At room temperature this value is about 0.2 V, which shows that CMOS is a good candidate for ultra-low-power applications.
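Equation (4.28) scales linearly with absolute temperature through the thermal voltage $kT/q$. The minimal Python check below reproduces the 0.2 V figure. It assumes T = 300 K as "room temperature" and uses the exact SI values of k and q.

```python
# Physical constants (exact SI values since the 2019 redefinition).
BOLTZMANN = 1.380649e-23  # J/K
ELECTRON_CHARGE = 1.602176634e-19  # C


def min_supply_voltage(temperature_k):
    """Swanson-Meindl minimum supply voltage, Eq. (4.28): V_DD,min = 8kT/q."""
    thermal_voltage = BOLTZMANN * temperature_k / ELECTRON_CHARGE
    return 8.0 * thermal_voltage


if __name__ == "__main__":
    for t in (250.0, 300.0, 375.0):
        print(f"T = {t:5.1f} K  ->  V_DD,min = {min_supply_voltage(t) * 1e3:.0f} mV")
```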
4.1.5 Example of Noise Margins

Consider an inverter with $W_p = 2W_n = 4$ µm in a 0.8 µm CMOS technology, with threshold voltages $V_T = V_{Tn} = |V_{Tp}| = 0.5$ V. At a 3.3 V power supply voltage, $NM_L = 1.15$ V and $NM_H = 1.45$ V. At 1.5 V, however, $NM_L = 0.60$ V and $NM_H = 0.65$ V. The noise level should therefore be kept low, particularly at low power supply voltage.
Figure 4.5 CMOS inverter and its switching characteristic
4.2 CMOS INVERTER SWITCHING CHARACTERISTICS

In this section, we present the transient behavior of the CMOS inverter and develop a very simple analytic model for its delay. The objective of this analysis is to understand the parameters that affect the speed of the gate. We assume that the input has a step waveform. The delay $t_d$ is the time difference between the midpoint of the input swing and the midpoint of the output swing. Referring to Fig. 4.5:
• $t_{dr}$ is the 50% delay when the output is rising; and
• $t_{df}$ is the 50% delay when the output is falling.
The power dissipation issue during the switching is considered in Section 4.3.
4.2.1 Analytic Delay Models

The load capacitance shown in Fig. 4.5 at the output of the CMOS inverter represents the sum of the input capacitance of the driven gates, the parasitic capacitance at the output of the gate itself, and the wiring capacitance. We discuss how to estimate this load capacitance in Section 4.4. For simplicity, when computing the 50% delay we assume that the MOS current is averaged and equal to the saturation current. We use the saturation current equation given by Equation (3.82) in Section 3.3.3, which models short-channel devices well.
4.2.1.1 Fall Delay
When the input goes from low (ground) to high ($V_{DD}$), the output is initially at $V_{DD}$ and the pull-down NMOS of Fig. 4.5 is in the saturation region. We assume that the NMOS drain current stays approximately equal to the saturation current $I_{DSn,sat}$ until the output falls to $V_{DD}/2$. Referring to the equivalent circuit of Fig. 4.6(a), the delay is computed from the following differential equation:
$$C_L\frac{dV_o}{dt} = -I_{DSn,sat} \tag{4.29}$$
where
$$I_{DSn,sat} = K_n v_{sat} C_{ox} W_{eff,n}(V_{GSn} - V_{Tn}) \tag{4.30}$$
We assume that the factor $K_n$ does not change. We integrate Equation (4.29) from $t = t_1$, where $V_o = V_{DD}$, to $t = t_2$, where $V_o = V_{DD}/2$. Substituting (4.30) with $V_{GSn} = V_{DD}$ then gives
$$t_{df} = t_2 - t_1 = \frac{C_L V_{DD}}{2K_n v_{sat} C_{ox} W_{eff,n}(V_{DD} - V_{Tn})} \tag{4.31}$$
This equation shows that the delay is inversely proportional to the width of the MOS transistor. Widening the transistor therefore reduces the delay of the gate itself, although the wider device presents a larger load to the preceding stage.
4.2.1.2 Rise Delay
When the input goes from high ($V_{DD}$) to low (ground), the output is initially at zero and the pull-up PMOS transistor operates in the saturation region. Using the equivalent circuit of Fig. 4.6(b) in the same way, the rise delay is given by
$$t_{dr} = t_4 - t_3 = \frac{C_L V_{DD}}{2K_p v_{sat} C_{ox} W_{eff,p}(V_{DD} - |V_{Tp}|)} \tag{4.32}$$
Figure 4.6 Equivalent circuits for (a) the fall delay and (b) the rise delay (at $t = t_1$, $V_o = V_{DD}$; at $t = t_3$, $V_o = 0$; at $t = t_4$, $V_o = V_{DD}/2$)
From the above equations, the rise delay is greater than the fall delay for equally sized MOS transistors. $W_{eff,p}$ should therefore be sized so that the two saturation currents are almost equal, which gives symmetrical rise and fall delays.
4.2.1.3 Delay Time

By definition, the delay time (sometimes called propagation delay) is given by
$$t_d = \frac{1}{2}(t_{dr} + t_{df}) \tag{4.33}$$
Hence, for $V_{Tn} = -V_{Tp} = V_T$, the delay is given by
$$t_d = \frac{C_L V_{DD}}{4v_{sat}C_{ox}(V_{DD} - V_T)}\left(\frac{1}{K_n W_{eff,n}} + \frac{1}{K_p W_{eff,p}}\right) \tag{4.34}$$
This equation can also be written as
$$t_d = \text{Const}\cdot\frac{C_L}{1 - V_T/V_{DD}} \tag{4.35}$$
The constant depends slightly on $V_{DD}$ through the parameter K. This equation gives a simple analytic expression for the delay time. First, the delay is linearly proportional to the total load capacitance. Second, the delay increases when the power supply is scaled down, and it increases drastically when $V_{DD}$ approaches the threshold voltage of the device. If the threshold voltage and the oxide thickness are scaled down along with the supply voltage, the delay can improve with $V_{DD}$ scaling.
From the CMOS circuit designer's point of view, the only parameters that can be controlled to optimize the speed of CMOS gates are:
• The width of the MOS transistors;
• The load capacitances (input of the next stage, wiring, etc.); and
• The supply voltage $V_{DD}$.
Fig. 4.7(a) shows the simulated effect of the power supply voltage on the delay of an inverter with a fanout of 3, using the device parameters given in Chapter 3. The input is buffered with one inverter stage to obtain accurate results. The delay is almost constant at high $V_{DD}$. When $V_{DD}$ approaches the threshold voltage of the NMOS and PMOS devices, however, the delay increases drastically, as Equation (4.35) predicts. The threshold voltage should therefore be reduced to overcome this problem. In Fig. 4.7(b), the delay of the inverter is plotted against the ratio $V_T/V_{DD}$ at $V_{DD} = 2.5$ V. For $V_T/V_{DD} > 0.5$, the delay increases rapidly. To keep improving circuit performance at reduced power supply voltage, $V_T/V_{DD}$ must be $\le 0.2$.
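Equations (4.31) to (4.35) are simple enough to tabulate. The sketch below works in SI units and assumes a velocity-saturated current of the form (4.30), with the same $v_{sat} = 10^5$ m/s for both devices. It uses $C_{ox} = 2$ fF/µm², the value used in the example of Section 4.4.4, and hypothetical fitting factors $K_n = 0.6$ and $K_p = 0.3$. The printed numbers are illustrative only.

```python
def saturation_current(k_fit, v_sat, c_ox, width, vdd, vt_abs):
    """Form of Eqs. (4.30)/(4.32): I_sat = K * v_sat * C_ox * W * (V_DD - |V_T|)."""
    return k_fit * v_sat * c_ox * width * (vdd - vt_abs)


def inverter_delays(c_load, vdd, vt, k_n, k_p, w_n, w_p, v_sat=1e5, c_ox=2e-3):
    """Return (t_df, t_dr, t_d) in seconds from Eqs. (4.31)-(4.33).

    SI units: c_load in F, widths in m, v_sat in m/s, c_ox in F/m^2.
    vt is the common threshold magnitude V_T = V_Tn = -V_Tp.
    The default v_sat and c_ox values are illustrative assumptions.
    """
    # Moving C_L through V_DD/2 at constant current takes C_L * (V_DD/2) / I.
    t_df = c_load * vdd / (2.0 * saturation_current(k_n, v_sat, c_ox, w_n, vdd, vt))
    t_dr = c_load * vdd / (2.0 * saturation_current(k_p, v_sat, c_ox, w_p, vdd, vt))
    return t_df, t_dr, 0.5 * (t_df + t_dr)


if __name__ == "__main__":
    # Hypothetical K_n, K_p and widths (W_p = 2 W_n); C_L = 100 fF.
    for vdd in (3.3, 2.5, 1.5, 1.0, 0.8):
        _, _, td = inverter_delays(100e-15, vdd, 0.5, 0.6, 0.3, 10e-6, 20e-6)
        print(f"V_DD = {vdd:3.1f} V  ->  t_d = {td * 1e12:6.1f} ps")
```

The delay stays nearly flat at high $V_{DD}$ and climbs as $V_{DD}$ approaches $V_T$, following the $1/(1 - V_T/V_{DD})$ factor of (4.35). Here $K_nW_n = K_pW_p$, so the rise and fall delays come out equal, which is the sizing rule stated after (4.32).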
4.2.2 Delay Characterization with SPICE

A data sheet for the delay of a cell (e.g., a CMOS inverter) can easily be prepared using SPICE. For example, the load capacitance or the fanout of a CMOS inverter is swept during the simulation, and a relation of the type $t_d = a + b\cdot C_L$ (or fanout) can be obtained. Fig. 4.8 shows the delay against the external load capacitance $C_L$. Other parameters can also be extracted.
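Once SPICE has produced delay values for a sweep of load capacitances, the coefficients a and b can be extracted by ordinary least squares. In the minimal Python sketch below, the function name and the synthetic check values are ours. Real inputs would be the simulated delays.

```python
def fit_delay_model(loads, delays):
    """Least-squares fit of t_d = a + b * C_L to (load, delay) samples.

    Returns (a, b): the intrinsic delay a and the load sensitivity b.
    """
    n = len(loads)
    if n < 2 or n != len(delays):
        raise ValueError("need at least two (load, delay) pairs of equal length")
    mean_c = sum(loads) / n
    mean_t = sum(delays) / n
    s_ct = sum((c - mean_c) * (t - mean_t) for c, t in zip(loads, delays))
    s_cc = sum((c - mean_c) ** 2 for c in loads)
    if s_cc == 0.0:
        raise ValueError("the load values must not all be equal")
    b = s_ct / s_cc
    a = mean_t - b * mean_c
    return a, b


if __name__ == "__main__":
    # Synthetic check: points generated from a known line are recovered.
    loads_ff = [10.0, 50.0, 100.0, 200.0]
    delays_ps = [40.0 + 0.8 * c for c in loads_ff]  # hypothetical a = 40 ps, b = 0.8 ps/fF
    a, b = fit_delay_model(loads_ff, delays_ps)
    print(f"t_d = {a:.1f} ps + {b:.2f} ps/fF x C_L")
```

The same fit works with fanout in place of $C_L$. The intercept a is the intrinsic delay of the unloaded cell, and the slope b is the extra delay per unit of load.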
4.3 POWER DISSIPATION

To minimize the power consumption of a CMOS circuit, the various power components and their effects must be identified. There are two types of power dissipation. One is the maximum power dissipation, which is related to the peak of the instantaneous current; the other is the average power dissipation. Because of the power line resistance, the peak current adds noise to the supply voltage. It can also heat the device and degrade its performance. From the battery lifetime point of view, the average power dissipation is more important.
There are three power dissipation components within the CMOS inverter:
1. Static power caused by the leakage current and by other static current $I_{st}$ that depends on the value of the input voltage;
2. Dynamic power caused by the total output capacitance $C_L$; and
3. Dynamic power caused by the short-circuit current $I_{sc}$ during the switching transient.

Components (2) and (3) are sometimes merged as the total dynamic power.
4.3.1 Static Power

This component is sometimes split into two parts. The sources of static power dissipation in a complementary CMOS inverter are leakage currents ($P_{s1}$) and current drawn from the supply because of the input voltage ($P_{s2}$). Hence the total static power is given by
$$P_s = P_{s1} + P_{s2} \tag{4.36}$$
The leakage current consists of MOS junction leakage currents. Fig. 4.9 shows the parasitic diodes in a CMOS inverter. The body ties in this structure keep the parasitic diodes from conducting (i.e., they are reverse biased and/or at zero voltage). The current in a diode is given by
$$I_d = I_s\left(\exp\frac{qV_d}{nkT} - 1\right) \tag{4.37}$$
where n is the emission coefficient of the diode (sometimes equal to 1) and $V_d$ is the voltage applied to the diode. Note that the current parameter $I_s$ increases with temperature. The total power dissipation due to these leakage currents is given by
$$P_{s1} = \sum I_d V_{DD} \tag{4.38}$$
A typical value of this leakage current $I_d$ is 1 fA per device junction. This is too small to affect the static power: with one million devices, the total contribution to the power would be about 0.01 µW. This first component of the static power is therefore neglected throughout this book, except in Chapter 6 for memory design.
We now consider the second component of the static power, which is a function of the input voltage $V_{in}$. Assume that the input of the pull-down NMOS of the inverter is at a voltage $0 \le V_{in} < V_T$. In this case the current is given by the subthreshold expression (Fig. 4.10)
$$I_{DS} = I_0\frac{W_{eff}}{W_0}10^{(V_{GS} - V_T)/S} \tag{4.39}$$
where $V_T$ is the constant-current threshold voltage. For $V_{in} > V_T$, the current is given by the expressions discussed in Chapter 3. The corresponding static power dissipation is given by
$$P_{s2} = I_{DSmean}V_{DD} \tag{4.40}$$
The mean value of the current accounts for both the PMOS and the NMOS transistors. For example, if $V_{in} = 0$, $V_T = 0.15$ V, $W_{eff} = 10$ µm and S = 75 mV/decade, this current is 1 nA. For 1 million integrated devices, the total static current would be significant (1 mA). This current also increases drastically with temperature [see Section 3.3.2]. Such a standby current is not acceptable for battery-operated applications. CMOS circuits have been known to consume energy only during switching. This is no longer true, because low-$V_T$ CMOS is used for low-voltage operation. Some CMOS circuits that exhibit a high DC current are discussed in Section 4.6.
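Equation (4.39) makes the leakage exponentially sensitive to $V_T$. The text does not give the prefactor $I_0/W_0$, so the sketch below back-calculates it from the quoted example: 1 nA for $W_{eff} = 10$ µm, $V_T = 0.15$ V, S = 75 mV/decade and $V_{GS} = 0$. This fitted value of 10 nA/µm is an assumption, not process data.

```python
def subthreshold_current(vgs, vt, swing, width_um, i0_per_um):
    """Eq. (4.39)-style subthreshold current: I = I0 * (W/W0) * 10**((V_GS - V_T)/S).

    i0_per_um is the current per um of width at V_GS = V_T (an assumed fit value).
    swing is the subthreshold slope S in V/decade.
    """
    return i0_per_um * width_um * 10.0 ** ((vgs - vt) / swing)


# Back-calculated so that the text's example gives 1 nA:
# 1 nA = I0 * 10 um * 10**(-0.15/0.075)  ->  I0 = 10 nA/um (assumption).
I0_PER_UM = 10e-9


if __name__ == "__main__":
    i_off = subthreshold_current(0.0, 0.15, 0.075, 10.0, I0_PER_UM)
    total = 1e6 * i_off  # one million such devices
    print(f"per device: {i_off * 1e9:.2f} nA, total: {total * 1e3:.2f} mA")
    # Each +75 mV of V_T (one decade of S) cuts the leakage by 10x.
    i_hvt = subthreshold_current(0.0, 0.225, 0.075, 10.0, I0_PER_UM)
    print(f"with V_T = 0.225 V: {i_hvt * 1e9:.2f} nA per device")
```

Multiplying the total current by $V_{DD}$ gives $P_{s2}$ from (4.40). Raising $V_T$ by one subthreshold swing lowers this standby power tenfold, which is why low-$V_T$ designs need the techniques of Section 4.10.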
4.3.2 Dynamic Power of the Output Load

In this section we estimate the power dissipation due to the total output load capacitance $C_L$. This power comes from the currents needed to charge and discharge $C_L$, as shown in Figs. 4.11 and 4.12. We assume a step input, so that the PMOS and NMOS are never on simultaneously. The average dynamic power $P_d$ required to charge and discharge a capacitance $C_L$ at a switching frequency $f = 1/T$ (Fig. 4.12) is given by
$$P_d = \frac{1}{T}\int_0^{T/2} i_p(t)\,(V_{DD} - V_o)\,dt + \frac{1}{T}\int_{T/2}^{T} i_n(t)\,V_o\,dt \tag{4.41}$$
The output current is given during the charging phase by
$$i_p = C_L\frac{dV_o}{dt} \tag{4.42}$$
and during the discharging phase by
$$i_n = -C_L\frac{dV_o}{dt} \tag{4.43}$$
Equation (4.41) then becomes
$$P_d = \frac{C_L}{T}\int_0^{V_{DD}}(V_{DD} - V_o)\,dV_o + \frac{C_L}{T}\int_0^{V_{DD}}V_o\,dV_o \tag{4.44}$$
Finally, the dynamic power dissipation is
$$P_d = \frac{C_L V_{DD}^2}{T} = C_L V_{DD}^2 f \tag{4.45}$$
This equation shows that the power dissipation is proportional to the operating frequency. Moreover, reducing the power supply voltage drastically reduces the power dissipation. Ideally, a 3.3 V supply reduces the power dissipation by 56% compared to 5 V, and a 1 V supply reduces it by 96%. The dynamic power expression in Equation (4.45) is valid only for an inverter. For a complex gate, the concept of switching activity is introduced [see Section 4.5.3].
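The 56% and 96% figures follow directly from the $V_{DD}^2$ dependence in (4.45), because $C_L$ and f cancel in the ratio. The small Python check below uses an illustrative 100 fF load switching at 50 MHz. These values are assumptions and do not come from the text.

```python
def dynamic_power(c_load, vdd, freq):
    """Eq. (4.45): P_d = C_L * V_DD^2 * f for an inverter switching every cycle."""
    return c_load * vdd ** 2 * freq


def power_reduction(v_new, v_ref):
    """Fractional saving in P_d when V_DD drops from v_ref to v_new (same C_L and f)."""
    return 1.0 - (v_new / v_ref) ** 2


if __name__ == "__main__":
    # 100 fF load switching at 50 MHz (illustrative values).
    print(f"P_d at 5 V: {dynamic_power(100e-15, 5.0, 50e6) * 1e6:.1f} uW")
    for v in (3.3, 1.0):
        print(f"{v} V vs 5 V: {power_reduction(v, 5.0):.0%} less dynamic power")
```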
During the first output transition (charging) from 0 to $V_{DD}$, the energy drawn from the power supply is $E_d = C_L V_{DD}^2$. For this transition, the energy stored in the load capacitor is
$$E_c = \int_0^{V_{DD}} C_L V_o\,dV_o = \frac{1}{2}C_L V_{DD}^2 \tag{4.46}$$
So during the output transition $0 \rightarrow V_{DD}$, half of the energy drawn from the supply is stored in the capacitor and the other half is consumed by the pull-up PMOS transistor. During the output transition $V_{DD} \rightarrow 0$, the pull-down NMOS transistor consumes the energy $\frac{1}{2}C_L V_{DD}^2$ stored in the capacitor, and no current is drawn from the supply.
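The 50/50 split does not depend on how strong the pull-up is. To illustrate this, the sketch below replaces the PMOS with a linear resistor R and integrates the charging transient numerically. The resistor is a simplifying assumption, since the real device is nonlinear. The split still holds for the transistor because the supply delivers charge $C_L V_{DD}$ at voltage $V_{DD}$, whatever the current waveform.

```python
def charge_through_resistor(c_load, vdd, resistance, steps_per_tau=1000, taus=20):
    """Charge C_L from 0 to V_DD through a linear resistor (stand-in for the PMOS).

    Returns (energy drawn from the supply, energy stored in C_L, energy dissipated).
    Uses forward-Euler integration of C dV/dt = (V_DD - V) / R.
    """
    tau = resistance * c_load
    dt = tau / steps_per_tau
    v = 0.0
    e_supply = 0.0
    e_resistor = 0.0
    for _ in range(steps_per_tau * taus):
        i = (vdd - v) / resistance
        e_supply += vdd * i * dt                 # energy delivered by the supply
        e_resistor += i * i * resistance * dt    # energy turned into heat
        v += i * dt / c_load
    e_stored = 0.5 * c_load * v * v
    return e_supply, e_stored, e_resistor


if __name__ == "__main__":
    c, vdd = 100e-15, 3.3
    for r in (1e3, 1e5):
        es, est, er = charge_through_resistor(c, vdd, r)
        cv2 = c * vdd ** 2
        print(f"R = {r:8.0f} ohm: supply {es / cv2:.3f}, stored {est / cv2:.3f}, "
              f"dissipated {er / cv2:.3f}  (in units of C_L*V_DD^2)")
```

Changing R changes how long the charging takes but not the energy split. In both cases the supply delivers about $C_L V_{DD}^2$, half of which ends up in the capacitor.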
4.3.2.1 Energy vs. Power

It is important to distinguish between energy and power. If, for example, we reduce the clock rate f of a CMOS gate, its power consumption is reduced in the same proportion, but its energy per operation stays the same. Assume that the gate is powered by a battery to perform a computation. With a lower clock rate, the computation takes longer to complete. After the computation, the battery may therefore be just as drained as if the computation had been performed at a high clock rate. Low-energy design is thus more important than low-power design. The figure of merit in this case can be defined as the product of energy and delay. Throughout this book, the conventional term "low-power" is used to mean that we design for low energy.
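The battery argument can be checked with a back-of-the-envelope calculation. The sketch below assumes a hypothetical job of $10^9$ cycles, each switching 10 pF of effective capacitance at 3.3 V, and counts only dynamic power (4.45).

```python
def run_computation(cycles, c_eff, vdd, freq):
    """Energy, run time and average power for a job of `cycles` switching cycles.

    Assumes every cycle fully switches c_eff (dynamic power only, Eq. 4.45).
    """
    energy = cycles * c_eff * vdd ** 2  # J, independent of the clock rate
    run_time = cycles / freq            # s
    power = energy / run_time           # W, equals c_eff * vdd**2 * freq
    return energy, run_time, power


if __name__ == "__main__":
    for f in (100e6, 50e6):
        e, t, p = run_computation(1e9, 10e-12, 3.3, f)
        print(f"f = {f / 1e6:5.0f} MHz: E = {e * 1e3:.1f} mJ, t = {t:.0f} s, P = {p * 1e3:.2f} mW")
```

Halving the clock halves the power but doubles the run time, so the energy taken from the battery is unchanged. Only a lower $V_{DD}$ or a smaller switched capacitance reduces it.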
4.3.3 Short-circuit Power Dissipation

Even if there were no load capacitance on the output of the inverter and the parasitics were negligible, the gate would still dissipate switching energy. If the input changes slowly, the NMOS and PMOS transistors are both ON for a while, and excess power is dissipated by the short-circuit current. Fig. 4.13 shows the short-circuit current as the inverter switches, as a function of the rise time of the input. We assume that the rise time of the input is equal to its fall time. The short-circuit power is
$$P_{sc} = I_{mean}V_{DD} \tag{4.47}$$

To estimate $I_{mean}$, we use the simple model of the short-circuit current shown in Fig. 4.14 [5]. We also assume that the inverter has symmetrical devices, which means that $\beta_n = \beta_p = \beta$ and $V_{Tn} = -V_{Tp} = V_T$, and that the rise time is equal to the fall time of the input signal ($\tau_r = \tau_f = \tau$). The mean short-circuit current in the unloaded inverter is
$$I_{mean} = \frac{2}{T}\left[\int_{t_1}^{t_2} i(t)\,dt + \int_{t_2}^{t_3} i(t)\,dt\right] \tag{4.48}$$
Due to symmetry we have
$$I_{mean} = \frac{4}{T}\int_{t_1}^{t_2} i(t)\,dt \tag{4.49}$$
Figure 4.13 Short-circuit current as a function of the input slope
The NMOS transistor is operating in saturation, so the above equation becomes
$$I_{mean} = \frac{4}{T}\int_{t_1}^{t_2}\frac{\beta}{2}\left(V_{in}(t) - V_T\right)^2 dt \tag{4.50}$$
The input voltage is given by
$$V_{in}(t) = \frac{V_{DD}}{\tau}\,t \tag{4.51}$$
From Fig. 4.14 it can be derived that
$$t_1 = \frac{V_T}{V_{DD}}\,\tau \quad\text{and}\quad t_2 = \frac{\tau}{2} \tag{4.52}$$
Evaluating the integral then gives
$$P_{sc} = \frac{\beta}{12}(V_{DD} - 2V_T)^3\,\frac{\tau}{T} \tag{4.53}$$
Figure 4.14 Input voltage and short-circuit current model
This equation shows that the short-circuit power dissipation is also proportional to the frequency. At a given frequency and power supply, the only parameters the circuit designer can control to reduce $P_{sc}$ are β and τ. Scaling down the power supply greatly reduces the short-circuit power dissipation. Note that this analysis was done for an unloaded inverter. For a loaded gate whose output and input signals have equal rise/fall times, the short-circuit power dissipation is less than 20% of the total power [5]. It is therefore very important to keep the edges fast so that $P_{sc}$ is negligible, or at least to keep the input and output rise/fall times equal.
If the load capacitance is high, the output rise/fall times become longer than the input ones. In this case, the input changes completely before the output changes significantly, and the short-circuit current is near zero. Note also that if $V_{DD}$ approaches $(V_{Tn} + |V_{Tp}|)$ or falls below it, the short-circuit current can be eliminated, because the two devices cannot conduct simultaneously.
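Equation (4.53) can be checked against a direct numerical integration of the model of Fig. 4.14. The sketch makes the same idealizations as the derivation: symmetric devices, square-law saturation current, a linear input ramp and an unloaded inverter. The values used below, β = 100 µA/V², τ = 1 ns and T = 20 ns, are illustrative.

```python
def short_circuit_power(beta, vdd, vt, tau, period):
    """Closed form, Eq. (4.53): P_sc = beta/12 * (V_DD - 2 V_T)^3 * tau / T.

    Returns 0 when V_DD <= 2 V_T, since the two devices never conduct together.
    """
    if vdd <= 2.0 * vt:
        return 0.0
    return beta / 12.0 * (vdd - 2.0 * vt) ** 3 * tau / period


def short_circuit_power_numeric(beta, vdd, vt, tau, period, n=10_000):
    """Integrate Eqs. (4.49)-(4.52) with the midpoint rule as a cross-check."""
    t1, t2 = vt / vdd * tau, tau / 2.0
    dt = (t2 - t1) / n
    charge = 0.0
    for k in range(n):
        t = t1 + (k + 0.5) * dt
        v_in = vdd * t / tau                           # ramp input, Eq. (4.51)
        charge += 0.5 * beta * (v_in - vt) ** 2 * dt   # NMOS saturation current
    i_mean = 4.0 * charge / period                     # Eq. (4.49)
    return i_mean * vdd                                # Eq. (4.47)


if __name__ == "__main__":
    args = (100e-6, 3.3, 0.5, 1e-9, 20e-9)  # beta, V_DD, V_T, tau, T (illustrative)
    print(f"closed form: {short_circuit_power(*args) * 1e6:.3f} uW")
    print(f"numerical:   {short_circuit_power_numeric(*args) * 1e6:.3f} uW")
```

The two results agree to within the integration error. The cubic $(V_{DD} - 2V_T)$ term is why supply scaling cuts $P_{sc}$ so sharply, and why $P_{sc}$ vanishes once $V_{DD} \le 2V_T$.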
4.3.4 Other Power Issues

The total power dissipation of a CMOS gate is given by
$$P_{total} = P_s + P_d + P_{sc} \tag{4.54}$$
It represents the total power of a gate when it switches at the same rate as the operating frequency. In Chapter 8, we discuss how to estimate the power dissipation of a complex circuit.
Other power dissipation issues exist, such as worst-case power estimation and temperature effects. The worst-case conditions are maximum $V_{DD}$, maximum junction temperature, and a fast-fast process. Static power dissipation (subthreshold current) increases with temperature and with power supply voltage. Dynamic power is not sensitive to temperature, but it is greatly affected by the worst-case $V_{DD}$. Short-circuit power dissipation depends on temperature in the same way the short-circuit current does, and it also depends on the power supply. The mobility and the threshold voltage both decrease with increasing temperature, and these two parameters have opposite effects on the current. It is therefore important to evaluate the worst-case power consumption in any design.
The simulated average total power dissipation can easily be measured with the SPICE simulator using power measurement commands. Several papers in the literature have also introduced a "power meter" in circuit simulation to measure the power dissipation [6, 7, 8].
4.4 CAPACITANCE ESTIMATION
We saw previously that the speed and power dissipation of CMOS gates depend strongly on the total output load capacitance. This capacitance is the sum of three components, as shown in Fig. 4.15:
• Total input capacitance of the N driven gates, denoted $C_{in}$;
• Parasitic output capacitance of the driving gate, denoted $C_p$; and
• Wiring capacitance, denoted $C_w$.
For simplicity, in this section we estimate the average value of $C_L$ over the range of the output swing. This approach is used only for initial estimation in the design. Circuit simulation, layout extraction and post-layout simulation are needed for more accuracy. It is also sometimes useful to derive a simple expression for the load capacitance, to see how important parameters affect speed and power dissipation. We first examine the different components of the output load capacitance, and then illustrate the estimation approach with an example.
4.4.1 Estimation of $C_{in}$

The total capacitance of the driven gates can be evaluated by summing the input capacitances of all the receiving gates:
$$C_{in} = \sum_{j=1}^{N} C_{gate,j} \tag{4.55}$$
The gate capacitance of a receiving gate can be approximated by
$$C_{gate} = C_{ox}\sum_{i=1}^{n}(WL)_i \tag{4.56}$$
where n is the number of transistors in the gate. This expression sums the gate capacitances of all the transistors that make up the driven circuit. For a CMOS inverter it is given by
$$C_{gate} = C_{ox}(W_pL_p + W_nL_n) \tag{4.57}$$
Figure 4.16 shows an example of the equivalent gate capacitance of a receiving gate. The driven inverter has the following drawn sizes: $W_p = W_n = 20$ µm and L = 0.8 µm. This gate can be replaced by an equivalent capacitance $C_{geq} \approx 50$ fF, which is approximately the value calculated from Equation (4.57).
4.4.2 Parasitic Capacitances

Fig. 4.17 shows the main contributions to the output parasitic capacitance of a CMOS inverter. This capacitance is estimated by
$$C_p = C_{gdp} + C_{gdn} + C_{jp} + C_{jn} \tag{4.58}$$
The drain overlap capacitance of the NMOS and of the PMOS is given by
$$C_{gd} = C_o W \tag{4.59}$$
$C_o$ is defined in the SPICE parameters of Chapter 3 as CGDO. The drain junction capacitance depends on the reverse voltage applied to the junction during the switching of the inverter. Its average value over the range of the output swing is defined by
$$C_j = \bar{C}_{ja}A_D + \bar{C}_{jsw}P_D \tag{4.60}$$
where $A_D$ and $P_D$ are the area and the perimeter of the drain junction, as shown in Fig. 4.18. The average bottom junction capacitance is
$$\bar{C}_{ja} = \frac{1}{V_2 - V_1}\int_{V_1}^{V_2} C_{ja}(V)\,dV \tag{4.61}$$
The average sidewall capacitance is
$$\bar{C}_{jsw} = \frac{1}{V_2 - V_1}\int_{V_1}^{V_2} C_{jsw}(V)\,dV \tag{4.62}$$
where $V_1$ and $V_2$ are the junction reverse voltages at the two ends of the output swing.
4.4.3 Wiring Capacitance

The simple model of wiring capacitance is based on the parallel-plate model [Fig. 4.19] and is given by
$$C_{wa} = \frac{\varepsilon_{ox}}{H} \tag{4.63}$$
where H is the thickness of the insulator layer (oxide) and $C_{wa}$ is the capacitance per unit area. The total capacitance of the wire is
$$C_w = l\,W\,C_{wa} \tag{4.64}$$
where W is the width of the wire (metal or poly) and l is its length. Table 4.1 gives some values of the wiring capacitance per unit area for the 0.8 µm process presented in Chapter 2. This capacitance cannot be known in the early design stage, but it can be obtained after layout extraction.
When the thickness of the insulator becomes comparable to the thickness T of the wire, the fringing fields at the edges of the wire become important. These fringing fields effectively increase the area of the plates [Fig. 4.19]. Many approximations have been proposed to compute the fringing capacitance. One relatively accurate empirical approximation is given by [9]
$$C_{wl} = \varepsilon_{ox}\left[\frac{W}{H} + 0.77 + 1.06\left(\frac{W}{H}\right)^{0.25} + 1.06\left(\frac{T}{H}\right)^{0.5}\right] \tag{4.65}$$
where $C_{wl}$ is the total capacitance of the wire per unit length. The fringing contribution is significant in many cases. Table 4.2 shows the fringing capacitance per unit length.

Table 4.1 Typical 0.8-µm CMOS wiring capacitance per unit area.
Layer                          Area capacitance (aF/µm²)
Metal2 to substrate            11
Metal2 to metal1               25
Metal1 to substrate            19
Metal1 to poly                 28
Metal1 to diffusion            27
Gate poly over field oxide     58

Table 4.2 Typical 0.8-µm CMOS wiring fringing capacitance per unit length.
Layer                          Perimeter capacitance (aF/µm)
Metal2 to substrate            38
Metal2 to metal1               47
Metal1 to substrate            44
Metal1 to poly                 48
Metal1 to diffusion            47
Gate poly over field oxide     44
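One way to see how much the fringing terms add is to evaluate (4.65) alongside the parallel-plate estimate of (4.63) and (4.64). The sketch assumes $\varepsilon_{ox} = 3.9\varepsilon_0$ and an illustrative geometry that is not taken from the Chapter 2 process: W = 2 µm, T = 0.6 µm and H = 1 µm.

```python
EPS_OX = 3.9 * 8.854e-12  # F/m, SiO2 permittivity (relative permittivity 3.9 assumed)


def wire_cap_per_length(w, h, t, eps=EPS_OX):
    """Eq. (4.65): capacitance per unit length of a wire over a plane, with fringing.

    w: wire width, h: insulator thickness, t: wire thickness (same length unit).
    Returns F/m when eps is in F/m.
    """
    return eps * (w / h + 0.77 + 1.06 * (w / h) ** 0.25 + 1.06 * (t / h) ** 0.5)


def parallel_plate_per_length(w, h, eps=EPS_OX):
    """Eqs. (4.63)-(4.64) per unit length: eps * W / H (no fringing)."""
    return eps * w / h


if __name__ == "__main__":
    # Hypothetical geometry: 2 um wide, 0.6 um thick, 1 um above the plane.
    total = wire_cap_per_length(2.0, 1.0, 0.6)
    plate = parallel_plate_per_length(2.0, 1.0)
    # 1 F/m = 1e18 aF / 1e6 um = 1e12 aF/um.
    print(f"with fringing: {total * 1e12:.0f} aF/um, "
          f"parallel plate only: {plate * 1e12:.0f} aF/um")
```

For this geometry the fringing terms more than double the per-length capacitance. This is consistent with the large perimeter values in Table 4.2 compared with the area values in Table 4.1.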
4.4.4 Example
Consider en inverter with W, = 2W. = 20 pm with 3 pm length of each drain
and source. This inverter is driving B Line of metall of 100 pm length by 2 pm
width a d an inverter with W, = 2W, = 20 pm operating st VDD= 3.3 V.
Low- Voltage Low-Power VLSI CMOS Ctrcuit Design 145
The total load cspacitsnce is computed using the 0.8 p m device parameters
presented in Chapter 3 BI follows:
m The gate capacitance of the dzivcn inverter is
c, = [%L,+W"I;,IC,
= [20 x 0.8 + 10 x 0.81 x 2 f F w 48fF
. The total ovedap capacitance at the ontput is
,c
, = CGD,W, + CODhiW"
Then
C,, = 20 x 215 x lo-'+ 10 x 214 x lo-'

= 4.30 t 2.14 w 7 fF
rn The total drain junction capacitances can be approximated at mid-
voltage of 1.65 V (1/2 of V D ~instead
) of eompnting integrh. We
have far one drain junction
The drain areas are 60 pmaand 30 p d far PMOS and NMOS respec-
tively. The drain perimeters are 46 p m and 26 pm for the PMOS and
NMOS transistors respectively. The total junction capacitance can be
easily calculated and is
Cj s 3 2 f F
Note that this capacitance increaser with the power supply voltage
reduction.
• The wire capacitance is estimated by adding its two components, the parallel-plate capacitance and the fringing capacitance. The area of the wire is 200 μm², while its perimeter is 204 μm. Using the metal1-to-substrate values of Tables 4.1 and 4.2, we have
$$C_{\text{wire}} = W L\, C_{\text{area}} + 2(W + L)\, C_{\text{fringe}} = 200\ \mu\text{m}^2 \times 19 \times 10^{-3}\ \text{fF}/\mu\text{m}^2 + 204\ \mu\text{m} \times 44 \times 10^{-3}\ \text{fF}/\mu\text{m} = 3.8 + 9.0 \approx 13\ \text{fF}$$
Note that the fringing capacitance is an important portion of the total wire capacitance.
Hence the total capacitance at the output is about 100 fF. Note that the contribution of the junction capacitance is important. The contribution of each component varies from one circuit to another and depends on the layout style used. Before starting any circuit layout, it is important to have estimates of a few key capacitances: the gate and output capacitance of a unit-size inverter, and the wire capacitance of, for example, a 100 μm poly line and a 100 μm metal1 line. With these data, the different transistors can be sized correctly from the start of the design.
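The example's arithmetic can be collected into a small capacitance budget. This sketch reuses the per-unit values quoted above. The junction term is entered directly as the 32 fF result, because the junction parameters of Chapter 3 are not repeated in this section. The function name and its defaults are illustrative.

```python
# Load-capacitance budget of the 4.4.4 example (all results in fF).
C_OX = 2.0                      # gate-oxide capacitance, fF/um^2 (as used above)
CGDO_P, CGDO_N = 0.215, 0.214   # overlap capacitance per width, fF/um (as used above)
C_WIRE_AREA = 19e-3             # metal1-to-substrate, fF/um^2 (Table 4.1)
C_WIRE_FRINGE = 44e-3           # metal1-to-substrate perimeter, fF/um (Table 4.2)


def load_budget(wp=20.0, wn=10.0, l=0.8, wire_w=2.0, wire_l=100.0, c_junction=32.0):
    """Return each capacitance component of the driver's output node, in fF."""
    gate = (wp * l + wn * l) * C_OX                    # gates of the driven inverter
    overlap = CGDO_P * wp + CGDO_N * wn               # driver's gate-drain overlap
    wire = (wire_w * wire_l * C_WIRE_AREA              # parallel-plate part
            + 2 * (wire_w + wire_l) * C_WIRE_FRINGE)   # fringing part
    return {"gate": gate, "overlap": overlap, "junction": c_junction, "wire": wire}


budget = load_budget()
total = sum(budget.values())
for name, value in budget.items():
    print(f"{name:9s} {value:6.1f} fF  ({value / total:.0%})")
print(f"total     {total:6.1f} fF")
```

The printed shares make it easy to see which component dominates when, for example, the wire is lengthened or the driven inverter is resized.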
4.5 CMOS STATIC LOGIC DESIGN

Starting from the CMOS inverter, we can realize any static logic function by using complementary NMOS and PMOS transistors. In this section we present the design of NAND/NOR, complex and transmission gates. The fan-in of a complex gate is defined as its number of inputs. The fan-out of a complex logic gate is the number of driven inputs attached to its output.
4.5.1 NAND/NOR Gates

Fig. 4.20 shows a 2-input NAND gate (NAND2) and a 2-input NOR gate (NOR2). Each input requires a complementary pair. In the NAND gate, the PMOS transistors are connected in parallel, while the NMOS transistors are connected in series. In the NOR gate, the NMOS devices are connected in parallel, while the PMOS devices are connected in series. These gates consume only dynamic power. Their DC power dissipation is zero (provided the V_T's are high enough) because there is no DC path between V_DD and ground for any logic combination of the inputs. For the NAND and NOR gates of Fig. 4.20, no input combination (AB = 00, 01, 10, 11) creates a path between the two rails.
The design of these gates, or of any CMOS static gate, follows that of an inverter. As discussed in Sections 4.1 and 4.2, an inverter is designed to meet a given DC and transient performance, which determines (W/L)_n and (W/L)_p. The (W/L)_n and (W/L)_p of the devices of a logic gate are then determined as follows. For example, suppose we want a 3-input NAND (Fig. 4.21(a)) to have the same DC and transient performance as an inverter driving the same C_L (Fig. 4.21(b)).
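We assume that
$$W_{n1} = W_{n2} = W_{n3} = W_n \tag{4.66}$$
and
$$W_{p1} = W_{p2} = W_{p3} = W_p \tag{4.67}$$
The first step is to approximate the gate by an equivalent inverter, where the effective β of the series NMOS devices is given by
$$\frac{1}{\beta_{n,\text{eff}}} = \frac{1}{\beta_{n1}} + \frac{1}{\beta_{n2}} + \frac{1}{\beta_{n3}} = \frac{3}{\beta_n} \tag{4.68}$$
and, since in the worst case only one of the three parallel PMOS devices is ON,
$$\beta_{p,\text{eff}} = \beta_p \tag{4.69}$$
To place the switching threshold of the gate midway between the supply rails in the DC characteristics, the following condition should be satisfied for the 3-input NAND gate (see Equation 4.18):
$$\beta_{p,\text{eff}} = \beta_{n,\text{eff}} \tag{4.70}$$
which means that
$$\beta_p = \frac{\beta_n}{3} \tag{4.71}$$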
To have the same delay as an inverter whose sizes are already determined, we should have (assuming that L is the same)
$$W_{p1} = W_{p2} = W_{p3} = W_{pi} \tag{4.72}$$
and
$$W_{n1} = W_{n2} = W_{n3} = 3W_{ni} \tag{4.73}$$
where W_pi and W_ni are the widths of the reference inverter. In practice, however, the transistors composing the 3-input NAND gate should be made larger, because the output parasitic capacitance of the NAND gate (or of any complex gate) is larger than that of the inverter. Hence
$$W_p > W_{pi} \tag{4.74}$$
and
$$W_n > 3W_{ni} \tag{4.75}$$
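The first-order sizing rules of Eqs. (4.66) to (4.73) can be generalized to an m-input NAND. The sketch below encodes that generalization, assuming equal channel lengths and widths expressed in any consistent unit. The helper names and the reference inverter widths (W_ni = 10, W_pi = 20) are hypothetical.

```python
def nand_first_order_sizes(m, w_ni, w_pi):
    """First-order widths (W_n, W_p) for an m-input NAND matching a reference inverter.

    The m series NMOS devices act like one device with beta_n / m, so each must be
    m times wider than the inverter's NMOS (Eq. 4.73 for m = 3). In the worst case
    only one parallel PMOS conducts, so each keeps the inverter's width (Eq. 4.72).
    These are lower bounds: the gate's larger output parasitic capacitance calls
    for somewhat wider devices (Eqs. 4.74 and 4.75).
    """
    return m * w_ni, w_pi


def effective_beta_series(betas):
    """Effective beta of series-connected devices: 1/beta_eff = sum(1/beta_i)."""
    return 1.0 / sum(1.0 / b for b in betas)


print(nand_first_order_sizes(3, w_ni=10.0, w_pi=20.0))
print(effective_beta_series([3.0, 3.0, 3.0]))
```

Circuit simulation, as the text notes next, is still needed to refine these starting values.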
Note that the transistors can be properly sized by circuit simulation. Moreover, the back-gate bias (body) effect has to be taken into consideration in the design of the series NMOS devices of the NAND gate (or the series PMOS devices of the NOR gate). During switching, the series-connected MOSFETs exhibit a threshold voltage increase due to a non-zero source-substrate voltage, as shown in the simulation example of Fig. 4.22. In Fig. 4.22(a), the transistor N1 of the first NAND3 gate, near the output out1, is driven by the latest signal because N2 and N3 are already ON. Therefore, the node a1 is at ground level and the source of N1 is not subject to the body effect. In the other NAND3 gate, the transistors N4 and N5 are ON, while N6 receives the input signal. In this case, the nodes a2 and b2 are at some intermediate voltage level. Hence, during the discharging period, the transistors N4 and N5 are subject to the body effect. This effect slows the discharge of the output, as shown in Fig. 4.22(b): the output out1 is discharged more rapidly than the output out2. One way to reduce the body effect at the logic level is to place the transistor driven by the latest arriving signal near the output. The early arriving signals should be used to discharge the nodes susceptible to the body effect. For example, in an adder circuit, the transistor driven by the carry is placed near the output.
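The threshold increase of the series devices can be estimated with the standard long-channel body-effect expression V_T = V_T0 + γ(√(2φ_F + V_SB) − √(2φ_F)), which this section does not state. The parameter values below are placeholders chosen only to show the trend. They are not the 0.8 μm values of Chapter 3.

```python
import math


def threshold_with_body_effect(v_sb, vt0=0.7, gamma=0.5, two_phi_f=0.7):
    """NMOS threshold voltage (V) for a source-to-body voltage v_sb (V).

    Standard long-channel body-effect model; the defaults are illustrative only.
    """
    return vt0 + gamma * (math.sqrt(two_phi_f + v_sb) - math.sqrt(two_phi_f))


# A series device whose source sits at an intermediate node (such as a2 or b2
# in Fig. 4.22) sees v_sb > 0 and therefore a higher threshold.
for v_sb in (0.0, 0.5, 1.0, 1.65):
    print(f"V_SB = {v_sb:4.2f} V -> V_T = {threshold_with_body_effect(v_sb):.3f} V")
```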
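Let us derive the output parasitic capacitance of the m-input NAND gate and compare it to that of the CMOS inverter of Fig. 4.21(b). We have
$$C_o = m W_p C_{jp} + W_n C_{jn} + m W_p C_{GDO,p} + W_n C_{GDO,n} \tag{4.76}$$
where C_jp and C_jn denote the drain junction capacitances per unit width of the PMOS and NMOS devices, and C_GDO,p and C_GDO,n are their overlap capacitances per unit width. All m parallel PMOS drains load the output, but only the drain of the top series NMOS does. The NMOS contribution of the m-input gate is therefore larger than that of the CMOS inverter by the ratio W_n/W_ni. From the above equation, it is clear that C_o of the m-input NAND gate is larger than that of the CMOS inverter.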
Note that, for the same performance and the same number of inputs, the NAND gate consumes less silicon area than a NOR gate. This is because its series-connected devices are the higher-mobility NMOS transistors, which need less widening than the series PMOS devices of a NOR gate. Hence, CMOS NAND gates are more widely used than NOR gates. Moreover, the NOR gate consumes more power than the NAND gate.
4.5.2 Complex CMOS Logic Gates

The strategy used to build NAND/NOR gates can be extended to build more complex logic gates. Complex logic functions can be realized by connecting several NAND, NOR and INVERTER gates. However, they can also be realized efficiently using a single CMOS logic gate. Any complex CMOS gate is formed by two logic blocks, N and P, as shown in Fig. 4.23(a). The two blocks have the same number of transistors. Fig. 4.23(b) shows a three-input complex CMOS gate and its equivalent logic symbol. The topology of the N block is the dual of the P block, i.e., parallel connections become series connections and vice versa. In either the P or the N logic block, the parallel combination is placed far from the output. This minimizes the output capacitance, which improves the speed and possibly reduces the dynamic power dissipation. For example, the contribution of the N block to the output capacitance in Fig. 4.23(b) is less than that in Fig. 4.23(c). There is no direct DC path between V_DD and ground for any logic input combination. In practice, complex CMOS gates are used up to a maximum fan-in of about 6.
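The duality between the N and P blocks can be expressed as a small series/parallel network model. The sketch below derives the P block from an N block. It then checks, for every input vector, that exactly one block conducts, which is why the gate has no static V_DD-to-ground path. The network encoding and the example function OUT = NOT(A·(B + C)) are hypothetical and need not match Fig. 4.23(b).

```python
from itertools import product

# A switch network is an input name (one transistor) or a tuple
# ("s", part, part, ...) for series / ("p", part, part, ...) for parallel.


def dual(net):
    """Topology of the complementary block: series <-> parallel."""
    if isinstance(net, str):
        return net
    kind, *parts = net
    return ("p" if kind == "s" else "s", *(dual(p) for p in parts))


def conducts(net, values, on_level):
    """True if the network has a conducting path; a device is ON when its input == on_level."""
    if isinstance(net, str):
        return values[net] == on_level
    kind, *parts = net
    results = [conducts(p, values, on_level) for p in parts]
    return all(results) if kind == "s" else any(results)


def count_transistors(net):
    return 1 if isinstance(net, str) else sum(count_transistors(p) for p in net[1:])


n_block = ("s", "A", ("p", "B", "C"))   # hypothetical gate: OUT = NOT(A . (B + C))
p_block = dual(n_block)                  # ("p", "A", ("s", "B", "C"))

for bits in product((0, 1), repeat=3):
    v = dict(zip("ABC", bits))
    pull_down = conducts(n_block, v, on_level=1)   # NMOS is ON for a logic 1
    pull_up = conducts(p_block, v, on_level=0)     # PMOS is ON for a logic 0
    assert pull_up != pull_down                    # never both: no static VDD-GND path
    print(bits, "OUT =", int(pull_up))
```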
Figure 4.23 CMOS complex gates: (a) general structure with N and P logic blocks; (b) and (c) three-input examples.
4.5.3 Switching Activity Concept

So far, we have discussed the dynamic power dissipation of an inverter due to the load capacitance. What about a CMOS complex gate driving a load capacitance? In a complex gate, the dynamic power dissipation has two components: the internal cell power, P_internal, and the capacitive load power. The internal cell power is the power dissipated by the internal capacitive nodes. Sometimes the internal short-circuit power is added to the internal cell dynamic power.
The dynamic power of a complex gate cannot be estimated by the simple expression C_L V_DD² f, because the gate does not necessarily switch every time the clock switches. The switching activity determines how often this switching occurs on a capacitive node. Over N periods, the switching activity α gives the fraction of periods in which the output makes a 0 → V_DD transition. In other words, α represents the probability³ that a 0 → V_DD transition occurs during the period T = 1/f, where f is the periodicity of the gate inputs. The average dynamic power of a complex gate due to the output load capacitance is
$$P_d = \alpha C_L V_{DD}^2 f \tag{4.77}$$
The internal power dissipation, due to the internal capacitive nodes, can be characterized by simulation. Fig. 4.24 illustrates an example of a complex gate with internal nodes. The internal dynamic power of a cell is given by
$$P_{\text{internal}} = \sum_{i=1}^{n} \alpha_i C_i V_i V_{DD} f \tag{4.78}$$
where n is the number of internal nodes, α_i is the switching activity of node i, C_i is the parasitic capacitance of internal node i, and V_i is the internal voltage swing of node i. The parasitic capacitance at the output is included in the load C_L. Note that the internal voltage swing can be different from V_DD.
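Equations (4.77) and (4.78) translate directly into code. The sketch assumes SI units throughout. The supply, frequency, capacitances and activities in the usage lines are hypothetical, and the internal short-circuit power mentioned above is not modeled.

```python
def load_power(alpha, c_load, vdd, f):
    """Eq. (4.77): average dynamic power due to the output load, in W."""
    return alpha * c_load * vdd ** 2 * f


def internal_power(nodes, vdd, f):
    """Eq. (4.78): nodes is a list of (alpha_i, C_i, V_i) tuples for the internal nodes."""
    return sum(alpha_i * c_i * v_i * vdd * f for alpha_i, c_i, v_i in nodes)


# Hypothetical values: 3.3 V supply, 20 MHz input rate, 100 fF load with activity 3/16,
# and two internal nodes, one of which swings only to about VDD - VT (2.6 V).
VDD, F = 3.3, 20e6
p_load = load_power(3 / 16, 100e-15, VDD, F)
p_int = internal_power([(0.25, 10e-15, 3.3), (0.1875, 8e-15, 2.6)], VDD, F)
print(f"load: {p_load * 1e6:.2f} uW, internal: {p_int * 1e6:.2f} uW")
```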
² During this transition, the energy C_L V_DD² is drawn from the supply.
³ We assume that the gate does not experience glitching.

4.5.4 Switching Activity of Static CMOS Gates

In this section we consider the computation of the switching activity of static CMOS gates. Dynamic gates and other circuit styles are discussed in the next sections. We first consider the case of a NOR gate and then treat several other static gates. Table 4.3 gives the truth table of the NOR gate. From the table, the probability that the output is at zero is 3/4 and the probability that it is at one is 1/4. The probability of a 0 → V_DD transition is computed by multiplying the probability that the output is at zero, P_0, by the probability that it is at one, P_1:
$$P_{NOR2} = P_0 \cdot P_1 = \frac{3}{4} \times \frac{1}{4} = \frac{3}{16} \tag{4.79}$$
We assume that the inputs are uniformly distributed (i.e., P(A = 1) = P(B = 1) = 1/2).
It can be shown that, for any boolean function whose successive input vectors are independent, the activity of a static gate is given by
$$\alpha = P(0 \rightarrow 1) = P_0 \cdot P_1 \tag{4.80}$$
where P_0 is the number of zeros in the truth table divided by the total number of input combinations (N = 2^n for an n-input gate), and P_1 is the number of ones divided by N. P_0 is also equal to (1 − P_1). Fig. 4.25 shows the probability that the output makes a 0 → 1 transition for several static gates. The input probabilities are assumed to be uniformly distributed.
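Equation (4.80) can be checked by enumerating a gate's truth table. This sketch assumes uniformly distributed inputs that are independent from one period to the next, and it uses exact fractions to avoid rounding. For NOR2 it reproduces the 3/16 of Eq. (4.79), and it also gives the output activity of the 6-input AND considered next. The gate functions are written here for illustration.

```python
from fractions import Fraction
from itertools import product


def static_activity(func, n_inputs):
    """alpha = P0 * P1 (Eq. 4.80) for uniformly distributed, independent inputs."""
    outputs = [func(*bits) for bits in product((0, 1), repeat=n_inputs)]
    p1 = Fraction(sum(outputs), len(outputs))    # fraction of ones in the truth table
    return (1 - p1) * p1


def nor2(a, b):
    return int(not (a or b))


def nand2(a, b):
    return int(not (a and b))


def xor2(a, b):
    return a ^ b


def and6(*x):
    return int(all(x))


for name, fn, n in [("NOR2", nor2, 2), ("NAND2", nand2, 2), ("XOR2", xor2, 2), ("AND6", and6, 6)]:
    print(name, static_activity(fn, n))   # 3/16, 3/16, 1/4, 63/4096
```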
Figure 4.25 Output activities for static logic gates with uniformly distributed inputs.
4.5.4.1 Example
As an example of a logic decision for low power, consider different implementations of a 6-input AND gate driving a 0.1 pF load. As shown in Fig. 4.26, we may compare the following implementations:
• Implementation 1: a 6-input NAND and an inverter.
• Implementation 2: two 3-input NANDs and one 2-input NOR.
• Implementation 3: three 2-input NANDs and one 3-input NOR.
The library used for this comparison is a high-performance standard cell library optimized for speed. Table 4.4 shows some characteristics of the library. The reported average delay is the average of the rise and fall delay times. W_p = 2W_n = 10 μm is used for all the transistors composing the different gates. The delay is a function of the output load capacitance⁴ C_L, in pF. The area is expressed in a unit area called the cell grid, where each unit area of a cell has a certain height and width. The table also includes the input capacitance of each gate and its output parasitic capacitance, in fF. For this example, we make the following assumptions:
• We neglect the wiring capacitance between the different cells; and
• We also neglect the internal power of each gate.
⁴ This capacitance does not include the output parasitic capacitance.
Gate     Area          Output      Input       Average delay
type     (cell unit)   cap. (fF)   cap. (fF)   (ns, C_L in pF)
INV      2             85          48          0.22 + 1.00 C_L
NAND2    3             105         48          0.30 + 1.24 C_L
NAND3    4             132         48          0.37 + 1.50 C_L
NAND6    7             200         48          0.65 + 2.30 C_L
NOR2     3             101         48          0.27 + 1.50 C_L
NOR3     4             117         48          0.31 + 2.00 C_L
First we compare the delay and the area of the different implementations. The delay can either be computed from the data of Table 4.4 or simulated with SPICE, and both results are reported in Table 4.5. Implementations 2 and 3 are faster than the first one. However, they require more area.
                        Implem. 1   Implem. 2   Implem. 3
Area (cell unit)        9           11          13
Computed delay (ns)     1.1         0.85        0.87
SPICE delay (ns)        1.1         0.86        0.83
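The computed-delay row of Table 4.5 can be reproduced from Table 4.4 under two assumptions made here. First, each first-level gate is loaded only by one input of the second-level gate, with wiring neglected. Second, the path delay is the sum of the two stage delays. The dictionary layout and function names are illustrative.

```python
# Table 4.4 (high-performance library):
# (area in cell units, input capacitance in pF, intrinsic delay d0 in ns, slope k in ns/pF);
# average delay = d0 + k * C_L.
LIB = {
    "INV":   (2, 0.048, 0.22, 1.00),
    "NAND2": (3, 0.048, 0.30, 1.24),
    "NAND3": (4, 0.048, 0.37, 1.50),
    "NAND6": (7, 0.048, 0.65, 2.30),
    "NOR2":  (3, 0.048, 0.27, 1.50),
    "NOR3":  (4, 0.048, 0.31, 2.00),
}
C_OUT = 0.1  # external load on the AND output, pF

# (first-level gate, how many of them, second-level gate)
IMPLEMENTATIONS = {1: ("NAND6", 1, "INV"), 2: ("NAND3", 2, "NOR2"), 3: ("NAND2", 3, "NOR3")}


def evaluate(first, count, second, lib=LIB, c_out=C_OUT):
    """Return (area, path delay in ns) for one implementation."""
    area1, _, d1, k1 = lib[first]
    area2, cin2, d2, k2 = lib[second]
    area = count * area1 + area2
    delay = (d1 + k1 * cin2) + (d2 + k2 * c_out)   # stage 1 drives one input of stage 2
    return area, delay


for impl, spec in IMPLEMENTATIONS.items():
    area, delay = evaluate(*spec)
    print(f"Implementation {impl}: area = {area} cell units, delay = {delay:.2f} ns")
```

With these assumptions the sketch gives about 1.08, 0.86 and 0.87 ns, close to the computed row of Table 4.5.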
Let us now compare the power dissipation using the power cost function, defined by
$$\text{Power cost} = \sum_i P_{0 \rightarrow 1,i}\, C_i \tag{4.86}$$
where P_{0→1,i} is the probability of a 0 → 1 transition at node i and C_i is the total capacitance at node i. We assume that the inputs A, B, C, D, E and F are uncorrelated and random (i.e., P_1 = 0.5). For the implementations of Fig. 4.26, we compute the transition probabilities. Table 4.6 summarizes the probability computation for the different nodes in the circuit.
Implementation 1    O1         Output
P_1                 63/64      1/64
P_0 = 1 − P_1       1/64       63/64
P_0→1               63/4096    63/4096

Implementation 2    O1      O2      Output
P_1                 7/8     7/8     1/64
P_0 = 1 − P_1       1/8     1/8     63/64
P_0→1               7/64    7/64    63/4096
Note that the node O1 in implementation 1 has a much lower switching activity (63/4096) than the internal nodes of the other two implementations. The primary inputs are not included in the power cost function. Table 4.7 gives the results of this calculation. The results indicate that implementation 1 has the lowest power, so technology mapping is important for low-power applications.
We now consider another example, using a low-area 0.8 μm CMOS standard cell library for the 6-input AND implementation. Some characteristics of this library are shown in Table 4.8. Compared to the library presented in Table 4.4, this library uses small transistors with W_p = W_n = 4 μm, and its cell area unit is smaller by a factor of 1.5. Note that the delays of the different gates are higher. However, the gate input and output parasitic capacitances are lower. Thus, this library can be used for low-power function implementation.
Table 4.8 Characteristics of a low-area 0.8 μm CMOS library
Gate     Area          Output      Input       Average delay
type     (cell unit)   cap. (fF)   cap. (fF)   (ns, C_L in pF)
INV      2             35          13          0.23 + 3.73 C_L
NAND2    3             60          13          0.28 + 4.40 C_L
NAND3    4             65          13          0.34 + 6.00 C_L
NAND6    7             81          13          0.53 + 7.13 C_L
NOR2     3             62          13          0.35 + 6.27 C_L
NOR3     4             69          13          0.47 + 8.84 C_L
                    Implem. 1   Implem. 2   Implem. 3
Power cost (fF)     3.5         19.5        43.7
The delays reported in Table 4.8 do not include the effect of the input voltage slope. The delay of the different implementations was simulated with SPICE and is almost the same for all configurations, about 1.5 ns. Using the same reasoning as above, we can compute the power cost function with this library. The transition probabilities are the same, and only the total node capacitances differ. The results of the power cost evaluation are given in Table 4.9.
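The power cost of Eq. (4.86) can be evaluated for both libraries with the same transition probabilities. As in the text, the sketch assumes uncorrelated random inputs, no wiring capacitance, no internal power and no contribution from the primary inputs. It also assumes that each internal node is charged by the driving gate's output parasitic capacitance plus one input capacitance of the gate it drives. The table encoding and function names are illustrative.

```python
from fractions import Fraction

# (output parasitic capacitance, input capacitance) per gate type, in fF.
HIGH_PERF = {"INV": (85, 48), "NAND2": (105, 48), "NAND3": (132, 48),
             "NAND6": (200, 48), "NOR2": (101, 48), "NOR3": (117, 48)}   # Table 4.4
LOW_AREA = {"INV": (35, 13), "NAND2": (60, 13), "NAND3": (65, 13),
            "NAND6": (81, 13), "NOR2": (62, 13), "NOR3": (69, 13)}      # Table 4.8

C_LOAD_FF = 100.0  # 0.1 pF external load

IMPLEMENTATIONS = {1: ("NAND6", 1, "INV"), 2: ("NAND3", 2, "NOR2"), 3: ("NAND2", 3, "NOR3")}


def p01(p1):
    """Probability of a 0 -> 1 transition, Eq. (4.80)."""
    return (1 - p1) * p1


def power_cost(first, count, second, lib):
    """Eq. (4.86) for the 6-input AND; primary inputs excluded, wiring neglected."""
    fanin = 6 // count                           # inputs per first-level NAND
    p1_internal = 1 - Fraction(1, 2 ** fanin)    # NAND output is 1 unless all inputs are 1
    p1_out = Fraction(1, 64)                     # AND6 output is 1 for one of 64 vectors
    c_internal = lib[first][0] + lib[second][1]  # NAND output + one driven input, fF
    c_out = lib[second][0] + C_LOAD_FF
    return count * p01(p1_internal) * c_internal + p01(p1_out) * c_out


for name, lib in (("high-performance", HIGH_PERF), ("low-area", LOW_AREA)):
    costs = [float(power_cost(*IMPLEMENTATIONS[i], lib)) for i in (1, 2, 3)]
    print(name, [round(c, 2) for c in costs])
```

For the low-area library the results agree with Table 4.9 to within about 0.1 fF. For every implementation, the low-area cost is roughly half of the high-performance one.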
The power cost with the low-area library is almost half of that with the high-performance library. Implementation 1 still has the lowest power, while its speed is almost the same as that of the others. Its area is also lower than that of the other implementations. This example shows that power dissipation can be reduced at the gate level. The conclusion remains valid even when the wire capacitances between the cells are taken into account. Low-power design at the gate level is discussed further in Chapter 8. Keep in mind that this comparison does not consider the internal power of the gates.
4.5.5 Glitching Power
Note that the probabilities discussed so far assume that the gates have zero delay. Under that assumption, glitches are not taken into account, and only transitions between stable states are considered. Glitches must be considered once gates are assumed to have non-zero delay. Thus, the total dynamic power of a circuit is the sum of the zero-delay dynamic power and the glitching power. So what is the glitching phenomenon?
In a static logic gate, the output or internal nodes can switch before the correct logical value becomes stable. To illustrate this spurious transition, Fig. 4.27 shows an example of a circuit with a cascaded configuration. When the inputs ABC make the transition 100 → 111, the output of a circuit built from zero-delay gates should stay high. However, if each gate has a unit delay, the output O1 changes later than the input C. The output Z therefore evaluates with the new value of C and the old value of O1. In that case, the output experiences a dynamic hazard (glitch). This transition increases the dynamic power of the circuit and adds a dynamic component to the switching activity.
Another example is shown in Fig. 4.28(a). The cascaded circuit exhibits a glitching problem. However, the same function can be implemented with a balanced-delay structure, as shown in Fig. 4.28(b). Some rules to avoid this problem are:
• Balance delay paths, particularly on highly loaded nodes. If possible, insert buffers to equalize the fast path;
• Avoid, if possible, the cascaded implementation; and
• Redesign the logic when the power due to the glitches is an important component.
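A unit-delay simulation shows how a delay imbalance produces a glitch and how a buffer on the fast path removes it. The circuit used here is a hypothetical stand-in, chosen so that, as in the text, the output should stay high for the input change ABC = 100 → 111: O1 = A·B feeds Z = XNOR(O1, C). It is not necessarily the circuit of Fig. 4.27 or Fig. 4.28.

```python
def xnor(a, b):
    return 1 - (a ^ b)


def unit_delay_trace(old, new, balanced=False, steps=4):
    """Trace of output Z for the hypothetical circuit O1 = A.B, Z = XNOR(O1, C).

    Each gate output at time t + 1 is computed from its inputs at time t; the primary
    inputs switch from `old` to `new` at t = 0. With balanced=True, C reaches the
    XNOR through a one-unit buffer, equalizing the two paths into Z.
    """
    a, b, c = old
    o1, c_buf = a & b, c                         # settled values before the input change
    z = xnor(o1, c_buf if balanced else c)
    trace = [z]
    a, b, c = new
    for _ in range(steps):
        z = xnor(o1, c_buf if balanced else c)   # uses the previous-step O1 and buffer
        o1, c_buf = a & b, c                     # gates update for the next step
        trace.append(z)
    return trace


def rising_transitions(trace):
    return sum(1 for x, y in zip(trace, trace[1:]) if (x, y) == (0, 1))


for balanced in (False, True):
    t = unit_delay_trace((1, 0, 0), (1, 1, 1), balanced)
    print("balanced" if balanced else "cascaded", t, "0->1 count:", rising_transitions(t))
```

The cascaded version produces the trace [1, 0, 1, 1, 1], i.e., one spurious 0 → 1 transition, while the buffered version stays at 1 throughout.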
4.5.6 Basic Physical Design

To implement simple gates, a physical layout must be drawn. It is usually easy to draw the layout of a gate with well-arranged transistors. For example, Fig. 4.29(a) shows a possible layout implementation of the inverter, in which metal1 is used for the power lines. Many variations can be drawn, depending on the use of the gate. Fig. 4.29(b) shows another layout variation of the inverter, where metal2 is used for the power lines. For clarity, the wells and body ties are not shown in these layouts.

Similarly, the schematics of the NAND2 and NOR2 gates can be converted to layouts. Fig. 4.30(a) shows one possible layout of a two-input NAND gate. The layout can also be arranged with the input poly lines drawn vertically. The layout artist should draw the gate taking into consideration the environment of the cell (its connectivity to other cells). Fig. 4.30(b) shows the layout of a two-input NOR gate. Note that the junction areas should be optimized during layout to reduce the power dissipation and improve the speed of the cell. An implementation of a NOR gate with a high output drain junction capacitance is shown in Fig. 4.31.
To lay out a complex gate (i.e., several tens of transistors), the following general guidelines can be used:
• Set the sizing of the transistors composing the gate;
• Run V_DD and V_SS horizontally in metal (1 or 2), for example V_DD at the top and V_SS at the bottom of the cell, in a semi-rectangular form;
• Define the orientations of the polysilicon gate lines and order them for maximum active-area crossover to form the gate regions;
• Place the N-block (NMOS transistors) near V_SS and the P-block (PMOS transistors) near V_DD. The PMOS devices should be located in a common N-well if they use the same bulk potential;
• Adhere to the design rules and, if possible, use an interactive DRC (Design Rule Checker);
• Keep the internal junction and wire capacitances to a minimum to minimize the power and the delay; and
• Complete the connections between the different nodes inside the cell using the available layers (metal1, poly, etc.).
Note that the power line widths are drawn taking into consideration the current consumed by the cell, because the electromigration phenomenon sets the minimum width of conductors.
For low-power design, these are some layout guidelines:
• Identify the nodes with high switching activity in your circuit;
• Use low-capacitance layers, such as metal1 and metal2, for these high-activity nodes;
• Keep the wires of high-activity nodes short;
• Use low-capacitance layers for highly capacitive nodes and busses;
• For large-width devices, use special layouts, such as interdigitated fingers [3] and donut (round) transistors, to achieve a low drain junction capacitance; and
• Design complex cells or blocks using a custom approach as much as possible.
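The benefit of finger layouts can be illustrated by counting drain diffusion area. The sketch assumes that all diffusion strips have the same length and that shared strips are not widened for contacts, which real design rules may require. It applies the idea to the 20 μm PMOS of Section 4.4.4 and is only a geometric approximation.

```python
def drain_area(width, diff_length, fingers=1):
    """Drain diffusion area of a transistor of total `width` split into `fingers`.

    Assumes every diffusion strip has the same length `diff_length`. For an even
    finger count, the two outer strips are taken as sources, so drains are shared.
    """
    strip_width = width / fingers
    drain_strips = fingers // 2 if fingers % 2 == 0 else (fingers + 1) // 2
    return drain_strips * strip_width * diff_length


# The 20 um PMOS of the 4.4.4 example, with 3 um drain/source diffusions.
for nf in (1, 2, 3, 4):
    print(f"{nf} finger(s): drain area = {drain_area(20.0, 3.0, nf):.1f} um^2")
```

Splitting the device into two fingers halves the drain area (60 → 30 μm²), which lowers the area component of the drain junction capacitance accordingly.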
4.5.7 Physical Design Methodologies

There are many layout methodologies for the physical implementation of a complex circuit. The first methodology is called full-custom design, where the layout of each transistor is optimized. A complex block is laid out by full-custom design for reasons of speed. However, this style leads to low design productivity and is rarely used in ASICs and digital processors. When low power is an issue, though, full-custom design can be used to reduce the power of the circuit.
Another design methodology is the standard-cell approach (or semi-custom design). In this approach, the library provides several gates and functions, such as:
• NAND, NOR, XOR, AOI, OAI, latches, buffers, multiplexers, full adders, flip-flops, etc.;
• Linear cells: low-battery detector, power-up reset, etc.;
• MSI/LSI functions: ALU (Arithmetic and Logic Unit), counters, magnitude comparators, etc.;
• Compiled macrocells: register file, FIFO (First In First Out), ROM (Read Only Memory), parallel multiplier, etc.; and
• Macrocells: 8-bit microcontroller, 16-bit fixed-point DSP, UART (Universal Asynchronous Receiver/Transmitter), etc.
A circuit is designed by capturing the schematic or the functional model (VHDL, Verilog, etc.) of the cells. The layout is generated by automatic placement and routing. An example of a CMOS standard cell library can be found in [10].
In the standard cell approach, the logic cells have the same height and a variable width. In many libraries, the cells are available in two layout styles. In the area-optimized style, the cells are made as small as possible. In the performance-optimized style, the cells are optimized for high-speed performance and, as a result, occupy more area than the small cells. Even the cell height differs between the two styles. A typical standard cell layout for a NAND gate is shown in Fig. 4.32. This methodology provides lower cost and higher productivity than the full-custom one. For low-power applications, the small and large cells for the same function can be chosen carefully to optimize the power of a complex design without violating the timing requirements.
The third layout methodology is the gate array⁵. Gate arrays consist of pre-implemented cells and need only the personalization steps. Fig. 4.33 illustrates an example of a gate-array core using a Sea-Of-Gates structure. It consists of I/O and internal cell areas. The I/O cell area contains pads with input/output buffers. The internal cell array contains a continuous array of NMOS and PMOS transistors. The transistors are therefore predefined, and only the interconnect layers need to be customized. The design of a logic gate consists of wiring the different transistors using metallization and contacts. A logic gate is isolated by tying the polysilicon gates of the limiting transistors to V_SS or V_DD, depending on the diffusion type (V_SS for NMOS, V_DD for PMOS). Routing channels run over unused transistors. This methodology reduces the design cost at the expense of area, power and performance. One recent gate-array architecture was based on multiplexers with small transistors to maintain low-power characteristics [11].
Figure 4.32 An example of standard cell layout (NAND2).
Figure 4.33 Gate-array core (Sea-Of-Gates): I/O cell area, V_DD and V_SS metal lines, P-diffusion, N-diffusion and polysilicon gates.
Comparing these layout approaches, the full-custom methodology offers the best way to minimize power dissipation. However, for a complex design, this strategy is costly. The standard cells approach provides good performance and an improved design time. However, in many libraries the devices are oversized for performance purposes, and consequently the power dissipation is high. To use the standard cells technique efficiently for low-power applications, the library should be expanded to include several versions of the same function with different driving capabilities. In that case, powerful synthesis tools are needed to optimize the power while maintaining the timing specifications. Moreover, both the standard cell and gate array styles require new place and route tools for low-power design.
Figure 4.34 (a) CMOS transmission gate; (b) and (c) schematic symbols.
4.5.8 Conventional CMOS Pass-Transistor Logic

Another alternative to CMOS static complementary logic is conventional pass-transistor logic based on MOS switches. Fig. 4.34 shows a CMOS transmission gate (TG) as the primitive element. It consists of a complementary pair connected in parallel and acts as a switch, with the logic variable A as the control input. If A is low, the gate is OFF and presents a high resistance between its terminals. If A is high, the gate is ON and acts as a switch whose on-resistance is R_n and R_p in parallel. The equivalent resistance of the TG is R_TG = R_n ∥ R_p, which is always less than the smaller of R_n and R_p. This permits a fast switching characteristic. When the input I is at V_DD, the output F is charged quickly, initially by the NMOS and then, at the end, by the PMOS transistor, as illustrated by the equivalent resistances of Fig. 4.35. In this figure, we assume that A and Ā are already set to their final values when V_out = 0. During this transient switching phase, the NMOS is subject to the body effect while the PMOS is not. When a zero at the input I is to be transmitted, the PMOS is subject to the body effect. The PMOS and NMOS transistors should be sized so that they charge and discharge the output symmetrically. If V_Tn = |V_Tp| and the body effect is symmetrical, the devices can be sized so that β_n = β_p. Sometimes, equal-sized NMOS and PMOS devices can be used. It is easy to see that the delay of the TG is approximately independent of the input level. This is not the case if the pass logic uses a single-channel transistor. A drawback of the CMOS TG is that it consumes more area than a single-channel transmission gate (NMOS TG or PMOS TG). Thus, if area is of prime concern, NMOS TGs are used.
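The parallel-resistance argument can be checked numerically. The resistance values below are hypothetical snapshots of a 0 → V_DD transition. In such a transition, the NMOS on-resistance grows as the output rises and the PMOS takes over.

```python
def tg_resistance(r_n, r_p):
    """On-resistance of a CMOS transmission gate: R_n in parallel with R_p."""
    return r_n * r_p / (r_n + r_p)


# Hypothetical on-resistances (ohms) at three points of a 0 -> V_DD output transition.
for r_n, r_p in ((5e3, 20e3), (10e3, 10e3), (60e3, 8e3)):
    r_tg = tg_resistance(r_n, r_p)
    assert r_tg < min(r_n, r_p)          # always below the smaller of the two
    print(f"R_n = {r_n / 1e3:4.0f}k, R_p = {r_p / 1e3:4.0f}k -> R_TG = {r_tg / 1e3:.2f}k")
```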
Any CMOS TG logic function (called here conventional pass-transistor logic) can be implemented using the TG primitive element described above. In such an implementation, the transistor count, and hence the silicon area, is low compared to a standard static CMOS implementation. This advantage is most visible in functions such as multiplexing, demultiplexing, decoding and addition.
Fig. 4.36 shows a 4-to-1 multiplexer, where the data lines A, B, C and D are controlled by S1 and S2 such that
$$F = A\,\bar{S}_1\bar{S}_2 + B\,S_1\bar{S}_2 + C\,\bar{S}_1 S_2 + D\,S_1 S_2 \tag{4.87}$$
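The multiplexer can be modeled behaviorally by treating each TG as a switch that either passes its data input or leaves the node floating. The select encoding is an assumption of this sketch: A is selected for S1 = S2 = 0, B for S1 = 1 and S2 = 0, C for S1 = 0 and S2 = 1, and D for S1 = S2 = 1. The check that exactly one path drives the output reflects the fact that the complementary select signals turn on only one series TG pair at a time.

```python
def tg(control, data):
    """Ideal transmission gate: passes `data` when ON, otherwise high impedance (None)."""
    return data if control else None


def mux4(a, b, c, d, s1, s2):
    """4-to-1 TG multiplexer in which each data line passes through two series TGs."""
    paths = [
        tg(not s2, tg(not s1, a)),
        tg(not s2, tg(s1, b)),
        tg(s2, tg(not s1, c)),
        tg(s2, tg(s1, d)),
    ]
    driven = [p for p in paths if p is not None]
    assert len(driven) == 1, "exactly one path must drive the output"
    return driven[0]


for s2 in (0, 1):
    for s1 in (0, 1):
        print(f"S1={s1} S2={s2} -> F = {mux4('A', 'B', 'C', 'D', s1, s2)}")
```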
This form of logic is used when the inputs and their logic complements are available. The implementation does not need V_DD or ground lines. However, it suffers from several drawbacks: the driving capability of the circuit is limited, and the delay increases with long TG chains. Moreover, the circuit does not restore the logic levels, i.e., the logic gates are passive, with no gain elements. Fig. 4.37 shows an example of how to restore the voltage levels in chained TGs. When 8 TGs are put in series, the output signal changes very slowly. However, when an inverter stage is added every 4 TG stages, the level is restored, as shown in the SPICE voltage waveforms of Fig. 4.37.
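The growth of delay with chain length, and the benefit of periodic restoring inverters, can be sketched with the Elmore delay of an RC ladder. Elmore's model is used here in place of the SPICE simulation of Fig. 4.37. The per-TG resistance, node capacitance and inverter delay are hypothetical, and the logic inversion introduced by the inverters is ignored.

```python
def tg_chain_delay(n, r, c):
    """Elmore delay of n series TGs, each with on-resistance r and each node loaded by c.

    Node i sees the resistance of the i TGs before it, so tau = r*c*(1 + 2 + ... + n).
    """
    return r * c * n * (n + 1) / 2


def buffered_chain_delay(n, k, r, c, t_inv):
    """Same chain split into groups of k TGs, each group followed by a restoring inverter."""
    assert n % k == 0, "this sketch assumes n is a multiple of k"
    return (n // k) * (tg_chain_delay(k, r, c) + t_inv)


R_TG, C_NODE = 5e3, 20e-15   # hypothetical: 5 kOhm per TG, 20 fF per node
for n in (4, 8, 16):
    plain = tg_chain_delay(n, R_TG, C_NODE)
    buffered = buffered_chain_delay(n, 4, R_TG, C_NODE, t_inv=0.2e-9)
    print(f"{n:2d} TGs: unbuffered {plain * 1e9:.2f} ns, inverter every 4: {buffered * 1e9:.2f} ns")
```

The unbuffered delay grows quadratically with n, while the buffered version grows linearly. This is why splitting long chains pays off despite the added inverter delays.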
CMOS TG logic can be used in CMOS circuit design, offering an extra degree of design freedom. An example is the full adder (adder circuits are discussed in detail in Chapter 7). Fig. 4.38 shows the schematic of the XOR gate used by the adder. When the input A is low, Ā is high: the transmission gate TG is closed, and the output is equal to B. When A is high, Ā is low: the inverter formed by the transistors N and P is enabled, the TG is open, and the output is equal to B̄. To implement an adder, let us first review its functions. The boolean functions of a full adder are:
$$S_{out} = A \oplus B \oplus C_{in} \tag{4.88}$$
$$C_{out} = A \cdot B + C_{in}\,(A + B) \tag{4.89}$$
A and B are the inputs, C_in is the carry input, S_out is the sum output, and C_out is the carry output. The truth table of an adder is shown in Table 4.10.
The CMOS implementation of a one-bit full adder is shown in Fig. 4.39(a). It requires 28 transistors and has two gate delays. In this circuit, the transistors controlled by the carry signal C_in should be placed close to the output. This offsets the body effect problem, since the carry is the latest arriving signal. An optimized implementation of the full adder is shown in Fig. 4.39(b). It uses only 18 transistors and is based on the XOR function of Fig. 4.38 and on TG gates. This adder is therefore more compact, faster and less power-hungry than the complementary static one.
Figure 4.38 TG XOR gate.
Table 4.10 Adder Truth Table
A   B   C_in   S_out   C_out
0   0   0      0       0
0   1   0      1       0
1   0   0      1       0
1   1   0      0       1
0   0   1      1       0
0   1   1      0       1
1   0   1      0       1
1   1   1      1       1
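A behavioral model of the TG XOR of Fig. 4.38 is enough to check Eqs. (4.88) and (4.89) against the truth table. The sketch models logic values only. It does not model the 18-transistor circuit of Fig. 4.39(b), and the function names are illustrative.

```python
from itertools import product


def tg_xor(a, b):
    """Behavior of the TG XOR of Fig. 4.38: A = 0 passes B through the TG,
    A = 1 enables the inverter and the output is NOT B."""
    return b if a == 0 else 1 - b


def full_adder(a, b, c_in):
    s_out = tg_xor(tg_xor(a, b), c_in)        # Eq. (4.88)
    c_out = (a & b) | (c_in & (a | b))        # Eq. (4.89)
    return s_out, c_out


for a, b, c_in in product((0, 1), repeat=3):
    s_out, c_out = full_adder(a, b, c_in)
    assert a + b + c_in == s_out + 2 * c_out  # arithmetic check of every row
    print(a, b, c_in, s_out, c_out)
```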
4.5.9 CMOS Static Latch

Fig. 4.40 shows a cross-coupled CMOS static latch. In the storage mode (input LD = 0), when node A is high and B is low, P1 and N2 are ON while P2 and N1 are OFF. Similarly, when A is low and B is high, P1 and N2 are OFF while P2 and N1 are ON. The standby power dissipation of the cell is very small. The state of the latch is changed by turning the two transmission gates ON (LD high) and applying the input and its complement.
Figure 4.40 CMOS cross-coupled static latch.
4.6 CMOS LOGIC STYLES

CMOS logic is known to have negligible static power dissipation. However, this holds only as long as V_T is not too low. Moreover, complementary CMOS is relatively slow and consumes a large area, because an n-input gate requires 2n transistors. As a result, faster and smaller logic gates are sometimes desirable, possibly at the cost of parameters such as noise margins and power dissipation. This section discusses several CMOS logic alternatives to complementary CMOS, as well as the clocking issues in a VLSI system.
4.6.1 Pseudo-NMOS CMOS Logic

The gate area of complementary CMOS can be reduced if CMOS circuits are designed in a way similar to the NMOS circuit family [12]. A PMOS device replaces the depletion-type load device of the NMOS family. This type of circuit is referred to as pseudo-NMOS, as shown in the inverter of Fig. 4.41. When the input A is low, the output is high, at V_DD. When A goes high, N turns ON while P is still ON. In this case, the output never reaches zero. Instead, it takes a value V_OL determined by the ratio β_n/β_p, and the logic is called ratioed. To examine V_OL, we use a simple analysis. When A is at V_DD, N is in the linear region while P is saturated. Equating the currents using simple square-law models, with V_Tn = |V_Tp| = V_T, we have
$$\beta_n\left[(V_{DD} - V_T)V_{OL} - \frac{V_{OL}^2}{2}\right] = \frac{\beta_p}{2}(V_{DD} - V_T)^2$$
Thus V_OL depends strongly on the ratio β_p/β_n. For example, if we need V_OL = 0.04 V_DD with V_T = 0.2 V_DD, then the ratio β_p/β_n should be at most about 0.1. If the NMOS transistor is minimum size, the PMOS must be weak to provide adequate noise margins (low V_OL), and the rise time of the gate then becomes too slow. Improving the rise time, on the other hand, forces wider devices to meet the ratio condition, which increases the gate area and hence the input capacitance.
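The current balance above has a closed-form solution. Under the same square-law models and equal thresholds, taking the smaller root gives V_OL = (V_DD − V_T)(1 − √(1 − β_p/β_n)). The sketch evaluates this expression and also inverts it to find the largest ratio for a target V_OL. V_DD = 3.3 V is an assumed value; the ratio itself does not depend on it.

```python
import math


def pseudo_nmos_vol(ratio_p_over_n, vdd, vt):
    """V_OL of a pseudo-NMOS inverter from the current balance (V_Tn = |V_Tp| = vt).

    Smaller root of beta_n[(VDD - VT) V - V^2 / 2] = (beta_p / 2)(VDD - VT)^2.
    """
    if not 0 < ratio_p_over_n <= 1:
        raise ValueError("beta_p/beta_n must be in (0, 1] for a linear-region solution")
    return (vdd - vt) * (1 - math.sqrt(1 - ratio_p_over_n))


def max_ratio_for_vol(vol, vdd, vt):
    """Largest beta_p/beta_n that still gives an output low level of at most `vol`."""
    vov = vdd - vt
    return (2 * vov * vol - vol ** 2) / vov ** 2


VDD = 3.3
VT = 0.2 * VDD
print(f"beta_p/beta_n <= {max_ratio_for_vol(0.04 * VDD, VDD, VT):.4f}")
print(f"V_OL at beta_p/beta_n = 0.1: {pseudo_nmos_vol(0.1, VDD, VT):.3f} V")
```

For V_OL = 0.04 V_DD and V_T = 0.2 V_DD it returns β_p/β_n ≈ 0.0975, in line with the value of about 0.1 quoted above.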
Although this circuit offers a reduction in total transistor count and ease of layout, it has the disadvantage of non-zero static power dissipation. Since the pull-up PMOS is always ON, a current flows from V_DD to ground whenever the pull-down section of the pseudo-NMOS gate is ON. This current is the source of the static power dissipation. Consider a pseudo-NMOS gate whose output is at V_OL and which drives another such gate. The driven gate, whose pull-down section is nominally OFF, leaks a high subthreshold current because its NMOS inputs sit at V_OL rather than 0 V. This current is still lower than the current that flows when the pull-down is ON. An n-input pseudo-NMOS gate has (n + 1) transistors. Fig. 4.42 illustrates an example of a complex gate implemented in pseudo-NMOS style. This logic has been used in many applications, such as decoding logic for memories and PLAs. Because of its high static power, it is not suitable for low-power applications.
4.6.2 Dynamic CMOS Logic

To reduce the area and improve the speed of CMOS circuits, another popular style called dynamic logic is used. Fig. 4.43 shows a dynamic CMOS gate. This logic is referred to as domino CMOS logic [13]. The domino gate shown in Fig. 4.43(a) consists of a dynamic CMOS circuit followed by a static CMOS buffer. The dynamic circuit consists of a PMOS precharge transistor P1, an evaluation NMOS transistor N1, a storage capacitor C, and an N-logic block, which is a series-parallel combination of NMOS transistors activated by the inputs and implementing the required logic. The storage capacitance represents the parasitic capacitance at node A.

Figure 4.42 Pseudo-NMOS complex logic gate.
This circuit uses a single clock phase clk. During the precharge phase (clk = 0), the storage capacitance is charged through the PMOS pull-up P1 to VDD, and the inputs have no effect since there is no path to ground. The output of the buffer is precharged to ground. During the evaluation phase (clk = 1), N1 is ON and, depending on the logic performed by the N-logic block, node A is either discharged or stays precharged.
Fig. 4.43(b) shows an example of a complex gate. In a cascaded set of domino logic stages, as shown in Fig. 4.44, the first stage evaluates and causes the next one to evaluate (like a row of falling dominoes). The number of cascaded stages is limited by the duration of the evaluation clock phase.
Compared to pseudo-NMOS, domino logic has the same input capacitance and an improved rise time. However, the fall time is affected since there is one more transistor in the pull-down section. Also, the gate is suitable for high-fanout operation because of the CMOS buffer. Moreover, it is area-efficient for high fan-in because n + 4 transistors are required, compared to 2n for a static CMOS gate.
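The device counts quoted so far can be tabulated directly. The sketch below simply encodes them (the style names are arbitrary labels); it shows that domino needs fewer devices than static CMOS only for n > 4, while pseudo-NMOS stays lowest at the price of static power.

```python
def transistor_count(style, n):
    """Device count of an n-input gate in each style, as quoted in the text."""
    counts = {
        "static": 2 * n,       # complementary PMOS pull-up + NMOS pull-down
        "pseudo_nmos": n + 1,  # n NMOS pull-down devices + one always-ON PMOS load
        "domino": n + 4,       # n N-logic devices + P1, N1 + two-transistor inverter
    }
    return counts[style]


for n in (2, 4, 6, 8):
    print(n, {s: transistor_count(s, n) for s in ("static", "pseudo_nmos", "domino")})
```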
Figure 4.44 Domino logic chain (stages 1, 2 and 3).

Some limitations of the gate are:
■ The domino gate has a problem called charge sharing or redistribution. Fig. 4.45 gives an example to explain this problem. During the precharge, node A is at VDD and charge C·VDD is stored on the capacitance C. We assume (worst case) that the parasitic capacitances of nodes B and C, C1 and C2 respectively, have zero charge. During the evaluation, node A should stay at VDD; however, due to C1 and C2, charge sharing takes place. Using the charge conservation principle before and after redistribution, we have

$$C\,V_{DD} = (C + C_1 + C_2)\,V_A \qquad (4.92)$$

Hence the final voltage of node A is

$$V_A = \frac{C}{C + C_1 + C_2}\,V_{DD} \qquad (4.93)$$

If, for example, C1 = C2 = 0.5C, this voltage would be VDD/2. This voltage can alter the logic and cause the CMOS buffer to dissipate high static power.
■ If the clock frequency is too low, node A loses the charge stored on C through leakage currents. The dynamic node can lose its charge in a time of a few hundreds of µs to a few ms, depending on the temperature, the storage capacitance and the leakage current. When using power-down techniques, the dynamic nodes should not be left floating for a long time. If the leakage is high with low-VT devices, the charge can be depleted in a time as low as 100 ns. This problem is similar to charge sharing. Fig. 4.46 shows two alternatives to solve the problems of charge sharing and leakage. In Fig. 4.46(a), a weak PMOS (low W/L) is added as a pull-up transistor. This circuit operates like pseudo-NMOS during the evaluation phase; hence it consumes some static power. If the circuit operates at high frequency, the added weak PMOS has no role because it does not have enough time to act. Note that this weak PMOS increases the output capacitance and thus slows the dynamic gate. To eliminate the DC path during evaluation, the gate of the weak PMOS can be driven from the output of the CMOS buffer, as shown in Fig. 4.46(b). This circuit adds another capacitance at the output of the inverter. A third alternative circuit, which solves only the problem of charge sharing, is shown in Fig. 4.47. In this circuit configuration, intermediate nodes of the complex gate are precharged with additional precharge PMOS devices.

Figure 4.45 Charge sharing in domino CMOS logic.
■ Another limitation of the domino logic gate is that it implements only non-inverting logic functions. However, this is not a serious limitation and can be overcome, if the need arises, by using static CMOS gates. The designer can mix both static and dynamic CMOS logic circuits in a given design to optimize the overall performance.
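Both failure mechanisms above reduce to simple charge arithmetic. The sketch below evaluates Eq. (4.93) and the time t = C·ΔV/I_leak needed for a constant leakage current to pull a floating node down by ΔV. The 50 fF node, the 0.5 V tolerable droop and the two leakage currents are hypothetical values, chosen only to land in the ranges quoted in the text.

```python
def charge_shared_voltage(c_store, c_parasitic, vdd):
    """Dynamic-node voltage after charge sharing, worst case (Eqs. 4.92-4.93).

    c_parasitic lists the internal-node capacitances, assumed fully discharged.
    Charge conservation: C*VDD = (C + sum(Ci)) * VA.
    """
    return c_store * vdd / (c_store + sum(c_parasitic))


def leakage_hold_time(c_store, delta_v, i_leak):
    """Time for a constant leakage current to pull a floating node down by delta_v."""
    return c_store * delta_v / i_leak


vdd = 3.3
print(f"VA = {charge_shared_voltage(1.0, [0.5, 0.5], vdd):.2f} V")  # VA = 1.65 V (VDD/2)

# Hypothetical node: 50 fF, at most 0.5 V droop tolerated
for i_leak in (10e-12, 250e-9):  # low-leakage devices vs. leaky low-VT devices
    t = leakage_hold_time(50e-15, 0.5, i_leak)
    print(f"I_leak = {i_leak:.1e} A -> hold time = {t:.1e} s")
```

With these assumed numbers the hold time drops from about 2.5 ms to about 100 ns, which is why keepers or careful power-down sequencing matter once low-VT devices are used.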
Historically, dynamic design styles have been credited with low-power characteristics because of their reduced device count. Moreover, dynamic gates do not experience the short-circuit power dissipation and glitching problems of static circuits. However, to drive the clocked transistors, a large clock distribution network is needed. This highly loaded network consumes a significant amount of dynamic power, particularly at high operating frequency. The switching activities of dynamic gates are higher than those of static gates. In a dynamic gate, the output makes a 0 → 1 transition during the precharge cycle only if the N-block discharged the output during the evaluation phase. Hence, the probability of a 0 → 1 transition is given by

$$P_{0\to1} = P_0 \qquad (4.94)$$

where P0 is the probability that the output is "0". For a two-input NAND dynamic gate, the output is zero for only one of the 4 input states. So,

$$P_{0\to1} = P_0 = \frac{1}{2^2} = \frac{1}{4} \qquad (4.95)$$

For a NOR2 gate, the output is zero for 3 of the 4 input states, so we have

$$P_{0\to1} = P_0 = \frac{3}{4} \qquad (4.96)$$
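The activity argument can be checked by enumerating truth tables. The sketch below assumes independent, equiprobable inputs. The static-gate expression P0(1 − P0), which additionally assumes inputs uncorrelated from one cycle to the next, is a standard result added here only for comparison; it is not derived in this section. Function names and the 20 fF / 100 MHz power example are hypothetical.

```python
from itertools import product


def prob_zero(func, n_inputs):
    """P0: fraction of input states (independent, equiprobable inputs) giving output 0."""
    states = list(product((0, 1), repeat=n_inputs))
    return sum(1 for s in states if func(*s) == 0) / len(states)


def activity_dynamic(func, n_inputs):
    """Precharged gate: every evaluation to 0 is followed by a 0->1 precharge, Eq. (4.94)."""
    return prob_zero(func, n_inputs)


def activity_static(func, n_inputs):
    """Static gate, inputs independent from cycle to cycle: P(0->1) = P0 * (1 - P0)."""
    p0 = prob_zero(func, n_inputs)
    return p0 * (1 - p0)


def switching_power(alpha, c_load, vdd, f_clk):
    """Average switching power alpha * C * VDD^2 * f of one node."""
    return alpha * c_load * vdd ** 2 * f_clk


def nand2(a, b):
    return 1 - (a & b)


def nor2(a, b):
    return 1 - (a | b)


for name, gate in (("NAND2", nand2), ("NOR2", nor2)):
    print(name, "dynamic:", activity_dynamic(gate, 2), "static:", activity_static(gate, 2))
# NAND2 dynamic: 0.25 static: 0.1875
# NOR2 dynamic: 0.75 static: 0.1875

# Hypothetical 20 fF dynamic NOR2 output at 3.3 V and 100 MHz
print(f"{switching_power(activity_dynamic(nor2, 2), 20e-15, 3.3, 100e6) * 1e6:.1f} uW")
```

The dynamic NOR2 switches four times as often as its static counterpart under these assumptions, before even counting the clock load.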
Another refinement of domino CMOS logic is shown in Fig. 4.48 [14], where the CMOS buffer is removed. N and P logic blocks are alternated and each drives the other. When clk is low (0), the first and third stages are precharged high and the second stage is precharged low.
Fig. 4.49 shows another NP domino logic called NORA (NO RAce) [15]. Two sections, clk and clk̄, are shown in Fig. 4.49. NORA is constructed by cascading N and P blocks followed by a C²MOS (clocked CMOS) latch. CMOS buffers (inverters) are used to provide logic inversion. When clk = 1 (evaluation phase in the clk section), the C²MOS latch³ operates like an inverter. When clk = 0, the latch moves into the hold state because its clocked output NMOS and PMOS transistors are OFF. In this case, the old data is latched at the output. This latch is used to avoid signal races. A NORA pipeline is shown in Fig. 4.50; it consists of alternating clk and clk̄ sections. Signal races do not occur in this structure because of the use of C²MOS latches. Another logic style, called Zipper CMOS logic, has been proposed to overcome charge sharing by using additional clocking signals. For more details refer to [16].
³ See the example of the DEC Alpha chip in Section 4.8.4.

Figure 4.48 NP domino logic.
An example of a pipelined full-adder (FA) NORA circuit is shown in Fig. 4.51. This cell can be used in many designs, such as a pipelined multiplier. The output C²MOS latches can use only three transistors rather than four: a clocked NMOS and a clocked PMOS transistor can be removed from the output C²MOS latches. The reason is that during the precharge phase (clk = 0), the output nodes A and B are set to ground and VDD, respectively. Thus, the data-driven transistors of the latches (such as P1) are turned OFF. Hence, the clocked transistors can be removed, and the FA cell is still isolated from other sections during precharge.
4.6.3 Design Style Comparison

If we compare the design styles discussed above, static CMOS logic is the slowest, but its power efficiency is the best, particularly if minimum-size devices are used. Hence, it is suitable for low-power, medium-speed applications. Note that static CMOS logic occupies the largest chip area because complementary functions are needed. The circuit designer can include pass-transistor logic in static logic to improve the speed and area. The pseudo-NMOS logic style can be faster than static CMOS logic; however, its rise time is long, because the PMOS must be kept weak to obtain a low output logic level. Moreover, the most serious drawback of pseudo-NMOS logic is its high power dissipation in the standby mode. N-P domino logic is fast, because it has a small input capacitance like pseudo-NMOS logic and an improved rise time. The power dissipation of this logic is high due to the high switching activity of the clock, even if the circuit is not used. However, power-down techniques can be used to control the clock of the logic. Using this style requires the designer to spend more design effort than with the static style to solve all the problems of dynamic logic, such as charge sharing, clock skew, precharging, etc. Finally, we note that pass-transistor logic is very promising for high-performance low-voltage low-power applications.

Figure 4.49 (a) NORA clk-section.
Figure 4.50 NORA pipeline logic.
Figure 4.51 Pipelined full-adder NORA circuit.
Figure 4.52 Clock skew.
4.6.4 Clock Skew in Dynamic Logic

Clock skew is a critical design parameter in high-speed circuits. Fig. 4.52 shows the clock skew in single complementary-phase clock signals. If clk̄ is generated from clk, clock skew is possible. The time skew is measured between the half-VDD points of the clk and clk̄ signals. In the presence of clock skew, a glitch can be transmitted from one section to another, as illustrated in the example of Fig. 4.53(b). This structure contains one stage between the two C²MOS latches, and a glitch can be transmitted to the last C²MOS latch. The example of Fig. 4.53(c) does not have this problem. It has been shown that, to eliminate the signal race in N-P domino logic, an even number of inversions should be used between stages [17]. Moreover, the clock skew should be minimized to improve the speed of dynamic circuits. One possible solution for single complementary-phase clock generation, with minimal skew and insensitive to process variations, is the one shown in Fig. 4.54 [18]. The delays from the input clock to clk and to clk̄ are equalized with special buffer sizing.
4.7 CLOCKING
One way to synchronize thousands of signals in a VLSI system is to employ a clocking strategy. The clock controls the flow of data in the digital system and reduces the complexity of design.

Figure 4.55 Clocked pipeline system.
Most VLSI processors are constructed using a set of functional blocks (ALU, shifter, register file, etc.) connected via pipeline registers, as shown in the example of Fig. 4.55. The clock signal can be split into one, two, three or four phases. Typically the phases are non-overlapping.

First we present the different storage elements (latches, registers); then we treat two clocking strategies, single-phase and two-phase, with emphasis on the former, which is usually the main option available in standard-cell and gate-array approaches. Clock distribution issues are discussed in Section 4.9.4.
4.7.1 Storage Elements

There are many types of storage elements. Some of those used in VLSI design are the following:
4.7.1.1 D-Latch
The D-latch is sometimes called a level-sensitive latch. Its operation is shown in Fig. 4.56. The output changes with the input when the clock is high (case of a positive level-sensitive latch). The D input must be stable within a time window around the clock transition that closes the latch, i.e., the falling transition for a positive level-sensitive latch (Fig. 4.57). The input data is passed to the output within a delay t_d. The time window is defined by two times, called the setup time t_s and the hold time t_h. The setup time, t_s, is the time during which the D input must be stable prior to the clock edge. More specifically, it is the delay between the input of the latch and the storage node. The hold time, t_h, is the time during which the D input must remain stable after the clock edge. This time relates to the delay between the clock input and the storage point.
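The setup/hold window can be checked mechanically. The sketch below is a hypothetical helper (not from the text) that flags data transitions falling inside the forbidden window [t_edge − t_s, t_edge + t_h] around whichever clock edge closes the storage element; the times are made-up numbers in ns.

```python
def check_setup_hold(data_edges, clock_edge, t_setup, t_hold):
    """Flag D transitions that violate the setup/hold window around clock_edge.

    A transition at time t violates setup if clock_edge - t_setup < t < clock_edge,
    and violates hold if clock_edge <= t < clock_edge + t_hold.
    """
    violations = []
    for t in sorted(data_edges):
        if clock_edge - t_setup < t < clock_edge:
            violations.append((t, "setup"))
        elif clock_edge <= t < clock_edge + t_hold:
            violations.append((t, "hold"))
    return violations


# Hypothetical timing in ns: capturing edge at 10.0, t_s = 0.3, t_h = 0.1
print(check_setup_hold([5.0, 9.8, 10.05, 12.0], 10.0, 0.3, 0.1))
# [(9.8, 'setup'), (10.05, 'hold')]
```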
There are a variety of implementations for this D-latch. Fig. 4.58 reviews some of the static versions. The circuit of Fig. 4.58(a) has a weak inverter used as a feedback path for latch mode. The voltage at node A is not changed by noise or leakage because the feedback inverter keeps the level. The feedback inverter should have a low W/L for its NMOS and PMOS (weak inverter) compared to the transmission gate and forward inverter. This ensures that the transmission gate is capable of overdriving the feedback inverter when data is being written to the latch. The feedback inverter should be carefully sized to guarantee switching for all process corners and the maximum fanout condition.
The problem of ratioed design in Fig. 4.58(a) can be avoided by using the modified version in Fig. 4.58(b), where a transmission gate is added in the feedback path. When clk = 1, the data is passed to the storage node and the feedback node is disconnected. When clk = 0, the feedback loop is closed, and the latch is in store (latch) mode. Fig. 4.58(c) shows another version of Fig. 4.58(b), where the outputs are buffered. This latter latch is found in the cell libraries of standard-cell and gate-array approaches. All these static latches store their state even if the clock is stopped. Note that these latches do not dissipate any DC power.
To reduce the size of the static latches, dynamic versions can be used, as illustrated in Figs. 4.59, 4.60 and 4.61. Fig. 4.59 shows a simple dynamic latch, where the storage node A temporarily stores the data. Note that latches have a property called "transparency": the output follows the input when the clock is asserted. Otherwise they are "opaque". Fig. 4.60 shows two other latches [19]. The circuit of Fig. 4.60(a) is transparent when the clock clk is high and latches the data (opaque) when the clock is low. This latch is positive level-sensitive. The negative level-sensitive version is shown in Fig. 4.60(b). Note that these latches use one clock line (clk).
The circuits of Fig. 4.60 have reduced noise immunity. For example, for the circuit of Fig. 4.60(a), when the latch is opaque (clk = 0), node A may be tristated high with Q tristated low. Node A is then isolated and may be susceptible to noise, which reduces its voltage. The reduced voltage of node A can cause the PMOS P2 to leak current, thereby destroying the output Q. This problem was addressed with latches designed for the DEC Alpha microprocessor [21]. For example, the circuit of Fig. 4.61 is an improved version of the Yuan and Svensson latch [19]. A weak PMOS device P3 is added to solve the noise problem in the positive level-sensitive latch. The operation of this latch is as follows. When clk is high, P1, N1 and N3 function like an inverter, and P2, N2 and N4 function like another inverter. Therefore the latch passes the input D to the output Q. If D falls to low, then A is high and Q is low. When clk is low, N3 and N4 are OFF. If D goes high, P1 is OFF, while nodes A and Q are tristated high and low, respectively. The added P3, in this case, is ON and holds P2 OFF. This device supplies current to node A and counters any noise.

Figure 4.60 Simple dynamic CMOS single-clock latch.
Figure 4.61 Non-inverting dynamic latch with improved noise immunity.
For reliability reasons, many latches have been designed for the DEC Alpha chip [21]. Some are illustrated in Fig. 4.62. These latches have been designed for all process corners and circuit conditions (supply voltage, temperature, rise/fall times, etc.). The results showed no appreciable evidence of race-through for clk rise/fall times at or below 0.8 ns. With 1-ns rise/fall times, the latches showed some signs of failure. A 0.5-ns rise/fall time was set for the clock in this chip.
4.7.1.2 Edge-Triggered D Flip-Flop (ETDFF)
This flip-flop is sometimes called an edge-triggered register. Fig. 4.63 shows a static (buffered) version of the positive edge-triggered D flip-flop, together with its voltage waveforms. It is constructed by using two latches. The first one, called master, is transparent while the clock is low (negative level-sensitive). The second one, called slave, is transparent while the clock is high (positive level-sensitive). When the clock is low, storage node A follows the input, while node B stores the old data and is disconnected. Then, when the clock makes a transition from 0 to 1, node A stores the input value present during the transition and then ceases to sample any input data. When clk = 1, the master is in the hold mode and node A passes the data to storage node B of the slave latch, which is then passed to the outputs Q and Q̄. In this case, the output is disconnected from the input D. Hence, the flip-flop does not have the transparency property of the latch. When the clock returns to 0, the slave is in hold mode. By reversing the two latches, a negative edge-triggered flip-flop can be constructed. This circuit can be found in standard-cell and gate-array libraries and represents an important cell in synchronous design.
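To make the difference between latch transparency and edge triggering concrete, here is a minimal discrete-time behavioral sketch (class names are hypothetical). It assumes the master is transparent while clk = 0 and the slave while clk = 1, as in the description above, and that D is sampled once per time step; a D change in the same step as the rising clock is treated as arriving just after the edge.

```python
class DLatch:
    """Level-sensitive D latch: transparent while clk == transparent_level."""

    def __init__(self, transparent_level):
        self.transparent_level = transparent_level
        self.q = 0  # stored value (assumed initial state)

    def step(self, d, clk):
        if clk == self.transparent_level:
            self.q = d  # transparent: output follows the input
        return self.q   # opaque: output holds the stored value


class MasterSlaveDFF:
    """Positive edge-triggered DFF: master transparent at clk = 0, slave at clk = 1."""

    def __init__(self):
        self.master = DLatch(transparent_level=0)  # drives storage node A
        self.slave = DLatch(transparent_level=1)   # drives storage node B / Q

    def step(self, d, clk):
        a = self.master.step(d, clk)
        return self.slave.step(a, clk)


clk = [0, 1, 1, 0, 0, 1, 1, 0]
d = [1, 1, 0, 0, 0, 0, 1, 1]
latch = DLatch(transparent_level=1)  # positive level-sensitive latch
dff = MasterSlaveDFF()
print([latch.step(di, ci) for di, ci in zip(d, clk)])  # [0, 1, 0, 0, 0, 0, 1, 1]
print([dff.step(di, ci) for di, ci in zip(d, clk)])    # [0, 1, 1, 1, 1, 0, 0, 0]
```

The latch output follows D while clk is high (steps 2 and 6, counting from 0), whereas the flip-flop output changes only at the two rising edges.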
With high operating frequency, it is desirable to balance the delays of clk and clk̄ locally to reduce the clock skew problem. Clock skew in a single-phase strategy can lead to invalid data storage.
A dynamic version of the positive ETDFF is shown in Fig. 4.64 [19]. The operation of this circuit is illustrated by the voltage waveforms. The hold time of this flip-flop is close to zero [20]. Compared to the static one, this dynamic flip-flop needs only 9 transistors and one clock line. The negative ETDFF is shown in Fig. 4.65.
4.7.1.3 Miscellaneous
Many other latches and flip-flops are available, for example in gate-array libraries, such as the JK flip-flop and the toggle (T) flip-flop. Fig. 4.66 shows the T flip-flop with reset control. When clk = 1, the output Q is complemented, whereas when clk = 0, Q keeps its old state. This T flip-flop provides a divide-by-2 operation. A JK flip-flop is shown in Fig. 4.67. When the J and K inputs are low, the outputs are maintained on the positive edge of the clock. If J = 0 and K = 1, the output Q is set to 0, whereas when J = 1 and K = 0, the output Q is set to 1. When both J and K are high, the outputs are complemented.
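The JK and T behaviors just described can be summarized by their next-state equations, evaluated once per active clock edge. In the sketch below the function names and the separate toggle input t are illustrative assumptions; a T flip-flop that toggles on every active clock event corresponds to t = 1 at each edge, which yields the divide-by-2 output.

```python
def jk_next(q, j, k):
    """JK next state at the active clock edge: Q+ = J*not(Q) + not(K)*Q."""
    return (j & (1 - q)) | ((1 - k) & q)


def t_next(q, t):
    """Toggle flip-flop: complement Q when t = 1, otherwise hold."""
    return q ^ t


# Characteristic table: (J, K) -> [Q+ for Q = 0, Q+ for Q = 1]
for j, k in ((0, 0), (0, 1), (1, 0), (1, 1)):
    print(j, k, [jk_next(q, j, k) for q in (0, 1)])

# Divide-by-2: toggling on every active edge gives half the input frequency
q, out = 0, []
for _ in range(6):
    q = t_next(q, 1)
    out.append(q)
print(out)  # [1, 0, 1, 0, 1, 0]
```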
4.7.2 Single-Phase Clocking

A generic single-phase finite-state machine (FSM) is shown in Fig. 4.68. The storage element can be either a latch or a register (flip-flop). The latch case demands a more constrained design because of the transparency property of the latch. When the latch is transparent, the state signals can pass through the logic block more than once during one clock cycle. To avoid race conditions in this FSM, the clock width (of transparency) has to satisfy a two-sided constraint [22]. Hence, single-phase clocking with latches, in the case of an FSM, is insidiously complex.
To reduce the complexity of the timing constraints, single-phase ETDFFs can be used. The flip-flop is never transparent. At the clock edge, the state is stored, and it cannot pass through the logic more than once during one clock cycle. Designing and synchronizing VLSI circuits with ETDFFs is rather simple and straightforward, particularly when using static flip-flops.
For high-speed CMOS applications, the storage elements must be carefully designed for minimum delay, setup time and clock skew. In this case, tristate dynamic latches can be used efficiently. Fig. 4.69 shows an example of using dynamic latches [21]. Notice that L1 and L2 are transparent latches separated by random logic and are not simultaneously active. When clk is high, L1 is transparent, whereas when clk is low, L2 is transparent. The minimum number of logic gates between latches can be zero, and the maximum is constrained by the cycle time.

Figure 4.67 JK flip-flop.
Fig. 4.70 shows another example of a single-phase system using ETDFFs. This system is edge-based, and the minimum cycle time is given by [22]

$$t_{cycle,min} = t_{ff,max} + t_{logic,max} + t_{setup,max} + t_{skew,max} \qquad (4.97)$$

where t_ff,max, t_logic,max, t_setup,max and t_skew,max are the worst-case delays of the flip-flop, the combinational logic block, the setup time and the clock skew, respectively. When designing with gate-array and/or standard-cell approaches, the single-phase clocking scheme using static ETDFFs is the only option available to the designer.
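Eq. (4.97) is easy to budget numerically. In the sketch below all delay values are hypothetical, chosen only to show how each term eats into the clock period.

```python
def min_cycle_time(t_ff_max, t_logic_max, t_setup_max, t_skew_max):
    """Eq. (4.97): minimum clock period of an edge-triggered single-phase system."""
    return t_ff_max + t_logic_max + t_setup_max + t_skew_max


# Hypothetical worst-case values in ns (not from the text)
t_cycle = min_cycle_time(t_ff_max=0.6, t_logic_max=3.5, t_setup_max=0.3, t_skew_max=0.2)
print(f"t_cycle,min = {t_cycle:.1f} ns -> f_max = {1e3 / t_cycle:.0f} MHz")
# t_cycle,min = 4.6 ns -> f_max = 217 MHz
```

In this example the flip-flop overhead and skew take about a quarter of the period, which is why storage-element delay and skew matter more as logic depth per stage shrinks.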
4.7.3 Two-Phase Clocking

A two-phase non-overlapping clocking strategy removes many constraints existing in the single-phase discipline. However, the use of two-phase (or multiphase) non-overlapping clock structures becomes more difficult as clock frequencies and chip size increase, because of the increase in clock skew and clock interconnect wiring. For high-speed applications, the single-phase strategy is preferred and tends to be widely used in many VLSI system designs.
Fig. 4.71 shows an example of a two-phase non-overlapping clocking scheme. The first latch L1 is transparent when the clock clk1 is high, whereas L2 is transparent when clk2 is high. The example of Fig. 4.71 is not the only way to build a two-phase system. The latches can be replaced by two-phase master-slave flip-flops, where the master latch is clocked by clk1 and the slave latch by clk2. This latter structure does not have the transparency property.
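The text does not show how the two non-overlapping phases are produced. A common textbook choice, swapped in here as an assumption, is a pair of cross-coupled NOR gates with a delay line in each feedback path: each phase can rise only after the other has fallen and propagated through the delay. The unit-delay model below (one step per NOR, k extra steps per feedback path, names hypothetical) ignores the skew of the clock inverter; in it, both phases are low for k + 1 steps around each transition.

```python
def nor(a, b):
    return 0 if (a or b) else 1


def nonoverlap_clocks(clk, k):
    """Cross-coupled NOR two-phase generator, sampled with one step per gate delay.

    phi1 = NOR(not clk, phi2 delayed by k steps)
    phi2 = NOR(clk, phi1 delayed by k steps)
    """
    n = len(clk)
    phi1, phi2 = [0] * n, [0] * n
    for t in range(1, n):
        j = t - 1 - k  # sample index seen through the k-step feedback delay
        p1_delayed = phi1[j] if j >= 0 else 0
        p2_delayed = phi2[j] if j >= 0 else 0
        phi1[t] = nor(1 - clk[t - 1], p2_delayed)
        phi2[t] = nor(clk[t - 1], p1_delayed)
    return phi1, phi2


clk = ([0] * 8 + [1] * 8) * 3  # three clock periods of 16 steps each
phi1, phi2 = nonoverlap_clocks(clk, k=2)
assert not any(a and b for a, b in zip(phi1, phi2)), "phases overlap"
print("clk  ", "".join(map(str, clk)))
print("phi1 ", "".join(map(str, phi1)))
print("phi2 ", "".join(map(str, phi2)))
```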
4.8 PASS-TRANSISTOR LOGIC FAMILIES

Several pass-transistor logic families have been proposed for logic circuit design to improve the speed of CMOS circuits. Such families are the conventional CMOS pass-transistor logic, the Complementary Pass-transistor Logic (CPL) [23], the Dual Pass-transistor Logic (DPL) [24], and the Swing Restored Pass-transistor Logic (SRPL) [25]. In this section, CPL, DPL and SRPL logics are presented and compared.
4.8.1 CPL
The main concept behind CPL is shown in the block diagram of Fig. 4.72. It consists of an NMOS pass-transistor logic network driven by two sets of complementary inputs, and two CMOS inverters used as buffers.

Figure 4.72 Basic CPL logic circuit.

Fig. 4.73 illustrates an example of an AND/NAND gate built in CPL. At node Q, for example, we have

$$Q = A\cdot B + \overline{B}\cdot B = A\cdot B \qquad (4.98)$$

At the output of the corresponding inverter we have the NAND function. The NMOS pass-transistor logic network forms both the pull-up and pull-down functions. When the inputs (A, B) have the combination (1, 1), node Q is at a voltage given by

$$V_Q = V_{DD} - V_{Tn}(V_Q) \qquad (4.99)$$
where VTn is the NMOS threshold voltage, subject to the body effect. So the inverting buffers translate the output swing, from ground to VDD − VTn, into a full-rail logic swing (ground to VDD). The logic threshold voltage of the inverting buffers should be shifted to a voltage lower than VDD/2. Hence the β ratio (βn/βp) of the inverter in this case should be higher than unity. This inverting buffer also permits driving a large load capacitance efficiently. When the outputs of the logic networks are at VDD − VTn, all the output inverters are driven by a reduced swing, as shown in Fig. 4.74. Hence, the DC power of the inverter increases because the pull-up PMOS device is not completely OFF: the VGS of the pull-up PMOS is equal to −VTn. Moreover, the drive capability of the pull-down NMOS transistor is reduced, particularly if the power supply voltage is reduced. The noise margins are also affected. To solve the problem of DC power dissipation, we can design NMOS transistors with a lower VT than that of the PMOS transistor. Also, the body effect should be controlled. Another way to solve all the problems associated with the reduced high level is to add a PMOS latch to the CPL gate, as shown for the AND/NAND circuit of Fig. 4.75. In this case, the two added PMOS transistors can be sized to be minimum, as long as the high level reaches VDD within the given cycle time. We call this style PMOS latch CPL. Careful design is needed when the NMOS network has minimum-size devices; otherwise the high level stored in the latch cannot be discharged.
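Because VTn in Eq. (4.99) itself depends on VQ through the body effect, the degraded high level comes from an implicit equation. The sketch below solves it by bisection, assuming the standard body-effect expression VTn = VT0 + γ(√(VQ + 2φF) − √(2φF)) with the body at ground; the parameter values are hypothetical, not the book's 0.8 µm device data.

```python
import math


def cpl_high_level(vdd, vt0, gamma, two_phi_f, tol=1e-12):
    """Solve Eq. (4.99), V_Q = VDD - V_Tn(V_Q), for an NMOS passing a logic 1.

    Body effect with the source at the output node and the body at ground:
        V_Tn(V) = vt0 + gamma * (sqrt(V + two_phi_f) - sqrt(two_phi_f))
    f(V) = V - (VDD - V_Tn(V)) is strictly increasing, so bisection on [0, VDD] works.
    """
    def f(v):
        vtn = vt0 + gamma * (math.sqrt(v + two_phi_f) - math.sqrt(two_phi_f))
        return v - (vdd - vtn)

    lo, hi = 0.0, vdd
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# Hypothetical device data: VT0 = 0.7 V, gamma = 0.5 V^0.5, 2*phi_F = 0.7 V, VDD = 3.3 V
vq = cpl_high_level(3.3, 0.7, 0.5, 0.7)
print(f"V_Q = {vq:.2f} V, effective V_Tn = {3.3 - vq:.2f} V")
# V_Q = 2.17 V, effective V_Tn = 1.13 V
```

Under these assumptions the body effect raises the threshold drop from 0.7 V to about 1.13 V, which shows why the output inverters see a seriously reduced swing at low supply voltages.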
Fig. 4.76 shows examples of CPL arrays for OR/NOR and XOR/XNOR functions. With only 4 transistors we can produce many two-input functions together with their complements. More examples are shown in Fig. 4.77 for 3-input AND/NAND and OR/NOR gates. In these examples, 8 NMOS transistors are needed to generate the 3-input functions. Any complex logic function can be constructed easily using this principle of NMOS transistor networks. For example, the full-adder circuit can be constructed using wired CPL, as shown in Fig. 4.78. The circuit is constructed using the basic CPL primitives discussed before.
Figure 4.76 CPL (a) OR/NOR and (b) XOR/XNOR logic arrays.

Figure 4.77 CPL 3-input: (a) AND/NAND; (b) OR/NOR logic arrays.
The transistor sizes for fast operation are also shown in this figure. The transistors of the NMOS network far from the output are larger than those closer to the output, because the NMOS devices closer to the output pass a reduced swing. The sizing of the transistors depends on the circuit type, layout and device parameters. Compared to a full adder implemented in standard static CMOS style, the adder of Fig. 4.78 is much faster and dissipates less power due to the low internal swing. Also, the schematic of this CPL adder is regular, resulting in a simplified layout.
One drawback of CPL logic is its limited driving capability; moreover, the delay increases with long pass-transistor chains. So buffering is needed to restore the transmitted level and improve the driving capability.
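The growth of delay with chain length can be estimated with the Elmore time constant, a first-order RC model that is standard in CMOS texts but not used in this section. Modeling each pass transistor as a resistance r with a capacitance c at its output node (hypothetical values below) gives a delay quadratic in the number of stages, which is why inserting restoring buffers helps. The model ignores the reduced swing and the body effect.

```python
def elmore_chain_delay(n, r, c):
    """Elmore time constant of n identical series pass transistors driven by an
    ideal source: each device is a resistance r, each internal node a capacitance c.
    tau = sum_{i=1..n} (i * r) * c = r * c * n * (n + 1) / 2
    """
    return r * c * n * (n + 1) / 2


def buffered_chain_delay(n, m, r, c, t_buf):
    """Same chain split into segments of at most m devices, with a restoring
    buffer of delay t_buf between consecutive segments."""
    full, rest = divmod(n, m)
    segments = [m] * full + ([rest] if rest else [])
    rc = sum(elmore_chain_delay(s, r, c) for s in segments)
    return rc + t_buf * (len(segments) - 1)


r, c = 10e3, 10e-15  # hypothetical: 10 kOhm per device, 10 fF per node
for n in (2, 4, 8, 16):
    print(f"{n:2d} stages: {elmore_chain_delay(n, r, c) * 1e12:6.0f} ps")
print(f"16 stages, buffer every 4: {buffered_chain_delay(16, 4, r, c, 0.3e-9) * 1e12:.0f} ps")
```

With these numbers, 16 unbuffered stages give a 13.6 ns time constant, while a buffer every 4 stages brings it to 4.9 ns despite the three added 0.3 ns buffer delays.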
4.8.2 DPL
DPL is a modified version of CPL suitable for low-voltage applications. It alleviates the problems of CPL associated with the reduced high level. An example of an AND/NAND gate is illustrated in the schematic of Fig. 4.79. It consists of NMOS and PMOS pass transistors, in contrast to the CPL gate, where only NMOS devices are used. In the AND/NAND example, the NMOS transistors are used to pass the ground level while the PMOS transistors are used to pass the high level (VDD). The output of a DPL gate has a full rail-to-rail swing owing to the addition of the PMOS devices. However, this addition results in increased input capacitance compared to CPL. As explained below, this does not limit the performance of DPL.

Figure 4.79 DPL AND/NAND gate.
Fig. 4.80 shows a comparison between the switching characteristics of CPL, conventional CMOS pass-transistor and DPL XOR gates. In the truth tables, the column labeled "Pass" shows which signals are passed to perform the XOR function. DPL has the following features:
■ The DPL gate has a balanced input capacitance. This reduces the dependence of the delay on the input data, contrary to CPL and conventional CMOS pass-transistor logic, where the input capacitances for the signals A and B are not the same.
■ In DPL, for any input combination, there are always two current paths driving the output. This compensates for any reduction in speed due to the additional PMOS. For example, when the inputs A and B are low, A is passed by a PMOS while B is passed by an NMOS.
A DPL full-adder implementation is shown in Fig. 4.81. When all the inputs A, B and C are low, for example, there are two current paths to the output buffer. This implementation uses DPL primitives such as AND/NAND, OR/NOR, XOR/XNOR and MUX to generate the carry and sum signals.
Figure 4.80 Comparison of CPL, conventional CMOS and DPL pass-transistor logic for XOR gate.
4.8.3 Modified CPL

Another technique, which uses a CPL-like style suitable for low-power/low-voltage applications, is the Swing Restored Pass-transistor Logic (SRPL) [25]. Fig. 4.82 shows the basic SRPL logic gate. One part is an NMOS network in the CPL style discussed previously, and the second part is a CMOS latch. The cross-coupled CMOS inverters (latch) restore the logic levels. So, any logic function in SRPL can be implemented using a CPL network and a CMOS latch at the output. The sizing of such logic is critical for speed and power dissipation. Fig. 4.83 shows an example of an AND/NAND gate using SRPL. Increasing the size of the NMOS transistors in the network improves the speed, as shown in the simulation curves of Fig. 4.84. It has been found that the latch should be minimum size for fast operation, using the 0.8 µm device parameters of Chapter 3. If the NMOS transistors in the network are small, the output of the SRPL gate fails to switch to ground because the equivalent impedance of the network is too high compared with that of the latch pull-up path to VDD seen by the output. This problem becomes worse when many gates are cascaded. Fig. 4.85 illustrates this problem in two cascaded AND/NAND gates. When the input goes from VDD to ground, nodes A and B, initially at VDD, cannot be completely discharged.

Figure 4.81 DPL full-adder.
4.8.4 Pass-Transistor Logic Comparison

The speed and power dissipation of the different pass-logic styles presented so far depend on the circuit type and on its application (cascaded gates, driving a fixed load, etc.). For the case of a full adder used in a multiplier array, a comparison is given in Chapter 7. In general, SRPL has the lowest power dissipation, but careful design is needed when small device sizes are used. DPL consumes more power than SRPL and PMOS latch CPL because of its higher transistor count. CPL and SRPL circuits have the smallest area and the fastest speed. In summary, CPL-like styles are promising for low-power and high-speed applications.
4.9 I/O CIRCUITS
I/O circuits connect the on-chip logic circuitry to the external world. They play an important role in limiting the speed and power dissipation of the whole chip. In this section, several I/O circuits are discussed, such as input and output buffers, clock distribution, clock buffering and low-swing I/O. The power dissipation issues related to these circuits are also studied. Layout techniques for I/O circuits are not covered in this chapter.
4.9.1 Input Circuits

To distribute an input signal to the internal circuitry of a chip, an input buffer is needed. Its gate is connected to the input pad. Excessive electrostatic charge on the input pad can break down the oxide and destroy the transistors of the input buffer. For an oxide thickness of 100 Å, the breakdown voltage is about 7 V, whereas the voltage built up on the gate by electrostatic charge can be as high as 300 V [26].

Fig. 4.86 shows an example of electrostatic discharge protection. If the voltage at node N goes above VDD or below ground, the clamping diodes D1 and D2 limit the voltage excursion of node N to the range from −VD to VDD + VD, where VD is the diode forward voltage. The role of the resistance R is to limit the peak current that flows in the diodes. Typical values of R are a few hundred ohms, and they are realized using the diffusion layers. The input protection circuit has a parasitic RC time constant which can limit high-speed operation. It ranges from a few tens of ps to a few hundreds of ps.
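The clamping action and the RC penalty of the protection network can be estimated with an idealized model: perfect diodes that conduct at a fixed forward drop v_diode, and a lumped input capacitance c_in behind R. These are simplifying assumptions with hypothetical values, not a model of the specific circuit in Fig. 4.86.

```python
def clamp_node_voltage(v_pad, vdd, v_diode):
    """Node N voltage with ideal clamp diodes: limited to [-v_diode, vdd + v_diode]."""
    return min(max(v_pad, -v_diode), vdd + v_diode)


def diode_current(v_pad, vdd, v_diode, r):
    """Current through the series resistor R while a clamp diode conducts
    (positive into the VDD clamp, negative out of the ground clamp)."""
    return (v_pad - clamp_node_voltage(v_pad, vdd, v_diode)) / r


def protection_time_constant(r, c_in):
    """First-order RC time constant added in front of the input buffer."""
    return r * c_in


vdd, v_d, r = 3.3, 0.7, 300.0  # hypothetical: 0.7 V diode drop, 300 ohm resistor
for v_pad in (-50.0, 1.2, 300.0):
    print(f"pad {v_pad:6.1f} V -> node {clamp_node_voltage(v_pad, vdd, v_d):5.2f} V, "
          f"diode current {diode_current(v_pad, vdd, v_d, r):7.3f} A")
print(f"tau = {protection_time_constant(r, 0.2e-12) * 1e12:.0f} ps")  # tau = 60 ps
```

Raising R lowers the diode current during a discharge event but lengthens the input time constant proportionally, which is the trade-off behind the "few hundred ohms" choice.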
The input buffer connected to this input pad generally consists of a number of inverter stages to drive the internal circuitry. The input buffer for clock distribution needs special care in design and is discussed in Section 4.9.4.
4.9.1.1 Static Power Dissipation

When the input signal has TTL (Transistor-Transistor Logic) levels, the conventional CMOS buffer is used to translate these levels to CMOS levels. The TTL interface has historically specified input voltage levels of 0.8 V for the low-level input maximum and 2.0 V for the high-level input minimum. The recently passed 3.3 V “Low-Voltage TTL (LVTTL)” standard is shown in Table 4.11.
The individual input inverters are designed by setting their W/L ratios such that the switching point of the buffer is near 1.4 V (the middle of V_IL and V_IH). To obtain this 1.4 V switching point at a 5 V power supply, the ratio W_n/W_p of the input inverter of the buffer should be 2.9 in a 0.8 µm CMOS technology. At 3.3 V, this ratio need only be 0.7. However, since the TTL voltage swing is limited to 1.2 V (2.0 V - 0.8 V), the input buffer always dissipates DC power, as shown in Fig. 4.87, particularly if the V_T of the devices is low. If the first inverter does not fully translate the input TTL levels, the second stage also dissipates some DC power.

Table 4.11 3.3 V LVTTL levels: minimum high output, minimum high input, maximum low output, maximum low input.

Figure 4.87 TTL input buffer.

The static power dissipated by a TTL input buffer is
$$P_{TTL} = V_{DD}\, I_{DDTTL} \qquad (4.100)$$
where
$$I_{DDTTL} = \frac{I_{DDTTL,L} + I_{DDTTL,H}}{2} \qquad (4.101)$$
I_DDTTL is the average dissipated current for the cases when the input is at the low and at the high level. At V_DD = 3.3 V, the input buffer dissipates more static power when the input is high than when it is low. Fig. 4.88 shows the characteristics of the static power dissipation of the input buffer. Note that when V_DD is scaled down, the DC current is reduced because the V_GS of the pull-up PMOS of the input buffer is reduced. If the number of TTL input pads is large, the DC power of the input buffers could be an important and limiting factor. A static power-saving input buffer for reducing I_DDTTL at a 5 V power supply voltage has been proposed in [21].
Figure 4.88 Simulated static power dissipation of input buffer.
4.9.1.2 Dynamic Power Dissipation
The dynamic power dissipation of the input pad is mainly internal power. The total dynamic power of all the input pads (all of the same type, for example) is
$$P_I = A\, N_i\, E_{ii}\, f \qquad (4.102)$$
where A is the switching activity, N_i is the number of input pads, E_ii is the internal energy of an input pad in W/Hz (that is, joules), and f is the operating frequency.
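As a rough illustration of Equations (4.100) to (4.102), the Python sketch below evaluates the static power of one TTL input buffer from its DC supply currents at the low and high input levels, and the internal dynamic power of a group of identical input pads. It assumes I_DDTTL is the plain average of the two DC currents; all numbers in the usage lines are hypothetical and are not read from Fig. 4.88.

```python
def ttl_input_static_power(vdd, i_low, i_high):
    """Static power of one TTL input buffer, Eqs. (4.100)-(4.101).

    i_low, i_high: DC supply current (A) with the input held at the TTL
    low and high levels. Their mean is taken as I_DDTTL (an assumption).
    """
    i_ddttl = 0.5 * (i_low + i_high)
    return vdd * i_ddttl


def input_pads_dynamic_power(activity, n_pads, e_internal, f):
    """Internal dynamic power of n_pads identical input pads, Eq. (4.102).

    e_internal: internal switching energy per pad in J (= W/Hz).
    """
    return activity * n_pads * e_internal * f


# Hypothetical operating point, for illustration only.
p_static = ttl_input_static_power(vdd=3.3, i_low=50e-6, i_high=150e-6)
p_dynamic = input_pads_dynamic_power(activity=0.1, n_pads=32,
                                     e_internal=20e-12, f=100e6)
print(f"static power per TTL input: {p_static * 1e3:.2f} mW")
print(f"dynamic power of all pads:  {p_dynamic * 1e3:.2f} mW")
```

Multiplying the per-pad static figure by the number of TTL inputs shows how quickly the DC term can dominate on a pad-limited chip.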
When the input signal has ECL levels, an ECL input buffer with an ECL-to-CMOS converter is used. In general these are implemented in BiCMOS technology and consume DC power. An ECL-CMOS converter can also be designed in full CMOS [ps].
4.9.2 Schmitt Trigger

When the input signal to a chip changes slowly, a hysteresis circuit is needed at the input pad to generate a clean edge. A circuit called a Schmitt trigger can be used for this function; such circuits are often found at on-chip inputs. Fig. 4.89 illustrates the transfer characteristic of an ideal Schmitt inverter with hysteresis voltage V_H = V_T+ - V_T-. For a 3.3 V power supply, with 3.6 V for the fast process and 3.0 V for the slow process, typical values are V_T+ = 1.7 V and V_T- = 1.0 V. The Schmitt circuit switches at different thresholds: when the input is rising, it switches when V_in = V_T+, and when the input is falling, it switches when V_in = V_T-. Fig. 4.90 shows an example of how the Schmitt trigger turns a signal with a very slow transition into a signal with a sharp transition.
A CMOS version of the Schmitt trigger is shown in Fig. 4.91. When the input is rising, the series NMOS transistors N1 and N2 are initially OFF. The V_GS of the transistor N2 is given by
$$V_{GS2} = V_{in} - V_{FN} \qquad (4.103)$$
When V_in = V_T+, N2 enters conduction, which means V_GS2 = V_Tn; then
$$V_{FN} = V_{T+} - V_{Tn} \qquad (4.104)$$
(We neglect the body effect of N2.)

Figure 4.89 The CMOS Schmitt trigger characteristic.
The voltage V_FN is controlled by N1 and N3. These transistors operate in saturation because
$$V_{GS1} = V_{T+} \qquad (4.105)$$
$$V_{DS1} = V_{FN} = V_{T+} - V_{Tn} \qquad (4.106)$$
and
$$V_{GS3} = V_{DD} - V_{FN} \qquad (4.107)$$
$$V_{DS3} = V_{DD} - V_{FN} \qquad (4.108)$$
The drain currents flowing in N1 and N3 are equal. Then, using a simple square-law MOS model, we have
$$\frac{\beta_{N1}}{2}\,(V_{T+} - V_{Tn})^2 = \frac{\beta_{N3}}{2}\,(V_{DD} - V_{T+})^2 \qquad (4.109)$$
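We have
$$V_{T+} = \frac{V_{DD} + \gamma_n V_{Tn}}{1 + \gamma_n} \qquad (4.110)$$
where
$$\gamma_n = \sqrt{\beta_{N1}/\beta_{N3}} \qquad (4.111)$$
This equation shows that the trigger point is independent of the process parameters except for V_Tn, since the common transconductance factor cancels in the ratio β_N1/β_N3. By symmetry, the trigger point for the falling transition can be deduced from the pull-up section. We have
$$V_{T-} = \frac{\gamma_p\,(V_{DD} - |V_{Tp}|)}{1 + \gamma_p} \qquad (4.112)$$
where
$$\gamma_p = \sqrt{\beta_{P1}/\beta_{P3}} \qquad (4.113)$$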
If γ_n = γ_p = 1 and V_Tn = -V_Tp = V_T, then
$$V_{T+} = \frac{V_{DD}}{2} + \frac{V_T}{2} \qquad (4.114)$$
$$V_{T-} = \frac{V_{DD}}{2} - \frac{V_T}{2} \qquad (4.115)$$
$$V_H = V_{T+} - V_{T-} = V_T \qquad (4.116)$$
In this case the hysteresis voltage can be made equal to V_T. The short-circuit power dissipation of the Schmitt trigger can be very important, since the rise/fall times of the input signal are very long.
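The trigger points can be evaluated numerically. The sketch below codes the square-law results V_T+ = (V_DD + γ_n V_Tn)/(1 + γ_n) with γ_n = √(β_N1/β_N3) and, by symmetry, V_T- = γ_p (V_DD - |V_Tp|)/(1 + γ_p) with γ_p = √(β_P1/β_P3). The pull-up device names P1 and P3 are assumed by symmetry with N1 and N3, and the 0.7 V thresholds are illustrative values, not the 0.8 µm parameters behind Fig. 4.92.

```python
import math


def schmitt_thresholds(vdd, vtn, vtp_abs, beta_n1_n3=1.0, beta_p1_p3=1.0):
    """Trigger points of the CMOS Schmitt trigger (square-law model).

    beta_n1_n3 = beta_N1 / beta_N3 and beta_p1_p3 = beta_P1 / beta_P3.
    Body effect is neglected, as in the derivation of Eq. (4.104).
    Returns (V_T+, V_T-, V_H).
    """
    gn = math.sqrt(beta_n1_n3)
    gp = math.sqrt(beta_p1_p3)
    vt_plus = (vdd + gn * vtn) / (1.0 + gn)        # rising-input trip point
    vt_minus = gp * (vdd - vtp_abs) / (1.0 + gp)   # falling-input trip point
    return vt_plus, vt_minus, vt_plus - vt_minus


# Symmetric case of Eqs. (4.114)-(4.116): V_H should equal V_T.
print(schmitt_thresholds(3.3, 0.7, 0.7))
# Making N1 stronger than the feedback device N3 lowers V_T+ toward V_Tn.
print(schmitt_thresholds(3.3, 0.7, 0.7, beta_n1_n3=4.0))
```

The second call shows the design knob: the β ratios set the trip points, while the process enters only through the threshold voltages.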
Fig. 4.92 shows a SPICE simulation of the circuit of Fig. 4.91 in 0.8 µm technology. In this example, the load capacitance is 0.1 pF and the total power dissipation is 0.85 mW. The dynamic power dissipation due to the load and parasitic capacitances is 0.40 mW. Therefore, the power due to the short-circuit current is 0.45 mW, which represents 53% of the total power dissipation.
4.9.3 CMOS Buffer Sizing

When a gate is intended to drive a large load capacitance (larger than the input capacitance of the gate), its driving capability is limited and the delay is large. If we increase the size of the gate (driver configuration), we improve the rise/fall times, but the delay can still be improved by putting several stages of buffering between the first gate and the load. The objective in a buffer configuration is to get the input signal to the load as quickly as possible. Each stage in the buffer chain should have its transistor widths larger than the previous
Question: What are the values of the size ratio α and the number of stages n that optimize the delay?
By differentiating the delay equation with respect to α and then setting the derivative equal to zero, we have
$$\alpha_{opt} = e \approx 2.7 \qquad (4.124)$$
The optimum number of stages is
$$n_{opt} = \ln\left(\frac{C_L}{C_{in}}\right) \qquad (4.125)$$
In this analysis, we have neglected the parasitic output capacitance of each stage. Other studies [30, 31, 32, 33] show that the size ratio α depends on the ratio of the parasitic output capacitance to the load capacitance. In [34] a new approach for CMOS tapered buffers with a large C_L/C_in ratio was proposed; it uses a variable size ratio between the stages.
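The delay optimum found above (α = e, n_opt = ln(C_L/C_in)) is easy to turn into a design recipe. The sketch below, which keeps the simple model with parasitics neglected, rounds n_opt to an integer and then picks the common stage ratio that exactly reaches the load; the 50 fF input capacitance is a hypothetical value.

```python
import math


def optimal_taper(c_load, c_in):
    """Delay-optimal tapered buffer with parasitics neglected.

    The real-valued optimum is n = ln(C_L/C_in) with stage ratio e. A real
    chain needs an integer number of stages, so n is rounded and the
    per-stage ratio is recomputed so that alpha**n * C_in == C_L exactly.
    """
    ratio = c_load / c_in
    n_ideal = math.log(ratio)
    n = max(1, round(n_ideal))
    alpha = ratio ** (1.0 / n)
    return n_ideal, n, alpha


# Hypothetical: 50 pF pad load driven from a 50 fF internal gate.
n_ideal, n, alpha = optimal_taper(50e-12, 50e-15)
print(f"n_ideal = {n_ideal:.2f}, n = {n}, alpha = {alpha:.2f}")
```

Rounding keeps α close to e; the delay optimum is shallow, which is why the larger ratios discussed next cost little speed.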
The power dissipation of a CMOS buffer is dominated by dynamic power dissipation for large V_T; the short-circuit power dissipation can be neglected in a first-order analysis [34]. If we include the parasitic output capacitance, stage i has a total output capacitance
$$C_i = \alpha^i C_{in} + \alpha^{i-1} C_p \qquad (4.126)$$
where we assume that the parasitic capacitance grows by the size ratio α from one stage to the next (C_p being that of the first stage). The dynamic power dissipation at the output of gate i is
$$P_i = C_i V_{DD}^2 f = V_{DD}^2 f\,(\alpha^i C_{in} + \alpha^{i-1} C_p) \qquad (4.127)$$
or
$$P_i = V_{DD}^2 f\,\alpha^{i-1}(\alpha C_{in} + C_p) \qquad (4.128)$$
The total power is
$$P_T = \sum_{i=1}^{n} P_i = V_{DD}^2 f\,(\alpha C_{in} + C_p)\sum_{i=1}^{n} \alpha^{i-1} \qquad (4.129)$$
Hence
$$P_T = V_{DD}^2 f\,(\alpha C_{in} + C_p)\,\frac{\alpha^n - 1}{\alpha - 1} \qquad (4.130)$$
The power efficiency of the buffer can then be defined as
$$\eta = \frac{P_L}{P_T} \qquad (4.131)$$
where P_L is the power dissipated due to the load C_L, which is simply C_L V_DD^2 f, and P_T is the total power dissipated, given by Equation (4.130). For given C_L, C_in and C_p, this power efficiency is a function of the factor α only, because n is fixed by α^n C_in = C_L. The term 1 - η characterizes the additional power dissipation overhead needed by the buffer chain to drive the load C_L. For high values of α, the power efficiency of the buffer increases. In practice α can be in the range of 2 to 10; its value can be set depending on speed, delay and power dissipation constraints.
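The power efficiency, taken here as η = P_L/P_T with P_T from Equation (4.130), can be explored numerically. Since V_DD² f cancels, η depends only on capacitances and α. The sketch below treats n as a real number fixed by α^n C_in = C_L (a simplification, since a real chain has an integer n), and the capacitance values are hypothetical.

```python
import math


def buffer_power_efficiency(alpha, c_load, c_in, c_p):
    """eta = P_L / P_T for a tapered buffer, Eqs. (4.126)-(4.130).

    n is taken as the real value ln(C_L/C_in)/ln(alpha), so that
    alpha**n * C_in == C_L. V_DD**2 * f is common to P_L and P_T and cancels.
    c_p: parasitic output capacitance of the first stage (scales by alpha).
    """
    n = math.log(c_load / c_in) / math.log(alpha)
    c_total = (alpha * c_in + c_p) * (alpha ** n - 1.0) / (alpha - 1.0)
    return c_load / c_total


# Hypothetical values: C_L = 50 pF, C_in = C_p = 50 fF.
for a in (2, 2.718, 4, 10):
    eta = buffer_power_efficiency(a, 50e-12, 50e-15, 50e-15)
    print(f"alpha = {a:>5}: eta = {eta:.3f}")
```

With C_p = C_in, this model reduces to η = (C_L/C_in)/(C_L/C_in - 1) · (α - 1)/(α + 1), so η climbs from about 0.33 at α = 2 to about 0.82 at α = 10, the trade-off against delay described above.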
4.9.4 Clock Drivers and Clock Distribution

Usually, when the clock is to be distributed on-chip, input buffers are needed. The clock circuit has to drive a very high internal load with extremely fast fall/rise times. For example, in the case of the DEC Alpha chip [21] the clock load is 3.2 nF. If this load has to be driven by a large driver with rise/fall times of 0.5 ns when the clock frequency is 200 MHz (T_clock = 5 ns), the average transient current would be
$$I_{av} = C\,\frac{dV}{dt} = \frac{3.2\times10^{-9} \times 3.3}{0.5\times10^{-9}} \approx 21\ \text{A} \qquad (4.132)$$
at a V_DD = 3.3 V power supply. The corresponding dynamic power dissipation due to this clock loading is
$$P = C V_{DD}^2 f = 3.2\times10^{-9} \times 3.3^2 \times 200\times10^{6} \approx 7\ \text{W} \qquad (4.133)$$
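Both estimates follow directly from I = C dV/dt and P = C V_DD² f. The short Python helper below reproduces them with the Alpha-chip numbers quoted in the text, so that other clock loads, supplies or edge rates can be tried.

```python
def clock_load_estimates(c_clk, vdd, t_edge, f_clk):
    """Average edge current (Eq. 4.132) and dynamic power (Eq. 4.133)."""
    i_avg = c_clk * vdd / t_edge          # I = C dV/dt over one edge
    p_dyn = c_clk * vdd ** 2 * f_clk      # P = C V_DD^2 f
    return i_avg, p_dyn


i_avg, p_dyn = clock_load_estimates(c_clk=3.2e-9, vdd=3.3,
                                    t_edge=0.5e-9, f_clk=200e6)
print(f"I_av = {i_avg:.1f} A, P = {p_dyn:.2f} W")
```

Because P grows with the square of the swing, halving the clock swing in this formula cuts the power to a quarter, which is why reduced-swing clocking is attractive.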
This example shows why clocking is an important design issue. A clocking strategy should be used to distribute the clock to the different functional blocks of the chip with minimum clock skew and low power dissipation.
The clock skew problem is due mainly to two issues:
- Differences in RC interconnect time constants: for example, in Fig. 4.94 node A and node B have two different branch lengths to node C. In this case, the delays of the signals at node A and node B relative to node C are different, and the clock skew is equal to the time difference between these two signals.
- Unbalanced loads at different nodes: as shown in the example of Fig. 4.95, if the loads at nodes A and B, C_A and C_B respectively, are different, then skew exists between the signals at these nodes.
Figure 4.95 Clock skew due to the unbalanced loads at block A and block B.
Several strategies have been proposed to minimize clock skew. The first approach is to use cascaded inverters (a buffer) to drive a large load and feed all blocks, as shown in Fig. 4.96. The buffer chain is designed by the approach presented in Section 4.9.3. In another approach, the clock distribution is accomplished by using a tree of well-sized clock buffers, as illustrated by Fig. 4.97. Identical buffers are used in each level, and each buffer sees the same load capacitance. Equalizing clock buffer loads is possible by 1) equalizing the interconnect lengths between the buffers of different levels, and 2) adding dummy buffers at lightly loaded buffer outputs. The last distribution level has buffers which drive the functional elements such as registers. This structure greatly reduces skew; the only skew that remains is produced by variations in process parameters. To further minimize the skew, identical layouts should be used for all the buffers. As an example of the tree approach, distributing the clock signal to 64 elements (for example, registers) requires 3 stages (levels) of buffering with a 1-to-4 tree structure. A variety of software packages have been developed for clock tree synthesis [35, 36].
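The 64-register example generalizes to any sink count and fan-out. The sketch below counts levels and buffers for a full, balanced 1-to-k tree; it assumes every level is fully populated, as load equalization implies, so for sink counts that are not a power of k some last-level outputs would drive dummy loads.

```python
def clock_tree_size(n_sinks, fanout):
    """Buffer levels and buffer count for a balanced 1-to-fanout clock tree.

    Returns (levels, buffers): the smallest number of levels L with
    fanout**L >= n_sinks, and the buffers in a full tree of that depth
    (1 + fanout + ... + fanout**(L-1)). Integer arithmetic avoids log()
    rounding errors at exact powers.
    """
    if n_sinks < 1 or fanout < 2:
        raise ValueError("need n_sinks >= 1 and fanout >= 2")
    levels, reach, buffers = 0, 1, 0
    while reach < n_sinks:
        buffers += reach     # each node at this level is one buffer
        reach *= fanout
        levels += 1
    return levels, buffers


print(clock_tree_size(64, 4))    # example from the text
print(clock_tree_size(1000, 4))
```

For 64 registers and a fan-out of 4 this gives 3 levels and 1 + 4 + 16 = 21 identical buffers.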
To reduce the high dynamic power dissipation (a few watts) in clock distribution at a fixed power supply, many techniques can be used, such as:
1. Using a low-capacitance clock routing line such as metal3. This layer of metal can, for example, be dedicated to clock distribution only.
2. Using low-swing drivers at the top level of the tree or in intermediate levels.

Figure 4.97 Clock tree distribution.
For the second approach, a half-swing clocking scheme has been proposed [37]. Fig. 4.98 shows the half-swing clock driver, which generates half-V_DD clock signals (four phases) for the clocked elements (e.g., latches). Using the charge-sharing principle, the voltage of the H-VDD node can be expressed by
$$H\text{-}V_{DD} = \frac{\cdots}{C_A + C_1 + C_2 + C_B}\,V_{DD} \quad \text{when clk is low} \qquad (4.134)$$
$$H\text{-}V_{DD} = \frac{\cdots}{C_A + C_3 + C_4 + C_B}\,V_{DD} \quad \text{when clk is high} \qquad (4.135)$$
where C_A and C_B are capacitances added to the power lines and C_1 through C_4 are the load capacitances of the driver. When C_A is equal to C_B and both are large compared to C_1 to C_4, the H-VDD node is stabilized at V_DD/2.
Fig. 4.99 shows the clocking schemes of the latches driven by the clock driver. Compared to the conventional scheme, which uses two clock phases, the half-swing scheme requires four clock phases: two phases are for the PMOSs and two are for the NMOSs, as shown in Fig. 4.99(b). This scheme reduces the clocking power by 75%. However, the delay of the latch is increased by the new clocking scheme, which can be acceptable [37].
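One way to see where a 75% figure can come from is simple charge accounting. The sketch below is an idealization, not the analysis of [37]: it assumes the total clock load is split equally between the PMOS-clock and NMOS-clock phases, that charge taken from V_DD is recycled through the H-VDD node, and it ignores driver internal power and the H-VDD ripple set by C_A and C_B.

```python
def full_swing_clock_energy(c_total, vdd):
    """Supply energy per clock cycle, conventional full-swing clocking."""
    return c_total * vdd ** 2


def half_swing_clock_energy(c_total, vdd):
    """Idealized supply energy per cycle for half-swing clocking.

    Assumption: the load is split equally between PMOS-clock lines
    (swinging between V_DD and V_DD/2) and NMOS-clock lines (swinging
    between V_DD/2 and 0). Charge drawn from V_DD by the PMOS-clock lines
    is dumped onto the H-VDD node and reused by the NMOS-clock lines before
    it reaches ground, so only this one packet is taken from V_DD.
    """
    charge_from_vdd = (c_total / 2) * (vdd / 2)
    return charge_from_vdd * vdd


c, v = 1e-9, 3.3   # hypothetical total clock load and supply
saving = 1 - half_swing_clock_energy(c, v) / full_swing_clock_energy(c, v)
print(f"power saving: {saving:.0%}")
```

In this model the saving is independent of C and V_DD; real drivers fall somewhat short of it because of their own internal power.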
4.9.5 Output Circuits
To drive the output pad, a driver with high drive capability is needed to achieve adequate rise and fall times. In this case, an inverter chain is used to handle the large load of the pad, the package wiring and the off-chip load. This capacitance can be a few tens of pF; a typical value is 50 pF. There are many types of output pads, such as tristate, bidirectional, low-V_DD (3.3 V) to high-V_DD (5 V) output buffers, and low-swing outputs.
4.9.5.1 Tristate and Bidirectional Circuits

Fig. 4.100 shows a tristate circuit to drive a large pad capacitance. When the output enable signal is high, the output data is the same as the input data. When the output enable signal is low, the output of the pad is in the high-impedance state (Z): both the output NMOS and PMOS transistors are cut off. Fig. 4.101 shows a bidirectional I/O circuit, which is quite useful when we need to save I/O pads. Sometimes an input buffer is included in the bidirectional pad. The operation of this circuit is straightforward.
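The enable behavior can be summarized as a truth table. The Python model below assumes the common NAND/NOR pre-driver arrangement (PMOS gate = NAND(data, OE), NMOS gate = NOR(data, NOT OE)); Fig. 4.100 may implement the same function with different gates.

```python
from typing import Optional


def tristate_pad(data: int, oe: int) -> Optional[int]:
    """Logic-level model of a tristate output stage.

    Assumed pre-driver: PMOS gate = NAND(data, oe), NMOS gate = NOR(data, not oe).
    Returns 1 (PMOS pulls the pad up), 0 (NMOS pulls it down) or None (high-Z).
    """
    pmos_gate = 0 if (data and oe) else 1        # NAND(data, oe)
    nmos_gate = 0 if (data or not oe) else 1     # NOR(data, not oe)
    pmos_on = pmos_gate == 0                     # PMOS conducts on a low gate
    nmos_on = nmos_gate == 1                     # NMOS conducts on a high gate
    if pmos_on and nmos_on:
        raise RuntimeError("both output devices ON (crowbar current)")
    if pmos_on:
        return 1
    if nmos_on:
        return 0
    return None                                  # both cut off: state Z


for oe in (1, 0):
    for data in (0, 1):
        print(f"OE={oe} data={data} -> pad={tristate_pad(data, oe)}")
```

In this static model no input combination turns both output devices on, and OE = 0 always leaves both cut off.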
4.9.5.2 Power Dissipation of Output Circuits
The total power dissipation at the output pads can be divided into static power dissipation and dynamic power dissipation. The static power dissipation is due mainly to leakage currents (junction and subthreshold) if the output pads are driving CMOS logic. If the V_T of the devices is large enough, the static power dissipation of the output pads can be neglected. However, if V_T is small, the DC power due to the subthreshold current of the output pads is
$$P_s = N_o\, I_{DS,mean}\, V_{DD} \qquad (4.136)$$
where N_o is the number of output pads and I_DS,mean is the average subthreshold current for both cases, when the input is low and when it is high. For low V_T, the I_DS,mean value would be significant because the devices in the output buffer have large sizes, particularly the output transistors. I_DS,mean should be computed for the worst case, where V_T has its minimum value. Thus, for future technologies where the threshold voltage is low and the number of output pads is large, this static power dissipation would be very important and can be a limiting factor for low-power applications. Hence low-power circuit techniques are needed for output buffers.

Figure 4.101 Bidirectional pad.
If the CMOS output buffer is intended to drive bipolar TTL inputs (not CMOS TTL-compatible inputs), then an important current issue arises. Fig. 4.102 shows the final stage of the buffer driving a TTL logic gate. Since bipolar TTL inputs can source significant amounts of current, a CMOS output buffer must sink this current. For a 3.3 V power supply, this current can be in the range of 1 mA to 12 mA, depending on the strength of the output driver. The static power dissipated by one output pad driving bipolar TTL inputs is
$$P_{OL} = V_{OL}\, I_{OL} \qquad (4.137)$$
where I_OL is the current sunk by the output buffer, equal to the sum of the currents from all the bipolar TTL inputs, and V_OL = 0.4 V for a low TTL output. This dissipated power is due to the output NMOS pull-down transistor and can be an important issue as far as chip heating is concerned. Note that the corresponding energy is not drawn from the internal power supply.

Figure 4.102 TTL output buffer.
Another component of the total power dissipated at the output pads is the dynamic power. It is given by
$$P_{do} = A\,(N_o E_{io} + N_o C_o V_{DD}^2)\,f \qquad (4.138)$$
where E_io is the internal switching energy of an output pad and C_o is the average output load capacitance (including the pad load). As an example, 64 output pads switching with an activity of 10% at 200 MHz dissipate 0.8 W (V_DD = 3.3 V, E_io = 70 µW/MHz and C_o = 50 pF). This value must be taken into account.
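Equations (4.136) and (4.138), together with the V_OL I_OL expression for TTL loads, are easy to evaluate together. The sketch below reproduces the 0.8 W example (it comes to about 0.79 W, of which roughly 0.70 W is the C_o V_DD² term); the 1 µA average leakage per pad is a hypothetical figure for illustration.

```python
def output_subthreshold_power(n_out, i_sub_mean, vdd):
    """Static power of N_o output pads from subthreshold leakage, Eq. (4.136)."""
    return n_out * i_sub_mean * vdd


def ttl_sink_power(v_ol, i_ol):
    """Static power V_OL * I_OL of one pad sinking bipolar TTL input current."""
    return v_ol * i_ol


def output_dynamic_power(activity, n_out, e_internal, c_out, vdd, f):
    """Dynamic power of N_o output pads, Eq. (4.138); e_internal in J."""
    return activity * (n_out * e_internal + n_out * c_out * vdd ** 2) * f


# Example from the text: 64 pads, A = 0.1, 200 MHz, 3.3 V,
# E_io = 70 uW/MHz (= 70 pJ) and C_o = 50 pF.
p_dyn = output_dynamic_power(0.1, 64, 70e-12, 50e-12, 3.3, 200e6)
# Worst-case TTL sink current quoted in the text: 12 mA at V_OL = 0.4 V.
p_ttl = ttl_sink_power(0.4, 12e-3)
# Hypothetical leakage: 64 pads at 1 uA average subthreshold current.
p_sub = output_subthreshold_power(64, 1e-6, 3.3)
print(f"P_dyn = {p_dyn:.3f} W, P_TTL = {p_ttl * 1e3:.1f} mW/pad, "
      f"P_sub = {p_sub * 1e6:.0f} uW")
```

The split shows that the off-chip C_o V_DD² term, not the pad's internal energy, dominates the output power at these values.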
The total power dissipation of the bidirectional pads can be evaluated using the approaches developed for the input and output circuits.
4.9.5.3 3.3 V to 5 V Output Interface

When a 3.3 V chip is connected to a 5 V chip, interfaces with zero DC power dissipation are needed. If a conventional CMOS inverter is used to interface 3.3 V logic to 5 V logic, the DC power would be large. Fig. 4.103 illustrates this problem. For example, if the 3.3 V inverter drives high into the 5 V inverter, the gate of the PMOS transistor P1 is at 3.3 V while its source is at 5 V, so its |V_GS| equals 1.7 V. This value is larger than the V_T of the device and thus results in a large DC power dissipation, in the range of milliwatts. Since this power is dissipated at every I/O, for a whole ASIC chip it could reach hundreds of mW. This situation is unacceptable for low-power applications.
The circuit of Fig. 4.104 provides a solution to the problem of DC power dissipation [38]. The circuit has two power supplies, denoted V_DDL and V_DDH, corresponding to the low V_DD (e.g., 3.3 V) and the high V_DD (e.g., 5 V), respectively. For low input data, node A is at V_DDL and node B is at zero. The NMOS transistor N is conducting and the output is at V_SS. Since the output is at zero, the feedback PMOS transistor P_f is also conducting. The pass NMOS transistor N_p is cut off, so node C is pulled up to V_DDH, and the PMOS transistor P is completely OFF. Hence there is no leakage in this state except the junction leakage currents and the subthreshold currents. For high input data, node A is at zero and node B is at V_DDL. In this case the NMOS transistor N is OFF and the pass transistor N_p is conducting. Initially the feedback PMOS transistor P_f is ON; since N_p is conducting, proper sizing of P_f and N_p (a higher conductance for N_p) allows node C to be discharged through N_p. This causes P to conduct, which in turn charges the output to V_DDH, and the feedback device P_f then turns completely OFF. Thus this interface results in very limited leakage current and solves the interface problem.
As mentioned, the transistors P_f and N_p should be sized properly so that the circuit does not latch the previous data: P_f should be much smaller than N_p. We use a simple analysis to find the relationship between the sizes of the two transistors. For high input data, node C is initially at V_DDH. Thus the NMOS N_p is in saturation and the PMOS P_f is in the linear region. Assuming that the drain current of N_p is much higher than that of P_f, we have
(4.140)
where β_Np and β_Pf are the βs of the NMOS transistor N_p and the PMOS transistor P_f, respectively. The low-to-high voltage converter has negligible DC current when the input is stable, since the devices that could form a DC path are completely OFF. This technique can be used to interface any low voltage to a higher voltage.
4.9.6 Ground Bounce

When a high-drive-current CMOS driver switches, it generates high current spikes. This current can generate noise, as shown in Fig. 4.105: it flows through the impedance between the pad and the supply node and produces a voltage noise. This noise is often called L di/dt noise or ground bounce; the inductance L is due to the package. The ground bounce is given by
$$V_L = L\,\frac{di}{dt} \qquad (4.141)$$
This noise problem can also occur on the power lead, where it is termed power bounce; we will use only one name to refer to both. Consider a CMOS output driver driving an output pad of 50 pF at 3.3 V with 2 ns rise/fall times. It can be shown [39] that di/dt is related to the fall/rise times by
(4.142)
The di/dt can be as high as 165 mA/ns. If, for example, 8 drivers are allowed to switch simultaneously per V_DD/V_SS pad pair, the resulting ground bounce for L = 1 nH is 1320 mV (8 × 165 mA/ns × 1 nH). This value can be a problem, particularly for low-voltage applications, since the ground bounce consumes a large fraction of the digital noise margins. Some of the problems encountered are 1) false triggering, 2) double clocking, and/or 3) missing clock pulses.
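One simple model consistent with the quoted 165 mA/ns, offered as an assumption and not necessarily the expression derived in [39], treats each switching event as a triangular current pulse of width t_r carrying charge C V_DD. The peak current is then 2 C V_DD / t_r, reached after t_r/2, so di/dt ≈ 4 C V_DD / t_r². The sketch below combines this model with Equation (4.141).

```python
def peak_didt(c_load, vdd, t_edge):
    """Peak di/dt of one driver, assuming a triangular current pulse.

    Area of the pulse = C*V_DD, width = t_edge, so the peak current is
    2*C*V_DD/t_edge, reached after t_edge/2: di/dt = 4*C*V_DD/t_edge**2.
    """
    return 4.0 * c_load * vdd / t_edge ** 2


def ground_bounce(l_pkg, n_simultaneous, didt):
    """Eq. (4.141) for n drivers sharing one V_DD/V_SS pad pair."""
    return l_pkg * n_simultaneous * didt


didt = peak_didt(c_load=50e-12, vdd=3.3, t_edge=2e-9)      # A/s
v_bounce = ground_bounce(l_pkg=1e-9, n_simultaneous=8, didt=didt)
print(f"di/dt = {didt * 1e3 / 1e9:.0f} mA/ns, bounce = {v_bounce * 1e3:.0f} mV")
```

In this model doubling the edge time cuts di/dt, and hence the bounce, by a factor of four, which is why slew-rate control is a natural remedy.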
I/O buffers are not the only source of ground bounce in CMOS circuits. Clock buffers and, to a lesser extent, the core logic can also cause serious ground bounce in the supply leads when driving large loads. Careful power-supply routing is needed when powering large buffers, and the resistance of the metal should be minimized so that the voltage drop due to the current spike is reduced.
There are many techniques to reduce ground bounce. One simple approach is to use separate supply pins for the output buffers. Some approaches, based on reducing L and di/dt, are the following:
- Multiple supply pads and pins are one way to reduce the inductance of the supply. A recent chip uses 121 power/ground pins out of a total of 293 pins [40].
- Placing power and ground pins adjacent to one another reduces the effective inductance of the power and ground pins through mutual inductance. This approach causes an increase in chip size and cost.
- Circuit techniques reduce the di/dt of the output and clock buffers while maintaining adequate performance. The simplest way is to control the rise/fall times while meeting the timing requirement. However, this approach has a serious problem: the worst-case (slow) process dictates the buffer sizing (worst-case delay), while the best-case (fast) process dictates the ground bounce level. Hence the buffer design is constrained by the two extremes of process variation. Once the buffer is sized to satisfy the worst-case delay, the worst-case ground bounce may exceed the fixed level. This problem can be solved by controlling the signal slope at the input of the output transistors of the buffer [41].
- For clock buffers, and in high-performance designs, on-chip bypass capacitance is added between the power bus and the substrate, as shown in Fig. 4.106. This capacitance lowers the impedance of the power supply. On-chip bypass capacitance does not reduce the noise produced by output buffers.
- Another approach is to reduce the output voltage swing of the large buffer.
In conclusion, all these techniques can be combined to reduce L and di/dt, and hence the ground bounce. The reader can refer to many other techniques for reducing ground bounce [42, 43, 44, 45].
4.9.7 Low-Swing Output Circuit

With the advent of high-performance VLSI chips, which operate beyond 100 MHz and have over 100 I/Os on the same chip, high-data-rate CMOS I/O interfaces with low-swing signals are needed, such as ECL (Emitter Coupled Logic) [46, 47, 48], BTL [49], GTL [50], and CMTL (Current Mode Transceiver Logic) [51]. Conventional unterminated interconnects (between VLSI chips) for CMOS-level signals usually have poor signal quality, with severe overshoot and ringing, accompanied by EMI (electromagnetic interference) and the possibility of triggering latch-up.
Fig. 4.107 shows two chips connected to a bidirectional transmission line (with 50 Ω termination resistors) through GTL (Gunning Transceiver Logic) I/O transceivers. Both ends of the transmission line are terminated to prevent reflections, so the load seen by each driver is 25 Ω (two 50 Ω terminations in parallel). The termination voltage V_TM is about 1.2 V. The output driver is an open-drain NMOS pull-down transistor; when it is inactive, the output is at the high-level signal V_OH, equal to V_TM. The input receiver uses a differential comparator with an external reference voltage V_REF = 0.8 V.

Figure 4.107 GTL I/O with two chips connected to a transmission line.
Fig. 4.108 shows an output driver in open-drain configuration, which includes circuitry to reduce overshoot and the turn-off di/dt. When V_in is low, P1 turns ON, which in turn turns N3 and N4 ON. In this case, the maximum output voltage is V_OL,max = 0.4 V. The power dissipated by the pull-down NMOS is then at its maximum and is mainly static. The static current is equal to (V_TM - V_OL)/R = 0.8/25 = 32 mA. Hence, the maximum static power dissipated on-chip is P = 32 mA × 0.4 V = 12.8 mW for each I/O. A typical value of V_OL is 0.24 V; thus the nominal power dissipated by each active driver is 9.2 mW. When the input goes from low to high, N1 turns ON and N3 is still ON, because the signal through the two inverters I1 and I2 is delayed by about 1 ns. The transistor N1 is weak, hence the output discharge is controlled by N2 and N3. These transistors keep the drain of N4 connected to its gate as long as V_DS is higher than V_T. When N3 turns OFF, N1 discharges the gate of N4 to ground. Thus, the turn-off of N4 is controlled. In this state, no DC power is dissipated.
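The static numbers above follow from Ohm's law on the 25 Ω load. The helper below reproduces them; note that at the typical V_OL the current rises to about 38 mA even though the on-chip dissipation falls to about 9.2 mW, because most of the termination current's power is dropped in the resistors.

```python
def gtl_pulldown_static(v_tm, v_ol, r_load):
    """Static current and on-chip power of an active GTL pull-down.

    r_load is the load seen by the driver (two 50-ohm terminations in
    parallel = 25 ohm). The current comes from the termination supply;
    only v_ol * i is dissipated in the on-chip NMOS.
    """
    i = (v_tm - v_ol) / r_load
    return i, v_ol * i


i_max, p_max = gtl_pulldown_static(1.2, 0.4, 25.0)    # V_OL,max case
i_nom, p_nom = gtl_pulldown_static(1.2, 0.24, 25.0)   # typical V_OL
print(f"max: {i_max * 1e3:.1f} mA, {p_max * 1e3:.1f} mW; "
      f"typical: {i_nom * 1e3:.1f} mA, {p_nom * 1e3:.1f} mW")
```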
Fig. 4.109 shows the input buffer, which employs a differential comparator. This circuit switches its output V_out high (low) when V_in - V_REF > 50 mV (< -50 mV), respectively, over process, power supply and junction temperature variations. The average power dissipated by this input receiver is 5.5 mW at a 5 V power supply.

Note that the static pull-down current of the GTL output driver is supplied by the termination supply V_TM and not by V_DD.
4.10 LOW-POWER CIRCUIT TECHNIQUES

Remember that the total power dissipated by a circuit has three components. Two of them are very important: 1) the static power (P_s), and 2) the dynamic power (P_d). This section treats some of the circuit techniques for achieving low power while maintaining performance. Techniques to reduce the power at the subsystem/system and architecture levels will be discussed in Chapters 6, 7 and 8.
4.10.1 Low Static Power Techniques

One important source of static power dissipation is the use of low threshold voltages. With device scaling, the power supply voltage is scaled. If the threshold voltage is not scaled and is equal to or greater than one half of V_DD, the gate delay increases drastically [52]. The threshold voltage should be less than 20% of V_DD in order to maintain performance at low supply voltage; at a 1 V power supply, threshold voltages can be as low as 0.1 V. However, reducing V_T causes a serious increase in the standby subthreshold current, due to the exponential relation between this current and V_T. With low V_T, process fluctuations can increase this current further. For VLSI and future ULSI integration, the total standby current can be high and unacceptable for low-power applications.
Many techniques exist to reduce this subthreshold current associated with low-V_T devices. They are based on the principle of reverse biasing the gate-source voltage V_GS of the MOS device (in the case of an NMOS) in the standby mode of operation, as shown in Fig. 4.110. With a negative V_GS, the standby operating point of the device moves to a point of much lower subthreshold current. We cite two techniques using this principle:
4.10.1.1 Self-Reverse Biasing
This technique has been used mainly to reduce the standby static power dissipation of memory decoder-drivers [53]. The drivers in a memory comprise a large number of identical circuits arranged repeatedly, but only a few of them operate simultaneously. The circuit of Fig. 4.111 can drastically reduce the subthreshold current of the drivers. The technique simply consists of inserting a PMOS transistor P_c of width W_c between the power supply V_DD and the common source node A. All the PMOS transistors (P_d1, P_d2, ..., P_dn) of the drivers have, in this example, the same width W_d and a common source (node A). The number of drivers n can range from a few hundred to a few thousand. The MOS transistors in the drivers have a low |V_Td| (e.g., 0.1 V). The PMOS transistor P_c has a threshold voltage |V_Tc| slightly higher than |V_Td| (e.g., 0.2 to 0.4 V).
In active mode, the input S is low and the transistor P_c is ON; among the drivers, only one circuit is ON. So that P_c does not affect the drive current of the drivers, its width W_c should be larger than W_d, depending on the capacitance of the common source node, which is huge for large n. In standby mode, the input S is high and P_c is OFF, and the inputs of all drivers are set high (V_DD). Without P_c, the total subthreshold current would be n times the current of each driver, which makes this current very high. Hence P_c reduces and limits the subthreshold current. The voltage of the common source node A drops by an amount ΔV_SRB (a few hundred mV). This gives the PMOS transistors of all drivers a self-reverse-biased gate-source voltage, which drastically reduces the subthreshold current. The time needed for the node to stabilize at V_DD - ΔV_SRB (that is, the time needed to switch from the active to the standby mode) is called the evolution time, and it can be very long (on the order of 1 ms) compared to the delay of the driver, because only the leakage and subthreshold currents discharge node A in this mode. This time can be insignificant for low-power operation if the standby period is long enough, as in many low-power applications. When the input S is turned low (active mode), the time needed for the common source A to recover (to reach almost V_DD) is very short and can be lower than the delay time. Hence, it does not interrupt the start of normal operation.

Figure 4.111 Subthreshold current reduction by self-reverse biasing.
Let us now derive the subthreshold current expressions before and after reduction by the SRB technique. The total subthreshold current without the self-reverse-biasing technique is given by
$$I_{sub1} = n\, I_0\, \frac{W_d}{W_0} \exp\left(-\frac{|V_{Td}|}{S/\ln 10}\right) \qquad (4.143)$$
With the transistor P_c, the subthreshold current is given by
$$I_{sub2} = I_0\, \frac{W_c}{W_0} \exp\left(-\frac{|V_{Tc}|}{S/\ln 10}\right) \qquad (4.144)$$
We assume that the devices have the same I_0, W_0 and S. By dividing Equation (4.143) by Equation (4.144), we obtain the subthreshold current reduction factor
$$\gamma = \frac{I_{sub1}}{I_{sub2}} = \frac{n W_d}{W_c}\, 10^{(|V_{Tc}| - |V_{Td}|)/S} \qquad (4.145)$$
For example, for n = 512, W_c = 10 W_d (with this ratio the speed is not affected), |V_Tc| = 0.3 V, |V_Td| = 0.1 V and S = 90 mV/decade, the factor γ ≈ 8.5 × 10^3. So the saving in subthreshold current is substantial. The parameter ΔV_SRB can easily be deduced. Note that this technique requires a multi-V_T technology.
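Writing exp(-|V_T| ln10 / S) as 10^(-|V_T|/S), the ratio of Equation (4.143) to Equation (4.144) becomes (n W_d / W_c) · 10^((|V_Tc| - |V_Td|)/S). The sketch below evaluates it for the example above, under the same assumptions as the equations (equal I_0, W_0 and S for all devices).

```python
def srb_reduction_factor(n, w_d, w_c, vt_c, vt_d, s):
    """Subthreshold-current reduction I_sub1 / I_sub2 of self-reverse biasing.

    Ratio of Eq. (4.143) to Eq. (4.144) with equal I_0, W_0 and S:
    (n * W_d / W_c) * 10**((|V_Tc| - |V_Td|) / S), where S is in V/decade.
    """
    return (n * w_d / w_c) * 10.0 ** ((vt_c - vt_d) / s)


# Example from the text: n = 512, W_c = 10 W_d, |V_Tc| = 0.3 V,
# |V_Td| = 0.1 V, S = 90 mV/decade.
factor = srb_reduction_factor(512, 1.0, 10.0, 0.3, 0.1, 0.090)
print(f"reduction factor = {factor:.2e}")
```

The two contributions are visible separately: n W_d / W_c = 51.2 from the width ratio and about 167 from the 0.2 V threshold difference.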
4.10.1.2 Multi-V_T Technique
This technique is similar to the one discussed above, but it can be applied to any CMOS logic [54, 55]. The basic idea is shown in the example of the NAND gate of Fig. 4.112. Here the MOS transistors P and N have a high V_T (e.g., 0.6 V extrapolated) for 1 V power supply applications, while the logic gate itself uses MOSFETs with a low V_T (≤ 0.3 V). The signal SL is used to switch the gate into active or sleep (standby) mode. The virtual supply lines V_DDV and V_SSV are common to many gates. We call this logic multi-threshold CMOS logic (MT-CMOS).
In the active mode, the signal SL is low and P and N are ON, so the virtual supply lines V_DDV and V_SSV are held at almost V_DD and ground, respectively. Hence, the low-V_T logic can switch efficiently, but care should be taken in sizing the P/N devices relative to the logic. Fig. 4.113 shows the effect of sizing the high-V_T devices on the delay of the gate. The width of P/N should be at least 10 times larger than that of the logic cells. This condition depends greatly on the parasitic capacitances C_1 and C_2 of the virtual supply lines [see Fig. 4.112]. If C_1 and C_2 are large, the widths of P and N can be reduced, because these capacitances tend to suppress the bouncing of V_DDV and V_SSV and hence improve the speed. The high-V_T MOSFETs can be shared by several logic gates (e.g., 10).
In the standby (sleep) mode, the signal SL is high, so P and N are OFF. Hence the subthreshold current is limited by that of these high-V_T devices, and the static power dissipation is dramatically reduced in sleep mode. The subthreshold reduction factor can be deduced using the analysis presented in the previous section. One problem associated with this MT logic is that the evolution and recovery times can be large.
Figure 4.113 Effect of high-V_T MOS width on the performance of MT-CMOS (horizontal axis: high-V_T transistor gate width, normalized).
The measured delay as a function of the supply voltage for a 2-input NAND gate with FO = 3 and a wiring load of 1 mm (0.25 pF) is shown in Fig. 4.114. The technology is 0.5-µm CMOS with low VTn = 0.25 V, low VTp = -0.35 V, high VTn = 0.55 V and high VTp = -0.65 V. The MT-CMOS logic has almost the same speed as the full low-VT logic. At 1 V, the logic delay time is reduced by 70% compared with that of the high-VT logic.
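To get a rough feel for the sleep-mode leakage reduction described above, the sketch below uses the common first-order model in which subthreshold current scales as W·10^(-VT/S), S being the subthreshold swing. The threshold values are the NMOS numbers quoted for the 0.5-µm technology; the 85 mV/decade swing, the equal swing for both devices, and the helper name `subthreshold_reduction_factor` are illustrative assumptions, not data from the measured chip.

```python
def subthreshold_reduction_factor(vt_low, vt_high, swing_mv_per_decade, width_ratio=1.0):
    """Approximate ratio I_sub(low-VT logic) / I_sub(high-VT sleep switch).

    Assumes the simple exponential model I_sub ~ W * 10**(-VT / S) with the
    same subthreshold swing S for both devices (an illustrative assumption).
    width_ratio: total width of the leaking low-VT devices divided by the
    width of the high-VT switch that limits the sleep-mode current.
    """
    swing_v = swing_mv_per_decade * 1e-3
    return width_ratio * 10 ** ((vt_high - vt_low) / swing_v)


if __name__ == "__main__":
    # NMOS thresholds quoted for the 0.5-um MT-CMOS technology.
    factor = subthreshold_reduction_factor(0.25, 0.55, 85.0)
    print(f"sleep-mode leakage reduced by roughly {factor:.0f}x")
```

Because the sleep switch is sized much wider than a single logic cell but shared by several gates, `width_ratio` can be near or below 1 in practice, which scales the exponential factor down accordingly.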
To hold the output level during the sleep mode, a level holder is necessary, as shown in Fig. 4.115. It consists only of cross-coupled inverters with high-VT devices powered from the power supply VDD.
Figure 4.115 CMOS gate with level holder.
The low-VT devices are not the only source of static power dissipation; several other issues contribute to the increase of static power. Here are some circuit design guidelines to reduce the static power dissipation:
• Avoid the use of pseudo-NMOS circuits in your design.
• Avoid the use of TTL-compatible I/O, or devise low-DC-current level converters.
• Do not use low-VT devices in the I/O buffers; otherwise the DC power increases remarkably, because the MOS transistors of the I/O buffers have large sizes. If you have no other option, then use the subthreshold reduction techniques.
4.10.2 Low Dynamic Power Techniques

ASIC and VLSI processor clock frequencies are improving rapidly, reaching the sub-GHz range [21, 56]. The power dissipation of CMOS digital circuits operating at these high frequencies increases drastically, and it can become the main performance-limiting factor. Therefore, low-power circuit techniques are needed to reduce the dynamic power of digital circuits. Moreover, low chip power consumption is extremely important for extending the battery life of portable systems [57].
In general, the dynamic power dissipation of a gate i is given by

$$P_{d,i} = \alpha_i\, C_i\, V_s\, V_{DD}\, f \qquad (4.146)$$

where α_i is the gate activity, V_s is the voltage swing, C_i is the load and parasitic capacitance, and f is the operating frequency of the system. Equation (4.146) shows that there are several ways to reduce P_d:
1. Reduce the power supply voltage. Scaling VDD from 3.3 V to 1 V results in a power reduction factor of about 11, since for full-swing logic (V_s = VDD) the power scales as (3.3/1)² ≈ 10.9. However, this approach leads to speed degradation for a given technology. But if device scaling is applied in a next-generation technology, the delay improves, and hence so does the operating frequency. In a complex digital system, local supply reductions can be used for non-critical circuits.
2. Temporarily reduce the clock frequency of unused blocks on a VLSI chip using an on-chip power management unit, or reduce the gate activity. These can be done at the architectural level.
3. Reduce the output capacitance C_i. As a first-order approximation, this capacitance is composed of the interconnect capacitance C_int and the total input capacitance of the driven gates C_in. The latter can be reduced using a low-input-capacitance logic family [60] such as CPL-like logic. Also, using minimum-size logic gates in non-critical parts of the design can reduce the dynamic power significantly.
When C_int dominates, as in buses and high-capacitance interconnections (inter-block wires), circuit techniques based on low-swing signals, while maintaining the power supply voltage, can reduce the power dissipation [58, 59]. With increasing chip dimensions and integration density, wire capacitances will dominate. The power dissipation associated with the buses and interconnections in future ULSI chips is expected to reach half of the total power dissipation [58].
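As a quick numerical illustration of Eq. (4.146), the sketch below compares full-swing supply scaling with a low-swing bus that keeps VDD. Only the formula comes from the text; the helper name `dynamic_power` and all activity, capacitance and frequency values are hypothetical.

```python
def dynamic_power(alpha, c_load, v_swing, v_dd, freq):
    """Eq. (4.146): P_d = alpha * C * V_swing * V_DD * f, in watts.

    Each transition draws a charge C * V_swing from the V_DD rail.
    """
    return alpha * c_load * v_swing * v_dd * freq


# 1) Supply scaling of a full-swing gate (V_swing = V_DD); hypothetical
#    gate with activity 0.2, 100 fF total capacitance, 100 MHz clock.
p_3v3 = dynamic_power(0.2, 100e-15, 3.3, 3.3, 100e6)
p_1v0 = dynamic_power(0.2, 100e-15, 1.0, 1.0, 100e6)
print(f"3.3 V: {p_3v3 * 1e6:.2f} uW, 1 V: {p_1v0 * 1e6:.2f} uW, "
      f"ratio {p_3v3 / p_1v0:.1f}")

# 2) Low-swing bus at a fixed V_DD = 3.3 V: hypothetical 2 pF line,
#    activity 0.25, 100 MHz, full swing versus a 0.5 V swing.
bus_full = dynamic_power(0.25, 2e-12, 3.3, 3.3, 100e6)
bus_low = dynamic_power(0.25, 2e-12, 0.5, 3.3, 100e6)
print(f"bus: {bus_full * 1e3:.3f} mW -> {bus_low * 1e3:.3f} mW, "
      f"saving {bus_full / bus_low:.1f}x")
```

Scaling VDD for a full-swing gate gives the quadratic factor of about 11 mentioned above, whereas reducing only the swing at a fixed VDD gives a linear saving of VDD/V_s (6.6 with these assumed numbers).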
These are some guidelines for the design of low-dynamic-power circuits:
• Choose the technology that has low junction and oxide capacitances for the same performance.
• Avoid, if possible, the dynamic logic design style.
• For any logic design, reduce the switching activity by logic reordering and by balancing delays through the gate tree to avoid glitching.
• Use a low-input-capacitance logic family.
• In non-critical paths, use minimum-size devices whenever possible without degrading the overall performance requirements.
• If a pass-transistor logic style is used, careful design should be considered.
4.11 ADIABATIC COMPUTING

As discussed in Section 4.3.2, the energy provided by the supply to charge and discharge a load C_L of a driver is

$$E = C_L V^2 \qquad (4.147)$$
where V is the power supply voltage, as shown in Fig. 4.116(a). Half of this energy is dissipated in the resistance of the pull-up PMOS device during the charging phase; a similar argument applies to the resistance of the pull-down NMOS transistor during discharge. This analysis is valid when a step power supply voltage V is applied to the network. From Fig. 4.116(b), the voltage drop V_R across the resistor R_P varies from V (the supply voltage) to zero. Since the output voltage is V - V_R, the delivered charge is dQ = C_L d(V - V_R) = -C_L dV_R, and the energy dissipated by R_P is

$$E_R = \int V_R\, dQ = \int_{V}^{0} V_R\, C_L\, d(V - V_R) = C_L \int_{0}^{V} V_R\, dV_R \qquad (4.148)$$

then

$$E_R = \frac{1}{2} C_L V^2 \qquad (4.149)$$

$$E_R = C_L V \bar{V}_R \qquad (4.150)$$

where V̄_R = V/2 is the average voltage drop across the resistor of the pull-up PMOS.
If the power supply voltage rises in two half steps, as shown in Fig. 4.116(c), the energy dissipated by the resistor is

$$E_R = \frac{1}{4} C_L V^2 \qquad (4.151)$$

So less energy is dissipated by the resistor when the average voltage drop across it is reduced, while the swing and load capacitance are kept constant. This is the principle of adiabatic switching [61, 62, 63].
For a multi-step power supply voltage, as shown in Fig. 4.116(d), the total energy dissipated is given by [61]

$$E = \frac{C_L V^2}{N} = \frac{E_{conventional}}{N} \qquad (4.152)$$

and the energy dissipated by the resistor during charging is

$$E_R = \frac{1}{2}\,\frac{C_L V^2}{N} \qquad (4.153)$$

where N is the number of uniformly distributed voltage steps. Fig. 4.117 shows an example of a driver with uniformly distributed supplies that are switched in succession. The voltage V_i is given by

$$V_i = \frac{i}{N}\,V, \qquad i = 0, 1, \ldots, N$$
To charge the load, V_1 through V_N are connected to the load in succession (by closing switch 1, opening switch 1, closing switch 2, etc.). To discharge the load, V_{N-1} through V_1 are switched in the same way, and finally switch 0 is closed, connecting the output to ground. Note that the multi-step supply needs a longer time period than the conventional case to charge up the load capacitance. This technique has been used for large loads.
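The staircase result of Eq. (4.153) can be checked numerically. The sketch below is a minimal model under stated assumptions: ideal switches, a single series resistance, and each step held for ten RC time constants; the helper name `stepwise_charge_energy` is hypothetical. While the supply is constant, the resistor voltage decays exponentially, so the energy it dissipates over a step has the exact closed form used in the loop.

```python
import math


def stepwise_charge_energy(c_load, v_final, n_steps, settle_taus=10.0):
    """Energy dissipated in the series resistance while charging c_load from
    0 to v_final through n_steps equal supply steps V_i = i * v_final / n_steps.

    With the supply constant, v_R(t) = v_R0 * exp(-t / RC), so the energy
    dissipated over one step is C/2 * (v_R0**2 - v_R_end**2); R cancels out.
    """
    v_out = 0.0
    energy = 0.0
    for i in range(1, n_steps + 1):
        v_supply = i * v_final / n_steps
        v_r_start = v_supply - v_out
        v_r_end = v_r_start * math.exp(-settle_taus)  # residual after settling
        energy += 0.5 * c_load * (v_r_start ** 2 - v_r_end ** 2)
        v_out = v_supply - v_r_end
    return energy


if __name__ == "__main__":
    c_load, v = 1e-12, 1.0  # hypothetical 1 pF load charged to 1 V
    for n in (1, 2, 4, 8):
        e = stepwise_charge_energy(c_load, v, n)
        print(f"N = {n}: E_R / (C_L V^2) = {e / (c_load * v ** 2):.4f}")
```

The printed ratios approach 1/(2N), i.e., 0.5 for the conventional step and 0.25 for two half steps, matching Eqs. (4.149), (4.151) and (4.153).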
Another variation is to use a supply voltage with a ramp form [62]. In this case, the energy is drastically reduced if a long ramp time is used. For the inverter, for example, pulsed power supplies (PPS) are applied to the circuit. Adiabatic computing becomes attractive only when the delay is not critical, because this technique trades energy for delay. The energy-delay product of adiabatic circuits is much worse than that of conventional CMOS gates [64].
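For a ramp-shaped supply, a standard first-order RC analysis (our own derivation, not taken from [62]) gives a closed form for the energy dissipated in the series resistance. For a ramp much longer than RC it tends to (RC/T)·C_L·V², which makes the energy-for-delay trade-off explicit. The function name and component values below are hypothetical.

```python
import math


def ramp_charge_energy(c_load, r, v_final, t_ramp):
    """Energy dissipated in series resistance r when the supply ramps linearly
    from 0 to v_final in t_ramp seconds and is then held at v_final.

    First-order RC result, including the residual settling after the ramp:
        E = C V^2 * x * (1 - x * (1 - exp(-T/RC))),  with x = RC / T.
    """
    tau = r * c_load
    x = tau / t_ramp
    one_minus_exp = -math.expm1(-t_ramp / tau)  # 1 - exp(-T/RC), accurate for small T/RC
    return c_load * v_final ** 2 * x * (1.0 - x * one_minus_exp)


if __name__ == "__main__":
    c_load, r, v = 1e-12, 1e3, 1.0  # hypothetical: 1 pF charged through 1 kOhm
    tau = r * c_load
    for t_ramp in (0.01 * tau, tau, 10 * tau, 100 * tau):
        e = ramp_charge_energy(c_load, r, v, t_ramp)
        print(f"T = {t_ramp / tau:6.2f} RC: E / (C V^2) = {e / (c_load * v ** 2):.4f}")
```

A very fast ramp recovers the conventional C_L·V²/2, while a ramp of 100 RC dissipates only about 1% of C_L·V², at the cost of a proportionally longer switching time.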
4.12 CHAPTER SUMMARY

This chapter has provided an introduction to low-power CMOS design. The power dissipation components of a CMOS gate have been discussed, and techniques to reduce the different components at the physical and circuit levels were presented. Novel CMOS design styles such as CPL, DPL, and SRPL were examined. Several issues in CMOS circuit design, such as clock distribution and ground bouncing, were reviewed. This chapter represents a base for Chapters 6, 7, and 8, where subsystems and low-power architectures are discussed.
REFERENCES
[1] N. H. E. Weste and K. Eshraghian, "Principles of CMOS VLSI Design: A Systems Perspective," second edition, Addison-Wesley, Reading, MA, 1993.
[2] J. P. Uyemura, "Circuit Design for CMOS VLSI," Kluwer Academic Publishers, Norwell, MA, 1992.
[3] M. I. Elmasry, "Digital MOS Integrated Circuits II," IEEE Press Book, 1993.
[4] R. M. Swanson and J. D. Meindl, "Ion-Implanted Complementary MOS Transistors in Low-Voltage Circuits," IEEE J. Solid-State Circuits, vol. 7, no. 2, pp. 146-153, April 1972.
[5] H. J. M. Veendrick, "Short-Circuit Dissipation of Static CMOS Circuitry and Its Impact on the Design of Buffer Circuits," IEEE J. Solid-State Circuits, vol. 19, no. 4, pp. 468-473, August 1984.
[6] S. M. Kang, "Accurate Simulation of Power Dissipation in VLSI Circuits," IEEE J. Solid-State Circuits, vol. 21, no. 5, pp. 889-891, October 1986.
[7] G. J. Fisher, "An Enhanced Power Meter for SPICE2 Circuit Simulation," IEEE Trans. Computer-Aided Design, vol. 7, pp. 641-643, May 1988.
[8] G. Y. Yacoub and W. H. Ku, "An Enhanced Technique for Simulating Short-Circuit Power Dissipation," IEEE J. Solid-State Circuits, vol. 24, no. 3, pp. 844-847, June 1989.
[9] N. Meijs and J. T. Fokkema, "VLSI Circuit Reconstruction from Mask Topology," Integration, vol. 2, no. 2, pp. 85-119, 1984.
[10] D. V. Heinbuch, "CMOS3 Cell Library," Addison-Wesley, Reading, MA, 1988.
[11] R. J. Landers and S. Mahant-Shetti, "Multiplexer-Based Architecture for High-Density, Low-Power Gate Arrays," in Symposium on VLSI Circuits, Tech. Dig., Honolulu, pp. 33-34, June 1994.
[12] M. I. Elmasry, "Digital MOS Integrated Circuits I," IEEE Press Book, 1981.
[13] R. H. Krambeck, C. M. Lee and H.-F. S. Law, "High-Speed Compact Circuits with CMOS," IEEE J. Solid-State Circuits, vol. 17, no. 3, pp. 614-619, June 1982.
[14] V. Friedman and S. Liu, "Dynamic Logic CMOS Circuits," IEEE J. Solid-State Circuits, vol. 19, no. 2, pp. 263-266, April 1984.
[15] N. F. Goncalves and H. J. De Man, "NORA: A Race Free Dynamic CMOS Technique for Pipelined Logic Structures," IEEE J. Solid-State Circuits, vol. 18, no. 3, pp. 261-266, June 1983.
[16] C. M. Lee and E. W. Szeto, "Zipper CMOS," IEEE Circuits and Devices Mag., vol. 2, no. 3, pp. 10-17, May 1986.
[17] N. Weste and K. Eshraghian, "Principles of CMOS VLSI Design: A Systems Perspective," Addison-Wesley, Reading, MA, 1985.
[18] F. Lu and H. Samueli, "A 200-MHz CMOS Pipelined Multiplier-Accumulator Using a Quasi-Domino Dynamic Full-Adder Cell Design," IEEE J. Solid-State Circuits, vol. 28, no. 2, pp. 123-132, February 1993.
[19] J. Yuan and C. Svensson, "High-Speed CMOS Circuit Technique," IEEE J. Solid-State Circuits, vol. 24, no. 1, pp. 62-71, February 1989.
[20] M. Afghahi and C. Svensson, "A Unified Single-Phase Clocking Scheme for VLSI Systems," IEEE J. Solid-State Circuits, vol. 25, no. 1, pp. 225-233, February 1990.
[21] D. W. Dobberpuhl et al., "A 200-MHz 64-b Dual-Issue CMOS Microprocessor," IEEE J. Solid-State Circuits, vol. 27, no. 11, pp. 1555-1567, November 1992.
[22] H. B. Bakoglu, "Circuits, Interconnects, and Packaging for VLSI," Addison-Wesley, Reading, MA, 1990.
[23] K. Yano et al., "A 3.8-ns CMOS 16x16 Multiplier Using Complementary Pass-Transistor Logic," IEEE J. Solid-State Circuits, vol. SC-25, no. 2, pp. 388-394, April 1990.
[24] M. Suzuki et al., "A 1.5-ns 32-b CMOS ALU in Double Pass-Transistor Logic," IEEE J. Solid-State Circuits, vol. SC-28, no. 11, pp. 1145-1151, November 1993.
[25] A. Parameswar, H. Hara, and T. Sakurai, "A High-Speed, Low-Power, Swing Restored Pass-Transistor Logic Based Multiply and Accumulate Circuit for Multimedia Applications," IEEE Custom Integrated Circuits Conference, Tech. Dig., San Diego, CA, pp. 278-281, May 1994.
[26] L. A. Glasser and D. W. Dobberpuhl, "The Design and Analysis of VLSI Circuits," Addison-Wesley, Reading, MA, 1985.
[27] T. Kobayashi et al., "A Current-Controlled Latch Sense Amplifier and a Static Power-Saving Input Buffer for Low-Power Architecture," IEEE J. Solid-State Circuits, vol. SC-28, no. 4, pp. 523-527, April 1993.
[28] M. S. J. Steyaert et al., "ECL-CMOS and CMOS-ECL Interface in 1.2-µm CMOS for 150-MHz Digital ECL Data Transmission Systems," IEEE J. Solid-State Circuits, vol. SC-26, no. 1, pp. 18-24, January 1991.
[29] C. Mead and L. Conway, "Introduction to VLSI Systems," Addison-Wesley, Reading, MA, 1980.
[30] N. C. Li, G. L. Haviland and A. A. Tuszynski, "CMOS Tapered Buffer," IEEE J. Solid-State Circuits, vol. SC-25, no. 4, pp. 1005-1008, August 1990.
[31] M. Nemes, "Driving Large Capacitances in MOS LSI Systems," IEEE J. Solid-State Circuits, vol. SC-19, no. 1, pp. 159-161, February 1984.
[32] N. Hedenstierna and K. O. Jeppson, "CMOS Circuit Speed and Buffer Optimization," IEEE Trans. Computer-Aided Design, vol. CAD-6, no. 2, pp. 276-281, March 1987.
[33] A. J. Al-Khalili, Y. Zhu and D. Al-Khalili, "A Module Generator for Optimized CMOS Buffers," IEEE Trans. Computer-Aided Design, vol. CAD-9, no. 10, pp. 1028-1046, October 1990.
[34] S. R. Vemuru and A. R. Thorbjornsen, "Variable-Taper CMOS Buffer," IEEE J. Solid-State Circuits, vol. SC-26, no. 9, pp. 1265-1269, September 1991.
[35] J. Burkis, "Clock Tree Synthesis for High Performance ASICs," in IEEE ASIC Intern. Conf. and Exhibit, Rochester, NY, pp. PS-8.1-PS-8.3, September 1991.
[36] P. D. Ta and K. Do, "A Low-Power Clock Distribution Scheme for Complex IC System," in IEEE ASIC Intern. Conf. and Exhibit, Rochester, NY, pp. P1-5.1-P1-5.4, September 1991.
[37] H. Kojima, S. Tanaka, and K. Sasaki, "Half-Swing Clocking Scheme for 75% Power Saving in Clocking Circuitry," Symposium on VLSI Circuits, Tech. Dig., Honolulu, pp. 23-24, June 1994.
[38] J. S. Caravella and J. H. Quigley, "Three Volt to Five Volt Interface Circuit with Device Leakage Limited DC Power Dissipation," in IEEE ASIC Intern. Conf. and Exhibit, Rochester, NY, pp. 448-451, September 1993.
[39] M. Shoji, "CMOS Digital Circuit Technology," Prentice Hall Inc., Englewood Cliffs, NJ, 1988.
[40] F. Abu-Nofal et al., "A Three-Million-Transistor Microprocessor," in IEEE International Solid-State Circuits Conf., pp. 108-109, February 1992.
[41] T. Gabara and D. Thompson, "Ground Bounce Control in CMOS Integrated Circuits," in IEEE International Solid-State Circuits Conf., pp. 88-89, February 1988.
[42] T. Gabara, "Ground Bounce Control and Improved Latch-up Suppression Through Substrate Conduction," IEEE J. Solid-State Circuits, vol. 23, no. 5, pp. 1224-1232, October 1988.
[43] M. Hashimoto and O.-K. Kwon, "Low dI/dt Noise and Reflection Free CMOS Signal Driver," in IEEE Custom Integrated Circuits Conf., Tech. Dig., pp. 14.4.1-14.4.4, 1989.
[44] T. Wada, M. Eino and K. Anami, "Simple Noise Model and Low-Noise Data-Output Buffer for Ultra-High-Speed Memories," IEEE J. Solid-State Circuits, vol. 25, no. 6, pp. 1586-1588, December 1990.
[45] R. Senthinathan and J. L. Prince, "Application Specific CMOS Output Driver Circuit Design Techniques to Reduce Simultaneous Switching Noise," IEEE J. Solid-State Circuits, vol. 28, no. 12, pp. 1383-1388, December 1993.
[46] T. Knight and A. Krymm, "A Self-Terminating Low-Voltage-Swing CMOS Output Driver," IEEE J. Solid-State Circuits, vol. 23, no. 2, pp. 457-464, April 1988.
[47] H.-J. Schumacher, J. Dikken and E. Seevinck, "CMOS Subnanosecond True-ECL Output Buffer," IEEE J. Solid-State Circuits, vol. 25, no. 1, pp. 150-154, February 1990.
[48] M. Pedersen and P. Metz, "A CMOS to 100K ECL Interface Circuit," in IEEE International Solid-State Circuits Conf., Tech. Dig., pp. 226-227, February 1989.
[49] J. Martinen, "BTL Transceivers Enable High-Speed Bus Design," EDN, August 1992.
[50] B. Gunning, L. Yuan, T. Nguyen and T. Wong, "A CMOS Low-Voltage-Swing Transmission-Line Transceiver," in IEEE International Solid-State Circuits Conf., Tech. Dig., pp. 58-59, February 1992.
[51] J. A. Quigley, J. S. Caravella and W. J. Neil, "Current Mode Transceiver Logic (CMTL) for Reduced Swing CMOS, Chip to Chip Communication," in IEEE International ASIC Conference and Exhibit, Rochester, NY, Tech. Dig., pp. 452-457, September 1993.
[52] M. Kakumu, "Process and Device Technologies of CMOS Devices for Low-Voltage Operation," IEICE Trans. Electron., vol. E76-C, no. 5, pp. 672-680, May 1993.
[53] T. Kawahara et al., "Subthreshold Current Reduction for Decoded-Driver by Self-Reverse-Biasing," IEEE J. Solid-State Circuits, vol. 28, no. 11, pp. 1136-1144, November 1993.
[54] S. Mutoh et al., "1 V High-Speed Digital Circuit Technology with 0.5-µm Multi-Threshold CMOS," in IEEE International ASIC Conference and Exhibit, Rochester, NY, Tech. Dig., pp. 186-189, September 1993.
[55] M. Horiguchi et al., "SSI CMOS Circuit for Low-Standby Subthreshold Current Giga-Scale LSI's," IEEE J. Solid-State Circuits, vol. 28, no. 11, pp. 1131-1135, November 1993.
[56] R. W. Badeau et al., "A 100-MHz Macropipelined VAX Microprocessor," IEEE J. Solid-State Circuits, vol. 27, no. 11, pp. 1585-1597, November 1992.
[57] R. Brodersen, A. Chandrakasan and S. Sheng, "Design Techniques for Portable Systems," in IEEE International Solid-State Circuits Conf., Tech. Dig., pp. 168-169, February 1993.
[58] Y. Nakagome et al., "Sub-1-V Swing Internal Architecture for Future Low-Power ULSI's," IEEE J. Solid-State Circuits, vol. 28, no. 4, pp. 414-419, April 1993.
[59] A. Bellaouar, I. S. Abu-Khater, and M. I. Elmasry, "Low-Power CMOS/BiCMOS Drivers and Receivers for On-Chip Interconnects," IEEE J. Solid-State Circuits, vol. 30, no. 1, May 1995.
[60] A. Chandrakasan et al., "Low-Power CMOS Digital Design," IEEE J. Solid-State Circuits, vol. 27, no. 4, pp. 473-484, April 1992.
[61] L. J. Svensson and J. G. Koller, "Driving a Capacitive Load without Dissipating fCV²," IEEE Symposium on Low Power Electronics, Tech. Dig., San Diego, pp. 100-101, October 1994.
[62] T. Gabara, "Pulsed Power Supply CMOS - PPS CMOS," IEEE Symposium on Low Power Electronics, Tech. Dig., San Diego, pp. 98-99, October 1994.
[63] J. S. Denker, "A Review of Adiabatic Computing," IEEE Symposium on Low Power Electronics, Tech. Dig., San Diego, pp. 94-97, October 1994.
[64] M. Horowitz, T. Indermaur, and R. Gonzalez, "Low-Power Digital Design," IEEE Symposium on Low Power Electronics, Tech. Dig., San Diego, pp. 8-11, October 1994.
5
LOW-VOLTAGE VLSI BICMOS
CIRCUIT DESIGN
BiCMOS technology offers enhanced performance compared to CMOS at a 5 V power supply voltage. Many high-speed BiCMOS SRAMs, gate arrays, ASICs, etc. have been fabricated [1]. In this chapter, we present a variety of BiCMOS logic circuits suitable for 3.3 V and sub-3.3 V operation, and we identify the potential gates for digital applications. The chapter starts with the introduction of the conventional BiCMOS (totem-pole) gate, which is used in 5 V applications, and demonstrates the degradation of this gate with supply voltage scaling. In Section 5.2, we introduce the BiNMOS family, suitable for low-voltage applications. Other logic families for low power supply voltage operation are discussed in Section 5.3, where low-voltage digital applications of BiCMOS are identified. The reader is referred to BiCMOS books [2, 3] for more background on BiCMOS circuits.
5.1 CONVENTIONAL BICMOS LOGIC
In this section, the conventional BiCMOS logic family is introduced. This family has been used successfully in many applications at a 5 V power supply voltage. The reason for the speed advantage of BiCMOS over CMOS is explained, and the performance degradation of conventional BiCMOS at low voltage is shown.
The CMOS inverter of Fig. 5.1 suffers from limited current drive when the load capacitance is large. To increase the drive capability of CMOS, a bipolar driver can be added at the output of the CMOS inverter. Fig. 5.2 shows one possible configuration of what is called a conventional BiCMOS inverter. The addition of the bipolar driver stage to the basic CMOS inverter is responsible for the high current driving capability of BiCMOS over CMOS. As a result, BiCMOS offers lower delay than CMOS, especially at high load capacitance.
The operation of this gate is straightforward. When the input is low, the PMOS P is ON and its drain current turns transistor Q1 ON. The collector current of Q1 charges the output load capacitance. As the output reaches VDD - V_BE,on, where V_BE,on is the turn-on voltage of the bipolar transistor (about 0.7 V), Q1 gradually turns OFF. During this period, the NMOS transistor Nd2 is ON. Since Nd2 is conducting, Q2 is in the cutoff region. Transistor Nd2 can also be controlled by the output node. However, using the base node of Q1 results in faster operation, because the base of Q1 is pulled up faster than the output node and because the voltage level of the base node is larger. If the input is high, the NMOS transistors N and Nd1 are ON. Q1 is OFF while Q2 turns ON to discharge the output node, so the load capacitance is pulled down. As the output V_o reaches V_BE, transistor Q2 turns OFF and the output stays at this level. The conventional BiCMOS gate provides high drive capability, zero static power dissipation, and high input impedance. More discussion of this gate is given in the following sections.
Figure 5.2 Conventional BiCMOS inverter.
5.1.1 DC Characteristics
Fig. 5.3 shows the DC transfer characteristic of the conventional BiCMOS inverter of Fig. 5.2. When the input voltage to the BiCMOS inverter is zero, both bipolar transistors are OFF. The PMOS device P operates in the linear region with zero drain-source voltage. Due to the subthreshold current of transistor N (~10 pA), the base-emitter voltage of Q1 is around 0.45 V. As a result, the output voltage is V_o = 4.55 V (at VDD = 5 V). The base of the bipolar transistor Q2 is at zero voltage because Nd2 is ON.
As the input voltage increases, the subthreshold current of N increases, causing V_BE,Q1 to rise and the output voltage to fall. When the input voltage is around mid-VDD, both the P and N MOSFETs are ON and operate in the saturation region, and the bipolar devices are also ON. At this point, the BiCMOS inverter is in the high-gain region and the output voltage drops sharply toward its low level.
Figure 5.3 The DC transfer characteristic of the conventional BiCMOS inverter at 5 V.
As the input voltage increases further, the base of Q2 follows the voltage of the output, since N is ON. When the input voltage reaches VDD, the PMOS P is OFF. The discharge device Nd1 is ON and the base of Q1 is at zero. Also, the output is completely discharged and, since N is ON, the base of Q2 is at zero as well. In this case, the output voltage is zero and both base-emitter voltages are zero.
5.1.2 Transient Switching Characteristics

In this section we study the transient behavior of the conventional inverter of Fig. 5.2. The purpose of this analysis is threefold: i) to understand the transient switching behavior of the gate, ii) to develop a simple analytic model, and iii) to show the superiority of BiCMOS compared to CMOS. The objective of the delay analysis is to point out the important device and circuit parameters that affect the response of the gate. The developed model is very simple and can be used as a first-order approximation. We start with the analysis of the pull-up section and then show the difference in the case of the pull-down section. We assume a step input.
5.1.2.1 Transient Behavior
Fig. 5.4 shows the transient behavior of the BiCMOS inverter of Fig. 5.2. When the input falls to ground, transistor P turns ON and operates initially in the saturation region. Its drain current charges the parasitic capacitances at the base, and when V_BE,Q1 = V_BE,on, Q1 turns ON. The emitter current increases in a relatively short time to its peak to charge the output load C_L, as shown in Fig. 5.4(b). The output voltage is pulled up following the base voltage of Q1, as shown in Fig. 5.4(a). As the base of Q1 exceeds VTn, Nd2 turns ON to discharge the base of Q2 to ground, but due to capacitive coupling, V_B,Q2 tends to be pulled up. When the base voltage is higher than VDD - V_DS,sat, where V_DS,sat is the saturation voltage of P, the PMOS transistor P enters the linear region and the drain (base) current drops gradually. Consequently, the emitter current of Q1 starts falling. As the output voltage V_o approaches the theoretical limit of VDD - V_BE,on, Q1 is expected to turn gradually OFF. However, due to the capacitive coupling between the base and the output node, V_o exceeds this limit, as shown in Fig. 5.4(a). The same reasoning can be applied when the input rises to VDD.
5.1.2.2 Analytic Delay Model
A simple delay analysis is carried out in this section. The reader can refer to [4, 5, 6] for other, more detailed models. We take into account the parasitic capacitances and the bipolar high-current effects. We do not take into account the parasitic resistances, since they have no appreciable effect with advanced bipolar technology. This model is based on the model of [7].
Fig. 5.5 illustrates the transient equivalent circuit of the pull-up section (Fig. 5.2) of the conventional BiCMOS gate driving a load capacitance C_L. As we are interested in the 50% rise time, the PMOS current can be modeled by the saturation current of the device. This current is given by Equation (3.82) in Chapter 3:

$$I_{DS,sat} = K_p\, C_{ox}\, v_{sat,p}\, W_p \left(|V_{GS}| - |V_{Tp}|\right) \qquad (5.1)$$

where V_GS is equal to (V_in,L - VDD), and V_in,L is the low level of the input.
The capacitance C_P accounts for the parasitic capacitances of the MOS devices P, Nd1, and Nd2 at the base of the pull-up bipolar transistor. Therefore, it is given by

$$C_P = C_{d,P} + C_{d,N_{d1}} + C_{g,N_{d2}} \qquad (5.2)$$

where C_d,P and C_d,Nd1 are the drain junction capacitances of P and Nd1, and C_g,Nd2 is the gate oxide capacitance of Nd2. The overlap capacitances of P and Nd1 are assumed negligible. The bipolar parasitic capacitance C_pa of Fig. 5.5(a) is given by

$$C_{pa} = C_{C,Q_1} + C_{E,Q_1} \qquad (5.3)$$

Figure 5.5 panel label: Bipolar large-signal model.
The total load capacitance C_o, shown in Fig. 5.5(b), is given by

$$C_o = C_L + C_{S,Q_2} + C_{C,Q_2} \qquad (5.4)$$

where C_L is the external load capacitance, C_S,Q2 is the average collector-substrate capacitance of Q2, and C_C,Q2 is the average base-collector capacitance of Q2. Recall from Section 3.5.3 that the base-emitter diffusion capacitance is given by

$$C_D = \tau_F\,\frac{dI_{C,Q_1}}{dV_{BE,Q_1}} \qquad (5.5)$$

where τ_F is the forward transit time, subject to high-level effects.
The delay can be divided into three components:
1. The first component, t1, is defined as the time required to turn Q1 ON. The model of Fig. 5.5(a) can be used in this case. Writing the current equation at the base node of Q1, we have

$$(C_P + C_{pa})\,\frac{dV_{BE,Q_1}}{dt} = I_{DS,sat} \qquad (5.6)$$

Solving this equation and assuming that the initial base-emitter voltage of Q1 is zero, we have

$$t_1 = (C_P + C_{pa})\,\frac{V_{BE,on}}{I_{DS,sat}} \qquad (5.7)$$

If the initial V_BE is not zero, the above expression should be corrected. A typical value of t1 is 17.5 ps for a total parasitic capacitance at the base node of 50 fF, V_BE,on = 0.7 V, and I_DS,sat = 2 mA.
2. The second component, t2, is defined as the time required to charge the diffusion capacitance C_D,Q1. Starting from t1, the collector current begins to rise quickly and then reaches its peak value, I_CP, while the output voltage changes slowly (see the waveforms of Fig. 5.4). So t2 is defined as the time required for the collector current to reach its peak. This delay component is given by

$$t_2\, I_{DS,sat} = \tau_F\, I_{CP} \qquad (5.8)$$

which means that the charge furnished by the PMOS is needed to charge the diffusion capacitance. Therefore,

$$t_2 = \tau_F\,\frac{I_{CP}}{I_{DS,sat}} \qquad (5.9)$$
The peak collector current of Q1 can be approximated using Equation (3.111) [Section 3.5.2]. So we have

$$I_{CP} = \sqrt{\beta_0\, I_{KF}\, I_{DS,sat}} \qquad (5.10)$$

where β0 is the value of the gain for low-level injection and I_KF is the forward knee current. Note that τ_F is increased by the collector current [see Equation (3.127), Section 3.5.3]. Hence, an average value of the forward transit time should be used in the above delay expression. The initial value of τ_F is 12 ps, and it can reach 50 ps when the collector current reaches, for example, 5 mA. For I_DS,sat = 2 mA, a typical value of t2 is 78 ps (the average forward transit time is 31 ps).
3. The third component, t3, is defined as the time required to charge the total load capacitance to the middle point of the output swing. If we assume that the voltage across the base-emitter of Q1 is almost constant, the load is charged by the emitter current of Q1, and we have the following approximation

$$C_o\,\frac{dV_o}{dt} \approx I_{DS,sat} + I_{C,Q_1} \qquad (5.11)$$

If we assume that I_C,Q1 stays at its peak value I_CP during this time [see Fig. 5.4], and that the mid-point of the output is VDD/2, then we have

$$t_3 = \frac{C_o\, V_{DD}}{2\,(I_{DS,sat} + I_{CP})} \qquad (5.12)$$

The value of this delay varies by more than an order of magnitude depending on the device sizes and the load capacitance. For example, for a load C_L of 1 pF, this delay t3 has a typical value of 400 ps at a 5 V power supply voltage, while for a 100 fF load a typical value is 70 ps.
Hence, the total delay t_d can be written as

$$t_d = t_1 + t_2 + t_3 \qquad (5.13)$$

The first delay is associated with the parasitics at the base, the second with the forward transit time, and the last is a function of the load capacitance. For small loads, t1 and t2 dominate. However, for large output loads, the third delay term, t3, dominates.
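The three-term pull-up delay model can be evaluated directly. In this minimal sketch, t1 and t2 follow the relations given for the base-node charging and the diffusion charge, while t3 is taken as C_o·VDD/[2(I_DS,sat + I_CP)], the form consistent with the delay-load slope VDD/[2(I_DS,sat + I_CP)] stated later in this section. The base capacitance, V_BE,on, I_DS,sat and average τ_F are the text's typical values; I_CP = 5 mA, C_o = 1 pF and the function name are illustrative assumptions.

```python
def bicmos_pullup_delay(c_base, v_be_on, i_ds_sat, tau_f_avg, i_cp, c_out, v_dd):
    """First-order pull-up delay of the conventional BiCMOS inverter.

    t1: charge the base-node parasitics to V_BE,on with I_DS,sat
    t2: supply the diffusion charge tau_F * I_CP with I_DS,sat
    t3: charge C_o to V_DD/2 with (I_DS,sat + I_CP)  (assumed form)
    Returns (t1, t2, t3, td) in seconds.
    """
    t1 = c_base * v_be_on / i_ds_sat
    t2 = tau_f_avg * i_cp / i_ds_sat
    t3 = c_out * v_dd / (2.0 * (i_ds_sat + i_cp))
    return t1, t2, t3, t1 + t2 + t3


if __name__ == "__main__":
    # Text values: 50 fF, 0.7 V, 2 mA, average tau_F = 31 ps.
    # I_CP = 5 mA and C_o = 1 pF are illustrative choices.
    t1, t2, t3, td = bicmos_pullup_delay(50e-15, 0.7, 2e-3, 31e-12, 5e-3, 1e-12, 5.0)
    print(f"t1 = {t1 * 1e12:.1f} ps, t2 = {t2 * 1e12:.1f} ps, "
          f"t3 = {t3 * 1e12:.1f} ps, td = {td * 1e12:.1f} ps")
```

With these inputs t1 and t2 reproduce the 17.5 ps and roughly 78 ps quoted above, and t3 lands in the few-hundred-picosecond range quoted for a 1 pF load, so the load term clearly dominates.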
The expression for the pull-down time is similar to that of the pull-up time, except for the value of the drain current of transistor N [see Fig. 5.2]. The saturation current of this device is given by

$$I_{DS,sat} = K_n\, C_{ox}\, v_{sat,n}\, W_n \left(V_{GS} - V_{Tn}\right) \qquad (5.14)$$

The V_GS of the NMOS during switching is reduced by the V_BE drop, while that of the PMOS is not. This voltage is given by

$$V_{GS} = V_{in,H} - V_{BE} \qquad (5.15)$$

So the effective gate-source voltage of the NMOS is lower than that of the PMOS. The sizing of the NMOS and PMOS devices therefore does not follow the rule used for CMOS; it can only be determined from circuit simulation to obtain symmetrical rise/fall delay times.
The slope of the delay-load characteristic of the BiCMOS gate is smaller than that of CMOS, since it is equal to VDD/[2(I_DS,sat + I_CP)]. For a CMOS gate, the slope is simply VDD/(2 I_DS,sat). The saturation current in the CMOS inverter is slightly higher than that of BiCMOS, because the CMOS inverter has a slightly wider PMOS device (see the next section). However, the slope of the BiCMOS inverter is smaller due to the large I_CP. Therefore, the BiCMOS gate has a higher drivability than CMOS.
5.1.3 CMOS and BiCMOS Comparison

Let us compare the delay of a BiCMOS gate to that of a CMOS gate, both having the same input capacitance. We consider inverters with the following sizes. For the BiCMOS inverter, W_P = W_N = 10 µm, W_Nd1 = W_Nd2 = 2 µm, and the emitter area is twice the minimum area. For the CMOS inverter, W_P = 15 µm and W_N = 7 µm. For unloaded inverters, and from the delay expression of the BiCMOS inverter discussed above, t_d,CMOS < t_d,BiCMOS, because the BiCMOS circuit has more parasitics and requires an initial delay to turn ON the bipolar device. For large loads, t_d,CMOS > t_d,BiCMOS, as explained previously. Fig. 5.6 shows the simulated delays of the CMOS and BiCMOS inverters as a function of the fanout. Fanout is defined here as the ratio of the load seen by the gate to the input capacitance; in other words, it is the number of gates connected to the output of the driving gate, all having the same input capacitance. The inputs are driven by a small inverter of the same type to obtain typical input rise/fall times. For low fanout (1 to 2), CMOS outperforms BiCMOS at a 5 V power supply voltage. However, when the fanout is greater than 3, BiCMOS outperforms CMOS, particularly for high loads. In Fig. 5.6, the crossover capacitance (or fanout), denoted C_x, is typically on the order of 100 fF. This crossover value is critical for the performance of BiCMOS, particularly when the supply voltage is scaled down.
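The crossover behavior can be rationalized with linear delay-load lines: CMOS has a small intercept but a steep slope VDD/(2 I_DS,sat), while BiCMOS has a larger intercept (roughly t1 + t2) but a shallower slope VDD/[2(I_DS,sat + I_CP)]. The sketch below intersects the two lines; all intercepts and currents are hypothetical values chosen only to show the mechanism, not the simulated data of Fig. 5.6.

```python
def linear_delay(t0, v_dd, i_drive, c_load):
    """Linear delay-load model: t = t0 + C_load * V_DD / (2 * I_drive)."""
    return t0 + c_load * v_dd / (2.0 * i_drive)


def crossover_capacitance(t0_cmos, i_cmos, t0_bicmos, i_bicmos, v_dd):
    """Load at which the CMOS and BiCMOS delay-load lines intersect."""
    slope_cmos = v_dd / (2.0 * i_cmos)
    slope_bicmos = v_dd / (2.0 * i_bicmos)
    return (t0_bicmos - t0_cmos) / (slope_cmos - slope_bicmos)


if __name__ == "__main__":
    # Hypothetical: CMOS intercept 30 ps with I_DS,sat = 2.5 mA;
    # BiCMOS intercept t1 + t2 = 95 ps with I_DS,sat + I_CP = 7 mA; V_DD = 5 V.
    cx = crossover_capacitance(30e-12, 2.5e-3, 95e-12, 7e-3, 5.0)
    print(f"crossover load ~ {cx * 1e15:.0f} fF")
    for c in (20e-15, cx, 500e-15):
        t_cmos = linear_delay(30e-12, 5.0, 2.5e-3, c)
        t_bicmos = linear_delay(95e-12, 5.0, 7e-3, c)
        print(f"{c * 1e15:6.0f} fF: CMOS {t_cmos * 1e12:6.1f} ps, BiCMOS {t_bicmos * 1e12:6.1f} ps")
```

Lowering VDD shrinks I_CP relative to the BiCMOS intercept, which pushes C_x up; this is one way to see why the crossover value becomes critical under supply scaling.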
5.1.4 Power Dissipation

As discussed, the BiCMOS gste of Fig. 5.2 has no DC emrent path from VDD
to Vss if the input has rail-to-rail swing. Hence the static power dissipation is
negligible if VT of the MOS devices is high. The dynamic power dissipation of
the gate can be estimated from the circuit diagram of Fig. 5.7.
It is estimated by
Pa = C,iV%f + Cp2Vizms=f+ GVDD(VX- V L ) f (5.16)
-
The first term is due to the total parasitic capacitance at the base node of
Q1, where the swing is VDD. The second term is also due to the parasitic
capacitance, in this case at the base node of Q2. The swing at this node is limited to
VBE,max, which is reached when the collector current peaks. Finally, the third term is
related to the output load capacitance, CL, and the parasitic capacitance at the
output. The swing there is only VH - VL, where VH and VL are the high and
low levels of the output, respectively. These levels are affected by the output load.
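Eq. (5.16) is easy to evaluate once the node capacitances are known. The sketch below treats each term as described above: a node capacitance times VDD times that node's swing times f. The capacitances, VBE,max and output levels in the example are hypothetical values, not data from the text.

```python
def bicmos_dynamic_power(c_p1, c_p2, c_l, vdd, vbe_max, v_h, v_l, f):
    """First-order dynamic power (W) of the BiCMOS gate, following Eq. (5.16).

    Each term is C * VDD * (swing of that node) * f:
      c_p1 - parasitic capacitance at the base of Q1, swing VDD
      c_p2 - parasitic capacitance at the base of Q2, swing limited to VBE,max
      c_l  - output load plus output parasitic capacitance, swing VH - VL
    """
    p_base_q1 = c_p1 * vdd * vdd * f
    p_base_q2 = c_p2 * vdd * vbe_max * f
    p_output = c_l * vdd * (v_h - v_l) * f
    return p_base_q1 + p_base_q2 + p_output


# Hypothetical 5 V example: 20 fF at each base node, 100 fF output load,
# VBE,max = 0.9 V, output levels 4.2 V / 0.8 V, f = 100 MHz
p = bicmos_dynamic_power(20e-15, 20e-15, 100e-15, 5.0, 0.9, 4.2, 0.8, 100e6)
print(f"{p * 1e6:.0f} µW")  # about 229 µW, dominated by the output term
```

With these assumed numbers the output term (170 µW) dominates the two base-node terms (50 µW and 9 µW), which is why the load, rather than the extra bipolar parasitics, sets the power at large fanout.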
[Figure: x-axis "Equivalent load capacitance (fF)"]
For small loads, the power of BiCMOS is greater than that of CMOS, while
for large loads they have almost the same dynamic power. Table 5.1 shows
the simulation results of the power dissipation for both gates at a 5 V power
supply. At a fanout of 1, CMOS consumes much lower power than BiCMOS
and it is faster. However, at a fanout of 10, the BiCMOS is faster (37.5% delay
reduction) and it dissipates only 24% more power than CMOS.
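The two fanout-10 figures quoted above can be folded into a single power-delay product, a common figure of merit that the text itself does not use. This is a sketch that assumes the 24% power overhead and the 37.5% delay reduction refer to the same operating point.

```python
def relative_power_delay(power_overhead, delay_reduction):
    """BiCMOS/CMOS ratio of the power-delay product.

    power_overhead:  fractional extra power of BiCMOS (0.24 means +24 %)
    delay_reduction: fractional delay saving of BiCMOS (0.375 means -37.5 %)
    A result below 1 means BiCMOS has the better power-delay product.
    """
    return (1.0 + power_overhead) * (1.0 - delay_reduction)


ratio = relative_power_delay(0.24, 0.375)
print(f"BiCMOS/CMOS power-delay product at a fanout of 10: {ratio:.3f}")
```

The product is 1.24 × 0.625 ≈ 0.78, so at this fanout the speed gain more than pays for the extra power.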
When a BiCMOS gate is driving another BiCMOS gate or a CMOS gate, the driven
gate exhibits DC power dissipation. This DC current is not acceptable,
particularly when the circuit is in standby mode. It is caused by the reduced
swing at the output of the first gate. Fig. 5.8 shows an example of a BiCMOS
gate driving a CMOS gate. If, for example, the output of the first gate (BiCMOS)
is at VBE, the VGS of the driven NMOS would be higher than zero and close to
VT, resulting in appreciable DC power. Furthermore, the drive current of
the driven gate would be reduced, particularly at low power supply voltage.
Another disadvantage of the reduced swing is the reduction in noise margin.
Table 5.1 CMOS/BiCMOS power dissipation versus fanout at VDD = 5 V and
f = 100 MHz

Driver          Fanout=1   Fanout=5   Fanout=10
CMOS (mW)       0.23       0.58       1.02
BiCMOS (mW)     0.67       0.83       1.26
5.1.5 Full-Swing with Shunting Devices

Previously we have seen that BiCMOS circuits exhibit a reduced output swing.
To overcome this shortcoming, various types of BiCMOS gates have been devised.
These are based either on the conventional BiCMOS circuit with base-emitter
or collector-emitter shunting techniques, or on other logic circuits which will
be discussed in the following sections. Figure 5.9 shows some of the circuits
based on shunting devices. Fig. 5.9(a) illustrates one full-swing (FS) configuration,
called the "FS type" gate [8], which uses MOS devices to achieve full swing.
For the charging phase, as the output exceeds VH, Q1 ceases to source current
to the load, and the load capacitance is charged through the shunting PMOS
transistor Ps. When the input goes HIGH, the load is discharged through
N and Ns. When Vo falls below VL, Q2 ceases to sink current from the load
capacitance. Then the output is discharged to ground through only the
MOS transistors N and Ns. The final charging and discharging phases therefore occur
through the shunting devices. Hence, these phases can be slow because the
MOS shunting devices have low drive capability. When this FS BiCMOS
gate is operating at high frequency, the output swing can be reduced. Another
drawback of this circuit is that part of the current supplied by P (N) is
wasted through the shunting transistors, which weakens the bipolar drive. The
shunting transistors Ps and Ns can be minimum size.

Figure 5.8 DC power dissipation of the driven gate (Gate 1: BiCMOS; Gate 2: CMOS).
The problem of the base drive inherent in the "FS type" BiCMOS gate can
be overcome by using feedback (FB) from the output through an inverter, as
shown in Fig. 5.9(b). This circuit is called the "FB type" [9]. During the pull-up
transition, the shunting device Ps is initially OFF and the PMOS transistor
P supplies all its current to the base of Q1. When Vo is approaching its high
level, the inverter I turns ON Ps, which itself charges the output node to VDD.
The pull-down transition can be explained similarly. The shunting devices Ps
and Ns and the inverter I can be sized properly to achieve greater speed than
the other configurations, even the conventional BiCMOS gate.
Figure 5.9 Full-swing BiCMOS gate types: (a) "FS type"; (b) "FB type";
(c) "CE shunting" type (I: CMOS inverter).
Another full-swing configuration is the one shown in Fig. 5.9(c). It uses a
parallel inverter, driven from the input, to shunt the collector-emitter (CE) of the
Q1 and Q2 outputs. The disadvantage of this gate is the increased input capacitance.
5.1.6 Power Supply Voltage Scaling

The output bipolar stage introduces VBE voltage losses at the output node,
as discussed earlier. When a BiCMOS gate is driving another BiCMOS gate,
the conventional BiCMOS gate loses its superior performance over CMOS at
lower power supply voltage. The major cause of this problem is the pull-down
section of the BiCMOS gate. The VGS voltage of the driving NMOS transistor
of the pull-down section is equal to VDD - 2VBE. As VDD is reduced, VGS
is significantly reduced, resulting in degradation of the drain current, and hence of the
driving capability of the conventional BiCMOS gate. Fig. 5.10 shows the delay
of a BiCMOS inverter in comparison to that of a CMOS inverter as the supply voltage is
scaled down. The reported delay times were extracted from SPICE simulation
by measuring the delay of the second gate in a chain of identical inverters.
All gates were equally loaded by a load CL = 0.25 pF and a fanout of one. All
the circuits have the same input capacitance. The BiCMOS inverter fails to
operate at a 2 V power supply. The BiCMOS outperforms CMOS at higher supply
voltages, but at 3 V and below it loses its superior performance.
The limit of operation of the conventional BiCMOS gate with the power supply
voltage is determined by the NMOS device of the pull-down section. The drive
current of this NMOS device depends on its gate overdrive, (VDD - 2VBE - VTn).
Hence, VDD,min ≈ 2.2 V.
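The 2.2 V limit follows from setting the pull-down overdrive to zero, VDD,min = 2VBE + VTn. The sketch below assumes VBE ≈ 0.8 V and VTn ≈ 0.6 V; these typical values are assumptions, not numbers given in the text, but they reproduce the 2.2 V figure.

```python
def pulldown_overdrive(vdd, vbe, vtn):
    """Gate overdrive (V) of the pull-down NMOS in the conventional BiCMOS gate.

    Its VGS is VDD - 2*VBE, so the overdrive is VDD - 2*VBE - VTn.
    A value <= 0 means the NMOS can no longer drive the base of Q2.
    """
    return vdd - 2.0 * vbe - vtn


def vdd_min(vbe, vtn):
    """Supply voltage at which the pull-down overdrive vanishes."""
    return 2.0 * vbe + vtn


VBE, VTN = 0.8, 0.6  # assumed typical values, not from the text
print(f"VDD,min = {vdd_min(VBE, VTN):.1f} V")
for vdd in (5.0, 3.3, 2.5):
    print(f"VDD = {vdd} V: overdrive = {pulldown_overdrive(vdd, VBE, VTN):.2f} V")
```

The overdrive collapses from 2.8 V at 5 V to 1.1 V at 3.3 V and 0.3 V at 2.5 V, which is why the drive current, and with it the delay, degrades so sharply as the supply is scaled.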
Therefore, high-performance BiCMOS circuits are needed at low voltage that
minimize:

- Technology/process complexity;
- Circuit complexity, by using fewer devices;
- Area occupied by the gate; and
- Power dissipation.
5.2 BINMOS LOGIC FAMILY

BiCMOS technology can gain much of its performance edge over CMOS with
circuit techniques that minimize or eliminate the effects of VBE losses. To
overcome the problem of delay degradation in conventional BiCMOS as the supply
voltage is scaled, many novel circuits have been proposed. In this section, a practical family
suitable for the 3.3 V and sub-3.3 V operating regime is outlined.
Fig. 5.11 shows the BiNMOS family of BiCMOS circuits. The basic circuit
technique used in BiNMOS [10] is to use the NPN bipolar transistor only
in the pull-up section of the output stage [Fig. 5.11(a)]. The pull-down section
is kept as CMOS. In CMOS circuits, the PMOS transistor is two to three
times slower than an NMOS transistor of the same size. In the
BiNMOS circuit, the use of the PMOS together with the bipolar driver in the pull-up
section balances the asymmetrical response of CMOS.
In the basic circuit of Fig. 5.11(a), the output reaches only the VDD - VBE level.
This increases the delay and power dissipation of the subsequent gates. If a
resistor (in this case the gate is called BiRNMOS) or a grounded-gate PMOS
transistor is inserted between the emitter and the base of the pull-up bipolar
transistor, the output achieves full swing. However, this degrades the speed
of the gate, because the base current is partly bypassed by the inserted element and
is hence reduced.
Many alternatives, such as BiPNMOS [11] and PBiNMOS [12], have been proposed
to realize full-swing output. The BiPNMOS is shown in Fig. 5.11(c). A
small-size PMOS transistor and an inverter are added to the basic BiNMOS
gate. The PMOS device realizes full-swing output when the output changes
from low to high. The added PMOS turns ON only when the output reaches
the threshold voltage of the feedback inverter. Hence, the base current supplied
by the pull-up PMOS transistor is not affected by this added PMOS transistor.
Consequently, the BiPNMOS gate has higher performance than conventional
BiNMOS and BiRNMOS. One drawback of the BiPNMOS is the increased
output load capacitance due to the inverter I.
The PBiNMOS gate configuration, shown in Fig. 5.11(d), uses a small-size
PMOS device in parallel with the bipolar pull-up transistor to realize full-swing
output. This configuration results in better performance than the other
circuit structures, but it slightly increases the input capacitance of the gate. In
this section, we show that a properly optimized PBiNMOS gate is faster than
CMOS, even at low power supply voltage and low load.
5.2.1 BiNMOS Gate Design

In this section we discuss the effect of the circuit parameters available to the
designer for optimizing the PBiNMOS gate for fast low-fanout operation, using the
0.8 µm BiCMOS device parameters discussed in Chapter 3. We optimize the
design of the inverter; the technique can then be extended to more complex
gates.
Finding the proper sizing of the input MOSFETs P and N (Wp and Wn,
respectively) is not trivial. The sizing of Na and Ps [see Fig. 5.11(d)] is not
critical: for typical applications, it is enough to use near-minimum-size devices.
When the delay of the PBiNMOS is plotted versus the width of one of the
devices P or N for different fanouts, a common optimum width exists, as shown
in Fig. 5.12(a), with a flattened region. This optimum arises because
increasing the size increases the drivability of the gate, but the
equivalent output load also increases. At a certain size, therefore, an optimum delay
exists. From this figure, the optimum Wp is 9 µm and Wn = 11 µm (particularly
for low fanout). Note that in Fig. 5.12(a), we have chosen Wp = 0.8Wn. This
is explained in more detail below.
When the BiNMOS inverter is used as a driver of a fixed load (e.g., a bus) instead
of driving gates, we should consider the delay of the driver including
the delay of the stage that drives it. In Fig. 5.12(b), the total delay of the
PBiNMOS driver and the CMOS inverter that drives it is plotted for two fixed
loads: 0.2 pF and 0.5 pF. The CMOS stage has minimum size. The minimum
delay is around the point determined previously for the fanout case.
The choice of the emitter area in this gate depends on the technology and the
load. For the 0.8 µm BiCMOS at a 3.3 V power supply voltage, it was found that
using the minimum emitter area (AE = 0.8 × 4 µm²) gives the minimum
delay for the range of loads ≤ 1 pF.
Fig. 5.13 shows that the optimal Wp/Wn ratio is the same for different fanouts
and is equal to 0.8. This point also gives almost symmetrical fall/rise delays. So
even if the fanout is unknown, the optimum gate is fixed and the sizes depend
only on the device parameters. This result is very important for standard cells
and gate arrays, where the cells are designed for unknown loads.
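The sizes quoted for Fig. 5.12(a) follow from the optimal ratio once the total input width is fixed. This sketch assumes the sweep holds Wp + Wn = 20 µm (the constant-input-capacitance value annotated on Fig. 5.13) and splits it at Wp/Wn = 0.8.

```python
def split_widths(total_width_um, ratio_p_to_n):
    """Split a fixed input-gate width Wp + Wn into (Wp, Wn) for a given Wp/Wn.

    Holding Wp + Wn constant keeps the input capacitance roughly fixed,
    as in the Wp/Wn sweep of Fig. 5.13.
    """
    wn = total_width_um / (1.0 + ratio_p_to_n)
    wp = total_width_um - wn
    return wp, wn


wp, wn = split_widths(20.0, 0.8)
print(f"Wp = {wp:.2f} um, Wn = {wn:.2f} um")
```

The result, Wp ≈ 8.9 µm and Wn ≈ 11.1 µm, rounds to the 9 µm and 11 µm optimum read from Fig. 5.12(a).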
Figure 5.13 The delay of the PBiNMOS inverter versus the ratio Wp/Wn for
a fixed input capacitance (VDD = 3.3 V, Wp + Wn = 20 µm).

Figure 5.14 Comparison of the CMOS and PBiNMOS delays, for the same
input capacitance, as a function of the fanout.
5.2.2 CMOS and BiNMOS Comparison

Fig. 5.14 shows the delay of CMOS and PBiNMOS inverters as a function of the
fanout. Both gates have the same input capacitance. The important result of
this plot is that the PBiNMOS gate is always faster than CMOS, except at
a fanout of 1, where the two delays are nearly equal. For a fanout of 3, which is a
typical value in many designs, the delay is reduced by 20%. For higher fanouts,
the delay is reduced by 25-40%. This result is quite different from the case of
conventional BiCMOS, where a high fanout (or load) is required for BiCMOS to
outperform CMOS.
Let us compare the power dissipation of the gates for different fanouts. Table 5.2
shows this comparison for small fanouts. The power dissipations of both gates
are comparable and are nearly the same for fanouts above 3. The small additional
bipolar transistor in the BiNMOS gate does not result in a significant power dissipation
overhead. This result shows that the BiNMOS family is an excellent choice for
low-power and high-speed operation. However, for a fanout of 1-2, CMOS
can still be used.
Table 5.2 CMOS/PBiNMOS power dissipation versus fanout at VDD = 3.3 V,
f = 100 MHz

Driver          Fanout=2   Fanout=3   Fanout=5
CMOS (µW)       149        192        277
PBiNMOS (µW)    171        203        287
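The claim that the power overhead of PBiNMOS is small can be quantified directly from Table 5.2. The sketch below computes the relative overhead and the average energy per cycle (E = P/f) for each fanout. The only inputs are the table values and f = 100 MHz.

```python
# Values transcribed from Table 5.2 (VDD = 3.3 V, f = 100 MHz), in µW
FANOUTS = (2, 3, 5)
CMOS_UW = (149, 192, 277)
PBINMOS_UW = (171, 203, 287)
F_HZ = 100e6


def overhead_percent(p_ref, p_new):
    """Extra power of p_new relative to p_ref, in percent."""
    return 100.0 * (p_new - p_ref) / p_ref


def energy_per_cycle_pj(p_uw, f_hz):
    """Average energy per clock cycle, E = P / f, converted to picojoules."""
    return p_uw * 1e-6 / f_hz * 1e12


for fo, p_cmos, p_bi in zip(FANOUTS, CMOS_UW, PBINMOS_UW):
    print(f"fanout {fo}: PBiNMOS overhead {overhead_percent(p_cmos, p_bi):.1f} %, "
          f"CMOS {energy_per_cycle_pj(p_cmos, F_HZ):.2f} pJ/cycle, "
          f"PBiNMOS {energy_per_cycle_pj(p_bi, F_HZ):.2f} pJ/cycle")
```

The overhead shrinks from about 15% at a fanout of 2 to under 4% at a fanout of 5, consistent with the observation that the two gates dissipate nearly the same power above a fanout of 3.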
5.2.3 BiNMOS Logic Gates

Since the PBiNMOS is used extensively in 3.3 V digital integrated circuits,
some logic gates are presented. Combinational PBiNMOS logic circuits are
easily constructed using the basic PBiNMOS inverter of Fig. 5.11(d). Two-input
NOR and NAND gates are shown in Fig. 5.15(a) and Fig. 5.15(b).
The logic function is implemented using the PMOS and NMOS blocks, as in
CMOS technology. The bipolar device Q1 is used as a current driver. More
complex functions can be implemented using standard CMOS gate formation
theory. The layout of the PBiNMOS inverter is shown in Fig. 5.16. The
BJT consumes area in the PBiNMOS gate. However, when complex gates are
implemented with more MOS devices, the relative area overhead of the BJT is reduced.
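Because the logic function comes entirely from the complementary PMOS and NMOS blocks, the gates can be checked at the switch level exactly as in CMOS. The following logic-level sketch models each transistor as an ideal switch. It omits the bipolar device Q1, which affects speed but not function, and it is not a model of the circuits' electrical behaviour.

```python
from itertools import product


def nmos_on(gate):
    """An NMOS switch conducts when its gate is at logic 1."""
    return gate == 1


def pmos_on(gate):
    """A PMOS switch conducts when its gate is at logic 0."""
    return gate == 0


def evaluate(pull_up, pull_down):
    """Output of a static gate: 1 if only the pull-up conducts, 0 if only the pull-down."""
    if pull_up == pull_down:
        raise ValueError("networks are not complementary")
    return 1 if pull_up else 0


def nand2(a, b):
    # Pull-down: two NMOS in series. Pull-up: two PMOS in parallel.
    return evaluate(pmos_on(a) or pmos_on(b), nmos_on(a) and nmos_on(b))


def nor2(a, b):
    # Pull-down: two NMOS in parallel. Pull-up: two PMOS in series.
    return evaluate(pmos_on(a) and pmos_on(b), nmos_on(a) or nmos_on(b))


for a, b in product((0, 1), repeat=2):
    print(a, b, "NAND:", nand2(a, b), "NOR:", nor2(a, b))
```

The `evaluate` check makes the complementarity requirement explicit: for every input combination exactly one of the two blocks conducts, which is the property the standard CMOS gate formation rules guarantee.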
Figure 5.15 Circuit schematics of: (a) PBiNMOS NOR2; (b) PBiNMOS
NAND2.
One technique to reduce the area penalty of the BJT is to use merged N-well
bipolar and PMOS devices.
5.2.4 Power Supply Voltage Scaling

For future technologies, the power supply voltage will be scaled below 3.3 V.
Fig. 5.17 shows the delay of PBiNMOS and CMOS inverters for a fanout of 3
versus the power supply voltage. The reported delay times were extracted from
SPICE simulation by measuring the delay of the second gate in a
chain of identical inverters. In this case, the full-swing input of a
PBiNMOS inverter is provided by an identical gate, in which a shunting
PMOS is used. Fig. 5.17 shows that PBiNMOS is faster than CMOS down to
2.5 V. At 2.5 V the delay reduction is 15%. The crossover power supply voltage
between PBiNMOS and CMOS is around 2.15 V. Note that in this comparison
we used a 0.8 µm BiCMOS technology optimized for 5 V operation. To
compare BiNMOS and CMOS fairly at low voltage, a deep-submicron
technology should be used. From the device-level point of view, scaled technology
is expected to improve the performance of BiNMOS at low voltage. However,
2 V is the limit for the use of BiNMOS, since almost half of the swing at sub-2
V is provided by the weak shunting PMOS device.
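The 2 V limit can be made concrete with simple arithmetic. The NPN pull-up lifts the output only to about VDD - VBE, so the final VBE of the swing is left to the shunting PMOS, and that share grows as VDD shrinks. The sketch assumes VBE ≈ 0.8 V, a typical value that the text does not give.

```python
def shunt_swing_fraction(vdd, vbe=0.8):
    """Fraction of the PBiNMOS output rise left to the shunting PMOS.

    The NPN pull-up lifts the output only to about VDD - VBE; the last VBE
    is supplied by the small shunting PMOS. The default vbe = 0.8 V is an
    assumed typical value, not a number from the text.
    """
    if vdd <= vbe:
        raise ValueError("supply must exceed VBE")
    return vbe / vdd


for vdd in (3.3, 2.5, 2.0, 1.5):
    share = 100.0 * shunt_swing_fraction(vdd)
    print(f"VDD = {vdd} V: {share:.0f} % of the swing via the shunting PMOS")
```

Under this assumption the PMOS share rises from about a quarter of the swing at 3.3 V to 40% at 2 V and more than half at 1.5 V.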
[Layout figure legend: N-well, N-plug, N+ diff, P+ diff, gate, P-base, metal 1,
metal 2, contact, via, emitter]

In summary, the BiNMOS family provides the following advantages:

- Simple gate compared to other BiCMOS logic circuits;
- Good performance at the 3.3 V and 2.5 V power supply voltage generations,
  even at low fanout; and
- Needs only a simple BiCMOS process.

The only disadvantage of BiNMOS is its poor performance for sub-2 V operation.
The small area penalty of BiNMOS is not a problem, since for complex
gates the overhead of the bipolar device is minimized.
5.3 LOW-VOLTAGE BICMOS FAMILIES

In this section, several BiCMOS logic circuits proposed for low-voltage, high-speed
digital applications are reviewed [13]. Many of these circuits have not
been widely used in BiCMOS products. However, some of the logic circuits
presented in this section exhibit high performance at voltages down to 1 V.
For fast operation at low voltage, the full-swing operation should be realized
with bipolar devices; otherwise, techniques based on shunting devices do
not provide high drivability.
5.3.1 Merged and Quasi-Complementary BiCMOS Logic

In this section, two circuit techniques to overcome the shortcomings of the
conventional BiCMOS gate are discussed and compared. These gates are intended
for sub-3.3 V operation. They are also devised to avoid the problem
of using a PNP transistor (see the next section on Complementary BiCMOS).
In both circuits, the improvement is made mainly in the pull-down section
of the conventional BiCMOS, since that section is the major cause of speed degradation
at low voltage.
5.3.1.1 Merged BiCMOS (MBiCMOS)

To improve the performance of the pull-down section of the conventional BiCMOS
circuit under power supply scaling, a PMOS/NPN pull-down BiCMOS gate
has been proposed [14], as shown in Fig. 5.18. In this pull-down configuration,
a PMOS transistor P2 is used to drive the NPN bipolar transistor
Q2. The gate of the PMOS P2 is tied to the base of Q1. The CMOS inverter
formed by the transistors P1 and Nd1 supplies a rail-to-rail voltage swing to the
pull-down PMOS. Hence, the VGS voltage of the driving PMOS transistor is
not affected by the VBE loss, as it is in the case of conventional BiCMOS. This gate is
called Merged BiCMOS (MBiCMOS) because its PMOS and NPN devices can
potentially be merged.
The pull-up section is similar to the one in conventional BiCMOS. The operation
of the pull-down section is as follows. When the input is high, Nd1 pulls
the base of Q1 down to ground and P2 turns ON. The transistor P2 supplies the
base current to Q2. The bipolar transistor Q2 discharges the load capacitance
to a voltage equal to or lower than VBE,on.
Still, this structure suffers from the 2VBE losses. The only improvement in
MBiCMOS, compared to conventional BiCMOS, is the higher drive current of
the pull-down section. If the N-well of the pull-down PMOS transistor is tied
to the VDD rail, its threshold voltage will be degraded by the
body effect during the pull-down transient. As a result, the drivability of the
pull-down PMOS transistor is degraded. A simple solution to eliminate this
problem is to shunt the source and the substrate (N-well) of the PMOS transistor P2.
Figure 5.18 The MBiCMOS gate.
It was shown that this configuration (with shunted source/substrate) is faster
than its CMOS counterpart down to a 2.2 V supply voltage using sub-0.5 µm
BiCMOS technology [15, 16].
5.3.1.2 Quasi-Complementary BiCMOS
Another variation of the MBiCMOS is called "Quasi-complementary BiCMOS"
(QCBiCMOS) [17]. A "quasi-PNP" connection is formed in the pull-down section
of the conventional BiCMOS, as shown in Fig. 5.19. It consists of PMOS and
NPN transistors [Fig. 5.19(b)]. This configuration resembles the MBiCMOS
gate of Fig. 5.18. The QCBiCMOS has two attractive features. The first is
that the drain current of the pull-down section does not suffer from the 2VBE losses,
as it does in conventional BiCMOS. The second is that the pull-down
waveform is steep, due to the good charge retention capability of the bipolar
transistor. The feedback circuit formed by the two cross-coupled inverters, I1
and I2, permits the discharge of the base of the pull-down transistor immediately
after the pull-down transition.
The QCBiCMOS gate keeps its superiority over CMOS down to 2 V. At 2 V
it has better performance than the BiNMOS logic circuit. However, below 2 V
it loses its performance advantage. Furthermore, it consumes a large area and needs a
relatively large fanout to outperform CMOS.
5.3.2 Emitter Follower Complementary BiCMOS Circuits

Full-swing operation can also be achieved by using what is called Complementary
BiCMOS (CBiCMOS). The use of complementary BiCMOS has
been encouraged by recent advances in bipolar technology, which have led to
high-performance PNP transistors. It is expected that NPN and PNP
transistors will exhibit similar performance when the devices are scaled down
and the base doping increases. In this section, we study the emitter-follower
(EF) CBiCMOS.
Fig. 5.20 shows the use of a complementary bipolar output stage to form the
basic complementary BiCMOS circuit [18, 19]. The pull-up section is similar
to that of the conventional BiCMOS. The pull-down section is symmetrical to the
pull-up. The current of the NMOS transistor N does not suffer from the VBE reduction
due to Q2 that occurs in conventional BiCMOS. The static swing varies between VBE
and VDD - VBE. However, as explained in Section 5.1.2, the actual swing
might be larger than the static design value. The balanced transconductance of the
PMOS/NPN and NMOS/PNP pairs makes it easier to obtain symmetrical fall and
rise times. Hence, this circuit eliminates the degradation of the pull-down delay
with power supply voltage that affects the conventional BiCMOS.
Figure 5.20 Schematic of the basic CBiCMOS.
The gate of Fig. 5.20 can be modified to achieve full-swing operation by using
emitter-base shunting devices. Fig. 5.21(a) shows the EF CBiCMOS with the shunting
technique. The base-emitter shunting MOS transistors permit restoration
of the full logic level at the output. However, the full swing is still achieved
with the two slow MOS devices. Some of the base current can be consumed
by the shunting devices, which weakens the drive of Q1 and Q2. To overcome
this problem, the feedback technique can be used, as shown in the circuit of
Fig. 5.21(b): the turn-ON of the shunting devices is delayed by the feedback
inverter, I.
These CBiCMOS circuits have two drawbacks: poor performance at power supply
voltages of 2 V and below, and high processing cost because of the high-performance
PNP device needed. The poor performance at low voltage is due mainly
to the fact that 2VBE of the output swing is generated by the two shunting transistors.
5.3.3 Full-Swing Common-Emitter Complementary BiCMOS Circuits
So far, all the presented full-swing circuits, such as PBiNMOS, CBiCMOS,
MBiCMOS and QCBiCMOS, achieve the rail-to-rail swing by using resistors
or MOSFETs that operate in the linear region. These techniques are effective
only when the operating frequency is low, so that the gate can complete its full-swing
operation, and/or when the load capacitance is small [20]. Full-swing
circuits with full bipolar drive are needed. In this section, a CBiCMOS variation
suitable for sub-2 V operation, called Transient Saturation (TS), is presented.

Figure 5.21 Schematic of EF CBiCMOS gates with shunting devices.
Fig. 5.22 shows the basic common-emitter complementary BiCMOS (CE-CBiCMOS)
circuit. The circuit is symmetrical and has symmetrical fall
and rise times. When the input goes high, N turns ON to sink the current
from the base of the PNP transistor Q2. When the base voltage of Q2 falls to
VDD - VBE,on, Q2 turns ON to source current to the output load capacitance.
Q2 eventually saturates, and the output node is pulled up to VDD - VCE,sat. At
the end of charging, the MOS device is still consuming current. The operation
of the pull-down section can be explained similarly. Hence, the CE-CBiCMOS
is non-inverting, and the gate needs an extra CMOS inverter
at the input to achieve the complement function. In this circuit, the MOS
transistors operate in saturation; hence they supply a high current to the bipolar
transistors. Furthermore, the output has a near rail-to-rail swing (VCE,sat
to VDD - VCE,sat). This circuit offers high speed at low voltage, but has two
drawbacks: (i) high static power dissipation, due to the DC current flowing
through the base of either Q1 or Q2, and (ii) excess delay due to the slow
process of turning the saturated BJTs OFF.
Figure 5.22 Common-emitter CBiCMOS gate.
These two problems have been solved with several implementations [21, 22].
One possible implementation is shown in Fig. 5.23. It is called Transient
Saturation Full-Swing (TS-FS) BiCMOS. This logic uses the principle of the CE-CBiCMOS
described in Fig. 5.22. When the input falls, we assume that the
output is charged high; then P3 is ON. P2 turns ON and the base of Q1 is
charged through P2 and P3 [Fig. 5.23(b)]. Consequently, Q1 discharges the
output (load). When the output voltage approaches zero, the inverter
I1 turns P3 OFF and N4 ON [Fig. 5.23(c)]. The base voltage of Q1 falls
below VBE, causing it to turn OFF. Although Q1 saturates, this does not
slow the next pull-up transition, because the excess minority carriers of Q1
are discharged immediately after the pull-down operation. Thus, the bipolar
transistor saturates only transiently. The circuit is symmetrical; hence the operation
of the pull-up section can be explained similarly. The PMOS transistor P3
cuts off the DC current path during the pull-down transition to avoid any static
power dissipation. The small-size output latch, composed of two inverters,
holds the output level, because in steady state there is no path between
the output and the supply lines.
Compared to the BiCMOS logic circuits presented so far, TS-FS is faster below
a 2 V supply when the load is relatively large (~1 pF). At 1.5 V it is twice as
fast as CMOS for large loads. Although this circuit solves the problem of the speed
degradation of BiCMOS at a 1.5 V power supply, it still has several drawbacks:
process complexity due to the PNP bipolar transistor; large area; a relatively
high crossover point with CMOS (~0.4 pF); and it is a noninverting circuit.

Figure 5.23 (a) Circuit configuration of TS-FS BiCMOS; (b) and (c) transient
saturation operation for the pull-down section.
5.3.4 Bootstrapped BiCMOS

An alternative way to avoid the negative effect of the VBE loss in BiCMOS is simply
to use a second supply voltage equal to (VDD + VBE). However, this approach
is costly because of the additional wires needed to distribute it across the chip
and the need for the second supply voltage. Another approach is to use a
bootstrapping technique to pull the base of the pull-up bipolar transistor up to
(VDD + VBE), and hence the output up to VDD. The generation of voltages higher
than the power supply at the gate level adds an extra degree of freedom to
BiCMOS. Schottky BiNMOS/BiCMOS circuit configurations using
bootstrapping have been proposed to overcome the negative effect of the VBE loss [20].
The full-swing operation is performed by saturating the bipolar transistor of
the pull-up section with a base current pulse, after which the base is isolated
and bootstrapped to a voltage higher than VDD. These Schottky circuits
outperform all existing BiCMOS families in the low-voltage regime down to 2 V, but they
need a BiCMOS technology with a good integrated Schottky diode. Other examples
of such a technique are the bootstrapped BiCMOS circuits published in
[23, 24, 25]. The main advantage of the bootstrapped circuits is that they can
be realized in a conventional BiCMOS process with CMOS and NPN transistors
only. In this section, we present one bootstrapped circuit which overcomes
many drawbacks of the BiCMOS logic families discussed previously.
5.3.4.1 Basic Concept of Operation
The Bootstrapped Full-swing BiCMOS (BFBiCMOS) inverter is shown in Fig.
5.24. It consists only of CMOS and NPN transistors. Hence, it can be built in a
non-complementary BiCMOS technology. The pull-down circuitry is identical
to that of TS-FS and was explained previously. The operating principle of the
pull-up section can be explained as follows. When the input is high and the
output is low, the PMOS transistor Pd is ON. In this case, the base voltage of
Q1 is precharged to VTP, which is less than VBE,on but close to it. The precharge
PMOS transistor Pp is ON and charges the bootstrapped capacitor Cboot to the
level VDD (precharge cycle). When the input goes low and P1 turns ON, the
bipolar transistor Q1 turns ON almost instantaneously, because its base-emitter
junction is precharged near VBE,on. Consequently, the initial turn-on delay of
the pull-up section is reduced, which lowers the minimum fanout
required by BFBiCMOS to outperform CMOS. Once Q1 turns on, the output
node starts to charge the load capacitor CL toward VDD. Since Pp is OFF,
the node n1 is disconnected from VDD and is floating. Thus, as the output
voltage Vo rises to VDD, the voltage at node n1 also rises towards VDD + VBE,on
(bootstrapping cycle).
When the input is low, the precharge PMOS transistor Pp is turned OFF (almost
instantaneously) during the bootstrapping cycle to prevent the
bootstrapped node from discharging through reverse current from n1 to VDD. This is achieved
through the use of the pseudo-inverter formed by Pt and Nt. During the bootstrapping
cycle (the input is low), Pt turns ON and the gate of the precharge
transistor Pp is pulled up towards the voltage of n1. Thus, Pp is completely
OFF when the voltage at n1 exceeds VDD. Furthermore, the PMOS transistor
Pd is completely OFF because its gate is driven by the boosted voltage through
Pt.

Figure 5.24 The bootstrapped full-swing BiCMOS inverter (BFBiCMOS).
Compared to the Bootstrapped BiCMOS (BS-BiCMOS) [23] of Fig. 5.25, the
BFBiCMOS has several advantages. First, the bootstrapped capacitor is driven
by the output rather than by the input, as in the BS-BiCMOS. Second, in BS-BiCMOS
the gate of the precharge transistor Pp is driven to VDD and the node n1 to VDD + VBE;
hence, when VT is lower than VBE, the bootstrapped node leaks its charge,
resulting in less efficient bootstrapping. Third, a PMOS transistor Pd is used to
discharge the base to a precharged level VT, resulting in improved performance.
Furthermore, the BS-BiCMOS has a higher crossover capacitance and lower performance
than the BFBiCMOS.
Figure 5.25 The BS-BiCMOS inverter.
The simulated waveforms of the BFBiCMOS inverter at a 1.5 V power supply
are shown in Fig. 5.26. The base of Q1 goes to (VDD + VBE) when the input
is low. Note that when the input is high, the base voltage falls to VT.
5.3.4.2 Design Issues
As a first-order analysis, the minimum value of Cboot necessary for the
bootstrapping condition can be obtained as follows. During the precharge cycle,
the charge on the bootstrapped capacitor is VDD·Cboot, and the charge on Cp,
the parasitic capacitance on node n1, is VDD·Cp. The total charge on n1
during the precharge cycle is

$$Q_{n1} = V_{DD} C_{boot} + V_{DD} C_p \qquad (5.17)$$

In order for Vo to reach VDD, Vn1 must reach VDD + VBE,on (during the
bootstrapping cycle). Thus the charge on Cp is (VDD + VBE,on)·Cp, and the
charge on Cboot is VBE,on·Cboot. The new charge is given by

$$Q'_{n1} = V_{BE,on} C_{boot} + (V_{DD} + V_{BE,on}) C_p \qquad (5.18)$$

The charge necessary for the base is

$$Q_b = Q_{n1} - Q'_{n1} \qquad (5.19)$$

As an approximation, Qb can be given by

$$Q_b \approx I_b t_r \qquad (5.20)$$

where Ib is the average base current of Q1 and tr is the rise time of the output.
From Equations (5.17)-(5.20) we find that

$$C_{boot} \geq \frac{I_b t_r + V_{BE,on} C_p}{V_{DD} - V_{BE,on}}$$

This equation indicates that Cboot has to be increased as the power supply is
scaled down. When power supply scaling is accompanied by device scaling,
tr improves and, as a result, Cboot can be kept small. At 3.3 V, a typical value of
Cboot is 100 fF, while at 1.5 V, without technology scaling, it is 250 fF.
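The charge balance on node n1 can be evaluated numerically to see how the required bootstrap capacitance scales with the supply. In this sketch the average base current, rise time, parasitic capacitance and VBE,on are all hypothetical values, not figures from the text, so only the trend (not the absolute fF values) should be compared with the 100 fF and 250 fF quoted above.

```python
def c_boot_min(i_b, t_r, c_p, vdd, vbe_on):
    """Minimum bootstrap capacitance (F) from charge conservation on node n1.

    Precharge:  Q  = VDD*Cboot + VDD*Cp
    Bootstrap:  Q' = VBE*Cboot + (VDD + VBE)*Cp
    Base charge delivered: Q - Q' = (VDD - VBE)*Cboot - VBE*Cp >= Ib*tr
    """
    if vdd <= vbe_on:
        raise ValueError("VDD must exceed VBE,on")
    return (i_b * t_r + vbe_on * c_p) / (vdd - vbe_on)


# Hypothetical values: Ib = 1 mA average base current, tr = 200 ps,
# Cp = 20 fF parasitic on n1, VBE,on = 0.8 V (none of these come from the text).
for vdd in (3.3, 2.5, 1.5):
    c_min = c_boot_min(1e-3, 200e-12, 20e-15, vdd, 0.8)
    print(f"VDD = {vdd} V -> Cboot >= {c_min * 1e15:.0f} fF")
```

The 1/(VDD - VBE,on) dependence is what forces Cboot up at low supply, while the Ib·tr term in the numerator shows why a faster (scaled) device with a shorter rise time relaxes the requirement.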
The bootstrapped capacitance can be implemented using an NMOS transistor
with its source and drain connected together. In this case, the capacitance is
set by the gate area and the gate oxide thickness of the MOS transistor. Simulations
have shown that for a 1.5 V power supply voltage, the width and length of this
bootstrapped NMOS are 13 µm and 6 µm, respectively. A typical area
increase for a two-input NAND gate due to Cboot is 10%.
As shown in Fig. 5.24, the N-well of the PMOS devices Pp, P1 and Pt of the
BFBiCMOS inverter is connected to the bootstrapped node n1. This prevents
their source/drain-well junctions from turning ON during the bootstrapping cycle.
It also prevents any latch-up which might be caused by the parasitic SCR
when the drain/source-well junctions are forward-biased. The PMOS transistor
Pd also has its well connected to its source. This eliminates the body effect of
the transistor and prevents any leakage during the bootstrapping.
5.3.4.3 BiNMOS Configuration

Fig. 5.27 shows the BiNMOS version of the bootstrapped circuit. The pull-down
section uses an NMOS transistor (N1), as in CMOS.

The pull-up section of this BFBiNMOS configuration differs slightly from that of
BFBiCMOS in that a small-size PMOS transistor (Pf) is added. Without this
PMOS device, the base-emitter voltage of Q1 would be equal to VBE,on when
the output reaches VDD. For a low output load, if the input goes high, the
pull-down NMOS device N1 discharges the output faster than the PMOS transistor
Pd discharges the base. Thus, the bipolar transistor Q1 can turn ON and supply current to
the output, which results in a long fall delay. The added small-size PMOS
transistor Pf in the pull-up section solves this problem. Through
inverter I1, it sets the voltage of nodes n1 and B1 to VDD at the end
of the bootstrapping. Hence, the base-emitter voltage of Q1 is almost
zero at the end of the bootstrapping. When the base is discharged from
(VDD + VBE,on) to VDD by the PMOS Pf, inverter I2 holds the output level at
VDD. Without this inverter, the output would fall to a level equal to (VDD -
VBE) due to the base-emitter coupling capacitance. The simulated waveforms
of the different voltages are shown in Fig. 5.28.

Figure 5.27 Bootstrapped full-swing BiNMOS inverter (BFBiNMOS).
For an n-input gate implementation, the BFBiNMOS requires 4n input transistors, whereas the BFBiCMOS and the BS-BiCMOS require 5n and 6n input transistors, respectively. The crossover load capacitance is one of the important parameters in circuit comparison. It is a measure of the load at which BiCMOS circuits start to have a speed advantage over CMOS. In the range 1.2-3.3 V, BFBiCMOS/BFBiNMOS circuits require an equivalent minimum fanout of about 5. The BS-BiCMOS has a higher crossover capacitance.
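The crossover load can be made concrete with the usual first-order model in which each gate's delay grows linearly with load, t_d = t0 + k·C_L. The sketch below finds where the two lines intersect. The coefficients are hypothetical placeholders, not values from the text or from Table 5.3; they only encode the qualitative picture that BiCMOS has a larger intrinsic delay but a smaller load sensitivity.

```python
# Crossover load between a CMOS gate and a BiCMOS gate under a linear delay
# model t_d = t0 + k * C_L. All coefficient values are hypothetical.
from dataclasses import dataclass


@dataclass
class LinearDelay:
    t0_ps: float         # intrinsic (zero-load) delay, ps
    k_ps_per_fF: float   # load sensitivity, ps per fF

    def delay_ps(self, load_fF: float) -> float:
        return self.t0_ps + self.k_ps_per_fF * load_fF


def crossover_fF(cmos: LinearDelay, bicmos: LinearDelay) -> float:
    """Load above which the BiCMOS gate is faster than the CMOS gate."""
    if cmos.k_ps_per_fF <= bicmos.k_ps_per_fF:
        raise ValueError("BiCMOS must have the smaller load sensitivity")
    return (bicmos.t0_ps - cmos.t0_ps) / (cmos.k_ps_per_fF - bicmos.k_ps_per_fF)


cmos = LinearDelay(t0_ps=60.0, k_ps_per_fF=2.0)     # hypothetical
bicmos = LinearDelay(t0_ps=200.0, k_ps_per_fF=0.8)  # hypothetical
c_x = crossover_fF(cmos, bicmos)
print(f"crossover load ~ {c_x:.0f} fF")
```

Dividing the crossover capacitance by the input capacitance of one gate gives an equivalent minimum fanout, which is one way to relate it to the fanout figure quoted above.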
Figure 5.28 Voltage waveforms of the input (in), the output (out), and the base of Q1.
5.3.5 Comparison of BiCMOS Logic Circuits

In this section, a brief comparison of several BiCMOS logic circuits is presented using a generic 0.35 µm BiCMOS technology given in Table 5.3. For a more detailed comparison, the reader can refer to [25].
A two-input NAND gate configuration was chosen to evaluate and compare the performance of the circuits shown in Fig. 5.29. The logic families compared are: CMOS [Fig. 5.29(a)], PBiNMOS [Fig. 5.29(b)], TS-FS [Fig. 5.29(c)], BS-BiCMOS [Fig. 5.29(d)], BFBiNMOS [Fig. 5.29(e)], and BFBiCMOS [Fig. 5.29(f)]. The simulations were carried out using a chain of gates. The reported 50% delay times are those of an intermediate gate.

Table 5.3 Key device parameters of the 0.35 µm BiCMOS process
0.35 µm      0.35 µm
o a3pm       0.34 µm
4.9 mA       2.4 mA       (|VGS| = |VDS| = 3.3 V, W = 10 µm)
52 fF        73 fF
30 Ω         37 Ω
28 Ω         31 Ω
265 Ω        280 Ω
Table 5.4 shows the delay, the average power dissipation and the power-delay product of the different NAND gates at two supplies, 3.3 and 1.5 V. The simulation was carried out at a typical load capacitance of 1 pF.
The bootstrapped family consumes more power than CMOS because of the higher internal node capacitance. However, it provides high-speed operation, particularly the BFBiCMOS, which has a factor of 3 speed advantage over CMOS at 1.5 V. Moreover, the delay-power product of the bootstrapped family is lower than that of CMOS. Notice that at 3.3 V, PBiNMOS has the lowest delay-power product and less delay than CMOS. BiNMOS at 1.5 V is slower than CMOS and is not reported in the table. These results also indicate that the use of the bootstrapped BiCMOS/BiNMOS gates would improve the delay-power product when VDD is scaled down to 1.5 V.
Logic Type    Delay (ps)    Power (µW/MHz)    Delay×Power (fJ/MHz)
TS-FS
20.0
18.5
26.4 7.6
              Delay         Power             Delay×Power
TS-FS
BS-BiCMOS     962           3.84              3.1
BFBiNMOS      1175          4.60              3.2
              686           3.50              4.1
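The delay × power column combines the other two. Assuming delay in ps and power normalized per MHz of switching frequency in µW/MHz, the product comes out in fJ/MHz once the factor of 10^-3 is accounted for (1 ps × 1 µW = 10^-18 J = 10^-3 fJ). The numbers in the example are hypothetical, not taken from the table.

```python
# Delay x power product in the units of Table 5.4 (assumed: ps and uW/MHz).
# 1 ps * 1 uW = 1e-12 s * 1e-6 W = 1e-18 J = 1e-3 fJ


def delay_power_fJ_per_MHz(delay_ps: float, power_uW_per_MHz: float) -> float:
    return delay_ps * power_uW_per_MHz / 1000.0


# Hypothetical gate: 1 ns delay at 3 uW/MHz.
print(delay_power_fJ_per_MHz(1000.0, 3.0))  # 3.0
```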
5.3.6 Conclusion
We have demonstrated, throughout the previous sections, that the best family to use for a fanout higher than 5 is the bootstrapped BiCMOS, over a power-supply range of 1 to 3.3 V. However, because of its larger area, it can be used only in high-speed digital applications. Note that when the load is large, in the range of 1 pF, the bootstrapped family provides high speed and a good delay-power product. One drawback of this family, besides the large area, is that the bootstrapping is sensitive to the shape of the input voltage.
One practical gate that can be used in several applications, even when the fanout is low, is the BiNMOS family. It performs well at 3.3 and 2.5 V power supplies and provides a better delay-power product than CMOS. In the next section, several digital applications based on the BiNMOS family are outlined.
5.4 LOW-VOLTAGE BiCMOS APPLICATIONS

In this section, we present the applications of BiCMOS digital circuits in the implementation of digital building blocks, microprocessors, memories, digital signal processors, and gate arrays. The BiNMOS family and its utilization in practical design at 3.3 V are emphasized. Many of the circuits cited are discussed in detail in Chapters 4, 6 and 7.
5.4.1 Microprocessors and Logic Circuits
BiNMOS logic has been used in several microprocessors [26, 27]. In this application, BiNMOS can be used to reduce critical-path delay without increasing chip area, since BiNMOS needs only a low fanout to outperform CMOS. Among the critical paths, we cite:
• Decoders in the register file and the cache memory;
• Sense amplifiers and output buffers in the register file and the cache;
• Booth's encoder, Wallace tree, and the final adder in a multiplier;
• Arithmetic and logic unit in a microprocessor data path; and
• Critical path of the control unit.
In the microprocessor of [26], the PBiNMOS logic family is used at a 3.3 V power supply. The critical path of the control unit is reduced by 36% compared to CMOS. The BiNMOS gates keep their speed advantage even in the worst case (VDD = 2.7 V and T = 125 °C).
BiCMOS logic is not limited to conventional gates; many other logic styles can be devised. One such example is the pass-transistor BiNMOS used in the design of a 64-bit adder [28], which is similar to the CMOS CPL logic family discussed in Chapter 4. Fig. 5.30 shows an exclusive-OR/NOR gate using the pass-transistor BiNMOS gate (abbreviated PT-BiCMOS) with double rail. The outputs of the pass-transistor network are connected to the bases of the bipolar transistors Q1 and Q2 to reduce the intrinsic delay. The PMOS transistors P1 and P3 are cross-coupled to restore the high level of the pass logic to full VDD. The PMOS transistors P2 and P4 charge the output to full swing. These transistors are subject to body effect; hence they turn ON later during transitions.
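At the logic level, a double-rail pass-transistor XOR/XNOR of the CPL type selects between the true and complementary rails of one input under control of the other. The sketch below models only that Boolean behavior. It assumes the standard CPL arrangement (pass /A when B = 1, A when B = 0) rather than the exact network of Fig. 5.30, and it ignores the bipolar output stage and the PMOS level restorers.

```python
# Logic-level model of a double-rail pass-transistor XOR/XNOR (CPL style).
# Each output node is fed by two NMOS pass transistors gated by B and /B, so
# exactly one path conducts for any input. Assumed CPL arrangement; the bipolar
# output stage and PMOS level restorers are not modeled.
from itertools import product
from typing import Optional


def pass_gate(gate: int, source: int) -> Optional[int]:
    """NMOS pass transistor: passes `source` when `gate` is 1, else open."""
    return source if gate else None


def resolve(*paths: Optional[int]) -> int:
    """Node value from parallel pass paths; exactly one must conduct."""
    driven = [p for p in paths if p is not None]
    if len(driven) != 1:
        raise RuntimeError("node must be driven by exactly one path")
    return driven[0]


def xor_xnor(a: int, b: int) -> tuple[int, int]:
    a_n, b_n = 1 - a, 1 - b                      # complementary input rails
    xor = resolve(pass_gate(b, a_n), pass_gate(b_n, a))
    xnor = resolve(pass_gate(b, a), pass_gate(b_n, a_n))
    return xor, xnor


for a, b in product((0, 1), repeat=2):
    print(a, b, xor_xnor(a, b))
```

Because both rails are always driven, the same network yields the true and complementary outputs that the double-rail bipolar stage then buffers.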
Fig. 5.31(a) compares the delay of exclusive-OR/NOR gates using PT-BiCMOS, TG-type CMOS, and CPL-type CMOS in a 0.5 µm BiCMOS process at a 3.3 V power supply voltage. A fanout of 1 is equivalent to a capacitance of 35 fF. The PT-BiCMOS gate is faster than the CMOS gates for any fanout. The power-delay product is also shown in Fig. 5.31(b). The TG gate has the best delay-power product for a fanout lower than 3. However, for a fanout greater than 3, the PT-BiCMOS gate is better.
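Because the text equates a fanout of 1 with 35 fF, the fanout axis of Fig. 5.31 can be converted to load capacitance. A minimal sketch, assuming the load is purely gate capacitance (no wiring): under that assumption the fanout-of-3 break-even point between the TG and PT-BiCMOS gates corresponds to about 0.105 pF.

```python
# Convert fanout to load capacitance using the 35 fF-per-fanout figure from
# the text. Wiring capacitance is ignored (an assumption).
C_PER_FANOUT_FF = 35.0


def load_pF(fanout: int) -> float:
    return fanout * C_PER_FANOUT_FF / 1000.0


for fo in range(1, 8):
    print(f"fanout {fo}: {load_pF(fo):.3f} pF")
```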
This PT-BiCMOS has been used in the design of a 64-bit adder [28]. It is used mainly in the P, sum and carry blocks. A delay time of 3.5 ns was obtained for the 64-bit adder at 3.3 V, which is 25% better than the CMOS version. The area and power dissipation penalties of the PT-BiCMOS adder, compared to the CMOS one, were 13% and 14%, respectively. The speed advantage is kept down to almost 2 V.
5.4.2 Random Access Memories (RAMs)

One of the largest applications of BiCMOS is in RAM design, particularly Static RAMs (SRAMs). The first BiCMOS SRAM was proposed in 1985 [29], and many BiCMOS SRAMs were reported subsequently [30, 31, 32, 33, 34, 35, 36, 37]. The major applications of fast BiCMOS SRAMs are caches for workstations and main memory for supercomputers. Many BiCMOS SRAMs are in production
complexity. BiCMOS was limited to some periphery circuits because of layout-pitch matching. It was used in the I/O buffers, decoders and drivers, main sense amplifier, and voltage down converter. In general, BiCMOS SRAMs and DRAMs are not suitable for low-power applications.
5.4.3 Digital Signal Processors

High-performance DSPs are needed in many applications such as video signal processors, convolvers, filters, etc. BiCMOS technology has been used successfully in DSPs operating at a frequency of 300 MHz [41, 42]. These DSPs operate at a 3.3 V power supply voltage using the BiNMOS logic family. Among the characteristics of these BiCMOS DSPs, we cite:
• Parallel, pipelined architecture;
• High performance and high density of integration; in this case, critical data-path functional blocks are customized; and
• BiNMOS is used in blocks such as SRAM, ROM (Read Only Memory), ALU (Arithmetic Logic Unit), multiplier, and clock driver.
Fig. 5.33 shows a block diagram of a DSP [41]. This architecture can perform any signal processing operation. The BiNMOS inverters are used as clock buffers to reduce the clock skew at a 300 MHz clock frequency. The clock is distributed to about 1000 registers. A high clock frequency drastically increases power and reduces the effective power supply voltage because of power-supply noise (the effect of a large switching current). The BiNMOS inverter used in the clock distribution is the conventional one, whose high output level is VDD - VBE. Hence, the dynamic power of the clock network is reduced by 17% compared to CMOS when BiNMOS is used.
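The source of this saving can be seen with an idealized model: a driver that charges the clock net from 0 V only up to VDD - VBE draws less charge from the supply each cycle. The sketch below compares full-swing and reduced-swing supply power. VBE = 0.8 V and the 100 pF total clock-net capacitance are assumptions, not values from the text. With these numbers the idealized saving is 1 - (VDD - VBE)/VDD, about 24%; the 17% reported above is smaller, which is unsurprising since a real clock network includes contributions this model leaves out.

```python
# Idealized clock-net power for full-swing CMOS vs. reduced-swing BiNMOS drivers.
# A driver that charges the net capacitance C from 0 V to V_swing draws the
# charge C * V_swing from the VDD rail each cycle, so P = C * VDD * V_swing * f.
# Only the net-charging term is modeled; short-circuit and internal driver
# power are ignored.


def clock_power_mW(c_pF: float, vdd: float, v_swing: float, f_MHz: float) -> float:
    return (c_pF * 1e-12) * vdd * v_swing * (f_MHz * 1e6) * 1e3


VDD = 3.3          # V, from the text
VBE = 0.8          # V, assumed base-emitter drop
C_CLK_PF = 100.0   # pF, hypothetical total clock-net capacitance
F_MHZ = 300.0      # MHz, from the text

p_cmos = clock_power_mW(C_CLK_PF, VDD, VDD, F_MHZ)
p_binmos = clock_power_mW(C_CLK_PF, VDD, VDD - VBE, F_MHZ)
saving = 1.0 - p_binmos / p_cmos
print(f"CMOS {p_cmos:.0f} mW, BiNMOS {p_binmos:.0f} mW, saving {saving:.0%}")
```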
Also, the BiNMOS logic is used as:
• Output buffer of the Booth encoder of the convolver/multiplier block;
• Decoder driver of the register file; and
• Other drivers.
5.4.4 Gate Arrays

Gate arrays became very popular for a wide spectrum of applications because of their low cost and short turn-around time. Gate array chips consist of a large number of identical sites or basic cells, which are usually placed in rows. The rows are separated by routing channels. The core of rows and channels is surrounded by I/O cells at the chip periphery, as illustrated in Fig. 5.34.
Each of the basic cells is typically made up of a number of transistors that can be connected to form a two-input NAND or NOR gate or a simple latch. The only processing step that can be customized is the metallization. The user of a gate array can implement the system by specifying the required connections between the devices in each cell and then the connections between the various cells. This is done automatically using CAD tools. The number of metal levels used for wiring varies from 2 to 4. The first one or two levels are used for the internal wiring of the cell and the upper levels (e.g., third and fourth) for wiring between the cells in the horizontal and vertical directions [43].
BiCMOS technology has been used extensively for building gate arrays and channelless gate arrays (sea-of-gates) [43, 44, 45, 46]. At a 3.3 V power supply voltage, the BiNMOS logic family has been used [10, 11]. In [11], a BiPNMOS logic gate has been proposed for the channelless gate array. Fig. 5.35 shows a layout of a BiPNMOS basic cell in 0.5 µm BiCMOS technology. A bipolar transistor and a small-size MOS transistor are added to the pure CMOS basic cell. These transistors are used not only to implement BiPNMOS gates but also flip-flops, memory macros (RAM, ROM, and CAM), etc. A BiPNMOS two-input NAND gate has a 36% delay reduction compared to a similar CMOS gate for a fanout of 7. The speed advantage is maintained down to 2.5 V.
Figure 5.34 Gate array floorplan.
5.4.5 Application Specific ICs (ASICs)

In order to realize high-performance ASICs, fast standard-cell library macros for rapid design are important. Such a library contains custom functional macros such as an adder, Programmable Logic Array (PLA), register file, RAM, cache, Translation Look-aside Buffer (TLB), controller, etc. PBiNMOS logic has been used for such a standard-cell library [12]. The cells of logic gates are provided in both CMOS and PBiNMOS versions for the same logic functions. The PBiNMOS gates are used for a relatively high fanout and load, whereas CMOS gates are used for a small fanout. A CAD tool can be utilized to choose the most appropriate cells in the design.
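The cell choice a CAD tool makes can be sketched as picking, for each net, the library variant with the smallest estimated delay at that net's load. The cell names and linear delay coefficients below are hypothetical; a real flow would use characterized timing data for the library.

```python
# Hypothetical cell selection between CMOS and PBiNMOS versions of one logic
# function: for each net, pick the variant with the smallest estimated delay.
# Names and (intrinsic ps, ps per fF) coefficients are made up for illustration.
CELLS = {
    "NAND2_CMOS_X1": (50.0, 3.0),
    "NAND2_CMOS_X2": (70.0, 1.6),
    "NAND2_PBiNMOS": (180.0, 0.6),
}


def est_delay_ps(cell: str, load_fF: float) -> float:
    t0, k = CELLS[cell]
    return t0 + k * load_fF


def pick_cell(load_fF: float) -> str:
    return min(CELLS, key=lambda cell: est_delay_ps(cell, load_fF))


for load in (10.0, 100.0, 200.0):
    print(f"{load:6.1f} fF -> {pick_cell(load)}")
```

With these numbers, small loads go to the small CMOS cell and only the heaviest nets get the PBiNMOS version, in line with the division of labor described above.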
5.5 CHAPTER SUMMARY

In this chapter, we have demonstrated the advantage of using BiCMOS over CMOS in terms of speed. We have shown the historical evolution of the different BiCMOS logic families. A variety of alternative circuit techniques for low-voltage operation have been outlined and compared to the conventional BiCMOS. We have also shown how optimized BiNMOS gates are faster than CMOS even when the fanout is low (greater than 1). The design techniques can be extended to more complex gates and building blocks such as flip-flops, adders, etc. A variety of applications where BiCMOS, particularly BiNMOS, can be used at low voltage have been reviewed. Adding bipolar transistors to CMOS to devise new structures enhances the performance of ICs. This feature improves the access time of memories, register files, ALUs, DSPs, etc. Notice that a large portion of a BiCMOS IC is implemented in CMOS, while bipolar transistors represent a small portion (0.5-4%) used for driving or sensing purposes. The power dissipation of BiCMOS circuits, compared to their CMOS counterparts, increases drastically if ECL is used, because of the DC current. However, if only BiCMOS logic gates are used, the power increase is not significant compared to the speed enhancement. In some cases, such as the clock distribution network, the power dissipation is reduced when using BiNMOS.
REFERENCES
[1] A. R. Alvarez, "BiCMOS Technology and Applications," Kluwer Academic Pub., MA, Second Edition, 1993.
[2] S. H. K. Embabi, A. Bellaouar and M. I. Elmasry, "BiCMOS Digital Integrated Circuit Design," Kluwer Academic Pub., MA, 1993.
[3] M. I. Elmasry, "Design and Analysis of BiCMOS ICs," IEEE Press, 1994.
[4] G. P. Rosseel and R. W. Dutton, "Influence of Device Parameters on the Switching Speed of BiCMOS Buffers," IEEE Journal of Solid-State Circuits, vol. 24, no. 1, pp. 90-99, February 1989.
[5] P. Raje, K. Chan, and K. Saraswat, "BiCMOS Gate Performance Optimization using Unified Delay Model," Symposium on VLSI Technology, Tech. Dig., pp. 91-92, 1990.
[6] S. H. K. Embabi, A. Bellaouar, and M. I. Elmasry, "Analysis and Optimization of BiCMOS Digital Circuit Structures," IEEE Journal of Solid-State Circuits, vol. 26, no. 4, pp. 676-679, April 1991.
[7] P. A. Raje, K. C. Saraswat and K. M. Cham, "Performance-driven Scaling of BiCMOS Technology," IEEE Trans. on Electron Devices, vol. ED-39, no. 3, pp. 685-693, March 1992.
[8] J. Gallia, et al., "High-Performance BiCMOS 100K-Gate Array," IEEE Journal of Solid-State Circuits, vol. 25, no. 1, pp. 142-149, February 1990.
[9] Y. Nishio, et al., "A BiCMOS Logic Gate with Positive Feedback," International Solid-State Circuits Conference, Tech. Dig., pp. 116-117, February 1989.
[10] A. E. Gamal et al., "BiNMOS: a Basic Cell for BiCMOS Logic Circuits," in Custom Integrated Circuits Conf., Tech. Dig., pp. 8.3.1-8.3.4, 1989.
[11] B. Ham et al., "0.5-µm 2M-Transistor BiPNMOS Channelless Gate Array," IEEE Journal of Solid-State Circuits, vol. 26, no. 11, pp. 1615-1620, November 1991.
[12] H. Hara et al., "0.5-µm 3.3-V BiCMOS Standard Cells with 32-kb Cache and Ten-Port Register File," IEEE Journal of Solid-State Circuits, vol. 27, no. 11, pp. 1579-1584, November 1992.
[13] M. I. Elmasry and A. Bellaouar, "BiCMOS at Low-Supply Voltage," in IEEE Bipolar/BiCMOS Circuits and Technology Meeting, pp. 89-96, October 1993.
[14] P. Raje, et al., "MBiCMOS: A Device and Circuit Technique for Sub-micron, sub-2V Regime," International Solid-State Circuits Conference, Tech. Dig., pp. 150-151, 1991.
[15] P. G. Y. Tsui et al., "Study of BiCMOS Logic Gate Configurations for Improved Low-Voltage Performance," IEEE Journal of Solid-State Circuits, vol. 28, no. 3, pp. 371-374, March 1993.
[16] S. W. Sun et al., "A Fully Complementary BiCMOS Technology for Sub-Half-Micrometer Microprocessor Applications," IEEE Trans. Electron Devices, vol. 39, no. 12, pp. 2733-2739, December 1992.
[17] K. Yano et al., "Quasi-Complementary BiCMOS for Sub-3V Digital Circuits," IEEE Journal of Solid-State Circuits, vol. 26, no. 11, pp. 1708-1719, November 1991.
[18] A. Watanabe et al., "Future BiCMOS Technologies for Scaled Supply Voltage," International Electron Devices Meeting, Tech. Dig., pp. 429-433, December 1989.
[19] A. J. Shin et al., "Full-swing CBiCMOS Logic Circuits," in IEEE Bipolar/BiCMOS Circuits and Technology Meeting, Tech. Dig., pp. 229-233, September 1989.
[20] A. Bellaouar, I. S. Abu-Khater, M. I. Elmasry, and A. Chekims, "Full-Swing Schottky BiCMOS/BiNMOS and the Effects of Operating Frequency and Supply Voltage Scaling," IEEE Journal of Solid-State Circuits, vol. 29, no. 6, pp. 693-700, June 1994.
[21] S. H. K. Embabi, A. Bellaouar, M. I. Elmasry, and R. A. Hadaway, "New Full-Voltage-Swing BiCMOS Buffers," IEEE Journal of Solid-State Circuits, vol. 26, no. 2, pp. 150-153, February 1991.
[22] M. Hiraki et al., "A 1.5-V Full-Swing BiCMOS Logic Circuit," IEEE Journal of Solid-State Circuits, vol. 27, no. 11, pp. 1568-1574, November 1992.
[23] R. Y. V. Chik and C. A. T. Salama, "1.5 V Bootstrapped BiCMOS Logic Gate," IEE Electronics Letters, vol. 29, no. 3, pp. 301-309, February 1993.
[24] S. H. K. Embabi, A. Bellaouar, and K. Islam, "A Bootstrapped Bipolar CMOS (B2CMOS) Gate for Low Voltage Applications," IEEE Journal of Solid-State Circuits, vol. 30, no. 1, pp. 47-53, January 1995.
[25] A. Bellaouar, M. I. Elmasry, and S. H. K. Embabi, "Bootstrapped Full-Swing BiCMOS/BiNMOS Logic Circuits for 1.2-3.3 V Supply Voltage Regime," IEEE Journal of Solid-State Circuits, vol. 30, no. 6, June 1995.
[26] J. Shuta, "A 3.3 V 0.6-µm BiCMOS Superscalar Microprocessor," IEEE International Solid-State Circuits Conference, Tech. Dig., pp. 202-203, 1994.
[27] F. Murabayashi, et al., "3.3 V, Novel Circuit Techniques for a 2.8-Million-Transistor BiCMOS RISC Microprocessor," IEEE Custom Integrated Circuits Conference, Tech. Dig., pp. 12.1.1-12.1.4, May 1993.
[28] K. Ueda, H. Suzuki, K. Suda, Y. Tsujihashi, and H. Shinohara, "A 64-bit Adder By Pass Transistor BiCMOS Circuit," IEEE Custom Integrated Circuits Conference, Tech. Dig., pp. 12.2.1-12.2.4, May 1993.
[29] K. Ogiue, et al., "A 15 ns/250 mW 64K Static RAM," in ICCD, Tech. Dig., 1985.
[30] H. Tran et al., "An 8-ns 1-Mb ECL BiCMOS SRAM with a Configurable Memory Array Size," International Solid-State Circuits Conf., Tech. Dig., pp. 36-37, February 1989.
[31] M. Matsui et al., "An 8-ns 1-Mb ECL BiCMOS SRAM," International Solid-State Circuits Conf., Tech. Dig., pp. 38-39, February 1989.
[32] Y. Maki et al., "A 6.5-ns 1-Mb BiCMOS ECL SRAM," International Solid-State Circuits Conf., Tech. Dig., pp. 136-137, February 1990.
[33] M. Takada et al., "A 5-ns 1-Mb ECL BiCMOS SRAM," IEEE Journal of Solid-State Circuits, vol. 25, no. 5, pp. 1051-1062, October 1990.
[34] A. Ohba et al., "A 7-ns 1-Mb BiCMOS ECL SRAM with Program-Free Redundancy," in Symp. VLSI Circuits, Tech. Dig., pp. 41-42, May 1990.
[35] Y. Okajima et al., "A 7-ns 4-Mb BiCMOS SRAM with a Parallel Testing Circuit," International Solid-State Circuits Conf., Tech. Dig., pp. 54-55, February 1991.
[36] N. Tamba et al., "A 1.5-ns 256-kb BiCMOS SRAM with 11K 60-ps Logic Gates," International Solid-State Circuits Conf., Tech. Dig., pp. 246-247, February 1993.
[37] K. Nakamura et al., "A 200-MHz Pipelined 16-Mb BiCMOS SRAM with PLL Proportional Self-Timing Generator," IEEE Journal of Solid-State Circuits, vol. 29, no. 11, pp. 1317-1322, November 1994.
[38] G. Kitsukawa, et al., "An Experimental 1-Mb BiCMOS DRAM," IEEE Journal of Solid-State Circuits, vol. SC-22, no. 5, pp. 657-662, October 1987.
[39] S. Watanabe, et al., "BiCMOS Circuit Technology for High Speed DRAMs," Symposium on VLSI Circuits, Tech. Dig., pp. 79-80, 1987.
[40] G. Kitsukawa, et al., "Design of ECL 1-Mb BiCMOS DRAM," Electronics and Communications in Japan, Part 2, vol. 76, no. 5, pp. 89-102, 1992.
[41] M. Namura et al., "A 300-MHz, 16-bit, 0.5-µm BiCMOS Digital Signal Processor Core LSI," IEEE Custom Integrated Circuits Conference, Tech. Dig., pp. 12.6.1-12.6.4, May 1993.
[42] T. Inoue, et al., "A 300-MHz 16-bit BiCMOS Video Signal Processor," IEEE Journal of Solid-State Circuits, vol. 28, no. 12, pp. 1321-1329, December 1993.
[43] F. Murabayashi, et al., "A 0.5 micron BiCMOS Channelless Gate Array," IEEE Custom Integrated Circuits Conference, Tech. Dig., pp. 8.7.1-8.7.4, May 1989.
[44] E. Hara, et al., "A 350ps 50K 0.8 micron BiCMOS Gate Array with Shared Bipolar Cell Structure," IEEE Custom Integrated Circuits Conference, Tech. Dig., pp. 8.5.1-8.5.4, May 1989.
[45] J. D. Gallia, et al., "High-Performance BiCMOS 100K-Gate Array," IEEE Journal of Solid-State Circuits, vol. 25, no. 1, pp. 142-149, February 1990.
[46] T. Hanibuchi, et al., "A Bipolar-PMOS Merged Basic Cell for 0.8 micron BiCMOS Sea of Gates," IEEE Journal of Solid-State Circuits, vol. 26, no. 3, pp. 427-431, March 1991.
6
LOW-POWER CMOS RANDOM
ACCESS MEMORY CIRCUITS
Low-power Random Access Memory (RAM) has seen remarkable and rapid progress in power reduction. Many circuit techniques for active and standby power reduction in static and dynamic RAMs have been devised. In this chapter we study low-power memory circuit techniques that are also of interest for several other applications. Among these circuits, we examine memory cells, sense amplifiers, precharging circuits, etc. Circuit techniques for a 1.x V power supply are also discussed. The voltage targets using NiCd and Mn batteries are 1.2 and 1.5 V, respectively. The minimum voltage of a NiCd cell is 0.9 V. We also consider the Voltage Down Converters (VDCs), which are used in memories and processors. No consideration is given to the details of designing a complete memory chip, because a single configuration would require an entire book.
6.1 STATIC RAM (SRAM)

Today, workstations, computers and supercomputers demand high-speed, high-density SRAMs, e.g., cache memories. These systems started to use 4-Mb fast SRAMs and will require, in the future, higher-density memories with faster access times. Many 1-to-4-Mb BiCMOS SRAMs [1, 2, 3, 4, 5, 6] have achieved access times of 5 to 10 ns. In these SRAMs, the power dissipation is 275 to 1000 mW, which is not acceptable in many applications. On the other hand, high-density, low-power SRAMs are needed for applications such as hand-held terminals, laptops, notebooks and IC memory cards. Table 6.1 shows examples of high-density SRAMs with low-power characteristics. The standby current is on the order of 1 µA or below, which is suitable for battery-backup operation.
Table 6.1 Examples of high-density, low-power SRAMs
Memory size   Power    CMOS         Access   Power
(Ref.)        supply   technology   time     dissipation
1-Mb [7]      3.0 V    0.35-µm      7 ns     140 mW @ 100 MHz
4-Mb [8]      5.0 V    0.50-µm      23 ns    100 mW @ 10 MHz
4-Mb [9]      3.0 V    0.60-µm      68 ns    21 mW @ 10 MHz
16-Mb [10]    2.5 V    0.25-µm      15 ns    120 mW @ 20 MHz
16-Mb [11]    3.0 V    0.40-µm      15 ns    165 mW @ 30 MHz
16-Mb [12]    3.3 V    0.35-µm      9 ns     238 mW @ 30 MHz
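Dividing each power figure in Table 6.1 by its operating frequency gives the energy drawn per cycle, which makes chips quoted at different frequencies easier to compare. The sketch assumes, as the table indicates, that each power figure was measured at the frequency listed next to it.

```python
# Energy per cycle E = P / f for the SRAMs listed in Table 6.1.
# mW / MHz = 1e-3 W / 1e6 Hz = 1e-9 J, so the ratio is directly in nJ.
SRAMS = [
    # (density, supply V, power mW, frequency MHz) as listed in the table
    ("1-Mb", 3.0, 140.0, 100.0),
    ("4-Mb", 5.0, 100.0, 10.0),
    ("4-Mb", 3.0, 21.0, 10.0),
    ("16-Mb", 2.5, 120.0, 20.0),
    ("16-Mb", 3.0, 165.0, 30.0),
    ("16-Mb", 3.3, 238.0, 30.0),
]


def energy_nJ(power_mW: float, f_MHz: float) -> float:
    return power_mW / f_MHz


for size, vdd, p, f in SRAMS:
    print(f"{size:>6} @ {vdd:.1f} V: {energy_nJ(p, f):5.2f} nJ per cycle")
```

By this measure the 1-Mb part draws about 1.4 nJ per cycle, while the 4-Mb, 5.0 V part draws about 10 nJ.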
The reduction of power dissipation in SRAMs is due not only to power-supply voltage reduction, but also to low-power circuit techniques. In this section we review some of these circuit techniques for low-power applications.
SRAMs have several advantages over Dynamic RAMs (DRAMs), such as:
• No refresh operation of the memory cells is needed.
• The speed of an SRAM is higher because of the differential pair of bit-lines.
• The operational modes are simpler because the row and column address signals are loaded simultaneously.
• A low data-retention current, which is required by battery applications.
However, SRAMs have the great disadvantage of a large memory cell compared to DRAMs. For this reason, their capacities are smaller than those of DRAMs.
6.1.1 Basics of SRAMs

In order to treat the different circuit parts of an SRAM, it is important to understand some characteristics of these memories. In general, the pins of an SRAM are:
1. Addresses (A0 ... An), which define the memory location;
2. Write Enable (WE), which selects between the read and write modes;
3. Chip Select (CS), which selects one memory out of several within a system;
4. Output Enable (OE), which is used to enable the output buffer;
5. Input/Output data (I/O); and
6. Power supply pins.
A timing diagram of the read cycle is shown in Fig. 6.1(a). During this cycle, the data stored in a specific SRAM location (defined by the address) is read out. For a read cycle, two times are shown in the figure: the read cycle time, tRC, and the address access time, tAA. Fig. 6.1(b) shows the write cycle, which changes the data in an SRAM. Two times are indicated: the write cycle time, tWC, and the write recovery time, tWR. Some of this information is used in this chapter. For more detail on the timing, the reader can refer to any memory data book.
A typical SRAM architecture is shown in Fig. 6.2. The memory array contains the memory cells, which are readable and writable. The row decoder (X-decoder) selects 1 out of n rows, while the column decoder (Y-decoder) selects l out of m columns, where n, l and m are powers of two. The row and column addresses are not multiplexed as in the case of a DRAM. Sense amplifiers detect small voltage variations on the complementary bit-lines of the memory, which reduces the read time. The conditioning circuit precharges the bit-lines. The access time is determined by the critical path from the address input to the data output, as shown in Fig. 6.3. This path contains the address input buffer, row decoder, memory cell array, sense amplifier and output buffer circuits. The word-line decoding and bit-line sensing delay times are the critical delay components. To reduce the sensing time during a read operation, the swing on the bit-lines should be as small as possible.
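The row/column split described above can be sketched in code. The organization is hypothetical (256 rows, 64 columns, 4-bit words), and sending the low-order address bits to the column decoder is an assumption; real designs choose the split to suit the array aspect ratio.

```python
# Sketch: splitting a non-multiplexed SRAM address between the row decoder
# (X-decoder) and the column decoder (Y-decoder). Hypothetical organization:
# n = 2**ROW_BITS rows, m = 2**COL_BITS columns, and l = 2**LOG_WORD columns
# (one word) selected per access. Assigning the low-order bits to the column
# field is also an assumption.
ROW_BITS, COL_BITS, LOG_WORD = 8, 6, 2   # 256 rows, 64 columns, 4-bit words
COL_FIELD_BITS = COL_BITS - LOG_WORD     # selects one of m / l column groups


def decode(addr: int) -> tuple[int, list[int]]:
    """Return (selected row, indices of the l selected columns)."""
    if not 0 <= addr < 1 << (ROW_BITS + COL_FIELD_BITS):
        raise ValueError("address out of range")
    group = addr & ((1 << COL_FIELD_BITS) - 1)   # low bits -> Y-decoder
    row = addr >> COL_FIELD_BITS                 # high bits -> X-decoder
    first = group << LOG_WORD
    return row, list(range(first, first + (1 << LOG_WORD)))


def word_lines(row: int) -> list[int]:
    """One-hot X-decoder output: exactly one word-line is raised."""
    return [1 if i == row else 0 for i in range(1 << ROW_BITS)]


print(decode(53))   # row 3, columns 20..23
```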
For an asynchronous¹ SRAM, a special circuit called Address Transition Detection (ATD) generates internal pulses. These pulses are of two types: activation and equalization. Activation pulses selectively activate particular circuits, while equalization pulses reduce the delay by restoring and equalizing differential nodes before they are selected. In this section we treat only asynchronous SRAMs.
¹Not clocked externally.
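The ATD idea can be illustrated with a discrete-time model: each address bit is compared with a delayed copy of itself, the per-bit detections are ORed, and any detected transition fires a one-shot pulse. The time step and the 3-step pulse width are hypothetical, and a real ATD circuit generates separate activation and equalization pulses rather than the single pulse modeled here.

```python
# Discrete-time sketch of Address Transition Detection (ATD). Each address bit
# is XORed with a one-step-delayed copy, the results are ORed, and a detected
# transition (re)triggers a one-shot pulse `width` steps long.
PULSE_WIDTH = 3   # hypothetical pulse width, in time steps


def atd(addresses: list[int], width: int = PULSE_WIDTH) -> list[int]:
    if not addresses:
        return []
    pulse: list[int] = []
    remaining = 0
    prev = addresses[0]          # delayed copy of the address bus
    for addr in addresses:
        if addr ^ prev:          # any bit changed (per-bit XOR, then OR)
            remaining = width    # (re)trigger the one-shot
        pulse.append(1 if remaining > 0 else 0)
        remaining = max(remaining - 1, 0)
        prev = addr
    return pulse


print(atd([5, 5, 5, 6, 6, 6, 6, 6, 7, 7]))
```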
Figure 6.1 Typical timing of an SRAM: (a) read cycle; (b) write cycle.
6.1.2 Static RAM Cells

The memory cell is an important circuit in the design of low-power and high-density SRAMs because the memory size is dominated by the cell area. There are various static memory cells. The cell of Fig. 6.4 has six transistors: two cross-coupled inverters and two pass transistors connected to the complementary bit-lines BL and /BL. The pass transistors are controlled by the signal WL (word-line).
During the read cycle, the bit-lines are held high (precharged). Assume that a "0" is stored at node A and a "1" is stored at node B. When the cell is selected, i.e., WL is set to "1", BL is discharged through N1 and N3.
To write into the cell, one of the bit-lines is pulled low and the other high, and then the cell is selected by WL. Assume that BL is set to "0" while initially a "1" is stored at node A ("0" at B). N1 and P1 should be sized such that node A is pulled down enough to turn P2 ON. This in turn causes node B to be pulled up. The cross-coupled inverter pair has a high gain, which causes nodes A and B to switch to opposite voltages. The data retention (standby) current of this cell can be as low as 10-"A. Although this full-CMOS cell has a low retention current, its cell area is so large that it does not allow high-density SRAMs. A typical cell area using 0.8 µm design rules is 75 µm².
The stability of the memory cell is its ability to hold a stable state. Fig. 6.5(a) shows the transfer curves of full CMOS SRAMs. The box between the two characteristics (I and II) defines the Static Noise Margin (SNM). Static noise is a DC disturbance, such as offsets and mismatches, due to processing and variations in process conditions. The SNM is defined as the maximum value of the static noise voltage (the static noise source shown in Fig. 6.5(b)) that can be tolerated by the cross-coupled inverters before they change state. An important parameter in SNM is the memory cell ratio, γ, defined by
$$\gamma = \frac{(W/L)_{\mathrm{driver}}}{(W/L)_{\mathrm{access}}} \qquad (6.1)$$
where the access and driver transistors are the NMOS transistors shown in Fig. 6.4. An analysis of SNM for memory cells is given in [13]. The static noise margin increases with the ratio γ. However, γ is limited by the cell-area constraint. The stability of the cell is maintained even if VDD is scaled down.

Figure 6.4 CMOS memory cell with PMOS load.
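One way to see why the margin improves with γ is the read-disturb voltage: during a read, the node storing "0" rises until the access transistor (saturated) and the driver transistor (linear) carry the same current. The sketch below solves that balance in closed form. It assumes a square-law MOS model, equal thresholds, no body effect, equal mobilities (so the k ratio equals the W/L ratio γ) and a bit-line held at VDD; the VDD and VT values are illustrative.

```python
# First-order read-disturb estimate for the 6-T cell under the square-law model.
# Access (saturated):  I = k_a/2 * (VDD - V - VT)**2
# Driver (linear):     I = k_d * ((VDD - VT) * V - V**2 / 2)
# With gamma = k_d / k_a and a = VDD - VT, equating the currents gives
# V = a * (1 - sqrt(gamma / (1 + gamma))). Body effect is ignored.
import math


def read_bump(vdd: float, vt: float, gamma: float) -> float:
    """Voltage reached by the '0' node during a read."""
    a = vdd - vt
    return a * (1.0 - math.sqrt(gamma / (1.0 + gamma)))


def residual(vdd: float, vt: float, gamma: float, v: float) -> float:
    """Driver minus access current, normalized to k_a; zero at the solution."""
    a = vdd - vt
    return gamma * (a * v - v * v / 2.0) - 0.5 * (a - v) ** 2


for g in (1.0, 2.0, 3.0):
    print(f"gamma = {g:.0f}: V = {read_bump(3.0, 0.7, g):.2f} V")
```

With VDD = 3 V and VT = 0.7 V, the bump falls from about 0.67 V at γ = 1 to about 0.31 V at γ = 3; a smaller bump leaves more room before the cell flips.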
Another memory cell configuration is shown in Fig. 6.6. This cell is similar to the full CMOS memory cell, except that the PMOS pull-up devices are replaced by high-resistance polysilicon loads. The memory cell area can be about 30% to 40% smaller than that of the CMOS 6-transistor memory cell, because the two polysilicon resistances can be formed on top of the two NMOS driver transistors. The High Resistive Load (HRL) memory cell has been used in several SRAM generations from 4 Kb. The high storage node of Fig. 6.6 will be pulled down with time by two kinds of leakage current: the leakage current of the drain junction and the subthreshold current. The voltage drop across the resistance R prevents regular cell operation if the leakage current reaches the level of the poly-Si resistor current. In several SRAM generations using the HRL memory cell, the total standby current was set to 1 µA per chip at room temperature for battery-backup applications. Thus, for each memory generation with quadrupled density, the poly-Si resistance value is also quadrupled. For a 4-Mb chip, which has a total standby current of less than 1 µA, typical values of resistance are in the 5 × 10^13 Ω range and the resistor current is limited to 10^-13 A. This current should be much larger than the total leakage current of the storage node of the cell to improve the data retention margin.
The leakage current cannot be scaled because, first, the subthreshold current per channel width tends to increase, particularly with the trend to decrease the threshold voltage for low-voltage operation. Second, the leakage current of the drain junction per unit area tends to increase with technology scaling. Moreover, the junction area shrinks at a rate lower than the rate at which SRAM density increases. In [14], it was determined that the maximum SRAM capacity for low-power applications using an HRL memory cell is 4 Mb, where the retention current is 1 µA.
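The quadrupling rule above follows from a fixed standby budget. A minimal sketch, assuming the whole 1 µA budget flows through the load resistors (leakage and periphery ignored), that in each cell essentially one resistor conducts with about VDD across it, and that VDD = 3 V:

```python
# Standby-current budget for an HRL SRAM. Assumptions: the whole chip budget
# flows through the cell load resistors, one resistor per cell conducts with
# roughly VDD across it, and leakage/peripheral currents are ignored.


def hrl_budget(bits: int, i_standby_A: float, vdd: float) -> tuple[float, float]:
    """Return (max load current per cell in A, min load resistance in ohms)."""
    i_cell = i_standby_A / bits
    return i_cell, vdd / i_cell


for mbits in (1, 4, 16):
    i_cell, r_min = hrl_budget(mbits * 2**20, 1e-6, 3.0)
    print(f"{mbits:2d} Mb: I_cell <= {i_cell:.1e} A, R >= {r_min:.1e} ohm")
```

For 4 Mb this gives roughly 2.4 × 10^-13 A per cell and a minimum resistance above 10^13 Ω, and each fourfold increase in density raises the required resistance fourfold.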
Note that the high-level node voltages of all poly-Si load memory cells are (VDD - VT) after a write cycle, where VT is the threshold voltage of the access transistor, subject to body effect. These nodes need a time of several ms to charge up to VDD. The SNM of the poly-Si load memory cell is more sensitive to the cell ratio γ than that of the full CMOS cell [13]. A typical value of γ is 3. Also, the cell stability is drastically degraded when VDD is 3 V or less. The transfer curves in the read mode can easily be plotted for different VDD values to find the voltage below which the cell can no longer store data.
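The charge-up time mentioned here can be estimated with a constant-current model, t_ch = C_s·V_T/I_L, the form given later in this section as Eq. (6.2). A minimal sketch using the figures quoted there (C_s = 10 fF, V_T = 1 V, and I_L = 10 pA for the TFT load versus about 0.1 pA for the poly-Si load); treating the load current as constant over the whole swing is an assumption.

```python
# Constant-current estimate of the time for the high storage node to climb
# from VDD - VT back to VDD: t_ch = C_s * V_T / I_L.


def charge_up_time_s(c_s_F: float, v_t: float, i_l_A: float) -> float:
    return c_s_F * v_t / i_l_A


tft = charge_up_time_s(10e-15, 1.0, 10e-12)    # TFT load, I_L = 10 pA
poly = charge_up_time_s(10e-15, 1.0, 0.1e-12)  # poly-Si load, I_L = 0.1 pA
print(f"TFT: {tft * 1e3:.1f} ms, poly-Si: {poly * 1e3:.0f} ms")
```

The hundredfold smaller load current of the poly-Si resistor translates directly into a hundredfold longer recovery time.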
For 4-Mb and higher-density SRAMs, the polysilicon load cell is starting to be replaced by a polysilicon PMOS load, called a PMOS Thin Film Transistor (TFT), for low-power applications [8, 9, 15]. Fig. 6.7 shows a cross section and circuit diagram of the poly-Si PMOS load memory cell [8]. The TFT device is fabricated from amorphous silicon (a-Si). This material has a grain size of 2 µm, while that of the conventional poly-Si material is 0.03 µm. The thickness of this a-Si is 100 nm, and the gate oxide thickness of the TFT is 40 nm. This technology results in improved ON/OFF currents compared to the one using poly-Si. The N+ drain area of the NMOS transistor is used as the gate electrode for the PMOS TFT. To obtain a small area, the polysilicon PMOS must be stacked on the NMOS driver. The second polysilicon layer forms the channel regions. The TFT memory cell area is more than 40% smaller than the full CMOS one.
Fig. 6.8 shows the drain current of a PMOS TFT used in a 4-Mb SRAM as
a function of the gate voltage. An ON current of more than 10^-7 A is obtained
at a supply voltage of 3 V, while an OFF current of 10^-13 A is attained. The
ON current exceeds the memory-cell leakage currents by more than six orders of
magnitude, which is much better than the current of the HRL cell. This
results in excellent data-retention characteristics. Moreover, the very low
OFF current results in a standby current of less than 1 µA for a 4-Mb SRAM. This
current is low enough for battery back-up operation. At a 1.2 V power supply,
the current flowing in the PMOS TFT is more than one and a half orders of
magnitude larger than the OFF current. This demonstrates the suitability of this
technology for low-voltage operation.
After a write cycle, the high storage node voltage in the cell becomes VDD - VT.
The time needed for charging up this node to VDD is

$$t_{ch} = \frac{C_s V_T}{I_L} \qquad (6.2)$$

where IL is the current flowing in the load device and Cs is the total parasitic
capacitance of the node. Using 4-Mb data for the TFT memory cell, VT = 1 V,
Cs = 10 fF and IL = 10 pA, tch is around 1 ms. For the poly-Si load this
charge-up time is longer than 100 ms because of its low IL, about 0.1 pA. The
average interval between two selections of the same word-line
is given by

$$t_s = \frac{N \, t_{cycle}}{M} \qquad (6.3)$$

where N is the number of memory cells per SRAM chip, M is the number of
memory cells per word-line, and tcycle (also noted tRC) is the operating cycle
time. For 4-Mb, a typical value of ts is 4.5 ms when the cycle time is 70 ns and
M equals 64 cells/word-line. Comparing ts to tch for the poly-Si load and the PMOS
TFT, we have

$$t_{ch} < t_s \quad \text{for the PMOS TFT} \qquad (6.4)$$

$$t_{ch} > t_s \quad \text{for the poly-Si load} \qquad (6.5)$$

Thus, in the PMOS TFT cell the high storage node is charged up
quickly to VDD. For this reason, the Soft Error Rate (SER) of the PMOS TFT
cell is much lower than that of the poly-Si cell [8].
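The comparison in Eqs. (6.2) to (6.5) can be checked numerically. The Python sketch below plugs in the 4-Mb values quoted above (Cs = 10 fF, VT = 1 V, IL = 10 pA for the TFT and about 0.1 pA for the poly-Si load, a 70 ns cycle and 64 cells per word-line). Taking "4 Mb" as 4 × 2^20 cells is an assumption, and the function names are illustrative only.

```python
# Numerical check of Eqs. (6.2)-(6.5) using the 4-Mb values quoted in the text.
# Assumption: "4 Mb" is taken as 4 * 2**20 cells.

def charge_up_time(c_s: float, v_t: float, i_l: float) -> float:
    """Eq. (6.2): time to charge the high node from VDD - VT to VDD, t_ch = Cs*VT/IL."""
    return c_s * v_t / i_l


def select_interval(n_cells: int, t_cycle: float, cells_per_wl: int) -> float:
    """Eq. (6.3): average interval between selections of the same word-line."""
    return n_cells * t_cycle / cells_per_wl


C_S = 10e-15          # storage-node capacitance, 10 fF
V_T = 1.0             # access-transistor threshold including body effect, 1 V
N_CELLS = 4 * 2**20   # 4-Mb array
T_CYCLE = 70e-9       # operating cycle time, 70 ns
M = 64                # cells per word-line

t_s = select_interval(N_CELLS, T_CYCLE, M)
results = {}
for load, i_l in (("PMOS TFT", 10e-12), ("poly-Si load", 0.1e-12)):
    t_ch = charge_up_time(C_S, V_T, i_l)
    results[load] = t_ch
    verdict = "recharged before next selection" if t_ch < t_s else "NOT recharged in time"
    print(f"{load:12s}: t_ch = {t_ch * 1e3:6.1f} ms, t_s = {t_s * 1e3:.2f} ms -> {verdict}")
```

With these numbers ts comes out at about 4.6 ms (the text rounds this to 4.5 ms), so the TFT node (1 ms) recovers well within one selection interval while the poly-Si node (100 ms) does not, which is exactly the content of Eqs. (6.4) and (6.5).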
6.1.3 Read/Write Operation
Fig. 6.9 shows a simplified readout circuit for an SRAM. The circuit has
static bit-line loads composed of pull-up NMOS devices N1 and N2. The
bit-lines are pulled up to a voltage (VDD - VT), where VT is the threshold voltage
subject to body effect. When the word-line WL is asserted, one word is selected.
At this time, the bit-line BL is pulled down to a level determined by the pull-up
NMOS N1, the word-line (access) transistor, and the driver NMOS transistor Nd, as
shown in Fig. 6.9(b). The voltage at node A should stay low (near ground) so as
not to alter the RAM content during this read operation. A small voltage swing
on BL is desirable to achieve high-speed readout, particularly if CBL is
high. The Sense Amplifier (SA) amplifies the small swing ΔV on the bit-line.
A typical value of ΔV is 100 mV. Note that the SA should provide a wide
operating margin over all process, temperature, and voltage corners.
If the WL signal stays asserted, all selected columns consume a DC current
flowing through N1, the access transistor and Nd. Thus, shortening the
read-mode duration is necessary to reduce the power dissipation during this
active mode. This is done by pulsing WL for just long enough to read the cell,
as shown in Fig. 6.10. The pulsed WL signal is generated by the
Address Transition Detection (ATD) technique, as will be discussed in
Section 6.1.5.
Figure 6.10 Power reduction by pulsing the word line.
Fig. 6.11(a) shows a simplified circuit configuration for the SRAM write operation.
For a write operation, the memory cell state should be flipped. When the write
signal WE is asserted, the input data and its complement are placed on the
bit-lines. If, for example, a zero has to be stored in node A, initially at
VDD, the voltage at this node should be pulled below the switching threshold of the
cell, as shown in the equivalent circuit of Fig. 6.11(b). The bit-line in this case is
pulled down to almost 0 V. The design of the write circuitry should provide a wide
operating margin over all process, temperature, and voltage corners. Note that
a DC current is consumed during the write mode; hence, the WE signal should
also be short to cut this current at the end of the write operation. In high-speed
SRAMs, the write recovery time is an important component of the write cycle time.
It is defined as the time necessary to recover from the write cycle to the read
state after the WE signal is disabled. Note that the swing on the bit-lines after a
write operation is large. Thus, an equalizer circuit is needed to reduce this
swing, so that the read operation can be performed quickly.
Fig. 6.12 illustrates a simplified schematic of an SRAM with read/write
circuitry. At the end of the memory cycle a differential voltage exists on the
bit-lines. A PMOS equalizing device is used to equalize the bit-lines after each
read and write operation. The differential voltages on the bit-lines are thus
restored to equal levels.
The active power of an SRAM is dissipated mainly in the following components:
• The decoders (row and column);
• The memory array. If m memory cells are connected to the word-line,
the active power of the memory array (in read mode) is given by

$$P_{array} = m P_{act} + (n-1) m P_{ret} + m I_{DC} \Delta t \, f V_{DD} \qquad (6.6)$$

where Pact is the power dissipated in active mode by each of the m selected
cells and Pret is the data-retention (standby) power of each unselected
memory cell in the m × n array. The second term is negligible. The
third term is due to the DC current, IDC, during the read operation.
Δt is the activation time of the DC-consuming parts and f is the
operating frequency (f = 1/tRC). An example of such a current is the
DC current flowing from the bit-line load to ground through the
memory cell;
• Sense amplifiers. Their power is dominated mainly by a DC current; and
• The remaining periphery, such as input/output buffers, write circuitry, etc.
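To see how the three terms of Eq. (6.6) compare, and why pulsing the word-line pays off, the sketch below evaluates them for one set of hypothetical numbers. Every parameter value here (m, n, Pact, Pret, IDC, VDD, the 20 MHz operating frequency and the two activation times) is an illustrative assumption, not data from the text.

```python
# Evaluate the three terms of Eq. (6.6) for illustrative (hypothetical) values.

def array_active_power(m, n, p_act, p_ret, i_dc, dt, f, vdd):
    """Return the (switching, retention, dc) terms of Eq. (6.6) in watts."""
    if dt * f > 1.0 + 1e-9:
        raise ValueError("activation time dt cannot exceed the cycle time 1/f")
    switching = m * p_act                 # m selected cells, active power each
    retention = (n - 1) * m * p_ret       # unselected cells holding their data
    dc = m * i_dc * dt * f * vdd          # DC path active for dt in every cycle
    return switching, retention, dc


# Hypothetical parameters (assumptions for illustration only).
params = dict(m=128, n=1024, p_act=2e-6, p_ret=1e-12, i_dc=50e-6, f=20e6, vdd=3.0)

for label, dt in (("word-line held for whole cycle", 50e-9), ("pulsed word-line", 10e-9)):
    sw, ret, dc = array_active_power(dt=dt, **params)
    print(f"{label:32s} switching={sw * 1e3:.3f} mW  retention={ret * 1e6:.3f} uW  DC={dc * 1e3:.2f} mW")
```

Under these assumptions the retention term is more than three orders of magnitude below the switching term, in line with the remark that the second term is negligible, and cutting the DC activation time from the full 50 ns cycle to a 10 ns pulse reduces the DC term fivefold.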
Note that the power dissipated by the pads is not included. The power
dissipation of the components other than the memory array depends on their total
capacitances, the operating frequency and the internal voltage swing. It can
include a DC component, with a major contribution from the sense amplifier.
To reduce the active power consumption, many techniques can be used; they are
summarized as follows:
• Reducing the capacitance of the word-line and the number of cells m
connected to it. This is possible by using Hierarchical Word-Line
(HWL) techniques.
• Reducing the DC current by using the pulse operation technique for
the word-line and the periphery circuits (including the sense amplifier).
• Using multi-stage static CMOS decoding to reduce the AC current.
• Lowering the operating power supply voltage.
The standby power (sometimes called the retention current) of an SRAM comes
mainly from the memory cells in the array if the sense amplifiers
are disabled in this mode. It is given by

$$P_{standby} = m n P_{ret} \qquad (6.7)$$
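Eq. (6.7) scales directly with the number of cells m·n, which is why the retention current grows with memory capacity. The sketch below tabulates the standby power for three densities and two supply voltages, assuming (hypothetically) a fixed retention current of 0.1 pA per cell so that Pret = VDD × 0.1 pA; in practice the per-cell leakage also tends to rise with scaling, so this is an optimistic model.

```python
# Eq. (6.7): P_standby = m * n * P_ret, with P_ret = VDD * I_cell.
# Assumption: a constant retention current per cell (hypothetical 0.1 pA).

I_CELL = 0.1e-12  # A per cell, illustrative value only


def standby_power(cells: int, vdd: float, i_cell: float = I_CELL) -> float:
    """Total data-retention power of an array with cells = m * n."""
    p_ret = vdd * i_cell
    return cells * p_ret


for mbits in (1, 4, 16):
    cells = mbits * 2**20
    row = ", ".join(f"{vdd:.0f} V: {standby_power(cells, vdd) * 1e6:.2f} uW" for vdd in (5.0, 3.0))
    print(f"{mbits:2d}-Mb  {row}")
```

Each density generation multiplies the standby power by four, while lowering VDD from 5 V to 3 V buys back only a factor of 5/3 under this constant-current assumption.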
One way to reduce the standby current is to reduce the operating voltage. However,
note that the data-retention current will increase with memory capacity.
Moreover, the leakage current per cell tends to increase because the threshold
voltage is expected to be reduced for low-voltage operation.
In the following sections, many key circuits in an SRAM are reviewed, and the
circuit techniques and memory organisation used to reduce the active and
data-retention currents are presented.
6.1.5 Address Transition Detector (ATD) Circuit

To generate the different timing signals for word-lines, equalisation and sensing,
an on-chip pulse generator that detects address changes is needed. It is
based on the address transition detection technique. The ATD is a key technique
to reduce the active power of memories. Fig. 6.14(a) shows the schematic
diagram of an ATD pulse generator. Short pulses are generated with XOR
circuits when an address bit changes from "L" to "H" or from "H" to "L"; these
pulses are then summed through an OR gate. The overall pulse width is controlled
by the RC delay line shown in Fig. 6.14(b). The corresponding waveforms are shown
in Fig. 6.14(c). The resulting pulse is usually stretched out with a delay circuit
to generate the different pulses needed in the SRAM. Note that the CS signal is
also included as an input to the ATD generator.
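The XOR-plus-OR structure of Fig. 6.14 can be illustrated with an idealized discrete-time model: each input is XORed with a copy of itself delayed by d samples (standing in for the RC delay line), and the XOR outputs are ORed. Gate delays, the RC waveform shape and the pulse-stretching stage are ignored, and the input waveforms are hypothetical.

```python
from typing import List, Sequence


def delayed(signal: Sequence[int], d: int) -> List[int]:
    """Delay a sampled logic waveform by d samples (models the RC delay line).
    The first d samples hold the initial value."""
    if not 0 <= d < len(signal):
        raise ValueError("delay must be between 0 and len(signal) - 1 samples")
    return [signal[0]] * d + list(signal[: len(signal) - d])


def atd_pulse(inputs: Sequence[Sequence[int]], d: int) -> List[int]:
    """XOR every input with its delayed copy, then OR all the XOR outputs."""
    length = len(inputs[0])
    out = [0] * length
    for sig in inputs:
        late = delayed(sig, d)
        for t in range(length):
            out[t] |= sig[t] ^ late[t]
    return out


if __name__ == "__main__":
    a0 = [0] * 3 + [1] * 13    # A0 rises L -> H at sample 3
    a1 = [1] * 10 + [0] * 6    # A1 falls H -> L at sample 10
    cs = [0] * 16              # chip select held active (no transition)
    print("".join(str(b) for b in atd_pulse([a0, a1, cs], d=3)))
```

Both a rising and a falling edge produce a pulse exactly d samples wide, so the run prints `0001110000111000`; making d longer widens every ATD pulse, which is the role of the RC delay line.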
6.1.6 Decoders
Usually the decoding in an SRAM is performed using complementary CMOS.
Two kinds of decoders are used: the row and the column decoders. Fast
static decoders are based on OR/NOR and AND/NAND gates. Fig. 6.15
shows an example of a two-bit input address row decoder. The input buffers
have to drive the interconnect capacitance of the address lines and the input
capacitance of the NAND gates. To match the pitch of the memory cell and to
perform decoding for several blocks, two-stage decoders are used. The first
stage performs predecoding and the second one performs the final decoding
function [Fig. 6.16]. The two-stage decoder circuit has other advantages over
the one-stage decoder, such as a reduced number of transistors and reduced fan-in.
It also reduces the loading on the address input buffers. This predecoding
technique optimizes both speed and power. In the last stage an additional signal φ
is included in the AND gate. This signal is generated from an ATD pulse
generator to enable the decoder and ensure that the word-line is activated only
as a pulse. There are several ways to build row decoders, depending on how the
RAM architecture is divided.
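The two-stage idea of Fig. 6.16 can be sketched behaviorally: address bits are grouped in pairs, each pair is predecoded into four one-hot lines, and each final AND gate takes one line from every group together with the ATD-derived enable pulse. This is a logic-level model only, assuming 2-bit predecoder groups (a common choice, not one stated in the text); it says nothing about transistor sizing or pitch matching.

```python
from itertools import product
from typing import List, Sequence


def predecode_2to4(b1: int, b0: int) -> List[int]:
    """First stage: a 2-bit group becomes 4 one-hot predecoded lines."""
    value = 2 * b1 + b0
    return [1 if i == value else 0 for i in range(4)]


def two_stage_decode(address: Sequence[int], enable: int = 1) -> List[int]:
    """Return one entry per word-line (index = binary address, MSB first).

    Second stage: each word-line is the AND of one predecoded line per group
    and the enable pulse, so its fan-in is len(groups) + 1 instead of
    len(address) + 1 for a single-stage decoder.
    """
    if len(address) % 2:
        raise ValueError("this sketch assumes an even number of address bits")
    groups = [predecode_2to4(address[i], address[i + 1]) for i in range(0, len(address), 2)]
    word_lines = []
    for choice in product(range(4), repeat=len(groups)):
        line = enable
        for group, idx in zip(groups, choice):
            line &= group[idx]
        word_lines.append(line)
    return word_lines


if __name__ == "__main__":
    wl = two_stage_decode([1, 0, 1, 1])   # address 0b1011
    print(wl.index(1), sum(wl))           # selected word-line, number of active lines
```

For a 4-bit address each final gate has three inputs (two predecoded lines plus the enable) instead of five, and with the enable low no word-line is driven, mirroring the pulse-activated word-line described above.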
The column decoder permits the selection of 1 out of m bits of the accessed row.
Fig. 6.17(a) shows the circuits involved in column selection, using an example
of 4 columns. The selection gate permits the transfer of the data from the
bit-lines to the common data-lines I/O. The signals Yi are controlled by the
AND/NAND column decoder, as shown in Fig. 6.17(b).
6.1.7 Bit-line Conditioning Circuitry

The NMOS bit-line loads [Fig. 6.18] have been used in many SRAMs at a 5 V
power supply. They provide a precharge level on the bit-lines of VDD - VT.
The threshold voltage of the load, VT, is subject to the body effect. A typical
value of this precharge level for a 5 V power supply is 3.5 V. This level is suitable
for voltage-type sense amplifiers, providing large gain and a short sensing delay.
To reduce the DC current during the write cycle, a variable bit-line load
technique can be employed [Fig. 6.19]. It realizes fast sensing in the read cycle
and a short write pulse width in the write cycle. For fast sensing, the voltage
swing of the bit-line should be small. To achieve this, the load impedance
should be low. On the other hand, to obtain a low current during the write cycle,
the load impedance of the bit-lines should be high. As shown in Fig. 6.19,
during the read operation, all four NMOS transistors N1, N2, N3, and N4 are
turned ON. The bit-lines are switched into a low-impedance state so that the
voltage swing of the bit-lines is limited to a small value (e.g., 100 mV). During
the write operation, two of the NMOS load devices are switched OFF and only
the two small-size transistors remain ON, so the bit-lines present a high impedance.
Figure 6.19 Variable load bit-lines.
As the power supply voltage is scaled down to 3 V, the precharge level can be
lower than 2 V. Thus, during the read operation the high-level node of the memory
cell can become equal to the bit-line voltage. Hence, the noise margin of the
memory cell is drastically degraded, and consequently the cell stability and
soft-error immunity are degraded. Therefore, at a 3 V power supply voltage, a PMOS
transistor can be used as the bit-line load [Fig. 6.20]. The bit-line precharge
voltage is then VDD. With the bit-lines precharged this close to the supply, special
sense amplifiers should be used because conventional sensing circuits have poor
voltage gain there (less than 10). A variable-impedance bit-line load using PMOS
transistors can also be implemented.
6.1.8 Sense Amplifier

When a memory cell is read, the bit-lines are initially precharged; then one
of the two bit-lines goes down, while the other stays high. Pulling down the
bit-line is very slow because the discharging MOS device in the memory cell
is small and the bit-line capacitance is high. This would result in a
very slow memory read time. Sense amplifiers are therefore used to detect the small
variation on the bit-lines and amplify it into a full-swing signal. A
simple unbalanced inverter with a high logic threshold voltage can be used.
Since its input is single-ended and it has a very small noise margin, it is very
sensitive to noise on the bit-line. Thus, sense amplification for the data-lines
is key to achieving fast access time and low power dissipation. In general, the
delay of a sense amplifier (from the time of word-line activation) represents
30 to 40% of the whole read access time.
Various kinds of sense amplifiers have been devised for fast sensing operation
and low power dissipation. Fig. 6.21(a) shows a single-end sense amplifier with
an active current-mirror load. This structure forms the basis for many SRAM
sense amplifier circuits. It has two differential inputs, DL and /DL. Noise
affects both inputs equally and only the difference is detected. An NMOS
transistor acts as a current source. Before the signal φSA is asserted, the
data-lines DL and /DL are high, and all the nodes A, B and C are high. The
signal φSA is asserted when DL starts, for example, to drop slowly. In this case,
the current-source NMOS transistor is ON. The output voltage (node C) drops
suddenly to a certain voltage. Thus, the input signal is amplified by the gain
of this differential amplifier.
Fig. 6.21(b) shows the voltage waveforms of the single-end sense amplifier
obtained by SPICE simulation. The signal φSA is generated with an ATD pulse. It is
asserted just long enough to amplify the small variation (a few hundred mV)
on the data-lines, and then it is deactivated; the output of the sense amplifier
is then latched. In this scheme the DC current consumed by the sense amplifier
is cut off. Usually the sense amplifier is shared by many columns through the
common data-lines. The small-signal gain of this amplifier is given by

$$A_v = \frac{g_m}{g_o} \qquad (6.8)$$

where gm is the transconductance of the driver NMOS Nd and go is the combined
output conductance of the PMOS load and the NMOS driver.
In many SRAMs multi-stage sense amplifiers are needed to attain a large voltage
gain. In this case, the double-end sense amplifier is used, as shown in
Fig. 6.22. This circuit has been used in many SRAMs. To attain high-speed
data sensing, a two- or three-stage sense amplifier technique can be adopted.
Fig. 6.23 shows a two-stage amplifier structure. An equalization technique is
used for the data-lines, using the equalization pulse φeq, which is generated
with an ATD pulse. It is indispensable, not only to attain faster data transfer
during the read operation, but also to suppress incorrect data before the correct
data appears in the sense amplifier [17]. For low-power applications, and also
because of the plastic packaging limitations of static memories, this type of sense
amplifier can result in excessive power dissipation for high-density memories even
if the current source is pulsed.
Figure 6.24 PMOS cross-coupled sense amplifier.
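Eq. (6.8) can be turned into numbers with the long-channel square-law model, where gm = 2·ID/Vov for the driver and each device at the output node contributes an output conductance λ·ID. The bias current, overdrive voltage and channel-length-modulation coefficients below are hypothetical; the sketch only shows how Av = gm/go depends on them.

```python
# Small-signal gain of the current-mirror sense amplifier, Eq. (6.8): Av = gm / go.
# Assumptions: square-law MOSFETs, hypothetical bias point and lambda values.

def driver_gm(i_d: float, v_ov: float) -> float:
    """Square-law transconductance of the NMOS driver: gm = 2*ID/Vov."""
    return 2.0 * i_d / v_ov


def output_conductance(i_d: float, lambda_n: float, lambda_p: float) -> float:
    """Combined output conductance of the NMOS driver and PMOS load at the output node."""
    return (lambda_n + lambda_p) * i_d


def sense_amp_gain(i_d: float, v_ov: float, lambda_n: float, lambda_p: float) -> float:
    """Av = gm / go for the given bias point."""
    return driver_gm(i_d, v_ov) / output_conductance(i_d, lambda_n, lambda_p)


if __name__ == "__main__":
    for v_ov in (0.2, 0.4):
        av = sense_amp_gain(i_d=100e-6, v_ov=v_ov, lambda_n=0.1, lambda_p=0.1)
        print(f"Vov = {v_ov:.1f} V -> Av = {av:.1f}")
```

In this model the bias current cancels, leaving Av = 2/[Vov(λn + λp)]: the gain (50 and 25 for the two overdrives above) is set by the driver's overdrive and by channel-length modulation, which is why a single stage is often insufficient and multi-stage amplifiers are used.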
Many circuits have been proposed to reduce the power of the sense amplifier
while improving its sensing delay time. One of them is the PMOS cross-coupled
amplifier [18] shown in Fig. 6.24. The PMOS loads, P1 and P2, are
cross-coupled and the differential outputs S and /S are connected to their gates.
The positive feedback in this latch amplifier permits much faster sensing
than the conventional one. In this circuit the equalization technique is used
for the reasons discussed above. Fig. 6.25 shows the sense delays of both the
PMOS cross-coupled amplifier and the double-end current-mirror amplifier as
a function of the average current of the amplifier. The input voltages simulate
the common data-lines' voltages, and the sense delay td is defined as the delay
time from the crossover point of the input voltages to the point when the output
reaches a 1 V difference. The PMOS cross-coupled amplifier has less than half the
delay of the conventional current-mirror sense amplifier. Moreover, this latch
amplifier consumes less than one-fifth of the power of a current-mirror amplifier.
The PMOS cross-coupled latch amplifier requires much more accurate timing
for φSA to optimize the sensing delay [18]. This circuit also has a low-power
property compared to the current-mirror amplifier since it has nearly full-swing
outputs with positive feedback.
Figure 6.25 Sense delay td versus average amplifier current for the PMOS cross-coupled and conventional current-mirror sense amplifiers (0.6 µm CMOS).
When the supply is scaled to 3 V, the data-line voltage is near
VDD, so level shifting can be performed. Fig. 6.26 shows a two-stage
sense amplifier used for a 3.3 V supply. The first stage is a cross-coupled NMOS
amplifier which also performs level shifting of the common data-line voltage.
In the second stage, a conventional sense amplifier is used, which operates at
its maximum-gain point since its input levels are at medium voltages.
Fig. 6.27 shows another sense amplifier developed for a low-voltage power supply
[19]. This circuit is used when the bit-lines are close to VDD, where the gain of
a conventional current-mirror amplifier is poor. The circuit is composed of a
level-shift circuit and a conventional current-mirror amplifier. The level-shifter
shifts the bit-line voltage to a medium voltage, 0.6 to 0.7 V (at a 1 V power
supply voltage), where the gain is maximum. Low-VT NMOS devices N1 and
N2 are used to provide these medium levels. These devices are subject to the
body effect.
Recently, current sense amplifiers have been proposed to overcome the gain
reduction of voltage amplifiers at low power supply voltages [7, 12]. They also
reduce the power dissipation of the sensing operation compared to voltage sense
amplifiers at the same delay. These circuits require very careful design.
6.1.9 Output Latch

In low-power SRAMs, the pulse technique for the word-line and the sense amplifier is
indispensable in order to reduce the DC current. In such a pulse mode, a data-latch
circuit is required to store the data amplified by the sense amplifier from
the memory cell for the data output circuitry. Fig. 6.28 shows an example of
an output latch placed after the sense amplifier. The requirements of such an
output latch are the following:
• The latch circuit must not delay the read access time. This requirement
is met by connecting the latch to the data-bus lines in parallel.
One input transmission gate, controlled by φ1, is used to enter the data
into the latch. Another transmission gate, controlled by φ0, is used to
put the data back onto the data-bus.
• The latched data must not be destroyed by noise entering the
SRAM. Noise in an SRAM is generated and propagated by the following
mechanism. On the system board, ground noise can enter the
SRAM. When the peak level of the ground noise becomes large enough
for the first gate of the address buffer to change the logic value of the
address input, an ATD noise pulse is generated. This noise pulse could
turn on the word-line and the sense amplifier for a short time, resulting
in an unexpected signal on the data-bus. Therefore, the latched data
could be destroyed if the input gate φ1 is ON. To avoid such a problem,
two circuit techniques are included in the circuit of Fig. 6.28. The first
one is the generation of φ1 only when the pulse width of the ATD is
large enough compared to that of the noise. The other circuit technique
is to place latch-protecting inverters [Fig. 6.28] in front of
the output gates. The inverters prevent noise from entering the output
gates.
• The new data must be quickly latched into the data-latch. The circuit
of Fig. 6.28 can be optimized for fast operation.
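The first noise-protection measure, generating φ1 only for ATD pulses that are wide compared with noise pulses, amounts to a pulse-width discriminator. One common way to build such a filter, shown here as a hypothetical discrete-time sketch rather than the actual circuit of Fig. 6.28, is to AND the pulse with a copy of itself delayed by d samples: an isolated pulse d samples wide or narrower produces no output.

```python
from typing import List, Sequence


def width_filtered(atd: Sequence[int], d: int) -> List[int]:
    """phi1[t] = atd[t] AND atd[t - d].

    An isolated pulse of width <= d samples is rejected; a wider pulse passes
    with its leading d samples removed.
    """
    if d < 1:
        raise ValueError("delay must be at least one sample")
    return [atd[t] & (atd[t - d] if t >= d else 0) for t in range(len(atd))]


if __name__ == "__main__":
    # A 2-sample noise pulse followed by a 6-sample genuine ATD pulse.
    atd = [0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0]
    print(width_filtered(atd, d=3))
```

The short noise pulse never opens the latch input gate, while the genuine ATD pulse still produces a (shortened) φ1; the price of this choice is that φ1 starts d samples later, which must be absorbed in the latch timing.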
6.1.10 Hierarchical Word-Line for Low-Power Memory

With increasing memory size, the word-line delay and the column power
increase. To solve this problem, a Divided Word-Line (DWL) structure was
proposed [20]. The concept of DWL is shown in Fig. 6.29. The cell array
and the word-line are divided into nB blocks (sub-arrays). If the SRAM has
nC columns, each block has nC/nB columns. The divided word-line of each
block is activated by the main word-line and the corresponding block select
signal. Consequently, only the memory cells connected to one divided word-line
within a selected block are accessed in a cycle. Hence, the column current
is reduced, since only the selected columns switch. Moreover, the word-line
selection delay, which is the delay time from the address input to the divided
word-line, is reduced. This delay is composed of the main word-line select delay
and the divided word-line select delay. The main word-line selection delay is
reduced compared to the conventional one, because the total capacitance of
connected transistors is reduced. In a conventional SRAM, the word-line has
the gates of all the memory cells of a row connected to it. The main word-line
delay increases as the number of blocks increases because the number of block
select gates increases. On the other hand, the divided word-line delay decreases
as the number of connected cells is reduced with the increasing number of blocks.
Consequently, the word-line selection delay has a minimum for a certain number
of blocks.
Figure 6.29 Divided Word-Line (DWL) concept [20].
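The trade-off that gives the word-line select delay a minimum can be made concrete with a deliberately simple, hypothetical model: the main word-line delay grows linearly with the number of blocks nB (more block-select gates), while the divided word-line delay falls as 1/nB (fewer cells per divided word-line). The coefficients below are arbitrary illustration values, not fitted to Fig. 6.30; they were chosen so that the optimum falls at 8 blocks. The relative column power is taken as 1/nB, since only nC/nB columns switch.

```python
import math

# Hypothetical delay model for the DWL structure (coefficients are illustrative).
A_FIXED = 1.0   # ns, decoder/driver delay independent of nB
B_MAIN = 0.1    # ns per block, main word-line loading from block-select gates
C_DIV = 6.4     # ns, divided word-line delay when nB = 1


def select_delay(n_blocks: int) -> float:
    """Main word-line delay (grows with nB) plus divided word-line delay (falls as 1/nB)."""
    return A_FIXED + B_MAIN * n_blocks + C_DIV / n_blocks


def relative_column_power(n_blocks: int) -> float:
    """Only nC / nB columns switch, relative to a conventional undivided array."""
    return 1.0 / n_blocks


if __name__ == "__main__":
    candidates = [1, 2, 4, 8, 16, 32]
    for nb in candidates:
        print(f"nB={nb:2d}  delay={select_delay(nb):.2f} ns  column power={relative_column_power(nb):.3f}")
    best = min(candidates, key=select_delay)
    print(f"minimum delay at nB = {best} (continuous optimum sqrt(c/b) = {math.sqrt(C_DIV / B_MAIN):.1f})")
```

For a model of the form a + b·nB + c/nB the continuous optimum is nB = sqrt(c/b); in a real design the coefficients come from extracted capacitances, and the area penalty of the extra block-select gates also enters the choice.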
Fig. 6.30 shows the effect of the number of blocks in the DWL structure on the
word-line select delay and the column power for a 64-Kb SRAM [10]. In this
example, eight blocks can be chosen. The area penalty in this case is only 5%
compared to the conventional memory. As another example, for a 1-Mb SRAM,
the cell array is divided into 16 blocks and each block consists of
512 rows by 128 columns. A 9-bit address (A0...A8) is used to select a row within
a block using a two-stage row decoder. Global block selection is done using a
4-bit address.
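For the 1-Mb example, 16 blocks × 512 rows × 128 columns = 2^20 cells, so a ×1 organization needs 20 address bits: 4 for the block, 9 for the row within the block, and 7 for the column. The sketch below shows one such split; the ×1 organization and the bit ordering (block bits on top, column bits at the bottom) are assumptions for illustration, since the text does not specify them.

```python
from typing import Tuple

# Address partitioning for the 1-Mb DWL example (16 blocks x 512 rows x 128 columns).
# Assumptions: x1 organization; bit order [block | row | column], MSB to LSB.
BLOCK_BITS, ROW_BITS, COL_BITS = 4, 9, 7
TOTAL_BITS = BLOCK_BITS + ROW_BITS + COL_BITS


def split_address(addr: int) -> Tuple[int, int, int]:
    """Return (block, row, column) for a flat cell address."""
    if not 0 <= addr < 1 << TOTAL_BITS:
        raise ValueError("address outside the 1-Mb array")
    col = addr & ((1 << COL_BITS) - 1)
    row = (addr >> COL_BITS) & ((1 << ROW_BITS) - 1)
    block = addr >> (COL_BITS + ROW_BITS)
    return block, row, col


if __name__ == "__main__":
    block, row, col = split_address(0xABCDE)
    active_cols = 1 << COL_BITS
    total_cols = (1 << BLOCK_BITS) * active_cols
    print(f"block {block}, row {row}, column {col}; {active_cols} of {total_cols} columns activated")
```

Only the 128 columns of the selected block (1/16 of the 2048 columns in the array) are activated, which is where the column-power saving of the DWL structure comes from.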
The DWL structure has been widely used in high-density SRAMs for its low-power,
high-speed characteristics. However, in high-density SRAMs with a
capacity of more than 4 Mb, the number of blocks in the DWL structure will
have to increase. Therefore, the capacitance of the global word-line increases,
causing the delay and power to increase. To solve this problem, the concept of
Hierarchical Word Decoding (HWD) was proposed in [21], as shown in Fig.
6.31. The word select line is divided into more than two levels. The number of
levels (hierarchy) is determined by the total load capacitance of the word select
line so as to distribute it efficiently. Hence, the delay and the power are reduced.
For 4-Mb, three levels of hierarchy have been used with 32 blocks, each block
having 128 columns by 1024 rows. Fig. 6.32 compares the delay time and the total
capacitance of the word decoding path for the optimized DWL
and HWD structures of 256-Kb, 1-Mb, and 4-Mb SRAMs. For the 256-Kb SRAM
there is no significant advantage of HWD over DWL. However, for high-density
SRAMs the advantage of HWD in terms of power and delay becomes clear.
The three-level scheme can be used efficiently for 16-Mb SRAMs.
6.1.11 Low-Voltage SRAM Operation and Circuitry

There are several applications which need a 1.2 V battery power supply. For
such applications, 1 V SRAMs are needed. At a 1 V power supply, stable
operation is the target and it is very important to reduce noise. Moreover,
the active and standby powers should be reduced to meet the requirements of
battery operation.
For a 1 V power supply, a full CMOS memory cell has lower power dissipation
in standby mode and greater immunity to transient noise and voltage variation
than other cells. It can also operate at the lowest supply voltages. Although
a full CMOS cell operates well at ultralow voltage, its area is almost double
that of the PMOS TFT cell. Hence it is not suitable for high-density memories
(size > 4 Mb).
When the full CMOS memory cell is operated at a 1 V power supply, a typical
cell ratio for stable operation is 3. The SNM of this cell at 1 V can be almost
the same as that of a poly-Si load memory cell at 5 V. When using the full CMOS
cell, no boosting of the word-line is needed to write a high voltage level in the
cell. However, the PMOS TFT cell requires a boosted voltage (Vch > VDD)
on the word-line during the write cycle [19]. If the word-line voltage is
raised only to VDD in the write cycle, the high node B of Fig. 6.33 is initially
at VDD - VT, where VT is the threshold voltage of the access device subject to
the body effect. This low level (VDD - VT) of node B cannot charge up to
VDD because of the poor drivability of the PMOS TFT device.
When the boosted word-line technique is applied to the PMOS TFT cell during
a write cycle, a problem can arise. The unselected cells connected to the boosted
common word-line suffer from an instability problem because a large current
flows through the low node of the cell. This large current is due to the high
voltage on the access transistor. Consequently, this technique is not suitable
for 1 V operation.
Figure 6.34 Two-step technique for 1 V operation [19].
Figure 6.35 (a) TSW and write circuits [19].
A Two-Step Word (TSW) voltage technique has been proposed by Ishibashi et
al. [19] to solve this problem. Fig. 6.34 shows the block diagram of the
proposed memory. The boosted-level generator (presented in Section 6.2.11)
generates a voltage Vch = 1.5 V for VDD = 1 V. The word-line voltage has two
steps: one is VDD and the other is Vch. The circuitry for the TSW method is
shown in Fig. 6.35(a). When the word-driver input goes to zero, the signal WL is
raised to VWL = VDD. Then, when φch is asserted with a high level equal to Vch,
the transistor P1 turns ON and the WL level is increased to VWL = Vch. In this
case, the low-threshold-voltage device turns OFF and the driving inverter is
isolated to reduce any leakage current.
Fig. 6.35(b) shows the voltage waveforms for the TSW circuitry in read/write
modes. During the write cycle, the high node A is first charged to a low voltage,
then raised further when the word-line is boosted. The bit-lines are initially
floating, then precharged at the end of the write cycle. In the next read cycle,
the bit-lines are floating. Before the word-line voltage rises to Vch, the cell
discharges BL through the low node B. Thus, when the word-line has risen to Vch,
current does not flow in the cell and node B stays at a low voltage level. Note
that this technique requires multi-VT CMOS devices and causes a delay in writing
because the bit-lines are discharged before writing.
However, the low-voltage SRAMs discussed above require a relatively high
threshold voltage, VT ≥ 0.5 V. Thus, their speed is quite slow. As an
example, a 256-Kb SRAM with full CMOS memory cells attained a 3 µs access
time at a 1 V power supply using 0.8 µm CMOS technology [22]. The active
power at 0.1 MHz is 0.2 mW and the standby power is 5 nW. Another example
is a 1-Mb SRAM with full CMOS memory cells which achieves a 200 ns access
time at a 1 V power supply using 0.5 µm CMOS technology [23]. The active
power at 1 MHz is 0.1 mW and the standby power is 10 nW. Note that if
the threshold voltage is made too low for ultra-low-voltage applications, all the
circuits composing the SRAM will suffer from subthreshold leakage current.
Thus, the retention current increases drastically, causing a serious problem for
low-power applications. Moreover, temperature effects and threshold
voltage variation increase this current. So far, no practical solution has been
proposed.
6.2 DYNAMIC RAM

The first dynamic RAM (DRAM) was introduced in 1970 with a capacity of
1 Kb. Since then, the density has quadrupled every three years (one
generation). Recently, some experimental 256-Mb DRAMs have been reported [24, 25, 26].
At present, low-voltage 16-Mb DRAMs are in high-volume production. The
development of these higher densities has made DRAMs the cheapest memory per bit
compared with other types of memories. They are widely used as the main
memory of mainframes, PCs, and workstations. The access time has been
decreased from a few hundred ns for 4-Kb DRAMs to less than 50 ns for 256-Mb.
Also, the power dissipation has been reduced by an order of magnitude from
4-Kb capacity to 256-Mb capacity, reaching 50 mW at a 1.5 V power supply. The
area of the memory cell has been reduced from more than 100 µm² for the 64-Kb
DRAM to 1.28 µm² for the 64-Mb DRAM.
In addition to the trend toward higher-density standard DRAMs, there are two
other trends: Low-Power (LP) DRAMs and high-speed DRAMs. High-speed
DRAMs sacrifice retention current as well as density for faster access
time. Low-voltage, low-power DRAMs are becoming important, particularly
for battery operation: LP DRAMs extend the time of battery operation
as well as of battery back-up operation. The active current of LP DRAMs has
been lowered. The data-retention current has also been reduced, but it is still
about one order of magnitude higher than that of SRAMs (for 1-Mb memories).
The 5 V external power supply standard was used for many DRAM generations,
from 64-Kb to 16-Mb. It was followed by the 64-Mb DRAM powered with an external
3.3 V, not only to reduce the power dissipation, but also to ensure reliability.
The gate oxide reliability limits the maximum voltage, which is related to the
boosted voltage inside the chip. Regarding the internal voltage, 5 V can
be used up to a maximum DRAM capacity of 4 Mb. At the 16-Mb generation, the
internal voltage is 3.3 V while an external 5 V is maintained with an on-chip voltage
down converter [see Section 6.3]. However, the 3.3 V external power supply will
dominate.
Figure 6.36 Trends of DRAM supply voltage [28] (density from 1 Mb to 1 Gb; feature size from 1.3 to 0.1 µm; gate oxide thickness Tox from 25 to 5 nm).
Recently, activities to realize 1.5 V battery-operated DRAMs have been
accelerating the trend toward low-voltage operation [27, 28, 29]. Fig. 6.36 shows
the trend of DRAM supply voltage [28]. In battery operation, the chip must operate on
a variety of batteries with various supply voltages, over long periods and under
supply fluctuations.
6.2.1 Basics of a DRAM

In general the pins of a DRAM are :
m Address; which is seprrrated in time with two separate fields. There

fields are the row and column address.
1 Row Address Strobe (m).
The row address is docked by this signal.
rn Column Address Strobe (m). The column address on the multi-
plexed pins is clocked by this signal.
rn Write Enable (m).
.
m Inpnt/outpot data pi...
External power supply pins.
Clearly, the multiplexed address penalizes the access delay, so fast DRAMs can use separate address input pins instead. Multiplexing reduces the pin count and the cost of packaging. An example of DRAM timing, using address multiplexing during the read mode, is shown in Fig. 6.37. Some important times are shown, such as the access time from RAS low, tRAC, the row address strobe cycle time (or cycle time), tRC, and the row address strobe low-state time, tRAS.
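As an illustration of address multiplexing, the Python sketch below splits a flat cell address into the row field, sent first and latched by RAS, and the column field, sent next on the same pins and latched by CAS. It assumes a hypothetical 4-Mb ×1 part organized as 2048 rows by 2048 columns; `split_address` and `pins_needed` are illustrative names only.

```python
# Hypothetical organization: 4-Mb x1 = 2048 rows x 2048 columns.
ROW_BITS = 11
COL_BITS = 11


def split_address(addr: int) -> tuple[int, int]:
    """Split a flat cell address into (row, column) fields.

    The row field is driven first (latched by RAS), then the column
    field is driven on the same pins (latched by CAS)."""
    if not 0 <= addr < 1 << (ROW_BITS + COL_BITS):
        raise ValueError("address out of range")
    row = addr >> COL_BITS                 # upper bits select the word-line
    col = addr & ((1 << COL_BITS) - 1)     # lower bits select the bit-line pair
    return row, col


def pins_needed(multiplexed: bool) -> int:
    """Address pins required with and without row/column multiplexing."""
    return max(ROW_BITS, COL_BITS) if multiplexed else ROW_BITS + COL_BITS


row, col = split_address(0x2ABCDE)
print(f"row=0x{row:03X} col=0x{col:03X}")
print(pins_needed(True), "pins multiplexed vs", pins_needed(False), "pins separate")
```

With this organization, 11 multiplexed pins replace 22 dedicated ones, which is the pin-count saving described above, paid for with the extra RAS-to-CAS step in every access.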
Fig. 6.38 shows a generic 4-Mb DRAM architecture. It uses almost the same circuit techniques as an SRAM, except for the memory array. Some additional circuits are needed, such as a Back-Bias Generator (BBG), a Half-Voltage Generator (HVG), an optional Voltage-Down Converter (VDC), a Reference Voltage Generator (RVG), and a boosted voltage generator circuit. The substrate back-bias voltage is indispensable for stable operation of the DRAM array. The half-voltage generator generates the half-VDD precharge level for the bit-lines, as explained in the following sections. The reference voltage generator is needed for the VDC. The boosted voltage generator uses a charge-pump circuit and permits overdriving of the word-line WL to a voltage higher than VDD. More details on these circuits are given in the following sections.
6.2.2 DRAM Memory Cell

CMOS DRAMs with three-transistor and four-transistor cells were used in the 1-Kb and 4-Kb generations. The one-transistor (1T) cell offers a smaller chip size and lower cost, which justify the process complexity of fabricating the 1T cell, particularly its capacitor.
A schematic of a 1T DRAM cell is illustrated in Fig. 6.39(a). The charge is stored in the capacitor CS. To prevent loss of the stored information, the capacitor must be refreshed within a specific time by special circuitry. The bit-line has a capacitance CBL, including the parasitic load of the connected circuits. Typical values for the storage and bit-line capacitances are 30 fF and 250 fF, respectively. The ratio R = CBL/CS is very important for the sensing operation.
During the read operation (WL is selected), the bit-line voltage changes by

$$\Delta V_{BL} = (V_{MC} - V_{BL})\,\frac{C_S}{C_S + C_{BL}} = \frac{V_{MC} - V_{BL}}{1 + R}$$

where (VMC - VBL) is the difference between the memory cell voltage and the bit-line voltage before the selection of the cell. A typical value of this difference is VDD/2. Hence, the bit-line sense signal is

$$V_S = \frac{V_{DD}/2}{1 + R} \qquad (6.3)$$
For a 3.3 V supply voltage and a ratio R = 8 for a 16-Mb DRAM, the sense signal is VS = 180 mV. Such a small bit-line voltage change requires sensitive sensing circuits. For low-voltage operation, VS decreases, so a low ratio R is required. This is possible by reducing CBL and increasing CS.
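The charge-sharing result can be checked numerically. This sketch assumes ideal charge sharing and a cell-to-bit-line difference of VDD/2, so that VS = (VDD/2)/(1 + R); it evaluates R from the typical capacitances quoted above (CS = 30 fF, CBL = 250 fF) and adds a 1.5 V case purely to show the low-voltage trend.

```python
def bitline_ratio(c_bl: float, c_s: float) -> float:
    """R = C_BL / C_S."""
    return c_bl / c_s


def sense_signal(vdd: float, r: float) -> float:
    """Bit-line sense signal V_S = (VDD/2) / (1 + R), assuming ideal charge sharing."""
    return (vdd / 2) / (1 + r)


r_typ = bitline_ratio(250e-15, 30e-15)  # typical values from the text
for vdd in (3.3, 1.5):
    print(f"VDD={vdd} V: R=8 -> {1e3 * sense_signal(vdd, 8):.0f} mV, "
          f"R={r_typ:.2f} -> {1e3 * sense_signal(vdd, r_typ):.0f} mV")
```

For VDD = 3.3 V and R = 8 this gives about 183 mV, in line with the 180 mV quoted above; at 1.5 V the same R leaves well under 100 mV, which is why R must be reduced for low-voltage operation.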
CS was implemented using a simple planar-type capacitor, as shown in the structure of Fig. 6.39(b). This structure was used in DRAMs with capacities up to 1-Mb. With increasing density, many three-dimensional approaches were used for DRAMs with capacities higher than 1-Mb. One approach is to stack the capacitor over the access transistor (STC cell). Another approach is to use a trench capacitor. For more details on advanced cell structures, the reader can consult [30, 31].
The signal charge (Qsig = CS·ΔVS) transferred to the bit-line during a read operation should have enough margin against noise. The sources of noise are the following:
■ bit-line noise, which is caused by capacitive couplings and other sources;
■ leakage charge, which is mainly due to the junction leakage of the NMOS transistor of a 1T memory cell; and
■ alpha-particle-induced soft errors.
In early DRAMs, the plate of the capacitor was grounded to reduce the noise injection from the VDD power supply. However, for multi-Mb DRAMs, a VDD/2 bias for the cell plate was used. This scheme has several advantages, such as reduced stress on the thinner oxide of the storage capacitor and reduced supply-voltage noise. Many 1-Mb DRAMs have used this cell-biasing scheme.
For Gb DRAM cell design with reduced VDD, the ratio R should be reduced. This is possible by reducing the bit-line capacitance, CBL, and increasing the storage capacitance, CS. On the other hand, the area occupied by CS should be reduced to increase the chip capacity. One solution for reducing the capacitor area is the use of a capacitor insulator with extremely high permittivity ε, such as ferroelectric materials like BaSrTiO3 film. Consequently, a simple planar-type capacitor can be used in that case.
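To see why a high-permittivity insulator makes a planar cell attractive again, this sketch inverts the parallel-plate formula C = ε0·εr·A/t to obtain the plate area needed for the 30 fF storage capacitance quoted earlier. The permittivities and film thicknesses below are illustrative assumptions, not values from this chapter; real BaSrTiO3 permittivity depends strongly on the film process and thickness.

```python
EPS0 = 8.854e-12  # vacuum permittivity, F/m


def plate_area_um2(c_farads: float, eps_r: float, t_m: float) -> float:
    """Plate area in um^2 of a parallel-plate capacitor C = eps0*eps_r*A/t."""
    area_m2 = c_farads * t_m / (EPS0 * eps_r)
    return area_m2 * 1e12  # m^2 -> um^2


C_S = 30e-15  # storage capacitance from the text
# Assumed dielectrics, for illustration only.
DIELECTRICS = [("SiO2, 5 nm", 3.9, 5e-9), ("BaSrTiO3, 30 nm", 300.0, 30e-9)]
for name, eps_r, t in DIELECTRICS:
    print(f"{name}: {plate_area_um2(C_S, eps_r, t):.2f} um^2")
```

Even with a film six times thicker, the assumed high-permittivity insulator needs more than ten times less area than the oxide, which is the reason a simple planar structure becomes viable again.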
6.2.3 Read/Write Circuitry
Fig. 6.40 illustrates the different circuits for the read, write, precharge, and equalization functions. The read operation is performed as follows. Initially, both bit-lines (BL and /BL) are precharged to a voltage equal to VDD/2 and equalized before the data reading operation. This half-VDD precharge technique reduces the active power dissipation, as discussed in Section 6.2.9. The signal WL is selected by the row decoder. The high level of the word-line voltage has to be greater than VDD to increase the stored charge in the memory cell. The selected memory cell is connected to one bit-line. Then ΔVBL (100 to 200 mV) appears between the bit-lines immediately after the word-line rises. It is then amplified by the latch-type CMOS sense amplifier, which is connected to both bit-lines. After the sensing and restoring operations, the bit-lines are in a full-swing condition. The bit-line differential voltage signal is transferred to the differential output-lines (O and /O) through a read circuit. The signal YR is selected almost at the same time as WL. The parasitic capacitance of the output-line is large (a typical value is 2 pF for a 4-Mb DRAM), and the readout circuit alone would need a long time to amplify the output-line signal. A main sense amplifier is therefore used to read the output-lines; the data is then selected among several main SAs connected to different sub-arrays. Finally, it is transferred to the output buffer.
The DRAM cell readout mechanism is destructive, and hence the same data must be written back to the cell on every read access. Consequently, a CMOS amplifier is needed on each bit-line pair to amplify and restore the level. This mechanism is not needed in SRAMs, since their read operation is non-destructive.
In the write mode, the YW signal is selected by a column decoder, as shown in Fig. 6.40. In this case, the write control signal is activated. The selected bit-lines are connected to a pair of write-lines, W and /W, and the data are transferred to the memory cell when WL goes HIGH.
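The read sequence of Fig. 6.40 can be summarized behaviorally. The sketch below is a simplified model, not a circuit simulation: it assumes ideal charge sharing, an ideal latch that resolves any nonzero differential, and a cell storing either VDD or 0 V. `Cell` and `read_cycle` are hypothetical names. It makes the destructive nature of the read, and hence the need for the restore step, explicit.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """A 1T cell, modeled only by the voltage on its storage capacitor."""
    v: float  # VDD for a stored "1", 0 V for a stored "0"


def read_cycle(cell: Cell, vdd: float, c_s: float, c_bl: float) -> int:
    """Behavioral model of one DRAM read: precharge, share, sense, restore."""
    # 1. Precharge and equalize BL and /BL to VDD/2.
    v_bl = v_blb = vdd / 2
    # 2. Word-line rises: the cell shares its charge with BL.
    v_bl = (cell.v * c_s + v_bl * c_bl) / (c_s + c_bl)
    cell.v = v_bl  # the stored level is now lost (destructive read)
    # 3. The latch-type sense amplifier resolves the small difference
    #    and drives the pair to full swing.
    bit = 1 if v_bl > v_blb else 0
    v_bl, v_blb = (vdd, 0.0) if bit else (0.0, vdd)
    # 4. Restore: with the boosted word-line still high, BL writes the
    #    full level back into the cell.
    cell.v = v_bl
    return bit


c = Cell(v=3.3)
print(read_cycle(c, vdd=3.3, c_s=30e-15, c_bl=250e-15), c.v)
```

Skipping step 4 would leave the cell at the shared bit-line voltage, which is exactly the information loss the restore operation repairs.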
6.2.4 Low-Power Techniques

Fig. 6.38 can be used to identify the different sources of power dissipation in a DRAM. For simplicity, we assume that the internal supply voltage is the same as the external one. The total power dissipated is the sum of two components: the active power and the data-retention power. The active power is the sum of the power dissipated by the following components:
■ The decoders (row and column);
■ The memory array. This is the dominant component. If m memory cells are connected to the word-line, the active power of the memory array is given by

$$P_{act,array} = m \times P_{acm} \qquad (6.11)$$

where Pacm is the active-mode power associated with each of the m selected cells, that is, with charging its bit-line. It is given by

$$P_{acm} = C_{BL}\,\Delta V_{BL}\,V_{DD}\,f \qquad (6.12)$$

where ΔVBL is the bit-line voltage swing and f is the operating frequency;
■ The sense amplifier;
■ Other circuits, such as the refresh circuit, substrate back-bias generator, boosted level generator, voltage reference circuit, and half-VDD generator. These circuits also dissipate a DC current;
■ The rest of the periphery, such as the main sense amplifier, input/output buffers, write circuitry, etc.
Note that the power dissipated by the pads is not included.
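Equations (6.11) and (6.12) combine to P = m·CBL·ΔVBL·VDD·f. The sketch below evaluates this product; m = 1024 selected cells and f = 10 MHz are hypothetical values, CBL = 250 fF is the typical value quoted in Section 6.2.2, and ΔVBL = VDD/2 assumes half-VDD precharge.

```python
def array_active_power(m: int, c_bl: float, dv_bl: float, vdd: float, f: float) -> float:
    """Memory-array active power: m cells, each charging C_BL by dV_BL from VDD at rate f."""
    p_per_cell = c_bl * dv_bl * vdd * f  # power per selected cell (its bit-line)
    return m * p_per_cell                # all m cells on the selected word-line


VDD = 3.3
p = array_active_power(m=1024, c_bl=250e-15, dv_bl=VDD / 2, vdd=VDD, f=10e6)
print(f"{p * 1e3:.1f} mW")
```

Because each factor enters linearly, halving CBL (Section 6.2.7) or m (partial activation) halves this power, whereas lowering VDD helps twice, through both ΔVBL and VDD.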
To reduce this active power, many techniques can be used; they are summarized as follows:
■ Reducing all capacitances, particularly the bit-line and word-line capacitances. As seen from Equations (6.11) and (6.12), mainly CBL should be reduced. Techniques which permit this are partial activation of the multi-divided bit-line and shared I/O [see Section 6.2.7]. To reduce the word-line capacitance, a technique such as partial activation of the multi-divided word-line can be used [see Section 6.2.8];
■ Lowering the internal VDD. This includes generating half-VDD for precharging the bit-lines and reducing the external supply voltage; and
■ Reducing the DC power required by periphery circuits. This is possible by using static CMOS decoders and the pulse-operation technique with an ATD circuit (as in SRAMs).
The data-retention power in a DRAM is mainly due to the refresh operation and to the DC power (IDC) of peripheral circuits such as the BBG, HVG, and RVG. The refresh process is performed by reading the m cells connected to each word-line and restoring them. Thus, n refresh cycles are needed for an n × m DRAM. The data-retention power can be estimated by

$$P_{DR} = \frac{P_{ac}}{f}\cdot\frac{n}{t_{refresh}} + P_x \qquad (6.13)$$

where Pac/f is the total dynamic energy per cycle (f is the operating frequency), n/trefresh is the rate at which the word-lines, each carrying m cells, must be refreshed, and Px is the AC and DC power dissipated by the other circuits, such as the VDC, BBG, RVG, HVG, and boosted level generator. To reduce the power dissipation due to the refresh mode, one obvious technique is to increase trefresh and decrease n. To reduce Px, many techniques can be used. One of them is to lower, in the data-retention mode, the operating frequency of circuits that dissipate high power during the active mode. Another is to reduce the DC current of these circuits using, for example, a dynamic concept.

Figure 6.41 Static CMOS word-line driver.
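The data-retention power can be estimated as the energy per refresh cycle, Pac/f, times the number of refresh cycles per second, n/trefresh, plus the power Px of the other circuits. This sketch assumes exactly that form; every numeric value in it (14 mW active power at 10 MHz, 2048 word-lines, a 32 ms refresh period, 50 µW for Px) is hypothetical.

```python
def retention_power(p_ac: float, f: float, n: int, t_refresh: float, p_x: float) -> float:
    """Data-retention power: (P_ac/f) * (n/t_refresh) + P_x."""
    energy_per_cycle = p_ac / f   # J per refresh cycle
    refresh_rate = n / t_refresh  # refresh cycles per second
    return energy_per_cycle * refresh_rate + p_x


# Hypothetical operating point and a 4x longer refresh period.
base = retention_power(p_ac=14e-3, f=10e6, n=2048, t_refresh=32e-3, p_x=50e-6)
slow = retention_power(p_ac=14e-3, f=10e6, n=2048, t_refresh=128e-3, p_x=50e-6)
print(f"{base * 1e6:.1f} uW -> {slow * 1e6:.1f} uW with a 4x longer refresh period")
```

Once the refresh term has been cut, Px dominates, which is why the text also targets the DC current of the generators.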
In the following sections, the circuit techniques to reduce the active and data-retention power dissipation are presented. The different circuits constituting a DRAM are also described, and their low-power issues are discussed.
6.2.5 Decoder
In a DRAM, static CMOS NAND decoders are used. Their power is reduced by using the predecoding technique, which is discussed further in Section 6.1.6 for SRAMs. Fig. 6.41 shows a static CMOS word-line driver. The boosted level, Vch, generated by an internal charge-pump circuit, is used in the output stage. When node A is high at (VDD - VT), the output inverter leaks a high DC current, because this level is lower than Vch by at least two threshold voltages, subject to the body effect. Therefore, a small PMOS transistor, P1, is used to restore node A to the Vch level. This transistor also latches the low output level (ground). The Xi signal, when selected, is normally at VDD. The unselected Xi is discharged to ground in the selected block before the row decoder selection.
6.2.6 Sense Amplifier

The main sense amplifier is the main source of DC current during the active mode. It employs the same sense amplifier discussed in Section 6.1.8 for SRAMs. Its DC current can be shut down using the ATD technique.
6.2.7 Bit-Line Capacitance Reduction

Reducing the bit-line capacitance not only reduces the power dissipation but also improves the signal-to-noise ratio of the memory cell. Two approaches make this possible:
1. Reducing the number of memory cells n per bit-line. In this case, the multi-divided bit-line technique is used.
2. Reducing the junction capacitances of connected transistors, such as the access devices. One possible solution is back-biasing the substrate containing these devices: a negative substrate voltage reduces the junction capacitance. In addition, using trench isolation for CMOS devices rather than LOCOS isolation results in an almost 50% reduction in capacitance.
Fig. 6.42 shows the principle of the multi-divided bit-line architecture for the memory array. The m × n array is now divided into m columns by k subarrays, each containing n/k word-lines. In this scheme, the bit-line capacitance CBL is reduced by dividing the bit-line into k sections, which also improves the signal-to-noise ratio of the cell. Fig. 6.43 illustrates an example of a 1-Mb DRAM [32]. The memory is divided into two parts, upper and lower. Each part is divided into N = 16 sub-arrays, so the total number of subarrays is k = 32. Two sub-bit-lines share one sense amplifier and are selected by the isolation signals ISO and /ISO. Thus, a partial activation is performed by selecting only one SA along the bit-line. The switch SW is controlled by the Y signal from the shared column decoder. This signal runs parallel to the bit-lines and uses metal-2. Thus, the I/O is shared by two sub-bit-lines. This principle results in reduced power dissipation and chip size. It has been used for many DRAM generations up to 16-Mb.
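A first-order model shows why dividing the bit-line helps. The sketch assumes, hypothetically, that CBL consists of a fixed part (sense amplifier and wiring) plus a per-cell contribution, so splitting the n cells of a column into k sub-bit-lines leaves only n/k cells on each. The values (1024 cells, 0.2 fF per cell, 50 fF fixed) are illustrative, chosen so that k = 1 lands near the 250 fF typical value.

```python
def sub_bitline_cap(n: int, k: int, c_per_cell: float, c_fixed: float) -> float:
    """Capacitance of one sub-bit-line when the n cells of a column are split into k sections."""
    if n % k:
        raise ValueError("k must divide n")
    return c_fixed + (n // k) * c_per_cell


C_S = 30e-15  # storage capacitance from the text
for k in (1, 2, 4, 8):
    c_bl = sub_bitline_cap(n=1024, k=k, c_per_cell=0.2e-15, c_fixed=50e-15)
    print(f"k={k}: C_BL={c_bl * 1e15:.0f} fF, R={c_bl / C_S:.1f}")
```

The fixed part sets a floor, so the benefit shrinks as k grows while each extra section needs sense-amplifier area; sharing one SA between two sub-bit-lines, as in Fig. 6.43, contains that cost.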
Figure 6.43 Multi-divided bit-line architecture with shared SA, I/O, and column decoder [21].

6.2.8 Multi-Divided Word-Line
Fig. 6.44 shows the hierarchical word-line structure proposed for a 256-Mb DRAM [26]. This scheme resembles the one used in SRAMs. The DRAM cell array is divided into several blocks, and each block is itself divided into subarrays. The Sub-Word-Line (SWL) circuitry is embedded in the subarray. Only one SWL is activated by the Main Word-Line (MWL) and the select signal; this SWL circuitry is shared by two subarrays, as shown in Fig. 6.44. Thus, only two cell subarrays are activated, which represents a very small portion of the total cell array. In the case of the 256-Mb DRAM, the active cell array size is 1/1024 of the total. This structure results in reduced active current and ground bounce.
6.2.9 Half-Voltage Generator

One efficient technique to reduce the memory-array operating current is half-VDD bit-line precharge [33, 34]. During the sensing operation, one bit-line switches from VDD/2 to VDD and the other switches to ground. Compared with the full-VDD precharge case, this almost halves both the power and the peak current. The reduction in peak current also suppresses noise. In addition, the precharge time is reduced and the cycle time is shortened. This precharging technique has been used since the 1-Mb DRAM generation.
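The "almost half" saving follows from charge accounting. The sketch below counts only the energy drawn from VDD per read cycle for one bit-line pair, assuming ideal switches, equal capacitance on BL and /BL, and an equalization step that simply shorts the two lines without drawing supply charge.

```python
def supply_energy_full_vdd(c: float, vdd: float) -> float:
    # Precharged to VDD; sensing discharges one line to 0 V, so the
    # next precharge must draw C*VDD of charge from the VDD supply.
    return c * vdd * vdd


def supply_energy_half_vdd(c: float, vdd: float) -> float:
    # Precharged to VDD/2; sensing pulls one line up to VDD, drawing
    # C*VDD/2 of charge from VDD, while the other line falls to 0 V.
    # Equalization then shorts the lines back to VDD/2 at no supply cost.
    return c * (vdd / 2) * vdd


C_BL, VDD = 250e-15, 3.3
e_full = supply_energy_full_vdd(C_BL, VDD)
e_half = supply_energy_half_vdd(C_BL, VDD)
print(f"full: {e_full * 1e12:.2f} pJ, half: {e_half * 1e12:.2f} pJ, ratio {e_half / e_full:.2f}")
```

The ideal model gives exactly one half; in practice the saving is "almost" half because of effects the model ignores, such as the power of the half-voltage generator itself.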
A simple circuit that generates this half-VDD level is shown in Fig. 6.45. The HVG circuit is composed of two stages. The first is a bias generator, which generates two voltage levels, (VDD/2 + VT) and (VDD/2 - VT). The second is a push-pull output stage, which generates the level VDD/2 distributed to the memory array. The load capacitance seen by the push-pull output stage is huge; a typical value is a few tens of nF. A typical response time when the circuit is powered up is a few tens of µs at a 3.3 V power supply voltage for a 16-Mb DRAM. This HVG circuit has many disadvantages, such as ... duty ratio of the HVGE signal in the data-retention mode. To solve the other problems cited, an HVG circuit was proposed in [28], but this circuit dissipates a DC current.
6.2.10 Back-Bias Generator
The back-bias voltage VBB is used in a DRAM to reduce the subthreshold current and the junction capacitances, to improve device isolation, to enhance latch-up immunity, and to protect the circuit against voltage undershoots of the input signals. This voltage can also be used to compensate for some device-parameter variations.
For NMOS devices in a P-well (substrate), a negative VBB is generated by pumping electrons out of the ground node and into the substrate. A typical VBB generator configuration, known as a charge pump, is shown in Fig. 6.47. Node A oscillates between VT and (VT - VDD). During the high side of the cycle, node A must be at least at VT to pump the charge from ground. On the low side of the cycle, node A must be a VT drop below VBB. The output node VBB stabilizes at a voltage level equal to (2VT - VDD), since the load capacitance is huge. The clock (clk) is generated by a ring oscillator with N stages (N is an odd number). The oscillation frequency f is approximately 1/(2N·td), where td is the delay of one inverter. The buffer is needed to drive the huge Cpump capacitance. The average current pumped out of the substrate is approximated by

$$I_{pump} = (V_{BB} - V_{BB,min})\,C_{pump}\,f \qquad (6.14)$$

where VBBmin is the back-bias voltage when no current is pumped, equal to (2VT - VDD) (optimum value). During start-up, a large current, equal to (-VBBmin·Cpump·f), is pumped.
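The ring-oscillator frequency and Eq. (6.14) can be combined in a short sketch. All numeric values (VT = 0.7 V, N = 5 stages, td = 2 ns, Cpump = 20 pF) are hypothetical, chosen only to exercise the formulas for the NMOS pump with VBBmin = 2VT - VDD.

```python
def ring_osc_freq(n_stages: int, t_d: float) -> float:
    """Ring oscillator frequency f ~ 1/(2*N*t_d); N must be odd."""
    if n_stages % 2 == 0:
        raise ValueError("a ring oscillator needs an odd number of stages")
    return 1.0 / (2 * n_stages * t_d)


def pump_current(v_bb: float, v_bb_min: float, c_pump: float, f: float) -> float:
    """Average pumped current, Eq. (6.14): (V_BB - V_BBmin) * C_pump * f."""
    return (v_bb - v_bb_min) * c_pump * f


VDD, VT = 3.3, 0.7           # hypothetical supply and threshold
v_bb_min = 2 * VT - VDD      # no-load limit of the NMOS pump
f = ring_osc_freq(5, 2e-9)   # hypothetical 5-stage ring, 2 ns per stage
for v_bb in (0.0, -1.0, v_bb_min):  # start-up, intermediate, steady state
    i = pump_current(v_bb, v_bb_min, 20e-12, f)
    print(f"V_BB={v_bb:+.2f} V -> I_pump={i * 1e3:.2f} mA")
```

The pumped current falls linearly to zero as VBB approaches VBBmin, so the start-up current is large but the final approach to the limit is slow.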
Another, PMOS version of the charge-pump circuit is shown in Fig. 6.48. Since the gate voltage of P1 only reaches -VDD, VBB is pumped to a limit of (VT - VDD). For VDD = 5 V, the NMOS and PMOS charge-pump circuits generate typical voltages of -3 V and -4 V, respectively. For a 3.3 V power supply, the PMOS version can still generate a negative voltage of -2.5 V, which is more negative than the voltage generated by the NMOS version at this supply.
Fig. 6.49 shows a pumping circuit that avoids the VT losses and hence is suitable for low-voltage operation [35]. When the clock (clk) is low, the voltage of node A reaches (|VTp| - VDD), and the PMOS transistor P1 clamps the voltage of node B to the ground level. The VBB level is, in that case, (|VTp| - VDD - VTn). When clk goes to a high level, the voltage of A rises to |VTp|, and the voltage of B, by capacitive coupling, becomes -VDD, causing VBB to be equal to -VDD. Therefore, VBB will be

$$V_{BB} = \max\{-V_{DD},\; |V_{Tp}| - V_{DD} - V_{Tn}\} \qquad (6.15)$$

This circuit needs a special triple-well structure to avoid minority-carrier injection by the NMOS transistor N1, as discussed in [35].
To reduce the power dissipation of the BBG circuit while the DRAM is not in the active mode, the BBG can be operated at a low frequency. Fig. 6.50 shows a simplified circuit diagram of the BBG circuit for low-power operation [36]. In the normal mode, the ring oscillator runs continuously to retain the VBB level. In the data-retention mode, the BBG Enable (BBGE) signal is clocked with a low duty ratio, so the ring oscillator operates at a low frequency to refresh the pumping circuit.
6.2.11 Boosted Voltage Generator

A boosted-level circuit is needed to generate a voltage level above VDD by at least VT. The word-line driver is powered with this voltage, Vch. A simple boosted voltage generator is shown in Fig. 6.51(a). It uses the charge-pump circuit technique discussed in Section 6.2.10. The output of this circuit switches between (VDD - VT) and (2VDD - VT). The clock φ is generated by a simple ring oscillator. Another circuit, which switches between VDD and 2VDD, is shown in Fig. 6.51(b). It uses two non-overlapping clock phases. This second configuration uses feedback NMOS devices, N1 and N2, to eliminate the threshold-voltage loss and boost the voltage to a higher level. This circuit is not sensitive to power supply voltage reduction.
The boosted level cannot be used directly to drive the load. Thus a pass transistor is needed to isolate the switching boosted level from the load, as shown in the example circuit of Fig. 6.52(a) [28]. The charge-pump circuit CP1 generates at node A a boosted signal switching between VDD and 2VDD. To control the pass transistor N, two pump circuits, CP2 and CP3, and an inverter INV are needed. The pump circuit CP2 generates at node B a signal switching between VDD and 3VDD and uses the boosted voltage Vch. The other pump circuit, CP3, controls the inverter INV. The output of this inverter (node D) switches between VDD and 3VDD. The output of this circuit is Vch = 2VDD, and it is stable since ... is large. The voltage waveforms are shown in Fig. 6.52(b). This circuit is insensitive to VDD reduction and can work down to a sub-1 V power supply.
6.2.12 Self-Refresh Technique

Standard DRAMs require an external DRAM controller to control the refresh process of the memory cells. The charge stored in the memory cell decreases because of the leakage current, and it does so at a higher rate at high temperature. The refresh time (period), trefresh, is determined by the time during which the stored charge keeps enough margin against leakage at high temperature. This means that trefresh is shorter than what would be needed at room temperature. One way to increase this time, and hence reduce the data-retention power dissipation, is to control the refresh period as a function of the chip temperature. Fig. 6.53 shows an on-chip self-refresh control circuit with a memory-cell leakage monitoring scheme. A refresh clock, φrefresh, is generated automatically with a period of trefresh. The monitor cell, which has a leakage current Ileak, controls the refresh period. Initially, node A is high, the NMOS transistor N is OFF, and node B is low. When the charge on node A has decreased to the point that the PMOS transistor P turns ON, node B rises. Then a high pulse is generated at node C, which in turn charges node A back up to the high level.
6.2.13 Low-Voltage DRAM Operation and Circuitry

Low-voltage operation is required to reduce the power dissipation and to ensure the reliability of deep-submicrometer MOS devices in future DRAMs. The power supply voltage may be as low as 1 V to meet the requirements of battery operation in portable applications. To get high performance in a high-density DRAM at a low supply voltage, the threshold voltage of MOS devices should be reduced. This increases the subthreshold current, so circuit techniques are needed to reduce the standby current. In this section, circuit techniques to reduce the subthreshold current in the DRAM array circuits (equalizer, precharge, and sense amplifier), in the memory-cell access, and in the word-line driver are described.
6.2.13.1 DRAM Array Circuits

Fig. 6.54 shows the conventional DRAM array circuit with the half-VDD bit-line precharging technique, already discussed in Section 6.2.3. When VDD is scaled down, this half-VDD scheme causes several problems for the CMOS latch-type SA and the equalizer. For example, the following problem can arise for the NMOS transistor, NSA, of the N-type SA (N-SA). When the signal φn is pulled down during the readout operation, sensing starts when the voltage VGS1 [see Fig. 6.54] becomes larger than the VT of the SA's NMOS transistor. However, if VDD/2 is low enough to approach VT, the sensing operation is very slow because of the low value of (VGS - VT). Note that VT is subject to the body effect while the common source of the N-SA is falling to ground.
Another problem arises during the equalization period. The equalization is carried out by the NMOS device NEQ when the signal φEQ is activated. In the final stage of equalization, the drive current of the NMOS equalizer decreases drastically, particularly when VDD/2 is not higher than VT. Note that the threshold voltage of the equalizer is also subject to the body effect.
One solution to these problems is the use of low-VT devices in the DRAM array for the CMOS SA, precharge, and equalizing circuits. However, this leads to a drastic increase in the leakage current during the active period. The leakage current paths are shown in Fig. 6.55. To significantly reduce this leakage current, the Well-Synchronized Sensing and Equalizing (WSSE) concept was proposed [37]. It is based on the following two ideas:
■ The voltage levels of the transistor sources and of the well are equalized during the sensing, restoring, and equalizing periods. This eliminates the body effect.
■ A negative bias, VBB, is applied to the P-well, and a positive bias to the N-well, during the active period. The leakage current is thus reduced, because VT increases due to the body effect.
Fig. 6.56(a) shows the WSSE circuits using a triple-well structure. The N-well and P-well control voltages, VWN and VWP, respectively, are controlled by special logic. Fig. 6.56(b) illustrates the voltage waveforms. Before the word-line is activated, the bit-lines and the signals φn and φp are equalized to half-VDD. The P-well and N-well levels are precharged to (VDD/2 - VTn) and (VDD/2 + |VTp|), respectively. These voltage levels avoid any forward biasing of the drain-well junctions during the initial time after WL activation, when one bit-line has already moved away from VDD/2. In the sensing and restoring period, the signals φn and VWP are pulled down while the signals φp and VWN are pulled up; each pair is synchronized. After this period, the bit-lines BL and /BL are in the full-swing condition. Then the level VWP is pulled below GND to VBB and isolated from φn, while the level VWN is pulled above VDD to a positive well bias and isolated from φp.
6.2.13.2 Memory Cell

First, let us discuss the requirements for the memory cell, particularly at low voltage. Fig. 6.57 shows the memory cell in the restoring operation. To restore the high level from the bit-line to the storage capacitor, the word-line must be boosted to a level Vch. This level must satisfy

$$V_{ch} > V_{DD} + V_T(V_{DD}) + \alpha \qquad (6.16)$$

where α is the voltage margin and VT(VDD) is the threshold voltage of the access NMOS transistor when its source is at VDD. Note that the NMOS device then has (VDD + |VBB|) as an effective back-bias voltage. For transistor reliability, Vch should be as small as possible, which requires VT(VDD) to be small. This threshold voltage is given by

$$V_T(V_{DD}) = V_{T0} + \gamma\left(\sqrt{2\phi_F + V_{DD} + |V_{BB}|} - \sqrt{2\phi_F}\right) \qquad (6.17)$$

where VT0 is the threshold voltage at zero source and substrate bias, γ is the body-effect coefficient, and φF is the Fermi potential.
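The word-line boost requirement of (6.16) and the body-effect expression of (6.17) can be explored numerically. The sketch below uses the standard body-effect form with source-to-substrate bias VDD + |VBB|; the device values (VT0 = 0.5 V, γ = 0.4 V^1/2, 2φF = 0.7 V, α = 0.3 V, |VBB| = 1 V) are hypothetical.

```python
import math


def vt_body(vt0: float, gamma: float, two_phi_f: float, v_sb: float) -> float:
    """Threshold with body effect: VT0 + gamma*(sqrt(2phiF + VSB) - sqrt(2phiF))."""
    return vt0 + gamma * (math.sqrt(two_phi_f + v_sb) - math.sqrt(two_phi_f))


def vch_min(vdd: float, vt_at_vdd: float, alpha: float) -> float:
    """Minimum word-line boost level: VDD + VT(VDD) + alpha."""
    return vdd + vt_at_vdd + alpha


VT0, GAMMA, TWO_PHI_F, ALPHA = 0.5, 0.4, 0.7, 0.3  # hypothetical device values
for vdd, vbb in ((3.3, 1.0), (1.5, 1.0)):
    vt = vt_body(VT0, GAMMA, TWO_PHI_F, vdd + vbb)  # source at VDD, substrate at -|VBB|
    print(f"VDD={vdd} V: VT(VDD)={vt:.2f} V, Vch > {vch_min(vdd, vt, ALPHA):.2f} V")
```

In this example, lowering VDD shrinks VT(VDD) more slowly than VDD itself because of the square-root body-effect term, so the boost ratio Vch/VDD grows at low voltage.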
Fig. 6.58 shows the unselected memory cell in long-cycle operation. The bit-line has completed the sensing operation and is at ground level (GND). In this situation, the memory cell is exposed to the worst-case leakage condition: the charge stored in the cell leaks rapidly through the subthreshold current. This situation sets the lower limit of the threshold voltage. Note that the access transistor of the memory cell has |VBB| as its back-bias voltage. The threshold voltage in this mode is given by

$$V_T(0) = V_{T0} + \gamma\left(\sqrt{2\phi_F + |V_{BB}|} - \sqrt{2\phi_F}\right) \qquad (6.18)$$
To meet these two requirements on the threshold voltage, the substrate should provide a sufficient back-bias voltage while the body effect is suppressed. For example, when the internal supply voltage is VDD = 1.5 V, VBB is set to -1 V. The VT(1.5 V)* is 1 V, VT(0) is 0.75 V, and S = 90 mV/decade. Therefore, the leakage current of a transistor with W = 1 µm is 10 fA. In this case, Vch must be larger than (VDD + VT(VDD)), which is 3 V.

*Extrapolated threshold voltage.
When the VT of the memory cell is reduced, the leakage current increases drastically. The Boosted Sense Ground (BSG) concept [38] was proposed to shut down the subthreshold current in the memory-cell access transistor. This is achieved by slightly boosting the low-level voltage of the bit-line. This level is called the BSG level and is set at 0.5 V. During a long-cycle operation, the gate-source voltage of an unselected cell is negative (-0.5 V), so the subthreshold current is reduced by 6 orders of magnitude (for S = 80 mV/decade). Fig. 6.59 shows the BSG circuit applied to a memory cell. The BSG line is common to all N-channel sense amplifiers. The BSG level is generated by a circuit similar to the VDC circuit [see Section 6.3]. In the active mode, the differential amplifier and N1 are activated, and the voltage of the sense ground becomes the BSG level. The N2 transistor has a large width and is activated by the signal SE at the beginning of the sensing period to suppress an unnecessary rise in the BSG level caused by the sensing current. In the standby mode, the differential amplifier, N1, and N2 are made inactive to reduce the standby current, and the BSG level is clamped to the threshold voltage of a clamping NMOS transistor. Note that the boosted level, Vch, is reduced compared with the conventional scheme because VT is reduced.
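The six-orders-of-magnitude figure follows from the subthreshold swing S: every S volts of gate-source reduction cuts the current by one decade. A minimal check, assuming a purely exponential subthreshold characteristic; the function names are illustrative:

```python
def subthreshold_decades(delta_vgs: float, s_v_per_decade: float) -> float:
    """Decades of subthreshold-current reduction for a gate-source drop delta_vgs."""
    return delta_vgs / s_v_per_decade


def leakage_ratio(delta_vgs: float, s_v_per_decade: float) -> float:
    """I_after / I_before = 10**(-delta_vgs / S) in the exponential region."""
    return 10 ** (-subthreshold_decades(delta_vgs, s_v_per_decade))


d = subthreshold_decades(0.5, 0.080)  # BSG level 0.5 V, S = 80 mV/decade
print(f"{d:.2f} decades, ratio {leakage_ratio(0.5, 0.080):.1e}")
```

A 0.5 V negative gate-source bias at 80 mV/decade gives 6.25 decades, consistent with the "6 orders of magnitude" quoted above.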
6.2.13.3 Word-Line Driver

Scaling the threshold voltage down increases the subthreshold current of a DRAM, particularly in iterative circuits such as word-line drivers or decoders. If the DRAM is divided into k blocks and each block has n drivers, then the total number of word drivers is k·n. Fig. 6.60 shows an example of DRAM drivers. During the active mode, one driver out of the k·n drivers is selected by the row decoder, and its word-line is at the boosted level Vch generated by the internal charge-pump circuit.
When the threshold voltage is low, the subthreshold current of each driver becomes significant. The total subthreshold current of the drivers in a DRAM is then

$$I_{st,dr} = k \cdot n \cdot I_{st} \qquad (6.19)$$

where Ist is the subthreshold current of the NMOS and PMOS transistors (assumed to be the same). For a high-capacity DRAM, the current Ist,dr would be huge. For example, a multi-gigabit DRAM has about 1 million drivers; if each driver has a subthreshold current of about 10 nA at room temperature, the total subthreshold current is 10 mA. At 75 °C, this current can reach hundreds of mA. Such a high DC current destroys the Vch level, because the charge-pump circuit cannot supply it. Note that this current should always be evaluated in the worst case: maximum temperature and the lowest value of VT. In the standby mode, all the drivers are turned OFF, yet the current Ist,dr is still the same.

Figure 6.59 Boosted Sense Ground (BSG) circuit.
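Eq. (6.19) can be evaluated directly. The sketch reproduces the order of magnitude of the example above; splitting "about 1 million drivers" into 1024 blocks of 1024 drivers, and the 1 mA charge-pump budget used for comparison, are hypothetical values.

```python
def total_driver_leakage(k: int, n: int, i_st: float) -> float:
    """Total word-driver subthreshold current, Eq. (6.19): k * n * I_st."""
    return k * n * i_st


# Hypothetical split of "about 1 million drivers": 1024 blocks x 1024 drivers.
i_total = total_driver_leakage(k=1024, n=1024, i_st=10e-9)
PUMP_BUDGET = 1e-3  # hypothetical DC current the Vch charge pump can supply, A
print(f"{i_total * 1e3:.1f} mA,",
      "exceeds pump budget" if i_total > PUMP_BUDGET else "within pump budget")
```

Because the current scales with the total driver count and not with the number of active drivers, turning the drivers OFF in standby does not help; this motivates the scheme described next.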
To solve this problem, the Self-Reverse-Biasing (SRB) scheme can be used [24]. This concept has already been discussed in Section 4.10 [Chapter 4]. Fig. 6.61 shows the application of the SRB scheme to word-line drivers. During the active mode, the control signal φ is low and node SL is equal to Vch; only one word-line is selected. When φ goes high (standby mode), the PMOS device PS limits the subthreshold current. In this mode, all drivers are OFF, even the selected one. Fig. 6.62 shows the technique to turn off the selected driver in standby mode: when the corresponding control signal is low, node Ai is high, so the selected word driver output is low.
One problem with the SRB scheme is that during the active mode, after the selected word-line driver is activated, all the other drivers are leaking, which substantially increases the active current. This problem is solved by the partial activation of a hierarchical power-line scheme [39]. Fig. 6.63 shows the principle of the 2-D selection scheme. In this scheme, the array of k blocks by n drivers is divided into k sub-blocks in columns and l sub-blocks in rows. The total number of sub-blocks, each containing a set of drivers, is k × l. During the active mode, only one sub-block is activated, so the subthreshold current in the active mode is drastically reduced.
6.3 ON-CHIP VOLTAGE DOWN CONVERTER

Chip makers prefer to scale down VDD to enhance device reliability, while users prefer to keep the same power supply voltage and dislike frequent changes. Reducing VDD is also important for achieving low-power operation. The strategy to meet these contradictory requirements is to use an on-chip Voltage Down Converter (VDC). A VDC can be used to convert the old 5 V power supply standard to 3.3 V to power CMOS circuits in 0.5 µm and sub-0.5 µm technologies. For state-of-the-art 0.25 µm CMOS technology, the power supply voltage must be 2.5 V. However, the new standard is becoming 3.3 V and is likely to stay that way for many years. Thus a 3.3-to-2.5 V VDC is required.
On-chip VDCs are used in DRAMs as well as in SRAMs, ASICs, and digital processors. They are employed in commercial 16-Mb DRAMs to reduce the external 5 V to an internal voltage of 3.3 V. In SRAMs, particularly commercial ones, they have not been used as commonly as in DRAMs: SRAMs can operate over a wide power supply range and already have a data-retention current low enough for battery-operated applications. In this section, we discuss the VDC circuit techniques for DRAMs, which are basically the same as for SRAMs and other circuits.
Numerous papers have reported designs of the VDC circuit for a DRAM [32, 40, 41, 42, 43, 44, 45] and for an SRAM [46]. Fig. 6.64 shows one approach using a VDC to reduce the internal voltage for a DRAM. The memory cell array and the periphery circuits are powered from the internal supply voltage, while the I/O buffers are powered with the external voltage to maintain compatibility. In this arrangement, however, the VDC must remain stable when supplying a large current to the periphery and the memory array. When the VDC is used for battery-operated applications, the standby current should be less than 1 μA over a wide range of temperature (0-70 °C).

Figure 6.62 Detail of word-driver with voltage shifter.
Fig. 6.65 shows a schematic of the VDC structure for a DRAM, used to convert 5 V to 3.3 V. It is composed of a Reference Voltage Generator (RVG), a driver circuit and a time-dependent load. The buffer circuit consists of a differential amplifier [Fig. 6.66] and a common-source drive PMOS transistor $P_b$. The load current peaks at more than 100 mA within 10-30 ns for the memory array and at more than 100 mA within a few ns for the periphery circuit. To deliver such a large current, the width of the output-stage PMOS $P_b$ must be large. Moreover, when the output current changes rapidly, the output voltage $V_{DD}$ drops by $\Delta V_{DD}$. To minimize $\Delta V_{DD}$, the gate control voltage $V_G$ has to change quickly, which is possible by increasing the tail current of the differential amplifier. A separate current source is needed to clamp the output voltage $V_{DD}$ when the load current becomes almost zero.
Figure 6.66 Schematic of the differential amplifier.
A VDC circuit is one of the keys to achieving a DRAM whose data-retention current is low enough for battery-based applications. The low-power requirements are the following:

- The standby current must be less than 1 μA over a wide range of temperature, process and power supply voltage variations; and
- The output impedance of the VDC should be low.
6.3.1 Driver Design Issues

The internal voltage generated by the VDC can fluctuate for several reasons: DC changes in the reference voltage due to process and temperature variations, and transient variations caused by noise in the external power supply and by the load current. The variation of the internal voltage with respect to the reference voltage should be less than 3%. The variation with respect to the load has to be less than 10%, and with respect to the power supply less than 1%.
The stability of this circuit is essential for the operation of the VDC. To study the stability, an ac small-signal analysis is carried out. Fig. 6.67 shows the simplified equivalent circuit obtained using MOS small-signal techniques [47]. The gate capacitance of the output PMOS, $C_G$, is large and is taken into account. $g_{m1}$ and $g_{m2}$ are the transconductances of the differential amplifier and the output stage, respectively, and $r_1$ and $r_2$ are their respective equivalent output resistances. $C_L$ is the output load capacitance, composed of the wire capacitance $C_W$ (typically 100 pF) and the switched capacitance of the memory core $C_M$ (typically 1200 pF).

The frequency response of this circuit is expressed by

$$ A(s) = \frac{g_{m1} r_1 \, g_{m2} r_2}{(1 + s C_G r_1)(1 + s C_L r_2)} \tag{6.20} $$

The circuit has two poles: $p_1 = 1/(C_G r_1)$ for the differential amplifier and $p_2 = 1/(C_L r_2)$ for the output stage. The two poles must be sufficiently separated from each other to ensure a good phase margin [48]. For a DRAM application, the pole $p_2$ varies drastically because of the load variation. Thus, the circuit can fail to ensure a sufficient phase margin and hence can generate ringing or oscillation. Therefore, phase compensation has to be applied. One possible compensation technique, called Miller compensation, is shown in Fig. 6.68(a). The compensation capacitor $C_C$ is connected between the input and the output of the second stage. It shifts the pole $p_1$ towards a lower frequency $p_1'$, as shown in Fig. 6.68(b). Thus, the phase margin is improved.
The stability condition is defined at the point of 0 dB loop gain, where the phase margin must be larger than 45 degrees. Using small-signal analysis with the compensation capacitor $C_C$, this condition can be extracted; the required capacitor is a function of $g_{m2}$, $g_{m1}$, $C_L$ and $C_G$. To determine it, $g_{m2}$ has to be known from a large-signal analysis: the PMOS driver $P_b$ has to be sized to satisfy the condition on $\Delta V_{DD}/V_{DD}$ (less than 10%) under the transient load current variation, and $g_{m2}$ can then be determined from the size of $P_b$. For a 16-Mb DRAM, the width of the output PMOS $P_b$ can be as high as 30,000 μm, with $C_C$ equal to 200 pF. These values are for generating a 3.3 V internal power supply from 5 V.
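The stability argument can be checked numerically. The sketch below is a minimal Python model, not the authors' design procedure: it evaluates the phase margin of the two-pole loop gain of Eq. (6.20) and compares it with the poles obtained after Miller compensation, using the standard textbook pole-splitting approximation (which ignores the right-half-plane zero $g_{m2}/C_C$). Only $C_L$ (100 pF + 1200 pF, the typical values given in the text) and $C_C$ = 200 pF come from the text; the transconductances, resistances and $C_G$ are hypothetical values chosen for illustration.

```python
import math


def phase_margin(a0: float, p1: float, p2: float) -> float:
    """Phase margin in degrees of A(s) = a0 / ((1 + s/p1) * (1 + s/p2)).

    p1 and p2 are pole frequencies in rad/s and a0 > 1 is the DC loop gain.
    The unity-gain frequency is located by bisection on a log frequency axis.
    """
    def magnitude(w: float) -> float:
        return a0 / math.sqrt((1 + (w / p1) ** 2) * (1 + (w / p2) ** 2))

    lo, hi = 1e-3 * min(p1, p2), 1e3 * a0 * max(p1, p2)
    for _ in range(200):
        mid = math.sqrt(lo * hi)  # geometric midpoint of the bracket
        if magnitude(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    w_u = math.sqrt(lo * hi)
    return 180.0 - math.degrees(math.atan(w_u / p1) + math.atan(w_u / p2))


def miller_poles(gm2: float, r1: float, r2: float,
                 c_g: float, c_l: float, c_c: float) -> tuple[float, float]:
    """Approximate split poles (rad/s) with a Miller capacitor c_c.

    Textbook two-stage approximation; the RHP zero gm2 / c_c is ignored.
    """
    p1 = 1.0 / (gm2 * r2 * r1 * c_c)
    p2 = gm2 * c_c / (c_g * c_l + c_c * (c_g + c_l))
    return p1, p2


# Hypothetical small-signal values (NOT from the text), except c_l and c_c:
gm1, r1 = 1e-3, 1e6         # differential amplifier
gm2, r2 = 1.0, 100.0        # output PMOS stage
c_g = 50e-12                # gate capacitance of P_b
c_l = 100e-12 + 1200e-12    # C_W + C_M, typical values given in the text
c_c = 200e-12               # compensation capacitor quoted for a 16-Mb DRAM
a0 = gm1 * r1 * gm2 * r2

pm_open = phase_margin(a0, 1 / (c_g * r1), 1 / (c_l * r2))
pm_comp = phase_margin(a0, *miller_poles(gm2, r1, r2, c_g, c_l, c_c))
print(f"uncompensated PM = {pm_open:.1f} deg, Miller-compensated PM = {pm_comp:.1f} deg")
```

With these illustrative numbers the uncompensated loop has only a few degrees of phase margin, while the compensated loop comfortably exceeds the 45-degree criterion; a real design would repeat the check over the full range of $C_L$.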
The tail current of the differential amplifier can be high (a few mA) in active mode. The driver can be deactivated in standby mode by the Chip Select (CS) signal so that it consumes only a very small current. In this case, the internal voltage can be supplied by a low-power voltage follower [46]. The voltage follower has the same configuration as the driver, but its tail current is in the sub-μA range.
6.3.2 Reference Voltage Generator

The Reference Voltage Generator (RVG) must provide high accuracy over wide variations of $V_{DD}$, process, and temperature. So far, RVGs have been based either on the band-gap reference or on the threshold voltage generator. The former consumes a DC current which is not low enough for low-power applications. The latter is more suitable for a CMOS technology.
Fig. 6.69(a) shows a PMOS $V_T$-difference generator with an output voltage $\Delta V_T = |V_{TP1}| - |V_{TP2}|$ ($V_{TP1} < V_{TP2} < 0$). The equivalent circuit is shown in Fig. 6.69(b). This circuit needs a PMOS device with a high threshold voltage. A typical value for the threshold voltage difference is 1 V. PMOS transistors are chosen for the threshold voltage difference generator because they are in N-wells, and therefore the difference is independent of the back-bias ($V_{BB}$). The circuit of Fig. 6.69(a) does not suffer much from $V_{BB}$ bounce. The temperature dependency of the $V_T$ difference is expressed by [49]
(6.21)
where $N_{A1}$ and $N_{A2}$ are the surface impurity concentrations of $P_1$ and $P_2$, respectively. For a temperature-stable design, the concentration ratio $N_{A1}/N_{A2}$, and therefore the threshold voltage difference, should not be excessively large. A typical value of the temperature dependency is 0.4 mV/°C, which is small for the VDC circuit.
Since $\Delta V_T$ is around 1 V, the circuit of Fig. 6.70 is used to convert this difference to the required internal supply voltage. The voltage-up converter amplifies $\Delta V_T$ to:

$$ V_{out} = \Delta V_T \left(1 + \frac{R_2}{R_1}\right) \tag{6.22} $$
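Eq. (6.22) and the quoted temperature coefficient lend themselves to a quick design calculation. The Python sketch below picks the resistor ratio for a target output and estimates the output drift, under the simplifying assumption (ours, not the text's) that the resistor ratio is temperature independent, so the $\Delta V_T$ drift is amplified by the same gain $1 + R_2/R_1$. The 3.3 V target and the resistor values are illustrative.

```python
def voltage_up(delta_vt: float, r1: float, r2: float) -> float:
    """Eq. (6.22): V_out = Delta V_T * (1 + R2/R1)."""
    return delta_vt * (1.0 + r2 / r1)


def ratio_for_target(delta_vt: float, v_target: float) -> float:
    """Resistor ratio R2/R1 needed to amplify delta_vt to v_target."""
    return v_target / delta_vt - 1.0


def output_drift(tc_delta_vt: float, r1: float, r2: float, t_span: float) -> float:
    """Output drift in volts over t_span degrees C, assuming a constant R2/R1."""
    return tc_delta_vt * (1.0 + r2 / r1) * t_span


dvt = 1.0                           # Delta V_T "around 1 V"
k = ratio_for_target(dvt, 3.3)      # illustrative 3.3 V target
r1 = 100e3                          # of the order of 100 kOhm, as in the text
r2 = k * r1
print(f"R2/R1 = {k:.2f}, V_out = {voltage_up(dvt, r1, r2):.3f} V")
print(f"drift over 0-70 C: {output_drift(0.4e-3, r1, r2, 70.0) * 1e3:.1f} mV")
```

With these numbers the drift is roughly 92 mV, about 2.8% of 3.3 V, which gives a feel for how the 0.4 mV/°C coefficient translates to the output once it is amplified.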
The mismatch between the two PMOS devices $P_1$ and $P_2$ of Fig. 6.69 can be minimized by using large channel widths and lengths. Still, the deviation of $V_T$ due to the fabrication process has to be eliminated. This can be done by using a fuse trimming technique to control the ratio of the resistors $R_1$ and $R_2$. The total current consumed by this RVG circuit is

$$ I_T = I_1 + I_2 + I_3 \tag{6.23} $$

where $I_1$ is the current consumed by the voltage regulator [see Fig. 6.69(a)], $I_2$ is the current of the differential amplifier, and $I_3 = V_{out}/(R_1 + R_2)$ is the current of the output stage. $I_1$ can be made smaller than 1 μA; however, $I_2$ and $I_3$ cannot be made smaller, particularly $I_3$. The resistors are implemented, for example, using doped polysilicon. Typical values of the resistances are of the order of 100 kΩ. They cannot be increased excessively, otherwise the area of the RVG becomes significantly large. Moreover, substrate noise can affect the reference voltage through the coupling capacitances of the resistors. The total current of this type of RVG is of the order of a few to tens of μA.
To reduce the current of the RVG to the sub-μA range for battery-operated DRAMs, the concept of a dynamic RVG can be used [50], as shown in Fig. 6.71. A PMOS transistor with a low $|V_T|$ is used. During the sampling period ($\phi_1$ is high), all switches $S_1$-$S_4$ are closed. The threshold voltage difference $\Delta V_T$ between the two PMOS devices $P_1$ and $P_2$ appears across the resistor $R_R$. If the transistor dimensions of the pairs $P_1$ and $P_2$, and $N_1$ and $N_2$, are identical, the reference current is given by

$$ I_R = \frac{\Delta V_T}{R_R} \tag{6.24} $$

This current is mirrored to the output node. If the dimension of the output mirror transistor is identical to that of $P_2$, the output voltage $V_{out}$ is given by

$$ V_{out} = \Delta V_T \, \frac{R_L}{R_R} \tag{6.25} $$

This shows that the reference voltage can be adjusted to any voltage. Moreover, with a trimming technique, $V_{out}$ can be adjusted against process variation effects ($\Delta V_T$ variation). The output voltage is sampled on the hold capacitor $C_H$. When $\phi_1$ is low, the circuit is in hold mode. Clock $\phi_2$ is delayed with respect to clock $\phi_1$ to minimize fluctuation of the output voltage. These clocks are generated from the self-refresh clock circuit in a DRAM. The circuit consumes a DC current only while $\phi_1$ is applied. The average current consumed by this circuit is

$$ I_{av} = 3 I_R \, \eta_{\phi} = 3 \left(\frac{\Delta V_T}{R_R}\right) \eta_{\phi} \tag{6.26} $$

where $\eta_{\phi}$ is the duty ratio of $\phi_1$. The current of this circuit can be reduced to the sub-μA range by controlling the duty ratio. For example, to generate a reference voltage of 2.4 V from an external power supply voltage of 3.3 V, $R_R$ and $R_L$ are 9 kΩ and 72 kΩ, respectively, and $\Delta V_T$ has a typical value of 0.3 V. The total DC current is then 100 μA, so with a duty ratio lower than 1/100 the average current can be reduced below 1 μA. It can easily be shown that this circuit has a low sensitivity to power supply voltage and temperature variations.
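The design example above can be reproduced in a few lines. This Python sketch evaluates Eqs. (6.24)-(6.26) with $\Delta V_T$ = 0.3 V, $R_R$ = 9 kΩ and $R_L$ = 72 kΩ (the value that yields 2.4 V from Eq. (6.25)); the factor of 3 is taken directly from Eq. (6.26), and the function names and the 1/200 duty ratio are illustrative assumptions.

```python
def dynamic_rvg(delta_vt: float, r_r: float, r_l: float, duty: float) -> dict[str, float]:
    """Currents (A) and output voltage (V) of the sampled (dynamic) RVG."""
    i_r = delta_vt / r_r             # Eq. (6.24): current set by Delta V_T across R_R
    v_out = delta_vt * r_l / r_r     # Eq. (6.25): mirrored current into R_L
    i_dc = 3.0 * i_r                 # DC current while phi1 is high, per Eq. (6.26)
    i_avg = i_dc * duty              # Eq. (6.26): averaged over the phi1 duty ratio
    return {"i_r": i_r, "v_out": v_out, "i_dc": i_dc, "i_avg": i_avg}


def max_duty(delta_vt: float, r_r: float, i_target: float) -> float:
    """Largest phi1 duty ratio that keeps the average current at or below i_target."""
    return i_target / (3.0 * delta_vt / r_r)


r = dynamic_rvg(delta_vt=0.3, r_r=9e3, r_l=72e3, duty=1 / 200)
print(f"V_out = {r['v_out']:.2f} V, DC = {r['i_dc'] * 1e6:.0f} uA, avg = {r['i_avg'] * 1e6:.2f} uA")
print(f"duty ratio for 1 uA average: {max_duty(0.3, 9e3, 1e-6):.4f}")
```

The second line recovers the 1/100 duty-ratio bound quoted in the text.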
6.4 CHAPTER SUMMARY

Low-power architectures and circuit techniques for SRAMs, DRAMs and VDCs were reviewed. The most obvious technique to reduce the power dissipation is voltage scaling. Reducing the power supply voltage to the 1-V and sub-1-V range requires new circuit innovations and breakthroughs, particularly when low threshold voltage devices are used. It was shown that power consumption is reduced not only by scaling the power supply voltage but also by reducing capacitances and DC currents using sophisticated techniques. Many of the techniques presented for memories can be useful in other applications such as ASICs, DSPs, etc. Design issues for the stable operation of a VDC and low-standby-current techniques were also investigated.
REFERENCES
[1] 8. Tram et al., "An 8-ns 1-Mb ECL BiCMOS SRAM with Configurable Memory Array Size," International Solid-State Circuits Conf. Tech. Dig., pp. 36-37, February 1989.
[2] M. Matsui et al., "An 8-ns 1-Mb ECL BiCMOS SRAM," International Solid-State Circuits Conf. Tech. Dig., pp. 38-39, February 1989.
[3] Y. Maki et al., "A 6.5-ns 1-Mb BiCMOS ECL SRAM," International Solid-State Circuits Conf. Tech. Dig., pp. 136-137, February 1990.
[4] M. Takada et al., "A 5-ns 1-Mb ECL BiCMOS SRAM," IEEE Journal of Solid-State Circuits, vol. 25, no. 5, pp. 1057-1062, October 1990.
[5] A. Ohba et al., "A 7-ns 1-Mb BiCMOS ECL SRAM with Program-Free Redundancy," in Symp. VLSI Circuits Conf. Tech. Dig., pp. 41-42, May 1990.
[6] Y. Okajima et al., "A 7-ns 4-Mb BiCMOS SRAM with a Parallel Testing Circuit," International Solid-State Circuits Conf. Tech. Dig., pp. 54-55, February 1991.
[7] K. Sasaki et al., "A 7-ns 140-mW 1-Mb CMOS SRAM with Current Sense Amplifier," IEEE Journal of Solid-State Circuits, vol. 27, no. 11, pp. 1511-
[8] T. Ootani et al., "A 4-Mb CMOS SRAM with a PMOS Thin-Film Transistor Load Cell," IEEE Journal of Solid-State Circuits, vol. 25, no. 5, pp. 1082-1092, October 1990.
[9] S. Murakami et al., "A 21-mW 4-Mb CMOS SRAM for Battery Operation," IEEE Journal of Solid-State Circuits, vol. 26, no. 11, pp. 1563-1570, November 1991.
[10] K. Sasaki et al., "16-Mb CMOS SRAM with a 2.3-μm² Single-Bit-Line Memory Cell," IEEE Journal of Solid-State Circuits, vol. 28, no. 11, pp. 1125-1130, November 1993.
[11] M. Matsumiya et al., "A 15-ns 16-Mb CMOS SRAM with Interdigitated Bit-Line Architecture," IEEE Journal of Solid-State Circuits, vol. 27, no. 11, pp. 1497-1503, November 1992.
[12] K. Seno et al., "A 9-ns 16-Mb CMOS SRAM with Offset-Compensated Current Sense Amplifier," IEEE Journal of Solid-State Circuits, vol. 28, no. 11, pp. 1119-1124, November 1993.
[13] E. Seevinck, F. J. List, and J. Lohstroh, "Static-Noise Margin Analysis of MOS SRAM Cells," IEEE Journal of Solid-State Circuits, vol. SC-22, no. 5, pp. 748-754, October 1987.
[14] H. Kato et al., "Consideration of Poly-Si Loaded Cell Capacity Limits for Low-Power and High-Speed," IEEE Journal of Solid-State Circuits, vol. 27, no. 4, pp. 683-685, April 1992.
[15] K. Sasaki et al., "A 23-ns 4-Mb CMOS SRAM with 0.2-μA Standby Current," IEEE Journal of Solid-State Circuits, vol. 25, no. 5, pp. 1075-1081, October 1990.
[16] K. Ishibashi, T. Yamanaka, and K. Shimohigashi, "An α-Immune, 2-V Supply Voltage SRAM Using a Polysilicon PMOS Load Cell," IEEE Journal of Solid-State Circuits, vol. 25, no. 1, pp. 55-60, February 1990.
[17] K. Sasaki et al., "A 15-ns 1-Mbit CMOS SRAM," IEEE Journal of Solid-State Circuits, vol. 23, no. 5, pp. 1067-1072, October 1988.
[18] K. Sasaki et al., "A 9-ns 1-Mbit CMOS SRAM," IEEE Journal of Solid-State Circuits, vol. 24, no. 5, pp. 1219-1225, October 1989.
[19] K. Ishibashi, K. Takasugi, T. Yamanaka, T. Hashimoto, and K. Sasaki, "A 1-V TFT-Load SRAM Using a Two-Step Word-Voltage Method," IEEE Journal of Solid-State Circuits, vol. 27, no. 11, pp. 1519-1524, May 1992.
[20] M. Yoshimoto, K. Anami, H. Shinohara, T. Yoshihara, H. Takagi, S. Nagao, S. Kayano, and T. Nakano, "A Divided Word-Line Structure in the Static RAM and Its Application to a 64K Full CMOS RAM," IEEE Journal of Solid-State Circuits, vol. SC-18, no. 5, pp. 479-485, October 1983.
[21] T. Hirose, H. Kuriyama, S. Murakami, K. Yuzuriha, T. Mukai, K. Tsutsumi, Y. Nishimura, Y. Kohno, and K. Anami, "A 20-ns 4-Mb CMOS SRAM with Hierarchical Word Decoding Architecture," IEEE Journal of Solid-State Circuits, vol. 25, no. 5, pp. 1068-1074, October 1990.
[22] A. Sekiyama, T. Seki, S. Nagai, A. Iwase, N. Suzuki, and M. Hayasaka, "A 1-V Operating 256-Kb Full-CMOS SRAM," IEEE Journal of Solid-State Circuits, vol. 27, no. 5, pp. 776-782, May 1992.
[23] T. Yabe et al., "High-Speed and Low-Standby-Power Circuit Design of 1 to 5 V Operating 1 Mb Full CMOS SRAM," Symposium on VLSI Circuits Tech. Dig., pp. 107-108, May 1993.
[24] G. Kitsukawa et al., "256-Mb DRAM Circuit Technologies for File Applications," IEEE Journal of Solid-State Circuits, vol. 28, no. 11, pp. 1105-
[25] T. Hasegawa et al., "An Experimental DRAM with a NAND-Structured Cell," IEEE Journal of Solid-State Circuits, vol. 28, no. 11, pp. 1099-1104, November 1993.
[26] T. Sugibayashi et al., "A 30-ns 256-Mb DRAM with a Multidivided Array Structure," IEEE Journal of Solid-State Circuits, vol. 28, no. 11, pp. 1092-
[27] M. Aoki, J. Etoh, K. Itoh, S.-I. Kimura, and Y. Kawamoto, "A 1.5-V DRAM for Battery-Based Applications," IEEE Journal of Solid-State Circuits, vol. 24, no. 6, pp. 1206-1212, October 1989.
[28] Y. Nakagome et al., "An Experimental 1.5-V 64-Mb DRAM," IEEE Journal of Solid-State Circuits, vol. 26, no. 4, pp. 465-471, April 1991.
[29] H. Yamauchi et al., "A Circuit Technology for High-Speed Battery-Operated 16-Mb CMOS DRAMs," IEEE Journal of Solid-State Circuits, vol. 28, no. 11, pp. 1084-1091, November 1993.
[30] N. C. C. Lu, "Advanced Cell Structures for Dynamic RAMs," IEEE Circuits and Devices Magazine, no. 1, pp. 21-36, January 1989.
[31] M. Takada, "DRAM Technology for Giga-bit Age," International Conf. Solid State Devices and Materials, Tech. Dig., pp. 874-876, 1993.
[32] K. Itoh et al., "An Experimental 1-Mb DRAM with On-Chip Voltage Limiter," in International Solid-State Circuits Conf., Tech. Dig., pp. 282-283, 1984.
[33] N. C.-C. Lu and H. H. Chao, "Half-VDD Bit-Line Sensing Scheme in CMOS DRAMs," IEEE Journal of Solid-State Circuits, vol. SC-19, no. 5, pp. 451-454, August 1984.
[34] B. Kawamoto, T. Shinoda, Y. Yamaguchi, S. Shimizu, K. Ohishi, N. Tanimura, and T. Yasui, "A 288K CMOS Pseudostatic RAM," IEEE Journal of Solid-State Circuits, vol. SC-19, no. 5, pp. 619-625, October 1984.
[35] Y. Tsukikawa et al., "An Efficient Back-Bias Generator with Hybrid Pumping Circuit for 1.5-V DRAMs," in Symposium on VLSI Circuits, Tech. Dig., pp. 85-86, May 1993.
[36] Y. Konishi et al., "A 38-ns 4-Mb DRAM with a Battery-Backup (BBU) Mode," IEEE Journal of Solid-State Circuits, vol. 25, no. 5, pp. 1112-1117, October 1990.
[37] T. Ooishi et al., "A Well-Synchronized Sensing/Equalizing Method for Sub-1 V Operating Advanced DRAMs," in Symposium on VLSI Circuits, Tech. Dig., pp. 81-82, May 1993.
[38] M. Asakura et al., "An Experimental 256-Mb DRAM with Boosted Sense-Ground Scheme," IEEE Journal of Solid-State Circuits, vol. 29, no. 11, pp. 1303-1309, November 1994.
[39] T. Sakata et al., "Subthreshold-Current Reduction Circuits for Multi-Gigabit DRAMs," in Symposium on VLSI Circuits, Tech. Dig., pp. 45-46, May 1993.
[40] T. Furuyama et al., "A New On-Chip Voltage Converter for Submicrometer High-Density DRAMs," IEEE Journal of Solid-State Circuits, vol. 22, no. 3, pp. 437-441, June 1987.
[41] M. T s h d a et al., "A 4-Mb DRAM with Half Internal-Voltage Bit-Line Precharge," IEEE Journal of Solid-State Circuits, vol. 21, no. 5, pp. 612-617, October 1986.
[42] M. Horiguchi et al., "Dual-Operation-Voltage Scheme for a Single 5-V, 16-Mb DRAM," IEEE Journal of Solid-State Circuits, vol. 23, no. 5, pp. 1128-1132, October 1988.
[43] G. Kitsukawa et al., "A 1-Mb BiCMOS DRAM Using Temperature-Compensation Circuit Techniques," IEEE Journal of Solid-State Circuits, vol. 24, no. 3, pp. 597-602, June 1989.
[44] M. Horiguchi et al., "A Tunable CMOS-DRAM Voltage Limiter with Stabilized Feedback Amplifier," IEEE Journal of Solid-State Circuits, vol. 25, no. 5, pp. 1129-1135, October 1990.
[45] M. Horiguchi et al., "Dual-Regulator Dual-Decoding-Trimmer DRAM Voltage Limiter for Burn-in Test," IEEE Journal of Solid-State Circuits, vol. 26, no. 11, pp. 1544-1549, November 1991.
[46] K. Ishibashi, K. Sasaki, and H. Toyoshima, "A Voltage Down Converter with Submicroampere Standby Current for Low-Power Static RAMs," IEEE Journal of Solid-State Circuits, vol. 27, no. 6, pp. 920-926, June 1992.
[47] P. E. Allen and D. R. Holberg, "CMOS Analog Circuit Design," Holt, Rinehart and Winston, 1987.
[48] P. R. Gray and R. G. Meyer, "Analysis and Design of Analog Integrated Circuits," 2nd Edition, Wiley, 1984.
[49] R. A. Blauschild et al., "A New NMOS Temperature-Stable Voltage Reference," IEEE Journal of Solid-State Circuits, vol. SC-13, pp. 767-774, December 1978.
[50] H. Tanaka, Y. Nakagome, J. Etoh, E. Yamasaki, M. Aoki, and K. Miyazawa, "Sub-1-μA Dynamic Reference Voltage Generator for Battery-Operated DRAMs," in Symp. VLSI Circuits, Tech. Dig., pp. 87-88, May 1993.
7
VLSI CMOS SUBSYSTEM DESIGN
In this chapter, we study the application of the circuit techniques developed through Chapter 4 to the implementation of CMOS building blocks such as adders, multipliers, ALUs, data paths, and regular structures. The power dissipation constraint is also included through the several options presented for each circuit. The use of a Phase Locked Loop (PLL) in high-speed CMOS systems for deskewing the internal clock is also examined, and the low-power issues of the circuits presented are discussed.
7.1 PARALLEL ADDERS

Parallel adders are the most important elements used in the arithmetic operations of microprocessors, DSPs, etc. As in any logic design, they are constrained by parameters such as speed, area, and power dissipation. The adder cell is also an element of multipliers, dividers, multiplier-accumulators (MACs), etc. Among the various adder implementations used in many designs, we can cite the following classes:

- Ripple Carry Adders (RCA);
- Carry Look-Ahead Adders (CLA);
- Carry Select Adders (CS); and
- Conditional Sum Adders (CSA).

This section is devoted to describing all these adder classes.

7.1.1 Ripple Carry Adders

In Chapter 4, a description of the functionality of an adder cell was presented. In an n-bit adder, carry propagation always occurs, and this propagation limits the speed of the adder. The simplest way to construct an n-bit adder is to cascade n 1-bit adders, as shown in Fig. 7.1. This adder is called a Ripple Carry Adder (RCA). Because the carry ripples through the n stages, the sum of the nth bit cannot be computed until the carry $C_{n-1}$ is evaluated. The delay of an n-bit addition is given by

$$ t_d = (n - 1)\,t_c + t_s \tag{7.1} $$

where $t_c$ is the carry delay and $t_s$ is the sum delay. Since the carry propagation path is the critical path for the delay, the full-adder cell should be optimized for it. The sum and carry out are given by

$$ S = A \oplus B \oplus C_{in} \tag{7.2} $$

$$ C_{out} = A \cdot B + (A + B) \cdot C_{in} \tag{7.3} $$
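A bit-level model helps separate the logic of Eqs. (7.2)-(7.3) from the timing of Eq. (7.1). The Python sketch below is a behavioral model only (it says nothing about transistor sizing); `full_adder`, `ripple_carry_add` and `rca_delay` are illustrative names, and the delay values in the usage line are arbitrary units.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, carry out) per Eqs. (7.2)-(7.3)."""
    s = a ^ b ^ cin
    cout = (a & b) | ((a | b) & cin)
    return s, cout


def ripple_carry_add(a: int, b: int, n: int, cin: int = 0) -> tuple[int, int]:
    """n-bit ripple carry adder: the carry of stage i feeds stage i + 1."""
    total, carry = 0, cin
    for i in range(n):
        bit, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= bit << i
    return total, carry


def rca_delay(n: int, t_c: float, t_s: float) -> float:
    """Eq. (7.1): (n - 1) carry delays followed by one sum delay."""
    return (n - 1) * t_c + t_s


print(ripple_carry_add(0b1011, 0b0110, 4))   # 11 + 6 = 17 -> (1, 1)
print(rca_delay(32, t_c=1.0, t_s=1.5))       # arbitrary delay units
```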

The schematic of Fig. 7.2 can be used to implement the adder cell efficiently. Compared to the conventional CMOS full-adder implementation, there is no inverter stage; therefore, the carry delay is reduced. To optimize the cell, the transistors in the carry path can be sized up [see Fig. 7.2]. The other devices can be kept small to reduce the load on the carry and the power dissipation. The transistors driven by the carry in $C_{in}$ are placed close to the output. This reduces the body effect, since the carry signal is the latest one to arrive in an adder chain. The schematic of Fig. 7.2 is symmetrical, which leads to a better layout and a small area. Since the outputs are complemented, the configuration of Fig. 7.3 can be used to implement an RCA circuit. In this case, many cells use inverted inputs.
Note that an n-bit RCA circuit is subject to the glitching problem. Fig. 7.4 shows a simulation of a 4-bit adder with the inputs $A_i$ set to zero (0) and the inputs $B_i$ and $C_{in}$ rising from 0 to 1. The outputs $S_i$ should stay at 0; however, because of the delay of the carry signal through the chain of full adders, the outputs exhibit spurious transitions (glitching). These dynamic transitions dissipate extra power and can represent an important portion of the total power. With careful design, this glitching problem can be minimized.
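The glitching of Fig. 7.4 can be reproduced with a crude unit-delay model. In the Python sketch below, every sum and carry output is assumed (for illustration only) to respond exactly one time step after its inputs, starting from the all-zero steady state; real gate delays differ, so only the qualitative behavior is meaningful. The function name is hypothetical.

```python
def simulate_rca_glitches(n: int, a: int, b: int, cin: int) -> tuple[list[int], list[int]]:
    """Unit-delay simulation of an n-bit RCA whose inputs switch from all-zero at t = 0.

    Returns (number of transitions seen on each sum bit, final sum bits).
    Illustrative assumption: every sum and carry output updates exactly one
    time step after its inputs change.
    """
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    carry = [cin] + [0] * n   # carry[i] = carry into bit i; internal carries start at 0
    sums = [0] * n            # steady state before t = 0
    transitions = [0] * n
    while True:
        new_sums = [a_bits[i] ^ b_bits[i] ^ carry[i] for i in range(n)]
        new_carry = [cin] + [
            (a_bits[i] & b_bits[i]) | ((a_bits[i] | b_bits[i]) & carry[i])
            for i in range(n)
        ]
        for i in range(n):
            transitions[i] += int(new_sums[i] != sums[i])
        if new_sums == sums and new_carry == carry:
            return transitions, sums
        sums, carry = new_sums, new_carry


print(simulate_rca_glitches(4, a=0, b=0b1111, cin=1))   # -> ([0, 2, 2, 2], [0, 0, 0, 0])
```

For the stimulus of Fig. 7.4 the model reports two spurious transitions on each of $S_1$-$S_3$ and none on $S_0$, even though every sum bit ends at 0; each of those transitions charges or discharges a node and therefore costs energy.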
One advantage of the RCA is its low power dissipation. However, its speed is very limited, particularly when the adder is wide.
Another efficient full-adder cell is based on Transmission Gates (TGs). Fig. 7.5 shows an optimized version of the full-adder cell using TGs, already discussed in Chapter 4. The carry signal propagates through only one TG. Hence, an n-bit RCA built from this cell would be faster and more compact than the conventional one. Fig. 7.6 shows the construction of an n-bit adder. In practice, an inverter is added every four stages to reduce the degradation of the carry signal due to the distributed RC effect. Since the carry signal is inverted after four 1-bit stages, complementary carry-path adders are used for the next four stages. This adder structure is sometimes called a Manchester adder. This circuit is faster than the RCA and may have lower power dissipation.
7.1.2 Carry Look-Ahead Adders

To avoid the linear growth of the carry delay, we use a Carry Lookahead Adder (CLA), in which the carries can be generated in parallel. The carry of each bit is generated from the propagate and generate signals ($P_i$, $G_i$) as well as the input carry ($C_0$). The propagate and generate signals are derived from the operands $A_i$ and $B_i$ by

$$ G_i = A_i \cdot B_i \tag{7.4} $$

$$ P_i = A_i + B_i \tag{7.5} $$

The carries of the four stages are given by

$$ C_1 = G_0 + P_0 C_0 \tag{7.6} $$

$$ C_2 = G_1 + P_1 G_0 + P_1 P_0 C_0 \tag{7.7} $$

$$ C_3 = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_0 \tag{7.8} $$

$$ C_4 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_0 \tag{7.9} $$
Fig. 7.7 shows the block diagram of a 4-bit CLA adder. The carry generator blocks (CLG1 to CLG4) generate the carries $C_1$ to $C_4$ in parallel from the carry-in signal $C_0$. The $P_i$ and $G_i$ signals are implemented following Equations (7.4) and (7.5). The sum generator blocks (SG1 to SG4) generate the sums. The sum $S_i$ is generated by

$$ S_i = C_i \oplus A_i \oplus B_i \tag{7.10} $$

or

$$ S_i = C_i \oplus P_i \tag{7.11} $$

if the propagate signal is given by

$$ P_i = A_i \oplus B_i \tag{7.12} $$

In general, an n-bit CLA adder can be implemented efficiently using 4-bit blocks.
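The flat two-level structure of Eqs. (7.6)-(7.9) can be expressed compactly. The Python sketch below builds each carry as a sum of products of the $G_i$, $P_i$ and $C_0$ signals only (never from the previous carry) and then forms the sums; it is a behavioral check of the equations, not a model of the CMOS carry generators, and `cla4` is an illustrative name.

```python
def cla4(a: int, b: int, c0: int) -> tuple[int, int]:
    """Behavioral 4-bit carry lookahead adder; returns (sum, C4)."""
    A = [(a >> i) & 1 for i in range(4)]
    B = [(b >> i) & 1 for i in range(4)]
    G = [A[i] & B[i] for i in range(4)]   # generate, G_i = A_i . B_i
    P = [A[i] | B[i] for i in range(4)]   # propagate, P_i = A_i + B_i
    C = [c0, 0, 0, 0, 0]
    for i in range(4):
        # C_{i+1} = G_i + P_i G_{i-1} + ... + P_i ... P_0 C_0, built from G, P, c0 only
        term, prod = G[i], P[i]
        for j in range(i - 1, -1, -1):
            term |= prod & G[j]
            prod &= P[j]
        C[i + 1] = term | (prod & c0)
    s = 0
    for i in range(4):
        s |= (A[i] ^ B[i] ^ C[i]) << i    # sum bit uses the carry into bit i
    return s, C[4]


print(cla4(9, 7, 0))   # 9 + 7 = 16 -> (0, 1)
```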
Figs. 7.8(a) and 7.8(b) show the first and the fourth CMOS carry lookahead generator circuits, respectively. The generate and propagate signals are generated in parallel and are fed to all carry generators together with the input carry signal $C_0$, so the carry signals are generated simultaneously. However, because the number of stacked MOS transistors increases, the delay of the fourth carry is greater than that of the first and limits the adder speed. The sum generator of the CMOS adder of Fig. 7.2 can be used in this case, with the same circuit for all four bits. This implementation is slow because of the large number of stacked MOS transistors, which present a high equivalent resistance in the pull-up and pull-down paths.
Another static CMOS CLA implementation, which improves the critical carry path delay, is shown in Fig. 7.9(a). In this circuit, the number of stacked devices is reduced. The same cell of Fig. 7.9(a) can be used to generate each carry within a 4-bit block. $P$ and $G$ are the global propagate and generate signals, respectively. The inverter in the circuit of Fig. 7.9(a) is used to reduce the load on the fourth carry, $C_{i+4}$, when it drives the next fourth-bit CLG circuit. The output of this inverter drives many blocks, such as the next first-bit, second-bit and third-bit CLGs and the next sum blocks. For the fourth bit stage, $P$ and $G$ are given by

$$ P = P_{i+3} P_{i+2} P_{i+1} P_i \tag{7.13} $$

$$ G = G_{i+3} + P_{i+3} G_{i+2} + P_{i+3} P_{i+2} G_{i+1} + P_{i+3} P_{i+2} P_{i+1} G_i \tag{7.14} $$
The circuits of Fig. 7.9(b) and Fig. 7.9(c) show the implementations of the global functions $P$ and $G$. Similarly, the $P$ and $G$ signals for the third, second and first bit stages can be constructed. For an n-bit adder, all the $P$ and $G$ signals are computed in parallel. Hence, the critical path is the carry path $C_i \rightarrow C_{i+4}$, except for the first 4-bit adder block, where the critical path can be from one of the inputs ($A_0$ or $B_0$) to the carry out $C_4$.
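The block-level recurrence implied by Eqs. (7.13)-(7.14), in which each block carry follows from the group signals as $C_{i+4} = G + P\,C_i$, can be sketched the same way. The Python model below (illustrative names, behavioral only) computes the group signals of each 4-bit block and chains them, which mirrors the $C_i \rightarrow C_{i+4}$ critical path discussed above.

```python
def block_pg(a: int, b: int, i: int) -> tuple[int, int]:
    """Group propagate P and generate G of bits i..i+3, Eqs. (7.13)-(7.14)."""
    g = [((a >> k) & 1) & ((b >> k) & 1) for k in range(i, i + 4)]
    p = [((a >> k) & 1) | ((b >> k) & 1) for k in range(i, i + 4)]
    big_p = p[3] & p[2] & p[1] & p[0]
    big_g = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    return big_p, big_g


def block_carries(a: int, b: int, n: int, c0: int = 0) -> list[int]:
    """Carries C_0, C_4, C_8, ... of an n-bit adder built from 4-bit blocks."""
    if n % 4:
        raise ValueError("n must be a multiple of 4")
    carries = [c0]
    for i in range(0, n, 4):
        big_p, big_g = block_pg(a, b, i)
        carries.append(big_g | (big_p & carries[-1]))   # C_{i+4} = G + P . C_i
    return carries


print(block_carries(0xFFFF, 0x0001, 16))   # carry ripples through every block -> [0, 1, 1, 1, 1]
```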
The sum generator is implemented using the propagate signals $P_i$ and $\bar{P}_i$. Fig. 7.10(a) illustrates one possible circuit using a static CMOS implementation.

Figure 7.10 Sum generator circuits: (a) static CMOS; (b) transmission gate version.
Another circuit, more compact and faster, is shown in Fig. 7.10(b). It uses transmission gates and needs only 6 transistors.

Many circuit techniques for high-speed carry lookahead adders have been proposed. One of them uses a pseudo-NMOS-like style [1]. That adder was used in a multiplier and achieved high-speed static operation. However, it consumes a DC current and is not suitable for low-power applications.
Other CLA implementations that improve the carry path delay are based on transmission gates and on the CPL family. In this section we present the one based on CPL; the TG version is left to the reader to design. Fig. 7.11 shows the block diagram of a 32-bit PMOS-latch CPL carry lookahead adder using 4-bit blocks. The carry generators (CLGs) of each 4-bit block generate the carries $C_{i+1}$ through $C_{i+4}$ in parallel from the carry in, $C_i$. The $P_i$ and $G_i$ signals required by each 4-bit block are not shown, for clarity. When the carry $C_{i+4}$ is fed to the next 4-bit block, a buffer is used to distribute this carry to the other CLGs and SGs. Therefore, the carry path is not significantly loaded, which results in fast operation. Fig. 7.12 shows the CPL implementation of the CLG of the fourth bit. This circuit is located in the critical path of the carry signal. It is compact and uses only NMOS pass transistors. $P$ and $G$ are the global propagate and generate signals, respectively. The fourth carry is generated from the carry in or the $G$ signal through only one NMOS device. The $P$ signal block is implemented using the AND/NAND CPL style. After every four CLG blocks of the critical path, the carry is buffered and restored using PMOS latch buffers. The PMOS latch restores the reduced high level to full swing to avoid any DC leakage current, as shown in Fig. 7.11. Fig. 7.13 shows the $G$ signal block for the fourth-bit CLG as an example. The same circuit style can be used to generate the $G$ signal for the third-bit, second-bit and first-bit CLGs. In addition, the output inverter uses a PMOS latch to restore the swing. The PMOS latch circuit is incorporated only when dual-rail signals are available. For a single-ended signal, a feedback PMOS transistor is added instead to restore the full-rail high level, as in the case of the sum generator of Fig. 7.14.
7.1.3 Carry-Select Adder

Another adder implementation which improves the speed of the RCA is the Carry Select adder (CS). It provides a regular layout, as in the case of an RCA. A CS adder basically consists of blocks, each executing two additions: one assumes that the carry in is "1", the other assumes that it is "0". The real carry in is computed by the previous block and selects one of the two sum outputs through a simple TG multiplexer. Fig. 7.15 shows an example of an 8-bit carry select adder implementation with 4-4 staging. The carry signal $C_4$ selects the next four sums and the carry $C_8$. The 4-bit adder blocks usually use RCAs with a transmission gate implementation. For a 32-bit adder, the uniform staging 4-4-4-4-4-4-4-4 does not lead to an optimum delay because of the multiplexing delay of the next carry. The optimal staging depends on the technology. For example, for the 0.8 μm CMOS device parameters presented in Chapter 3, simulations show that the optimal staging of a 32-bit CS adder using TGs is 4-4-7-9-8 at a 3.3 V power supply voltage. This implementation is regular and easy to lay out; however, it occupies a larger area than the RCA.

Figure 7.13 G block in CPL logic.
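The selection principle of the carry select adder and the staging question can both be explored with a small Python model. `carry_select_add` below precomputes each block for both carry-in values and lets the incoming carry pick one, as the TG multiplexer does; `csel_delay` is a deliberately crude timing model with a hypothetical per-bit ripple delay `t_c` and multiplexer delay `t_mux`, and it is not the 0.8 μm simulation referred to in the text.

```python
def rca_block(x: int, y: int, width: int, cin: int) -> tuple[int, int]:
    """Behavioral ripple block: returns (width-bit sum, carry out)."""
    total = x + y + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a: int, b: int, staging: list[int], cin: int = 0) -> tuple[int, int]:
    """Carry select adder; staging lists the block widths from the LSB upward."""
    result, carry, pos = 0, cin, 0
    for k, w in enumerate(staging):
        mask = (1 << w) - 1
        x, y = (a >> pos) & mask, (b >> pos) & mask
        if k == 0:
            block_sum, carry = rca_block(x, y, w, carry)   # first block: plain ripple
        else:
            r0 = rca_block(x, y, w, 0)                      # precomputed, carry in = 0
            r1 = rca_block(x, y, w, 1)                      # precomputed, carry in = 1
            block_sum, carry = r1 if carry else r0          # the 2:1 multiplexer
        result |= block_sum << pos
        pos += w
    return result, carry


def csel_delay(staging: list[int], t_c: float, t_mux: float) -> float:
    """Toy model of carry-out arrival: t_c per rippled bit, t_mux per mux stage."""
    t = staging[0] * t_c
    for w in staging[1:]:
        t = max(w * t_c, t) + t_mux   # wait for both the block and the incoming carry
    return t


print(csel_delay([4] * 8, t_c=1.0, t_mux=2.0), csel_delay([4, 4, 7, 9, 8], t_c=1.0, t_mux=2.0))
```

With `t_c = 1` and `t_mux = 2` this toy model gives 18 units for uniform 4-bit staging and 13 units for 4-4-7-9-8, illustrating why non-uniform staging can help; the real optimum depends on the actual circuit delays, as the text notes.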
7.1.4 Conditional Sum Adders

In 1960, Sklansky considered the Conditional Sum Adder (CSA) to be the fastest adder from a theoretical point of view [2, 3]. The concept behind this architecture is explained using the basic circuit of Fig. 7.16, which is for a 4-bit conditional sum adder. It uses two types of cells: i) the conditional cell, and ii) the multiplexer. For each bit there is one conditional cell circuit, which computes two sums and two carries: $S^0$ and $C^0$ are calculated for a carry in of zero, and $S^1$ and $C^1$ are calculated for a carry in of one. The true sum is selected with the first carry in and the previous carries. The true final carry out (see Fig. 7.16) is also selected.
A possible implementation of the conditional sum adder is shown in Fig. 7.17 for the case of a 4-bit adder [4]. The conditional cell can be implemented with the compact logic elements of Fig. 7.17(b). The different signals of the conditional cell are constructed using the following relations:

$$ S^0_i = A_i \bar{B}_i + \bar{A}_i B_i \tag{7.15} $$

$$ S^1_i = A_i B_i + \bar{A}_i \bar{B}_i \tag{7.16} $$

$$ C^0_i = A_i B_i \tag{7.17} $$

$$ C^1_i = A_i + B_i \tag{7.18} $$
The adder uses mainly transmission gates for the multiplexers, as shown in Fig. 7.17(c). Note that the architecture uses the signals and their complements (dual-rail architecture) to avoid the use of inverters for the multiplexers; otherwise, the delay of the carry path would be penalized by the added inverters.
To design an n-bit (e.g., 32-bit) adder, one possible technique for fast operation is to use staged blocks of constant or variable width. In this case, all the conditional sum blocks compute their respective double sums and double output carries in parallel. The true sum and carry out signals of each block are then selected by the carry in generated by the previous stage. At the block level, the architecture uses a carry-select-like technique, where the carry in of each block is the true carry out of the previous block. The optimal staging can be determined from circuit simulation. The architecture has two critical delay paths within a block. One is from the carry in to the carry out, and it is affected by the layout routing, since the carry in of a block is distributed to all the final multiplexers. The other critical delay path is from the LSB input of a block to the carry out.
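The block-level organization described above can be sketched behaviorally as follows. This is a minimal model under stated assumptions: `block_both` and `staged_adder` are hypothetical names, each block's two conditional results are obtained with integer addition instead of conditional cells, and the 4-4-7-9-8 staging quoted earlier for the carry select adder is reused purely as an example (the optimal staging of a conditional sum adder has to come from circuit simulation).

```python
def block_both(a: int, b: int, width: int) -> dict[int, tuple[int, int]]:
    """One block: (sum, carry out) for block carry-in 0 and 1.
    Modeled with integer addition; in hardware both are computed in parallel."""
    mask = (1 << width) - 1
    return {c: ((a + b + c) & mask, (a + b + c) >> width) for c in (0, 1)}


def staged_adder(a: int, b: int, cin: int, widths: list[int]) -> tuple[int, int]:
    """Blocks of the given widths, LSB block first. The true carry out of one
    block only drives the final selection MUXs of the next block."""
    result, carry, shift = 0, cin, 0
    for w in widths:
        mask = (1 << w) - 1
        both = block_both((a >> shift) & mask, (b >> shift) & mask, w)
        block_sum, carry = both[carry]      # final MUX selection by the true carry
        result |= block_sum << shift
        shift += w
    return result, carry


# Example: a 32-bit adder with the (illustrative) staging 4-4-7-9-8.
s, c = staged_adder(0x12345678, 0x9ABCDEF0, 0, [4, 4, 7, 9, 8])
```

Only the final selection in each block waits for the previous block's true carry, which is why the carry-in to carry-out path is one of the two critical paths mentioned above.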
To reduce the power dissipation and the delay of the CSA adder, a CPL-like circuit style can be used. Fig. 7.18 shows the different circuit cells needed to implement such an adder. In Fig. 7.18(a), the conditional cell schematic is shown. The output signals have a high-level voltage equal to VDD − VT. Fig. 7.18(b) shows the compact multiplexer using NMOS pass transistors. The control signals of the multiplexers should have full-rail swing. When these reduced-swing circuits are used in the adder, whenever a full-rail swing is needed it can be generated with the double-rail swing restoring circuit of Fig. 7.18(c). The output inverter of the sum signal is shown in Fig. 7.18(d). The feedback PMOS transistor is needed to restore the high level when only a single rail exists. The layout of such an adder is regular: only three cells, for the first, second and third bits, have to be drawn. Fig. 7.19 illustrates the layout of a 4-bit block using 0.8 µm design rules.
7.1.5 Adder’s Architectures Comparison

The ripple adder has the smallest area compared to the other classes and, in many cases, the lowest power. It should therefore be limited to applications where the area and/or the power must be minimized while speed is not important. For fast adders, the CLA circuit is usually used; however, its power dissipation can be relatively high. Carry select adders are widely used as the best compromise between the high-speed operation of CLAs and the small area of RCAs. The conditional sum adder, with variable block staging combined with a carry-select-like style, can result in the fastest adder if well optimized. The power dissipation of this adder can be comparable to, or even less than, that of the RCA, because it uses a reduced internal swing and a relatively small transistor count if the CPL-like style is used. When considering all the criteria, such as power, area and speed, a tool can be developed to select the adder class which satisfies the specified requirements.

Figure 7.19 4-bit conditional sum adder layout.
For wide adders, having a large operand size, the different architectures can still be utilized. However, to optimize the speed and power of such a wide adder, several additional algorithms can be combined. Examples of wide adders can be found in [5, 6].
7.2 PARALLEL MULTIPLIERS

High-speed parallel multipliers are becoming key components of RISCs (Reduced Instruction Set Computers), DSPs (Digital Signal Processors), graphics accelerators and so on. Parallel multipliers are used in data processors as well as in digital signal processors. For example, multimedia applications need fast 16 × 16 multipliers. For a floating-point unit using double-precision multiplication (IEEE-754 standard), the mantissa has 52 bits, so 54 × 54 multipliers are required for such an operation. The two added bits are the sign bit and the guard bit. In this section we discuss several parallel multiplier algorithms which have been used in VLSI. The reader can consult references [7, 8] for more details on array multiplication algorithms.
7.2.1 Braun Multiplier

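Consider two unsigned n-bit numbers $X = X_{n-1}\ldots X_1X_0$ and $Y = Y_{n-1}\ldots Y_1Y_0$:

$$X = \sum_{i=0}^{n-1} X_i 2^i \qquad (7.19)$$
$$Y = \sum_{j=0}^{n-1} Y_j 2^j \qquad (7.20)$$

The product $P = P_{2n-1}\ldots P_1P_0$, which results from multiplying the multiplicand X by the multiplier Y, can be written in the following form:

$$P = \sum_{i=0}^{n-1}\sum_{j=0}^{n-1} X_i Y_j\, 2^{i+j} \qquad (7.21)$$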
Each of the partial product terms $P_k = X_iY_j$ is called a summand. Fig. 7.20(a) shows an example of 4 × 4 multiplication. The summands are generated in parallel with AND gates. Fig. 7.20(b) shows Braun's array multiplier [7]. An n × n multiplier of this type requires n(n − 1) adders and n² AND gates. The adders can be implemented efficiently by arranging the array for a regular layout. Fig. 7.21 shows a regular 4 × 4 array implementation of the multiplier of Fig. 7.20 using three different cells. The first cell contains an AND gate [Fig. 7.21(b)]. The second cell, shown in Fig. 7.21(c), contains a full adder and an AND gate. The routing lines are also illustrated in these cells. The last cell represents a full adder belonging to the final carry-propagate adder. The multiplier array uses what are called carry-save adders.
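To make the carry-save organization concrete, here is a bit-level Python model of an unsigned n × n Braun array. It is a sketch under our own conventions (hypothetical names): cell (i, j) adds the summand $X_iY_j$, the sum bit of equal weight from the previous row, and the carry bit of equal weight from the previous row. The first row of cells receives zero carries, so in practice those could be half adders. The function also counts the full adders, which reproduces the n(n − 1) figure quoted above.

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (sum, carry) of three bits."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def braun_multiply(x: int, y: int, n: int) -> tuple[int, int]:
    """Unsigned n x n Braun array (n >= 2); returns (product, number of full adders)."""
    xb = [(x >> i) & 1 for i in range(n)]
    yb = [(y >> j) & 1 for j in range(n)]
    p = [0] * (2 * n)
    adders = 0
    s = [xb[i] & yb[0] for i in range(n)]   # row 0: summands X_i Y_0, no adders
    c = [0] * (n - 1)                        # carries entering the next row
    p[0] = s[0]
    for j in range(1, n):                    # carry-save rows: carries go down, not sideways
        new_s, new_c = [0] * n, [0] * (n - 1)
        for i in range(n - 1):               # cell (i, j) has weight i + j
            new_s[i], new_c[i] = full_adder(xb[i] & yb[j], s[i + 1], c[i])
            adders += 1
        new_s[n - 1] = xb[n - 1] & yb[j]     # leftmost summand passes to the next row
        s, c = new_s, new_c
        p[j] = s[0]
    carry = 0
    for k in range(n - 1):                   # final carry-propagate (ripple) row
        p[n + k], carry = full_adder(s[k + 1], c[k], carry)
        adders += 1
    p[2 * n - 1] = carry
    return sum(bit << w for w, bit in enumerate(p)), adders
```

For n = 4 the model uses (n − 1)² = 9 carry-save cells plus 3 cells in the final row, i.e. 12 = n(n − 1) full adders.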
The delay of such a multiplier depends on the delay of the full-adder cell and of the final adder in the last row. In the multiplier array, an adder with balanced carry and sum delays is desirable, because the sum and carry signals are both on the critical path. This is different from the case of a parallel adder, where the carry path should be optimized and sped up compared to the sum path. For large arrays, the speed and power of the full adder are very important. The CPL-like styles discussed in Chapter 4 can result in reduced power dissipation and high-speed operation. The final adder in the last row can use the techniques presented in Section 7.1.
7.2.2 Baugh-Wooley Multiplier

It was noted that the Braun multiplier performs multiplication of unsigned numbers. The Baugh-Wooley technique [7] was developed to design regular direct multipliers for two's complement numbers. This direct approach does not need any two's complementing operation prior to multiplication. Let us consider two numbers X and Y with the following form:
$$X = -X_{n-1}2^{n-1} + \sum_{i=0}^{n-2} X_i 2^i \qquad (7.22)$$
$$Y = -Y_{n-1}2^{n-1} + \sum_{i=0}^{n-2} Y_i 2^i \qquad (7.23)$$
The product P = XY is given by the following equation
$$P = XY = X_{n-1}Y_{n-1}2^{2n-2} + \sum_{i=0}^{n-2}\sum_{j=0}^{n-2} X_iY_j 2^{i+j} - X_{n-1}\sum_{i=0}^{n-2} Y_i 2^{n+i-1} - Y_{n-1}\sum_{i=0}^{n-2} X_i 2^{n+i-1} \qquad (7.24)$$
In order to avoid the use of subtractor cells and use only adders, the negative terms should be transformed. For the $X_{n-1}$ term:
$$-X_{n-1}\sum_{i=0}^{n-2} Y_i 2^{n+i-1} = 2^{n-1}\left(-2^{n-1} + \bar{X}_{n-1}2^{n-1} + X_{n-1} + \sum_{i=0}^{n-2} X_{n-1}\bar{Y}_i 2^i\right) \qquad (7.25)$$
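Using this property, and the same transformation for the $Y_{n-1}$ term, in Equation (7.24), the product P becomes

$$P = XY = -2^{2n-1} + \left(\bar{X}_{n-1} + \bar{Y}_{n-1} + X_{n-1}Y_{n-1}\right)2^{2n-2} + \sum_{i=0}^{n-2}\sum_{j=0}^{n-2} X_iY_j 2^{i+j} + \left(X_{n-1} + Y_{n-1}\right)2^{n-1} + \sum_{i=0}^{n-2} X_{n-1}\bar{Y}_i 2^{n+i-1} + \sum_{i=0}^{n-2} Y_{n-1}\bar{X}_i 2^{n+i-1} \qquad (7.26)$$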
Using the above relation, an n × n multiplier using only adders can be implemented. The schematic circuit diagram of a 4 × 4 two's complement multiplier based on Baugh-Wooley's algorithm is shown in Fig. 7.22. The different cells composing the array are also shown. In this scheme, n(n − 1) + 3 full adders are required, so for the case of n = 4 the array needs 15 adders. When n is relatively large, the final adder stage in the multiplier array can be implemented with the techniques discussed in Section 7.1.

Figure 7.22 (a) 4 × 4 Baugh-Wooley two's complement array (FA: full adder).
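The Baugh-Wooley expression for the product P given above can be checked numerically. The sketch below builds the product only by adding non-negative bit terms (summands, complemented summands and constant corrections); the function name is hypothetical, and the code is a behavioral check rather than the cell array of Fig. 7.22. The constant $-2^{2n-1}$ is handled by noting that, for a 2n-bit result, subtracting $2^{2n-1}$ is the same as adding it, so no subtractor is needed.

```python
def baugh_wooley(x: int, y: int, n: int) -> int:
    """n x n two's complement product formed only from added bit terms."""
    xb = [(x >> i) & 1 for i in range(n)]   # two's complement bits of x
    yb = [(y >> i) & 1 for i in range(n)]
    xs, ys = xb[n - 1], yb[n - 1]           # sign bits X_{n-1}, Y_{n-1}
    acc = 0
    for i in range(n - 1):                  # summands X_i Y_j, i, j = 0..n-2
        for j in range(n - 1):
            acc += (xb[i] & yb[j]) << (i + j)
    # (not X_{n-1} + not Y_{n-1} + X_{n-1} Y_{n-1}) * 2^(2n-2)
    acc += ((1 - xs) + (1 - ys) + (xs & ys)) << (2 * n - 2)
    # (X_{n-1} + Y_{n-1}) * 2^(n-1)
    acc += (xs + ys) << (n - 1)
    for i in range(n - 1):                  # summands with one complemented bit
        acc += (xs & (1 - yb[i])) << (n + i - 1)
        acc += (ys & (1 - xb[i])) << (n + i - 1)
    # -2^(2n-1) equals +2^(2n-1) modulo 2^(2n): one more added 1, no subtractor.
    acc = (acc + (1 << (2 * n - 1))) & ((1 << (2 * n)) - 1)
    # Read the 2n-bit result as a two's complement number.
    return acc - (1 << (2 * n)) if acc >> (2 * n - 1) else acc
```

An exhaustive comparison with ordinary multiplication for small n is a quick way to confirm that every term of the expression has the right weight.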
This type of multiplier is suitable for applications where operands with fewer than 16 bits are to be processed. Applications for such a multiplier are, for example, digital filters where small operands are used (e.g., 6, 8 and 12 bits). For low-power and high-speed operation, the array uses a CPL-like adder, as mentioned previously in Section 7.2.1, while a CSA scheme combined with carry select can be utilized in the final adder. For operands of 16 bits or more, the Baugh-Wooley scheme becomes too area-consuming and slow. Hence, techniques to reduce the size of the array while maintaining its regularity are required.
7.2.3 The Modified Booth Multiplier

For operands of 16 bits or more, the modified Booth algorithm [8] has been used in almost all the designed multipliers. It is based on recoding the two's complement operand (i.e., the multiplier) in order to reduce the number of partial products to be added. This makes the multiplier faster and uses less hardware (area). For example, the modified radix-4 algorithm is based on partitioning the multiplier into overlapping groups of 3 bits, and each group is decoded to generate the correct partial product.
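Let us write the multiplier, Y, in two's complement:

$$Y = -Y_{n-1}2^{n-1} + \sum_{i=0}^{n-2} Y_i 2^i \qquad (7.27)$$

For n even and with $Y_{-1} = 0$, it can be rewritten as follows:

$$Y = \sum_{i=0}^{n/2-1}\left(Y_{2i-1} + Y_{2i} - 2Y_{2i+1}\right)2^{2i} \qquad (7.28)$$

In this equation, the terms in brackets have values in the set {−2, −1, 0, +1, +2}.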
The recoding of Y using the modified Booth algorithm generates another number with the following five signed digits: −2, −1, 0, +1, +2. Each recoded digit in the multiplier performs a certain operation on the multiplicand, X, as illustrated in Table 7.1.
Table 7.1 Partial product selection.

Y2i+1  Y2i  Y2i-1   Recoded digit   Operation on X
  0     0     0           0             0 × X
  0     0     1          +1            +1 × X
  0     1     0          +1            +1 × X
  0     1     1          +2            +2 × X
  1     0     0          −2            −2 × X
  1     0     1          −1            −1 × X
  1     1     0          −1            −1 × X
  1     1     1           0             0 × X
So the bits of the multiplier are partitioned into overlapping groups of 3 bits, and each group permits the generation of a certain partial product. The five possible multiples of the multiplicand are relatively easy to generate, following the explanation given in Table 7.2.
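The recoding of Table 7.1 translates directly into a few lines of code. In this minimal sketch (hypothetical names, n assumed even), Y is scanned two bits at a time together with the overlapping bit below, starting from the extra bit $Y_{-1} = 0$; the digits are returned least significant group first.

```python
def booth_digit(y_hi: int, y_mid: int, y_lo: int) -> int:
    """Table 7.1: recoded digit for the group (Y_{2i+1}, Y_{2i}, Y_{2i-1})."""
    return y_lo + y_mid - 2 * y_hi


def booth_recode(y: int, n: int) -> list[int]:
    """Radix-4 digits of the n-bit two's complement number y, LSB group first."""
    assert n % 2 == 0, "n is assumed even"
    digits, prev = [], 0                      # prev holds Y_{2i-1}; Y_{-1} = 0
    for i in range(0, n, 2):
        y_hi = (y >> (i + 1)) & 1
        digits.append(booth_digit(y_hi, (y >> i) & 1, prev))
        prev = y_hi                           # overlap: Y_{2i+1} is reused by the next group
    return digits
```

For the multiplier Y = 01101001 of the example below, `booth_recode(0b01101001, 8)` returns `[1, -2, -1, 2]`, i.e. the digits +2, −1, −2, +1 read from the most significant group, and 2·64 − 16 − 2·4 + 1 = 105.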
The generated partial product is related to the multiplicand for each recoded digit by the relationships presented in Table 7.3. $PP_i$ is the ith bit of the partial product and $PP_n$ is its sign bit, with $PP_n = PP_{n-1}$ when no shifting of the partial product is performed. Note that the partial product is represented on n + 1 bits.
Table 7.2 Operation on X for each recoded digit.

Recoded digit   Operation on X
 0              Add 0 to the partial product
+1              Add X to the partial product
+2              Shift X left one position and add it to the partial product
−1              Add the two's complement of X to the partial product
−2              Take the two's complement of X, shift it left one position and add it to the partial product
Table 7.3 Partial product generation relations.

Recoded digit   Operation on X                       Added to LSB
 0              PP_i = 0         for i = 0, ..., n   0
+1              PP_i = X_i       for i = 0, ..., n   0
+2              PP_i = X_{i−1}   for i = 0, ..., n   0
−1              PP_i = X̄_i       for i = 0, ..., n   1
−2              PP_i = X̄_{i−1}   for i = 0, ..., n   1
(with X_n = X_{n−1} and X_{−1} = 0)
To clarify this algorithm, an example is presented in Fig. 7.23. Let X = 10010101 (−107) and Y = 01101001 (+105). With $Y_{-1} = 0$, the recoded digits of Y are

(Y7 Y6 Y5) = 011 → +2,  (Y5 Y4 Y3) = 101 → −1,  (Y3 Y2 Y1) = 100 → −2,  (Y1 Y0 Y−1) = 010 → +1

The bits are grouped into 3-bit groups overlapping by one bit, and a bit with a value of zero is added on the right side of Y as $Y_{-1}$. So the multiplication of two 8-bit numbers generates only 4 partial products: the number of partial products is reduced by half. Each partial product in this example is represented on 9 bits. For a correct addition of the partial products, their signs are extended as shown in Fig. 7.23. The shape of the multiplier is then trapezoidal due to the sign extension.
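Figure 7.23 Modified Booth multiplication example: X = 10010101 (−107), Y = 01101001 (+105), recoded digits +2, −1, −2, +1 (groups 011, 101, 100, 010); the four sign-extended 9-bit partial products add up to P = 1101010000011101 (−11235).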
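The example of Fig. 7.23 can be reproduced with the following sketch. It generates each (n + 1)-bit partial product with the relations of Table 7.3 (assuming $X_n = X_{n-1}$ and $X_{-1} = 0$), sign-extends it to the 2n-bit product width as in the trapezoidal array, and adds everything modulo $2^{2n}$. Names are hypothetical and the model is behavioral only.

```python
def partial_product(x: int, digit: int, n: int) -> tuple[list[int], int]:
    """(n+1)-bit partial product PP_0..PP_n (LSB first) and the bit added to its LSB."""
    xb = [(x >> i) & 1 for i in range(n)] + [(x >> (n - 1)) & 1]  # X_0..X_n, X_n = X_{n-1}
    shifted = [0] + xb[:n]                                      # X_{i-1}, X_{-1} = 0
    if digit == 0:
        return [0] * (n + 1), 0
    if digit == 1:
        return xb, 0
    if digit == 2:
        return shifted, 0
    if digit == -1:
        return [1 - b for b in xb], 1
    return [1 - b for b in shifted], 1                          # digit == -2


def booth_multiply(x: int, y: int, n: int) -> int:
    """Radix-4 Booth product with every partial product sign-extended to 2n bits."""
    width = 2 * n
    mask = (1 << width) - 1
    acc, prev = 0, 0                                 # prev = Y_{-1} = 0
    for j in range(n // 2):
        y_mid, y_hi = (y >> (2 * j)) & 1, (y >> (2 * j + 1)) & 1
        digit = prev + y_mid - 2 * y_hi              # Table 7.1
        prev = y_hi
        bits, lsb_add = partial_product(x, digit, n)
        value = sum(b << i for i, b in enumerate(bits))
        if bits[n]:                                  # copy the sign bit up to the product width
            value |= mask & ~((1 << (n + 1)) - 1)
        acc += (value + lsb_add) << (2 * j)          # groups are 2 bit positions apart
    acc &= mask
    return acc - (1 << width) if acc >> (width - 1) else acc
```

`booth_multiply(-107, 105, 8)` returns −11235, the product of Fig. 7.23. The four effective partial products are −107, +214, +107 and −214, weighted by 1, 4, 16 and 64.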
In order to make the array rectangular, and thus more regular for VLSI implementation, the problem of sign extension must be addressed. This problem is more critical when the operands are wide, since each partial product must be sign-extended to the length of the product. In this section we will not review all the techniques that solve the sign extension problem, but we will discuss one technique, which is shown in Fig. 7.24 for the example of Fig. 7.23. The basic idea is to use two extra bits in each partial product. For the first partial product, the two additional bits, $PP_{n+1}$ and $PP_{n+2}$, are equal to the sign bit of the partial product:

$$PP_{n+2} = PP_{n+1} = PP_n \qquad (7.29)$$
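For the second partial product, if the first partial product was positive, then the two additional bits for this second partial product are given by the expression above; otherwise there are two cases:

$$PP_{n+2} = PP_{n+1} = 1 \quad \text{if } PP_n = 0 \qquad (7.30)$$

and

$$\overline{PP}_{n+1} = PP_{n+2} = 1 \quad \text{if } PP_n = 1 \qquad (7.31)$$

that is, $PP_{n+1} = 0$ and $PP_{n+2} = 1$ when the present partial product is also negative.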
It is therefore more convenient to use a third bit, F, as a flag indicating whether a negative sign bit from the previous partial products has to be propagated. $F_1$ is the flag generated by the first partial product for the next one. For the example of Fig. 7.24, $F_0 = 0$ (there is no PP before the first one) and $F_1 = F_2 = F_3 = 1$, so the sign of the first partial product is propagated to all the others. This flag is expressed by the following Boolean equation:

$$F_{j+1} = F_j + PP_{n,j} \qquad (7.32)$$

where $PP_{n,j}$ is the sign bit of the jth partial product.

Figure 7.24 The previous example of Figure 7.23 with simplified sign extension (legend: additional bits to be generated for sign extension; additional bits generated from the previous sign and the present sign).
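The flag-based sign extension can be checked in the same way. The sketch below (hypothetical names, behavioral only) appends the two extra bits $PP_{n+1}$ and $PP_{n+2}$ to each partial product: both equal the sign when no earlier partial product was negative, both are 1 when an earlier one was negative and the present one is positive, and they are 0 and 1 when both are negative. It updates the flag F after each partial product and drops everything beyond the 2n-bit product, instead of sign-extending every partial product to full width. An exhaustive comparison with ordinary multiplication shows that the rectangular array gives the same result.

```python
def booth_pp(x: int, digit: int, n: int) -> tuple[list[int], int]:
    """(n+1)-bit partial product (LSB first) and LSB add bit, per Table 7.3."""
    xb = [(x >> i) & 1 for i in range(n)] + [(x >> (n - 1)) & 1]  # X_n = X_{n-1}
    shifted = [0] + xb[:n]                                      # X_{i-1}, X_{-1} = 0
    src = shifted if abs(digit) == 2 else xb
    if digit == 0:
        return [0] * (n + 1), 0
    if digit > 0:
        return src, 0
    return [1 - b for b in src], 1                              # negative digits


def booth_multiply_flagged(x: int, y: int, n: int) -> int:
    """Booth product using only two extra bits per partial product plus the flag F."""
    width = 2 * n
    acc, flag, prev = 0, 0, 0               # F_0 = 0, Y_{-1} = 0
    for j in range(n // 2):
        y_mid, y_hi = (y >> (2 * j)) & 1, (y >> (2 * j + 1)) & 1
        digit = prev + y_mid - 2 * y_hi
        prev = y_hi
        bits, lsb_add = booth_pp(x, digit, n)
        sign = bits[n]                       # PP_n
        if flag == 0:
            e1, e2 = sign, sign              # Eq. (7.29): no earlier negative PP
        elif sign == 0:
            e1, e2 = 1, 1                    # earlier negative PP, present PP positive
        else:
            e1, e2 = 0, 1                    # earlier negative PP, present PP negative
        word = sum(b << i for i, b in enumerate(bits)) | (e1 << (n + 1)) | (e2 << (n + 2))
        acc += (word + lsb_add) << (2 * j)
        flag |= sign                         # Eq. (7.32)
    acc &= (1 << width) - 1                  # bits beyond the product width are dropped
    return acc - (1 << width) if acc >> (width - 1) else acc
```

For the example of Fig. 7.24 the extra bit pairs ($PP_{n+2}PP_{n+1}$) come out as 11, 11, 11 and 10, and `booth_multiply_flagged(-107, 105, 8)` again returns −11235.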
Let us now see the implementation of the n × n modified Booth multiplier. Fig. 7.25 shows the block diagram of the multiplier, which also gives an idea of the floorplan of this subsystem. It is composed of the following blocks:

• The multiplier array, containing the partial product generators and 1-bit adders;
• The Booth encoder and the sign extension bits ($PP_{n+2}$, $PP_{n+1}$, F). The Booth encoder generates the five signals (0, +1x, +2x, −1x, and −2x) for each 3-bit group of Y; and
• The final-stage adder, which performs a 2n-bit addition.
Figure 7.25 Block diagram of the n × n multiplier using the modified Booth algorithm.

For the sake of simplicity, we treat the case of a 6 × 6 multiplier. All the cells described in this example are the basic cells of a multiplier of any size. Fig. 7.26 shows the implementation of such a multiplier. Four types of cells are used, plus the final adder. These cells are:
• The ADD cell, which generates 0 or 1 [see Table 7.3]. The schematic circuit of this cell is shown in Fig. 7.27(a). Two implementations are possible: one uses pass transistors controlled by the five signals defining the recoded digit code, and the other is a single 2-input gate driven by the two signals −1x and −2x, since a 1 must be added only for the digits −1 and −2.
• The partial product MUX (PP-MUX), which generates the partial product. Fig. 7.27(b) shows the schematic of the PP-MUX using CPL-type logic. The feedback PMOS, Pf, in this figure or in the one of Fig. 7.27(a), is used to restore the high level and eliminate any DC current. This implementation permits fast and low-power operation.
• The PP-FA (PP-HA) cells, which merge the PP-MUX circuit with a full adder (half adder), respectively. A CPL-like adder can be utilized for fast and low-power operation.
• The Booth Encoder (BE), which generates the five control signals 0x, +1x, +2x, −1x, and −2x from a group of three bits of the multiplier Y. Fig. 7.28 shows the schematic of the different circuits involved in the BE block. The additional circuits for the two bits $PP_{n+1,j}$ and $PP_{n+2,j}$ of the jth PP are also illustrated. $F_j$ and $F_{j+1}$ are the previous and the next flags, respectively, and $PP_{n,j}$ is the sign bit of the jth PP. Note that $F_0$ is 0.

Figure 7.27 Booth multiplier cells: (a) ADD; (b) PP-MUX; (c) PP-FA (or PP-HA; the marked input is not connected for PP-HA).
The Booth multiplier exhibits a lot of unnecessary glitches. The main reason for these glitches is the race condition between the multiplicand and the multiplier caused by the Booth encoder. The power dissipation associated with the glitches can be an important portion of the total power, and hence it needs to be reduced by signal synchronization techniques.
7.2.4 Wallace Tree

By applying the Booth algorithm, the number of partial products is halved. However, for large multipliers of 32 bits and over, there are still 16 or more partial products. In this case, the performance of the modified Booth algorithm is limited. One technique to improve the performance of these multipliers is to adopt the Wallace tree using 4-2 compressors. A 4-2 compressor accepts 4 numbers and a carry in, and sums them to produce 2 numbers and a carry out (strictly speaking, it is a 5-3 compressor). Fig. 7.29(a) shows an example of such a tree on the partial products of an unsigned 8 × 8 multiplier. Eight partial products are produced. Using 4-2 compressors, two levels of addition (stages) are needed. The final two summands are added using a fast 16-bit adder. Some zeros are added to the array. This example shows that the bits which are not used in the first stage (level) jump to the next one, to be combined with the ones produced by the compressors. Fig. 7.29(b) shows the architecture of the 8 × 8 multiplier. For the first stage of the tree, two blocks, A and B, are required. Blocks A and B of compressors group the first and the last four partial products, respectively.
Figure 7.28 Logic schematic of the Booth encoder, including the sign extension logic.
Fig. 7.30 shows how the 4-2 compressor can be implemented with 2 full adders or with custom static CMOS logic [9]. Four bits, I1, ..., I4, are added to produce two sums, S and C. Hence, 4 bits of the partial products are compressed to produce two new partial products. The compressor is implemented, using a carry-save adder construction, by two cascaded full adders, as shown in Fig. 7.30(b). Notice that carry-out2 is never generated by carry-in. Fig. 7.31 shows the 4-2 compressor circuit using a compact structure of multiplexers [10]. This structure is faster than the static complementary version. Fig. 7.32 shows the interconnection of the 4-2 compressors for block A of the example of Fig. 7.29. The carry out of each compressor is connected to the carry-in of the next one. Since these signals are independent, the carry is not propagated through the row.
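A bit-level model of the two-full-adder compressor makes the carry behaviour explicit. In this sketch (hypothetical names; which three inputs enter the first full adder is our assumption about Fig. 7.30(b)), the carry out depends only on I1 to I3. In a row it can therefore be wired to the neighbouring compressor's carry in without creating a ripple chain. Each compressor preserves the arithmetic value: I1 + I2 + I3 + I4 + Cin = S + 2(C + Cout).

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (sum, carry) of three bits."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def compressor_4_2(i1: int, i2: int, i3: int, i4: int, cin: int) -> tuple[int, int, int]:
    """Two cascaded full adders; returns (S, C, Cout), C and Cout having weight 2."""
    t, cout = full_adder(i1, i2, i3)   # Cout depends on I1..I3 only, never on Cin
    s, c = full_adder(t, i4, cin)
    return s, c, cout


def compress_row(a: int, b: int, c: int, d: int, n: int) -> tuple[int, int]:
    """Row of 4-2 compressors: four n-bit operands -> two new operands (S, C).
    Each Cout only feeds the neighbour's Cin, so nothing ripples along the row."""
    s_word, c_word, cin = 0, 0, 0
    for k in range(n + 1):             # one extra column absorbs the last Cout
        s, carry, cout = compressor_4_2((a >> k) & 1, (b >> k) & 1,
                                        (c >> k) & 1, (d >> k) & 1, cin)
        s_word |= s << k
        c_word |= carry << (k + 1)
        cin = cout
    return s_word, c_word
```

For instance, `sum(compress_row(0xFF, 0xA5, 0x3C, 0x81, 8))` equals 0xFF + 0xA5 + 0x3C + 0x81, which is how a block of the tree turns four partial products into two.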
To further enhance the Wallace tree multiplier, the modified Booth algorithm can be used to reduce the number of partial products by half in a carry-save adder array. One example of such a combined construction is the architecture of the 32 × 32 multiplier shown in Fig. 7.33. It consists of four functions: the Booth encoder, the partial product generator, the compressor blocks, and the final 64-bit adder. The Wallace tree is constructed with 3 stages (levels). The first stage has 4 blocks (A to D), each block summing up 4 of the 16 partial products. The second stage sums up the 8 new partial products generated by the first stage; hence, two blocks, E and F, are needed. Finally, block G of the third stage of the tree generates the last two partial products for the final adder. This architecture exhibits some irregularities in the layout, since it has a complicated interconnection scheme. Hence, the interconnection wires affect the speed and power dissipation of the multiplier.
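The level count of a 4-2 compressor tree can be illustrated at word level. The sketch below (hypothetical names) models a row of 4-2 compressors as two cascaded 3:2 carry-save stages and reduces a list of operands level by level. Leftover operands jump to the next level, and a single 3:2 stage is used when only three operands remain. With 8 partial products it needs two levels and with 16 it needs three, matching the 8 × 8 and 32 × 32 examples. The partial products used here are plain unsigned shifted multiplicands, not the Booth-recoded ones of Fig. 7.33.

```python
def csa(a: int, b: int, c: int) -> tuple[int, int]:
    """Word-level 3:2 carry-save adder (a row of full adders)."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1


def compress_4_2(a: int, b: int, c: int, d: int) -> tuple[int, int]:
    """Word-level row of 4-2 compressors built as two cascaded 3:2 stages."""
    s1, c1 = csa(a, b, c)
    return csa(s1, c1, d)


def wallace_4_2(operands: list[int]) -> tuple[list[int], int]:
    """Reduce the operands to two with levels of 4-2 compressors.
    Returns (the two remaining operands, number of levels)."""
    ops, levels = list(operands), 0
    while len(ops) > 2:
        nxt, k = [], 0
        while len(ops) - k >= 4:
            nxt.extend(compress_4_2(*ops[k:k + 4]))
            k += 4
        rest = ops[k:]
        nxt.extend(csa(*rest) if len(rest) == 3 else rest)  # leftovers jump a level
        ops, levels = nxt, levels + 1
    return ops, levels


# Unsigned 8 x 8 example: eight shifted partial products of X = 0xB7, Y = 0x5C.
pps = [(0xB7 << j) if (0x5C >> j) & 1 else 0 for j in range(8)]
(s, c), levels = wallace_4_2(pps)    # s + c == 0xB7 * 0x5C, levels == 2
```

The two remaining operands would then go to the fast final adder.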
7.2.5 Multiplier’s Comparison

The basic array multipliers, like the Baugh-Wooley scheme, consume low power and have relatively good performance. However, their use can be limited to processing operands with fewer than 16 bits (e.g., 8 bits). For operands of 16 bits and over, the modified Booth algorithm reduces the number of partial products by half, which increases the speed of the multiplier. Its power dissipation is comparable to that of the Baugh-Wooley multiplier, due to the circuitry overhead of the Booth algorithm. However, circuit techniques can give this multiplier low-power characteristics. The fastest multipliers adopt the Wallace tree with modified Booth encoding. A Wallace tree generally leads to larger power dissipation and area, due to the interconnect wires. Hence, it is not recommended for low-power applications. Dynamic multipliers are not discussed in this section, since they introduce problems of control and timing, which add extra area and power dissipation to the design.
7.3 DATAPATH
A VLSI chip can be partitioned into two parts: the data path (or execution unit) and the control unit. Data paths are often used in digital signal processors, microprocessors and application-specific ICs (ASICs). The data path consists of a combination of an Arithmetic Logic Unit (ALU), a shifter, a register file, I/O ports, a multiplier, an adder, a magnitude comparator, data busses, etc. It performs many operations on the data in the register file, to which the results are sent back. The data busses are the communication means for data transfer between the different units of the data path, such as the ALU, the shifter and the register file. These busses have a heavy load (a few pF). In CMOS design, dynamic techniques are used to allow fast operation. One way to reduce the power dissipation due to the precharging transistors is to use static busses [11].
Figure 7.34 Arithmetic Logic Unit (ALU).
The control unit delivers the instructions to the data path. These instructions determine the operations that the data path has to perform. The control unit can be implemented using random logic, a micro-ROM (Read Only Memory), a PLA (Programmable Logic Array), or a combination of these three implementations. Other macrocells, such as a TLB (Translation Lookaside Buffer), cache memory, etc., can be added to the data path and the control unit. In this section, several blocks of a data path are discussed.
7.3.1 Arithmetic Logic Unit

The ALU is an important part of a data path. It is a macrocell which executes arithmetic operations, such as multiplication, addition, subtraction and negation, and logic operations, such as AND, OR, XOR, comparison, etc. It performs the operation on two operands stored in latch A and latch B and puts the result in latch C, as shown in Fig. 7.34. The operation code (op code) selects the ALU operation to be executed. The flags indicate the status of the ALU, such as overflow, zero result, carry generation, etc. The input latches A and B are, in general, connected to two parallel data busses. Sometimes the input latches are merged with MUXs to select among many input sources to the ALU. The result latch is connected to one of the busses or to a third one. The ALU described in this section is static, for low-power applications.
The maximum clock frequency of a VLSI circuit may be limited by the ALU operations, especially the arithmetic ones. The critical delay of an arithmetic operation is due mainly to the carry propagation along the width of the ALU. There are many types of ALU, depending on the number of operations to be performed. Fig. 7.35 shows the block diagram of a 1-bit slice of an ALU. It has exactly the same structure as the adder, except that the P and G blocks are programmable. Fig. 7.35(a) shows the P block with 4 control signals (OP1 ... OP4). The feedback PMOS transistor, Pf, restores the high level from VDD − VTn to VDD; hence the DC current of the first inverter, due to the reduced high level, is eliminated. Fig. 7.35(b) shows the G block with 4 op code signals (OP5 ... OP8). The P and G blocks use the pass-transistor style. The techniques discussed in Section 7.1 can be applied to achieve low-power and fast operation. The carry and result (sum) blocks are shown in Fig. 7.35(c) and (d), respectively. Table 7.4 summarizes some of the functions that can be implemented with these blocks. Several other operations can be realized with this ALU.
Table 7.4 Examples of ALU operations.

Operation      LSB-Cin   P function    G function   Op code (OP1 ... OP8)   Result
Add w. carry   0         P = A xor B   G = A or B   10011101                A + B
Subtraction    1         P = A xor B   G = A or B   10011101                A + B̄ + 1
Bit-wise AND   0         P = A and B   G = 0        01110000                A and B
Bit-wise OR    0         P = A or B    G = 0        00010000                A or B
Not A          0         P = Ā         G = 0        10100000                Ā
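The following sketch is a behavioral model of an n-bit ALU built from programmable 1-bit slices. Table 7.4 does not spell out the carry and result blocks, so two points are our assumptions, not the circuit of Fig. 7.35. First, the carry block acts as a 2:1 multiplexer: $C_{i+1} = C_i$ when $P_i = 1$, and $G_i$ otherwise. Second, the result block computes $R_i = P_i \oplus C_i$. With these, the P and G functions listed for addition give A + B. For subtraction we further assume that B is complemented before it reaches the slice, which yields the A + B̄ + 1 of the Result column. The op code bits are not modeled, and the names are hypothetical.

```python
# Hypothetical encoding of some Table 7.4 rows: (P function, G function, LSB carry in).
OPS = {
    "add":   (lambda a, b: a ^ b, lambda a, b: a | b, 0),
    "sub":   (lambda a, b: a ^ b, lambda a, b: a | b, 1),   # B complemented first (assumption)
    "and":   (lambda a, b: a & b, lambda a, b: 0, 0),
    "or":    (lambda a, b: a | b, lambda a, b: 0, 0),
    "not_a": (lambda a, b: 1 - a, lambda a, b: 0, 0),
}


def alu(op: str, a: int, b: int, n: int) -> tuple[int, int]:
    """Ripple of n programmable 1-bit slices; returns (result, final carry)."""
    p_fn, g_fn, carry = OPS[op]
    if op == "sub":
        b = ~b                          # feed B-bar so the slices compute A + B-bar + 1
    result = 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        p, g = p_fn(ai, bi), g_fn(ai, bi)
        result |= (p ^ carry) << i      # result block: R_i = P_i xor C_i
        carry = carry if p else g       # carry block: propagate, or take G
    return result, carry
```

For example, `alu("sub", 5, 7, 4)` returns `(14, 0)`: 5 − 7 = −2, i.e. 1110 in 4 bits, with no carry out because A < B. For the logic operations G = 0 and the LSB carry is 0, so every carry stays 0 and the result is simply the P function.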
To implement an n-bit ALU, all the techniques discussed for carry speed-up in adders can be applied. Drivers are needed to distribute the op code signals to all the slices of an n-bit ALU. For low-power design, the busses which communicate with the ALU are in general not precharged, in contrast to the precharged busses used in many data paths.

Figure 7.36 Absolute value calculator.
7.3.2 Absolute Value Calculator
The Absolute Value Calculator (AVC) is generally used in the data paths of video processors to compare the data of two pictures. Fig. 7.36 shows the architecture of the AVC. This parallel circuit performs two subtractions simultaneously, A − B and B − A. Using the most significant bits of these two results, the MUX circuit selects the positive one, so that the output gives the absolute value |A − B|.
To reduce the power dissipation and the area of an n-bit AVC, the logic of the two n-bit adders required can be reduced by merging the functions common to both operations. The techniques described in Section 7.1 for n-bit addition should also be used.
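A behavioral sketch of the AVC follows, assuming unsigned n-bit inputs. Both differences are formed on n + 1 bits so that the MSB acts as a sign bit, and the multiplexer keeps the non-negative one. The function name is hypothetical, and the two subtractors are modeled with integer arithmetic.

```python
def abs_value_calc(a: int, b: int, n: int) -> int:
    """|A - B| for unsigned n-bit A and B: two parallel subtractions and a MUX."""
    mask = (1 << (n + 1)) - 1          # n+1 bits: the extra MSB is the sign
    d_ab = (a - b) & mask              # subtractor 1: A - B
    d_ba = (b - a) & mask              # subtractor 2: B - A, computed in parallel
    negative = (d_ab >> n) & 1         # MSB of A - B
    return d_ba if negative else d_ab  # MUX keeps the non-negative difference
```

One MSB is enough to drive the MUX here. When A = B both differences are zero, so either choice is correct.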
7.3.3 Comparator
A magnitude comparator is used in many DSP applications. It compares the magnitudes of two numbers A and B by indicating whether A < B, A = B, or A > B. Fig. 7.37(a) shows an example of a two-bit comparator, which requires two types of cells, C1 and C2. The cell C1 is constructed with the circuit of Fig. 7.37(b). Table 7.5 shows the truth table for this cell.

Table 7.5 Truth table for cell C1.

A_i   B_i   C_i   D_i
 0     0     0     1
 0     1     0     0
 1     0     1     0
 1     1     0     1
Let us explain how a 2-bit comparator works. When A1 < B1, then C1 = D1 = 0, and A1A0 < B1B0 regardless of the magnitudes of the lower bits. Similarly, for A1 > B1, then C1 = 1, D1 = 0, and A1A0 > B1B0 regardless of the magnitudes of the lower bits. When A1 = B1 (C1 = 0, D1 = 1), the relation between the two 2-bit numbers depends on A0 and B0. In this situation, there are three different cases:

1. A1A0 < B1B0 for A0 < B0 (i.e., C0 = D0 = 0). Then we can set E = F = 0.
2. A1A0 = B1B0 for A0 = B0 (i.e., C0 = 0, D0 = 1). Then we can set E = 0 and F = 1.
3. A1A0 > B1B0 for A0 > B0 (i.e., C0 = 1, D0 = 0). Then we can set E = 1 and F = 0.

These relations can easily be used to implement the second cell, C2, of the comparator, as shown in Fig. 7.37(c).
This technique can be extended from the two-bit comparator to an n-bit comparator, constructed as a parallel tree of the cells C1 and C2. A 4-bit comparator could, for example, be constructed with two 2-bit comparators connected in parallel, with their four E and F output signals fed to an additional C2 cell. In this architecture, glitching is reduced by equalizing the delay paths of each cell.
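The C1/C2 tree described above can be modeled in a few lines. In this sketch (hypothetical names), C1 produces the pair (C, D) = (A_i > B_i, A_i = B_i) for one bit position. C2 merges a more significant pair with a less significant one using the three cases listed above: E = C_high + D_high·C_low and F = D_high·D_low. A < B is signalled by E = F = 0. Odd widths are handled by passing the unpaired top slice up one level, which is our own choice.

```python
def cell_c1(a: int, b: int) -> tuple[int, int]:
    """Cell C1: C = 1 if a > b, D = 1 if a == b."""
    return a & (1 - b), 1 - (a ^ b)


def cell_c2(high: tuple[int, int], low: tuple[int, int]) -> tuple[int, int]:
    """Cell C2: E = C_high + D_high * C_low (greater), F = D_high * D_low (equal)."""
    c_hi, d_hi = high
    c_lo, d_lo = low
    return c_hi | (d_hi & c_lo), d_hi & d_lo


def compare(a: int, b: int, n: int) -> tuple[int, int]:
    """n-bit comparator as a parallel tree of C1 and C2 cells.
    Returns (E, F): (1, 0) for A > B, (0, 1) for A = B, (0, 0) for A < B."""
    level = [cell_c1((a >> i) & 1, (b >> i) & 1) for i in range(n)]   # LSB first
    while len(level) > 1:
        nxt = [cell_c2(level[k + 1], level[k]) for k in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])       # an unpaired most significant slice moves up
        level = nxt
    return level[0]
```

For n = 4 this builds exactly the structure described in the text: two 2-bit comparators whose outputs feed one more C2 cell, giving a tree depth of log2(n) C2 levels.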
7.3.4 Shifter
Another macrocell of the data path is the shifter. It performs shift or rotate operations on the data. If the number of bits to be shifted is arbitrary, then a barrel shifter is used [12, 13]. Fig. 7.38 shows the CMOS implementation of a 4-bit barrel shifter. NMOS transistors are used as switches in the array. The input bus (D0-D6) can be connected to the output bus (R0-R3) via the pass transistors. The control signals S0-S3 select the pass transistors to be switched. These signals determine the amount of shift, and they are generated by a 2-bit decoder. Since the outputs have a high level of VDD − VT due to the pass transistors, the output buffer uses a feedback PMOS device, Pf, to restore the high level to VDD. This eliminates any DC current in the first inverter of the buffer.
Table 7.6 shows the values of the output bus as a function of the input data. Depending on the values of D<6:0>, several shift operations can be performed. For example, if D<6:4> = "000" and D<3:0> is the 4-bit input data, then a logical shift is realized. However, if D<6:4> is filled with the sign bit of the input data (i.e., "111" for a negative number) and D<3:0> is the input data, then an arithmetic shift operation is performed.

Table 7.6 Output bus as a function of the shifting amount.
The barrel shifter is not a critical unit for the delay. Low-power operation is achieved by using a static implementation. This shifter can be implemented with transmission gates, in which case the feedback PMOS is not required. However, for low power, the use of an NMOS array is more efficient. The feedback PMOS should be of minimum size.
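The data routing of the barrel shifter can be sketched as follows. The exact connection pattern of Fig. 7.38 is an assumption here: output $R_i$ is connected to input $D_{i+k}$ when control $S_k$ is active, so the array shifts right by k positions. Under that assumption, filling D<6:4> with zeros, with the sign bit, or with D<2:0> gives a logical shift, an arithmetic shift and a rotation, respectively. All names are hypothetical.

```python
def barrel_shifter(d: int, s: list[int]) -> int:
    """4-bit NMOS barrel shifter model: 7-bit input bus D<6:0>, one-hot controls
    S0..S3, output R<3:0> with R_i = D_(i+k) when S_k = 1 (assumed pattern)."""
    r = 0
    for i in range(4):
        for k in range(4):
            if s[k]:                           # pass transistor (row i, diagonal k) conducts
                r |= ((d >> (i + k)) & 1) << i
    return r


def decode(amount: int) -> list[int]:
    """2-bit decoder producing the one-hot control signals S0..S3."""
    return [1 if k == amount else 0 for k in range(4)]


def shift_right(x: int, amount: int, mode: str) -> int:
    low = x & 0xF                              # D<3:0>: the 4-bit input data
    if mode == "logical":
        high = 0b000                           # D<6:4> = "000"
    elif mode == "arithmetic":
        high = 0b111 if low >> 3 else 0b000    # D<6:4> filled with the sign bit
    else:                                      # "rotate"
        high = low & 0b111                     # D<6:4> = D<2:0>
    return barrel_shifter(low | (high << 4), decode(amount))
```

For example, with x = 1011 and a shift of one position, the logical, arithmetic and rotate modes give 0101, 1101 and 1101. Only one diagonal of pass transistors conducts at a time, so each output sees a single series transistor whatever the shift amount.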
7.3.5 Register File

A register file is a set of registers which store data. It consists of a small array of static memory cells. Register files are used by microprocessors and DSPs, and they permit multiple read and write ports [14, 15, 16, 17]. A typical array is 32 registers of 32 bits. For example, an ALU needs two pieces of data from the register file, so the array has a dual-read, single-write architecture.
Fig. 7.39 shows the schematic of the single-ended memory cell with 2 read ports and 1 write port (2R-1W). The read ports are the read bit-lines BL_R1 and BL_R2. The memory cell, composed of two cross-coupled inverters I1 and I2, is addressed by two read word-line signals, WL_R1 and WL_R2. The NMOS transistor N1 is controlled by the Write Enable (WE) signal and is connected in series with the write access transistor N2, which is controlled by the write word-line (WL_W) signal. The transistor N1 isolates the stored data from the write bit-line (BL_W). To write the data into the storage node A from the write bit-line, the inverters I1 and I2 should be sized carefully. The β ratio¹ of the inverter I1 should be larger than 1 (e.g., 5) to set the threshold voltage of I1 to a low level. This is because N1 and N2 transfer a high level only weakly (only VDD − VTn). Moreover, to ensure a correct write operation, the feedback inverter I2 should be weak, so that the access transistors N1 and N2 can change the state of node A. For example, the NMOS and PMOS of I2 should be of minimum size, except that the length of the NMOS is twice the minimum. Also, the access transistors should have a higher β than the transistors of I2. For a given technology, the sizes should be determined by circuit simulation for a correct write operation. The inverter I3 is a buffer for the storage node.

¹The definition of β is given in Chapter 4.

Figure 7.39 (2R-1W) register file cell.
A pair of three-port memory cells is shown in Fig. 7.40. This structure has a shared access transistor, N2, and a shared write bit-line, BL_W. To read and write the memory cell, the simplified schematic of Fig. 7.41 is used. This schematic uses the column multiplexing scheme. For low power, the register file uses a static design and avoids the conventional sense amplifier for bit-line sensing, since the sense amplifier consumes DC power. For a three-port register file, two read row decoders and one write row decoder are required. Also, the Write Enable (WE) and the column addresses are needed to produce the column write enable for writing the data to the specified storage node. For fast operation, AND gates with a maximum of 5 inputs can be used.
During the read operation, if for example Na is asserted, then the data is put
on the bit-line, BL.Rl. The bit-line is selected through the pass-transistor N,.
The data is then senred by the inverter I , in Fig. 7.41. During this period, the
460 CHAPTER
7
BL-FSA HL-W BL_R2H
BL-RIA WE-I WE-2 BCRiB
Figure 1.10 A pmir d t h r r c p o r t memory c& (2H-1W).
read enable signel, RE, is asserted, Ni is OFF and only the feedbaek PMOS P j
is activated when a one ( V D-~VT,) is on the data-line. In this situation, the
feedback PMOS charges up the data-line to VDD.Also the DC current, which
c m be generated due to the reduced high l e d on the data-line, is completely
eliminated. The p ratio of the inverter I, should be higher than one (e.g., 5 )
to achieve a symmetrical r e d access time for a % e m and a one. When R E = 0,
then the data-lines axe i 4 a t e d from the bit-liner and the NMOS transistor Nz
is ON. Therefore, the latch formed by the pair of inverters 11 and I , latches
the old data.
The operation of such a register file is fully static and does not dissipate any static power in any mode of operation. Furthermore, the read and write operations are asynchronous. This type of register file is therefore suitable for low-power applications.
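The read/write organization described above can be sketched behaviorally. This is a minimal model under stated assumptions: the class name, the (row, column) addressing, and the single-bit cells are hypothetical simplifications for illustration, not the circuit of Fig. 7.41. It shows two independent, unclocked read ports and a column write enable formed as the AND of WE and the decoded column address.

```python
class RegisterFile2R1W:
    """Behavioral sketch of a 2-read/1-write register file (hypothetical model).

    Cells are addressed by (row, column). Reads are combinational (static,
    asynchronous). A write reaches only the column whose column write enable
    is asserted.
    """

    def __init__(self, rows, cols):
        self.cols = cols
        self.cells = [[0] * cols for _ in range(rows)]

    def column_write_enables(self, we, col_addr):
        # Column write enable = WE AND (decoded column address == this column).
        return [bool(we) and c == col_addr for c in range(self.cols)]

    def write(self, we, row, col_addr, bit):
        for c, enabled in enumerate(self.column_write_enables(we, col_addr)):
            if enabled:
                self.cells[row][c] = bit

    def read(self, port1, port2):
        # Two independent read ports, each given as a (row, column) pair.
        (r1, c1), (r2, c2) = port1, port2
        return self.cells[r1][c1], self.cells[r2][c2]


rf = RegisterFile2R1W(rows=4, cols=2)
rf.write(we=1, row=1, col_addr=0, bit=1)
print(rf.read((1, 0), (1, 1)))  # (1, 0)
```

With WE low, no column write enable is asserted, so the stored data is left untouched.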
7.4 REGULAR STRUCTURES

In this section we examine the design of large regular structures such as Programmable Logic Arrays (PLAs), Read Only Memories (ROMs) and Content Addressable Memories (CAMs). ROMs and PLAs are used to implement controllers in a regular manner, and they can also be applied to signal processing. RAMs are treated separately in Chapter 6. These large structures
are usually dynamic circuits for fast operation. These dynamic circuits can be shut down by a power management unit for power savings. If, for example, the clock is turned OFF, all dynamic circuits go into a precharge mode with all PMOS precharge devices ON.
7.4.1 Programmable Logic Array

Logic functions such as those used in the control units of VLSI processors, or in finite-state machines, are hard to implement in random logic. One way of implementing these functions in a regular structure is to use a Programmable Logic Array (PLA) [18, 19].
PLAs have a regular architecture divided mainly into two planes, as shown in Fig. 7.42. These planes perform specific functions such as AND and OR. CMOS PLAs can be implemented in both static and dynamic styles. The style is chosen depending on the timing strategy in the chip. Other factors, such as speed, power dissipation and the allowed area, play an important role in choosing the PLA design style. A CMOS PLA example using a pseudo-NMOS-like style is shown in Fig. 7.43. The output OR functions are realized with NOR gates.
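From Fig. 7.43(a), we have
$P_1 = \overline{A + B + C} = \bar{A}\,\bar{B}\,\bar{C}$ (7.33)
$P_2 = \overline{A + C} = \bar{A}\,\bar{C}$ (7.34)
$P_3 = \overline{\bar{B} + C} = B\,\bar{C}$ (7.35)
$P_4 = \overline{A + \bar{C}} = \bar{A}\,C$ (7.36)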
Buffers are used when the load on the bit-line is large. In general, they consist of two inverter stages. The OR plane is in principle similar to the AND plane [Fig. 7.43(b)]. From Fig. 7.43(b), we have
$X = P_1 + P_2 + P_3$ (7.37)
$Y = P_1 + P_4$ (7.38)
This pseudo-NMOS PLA uses the NOR-NOR logic gate style. The example shows that the PLA organization is useful for implementing Sum Of Products (SOP) functions. Hence any SOP function can be realized by programming the array with the AND and OR cells. Any type of latch or register can be used at the input and output. This PLA design style has a reduced area and is simple to implement. However, it is not suitable for low-power applications because of its high DC power dissipation, particularly when the PLA is large. Moreover, it is slow.

Figure 7.42 AND-OR PLA architecture.
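The NOR-NOR organization described above can be checked with a small logic-level model. This is a sketch under stated assumptions: the encoding of which true or complemented input line feeds each AND-plane row, and the example function X = A.B + not(C), are hypothetical and are not taken from Fig. 7.43. The model shows why a NOR plane fed with complemented lines acts as an AND plane. It also shows how a NOR output plane followed by an inverting buffer yields the OR, so any SOP can be programmed.

```python
from itertools import product


def nor(*bits):
    """NOR of any number of 0/1 values."""
    return int(not any(bits))


def pla_nor_nor(inputs, and_plane, or_plane):
    """Evaluate a NOR-NOR PLA personality.

    inputs:    dict input name -> 0/1
    and_plane: one list per product row of (name, connect_true) pairs; True
               connects the true input line, False the complemented line.
    or_plane:  dict output name -> indices of the product rows it uses.
    """
    # AND plane: each row is a NOR of its connected lines, i.e. the AND of
    # the complements of those lines.
    products = []
    for row in and_plane:
        lines = [inputs[n] if connect_true else 1 - inputs[n]
                 for n, connect_true in row]
        products.append(nor(*lines))
    # OR plane: NOR of the selected rows followed by an inverting buffer = OR.
    return {out: 1 - nor(*(products[k] for k in rows))
            for out, rows in or_plane.items()}


# Hypothetical personality for X = A.B + not(C).
AND_PLANE = [[("A", False), ("B", False)],  # NOR(not A, not B) = A.B
             [("C", True)]]                 # NOR(C) = not C
OR_PLANE = {"X": [0, 1]}

for a, b, c in product((0, 1), repeat=3):
    x = pla_nor_nor({"A": a, "B": b, "C": c}, AND_PLANE, OR_PLANE)["X"]
    assert x == int((a and b) or not c)
```

The exhaustive loop confirms that the programmed array equals the intended SOP for all eight input combinations.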
In dynamic CMOS style, the circuit shown in Fig. 7.44 can be used. It is a self-timed PLA in which the AND and OR planes are both realized using a precharged NOR configuration. This structure needs only a single clock phase. When the clock, clk, is high, the bit-lines are precharged in both planes. The NMOS transistors NA and NO are OFF, guaranteeing that there is no path to ground. Tracking lines in both planes are used to generate a delayed clock for the OR plane. When the clock is low, the precharge PMOS transistors in the AND plane turn OFF, NA turns ON, and the products are evaluated. The tracking lines ensure that NO turns ON only when the inputs to the OR plane are stable; otherwise the outputs could be spuriously discharged. This PLA is fast, but it wastes a lot of dynamic power. The wasted power has several sources, such as:
■ The virtual ground lines are charged and discharged every cycle. The total capacitance of the virtual ground is important, particularly for large PLAs, because the ground lines are made in diffusion for layout compactness. This capacitance can be reduced by using a metal level in a multi-metal technology;
■ The number of inverters forming the buffers is important, because several of them switch during the evaluation; and
■ The switching activity of the dynamic NOR implementation is high [see Chapter 4].

Figure 7.43 Pseudo-NMOS CMOS PLA: (a) AND plane; (b) OR plane.
Figure 7.44 Self-timed dynamic PLA using NOR-NOR style.
Consider now the PLA shown in Fig. 7.45, which has an AND-NOR structure. The OR plane is the same as in the PLA of Fig. 7.44. However, the AND plane is considerably simplified because:
■ The virtual ground lines disappear; and
■ The number of inverters for buffering is reduced by half.

Figure 7.45 Self-timed dynamic PLA using AND-NOR style.
The switching activity of the NAND implementation is also lower than that of the NOR implementation, resulting in lower power in the AND plane. One problem with this structure is that the series NMOS devices of the NAND gates may result in a long discharge time.
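The activity difference between the two planes can be quantified with a simplified model. Assume (as an illustration only) independent inputs that are each "1" with probability p. A precharged dynamic node consumes energy only in cycles where it discharges and must be recharged, so its activity is the probability that the output evaluates to 0. A dynamic NOR discharges unless all inputs are 0, while a dynamic NAND discharges only when all inputs are 1.

```python
def dynamic_nor_activity(n_inputs, p_one=0.5):
    # Output discharges unless every input is 0.
    return 1 - (1 - p_one) ** n_inputs


def dynamic_nand_activity(n_inputs, p_one=0.5):
    # Output discharges only when every input is 1 (series NMOS stack fully ON).
    return p_one ** n_inputs


for n in (2, 4, 8):
    print(n, dynamic_nor_activity(n), dynamic_nand_activity(n))
# 2 0.75 0.25
# 4 0.9375 0.0625
# 8 0.99609375 0.00390625
```

As the fan-in grows, a dynamic NOR row discharges almost every cycle, whereas a NAND row rarely does. This matches the lower AND-plane power claimed for the AND-NOR PLA.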
Another dynamic PLA combines the pseudo-NMOS and dynamic logic design styles [19]. Fig. 7.46 shows an example of such a structure. The AND plane uses a predischarged pseudo-NMOS NOR style, while the OR plane uses a conventional dynamic precharged style. During the precharge phase, the clock signal is high and the bit-lines in the AND plane are predischarged to ground. In the OR plane, the bit-lines are precharged to VDD, and the input signals to the OR plane are low. During the evaluation phase (clk = 0), the PMOS loads in the AND plane are ON, and the plane behaves as pseudo-NMOS logic. The PMOS devices must therefore be sized correctly to ensure safe operation when the output stays at a low level. The product terms are evaluated first, and then the outputs. During this evaluation phase, the PLA dissipates static power, mainly in the AND plane, so the total power is increased by this DC component.
This PLA does not need the self-timing technique used previously. It has also been shown to operate fast [19].
When implementing smaller controllers, it is sometimes preferable to use random logic. The implementation then consists of two or more levels of logic gates from a standard cell library. It is much less regular than a PLA structure, but it can have lower power dissipation.
7.4.2 Read Only Memory

Read Only Memory (ROM) is used in many applications. In DSPs, for example, it can be used as a lookup table to store coefficients. It is also often used in VLSI processors as a microcode controller; in this case, the ROM contains the microprogram instructions. A typical micro-ROM size is 2k words of 64 bits, and the read-out cycle of the ROM limits the speed of the processor. Conceptually, the structure of a ROM is quite similar to that of a PLA. Fig. 7.47 shows a simple ROM circuit architecture using NOR logic design. The state of the memory array is retained even if the ROM is not powered.

Figure 7.48 Layout of a ROM memory cell (labels: bit-line in metal1, word-line in metal2, word-line in polysilicon, diffusion).
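The NOR ROM organization can be illustrated with a behavioral read model. This is a minimal sketch that assumes a hypothetical programming pattern `DEVICES` and an ideal one-hot row decoder. In a NOR ROM, each bit-line is a wired NOR of the word-lines that have a transistor on that column. Selecting a row therefore pulls low every bit-line where a device is present, and the inverter sense amplifier turns that low level into a "1".

```python
def nor_rom_read(devices, row):
    """Read one word from a NOR ROM.

    devices[r][c] is True where an NMOS transistor connects word-line r to
    bit-line c (a hypothetical programming pattern).
    """
    word_lines = [int(r == row) for r in range(len(devices))]  # one-hot row decoder
    word = []
    for c in range(len(devices[0])):
        # Wired NOR: the bit-line is pulled low if the selected word-line has
        # a device on this column.
        bit_line = int(not any(word_lines[r] and devices[r][c]
                               for r in range(len(devices))))
        word.append(1 - bit_line)  # inverter used as the sense amplifier
    return word


DEVICES = [[True, False, True],
           [False, False, True]]
print(nor_rom_read(DEVICES, 0))  # [1, 0, 1]
print(nor_rom_read(DEVICES, 1))  # [0, 0, 1]
```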
The ROM can be implemented in both static and dynamic styles. In the static style, pseudo-NMOS logic similar to that of the static PLA can be used. Fig. 7.49 shows an example of a small ROM using the pseudo-NMOS circuit style. The conditioning circuits use PMOS devices with their gates grounded, and the sense amplifier circuit is simply an inverter. The column decoder is also shown; it selects one of the two bit-lines. Node A is initially at VDD. If the selected bit-line is discharged, node A is discharged and the output is pulled up to VDD. The pseudo-NMOS style is easy to design and does not require careful design; however, its power dissipation may be significant due to the DC current. For a relatively large ROM, like the one used in microcontrollers, the power dissipation can be significantly reduced using the low-power techniques of SRAMs*. These include pulse-mode operation using address transition detection, reduced-swing sensing, etc.
*These techniques are discussed in more detail in Chapter 6.
Figure 7.49 Pseudo-NMOS ROM circuitry.
A dynamic version of the ROM is shown in Fig. 7.50. During the precharge phase, clk = 1 and the bit-lines are precharged to VDD - VT, where VT is subject to the body effect. Node A is also precharged by the PMOS transistor Pp. The select lines Sel1 and Sel2 are controlled by a column decoder. All the word-lines are predischarged to ground. During evaluation, clk = 0. If the bit-line is discharged to ground, node A is also discharged, and the output of the inverter I is pulled up. If node A is not discharged, the feedback PMOS transistor Pf maintains the high level at VDD. Since the swing on the heavily loaded bit-line is reduced, the power dissipation on this line is reduced by a factor VDD/(VDD - VT).
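The reduction factor follows from the energy drawn from the supply. Charging a bit-line capacitance C from 0 V to a level V moves a charge C·V, all of which comes from VDD. The energy drawn is therefore C·VDD·V: C·VDD² for full swing and C·VDD·(VDD - VT) for the reduced swing, giving a ratio of VDD/(VDD - VT). The capacitance, supply, and threshold values below are hypothetical illustration values.

```python
def precharge_energy(c_bitline, vdd, v_precharge):
    """Energy drawn from VDD to charge c_bitline from 0 V up to v_precharge.

    The charge moved is c_bitline * v_precharge and it is all taken from the
    VDD supply, so E = c_bitline * VDD * v_precharge.
    """
    return c_bitline * vdd * v_precharge


C_BL = 1e-12  # hypothetical bit-line capacitance: 1 pF
VDD = 3.3     # hypothetical supply voltage (V)
VT = 0.8      # hypothetical threshold including body effect (V)

e_full = precharge_energy(C_BL, VDD, VDD)
e_reduced = precharge_energy(C_BL, VDD, VDD - VT)
print(e_full / e_reduced)  # equals VDD / (VDD - VT), about 1.32
```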
7.4.3 Content Addressable Memory

A Content Addressable Memory (CAM) is an important macrocell of the Translation Lookaside Buffer (TLB) [20] and cache memory [21] circuits of computer systems. The TLB translates the virtual address of a CPU into the physical address, and the cache memory maps the physical address to the memory data.
Figure 7.50 Dynamic ROM circuitry.
A CAM stores tags which can be compared against an input address word (A0 ... Am-1), as shown in Fig. 7.51(a). The CAM sends a match detection signal if the values stored in the CAM array match the input address word. A CMOS implementation of the CAM cell is illustrated in Fig. 7.51(b). It can be read and written just like an ordinary memory cell, and its read/write and decoder circuits are similar to those of a RAM.
A tag word is formed by identical cells repeated in a horizontal array. The write lines are used to write data into the array. The comparison process is as follows. During the precharge phase, the bit-lines are predischarged low, all the write lines are low, and the match line (ML) is precharged high. During the evaluation phase, suppose that a "1" is stored at node A. Assume that the CBL line is held high and the $\overline{CBL}$ line is held low. In this case, the transistors N3 and N1 are OFF, hence the ML line remains high, indicating a match at this bit location. Assume now that CBL is driven low and $\overline{CBL}$ high. The transistor N3 is OFF, but N1 and N2 are ON. The ML line is then discharged, indicating a mismatch at this bit location.
For an array of n tags, there are n match lines (ML(0) ... ML(n-1)). Each match line is common to m cells. If there is a mismatch in any bit of the tag word, the match line is discharged. If all m bits match, the common match line remains high. To detect the match signal on any of the match lines, a dynamic NOR circuit is used, as shown in Fig. 7.52. When the clock is low, the NOR gate is precharged along with the match lines, and the inputs to the NOR gate are predischarged to ground. When the clk signal is high (evaluation phase), the match line of a matching tag, ML(i), stays high and the others are discharged to ground. When the match lines are stable, the eval signal is asserted with clk using self-timing (similar to the PLA case). This keeps the dynamic NOR gate from falsely discharging, because the inputs to the NOR gate must not go high until the data is stable. If one of the match lines stays high, the NOR gate is discharged and the output match signal goes high.

Figure 7.51 (a) CAM array; (b) CMOS CAM cell.
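The search operation described above can be summarized as a logic-level sketch. It assumes a hypothetical tag array `TAGS` and ignores timing: each match line starts precharged high and is discharged by any mismatching cell. The final match signal is the inverted output of the dynamic NOR, which is discharged when any match line stays high.

```python
def cam_search(tags, key):
    """Return (match_lines, match) for a CAM holding `tags` (lists of 0/1 bits)."""
    match_lines = []
    for tag in tags:
        ml = 1  # each match line is precharged high
        for stored, compare in zip(tag, key):
            if stored != compare:
                ml = 0  # a mismatching cell discharges the shared match line
        match_lines.append(ml)
    # Match detection: the dynamic NOR node is discharged if any match line
    # stays high, and the output inverter then drives the match signal high.
    nor_node = 0 if any(match_lines) else 1
    return match_lines, 1 - nor_node


TAGS = [[1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 1, 1]]
print(cam_search(TAGS, [0, 0, 1, 0]))  # ([0, 1, 0], 1)
print(cam_search(TAGS, [1, 1, 0, 0]))  # ([0, 0, 0], 0)
```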
7.5 PHASE LOCKED LOOPS

Phase Locked Loops (PLLs) have many applications in digital and analog systems. In digital systems, on-chip PLLs are needed for the following reasons:
■ To reduce clock skew due to clock distribution. As systems continue to demand higher clock frequencies, the clock skew associated with input buffers and clock distribution becomes a significant design problem, as shown in Fig. 7.53(a). The internal clock drives the output register, which in turn delivers the data to the output pad (through a buffer). The skew between the external and internal clocks is due to the clock tree. As a result, the output data is significantly delayed relative to the external clock, and the clock skew is one of the main contributions to this delay. In Fig. 7.53(b), the internal clock is deskewed by means of a PLL. The PLL should reduce this skew over a wide range of process, temperature and voltage variations;
■ To synchronize data between chips, as shown in Fig. 7.54. The PLL solves the problem of clock skew from chip to chip. An example of such an application is discussed in [22]; and
■ To generate internal clocks with higher frequencies than the external clock (system clock).
PLLs have other applications, such as clock recovery in serial data communications, which are not discussed in this section. Several theoretical references on PLLs are available [23, 24, 25]. This section first introduces the PLL and then discusses its CMOS circuit design for low-power applications.
7.5.1 Charge-Pumped PLL
One interesting PLL configuration is the charge-pumped loop shown in Fig. 7.55. It is a PLL-based frequency multiplier which consists of a Phase Frequency Detector (PFD), a Charge Pump (CP)*, a Loop Filter (LF), a Voltage Controlled Oscillator (VCO), and a programmable frequency divider. The fed-back internal clock is compared with the external clock to detect phase and frequency errors. The outputs of the phase/frequency detector are two digital signals called U (for Up) and D (for Down). The charge pump and loop filter convert these digital signals into an analog control signal suitable for the VCO. The VCO generates an oscillation frequency that is a function of the control signal level. If the PLL generates multiples of the external clock frequency, a frequency divider is inserted between the generated clock and the phase detector.
A simplified diagram of the charge pump and loop filter is shown in Fig. 7.56. It consists of two switchable current sources driving an impedance (the LF). The pulses generated by the PFD block switch the charge pump, which charges or discharges the impedance. The loop filter filters these pulses and produces an analog output signal that controls the VCO.
*The charge pump of the PLL should not be confused with the one used to generate different voltages.
Figure 7.53 PLL clock generation for deskewing: (a) a chip without PLL; (b) a chip with PLL.
Figure 7.55 Block diagram of the PLL.
7.5.2 PLL Circuit Design

This section presents the design of the PLL components. Fig. 7.57 shows the logic diagram of the PFD circuit. It mainly uses static CMOS NAND gates, which results in good performance and low power dissipation. Using the state diagram of Fig. 7.57(c), the circuit operates as follows. The circuit has three states: 1) UP, where the up signal U is asserted when the external clock clk_ext falls; 2) DOWN, where the down signal D is asserted when the internal clock clk falls; and 3) NOP, where the detector does not change the output control signals. In this last state, both U and D are at the zero level. The state changes whenever clk or clk_ext falls. U and D are never activated at the same time.
Consider that clk and clk_ext have the same frequency, but the falling edges of clk_ext (clk) lead the falling edges of clk (clk_ext), respectively. Then U (D) is asserted with a certain duty cycle, while D (U) is never asserted. In this case, the PFD acts as a phase detector.
Consider now the case where clk_ext has a higher frequency than clk. There are then more falling edges of clk_ext than of clk, so U is asserted most of the time. Similarly, when clk has a higher frequency than clk_ext, D is asserted most of the time. In this case, the PFD acts as a frequency detector.
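The three-state behavior described above can be sketched as an event-driven model. The edge times below are hypothetical, and the model treats edges as ideal instants. A falling edge of clk_ext moves the state up and a falling edge of clk moves it down. The state saturates at UP or DOWN, so U and D are never asserted together. The function returns the fraction of time spent with U or D asserted.

```python
def pfd_duty_cycles(ext_edges, int_edges, t_end):
    """Fraction of [0, t_end) spent in the UP and DOWN states of a tri-state PFD.

    ext_edges / int_edges: falling-edge times of clk_ext and of the internal clk.
    State +1 = UP (U asserted), -1 = DOWN (D asserted), 0 = NOP.
    """
    events = sorted([(t, +1) for t in ext_edges] + [(t, -1) for t in int_edges])
    state, t_prev = 0, 0
    time_up = time_down = 0
    for t, step in events:
        if state == 1:
            time_up += t - t_prev
        elif state == -1:
            time_down += t - t_prev
        # Saturation at +1/-1 keeps U and D from being asserted together.
        state = max(-1, min(1, state + step))
        t_prev = t
    # Account for the interval after the last edge.
    if state == 1:
        time_up += t_end - t_prev
    elif state == -1:
        time_down += t_end - t_prev
    return time_up / t_end, time_down / t_end


# Same frequency, clk_ext leading by 2 time units: phase detection.
print(pfd_duty_cycles(range(0, 100, 10), range(2, 100, 10), 100))  # (0.2, 0.0)
# clk_ext at twice the frequency of clk: U is asserted most of the time.
print(pfd_duty_cycles(range(0, 100, 5), range(1, 100, 10), 100))   # (0.6, 0.0)
```

In the first case, the U duty cycle is proportional to the phase error. In the second, U dominates because the faster clock produces more falling edges. The charge pump would convert these duty cycles into a net charging or discharging current.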
The U and D signals generated by the PFD are connected to the charge pump circuit of Fig. 7.58(a). When the signal U (D) is asserted, the pull-up PMOS (pull-down NMOS) transistor charges (discharges) the output, respectively. Another variation of the charge pump circuit is shown in Fig. 7.58(b). Here, two transistors are added as current sources biased by a current mirror circuit. In this configuration, the output current of the charge pump can be adjusted by controlling the current mirror.
The monolithic implementation of the filter of Fig. 7.56 is shown in Fig. 7.59. The two capacitors C1 and C2 are on the order of tens of pF and are made with the NMOS transistors NC1 and NC2. The resistor is made with a transmission gate in the closed (ON) state. It can also be implemented with an N-well implant available in the CMOS process. The capacitor C2 is added in parallel to the simple RC (R-C1) low-pass filter to form a second-order filter. In this case, the stability of the system is maintained even with the process variations of these on-chip components. Note that these capacitors can occupy a large portion of the PLL area.
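The filter described above is a series R-C1 branch with C2 in parallel. Its impedance is Z(s) = (1 + sRC1) / (s(C1 + C2)(1 + sRC1C2/(C1 + C2))), which has a zero at 1/(2πRC1) and an extra pole at 1/(2πR·C1C2/(C1 + C2)). The sketch below evaluates both. The component values are hypothetical, chosen only to be "tens of pF" as stated in the text.

```python
import math


def loop_filter_impedance(r, c1, c2, f):
    """Impedance of (R in series with C1) in parallel with C2 at frequency f (Hz)."""
    s = 2j * math.pi * f
    return 1 / (1 / (r + 1 / (s * c1)) + s * c2)


def loop_filter_zero_pole(r, c1, c2):
    """Zero set by R-C1 and the extra pole added by C2 (both in Hz)."""
    f_zero = 1 / (2 * math.pi * r * c1)
    f_pole = 1 / (2 * math.pi * r * (c1 * c2 / (c1 + c2)))
    return f_zero, f_pole


R, C1, C2 = 10e3, 50e-12, 10e-12  # hypothetical values: 10 kOhm, 50 pF, 10 pF
fz, fp = loop_filter_zero_pole(R, C1, C2)
print(f"zero ~ {fz / 1e3:.0f} kHz, pole ~ {fp / 1e6:.2f} MHz")
```

The pole-to-zero ratio is (C1 + C2)/C2. Well above the pole, C2 dominates, and the filter smooths the charge-pump pulses before they reach the VCO.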
The charge pump and filter generate a control voltage for the VCO. One important parameter of the VCO is the VCO gain, which is the slope of its frequency versus control-voltage characteristic. A linear characteristic is, in general, desirable. The VCO is usually implemented using a ring oscillator, as shown in Fig. 7.60. A series connection of delay inverter cells forms a tapped delay line, which oscillates with a frequency determined by the delay time of the cell and the (odd) number of stages. The delay of the cell is controlled by a current, which in turn is controlled by the control voltage Vc. Vc modulates the ON resistance of the pull-down N1 and, through the current mirror, that of the pull-up P1. All the devices of the VCO should be oriented in the same direction and have redundant contacts to reduce the jitter due to process variations. In the VCO of Fig. 7.60, the maximum frequency is achieved at the maximum control voltage. Typical values of the VCO gain at low power supply voltages range from 10 MHz/V to 100 MHz/V, depending on the number of stages and the technology. Note that the bandwidth (tuning range) of the VCO presented so far is limited.
The VCO of Fig. 7.61 has an excellent bandwidth characteristic, since it can generate a wide range of frequencies [26]. It is used for video signal processors and covers a wide range of applications. Its frequency can change by nearly one order of magnitude, from 50 MHz to 350 MHz. In Fig. 7.61, turning 8 CMOS transmission gates (TGs) ON and OFF with control signals selects the number of ring oscillator stages among eight values, ranging from 7 to 61. Each stage of the ring oscillator combines an inverter in parallel with a current-controlled inverter. The inverter increases the oscillation frequency of the VCO, whereas the current-controlled inverter permits tuning of the frequency.
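The dependence of the ring oscillator frequency on stage delay and stage count can be written as f = 1/(2·N·td). A transition must travel around the ring twice per period. The sketch below applies this first-order relation and estimates the VCO gain as the slope between measured points. The 200 ps stage delay and the (Vc, f) samples are hypothetical. With them, selecting 7 versus 61 stages spans roughly 357 MHz down to 41 MHz.

```python
def ring_oscillator_frequency(n_stages, t_delay):
    """First-order estimate f = 1 / (2 * N * t_d) for an N-stage ring."""
    if n_stages < 3 or n_stages % 2 == 0:
        raise ValueError("a ring oscillator needs an odd number of stages (>= 3)")
    return 1.0 / (2 * n_stages * t_delay)


def vco_gain(v_points, f_points):
    """Slopes (Hz/V) between consecutive points of a measured f(Vc) curve."""
    return [(f1 - f0) / (v1 - v0)
            for v0, v1, f0, f1 in zip(v_points, v_points[1:], f_points, f_points[1:])]


T_D = 200e-12  # hypothetical per-stage delay
for n in (7, 61):
    print(n, f"{ring_oscillator_frequency(n, T_D) / 1e6:.0f} MHz")

# Hypothetical linear characteristic: 50 MHz/V everywhere.
print(vco_gain([1.0, 1.5, 2.0], [100e6, 125e6, 150e6]))
```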
The generated clock frequency can be N times the external clock frequency (reference frequency). This clock then feeds the clock driver and tree. Since the PLL discussed here is intended to be integrated on-chip, it is sensitive to the noise generated on the power lines (called power-supply-induced clock jitter). If the power supply changes by 100 mV, the skew or phase error can become significant before the PLL has time (tens of clock cycles) to correct it [27]. One way to reduce this effect is to dedicate an analog power supply pin to the VCO and the charge pump. At the circuit level, Young proposed a new VCO delay cell to reduce the phase error [27].

Figure 7.60 VCO using a current-controlled CMOS ring oscillator.
Figure 7.61 VCO with selectable characteristics.
Another VCO alternative is shown in Fig. 7.62. It is similar to the Voltage-Controlled Delay Line (VCDL) [28]. The control voltage, Vc, is used to vary the effective load seen by each inverter output. The frequency versus control-voltage characteristic of this VCO has a negative slope, so the minimum oscillation frequency is reached at the maximum control voltage, which is limited by VDD. Therefore, the minimum frequency increases as VDD is reduced. A positive slope is, in general, desirable, so that the minimum frequency is not set by VDD.
The frequency divider can be implemented using toggle flip-flops. Fig. 7.63 shows an example of a divider with division ratios of 1, 1/2, 1/4, and 1/8. The PLL discussed so far is not completely digital. Only the PFD, the charge pump and the frequency divider are digital, while the LF and the VCO are analog and operate as continuous-time systems.
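A divider built from toggle flip-flops, as described above, can be sketched as a ripple chain. Each stage toggles on a falling edge of its input, so each stage halves the frequency of the one before it. This is a sampled-waveform model with idealized, zero-delay flip-flops. Selecting among the clock and the three stage outputs (the selection multiplexer is not modeled) would give the ratios 1, 1/2, 1/4, and 1/8.

```python
def toggle_divider(clock, n_stages):
    """Ripple chain of toggle flip-flops, each toggling on a falling input edge.

    clock: list of 0/1 samples. Returns one output waveform per stage; stage k
    runs at 1/2**(k+1) of the clock frequency.
    """
    q = [0] * n_stages
    prev = [clock[0]] + [0] * (n_stages - 1)  # previous input level of each stage
    waves = [[] for _ in range(n_stages)]
    for level in clock:
        stage_in = level
        for k in range(n_stages):
            if prev[k] == 1 and stage_in == 0:  # falling edge on this stage's input
                q[k] ^= 1
            prev[k] = stage_in
            stage_in = q[k]  # this stage's output clocks the next stage
            waves[k].append(q[k])
    return waves


def transitions(wave):
    return sum(1 for a, b in zip(wave, wave[1:]) if a != b)


clock = [1, 0] * 16  # 16 clock periods
print([transitions(w) for w in toggle_divider(clock, 3)])  # [16, 8, 4]
```

Sixteen transitions correspond to 8 output periods, half the 16 input periods. Each following stage halves the count again.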
7.5.3 Low-Power Design

In sleep mode, the on-chip PLL may be controlled for low-frequency operation, or it may be disabled to reduce its power dissipation to the leakage currents.
Figure 7.64 A VCO controlled by an enable signal for low-power mode.
For example, the PLL can be disabled by shutting down the VCO and disabling the external clock. Fig. 7.64 shows the same VCO as Fig. 7.62, but with one inverter transformed into a two-input NAND gate. One of the inputs is controlled by the Enable signal, which shuts down the PLL when it is low. The NAND gate can be used in any of the VCOs presented previously. The enable signal can also be used to disable any current sources in the PLL to eliminate DC current. A typical power dissipation of a PLL at 3.3 V is in the range of tens of mW, depending on the frequency.
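The enable mechanism above can be illustrated with a unit-delay logic model. Replacing one inverter of the ring with NAND(Enable, feedback) keeps the loop inverting when Enable = 1. When Enable = 0, the NAND output is forced high and the ring freezes. This is a toy discrete-time model, not a circuit simulation. The stage count, the starting levels, and the single unit delay per stage are assumptions for illustration.

```python
def ring_with_enable(enable, n_stages=5, steps=60):
    """Unit-delay sketch of a ring oscillator whose first stage is a NAND gate.

    state[0] = NAND(enable, last stage output); the other stages are inverters.
    Returns the waveform of the NAND output.
    """
    state = [i % 2 for i in range(n_stages)]  # arbitrary starting levels
    history = []
    for _ in range(steps):
        new = [1 - (enable & state[-1])]                       # NAND stage
        new += [1 - state[i - 1] for i in range(1, n_stages)]  # inverter stages
        state = new
        history.append(state[0])
    return history


def toggles(wave):
    return sum(1 for a, b in zip(wave, wave[1:]) if a != b)


print(toggles(ring_with_enable(1)[10:]) > 0)  # True: the ring oscillates
print(toggles(ring_with_enable(0)[10:]))      # 0: the ring is frozen
```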
7.6 CHAPTER SUMMARY

This chapter has presented the design of several subsystems used in VLSI chips. Many circuit alternatives that trade area, speed and power have been discussed. The reader can construct these options and compare their performance in terms of power, delay and area. Particular emphasis has been placed on the power dissipation issue. Several building blocks of VLSI chips using advanced circuit techniques have also been investigated. These include:
■ High-speed addition.
■ Multiplication techniques.
■ PLL and clock deskewing techniques.
REFERENCES
[1] J. Mori, et al., "A 10-ns 54 x 54-b Parallel Structured Full Array Multiplier with 0.6-μm CMOS Technology," IEEE Journal of Solid-State Circuits, vol. 26, no. 4, pp. 600-606, April 1991.
[2] J. Sklansky, "An Evaluation of Several Two-Summand Binary Adders," IRE Transactions on Electronic Computers, vol. EC-9, pp. 213-226, June 1960.
[3] J. Sklansky, "Conditional-Sum Addition Logic," IRE Transactions on Electronic Computers, vol. EC-9, pp. 226-231, June 1960.
[4] I. S. Abu-Khater, R. H. Yan, A. Bellaouar, and M. I. Elmasry, "A 1-V Low-Power High-Performance 32-b Conditional Sum Adder," IEEE Symposium on Low-Power Electronics, Tech. Dig., San Diego, pp. 68-67, October 1994.
[5] T. Sato, et al., "An 8.6-ns 112-b Transmission Gate Adder with a Conflict-Free Bypass Circuit," IEEE Journal of Solid-State Circuits, vol. 27, no. 4, pp. 657-659, April 1992.
[6] K. Ueda, H. Suzuki, K. Suda, Y. Tsujihashi, and H. Shinohara, "A 64-bit Adder by Pass Transistor BiCMOS Circuit," IEEE Custom Integrated Circuits Conference, Tech. Dig., pp. 12.2.1-12.2.4, May 1993.
[7] K. Hwang, "Computer Arithmetic: Principles, Architecture, and Design," John Wiley and Sons, 1979.
[8] J. J. F. Cavanagh, "Computer Science Series: Digital Computer Arithmetic," McGraw-Hill Book Co., 1984.
[9] M. Nagamatsu, S. Tanaka, J. Mori, T. Noguchi, and K. Hatanaka, "A 16-ns 32x32-bit CMOS Multiplier with an Improved Parallel Structure," IEEE Custom Integrated Circuits Conference, Tech. Dig., pp. 10.3.1-10.3.4, May 1989.
[10] N. Ohkubo, M. Suzuki, T. Shinbo, T. Yamanaka, A. Shimizu, K. Sasaki, and Y. Nakagome, "A 4.4-ns CMOS 54x54-b Multiplier using Pass-Transistor Multiplexer," IEEE Custom Integrated Circuits Conference, Tech. Dig., pp. 599-602, May 1994.
[11] R. Bechade, et al., "A 32b 66MHz Microprocessor," IEEE International Solid-State Circuits Conference, Tech. Dig., pp. 208-209, February 1994.
[12] C. A. Mead and L. A. Conway, "Introduction to VLSI Systems," Addison-Wesley, 1980.
[13] R. W. Sherburne, et al., "Data Path Design for RISC," Proc. Conf. Advanced Research in VLSI, pp. 53-62, 1982.
[14] R. W. Sherburne, et al., "A 32-bit NMOS Microprocessor with a Large Register File," IEEE Journal of Solid-State Circuits, vol. SC-19, no. 5, pp. 682-689, October 1984.
[15] K. J. O'Connor, "The Twin-Port Memory Cell," IEEE Journal of Solid-State Circuits, vol. SC-22, no. 5, pp. 712-720, October 1987.
[16] R. D. Jolly, "A 9-ns, 1.4-Gigabyte/s, 17-Ported CMOS Register File," IEEE Journal of Solid-State Circuits, vol. 26, no. 10, pp. 1407-1412, October 1991.
[17] H. Shinohara, et al., "A Flexible Multiport RAM Compiler for Data Path," IEEE Journal of Solid-State Circuits, vol. 26, no. 3, pp. 343-349, March 1991.
[18] A. R. L-, "A Low-Power PLA for a Signal Processor," IEEE Journal of Solid-State Circuits, vol. 26, no. 2, pp. 107-115, February 1991.
[19] G. M. Blair, "PLA Design for Single-Clock CMOS," IEEE Journal of Solid-State Circuits, vol. 27, no. 8, pp. 1211-1213, August 1992.
[20] H. Kadota, et al., "A 32-bit Microprocessor with On-Chip Cache and TLB," IEEE Journal of Solid-State Circuits, vol. SC-22, no. 5, pp. 800-807, October 1987.
[21] A. J. Smith, "Cache Memories," Computing Surveys, vol. 14, pp. 473-530, September 1982.
[22] L. Ashby, "ASIC Clock Distribution using a Phase Locked Loop (PLL)," IEEE International ASIC Conference and Exhibit, Tech. Dig., pp. P1.6.1-P1.6.3, September 1991.
[23] F. M. Gardner, "Phase Lock Techniques," John Wiley and Sons, 1979.
[24] F. M. Gardner, "Charge-Pump Phase-Locked Loops," IEEE Transactions on Communications, vol. COM-28, no. 11, pp. 1849-1858, November 1980.
[25] R. E. Best, "Phase-Locked Loops," McGraw-Hill, 1984.
[26] J. Goto, et al., "A Programmable Clock Generator with 50 to 350 MHz Lock Range for Video Signal Processors," IEEE Custom Integrated Circuits Conference, Tech. Dig., pp. 4.4.1-4.4.4, May 1993.
[27] I. A. Young, J. K. Greason, and K. L. Wong, "A PLL Clock Generator with 5 to 110 MHz of Lock Range for Microprocessors," IEEE Journal of Solid-State Circuits, vol. 27, no. 11, pp. 1599-1607, November 1992.
[28] M. G. Johnson and E. L. Hudson, "A Variable Delay Line PLL for CPU-Coprocessor Synchronization," IEEE Journal of Solid-State Circuits, vol. 23, no. 5, pp. 1218-1223, October 1988.
8
LOW-POWER
VLSI DESIGN METHODOLOGY
This chapter presents Low-Power (LP) design methodologies at several abstraction levels, such as the physical, logical, architectural, and algorithmic levels. All the power reduction techniques discussed are related to dynamic power dissipation. It is shown that LP techniques at the high levels of the design (algorithmic and architectural) lead to power savings of several orders of magnitude. Many examples are included to give the reader a quantitative picture of LP issues. Several LP techniques, particularly at the circuit level, have already been discussed in Chapters 4, 6, and 7, including those related to static power considerations. They are not reconsidered in this chapter. Power estimation techniques at the circuit, logical, architectural and behavioral levels are also overviewed. Power analysis at a high level allows early prediction and optimization of the power of a system. The LP concepts discussed in Chapter 4, such as switching activity and glitching, are used throughout this chapter.
8.1 LP PHYSICAL DESIGN

There are several techniques to reduce the power at the physical design (layout) level. Some of these issues have been discussed in Chapter 4 for full-custom and semi-custom designs. In this section we present two approaches for low-power physical design.
8.1.1 Floorplanning
Floorplanning of a circuit is the first step in VLSI layout design. It allocates space on a chip for a given set of modules. A module can be rigid or flexible. A rigid module is, e.g., one that is already in the library with known dimensions and power dissipation. A flexible module is, e.g., one that has not yet been designed and has a list of parameters, such as different shapes and power consumptions for its feasible implementations. A floorplanner for low-power design should choose a suitable implementation for each module such that the total power/area of the chip is optimized [1].
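The implementation-selection step above can be posed in several ways. One simple formulation, assumed here for illustration and not the method of [1], minimizes total power subject to a total-area budget by exhaustive search over each flexible module's options. The module names and (area, power) figures are hypothetical, in arbitrary units, and the search ignores the actual geometric packing.

```python
from itertools import product


def choose_implementations(modules, area_budget):
    """Pick one (area, power) option per flexible module so that total power is
    minimal while total area stays within area_budget (exhaustive search)."""
    names = list(modules)
    best = None
    for combo in product(*(range(len(modules[n])) for n in names)):
        area = sum(modules[n][k][0] for n, k in zip(names, combo))
        power = sum(modules[n][k][1] for n, k in zip(names, combo))
        if area <= area_budget and (best is None or power < best[0]):
            best = (power, area, dict(zip(names, combo)))
    return best  # None if no combination fits the budget


# Hypothetical modules: list of (area, power) implementations, arbitrary units.
MODULES = {"alu": [(4, 10), (6, 7)], "rf": [(3, 5), (5, 3)], "ctl": [(2, 4), (3, 2)]}
print(choose_implementations(MODULES, area_budget=12))
```

With a 12-unit budget, the search trades the larger, lower-power ALU and controller options against the smaller register file. Exhaustive search grows exponentially with the number of modules, so a real floorplanner would use heuristics instead.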
8.1.2 Placement and Routing

The placement and routing of a VLSI circuit is performed on standard cells, gate arrays, functional blocks, etc. All the different modules are already laid out and well characterized in the library. Traditionally, placement refers to the process of placing modules to minimize area and delay. Placement for low power instead minimizes the sum of switching activity-capacitance products, whereas delay minimization minimizes the wire capacitance itself. After placement, routing connects the modules with wires. High-switching-activity wires should be kept short and routed on the layer with lower parasitic capacitance. A CAD tool for such placement has already been developed [2].
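The difference between the two placement objectives above shows up even in a toy example. The sketch assumes wire capacitance proportional to wire length, with a hypothetical capacitance per unit length. Net names, activities, and lengths are likewise hypothetical. Both placements have the same total wire length, so a pure wire-capacitance objective cannot distinguish them. The activity-weighted cost, however, prefers the one that keeps the busy net short.

```python
def switched_capacitance_cost(nets, lengths, c_per_unit=1.0):
    """Sum of activity * wire capacitance over all nets.

    nets:    dict net name -> switching activity
    lengths: dict net name -> wire length (arbitrary units)
    Wire capacitance is assumed proportional to length (hypothetical model).
    """
    return sum(alpha * c_per_unit * lengths[name] for name, alpha in nets.items())


NETS = {"ctrl": 0.5, "data": 0.1}       # hypothetical switching activities
PLACEMENT_A = {"ctrl": 10, "data": 2}   # busy net routed long
PLACEMENT_B = {"ctrl": 2, "data": 10}   # busy net kept short

for name, lengths in (("A", PLACEMENT_A), ("B", PLACEMENT_B)):
    print(name, sum(lengths.values()), switched_capacitance_cost(NETS, lengths))
```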
8.2 LP GATE-LEVEL DESIGN

The low-power design methodology should also be applied to logic design. To achieve this goal, power is traded for speed and area. In this section, we discuss a number of techniques to reduce the switching activity and internal capacitances during the technology-independent and technology-dependent phases of logic design.
8.2.1 Logic Minimization and Technology Mapping

The area and power optimization of logic structures (both combinational and sequential) has matured considerably, and the power optimization task benefits from these techniques. The objective of logic minimization is to simplify the Boolean function. For low-power design, the signal switching activity is minimized by restructuring a logic circuit during the technology-independent phase [3]. Decisions regarding the power supply voltage and the clock frequency are assumed to have already been made at the higher levels of abstraction. The power minimization is constrained by the delay, but the area may increase. During this phase of logic minimization, the function to be minimized is

$\sum_i C_i P_i (1 - P_i)$ (8.1)

where $P_i$ is the probability of node i being a "1", $(1 - P_i)$ is the probability that node i is a "0", and $C_i$ is the capacitance of this node. For more information on this model, see Section 8.5.2.1. To minimize Eq. (8.1), one first evaluates the current value of $P_i$ and then changes it by making $P_i$ close to 0 or close to 1. In addition, [3] assumes a zero-delay approximation, which means that the glitching power is neglected.
To minimize the switching activity, the following techniques can be used:
■ Use don't-cares to move the signal probability $P_i$ of a function closer to 0 or 1. Indeed, the signal probability of a gate can be changed by adding points from the don't-care set to its ON-set or OFF-set.
■ Collapse nodes that are not on the critical path, so that the intermediate signal lines are implemented as a single node. The delay of the collapsed path may increase, but this does not affect the overall performance of the circuit.
Power dissipation can be reduced by as much as 60%, at the expense of an 8% area increase and with no delay degradation [3]. More typical power reductions are in the range of 10-20% [4].
The technology mapping step for low power refers to the process of transforming a logic function into a technology-dependent (e.g., CMOS) circuit with minimized power consumption, using a target technology. The first step in technology mapping is to decompose each logic function into two-input gates. The objective of this decomposition is to minimize the total power dissipation by reducing the total switching activity. Fig. 8.1 shows an example of a four-input AND gate decomposed into two different implementations. The figure also shows the probability of each input being at logic "1" and the switching activity at each internal node. Primary inputs are assumed to be uncorrelated. For a two-input AND gate with inputs i and j, the switching activity is given by
$\alpha = (1 - P_i P_j)\, P_i P_j$ (8.2)

We also assume that the gate delays are zero, so that the power due to the glitching phenomenon is ignored. The total switching activities for implementations 1 and 2 are 0.888 and 1.056, respectively. Therefore, implementation 1 is better than implementation 2. This decomposition problem was addressed in [5, 6]. In [5], the power dissipation associated with glitching is neglected, while in [6] it is not. Taking into account the power dissipation of glitches is very important, as discussed in Section 8.2.2.
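Eq. (8.2) can be applied node by node to compare decompositions. The sketch below assumes uncorrelated inputs and zero gate delay, as in the text; the helper names are hypothetical, and the equiprobable inputs are not the probabilities of Fig. 8.1, so the totals differ from 0.888 and 1.056.

```python
def and2(p_i, p_j):
    """Two-input AND of independent inputs.

    Returns (output probability, output switching activity from Eq. (8.2)).
    """
    p_out = p_i * p_j
    return p_out, (1.0 - p_out) * p_out


def chain_activity(p):
    """Total activity of the chain decomposition ((a.b).c).d."""
    p_ab, s1 = and2(p[0], p[1])
    p_abc, s2 = and2(p_ab, p[2])
    _, s3 = and2(p_abc, p[3])
    return s1 + s2 + s3


def tree_activity(p):
    """Total activity of the balanced decomposition (a.b).(c.d)."""
    p_ab, s1 = and2(p[0], p[1])
    p_cd, s2 = and2(p[2], p[3])
    _, s3 = and2(p_ab, p_cd)
    return s1 + s2 + s3


probs = [0.5, 0.5, 0.5, 0.5]  # hypothetical input probabilities
print(chain_activity(probs), tree_activity(probs))
# prints: 0.35546875 0.43359375
```

Both totals include the final output node; since the output probability is the same in both decompositions, that term does not affect the comparison. With other input probabilities the ranking can change (for all inputs at 0.9 the balanced tree wins), which is why the decomposition has to be chosen per circuit.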
Technology mapping is an important step of logic optimization for standard-cell and gate-array (or sea-of-gates) circuit design. All the cells in the library are characterized in terms of area and speed. For low-power design, another parameter must be added: the characterization of the internal power of each gate and of its output parasitic capacitance. Hence, the process of technology mapping is to search, using a target library, for the best possible implementation under constraints such as power, area, and delay.
In this section we do not consider algorithms for technology mapping; the reader can consult references [5, 7]. We illustrate the concept of technology mapping with the following example. Fig. 8.2 shows two implementations of the logic circuit of Fig. 8.2(a). The first implementation [Fig. 8.2(a)] is the minimal-area design using an OAI (OR-AND-INVERT) gate. The second implementation [Fig. 8.2(b)] is the minimal-power design, where the high-switching node N of Fig. 8.2(a) is hidden inside a more complex gate.
Thus, the process of technology mapping is first to decompose the logic function such that the total switching activity is minimized, and then to hide any high-switching-activity node within complex gates so that the capacitance of that node is minimized. However, making a gate too complex increases its delay, trading speed for low power. Typical reductions in power dissipation are on the order of 20%, without any degradation in performance but at the expense of a small area penalty.
The quality of the targeted cell library can considerably impact the results of mapping [8]. For example, the availability of cells with different drive strengths and double-rail outputs (signal and its complement) gives more flexibility for logic optimization. A good library can result in a 20-50% reduction in power dissipation.
8.2.2 Spurious Transitions Reduction

Due to the finite delays of logic gates, signal races in static logic designs can result in dynamic hazards. Hence, a node can have several transitions in one clock cycle before stabilizing to the correct logic level. These unnecessary switching transitions (glitches) can account for 20-40% of the power dissipation [9, 10, 11].
To reduce this power, the first approach is to balance the path delays by changing the logic structure (e.g., using a tree), as explained in Section 4.5.5. Another technique is to balance the path delays by sizing down the gates in the fast paths [12]; however, this approach can increase the delay of the circuit. Inserting buffers (delay elements) in the fast paths can also balance the delays, but the added buffers increase the power dissipation.
Another technique employs self-timing to reduce the logic depth and hence the glitching power [9, 11]. The self-timed circuitry must save more power than it introduces. An adder is an example of a circuit that exhibits spurious transitions: its sum signals can have false transitions before they become stable. If the load capacitances on the outputs are relatively large, the power due to the glitches can be significant.
A conventional self-timed method for an adder is shown in Fig. 8.3. A Transition Detector (TD), similar to the one discussed for SRAMs in Chapter 6, is used. For each pair of inputs ($A_i$ and $B_i$) there is one transition detector; if A and B are both n bits wide, then n TDs are required for the parallel adder. For any transition at the inputs, the TD generates a pulse for the self-timed function. This self-timed circuit delays the pulse by an amount equal to the critical path of the adder. The delayed pulse then feeds the clock of a D flip-flop (DFF) or a gated circuit for the sum function. Consequently, the output sums are not switched until they are evaluated. However, the additional circuitry in the conventional approach can consume more power than it may save.

[Fig. 8.3 blocks: self-timed function, gated function, parallel-adder function]
Another approach based on self-timing to reduce spurious transitions was proposed in [11]. Fig. 8.4 shows a parallel adder using simple self-timed circuitry. When input signals are written into registers A and B, a single register bit is used to generate an "Input Valid" signal for the self-timed function. For an n-bit parallel adder, only a one-bit register is required, as shown in Fig. 8.4. The self-timed function is implemented using a series of dual-rail inverters. Two enable signals, E and $\bar{E}$, are generated by the self-timed circuit; they feed the gated sum XOR gates. The enable signal E also controls the one-bit register to disable the Input Valid signal. This technique has resulted in a 25% power reduction [11].
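The adder glitches and the benefit of gating the sum outputs can be illustrated with a deliberately crude ripple-carry model. In this Python sketch (all names hypothetical; it is not the circuit of Fig. 8.3 or Fig. 8.4), each carry stage has one unit of delay and the sum XORs are treated as delay-free. `ungated` counts every transition the sum outputs make while the carry ripples; `gated` counts only the net change, as if the outputs were enabled once the self-timed delay had elapsed.

```python
def bit(x, i):
    return (x >> i) & 1


def majority(a, b, c):
    return (a & b) | (a & c) | (b & c)


def settled_carries(a, b, n):
    """Carry into each bit position after the ripple has finished (c[0] = 0)."""
    c = [0] * (n + 1)
    for i in range(n):
        c[i + 1] = majority(bit(a, i), bit(b, i), c[i])
    return c


def sum_bits(a, b, c, n):
    return [bit(a, i) ^ bit(b, i) ^ c[i] for i in range(n)]


def sum_transitions(a0, b0, a1, b1, n):
    """Sum-output transitions when the inputs change from (a0, b0) to (a1, b1).

    Timing model (an assumption): one unit of delay per carry stage and
    delay-free sum XORs. Returns (ungated, gated).
    """
    c = settled_carries(a0, b0, n)
    start = sum_bits(a0, b0, c, n)
    sums = start
    ungated = 0
    for _ in range(n + 1):                  # n + 1 steps: the ripple settles
        new_sums = sum_bits(a1, b1, c, n)   # XORs see the current carries
        ungated += sum(s != t for s, t in zip(sums, new_sums))
        sums = new_sums
        # All carry stages update together from the previous step's carries.
        c = [0] + [majority(bit(a1, i), bit(b1, i), c[i]) for i in range(n)]
    gated = sum(s != t for s, t in zip(start, sums))
    return ungated, gated


# 4-bit example: 0 + 0 -> 15 + 1, so the carry ripples through every stage.
print(sum_transitions(0, 0, 0b1111, 0b0001, 4))  # (6, 0)
```

In the example, the sum bits toggle six times even though the final 4-bit sum equals the initial one, so a gated output would not switch at all.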
[Fig. 8.4 blocks: parallel adder, gated output XOR gate]
8.2.3 Precomputation-Based Power Reduction

Consider the original circuit of Fig. 8.5(a). R1 and R2 are two registers at the input and output of a combinational logic block. The idea of precomputation is to pre-evaluate the output values of the circuit one clock cycle before they are required, use them to disable part of the input register R1, and thereby reduce the internal switching activity in the succeeding clock cycle [13]. Fig. 8.5(b) illustrates a simplified architecture of the precomputation concept. This technique can be applied to several kinds of circuits, such as Finite State Machines (FSMs), pipelined circuits, etc.
To illustrate this technique, consider the example of an n-bit comparator that compares two n-bit numbers A and B and computes the function F, which indicates that A > B. Fig. 8.6 shows the application of the precomputation technique to the comparator. If the most significant bits, $A_{n-1}$ and $B_{n-1}$, are different, then F can be computed by the 1-bit MSB comparator alone, and registers R2 and R3 are disabled. Therefore, the (n-1)-bit comparator is shut down. If the inputs have a uniform probability of 0.5, the enable signal has a probability of 0.5 of being at logic "1" or "0". Therefore, for a relatively large n, the power saving can be quite significant even if we include the power of the additional circuitry.
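The precomputation condition for the comparator can be checked exhaustively. The sketch below is a behavioural model with hypothetical names: `enable` corresponds to the condition under which registers R2 and R3 load new values, and the count confirms that, for uniform inputs, the lower comparator is idle in half of the cycles.

```python
def compare_with_precompute(a, b, n):
    """Return (F, enable) for F = (A > B) with n-bit unsigned A and B.

    enable = 0: the MSBs differ, so F follows from the MSBs alone and the
    registers feeding the (n-1)-bit comparator (R2 and R3) can be disabled.
    enable = 1: the MSBs are equal, so the lower n-1 bits decide F.
    """
    msb_a = (a >> (n - 1)) & 1
    msb_b = (b >> (n - 1)) & 1
    if msb_a != msb_b:
        return msb_a > msb_b, 0
    mask = (1 << (n - 1)) - 1
    return (a & mask) > (b & mask), 1


# Exhaustive check over all 8-bit pairs (uniform, independent inputs).
n = 8
disabled = 0
for a in range(1 << n):
    for b in range(1 << n):
        f, enable = compare_with_precompute(a, b, n)
        assert f == (a > b)
        disabled += enable == 0
print(disabled / (1 << (2 * n)))  # 0.5
```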
This precomputation technique can be synthesized during logic optimization. The selection of the subset of input signals for which the output is precomputed is critical for power savings; otherwise, the additional circuitry can dissipate a relatively large amount of power. Note that the added logic slightly increases the area of the circuit and may also increase the clock cycle. The precomputation technique can be applied to multiple-output functions. However, if the logic has a large number of outputs, it may be worthwhile to apply the precomputation technique selectively to a small number of complex outputs. This selective partitioning adds duplicated combinational logic and registers, which may offset the power savings.
8.3 LP ARCHITECTURE-LEVEL DESIGN
In this section, architecture also means Register Transfer Level (RTL). The architecture uses a set of primitives such as adders, multipliers, ROMs, register files, etc. RTL synthesis programs are used to convert an RTL description into a set of registers and combinational logic. As will be shown in this section, low-power techniques can have a more significant impact at the architecture level than at the gate level. The techniques discussed to reduce the power dissipation are: parallelism, pipelining, distributed processing, and power management.
8.3.1 Parallelism
Parallelism can be used to reduce the power dissipation at the expense of area while maintaining the same throughput [10]. To illustrate this, the quantitative example of Fig. 8.7 is considered. In Fig. 8.7(a), a register supplies two 16-bit operands to a 16 × 16 multiplier. We refer to this architecture as the reference one and use the ref notation for its frequency, power supply voltage, power dissipation, etc. This register is clocked at a maximal frequency $f_{ref}$ = 50 MHz. We assume that the worst-case delay of the multiplication is 20 ns at a power supply voltage of $V_{ref}$ = 3.3 V. It is clear that we cannot reduce $V_{ref}$ to reduce the
throughput as in the case of Fig. 8.7(a). The input registers are clocked at $f_{ref}/2$ = 25 MHz. Therefore, the power supply can be reduced to achieve a worst-case delay of 40 ns. With the same 16 × 16 multiplier, the power supply can be reduced from $V_{ref}$ = 3.3 V to 1.8 V ($V_{ref}/1.83$). This value can be determined from the simulation of the two architectures. The effective capacitance has increased by a factor of 2 due to the duplication. However, due to the extra routing to both multipliers, the effective capacitance is around $2.2\,C_{ref}$. Thus, the estimated power dissipation is given by

$$ P_{par} = C_{par} V_{par}^2 f_{par} = (2.2\,C_{ref}) \left( \frac{V_{ref}}{1.83} \right)^2 \frac{f_{ref}}{2} $$

Hence

$$ P_{par} \approx 0.33\, P_{ref} $$

Thus, the power dissipation is significantly reduced.
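The arithmetic behind the 0.33 factor follows the switching-power model $P = C V^2 f$. This one-function sketch uses only the numbers quoted above.

```python
def power_ratio(cap_ratio, volt_ratio, freq_ratio):
    """P / P_ref under the switching-power model P = C * V**2 * f."""
    return cap_ratio * volt_ratio ** 2 * freq_ratio


# Two-way parallel multiplier: about 2.2x capacitance,
# supply 3.3 V -> 3.3 V / 1.83, and each copy clocked at f_ref / 2.
p_par = power_ratio(2.2, 1 / 1.83, 0.5)
print(f"P_par = {p_par:.2f} P_ref")  # P_par = 0.33 P_ref
```

The same function applies to the pipelined case of Section 8.3.2 with a frequency ratio of 1: there, the quoted reduction of about 2.8 corresponds to a capacitance ratio of roughly $1.83^2 / 2.8 \approx 1.2$. That value is inferred from the numbers, not stated in the text.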
The key to this power saving is the duplication of the hardware in a parallel configuration. In general, N processors can be parallelized by duplication, with each processor running with a slower clock (by a factor of N). In this case, for the same throughput, the power dissipation can be reduced as N increases. Therefore, the power supply voltage ($V_{DD}$) can be aggressively reduced to meet a worst-case delay almost equal to N times the reference delay (i.e., each processor only has to run at the reference frequency divided by N). To exploit this power supply reduction, the threshold voltage ($V_T$) should also be reduced to limit the degradation of the delay as $V_{DD}$ approaches $V_T$. Keep in mind that the scaling of $V_T$ is also limited by static current considerations.
When N is relatively large, parallelism can lead to several problems. A highly parallelized configuration can result in a drastic increase of the occupied area. In addition, there is routing overhead to distribute the input and output signals, which also increases the area and the wiring capacitance. Therefore, the power dissipation also tends to increase, which limits the utility of parallelism.
8.3.2 Pipelining
Pipelining is another architecture that can reduce the power dissipation [10]. As an example, let us consider the case of the 16 × 16 multiplier presented in Section 8.3.1. The 50 MHz multiplier is broken into two equal parts, as shown in Fig. 8.8. A set of pipeline registers (or latches) is inserted, resulting in a 2-stage pipelined version of the multiplier. Architectures with more pipeline stages can be realized. Since the hardware between the pipeline stages is reduced, the reference voltage $V_{ref}$ = 3.3 V can be reduced to 1.8 V ($V_{ref}/1.83$) while maintaining a worst-case delay of 20 ns (50 MHz). The estimated power dissipation is given by

$$ P_{pipe} = C_{pipe} V_{pipe}^2 f_{pipe} = C_{pipe} \left( \frac{V_{ref}}{1.83} \right)^2 f_{ref} $$

The switching capacitance $C_{pipe}$ has increased slightly relative to $C_{ref}$ due to the pipelining. Thus, the power dissipation is reduced by a factor of almost 2.8, which is approximately the same as in the parallel case. Also, the area increase is relatively low, since the area penalty is due only to the additional registers (or latches). As the pipeline registers reduce the logic depth, the power dissipation due to glitches is also reduced.
In general, if a processor is pipelined with N stages of registers, the delay between the pipeline stages is reduced by almost a factor of N while the clock frequency is maintained. The power supply voltage can then be scaled aggressively, and consequently the power saving is large.
Note that, as in the case of parallelism, an architecture with a large number of pipeline stages can result in an offset in power and area. The added registers must be clocked, and hence the load on the clock network can become significant with increased pipelining. One drawback of pipelining is that more latency is added to the output signal.
The combination of pipelining and parallelism can result in further power reduction, because the power supply voltage can be reduced aggressively and the frequency of operation is also reduced. However, the area would increase significantly. For low-voltage operation, the threshold voltage should also be reduced; otherwise, the power supply voltage reduction is limited. Indeed, at low voltage, $V_{DD}$ approaches $V_T$ and the delay increases drastically. To maintain the throughput with parallelism/pipelining, the threshold voltage should be reduced relative to $V_{DD}$.
8.3.3 Distributed Processing

To reduce the power dissipation of a centralized processor, a distributed processing technique can be used. This concept of distributed processing is explained with the example of the Vector Quantization (VQ) image encoder [14] presented in reference [15]. First we review the VQ algorithm for video compression; then, in the next section, the power reduction at the algorithm level of the VQ is discussed.
A video image, represented by a group of pixels, is vector quantized by breaking it into blocks (vectors) of pixels that are mapped to a codebook of probable vectors, using the Mean Square Error (MSE) as the distortion measure. For the example given in [15], the image is segmented into 4 × 4 pixel vectors (the vector size is 16). The VQ employs a codebook of 256 levels. The input data is represented on 16 × 8 bits and the output (8 bits) represents the index of the best match, as shown in Fig. 8.9 [15]. Hence the compression ratio is 16:1. To process 30 frames/s, a vector must be compressed every 17.3 µs (each frame is 128 × 240 pixels). The MSE (distortion metric) between a vector X of 16 pixels and a codebook vector C is given by

$$ MSE = \sum_{i=0}^{15} (C_i - X_i)^2 \qquad (8.8) $$
Computing this algorithm requires a large number of memory accesses to the codebook and arithmetic operations (see Section 8.4). The number of computations can be reduced by combining a differential search, whose codebook-dependent terms are computed a priori, with a Tree Search (TS) between two vectors a and b at the same level of the tree. The distortion difference between the two vectors a and b at the same level of the tree is given by

$$ \Delta MSE_{ab} = MSE_a - MSE_b \qquad (8.9) $$

Then,

$$ \Delta MSE_{ab} = \sum_{i=0}^{15} (C_{ai} - X_i)^2 - \sum_{i=0}^{15} (C_{bi} - X_i)^2 \qquad (8.10) $$

which expands to

$$ \Delta MSE_{ab} = \sum_{i=0}^{15} (C_{ai}^2 - C_{bi}^2) + \sum_{i=0}^{15} 2 X_i (C_{bi} - C_{ai}) \qquad (8.11) $$

The two terms $\sum_i (C_{ai}^2 - C_{bi}^2)$ and $2(C_{bi} - C_{ai})$ are computed a priori and stored in a memory to reduce the number of operations.
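Eq. (8.11) can be checked numerically. In the sketch below (hypothetical names, random 8-bit data), `precompute` forms the two codebook-only terms that the text stores a priori, so comparing two code vectors at run time needs just 16 multiplications, 16 additions and 17 memory reads; the assertion confirms that the result equals the direct difference of Eq. (8.10).

```python
import random


def mse(c, x):
    """Eq. (8.8): distortion between code vector c and input vector x."""
    return sum((ci - xi) ** 2 for ci, xi in zip(c, x))


def precompute(ca, cb):
    """Codebook-only terms of Eq. (8.11), computed once and stored."""
    const = sum(a * a - b * b for a, b in zip(ca, cb))
    weights = [2 * (b - a) for a, b in zip(ca, cb)]
    return const, weights


def delta_mse(x, const, weights):
    """MSE_a - MSE_b from the stored terms: 16 products added to the constant."""
    return const + sum(w * xi for w, xi in zip(weights, x))


rng = random.Random(0)
ca = [rng.randrange(256) for _ in range(16)]
cb = [rng.randrange(256) for _ in range(16)]
x = [rng.randrange(256) for _ in range(16)]
const, weights = precompute(ca, cb)
assert delta_mse(x, const, weights) == mse(ca, x) - mse(cb, x)
# A negative difference means code vector a is the closer match.
print("choose a" if delta_mse(x, const, weights) < 0 else "choose b")
```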
Fig. 8.10(a) shows the centralized implementation of the VQ. It has a centralized memory, processing element, and controller. This architecture is time-multiplexed, performing operations sequentially over a large number of clock cycles. In TSVQ, each level of the tree has specific code vectors that are found only at that level. Therefore, the memory can be partitioned into separate memories for each level of the tree. Fig. 8.10(b) shows the distributed implementation of the VQ. The memory size increases from one module to the next. The architecture is pipelined, allowing the clock frequency and supply voltage to be reduced.
The distributed memory architecture has a lower switched capacitance when loading the code vectors than the centralized case. This distributed implementation has eight controllers and processing elements, but since they are clocked at a lower frequency with a low supply voltage, the energy dissipated per vector does not change [15]. Through this partitioning, the power dissipated by the centralized implementation was reduced by a factor of 11, at the expense of an area increase by a factor of 2.
From this example we can learn that a proper architecture design, through distributed processing, is more power-efficient than a centralized processor. In the distributed implementation, the different local hardware resources can be optimized more efficiently than the global hardware in the centralized implementation. The applicability of this technique depends on whether the executed algorithm can be partitioned. Keep in mind that the power saving is traded for occupied area, while the throughput is maintained.
8.3.4 Power Management

In older designs of microprocessors, DSPs, ASICs, etc., power was wasted by clocking blocks that are idle for significant periods of time. Recently, power management methodologies have come to play an important role in avoiding wasted power in the normal and standby modes of operation [17, 18, 19, 20]. In this section, only some of the power management techniques are discussed.
There are two types of power management: i) dynamic and ii) static. Dynamic Power Management (DPM) allows selective shutdown of different blocks of the chip based on the level of activity required to run a particular application. Different blocks of the chip may be idle for a certain period of time when running different applications. For example, the floating-point unit can have 100% idle time when the processor is executing integer applications. DPM requires additional logic on the chip, which is controlled by signals indicating idle periods.
In the PowerPC¹ 603 [21], the DPM mode is enabled by software. The DPM logic automatically stops the switching of the clocks of a specific unit, which are generated by clock regenerators. The clock regenerators produce two clocks, C1 and C2, which feed the master and slave latches. Two "freeze" input signals control the clocks C1 and C2, as shown in the timing diagrams of Fig. 8.11. The logic needed for DPM does not introduce any performance degradation, and it consumes 0.3% of the total die area of the PowerPC. DPM provides a power saving of 10-20%, depending on the application being executed. DPM can be implemented at either a high level (e.g., an execution unit) or a low level (e.g., a block inside a unit) of the hardware.
Static Power Management (SPM) allows power to be saved in the standby mode. In this case, the activity of the entire system is monitored rather than that of a specific unit (or block). When the system remains idle for a significant period of time, the entire chip is shut down². SPM may have several modes, depending on whether the entire chip or only a part of it is shut down. For example, the PowerPC 603 has three modes that are programmable through a hardware bit controlled by software (the operating system). In this microprocessor, one mode, called sleep mode, allows maximum power savings by disabling the clocks to all units. In this mode, the PLL and the external input clock are disabled to bring the power dissipation down to leakage levels. The power of the PowerPC 603 in sleep mode is as low as 1.8 mW [20].

¹PowerPC 603 is from IBM Corp.

[Fig. 8.11: timing diagrams of clocks C1 and C2 and the freeze signals]
8.4 ALGORITHMIC-LEVEL POWER REDUCTION

Algorithm optimization can have a significant impact on the power consumption of a system. Design decisions made at this level, combined with those at the architecture level, may lead to large power savings. In this section, we discuss two approaches that reduce the power dissipation at the algorithm level. The first one is based on the reduction of the switched capacitance by minimizing the complexity of the system. The second method exploits data coding to achieve low switching activity.
8.4.1 Switched Capacitance Reduction

The power dissipated by an algorithm can be estimated, for example, by counting the number of operations required to execute the algorithm. To reduce the power of an algorithm, the number of primitive operations, such as memory accesses, ALU operations, etc., should be minimized. The different types of operations do not consume the same amount of power; for example, a multiplication consumes more power than an addition. Thus, when minimizing the number of operations of an algorithm, the type of each operation should be taken into account. Keep in mind that high-performance systems use complex algorithms that require a large number of operations.
To illustrate this consideration, the computational complexity of three methods for the VQ algorithm is presented. Remember that the distortion metric between the input data (vector X) and a codebook vector C is given by Equation (8.8). One method to evaluate the distortion and find the best match is to use a full search through the codebook. The distortion is then computed for all 256 levels of the codebook. Each level requires 16 memory accesses to perform 16 subtractions, 16 multiplications, 15 additions, etc. Hence a large number of primitive operations are needed.
In the binary TSVQ already presented in Section 8.3.3, the codebook is organized into a tree structure, as shown in Fig. 8.12. The input vector is compared with two code vectors at each node. Based on this comparison, one of the two branches is chosen, and the codebook search space is reduced compared to the full search, since only a reduced number of code vectors (16) is used. For each comparison at a specific level, an index bit is generated, as shown in Fig. 8.12. The process of comparison through the tree is repeated until a leaf node is reached. For a codebook of 256 levels, the tree has a depth of 8 (levels d = 0 to 7). Compared to the full search, the number of memory accesses and executed operations is reduced considerably, since only 16 code vectors are used in the TSVQ algorithm. One VLSI implementation of the TSVQ algorithm uses systolic arrays [22].

[Fig. 8.12: binary codebook tree with levels d = 0, 1, 2, 3, ..., 7]
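The tree search can be sketched as follows. The codebook layout (a dictionary keyed by level and node index) and the tiny one-dimensional example are hypothetical; a depth-8 tree would examine 16 code vectors per input instead of the 256 of a full search. Note that tree search is not guaranteed to return the same index as full search; in this example it does.

```python
def sq_dist(c, x):
    """Distortion of Eq. (8.8) between code vector c and input vector x."""
    return sum((ci - xi) ** 2 for ci, xi in zip(c, x))


def full_search(x, codebook):
    """Index of the best-matching code vector; examines every entry."""
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k], x))


def tsvq_encode(x, tree, depth):
    """Binary tree-search VQ; examines only 2 * depth code vectors.

    tree[(level, node)] holds the (left, right) code-vector pair at that tree
    node, where `node` is the integer formed by the bits chosen so far.
    Each decision appends one index bit (0 = left, 1 = right), MSB first.
    """
    node = 0
    for level in range(depth):
        left, right = tree[(level, node)]
        bit = 1 if sq_dist(right, x) < sq_dist(left, x) else 0
        node = (node << 1) | bit
    return node


# Tiny hypothetical 1-D example with depth 2 (a 4-level codebook).
tree = {
    (0, 0): ([25], [75]),
    (1, 0): ([0], [50]),
    (1, 1): ([70], [100]),
}
print(tsvq_encode([80], tree, 2))                    # 2 (index bits "10")
print(full_search([80], [[0], [50], [70], [100]]))   # 2
```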
The number of computations can be further reduced by using the differential search of the TSVQ [see Equation (8.11)]. At each level (i) of the tree, the differential distortion between the left (vector a) and right (vector b) code vectors connected to level (i - 1) is computed. Therefore, the number of operations is reduced. Table 8.1 [15] shows the computational complexity of the three VQ methods. The differential TSVQ results in the lowest number of operations for each type.
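The entries of Table 8.1 can be reproduced with simple per-vector counting rules. The rules in the sketch are our inference from the table values (the text does not state them), and they read the Tree Search memory-access entry as 256.

```python
def vq_op_counts(levels=256, dim=16, depth=8):
    """(memory accesses, multiplications, add/subtracts) per input vector.

    Counting rules inferred so as to reproduce Table 8.1:
    - a direct distortion costs dim reads, dim multiplications and 2 * dim
      add/subtracts (dim differences plus dim accumulations);
    - full search adds one comparison per code vector, tree search one per level;
    - the differential search reads dim weights plus one constant per level,
      performs dim multiplications, dim accumulations and one sign test per level.
    """
    full = (levels * dim, levels * dim, levels * (2 * dim + 1))
    visited = 2 * depth                      # two code vectors per tree level
    tree = (visited * dim, visited * dim, visited * 2 * dim + depth)
    diff = (depth * (dim + 1), depth * dim, depth * (dim + 1))
    return {"full search": full, "tree search": tree, "differential": diff}


for name, ops in vq_op_counts().items():
    print(f"{name:>13}: {ops}")
```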
8.4.2 Switching Activity Reduction

Minimizing the switching activity at a high level is one way to reduce the power dissipation of digital processors. This can have a significant influence on the power reduction, especially when the switching signals have a large capacitance. One method to minimize the switching activity at the algorithmic level is to use an appropriate coding for the signals rather than straight binary code.
Table 8.1 Computation complexity of the three VQ methods [15].

Algorithm                   Memory Access   Multiplication   Add/Subtract
Full Search                 4096            4096             8448
Tree Search                 256             256              520
Differential Tree Search    136             128              136
In [23], Gray coding has been used for the address lines of a microprocessor, for both instruction and data accesses, to reduce the switching activity of the nets. The advantage of Gray code over binary code is that Gray code changes by only one bit as it sequences from one number to the next. In other words, if the memory access pattern is a sequence of consecutive addresses, then each memory access changes only one bit of its address. Due to instruction locality during program execution, most memory accesses are sequential. Therefore, Gray code eliminates the simultaneous switching of a significant number of bits.
Table 8.2 shows a comparison of the 3-bit representations in binary and Gray codes. Note that the Gray code has only one transition for each sequential change, while the binary code may have many transitions.

Table 8.2 Binary and Gray-code representation.

Binary Code   Gray Code   Decimal Equivalent
000           000         0
001           001         1
010           011         2
011           010         3
100           110         4
101           111         5
110           101         6
111           100         7
In [23], the switching property of the address coding was measured using the number of bit switches per executed instruction. For instruction accesses, Gray and binary coding were compared using benchmark programs. The maximum reduction in bit switches was found to be as high as 58%, and the average reduction was 31%. The same study was also carried out for data addresses, where the average reduction of bit switches was 8%.
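The effect of Gray addressing on a purely sequential address stream can be counted directly. The sketch assumes the bus carries the Gray code of each address (function names hypothetical) and reproduces the codes of Table 8.2.

```python
def to_gray(n):
    """Binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def bit_switches(addresses):
    """Total address-bit transitions over a sequence of bus addresses."""
    return sum(bin(a ^ b).count("1") for a, b in zip(addresses, addresses[1:]))


seq = list(range(16))                            # 16 sequential fetches
print(bit_switches(seq))                         # 26 (binary)
print(bit_switches([to_gray(a) for a in seq]))   # 15 (Gray)
```

Real address streams also contain branches and data accesses, which is why the measured averages in [23] (31% for instruction addresses, 8% for data addresses) are much lower than this idealized case.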
8.5 POWER ESTIMATION TECHNIQUES

Power estimation means, in general, the techniques for estimating the average power dissipation of circuits. The goal of this section is to present an overview of power analysis techniques and tools at the circuit, gate, architectural, and behavioral levels of abstraction. Measuring the power consumption is critical for low-power design, as it permits the designer to optimize power, meet requirements, and know the power distribution across the chip.
8.5.1 Circuit-Level Tools

The most straightforward method of power estimation is circuit simulation: perform a circuit simulation of the design and measure the average current drawn from the supply, from which the average power can be estimated. The disadvantage of this approach, also called dynamic³ power simulation, is that the results strongly depend on the input patterns applied to the circuit (a pattern-dependent technique). If the circuit has a large number of inputs, the circuit simulation becomes time consuming and even impractical.
The most accurate power simulator to date is still SPICE. However, it can handle only very small circuits (e.g., hundreds of transistors). SPICE accurately takes into account nonlinear capacitances (junction and gate), which cannot be captured by higher-level tools. It also accurately measures short-circuit and leakage currents; the latter are very important for low-$V_T$ applications. SPICE cannot be used to estimate the power of large circuits or chips, due to the time-consuming nature of the simulator. It is a pattern-dependent power analysis tool.

³Dynamically computed power should not be confused with dynamic power.
Another transistor-level power simulator/analyzer is PowerMill⁴ [24]. It applies an event-driven simulation algorithm to increase the computation speed by two to three orders of magnitude over SPICE, with an acceptable level of accuracy (within 10%). It also uses table lookup to determine the terminal currents of a device from the applied voltages.
PowerMill can also identify hot spots (which consume more dynamic power) and trouble spots (which consume unexpectedly large amounts of leakage current). Moreover, elements with excessive short-circuit current are detected, which allows the designer to resize the circuit to reduce the rise/fall times. Static reduced-swing nodes are detected, as shown in the example of Fig. 8.13, where node A is charged only to $V_{DD} - V_T$ when the input is low.
Another approach for power estimation is the use of statistical techniques. The work in [25] suggested the use of Monte Carlo simulation to estimate the total average power of the circuit. Basically, this statistical technique applies randomly generated input patterns at the primary inputs and monitors the convergence of the power dissipation. The simulation is stopped when the measured power is, with a specified confidence, close enough to the true average power. This approach, based on the Monte Carlo method, requires simulation over a large number of measurements. The advantage of statistical techniques is that they can be built around existing simulation tools.
⁴PowerMill is from EPIC Design Technology.
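The Monte Carlo approach can be sketched as a loop that keeps simulating random batches until a confidence interval is tight enough. This Python version is only in the spirit of [25]: the stopping rule is a generic normal-approximation interval, and `and_gate_batch` is a hypothetical stand-in for a real simulator run that returns output toggles per cycle rather than watts.

```python
import random
import statistics


def monte_carlo_power(run_batch, min_batches=30, rel_err=0.05, z=1.96,
                      max_batches=100_000, seed=0):
    """Average power from repeated simulations with random input patterns.

    run_batch(rng) must simulate one batch of random primary-input patterns
    and return its average power. Sampling stops once the normal-approximation
    confidence half-width z * s / sqrt(k) is within rel_err of the mean.
    """
    rng = random.Random(seed)
    samples = []
    while len(samples) < max_batches:
        samples.append(run_batch(rng))
        if len(samples) >= min_batches:
            mean = statistics.fmean(samples)
            half_width = z * statistics.stdev(samples) / len(samples) ** 0.5
            if half_width <= rel_err * mean:
                return mean, len(samples)
    return statistics.fmean(samples), len(samples)


def and_gate_batch(rng, cycles=50):
    """Hypothetical simulator stand-in: output toggles per cycle of a
    2-input AND gate driven by random, equiprobable inputs."""
    prev, toggles = 0, 0
    for _ in range(cycles):
        out = rng.getrandbits(1) & rng.getrandbits(1)
        toggles += out != prev
        prev = out
    return toggles / cycles


mean, batches = monte_carlo_power(and_gate_batch)
print(f"{mean:.3f} toggles/cycle after {batches} batches")
```

For the AND gate, the estimate should land near the analytical toggle rate $2 \times 0.25 \times 0.75 = 0.375$.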
8.5.2 Gate-Level Techniques

In order to overcome the shortcomings of power analysis tools at the circuit level, several gate-level estimation tools have recently been proposed. In this section, we present two techniques for power estimation at the gate level. The first approach relies on the probabilistic method, while the second one is based on event-driven simulation.
8.5.2.1 Probabilistic Power Estimation

The power dissipation can be analyzed using a pattern-independent approach when the signals are represented with probabilities (also called static techniques). This approach overcomes the shortcomings of simulation-based techniques. The user supplies the probabilities of the primary inputs to a logic network. The average power dissipation of a logic network is estimated as

$$ P = V_{DD}^2 f \sum_{i=1}^{N} \alpha_i C_i \qquad (8.12) $$

where N is the number of nodes in the network, $C_i$ is the total physical capacitance of node i, and $\alpha_i$ is its switching activity (also called the transition probability), given by

$$ \alpha_i = P_i (1 - P_i) \qquad (8.13) $$

where $P_i$ is the probability that node i is at the high level. In this expression of the activity, it is assumed that the circuit inputs and internal nodes are independent (spatial independence). Also, the values of the same signal in two consecutive clock cycles are assumed to be independent (temporal independence).
If the input probabilities of a network are provided, they are propagated through the circuit to evaluate the transition probability at each node. For example, for a 2-input AND function $y = x_1 \cdot x_2$, the probability of the output being at the high level is given by $P_y = P_{x_1} P_{x_2}$. The computation of the probabilities for different gates is discussed in Chapter 4.
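The propagation step and Eqs. (8.12) and (8.13) can be combined in a short sketch. It assumes spatial and temporal independence, as the text does; the netlist format, node names, capacitances, supply and frequency are hypothetical.

```python
def propagate(netlist, input_probs):
    """Signal probabilities, assuming spatially independent gate inputs.

    netlist: (node, gate, inputs) tuples in topological order, with gate
    one of "AND", "OR", "NOT". Returns {node: probability of logic 1}.
    """
    p = dict(input_probs)
    for node, gate, ins in netlist:
        if gate == "AND":
            prob = 1.0
            for name in ins:
                prob *= p[name]
        elif gate == "OR":
            all_zero = 1.0
            for name in ins:
                all_zero *= 1.0 - p[name]
            prob = 1.0 - all_zero
        elif gate == "NOT":
            prob = 1.0 - p[ins[0]]
        else:
            raise ValueError(f"unsupported gate type: {gate}")
        p[node] = prob
    return p


def average_power(p, caps, vdd, freq):
    """Eqs. (8.12)-(8.13): P = VDD^2 * f * sum_i P_i * (1 - P_i) * C_i."""
    return vdd ** 2 * freq * sum(p[n] * (1.0 - p[n]) * c for n, c in caps.items())


# Hypothetical circuit y = (a AND b) OR c with equiprobable primary inputs.
netlist = [("n1", "AND", ("a", "b")), ("y", "OR", ("n1", "c"))]
p = propagate(netlist, {"a": 0.5, "b": 0.5, "c": 0.5})
caps = {"n1": 10e-15, "y": 50e-15}   # hypothetical node capacitances (F)
print(p["y"], average_power(p, caps, vdd=1.8, freq=100e6))
```

With reconvergent fanout the independence assumption no longer holds, and the propagated probabilities become approximations.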
One tool (LTIMES) based on probabilities was first proposed in [26]. In this work, the temporal and spatial independence of signals is assumed; in practice, the signals may be correlated. Also, a zero-delay model was assumed, which leads to an error in estimating the power, since the glitching power is not accounted for.
Probabilistic power estimation approaches that compute the power due to glitches and apply a real delay model have been proposed [27, 28]. In [27], the switching activity computation is based on the transition density. The assumption made in [27] is the spatial independence of the signals. A power estimation tool based on the transition density has been called DENSIM. The transition density of a node is defined as the average number of nodal transitions per unit time. If y is a Boolean function with inputs $x_i$, then the Boolean difference of y with respect to $x_i$ is defined by

$$ \frac{\partial y}{\partial x_i} = y\big|_{x_i = 1} \oplus y\big|_{x_i = 0} \qquad (8.14) $$
It was shown in [29] that if the $x_i$ are spatially independent, then the density of the Boolean function is given by

$$ D(y) = \sum_{i} P\!\left( \frac{\partial y}{\partial x_i} \right) D(x_i) \qquad (8.15) $$

where P(x) is the equilibrium probability of the signal over time. Equations (8.14) and (8.15) are used to propagate the density through the Boolean network. $\partial y / \partial x_i$ is one if a transition at $x_i$ will cause a simultaneous transition at y. As an example, consider the case of a 2-input AND gate with $y = x_1 x_2$. In this case, $\partial y / \partial x_1 = x_2$ and $\partial y / \partial x_2 = x_1$, so that $D(y) = P(x_2) D(x_1) + P(x_1) D(x_2)$. Hence, from the probability and density values at the primary inputs of a logic network, the density at the output can be computed. The Boolean differences of a logic network are calculated using Binary Decision Diagrams (BDDs) [30].
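Note that the average power dissipation is computed by

$$ P_{av} = \frac{1}{2} V_{DD}^2 \sum_{i=1}^{N} C_i D(x_i) \qquad (8.16) $$

The factor 1/2 is added to account for the double transition (one rising, one falling) per charge-discharge cycle: D counts every transition, whereas the energy $C_i V_{DD}^2$ is dissipated once per full cycle.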
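Eqs. (8.14) to (8.16) can be sketched for a single gate. Here the Boolean difference probability is obtained by enumerating the assignments of the other inputs (a substitute for the BDDs of [30] that only suits small gates); the input probabilities and densities are hypothetical.

```python
from itertools import product


def boolean_difference_prob(f, probs, i):
    """P(dy/dx_i) for y = f(*x) with spatially independent inputs (Eq. 8.14).

    Enumerates every assignment of the other inputs, so it only suits small
    gates; large networks use BDDs instead.
    """
    total = 0.0
    for bits in product((0, 1), repeat=len(probs)):
        if bits[i]:
            continue                  # visit each assignment of the others once
        hi = list(bits)
        hi[i] = 1
        if f(*hi) != f(*bits):        # y(x_i = 1) XOR y(x_i = 0) is 1
            weight = 1.0
            for k, b in enumerate(bits):
                if k != i:
                    weight *= probs[k] if b else 1.0 - probs[k]
            total += weight
    return total


def transition_density(f, probs, densities):
    """Eq. (8.15): D(y) = sum_i P(dy/dx_i) * D(x_i)."""
    return sum(boolean_difference_prob(f, probs, i) * d
               for i, d in enumerate(densities))


def density_power(caps, densities, vdd):
    """Eq. (8.16): P_av = 0.5 * VDD^2 * sum_i C_i * D(x_i)."""
    return 0.5 * vdd ** 2 * sum(c * d for c, d in zip(caps, densities))


def and2(x1, x2):
    return x1 & x2


# Hypothetical inputs: P(x1) = 0.5, P(x2) = 0.2, D(x1) = 1e6/s, D(x2) = 2e6/s.
d_y = transition_density(and2, [0.5, 0.2], [1e6, 2e6])
print(d_y)  # equals P(x2) D(x1) + P(x1) D(x2) = 0.2e6 + 1.0e6
```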
This model, based on transition density, ignores the spatial correlation of the signals and computes the power due to glitches only approximately. The work in [28] attempts to handle both spatial and temporal correlations. One disadvantage of the approach in [28] is that the use of BDDs for the whole circuit tends to limit the size of the network that can be analyzed.
Probabilistic techniques have the advantage that the user does not have to supply simulation patterns, and they are claimed to have fast computation times. However, they do not account for the internal power of the gates or for static power dissipation. These techniques can be used, for example, as fast power estimators for logic synthesis. They might also be suited for comparing various subsystem structures.
8.5.2.2 Event-Driven Simulation

Another gate-level power analysis approach has been proposed for semi-custom design [31]. The environment of the system is shown in Fig. 8.14. The system uses a cell library that has been characterized for static and dynamic power dissipation with the Entice⁵ (ENergy and TIming Characterization Environment) cell characterization system [32]. The dynamic power includes the power due to short-circuit current and the power due to the load capacitance. Entice characterizes each cell taking into account the following parameters: input signal slope, output capacitive load, operating voltage, temperature, and process parameters. Entice uses SPICE as a circuit simulator to model each cell for power.
A set of p a e r vector8 drrcribes all possible events where power can be &-
sipated by the cell for dynamic and static cases. With SPICE these power
events are accurately chanlcterised. There are two types of power vectors: i)
dynamic snd ii) static. A dynamic power vector describer an event in which
power is dissipated due to a signal switching st the cell inputs. For example,
for a 2-input ( A and B) AND gate, when A = 1and B makes a tianAtion from
0 to 1, an energy is dissipated. A ststic power vector describes the conditions
of logic signals under which leakage power OCUUS.
The designer creates a design from the cell library at gate level; the design is then input to the Aspen (A System for Power EstimatioN) system. The stimulus to drive the logic simulator and the interconnect loads representing the inter-cell connectivity (estimated values, or actual values provided by back-annotation from layout) are also specified. A logic simulator such as Verilog-XL is used as an event-driven simulator. Upon invocation, Aspen monitors the power event occurrence (node activity) of each cell and computes the total power dissipation as the sum of the power dissipation of all the cells in the power vector paths. Multiple time windows can be specified for simulation to compute the average power over different time periods. Note that Aspen uses the power vectors of a cell to compute the total power.

Entice is from Motorola Inc.
Aspen is from Motorola Inc.
Verilog-XL is from Cadence Design Systems Inc.
The dynamic power of each cell is computed by multiplying the number of power events (transition count) by the energy dissipated per transition event of the cell. This process is applied to all dynamic power vectors of a cell to obtain the total energy dissipated. The total dynamic power of a cell over a certain time period is equal to the total energy divided by the time period.
The static power vector is used to compute the leakage of a cell. Note that the static power of a cell depends on the logic state of the cell, as shown in Fig. 8.15. To compute the static power dissipation, the activation time of the corresponding static power vector is measured. A transition of a net signal may cause one static power vector to be activated and another vector to be deactivated. Vectors are time-stamped upon activation and upon deactivation. Then the total time during which the vector is active is found. The activation time of the static power vector is multiplied by its power dissipation value (per time unit) to obtain the static power of the vector. Again, the static power dissipation for all vectors associated with a cell instance is summed to derive the total power dissipation.
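The bookkeeping just described can be sketched as follows. This is not Aspen's implementation: the `CellPowerModel` class, the vector names, and the energy and leakage values are all hypothetical, standing in for data that a characterization system such as Entice would produce. Dynamic energy is event count times energy per event; static energy is active time times leakage power, with active time recovered from time-stamped activations.

```python
from dataclasses import dataclass


@dataclass
class CellPowerModel:
    """Hypothetical precharacterized power vectors for one cell type."""
    dynamic_energy: dict[str, float]  # joules per dynamic power event
    static_power: dict[str, float]    # leakage watts per static vector (logic state)


def static_durations(trace: list[tuple[float, str]],
                     end_time: float) -> dict[str, float]:
    """Total active time per static vector from a time-sorted activation trace.

    Each (timestamp, vector) entry activates `vector`; it stays active until
    the next entry deactivates it (or until end_time).
    """
    durations: dict[str, float] = {}
    closing = trace[1:] + [(end_time, "")]
    for (t_on, vec), (t_off, _) in zip(trace, closing):
        durations[vec] = durations.get(vec, 0.0) + (t_off - t_on)
    return durations


def cell_power(model: CellPowerModel, event_counts: dict[str, int],
               durations: dict[str, float], period: float) -> tuple[float, float]:
    """Return (dynamic, static) average power of one cell over `period` seconds."""
    dyn_energy = sum(model.dynamic_energy[v] * n for v, n in event_counts.items())
    static_energy = sum(model.static_power[v] * t for v, t in durations.items())
    return dyn_energy / period, static_energy / period


and2 = CellPowerModel(
    dynamic_energy={"A=1,B:0->1": 2e-13, "A=1,B:1->0": 1.5e-13},
    static_power={"AB=00": 1e-9, "AB=11": 3e-9},
)
trace = [(0.0, "AB=00"), (4e-6, "AB=11"), (6e-6, "AB=00")]
active = static_durations(trace, 10e-6)
print(cell_power(and2, {"A=1,B:0->1": 10, "A=1,B:1->0": 10}, active, 10e-6))
```

Summing the returned pairs over all cell instances gives the chip-level figure for the chosen time window.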
The results reported by Aspen, such as the switching activity of nodes, can be used to drive floorplanning, placement and routing tools. Also, Aspen can handle chips with a complexity of several hundred thousand gates and is four orders of magnitude faster than SPICE. It produces results within 10% of SPICE results. One disadvantage of Aspen is that it cannot handle power due to glitches.
8.5.3 Architecture-Level Power Estimation

The architecture of a design is represented by functional blocks, and the complexity of the design at this level is relatively low compared to the circuit and gate levels. In this section, several approaches and techniques for power modeling and analysis at the architectural level are reviewed.
8.5.3.1 Gate Count Method
One tool developed for architectural power dissipation estimation is based on equivalent logic count, memory size, logic circuit styles (dynamic or static), interconnection busses, clock network and layout style (full-custom or semi-custom) [33]. The complexity of an architecture is described in terms of an average number of logic gates, such as a 3-input AND (buffered NAND) gate connected to three identical AND gates at the output node (i.e., fanin = fanout = 3), as shown in Fig. 8.16. The total power of the logic part is roughly equal to the number of gates multiplied by the power of a gate computed with a user-specified switching activity. This activity factor is assumed fixed across the design.
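A minimal sketch of the logic-part estimate follows. It assumes, for illustration, that the power of the reference gate is $\alpha C_{gate} V_{DD}^2 f$ with $C_{gate}$ lumping the gate and its local wiring load; the single activity $\alpha$ applied to every gate is exactly the simplification that limits this method. Parameter values are hypothetical.

```python
def logic_power_gate_count(n_gates: int, c_gate: float, vdd: float,
                           freq: float, activity: float) -> float:
    """Gate-count estimate: N_gates * (alpha * C_gate * Vdd^2 * f), in watts.

    One switching activity `activity` is assumed for the whole design, so the
    estimate cannot distinguish busy units from idle ones.
    """
    per_gate = activity * c_gate * vdd ** 2 * freq
    return n_gates * per_gate


# 100k equivalent gates, 50 fF per gate, 3.3 V, 50 MHz, activity 0.2
print(logic_power_gate_count(100_000, 50e-15, 3.3, 50e6, 0.2))  # about 0.54 W
```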
The power of the on-chip memory is modeled for a certain memory architecture. The interconnections are defined in two categories: local and intermediate interconnections, and global busses. The local interconnection is defined as interconnections within a logic gate. The intermediate interconnections are used for connection between gates or functional blocks (subsystems). The global bus includes data, control, and address busses. The lengths of local and intermediate interconnections are modeled by Rent's rule [34]. Then the power can be computed from the lengths using a fixed switching activity equal to the one specified for the logic. The global interconnect is determined from the dimensions of the chip and the number of drivers/receivers connected to it.
The power model of the clock network is based on the H-tree [34] and the chip dimensions. The power of the on-chip (pad) drivers is also modeled in two components. One is the power used to drive the total off-chip capacitance. The other is the power consumed by the pad driver itself. The activity factor for the pads is assumed fixed and equal to 1 [33].
The tool developed in [33] is used as a power estimator in the early stage of the design. It requires some technological parameters (feature size, gate oxide thickness, parameters of the interconnection layers, etc.), the supply voltage, the chip area, the switching factor and the gate count. This tool can only be used as a rough estimator of the total power of the chip, since the switching activity is assumed fixed throughout the design. Therefore, the power partition between the different units can be incorrectly estimated.
8.5.3.2 The Power Factor Approximation Method

The Power Factor Approximation (PFA) technique is another method to estimate the power dissipation [35]. It has been used for DSP architectures. The total power dissipation (excluding static power dissipation) of a functional block such as a multiplier, adder, memory, etc. can be modeled by an approximation, Equation (8.17), in which G is the number of logic gates comprising the functional block, $\alpha_i$ is the switching activity of the ith gate, $C_i$ is the load of the ith gate, $i_{sc,i}$ is the short-circuit component, and $f$ is the frequency. This power equation can be expressed in more compact form as

$$P_{avg} = K G f \qquad (8.18)$$

where K is the PFA constant and can be related easily to Equation (8.17). G can also be looked at as a hardware complexity factor instead of a number of gates. The parameter K has different values for different blocks. For example, for an n-bit multiplier, the factor G can be approximately equal to $n^2$, as shown in Fig. 8.17. This is due to the number of adder cells in the multiplier. Then, we have

$$P_{mult} = K_{mult}\, n^2 f_{mult} \qquad (8.19)$$

The power supply voltage is included in the parameter K. This parameter is extracted empirically from measured or simulated power values at a fixed power supply voltage.

For a VLSI chip composed of several functional blocks, the total power dissipation can be determined by summing the power of all blocks. We have

$$P_{tot} = \sum_{i \in \text{all blocks}} K_i G_i f_i \qquad (8.20)$$

Thus, the PFA technique is based on modeling precharacterized functional blocks. Each block has a PFA factor independent of the others. Hence, this technique provides a more general methodology than the gate-equivalent model of Svensson and Liu discussed previously. The PFA factor is extracted using independent Uniform White Noise (UWN) inputs (i.e., random inputs). UWN inputs mean that the input bits are uncorrelated in space and time and independent of the data distribution. The signal and transition probabilities of each bit i of the input are given by

$$P_i(1) = 0.5 \quad \text{and} \quad P_i(0 \rightarrow 1) = 0.25 \qquad (8.21)$$
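Equations (8.18) to (8.20) reduce chip-level estimation to a table of precharacterized blocks. The sketch below uses made-up K values (in joules per unit complexity per cycle, at a fixed supply voltage) and assumes, for illustration only, that an n-bit adder has complexity G = n; the multiplier uses G = n² as in Equation (8.19).

```python
def multiplier_complexity(n_bits: int) -> int:
    """G ~ n^2 for an n-bit array multiplier (adder cells per partial-product bit)."""
    return n_bits * n_bits


def pfa_power(k: float, g: float, f: float) -> float:
    """Eq. (8.18): P = K * G * f; Vdd is folded into the empirical constant K."""
    return k * g * f


def chip_power(blocks: list[tuple[float, float, float]]) -> float:
    """Eq. (8.20): sum of K_i * G_i * f_i over all (K, G, f) blocks."""
    return sum(pfa_power(k, g, f) for k, g, f in blocks)


# Hypothetical 16-bit datapath clocked at 20 MHz
blocks = [
    (2e-12, multiplier_complexity(16), 20e6),  # multiplier, G = 256
    (1.5e-12, 16, 20e6),                       # adder, assumed G = n = 16
]
print(chip_power(blocks))  # about 1.07e-2 W
```

Because every K is extracted with UWN inputs, the same K is applied regardless of the actual data statistics, which is the limitation discussed next.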
Consequently, this technique does not account for the strong dependency of power consumption on the statistics of the input data [36]. The next section treats power modeling that takes into account the correlated behavior of the bits.

8.5.3.3 Dual Bit Type Model
In digital signal processing, correlation can exist between the values of a temporal sequence of data. The UWN model can lead to an error in estimating the power of a circuit even if the bit-width utilization is maximized. To take the data correlation into account, the Dual Bit Type (DBT) data model has been proposed in [36, 37]. The DBT data model permits accurate estimation of the power dissipation.
[Fig. 8.18: Transition probability P(0→1) versus bit position (bit 14 down to bit 0) for two's complement data streams with temporal correlation ρ = -0.99, -0.80, -0.60, 0.0, 0.60, 0.80, 0.99.]
Fig. 8.18 shows the transition activity for several different two's complement data streams versus the bit position (for an n-bit word). In this figure, each curve corresponds to a different temporal correlation given by

$$\rho = \frac{\operatorname{cov}(X_{t-1}, X_t)}{\sigma^2} \qquad (8.22)$$

where $X_{t-1}$ and $X_t$ are successive data (in time) and $\sigma^2$ is the variance. $\rho = 0$ corresponds to the white noise case, where $P(0 \rightarrow 1) = 0.25$. From Fig. 8.18 it is evident that the UWN model, while sufficient for describing activity in the Least Significant Bits (LSBs), is inadequate for the Most Significant Bit (MSB) region. The UWN model works correctly for the LSBs up to the break point BP0. The MSB region corresponds to the sign bits; consequently, the signal and transition probabilities of these bits are far from random. $\rho > 0$ corresponds to a lower activity for positively correlated signals, while $\rho < 0$ corresponds to a higher activity for negatively correlated signals. The MSB region starts from the break point BP1. The region between BP0 and BP1 can be modeled by linear interpolation. BP0 and BP1 can be determined from the word-level statistics [37].
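The three-region activity profile can be sketched as below. Two assumptions are ours, not the text's: the break points are taken as given inputs (the formulas of [37] that derive them from word-level statistics are not reproduced), and the sign-bit activity uses the orthant probability $P(0 \rightarrow 1) = \arccos(\rho)/(2\pi)$, which holds for zero-mean, jointly Gaussian successive samples. It gives 0.25 at $\rho = 0$, less for $\rho > 0$ and more for $\rho < 0$, matching the trend of Fig. 8.18.

```python
import math


def sign_bit_activity(rho: float) -> float:
    """P(0->1) of a sign bit, assuming zero-mean jointly Gaussian X_{t-1}, X_t."""
    return math.acos(rho) / (2 * math.pi)


def dbt_bit_activities(n_bits: int, bp0: int, bp1: int, rho: float) -> list[float]:
    """Per-bit P(0->1), index 0 = LSB.

    Bits up to BP0 follow the UWN value 0.25, bits from BP1 upward behave
    like sign bits, and the region in between is linearly interpolated.
    """
    if not 0 <= bp0 <= bp1 <= n_bits - 1:
        raise ValueError("require 0 <= bp0 <= bp1 <= n_bits - 1")
    uwn = 0.25
    sign = sign_bit_activity(rho)
    activities = []
    for bit in range(n_bits):
        if bit <= bp0:
            activities.append(uwn)
        elif bit >= bp1:
            activities.append(sign)
        else:  # only reached when bp0 < bit < bp1, so bp1 - bp0 > 0
            t = (bit - bp0) / (bp1 - bp0)
            activities.append(uwn + t * (sign - uwn))
    return activities


# 16-bit word, hypothetical break points BP0 = 6 and BP1 = 10, strongly correlated data
for bit, a in enumerate(dbt_bit_activities(16, 6, 10, 0.99)):
    print(bit, round(a, 4))
```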
The power estimation of the architecture modules is based on a black-box technique of the switched capacitance. Typical modules are adders, multipliers, shifters, RAMs, ROMs, etc. The power dissipation is modeled for each module by

$$P = C V_{DD}^2 f \qquad (8.23)$$

where the switched capacitance C is related to the complexity and the activity of the module. For example, for an n-bit ripple-carry subtractor, the switched capacitance is modeled by

$$C = C_{eff}\, n \qquad (8.24)$$

where $C_{eff}$ is a capacitive coefficient (in fF/bit) determined from the DBT model. $C_{eff}$ can be a single coefficient for the UWN case. The DBT model employs several coefficients for $C_{eff}$, which reflect the data representation and signal statistics. For the subtractor, for example, a table of $C_{eff}$ is generated as a function of all possible data transitions, i.e., sign-bit transitions and random LSB transitions.
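A DBT-flavored version of Equations (8.23) and (8.24) is sketched below. The split into one UWN coefficient for the LSB region plus a table of sign-region coefficients keyed by the sign transition is a simplified, hypothetical layout in the spirit of the text; the coefficient values are invented for illustration.

```python
# Hypothetical DBT coefficients for an n-bit ripple-carry subtractor (fF/bit).
C_UU_FF = 50.0  # UWN (LSB) region: random-to-random bit transitions
C_SS_FF = {     # sign region, keyed by (previous sign, next sign) of the data
    ("+", "+"): 20.0,
    ("+", "-"): 80.0,
    ("-", "+"): 80.0,
    ("-", "-"): 20.0,
}


def subtractor_capacitance(n_uwn: int, n_sign: int,
                           sign_transition: tuple[str, str]) -> float:
    """Switched capacitance C (farads) for one input transition, DBT style."""
    c_ff = n_uwn * C_UU_FF + n_sign * C_SS_FF[sign_transition]
    return c_ff * 1e-15


def module_power(c_switched: float, vdd: float, f: float) -> float:
    """Eq. (8.23): P = C * Vdd^2 * f, in watts."""
    return c_switched * vdd ** 2 * f


# 16-bit word: 12 UWN bits and 4 sign bits, sign flips from + to -
c = subtractor_capacitance(12, 4, ("+", "-"))
print(module_power(c, 3.3, 20e6))  # about 2.0e-4 W
```

With `n_sign = 0` the model collapses to the single-coefficient UWN case $C = C_{eff}\, n$ of Equation (8.24).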
To extract the capacitive coefficients of each module, the library should be characterized. This operation is performed once for a given library. The process of extraction consists of several steps:

■ Pattern generation. Input patterns to a module are generated based on the DBT data model. Both random (UWN) and sign data streams should be used. The input patterns containing the UWN component must be simulated for several cycles. This allows convergence of the average capacitance.

■ Simulation. The generated patterns are fed to a simulator (such as a circuit simulator) from which the switching capacitances are extracted.

■ Capacitive coefficient extraction. The simulation step produces the average effective switching capacitances for the entire series of applied input transitions, such as UU, SS, etc. The capacitive coefficients are extracted from the effective switching capacitances and the complexity parameters.
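The last step relates simulated capacitances to the complexity parameter. The text does not specify how; one simple possibility, shown here as an assumption, is a least-squares fit of $C = C_{eff}\, n$ through the origin over several simulated word lengths, which gives $C_{eff} = \sum n C / \sum n^2$. The data values are hypothetical.

```python
from typing import Sequence


def fit_capacitive_coefficient(widths: Sequence[int],
                               capacitances: Sequence[float]) -> float:
    """Least-squares C_eff for the model C = C_eff * n (fit through the origin).

    widths: word lengths n that were simulated; capacitances: the average
    effective switched capacitance measured for each width (same units as
    the returned coefficient times bits, e.g. fF).
    """
    if not widths or len(widths) != len(capacitances):
        raise ValueError("need matching, non-empty width and capacitance lists")
    num = sum(n * c for n, c in zip(widths, capacitances))
    den = sum(n * n for n in widths)
    return num / den


# Hypothetical simulated averages (fF) for 8-, 16- and 32-bit subtractors
print(fit_capacitive_coefficient([8, 16, 32], [410.0, 790.0, 1605.0]))  # about 50 fF/bit
```

One such fit would be run per transition type (UU, SS, ...) to fill the coefficient table.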
Based on this methodology, a power analysis tool at the architectural level has been developed [37].
U and S mean the UWN and sign parts of the input bits, respectively.

8.5.4 Behavioral-Level Power Estimation

A behavioral representation describes the function of a system versus a set of inputs. The behavior can be specified, for example, by algorithms (in Verilog, VHDL, etc.) or by boolean functions. Power estimation at the behavioral level relates the consumed energy to the execution of an algorithm. Decisions at the system and behavioral levels can influence the final power dissipation of the circuit by several orders of magnitude.
One approach for power estimation at the behavioral level has been proposed in [38]. It is based on the combination of analytical and stochastic power models. In this work, a class of applications such as real-time DSPs is considered for the power estimator. In the behavioral context, the power consumed by a hardware resource is given by

$$P = N\, C\, V_{DD}^2 f \qquad (8.25)$$

where N is the number of accesses to the resource over the period of computation, C is the average capacitance switched per access, and f is the computation frequency.
In [38] the power of some hardware resources, such as execution units, registers, etc., is analytically modeled (using Equation (8.25)) from the Control/Data Flow Graph (CDFG) used to represent the design. The average capacitance switched per access for a particular hardware resource is estimated from the white noise data model. The power consumed by hardware resources such as controllers, interconnects, and the clock network is difficult to estimate. Statistics from a large number of realized chips are used to estimate the switched capacitance of these hardware resources.
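Equation (8.25) applied to the analytically modeled resources can be sketched as follows. The resource names, per-access capacitances, and the operation list standing in for one iteration of a CDFG are all hypothetical; controllers, interconnect, and clock power, which [38] estimates statistically, are deliberately left out.

```python
from collections import Counter
from typing import Iterable

# Hypothetical average switched capacitance per access (farads), e.g. from a
# white-noise characterization of each execution unit or register.
CAP_PER_ACCESS = {"mult": 1.2e-12, "add": 0.3e-12, "reg": 0.05e-12}


def behavioral_power(operations: Iterable[str], cap_per_access: dict[str, float],
                     vdd: float, f_comp: float) -> float:
    """Sum of Eq. (8.25), P = N * C * Vdd^2 * f, over resource types.

    `operations` lists the resource accesses made during one computation
    (one pass through the CDFG); N is the access count per resource type
    and f_comp is the computation (e.g. sample) rate.
    """
    counts = Counter(operations)
    return sum(n * cap_per_access[res] * vdd ** 2 * f_comp
               for res, n in counts.items())


# One sample of a 4-tap FIR filter: 4 multiplies, 3 additions, 4 register writes
fir_ops = ["mult"] * 4 + ["add"] * 3 + ["reg"] * 4
print(behavioral_power(fir_ops, CAP_PER_ACCESS, 3.3, 1e6))  # about 6.4e-5 W
```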
8.6 CHAPTER SUMMARY

Low dynamic power techniques at several levels of abstraction have been presented. Algorithmic and architectural decisions can influence the power dissipation of a circuit by orders of magnitude. Therefore, CAD tools that help the designer analyze the power of the circuit at these levels are needed. At lower levels of the design, the power reduction techniques offer some savings, but less than those expected at higher levels. Several power estimation tools have been discussed at the different levels of the design. Keep in mind that circuit simulators provide high accuracy for power analysis and take into account all power components.
REFERENCES
[1] K-Y. Chao and D. F. Wong, "Low Power Considerations in Floorplan Design," Proc. of the International Workshop on Low Power Design, pp. 45-50, April 1994.
[2] H. Vaishnav and M. Pedram, "PCUBE: A Performance Driven Placement Algorithm for Lower Power Designs," Proc. of the EURO-DAC'93, pp. 72-77, September 1993.
[3] A. Shen, A. Ghosh, S. Devadas, and K. Keutzer, "On Average Power Dissipation and Random Pattern Testability of CMOS Combinational Logic Network," Proc. of the International Conference on Computer-Aided Design, pp. 402-407, November 1992.
[4] K. Keutzer, "The Impact of CAD on the Design of Low Power Digital Circuits," IEEE Symposium on Low Power Electronics, Tech. Dig., pp. 42-45, October 1994.
[5] C-Y. Tsui, M. Pedram, and A. M. Despain, "Technology Decomposition and Mapping Targeting Low Power Dissipation," 30th ACM/IEEE Design Automation Conference, Tech. Dig., pp. 68-73, June 1993.
[6] R. Murgai, R. K. Brayton, and A. Sangiovanni-Vincentelli, "Decomposition of Logic Functions for Minimum Transition Activity," Proc. of the International Workshop on Low Power Design, pp. 33-38, April 1994.
[7] V. Tiwari, P. Ashar, and S. Malik, "Technology Mapping for Low Power," 30th ACM/IEEE Design Automation Conference, Tech. Dig., pp. 74-79, June 1993.
[8] K. Scott and K. Keutzer, "Improving Cell Libraries for Synthesis," IEEE Custom Integrated Circuits Conference, Tech. Dig., pp. 128-131, May 1994.
[9] C. Lemonds and S. Mahant Shetti, "A Low Power 16 by 16 Multiplier using Transition Reduction Circuitry," Proc. of the International Workshop on Low Power Design, pp. 139-142, April 1994.
[10] A. Chandrakasan, S. Sheng, and R. W. Brodersen, "Low-Power CMOS Design," IEEE Journal of Solid-State Circuits, vol. 27, no. 4, pp. 472-484, April 1992.
[11] U. Ko, P. T. Balsara, and W. Lee, "A Self-timed Method to Minimize Spurious Transitions in Low Power CMOS Circuits," IEEE Symposium on Low Power Electronics, Tech. Dig., pp. 62-63, October 1994.
[12] R. I. Bahar, H. Cho, G. D. Hachtel, E. Macii, and F. Somenzi, "An Application of ADD-Based Timing Analysis to Combinational Low Power Re-Synthesis," Proc. of the International Workshop on Low Power Design, pp. 139-142, April 1994.
[13] M. Alidina, J. Monteiro, S. Devadas, A. Ghosh, and M. Papaefthymiou, "Precomputation-Based Sequential Logic Optimization for Low Power," IEEE Transactions on Very Large Scale Integration Systems, vol. 2, no. 4, pp. 426-436, December 1994.
[14] A. Gersho and R. Gray, "Vector Quantization and Signal Compression," Kluwer Academic Publishers, MA, 1992.
[15] D. B. Lidsky and J. M. Rabaey, "Low-Power Design of Memory Intensive Functions," IEEE Symposium on Low Power Electronics, Tech. Dig., pp. 16-17, October 1994.
[16] A. P. Chandrakasan, A. Burstein, and R. W. Brodersen, "A Low-Power Chipset for a Portable Multimedia I/O Terminal," IEEE Journal of Solid-State Circuits, vol. 29, no. 12, pp. 1415-1428, December 1994.
[17] J. Schutz, "A 3.3 V 0.6 μm BiCMOS Superscalar Microprocessor," IEEE International Solid-State Circuits Conference, Tech. Dig., pp. 202-203, February 1994.
[18] N. K. Yeung, Y-H. Sutu, T. Y-F. Su, E. T. Pat, C-C. Chao, S. Akki, D. D. Yau, and R. Lodenquai, "The Design of a 55SPECint92 RISC Processor under 2W," IEEE International Solid-State Circuits Conference, Tech. Dig., pp. 206-207, February 1994.
[19] D. Pham, et al., "A 3.0W 75SPECint92 85SPECfp92 Superscalar RISC," IEEE International Solid-State Circuits Conference, Tech. Dig., pp. 212-213, February 1994.
[20] G. Gerosa, et al., "A 2.2 W 80 MHz Superscalar RISC Microprocessor," IEEE Journal of Solid-State Circuits, vol. 29, no. 12, pp. 1440-1454, December 1994.
[21] S. Gary, C. Dietz, J. Eno, G. Gerosa, S. Park, and H. Sanchez, "The PowerPC 603 Microprocessor: A Low-Power Design for Portable Applications," Proc. of COMPCON'94, Tech. Dig., pp. 307-315, February 1994.
[22] R. K. Kolagotla, S-S. Yu, and J. F. JaJa, "VLSI Implementation of a Tree Searched Vector Quantizer," IEEE Transactions on Signal Processing, vol. 41, no. 2, pp. 901-905, February 1993.
[23] C-L. Su, C-Y. Tsui, and A. M. Despain, "Low Power Architecture Design and Compilation Techniques for High-Performance Processors," Proceedings of COMPCON'94, Tech. Dig., pp. 489-498, February 1994.
[24] A-C. Deng, "Power Analysis for CMOS/BiCMOS Circuits," Proc. of the International Workshop on Low Power Design, pp. 3-8, April 1994.
[25] C. M. Huizer, "Power Dissipation Analysis of CMOS VLSI Circuits by Means of Switch-Level Simulation," Proc. of the European Solid-State Circuits Conference, pp. 61-64, 1990.
[26] M. A. Cirit, "Estimating Dynamic Power Consumption of CMOS Circuits," IEEE International Conference on Computer-Aided Design, pp. 534-537, November 1987.
[27] F. Najm, I. Hajj, and P. Yang, "An Extension of Probabilistic Simulation for Reliability Analysis of CMOS VLSI Circuits," 28th ACM/IEEE Design Automation Conference, Tech. Dig., pp. 644-649, June 1991.
[28] A. Ghosh, S. Devadas, K. Keutzer, and J. White, "Estimation of Average Switching Activity in Combinational and Sequential Circuits," 29th ACM/IEEE Design Automation Conference, Tech. Dig., pp. 253-259, June 1992.
[29] F. N. Najm, "A Survey of Power Estimation Techniques in VLSI Circuits," IEEE Transactions on Very Large Scale Integration Systems, vol. 2, no. 4, pp. 446-455, December 1994.
[30] R. E. Bryant, "Graph-Based Algorithms for Boolean Function Manipulation," IEEE Transactions on Computer-Aided Design, pp. 677-691, August 1986.
[31] B. J. George, G. Yeap, M. G. Wloka, S. C. Tyler, and D. Gossain, "Power Analysis for Semi-Custom Design," IEEE Custom Integrated Circuits Conference, Tech. Dig., pp. 249-252, 1994.
[32] B. J. George, G. Yeap, M. G. Wloka, S. C. Tyler, and D. Gossain, "Power Analysis and Characterization for Semi-Custom Design," Proc. of the International Workshop on Low Power Design, pp. 215-218, April 1994.
[33] D. Liu and C. Svensson, "Power Consumption Estimation in CMOS VLSI Chips," IEEE Journal of Solid-State Circuits, vol. 29, no. 6, pp. 663-670, June 1994.
[34] H. B. Bakoglu, "Circuits, Interconnections, and Packaging for VLSI," Addison-Wesley, Reading, MA, 1990.
[35] S. R. Powell and P. M. Chau, "Estimating Power Dissipation of VLSI Signal Processing Chips: The PFA Technique," VLSI Signal Processing IV, pp. 250-259, 1990.
[36] P. E. Landman and J. M. Rabaey, "Power Estimation for High Level Synthesis," EDAC-EUROASIC, Paris, France, pp. 361-366, February 1993.
[37] P. E. Landman and J. M. Rabaey, "Black-Box Capacitance Models for Architectural Power Analysis," Proceedings of the International Workshop on Low Power Design, Napa, CA, pp. 165-170, April 1994.
[38] R. Mehra and J. Rabaey, "Behavioral Level Power Estimation and Exploration," Proceedings of the International Workshop on Low Power Design, Napa, CA, pp. 191-202, April 1994.