
Implementing and Evaluating Adiabatic Arithmetic Units*

Micah C. Knapp, Peter J. Kindlmann, Marios C. Papaefthymiou


Department of Electrical Engineering
Yale University
New Haven, CT 06520

*This work was supported in part by a Young Investigator Award from the US Army Research Office under Grant No. DAAH04-95-1-0281.

7.2.1 115
IEEE 1996 CUSTOM INTEGRATED CIRCUITS CONFERENCE
0-7803-3177-6 $5.00 © 1996 IEEE

Abstract

In recent years, several adiabatic logic architectures have been proposed for low-power VLSI design. However, no work has been presented describing the implementation and evaluation of nontrivial adiabatic circuits. We have evaluated a specific adiabatic architecture and used it in the design of low-power arithmetic units. We investigated implementation issues specific to adiabatic system development and performed a systematic comparison of our designs with corresponding CMOS circuits. In this paper we describe our adiabatic designs, discuss implementation issues at the logic and architectural levels, and report our empirical findings.

1. Introduction

Low-power circuit design methodologies based on the thermodynamic principle of adiabatic changes have received considerable attention in recent years. Adiabatic circuits achieve low energy consumption by maintaining small potential differences across their devices while they are conducting and by allowing the energy stored in their capacitors to be recycled. Energy is supplied periodically to the gates of an adiabatic circuit by means of slowly changing clocking waveforms. These waveforms are driven by oscillators that recycle the energy stored in the circuit's capacitors. Despite the development of several adiabatic logic architectures, there are no reports on the difficulties of developing nontrivial adiabatic circuits or on the power savings that such circuits actually achieve.

We recently performed an empirical study of several adiabatic logic structures that have been proposed in the literature. We focused our investigation on the logic architecture described in [4], and we used it to design adiabatic adders and multipliers with low energy dissipation. In this paper we describe our designs and discuss related logic- and architecture-level implementation issues such as transistor sizing, data synchronization, latency, and throughput. We also present an empirical comparison of our designs with corresponding CMOS designs in terms of their energy consumption. When simulated at operating frequencies of 10MHz and 33MHz, our arithmetic units achieved energy savings factors of 4 and 3, respectively.

The remainder of this paper has five sections. In Section 2 we briefly present a few adiabatic logic structures, and in Section 3 we discuss the logic-level implementation issues associated with the architecture we used in our designs. In Section 4 we describe our designs and discuss related system-level implementation issues. In Section 5 we present the results of our empirical comparison. We conclude with a discussion of possible target systems for adiabatic implementation.

2. Adiabatic logic families

Researchers have proposed several adiabatic logic architectures. A family of structures based on reversible logic has been described in [6, 7]. An architecture inspired by bipolar transistor structures has been proposed in [3]. Two fairly simple adiabatic architectures based on diodes have been presented in [1, 2], and an extension of these structures that uses cross-coupled p-type transistors instead of diodes has been described in [4]. Inverters from the last three logic families are shown in Figure 1; each family is named after the number and types of devices its inverter requires.

Our empirical study covered the structures in Figure 1 and focused on the 2N2P architecture, which has two NMOS and two PMOS transistors in its inverter. All of these adiabatic structures are dynamic; that is, the input clock provides the energy that drives the output loads of each gate. The adiabatic logic families we studied require either two or four clock phases for their operation, with relative shifts of 180° or 90°, respectively. These clocks can be sinusoidal and can be driven from free-running oscillators.

Our simulations of the simple adiabatic structures in [1, 2, 4] indicated that the most important operational characteristics for the successful use of adiabatic logic architectures are a constant load presented by the gates to the clock, the reduction or elimination of floating output nodes, the ability to generate a signal and its complement on the same clock phase, a small number of clock phases needed for correct operation, the ability to drive a high output to Vdd and a low output to Vss, predictable operation under normal operating conditions, and the reduction or elimination of diodes. No single logic family met all of these requirements, but we found that the 2N2P devices met more of them than the other structures did.

[Figure 1: Adiabatic inverters with and without diodes. Panels include 1T1D and 2N-2N2D.]

3. Logic-level issues concerning 2N2P

The 2N2P logic family is very similar to CMOS CVSL logic [5] and thus possesses several of the positive characteristics of conventional CMOS. Since 2N2P gates use ground nodes and avoid diodes, they can drive their output loads to Vdd and Vss. Also, for at least some portion of the clock cycle, the outputs of the 2N2P gates are not floating and are therefore much less susceptible to noise. The 2N2P logic gates generate both the output signal and its complement on the same clock phase and can use the differential signal pairs to generate fairly complex logic functions such as XOR in one gate. Another benefit of generating both a signal and its complement from the same gate is that the load seen by the input clock is roughly the same regardless of which output level is being driven. This balanced load prevents the clock oscillator from changing frequency due to shifts in the capacitance of the circuit. Even though they are connected to ground, the 2N2P logic gates can recycle a large percentage of the energy used to charge their output nodes.

Unfortunately, the 2N2P logic family has some potentially serious problems. The logic gates cannot hold an output value for more than a quarter of the clock cycle and therefore require four clocks to build logic circuits. Moreover, the 2N2P logic gates use a ratioed gate structure, so the PMOS and NMOS transistors must be sized according to the output load and to each other. If an incorrect ratio is used, a logic fault may occur. In addition to driving an incorrect output value, such a fault causes the gate to consume significant amounts of energy.

We looked closely at the logic fault that can occur with 2N2P and discovered that it is caused by the manner in which the logic gates evaluate their output levels. We describe the events that cause a logic fault in terms of the 2N2P inverter (shown in Figure 1). When the input clock reaches the PMOS threshold voltage, both PMOS transistors x and y turn on and start conducting. The drain of one of the two PMOS transistors, for example x, is connected to ground through a tree of NMOS gates, and its output OUT_b is thus forced low. Due to the cross-coupling of OUT and OUT_b, transistor y remains on and turns x off when the voltage of signal OUT exceeds the PMOS threshold voltage. The problem arises when the PMOS transistors are too strong with respect to the NMOS tree. In this case, the output OUT_b, which should be forced to ground through the NMOS tree, is driven to the PMOS threshold value via x. When this occurs, the output OUT_b is forced to the wrong state.

Figure 2 illustrates the logic fault that can occur with a 2N2P logic inverter. The top graph shows the input data. The middle graph shows the output waveform of a correctly functioning 2N2P inverter, where the PMOS width is 20λ, the NMOS width is 3λ, and the output load is 100fF. The bottom graph shows the output of a 2N2P inverter exhibiting a functional fault (the second output should be at 0 instead of 1). In this case, the PMOS width is 30λ, the NMOS width is 3λ, and the output load is 100fF.

[Figure 2: Fault in an incorrectly ratioed 2N2P inverter.]

To prevent this logic fault, we recommend two sizing rules. First, the NMOS logic trees should be kept fairly small, that is, with few NMOS transistors in series between the output and ground. Second, the PMOS transistors should be kept fairly weak with respect to the NMOS transistors. Unfortunately, weak PMOS transistors connect the output loads to the input clock through a large resistance and therefore increase the energy consumed by the gate. These factors must be considered and balanced correctly to build a functional, low-power 2N2P logic gate. The 2N2P logic gates must be used with caution, and large circuits built with this logic family must be thoroughly simulated with accurate load estimates to ensure a functional circuit.
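
The four-phase clocking described in Sections 2 and 3 can be sketched numerically. This minimal illustration rests on our own assumptions. The phases are modeled as ideal raised-cosine waveforms that swing between 0 V and Vdd = 3 V, and phase k lags phase 0 by k/4 of a period. A gate at cascade depth d is assumed to be driven by phase d mod 4. The function names are hypothetical, and the sketch models neither the authors' oscillators nor the behavior of the gates themselves.

```python
import math

VDD = 3.0     # volts, matching the 3 V designs described in Section 4
N_PHASES = 4  # 2N2P needs four phases shifted by 90 degrees


def clock_phases(t, freq_hz, n_phases=N_PHASES, vdd=VDD):
    """Voltages of n sinusoidal power-clock phases at time t (seconds).

    Phase k lags phase 0 by k/n of a period (90 degrees for n = 4,
    180 degrees for a two-phase family). Each swings between 0 V and vdd.
    """
    omega = 2.0 * math.pi * freq_hz
    return [
        0.5 * vdd * (1.0 - math.cos(omega * t - 2.0 * math.pi * k / n_phases))
        for k in range(n_phases)
    ]


def peak_time(k, freq_hz, n_phases=N_PHASES):
    """First time (seconds) at which phase k reaches vdd."""
    # 1 - cos(x) peaks at x = pi, so omega * t = pi + 2 * pi * k / n.
    return (0.5 + k / n_phases) / freq_hz


def phase_for_gate(depth, n_phases=N_PHASES):
    """Hypothetical assignment: each stage of a cascade uses the next phase."""
    return depth % n_phases


if __name__ == "__main__":
    f = 10e6  # 10 MHz gate clock
    spacing = peak_time(1, f) - peak_time(0, f)
    print(round(spacing * 1e9, 6))  # 25.0 (ns between successive phases)
    print([round(v, 3) for v in clock_phases(peak_time(1, f), f)])  # [1.5, 3.0, 1.5, 0.0]
```

At a 10MHz gate clock, successive phases peak 25ns apart, so each stage of a cascade hands its result to the next stage a quarter period later. With `n_phases=2`, the same functions give the 180-degree, half-period spacing of the two-phase families.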

4. Systems implementation issues

We have used 2N2P logic gates to design two adiabatic circuits of medium complexity: a 4x4 bit carry lookahead adder with about 200 transistors and a 4x4 bit multiplier (shown in Figure 3) with about 500 transistors. Both circuits assume parallel data in and out, have been extensively simulated using Hspice, and function correctly. We have also designed larger circuits, an 8x8 bit carry lookahead adder with about 700 transistors and an 8x8 bit multiplier with approximately 2500 transistors, which also appear to be functionally correct based on preliminary simulations. For all designs we used MOSIS 1μm process technology with Vdd = 3 Volts and minimum-size transistors (w/l = 3λ/2λ, where λ = 0.5μm).

[Figure 3: 4x4 bit adiabatic multiplier.]

[Figure 4: Power vs frequency for 4x4 bit adders.]

We have identified three important system design issues with 2N2P that are largely applicable to system design with any adiabatic logic family. The first issue is that adiabatic circuits have limited potential to achieve both significant energy savings and high performance. One of the largest performance-limiting factors is that the dynamic gates in an adiabatic circuit can perform only one logic evaluation per clock cycle. Since the gate clock frequencies must be low to achieve savings in energy consumption, the overall circuit speed can be very slow. For example, consider an adiabatic circuit consisting of four cascaded logic gates. Each gate delays its output by one phase (a quarter clock period), so the latency of this circuit is 100ns, assuming a 10MHz gate clock frequency. The upper bound on the throughput is 10MHz (that is, new input data every 100ns), since each gate in the circuit can analyze new data only once per cycle. Since 2N2P circuits have large latencies and relatively low throughputs, each gate must perform as complicated a function as possible in order to increase system performance. In Section 3, however, we saw that a long and weak NMOS tree can lead to logic faults with 2N2P. Thus, the complexity of the functions that a gate can perform reliably, without excessively increasing its size and energy consumption, is limited.

Another critical system issue that stems from the dynamic nature of adiabatic gates is data synchronization. Since every gate introduces a phase delay from input to output, two signals that originate from different clock phases must be synchronized before they can be used as inputs to the same adiabatic gate. This requirement leads to the insertion of data buffers, each of which shifts a signal by one phase (a quarter clock period). A large number of additional gates are thus added to the circuit just to maintain data synchronization. For example, our 4x4 bit adiabatic multiplier requires the addition of 25 data buffers to maintain data synchronization. These buffers are denoted as thick black lines in the block diagram of our 4x4 adiabatic multiplier shown in Figure 3. An adiabatic logic family that requires only two clocks (where each gate introduces a half clock period delay) would require fewer data buffers. However, it would get less work done per period, since only two logic functions could be performed per clock period.

Every adiabatic logic family we investigated uses mostly analog signals to perform digital functions, thus allowing the logic to recycle energy and limit the power dissipated by the circuit. Unfortunately, such systems cannot be simulated with simple digital simulators. More complex analog simulators such as Hspice must be used not only to test critical circuits and special corner cases, but also to test the functionality of the circuit. For more complex circuits, the computing requirements can be very high. For example, simulations of our 4x4 multiplier with 256 input test vectors take on the order of 4 hours to complete using Hspice on a state-of-the-art PowerPC-based workstation with 64MB of main memory. Since adiabatic systems must be thoroughly tested to ensure correct functionality, large-scale development of adiabatic systems can be very expensive in terms of time and resources.
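
The phase bookkeeping behind the latency and synchronization issues above can be sketched as an as-soon-as-possible (ASAP) schedule. In this schedule, primary inputs sit at phase 0 and every gate adds one phase. A signal that arrives early is delayed by buffers, each worth one phase. This illustrative sketch rests on our own assumptions: ASAP placement, one shared buffer chain per driving signal, and outputs aligned for parallel data out. The paper does not describe how its multiplier buffers were placed, and the netlist format and function names here are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

N_PHASES = 4  # four clock phases, 90 degrees apart, as 2N2P requires


def schedule(netlist, primary_inputs, primary_outputs=()):
    """ASAP phase schedule for a four-phase adiabatic netlist.

    netlist maps each gate name to the list of signals driving it; every
    driving signal must be another gate or a primary input. Primary inputs
    are valid at phase 0, and each gate adds one phase of delay.
    Returns (phase of every signal, number of one-phase buffers needed).
    """
    phase = {name: 0 for name in primary_inputs}
    for node in TopologicalSorter(netlist).static_order():
        if node not in phase:  # a gate: one phase after its latest input
            phase[node] = 1 + max(phase[src] for src in netlist[node])

    # A gate at phase p consumes its inputs at phase p - 1. An early signal
    # gets one shared buffer chain, as long as its latest consumer requires.
    chain = {}
    for gate, srcs in netlist.items():
        for src in srcs:
            lag = phase[gate] - 1 - phase[src]
            chain[src] = max(chain.get(src, 0), lag)

    # Parallel data out: delay early outputs until the latest one is ready.
    if primary_outputs:
        last = max(phase[out] for out in primary_outputs)
        for out in primary_outputs:
            chain[out] = max(chain.get(out, 0), last - phase[out])

    return phase, sum(chain.values())


def latency_ns(phases, gate_clock_hz, n_phases=N_PHASES):
    """Latency in ns of a path that is `phases` clock phases deep."""
    return phases * 1e9 / (gate_clock_hz * n_phases)


def max_throughput_hz(gate_clock_hz):
    """Each gate evaluates once per cycle, so new data at most once per cycle."""
    return gate_clock_hz


if __name__ == "__main__":
    # Four cascaded gates, as in the latency example above.
    cascade = {"g1": ["a"], "g2": ["g1"], "g3": ["g2"], "g4": ["g3"]}
    phases, buffers = schedule(cascade, ["a"], ["g4"])
    print(phases["g4"], buffers, latency_ns(phases["g4"], 10e6))  # 4 0 100.0

    # Input b is ready two phases before y consumes it: two buffers.
    skewed = {"n1": ["a"], "n2": ["n1"], "y": ["n2", "b"]}
    print(schedule(skewed, ["a", "b"], ["y"])[1])  # 2
```

ASAP placement with shared chains is only one simple choice. Moving gates to later phases can trade buffers against latency, and the sketch is not meant to reproduce the 25 buffers reported for the multiplier.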

5. Systems evaluation

We compared the energy consumption of our adiabatic arithmetic units with that of equivalent CMOS circuits that we designed for minimum energy consumption. The input data rates for the CMOS circuits were identical to those for the adiabatic circuits. Our comparison was therefore based on energy consumed per computation, since the circuits did equal amounts of computing during equal amounts of time. We reduced the amount of glitching in the CMOS circuits using standard CMOS design techniques. Since the energy consumed per operation in a CMOS circuit can be approximated by $C V_{dd}^2$, we used minimum-sized transistors in all CMOS gates in order to minimize the overall circuit capacitance.

Figures 4 and 5 show the energy consumption per clock cycle for our 4x4 carry lookahead adders and 4x4 multipliers, respectively. The graphs were obtained from Hspice simulations of 256 test vectors at data rates of 10, 16, 25, and 33MHz. Our adiabatic circuits exhibit significant savings in energy consumption, especially at low clock frequencies. At 10MHz the relative savings factor is 4, and at 33MHz the relative savings factor is 3. Preliminary simulations of our 8x8 bit designs indicate power savings similar to those achieved with the 4x4 bit circuits.

Figures 6 and 7 illustrate the performance issues in adiabatic systems design. Figure 6 illustrates the latency of our 4x4 bit adiabatic multiplier running at 33MHz. The top graph shows one of the four gate clocks, and the middle graph shows the input. The bottom graph shows the output, which lags the input signal by five phase delays (about 38ns at 33MHz). When compared with CMOS, adiabatic circuits suffer from very long latencies. In Figure 7, the top graph shows the input and the bottom graph shows the output for the CMOS multiplier. In this circuit, the input/output latency is 8ns and the effective throughput can reach 125MHz. Our 4x4 adiabatic adder has a latency of 3 clock phases, which is still much longer than that of the CMOS adder. For the 8x8 bit designs, the differences are even more significant. The latency of our 8x8 adiabatic adder is 4 clock phases, and the latency of our 8x8 adiabatic multiplier is 12 clock phases.

[Figure 6: Latency in 4x4 bit adiabatic multiplier.]

[Figure 7: Latency in 4x4 bit CMOS multiplier.]

6. Conclusion

Large-scale system development using adiabatic technologies is more complex than conventional CMOS circuit development because of data synchronization and simulation issues. Moreover, adiabatic circuits have large latencies due to the dynamic nature of their gates.

Another subtle yet important performance issue with adiabatic systems is that although it is possible to build adiabatic memory elements, these elements have a latency of one clock cycle. Therefore, they can only be clocked at a maximum frequency of half the gate clock frequency. As a result, the clock frequencies specified by default for adiabatic and CMOS circuits are not directly comparable: the former usually refers to gate frequencies, and the latter to system frequencies.

Therefore, it seems that adiabatic technology is more suitable for low-speed, combinational circuits.

References

[1] J. Denker, S. Avery, A. Dickinson, A. Kramer, and T. Wik. Adiabatic computing with the 2N-2N2D logic family. In 1994 International Workshop on Low Power Design, April 1994.
[2] A. Dickinson and J. Denker. Adiabatic dynamic logic. In CICC, 1994.
[3] R. Hinman and M. Schlecht. Recovered energy logic: A highly efficient alternative to today's logic circuits. In IEEE Power Electronics Specialists Conference Record, pages 17-26, 1993.
[4] A. Kramer, J. S. Denker, B. Flower, and J. Moroney. 2nd order adiabatic computation with 2N-2P and 2N-2N2P logic circuits. In 1995 International Workshop on Low Power Design, April 1995.
[5] N. Weste and K. Eshraghian. CMOS VLSI Design. Addison-Wesley, Reading, Massachusetts, 1985.
[6] S. Younis and T. Knight. Practical implementation of charge recovering asymptotically zero power CMOS. In Research in Integrated Systems: Proceedings of the 1993 Symposium, March 1993.
[7] S. Younis and T. Knight. Asymptotically zero energy split-level charge recovery logic. In Proceedings of the 1994 International Workshop on Low Power Design, March 1994.
