

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 11, NO. 3, JUNE 2003

Minimization of Switching Activities of Partial Products for Designing Low-Power Multipliers


Oscal T.-C. Chen, Sandy Wang, and Yi-Wen Wu
Abstract: This work presents low-power 2's complement multipliers that minimize the switching activities of partial products using the radix-4 Booth algorithm. Of the two input data, the one with the smaller effective dynamic range is processed to generate the Booth codes, which increases the probability that the partial products are zero. By employing a dynamic-range determination unit to control the input data paths, a multiplier with a column-based adder tree of compressors or counters is designed. To further reduce power consumption, two multipliers based on row-based and hybrid-based adder trees are realized that operate only on the effective dynamic ranges of the input data. The functional blocks of these two multipliers preserve their previous input states for noneffective dynamic data ranges and thus reduce the number of their switching operations. To show that the proposed multipliers dissipate low power, theoretical analyses of the switching activities of partial products are derived. The proposed 16×16-bit multiplier with the column-based adder tree saves more than 31.2%, 19.1%, and 33.0% of the power consumed by the conventional multiplier in the ADPCM audio, G.723.1 speech, and wavelet-based image coders, respectively. Furthermore, the proposed multipliers with row-based and hybrid-based adder trees reduce power consumption by over 35.3%, 25.3%, and 39.6%, and by over 33.4%, 24.9%, and 36.9%, respectively. When the product of hardware area, critical delay, and power consumption is considered, the proposed multipliers can outperform the conventional multipliers. Consequently, the proposed multipliers can be broadly used in media processing to achieve low power consumption at limited hardware cost and with little loss of speed.

Index Terms: Adder tree, arithmetic, digital, low-power design, switching activity.

Manuscript received July 4, 2000; revised April 2, 2002. This work was supported in part by the Computer and Communication Research Laboratories, ITRI, Taiwan, under Contract TI-89024, and in part by the National Science Council, Taiwan, under Contract 88-2736-L-194-003. The authors are with the Department of Electrical Engineering, Signal and Media Laboratories, National Chung Cheng University, Chia-Yi 621, Taiwan, R.O.C. (e-mail: oscal@ee.ccu.edu.tw).
Digital Object Identifier 10.1109/TVLSI.2003.810788

1063-8210/03$17.00 © 2003 IEEE

I. INTRODUCTION

Advances in microelectronic technology have led to more effective encoding of data, more reliable transmission of information, and more embedded intelligence in systems. In particular, to meet the increasing market demand for portable applications, these microelectronic devices must consume very little power. Consequently, many digital signal processing chips are now designed for low power dissipation [1], [2]. In such systems, the multiplier is a fundamental arithmetic unit. A multiplier manipulates two input data to generate many partial products for subsequent addition, which in CMOS circuit design requires many switching activities [3]. Thus, switching activities within the functional units of a multiplier account for the majority of its power dissipation, as given by

$$P = \alpha\, C_L\, V_{dd}^{2}\, f \tag{1}$$

where $\alpha$ is the switching activity parameter, $C_L$ is the loading capacitance, $V_{dd}$ is the operating voltage, and $f$ is the operating frequency. The product $\alpha C_L$ can also be viewed as the effective switching capacitance of the transistor nodes during charging and discharging. Therefore, minimizing switching activities can effectively reduce power dissipation without affecting the circuit's operational performance.

Many researchers have proposed approaches that use modified algorithms, architectures, and circuits to reduce power consumption [4]–[9]. Abu-Khater et al. developed circuit techniques for low-power, high-performance multiplier designs [4]. Moshnyaga et al. analyzed the algorithmic, structural, and circuit levels, and used sign generation and 4:2 compressors to minimize switching activities [5]. Angel and Swartzlander suggested an efficient sign-extension scheme for processing the sign bits [6], allowing the multiplier to bypass processing sign extensions and thus reducing power dissipation. Yu et al. reorganized a Booth-encoded carry-save adder array in a multiplier design to reduce power consumption [7]. Goldovsky et al. developed modified radix-4 Booth encoders to generate partial products that are summed by (3,2), (5,3), and (7,4) counters in an array that reduces them to sum and carry vectors [8]. Mahant-Shetti et al. employed a bottom-up temporal tiling approach to design a leapfrog array multiplier that minimizes spurious transition activity [9]. In this work, low-power multipliers are investigated by minimizing the switching activities of partial products according to the effective dynamic ranges of the input data. The radix-4 Booth algorithm is used in the proposed multipliers to reduce implementation complexity.

For every two input data, the one with the smaller effective dynamic range is processed to yield the Booth codes. According to these Booth codes, the other datum is multiplied by −2, −1, 0, 1, or 2 to generate partial products, which are then shifted and summed in parallel to yield the final result. Because the datum with the smaller effective dynamic range is the one Booth encoded, these partial products have a greater chance of being zero. The switching activities of the partial products therefore decrease, implying a decline in power dissipation. To realize the proposed multipliers, a dynamic-range determination unit can easily be placed in front of the Booth decoders and adder trees to switch or pass the input data flows, where the adder trees can be implemented with column-based, row-based, or hybrid-based structures. The proposed multipliers using column-based, row-based, and hybrid-based adder trees are named the proposed column-based, row-based, and hybrid-based multipliers, respectively. When only the dynamic-range determination unit is placed in front of a conventional multiplier that uses counters and compressors, the result is the proposed column-based multiplier, shown in Fig. 1(a).

The conventional Booth-algorithm multiplier adds partial products in the column direction. Although partial products are more likely to be zero in the proposed column-based multiplier than in the conventional one, some compressors or counters that sum these zero products may still consume power, because they add the switched sum or carry-out bits of neighboring compressors or counters. To improve on this, partial products are added in the row direction, which reduces both the number of partial products and the number of intermediate accumulation results connected to each adder unit. In this multiplier, only some functional units are activated, according to the one of the two input data with the smaller effective dynamic range [10], [11]. Switching activities of the unused functional blocks are minimized because their input bits remain unaltered. However, to preserve the previous states of the unused functional blocks, the proposed row-based multiplier requires more flip-flops than the proposed column-based one. The input data stored in the flip-flops can be updated in groups of bits, such as 4, 6, or 8 bits, to reduce the number of flip-flops. In addition, the critical delay of the proposed row-based multiplier is longer than that of the proposed column-based multiplier because the additions proceed in the row direction. This is mitigated by a hybrid-based adder tree, which integrates column-based and row-based structures in the proposed hybrid-based multiplier. These two multipliers include master-stage flip-flops, a dynamic-range determination unit, slave-stage flip-flops, Booth decoders, a row-based or hybrid-based adder tree, and a sign-extension unit, as depicted in Fig. 1(b).

Fig. 1. The proposed multipliers. (a) The column-based multiplier. (b) The row-based or hybrid-based multiplier.

In this study, the low-power 2's complement Booth-algorithm multipliers based on column-based, row-based, and hybrid-based adder trees are implemented in TSMC 0.25-μm CMOS technology. The proposed column-based multiplier reduces power by increasing the probability that partial products are zero. The proposed row-based and hybrid-based multipliers not only reduce the bit switching of
partial products but also minimize the power consumption of functional units for noneffective bits. Moreover, in the Appendix, equations are derived to demonstrate that the proposed multipliers produce partial products with low switching activities. Multiplication operations on practical data are also analyzed, showing that the proposed multipliers consume less power than the conventional multipliers. Consequently, the proposed multipliers are well suited to low-power multimedia processing at reasonable hardware cost and with little reduction in speed.

II. PROPOSED LOW-POWER MULTIPLIERS

Figs. 2–4 show the proposed column-based, row-based, and hybrid-based 16×16-bit multipliers, respectively. In all three multipliers, radix-4 Booth encoding produces eight partial products, which are summed in the column-based, row-based, or hybrid-based adder tree. The functional units of the proposed low-power multipliers are described as follows.

Fig. 2. The proposed column-based 16×16-bit multipliers.

Master-Stage and Slave-Stage Flip-Flops: The master-stage and slave-stage flip-flops are realized using the true-single-phase edge-triggered circuit, as shown in Fig. 3(a) [12]. This type of circuit offers both high speed and low power dissipation. The master-stage flip-flops latch the input data so that the dynamic-range determination unit can decide the input data flow and generate control signals. The slave-stage flip-flops store the updated input data or retain the previous data.

Dynamic-Range Determination Unit: The dynamic-range determination unit detects the effective dynamic ranges of the input data and then generates control signals. In the proposed column-based multiplier, these control signals determine the data flows between the master-stage and slave-stage flip-flops. In the proposed row-based and hybrid-based multipliers, the control signals not only select the data flows but also make the slave-stage flip-flops hold noneffective bits in their previous states, ensuring that the functional units driven by these bits do not consume switching power. These control signals also control the data path of the adder tree and the sign-extension operation. Effective-dynamic-range detection can be performed on groups of bits to simplify the implementation. In this study, a basic group consists of two bits, since in radix-4 Booth encoding each partial product is determined by, on average, two bits of an input datum. Fig. 5 shows the functional blocks of the dynamic-range determination unit, which include comparators, logic gates, multiplexors, and latches. Detection begins from the most significant bits, and the comparators examine each 3-bit group, except for the four least significant bits. If the three bits are all zero or all one, the comparator outputs 1; otherwise, it outputs 0. An overlapped bit in
the neighboring two groups supports continuous comparison. For a 16-bit datum, six groups are compared to determine an effective dynamic range of 4, 6, 8, 10, 12, 14, or 16 bits. A 16×16-bit multiplier has two input data, whose effective dynamic ranges are therefore determined by twelve 3-bit comparators. The input datum with the smaller effective dynamic range is identified by logic operations on the comparator outputs. The signal indicating this datum controls the multiplexors in the switcher of the dynamic-range determination unit to steer the input data flow. Furthermore, this signal and the other signals indicating the effective dynamic ranges of the input data drive the control signal generator of the dynamic-range determination unit, which yields the control signals for the slave-stage flip-flops, the multiplexors of a row-based or hybrid-based adder tree, and the sign-extension unit.

To reduce the number of slave-stage flip-flops in the proposed row-based multipliers, a basic group of four or more bits can be updated or held in the slave-stage flip-flops together, because each functional unit after the slave-stage flip-flops requires at least two partial products to be computed. When the effective dynamic ranges of the input data occur randomly between 1 and 16 bits, four operational modes are considered to simplify the analysis: 1) mode I, which operates on 4, 8, 12, and 16 bits; 2) mode II, which operates on 8, 12, and 16 bits; 3) mode III, which operates on 8 and 16 bits; and 4) mode IV, which operates on 12 and 16 bits. Considering the reduction in processing speed, only modes III and IV are explored for the proposed hybrid-based multipliers.

Fig. 3. The proposed row-based 16×16-bit multipliers. (a) Mode I. (b) Mode II. (c) Mode III. (d) Mode IV.

Fig. 4. The proposed hybrid-based 16×16-bit multipliers. (a) Mode III. (b) Mode IV.

Fig. 5. The dynamic-range determination unit.

A Booth Decoder: The radix-4 Booth decoder generates one of five possible values: −2, −1, 0, 1, or 2 times the input datum. The proposed radix-4 Booth decoder, shown in Fig. 3(a), includes a 3-to-1 multiplexor and simple logic gates to select 0, 1, or 2 times the input datum, or to invert the output value.

An Adder Tree: The carry-save adders, the (3,2), (5,3), and (7,4) counters, and the leapfrog adder array used in the multipliers of Yu, Goldovsky, and Mahant-Shetti [7]–[9], respectively, are applied in the adder trees of the proposed column-based multipliers
for comparison. The proposed row-based multipliers require seven ripple adders and multiplexors, arranged in four operational structures as shown in Fig. 3. The hybrid-based adders, shown in Fig. 4, include row-based adders and column-based adders that follow Yu's approach. The eight partial products are divided into two parts, each summed by column-based adders, and the results of these two parts are added by the row-based adder.

Sign-Extension Unit: The sign-extension unit is used only in the proposed row-based and hybrid-based multipliers. Under the control signals of the dynamic-range determination unit, only input bits in the effective dynamic range are passed to the slave-stage flip-flops. Input bits in the noneffective dynamic range remain in their previous states, so that they cause no power-consuming switching activity. Because the effective dynamic ranges of the input data are determined on a group-of-bits basis, the detected effective dynamic range may exceed the actual one. After the adder tree performs the addition, the results in the effective and noneffective dynamic ranges have correct and incorrect values, respectively. Sign extension must therefore be applied to the output bits in the noneffective dynamic range to restore the correct value in the final step. Fig. 6 shows the functional blocks of the sign-extension unit in the four operational modes, where multiplexors decide which bits are sign bits and which are value bits.

Fig. 6. The sign-extension unit. (a) Mode I. (b) Mode II. (c) Mode III. (d) Mode IV.

III. POWER ANALYSES

The proposed 16×16-bit 2's complement Booth-algorithm multipliers using the column-based, row-based, and hybrid-based adder trees are implemented with Cadence tools in TSMC 0.25-μm CMOS technology to generate their layouts. These layouts are extracted and post-simulated with the PowerMill and TimeMill tools. For most circuit cells in the conventional and proposed multipliers, the widths/lengths of the pMOS and nMOS transistors are 2.5 μm/0.25 μm and 1.0 μm/0.25 μm, respectively; the exceptions are the slave-stage flip-flops and the carry-propagation adder in the last stage of the adder tree. Considering the driving capability of the slave-stage flip-flops and the processing speed of the carry-propagation adder, their transistors are sufficiently enlarged in both the proposed and conventional multipliers.

Adaptive differential pulse code modulation (ADPCM) audio, G.723.1 speech, and wavelet-based image coders are employed in practical power analyses. Their multiplication operations are performed using either the proposed or a conventional multiplier. In the ADPCM audio coder, a 0.125-s segment of audio is analyzed, in which the multiplication operations of low-pass and high-pass band splitting and signal prediction involve 17 367 input vectors. In the G.723.1 speech coder, the multiplication operations in the autocorrelation of linear prediction coding for 0.05 s of speech sampled at 8 kHz involve 26 697 input vectors. In the wavelet-based image coder, one fortieth of the multiplication operations for the 512×512-pixel Lenna image
through the 5-tap low-pass and 3-tap high-pass wavelet filters are performed, involving 19 117 input vectors. Fig. 7 shows the histograms of the effective dynamic ranges of the input data for multiplication in these three applications.

Fig. 7. The histograms of effective dynamic ranges of input data for multiplication in the three practical applications. (a) ADPCM audio coder. (b) G.723.1 speech coder. (c) Wavelet-based image coder.

Table I lists the power consumption, areas, and critical delays of the conventional and proposed column-based multipliers in these three applications. The proposed column-based multipliers that use the approaches of Yu, Goldovsky, and Mahant-Shetti consume less power than the corresponding conventional multipliers of Yu, Goldovsky, and Mahant-Shetti. Goldovsky's multiplier requires a larger hardware area than the other two conventional multipliers because it uses a conditional-sum adder in the last stage of its adder tree. In Mahant-Shetti's multiplier, the sum output of a full adder is linked to the sum input of a subsequent adder through a leapfrog connection, so this multiplier requires more full adders to realize its adder tree than Yu's multiplier does. Yu's multiplier includes an adder array that adds from the most to the least significant bits. Here, a further modification connects the sum and carry outputs of a carry-save adder to the carry and sum inputs of the subsequent carry-save adder, respectively. The proposed column-based multiplier using Yu's approach consumes less power than the other two proposed column-based multipliers. It also uses 31.2%, 19.1%, and 33.0% less power than Yu's multiplier for the ADPCM audio, G.723.1 speech, and wavelet-based image coders, respectively. The dynamic-range determination unit consumes less than 7.1% of the power of the proposed column-based multiplier using Yu's approach. Fig. 7 shows that the effective dynamic ranges of the input data from the wavelet-based image coder vary less and are smaller than 9 bits. Hence, when computing the wavelet-based image coder, the proposed column-based multipliers can effectively switch or pass the input data flow to Booth encode input data whose effective dynamic ranges are smaller than 9 bits, and thus they consume less power than when computing the ADPCM audio and G.723.1 speech coders.

Table II lists the power consumption, areas, and critical delays of the proposed row-based and hybrid-based multipliers for these three applications. The proposed row-based and hybrid-based multipliers in modes III and IV consume less power than the proposed column-based multipliers. Tables I and II further show that the row-based multiplier in mode IV consumes the least power. The proposed row-based and hybrid-based multipliers in mode IV save more than 35.3%, 25.3%, and 39.6%, and more than 33.4%, 24.9%, and 36.9%, respectively, of the power of Yu's multiplier for the ADPCM audio, G.723.1 speech, and wavelet-based image coders. Nevertheless, the proposed column-based, row-based, and hybrid-based multipliers have 0.0%, 21.3%, and 2.5% longer critical delays, and 12.6%, 14.8%, and 12.6% larger hardware areas, than Yu's multiplier, respectively, when
operational mode IV is used. The power dissipation of the dynamic-range determination unit and the sign-extension unit is less than 8.8% of that of the row-based and hybrid-based multipliers in mode IV. When the product of power consumption, area, and critical delay is considered, the proposed hybrid-based multiplier in mode IV performs best in all three applications, and the proposed column-based multiplier is the second best.

TABLE I. POWER CONSUMPTION, AREAS, AND CRITICAL DELAYS OF THE CONVENTIONAL AND PROPOSED COLUMN-BASED MULTIPLIERS

TABLE II. POWER CONSUMPTION, AREAS, AND CRITICAL DELAYS OF THE PROPOSED ROW-BASED AND HYBRID-BASED MULTIPLIERS

TABLE III. POWER CONSUMPTION OF THE PROPOSED COLUMN-BASED, ROW-BASED, AND HYBRID-BASED, AND YU'S 16×16-BIT MULTIPLIERS FOR EFFECTIVE DYNAMIC RANGES OF INPUT DATA WITH UNIFORM DISTRIBUTIONS
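The power × area × delay ranking stated above can be approximately re-derived from the percentages quoted in this section alone, without the absolute values of Tables I and II, because Yu's multiplier is the common reference. The following is a minimal sketch under one loud assumption: the "more than X%" savings and overheads are treated as exact point values, so the products are only approximate. The names `normalized_product` and `ranking` are illustrative, not from the paper.

```python
# Approximate power x area x delay products of the proposed 16x16-bit multipliers,
# normalized to Yu's conventional multiplier (= 1.0 for every application).
# Assumption: the "more than X%" figures quoted in the text are used as exact values.

APPS = ("ADPCM audio", "G.723.1 speech", "wavelet-based image")

# Fractional power savings versus Yu's multiplier, one entry per application.
POWER_SAVING = {
    "column-based": (0.312, 0.191, 0.330),
    "row-based, mode IV": (0.353, 0.253, 0.396),
    "hybrid-based, mode IV": (0.334, 0.249, 0.369),
}
# Fractional increases in critical delay and hardware area versus Yu's multiplier.
DELAY_OVERHEAD = {"column-based": 0.000, "row-based, mode IV": 0.213, "hybrid-based, mode IV": 0.025}
AREA_OVERHEAD = {"column-based": 0.126, "row-based, mode IV": 0.148, "hybrid-based, mode IV": 0.126}


def normalized_product(design: str, app: int) -> float:
    """Power * area * delay of `design` for application index `app`, relative to Yu's."""
    power = 1.0 - POWER_SAVING[design][app]
    area = 1.0 + AREA_OVERHEAD[design]
    delay = 1.0 + DELAY_OVERHEAD[design]
    return power * area * delay


def ranking(app: int) -> list:
    """Designs ordered from the smallest (best) to the largest normalized product."""
    return sorted(POWER_SAVING, key=lambda design: normalized_product(design, app))


if __name__ == "__main__":
    for app, name in enumerate(APPS):
        row = ", ".join(f"{d} {normalized_product(d, app):.3f}" for d in ranking(app))
        print(f"{name}: {row}")
```

With these inputs the hybrid-based design has the smallest product and the column-based design the second smallest in each of the three applications, which agrees with the ranking stated in the text. Multiplying the three ratios is equivalent to comparing the absolute products, since Yu's values cancel.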

The proposed column-based multiplier that follows Yu's approach, the proposed row-based and hybrid-based multipliers in modes III and IV, and Yu's conventional multiplier are chosen from Tables I and II for a comparison in which the effective dynamic ranges of the input data follow uniform and Gaussian distributions. Each distribution case involves 15 000 input vectors, and the signs of the input data are randomly generated. Table III lists the power consumption of the proposed and conventional multipliers for uniformly distributed effective dynamic ranges of the input data. The power-saving ratio of the proposed column-based multiplier relative to the conventional multiplier increases with the effective dynamic ranges of the input data. The proposed row-based and hybrid-based multipliers in mode III have the largest power-saving ratios for effective dynamic ranges between 1 and 8 bits, whereas these two multipliers in mode IV have the largest power-saving ratios for effective dynamic ranges between 1 and 12 bits. This shows that operational modes III and IV match effective dynamic ranges of input data from 1 to 8 bits and from 1 to 12 bits, respectively.

TABLE IV. POWER CONSUMPTION OF THE PROPOSED COLUMN-BASED, ROW-BASED, AND HYBRID-BASED, AND YU'S 16×16-BIT MULTIPLIERS FOR EFFECTIVE DYNAMIC RANGES OF INPUT DATA WITH GAUSSIAN DISTRIBUTIONS

Table IV gives the power consumption of the proposed and conventional multipliers when the effective dynamic ranges of the input data follow Gaussian distributions with different means and standard deviations. The effective dynamic ranges of the input data increase with the mean, which increases power consumption. However, for a given mean, a larger standard deviation yields larger power savings, because it increases the probability of Booth encoding data with small effective dynamic ranges. Tables III and IV reveal that the proposed row-based or hybrid-based multipliers in mode III or IV consume the least power for various effective-dynamic-range distributions. Therefore, by reducing the switching activities of partial products, the proposed multipliers consume less power and suit various low-power multimedia applications. These 16×16-bit results also indicate how each of the proposed column-based, row-based, and hybrid-based multipliers can be used most effectively.
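The Table IV observation that, for a fixed mean, a larger standard deviation yields larger savings can be illustrated with a small calculation. Because the datum with the smaller effective dynamic range is Booth encoded, the range that reaches the Booth encoder is min(R_X, R_Y) for the two inputs. The sketch below assumes two independent ranges drawn from a discretized Gaussian over 1 to 16 bits (a hypothetical model; the paper's exact vector generator is not described) and ignores the 2-bit detection resolution. It shows only that the expected encoded range shrinks as the spread grows; it does not model power. Function names are illustrative.

```python
import math

N_BITS = 16  # word length of the 16x16-bit multiplier


def gaussian_range_pmf(mean: float, std: float, n_bits: int = N_BITS) -> list:
    """Discretized Gaussian over effective ranges 1..n_bits (list index 0 -> 1 bit).

    Assumption: weights exp(-(m - mean)^2 / (2 std^2)), renormalized over m = 1..n_bits.
    """
    weights = [math.exp(-((m - mean) ** 2) / (2.0 * std**2)) for m in range(1, n_bits + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_range(pmf: list) -> float:
    """Mean effective range in bits for a PMF indexed from 1 bit upward."""
    return sum((idx + 1) * p for idx, p in enumerate(pmf))


def encoded_range_pmf(pmf: list) -> list:
    """PMF of min(R_X, R_Y) for two independent ranges drawn from `pmf`.

    With survival S(m) = Pr(R >= m), Pr(min >= m) = S(m)^2, so
    Pr(min = m) = S(m)^2 - S(m + 1)^2.
    """
    survival = [sum(pmf[idx:]) for idx in range(len(pmf))] + [0.0]
    return [survival[idx] ** 2 - survival[idx + 1] ** 2 for idx in range(len(pmf))]


if __name__ == "__main__":
    for std in (1.0, 2.0, 3.0):
        pmf = gaussian_range_pmf(mean=8.0, std=std)
        print(f"std={std}: E[range]={expected_range(pmf):.2f}, "
              f"E[encoded range]={expected_range(encoded_range_pmf(pmf)):.2f}")
```

With the mean held at 8 bits, the expected encoded range decreases as `std` increases, so more upper Booth groups see all-0 or all-1 bits and more partial products are zero.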
The proposed column-based, row-based, and hybrid-based multipliers have 1.00, 1.21, and 1.03 times the critical delay, and 1.13, 1.15, and 1.13 times the hardware area, of Yu's conventional multiplier, respectively, when operational mode IV is used. Furthermore, the proposed row-based and hybrid-based multipliers conserve more power than the proposed column-based multiplier. When neighboring input data have similar effective dynamic ranges and the same sign, the proposed column-based multiplier can be cost-effective. When two neighboring input data differ widely in effective dynamic range, the proposed row-based and hybrid-based multipliers can effectively save power if their operational modes are selected to match the effective-dynamic-range distribution of the input data. In addition, the proposed row-based multiplier may consume less power than the proposed hybrid-based multiplier, but it has a longer delay. Users can thus choose the proposed multiplier suited to their application by considering chip area, speed, power consumption, and data type.

IV. CONCLUSION

The three proposed Booth-algorithm multipliers are shown to dissipate less power than conventional ones. These three multipliers are equipped with dynamic-range determination units and add partial products in column-based, row-based, and hybrid-based adder trees. The dynamic-range determination unit identifies the one of the two input data with the smaller effective dynamic range for Booth encoding, minimizing the switching activities of partial products. Additionally, the dynamic-range determination unit of the proposed row-based and hybrid-based multipliers controls the slave-stage flip-flops to store the effective-dynamic-range bits of an input datum, manipulates the data flow of the adder tree, and determines the operation of the sign-extension unit for further power reduction. Power analyses of multiplication operations on practical input data confirm that the proposed 16×16-bit column-based, row-based, and hybrid-based multipliers dissipate less power than Yu's conventional multiplier. The proposed hybrid-based multiplier is the best, and the proposed column-based multiplier the second best, in terms of the product of hardware area, critical delay, and power consumption. Consequently, the proposed low-power multipliers can be used in various practical applications with a small increase in hardware complexity or critical delay. Finally, power consumption, hardware complexity, processing speed, and data type are the most important considerations in a cost-effective selection among the proposed column-based, row-based, and hybrid-based multipliers.

APPENDIX
THEORETICAL ANALYSES OF SWITCHING ACTIVITIES

The theoretical foundation is derived here to illustrate the reduction of switching activities for the partial products of the proposed 2's complement multiplication. The radix-4 Booth algorithm is usually applied to encode one of two input data, $X$ and $Y$ [13]. If a series of data $X_i$, $i = 1, 2, \ldots$, is used for Booth encoding, then a datum $X_i$ is partitioned into several 3-bit groups, each of which has one bit that overlaps with the previous group. Hence, the 2's complement datum $X_i$ with a word length of $n$ can be represented by

$$X_i = -x_{n-1}2^{n-1} + \sum_{j=0}^{n-2} x_j 2^{j} = \sum_{k=0}^{n/2-1}\left(x_{2k-1} + x_{2k} - 2x_{2k+1}\right)4^{k} \tag{2}$$

where $x_j$ is the $j$th digit of $X_i$, and $x_{-1}$ equals zero. Here, $n$ is assumed to be an even number. Multiplying the other input datum $Y_i$ by $X_i$, (2) becomes

$$X_i Y_i = \sum_{k=0}^{n/2-1} P_{i,k}\,4^{k}, \qquad P_{i,k} = \left(x_{2k-1} + x_{2k} - 2x_{2k+1}\right)Y_i \tag{3}$$

where $P_{i,k}$ is an intermediate product that can take five different values: $-2Y_i$, $-Y_i$, $0$, $Y_i$, and $2Y_i$. Table V lists the occurrence probabilities of these five values when the three bits of each group of $X_i$, except for $x_{-1}$, are uniformly distributed as 0 or 1. Table VI presents the occurrence probabilities of the Booth-decoded values obtained by considering the effective dynamic ranges of $X_i$, where $p^{(k)}_{-2,m}$, $p^{(k)}_{-1,m}$, $p^{(k)}_{0,m}$, $p^{(k)}_{1,m}$, and $p^{(k)}_{2,m}$ designate the probabilities that $P_{i,k}$ equals $-2Y_i$, $-Y_i$, $0$, $Y_i$, and $2Y_i$, respectively, when $X_i$ has an effective dynamic range of $m$ bits. If the effective dynamic ranges of $X_i$ occur with probabilities $q_m$, where $q_m$ is the probability that the effective dynamic range is $m$ bits, then the probability that each partial product $P_{i,k}$ is zero can be derived as

$$\Pr(P_{i,k} = 0) = \sum_{m=1}^{n} q_m\, p^{(k)}_{0,m} \tag{4}$$

for $0 \le k \le n/2 - 1$.

TABLE V. PROBABILITIES OF THE BOOTH-DECODED VALUES BEING −2Y, −Y, 0, Y, AND 2Y

TABLE VI. PROBABILITIES OF BOOTH-DECODED VALUES AT DIFFERENT EFFECTIVE DYNAMIC RANGES

The relationship between $P_{i,k}$ and $P_{i+1,k}$ can be classified into four cases of change: 1) from zero to zero, 2) from zero to a nonzero value, 3) from a nonzero value to zero, and 4) from a nonzero value to a nonzero value. Switching activities occur in cases 2), 3), and 4). The average switching activity of the partial product $P_{i,k}$ can be approximated by

(5)

where the output value of the radix-4 Booth decoder has a word length of $n + 1$ bits. Here, the input data are assumed to be uncorrelated and to switch simultaneously. In addition, neighboring partial products are assumed to be independent and to change their states simultaneously without glitching, so that on average one half of their bits switch. Furthermore, the average switching activity of all partial products is

(6)
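Equations (2) and (3), the five-valued partial products, and the four-case switching argument can be checked with a short sketch. `booth_digits` and `booth_multiply` follow (2) and (3) directly; `digit_probabilities` enumerates the bits of one Booth group under the uniform-bit assumption stated for Table V. `switching_activity` is a hypothetical per-partial-product model assembled only from the assumptions listed above (independence, half of the bits toggling in cases 2) to 4)); it is an illustration and is not claimed to reproduce Eq. (5) exactly. All function names are illustrative.

```python
from fractions import Fraction


def booth_digits(x: int, n: int) -> list:
    """Radix-4 Booth digits d_k = x_{2k-1} + x_{2k} - 2*x_{2k+1} of Eq. (2), x_{-1} = 0."""
    if n % 2 or not -(1 << (n - 1)) <= x < (1 << (n - 1)):
        raise ValueError("x must be an n-bit 2's complement value with n even")
    # padded[j + 1] holds bit x_j of the 2's complement pattern; padded[0] is x_{-1} = 0.
    padded = [0] + [(x >> j) & 1 for j in range(n)]
    return [padded[2 * k] + padded[2 * k + 1] - 2 * padded[2 * k + 2] for k in range(n // 2)]


def booth_multiply(x: int, y: int, n: int) -> int:
    """X*Y as the sum of shifted partial products P_k = d_k * Y, as in Eq. (3)."""
    return sum(d * y * 4**k for k, d in enumerate(booth_digits(x, n)))


def digit_probabilities(k: int) -> dict:
    """Pr(d_k = v) when the bits of group k are independent and uniform (x_{-1} = 0)."""
    low_bits = (0,) if k == 0 else (0, 1)  # for k = 0 the low bit is x_{-1} = 0
    combos = [(lo, mid, hi) for lo in low_bits for mid in (0, 1) for hi in (0, 1)]
    counts = {}
    for lo, mid, hi in combos:
        digit = lo + mid - 2 * hi
        counts[digit] = counts.get(digit, 0) + 1
    return {d: Fraction(c, len(combos)) for d, c in sorted(counts.items())}


def switching_activity(p_zero: float, width: int) -> float:
    """Expected toggled bits of one partial product between consecutive inputs.

    Hypothetical model built from the stated assumptions: consecutive partial
    products are independent and zero with probability p_zero; cases 2)-4)
    (zero->nonzero, nonzero->zero, nonzero->nonzero) toggle half of the
    `width` bits on average, and case 1) (zero->zero) toggles none.
    """
    p_switch = 2 * p_zero * (1 - p_zero) + (1 - p_zero) ** 2  # equals 1 - p_zero**2
    return 0.5 * width * p_switch


if __name__ == "__main__":
    print(booth_digits(3, 16))  # [-1, 1, 0, 0, 0, 0, 0, 0]: upper partial products are 0
    print(digit_probabilities(1))
    print(switching_activity(0.25, 17), switching_activity(0.9, 17))
```

The `booth_digits(3, 16)` call shows the effect the proposed multipliers exploit: a datum with a small effective dynamic range yields zero digits in every upper group, so the corresponding partial products are zero, and in this model a higher `p_zero` directly lowers the expected toggling.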

According to (6), the switching activity can be reduced by increasing the probabilities that the partial products are zero. From Table VI, $p^{0}_{j,n}$ is a fixed value for an effective dynamic range of $n$ bits. Hence, altering the distribution $q_n$ can effectively increase $p_j$ and thereby minimize switching activities. Table VI, (4), and (6) reveal that the minimum average switching activity occurs when the effective dynamic range of $Y$ is only 1 bit. In this case, $p_j$ equals $p^{0}_{j,1}$, where $p^{0}_{0,1}$ is 0.5 and $p^{0}_{j,1}$ is 1 for $j$ greater than 0.

Equation (6) represents the average switching activity of the partial products of conventional multiplication using the radix-4 Booth algorithm. The proposed column-based multiplication, shown in Fig. 1(a), Booth encodes whichever of the two input data has the smaller effective dynamic range. With this scheme, the partial products from the Booth decoders that operate on the most significant bits of the input data are more likely to be zero. Additionally, the dynamic-range determination unit has a detection resolution of 2 bits and determines effective dynamic ranges larger than 4 bits. Accordingly, the probability $r_n$ that the effective dynamic range of the input data used for Booth encoding is $n$ bits can be formulated as

(7)

for even $n$. Replacing $q_n$ with $r_n$ in (4) yields the probability that the partial product from the $j$th Booth decoder is zero:

$$p'_j = \sum_{n} r_n \, p^{0}_{j,n}, \qquad j = 0, 1, \ldots, \frac{N}{2} - 1 \tag{8}$$

The average switching activity of all partial products within the proposed column-based multiplication is then represented by

$$S_c = \frac{2}{N} \sum_{j=0}^{N/2-1} \left(1 - {p'_j}^{2}\right) \tag{9}$$

In addition to Booth encoding the number with the smaller dynamic range, the proposed row-based and hybrid-based multipliers, shown in Fig. 1(b), add partial products only within the effective dynamic data ranges to save power. Several grouped data ranges are used for preserving previous states, which reduces the number of slave-stage flip-flops. Hence, switching occurs only when partial products within the grouped effective dynamic ranges change from zero to nonzero values, from nonzero to nonzero values, or from nonzero values to zero. Thus, the switching activity of the $j$th partial product can be formulated as

(10)

where $n_g$ represents the least number in the predetermined data range $G$ to which $2j+1$ belongs. For example, the proposed hybrid-based 16×16-bit multiplier in mode III has two predetermined data ranges, $G_1$ and $G_2$. $G_1$ covers 1 to 8 bits and $G_2$ covers 9 to 16 bits. When $j$ is 5, $2j+1$ equals 11 and thus belongs to $G_2$; $n_g$ is then 9 and is used in (10). Consequently, the average switching activity of all partial products for the proposed row-based or hybrid-based multiplication is

(11)

Only the partial products within the effective dynamic range of the input datum used for Booth encoding are switched; the others remain in their previous states. These additional reductions in switching activity come primarily from changes, from large to small, in the effective dynamic ranges of two neighboring input data used for Booth encoding.

Using (6), (9), and (11), the average switching activities of the partial products for the conventional and proposed multiplication can be analyzed for various effective dynamic ranges of input data. Here, 16×16-bit multiplication is used as an example, in which the two input data, $X$ and $Y$, are assumed to have the same dynamic-range distribution.

TABLE VII AVERAGE SWITCHING ACTIVITIES OF PARTIAL PRODUCTS OF THE PROPOSED AND CONVENTIONAL BOOTH-ALGORITHM MULTIPLIERS FOR EFFECTIVE DYNAMIC RANGES OF INPUT DATA WITH UNIFORM DISTRIBUTIONS
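The effect of Booth encoding the operand with the smaller effective dynamic range can be sketched the same way. This Python sketch assumes that both inputs draw their effective dynamic ranges independently from one distribution $q_n$. The encoded operand's range is then the minimum of two draws. The sketch ignores the 2-bit detection resolution, so it is an idealized stand-in for (7), not a reconstruction of it.

Under that assumption it mixes the per-range zero probabilities as in (8) and averages $1 - p_j^2$ as in (9). The helper `least_in_group` only replays the mode-III grouping example. All names and the toy distribution are hypothetical.

```python
from fractions import Fraction

N = 16  # word length (assumed 16x16 example)

def booth_digit(y: int, j: int) -> int:
    """Radix-4 Booth digit j of two's-complement y (bit -1 taken as 0)."""
    def b(k: int) -> int:
        return 0 if k < 0 else (y >> k) & 1
    return -2 * b(2 * j + 1) + b(2 * j) + b(2 * j - 1)

def effective_range(y: int) -> int:
    """Smallest n such that y fits in n-bit two's complement."""
    return (y if y >= 0 else ~y).bit_length() + 1

def zero_prob_given_range(n: int) -> list[Fraction]:
    """p0_{j,n}, with y uniform over values of effective range exactly n (assumed)."""
    ys = [y for y in range(-(1 << (n - 1)), 1 << (n - 1)) if effective_range(y) == n]
    return [Fraction(sum(booth_digit(y, j) == 0 for y in ys), len(ys))
            for j in range(N // 2)]

def min_range_dist(q: dict[int, Fraction]) -> dict[int, Fraction]:
    """r_n = P(min(n_X, n_Y) = n) for independent ranges that both follow q.
    Uses P(min >= n) = S_n**2 with S_n = sum_{m >= n} q_m."""
    def tail(n: int) -> Fraction:
        return sum((qm for m, qm in q.items() if m >= n), Fraction(0))
    return {n: tail(n) ** 2 - tail(n + 1) ** 2 for n in q}

def avg_switching(dist: dict[int, Fraction]) -> Fraction:
    """Mix p0_{j,n} over dist as in (4)/(8), then average 1 - p_j**2 as in (6)/(9)."""
    p = [Fraction(0)] * (N // 2)
    for n, w in dist.items():
        p = [pj + w * p0 for pj, p0 in zip(p, zero_prob_given_range(n))]
    return sum((1 - pj * pj for pj in p), Fraction(0)) / len(p)

# Mode-III grouping from the example in the text: G1 = 1..8 bits, G2 = 9..16 bits.
GROUPS = [(1, 8), (9, 16)]

def least_in_group(pos: int) -> int:
    """n_g: least number of the predetermined range that contains pos (= 2j + 1)."""
    for lo, hi in GROUPS:
        if lo <= pos <= hi:
            return lo
    raise ValueError(f"position {pos} is outside every group")

if __name__ == "__main__":
    q = {1: Fraction(1, 2), 16: Fraction(1, 2)}       # toy distribution of ranges
    conventional = avg_switching(q)                  # always Booth-encode Y
    column_based = avg_switching(min_range_dist(q))  # encode the smaller range
    print(conventional, column_based)                # 337/512 841/2048
    print(least_in_group(2 * 5 + 1))                 # 9, the j = 5 example
```

With a single-valued distribution both operands always have the same range, so this model predicts no saving. The saving appears only when the two ranges can differ, which matches the qualitative argument in the text.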
Table VII illustrates the average switching activities of partial products for effective dynamic ranges of input data with uniform distributions. According to Table VII, a larger effective dynamic range of input data implies greater switching activity. With the proposed multipliers, the saving ratios tend to increase with the effective dynamic ranges. Larger differences between the effective dynamic ranges of the two input data let the proposed multipliers Booth encode the input with the smaller effective dynamic range.

TABLE VIII AVERAGE SWITCHING ACTIVITIES OF PARTIAL PRODUCTS OF THE PROPOSED AND CONVENTIONAL BOOTH-ALGORITHM MULTIPLIERS FOR EFFECTIVE DYNAMIC RANGES OF INPUT DATA WITH THE GAUSSIAN DISTRIBUTIONS

Table VIII presents the average switching activities of the conventional and proposed multipliers for effective dynamic ranges of input data with Gaussian distributions. When the standard deviation increases, the variations in the effective dynamic ranges of the two input data increase, thereby increasing the saving ratios of switching activities. In contrast, when the mean increases, minimizing switching activities becomes increasingly difficult. A larger effective dynamic range decreases the probability that the partial products are zero, which makes switching activities harder to reduce.

According to Tables VII and VIII, the proposed row-based and hybrid-based multipliers in mode III or IV exhibit the least switching activity. They Booth encode the numbers with smaller effective dynamic ranges and keep the partial products in part of the noneffective dynamic range unchanged. The trends in Tables VII and VIII are quite consistent with those in Tables III and IV, respectively. However, the effective dynamic ranges of input data may span only a small range or have a low standard deviation. In such cases, the power saved by reducing switching activities cannot compensate for the power consumed by the overhead hardware in the proposed multipliers, so the proposed multipliers consume slightly more power than the conventional multiplier.
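The trend in Table VIII, larger savings for a larger standard deviation of the effective dynamic ranges, can be illustrated with a small Monte Carlo simulation. This Python sketch is a toy model, not the authors' simulation. Its assumptions are:

- effective dynamic ranges are drawn from a Gaussian, rounded, and clipped to 1 to 16 bits;
- values are uniform within each exact range;
- only the column-based selection is modeled, with no 2-bit detection resolution and no frozen partial products;
- every change except zero to zero counts as a switch, as in cases 2) to 4).

Its numbers are not expected to match Table VIII, and all names are hypothetical.

```python
import random

N = 16  # word length (assumed)

def booth_digits(y: int) -> list[int]:
    """Radix-4 Booth digits of 16-bit two's-complement y (bit -1 taken as 0)."""
    def b(k: int) -> int:
        return 0 if k < 0 else (y >> k) & 1
    return [-2 * b(2 * j + 1) + b(2 * j) + b(2 * j - 1) for j in range(N // 2)]

def effective_range(y: int) -> int:
    """Smallest n such that y fits in n-bit two's complement."""
    return (y if y >= 0 else ~y).bit_length() + 1

def random_operand(rng: random.Random, mean: float, sd: float) -> int:
    """Draw an effective range from a Gaussian (rounded, clipped to 1..N),
    then a uniformly random value with exactly that range (assumed model)."""
    n = min(N, max(1, round(rng.gauss(mean, sd))))
    if n == 1:
        return rng.choice((-1, 0))
    half = 1 << (n - 2)
    # Exact range n: [2^(n-2), 2^(n-1) - 1] or [-2^(n-1), -2^(n-2) - 1].
    return rng.choice((rng.randrange(half, 2 * half), rng.randrange(-2 * half, -half)))

def switching_rate(pairs: list[tuple[int, int]], pick_smaller: bool) -> float:
    """Fraction of partial-product slots that change between consecutive
    inputs, counting every transition except zero -> zero as a switch."""
    switches, prev = 0, None
    for x, y in pairs:
        if pick_smaller and effective_range(x) < effective_range(y):
            y = x  # Booth-encode the operand with the smaller effective range
        digits = booth_digits(y)
        if prev is not None:
            switches += sum(not (a == 0 and b == 0) for a, b in zip(prev, digits))
        prev = digits
    return switches / ((len(pairs) - 1) * (N // 2))

def saving(mean: float, sd: float, samples: int = 10000, seed: int = 1) -> float:
    """Saving ratio 1 - S_proposed / S_conventional on one random input stream."""
    rng = random.Random(seed)
    pairs = [(random_operand(rng, mean, sd), random_operand(rng, mean, sd))
             for _ in range(samples)]
    conv = switching_rate(pairs, pick_smaller=False)
    prop = switching_rate(pairs, pick_smaller=True)
    return 1 - prop / conv

if __name__ == "__main__":
    for sd in (0.5, 2.0, 4.0):
        print(f"mean=8, sd={sd}: saving ratio {saving(8.0, sd):.3f}")
```

A larger standard deviation widens the typical gap between the two effective dynamic ranges. The selected operand is then shorter more often, and the saving ratio grows.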

ACKNOWLEDGMENT

The authors highly appreciate the valuable comments and suggestions from the reviewers. Dr. Bing J. Sheu, Nassda Corp., Santa Clara, USA, is also thanked for his valuable suggestions on low-power circuit design. Nan-Ying Shen, Dept. of Electrical Engineering, National Chung Cheng University, Chia-Yi, Taiwan, helped with the circuit layouts and simulations.

Oscal T.-C. Chen (S'89-M'94) was born in Taiwan, R.O.C., in 1965. He received the B.S. degree in electrical engineering from National Taiwan University in 1987, and the M.S. and Ph.D. degrees in electrical engineering from the University of Southern California, Los Angeles, in 1990 and 1994, respectively. From 1994 to 1995, he was with the Computer Processor Architecture Department of Computer Communication and Research Labs. (CCL), Industrial Technology Research Institute (ITRI), Hsinchu, Taiwan, as a System Design Engineer, Project Leader, and Section Chief. He contributed significantly to many industrial applications, including the fuzzy chip, neural networks, a speech recognition system, and a digital signal processor. Since September 1995, he has been an Associate Professor in the Department of Electrical Engineering, National Chung Cheng University (NCCU), Chiayi, Taiwan. Currently, he is also Director of the Academic Development Division, Office of Research and Development, NCCU. He has also served as a Technical Consultant with the Institute for Information Industry, and with the Center for Aviation and Space Technology and CCL, ITRI. His research interests include analog/digital circuit design, video/audio processing, DSP processors, VLSI systems, RF ICs, microsensors, and communication systems. Dr. Chen was an Associate Editor of IEEE Circuits and Devices Magazine from July 1995 to March 1999, and a Founding Member of the Multimedia Systems and Applications Technical Committee of the IEEE Circuits and Systems Society. He participated in the Technical Program Committee of the IEEE International Conference on Multimedia and Expo, 2000-2002. He was the corecipient of the Best Paper Award of the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS in 1995. He is a Life Member of the Chinese Fuzzy Systems Association.


Sandy (Li Yueh) Wang was born in Taiwan, R.O.C., in 1974. She received the B.S. and M.S. degrees in electrical engineering from National Chung Cheng University, Taiwan, R.O.C., in 1997 and 1999, respectively. In 2000, she joined Winbond Corporation, Hsinchu, Taiwan, R.O.C. Her research interests include operational amplifiers, RF circuit modules, and low-power CMOS integrated circuits for consumer electronics.

Yi-Wen Wu was born in Yunlin, Taiwan, in 1976. She received the B.S. degree in electrical engineering from National Taiwan Ocean University, Keelung, Taiwan, R.O.C., in 1999, and the M.S. degree in electrical engineering from National Chung Cheng University, Chiayi, Taiwan, R.O.C., in 2001. Currently, she is an Integrated Circuit Design Engineer at Etrend Electronics, Inc., Tainan, Taiwan, where she works in the field of very large scale integration (VLSI) circuit design and system analysis. Her research interests include digital circuit design, board-level development, and system integration.
