A 16×16 MUX Based Multiplier Design Using Optimized Static CMOS Logic Style
Abhijit Asati* and Chandrashekhar**
* Lecturer, Electrical & Electronics Engineering Group, BITS, Pilani, India
** Director, Central Electronics Engineering Research Institute, Pilani, India
Abstract
The simpler VLSI implementation of array multipliers makes them preferable for smaller operand sizes, in spite of their linear time complexity. In general, array multipliers have poor space complexity, O(n^2): they require approximately n^2 cells to produce a product, so the circuit area and power grow quickly with operand size. In this paper we present a MUX based 16×16 unsigned multiplier circuit that uses an efficient partial product generation and partial product addition technique. The time and space complexity of this multiplier is much better than that of simpler array multiplier techniques. The multiplier has been designed using optimized static CMOS logic cells to provide the best area, power and delay performance. The circuit is implemented using conventional CMOS logic in the 0.6 µm N-well CMOS process (SCN_SUBM, lambda = 0.3) of MOSIS and simulated after parasitic extraction. The simulation results show a large reduction in propagation delay and average power compared to the tree multiplier implementation in [3].
Keywords: MUX based, array, Wallace tree, Booth encoding, partial product, complexity, operand size
Introduction
In Digital Signal Processor implementations, such as standard Digital Signal Processors and ASIC Digital Signal Processors, the multiplier is a fundamental building block. The performance of signal processing algorithms such as frequency domain filtering (FIR and IIR), frequency-time transformations (FFT) and correlation depends on the performance of the multiplier implementation. In most real-time DSP tasks, the multiplier block must operate at high speed while consuming little layout area and power. Multiplication algorithms differ in their means of partial product generation and partial product addition [1].
Array multipliers have linear time complexity, i.e. O(n), so their delay degrades for larger operand sizes. Array multipliers also have poor space complexity, O(n^2): they require approximately n^2 cells to produce a product, so the circuit area and power grow quickly with operand size [2], [4], [5]. The number of partial product rows can be reduced by a factor of n using radix-m Booth encoding (where m = 2^n). With Booth radix-4 encoding (m = 4 = 2^2) the number of partial product rows is halved [3]; the number of logic cells required to generate the partial products is therefore reduced to n^2/2 [2]. Wallace tree accumulation reduces the ripple effect and produces the product in far less time: the time complexity is reduced to O(log n), but the tree requires a larger gate and routing area than a regular array, which makes it less suitable for VLSI implementation [2]. The hardware reduction of Booth encoding can be combined with accelerated Wallace tree accumulation of the partial products to obtain a time complexity of O(log n), which suits multipliers with large operand sizes [2], [3].
For smaller operand sizes, tree based architectures may have a smaller gate delay but consume more silicon area because of increased routing and encoding overheads; array multipliers, on the other hand, have a larger gate delay but a shorter routing length. MUX based array multipliers give a faster and more compact implementation because of efficient partial product generation and efficient partial product addition. In this paper we present an implementation of a 16×16 multiplier using the MUX based array technique and static CMOS logic cells. These static CMOS logic cells provide the best area, power and delay performance, as described in [6].
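The row-halving effect of radix-4 Booth encoding mentioned above can be illustrated with a small behavioural sketch. This is not taken from the paper or from [3]; the function names `booth_radix4_digits` and `booth_multiply` are hypothetical, and the sketch assumes the textbook recoding rule for an n-bit two's-complement multiplier, d_k = -2·y_{2k+1} + y_{2k} + y_{2k-1} with y_{-1} = 0.

```python
def booth_radix4_digits(y: int, n: int) -> list[int]:
    """Recode an n-bit two's-complement integer y into n/2 radix-4 Booth digits.

    Each digit lies in {-2, -1, 0, 1, 2}; digit k has weight 4**k.
    Assumes the textbook recoding rule, not a rule quoted from the paper.
    """
    if n % 2 != 0:
        raise ValueError("n must be even for radix-4 recoding")
    if not -(1 << (n - 1)) <= y < (1 << (n - 1)):
        raise ValueError("y does not fit in n two's-complement bits")
    # Two's-complement bits of y, least significant bit first.
    bits = [(y >> i) & 1 for i in range(n)]
    digits = []
    prev = 0  # implicit bit y_{-1} = 0
    for i in range(0, n, 2):
        digits.append(-2 * bits[i + 1] + bits[i] + prev)
        prev = bits[i + 1]
    return digits


def booth_multiply(x: int, y: int, n: int) -> int:
    """Sum one partial product row per Booth digit (n/2 rows instead of n)."""
    return sum(d * x * 4**k for k, d in enumerate(booth_radix4_digits(y, n)))
```

For a 16-bit multiplier operand the recoding yields 8 digits, so only 8 partial product rows have to be accumulated rather than 16.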
The VLSI implementation of the multiplier circuit uses the 0.6 µm N-well CMOS process (SCN_SUBM, lambda = 0.3) of MOSIS with conventional CMOS logic. Simulation results are compared with a faster Booth encoded Wallace tree multiplier implementation [3]. Section II discusses the conventional static CMOS logic design style, Section III explains the MUX based multiplier algorithm, Section IV illustrates the multiplication logic, and Section V describes the schematics of the 4×4 multiplier and the 16×16 multiplier. Physical implementation and results are described in Section VI. Section VII concludes the paper.
style at scaled down technologies. A logic gate with a fan-in of n requires 2n devices (n N-type and n P-type). Two logic blocks, the N-block and the P-block, form a CMOS gate, and the topology of the N-block is the dual of that of the P-block. Since both blocks have an equal number of transistors, the transistor count may increase. The channel widths of series-connected n-channel MOS transistors (NMOS) or p-channel MOS transistors (PMOS) have to be increased to obtain a reasonable conducting current to drive capacitive loads. The increase in PMOS size results in a significant area overhead and an increased gate input capacitance, which may lead to high dynamic power dissipation. The higher gate input capacitance also loads the previous stage and thereby increases the delay. The ratio of PMOS to NMOS transistor widths should be chosen optimally to achieve a good noise margin, higher speed and lower power consumption, as described in [7], [9]. The short-circuit currents of a static CMOS gate can be minimized by sizing transistors for equal rise and fall times. The schematics of the 1-bit full adder, 2-input AND, 3-input AND and 2-input MUX implemented in the conventional static CMOS logic design style are shown in Figure 1. The full adder cell, designed using the principle of symmetry, has 28 transistors, as described in [6], [8]. The 28-transistor version performs considerably better than the 40-transistor version [6]. A 32-bit adder designed using complementary CMOS has a power-delay product of less than half that of the CPL version [6]. The 2-input AND cell, 3-input AND cell, 2-input MUX and other cells also provide a better power-delay product.
Figure 1: Schematics using the conventional static CMOS logic design style of (a) complex full adder cell using the principle of symmetry, (b) 2-input AND gate, (c) 3-input AND gate, (d) 2-input MUX.
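The logic behind a symmetric 28-transistor full adder can be checked with a short truth-table sketch. The paper does not give the cell equations, so this assumes the standard "mirror adder" formulation found in textbooks, in which the sum reuses the inverted carry; the function names are hypothetical.

```python
from itertools import product


def mirror_full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Full adder in the mirror-adder form (assumed, not quoted from the paper).

    cout = a*b + cin*(a + b)
    s    = a*b*cin + not(cout)*(a + b + cin)
    """
    cout = (a & b) | (cin & (a | b))
    ncout = 1 - cout  # inverted carry, reused by the sum stage
    s = (a & b & cin) | (ncout & (a | b | cin))
    return s, cout


def check_all_inputs() -> bool:
    """Compare the cell against integer addition for all 8 input combinations."""
    for a, b, cin in product((0, 1), repeat=3):
        s, cout = mirror_full_adder(a, b, cin)
        if 2 * cout + s != a + b + cin:
            return False
    return True
```

Reusing the inverted carry in the sum expression is what lets the sum stage avoid separate XOR gates in this formulation.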
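Let X and Y be two n-bit unsigned numbers and P their product:

$$X = x_{n-1}x_{n-2}\ldots x_0, \qquad Y = y_{n-1}y_{n-2}\ldots y_0, \qquad P = XY \tag{1}$$

$X_j$ and $Y_j$ are the binary numbers obtained by truncating X and Y below the (j+1)th bit, i.e. keeping bits $j-1$ down to $0$, and $P_j$ is their product:

$$X_j = x_{j-1}x_{j-2}\ldots x_0, \qquad Y_j = y_{j-1}y_{j-2}\ldots y_0, \qquad P_j = X_jY_j$$

with $X_0 = Y_0 = 0 = P_0$ and

$$X = X_n = X_{n-1} + 2^{n-1}x_{n-1}, \qquad Y = Y_n = Y_{n-1} + 2^{n-1}y_{n-1}$$

Then

$$\begin{aligned}
P_n = X_nY_n &= \left\{X_{n-1} + 2^{n-1}x_{n-1}\right\}\left\{Y_{n-1} + 2^{n-1}y_{n-1}\right\} \\
&= 2^{2n-2}x_{n-1}y_{n-1} + 2^{n-1}\left(x_{n-1}Y_{n-1} + X_{n-1}y_{n-1}\right) + X_{n-1}Y_{n-1} \\
&= \sum_{j=0}^{n-1} x_jy_j2^{2j} + \sum_{j=0}^{n-1}\left(x_jY_j + X_jy_j\right)2^{j} + P_0 \\
&= \sum_{j=0}^{n-1} x_jy_j2^{2j} + \sum_{j=0}^{n-1}\left(x_jY_j + X_jy_j\right)2^{j} \\
&= \sum_{j=0}^{n-1} x_jy_j2^{2j} + \sum_{j=0}^{n-1} Z_j2^{j}
\end{aligned} \tag{3}$$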
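The summation form follows from applying the same expansion to every truncated product in turn. Written as a recursion, and using the abbreviation $Z_j = x_jY_j + X_jy_j$, each step adds one AND term and one $Z_j$ term; the small case $n = 2$ with $X = 3$, $Y = 2$ is included as an illustrative check (these operand values are chosen here, not taken from the paper):

$$P_{j+1} = P_j + 2^{2j}x_jy_j + 2^{j}Z_j, \qquad P_0 = 0$$

$$\begin{aligned}
X = 11_2,\; Y = 10_2: \quad & X_0 = Y_0 = 0, \quad X_1 = x_0 = 1, \quad Y_1 = y_0 = 0 \\
j = 0: \quad & x_0y_0\,2^{0} = 0, \quad Z_0 = x_0Y_0 + X_0y_0 = 0 \\
j = 1: \quad & x_1y_1\,2^{2} = 4, \quad Z_1 = x_1Y_1 + X_1y_1 = 1, \quad Z_1\,2^{1} = 2 \\
& P = 4 + 2 = 6 = 3 \times 2
\end{aligned}$$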
where $Z_j = x_jY_j + X_jy_j$, that is,

$$Z_j = \begin{cases}
X_j & \text{if } x_j = 0,\; y_j = 1 \\
Y_j & \text{if } x_j = 1,\; y_j = 0 \\
0 & \text{if } x_j = 0,\; y_j = 0 \\
X_j + Y_j & \text{if } x_j = 1,\; y_j = 1
\end{cases} \tag{4}$$
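The decomposition can be exercised with a behavioural sketch in which each Z_j is picked by a 4:1 selection on the bit pair (x_j, y_j). This models only the arithmetic of equations (3) and (4), not the cell-level carry handling of the actual circuit; the function names `mux_select` and `mux_multiply` are hypothetical.

```python
def mux_select(xj: int, yj: int, Xj: int, Yj: int) -> int:
    """4:1 selection of Z_j from 0, X_j, Y_j and X_j + Y_j, as in equation (4)."""
    options = {
        (0, 0): 0,
        (0, 1): Xj,
        (1, 0): Yj,
        (1, 1): Xj + Yj,
    }
    return options[(xj, yj)]


def mux_multiply(x: int, y: int, n: int) -> int:
    """Unsigned n-bit product: sum of x_j*y_j*2^(2j) and Z_j*2^j, as in equation (3)."""
    if not (0 <= x < (1 << n) and 0 <= y < (1 << n)):
        raise ValueError("operands must be unsigned n-bit values")
    result = 0
    for j in range(n):
        xj = (x >> j) & 1
        yj = (y >> j) & 1
        mask = (1 << j) - 1          # keeps bits j-1 .. 0
        Xj = x & mask                # truncated operand X_j
        Yj = y & mask                # truncated operand Y_j
        result += (xj & yj) << (2 * j)                  # AND2 term
        result += mux_select(xj, yj, Xj, Yj) << j       # MUX term
    return result
```

Only n AND terms and n selected Z_j terms are accumulated, compared with the n^2 bit products of a plain array multiplier.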
[Figure: illustration of the multiplication logic, showing 4:1 MUX inputs of the form 0 / X2 / Y2 / S2 and the product bits P5 to P7.]
[Figure: schematic of the 4×4 MUX based multiplier, built from AND2 gates generating the x_j y_j 2^(2j) terms and 4:1 MUX cells generating the Z_j 2^j terms.]
[Figure: CELL-I (4:1 MUX with inputs 0, Xi, Yi, Si and outputs Cout, Sout) and CELL-II (full adder FA with an AND2 gate forming XiYi).]
the 16-bit multiplier were carried out using these cell libraries and the automatic place and route tool L-Edit (SPR) from M/s Tanner Research Inc. The physical library with a W/L ratio of 3 for the NMOS transistors gave the smallest average switching energy-delay product. The generated layouts were simulated after parasitic extraction using the circuit simulator ELDO SPICE. The supply voltage VDD was kept at 3.3 V. Table 1 compares the propagation delay and the power dissipation at a 20 MHz data rate with the tree based implementation in [3]. Table 2 shows the maximum power, leakage power, transistor count, core area, total routing length and number of vias.
Table 1
| Algorithm (technology) | VDD (V) | Propagation delay (ns) | Average power (mW) |
| Proposed (0.6 µm) | 3.3 | 14.15 | 22.05 |
| BEWM ref [3] (1.25 µm) | 5 | 60 | 100 |
Table 2
| Algorithm (technology) | Maximum power (mW) | Leakage power (nW) | Transistor count | Core area (mm^2) | Total routing length (mm) | Number of vias |
| Proposed (0.6 µm) | 623.46 | 53.34 | 10168 | 23.76 | 1386.71 | 3452 |
Comparing the two multiplier architectures, the proposed MUX based array multiplier reduces the propagation delay to about 0.235 times, and the average power consumption to about 0.22 times, that of the implementation in [3]. The maximum instantaneous power, leakage power, transistor count, core area, total routing length and number of vias are also reported to characterize the VLSI implementation.
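The quoted factors are simply the ratios of the Table 1 entries. A minimal sketch of that arithmetic, using only the delay and power values reported above (the dictionary and function names are hypothetical):

```python
# Values copied from Table 1; the names are illustrative only.
PROPOSED = {"delay_ns": 14.15, "power_mw": 22.05}
REFERENCE = {"delay_ns": 60.0, "power_mw": 100.0}


def improvement_ratio(key: str) -> float:
    """Ratio of the proposed design's figure to the reference figure from [3]."""
    return PROPOSED[key] / REFERENCE[key]


delay_ratio = improvement_ratio("delay_ns")   # about 0.236
power_ratio = improvement_ratio("power_mw")   # about 0.22

if __name__ == "__main__":
    print(f"delay ratio: {delay_ratio:.3f}")
    print(f"power ratio: {power_ratio:.3f}")
```

Both ratios come out close to one quarter, which is the figure the conclusion rounds to.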
Conclusion
This paper presents a 16-bit MUX based unsigned multiplier implementation using an optimized static CMOS logic style. The multiplier algorithm performs efficient partial product generation and addition, which makes its time and space complexity better than that of other array multipliers. Compared with a faster tree multiplier implementation, the simulation results show that the propagation delay and the average switching power are each reduced to approximately one quarter.
References
[1] A. Hesham, Technology scaling effects on multipliers, IEEE Transactions on Computers, Vol. 47, No. 11, pp. 1201-1215, November 1998.
[2] Z. Kiamal, Multiplexer-based array multipliers, IEEE Transactions on Computers, Vol. 48, No. 1, pp. 15-23, January 1999.
[3] F. Jalil, M×N Booth encoded multiplier generator using optimized Wallace trees, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 1, No. 2, pp. 120-125, June 1993.
[4] V. Chandramouli, Self-timed design in GaAs: case study on a high-speed, parallel multiplier, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 4, No. 1, pp. 146-149, March 1996.
[5] P. Kornerup, A systolic, linear-array multiplier for a class of right-shift algorithms, IEEE Transactions on Computers, Vol. 43, No. 8, pp. 892-898, August 1994.
[6] Reto Zimmermann and Wolfgang Fichtner, Low-Power Logic Styles: CMOS Versus Pass Transistor Logic, IEEE Journal of Solid-State Circuits, Vol. 32, No. 7, pp. 1079-1090, July 1997.
[7] Mohab Anis, Mohamed Allam and Mohamed Elmasry, Impact of Technology Scaling on CMOS Logic Styles, IEEE Transactions on Circuits and Systems-II: Analog and Digital Signal Processing, Vol. 49, No. 8, pp. 577-587, August 2002.
[8] S. M. Kang and Yusuf Leblebici, CMOS Digital Integrated Circuits: Analysis and Design, Third Edition, McGraw-Hill, 2003.
[9] N. Weste and K. Eshraghian, Principles of CMOS VLSI Design, Addison-Wesley, 1994.
[10] Jan M. Rabaey, Anantha Chandrakasan and Borivoje Nikolic, Digital Integrated Circuits, Second Edition, Prentice-Hall of India Private Limited, 2004.