Abstract - Recently, a renewed interest has been seen in RNS (Residue Number System), which stems from the fact that these systems are inherently parallel and modular and thus are fast and simple. In many DSP applications the Multiply-Accumulate (MAC) operation turns out to be the most basic one, and hence an RNS-based 36-bit MAC architecture is presented in this paper to speed up the whole operation. A further enhancement in speed is achieved by exploiting the logarithmic properties of Galois fields and integer rings. The choice of forward and reverse converters used in the design results in considerable savings in silicon real estate. The adder cells used are based on a pass-transistor design, which contributes to very low power consumption.

I. INTRODUCTION

The Residue Number System (RNS) eliminates the speed-draining carry propagation in arithmetic computations, thus making it attractive in heavily computation-intensive applications. The parallel arithmetic nature of RNS offers a potential speed-up in FIR filtering, discrete Fourier transforms, matrix multiplications, and similar computations [1]. A Multiply-Accumulate (MAC) unit is an integral part of an arithmetic processor used in such applications. In a MAC unit, multiplication speed is of paramount importance, and it becomes necessary to search for new strategies for increasing the speed. One such strategy is the use of index calculus for doing multiplication. Even though it speeds up multiplication by converting it to addition, its domain was limited to the set of prime moduli. When a large range such as 32 or 36 bits is aimed at with a set of prime moduli, the moduli chosen will eventually result in non-uniform word length. This necessitates exploring additional moduli with similar properties that will provide extra flexibility with uniform word length. It has been shown that index calculus techniques could be extended to non-prime moduli that are powers of 2 [2,3]. This was further extended to powers of odd primes in [4,5]. Thus, by combining both primes and powers of prime moduli, a very fast and less complex multiplier could be implemented.

Evaluation of the summation of the products obtained is the next phase of the computation. This is achieved by performing addition in the residue domain using modulo adders. Since the proposed MAC is a multiplierless unit, the main logic units are modulo adders. Hence the performance of the design mainly depends on the choice of modulo adder used, and a modulo adder which outperforms other counterparts is chosen [6]. Another design issue, equally important, is the type of binary adder cell employed in the modulo adder. So, to make the MAC efficacious, a new low-power CMOS adder cell is used, which plays a pivotal role in scaling down the power requirements of the entire MAC unit [7].

The MAC operation is done completely in the residue domain. So the operands have to be initially transformed into the residue domain (forward conversion), and after the completion of the MAC operation the sum residues have to be converted back to binary form (reverse conversion). In order to effect these conversions, a forward and a reverse converter are also incorporated in the proposed design. With these, the design represents a complete MAC unit with both the inputs and outputs in standard binary form.

II. RNS OVERVIEW

In RNS, an integer $X$ is uniquely represented by an $r$-tuple of integers $(x_1, x_2, \ldots, x_r)$, which is called the residue representation of $X$. The integers $x_i$, $i = 1, 2, \ldots, r$, are called the residues and are obtained as remainders when the number $X$ is divided by a set of distinct relatively prime integers $m_i$, $i = 1, 2, \ldots, r$, which are called the moduli of the residue number system. Thus, $x_i = X \bmod m_i$, denoted by $|X|_{m_i}$, where $0 \le x_i < m_i$. It follows from the Chinese Remainder Theorem (CRT) that, for any given $r$-tuple satisfying the above relationships, there exists one and only one integer $X$ such that $0 \le X < M$, where $M = \prod_{i=1}^{r} m_i$ defines the range of the number system. The number $X$ can be evaluated from the $r$-tuple $(x_1, x_2, \ldots, x_r)$ using the CRT reconstruction formula (1).

Arithmetic operations on two operands $X$ and $Y$ are defined as $Z = X \circ Y$, where $X = (x_1, x_2, \ldots, x_r)$, $Y = (y_1, y_2, \ldots, y_r)$, $Z = (z_1, z_2, \ldots, z_r)$, and $z_i = |x_i \circ y_i|_{m_i}$ for $i = 1, \ldots, r$. The symbol $\circ$ denotes any of the operations of addition, subtraction or multiplication. From the above definition, it is seen that these operations are performed in parallel in each of the residue channels, independent of one another. This inherent parallelism and the carry-free arithmetic between different residue channels provide speed-up during arithmetic processing.
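For reference, the CRT reconstruction that (1) refers to is commonly written in the following standard form. The symbols $M_i$ and $M_i^{-1}$ are conventional notation assumed here, not taken from the text, so the paper's own equation may be arranged differently.

```latex
X = \left| \sum_{i=1}^{r} M_i \left| M_i^{-1} x_i \right|_{m_i} \right|_{M},
\qquad M_i = \frac{M}{m_i},
\qquad \left| M_i M_i^{-1} \right|_{m_i} = 1 .
```

Here $M_i^{-1}$ is the multiplicative inverse of $M_i$ modulo $m_i$, which exists because the moduli are pairwise relatively prime.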
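The residue representation, the channel-wise arithmetic and the CRT reconstruction of the RNS overview can be illustrated with a minimal Python sketch. The moduli set (25, 27, 29, 31) is an assumption chosen only because its members are pairwise relatively prime 5-bit values built from primes and prime powers; it is not the moduli set of the proposed design, and the function names are hypothetical.

```python
import operator
from math import prod

# Illustrative moduli only (assumed): pairwise relatively prime, 5 bits each.
MODULI = (25, 27, 29, 31)

def to_residues(x, moduli=MODULI):
    """Forward conversion: x_i = X mod m_i for every channel."""
    return tuple(x % m for m in moduli)

def channel_op(xs, ys, op, moduli=MODULI):
    """Apply op independently in each residue channel, reduced mod m_i."""
    return tuple(op(x, y) % m for x, y, m in zip(xs, ys, moduli))

def from_residues(residues, moduli=MODULI):
    """Reverse conversion by the Chinese Remainder Theorem."""
    M = prod(moduli)
    total = 0
    for x_i, m_i in zip(residues, moduli):
        M_i = M // m_i
        inv = pow(M_i, -1, m_i)  # inverse of M_i modulo m_i (Python 3.8+)
        total += M_i * ((inv * x_i) % m_i)
    return total % M

if __name__ == "__main__":
    X, Y = 123, 456
    z = channel_op(to_residues(X), to_residues(Y), operator.mul)
    # The product is recovered exactly because X * Y is below the range M.
    print(from_residues(z) == X * Y)
```

No channel ever sees a value wider than its own modulus, which is the carry-free property the text relies on. Results are only unique within the range $0 \le X < M$, so a difference such as 123 - 456 comes back as its value modulo $M$.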
[Figure: block diagram and adder-cell equations. Recoverable labels: inputs Xij and Yij, two Index ROMs, Inverse Index ROM, adder (+), Register; H = A ⊕ B, S = H ⊕ Ci.]
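The index-calculus multiplication named in the introduction (index look-up, addition of indices, inverse index look-up) can be sketched as follows for a single prime modulus. The modulus 31, the primitive root 3, the dictionary-based tables and the function names are illustrative assumptions; the separate handling of zero operands is also an assumption, needed because zero has no index.

```python
def build_index_tables(p, g):
    """Index (discrete log) and inverse-index tables for prime p, primitive root g."""
    index, inverse = {}, {}
    value = 1
    for k in range(p - 1):
        index[value] = k      # index ROM: residue -> exponent of g
        inverse[k] = value    # inverse index ROM: exponent -> residue
        value = (value * g) % p
    if len(index) != p - 1:
        raise ValueError("g is not a primitive root modulo p")
    return index, inverse

def index_multiply(x, y, p, index, inverse):
    """Compute (x * y) mod p using one addition modulo p - 1."""
    if x == 0 or y == 0:
        return 0              # zero has no index, so it is handled separately
    return inverse[(index[x] + index[y]) % (p - 1)]

if __name__ == "__main__":
    index, inverse = build_index_tables(31, 3)
    print(index_multiply(7, 9, 31, index, inverse) == (7 * 9) % 31)
```

The multiplier itself reduces to two table look-ups and one modulo $p - 1$ addition, which is why the modulo adder dominates the hardware. Non-prime moduli, as discussed in [2,3] and [4,5], need a different index scheme that this sketch does not cover.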
... $m_i$ is less than $2^n - 1$. A modification is also made based on the fact that the offset value to be added is known a priori, so the logic requirements of the second adder in a cascade of two can be brought down to half. Thus the primitive cells of the basic mod $m_i$ adder are constructed from a full adder (FA) followed by a zero-primitive (one-primitive) for a zero-offset (one-offset). To bring down the power and area requirements, the transmission-gate adder used in the above adder is replaced with a new low-power counterpart given in [7]. This 14-transistor pass-logic adder implementation, which outperforms all other counterparts, is shown in Fig. 2.

V. FORWARD AND REVERSE CONVERTERS

... have been reduced by 5 bits. This amounts to an overall reduction of $2^n \times n \times r$ (1280) bits of ROM for the entire design. The reduction in total hardware requirements due to this is about 14% compared to earlier ones. There is also a 14% delay reduction, assuming the use of carry-propagate adders.
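Returning to the mod $m_i$ adder built from a cascade of two adders with a fixed offset, a generic bit-level sketch follows. It is not the primitive-cell design of [6]: the offset $2^n - m$, the carry-select rule and all function names are assumptions. The one-bit cell uses H = A ⊕ B and S = H ⊕ Ci as in the figure labels, while its carry expression is the standard multiplexer identity and is likewise assumed.

```python
def full_adder(a, b, ci):
    """One-bit full adder in H/S form: H = A xor B, S = H xor Ci."""
    h = a ^ b
    s = h ^ ci
    co = ci if h else a  # assumed carry: Ci when A != B, otherwise A
    return s, co

def ripple_add(x, y, n):
    """n-bit ripple-carry addition; returns (sum mod 2**n, carry-out)."""
    carry, total = 0, 0
    for k in range(n):
        s, carry = full_adder((x >> k) & 1, (y >> k) & 1, carry)
        total |= s << k
    return total, carry

def mod_add(a, b, m, n):
    """(a + b) mod m for 0 <= a, b < m <= 2**n, using two cascaded n-bit adders."""
    offset = (1 << n) - m               # constant, known in advance for each modulus
    s1, c1 = ripple_add(a, b, n)        # first adder: a + b
    s2, c2 = ripple_add(s1, offset, n)  # second adder: add the fixed offset
    return s2 if (c1 | c2) else s1      # a carry from either adder means a + b >= m

if __name__ == "__main__":
    print(mod_add(20, 15, 29, 5) == (20 + 15) % 29)
```

Because the second operand of the second adder is a constant, each of its bit positions can be simplified at design time, which is the observation behind the halved logic requirement mentioned in the text.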
Finally, a block schematic of a 36-bit MAC unit complete with the forward and reverse converters is shown in Fig. 5.

[Fig. 5: Binary to Residue Converter -> MAC -> Residue to Binary Converter]
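The data flow of the complete unit (binary in, residue-domain MAC, binary out) can be sketched end to end as below. This is a behavioural illustration only: the function name `rns_mac` and the moduli passed in the usage line are hypothetical, and the ordinary `%` and `*` operators stand in for the converter and multiplier hardware.

```python
from math import prod

def rns_mac(pairs, moduli):
    """Return sum(x * y for x, y in pairs) computed channel by channel."""
    acc = [0] * len(moduli)
    for x, y in pairs:
        for i, m in enumerate(moduli):
            # Forward conversion of both operands, then multiply-accumulate mod m.
            acc[i] = (acc[i] + (x % m) * (y % m)) % m
    # Reverse conversion of the accumulated residues by the CRT.
    M = prod(moduli)
    result = 0
    for a, m in zip(acc, moduli):
        M_i = M // m
        result += M_i * ((pow(M_i, -1, m) * a) % m)  # pow(..., -1, m): Python 3.8+
    return result % M

if __name__ == "__main__":
    pairs = [(3, 4), (10, 20), (7, 9)]
    print(rns_mac(pairs, (25, 27, 29, 31)) == sum(x * y for x, y in pairs))
```

The result matches the ordinary sum of products only while that sum stays below the product of the moduli, which is why the dynamic range is fixed by the moduli set.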
VI. CONCLUSIONS
REFERENCES

[1] M. A. Soderstrand, W. K. Jenkins, G. A. Jullien, and F. J. Taylor, Residue Number System Arithmetic: Modern Applications in Digital Signal Processing. New York: IEEE Press, 1986.
[2] I. M. Vinogradov, Elements of Number Theory. New York: Dover Publications, 1954.
[3] G. C. Cardarilli, R. Lojacono, G. Martinelli, and M. Salerno, "Structurally passive digital filters in residue number systems," IEEE Trans. Circuits Syst., vol. 35, pp. 149-158, 1988.
[4] D. Radhakrishnan, "Modulo multipliers using polynomial rings," IEE Proc. Circuits Devices Syst., vol. 145, no. 6, pp. 443-445, Dec. 1998.
[5] D. Radhakrishnan and A. P. Preethy, "A novel 36-bit single fault tolerant multiplier using 5-bit moduli," IEEE TENCON 98, vol. I, pp. 128-130, New Delhi, India, Dec. 1998.
[6] J. C. Smith and F. J. Taylor, "A fault-tolerant GEQRNS processing element for linear systolic array DSP applications," IEEE Trans. Comput., vol. 44, no. 9, pp. 1121-1130, Sep. 1995.
[7] D. Radhakrishnan, "Formal design procedures for low-power CMOS full adder cells," IEEE Trans. Circuits Syst. II, unpublished.
[8] M. Dugdale, "Residue multipliers using factored decomposition," IEEE Trans. Circuits Syst. II, vol. 41, no. 9, pp. 623-627, Sept. 1994.
[9] J. Mathew, D. Radhakrishnan, and T. Srikanthan, "Using the 2^n property to implement an efficient general purpose residue-to-binary converter," Intl. Symp. Signals, Circuits Syst., Iasi, Romania, July 1999.
[10] D. Radhakrishnan and Y. Yuan, "Novel approaches to the design of VLSI RNS multipliers," IEEE Trans. Circuits Syst., vol. 39, no. 1, pp. 52-57, Jan. 1992.