Low Power Efficient MAC Unit Using Proposed Carry Select Adder

Y Rama Krishna
PG Scholar in VLSI Design, Department of ECE,
Dhanekula Institute of Engineering & Technology,
Ganguru, Krishna Dist., Andhra Pradesh, India.

K Suresh Babu
Assistant Professor, Department of ECE,
Dhanekula Institute of Engineering & Technology,
Ganguru, Krishna Dist., Andhra Pradesh, India.
Abstract:
This paper implements a high-performance 8-bit multiplier-and-accumulator (MAC) unit. The MAC unit performs an important operation in many digital signal processing (DSP) applications. The multiplier is a basic array multiplier and the adder is the proposed carry select adder (CSLA). The proposed CSLA design involves significantly less area and delay than the recently proposed BEC-based CSLA. Due to its small carry-output delay, the proposed CSLA design is well suited to the square-root (SQRT) CSLA. The complete design is coded in Verilog HDL and synthesized with Cadence RTL Compiler using typical libraries of TSMC 180 nm technology. The total power dissipation is 605 µW.

Keywords: Adder, arithmetic unit, low power design, array multiplier, carry select adder, multiplier and accumulator (MAC).

INTRODUCTION
Low-power, area-efficient, and high-performance VLSI systems are increasingly used in portable and mobile devices, multistandard wireless receivers, and biomedical instrumentation [2], [3]. An adder is the main component of an arithmetic unit. A complex digital signal processing (DSP) system involves several adders, so an efficient adder design substantially improves the performance of a complex DSP system. A ripple carry adder (RCA) uses a simple design, but carry propagation delay (CPD) is the main concern in this adder. Carry look-ahead and carry select (CS) methods have been suggested to reduce the CPD of adders.

A conventional carry select adder (CSLA) is an RCA–RCA configuration that generates a pair of sum words and output-carry bits corresponding to the anticipated input-carry (Cin = 0 and 1) and selects one out of each pair for the final-sum and final-output-carry [4]. A conventional CSLA has less CPD than an RCA, but the design is not attractive since it uses a dual RCA. A few attempts have been made to avoid the dual use of RCA in CSLA design. Kim and Kim [5] used one RCA and one add-one circuit instead of two RCAs, where the add-one circuit is implemented using a multiplexer (MUX). He et al. [6] proposed a square-root (SQRT)-CSLA to implement large bit-width adders with less delay. In a SQRT-CSLA, CSLAs of increasing size are connected in a cascading structure. The main objective of the SQRT-CSLA design is to provide a parallel path for carry propagation that helps to reduce the overall adder delay. Ramkumar and Kittur [7] suggested a binary-to-excess-1 converter (BEC)-based CSLA. The BEC-based CSLA involves less logic resources than the conventional CSLA, but it has marginally higher delay. A CSLA based on common Boolean logic (CBL) is also proposed in [8] and [9]. The CBL-based CSLA of [8] involves significantly less logic resource than the conventional CSLA, but it has a longer CPD, which is almost equal to that of the RCA. To overcome this problem, a SQRT-CSLA based on CBL was proposed in [9]. However, the CBL-based SQRT-CSLA design of [9] requires more logic resource and delay than the BEC-based SQRT-CSLA of [7]. We observe that logic optimization largely depends on the availability of redundant operations in the formulation, whereas adder delay mainly depends on data dependence. In the existing designs, logic is optimized without giving any consideration to the data dependence. In this brief, we analyze the logic operations involved in
conventional and BEC-based CSLAs to study the data dependence and to identify redundant logic operations. Based on this analysis, we propose a logic formulation for the CSLA. The main contributions of this brief are a logic formulation based on data dependence and an optimized carry generator (CG) and CS design.

Most digital signal processing methods use transforms such as the discrete cosine transform (DCT) or the discrete wavelet transform (DWT). Because these are accomplished by repetitive application of multiplication and addition, the speed of the multiplication and addition arithmetic determines the execution speed and performance of the entire calculation. Multiply-and-accumulate operations are typical of digital filters. Therefore, the functionality of the MAC unit enables high-speed filtering and other processing typical of DSP applications. Since the MAC unit operates completely independently of the CPU, it can process data separately and thereby reduce CPU load. Applications such as optical communication systems, which are based on DSP, require extremely fast processing of huge amounts of digital data. The fast Fourier transform (FFT) also requires addition and multiplication. A MAC unit consists of a multiplier and an accumulator containing the sum of the previous successive products. The MAC inputs are obtained from the memory location and given to the multiplier block. The design consists of an 8-bit array multiplier, a 16-bit carry select adder, and a register.

This paper is divided into six sections. The first section introduces the MAC unit. The second section discusses the detailed operation of the MAC unit. The third and fourth sections deal with the operation of the modified array multiplier and the carry select adder, respectively. The fifth section discusses the results obtained for the 8-bit MAC unit, and the sixth section gives the conclusion.

ARRAY MULTIPLIER
Consider the multiplication of two unsigned n-bit numbers, where $A = a_{n-1}a_{n-2} \ldots a_0$ is the multiplicand and $B = b_{n-1}b_{n-2} \ldots b_0$ is the multiplier. The product $P = p_{2n-1}p_{2n-2} \ldots p_0$ can be written as follows:

$$P = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} a_i \, b_j \, 2^{i+j}$$

Fig. 1 Multiplication Example

An array implementation is shown in Fig. 2. In the 4x4 array multiplier, the multiplier array consists of 3 rows of carry-save adders (CSAs), in which each row contains 3 full adders (FAs). Each FA has three inputs and two outputs: the sum bit and the carry bit. The 3 FAs in the first CSA row, which have only two valid inputs, can be replaced by 3 half adders (HAs), and the 3 FAs in the last row can be constructed as a 3-bit ripple-carry adder.

Fig. 2 Array Multiplier

On the other hand, the Baugh-Wooley multiplier uses the same array structure to handle 2's complement multiplication, with some of the partial products replaced by their complements. The multiplier array consists of (n−1) rows of carry-save adders (CSAs), in which each row contains (n−1) full adders (FAs). The last row is a ripple adder for carry propagation.

PROPOSED ADDER DESIGN
The proposed CSLA is based on the logic formulation given in (3a)-(3g), and its structure is shown in Fig. 3.
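As an illustration of the logic formulation (3a)-(3g) given in this section, the following Python sketch models the HSG, CG, CS, and FSG steps bit by bit. It is a behavioural sketch under stated assumptions, not the Verilog design used in the paper: the names `proposed_csla` and `carry_word` and the list-based bit representation are hypothetical, the input carries of CG0 and CG1 are taken as the fixed values 0 and 1, and the CS step uses one possible AND-OR form that relies on the property that a '1' in the carry word for input-carry 0 implies a '1' in the carry word for input-carry 1.

```python
from typing import Tuple


def proposed_csla(a: int, b: int, cin: int, n: int) -> Tuple[int, int]:
    """Bit-level model of (3a)-(3g); returns (sum, cout) for n-bit operands."""
    # HSG, eq. (3a): half-sum and half-carry bits
    s0 = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(n)]
    c0 = [((a >> i) & 1) & ((b >> i) & 1) for i in range(n)]

    # CG0 / CG1, eqs. (3b) and (3c): carry words for a fixed input carry
    def carry_word(carry_in: int) -> list:
        word, prev = [], carry_in
        for i in range(n):
            prev = c0[i] | (s0[i] & prev)
            word.append(prev)
        return word

    c1_0 = carry_word(0)  # assumes input carry 0
    c1_1 = carry_word(1)  # assumes input carry 1

    # CS, eqs. (3d) and (3e): assumed AND-OR form, equivalent to a 2-to-1 MUX
    # because c1_0[i] == 1 implies c1_1[i] == 1
    c = [c1_0[i] | (cin & c1_1[i]) for i in range(n)]

    # FSG, eqs. (3f) and (3g)
    cout = c[n - 1]
    s_bits = [s0[0] ^ cin] + [s0[i] ^ c[i - 1] for i in range(1, n)]
    total = sum(bit << i for i, bit in enumerate(s_bits))
    return total, cout
```

For example, `proposed_csla(0b1011, 0b0110, 1, 4)` returns `(2, 1)`, which matches 11 + 6 + 1 = 18 = 16 + 2.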
The proposed CSLA consists of one HSG unit, one FSG unit, one CG unit, and one CS unit. The CG unit is composed of two CGs (CG0 and CG1) corresponding to input-carry '0' and '1'. The HSG receives two n-bit operands (A and B) and generates the half-sum word $s_0$ and the half-carry word $c_0$, each n bits wide. Both CG0 and CG1 receive $s_0$ and $c_0$ from the HSG unit and generate two n-bit full-carry words $c_1^0$ and $c_1^1$ corresponding to input-carry '0' and '1', respectively. The logic diagram of the HSG unit is shown in Fig. 3(b). The logic circuits of CG0 and CG1 are optimized to take advantage of the fixed input-carry bits. The optimized designs of CG0 and CG1 are shown in Fig. 3(c) and (d), respectively.

Fig. 3 Proposed architecture

The CS unit selects one final carry word from the two carry words available at its input using the control signal $c_{in}$. It selects $c_1^0$ when $c_{in} = 0$; otherwise, it selects $c_1^1$. The CS unit can be implemented using an n-bit 2-to-1 MUX. However, we find from the truth table of the CS unit that the carry words $c_1^0$ and $c_1^1$ follow a specific bit pattern: if $c_1^0(i) = 1$, then $c_1^1(i) = 1$, irrespective of $s_0(i)$ and $c_0(i)$, for $0 \le i \le n-1$. This feature is used for logic optimization of the CS unit. The optimized design of the CS unit is shown in Fig. 3(e), which is composed of n AND–OR gates. The final carry word $c$ is obtained from the CS unit. The MSB of $c$ is sent to the output as $c_{out}$, and its (n − 1) LSBs are XORed with the (n − 1) MSBs of the half-sum ($s_0$) in the FSG [shown in Fig. 3(f)] to obtain the (n − 1) MSBs of the final-sum ($s$). The LSB of $s_0$ is XORed with $c_{in}$ to obtain the LSB of $s$.

$$s_0(i) = A(i) \oplus B(i), \qquad c_0(i) = A(i) \cdot B(i) \tag{3a}$$
$$c_1^0(i) = c_0(i) + s_0(i) \cdot c_1^0(i-1), \qquad c_1^0(-1) = 0 \tag{3b}$$
$$c_1^1(i) = c_0(i) + s_0(i) \cdot c_1^1(i-1), \qquad c_1^1(-1) = 1 \tag{3c}$$
$$c(i) = c_1^0(i) \quad \text{if } c_{in} = 0 \tag{3d}$$
$$c(i) = c_1^1(i) \quad \text{if } c_{in} = 1 \tag{3e}$$
$$c_{out} = c(n-1) \tag{3f}$$
$$s(0) = s_0(0) \oplus c_{in}, \qquad s(i) = s_0(i) \oplus c(i-1) \tag{3g}$$

Here $c_1^0(-1)$ and $c_1^1(-1)$ denote the fixed input carries of CG0 and CG1.

MAC IMPLEMENTATION
The multiplier-accumulator (MAC) operation is the key operation not only in DSP applications but also in multimedia information processing and various other applications. The MAC unit consists of a multiplier, the proposed adder, and an accumulator. In this paper, we use an 8-bit array multiplier. The MAC inputs are obtained from the memory location and given to the multiplier block. The inputs fed from the memory location are 8 bits wide. When the inputs are given to the multiplier, it computes their product, and hence the output is 16 bits wide.

Fig. 4 MAC Unit

The multiplier output is given as the input to the proposed carry select adder, which performs the addition. The output of the proposed carry select adder is 17 bits wide, i.e., one bit is for the carry (16 bits + 1 bit). The output is then given to the accumulator. The output of the accumulator is taken out or fed back as one of the inputs to the carry select adder. Fig. 4 shows the basic architecture of the MAC unit.
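The MAC data path described in this section (8-bit inputs, 16-bit product, 17-bit adder output, accumulator feedback) can be sketched as a small behavioural model in Python. This is only an illustration under assumptions: the names `array_multiply` and `Mac8` are hypothetical, the multiplier is modelled as the plain sum of partial products rather than the carry-save array, and the accumulator register is assumed to keep the low 16 bits of the adder output while the 17th bit is kept as a separate carry flag, which the text does not specify.

```python
def array_multiply(a: int, b: int, n: int = 8) -> int:
    """Unsigned n x n product as the sum of partial products a_i * b_j * 2^(i+j)."""
    product = 0
    for i in range(n):
        for j in range(n):
            product += (((a >> i) & 1) & ((b >> j) & 1)) << (i + j)
    return product


class Mac8:
    """Behavioural 8-bit MAC: a 16-bit product is added to the fed-back accumulator."""

    def __init__(self) -> None:
        self.acc = 0    # accumulator register (assumed 16 bits wide)
        self.carry = 0  # 17th bit of the adder output

    def step(self, a: int, b: int) -> int:
        product = array_multiply(a & 0xFF, b & 0xFF)  # 16-bit product
        total = product + self.acc                    # 17-bit adder result
        self.carry = (total >> 16) & 1
        self.acc = total & 0xFFFF                     # assumption: low 16 bits are fed back
        return total


mac = Mac8()
mac.step(3, 4)   # accumulator now holds 12
mac.step(5, 6)   # accumulator now holds 12 + 30 = 42
```

Keeping the carry as a separate flag mirrors the "16 bits + 1 bit" description of the adder output; a wider accumulator would be an equally plausible choice.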
SIMULATIONS AND LAYOUT

We have designed the SQRT-CSLA in Verilog using the proposed CSLA design and the existing CSLA designs of [7] and [8] for bit-widths of 16, 32, 64, and 128. All the designs are synthesized in Cadence RTL Compiler using the SAED 180-nm CMOS library. The netlist file is extracted from RTL Compiler. As shown in Table I, the proposed SQRT-CSLA involves significantly less area and less delay and consumes less power than the existing designs. We can find from Fig. 5 that the proposed SQRT-CSLA design offers, on average over the different bit-widths, a saving of 21.9% area and 44.3% power over the RCA-based conventional SQRT-CSLA, 7.16% area and 33.3% power over the BEC-based SQRT-CSLA, and 27% area and 52.2% power over the CBL-based SQRT-CSLA. The power comparison in µW is shown in Fig. 6; the proposed design requires less power than the other designs. The area overhead is shown in Fig. 7, and the data arrival time (DAT) comparison is shown as a graph in Fig. 8.

The 8-bit MAC is designed using the proposed SQRT-CSLA and compared with MACs built from the previously designed adders (conventional, BEC, and CBL); the comparison is shown in Table II. The simulation result and layout are shown below. The proposed MAC design offers a saving of 6.1% area and 14% power over the MAC using CONV-CSLA, 1.57% area and 9.59% power over the MAC using BEC-based CONV-CSLA, and 7.9% area and 17.2% power over the MAC using CBL-based CONV-CSLA. Fig. 9 shows that the proposed MAC requires less area than the conventional MAC designs. Fig. 10 shows the power comparison graph and Fig. 11 shows the data arrival time comparison graph. Fig. 12 shows the layout of the MAC using the proposed adder.

Fig. 5 Simulation output of 8-bit MAC unit

TABLE I: Post-layout ASIC synthesis results comparison
Fig. 6 Power Comparison
Fig. 7 Area Comparison
Fig. 8 DAT Comparison

TABLE II: Implemented post-layout synthesis results comparison for MAC

Fig. 9 Area Comparison
Fig. 10 Power Comparison
Fig. 11 DAT Comparison
Layout of MAC using the proposed SQRT-CSLA:
Fig. 12 MAC Layout

CONCLUSION
We have eliminated all the redundant logic operations of the conventional CSLA and proposed a new logic formulation for the CSLA. The proposed CSLA design involves significantly less area and delay than the recently proposed BEC-based CSLA. Due to its small carry-output delay, the proposed CSLA is a good design for the SQRT adder. Given these performance results, the CONV-CSLA, BEC, CBL, and proposed adders were each placed in the MAC, and the MAC with the proposed adder achieved better results. In future, the array multiplier can be replaced with any low-power multiplier and the design extended to different bit-widths.

REFERENCES
[1] B. K. Mohanty and S. K. Patel, "Area-delay-power efficient carry select adder," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 61, no. 6, pp. 418–422, Jun. 2014.
[2] K. K. Parhi, VLSI Digital Signal Processing. New York, NY, USA: Wiley, 1998.
[3] A. P. Chandrakasan, N. Verma, and D. C. Daly, "Ultralow-power electronics for biomedical applications," Annu. Rev. Biomed. Eng., vol. 10, pp. 247–274, Aug. 2008.
[4] O. J. Bedrij, "Carry-select adder," IRE Trans. Electron. Comput., vol. EC-11, no. 3, pp. 340–344, Jun. 1962.
[5] Y. Kim and L.-S. Kim, "64-bit carry-select adder with reduced area," Electron. Lett., vol. 37, no. 10, pp. 614–615, May 2001.
[6] Y. He, C. H. Chang, and J. Gu, "An area-efficient 64-bit square root carry-select adder for low power application," in Proc. IEEE Int. Symp. Circuits Syst., 2005, vol. 4, pp. 4082–4085.
[7] B. Ramkumar and H. M. Kittur, "Low-power and area-efficient carry-select adder," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 20, no. 2, pp. 371–375, Feb. 2012.
[8] I.-C. Wey, C.-C. Ho, Y.-S. Lin, and C. C. Peng, "An area-efficient carry select adder design by sharing the common Boolean logic term," in Proc. IMECS, 2012, pp. 1–4.
[9] S. Manju and V. Sornagopal, "An efficient SQRT architecture of carry select adder design by common Boolean logic," in Proc. VLSI ICEVENT, 2013, pp. 1–5.
