
2014 2nd International Conference on Devices, Circuits and Systems (ICDCS)

FPGA IMPLEMENTATION OF HIGH SPEED VEDIC MULTIPLIER USING CSLA FOR PARALLEL FIR ARCHITECTURE

Amina Naaz S. (1), Mr. Pradeep M. N. (2), Satish Bhairannawar (3), Srinivas Halvi (4)

(1, 2, 3) Department of Electronics & Communication, Dayanandasagar College of Engineering, Bangalore, India
(4) Department of Medical Electronics, Dayanandasagar College of Engineering, Bangalore, India

(1) aminakhadeerulla@gmail.com, (2) pradec023@gmail.com

Abstract: A great deal of research is under way in communication and signal processing applications, and every such application demands higher throughput arithmetic operations. Multiplication is one of the key arithmetic operations and takes the most execution time, so the development of efficient multipliers has been a subject of interest for decades. There is a need for an efficient multiplier that delivers higher performance for real time signal processing applications. This paper presents the modular design of a Vedic multiplier using carry select adders. The delay of the proposed multiplier is reduced by the high speed carry select adder. The proposed multiplier is applied to a parallel FIR filter. The combinational delay of the proposed multiplier is lower than that of the existing architecture.

Keywords: Urdhva Tiryakbhyam, FIR (Finite Impulse Response), Vedic Mathematics, Parallel FIR Architecture, FFA (Fast FIR Algorithm).

I. INTRODUCTION

In many DSP algorithms the multiplier lies in the critical delay path and ultimately determines the performance of the algorithm. The speed of the multiplication operation is of great importance in DSP as well as in general purpose processors. In the past, multiplication was generally implemented with a sequence of addition, subtraction and shift operations. Many algorithms have been proposed in the literature to perform multiplication, each offering different advantages and tradeoffs in terms of speed, circuit complexity, area and power consumption. The multiplier is a fairly large block of a computing system. The amount of circuitry involved is directly proportional to the square of its resolution, i.e. a multiplier of size n bits has on the order of n^2 gates. For multiplication algorithms performed in DSP applications, latency and throughput are the two major concerns from a delay perspective. Latency is the real delay of computing a function: a measure of how long after the inputs to a device are stable the final result is available on the outputs. Throughput is the measure of how many multiplications can be performed in a given period of time.

Digital multipliers are the core components of all digital signal processors (DSPs), and the speed of a DSP is largely determined by the speed of its multipliers. The most common multiplication algorithms in digital hardware are the array multiplication algorithm and the Booth multiplication algorithm. The computation time taken by the array multiplier is comparatively low because the partial products are calculated independently in parallel. The delay associated with the array multiplier is the time taken by the signals to propagate through the gates that form the multiplication array. Booth multiplication is another important multiplication algorithm. Large Booth arrays are required for high speed multiplication and exponential operations, which in turn require large partial sum and partial carry registers. Multiplication of two n-bit operands using a radix-4 Booth recoding multiplier requires approximately n / (2m) clock cycles to generate the least significant half of the final product, where m is the number of Booth recoder adder stages. Thus, a large propagation delay is associated with this case. Because of the importance of digital multipliers in DSP, they have always been an active area of research, and a number of interesting multiplication algorithms have been reported in the literature.

Vedic mathematics was rediscovered in the early twentieth century from ancient Indian scriptures (Vedas). Conventional mathematical algorithms can be simplified and even optimized by the use of Vedic mathematics. The Vedic algorithms can be applied to arithmetic, trigonometry, plane and spherical geometry, and calculus. One of the main purposes of Vedic mathematics is to transform tedious calculations into simpler, orally manageable operations. Vedic mathematics provides more than one method for basic operations like multiplication and division. For each operation there is at least one generic method, along with some methods directed towards specific cases that simplify the calculations further. Central Processing Units work at increasingly higher frequencies as transistor sizes shrink. The Arithmetic and Logic Unit (ALU) is one of the most important and critical blocks in a Central Processing Unit, so it is imperative to have a fast and efficient ALU. Division is the most time consuming of the basic mathematical operations. In computing, functions like sine and cosine are also frequently required and are implemented in hardware. With these considerations, it is always important to have a fast and efficient mechanism to implement mathematical functions. Vedic mathematics provides algorithms that simplify the mathematics, which makes it well suited to the problem stated, and it also consumes less area.

978-1-4799-1356-5/14/$31.00 2014 IEEE

Organization: The paper is organized as follows. Section II reviews the related work. Section III describes the architecture of the proposed Vedic multiplier. Section IV applies the proposed Vedic multiplier to a parallel FIR filter. Section V contains results and discussion. Section VI gives the conclusions, followed by future work.


II. RELATED WORK


Laxman P. Thakre et al. [1] presented the Urdhva Tiryakbhyam sutra and the Nikhilam sutra. A conventional multiplier was synthesized, simulated and compared with a Vedic multiplier individually, and both were also tested in FFT operations. The comparison shows that the conventional multiplier consumes 80% more hardware, memory and area than the Vedic multiplier. S. S. Kerur et al. [2] noted that fast multiplications are required to compute convolution, FFT, etc. Multiplication is done using the Urdhva Tiryakbhyam algorithm from ancient Vedic mathematics, and the combinational delay obtained is compared with that of a normal multiplier. The Vedic multiplier is further used in matrix multiplication and proves to be efficient in terms of speed. G. Vaithiyanathan et al. [3] showed that a Vedic multiplier implemented using the Urdhva Tiryakbhyam sutra consumes less power than an array multiplier. Pushpalata Verma and K. K. Mehta [4] proposed an 8 bit architecture of a Vedic multiplier using ripple carry adders. The proposed architecture is compared with the conventional method of multiplication and is found to be faster. Nivedita A. Pande et al. [5] proposed a design methodology for high speed multiplication, where two integers of n-bit size each are multiplied to produce a 2n-bit product. Their paper presents an efficient and simple method for performing multiplication in which lower level multipliers are used to design higher level multipliers. Krishnaveni D. and Umarani [6] proposed a 4x4 binary multiplier designed using the Urdhva Tiryakbhyam sutra, together with a new 4-bit adder that reduces the delay of the multiplier. R. Naresh Naik et al. [7] proposed floating point multipliers implemented using the Urdhva Tiryakbhyam sutra, which show a considerable improvement in speed compared with a normal Booth multiplier. R. Priya and J. Senthil Kumar [8] proposed a modified CSLA applied to a Vedic multiplier, which shows a reduction in gate count compared with a regular CSLA. Arushi Somani et al. [9] proposed a Vedic multiplier and compared it with the conventional method in terms of power, delay, space, speed, power delay product and energy delay product.

III. PROPOSED ARCHITECTURE OF 16X16 VEDIC MULTIPLIER

Two-digit multiplication using Urdhva Tiryakbhyam is explained below for 13 x 12. The least significant digit 3 of the multiplicand is multiplied vertically by the least significant digit 2 of the multiplier; their product 6 is set down as the least significant part of the answer. Then 2 and 1, and 3 and 1, are multiplied crosswise and the two products are added; the sum 5 is set down as the middle part of the answer. Finally 1 and 1 are multiplied vertically, and their product 1 is set down as the left-most part of the answer. So 13 x 12 = 156.
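The vertical and crosswise steps above can be sketched in a few lines of Python. This is only an illustration of the decimal procedure: the function name `urdhva_2digit` is hypothetical, and the carry handling between the three parts is an assumption added so that the sketch also works when a part exceeds 9 (the 13 x 12 example needs no carries).

```python
def urdhva_2digit(x, y):
    """Multiply two 2-digit decimal numbers with the vertical-and-crosswise steps."""
    x1, x0 = divmod(x, 10)          # tens and units digit of the multiplicand
    y1, y0 = divmod(y, 10)          # tens and units digit of the multiplier

    right = x0 * y0                 # vertical product of the units digits
    middle = x1 * y0 + x0 * y1      # sum of the two crosswise products
    left = x1 * y1                  # vertical product of the tens digits

    # Propagate carries from right to left (assumed step, not needed for 13 x 12).
    carry, d0 = divmod(right, 10)
    carry, d1 = divmod(middle + carry, 10)
    left += carry
    return left * 100 + d1 * 10 + d0


print(urdhva_2digit(13, 12))        # right = 6, middle = 5, left = 1
```

For 13 x 12 the three parts are 6, 5 and 1, which read from left to right give 156.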

The Urdhva Tiryakbhyam algorithm can be implemented for the binary number system in the same way as for decimal numbers. The 2x2 Vedic multiplier module is then used to build higher level multipliers (4x4, 8x8 and 16x16). The proposed 16x16 Urdhva Tiryakbhyam multiplier, built from 8x8 Vedic modules that are in turn built from 4x4 and 2x2 modules together with high speed carry select adders, is shown in Figure 1. The proposed method is compared with the existing architecture [10], a modular Urdhva Tiryakbhyam design that uses carry save adders and a 17 bit ripple adder. In the proposed method these adders are replaced with high speed carry select adders, which reduces the delay and increases the speed of the entire Vedic architecture.
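To make the modular construction concrete, the following Python sketch models a 2x2 binary Vedic block at gate level and composes larger multipliers from four half-size blocks. It is a behavioural illustration under stated assumptions, not the authors' HDL: the names `vedic2` and `vedic` are hypothetical, and the partial products are combined with Python's `+` rather than with explicit carry select adders.

```python
def vedic2(a, b):
    """2x2 Urdhva Tiryakbhyam block: four AND gates and two half adders."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1

    q0 = a0 & b0                    # vertical product of the LSBs
    x1, x2 = a1 & b0, a0 & b1       # the two crosswise products
    q1 = x1 ^ x2                    # first half adder: sum
    c1 = x1 & x2                    # first half adder: carry
    v = a1 & b1                     # vertical product of the MSBs
    q2 = v ^ c1                     # second half adder: sum
    q3 = v & c1                     # second half adder: carry
    return (q3 << 3) | (q2 << 2) | (q1 << 1) | q0


def vedic(a, b, n):
    """n x n multiplier (n a power of two, n >= 2) built from four n/2 x n/2 blocks."""
    if n == 2:
        return vedic2(a, b)
    half = n // 2
    mask = (1 << half) - 1
    al, ah = a & mask, a >> half    # low and high halves of the multiplicand
    bl, bh = b & mask, b >> half    # low and high halves of the multiplier
    ll = vedic(al, bl, half)        # vertical: low x low
    lh = vedic(al, bh, half)        # crosswise
    hl = vedic(ah, bl, half)        # crosswise
    hh = vedic(ah, bh, half)        # vertical: high x high
    # Adder stage (modelled with integer addition in this sketch).
    return ll + ((lh + hl) << half) + (hh << n)


print(vedic(13, 12, 4))             # 4x4 multiplier built from four 2x2 blocks
print(vedic(0xFFFF, 0xFFFF, 16))    # 16x16 multiplier built recursively
```

The same four-block pattern repeats at every level, which is why only the 2x2 block needs a gate-level description.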
[Block diagram: four 8x8 Vedic multiplier (VM) blocks fed by the operand halves a15-a8, a7-a0, b15-b8 and b7-b0, followed by two carry select adders that produce Q31-Q16 and Q15-Q8; Q7-Q0 is taken directly from the low 8x8 VM block.]

Figure 1. Proposed architecture of 16x16 bit Vedic multiplier

The proposed 16x16 bit Vedic multiplier is structured using four 8x8 Vedic modules and two carry select adders. The carry select adder speeds up the addition of the partial products, because a carry select adder has a shorter carry propagation delay than a ripple carry adder. The 16 bit multiplicand A is decomposed into a pair of 8 bit halves AH-AL, and the multiplier B is similarly decomposed into BH-BL. The least significant 8 bits of the two operands, (a7-a0) and (b7-b0), are multiplied vertically, and the low 8 bits of this product give (Q7-Q0). The right-most carry select adder adds three inputs taken from the three right-most 8x8 Vedic multipliers. The LSBs of the right-most carry select adder are retained as the product bits (Q15-Q8), and the MSBs are fed to the left-most carry select adder, where they are added to the output of the left-most 8x8 Vedic multiplier to give the upper 16 bits of the product (Q31-Q16). The outputs of the 8x8 bit multipliers are thus added according to the Urdhva Tiryakbhyam sutra to obtain the 32 bit final product. Using two carry select adders in the final stage gives a considerable improvement in speed.
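The datapath described above can be checked with a small behavioural model in Python. Several details here are assumptions made only for the sketch: the function names are hypothetical, the carry select adder uses 4-bit blocks, the three-operand addition of the right-most adder is modelled as two chained two-operand passes, and the 8x8 blocks are stood in for by Python's `*`.

```python
def carry_select_add(a, b, width, block=4):
    """Add two unsigned values block by block, selecting precomputed sums by carry."""
    mask = (1 << block) - 1
    result, carry = 0, 0
    for shift in range(0, width, block):
        ab = (a >> shift) & mask
        bb = (b >> shift) & mask
        t0 = ab + bb                # block sum assuming carry-in = 0
        t1 = ab + bb + 1            # block sum assuming carry-in = 1
        t = t1 if carry else t0     # multiplexer driven by the incoming carry
        result |= (t & mask) << shift
        carry = t >> block
    return result, carry


def vedic16(a, b):
    """16x16 product from four 8x8 partial products and the two adder stages."""
    al, ah = a & 0xFF, a >> 8
    bl, bh = b & 0xFF, b >> 8
    ll, lh, hl, hh = al * bl, al * bh, ah * bl, ah * bh   # stand-ins for 8x8 VM blocks

    q_low = ll & 0xFF                                     # Q7-Q0
    # Right-most adder: upper byte of ll plus the two crosswise products.
    mid, _ = carry_select_add(ll >> 8, lh, 20)
    mid, _ = carry_select_add(mid, hl, 20)
    q_mid = mid & 0xFF                                    # Q15-Q8
    # Left-most adder: remaining upper bits plus the high x high product.
    q_high, _ = carry_select_add(hh, mid >> 8, 16)        # Q31-Q16
    return (q_high << 16) | (q_mid << 8) | q_low


print(hex(vedic16(0xFFFF, 0xFFFF)))
```

Each block of the adder computes both candidate sums in parallel, so the carry only has to ripple through the multiplexers rather than through every full adder.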


IV. PROPOSED VEDIC MULTIPLIER IN PARALLEL FIR ARCHITECTURE

FAST FIR ALGORITHM (FFA):

The speed of a Finite Impulse Response (FIR) filter can be improved by exploiting the symmetry of its coefficients. The main idea is to manipulate the polyphase decomposition so that as many sub filter blocks as possible contain symmetric coefficients; within such a block, half the number of multipliers can serve the multiplications of all the taps. For a set of symmetric coefficients of odd length N with (N mod 3) equal to zero, the decomposition in equations (1) to (3) yields two more sub filter blocks containing symmetric coefficients. The implementation of the three-parallel FIR filter [11], modified for a 33-tap FIR filter, is shown in Figure 2.

$$Y_0 = H_0X_0 + z^{-3}\Big\{(H_1+H_2)(X_1+X_2) - H_1X_1 - \Big((H_0+H_2)(X_0+X_2) - H_0X_0 - \tfrac{1}{2}\big[(H_0+H_2)(X_0+X_2) - (H_0-H_2)(X_0-X_2)\big]\Big)\Big\} \tag{1}$$

$$Y_1 = (H_0+H_1+H_2)(X_0+X_1+X_2) - (H_1+H_2)(X_1+X_2) - (H_0+H_2)(X_0+X_2) + \Big\{(H_0+H_2)(X_0+X_2) - \tfrac{1}{2}\big[(H_0+H_2)(X_0+X_2) - (H_0-H_2)(X_0-X_2)\big] - H_0X_0\Big\} + z^{-3}\Big\{(H_0+H_2)(X_0+X_2) - \tfrac{1}{2}\big[(H_0+H_2)(X_0+X_2) - (H_0-H_2)(X_0-X_2)\big] - H_0X_0\Big\} \tag{2}$$

$$Y_2 = H_1X_1 + \tfrac{1}{2}\big[(H_0+H_2)(X_0+X_2) - (H_0-H_2)(X_0-X_2)\big] \tag{3}$$

Example 1: Consider a 33-tap FIR filter with a set of symmetric coefficients as follows:

{h(0), h(1), h(2), h(3), h(4), h(5), h(6), h(7), h(8), h(9), . . . , h(32)}

where h(0) = h(32), h(1) = h(31), h(2) = h(30), h(3) = h(29), h(4) = h(28), h(5) = h(27), . . . , h(12) = h(20).

The symmetric coefficients for the 33-tap filter can be applied as shown in Figure 2. The sub filter block H0 - H2 has the coefficients

H0 - H2 = {h(0) - h(2), h(3) - h(5), h(6) - h(8), . . . , h(18) - h(20), h(21) - h(23), h(24) - h(26)}

where
h(0) - h(2) = h(32) - h(30),
h(3) - h(5) = h(29) - h(27),
h(6) - h(8) = h(26) - h(24),
h(9) - h(11) = h(23) - h(21).

Each of these identities follows directly from h(k) = h(32 - k).

[Block diagram: inputs X0, X1, X2; sub filter blocks H0, H1, H0+H2, H0-H2, H1+H2 and H0+H1+H2; outputs Y0, Y1, Y2.]

Figure 2. Three parallel FIR filter implemented using symmetric coefficients in odd length (N mod 3 = 0)
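The three-parallel decomposition built from the six sub filter blocks H0, H1, H0+H2, H0-H2, H1+H2 and H0+H1+H2 can be checked numerically against a direct convolution. The Python sketch below is an assumption-laden illustration, not the authors' implementation: the helper names are hypothetical, the polyphase components are plain lists sampled at one third of the rate so that a delay of z^-3 becomes a one-sample shift, and the test coefficients are invented symmetric integers.

```python
def conv(a, b):
    """Direct polynomial (FIR) product of two coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def add(a, b):
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) for i in range(n)]

def sub(a, b):
    return add(a, [-v for v in b])

def delay(a):
    """z^-3 at the full rate is a one-sample delay of a polyphase component."""
    return [0] + a

def ffa3(h, x):
    """Three-parallel FIR output using six sub filter products."""
    H0, H1, H2 = h[0::3], h[1::3], h[2::3]
    X0, X1, X2 = x[0::3], x[1::3], x[2::3]
    p0 = conv(H0, X0)
    p1 = conv(H1, X1)
    ps = conv(add(H0, H2), add(X0, X2))
    pd = conv(sub(H0, H2), sub(X0, X2))
    p12 = conv(add(H1, H2), add(X1, X2))
    p012 = conv(add(add(H0, H1), H2), add(add(X0, X1), X2))

    cross = [v // 2 for v in sub(ps, pd)]      # H0X2 + H2X0 (difference is always even)
    h2x2 = sub(sub(ps, cross), p0)             # H2X2 recovered without a seventh block
    y0 = add(p0, delay(sub(sub(p12, p1), h2x2)))
    y1 = add(add(sub(sub(p012, p12), ps), h2x2), delay(h2x2))
    y2 = add(p1, cross)

    n = max(len(y0), len(y1), len(y2))
    y0, y1, y2 = (v + [0] * (n - len(v)) for v in (y0, y1, y2))
    return [s for triple in zip(y0, y1, y2) for s in triple]   # y[3n+k] = Yk[n]

# Hypothetical 33-tap symmetric filter and a short integer input.
h = [min(k, 32 - k) + 1 for k in range(33)]
x = list(range(1, 10))
y = ffa3(h, x)
print(y[:len(h) + len(x) - 1] == conv(h, x))
```

Only six sub filter products are formed; the term H2X2 is recovered from the H0+H2 and H0-H2 blocks instead of being computed by a separate block.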


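Design of H1 sub filter block:

[Block diagram values]
Input: X1 = 2
Coefficients: h1 = 2, h4 = 5, h7 = 8, h10 = 11, h13 = 14, h16 = 17, h19 = 14, h22 = 11, h25 = 8, h28 = 5, h31 = 2
Multiplier outputs: m1 = 4, m2 = 10, m3 = 16, m4 = 22, m5 = 28, m6 = 34, m7 = 28, m8 = 22, m9 = 16, m10 = 10, m11 = 4
Adder outputs: a1 = 14, a2 = 30, a3 = 52, a4 = 80, a5 = 114, a6 = 142, a7 = 164, a8 = 180, a9 = 190, a10 = 194

Figure 3. Internal structure of filter blocks H1.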
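The values in Figure 3 can be reproduced with a short Python sketch. It assumes, purely for illustration, that the same input X1 = 2 is present at every tap, so the delay elements play no role; the variable names are hypothetical. The second half of the sketch shows how coefficient symmetry lets one multiplier serve two taps.

```python
from itertools import accumulate

x1 = 2
h1 = [2, 5, 8, 11, 14, 17, 14, 11, 8, 5, 2]   # h1, h4, ..., h31 from Figure 3

# One multiplier per tap, then a chain of ten adders.
m = [x1 * c for c in h1]                      # multiplier outputs m1..m11
a = list(accumulate(m))[1:]                   # adder outputs a1..a10
print(m)
print(a)

# Symmetric form: pre-add the two inputs that share a coefficient,
# so 6 multiplications replace 11 (same constant-input assumption).
half = len(h1) // 2
folded = sum(h1[k] * (x1 + x1) for k in range(half)) + h1[half] * x1
print(folded == a[-1])
```

The final adder output and the folded result agree, which is the property that allows half the multipliers in a symmetric sub filter block to be removed.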

The symmetric parallel FIR filter is shown in Figure 2. The three parallel FIR filter consists of sub filter blocks. The inputs to the system are X0, X1 and X2, and the responses of the system are Y0, Y1 and Y2. Let X0 = 5, X1 = 2, X2 = 3. The filter block H1 with its mod 3 coefficients is shown in Figure 3. It requires 11 multipliers and 10 adders with 10 delay elements. Similarly, each of the filter blocks H0, H0+H1+H2, H0+H2, H0-H2 and H1+H2, with its mod 3 coefficients, requires 11 multipliers and 10 adders with 10 delay elements. The other sub filter blocks can be drawn with a similar approach, and Y0, Y1 and Y2 can then be calculated. The proposed high speed Vedic multiplier is used in the parallel FIR architecture. The proposed technique improves the speed and the area utilization of the FIR filter compared with the traditional Vedic multiplier.

V. RESULTS AND DISCUSSION

TABLE I: COMPARISON RESULTS OF EXISTING AND PROPOSED VEDIC MULTIPLIER (XC3S400 -5 PQ208)

Parameter                | Existing Vedic multiplier [10] | Proposed Vedic multiplier
Delay                    | 58.24 ns                       | 40.83 ns
Number of slices         | 461 out of 3584 (12%)          | 445 out of 3584 (12%)
Number of 4-input LUTs   | 808 out of 7168 (11%)          | 777 out of 7168 (10%)
Number of bonded IOBs    | 64 out of 141 (45%)            | 64 out of 141 (45%)

The proposed 16x16 Vedic multiplier module has a delay of 40.843 ns with a device utilization of 12%, against 58.24 ns with a device utilization of 12% for the existing 16x16 Vedic multiplier module. The proposed Vedic multiplier is therefore faster than the existing multiplier.

TABLE II: COMPARISON RESULTS OF PARALLEL FIR ARCHITECTURE USING EXISTING AND PROPOSED 16X16 BIT VEDIC MULTIPLIER (XC3S400 -5 PQ208)

Parameter                  | Parallel FIR with existing Vedic multiplier [10] | Parallel FIR with proposed Vedic multiplier
Delay                      | 73.682 ns                                        | 58.924 ns
Number of slices           | 3582 out of 3584 (99%)                           | 3517 out of 3584 (98%)
Number of slice flip flops | 1024 out of 7168 (14%)                           | 1024 out of 7168 (14%)
Number of 4-input LUTs     | 6693 out of 7168 (93%)                           | 6382 out of 7168 (89%)
Number of bonded IOBs      | 146 out of 141 (103%)                            | 146 out of 141 (103%)

The parallel FIR architecture using the proposed 16x16 bit Vedic multiplier module has a delay of 58.924 ns and uses 3517 out of 3584 slices and 6382 out of 7168 4-input LUTs, against 73.682 ns and 3582 out of 3584 slices with the existing multiplier. The proposed Vedic multiplier applied to the parallel FIR architecture therefore has lower delay and area. These results show a considerable speed improvement, which suggests that the proposed multiplier can also give good results in other applications.
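The delay figures in Tables I and II can be turned into relative reductions with a short calculation. The Python sketch below only restates the tabulated delays; the function name `reduction_percent` is hypothetical, and the percentages are derived here rather than reported by the authors.

```python
def reduction_percent(existing_ns, proposed_ns):
    """Relative delay reduction of the proposed design, in percent."""
    return 100.0 * (existing_ns - proposed_ns) / existing_ns

# Delay values taken from Table I and Table II.
multiplier = reduction_percent(58.24, 40.83)      # 16x16 Vedic multiplier alone
fir_filter = reduction_percent(73.682, 58.924)    # parallel FIR architecture

print(f"Multiplier delay reduction: {multiplier:.1f}%")
print(f"Parallel FIR delay reduction: {fir_filter:.1f}%")
```

On these figures the delay falls by roughly 30% for the multiplier alone and by roughly 20% for the complete parallel FIR architecture.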

A. RTL SCHEMATICS:

a. RTL view of proposed 16x16 bit Vedic multiplier

The RTL view of the proposed Vedic multiplier is shown in Figure 4.

Figure 4. Hardware implementation of proposed Vedic multiplier.

b. RTL view of parallel FIR architecture using proposed 16x16 bit Vedic multiplier

The RTL view of the parallel FIR filter using the proposed Vedic multiplier is shown in Figure 5.

Figure 5. Hardware implementation of parallel FIR filters using proposed Vedic multiplier.

VI. CONCLUSION

The proposed 16x16 Vedic multiplier architecture has been designed and synthesized on a Spartan 3 XC3S400 board and used in a parallel FIR filter design. The proposed Vedic multiplier with carry select adders is compared with the existing Vedic multiplier, and the results show that the proposed multiplier is faster. In future, the performance of the proposed multiplier can be improved by high level pipelining and applied in signal processing applications such as image processing and video processing.

REFERENCES

[1] Laxman P. Thakre, Suresh Balpande, Umesh Akare, Sudhir Lande, "Performance Evaluation and Synthesis of Multiplier used in FFT operation using Conventional and Vedic algorithms", Third International Conference on Emerging Trends in Engineering and Technology, IEEE, 2010.
[2] S. S. Kerur, Prakash Narchi, Jayashree C N, Harish M Kittur and Girish V. A., "Implementation of Vedic Multiplier for Digital Signal Processing", International Conference on VLSI, Communication & Instrumentation (ICVCI), 2011.
[3] G. Vaithiyanathan, K. Venkatesan, S. Sivaramakrishnan, S. Siva and S. Jayakumar, "Simulation and implementation of Vedic multiplier using VHDL code", International Journal of Scientific & Engineering Research, vol. 4, 2013.
[4] Pushpalata Verma and K. K. Mehta, "Implementation of an Efficient Multiplier based on Vedic Mathematics Using EDA Tool", International Journal of Engineering and Advanced Technology (IJEAT), vol. 1, June 2012.
[5] Nivedita A. Pande, Vaishali Niranjane, Anagha V. Choudhari, "Vedic Mathematics for Fast Multiplication in DSP", International Journal of Engineering and Innovative Technology (IJEIT), vol. 2, 2013.
[6] Krishnaveni D. and Umarani T. G., "VLSI implementation of Vedic multiplier with reduced delay", International Journal of Scientific & Engineering Research, vol. 2, May 2011.
[7] R. Naresh Naik, P. Siva Nagendra Reddy, K. Madan Mohan, "Design of Vedic Multiplier for Digital Signal Processing Applications", International Journal of Engineering Trends and Technology (IJETT), vol. 4, 2013.
[8] R. Priya and J. Senthil Kumar, "Implementation and comparison of Vedic Multiplier using Area Efficient CSLA Architectures", International Journal of Computer Applications, vol. 73, July 2013.
[9] Arushi Somani, Dheeraj Jain, Sanjay Jaiswal, Kumkum Verma and Swati Kasht, "Compare Vedic multipliers with Conventional Hierarchical array of array multipliers", International Journal of Computer Technology and Electronics Engineering (IJCTEE), vol. 2, 2012.
[10] Manoranjan Pradhan, Rutuparna Panda and Sushanta Kumar Sahu, "Speed Comparison Of 16x16 Vedic Multipliers", International Journal of Computer Applications, vol. 21, May 2011.
[11] Yu-Chi Tsao and Ken Choi, "Area-Efficient VLSI Implementation for Parallel Linear-Phase FIR Digital Filters of Odd Length Based on Fast FIR Algorithm", IEEE Transactions on Circuits and Systems, vol. 59, June 2012.
