Fpga Implementation of High Speed Vedic Multiplier Using Csla For Parallel Fir Architecture

2014 2nd International Conference on Devices, Circuits and Systems (ICDCS)
FPGA IMPLEMENTATION OF HIGH SPEED VEDIC MULTIPLIER USING CSLA FOR PARALLEL FIR ARCHITECTURE
Amina Naaz S (1), Mr. Pradeep M.N (2), Satish Bhairannawar (3), Srinivas Halvi (4)
(1, 2, 3) Department of Electronics & Communication, Dayanandasagar College of Engineering, Bangalore, India
(4) Department of Medical Electronics, Dayanandasagar College of Engineering, Bangalore, India
(1) aminakhadeerulla@gmail.com, (2) pradec023@gmail.com
Abstract- In today's world, a great deal of research is being carried out in communication and signal processing applications. Every application demands higher-throughput arithmetic operations. One of the key arithmetic operations is multiplication, which takes the maximum execution time. The development of efficient multipliers has been a subject of interest for decades, so there is a need for an efficient multiplier that achieves higher performance in real-time signal processing applications. This paper presents a modular design of a Vedic multiplier using a carry select adder (CSLA). The delay of the proposed multiplier is reduced owing to the high-speed carry select adder. The proposed multiplier is applied to a parallel FIR filter. The combinational delay of the proposed multiplier is observed to be lower than that of the existing architecture.
Keywords- Urdhva Tiryakbhyam, FIR (FINITE IMPULSE RESPONSE), Vedic Mathematics, Parallel FIR Architecture, FFA (FAST FIR ALGORITHM).

I. INTRODUCTION

In many DSP algorithms, the multiplier lies in the critical delay path and ultimately determines the performance of the algorithm. The speed of the multiplication operation is of great importance in DSP as well as in general-purpose processors. In the past, multiplication was generally implemented with a sequence of addition, subtraction and shift operations. Many algorithms have been proposed in the literature to perform multiplication, each offering different advantages and trade-offs in terms of speed, circuit complexity, area and power consumption. The multiplier is a fairly large block of a computing system. The amount of circuitry involved is proportional to the square of its resolution, i.e. a multiplier of size n bits has on the order of n² gates. For multiplication algorithms performed in DSP applications, latency and throughput are the two major concerns from the delay perspective. Latency is the real delay of computing a function, a measure of how long after the inputs to a device are stable the final result is available on the outputs; throughput is a measure of how many multiplications can be performed in a given period of time.

Digital multipliers are the core components of all digital signal processors (DSPs), and the speed of a DSP is largely determined by the speed of its multipliers. The most common multiplication algorithms used in digital hardware are the array multiplication algorithm and the Booth multiplication algorithm. The computation time of the array multiplier is comparatively small because the partial products are calculated independently in parallel. The delay associated with the array multiplier is the time taken by the signals to propagate through the gates that form the multiplication array. Booth multiplication is another important multiplication algorithm. Large Booth arrays are required for high-speed multiplication and exponential operations, which in turn require large partial sum and partial carry registers. Multiplication of two n-bit operands using a radix-4 Booth recoding multiplier requires approximately n/(2m) clock cycles to generate the least significant half of the final product, where m is the number of Booth recoder adder stages. Thus, a large propagation delay is associated with this case. Owing to the importance of digital multipliers in DSP, they have always been an active area of research, and a number of interesting multiplication algorithms have been reported in the literature.

Vedic mathematics was rediscovered in the early twentieth century from ancient Indian scriptures (Vedas). Conventional mathematical algorithms can be simplified and even optimized by the use of Vedic mathematics. The Vedic algorithms can be applied to arithmetic, trigonometry, plane and spherical geometry, and calculus. One of the main purposes of Vedic mathematics is to transform tedious calculations into simpler, orally manageable operations. Vedic mathematics provides more than one method for basic operations such as multiplication and division. For each operation there is at least one generic method, along with some methods directed towards specific cases that simplify the calculations further. Central processing units operate at increasingly high frequencies as transistor sizes shrink. The Arithmetic and Logic Unit (ALU) is one of the most important and critical blocks of the Central Processing Unit, so it is imperative to have a fast and efficient ALU. Division is the most time-consuming of the basic mathematical operations. In computing, functions such as sine and cosine are also frequently required and are implemented in hardware. With these considerations, it is always important to have a fast and efficient mechanism to implement mathematical functions. Vedic mathematics provides algorithms that simplify these computations and is hence well suited to the problem stated; it also consumes less area.
978-1-4799-1356-5/14/$31.00 ©2014 IEEE
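As background for the Booth discussion above, radix-4 (modified) Booth recoding can be sketched in a few lines of Python. This is an illustration only, not part of the proposed design: the function names `booth_radix4_digits` and `booth_multiply` are hypothetical, the multiplier is assumed to be an n-bit two's-complement value with n even, and the textbook recoding rule d_i = -2*y[2i+1] + y[2i] + y[2i-1] is assumed. It shows why only n/2 partial products (each 0, ±x or ±2x) are produced; if m of them are accumulated per clock cycle, roughly n/(2m) cycles are needed.

```python
def booth_radix4_digits(y, n=16):
    """Radix-4 Booth recoding of an n-bit two's-complement multiplier y (n even).

    Each overlapping bit triplet (y[2i+1], y[2i], y[2i-1]) becomes a digit
    d_i = -2*y[2i+1] + y[2i] + y[2i-1] in {-2, -1, 0, 1, 2}, with y[-1] = 0.
    """
    if n % 2:
        raise ValueError("n must be even")
    # Python's >> on a negative int yields its two's-complement bits.
    bits = [(y >> i) & 1 for i in range(n)]
    digits = []
    prev = 0  # the implicit bit y[-1]
    for i in range(0, n, 2):
        digits.append(-2 * bits[i + 1] + bits[i] + prev)
        prev = bits[i + 1]
    return digits


def booth_multiply(x, y, n=16):
    """Signed product x*y as the sum of n/2 partial products d_i * x shifted by 2i bits."""
    partial_products = [d * x for d in booth_radix4_digits(y, n)]  # each 0, +-x or +-2x
    return sum(pp << (2 * i) for i, pp in enumerate(partial_products))


print(booth_radix4_digits(7, 4))     # [-1, 2]  since -1 + 2*4 = 7
print(booth_multiply(-1234, 5678))   # -7006652
```

A 16-bit multiplier therefore yields 8 recoded digits instead of 16 bit-level partial products.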
Organization: The paper is organized as follows. Section II reviews related work. Section III describes the architecture of the proposed Vedic multiplier. Section IV presents an application of the proposed Vedic multiplier in a parallel FIR filter. Section V contains results and discussion. Section VI presents conclusions followed by future work.
II. RELATED WORK

Laxman P. Thakre et al. [1] applied the URDHVA TIRYAKBHYAM and NIKHILAM sutras to multiplication. The conventional multiplier was synthesized, simulated and compared with the Vedic multiplier individually and was also tested in FFT operations. The comparison revealed that the conventional multiplier consumes 80% more hardware, memory and area than the Vedic multiplier. S. S. Kerur et al. [2] noted that fast multiplication is required to compute convolution, FFT, etc., and performed multiplication using the URDHVA TIRYAKBHYAM algorithm of ancient Vedic mathematics. The combinational delay obtained was compared with that of a normal multiplier, and the Vedic multiplier was further used in matrix multiplication, where it proved efficient in terms of speed. G. Vaithiyanathan et al. [3] showed that a Vedic multiplier implemented using the URDHVA TIRYAKBHYAM sutra consumes less power than an array multiplier. Pushpalata Verma and K. K. Mehta [4] proposed an 8-bit Vedic multiplier architecture using ripple carry adders; compared with the conventional method of multiplication, the proposed architecture was found to be faster. Nivedita A. Pande et al. [5] proposed a design methodology for high-speed multiplication in which two n-bit integers are multiplied to produce a 2n-bit product; their method is efficient and simple, and lower-level multipliers can be used to design higher-level multipliers. Krishnaveni D and Umarani [6] designed a 4x4 binary multiplier using the URDHVA TIRYAKBHYAM sutra and proposed a new 4-bit adder that reduces the delay of the multiplier. R. Naresh Naik et al. [7] showed that floating-point multipliers implemented using the URDHVA TIRYAKBHYAM sutra offer a considerable improvement in speed compared with the normal Booth multiplier. R. Priya and J. Senthil Kumar [8] applied a modified CSLA to the Vedic multiplier, which reduces the gate count compared with the regular CSLA. Arushi Somani et al. [9] compared the Vedic multiplier with the conventional method in terms of power, delay, space, speed, power-delay product and energy-delay product.
III. PROPOSED ARCHITECTURE OF 16X16 VEDIC MULTIPLIER
The multiplication of two 2-digit numbers using Urdhva Tiryakbhyam is explained below for 13 × 12. The least significant digit 3 of the multiplicand is multiplied vertically by the least significant digit 2 of the multiplier; their product 6 is set down as the least significant part of the answer. Next, 2 and 1, and 3 and 1, are multiplied crosswise and the two products are added; their sum 5 is set down as the middle part of the answer. Finally, 1 and 1 are multiplied vertically, and their product 1 is set down as the leftmost part of the answer. So 13 × 12 = 156.
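The vertical-and-crosswise steps above generalize to any number of digits: column k of the answer collects every digit product whose digit positions add up to k, and carries are propagated from right to left afterwards. The following is a minimal Python sketch of that idea for non-negative integers; the function name `urdhva_multiply` is illustrative only.

```python
def urdhva_multiply(x, y):
    """Multiply non-negative integers digit-wise, 'vertically and crosswise'."""
    xd = [int(d) for d in reversed(str(x))]   # least significant digit first
    yd = [int(d) for d in reversed(str(y))]
    # Column k collects every digit product x_i * y_j with i + j == k:
    # k = 0 is the rightmost vertical product, middle columns are crosswise sums.
    columns = [0] * (len(xd) + len(yd) - 1)
    for i, a in enumerate(xd):
        for j, b in enumerate(yd):
            columns[i + j] += a * b
    # Propagate carries from right to left, as in the pencil-and-paper method.
    result, carry = 0, 0
    for k, column in enumerate(columns):
        total = column + carry
        result += (total % 10) * 10 ** k
        carry = total // 10
    return result + carry * 10 ** len(columns)


print(urdhva_multiply(13, 12))  # 156 (columns 6, 5, 1 with no carries)
```

For 13 × 12 the columns are exactly the 6, 5 and 1 of the worked example; for operands such as 99 × 99 the column sums exceed 9 and the carry loop does the remaining work.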
The Urdhva Tiryakbhyam algorithm can be implemented for the binary number system in the same way as for decimal numbers. The 2x2 Vedic multiplier module is then used to implement higher-level multipliers (4x4, 8x8 and 16x16 multipliers). The proposed 16x16 Urdhva Tiryakbhyam multiplier, built from 8x8 Vedic modules (which are in turn designed from 4x4 and 2x2 modules) and high-speed carry select adders, is shown in Figure 1. The proposed method was compared with the existing architecture [10], which discussed a modular design of the Vedic Urdhva Tiryakbhyam method using a carry save adder and a 17-bit ripple carry adder. In our proposed method these adders are replaced with high-speed carry select adders, which reduces the delay and increases the speed of the entire Vedic architecture.
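A carry select adder trades area for speed: every block of bits is added twice in parallel, once assuming a carry-in of 0 and once assuming 1, and the true carry from the previous block only drives a row of multiplexers. The bit-level Python model below is a sketch of that principle, not the authors' HDL; the 4-bit block size, the helper names and the use of ripple-carry sub-adders inside each block are assumptions.

```python
def ripple_add(a_bits, b_bits, cin):
    """Ripple-carry add two little-endian bit lists; returns (sum_bits, carry_out)."""
    sum_bits, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sum_bits, carry


def to_bits(value, width):
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


def carry_select_add(a, b, width=16, block=4, cin=0):
    """Carry select addition of two unsigned width-bit integers; returns (sum, carry_out)."""
    a_bits, b_bits = to_bits(a, width), to_bits(b, width)
    result, carry = [], cin
    for lo in range(0, width, block):
        hi = min(lo + block, width)
        # Both candidate results are computed without waiting for the incoming carry ...
        sum0, cout0 = ripple_add(a_bits[lo:hi], b_bits[lo:hi], 0)
        sum1, cout1 = ripple_add(a_bits[lo:hi], b_bits[lo:hi], 1)
        # ... and the actual carry from the previous block just selects one (multiplexer).
        result += sum1 if carry else sum0
        carry = cout1 if carry else cout0
    return from_bits(result), carry


print(carry_select_add(0xFFFF, 1))  # (0, 1)
```

In hardware the two candidate sums of every block are ready at the same time, so the critical path is one block addition plus a chain of multiplexers rather than a full ripple through all bits.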
[Block diagram: four 8x8 Vedic multiplier (VM) modules fed by a15-a8 / a7-a0 and b15-b8 / b7-b0, two carry select adders, outputs Q31-Q16, Q15-Q8 and Q7-Q0]
Figure 1. Proposed architecture of 16x16 bit Vedic multiplier
The proposed 16x16 bit Vedic multiplier is structured using four 8x8 Vedic modules and two carry select adders. The carry select adders speed up the addition of the partial products, since a carry select adder has a shorter carry propagation delay than a ripple carry adder. The 16-bit multiplicand A is decomposed into a pair of 8-bit halves AH-AL; similarly, the multiplier B is decomposed into BH-BL. The least significant 8 bits of the two operands, (a7-a0) and (b7-b0), are multiplied vertically, and the lower 8 bits of this product give the product bits Q7-Q0 directly. The rightmost carry select adder adds three inputs from the three rightmost 8x8 Vedic multipliers: the upper 8 bits (15-8) of the (a7-a0) × (b7-b0) product and the two 16-bit crosswise products. The 8 LSBs of the rightmost carry select adder output are retained as the product bits Q15-Q8, and its MSBs are fed to the leftmost carry select adder, where they are added to the output of the leftmost 8x8 Vedic multiplier to give the upper 16 bits of the product (Q31-Q16). The outputs of the 8x8 bit multipliers are thus added according to the Urdhva Tiryakbhyam sutra to obtain the 32-bit final product. Because two carry select adders are used in the final stage, there is a considerable improvement in speed.
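The decomposition described above can be checked behaviourally. In the Python sketch below (hypothetical function names; Python integer addition stands in for the carry select adders, so it models the data flow of Figure 1, not its gate-level timing), a 2x2 Urdhva cell built from AND gates and two half adders is the base case, and each n x n level is assembled from four (n/2) x (n/2) products in the way just described: the low half of AL × BL goes straight to the output, the first adder sums the three right-hand operands, and the second adder adds the carries out of the first to AH × BH.

```python
def vedic_2x2(a, b):
    """2x2 Urdhva Tiryakbhyam cell: AND gates plus two half adders."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    q0 = a0 & b0                          # vertical: a0 * b0
    cross_1, cross_2 = a1 & b0, a0 & b1   # crosswise products
    q1 = cross_1 ^ cross_2                # half adder 1: sum
    c1 = cross_1 & cross_2                # half adder 1: carry
    top = a1 & b1                         # vertical: a1 * b1
    q2 = top ^ c1                         # half adder 2: sum
    q3 = top & c1                         # half adder 2: carry
    return q0 | (q1 << 1) | (q2 << 2) | (q3 << 3)


def vedic_mul(a, b, n=16):
    """Unsigned n x n Vedic multiplier (n a power of two, a and b below 2**n),
    built from four (n/2) x (n/2) modules. Python's + stands in for the carry
    select adders, so only the data flow is modelled, not the timing."""
    if n == 2:
        return vedic_2x2(a, b)
    h = n // 2
    mask = (1 << h) - 1
    al, ah = a & mask, a >> h
    bl, bh = b & mask, b >> h
    ll = vedic_mul(al, bl, h)             # AL x BL (rightmost module)
    lh = vedic_mul(al, bh, h)             # AL x BH (crosswise)
    hl = vedic_mul(ah, bl, h)             # AH x BL (crosswise)
    hh = vedic_mul(ah, bh, h)             # AH x BH (leftmost module)
    q_low = ll & mask                     # Q[h-1:0] comes straight from AL x BL
    first_sum = (ll >> h) + lh + hl       # rightmost adder: three operands
    q_mid = first_sum & mask              # its LSBs give Q[2h-1:h]
    q_high = hh + (first_sum >> h)        # leftmost adder: AH x BH + MSBs of the first sum
    return q_low | (q_mid << h) | (q_high << (2 * h))


print(vedic_mul(13, 12, n=4))          # 156
print(hex(vedic_mul(0xFFFF, 0xFFFF)))  # 0xfffe0001
```

With n = 16 the recursion reproduces the 16 → 8 → 4 → 2 module hierarchy of the proposed design.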
IV. PROPOSED VEDIC MULTIPLIER IN PARALLEL FIR ARCHITECTURE
FAST FIR ALGORITHM (FFA):
The speed of a Finite Impulse Response (FIR) filter can be improved by exploiting the symmetry of its coefficients. The main idea is to manipulate the polyphase decomposition to obtain as many subfilter blocks containing symmetric coefficients as possible, so that half the number of multipliers within a single subfilter block suffices for the multiplications of all its taps. For a set of symmetric coefficients of odd length N, when (N mod 3) equals zero as shown in equation 5, two more subfilter blocks containing symmetric coefficients can be obtained. The implementation of the three-parallel FIR filter [11], modified for a 33-tap FIR filter, is shown in Figure 2. Its outputs are

$$Y_0 = H_0X_0 + z^{-3}\Big\{(H_1+H_2)(X_1+X_2) - H_1X_1 - \Big((H_0+H_2)(X_0+X_2) - H_0X_0 - \tfrac{1}{2}\big[(H_0+H_2)(X_0+X_2) - (H_0-H_2)(X_0-X_2)\big]\Big)\Big\} \qquad (1)$$

$$Y_1 = (H_0+H_1+H_2)(X_0+X_1+X_2) - (H_1+H_2)(X_1+X_2) - (H_0+H_2)(X_0+X_2) + \big(1+z^{-3}\big)\Big\{(H_0+H_2)(X_0+X_2) - \tfrac{1}{2}\big[(H_0+H_2)(X_0+X_2) - (H_0-H_2)(X_0-X_2)\big] - H_0X_0\Big\} \qquad (2)$$

$$Y_2 = H_1X_1 + \tfrac{1}{2}\big[(H_0+H_2)(X_0+X_2) - (H_0-H_2)(X_0-X_2)\big] \qquad (3)$$

where H0, H1, H2 and X0, X1, X2 are the three polyphase components of the filter and of the input, and z^-3 denotes a delay of one block (three input samples).

Example 1: Consider a 33-tap FIR filter with the following set of symmetric coefficients:
{h(0), h(1), h(2), h(3), h(4), h(5), h(6), h(7), h(8), h(9), . . . , h(32)}
where
h(0) = h(32), h(1) = h(31), h(2) = h(30), h(3) = h(29), h(4) = h(28), h(5) = h(27), . . . , h(12) = h(20).

The symmetric coefficients of the 33-tap filter can be applied as shown in Figure 2:
H0 - H2 = {h(0) - h(2), h(3) - h(5), h(6) - h(8), . . . , h(18) - h(20), h(21) - h(23), h(24) - h(26)}
where
h(0) - h(2) = h(32) - h(30),
h(3) - h(5) = h(29) - h(27),
h(6) - h(8) = h(26) - h(24),
h(9) - h(11) = h(23) - h(21).

[Block diagram: subfilter blocks H0, H1, H0+H2, H0-H2, H1+H2 and H0+H1+H2 with inputs X0, X1, X2 and outputs Y0, Y1, Y2]
Figure 2. Three-parallel FIR filter implemented using symmetric coefficients of odd length (N mod 3 = 0)
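The three-parallel FFA outputs Y0, Y1 and Y2 of this section can be checked numerically against an ordinary convolution. The Python sketch below is an illustration under stated assumptions, not the authors' implementation: all helper names (`conv`, `lin`, `z3`, `ffa3`, `direct`) are hypothetical, the filter and input lengths are assumed to be multiples of 3, z^-3 is modelled as a one-sample delay at the block rate, and integer halving is exact because (H0+H2)(X0+X2) - (H0-H2)(X0-X2) = 2(H0X2 + H2X0).

```python
import random


def conv(a, b):
    """Linear convolution, i.e. the output of FIR filter a driven by input b."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def vadd(u, v):
    return [p + q for p, q in zip(u, v)]


def vsub(u, v):
    return [p - q for p, q in zip(u, v)]


def fit(v, n):
    """Zero-pad (or truncate) a sequence to exactly n samples."""
    return (list(v) + [0] * n)[:n]


def lin(n, *terms):
    """Linear combination sum(c * v) of (c, v) pairs, each v fitted to n samples."""
    out = [0] * n
    for c, v in terms:
        for k, s in enumerate(fit(v, n)):
            out[k] += c * s
    return out


def z3(v, n):
    """z^-3 at the input rate is a one-sample delay at the block rate."""
    return fit([0] + list(v), n)


def ffa3(h, x):
    """Block outputs (Y0, Y1, Y2) of the three-parallel FFA from six subfilter products."""
    assert len(h) % 3 == 0 and len(x) % 3 == 0
    H0, H1, H2 = h[0::3], h[1::3], h[2::3]
    X0, X1, X2 = x[0::3], x[1::3], x[2::3]
    n = len(H0) + len(X0)
    p0 = conv(H0, X0)                                              # H0 X0
    p1 = conv(H1, X1)                                              # H1 X1
    ps = conv(vadd(H0, H2), vadd(X0, X2))                          # (H0+H2)(X0+X2)
    pd = conv(vsub(H0, H2), vsub(X0, X2))                          # (H0-H2)(X0-X2)
    p12 = conv(vadd(H1, H2), vadd(X1, X2))                         # (H1+H2)(X1+X2)
    p012 = conv(vadd(vadd(H0, H1), H2), vadd(vadd(X0, X1), X2))    # (H0+H1+H2)(X0+X1+X2)
    half = [(s - d) // 2 for s, d in zip(ps, pd)]                  # H0X2 + H2X0 (exact)
    h2x2 = lin(n, (1, ps), (-1, p0), (-1, half))                   # H2 X2
    y0 = lin(n, (1, p0), (1, z3(lin(n, (1, p12), (-1, p1), (-1, h2x2)), n)))  # eq. (1)
    y1 = lin(n, (1, p012), (-1, p12), (-1, ps), (1, h2x2), (1, z3(h2x2, n)))  # eq. (2)
    y2 = lin(n, (1, p1), (1, half))                                           # eq. (3)
    return y0, y1, y2


def direct(h, x):
    """Reference: ordinary convolution split into its three output phases."""
    y = conv(h, x)
    n = len(h) // 3 + len(x) // 3
    return tuple(fit(y[j::3], n) for j in range(3))


if __name__ == "__main__":
    rng = random.Random(1)
    half_taps = [rng.randint(-8, 8) for _ in range(17)]   # h(0) .. h(16)
    h = half_taps + half_taps[-2::-1]                     # h(k) = h(32 - k): 33 taps
    x = [rng.randint(-8, 8) for _ in range(30)]
    print(ffa3(h, x) == direct(h, x))
```

The check passes for any coefficients, symmetric or not; symmetry only matters for how many multipliers each subfilter block needs.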
Design of H1 subfilter block:
[Block diagram, worked example with X1 = 2: coefficients h1 = 2, h4 = 5, h7 = 8, h10 = 11, h13 = 14, h16 = 17, h19 = 14, h22 = 11, h25 = 8, h28 = 5, h31 = 2; multiplier outputs m1 to m11 = 4, 10, 16, 22, 28, 34, 28, 22, 16, 10, 4; adders a1 to a10, final sum a10 = 194]
Figure 3. Internal structure of filter block H1.
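The numbers in the H1 example can be reproduced with a short Python sketch. It assumes, as the multiplier outputs in Figure 3 suggest, that every tap holds the sample value 2; the function names are hypothetical. `direct_form` mirrors the figure (11 multipliers, then an adder chain), while `folded_form` shows how the symmetry h1 = h31, h4 = h28, ... lets mirrored taps be pre-added so that only 6 multiplications are needed.

```python
def direct_form(coeffs, taps):
    """One multiplier per tap followed by an adder chain, as drawn in Figure 3."""
    products = [c * t for c, t in zip(coeffs, taps)]   # m1 .. m11
    total = products[0]
    for p in products[1:]:                             # a1 .. a10
        total += p
    return products, total


def folded_form(coeffs, taps):
    """Exploit coeffs[k] == coeffs[N-1-k]: pre-add mirrored taps, multiply once per pair."""
    n = len(coeffs)
    assert coeffs == coeffs[::-1], "coefficients must be symmetric"
    total, multiplies = 0, 0
    for k in range(n // 2):
        total += coeffs[k] * (taps[k] + taps[n - 1 - k])
        multiplies += 1
    if n % 2:                                          # unpaired centre tap
        total += coeffs[n // 2] * taps[n // 2]
        multiplies += 1
    return total, multiplies


H1 = [2, 5, 8, 11, 14, 17, 14, 11, 8, 5, 2]   # h1, h4, ..., h31 from Figure 3
taps = [2] * len(H1)                           # assumed: every tap holds the value 2

products, total = direct_form(H1, taps)
print(products, total)          # [4, 10, 16, 22, 28, 34, 28, 22, 16, 10, 4] 194
print(folded_form(H1, taps))    # (194, 6)
```

Both forms give the final sum 194 reported in Figure 3; the folded form trades five multipliers for five extra adders.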
The symmetric parallel FIR filter is shown in Figure 2. The three-parallel FIR filter consists of six subfilter blocks. The inputs to the system are represented as X0, X1 and X2, and the responses of the system as Y0, Y1 and Y2. Let X0 = 5, X1 = 2 and X2 = 3. The filter block H1 with its mod-3 coefficients is shown in Figure 3; it requires 11 multipliers, 10 adders and 10 delay elements. Similarly, each of the filter blocks H0, H0+H1+H2, H0+H2, H0-H2 and H1+H2, with its mod-3 coefficients, requires 11 multipliers, 10 adders and 10 delay elements. With a similar approach, the other subfilter blocks can be drawn and Y0, Y1 and Y2 can be calculated. The proposed high-speed Vedic multiplier is used in this parallel FIR architecture. The proposed technique improves the speed and area utilization of the FIR filter compared with using the existing Vedic multiplier.
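Why symmetry produces extra symmetric subfilters can be seen directly for Example 1 (N = 33). Writing the polyphase components as H0[a] = h(3a), H1[a] = h(3a+1), H2[a] = h(3a+2) for a = 0, ..., 10 (an indexing convention assumed here) and using h(k) = h(32 - k):

```latex
\begin{aligned}
H_2[10-a] &= h\big(3(10-a)+2\big) = h(32-3a) = h(3a) = H_0[a],\\
H_1[10-a] &= h\big(3(10-a)+1\big) = h(31-3a) = h\big(32-(3a+1)\big) = h(3a+1) = H_1[a],\\
(H_0 \pm H_2)[10-a] &= H_0[10-a] \pm H_2[10-a] = H_2[a] \pm H_0[a] = \pm\,(H_0 \pm H_2)[a].
\end{aligned}
```

Hence H1, H0 + H2 and H0 + H1 + H2 are symmetric and H0 - H2 is antisymmetric, so each of these blocks can in principle be realized with about half of its 11 multipliers.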
V. RESULTS AND DISCUSSION

TABLE I: COMPARISON RESULTS OF EXISTING AND PROPOSED VEDIC MULTIPLIER (device XC3S400-5 PQ208)
Parameter | Existing Vedic multiplier [10] | Proposed Vedic multiplier
Delay | 58.24 ns | 40.83 ns
Number of slices | 461 out of 3584 (12%) | 445 out of 3584 (12%)
Number of 4-input LUTs | 808 out of 7168 (11%) | 777 out of 7168 (10%)
Number of bonded IOBs | 64 out of 141 (45%) | 64 out of 141 (45%)

It has been observed that the proposed 16x16 Vedic multiplier module has a gate delay of 40.843 ns with a device utilization of 12%, while the existing 16x16 Vedic multiplier module has a gate delay of 58.24 ns with a device utilization of 12%. This clearly shows that the proposed Vedic multiplier is faster than the existing multiplier.

TABLE II: COMPARISON RESULTS OF PARALLEL FIR ARCHITECTURE USING EXISTING AND PROPOSED 16X16 BIT VEDIC MULTIPLIER (device XC3S400-5 PQ208)
Parameter | Parallel FIR architecture with existing Vedic multiplier [10] | Parallel FIR architecture with proposed Vedic multiplier
Delay | 73.682 ns | 58.924 ns
Number of slices | 3582 out of 3584 (99%) | 3517 out of 3584 (98%)
Number of slice flip-flops | 1024 out of 7168 (14%) | 1024 out of 7168 (14%)
Number of 4-input LUTs | 6693 out of 7168 (93%) | 6382 out of 7168 (89%)
Number of bonded IOBs | 146 out of 141 (103%) | 146 out of 141 (103%)

It has been observed that the parallel FIR architecture using the proposed 16x16 bit Vedic multiplier module has a gate delay of 58.924 ns, with a device utilization of 3517 out of 3584 slices and 6382 out of 7168 4-input LUTs, while with the existing multiplier it has a gate delay of 73.68 ns with a device utilization of 3582 out of 3584 slices. This shows that the proposed Vedic multiplier applied to the parallel FIR architecture has less delay and area. The above results clearly show a considerable speed improvement. Therefore, the proposed multiplier can be expected to yield good results when used in other applications as well.
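For reference, the relative delay reductions implied by the delay entries of Tables I and II are:

```latex
\frac{58.24 - 40.83}{58.24} \approx 29.9\,\% \quad \text{(16x16 Vedic multiplier)}, \qquad
\frac{73.682 - 58.924}{73.682} \approx 20.0\,\% \quad \text{(parallel FIR architecture)}
```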
A. RTL SCHEMATICS:
a. RTL view of proposed 16x16 bit Vedic multiplier
Figure 4. Hardware implementation of proposed Vedic multiplier.
The RTL view of the proposed Vedic multiplier is shown in Figure 4.
b. RTL view of parallel FIR architecture using proposed 16x16 bit Vedic multiplier
Figure 5. Hardware implementation of parallel FIR filters using proposed Vedic multiplier.
The RTL view of the parallel FIR filter using the proposed Vedic multiplier is shown in Figure 5.

VI. CONCLUSION
The proposed 16x16 Vedic multiplier architecture has been designed and synthesized on a Spartan-3 XC3S400 FPGA and used in a parallel FIR filter design. The proposed Vedic multiplier with carry select adders was compared with the existing Vedic multiplier, and it can be inferred that the proposed multiplier is faster. In future, the performance of the proposed multiplier can be improved by high-level pipelining, and it can be applied in signal processing applications such as image processing and video processing.

REFERENCES
[1] Laxman P. Thakre, Suresh Balpande, Umesh Akare and Sudhir Lande, "Performance Evaluation and Synthesis of Multiplier used in FFT operation using Conventional and Vedic algorithms," Third International Conference on Emerging Trends in Engineering and Technology, IEEE, 2010.
[2] S. S. Kerur, Prakash Narchi, Jayashree C N, Harish M Kittur and Girish V. A., "Implementation of Vedic Multiplier for Digital Signal Processing," International Conference on VLSI, Communication & Instrumentation (ICVCI), 2011.
[3] G. Vaithiyanathan, K. Venkatesan, S. Sivaramakrishnan, S. Sivaand and S. Jayakumar, "Simulation and implementation of Vedic multiplier using VHDL code," International Journal of Scientific & Engineering Research, vol. 4, 2013.
[4] Pushpalata Verma and K. K. Mehta, "Implementation of an Efficient Multiplier based on Vedic Mathematics Using EDA Tool," International Journal of Engineering and Advanced Technology (IJEAT), vol. 1, June 2012.
[5] Nivedita A. Pande, Vaishali Niranjane and Anagha V. Choudhari, "Vedic Mathematics for Fast Multiplication in DSP," International Journal of Engineering and Innovative Technology (IJEIT), vol. 2, 2013.
[6] Krishnaveni D. and Umarani T. G., "VLSI implementation of Vedic multiplier with reduced delay," International Journal of Scientific & Engineering Research, vol. 2, May 2011.
[7] R. Naresh Naik, P. Siva Nagendra Reddy and K. Madan Mohan, "Design of Vedic Multiplier for Digital Signal Processing Applications," International Journal of Engineering Trends and Technology (IJETT), vol. 4, 2013.
[8] R. Priya and J. Senthil Kumar, "Implementation and comparison of Vedic Multiplier using Area Efficient CSLA Architectures," International Journal of Computer Applications, vol. 73, July 2013.
[9] Arushi Somani, Dheeraj Jain, Sanjay Jaiswal, Kumkum Verma and Swati Kasht, "Compare Vedic multipliers with Conventional Hierarchical array of array multipliers," Computer Technology and Electronics Engineering (IJCTEE), vol. 2, 2012.
[10] Manoranjan Pradhan, Rutuparna Panda and Sushanta Kumar Sahu, "Speed Comparison of 16x16 Vedic Multipliers," International Journal of Computer Applications, vol. 21, May 2011.
[11] Yu-Chi Tsao and Ken Choi, "Area-Efficient VLSI Implementation for Parallel Linear-Phase FIR Digital Filters of Odd Length Based on Fast FIR Algorithm," IEEE Transactions on Circuits and Systems, vol. 59, June 2012.
