
2013 International Conference on Computer Communication and Informatics (ICCCI -2013), Jan. 04-06, 2013, Coimbatore, INDIA

An area efficient multiplexer based CORDIC


V. Naresh, B. Venkataramani and R. Raja
National Institute of Technology, Tiruchirappalli, India
308111004@nitt.edu, bvenki@nitt.edu, 408111051@nitt.edu
Abstract: In the literature, a multiplexer-based scheme has been proposed for the ASIC implementation of the unrolled CORDIC (COordinate Rotation DIgital Computer) processor. In this paper, the efficacy of this approach is studied for implementation on FPGA. For this study, non-pipelined and two-level pipelined 8-stage CORDICs are each built using two schemes: one using adders in all the stages and another using multiplexers in the second and third stages. A 16-bit CORDIC for generating the sine/cosine functions is implemented using all four schemes on both a Xilinx Virtex 6 FPGA (XC6VLX240) and an Altera Cyclone II FPGA (EP2C20F484C7). From the implementation results, it is found that the non-pipelined and pipelined CORDICs using multiplexers require 1.6 and 1.4 times less area, respectively, in the Xilinx FPGA, and 1.8 and 1.6 times less area in the Altera FPGA, than those using only adders. This is achieved without any reduction in speed.
Keywords: CORDIC, rotation mode, multiplexer, pipelining, FPGA.

I. INTRODUCTION

CORDIC is the acronym for COordinate Rotation DIgital Computer. It is an iterative algorithm introduced by Jack E. Volder [1] and later refined by Walther [2] and others. A CORDIC unit uses only shifts and adds to compute a wide range of functions, including vector rotations and certain trigonometric, hyperbolic, linear and logarithmic functions; it can also realize a universal modulator [4] and demodulator [5]. The CORDIC algorithm is used in diverse applications such as mathematical coprocessor units, calculators, waveform generators, universal modulators and demodulators, digital filters, carrier and bit-timing recovery circuits, and digital modems. In the rotation mode, CORDIC may be used for converting a vector in polar form to rectangular form.
The CORDIC algorithm is very well suited for VLSI implementation. The block diagram of an unrolled CORDIC with 8 stages is shown in Fig. 1. Even though adders and shifters were originally used for the implementation of CORDIC, a novel scheme which uses multiplexers (MUX) for a few stages of the unrolled CORDIC was proposed in [3] and studied by implementation on ASIC. In this scheme, the first stage is removed and the adders in the 2nd and 3rd stages are replaced by multiplexers. This methodology achieves a smaller area than the original unrolled CORDIC.
FPGAs such as Virtex 6 and Cyclone II have fixed logic blocks which contain various functional elements such as look-up tables, fast carry logic and flip-flops. The objective of this paper is to study how an FPGA-based unrolled CORDIC using multiplexers performs compared to one using adders.

978-1-4673-2907-1/13/$31.00 ©2013 IEEE

The organization of the paper is as follows: Section II gives an overview of the CORDIC algorithm. Section III explains the unrolled CORDIC algorithm in rotation mode, Section IV presents the Mux-based CORDIC, and Section V describes the pipelined Mux-based unrolled CORDIC. Section VI presents the implementation results. Section VII gives the conclusion.
II. OVERVIEW OF CORDIC ALGORITHM
The CORDIC algorithm is an iterative method of performing vector rotations by arbitrary angles using shifts and adds. In the rotation mode, CORDIC may be used for converting a vector in polar form to rectangular form. In the vectoring mode, it converts a vector in rectangular form to polar form. Both modes are derived from the general rotation transform
$$x_f = x\cos\theta - y\sin\theta \tag{1}$$
$$y_f = x\sin\theta + y\cos\theta \tag{2}$$
Equations (1) and (2) rotate a vector $(x, y)$ in a Cartesian plane by an angle $\theta$ to another vector with coordinates $(x_f, y_f)$. The rotation may be achieved by performing a series of successively smaller elementary rotations $\theta_0, \theta_1, \theta_2, \ldots$
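Rotation of the vector $(x_i, y_i)$ by an elementary angle $\theta_i$ can be written as
$$x_{i+1} = x_i\cos\theta_i - y_i\sin\theta_i \tag{3}$$
$$y_{i+1} = x_i\sin\theta_i + y_i\cos\theta_i \tag{4}$$
or, factoring out $\cos\theta_i$,
$$x_{i+1} = \cos\theta_i\,(x_i - y_i\tan\theta_i) \tag{5}$$
$$y_{i+1} = \cos\theta_i\,(y_i + x_i\tan\theta_i) \tag{6}$$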

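The computational complexity of (5) and (6) can be reduced by dropping the factor $\cos\theta_i$ from each step and rewriting these equations as
$$x'_{i+1} = x'_i - y'_i\tan\theta_i \tag{7}$$
$$y'_{i+1} = y'_i + x'_i\tan\theta_i \tag{8}$$
with $x'_0 = x_0$ and $y'_0 = y_0$. Each step of (7), (8) also scales the vector by $1/\cos\theta_i$, so after $N$ steps $(x'_N, y'_N) = K\,(x_N, y_N)$, where
$$K = \prod_{i=0}^{N-1}\frac{1}{\cos\theta_i} \tag{9}$$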

To get the final coordinate values, $(x'_N, y'_N)$ must be divided by $K$. The value of $\theta_i$ for $i = 0, 1, 2, \ldots, N-1$ is chosen such that $\tan\theta_i = 2^{-i}$, and each elementary rotation is applied in the direction $\sigma_i \in \{+1, -1\}$ given by (11). This reduces the multiplication by $\tan\theta_i$ to a simple shift operation. As the iteration index increases, $\theta_i$ becomes smaller and smaller. We may terminate the iteration when the difference between $\theta$ and $\sum_i \sigma_i\theta_i$ becomes very small for some value of $N$. The remaining angle by which the vector needs to be rotated after completion of $i$ iterations is indicated by the parameter $z_i$ (with $z_0 = \theta$) and is defined by (10).
$$z_{i+1} = z_i - \sigma_i\theta_i \tag{10}$$
$\sigma_i$ is considered to be positive when the required rotation is anticlockwise and negative otherwise. The direction of the rotation therefore depends on the sign of $z_i$:
$$\sigma_i = \operatorname{sgn}(z_i) \tag{11}$$
The computation of $K$ may be simplified as follows: since $\cos\theta_i \approx 1$ for very small values of $\theta_i$, $K$ may be computed for $N = 8$ and used for any value of $N > 8$.
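The following Python sketch is a floating-point model, not the hardware described in this paper. It iterates (7), (8), (10) and (11) with $\tan\theta_i = 2^{-i}$ and computes the gain $K$ of (9), to illustrate that dividing $(x'_N, y'_N)$ by $K$ recovers a true rotation and that $K$ barely changes beyond $N = 8$. The function names `cordic_gain` and `pseudo_rotate` are hypothetical, and treating $z_i = 0$ as a positive direction is an assumption.

```python
import math

def cordic_gain(n_stages: int) -> float:
    """K of (9): product of 1/cos(theta_i) = sqrt(1 + 2^(-2i)) for i = 0..n-1."""
    k = 1.0
    for i in range(n_stages):
        k *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return k

def pseudo_rotate(theta: float, n_stages: int, x0: float = 1.0, y0: float = 0.0):
    """Iterate (7), (8) with tan(theta_i) = 2^-i and sigma_i = sgn(z_i), (10), (11).

    Returns the unscaled vector (x'_N, y'_N) and the residual angle z_N.
    """
    x, y, z = x0, y0, theta
    for i in range(n_stages):
        sigma = 1.0 if z >= 0 else -1.0            # (11); z = 0 treated as positive (assumption)
        t = 2.0 ** (-i)                             # tan(theta_i), a shift in hardware
        x, y = x - sigma * y * t, y + sigma * x * t  # (7), (8) using the old x and y
        z -= sigma * math.atan(t)                   # (10)
    return x, y, z

if __name__ == "__main__":
    k8, k16 = cordic_gain(8), cordic_gain(16)
    print(f"K(8) = {k8:.6f}, K(16) = {k16:.6f}")
    theta = math.radians(30.0)
    xn, yn, zn = pseudo_rotate(theta, 8)
    # Dividing by K gives (cos, sin) of the angle actually rotated, theta - zn.
    print(xn / k8, yn / k8, math.cos(theta), math.sin(theta), zn)
```

With eight stages the residual angle is bounded by $\tan^{-1}(2^{-7})$, so the recovered cosine and sine are within about 0.008 of the exact values, and $K(8)$ and $K(16)$ differ by less than $10^{-4}$.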

Figure 2. CORDIC rotations

Fig. 2 shows an example of three vector rotations, corresponding to a three-stage CORDIC. The vector v gives the approximate values of sine and cosine after three rotations. All equations of the CORDIC algorithm can be implemented using only additions, subtractions and shifts; therefore, the CORDIC algorithm does not require multipliers.
III. THE UNROLLED CORDIC IN ROTATION MODE
In rotation mode, CORDIC can simultaneously compute the sine and cosine of the input angle. In this mode, the y component of the input vector is set to zero, the x component to $1/K$, and the angle accumulator is initialized with the desired rotation angle $\theta$. The output of the angle accumulator decreases or increases depending on the most significant bit (the sign bit) of the angle accumulator output of the previous stage. For rotation mode, the CORDIC equations are given by
$$x_{i+1} = x_i - \sigma_i\,y_i\,2^{-i} \tag{12}$$
$$y_{i+1} = y_i + \sigma_i\,x_i\,2^{-i} \tag{13}$$
$$z_{i+1} = z_i - \sigma_i\tan^{-1}\!\left(2^{-i}\right) \tag{14}$$
$$K = \prod_{i=0}^{N-1}\sqrt{1 + 2^{-2i}} \tag{15}$$
Equation (15) is (9) with $\tan\theta_i = 2^{-i}$, since $1/\cos\theta_i = \sqrt{1 + \tan^2\theta_i}$.
Fig. 1 shows the architecture of the eight-stage unrolled CORDIC; it consists of only adders, subtractors and shifters, and its accuracy improves as the number of stages increases. Addition or subtraction on the angle value takes place in each rotation of the vector, depending on the most significant bit of the previous angle. Division by a power of two is performed simply by a right shift. This has the advantage of not using extra hardware for division, and it results in lower hardware complexity. Initially, we assign constant values to x and y. In successive stages these values are shifted by j bits, where j = 0, 1, 2, 3, 4, 5, 6, 7, which divides x and y by 1, 2, 4, 8, 16, 32, 64 and 128, respectively. In this mode, the vector is iteratively rotated through intermediate vectors until it reaches the desired angle.
IV. MUX BASED CORDIC
The scheme for reducing the area of the CORDIC using multiplexers was proposed for ASIC implementation in [3]. It is adopted for the FPGA-based implementation in this paper. The area is reduced by removing some of the stages of Fig. 1. Since $y_0 = 0$ and the first rotation is anticlockwise ($\sigma_0 = +1$) for a non-negative input angle, both outputs of the first stage of the original unrolled CORDIC are equal to the initial value $x_0$. Therefore, we can directly write the output of the first stage as
$$y_1 = x_0 \tag{16}$$
$$x_1 = x_0 \tag{17}$$
If the first-stage angle accumulator output is positive ($\sigma_1 = +1$), then
$$y_2 = y_1 + \frac{x_1}{2} = \frac{3x_0}{2} \tag{18}$$
$$x_2 = x_1 - \frac{y_1}{2} = \frac{x_0}{2} \tag{19}$$
The vector coordinates corresponding to a negative output ($\sigma_1 = -1$) are
$$y_2 = y_1 - \frac{x_1}{2} = \frac{x_0}{2} \tag{20}$$
$$x_2 = x_1 + \frac{y_1}{2} = \frac{3x_0}{2} \tag{21}$$
The output of the second stage can therefore take only these two fixed values. So we can implement the second stage using two Muxes, with the MSB of the previous angle accumulator output as the select line. Fig. 3 shows the circuit of the second stage using Muxes with the MSB select line.
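A small bit-level Python sketch can make the equivalence between the two schemes concrete. It models the 8-stage adder-based unrolled CORDIC of (12)-(14) with integer arithmetic, and a variant in which stage 1 is removed and stage 2 becomes a 2:1 selection between the precomputed constants of (18)-(21). The fixed-point formats (x and y with 14 fractional bits, the angle in radians with 13 fractional bits), the input range $0 \le \theta \le \pi/2$ and all names are assumptions for illustration, not the formats of the reported implementation.

```python
import math

N_STAGES = 8
FRAC_XY = 14   # assumed: x, y carry 14 fractional bits (1.0 == 1 << 14)
FRAC_Z = 13    # assumed: angle in radians with 13 fractional bits

# Angle constants tan^-1(2^-i) used by the angle accumulator, eq. (14).
ATAN = [round(math.atan(2.0 ** -i) * (1 << FRAC_Z)) for i in range(N_STAGES)]
# Gain K of eq. (15); x is initialised to 1/K so the outputs approximate cos and sin.
K = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(N_STAGES))
X0 = round((1 << FRAC_XY) / K)

def adder_stage(i, x, y, z):
    """One unrolled stage, eqs. (12)-(14); the sign (MSB) of z picks add or subtract."""
    if z >= 0:  # MSB = 0, sigma_i = +1
        return x - (y >> i), y + (x >> i), z - ATAN[i]
    return x + (y >> i), y - (x >> i), z + ATAN[i]  # MSB = 1, sigma_i = -1

def cordic_adders(z0):
    """Original unrolled CORDIC: eight adder stages, y0 = 0, x0 = 1/K."""
    x, y, z = X0, 0, z0
    for i in range(N_STAGES):
        x, y, z = adder_stage(i, x, y, z)
    return x, y

# Second-stage outputs are fixed, eqs. (18)-(21): computed once, like wired constants.
STAGE2_POS = (X0 - (X0 >> 1), X0 + (X0 >> 1))  # (x2, y2) when sigma_1 = +1
STAGE2_NEG = (X0 + (X0 >> 1), X0 - (X0 >> 1))  # (x2, y2) when sigma_1 = -1

def cordic_mux(z0):
    """Mux-based variant: stage 1 removed, stage 2 is a 2:1 select per coordinate.

    Valid for z0 >= 0, where the first stage always outputs x1 = y1 = X0, eqs. (16), (17).
    """
    z = z0 - ATAN[0]                          # the angle path of stage 1 is kept
    if z >= 0:                                # select line = MSB of z1
        (x, y), z = STAGE2_POS, z - ATAN[1]
    else:
        (x, y), z = STAGE2_NEG, z + ATAN[1]
    for i in range(2, N_STAGES):              # stages 3..8 still use adders here
        x, y, z = adder_stage(i, x, y, z)
    return x, y

if __name__ == "__main__":
    for deg in (0, 30, 45, 60, 90):
        z0 = round(math.radians(deg) * (1 << FRAC_Z))
        x, y = cordic_mux(z0)
        print(deg, x / (1 << FRAC_XY), y / (1 << FRAC_XY), cordic_adders(z0) == (x, y))
```

Python's `>>` on a negative integer rounds toward minus infinity, which matches an arithmetic right shift of a two's-complement value, so both functions follow the same truncation as a hard-wired shifter and produce bit-identical outputs for non-negative angles.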

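Figure 3.
To reduce the area further, we also replace the third stage with Muxes. Since the third-stage output also depends only on $x_0$ (and on the two select bits), its outputs can be expressed as follows, where sgn1 and sgn2 denote the MSBs of the angle accumulator outputs of the first and second stages (0 for a non-negative value, i.e., $\sigma = +1$):
$$y_3 = y_2 + \frac{x_2}{4} = \frac{13x_0}{8} \tag{22}$$
$$x_3 = x_2 - \frac{y_2}{4} = \frac{x_0}{8} \tag{23}$$
for sgn1 = 0, sgn2 = 0;
$$y_3 = y_2 + \frac{x_2}{4} = \frac{7x_0}{8} \tag{24}$$
$$x_3 = x_2 - \frac{y_2}{4} = \frac{11x_0}{8} \tag{25}$$
for sgn1 = 1, sgn2 = 0;
$$y_3 = y_2 - \frac{x_2}{4} = \frac{11x_0}{8} \tag{26}$$
$$x_3 = x_2 + \frac{y_2}{4} = \frac{7x_0}{8} \tag{27}$$
for sgn1 = 0, sgn2 = 1;
$$y_3 = y_2 - \frac{x_2}{4} = \frac{x_0}{8} \tag{28}$$
$$x_3 = x_2 + \frac{y_2}{4} = \frac{13x_0}{8} \tag{29}$$
for sgn1 = 1, sgn2 = 1.

Figure 5.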
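The four cases (22)-(29) can be checked with exact rational arithmetic. The sketch below chains (12) and (13) for the first three stages with $x_0 = 1$ and $y_0 = 0$, and tabulates $(x_3, y_3)$ against the select bits (sgn1, sgn2). Reading the resulting table as the contents of one 4:1 Mux per coordinate is only one possible realization; the exact Mux arrangement in Fig. 4 may differ, and the function names are hypothetical.

```python
from fractions import Fraction

def stage(i, x, y, sigma):
    """Exact form of eqs. (12), (13): one unscaled rotation by sigma * atan(2^-i)."""
    s = Fraction(1, 2 ** i)
    return x - sigma * y * s, y + sigma * x * s

def third_stage_table():
    """Map (sgn1, sgn2) -> (x3, y3) in units of x0, with y0 = 0 and sigma_0 = +1."""
    x1, y1 = stage(0, Fraction(1), Fraction(0), +1)          # eqs. (16), (17)
    table = {}
    for sgn1 in (0, 1):
        x2, y2 = stage(1, x1, y1, 1 if sgn1 == 0 else -1)    # eqs. (18)-(21)
        for sgn2 in (0, 1):
            table[(sgn1, sgn2)] = stage(2, x2, y2, 1 if sgn2 == 0 else -1)
    return table

if __name__ == "__main__":
    for (sgn1, sgn2), (x3, y3) in sorted(third_stage_table().items()):
        print(f"sgn1={sgn1} sgn2={sgn2}: x3 = {x3} x0, y3 = {y3} x0")
```

Every entry is a multiple of $x_0/8$, so in hardware each candidate value is a constant derived from $x_0$, and the third-stage adders can be replaced by selection logic.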

The block diagram of the CORDIC when the adders up to the third stage are replaced with Muxes is shown in Fig. 4. As the adders are replaced with Muxes, the area of the circuit up to the 3rd stage is reduced. However, replacing adders with Muxes beyond the third stage results in an exponential increase in the number of Muxes, as shown in Table I.
Table I. Multiplexers required for eliminating different stages

No. of eliminated stages | No. of Muxes required
 | 14
 | 30
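One way to see why the Mux count grows quickly is to count the distinct $(x_k, y_k)$ pairs that selection logic would have to choose between once the first $k$ stages are eliminated. The sketch below enumerates all direction choices $\sigma_1, \ldots, \sigma_{k-1}$ (with $\sigma_0 = +1$, as in (16), (17)) and counts the distinct results in units of $x_0$. This is an illustrative count of candidate values under that assumption, not a derivation of the Mux counts in Table I, and the names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def stage(i, x, y, sigma):
    """Exact form of eqs. (12), (13)."""
    s = Fraction(1, 2 ** i)
    return x - sigma * y * s, y + sigma * x * s

def candidate_outputs(k):
    """Distinct (x_k, y_k) pairs, in units of x0, after the first k stages.

    Assumes y0 = 0 and sigma_0 = +1; sigma_1 .. sigma_{k-1} range over +1 and -1.
    """
    outputs = set()
    for signs in product((1, -1), repeat=k - 1):
        x, y = stage(0, Fraction(1), Fraction(0), 1)
        for i, sigma in enumerate(signs, start=1):
            x, y = stage(i, x, y, sigma)
        outputs.add((x, y))
    return outputs

if __name__ == "__main__":
    for k in range(1, 7):
        print(k, len(candidate_outputs(k)))
```

For $k = 1, 2, 3, 4$ the counts are 1, 2, 4 and 8: each additional eliminated stage doubles the number of constants that must be stored and selected.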

V. PIPELINED MUX BASED UNROLLED CORDIC


The pipelined CORDIC uses registers between the iteration stages, as shown in Fig. 5. The advantage of the pipelined unrolled CORDIC over the unrolled CORDIC is its higher frequency of operation, which makes it suitable for high-speed applications. The number of registers depends on the number of pipeline stages, and the registers increase the area. The first output of an N-stage pipelined CORDIC core is obtained after N clock cycles; thereafter, an output is generated in every clock cycle. Fig. 4 shows the Mux-based pipelined unrolled CORDIC architecture used in this paper, in which pipeline registers are inserted at the outputs of the fourth and seventh stages.
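The latency and throughput behaviour described above can be illustrated with a cycle-level Python model of a two-register pipeline placed after the fourth and seventh stages. The model assumes that the eighth stage drives the output combinationally (no output register), so the first result appears two clock cycles after the first input and one result follows in every cycle; with an output register, or with a register after every stage as in the N-stage case, the latency would differ. The fixed-point formats and all names are hypothetical.

```python
import math

N_STAGES = 8
FRAC_XY, FRAC_Z = 14, 13   # assumed fixed-point formats (not taken from the paper)
ATAN = [round(math.atan(2.0 ** -i) * (1 << FRAC_Z)) for i in range(N_STAGES)]
X0 = round((1 << FRAC_XY) / math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(N_STAGES)))

def run_stages(state, first, last):
    """Apply unrolled stages first..last-1 of eqs. (12)-(14) to (x, y, z)."""
    x, y, z = state
    for i in range(first, last):
        if z >= 0:
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN[i]
        else:
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN[i]
    return x, y, z

def simulate_pipeline(angles):
    """Cycle-level model with registers after the 4th stage (r1) and the 7th stage (r2).

    One new angle enters per clock; None marks an empty register (a bubble).
    Returns the (x, y) seen at the output in each cycle.
    """
    r1 = r2 = None
    outputs = []
    for a in list(angles) + [None, None]:     # two extra cycles drain the pipeline
        # The 8th stage is combinational logic after r2.
        outputs.append(None if r2 is None else run_stages(r2, 7, 8)[:2])
        # Clock edge: both registers load at once, so r2 takes the old r1.
        r2 = None if r1 is None else run_stages(r1, 4, 7)
        r1 = None if a is None else run_stages((X0, 0, a), 0, 4)
    return outputs

if __name__ == "__main__":
    angles = [round(math.radians(d) * (1 << FRAC_Z)) for d in (0, 30, 60, 90)]
    for cycle, out in enumerate(simulate_pipeline(angles)):
        print(cycle, out)
```

Splitting the combinational path at the fourth and seventh stages shortens the longest register-to-register path, which is where the higher clock frequency of the pipelined designs comes from; the outputs themselves are identical to those of the non-pipelined chain.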

VI. RESULTS
A 16-bit CORDIC for generating the sine/cosine functions, with and without two-level pipelining, is implemented both with the Mux-based approach and with shifters and adders only. The implementation results on the Xilinx Virtex 6 FPGA (XC6VLX240) and Altera Cyclone II FPGA (EP2C20F484C7) devices are shown in Table II and Table III, respectively. From these tables, it is found that the non-pipelined and pipelined CORDICs using multiplexers require 1.6 and 1.4 times less area in the Xilinx Virtex 6 FPGA, and 1.8 and 1.6 times less area in the Altera Cyclone II FPGA, than the CORDIC using only adders. This is achieved without any reduction in speed.
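TABLE II. Implementation results on Xilinx Virtex 6 FPGA

Metric | Unrolled CORDIC | Mux based without pipeline | Mux based two-level pipeline | Unrolled two-level pipeline
Slice LUTs | 458 | 281 | 310 | 503
Critical path delay (ns) | 28 | 28 | 13 | 13
Frequency (MHz) | 35 | 35 | 77 | 77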

TABLE III. Implementation results on Altera Cyclone II FPGA

Metric | Unrolled CORDIC | Mux based without pipeline | Mux based two-level pipeline | Unrolled two-level pipeline
Logic elements | 873 | 471 | 530 | 959
Critical path delay (ns) | 83 | 83 | 29 | 29
Frequency (MHz) | 12 | 12 | 34 | 34

The implementation results of all four schemes on Xilinx Virtex 6 are shown as a column chart in Fig. 6.


Figure 6. Slice LUTs, critical path delay and frequency on Xilinx Virtex 6 for the original unrolled CORDIC, Mux based without pipeline, Mux based double pipelined and original unrolled double pipelined designs
VII. CONCLUSIONS
The unrolled CORDIC design studied in this paper eliminates some of the adder stages by introducing Muxes to reduce area. Implementation on Xilinx and Altera FPGAs verifies that this area reduction is achieved on these devices as well, in addition to the ASIC implementations reported in the literature.
REFERENCES
[1] J. E. Volder, "The CORDIC trigonometric computing technique," IRE Transactions on Electronic Computers, vol. EC-8, pp. 330-334, 1959.
[2] J. Walther, "A unified algorithm for elementary functions," in Proc. Spring Joint Computer Conference, vol. 38, pp. 379-385, 1971.
[3] P. Nilsson, "Complexity reduction in unrolled CORDIC architectures," in Proc. Electronics, Circuits, and Systems (ICECS 2009), pp. 868-871, 2009.
[4] J. Vankka, M. Kosunen, J. Hubach, and K. Halonen, "A CORDIC-based multicarrier QAM modulator," in Proc. Global Telecommunications Conference (GLOBECOM '99), vol. 1A, pp. 173-177, 1999.
[5] A. Chen, R. McDanell, M. Boytim, and R. Pogue, "Modified CORDIC demodulator implementation for digital IF-sampled receiver," in Proc. IEEE Global Telecommunications Conference (GLOBECOM '95), vol. 2, pp. 1450-1454, 14-16 Nov. 1995.
[6] E. Deprettere, P. Dewilde, and R. Udo, "Pipelined CORDIC architectures for fast VLSI filtering and array processing," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '84), vol. 9, pp. 250-253, Mar. 1984.


Figure 1. Original unrolled CORDIC

Figure 4. Mux based pipelined unrolled CORDIC
