¹A. MALASHRI, ²C. PARAMASIVAM
Department of ECE, K.S.Rangasamy College of Technology, Tiruchengode.
¹malashri67@gmail.com, ²sivamvlsi@gmail.com
Abstract-- This paper presents a pipelined, reduced-memory and low-power CORDIC-based architecture for fast Fourier transform implementation. The proposed algorithm utilizes a new addressing scheme and the associated angle generator logic in order to remove any ROM usage for storing twiddle factors. CORDIC is implemented in simple hardware through repeated shift-add operations. Low power is achieved by using the Coordinate Rotation Digital Computer algorithm in place of conventional multiplication; furthermore, dynamic power consumption is reduced with no delay penalties.

Index Terms-- FFT, CORDIC, VLSI, Low power

I. INTRODUCTION

Fast Fourier transform (FFT) is among the most widely used operations in digital signal processing. Often, a high-performance FFT processor is the key component and determines most of the design metrics in many applications such as Orthogonal Frequency-Division Multiplexing (OFDM), Synthetic Aperture Radar (SAR) and software defined radio. For embedded systems, in particular portable devices, efficient hardware realization of FFT with small area, low power dissipation and real-time computation is a significant challenge.

A typical FFT processor is composed of butterfly calculation units, memory banks and control logic (address generator for data and twiddle factor accesses). In most cases, an FFT processor uses only one butterfly unit to realize all calculations iteratively, and the "in-place" memory access strategy is required for the least amount of memory. With the "in-place" strategy, the outputs of a butterfly operation are stored back to the same memory locations as the inputs, halving the memory usage. However, a correct memory addressing scheme is required to avoid data conflicts. This study implements an efficient addressing scheme to realize parallel, pipelined and "in-place" memory accessing. It produces an output at every clock cycle; furthermore, the memory banks and the butterfly unit are utilized with 100% efficiency within the pipeline.

In FFT processors, the butterfly operation is the most computationally demanding stage. Traditionally, a butterfly unit is composed of complex adders and multipliers, and the multiplier is usually the speed bottleneck in the pipeline of the FFT processor. The Coordinate Rotation Digital Computer (CORDIC) [5] algorithm is an alternative method to realize the butterfly operation without using any dedicated multiplier hardware. The CORDIC algorithm is very versatile and hardware efficient since it requires only add and shift operations, making it very suitable for the butterfly operations in FFT [6]. Instead of storing actual twiddle factors in a ROM, the CORDIC-based FFT processor needs to store only the twiddle factor angles in a ROM for the butterfly operation. Additionally, the CORDIC-based butterfly can be twice as fast as traditional multiplier-based butterflies in VLSI implementations.

In this paper, we propose a modified CORDIC algorithm for FFT processors which eliminates the need for storing the twiddle factor angles. The algorithm generates the twiddle factor angles successively by an accumulator. With this approach, the total memory requirements of an FFT processor can be reduced by more than 20%. Memory reduction improves with increased radix size. Since the critical path is not modified by the CORDIC angle calculation, system throughput does not change.

II. FAST FOURIER TRANSFORM

The N-point discrete Fourier transform is defined by

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \quad k = 0, 1, \ldots, N-1, \quad W_N = e^{-j 2\pi / N} \qquad (1)$$

Figure 1 shows the signal flow graph of a 16-point decimation-in-frequency (DIF) radix-2 FFT. The FFT algorithm is composed of butterfly calculation units:

$$x_{m+1}(p) = x_m(p) + x_m(q) \qquad (2)$$

$$x_{m+1}(q) = \left[ x_m(p) - x_m(q) \right] W_N^{r} \qquad (3)$$

Equations (2) and (3) describe the radix-2 butterfly operation at stage m, as shown in Fig. 1, with $x_m(p)$ and $x_m(q)$ the two butterfly inputs and $W_N^{r}$ the twiddle factor. Each butterfly operation needs four data accesses (two reads and two writes); however, hardware realization of four-port memory units is difficult and costly.
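The in-place strategy and the radix-2 DIF butterfly described above can be illustrated in software. The following Python sketch assumes the standard DIF butterfly (the sum is written back to the upper address, the twiddled difference to the lower address) and uses a floating-point complex multiply where the hardware would use a CORDIC rotation. The names `fft_dif_inplace` and `bit_reverse` and the final bit-reversal reordering are illustrative choices, not part of the proposed architecture.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of the integer i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_dif_inplace(x):
    """Radix-2 DIF FFT computed in place on the list x (length a power of two, >= 2).

    Each butterfly reads two locations p and q and writes its two results
    back to the same locations, so no second data buffer is needed.
    """
    n = len(x)
    assert n >= 2 and n & (n - 1) == 0, "length must be a power of two"
    half = n // 2                      # distance between the two butterfly inputs
    while half >= 1:
        step = n // (2 * half)         # twiddle exponent increment at this stage
        for start in range(0, n, 2 * half):
            for j in range(half):
                p, q = start + j, start + j + half
                w = cmath.exp(-2j * cmath.pi * (j * step) / n)  # W_N^r, r = j*step
                a, b = x[p], x[q]
                x[p] = a + b           # sum goes back to address p
                x[q] = (a - b) * w     # twiddled difference goes back to address q
        half //= 2
    # DIF leaves the results in bit-reversed order; restore natural order.
    bits = n.bit_length() - 1
    for i in range(n):
        r = bit_reverse(i, bits)
        if r > i:
            x[i], x[r] = x[r], x[i]
    return x
```

Because every butterfly overwrites its own inputs, the only storage needed is the N-word data array itself, which is the saving the in-place strategy refers to.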
... been suggested for high-throughput computation. The CORDIC algorithm is often realized by pipeline structures, leading to high processing speed. Figure 2 shows the basic structure of the pipelined CORDIC unit.

... to the muxing unit. The multiplexer unit produces the input for the core block.

[Block diagram labels: INPUT, INPUT BLOCK]
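The shift-add rotation that each stage of a pipelined CORDIC unit performs can be sketched in a few lines. This is a minimal floating-point model under stated assumptions, not the hardware described in the paper: the iteration count `ITERATIONS`, the function name `cordic_rotate`, and the 90-degree pre-rotation used to bring twiddle angles into the convergence range are all illustrative choices. Multiplication by `2.0 ** -i` stands in for a right shift by i bits, and one loop iteration corresponds to one pipeline stage.

```python
import math

ITERATIONS = 16  # assumed number of CORDIC stages (illustrative)

# Elementary rotation angles atan(2^-i); in hardware these are constants.
ANGLES = [math.atan(2.0 ** -i) for i in range(ITERATIONS)]

# Accumulated gain of the micro-rotations, removed once at the end.
GAIN = 1.0
for i in range(ITERATIONS):
    GAIN *= math.sqrt(1.0 + 2.0 ** (-2 * i))


def cordic_rotate(x, y, theta):
    """Rotate the vector (x, y) by theta radians, for |theta| <= pi."""
    # Pre-rotate by +/-90 degrees so the residual angle lies within the
    # CORDIC convergence range (about +/-1.74 rad).
    if theta > math.pi / 2:
        x, y, theta = -y, x, theta - math.pi / 2
    elif theta < -math.pi / 2:
        x, y, theta = y, -x, theta + math.pi / 2
    z = theta                                   # residual angle still to rotate
    for i in range(ITERATIONS):
        d = 1.0 if z >= 0 else -1.0             # rotation direction
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]
    return x / GAIN, y / GAIN
```

Multiplying a butterfly difference by the twiddle factor W_N^r then corresponds to calling `cordic_rotate(re, im, -2 * math.pi * r / N)`, so only the angle, not the sine and cosine values, has to be supplied.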
Table 1. Address generation table of the proposed design for 16-point radix-2 FFT
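As a baseline for comparison, the address pairs visited by a conventional in-place radix-2 DIF schedule can be enumerated stage by stage. The sketch below is not the address generation scheme of the proposed design; the function name `address_table` is hypothetical, and the twiddle exponent r is produced by a running accumulator only to illustrate how successive twiddle angles (equal to -2πr/N) could be generated without a lookup table.

```python
def address_table(n):
    """List (stage, p, q, r) for a conventional in-place radix-2 DIF FFT.

    p and q are the two data addresses of a butterfly and r is the twiddle
    exponent, so the twiddle angle is -2*pi*r/n. This is a baseline
    ordering, assumed for illustration only.
    """
    assert n >= 2 and n & (n - 1) == 0, "n must be a power of two"
    rows = []
    stages = n.bit_length() - 1
    for m in range(stages):
        half = n >> (m + 1)        # distance between the paired addresses
        step = 1 << m              # amount added to the accumulator per butterfly
        for start in range(0, n, 2 * half):
            r = 0                  # accumulator restarts for every group
            for j in range(half):
                rows.append((m, start + j, start + j + half, r))
                r += step          # next twiddle exponent by accumulation
    return rows


if __name__ == "__main__":
    for stage, p, q, r in address_table(16):
        print(f"stage {stage}: addresses ({p:2d}, {q:2d})  twiddle exponent {r}")
```

For N = 16 this gives four stages of eight butterflies each, and within a group the exponent grows by a fixed step, which is why a single adder suffices to track it.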
REFERENCES
[1] Xiao, X., Oruklu, E., & Saniie, J. (2012). Low Power And Reduced
Memory Architecture for CORDIC-based FFT. J Sign Process Syst.
[2] Wey, C., Lin, S., & Tang, W. (2007). Efficient memory-based FFT
processors for OFDM applications. In IEEE International Conf. on
Electro-Information Technology, pp. 345–350, May.
[3] Mittal, S., Khan, M., & Srinivas, M. B. (2007). On the suitability of
Bruun’s FFT algorithm for software defined radio. In 2007 IEEE
Sarnoff Symposium, pp. 1–5, Apr.
[7] Abdullah, S. S., Nam, H., McDermot, M., & Abraham, J. A. (2009).
A high throughput FFT processor with no multipliers. In IEEE
International Conf. on Computer Design, pp. 485–490.
[11] Xiao, X., Oruklu, E., & Saniie, J. (2009). Fast memory addressing
scheme for radix-4 FFT implementation. In IEEE International
Conference on Electro/Information Technology, EIT 2009, pp. 437–440,
June.
[12] Xiao, X., Oruklu, E., & Saniie, J. (2010). Reduced Memory
Architecture for CORDIC-based FFT. In IEEE International
Symposium on Circuits and Systems, pp. 2690–2693.