

FPGA implementation of fast serial 64-points


FFT/IFFT block without reordering block

Conference Paper · May 2013


DOI: 10.1109/ICIEV.2013.6572558


4 authors, including:

Muhammad Firmansyah Kasim Trio Adiono


University of Oxford Bandung Institute of Technology
25 PUBLICATIONS 74 CITATIONS 173 PUBLICATIONS 473 CITATIONS


All content following this page was uploaded by Muhammad Firmansyah Kasim on 05 June 2015.



FPGA Implementation of Fast Serial 64-Points
FFT/IFFT Block without Reordering Block
Muhammad Firmansyah Kasim¹, Trio Adiono², Muhammad Fahreza³, Muhammad Fadhli Zakiy⁴
Sekolah Teknik Elektro Informatika
Institut Teknologi Bandung
Bandung, Indonesia
¹copycat91@students.itb.ac.id, ²tadiono@stei.itb.ac.id, ³muhammad.fahreza@students.itb.ac.id, ⁴fadhli.zakiy@students.itb.ac.id

Abstract— There have been many FPGA implementations of the serial Fast Fourier Transform (FFT) operation. In most cases, the output of the serial FFT block is in bit-reversed order, so a reordering block is needed to reorder the output. However, some FFT applications do not require an ordered FFT output, such as the Spectral Subtraction method [1]. In this paper, we propose an FPGA implementation of a serial FFT and IFFT architecture in one block without a reordering block. By not implementing the reordering block, we save some clock cycles of latency and increase the speed of the block. The architecture is implemented on an Altera DE2-70 board with a Cyclone II EP2C35F672C6 FPGA chip. Our 64-point FFT/IFFT block utilizes 2960 logic elements, or about half of the logic elements utilized by Altera MegaFunction's FFT IP. The block can work at a maximum frequency of 84.55 MHz and performs a 64-point FFT/IFFT operation in 863.4 ns.

Keywords— FFT, FPGA implementation, reordering block.

I. INTRODUCTION

The Discrete Fourier Transform (DFT) is the Fourier transform with a discrete index. The DFT is defined as follows.

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N} \tag{1}$$

The result of the DFT operation is a frequency representation of a signal. Direct calculation of the DFT above has $O(N^2)$ time complexity, which grows very large for large N.

In 1965, Cooley and Tukey [2] introduced a method to compute the DFT in $O(N \log N)$ time. It is based on the symmetry of the complex coefficients. The method proposed by Cooley and Tukey is usually called the Fast Fourier Transform (FFT).

There have been many implementations of FFT operations in VLSI. Mahdavi [3] worked on an FPGA implementation of a radix-2 1024-point FFT using floating point and parallel calculation. Zhang [4] also implemented a parallel 16-point FFT block with a maximum frequency of 50 MHz using a radix-4 architecture. In 2009, Saeed [5] implemented an FFT/IFFT processor in FPGA using a radix-2² architecture with serial input and output. A serial FFT block requires a smaller implementation area than a parallel FFT block.

In most FPGA implementations of serial FFT blocks, the output of the FFT block is unordered, i.e. in bit-reversed order [5]-[9]. Therefore, a reordering block is needed to reorder its output. If there is no reordering block in the FFT block, the challenge is to make sure that the original data is restored when the FFT block's output enters the IFFT block.

The FFT has a wide range of applications. Some of these applications do not require the FFT result to be ordered, for instance the Spectral Subtraction (SS) method by Boll [1]. The SS method performs noise cancellation in the frequency spectrum based on previous values of the FFT result at the same index. The FFT/IFFT architecture in this paper is intended to be used in such applications.

In this paper, we explain the architecture of a block that can perform FFT and IFFT operations in one block. We use 64-point FFT and IFFT operations. The input enters the block serially and its output is also available serially.

This paper is organized as follows. Section II elaborates on the FFT operation. Section III gives an overview of the FFT/IFFT block architecture, while section IV explains more about the building blocks of the FFT/IFFT block. Section V expresses the performance of our FFT/IFFT block in terms of area and speed. Last, section VI contains the conclusion of this paper.

II. FFT/IFFT OPERATIONS

The FFT operation is defined as follows.

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N} \tag{2}$$

where n is the index of the input signal in the time domain, k is the index of the signal in the frequency domain, and N is the total number of input points.

For N = 64, we can decompose k and n into

$$\begin{aligned} k &= k_2 + 4k_1 + 16k_0 \\ n &= n_2 + 4n_1 + 16n_0 \end{aligned} \tag{3}$$

where $k_0, k_1, k_2, n_0, n_1, n_2 \in \{0, 1, 2, 3\}$.

Thus, we can write the FFT operation as below.

978-1-4799-0400-6/13/$31.00 ©2013 IEEE


$$X(k_2 + 4k_1 + 16k_0) = \sum_{n_2, n_1, n_0} x(n_2 + 4n_1 + 16n_0)\, W_{64}^{nk} \tag{4}$$

with $W_{64} = e^{-j 2\pi / 64}$. By substituting the values 0, 1, 2, and 3 for $n_0$, we get

$$X(k) = \sum_{n_2, n_1} \left\{ \begin{aligned} & x(n_2 + 4n_1) \\ & + x(n_2 + 4n_1 + 16)\, W_{64}^{16k} \\ & + x(n_2 + 4n_1 + 32)\, W_{64}^{32k} \\ & + x(n_2 + 4n_1 + 48)\, W_{64}^{48k} \end{aligned} \right\} W_{64}^{k(n_2 + 4n_1)} \tag{5}$$

Replacing $W_{64}^{16} = -j$ in equation 5 results in

$$X(k) = \sum_{n_2, n_1} \left\{ \begin{aligned} & x(n_2 + 4n_1) \\ & + x(n_2 + 4n_1 + 16)\, (-j)^{k} \\ & + x(n_2 + 4n_1 + 32)\, (-1)^{k} \\ & + x(n_2 + 4n_1 + 48)\, j^{k} \end{aligned} \right\} W_{64}^{k(n_2 + 4n_1)} \tag{6}$$

The twiddle factor at the right, $W_{64}^{k(n_2 + 4n_1)}$, can be rewritten as

$$\begin{aligned} W_{64}^{(k_2 + 4k_1 + 16k_0)(n_2 + 4n_1)} &= W_{64}^{k_2 (n_2 + 4n_1)}\, W_{64}^{4(k_1 + 4k_0)(n_2 + 4n_1)} \\ &= W_{64}^{k_2 (n_2 + 4n_1)}\, W_{16}^{(k_1 + 4k_0)(n_2 + 4n_1)} \end{aligned} \tag{7}$$

From equations 6 and 7, we get

$$X(k) = \sum_{n_2, n_1} \left\{ \begin{aligned} & x(n_2 + 4n_1) \\ & + x(n_2 + 4n_1 + 16)\, (-j)^{k_A} \\ & + x(n_2 + 4n_1 + 32)\, (-1)^{k_A} \\ & + x(n_2 + 4n_1 + 48)\, j^{k_A} \end{aligned} \right\} W_{64}^{k_2 n_A}\, W_{16}^{(k_1 + 4k_0) n_A} \tag{8}$$

where $k_A = k_2 + 4k_1$ and $n_A = n_2 + 4n_1$. Hence, the 64-point FFT equation can be written as

$$X(k_2 + 4k_1 + 16k_0) = \sum_{n_2, n_1} x_1(n_2 + 4n_1 + 16k_2)\, W_{16}^{(k_1 + 4k_0) n_A} \tag{9}$$

with

$$x_1(n_2 + 4n_1 + 16k_2) = \left\{ \begin{aligned} & x(n_2 + 4n_1) \\ & + x(n_2 + 4n_1 + 16)\, (-j)^{k_A} \\ & + x(n_2 + 4n_1 + 32)\, (-1)^{k_A} \\ & + x(n_2 + 4n_1 + 48)\, j^{k_A} \end{aligned} \right\} W_{64}^{k_2 (n_2 + 4n_1)} \tag{10}$$

The factor $W_{64}^{k_2 (n_2 + 4n_1)}$ at the right is the twiddle factor. Using the same steps, we can simplify equation 9 to

$$X(k_2 + 4k_1 + 16k_0) = \sum_{n_2} x_2(n_2 + 4k_1 + 16k_2)\, W_{4}^{k_0 n_2} \tag{11}$$

where

$$x_2(n_2 + 4k_1 + 16k_2) = \left\{ \begin{aligned} & x_1(n_2 + 16k_2) \\ & + x_1(n_2 + 4 + 16k_2)\, (-j)^{k_1} \\ & + x_1(n_2 + 8 + 16k_2)\, (-1)^{k_1} \\ & + x_1(n_2 + 12 + 16k_2)\, j^{k_1} \end{aligned} \right\} W_{16}^{k_1 n_2} \tag{12}$$

Finally, using the same steps, we get

$$X(k_2 + 4k_1 + 16k_0) = x_3(k_0 + 4k_1 + 16k_2) \tag{13}$$

where

$$x_3(k_0 + 4k_1 + 16k_2) = \left\{ \begin{aligned} & x_2(4k_1 + 16k_2) \\ & + x_2(1 + 4k_1 + 16k_2)\, (-j)^{k_0} \\ & + x_2(2 + 4k_1 + 16k_2)\, (-1)^{k_0} \\ & + x_2(3 + 4k_1 + 16k_2)\, j^{k_0} \end{aligned} \right\} \tag{14}$$

Equation 13 shows that the output is in bit-reversed order.

According to equation 10, at the first stage, the set of inputs to be processed is $x(n_A)$, $x(n_A + 16)$, $x(n_A + 32)$, $x(n_A + 48)$. At the second stage, the set of inputs to be processed according to equation 12 is $x_1(n_2)$, $x_1(n_2 + 4)$, $x_1(n_2 + 8)$, $x_1(n_2 + 12)$. Last, according to equation 11, to get the result of the 64-point FFT, the set of inputs to be processed is $x_2(0)$, $x_2(1)$, $x_2(2)$, $x_2(3)$.

Figure 1: FFT block diagram

Similar to the FFT, the IFFT operation is as follows.

$$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k)\, W_N^{-kn} \tag{15}$$
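The three-stage decomposition of equations 10, 12 and 14 can be checked numerically against the direct DFT of equation 2. The following is a minimal sketch, not the hardware implementation: the function names (`W`, `dft`, `fft64_three_stage`, `digit_reverse4`) and the test signal are assumptions made for illustration, and the butterfly exponents are taken to be $k_2$, $k_1$ and $k_0$ in stages one, two and three, the choice under which the result matches the direct DFT. Note that the position produced by equation 13 is the base-4 digit reversal of k, which is what this derivation refers to as bit-reversed order.

```python
import cmath

def W(base, e):
    """Twiddle factor W_base^e = exp(-j*2*pi*e/base)."""
    return cmath.exp(-2j * cmath.pi * e / base)

def dft(x):
    """Direct O(N^2) DFT, as in equation (2)."""
    n_pts = len(x)
    return [sum(x[n] * W(n_pts, n * k) for n in range(n_pts))
            for k in range(n_pts)]

def fft64_three_stage(x):
    """Three radix-4 stages; the result is stored at index k0 + 4*k1 + 16*k2."""
    assert len(x) == 64
    # Stage 1: butterfly over n0, then twiddle W_64^(k2*nA)
    x1 = [0j] * 64
    for k2 in range(4):
        for nA in range(16):  # nA = n2 + 4*n1
            bfly = sum(x[nA + 16 * n0] * W(4, n0 * k2) for n0 in range(4))
            x1[nA + 16 * k2] = bfly * W(64, k2 * nA)
    # Stage 2: butterfly over n1, then twiddle W_16^(k1*n2)
    x2 = [0j] * 64
    for k2 in range(4):
        for k1 in range(4):
            for n2 in range(4):
                bfly = sum(x1[n2 + 4 * n1 + 16 * k2] * W(4, n1 * k1)
                           for n1 in range(4))
                x2[n2 + 4 * k1 + 16 * k2] = bfly * W(16, k1 * n2)
    # Stage 3: butterfly over n2, no twiddle afterwards
    x3 = [0j] * 64
    for k2 in range(4):
        for k1 in range(4):
            for k0 in range(4):
                x3[k0 + 4 * k1 + 16 * k2] = sum(
                    x2[n2 + 4 * k1 + 16 * k2] * W(4, n2 * k0)
                    for n2 in range(4))
    return x3

def digit_reverse4(k):
    """Map k = k2 + 4*k1 + 16*k0 to the position k0 + 4*k1 + 16*k2."""
    k2, k1, k0 = k % 4, (k // 4) % 4, k // 16
    return k0 + 4 * k1 + 16 * k2

# Arbitrary deterministic test signal (illustrative only)
x = [complex((7 * n) % 11 - 5, (3 * n) % 5 - 2) for n in range(64)]
X_ref = dft(x)
x3 = fft64_three_stage(x)
max_err = max(abs(x3[digit_reverse4(k)] - X_ref[k]) for k in range(64))
print("largest deviation from the direct DFT:", max_err)
```

Reading `x3` sequentially therefore yields the spectrum in reversed-index order, which is the ordering the rest of the architecture is built around.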
The IFFT operation is similar to the FFT operation. Hence, a small modification to the FFT operation is enough for the architecture to calculate the IFFT.

III. FFT/IFFT BLOCK OVERVIEW

The FFT/IFFT block implemented in this paper uses the radix-2² single-path delay feedback architecture. Using radix-2² keeps the utilization of logic elements and multipliers low. For the 64-point case, it requires 3 radix-2² blocks and 2 twiddle factor (TF) multipliers.

The input enters the block serially. If the input is in normal order, the output of the FFT/IFFT block is in bit-reversed order (unordered). In an ordinary FFT block, an additional reordering block is required to order the unordered output. However, if unordered input enters the FFT/IFFT block, the order of the output is restored. In our architecture, we implement the FFT operation with ordered input and the IFFT operation with unordered input. Hence, no reordering block is needed for applications that use both the FFT and IFFT operations of our architecture. This saves some clock cycles of latency.

A general control block is not required in this implementation, because each radix-2² block and TF multiplier block has its own control block based on its load signal and a counter inside each block. Figure 1 shows the block diagram of the FFT/IFFT block.

A radix-2² block contains 2 blocks, radix-2A and radix-2B. For short, we refer to radix-2A and radix-2B collectively as radix-2X. Each radix-2X block is accompanied by a delay block. The delay block is used to make sure that each input is operated on together with the correct other input. For the 16-point FFT, the groups of inputs operated on together in a radix-2² block are shown in figure 2, while those for the 16-point unordered IFFT are shown in figure 4.

Figure 2: Groups of inputs that should be operated on together for ordered input. Inputs that correspond to the same circle are operated on together in a radix-2² block. In this case, the first 4 inputs are operated on at different times.

Figure 4: Groups of inputs that should be operated on together for unordered input. Inputs that correspond to the same circle are operated on together in a radix-2² block. In this case, the first 4 inputs are operated on together.

In figure 2, at the first stage, inputs 0-4-8-12 are operated on together, as are 1-5-9-13, 2-6-10-14, and 3-7-11-15. In contrast, for unordered input, inputs 0-1-2-3 are operated on together, as are 4-5-6-7, 8-9-10-11, and 12-13-14-15.

Based on figures 2 and 4, for an N-point FFT operation with ordered input, the first delay block has to give a delay of N/2 clock cycles, the second delay block a delay of N/4 clock cycles, and so on until the last block, which gives a delay of 1 clock cycle. Conversely, for an N-point IFFT operation with unordered input, the first delay block has to give a delay of 1 clock cycle, the next delay block a delay of 2 clock cycles, and so on until the last block, which gives a delay of N/2 clock cycles. The block diagram of the FFT block with delay blocks is shown in figure 3.

Figure 3: FFT block diagram (2)

For consecutive FFT-IFFT operations, one has to make sure that the FFT operation is finished before starting the IFFT operation. Otherwise, the register memory of the delay block for the FFT operation will be overwritten by content for the IFFT operation. For N-point operations, it is safe to wait for N clock cycles after the first output is valid before doing the IFFT operation. The waiting period of N clock cycles can be used for other operations in the frequency domain.

The differences between the FFT and IFFT operations can be seen from their equations. The FFT equation is

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N} \tag{16}$$

while the IFFT equation is

$$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k)\, e^{j 2\pi k n / N} \tag{17}$$

It can be inferred that the differences between the FFT and IFFT operations are:
• the exponent in the FFT is the negative of the exponent in the IFFT;
• in the IFFT operation, the sum has to be divided by N, while in the FFT it does not.
The differences between the FFT and IFFT operations are handled in the radix-2X blocks and TF multiplier blocks.

IV. FFT/IFFT SUB-BLOCKS

A. Radix-2A Block

The radix-2A block is just like an ordinary radix-2 block. Radix-2A computes the equations below

$$\begin{aligned} y_m &= x_m + x_{m+n} \\ y_{m+n} &= x_m - x_{m+n} \end{aligned} \tag{18}$$

where $x_p$ is the p-th input, $y_p$ is the p-th output, n is the delay of the delay block accompanying the radix-2A block, and m is an arbitrary integer, m < n. The block diagram of the radix-2A block is shown in figure 5.

Figure 5: Radix-2A block diagram

The control block inside a radix-2A block controls the multiplexer selectors and the acknowledge signal based on the number of delay clock cycles associated with the block, $N_d$. The control block also implements a counter, c, that starts at 0 and increases in each clock cycle in which the load signal is active. The acknowledge signal becomes active $N_d$ clock cycles after the load signal becomes active. The selector signal, $S_A$, is active according to the equation below.

$$S_A = [\,(c / N_d) \,\%\, 2 == 0\,] \tag{19}$$

The operator '/' means integer division and '%' means the modulo operation.

In order to reduce the implementation area while maintaining accuracy, the output of the adder is shifted by one bit to the right. Therefore, if the input of a radix-2A block has the format Q1.15, the output has the format Q2.14 if the FFT operation is performed, or Q1.15 if the IFFT operation is performed.

B. Radix-2B Block

The radix-2B block is just like the radix-2A block with a small modification in the data input. The radix-2B block operates on its input according to the equations below.

$$\begin{aligned} y_m &= z_m + z_{n+m} \\ y_{n+m} &= z_m - z_{n+m} \\ y_{2n+m} &= z_{2n+m} + (-j)\, z_{3n+m} \\ y_{3n+m} &= z_{2n+m} - (-j)\, z_{3n+m} \end{aligned} \tag{20}$$

where $z_p$ is the p-th input to the radix-2B block, $y_p$ is the p-th output of the radix-2B block, n is the delay of the delay block accompanying the respective radix-2B block, and m is an arbitrary integer, m < n. The data input is controlled according to whether it has to be multiplied by -j, by j, or by 1. The block diagram of radix-2B is shown in figure 6.

Figure 6: Radix-2B block diagram. This block is similar to the radix-2A block except that some of the inputs are multiplied by -j for the FFT and by j for the IFFT.

The selector between j and -j is a signal that indicates whether the process is an IFFT. This handles the difference between the FFT and IFFT operations in equations 16 and 17.

The control block controls the selector signals and the acknowledge output signal. The acknowledge signal behaves like the acknowledge signal in the radix-2A block. The control block also implements a counter, c, as in the radix-2A block. There are 2 selector signals from the control block, $S_A$ and $S_B$. $S_A$ is a selector signal just like the one in radix-2A, while $S_B$ is a selector signal for the left multiplexer. $S_A$ is active according to equation 19, while $S_B$ is active according to the equation below.
$$S_B = [\,(c / 4^{\lfloor \log_4 N_d \rfloor}) \,\%\, 4 == 3\,] \tag{21}$$

Similar to the radix-2A block, the output of the adder inside this block is shifted by one bit to the right to reduce the area used while maintaining accuracy.

C. Twiddle Factor (TF) Multiplier

The twiddle factor (TF) multiplier consists of 16-bit multipliers, adders, a TF ROM, and a control block. In the TF ROM, N values of $e^{-j 2\pi k / N}$ are saved in order from k = 0 to N-1. The first TF multiplier saves the TFs for N = 64, and the next multiplier saves the TFs for N = 16.

In the control block, a counter, c, is implemented to set the address of the TF ROM. The address of the TF ROM depends on the counter value, c, and the total number of values saved, N, as shown in the following equation.

$$k = \mathrm{rev}(c / N_4)\,(c \,\%\, N_4) \tag{22}$$

where $N_4 = N/4$, and rev(x) is 1 if x = 2, 2 if x = 1, and equal to x otherwise. If the input of the FFT/IFFT block is unordered, the counter, c, counts in bit-reversed order.

The TF read from that address is then multiplied by the input of this block. If the block performs the IFFT operation, the TF is conjugated first before being multiplied by the input. The multiplication is a complex multiplication, that is

$$\begin{aligned} p_r + j p_i &= (a_r + j a_i)(b_r + j b_i) \\ p_r &= a_r b_r - a_i b_i \\ p_i &= a_r b_i + a_i b_r \end{aligned} \tag{23}$$

A direct implementation of equations 23 needs 4 multipliers and 2 adders. Rearranging equations 23 results in the equations below,

$$\begin{aligned} p_r &= a_r (b_r + b_i) - b_i (a_r + a_i) \\ p_i &= a_r (b_r + b_i) + b_r (a_i - a_r) \end{aligned} \tag{24}$$

which need 3 multipliers and 5 adders in the implementation, because the product $a_r (b_r + b_i)$ is shared by both lines. Equations 24 need fewer multipliers than equations 23. Hence, an implementation of equations 24 results in a smaller area than an implementation of equations 23.

D. Delay Block

Every radix-2X block is accompanied by a delay block. We implement the delay block using RAM instead of registers to minimize its area. Besides the RAM, a control block is also implemented to control the read/write enable signal and the address pointer. The address pointer increases if the load signal is active and returns to 0 when it has reached an upper bound. The data at the address pointer is read and then overwritten with the input data.

One challenge in the implementation of the delay block is the difference in delay clock cycles between the FFT and IFFT operations. To implement a delay block that is able to give different delays, we add one additional input to this block to set the upper bound of the address pointer.

Implementation using RAM instead of a register block is advantageous for area reduction. Table I compares the total logic elements and memory bits between implementations using flip-flops and RAM for 16-bit data. It shows that the implementation using RAM saves many logic elements by utilizing memory bits. In the Cyclone II EP2C35F672C6 FPGA, there are more memory bits available than logic elements. Therefore, it is better to utilize memory bits rather than logic elements in this case.

V. PERFORMANCE OF FFT/IFFT BLOCK

Our FFT/IFFT architecture has been implemented in FPGA using an Altera DE2-70 board. The board uses a Cyclone II EP2C35F672C6 FPGA chip. There are 483840 memory bits and 33216 logic elements available in the chip. The code is written in Verilog and compiled by Altera Quartus v12.1.

The design parameters of this block are area and speed. Area is measured as the total logic elements if implemented without embedded multipliers (TLEs). Speed is measured by the maximum frequency and the latency of the block. Table II shows the design parameters of the FFT/IFFT block for N = 64 points.

The performance of this block can be seen by comparing its area with the area of other 64-point FFT blocks with a 16-bit word length. The compared parameters are logic elements (LEs), embedded 9-bit multipliers (Mult), total logic elements if implemented without embedded multipliers (TLEs), and memory bits (Mem). Based on the data in table III, our FFT/IFFT block utilizes less area than the FFT blocks from the Quartus MegaCore function in all area aspects, i.e. logic elements, multipliers, total logic elements, and memory bits.

We use some optimizations inside the FFT/IFFT block. Using 3 multipliers instead of 4 multipliers inside the TF multiplier block reduces the utilized logic elements. Utilizing memory as the delay block instead of registers dramatically reduces the utilized logic elements by utilizing memory bits.

Besides logic element utilization, we also compared the latency in clock cycles between our block and some 64-point FFT/IFFT block implementations. Table IV shows that our FFT/IFFT block requires fewer clock cycles than the other FFT blocks. This is because we do not implement a reordering block to reorder the unordered output. Commonly, a reorder block needs at least N clock cycles. Therefore, not implementing any reorder block saves N clock cycles in operation.

The output of the FFT operation in our block is in bit-reversed order. If bit-reversed input becomes the input of our IFFT operation, the output will be in normal order. Hence, using our FFT/IFFT block reduces the number of cycles, but the operation in the frequency domain has to be processed in bit-reversed order or be independent of order.

In order to do 64-point FFT-IFFT operations consecutively, we should wait for 64 clock cycles after the FFT operation before doing the IFFT operation. Hence, completing the FFT-IFFT operations requires 210 clock cycles. This is less than twice the clock cycles of the other FFT blocks.
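The timing figures quoted in this section can be cross-checked with simple arithmetic. The sketch below assumes that latency is the cycle count divided by the maximum frequency, and that the 210-cycle figure is one FFT pass, the 64-cycle wait, and one IFFT pass; that breakdown is a reading consistent with the stated numbers rather than something spelled out in the text. All names are hypothetical.

```python
# Values quoted in the text and in Table II
F_MAX_HZ = 84.55e6   # maximum clock frequency
FFT_CYCLES = 73      # clock cycles for one 64-point FFT or IFFT
WAIT_CYCLES = 64     # wait between FFT and IFFT (N cycles for N = 64)

def cycles_to_ns(cycles, f_hz=F_MAX_HZ):
    """Convert a clock-cycle count to nanoseconds at clock frequency f_hz."""
    return cycles / f_hz * 1e9

single_ns = cycles_to_ns(FFT_CYCLES)

# Assumed breakdown of a consecutive FFT-IFFT operation: FFT + wait + IFFT
round_trip_cycles = FFT_CYCLES + WAIT_CYCLES + FFT_CYCLES
round_trip_ns = cycles_to_ns(round_trip_cycles)

print(f"single transform: {FFT_CYCLES} cycles, {single_ns:.1f} ns")
print(f"FFT + wait + IFFT: {round_trip_cycles} cycles, {round_trip_ns:.1f} ns")
```

Under these assumptions the single-transform latency agrees with the 863.4 ns in Table II, and the sum of the three phases agrees with the 210 cycles stated above.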
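The three-multiplier optimization of the TF multiplier (equations 24) can be verified against the direct form (equations 23). The following is a minimal sketch using plain Python integers in place of 16-bit fixed-point words; the function names `cmul4` and `cmul3` and the sample operands are illustrative assumptions, and no rounding, scaling or overflow behaviour of the hardware is modelled.

```python
def cmul4(a, b):
    """Direct complex product, equations (23): 4 multiplications, 2 additions."""
    ar, ai = a
    br, bi = b
    return (ar * br - ai * bi, ar * bi + ai * br)

def cmul3(a, b):
    """Rearranged product, equations (24): 3 multiplications, 5 additions."""
    ar, ai = a
    br, bi = b
    shared = ar * (br + bi)        # product reused by both outputs
    pr = shared - bi * (ar + ai)   # equals ar*br - ai*bi
    pi = shared + br * (ai - ar)   # equals ar*bi + ai*br
    return (pr, pi)

# Sample operands given as (real, imaginary) integer pairs, illustrative only
a = (12345, -6789)
b = (23170, -23170)
print(cmul4(a, b))
print(cmul3(a, b))
```

With integer operands the two forms agree exactly; in a fixed-point datapath the intermediate sums $b_r + b_i$ and $a_r + a_i$ need one extra bit of headroom, which is a design consideration not discussed here.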
TABLE I
AREA COMPARISON OF DELAY BLOCK IMPLEMENTATION

| Delay Clock Cycles | Logic Elements (RAM) | Logic Elements (Flip-flops) | Memory Bits (RAM) | Memory Bits (Flip-flops) |
|---|---|---|---|---|
| 8 | 47 | 144 | 128 | 0 |
| 16 | 52 | 272 | 256 | 0 |
| 32 | 59 | 528 | 512 | 0 |
| 64 | 65 | 1040 | 1024 | 0 |

TABLE II
DESIGN PARAMETERS OF FFT/IFFT BLOCK

| TLEs | Mem | Fmax (MHz) | N-Clk | Latency (ns) |
|---|---|---|---|---|
| 2960 | 4608 | 84.55 | 73 | 863.4 |

TABLE III
AREA COMPARISON OF FFT BLOCKS

| FFT Blocks | LEs | Mult | TLEs | Mem |
|---|---|---|---|---|
| Our FFT/IFFT block | 2264 | 12 | 2960 | 4608 |
| FFT from Quartus MegaCore function with 4 multipliers [10] | 4373 | 24 | 5765 | 9984 |
| FFT from Quartus MegaCore function with 3 multipliers [10] | 4915 | 18 | 5959 | 9984 |

TABLE IV
CLOCK CYCLES COMPARISON BETWEEN HW BLOCK AND SOME FFT BLOCKS

| Blocks | Length (bits) | No. of cycles |
|---|---|---|
| Our FFT/IFFT block | 16 | 73 |
| Xilinx's FFT IP V1.0.5 [12] | 16 | 192 |
| Altera Megafunction's [12] | 12 | 112 |
| J. V. McCanny's [6] | 24 | 130 |
| T. Chen's (1993) [7] | 16 | 208 |
| T. Chen's (1999) [8] | 16 | 222 |

The accuracy of this block was also measured using the ModelSim simulation tool. The inputs of the block were randomly generated. The outputs of the FFT/IFFT hardware were then compared with FFT/IFFT results computed in Python. We tested the block in 3 configurations: FFT operation only, IFFT operation only, and FFT and IFFT operations performed in one block. The comparison parameters are root mean square error (RMSE), maximum error (ME), and percentage error (PE). The percentage error is equal to the root mean square error divided by the root mean square of the output computed in Python. Table V shows the result.

TABLE V
ACCURACY OF FFT/IFFT BLOCK

| Configuration | RMSE | ME | PE |
|---|---|---|---|
| FFT only | 1.60 × 10⁻³ | 7.13 × 10⁻³ | 0.05% |
| IFFT only | 2.68 × 10⁻⁵ | 1.14 × 10⁻⁴ | 0.07% |
| FFT-IFFT | 1.39 × 10⁻³ | 5.83 × 10⁻³ | 0.34% |

The error is mainly caused by the bit-shifts inside the radix-2X blocks. The percentage error of the FFT-IFFT operation is larger than that of the FFT operation only and the IFFT operation only. This is because in the FFT-IFFT operation, the error is accumulated from the two operations.

VI. CONCLUSION AND FUTURE WORKS

There have been many implementations of serial FFT blocks. If the input of the serial FFT operation is in normal order, the output will be in bit-reversed order. Commonly, serial FFT blocks require a reorder block to reorder the output. In his paper, Saeed [5] states that a reorder block is not necessary when performing FFT-IFFT because if the unordered FFT output goes to the IFFT process, the order should be restored. However, he does not explain the architecture in detail. In this paper, we explain in more detail the architecture of an FFT/IFFT block with no reordering block.

With no reordering block, we have to implement delay blocks that are able to give different clock delays according to the order of their input. The advantage of not implementing a reordering block is that we save N clock cycles for N-point FFT/IFFT operations. For 64-point FFT/IFFT operations, our block requires 73 clock cycles to complete the operation.

The output of the FFT operation of this block is in bit-reversed order, so the frequency-domain operations between the FFT and IFFT operations receive bit-reversed input. Therefore, this FFT/IFFT block is appropriate for frequency-domain operations which do not depend on frequency directly.

In the future, this block can be utilized for digital signal processing purposes to obtain high-speed circuits. One possible application of this block is the implementation of real-time noise cancellation using the Spectral Subtraction method.

REFERENCES

[1] S. F. Boll, “Suppression of Acoustic Noise in Speech Using Spectral Subtraction,” IEEE Trans. on Acoustic, Speech, and Signal Processing, Vol. ASSP-27, No. 2, 1979.
[2] J. W. Cooley and J. W. Tukey, “An Algorithm for the Machine Calculation of Complex Fourier Series,” Mathematics of Computation, Vol. 19, No. 90, pp. 297-301, 1965.
[3] N. Mahdavi, R. Teymourzadeh, M. B. Othman, “VLSI Implementation of High Speed and High Resolution FFT Algorithm Based on Radix 2 for DSP Application,” The 5th Student Conference on Research and Development, 2007.
[4] S. Zhang, D. Yu, “Design and Implementation of A Parallel Real-time FFT Processor,” Proc. 7th International Conference on Solid-State and Integrated Circuits Technology, pp. 1665-1668, 2004.
[5] A. Saeed, M. Elbably, G. Abdelfadeel, and M. I. Eladawy, “Efficient FPGA implementation of FFT/IFFT Processor,” Int'l Journal of Circuits, Systems and Signal Processing, Issue 3, Vol. 3, pp. 103-110, 2009.
[6] J. V. McCanny, D. Trainor, Y. Hu, T. J. Ding, “Rapid Design of Complex DSP Cores,” Proc. of the 23rd European Solid-State Circuits Conference, pp. 284-287, 1997.
[7] T. Chen and L. Zhu, “An Expandable Column FFT Architecture Using Circuit Switching Network,” The Journal of VLSI Signal Processing, Vol. 6, No. 3, pp. 243-257, 1993.
[8] T. Chen, G. Sunanda, J. Jin, “COBRA: A 100-MOPS Single-Chip Programmable and Expandable FFT,” IEEE Transactions on VLSI Systems, Vol. 7, pp. 174-182, 1999.
[9] E. Bidet, D. Castelain, C. Joanblanq, P. Senn, “A Fast Single-Chip Implementation of 8192 Complex Point FFT,” IEEE Journal of Solid-State Circuits, Vol. 30, No. 3, pp. 300-305, 1995.
[10] Altera Corp., “FFT MegaCore Function User Guide.” [Online] Available on www.altera.com/literature/ug/ug_fft.pdf. Accessed on Jan 27th, 2013.
[11] Y. Li and W. Chu, “Implementation of Single Precision Floating Point Square Root on FPGAs,” IEEE Symposium on FPGAs for Custom Computing Machines, pp. 226-232, 1997.
[12] K. Maharatna, E. Grass, U. Jaghold, “A 64-point Fourier Transform Chip for High-Speed Wireless LAN Application Using OFDM,” IEEE Journal of Solid-State Circuits, Vol. 39, pp. 484-493, 2004.
