section 6.

Fig. 1 Flow graph of FFT [flow graph not reproduced; panel (b) is an 8-point graph with twiddle factors $W_8^0$ to $W_8^3$]

2. Discrete Fourier Transform and FFT

The Discrete Fourier Transform (DFT) of a discrete signal $x(n)$ can be directly computed as

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad k = 0, 1, \ldots, N-1 \tag{3}$$

where $W_N = e^{-j2\pi/N} = \cos\frac{2\pi}{N} - j\sin\frac{2\pi}{N}$ is known as the phase or twiddle factor and $j^2 = -1$. Here $x(n)$ and $X(k)$ are sequences of complex numbers.
An efficient method of computing the DFT that significantly reduces the number of required arithmetic operations is called FFT [1]-[10], [12]-[20]. An FFT algorithm divides the DFT calculation into many short-length DFTs and results in huge savings of computations. If the length of the DFT is $N = R^v$, i.e., the product of identical factors, the corresponding FFT algorithms are called Radix-R algorithms. Assume the FFT length is $2^M$, where $M$ is the number of stages. The radix-2 DIF FFT divides an N-point DFT into two N/2-point DFTs, then into four N/4-point DFTs, and so on. That is, the radix-2 DIF FFT expresses the DFT equation as two summations, then divides it into two equations, each of which computes every other output sample. To arrive at a two-point DFT

3. The New Twiddle-Factor-Based FFT Algorithm

It can be seen from Fig. 1(b) that, unless a sufficiently large number of registers is available, in most practical situations the twiddle factor $W_8^2$ will be loaded from memory to the CPU twice, in both Stages 1 and 2. Such redundant memory access is repeatedly seen in loading other twiddle factors and therefore becomes a serious problem when computing a large FFT. In this section, we present the twiddle-factor-based FFT algorithm, which can reduce the number of memory accesses as well as the number of multiplication operations.
Fig. 3 Structure of a 16-point FFT based on the proposed algorithm. [flow graph not reproduced; inputs x(0) to x(15) in natural order, outputs in bit-reversed order, twiddle factors W[0] to W[7]; labels: Super stage, Stage, Twiddle factor, butterfly, Section 1, Section 2]

From the above discussion, we can see that the FFT structure can be computed in a way similar to Huffman coding. This resolves the data dependence, and the verification of this algorithm can be viewed as “decoding” of the Huffman codes.

It can be seen that four loops are required in this section of computation, while traditional approaches may require three. This loop overhead, however, can be easily absorbed in current processors with multiple data paths, such as the TI TMS320C62x DSP [18].

Section 2: In the second section, the remaining butterflies involving the twiddle factor $W_N^0$ are computed. Note that no multiplication is needed in computing these $(N - 1)$ butterflies (Theorem 1), as $W_N^0 = 1$. All these butterflies are organized as a binary tree and there are $\log_2 N$ stages. The memory access of this section can be significantly reduced if a few user-visible data registers are available. Depending on the size of the given registers (M) to save intermediate results, we can traverse the binary tree with the algorithm shown in Fig. 4, where the visit to a node refers to a 2-point butterfly computation.

// N: N-point FFT
// x: input samples
//    x[2k]     -- real part of the kth data sample
//                 k = 0, 1, 2, ..., (N-1).
//    x[2k + 1] -- imaginary part of the kth data sample
// r: r pairs of registers
// j: r = 2^j
void section2(int N, float* x, float* w, int n1)
{
    // in the prolog
    int n5 = n1 % 2;   // n1: N = 2^n1
    int n2 = N;

    // The Prolog of the tree structure
    for (proc = 0; proc < n5; proc++) {
        n3 = n2;
        n2 >>= 1;
        for (bu = 0; bu < N; bu += n3) {
            // Calculate the butterfly
            // butterfly_cal(x[0], x[1], x[2], x[3]): Calculate
            // the butterfly with two specified points; x[0], x[2]
            // are the real parts of the points; x[1], x[3] are
            // the imaginary parts of the points.
            butterfly_cal(x[2*bu], x[2*bu + 1],
                          x[2*(bu + n2)], x[2*(bu + n2) + 1]);
        }
    }

    // The Kernel of the tree structure
    // If there are r pairs of registers, denoted as
    // Reg_real[0:(r-1)] and Reg_im[0:(r-1)],
    // to be used, (r-1) butterflies can be computed.
    // Reg_real and Reg_im are for real and imaginary parts,
    // respectively. Intermediate results can be saved in the given
    // registers, rather than writing them back to the memory.
    for (proc = N >> n5; proc > 1; proc >>= j) {
        int n4 = proc >> j;
        int index = 0;

        for (group = 0; group < (1 << (n1 - n3)); group++) {
            // Fetch the points from memory to registers
            for (i = 0; i < r; i++) {
                Reg_real[i] = x[2*index];
                Reg_im[i] = x[2*index + 1];
                index = index + n4;
            }
            m = r;
            // Calculate the butterflies: (r-1) in total
            for (i = 1; i <= j; i++) {   // r = 2^j -- levels
                p = m;
                m = m / 2;
                for (q = 0; q < r; q = q + p) {
                    butterfly_cal(Reg_real[q], Reg_im[q],
                                  Reg_real[q + m], Reg_im[q + m]);
                }
            }
            // Store the points back to the memory
            for (i = 0; i < r; i++) {
                x[2*index] = Reg_real[i];
                x[2*index + 1] = Reg_im[i];
                index = index + n4;
            }
        }
    }
}

Fig. 4 Tree traversal algorithm for section 2 computation in the algorithm shown in Fig. 2
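The register-level step of the listing, the loop that performs $(r - 1)$ butterflies on $r = 2^j$ points held in registers, can be isolated as below. This is a sketch under stated assumptions: the paper does not show the body of `butterfly_cal`, so a sum/difference butterfly with twiddle factor $W_N^0 = 1$ is assumed; the names `butterfly_w0` and `w0_tree_in_registers` are hypothetical; and plain local arrays stand in for the register pairs Reg_real and Reg_im.

```c
#include <assert.h>

/* Assumed 2-point butterfly with twiddle factor W_N^0 = 1: only an
 * addition and a subtraction per component, no multiplication. */
static void butterfly_w0(float *ar, float *ai, float *br, float *bi)
{
    float sr = *ar + *br, si = *ai + *bi;   /* sum        */
    float dr = *ar - *br, di = *ai - *bi;   /* difference */
    *ar = sr; *ai = si;
    *br = dr; *bi = di;
}

/* Apply the levels of W^0 butterflies to r = 2^j points kept in
 * reg_re/reg_im, following the p/m/q index pattern of the Fig. 4 loop.
 * Returns the number of butterflies performed, which is r - 1. */
int w0_tree_in_registers(float reg_re[], float reg_im[], int r)
{
    assert(r >= 2 && (r & (r - 1)) == 0);
    int count = 0;
    /* p: spacing between butterflies at this level; m: distance between
     * the two inputs of each butterfly. */
    for (int p = r, m = r / 2; m >= 1; p = m, m /= 2) {
        for (int q = 0; q < r; q += p) {
            butterfly_w0(&reg_re[q], &reg_im[q],
                         &reg_re[q + m], &reg_im[q + m]);
            count++;
        }
    }
    return count;
}
```

With $r = 4$ the levels perform 1 and then 2 butterflies, so three butterflies complete between one fetch and one store of the four points, with no intermediate memory traffic.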
[Figure: word addresses and twiddle factors. Word address 0: $W_{16}^0$, $W_{16}^2$, $W_{16}^4$, $W_{16}^6$; word address 1: $W_{16}^1$, $W_{16}^3$, $W_{16}^5$, $W_{16}^7$. Labels: Group 2, Group 5.]

In this section, we present a variety of details that should be considered in the implementation of the proposed algorithm on various platforms.

The idea of our proposed twiddle-factor-based FFT algorithm can, in fact, be borrowed to modify many existing FFT algorithms to squeeze out the redundant