6.1 Block Processing

A block of samples of the filter input and desired output is collected and then processed together to obtain a block of output samples. A good measure of computational complexity in a block processing system is the number of operations required to process one block of data divided by the block length. We will introduce a computationally efficient implementation of the block LMS (BLMS) algorithm and a fast BLMS in the frequency domain. Fig. 1 shows a schematic of a block processing system.
Fig. 1: Block processing system: input data → Input Buffer (S/P) → Parallel Processor → Output Buffer (P/S) → output data.
6.2 The Block LMS (BLMS) Algorithm

The block LMS (BLMS) algorithm works on the basis of the following strategy. The filter tap weights are updated once after the collection of every block of data samples. The gradient vectors, 2e(n)X(n), used to update the filter tap weights are calculated during the current block. Using k to denote the block index, the BLMS recursion is obtained as
W(k+1) = W(k) + (2μ_B/L) Σ_{i=0}^{L−1} e(kL+i) X(kL+i)        ...(2)

where L is the block length and μ_B is the step-size parameter. The error samples of the k-th block are

e(kL+i) = d(kL+i) − y(kL+i),   i = 0, 1, ..., L−1        ...(3)

Collecting the input vectors of the k-th block into the L×N matrix

X(k) = [X(kL)  X(kL+1)  ...  X(kL+L−1)]^T        ...(4)

and defining the length-L vectors

d(k) = [d(kL)  d(kL+1)  ...  d(kL+L−1)]^T        ...(5)

y(k) = [y(kL)  y(kL+1)  ...  y(kL+L−1)]^T = X(k)W(k)        ...(6)

e(k) = [e(kL)  e(kL+1)  ...  e(kL+L−1)]^T = d(k) − y(k)        ...(7)

the accumulated gradient term can be written compactly as

Σ_{i=0}^{L−1} e(kL+i) X(kL+i) = X^T(k) e(k)        ...(8)

so that the BLMS recursion becomes

W(k+1) = W(k) + (2μ_B/L) X^T(k) e(k)        ...(9)

Equations (7), (8), and (9) define one iteration of the BLMS algorithm.
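As a concrete illustration, one BLMS iteration, eqs. (6)-(9), can be sketched in NumPy as below. The function name, the signal layout, and the identification experiment in the usage note are our own illustrative choices, not part of the text:

```python
import numpy as np

def blms_block_update(W, x_hist, d_blk, mu_B):
    """One BLMS iteration, eqs. (6)-(9).

    W      : current tap-weight vector W(k), length N
    x_hist : input samples x(kL-N+1) ... x(kL+L-1), length N+L-1
    d_blk  : desired samples d(kL) ... d(kL+L-1), length L
    mu_B   : block step-size parameter mu_B
    """
    N, L = len(W), len(d_blk)
    # Rows of X(k): the input vectors X(kL+i) = [x(kL+i) ... x(kL+i-N+1)]
    X = np.array([x_hist[i : i + N][::-1] for i in range(L)])
    y = X @ W                                # eq. (6): block of filter outputs
    e = d_blk - y                            # eq. (7): block error vector
    W_new = W + (2 * mu_B / L) * (X.T @ e)   # eq. (9): one update per block
    return W_new, y, e
```

Run over successive blocks of a noise-free system-identification problem, the weight vector converges to the unknown filter, as the remarks below predict.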
l
Remarks (1)The convergence behavior of the BLMS is governed by the eigenvalues of the correlation matrix R = E [ X (n ) X T (n)] . (2)The BLMS algorithm has N modes of convergence which are characterized by the time constants
B ,i = 1 for i = 0,1,..., N 1 4 B i where i s are eigenvalue s of R.
The time constants are in the unit of iteration (block) interval (3)Averaging the instantaneous gradient vectors as was done in BLMS algorithm results in gradient vector with a lower variance, as compared with that in the conventional LMS algorithm. For block length L, comparable or less than the filter length, N, the misadjustment, MB, of the BLMS can be approximated by the following expression :
MB B tr[R] L ...(10)
If we let M B M , where M denotes the misadjustment of the conventional LMS algorithm, we obtain
B ,i = = 1 4 L i 1 4 i block interval sample interval
Thus, the convergence behavior of BLMS and the conventional LMS algorithm are the same.
6.3 Mathematical Background for Studying the BLMS Algorithm

6.3.1 Linear Convolution Using the DFT

Consider the filtering of a sequence x(n) through an FIR filter with coefficients w_0, w_1, ..., w_{N−1}:
y(n) = Σ_{i=0}^{N−1} w_i x(n−i)

where N is the filter length. When N is large, this convolution can be computed efficiently using the DFT. Define a column vector x~(k) of length N′ = N + L − 1, where L is the block length, as

x~(k) = [x(kL−N+1)  x(kL−N+2)  ...  x(kL+L−1)]^T        ...(11)
Zero-padding the weight vector to length N′,

W~(k) = [w_0(k)  w_1(k)  ...  w_{N−1}(k)  0  ...  0]^T        ...(12)

and letting X_c(k) denote the N′×N′ circular matrix whose first column is x~(k)        ...(13)

the circular convolution of the block with the filter is

y_c(k) = X_c(k) W~(k) = [*  *  ...  *  y(kL)  y(kL+1)  ...  y(kL+L−1)]^T        ...(14)

In eq. (14), the elements represented by * correspond to circular-convolution results, which do not coincide with linear-convolution samples; the last L elements are the desired linear-convolution outputs.
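The block-wise linear convolution of eqs. (11)-(14) can be checked numerically. The sketch below is a generic overlap-save routine (the function name and test signals are ours); it keeps only the last L samples of each circular convolution, exactly the elements not marked * in eq. (14):

```python
import numpy as np

def overlap_save(x, w, L):
    """Linear convolution y(n) = sum_i w_i x(n-i) via the overlap-save
    method: blocks of N' = N+L-1 samples, of which L are new and N-1
    are carried over from the previous block."""
    N = len(w)
    Np = N + L - 1                              # FFT length N' = N + L - 1
    Wf = np.fft.fft(w, Np)                      # zero-padded filter spectrum
    xp = np.concatenate([np.zeros(N - 1), x, np.zeros(L)])
    y = []
    for k in range(0, len(x), L):
        blk = xp[k : k + Np]                    # N-1 saved samples + L new ones
        yc = np.fft.ifft(np.fft.fft(blk, Np) * Wf).real
        y.extend(yc[N - 1 :])                   # discard the N-1 '*' samples
    return np.array(y)[: len(x)]
```

The result matches direct linear convolution sample for sample, including when the signal length is not a multiple of L.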
The procedure expressed in eq. (14) is commonly known as the overlap-save method. This name reflects the fact that each input block x~(k) consists of L new samples and N−1 samples saved (overlapped) from the previous block. Another method for the computation of linear convolution using the DFT is the overlap-add method; however, the overlap-add method has been found to be computationally less efficient than the overlap-save method for this application.

6.3.2 Circular Matrices
A_c = [ a_0      a_{M−1}  ...  a_1
        a_1      a_0      ...  a_2
        ...
        a_{M−1}  a_{M−2}  ...  a_0 ]        ...(15)

In A_c, each row (column) is obtained by circularly shifting the previous row (column) by one element.
One important property of circular matrices is that such matrices are diagonalized by DFT matrices. That is, if F is the M×M DFT matrix defined as

F = [ 1   1                 1                 ...  1
      1   e^{−j2π/M}        e^{−j4π/M}        ...  e^{−j2π(M−1)/M}
      1   e^{−j4π/M}        e^{−j8π/M}        ...  e^{−j4π(M−1)/M}
      ...
      1   e^{−j2π(M−1)/M}   e^{−j4π(M−1)/M}   ...  e^{−j2π(M−1)²/M} ]        ...(16)

then
A_F = F A_c F^{−1}        ...(17)

is a diagonal matrix. Furthermore, the diagonal elements of A_F correspond to the DFT of the first column of A_c, i.e.,

A_F = diag[a_F],  where a_F = F a  and  a = [a_0  a_1  ...  a_{M−1}]^T is the first column of A_c        ...(18)
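This diagonalization property is easy to verify numerically; the sketch below builds a random 8×8 circular matrix (our own test data) and checks eqs. (17)-(18):

```python
import numpy as np

rng = np.random.default_rng(2)
M = 8
a = rng.standard_normal(M)          # first column of A_c

# A_c: each column is the previous one circularly shifted by one element
Ac = np.column_stack([np.roll(a, k) for k in range(M)])

F = np.fft.fft(np.eye(M))           # the M x M DFT matrix of eq. (16)
AF = F @ Ac @ np.linalg.inv(F)      # eq. (17)

# A_F is diagonal, and its diagonal is the DFT of the first column of A_c
assert np.allclose(AF, np.diag(np.fft.fft(a)), atol=1e-9)
```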
This can be proved directly (see p. 255).

6.3.3 Window Matrices and Matrix Formulation of the Overlap-Save Method
Let y_c(k) denote the column vector that appears on the left-hand side of eq. (14). That is,

y_c(k) = [*  *  ...  *  y(kL)  y(kL+1)  ...  y(kL+L−1)]^T

We can see that y~(k) can be obtained from y_c(k) by replacing the elements denoted * with zeros. This substitution can be written in the form of a matrix-vector product as

y~(k) = P_{0,L} y_c(k)        ...(21)

where

P_{0,L} = [ 0   0
            0   I_L ]        ...(22)

with I_L being the L×L identity matrix and the 0's zero matrices of appropriate dimensions.
Thus, we obtain

y~(k) = P_{0,L} X_c(k) W~(k)        ...(23)

Applying the diagonalization property of circular matrices to X_c(k), define

X_F(k) = F X_c(k) F^{−1}        ...(26)

Note that X_F(k) is the diagonal matrix consisting of the elements of the DFT of the first column of X_c(k), since X_c(k) is a circular matrix. We also note that the first column of X_c(k) is the input vector x~(k). Therefore, writing W_F(k) = F W~(k), we obtain

y~(k) = P_{0,L} F^{−1} X_F(k) W_F(k)        ...(27)
6.4 The FBLMS Algorithm

The fast BLMS (FBLMS) algorithm is a computationally efficient implementation of the BLMS algorithm in the frequency domain.
Equation (27), y~(k) = P_{0,L} F^{−1} X_F(k) W_F(k), corresponds to the filtering part of the BLMS algorithm, where X_c(k) is the circular matrix of samples of the filter input. The BLMS update, eq. (9), can likewise be written in terms of the length-N′ quantities. Defining the error vector with N−1 leading zeros,

e~(k) = [0  0  ...  0  e(kL)  e(kL+1)  ...  e(kL+L−1)]^T

the tap-weight update may be written as

W~(k+1) = W~(k) + 2μ P_{N,0} X_c^T(k) e~(k)        ...(30)

where μ = μ_B / L, and

P_{N,0} = [ I_N  0
            0    0 ]        ...(31)

P_{N,0} is an N′×N′ windowing matrix which ensures that the last L−1 elements of the updated weight vector W~(k+1) remain equal to zero after each iteration of eq. (30).
The equivalence between eq. (9) and eq. (30) can be verified easily. Conversion of eq. (30) to its frequency-domain equivalent is done by premultiplying both sides by the DFT matrix F and using the identity F^{−1}F = I to obtain

W_F(k+1) = W_F(k) + 2μ (F P_{N,0} F^{−1})(F X_c^T(k) F^{−1})(F e~(k))        ...(32)

Since F X_c^T(k) F^{−1} = X_F^*(k) for real-valued input, this becomes

W_F(k+1) = W_F(k) + 2μ 𝒫_{N,0} X_F^*(k) e_F(k)        ...(33)

where e_F(k) = F e~(k) and

𝒫_{N,0} = F P_{N,0} F^{−1}        ...(34)
Fig. 8.3 depicts a block diagram of the FBLMS algorithm. The related equations are

y~(k) = P_{0,L} F^{−1} X_F(k) W_F(k)
e~(k) = d~(k) − y~(k)
W_F(k+1) = W_F(k) + 2μ 𝒫_{N,0} X_F^*(k) e_F(k)
6.4.1 The Unconstrained FBLMS Algorithm

It has been shown that, under fairly mild conditions, the FBLMS algorithm can work well even when the tap-weight constraining matrix 𝒫_{N,0} is dropped from eq. (33). That is, the recursion

W_F(k+1) = W_F(k) + 2μ X_F^*(k) e_F(k)        ...(35)

converges to the same set of tap weights, provided that N is chosen sufficiently large and the input process, x(n), does not satisfy certain specific conditions that are unlikely to occur in practice.
Figure 8.3 shows the block diagram of the constrained FBLMS algorithm. It is easily converted to the unconstrained FBLMS algorithm if the gradient-constraining operation, enclosed by the dotted-line box, is dropped.

6.4.2 Convergence Behavior of the FBLMS Algorithm
Since the first N−1 elements of d~(k) are all zero, we have d~(k) = P_{0,L} d~(k). Using this in eq. (36), we obtain

e~(k) = P_{0,L} (d~(k) − F^{−1} X_F(k) W_F(k))
      = P_{0,L} F^{−1} (F d~(k) − X_F(k) W_F(k))        ...(37)

where d_F(k) = F d~(k) and

𝒫_{0,L} = F P_{0,L} F^{−1}        ...(39)

Then we obtain

W_F(k+1) = W_F(k) + 2μ X_F^*(k) [𝒫_{0,L} (d_F(k) − X_F(k) W_F(k))]        ...(40)

Defining the weight-error vector

v_F(k) = W_F(k) − W_{0,F}        ...(41)

where W_{0,F} is the optimum value of the filter tap-weight vector in the frequency domain, and using eq. (41) in eq. (40), we can obtain

v_F(k+1) = (I − 2μ X_F^*(k) 𝒫_{0,L} X_F(k)) v_F(k) + 2μ X_F^*(k) 𝒫_{0,L} e_{0,F}(k)        ...(42)

where e_{0,F}(k) is the optimum error vector obtained when W_F(k) is replaced by W_{0,F}. It can be shown (details omitted) that, for coloured inputs, the unconstrained FBLMS algorithm will perform poorly, just as the conventional LMS algorithm does. The same is true for the constrained FBLMS algorithm.

6.4.3 Step-Normalization
The convergence behavior of the FBLMS algorithm can be greatly improved by using an individually normalized step-size parameter for each element of the tap-weight vector W_F(k), rather than a common step-size parameter. This technique is known as step-normalization; it is similar to the one used for improving the convergence of the transform-domain LMS algorithm.

Step-normalization is implemented by replacing the scalar step-size parameter with the diagonal matrix

μ(k) = diag[μ_0(k)  μ_1(k)  ...  μ_{N′−1}(k)]        ...(43)

where μ_i(k) is the normalized step-size parameter for the i-th tap. These parameters are obtained as follows.
μ_i(k) = μ_0 / σ²_{x_F,i}(k)        ...(44)

where μ_0 is a constant and σ²_{x_F,i}(k) is an estimate of the power of the i-th element of the frequency-domain input, x_{F,i}(k). These estimates may be obtained using the recursion

σ²_{x_F,i}(k) = γ σ²_{x_F,i}(k−1) + (1 − γ) |x_{F,i}(k)|²        ...(45)
where γ, 0 < γ < 1, is chosen close to 1.

6.4.4 Summary of the FBLMS Algorithm

Table 8.1:
Input:   Tap-weight vector, W_F(k)
         Signal power estimates, σ²_{x_F,i}(k−1)
         Desired output vector, d(k) = [d(kL)  d(kL+1)  ...  d(kL+L−1)]^T
Output:  Filter output, y(k) = [y(kL)  y(kL+1)  ...  y(kL+L−1)]^T
         Tap-weight vector update, W_F(k+1)
Note:    N: filter length; L: block length

(1) Filtering:
    x_F(k) = FFT(x~(k))
    y(k) = the last L elements of IFFT(x_F(k) ⊙ W_F(k))
    Here ⊙ denotes element-by-element multiplication of vectors.

(2) Error estimation:
    e(k) = d(k) − y(k)
    e_F(k) = FFT of e(k) preceded by N−1 zeros

(3) Step-normalization:
    σ²_{x_F,i}(k) = γ σ²_{x_F,i}(k−1) + (1 − γ) |x_{F,i}(k)|²
    μ_i(k) = μ_0 / σ²_{x_F,i}(k)
    μ(k) = [μ_0(k)  μ_1(k)  ...  μ_{N′−1}(k)]^T

(4) Tap-weight adaptation:
    W_F(k+1) = W_F(k) + 2 μ(k) ⊙ x_F^*(k) ⊙ e_F(k)

(5) Tap-weight constraint:
    W_F(k+1) ← FFT of (the first N elements of IFFT(W_F(k+1)), followed by L−1 zeros)

This last step is applicable only to the constrained FBLMS algorithm.
Note: The algorithm in Table 8.1 is applicable to both real- and complex-valued signals.

6.4.5 Selection of the Block Length
Block processing of signals, in general, results in a certain time delay at the system output. In many applications this processing delay may be intolerable and hence has to be minimized. In applications where the processing delay is not an issue, L is usually chosen to be of the same order as N. The exact value of L depends on N: for a given N, one should choose L so that N′ = N + L − 1 is an appropriate composite number for which efficient FFT and IFFT algorithms exist. On the other hand, in applications where it is important to keep the processing delay small, one may need to strike a compromise between system complexity and processing delay. In the following, we discuss an alternative implementation of the FBLMS algorithm, the partitioned FBLMS algorithm, which is found to be more efficient in this respect.
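The FBLMS steps summarized in Table 8.1 can be sketched end to end in NumPy. The function below is a minimal sketch under our own test setup, limited to real-valued signals: it supports the constrained and unconstrained variants and optional step-normalization. The function name, the flag names, and the small bias term eps (added to avoid division by zero; not part of the text) are our own choices:

```python
import numpy as np

def fblms(x, d, N, L, mu0, constrained=True, normalize=False,
          gamma=0.9, eps=1e-8):
    """Fast block LMS in overlap-save form, FFT size N' = N + L - 1.
    Real-valued signals assumed."""
    Np = N + L - 1
    WF = np.zeros(Np, dtype=complex)              # frequency-domain weights
    p_est = np.ones(Np)                           # per-bin power estimates
    xp = np.concatenate([np.zeros(N - 1), x])
    nblk = len(x) // L
    y = np.zeros(nblk * L)
    for k in range(nblk):
        xF = np.fft.fft(xp[k * L : k * L + Np])       # FFT of x~(k)
        y_blk = np.fft.ifft(xF * WF).real[N - 1 :]    # last L samples: y(k)
        e_blk = d[k * L : (k + 1) * L] - y_blk        # block error e(k)
        eF = np.fft.fft(np.concatenate([np.zeros(N - 1), e_blk]))
        if normalize:                                  # eqs. (44)-(45)
            p_est = gamma * p_est + (1 - gamma) * np.abs(xF) ** 2
            mu = mu0 / (p_est + eps)
        else:
            mu = mu0
        g = mu * np.conj(xF) * eF                     # frequency-domain gradient
        if constrained:                               # constraint step (5)
            gt = np.fft.ifft(g).real
            gt[N:] = 0.0                              # zero the last L-1 taps
            g = np.fft.fft(gt)
        WF = WF + 2 * g
        y[k * L : (k + 1) * L] = y_blk
    w = np.fft.ifft(WF).real[:N]                      # time-domain weights
    return y, w
```

With a white input and a noise-free desired signal, the constrained version converges to the exact plant response; per the discussion in 6.4.1, the unconstrained version behaves comparably when N is sufficiently large.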
6.5 The Partitioned FBLMS (PFBLMS) Algorithm

When N is large and a block length L much smaller than N is used, an efficient implementation of the FBLMS algorithm can be derived by dividing (partitioning) the convolution sum, y(n) = Σ_{i=0}^{N−1} w_i x(n−i), into a number of smaller sums and processing these as discussed below. The resulting implementation is called the partitioned FBLMS (PFBLMS) algorithm.
Let us assume that N = P·M, where P and M are integers. The convolution sum, y(n) = Σ_{i=0}^{N−1} w_i x(n−i), may then be written as

y(n) = Σ_{l=0}^{P−1} y_l(n)        ...(46)

where

y_l(n) = Σ_{i=0}^{M−1} w_{i+lM} x(n−lM−i)        ...(47)
We choose L = M and divide the input data into blocks of 2M samples such that the last M samples of, say, the k-th block are the same as the first M samples of the (k+1)-th block. The convolution sums in eq. (47) can then be evaluated by circularly convolving these data blocks with the corresponding weight vectors, each padded with M zeros.
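Before turning to the frequency-domain form, the partitioning of eqs. (46)-(47) itself can be verified directly (the partition sizes and signals below are our own test values):

```python
import numpy as np

rng = np.random.default_rng(4)
P, M = 4, 8
N = P * M                                   # N = P*M filter taps
w = rng.standard_normal(N)
x = rng.standard_normal(200)

# Full convolution: y(n) = sum_{i=0}^{N-1} w_i x(n-i)
y_full = np.convolve(x, w)[: len(x)]

# Partitioned form, eqs. (46)-(47): partition l applies the M taps
# w_{lM} ... w_{lM+M-1} to the input delayed by l*M samples
y_part = np.zeros(len(x))
for l in range(P):
    x_del = np.concatenate([np.zeros(l * M), x])[: len(x)]
    y_part += np.convolve(x_del, w[l * M : (l + 1) * M])[: len(x)]

assert np.allclose(y_full, y_part)
```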
Using x(kM+M−1) to denote the newest input sample, we define the vectors

x_F,l(k) = FFT([x((k−l)M−M)  x((k−l)M−M+1)  ...  x((k−l)M+M−1)]^T)        ...(48)

W_F,l(k) = FFT([w_lM(k)  w_{lM+1}(k)  ...  w_{lM+M−1}(k)  0  0  ...  0]^T)   (padded with M zeros)        ...(49)

y_l(k) = [y_l(kM)  y_l(kM+1)  ...  y_l(kM+M−1)]^T        ...(50)
The partition outputs are combined as

y(k) = Σ_{l=0}^{P−1} y_l(k)        ...(53)

Since each y_l(k) is obtained as the last M elements of IFFT(x_F,l(k) ⊙ W_F,l(k)), and the IFFT is a linear operation, the P inverse transforms can be merged into one:

y(k) = the last M elements of IFFT( Σ_{l=0}^{P−1} W_F,l(k) ⊙ x_F,l(k) )        ...(55)

Using this result, the block diagram of the PFBLMS algorithm may be depicted as in Fig. 8.5. The combined structure of the vectors W_F,0(k), W_F,1(k), ..., W_F,P−1(k) may be viewed as a set of transversal filters, each of length P, with the j-th filter processing the frequency-domain samples belonging to the j-th frequency bin, for j = 0, 1, 2, ..., M−1.
The tap-weight update recursions for the partitions are

W_F,l(k+1) = W_F,l(k) + 2 μ(k) ⊙ x_F,l^*(k) ⊙ e_F(k),   l = 0, 1, ..., P−1        ...(56)

where μ(k) is the vector of the associated step-size parameters, which may be normalized, and

e_F(k) = FFT([ 0^T  (d(k) − y(k))^T ]^T)        ...(57)

where d(k) = [d(kM)  d(kM+1)  ...  d(kM+M−1)]^T and 0 is the length-M zero column vector.

Equation (56) is the unconstrained PFBLMS algorithm. The constrained PFBLMS algorithm recursion is obtained by constraining the filter tap weights after every iteration of eq. (56).
[Figure: the input is divided into overlapping data blocks of 2M samples; the last M samples of the k-th block coincide with the first M samples of the (k+1)-th block.]
6.5.1 PFBLMS Algorithm with M = pL

Choose M = pL, where p is an integer. As we show later, this choice of L and M leads to an efficient implementation of the PFBLMS algorithm.
Note that if we use the DFT to compute the outputs of the various partitions in eq. (53), y(k) = Σ_l y_l(k), then x~_0(k) corresponds to the vector of input samples associated with the first partition, i.e. y_0(n) in eq. (53), with n = kL + L − 1. Observe that the first element of x~_0(k) is x(kL − M). Similarly, the vectors corresponding to the subsequent partitions start with the samples

x(kL − 2M) = x((k − p)L − M)
x(kL − 3M) = x((k − 2p)L − M)

and so on, since M = pL. We thus find that

x_F,l(k) = x_F,0(k − pl),   for l = 1, 2, ..., P−1        ...(59)

Fig. 8.6 depicts an implementation of the PFBLMS algorithm when M = pL. The difference between Fig. 8.5 and Fig. 8.6 is that each delay in Fig. 8.5 is replaced by p delay units in Fig. 8.6.
Summary of the PFBLMS algorithm (M = pL):
Input:   Past frequency-domain input vectors, x_F,0(k − i), for i = 1, 2, ..., (P−1)p
         Desired output vector, d(k) = [d(kL)  d(kL+1)  ...  d(kL+L−1)]^T
Output:  Filter output, y(k) = [y(kL)  y(kL+1)  ...  y(kL+L−1)]^T
         Tap-weight vector updates, W_F,l(k+1), l = 0, 1, 2, ..., P−1

(1) Filtering:
    x_F,0(k) = FFT(x~_0(k))
    y(k) = the last L elements of IFFT( Σ_{l=0}^{P−1} W_F,l(k) ⊙ x_F,0(k − pl) )

(2) Error estimation:
    e_F(k) = FFT of d(k) − y(k) preceded by M zeros

(3) Step-normalization:
    μ_i(k) = μ_0 / σ²_{x_F,0,i}(k)
    μ(k) = [μ_0(k)  μ_1(k)  ...  μ_{M′−1}(k)]^T

(4) Tap-weight adaptation:
    W_F,l(k+1) = W_F,l(k) + 2 μ(k) ⊙ x_F,0^*(k − pl) ⊙ e_F(k),   l = 0, 1, ..., P−1
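The steps above can be sketched for the simple case L = M (i.e. p = 1), where the frequency-domain delay line advances one position per block. The function name and the test setup are our own; step-normalization is omitted for brevity:

```python
import numpy as np

def pfblms(x, d, M, P, mu, constrained=True):
    """Partitioned FBLMS with L = M: P partitions of M taps each,
    FFT size 2M. Real-valued signals assumed."""
    WF = np.zeros((P, 2 * M), dtype=complex)    # W_{F,l}(k), l = 0..P-1
    xF = np.zeros((P, 2 * M), dtype=complex)    # delay line of x_{F,0}(k-l)
    xp = np.concatenate([np.zeros(M), x])
    nblk = len(x) // M
    y = np.zeros(nblk * M)
    for k in range(nblk):
        xF = np.roll(xF, 1, axis=0)
        xF[0] = np.fft.fft(xp[k * M : k * M + 2 * M])     # newest 2M-sample block
        y_blk = np.fft.ifft((WF * xF).sum(axis=0)).real[M:]   # eq. (55)
        e_blk = d[k * M : (k + 1) * M] - y_blk
        eF = np.fft.fft(np.concatenate([np.zeros(M), e_blk]))  # eq. (57)
        for l in range(P):
            g = np.conj(xF[l]) * eF                        # eq. (56) gradient
            if constrained:
                gt = np.fft.ifft(g).real
                gt[M:] = 0.0                               # zero the padded half
                g = np.fft.fft(gt)
            WF[l] = WF[l] + 2 * mu * g
        y[k * M : (k + 1) * M] = y_blk
    w = np.concatenate([np.fft.ifft(WF[l]).real[:M] for l in range(P)])
    return y, w
```

On a noise-free identification problem with a white input, the constrained version recovers all P·M taps of the unknown filter.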
6.5.2 Computational Complexity
The computational complexity of the constrained PFBLMS algorithm depends on how the constraining step is implemented.
In the implementation of the unconstrained PFBLMS algorithm, the processing of each data block requires two (M+L)-point FFTs and one IFFT of the same length. Noting that L output samples are generated at the end of each block-processing interval, we obtain the per-sample computational complexity of the unconstrained PFBLMS algorithm as

C = (p + 1)P + (3(p + 1)/4) log2((p + 1)L / 2)
The memory requirements of the unconstrained and constrained PFBLMS algorithms are about the same. The number of memory words required to implement the PFBLMS algorithm is approximately

S = 2(p + 1)LP words