
Chapter 6 Block Implementation of Adaptive Filters

6.1 Block Processing of Data Samples

A block of samples of the filter input and desired output is collected and then processed together to obtain a block of output samples. A good measure of computational complexity in a block processing system is the number of operations required to process one block of data, divided by the block length. In this chapter we introduce a computationally efficient implementation of the block LMS (BLMS) algorithm and a fast BLMS algorithm in the frequency domain. Fig. 1 shows a schematic of a block processing system.
Fig. 1: A block processing system: input data → input buffer (S/P) → parallel processor → output buffer (P/S) → output data.
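As a concrete illustration, the following Python sketch (NumPy assumed; the function names are illustrative, not from the text) mimics Fig. 1: samples are buffered into length-L blocks, each block is handed to a "parallel processor" routine, and the resulting blocks are serialized again.

```python
import numpy as np

def block_process(x, L, process_block):
    """Minimal serial-to-parallel / parallel-to-serial skeleton of Fig. 1.

    x             : 1-D array of input samples
    L             : block length
    process_block : function mapping one length-L input block to one
                    length-L output block (the "parallel processor")
    """
    n_blocks = len(x) // L
    y = np.zeros(n_blocks * L)
    for k in range(n_blocks):
        block = x[k * L:(k + 1) * L]                  # input buffer (S/P)
        y[k * L:(k + 1) * L] = process_block(block)   # output buffer (P/S)
    return y

# usage: y = block_process(x, L=64, process_block=lambda b: 2.0 * b)
```

If one block costs `ops` operations, the per-sample complexity is `ops / L`, in line with the measure given above.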

6.2 Block LMS Algorithm

In the conventional LMS algorithm,

$$W(n+1) = W(n) + 2\mu\, e(n)\, X(n) \qquad (1)$$

where $X(n) = [x(n)\; x(n-1)\; \ldots\; x(n-N+1)]^T$ is the input vector, $W(n) = [w_0(n)\; w_1(n)\; \ldots\; w_{N-1}(n)]^T$ is the tap-weight vector, $e(n) = d(n) - y(n)$ is the output error, $d(n)$ is the desired output, and $y(n) = W^T(n) X(n)$.

The block LMS (BLMS) algorithm works on the basis of the following strategy: the filter tap-weights are updated once after the collection of every block of data samples, and the gradient vectors $2e(n)X(n)$ used to update the tap-weights are accumulated over the current block. Using $k$ to denote the block index, the BLMS recursion is obtained as
$$W(k+1) = W(k) + \frac{2\mu_B}{L} \sum_{i=0}^{L-1} e(kL+i)\, X(kL+i) \qquad (2)$$

where $L$ is the block length and $\mu_B$ is the step-size parameter. Also,

$$e(kL+i) = d(kL+i) - y(kL+i), \qquad i = 0, 1, 2, \ldots, L-1 \qquad (3)$$


Define the matrix of regressors of the k-th block,

$$X(k) = [X(kL)\; X(kL+1)\; \ldots\; X(kL+L-1)]^T \qquad (4)$$

and the column vectors

$$d(k) = [d(kL)\; d(kL+1)\; \ldots\; d(kL+L-1)]^T$$
$$y(k) = [y(kL)\; y(kL+1)\; \ldots\; y(kL+L-1)]^T \qquad (5)$$
$$e(k) = [e(kL)\; e(kL+1)\; \ldots\; e(kL+L-1)]^T$$

Also, note that

$$y(k) = X(k)\, W(k) \qquad (6)$$
$$e(k) = d(k) - y(k) \qquad (7)$$
$$\sum_{i=0}^{L-1} e(kL+i)\, X(kL+i) = X^T(k)\, e(k) \qquad (8)$$

Then, we can obtain the recursion

$$W(k+1) = W(k) + \frac{2\mu_B}{L}\, X^T(k)\, e(k) \qquad (9)$$

Equations (6), (7) and (9) define one iteration of the BLMS algorithm.
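The following Python sketch is one possible realization of eqs. (6), (7) and (9); the function name and argument layout are illustrative assumptions, not part of the text, and zero prehistory is assumed for the input.

```python
import numpy as np

def blms(x, d, N, L, mu_B):
    """BLMS adaptation, eqs. (6), (7), (9).

    x, d : input and desired-output sequences (len(x) >= len(d))
    N    : filter length, L : block length, mu_B : step-size parameter
    """
    w = np.zeros(N)
    x_pad = np.concatenate([np.zeros(N - 1), x])   # zero prehistory
    for k in range(len(d) // L):
        # rows of X(k): regressors X(kL+i) = [x(kL+i), ..., x(kL+i-N+1)]
        X = np.array([x_pad[k*L + i:k*L + i + N][::-1] for i in range(L)])
        y = X @ w                                  # eq. (6)
        e = d[k*L:k*L + L] - y                     # eq. (7)
        w = w + (2 * mu_B / L) * (X.T @ e)         # eq. (9)
    return w
```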
Remarks:
(1) The convergence behavior of the BLMS algorithm is governed by the eigenvalues of the correlation matrix $R = E[X(n) X^T(n)]$.
(2) The BLMS algorithm has N modes of convergence, characterized by the time constants

$$\tau_{B,i} = \frac{1}{4 \mu_B \lambda_i}, \qquad i = 0, 1, \ldots, N-1$$

where the $\lambda_i$'s are the eigenvalues of $R$.

The time constants are in units of the iteration (block) interval.
(3) Averaging the instantaneous gradient vectors, as is done in the BLMS algorithm, results in a gradient vector with a lower variance than in the conventional LMS algorithm. For a block length L comparable to or less than the filter length N, the misadjustment $M_B$ of the BLMS algorithm can be approximated by the expression

$$M_B \approx \frac{\mu_B}{L}\, \mathrm{tr}[R] \qquad (10)$$

If we let $M_B = M$, where $M$ denotes the misadjustment of the conventional LMS algorithm, we obtain $\mu_B = L\mu$ and

$$\tau_{B,i} = \frac{1}{4 L \mu \lambda_i}\ \text{block intervals} = \frac{1}{4 \mu \lambda_i}\ \text{sample intervals}$$

Thus, the convergence behavior of the BLMS and the conventional LMS algorithms is the same.

6.3 Mathematical Background for Studying the BLMS Algorithm

6.3.1 Linear Convolution Using DFT

Consider the filtering of a sequence x(n) through an FIR filter with coefficients $w_0, w_1, \ldots, w_{N-1}$:

$$y(n) = \sum_{i=0}^{N-1} w_i\, x(n-i)$$

where N is the filter length. The computational complexity is N multiplications and N−1 additions per output sample.

When N is large, define a column vector $\tilde{x}(k)$ of length $N' = N + L - 1$ (L: block length) as

$$\tilde{x}(k) = [x(kL-N+1)\; x(kL-N+2)\; \ldots\; x(kL+L-1)]^T \qquad (11)$$

and $\tilde{W}(k)$ of length $N'$ as

$$\tilde{W}(k) = \begin{bmatrix} W(k) \\ 0 \end{bmatrix} \qquad (12)$$

where

$$W(k) = [w_0(k)\; w_1(k)\; \ldots\; w_{N-1}(k)]^T \qquad (13)$$

The circular convolution of $\tilde{W}(k)$ and $\tilde{x}(k)$ is then

$$\begin{bmatrix} * \\ \vdots \\ * \\ y(kL) \\ y(kL+1) \\ \vdots \\ y(kL+L-1) \end{bmatrix}
=
\begin{bmatrix}
x(kL-N+1) & x(kL+L-1) & \cdots & x(kL-N+2) \\
x(kL-N+2) & x(kL-N+1) & \cdots & x(kL-N+3) \\
\vdots & \vdots & & \vdots \\
x(kL+L-1) & x(kL+L-2) & \cdots & x(kL-N+1)
\end{bmatrix}
\begin{bmatrix} w_0(k) \\ w_1(k) \\ \vdots \\ w_{N-1}(k) \\ 0 \\ \vdots \\ 0 \end{bmatrix} \qquad (14)$$

In eq. (14), the elements represented by * correspond to circular convolution results which do not coincide with linear convolution samples.

The procedure expressed in eq. (14) is commonly known as the overlap-save method. The name reflects the fact that each input block $\tilde{x}(k)$ consists of L new samples and N−1 samples overlapped from the previous block. Another method for the computation of linear convolution using the DFT is the overlap-add method; however, it has been found to be computationally less efficient than the overlap-save method. A sketch of the overlap-save procedure using the FFT is given below.
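This is a minimal NumPy sketch of overlap-save filtering, assuming zero prehistory for the input; it implements the windowing of eq. (14) by discarding the first N−1 circular-convolution outputs of each block.

```python
import numpy as np

def overlap_save(x, w, L):
    """Overlap-save FIR filtering via the FFT, a sketch of eq. (14).

    x : input sequence, w : length-N tap vector, L : block length
    Returns y[n] = sum_i w[i] x[n-i] for n = 0 .. len(x)-1.
    """
    N = len(w)
    Np = N + L - 1                               # N' = N + L - 1
    W = np.fft.fft(w, Np)                        # zero-padded taps, eq. (12)
    x_pad = np.concatenate([np.zeros(N - 1), x])
    y = np.zeros(len(x))
    for k in range(0, len(x), L):
        blk = x_pad[k:k + Np]                    # N-1 saved + L new samples
        if len(blk) < Np:                        # pad the final short block
            blk = np.concatenate([blk, np.zeros(Np - len(blk))])
        yc = np.fft.ifft(np.fft.fft(blk) * W)    # circular convolution
        out = yc[N - 1:N - 1 + L].real           # drop the * entries
        y[k:k + L] = out[:len(y[k:k + L])]
    return y

# check against direct convolution:
# np.allclose(overlap_save(x, w, 8), np.convolve(x, w)[:len(x)])
```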
6.3.2 Circular Matrices

Consider the M×M circular matrix

$$A_c = \begin{bmatrix}
a_0 & a_{M-1} & a_{M-2} & \cdots & a_1 \\
a_1 & a_0 & a_{M-1} & \cdots & a_2 \\
\vdots & \vdots & \vdots & & \vdots \\
a_{M-2} & a_{M-3} & a_{M-4} & \cdots & a_{M-1} \\
a_{M-1} & a_{M-2} & a_{M-3} & \cdots & a_0
\end{bmatrix} \qquad (15)$$

In $A_c$, each row (column) is obtained by circularly shifting the previous row (column) by one element.
One important property of circular matrices is that they are diagonalized by DFT matrices. That is, let F be the M×M DFT matrix

$$F = \begin{bmatrix}
1 & 1 & 1 & \cdots & 1 \\
1 & e^{-j2\pi/M} & e^{-j4\pi/M} & \cdots & e^{-j2\pi(M-1)/M} \\
1 & e^{-j4\pi/M} & e^{-j8\pi/M} & \cdots & e^{-j4\pi(M-1)/M} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & e^{-j2\pi(M-1)/M} & e^{-j4\pi(M-1)/M} & \cdots & e^{-j2\pi(M-1)^2/M}
\end{bmatrix} \qquad (16)$$

Then

$$A_F = F A_c F^{-1} \qquad (17)$$

is a diagonal matrix. Furthermore, the diagonal elements of $A_F$ correspond to the DFT of the first column of $A_c$, i.e.

$$A_F = \mathrm{diag}[a_F] \qquad (18)$$

where $a_F = Fa$ and $a = [a_0\; a_1\; \ldots\; a_{M-1}]^T$ is the first column of $A_c$. This can be proved (see p. 255); a quick numerical check is given below.
6.3.3 Window Matrices and Matrix Formulation of the Overlap-Save Method

Define the N'×N' circular matrix, with N' = L + N − 1, as

$$X_c(k) = \begin{bmatrix}
x(kL-N+1) & x(kL+L-1) & \cdots & x(kL-N+2) \\
x(kL-N+2) & x(kL-N+1) & \cdots & x(kL-N+3) \\
\vdots & \vdots & & \vdots \\
x(kL+L-1) & x(kL+L-2) & \cdots & x(kL-N+1)
\end{bmatrix} \qquad (19)$$

Also, define the length-N' column vector

$$\tilde{y}(k) = \begin{bmatrix} 0 \\ y(k) \end{bmatrix} \qquad (20)$$

where $y(k) = [y(kL)\; y(kL+1)\; \ldots\; y(kL+L-1)]^T$ and 0 is the length-(N−1) zero vector.

Let $y_c(k)$ denote the column vector that appears on the left-hand side of eq. (14), that is,

$$y_c(k) = [*\; *\; \cdots\; *\; y(kL)\; y(kL+1)\; \cdots\; y(kL+L-1)]^T$$

We can see that $\tilde{y}(k)$ is obtained from $y_c(k)$ by replacing its first N−1 elements with zeros. This substitution can be written in the form of the matrix-vector product

$$\tilde{y}(k) = P_{0,L}\, y_c(k) \qquad (21)$$

where $P_{0,L}$ is the N'×N' windowing matrix defined as

$$P_{0,L} = \begin{bmatrix} 0 & 0 \\ 0 & I_L \end{bmatrix} \qquad (22)$$

with $I_L$ being the L×L identity matrix and the 0's zero matrices of appropriate dimensions.

Thus, we obtain

$$\tilde{y}(k) = P_{0,L}\, X_c(k)\, \tilde{W}(k) \qquad (23)$$

Eq. (23) may be written as

$$\tilde{y}(k) = P_{0,L}\, F^{-1} \big(F X_c(k) F^{-1}\big) \big(F \tilde{W}(k)\big) \qquad (24)$$

where $\tilde{y}(k)$ is the extended output vector of length N'

and F is the N'×N' DFT matrix. Next, define

$$W_F(k) = F\, \tilde{W}(k) \qquad (25)$$

and

$$X_F(k) = F X_c(k) F^{-1} \qquad (26)$$

Note that $X_F(k)$ is the diagonal matrix whose diagonal consists of the DFT of the first column of $X_c(k)$, since $X_c(k)$ is a circular matrix. We also note that the first column of $X_c(k)$ is the input vector $\tilde{x}(k)$. Therefore, we obtain

$$\tilde{y}(k) = P_{0,L}\, F^{-1} X_F(k)\, W_F(k) \qquad (27)$$

A numerical check of eq. (27) is given below.
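Eq. (27) can be verified numerically as follows; the sketch uses generic array indices in place of the kL-based ones and is only illustrative. Since $X_F(k)$ is diagonal with the DFT of $\tilde{x}(k)$ on its diagonal, the product $X_F(k) W_F(k)$ reduces to an element-by-element multiplication.

```python
import numpy as np

N, L = 4, 6
Np = N + L - 1
x = np.random.randn(Np)                    # stand-in for x~(k), eq. (11)
w = np.random.randn(N)

XF = np.fft.fft(x)                         # diagonal of X_F(k), eq. (26)
WF = np.fft.fft(np.r_[w, np.zeros(Np - N)])  # W_F(k) = F W~(k), eq. (25)
y_tilde = np.fft.ifft(XF * WF).real        # F^{-1} X_F(k) W_F(k)
y_tilde[:N - 1] = 0.0                      # windowing by P_{0,L}, eq. (27)

# the last L entries are the linear-convolution (filter output) samples:
y_direct = np.convolve(x, w)[N - 1:N - 1 + L]
assert np.allclose(y_tilde[N - 1:], y_direct)
```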

6.4 FBLMS Algorithm

The fast BLMS (FBLMS) algorithm is a computationally efficient implementation of the BLMS algorithm in the frequency domain. Equation (27), $\tilde{y}(k) = P_{0,L} F^{-1} X_F(k) W_F(k)$, corresponds to the filtering part of the FBLMS algorithm.


The output vector in extended form, as defined in eq. (20), is

$$\tilde{y}(k) = \begin{bmatrix} 0 \\ y(k) \end{bmatrix}$$

The vector of desired outputs in extended form is defined as

$$\tilde{d}(k) = \begin{bmatrix} 0 \\ d(k) \end{bmatrix} \qquad (28)$$

The extended error vector is

$$\tilde{e}(k) = \tilde{d}(k) - \tilde{y}(k) \qquad (29)$$

To obtain the frequency-domain equivalent of eq. (9), $W(k+1) = W(k) + \frac{2\mu_B}{L} X^T(k) e(k)$, we replace $W(k)$ and $e(k)$ by their extended versions, and note that eq. (9) may also be written as

$$\tilde{W}(k+1) = \tilde{W}(k) + 2\mu\, P_{N,0}\, X_c^T(k)\, \tilde{e}(k) \qquad (30)$$

where $X_c(k)$ is the circular matrix of samples of the filter input, $\mu = \mu_B / L$, and

$$P_{N,0} = \begin{bmatrix} I_N & 0 \\ 0 & 0 \end{bmatrix} \qquad (31)$$

$P_{N,0}$ is an N'×N' windowing matrix which ensures that the last L−1 elements of the updated weight vector $\tilde{W}(k+1)$ remain equal to zero after each iteration of eq. (30).
The equivalence between eq. (9) and eq. (30) can be verified easily. Conversion of eq. (30) to its frequency-domain equivalent is done by premultiplying both sides by the DFT matrix F and using the identity $F^{-1}F = I$ to obtain

$$W_F(k+1) = W_F(k) + 2\mu\, F P_{N,0} F^{-1}\, F X_c^T(k) F^{-1}\, F \tilde{e}(k) \qquad (32)$$

and then, noting that $F X_c^T(k) F^{-1} = X_F^*(k)$ for real-valued input (since $X_c^T(k)$ is itself a circular matrix whose first column is the time-reversal of that of $X_c(k)$), we can write the weight-vector update equation as

$$W_F(k+1) = W_F(k) + 2\mu\, \mathcal{P}_{N,0}\, X_F^*(k)\, e_F(k) \qquad (33)$$

where $e_F(k) = F\tilde{e}(k)$ and

$$\mathcal{P}_{N,0} = F P_{N,0} F^{-1} \qquad (34)$$

Fig. 8.3 depicts a block diagram of the FBLMS algorithm. The related equations are:

$$\tilde{y}(k) = P_{0,L}\, F^{-1} X_F(k)\, W_F(k)$$
$$\tilde{e}(k) = \tilde{d}(k) - \tilde{y}(k)$$
$$W_F(k+1) = W_F(k) + 2\mu\, \mathcal{P}_{N,0}\, X_F^*(k)\, e_F(k)$$

6.4.1 Constrained and Unconstrained FBLMS Algorithms

It has been shown that under fairly mild conditions the FBLMS algorithm can work well even when the tap-weight constraining matrix $\mathcal{P}_{N,0}$ is dropped from eq. (33). That is, the recursion

$$W_F(k+1) = W_F(k) + 2\mu\, X_F^*(k)\, e_F(k) \qquad (35)$$

converges to the same set of tap-weights when N is chosen sufficiently large and the input process x(n) does not satisfy certain specific conditions that are unlikely to occur in practice.
Figure 8.3 shows the block diagram of the constrained FBLMS algorithm. It is easily converted to the unconstrained FBLMS algorithm by dropping the gradient constraining operation, enclosed by the dotted-line box. A sketch of this constraining operation is given below.
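A minimal NumPy sketch of the constraining operation, assuming the weight vector is held in the frequency domain; dropping the call yields the unconstrained update of eq. (35).

```python
import numpy as np

def constrain(WF, N):
    """Gradient/tap-weight constraint used in eq. (33): transform to the
    time domain, keep the first N weights, zero the rest, transform back.
    This enforces the P_{N,0} windowing of eq. (31)."""
    w = np.fft.ifft(WF)
    w[N:] = 0.0
    return np.fft.fft(w)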

6.4.2 Convergence Behavior of the FBLMS Algorithm
Consider the unconstrained recursion, eq. (35). Substituting eq. (27) in eq. (29), we get

$$\tilde{e}(k) = \tilde{d}(k) - P_{0,L}\, F^{-1} X_F(k)\, W_F(k) \qquad (36)$$

Since the first N−1 elements of $\tilde{d}(k)$ are all zero, $\tilde{d}(k) = P_{0,L}\, \tilde{d}(k)$. Using this in eq. (36), we obtain

$$\tilde{e}(k) = P_{0,L}\big(\tilde{d}(k) - F^{-1} X_F(k) W_F(k)\big) = P_{0,L}\, F^{-1}\big(F\tilde{d}(k) - X_F(k) W_F(k)\big) \qquad (37)$$

Premultiplying both sides of eq. (37) by F, we get

$$e_F(k) = \mathcal{P}_{0,L}\big(d_F(k) - X_F(k) W_F(k)\big) \qquad (38)$$

where $d_F(k) = F\tilde{d}(k)$ and

$$\mathcal{P}_{0,L} = F P_{0,L} F^{-1} \qquad (39)$$

Then, we obtain

$$W_F(k+1) = W_F(k) + 2\mu\, X_F^*(k)\big[\mathcal{P}_{0,L}\big(d_F(k) - X_F(k) W_F(k)\big)\big] \qquad (40)$$

Next, we define the tap-weight error vector

$$v_F(k) = W_F(k) - W_{0,F} \qquad (41)$$

where $W_{0,F}$ is the optimum value of the filter tap-weight vector in the frequency domain. Using eq. (41) in eq. (40), we can obtain

$$v_F(k+1) = \big(I - 2\mu\, X_F^*(k)\, \mathcal{P}_{0,L}\, X_F(k)\big) v_F(k) + 2\mu\, X_F^*(k)\, \mathcal{P}_{0,L}\, e_{0,F}(k) \qquad (42)$$

where $e_{0,F}(k)$ is the optimum error vector obtained when $W_F(k)$ is replaced by $W_{0,F}$. It can be shown (details omitted) that for coloured inputs the unconstrained FBLMS algorithm performs poorly, just as the conventional LMS algorithm does. The same is true for the constrained FBLMS algorithm.

6.4.3 Step-Normalization

The convergence behavior of the FBLMS algorithm can be greatly improved by using an individually normalized step-size parameter for each element of the tap-weight vector $W_F(k)$, rather than a common step-size parameter. This technique, known as step-normalization, is similar to the one used for improving the convergence of the transform-domain LMS algorithm.

Step-normalization is implemented by replacing the scalar step-size parameter with the diagonal matrix

$$\mu(k) = \mathrm{diag}[\mu_0(k)\; \mu_1(k)\; \ldots\; \mu_{N'-1}(k)] \qquad (43)$$

where $\mu_i(k)$ is the normalized step-size parameter for the i-th tap, obtained as

$$\mu_i(k) = \mu_0 / \sigma^2_{x_F,i}(k) \qquad (44)$$

where $\mu_0$ is a constant and the $\sigma^2_{x_F,i}(k)$ are power estimates of the samples $x_{F,i}(k)$ of the filter input in the frequency domain. These estimates may be obtained using the recursion
$$\sigma^2_{x_F,i}(k) = \beta\, \sigma^2_{x_F,i}(k-1) + (1-\beta)\, |x_{F,i}(k)|^2 \qquad (45)$$

where $0 < \beta < 1$ and $\beta$ is chosen close to 1.
6.4.4 Summary of the FBLMS Algorithm

Table 8.1:
Input: Tap-weight vector, $W_F(k)$
  Signal power estimates, $\sigma^2_{x_F,i}(k-1)$
  Extended input vector, $\tilde{x}(k) = [x(kL-N+1)\; x(kL-N+2)\; \ldots\; x(kL+L-1)]^T$
  Desired output vector, $d(k) = [d(kL)\; d(kL+1)\; \ldots\; d(kL+L-1)]^T$
Output: Filter output, $y(k) = [y(kL)\; y(kL+1)\; \ldots\; y(kL+L-1)]^T$
  Tap-weight vector update, $W_F(k+1)$
Note: N: filter length, L: block length, N' = N + L − 1.

(1) Filtering:
$x_F(k) = \mathrm{FFT}(\tilde{x}(k))$
$y(k) = $ the last L elements of $\mathrm{IFFT}\big(x_F(k) \odot W_F(k)\big)$
Here $\odot$ denotes element-by-element multiplication of vectors.

(2) Error estimation:
$e(k) = d(k) - y(k)$

(3) Step-normalization: for i = 0 to N'−1,
$\sigma^2_{x_F,i}(k) = \beta\, \sigma^2_{x_F,i}(k-1) + (1-\beta)\, |x_{F,i}(k)|^2$
$\mu_i(k) = \mu_0 / \sigma^2_{x_F,i}(k)$
$\mu(k) = [\mu_0(k)\; \mu_1(k)\; \ldots\; \mu_{N'-1}(k)]^T$

(4) Tap-weight adaptation:
$e_F(k) = \mathrm{FFT}\left(\begin{bmatrix} 0 \\ e(k) \end{bmatrix}\right)$
$W_F(k+1) = W_F(k) + 2\mu(k) \odot x_F^*(k) \odot e_F(k)$

(5) Tap-weight constraint:
$W_F(k+1) = \mathrm{FFT}\left(\begin{bmatrix} \text{the first } N \text{ elements of } \mathrm{IFFT}(W_F(k+1)) \\ 0 \end{bmatrix}\right)$
This last step is applicable only to the constrained FBLMS algorithm. Note: the algorithm in Table 8.1 is applicable to both real- and complex-valued signals.
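Putting the five steps together, the following is a compact, illustrative NumPy realization of Table 8.1 (a sketch, not reference code from the text); it assumes zero prehistory for the input and a small positive initialization of the power estimates to avoid division by zero.

```python
import numpy as np

def fblms(x, d, N, L, mu0=0.01, beta=0.9, constrained=True):
    """FBLMS with step-normalization, following Table 8.1.

    x, d : input and desired sequences; N : filter length; L : block length.
    Returns the final frequency-domain weight vector and the output signal.
    """
    Np = N + L - 1                                    # N' = N + L - 1
    WF = np.zeros(Np, dtype=complex)
    sigma2 = np.full(Np, 1e-6)                        # power estimates
    x_pad = np.concatenate([np.zeros(N - 1), x])
    y_out = np.zeros(len(x) // L * L)
    for k in range(len(x) // L):
        xF = np.fft.fft(x_pad[k*L:k*L + Np])          # (1) filtering
        y = np.fft.ifft(xF * WF).real[N-1:]           #     last L elements
        e = d[k*L:k*L + L] - y                        # (2) error estimation
        sigma2 = beta * sigma2 + (1 - beta) * np.abs(xF)**2  # (3) step-norm.
        mu = mu0 / sigma2
        eF = np.fft.fft(np.r_[np.zeros(N - 1), e])    # (4) adaptation
        WF = WF + 2 * mu * np.conj(xF) * eF
        if constrained:                               # (5) constraint
            w = np.fft.ifft(WF)
            w[N:] = 0.0
            WF = np.fft.fft(w)
        y_out[k*L:k*L + L] = y
    return WF, y_out
```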
6.4.5 Selection of the Block Length

Block processing of signals in general results in a certain time delay at the system output. In many applications this processing delay may be intolerable and hence has to be minimized. In applications where the processing delay is not an issue, L is usually chosen equal to N. The exact value of L depends on N: for a given N, one should choose L so that N' = N + L − 1 is an appropriate composite number, so that efficient FFT and IFFT algorithms can be used in the realization of the FBLMS algorithm; a simple heuristic for this choice is sketched below. On the other hand, in applications where it is important to keep the processing delay small, one may need to strike a compromise between system complexity and processing delay. In the following, we discuss an alternative implementation of the FBLMS algorithm which is more efficient in this respect: the partitioned FBLMS algorithm.
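A simple heuristic of the kind described above, under the assumption that powers of two are the preferred FFT lengths (the function name is illustrative):

```python
def pick_block_length(N):
    """Pick L near N so that N' = N + L - 1 is a power of two,
    a common choice when processing delay is not critical."""
    Np = 1
    while Np < 2 * N - 1:     # smallest power of two that fits N' with L = N
        Np *= 2
    return Np - N + 1         # resulting block length L

# e.g. N = 100 gives N' = 256 and L = 157
```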

6.5 The Partitioned FBLMS Algorithm

When N is large and a block length L much smaller than N is used, an efficient implementation of the FBLMS algorithm can be derived by dividing (partitioning) the convolution sum, $y(n) = \sum_{i=0}^{N-1} w_i x(n-i)$, into a number of smaller sums that are processed as discussed below. The resulting implementation is called the partitioned FBLMS (PFBLMS) algorithm.
Let us assume that N = PM, where P and M are integers. The convolution sum may then be written as

$$y(n) = \sum_{l=0}^{P-1} y_l(n) \qquad (46)$$

where

$$y_l(n) = \sum_{i=0}^{M-1} w_{i+lM}\, x(n - lM - i) \qquad (47)$$

We choose L = M and divide the input data into blocks of 2M samples such that the last M samples of, say, the k-th block are the same as the first M samples of the (k+1)-th block. The convolution sums in eq. (47) can then be evaluated by circular convolution of these data blocks with the corresponding weight vectors, each padded with M zeros; a small numerical check of the partitioning in eqs. (46) and (47) is given below.
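This NumPy snippet verifies eqs. (46) and (47) with illustrative sizes: a length-N convolution equals the sum of P length-M partial convolutions, each applied to a suitably delayed input.

```python
import numpy as np

def padded(v, n):
    """Zero-pad vector v to length n."""
    return np.r_[v, np.zeros(n - len(v))]

P, M = 3, 4
N = P * M
w = np.random.randn(N)
x = np.random.randn(50)

y_full = np.convolve(x, w)
y_parts = sum(
    # partition l uses taps w[lM .. lM+M-1], delayed by lM samples, eq. (47)
    padded(np.convolve(np.r_[np.zeros(l * M), x], w[l*M:(l+1)*M]),
           len(y_full))
    for l in range(P)
)
assert np.allclose(y_full, y_parts)       # eq. (46)
```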
Using x(kM+M−1) to denote the newest input sample, we define the vectors

$$X_{F,l}(k) = \mathrm{FFT}\big([x((k-l)M - M)\; x((k-l)M - M + 1)\; \ldots\; x((k-l)M + M - 1)]^T\big) \qquad (48)$$

$$W_{F,l}(k) = \mathrm{FFT}\big([w_{lM}(k)\; w_{lM+1}(k)\; \ldots\; w_{lM+M-1}(k)\; \underbrace{0\; 0\; \ldots\; 0}_{M \text{ zeros}}]^T\big) \qquad (49)$$
$$y_l(k) = [y_l(kM)\; y_l(kM+1)\; \ldots\; y_l(kM+M-1)]^T \qquad (50)$$

and note that

$$y_l(k) = \text{the last } M \text{ elements of } \mathrm{IFFT}\big(W_{F,l}(k) \odot X_{F,l}(k)\big) \qquad (51)$$

where k is the block index and l is the partition index. We also define

$$y(k) = [y(kM)\; y(kM+1)\; \ldots\; y(kM+M-1)]^T \qquad (52)$$

and note that

$$y(k) = \sum_{l=0}^{P-1} y_l(k) \qquad (53)$$

From eq. (48), we note that

$$X_{F,l}(k) = X_{F,0}(k-l) \qquad (54)$$

From eqs. (50), (53) and (54), we obtain

$$y(k) = \text{the last } M \text{ elements of } \mathrm{IFFT}\left(\sum_{l=0}^{P-1} W_{F,l}(k) \odot X_{F,0}(k-l)\right) \qquad (55)$$

Using this result, the block diagram of the PFBLMS algorithm may be depicted as in Fig. 8.5.

The implementation of the summation on the right-hand side of eq. (55), $\sum_{l=0}^{P-1} W_{F,l}(k) \odot X_{F,0}(k-l)$, can be viewed as a parallel bank of 2M transversal filters, each of length P, with the j-th filter processing the frequency-domain samples belonging to the j-th frequency bin, for j = 0, 1, 2, ..., 2M−1.
The adaptation of the filter tap-weights is done according to the recursion

$$W_{F,l}(k+1) = W_{F,l}(k) + 2\mu(k) \odot X_{F,0}^*(k-l) \odot e_F(k), \qquad l = 0, 1, \ldots, P-1 \qquad (56)$$

where $\mu(k)$ is the vector of associated step-size parameters, which may be normalized, and

$$e_F(k) = \mathrm{FFT}\left(\begin{bmatrix} 0 \\ d(k) - y(k) \end{bmatrix}\right) \qquad (57)$$

with $d(k) = [d(kM)\; d(kM+1)\; \ldots\; d(kM+M-1)]^T$ and 0 the length-M zero column vector.

Equation (56) is the unconstrained PFBLMS algorithm. The constrained PFBLMS recursion is obtained by constraining the filter tap-weights after every iteration of eq. (56).

(Figure: the input is divided into overlapping data blocks of 2M samples, with the last M samples of the k-th block repeated as the first M samples of the (k+1)-th block; k is the block index and l = 0, ..., P−1 is the partition index.)

6.5.1 PFBLMS Algorithm with M > L

Assume now that the block length is L and the partition length is M, and define the vector

$$\tilde{x}_0(k) = [x(kL-M)\; x(kL-M+1)\; \ldots\; x(kL+L-1)]^T \qquad (58)$$

Choose M = pL, where p is an integer. As we show later, this choice of L and M leads to an efficient implementation of the PFBLMS algorithm.
Note that if we want to use the DFT to compute the outputs of the various partitions in eq. (53), $y(k) = \sum_l y_l(k)$, then $\tilde{x}_0(k)$ corresponds to the vector of input samples associated with the first partition, i.e. $y_0(n)$ in eq. (53), with n = kL+L−1. Observe that the first element of $\tilde{x}_0(k)$ is x(kL−M). Similarly, the vectors corresponding to the subsequent partitions start with the samples

x(kL−2M) = x((k−p)L−M)
x(kL−3M) = x((k−2p)L−M)

and so on, since M = pL. We thus find that

$$x_{F,l}(k) = x_{F,0}(k - pl), \qquad l = 1, 2, \ldots, P-1 \qquad (59)$$

Fig. 8.6 depicts an implementation of the PFBLMS algorithm when M = pL. The difference between Fig. 8.5 and Fig. 8.6 is that each delay in Fig. 8.5 is replaced by p delay units in Fig. 8.6.

Summary of the PFBLMS Algorithm

Input: Tap-weight vectors, $W_{F,l}(k)$, l = 0, 1, 2, ..., P−1
  Extended input vector, $\tilde{x}_0(k) = [x(kL-M)\; x(kL-M+1)\; \ldots\; x(kL+L-1)]^T$
  Past frequency-domain input vectors, $x_{F,0}(k-l)$, for l = 1, 2, ..., (P−1)p
  Desired output vector, $d(k) = [d(kL)\; d(kL+1)\; \ldots\; d(kL+L-1)]^T$
Output: Filter output, $y(k) = [y(kL)\; y(kL+1)\; \ldots\; y(kL+L-1)]^T$
  Tap-weight vector updates, $W_{F,l}(k+1)$, l = 0, 1, 2, ..., P−1

(1) Filtering:
$x_{F,0}(k) = \mathrm{FFT}(\tilde{x}_0(k))$
$y(k) = $ the last L elements of $\mathrm{IFFT}\left(\sum_{l=0}^{P-1} W_{F,l}(k) \odot x_{F,0}(k-pl)\right)$

(2) Error estimation:
$e(k) = d(k) - y(k)$

(3) Step-normalization: for i = 0 to M'−1,
$\sigma^2_{x_F,0,i}(k) = \beta\, \sigma^2_{x_F,0,i}(k-1) + (1-\beta)\, |x_{F,0,i}(k)|^2$
$\mu_i(k) = \mu_0 / \sigma^2_{x_F,0,i}(k)$
$\mu(k) = [\mu_0(k)\; \mu_1(k)\; \ldots\; \mu_{M'-1}(k)]^T$

(4) Tap-weight adaptation:
$e_F(k) = \mathrm{FFT}\left(\begin{bmatrix} 0 \\ e(k) \end{bmatrix}\right)$
for l = 0 to P−1:
$W_{F,l}(k+1) = W_{F,l}(k) + 2\mu(k) \odot x_{F,0}^*(k-pl) \odot e_F(k)$

(5) Tap-weight constraint: for l = 0 to P−1,
$W_{F,l}(k+1) = \mathrm{FFT}\left(\begin{bmatrix} \text{the first } M \text{ elements of } \mathrm{IFFT}(W_{F,l}(k+1)) \\ 0 \end{bmatrix}\right)$

Notes: M = partition length, L = block length, M' = M + L.

Step (5) is applicable only to the constrained PFBLMS algorithm, where $\nu_i = \frac{1}{p+1}\, e^{j2\pi i/(p+1)}$.
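The following is an illustrative NumPy realization of this summary (with M = pL); the function name, default parameters and the zero-prehistory assumption are mine, not the text's.

```python
import numpy as np

def pfblms(x, d, P, M, L, mu0=0.01, beta=0.9, constrained=True):
    """PFBLMS with M = pL (so p = M // L) and FFT length M' = M + L.

    P : number of partitions, M : partition length, L : block length.
    Returns the frequency-domain weight vectors and the output signal.
    """
    p = M // L
    Mp = M + L                                         # M' = M + L
    WF = np.zeros((P, Mp), dtype=complex)
    hist = np.zeros(((P - 1) * p + 1, Mp), dtype=complex)  # x_F,0(k - pl)
    sigma2 = np.full(Mp, 1e-6)
    x_pad = np.concatenate([np.zeros(M), x])
    y_out = np.zeros(len(x) // L * L)
    for k in range(len(x) // L):
        # (1) filtering: x~0(k) = [x(kL-M) ... x(kL+L-1)], eq. (58)
        xF0 = np.fft.fft(x_pad[k*L:k*L + Mp])
        hist = np.roll(hist, 1, axis=0)
        hist[0] = xF0                                  # hist[j] = x_F,0(k-j)
        acc = sum(WF[l] * hist[p * l] for l in range(P))
        y = np.fft.ifft(acc).real[M:]                  # last L elements
        e = d[k*L:k*L + L] - y                         # (2) error estimation
        sigma2 = beta * sigma2 + (1 - beta) * np.abs(xF0)**2  # (3) step-norm.
        mu = mu0 / sigma2
        eF = np.fft.fft(np.r_[np.zeros(M), e])         # (4) adaptation
        for l in range(P):
            WF[l] = WF[l] + 2 * mu * np.conj(hist[p * l]) * eF
        if constrained:                                # (5) constraint
            for l in range(P):
                w = np.fft.ifft(WF[l])
                w[M:] = 0.0
                WF[l] = np.fft.fft(w)
        y_out[k*L:k*L + L] = y
    return WF, y_out
```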

6.5.2 Computational Complexity and Memory Requirement

The computational complexity of the constrained PFBLMS algorithm depends on how the constraining step is implemented.

In the implementation of the unconstrained PFBLMS algorithm, the processing of each data block requires two (M+L)-point FFTs and one IFFT of the same length. Noting that L output samples are generated at the end of each block processing interval, we obtain the per-sample computational complexity of the unconstrained PFBLMS algorithm as

$$C = (p+1)P + \frac{3}{4}(p+1)\log_2\frac{(p+1)L}{2}$$

The memory requirements of the unconstrained and constrained PFBLMS algorithms are about the same. The number of memory words required to implement the PFBLMS algorithm is approximately

$$S = (p+1)^2 L P \ \text{words}$$
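For a feel of these expressions (as reconstructed above, so treat the numbers as rough approximations), a small calculator:

```python
import math

def pfblms_cost(p, P, L):
    """Per-sample multiplication count C and memory words S, using the
    complexity expressions given above (approximate)."""
    C = (p + 1) * P + 0.75 * (p + 1) * math.log2((p + 1) * L / 2)
    S = (p + 1) ** 2 * L * P
    return C, S

# e.g. pfblms_cost(1, 8, 64) -> (C = 25.0, S = 2048)
```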
