
Topic 8-Mean Square Estimation: Wiener and Kalman Filtering

Papoulis Chapter 13 2 weeks

Minimum Mean-Square Estimation (MMSE)


Optimum Linear MSE estimation ---the orthogonality principle
Wiener Filtering
Kalman Filtering
Adaptive Filtering (not in Papoulis)

Mean Square Estimation: Wiener and Kalman Filtering


In this topic we consider mean-square estimation of the value of
a random process S(t) at a time t by observing a process X(t)
over a time interval [a, b] (in Topic 4 we considered estimating a
Random Variable). The estimate is denoted by Ŝ(t).
Estimation includes the following classes of problems:
Smoothing and interpolation: if t is interior to [a, b], the estimate is
called interpolation when there is no noise and smoothing when
there is noise.
If t is outside this interval and there is no noise, the estimate is
called a predictor. If t > b, then it is a forward predictor and if
t < a, then it is a backward predictor.
If t is outside this interval and there is noise, then the estimate is
called filtering and prediction.

Unless stated otherwise, the random processes are WSS and
the estimators are linear (found via the orthogonality condition).

Review: Parameter Estimation (Predicting the value of Y)


[Review from Topic 4]

Suppose Y is a RV with known PMF or PDF


Problem: Predict (or estimate) what value of Y will be
observed on the next trial
Questions:
What value should we predict?
What is a good prediction?
We need to specify some criterion that determines what is a
good/reasonable estimate.
Note that for continuous random variables, it doesn't make
sense to predict the exact value of Y, since that occurs with
zero probability.
A common estimate is the mean-square estimate.

The Mean Square Error (MSE) Estimate


We will let ŷ be the mean-square estimate of the observable random
variable Y, with E[Y] = η.
The mean-squared error (MSE) is defined as:
e = E[(Y − ŷ)²]     (8-1)
We proceed by completing the square:
E[(Y − ŷ)²] = E[(Y − η + η − ŷ)²]
            = E[(Y − η)² + 2(Y − η)(η − ŷ) + (η − ŷ)²]
            = var(Y) + 2(η − ŷ)E[Y − η] + (η − ŷ)²
            = var(Y) + (η − ŷ)² > var(Y) if ŷ ≠ η     (8-2)
Clearly the MSE is minimized when
ŷ = η     (8-3)
ŷ = η is called the minimum- (or least-) mean-square error (MMSE
or LMSE) estimate.
The minimum mean-square error is var(Y).
4

The MSE of a RV Based Upon Observing another RV


Let X and Y denote random variables with known joint distribution.
Suppose that we observe the value of X (that is, X is the observed
signal/data). How can we find the MMSE estimate of Y, denoted by
Ŷ, that is a function of the observed data X?
Can the MMSE estimate Ŷ, which is a function of X, do better than
ignoring X and estimating the value of Y as ŷ = η_Y = E[Y]? Yes!
Denoting the MMSE estimate by c(X), the MSE is given by

e = E_XY{[Y − c(X)]²} = ∫∫ [y − c(x)]² f_X,Y(x, y) dx dy
  = ∫ f_X(x) { ∫ [y − c(x)]² f_Y|X(y | x) dy } dx     (8-4)

where all integrals run from −∞ to ∞.
Note that the above integrals are positive, so that e will be minimized
if the inner integral is a minimum for all values of x.
Note that for a fixed value of x, c(x) is a variable [not a function].
5

The MSE of a RV Based Upon Observing another RV-2


Since for a fixed value of x, c(x) is a variable [not a function], we can
minimize the MSE by setting the derivative of the inner integral, with
respect to c, to zero:

d/dc ∫ [y − c(x)]² f_Y|X(y | x) dy = −∫ 2(y − c) f_Y|X(y | x) dy = 0     (8-5)

Solving for c, after noting that

∫ c(x) f_Y|X(y | x) dy = c(x) ∫ f_Y|X(y | x) dy = c(x),   where the integral is one,

gives

Ŷ = c(X) = ∫ y f_Y|X(y | x) dy = E[Y | X]     (8-6)

Thus the MMSE estimate, Ŷ, is the conditional mean of Y given the
observation (or data) X.
The MMSE estimate is, in general, a nonlinear function of X.

MMSE Example
Let the random point (X, Y) be uniformly distributed on the semicircular
region x² + y² ≤ 1, y ≥ 0.

The joint PDF has value 2/π on the semicircle.

The conditional PDF of Y given that X = x is a uniform density on [0, (1 − x²)^1/2].
So, ŷ = E[Y | X = x] = (1/2)(1 − x²)^1/2 and this estimate achieves the
least possible MSE of var(Y | X = x) = (1 − x²)/12.
Intuitively reasonable since
If |x| is nearly 1, the MSE is small (since the range of Y is small)
If |x| is nearly 0, the MSE is large (since the range of Y is large)

The Regression Curve of Y on X


ŷ(x) = E[Y | X = x], as a function of x, is a curve called the regression
curve of Y on X (plotted as the lower curve above).
The graph of (1/2)(1 − x²)^1/2 is a half-ellipse.
Given the X value, the MMSE estimate of Y can be read off
from the regression curve.

Example: As an example, suppose Y = X³ is the unknown.
Then the best MMSE estimator is given by

Ŷ = E{Y | X} = E{X³ | X} = X³.     (8-7)

Clearly in this case Ŷ = X³ is the best estimator for Y. Thus the
best estimator can be nonlinear.

Example: Let

f_X,Y(x, y) = kxy,   0 < x < y < 1
            = 0,     otherwise,

where k > 0 is a suitable normalization constant. To determine the best
estimate for Y in terms of X, we need f_Y|X(y | x).

f_X(x) = ∫_x^1 f_X,Y(x, y) dy = ∫_x^1 kxy dy = kxy²/2 |_x^1 = kx(1 − x²)/2,   0 < x < 1.

Thus

f_Y|X(y | x) = f_X,Y(x, y) / f_X(x) = kxy / [kx(1 − x²)/2] = 2y/(1 − x²),   0 < x < y < 1.

So, the best MMSE estimator is given by

Ŷ = φ(X) = E{Y | X} = ∫_x^1 y f_Y|X(y | x) dy
  = ∫_x^1 y · 2y/(1 − x²) dy = [2/(1 − x²)] ∫_x^1 y² dy
  = 2(1 − x³) / [3(1 − x²)] = 2(1 + x + x²) / [3(1 + x)].     (8-8)

Once again the best estimator is nonlinear. In general the best
estimator Ŷ = E{Y | X} is difficult to evaluate, and hence next we
will examine the special subclass of best linear estimators.

11

Linear MMSE Estimation I


Suppose that we wish to estimate Y using a linear function of the
observation X.
The linear MMSE estimate of Y is Ŷ = aX + b, where a and b are
chosen to minimize the mean-square error E[(Y − aX − b)²].
Let Z = Y − Ŷ = Y − aX − b be the error; then we will show that the
minimum occurs when

a = ρ_XY σ_Y / σ_X,     b = η_Y − a η_X     (8-9)

and the minimum MSE (with a linear estimate) is

e_min = σ_Y² (1 − ρ²_XY)     (8-10)

12

Optimum Linear MSE Estimate: Proof


Suppose that a is fixed; then the problem is to estimate the
quantity Y − aX by the constant b.
But we know from the previous example [see (8-3)] that, under
those circumstances,

b = E[Y − aX] = η_Y − a η_X     (8-11)

With b determined as above, the mean-square error becomes

E[(Y − aX − b)²] = E{[(Y − η_Y) − a(X − η_X)]²}
                 = σ_Y² − 2a ρ_XY σ_X σ_Y + a² σ_X²     (8-12)

Minimization of (8-12) is accomplished by simply differentiating
this expression with respect to a, giving

a = ρ_XY σ_Y / σ_X     (8-13)

Substituting these values of a and b into the MSE gives

e_min = σ_Y² (1 − ρ²_XY)     (8-14)

13

Linear MMSE Estimation The Orthogonality Principle


As before, let Z = Y − aX − b be the estimation error; then the MSE
is

e = E[(Y − aX − b)²] = E[Z²]     (8-15)

Setting the derivative of the MSE with respect to a to zero gives

∂e/∂a = E[2Z(−X)] = 0,   that is,   E[(Y − Ŷ)X] = 0     (8-16)

which says that the estimation error, Z = Y − Ŷ, is orthogonal to (that
is, uncorrelated with) the received data X.
This is referred to as the orthogonality principle of linear
estimation.
When the estimation error is uncorrelated with the observed data X,
the estimate has, intuitively, extracted all of the correlated
information from the data.
14

Orthogonality Condition: A Geometric View


(from D. Snider text section 6.3)

Recall the properties of the dot product in 3-dimensional vector analysis:

v·u = |v| cos θ |u|,   v·v = |v|²,   u·u = |u|²     (8-16b)

We can use the dot product to express the orthogonal projection of one vector
onto another, as in the figure below.

15

The length of v_proj is |v| cos θ; its direction is that of the unit
vector u/|u|; thus

v_proj = |v| cos θ · u/|u| = (|v| cos θ |u| / u·u) u = (v·u / u·u) u     (8-16c)

Now compare these identities with the expressions for the second moments
of zero-mean random variables:

E{XY} = ρ σ_X σ_Y,   E{X²} = σ_X²,   E{Y²} = σ_Y²     (8-16d)

The dot products are perfectly analogous to the second moments if we
regard σ_X and σ_Y as the "lengths" of X and Y, and the correlation
coefficient ρ as the cosine of the "angle between X and Y." After all, ρ lies
between −1 and +1, just like the cosine. In this vein, we say two random
variables X and Y are "orthogonal" if E{XY} = 0 (so the angle is 90°). Note
that this nomenclature is only consistent with these analogies when the
variables have mean zero.
The vector analogy is useful in remembering the least-mean-squared-error
formula. Furthermore, note that v − v_proj is orthogonal to u in the figure. By
analogy, it is reasonable that the prediction error is orthogonal (in the statistical
sense) to the LMSE predictor.

16

Gaussian MMSE = Linear MMSE


[From Topic 4]

In general, the linear MMSE estimate has a higher MSE than the
(usually nonlinear) MMSE estimate E[Y | X].
If X and Y are jointly Gaussian RVs, it can be shown that the
conditional PDF of Y given X = x is a Gaussian PDF with mean

η_Y + ρ (σ_Y / σ_X)(x − η_X)     (8-17)

and variance

σ_Y² (1 − ρ²)     (8-18)

Hence,

E[Y | X = x] = η_Y + ρ (σ_Y / σ_X)(x − η_X)     (8-19)

which is the same as the linear MMSE estimate.
For jointly Gaussian RVs, the MMSE estimate = the linear MMSE estimate.
Another special property of the Gaussian RV.

17
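As a quick numerical check (not from the text), the MATLAB sketch below estimates a and b of (8-9) from sample moments of simulated jointly Gaussian data and compares the resulting MSE with (8-10); the model Y = 2X + W + 1, the sample size, and the variable names are illustrative assumptions.

N = 1e5;                                       % number of trials (illustrative)
X = randn(1,N);                                % zero-mean, unit-variance Gaussian
Y = 2*X + randn(1,N) + 1;                      % jointly Gaussian with X (assumed model)
etaX = mean(X);  etaY = mean(Y);
sigX = std(X);   sigY = std(Y);
rho  = mean((X-etaX).*(Y-etaY))/(sigX*sigY);   % sample correlation coefficient
a = rho*sigY/sigX;                             % Eq. (8-9)
b = etaY - a*etaX;
Yhat = a*X + b;                                % linear MMSE estimate
disp([mean((Y-Yhat).^2)  sigY^2*(1-rho^2)])    % empirical MSE vs. Eq. (8-10)

Because X and Y are jointly Gaussian here, this linear estimate also coincides with the conditional mean of (8-19).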

Minimum Mean-Square Error (MMSE) Linear Estimate of a Random Process

From Topic 4, we know that the optimum mean-square
(generally non-linear) estimate of a random process S(t) is the
conditional mean, defined as

Ŝ(t) ≜ E[S(t) | X(ξ), a ≤ ξ ≤ b],   where a ≤ t ≤ b     (8-20)

A linear estimate takes the form

Ŝ(t) = ∫_a^b h(α) X(α) dα,   where a ≤ t ≤ b     (8-21)

The objective is to find h(α) so as to minimize the MS error

E{[S(t) − Ŝ(t)]²} = E{[S(t) − ∫_a^b h(α) X(α) dα]²}     (8-22)

18

Minimum Mean-Square Error (MMSE) Linear Estimate-2


From the orthogonality condition, the mean-square error will be a
minimum if the observed data is orthogonal to the estimation error
over the observation interval:

E{[S(t) − ∫_a^b h(α) X(α) dα] X(β)} = 0,   a ≤ β ≤ b     (8-23)

so that the optimal estimator, h(α), can be found as the solution of
the integral equation

R_SX(t, β) = ∫_a^b h(α) R_XX(α, β) dα,   a ≤ β ≤ b     (8-24)

In general the above equation can only be solved numerically.

19

Examples of Linear MMSE


(assume all random processes are WSS)

Prediction: We want to estimate the future value S(t + λ) based
on the present value S(t). The optimum linear estimate based on S(t)
alone has the form

Ŝ(t + λ) = aS(t)     (8-25a)

The optimum linear estimate satisfies the orthogonality condition

E{[S(t + λ) − aS(t)] S(t)} = 0     (8-25b)

and we can solve for a as

a = R_S(λ) / R_S(0)     (8-26)

20
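A minimal MATLAB sketch (not from the text) of this one-step predictor, using a synthetic AR(1) process as an assumed stand-in for a WSS signal and the sample autocorrelation in place of R_S:

N = 1e5;
S = filter(1,[1 -0.9],randn(1,N));             % synthetic WSS process (AR(1), assumed)
R = xcorr(S,1,'biased');                       % sample R_S(-1), R_S(0), R_S(1)
a = R(3)/R(2);                                 % a = R_S(lambda)/R_S(0), lambda = 1 sample
Shat = a*S(1:end-1);                           % predicted S[n+1] from S[n]
disp([mean((S(2:end)-Shat).^2)  R(2)-a*R(3)])  % empirical MSE vs. R_S(0) - a*R_S(lambda)

The second quantity, R_S(0) − a R_S(λ), is the minimum MSE that follows from (8-25b)-(8-26).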

Examples of Linear MMSE


Filtering: We want to estimate the present value of S(t) based on
the present value of another process X(t). The optimum linear
estimate based on X(t) alone has the form

Ŝ(t) = aX(t)     (8-27)

The optimum linear estimate satisfies the orthogonality condition

E{[S(t) − aX(t)] X(t)} = 0     (8-28)

and we can solve for a as

a = R_SX(0) / R_XX(0)     (8-29a)

Applying (8-14), we see that the minimum MSE (MMSE) is

e_min = σ_S² (1 − ρ²_SX) = R_SS(0) − a R_SX(0)     (8-29b)

21

Examples of Linear MMSE


Interpolation: We want to estimate the value of a process S(t) at a point t + λ
in the interval (t, t + T) based on the 2N + 1 samples S(t + kT) that are within the
time interval (see Fig. 13-1 in the text).

The optimum linear (interpolation) estimate is

Ŝ(t + λ) = Σ_{k=−N}^{N} a_k S(t + kT),   0 ≤ λ ≤ T     (8-30)

The optimum linear estimate satisfies the orthogonality condition

E{[S(t + λ) − Σ_{k=−N}^{N} a_k S(t + kT)] S(t + nT)} = 0,   |n| ≤ N,   0 ≤ λ ≤ T     (8-31)

22

From this it follows that

Σ_{k=−N}^{N} a_k R(kT − nT) = R(λ − nT),   −N ≤ n ≤ N,   0 ≤ λ ≤ T     (8-32)

This is a system of 2N + 1 linear equations that can be solved
to yield the 2N + 1 unknowns a_k.

23
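A small MATLAB sketch (not from the text) that sets up and solves the 2N + 1 equations of (8-32) for an assumed autocorrelation R(τ) = exp(−|τ|); the values of N, T, and λ are illustrative.

Np = 2;  T = 1;  lam = 0.4;                 % 2*Np+1 samples, spacing T, offset lambda
R  = @(tau) exp(-abs(tau));                 % assumed autocorrelation function
kk = -Np:Np;                                % sample indices k (and n)
[K, Nn] = meshgrid(kk, kk);                 % K(i,j) = k_j, Nn(i,j) = n_i
M = R(T*(K - Nn));                          % M(i,j) = R(kT - nT)
b = R(lam - T*kk');                         % right-hand side R(lambda - nT)
a = M\b;                                    % interpolation weights a_k, Eq. (8-32)
disp(a')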

Examples of Linear MMSE


Smoothing: We want to estimate the present value of S(t) based
on the value of another process X(t), which is the sum of the
signal S(t) and a noise signal ν(t):

X(t) = S(t) + ν(t)     (8-33)

The optimal estimate can be written as the conditional mean

Ŝ(t) = E[S(t) | X(ξ), −∞ < ξ < ∞]     (8-34)

and the linear estimate is

Ŝ(t) = ∫_{−∞}^{∞} h(α) X(t − α) dα,   −∞ < t < ∞     (8-35)

Note that the estimate Ŝ(t) is the output of a linear filter with
impulse response h(t) and with input X(t).
The orthogonality condition gives

E{[S(t) − Ŝ(t)] X(t − β)} = 0   for all β     (8-36)

24

The previous equation is equivalent to:

E{[S(t) − ∫_{−∞}^{∞} h(α) X(t − α) dα] X(t − β)} = 0   for all β     (8-37)

which becomes

R_SX(β) = ∫_{−∞}^{∞} h(α) R_XX(β − α) dα   for all β     (8-38)

To determine h(t) we need to solve the above integral equation,
which is easy to do since it is a convolution of h(β) with R_XX(β)
that holds for all values of β.
Taking transforms of both sides we obtain

S_SX(ω) = H(ω) S_XX(ω)

or

H(ω) = S_SX(ω) / S_XX(ω)     (8-39)

which is known as the non-causal Wiener filter.

Why is this a non-causal solution?

25

[Figure: the estimate Ŝ(t) is the output of a linear filter H(ω) with input X(t).]

So, with independent signal and noise,

S_SX(ω) = S_SS(ω)   and   S_XX(ω) = S_SS(ω) + S_νν(ω)     (8-40)

and (8-39) simplifies to

H(ω) = S_SS(ω) / [S_SS(ω) + S_νν(ω)]     (8-41)

Is this an intuitively reasonable solution? What happens when
the noise gets very small?

26
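A minimal MATLAB sketch (not from the text) of the non-causal Wiener filter (8-41), applied in the frequency domain via the FFT; for illustration the signal and noise spectra are taken as periodograms of the (here known) clean components, whereas in practice they would have to be modeled or estimated.

N = 4096; fs = 1000; t = (0:N-1)/fs;
s  = sin(2*pi*10*t);                        % signal: 10 Hz sinusoid (assumed example)
nu = 2*randn(1,N);                          % independent white noise
x  = s + nu;                                % observed X(t) = S(t) + nu(t)
Sss = abs(fft(s)).^2/N;                     % crude spectral estimates (periodograms)
Snn = abs(fft(nu)).^2/N;
H   = Sss./(Sss + Snn);                     % H(w) = Sss/(Sss + Snn), Eq. (8-41)
shat = real(ifft(H.*fft(x)));               % zero-phase (non-causal) filtering
disp([mean((x-s).^2)  mean((shat-s).^2)])   % MSE before vs. after filtering

As the noise spectrum shrinks, H(ω) → 1 and the filter passes the observation essentially unchanged, which answers the question above.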

If the spectra SSS() and S() shown below do not overlap, then
H() = 1 in the band of the signal and H() = 0 in the band of the
noise, and the MMSE is zero.

This can be seen by extending (8-14):

e_min = σ_S² (1 − ρ²_SX) = (1/2π) ∫_{−∞}^{∞} [S_SS(ω) − H(ω) S_SX(ω)] dω
      = (1/2π) ∫_{−∞}^{∞} S_SS(ω) S_νν(ω) / [S_SS(ω) + S_νν(ω)] dω     (8-42)

27

Nonlinear Orthogonality Rule


Interestingly, a general form of the orthogonality principle also holds
in the case of nonlinear estimators.
Nonlinear Orthogonality Rule: Let h(X) represent any functional
form of the data and E{Y | X} the best estimator for Y given X. With
e = Y − E{Y | X} we shall show that

E{e h(X)} = 0,     (8-43)

implying that the error e = Y − E{Y | X} is orthogonal to any h(X).
This follows since

E{e h(X)} = E{(Y − E[Y | X]) h(X)}
          = E{Y h(X)} − E_X{E[Y | X] h(X)}
          = E{Y h(X)} − E_X{E[Y h(X) | X]}
          = E{Y h(X)} − E{Y h(X)} = 0.     (8-44)

28

PILLAI

Discrete Time Processes


The non-causal estimate Ŝ[n] of a discrete-time process, in terms of the
observed data

X[n] = S[n] + ν[n],     (8-45)

is the estimate

Ŝ[n] = Σ_{k=−∞}^{∞} h[k] X[n − k]     (8-46)

which is the output of a linear time-invariant, non-causal system with input
X[n] and impulse response h[n].
By the orthogonality principle we have

E{(S[n] − Σ_{k=−∞}^{∞} h[k] X[n − k]) X[n − m]} = 0   for all m     (8-47)

so that

R_SX[m] = Σ_{k=−∞}^{∞} h[k] R_XX[m − k]   for all m     (8-48)

Taking the z-transform of both sides of (8-48) gives

H(z) = S_SX(z) / S_XX(z)     (8-49)

29

Causal Prediction---Use Entire Past


Consider the estimation of a process S[n] in terms of its entire past
S[n − k], k ≥ 1:

Ŝ[n] = Σ_{k=1}^{∞} h[k] S[n − k]     (8-50)

The objective is to find the constants h[k] so as to minimize the
MSE. From the orthogonality principle the error S[n] − Ŝ[n] must be
orthogonal to the data S[n − m], giving

E{(S[n] − Σ_{k=1}^{∞} h[k] S[n − k]) S[n − m]} = 0,   m ≥ 1     (8-51a)

which gives the Wiener-Hopf (discrete) equation

R_S[m] = Σ_{k=1}^{∞} h[k] R_S[m − k],   m ≥ 1     (8-51b)

Equation (8-51) is a system of infinitely many equations expressing
the unknowns h[k] in terms of the autocorrelation R_S[m].

30

The Wiener-Hopf equations cannot be directly solved with Z


transforms, since the two sides are not equal for every value of m.
There is a fairly complicated (mathematical) spectral-factorization theory,
described in the text, that factors the transfer function of the impulse
response h[n] into causal and anti-causal sequences. We will not
discuss this approach.
Instead we will consider the more practical case of a predictor that
uses a finite number of past samples.
The solutions involve (straightforward) matrix inversion operations.

31

Causal Prediction-Using L Past Samples


Consider the estimation of a process S[n] in terms of its past L samples
S[n − k], 1 ≤ k ≤ L:

Ŝ[n] = Σ_{k=1}^{L} h[k] S[n − k]     (8-52)

The objective is to find the L filter constants h[k] so as to minimize the
MSE. From the orthogonality principle the error S[n] − Ŝ[n] must be
orthogonal to the data S[n − m], giving

E{(S[n] − Σ_{k=1}^{L} h[k] S[n − k]) S[n − m]} = 0,   1 ≤ m ≤ L     (8-53)

which gives the Wiener-Hopf (discrete) equation

R_S[m] = Σ_{k=1}^{L} h[k] R_S[m − k],   1 ≤ m ≤ L     (8-54)

Equation (8-54) is a system of L equations expressing the unknowns h[k]
in terms of the autocorrelation R_S[m].

32

By rewriting [8-54] as

Σ_{k=1}^{L} R_S[m − k] h[k] = R_S[m],   1 ≤ m ≤ L     [8-55]

we recognize that [8-55] can be written as a matrix-vector equation

R h = r     [8-56]

where the matrix R is an L x L Toeplitz matrix with (m, k)-th
element equal to R_{m−k}, h is an L x 1 column vector with k-th
element equal to h[k], and r is an L x 1 column vector with k-th
element equal to R_k.
A Toeplitz matrix is a matrix in which each descending
diagonal from left to right is constant. For example, if the
Toeplitz matrix A has an ij-th element A_i,j, then A_{i+1,j+1} = A_i,j.
There are many computationally efficient algorithms for
inverting a Toeplitz matrix.
33

It is easy to see that the matrix R is Toeplitz by displaying
the elements of the matrix-vector equation:

| R_0      R_1      R_2     ...  R_{L-1} |  | h_1 |     | R_1 |
| R_1      R_0      R_1     ...  R_{L-2} |  | h_2 |     | R_2 |
| R_2      R_1      R_0     ...  R_{L-3} |  | h_3 |  =  | R_3 |
|  ...      ...      ...    ...    ...   |  | ... |     | ... |
| R_{L-1}  R_{L-2}  R_{L-3} ...  R_0     |  | h_L |     | R_L |
                                                            [8-57]

which can be solved by standard matrix inversion techniques
to give

h = R⁻¹ r     [8-58]

34
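A minimal MATLAB sketch (not from the text) of the L-tap predictor [8-57]-[8-58], built with toeplitz and the backslash operator from the sample autocorrelation of a synthetic process; the AR(2) model, the order L, and the variable names are illustrative assumptions.

N = 1e5;  L = 8;
S  = filter(1,[1 -1.5 0.7],randn(1,N));     % synthetic WSS process (AR(2), assumed)
Rs = xcorr(S,L,'biased');  Rs = Rs(:);      % sample autocorrelation, lags -L..L
Rs = Rs(L+1:end);                           % keep lags 0..L
R  = toeplitz(Rs(1:L));                     % L x L Toeplitz matrix of [8-57]
r  = Rs(2:L+1);                             % right-hand side [R_1 ... R_L]'
h  = R\r;                                   % h = R^(-1) r, Eq. [8-58]
Shat = filter([0; h],1,S(:));               % Shat[n] = sum_k h(k) S[n-k]
err  = S(:) - Shat;
disp(mean(err(L+1:end).^2))                 % one-step prediction MSE

The wiener_hopf routine used in the MATLAB examples that follow solves the same type of Toeplitz system, with a cross-correlation vector on the right-hand side.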

Adaptive Signal Processing for Wiener Filters --the LMS Algorithm


(the next set of charts on Adaptive Filtering is not in the text)

Optimal Wiener filter theory was developed to provide structure to the


process of selecting the most appropriate frequency characteristics of
linear filters.
A wide range of different approaches can be used to develop an
optimal filter depending on the nature of the problem: specifically,
what, and how much, is known about signal, channel, and noise
features.
If a model of the desired signal is available, with unknown
parameters, then a well-developed and popular class of adaptive
filters, whose parameters are adjusted using the Least Mean Square
(LMS) algorithm, can be applied.
Using the LMS algorithm the filters can also be made adaptive to
respond to particular system parameters or changes.
35

Least Mean Square Filtering- Generic Problem


[Block diagram: input x(n) → linear filter H(z) → filter output y(n); the output
is compared with the desired response d(n) to form the estimation error e(n).]

The basic concept behind Wiener filter theory is to minimize the difference
between the filter output, y(n), and some desired output, d(n). Noise could be
present in the filter output. This minimization either performs a matrix inversion
such as in [8-58] to find the Wiener filter when the model is known, or, when the
model has some unknown parameters, uses the least mean square (LMS)
approach, which adaptively adjusts the filter coefficients to reduce the square of
the difference between the desired and actual waveform after filtering. As before,
we will assume that H(z) is a feedforward finite impulse response (FIR) filter with
coefficients h(k) = h_k, k = 1, 2, ..., L.
The system is described by the following equation:

e(n) = d(n) − y(n) = d(n) − Σ_{k=1}^{L} h(k) x(n − k)     [8-59]

36

Least Mean Square Filtering--- System Identification


The LMS approach has a number of other applications in addition to standard
filtering, including system identification, interference canceling, and inverse
modeling or de-convolution. For system identification, the filter is placed in
parallel with the unknown system and the parameters can be adapted (i.e.,
changed) to minimize the estimation error.
The desired output is the output of the unknown system, and the filter
coefficients are either (1) computed (Wiener) or (2) adjusted (LMS adapted) so
that the filter output best matches that of the unknown system in the MMSE
sense.

[Block diagram: the input x(n) drives both the unknown system and the linear
filter H(z); the filter's estimated output y(n) is subtracted from the unknown
system's output to form the estimation error e(n).]

37

Wiener Filter ~ MATLAB Implementation


If the system parameters are known, the Wiener-Hopf equation
can be solved using MATLAB's matrix inversion operator
(\) as shown in the following example.
The MATLAB toeplitz function is useful in setting up the
correlation matrix. The function call is:
Rxx = toeplitz(rxx);
where rxx is the input row vector. This constructs a
symmetric matrix from a single row vector and can be used
to generate the correlation matrix in the Wiener-Hopf equation
from the autocorrelation function rxx.

38

Example 1: Given a sinusoidal signal in noise (SNR = -8 dB), design
an optimal filter using the Wiener-Hopf equation. Assume that you
have a copy of the desired signal available (usually the desired signal
would have to be estimated).
The program to implement this example first generates the data, then
calculates the coefficients using the routine wiener_hopf, filters the
data using filter, and plots the results.
Note: the filter coefficients are called b in MATLAB (as opposed to h).

fs = 1000;                        % Sampling frequency
N = 1024;                         % Number of points
L = 256;                          % Optimal filter order
%
[xn, t, x] = sig_noise(10,-8,N);  % xn is signal + noise and
                                  %   x is the desired signal
% Determine the optimal FIR filter coefficients and apply
b = wiener_hopf(xn,x,L);          % Apply Wiener-Hopf equations
y = filter(b,1,xn);               % Filter data using optimum filter
... plot results

39

Example 1 (continued): The solution uses the routine wiener_hopf to calculate
the optimum filter coefficients.
This program computes the correlation matrix from the autocorrelation function
and the toeplitz routine, and also computes the cross-correlation function.

function b = wiener_hopf(x,y,maxlags)
% Function to compute the optimal (Wiener) FIR filter coefficients
%   using the Wiener-Hopf equations
% Inputs:  x = input
%          y = desired signal
%          maxlags = filter length
% Outputs: b = FIR filter coefficients
%
rxx = xcorr(x,maxlags,'coeff');   % Compute the autocorrelation vector
rxx = rxx(maxlags+1:end)';        % Use only positive half of symm. vector
rxy = xcorr(x,y,maxlags);         % Compute the crosscorrelation vector
rxy = rxy(maxlags+1:end)';        % Use only positive half
%
rxx_matrix = toeplitz(rxx);       % Construct correlation matrix
b = rxx_matrix\rxy;               % Calculate FIR coefficients using matrix inversion

40

Example 1: Results (SNR = -8 dB; 10 Hz sine)

The original data (upper plot) are considerably less noisy after
filtering (middle plot).
The filter computed by the Wiener-Hopf algorithm has the shape of a
bandpass filter with a peak frequency at the signal frequency of 10 Hz.

[Figure: three panels - the noisy data versus time (sec), the data after optimal
filtering versus time (sec), and the optimal filter frequency plot versus
frequency (Hz).]

41

LMS Adaptive Filters


Classical filters (FIR and IIR) and optimal Wiener filters have fixed frequency
characteristics and cannot respond to changes that might occur during the
course of the signal. Adaptive filters can modify their properties based on
selected features of the signal being analyzed, or can work when the frequency
or statistical characteristics are not known a priori.
The LMS algorithm consists of two basic processes:
Filtering process: calculate the output of the FIR filter by convolving the input
and the taps; calculate the estimation error by comparing the output to the
desired signal.
Adaptation process: adjust the tap weights based on the estimation error.
A typical adaptive filter paradigm is shown below, where the arrow denotes a
quantity that is being adapted. Typically the filter is an FIR filter (which is what
we will assume) with impulse response h(k).

[Block diagram: input x(n) → adaptive filter H(z) → response y(n); y(n) is
compared with the desired response d(n) to form the error e(n), which drives
the adaptation.]

42

The LMS algorithm is the most commonly used adaptive filtering algorithm.
Define the cost function as the mean-squared difference
between the filter output and the desired response (MSE).
If the parameters are known, use the method of steepest
descent to invert the Wiener matrix:
Move towards the minimum on the error
surface to get to the minimum (the MSE has a
single minimum)
Requires the gradient of the error surface to be
known
When the parameters are not known, the most popular
adaptation algorithm is the LMS algorithm, a stochastic
gradient approach:
Derived from steepest descent
Does not require the gradient to be known: it is
estimated at every iteration

[Figure: the mean-square error (MSE) as a convex function of the tap weights
h1 and h2, with the estimated gradient pointing toward the minimum.]

In words, the tap-weight update is:

(updated value of the tap-weight vector) = (old value of the tap-weight vector)
    + (learning-rate parameter) x (error signal) x (tap-input vector)

43

Least Mean Squared (LMS) Approach to Adaptive Filtering


If the MSE, e, were available, then the algorithm would use the MSE to compute
the optimum filter coefficients. But in most practical situations the MSE is
unknown or changing, while the instantaneous error e_n is often available.
Note that the MSE is defined as E[(e_n)²], and, ideally, we are interested in the
gradient, or derivative, of the MSE with respect to the adjustable parameters h(k).
Taking the derivative of the MSE with respect to h(k) gives:

∂E[(e_n)²] / ∂h(k) = E[ ∂(e_n)² / ∂h(k) ]     [8-60]

where we have used the property that the differentiation and expectation
operations are interchangeable.
So, since we don't have access to the average MSE, we will drop the E operation
and use the fact that

∂(e_n)² / ∂h(k)     [8-61]

is an unbiased estimate of the gradient [8-60] to approximate the gradient.

44

Least Mean Squared (LMS) Approach to Adaptive Filtering


The LMS algorithm uses the estimated gradient [8-61] to adjust the filter
parameters.
The LMS algorithm adjusts the filter coefficients so that the sum of the squared
errors, which approximates (estimates) the MSE, converges toward this
minimum. The LMS algorithm uses a recursive gradient method known as the
steepest-descent method for finding the filter coefficients that produce the
minimum sum of squared errors. A modified steepest-descent algorithm updates
the adjustable parameters to move in the direction of the negative gradient. The
symbol hn(k) denotes the impulse response coefficient h(k) at the nth iteration of
the LMS algorithm.
Filter coefficients are modified using an estimate of the negative gradient of the
error function with respect to a given h_n(k). This estimate is given by the partial
derivative of the (instantaneous) squared error, e_n², with respect to the
coefficients h_n(k): using the chain rule for differentiation and [8-59] we have

∂e_n² / ∂h_n(k) = 2e(n) ∂[d(n) − y(n)] / ∂h_n(k) = −2e(n) x(n − k)     [8-62]

45

LMS Algorithm (continued)


Using this estimate of the gradient, the LMS algorithm updates
the filter parameters in the direction of the negative gradient. If the
filter parameter h(k) at the n-th iteration of the LMS algorithm is
denoted by h_n(k), the LMS algorithm computes h_{n+1}(k) as

h_{n+1}(k) = h_n(k) + Δ e(n) x(n − k),   k = 1, 2, ..., L and n = 1, 2, ...     [8-63]

where Δ is a constant learning-rate parameter that controls the rate
of descent and convergence to the filter coefficients.
Equation [8-63] can be written as a vector iterative equation as

h_{n+1} = h_n + Δ e(n) x_n,   n = 1, 2, ...     [8-64]

where x_n is an L x 1 column vector whose m-th entry is x(n − m).

46

Example 2: Applying the LMS algorithm to a system identification task.
The unknown system will be an all-zero linear process with a digital
transfer function of:
H(z) = 0.5 + 0.75 z⁻¹ + 1.2 z⁻²
Confirm the match by plotting the magnitude of the transfer function for
both the unknown and matching systems.

b_unknown = [.5 .75 1.2];      % Define unknown process
xn = randn(1,N);
xd = conv(b_unknown,xn);       % Generate unknown system output
xd = xd(3:N+2);                % Truncate extra points (symmetrically)
%
% Apply Wiener filter
b = wiener_hopf(xn,xd,L);      % Compute matching filter coefficients
b = b/N;                       % Scale filter coefficients
... Calculate and plot frequency characteristics

47

Example 2: Results

Original coefficients: 0.5, 0.75, 1.2
Identified coefficients: 0.44, 0.67, 1.1

The identified transfer function and coefficients closely matched those of
the unknown system.
In this example, the unknown system is an all-zero system, so the match by
an FIR filter was quite close. A system containing both poles and zeros would
be more difficult to match.

[Figure: magnitude |H(z)| versus frequency (Hz) for the unknown process and
the matching process; the two curves are nearly identical.]

48

Adaptive Noise Cancellation


Adaptive noise cancellation requires a reference signal that contains
components of the noise, but not the signal. The reference channel
carries a signal N′(n) that is correlated with the noise N(n), but not
with the signal of interest, x(n). The adaptive filter produces an
output N*(n) that minimizes the overall output. Since the adaptive
filter has no access to the signal x(n), it can only reduce the overall
output by minimizing the noise in this output.

[Block diagram: the signal channel carries x(n) + N(n); the reference channel
carries N′(n), which drives the adaptive filter to produce N*(n); the error
signal, which is also the desired output, is e(n) = x(n) + N(n) − N*(n).]

49

Adaptive Line Enhancement (ALE)


A reference signal is not necessary to separate narrowband from
broadband signals. In Adaptive Line Enhancement, broadband and
narrowband signals are separated by a delay: only narrowband signals will be
correlated with delayed versions of themselves. The error signal contains both
broadband and narrowband components, but the filter can reduce only the
narrowband components. Hence the adaptive filter output contains the filtered
narrowband signal. The decorrelation delay must be chosen with care.

[Block diagram: the input B(n) + Nb(n) passes through a decorrelation delay D
into an adaptive FIR filter, whose output Nb*(n) is the narrowband signal
(adaptive line enhancement); the error signal e(n) = B(n) + Nb(n) − Nb*(n) is
the broadband signal (interference suppression).]

50

Example 3: Given the same sinusoidal signal in noise as used in
Example 1, design an adaptive filter to remove the noise. Just as in
Example 1, assume that you have a copy of the desired signal.

% Same initial lines as in Example 1 .....
% xn is the input signal containing noise
% x is the desired signal (as in Example 1, a noise-free version of the signal)
%
% Calculate convergence parameter
PX = (1/(N+1))* sum(xn.^2);  % Calculate approx. power in xn
delta = a * (1/(10*L*PX));   % Calculate the convergence parameter
%
[b,y] = lms(xn,x,delta,L);   % Apply LMS algorithm (see below)
%
% Plotting identical to Example 1....

The adaptive filter coefficients are determined by the LMS algorithm.

51

LMS Algorithm
The LMS algorithm is implemented in the function lms.
The input is x, the desired signal is d, delta is the
convergence factor, and L is the filter length.

function [b,y,e] = lms(x,d,delta,L)
% Simple function to adjust filter coefficients using the LMS algorithm
% Adjusts filter coefficients, b, to provide the best match between
%   the input, x(n), and a desired waveform, d(n)
% Both waveforms must be the same length
% Uses a standard FIR filter
%
M = length(x);
b = zeros(1,L); y = zeros(1,M);   % Initialize outputs
for n = L:M
   x1 = x(n:-1:n-L+1);            % Select input segment for convolution
   y(n) = b * x1';                % Convolve (multiply) weights with input
   e(n) = d(n) - y(n);            % Calculate error
   b = b + delta*e(n)*x1;         % Adjust weights
end

52

Example 3: Results (SNR = -8 dB; 10 Hz sine)

Application of an adaptive filter using the LMS recursive
algorithm to data containing a single sinusoid (10 Hz) in noise
(SNR = -8 dB). The filter requires the first 0.4 to 0.5 seconds to
adapt (400-500 points), and the frequency characteristics after
adaptation are those of a bandpass filter peaked at the signal
frequency of 10 Hz.

[Figure: three panels - the noisy input x(t) versus time (sec), the output y(t)
after adaptive filtering versus time (sec), and the adaptive filter frequency
plot |H(f)| versus frequency (Hz).]

53

Adaptive Line Enhancement (ALE)


In the next example an ALE filter is constructed using the LMS
algorithm. The desired waveform is just the signal delayed. The
best delay was found empirically to be 5 samples.

delay = 5;                          % Decorrelation delay
a = .075;                           % Convergence gain
%
% Generate data: two sequential sinusoids, 10 & 20 Hz, in noise (SNR = -6)
x = [sig_noise(10,-6,N/2) sig_noise(20,-6,N/2)];
... Plot original signal ...
%
PX = (1/(N+1))* sum(x.^2);          % Calculate waveform power for delta
delta = (1/(10*L*PX)) * a;          % Use 10% of the max. range of delta
%
xd = [x(delay:N) zeros(1,delay-1)]; % Delay signal to decorrelate noise
[b,y] = lms(xd,x,delta,L);          % Apply LMS algorithm
... Plot filtered signal ...

54

Example 4: Results. Unlike a fixed Wiener filter, an adaptive filter can track
changes in a waveform, as shown in this example where two sequential
sinusoids having different frequencies (10 & 20 Hz) are adaptively filtered.

[Figure: two panels versus time (sec) - the noisy input x(t) (10 & 20 Hz,
SNR = -6 dB) and the output y(t) after adaptive filtering.]

55

Example 5: Adaptive Noise Cancellation (ANC). The LMS algorithm
is used with a reference signal to cancel a narrowband interference
signal.
In this application, approximately 1000 samples (2.0 sec) are
required for the filter to adapt correctly.

[Figure: three panels versus time (sec) - the original signal x(t), the signal plus
interference x(t) + n(t), and the output y(t) after adaptive noise cancellation.]

56

Phase Sensitive Detection

Phase Sensitive Detection, also known as Synchronous or


Coherent Detection, is a technique for demodulating
amplitude modulated ( AM ) signals that is also very
effective in reducing noise.
From a frequency domain point of view, the effect of
amplitude modulation is to shift the signal frequencies to
another portion of the spectrum on either side of the
modulating, or carrier, frequency.
Amplitude modulation can be very effective in reducing noise
because it can shift signal frequencies to spectral regions
where noise is minimal.
The application of a narrowband filter centered about the new
frequency range (i.e. the carrier frequency) can then be used to
remove the noise.
A Phase Sensitive Detector functions as a narrowband filter
that tracks the carrier frequency. The bandwidth can be quite
small.
57

Phase Sensitive Detection


[Block diagram: the modulated signal Vm(t) and a phase-shifted carrier Vc*(t)
feed a multiplier; the multiplier output V′(t) passes through a lowpass filter to
give Vout(t).]

Let Vm(t) = A(t) cos(ω_c t) and let Vc(t) = cos(ω_c t) be the carrier signal.
The output of the multiplier, V′(t), is:
V′(t) = Vm(t) cos(ω_c t + θ) = A(t) cos(ω_c t) cos(ω_c t + θ)
      = A(t)/2 [cos(2ω_c t + θ) + cos θ]
After lowpass filtering, the high-frequency term cos(2ω_c t + θ) is removed:
Vout(t) = (A(t)/2) cos θ
and the output is the demodulated signal, A(t) times a constant (cos θ).

58

Phase Sensitive Detection (continued)


Frequency characteristics of a phase sensitive detector: the frequency
response of the lowpass filter is effectively reflected about the carrier
frequency, producing a bandpass filter that tracks the carrier frequency. By
making the cutoff frequency small, the bandwidth BW of the virtual bandpass
filter can be very narrow.

[Figure: magnitude response versus frequency (Hz), showing a narrow
passband of width BW centered at the carrier frequency fc.]

59

Example 6: Using Phase Sensitive Detection to demodulate a
signal amplitude modulated with a 5 Hz sawtooth wave. The AM
signal is buried in -10 dB noise. The filter is chosen as a second-order
Butterworth lowpass filter with a cutoff frequency set for best noise
rejection while still providing reasonable fidelity to the sawtooth
waveform.

wn = .02;                            % Lowpass filter cutoff frequency
[b,a] = butter(2,wn);                % Design lowpass filter
%
% Phase sensitive detection
ishift = fix(.125 * fs/fc);          % Shift carrier by 1/4 period
vc = [vc(ishift:N) vc(1:ishift-1)];  %   using a periodic shift
v1 = vc .* vm;                       % Multiplier
vout = filter(b,a,v1);               % Apply lowpass filter

60

Example 6: Results - Phase Sensitive Detection

[Figure: three panels versus time (sec) - the modulated signal Vm(t), the
modulated signal with noise, and the demodulated signal.]

61

Kalman Filter
The Kalman filter is a recursive (iterative) time-domain data-processing
algorithm that solves the same problems as the Wiener filter. The Kalman
filter can also be made adaptive, but we will not cover this topic (I do in my
Digital Communications course).
Generates the optimal estimate of desired quantities given the set of
measurements (estimation, prediction, interpolation, smoothing)
Optimal filtering for a linear system with white Gaussian errors: the
Kalman filter is the best estimate based on all previous
measurements
Recursive/iterative: does not need to store all previous measurements and
reprocess all data at each time step.
The Kalman algorithmic approach can be viewed as two steps: (1)
prediction and then (2) correction.
62

Kalman Algorithm System Model

[Block diagram: the input and the system-model noise drive the system
dynamics (the "black box" system model); the system state passes through an
output device and, corrupted by measurement noise, becomes the observed
output, which feeds the Kalman filter/estimator to produce the optimal
estimate of the system state.]

63

Kalman Filter Overview---Discrete Time


The system state process y_k that is to be estimated is modelled by the following
difference equations:

y_k = A y_{k−1} + B u_k + w_{k−1}     [8-65a]
z_k = H y_k + v_k     [8-66a]

where the system state process is denoted by y_k, with state and input matrices
A and B and output (measurement) matrix H, all assumed known. The model
noise is w_k. The process z_k is the observable system output (filtered signal +
noise) and the process u_k is the system input. The model noise w_k has
covariance Q, the measurement noise v_k has covariance R, and P denotes the
prediction-error covariance matrix.
The Kalman filter algorithm is a two-step process: prediction and correction.
1. Prediction: ŷ⁻_k is an estimate based on the measurements at previous time
steps that follows the system dynamics above

ŷ⁻_k = A ŷ_{k−1} + B u_k     [8-67a]
P⁻_k = A P_{k−1} Aᵀ + Q     [8-67b]

2. Correction: ŷ_k has the additional information of the measurement at time k

ŷ_k = ŷ⁻_k + K_k (z_k − H ŷ⁻_k)     [8-68a]
P_k = (I − K_k H) P⁻_k,   where   K_k = P⁻_k Hᵀ (H P⁻_k Hᵀ + R)⁻¹     [8-68b]

64

Blending Factor
If we are sure about the measurements:
The measurement error covariance R decreases to zero
The Kalman gain K_k increases and weights the residual more heavily than
the prediction

If we are sure about the prediction:
The prediction error covariance P⁻_k decreases to zero
The Kalman gain K_k decreases and weights the prediction more heavily
than the residual

65

Kalman Filter Summary

Prediction (Time Update):
(1) Project the state ahead:
ŷ⁻_k = A ŷ_{k−1} + B u_k
(2) Project the error covariance ahead:
P⁻_k = A P_{k−1} Aᵀ + Q

Correction (Measurement Update):
(1) Compute the Kalman gain:
K_k = P⁻_k Hᵀ (H P⁻_k Hᵀ + R)⁻¹
(2) Update the estimate with measurement z_k:
ŷ_k = ŷ⁻_k + K_k (z_k − H ŷ⁻_k)
(3) Update the error covariance:
P_k = (I − K_k H) P⁻_k

66

Example: Constant System Model

Prediction (the system model has no input and no model noise):
ŷ⁻_k = ŷ_{k−1}
P⁻_k = P_{k−1}
Correction:
K_k = P⁻_k (P⁻_k + R)⁻¹
ŷ_k = ŷ⁻_k + K_k (z_k − ŷ⁻_k)
P_k = (I − K_k) P⁻_k

67
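A minimal MATLAB sketch (not from the text) of this scalar constant-model Kalman filter, estimating an assumed constant from noisy measurements; the true value, the noise covariance R, and the initial conditions are illustrative assumptions.

truth = -0.4;                       % constant to be estimated (assumed)
R = 0.1;                            % measurement noise covariance (assumed known)
Nk = 100;
z = truth + sqrt(R)*randn(1,Nk);    % noisy measurements z_k
y = 0;  P = 1;                      % initial estimate and error covariance
yhat = zeros(1,Nk);  Pk = zeros(1,Nk);
for k = 1:Nk
   ym = y;  Pm = P;                 % prediction: y-_k = y_{k-1}, P-_k = P_{k-1}
   K  = Pm/(Pm + R);                % Kalman gain
   y  = ym + K*(z(k) - ym);         % correction with measurement z_k
   P  = (1 - K)*Pm;                 % error covariance update
   yhat(k) = y;  Pk(k) = P;
end
plot(1:Nk, yhat)                    % estimate converging toward the constant

The plots on the following slides show this behavior: the estimate settles near the constant and the error covariance P_k decreases monotonically.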

Example: Constant System Model

[Figure: the Kalman estimate ŷ_k versus iteration k (1 to 100), converging
toward the constant being estimated.]

68

Example: Constant Model

[Figure: convergence of the error covariance P_k versus iteration k (1 to 100),
decreasing from its initial value toward zero.]

69

Example: Constant Model

A larger value of R, the measurement error covariance (indicating
poorer-quality measurements), makes the filter slower to believe the
measurements, and hence slower to converge.

[Figure: the Kalman estimate ŷ_k versus iteration k (1 to 100) with the larger R,
converging more slowly.]

70

Comparing the Least-Squares (Kalman) and the Least Mean Square Error
(Wiener) Approaches

Wiener filter: minimizes E[e_n²]     (8-69a)
The least-mean-square (LMS) criterion is statistical.
The error criterion is not an explicit function of the data, but depends
only on statistics.

Kalman filter: minimizes Σ_{n=0}^{N} w^{N−n} e_n²     (8-69b)
The least-squares (Kalman) error criterion is an explicit function of
the signal samples.
To track variations in the channel, the weighting factor w [< 1] is
used to determine the relative importance of the past errors.
The equalizer coefficients are chosen to minimize the least-squares
error at each sampling time n.

71

Least-Squares/Kalman Algorithm: For an FIR Filter

LSE_N = Σ_{n=0}^{N} w^{N−n} e_n² = Σ_{n=0}^{N} w^{N−n} (z_n − d_n)²
      = Σ_{n=0}^{N} w^{N−n} (r_n' c_n − d_n)²     (8-70a)

Solving the above gives:   c_n = A_n⁻¹ ρ_n     (8-70b)

where   A_n = Σ_{k=0}^{n} w^{n−k} r_k r_k' + δ I   [δ is the noise power]     (8-70c)
            = w A_{n−1} + r_n r_n'     (8-70d)

and   ρ_n = Σ_{k=0}^{n} w^{n−k} r_k d_k = w ρ_{n−1} + r_n d_n     (8-70e)

Here r_n is the vector of current filter inputs, c_n the vector of filter (tap)
coefficients, z_n = r_n' c_n the filter output, and d_n the desired response.

Challenge: given c_n, how do we find c_{n+1}?

72

Least-Squares/Kalman Algorithm: For a Tapped-Delay-Line Equalizer (continued)

The key result that we use to derive the Kalman (RLS) algorithm is the
Matrix Inversion Lemma, which determines A_n⁻¹ from A_{n−1}⁻¹:

A_n⁻¹ = w⁻¹ { A_{n−1}⁻¹ − A_{n−1}⁻¹ r_n r_n' A_{n−1}⁻¹ / (w + r_n' A_{n−1}⁻¹ r_n) }     (8-71a)

Letting

D_n = A_n⁻¹     (8-71b)

k_n = D_{n−1} r_n / (w + μ_n)   [k_n denotes the Kalman gain]     (8-71c)

μ_n = r_n' D_{n−1} r_n     (8-71d)

it can be shown that

c_{n+1} = c_n + k_n e_n     (8-71e)

D_n = w⁻¹ [D_{n−1} − k_n r_n' D_{n−1}]     (8-71f)

This algorithm takes "big" steps in the direction of the Kalman gain to iteratively
realize the optimum tap setting at each time instant [based upon the received
samples up to "n"]. The algorithm is effectively using the Gram-Schmidt
orthogonalization technique to realize c_opt from the successive input vectors {r_n}.
The Kalman algorithm converges in ~N iterations!
73
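A minimal MATLAB sketch (not from the text) of the recursive update (8-71c)-(8-71f), used here to identify the 3-tap FIR system of Example 2; the forgetting factor w, the initialization D_0 = 100 I, and the noise level are illustrative assumptions.

b_true = [0.5 0.75 1.2]';  L = 3;  w = 0.99;  % taps to identify (from Example 2), forgetting factor
N = 2000;
x = randn(1,N);
d = filter(b_true,1,x) + 0.01*randn(1,N);     % desired signal: system output plus noise
c = zeros(L,1);  D = 100*eye(L);              % initial taps c_0 and inverse matrix D_0
for n = L:N
   r = x(n:-1:n-L+1)';                        % current input (regressor) vector r_n
   e = d(n) - r'*c;                           % a priori error e_n
   k = (D*r)/(w + r'*D*r);                    % Kalman gain k_n, Eqs. (8-71c)-(8-71d)
   c = c + k*e;                               % tap update, Eq. (8-71e)
   D = (D - k*(r'*D))/w;                      % D_n from the matrix inversion lemma, Eq. (8-71f)
end
disp([c b_true])                              % recovered taps vs. true taps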
