
Least Squares with Examples in Signal Processing
Ivan Selesnick, NYU-Poly, March 7, 2013 (last edit: November 26, 2013)
For feedback/corrections, email selesi@poly.edu. Matlab software to reproduce the examples in these notes is available on the web or from the author: http://eeweb.poly.edu/iselesni/lecture_notes/

These notes address (approximate) solutions to linear equations by least squares. We deal with the easy case wherein the system matrix is full rank. If the system matrix is rank deficient, then other methods are needed, e.g., QR decomposition, singular value decomposition, or the pseudo-inverse [2, 3].

In these notes, least squares is illustrated by applying it to several basic problems in signal processing:

1. Linear prediction
2. Smoothing
3. Deconvolution
4. System identification
5. Estimating missing data

For the use of least squares in filter design, see [1].

1 Notation

We denote vectors in lower-case bold, i.e.,

    x = [x1, x2, ..., xN]^T.                                        (1)

We denote matrices in upper-case bold. The transpose of a vector or matrix is indicated by a superscript T, i.e., x^T is the transpose of x.

The notation ||x||_2 refers to the Euclidean length of the vector x, i.e.,

    ||x||_2 = sqrt( |x1|^2 + |x2|^2 + ... + |xN|^2 ).               (2)

The sum of squares of x is denoted by ||x||_2^2, i.e.,

    ||x||_2^2 = Σ_n |x(n)|^2 = x^T x.                               (3)

The energy of a vector x refers to ||x||_2^2.

In these notes, it is assumed that all vectors and matrices are real-valued. In the complex-valued case, the conjugate transpose should be used in place of the transpose, etc.

2 Overdetermined equations

Consider the system of linear equations

    y = Hx.

If there is no solution to this system of equations, then the system is overdetermined. This frequently happens when H is a 'tall' matrix (more rows than columns) with linearly independent columns.

In this case, it is common to seek a solution x minimizing the energy of the error:

    J(x) = ||y - Hx||_2^2.

Expanding J(x) gives

    J(x) = (y - Hx)^T (y - Hx)                                      (4)
         = y^T y - y^T Hx - x^T H^T y + x^T H^T H x                 (5)
         = y^T y - 2 y^T Hx + x^T H^T H x.                          (6)

Note that each of the four terms in (5) is a scalar. Note also that the scalar x^T H^T y is the transpose of the scalar y^T Hx, and hence x^T H^T y = y^T Hx.

Taking the derivative (see Appendix A) gives

    ∂/∂x J(x) = -2 H^T y + 2 H^T H x.

Setting the derivative to zero,

    ∂/∂x J(x) = 0   ⟹   H^T H x = H^T y.
Let us assume that H^T H is invertible. Then the solution is given by

    x = (H^T H)^{-1} H^T y.

This is the least squares solution. In summary,

    min_x ||y - Hx||_2^2   ⟹   x = (H^T H)^{-1} H^T y.              (7)

In some situations, it is desirable to minimize the weighted square error, i.e., Σ_n w_n r_n^2 where r is the residual, or error, r = y - Hx, and w_n are positive weights. This corresponds to minimizing ||W^{1/2} (y - Hx)||_2^2 where W is the diagonal matrix, [W]_{n,n} = w_n. Using (7) gives

    min_x ||W^{1/2} (y - Hx)||_2^2   ⟹   x = (H^T W H)^{-1} H^T W y,    (8)

where we have used the fact that W is symmetric.
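As a concrete illustration, here is a minimal Matlab sketch of (7) and (8); the variable names (H, y, w) are illustrative and H is assumed tall with full column rank.

    % Least squares solution of an overdetermined system, eq. (7).
    x = (H'*H) \ (H'*y);      % solve the normal equations H'H x = H'y
    % x = H \ y;              % equivalent and numerically preferable (QR-based)

    % Weighted least squares, eq. (8), with positive weights in the vector w:
    W  = diag(w);
    xw = (H'*W*H) \ (H'*W*y);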

3 Underdetermined equations

Consider the system of linear equations

    y = Hx.

If there are many solutions, then the system is underdetermined. This frequently happens when H is a 'wide' matrix (more columns than rows) with linearly independent rows.

In this case, it is common to seek a solution x with minimum norm. That is, we would like to solve the optimization problem

    min_x ||x||_2^2                                                 (9)
    such that y = Hx.                                              (10)

Minimization with constraints can be done with Lagrange multipliers. So, define the Lagrangian:

    L(x, μ) = ||x||_2^2 + μ^T (y - Hx).

Take the derivatives of the Lagrangian:

    ∂/∂x L = 2x - H^T μ
    ∂/∂μ L = y - Hx.

Set the derivatives to zero to get:

    x = (1/2) H^T μ                                                (11)
    y = Hx.                                                        (12)

Plugging (11) into (12) gives

    y = (1/2) H H^T μ.

Let us assume H H^T is invertible. Then

    μ = 2 (H H^T)^{-1} y.                                          (13)

Plugging (13) into (11) gives the least squares solution:

    x = H^T (H H^T)^{-1} y.

We can verify that x in this formula does in fact satisfy y = Hx by plugging in:

    Hx = H [ H^T (H H^T)^{-1} y ] = (H H^T)(H H^T)^{-1} y = y.  ✓

So,

    min_x ||x||_2^2  s.t.  y = Hx   ⟹   x = H^T (H H^T)^{-1} y.    (14)

In some situations, it is desirable to minimize the weighted energy, i.e., Σ_n w_n x_n^2, where w_n are positive weights. This corresponds to minimizing ||W^{1/2} x||_2^2 where W is the diagonal matrix, [W]_{n,n} = w_n. The derivation of the solution is similar, and gives

    min_x ||W^{1/2} x||_2^2  s.t.  y = Hx   ⟹   x = W^{-1} H^T ( H W^{-1} H^T )^{-1} y.    (15)

This solution is also derived below, see (25).
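A minimal Matlab sketch of the minimum-norm solution (14), assuming H is wide with full row rank:

    % Minimum-norm solution of an underdetermined system, eq. (14).
    x = H' * ((H*H') \ y);
    % For full row rank H this agrees with x = pinv(H)*y.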
4 Regularization

In the overdetermined case, we minimized ||y - Hx||_2^2. In the underdetermined case, we minimized ||x||_2^2. Another approach is to minimize the weighted sum: c1 ||y - Hx||_2^2 + c2 ||x||_2^2. The solution x depends on the ratio c2/c1, not on c1 and c2 individually.

A common approach to obtain an inexact solution to a linear system is to minimize the objective function:

    J(x) = ||y - Hx||_2^2 + λ ||x||_2^2                            (16)

where λ > 0. Taking the derivative, we get

    ∂/∂x J(x) = 2 H^T (Hx - y) + 2 λ x.

Setting the derivative to zero,

    ∂/∂x J(x) = 0   ⟹   H^T H x + λ x = H^T y
                    ⟹   (H^T H + λ I) x = H^T y.

So the solution is given by

    x = (H^T H + λ I)^{-1} H^T y.

So,

    min_x ||y - Hx||_2^2 + λ ||x||_2^2   ⟹   x = (H^T H + λ I)^{-1} H^T y.    (17)

This is referred to as 'diagonal loading' because a constant, λ, is added to the diagonal elements of H^T H. The approach also avoids the problem of rank deficiency because H^T H + λ I is invertible even if H^T H is not. In addition, the solution (17) can be used in both cases: when H is tall and when H is wide.
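A minimal Matlab sketch of (17); the value of lam (λ) is illustrative and must be chosen by the user:

    % Regularized (diagonally loaded) least squares, eq. (17).
    lam = 0.1;                           % illustrative value
    N = size(H, 2);
    x = (H'*H + lam*eye(N)) \ (H'*y);    % works whether H is tall or wide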

5 Weighted regularization

A more general form of the regularized objective function (16) is:

    J(x) = ||y - Hx||_2^2 + λ ||Ax||_2^2

where λ > 0. Taking the derivative, we get

    ∂/∂x J(x) = 2 H^T (Hx - y) + 2 λ A^T A x.

Setting the derivative to zero,

    ∂/∂x J(x) = 0   ⟹   H^T H x + λ A^T A x = H^T y
                    ⟹   (H^T H + λ A^T A) x = H^T y.

So the solution is given by

    x = (H^T H + λ A^T A)^{-1} H^T y.

So,

    min_x ||y - Hx||_2^2 + λ ||Ax||_2^2   ⟹   x = (H^T H + λ A^T A)^{-1} H^T y.    (18)

Note that if A is the identity matrix, then equation (18) becomes (17).

6 Constrained least squares

Constrained least squares refers to the problem of finding a least squares solution that exactly satisfies additional constraints. If the additional constraints are a set of linear equations, then the solution is obtained as follows.

The constrained least squares problem is of the form:

    min_x ||y - Hx||_2^2                                           (19)
    such that Cx = b.                                              (20)

Define the Lagrangian,

    L(x, λ) = ||y - Hx||_2^2 + λ^T (Cx - b).

The derivatives are:

    ∂/∂x L = 2 H^T (Hx - y) + C^T λ
    ∂/∂λ L = Cx - b.

Setting the derivatives to zero,

    ∂/∂x L = 0   ⟹   x = (H^T H)^{-1} (H^T y - 0.5 C^T λ)          (21)
    ∂/∂λ L = 0   ⟹   Cx = b.                                       (22)

Multiplying (21) on the left by C gives Cx, which from (22) is b, so we have

    C (H^T H)^{-1} (H^T y - 0.5 C^T λ) = b

or, expanding,

    C (H^T H)^{-1} H^T y - 0.5 C (H^T H)^{-1} C^T λ = b.

Solving for λ gives

    λ = 2 [ C (H^T H)^{-1} C^T ]^{-1} ( C (H^T H)^{-1} H^T y - b ).

Plugging λ into (21) gives

    x = (H^T H)^{-1} ( H^T y - C^T [ C (H^T H)^{-1} C^T ]^{-1} ( C (H^T H)^{-1} H^T y - b ) ).

Let us verify that x in this formula does in fact satisfy Cx = b:

    Cx = C (H^T H)^{-1} H^T y
         - C (H^T H)^{-1} C^T [ C (H^T H)^{-1} C^T ]^{-1} ( C (H^T H)^{-1} H^T y - b )
       = C (H^T H)^{-1} H^T y - ( C (H^T H)^{-1} H^T y - b )
       = b.  ✓

So,

    min_x ||y - Hx||_2^2  s.t.  Cx = b   ⟹
    x = (H^T H)^{-1} ( H^T y - C^T [ C (H^T H)^{-1} C^T ]^{-1} ( C (H^T H)^{-1} H^T y - b ) ).    (23)
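A minimal Matlab sketch of the constrained solution (23); it avoids forming explicit inverses and assumes H is tall with full column rank and C has full row rank:

    % Constrained least squares, eq. (23): min ||y - Hx||^2 s.t. Cx = b.
    G  = H'*H;
    xu = G \ (H'*y);                        % unconstrained least squares solution
    M  = C * (G \ C');                      % C (H'H)^{-1} C'
    x  = xu - G \ (C' * (M \ (C*xu - b)));  % correct xu so that Cx = b exactly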
6.1 Special cases

Simpler forms of (23) are frequently useful. For example, if H = I and b = 0 in (23), then we get

    min_x ||y - x||_2^2  s.t.  Cx = 0   ⟹   x = y - C^T (C C^T)^{-1} C y.    (24)

If y = 0 in (23), then we get

    min_x ||Hx||_2^2  s.t.  Cx = b   ⟹   x = (H^T H)^{-1} C^T [ C (H^T H)^{-1} C^T ]^{-1} b.    (25)

If y = 0 and H = I in (23), then we get

    min_x ||x||_2^2  s.t.  Cx = b   ⟹   x = C^T (C C^T)^{-1} b,              (26)

which is the same as (14).
7 Note

The expressions above involve matrix inverses. For example, (7) involves (H^T H)^{-1}. However, it must be emphasized that finding the least squares solution does not require computing the inverse of H^T H, even though the inverse appears in the formula. Instead, x in (7) should be obtained, in practice, by solving the system Ax = b where A = H^T H and b = H^T y. The most direct way to solve a linear system of equations is by Gaussian elimination. Gaussian elimination is much faster than computing the inverse of the matrix A.
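In Matlab, for example, this means using the backslash operator rather than inv; a minimal sketch:

    % Solve the normal equations directly, without forming an inverse.
    A = H'*H;
    b = H'*y;
    x = A \ b;        % preferred: direct solve (Gaussian elimination)
    % x = inv(A)*b;   % avoid: slower and less accurate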
8 Examples

8.1 Polynomial approximation

An important example of least squares is fitting a low-order polynomial to data. Suppose the N-point data is of the form (t_i, y_i) for 1 ≤ i ≤ N. The goal is to find a polynomial that approximates the data by minimizing the energy of the residual:

    E = Σ_i ( y_i - p(t_i) )^2

where p(t) is a polynomial, e.g.,

    p(t) = a0 + a1 t + a2 t^2.

The problem can be viewed as solving the overdetermined system of equations

    [ y1 ]   [ 1  t1  t1^2 ]
    [ y2 ]   [ 1  t2  t2^2 ] [ a0 ]
    [ :  ] ≈ [ :   :    :  ] [ a1 ]
    [ yN ]   [ 1  tN  tN^2 ] [ a2 ]

which we denote as y ≈ Ha. The energy of the residual, E, is written as

    E = ||y - Ha||_2^2.

From (7), the least squares solution is given by a = (H^T H)^{-1} H^T y. An example is illustrated in Fig. 1.

Figure 1: Least squares polynomial approximation. (Panels: the data; polynomial approximations of degree 2 and degree 4.)
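A minimal Matlab sketch of the fit in Fig. 1, assuming column vectors t and y of data are given:

    % Least squares fit of a degree-2 polynomial to data (t, y).
    H = [ones(size(t)), t, t.^2];     % N x 3 system matrix
    a = (H'*H) \ (H'*y);              % coefficients [a0; a1; a2]
    p = H*a;                          % polynomial evaluated at the data points
    plot(t, y, 'o', t, p, '-')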

8.2 Linear prediction

One approach to predict future values of a time-series is based on linear prediction, e.g.,

    y(n) ≈ a1 y(n-1) + a2 y(n-2) + a3 y(n-3).                      (27)

If past data y(n) is available, then the problem of finding the a_i can be solved using least squares. Finding a = (a1, a2, a3)^T can be viewed as solving an overdetermined system of equations. For example, if y(n) is available for 0 ≤ n ≤ N-1, and we seek a third-order linear predictor, then the overdetermined system of equations is given by

    [ y(3)   ]   [ y(2)    y(1)    y(0)   ]
    [ y(4)   ]   [ y(3)    y(2)    y(1)   ] [ a1 ]
    [  :     ] ≈ [  :       :       :    ] [ a2 ]
    [ y(N-1) ]   [ y(N-2)  y(N-3)  y(N-4) ] [ a3 ]

which we can write as ȳ ≈ Ha where H is a matrix of size (N-3) × 3. From (7), the least squares solution is given by a = (H^T H)^{-1} H^T ȳ. Note that H^T H is small, of size 3 × 3 only. Hence, a is obtained by solving a small linear system of equations.
Once the coefficients a_i are found, then y(n) for n > N can be estimated using the recursive difference equation (27).

An example is illustrated in Fig. 2. One hundred samples of data are available, i.e., y(n) for 0 ≤ n ≤ 99. From these 100 samples, a p-th order linear predictor is obtained by least squares, and the subsequent 100 samples are predicted.

Figure 2: Least squares linear prediction. (Panels: the given data to be predicted; predicted samples for p = 3, p = 4, and p = 6.)
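A minimal Matlab sketch of third-order linear prediction, assuming a column vector y of N past samples (Matlab indices 1..N correspond to n = 0..N-1):

    % Estimate the predictor coefficients by least squares, eq. (27).
    N = length(y);
    b = y(4:N);                            % left-hand side
    H = [y(3:N-1), y(2:N-2), y(1:N-3)];    % (N-3) x 3 matrix of past samples
    a = (H'*H) \ (H'*b);

    % Predict the next 100 samples with the recursive difference equation.
    yp = [y; zeros(100,1)];
    for n = N+1 : N+100
        yp(n) = a(1)*yp(n-1) + a(2)*yp(n-2) + a(3)*yp(n-3);
    end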
8.3 Smoothing

One approach to smooth a noisy signal is based on least squares weighted regularization. The idea is to obtain a signal similar to the noisy one, but smoother. The smoothness of a signal can be measured by the energy of its derivative (or second-order derivative). The smoother a signal is, the smaller the energy of its derivative is.

Define the matrix D as

        [ 1  -2   1                 ]
    D = [     1  -2   1             ]                              (28)
        [           ...             ]
        [               1  -2   1   ]

Then Dx is the second-order difference (a discrete form of the second-order derivative) of the signal x(n). See Appendix B. If x is smooth, then ||Dx||_2^2 is small in value.

If y(n) is a noisy signal, then a smooth signal x(n), that approximates y(n), can be obtained as the solution to the problem:

    min_x ||y - x||_2^2 + λ ||Dx||_2^2                             (29)

where λ > 0 is a parameter to be specified. Minimizing ||y - x||_2^2 forces x to be similar to the noisy signal y. Minimizing ||Dx||_2^2 forces x to be smooth. Minimizing the sum in (29) forces x to be both similar to y and smooth (as far as possible, and depending on λ).

If λ = 0, then the solution will be the noisy data, i.e., x = y, because this solution makes (29) equal to zero. In this case, no smoothing is achieved. On the other hand, the greater λ is, the smoother x will be.

Using (18), the signal x minimizing (29) is given by

    x = (I + λ D^T D)^{-1} y.                                      (30)

Note that the matrix I + λ D^T D is banded. (The only non-zero values are near the main diagonal.) Therefore, the solution can be obtained using fast solvers for banded systems [5, Sect. 2.4].

An example of least squares smoothing is illustrated in Fig. 3. A noisy ECG signal is smoothed using (30). We have used the ECG waveform generator ECGSYN [4].

Figure 3: Least squares smoothing. (Panels: noisy data; the least squares smoothed signal.)
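A minimal Matlab sketch of (30) using sparse matrices, so the banded system is solved efficiently; y is the noisy signal and the value of lam is illustrative:

    % Least squares smoothing, eq. (30), with the sparse second-order difference matrix (28).
    N = length(y);
    e = ones(N,1);
    D = spdiags([e -2*e e], 0:2, N-2, N);   % (N-2) x N, each row is [1 -2 1]
    lam = 10;                                % illustrative value
    x = (speye(N) + lam*(D'*D)) \ y;         % sparse solve exploits the banded structure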
8.4 Deconvolution

Deconvolution refers to the problem of finding the input to an LTI system when the output signal is known. Here, we assume the impulse response of the system is known. The output, y(n), is given by

    y(n) = h(0) x(n) + h(1) x(n-1) + ... + h(N) x(n-N)             (31)

where x(n) is the input signal and h(n) is the impulse response. Equation (31) can be written as y = Hx where H is a matrix of the form

        [ h(0)                  ]
    H = [ h(1)  h(0)            ]
        [ h(2)  h(1)  h(0)      ]
        [   :     :     :   ... ]

This matrix is constant-valued along its diagonals. Such matrices are called Toeplitz matrices.

It may be expected that x can be obtained from y by solving the linear system y = Hx. In some situations, this is possible. However, the matrix H is often singular or almost singular. In this case, Gaussian elimination encounters division by zeros.

For example, Fig. 4 illustrates an input signal, x(n), an impulse response, h(n), and the output signal, y(n). When we attempt to obtain x by solving y = Hx in Matlab, we receive the warning message 'Matrix is singular to working precision' and we obtain a vector of all NaN (not a number).

Due to H being singular, we regularize the problem. Note that the input signal in Fig. 4 is mostly zero; hence, it is reasonable to seek a solution x with small energy. The signal x we seek should also satisfy y = Hx, at least approximately. To obtain such a signal, x, we solve the problem:

    min_x ||y - Hx||_2^2 + λ ||x||_2^2                             (32)

where λ > 0 is a parameter to be specified. Minimizing ||y - Hx||_2^2 forces x to be consistent with the output signal y. Minimizing ||x||_2^2 forces x to have low energy. Minimizing the sum in (32) forces x to be consistent with y and to have low energy (as far as possible, and depending on λ). Using (18), the signal x minimizing (32) is given by

    x = (H^T H + λ I)^{-1} H^T y.                                  (33)

This technique is called diagonal loading because λ is added to the diagonal of H^T H. A small value of λ is sufficient to make the matrix invertible. The solution, illustrated in Fig. 4, is very similar to the original input signal, shown in the same figure.

Figure 4: Deconvolution of noise-free data by diagonal loading (λ = 0.10). (Panels: input signal; impulse response; noise-free output signal; deconvolution result.)

In practice, the available data is also noisy. In this case, the data y is given by y = Hx + w where w is the noise. The noise is often modeled as an additive white Gaussian random signal. In this case, diagonal loading with a small λ will generally produce a noisy estimate of the input signal. In Fig. 4, we used λ = 0.1. When the same value is used with the noisy data, a noisy result is obtained, as illustrated in Fig. 5. A larger λ is needed so as to attenuate the noise. But if λ is too large, then the estimate of the input signal is distorted. Notice that with λ = 1.0, the noise is reduced but the height of the pulses present in the original signal is somewhat attenuated. With λ = 5.0, the noise is reduced slightly more, but the pulses are substantially more attenuated.

Figure 5: Deconvolution of noisy data by diagonal loading. (Panels: noisy output signal; deconvolution results with λ = 0.10, λ = 1.00, and λ = 5.00.)

To improve the deconvolution result in the presence of noise, we can minimize the energy of the derivative (or second-order derivative) of x instead. As in the smoothing example above, minimizing the energy of the second-order derivative forces x to be smooth. In order that x is consistent with the data y and is also smooth, we solve the problem:

    min_x ||y - Hx||_2^2 + λ ||Dx||_2^2                            (34)

where D is the second-order difference matrix (28). Using (18), the signal x minimizing (34) is given by

    x = (H^T H + λ D^T D)^{-1} H^T y.                              (35)

The solution obtained using (35) is illustrated in Fig. 6. Compared to the solutions obtained by diagonal loading, illustrated in Fig. 5, this solution is less noisy and less distorted.

This example illustrates the need for regularization even when the data is noise-free (an unrealistic ideal case). It also illustrates that the choice of regularizer (i.e., ||x||_2^2, ||Dx||_2^2, or other) affects the quality of the result.

Figure 6: Deconvolution of noisy data by derivative regularization (λ = 2.00). (Panels: noisy output signal; deconvolution result.)
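A minimal Matlab sketch of the two deconvolution estimates (33) and (35); h is the known impulse response, y the observed output (of length N + length(h) - 1), and N the assumed length of the input. The λ values 0.1 and 2.0 are the illustrative values used in the figures.

    % Build the convolution (Toeplitz) matrix H so that H*x = conv(h, x).
    h = h(:);
    H = sparse(toeplitz([h; zeros(N-1,1)], [h(1), zeros(1,N-1)]));   % (N+L-1) x N

    e = ones(N,1);
    D = spdiags([e -2*e e], 0:2, N-2, N);        % second-order difference (28)

    x1 = (H'*H + 0.1*speye(N)) \ (H'*y);         % diagonal loading (33), lambda = 0.1
    x2 = (H'*H + 2.0*(D'*D))   \ (H'*y);         % derivative regularization (35), lambda = 2.0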
8.5 System identification

System identification refers to the problem of estimating an unknown system. In its simplest form, the system is LTI and input-output data is available. Here, we assume that the output signal is noisy. We also assume that the impulse response is relatively short.

The output, y(n), of the system can be written as

    y(n) = h0 x(n) + h1 x(n-1) + h2 x(n-2) + w(n)                  (36)

where x(n) is the input signal and w(n) is the noise. Here, we have assumed the impulse response h_n is of length 3. We can write this in matrix form as

    [ y0 ]   [ x0          ]
    [ y1 ]   [ x1  x0      ] [ h0 ]
    [ y2 ] ≈ [ x2  x1  x0  ] [ h1 ]
    [ y3 ]   [ x3  x2  x1  ] [ h2 ]
    [ :  ]   [  :   :   :  ]

which we denote as y ≈ Xh. If y is much longer than the length of the impulse response h, then X is a tall matrix and y ≈ Xh is an overdetermined system of equations. In this case, h can be estimated from (7) as

    h = (X^T X)^{-1} X^T y.                                        (37)

An example is illustrated in Fig. 7. A binary input signal and noisy output signal are shown. When it is assumed that h is of length 10, then we obtain the impulse response shown. The residual, i.e., r = y - Xh, is also shown in the figure. It is informative to plot the root-mean-square error (RMSE), i.e., ||r||_2, as a function of the length of the impulse response. This is a decreasing function. If the data really is the input-output data of an LTI system with a finite impulse response, and if the noise is not too severe, then the RMSE tends to flatten out at the correct impulse response length. This provides an indication of the length of the unknown impulse response.

Figure 7: Least squares system identification. (Panels: input signal; noisy output signal; estimated impulse response of length 10; residual (RMSE = 0.83); RMSE versus impulse response length.)
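A minimal Matlab sketch of (37), assuming column vectors x (input) and y (noisy output) of equal length and an assumed impulse response length L:

    % Least squares system identification, eq. (37).
    L = 10;                                   % assumed impulse response length
    X = toeplitz(x, [x(1), zeros(1, L-1)]);   % N x L data matrix
    h = (X'*X) \ (X'*y);                      % estimated impulse response
    r = y - X*h;                              % residual
    rmse = norm(r);                           % ||r||_2, as plotted vs. L in Fig. 7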
8.6 Missing sample estimation

Due to transmission errors, transient interference, or impulsive noise, some samples of a signal may be lost or so badly corrupted as to be unusable. In this case, the missing samples should be estimated based on the available uncorrupted data. To complicate the problem, the missing samples may be randomly distributed throughout the signal. Filling in missing values in order to conceal errors is called error concealment [6].

This example shows how the missing samples can be estimated by least squares. As an example, Fig. 8 shows a 200-point ECG signal wherein 100 samples are missing. The problem is to fill in the missing 100 samples.

To formulate the problem as a least squares problem, we introduce some notation. Let x be a signal of length N. Suppose K samples of x are known, where K < N. The K-point known signal, y, can be written as

    y = Sx                                                         (38)

where S is a 'selection' (or 'sampling') matrix of size K × N. For example, if only the first, second and last elements of a 5-point signal x are observed, then the matrix S is given by

        [ 1  0  0  0  0 ]
    S = [ 0  1  0  0  0 ].                                         (39)
        [ 0  0  0  0  1 ]

The matrix S is the identity matrix with rows removed, corresponding to the missing samples. Note that Sx removes two samples from the signal x:

         [ 1  0  0  0  0 ] [ x(0) ]   [ x(0) ]
    Sx = [ 0  1  0  0  0 ] [ x(1) ] = [ x(1) ] = y.                (40)
         [ 0  0  0  0  1 ] [ x(2) ]   [ x(4) ]
                           [ x(3) ]
                           [ x(4) ]

The vector y consists of the known samples of x. So, the vector y is shorter than x (K < N).

The problem can be stated as: Given the signal, y, and the matrix, S, find x such that y = Sx. Of course, there are infinitely many solutions. Below, it is shown how to obtain a smooth solution by least squares.

Note that S^T y has the effect of setting the missing samples to zero. For example, with S in (39) we have

            [ 1  0  0 ]            [ y(0) ]
            [ 0  1  0 ] [ y(0) ]   [ y(1) ]
    S^T y = [ 0  0  0 ] [ y(1) ] = [  0   ].                       (41)
            [ 0  0  0 ] [ y(2) ]   [  0   ]
            [ 0  0  1 ]            [ y(2) ]

Let us define S_c as the 'complement' of S. The matrix S_c consists of the rows of the identity matrix not appearing in S. Continuing the 5-point example,

    S_c = [ 0  0  1  0  0 ].                                       (42)
          [ 0  0  0  1  0 ]

Now, an estimate x̂ can be written as

    x̂ = S^T y + S_c^T v                                            (43)

where y is the available data and v consists of the samples to be determined. For example,

                      [ 1  0  0 ]            [ 0  0 ]            [ y(0) ]
                      [ 0  1  0 ] [ y(0) ]   [ 0  0 ] [ v(0) ]   [ y(1) ]
    S^T y + S_c^T v = [ 0  0  0 ] [ y(1) ] + [ 1  0 ] [ v(1) ] = [ v(0) ].   (44)
                      [ 0  0  0 ] [ y(2) ]   [ 0  1 ]            [ v(1) ]
                      [ 0  0  1 ]            [ 0  0 ]            [ y(2) ]

The problem is to estimate the vector v, which is of length N - K.

Let us assume that the original signal, x, is smooth. Then it is reasonable to find v to optimize the smoothness of x̂, i.e., to minimize the energy of the second-order derivative of x̂. Therefore, v can be obtained by minimizing ||D x̂||_2^2 where D is the second-order difference matrix (28). Using (43), we find v by solving the problem

    min_v || D (S^T y + S_c^T v) ||_2^2,                           (45)

i.e.,

    min_v || D S^T y + D S_c^T v ||_2^2.                           (46)

From (7), the solution is given by

    v = - ( S_c D^T D S_c^T )^{-1} S_c D^T D S^T y.                (47)

Once v is obtained, the estimate x̂ in (43) can be constructed simply by inserting the entries v(i) into y.

An example of least squares estimation of missing data using (47) is illustrated in Fig. 8. The result is a smoothly interpolated signal.

Figure 8: Least squares estimation of missing data. (Panels: signal with 100 missing samples; estimated samples; final output.)
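A minimal Matlab sketch of (43) and (47); the index vector k of the known samples (with y = x(k)) is an assumed input, and all matrices are kept sparse so the banded system solves quickly:

    % Least squares estimation of missing samples, eqs. (43) and (47).
    N  = 200;                                % total signal length (illustrative)
    I  = speye(N);
    S  = I(k, :);                            % selection matrix, K x N
    Sc = I(setdiff(1:N, k), :);              % complement, (N-K) x N
    e  = ones(N,1);
    D  = spdiags([e -2*e e], 0:2, N-2, N);   % second-order difference (28)
    v  = -(Sc*(D'*D)*Sc') \ (Sc*(D'*D)*(S'*y));   % eq. (47)
    xhat = S'*y + Sc'*v;                     % eq. (43)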
We make several remarks.

1. All matrices in (47) are banded, so the computation of v can be implemented with very high efficiency using a fast solver for banded systems [5, Sect. 2.4]. The banded property of the matrix

       G = S_c D^T D S_c^T                                         (48)

   arising in (47) is illustrated in Fig. 9.

   Figure 9: Visualization of the banded matrix G in (48) (nz = 282). All the non-zero values lie near the main diagonal.

2. The method does not require the pattern of missing samples to have any particular structure. The missing samples can be distributed quite randomly.

3. This method (47) does not require any regularization parameter to be specified. However, this derivation does assume the available data, y, is noise-free. If y is noisy, then simultaneous smoothing and missing sample estimation is required (see the Exercises).

Speech de-clipping: In audio recording, if the amplitude of the audio source is too high, then the recorded waveform may suffer from clipping (i.e., saturation). Figure 10 shows a speech waveform that is clipped. All values greater than 0.2 in absolute value are lost due to clipping.

To estimate the missing data, we can use the least squares approach given by (47). That is, we fill in the missing data so as to minimize the energy of the derivative of the total signal. In this example, we minimize the energy of the third derivative. This encourages the filled-in data to have the form of a parabola (second-order polynomial), because the third derivative of a parabola is zero. In order to use (47) for this problem, we only need to change the matrix D. If we define the matrix D as

        [ 1  -3   3  -1                    ]
    D = [     1  -3   3  -1                ]                       (49)
        [               ...                ]
        [                  1  -3   3  -1   ]

then Dx is an approximation of the third-order derivative of the signal x.

Using (47) with D defined in (49), we obtain the signal shown in Fig. 10. The samples lost due to clipping have been smoothly filled in. Figure 11 shows both the clipped signal and the estimated samples on the same axes.

Figure 10: Estimation of speech waveform samples lost due to clipping (139 missing samples). The lost samples are estimated by least squares. (Panels: clipped speech; estimated samples; final output.)

Figure 11: The available clipped speech waveform is shown in blue. The filled-in signal, estimated by least squares, is shown in red.
9 Exercises

1. Find the solution x to the least squares problem:

       min_x ||y - Ax||_2^2 + λ ||b - x||_2^2.

2. Show that the solution x to the least squares problem

       min_x λ1 ||b1 - A1 x||_2^2 + λ2 ||b2 - A2 x||_2^2 + λ3 ||b3 - A3 x||_2^2

   is

       x = ( λ1 A1^T A1 + λ2 A2^T A2 + λ3 A3^T A3 )^{-1} ( λ1 A1^T b1 + λ2 A2^T b2 + λ3 A3^T b3 ).    (50)

3. In reference to (17), why is H^T H + λ I with λ > 0 invertible even if H^T H is not?

4. Show (56).

5. Smoothing. Demonstrate least squares smoothing of noisy data. Use various values of λ. What behavior do you observe when λ is very high?

6. The second-order difference matrix (28) was used in the examples for smoothing, deconvolution, and estimating missing samples. Discuss the use of the third-order difference instead. Perform numerical experiments and compare results of second- and third-order difference matrices.

7. System identification. Perform a system identification experiment with varying variance of additive Gaussian noise. Plot the RMSE versus impulse response length. How does the plot of RMSE change with respect to the variance of the noise?

8. Speech de-clipping. Record your own speech and use it to artificially create a clipped signal. Perform numerical experiments to test the least squares estimation of the lost samples.

9. Suppose the available data is both noisy and that some samples are missing. Formulate a suitable least squares optimization problem to smooth the data and recover the missing samples. Illustrate the effectiveness by a numerical demonstration (e.g., using Matlab).
A Vector derivatives

If f(x) is a function of x1, ..., xN, then the derivative of f(x) with respect to x is the vector of derivatives,

    ∂f(x)/∂x = [ ∂f(x)/∂x1, ∂f(x)/∂x2, ..., ∂f(x)/∂xN ]^T.         (51)

This is the gradient of f, denoted ∇f. By direct calculation, we have

    ∂/∂x [ x^T b ] = b                                             (52)

and

    ∂/∂x [ b^T x ] = b.                                            (53)

Suppose that A is a symmetric real matrix, A^T = A. Then, by direct calculation, we also have

    ∂/∂x [ x^T A x ] = 2 A x.                                      (54)

Also,

    ∂/∂x [ (y - x)^T A (y - x) ] = 2 A (x - y),                    (55)

and

    ∂/∂x ||Ax - b||_2^2 = 2 A^T (Ax - b).                          (56)

We illustrate (54) by an example. Set A as the 2 × 2 matrix,

    A = [ 3  2 ]                                                   (57)
        [ 2  5 ].

Then, by direct calculation,

    x^T A x = [ x1  x2 ] [ 3  2 ] [ x1 ] = 3 x1^2 + 4 x1 x2 + 5 x2^2,    (58)
                         [ 2  5 ] [ x2 ]

so

    ∂/∂x1 [ x^T A x ] = 6 x1 + 4 x2                                (59)

and

    ∂/∂x2 [ x^T A x ] = 4 x1 + 10 x2.                              (60)

Let us verify that the right-hand side of (54) gives the same:

    2 A x = 2 [ 3  2 ] [ x1 ] = [ 6 x1 + 4 x2  ].                  (61)
              [ 2  5 ] [ x2 ]   [ 4 x1 + 10 x2 ]
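A small Matlab sketch that checks (54) numerically for the matrix in (57), comparing the analytic gradient 2Ax against central finite differences at an arbitrary (hypothetical) point:

    % Numerical check of (54).
    A = [3 2; 2 5];
    x = [1; -2];
    f = @(x) x.'*A*x;
    g_analytic = 2*A*x;                          % right-hand side of (54)
    d = 1e-6;
    g_numeric  = [ f(x+[d;0]) - f(x-[d;0]);
                   f(x+[0;d]) - f(x-[0;d]) ] / (2*d);
    disp([g_analytic, g_numeric])                % the two columns agree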

B The Kth-order difference

The first-order difference of a discrete-time signal x(n), n ∈ Z, is defined as

    y(n) = x(n) - x(n-1).                                          (62)

This is represented as a system D with input x and output y:

    x → D → y.

The second-order difference is obtained by taking the first-order difference twice,

    x → D → D → y,

which gives the difference equation

    y(n) = x(n) - 2 x(n-1) + x(n-2).                               (63)

The third-order difference is obtained by taking the first-order difference three times,

    x → D → D → D → y,

which gives the difference equation

    y(n) = x(n) - 3 x(n-1) + 3 x(n-2) - x(n-3).                    (64)

In terms of discrete-time linear time-invariant (LTI) systems, the first-order difference is an LTI system with transfer function

    D(z) = 1 - z^{-1}.

The second-order difference has the transfer function

    D2(z) = (1 - z^{-1})^2 = 1 - 2 z^{-1} + z^{-2}.

The third-order difference has the transfer function

    D3(z) = (1 - z^{-1})^3 = 1 - 3 z^{-1} + 3 z^{-2} - z^{-3}.

Note that the coefficients come from Pascal's triangle:

    1
    1  1
    1  2  1
    1  3  3  1
    1  4  6  4  1
    ...
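A small Matlab sketch showing that a K-th order difference matrix of the kind used in (28) and (49) can be generated by differencing the identity matrix, with the expected binomial (Pascal's triangle) coefficients in each row:

    % K-th order difference matrix of a length-N signal.
    N = 8;  K = 3;
    D = diff(speye(N), K);     % (N-K) x N sparse matrix
    full(D(1,:))               % first row: -1  3 -3  1  0 ...
                               % (a sign flip of (49) for odd K, which does not change ||D*x||^2)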
C Additional Exercises for Signal Processing Students

1. For smoothing a noisy signal using least squares, we obtained (30),

       x = (I + λ D^T D)^{-1} y,   λ > 0.

   The matrix G = (I + λ D^T D)^{-1} can be understood as a low-pass filter. Using Matlab, compute and plot the output, y(n), when the input is an impulse, x(n) = δ(n - n0). This is an impulse located at index n0. Try placing the impulse at various points in the signal. For example, put the impulse around the middle of the signal. What happens when the impulse is located near the ends of the signal (n0 = 0 or n0 = N - 1)?

2. For smoothing a noisy signal using least squares, we obtained (30),

       x = (I + λ D^T D)^{-1} y,   λ > 0,

   where D represents the K-th order derivative. Assume D is the first-order difference and that the matrices I and D are infinite in size. Then I + λ D^T D is a convolution matrix and represents an LTI system.

   (a) Find and sketch the frequency response of the system I + λ D^T D.

   (b) Based on (a), find the frequency response of G = (I + λ D^T D)^{-1}. Sketch the frequency response of G. What kind of filter is it (low-pass, high-pass, band-pass, etc.)? What is its dc gain?

References

[1] C. Burrus. Least squared error design of FIR filters. Connexions Web site, 2012. http://cnx.org/content/m16892/1.3/

[2] C. Burrus. General solutions of simultaneous equations. Connexions Web site, 2013. http://cnx.org/content/m19561/1.5/

[3] C. L. Lawson and R. J. Hanson. Solving Least Squares Problems. SIAM, 1987.

[4] P. E. McSharry, G. D. Clifford, L. Tarassenko, and L. A. Smith. A dynamical model for generating synthetic electrocardiogram signals. Trans. on Biomed. Eng., 50(3):289-294, March 2003.

[5] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes in C: The Art of Scientific Computing (2nd ed.). Cambridge University Press, 1992.

[6] Y. Wang and Q. Zhu. Error control and concealment for video communication: A review. Proc. IEEE, 86(5):974-997, May 1998.
