
EE103 (Fall 2011-12)

8. Linear least-squares
definition
examples and applications
solution of a least-squares problem, normal equations
8-1
Definition
overdetermined linear equations
Ax = b (A is m \times n with m > n)
if b \notin range(A), cannot solve for x
least-squares formulation
minimize \|Ax - b\| = \left( \sum_{i=1}^{m} \Big( \sum_{j=1}^{n} a_{ij} x_j - b_i \Big)^2 \right)^{1/2}
r = Ax - b is called the residual or error
x with smallest residual norm \|r\| is called the least-squares solution
equivalent to minimizing \|Ax - b\|^2
Linear least-squares 8-2
Example
A = \begin{bmatrix} 2 & 0 \\ -1 & 1 \\ 0 & 2 \end{bmatrix}, \qquad
b = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}
least-squares solution
minimize (2x_1 - 1)^2 + (-x_1 + x_2)^2 + (2x_2 + 1)^2
to find optimal x_1, x_2, set derivatives w.r.t. x_1 and x_2 equal to zero:
10x_1 - 2x_2 - 4 = 0, \qquad -2x_1 + 10x_2 + 4 = 0
solution x_1 = 1/3, x_2 = -1/3
(much more on practical algorithms for LS problems later)
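As a quick numerical check (an addition, not part of the original slides), the same example can be solved with a standard least-squares routine; a minimal NumPy sketch:

```python
import numpy as np

# matrix and right-hand side from the example above
A = np.array([[2.0, 0.0],
              [-1.0, 1.0],
              [0.0, 2.0]])
b = np.array([1.0, 0.0, -1.0])

x, *_ = np.linalg.lstsq(A, b, rcond=None)  # least-squares solution
print(x)                                   # approximately [ 1/3, -1/3 ]
print(np.linalg.norm(A @ x - b))           # norm of the residual r = Ax - b
```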
Linear least-squares 8-3
(figure: surface plots of r_1^2 = (2x_1 - 1)^2, r_2^2 = (-x_1 + x_2)^2, r_3^2 = (2x_2 + 1)^2, and the sum r_1^2 + r_2^2 + r_3^2 as functions of x_1 and x_2)
Linear least-squares 8-4
Outline
definition
examples and applications
solution of a least-squares problem, normal equations
Data fitting
fit a function
g(t) = x_1 g_1(t) + x_2 g_2(t) + \cdots + x_n g_n(t)
to data (t_1, y_1), \ldots, (t_m, y_m), i.e., choose coefficients x_1, \ldots, x_n so that
g(t_1) \approx y_1, \quad g(t_2) \approx y_2, \quad \ldots, \quad g(t_m) \approx y_m
g_i(t) : \mathbf{R} \to \mathbf{R} are given functions (basis functions)
problem variables: the coefficients x_1, x_2, \ldots, x_n
usually m \gg n, hence no exact solution with g(t_i) = y_i for all i
applications: developing simple, approximate model of observed data
Linear least-squares 8-5
Least-squares data fitting
compute x by minimizing
\sum_{i=1}^{m} (g(t_i) - y_i)^2 = \sum_{i=1}^{m} \big( x_1 g_1(t_i) + x_2 g_2(t_i) + \cdots + x_n g_n(t_i) - y_i \big)^2
in matrix notation: minimize \|Ax - b\|^2 where
A = \begin{bmatrix}
g_1(t_1) & g_2(t_1) & g_3(t_1) & \cdots & g_n(t_1) \\
g_1(t_2) & g_2(t_2) & g_3(t_2) & \cdots & g_n(t_2) \\
\vdots & \vdots & \vdots & & \vdots \\
g_1(t_m) & g_2(t_m) & g_3(t_m) & \cdots & g_n(t_m)
\end{bmatrix}, \qquad
b = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}
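As an illustration (added here, not in the slides), a minimal NumPy sketch that builds this A for an arbitrary list of basis functions and solves the fitting problem; the particular basis and data below are invented for the example:

```python
import numpy as np

def ls_fit(basis, t, y):
    """Fit g(t) = x_1 g_1(t) + ... + x_n g_n(t) to data (t_i, y_i) by least squares."""
    # A[i, k] = g_{k+1}(t_i), b = y
    A = np.column_stack([g(t) for g in basis])
    x, *_ = np.linalg.lstsq(A, y, rcond=None)
    return x

# example basis: constant, t, sin(t) (a hypothetical choice)
basis = [lambda t: np.ones_like(t), lambda t: t, np.sin]
rng = np.random.default_rng(0)
t = np.linspace(0, 4, 30)
y = 1.0 + 0.5 * t + 0.2 * np.sin(t) + 0.05 * rng.standard_normal(30)
print(ls_fit(basis, t, y))   # roughly [1.0, 0.5, 0.2]
```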
Linear least-squares 8-6
Example: data fitting with polynomials
g(t) = x_1 + x_2 t + x_3 t^2 + \cdots + x_n t^{n-1}
basis functions are g_k(t) = t^{k-1}, k = 1, \ldots, n
A = \begin{bmatrix}
1 & t_1 & t_1^2 & \cdots & t_1^{n-1} \\
1 & t_2 & t_2^2 & \cdots & t_2^{n-1} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & t_m & t_m^2 & \cdots & t_m^{n-1}
\end{bmatrix}, \qquad
b = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}
interpolation (m = n): can satisfy g(t_i) = y_i exactly by solving Ax = b
approximation (m > n): make error small by minimizing \|Ax - b\|
Linear least-squares 8-7
example. fit a polynomial to f(t) = 1/(1 + 25t^2) on [-1, 1]
pick m = n points t_i in [-1, 1], and calculate y_i = 1/(1 + 25t_i^2)
interpolate by solving Ax = b
(figure: the interpolating polynomial for n = 5 and for n = 15; dashed line: f; solid line: polynomial g; circles: the points (t_i, y_i))
increasing n does not improve the overall quality of the fit
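A possible NumPy sketch of this interpolation experiment (an added illustration; equally spaced points are assumed, since the slides do not specify the spacing):

```python
import numpy as np

def runge(t):
    return 1.0 / (1.0 + 25.0 * t**2)

n = 15
t = np.linspace(-1, 1, n)                # m = n equally spaced points
y = runge(t)

A = np.vander(t, n, increasing=True)     # square matrix with columns 1, t, ..., t^(n-1)
x = np.linalg.solve(A, y)                # interpolating coefficients

print(np.max(np.abs(A @ x - y)))         # essentially zero: exact at the nodes
tt = np.linspace(-1, 1, 500)
print(np.max(np.abs(np.vander(tt, n, increasing=True) @ x - runge(tt))))
# large between the nodes: the oscillation visible in the n = 15 plot
```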
Linear least-squares 8-8
same example by approximation
pick m = 50 points t_i in [-1, 1]
fit polynomial by minimizing \|Ax - b\|
(figure: the least-squares polynomial fit for n = 5 and for n = 15; dashed line: f; solid line: polynomial g; circles: the points (t_i, y_i))
much better fit overall
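A matching sketch for the approximation variant (again an added illustration, with equally spaced sample points assumed):

```python
import numpy as np

def runge(t):
    return 1.0 / (1.0 + 25.0 * t**2)

m, n = 50, 15
t = np.linspace(-1, 1, m)                  # m > n sample points
y = runge(t)

A = np.vander(t, n, increasing=True)       # 50 x 15 matrix
x, *_ = np.linalg.lstsq(A, y, rcond=None)  # minimize ||Ax - b||

tt = np.linspace(-1, 1, 500)
print(np.max(np.abs(np.vander(tt, n, increasing=True) @ x - runge(tt))))
# noticeably smaller deviation from f than with interpolation through m = n points
```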
Linear least-squares 8-9
Least-squares estimation
y = Ax + w
x is what we want to estimate or reconstruct
y is our measurement(s)
w is an unknown noise or measurement error (assumed small)
ith row of A characterizes ith sensor or ith measurement
least-squares estimation
choose as estimate the vector \hat{x} that minimizes
\|A\hat{x} - y\|
i.e., minimize the deviation between what we actually observed (y), and
what we would observe if x = \hat{x} and there were no noise (w = 0)
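A toy simulation of this estimation setup (illustrative only; the sensor matrix, true x, and noise level below are invented):

```python
import numpy as np

rng = np.random.default_rng(0)

m, n = 100, 3
A = rng.standard_normal((m, n))       # each row models one measurement
x_true = np.array([1.0, -2.0, 0.5])   # the vector we want to reconstruct
w = 0.01 * rng.standard_normal(m)     # small measurement noise
y = A @ x_true + w                    # observed measurements

x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
print(x_hat)                          # close to x_true because the noise is small
```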
Linear least-squares 8-10
Navigation by range measurements
find position (u, v) in a plane from distances to beacons at positions (p_i, q_i)
(figure: the unknown position (u, v) and four beacons at (p_1, q_1), (p_2, q_2), (p_3, q_3), (p_4, q_4); \rho_1, \ldots, \rho_4 are the measured ranges)
four nonlinear equations in two variables u, v:
\sqrt{(u - p_i)^2 + (v - q_i)^2} = \rho_i \qquad \text{for } i = 1, 2, 3, 4
\rho_i is the measured distance from unknown position (u, v) to beacon i
Linear least-squares 8-11
linearized distance function: assume u = u_0 + \Delta u, v = v_0 + \Delta v where
u_0, v_0 are known (e.g., position a short time ago)
\Delta u, \Delta v are small (compared to the \rho_i's)
\sqrt{(u_0 + \Delta u - p_i)^2 + (v_0 + \Delta v - q_i)^2}
\approx \sqrt{(u_0 - p_i)^2 + (v_0 - q_i)^2}
  + \frac{(u_0 - p_i)\Delta u + (v_0 - q_i)\Delta v}{\sqrt{(u_0 - p_i)^2 + (v_0 - q_i)^2}}
gives four linear equations in the variables \Delta u, \Delta v:
\frac{(u_0 - p_i)\Delta u + (v_0 - q_i)\Delta v}{\sqrt{(u_0 - p_i)^2 + (v_0 - q_i)^2}}
  \approx \rho_i - \sqrt{(u_0 - p_i)^2 + (v_0 - q_i)^2}
  \qquad \text{for } i = 1, 2, 3, 4
Linear least-squares 8-12
linearized equations
Ax \approx b
where x = (\Delta u, \Delta v) and A is 4 \times 2 with
b_i = \rho_i - \sqrt{(u_0 - p_i)^2 + (v_0 - q_i)^2}
a_{i1} = \frac{u_0 - p_i}{\sqrt{(u_0 - p_i)^2 + (v_0 - q_i)^2}}, \qquad
a_{i2} = \frac{v_0 - q_i}{\sqrt{(u_0 - p_i)^2 + (v_0 - q_i)^2}}
due to linearization and measurement error, we do not expect an exact
solution (Ax = b)
we can try to find \Delta u and \Delta v that almost satisfy the equations
Linear least-squares 8-13
numerical example
beacons at positions (10, 0), (-10, 2), (3, 9), (10, 10)
measured distances \rho = (8.22, 11.9, 7.08, 11.33)
(unknown) actual position is (2, 2)
linearized range equations (linearized around (u_0, v_0) = (0, 0))
\begin{bmatrix} -1.00 & 0.00 \\ 0.98 & -0.20 \\ -0.32 & -0.95 \\ -0.71 & -0.71 \end{bmatrix}
\begin{bmatrix} \Delta u \\ \Delta v \end{bmatrix}
\approx
\begin{bmatrix} -1.77 \\ 1.72 \\ -2.41 \\ -2.81 \end{bmatrix}
least-squares solution: (u, v) = (1.97, 1.90) (norm of error is 0.10)
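The whole computation in a short NumPy sketch (an added illustration; it reproduces the numbers above up to rounding):

```python
import numpy as np

beacons = np.array([[10.0, 0.0], [-10.0, 2.0], [3.0, 9.0], [10.0, 10.0]])
rho = np.array([8.22, 11.9, 7.08, 11.33])             # measured distances
u0, v0 = 0.0, 0.0                                     # linearization point

d = np.hypot(u0 - beacons[:, 0], v0 - beacons[:, 1])  # distances from (u0, v0) to the beacons
A = np.column_stack([(u0 - beacons[:, 0]) / d,
                     (v0 - beacons[:, 1]) / d])
b = rho - d

delta, *_ = np.linalg.lstsq(A, b, rcond=None)         # least-squares (delta u, delta v)
print(u0 + delta[0], v0 + delta[1])                   # roughly (1.97, 1.90); actual position is (2, 2)
```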
Linear least-squares 8-14
Least-squares system identification
measure input u(t) and output y(t) for t = 0, . . . , N of an unknown system
(block diagram: input u(t) \to unknown system \to output y(t))
example (N = 70):
(figure: plots of the measured input u(t) and output y(t))
system identification problem: find reasonable model for system based
on measured I/O data u, y
Linear least-squares 8-15
moving average model
y_{model}(t) = h_0 u(t) + h_1 u(t-1) + h_2 u(t-2) + \cdots + h_n u(t-n)
where y_{model}(t) is the model output
a simple and widely used model
predicted output is a linear combination of current and n previous inputs
h_0, \ldots, h_n are parameters of the model
called a moving average (MA) model with n delays
least-squares identification: choose the model that minimizes the error
E = \left( \sum_{t=n}^{N} \big( y_{model}(t) - y(t) \big)^2 \right)^{1/2}
Linear least-squares 8-16
formulation as a linear least-squares problem:
E = \left( \sum_{t=n}^{N} \big( h_0 u(t) + h_1 u(t-1) + \cdots + h_n u(t-n) - y(t) \big)^2 \right)^{1/2}
  = \|Ax - b\|
A = \begin{bmatrix}
u(n)   & u(n-1) & u(n-2) & \cdots & u(0) \\
u(n+1) & u(n)   & u(n-1) & \cdots & u(1) \\
u(n+2) & u(n+1) & u(n)   & \cdots & u(2) \\
\vdots & \vdots & \vdots &        & \vdots \\
u(N)   & u(N-1) & u(N-2) & \cdots & u(N-n)
\end{bmatrix}
x = \begin{bmatrix} h_0 \\ h_1 \\ h_2 \\ \vdots \\ h_n \end{bmatrix}, \qquad
b = \begin{bmatrix} y(n) \\ y(n+1) \\ y(n+2) \\ \vdots \\ y(N) \end{bmatrix}
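One possible implementation of this construction (an added sketch; the helper name ma_fit and the synthetic data are not from the slides):

```python
import numpy as np

def ma_fit(u, y, n):
    """Fit an MA model y_model(t) = h_0 u(t) + ... + h_n u(t-n) by least squares.

    u, y: 1-D arrays with samples for t = 0, ..., N; returns h = (h_0, ..., h_n).
    """
    N = len(u) - 1
    # row for time t (t = n, ..., N) holds u(t), u(t-1), ..., u(t-n)
    A = np.column_stack([u[n - k : N + 1 - k] for k in range(n + 1)])
    b = y[n:]
    h, *_ = np.linalg.lstsq(A, b, rcond=None)
    return h

# synthetic data for demonstration: an exact MA(3) system, no noise
rng = np.random.default_rng(1)
u = rng.standard_normal(71)                  # N = 70
h_true = np.array([0.0, 0.3, -0.4, 0.35])
y = np.convolve(u, h_true)[:71]
print(ma_fit(u, y, 3))                       # recovers h_true up to rounding
```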
Linear least-squares 8-17
example (I/O data of page 8-15) with n = 7: least-squares solution is
h_0 = 0.0240, h_1 = 0.2819, h_2 = 0.4176, h_3 = 0.3536,
h_4 = 0.2425, h_5 = 0.4873, h_6 = 0.2084, h_7 = 0.4412
(figure: solid line: actual output y(t); dashed line: model output y_{model}(t))
Linear least-squares 8-18
model order selection: how large should n be?
(figure: relative error E/\|y\| versus the model order n)
suggests using largest possible n for smallest error
much more important question: how good is the model at predicting
new data (i.e., not used to calculate the model)?
Linear least-squares 8-19
model validation: test model on a new data set (from the same system)
(figures: the validation input and output, and the relative prediction error
versus n for the validation data and for the modeling data)
for n too large the predictive ability of the model becomes worse!
validation data suggest n = 10
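A sketch of how such error curves could be computed (added; it reuses the hypothetical ma_fit helper from the earlier sketch and assumes a separate validation record u_val, y_val is available):

```python
import numpy as np

def relative_error(h, u, y):
    """Relative prediction error of an MA model h on I/O data (u, y)."""
    n = len(h) - 1
    A = np.column_stack([u[n - k : len(u) - k] for k in range(n + 1)])
    return np.linalg.norm(A @ h - y[n:]) / np.linalg.norm(y[n:])

# for n in range(1, 51):
#     h = ma_fit(u, y, n)                        # fit on the modeling data
#     print(n, relative_error(h, u, y),          # error on the modeling data
#           relative_error(h, u_val, y_val))     # error on the validation data
```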
Linear least-squares 8-20
for n = 50 the actual and predicted outputs on system identification and
model validation data are:
(two figures: solid line: y(t); dashed line: y_{model}(t); left: the I/O set used to
compute the model; right: the model validation I/O set)
loss of predictive ability when n is too large is called overfitting or
overmodeling
Linear least-squares 8-21
Outline
definition
examples and applications
solution of a least-squares problem, normal equations
Geometric interpretation of a LS problem
minimize \|Ax - b\|^2
A is m \times n with columns a_1, \ldots, a_n
\|Ax - b\| is the distance of b to the vector
Ax = x_1 a_1 + x_2 a_2 + \cdots + x_n a_n
solution x_{ls} gives the linear combination of the columns of A closest to b
Ax_{ls} is the projection of b on the range of A
Linear least-squares 8-22
example
A = \begin{bmatrix} 1 & -1 \\ 1 & 2 \\ 0 & 0 \end{bmatrix}, \qquad
b = \begin{bmatrix} 1 \\ 4 \\ 2 \end{bmatrix}
(figure: the vectors a_1, a_2, b and the projection Ax_{ls} = 2a_1 + a_2)
least-squares solution x_{ls}:
Ax_{ls} = \begin{bmatrix} 1 \\ 4 \\ 0 \end{bmatrix}, \qquad
x_{ls} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}
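Checking this example numerically (an added sketch):

```python
import numpy as np

A = np.array([[1.0, -1.0],
              [1.0, 2.0],
              [0.0, 0.0]])
b = np.array([1.0, 4.0, 2.0])

x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x_ls)      # [2., 1.]
print(A @ x_ls)  # [1., 4., 0.], the projection of b on range(A)
```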
Linear least-squares 8-23
The solution of a least-squares problem
if A is left-invertible, then
x_{ls} = (A^T A)^{-1} A^T b
is the unique solution of the least-squares problem
minimize \|Ax - b\|^2
in other words, if x \neq x_{ls}, then \|Ax - b\|^2 > \|Ax_{ls} - b\|^2
recall from page 4-25 that A^T A is positive definite and that
(A^T A)^{-1} A^T is a left-inverse of A
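A quick numerical illustration of the formula (an addition; any left-invertible A works):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((10, 3))   # a random 10 x 3 matrix (left-invertible with probability 1)
b = rng.standard_normal(10)

x_formula = np.linalg.inv(A.T @ A) @ A.T @ b      # (A^T A)^{-1} A^T b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)   # library least-squares solver
print(np.allclose(x_formula, x_lstsq))            # True
```

Forming the inverse explicitly is done here only to mirror the formula; in practice the normal equations are solved by factorization, as on the next slides.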
Linear least-squares 8-24
proof
we show that \|Ax - b\|^2 > \|Ax_{ls} - b\|^2 for x \neq x_{ls}:
\|Ax - b\|^2 = \|A(x - x_{ls}) + (Ax_{ls} - b)\|^2
             = \|A(x - x_{ls})\|^2 + \|Ax_{ls} - b\|^2
             > \|Ax_{ls} - b\|^2
2nd step follows from A(x - x_{ls}) \perp (Ax_{ls} - b):
(A(x - x_{ls}))^T (Ax_{ls} - b) = (x - x_{ls})^T (A^T A x_{ls} - A^T b) = 0
3rd step follows from zero nullspace property of A:
x \neq x_{ls} \implies A(x - x_{ls}) \neq 0
Linear least-squares 8-25
The normal equations
(A^T A) x = A^T b
if A is left-invertible:
least-squares solution can be found by solving the normal equations
n equations in n variables with a positive definite coefficient matrix
can be solved using Cholesky factorization
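A sketch of this approach in Python, using SciPy's Cholesky routines (an illustration added to the slides):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def ls_via_normal_equations(A, b):
    """Solve minimize ||Ax - b||^2 via the normal equations and a Cholesky factorization."""
    G = A.T @ A               # positive definite when A is left-invertible
    c = A.T @ b
    F = cho_factor(G)         # Cholesky factorization of A^T A
    return cho_solve(F, c)    # solve (A^T A) x = A^T b

A = np.array([[2.0, 0.0], [-1.0, 1.0], [0.0, 2.0]])
b = np.array([1.0, 0.0, -1.0])
print(ls_via_normal_equations(A, b))   # [ 1/3, -1/3 ], as in the example of page 8-3
```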
Linear least-squares 8-26