
EE103 (Fall 2011-12)

8. Linear least-squares
definition
examples and applications
solution of a least-squares problem, normal equations
8-1
Definition
overdetermined linear equations
Ax = b (A is m \times n with m > n)
if b \notin range(A), cannot solve for x
least-squares formulation
minimize \|Ax - b\| = \left( \sum_{i=1}^{m} \Big( \sum_{j=1}^{n} a_{ij} x_j - b_i \Big)^2 \right)^{1/2}
r = Ax - b is called the residual or error
x with smallest residual norm \|r\| is called the least-squares solution
equivalent to minimizing \|Ax - b\|^2
Linear least-squares 8-2
Example
A = \begin{bmatrix} 2 & 0 \\ -1 & 1 \\ 0 & 2 \end{bmatrix}, \qquad
b = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}
least-squares solution
minimize (2x_1 - 1)^2 + (-x_1 + x_2)^2 + (2x_2 + 1)^2
to find optimal x_1, x_2, set derivatives w.r.t. x_1 and x_2 equal to zero:
10x_1 - 2x_2 - 4 = 0, \qquad -2x_1 + 10x_2 + 4 = 0
solution x_1 = 1/3, x_2 = -1/3
(much more on practical algorithms for LS problems later)
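As a quick numerical check (an addition, not part of the original slides), the same example can be solved with a standard least-squares routine; a minimal NumPy sketch:

```python
import numpy as np

# matrix and right-hand side from the example above
A = np.array([[2.0, 0.0],
              [-1.0, 1.0],
              [0.0, 2.0]])
b = np.array([1.0, 0.0, -1.0])

x, *_ = np.linalg.lstsq(A, b, rcond=None)  # least-squares solution
print(x)                                   # approximately [ 1/3, -1/3 ]
print(np.linalg.norm(A @ x - b))           # norm of the residual r = Ax - b
```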
Linear least-squares 8-3
(figure: surface plots of r_1^2 = (2x_1 - 1)^2, r_2^2 = (-x_1 + x_2)^2, r_3^2 = (2x_2 + 1)^2, and the sum r_1^2 + r_2^2 + r_3^2 as functions of x_1 and x_2)
Linear least-squares 8-4
Outline
definition
examples and applications
solution of a least-squares problem, normal equations
Data fitting
fit a function
g(t) = x_1 g_1(t) + x_2 g_2(t) + \cdots + x_n g_n(t)
to data (t_1, y_1), \ldots, (t_m, y_m), i.e., choose coefficients x_1, \ldots, x_n so that
g(t_1) \approx y_1, \quad g(t_2) \approx y_2, \quad \ldots, \quad g(t_m) \approx y_m
g_i(t) : \mathbf{R} \to \mathbf{R} are given functions (basis functions)
problem variables: the coefficients x_1, x_2, \ldots, x_n
usually m \gg n, hence no exact solution with g(t_i) = y_i for all i
applications: developing simple, approximate model of observed data
Linear least-squares 8-5
Least-squares data fitting
compute x by minimizing
\sum_{i=1}^{m} (g(t_i) - y_i)^2 = \sum_{i=1}^{m} \big( x_1 g_1(t_i) + x_2 g_2(t_i) + \cdots + x_n g_n(t_i) - y_i \big)^2
in matrix notation: minimize \|Ax - b\|^2 where
A = \begin{bmatrix}
g_1(t_1) & g_2(t_1) & g_3(t_1) & \cdots & g_n(t_1) \\
g_1(t_2) & g_2(t_2) & g_3(t_2) & \cdots & g_n(t_2) \\
\vdots & \vdots & \vdots & & \vdots \\
g_1(t_m) & g_2(t_m) & g_3(t_m) & \cdots & g_n(t_m)
\end{bmatrix}, \qquad
b = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}
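As an illustration (added here, not in the slides), a minimal NumPy sketch that builds this A for an arbitrary list of basis functions and solves the fitting problem; the particular basis and data below are invented for the example:

```python
import numpy as np

def ls_fit(basis, t, y):
    """Fit g(t) = x_1 g_1(t) + ... + x_n g_n(t) to data (t_i, y_i) by least squares."""
    # A[i, k] = g_{k+1}(t_i), b = y
    A = np.column_stack([g(t) for g in basis])
    x, *_ = np.linalg.lstsq(A, y, rcond=None)
    return x

# example basis: constant, t, sin(t) (a hypothetical choice)
basis = [lambda t: np.ones_like(t), lambda t: t, np.sin]
rng = np.random.default_rng(0)
t = np.linspace(0, 4, 30)
y = 1.0 + 0.5 * t + 0.2 * np.sin(t) + 0.05 * rng.standard_normal(30)
print(ls_fit(basis, t, y))   # roughly [1.0, 0.5, 0.2]
```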
Linear least-squares 8-6
Example: data fitting with polynomials
g(t) = x_1 + x_2 t + x_3 t^2 + \cdots + x_n t^{n-1}
basis functions are g_k(t) = t^{k-1}, k = 1, \ldots, n
A = \begin{bmatrix}
1 & t_1 & t_1^2 & \cdots & t_1^{n-1} \\
1 & t_2 & t_2^2 & \cdots & t_2^{n-1} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & t_m & t_m^2 & \cdots & t_m^{n-1}
\end{bmatrix}, \qquad
b = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}
interpolation (m = n): can satisfy g(t_i) = y_i exactly by solving Ax = b
approximation (m > n): make error small by minimizing \|Ax - b\|
Linear least-squares 8-7
example. fit a polynomial to f(t) = 1/(1 + 25t^2) on [-1, 1]
pick m = n points t_i in [-1, 1], and calculate y_i = 1/(1 + 25t_i^2)
interpolate by solving Ax = b
(figure: the interpolating polynomial for n = 5 and for n = 15; dashed line: f; solid line: polynomial g; circles: the points (t_i, y_i))
increasing n does not improve the overall quality of the fit
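A possible NumPy sketch of this interpolation experiment (an added illustration; equally spaced points are assumed, since the slides do not specify the spacing):

```python
import numpy as np

def runge(t):
    return 1.0 / (1.0 + 25.0 * t**2)

n = 15
t = np.linspace(-1, 1, n)                # m = n equally spaced points
y = runge(t)

A = np.vander(t, n, increasing=True)     # square matrix with columns 1, t, ..., t^(n-1)
x = np.linalg.solve(A, y)                # interpolating coefficients

print(np.max(np.abs(A @ x - y)))         # essentially zero: exact at the nodes
tt = np.linspace(-1, 1, 500)
print(np.max(np.abs(np.vander(tt, n, increasing=True) @ x - runge(tt))))
# large between the nodes: the oscillation visible in the n = 15 plot
```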
Linear least-squares 8-8
same example by approximation
pick m = 50 points t_i in [-1, 1]
fit polynomial by minimizing \|Ax - b\|
(figure: the least-squares polynomial fit for n = 5 and for n = 15; dashed line: f; solid line: polynomial g; circles: the points (t_i, y_i))
much better fit overall
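A matching sketch for the approximation variant (again an added illustration, with equally spaced sample points assumed):

```python
import numpy as np

def runge(t):
    return 1.0 / (1.0 + 25.0 * t**2)

m, n = 50, 15
t = np.linspace(-1, 1, m)                  # m > n sample points
y = runge(t)

A = np.vander(t, n, increasing=True)       # 50 x 15 matrix
x, *_ = np.linalg.lstsq(A, y, rcond=None)  # minimize ||Ax - b||

tt = np.linspace(-1, 1, 500)
print(np.max(np.abs(np.vander(tt, n, increasing=True) @ x - runge(tt))))
# noticeably smaller deviation from f than with interpolation through m = n points
```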
Linear least-squares 8-9
Least-squares estimation
y = Ax + w
x is what we want to estimate or reconstruct
y is our measurement(s)
w is an unknown noise or measurement error (assumed small)
ith row of A characterizes ith sensor or ith measurement
least-squares estimation
choose as estimate the vector \hat{x} that minimizes
\|A\hat{x} - y\|
i.e., minimize the deviation between what we actually observed (y), and
what we would observe if x = \hat{x} and there were no noise (w = 0)
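A toy simulation of this estimation setup (illustrative only; the sensor matrix, true x, and noise level below are invented):

```python
import numpy as np

rng = np.random.default_rng(0)

m, n = 100, 3
A = rng.standard_normal((m, n))       # each row models one measurement
x_true = np.array([1.0, -2.0, 0.5])   # the vector we want to reconstruct
w = 0.01 * rng.standard_normal(m)     # small measurement noise
y = A @ x_true + w                    # observed measurements

x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
print(x_hat)                          # close to x_true because the noise is small
```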
Linear least-squares 8-10
Navigation by range measurements
find position (u, v) in a plane from distances to beacons at positions (p_i, q_i)
(figure: the unknown position (u, v) and four beacons at (p_1, q_1), (p_2, q_2), (p_3, q_3), (p_4, q_4); \rho_1, \ldots, \rho_4 are the measured ranges)
four nonlinear equations in two variables u, v:
\sqrt{(u - p_i)^2 + (v - q_i)^2} = \rho_i \qquad \text{for } i = 1, 2, 3, 4
\rho_i is the measured distance from unknown position (u, v) to beacon i
Linear least-squares 8-11
linearized distance function: assume u = u_0 + \Delta u, v = v_0 + \Delta v where
u_0, v_0 are known (e.g., position a short time ago)
\Delta u, \Delta v are small (compared to the \rho_i's)
\sqrt{(u_0 + \Delta u - p_i)^2 + (v_0 + \Delta v - q_i)^2}
\approx \sqrt{(u_0 - p_i)^2 + (v_0 - q_i)^2}
  + \frac{(u_0 - p_i)\Delta u + (v_0 - q_i)\Delta v}{\sqrt{(u_0 - p_i)^2 + (v_0 - q_i)^2}}
gives four linear equations in the variables \Delta u, \Delta v:
\frac{(u_0 - p_i)\Delta u + (v_0 - q_i)\Delta v}{\sqrt{(u_0 - p_i)^2 + (v_0 - q_i)^2}}
  \approx \rho_i - \sqrt{(u_0 - p_i)^2 + (v_0 - q_i)^2}
  \qquad \text{for } i = 1, 2, 3, 4
Linear least-squares 8-12
linearized equations
Ax \approx b
where x = (\Delta u, \Delta v) and A is 4 \times 2 with
b_i = \rho_i - \sqrt{(u_0 - p_i)^2 + (v_0 - q_i)^2}
a_{i1} = \frac{u_0 - p_i}{\sqrt{(u_0 - p_i)^2 + (v_0 - q_i)^2}}, \qquad
a_{i2} = \frac{v_0 - q_i}{\sqrt{(u_0 - p_i)^2 + (v_0 - q_i)^2}}
due to linearization and measurement error, we do not expect an exact
solution (Ax = b)
we can try to find \Delta u and \Delta v that almost satisfy the equations
Linear least-squares 8-13
numerical example
beacons at positions (10, 0), (-10, 2), (3, 9), (10, 10)
measured distances \rho = (8.22, 11.9, 7.08, 11.33)
(unknown) actual position is (2, 2)
linearized range equations (linearized around (u_0, v_0) = (0, 0))
\begin{bmatrix} -1.00 & 0.00 \\ 0.98 & -0.20 \\ -0.32 & -0.95 \\ -0.71 & -0.71 \end{bmatrix}
\begin{bmatrix} \Delta u \\ \Delta v \end{bmatrix}
\approx
\begin{bmatrix} -1.77 \\ 1.72 \\ -2.41 \\ -2.81 \end{bmatrix}
least-squares solution: (u, v) = (1.97, 1.90) (norm of error is 0.10)
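The whole computation in a short NumPy sketch (an added illustration; it reproduces the numbers above up to rounding):

```python
import numpy as np

beacons = np.array([[10.0, 0.0], [-10.0, 2.0], [3.0, 9.0], [10.0, 10.0]])
rho = np.array([8.22, 11.9, 7.08, 11.33])             # measured distances
u0, v0 = 0.0, 0.0                                     # linearization point

d = np.hypot(u0 - beacons[:, 0], v0 - beacons[:, 1])  # distances from (u0, v0) to the beacons
A = np.column_stack([(u0 - beacons[:, 0]) / d,
                     (v0 - beacons[:, 1]) / d])
b = rho - d

delta, *_ = np.linalg.lstsq(A, b, rcond=None)         # least-squares (delta u, delta v)
print(u0 + delta[0], v0 + delta[1])                   # roughly (1.97, 1.90); actual position is (2, 2)
```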
Linear least-squares 8-14
Least-squares system identification
measure input u(t) and output y(t) for t = 0, . . . , N of an unknown system
(block diagram: input u(t) \to unknown system \to output y(t))
example (N = 70):
(figure: plots of the measured input u(t) and output y(t))
system identification problem: find reasonable model for system based
on measured I/O data u, y
Linear least-squares 8-15
moving average model
y_{model}(t) = h_0 u(t) + h_1 u(t-1) + h_2 u(t-2) + \cdots + h_n u(t-n)
where y_{model}(t) is the model output
a simple and widely used model
predicted output is a linear combination of current and n previous inputs
h_0, \ldots, h_n are parameters of the model
called a moving average (MA) model with n delays
least-squares identification: choose the model that minimizes the error
E = \left( \sum_{t=n}^{N} \big( y_{model}(t) - y(t) \big)^2 \right)^{1/2}
Linear least-squares 8-16
formulation as a linear least-squares problem:
E = \left( \sum_{t=n}^{N} \big( h_0 u(t) + h_1 u(t-1) + \cdots + h_n u(t-n) - y(t) \big)^2 \right)^{1/2}
  = \|Ax - b\|
A = \begin{bmatrix}
u(n)   & u(n-1) & u(n-2) & \cdots & u(0) \\
u(n+1) & u(n)   & u(n-1) & \cdots & u(1) \\
u(n+2) & u(n+1) & u(n)   & \cdots & u(2) \\
\vdots & \vdots & \vdots &        & \vdots \\
u(N)   & u(N-1) & u(N-2) & \cdots & u(N-n)
\end{bmatrix}
x = \begin{bmatrix} h_0 \\ h_1 \\ h_2 \\ \vdots \\ h_n \end{bmatrix}, \qquad
b = \begin{bmatrix} y(n) \\ y(n+1) \\ y(n+2) \\ \vdots \\ y(N) \end{bmatrix}
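One possible implementation of this construction (an added sketch; the helper name ma_fit and the synthetic data are not from the slides):

```python
import numpy as np

def ma_fit(u, y, n):
    """Fit an MA model y_model(t) = h_0 u(t) + ... + h_n u(t-n) by least squares.

    u, y: 1-D arrays with samples for t = 0, ..., N; returns h = (h_0, ..., h_n).
    """
    N = len(u) - 1
    # row for time t (t = n, ..., N) holds u(t), u(t-1), ..., u(t-n)
    A = np.column_stack([u[n - k : N + 1 - k] for k in range(n + 1)])
    b = y[n:]
    h, *_ = np.linalg.lstsq(A, b, rcond=None)
    return h

# synthetic data for demonstration: an exact MA(3) system, no noise
rng = np.random.default_rng(1)
u = rng.standard_normal(71)                  # N = 70
h_true = np.array([0.0, 0.3, -0.4, 0.35])
y = np.convolve(u, h_true)[:71]
print(ma_fit(u, y, 3))                       # recovers h_true up to rounding
```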
Linear least-squares 8-17
example (I/O data of page 8-15) with n = 7: least-squares solution is
h_0 = 0.0240, h_1 = 0.2819, h_2 = 0.4176, h_3 = 0.3536,
h_4 = 0.2425, h_5 = 0.4873, h_6 = 0.2084, h_7 = 0.4412
(figure: solid line: actual output y(t); dashed line: model output y_{model}(t))
Linear least-squares 8-18
model order selection: how large should n be?
(figure: relative error E/\|y\| versus the model order n)
suggests using largest possible n for smallest error
much more important question: how good is the model at predicting
new data (i.e., not used to calculate the model)?
Linear least-squares 8-19
model validation: test model on a new data set (from the same system)
(figures: the validation input and output, and the relative prediction error
versus n for the validation data and for the modeling data)
for n too large the predictive ability of the model becomes worse!
validation data suggest n = 10
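A sketch of how such error curves could be computed (added; it reuses the hypothetical ma_fit helper from the earlier sketch and assumes a separate validation record u_val, y_val is available):

```python
import numpy as np

def relative_error(h, u, y):
    """Relative prediction error of an MA model h on I/O data (u, y)."""
    n = len(h) - 1
    A = np.column_stack([u[n - k : len(u) - k] for k in range(n + 1)])
    return np.linalg.norm(A @ h - y[n:]) / np.linalg.norm(y[n:])

# for n in range(1, 51):
#     h = ma_fit(u, y, n)                        # fit on the modeling data
#     print(n, relative_error(h, u, y),          # error on the modeling data
#           relative_error(h, u_val, y_val))     # error on the validation data
```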
Linear least-squares 8-20
for n = 50 the actual and predicted outputs on system identification and
model validation data are:
(two figures: solid line: y(t); dashed line: y_{model}(t); left: the I/O set used to
compute the model; right: the model validation I/O set)
loss of predictive ability when n is too large is called overfitting or
overmodeling
Linear least-squares 8-21
Outline
definition
examples and applications
solution of a least-squares problem, normal equations
Geometric interpretation of a LS problem
minimize \|Ax - b\|^2
A is m \times n with columns a_1, \ldots, a_n
\|Ax - b\| is the distance of b to the vector
Ax = x_1 a_1 + x_2 a_2 + \cdots + x_n a_n
solution x_{ls} gives the linear combination of the columns of A closest to b
Ax_{ls} is the projection of b on the range of A
Linear least-squares 8-22
example
A = \begin{bmatrix} 1 & -1 \\ 1 & 2 \\ 0 & 0 \end{bmatrix}, \qquad
b = \begin{bmatrix} 1 \\ 4 \\ 2 \end{bmatrix}
(figure: the vectors a_1, a_2, b and the projection Ax_{ls} = 2a_1 + a_2)
least-squares solution x_{ls}:
Ax_{ls} = \begin{bmatrix} 1 \\ 4 \\ 0 \end{bmatrix}, \qquad
x_{ls} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}
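Checking this example numerically (an added sketch):

```python
import numpy as np

A = np.array([[1.0, -1.0],
              [1.0, 2.0],
              [0.0, 0.0]])
b = np.array([1.0, 4.0, 2.0])

x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x_ls)      # [2., 1.]
print(A @ x_ls)  # [1., 4., 0.], the projection of b on range(A)
```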
Linear least-squares 8-23
The solution of a least-squares problem
if A is left-invertible, then
x_{ls} = (A^T A)^{-1} A^T b
is the unique solution of the least-squares problem
minimize \|Ax - b\|^2
in other words, if x \neq x_{ls}, then \|Ax - b\|^2 > \|Ax_{ls} - b\|^2
recall from page 4-25 that A^T A is positive definite and that
(A^T A)^{-1} A^T is a left-inverse of A
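A quick numerical illustration of the formula (an addition; any left-invertible A works):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((10, 3))   # a random 10 x 3 matrix (left-invertible with probability 1)
b = rng.standard_normal(10)

x_formula = np.linalg.inv(A.T @ A) @ A.T @ b      # (A^T A)^{-1} A^T b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)   # library least-squares solver
print(np.allclose(x_formula, x_lstsq))            # True
```

Forming the inverse explicitly is done here only to mirror the formula; in practice the normal equations are solved by factorization, as on the next slides.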
Linear least-squares 8-24
proof
we show that \|Ax - b\|^2 > \|Ax_{ls} - b\|^2 for x \neq x_{ls}:
\|Ax - b\|^2 = \|A(x - x_{ls}) + (Ax_{ls} - b)\|^2
             = \|A(x - x_{ls})\|^2 + \|Ax_{ls} - b\|^2
             > \|Ax_{ls} - b\|^2
2nd step follows from A(x - x_{ls}) \perp (Ax_{ls} - b):
(A(x - x_{ls}))^T (Ax_{ls} - b) = (x - x_{ls})^T (A^T A x_{ls} - A^T b) = 0
3rd step follows from zero nullspace property of A:
x \neq x_{ls} \implies A(x - x_{ls}) \neq 0
Linear least-squares 8-25
The normal equations
(A^T A) x = A^T b
if A is left-invertible:
least-squares solution can be found by solving the normal equations
n equations in n variables with a positive definite coefficient matrix
can be solved using Cholesky factorization
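A sketch of this approach in Python, using SciPy's Cholesky routines (an illustration added to the slides):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def ls_via_normal_equations(A, b):
    """Solve minimize ||Ax - b||^2 via the normal equations and a Cholesky factorization."""
    G = A.T @ A               # positive definite when A is left-invertible
    c = A.T @ b
    F = cho_factor(G)         # Cholesky factorization of A^T A
    return cho_solve(F, c)    # solve (A^T A) x = A^T b

A = np.array([[2.0, 0.0], [-1.0, 1.0], [0.0, 2.0]])
b = np.array([1.0, 0.0, -1.0])
print(ls_via_normal_equations(A, b))   # [ 1/3, -1/3 ], as in the example of page 8-3
```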
Linear least-squares 8-26