
02610 Optimization and Data Fitting: Nonlinear Least-Squares Problems

Nonlinear least squares problems


This lecture is based on the book
P. C. Hansen, V. Pereyra and G. Scherer,
Least Squares Data Fitting with Applications,
Johns Hopkins University Press, to appear
(the necessary chapters are available on CampusNet)
and we cover this material:
Section 8.1: Intro to nonlinear data fitting.
Section 8.2: Unconstrained nonlinear least squares problems.
Section 9.1: Newton's method.
Section 9.2: The Gauss-Newton method.
Section 9.3: The Levenberg-Marquardt method.
Non-linearity
A parameter $\alpha$ of the function $f$ appears nonlinearly if the derivative $\partial f / \partial \alpha$ is a function of $\alpha$.
The model $M(x, t)$ is nonlinear if at least one of the parameters in $x$ appears nonlinearly.
For example, in the exponential decay model
$$M(x_1, x_2, t) = x_1 e^{x_2 t}$$
we have:
$$\partial M / \partial x_1 = e^{x_2 t}, \quad \text{which is independent of } x_1,$$
$$\partial M / \partial x_2 = t\, x_1 e^{x_2 t}, \quad \text{which depends on } x_2.$$
Thus $M$ is a nonlinear model, with the parameter $x_2$ appearing nonlinearly.
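To make the distinction concrete, here is a minimal MATLAB sketch (not from the slides; the model and the evaluation point are assumed) that evaluates the two partial derivatives of the exponential decay model: $\partial M/\partial x_1$ does not involve $x_1$, while $\partial M/\partial x_2$ changes when $x_2$ changes.

    % Minimal sketch: exponential decay model M(x,t) = x1*exp(x2*t) and its
    % partial derivatives, written as anonymous functions.
    M     = @(x,t) x(1).*exp(x(2).*t);
    dMdx1 = @(x,t) exp(x(2).*t);           % independent of x1: x1 enters linearly
    dMdx2 = @(x,t) t.*x(1).*exp(x(2).*t);  % depends on x2:     x2 enters nonlinearly

    t = 1.5;                               % assumed evaluation point
    [dMdx2([2.0, -0.5], t), dMdx2([2.0, -1.0], t)]  % changes with x2, confirming the nonlinearity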
Fitting with a Gaussian model
[Figure: plot of the non-normalized Gaussian model M(x, t) for -1 <= t <= 1.]
The non-normalized Gaussian function:
$$M(x, t) = x_1\, e^{-(t - x_2)^2 / (2 x_3^2)}, \qquad x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix},$$
where $x_1$ is the amplitude, $x_2$ is the time shift, and $x_3$ determines the width of the Gaussian function.
The parameters $x_2$ and $x_3$ appear nonlinearly in this model.
Gaussian models also arise in many other data fitting problems.
The nonlinear least squares problem
Find a minimizer $x^*$ of the nonlinear objective function $f$:
$$\min_x f(x) \equiv \min_x \tfrac{1}{2} \| r(x) \|_2^2 = \min_x \tfrac{1}{2} \sum_{i=1}^m r_i(x)^2,$$
where $x \in \mathbb{R}^n$ and, as usual,
$$r(x) = \begin{pmatrix} r_1(x) \\ \vdots \\ r_m(x) \end{pmatrix} \in \mathbb{R}^m, \qquad r_i(x) = y_i - M(x, t_i), \quad i = 1, \ldots, m.$$
Here $y_i$ are the measured data corresponding to $t_i$.
The nonlinearity arises only from $M(x, t)$.
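As a running illustration for the following slides, here is a minimal MATLAB sketch (not from the slides; the sample points, the noise level, and the "true" parameters are assumed) that sets up the residual vector $r(x)$ and the objective $f(x)$ for the Gaussian model from the previous slide.

    % Minimal sketch (assumed synthetic data): residual vector and objective
    % f(x) = 0.5*||r(x)||_2^2 for the Gaussian model M(x,t) = x1*exp(-(t-x2)^2/(2*x3^2)).
    t     = linspace(-1, 1, 50)';                      % assumed sample points t_i
    xtrue = [2.0; 0.1; 0.3];                           % assumed "true" parameters
    Mfun  = @(x,t) x(1)*exp(-(t - x(2)).^2/(2*x(3)^2));
    rng(0);                                            % reproducible noise
    y     = Mfun(xtrue, t) + 0.05*randn(size(t));      % noisy measurements y_i

    r = @(x) y - Mfun(x, t);                           % r_i(x) = y_i - M(x,t_i)
    f = @(x) 0.5*norm(r(x))^2;                         % nonlinear LSQ objective
    f([1.5; 0.0; 0.4])                                 % objective at a trial point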
The Jacobian and the gradient of f(x)
The Jacobian $J(x)$ of the vector function $r(x)$ is defined as the matrix with elements
$$[J(x)]_{ij} = \frac{\partial r_i(x)}{\partial x_j} = -\frac{\partial M(x, t_i)}{\partial x_j}, \quad i = 1, \ldots, m, \quad j = 1, \ldots, n.$$
The $i$th row of $J(x)$ equals the transpose of the gradient of $r_i(x)$:
$$[J(x)]_{i,:} = \nabla r_i(x)^T = -\nabla M(x, t_i)^T, \quad i = 1, \ldots, m.$$
Thus the elements of the gradient of $f(x)$ are given by
$$[\nabla f(x)]_j = \frac{\partial f(x)}{\partial x_j} = \sum_{i=1}^m r_i(x) \frac{\partial r_i(x)}{\partial x_j},$$
and it follows that the gradient is the vector
$$\nabla f(x) = J(x)^T r(x).$$
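Continuing the MATLAB sketch from the problem-definition slide, the Jacobian of $r(x)$ for the Gaussian model can be written analytically (the trial point below is an assumed value); the gradient then follows as $J(x)^T r(x)$.

    % Minimal sketch (continues the Gaussian example): analytic Jacobian of
    % r(x) = y - M(x,t), i.e. the derivatives of -M(x,t_i), and the gradient of f.
    Jfun = @(x,t) [ -exp(-(t - x(2)).^2/(2*x(3)^2)), ...
                    -x(1)*exp(-(t - x(2)).^2/(2*x(3)^2)).*(t - x(2))/x(3)^2, ...
                    -x(1)*exp(-(t - x(2)).^2/(2*x(3)^2)).*(t - x(2)).^2/x(3)^3 ];

    xk = [1.5; 0.0; 0.4];          % assumed trial point
    J  = Jfun(xk, t);              % m-by-3 Jacobian (t, y, r from the sketch above)
    g  = J' * r(xk)                % gradient: grad f(x) = J(x)' * r(x)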
The Hessian matrix of f(x)
The elements of the Hessian of $f$, denoted $\nabla^2 f(x)$, are given by
$$[\nabla^2 f(x)]_{k\ell} = \frac{\partial^2 f(x)}{\partial x_k \partial x_\ell} = \sum_{i=1}^m \frac{\partial r_i(x)}{\partial x_k} \frac{\partial r_i(x)}{\partial x_\ell} + \sum_{i=1}^m r_i(x) \frac{\partial^2 r_i(x)}{\partial x_k \partial x_\ell},$$
and it follows that the Hessian can be written as
$$\nabla^2 f(x) = J(x)^T J(x) + \sum_{i=1}^m r_i(x)\, \nabla^2 r_i(x),$$
where
$$\left[ \nabla^2 r_i(x) \right]_{k\ell} = -\left[ \nabla^2 M(x, t_i) \right]_{k\ell} = -\frac{\partial^2 M(x, t_i)}{\partial x_k \partial x_\ell}, \quad k, \ell = 1, \ldots, n.$$
The optimality conditions
First-order necessary condition:
$$\nabla f(x) = J(x)^T r(x) = 0.$$
Second-order sufficient condition:
$$\nabla^2 f(x) = J(x)^T J(x) + \sum_{i=1}^m r_i(x)\, \nabla^2 r_i(x) \ \text{ is positive definite.}$$
The first, and often dominant, term $J(x)^T J(x)$ of the Hessian contains only the Jacobian matrix $J(x)$, i.e., only first derivatives!
In the second term, the second derivatives are multiplied by the residuals. If the model is adequate then the residuals will be small near the solution and this term will be of secondary importance. In this case one gets an important part of the Hessian for free if one has already computed the Jacobian.
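Continuing the Gaussian sketch, the two conditions can be checked numerically at a candidate point; as a simplifying assumption, only the dominant term $J^T J$ of the Hessian is tested for positive definiteness, which is reasonable when the residuals are small.

    % Minimal sketch (continues the Gaussian example): check the optimality
    % conditions at a candidate point xc, using only the J'*J part of the Hessian.
    xc = xtrue;                    % assumed candidate point (ideally the computed minimizer)
    Jc = Jfun(xc, t);
    norm(Jc' * r(xc))              % first-order condition: should be close to zero
    min(eig(Jc' * Jc))             % > 0: the dominant part of the Hessian is positive definite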
Local linear LSQ problem
If we introduce a Taylor expansion around the LSQ solution $x^*$, the local least squares problem for $x$ close to $x^*$ can be written
$$\min_x \| J(x^*)(x - x^*) + r(x^*) \|_2^2 = \min_x \| J(x^*)\, x - \left( J(x^*)\, x^* - r(x^*) \right) \|_2^2 .$$
It follows from the results in Chapter 1 that
$$\mathrm{Cov}(x^*) \approx J(x^*)^\dagger\, \mathrm{Cov}\!\left( J(x^*)\, x^* - r(x^*) \right) (J(x^*)^\dagger)^T = J(x^*)^\dagger\, \mathrm{Cov}\!\left( r(x^*) - J(x^*)\, x^* \right) (J(x^*)^\dagger)^T = J(x^*)^\dagger\, \mathrm{Cov}(y)\, (J(x^*)^\dagger)^T,$$
where $J(x^*)^\dagger$ denotes the pseudoinverse of $J(x^*)$.
This provides a way to approximately assess the uncertainties in the least squares solution $x^*$ for the nonlinear problem.
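Continuing the Gaussian sketch, and assuming independent measurement errors with a common variance so that $\mathrm{Cov}(y) = \sigma^2 I$ (with $\sigma^2$ estimated from the residual), the covariance estimate reduces to $\sigma^2 (J^T J)^{-1}$:

    % Minimal sketch (continues the Gaussian example): approximate covariance of
    % the fitted parameters, assuming Cov(y) = sigma^2*I with sigma^2 estimated
    % from the residual norm at a computed solution xs.
    xs = xtrue;                              % placeholder; use the computed minimizer in practice
    Js = Jfun(xs, t);
    [m, n]  = size(Js);
    sigma2  = norm(r(xs))^2 / (m - n);       % estimated noise variance
    Covx    = sigma2 * inv(Js' * Js);        % J^+ Cov(y) (J^+)^T  with  Cov(y) = sigma2*I
    stderr  = sqrt(diag(Covx))               % standard errors of x1, x2, x3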
Newton's method
If $f(x)$ is twice continuously differentiable then we can use Newton's method to solve the nonlinear equation
$$\nabla f(x) = J(x)^T r(x) = 0,$$
which provides local stationary points for $f(x)$. This version of the Newton iteration takes the form, for $k = 0, 1, 2, \ldots$
$$x_{k+1} = x_k - \left( \nabla^2 f(x_k) \right)^{-1} \nabla f(x_k) = x_k - \left( J(x_k)^T J(x_k) + S(x_k) \right)^{-1} J(x_k)^T r(x_k),$$
where $S(x_k)$ denotes the matrix
$$S(x_k) = \sum_{i=1}^m r_i(x_k)\, \nabla^2 r_i(x_k).$$
Convergence. Quadratic convergence, but expensive: requires $m n^2$ derivatives to evaluate $S(x_k)$.
The Gauss-Newton method
If the problem is only mildly nonlinear, or if the residual at the solution is small, a good alternative is to neglect the second term $S(x_k)$ of the Hessian altogether.
The resulting method is referred to as the Gauss-Newton method, where the computation of the step $\Delta x_k^{\mathrm{GN}}$ involves the solution of the linear system
$$\left( J(x_k)^T J(x_k) \right) \Delta x_k^{\mathrm{GN}} = -J(x_k)^T r(x_k).$$
Note that in the full-rank case these are actually the normal equations for the linear least squares problem
$$\min_{\Delta x_k^{\mathrm{GN}}} \left\| J(x_k)\, \Delta x_k^{\mathrm{GN}} + r(x_k) \right\|_2^2 .$$
This is a descent step if $J(x_k)$ has full rank.
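Continuing the Gaussian sketch, one (undamped) Gauss-Newton step can be computed by solving the linear least squares problem with MATLAB's backslash instead of forming the normal equations explicitly; the current iterate below is an assumed value.

    % Minimal sketch (continues the Gaussian example): one Gauss-Newton step.
    xk = [1.5; 0.0; 0.4];             % assumed current iterate
    Jk = Jfun(xk, t);
    rk = r(xk);
    dx = -(Jk \ rk);                  % solves min ||Jk*dx + rk||_2  (the GN step)
    [f(xk), f(xk + dx)]               % objective before and after the undamped step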
Damped Gauss-Newton = G-N with line search
Implementations of the G-N method usually perform a line search in the direction $\Delta x_k^{\mathrm{GN}}$, e.g., requiring the step length $\alpha_k$ to satisfy the Armijo condition:
$$f(x_k + \alpha_k \Delta x_k^{\mathrm{GN}}) < f(x_k) + c_1\, \alpha_k\, \nabla f(x_k)^T \Delta x_k^{\mathrm{GN}} = f(x_k) + c_1\, \alpha_k\, r(x_k)^T J(x_k)\, \Delta x_k^{\mathrm{GN}},$$
with a constant $c_1 \in (0, 1)$.
This ensures that the reduction is (at least) proportional to both the parameter $\alpha_k$ and the directional derivative $\nabla f(x_k)^T \Delta x_k^{\mathrm{GN}}$.
The line search makes the algorithm (often) globally convergent.
Convergence. Can be quadratic if the neglected term in the Hessian is small. Otherwise it is linear.
Algorithm: Damped Gauss-Newton
Start with the initial point $x_0$, and iterate for $k = 0, 1, 2, \ldots$
1. Solve $\min_{\Delta x} \| J(x_k)\, \Delta x + r(x_k) \|_2$ to compute the step direction $\Delta x_k^{\mathrm{GN}}$.
2. Choose a step length $\alpha_k$ so that there is enough descent.
3. Calculate the new iterate: $x_{k+1} = x_k + \alpha_k \Delta x_k^{\mathrm{GN}}$.
4. Check for convergence.
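A minimal MATLAB sketch of this loop for the Gaussian example follows (continuing the earlier sketches); the starting point, the Armijo constant $c_1$, the tolerance, and the simple step-halving strategy are all assumed choices, not prescribed by the slides.

    % Minimal sketch (continues the Gaussian example): damped Gauss-Newton with a
    % backtracking (step-halving) Armijo line search.
    xk = [1.5; 0.0; 0.4];                     % assumed starting point x0
    c1 = 1e-4;  tol = 1e-8;
    for k = 0:50
        Jk = Jfun(xk, t);  rk = r(xk);
        g  = Jk' * rk;                        % gradient of f
        if norm(g) < tol, break; end          % convergence check (first-order)
        dx = -(Jk \ rk);                      % Gauss-Newton step direction
        alpha = 1;                            % backtrack until the Armijo condition holds
        while f(xk + alpha*dx) >= f(xk) + c1*alpha*(g'*dx) && alpha > 1e-10
            alpha = alpha/2;
        end
        xk = xk + alpha*dx;                   % new iterate
    end
    xk                                        % computed estimate of x*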
The Levenberg-Marquardt method
Very similar to G-N, except that we replace the line search with a trust-region strategy where the norm of the step is limited:
$$\min_{\Delta x} \| J(x_k)\, \Delta x + r(x_k) \|_2^2 \quad \text{subject to} \quad \| \Delta x \|_2 \le \text{bound}.$$
Constrained optimization is outside the scope of this course (it is covered in 02612).
Computation of the L-M Step
The computation of the step in the Levenberg-Marquardt method is implemented as:
$$\Delta x_k^{\mathrm{LM}} = \operatorname*{argmin}_{\Delta x} \left\{ \| J(x_k)\, \Delta x + r(x_k) \|_2^2 + \lambda_k \| \Delta x \|_2^2 \right\},$$
where $\lambda_k > 0$ is a so-called Lagrange parameter for the constraint at the $k$th iteration.
The L-M step is computed as the solution to the linear LSQ problem
$$\min_{\Delta x} \left\| \begin{pmatrix} J(x_k) \\ \lambda_k^{1/2} I \end{pmatrix} \Delta x + \begin{pmatrix} r(x_k) \\ 0 \end{pmatrix} \right\|_2^2 .$$
This method is more robust in the case of an ill-conditioned Jacobian.
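Continuing the Gaussian sketch, a single L-M step for an assumed value of $\lambda_k$ can be computed directly from the stacked linear least squares problem:

    % Minimal sketch (continues the Gaussian example): one Levenberg-Marquardt
    % step from the augmented (stacked) linear least squares problem.
    xk     = [1.5; 0.0; 0.4];                 % assumed current iterate
    lambda = 1e-2;                            % assumed Lagrange parameter
    Jk = Jfun(xk, t);  rk = r(xk);  n = numel(xk);
    A  = [Jk; sqrt(lambda)*eye(n)];           % stacked matrix [J; sqrt(lambda)*I]
    b  = [-rk; zeros(n, 1)];
    dx = A \ b;                               % minimizes ||Jk*dx + rk||^2 + lambda*||dx||^2
    [f(xk), f(xk + dx)]                       % objective before and after the trial step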
Algorithm: Levenberg-Marquardt
Start with the initial point $x_0$ and iterate for $k = 0, 1, 2, \ldots$
1. At each step $k$ choose the Lagrange parameter $\lambda_k$.
2. Solve the linear LSQ problem
$$\min_{\Delta x} \left\| \begin{pmatrix} J(x_k) \\ \lambda_k^{1/2} I \end{pmatrix} \Delta x + \begin{pmatrix} r(x_k) \\ 0 \end{pmatrix} \right\|_2^2$$
to compute the step $\Delta x_k^{\mathrm{LM}}$.
3. Calculate the next iterate $x_{k+1} = x_k + \Delta x_k^{\mathrm{LM}}$.
4. Check for convergence.
Note: there is no line search (i.e., no $\alpha_k$ parameter); its role is taken over by the Lagrange parameter $\lambda_k$.
The role of the Lagrange parameter
Consider the L-M step, which we formally write as:
$$\Delta x_k^{\mathrm{LM}} = -\left( J(x_k)^T J(x_k) + \lambda_k I \right)^{-1} J(x_k)^T r(x_k).$$
The parameter $\lambda_k$ influences both the direction and the length of the step.
Depending on the size of $\lambda_k$, the step $\Delta x_k^{\mathrm{LM}}$ can vary from a Gauss-Newton step for $\lambda_k = 0$ to a short step approximately in the steepest descent direction for large values of $\lambda_k$.
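Continuing the Gaussian sketch, this behaviour is easy to observe numerically: for a tiny $\lambda$ the step is close to the Gauss-Newton step, while for a large $\lambda$ it is short and (after normalization) nearly parallel to the negative gradient. The values of $\lambda$ used here are assumed for illustration.

    % Minimal sketch (continues the Gaussian example): the L-M step for small
    % and large lambda, compared with Gauss-Newton and steepest descent.
    xk = [1.5; 0.0; 0.4];  n = numel(xk);
    Jk = Jfun(xk, t);  rk = r(xk);  g = Jk' * rk;
    dxLM = @(lam) -((Jk'*Jk + lam*eye(n)) \ g);    % L-M step for a given lambda
    dxGN = -(Jk \ rk);                             % Gauss-Newton step (lambda = 0)
    [dxGN, dxLM(1e-10), dxLM(1e6)]                 % small lambda ~ GN step; large lambda ~ tiny step
    [dxLM(1e6)/norm(dxLM(1e6)), -g/norm(g)]        % large-lambda direction ~ steepest descent direction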
How to choose the Lagrange parameter
A strategy developed by Marquardt. The underlying principles are:
1. The initial value $\lambda_0 \approx \| J(x_0)^T J(x_0) \|_2$.
2. For subsequent steps, an improvement ratio is defined as:
$$\varrho_k = \frac{\text{actual reduction}}{\text{predicted reduction}} = \frac{f(x_k) - f(x_{k+1})}{\tfrac{1}{2} (\Delta x_k^{\mathrm{LM}})^T \left( \lambda_k\, \Delta x_k^{\mathrm{LM}} - J(x_k)^T r(x_k) \right)} .$$
Here, the denominator is the reduction in $f$ predicted by the local linear model.
If $\varrho_k$ is large then the pure Gauss-Newton model is good enough, so $\lambda_{k+1}$ can be made smaller than at the previous step. If $\varrho_k$ is small (or even negative) then a short steepest descent step should be used, i.e., $\lambda_{k+1}$ should be increased.
Algorithm: Marquardt's Parameter Updating
If $\varrho_k > 0.75$ then $\lambda_{k+1} = \lambda_k / 3$.
If $\varrho_k < 0.25$ then $\lambda_{k+1} = 2 \lambda_k$.
Otherwise use $\lambda_{k+1} = \lambda_k$.
If $\varrho_k > 0$ then perform the update $x_{k+1} = x_k + \Delta x_k^{\mathrm{LM}}$.
As for G-N, the L-M algorithm is (often) globally convergent.
Convergence. Can be quadratic if the neglected term in the Hessian is small. Otherwise it is linear.
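Putting the pieces together, here is a minimal MATLAB sketch (continuing the Gaussian example) of the Levenberg-Marquardt iteration with Marquardt's updating rule; the starting point, iteration limit, and stopping tolerance are assumed values.

    % Minimal sketch (continues the Gaussian example): Levenberg-Marquardt with
    % Marquardt's rule for updating the Lagrange parameter lambda.
    xk = [1.5; 0.0; 0.4];  n = numel(xk);      % assumed starting point x0
    Jk = Jfun(xk, t);  rk = r(xk);
    lambda = norm(Jk'*Jk);                     % initial lambda ~ ||J(x0)'*J(x0)||_2
    for k = 0:100
        g = Jk' * rk;
        if norm(g) < 1e-8, break; end          % convergence check
        dx   = [Jk; sqrt(lambda)*eye(n)] \ [-rk; zeros(n, 1)];   % L-M step
        pred = 0.5 * dx' * (lambda*dx - g);    % predicted reduction (local linear model)
        rho  = (f(xk) - f(xk + dx)) / pred;    % improvement ratio
        if rho > 0.75, lambda = lambda/3; elseif rho < 0.25, lambda = 2*lambda; end
        if rho > 0                             % accept the step only if f decreased
            xk = xk + dx;
            Jk = Jfun(xk, t);  rk = r(xk);
        end
    end
    xk                                         % computed estimate of x*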
G-N without damping (top) vs. L-M (bottom)
[Figure: iterates of undamped Gauss-Newton (top row) and Levenberg-Marquardt (bottom row), shown in the (x2, x3), (x1, x2), and (x1, x3) parameter planes.]
MATLAB Optimization Toolbox: lsqnonlin
[x,resnorm] = lsqnonlin(fun,x0) requires an initial point x0 and a function fun that computes the vector-valued function
$$f(x) = \begin{pmatrix} f_1(x) \\ \vdots \\ f_m(x) \end{pmatrix}$$
and solves the problem
$$\min_x \| f(x) \|_2^2 = \min_x \left( f_1(x)^2 + \cdots + f_m(x)^2 \right).$$
Use optimset to choose between different optimization methods. E.g., LargeScale=off and LevenbergMarquardt=off give the standard G-N method, while Jacobian=on and Algorithm=levenberg-marquardt give the L-M algorithm.
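As a usage illustration (continuing the Gaussian example, and assuming the Optimization Toolbox is available), the residual function r defined earlier can be passed directly to lsqnonlin, since the solver minimizes the squared 2-norm of the vector returned by fun:

    % Minimal sketch (continues the Gaussian example): fit the Gaussian model
    % with lsqnonlin; the starting point is an assumed value.
    x0   = [1.5; 0.0; 0.4];
    opts = optimset('Algorithm', 'levenberg-marquardt');
    [xfit, resnorm] = lsqnonlin(r, x0, [], [], opts);
    xfit                                   % fitted parameters (compare with xtrue)
    resnorm                                % ||r(xfit)||_2^2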