
Iterative Methods for Linear Systems

Howard C. Elman¹
Department of Computer Science
and
Institute for Advanced Computer Studies
University of Maryland
College Park, MD 20742
elman@cs.umd.edu
This chapter contains an overview of some of the important techniques used to solve linear systems of equations

    Ax = b                                                                 (1)

by iterative methods. We consider methods based on two general ideas, splittings of the coefficient matrix, leading to stationary iterative methods, and Krylov subspace methods. These two ideas can also be combined to produce preconditioned iterative methods. In addition, we outline some convergence results for using the methods considered to solve two classes of model problems arising from elliptic partial differential equations.
In §1, we introduce the basic ideas of stationary iterative methods and consider several particular examples of such methods: the Jacobi, Gauss-Seidel, SOR and SSOR methods. We outline some results on convergence of these methods, for both general matrices and those with special structure. In §2, we give an overview of Krylov subspace methods for systems where the coefficient matrix is symmetric. These include the conjugate gradient method for symmetric positive-definite systems, and several generalizations of this technique for the symmetric indefinite case. In §3, we examine the use of Krylov subspace methods for nonsymmetric problems. This is an active area of current research, and we highlight GMRES, the most popular method in current use, together with the QMR method, one of several new ideas being studied. In §4, we present several preconditioning techniques that can be used in combination with Krylov subspace methods. Our emphasis here is on methods such as incomplete factorizations that are defined purely in terms of the algebraic structure of the coefficient matrix. In §5, we outline the convergence properties of the methods presented for two classes of model problems, the discrete Poisson equation, which is symmetric positive-definite, and the discrete convection-diffusion equation, which is nonsymmetric. Finally, in §6, we present a brief discussion of several important topics that we have not considered here.
Before proceeding, we introduce several points of notation. We will assume that A is a nonsingular real matrix of order n. All the methods considered generate a sequence of iterates x^(k) that are intended to converge to x = A^{-1}b. They all require a stopping criterion that can be used to determine when the iterate is sufficiently accurate. We will not address this question in any detail, except to note that the residual r^(k) = b − Ax^(k) is easily computable; a commonly used stopping criterion is to require that the relative residual ‖r^(k)‖/‖b‖ be smaller than some tolerance, where ‖·‖ is some vector norm. Throughout this chapter, we will use (v, w) to represent the Euclidean inner product Σ_{j=1}^n v_j w_j, and ‖v‖₂ = (v, v)^{1/2} to denote the Euclidean norm. Many of the methods under consideration compute this norm of the residual, ‖r^(k)‖, as part of the iteration. Essentially all of the results presented here carry over to complex systems of equations where complex inner products are used in place of real inner products.

¹ This work was supported by the U.S. Army Research Office under grant DAAL-0392-G-0016, and by the National Science Foundation under grants ASC-8958544 and CCR-8818340.

1. Stationary Methods.

In this section, we give a brief overview of stationary methods for solving (1). Methods of this type, such as relaxation methods, were the most widely used examples of iterative methods when large computers first became available. (See [79, 80] for a historical perspective.) To some extent, they are now somewhat less popular than the methods discussed in §§2 and 3, although the ease with which they can be implemented and their uses in the context of preconditioners continue to make them an important topic of study.

1.1. Basic Principles.

A splitting of the coefficient matrix A is a representation of A in the form

    A = M − N.                                                             (2)

The problem (1) is then equivalent to Mx = Nx + b. This suggests, for nonsingular M, the stationary method for constructing a sequence of approximate solutions to (1),

    x^(k+1) = M^{-1}Nx^(k) + M^{-1}b.                                      (3)

Here, x^(0) is a (possibly arbitrary) initial guess for the solution. The "classical" Jacobi, Gauss-Seidel and successive overrelaxation methods [74, 78] are examples of such methods.

Let e^(k) = x − x^(k) denote the error at the k'th step. We say that the method (3) is convergent if lim_{k→∞} e^(k) = 0. Note that e^(k) = G^k e^(0), where G = M^{-1}N is the iteration matrix. Consequently, for any consistent matrix norm ‖·‖, the error satisfies

    ‖e^(k)‖ = ‖G^k e^(0)‖ ≤ ‖G^k‖ ‖e^(0)‖,                                 (4)

and the norm of the error tends to zero if ‖G^k‖ → 0. Unfortunately, it is usually difficult to derive analytic bounds on ‖G^k‖, and analysis of iterative methods typically makes use of the following result [74].

Theorem 1.1. The norm ‖G^k‖ → 0 if and only if ρ(G) < 1. Moreover, ‖G^k‖ ≤ c(k) ρ(G)^k where c(k) is a polynomial in k.

Here, ρ(G) = max{|λ| : λ is an eigenvalue of G}. That is, the method is convergent for arbitrary initial guesses provided that the largest eigenvalue of the iteration matrix has modulus less than one. The second assertion states that convergence is in some sense characterized by this eigenvalue. In particular, if our goal is to make ‖e^(k)‖/‖e^(0)‖ ≤ ε for some small ε, then from (4), it is sufficient to perform enough iterations such that ‖G^k‖ ≤ ε. In light of the theorem, this suggests that k should satisfy

    log ε ≥ log c(k) + k log ρ(G) ≈ k log ρ(G)   for large k,

i.e.

    k ≈ |log ε| / |log ρ(G)|                                               (5)

iterations will suffice.
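As a purely illustrative calculation (the numbers are ours, not taken from any particular problem), if ρ(G) = 0.99 and we ask for ε = 10⁻⁶, then (5) gives k ≈ |ln 10⁻⁶|/|ln 0.99| ≈ 13.8/0.01 ≈ 1375 iterations, whereas ρ(G) = 0.5 would require only about 20.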


We will consider examples of stationary methods using a variant of the computation (3). We have

    Mx^(k+1) = Nx^(k) + b = Mx^(k) + b − (M − N)x^(k),

so that

    x^(k+1) = x^(k) + M^{-1}r^(k).                                         (6)

This expression determines an implementation that automatically provides the residual vector, r^(k), which is often used in the stopping criterion for an iterative method. Its cost per iteration is somewhat higher than for (3) if the residual is not required.

We simultaneously consider "point" and "block" versions of several stationary methods. Let the coefficient matrix be written as A = D − L − U, where

    D = [A_ii],   L = [−A_ij], j < i,   and   U = [−A_ij], j > i

are block matrices consisting of the block diagonal, strict block lower triangular and strict block upper triangular parts of A, respectively. Here, each A_ij is itself a matrix, and the diagonal entries A_ii are assumed to be square and nonsingular. If all the matrices A_ij are square of order 1, then the methods we discuss below are "point" versions; otherwise, they are block versions.

The Jacobi method is defined by choosing M = D and N = L + U in the splitting (2), producing the iteration matrix B_J = D^{-1}(L + U). The Gauss-Seidel method is defined using M = D − L, N = U, giving the iteration matrix B_GS = (D − L)^{-1}U. The iterations therefore have the form

    x^(k+1) = x^(k) + D^{-1}r^(k)

for the Jacobi method, and

    x^(k+1) = x^(k) + (D − L)^{-1}r^(k)

for the Gauss-Seidel method.
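To make these definitions concrete, the following sketch (ours, in Python with NumPy, for a dense nonsingular matrix with nonzero diagonal; the routine names are illustrative only) implements the point Jacobi and Gauss-Seidel iterations in the residual-update form (6).

    import numpy as np

    def stationary_solve(A, b, M_solve, x0=None, tol=1e-8, maxit=1000):
        """Generic stationary iteration x <- x + M^{-1} r, cf. (6)."""
        x = np.zeros(len(b)) if x0 is None else x0.astype(float).copy()
        bnorm = np.linalg.norm(b)
        for k in range(maxit):
            r = b - A @ x                       # residual r^(k)
            if np.linalg.norm(r) <= tol * bnorm:
                return x, k
            x = x + M_solve(r)                  # x^(k+1) = x^(k) + M^{-1} r^(k)
        return x, maxit

    def jacobi(A, b, **kw):
        D = np.diag(A)                          # point diagonal of A
        return stationary_solve(A, b, lambda r: r / D, **kw)

    def gauss_seidel(A, b, **kw):
        DL = np.tril(A)                         # lower triangle of A, i.e. D - L
        return stationary_solve(A, b, lambda r: np.linalg.solve(DL, r), **kw)

For a strictly diagonally dominant test matrix, both routines converge, with Gauss-Seidel often requiring fewer sweeps, consistent with Theorem 1.2 below.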
The successive over-relaxation method (SOR) is defined by the splitting

    M = (1/ω)(D − ωL),   N = (1/ω)[(1 − ω)D + ωU],                         (7)

where ω ≠ 0 is real. The iteration matrix is L_ω = (D − ωL)^{-1}[(1 − ω)D + ωU], producing the iteration

    x^(k+1) = x^(k) + ω(D − ωL)^{-1}r^(k).

The idea underlying this method is to parameterize the Gauss-Seidel scheme, to which it reduces for the choice ω = 1.
The symmetric successive over-relaxation method performs a "lower triangular sweep" based on the splitting (7), followed by an analogous "upper triangular sweep." Specifically, let

    M₁ = (1/ω)(D − ωL),   N₁ = (1/ω)[(1 − ω)D + ωU],
    M₂ = (1/ω)(D − ωU),   N₂ = (1/ω)[(1 − ω)D + ωL].                       (8)

The SSOR iteration is

    x^(k+1/2) = x^(k) + M₁^{-1}(b − Ax^(k)),   x^(k+1) = x^(k+1/2) + M₂^{-1}(b − Ax^(k+1/2)).   (9)

This iteration can be expressed in terms of a single splitting of the form (2). Using (9), it can be shown that

    x^(k+1) = x^(k) + (M₁^{-1} + M₂^{-1} − M₂^{-1}AM₁^{-1})r^(k).

But

    M₁^{-1} + M₂^{-1} − M₂^{-1}AM₁^{-1} = M₂^{-1}(M₂ + N₁)M₁^{-1} = ω(2 − ω)(D − ωU)^{-1}D(D − ωL)^{-1},

so that the SSOR splitting matrix is

    M = (1/(ω(2 − ω)))(D − ωL)D^{-1}(D − ωU).                              (10)

We present this method in the context of stationary methods for historical reasons. In fact, for model problems, convergence is actually slower than for SOR [78], and the SSOR splitting is now primarily used in the context of preconditioning. See Section 4.
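The SSOR step (9) is easy to express in the same framework: a forward (SOR-type) sweep with M₁ followed by a backward sweep with M₂. The sketch below (ours, dense NumPy, hypothetical function name) applies one SSOR step; note that it never forms the splitting matrix (10) explicitly.

    import numpy as np

    def ssor_step(A, b, x, omega=1.0):
        """One SSOR iteration (9): forward sweep with M1, backward sweep with M2."""
        D = np.diag(np.diag(A))
        L = -(np.tril(A) - D)                  # strict lower part of A, sign flipped
        U = -(np.triu(A) - D)                  # strict upper part of A, sign flipped
        M1 = (D - omega * L) / omega           # M1 = (1/omega)(D - omega L)
        M2 = (D - omega * U) / omega           # M2 = (1/omega)(D - omega U)
        x_half = x + np.linalg.solve(M1, b - A @ x)
        return x_half + np.linalg.solve(M2, b - A @ x_half)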
There is a large body of analysis of convergence properties of stationary iterative methods.
The texts by Varga [74], Young [78] and Hageman and Young [41] are comprehensive references,
and the general texts by Ortega [53] and Stoer and Bulirsch [68] contain good concise overviews.
Much of the analysis is based on the Perron-Frobenius theory of nonnegative matrices, and we
will not develop this machinery here. We highlight some of the main results below.
Definition. Let A be a square matrix.
A is diagonally dominant if |a_ii| ≥ Σ_{j≠i} |a_ij| for all i.
A is strictly diagonally dominant if strict inequality holds for each i.
A is irreducible if there is no permutation matrix P such that PᵀAP has the form

    ( Ã₁₁  Ã₁₂ )
    (  0   Ã₂₂ )

where Ã₁₁ is of order p and Ã₂₂ is of order q and both p and q are greater than 0.
A is irreducibly diagonally dominant if it is irreducible and diagonally dominant, and strict inequality holds in at least one index.
Theorem 1.2.
(i) If A is either strictly diagonally dominant or irreducibly diagonally dominant, then both the point Jacobi method and the point Gauss-Seidel method are convergent.
(ii) If B_J ≥ 0 (elementwise) and the Jacobi method is convergent (ρ(B_J) < 1), then the Gauss-Seidel method is also convergent and ρ(B_GS) < ρ(B_J) < 1.
(iii) If A is symmetric positive definite, then the Gauss-Seidel method is convergent, and the SOR and SSOR methods are convergent for ω ∈ (0, 2).
(iv) The SOR method is not convergent for ω ∉ (0, 2).

To give a flavor of the analysis, we prove assertion (i). Note that strictly diagonally dominant matrices and irreducibly diagonally dominant matrices are nonsingular [74]. Consider the Jacobi method. Suppose D^{-1}(L + U)v = λv. This implies that (λD − L − U)v = 0, i.e., A_J(λ) = λD − L − U is singular. But if A = A_J(1) is either strictly or irreducibly diagonally dominant, then so is A_J(λ) for all |λ| ≥ 1. Consequently, it must be that ρ(B_J) < 1, so that the Jacobi method is convergent. Similarly, for the Gauss-Seidel method, if (D − L)^{-1}Uv = λv, then A_GS(λ) = λD − λL − U is singular. As above, when A is strictly or irreducibly diagonally dominant, so is A_GS(λ) for |λ| ≥ 1, so that ρ(B_GS) < 1.

1.2. Consistently Ordered Matrices and Property A.

The results of Theorem 1.2 do not say anything about how fast convergence is for any of the
methods considered. For problems with additional structure, it is possible to give more precise
statements about rates of convergence.
Definition. Let A = [A_ij], 1 ≤ i, j ≤ n_b, where A_ij is a submatrix and A_ii is square and nonsingular.
A has block Property A if there is a permutation matrix P such that PᵀAP has the form

    ( D₁  C₁ )
    ( C₂  D₂ )                                                             (11)

where D₁ and D₂ are block diagonal matrices whose only nonzero blocks are diagonal blocks of A.
A is block consistently ordered if the integers 1, ..., n_b can be partitioned into t disjoint sets {S_k}_{k=1}^t such that if A_ij ≠ 0, then i ∈ S_k implies j ∈ S_{k−1} for j < i and j ∈ S_{k+1} for j > i.

If the blocks of A are all of size 1 (so that n_b = n), then these definitions reduce to "point" Property A and consistent ordering. The most common examples of matrices with these structures are those arising from discretizations of elliptic and parabolic partial differential equations. Some examples are given below; see [53, 68, 74, 78] for details.²

•  Five-point (in two dimensions) and seven-point (in three dimensions) finite difference operators. For example, the left side of the figure below shows a "natural" ordering of a 5 × 4 grid, and the right side shows how these grid points can be grouped into sets indexed by superscripts that define a consistent ordering. A "red-black" ordering produces matrices with point Property A; for example, list the odd-numbered grid points in the left side of the figure first, followed by the even-numbered points.

    16  17  18  19  20            (4)  (5)  (6)  (7)  (8)
    11  12  13  14  15            (3)  (4)  (5)  (6)  (7)
     6   7   8   9  10            (2)  (3)  (4)  (5)  (6)
     1   2   3   4   5            (1)  (2)  (3)  (4)  (5)

² It is not always appreciated that consistent ordering is a more restrictive property than Property A. Any matrix of the form (11) is consistently ordered, with two sets S₁ and S₂ determined from the blocking in (11), but not every matrix that can be permuted into this form is consistently ordered itself. A counterexample [78] is

    A = ( 2+c   −1     0    −1  )
        ( −1   2+c    −1     0  )
        (  0    −1   2+c    −1  )
        ( −1     0    −1   2+c  ),

which is not consistently ordered but has Property A. (Interchange rows 2 and 3 and columns 2 and 3.) This matrix is a discretization of the one-dimensional Helmholtz equation −u″ + cu = 0 with periodic boundary conditions. The relation (12) does not hold for this matrix.

•  Linear finite elements on triangles, and bilinear finite elements on two-dimensional quadrilaterals. On regular grids, grouping of unknowns by lines produces matrices with block consistent orderings, and line red-black orderings lead to block Property A.

•  Discretizations of coupled differential operators together with orderings by grid points often produce matrices with block Property A, where the size of the blocks is the number of differential operators that are coupled together.

For consistently ordered matrices, the following results, gleaned from Young [78], Chapters 5 and 6, show the relationship between the Jacobi and Gauss-Seidel iteration matrices and identify a good choice for the SOR parameter.

Theorem 1.3. Let A be a consistently ordered matrix.
(i) For each eigenvalue λ of the SOR iteration operator L_ω, there is an eigenvalue μ of the Jacobi operator B_J such that

    (λ + ω − 1)² = ω²λμ².                                                  (12)

Conversely, if μ is an eigenvalue of B_J, then there is an eigenvalue λ of L_ω such that (12) holds. In particular, ρ(B_GS) = ρ(B_J)².
(ii) If σ(B_J) is real and ρ(B_J) < 1, then the choice ω* = 2/(1 + √(1 − ρ(B_J)²)) minimizes ρ(L_ω) with respect to ω, and ρ(L_{ω*}) = ω* − 1.
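For example (our numbers, purely illustrative), if ρ(B_J) = 0.99, then ρ(B_GS) = ρ(B_J)² ≈ 0.98, while ω* = 2/(1 + √(1 − 0.99²)) ≈ 1.75 gives ρ(L_{ω*}) = ω* − 1 ≈ 0.75; by (5), optimally parameterized SOR then requires roughly 14 times fewer iterations than Gauss-Seidel for this value of ρ(B_J).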
We present an elementary proof of (i), due to Golub and de Pillis [32], in the case where A has the form

    A = (  I_p   −M  )  =  ( I  0 )  −  (  0   0 )  −  ( 0  M )  =  D − L − U.
        ( −Mᵀ   I_q )      ( 0  I )     ( Mᵀ   0 )     ( 0  0 )

Let M = VΣWᵀ denote the singular value decomposition of M, i.e., V and W are orthogonal matrices and Σ is the matrix of singular values σ₁ ≥ σ₂ ≥ ··· ≥ σ_r ≥ 0, where r = min(p, q). Then

    B_J = (  0   M )  =  ( V  0 ) (  0   Σ ) ( V  0 )ᵀ.                    (13)
          ( Mᵀ   0 )     ( 0  W ) ( Σᵀ   0 ) ( 0  W )

By symmetrically permuting the rows and columns of the interior matrix on the right of (13), we find that B_J is similar to a block diagonal matrix containing the r two-by-two blocks

    (  0    σ_j )
    ( σ_j    0  ),   j = 1, ..., r,

on its block diagonal, and zeros elsewhere. Therefore, the eigenvalues of B_J are {±σ_j}, together with p + q − 2r zeros. Analogously, we have

    L_ω = (I − ωL)^{-1}[(1 − ω)I + ωU] = (I + ωL)[(1 − ω)I + ωU]

        = ( V  0 ) (   (1 − ω)I            ωΣ            ) ( V  0 )ᵀ,
          ( 0  W ) ( ω(1 − ω)Σᵀ    (1 − ω)I + ω²ΣᵀΣ      ) ( 0  W )

which is similar to the block diagonal matrix containing the r two-by-two blocks

    (    (1 − ω)            ωσ_j          )
    ( ω(1 − ω)σ_j    (1 − ω) + ω²σ_j²     ),   j = 1, ..., r,

and (1 − ω)I_{p+q−2r} on its block diagonal. Consequently, the eigenvalues of L_ω are (1 − ω) and the roots of the equation

    (λ + ω − 1)² = λω²σ_j²,

which is equivalent to (12).

2. Krylov Subspace Methods I: Symmetric Problems.

An alternative methodology that has proven to be a fruitful source of iterative methods is based on Krylov subspaces. Given a square matrix B and vector v, let K_k(v, B) ≡ span{v, Bv, ..., B^{k−1}v}, the Krylov subspace generated by B with respect to v. Given an initial guess x^(0) for (1) with residual r^(0), a Krylov subspace method produces a sequence of iterates of the form

    x^(k) = x^(0) + v^(k),                                                 (14)

where v^(k) ∈ K_k(r^(0), A). A very simple example is the first-order Richardson method x^(k+1) = x^(k) + α_k r^(k), where r^(k) = b − Ax^(k) and α_k is a scalar. Any iterate of the form (14) satisfies x^(k) = x^(0) + π_{k−1}(A)r^(0) where π_{k−1}(t) is a polynomial of degree k − 1. Equivalently, the residual satisfies r^(k) = φ_k(A)r^(0) where φ_k(t) is a member of the set

    P_k ≡ {polynomials φ_k of degree k satisfying φ_k(0) = 1}.

As motivation for basing an iterative method on this idea, suppose A has a set of orthonormal eigenvectors {v^(j)}_{j=1}^n, with corresponding eigenvalues {λ_j}_{j=1}^n. If r^(0) = Σ_{j=1}^n α_j v^(j), then r^(k) = Σ_{j=1}^n α_j φ_k(λ_j) v^(j) and

    ‖r^(k)‖₂ = ( Σ_{j=1}^n α_j² φ_k(λ_j)² )^{1/2} ≤ max_{λ∈σ(A)} |φ_k(λ)| ‖r^(0)‖₂.

Thus, the residual is small provided |φ_k| is small on the spectrum of A.


In this section we present some Krylov subspace iterative methods applicable to symmetric
problems. Our emphasis is on methods that require no estimation of auxiliary parameters such as
eigenvalues of the coecient matrix. We focus on the conjugate gradient method (CG) for symmetric positive-de nite systems, and on generalizations of CG for symmetric inde nite problems
7

derived from the connection between CG and the Lanczos algorithm. An extensive bibliography
of CG and related methods is given in [34].

2.1. Methods for Symmetric Positive Definite Problems.

Assume A is symmetric and positive-definite. Then the expression (u, Av) defines an inner product, and we refer to the associated norm ‖u‖_A ≡ (u, Au)^{1/2} as the "A-norm." The conjugate gradient method of Hestenes and Stiefel [42] is defined as follows.

The Conjugate Gradient Method.
    Choose x^(0); compute r^(0) = b − Ax^(0); set p^(0) = r^(0)
    for k = 0 until convergence do
        α_k = (r^(k), r^(k))/(p^(k), Ap^(k))
        x^(k+1) = x^(k) + α_k p^(k)
        r^(k+1) = r^(k) − α_k Ap^(k)
        <Test for convergence>
        β_k = (r^(k+1), r^(k+1))/(r^(k), r^(k))
        p^(k+1) = r^(k+1) + β_k p^(k)
    enddo
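As an illustration, here is a direct transcription of the algorithm into Python with NumPy (our sketch; the function name and the stopping test on the relative residual are our choices, not part of the statement above).

    import numpy as np

    def cg(A, b, x0=None, tol=1e-8, maxit=None):
        """Conjugate gradient method for symmetric positive-definite A."""
        n = len(b)
        maxit = n if maxit is None else maxit
        x = np.zeros(n) if x0 is None else x0.astype(float).copy()
        r = b - A @ x                          # r^(0)
        p = r.copy()                           # p^(0)
        rr = r @ r
        bnorm = np.linalg.norm(b)
        for k in range(maxit):
            Ap = A @ p
            alpha = rr / (p @ Ap)              # alpha_k
            x += alpha * p                     # x^(k+1)
            r -= alpha * Ap                    # r^(k+1)
            rr_new = r @ r
            if np.sqrt(rr_new) <= tol * bnorm: # test for convergence
                break
            beta = rr_new / rr                 # beta_k
            p = r + beta * p                   # p^(k+1)
            rr = rr_new
        return x

A quick experiment on a small random symmetric positive-definite matrix (e.g., A = BᵀB + I) illustrates the finite-termination and error-minimization properties discussed next.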

The effectiveness of CG stems from its minimization properties. The choice of the scalar α_k is such that the new iterate minimizes the A-norm of the error among all choices along the "direction vector" p^(k), i.e.

    ‖x − x^(k+1)‖_A = min_{u = x^(k) + αp^(k)} ‖x − u‖_A.

More importantly, the one-dimensional minimization is actually a k-dimensional one: x^(k) is the unique vector in the translated Krylov space x^(0) + K_k(r^(0), A) for which the A-norm of the error is minimum. We will prove this, making use of the following lemma.
Lemma 2.1. For any k such that x^(k) ≠ x, the vectors generated by the conjugate gradient method satisfy
(i) (r^(k), p^(j)) = (r^(k), r^(j)) = 0,   j < k;
(ii) (p^(k), Ap^(j)) = 0,   j < k;
(iii) span{r^(0), r^(1), ..., r^(k−1)} = span{p^(0), p^(1), ..., p^(k−1)} = K_k(r^(0), A).

Proof. We prove relations (i) and (ii) simultaneously by induction on k. They are trivially true for k = 0. Assume they all hold for indices 0, ..., k − 1. For the equalities of (i), we have

    (r^(k), p^(j)) = (r^(k−1), p^(j)) − α_{k−1}(p^(k−1), Ap^(j)).          (15)

If j < k − 1, then the induction hypothesis implies that both inner products on the right side of (15) are 0; if j = k − 1, then, from the definition of α_{k−1}, the expression on the right is (r^(k−1), p^(k−1)) − (r^(k−1), r^(k−1)), which is zero by the induction hypothesis. To complete the induction, we use the recurrence for p^(j), giving

    (r^(k), p^(j)) = (r^(k), r^(j)) + β_{j−1}(r^(k), p^(j−1)) = (r^(k), r^(j)).

For assertion (ii), by the recurrences defining p^(k) and r^(k+1), we have

    (p^(k), Ap^(j)) = −(1/α_j)(r^(k), r^(j+1) − r^(j)) + β_{k−1}(p^(k−1), Ap^(j)).      (16)

For j < k − 1, the induction hypotheses for (i) and (ii) imply that the right side of (16) is zero. For j = k − 1, using the induction hypothesis for (i) and the definitions of α_{k−1} and β_{k−1}, the right-hand expression in (16) is

    −[(r^(k), r^(k))/(r^(k−1), r^(k−1))](p^(k−1), Ap^(k−1)) + [(r^(k), r^(k))/(r^(k−1), r^(k−1))](p^(k−1), Ap^(k−1)) = 0.

For assertion (iii), a straightforward inductive argument shows that

    span{r^(0), r^(1), ..., r^(k−1)} ⊆ K_k(r^(0), A),   span{p^(0), p^(1), ..., p^(k−1)} ⊆ K_k(r^(0), A).

By (i) and (ii), each of the sets {r^(j)}_{j=0}^{k−1} and {p^(j)}_{j=0}^{k−1} is linearly independent. But K_k(r^(0), A) has dimension at most k, so that all three sets must be identical.
To establish the k-dimensional minimization property of CG, it is convenient to use the function E(u) ≡ (u, Au) − 2(b, u). Note that E(u) = (x − u, A(x − u)) − (x, Ax), i.e. E(u) differs by a constant from the square of the A-norm of the error. Consequently, the error norm and E(u) are minimized by the same quantities.

Theorem 2.1. The iterate x^(k) generated by the conjugate gradient method is the unique member of x^(0) + K_k(r^(0), A) for which either (and therefore both) the A-norm of the error is minimum, or the residual b − Ax^(k) is orthogonal to K_k(r^(0), A).

Proof. By definition, x^(k) = x^(0) + P_k a_k, where P_k = [p^(0), ..., p^(k−1)] is the matrix with columns p^(0), p^(1), ..., p^(k−1), and a_k = (α_0, α_1, ..., α_{k−1})ᵀ. By Lemma 2.1, (iii), x^(k) ∈ x^(0) + K_k(r^(0), A). To establish the minimizing property of x^(k), let u^(k) denote any other vector in x^(0) + K_k(r^(0), A), and let v^(k) = u^(k) − x^(k). Thus, v^(k) has the form v^(k) = P_k b_k. We have

    E(u^(k)) = E(x^(k) + v^(k)) = E(x^(k)) + (v^(k), Av^(k)) + 2(v^(k), Ax^(k) − b).

But

    (v^(k), Ax^(k) − b) = −(b_k, P_kᵀ r^(k)) = 0,

by Lemma 2.1, (i). Therefore, for u^(k) ≠ x^(k), E(u^(k)) > E(x^(k)).

From Lemma 2.1, we know that r^(k) is orthogonal to K_k(r^(0), A). To establish uniqueness, suppose there exists u^(k) ∈ x^(0) + K_k(r^(0), A) such that r̂^(k) = b − Au^(k) is also orthogonal to K_k(r^(0), A). Note that r^(k) − r̂^(k) = A(u^(k) − x^(k)). Therefore, u^(k) − x^(k) ∈ span{p^(0), ..., p^(k−1)}, and (p^(j), A(u^(k) − x^(k))) = 0, j = 0, ..., k − 1. Consequently, u^(k) − x^(k) = 0.
This result implies that CG computes the exact solution to (1) in at most n steps. Moreover, the optimality of the CG iterate can be used to derive bounds on the A-norm of the error. The analysis makes use of the Chebyshev polynomials, defined by

    τ_k(t) ≡  cos(k cos^{-1} t)     for t ∈ [−1, 1],
              cosh(k cosh^{-1} t)   for t > 1,                             (17)
              (−1)^k τ_k(−t)        for t < −1.

Using (17), it can be shown, see e.g. [61], that τ_k is a polynomial of degree k, and that

    τ_k(t) = (1/2)[(t + √(t² − 1))^k + (t − √(t² − 1))^k].                 (18)

Let κ = κ(A) = λ_max(A)/λ_min(A), the condition number of A.
Theorem 2.2. The error e^(k) = x − x^(k) after k steps of the conjugate gradient method satisfies

    ‖e^(k)‖_A ≤ 2 ( (1 − 1/√κ)/(1 + 1/√κ) )^k ‖e^(0)‖_A.

Proof. Let e^(0) = Σ_{j=1}^n α_j v^(j) denote a decomposition of the initial error into orthonormal eigenvectors, where Av^(j) = λ_j v^(j). Let a = λ_min(A), b = λ_max(A). Theorem 2.1 implies that ‖e^(k)‖_A ≤ ‖φ_k(A)e^(0)‖_A, where φ_k(t) is any polynomial in P_k. But

    ‖φ_k(A)e^(0)‖_A = ( Σ_{j=1}^n α_j² λ_j φ_k(λ_j)² )^{1/2} ≤ max_j |φ_k(λ_j)| ‖e^(0)‖_A ≤ max_{t∈[a,b]} |φ_k(t)| ‖e^(0)‖_A.   (19)

Consider the particular choice of φ_k(t) as the scaled and translated Chebyshev polynomial

    φ_k(t) = τ_k( (b + a − 2t)/(b − a) ) / τ_k( (b + a)/(b − a) ).         (20)

For t ∈ [a, b], the argument in the numerator of (20) lies in [−1, 1], so that |φ_k(t)| ≤ 1/τ_k( (b + a)/(b − a) ). But, by (18),

    τ_k( (b + a)/(b − a) ) ≥ (1/2) ( (1 + 1/√κ)/(1 − 1/√κ) )^k.            (21)

The result follows from (19), (20) and (21).
Suppose our objective is to make the relative error ‖e^(k)‖_A/‖e^(0)‖_A ≤ ε. Using the bound from Theorem 2.2, it suffices for the inequality

    ε ≥ 2 ( (1 − 1/√κ)/(1 + 1/√κ) )^k = 2 ( 1 − 2/(√κ + 1) )^k

to hold. Taking the natural logarithm of both sides of the inequality and using the fact that ln(1 − 2/(√κ + 1)) ≈ −2/√κ for large κ, this is equivalent to the condition

    k ≥ (1/2) √κ |ln(ε/2)|.                                                (22)

That is, a bound on the number of iterations required to reach a given stopping criterion is approximately proportional to the square root of the condition number of the coefficient matrix. This quantity is often much smaller than n.
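For instance (an illustrative calculation of ours, not tied to any particular problem), with κ = 10⁴ and ε = 10⁻⁶, the estimate (22) gives k ≈ (1/2)·√10⁴·|ln(5 × 10⁻⁷)| ≈ 50·14.5 ≈ 725 iterations, independent of n.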
The bound of Theorem 2.2 can be improved in cases where the eigenvalues of A are clustered
into groups, or where there are a small number of isolated eigenvalues. See [3, 36].

Finally, we note that it is possible to construct a variant of the conjugate gradient method whose kth iterate is the unique member of x^(0) + K_k(r^(0), A) that minimizes ‖e^(k)‖_{A²} = ‖r^(k)‖₂, i.e. the Euclidean norm of the residual. This method is known as the conjugate residual method (CR). From a computational point of view, the main difference between CG and CR is that the scalars required by CR have the form

    α_k = (r^(k), Ar^(k))/(Ap^(k), Ap^(k)),   β_k = (r^(k+1), Ar^(k+1))/(r^(k), Ar^(k));      (23)

to avoid the computation of two matrix-vector products per step, the product Ar^(k+1) can be used to perform the update Ap^(k+1) = Ar^(k+1) + β_k Ap^(k).

2.2. Methods for Symmetric Indefinite Problems.

If A is indefinite, then the denominator (p^(k), Ap^(k)) of α_k computed by CG may be zero and the algorithm will break down. In practice, the exact value zero typically does not occur, but a very small value of (p^(k), Ap^(k)) will make the computation unstable. We now show how to avoid this problem by exploiting the connection between CG and the Lanczos algorithm. This idea gave rise to the SYMMLQ algorithm developed by Paige and Saunders [54]. In addition, we give a brief description of a stabilized version of the CR method applicable to indefinite systems.

Consider the Lanczos computation for generating orthonormal vectors [14, 33, 56, 77]. Let v^(0) be a vector such that ‖v^(0)‖₂ = 1, and let v^(−1) = 0. An orthogonal basis for K_{k+1}(v^(0), A) can be constructed by the recurrence

    β_{j+1} v^(j+1) = Av^(j) − α_j v^(j) − β_j v^(j−1),   0 ≤ j ≤ k − 1,   (24)

where α_j = (v^(j), Av^(j)) and β_{j+1} is chosen so that ‖v^(j+1)‖₂ = 1. Let V_k = [v^(0), v^(1), ..., v^(k−1)], and let T_k denote the symmetric tridiagonal matrix

    tri[β_j, α_j, β_{j+1}],   0 ≤ j ≤ k − 1.

Then (24) is equivalent to the relation

    AV_k = V_k T_k + β_k [0, ..., 0, v^(k)],                               (25)

and V_kᵀAV_k = T_k. The Lanczos algorithm constructs the orthonormal set {v^(j)} and uses the eigenvalues of T_k as estimates for the eigenvalues of A. Note that the off-diagonal entries of T_k are uniquely determined up to sign.³
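A minimal sketch of the recurrence (24) in Python with NumPy (our illustration; it returns V_k and the tridiagonal T_k of (25), and ignores the loss of orthogonality that occurs in floating point for large k).

    import numpy as np

    def lanczos(A, v0, k):
        """Run k steps of the symmetric Lanczos recurrence (24)."""
        n = len(v0)
        V = np.zeros((n, k))
        alpha = np.zeros(k)
        beta = np.zeros(k)                     # beta[j] multiplies v^(j-1); beta[0] unused
        v = v0 / np.linalg.norm(v0)
        v_old = np.zeros(n)
        for j in range(k):
            V[:, j] = v
            w = A @ v - beta[j] * v_old        # Av^(j) - beta_j v^(j-1)
            alpha[j] = v @ w
            w -= alpha[j] * v                  # subtract alpha_j v^(j)
            if j + 1 < k:
                beta_next = np.linalg.norm(w)
                if beta_next == 0.0:           # invariant subspace found; stop early
                    V = V[:, : j + 1]
                    alpha, beta = alpha[: j + 1], beta[: j + 1]
                    break
                v_old, v = v, w / beta_next
                beta[j + 1] = beta_next
        T = np.diag(alpha) + np.diag(beta[1:], 1) + np.diag(beta[1:], -1)
        return V, T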
Suppose CG is applied to a (possibly indefinite) system, but that the computation does not break down through step k. The residuals and direction vectors satisfy

    r^(j+1) = r^(j) − α_j A(r^(j) + β_{j−1}p^(j−1)) = −α_j Ar^(j) + (1 + α_j β_{j−1}/α_{j−1}) r^(j) − (α_j β_{j−1}/α_{j−1}) r^(j−1),

or in matrix form

    AR_k = R_k S_k − (1/α_{k−1})[0, ..., 0, r^(k)].                        (26)

Here, R_k = [r^(0), ..., r^(k−1)] and S_k = tri[−1/α_{j−1}, 1/α_j + β_{j−1}/α_{j−1}, −β_j/α_j]. S_k is similar to a symmetric matrix T̃_k via a diagonal similarity transformation T̃_k = Δ_k S_k Δ_k^{-1}, where Δ_k = diag(‖r^(0)‖₂, ‖r^(1)‖₂, ..., ‖r^(k−1)‖₂). Postmultiplying (26) by Δ_k^{-1} and letting Ṽ_k = R_k Δ_k^{-1} denote the matrix of normalized residuals leads to the equivalent relation

    AṼ_k = Ṽ_k T̃_k − (√β_{k−1}/α_{k−1})[0, ..., 0, ṽ^(k)].                (27)

But (25) and (27) are identical, so that Ṽ_k = V_k, and T̃_k = T_k. That is, the normalized residuals generated by CG are precisely the Lanczos vectors. Moreover, the CG iterate x^(k) can be recovered directly from (25). By Theorem 2.1, x^(k) is the unique vector in x^(0) + K_k(r^(0), A) with residual orthogonal to K_k(r^(0), A). That is, x^(k) = x^(0) + V_k y^(k), where y^(k) = (y_0^(k), ..., y_{k−1}^(k))ᵀ, such that V_kᵀ r^(k) = 0. But

    r^(k) = r^(0) − AV_k y^(k) = V_k (‖r^(0)‖₂ e₁ − T_k y^(k)) − β_k y_{k−1}^(k) v^(k),      (28)

where e₁ = [1, 0, ..., 0]ᵀ. Orthogonality is imposed by choosing y^(k) to satisfy

    0 = V_kᵀ r^(k) = ‖r^(0)‖₂ e₁ − T_k y^(k).                              (29)

Note that by (28) and (29),

    r^(k) = −β_k y_{k−1}^(k) v^(k),   ‖r^(k)‖₂ = β_k |y_{k−1}^(k)|.        (30)

³ The products β_{j+1}v^(j+1) are uniquely determined.
Let us consider the question of breakdown in CG more closely, following [54]. By the recurrence defining {p^(j)} in CG, we have R_k = P_k L_kᵀ where

    L_kᵀ = ( 1  −β_0                  )
           (     1    ···             )
           (          ···   −β_{k−2}  )
           (                    1     ).

Equivalently, V_k = P_k L̃_kᵀ, where L̃_k = Δ_k^{-1} L_k. Let D_k = P_kᵀAP_k, the diagonal matrix whose entries are the denominators of the scalars {α_j}. Then T_k = V_kᵀAV_k = L̃_k D_k L̃_kᵀ. That is, the LDLᵀ factorization of T_k contains a (small or) zero pivot if and only if the CG algorithm (nearly) breaks down. If A is positive-definite, then so is T_k, and the LDLᵀ factorization is stable. If A is indefinite, then the LDLᵀ factorization may not exist, but it may still be possible to compute an iterate x^(k) ∈ x^(0) + K_k(r^(0), A) with residual orthogonal to K_k(r^(0), A). It is only necessary that y^(k) in (29) be computable, i.e., that T_k be nonsingular. In fact, T_k may be singular when A is indefinite, but T_k is the leading principal submatrix of T_{k+1}, so that the eigenvalues of T_k interlace those of T_{k+1} [77]. Therefore, T_k cannot be singular for two consecutive indices k, and it is always possible to construct a sequence of iterates {x^(k)} whose residuals are orthogonal to K_k(r^(0), A).

This is the basis of the SYMMLQ method developed by Paige and Saunders [54]. This method can be specified in a form analogous to CG, where x^(k+1) is derived from x^(k) by a short recurrence. We will not present this variant here; cf. [9, 54] and §3.2. Instead, we describe a construction that has also proven to be useful for developing methods for nonsymmetric problems. Let the tridiagonal system of (29) be represented as T_k y^(k) = d^(k). The upper left 2 × 2 corner of T_k is

    ( α_0  β_1 )
    ( β_1  α_1 ),

and the plane rotation

    Q_0 = (  α_0/(α_0² + β_1²)^{1/2}   β_1/(α_0² + β_1²)^{1/2}              )
          ( −β_1/(α_0² + β_1²)^{1/2}   α_0/(α_0² + β_1²)^{1/2}              )
          (                                                       I_{k−2}   )

is such that Q_0 T_k is zero below the diagonal in the first column. In similar fashion, it is possible to define plane rotations Q_1, ..., Q_{k−1} such that R_k = Q_{k−1} ··· Q_1 Q_0 T_k is an upper triangular matrix with three nonzero bands, and (29) is equivalent to the upper triangular system

    R_k y^(k) = Q_{k−1} ··· Q_1 Q_0 d^(k).

R_k is nonsingular if and only if T_k is nonsingular, in which case y^(k) is easily obtained. Moreover, because T_{k−1} is a leading principal submatrix of T_k, R_k can be computed from R_{k−1} using just one plane rotation. An efficient implementation of a method equivalent to SYMMLQ using ‖r^(k)‖₂ as a stopping criterion is to solve for y^(k) (when possible), calculate ‖r^(k)‖₂ from (30), and compute x^(k) only after the stopping test is satisfied.

The conjugate residual method minimizes ‖r^(k)‖₂, which constitutes a norm (of the error) even if A is indefinite. However, CR is also subject to breakdown, if (r^(k), Ar^(k)) = 0 at any step. (See (23).) This problem can be fixed using the ideas just presented. Relation (25) is equivalent to AV_k = V_{k+1} T̂_k, where T̂_k is the matrix of dimensions (k + 1) × k containing T_k in its first k rows and [0, ..., 0, β_k] in its last row. Then

    ‖r^(k)‖₂ = ‖ ‖r^(0)‖₂ e₁ − T̂_k y^(k) ‖₂.                               (31)

Consequently, the coefficients y^(k) producing x^(k) = x^(0) + V_k y^(k) with minimal residual norm can be obtained by minimizing the expression on the right of (31). It can be shown that unless the exact solution has been obtained at step k − 1, T̂_k has full rank, so that y^(k) can always be obtained. The same set of plane rotations discussed above can be used to transform T̂_k to upper triangular form

    ( R_k )
    (  0  ).

The algorithm based on this analysis is equivalent to the MINRES method of [54].
Finally, we present another algorithm equivalent to MINRES whose implementation is closer in form to CG and CR. By an argument identical to the proof of Lemma 2.1, (i), it can be shown that for CR,

    α_k = (r^(k), Ap^(k))/(Ap^(k), Ap^(k)).

An alternative method for generating vectors {p^(k)} that are orthogonal with respect to the A²-inner product, i.e. (Ap^(j), Ap^(k)) = 0 for j ≠ k, is based on the recurrence

    p^(k+1) = Ap^(k) + β_k p^(k) + γ_k p^(k−1),                            (32)

where β_k = −(Ap^(k), A²p^(k))/(Ap^(k), Ap^(k)), γ_k = −(Ap^(k), A²p^(k−1))/(Ap^(k−1), Ap^(k−1)) (with p^(−1) = 0, γ_0 = 0). Thus, an algorithm for constructing the MINRES iterate is as follows:

The Orthogonal Direction Minimum Residual Method.
    Choose x^(0); compute r^(0) = b − Ax^(0); set p^(0) = r^(0), p^(−1) = 0
    for k = 0 until convergence do
        α_k = (r^(k), Ap^(k))/(Ap^(k), Ap^(k))
        x^(k+1) = x^(k) + α_k p^(k)
        r^(k+1) = r^(k) − α_k Ap^(k)
        <Test for convergence>
        β_k = −(Ap^(k), A²p^(k))/(Ap^(k), Ap^(k))
        if k > 0, γ_k = −(Ap^(k), A²p^(k−1))/(Ap^(k−1), Ap^(k−1)), else γ_k = 0
        p^(k+1) = Ap^(k) + β_k p^(k) + γ_k p^(k−1)
        Ap^(k+1) = A²p^(k) + β_k Ap^(k) + γ_k Ap^(k−1)
    enddo

This algorithm is presented in Chandra [9]. Only one matrix-vector product is required at each step, to compute A²p^(k) from Ap^(k). Several variants of this method have been developed that save work by using the CR update for p^(k+1) unless this leads to breakdown, in which case (32) is used; see [9, 45].
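The following Python/NumPy transcription of the algorithm is our own sketch (names and stopping rule are illustrative). It maintains Ap^(k) explicitly so that only the single product A(Ap^(k)) is needed per step, and it uses the symmetry of A to evaluate γ_k as (A²p^(k), Ap^(k−1)) without a further product.

    import numpy as np

    def od_minres(A, b, x0=None, tol=1e-8, maxit=None):
        """Orthogonal Direction Minimum Residual method for symmetric (possibly indefinite) A."""
        n = len(b)
        maxit = n if maxit is None else maxit
        x = np.zeros(n) if x0 is None else x0.astype(float).copy()
        r = b - A @ x
        p, p_old = r.copy(), np.zeros(n)
        Ap, Ap_old = A @ p, np.zeros(n)
        bnorm = np.linalg.norm(b)
        for k in range(maxit):
            alpha = (r @ Ap) / (Ap @ Ap)
            x += alpha * p
            r -= alpha * Ap
            if np.linalg.norm(r) <= tol * bnorm:
                break
            AAp = A @ Ap                       # the only matrix-vector product of the step
            beta = -(Ap @ AAp) / (Ap @ Ap)
            gamma = 0.0 if k == 0 else -(AAp @ Ap_old) / (Ap_old @ Ap_old)
            p_new = Ap + beta * p + gamma * p_old
            Ap_new = AAp + beta * Ap + gamma * Ap_old
            p_old, p = p, p_new
            Ap_old, Ap = Ap, Ap_new
        return x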
Error analysis for the SYMMLQ and MINRES algorithms can be found in [9, 64, 69].

3. Krylov Subspace Methods II: Nonsymmetric Problems.

The conjugate gradient method and its variants have two properties that make them effective: they are "optimal," in the sense that at the kth step, either an error function is minimized or a condition of orthogonality is imposed with respect to the k-dimensional space K_k(r^(0), A); and they are inexpensive, requiring a fixed length recurrence at each step. It is known that there are no generalizations of CG for solving arbitrary nonsymmetric systems that have both these properties [25, 26]. In the past two decades, a large amount of effort has been devoted to developing effective Krylov subspace methods that retain one of the two properties. That is, either they retain optimality by allowing the cost per iteration to grow, or they sacrifice optimality but require a small amount of work per step. Both types of methods can be derived using variants of the Lanczos process for nonsymmetric matrices. In this section, we summarize some of the important developments along these lines.

Before proceeding, we note that one strategy for solving a nonsymmetric system is simply to embed it into a symmetric positive-definite one using the normal equations AᵀAx = Aᵀb, which can then be solved using the conjugate gradient method. This idea is generally not favored, because the condition number of AᵀA is the square of that of A. Cf. [52] for a discussion of situations where it may be effective.

3.1. The Generalized Minimal Residual Method.

Krylov subspace methods derived from optimality criteria were the subject of extensive research from the late 1970's through the mid-1980's. See [2, 15, 65] for characterizations of these ideas and a more complete list of references. The main idea is as follows. For general nonsymmetric A, it is not possible to generate an orthogonal basis for K_k(r^(0), A) using short recurrences like (24) or (32). However, a basis can be generated by a strategy analogous to the Gram-Schmidt procedure, in which all previously constructed vectors {v^(0), ..., v^(k−1)} are used in the construction of v^(k). This basis can then be used to construct an iterate x^(k) that satisfies a "k-dimensional" criterion, e.g., r^(k) is minimized over x^(0) + K_k(r^(0), A) or r^(k) is orthogonal to K_k(r^(0), A).

The work and storage requirements of such a computation grow like O(kn), which will become prohibitive for large k. To avoid this difficulty, the k-dimensional criterion can be modified, either by truncating the space or by restarting the algorithm. For example, at step k, the (minimization or orthogonality) condition can be imposed on an m-dimensional subspace of K_k(r^(0), A), where m is independent of k. An example of such a truncated algorithm is Orthomin(m) [19]. Alternatively, the k-dimensional condition can be imposed as long as k ≤ m, at which point the iteration is restarted with x^(m) as a new initial guess. This strategy, with a minimization condition, has been demonstrated to have superior convergence characteristics. The most popular implementation of it is the restarted version of the generalized minimal residual algorithm (GMRES), developed by Saad and Schultz [66].

GMRES is a generalization of the variants of the conjugate gradient method based on the connection between CG and the Lanczos algorithm. It replaces the symmetric Lanczos recurrence (24) with the Arnoldi algorithm [77], which, given v^(0) with ‖v^(0)‖₂ = 1, constructs an orthonormal basis for span{v^(0), Av^(0), ..., A^{k−1}v^(0)} as follows:

The Arnoldi Method.
    Choose v^(0) with ‖v^(0)‖₂ = 1
    for s = 0 until k − 1 do
        w_0^(s+1) = Av^(s)
        for r = 0 until s do
            h_rs = (w_r^(s+1), v^(r))
            w_{r+1}^(s+1) = w_r^(s+1) − h_rs v^(r)
        enddo
        h_{s+1,s} = ‖w_{s+1}^(s+1)‖₂
        v^(s+1) = w_{s+1}^(s+1)/h_{s+1,s}
    enddo

This computation is analogous to the modified Gram-Schmidt process. (Cf. [75] for an alternative method for generating an orthonormal basis that has superior stability characteristics.) Let V_k = [v^(0), ..., v^(k−1)], and let H_k = [h_rs], 0 ≤ r, s ≤ k − 1. By construction, H_k is an upper-Hessenberg matrix, and

    AV_k = V_k H_k + h_{k,k−1}[0, ..., 0, v^(k)].                          (33)

The Arnoldi method for eigenvalues is to use the eigenvalues of H_k = V_kᵀAV_k as estimates for those of A; see [62, 77]. When A is symmetric, H_k reduces to the tridiagonal matrix produced by the Lanczos algorithm, and (33) is identical to (25).

In a similar manner, the variant of MINRES derived from (31) can be generalized. Given an initial guess x^(0) for the solution to (1), let r^(0) = b − Ax^(0) and v^(0) = r^(0)/‖r^(0)‖₂. Any x^(k) ∈ x^(0) + K_k(r^(0), A) has the form x^(k) = x^(0) + V_k y^(k). Let Ĥ_k denote the matrix of dimensions (k + 1) × k containing H_k in its first k rows and [0, ..., 0, h_{k,k−1}] in its last row. Then (33) is equivalent to AV_k = V_{k+1}Ĥ_k, and

    r^(k) = r^(0) − AV_k y^(k) = V_{k+1}(‖r^(0)‖₂ e₁ − Ĥ_k y^(k)).         (34)

Consequently, using the orthogonality of the vectors {v^(j)}, we have

    ‖r^(k)‖₂ = ‖ ‖r^(0)‖₂ e₁ − Ĥ_k y^(k) ‖₂.                               (35)

The GMRES method computes x^(k) ∈ x^(0) + K_k(r^(0), A) such that ‖r^(k)‖₂ is minimum. From (35), the vector of coefficients y^(k) that produces this iterate can be obtained by minimizing the expression on the right of (35). As in the symmetric case, this upper-Hessenberg least squares problem can be solved by transforming Ĥ_k into upper triangular form

    ( R_k )
    (  0  ),

where R_k is upper triangular, using k + 1 plane rotations (which are also applied to ‖r^(0)‖₂ e₁). Here, Ĥ_k contains Ĥ_{k−1} as a submatrix, so that in a practical implementation, R_k can be updated from R_{k−1}. Moreover, by an analysis similar to that leading to (30), it can be shown that ‖r^(k)‖₂ can be obtained at essentially no cost. Hence, a step of the GMRES algorithm consists of constructing a new Arnoldi vector v^(k−1), determining the residual norm of the iterate x^(k) that would be obtained from K_k(r^(0), A), and then either constructing x^(k) if the stopping criterion is satisfied, or proceeding to the next step otherwise.

As we have noted, this computation may become too expensive as k gets large. The restarted GMRES algorithm, GMRES(m), restarts the computation every m steps, as follows.

The Restarted GMRES Method.
    1. Choose x^(0), compute r^(0) = b − Ax^(0), v^(0) = r^(0)/‖r^(0)‖₂.
    2. Compute {v^(1), ..., v^(m−1)} using the Arnoldi algorithm.
    3. Compute y^(m) such that ‖ ‖r^(0)‖₂ e₁ − Ĥ_m y^(m) ‖₂ is minimum, and x^(m) = x^(0) + V_m y^(m).
    4. Compute r^(m) = b − Ax^(m). If the stopping criterion is met then stop;
       otherwise, set x^(0) = x^(m), r^(0) = r^(m), v^(0) = r^(0)/‖r^(0)‖₂, and repeat step 2.
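A compact Python/NumPy sketch of GMRES(m) is given below (our illustration). For clarity it solves the small least squares problem (35) with a dense routine at the end of each cycle rather than updating plane rotations incrementally, so the residual test is applied only at restarts; the incremental approach described above is what a production code would use.

    import numpy as np

    def gmres_restarted(A, b, m=30, tol=1e-8, maxcycles=100):
        """Restarted GMRES: Arnoldi process plus a small (m+1) x m least squares solve."""
        n = len(b)
        x = np.zeros(n)
        bnorm = np.linalg.norm(b)
        for cycle in range(maxcycles):
            r = b - A @ x
            beta = np.linalg.norm(r)
            if beta <= tol * bnorm:
                return x
            V = np.zeros((n, m + 1))
            H = np.zeros((m + 1, m))
            V[:, 0] = r / beta
            k = m
            for s in range(m):                     # Arnoldi (modified Gram-Schmidt)
                w = A @ V[:, s]
                for i in range(s + 1):
                    H[i, s] = w @ V[:, i]
                    w -= H[i, s] * V[:, i]
                H[s + 1, s] = np.linalg.norm(w)
                if H[s + 1, s] == 0.0:             # lucky breakdown: exact solution available
                    k = s + 1
                    break
                V[:, s + 1] = w / H[s + 1, s]
            e1 = np.zeros(k + 1); e1[0] = beta
            y, *_ = np.linalg.lstsq(H[: k + 1, : k], e1, rcond=None)
            x = x + V[:, : k] @ y                  # x^(m) = x^(0) + V_m y^(m)
        return x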
We summarize the main convergence properties of GMRES in the following result. See [66] and [19] for proofs.

Theorem 3.1. Let x^(k) denote the iterate generated after k steps of GMRES, with residual r^(k) = b − Ax^(k).
(i) The residual is zero (i.e. the exact solution has been obtained) if and only if the Arnoldi iteration breaks down with h_{k,k−1} = 0. In particular, the exact solution is obtained in at most n steps.
(ii) The residual norms satisfy ‖r^(k)‖₂ = min_{φ_k∈P_k} ‖φ_k(A)r^(0)‖₂.
(iii) If A is diagonalizable, A = XΛX^{-1} where Λ is the diagonal matrix of eigenvalues of A, then

    ‖r^(k)‖₂ ≤ ‖X‖₂ ‖X^{-1}‖₂ min_{φ_k∈P_k} max_j |φ_k(λ_j)| ‖r^(0)‖₂.

(iv) Let M = (1/2)(A + Aᵀ) and R = (1/2)(A − Aᵀ) denote the symmetric part and skew-symmetric part of A, respectively. If M is positive-definite, then

    ‖r^(k)‖₂ ≤ ( 1 − λ_min(M)² / (λ_min(M)λ_max(M) + ρ(R)²) )^{k/2} ‖r^(0)‖₂.      (36)

Assertions (ii) and (iii) follow from the optimality of GMRES with respect to the residual norm. Assertion (i) guarantees that GMRES and GMRES(m) will solve any nonsingular problem provided that the dimensions of the Krylov spaces are large enough. This differentiates GMRES from many other Krylov subspace methods; for example, Orthomin and the generalized conjugate residual method (GCR) [19] may break down without producing the exact solution if the symmetric part of A is indefinite. See [66] for further details. Assertion (iv) is a consequence of the fact that when M is positive-definite,

    ‖r^(j+1)‖₂ ≤ ( 1 − λ_min(M)² / (λ_min(M)λ_max(M) + ρ(R)²) )^{1/2} ‖r^(j)‖₂;

see [19]. We also note that bound (36) holds for restarted GMRES, i.e. when r^(k) is the residual obtained after s sets of GMRES(m) computations, r^(0) is the initial residual for the first GMRES(m) computation, and k = sm.

3.2. Biorthogonalization Methods.

The Lanczos process was originally defined as a method for reducing an arbitrary nonsymmetric matrix A to tridiagonal form, using two sets of biorthogonal vectors. (See [33, 56, 77] for details and additional references.) The symmetric version (24) is a special case. Let v^(0) and w^(0) be vectors such that (v^(0), w^(0)) = 1, and v^(−1) = w^(−1) = 0. Then the nonsymmetric Lanczos process is given by

    ṽ^(j+1) = Av^(j) − α_j v^(j) − γ_j v^(j−1),
    w̃^(j+1) = Aᵀw^(j) − α_j w^(j) − β_j w^(j−1),
    v^(j+1) = ṽ^(j+1)/β_{j+1},   w^(j+1) = w̃^(j+1)/γ_{j+1},

where α_j = (w^(j), Av^(j)) and β_{j+1} and γ_{j+1} are chosen so that (w^(j+1), v^(j+1)) = 1. Letting V_k ≡ [v^(0), ..., v^(k−1)], W_k ≡ [w^(0), ..., w^(k−1)] and G_k ≡ tri[β_j, α_j, γ_{j+1}], 0 ≤ j ≤ k − 1, we have the following relations:

    span{v^(0), ..., v^(k−1)} = span{v^(0), Av^(0), ..., A^{k−1}v^(0)},
    span{w^(0), ..., w^(k−1)} = span{w^(0), Aᵀw^(0), ..., (Aᵀ)^{k−1}w^(0)},
    AV_k = V_k G_k + β_k [0, ..., 0, v^(k)],
    AᵀW_k = W_k G_kᵀ + γ_k [0, ..., 0, w^(k)],
    W_kᵀV_k = I_k,   W_kᵀAV_k = G_k,   V_kᵀAᵀW_k = G_kᵀ.                   (37)

The Lanczos algorithm for solving nonsymmetric linear systems uses v^(0) = r^(0)/‖r^(0)‖₂ and computes x^(k) ∈ x^(0) + K_k(v^(0), A) such that r^(k) is orthogonal to span{w^(j)}_{j=0}^{k−1}. Using the relations (37), we see that

    x^(k) = x^(0) + V_k y^(k),                                             (38)

where y^(k) is the solution of the tridiagonal linear system G_k y^(k) = ‖r^(0)‖₂ e₁ of order k. Thus, this method imposes an orthogonality condition on a pair of spaces of increasing dimension, using two short-term recurrences. It is also possible to derive a variant of this scheme, called the biconjugate gradient method, that bears essentially the same relation to it as CG does to SYMMLQ; see e.g. [27, 29, 43, 63] for details.
Unfortunately, there are several difficulties with this class of methods. First, there is no guarantee that they will not break down. See [38, 43, 55] for analysis of this issue. In particular, it may happen that

    ṽ^(j+1) ≠ 0,   w̃^(j+1) ≠ 0,   but   (ṽ^(j+1), w̃^(j+1)) = 0,           (39)

so that β_{j+1} and γ_{j+1} cannot be defined. In practice, such an exact breakdown is unlikely to occur, but a near breakdown, where (ṽ^(j+1), w̃^(j+1)) ≈ 0, also leads to numerical instability. Moreover, empirical studies have shown that even when breakdown or near breakdown does not occur, convergence behavior is very erratic, i.e. residual norms may vary dramatically before a stopping criterion is met. As a result, although biorthogonality methods have been developed essentially in parallel with CG, they have been far less popular than minimal residual methods that use only a single Krylov space. Recently, however, there has been a resurgence of interest in these methods, due to several attempts to stabilize them and make them more robust. We describe one such algorithm, the quasi-minimal residual algorithm (QMR) developed by Freund and Nachtigal [31]. Other related techniques include Sonneveld's conjugate gradient squared method [67], Van der Vorst's Bi-CGSTAB method [73], and "transpose-free" quasi-minimal residual methods developed by Chan, de Pillis and Van der Vorst [7] and Freund [28].

The QMR approach addresses the two deficiencies of the Lanczos algorithm, its erratic convergence behavior and the instability associated with orthogonality or near orthogonality of a pair of Lanczos vectors ṽ^(j+1) and w̃^(j+1). Our description follows [29]. Let us consider the issue of erratic convergence, assuming for the moment that no breakdown has occurred in the first k steps. Conditions (37)-(38) imply that the residual generated by the Lanczos iteration satisfies

    r^(k) = r^(0) − AV_k y^(k) = V_{k+1}(‖r^(0)‖₂ e₁ − Ĝ_k y^(k)),         (40)

where Ĝ_k is the matrix of dimensions (k + 1) × k containing G_k in its first k rows and [0, ..., 0, β_k] in its last row. Note the similarity to the residual produced by GMRES (34), except that here V_{k+1} is not an orthogonal matrix, so that ‖r^(k)‖₂ cannot be minimized inexpensively. The idea of quasi-minimality is to choose y^(k) so that ‖ ‖r^(0)‖₂ e₁ − Ĝ_k y^(k) ‖₂ is minimized instead. As in the case of GMRES, Ĝ_k is an upper-Hessenberg matrix (in fact, it is tridiagonal), and G_{k−1} is the leading principal submatrix of G_k, so that this minimization problem can be solved inexpensively from one step to the next.
The iterate x^(k) could be computed using (38). However, because the columns of V_{k+1} are not orthogonal, ‖r^(k)‖₂ cannot be monitored inexpensively. As a result, this update of x^(k) would be costly, of order O(kn) at step k. Fortunately, x^(k) can be updated from x^(k−1) using short recurrences.⁴ Let Q_0, ..., Q_k denote the series of plane rotations that transform the subdiagonal of Ĝ_k to zero, i.e.,

    Q_k ··· Q_1 Q_0 Ĝ_k = ( R_k ),     Q_k ··· Q_1 Q_0 (‖r^(0)‖₂ e₁) = z^(k),
                          (  0  )

where R_k is an upper triangular matrix of order k containing three nonzero bands. The QMR coefficients are given by y^(k) = R_k^{-1} ẑ^(k), where ẑ^(k) is the vector of length k containing the first k entries of z^(k). Let P_k = V_k R_k^{-1} = [p^(0), p^(1), ..., p^(k−1)]. Note that the first k − 1 entries of z^(k) and z^(k−1) are identical, and R_{k−1} is the leading principal submatrix of R_k. Consequently,

    x^(k) = x^(0) + P_k ẑ^(k) = x^(k−1) + ẑ_{k−1}^(k) p^(k−1).             (41)

But P_k R_k = V_k, so that p^(k−1) can be constructed using a short recurrence involving only v^(k−1), p^(k−2) and p^(k−3). Therefore, a practical implementation of QMR updates x^(k) using (41) and the efficient computation of p^(k−1).

⁴ The construction outlined here is essentially the one used by SYMMLQ [54].
The modification to handle breakdown changes things slightly. It is based on the fact that in cases where (near) breakdown (v^(j), w^(j)) ≈ 0 occurs, it still may be possible to construct alternative sets of vectors {v^(j), ..., v^(j+l−1)} and {w^(j), ..., w^(j+l−1)} such that

    [w^(j), ..., w^(j+l−1)]ᵀ[v^(j), ..., v^(j+l−1)] is nonsingular,
    span{v^(0), ..., v^(j+l−1)} = span{v^(0), Av^(0), ..., A^{j+l−1}v^(0)},
    span{w^(0), ..., w^(j+l−1)} = span{w^(0), Aᵀw^(0), ..., (Aᵀ)^{j+l−1}w^(0)}.

That is, for some number l of steps, the biorthogonality condition cannot be imposed, but bases for two Krylov subspaces can still be constructed. After this, new vectors v^(j+l) and w^(j+l) can be found that augment the Krylov spaces and satisfy

    (v^(j+l), w^(j+l)) ≠ 0,
    v^(j+l) is orthogonal to K_{j+l}(w^(0), Aᵀ), and
    w^(j+l) is orthogonal to K_{j+l}(v^(0), A).

The result is two sets of vectors {v^(j)}_{j=0}^{k−1} and {w^(j)}_{j=0}^{k−1}, grouped into blocks,

    [V_0, V_1, ..., V_{K−1}] = [v^(n_0), ..., v^(n_1−1), v^(n_1), ..., v^(n_2−1), ..., v^(n_{K−1}), ..., v^(k−1)],
    [W_0, W_1, ..., W_{K−1}] = [w^(n_0), ..., w^(n_1−1), w^(n_1), ..., w^(n_2−1), ..., w^(n_{K−1}), ..., w^(k−1)]

(where n_0 = 0), which satisfy

    V_iᵀW_j = D_i  if i = j,   V_iᵀW_j = 0  if i ≠ j,

with D_i nonsingular.
The form of the look-ahead Lanczos computation within block r + 1 is

    ṽ^(j+1) = Av^(j) − V_r D_r^{-1} W_rᵀ Av^(j) − V_{r−1} D_{r−1}^{-1} W_{r−1}ᵀ Av^(j),
    w̃^(j+1) = Aᵀw^(j) − W_r D_r^{-ᵀ} V_rᵀ Aᵀw^(j) − W_{r−1} D_{r−1}^{-ᵀ} V_{r−1}ᵀ Aᵀw^(j)        (42)

if it is determined that |(ṽ^(j+1), w̃^(j+1))| ≈ 0, or

    ṽ^(j+1) = Av^(j) − α_j v^(j) − (β_j/ξ_j) v^(j−1) − V_{r−1} D_{r−1}^{-1} W_{r−1}ᵀ Av^(j),
    w̃^(j+1) = Aᵀw^(j) − α_j w^(j) − (γ_j/ρ_j) w^(j−1) − W_{r−1} D_{r−1}^{-ᵀ} V_{r−1}ᵀ Aᵀw^(j)     (43)

otherwise. The vectors are normalized by v^(j+1) = ṽ^(j+1)/ρ_{j+1}, w^(j+1) = w̃^(j+1)/ξ_{j+1}, where ρ_{j+1} = ‖ṽ^(j+1)‖₂, ξ_{j+1} = ‖w̃^(j+1)‖₂. We refer the reader to [30] for additional details of this construction, e.g. methods to determine whether (42) or (43) should be used, and definitions of the scalars appearing in (43). By construction, the vectors satisfy all the conditions of (37) except the last, where G_k is now an upper-Hessenberg matrix of block tridiagonal form. Consequently, (40) is still satisfied (with Ĝ_k defined in an identical manner as above), and quasi-minimality can be imposed in exactly the same way. The length of the recurrence used to update p^(k−1) depends on the number of look-ahead steps needed.
This process prevents the occurrence of breakdown (39) and near breakdown. Except in special cases, the computations (42) and (43) will terminate at some step m with either ṽ^(m+1) = 0 or w̃^(m+1) = 0, in which case the process has produced an invariant subspace of either A or Aᵀ. If it terminates with ṽ^(m+1) = 0, then x^(m) is the exact solution; if it terminates with w̃^(m+1) = 0, then the algorithm must be restarted. The special case where termination never occurs, known as "incurable breakdown," is unlikely to occur in floating point arithmetic and we will not consider this issue; see [57, 70].
Some of the important convergence properties of QMR are summarized as follows; see [30, 31] for proofs.

Theorem 3.2. Let x^(k) denote the iterate generated after k steps of QMR, with residual r^(k) = b − Ax^(k).
(i) Let G_m denote the upper-Hessenberg matrix generated by the look-ahead Lanczos process, and assume G_m is diagonalizable, G_m = X_m Λ_m X_m^{-1}, where X_m is the matrix whose columns are the eigenvectors of G_m. For k < m,

    ‖r^(k)‖₂ ≤ ‖V_{k+1}‖₂ ‖X_m‖₂ ‖X_m^{-1}‖₂ min_{φ_k∈P_k} max_{λ_j∈σ(A)} |φ_k(λ_j)| ‖r^(0)‖₂.

(ii) Let r_GMRES^(k) denote the residual produced by k steps of GMRES. Then

    ‖r^(k)‖₂ ≤ ‖V_{k+1}‖₂ ‖r_GMRES^(k)‖₂ ≤ √(k + 1) ‖r_GMRES^(k)‖₂.

We briefly contrast some of the properties of the two classes of methods considered in this section. As we have noted, one of the main differences between GMRES and the methods based on biorthogonality is that GMRES imposes a true minimization, but at the cost of storing and using a set of vectors of increasing size. One other significant difference is in the use of the coefficient matrix. GMRES requires one matrix-vector product per step. The methods considered here, the nonsymmetric Lanczos method and QMR, require two matrix-vector products, one by A and one by Aᵀ. This will play a significant role in evaluating the relative costs of the two classes of methods. Moreover, it may happen that performing a product with Aᵀ is more expensive than performing one with A. Several other versions of biorthogonalization methods avoid reference to Aᵀ. The conjugate gradient squared [67] and Bi-CGSTAB [73] methods require two matrix-vector products by A at each step but have no minimization properties. The "transpose-free" quasi-minimal residual methods [7, 28] have such a property, but at a cost of three matrix-vector products by A at each step. We emphasize that this is an area of active research, and no single method has been demonstrated to be clearly superior.

4. Preconditioning and preprocessing.

For any nonsingular matrix M of order n, the system (1) is equivalent to

    M^{-1}Ax = M^{-1}b.                                                    (44)

The idea of preconditioning is to apply a Krylov subspace method to (44), the preconditioned system, with the aim of computing the solution more efficiently than if the method is applied to (1). In this section, we describe some effective preconditioning techniques for sparse linear systems, and we describe the construction of the reduced system for matrices with Property A.

In order for a preconditioning operator M to be effective, it must satisfy two criteria:
1. The number of iterations required to solve (44) should be smaller than the number of iterations needed to solve (1).
2. The cost per iteration of the Krylov subspace method applied to (44) should not be significantly higher than the cost per iteration for (1).

For the first requirement to hold, M^{-1} should in some sense be a good approximation to A^{-1}. For the second requirement, the cost per step should be low enough so that the total cost (essentially, the product of the cost per step and the number of steps) of the preconditioned iteration is lower than if the preconditioning operator is not used. In an implementation, the coefficient matrix is referenced by way of one or more matrix-vector products, so that what is required is that the action of the inverse of the preconditioner applied to a vector, w ← M^{-1}v, be inexpensive to compute.
Recall that for symmetric positive-definite problems, the number of iterations required for convergence of the conjugate gradient method is approximately proportional to the square root of the condition number of the coefficient matrix. Thus, one way of making more precise the notion that M^{-1} be a "good approximation" to A^{-1} is to require κ(M^{-1}A) ≪ κ(A).⁵ More generally, the iterates x^(k) produced by a Krylov subspace method applied to (1) have errors that satisfy

    e^(k) = φ_k(A)e^(0),

where φ_k ∈ P_k. Therefore

    ‖e^(k)‖ = ‖φ_k(A)e^(0)‖ ≤ ‖φ_k(A)‖ ‖e^(0)‖.

The method converges rapidly if ‖φ_k(A)‖ is small for small k. If, in particular, A is diagonalizable, A = XΛX^{-1} where Λ is a diagonal matrix containing the eigenvalues of A on its main diagonal, then

    ‖φ_k(A)‖ ≤ ‖X‖ ‖X^{-1}‖ ‖φ_k(Λ)‖.

If, in addition, ‖X‖ ‖X^{-1}‖ is not too large (for example, ‖X‖₂ ‖X^{-1}‖₂ = 1 when A is a normal matrix) then ‖φ_k(A)‖ is small provided φ_k(λ) is small for λ ∈ σ(A). Thus, a good approximation M^{-1} of A^{-1} is one for which M^{-1}A is close to normal and the eigenvalues of M^{-1}A are close to one. In such a case, it will be easier to find a polynomial φ_k that is small on σ(M^{-1}A) than to find one that is small on σ(A). In cases where M^{-1}A is far from normal, new analytic techniques based on pseudospectra [71, 72] provide insight into properties of preconditioners.

⁵ Strictly speaking, for preconditioned CG, we really want the preconditioned problem also to be symmetric positive-definite. The preconditioned system can be formally represented as C^{-T}AC^{-1}x̂ = C^{-T}b, where M = CCᵀ is symmetric and x = C^{-1}x̂. It is not necessary for M to be represented in factored form in an implementation; see §4.2.

Preconditioning can also be related to splitting operators and stationary methods. Recall that if A = M − N, then the stationary method defined by (3) or (6) is rapidly convergent if ρ(M^{-1}N) = ρ(I − M^{-1}A) is small, i.e., if M^{-1}A is in some sense close to the identity. Thus, a good preconditioning operator M is essentially the same thing as a good splitting operator used to generate stationary iterations. Moreover, consider the modification of (6) defined by introducing an iteration parameter,

    x^(k+1) = x^(k) + α M^{-1}r^(k).                                       (45)

The new error satisfies e^(k+1) = (I − αM^{-1}A)e^(k), and more generally, e^(k) = (I − αM^{-1}A)^k e^(0). If, for an appropriate choice of α, ρ(I − αM^{-1}A) is smaller than ρ(M^{-1}N), then convergence of the stationary method will be accelerated. We could further refine the iteration by varying the value of α from step to step, i.e., replacing α by α_k in (45). The error at the k'th step then has the form

    e^(k) = (I − α_{k−1}M^{-1}A) ··· (I − α_0 M^{-1}A)e^(0).

This expression is simply a specific way of specifying a polynomial φ_k ∈ P_k, in factored form. This requires a strategy for choosing the parameters {α_j}. Krylov subspace methods construct such "acceleration polynomials" automatically.
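To illustrate how a preconditioner enters a Krylov iteration in practice, the following sketch (ours, not derived in the text above) shows the standard preconditioned conjugate gradient iteration, in which the only use of M is a solve z ← M^{-1}r at each step; M never needs to be factored as CCᵀ explicitly, as noted in the footnote above. The argument M_solve is any routine applying M^{-1}, for example one of the preconditioners of §4.1.

    import numpy as np

    def pcg(A, b, M_solve, tol=1e-8, maxit=1000):
        """Preconditioned CG for symmetric positive-definite A and preconditioner M."""
        x = np.zeros(len(b))
        r = b - A @ x
        z = M_solve(r)                      # z^(0) = M^{-1} r^(0)
        p = z.copy()
        rz = r @ z
        bnorm = np.linalg.norm(b)
        for k in range(maxit):
            Ap = A @ p
            alpha = rz / (p @ Ap)
            x += alpha * p
            r -= alpha * Ap
            if np.linalg.norm(r) <= tol * bnorm:
                break
            z = M_solve(r)
            rz_new = r @ z
            beta = rz_new / rz
            p = z + beta * p
            rz = rz_new
        return x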

4.1. Examples of preconditioners.

In this section we give some examples of preconditioners that can be applied to arbitrary sparse matrices. These are all of algebraic type, i.e., defined by means of some algebraic computation on the elements of the original matrix A.

The (parameterized) incomplete factorization [4, 6, 18, 37, 49]. Incomplete factorization techniques are motivated by the idea that for very large sparse matrices, where direct solution techniques based on LU decomposition of A are not viable, it may be possible to compute sparse approximations L and U to the factors of A so that the product represents a reasonable approximation of A. Let N ⊆ {(i, j) | 1 ≤ i, j ≤ n} be an index set containing all diagonal indices (i, i). The parameterized incomplete factorization based on N has the form M = LU, where L is a lower triangular matrix, U is a unit upper triangular matrix, and l_ij = 0 and u_ij = 0 for (i, j) ∉ N. If N contains few indices, then L and U are sparse and applying (LU)^{-1} will be inexpensive. The factorization is defined as follows:


Parameterized Incomplete Factorization.
    for i = 1 until n do
        for j = 1 until n do
            s_ij ← a_ij − Σ_{k=1}^{min(i,j)−1} l_ik u_kj
            if (i, j) ∈ N then
                if (i ≥ j) then l_ij ← s_ij
                if (i < j) then u_ij ← s_ij
            else
                l_ii ← l_ii + ω s_ij
            endif
        enddo
        u_ii ← 1
        for j = i + 1 until n do
            u_ij ← u_ij/l_ii
        enddo
    enddo

The preconditioning matrix $M$ satisfies $A = M - R$, where $r_{ij} = 0$ for off-diagonal indices
$(i,j) \in N$, and $r_{ii} = -\alpha \sum_{j \ne i} r_{ij}$. The special case $\alpha = 0$ corresponds to the incomplete LU
(ILU) factorization of $A$, of Meijerink and van der Vorst [49], for which $[LU]_{ij} = a_{ij}$ for all
$(i,j) \in N$. The choice $\alpha = 1$ corresponds to the modified incomplete LU factorization (MILU)
of Gustafsson [37], which generalizes the ideas in [6, 18]. Possibilities for sets of nonzero indices
include the index set for which $A$ is nonzero, and, more generally, the "k-level" index set, where
indices $\{(i,j) \mid 1 \le i,j \le n\}$ in which fill-in is permitted are assigned levels as follows [76].
• All indices corresponding to original nonzero entries of $A$ are assigned level 0, and all other
indices are initialized to level $\infty$.
• During the course of the factorization, if $s_{ij} = a_{ij} - \sum_{k=1}^{\min(i,j)-1} l_{ik} u_{kj}$ is a potential entry
in one of the factors $L$, $U$, then
$$\mathrm{level}(i,j) = \min_{1 \le k \le \min(i,j)-1} \big[\mathrm{level}(i,k) + \mathrm{level}(k,j)\big] + 1.$$
That is, the level of the index $(i,j)$ is one plus the smallest sum of levels contributing to
the entry with that index. The set $N_m$ is then taken to be those indices $(i,j)$ for which the
level does not exceed some predefined level $m$.
In the cases $\alpha = 0$ or $\alpha = 1$, we refer to this as the ILU($m$) and MILU($m$) factorization, respectively.
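The following Python sketch implements the $\alpha = 0$, level-0 case (ILU(0)) for a dense test matrix, restricting fill to the nonzero pattern of $A$. It is an illustration rather than production code: it uses the common convention of a unit lower triangular L and a U that carries the diagonal (the reverse of the pseudocode above), and the routine name is an assumption.

import numpy as np

def ilu0(A):
    """ILU(0): incomplete LU with fill restricted to the nonzero pattern of A."""
    n = A.shape[0]
    keep = (A != 0)                       # index set N = nonzero pattern of A
    F = A.astype(float).copy()
    for i in range(1, n):
        for k in range(i):
            if keep[i, k] and F[k, k] != 0.0:
                F[i, k] /= F[k, k]        # multiplier l_ik
                for j in range(k + 1, n):
                    if keep[i, j]:        # discard fill outside the pattern
                        F[i, j] -= F[i, k] * F[k, j]
    L = np.tril(F, -1) + np.eye(n)
    U = np.triu(F)
    return L, U

# The factors reproduce A exactly on its nonzero pattern, i.e. [LU]_ij = a_ij on N.
n = 6
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
L, U = ilu0(A)
print(np.max(np.abs((L @ U - A)[A != 0])))   # ~ 0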
Incomplete factorization with drop tolerance [50, 81]. A variant of these techniques is to base the
fill-in on the sizes of the elements participating in the factorization. Suppose $A^{(k)}$ is the result
of performing $k$ steps of incomplete Gaussian elimination, with $A^{(0)} = A$. Let $\hat A^{(k)}$ denote the
matrix produced by replacing with zero all entries of $A^{(k)}$ that have absolute value less than
some specified tolerance $\tau$. $A^{(k+1)}$ is then produced by formally applying a step of Gaussian
elimination to $\hat A^{(k)}$. An implementation is as follows:


Parameterized Incomplete Factorization with Drop Tolerance.
for i = 1 until n do
    for j = 1 until n do
        s_ij ← a_ij − Σ_{k=1}^{min(i,j)−1} l_ik u_kj
        if |s_ij| ≥ τ then
            if (i ≥ j) then l_ij ← s_ij
            if (i < j) then u_ij ← s_ij
        else
            l_ii ← l_ii + α s_ij
        endif
    enddo
    u_ii ← 1
    for j = i + 1 until n do
        u_ij ← u_ij / l_ii
    enddo
enddo
Values of $\tau$ in the interval $[10^{-4}, 10^{-2}]$ have been used successfully [81].
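In practice a threshold-based incomplete factorization is often obtained from library software rather than coded directly. The sketch below assumes the availability of SciPy's spilu routine (a SuperLU-based incomplete LU with a drop tolerance), which is an assumption about the computing environment, not part of the algorithm above; it shows one way to use such a factorization as a preconditioner.

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 100
A = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csc")

# Incomplete LU with entries below the tolerance dropped.
ilu = spla.spilu(A, drop_tol=1e-3)
M = spla.LinearOperator(A.shape, matvec=ilu.solve)   # action of M^{-1}

# Use it as a preconditioner for a Krylov method (here CG, since A is SPD).
x, info = spla.cg(A, np.ones(n), M=M)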


SSOR preconditioning and SOR preconditioning. These strategies use the SOR splitting matrix
$$M = D - \omega L,$$
and the SSOR splitting matrix
$$M = (D - \omega L)\, D^{-1} (D - \omega U),$$
respectively. Note that preconditioning is not affected by multiplication by a scalar, whence the
difference between these operators and those of (7) and (10). When $A$ is symmetric and $D$ is
positive definite, the SSOR preconditioner can be applied so that the preconditioned problem is
symmetric. In contrast, the SOR operator has not been a popular choice of a preconditioner, in
part because the SOR-preconditioned operator cannot be made symmetric. However, we have
found it to be effective for some nonsymmetric problems [21, 22, 23]; see §5.
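A minimal dense Python sketch of applying the SSOR preconditioner by two triangular solves is given below; the splitting $A = D - L - U$ and the parameter $\omega$ follow the notation above, and the routine name is an assumption.

import numpy as np
from scipy.linalg import solve_triangular

def ssor_apply(A, r, omega=1.0):
    """Apply M^{-1} r for M = (D - omega*L) D^{-1} (D - omega*U),
    with A = D - L - U (dense illustration only)."""
    d = np.diag(A)
    lower = np.tril(A, -1)                      # equals -L
    upper = np.triu(A, 1)                       # equals -U
    y = solve_triangular(np.diag(d) + omega * lower, r, lower=True)
    return solve_triangular(np.diag(d) + omega * upper, d * y, lower=False)

# e.g. pass lambda r: ssor_apply(A, r, 1.0) as the preconditioner solve
# in a Krylov method.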
All of these methods share the advantage of being very general, in the sense that they can be
defined for arbitrary matrices. For the incomplete factorizations, it is necessary that the pivot
elements $l_{ii}$ be nonzero. This can be established for $\alpha = 0$ when $A$ is an M-matrix ($a_{ij} \le 0$
for $i \ne j$ and $A^{-1} \ge 0$) [49], but in general, the size of the pivot element must be monitored
during the course of the factorization. We know of no study demonstrating a clear advantage of
either of the strategies for incomplete factorization; an advantage of the strategy based on level
of fill is that the storage requirements of the factors can be determined symbolically, whereas
the method based on drop tolerance requires the numerical computation. The advantage of the
SSOR and SOR preconditionings is that they are well-defined as long as the (block) diagonal of
$A$ is nonsingular. The effectiveness of all of these preconditioners for certain model problems
arising from partial differential equations is well understood; see §5. In this regime, they are often
not the optimal choices; methods more closely based on properties of the differential operators
may produce better asymptotic results, e.g. [11].

4.2. Some implementation issues.

As noted above, the preconditioned conjugate gradient method (PCG) can be implemented
without explicit reference to factors of M . An implementation is given below; see [13, 33].
It is straightforward to show that if $M = CC^T$, then this scheme is equivalent to applying CG to
$C^{-1}AC^{-T}\hat x = C^{-1}b$, where $x^{(k)} = C^{-T}\hat x^{(k)}$.
The Preconditioned Conjugate Gradient Method.
Choose x^{(0)}; compute r^{(0)} = b − Ax^{(0)}; solve M r̃^{(0)} = r^{(0)}; set p^{(0)} = r̃^{(0)}
for k = 0 until convergence do
    α_k = (r^{(k)}, r̃^{(k)}) / (p^{(k)}, Ap^{(k)})
    x^{(k+1)} = x^{(k)} + α_k p^{(k)}
    r^{(k+1)} = r^{(k)} − α_k Ap^{(k)}
    <Test for convergence>
    Solve M r̃^{(k+1)} = r^{(k+1)}
    β_{k+1} = (r^{(k+1)}, r̃^{(k+1)}) / (r^{(k)}, r̃^{(k)})
    p^{(k+1)} = r̃^{(k+1)} + β_{k+1} p^{(k)}
enddo

The cost associated with preconditioning is then precisely the cost of applying the action of
$M^{-1}$.⁶ A similar algorithm can be developed for the symmetric indefinite problem, e.g., using the
orthogonal direction minimum residual method.
⁶ If $\|r^{(k)}\|_2$ is used in the stopping test, then one extra inner product is needed, since in contrast to CG, this quantity is no longer available as a byproduct of the computation.
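A direct Python transcription of the PCG loop above is sketched below; M_solve, which applies the action of $M^{-1}$, is assumed to be supplied by one of the preconditioners of §4.1, and the relative-residual stopping test is an illustrative choice.

import numpy as np

def pcg(A, b, M_solve, tol=1e-8, maxit=1000):
    """Preconditioned conjugate gradients, following the loop above."""
    x = np.zeros_like(b)
    r = b - A @ x
    rt = M_solve(r)                    # solve M r~ = r
    p = rt.copy()
    for k in range(maxit):
        Ap = A @ p
        alpha = (r @ rt) / (p @ Ap)
        x = x + alpha * p
        r_new = r - alpha * Ap
        if np.linalg.norm(r_new) <= tol * np.linalg.norm(b):
            return x, k + 1
        rt_new = M_solve(r_new)
        beta = (r_new @ rt_new) / (r @ rt)
        p = rt_new + beta * p
        r, rt = r_new, rt_new
    return x, maxit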
For nonsymmetric problems, there is some flexibility in how the preconditioned problem may be
formulated, with three different "orientations" possible:
$$\begin{aligned}
\text{Left orientation:} \quad & [M^{-1}A]\,[x] = [M^{-1}f], \\
\text{Two-sided orientation:} \quad & [M_1^{-1}AM_2^{-1}]\,[M_2 x] = [M_1^{-1}f], \\
\text{Right orientation:} \quad & [AM^{-1}]\,[Mx] = [f].
\end{aligned} \qquad (46)$$
Here, the two-sided orientation requires an explicit representation of $M$ as a product $M = M_1 M_2$.
If such a factorization is not available, then only the left and right orientations are possible.
In general, this issue of orientation does not play a significant role in the cost per step of any
algorithm. For example, left orientation requires a preconditioned matrix-vector product of the
form
$$w_1 \leftarrow Av, \qquad w \leftarrow M^{-1}w_1,$$
whereas the right orientation requires
$$w_1 \leftarrow M^{-1}v, \qquad w \leftarrow Aw_1,$$
so that the costs of the two computations are the same. However, for Krylov subspace methods
such as GMRES that minimize the Euclidean norm of the residual, the orientation has an effect.
In all cases, the preconditioned problem can be written as
$$\hat A \hat x = \hat f,$$

where $\hat A$, $\hat x$, and $\hat f$ are the quantities in brackets in (46). Thus, if $\hat x^{(k)}$ is the $k$th iterate, then
$\|\hat r^{(k)}\|_2$ is minimized over the space $K_k(\hat r^{(0)}, \hat A)$. Each of the orientations has some potential
advantages.
1. The quantity minimized for the right-oriented system is the norm of the residual of the
original system (1), i.e., $\|\hat r^{(k)}\|_2 = \|b - Ax^{(k)}\|_2$, where $x^{(k)} = M^{-1}\hat x^{(k)}$.
2. If $M$ is a good preconditioner, in the sense that $M^{-1}A \approx I$, then for the left-oriented preconditioner, $\hat r^{(k)} = M^{-1}Ae^{(k)} \approx e^{(k)}$, so that $\|\hat r^{(k)}\|_2$ may represent a good approximation
to $\|e^{(k)}\|_2$.
3. For some nonsymmetric problems, it is possible to construct symmetric preconditioners for
which there is a more robust analysis for the two-sided orientation than exists for the other
two orientations. (See [24].)
When convergence analysis of the type alluded to in item (3) is not possible, we have preferred
the right orientation, and its property of minimizing the residual norm associated with (1).
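The sketch below shows one way to set up the left- and right-oriented operators of (46) for use with a Krylov solver; the wrapper functions are illustrative assumptions, and M_solve again denotes the action of $M^{-1}$.

import scipy.sparse.linalg as spla

def left_op(A, M_solve):
    """Operator M^{-1} A; use with the right-hand side M_solve(b)."""
    return spla.LinearOperator(A.shape, matvec=lambda v: M_solve(A @ v))

def right_op(A, M_solve):
    """Operator A M^{-1}; solve for y = M x, then recover x = M_solve(y)."""
    return spla.LinearOperator(A.shape, matvec=lambda v: A @ M_solve(v))

# Right-oriented GMRES, so that the minimized residual is that of (1):
#   y, info = spla.gmres(right_op(A, M_solve), b)
#   x = M_solve(y)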

4.3. Preprocessing.

Finally, we mention a technique for preprocessing which, although not strictly speaking a preconditioner like those of the rest of this section, may significantly improve the performance of
iterative methods. Suppose the system (1) has point Property A, i.e., it is of the form
$$\begin{pmatrix} D_1 & C_1 \\ C_2 & D_2 \end{pmatrix}\begin{pmatrix} u_1 \\ u_2 \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}, \qquad (47)$$
where $D_1$ and $D_2$ are diagonal matrices. Premultiplication by
$$\begin{pmatrix} I & 0 \\ -C_2 D_1^{-1} & I \end{pmatrix}$$
produces the equivalent problem
$$\begin{pmatrix} D_1 & C_1 \\ 0 & D_2 - C_2 D_1^{-1} C_1 \end{pmatrix}\begin{pmatrix} u_1 \\ u_2 \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 - C_2 D_1^{-1} b_1 \end{pmatrix}.$$
This is the same thing as performing one step of block Gaussian elimination, and $u_2$ is then the
solution of the reduced system
$$(D_2 - C_2 D_1^{-1} C_1)\, u_2 = b_2 - C_2 D_1^{-1} b_1.$$
The decoupled unknowns $u_1$ can be recovered by solving $D_1 u_1 = b_1 - C_1 u_2$. If $C_1$ and $C_2$ are
sparse, then the preprocessing step has modest cost, and the reduced matrix $D_2 - C_2 D_1^{-1} C_1$ will
also be sparse, so that the reduced system can also be solved using a preconditioned Krylov subspace
method. We will consider examples of this process in §5.
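A sketch of this reduction step is given below, assuming the blocks of (47) are available as SciPy sparse matrices with $D_1$ and $D_2$ diagonal; the function names are illustrative.

import scipy.sparse as sp

def reduce_system(D1, C1, C2, D2, b1, b2):
    """One step of block elimination for (47); D1 is assumed diagonal."""
    D1inv = sp.diags(1.0 / D1.diagonal())
    S = (D2 - C2 @ D1inv @ C1).tocsr()        # reduced matrix D2 - C2 D1^{-1} C1
    g = b2 - C2 @ (D1inv @ b1)                # reduced right-hand side
    return S, g

def recover_u1(D1, C1, b1, u2):
    """Back-substitute for the decoupled unknowns: D1 u1 = b1 - C1 u2."""
    return (b1 - C1 @ u2) / D1.diagonal()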

5. Model Problem Analysis.



One of the main problem classes for which iterative methods have been developed is that of
discrete second order elliptic boundary value problems. A model problem is
$$-\Delta u + \sigma u_x + \tau u_y = f \ \text{ on } \Omega, \qquad u = g \ \text{ on } \partial\Omega, \qquad (48)$$
where $\Omega$ is a domain in $\mathbb{R}^2$. If at least one of $\sigma$ or $\tau$ is nonzero, then the differential operator
of (48) is non-self-adjoint, and discretization leads to a nonsymmetric linear system (1) (where $x$
now represents a discrete vector). If both $\sigma = 0$ and $\tau = 0$, then we have the Poisson equation
$$-\Delta u = f, \qquad (49)$$
which gives rise to a symmetric positive-definite linear system. In this section, we outline some
convergence results for the methods discussed above as they are applied to discretizations of these
model problems.
Consider (49) first. Suppose this problem is discretized by finite differences or piecewise linear
finite elements on a uniform $m \times m$ grid in $\Omega$, and the grid points are ordered using a natural
left-to-right, bottom-to-top ordering. The result is a linear system with a block tridiagonal coefficient
matrix
$$A = \mathrm{tri}\,[\,-I_m,\; T,\; -I_m\,] \qquad (50)$$
of block order $m$, where
$$T = \mathrm{tri}\,[\,-1,\; 4,\; -1\,]$$
of order $m$. Hence $A$ has order $n = m^2$. See [74] for additional details. It can be verified by
direct computation that $A$ has a set of orthogonal eigenvectors
$$\{v^{(j)}\}_{j=1}^{n} = \{v^{(s+(t-1)m)}\}_{1 \le s,t \le m},$$
where for $1 \le j,k \le m$, the entry of the eigenvector in index $j + (k-1)m$ is
$$v^{(s+(t-1)m)}_{j+(k-1)m} = \sin\!\Big(\frac{s j \pi}{m+1}\Big)\,\sin\!\Big(\frac{t k \pi}{m+1}\Big). \qquad (51)$$
The corresponding eigenvalue is
$$\lambda_{s+(t-1)m} = 4 - 2\cos\!\Big(\frac{s\pi}{m+1}\Big) - 2\cos\!\Big(\frac{t\pi}{m+1}\Big). \qquad (52)$$
Let $h = 1/(m+1)$. Consider the point Jacobi splitting operator $B_J^{pt} = D^{-1}(L+U)$, where
$D$ is the diagonal of $A$, and $L$ and $U$ are the strict lower and upper triangular parts of $A$. Then
the vectors defined by (51) are also the eigenvectors of $B_J^{pt}$, with eigenvalues
$$\tfrac12 \cos\!\Big(\frac{s\pi}{m+1}\Big) + \tfrac12 \cos\!\Big(\frac{t\pi}{m+1}\Big), \qquad 1 \le s,t \le m.$$
The maximum corresponds to $s = t = 1$, so that the spectral radius of the Jacobi operator is
$$\rho(B_J^{pt}) = \cos(\pi h) \approx 1 - \pi^2 h^2/2.$$
Moreover, $A$ is consistently ordered, so that Theorem 1.3 holds. (The ordering sets $S_k$ are defined
by grouping diagonal sets of grid points, as in §1.2.) Therefore, for the point Gauss-Seidel method,
$$\rho(B_{GS}^{pt}) = \big(\rho(B_J^{pt})\big)^2 \approx 1 - \pi^2 h^2,$$
and for the point SOR method,
$$\omega^* \approx \frac{2}{1 + \pi h} \approx 2 - 2\pi h, \qquad \rho(\mathcal{L}_{\omega^*}) \approx 1 - 2\pi h.$$
In light of (5), we see that the number of iterations required for the point Jacobi iteration to
converge is approximately proportional to $2m^2/\pi^2$, and the point Gauss-Seidel iteration requires
on the order of $m^2/\pi^2$ (half as many) steps. SOR with $\omega \approx \omega^*$ is dramatically faster, with iteration
counts proportional to $m/(2\pi)$.
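The spectral radii quoted above are easily checked numerically; the following Python sketch (an illustration, not part of the analysis) forms the model matrix (50) for a small grid and compares the computed radii of the point Jacobi and Gauss-Seidel iteration matrices with $\cos(\pi h)$ and $\cos^2(\pi h)$.

import numpy as np

m = 20
h = 1.0 / (m + 1)
T = 4 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
A = (np.kron(np.eye(m), T)
     - np.kron(np.eye(m, k=1), np.eye(m))
     - np.kron(np.eye(m, k=-1), np.eye(m)))

D = np.diag(np.diag(A))
L = -np.tril(A, -1)
U = -np.triu(A, 1)

rho_J = np.max(np.abs(np.linalg.eigvals(np.linalg.solve(D, L + U))))
rho_GS = np.max(np.abs(np.linalg.eigvals(np.linalg.solve(D - L, U))))
print(rho_J, np.cos(np.pi * h))          # rho(B_J^pt)  = cos(pi h)
print(rho_GS, np.cos(np.pi * h) ** 2)    # rho(B_GS^pt) = cos(pi h)^2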
Now consider the line Jacobi splitting $A = D - L - U$, where
$$D = \begin{pmatrix} D_1 & & \\ & \ddots & \\ & & D_m \end{pmatrix}$$
with $D_j = T$ for each $j$, and $L$ and $U$ are the remaining lower and upper triangular parts of
$A$. It can be verified that the vectors of (51) are also the eigenvectors of the line Jacobi iteration
matrix $B_J^{line} = D^{-1}(L+U)$, with
$$B_J^{line}\, v^{(s+(t-1)m)} = \frac{2\cos(t\pi/(m+1))}{4 - 2\cos(s\pi/(m+1))}\, v^{(s+(t-1)m)}.$$
Again, the maximum occurs with $s = t = 1$, so that
$$\rho(B_J^{line}) = \frac{\cos(\pi h)}{2 - \cos(\pi h)} \approx \frac{1 - \tfrac12 \pi^2 h^2}{1 + \tfrac12 \pi^2 h^2} \approx 1 - \pi^2 h^2.$$
$A$ is block consistently ordered (with ordering sets $S_k = \{k\}$, $1 \le k \le m$), so that
$$\rho(B_{GS}^{line}) \approx 1 - 2\pi^2 h^2,$$
and
$$\omega^* \approx \frac{2}{1 + \sqrt{2}\,\pi h} \approx 2 - 2\sqrt{2}\,\pi h, \qquad \rho(\mathcal{L}_{\omega^*}^{line}) \approx 1 - 2\sqrt{2}\,\pi h.$$
Thus, the line Jacobi and Gauss-Seidel methods require roughly half as many iterations as the
point versions of these methods, and the line SOR method requires $1/\sqrt{2}$ as many steps as the
point version. The computational costs of line methods are virtually identical to those of point
methods.
Next, we consider the behavior of CG and PCG applied to (49). The smallest and largest
eigenvalues of $A$ correspond to the cases $s = t = 1$ and $s = t = m$ in (52), giving
$$\lambda_{\min}(A) \approx 2\pi^2 h^2, \qquad \lambda_{\max}(A) \approx 8, \qquad \text{and} \qquad \kappa(A) \approx \frac{4}{\pi^2 h^2}.$$
Consequently, from (22), the number of iterations required for convergence (in the $A$-norm) is
proportional to $2m/\pi$. Asymptotically, this convergence behavior is qualitatively similar to that
of SOR with the optimal value of $\omega$. For more general problems, SOR would require an estimate
for a good value of $\omega$, or one would have to be constructed adaptively; in contrast, CG requires
no such parameter estimate.
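As an illustration, the sketch below evaluates this iteration estimate using the standard condition-number-based CG error bound (assumed here to be the content of (22), which is not reproduced in this section), with $\kappa(A) \approx 4/(\pi^2 h^2)$.

import numpy as np

def cg_iterations(m, eps=1e-6):
    """Iterations for an eps reduction of the A-norm error, using the usual
    CG bound k ~ (1/2) sqrt(kappa) ln(2/eps) with kappa(A) ~ 4/(pi^2 h^2)."""
    h = 1.0 / (m + 1)
    kappa = 4.0 / (np.pi ** 2 * h ** 2)
    return int(np.ceil(0.5 * np.sqrt(kappa) * np.log(2.0 / eps)))

for m in (16, 32, 64, 128):
    print(m, cg_iterations(m))    # grows linearly with m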
Convergence of CG can be sped up using preconditioning. Let us consider three preconditioners: the ILU and MILU factorizations ($\alpha = 0$ and $\alpha = 1$ in the parameterized incomplete
factorization), and the SSOR factorization. Let the level-0 nonzero patterns be used for the ILU
and MILU factorizations, i.e., no fill-in is permitted in the sparse factors. These techniques have
the following characteristics:
• The condition number for the ILU factorization grows like $O(m^2)$, the same asymptotic
behavior as for $A$ [9]. However, the eigenvalues of the ILU preconditioned matrix are more
tightly clustered than those of $A$ [49].
• When the natural ordering is used for $A$, the condition number for the MILU factorization
grows like $O(m)$ [18, 37].⁷
• When the natural ordering is used for $A$, there exists a parameter $\omega_1$ such that with
the SSOR preconditioning $M = (D - \omega_1 L)D^{-1}(D - \omega_1 U)$, the condition number of the
preconditioned system grows like $O(m)$ [78].
Thus, with the MILU and SSOR preconditioners, the number of iterations required by PCG is
reduced to $O(\sqrt{m})$. An advantage of the MILU method is that it is essentially independent of
parameters. See [8] for a systematic treatment of this class of preconditioners for problems with
periodic boundary conditions.
⁷ We are being slightly inaccurate here. This result has been established when the MILU factors are constructed for the matrix $A + ch^2 I$, where $c$ is an arbitrary positive constant. In practice, the conditioning of the MILU preconditioned system grows like $O(m)$ for $c = 0$ as well, although there is no proof of this.
Now let us turn to the more general problem (48). When this equation is discretized by finite
differences on a uniform $m \times m$ grid, the resulting coefficient matrix has the form
$$A = \mathrm{tri}\,[\,-bI_m,\; T,\; -eI_m\,] \qquad (53)$$
of block order $m$, where
$$T = \mathrm{tri}\,[\,-c,\; a,\; -d\,]$$
of order $m$, and $a$, $b$, $c$, $d$, and $e$ are scalars, which we will assume here to be nonnegative. A set
of linearly independent eigenvectors of $A$ is then
$$v^{(s+(t-1)m)}_{j+(k-1)m} = \Big(\frac{c}{d}\Big)^{j/2} \sin\!\Big(\frac{sj\pi}{m+1}\Big)\, \Big(\frac{b}{e}\Big)^{k/2} \sin\!\Big(\frac{tk\pi}{m+1}\Big), \qquad (54)$$
with eigenvalues
$$\lambda_{s+(t-1)m} = a - 2(cd)^{1/2}\cos\!\Big(\frac{s\pi}{m+1}\Big) - 2(be)^{1/2}\cos\!\Big(\frac{t\pi}{m+1}\Big).$$
Exactly as in the self-adjoint case, we can show that the vectors of (54) are also eigenvectors of
the point Jacobi iteration matrix and of the line Jacobi iteration matrix. Two commonly used
finite difference operators for (48) are the centered difference and upwind difference schemes,
which produce the following matrix coefficients (after scaling by $h^2$):

Centered differences:  $a = 4$,  $b = 1 + \tau h/2$,  $c = 1 + \sigma h/2$,  $d = 1 - \sigma h/2$,  $e = 1 - \tau h/2$.
Upwind differences:    $a = 4 + \sigma h + \tau h$,  $b = 1 + \tau h$,  $c = 1 + \sigma h$,  $d = 1$,  $e = 1$.
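For reference, the following sketch assembles the scaled coefficient matrix (53) for either scheme; the routine name and the use of SciPy's Kronecker products are assumptions made for this illustration, and the upwind coefficients assume $\sigma, \tau \ge 0$ as in the table above.

import numpy as np
import scipy.sparse as sp

def conv_diff_matrix(m, sigma, tau, scheme="centered"):
    """Scaled 5-point matrix A = tri[-b I, T, -e I], T = tri[-c, a, -d]."""
    h = 1.0 / (m + 1)
    if scheme == "centered":
        a = 4.0
        c, d = 1 + sigma * h / 2, 1 - sigma * h / 2
        b, e = 1 + tau * h / 2, 1 - tau * h / 2
    else:  # upwind, for sigma, tau >= 0
        a = 4.0 + sigma * h + tau * h
        c, d = 1 + sigma * h, 1.0
        b, e = 1 + tau * h, 1.0
    I = sp.identity(m)
    T = sp.diags([-c, a, -d], [-1, 0, 1], shape=(m, m))   # T = tri[-c, a, -d]
    S = sp.diags([1.0], [-1], shape=(m, m))               # block subdiagonal pattern
    return (sp.kron(I, T) - b * sp.kron(S, I) - e * sp.kron(S.T, I)).tocsr()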
For both discretizations, as $h \to 0$ we have
$$\begin{aligned}
\rho(B_J^{pt}) &\approx 1 - \Big(\frac{\sigma^2}{16} + \frac{\tau^2}{16} + \frac{\pi^2}{2}\Big) h^2, &
\rho(B_J^{line}) &\approx 1 - \Big(\frac{\sigma^2}{8} + \frac{\tau^2}{8} + \pi^2\Big) h^2, \\
\rho(B_{GS}^{pt}) &\approx 1 - \Big(\frac{\sigma^2}{8} + \frac{\tau^2}{8} + \pi^2\Big) h^2, &
\rho(B_{GS}^{line}) &\approx 1 - \Big(\frac{\sigma^2}{4} + \frac{\tau^2}{4} + 2\pi^2\Big) h^2, \\
\rho(\mathcal{L}_{\omega^*}^{pt}) &\approx 1 - 2\sqrt{\frac{\sigma^2}{8} + \frac{\tau^2}{8} + \pi^2}\; h, &
\rho(\mathcal{L}_{\omega^*}^{line}) &\approx 1 - 2\sqrt{\frac{\sigma^2}{4} + \frac{\tau^2}{4} + 2\pi^2}\; h.
\end{aligned} \qquad (55)$$
Note that these results imply that all of the methods display faster convergence for the non-self-adjoint operator (48) than for the Poisson operator. The methodology developed by Parter
[58, 59] and Parter and Steuerwalt [60] provides a systematic way to obtain results of this type, as
well as generalizations to splittings derived from "multiline" orderings of the underlying grid. All
of these results apply in the asymptotic regime, as $h \to 0$. Cf. the work of Chin and Manteuffel
[10] and Elman and Golub [21, 22, 23] for analysis of these types of methods that apply in the
nonasymptotic regime.
The behavior of preconditioned iterative methods in the nonsymmetric case is less well developed than it is in the symmetric positive-definite case. This is because it is more difficult to
get useful bounds on the quantities referenced in Theorems 3.1 and 3.2 than it is to bound the
condition number in the symmetric positive-definite case. However, using the analysis of splitting
operators as in (55), we can provide some heuristic justification of the effectiveness of incomplete
factorizations applied to discretizations of (48), as follows.
Theorem 5.1. Suppose the matrix $A$ derived from discretizing (48) is an M-matrix. Let $A =
M - R$, where $M = LU$ is the ILU(0) factorization of $A$, and let $B_J^{line}$ denote the line Jacobi
iteration matrix. Then $\rho(M^{-1}R) \le \rho(B_J^{line})$.
The proof is based on the fact that for M-matrices, augmenting the index set in which nonzeros
are permitted in an incomplete factorization improves the quality of the incomplete factorization.
In this case, the (factorization of the) block diagonal matrix $D$ of the line Jacobi splitting can be
regarded as an incomplete factorization of $A$ whose nonzero index set is contained in the nonzero
set associated with the ILU(0) factorization. See [5, 23] for details. A consequence of this result
is that the eigenvalues of $M^{-1}A = I - M^{-1}R$ are contained in a circle centered at 1 with radius
bounded by $\rho(B_J^{line})$.
We conclude this section with a discussion of using the reduced system preprocessing step
of §4.3 to solve the discrete problems under consideration here. First, by using a red-black
ordering, the linear systems (50) and (53) can be permuted into the form (47), so that the reduced
system can be constructed explicitly. Let $S = D_2 - C_2 D_1^{-1} C_1$ denote the reduced matrix. It is
straightforward to show that most rows of $S$ contain nine nonzero entries, i.e., $S$ corresponds to
a nine-point operator on the reduced grid. Moreover, the rows and columns of $S$ can be ordered
in such a way that $S$ has block Property A. Examples of two such orderings derived from a
six-by-five grid are shown below. The figures show the unknowns of the reduced system, with
"·" representing the decoupled unknowns. In the figure on the left, if all unknowns lying on the
same diagonal grid line are grouped together, the result is a "one-line" ordering of the reduced
grid. In the figure on the right, all unknowns lying on successive pairs of lines of the reduced grid
are grouped together, producing a "two-line" ordering.


[Figure: the one-line (left) and two-line (right) orderings of the fifteen reduced-grid unknowns on a six-by-five grid.]

In both cases, the reduced matrix corresponding to these line orderings has a block tridiagonal
structure
$$S = \mathrm{tri}\,[\,S_{j,j-1},\; S_{jj},\; S_{j,j+1}\,].$$
For the one-line orderings, most diagonal blocks $S_{jj}$ have tridiagonal form, and for the two-line
orderings, most diagonal blocks have pentadiagonal form. In either case, $S$ has block Property
A, and it is possible to define the block Jacobi splitting $S = D - L - U$, where $D$ is the block
diagonal matrix consisting of the block diagonal entries $S_{jj}$, and $L$ and $U$ come from the lower
triangle and upper triangle of $S$, respectively. The block Gauss-Seidel and SOR operators are
defined in an analogous manner.
Note that this procedure applies to both symmetric and nonsymmetric problems. Results in
[21, 22, 23, 40, 60] indicate that it is more efficient to solve the reduced system than the original
system, using line iterative methods whose costs per step are comparable. For example, it is
shown in [40] that if $A$ is an M-matrix, then the spectral radius of the two-line Jacobi iteration
matrix for the reduced system is smaller than the spectral radius of the two-line Jacobi iteration
matrix for the original system.⁸ In view of Theorem 1.2, analogous statements apply for the
block Gauss-Seidel and SOR methods. We cite two results from [22] that show how the block
Jacobi operators behave for the convection-diffusion equation.
⁸ Indeed, the result is for $k$-line methods, $k \ge 2$.
Theorem 5.2. For the centered finite difference discretization, if $|\sigma h/2| < 1$ and $|\tau h/2| < 1$, then
the spectral radii of the one-line and two-line block Jacobi iteration matrices for the reduced system
are bounded, up to $o(h^2)$ terms, by explicit expressions that are rational in $\cos \pi h$ with coefficients
depending on $1 - (\sigma h/2)^2$ and $1 - (\tau h/2)^2$; see [22] for the precise bounds.
It can be shown that the bound for the two-line ordering is smaller than that for the one-line
ordering, and that these bounds become very small as $\sigma h/2$ and $\tau h/2$ tend to 1.

6. Other Topics.

We conclude by briefly mentioning some important topics that we have not addressed. These
include:
1. Adaptive methods. In discussing Krylov subspace methods, we have restricted our attention
to techniques that do not require estimates of the eigenvalues of $A$. Alternative methods,
such as the Chebyshev algorithm for symmetric [35] and nonsymmetric [46, 47] problems,
and various "hybrid" methods that combine CG-like methods with adaptive strategies (see
[51] for a complete set of references), make use of eigenvalue estimates. The disadvantages of
such techniques are that the costs of estimating eigenvalues may be high, and convergence
may be slow if inaccurate estimates are obtained; they are also somewhat more difficult
to program than CG-like methods. They have the advantage, however, of requiring less
work per step than CG-like methods, and they also tend not to depend as much on inner
products, which is useful in the context of parallel computations. (See below.)
2. Problem-based preconditioners. For preconditioners, we have only considered techniques
that can be defined for arbitrary matrices. This has the advantage of broad applicability. However, it is often possible to take advantage of properties of particular problems to produce better preconditioners. For example, in the context of partial differential
equations, techniques such as alternating direction implicit methods [74], block incomplete
factorization [12], domain decomposition [44] and multilevel methods [39, 48] produce preconditioners that often (especially for self-adjoint problems) lead to faster convergence than
the techniques of §4.
3. Ordering Effects. There are many issues associated with the ordering of the rows and
columns of the coefficient matrix that affect the performance of iterative methods, especially when preconditioners are used. These include the effectiveness of incomplete factorization
methods and the efficiency of parallel implementations. See e.g. [1, 17, 20] for some discussions
of these issues.
4. Parallel Computations. As for all numerical computations, the efficient implementation of
iterative methods on large-scale parallel computers introduces many new concerns. For
example, as noted above, the inner products required by CG-like methods may present
a difficulty on parallel architectures. For vectors of length $n$, at least $O(\log n)$ steps are
required for the parallel computation of an inner product, whereas it may be that all
other computations can be done in $O(1)$ time. Similarly, in the context of discrete partial
differential equations, red-black and "multi-color" orderings are more conducive to efficient
parallel implementation, but (for preconditioning) they lead to slower convergence. The
study of issues of these types, and the effects of computer architecture, is an area of active
research; see e.g. [16].

Acknowledgements. I thank Michael Chernesky, Dianne O'Leary, and Xuejun Zhang for some
editorial remarks. Xuejun Zhang provided the proof of Theorem 1.2, (i).

References
[1] L. M. Adams and H. J. Jordan. Is SOR color blind? SIAM J. Sci. Stat. Comput., 7:490–506, 1986.
[2] S. F. Ashby, T. A. Manteuffel, and P. E. Saylor. A taxonomy for conjugate gradient methods. SIAM J. Numer. Anal., 27:1542–1568, 1990.
[3] O. Axelsson. Solution of linear systems of equations: iterative methods. In V. A. Barker, editor, Sparse Matrix Techniques, pages 1–51. Springer-Verlag, New York, 1976.
[4] O. Axelsson and G. Lindskog. On the rate of eigenvalue distribution of a class of preconditioning methods. Numer. Math., 48:479–498, 1986.
[5] R. Beauwens. Factorization iterative methods, M-operators and H-operators. Numer. Math., 31:335–357, 1979.
[6] N. I. Buleev. A numerical method for the solution of two-dimensional and three-dimensional equations of diffusion. Math. Sb., 51:227–258, 1960.
[7] T. F. Chan, L. de Pillis, and H. A. Van der Vorst. A Transpose-Free Squared Lanczos Algorithm and Applications to Solving Nonsymmetric Linear Systems. Technical report, Department of Mathematics, UCLA, 1991.
[8] T. F. Chan and H. C. Elman. Fourier analysis of iterative methods for elliptic problems. SIAM Review, 31:20–49, 1989.
[9] R. Chandra. Conjugate Gradient Methods for Partial Differential Equations. PhD thesis, Yale University, Department of Computer Science, 1978.
[10] R. C. Y. Chin and T. A. Manteuffel. An analysis of block successive overrelaxation for a class of matrices with complex spectra. SIAM J. Numer. Anal., 26:564–585, 1988.
[11] P. Concus and G. H. Golub. Use of fast direct methods for the efficient numerical solution of nonseparable elliptic equations. SIAM J. Numer. Anal., 10:1103–1120, 1973.
[12] P. Concus, G. H. Golub, and G. Meurant. Block preconditioning for the conjugate gradient method. SIAM J. Sci. Stat. Comput., 6:220–252, 1985.
[13] P. Concus, G. H. Golub, and D. P. O'Leary. A generalized conjugate gradient method for the numerical solution of elliptic partial differential equations. In J. R. Bunch and D. J. Rose, editors, Sparse Matrix Computations, pages 309–332. Academic Press, New York, 1976.
[14] J. Cullum and R. A. Willoughby. Lanczos Algorithms for Large Symmetric Eigenvalue Computations, Volume I, Theory, Volume II, Programs. Birkhauser, Boston, 1986.
[15] J. E. Dennis, Jr. and K. Turner. Generalized conjugate directions. Linear Algebra Appl., 88/89:187–209, 1987.
[16] J. J. Dongarra, I. S. Duff, D. C. Sorensen, and H. A. van der Vorst, editors. Solving Linear Systems on Vector and Shared Memory Computers. SIAM, Philadelphia, 1990.
[17] I. S. Duff and G. A. Meurant. The effect of ordering on preconditioned conjugate gradients. BIT, 29:635–657, 1989.
[18] T. Dupont, R. P. Kendall, and H. H. Rachford Jr. An approximate factorization procedure for solving self-adjoint elliptic difference equations. SIAM J. Numer. Anal., 5:559–573, 1968.
[19] S. C. Eisenstat, H. C. Elman, and M. H. Schultz. Variational iterative methods for nonsymmetric systems of linear equations. SIAM J. Numer. Anal., 20:345–357, 1983.
[20] H. C. Elman and E. Agron. Ordering techniques for the preconditioned conjugate gradient method on parallel computers. Computer Physics Communications, 53:253–269, 1989.
[21] H. C. Elman and G. H. Golub. Iterative methods for cyclically reduced non-self-adjoint linear systems. Math. Comp., 54:671–700, 1990.
[22] H. C. Elman and G. H. Golub. Iterative methods for cyclically reduced non-self-adjoint linear systems, II. Math. Comp., 56:215–242, 1991.
[23] H. C. Elman and G. H. Golub. Line iterative methods for cyclically reduced convection-diffusion problems. SIAM J. Sci. Stat. Comput., 13:339–363, 1992.
[24] H. C. Elman and M. H. Schultz. Preconditioning by fast direct methods for nonselfadjoint nonseparable elliptic problems. SIAM J. Numer. Anal., 23:44–57, 1986.
[25] V. Faber and T. A. Manteuffel. Necessary and sufficient conditions for the existence of a conjugate gradient method. SIAM J. Numer. Anal., 21:352–362, 1984.
[26] V. Faber and T. A. Manteuffel. Orthogonal error methods. SIAM J. Numer. Anal., 24:170–187, 1987.
[27] R. Fletcher. Conjugate gradient methods for indefinite systems. In G. A. Watson, editor, Numerical Methods Dundee 1975, pages 73–89. Springer-Verlag, New York, 1976.
[28] R. Freund. A Transpose-Free Quasi-Minimal Residual Algorithm for Non-Hermitian Linear Systems. Technical Report 91-18, RIACS, NASA Ames Research Center, 1991.
[29] R. Freund, G. H. Golub, and N. M. Nachtigal. Iterative Solution of Linear Systems. Technical Report NA-91-05, Stanford University, Numerical Analysis Project, 1991. To appear in Acta Numerica.
[30] R. Freund, M. H. Gutknecht, and N. M. Nachtigal. An Implementation of the Look-Ahead Lanczos Algorithm for Non-Hermitian Matrices. Technical Report 91-09, RIACS, NASA Ames Research Center, 1991. To appear in SIAM J. Sci. Stat. Comput.
[31] R. Freund and N. M. Nachtigal. QMR: a Quasi-Minimal Residual Method for Non-Hermitian Linear Systems. Technical Report 90-51, RIACS, NASA Ames Research Center, 1990.
[32] G. H. Golub and J. E. de Pillis. Toward an effective two-parameter SOR method. In D. R. Kincaid and L. J. Hayes, editors, Iterative Methods for Large Linear Systems, pages 107–119. Academic Press, San Diego, 1990.
[33] G. H. Golub and C. F. Van Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore, second edition, 1989.
[34] G. H. Golub and D. P. O'Leary. Some history of the conjugate gradient and Lanczos algorithms: 1948–1976. SIAM Rev., 31:50–102, 1989.
[35] G. H. Golub and R. S. Varga. Chebyshev semi-iterative methods, successive overrelaxation iterative methods, and second order Richardson iterative methods. Numer. Math., 3:147–156 (Part I), 157–168 (Part II), 1961.
[36] A. Greenbaum. Comparison of splittings used with the conjugate gradient algorithm. Numer. Math., 33:181–194, 1979.
[37] I. Gustafsson. A class of first order factorizations. BIT, 18:142–156, 1978.
[38] M. H. Gutknecht. A completed theory of the unsymmetric Lanczos process and related algorithms, Part I. SIAM J. Matr. Anal. Appl., 13:594–639, 1992. Part II, to appear in SIAM J. Matr. Anal. Appl.
[39] W. Hackbusch. Multi-Grid Methods and Applications. Springer-Verlag, Berlin, 1985.
[40] L. A. Hageman and R. S. Varga. Block iterative methods for cyclically reduced matrix equations. Numer. Math., 6:106–119, 1964.
[41] L. A. Hageman and D. M. Young. Applied Iterative Methods. Academic Press, New York, 1981.
[42] M. R. Hestenes and E. Stiefel. Methods of conjugate gradients for solving linear systems. Journal of Research of the National Bureau of Standards, 49:409–435, 1952.
[43] W. Joubert. Lanczos methods for the solution of nonsymmetric systems of linear equations. SIAM J. Matr. Anal. Appl., 13:926–943, 1992.
[44] D. E. Keyes, T. F. Chan, G. Meurant, J. S. Scroggs, and R. G. Voigt, editors. Domain Decomposition Methods for Partial Differential Equations. SIAM, Philadelphia, 1992.
[45] D. G. Luenberger. The conjugate residual method for constrained minimization problems. SIAM J. Numer. Anal., 7:390–398, 1970.
[46] T. A. Manteuffel. The Tchebychev iteration for nonsymmetric linear systems. Numer. Math., 28:307–327, 1977.
[47] T. A. Manteuffel. Adaptive procedure for estimation of parameters for the nonsymmetric Tchebychev iteration. Numer. Math., 31:187–208, 1978.
[48] S. F. McCormick, editor. Multigrid Methods. SIAM, Philadelphia, 1987.
[49] J. A. Meijerink and H. A. van der Vorst. An iterative solution method for linear systems of which the coefficient matrix is a symmetric M-matrix. Math. Comp., 31:148–162, 1977.
[50] N. Munksgaard. Solving sparse symmetric sets of linear equations by preconditioned conjugate gradients. ACM Trans. Math. Soft., 6:206–219, 1980.
[51] N. M. Nachtigal, L. Reichel, and L. N. Trefethen. A hybrid GMRES algorithm for nonsymmetric linear systems. SIAM J. Matr. Anal. Appl., 13:796–825, 1992.
[52] N. M. Nachtigal, S. C. Reddy, and L. N. Trefethen. How fast are nonsymmetric matrix iterations? SIAM J. Matr. Anal. Appl., 13:778–795, 1992.
[53] J. Ortega. Numerical Analysis: A Second Course. Academic Press, New York, 1972.
[54] C. C. Paige and M. A. Saunders. Solution of sparse indefinite systems of linear equations. SIAM J. Numer. Anal., 12, 1975.
[55] B. Parlett. Reduction to tridiagonal form and minimal realizations. SIAM J. Matr. Anal. Appl., 13:567–593, 1992.
[56] B. N. Parlett. The Symmetric Eigenvalue Problem. Prentice-Hall, Englewood Cliffs, New Jersey, 1980.
[57] B. N. Parlett, D. R. Taylor, and Z. A. Liu. A look-ahead Lanczos algorithm for unsymmetric matrices. Math. Comp., 44:105–124, 1985.
[58] S. V. Parter. On estimating the "rates of convergence" of iterative methods for elliptic difference operators. Trans. Amer. Math. Soc., 114:320–354, 1965.
[59] S. V. Parter. Iterative methods for elliptic problems and the discovery of "q". SIAM Review, 28:153–175, 1986.
[60] S. V. Parter and M. Steuerwalt. Block iterative methods for elliptic and parabolic difference equations. SIAM J. Numer. Anal., 19:1173–1195, 1982.
[61] T. Rivlin. Chebyshev Polynomials: From Approximation Theory to Algebra and Number Theory. John Wiley & Sons, New York, second edition, 1990.
[62] Y. Saad. Variations of Arnoldi's method for computing eigenelements of large unsymmetric matrices. Linear Algebra Appl., 34:269–295, 1980.
[63] Y. Saad. The Lanczos biorthogonalization algorithm and other oblique projection methods for solving large unsymmetric systems. SIAM J. Numer. Anal., 19:485–506, 1982.
[64] Y. Saad. Iterative solution of indefinite symmetric linear systems by methods using orthogonal polynomials over two disjoint intervals. SIAM J. Numer. Anal., 20:784–811, 1983.
[65] Y. Saad and M. H. Schultz. Conjugate gradient-like algorithms for solving nonsymmetric linear systems. Math. Comp., 44:417–424, 1985.
[66] Y. Saad and M. H. Schultz. GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J. Sci. Stat. Comput., 7:856–869, 1986.
[67] P. Sonneveld. CGS, a fast Lanczos-type solver for nonsymmetric linear systems. SIAM J. Sci. Stat. Comput., 10:36–52, 1989.
[68] J. Stoer and R. Bulirsch. Introduction to Numerical Analysis. Springer-Verlag, New York, 1980.
[69] D. B. Szyld and O. B. Widlund. Variational Analysis of Some Conjugate Gradient Methods. Technical Report CS-1989-28, Department of Computer Science, Duke University, 1989.
[70] D. R. Taylor. Analysis of the Look Ahead Lanczos Algorithm. PhD thesis, University of California at Berkeley, Department of Mathematics, 1982.
[71] L. N. Trefethen. Non-normal matrices and pseudo-eigenvalues. Incomplete draft, 1990.
[72] L. N. Trefethen. Pseudospectra of Matrices. Technical Report 91-10, Oxford University Computing Laboratory, 1991.
[73] H. A. Van der Vorst. Bi-CGSTAB: A fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric linear systems. SIAM J. Sci. Stat. Comput., 13:631–644, 1992.
[74] R. S. Varga. Matrix Iterative Analysis. Prentice-Hall, Englewood Cliffs, New Jersey, 1962.
[75] H. Walker. Implementation of the GMRES method using Householder transformations. SIAM J. Sci. Stat. Comput., 9:152–164, 1988.
[76] J. W. Watts III. A conjugate gradient-truncated direct method for the iterative solution of the reservoir simulation pressure equation. Society of Petroleum Engineers Journal, 21:345–353, 1981.
[77] J. H. Wilkinson. The Algebraic Eigenvalue Problem. Oxford University Press, Oxford, 1965.
[78] D. M. Young. Iterative Solution of Large Linear Systems. Academic Press, New York, 1970.
[79] D. M. Young. A historical overview of iterative methods. Computer Physics Communications, 53:1–17, 1989.
[80] D. M. Young. A historical review of iterative methods. In S. Nash, editor, A History of Scientific Computing. Addison-Wesley, Reading, MA, 1990.
[81] Z. Zlatev. Use of iterative refinement in the solution of sparse linear systems. SIAM J. Numer. Anal., 19, 1982.
