Howard C. Elman$^1$
Department of Computer Science
and
Institute for Advanced Computer Studies
University of Maryland
College Park, MD 20742
elman@cs.umd.edu
This chapter contains an overview of some of the important techniques used to solve linear
systems of equations
$$Ax = b \qquad (1)$$
by iterative methods. We consider methods based on two general ideas, splittings of the coefficient
matrix, leading to stationary iterative methods, and Krylov subspace methods. These two ideas
can also be combined to produce preconditioned iterative methods. In addition, we outline some
convergence results for using the methods considered to solve two classes of model problems
arising from elliptic partial differential equations.
In §1, we introduce the basic ideas of stationary iterative methods and consider several particular examples of such methods: the Jacobi, Gauss-Seidel, SOR and SSOR methods. We outline
some results on convergence of these methods, for both general matrices and those with special structure. In §2, we give an overview of Krylov subspace methods for systems where the
coefficient matrix is symmetric. These include the conjugate gradient method for symmetric
positive-definite systems, and several generalizations of this technique for the symmetric indefinite case. In §3, we examine the use of Krylov subspace methods for nonsymmetric problems.
This is an active area of current research, and we highlight GMRES, the most popular method
in current use, together with the QMR method, one of several new ideas being studied. In §4, we
present several preconditioning techniques that can be used in combination with Krylov subspace
methods. Our emphasis here is on methods such as incomplete factorizations that are defined purely
in terms of the algebraic structure of the coefficient matrix. In §5, we outline the convergence
properties of the methods presented for two classes of model problems, the discrete Poisson equation, which is symmetric positive-definite, and the discrete convection-diffusion equation, which
is nonsymmetric. Finally, in §6, we present a brief discussion of several important topics that we
have not considered here.
Before proceeding, we introduce several points of notation. We will assume that $A$ is a
nonsingular real matrix of order $n$. All the methods considered generate a sequence of iterates
$x^{(k)}$ that are intended to converge to $x = A^{-1}b$. They all require a stopping criterion that
can be used to determine when the iterate is sufficiently accurate. We will not address this
question in any detail, except to note that the residual $r^{(k)} = b - Ax^{(k)}$ is easily computable; a
commonly used stopping criterion is to require that the relative residual $\|r^{(k)}\|/\|b\|$ be smaller
than some tolerance, where $\|\cdot\|$ is some vector norm. Throughout this chapter, we will use $(v, w)$
to represent the Euclidean inner product $\sum_{j=1}^{n} v_j w_j$, and $\|v\|_2 = (v, v)^{1/2}$ to denote the Euclidean
norm. Many of the methods under consideration compute this norm of the residual, $\|r^{(k)}\|_2$, as
part of the iteration. Essentially all of the results presented here carry over to complex systems
of equations where complex inner products are used in place of real inner products.

$^1$ This work was supported by the U. S. Army Research Office under grant DAAL-0392-G-0016, and by the
National Science Foundation under grants ASC-8958544 and CCR-8818340.
1. Stationary Methods.
In this section, we give a brief overview of stationary methods for solving (1). Methods of this
type, such as relaxation methods, were the most widely used examples of iterative methods when
large computers first became available. (See [79, 80] for a historical perspective.) They are now
somewhat less popular than the methods discussed in §§2 and 3, although the ease
with which they can be implemented and their use in the context of preconditioners continue to
make them an important topic of study.
Stationary methods are derived from a splitting of the coefficient matrix,
$$A = M - N. \qquad (2)$$
The problem (1) is then equivalent to $Mx = Nx + b$. This suggests, for nonsingular $M$, the
stationary method for constructing a sequence of approximate solutions to (1),
$$x^{(k+1)} = M^{-1}Nx^{(k)} + M^{-1}b. \qquad (3)$$
Here, $x^{(0)}$ is a (possibly arbitrary) initial guess for the solution. The "classical" Jacobi, Gauss-Seidel and successive overrelaxation methods [74, 78] are examples of such methods.
Let $e^{(k)} = x - x^{(k)}$ denote the error at the $k$th step. We say that the method (3) is convergent if $\lim_{k\to\infty} e^{(k)} = 0$. Note that $e^{(k)} = G^k e^{(0)}$, where $G = M^{-1}N$ is the iteration matrix.
Consequently, for any consistent matrix norm $\|\cdot\|$, the error satisfies
$$\|e^{(k)}\| \le \|G^k\|\,\|e^{(0)}\|.$$
The methods considered below are defined in terms of the decomposition
$$A = D - L - U, \qquad (5)$$
where $D$, $-L$ and $-U$ are block matrices consisting of the block diagonal, strict block lower triangular and strict block
upper triangular parts of $A$, respectively. Here, each block $A_{ij}$ of $A$ is itself a matrix, and the diagonal
blocks $A_{ii}$ are assumed to be square and nonsingular. If all the matrices $A_{ij}$ are square of order
1, then the methods we discuss below are "point" versions; otherwise, they are block versions.
The Jacobi method is defined by choosing $M = D$ and $N = L + U$ in the splitting (2),
producing the iteration matrix $B_J = D^{-1}(L + U)$. The Gauss-Seidel method is defined using
$M = D - L$, $N = U$, giving the iteration matrix $B_{GS} = (D - L)^{-1}U$. The iterations therefore
have the form
$$x^{(k+1)} = x^{(k)} + D^{-1}r^{(k)}$$
for the Jacobi method, and
$$x^{(k+1)} = x^{(k)} + (D - L)^{-1}r^{(k)}$$
for the Gauss-Seidel method.
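The residual-correction form of the Jacobi and Gauss-Seidel iterations can be sketched in a few lines of NumPy. This is a minimal dense-matrix illustration, not an efficient implementation; the function names, the relative-residual stopping test and the default tolerance are assumptions made for the example, and the "point" versions of the methods are used.

```python
import numpy as np

def splitting_iteration(A, b, M, x0=None, tol=1e-10, max_iter=10_000):
    """Iterate x <- x + M^{-1} r with r = b - A x, i.e. the splitting A = M - N."""
    x = np.zeros_like(b, dtype=float) if x0 is None else np.array(x0, dtype=float)
    b_norm = np.linalg.norm(b)
    for k in range(max_iter):
        r = b - A @ x                      # residual of the current iterate
        if np.linalg.norm(r) <= tol * b_norm:
            return x, k                    # relative residual small enough
        x = x + np.linalg.solve(M, r)      # correction step
    return x, max_iter

def jacobi(A, b, **kwargs):
    # M = D, the diagonal of A
    return splitting_iteration(A, b, np.diag(np.diag(A)), **kwargs)

def gauss_seidel(A, b, **kwargs):
    # M = D - L, the lower triangle of A including the diagonal
    return splitting_iteration(A, b, np.tril(A), **kwargs)
```

For a strictly diagonally dominant tridiagonal matrix, both functions return an approximation to the solution together with the number of steps taken, and Gauss-Seidel typically needs fewer steps than Jacobi.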
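The successive over-relaxation method (SOR) is defined by the splitting
$$M = \frac{1}{\omega}(D - \omega L), \qquad N = \frac{1}{\omega}[(1 - \omega)D + \omega U], \qquad (7)$$
where $\omega \ne 0$ is real. The iteration matrix is $\mathcal{L}_\omega = (D - \omega L)^{-1}[(1 - \omega)D + \omega U]$, producing the
iteration
$$x^{(k+1)} = x^{(k)} + \omega(D - \omega L)^{-1}r^{(k)}.$$
The idea underlying this method is to parameterize the Gauss-Seidel scheme, to which it reduces
for the choice $\omega = 1$.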
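The symmetric successive over-relaxation method (SSOR) performs a "lower triangular sweep" based
on the splitting (7), followed by an analogous "upper triangular sweep." Specifically, let
$$M_1 = \frac{1}{\omega}(D - \omega L), \quad N_1 = \frac{1}{\omega}[(1 - \omega)D + \omega U], \quad M_2 = \frac{1}{\omega}(D - \omega U), \quad N_2 = \frac{1}{\omega}[(1 - \omega)D + \omega L]. \qquad (8)$$
The iteration is then
$$x^{(k+1/2)} = M_1^{-1}N_1x^{(k)} + M_1^{-1}b, \qquad x^{(k+1)} = M_2^{-1}N_2x^{(k+1/2)} + M_2^{-1}b. \qquad (9)$$
This iteration can be expressed in terms of a single splitting of the form (2). Using (9), it can be
shown that
$$x^{(k+1)} = x^{(k)} + (M_1^{-1} + M_2^{-1} - M_2^{-1}AM_1^{-1})r^{(k)}.$$
But $M_1^{-1} + M_2^{-1} - M_2^{-1}AM_1^{-1} = M_2^{-1}(M_1 + M_2 - A)M_1^{-1}$ and $M_1 + M_2 - A = \frac{2 - \omega}{\omega}D$, so that the iteration has the form $x^{(k+1)} = x^{(k)} + M^{-1}r^{(k)}$ with
$$M = \frac{1}{\omega(2 - \omega)}(D - \omega L)D^{-1}(D - \omega U). \qquad (10)$$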
We present this method in the context of stationary methods for historical reasons. In fact, for
model problems, convergence is actually slower than for SOR [78], and the SSOR splitting is now
primarily used in the context of preconditioning. See Section 4.
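The equivalence between the two SSOR sweeps and the single splitting matrix can be checked numerically. The sketch below is an illustration under the assumption of "point" splittings with dense NumPy arrays; the function names are hypothetical.

```python
import numpy as np

def _dlu(A):
    """Return D, L, U with A = D - L - U (point splitting)."""
    D = np.diag(np.diag(A))
    L = -np.tril(A, k=-1)
    U = -np.triu(A, k=1)
    return D, L, U

def ssor_two_sweeps(A, b, x, omega):
    """One SSOR step written as a lower sweep followed by an upper sweep."""
    D, L, U = _dlu(A)
    M1 = (D - omega * L) / omega
    M2 = (D - omega * U) / omega
    x_half = x + np.linalg.solve(M1, b - A @ x)
    return x_half + np.linalg.solve(M2, b - A @ x_half)

def ssor_single_splitting(A, b, x, omega):
    """The same step written with the single SSOR splitting matrix M."""
    D, L, U = _dlu(A)
    M = (D - omega * L) @ np.linalg.inv(D) @ (D - omega * U) / (omega * (2 - omega))
    return x + np.linalg.solve(M, b - A @ x)
```

Applied to the same starting vector, the two functions agree to rounding error, which is the content of the identity relating the two sweeps to the single matrix $M$.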
There is a large body of analysis of convergence properties of stationary iterative methods.
The texts by Varga [74], Young [78] and Hageman and Young [41] are comprehensive references,
and the general texts by Ortega [53] and Stoer and Bulirsch [68] contain good concise overviews.
Much of the analysis is based on the Perron-Frobenius theory of nonnegative matrices, and we
will not develop this machinery here. We highlight some of the main results below.
Definition. Let $A$ be a square matrix.
$A$ is diagonally dominant if $|a_{ii}| \ge \sum_{j \ne i} |a_{ij}|$ for all $i$.
$A$ is strictly diagonally dominant if strict inequality holds for each $i$.
$A$ is irreducible if there is no permutation matrix $P$ such that $P^TAP$ has the form
$$\begin{pmatrix} \tilde{A}_{11} & \tilde{A}_{12} \\ 0 & \tilde{A}_{22} \end{pmatrix},$$
where $\tilde{A}_{11}$ is of order $p$ and $\tilde{A}_{22}$ is of order $q$ and both $p$ and $q$ are greater than 0.
$A$ is irreducibly diagonally dominant if it is irreducible and diagonally dominant, and strict inequality holds in at least one index.
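As a small illustration of the first two definitions, the following sketch tests a dense matrix for (weak) and strict diagonal dominance. The function name is hypothetical, and irreducibility is not checked here.

```python
import numpy as np

def diagonal_dominance(A):
    """Return (is_diagonally_dominant, is_strictly_diagonally_dominant)."""
    A = np.asarray(A, dtype=float)
    diag = np.abs(np.diag(A))
    off = np.sum(np.abs(A), axis=1) - diag   # row sums of off-diagonal magnitudes
    return bool(np.all(diag >= off)), bool(np.all(diag > off))
```

For example, the tridiagonal matrix with stencil $(-1, 2, -1)$ is diagonally dominant but not strictly so, because equality holds in its interior rows.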
Theorem 1.2.
(i) If $A$ is either strictly diagonally dominant or irreducibly diagonally dominant, then both the
point Jacobi method and the point Gauss-Seidel method are convergent.
(ii) If $B_J \ge 0$ (elementwise) and the Jacobi method is convergent ($\rho(B_J) < 1$), then the Gauss-Seidel method is also convergent and $\rho(B_{GS}) < \rho(B_J) < 1$.
(iii) If $A$ is symmetric positive definite, then the Gauss-Seidel method is convergent, and the SOR
and SSOR methods are convergent for $\omega \in (0, 2)$.
(iv) The SOR method is not convergent for $\omega \notin (0, 2)$.
To give a flavor of the analysis, we prove assertion (i). Note that strictly diagonally dominant
matrices and irreducibly diagonally dominant matrices are nonsingular [74]. Consider the Jacobi
method. Suppose $D^{-1}(L + U)v = \lambda v$. This implies that $(\lambda D - L - U)v = 0$, i.e., $A_J(\lambda) =
\lambda D - L - U$ is singular. But if $A = A_J(1)$ is either strictly or irreducibly diagonally dominant,
then so is $A_J(\lambda)$ for all $|\lambda| \ge 1$, and $A_J(\lambda)$ is therefore nonsingular. Consequently, it must be that $\rho(B_J) < 1$, so that the Jacobi
method is convergent.
The results of Theorem 1.2 do not say anything about how fast convergence is for any of the
methods considered. For problems with additional structure, it is possible to give more precise
statements about rates of convergence.
Definition. Let $A = [A_{ij}]$, $1 \le i, j \le n_b$, where $A_{ij}$ is a submatrix and $A_{ii}$ is square and nonsingular.
$A$ has block Property A if there is a permutation matrix $P$ such that $P^TAP$ has the form
$$\begin{pmatrix} D_1 & C_1 \\ C_2 & D_2 \end{pmatrix}, \qquad (11)$$
where $D_1$ and $D_2$ are block diagonal matrices whose only nonzero blocks are diagonal blocks of
$A$.
$A$ is block consistently ordered if the integers $1, \ldots, n_b$ can be partitioned into $t$ disjoint sets
$\{S_k\}_{k=1}^{t}$ such that if $A_{ij} \ne 0$, then $i \in S_k$ implies $j \in S_{k-1}$ for $j < i$ and $j \in S_{k+1}$ for $j > i$.
If the blocks of $A$ are all of size 1 (so that $n_b = n$), then these definitions reduce to "point"
Property A and consistent ordering. The most common examples of matrices with these structures are those arising from discretizations of elliptic and parabolic partial differential equations.
Some examples are given below; see [53, 68, 74, 78] for details.$^2$
• Five-point (in two dimensions) and seven-point (in three dimensions) finite difference operators. For example, the left side of the figure below shows a "natural" ordering of a $5 \times 4$
grid, and the right side shows how these grid points can be grouped into sets indexed by
superscripts that define a consistent ordering. A "red-black" ordering produces matrices
with point Property A; for example, list the odd-numbered grid points in the left side of
the figure first, followed by the even-numbered points.

    Natural ordering            Sets of a consistent ordering
    16  17  18  19  20          (4)  (5)  (6)  (7)  (8)
    11  12  13  14  15          (3)  (4)  (5)  (6)  (7)
     6   7   8   9  10          (2)  (3)  (4)  (5)  (6)
     1   2   3   4   5          (1)  (2)  (3)  (4)  (5)

• Linear finite elements on triangles, and bilinear finite elements on two-dimensional quadrilaterals. On regular grids, grouping of unknowns by lines produces matrices with block
consistent orderings, and line red-black orderings lead to block Property A.
• Discretizations of coupled differential operators together with orderings by grid points often
produce matrices with block Property A, where the size of the blocks is the number of
differential operators that are coupled together.

$^2$ It is not always appreciated that consistent ordering is a more restrictive property than Property A. Any
matrix of the form (11) is consistently ordered, with two sets $S_1$ and $S_2$ determined from the blocking in (11), but
not every matrix that can be permuted into this form is consistently ordered itself. A counterexample [78] is
$$A = \begin{pmatrix} 2+c & -1 & 0 & -1 \\ -1 & 2+c & -1 & 0 \\ 0 & -1 & 2+c & -1 \\ -1 & 0 & -1 & 2+c \end{pmatrix},$$
which is not consistently ordered but has Property A. (Interchange rows 2 and 3 and columns 2 and 3.) This matrix
is a discretization of the one-dimensional Helmholtz equation $-u'' + u = 0$ with periodic boundary conditions. The
relation (12) does not hold for this matrix.
For consistently ordered matrices, the following results, gleaned from Young [78], Chapters 5
and 6, show the relationship between the Jacobi and Gauss-Seidel iteration matrices and identify
a good choice for the SOR parameter.
Theorem 1.3. Let $A$ be a consistently ordered matrix.
(i) For each eigenvalue $\lambda$ of the SOR iteration operator $\mathcal{L}_\omega$, there is an eigenvalue $\mu$ of the Jacobi
operator $B_J$ such that
$$(\lambda + \omega - 1)^2 = \lambda\omega^2\mu^2. \qquad (12)$$
Conversely, if $\mu$ is an eigenvalue of $B_J$, then there is an eigenvalue $\lambda$ of $\mathcal{L}_\omega$ such that (12) holds.
In particular, $\rho(B_{GS}) = \rho(B_J)^2$.
(ii) If the eigenvalues of $B_J$ are real and $\rho(B_J) < 1$, then the choice $\omega^* = \frac{2}{1 + \sqrt{1 - \rho(B_J)^2}}$ minimizes $\rho(\mathcal{L}_\omega)$ with
respect to $\omega$, and $\rho(\mathcal{L}_{\omega^*}) = \omega^* - 1$.
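The relations of Theorem 1.3 can be observed numerically on a small consistently ordered matrix. The sketch below uses the tridiagonal matrix with stencil $(-1, 2, -1)$ as an assumed test case; the helper names are hypothetical, and spectral radii are computed with dense eigenvalue routines, which is only sensible for small examples.

```python
import numpy as np

def spectral_radius(B):
    return np.max(np.abs(np.linalg.eigvals(B)))

def sor_iteration_matrix(A, omega):
    """(D - omega L)^{-1} [(1 - omega) D + omega U] for the point splitting A = D - L - U."""
    D = np.diag(np.diag(A))
    L = -np.tril(A, k=-1)
    U = -np.triu(A, k=1)
    return np.linalg.solve(D - omega * L, (1 - omega) * D + omega * U)

n = 10
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # tridiagonal, consistently ordered
D = np.diag(np.diag(A))
B_J = np.linalg.solve(D, D - A)                         # Jacobi matrix D^{-1}(L + U)

rho_J = spectral_radius(B_J)
rho_GS = spectral_radius(sor_iteration_matrix(A, 1.0))  # SOR with omega = 1 is Gauss-Seidel
omega_opt = 2 / (1 + np.sqrt(1 - rho_J**2))
rho_opt = spectral_radius(sor_iteration_matrix(A, omega_opt))
```

For this matrix `rho_GS` agrees with `rho_J**2` and `rho_opt` agrees with `omega_opt - 1` up to the accuracy of the eigenvalue computation; the latter is slightly less accurate because the dominant eigenvalue of the SOR matrix is defective at the optimal parameter.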
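We present an elementary proof of (i), due to Golub and de Pillis [32], in the case where $A$
has the form
$$A = \begin{pmatrix} I_p & -M \\ -M^T & I_q \end{pmatrix} = \begin{pmatrix} I & 0 \\ 0 & I \end{pmatrix} - \begin{pmatrix} 0 & 0 \\ M^T & 0 \end{pmatrix} - \begin{pmatrix} 0 & M \\ 0 & 0 \end{pmatrix} = D - L - U.$$
Let $M = V\Sigma W^T$ denote the singular value decomposition of $M$, i.e., $V$ and $W$ are orthogonal
matrices and $\Sigma$ is the matrix of singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r \ge 0$, where $r = \min(p, q)$.
Then
$$B_J = \begin{pmatrix} 0 & M \\ M^T & 0 \end{pmatrix} = \begin{pmatrix} V & 0 \\ 0 & W \end{pmatrix} \begin{pmatrix} 0 & \Sigma \\ \Sigma^T & 0 \end{pmatrix} \begin{pmatrix} V & 0 \\ 0 & W \end{pmatrix}^T. \qquad (13)$$
By symmetrically permuting the rows and columns of the interior matrix on the right of (13), we
find that $B_J$ is similar to a block diagonal matrix containing the $r$ two-by-two blocks
$$\begin{pmatrix} 0 & \sigma_j \\ \sigma_j & 0 \end{pmatrix}, \qquad j = 1, \ldots, r,$$
on its block diagonal, and zeros elsewhere. Therefore, the eigenvalues of $B_J$ are $\{\pm\sigma_j\}$, together
with $p + q - 2r$ zeros. Analogously, we have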
An alternative methodology that has proven to be a fruitful source of iterative methods is based on
Krylov subspaces. Given a square matrix $B$ and vector $v$, let $K_k(v, B) \equiv \mathrm{span}\{v, Bv, \ldots, B^{k-1}v\}$,
the Krylov subspace generated by $B$ with respect to $v$. Given an initial guess $x^{(0)}$ for (1) with
residual $r^{(0)}$, a Krylov subspace method produces a sequence of iterates of the form
$$x^{(k)} = x^{(0)} + v^{(k)}, \qquad (14)$$
where $v^{(k)} \in K_k(r^{(0)}, A)$. A very simple example is the first-order Richardson method $x^{(k+1)} =
x^{(k)} + \alpha_k r^{(k)}$, where $r^{(k)} = b - Ax^{(k)}$ and $\alpha_k$ is a scalar. Any iterate of the form (14) satisfies
$x^{(k)} = x^{(0)} + \psi_{k-1}(A)r^{(0)}$ where $\psi_{k-1}(t)$ is a polynomial of degree $k - 1$. Equivalently, the residual
satisfies $r^{(k)} = \pi_k(A)r^{(0)}$ where $\pi_k(t) = 1 - t\psi_{k-1}(t)$ is a member of the set
$$\mathcal{P}_k = \{\text{polynomials } \pi_k \text{ of degree } k \text{ such that } \pi_k(0) = 1\}.$$
derived from the connection between CG and the Lanczos algorithm. An extensive bibliography
of CG and related methods is given in [34].
Assume $A$ is symmetric and positive-definite. Then the expression $(u, Av)$ defines an inner
product, and we refer to the associated norm $\|u\|_A \equiv (u, Au)^{1/2}$ as the "A-norm." The conjugate
gradient method of Hestenes and Stiefel [42] is defined as follows.
The Conjugate Gradient Method.
    Choose $x^{(0)}$; compute $r^{(0)} = b - Ax^{(0)}$; set $p^{(0)} = r^{(0)}$
    for $k = 0$ until convergence do
        $\alpha_k = (r^{(k)}, r^{(k)})/(p^{(k)}, Ap^{(k)})$
        $x^{(k+1)} = x^{(k)} + \alpha_k p^{(k)}$
        $r^{(k+1)} = r^{(k)} - \alpha_k Ap^{(k)}$
        <Test for convergence>
        $\beta_k = (r^{(k+1)}, r^{(k+1)})/(r^{(k)}, r^{(k)})$
        $p^{(k+1)} = r^{(k+1)} + \beta_k p^{(k)}$
    enddo
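The algorithm above translates almost line by line into NumPy. The following is a minimal sketch for dense symmetric positive-definite matrices; the function name, the relative-residual stopping test and the default iteration cap are assumptions made for the example.

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=None):
    """Conjugate gradient method for symmetric positive-definite A."""
    x = np.zeros_like(b, dtype=float) if x0 is None else np.array(x0, dtype=float)
    r = b - A @ x                 # r^(0)
    p = r.copy()                  # p^(0) = r^(0)
    rr = r @ r
    b_norm = np.linalg.norm(b)
    max_iter = 10 * len(b) if max_iter is None else max_iter
    for k in range(max_iter):
        if np.sqrt(rr) <= tol * b_norm:
            return x, k           # converged after k steps
        Ap = A @ p
        alpha = rr / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rr_new = r @ r
        beta = rr_new / rr
        p = r + beta * p
        rr = rr_new
    return x, max_iter
```

Only one matrix-vector product and a handful of vector operations are needed per step, which is the short-recurrence property that makes the method inexpensive.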
The effectiveness of CG stems from its minimization properties. The choice of the scalar
$\alpha_k$ is such that the new iterate minimizes the A-norm of the error among all choices along the
"direction vector" $p^{(k)}$, i.e.
$$\|x - x^{(k+1)}\|_A = \min_{u = x^{(k)} + \alpha p^{(k)}} \|x - u\|_A.$$
More importantly, the one-dimensional minimization is actually a $k$-dimensional one: $x^{(k)}$ is the
unique vector in the translated Krylov space $x^{(0)} + K_k(r^{(0)}, A)$ for which the A-norm of the error
is minimum. We will prove this, making use of the following lemma.
Lemma 2.1. For any $k$ such that $x^{(k)} \ne x$, the vectors generated by the conjugate gradient
method satisfy
(i) $(r^{(k)}, p^{(j)}) = (r^{(k)}, r^{(j)}) = 0$, $j < k$;
(ii) $(p^{(k)}, Ap^{(j)}) = 0$, $j < k$;
(iii) $\mathrm{span}\{r^{(0)}, r^{(1)}, \ldots, r^{(k-1)}\} = \mathrm{span}\{p^{(0)}, p^{(1)}, \ldots, p^{(k-1)}\} = K_k(r^{(0)}, A)$.
Proof. We prove relations (i) and (ii) simultaneously by induction on $k$. They are trivially true
for $k = 0$. Assume they all hold for indices $0, \ldots, k - 1$. For the equalities of (i), we have
$$(r^{(k)}, p^{(j)}) = (r^{(k-1)}, p^{(j)}) - \alpha_{k-1}(p^{(k-1)}, Ap^{(j)}). \qquad (15)$$
If $j < k - 1$, then the induction hypothesis implies that both inner products on the right side
of (15) are 0; if $j = k - 1$, then, from the definition of $\alpha_{k-1}$, the expression on the right is
$(r^{(k-1)}, p^{(k-1)}) - (r^{(k-1)}, r^{(k-1)})$, which is zero by the induction hypothesis. To complete the
induction, we use the recurrence for $p^{(j)}$, giving
$$(r^{(k)}, p^{(j)}) = (r^{(k)}, r^{(j)}) + \beta_{j-1}(r^{(k)}, p^{(j-1)}) = (r^{(k)}, r^{(j)}),$$
so that $(r^{(k)}, r^{(j)}) = 0$ as well.
For assertion (ii), by the recurrences defining $p^{(k)}$ and $r^{(j+1)}$, we have
$$(p^{(k)}, Ap^{(j)}) = -\frac{1}{\alpha_j}(r^{(k)}, r^{(j+1)} - r^{(j)}) + \beta_{k-1}(p^{(k-1)}, Ap^{(j)}). \qquad (16)$$
For $j < k - 1$, assertion (i) and the induction hypothesis for (ii) imply that the right side of (16) is zero.
For $j = k - 1$, using assertion (i) and the definitions of $\alpha_{k-1}$ and $\beta_{k-1}$, the right
hand expression in (16) is
$$-\frac{(p^{(k-1)}, Ap^{(k-1)})}{(r^{(k-1)}, r^{(k-1)})}(r^{(k)}, r^{(k)}) + \frac{(r^{(k)}, r^{(k)})}{(r^{(k-1)}, r^{(k-1)})}(p^{(k-1)}, Ap^{(k-1)}) = 0.$$
For assertion (iii), a straightforward inductive argument shows that
$$\mathrm{span}\{r^{(0)}, r^{(1)}, \ldots, r^{(k-1)}\} \subseteq K_k(r^{(0)}, A), \qquad \mathrm{span}\{p^{(0)}, p^{(1)}, \ldots, p^{(k-1)}\} \subseteq K_k(r^{(0)}, A).$$
By (i) and (ii), each of the sets $\{r^{(j)}\}_{j=0}^{k-1}$ and $\{p^{(j)}\}_{j=0}^{k-1}$ is linearly independent. But $K_k(r^{(0)}, A)$
has dimension at most $k$, so that all three spaces must be identical.
To establish the $k$-dimensional minimization property of CG, it is convenient to use the
function $E(u) \equiv (u, Au) - 2(b, u)$. Note that $E(u) = (x - u, A(x - u)) - (x, Ax)$, i.e. $E(u)$ differs
by a constant from the square of the A-norm of the error. Consequently, the error norm and
$E(u)$ are minimized by the same quantities.
Theorem 2.1. The iterate $x^{(k)}$ generated by the conjugate gradient method is the unique member
of $x^{(0)} + K_k(r^{(0)}, A)$ for which either (and therefore both) the A-norm of the error is minimum,
or the residual $b - Ax^{(k)}$ is orthogonal to $K_k(r^{(0)}, A)$.
Proof. By definition, $x^{(k)} = x^{(0)} + P_k a_k$, where $P_k = [p^{(0)}, \ldots, p^{(k-1)}]$ is the matrix with columns
$p^{(0)}, p^{(1)}, \ldots, p^{(k-1)}$, and $a_k = (\alpha_0, \alpha_1, \ldots, \alpha_{k-1})^T$. By Lemma 2.1, (iii), $x^{(k)} \in x^{(0)} + K_k(r^{(0)}, A)$.
To establish the minimizing property of $x^{(k)}$, let $u^{(k)}$ denote any other vector in $x^{(0)} + K_k(r^{(0)}, A)$,
and let $v^{(k)} = u^{(k)} - x^{(k)}$. Thus, $v^{(k)}$ has the form $v^{(k)} = P_k b_k$. We have
$$E(u^{(k)}) = E(x^{(k)} + v^{(k)}) = E(x^{(k)}) + (v^{(k)}, Av^{(k)}) + 2(v^{(k)}, Ax^{(k)} - b).$$
But
Using (17), it can be shown, see e.g. [61], that $\tau_k$ is a polynomial of degree $k$, and that
$$\tau_k(t) = \frac{1}{2}\left[\left(t + \sqrt{t^2 - 1}\right)^k + \left(t - \sqrt{t^2 - 1}\right)^k\right]. \qquad (18)$$
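Let $\kappa = \kappa(A) = \lambda_{\max}(A)/\lambda_{\min}(A)$, the condition number of $A$.
Theorem 2.2. The error $e^{(k)} = x - x^{(k)}$ after $k$ steps of the conjugate gradient method satisfies
$$\|e^{(k)}\|_A \le 2\left(\frac{1 - 1/\sqrt{\kappa}}{1 + 1/\sqrt{\kappa}}\right)^k \|e^{(0)}\|_A.$$
Proof. By Theorem 2.1, $\|e^{(k)}\|_A \le \|\pi_k(A)e^{(0)}\|_A$ for any polynomial $\pi_k$ of degree $k$ such that $\pi_k(0) = 1$. Let $\lambda_1, \ldots, \lambda_n$ denote the eigenvalues of $A$, all of which lie in $[a, b] = [\lambda_{\min}(A), \lambda_{\max}(A)]$, and let $\xi_1, \ldots, \xi_n$ denote the coefficients of $e^{(0)}$ in an orthonormal basis of corresponding eigenvectors. Then
$$\|\pi_k(A)e^{(0)}\|_A = \left(\sum_{j=1}^{n} \xi_j^2 \lambda_j \pi_k(\lambda_j)^2\right)^{1/2} \le \max_j |\pi_k(\lambda_j)|\,\|e^{(0)}\|_A \le \max_{t \in [a,b]} |\pi_k(t)|\,\|e^{(0)}\|_A. \qquad (19)$$
Consider the particular choice of $\pi_k(t)$ as the scaled and translated Chebyshev polynomial
$$\pi_k(t) = \tau_k\left(\frac{b + a - 2t}{b - a}\right) \Big/ \tau_k\left(\frac{b + a}{b - a}\right). \qquad (20)$$
For $t \in [a, b]$, the argument in the numerator of (20) lies in $[-1, 1]$, so that $|\pi_k(t)| \le 1/\tau_k\left(\frac{b+a}{b-a}\right)$.
But, by (18),
$$\tau_k\left(\frac{b + a}{b - a}\right) \ge \frac{1}{2}\left(\frac{1 + 1/\sqrt{\kappa}}{1 - 1/\sqrt{\kappa}}\right)^k. \qquad (21)$$
The result follows from (19), (20) and (21).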
Suppose our objective is to make the relative error $\|e^{(k)}\|_A/\|e^{(0)}\|_A \le \epsilon$. Using the bound
from Theorem 2.2, it suffices for the inequality
$$2\left(\frac{1 - 1/\sqrt{\kappa}}{1 + 1/\sqrt{\kappa}}\right)^k = 2\left(1 - \frac{2/\sqrt{\kappa}}{1 + 1/\sqrt{\kappa}}\right)^k \le \epsilon$$
to hold. Taking the natural logarithm of both sides of the inequality and using the fact that
$\ln\left(1 - \frac{2/\sqrt{\kappa}}{1 + 1/\sqrt{\kappa}}\right) \approx -2/\sqrt{\kappa}$ for large $\kappa$, this is equivalent to the condition
$$k \ge \frac{1}{2}\sqrt{\kappa}\,|\ln(\epsilon/2)|. \qquad (22)$$
That is, a bound on the number of iterations required to reach a given stopping criterion is
approximately proportional to the square root of the condition number of the coefficient matrix.
This quantity is often much smaller than $n$.
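As a worked illustration of estimate (22), the sketch below evaluates the iteration count estimate and the bound of Theorem 2.2 for given values of the condition number and tolerance. The function names and the sample values are assumptions chosen for the example.

```python
import math

def cg_iterations_estimate(kappa, eps):
    """Estimate (22): k >= 0.5 * sqrt(kappa) * |ln(eps / 2)|."""
    return 0.5 * math.sqrt(kappa) * abs(math.log(eps / 2))

def cg_error_bound(kappa, k):
    """Bound of Theorem 2.2 on the relative A-norm error after k steps."""
    s = 1 / math.sqrt(kappa)
    return 2 * ((1 - s) / (1 + s)) ** k
```

For instance, with $\kappa = 100$ and $\epsilon = 10^{-6}$ the estimate is about 72.5, so 73 steps suffice according to the bound, far fewer than the order of a typical large system.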
The bound of Theorem 2.2 can be improved in cases where the eigenvalues of A are clustered
into groups, or where there are a small number of isolated eigenvalues. See [3, 36].
Finally, we note that it is possible to construct a variant of the conjugate gradient method
whose $k$th iterate is the unique member of $x^{(0)} + K_k(r^{(0)}, A)$ that minimizes $\|e^{(k)}\|_{A^2} = \|r^{(k)}\|_2$,
i.e. the Euclidean norm of the residual. This method is known as the conjugate residual method
(CR). From a computational point of view, the main difference between CG and CR is that the
scalars required by CR have the form
$$\alpha_k = \frac{(r^{(k)}, Ar^{(k)})}{(Ap^{(k)}, Ap^{(k)})}, \qquad \beta_k = \frac{(r^{(k+1)}, Ar^{(k+1)})}{(r^{(k)}, Ar^{(k)})}. \qquad (23)$$
If $A$ is indefinite, then the denominator $(p^{(k)}, Ap^{(k)})$ of $\alpha_k$ computed by CG may be zero and the
algorithm will break down. In practice, the exact value zero typically does not occur, but a very
small value of $(p^{(k)}, Ap^{(k)})$ will make the computation unstable. We now show how to avoid this
problem by exploiting the connection between CG and the Lanczos algorithm. This idea gave
rise to the SYMMLQ algorithm developed by Paige and Saunders [54]. In addition, we give a
brief description of a stabilized version of the CR method applicable to indefinite systems.
Consider the Lanczos computation for generating orthonormal vectors [14, 33, 56, 77]. Let
$v^{(0)}$ be a vector such that $\|v^{(0)}\|_2 = 1$, and let $v^{(-1)} = 0$. An orthogonal basis for $K_{k+1}(v^{(0)}, A)$
can be constructed by the recurrence
$$\gamma_{j+1}v^{(j+1)} = Av^{(j)} - \delta_j v^{(j)} - \gamma_j v^{(j-1)}, \qquad 0 \le j \le k - 1, \qquad (24)$$
where $\delta_j = (v^{(j)}, Av^{(j)})$ and $\gamma_{j+1}$ is chosen so that $\|v^{(j+1)}\|_2 = 1$. Let $V_k = [v^{(0)}, v^{(1)}, \ldots, v^{(k-1)}]$,
and let $T_k$ denote the symmetric tridiagonal matrix
$$T_k = \mathrm{tri}[\gamma_j, \delta_j, \gamma_{j+1}], \qquad 0 \le j \le k - 1.$$
Then (24) is equivalent to the relation
$$AV_k = V_kT_k + \gamma_k[0, \ldots, 0, v^{(k)}], \qquad (25)$$
and $V_k^TAV_k = T_k$. The Lanczos algorithm constructs the orthonormal set $\{v^{(j)}\}$ and uses the
eigenvalues of $T_k$ as estimates for the eigenvalues of $A$. Note that the off-diagonal entries of $T_k$
are uniquely determined up to sign.$^3$
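The three-term Lanczos recurrence can be sketched as follows. This is an illustrative dense implementation without reorthogonalization or breakdown handling (a zero normalization constant would cause a division by zero); the function name and the returned values are assumptions made for the example.

```python
import numpy as np

def lanczos(A, v0, k):
    """Run k steps of the symmetric Lanczos recurrence.

    Returns V (n x (k+1), orthonormal columns), the k x k tridiagonal T_k,
    and the last normalization constant gamma_k.
    """
    n = len(v0)
    V = np.zeros((n, k + 1))
    delta = np.zeros(k)
    gamma = np.zeros(k + 1)            # gamma[0] is unused because v^(-1) = 0
    V[:, 0] = v0 / np.linalg.norm(v0)
    v_prev = np.zeros(n)
    for j in range(k):
        w = A @ V[:, j]
        delta[j] = V[:, j] @ w
        w = w - delta[j] * V[:, j] - gamma[j] * v_prev
        gamma[j + 1] = np.linalg.norm(w)
        v_prev = V[:, j]
        V[:, j + 1] = w / gamma[j + 1]
    T = np.diag(delta) + np.diag(gamma[1:k], 1) + np.diag(gamma[1:k], -1)
    return V, T, gamma[k]
```

With `Vk = V[:, :k]`, the matrix `Vk.T @ A @ Vk` reproduces `T` up to rounding, and `A @ Vk - Vk @ T` is zero except for its last column, which equals `gamma_k * V[:, k]`.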
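Suppose CG is applied to a (possibly indefinite) system, but that the computation does not
break down through step $k$. The residuals and direction vectors satisfy
$$\frac{1}{\alpha_j}\left(r^{(j)} - r^{(j+1)}\right) = A\left(r^{(j)} + \beta_{j-1}p^{(j-1)}\right) = Ar^{(j)} + \frac{\beta_{j-1}}{\alpha_{j-1}}\left(r^{(j-1)} - r^{(j)}\right),$$
so that
$$Ar^{(j)} = -\frac{1}{\alpha_j}r^{(j+1)} + \left(\frac{1}{\alpha_j} + \frac{\beta_{j-1}}{\alpha_{j-1}}\right)r^{(j)} - \frac{\beta_{j-1}}{\alpha_{j-1}}r^{(j-1)},$$
or in matrix form
$$AR_k = R_kS_k - \frac{1}{\alpha_{k-1}}[0, \ldots, 0, r^{(k)}], \qquad (26)$$
where $R_k = [r^{(0)}, \ldots, r^{(k-1)}]$ and $S_k$ is the tridiagonal matrix of coefficients of the preceding relation. Let $\Delta_k = \mathrm{diag}(\|r^{(0)}\|_2, \|r^{(1)}\|_2, \ldots, \|r^{(k-1)}\|_2)$. Postmultiplying (26) by $\Delta_k^{-1}$ and letting $\tilde{V}_k = R_k\Delta_k^{-1}$ denote the matrix
of normalized residuals leads to the equivalent relation
$$A\tilde{V}_k = \tilde{V}_k\tilde{T}_k - \frac{\sqrt{\beta_{k-1}}}{\alpha_{k-1}}[0, \ldots, 0, \tilde{v}^{(k)}], \qquad (27)$$
where $\tilde{T}_k = \Delta_kS_k\Delta_k^{-1}$ and $\tilde{v}^{(k)} = r^{(k)}/\|r^{(k)}\|_2$.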
But (25) and (27) are identical, so that $\tilde{V}_k = V_k$, and $\tilde{T}_k = T_k$. That is, the normalized residuals
generated by CG are precisely the Lanczos vectors. Moreover, the CG iterate $x^{(k)}$ can be recovered
directly from (25). By Theorem 2.1, $x^{(k)}$ is the unique vector in $x^{(0)} + K_k(r^{(0)}, A)$ with residual
orthogonal to $K_k(r^{(0)}, A)$. That is, $x^{(k)} = x^{(0)} + V_ky^{(k)}$, where $y^{(k)} = (y_0^{(k)}, \ldots, y_{k-1}^{(k)})^T$, such that
$V_k^Tr^{(k)} = 0$. But
$$r^{(k)} = r^{(0)} - AV_ky^{(k)} = V_k\left(\|r^{(0)}\|_2\varepsilon_1 - T_ky^{(k)}\right) - \gamma_ky_{k-1}^{(k)}v^{(k)}, \qquad (28)$$
where $\varepsilon_1 = [1, 0, \ldots, 0]^T$. Orthogonality is imposed by choosing $y^{(k)}$ to satisfy
$$0 = V_k^Tr^{(k)} = \|r^{(0)}\|_2\varepsilon_1 - T_ky^{(k)}. \qquad (29)$$
Note that by (28) and (29),
$$\|r^{(k)}\|_2 = |\gamma_ky_{k-1}^{(k)}|. \qquad (30)$$
Let us consider the question of breakdown in CG more closely, following [54]. By the recurrence defining $\{p^{(j)}\}$ in CG, we have $R_k = P_kL_k^T$ where
$$L_k^T = \begin{pmatrix} 1 & -\beta_0 & & \\ & \ddots & \ddots & \\ & & 1 & -\beta_{k-2} \\ & & & 1 \end{pmatrix}.$$
Equivalently, $V_k = P_k\tilde{L}_k^T$, where $\tilde{L}_k = \Delta_k^{-1}L_k$. Let $D_k = P_k^TAP_k$, the diagonal matrix whose
entries are the denominators of the scalars $\{\alpha_j\}$. Then $T_k = V_k^TAV_k = \tilde{L}_kD_k\tilde{L}_k^T$. That is, the
$LDL^T$ factorization of $T_k$ contains a (small or) zero pivot if and only if the CG algorithm (nearly)
breaks down. If $A$ is positive-definite, then so is $T_k$, and the $LDL^T$ factorization is stable. If $A$
is indefinite, then the $LDL^T$ factorization may not exist, but it may still be possible to compute
an iterate $x^{(k)} \in x^{(0)} + K_k(r^{(0)}, A)$ with residual orthogonal to $K_k(r^{(0)}, A)$. It is only necessary
that $y^{(k)}$ in (29) be computable, i.e., that $T_k$ be nonsingular. In fact, $T_k$ may be singular when $A$
is indefinite, but $T_k$ is the leading principal submatrix of $T_{k+1}$, so that the eigenvalues of $T_k$ interlace
those of $T_{k+1}$ [77]. Therefore, $T_k$ cannot be singular for two consecutive indices $k$, and it is always
possible to construct a sequence of iterates $\{x^{(k)}\}$ whose residuals are orthogonal to $K_k(r^{(0)}, A)$.
This is the basis of the SYMMLQ method developed by Paige and Saunders [54]. This method
can be specified in a form analogous to CG, where $x^{(k+1)}$ is derived from $x^{(k)}$ by a short recurrence.
We will not present this variant here; cf. [9, 54] and §3.2. Instead, we describe a construction
that has also proven to be useful for developing methods for nonsymmetric problems. Let the
tridiagonal system of (29) be represented as $T_ky^{(k)} = d^{(k)}$. The upper left $2 \times 2$ corner of $T_k$ is
$$\begin{pmatrix} \delta_0 & \gamma_1 \\ \gamma_1 & \delta_1 \end{pmatrix},$$
and the plane rotation
$$Q_0 = \begin{pmatrix} c_0 & s_0 & \\ -s_0 & c_0 & \\ & & I_{k-2} \end{pmatrix}, \qquad c_0 = \frac{\delta_0}{\sqrt{\delta_0^2 + \gamma_1^2}}, \quad s_0 = \frac{\gamma_1}{\sqrt{\delta_0^2 + \gamma_1^2}},$$
is such that $Q_0T_k$ is zero below the diagonal in the first column. In similar fashion, it is possible
to define plane rotations $Q_1, \ldots, Q_{k-1}$ such that $R_k = Q_{k-1} \cdots Q_1Q_0T_k$ is an upper triangular
matrix with three nonzero bands, and (29) is equivalent to the upper triangular system
$$R_ky^{(k)} = Q_{k-1} \cdots Q_1Q_0d^{(k)}.$$
$R_k$ is nonsingular if and only if $T_k$ is nonsingular, in which case $y^{(k)}$ is easily obtained. Moreover,
because $T_{k-1}$ is a leading principal submatrix of $T_k$, $R_k$ can be computed from $R_{k-1}$ using just one
plane rotation. An efficient implementation of a method equivalent to SYMMLQ using $\|r^{(k)}\|_2$ as
a stopping criterion is to solve for $y^{(k)}$ (when possible), calculate $\|r^{(k)}\|_2$ from (30), and compute
$x^{(k)}$ only after the stopping test is satisfied.
The conjugate residual method minimizes $\|r^{(k)}\|_2$, which constitutes a norm (of the error)
even if $A$ is indefinite. However, CR is also subject to breakdown, if $(r^{(k)}, Ar^{(k)}) = 0$ at any
step. (See (23).) This problem can be fixed using the ideas just presented. Relation (25) is
equivalent to $AV_k = V_{k+1}\hat{T}_k$, where $\hat{T}_k$ is the matrix of dimensions $(k + 1) \times k$ containing $T_k$ in
its first $k$ rows and $[0, \ldots, 0, \gamma_k]$ in its last row. Then
$$\|r^{(k)}\|_2 = \left\|\,\|r^{(0)}\|_2\varepsilon_1 - \hat{T}_ky^{(k)}\right\|_2. \qquad (31)$$
Consequently, the coefficients $y^{(k)}$ producing $x^{(k)} = x^{(0)} + V_ky^{(k)}$ with minimal residual norm
can be obtained by minimizing the expression on the right of (31). It can be shown that unless
the exact solution has been obtained at step $k - 1$, $\hat{T}_k$ has full rank, so that $y^{(k)}$ can always be
obtained. The same set of plane rotations discussed above can be used to transform $\hat{T}_k$ to upper
triangular form $\begin{pmatrix} \hat{R}_k \\ 0 \end{pmatrix}$. The algorithm based on this analysis is equivalent to the MINRES
method of [54].
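The least squares characterization (31) can be illustrated directly. The sketch below builds the $(k+1) \times k$ tridiagonal matrix by the Lanczos recurrence and solves the small least squares problem with a general-purpose routine instead of plane rotations, so it is an unoptimized stand-in for MINRES rather than the method of [54] itself. The function name is hypothetical, the initial guess is assumed to be zero, and no breakdown handling is included.

```python
import numpy as np

def lanczos_min_residual(A, b, k):
    """Minimal-residual iterate in K_k(b, A) for symmetric A, with x^(0) = 0.

    Returns the iterate and the residual norm predicted by the small
    least squares problem.
    """
    n = len(b)
    beta0 = np.linalg.norm(b)              # ||r^(0)||_2
    V = np.zeros((n, k + 1))
    T_hat = np.zeros((k + 1, k))
    V[:, 0] = b / beta0
    for j in range(k):
        w = A @ V[:, j]
        if j > 0:
            w = w - T_hat[j, j - 1] * V[:, j - 1]
        T_hat[j, j] = V[:, j] @ w
        w = w - T_hat[j, j] * V[:, j]
        T_hat[j + 1, j] = np.linalg.norm(w)
        if j + 1 < k:
            T_hat[j, j + 1] = T_hat[j + 1, j]   # symmetric off-diagonal entry
        V[:, j + 1] = w / T_hat[j + 1, j]
    rhs = np.zeros(k + 1)
    rhs[0] = beta0
    y, *_ = np.linalg.lstsq(T_hat, rhs, rcond=None)
    x = V[:, :k] @ y
    return x, np.linalg.norm(rhs - T_hat @ y)
```

The second returned value equals the true residual norm $\|b - Ax^{(k)}\|_2$ up to rounding, which is why the residual norm can be monitored without forming the iterate.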
Finally, we present another algorithm equivalent to MINRES whose implementation is closer
in form to CG and CR. By an argument identical to the proof of Lemma 2.1, (i), it can be shown
that for CR,
$$\alpha_k = (r^{(k)}, Ap^{(k)})/(Ap^{(k)}, Ap^{(k)}).$$
An alternative method for generating vectors $\{p^{(k)}\}$ that are orthogonal with respect to the
$A^2$-inner product, i.e. $(Ap^{(j)}, Ap^{(k)}) = 0$ for $j \ne k$, is based on the recurrence
$$p^{(k+1)} = Ap^{(k)} - \eta_kp^{(k)} - \zeta_kp^{(k-1)}, \qquad (32)$$
where $\eta_k = (Ap^{(k)}, A^2p^{(k)})/(Ap^{(k)}, Ap^{(k)})$, $\zeta_k = (Ap^{(k)}, A^2p^{(k-1)})/(Ap^{(k-1)}, Ap^{(k-1)})$ (with
$p^{(-1)} = 0$, $\zeta_0 = 0$).
This algorithm is presented in Chandra [9]. Only one matrix-vector product is required at each
step, to compute $A^2p^{(k)}$ from $Ap^{(k)}$. Several variants of this method have been developed that
save work by using the CR update for $p^{(k+1)}$ unless this leads to breakdown, in which case (32)
is used; see [9, 45].
Error analysis for the SYMMLQ and MINRES algorithms can be found in [9, 64, 69].
The conjugate gradient method and its variants have two properties that make them effective:
they are "optimal," in the sense that at the $k$th step, either an error function is minimized
or a condition of orthogonality is imposed with respect to the $k$-dimensional space $K_k(r^{(0)}, A)$;
and they are inexpensive, requiring a fixed length recurrence at each step. It is known that
there are no generalizations of CG for solving arbitrary nonsymmetric systems that have both
these properties [25, 26]. In the past two decades, a large amount of effort has been devoted
to developing effective Krylov subspace methods that retain one of the two properties. That
is, either they retain optimality by allowing the cost per iteration to grow, or they sacrifice
optimality but require a small amount of work per step. Both types of methods can be derived
using variants of the Lanczos process for nonsymmetric matrices. In this section, we summarize
some of the important developments along these lines.
Before proceeding, we note that one strategy for solving a nonsymmetric system is simply to
embed it into a symmetric positive-definite one using the normal equations $A^TAx = A^Tb$, which
can then be solved using the conjugate gradient method. This idea is generally not favored,
because the condition number of $A^TA$ is the square of that of $A$. Cf. [52] for a discussion of
situations where it may be effective.
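The squaring of the condition number under the normal equations is easy to observe numerically. The following sketch uses an arbitrary, assumed test matrix (a shifted random matrix) and the 2-norm condition number computed by NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8)) + 4 * np.eye(8)   # arbitrary nonsymmetric test matrix

kappa_A = np.linalg.cond(A)             # 2-norm condition number of A
kappa_normal = np.linalg.cond(A.T @ A)  # condition number of the normal equations matrix

print(kappa_A, kappa_A**2, kappa_normal)
```

The last two printed values agree up to rounding, since the singular values of $A^TA$ are the squares of those of $A$.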
Krylov subspace methods derived from optimality criteria were the subject of extensive research
from the late 1970s through the mid-1980s. See [2, 15, 65] for characterizations of these ideas and
a more complete list of references. The main idea is as follows. For general nonsymmetric $A$, it
is not possible to generate an orthogonal basis for $K_k(r^{(0)}, A)$ using short recurrences like (24) or
(32). However, a basis can be generated by a strategy analogous to the Gram-Schmidt procedure,
in which all previously constructed vectors $\{v^{(0)}, \ldots, v^{(k-1)}\}$ are used in the construction of $v^{(k)}$.
This basis can then be used to construct an iterate $x^{(k)}$ that satisfies a "$k$-dimensional" criterion,
e.g., $\|r^{(k)}\|_2$ is minimized over $x^{(0)} + K_k(r^{(0)}, A)$ or $r^{(k)}$ is orthogonal to $K_k(r^{(0)}, A)$.
The work and storage requirements of such a computation grow like $O(kn)$, which will become
prohibitive for large $k$. To avoid this difficulty, the $k$-dimensional criterion can be modified, either
by truncating the space or by restarting the algorithm. For example, at step $k$, the (minimization or orthogonality) condition can be imposed on an $m$-dimensional subspace of $K_k(r^{(0)}, A)$,
where $m$ is independent of $k$. An example of such a truncated algorithm is Orthomin($m$) [19].
Alternatively, the $k$-dimensional condition can be imposed as long as $k \le m$, at which point
the iteration is restarted with $x^{(m)}$ as a new initial guess. This strategy, with a minimization
condition, has been demonstrated to have superior convergence characteristics. The most popular implementation of it is the restarted version of the generalized minimal residual algorithm
(GMRES), developed by Saad and Schultz [66].
GMRES is a generalization of the variants of the conjugate gradient method based on the
connection between CG and the Lanczos algorithm. It replaces the symmetric Lanczos recurrence (24) with the Arnoldi algorithm [77], which, given $v^{(0)}$ with $\|v^{(0)}\|_2 = 1$, constructs an
orthonormal basis for $\mathrm{span}\{v^{(0)}, Av^{(0)}, \ldots, A^{k-1}v^{(0)}\}$ as follows:
The Arnoldi Method.
    Choose $v^{(0)}$ with $\|v^{(0)}\|_2 = 1$
    for $s = 0$ until $k - 1$ do
        $w_0^{(s+1)} = Av^{(s)}$
        for $r = 0$ until $s$ do
            $h_{rs} = (w_r^{(s+1)}, v^{(r)})$
            $w_{r+1}^{(s+1)} = w_r^{(s+1)} - h_{rs}v^{(r)}$
        enddo
        $h_{s+1,s} = \|w_{s+1}^{(s+1)}\|_2$
        $v^{(s+1)} = w_{s+1}^{(s+1)}/h_{s+1,s}$
    enddo
This computation is analogous to the modified Gram-Schmidt process. (Cf. [75] for an alternative method for generating an orthonormal basis that has superior stability characteristics.)
Let $V_k = [v^{(0)}, \ldots, v^{(k-1)}]$, and let $H_k = [h_{rs}]$, $0 \le r, s \le k - 1$. By construction, $H_k$ is an
upper-Hessenberg matrix, and
$$AV_k = V_kH_k + h_{k,k-1}[0, \ldots, 0, v^{(k)}]. \qquad (33)$$
The Arnoldi method for eigenvalues is to use the eigenvalues of $H_k = V_k^TAV_k$ as estimates for
those of $A$; see [62, 77]. When $A$ is symmetric, $H_k$ reduces to the tridiagonal matrix produced
by the Lanczos algorithm, and (33) is identical to (25).
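A direct NumPy transcription of the Arnoldi process with modified Gram-Schmidt orthogonalization might look as follows. It is a sketch for small dense matrices; the function name is hypothetical, and breakdown (a zero normalization constant) is not handled.

```python
import numpy as np

def arnoldi(A, v0, k):
    """k steps of Arnoldi with modified Gram-Schmidt.

    Returns V (n x (k+1), orthonormal columns) and the (k+1) x k
    upper-Hessenberg matrix H satisfying A V[:, :k] = V H.
    """
    n = len(v0)
    V = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    V[:, 0] = v0 / np.linalg.norm(v0)
    for s in range(k):
        w = A @ V[:, s]
        for r in range(s + 1):
            H[r, s] = w @ V[:, r]          # projection on the r-th basis vector
            w = w - H[r, s] * V[:, r]      # remove that component immediately
        H[s + 1, s] = np.linalg.norm(w)
        V[:, s + 1] = w / H[s + 1, s]
    return V, H
```

The leading $k \times k$ block of the returned `H` is the matrix $H_k$ of the text, and its last row holds the single entry $h_{k,k-1}$.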
In a similar manner, the variant of MINRES derived from (31) can be generalized. Given
an initial guess $x^{(0)}$ for the solution to (1), let $r^{(0)} = b - Ax^{(0)}$ and $v^{(0)} = r^{(0)}/\|r^{(0)}\|_2$. Any
$x^{(k)} \in x^{(0)} + K_k(r^{(0)}, A)$ has the form $x^{(k)} = x^{(0)} + V_ky^{(k)}$. Let $\hat{H}_k$ denote the matrix of dimensions
$(k + 1) \times k$ containing $H_k$ in its first $k$ rows and $[0, \ldots, 0, h_{k,k-1}]$ in its last row. Then (33) is
equivalent to $AV_k = V_{k+1}\hat{H}_k$, and
$$r^{(k)} = r^{(0)} - AV_ky^{(k)} = V_{k+1}\left(\|r^{(0)}\|_2\varepsilon_1 - \hat{H}_ky^{(k)}\right), \qquad (34)$$
so that
$$\|r^{(k)}\|_2 = \left\|\,\|r^{(0)}\|_2\varepsilon_1 - \hat{H}_ky^{(k)}\right\|_2. \qquad (35)$$
The GMRES method computes $x^{(k)} \in x^{(0)} + K_k(r^{(0)}, A)$ such that $\|r^{(k)}\|_2$ is minimum. From
(35), the vector of coefficients $y^{(k)}$ that produce this iterate can be obtained by minimizing the
expression on the right of (35). As in the symmetric case, this upper-Hessenberg least squares
problem can be solved by transforming $\hat{H}_k$ into upper triangular form $\begin{pmatrix} R_k \\ 0 \end{pmatrix}$, where $R_k$ is upper
triangular, using $k$ plane rotations (which are also applied to $\|r^{(0)}\|_2\varepsilon_1$). Here, $\hat{H}_k$ contains
$\hat{H}_{k-1}$ as a submatrix, so that in a practical implementation, $R_k$ can be updated from $R_{k-1}$.
Moreover, by an analysis similar to that leading to (30), it can be shown that $\|r^{(k)}\|_2$ can be
obtained at essentially no cost. Hence, a step of the GMRES algorithm consists of constructing
a new Arnoldi vector $v^{(k-1)}$, determining the residual norm of the iterate $x^{(k)}$ that would be
obtained from $K_k(r^{(0)}, A)$, and then either constructing $x^{(k)}$ if the stopping criterion is satisfied,
or proceeding to the next step otherwise.
As we have noted, this computation may become too expensive as k gets large. The restarted
GMRES algorithm, GMRES(m), restarts the computation every m steps, as follows.
The Restarted GMRES Method.
1. Choose $x^{(0)}$, compute $r^{(0)} = b - Ax^{(0)}$, $v^{(0)} = r^{(0)}/\|r^{(0)}\|_2$.
2. Compute $\{v^{(1)}, \ldots, v^{(m-1)}\}$ using the Arnoldi algorithm.
3. Compute $y^{(m)}$ such that $\left\| \|r^{(0)}\|_2 e_1 - \hat H_m y^{(m)} \right\|_2$ is minimum, and $x^{(m)} = x^{(0)} + V_m y^{(m)}$.
4. Compute $r^{(m)} = b - Ax^{(m)}$. If the stopping criterion is met then stop;
otherwise, set $x^{(0)} = x^{(m)}$, $r^{(0)} = r^{(m)}$, $v^{(0)} = r^{(0)}/\|r^{(0)}\|_2$, and repeat step 2.
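The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation described in the text: the function name `gmres_restarted`, its arguments, and the breakdown threshold are hypothetical, and the small least squares problem of step 3 is solved with a dense `numpy.linalg.lstsq` call rather than with the incrementally updated plane rotations discussed earlier.

```python
import numpy as np

def gmres_restarted(A, b, x0=None, m=20, tol=1e-10, max_restarts=50):
    """Sketch of GMRES(m); A only needs to support the product A @ v."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else x0.astype(float).copy()
    bnorm = np.linalg.norm(b) or 1.0
    for _ in range(max_restarts):
        r = b - A @ x                       # steps 1 and 4: current residual
        beta = np.linalg.norm(r)
        if beta <= tol * bnorm:             # stopping criterion
            break
        V = np.zeros((n, m + 1))            # Arnoldi vectors
        H = np.zeros((m + 1, m))            # (m+1) x m Hessenberg matrix
        V[:, 0] = r / beta
        k_used = m
        for k in range(m):                  # step 2: Arnoldi process
            w = A @ V[:, k]
            for i in range(k + 1):          # modified Gram-Schmidt
                H[i, k] = V[:, i] @ w
                w -= H[i, k] * V[:, i]
            H[k + 1, k] = np.linalg.norm(w)
            if H[k + 1, k] < 1e-14 * beta:  # breakdown: Krylov space is invariant
                k_used = k + 1
                break
            V[:, k + 1] = w / H[k + 1, k]
        # step 3: minimize || beta*e_1 - H y ||_2 (dense solve, for clarity only)
        rhs = np.zeros(k_used + 1)
        rhs[0] = beta
        y, *_ = np.linalg.lstsq(H[:k_used + 1, :k_used], rhs, rcond=None)
        x = x + V[:, :k_used] @ y
    return x

# Small nonsymmetric test problem (arbitrary illustrative data).
rng = np.random.default_rng(0)
n = 40
A = 4.0 * np.eye(n) + rng.standard_normal((n, n)) / np.sqrt(n)
b = rng.standard_normal(n)
x = gmres_restarted(A, b, m=10)
rel_res = np.linalg.norm(b - A @ x) / np.linalg.norm(b)
```

The restart length `m` bounds the storage at `m + 1` vectors of length `n`, which is the trade-off motivating the restarted variant.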
We summarize the main convergence properties of GMRES in the following result. See [66]
and [19] for proofs.
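Theorem 3.1. Let $x^{(k)}$ denote the iterate generated after $k$ steps of GMRES, with residual $r^{(k)} = b - Ax^{(k)}$.
(i) The residual is zero (i.e. the exact solution has been obtained) if and only if the Arnoldi iteration breaks down with $h_{k,k-1} = 0$. In particular, the exact solution is obtained in at most $n$ steps.
(ii) The residual norms satisfy $\|r^{(k)}\|_2 = \min_{\phi_k \in P_k} \|\phi_k(A) r^{(0)}\|_2$.
(iii) If $A$ is diagonalizable, $A = X\Lambda X^{-1}$ where $\Lambda$ is the diagonal matrix of eigenvalues of $A$, then
$$\|r^{(k)}\|_2 \le \|X\|_2 \|X^{-1}\|_2 \min_{\phi_k \in P_k} \max_{\lambda_j \in \sigma(A)} |\phi_k(\lambda_j)| \, \|r^{(0)}\|_2.$$
(iv) Let $M = (A + A^T)/2$ denote the symmetric part of $A$ and $R = A - M$ its skew-symmetric part. If $M$ is positive-definite, then
$$\|r^{(k)}\|_2 \le \left(1 - \frac{\lambda_{\min}(M)^2}{\lambda_{\min}(M)\lambda_{\max}(M) + \rho(R)^2}\right)^{k/2} \|r^{(0)}\|_2. \tag{36}$$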
Assertions (ii) and (iii) follow from the optimality of GMRES with respect to the residual norm. Assertion (i) guarantees that GMRES and GMRES(m) will solve any nonsingular problem provided that the dimensions of the Krylov spaces are large enough. This differentiates GMRES from many other Krylov subspace methods; for example, Orthomin and the generalized conjugate residual method (GCR) [19] may break down without producing the exact solution if the symmetric part of $A$ is indefinite. See [66] for further details. Assertion (iv) is a consequence of the fact that when $M$ is positive-definite,
$$\|r^{(j+1)}\|_2 \le \left(1 - \frac{\lambda_{\min}(M)^2}{\lambda_{\min}(M)\lambda_{\max}(M) + \rho(R)^2}\right)^{1/2} \|r^{(j)}\|_2;$$
see [19]. We also note that bound (36) holds for restarted GMRES, i.e. when $r^{(k)}$ is the residual obtained after $s$ sets of GMRES(m) computations, $r^{(0)}$ is the initial residual for the first GMRES(m) computation, and $k = sm$.
The Lanczos process was originally defined as a method for reducing an arbitrary nonsymmetric matrix $A$ to tridiagonal form, using two sets of biorthogonal vectors. (See [33, 56, 77] for details and additional references.) The symmetric version (26) is a special case. Let $v^{(0)}$ and $w^{(0)}$ be vectors such that $(v^{(0)}, w^{(0)}) = 1$, and $v^{(-1)} = w^{(-1)} = 0$. Then the nonsymmetric Lanczos process is given by
v~(j +1) = Av (j) j v(j)
j v(j 1) ;
w~ (j +1) = AT w(j) j w(j ) j w(j 1) ;
v (j +1) = j +1 v~(j ); w(j+1) =
j +1w~ (j +1) ;
where j = (w(j ); Av (j )) and
j +1 and j +1 are chosen so that (w(j +1) ; v (j +1)) = 1. Letting
Vk [v (0); : : :; v (k 1)], Wk [w(0); : : :; w(k 1)] and Gk tri [
j ; j ; j +1 ]; 0 j k 1, we
have the following relations:
(37)
The Lanczos algorithm for solving nonsymmetric linear systems uses $v^{(0)} = r^{(0)}/\|r^{(0)}\|_2$ and computes $x^{(k)} \in x^{(0)} + \mathcal{K}_k(v^{(0)}, A)$ such that $r^{(k)}$ is orthogonal to $\mathrm{span}\{w^{(j)}\}_{j=0}^{k-1}$. Using the relations (37), we see that
$$x^{(k)} = x^{(0)} + V_k y^{(k)}, \tag{38}$$
where $y^{(k)}$ is the solution of the tridiagonal linear system $G_k y^{(k)} = \|r^{(0)}\|_2 e_1$ of order $k$. Thus, this method imposes an orthogonality condition on a pair of spaces of increasing dimension, using two short-term recurrences. It is also possible to derive a variant of this scheme, called the biconjugate gradient method, that bears essentially the same relation to it as CG does to SYMMLQ; see e.g. [27, 29, 43, 63] for details.
Unfortunately, there are several difficulties with this class of methods. First, there is no guarantee that they will not break down. See [38, 43, 55] for analysis of this issue. In particular, it may happen that
$$(\tilde v^{(j+1)}, \tilde w^{(j+1)}) = 0, \tag{39}$$
so that the scalars that normalize $v^{(j+1)}$ and $w^{(j+1)}$ cannot be defined. In practice, such an exact breakdown is unlikely to occur, but a near breakdown, where $(\tilde v^{(j+1)}, \tilde w^{(j+1)}) \approx 0$, also leads to numerical instability. Moreover, empirical studies have shown that even when breakdown or near breakdown does not occur, convergence behavior is very erratic, i.e. residual norms may vary dramatically before a stopping criterion is met. As a result, although biorthogonality methods have been developed essentially in parallel with CG, they have been far less popular than minimal residual methods that use only a single Krylov space. Recently, however, there has been a resurgence of interest in these methods, due to several attempts to stabilize them and make them more robust. We describe one such algorithm, the quasi-minimal residual algorithm (QMR) developed by Freund and Nachtigal [31]. Other related techniques include Sonneveld's conjugate gradient squared method [67], Van der Vorst's BI-CGSTAB method [73], and "transpose free" quasi-minimal residual methods developed by Chan, de Pillis and Van der Vorst [7] and Freund [28].
The QMR approach addresses the two deficiencies of the Lanczos algorithm, its erratic convergence behavior and the instability associated with orthogonality or near orthogonality of a pair of Lanczos vectors $\tilde v^{(j+1)}$ and $\tilde w^{(j+1)}$. Our description follows [29]. Let us consider the issue of erratic convergence, assuming for the moment that no breakdown has occurred in the first $k$ steps. Conditions (37)-(38) imply that the residual generated by the Lanczos iteration satisfies
$$r^{(k)} = V_{k+1}\left(\|r^{(0)}\|_2 e_1 - \hat G_k y^{(k)}\right), \tag{40}$$
where $\hat G_k$ is the matrix of dimensions $(k+1) \times k$ containing $G_k$ in its first $k$ rows and a last row that is zero except for its final entry. Note the similarity to the residual produced by GMRES (34), except that here $V_{k+1}$ is not an orthogonal matrix, so that $\|r^{(k)}\|_2$ cannot be minimized inexpensively. The idea of quasi-minimality is to choose $y^{(k)}$ so that $\left\| \|r^{(0)}\|_2 e_1 - \hat G_k y^{(k)} \right\|_2$ is minimized instead. As in the case of GMRES, $G_k$ is an upper-Hessenberg matrix (in fact, it is tridiagonal), and $G_{k-1}$ is the leading principal submatrix of $G_k$, so that this minimization problem can be solved inexpensively from one step to the next.
The iterate $x^{(k)}$ could be computed using (38). However, because the columns of $V_{k+1}$ are not orthogonal, $\|r^{(k)}\|_2$ cannot be monitored inexpensively. As a result, this update of $x^{(k)}$ will be costly, of order $O(kn)$ at step $k$. Fortunately, $x^{(k)}$ can be updated from $x^{(k-1)}$ using short recurrences (see footnote 4). Let $Q_0, \ldots, Q_k$ denote the series of plane rotations that transform the subdiagonal of $\hat G_k$ to zero, i.e.,
$$Q_k \cdots Q_1 Q_0 \hat G_k = \begin{pmatrix} R_k \\ 0 \end{pmatrix},$$
where $R_k$ is an upper triangular matrix of order $k$ containing three nonzero bands. The QMR coefficients are given by $y^{(k)} = R_k^{-1} \hat z^{(k)}$, where $\hat z^{(k)}$ is the vector of length $k$ containing the first $k$ entries of $z^{(k)} = Q_k \cdots Q_1 Q_0 \, \|r^{(0)}\|_2 e_1$. Let $P_k = V_k R_k^{-1} = [p^{(0)}, p^{(1)}, \ldots, p^{(k-1)}]$. Note that the first $k-1$ entries of $z^{(k)}$ and $z^{(k-1)}$ are identical, and $R_{k-1}$ is the leading principal submatrix of $R_k$. Consequently,
$$x^{(k)} = x^{(0)} + P_k \hat z^{(k)} = x^{(k-1)} + p^{(k-1)} \hat z^{(k)}_{k-1}. \tag{41}$$
But $P_k R_k = V_k$, so that $p^{(k-1)}$ can be constructed using a short recurrence involving only $v^{(k-1)}$, $p^{(k-2)}$ and $p^{(k-3)}$. Therefore, a practical implementation of QMR updates $x^{(k)}$ using (41) and the efficient computation of $p^{(k-1)}$.
Footnote 4: The construction outlined here is essentially the one used by SYMMLQ [54].
The modication to handle breakdown changes things slightly. It is based on the fact that
in cases where (near) breakdown (v (j ); w(j )) 0 occurs, it still may be possible to construct
alternative sets of vectors fv (j ) ; : : :; v (j +l 1)g and fw(j ); : : :; w(j +l 1)g such that
[w(j ); : : :; w(j +l 1)]T [v (j ); : : :; v (j +l 1)] is nonsingular;
fv(0); : : :; v(j+l 1)g = spanfv(0); Av(0); : : :; Aj+l 1v(0)g;
fw(0); : : :; w(j+l 1)g = spanfv(0); Av(0); : : :; Aj+l 1v(0)g:
That is, for some number l of steps, the biorthogonality condition cannot be imposed, but bases
for two Krylov subspaces can still be constructed. After this, new vectors v (j +l) and w(j +l) can
be found that augment the Krylov spaces and satisfy
(v (j +l); w(j +l)) 6= 0;
v (j +l) is orthogonal to Kj+l (v(0); A); and
w(j +l) is orthogonal to Kj+l (~v(0); AT ):
The result is two sets of vectors fv (j )gkj =01 and fw(j ) gkj =01 , grouped into blocks,
[V0; V1; : : :; VK 1 ] = [vn ; : : :; vn 1 ; vn ; : : :; vn 1 ; : : :; vnK ; : : :; vk 1 ];
[W0 ; W1; : : :; WK 1 ] = [wn ; : : :; wn 1 ; wn ; : : :; wn 1 ; : : :; wnK ; : : :; wk 1];
(where n0 = 0), which satisfy
(
= j;
T
Vi Wj = 0Di ifif ii 6=
j;
0
with Di nonsingular.
The form of the look-ahead Lanczos computation within block r + 1 is
(42)
(43)
otherwise. The vectors are normalized by v (j +1) = v~(j +1)=j +1 , w(j +1) = w~ (j +1)=j +1 , where
j+1 = kv~(j +1)k2 , j+1 = kw~ (j +1)k2. We refer the reader to [30] for additional details of this
construction, e.g. methods to determine whether (42) or (43) should be used, and denitions of
j and j . By construction, the vectors satisfy all the conditions of (37) except the last, where Gk
is now an upper-Hessenberg matrix of block tridiagonal form. Consequently, (40) is still satisfied (with $\hat G_k$ defined in an identical manner as above), and quasi-minimality can be imposed in exactly the same way. The length of the recurrence used to update $p^{(k-1)}$ depends on the number of look-ahead steps needed.
This process prevents the occurrence of breakdown (39) and near breakdown. Except in special cases, the computations (42) and (43) will terminate at some step $m$ with either $\tilde v^{(m+1)} = 0$ or $\tilde w^{(m+1)} = 0$, in which case the process has produced an invariant subspace of either $A$ or $A^T$. If it terminates with $\tilde v^{(m+1)} = 0$, then $x^{(m)}$ is the exact solution; if it terminates with $\tilde w^{(m+1)} = 0$, then the algorithm must be restarted. The special case where termination never occurs, known as "incurable breakdown," is unlikely to occur in floating point arithmetic and we will not consider this issue; see [57, 70].
Some of the important convergence properties of QMR are summarized as follows; see [30, 31] for proofs.
Theorem 3.2. Let $x^{(k)}$ denote the iterate generated after $k$ steps of QMR, with residual $r^{(k)} = b - Ax^{(k)}$.
(i) Let $G_m$ denote the upper-Hessenberg matrix generated by the look-ahead Lanczos process, and assume $G_m$ is diagonalizable, where $X_m$ is the matrix whose columns are the eigenvectors of $G_m$. For $k < m$,
$$\|r^{(k)}\|_2 \le \|V_{k+1}\|_2 \, \|X_m\|_2 \, \|X_m^{-1}\|_2 \min_{\phi_k \in P_k} \max_{\lambda_j \in \sigma(A)} |\phi_k(\lambda_j)| \, \|r^{(0)}\|_2.$$
(ii) Let $r^{(k)}_{GMRES}$ denote the residual produced by $k$ steps of GMRES. Then
$$\|r^{(k)}\|_2 \le \|V_{k+1}\|_2 \, \|r^{(k)}_{GMRES}\|_2 \le \sqrt{k+1} \, \|r^{(k)}_{GMRES}\|_2.$$
We briefly contrast some of the properties of the two classes of methods considered in this section. As we have noted, one of the main differences between GMRES and the methods based on biorthogonality is that GMRES imposes a true minimization, but at the cost of storing and using a set of vectors of increasing size. One other significant difference is in the use of the coefficient matrix. GMRES requires one matrix-vector product per step. The methods considered here, the nonsymmetric Lanczos method and QMR, require two matrix-vector products, one by $A$ and one by $A^T$. This will play a significant role in evaluating the relative costs of the two classes of methods. Moreover, it may happen that performing a product with $A^T$ is more expensive than performing one with $A$. Several other versions of biorthogonalization methods avoid reference to $A^T$. The conjugate gradient squared [67] and BI-CGSTAB methods [73] require two matrix-vector products by $A$ at each step but have no minimization properties. The "transpose free" quasi-minimal residual methods [7, 28] have such a property, but at a cost of three matrix-vector products by $A$ at each step. We emphasize that this is an area of active research, and no single method has been demonstrated to be clearly superior.
$$M^{-1}Ax = M^{-1}f. \tag{44}$$
The idea of preconditioning is to apply a Krylov subspace method to (44), the preconditioned system, with the aim of computing the solution more efficiently than if the method is applied to (1). In this section, we describe some effective preconditioning techniques for sparse linear systems, and we describe the construction of the reduced system for matrices with Property A.
In order for a preconditioning operator $M$ to be effective, it must satisfy two criteria:
1. The number of iterations required to solve (44) should be smaller than the number of iterations needed to solve (1).
2. The cost per iteration of the Krylov subspace method applied to (44) should not be significantly higher than the cost per iteration for (1).
For the first requirement to hold, $M^{-1}$ should in some sense be a good approximation to $A^{-1}$. For the second requirement, the cost per step should be low enough so that the total cost (essentially, the product of the cost per step and the number of steps) of the preconditioned iteration is lower than if the preconditioning operator is not used. In an implementation, the coefficient matrix is referenced by way of one or more matrix-vector products, so that what is required is that the action of the inverse of the preconditioner applied to a vector, $w \leftarrow M^{-1}v$, be inexpensive to compute.
Recall that for symmetric positive-definite problems, the number of iterations required for convergence of the conjugate gradient method is approximately proportional to the square root of the condition number of the coefficient matrix. Thus, one way of making more precise the notion that $M^{-1}$ be a "good approximation" to $A^{-1}$ is to require $\kappa(M^{-1}A) \ll \kappa(A)$ (see footnote 5). More generally, the iterates $x^{(k)}$ produced by a Krylov subspace method applied to (1) have errors that satisfy
$$e^{(k)} = \phi_k(A)e^{(0)},$$
where $\phi_k \in P_k$. Therefore
$$\|e^{(k)}\| = \|\phi_k(A)e^{(0)}\| \le \|\phi_k(A)\| \, \|e^{(0)}\|.$$
The method converges rapidly if $\|\phi_k(A)\|$ is small for small $k$. If, in particular, $A$ is diagonalizable, $A = X\Lambda X^{-1}$ where $\Lambda$ is a diagonal matrix containing the eigenvalues of $A$ on its main diagonal, then
$$\|\phi_k(A)\| \le \|X\| \, \|X^{-1}\| \, \|\phi_k(\Lambda)\|.$$
If, in addition, $\|X\| \, \|X^{-1}\|$ is not too large (for example, $\|X\|_2 \|X^{-1}\|_2 = 1$ when $A$ is a normal matrix) then $\|\phi_k(A)\|$ is small provided $\phi_k(\lambda)$ is small for $\lambda \in \sigma(A)$. Thus, a good approximation $M^{-1}$ of $A^{-1}$ is one for which $M^{-1}A$ is close to normal and the eigenvalues of $M^{-1}A$ are close to one. In such a case, it will be easier to find a polynomial $\phi_k$ that is small on $\sigma(M^{-1}A)$ than to find one that is small on $\sigma(A)$. In cases where $M^{-1}A$ is far from normal, new analytic techniques based on pseudospectra [71, 72] provide insight into properties of preconditioners.
Footnote 5: Strictly speaking, for preconditioned CG, we really want the preconditioned problem also to be symmetric positive-definite. The preconditioned system can be formally represented as $C^{-1}AC^{-T}\hat x = C^{-1}f$, where $M = CC^T$ is symmetric and $x = C^{-T}\hat x$. It is not necessary for $M$ to be represented in factored form in an implementation; see §4.2.
Preconditioning can also be related to splitting operators and stationary methods. Recall that if $A = M - N$, then the stationary method defined by (3) or (6) is rapidly convergent if $\rho(M^{-1}N) = \rho(I - M^{-1}A)$ is small, i.e., if $M^{-1}A$ is in some sense close to the identity. Thus, a good preconditioning operator $M$ is essentially the same thing as a good splitting operator used to generate stationary iterations. Moreover, consider the modification of (6) defined by introducing an iteration parameter $\alpha$,
$$x^{(k+1)} = x^{(k)} + \alpha M^{-1}(b - Ax^{(k)}). \tag{45}$$
The new error satisfies $e^{(k+1)} = (I - \alpha M^{-1}A)e^{(k)}$, and more generally, $e^{(k)} = (I - \alpha M^{-1}A)^k e^{(0)}$. If, for an appropriate choice of $\alpha$, $\rho(I - \alpha M^{-1}A)$ is smaller than $\rho(M^{-1}N)$, then convergence of the stationary method will be accelerated. We could further refine the iteration by varying the value of $\alpha$ from step to step, i.e., replacing $\alpha$ by $\alpha_k$ in (45). The error at the $k$th step then has the form
$$e^{(k)} = (I - \alpha_{k-1}M^{-1}A) \cdots (I - \alpha_0 M^{-1}A)e^{(0)}.$$
This expression is simply a specific way of specifying a polynomial $\phi_k \in P_k$, in factored form. This requires a strategy for choosing the parameters $\{\alpha_j\}$. Krylov subspace methods construct such "acceleration polynomials" automatically.
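The error relation for the parameterized stationary iteration can be checked numerically. The sketch below assumes the iteration has the form $x^{(k+1)} = x^{(k)} + \alpha M^{-1}(b - Ax^{(k)})$ with $M$ taken to be the diagonal of $A$; the symbol `alpha`, the function name, and the test matrix are illustrative choices, not taken from the text.

```python
import numpy as np

def stationary_iteration(A, b, m_diag, alpha, steps, x0):
    """Run `steps` sweeps of x <- x + alpha * M^{-1} (b - A x), with M diagonal."""
    x = x0.copy()
    for _ in range(steps):
        x = x + alpha * (b - A @ x) / m_diag
    return x

n = 8
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1D Laplacian
b = np.ones(n)
x_exact = np.linalg.solve(A, b)

m_diag = np.diag(A).copy()     # Jacobi splitting: M = diag(A)
alpha, k = 0.9, 5
x0 = np.zeros(n)
xk = stationary_iteration(A, b, m_diag, alpha, k, x0)

# Error propagation matrix I - alpha * M^{-1} A, applied k times to e^(0).
E = np.eye(n) - alpha * (A / m_diag[:, None])
predicted_error = np.linalg.matrix_power(E, k) @ (x0 - x_exact)
actual_error = xk - x_exact
```

The two error vectors agree to rounding error, which is the statement that the iteration applies a fixed polynomial of degree `k` in $M^{-1}A$ to the initial error.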
In this section we give some examples of preconditioners that can be applied to arbitrary sparse matrices. These are all of algebraic type, i.e., defined by means of some algebraic computation on the elements of the original matrix $A$.
The (parameterized) incomplete factorization [4, 6, 18, 37, 49]. Incomplete factorization techniques are motivated by the idea that for very large sparse matrices, where direct solution techniques based on LU decomposition of $A$ are not viable, it may be possible to compute sparse approximations $L$ and $U$ to the factors of $A$ so that the product represents a reasonable approximation of $A$. Let $\mathcal{N} \subseteq \{(i,j) \mid 1 \le i, j \le n\}$ be an index set containing all diagonal indices $(i,i)$. The parameterized incomplete factorization based on $\mathcal{N}$ has the form $M = LU$, where $L$ is a lower triangular matrix, $U$ is a unit upper triangular matrix, and $l_{ij} = 0$ and $u_{ij} = 0$ for $(i,j) \notin \mathcal{N}$. If $\mathcal{N}$ contains few indices, then $L$ and $U$ are sparse and applying $(LU)^{-1}$ will be inexpensive. The
sij
else
lii
endif
enddo
lii + sij
uii
1
for j = i + 1 until n do
uij
enddo
enddo
uij =lii
else
lii
lii + sij
endif
enddo
uii
1
for j = i + 1 until n do
uij
enddo
enddo
uij =lii
$$M = (D - \omega L),$$
and the SSOR splitting matrix
$$M = (D - \omega L)D^{-1}(D - \omega U),$$
respectively. Note that preconditioning is not affected by multiplication by a scalar, whence the difference between these operators and those of (7) and (10). When $A$ is symmetric and $D$ is positive definite, the SSOR preconditioner can be applied so that the preconditioned problem is symmetric. In contrast, the SOR operator has not been a popular choice of a preconditioner, in part because the SOR-preconditioned operator cannot be made symmetric. However, we have found it to be effective for some nonsymmetric problems [21, 22, 23]; see §5.
All of these methods share the advantage of being very general, in the sense that they can be defined for arbitrary matrices. For the incomplete factorizations, it is necessary that the pivot elements $l_{ii}$ be nonzero. This can be established for the ILU factorization (parameter value 0) when $A$ is an $M$-matrix ($a_{ij} \le 0$ for $i \ne j$ and $A^{-1} \ge 0$) [49], but in general, the size of the pivot element must be monitored during the course of the factorization. We know of no study demonstrating a clear advantage of either of the strategies for incomplete factorization; an advantage of the strategy based on level of fill is that the storage requirements of the factors can be determined symbolically, whereas the method based on drop tolerance requires the numerical computation. The advantage of the SSOR and SOR preconditionings is that they are well-defined as long as the (block) diagonal of $A$ is nonsingular. The effectiveness of all of these preconditioners for certain model problems arising from partial differential equations is well understood; see §5. In this regime, they are often not the optimal choices; methods more closely based on properties of the differential operators may produce better asymptotic results, e.g. [11].
As noted above, the preconditioned conjugate gradient method (PCG) can be implemented without explicit reference to factors of $M$. An implementation is given below; see [13, 33]. It is straightforward to show that if $M = CC^T$, then this scheme is equivalent to solving $C^{-1}AC^{-T}\hat x = C^{-1}f$ by CG, where $x^{(k)} = C^{-T}\hat x^{(k)}$.
The Preconditioned Conjugate Gradient Method.
Choose $x^{(0)}$; compute $r^{(0)} = b - Ax^{(0)}$; solve $M\tilde r^{(0)} = r^{(0)}$; set $p^{(0)} = \tilde r^{(0)}$
for $k = 0$ until convergence do
    $\alpha_k = (r^{(k)}, \tilde r^{(k)}) / (p^{(k)}, Ap^{(k)})$
    $x^{(k+1)} = x^{(k)} + \alpha_k p^{(k)}$
    $r^{(k+1)} = r^{(k)} - \alpha_k Ap^{(k)}$
    <Test for convergence>
    Solve $M\tilde r^{(k+1)} = r^{(k+1)}$
    $\beta_{k+1} = (r^{(k+1)}, \tilde r^{(k+1)}) / (r^{(k)}, \tilde r^{(k)})$
    $p^{(k+1)} = \tilde r^{(k+1)} + \beta_{k+1} p^{(k)}$
enddo
The cost associated with preconditioning is then precisely the cost of applying the action of $M^{-1}$ (see footnote 6). A similar algorithm can be developed for the symmetric indefinite problem, e.g. using the orthogonal direction minimum residual method.
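The PCG recurrences above translate almost line for line into NumPy. The following is a minimal sketch under stated assumptions: the name `pcg` and the callback `solve_M` (which returns $M^{-1}r$ for a given vector $r$) are hypothetical, the convergence test uses the Euclidean residual norm, and the demonstration uses a simple diagonal (Jacobi) preconditioner rather than any of the factorizations discussed in this section.

```python
import numpy as np

def pcg(A, b, solve_M, x0=None, tol=1e-10, maxiter=500):
    """Preconditioned CG sketch; solve_M(r) must return the solution of M z = r."""
    x = np.zeros_like(b, dtype=float) if x0 is None else x0.astype(float).copy()
    r = b - A @ x
    z = solve_M(r)                 # preconditioned residual r~
    p = z.copy()
    rz = r @ z
    bnorm = np.linalg.norm(b) or 1.0
    for k in range(maxiter):
        if np.linalg.norm(r) <= tol * bnorm:   # test for convergence
            return x, k
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        z = solve_M(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p  # beta = (r_new, z_new) / (r_old, z_old)
        rz = rz_new
    return x, maxiter

# Symmetric positive-definite test matrix with a strongly varying diagonal.
n = 50
A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
     + np.diag(np.linspace(0.0, 10.0, n)))
b = np.ones(n)
d = np.diag(A).copy()
x, iterations = pcg(A, b, lambda r: r / d)
rel_res = np.linalg.norm(b - A @ x) / np.linalg.norm(b)
```

Only the action of $M^{-1}$ on a vector is needed, so the preconditioner is passed as a function rather than as a matrix or as factors.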
For nonsymmetric problems, there is some flexibility in how the preconditioned problem may be formulated, with three different "orientations" possible:
$$\begin{array}{ll}
\text{Left orientation} & [M^{-1}A]\,[x] = [M^{-1}f], \\
\text{Two-sided orientation} & [M_1^{-1}AM_2^{-1}]\,[M_2x] = [M_1^{-1}f], \\
\text{Right orientation} & [AM^{-1}]\,[Mx] = [f].
\end{array} \tag{46}$$
Here, the two-sided orientation requires an explicit representation of $M$ as a product $M = M_1M_2$. If such a factorization is not available, then only the left and right orientations are possible.
In general, this issue of orientation does not play a significant role in the cost per step of any algorithm. For example, left orientation requires a preconditioned matrix-vector product of the form
$$w_1 \leftarrow Av, \qquad w \leftarrow M^{-1}w_1,$$
whereas right orientation requires
$$w_1 \leftarrow M^{-1}v, \qquad w \leftarrow Aw_1,$$
so that the costs of the two computations are the same. However, for Krylov subspace methods such as GMRES that minimize the Euclidean norm of the residual, the orientation has an effect. In all cases, the preconditioned problem can be written as
$$\hat A\hat x = \hat f,$$
where $\hat A$, $\hat x$, and $\hat f$ are the quantities in brackets of (46). Thus, if $\hat x^{(k)}$ is the $k$th iterate, then $\|\hat r^{(k)}\|_2$ is minimized over the space $\mathcal{K}_k(\hat r^{(0)}, \hat A)$. Each of the orientations has some potential advantages.
Footnote 6: If $\|r^{(k)}\|_2$ is used in the stopping test, then one extra inner product is needed, since in contrast to CG, this quantity is no longer available as a byproduct of the computation.
1. The quantity minimized for the right oriented system is the norm of the residual of the original system (1), i.e., $\|\hat r^{(k)}\|_2 = \|b - Ax^{(k)}\|_2$, where $x^{(k)} = M^{-1}\hat x^{(k)}$.
2. If $M$ is a good preconditioner, in the sense that $M^{-1}A \approx I$, then for the left-oriented preconditioner, $\hat r^{(k)} = M^{-1}Ae^{(k)} \approx e^{(k)}$, so that $\|\hat r^{(k)}\|_2$ may represent a good approximation to $\|e^{(k)}\|_2$.
3. For some nonsymmetric problems, it is possible to construct symmetric preconditioners for which there is a more robust analysis for the two-sided orientation than exists for the other two orientations. (See [24].)
When convergence analysis of the type alluded to in item (3) is not possible, we have preferred the right orientation, and its property of minimizing the residual norm associated with (1).
4.3. Preprocessing.
Finally, we mention a technique for preprocessing, which, although not strictly speaking a preconditioner like those of the rest of this section, may significantly improve the performance of iterative methods. Suppose the system (1) has point Property A, i.e., it is of the form
$$\begin{pmatrix} D_1 & C_1 \\ C_2 & D_2 \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}, \tag{47}$$
where $D_1$ and $D_2$ are diagonal matrices. Premultiplying (47) by
$$\begin{pmatrix} I & 0 \\ -C_2D_1^{-1} & I \end{pmatrix}$$
produces the equivalent problem
$$\begin{pmatrix} D_1 & C_1 \\ 0 & D_2 - C_2D_1^{-1}C_1 \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 - C_2D_1^{-1}b_1 \end{pmatrix}.$$
This is the same thing as performing one step of block Gaussian elimination, and $u_2$ is then the solution of the reduced system
$$(D_2 - C_2D_1^{-1}C_1)u_2 = b_2 - C_2D_1^{-1}b_1.$$
The decoupled unknowns $u_1$ can be recovered by solving $D_1u_1 = b_1 - C_1u_2$. If $C_1$ and $C_2$ are sparse, then the preprocessing step has modest cost, and the reduced matrix $D_2 - C_2D_1^{-1}C_1$ will also be sparse, so that the reduced system can also be solved using a preconditioned Krylov space method. We will consider examples of this process in §5.
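The reduction can be illustrated on a small dense example. The sketch below assumes $D_1$ and $D_2$ are diagonal and stores them as vectors; all sizes, names, and random data are illustrative. It forms the reduced matrix, solves for $u_2$, recovers $u_1$, and compares the result with a direct solve of the full system.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 4, 3

# Diagonal blocks stored as vectors; off-diagonal coupling blocks are dense here.
d1 = 4.0 + rng.random(n1)
d2 = 4.0 + rng.random(n2)
C1 = rng.random((n1, n2))
C2 = rng.random((n2, n1))
b1 = rng.random(n1)
b2 = rng.random(n2)

# Reduced system: (D2 - C2 D1^{-1} C1) u2 = b2 - C2 D1^{-1} b1.
S = np.diag(d2) - C2 @ (C1 / d1[:, None])
g = b2 - C2 @ (b1 / d1)
u2 = np.linalg.solve(S, g)

# Recover the decoupled unknowns from D1 u1 = b1 - C1 u2 (a diagonal solve).
u1 = (b1 - C1 @ u2) / d1

# Reference: solve the full block system directly.
A = np.block([[np.diag(d1), C1], [C2, np.diag(d2)]])
u_full = np.linalg.solve(A, np.concatenate([b1, b2]))
```

Because $D_1$ is diagonal, both the construction of the reduced matrix and the recovery of $u_1$ cost only scalings and one product with each coupling block.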
One of the main problem classes for which iterative methods have been developed is that of discrete second order elliptic boundary value problems. A model problem is
$$-\Delta u + \sigma u_x + \tau u_y = f \ \text{ on } \Omega, \qquad u = g \ \text{ on } \partial\Omega, \tag{48}$$
where $\Omega$ is a domain in $\mathbb{R}^2$. If at least one of $\sigma$, $\tau$ is nonzero, then the differential operator of (48) is non-self-adjoint, and discretization leads to a nonsymmetric linear system (1) (where $x$ now represents a discrete vector). If both $\sigma = 0$ and $\tau = 0$, then we have the Poisson equation
$$-\Delta u = f, \tag{49}$$
which gives rise to a symmetric positive-definite linear system. In this section, we outline some convergence results for the methods discussed above as they are applied to discretizations of these model problems.
Consider (49) first. Suppose this problem is discretized by finite differences or piecewise linear finite elements on a uniform $m \times m$ grid in $\Omega$, and the grid points are ordered using a natural left-to-right, bottom-to-top ordering. The result is a linear system with a block tridiagonal coefficient matrix
$$A = \mathrm{tri}[-I_m, T, -I_m] \tag{50}$$
of block order $m$, where
$$T = \mathrm{tri}[-1, 4, -1]$$
is of order $m$. Hence $A$ has order $n = m^2$. See [74] for additional details. It can be verified by direct computation that $A$ has a set of orthogonal eigenvectors $\{v^{(j)}\}_{j=1}^{n} = \{v^{(s+(t-1)m)}\}_{1 \le s,t \le m}$, with components
$$v^{(s+(t-1)m)}_{j+(k-1)m} = \sin\frac{sj\pi}{m+1} \, \sin\frac{tk\pi}{m+1}, \tag{51}$$
and corresponding eigenvalues
$$\lambda_{s+(t-1)m} = 4 - 2\cos\frac{s\pi}{m+1} - 2\cos\frac{t\pi}{m+1}. \tag{52}$$
Let $h = 1/(m+1)$. Consider the point Jacobi splitting operator $B_J^{pt} = D^{-1}(L+U)$, where $D$ is the diagonal of $A$, and $L$ and $U$ are the strict lower and upper triangular parts of $A$. Then the vectors defined by (51) are also the eigenvectors of $B_J^{pt}$, with eigenvalues
$$\frac12\cos\frac{s\pi}{m+1} + \frac12\cos\frac{t\pi}{m+1}, \qquad 1 \le s,t \le m.$$
The maximum corresponds to $s = t = 1$, so that the spectral radius of the Jacobi operator is
$$\rho(B_J^{pt}) = \cos(\pi h) \approx 1 - \frac{\pi^2h^2}{2}.$$
Moreover, $A$ is consistently ordered, so that Theorem 1.3 holds. (The ordering sets $S_k$ are defined by grouping diagonal sets of grid points, as in §1.2.) Therefore, for the point Gauss-Seidel method,
$$\rho(B_{GS}^{pt}) = (\rho(B_J^{pt}))^2 \approx 1 - \pi^2h^2,$$
and for the SOR method with the optimal parameter $\omega^*$,
$$\omega^* \approx \frac{2}{1+\pi h} \approx 2 - 2\pi h, \qquad \rho(L_{\omega^*}) \approx 1 - 2\pi h.$$
In light of (5), we see that the number of iterations required for the point Jacobi iteration to converge is approximately proportional to $2m^2/\pi^2$, and the point Gauss-Seidel iteration requires on the order of $m^2/\pi^2$ (half as many) steps. SOR with $\omega^*$ is dramatically faster, with iteration counts proportional to $m/(2\pi)$.
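These closed-form results are easy to confirm numerically for a small grid. The sketch below builds the five-point matrix with Kronecker products, compares its computed eigenvalues with formula (52), and checks that the point Jacobi spectral radius equals $\cos(\pi h)$. The grid size and variable names are arbitrary illustrative choices, and the optimal SOR parameter is computed from the standard relation $\omega^* = 2/(1 + \sqrt{1 - \rho_J^2})$ for consistently ordered matrices.

```python
import numpy as np

m = 8
h = 1.0 / (m + 1)
I = np.eye(m)
T = 4.0 * I - np.eye(m, k=1) - np.eye(m, k=-1)   # T = tri[-1, 4, -1]
K = np.eye(m, k=1) + np.eye(m, k=-1)             # block coupling pattern
A = np.kron(I, T) - np.kron(K, I)                # A = tri[-I, T, -I], order m^2

# Eigenvalues from the closed form versus a numerical eigensolve.
theta = np.arange(1, m + 1) * np.pi * h
lam_formula = np.sort((4.0 - 2.0 * np.cos(theta)[:, None]
                       - 2.0 * np.cos(theta)[None, :]).ravel())
lam_numeric = np.sort(np.linalg.eigvalsh(A))

# Point Jacobi iteration matrix D^{-1}(L + U) = I - D^{-1} A.
D = np.diag(np.diag(A))
B_jacobi = np.eye(m * m) - np.linalg.solve(D, A)
rho_jacobi = np.max(np.abs(np.linalg.eigvalsh(B_jacobi)))

rho_gauss_seidel = rho_jacobi ** 2                       # consistently ordered case
omega_opt = 2.0 / (1.0 + np.sqrt(1.0 - rho_jacobi ** 2))
rho_sor = omega_opt - 1.0
```

For `m = 8` the three spectral radii are already well separated, and the gap widens as `m` grows, in line with the asymptotic estimates.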
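Now consider the line Jacobi splitting $A = D - L - U$, where
$$D = \begin{pmatrix} D_1 & & \\ & \ddots & \\ & & D_m \end{pmatrix}$$
with $D_j = T$ for each $j$, and $L$ and $U$ are the remaining lower and upper triangular parts of $A$. It can be verified that the vectors of (51) are also the eigenvectors of the line Jacobi iteration matrix $B_J^{line} = D^{-1}(L+U)$, with
$$B_J^{line} v^{(s+(t-1)m)} = \frac{2\cos(t\pi/(m+1))}{4 - 2\cos(s\pi/(m+1))} \, v^{(s+(t-1)m)}.$$
The maximum again corresponds to $s = t = 1$, so that $\rho(B_J^{line}) = \cos(\pi h)/(2 - \cos(\pi h)) \approx 1 - \pi^2h^2$, $\rho(B_{GS}^{line}) = (\rho(B_J^{line}))^2 \approx 1 - 2\pi^2h^2$, and
$$\omega^* \approx \frac{2}{1 + \sqrt2\,\pi h} \approx 2 - 2\sqrt2\,\pi h, \qquad \rho(L^{line}_{\omega^*}) \approx 1 - 2\sqrt2\,\pi h.$$
Thus, the line Jacobi and Gauss-Seidel methods require roughly half as many iterations as the point versions of these methods, and the line SOR method requires $1/\sqrt2$ as many steps as the point version. The computational costs of line methods are virtually identical to those of point methods.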
Next, we consider the behavior of CG and PCG applied to (49). The smallest and largest eigenvalues of $A$ correspond to the cases $s = t = 1$ and $s = t = m$ in (52), giving
$$\lambda_{\min}(A) \approx 2\pi^2h^2, \qquad \lambda_{\max}(A) \approx 8, \qquad \text{and} \qquad \kappa(A) \approx \frac{4}{\pi^2h^2}.$$
Consequently, from (22), the number of iterations required for convergence (in the $A$-norm) is proportional to $2m/\pi$. Asymptotically, this convergence behavior is qualitatively similar to that of SOR with the optimal value of $\omega$. For more general problems, SOR would require an estimate for a good value of $\omega$, or one would have to be constructed adaptively; in contrast CG requires no such parameter estimate.
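The quality of the asymptotic estimates for the condition number can be gauged directly from (52). The short sketch below evaluates the extreme eigenvalues exactly for several grid sizes and compares $\kappa(A)$ with $4/(\pi^2h^2)$ and $\sqrt{\kappa(A)}$ with $2m/\pi$; the helper name `poisson_extreme_eigs` is hypothetical.

```python
import numpy as np

def poisson_extreme_eigs(m):
    """Extreme eigenvalues of the m x m-grid Poisson matrix, taken from (52)."""
    h = 1.0 / (m + 1)
    lam_min = 4.0 - 4.0 * np.cos(np.pi * h)        # s = t = 1
    lam_max = 4.0 - 4.0 * np.cos(m * np.pi * h)    # s = t = m
    return lam_min, lam_max, h

for m in (8, 16, 32, 64):
    lam_min, lam_max, h = poisson_extreme_eigs(m)
    kappa = lam_max / lam_min
    estimate = 4.0 / (np.pi * h) ** 2
    # Columns: m, exact kappa, asymptotic estimate, sqrt(kappa), 2m/pi.
    print(m, kappa, estimate, np.sqrt(kappa), 2.0 * m / np.pi)

lam_min, lam_max, h = poisson_extreme_eigs(64)
ratio = (lam_max / lam_min) / (4.0 / (np.pi * h) ** 2)
```

The ratio of the exact condition number to the estimate approaches one from below as the grid is refined.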
Convergence of CG can be speeded up using preconditioning. Let us consider three preconditioners: the ILU and MILU factorizations (parameter values 0 and 1, respectively, in the parameterized incomplete factorization), and the SSOR factorization. Let the level-0 nonzero patterns be used for the ILU and MILU factorizations, i.e. no fill-in is permitted in the sparse factors. These techniques have the following characteristics:
The condition number for the ILU-preconditioned system grows like $O(m^2)$, the same asymptotic behavior as for $A$ [9]. However, the eigenvalues of the ILU preconditioned matrix are more tightly clustered than those of $A$ [49].
When the natural ordering is used for $A$, the condition number for the MILU factorization grows like $O(m)$ [18, 37] (see footnote 7).
When the natural ordering is used for $A$, there exists a parameter $\omega_1$ such that with the SSOR preconditioning $M = (D - \omega_1L)D^{-1}(D - \omega_1U)$, the condition number of the preconditioned system grows like $O(m)$ [78].
Thus, with the MILU and SSOR preconditioners, the number of iterations required by PCG is reduced to $O(\sqrt{m})$. An advantage of the MILU method is that it is essentially independent of parameters. See [8] for a systematic treatment of this class of preconditioners for problems with periodic boundary conditions.
Now let us turn to the more general problem (48). When this equation is discretized by finite differences on a uniform $m \times m$ grid, the resulting coefficient matrix has the form
$$A = \mathrm{tri}[-bI_m, T, -eI_m] \tag{53}$$
of block order $m$, where
$$T = \mathrm{tri}[-c, a, -d]$$
is of order $m$, and $a$, $b$, $c$, $d$, and $e$ are scalars, which we will assume here to be nonnegative. A set of linearly independent eigenvectors of $A$ is then
$$v^{(s+(t-1)m)}_{j+(k-1)m} = \left(\frac{c}{d}\right)^{j/2} \left(\frac{b}{e}\right)^{k/2} \sin\frac{sj\pi}{m+1} \, \sin\frac{tk\pi}{m+1}, \tag{54}$$
with eigenvalues
$$\lambda_{s+(t-1)m} = a - 2(cd)^{1/2}\cos\frac{s\pi}{m+1} - 2(be)^{1/2}\cos\frac{t\pi}{m+1}.$$
Exactly as in the self-adjoint case, we can show that the vectors of (54) are also eigenvectors of the point Jacobi iteration matrix and of the line Jacobi iteration matrix.
Footnote 7: We are being slightly inaccurate here. This result has been established when the MILU factors are constructed for the matrix $A + ch^2I$, where $c$ is an arbitrary positive constant. In practice, the conditioning of the MILU preconditioned system grows like $O(m)$ for $c = 0$ as well, although there is no proof of this.
Two commonly used finite difference operators for (48) are the centered difference and upwind difference schemes, which produce the following matrix coefficients (after scaling by $h^2$):
Centered a = 4
Upwind
a = 4 + h + h
dierences: b = (1 + h=2)
dierences: b = (1 + h)
c = (1 + h=2)
c = (1 + h)
d = (1 h=2)
d = (1 h)
e = (1 h=2)
e = (1 h) .
For both discretizations, as $h \to 0$ we have
$$\begin{aligned}
\rho(B_J^{line}) &\approx 1 - \left(\frac{\sigma^2}{8} + \frac{\tau^2}{8} + \pi^2\right)h^2, \\
\rho(B_{GS}^{line}) &\approx 1 - \left(\frac{\sigma^2}{4} + \frac{\tau^2}{4} + 2\pi^2\right)h^2, \\
\rho(L^{line}_{\omega^*}) &\approx 1 - 2\sqrt{\frac{\sigma^2}{4} + \frac{\tau^2}{4} + 2\pi^2}\;h.
\end{aligned} \tag{55}$$
Note that these results imply that all of the methods display faster convergence for the non-self-adjoint operator (48) than for the Poisson operator. The methodology developed by Parter [58, 59] and Parter and Steuerwalt [60] provides a systematic way to obtain results of this type, as well as generalizations to splittings derived from "multiline" orderings of the underlying grid. All of these results apply in the asymptotic regime, as $h \to 0$. Cf. the work of Chin and Manteuffel [10] and Elman and Golub [21, 22, 23] for analysis of these types of methods that apply in the nonasymptotic regime.
The behavior of preconditioned iterative methods in the nonsymmetric case is less well understood than it is in the symmetric positive-definite case. This is because it is more difficult to get useful bounds on the quantities referenced in Theorems 3.1 and 3.2 than it is to bound the condition number in the symmetric positive-definite case. However, using the analysis of splitting operators as in (55), we can provide some heuristic justification of the effectiveness of incomplete factorizations applied to discretizations of (48), as follows.
Theorem 5.1. Suppose the matrix $A$ derived from discretizing (48) is an $M$-matrix. Let $A = M - R$, where $M = LU$ is the ILU(0) factorization of $A$, and let $B_J^{line}$ denote the line Jacobi iteration matrix. Then $\rho(M^{-1}R) \le \rho(B_J^{line})$.
The proof is based on the fact that for $M$-matrices, augmenting the index set in which nonzeros are permitted in an incomplete factorization improves the quality of the incomplete factorization. In this case, the (factorization of the) block diagonal matrix $D$ of the line Jacobi splitting can be regarded as an incomplete factorization of $A$ whose nonzero index set is contained in the nonzero set associated with the ILU(0) factorization. See [5, 23] for details. A consequence of this result is that the eigenvalues of $M^{-1}A = I - M^{-1}R$ are contained in a circle centered at 1 with radius bounded by $\rho(B_J^{line})$.
We conclude this section with a discussion of using the reduced system preprocessing step of §4.3 to solve the discrete problems under consideration here. First, by using a red-black ordering, the linear systems (50) and (53) can be permuted into the form (47), so that the reduced system can be constructed explicitly. Let $S = D_2 - C_2D_1^{-1}C_1$ denote the reduced matrix. It is straightforward to show that most rows of $S$ contain nine nonzero entries, i.e., $S$ corresponds to a nine-point operator on the reduced grid. Moreover, the rows and columns of $S$ can be ordered in such a way that $S$ has block Property A. Examples of two such orderings derived from a six-by-five grid are shown below. The figures show the unknowns of the reduced system, with "·" representing the decoupled unknowns. In the figure on the left, if all unknowns lying on the same diagonal grid line are grouped together, the result is a "one-line" ordering of the reduced grid. In the figure on the right, all unknowns lying on successive pairs of lines of the reduced grid are grouped together, producing a "two-line" ordering.
[Figure: two numberings of the 15 unknowns of the reduced grid derived from a six-by-five grid. Left: one-line ordering. Right: two-line ordering.]
In both cases, the reduced matrix corresponding to these line orderings has a block tridiagonal
structure
$$S = \mathrm{tri}\,[\,S_{j,j-1},\; S_{jj},\; S_{j,j+1}\,].$$
For the one-line orderings, most diagonal blocks $S_{jj}$ have tridiagonal form, and for the two-line
orderings, most diagonal blocks have pentadiagonal form. In either case, S has block Property
A, and it is possible to define the block Jacobi splitting $S = D - L - U$, where D is the block
diagonal matrix consisting of the block diagonal entries $S_{jj}$, and L and U come from the lower
triangle and upper triangle of S, respectively. The block Gauss-Seidel and SOR operators are
defined in an analogous manner.
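The structure described above can be checked numerically on the six-by-five grid. The sketch below assumes the five-point discrete Laplacian as the original matrix and takes the permuted form to be the 2-by-2 block matrix with diagonal blocks D1, D2 and off-diagonal blocks C1, C2. The function names, the zero-based grid indices (i, j), and the choice of i - j as the label of a diagonal grid line are illustrative assumptions.

```python
import numpy as np

def poisson2d(nx, ny):
    """Five-point Laplacian on an nx-by-ny grid; unknown (i, j) has index j*nx + i."""
    def tri(n):
        return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    return np.kron(np.eye(ny), tri(nx)) + np.kron(tri(ny), np.eye(nx))

def reduced_system(A, nx, ny):
    """Eliminate the points with i + j odd; return S and the (i, j) of the kept points."""
    pts = [(i, j) for j in range(ny) for i in range(nx)]
    elim = [k for k, (i, j) in enumerate(pts) if (i + j) % 2 == 1]
    kept = [k for k, (i, j) in enumerate(pts) if (i + j) % 2 == 0]
    D1, C1 = A[np.ix_(elim, elim)], A[np.ix_(elim, kept)]
    C2, D2 = A[np.ix_(kept, elim)], A[np.ix_(kept, kept)]
    S = D2 - C2 @ np.linalg.solve(D1, C1)   # D1 is diagonal in a red-black ordering
    return S, [pts[k] for k in kept]

nx, ny = 6, 5
S, kept = reduced_system(poisson2d(nx, ny), nx, ny)

# One-line ordering: sort by diagonal grid line i - j, then along the line.
order = sorted(range(len(kept)), key=lambda k: (kept[k][0] - kept[k][1], kept[k][0]))
line = np.array([kept[k][0] - kept[k][1] for k in order])
S1 = S[np.ix_(order, order)]

nonzero = np.abs(S1) > 1e-12
pos = np.arange(len(order))
line_gap = np.abs(line[:, None] - line[None, :])   # adjacent lines differ by 2
pos_gap = np.abs(pos[:, None] - pos[None, :])

max_row_nnz = int(nonzero.sum(axis=1).max())
block_tridiagonal = not np.any(nonzero & (line_gap > 2))
diag_blocks_tridiagonal = not np.any(nonzero & (line_gap == 0) & (pos_gap > 1))

# Block Jacobi splitting S1 = D - L - U, with D the block diagonal part.
D = np.where(line_gap == 0, S1, 0.0)
B = np.eye(len(S1)) - np.linalg.solve(D, S1)
rho = float(np.max(np.abs(np.linalg.eigvals(B))))

print("unknowns in reduced system:", S1.shape[0])
print("largest number of nonzeros in a row:", max_row_nnz)
print("block tridiagonal:", block_tridiagonal)
print("diagonal blocks tridiagonal:", diag_blocks_tridiagonal)
print(f"spectral radius of one-line block Jacobi matrix: {rho:.4f}")
```

Replacing the sort key by the pair index of the horizontal grid line, j // 2, gives a two-line grouping that can be examined in the same way.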
Note that this procedure applies to both symmetric and nonsymmetric problems. Results in
[21, 22, 23, 40, 60] indicate that it is more efficient to solve the reduced system than the original
system, using line iterative methods whose costs per step are comparable. For example, it is
shown in [40] that if A is an M-matrix, then the spectral radius of the two-line Jacobi iteration
matrix for the reduced system is smaller than the spectral radius of the two-line Jacobi iteration
matrix for the original system.⁸ In view of Theorem 1.2, analogous statements apply for the
block Gauss-Seidel and SOR methods. We cite two results from [22] that show how the block
Jacobi operators behave for the convection-diffusion equation.
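A comparison of this kind can be reproduced on a small grid. The sketch below assumes a centered difference convection-diffusion stencil, scaled so that the diagonal entry is 4 and the east, west, north and south couplings are -(1 - gamma), -(1 + gamma), -(1 - delta) and -(1 + delta). Here `gamma` and `delta` are hypothetical names for the scaled convection coefficients, taken with modulus less than 1 so that A is an M-matrix, and a two-line block is taken to be a pair of successive horizontal grid lines. It is an illustration under these assumptions, not the analysis of [40] or [22].

```python
import numpy as np

def conv_diff(nx, ny, gamma, delta):
    """Centered-difference convection-diffusion matrix; unknown (i, j) has index j*nx + i."""
    def tri(n, c):
        return 2.0 * np.eye(n) - (1 - c) * np.eye(n, k=1) - (1 + c) * np.eye(n, k=-1)
    return np.kron(np.eye(ny), tri(nx, gamma)) + np.kron(tri(ny, delta), np.eye(nx))

def block_jacobi_radius(M, block_id):
    """Spectral radius of I - D^{-1} M, with D the block diagonal of M given by block_id."""
    block_id = np.asarray(block_id)
    same = block_id[:, None] == block_id[None, :]
    D = np.where(same, M, 0.0)
    B = np.eye(len(M)) - np.linalg.solve(D, M)
    return float(np.max(np.abs(np.linalg.eigvals(B))))

def two_line_radii(nx, ny, gamma, delta):
    A = conv_diff(nx, ny, gamma, delta)
    i = np.tile(np.arange(nx), ny)       # position along the grid line
    j = np.repeat(np.arange(ny), nx)     # index of the horizontal grid line
    line_pair = j // 2                   # two-line blocks
    rho_original = block_jacobi_radius(A, line_pair)

    # One step of cyclic reduction with respect to the red-black partition.
    elim = np.flatnonzero((i + j) % 2 == 1)
    kept = np.flatnonzero((i + j) % 2 == 0)
    D1, C1 = A[np.ix_(elim, elim)], A[np.ix_(elim, kept)]
    C2, D2 = A[np.ix_(kept, elim)], A[np.ix_(kept, kept)]
    S = D2 - C2 @ np.linalg.solve(D1, C1)
    rho_reduced = block_jacobi_radius(S, line_pair[kept])
    return rho_original, rho_reduced

rho_original, rho_reduced = two_line_radii(nx=8, ny=8, gamma=0.5, delta=0.25)
print(f"two-line Jacobi, original system: {rho_original:.4f}")
print(f"two-line Jacobi, reduced system:  {rho_reduced:.4f}")
```

Setting `gamma = delta = 0` recovers the discrete Poisson problem, so the same function can be used to compare the symmetric and nonsymmetric cases.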
Theorem 5.2. For the centered finite difference discretization, if $|\sigma h/2| < 1$ and $|\tau h/2| < 1$, then the
spectral radius of the one-line Jacobi iteration matrix is bounded by
[displayed bound not recoverable from the source; see [22]]
For the centered difference scheme, if $|\sigma h/2| < 1$ and $|\tau h/2| < 1$, then the spectral radius of the
two-line block Jacobi iteration matrix for the reduced system is bounded by
[displayed bound missing from the source; see [22]]

⁸ Indeed, the result is for k-line methods, $k \ge 2$.
6. Other Topics.
We conclude by briefly mentioning some important topics that we have not addressed. These
include:
1. Adaptive methods. In discussing Krylov subspace methods, we have restricted our attention
to techniques that do not require estimates of the eigenvalues of A. Alternative methods,
such as the Chebyshev algorithm for symmetric [35] and nonsymmetric [46, 47] problems,
and various \hybrid" methods that combine CG-like methods with adaptive strategies (see
[51] for a complete set of references), make use of eigenvalue estimates. The disadvantages of
such techniques are that the costs of estimating eigenvalues may be high, and convergence
may be slow if inaccurate estimates are obtained; they are also somewhat more difficult
to program than CG-like methods. They have the advantage, however, of requiring less
work per step than CG-like methods, and they also tend not to depend as much on inner
products, which is useful in the context of parallel computations. (See below.)
2. Problem-based preconditioners. For preconditioners, we have only considered techniques
that can be defined for arbitrary matrices. This has the advantage of being of broad applicability. However, it is often possible to take advantage of properties of particular problems to produce better preconditioners. For example, in the context of partial differential
equations, techniques such as alternating direction implicit methods [74], block incomplete
factorization [12], domain decomposition [44] and multilevel methods [39, 48] produce preconditioners that often (especially for self-adjoint problems) lead to faster convergence than
the techniques of §4.
3. Ordering Effects. There are many issues associated with the ordering of the rows and
columns of the coefficient matrix that affect the performance of iterative methods, especially when preconditioners are used. These include effectiveness of incomplete factorization
methods and efficiency of parallel implementations. See, e.g., [1, 17, 20] for some discussions
of these issues.
4. Parallel Computations. As for all numerical computations, the efficient implementation of
iterative methods on large-scale parallel computers introduces many new concerns. For
example, as noted above, the inner products required by CG-like methods may present
a difficulty on parallel architectures. For vectors of length n, at least O(log n) steps are
required for the parallel computation of an inner product, whereas it may be that all
other computations can be done in O(1) time. Similarly, in the context of discrete partial
Acknowledgements. I thank Michael Chernesky, Dianne O'Leary, and Xuejun Zhang for some
editorial remarks. Xuejun Zhang provided the proof of Theorem 1.2, (i).
References
[1] L. M. Adams and H. J. Jordan. Is SOR color blind? SIAM J. Sci. Stat. Comput., 7:490-506,
1986.
[2] S. F. Ashby, T. A. Manteuffel, and P. E. Saylor. A taxonomy for conjugate gradient methods.
SIAM J. Numer. Anal., 27:1542-1568, 1990.
[3] O. Axelsson. Solution of linear systems of equations: iterative methods. In V. A. Barker,
editor, Sparse Matrix Techniques, pages 1-51. Springer-Verlag, New York, 1976.
[4] O. Axelsson and G. Lindskog. On the rate of eigenvalue distribution of a class of preconditioning methods. Numer. Math., 48:479-498, 1986.
[5] R. Beauwens. Factorization iterative methods, M-operators and H-operators. Numer. Math.,
31:335-357, 1979.
[6] N. I. Buleev. A numerical method for the solution of two-dimensional and three-dimensional
equations of diffusion. Math. Sb., 51:227-258, 1960.
[7] T. F. Chan, L. de Pillis, and H. A. Van der Vorst. A Transpose-Free Squared Lanczos
Algorithm and Applications to Solving Nonsymmetric Linear Systems. Technical report,
Department of Mathematics, UCLA, 1991.
[8] T. F. Chan and H. C. Elman. Fourier analysis of iterative methods for elliptic problems.
SIAM Review, 31:20-49, 1989.
[9] R. Chandra. Conjugate Gradient Methods for Partial Differential Equations. PhD thesis,
Yale University, Department of Computer Science, 1978.
[10] R. C. Y. Chin and T. A. Manteuffel. An analysis of block successive overrelaxation for a
class of matrices with complex spectra. SIAM J. Numer. Anal., 26:564-585, 1988.
[11] P. Concus and G. H. Golub. Use of fast direct methods for the efficient numerical solution
of nonseparable elliptic equations. SIAM J. Numer. Anal., 10:1103-1120, 1973.
[12] P. Concus, G. H. Golub, and G. Meurant. Block preconditioning for the conjugate gradient
method. SIAM J. Sci. Stat. Comput., 6:220-252, 1985.
[13] P. Concus, G. H. Golub, and D. P. O'Leary. A generalized conjugate gradient method for the
numerical solution of elliptic partial differential equations. In J. R. Bunch and D. J. Rose,
editors, Sparse Matrix Computations, pages 309-332. Academic Press, New York, 1976.
[14] J. Cullum and R. A. Willoughby. Lanczos Algorithms for Large Symmetric Eigenvalue
Computations, Volume I, Theory, Volume II, Programs. Birkhäuser, Boston, 1986.
[15] J. E. Dennis, Jr. and K. Turner. Generalized conjugate directions. Linear Algebra Appl.,
88/89:187-209, 1987.
[16] J. J. Dongarra, I. S. Duff, D. C. Sorensen, and H. A. van der Vorst, editors. Solving Linear
Systems on Vector and Shared Memory Computers. SIAM, Philadelphia, 1990.
[17] I. S. Duff and G. A. Meurant. The effect of ordering on preconditioned conjugate gradients.
BIT, 29:635-657, 1989.
[18] T. Dupont, R. P. Kendall, and H. H. Rachford Jr. An approximate factorization procedure
for solving self-adjoint elliptic difference equations. SIAM J. Numer. Anal., 5:559-573, 1968.
[19] S. C. Eisenstat, H. C. Elman, and M. H. Schultz. Variational iterative methods for nonsymmetric systems of linear equations. SIAM J. Numer. Anal., 20:345-357, 1983.
[20] H. C. Elman and E. Agrón. Ordering techniques for the preconditioned conjugate gradient
method on parallel computers. Computer Physics Communications, 53:253-269, 1989.
[21] H. C. Elman and G. H. Golub. Iterative methods for cyclically reduced non-self-adjoint
linear systems. Math. Comp., 54:671-700, 1990.
[22] H. C. Elman and G. H. Golub. Iterative methods for cyclically reduced non-self-adjoint
linear systems, II. Math. Comp., 56:215-242, 1991.
[23] H. C. Elman and G. H. Golub. Line iterative methods for cyclically reduced convection-diffusion problems. SIAM J. Sci. Stat. Comput., 13:339-363, 1992.
[24] H. C. Elman and M. H. Schultz. Preconditioning by fast direct methods for nonselfadjoint
nonseparable elliptic problems. SIAM J. Numer. Anal., 23:44-57, 1986.
[25] V. Faber and T. A. Manteuffel. Necessary and sufficient conditions for the existence of a
conjugate gradient method. SIAM J. Numer. Anal., 21:352-362, 1984.
[26] V. Faber and T. A. Manteuffel. Orthogonal error methods. SIAM J. Numer. Anal., 24:170-187,
1987.
[27] R. Fletcher. Conjugate gradient methods for indefinite systems. In G. A. Watson, editor,
Numerical Methods Dundee 1975, pages 73-89. Springer-Verlag, New York, 1976.
[28] R. Freund. A Transpose-Free Quasi-Minimal Residual Algorithm for Non-Hermitian Linear
Systems. Technical Report 91-18, RIACS, NASA Ames Research Center, 1991.
[29] R. Freund, G. H. Golub, and N. M. Nachtigal. Iterative Solution of Linear Systems. Technical
Report NA-91-05, Stanford University, Numerical Analysis Project, 1991. To appear in Acta
Numerica.
[46] T. A. Manteuffel. The Tchebychev iteration for nonsymmetric linear systems. Numer. Math.,
28:307-327, 1977.
[47] T. A. Manteuffel. Adaptive procedure for estimation of parameters for the nonsymmetric
Tchebychev iteration. Numer. Math., 31:187-208, 1978.
[48] S. F. McCormick, editor. Multigrid Methods. SIAM, Philadelphia, 1987.
[49] J. A. Meijerink and H. A. van der Vorst. An iterative solution method for linear systems of
which the coefficient matrix is a symmetric M-matrix. Math. Comp., 31:148-162, 1977.
[50] N. Munksgaard. Solving sparse symmetric sets of linear equations by preconditioned conjugate gradients. ACM Trans. Math. Soft., 20:206-219, 1980.
[51] N. M. Nachtigal, L. Reichel, and L. N. Trefethen. A hybrid GMRES algorithm for nonsymmetric linear systems. SIAM J. Matr. Anal. Appl., 13:796-825, 1992.
[52] N. M. Nachtigal, S. C. Reddy, and L. N. Trefethen. How fast are nonsymmetric matrix
iterations? SIAM J. Matr. Anal. Appl., 13:778-795, 1992.
[53] J. Ortega. Numerical Analysis: A Second Course. Academic Press, New York, 1972.
[54] C. C. Paige and M. A. Saunders. Solution of sparse indefinite systems of linear equations.
SIAM J. Numer. Anal., 12, 1975.
[55] B. Parlett. Reduction to tridiagonal form and minimal realizations. SIAM J. Matr. Anal.
Appl., 13:567-593, 1992.
[56] B. N. Parlett. The Symmetric Eigenvalue Problem. Prentice-Hall, Englewood Cliffs, New
Jersey, 1980.
[57] B. N. Parlett, D. R. Taylor, and Z. A. Liu. A look-ahead Lanczos algorithm for unsymmetric
matrices. Math. Comp., 44:105-124, 1985.
[58] S. V. Parter. On estimating the "rates of convergence" of iterative methods for elliptic
difference operators. Trans. Amer. Math. Soc., 114:320-354, 1965.
[59] S. V. Parter. Iterative methods for elliptic problems and the discovery of "q". SIAM Review,
28:153-175, 1986.
[60] S. V. Parter and M. Steuerwalt. Block iterative methods for elliptic and parabolic difference
equations. SIAM J. Numer. Anal., 19:1173-1195, 1982.
[61] T. Rivlin. Chebyshev Polynomials: From Approximation Theory to Algebra and Number
Theory. John Wiley & Sons, New York, second edition, 1990.
[62] Y. Saad. Variations of Arnoldi's method for computing eigenelements of large unsymmetric
matrices. Linear Algebra Appl., 34:269-295, 1980.
[63] Y. Saad. The Lanczos biorthogonalization algorithm and other oblique projection methods
for solving large unsymmetric systems. SIAM J. Numer. Anal., 19:485-506, 1982.
[64] Y. Saad. Iterative solution of indefinite symmetric linear systems by methods using orthogonal polynomials over two disjoint intervals. SIAM J. Numer. Anal., 20:784-811, 1983.
[65] Y. Saad and M. H. Schultz. Conjugate gradient-like algorithms for solving nonsymmetric
linear systems. Math. Comp., 44:417-424, 1985.
[66] Y. Saad and M. H. Schultz. GMRES: A generalized minimal residual algorithm for solving
nonsymmetric linear systems. SIAM J. Sci. Stat. Comput., 7:856-869, 1986.
[67] P. Sonneveld. CGS, a fast Lanczos-type solver for nonsymmetric linear systems. SIAM J.
Sci. Stat. Comput., 10:36-52, 1989.
[68] J. Stoer and R. Bulirsch. Introduction to Numerical Analysis. Springer-Verlag, New York,
1980.
[69] D. B. Szyld and O. B. Widlund. Variational Analysis of Some Conjugate Gradient Methods.
Technical Report CS-1989-28, Department of Computer Science, Duke University, 1989.
[70] D. R. Taylor. Analysis of the Look Ahead Lanczos Algorithm. PhD thesis, University of
California at Berkeley, Department of Mathematics, 1982.
[71] L. N. Trefethen. Non-normal matrices and pseudo-eigenvalues. Incomplete draft, 1990.
[72] L. N. Trefethen. Pseudospectra of Matrices. Technical Report 91-10, Oxford University
Computing Laboratory, 1991.
[73] H. A. Van der Vorst. BI-CGSTAB: A fast and smoothly converging variant of BI-CG for the
solution of nonsymmetric linear systems. SIAM J. Sci. Stat. Comput., 10:631-644, 1992.
[74] R. S. Varga. Matrix Iterative Analysis. Prentice-Hall, Englewood Cliffs, New Jersey, 1962.
[75] H. Walker. Implementation of the GMRES method using Householder transformations.
SIAM J. Sci. Stat. Comput., 9:152-164, 1988.
[76] J. W. Watts III. A conjugate gradient-truncated direct method for the iterative solution of
the reservoir simulation pressure equation. Society of Petroleum Engineers Journal, 21:345-353,
1981.
[77] J. H. Wilkinson. The Algebraic Eigenvalue Problem. Oxford University Press, Oxford, 1965.
[78] D. M. Young. Iterative Solution of Large Linear Systems. Academic Press, New York, 1970.
[79] D. M. Young. A historical overview of iterative methods. Computer Physics Communications, 53:1-17, 1989.
[80] D. M. Young. A historical review of iterative methods. In S. Nash, editor, A History of
Scientific Computing. Addison-Wesley, Reading, MA, 1990.
[81] Z. Zlatev. Use of iterative refinement in the solution of sparse linear systems. SIAM J.
Numer. Anal., 19, 1982.