
ALGORITHM 583 LSQR: Sparse Linear Equations and Least Squares Problems

CHRISTOPHER C. PAIGE, McGill University, Canada, and MICHAEL A. SAUNDERS, Stanford University
Categories and Subject Descriptors: G.1.3 [Numerical Analysis]: Numerical Linear Algebra--linear systems (direct and iterative methods); G.3 [Mathematics of Computing]: Probability and Statistics--statistical computing, statistical software; G.m [Mathematics of Computing]: Miscellaneous--FORTRAN program units

General Terms: Algorithms

Additional Key Words and Phrases: Analysis of variance, conjugate-gradient method, least squares, linear equations, regression, sparse matrix

1. INTRODUCTION

LSQR finds a solution x to the following problems:

    Unsymmetric equations:   solve  Ax = b                            (1.1)

    Linear least squares:    minimize  ||Ax - b||2                    (1.2)

    Damped least squares:    minimize  || (  A ) x  -  ( b ) ||       (1.3)
                                       || ( λI )        ( 0 ) ||2

where A is a matrix with m rows and n columns, b is an m-vector, λ is a scalar, and the given data A, b, λ are real. The matrix A will normally be large and sparse. It is defined by means of a user-written subroutine APROD, whose essential function is to compute products of the form Ax and Aᵀy for given vectors x and y.
Received 4 June 1980; revised 23 September 1981; accepted 28 February 1982.
This work was supported by Natural Sciences and Engineering Research Council of Canada Grant A8652; by the New Zealand Department of Scientific and Industrial Research; and by U.S. National Science Foundation Grants MCS-7926009 and ECS-8012974, the Department of Energy under Contract AM03-76SF00326, PA No. DE-AT03-76ER72018, the Office of Naval Research under Contract N00014-75-C-0267, and the Army Research Office under Contract DAAG29-79-C-0110.
Authors' addresses: C. C. Paige, School of Computer Science, McGill University, Montreal, Quebec, Canada H3A 2K6; M. A. Saunders, Department of Operations Research, Stanford University, Stanford, CA 94305.


Table I. Comparison of CGLS and LSQR

                  Storage     Work per iteration
    CGLS, λ = 0   2m + 2n     2m + 3n
    CGLS, λ ≠ 0   2m + 2n     2m + 5n
    LSQR, any λ   m + 2n      3m + 5n

Problems (1.1) and (1.2) are treated as special cases of (1.3), which we shall write as

    minimize  || b̄ - Āx ||2,   where   Ā = (  A ),   b̄ = ( b ).      (1.4)
                                           ( λI )         ( 0 )

An earlier successful method for such problems is the conjugate-gradient method for least squares systems given by Hestenes and Stiefel [3]. (This method is described as algorithm CGLS in [6, sect. 7.1].) CGLS and LSQR are iterative methods with similar qualitative properties. Their computational requirements are summarized in Table I. In addition they require a product Ax and a product Aᵀy each iteration. In order to achieve the storage shown for LSQR, we ask the user to implement the matrix-vector products in the form

    y ← y + Ax    and    x ← x + Aᵀy,                                (1.5)

where ← means that one of the given vectors is overwritten by the expression shown. (A parameter specifies which expression the user's subroutine APROD should compute on any given entry.) We see that LSQR has a storage advantage if the operations (1.5) can be performed with no additional storage beyond that required to represent A. For least squares applications with many observations (m >> n), this could be useful.

The work shown in Table I is the number of floating-point multiplications per iteration, excluding the work involved in the products Ax, Aᵀy. Since CGLS is somewhat more efficient, we would not discourage using that method whenever A or Ā is well conditioned. However, LSQR is likely to obtain a more accurate solution in fewer iterations if Ā is moderately or severely ill-conditioned.

Let r̄k = b̄ - Āxk be the residual vector associated with the kth iteration. LSQR provides estimates of ||xk||2, ||r̄k||2, ||Āᵀr̄k||2, the norm of Ā, the condition number of Ā, and standard errors for the components of x. The last two items require a further 2n multiplications per iteration and an additional n-vector of storage.

Subroutine LSQR is written in the PFORT subset of American National Standard FORTRAN. It contains no machine-dependent constants. Auxiliary routines required are APROD, NORMLZ, SCOPY, SNRM2, and SSCAL. The last three correspond to members of the BLAS collection [5].
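To illustrate convention (1.5), the following toy routine (our own sketch, not part of the published material) implements APROD for a small dense matrix; the COMMON layout and the array bound 100 are assumptions made purely for the example. Sparse implementations are discussed in Section 4.

      SUBROUTINE APROD( MODE,M,N,X,Y, LENIW,LENRW,IW,RW )
      INTEGER            MODE,M,N,LENIW,LENRW
      INTEGER            IW(LENIW)
      REAL               X(N),Y(M),RW(LENRW)
C
C     TOY DENSE-MATRIX VERSION OF APROD (ILLUSTRATION ONLY).
C     THE MATRIX IS HELD COLUMN BY COLUMN IN BLANK COMMON.
      REAL               A
      COMMON             A(100,100)
      INTEGER            I,J
C
      IF (MODE .NE. 1) GO TO 300
C
C     MODE = 1  --  OVERWRITE  Y  WITH  Y + A*X,  LEAVING  X  UNCHANGED.
      DO 200 J = 1, N
         DO 100 I = 1, M
            Y(I) = Y(I) + A(I,J)*X(J)
  100    CONTINUE
  200 CONTINUE
      RETURN
C
C     MODE = 2  --  OVERWRITE  X  WITH  X + A(TRANSPOSE)*Y,
C                   LEAVING  Y  UNCHANGED.
  300 DO 500 J = 1, N
         DO 400 I = 1, M
            X(J) = X(J) + A(I,J)*Y(I)
  400    CONTINUE
  500 CONTINUE
      RETURN
C     END OF APROD
      END

No storage beyond the representation of A is needed, which is the property that gives LSQR the storage advantage shown in Table I.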
2. MATHEMATICAL BACKGROUND

Algorithmic details are given in [6], mainly for the case λ = 0. We summarize these here with λ reintroduced, and show that a given value of λ may be dealt with at negligible cost. The vector norm ||v||2 = (vᵀv)^(1/2) is used throughout.

LSQR uses an algorithm of Golub and Kahan to reduce A to lower bidiagonal form. The quantities produced from A and b after k + 1 steps of the bidiagonalization (procedure Bidiag 1 of [6]) are

    Uk+1 = [u1, u2, ..., uk+1],      Vk+1 = [v1, v2, ..., vk+1],

            ( α1                )
            ( β2  α2            )
    Bk  =   (     β3  .         )                                    (2.1)
            (          .   αk   )
            (              βk+1 )

The kth approximation to the solution x is then defined to be xk = Vk yk, where yk solves the subproblem

    minimize  || ( β1 e1 )  -  ( Bk ) yk ||                           (2.2)
              || (   0   )     ( λI )    ||2

Letting the associated residual vectors be

    tk+1 = β1 e1 - Bk yk,        rk = b - A xk,                       (2.3)

we find that the relations

    rk = Uk+1 tk+1,        Aᵀrk = λ²xk + αk+1 τk+1 vk+1               (2.4)

will hold to machine accuracy, where τk+1 is the last component of tk+1. (The second relation reflects the fact that r̄k = (rk; -λxk), so that Āᵀr̄k = Aᵀrk - λ²xk.) We therefore conclude that (rk, xk) will be an acceptable solution of (1.4) if the computed value of either ||tk+1|| or |αk+1 τk+1| is suitably small. Björck [1] has previously observed that subproblem (2.2) is the appropriate generalization of min ||Bk yk - β1 e1|| when λ ≠ 0. He also discusses methods for computing yk and xk efficiently for various λ and k. In LSQR we assume that a single value of λ is given, and to save storage and work, we do not compute yk, rk, or tk+1. The orthogonal factorization

    Qk ( Bk   β1 e1 )  =  ( Rk   fk )                                (2.5)
       ( λI    0    )     ( 0    qk )

is computed (QkᵀQk = I; Rk upper bidiagonal, k × k), and this would give Rk yk = fk, but instead we solve Rkᵀ Dkᵀ = Vkᵀ and form xk = Dk fk. The factorization (2.5) is formed similarly to the case λ = 0 in [6], except that two rotations are required per step instead of one. For k = 2, the factorization

proceeds according to

    ( α1        β1 )      ( ρ̄1        φ̄1 )      ( ρ1  θ2   φ1 )
    ( β2  α2       )      ( β2  α2       )      (     ρ̄2   φ̄2 )
    (     β3       )  →   (     β3       )  →   (     β3      )  →  ⋯
    ( λ            )      (          q̄1  )      (          q̄1 )
    (     λ        )      (     λ        )      (     λ       )

in which the first rotation eliminates the first λ and the second eliminates β2; two further rotations eliminate the second λ and β3, completing R2 and f2.
Note that the first λ is rotated into the diagonal element α1. This alters the right-hand side β1e1 to produce q̄1, the first component of qk. An alternative is to rotate it into β2 (and similarly for later λ), since this does not affect the right-hand side and it more closely simulates the algorithm that results when LSQR is applied to Ā and b̄ directly. However, the rotations then have a greater effect on Bk, and in practice the first option has proved to give marginally more accurate results. The estimates required to implement the stopping criteria are
    ||r̄k||² = ||rk||² + λ²||xk||² = ||tk+1||² + ||qk||²,

    ||Āᵀr̄k|| = ||Aᵀrk - λ²xk|| = |αk+1 τk+1|.

This is a simple generalization of the case λ = 0. No additional storage is needed for qk, since only its norm is required. In short, although the presence of λ complicates the algorithm description, it adds essentially nothing to the storage and work per iteration.
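The recurrences above can be collected into a short routine. The following is a stripped-down sketch of our own, not the published code: the name MINILS is hypothetical, all stopping tests, norm estimates, and standard-error calculations are omitted, a fixed number of iterations KMAX is taken, and the normalization constants αk, βk are assumed to stay nonzero. APROD, SNRM2, and SSCAL are as described in Section 1.

      SUBROUTINE MINILS( M,N,APROD,DAMP,KMAX,
     *                   LENIW,LENRW,IW,RW, U,V,W,X )
      EXTERNAL           APROD
      INTEGER            M,N,KMAX,LENIW,LENRW
      INTEGER            IW(LENIW)
      REAL               DAMP,RW(LENRW),U(M),V(N),W(N),X(N)
C
C     ILLUSTRATIVE SKETCH OF THE LSQR RECURRENCES (NOT THE PUBLISHED
C     CODE).  ON ENTRY U HOLDS B; ON EXIT X HOLDS THE APPROXIMATE
C     SOLUTION AFTER KMAX ITERATIONS.  SNRM2 AND SSCAL ARE FROM THE
C     BLAS [5].  ZERO NORMS ARE ASSUMED NOT TO OCCUR.
C
      INTEGER            I,K
      REAL               ALFA,BETA,RHOBAR,PHIBAR,RHBAR1,CS1,
     *                   RHO,CS,SN,THETA,PHI,T1,T2,SNRM2
C
C     INITIALIZE:  BETA1*U1 = B,  ALFA1*V1 = A(TRANSPOSE)*U1.
      BETA = SNRM2( M,U,1 )
      CALL SSCAL( M,(1.0/BETA),U,1 )
      DO 10 I = 1, N
         V(I) = 0.0
         X(I) = 0.0
   10 CONTINUE
      CALL APROD ( 2,M,N,V,U,LENIW,LENRW,IW,RW )
      ALFA = SNRM2( N,V,1 )
      CALL SSCAL( N,(1.0/ALFA),V,1 )
      DO 20 I = 1, N
         W(I) = V(I)
   20 CONTINUE
      PHIBAR = BETA
      RHOBAR = ALFA
C
      DO 100 K = 1, KMAX
C        CONTINUE THE BIDIAGONALIZATION (2.1):
C        BETA(K+1)*U(K+1) = A*V(K) - ALFA(K)*U(K),
C        ALFA(K+1)*V(K+1) = A(TRANSPOSE)*U(K+1) - BETA(K+1)*V(K).
         CALL SSCAL( M,(-ALFA),U,1 )
         CALL APROD ( 1,M,N,V,U,LENIW,LENRW,IW,RW )
         BETA = SNRM2( M,U,1 )
         CALL SSCAL( M,(1.0/BETA),U,1 )
         CALL SSCAL( N,(-BETA),V,1 )
         CALL APROD ( 2,M,N,V,U,LENIW,LENRW,IW,RW )
         ALFA = SNRM2( N,V,1 )
         CALL SSCAL( N,(1.0/ALFA),V,1 )
C
C        FIRST ROTATION:  ELIMINATE THE DAMPING ELEMENT LAMBDA,
C        ALTERING THE DIAGONAL AND THE RIGHT-HAND SIDE (SECTION 2).
         RHBAR1 = SQRT( RHOBAR**2 + DAMP**2 )
         CS1    = RHOBAR/RHBAR1
         PHIBAR = CS1*PHIBAR
C
C        SECOND ROTATION:  ELIMINATE BETA(K+1), GIVING THE NEXT
C        COLUMN OF THE UPPER BIDIAGONAL MATRIX R(K) OF (2.5).
         RHO    = SQRT( RHBAR1**2 + BETA**2 )
         CS     = RHBAR1/RHO
         SN     = BETA/RHO
         THETA  = SN*ALFA
         RHOBAR = -CS*ALFA
         PHI    = CS*PHIBAR
         PHIBAR = SN*PHIBAR
C
C        UPDATE THE SOLUTION AND THE DIRECTION VECTOR W.
         T1 = PHI/RHO
         T2 = -THETA/RHO
         DO 50 I = 1, N
            X(I) = X(I) + T1*W(I)
            W(I) = V(I) + T2*W(I)
   50    CONTINUE
  100 CONTINUE
      RETURN
C     END OF MINILS
      END

Here W carries ρk times the current column of Dk, so the final loop accumulates xk = Dk fk one column at a time; the published LSQR wraps the stopping tests and the norm and standard-error estimates around this same skeleton.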
3. REGULARIZATION AND RELATED WORK

Introducing λ as in (1.3) is just one way of "regularizing" the solution x, in the sense that it can reduce the size of the computed solution and make its components less sensitive to changes in the data. LSQR is applicable when a value of λ is known a priori. The value is entered via the subroutine parameter DAMP. A second method for regularizing x is available through LSQR's parameter ACOND, which can cause iterations to terminate before ||xk|| becomes large. A similar approach has recently been described by Wold et al. [9], who give an illuminating interpretation of the bidiagonalization as a partial least squares procedure. Their description will also be useful to those who prefer the notation of multiple regression. Methods for choosing λ, and other approaches to regularization, are given in [1, 2, 4, 8] and elsewhere. For a philosophical discussion, see [7].
4. CODING APROD

The best way to compute y + Ax and x + Aᵀy depends upon the origin of the matrix A. We shall illustrate a case that commonly arises, in which A is a sparse matrix whose nonzero coefficients are stored by rows in a simple list. Let A have

M rows, N columns, and NZ nonzeros. Conceptually we need three arrays dimensioned as REAL RA(NZ) and INTEGER JA(NZ), NA(M), where RA(L) is the Lth nonzero of A, counting across row 1, then across row 2, and so on; JA(L) is the column in which the Lth nonzero of A lies; and NA(I) is the number of nonzero coefficients in the Ith row of A. These quantities may be used in a straightforward way, as shown in Figure 1 (a FORTRAN implementation). We assume that they are made available to APROD through COMMON, and that the actual array dimensions are suitably large. Blank or labeled COMMON will often be convenient for transmitting data to APROD. (Of course, some of the data could be local to APROD.)

For greater generality, the parameter lists for LSQR and APROD include two workspace arrays IW, RW and their lengths LENIW, LENRW. LSQR does not use these parameters directly; it just passes them to APROD. Figure 2 illustrates their use on the same example (sparse A stored by rows). An auxiliary subroutine APROD1 is needed to make the code readable. A similar scheme should be used to initialize the workspace parameters prior to calling LSQR; a sketch of such a driver is given after Figure 2.

Returning to the example itself, it may often be natural to store A by columns rather than rows, using analogous data structures. However, we note that in sparse least squares applications, A may have many more rows than columns (M >> N). In such cases it is vital to store A by rows as shown, if the machine being used has a paged (virtual) memory. Random access is then restricted to arrays of length N rather than M, and page faults will therefore be kept to a minimum. Note also that the arrays RA, JA, NA are adequate for computing both Ax and Aᵀy; we do not need to store A by rows and by columns.

Regardless of the application, it will be apparent when coding APROD for the two values of MODE that the matrix A is effectively being defined twice. Great care must be taken to avoid coding inconsistent expressions y + A₁x and x + A₂ᵀy, where either A₁ or A₂ is different from the desired A. (If A₁ ≠ A₂, algorithm LSQR will not converge.) Parameters ANORM, ACOND, and CONLIM provide a safeguard for such an event.
5. PRECONDITIONING

It is well known that conjugate-gradient methods can be accelerated if a nonsingular matrix M is available to approximate A in some useful sense. When A is square and nonsingular, the system Ax = b is equivalent to both of the following systems:

    (M⁻¹A)x = c,    where  Mc = b;                                   (5.1)

    (AM⁻¹)z = b,    where  Mx = z.                                   (5.2)

For least squares systems (undamped), only the analogue of (5.2) is applicable:

    min ||Ax - b||2 = min ||(AM⁻¹)z - b||2,    where  Mx = z.        (5.3)
      SUBROUTINE APROD( MODE,M,N,X,Y, LENIW,LENRW,IW,RW )
      INTEGER            MODE,M,N,LENIW,LENRW
      INTEGER            IW(LENIW)
      REAL               X(N),Y(M),RW(LENRW)
C
C     APROD  PERFORMS THE FOLLOWING FUNCTIONS:
C
C     IF MODE = 1,  SET  Y = Y + A*X
C     IF MODE = 2,  SET  X = X + A(TRANSPOSE)*Y
C
C     WHERE  A  IS A MATRIX STORED BY ROWS IN THE ARRAYS RA, JA, NA.
C     IN THIS EXAMPLE, RA, JA, NA ARE STORED IN COMMON.
C
      REAL               RA
      INTEGER            JA, NA
      COMMON             RA(9000), JA(9000), NA(1000)
      INTEGER            I,J,L,L1,L2
      REAL               SUM, YI, ZERO
C
      ZERO = 0.0
      L2   = 0
      IF (MODE .NE. 1) GO TO 400
C
C     MODE = 1  --  SET  Y = Y + A*X.
C
      DO 200 I = 1, M
         SUM = ZERO
         L1  = L2 + 1
         L2  = L2 + NA(I)
         DO 100 L = L1, L2
            J   = JA(L)
            SUM = SUM + RA(L)*X(J)
  100    CONTINUE
         Y(I) = Y(I) + SUM
  200 CONTINUE
      RETURN
C
C     MODE = 2  --  SET  X = X + A(TRANSPOSE)*Y.
C
  400 DO 600 I = 1, M
         YI = Y(I)
         L1 = L2 + 1
         L2 = L2 + NA(I)
         DO 500 L = L1, L2
            J    = JA(L)
            X(J) = X(J) + RA(L)*YI
  500    CONTINUE
  600 CONTINUE
      RETURN
C     END OF APROD
      END

Fig. 1. Computation of y + Ax and x + Aᵀy, where A is a sparse matrix stored compactly by rows. For convenience, the data structure for A is held in COMMON.

      SUBROUTINE APROD( MODE,M,N,X,Y, LENIW,LENRW,IW,RW )
      INTEGER            MODE,M,N,LENIW,LENRW
      INTEGER            IW(LENIW)
      REAL               X(N),Y(M),RW(LENRW)
C
C     APROD  PERFORMS THE FOLLOWING FUNCTIONS:
C
C     IF MODE = 1,  SET  Y = Y + A*X
C     IF MODE = 2,  SET  X = X + A(TRANSPOSE)*Y
C
C     WHERE  A  IS A MATRIX STORED BY ROWS IN THE ARRAYS RA, JA, NA.
C     IN THIS EXAMPLE, APROD IS AN INTERFACE BETWEEN LSQR AND ANOTHER
C     USER ROUTINE THAT DOES THE WORK.  THE WORKSPACE ARRAY RW
C     CONTAINS RA.  THE FIRST M COMPONENTS OF IW CONTAIN NA, AND THE
C     REMAINDER OF IW CONTAINS JA.  THE DIMENSIONS OF RW AND IW ARE
C     ASSUMED TO BE SUFFICIENTLY LARGE.
C
      INTEGER            LENJA,LENRA,LOCJA
C
      LOCJA = M + 1
      LENJA = LENIW - LOCJA + 1
      LENRA = LENRW
      CALL APROD1( MODE,M,N,X,Y,
     *             LENJA,LENRA,IW,IW(LOCJA),RW )
      RETURN
C     END OF APROD
      END

      SUBROUTINE APROD1( MODE,M,N,X,Y, LENJA,LENRA,NA,JA,RA )
      INTEGER            MODE,M,N,LENJA,LENRA
      INTEGER            NA(M),JA(LENJA)
      REAL               X(N),Y(M),RA(LENRA)
C
C     APROD1  DOES THE WORK FOR APROD.
C
      INTEGER            I,J,L,L1,L2
      REAL               SUM,YI,ZERO
C
      < the same code as in APROD in Figure 1 >
C
C     END OF APROD1
      END

Fig. 2. Same as Figure 1, with the data structure for A held in the workspace parameters.
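Before calling LSQR, the user must load the data structure and set the control parameters. The following driver is a sketch of our own, not part of the published material: the 2 × 2 matrix, the unit number NOUT = 6, and the tolerance values are all hypothetical. APROD is the COMMON-based routine of Figure 1, so the workspace arrays are dummies of length 1.

      PROGRAM SAMPLE
      EXTERNAL           APROD
      INTEGER            M,N,LENIW,LENRW,ITNLIM,NOUT,ISTOP
      INTEGER            IW(1)
      REAL               RW(1),U(2),V(2),W(2),X(2),SE(2)
      REAL               DAMP,ATOL,BTOL,CONLIM,
     *                   ANORM,ACOND,RNORM,ARNORM,XNORM
      REAL               RA
      INTEGER            JA,NA
      COMMON             RA(9000),JA(9000),NA(1000)
C
C     ILLUSTRATIVE DRIVER (HYPOTHETICAL PROBLEM DATA).
C     SOLVE  A*X = B  WITH  A = ( 1  0 ),   B = ( 1 ),
C                               ( 2  3 )        ( 8 )
C     WHOSE SOLUTION IS  X = (1, 2).  A IS STORED BY ROWS AS IN FIG. 1.
      M = 2
      N = 2
      NA(1) = 1
      RA(1) = 1.0
      JA(1) = 1
      NA(2) = 2
      RA(2) = 2.0
      JA(2) = 1
      RA(3) = 3.0
      JA(3) = 2
C
C     THE RHS VECTOR B IS INPUT VIA U (AND IS OVERWRITTEN BY LSQR).
      U(1) = 1.0
      U(2) = 8.0
C
C     APROD TAKES ITS DATA FROM COMMON, SO IW AND RW ARE DUMMIES.
      LENIW  = 1
      LENRW  = 1
      DAMP   = 0.0
      ATOL   = 1.0E-6
      BTOL   = 1.0E-6
      CONLIM = 1.0E+6
      ITNLIM = 10
      NOUT   = 6
      CALL LSQR( M,N,APROD,DAMP, LENIW,LENRW,IW,RW,
     *           U,V,W,X,SE, ATOL,BTOL,CONLIM,ITNLIM,NOUT,
     *           ISTOP,ANORM,ACOND,RNORM,ARNORM,XNORM )
      WRITE(NOUT,900) ISTOP, X(1), X(2)
  900 FORMAT(/' ISTOP =', I3, '    X =', 2F10.5)
      STOP
      END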


[Figure 3: printed output from LSQR for the test problem P(20, 10, 1, 1) with damping parameter λ = 10⁻², as discussed in Section 6. The iteration log tabulates ITN, X(1), FUNCTION, COMPATIBLE, INCOMPATIBLE, NORM(ABAR), and COND(ABAR); the numeric table is not reproduced here.]

We note only that subroutine LSQR may be applied without change to systems (5.1)-(5.3). The effect of M is localized to the user's own subroutine APROD. For example, when MODE = 1, APROD for the last two systems should compute y + (AM⁻¹)x by first solving Mw = x and then computing y + Aw. Clearly it must be possible to solve systems involving M and Mᵀ very efficiently.
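Concretely, a preconditioned APROD might look as follows. This is our own sketch, not part of the published material: the routine name APRODM is hypothetical, M is assumed to be diagonal with elements RW(1), ..., RW(N), the matrix A is applied by APROD1 of Figure 2 using the COMMON data of Figure 1, and the local scratch arrays assume N ≤ 1000.

      SUBROUTINE APRODM( MODE,M,N,X,Y, LENIW,LENRW,IW,RW )
      INTEGER            MODE,M,N,LENIW,LENRW
      INTEGER            IW(LENIW)
      REAL               X(N),Y(M),RW(LENRW)
C
C     SKETCH OF AN APROD-STYLE ROUTINE FOR THE PRECONDITIONED
C     OPERATOR  A*M(INVERSE)  OF (5.2)-(5.3)  (ILLUSTRATION ONLY).
C     M IS ASSUMED DIAGONAL, WITH ELEMENTS RW(1),...,RW(N).
      REAL               RA
      INTEGER            JA,NA
      COMMON             RA(9000),JA(9000),NA(1000)
      INTEGER            J
      REAL               WK(1000),T(1000)
C
      IF (MODE .NE. 1) GO TO 300
C
C     MODE = 1  --  SET  Y = Y + (A*M(INVERSE))*X.
C     FIRST SOLVE  M*WK = X,  THEN SET  Y = Y + A*WK.
      DO 100 J = 1, N
         WK(J) = X(J)/RW(J)
  100 CONTINUE
      CALL APROD1( 1,M,N,WK,Y, 9000,9000,NA,JA,RA )
      RETURN
C
C     MODE = 2  --  SET  X = X + M(INVERSE)*A(TRANSPOSE)*Y
C     (M IS DIAGONAL, SO M(TRANSPOSE) = M).  FIRST ACCUMULATE
C     T = A(TRANSPOSE)*Y IN A ZEROED SCRATCH VECTOR.
  300 DO 310 J = 1, N
         T(J) = 0.0
  310 CONTINUE
      CALL APROD1( 2,M,N,T,Y, 9000,9000,NA,JA,RA )
      DO 320 J = 1, N
         X(J) = X(J) + T(J)/RW(J)
  320 CONTINUE
      RETURN
C     END OF APRODM
      END

With this routine LSQR returns the vector z of (5.2)-(5.3); the user then recovers the solution from Mx = z, here simply X(J) = Z(J)/RW(J).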
6. OUTPUT

Subroutine LSQR produces printed output on file NOUT, if the parameter NOUT is positive. This is illustrated in Figure 3, in which the least squares problem solved is P(20, 10, 1, 1) as defined in [6], with a slight generalization to include a damping parameter λ = 10⁻². (Single precision was used on an IBM 370/168.) The items printed at the kth iteration are as follows.

ITN           The iteration number k. Results are always printed for the first 10 and last 10 iterations. Intermediate results are printed if m ≤ 40 or n ≤ 40, or if one of the convergence conditions is nearly satisfied. Otherwise, information is printed every 10th iteration.

X(1)          The value of the first element of the approximate solution xk.

FUNCTION      The value of the function being minimized, namely ||r̄k|| = (||rk||² + λ²||xk||²)^(1/2).

COMPATIBLE    A dimensionless quantity which should converge to zero if and only if Ax = b is compatible. It is an estimate of ||r̄k|| / ||b||, which decreases monotonically.

INCOMPATIBLE  A dimensionless quantity which should converge to zero if and only if the optimum ||r̄k|| is nonzero. It is an estimate of ||Āᵀr̄k|| / (||Ā|| ||r̄k||), which is usually not monotonic.

NORM(ABAR)    A monotonically increasing estimate of ||Ā||F.

COND(ABAR)    A monotonically increasing estimate of cond(Ā) = ||Ā||F ||Ā⁺||F, the condition number of Ā.

ACKNOWLEDGMENT

The authors are grateful to Richard Hanson for suggestions that prompted several improvements to the implementation of LSQR.


REFERENCES

1. BJÖRCK, Å. A bidiagonalization algorithm for solving ill-posed systems of linear equations. Rep. LITH-MAT-R-80-33, Dep. Mathematics, Linköping Univ., Linköping, Sweden, 1980.
2. ELDÉN, L. Algorithms for the regularization of ill-conditioned least squares problems. BIT 17 (1977), 134-145.
3. HESTENES, M.R., AND STIEFEL, E. Methods of conjugate gradients for solving linear systems. J. Res. N.B.S. 49 (1952), 409-436.
4. LAWSON, C.L., AND HANSON, R.J. Solving Least Squares Problems. Prentice-Hall, Englewood Cliffs, N.J., 1974.
5. LAWSON, C.L., HANSON, R.J., KINCAID, D.R., AND KROGH, F.T. Basic linear algebra subprograms for Fortran usage. ACM Trans. Math. Softw. 5, 3 (Sept. 1979), 308-323 and (Algorithm) 324-325.
6. PAIGE, C.C., AND SAUNDERS, M.A. LSQR: An algorithm for sparse linear equations and sparse least squares. ACM Trans. Math. Softw. 8, 1 (March 1982), 43-71.
7. SMITH, G., AND CAMPBELL, F. A critique of some ridge regression methods. J. Am. Stat. Assoc. 75, 369 (March 1980), 74-81.
8. VARAH, J.M. A practical examination of some numerical methods for linear discrete ill-posed problems. SIAM Rev. 21 (1979), 100-111.
9. WOLD, S., WOLD, H., DUNN, W.J., AND RUHE, A. The collinearity problem in linear and nonlinear regression. The partial least squares (PLS) approach to generalized inverses. Rep. UMINF-83.80, Univ. Umeå, Umeå, Sweden, 1980.

ALGORITHM

[A part of the listing is printed here. The complete listing is available from the ACM Algorithms Distribution Service (see page 227 for order form).]

      SUBROUTINE LSQR( M,N,APROD,DAMP,
     *                 LENIW,LENRW,IW,RW,
     *                 U,V,W,X,SE,
     *                 ATOL,BTOL,CONLIM,ITNLIM,NOUT,
     *                 ISTOP,ANORM,ACOND,RNORM,ARNORM,XNORM )

      EXTERNAL           APROD
      INTEGER            M,N,LENIW,LENRW,ITNLIM,NOUT,ISTOP
      INTEGER            IW(LENIW)
      REAL               RW(LENRW),U(M),V(N),W(N),X(N),SE(N),
     *                   ATOL,BTOL,CONLIM,DAMP,
     *                   ANORM,ACOND,RNORM,ARNORM,XNORM
C
C     LSQR  FINDS A SOLUTION  X  TO THE FOLLOWING PROBLEMS...
C
C     1. UNSYMMETRIC EQUATIONS --    SOLVE  A*X = B
C
C     2. LINEAR LEAST SQUARES  --    SOLVE  A*X = B
C                                    IN THE LEAST-SQUARES SENSE
C
C     3. DAMPED LEAST SQUARES  --    SOLVE  (   A    )*X = ( B )
C                                           ( DAMP*I )     ( 0 )
C                                    IN THE LEAST-SQUARES SENSE
C
C     WHERE  A  IS A MATRIX WITH  M  ROWS AND  N  COLUMNS,  B  IS AN
C     M-VECTOR, AND  DAMP  IS A SCALAR (ALL QUANTITIES REAL).
C     THE MATRIX  A  IS INTENDED TO BE LARGE AND SPARSE.  IT IS
C     ACCESSED BY MEANS OF SUBROUTINE CALLS OF THE FORM
C
C                CALL APROD( MODE,M,N,X,Y,LENIW,LENRW,IW,RW )
C
C     WHICH MUST PERFORM THE FOLLOWING FUNCTIONS...
C
C                IF MODE = 1, COMPUTE  Y = Y + A*X.
C                IF MODE = 2, COMPUTE  X = X + A(TRANSPOSE)*Y.
C
C     THE VECTORS X AND Y ARE INPUT PARAMETERS IN BOTH CASES.
C     IF MODE = 1, Y SHOULD BE ALTERED WITHOUT CHANGING X.
C     IF MODE = 2, X SHOULD BE ALTERED WITHOUT CHANGING Y.
C     THE PARAMETERS LENIW, LENRW, IW, RW MAY BE USED FOR WORKSPACE
C     AS DESCRIBED BELOW.
C
C     THE RHS VECTOR  B  IS INPUT VIA  U,  AND SUBSEQUENTLY OVERWRITTEN.
C
C     NOTE.  LSQR USES AN ITERATIVE METHOD TO APPROXIMATE THE SOLUTION.
C     THE NUMBER OF ITERATIONS REQUIRED TO REACH A CERTAIN ACCURACY
C     DEPENDS STRONGLY ON THE SCALING OF THE PROBLEM.  POOR SCALING OF
C     THE ROWS OR COLUMNS OF  A  SHOULD THEREFORE BE AVOIDED WHERE
C     POSSIBLE.
C
C     FOR EXAMPLE, IN PROBLEM 1 THE SOLUTION IS UNALTERED BY
C     ROW-SCALING.  IF A ROW OF  A  IS VERY SMALL OR LARGE COMPARED TO
C     THE OTHER ROWS OF  A,  THE CORRESPONDING ROW OF  ( A  B )  SHOULD
C     BE SCALED UP OR DOWN.
C
C     IN PROBLEMS 1 AND 2, THE SOLUTION  X  IS EASILY RECOVERED
C     FOLLOWING COLUMN-SCALING.  IN THE ABSENCE OF BETTER INFORMATION,
C     THE NONZERO COLUMNS OF  A  SHOULD BE SCALED SO THAT THEY ALL HAVE
C     THE SAME EUCLIDEAN NORM (E.G. 1.0).
C
C     IN PROBLEM 3, THERE IS NO FREEDOM TO RE-SCALE IF  DAMP  IS
C     NONZERO.  HOWEVER, THE VALUE OF  DAMP  SHOULD BE ASSIGNED ONLY
C     AFTER ATTENTION HAS BEEN PAID TO THE SCALING OF  A.
C
C     THE PARAMETER  DAMP  IS INTENDED TO HELP REGULARIZE
C     ILL-CONDITIONED SYSTEMS, BY PREVENTING THE TRUE SOLUTION FROM
C     BEING VERY LARGE.  ANOTHER AID TO REGULARIZATION IS PROVIDED BY
C     THE PARAMETER  ACOND,  WHICH MAY BE USED TO TERMINATE ITERATIONS
C     BEFORE THE COMPUTED SOLUTION BECOMES VERY LARGE.
C
C
C     NOTATION
C     --------
C
C     THE FOLLOWING QUANTITIES ARE USED IN DISCUSSING THE SUBROUTINE
C     PARAMETERS...
C
C     ABAR   =  (   A    ),          BBAR  =  ( B )
C               ( DAMP*I )                    ( 0 )
C
C     R      =  B  -  A*X,           RBAR  =  BBAR  -  ABAR*X
C
C     RNORM  =  SQRT( NORM(R)**2  +  DAMP**2 * NORM(X)**2 )
C            =  NORM( RBAR )
C
C     RELPR  =  THE RELATIVE PRECISION OF FLOATING-POINT ARITHMETIC
C               ON THE MACHINE BEING USED.  FOR EXAMPLE, ON THE IBM
C               370, RELPR IS ABOUT 1.0E-6 AND 1.0D-16 IN SINGLE AND
C               DOUBLE PRECISION RESPECTIVELY.
C
C     LSQR  MINIMIZES THE FUNCTION  RNORM  WITH RESPECT TO  X.
C
C
C     PARAMETERS
C     ----------
C
C     M       INPUT      THE NUMBER OF ROWS IN  A.
C
C     N       INPUT      THE NUMBER OF COLUMNS IN  A.
C
C     APROD   EXTERNAL   SEE ABOVE.
C
C     DAMP    INPUT      THE DAMPING PARAMETER FOR PROBLEM 3 ABOVE.
C                        (DAMP SHOULD BE 0.0 FOR PROBLEMS 1 AND 2.)
C                        IF THE SYSTEM  A*X = B  IS INCOMPATIBLE,
C                        VALUES OF DAMP IN THE RANGE
C                        0 TO SQRT(RELPR)*NORM(A) WILL PROBABLY HAVE
C                        A NEGLIGIBLE EFFECT.  LARGER VALUES OF DAMP
C                        WILL TEND TO DECREASE THE NORM OF X AND TO
C                        REDUCE THE NUMBER OF ITERATIONS REQUIRED BY
C                        LSQR.
C
C                        THE WORK PER ITERATION AND THE STORAGE NEEDED
C                        BY LSQR ARE THE SAME FOR ALL VALUES OF DAMP.
C
C     LENIW   INPUT      THE LENGTH OF THE WORKSPACE ARRAY IW.
C     LENRW   INPUT      THE LENGTH OF THE WORKSPACE ARRAY RW.
C     IW      WORKSPACE  AN INTEGER ARRAY OF LENGTH LENIW.
C     RW      WORKSPACE  A REAL ARRAY OF LENGTH LENRW.
C
C             NOTE.  LSQR DOES NOT EXPLICITLY USE THE PREVIOUS FOUR
C             PARAMETERS, BUT PASSES THEM TO SUBROUTINE APROD FOR
C             POSSIBLE USE AS WORKSPACE.  IF APROD DOES NOT NEED
C             IW OR RW, THE VALUES LENIW = 1 OR LENRW = 1 SHOULD
C             BE USED, AND THE ACTUAL PARAMETERS CORRESPONDING TO
C             IW OR RW MAY BE ANY CONVENIENT ARRAY OF SUITABLE TYPE.
C
C     U(M)    INPUT      THE RHS VECTOR  B.  BEWARE THAT  U  IS
C                        OVER-WRITTEN BY LSQR.
C
C     V(N)    WORKSPACE
C     W(N)    WORKSPACE
C
C     X(N)    OUTPUT     RETURNS THE COMPUTED SOLUTION  X.
C
C     SE(N)   OUTPUT     RETURNS STANDARD ERROR ESTIMATES FOR THE
C                        COMPONENTS OF  X.  FOR EACH  I,  SE(I) IS SET
C                        TO THE VALUE  RNORM * SQRT( SIGMA(I,I) / T ),
C                        WHERE SIGMA(I,I) IS AN ESTIMATE OF THE I-TH
C                        DIAGONAL OF THE INVERSE OF
C                        ABAR(TRANSPOSE)*ABAR
C                        AND  T = 1      IF  M .LE. N,
C                             T = M - N  IF  M .GT. N  AND  DAMP = 0,
C                             T = M      IF  DAMP .NE. 0.
C
C     ATOL    INPUT      AN ESTIMATE OF THE RELATIVE ERROR IN THE DATA
C                        DEFINING THE MATRIX  A.  FOR EXAMPLE,
C                        IF  A  IS ACCURATE TO ABOUT 6 DIGITS, SET
C                        ATOL = 1.0E-6 .
C
C     BTOL    INPUT      AN ESTIMATE OF THE RELATIVE ERROR IN THE DATA
C                        DEFINING THE RHS VECTOR  B.  FOR EXAMPLE,
C                        IF  B  IS ACCURATE TO ABOUT 6 DIGITS, SET
C                        BTOL = 1.0E-6 .
C
C     CONLIM  INPUT      AN UPPER LIMIT ON COND(ABAR), THE APPARENT
C                        CONDITION NUMBER OF THE MATRIX  ABAR.
C                        ITERATIONS WILL BE TERMINATED IF A COMPUTED
C                        ESTIMATE OF COND(ABAR) EXCEEDS CONLIM.
C                        THIS IS INTENDED TO PREVENT CERTAIN SMALL OR
C                        ZERO SINGULAR VALUES OF  A  OR  ABAR  FROM
C                        COMING INTO EFFECT AND CAUSING UNWANTED GROWTH
C                        IN THE COMPUTED SOLUTION.
C
C                        CONLIM AND DAMP MAY BE USED SEPARATELY OR
C                        TOGETHER TO REGULARIZE ILL-CONDITIONED SYSTEMS.
C
C                        NORMALLY,  CONLIM  SHOULD BE IN THE RANGE
C                        1000 TO 1/RELPR.
C                        SUGGESTED VALUE --
C                        CONLIM = 1/(100*RELPR)  FOR COMPATIBLE SYSTEMS,
C                        CONLIM = 1/(10*SQRT(RELPR)) FOR LEAST SQUARES.
C
C             NOTE.  IF THE USER IS NOT CONCERNED ABOUT THE PARAMETERS
C             ATOL, BTOL AND CONLIM, ANY OR ALL OF THEM MAY BE SET TO
C             ZERO.  THE EFFECT WILL BE THE SAME AS THE VALUES RELPR,
C             RELPR AND 1/RELPR RESPECTIVELY.
C
C     ITNLIM  INPUT      AN UPPER LIMIT ON THE NUMBER OF ITERATIONS.
C                        SUGGESTED VALUE --
C                        ITNLIM = N/2  FOR WELL CONDITIONED SYSTEMS,
C                        ITNLIM = 4*N  OTHERWISE.
C
C     NOUT    INPUT      FILE NUMBER FOR PRINTER.  IF POSITIVE,
C                        A SUMMARY WILL BE PRINTED ON FILE NOUT.
C
C     ISTOP   OUTPUT     AN INTEGER GIVING THE REASON FOR TERMINATION...
C
C                0       X = 0  IS THE EXACT SOLUTION.
C                        NO ITERATIONS WERE PERFORMED.
C
C                1       THE EQUATIONS  A*X = B  ARE PROBABLY
C                        COMPATIBLE.  NORM(A*X - B)  IS SUFFICIENTLY
C                        SMALL, GIVEN THE VALUES OF ATOL AND BTOL.
C
C                2       THE SYSTEM  A*X = B  IS PROBABLY NOT
C                        COMPATIBLE.  A LEAST-SQUARES SOLUTION HAS
C                        BEEN OBTAINED WHICH IS SUFFICIENTLY ACCURATE,
C                        GIVEN THE VALUE OF ATOL.
C
C                3       AN ESTIMATE OF COND(ABAR) HAS EXCEEDED
C                        CONLIM.  THE SYSTEM  A*X = B  APPEARS TO BE
C                        ILL-CONDITIONED.  OTHERWISE, THERE COULD BE AN
C                        ERROR IN SUBROUTINE APROD.
C
C                4       THE EQUATIONS  A*X = B  ARE PROBABLY
C                        COMPATIBLE.  NORM(A*X - B)  IS AS SMALL AS
C                        SEEMS REASONABLE ON THIS MACHINE.
C
C                5       THE SYSTEM  A*X = B  IS PROBABLY NOT
C                        COMPATIBLE.  A LEAST-SQUARES SOLUTION HAS
C                        BEEN OBTAINED WHICH IS AS ACCURATE AS SEEMS
C                        REASONABLE ON THIS MACHINE.
C
C                6       COND(ABAR) SEEMS TO BE SO LARGE THAT THERE IS
C                        NOT MUCH POINT IN DOING FURTHER ITERATIONS,
C                        GIVEN THE PRECISION OF THIS MACHINE.
C                        THERE COULD BE AN ERROR IN SUBROUTINE APROD.
C
C                7       THE ITERATION LIMIT  ITNLIM  WAS REACHED.
C
C     ANORM   OUTPUT     AN ESTIMATE OF THE FROBENIUS NORM OF  ABAR.
C                        THIS IS THE SQUARE-ROOT OF THE SUM OF SQUARES
C                        OF THE ELEMENTS OF  ABAR.
C                        IF  DAMP  IS SMALL AND IF THE COLUMNS OF  A
C                        HAVE ALL BEEN SCALED TO HAVE LENGTH 1.0,
C                        ANORM  SHOULD INCREASE TO ROUGHLY  SQRT(N).
C                        A RADICALLY DIFFERENT VALUE FOR  ANORM  MAY
C                        INDICATE AN ERROR IN SUBROUTINE APROD (THERE
C                        MAY BE AN INCONSISTENCY BETWEEN MODES 1 AND 2).
C
C     ACOND   OUTPUT     AN ESTIMATE OF COND(ABAR), THE CONDITION
C                        NUMBER OF  ABAR.  A VERY HIGH VALUE OF ACOND
C                        MAY AGAIN INDICATE AN ERROR IN APROD.
C
C     RNORM   OUTPUT     AN ESTIMATE OF THE FINAL VALUE OF NORM(RBAR),
C                        THE FUNCTION BEING MINIMIZED (SEE NOTATION
C                        ABOVE).  THIS WILL BE SMALL IF  A*X = B  HAS
C                        A SOLUTION.
C
C     ARNORM  OUTPUT     AN ESTIMATE OF THE FINAL VALUE OF
C                        NORM( ABAR(TRANSPOSE)*RBAR ), THE NORM OF
C                        THE RESIDUAL FOR THE USUAL NORMAL EQUATIONS.
C                        THIS SHOULD BE SMALL IN ALL CASES.  (ARNORM
C                        WILL OFTEN BE SMALLER THAN THE TRUE VALUE
C                        COMPUTED FROM THE OUTPUT VECTOR  X.)
C
C     XNORM   OUTPUT     AN ESTIMATE OF THE NORM OF THE FINAL
C                        SOLUTION VECTOR  X.
C
C
C     SUBROUTINES AND FUNCTIONS USED
C     ------------------------------
C
C     USER       APROD
C     LSQR       NORMLZ
C     BLAS       SCOPY,SNRM2,SSCAL  (SEE LAWSON ET AL. BELOW)
C                (SNRM2 IS USED ONLY IN NORMLZ)
C     FORTRAN    ABS,MOD,SQRT
C
C
C     PRECISION
C     ---------
C
C     THE NUMBER OF ITERATIONS REQUIRED BY LSQR WILL USUALLY DECREASE
C     IF THE COMPUTATION IS PERFORMED IN HIGHER PRECISION.  TO CONVERT
C     LSQR AND NORMLZ BETWEEN SINGLE- AND DOUBLE-PRECISION, CHANGE THE
C     WORDS
C                SCOPY, SNRM2, SSCAL
C                ABS, REAL, SQRT
C     TO THE APPROPRIATE BLAS AND FORTRAN EQUIVALENTS.
C
C
C     REFERENCES
C     ----------
C
C     PAIGE, C.C. AND SAUNDERS, M.A.  LSQR: AN ALGORITHM FOR SPARSE
C          LINEAR EQUATIONS AND SPARSE LEAST SQUARES.
C          ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE 8, 1 (MARCH 1982).
C
C     LAWSON, C.L., HANSON, R.J., KINCAID, D.R. AND KROGH, F.T.
C          BASIC LINEAR ALGEBRA SUBPROGRAMS FOR FORTRAN USAGE.
C          ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE 5, 3 (SEPT 1979),
C          308-323 AND 324-325.
C
C
C     LSQR.   THIS VERSION DATED 22 FEBRUARY 1982.
C-----------------------------------------------------------------------
