Documente Academic
Documente Profesional
Documente Cultură
IN APPLIED MATHEMATICS
Handbook for
Matrix Computatiions
Handbook for
Matrix Computations
Eric Grosse
AT&T Bell Laboratories
Murray Hill, New Jersey
Handbook for
Matrix Computations
Thomas F. Coleman
Charles Van Loan
Comell University
Philadelphia 1988
All rights reserved. Printed in the United States of America. No part of this book may be
reproduced, stored, or transmitted in any manner without the writLen permission of the
Publisher. For information, write the Society for Industrial and Applied Mathematics,
3600 University City Science Center, Philadelphia, Pennsylvania 19104-2688.
Library of Congress Catalog Card Number 88-61637.
ISBN 0-69871-227-0
Copyright 01988 by the Society for Industrial and Applied Mathematics.
Second printing July 1989.
Third printing April 1991.
Contents
vii
Preface
105
107
112
119
125
132
Section 2.1:
Section 2.2:
Section 2.3:
Section 2.4:
Section 2.5:
Chapter 3: LINPACK
139
4
147
152
161
166
175
180
Bookkeeping Operations
Vector Operations
Norm Computations
Givens Rotations
Double Precision and Complex Versions
Section 3.1:
Section 3.2:
Section 3.3:
Section 3.4:
Section 3.5:
Section 3.6:
Section 3.7:
Triangular Systems
General Systems
Symmetric Systems
Banded Systems
The QR Factorization
The Singular Value Decomposition
Double Precision and Complex Versions
Chapter 4: MATLAB
189
203
212
219
232
243
256
Section 4.1:
Section 4.2:
Section 4.3:
Section 4.4:
Section 4.5:
Section 4.6:
Section 4.7:
261
References
262
Index
Basics
Loops and Conditionals
Working with Submatrices
Built-in Functions
Functions
Factorization
Miscellaneous
Preface
This handbook can be used as a reference by those actively engaged in
scientific computation. It can also serve as a practical companion text in a
numerical methods course that involves a significant amount of linear algebraic
computation, The book has four chapters, each being fairly independent of the
others.
Our treatment of Fortran 77 in Chapter 1 involves a much stronger emphasis
on arrays than is accorded by other authors. We also assume that the reader has
experience with some high-level programming language. This might be in the
form of a recent course in Pascal or a course in Fortran taken many years ago
and now half-forgotten.
The second chapter is about the Basic Linear Algebra Subprograms (BLAS).
The elementary linear algebra that underpins the BLAS makes them a good
vehicle for acquainting the beginning student with modular programming and
the importance of "thinking vector" when organizing a matrix computation.
Chapter 3 is concerned with LINPACK, a highly acclaimed package that is
suitable for many linear equation and least square calculations. The last chapter
is about MATLAB, an interactive system in which it is possible to couch
sophisticated matrix computations at a very high level.
A one-semester course in matrix algebra (or the equivalent) is required to
understand most of the text.
Because the book spans several levels of practical matrix computations, it can
fit into a number of canonically structured numerical methods courses. At
Cornell we use Chapters 1 and 2 in our one-semester introductory numerical
methods course. In this course it is assumed that the students are acquainted
with Pascal. That is why our treatment of Fortran is brisker than what would be
found in a "pure" Fortran text. In our graduate-level numerical analysis courses
we use Chapters 2, 3, and 4 heavily, with Chapter 1 serving as a reference,
The BLAS and LINPACK are in the public domain and are distributed at cost
through Argonne National Laboratory. MATLAB is available from MGA Inc.,
73 Junction Square Dr., Concord, MA 01742.
We are indebted to Nick Higham, Bill Coughran, and Eric Grosse for catching numerous typographical errors and for making many valuable suggestions.
Thomas F, Coleman
Charles Van Loan
Chapter 1
A Subset of Fortran 77
Our treatment of Fortran 77 (F"77)assumes that the reader is already familiar
with some high-level language, e.g., PASCAL. There are many books devoted
to the presentation of the basics of Fortran 77(see, e.g., Zwass [8] ). We do not
attempt to be exhaustive at this level. For example, we say nothing about the
"opening" and "closing" of files, and character manipulation is mentioned only
briefly. Rather, our emphasis is on matrix computations and we have attempted
to be complete in this regard. We pay special attention to arrays since the
implementation of matrix algorithms is an underlying theme of the book.
We believe that it is still important for students with a serious interest in
scientific computation to be familiar with Fortran. This is not to say that we are
advocating that all scientific computing be done in Fortran. On the contrary, the
most appropriate language for a particular application at hand should be used.
The language "C" is an increasingly popular choice. MATLAB is well suited to
dense matrix computations and graphical work. However, we strongly believe
that high-quality subroutines should be used as building blocks whenever
possible. For reasons of portability and standardization, this software is almost
always Fortran software and will probably continue to be so for many years.
Therefore, familiarity with Fortran is essential.
A scientific programmer working in a language other than Fortran can
often use Fortran subroutines directly, provided the language and computer
system support such an interface. Otherwise a direct translation can be used,
provided that extreme care is taken. In either case, Fortran knowledge is
required.
We make two final observations about the use of Fortran. First, Fortran
should seriously be considered when developing general software for a basic
mathematical computation with widespread applicability. Fortran programs that
adhere to professionally set "standards" are easily ported to other computing
systems. This is not true for other languages. Second, it is important to realize
that Fortran is not a "dead" programming language. The "modernization" of
Fortran (and the subsequent updating of the "standards") is an ongoing process.
Fortran 77 is now widely used and accepted (replacing Fortran 66). Fortran 8X
is currently being developed. Each new Fortran basically inherits the old version
as a subset to which the extensions add power and flexibility.
1.1 BASICS
We begin the discussion with a Fortran program that computes the
surface area of a sphere from the formula A = 4m2:
program area
real r, area
C
c
c
c
read(*,*)r
area = 4. *3. 14159*rkr
write (*, * ) area
stop
end
The purpose of each line in t3k program is quite evident. Those lines that begin
with a "c" are comments. The read and write statements perform input and
output. The computation of the surface area takes place in the arithmetic
assignment statement "area = ...". The beginning and end of the program are
indicated by the program and the end statements. Execution is terminated by
the stop. The memory locations for the variables used by the program are set
aside by the line "real r, area" .
In this section we elaborate on these and a few other elementary
constructs.
Program Organization
A Fortran program generally consists of a main program (or "driver") and
several subprograms (or "procedures"). Typically the main program and a few
of the subprograms are written by the user. Other subprograms may come from
a "library." Subprograms are discussed later.
A main program is structured as follows:
1 FORTRAN 77
program { name ]
{ other statements )
stop
end
Each line in a Fortran code must conform to certain column position rules.
Column 1
Columns 2-5
Column 6
Columns 7 -72
Columns 73-80
:
:
:
:
:
Comments
A line that begins with a "c"in the first column is a comment. Comments
may appear anywhere in a program and are crucial to program readability,
Comments should be informative, well written, and sufficiently "set off" from
the body of the program. To accomplish the latter begin and end each comment
block with a blank comment as in the surface area program above.
Sequence Numbers
Every line in a program can be numbered in columns 73-80. This used to
be a common practice but in the age of screen editors sequence numbering has
become less useful for debugging.
1.1 BASICS
read(*,*)
15
( list-of-variables ]
write(*,*) ( list-ofvariables ]
Thus,
reads three items of data and stores them in the variables named x, y, and z.
Similarly,
physically obtains its data and where a write physically sends its output. The
"asterisk" notation in a read or a write specifies a default device, e.g., the
keyboard, the terminal screen, etc. How the default devices are set depends
upon the system and it is necessary to obtain local instructions for their use.
Names in Fortran
Names in Fortran must involve no more than six characters chosen from
the alphanumeric set
The name must begin with a letter. In our examples we do not distinguish
1 FORTRAN 77
between upper and lower case. However, it should be noted that a few F77
compilers accept only uppercase input.
integer
list-of-variables
>
list-of-variables
>
list-of-variables 1
I list-ofvariables 1
list-of-variables 1
There are a few rules to follow regarding the naming and typing of variables.
Variable names must begin witlitaletter and must be no more than six
alphanumeric characters in length.
Each variable should be declared exactly once. Automatic typing
occurs otherwise. This means that variables whose names begin
with letters i through n are integers and all others are real.
1.1 BASICS
17
Integer Variables
Numbers stored in integer variables are represented in Lfixedpoint style.
In a typical computer 32 bits, bo, b1,..., b31, might be allocated for each
integer variable x with the convention that x has the value
The notation (
Note that because of the finiteness of an integer "word" there is an upper bound
on the size of the integers that can be represented. In the 32-bit example, only
integers in the interval [ -m , m 1, where m = 231 - 1 z 2 x 109 , can be
represented.
1 FORTRAN 77
and
e = (-1)bl x (b30 -. b24 )2
with the convention that x = m 2 e . Stipulating that bo # 0 makes the
representation of a given floating point number unique. (An exception to this
rule is required when x = 0. )
Double precision variables represent numbers in the same fashion as do
real variables, but more space is allocated per variable. For example, if 32-bit
words are used for real variables then typically 64-bit words would be used for
double precision variables. This leaves more bits for mantissa specification,
e.g., 56 bits instead of 24 .
Regardless of precision, there are finitely many floating point numbers,
and rounding errors generally arise with every arithmetic operation. Moreover,
an arithmetic operation (such as a divide by a very small number) may lead to a
number that is "too big" to represent in the floating point system. Overflow
results, a situation that usually leads to program termination. Some of the
hazards of floating point computation are discussed in 8 1.9.
Arithmetic Assignment
An arithmetic assignment statement has the form
{
variable name
) =
expression
The expression on the right-hand side is evaluated according to precise rules and
the result is stored in the memory location corresponding to the variable on the
left-hand side. The values of the variables on the right-hand side do not change
as a consequence of the assignment. For example,
would compute the area of the circle and store the result in the variable A,
assuming that p i and r are appropriately initialized. An asterisk denotes
mu1tiplication, whereas a double asterisk specifies exponentiation.
1.1 BASICS
19
-b
sqrt (b*b
4 . *a*c) / 2 . *a
The expression to the right of the "=" involves one of Fortran's numerous
"built-in" (or intrinsic) functions, the square root. We introduce various built-in
functions throughout the text. A complete list is given in Appendix 1.
Readers familiar with solving quadratic equations will recognize that the
above assignment statement does not compute a zero of ax2 + bx + c = 0 .
The order in which the operations are to be performed is not correctly specified.
Indeed, the above assignment statement is equivalent to
root
-b
-t (
) *a )
-b
/ ( 2 . *a)
10
1 FORTRAN 77
b-4.
c ) ) / (2.*a)
is better than
Complicated Expressions
Fortran statements must appear between columns 7 and 72 (inclusive) of
the input record. A long arithmetic assignment may not "fit" into this space. If
this is the case then it can be continued on the next line, provided that any
nonzero character is placed in column six. We use the ampersand "&"for this
purpose. Thus, the single line assignment
is equivalent to
a = b - c - d - e - f - g &
h - k
assuming that the "&" is situated in column 6. When spreading a statement over
two or more lines, break the statement in a natural place, i-e., not in the middle
of a variable name.
String Manipulation
The manipulation of string data is not particularly central to traditional
write
'
(*, * ) s t r
stop
end
A character variable has a size that is determined in the declaration. The "20" in
" c h a r a c t e r * 2 0 " means that s t r can hold strings comprised of 20 or fewer
characters. If no number is specified, the default length is 1, e.g.,
character str
Literal strings are surrounded by quotes when they appear in an expression. The
string computations that are supported by M7 include concatenation, substring
assignment, and substring extraction. For example, if s l = abcde and s 2
= ' x y z 1 then
concatenation:
s = sl / / s2
substring assignment: s ( 2 :4 ) = ' 1 23
substringextraction: s = s l ( 2 : 5 )
'abcdexyz
s
s
'a123e1
'bcde'
The length of a string and the detection of specified substrings is handled with
the built-in functions length and index. Simple examples suffice to define how
these functions work. If s = ' abcdabcd ' then
=
length(s)
index(s, 'da')
k
k
k = 8
k = 4
k = 2
k = O
12
I FORTRAN 77
The length function returns the length of the string named, not the length of the
string variable that contains the string. If s = " , the empty string, then length
returns zero. The index function also returns an integer. In a reference of the
form
k = index (sl,s2)
index looks for an occurrence of the second string in the first string. If no
occurrence is found, then zero is returned. Otherwise, the "starting" slot of the
first occurrence is returned.
Type Conversion
Variables must be assigned values that correspond to their type. If an
expression involves a mixture of types or if the result is to be placed in a variable
of a different type, then it is good practice to force the necessary type conversion
explicitly by using the following functions:
Function
Argument
Type
Result
Type
int
real
dble
ichar
char
x
x
x
c
i
r
d
i
c
L
Here, "c" stands for character, "r" stands for real, "dmstands for double
precision, "i" stands for integer, and "x" stands for real, double precision, or
integer.
For example, if ksum is an integer variable and sum is real then
sum
real (ksum)
1.1 BASICS
1 13
dble (x*y)
is preferable to
a**float (i)
+ b**float(i)
14
1 FORTRAN 77
c
exp ( f l o a t ( i )
* l o g ( a ) ) +exp ( f l o a t (i)*log (b))
is equivalent to
because the "9" and "5" are treated as integers and the integer quotient 9/5 yields
1. Double precision constants should be specified using the "d" notation. For
example, 9.0d0, 90.d-1, and .9dl each specify a double precision version of
"9." Thus, for correct double precision centigrade-to-Fahrenheit conversion one
should use
This is equivalent to
Not using the "d" and "en specification format makes a program vulnerable to
1.1 BASICS
1 15
type errors. It is good practice always to use the "d" and "e" notation when
specifying constants.
c
c
c
read(*, * )r
area = fourpi*r*r
w r i t e (*, *) area
stop
end
parameter
pi
3.14159,
2.718e0 )
16
1 FORTRAN 77
A variable can appear in at most one parameter statement.
The parameter statement@)in a program must come before the first
executable statements.
zero
O.OeO,
one
1.0e0 )
This helps reduce typographical errors and makes it easier to convert a code
fiom one precision to another.
Let t h e t a be t h e s i z e
gram p r i n t s i n t e g e r s d and m
t h e t a equals
d degrees
and r e a l s so t h a t
m minutes t s seconds
paxametex ( p i
3.14159e0
, cl
60.e0,
c2
180.e0)
read (*, * ) t h e t a
d =
S =
wxite ( * , * ) d , m,
stop
end
Do not declare any additional variables. Recall that .x: radians equals 180" and
3600 seconds = 60 minutes = lo.
1.1 BASICS
1 17
Logical Variables
Just as numeric values can be stored in real, integer, and double precision
variables, truth values can be stored in variables that have been typed "logical."
Thus, the declaration
l o g i c a l a, b
establishes two variables, each of which may be assigned either the value
.TRUE. or the value .FALSE. .The periods are part of the syntax and so, for
example,
.TRUE.
to a.
Logical Expressions
The truth value of a logical expression can be assigned to a logical
variable. A logical expression is formed by comparing arithmetic expressions
with the following relational operators :
.L T.
.LE.
.
.
.GT .
.EQ
.NE
.GE
less than
less thanor equal
quai
not equal
greater than
greater than or equal
1 19
is .TRUE. if and only if the contents of y and z when added exactly equal
the value of x.
Logical expressions can be combined using the logical operators AND,
OR, and NOT. These operators are defied by the following truth tables:
L
a .AND.
b
I
.F A L S E .
.FALSE.
.TRUE.
.TRUE.
.F A L S E .
.FALSE .
.FALSE.
.FALSE.
.TRUE.
.FALSE.
.TRUE.
.TRUE.
.F A L S E .
.FALSE.
.TRUE.
.TRUE.
.FALSE.
.TRUE.
.FALSE.
.TRUE.
a
a .OR.
.FALSE.
TRUE.
TRUE.
TRUE.
.
.
.
.NOT.
.FALSE. .TRUE.
.TRUE.
.F A L S E .
On the
20
is
1 FORTRAN 77
.TRUE.
"if'" Constructs
There are several if constructs in F77, The simplest is the logical if
statement:
if ( { logical expression j )
( executable statement )
For example, the following statement prints the value of count if it is positive:
if
c o u n t .GT. 0 )
1 21
w r i t e (*, * ) c o u n t
If "there is more than one thing to do" when the tested logical expression
is true then use the construction
then
If there are different things to do depending upon the value of the con&tiod
then use the if-then-else constmction:
else
{ statements )
Thus,
if
c o u n t .GT. 0 )
then
w r i t e (*, * ) c o u n t
count = count - 1
else
c o u n t = 100
endif
(
22
1 FORTRAN 77
then
{ statements j
then
{ statements j
Execution flows from top to bottom. The fust true logical expression prompts
the execution of the associated conditional code. Control is then transferred to
the next statement after the endif. The final else is optional.
Here is a program that reads three distinct integers and prints the median:
program median
i n t e g e r x, y, z , median
read(*,*)x,y,z
i f ( (x-z) * ( x - y ) .LT. 0 ) t h e n
median = x
e l s e i f ( (y-x) * ( y - z ) .LT. 0 ) t h e n
median = y
else
median = z
endif
w r i t e (*, *)median
stop
end
I 23
then
else
write(*,*)x,
endi f
endif
z, y
Stylistic Considerations
We have chosen to capitalize the relational operators for emphasis. A
more important stylistic concern involves indentation. Appropriate indentation
enhances the readability of if-then-else constructs. To appreciate this point
consider the unindented version of the preceding program segment:
if
if
x .GT.
y .GT.
w r i t e (*, * )
X,
.AND.
z ) then
y, z
x .GT.
z) then
else
write(*,*)x,
endif
endif
z, y
24
1 FORTRAN 77
2. Assume that x, y, and z are integer variables containing three distinct values.
Write a program segment that reananges the contents of these variables so that
has value .TRUE. Strive to make your program segment as short as possible.
Use as many auxiliary variables as you please.
4. Assume that a and b are initialized logical variables. Show how the
following program segments can be rewritten without nesting.
(a)
if
if
then
( b ) then
)
i = l
else
i = 2
endif
endif
(b)
if
if
then
( .NOT. b )
)
else
i f ( b )
endiE
i = 2
1.3 LOOPS
F77 supports one loop: the do-loop. However, it is possible to simulate
while- and until -loops using if constructs.
While-Loops
A while-loop can be realized as follows:
( label )
goto { label }
endif
The body of the loop is executed so long as the conditional is true. The goto
statement passes control to the labeled if statement. The label is a number
between 1 and 99,999 and must be situated within columns 2 through 5. All the
labels in a program must be unique.
To illustrate the while-loop construct consider the generation of the
Fibonacci sequence { X n ) defined by the recurrence Xn = xn-l + xn-2 with
x-1 = xo = 1. The following program prints Fibonacci numbers that are
strictly less than 200:
program f i b o n
i n t e g e r x, y, z
10
x = l
y = l
i f (x
wri
z =
y =
x =
.LT. 200)
t e ( * , *)x
x & y
x
z
got0 10
endif
stop
end
then
26
1 FORTRAN 77
Until-Loops
Until loops can also be implemented with the if and goto constructs:
{label)
continue
{ statements )
10
program: f i b o n a c c i
i n t e g e r x, y, z
x = l
y = l
continue
z = x + y
y = x
x = z
write(*,*)x
if ( X .LE. 200) g o t o 1 0
stop
end
1.3 LOOPS
1 27
= Z
write(*,*)x
10 c o n t i n u e
s*j
n-1,
n-k,
-1
20 c o n t i n u e
The do statement here says "step from n-1 to n - k in steps of -1." This
example reveals the general form of the do-loop:
continue
Here, var is the loop index, exp.1 is an expression that specifies the initial
value of var, exp.2 is an expression that specifies the terminating bound, and
exp.3 is an expression that defines the increment. To clarify do-loop
termination when the counting does not "land" on the termination value we
remark that
do 1 0 c o u n t
( loop body
10 continue
il, i 2 ,
i3
28
I FORTRAN 77
is equivalent to
c o u n t = il
10 i f ( c o u n t .LE. i 2
{
loop body
then
count = count
g o t 0 10
e n d if
i3
loop body
count = count
g o t 0 10
e n d if
then
i3
Note that the entire loop is skipped if the termination criteria are fulfilled when
the do statement is initially processed.
Follow these rules when constructing do-loops:
The do-loop variable, as well as the expressions in the do-loop
statement, should have type integer.
It is permissible to omit the third do-loop parameter, which specifies
the increment. When this is the case a unit increment is presumed.
The do-loop variable must never be changed by statements
within the body of the loop.
Nesting Do-Loops
Do-loops may be nested. Here is a program segment that prints all integer
triplets ( i, j, k ) with the property that I I i I j I 100 and k2 = i2 +
j2.
1.3 LOOPS
1 29
do 10 i = 1,100
do 5 j = i f100
k = int( sqrt( real(i*i + j*j))
if (k*k .EQ. i*i + j*j) then
write(*,*)i, j,k
endif
5
continue
10 continue
{ label )
30
1 FORTRAN 77
One final reminder: a do-loop must be entered "from the top." Never jump
into the body of a do-loop.
This quantity is always an integer. Correct the following program so that it reads
n and k and prints bi:
program bi
integer n, k, bi, j
bi = 1
read(*,*)n, k
do 1 0 j = 1 , k
bi = bi*((n - j + I)/])
1 0 continue
write ( * , *)bi
stop
end
1.3 LOOPS
1 31
2. Here is Euclid's algorithm for finding the greatest common divisor g of two
positive integers a and b :
Step 1. Divide a into b and let r be the remainder.
Step2. If r = 0 then set g = a andquit.
Step3. Seta = b , b = r , andgo to Step 1.
Implement this algorithm with a while-loop construct,
3. The following program segment computes the n = 10 partial sum of the
Taylor series for ex :
sum = 1 . 0
term = 1.0
do 1 0 n = 1 , 1 0
t e r m = t e r m * ( x / r e a l (n))
sum = sum + t e r m
1 0 continue
Change the segment so that upon completion s u m contains the tenth partial
sum for (a) sin(x), (b) cos(x), and (c) cosh(x) .
4. Write a program that prints the number of integral index pairs (a$) of the
form 1 I a c b I 100 with the property that a divides b.
5. Express the following as a while-loop:
eps = 1 . 0
do 1 0 k = 1 , 1 0 0
eps = e p s / 2 . 0
e p s p l = eps + 1 . 0
if ( e p s p l .EQ. eps)
1 0 continue
2 0 w r i t e ( * , * ) eps
go t o 2 0
1.4 ARRAYS
Many scientific computations require manipulation of matrices and
vectors. Fortran stores matrices and vectors in arrays. An array is just a
contiguous block of memory with an implicit indexing scheme. An array has a
data type.
One-dimensional Arrays
The declaration
r e a l a(100)
identifies b as a one-dimensional real array of size 100, with index range -41
1.4 ARRAYS
1 33
Double precision, integer, and logical arrays are also possible, e.g.,
d o u b l e p r e c i s i o n r (100)
i n t e g e r i ( 0 :9 9 )
logical b (10)
(-100:100)
1. Iffirst-index =
integer k(l:100)
34
1 FORTRAN 77
i n t e g e r x (10)
do 5 i = 4 , 7
x ( i ) = i-3
continue
Clearly, the size of the vector must be less than or equal to the size of the
holding array,
Two-dimensional Arrays
The declaration
i n t e g e r A (4,5)
d o 10 j = 1 , 5
do 5 i = 1 , 4
s = s + A(i,j)
5
continue
10 c o n t i n u e
1 35
1.4 ARRAYS
1) (last-index2
- first -index2 +
1)
Again, if an index range is from "I to something" you can avoid the colon
notation:
integer k(20,lOO)
integer k(l:20,1:100)
i*j
10
continue
20 c o n t i n u e
It is not necessary for the size of the matrix to match the size of the array but, of
course, the matrix must fit into the array. In the example above, the row and
column dimensions of the matrix A are less than or equal to the size of the row
and column dimensions of the array A . The matrix sits in the "aorthwest" comer
of the array:
36
1 FORTRAN 77
..............................
......................
...............
......;.. .,...; ............ :....*.'
:.:.........................
.:.:.:.:<..:.:.:.:.:;
..............................
.................
:.:.:.:.:.:.>:..:...:,..:..:.;:..:.: ................
:.:.:.:.:.:.:.:.:.:.:,:,:.:.
.......cm....:.........c..
..............
..........,... ,' ..................
..................... ........:-.;.,.... :.:.:.:.>.....
:.:.:,:.>.................. :.:,:.:+:.:.>:.:.:.:C
' .:
...........................................
.............................
..............................
.............................
...,
.
..:.::>>:.:.:<$?:.:.:< ...
...........
.............................
....................................................
.......)...>....>..
. :.>:.>:.:.:.>w.:.:
.............................
~k.x;~.~:.:.:.:.:.:.
>
:.:S
:.:.:.:.;.'.".. ...
.....
'.:"'.:.:.>.'.:....'
:.:.>:,:.:,:':.>
...:....'."..'.i...................
:.:.>:.:.:.:jg...<::::
.........
>...
...
.
.
<
.
.
...
:*:.>:.>:.:.:.>:.:.:.>:
.,:,:<.,...
.,... ..........................................
.....,..
" ....
.."...
,.....,.. ..........................
............................................
,.:...*. ;'.
............................
....;
n;..
..........................
................................
..................................
i
zE.tJ;
..C'...'.'.
C.....
.5',,,....
....,.. ..........................................
,:.:.:.:.:.:.:.:.>:.:.:.:.:.
i..........)
'
.
.
........... ,.;.:.:............................
;.:.:.:. ::::::>::::::::::>;<:>;:
.......
,::.:.:.:.:.:.:.:.:.:.:.:.:.:.Z,:.:.:.:.:.:.:.:.:.A
. ............................
.:.:.:.:.:.:.:.I*:.>
.>:+:.:.:.:.:.:.>:.>:.>:
:,:.;,:.:O:.:
:.Y::<::::::
k;::Ryr<:?:::::::::
i:i:i;i:i:j<$:::i:i:2j<<
...........................................................
...........................
..........................................
........................................................................................
."
:.:<.:,:.:<.:.:.:.>:.:...........
::.,::.:.:.:.:.:.:.:.:,:.:.:.,:.:.:.:.:<.:.:.:.:.:.:.:.:.
:.: .> ....
.
'
Lr
.
.:,:.:.:.:.:.>>:.:.:.>>:, A".'...'.'.'.....
..........................................
..a*..
::::::::::::::::::::::::.:::: ::5:335>333#::::.:<.>:<.:.:$g:::>::
:'$$:$$%?:$:i:
:.>.:A...:$.:................ ..... :.-:?::>::::::<:s:: ..,.r...... ..'.'..Y......\... ........
................................................................................
............................................
<. .............
....+.
:
...........................
..,..._..:
..
............................................
.............................................:........... '...',... ............
a. . . . . . . . . . . . .
T
.........
C1
Y....
...........................................
It is perfectly legal to refer to an element outside the matrix bounds but within
thearraybounds,e.g., v a l = A ( 4 , 5 ) . Thevariable v a l ismerely
assigned the contents of location ( 4 3 of the array. Depending on the context,
this may be aprogramming error. Array entries A (i,j ) may not be initialized
if either i or j is greater than 3. It is entirely the programmer's responsibility
to ensure that the index values at execution time are meaningful. Do not assume
that array entries are automatically initialized to zero by the compiler.
1.4 ARRAYS
1 37
Note that storage of the 3-by-3 times table matrix in the 4-by-5 array results in
unused array space:
(i-1)
This is because space is allocated column by column for the array A, not the
matrix A. It is interesting to note that neither the column dimension of the array
nor the dimensions of the stored matrix figure in the address computation.
(Note that "physicalttaddress computation requires knowledge of type as well.
For example, a cell may be 4 bytes long for integer arrays and 8 bytes long for
double precision arrays.)
In many programs it is the case that the declared arrays are exactly the
same size as the matrices they house. Thus, the array/matrix distinction is
38
1 FORTRAN 77
immaterial. But there are occasions when several matrix problems of unequal
size need to be performed and it is convenient to utilize a single array that is just
large enough to handle the largest problem. The following example shows how
we would go about printing the matrices F 2 , F,, and F 6 and their
transposes, where F',,is the n-by-n Frank matrix defined by
Example 1.4-1
program f r a n k
i n t e g e r F ( 1 5 , 1 5 ) , n,
i, j
do 20 n = 2,6,2
do 10 i = 1,n
do 5 j = l,n
if
(i ,LE. j ) t h e n
F (i,j) = n- j + l
elseif
(i .EQ. j + l
then
F ( i f j) = n-j
else
F ( i , j)
endif
continue
10
continue
do 1 5 i = l , n
write
15
(*, * ) ( F ( i f
j)
,j = l , n )
continue
d o 20 i = 1 , n
w r i t e (*, * ) ( F ( j , i ),j = l , n )
20
continue
25 c o n t i n u e
stop
end
The write involves a new construct that is handy when printing array elements.
When n = 4, for example,
1.4 ARRAYS
1 39
w r i t e ( * , * ) ( F (i,j) , j = l , n )
is equivalent to
Because matrices can be large, an entire matrix row or column may not fit on a
single output record. The resulting output may not be particularly attractive and it
may be necessary to employ the formatted write construct. This is discussed in
$1.6.
Packed Storage
Mattices are two-dimensional and therefore it is natural to store them in
their F77 counterpart, two-dimensional arrays. As we have mentioned, a twodimensional array is physically stored as a long one-dimensional sequence with
an implicit two-dimensional indexing scheme. However, there are occasions
when it is more efficient to explicitly store a matrix in a one-dimensional array
and to simulate the two-dimensional indexing. This occurs if the matrix has a lot
of zeros or some regular, exploitable structure .
As an example, let us return to the times table matrix. This matrix is
symmetric and so it is redundant to store both aij and aji. Suppose we store
only the lower triangular portion in column-by-column fashion in a onedimensional array. Thus, in h e n = 4 case we have the following identification:
40
1 FORTRAN 77
a(k) = A(i,j)
a(k)
i*j
k = k-tl
5
continue
10 c o n t i n u e
Note the need for the "hand-indexing" variable k .Writing clear programs that
manipulate matrices that are stored in the packed format is a challenge.
Comments should be used to explain the program subscripting.
8, 7, 4
1.4 ARRAYS
1 41
accomplished as follows:
10
20
30
40
d o 40 a = 0 , l
d o 30 b = 0 , l
d o 20 c = 0 , l
d o 10 d = 0 , l
hcube ( a , b, c, d )
continue
continue
continue
continue
8*a+4*b+2*c+d
W i = vi-1 for
3. Let E be the n-by-n down-shift matrix, i.e., w = EV
i = 2:n and wl = v,. Write a program segment that sets up the n-by-n
matrix A whose kth column equals Ek-lv, where v is a given n-vector.
4. The n-by-n Pascal matrix A, has 1's in the first column and row, and
+ u ~ Write
~ a- program
~
that sets
"interior" entries prescribed by aij = a.
up A,. What is the largest n for which A, can be precisely stored in your
computer?
1.5
SUBPROGRAMS
Built-in Functions
It is useful to preface the discussion of "user-supplied" subprograms with
some remarks about the built-in or "intrinsic" functions that are a part of F77
itself. A summary of these functions is given in Appendix 1.
It is necessary to distinguish between two kinds of built-in functions in
F77: specific and generic. A specific function has a data type (e.g., real,
double precision, etc.). For example, the integer-to-character conversion
function char has data type character because it returns a string. When specific
functions are used they should appear in the type declaration statements.
A generic intrinsic function does not have a data type per se. The data
type of the returned value of a generic intrinsic function is determined by the data
type of its arguments. Thus, if the statement
appears in a program, then x and y should be of the same type because cos
returns a real value if y is red and a double precision value if y is double
precision. A generic built-in function should not be declared in a type
declaration statement.
Built-in functions can appear in assignment statements. Thus, if p i is
double precision then
1.5 SUBPROGRAMS
c o s ( t h e t a )* s i n ( p h i )
1 43
c o s ( p h i )* s i n ( t h e t a )
and
s
sin (theta
phi)
Functions
We now consider user-written subprograms. To impose a continuity on
the discussion, many of our examples revolve around the following
mathematical problem:
44
1 FORTRAN 77
Now check f ( l ) , . . , f
x = real(k)
m = min(m, ( ( x
10 c o n t i n u e
w r i t e ( * , *)m
stop
end
(lo)
2.0)*x - 7.0)*x
3.0)
1.5 SUBPROGRAMS
1 45
return
end
Here are some rules associated with user-defined functions:
The value produced by the function is returned to the calling
program via a variable whose name is the function's name, e.g.,
f ( X ) is returned in the variable f.
A function has a type and it must be declared in the calling program.
The body of a function resembles the body of a main program and
the same rules apply.
When a function is called, execution begins at the top and flows
to the bottom. When a return statement is encountered the subroutine
is exited and control is passed back to the calling program.
A reference to a defined function can appear anywhere in an
expression, subject to type considerations.
46
1 FORTRAN 77
do 1 0 k = 1 , 1 0
ml = m i n ( ml, E l ( r e a l ( k ) ) )
10 continue
m2 = f 2 ( 0 . 0 d 0 )
do 2 0 k = 1 , 2 0
m l = m i n ( m l , 1( r e a l ( k ) ) )
20 continue
w r i t e (*,*) ' m ( f 1 , 1 0 ) = t , m l , ' m ( f 2 , 2 0 ) = ' , m 2
stop
end
,f ( r e a l ( k ) )
The example introduces the external statement. Any time a function or a main
program passes the name of a function to another function it must identify the
name in an external statement:
1.5 SUBPROGRAMS
1 47
Common
Now suppose that we want to solve problem P for arbitrary n and
arbitrary cubics Ax) = ax3 + bx2 + ex + d. The function
r e a l function f ( x , a f b f c , d )
real xfafbfc,d
f = ( ( a * x + b ) *x + c ) * x + d
return
end
is a perfectly legal way to evaluate f but it has the flaw that we can no longer
use the function fmin to compute m ( f , n ) . The reason is that f m i n
expects a function with a single argument, i.e., E ( x ) , not a function of the
form f ( x , a , b, c, d ) . One way around this is to modify fmin so that it
accepts functions with five real arguments. However, if we think of fmin as a
general routine for solving problem P then this is not an acceptable solution.
An alternative solution involves a new construct--the common
statement. By placing variables "in common" they can be shared between the
main program and one or more subprograms. We can place the variables a, b,
c, and d in common as follows:
program min5
integer n
r e a l m, a , b, c, d l f
common / c o e f f / a , b, c, d
r e a d (*, * ) n , a , b, c, d
m = fmin(f,n)
w r i t e ( * , *)m
stop
end
48
1 FORTRAN 77
This program prints m(f,n), provided we identify the common variables in the
function f as well:
r e a l function f ( x )
real x
r e a l a,b, c , d
common / c o e f f / a, b, c , d
f = ((a*x + b)*x + c)*x + d
return
end
z , j, k
1.5 SUBPROGRAMS
1 49
results in a type error. A subprogram may use only part of a common block.
Thus,
integer m
real a,b,c
common / e g / a , b, c , m
is legal. However, it is good practice not to take such shortcuts when using
common.
Subroutines
Let us return to problem P. Suppose that in addition to computing
m(f,n) we want to determine an integer k E ( 1,...,n ) so f ( k ) = m(f,n),
i.e., the minimizing point. Since a Fortran function can essentially return only
one value, we cannot readily approach this new problem by modifying the
function fmin. A different kind of subprogram called a subroutine should be
used. Here is an example that solves the problem of computing both the
minimum value and the minimizing point:
s u b r o u t i n e f m i n ( f , n, v a l u e , p o i n t )
i n t e g e r n, p o i n t
r e a l f , value
integer i
r e a l temp
50
1 FORTRAN 77
point = 0
value = f (O.Oe0)
do 10 i = l,n
temp = f (real (i))
if (temp .LT. value) then
point = i
value = temp
endif
10 continue
return
end
The subrou the fmin can then be "called"from the main routine:
program min6
external f
integer n, mpt
real mval
read(*,*)n
call fmin( f, n, mval, mpt )
~rite(*~*)'m(f~n)=~~mv
'mpt
a l ,=
stop
end
Execution of the main program min6 results in the printing of m(f,n) and the
minimizing point.
In general, subroutines are structured as follows:
end
Here are some rules associated with subroutine use:
1.5
SUBPROGRAMS
1 51
Theargumentsinthecallmustagreeinnumberandtypewih
the arguments specifed in the subroutine statement. The names of
the listed variables in the call need not agree with the names of the
"dummy" variables used in the subroutine definition. In the above
example, m a 1 and m p t are identified with the dummy variables
v a l u e and p o i n t , respectively.
When a subroutine is called, execution begins at the top and flows
to the bottom. When a return is encountered control passes back
to the line after the call in the calling program.
When calling a subroutine with an argument list, the Fortran 77
compiler actually passes the addresses of the arguments as opposed
to the values themselves. Usually this does not affect the programmer
one way or the other, but it does play an important role when passing
arrays where the address of the first element is passed. We explore
this in more detail in 9 1.6.
Unlike functions, subroutines do not have a type. The name of a
subroutine does not return a value.
52
1 FORTRAN 77
program p r i n t
r e a l a , b, c , dl xO, v a l u e , f
r e a d ( * , * ) a , b, c, dl xO
v a l u e = ( a , b, c, dl xO)
w r i t e (*, * ) v a l u e
stop
end
r e a l f u n c t i o n f ( a , b, c , d l x )
r e a l a , b, c, d l x
f = ( ( a * x + b ) *x + c ) *x + d
return
end
f = exp(x)
n = n+l
return
end
Ordinarily, the local variable values in a subprogram are lost when control
passes back to the calling program. However, it is possible to retain the value of
a local variable if it is named in a save statement. Here is an example:
subroutine p r i n t (k)
integer k
integer lastk
save l a s t k
C
c
c
c
T h i s s u b r o u t i n e p r i n t s t h e v a l u e of k i f i t
i s z e r o o r d i f f e r e n t from t h e l a s t p r i n t e d
v a l u e . The f i r s t c a l l must be w i t h k = 0 .
( k .EQ. 0 ) t h e n
w r i t e (*, * ) k
lastk = k
e l s e i f ( k .NE. l a s t k ) t h e n
write(*,*) k
lastk = k
endif
return
end
if
The save statement should be placed before the first executable statement in the
subprogram and it should have the form
save ( list-of-variables )
54
1 FORTRAN 77
Subprogram Style
In the interest of space we have suppressed comments in most of our
examples in this section. Of course, in practice it is very important that
subprograms be amply commented so that their use is straightforward. Other
guidelines should be followed when writing subprograms and we express these
in the form of a template. (For functions, replace the first line with " {type)
function ( { argument list ) ) " .)
subroutine { name ) ( { argument list ) )
C
1.5 SUBPROGRAMS
1 55
return
end
c u b i c f ( x ) = ( (ax + b ) x + c ) x
{1,2,.. . , n ) .
+ d on t h e s e t
U s e s fmax.
Common b l o c k s
common / c o e f f /
a , b,
C,
Read n and t h e c o e f f i c i e n t s a , b , c ,
and d.
56
1 FORTRAN 77
r e a d ( * , * ) n , a , b, c, d
C
m = fmax(f,n)
w r i t e ( * , *)m
stop
end
r e a l f u n c t i o n fmin ( f , n )
integer n
real f
C
c
c
c
This function r e t u r n s m ( f , n ) =
min { f ( 0 ) , . . , f ( n ) } where f i s a r e a l
f u n c t i o n of a s i n g l e argument.
Local V a r i a b l e s and I n t r i n s i c f u n c t i o n s
integer k
real f
C
fmin = f ( 0 .OeO)
do 1 0 k = 1 , n
fmin = m i n ( fmin
1 0 continue
return
end
f ( real(k) ) )
r e a l function f (x)
real x
C
c
c
T h i s f u n c t i o n r e t u r n s t h e v a l u e of t h e
cubic f ( x ) = ( ( a x + b ) x + c ) x + d m
r e a l a , b, C, d
common / c o e f f / a , b, c , d
f = ( (a*x + b ) *x + c) *x + d
return
end
Bibliography
31 5
[62] D. De Zeeuw. Matrix-dependent prolongations and restrictions in a blackbox multigrid solver. J. Comput. Appl. Math., 33: 1-27, 1990.
[63] A. J. Degregoria and L. W. Schwartz. A boundary-integral method for two-phase
displacement in Hele-Shaw cells. J. Fluid Mech., 164:383-400, 1986.
[64] S. Deng. Immersed Intetfiace Methodfor Three Dimensional Interface Problems and
Applications, Ph.D. thesis. North Carolina State University, 2000.
[65] S. Deng, K. Ito, and Z. Li. Three dimensional elliptic solvers for interface problems
and applications. J. Comput. Phys., 184:2 15-243,2003.
[66] R. H. Dillon, L. J. Fauci, and A. L. Fogelson. Modeling biofilm processes using the
immersed boundary method. J. Comput. Phys., 12957-73, 1996.
[67] M. Dumett and J. Keener. An immersed interface method for anisotropic elliptic
problems on irregular domains in 2D. Numer: Methods Partial Differential Equations,
2 1 :397-420,2005,
[68] D.L. Dwoyer and F. C. Thames. Accuracy and stability of time-split finite-difference
schemes. In 5th AIAA Computational Fluid Dynamics Conference, Palo Alto, CA,
1981.
[69] W. E and J.-G. Liu. Projection method I. Convergence and numerical boundary
layers. SIAM J. Numer: Anal., 32: 1017-1057, 1995.
[70] W. E and J. Liu. Vorticity boundary condition and related issues for finite difference
schemes. J. Comput. Phys., 124:368-382, 1996,
[71] D. Enright, R. J. Fedkiw Ferziger, and I. Mitchell. A hybrid particle level set method
for improved interface capturing. J. Comput. Phys., 183:83-116,2002.
[72] G. H. Gottet and E. Maitre. A level-set formulation of immersed boundary methods
for fluid-structure interaction problem. C. R. Acad. Sci. Paris, 338:581-586,2004.
[73] L. C. Evans. Partial Differential Equations. AMS, Providence, RI, 1998.
[74] R. Ewing, 0. Iliev, and R. Lazarov. A modified finite volume approximation of
second-order elliptic equations with discontinuous coefficients. SIAMJ. Sci. Comput.,
23:1335-1351,2001.
[75] R. Ewing, Z. Li, T. Lin, and Y. Lin. The immersed finite volume element method for
the elliptic interface problems. Math. Comput. Simulation, 50:63-76, 1999.
[76] D. Eyre and A. Fogelson. IBIS: A software system for immersed boundary and
interface simulations. http://www.math.utah.eddIBIS.
[77] L. J. Fauci. Interaction of oscillating filaments: A computational study. J. Comput.
Phys., 86:294-3 13, 1990.
[78] B. A. Finlayson. Numerical Methods for Problems with Moving Fronts. Ravenna
Park Publishing, Inc., Seattle, 1992.
the double precision two-dimensional array H is defined and can be used within
this subroutine. However, it is not good programming practice to use local
arrays except in cases where the array size is small and fixed. Instead, it is
preferable to define each array in the main program and then to pass it and the
relevant dimension information as arguments. This allows for more flexible
subprograms since all concerns regarding space allocation, as well as input and
ouput of arrays, are dealt with in the main program. Most scientific software
packages adhere to this policy of array use and so it is important that it be
understood.
Local v a r i a b l e s
i n t e g e r i, j
1 59
do 1 0 i = 1, p
w(i) = O.OeO
1 0 continue
do 3 0 j = 1, q
do 2 0 i = 1, p
w(i) = w(i)
20
continue
3 0 continue
return
C(i, j)*v(j)
end
Notice that m a t v e c requires the row and column dimensions of the matrix C
and the row dimension of the array C.
A main program that calls m a t v e c could be structured as follows:
integer i d i m , j d i m
parameter (idim = 50, jdim = 40)
i n t e g e r i f j, m, n
r e a l A ( i d i m , j d i m ) , x ( j d i m ) , y (idin)
Compute y
Ax
c a l l m a t v e c ( m , n, A, i d i m , x , y )
stop
end
This example illustrates a number of key ideas. It is fairly obvious that the row
and column dimensions of the matrix A must be passed in m and n for
otherwise it would not be identifiable. However, it is also essential that i d i m ,
the row dimension of the array, be passed to matvec as well . When m a t vec
is called the address of A ( 1 , l ) is passed. Within m a t v e c , the location of
element ( ij ) of the matrix C is computed as we discussed in 8 1.4 :
60
1 FORTRAN 77
.
That is why the row dimension of the array A must be passed to matvec
Another lesson to be learned from the matrix-vector multiply example
pertains to the use of the asterisk in the type declaration statement in
subprograms. The F77 compiler knows that v and w are to be treated as onedimensional arrays because they are both listed with one argument in the type
declaration statement. The asterisk acts as a "placeholder." Likewise the
compiler knows that C is to be treated as a two-dimensional array because it is
listed with two arguments in the type declaration statement. The number of
arguments listed indicates the assumed dimension of the array in the
subprogram. The last dimension of an F77 array is not needed for address
computation, so asterisks suffice.
The asterisk could be replaced with any positive integer without harm
since it is used only as a marker. In fact, the old Fortran did not allow asterisks
and an argument of " 1" was commonly used.
Different Dimensions
If a variable is used as a two-dimensional array in the body of a
subroutine, a e n it must be declared as a two-dimensional array in the
subrouline's type declaration statement. In the matvec example, C is used as a
two-dimensional array in the body of matvec because it appears with double
indices in a e declaration.
However, the dimension of a passed array does not have to conform to its
dimension in the calling program. It is perfectly legal to pass a two-dimensional
array to a subprogram and then to treat it as one-dimensional in the subprogram,
provided care is exercised. For example, suppose that we have at out disposal
the function
i n t e g e r f u n c t i o n p r o d ( n , x, y )
i n t e g e r n, x ( * ) , y ( * )
C
p r o d = xTy where x , y a r e i n t e g e r n - v e c t o r s
10
integer i
prod = 0.
d o 10 i = 1 , n
prod = prod
continue
return
end
-t
x (i)
*y ( i )
1.6
1 61
A, A )
Note that the address at the top of the jth column is passed at iteration j. As far
as prod is concerned this is just the address of the top of the vectors x and
y. Finally, note that it is not necessary for prod to "know" a d i m because
only single, designated columns of A are referenced during a given call.
Passing Submatrices
Suppose A is an m-by-n matrix and that the integers il, i2 ,j l , and
j2 satisfy 1 I i l I i2 I m and 1 I jl I j2 I n . B y A ( i l : i 2 ,
jl:j2) we mean the submatrix of A obtained by extracting rows il through i2
62
1 FORTRAN 77
hen
n, B, b d i m )
i n t e g e r m, n, b d i m
r e a l c, B ( b d i m , * )
c
c
O v e r w r i t e s t h e m-by-n m a t r i x B w i t h c B .
T h e a r r a y B has r o w d i m e n s i o n b d i m .
do 1 0 j = 1 , n
do 5 i = l , m
~ ( i j)
, = ckB(irj)
5
continue
1 0 continue
return
end
r n = i 2 - i l + 1
n = j 2 - j 2 + 1
c a l l s c a l e l ( c, rn,
n, A
1 63
( 1 ), adim )
a d d r [ ~ ( i lj ,l ) ] + adim* ( j - 1 )
(i - 1)
Stride
We now discuss another setting where "address reasoning" is important.
Suppose we have the following subroutine for scalar-vector multiplication:
s u b r o u t i n e s c a l e 2 ( n, c, v,
incv )
i n t e g e r n, i n c v
r e a l c, v ( * )
C
c
c
For j = 1,n, o v e r w r i t e s v ( l + ( j - 1 ) i n c v ) w i t h
c*v (1+( j - 1 ) i n c v )
i n t e g e r j, k
C
k = l
do 1 0 j = l , n
v ( k ) = c*v(k)
k = k + incv
1 0 continue
return
end
64
1 FORTRAN 77
v ( 1 ) , v ( 3 ), v ( 5 ) , v ( 7 ) s c a l e d
c a l l scale2 ( 4,c,v(2) , 2 )
v ( 2 ), v ( 4 ), v ( 6 ) , v ( 8 ) s c a l e d
c a l l s c a l e 2 ( 2, c, v ( 3 ), 4 )
v ( 3 ) ,v ( 7 ) s c a l e d
A,
adim
...
i n t e g e r adim
real x ( * ) , A(adim,*)
then for addressing purposes, subl assumes that x (1) is the top of x and
A ( 1,I ) is the top of A. If within a calling program we have
real ~ ( 0 ~ 6 B
3 ( )- 1~: 1 0 , 3 : 4 )
c a l l s u b l ( y , B, 11, . . .
1 65
Program
and wish to regard them as indexed from zero in sub2 . The proper way to
invoke sub2 is then
call sub2( x, 0, 4, A , 0, 9, 0, 4, ...
This defines a common block with name matrix and it contains the array A,
the vector v ,and the scalars de Ita,n,and dim,This group of statements,
66
1 FORTRAN 77
It is tempting to delete the parameter statement and thereby let the main
program "control" the size of the common arrays, but this is illegal. Each array
in each common block must be dimensioned consistently in each subprogram in
which it is used, This requirement discourages the use of common blocks in
subroutines that are designed for flexible general library use. Returning to our
example, changing the dimension of A in the main routine demands changing the
corresponding parameter statements in each of the subprograms that use
matrix.
1 67
program main
r e a l ~ ( 1 0 0 )~ ~( 1 0 0 )
integer n
common / f d a t a / u , v, n
r e a l function f (x)
r e a l ~ ( 1 0 0 )~ ~( 1 0 0 )
integer n
common / f d a t a / u , v, n
real s
integer k
s = 0.
do 1 0 k = l , n
s = s + (u(k)*x - v(k))**2
10 continue
f = s - 1 .
return
end
incx, y, incy )
that overwrites
with
a x(1
for i = 1:n. Here, positive integers n , incx, and incy are stored in integer
variables n, i n c x , and i n c y and the appropriate portions of x and y are
initialized.
68
1 FORTRAN 77
that takes a conventionally slored A and stores the nonzero portion column by
column in a singly subscripted array av.
Write a subroutine matvec that can compute the rn-by-n matrix vector product
y = Ax . Assume that m, n, x ( 0 : n - l ) , and~(0:m-1,O:n-1)are
initialized. Subscript all arrays in matvec from zero.
List-directed "read"
A data file consists of records with any number of data items per record.
Conceptually, it is handy to think of a record as a line. The items in a record may
be separated by commas or blank spaces. Consider the two-record data file
70
1 FORTRAN 77
i , j, and rn are integer variables and that x, y, and z are real variables. If the
following read statements are executed
then the integer variables i , j , and rn are set to 10, 5, and 6 and the real
variables x, y, and z are set to 1.0, - 1.0, and 2.0 .
This example highlights a number of guidelines that pertain to the use of
read statements. The type of input value must correspond to the type of variable
to which it is being assigned. Data items 1 0 , 5, 6 are integers (no decimal
point) and correspond to integer variables i, j ,rn. Data items 1.O, -1.O, 2.0
are r e a l , corresponding to real variables x, y , z The foxmat of the data is
"directed" by the list of variables.
Each time a read statement is encountered a new record in the data file is
accessed. So if there are more data items in a record than variables listed in the
corresponding read statement, the extra data items are ignored. For example,
if the above data file is changed to:
then i , j, m, x, y, and z take on the sane values as before but the data items
"3" and "5.0" areignored.
If the statements
then an error results even though there is enough data to go around. The reason
for this is that the data item " 1.0 " is ignored and the program hangs up waiting
for a value for z .
If the number of variables listed in a read statement exceeds the number
1 71
of data items in the current input record, then subsequent records are read. For
example, if the program segment
read(*,*)i, j, m, x
read(*,*)y, z
is executed with data file
then all the variables are assigned. However, if the data file is replaced by
then y and z would not be assigned any new values. This is because upon
reaching the statement
List-directed "write"
The simplest form of this statement is
72
1 FORTRAN 77
Formatted "readt'
If the organization of the data in a file is known then the formatted read
can be used. For example, suppose that every line of the data file consists of
three integers, each occupying afield that is six places wide:
Of course only the digits are part of the data file. The markings are used to
expose the spacing. Now consider the following formatted read statements,
where all the variables in question have integer type:
100
r e a d ( * , 1 0 0 ) i fj , k
read(*,lOO)m,n,o
read(*, 100)prq,r
format ( 16,16,16
1 73
There are three integer fields per line, each of the form "16."The "I"
means integer and the "6"means six spaces wide.
Each integer occupies a six-column "field." This includes the sign
bit. The + sign can be omitted. Thus, 123456 can be accommodated
but -123456 cannot. The second number in the third record is 12345.
All blanks are viewed as zeros so, for example, the third number in
the second record is read as "100."Likewise, the third number in the
third record is 600,000. Thus, each integer should be right-justified
in its field unless the augmentation of low-order zeros is planned.
Floating point numbers can be read either in positional (fixed) form or
exponential form. In the following example, the data file contains numbers in
positional form:
Unlike the I format case, the numbers do not have to be rightjustified in their fields. For example, both x and y are assigned
the same value above.
In general, formats of the form F i . j should satisfy i 2 j+2 because a space
must be reserved for both the decimal point and the sign.
In the next example, the data file contains floating point numbers in
exponential form:
There are two "E" fields having widths 9 and 10. The " i " in " E i .j "
indicates the field width and the " j" designates the length of the
mantissa.
The signs of the mantissa and the exponent and the decimal point
occupy space. In an ~ ji format
.
it should be the case that
i 2 j+5 so that there is enough room to handle the representation
(assuming two-digit exponents).
Trailing zeros are ignored in both the mantissa and exponent parts
and so, for example, y is assigned the value 5671.23 .
Exponential format data can be read into double precision variables using
the D format. The same rules that apply to the E format apply to the D
format. Thus, if d x is a double precision variable then
1 75
Formatted write"
tt
t I
To appreciate why this is so, we summarize some facts about formatted write:
The first character in an output line is for carriage-return. Here are the
possibilities:
76
1 FORTRAN 77
blank
a
a
single spacing
ovenvritc current line
double spacing
triple spacing
new page
Upon t e r m i n a t i o n :
x= .123456D+02
y=
.123456D+02
i- 1 2 3 4 5 j= 1 2 3 4 5
format
lx,
x*y
'xy product is
',
e13.6)
1 77
placed adjacent to it. However, because format statements tend to be ugly and
"disruptive" when reading a code, we advise collecting all the format
statements in the program and placing them at the end just before the end
statement.
Furthermore, we recommend distinctive labels for all the format
statements that appear in a program. For example, if all the labels in the
executable part of the code are between 1 and 999, then four-digit labels for the
formats add a small amount of clarity and encourage the reader interested in the
formats to look for their specification at the end of the program.
Some Shortcuts
Often a given format involves repetition, e,g.,
100
format ( 2 x , 1 2 , 2 x , 1 2 , 2 x ,
12 )
format ( 3 ( 2 x , I 2 )
100
format ( I 5 , 1 5 , / E 1 0 . 2 , / E 1 0 . 2 /
Similarly,
)
is equivalent to
100
format ( 2 1 5 , 2 ( / E 1 0 . 2 ) , / )
...
data (list-ufvariables/[lisf-ofvaluesj/,
78
1 FORTRAN 77
m/-5/,
d a t a n , m / 1 0 0 , -5/,
x/2.0/,
y/2.0/1
2/2-0/
x , y, z / 3 * 2 . O /
These two data statements are equivalent in that they both result in the following
assignments:
assigns a 4-by-3 matrix of ones to A, sets the first row of B to [ 1 21, assigns
a 4-vector of zeros to c, and sets the second component of d to 1. Notice
that the array names A and c appear in the data statement without arguments.
It is not possible to use the data statement to initialize a matrix which is a
proper subset of a Fortran array, except on an element-by-element basis. In the
above example this is done for the first row of the array B. The following is
illegal and does not store the 2-by-2 zero matrix in A :
data A ( l , l ) / 2 * 0 . 0 / ,
A(l,2)/2*0.0/
1 79
before program execution begins. For this reason the data statement is usually
not used within subroutines.
The data statement is quite different from the parameter statement. The
latter is reserved for constants. In contrast, the variables listed in a data
statement can be modified in subsequent program statements. We remark that
early versions of Fortran did not support the parameter statement. Therefore
the data statement was used for constants as well as for variable initialization.
F77 programs should define constants using the parameter statement only.
The data statement cannot be used for variables within a common block
However, there is a special kind of subroutine, called a "block data" subroutine,
made for this purpose:
block d a t a
integer d i m
parameter ( d i m = 1 0 )
integer n
r e a l a(dim, d i m ) , v ( d i m ) , d e l t a
common / m a t r i x / a , v , d e l t a , n
d a t a a/100*1.0/, v/10 * 2.0/, d e l t a / l . O / ,
end
n/5/
attempts to read 2000 numbers and so the data file must contain at least 2000
numbers, The number of entries per record is arbitrary. This statement wilI
cause A to be filled, column by column. Obviously this form of the read
statement is not convenient when entering data for a matrix A that is strictly
embedded in the array A since it demands enough data to fill up the entire array.
To handle this eventuality we can use
80
1 FORTRAN 77
where we assume that the row and column dimensions of the matrix A are
stored in m and n. Here the array A is initialized column by column. Of
course if the data is naturally generated in a row-by-row fashion then
also does the job, However, bear in mind that this order may result in poor
performance as it does not access contiguous blocks of memory. Try to arrange
input in column-oriented fashion. See 5 1.9.
Of course the default options, (*,*), need not be used. The above
constructions can be used with format. For example,
r e a d ( * , l O ) ( ( A ( i , j), i = l , m ) ,
j=l,n)
10 f o r m a t (1015)
indicates that the matrix A is read according to format 1 0 , That is, each line of
data consists of ten data entries, each of which is an integer occupying five
columns.
The output of an entire array A ( i d i m , j d i m ) can be achieved as
follows:
which results in six numbers per line. It is important to realize that when A is
referred to without arguments then the entire array is printed, not just the
contained matrix or vector. Row i of an m-by-n matrix A can be printed by
the statement
One must remember that output lines have a finite width, e.g., 80 characters. If a
program attempts to p h t a line that is wider than the maximum, then the output
will "spill over" into one or more succeeding records. This often happens when
1 81
attempting to print matrices that have a fair number of columns. Care must be
exercised to produce aesthetic output.
where the components are in F7.3 format. If no such e exists then the
components of v should be printed in a column using E format.
sets aside two real variables for representing a complex number: one for the red
part and one for the imaginary part. Physically, it is helpful to picture z as a
pair of adjacent real variables. Complex arrays are also possible. For example,
integer a d i m
p a r a m e t e r (adim = 10)
complex A (adim, adim)
complx (x, y)
assigns x + iy to z . Likewise,
1 83
extract the real and imaginary parts of the complex number stored in z.
Complex conjugates can be obtained as follows:
while the functions cabs, cexp, clog, ccos, and csin, respectively, compute
the exponential, logarithm, cosine, and sine of a complex argument, Note that
cabs is a real-valued function.
is equivalent to
x
y
x*x - y*y
2 .OeO*x*y
complx (u,v)
real(z)
aimag(z)
+ x
begin with,
write ( * ,
*)
is equivalent to
write ( * , * ) real (z), aimag(z) .
Regarding formatted I/O, the D , E , and F formats can be used with the
understanding that a complex variable is really two floating point variables.
Thus, if z is complex
write ( * , 100)z
format (*real(z)=*
,f10.7, 'imag(z)=* 10.7)
100
is equivalent to
x = real(z)
y = aimag(z)
write (*, 100)x,y
100 format('real(z)= * ,f10.7,'imag(z)-' 10.7)
where x and y are real.
C;
c
c
d = b*b - c
i f ( d .GE. 0 . O e O )
1 85
then
Real r o o t s
maxrt = a b s ( -b + s i g n ( s q r t ( d ) ,-b) )
else
Complex r o o t s
maxrt = s q r t ( b*b
d)
endif
return
end
do 1 0 k = l , n
x ( k )=complex ( a * r e a l ( x ( k ) )
1 0 continue
a*aimag (x ( k ) ) )
86
1 FORTRAN 77
+ i sin(2rrln)
and
p and q satisfy 1 5 p,q 5 n. Exploit the fact that w is an nth root of unity
and so wm = wr,where r = m mod n, Assume that A is a complex array
with row dimension stored in a d i m .
2. Write a subroutine
POLAR( z, r, t h e t a )
+ iy
to polar fonn z = r
ei0.
3. Write a subroutine
that overwrites the complex n-vector v with a unit 2-norm complex vector u
that is in the direction v + e-ie II v 112 el , where e l is the first column of
the identity and vl = I vl lei0 .
Here are three ways to compute d. The first is a straight encoding of the
formula, the second exploits some common subexpressions and identifies the
(x,y,z) distances, and the third exploits some trigonometric identities:
Method A
d
sqrt (
& ( r*sin (bl)*COS (al) - r*sin(b2) *cos (a2) ) **2
& ( r*sin(bl)*sin(al) - r*sin(b2)*sin(a2) )**2
& ( r*cos(bl) - r*cos(b2) )**2 )
=
Method B
sl = sin (bl)
+
+
88
1 FORTRAN 77
s 2 = sin(b2)
x d i s t = s l * c o s ( a l ) - s2*cos(a2)
ydist = s l * s i n ( a l ) - s2*sin(a2)
z d i s t = cos(b1) - cos(b2)
d = r * s q r t ( x d i s t * * 2 -t y d i s t * * 2 + z d i s t * * 2 )
Method C
t
d =
&
- a2)
r * s q r t ( l . - t*cos(bl-b2) -
= cos(a1
cos(bl)*cos(b2)*(l.- t ) )
1 89
Because of memory concerns, a work array should "earn its keep." But the
matter should be kept in perspective. For example, a few linear work arrays
hardly matter in a program that has two-dimensional arrays of the same order.
( na
.NE. mb ) t h e n
w r i t e (*, * ) ' P r o d u c t A 3 n o t d e f i n e d '
return
e l s e i f (cdim LT. m)
w r i t e ( * , * ) ' A r r a y C n o t b i g enough.
return
endif
'
90
1 FORTRAN 77
This sets up an n-by-n upper triangular matrix that has 1's on the diagonal and
-1's above the diagonal. It is easy to show that n(3n+l)/2 conditionals are
tested during execution, If the "i EQ . j " conditional were to come first,
almost twice as many comparisons would be required.
The example has further interest because it can be written in such a way as
to avoid all "i-j" comparisons:
5
10
20
25
30
do 1 0 j = 2 , n
do 5 i = 1, j-1
A ( i , j ) = -1
continue
continue
do 20 j = 1 , n
A(j,j) = 1
continue
do 30 j = 1,n-1
do 25 i = j + l , n
A(i,j) = 0
continue
continue
The price paid for this rearrangement is a slight increase in the number of loops,
However, for large n this is not important,
Integer Arithmetic
It is important to be aware of the integer arithmetic overheads associated
with subscripting and loop execution. Integer arithmetic is not trivial compared
to floating point arithmetic. Because more attention is typically paid to the latter,
it often happens that a program's performance is degraded because of careless
looping and subscripting. Suppose that x ( 0 :n- 1 ) and y ( 0 : n- 1 ) are
initialized and that we want to compute
for k = 0,...,n- 1. In practice one would do this using fast Fourier transforms.
However, if we approach the problem with a straightforward double-loop
1 91
encoding we can then illustrate a couple of widely used "tricks." We begin with
the naive algorithm:
d o 30 k = 0,n-1
s ( k ) = 0.
do 1 0 i = 0 , n - k + l
s(k) = s(k) + x(i)*y(i+k)
10
continue
d o 2 0 i = n-k,n-1
s ( k ) = s ( k ) + x (i)
*y(i-n+k)
20
continue
30 c o n t i n u e
In assessing the efficiency of the computation we focus on the " s ( k ) " updates,
as they are the most deeply nested statements. Two maneuvers can significantly
reduce the amount of subscripting , i.e., the amount of integer arithmetic. First,
we can build the running sum in a scalar variable. This avoids having to index
into the s vector until the accumulation of sk is complete. The other economy
results by getting the "k-n" computation out of the 2(rloop. (This assumes a
"stupid" compiler.) With these alterations we obtain
do 30 k = 0 , n - 1
t = 0.
d o 10 i = 0,n-k+l
t = t + x (i)
*y(i+k),
10
continue
kmn = k - n
do 20 i = n - k , n - 1
t = t + x ( i ) *y (i+kmn)
20
continue
s(k) = t
30 c o n t i n u e
92
1 FORTRAN 77
real s
s = 0.
do 10 i = l , n
s = s + x(i)*y(i)
10 c o n t i n u e
dotl = s
return
end
because the stride is two. (The stride of a vector refers to the spacing between
the components.) To rectify this we expand the argument list so that stride
information can be passed:
r e a l f u n c t i o n d o t 2 ( n, x,
i n t e g e r n, i n c x , i n c y
real x ( * ) , y(*)
i n t e g e r if i x , i y
real s
ix = 1
iy = 1
s = 0.
do 10 i = l , n
s = s -t x ( i x ) * y ( i y )
i n c x , y, i n c y )
ix = ix
iy = iy
10 c o n t i n u e
dot2 = s
return
end
1 93
+ incx
+ incy
is thus equivalent to
Note that d o t 2 is a more general dot product routine than d o t 1, but that it
involves considerably more integer arithmetic. For unit stride dot products, we
are better off with d o t 1. If unit stride dot products are sufficiently common
(and they usually are) then d o t 2 is not that attractive. However, the efficiency
of d o t 1 for the unit stride case and the generality of d o t 2 can be efficiently
combined as follows:
r e a l f u n c t i o n d o t 3 ( n , x , i n c x , y, i n c y
i n t e g e r n, i n c x , i n c y
r e a l x(*), y(*)
i n t e g e r i, i x , i y
real s
s = 0.
i f ( i n c x .EQ. 1 .AND. i n c y .EQ. 1) t h e n
do 5 i = l , n
s = s + x ( i )* y ( i )
5
continue
else
ix = 1
iy = 1
do 10 i = 1 , n
s = s + x ( i x )* y ( i y )
i x = i x + incx
i y = i y -t i n c y
10
continue
endif
94
1 FORTRAN 77
dot3 = s
return
end
The program is longer but it is still clear. We mention that this discussion is
based on the design of the BLA subprogram SDOT discussed in the next
chapter.
This is (roughly) the smallest positive floating point number for which 1 + u >
1 in floating point arithmetic.
If x and y are nonnegative floating point numbers then x < uy implies
that x is no bigger than the least significant bit in y's floating point
representation. On a machine with 56-bit floating point arithmetic, a program
that tests for this situation might be structured as follows:
parameter
if
( u = 2**(-55)
x .LT. u*y ) t h e n
else
The trouble with this is that the program only works in 56-bit floating point
systems. If we use the program on a different machine then the parameter u
must be changed accordingly. This is inconvenient and prone to errors.
A better solution is based on the fact that if x + y = y in floating point,
then x must be very small compared to y. In fact, very reasonable assumptions
then
else
e n d if
The reason for the intermediate variable z is to force the storage of the sum
x + y. Otherwise the sum and test might take place in extended precision
registers and give a false impression about the size of x. (This ploy works with
most compilers.)
Termination Criteria
The termination of an iterative process requires care. Here is a real
function that sums the Taylor series for exp(x) until the kth term in absolute
value is less than a tolerance t o 1 times the krh partid sum:
r e a l f u n c t i o n e x p l (x, t o l )
real x,tol
r e a l s, t e r m
integer k
term = x
do 10 k = 1 , 3 0
s = s + term
if ( a b s ( t e r m ) .LT. t o l * a b s ( s )
expl = s
return
e n d if
t e r m = term* ( x / r e a l ( k ))
continue
expl = s
return
end
96
1 FORTRAN 77
Notice that the function is unwilling to go beyond the thirtieth partial sum.
There are two things wrong with expl. In situations like this with
"indefinite termination" it is better to use the while construct. Second, the userdefined tolerance may be too stringent given the underlying machine precision.
This implies that more iterations may be performed than are necessary. Here is a
revision that addresses these two problems:
1 97
This example is discussed in Kahaner, Moler, and Nash [5, chap. 71 and has to
do with the peculiarities of floating point arithmetic.
Recognize, however, that we can halve the number of floating point multiplies
by getting the "z*z" computation outside the loop:
a(n)
W = z*z
do 10 k = n-1:-1:O
s = s*w + a ( k )
10 c o n t i n u e
c
s
else
c
s
y**2)
x/d
= y/d
=
= 1.
=
0.
endif
else
c = 1.
s = 0.
endif
The procedure now involves four divides, two multiplies and two adds, and one
compare. Some of these additional operations can be avoided by exploiting some
connections between c, s, tan(0) = y/x and ctn(0) = x/y:
if abs ( y ) .GT. abs ( x )
t = x/y
s = 1. /sqrt (1 + t * t )
C = s*t
1 99
e l s e i f ( a b s ( x ) .GT. 0 . )
t = y/x
c = 1./sqrt (1. + t * t )
S = t*c
else
c = 1.
s = 0.
endif
y(i)
0.0
do 2 0 j
l,n
y ( i )= y (i) + A ( i , j ) * x ( j )
20
30
continue
continue
The reason has to do with the way arrays are allocated. Because arrays are
stored by column and because the inner loop in the first example varies the row
index, the first code results in contiguous cells of storage being accessed. This
100
I FORTRAN 77
where -n/2 5 0 5 n/2. Assume that the apex is at (1,7t/2,0). Write a single
precision program that reads in h and 0 and prints the pyramid's volume V
and surface area S. Arrange your program so that it involves a minimum
number of sine/cosine evaluations.
2. Suppose A (n-by-n) and x (n-by-1) are given. Give an efficient
algorithm for computing
Appendix.
d s q r t (dabs (x) )
computes the double precision square root of x. This is because dabs is the
intrinsic function for determining the absolute value of a double precision
variable, and dsqrt is the intrinsic function for a double precision square root
The following statement produces the same result:
root
s q r t ( a b s ( x ))
102
I FORTRAN 77
Complex
These functions accept complex arguments. cabs, real, and aimag
return real values. The rest return complex values.
cabs(z)
real(z)
aimag(z)
conj (z)
cexp(z)
clog(z)
ccos(z)
csi n(z)
absolute value
real part
imaginary part
conjugate
exponentid
natural logarithm
cosine
sine
APPENDIX.
BUILT-IN FUNCTIONS
1 103
String
Length, index, and ichar accept character arguments and return
integers. Char applies to integer types and returns a character.
length(s)
length of string
index(s1 ,s2) location of s2 in s l
convert character to integer
ichar (s)
convert integer to cham ter
char(i)
int(x)
real(x)
dble(x)
c mplx(x, y )
convert to hteger
convert to single precision
convert to double precision
convert to complex
Chapter 2
The BLAS
In this and the next chapter we discuss LINPACK, a package for solving
linear system and least squares problems. The "high-level" routines in
LINPACK (to be discussed in the next chapter) all access a collection of "lowlevel" routines that perform e l e m e n t . operations on vectors. These low-level
routines are known as the Basic Linear Algebra Subprograms, or BLAS. There
are BLAS for bookkeeping, vector operations, norm calculations, and Givens
rotations. In general, each BLA subprogram is implemented in real single
precision, real double precision, complex single precision, and complex double
precision. For simplicity, we describe the real single precision BLAS in $52.12.4 .In $2.5 the complex and double precision implementations are discussed.
The BLAS are discussed in C.L. Lawson, R.J. Hanson, F.T. Krogh, and O.R.
Kincaid [6] and in the LINPACK manual of Dongarra, Bunch, Moler, and
Stewart [I].
We mention that the BLAS covered in this chapter are what are called
"Level-1" BLAS. They involve vector operations: O(n) input and O(n) work,
Level-2 and Level-3 BLAS also exist and are increasingly important software
building blocks in various high-performance computing environments. See
Golub and Van Loan [4], J. Dongarra, J. Du Croz, S. Hammarling, and R.J.
Hanson [2], and J. Dongarra, J, Du Croz, I.S. Duff, and S. Hammarling [3] for
more details. Level-2 BLAS involve matrix-vector operations such as matrixvector multiplication and rank-1 updates: 0(n2) data and 0(n2) work. Matrixmatrix multiplication is one of the Level-3 BLAS : 0(n2) data and 0(n3) work.
XI
i n c x , y, i n c y
*
SCOPY ( 1 5 , x ( l O ) , l , y ( 2 0 ) ,1) *
c a l l SCOPY ( 2 0 , x , 1, y , 1)
x(i) + Y(l?
i = 1:20
call
y(lg+i) t x(9+i)
x(i) t x(lO+i)
i = 1:15
i = 1:lO
y ( i ) t x(-1+2i)
i = 1:SO
Nil
i = 1:20
*
c a l l SSWM(20, x, 1, y , 1)
*
callSSWAP(10,~(20),3,y(2),2) *
call SCOPY(50,~(1),2,y(l),l)
~(i)
x(17+3i)~y(2i)
i=1:10
Zero increments are perfectly legal and occasionally useful. For example, a
handy way to set up an n-dimensional zero vector is by the call
c a l l SCOPY ( n , 0, 0
, y,
1 )
and
108
1 THE BLAS
n, B, bdim, A,
adim )
n, bdim, adim
weal B ( b d i m , * ) , A ( a d i m , * )
O v e r w r i t e s t h e m-by-n
m-by-n
m a t r i x B.
On e n t r y :
B
r e a l (bdim,n)
The m a t r i x B .
bdim
integer
Row d i m e n s i o n o f t h e a r r a y B .
weal (adim,n)
The m a t r i x A .
adim
integer
Row d i m e n s i o n o f a r r a y A
integer
Row d i m e n s i o n o f m a t r i c e s B a n d A
Must h a v e m <= adim, bdim
integer
Column
dimension of m a t r i c e s
B and A .
On e x i t :
R e t u r n s a copy of t h e m a t r i x B .
Subprograms c a l l e d : SCOPY
Local v a r i a b l e s :
integer k
Copy B t o A,
d o 10 k
column by column.
=
1,n
c a l l SCOPY( m, B ( l , k ) , 1, A ( l , k ) , 1
10
continue
return
end
matrix A .with i t s
transpose,
On e n t r y :
A
r e a l (adim,n)
The m a t r i x A.
adim
integer
Row d i m e n s i o n o f t h e a r r a y A.
integer
Order o f t h e m a t r i x A .
On e x i t :
Returns t h e matrix A ~ .
Subprograms c a l l e d : SSWAP
Local v a r i a b l e s
integer k
Swap t h e k t h u p p e r t r i a n g u l a r row w i t h t h e
k t h l o w e r t r i a n g u l a r column,
do 10 k
1, n-1
c a l l SSWAP( n-k,
10
continue
return
end
A(k+l,k),
1, A ( k , k + l ) , adim )
I 109
cdim
r e a l C (cdim, * ) , v ( * )
T h i s s u b r o u t i n e s e t s u p a n n-by-n
circulant matrix C
whose k t h column i s g i v e n by
On e n t r y :
v
r e a l (n)
The n - v e c t o r
integer
Order o f t h e m a t r i x C ,
cdim
integer
Row d i m e n s i o n o f t h e a r r a y C .
v.
P u s t h a v e n <= cdim
On e x i t :
C
r e a l (cdim,n)
Subprograms c a l l e d :
Returns t h e c i r c u l a n t matrix C .
SCOPY
Local v a r i a b l e s
integer k
S e t u p t h e c i r c u l a n t m a t r i x C,
column by column.
c a l l SCOPY( n, v ( l ) , 1, C ( 1 , 1 ) ,
do 1 0 k
2,n
c a l l SCOPY( n-1,
C (n,k) = C (1,k-1)
10
continue
return
end
1 )
C(2,k-11,
1, C ( l , k ) , 1
1 111
) 1, B ( 1 ) 1 )
a d i m , B, b d i m
that accepts as input an n-by-n symmetric matrix A and copies the upper
triangular part into an array B.
adim,
work )
that overwrites a given n-by-n matrix A with the matrix AP, where P is the
permutation P = [ e 2 ,..., e , , e l ] and In=
[el ,...,en] .
4. Write a Fortran subroutine
HAM( n , A,
adim,
F,
f d i m , G, g d i m , H f h d i m )
given A
Scaling Vectors
The subroutine SSCAL is used to form products of the form ax, where
a is a scalar and x is a vector:
c a l l SSCAL( n, a , x, i n c x
The "top" of the vector to be scaled is passed through x, the scale factor is a,
the effective dimension of the scaled vector is n, and i n c x is the increment.
To illustrate how these parameters are used, if a and x ( 1 : 10 0 ) are
initialized along with the real scalar a then
*
50, a, x (2) , 2 )
*
99,x(l),x(2),1) *
c a l l SSCAL( 1 0 0 , a , x ( l ) , l )
x ( i ) t ax(;)
c a l l SSCAL(
x(2i) t ax(2i)
c a l l SSCAL(
x(i+l)tx(l)x(i+l)
i = 1:lOO
i = 1 :50
i = 1:99
incy )
1 113
i = 1:50
i = 1:5
i = 1:49
Inner Products
Inner products are computed using the function SDOT . For example, the
statement
w
SDOT( n, x ,
incx, y, incy 1
computes the inner product between the two vectors x and y. The arguments x
and y specify the tops of these two vectors and i n c x and i n c y specify the
corresponding increments. The effective length of the inner product is passed
through n To illustrate these details, if x ( 1 : 10 0 ) and y ( 1 : 5 0 ) are
initialized then
SDOT(50,x,l,y,l)
SDOT ( 5 0 , x , 2 , y, 1)
SDOT(50,x(51),1,yr1)
SDOT(3,x,l,y(3),-1)
114
1 THEBLAS
integer
r e a l (m)
r e a l (m)
The d i m e n s i o n o f v e c t o r s a a n d b.
The m-vector a .
The m-vector b ,
real
Returns minimizer of
r e a l (m)
r e a l (m)
R e t u r n s t h e minimum r e s i d u a l b - a x .
R e t u r n s t h e optimum p r e d i c t o r a x .
On e x i t :
SDOT( m, a , 1, a , 1 )
( x .NE. 0 . 0 )
then
x = SDOT ( m, a , 1, b, 1 ) / x
-x,
a , 1, b, 1 )
Compute t h e optimum p r e d i c t o r a x .
c a l l SSCAL( m,
return
end
x, a , 1 )
1 1 b-ax 1
I 115
re a 1 A (adim, * )
This s u b r o u t i n e o v e r w r i t e s t h e upper t r i a n g u l a r p o r t i o n of
m a t r i x A w i t h i t s s y m m e t r i c p a r t T = (A
a n n-by-n
A ~ /2,
)
a n d t h e l o w e r t r i a n g u l a r p o r t i o n w i t h i t s skew-symmetric
part
(A
A ~ / 2) ,
On e n t r y :
A
r e a l (adim,n)
The m a t r i x A.
adim
integer
Row d i m e n s i o n o f t h e a r r a y A.
integer
Order o f t h e m a t r i x A.
On e x i t :
Upper t r i a n g u l a r p a r t r e t u r n s u p p e r t r i a n g u l a r
p a r t of
(A
A ~ / 2) .
S t r i c t l y lower t r i a n g u l a r
p a r t r e t u r n s s t r i c t l y lower t r i a n g u l a r p a r t o f
(A
A ~ / 2) .
.50e0)
1, n-1
kpl = k
nmk = n
+
-
c a l l SAXPY( nmk,
I . , A ( k p l , k ) , 1, A ( k , k p l ) , adim)
116
1 THEBLAS
n, A, adim, x, y ,
job )
n, adim, j o b
r e a l A (adim, * ) , x ( * ) , Y ( * I
C
T h i s s u b r o u t i n e p e r f o r m s mat r i x - v e c t o r m u l t i p l i c a t i o n .
I f job
0 then y
Ax, o t h e r w i s e y
= A ~ X .
On e n t r y :
r e a l (adim, n )
The m a t r i x A.
adim
integer
Row d i m e n s i o n of a r r a y A.
x
m
r e a l (*)
The v e c t o r x.
integer
Row d i m e n s i o n o f m a t r i x A,
must h a v e m <=
adim
integer
Column d i m e n s i o n o f m a t r i x A .
job
integer
j o b = 0 compute y
job
Ax
1 compute y = ATX,
On e x i t :
r e a l (*)
I f job
I f job
0, r e t u r n s Ax;
1, r e t u r n s
A ~ X
S u b p r o g r a m s c a l l e d : SAXPY, SDOT
Local v a r i a b l e s
r e a l SDOT
i n t e g e r i fj
if
( j o b .EQ. 0 ) t h e n
do 10 i
y(i)
1,m
0.0
continue
do 20 j
1,n
c a l l SAXPY( m, x ( j ) , A ( l , j ) , 1, y ( l ) , 1 )
continue
else
do 30 j
1,n
Y O ) = SDOT
1 )
continue
30
endif
C
return
end
adim, kdim,
adim, K, kdim, v , j )
Given a n n-by-n
m a t r i x A and a n n - v e c t o r v, t h i s
s u b r o u t i n e sets u p t h e K r y l o v m a t r i x K where
On e n t r y :
A
r e a l (adim, n )
The m a t r i x A.
adim
integer
Row d i m e n s i o n o f t h e a r r a y A.
r e a l (n)
Contains t h e v e c t o r v.
integer
Dimension o f m a t r i x A and
v e c t o r v . Must h a v e n <= adim.
integer
Column d i m e n s i o n o f m a t r i x K .
kdim
integer
Row d i m e n s i o n o f a r r a y K
Must have n <= kdim.
On e x i t :
K
r e a l (kdim,n)
Subprograms c a l l e d :
Returns t h e Krylov m a t r i x K.
SCOPY, MMULT
Local v a r i a b l e s :
integer p
S e t up t h e n-by-j
K r y l o v m a t r i x K , column by column.
1 117
118
1 THEBLAS
do 10 p
2, j
c a l l MMULT( n, n , A, adim, K ( 1 , p - l ) ,
10
K(l,p),
continue
return
end
CROSS( m,
adim,
work )
that overwrites the upper triangular portion of an m-by-n matrix A with the
upper triangular portion of ATA You will need a workspace.
adim,
work )
adim,
m, B, b d i m , C, c d i m
is a very
SASUM ( n, x , i n c x )
As usual, x specifies the top of the vector x, n passes the dimension, and
i n c x the increment. To illustrate the use of these two BLAS, if x (1:20 ) is
initialized then
Maximal Entry
To compute the infinity norm of a vector, one must scan for the largest
component in absolute value. The BLA subprogram I SAMAX can be used for
this computation, If
imax
ISAMAX( n,
x, incx
then imax contains the index of the component of maximal absolute value. The
effective dimension of the vector scanned is passed through n. The top of the
120
1 THE BLAS
vectorisgivenby x andtheincrementby i n c x . If x = ( 2 , -1 , 0 , 6 , - 7 )
then
1
imax
imax
i s a m a x ( 5, x,
imax
i s a m a x ( 2, x ( 2 ) , 2 )
t 5
imnx t 2
r e a l (n)
The v e c t o r x .
integer
Dimension o f t h e v e c t o r x .
On e x i t :
NORM1
real
The i n f i n i t y - n o r m
Subprograms c a l l e d : ISAMAX, a b s
Local V a r i a b l e s
i n t e g e r ISAMAX
r e a l abs
return
end
of vector x.
1 121
n,
n, A,
adim )
adim
r e a l A(adim, * )
T h i s f u n c t i o n c o m p u t e s t h e F r o b e n i u s norm o f m a t r i x A.
On e n t r y :
A
r e a l (adim,n)
The m a t r i x A.
adim
integer
Row d i m e n s i o n o f a r r a y A.
integer
Row d i m e n s i o n o f m a t r i x A.
integer
Column d i m e n s i o n o f m a t r i x A.
On e x i t :
The F r o b e n i u s norm o f m a t r i x A.
SNORMF r e a l
Subprograms c a l l e d : SNRM2
Local v a r i a b l e s
r e a l v ( 2 ) , SNRM2
integer k
do 10 k
10
l,n
v (1) =
SNORMF
v(2) =
SNRM2( m, A ( l , k ) , 1
SNORMF
continue
return
end
0.0
SNRM2( 2 , ~ ( l )
1 ,)
122
1 THE BLAS
n, A, adim )
i n t e g e r m, n, adim
r e a l A(adim, * )
T h i s f u n c t i o n c o m p u t e s t h e 1-norm o f t h e m a t r i x A.
On e n t r y :
A
r e a l (adim,n)
The m a t r i x A.
adim
integer
Row d i m e n s i o n of t h e a r r a y A.
integer
ROW
integer
Column d i m e n s i o n o f t h e m a t r i x A.
d i m e n s i o n of t h e m a t r i x A.
On e x i t :
SNORMl r e a l
The 1-norm o f m a t r i x A.
Subprograms c a l l e d , SASUM
Local v a r i a b l e s
r e a l s , SASUM
integer k
Compute t h e 1-norm o f m a t r i x A column b y column, u p d a t i n g
partial result.
SNORMl
do 10 k
s
0.0
l,n
SASUM( m, A ( l , k ) ,
SNORMl
10
continue
return
end
max ( s , SNORMl )
T h i s s u b r o u t i n e o v e r w r i t e s an n - v e c t o r x w i t h a
u n i t n-vector v such t h a t i f P
Px i s
I-2vvT,
then
z e r o b e l o w t h e f i r s t component.
On e n t r y :
x
n
r e a l (n)
The v e c t o r x .
integer
Dimension o f t h e v e c t o r x .
On e x i t :
R e t u r n s t h e Householder v e c t o r v .
S e t t o el i f x
Subprograms c a l l e d .
0.
SSCAL, SNRM2, a b s
Local v a r i a b l e s
r e a l a l f a , b e t a , x l , SNRM2, a b s
alfa
if
SNRM2( n, x ( l ) t 1 )
( a l f a .NE.
0.0) t h e n
xl = x(1)
x(1)
xl
sign (alfa,xl)
beta
s q r t ( 2.0
alfa
c a l l SSCAL( n, l . O / b e t a ,
else
x(1)
endif
return
1.
(alfa
x(l), 1
abs(x1)) )
123
124
1 TIE BLAS
Problems for Section 2.3
that accepts as input an m-by-n real matrix A and returns the magnitude of its
largest entry.
that overwrites the real n-vector x with Px, where P is a permutation and
the components of Px are ordered from smallest to largest in absolute value.
The integer vector p should have the property that P = [ e P ( ~,...,
) ep(,) 1,
where I = [ el ,..., e, 1.
that overwrites the n-vector x with a unit 2-norm n-vector v such that if
P = I - 2vvT then P x is a multiple of y. Assume that x and y are
nonzero.
2.4
GIVENS ROTATIONS
Two of the BLAS are concerned with Givens rotations. The subroutine
SROTG constructs rotations and SROT applies them.
.NE. 0 )
then
* 42) t 5 , 4 4 ) t . 6 , c t . 8 , s t . 6
* 4 2 ) t 513 ,x(4)+ 5, c t . 6 , s t .8
126
1 THE BLAS
s u b r o u t i n e ZTOCS ( z ,
c, s
real z,c,s
C
c
c
Computes a s i n e - c o s i n e p a i r f r o m s c a l a r
p r o d u c e d by
SROTG.
Subprograms c a l l e d :
abs, s q r t
if
( abs
( z ) .LT.
s = z
c = sqrt(1.
elseif
s*s)
( a b s ( z ) .GT.
c = l./z
s = s q r t (1.
else
c
s
1 ) then
0.
1.
then
c*c)
endif
C
return
end
1 127
A = J(p,q,c,s)A
call S~OT(n,A(p,l),adirn,A(q,1),adim,c,-s)
A=J(~,~,c,s)~A
A = A J(pjqjc,s)
S)
call S R O T ( ~ , A ( I , ~, )
l , ~ ( l , q )1 1 1 c r - s )
+ A =A J ( ~ , ~ , c , s ) ~
. . . G (n-1) , where G ( k ) i s a G i v e n s r o t a t i o n i n
(k, k t l ) . Information necessary t o reconstruct G (k)
G (1)
plane
i s r e t u r n e d i n x ( k t 1 ) . S e e APPLY i n n e x t example.
On e n t r y :
r e a l (n)
The v e c t o r x.
integer
Dimension of v e c t o r x.
On e x i t :
G ( k ) f o r k = 1,
. . .,n-1.
Subprograms c a l l e d :
SROTG
Local v a r i a b l e s
r e a l c,s
integer k
Z e r o x ( k + l ) and r e c o r d t h e G i v e n s r o t a t i o n .
c a l l SROTG ( x ( k ) ,x ( k t l ) ,c, s )
10
continue
return
end
Used by
1 129
integer n
real x(*), y(*)
This subroutine overwrites t h e vector y with t h e vector
Qy, where Q
G (1)
.. .
G (n-1)
i s an o r t h o g o n a l m a t r i x
p r o d u c e d b y t h e s u b r o u t i n e ZERO.
On e n t r y :
x
r e a l (n)
real(n)
The v e c t o r y .
integer
~ i m e n s i o no f t h e v e c t o r s x and y .
On e x i t :
R e t u r n s t h e v e c t o r Qy.
and a p p l y Givens
n-1,
1, -1
c a l l ZTOCS ( x ( k + l ) , c , s )
c a l l SROT( 1, y ( k ) , 1, Y ( k t 1 1
10
continue
return
end
1,
s )
130
1 TI%
BLAS
adim )
adim
r e a l A (adim, * )
This s u b r o u t i n e o v e r w r i t e s t h e upper t r i a n g u l a r p o r t i o n of
t h e n-by-n
upper Hessenberg m a t r i x A w i t h t h e
u p p e r t r i a n g u l a r m a t r i x Q ~ A= R, where Q i s o r t h o g o n a l .
On e n t r y :
A
r e a l (adim, n )
The
H e s s e n b e r g m a t r i x A.
adim
integer
Row d i m e n s i o n o f t h e a r r a y A.
integer
Order o f H e s s e n b e r g m a t r i x A.
On e x i t :
Returns t h e upper t r i a n g u l a r matrix Q ~ A .
jpl
G(1)
...G(n-1)
where Q i s o r t h o g o n a l , and
a p p l y r o t a t i o n s t o A s u c c e s s i v e l y t o compute
d o 20 j
jpl
=
=
1, n-1
j
c a l l SROTG( A ( j , j ) ,
c a l l SROT ( n - j ,
20
A ( j p l , j ) , c, s )
A ( j, j p l )
adim, A ( j p l j p l )
adim, c, s )
&
continue
return
end
R = Q TA.
1 131
that accepts real 2-vectors x and y (both nonzero) and computes a Givens
rotation G such that Gx is a multiple of y .
c, s
that overwrites the n-by-n real matrix A with QTA Q, where Q is the
identity everywhere except 9;; = qjj = c and qii = -qji = s .
3. Write a Fortran subroutine
SYM2 ( B, bdim, c, s
that accepts as input a real 2-by-2 matrix B and computes a Givens rotation
G such that the product GB is symmetric.
In this section we discuss these options as they pertain to the BLAS, To facilitate
the discussion, the above prefixes are also used to designate the type of an
argument. 'I'hus,
dw
DZNRM2( n, zx, i n c x
Copying a Vector
call
SCOPY( n, s x , i n c x , s y , i n c y )
call
DCOPY( n, d x , i n c x , dy, i n c y )
call
call
ZCOPY ( n, zx, i n c y ,
zy, i n c y )
W VERSIONS
I 133
Interchanging Vectors
call
SSWAP( n, s x , i n c x , s y , i n c y
call
DSWAP( n, d x , i n c x , dy, i n c y
call
CSWAP ( n, c x , i n c x , c y , i n c y
call
ZSWAP( n, zx, i n c x ,
zy, i n c y
c a l l DSCAL( n, d a , dx, i n c x
c a l l CSCAL( n, c a , cx, i n c x
c a l l ZSCAL( n,
za, zx, i n c x
CSSCAL( n, s a , c x , i n c x
call
ZDSCAL( n, d a , zx, i n c x
Inner Product
sw
SDOT( n, s x , i n c x , s y , i n c y
dw
DDOT( n, d x ,
i n c x , dy, i n c y
xHy
cw =
CDOTC ( n,
c x , i n c x , cy, i n c y
zw =
ZDOTC( n,
zx, i n c x , zy, i n c y
cw =
CDOTU( n,
c x , i n c x , cy, i n c y
zw =
ZDOTU( n,
c x , i n c x , cy, i n c y
SAXPY( n, s a , s x , i n c x , s y , i n c y
call
CAXPY( n,
d a , d x , i n c x , dy, i n c y
call
CAXPY( n, c a , c x , i n c x , cy, i n c y
call
sw
SNRM:!
n, s x , i n c x
dw
DNRM:!
n, dx, i n c x
sw
SCNRM2( n, cx, i n c x
dw
DZNRM2( n,
zx, i n c x
n, s x , i n c x
dx, inCx
sw =
SCASUM( n, dx, i n c x
dw
DZASUM( n,
sw =
SASUM
dw
DASUM
( n,
zx, i n c x )
Maximal Entry
imax
ISAMAX( n, s x , i n c x
imax
IDAMAX( n, d x , i n c x
imax
ICAMAX( n, c x , i n c x )
imax
IZAMAX( n, zx, i n c x
1 135
SROT ( n,
call
DROT ( n, d x ,
call
call
Z D R O T ( n,
s x , i n c x , s y , i n c y , sc, ss )
i n c x , dy, i n c y , dc, d s )
zx, i n c x ,
and ZDROT
zy, i n c y , dc, d s
ca
a1 + i*a2
cb = b l
i*b2
T h i s s u b r o u t i n e computes r e a l c o s i n e - s i n e p a i r s
( c 1 , s l ) and (122, s 2 ) s o
[
sl(c2+i*s2) I
cl
[-sl (c2-i*s2)
cl
[ a1
1 [ blt
i*a2 I
i*b2 1
[ r
Lo1
On e n t r y :
ca
complex
1st component o f v e c t o r t o be z e r o e d .
136
1 THE BLAS
cb
complex
On e x i t :
scl
real
c1
ss1
r e a1
s1
sc2
r e a1
c2
ss2
r e a1
s2
Subprograms c a l l e d : SROTG,
r e a l , aimag
Local v a r i a b l e s
r e a l a l , a2, b l , b2
r e a l r e a l , aimag
Compute r e a l and imaginary p a r t s of ca and cb,
t h e 1 s t and 2nd component of v e c t o r t o be zeroed.
a1
real (ca)
a2
aimag(ca)
bl
b2
r e a l (cb)
aimag(cb)
sca*scb
ssa*ssb
ss2
ssa*scb
sca*ssb
return
end
1 137
Chapter 3
LINPACK
LINPACK is a collection of Fortran subroutines that can be used to solve
numerous linear system and least squares problems. For square systems it has
routines that perform the following factorizations:
PA = LfU
A
= GGT
A
= LDLT
=QR
= USV'
( Orthonormahation )
( Singular Value Decomposition )
3.1 TRIANGULAR
SYSTEMS
The standard approach to solving dense linear systems reduces the given
problem to a collection of triangular problems. Thus, we begin with the
LINPACK codes associated with the solution of upper or lower triangular
systems.
and
where
T
real (tdim,n)
tdim
integer
integer
red (n>
job
integer
info
integer
job
job
job
job
= 11
*
*
Solve
Solve
Solve
Solve
Tx = b,
Tx = b,
P x = b,
F x = b,
where
where
where
where
T
T
T
T
is lower triangular
is upper triangular
is lower triangular
is upper triangular
Condition Estimation
The 1-norm condition Kl(n of a triangular matrix can be estimated using
the subroutine STRCO :
c a l l STRCO( T , t d i m , n, rcond,
z, job
Here,
T
tdim
integer
integer
rcond real
real (n)
job
integer
If the value returned by rcond is sufficiently small, then one may conclude
that T is numerically singular . This is because
I I Tx l I
min
II xll1=
1 143
The definition of "sufficiently small" usually means that the value is so small that
we can make the stored T exactly singular by altering least significant mantissa
bits. Such a decision usually affects subsequent computation in a serious way.
To check for this, code of the following form should be executed:
r c o n d l = rcond + 1 . 0 e 0
i f (rcondl G . 1 . O e O )
then
{ Process nonsingular T case )
else
{ Process the singular T case 1
endif
The idea behind this kind of testing is detailed in 5 1.9. Essentially, if rcondl is
1, then rcond is less than the machine precision,
Condition estimation has a prominent role to play in the subroutines in
this chapter that are concerned with linear system solving. Indeed, we stress the
use of LINPACK routines that return condition estimates. That way, when a
computed solution x, to Ax = b is obtained then the heuristic
rcond I I x - x c I l l = I1 x
111
can be invoked.
tdim, n, d ,
job,
info
On input the variable job contains a three-digit integer ABC that is used to
indicate the required computations. The argument d is a two-vector.
144
1 LINPACK
A =
A =
B =
B =
det(T)= d(l)x106(2)
a No determinant computations
* Overwrite T with its inverse
* No inverse computations
* T is upper triangular
T is lower triangular
n, bdim, p,
r e a l T( t d i m , *)
, B (bdim, * ) ,
info
rcond,
z (*)
O v e r w r i t e s B ( n x p ) w i t h s o l u t i o n t o TX = B where T ( n x n )
i s upper
however,
t r i a n g u l a r . I f T is numerically singular,
t h e n n o a t t e m p t i s made t o compute
X.
On e n t r y :
T
r e a l (tdim,n)
The m a t r i x
tdim
integer
ROW
integer
Dimension o f t h e m a t r i x
r e a l (bdim, p )
The m a t r i x
bdim
integer
The row d i m e n s i o n o f a r r a y B .
integer
The column d i m e n s i o n o f m a t r i x B,
r e a l (bdim,p)
R e t u r n s s o l u t i o n t o TX= B
T.
dimension of t h e a r r a y T .
T.
B.
On e x i t :
B
i f T nonsingular,
info
integer
I f i n f o = 0,
T nonsingular.
I f i n f o = 1, T n u m e r i c a l l y
singular.
rcond
real
Returns l/condl( T
r e a l (n)
Returns
11
Tz
II
estimate.
u n i t 1-norm z
=
rcondll z
II
so
i n 1-norm
1 145
Local V a r i a b l e s
integer j
r e a l rcondl
C
c a l l STRCO( T , t d i m , n,
rcondl = 1.0e0
if
( rcondl
rcond,
z,
1 )
rcond
.EQ, 1 . 0 e 0 ) t h e n
i n f o -- 1
else
info = 0
d o 1 0 j = 1, p
t d i m , n, B ( 1 , j ) , 0 1 , i n f o )
c a l l STRSL( T,
10
continue
endif
return
end
t d i m , n , u, v, b , i n f o )
tdim,
n,
R,
rdim,
job,
info )
146
1 LMPACK
that accepts as input two n- b y-n upper triangular matrices T and R, and
overwrites T with TR" if job = 0, and with R-'T if job = 1. The integer
variable info should be used to communicate trouble encountered in the
solution process.
4. Write a subroutine
TRILYP( U,
udim,
n, C , c d i m ,
work, info
LU Factorization
The subroutine S G E C 0 computes the factorization PA = LU via
Gaussian elimination with partial pivoting. It should be used for general square
matrices. It also provides an estimate of the condition. A call to SGECO has the
form
CALL SGECO ( A,
adim,
n, i p v t , rcond, z
where
A
real ( a d i m , n)
adim
integer
integer
r c o n d real
i p vt
integer (n)
(n)
Gauss transforms M 1 . . M
and permutations PI Pn-1 a r e
-..MIPIA = U is upper triangular. The upper
generated so that Mn_IPn_I
triangular portion of A is overwritten by U,whereas for all i > k, A(i,k) is
overwritten by the (i,k) entry of Mk. The permutation Pk can be obtained by
148
1 LINPACK
n, i p v t , i n f o
adim,
All the arguments have the same role as in SGECO except the integer variable
i n f o which is used to communicate trouble in the event of ill-conditioning:
info = 0
info = k
=3
UU
= 0
adim,
n, i p v t , b,
job
where
A
real (adim,n)
adim
integer
integer
ipvt
integer (n)
lEal (n)
integer
job
job = 0
job = 1
1 149
* solve Ax = b
* solve ATx = b
job
job = 1 0
job
11
*
*
only inverse
01dy determinant
Determinant and inverse
150
I LINPACK
i p v t , rcond,
z,
b,
info)
integer adim,
r e a l A (adim, *)
n,
ipvt(*), info
b (*)
rcond,
z (*)
C;
c
c
On e n t r y :
A
r e a l (adim,n)
T h e m a t r i x A.
c
c
c
c
adim
integer
T h e row d i m e n s i o n o f t h e a r r a y A
integer
The d i m e n s i o n o f t h e m a t r i x A.
Must h a v e n <= a d i m .
r e a l (n)
The r i g h t - h a n d s i d e v e c t o r .
r e a l (n)
R e t u r n s A-lb i f A i s
On e x i t :
b
integer(n)
Used f o r p i v o t i n f o r m a t i o n .
rcond
real
r e a l (n)
Returns
11 Az I I
info
integer
info
u n i t 1-norm z s o
= rcondll z I 1 i n 1-norm
0
info > 0
*
*
x found
A s i n g u l a r , no x
Local Variables:
r e a l rcondl
c a l l SGECO ( A, a d i m ,
n,
i p v t , rcond, b )
if ( r c o n d l
.GT.
1.0e0)
then
a d i m , n,
i p v t , b, 0 )
info = 0
c a l l SGESL ( A,
else
info
endif
return
end
1 151
2. Write a subroutine
B I L I N ( A,
adirn, n, u, v, f , i n f o )
3. Write a subroutine
EMAX( A,
adirn, n, i p v t , w )
that takes the output from S G E C O (i.e., the factorization PA = LU) and
computes the real number
where e is the vector of all 1's. Note: the absolute value of a matrix is obtained
by taking the absolute value of its entries,
Cholesky Factorization
The subroutine S P O C O can be used to compute the Cholesky
factorization A = RTR of a symmetric positive definite (s.p.d.) matrix :
CALL SPOCO ( A,
adim,
n, rcond,
z, i n f o
Here:
A
adim
integer
integer
rcond
real
real (n)
integer
k=0
k>0
*R
is found.
A(l:k,l:k) is not p.d.
SPOFA ( A,
adim,
1 153
n, i n f o )
can be used to compute the Cholesky factorization. The arguments perform the
same function as in SPOCO.
The output of either SPOCO or S P OFA can be used by the subroutine
SPOS L to solve a symmetric positive definite system Ax = b :
CALL S P O S L ( A,
adim,
n, b )
Here,
A
real (adirn,n)
adim
integer
integer
real (n)
adim,
n, i p v t , rcond, z )
Here,
A
real (adim,n)
adim
integer
integer
154 LINPACK
ip v t
integer(n)
rcond real
red (n)
adim, n, i p v t , k )
k > O
adim, n, i p v t , b )
Here,
A
real (adim,n)
adim
integer
integer
integer (n)
red (n)
1 155
adim, n, d e t
job
(n)
Packed Storage
In some applications it is convenient to store symmetric matrices in
"packed" form, This means that the upper triangular part of the symmetric
matrix A is stored column by column in a linear array AP having dimension at
least n(n+ 1)/2 (see $1.5). Thus, for i < j ,the (id entry of the matrix A is
housed in AP ( i + j ( j - 1 ) / 2 ) All of the subroutines mentioned in this
section have packed storage implementations in LINPACK, e.g.,
Packed Version
SPPCO (AP, n ,
S P P S L (AP, n,
SSPCO (AP, n ,
SSPSL (AP,n,
rcond, z, i n f o )
b)
ipvt, rcond, z)
ipvt,b)
Conventional Version
SPOCO (A, adim, n ,
SPOSL (A, adim, n ,
S S I C O ( A , adim, n ,
S S I S L (A, adim, n ,
r c o n d , z, i n f o )
b)
ipvt, rcond, z )
i p v t b)
156
1 LINPACK
while SCHDD can be used to compute the Cholesky factor (if it exists) of
1 157
z, i n f o )
integer
real
S o l v e s Ax
n,
adim, i n f o
A (adim, * )
b ( * ) , rcond,
z(*)
= b,
On e n t r y :
r e a l (adim, n )
adim
integer
Row d i m e n s i o n o f t h e a r r a y A.
integer
Dimension o f t h e m a t r i x
r e a l (n)
The r i g h t - h a n d
A.
A.
side.
c On e x i t :
r e a l (n)
U s u a l l y r e t u r n s s o l u t i o n t o Ax
c
c
c
rcond
real
r e a l (n)
~ e t u r n s u n i t 1-norm z
11
info
integer
Az
11
rcondl
II
so
i n 1-norm.
b returns ~ ' l b ,
info > 0
A(l:k,l:k)
info
info
-1
= b.
not pos.def.
A numerically singular.
Local Variables
r e a l rcondl
c a l l SPOCO( A, a d i m , n,
rcondl
i f
rcond
( info
.EQ. 0
rcond,
z, i n f o
1.0e0
.AND. r c o n d l , G T .
1.0e0 ) then
c a l l SPOSL( A, adim, n , b )
e l s e i f ( i n f o .EQ. 0
info
endif
return
end
-1
.AND. r c o n d l .EQ. 1 . 0 e 0 ) t h e n
158
1 LINPACK
Solves
Ax
, b(*),
where
rcond,
z(*)
i s symmet r i c .
On e n t r y :
A
real (adim, n )
The syrnmet r i c m a t r i x
adim
integer
Row d i m e n s i o n o f t h e a r r a y A .
integer
Dimension o f t h e m a t r i x
ipvt
i n t e g e r (n)
r e a l (n)
The r i g h t - h a n d
r e a l (n)
Usually returns ~ - 1 b .
r e a l (adim,n )
Upper t r i a n g u l a r p a r t r e t u r n s U .
rcond
real
r e a l (n)
Returns
11 Az 1 1
info
integer
info = 0
i n f o = -1
A.
A.
side.
On e x i t :
u n i t 1-norm z s o
= rcondll z 1 1 i n 1-norm.
* x found.
* A numerically
singular.
1,0e0
( r c o n d l .GT. 1 . 0 e 0 ) t h e n
info
c a l l S S I S L ( A,
else
info
endif
return
end
rcond
-1
adim, n , i p v t , b )
z )
1 159
n, ipvt, b, rcond, z, w,
adim,
info
2. Write a subroutine
L R ( A,
a d i m , n,
info, work )
that overwrites the upper triangular part of A with the upper triangular part of
RRT, where A = RTR is A's Cholesky factorization.
A,
adim,
n, i n f o )
that overwrites the upper triangular part of A with an upper triangular matrix
U such that A = UUT. Hint: Apply SPOFA to the matrix EAE, where E is
the exchange matrix, i.e., the identity with the column ordering reversed.
4. Let A be an n- by-n symmetric positive definite matrix and let A ri,ll
denote the principal submatrix defined by rows and columns i ,..., j . Assume
that 1 5 i < j I n . Write a subroutine
A I J I N V ( A,
adim,
n, i, j ,
info,
.. .
that overwrites the upper triangular portion of Aridlwith the upper triangular
portion of its inverse,
n, B, b d i m ,
i n fof w o r k )
160
1 LINPACK
that overwrites the upper triangular part of A with the upper triangular part of
ReTAR-1,where B = RTR is the Cholesky factorization of B.
3.4
BANDED SYSTEMS
SGECO ( A,
SGBCO( A,
adim, n, i p v t , rcond, z )
adim, n, p, q , i p v t , rcond,
Full:
Banded:
SGEFA ( A,
SGBFA ( A,
adim, n, i p v t , i n f o )
adim, n, p , q , i p v t , i n f o )
Full:
Banded:
S G E S L ( A,
SGBSL( A ,
adim, n, i p v t , b, job )
adim, n , p, q , i p v t , b,
Full:
Banded:
SGEDI ( A ,
SGBDI ( A,
adim, n , i p v t , b , job )
adim, n, p , q , i p v t , b, d e t )
job )
The arguments to the band routines perform the same function as their
counterparts in the full routines. The integer variables p and q pass the lower
and upper bandwidths. However, the n-by-n matrix A is not stored
conventionally. Instead, it is stored in an array of size (2p + q + 1)-by-n.
The case n = 7 ,p = 1 ,and q = 3 amply illustrates how this is done:
162
1 LINPACK
Band Cholesky
The subroutines described in 93.3 for full positive definite symmetric
systems also have banded analogs:
Full:
Banded:
SPOCO( A,
SPBCO ( A,
adim,
adim,
Full:
Banded:
S P O F A ( A,
S P B F A ( A,
n, i n f o )
a d i m , n, q, i n f o )
Fd:
Banded:
S P O S L ( A,
S P B S L ( A,
adim,
adim,
Fulk
Banded:
S P O D I ( A,
S P B D I ( A,
n , d e t , job )
a d i m , n,
q, d e t )
n, rcond, z , i n f o )
n,
q, rcond, z, i n f o )
adim,
n, b )
n, q , b )
adim,
The arguments to the band routines perform the same function as their
counterparts in the full routines. The integer variable q passes the bandwidth.
However, the n-by-n matrix A is not stored conventionally, For these banded
subroutines, the matrix A must be stored as follows:
1 163
Here, we are displaying the case n = 8, q = 3, In this data structure, the matrix
element a;j is stored in the array element A (q+I+i-j ,j ) .
Tridiagonal Systems
Consider the tridiagonal system
If the data for this problem is stored in real arrays c, d, e, and b then x can
be computed using Gaussian elimination with partial pivoting by invoking the
subroutine SGTSL :
CALL SGTSL( n, c , d, e, b, info )
164
1 LINPACK
q , rcond,
z, b, i n f o )
n , q , adim, i n f o
r e a l A (adim, *)
S o l v e s Ax = b
, b (*) ,
(*)
rcond
where A i s s p d w i t h bandwidth
q.
On e n t r y :
A
r e a l (adim, n )
The nxn s y m m e t r i c p o s i t i v e
d e f i n i t e matrix
stored i n
banded form.
adim
integer
Row d i m e n s i o n o f t h e a r r a y
A,
integer
Dimension o f t h e m a t r i x
r e a l (n)
The r i g h t - h a n d s i d e .
integer
The b a n d w i d t h o f t h e m a t r i x A.
A.
Must h a v e q t l <= a d i m .
On e x i t :
A
r e a l (adim, n )
Returns t h e Cholesky t r i a n g l e .
r e a l (n)
U s u a l l y r e t u r n s s o l u t i o n t o Ax=b.
rcond
real
r e a l (n)
R e t u r n s u n i t 1-norm z s o
11
info
integer
II
Az
rcondll z
info=O
11 i n
1-norm.
xfound.
i n f o = -1
A s i n g u l a r , no x .
(info
.EQ.
adim
i n f o = -1
endif
return
end
rcond
info )
1.0e0
0
c a l l SPBSL( A
else
.AND.
adim
rcondl
.GT.
q, b )
1.OeO)
then
1 165
alfa, b, info
is a real
2. Write a subroutine
CORNER( n, d, e, alfa, b, info )
In particular, if
real (adimp)
adim
integer
integer
integer
qraux
real(n)
i p vt
integer (n)
work
real(n)
Workspace.
job
integer
If job
= 1 on
ipvr(k) > 0
ipvr(k) = 0
ipvr(k) < 0
continue
c a l l SQRDC ( A , a d i m , m , n , q r a u x , i p v t , w o r k , 1 )
Here,
A
real(adim,n)
adim
integer
integer
integer
168
1 LINPACK
qraux
real (n)
real (m)
To interpret the rest of the calling sequence we need additional notation. Let
Ak denote the first k columns of AP and let Rk be the leading k-by-k
portion of R. Depending upon what is requested,
Qb
ram)
QTb
real(m)
real(k)
rsd
real(m)
AX
real(m)
ino
integer
job
integer
I I A@ - b 1 1
*
*
*
*
3
returnsthevectorQb
Q T ~returnsthevector j2Tb
returnsthesolutiontominIIAy-bI12
x
r s d returns the minimum residual b - Akx
returns the predictor A@
AX
Qb
Compute everything:
H
call SQRSL(A,adim,m,k,qraux,b,b,b,b,b,b,01000,info)
n,
info
, x (*) ,
b (*)
rcond,
(*)
qraux (*)
Solves
min I I Ax
QR.
must n o t be n u m e r i c a l l y r a n k d e f i c i e n t .
On e n t r y :
r e a l (adim, n)
T h e m-by-n
adim
integer
Row d i m e n s i o n o f t h e a r r a y A.
integer
Row d i m e n s i o n o f t h e m a t r i x A.
integer
Column d i m e n s i o n o f m a t r i x
real (m)
The real m - v e c t o r b.
m a t r i x A.
A.
3.5 QR FACTORIZATION
1 171
exit :
A
r e a l (adim,n)
R e t u r n s Q and R.
r e a l (m)
Returns b
r e a l (n)
R e t u r n s t h e m i n i m i z e r XLS o f
llAx
AxLS.
bll2-
rcond
real
real (n)
R e t u r n s u n i t 1-norm z s o
II
info
Az
info
real (n)
rcondl
info = 0
integer
qraux
II
0
11
*solution
*A
i n 1-norm.
found.
rank d e f i c i e n t .
r e t u r n s i n f o r m a t i o n o n Q.
Local v a r i a b l e s
real r c o n d l
c a l l SQRDC ( A, a d i m , m, n , q r a u x ,
1, z, 0 )
c a l l STRCO ( A, a d i m , n,
rcondl
if
rcond
r c o n d l .GT.
rcond,
z,
1.0e0
1.0e0)
then
c a l l SQRSL ( A, a d i m , m, n , q r a u x , b, z , b , x, b,z, l l 0 , i n f o )
else
info
-1
endif
return
end
n, i n f o , i p v t (*)
r e a l A ( a d i m , * ) , b(*), x ( * ) , z ( * ) , q r a u x ( * ) , r c o n d
Solves
t h e f a c t o r i z a t i o n AP
numerically rank d e f i c i e n t .
t h e u n d e r d e t e r m i n e d s y s t e m Ax
=
QR p r o v i d e d
b via
is
not
On e n t r y :
A
adim
m
n
C o n t a i n s a r e a l m-vector b.
r e a l (adim, n )
R e t u r n s Q and R .
r e a l (m)
Is d e s t r o y e d .
r e a l (n)
R e t u r n s t h e s o l u t i o n t o Ax
rcond
rea1
Returns l/condl
r e a l (n)
R e t u r n s u n i t 1-norm z s o
On e x i t :
11
info
Az
info
integer
II
=
info > 0
qraux
ipvt
r e a l (n)
i n t e g e r (n)
estimate.
( A )
rcondll z
= b.
11
i n 1-norm.
* s o l u t i o n found.
* A rank d e f i c i e n t .
R e t u r n s i n f o r m a t i o n on Q.
R e t u r n s i n f o r m a t i o n on P .
adim, rn
c a l l STRCO ( A, adim, m,
r c o n d l = r c o n d + 1.0e0
if
( r c o n d l .GT.
1.0e0
n, q r a u x , i p v t ,
z, 1 )
r c o n d , z, 1 )
then
30
x(k)
if
i p v t ( k ) ,LE. m
continue
else
info
endif
return
end
-1
0,
)
x(k) = b(ipvt(k))
Example 3.5-2 illustrates an important point when using SQRDC with column
pivoting. Namely, the user may be obliged to unscramble results as in the 30loop. This is in contrast to the LINPACK solve routines such as SGESL and
SSISL that hide the underlying permutation activity from the user.
(1)
(2)
(3)
(4)
A = QR
A = QL
A = RQ
A = L(2
Q
Q
Q
Q
orthogonal, R
orthogonal, L
orthogonal, R
orthogonal, L
upper triangular
lower triangular
upper triangular
lower triangular
Let E be the exchange matrix, i.e,, the identity with the column ordering
reversed. By working with transposes and the exchange matrix it is possible to
relate the factorizations (2), (3), and (4) to (1). For example, if EAE = QR is
the QR factorization of EAE, then A = (EQE)(ERE) is the "QL"
factorization of A. Write a subroutine
Q R ( A,
adim,
n, Q,
qdim,
job,
work,
qraux
adim, m,
n, B, bdim, p, work, q r a u x
b) Solve R
fl = b
c) Compute
Verify that this algorithm works and implement it in a subroutine of the fonn
S220( D,
ddim,
n , E , e d i m , p, f, b, work
call SSVDC( A,
U,
&
job,
info
Here,
adim
integer
integer
integer
real (n)
real (n)
Work vector.
udim
integer
vdim
integer
work
real@)
Work vector.
j ob
integer
info
integer(n)
The variable job is a two-digit integer AB that prescribes "how much" of the
SVD to compute. For example, U and/or V may not be required. Provision
can also be made to form just the first n columns of U, i,e., the matrix U1 in
Anything less than the full SVD allows for substantial overwriting by identifying
the mays u and/or v with the m a y A in the calling sequence. Here are the
possibilities:
L
job
00
00
01
01
10
11
20
20
21
21
21
Identification
A
A
A
A
A
A
A
A
A
A
A
A
A
U
A
A
U
U
U
A
U
A
U
V
A
A
V
A
A
V
A
V
U
A
Effect
NoUorV.
No U or V. A saved.
Matrix Vreturned in V,
Matrix Vreturnedin topofA,
Matrix Ureturned in U. A saved,
Matrices U and V returned in U and V. A saved.
Matrix U1 returned inU. A saved.
MatrixU1 returnedin A,
Matrices U 1 and V returned in U and V. A saved.
Matrices U1 and V returned in U and A. A saved,
Matrices U1 and V returned in A and V.
i n t e g e r adim, m,
n, vdim, i n f o
r e a l A(adim, * ) , s ( * ) , e ( * ) , V ( v d i m , * ) , b ( * ) , t o l , x ( * )
C
I I Arx -
11
Minimizes
such t h a t b r ( A ) 2 t o 1
where
a n d A,
1 177
m a t r i x t o A.
On e n t r y :
A
adim
integer
matrix
A.
Row d i m e n s i o n o f t h e a r r a y A .
Must h a v e adim 2 m .
integer
Row d i m e n s i o n o f t h e m a t r i x A.
integer
Column d i m e n s i o n o f m a t r i x A ,
r e a l (n)
Work v e c t o r .
vdim
integer
Row d i m e n s i o n o f t h e a r r a y V.
r e a l (m)
The r i g h t - h a n d s i d e v e c t o r .
to1
rea1
A positive tolerance.
to1
>
machine p r e c i s i o n .
On e x i t :
A
r e a l (adim,n) A i s d e s t r o y e d .
r e a l (n)
Returns t h e s i n g u l a r values
i n descending
order.
real(vdim,n)
Returns t h e orthogonal V.
r e a l (n)
Returns minimizer of
integer
I I Arx - b 1 1 2 .
info > 0 implies
info
SVD t r o u b l e .
i n f o < 0 i m p l i e s r a n k (A)
-info.
Local V a r i a b l e s :
i n t e g e r i, j
real ujtb
c a l l SSVDC ( A, adim,m, n , s , e , A ,
if
( i n f o ,NE. 0
do5
i = l , n
x(i)
5
cont inue
0.
then
178
1 LINPACK
do 5 0 j
if
l,n
( s(j)
uj t b
t o 1 ) then
.GE.
=
SDOT ( m,
a (1, j) , 1
b1
1)
c a l l SAXPY( n,ujtb/s(j),V(l,j),l,x(l),l
info = info
endif
50
continue
endif
wetu wn
end
adim, n, U,
udim, P, pdim
Z1
1 179
Triangular
(Solve)
STRSL( SA, adim,
n,
sb,
job,
info )
DTRSL( DA,
adim, n,
db,
job,
info )
CTRSL( CA,
adim,
n,
c b , job,
info )
ZTRSL( ZA,
adim,
n,
zb,
info )
n,
srcond, s z ,
DTRCO( DA,
adim,
n, d r c o n d , d z , j o b
CTRCO ( CA,
adim,
n,
srcond,
cz, job
ZTRCO( ZA,
adim,
n,
drcond,
z z , job
n,
i p v t , srcond, s z )
DGECO( DA,
adim,
n,
i p v t , drcond,
dz )
CGECO( CA,
adim, n ,
i p v t , srcond,
cz )
ZGECO ( ZA,
adim, n ,
i p v t , drcond,
zz
job,
(Condition)
job )
General
(Factor and Condition)
(Factor)
SGEFA( SA, adim, n, i p v t ,
info )
DGEFA( DA,
adim, n, i p v t ,
info )
CGEFA ( CA,
adim, n,
ipvt,
info )
ZGEFA( ZA,
adim, n, i p v t ,
info )
(Solve)
SGESL( SA, adim, n,
ipvt,
sb,
DGESL ( DA,
adim, n,
i p v t , db, job )
CGESL( CA,
adim, n,
i p v t , cb,
ZGESL( ZA,
adim, n, i p v t ,
zb,
job )
job )
job )
(Inverse and D e t e r m i m )
SGEDI( SA, adim, n,
i p v t , s d e t , swork, j o b )
D G E D I ( DA,
adim, n,
i p v t , d d e t , dwork,
CGEDI ( CA,
adim, n, i p v t ,
c d e t , cwowk,
job )
Z G E D I ( ZA,
adim, n,
zdet,
job )
ipvt,
zwork,
job )
General Banded
(Factor and Condition)
SGBCO( SA, adim,
n, p,
q,
i p v t , srcond,
sz
DGBCO( DA,
adim, n, p , q ,
i p v t , d r c o n d , dz
CGBCO ( CA,
adim,
i p v t , srcond, cz )
ZGBCO( ZA,
adim, n, p , q ,
n, p , q ,
i p v t , drcond,
(Factor)
SGBFA( SA, adim, n, p, q, i p v t ,
DGBFA( DA,
adim, n, p ,
q,
CGBFA( CA,
adim, n, p,
q, i p v t ,
ZGBFA ( ZA,
adim, n, p, q ,
info )
ipvt, info )
ipvt,
info )
info )
zz
1 181
182
1 LINPACK
(Solve)
SGBSL( SA, adim,
n,
p,
q,
ipvt,
sb, j o b )
DGBSL( DA,
adim,
n,
p,
q,
ipvt,
db,
CGBSL( CA,
adim,
n,
p,
q, i p v t , cb, j o b )
ZGBSL( ZA,
adim,
n, p,
job )
q,
ipvt,
zb,
sdet,
job )
n, p ,
q,
ipvt,
swork,
job )
D G B D I ( DA,
adim,
n, p ,
q,
i p v t , d d e t , dwork,
job )
C G B D I ( CA,
adim, n , p , q ,
ipvt,
cdet,
cwork,
job )
Z G B D I ( ZA,
adim,
ipvt,
zdet,
zwork,
job )
n, p , q ,
General Tridiagonal
SGTSL( n,
s c , s d , se, s b , i n f o
DGTSL( n,
d c , dd,
CGTSL( n,
ZGTSL( n,
zc,
zd,
de,
ze,
db,
zb,
info )
info )
n,
srcond, s z ,
info )
DPOCO ( DA,
adim,
n,
drcond,
dz,
info )
CPOCO ( CA,
adim,
n,
srcond, c z ,
info )
ZPOCO ( ZA,
adim,
n,
drcond,
n,
info )
DPOFA( DA,
adim,
n,
info )
CPOFA ( CA,
adim,
n,
info )
ZPOFA( ZA,
adim,
n,
info )
(Factor)
zz, info
(Solve)
S P O S L ( SA,
adim, n, s b )
D P O S L ( DA,
adim, n, db )
C P O S L ( CA,
adim, n, cb )
Z P O S L ( ZA,
adim, n,
zb )
adim, n,
D P O D I ( DA,
adim, n, d d e t , job )
C P O D I ( CAI adim, n,
Z P O D I ( ZA,
s d e t , job )
sdet, job
adim, n, d d e t , job )
adim, n, q, s r c o n d , s z ,
DPBCO( DA,
adim, n, q, drcond, d z , i n f o )
info
ZA, adim, n, q , drcond, z z , i n f o
CPBCO( CA,
ZPBCO (
adim, n, q, s r c o n d , c z ,
info
(Factor)
S P B F A ( SA,
adim, n, q, i n f o )
D P B F A ( DA,
adim, n, q, i n f o )
C P B F A ( CA,
adim, n, q, i n f o )
Z P B F A ( ZA,
adim, rr; q, i n f o )
(Solve)
S P B S L ( SA,
adim, n, q, sb )
D P B S L ( DA,
adim, n,
C P B S L ( CA,
adim, n , q, cb )
Z P B S L ( ZA,
adim, n, q, z b )
q, db )
)
)
1 183
184
1 LINPACK
(Inverse and Determinanb)
SPBDI ( SA, adim,
n,
q,
sdet,
job
DPBDI( DA,
adim, n,
q, ddet,
job
CPBDI( CA,
adim, n,
q,
sdet,
job
ZPBDI ( ZA,
adim,
q, ddet,
job
n,
sd,
se, s b , i n f o
D P T S L ( n, d d , d e , d b ,
info )
CPTSL( n,
sd,
ce, c b , i n f o
ZPTSL( n,
dd,
ze,
zb,
info )
Hermitian Indefinite
(Factor and Condition)
SSICO ( SA, adim,
n,
ipvt,
srcond,
sz
DSICO( DA,
adim,
n,
ipvt,
drcond,
dz
CSICO( CA,
adim,
n,
ipvt,
srcond,
cz
ZSICO( ZA,
adim, n,
ipvt,
drcond,
zz
(Factor)
SSIFA( SA, adim,
n,
ipvt,
info )
DSIFA( DA,
adim,
n,
ipvt,
info )
CSIFA( CA,
adim,
n,
ipvt,
info )
ZSIFA( ZA,
adim,
n,
ipvt,
info
S S I S L ( SA, adim,
n,
ipvt,
sb )
DSISL( DA,
adim,
n,
ipvt,
db )
CSISL( CA,
adim,
n,
ipvt,
cb )
ZSISL( ZA,
adim,
n,
ipvt,
zb )
(Solve)
I 185
job)
job)
job)
QR Factorization
(Factor)
SQRDC(SA, adirn, rn,
n, s q r a u x , i p v t , swork, j o b )
DQRDC (DA,
CQRDC (CA,
adirn, rn,
ZQRDC (ZA,
adirn, rn, n,
n, c q r a u x , i p v t , cwork, j o b )
z q r a u x , i p v t , zwork, j o b )
(Solve)
SQRSL (SA, adirn, rn, n , s q r a u x , s b , sQbIsQTb-,
job, i n f o )
186
1 LINPACK
Problems for Section 3.7
1. Some complex problems can be cast in real form at the expense of increased
dimension. For example, the equation
Write real arithmetic subroutines CGECO, CPOCO, CSICO, and CQRDC that
compute the real and imaginary parts of the following factorizations:
CGE co :
CPOCO :
CS I co :
CQRDC:
A = L U,
A =U ,
A = LDLH,
A = QR,
2. Write the double precision and complex single precision versions of the
subroutine SGSOL in Example 3.2- 1.
3. Write the double precision and complex single precision versions of the
subroutine LS in Example 3.5- 1.
Chapter 4
MATLAB
MATLAB is an interactive system in which it is possible to express matrix
algorithms at a very high level. For example, QR factorizations and eigenvalue
decompositions are "one-liners." Submatrices and block matrices are easily
manipulated.
Our aim is to convey a sense of the MATLAB language. We do not
cover MATLAB graphics or those aspects of MATLAB that involve interaction
with the underlying operating system. These matters are best left to The
MATLAB User's Guide [7] and the excellent on-line help facility.
4.1
BASICS
Once a MATLAB session begins, the user can create matrices and vectors
and perform computations that involve them. We describe some of the ways
that matrices and vectors can be set up, some elementary operations that can be
performed, and how to display results.
Setting Up Matrices
Matrices can be explicitly introduced with assignment statements of the
form
{ r w 2 ; ; { ruwm} 1;
where the n entries for each row are separated by one or more blanks. It is also
legal to make input "look like" output when setting up matrices.
Simple Output
If a statement ends with a semicolon, then the result of the statement is
not displayed. Thus, if you enter
then
displays A.
4.1 BASICS
1 191
and then subsequently overwrite this matrix with the 2-by-2 identity:
The system "knows enough" to recognize that B has changed from a 2-by-3
matrix to a 2-by-2 matrix.
There are other important aspects associated with MATLAB's automatic
dimensioning capability. If C is uninitialized then the command
In other words, MATLAB makes C "just large enough" so that the assignment
makes sense. If you now enter
then
Subscripts
Subscripts in MATLAB must be positive. Fractional subscripts are
"floored:
4.1 BASICS
1 193
Continuat ion
Sometimes it is not possible to put an entire MATLAB instruction on a
single line. When this is the case, two adjacent periods can be used to signal
continuation. Thus
~ = [ 1 2 3 ; 4 5 6 ;
7 8 9 I;
..
is the same as
Variable Names
Variable names in MATLAB must consist of nineteen or fewer characters.
The name must begin with a letter. Any letter, number or underscore may
follow, e.g., sym-p a r t 1.Upper and lower cases are distinguished when a
MATLAB session begins. However, the case sensitivity status can be toggled
by issuing the command casesen. Thus if upper and lower cases are currently
distinguished then
casesen
A = [ 1 2 ; 3 41;
a(2,2) = 1
produces
where A is a matrix, b is a vector, and the ci are scalars. Here are two
equivalent ways to compute f :
f
A* (A* (c(3)*Akb
+ c(2)*b
+ c(l)*b)
4.1 BASICS
1 195
Although these two statements are equivalent, the first turns out to be much
more efficient than the second. See the discussion of flops later on in this
section.
Transpose
The conjugate transpose of a matrix can be obtained using a single quote:
In general, if a unit increment is required then the assignment takes the form
= {
start
) :{
stop)
while
v = { start ] : { increment ) : { stop )
should be used for increments other than unity. The starting value, increment,
and terminating value may be determined by expressions and they need not be
integers.
Remember that the colon method generates row vectors.
l o g s p a c e (dl,d2, n )
then
A =
e y e (m,n )
z e r o s (m,n)
A = o n e s (m, n )
4.1 BASICS
1 197
If dimension can be inferred from context, then ones can be used without any
arguments. For example,
A
= A
ones
assigns the m-by-n matrix of ones to A . (B must not be a 1-by-1 matrix for
this to work.) Similar comments apply to
A = eye (B)
and
A = zeros (B)
Toeplitz matrices can be generated using toeplitz. if r and c are nvectors and
A
= t o e p l i t z ( c , r)
Random Matrices
It is possible to generate matrices and vectors of random numbers:
A
A =
A =
rand (m,n)
rand(n)
rand(B)
*A
There are two probability distributions from which random numbers can be
selected: the uniform distribution on [0,1] and the normal distribution with mean
0 and variance 1 .The commands
rand ( 'uniform')
and
rand ( ' normal '
*
*
n = current seed
current seed = n
4.1 BASICS
1 199
Complex Matrices
If A and B are real matrices with the same dimension and i2 = - 1, then
the complex matrix C = A +iB can be generated as follows:
C = A
+ sqrt ( - 1 ) * B
sqrt (-1);
Pointwise Operations
Element operations between two matrices of the same size are also
possible:
Of course, the above statements make sense only if the corresponding point
operations make sense.
Another example of the "dot" operator is
= ajp
More on InputIOutput
The format for printed output can be controlled using the following
commands:
format
format
format
format
format
short
short e
long
long e
hex
3
3
a
==3
Machine Precision
At the start of a MATLAB session, the variable eps contains the
effective machine precision, i.e., the srnallest floating point number such that if
eps
then x > 1. It is handy to use eps for zero checking in the presence of
roundoff. It is legal to alter the value of eps but it is not recommended.
Flops
The variable flops maintains the current number of flops that have been
performed since the MATLAB session began. The command
f l o p s (0)
resets this counter to zero. For real operands, each arithmetic operation counts as
a single flop. For complex operands, addition and subtraction count as two flops
while multiplications and divisions each count as six flops.
To illustrate the flop notion, suppose A (10-by-lo), b (10-by-l), and c
(1-by-1) are initialized and real. The following table lists the number of flops
required for certain basic computations:
L
Operation
c*A
A*b
A*A
b*bl
A*b*bl
b*b' *A
b* (b'*A)
Flops
100
200
2000
200
400
2200
400
L
have been executed. How many flops are required to execute each of the
following statements:
4. Suppose n is a positive integer and low and high are real numbers that
satisfy low < high. Write a one-liner that assigns to A an n-by-n matrix
whose entries are chosen from the uniform distribution on the interval [ low ,
high] ,
5. Assume that n (integer), w (complex), and x (n-by-1) are given. For each
of the following examples write a one-liner that generates the n-by-n matrix A
= (00) :
For-Loops
In MATLAB, for-loops have the form
end
For example,
f o r i = 1:3
x ( i ) = i;
end
is equivalent to
for i
1:n-1
B(i,i) = d(i);
B
+
) = e(i);
end
B (n,n) = d ( n )
=
amount of output would result if the semicolons in the loop body were deleted,
Negative increments are also possible in a for-loop. The segment
F = c ( d l *A;
for k = d - 1 : - l : l
F = A*F + c ( k ) * A ;
end
is equivalent to
Rela ti ons
A relation has the form
( matrix expression ) ( relational operator ) ( matrix enpression )
--.-=
c
<=
>
>=
equals
not equals
less than
less than or equal to
greater than
greater than or qua1 to
The value of a relation is a 0-1 matrix that has the same size as the matrix
expressions in the relation. The 1's appear everywhere the relation holds. Thus
We again stress that there is only one variable type in MATLAB: complex
matrices. The matrix T is not a Boolean array. It is a complex matrix that
happens to be storing 0's and 1's.
then
prompts
while
sets
and
renders
1 207
It is possible, but not good practice, to apply logical operations to matrices that
are not 0-1.
"ifu Constructions
The simplest if statement has the form
if ( relation )
( statements )
end
Thus,
if print -flag > 0
A
end
else
A(i,j)
end
end
end
= 1;
If the tested relation compares scalars, as in the above examples, then the
if "reads" in the familiar way; but a relation can also compare matrices. In this
case the tested relation is regarded as hue if and only if it is true in each element.
For example, if A and B are real matrices then
i f A > B
C = A;
end
returns "1" if any vi > 1 and it returns "0" otherwise. (Recall that v > 1 is a
0-1 vector.)
If the function all is applied to a vector then it returns a " 1" if all the
entries are nonzero. Otherwise it returns a "0." Thus,
a l l ( v > ones
any (A)
all (A)
w = [ a n y ( a l ),..., any&) I
w = [ all( al ) ,..., all( a,)]
While-Loops
While-loops are also possible and have the general form:
while ( relation )
( statements )
end
Here is a while-loop that is equivalent to the statement
&
==
z = fib(j-1) + fib(j-2);
i f z >= 1 0 0 0
break
end
fib(j) = z
end
terminated.
The use of break should be minimized in the interests of readability. It is
usually possible to avoid break merely by using a while-loop instead of a forloop. Here is a rewrite of the above Fibonnaci computation:
fib(1) = 1
fib(2) = 1
j = 2;
z = fib(1) + fib(2)
w h i l e z < 1000
j = j+l;
fib(j) = z
z = f i b ( j ) + fib(j-1)
end
All A12
A21 A22
A31 A32 f
The general command for setting up a block matrix has the form
A
= [
( blockrow I );
...; ( b l o c k r o w p ) ]
The block rows can be specified by expressions but they each must have the
same number of columns. For example, if A (n-by-n) and b (n-by-I) are
given, then
When setting up block matrices it is crucial that no blanks appear within the
1 213
would result in an error because of the spaces in the (1,2) block expression.
As another example, here is a loop that sets up a random block diagonal
where each Ai is p-by-p.
matrix A = diag(A1,...,Ak),
A = rand(p,p)
for i = 2:k
[ m,m] = size (A) ;
A = [ A z e r o s ( m , p ) ; z e r o s ( p , m ) r a n d ( p ,P ) 1
end
B = [I;
f o r j = n:-l:l
B = [ BA(:,j)
I;
end
Designating Submatrices
If A is m-by-n and the integers i, j, p, and q satisfy
l l i l j 5 m
l l p S q l n
then
[a;
B = A( i: j, :)
B =
ajz
T]
.. ai,
1 215
It is often handy to use variables to hold the vector of indices that define a
submatrix. For example, the above 3-by-2 submatrix can be generated as
follows:
rows = [ 1 3 5 1 ;
c o l s = [ 2 4 1;
B = A(rows, c o l s ) ;
Assignment to a Submatrix
It is also possible to make an assignment to a designated submatrix. For
example, if A (m-by-n), v (n-by-1), and w (m-by-1) are given and the
integersp and q satisfy 1 I p I m and 1 I q In then
Kronecker Product
The Kronecker product of two matrices A and B can be calculated
using the built-in function kron:
This is equivalent to
[I;
[m,n] = s i z e ( A ) ;
f o r j = 1:n
ccol = [ I
f o r i = 1:m
ccol = [
ccol
A(i,j)*B 1;
end
C = [ C ccol 1 ;
end
Thus, if
C = k r o n (
then
[ 1 2 ] ,
[ 1 2 ;
3 4 1 )
1 217
is equivalent to
x = [I
for j = 1:n
x = [ x ; A
end
: 1
is the same as
for j = 1:n
A(:,j) = v( (j-l)*m+l:j*m 1
end
with
4.4
BUILT-IN FUNCTIONS
=3
real (A)
=3
B = imag (A)
=3
B = c o n j (A)
B = (lagI )
B = (real(aij) )
B = ( i m a g ( a ~ ))
B = real(A) - i irnag(A), i2 = - 1
Thus,
A = c o n j (A) '
Norms
Various norms are easily computed in MATLAB. If A (m-by-n) is
given and real then
norm(A)
=3
= 1 1 A llz
n o r m (A, 1)
= 1 1 A 11,
=3
= IIAIl,
t = norm(A, 'FRO')
1 1 A [IF
1 221
=3
= max Vi
Here, v is either a row or column vector. The index of the largest entry can
also be computed:
max (v)
w = m a x ( a b s ( v ))
If A is m-by-n and m > 1 (to rule out the row vector case) then
v = max (A)
=3
and
is equivalent to
for j
1:n
max(A(:,j))
end
rnax (min ( A ) )
If A is m-by-n and m > 1 , then sum and prod return row vectors. In
particular
is equivalent to
for j
V(J)
1:n
= sum(A(:,j))
end
while
is synonymous with
for j = 1:n
v(j) = prod(A(:, j ) )
end
1 223
s o r t (x)
v = Px
vl
I v2 I * * * Iv,
. The statement
[ v , i ] = s o r t (x)
s o r t (x)
v@=x(i@), j =1:n.
B = [ sort(A(:,I)) ,,..,sort(A(:,n)) ]
B = s o r t (A)
while
[ B , i ] = s o r t (A)
, j = 1:n
If sort is applied to a complex argument then it sorts on the absolute value of the
argument.
The find function can be used to locate nonzero entries. If v is a vector
then
find(v)
v ( i ) = v ( i ) -t 1;
end
=
224
1 MATLAB
fix (x)
floor (x)
ceil (x)
a
3
round
round
round
round
x to nearest integer
x to zero
x to
-..
x to -t-
B = t r i l (A)
B = t r i u (A)
a b j = ay
a bii = ag
1 225
tril (A,k)
B = triu(A,k)
b ~ = a i j ifi+k >j,
zero otherwise
ifi+k5j,
zero otherwise
a bij=av
Thus,
H = t r i u ( A , -1)
H = tril(A,l)
Likewise,
T = t r i l ( t r i u ( A , - l ) ,1)
diag(A)
while
diag(x)
More generally,
D = d i a g ( x , k)
toeplitz (r,c)
is equivalent to
T = zeros (n)
for k = 0:n-1
j = n-k
i f k > O
T = T + diag(r(k)*ones(j,l),k)
end
T = T -t diag(c(k)*ones(j,l),-k)
end
If A is m-by-n, then
eps 0, 2 or+,>
2 aq 2 0
q = min{m,n)
(Recall that eps is the unit roundoff.) It is also possible to base the rank
decision on an arbitrary tolerance. If to1 is a nonnegative scalar then
r
defines r by
rank(A, tol)
1 227
Condition
The 2-norm condition o / omin
of a matrix A can be computed as
follows:
kappa
cond(A)
rcond(A)
Determinant
The determinant of a square matrix A can be computed as follows:
d
d e t (A)
Elementary Functions
Elementary functions supported by MATLAB include
trigonometric :
inverse trigonometric:
exponential:
hyperbolic:
sin(.)
asin(.)
exp(.
sinh(.)
cos(.)
aces(.)
log(.)
cash(.)
tan(.)
atan(.)
log10 ( . )
tanh(.)
Functions of Matrices
MATLAB has provisions for computing several important matrix
functions. If A is a square matrix then
More generally, if
F
p l y (A)
The function poly can also accept vector arguments. If r is an n-vector then
poly(r)
H
cn+l+ cnX + * * r
Thus,
then
r
r o o t s (c)
W
(h - rl)
--
(h - rJ = cn+l+ cnh +
- m e +
czhn-' + clhn
Thus,
Fourier Transforms
If x is an n-vector and n is a power of two then
y = fft(x)
= Fourier transform of x
y = ifft(x)
=3
Before these functions are applied, the vector x is padded with zeros (if
necessary) so that its dimension is a power of two,
The functions dft and idft correspond to fft and ifft. However, they do
not pad x with zeros if n is not a power of two and they do not invoke the fast
Fourier transform.
Two-dimensional Fourier transforms are also possible:
B
= f ft
2 (A)
B = i f f t2 (A)
=3
B = 2D Fourier transform of A
- A
B
O
O
- 0
B
A
B
O
0
0 0 O 7
B
0 0
A B 0
B A B
O B A -
1 231
4.5
FUNCTIONS
An Example
A simple example best motivates the ideas behind user-defhed functions
in MATLAB. Here is a function that returns a unit 2-norm vector in the
direction of Az, where A (n-by-n) and z (n-by-1) are given.
Example 4.5-1
function
prodAx (A, x )
%
%
T h i s f u n c t i o n r e t u r n s a u n i t 2-norm v e c t o r i n t h e
a n d x i s n-by-1.
A*x;
norm(y);
i f c -= 0
y = y/c;
else
y
[ 1 ; z e r o s ( l e n g t h ( x ) -1,l) ] ;
4.5 FUNCTIONS
1 233
performs 10 steps of the power method, overwriting v with a unit vcc tor in the
direction of Alov.
The example highlights a number of key points associated with the use of
functions.
The first line in the function definition is of the form
help prodAx
Our indentation of "noncomments" is optional and is done for
the sake of readability. Formally, the "%" symbol "says" to ignore
the rest of the line. Thus, we could write
y = y/c ;
% scale y
')
=
=
k r y (A, v, p )
Av
AA2 v
, . . . , An (p-1)
v 1 DI
Computes K
w h e r e A i s n-by-n,
c h o s e n s o t h a t t h e columns o f K h a v e u n i t 2-norm.
[ v
v i s n-by-1,
and D i s d i a g o n a l
%
[ m,n
nv
if
] = s i z e (A);
l e n g t h (v)
n I n-=
-=
nv
else
K = v/norm(v);
for j
K
=
=
2:p-1
[ K
prodAx(A,K(:,j-1))
I;
end
end
return
This example illustrates the return statement. It essentially has the same role to
play in MATLAB as it does in Fortran. A MATLAB function can have more
than one return or it need not have any.
4.5 FUNCTIONS
1 235
Example 4.5-3
f u n c t i o n [ P N]
= pn(A)
%
% C o m p u t e s n o n n e g a t i v e matrices P a n d N ,
% A ( i , j)
i m p l i e s P (i, j ) = A ( i , j )
> EPS
A ( i , j ) < EPS
i m p l i e s P (i,j )
1 A
j)
N(ir j)
(i,j )
0, N
I <= E P S i m p l i e s P ( i , j )
where
N(i,j)
= - A ( i , j)
= 0
E
P
EPS*ones (A);
A.*( A
N = -A.*(-A>
> E
);
E )
end
Statement
pn ( A )
[ P I N 1 = pn(A)
P = pn(A, t o l )
[ P I N ] = pn (A, t o l l
Tolerance
Output
ePS
ePs
to1
to1
Just P
PandN
Just P
P and N
h
Example 4.5-4
f u n c t i o n [P N ]
pn(A,tol)
%
% Computes n o n n e g a t i v e m a t r i c e s P and N ,
where
% A(i,j)
> EPS
i m p l i e s P ( i , j ) = A ( i , j)
% A(i,j)
< EPS
i m p l i e s P ( i , j ) = 0, N ( i , j)
% 1 A
= N(i,
N ( 1 , j)
j)
-A(i,j).
= 0
I f one i n p u t a r g u m e n t , t o 1
I f one o u t p u t argument, P o n l y i s r e t u r n e d .
EPS.
i f nargin
to1
==
EPS;
end
E
tol*ones(A);
P = A,*( A
> E
i f nargout
==
);
> E
N = -A.*(-A
end
a f u n (A, f )
Tf A i s a n m-by-n
a built-in
m a t r i x a n d f i s t h e name o f e i t h e r
on s c a l a r s , t h e n t h i s f u n c t i o n r e t u r n s a n m-by-n
m a t r i x F w i t h F ( i , j)
= A
be i n quotes, e . g . ,
to
cos(A(i,j))
for j
1:m
afun(A,'cosl)
sets F ( i , j )
f o r i = 1:n
F ( i , j ) = eval([ f
'(A(i,j))'
);
end
end
'cos'
assigns the three-character string 'cos' to the string sl. Strings can be concatenated. Thus,
s1 = ' c o s '
s2 = ' ( A ( i , j ) ) '
s = [ s l s21
is equivalent to
The built-in function eval takes a string and executes it, assuming that the
string is a meaningful MATLAB statement. Thus,
is equivalent to
means: concatenate the contents of f with ' (A (i,j ) ) ' and execute. If f
has the value ' cos ' we see that
is equivalent to
Global Variables
The MATLAB version of the Fortran common construct is global. The
values in all global variables can be accessed whenever inside a function.
Suppose x and y are initialized vectors of equal dimension and we execute
global y
a = f (x)
where
function a
a = yt*z
f (2)
This is equivalent to
Recursion
A MATLAB function can refer to itself. We illustrate this with a
somewhat contrived example that is concerned with L1-norm computation.
4.5 FUNCTIONS
1 239
Example 4.5-6
f u n c t i o n normLl
f (x)
R e t u r n s t h e 1-norm o f a n n - v e c t o r x
l e n g t h (x);
i f n == 1
normLl
abs (x);
else
normLl = f ( x(1:n-1)
+ abs(x(n));
end
Assume that all the blocks are square and of the same dimension. Strassen
showed how to compute C with seven multiplies and eighteen adds :
S t r a s s e n m a t r i x m u l t i p l i c a t i o n C = AB, A,B s q u a r e .
I f n <= nmin,
m u l t i p l y done c o n v e n t i o n a l l y .
[ n n
size(A) ;
n <= nmin
if
C = A*B;
% n
small
get C c o n v e n t i o n a l l y
else
m
n/2;
1:m
P1
s t r a s s ( A ( u , u ) + A ( v , v ) ,B ( u , u ) +B ( v , v )
P2
strass ( A ( v , u )
);
P6 = s t r a s s (
P7
A(u,v)
P4
strass (
P5
strass(
- strass(
C = [ Pl+P4-P5+P7
A(v,v)
, nmin
, nmin ) ;
A ( u , u ) , B ( u , v ) - B ( v , v ) , nmin ) ;
A ( v , v ) , B ( v , u ) - B ( u , u ) , nmin ) ;
A ( u , u ) + A ( u , v ) , B ( v , v ) , nmin ) ;
A ( v , u ) - A ( u , u ) I B ( ~ I ~ ) + B (,~ nmin
I ~ )
P3 = strass(
end
; v = m+l:n;
B(u,u)
A ( v , v ) , B ( v , u ) + B ( v , v ) ,nmin );
P3+P5 ; P2+P4
Pl+P3-P2+P6
I;
);
4.5 FUNCTIONS
1 241
that computes the product A = BCD, where B,C, and D are matrices. The
order of multiplication should be determined to minimize the number of required
flops. An error message should be printed if either (BC) or (CD) is not defined
because of dimension conflicts.
2. Write a MATLAB function
where A and B are square matrices of the same size and s is a string that
prescribes the product of A's and B's that is to be performed. For example
C
would be equivalent to
Assume that the names of the two input matrices are one character long.
3. Write a MATLAB s e p e n t that generates twenty 2-by-2 random matrices and
storesthemin A O O , A 0 1 , ..., A 1 9 . Hint: S e t s = ' 0 1 2 3 4 5 6 7 8 9 ' then
generate matrices using eval and concatenation.
follows:
, m ] = sylv(X,A,B) that
performs as
[L,ml
L
= sylv(X,A)
= S Y ~ V
(x,A)
J
J
L=AX+XAT, r n = I I ~ , 1 1 ~
L=AX+XA~
4.6 FACTORIZATION
One of the most powerful aspects of MATLAB is the ease with which
various matrix factorizations can be computed. In this section we survey the
high-level commands that rnake these computations so easy.
i n v (A)
i n v ( A ) *B
The LU Factorization
The factorization A = LU, where U is upper triangular and L is a
row -permuted unit lower triangular matrix, can be computed as follows:
Thus,
chol (A)
computes an upper triangular R such that A = RHR. Only the upper triangular
portion of A is accessed.
There is no built-in function for computing the factorization A = LDLFI,
where L is unit lower triangular and D is diagonal. However, the segment
L = chol (A)';
d = diag(L);
L = Lf
for k = 1:n
L(:rk) =L(:,k)/d(k);
end
D = diag( d.*d);
QR Factorization
If A is an m-by-n matrix then
4.6 FACTORIZATION
1 245
Here, R
is r - b y - r and R l 2 is r-by-(n-r), where r = rank(A).
Moreover, for i = l:min(m,n) we have
qr(A) ;
[ m,n ] = s i z e ( A ) ;
y = R(1:m , 1:m) \ Q'*b ;
x = P * [ y ; z e r o s (n-m,1) ]
=
4.6 FACTORIZATION
1 247
An orthonomd basis for the null space of a matrix can be obtained using
null. If
Q
n u l l (A)
This is the singular value decomposition and the diagonal entries of S are
arranged so that
If m 2 n then the "compact" SVD in which U is m-by-n and S is n-byn can be detemined using a two-argument version of svd:
This is equivalent to
[ U,
Sf V ] = svd(A);
[ rn, n I = s i z e ( A ) ;
i f rn >= n
U = U ( : , 1:n);
S = diag(diag(~));
end
svd ( A ) ;
is a shortcut for
pinv ( A )
pinv ( A ) *b ;
The backslash operator can also be used to solve least squares problems.
However,
4.6 FACTORIZATION
1 249
computes the basic solution via QR with column pivoting. Multiple right-hand
side least squares problems can be solved with either of the commands
hess ( A ) ;
hess ( A )
In practice, one would want to exploit the fact that the matrix Ql is a
Householder matrix.
schur (A);
computes the Schur form of the n-by-n matrix A. There are two cases to
distinguish. If A is real then Q is orthogonal and T = QTAQ is upper quasitriangular, This is the real Schw form of A. If the imaginary part of A is
nonzero, then Q is unitary and T = QfIAQ is upper triangular.
The Schur decomposition of a real symmetric matrix A has the form
QTAQ = diag(dl,. ..,d,)
where the eigenvalues can be ordered as follows:
D 1 = schur (A);
[ Dl
indx I = sort ( d i a g ( ~ ) ) ;
indx = indx( n:-1:l);
Q = Q( : , indx ) ;
D = ~(indx~indx);
[Q
The function rsf2csf can be used to compute the complex Schur form
from the real Schur form. (Recall that a real matrix can have complex
eigenvalues in which case a complex unitary Q must be used to upper
triangularize A.) In particular, if A is square and
[ Q
[ Q
,
,
T ] = schur (A);
T ] = rsf2csf(QIT);
4.6 FACTORIZATION
I 251
s (i)
abs
( x i * y i ) / ( n o r m ( x i )* n o r m ( y i )) ;
end
r h o = max ( a b s ( e i g ( A ) ) ) ;
computes the spectral radius of the matrix A, i.e., the magnitude of the largest
eigenvalue in absolute value. Likewise,
alfa
max ( r e a l ( e i g ( A ) )
computes the spectral abscissa, the maximum real part of any eigenvalue.
Generalized Eigenvalues
A two-argument version of eig can be used to solve the generalized
eigenvalue problem
,..., d,)
computes unitary Q and Z such that Q1i A Z = R and QHBZ = S are upper
triangular. The matrix X = [ xl ,..., x, ] is the matrix of generalized
eigenvectors and sarisfics
qz(A,B);
d i a g ( d i a g(R) / d i a g (S) ) ;
[ R, S r Q r Z r X
4.6 FACTORIZATION
1 253
eig ( A , B) .
2. Suppose
is positive definite and that R is an upper triangular matrix such that A = RTR,
Write an efficient MATLAB segment that overwrites R with C's Cholesky
factor.
where Q is unitary and h has the largest absolute value amongst A's
eigenvalues.
6. Write a MATLAB function
k= 1
k =2
k=3
k =4
*
*
*
*
A = QR,
A = QR,
A = RQ,
A = RQ,
Q
Q
Q
Q
unitary,
unitary,
anitary,
unitary,
unitary
T upper triangular
*
*
7. Note that if
upper triangular
lower triangular
upper triangular
lower triangular
EQE unitary
ETE lower triangular
R
R
R
R
P = s q r t ( a - w r w ) . Use
R = cholrecur (A)
this to develop a
4.7
MISCELLANEOUS
IEEE Arithmetic
The result of a zero divide is a special value called "inf." Thus,
ttx
IIp Ily Ib
for p = 1:lO:
f o r p = 1:10
q = p / ( p -1);
a l f a = norm(x,p)
end
w = norm(x, q )
as required.
norm(x,q)
- a b s ( x l*y)
w = norm ( x , i n )
4.7 MISCELLANEOUS
1 257
=3
=3
= y*z
a = x / y
a = [ x ; y ]
=3
=3
a=i$
a=O
a=Nd
a=Nd
a(1) = inf, a(2) = NaN
The function clock returns Lhe currcnt time in a row vector having dimension
six. (It specifies the year, month, day, hour, minute, and second.) The
"difference" between the two "time vectors" t 1and t 2 is computed by etime,
which returns the elapsed time in seconds. Such computations are fairly accurate
at the millisecond level,
Timed Displays
A delay can be built into a MATLAB segment with the pause statement.
Here is an example that displays the n-by-n magic square matrix on the screen
for approximately n seconds for n = 1:15 .
for n
1:15
A = magic(n)
pause ( n )
end
=
1:15
A = magic ( n )
d i s p ( ' S t r i k e any key t o see n e x t . .
magic s q u a r e ' )
pause
end
=
Getting Input
It is possible to prompt the user for an input value during execution of a
MATLAB segment. For example,
n = l
w h i l e ( n >= 1)
A = magic ( n )
n = i n p u t ( ' E n t e r dimension1)
end
4.7 MISCELLANEOUS
1 259
bench(n,f)
that returns the time it takes to compute f (rand(n)) . Here f is a string that
names any MATLAB function that can operate on a square matrix. Sample:
time
bench(l0, 'svd').
final program should not be tailored to this particular example. How you
compute Aqx depends on n and q, and you are expected to work out a
heuristic that "picks" the appropriate method as a function of these two
parameters.
References
[l] J. DONGARRA,
J.R. BUNCH,C.B. MOLER,AND G.W. STEWART
(1978),
LINPACK User3 Guide, Society for Industrial and Applied Mathematics,
Philadelphia, PA.
[2] J. DONGARRA,
J. DUCROZ,S. HAMMARLING,
AND R. J. HANSON
(1984),
An extended set of Fortran Basic Linear Algebra Subprogram, Report
TM41, MCS Division, Argonne National Laboratory, Argonne, IL; ACM
Trans. Math. Software, to appear.
[3] J. ~ G A R R AJ., DU CROZ,1.S. DUFF,AND S. &~MMARLING(1987), A
proposal for a set of level 3 Basic Linear Algebra Subprogram, Report TM
212, MCS Division, Argonne National Laboratory, Argonne, IL.
[4] G.H. GOLUBAND C. VAN AN (1983), Matrix Computations, The Johns
Index
.or. 19
overflow 98-99
packed storage 39-40
parameter 15-16
precedence 9
Propun
organization 3
overflow 8
read 5,69
formatted 72-75
l i s t d k ~ t e d69-71
readability 10
real 6
relational operators 18
return 45
rounding errors 8
testing for 94
save 53-54
sequence numbers 4
small corrections 96-97
stride 63-64
submatrix arguments 62
subprograms 42-68
built-in functions 42-43
functions 4347
functions versus subroutines 5 1-53
style 54-55
subroutines 49-5 1
string manipulation 10-12
style and
if constructs 23
loops 30
subpropuns 54
termination criteria 95-96
tYPe
conversion 12
list of 6
until-loops 26
whileloops 25
write 5,71
formatted 75-76
list-directed 7 1-72
BLAS
complex 132
double precision 132-135
Givens rotation
application 126-127, 129
construction 125
reconstruction 126
zerolng with 128
Hessenberg QR 130
Householder vector 123
matrix
circulant 110
copy 108
Krylov 117
skew part 115
symmetric part 115
transpose 109
matrix norm
Frobenius 121
one-norm 122
matrix-vector multiplication 116
least squares
univariate 114
srot 126
srotg 125
snrm2 119
sscal 112
s m p 107
vector norm
infinity 120
one-norm 119
two-norm 119
vector operations
copy 107
dot product 113
scaling 112
swapping 107
LINPACK
band matrix
Cholesky 162
Gaussian elimination 161
tridiagonal 163
bandwidth 161
complex versions 180-185
condition estimation
142143
sppco 155
sppsl 155
sptsl 163
sqdc 166
q r s l 167- 170
ssico 153
ssifa 154
ssisl 154
sspco 155
sspsl 155
S S V ~ C 175- 176
strco 142
strdi 143-144
strsl 140
MA'TLAB
abs 219
all 208-209
and 206
any 208-209
block matrices 212213
break 210
ceil 224
chol 244
colon notation 195- 196,
213-215
cond 227
conj 219
continuation 193
det 227
diag 225-226
dimensioning 1 W19 1
disp 233
eig 251
elementary functions 227
eps 200-201
eye 196-197
find 223
fix 224
floor 224
flops 201
for-loops 203-205
Fourier transform 229
functions 232-240
functions of matrices 228
generalized eigenvalues 252
global 238
hdp 233
hess 249
pinv 248-249
p01y 228-229
prod 222
kron 216-217
rand 198
226
resl 219
relations 205-206
roots 228-229
round 224
length 192
linear equation solving 243
logspace 1%
lu 243-244
matrices
block 212-213
complex 199
empty 213
random 198
setting up 189
matrix operations
addition 194
exponentiation 194
Kronecker product 2 16-217
multiplication 194
pointwise 19P200
transpose 195
matrix-to-vector 2 17
max 221
min 221
names 193
norms 219
not 206
null 247
num2str 259
ones 19&197
or 206
orth 246
output 190
scalars 192
schur 250
size 192
sort 223
strings 259
submatrices 213-21 5
submatrix assignment 2 15-2 16
subscripts
sum 222
svd 247-248