Contents
Preface
Part I
1
1.1
1.2
1.3
1.5
1.4
Matrix Methods
General Considerations
Determinants
Gaussian Elimination
Matrix Inverse
Cramer's Rule
1.6
1.7
Linear Transformations
Note on Norms
2
2.1
2.2
2.3
2.4
Properties of Eigenvectors
Linear Systems of Equations
Quadratic Forms
Normal Matrices
Diagonalization
2.4.1
2.4.2
2.5
Systems of Ordinary Differential Equations
2.5.1
2.5.2
2.6
Decomposition of Matrices
2.6.1
2.6.2
Polar Decomposition
Singular-Value Decomposition
2.7
Functions of Matrices
3
3.1
3.2
Eigenfunction Expansions
3.2.1
3.2.2
3.3
Derivative Operators
Integral Theorems
Coordinate Transformations
Extrema of Functions
4.2.1
4.2.2
4.2.3
4.2.4
4.3
Laplace Equation
Unsteady Diffusion Equation
4.2
4
4.1
Definitions
Eigenfunctions of Differential Operators
3.4
Definitions
Linear Independence of Functions
Basis Functions
General Considerations
Constrained Extrema and Lagrange Multipliers
Linear Programming
Quadratic Programming
Part II
Numerical Methods
5
5.1
5.2
6
6.1
6.1.1
6.1.2
Operation Counts
Ill-Conditioning and Round-Off Errors
6.2
6.4
Singular-Value Decomposition
10
Numerical Integration
11
11.1
Finite-Difference Methods
General Considerations
6.2.1
6.2.2
6.2.3
6.2.4
6.2.5
6.2.6
6.2.7
6.2.8
6.3
LU Decomposition
Cholesky Decomposition
Partitioning
Iterative Convergence
Jacobi Method
Gauss-Seidel Method
Successive Over-Relaxation (SOR)
Conjugate-Gradient Method
Similarity Transformation
QR Method to Obtain Eigenvalues and Eigenvectors
Arnoldi Method
11.5
12
13
13.1
13.2
13.3
13.4
13.5
14
14.1
14.2
11.2
11.3
11.4
14.3
14.3.1 Jacobi Method
14.3.2 Gauss-Seidel Method
14.3.3 Successive Over-Relaxation (SOR)
14.4
Boundary Conditions
14.4.1 Dirichlet Boundary Conditions
14.4.2 Neumann Boundary Conditions
14.5
14.6
14.7
14.8
15
15.1
15.2
15.3
Implicit Methods
15.3.1 First-Order Implicit Method
15.3.2 Crank-Nicolson Method
15.4
15.5
Multidimensional Problems
15.5.1
15.5.2
15.5.3
15.5.4
16
17
17.1
17.1.1 Introduction
17.1.2 Sequential Method
17.2
17.3
Parallel Computing
Part III
17.4
Applications
18
19
19.1
19.2
19.3
19.4
19.5
19.6
19.7
20
20.1
20.2
20.3
20.4
20.5
Continuous Systems
Wave Equation
Electromagnetics
Schrödinger Equation
Stability of Continuous Systems Beam-Column Buckling
Numerical Solution of the Differential Eigenproblem
20.6
Hydrodynamic Stability
20.6.1
20.6.2
20.6.3
20.6.4
20.6.5
21
21.1
21.2
21.3
21.4
22
22.1
22.2
Appendix A
Preface
Matrix methods are interpreted loosely to include function spaces and eigenfunction expansions, vector calculus, linear systems, etc.
The focus is on topics common to numerous areas of science and engineering, not
subject-specific topics.
Part I is material typically found in an engineering analysis or linear algebra
text, Part II is material typically found in a numerical methods book, and Part III
is found in a variety of domain-specific texts on linear systems theory, dynamical
systems, image/signal processing, etc. Of course, all of this is treated within a
unified framework with matrix methods being the centerpiece.
Part I contains the mathematical foundations for the topics in the remainder
of the book. Mathematicians would refer to the material in Chapters 1 and 2 as
linear algebra, while scientists and engineers would more likely refer to it as matrix
analysis. Chapter 3 extends the approaches used in Chapters 1 and 2 for vectors
and matrices to functions and differential operators, which facilitates application
to ordinary and partial differential equations through the use of eigenvalues and
eigenfunctions.
The methods articulated in Part I are appropriate for small systems that can
be solved exactly by hand or using symbolic mathematical software. Despite this
obvious limitation, they are essential knowledge for all subsequent discussion of
numerical methods and applications. Moderate to large systems must be solved
approximately using digital computers, so we must revisit linear
systems of algebraic equations and the eigenproblem to see how the methods
in Part I can be adapted for large systems. This is known as computational, or
numerical, linear algebra and is covered in Chapter 6.
Part I
Matrix Methods
1
Vector and Matrix Algebra
I did not look for matrix theory. It somehow looked for me.
(Olga Taussky-Todd)
Although the typical undergraduate engineering student has not had a formal
course in linear algebra, which is the mathematics of vectors and matrices, they
have been exposed to such constructs as a convenient means of representing tensors and systems of linear algebraic and differential equations, for example. They
have likely observed several means of solving systems of equations, calculating
determinants, applying linear transformations, and determining eigenvalues and
eigenvectors. Typically, however, these topics have been presented to the student
in a disjointed fashion in the context of undergraduate engineering courses where
the need arose. The objective of this text is to unify these topics into a single
coherent subject and extend our skill with vectors and matrices to the level that
is required for graduate-level study and research in science and engineering.
Whereas vectors and matrices arise in a wide variety of applications and settings (see Section 1.3), the mathematics of these constructs is the same regardless
of where the vectors or matrices have their origin. We will focus here on the mathematics, but with little emphasis on formalism and proofs, and mention and/or
illustrate many of the applications to science and engineering.
1.1 Definitions
Let us begin by defining vectors and matrices and several different types of matrices.
Matrix: A matrix is an ordered arrangement of numbers, variables, or functions
comprised of a rectangular grouping of elements arranged in rows and columns
as follows:
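\[
\mathbf{A} = \begin{bmatrix}
A_{11} & A_{12} & \cdots & A_{1n} \\
A_{21} & A_{22} & \cdots & A_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
A_{m1} & A_{m2} & \cdots & A_{mn}
\end{bmatrix} = [A_{ij}] .
\]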
The size of the matrix is denoted by the number of rows, $m$, and the number of columns, $n$.
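\[
\mathbf{A}^T = \begin{bmatrix}
A_{11} & A_{21} & \cdots & A_{m1} \\
A_{12} & A_{22} & \cdots & A_{m2} \\
\vdots & \vdots & \ddots & \vdots \\
A_{1n} & A_{2n} & \cdots & A_{mn}
\end{bmatrix} = [A_{ji}] ,
\]
which results in an $n \times m$ matrix. If $\mathbf{A}^T = \mathbf{A}$, then $\mathbf{A}$ is said to be symmetric
($A_{ji} = A_{ij}$). Note that a matrix must be square to be symmetric.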
If the elements of $\mathbf{A}$ are complex and $\bar{\mathbf{A}}^T = \mathbf{A}$, then $\mathbf{A}$ is a Hermitian matrix
($\bar{A}_{ji} = A_{ij}$), where the overbar represents the complex conjugate, and $\bar{\mathbf{A}}^T$ is the
conjugate transpose of $\mathbf{A}$. Note that a real symmetric matrix is a special case of a
Hermitian matrix.
Zero Matrix (0): Matrix of all zeros.
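Identity Matrix (I): Square matrix with ones on the main diagonal and zeros
everywhere else, for example
\[
\mathbf{I}_5 = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1
\end{bmatrix} = [\delta_{ij}] ,
\]
where
\[
\delta_{ij} = \begin{cases} 1, & i = j \\ 0, & i \neq j \end{cases} .
\]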
Scalar Matrix (S): Diagonal matrix with equal diagonal elements such that
$\mathbf{S} = a\mathbf{I} = [a\,\delta_{ij}]$.
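Triangular Matrix: All elements above (lower triangular) or below (upper triangular) the main diagonal are zeros. For example,
\[
\mathbf{L} = \begin{bmatrix}
A_{11} & 0 & 0 & 0 & 0 \\
A_{21} & A_{22} & 0 & 0 & 0 \\
A_{31} & A_{32} & A_{33} & 0 & 0 \\
A_{41} & A_{42} & A_{43} & A_{44} & 0 \\
A_{51} & A_{52} & A_{53} & A_{54} & A_{55}
\end{bmatrix}, \qquad
\mathbf{U} = \begin{bmatrix}
A_{11} & A_{12} & A_{13} & A_{14} & A_{15} \\
0 & A_{22} & A_{23} & A_{24} & A_{25} \\
0 & 0 & A_{33} & A_{34} & A_{35} \\
0 & 0 & 0 & A_{44} & A_{45} \\
0 & 0 & 0 & 0 & A_{55}
\end{bmatrix} .
\]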
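Tridiagonal Matrix: All elements are zero except along the lower (first subdiagonal), main, and upper (first superdiagonal) diagonals as follows:
\[
\mathbf{A} = \begin{bmatrix}
A_{11} & A_{12} & 0 & 0 & 0 \\
A_{21} & A_{22} & A_{23} & 0 & 0 \\
0 & A_{32} & A_{33} & A_{34} & 0 \\
0 & 0 & A_{43} & A_{44} & A_{45} \\
0 & 0 & 0 & A_{54} & A_{55}
\end{bmatrix} .
\]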
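Hessenberg Matrix: All elements are zero below the first subdiagonal (upper Hessenberg), for example
\[
\mathbf{A} = \begin{bmatrix}
A_{11} & A_{12} & A_{13} & A_{14} & A_{15} \\
A_{21} & A_{22} & A_{23} & A_{24} & A_{25} \\
0 & A_{32} & A_{33} & A_{34} & A_{35} \\
0 & 0 & A_{43} & A_{44} & A_{45} \\
0 & 0 & 0 & A_{54} & A_{55}
\end{bmatrix} .
\]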
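Toeplitz Matrix: Each diagonal is constant, for example
\[
\mathbf{A} = \begin{bmatrix}
A_{11} & A_{12} & A_{13} & A_{14} & A_{15} \\
A_{21} & A_{11} & A_{12} & A_{13} & A_{14} \\
A_{31} & A_{21} & A_{11} & A_{12} & A_{13} \\
A_{41} & A_{31} & A_{21} & A_{11} & A_{12} \\
A_{51} & A_{41} & A_{31} & A_{21} & A_{11}
\end{bmatrix} .
\]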
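\[
\mathbf{A} = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}, \qquad
\mathbf{A}^{-1} = \frac{1}{|\mathbf{A}|} \begin{bmatrix} A_{22} & -A_{12} \\ -A_{21} & A_{11} \end{bmatrix},
\]
where $|\mathbf{A}| = A_{11}A_{22} - A_{12}A_{21}$ is the determinant of $\mathbf{A}$ (see Section 1.4.2).

Table 1.1
Real                                                      Complex
Symmetric: $\mathbf{A} = \mathbf{A}^T$                    Hermitian: $\mathbf{A} = \bar{\mathbf{A}}^T$
Skew-Symmetric: $\mathbf{A} = -\mathbf{A}^T$              Skew-Hermitian: $\mathbf{A} = -\bar{\mathbf{A}}^T$
Orthogonal: $\mathbf{A}^{-1} = \mathbf{A}^T$              Unitary: $\mathbf{A}^{-1} = \bar{\mathbf{A}}^T$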
Orthogonal Matrix: An $n \times n$ matrix $\mathbf{A}$ is orthogonal if
\[ \mathbf{A}^T \mathbf{A} = \mathbf{I}. \]
It follows that
\[ \mathbf{A}^T = \mathbf{A}^{-1} \]
for an orthogonal matrix. Such a matrix is called orthogonal because its column
(and row) vectors are mutually orthogonal (see Section 1.5).
Remark: Generalizations of some of the above definitions to matrices with complex elements are given in Table 1.1.
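\[
\underbrace{\begin{bmatrix}
A_{11} & A_{12} & \cdots & A_{1r} \\
A_{21} & A_{22} & \cdots & A_{2r} \\
\vdots & \vdots & & \vdots \\
A_{i1} & A_{i2} & \cdots & A_{ir} \\
\vdots & \vdots & & \vdots \\
A_{m1} & A_{m2} & \cdots & A_{mr}
\end{bmatrix}}_{m \times r}
\underbrace{\begin{bmatrix}
B_{11} & \cdots & B_{1j} & \cdots & B_{1n} \\
B_{21} & \cdots & B_{2j} & \cdots & B_{2n} \\
\vdots & & \vdots & & \vdots \\
B_{r1} & \cdots & B_{rj} & \cdots & B_{rn}
\end{bmatrix}}_{r \times n}
=
\underbrace{\begin{bmatrix}
C_{11} & C_{12} & \cdots & C_{1n} \\
\vdots & \vdots & C_{ij} & \vdots \\
C_{m1} & C_{m2} & \cdots & C_{mn}
\end{bmatrix}}_{m \times n},
\]
where
\[
C_{ij} = A_{i1}B_{1j} + A_{i2}B_{2j} + \cdots + A_{ir}B_{rj} = \sum_{k=1}^{r} A_{ik}B_{kj} .
\]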
That is, $C_{ij}$ is the inner product of the $i$th row of $\mathbf{A}$ and the $j$th column of $\mathbf{B}$
(see Section 1.5.1 for a definition of the inner product).
Note that in general, $\mathbf{AB} \neq \mathbf{BA}$ (even if square), that is, premultiplying $\mathbf{B}$
by $\mathbf{A}$ is not the same as postmultiplying $\mathbf{B}$ by $\mathbf{A}$.
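The element formula above translates directly into code. The following is a minimal Python sketch, not part of the original text; the function name `matmul` and the two small test matrices are illustrative assumptions. It forms each $C_{ij}$ as the inner product of a row of $\mathbf{A}$ with a column of $\mathbf{B}$ and shows a pair of square matrices for which the two orderings differ.

```python
def matmul(A, B):
    """Return C = AB, where C[i][j] is the inner product of row i of A and column j of B."""
    m, r, n = len(A), len(B), len(B[0])
    if any(len(row) != r for row in A):
        raise ValueError("the number of columns of A must equal the number of rows of B")
    return [[sum(A[i][k] * B[k][j] for k in range(r)) for j in range(n)]
            for i in range(m)]


# Hypothetical 2 x 2 matrices chosen only to illustrate that AB != BA in general.
A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
AB = matmul(A, B)  # postmultiplying A by B swaps the columns of A
BA = matmul(B, A)  # premultiplying A by B swaps the rows of A
```

Here `AB` and `BA` contain the same four numbers arranged differently, so the products are not equal even though both are defined.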
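Rules: For $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$, and $\mathbf{I}$ of appropriate size, and $n$ and $m$ integers:
\[
\begin{aligned}
&\mathbf{A}(\mathbf{BC}) = (\mathbf{AB})\mathbf{C} \quad \text{(but the order of $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$ cannot be changed)} \\
&\mathbf{A}(\mathbf{B} + \mathbf{C}) = \mathbf{AB} + \mathbf{AC} \\
&(\mathbf{B} + \mathbf{C})\mathbf{A} = \mathbf{BA} + \mathbf{CA} \\
&\mathbf{AI} = \mathbf{IA} = \mathbf{A} \\
&(\mathbf{A} + \mathbf{B})^T = \mathbf{A}^T + \mathbf{B}^T \\
&(\mathbf{AB})^T = \mathbf{B}^T \mathbf{A}^T \\
&(\mathbf{AB})^{-1} = \mathbf{B}^{-1} \mathbf{A}^{-1} \\
&\mathbf{A}^n = \mathbf{A}\mathbf{A} \cdots \mathbf{A} \quad \text{($n$ factors)} \\
&\mathbf{A}^{-n} = (\mathbf{A}^{-1})^n = \mathbf{A}^{-1}\mathbf{A}^{-1} \cdots \mathbf{A}^{-1} \quad \text{($n$ factors)} \\
&\mathbf{A}^n \mathbf{A}^m = \mathbf{A}^{n+m} \\
&(\mathbf{A}^n)^m = \mathbf{A}^{nm} \\
&\mathbf{A}^0 = \mathbf{I}
\end{aligned}
\]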
Note that scalar arithmetic is a special case of matrix arithmetic, that is, a scalar
is simply a $1 \times 1$ matrix.
We can combine vector addition and scalar multiplication to devise a very
common algebraic operation in matrix methods known as a linear combination
of vectors. Let us say that we have the $m$ vectors $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_m$, which are
$n$-dimensional. A linear combination of these vectors is given by
\[ \mathbf{u} = a_1 \mathbf{x}_1 + a_2 \mathbf{x}_2 + \cdots + a_m \mathbf{x}_m , \]
where $a_i$, $i = 1, \ldots, m$ are constants. Thus, we scale each vector by its coefficient
and sum the results to form the $n$-dimensional vector $\mathbf{u}$. In
Section 1.5, we will regard the vectors $\mathbf{x}_i$, $i = 1, \ldots, m$ as the basis of a vector space
comprised of all possible vectors $\mathbf{u}$ given by all possible values of $a_i$, $i = 1, \ldots, m$.
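As a brief illustration, a linear combination can be sketched in a few lines of Python. The function name `linear_combination` and the numerical values below are hypothetical and serve only to show the operation.

```python
def linear_combination(coeffs, vectors):
    """Return u = a_1 x_1 + ... + a_m x_m for n-dimensional vectors x_i."""
    n = len(vectors[0])
    if len(coeffs) != len(vectors) or any(len(x) != n for x in vectors):
        raise ValueError("need one coefficient per vector and vectors of equal length")
    return [sum(a * x[j] for a, x in zip(coeffs, vectors)) for j in range(n)]


# Three hypothetical 3-dimensional vectors and coefficients a = (2, -1, 3).
u = linear_combination([2, -1, 3], [[1, 0, 0], [0, 1, 0], [1, 1, 1]])
```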
1.3 Applications
There is nothing inherently physical about the way vectors and matrices are defined and manipulated. However, they provide a convenient means of representing
the mathematical models that apply in a vast array of applications that span all
areas of physics, engineering, computer science, and image processing, for example. Once we become comfortable with them, vectors and matrices simply become
a natural extension of the basic algebra that is so inherent to mathematics. Linear
algebra, then, is not so much a separate branch of applied mathematics as it is
an essential part of our mathematical repertoire.
Before getting into the particulars of manipulating vectors and matrices, it is
worthwhile to summarize and categorize the various classes of problems that are
represented using these methods and some of the applications that produce them.
The remainder of Chapters 1 and 2 will then focus on the methods and techniques
for treating each of these classes of problems. Applications will be interspersed
periodically in order to remind ourselves that we are developing the tools to
solve real problems, and also to expand our understanding of what the general
mathematical constructs represent in physical and numerical applications.
The following subsections provide a classification of the various types of matrix
problems that commonly arise in engineering and the sciences. Various examples
are drawn from these fields, and it is shown how the particular class of matrix
problem arises naturally from the physical or numerical situation under consideration.
1. The sum of the currents into each junction is equal to the sum of the currents
out of each junction.
2. The sum of the voltage drops around each closed circuit is zero.
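Applying the second of Kirchhoff's laws around each of the three parallel circuits gives the three coupled equations
\[
\begin{aligned}
\text{Loop 1:} \quad & 8 - 12 i_1 - 8(i_1 - i_3) - 10(i_1 - i_2) = 0, \\
\text{Loop 2:} \quad & 4 - 10(i_2 - i_1) - 6(i_2 - i_3) = 0, \\
\text{Loop 3:} \quad & 6 - 8(i_3 - i_1) - 6(i_3 - i_2) - 4 i_3 = 0.
\end{aligned}
\]
Collecting terms gives
\[
\begin{aligned}
30 i_1 - 10 i_2 - 8 i_3 &= 8, \\
-10 i_1 + 16 i_2 - 6 i_3 &= 4, \\
-8 i_1 - 6 i_2 + 18 i_3 &= 6,
\end{aligned}
\]
or in matrix form
\[
\begin{bmatrix} 30 & -10 & -8 \\ -10 & 16 & -6 \\ -8 & -6 & 18 \end{bmatrix}
\begin{bmatrix} i_1 \\ i_2 \\ i_3 \end{bmatrix} =
\begin{bmatrix} 8 \\ 4 \\ 6 \end{bmatrix} .
\]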
Solving this system of coupled, linear, algebraic equations would provide the
sought-after solution for the currents $\mathbf{x} = [i_1 \;\; i_2 \;\; i_3]^T$.
Note that if the circuit also includes capacitors and/or inductors, application
of Kirchhoffs laws would produce a system of ordinary differential equations (see
Section 1.3.4).
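The step from the loop equations to the matrix form can be checked mechanically. The sketch below is illustrative Python, not part of the original text. It assumes the third loop equation has the form $6 - 8(i_3 - i_1) - 6(i_3 - i_2) - 4 i_3 = 0$, which is inferred from the third row of the coefficient matrix. Because each loop equation is linear in the currents, evaluating it at zero currents gives the right-hand side, and evaluating it at unit currents gives the columns of the coefficient matrix.

```python
from fractions import Fraction


def loops(i1, i2, i3):
    """Left-hand sides of the three loop equations; each must vanish at the solution."""
    return [
        8 - 12 * i1 - 8 * (i1 - i3) - 10 * (i1 - i2),
        4 - 10 * (i2 - i1) - 6 * (i2 - i3),
        6 - 8 * (i3 - i1) - 6 * (i3 - i2) - 4 * i3,  # assumed form of Loop 3
    ]


# Each loop expression equals c - A x, so the value at x = 0 is c ...
c = loops(0, 0, 0)
# ... and c minus the value at the jth unit vector is the jth column of A.
unit_currents = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
columns = [[c[k] - loops(*e)[k] for k in range(3)] for e in unit_currents]
A = [[columns[j][i] for j in range(3)] for i in range(3)]

# Candidate solution obtained by hand (exact rational arithmetic avoids round-off).
x = [Fraction(507, 472), Fraction(661, 472), Fraction(603, 472)]
residuals = loops(*x)  # all three entries are zero if x solves the system
```

The assembled `A` and `c` reproduce the matrix form of the system, and the residuals confirm that the candidate currents satisfy all three loop equations simultaneously.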
Example 1.2 Recall that for a structure in static equilibrium, the sum of the
forces and the sum of the moments are both zero, which holds for each individual member as
well as the entire structure. Determine the forces in each member of the truss
structure shown in Figure 1.2 (see Jeffrey, p. 227).
Solution: Note that all members are of length $\ell$; therefore, the structure is comprised of equilateral triangles with all angles being $\pi/3$. We first obtain the reaction forces at the supports 1 and 2 by summing the forces and moments for the
entire structure as follows:
\[
\begin{aligned}
\sum F = 0 &: \quad R_1 + R_2 - 3 = 0, \\
\sum M_A = 0 &: \quad 2\ell R_2 - 3\ell = 0;
\end{aligned}
\]
therefore,
\[ R_1 = \frac{3}{2}, \qquad R_2 = \frac{3}{2} . \]
Writing the two force-equilibrium equations at each joint gives a system of the form
\[
\mathbf{A} \begin{bmatrix} F_1 \\ F_2 \\ F_3 \\ F_4 \\ F_5 \\ F_6 \\ F_7 \end{bmatrix} = \mathbf{c},
\]
where $\mathbf{A}$ is the $10 \times 7$ coefficient matrix and $\mathbf{c}$ is the right-hand-side vector.
Observe that there are ten equations (two for each joint), but only seven unknown
forces; therefore, three of the equations must be linear combinations of the other
equations in order to have a unique solution (see Section 1.4.3). Carefully examining the coefficient matrix and right-hand-side vector, one can see that the
coefficient matrix contains information that only relates to the structure's geometry, while the right-hand-side vector is determined by the external loads only.
Hence, different loading scenarios for the same structure can be considered by
simply adjusting $\mathbf{c}$ accordingly.
The previous two examples illustrate how matrices can be used to represent the
governing equations of a physical system directly. There are many such examples
in science and engineering. In the truss problem, for example, the equilibrium
equations are applied to each discrete element of the structure leading to a system of linear algebraic equations for the forces in each member that must be
solved simultaneously. In other words, the discretization follows directly from the
geometry of the problem. For continuous systems, such as solids, fluids, and heat
transfer problems involving continuous media, the governing equations are in the
form of ordinary or partial differential equations for which the discretization is
less obvious. This gets us into the important and expansive topic of numerical
methods. Although this is beyond the scope of our considerations at this time
(see Part II), let us briefly motivate the overall approach using a simple one-dimensional example.
Example 1.3 Consider the fluid flow between two infinite parallel flat plates
with the upper surface moving with constant speed $U$ and an applied pressure
gradient in the $x$-direction; this is known as Couette flow. This is shown schematically in Figure 1.3.
Solution: The one-dimensional, fully developed flow is governed by the ordinary
differential equation (derived from the Navier-Stokes equations enforcing conservation of momentum)
\[
\frac{d^2 u}{dy^2} = \frac{1}{\mu} \frac{dp}{dx} = \text{const.}, \tag{1.1}
\]
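where $u(y)$ is the fluid velocity in the $x$-direction, which we seek, $p(x)$ is the
specified linear pressure distribution in the $x$-direction (such that $dp/dx$ is a
constant), and $\mu$ is the fluid viscosity. The no-slip boundary conditions at the
lower and upper surfaces are
\[ u = 0 \quad \text{at} \quad y = 0, \tag{1.2} \]
\[ u = U \quad \text{at} \quad y = H. \tag{1.3} \]
Dividing the domain $0 \le y \le H$ into $I$ equal subintervals of length $\Delta y = H/I$, with grid points $y_i$, $i = 1, \ldots, I+1$, the first derivative at $y_i$ can be approximated by
\[
\left. \frac{du}{dy} \right|_{y=y_i} \approx \frac{u(y_{i+1}) - u(y_i)}{\Delta y} = \frac{u_{i+1} - u_i}{\Delta y},
\]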
where $u_i = u(y_i)$. This is called a forward difference. A more accurate approximation is given by a central difference of the form
\[
\left. \frac{du}{dy} \right|_{y=y_i} \approx \frac{u_{i+1} - u_{i-1}}{2 \Delta y} . \tag{1.4}
\]
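We can obtain a central difference approximation for the second-order derivative
\[
\frac{d^2 u}{dy^2} = \frac{d}{dy} \left( \frac{du}{dy} \right)
\]
as in equation (1.1) by applying (1.4) midway between successive grid points as
follows:
\[
\left. \frac{d^2 u}{dy^2} \right|_{y=y_i} \approx \frac{ \left. \dfrac{du}{dy} \right|_{i+1/2} - \left. \dfrac{du}{dy} \right|_{i-1/2} }{\Delta y} .
\]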
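Approximating the first derivatives in the same manner yields
\[
\left. \frac{d^2 u}{dy^2} \right|_{y=y_i} \approx \frac{ \dfrac{u_{i+1} - u_i}{\Delta y} - \dfrac{u_i - u_{i-1}}{\Delta y} }{\Delta y},
\]
or
\[
\left. \frac{d^2 u}{dy^2} \right|_{y=y_i} \approx \frac{u_{i+1} - 2u_i + u_{i-1}}{(\Delta y)^2} . \tag{1.5}
\]
Substituting (1.5) into (1.1) gives the difference equation
\[
a u_{i-1} + b u_i + c u_{i+1} = d, \tag{1.6}
\]
where
\[
a = 1, \qquad b = -2, \qquad c = 1, \qquad d = \frac{(\Delta y)^2}{\mu} \frac{dp}{dx} .
\]
The boundary conditions (1.2) and (1.3) become
\[
u_1 = 0, \qquad u_{I+1} = U. \tag{1.7}
\]
Applying equation (1.6) at each of the interior grid points then gives
\[
\begin{aligned}
i = 2: \quad & b u_2 + c u_3 = d, \\
i = 3: \quad & a u_2 + b u_3 + c u_4 = d, \\
i = 4: \quad & a u_3 + b u_4 + c u_5 = d, \\
& \qquad \vdots \\
i = i: \quad & a u_{i-1} + b u_i + c u_{i+1} = d, \\
& \qquad \vdots \\
i = I: \quad & a u_{I-1} + b u_I = d - cU,
\end{aligned}
\]
which is a system of linear algebraic equations of the form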
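\[
\begin{bmatrix}
b & c & & & & & & \\
a & b & c & & & & & \\
& a & b & c & & & & \\
& & \ddots & \ddots & \ddots & & & \\
& & & a & b & c & & \\
& & & & \ddots & \ddots & \ddots & \\
& & & & & a & b & c \\
& & & & & & a & b
\end{bmatrix}
\begin{bmatrix} u_2 \\ u_3 \\ u_4 \\ \vdots \\ u_i \\ \vdots \\ u_{I-1} \\ u_I \end{bmatrix}
=
\begin{bmatrix} d \\ d \\ d \\ \vdots \\ d \\ \vdots \\ d \\ d - cU \end{bmatrix},
\]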
where all of the empty elements in the coefficient matrix are zeros. This is called
a tridiagonal matrix as only three of the diagonals have nonzero elements. This
structure arises because of how the equation is discretized using central differences
that involve $u_{i-1}$, $u_i$, and $u_{i+1}$. In general, the number of domain subdivisions
$I$ can be very large, thereby leading to a large system of algebraic equations to
solve for the velocities $u_2, u_3, \ldots, u_i, \ldots, u_I$.
In all three examples, we end up with a system of linear algebraic equations to
solve for the currents, forces, or discretized velocities.
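To make the discretization in Example 1.3 concrete, the following Python sketch assembles and solves the tridiagonal system for a given number of subdivisions. It is an illustration only: the function name `couette_velocities`, the grid convention ($u_1 = 0$ at $y = 0$ and $u_{I+1} = U$ at $y = H$), and the parameter values in the usage line are assumptions, and the elimination loop is a plain forward-elimination and back-substitution pass specialized to three diagonals rather than a method introduced in this section. Exact rational arithmetic is used so that the result can be compared directly with the closed-form solution $u(y) = U y / H + \frac{1}{2\mu} \frac{dp}{dx} (y^2 - H y)$ of equation (1.1).

```python
from fractions import Fraction


def couette_velocities(I, H, U, mu, dpdx):
    """Solve a*u[i-1] + b*u[i] + c*u[i+1] = d at the interior points i = 2, ..., I.

    Returns the velocities at all I + 1 grid points, including both boundaries.
    Requires I >= 2 so that there is at least one interior point.
    """
    dy = Fraction(H) / I
    a, b, c = Fraction(1), Fraction(-2), Fraction(1)
    d = dy**2 * Fraction(dpdx) / Fraction(mu)
    n = I - 1                       # number of unknowns u_2, ..., u_I
    diag = [b] * n                  # main diagonal (modified during elimination)
    rhs = [d] * n
    rhs[-1] -= c * Fraction(U)      # known boundary value u_{I+1} = U moves to the right-hand side
    # Forward elimination: remove the subdiagonal entry a from each row in turn.
    for k in range(1, n):
        factor = a / diag[k - 1]
        diag[k] -= factor * c
        rhs[k] -= factor * rhs[k - 1]
    # Back substitution from the last unknown upward.
    u = [Fraction(0)] * n
    u[-1] = rhs[-1] / diag[-1]
    for k in range(n - 2, -1, -1):
        u[k] = (rhs[k] - c * u[k + 1]) / diag[k]
    return [Fraction(0)] + u + [Fraction(U)]


# Hypothetical parameters: four subdivisions, H = U = mu = 1, dp/dx = -2.
u = couette_velocities(4, 1, 1, 1, -2)
```

For these parameters the closed-form solution is $u(y) = 2y - y^2$, and the computed grid values coincide with it because the central difference (1.5) is exact for quadratic functions.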
1.3.2 Eigenproblems
As described in Section 1.6, a linear system of algebraic equations can be thought
of as a transformation from one vector space to another. Such transformations
can scale, translate, and/or rotate a vector. In many applications, it is of interest to determine for a given transformation which vectors get transformed into
themselves with only a scalar stretch factor, that is, the vectors point in the same
direction after the transformation has been applied. In such a case, the system
of equations is of the form
\[ \mathbf{A}\mathbf{x} = \lambda \mathbf{x}, \]
where $\mathbf{A}$ is the transformation matrix, $\lambda$ is an eigenvalue representing the stretch
factor, and $\mathbf{x}$ is known as an eigenvector and is the vector that is transformed
into itself.
Eigenproblems commonly arise in stability problems (see Section 19.3), obtaining natural frequencies of dynamical systems (see Section 19.1), and determining
the principal stresses in solid and fluid mechanics (see Section 2.1), for example.
The eigenproblem also plays a starring role in the diagonalization procedure used in quadratic
forms (see Section 2.2.3) and, more importantly, for solving systems of ordinary
differential equations (see Section 2.4). In addition, there are several powerful
matrix decompositions that are related to, based on, or used to solve the eigenproblem (see Section 2.6). Optimization using quadratic programming reduces
to solving generalized eigenproblems (see Section 4.2.4). The eigenproblem also
has important applications in numerical methods, such as determining if an iterative method will converge toward the exact solution (see Section 6.2.4). The
eigenproblem in its various forms occupies our attention in Chapter 2.
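Example 1.4 Consider the stresses acting on an infinitesimally small tetrahedral element of a solid or fluid as illustrated in Figure 1.5.
Solution: Note that $A$ is the area of $BCD$. The stress field is given by the stress
tensor, which is a $3 \times 3$ matrix of the form
\[
\boldsymbol{\tau} = \begin{bmatrix} \sigma_x & \tau_{xy} & \tau_{xz} \\ \tau_{xy} & \sigma_y & \tau_{yz} \\ \tau_{xz} & \tau_{yz} & \sigma_z \end{bmatrix},
\]
where $\sigma_x$, $\sigma_y$, and $\sigma_z$ are the normal stresses, and $\tau_{xy}$, $\tau_{xz}$, and $\tau_{yz}$ are the shear
stresses. Observe that the stress tensor is symmetric, that is, $\boldsymbol{\tau} = \boldsymbol{\tau}^T$.
The principal axes are the coordinate axes with respect to which only normal
stresses act, that is, there are no shear stresses. The normal stresses along these axes are called principal stresses.
Let us take $\mathbf{n}$ to be the normal unit vector, having length one, to the inclined
face $BCD$ on which only a normal stress $\sigma_n$ acts, then
\[ \mathbf{n} = n_x \mathbf{i} + n_y \mathbf{j} + n_z \mathbf{k}. \]
Enforcing static equilibrium of the element, that is, $\sum F_x = 0$, requires that
\[ \sigma_n n_x = \sigma_x n_x + \tau_{xy} n_y + \tau_{xz} n_z . \]
Similarly,
\[
\begin{aligned}
\sum F_y = 0 &: \quad \sigma_n n_y = \tau_{xy} n_x + \sigma_y n_y + \tau_{yz} n_z , \\
\sum F_z = 0 &: \quad \sigma_n n_z = \tau_{xz} n_x + \tau_{yz} n_y + \sigma_z n_z .
\end{aligned}
\]
In matrix form, these three equations are
\[
\begin{bmatrix} \sigma_x & \tau_{xy} & \tau_{xz} \\ \tau_{xy} & \sigma_y & \tau_{yz} \\ \tau_{xz} & \tau_{yz} & \sigma_z \end{bmatrix}
\begin{bmatrix} n_x \\ n_y \\ n_z \end{bmatrix} = \sigma_n \begin{bmatrix} n_x \\ n_y \\ n_z \end{bmatrix},
\]
or $\boldsymbol{\tau} \mathbf{n} = \sigma_n \mathbf{n}$. Thus, the three eigenvalues $\sigma_n$ of the stress tensor are the principal
stresses, and the three corresponding eigenvectors $\mathbf{n}$ are the principal axes on
which they each act.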
In a similar manner, the principal moments of inertia are the eigenvalues of
the moment of inertia tensor, which in two dimensions is
\[
\begin{bmatrix} I_{xx} & I_{xy} \\ I_{xy} & I_{yy} \end{bmatrix},
\]
where $I_{xx}$ and $I_{yy}$ are the moments of inertia, and $I_{xy}$ is the product of inertia. The
eigenvectors are the corresponding coordinate directions on which the principal
moments of inertia act.
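For a symmetric $2 \times 2$ matrix such as the two-dimensional moment of inertia tensor, the eigenvalues follow in closed form from the characteristic equation $|\mathbf{A} - \lambda \mathbf{I}| = 0$, which gives $\lambda = \tfrac{1}{2}(I_{xx} + I_{yy}) \pm \sqrt{\tfrac{1}{4}(I_{xx} - I_{yy})^2 + I_{xy}^2}$. The Python sketch below is a hedged illustration, with hypothetical function names and numerical values; general methods for the eigenproblem are the subject of Chapter 2.

```python
import math


def principal_values_2d(Ixx, Iyy, Ixy):
    """Eigenvalues of the symmetric matrix [[Ixx, Ixy], [Ixy, Iyy]], largest first."""
    mean = 0.5 * (Ixx + Iyy)
    radius = math.hypot(0.5 * (Ixx - Iyy), Ixy)
    return mean + radius, mean - radius


def principal_axis_2d(Ixx, Iyy, Ixy, lam):
    """Unit eigenvector for eigenvalue lam, from the first row of (A - lam I) n = 0."""
    nx, ny = Ixy, lam - Ixx
    length = math.hypot(nx, ny)
    if length == 0.0:              # Ixy = 0 and lam = Ixx: the x-axis is already principal
        return 1.0, 0.0
    return nx / length, ny / length


# Hypothetical values: Ixx = Iyy = 2 and Ixy = 1.
lam1, lam2 = principal_values_2d(2.0, 2.0, 1.0)
n1 = principal_axis_2d(2.0, 2.0, 1.0, lam1)
```

For these values the principal moments are 3 and 1, and the axis associated with the larger one lies along the diagonal direction $(1, 1)/\sqrt{2}$.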
referred to as the state-space representation (see Section 2.5.2). The diagonalization procedure used to solve such systems is discussed in Section 2.5 and has
widespread application in a variety of fields of engineering. In addition, determining the natural frequencies of a system, that is, when the entire system oscillates
with a single frequency, results in a differential eigenproblem as in Section 1.3.2
(see Section 19.1). Similarly, evaluating stability of a system subject to small
disturbances leads to a differential eigenproblem (see Sections 19.4 and 19.5).
When studying dynamical systems, we typically start with linear systems
from parallel electrical circuits and discrete mechanical systems involving masses,
springs, pendulums, and dampers, for example. However, some of the concepts
and methods extend to continuous systems of solids or fluids and to nonlinear
systems.
1.4 Systems of Linear Algebraic Equations
The primary application of vectors and matrices is in representing and solving
systems of coupled linear algebraic equations. The size of the system may be
as small as $2 \times 2$, for a simple dynamical system, to $n \times n$, where $n$ is in the
millions, for systems that arise from implementation of numerical methods applied to complex physical problems. In the following, we discuss the properties of
such systems and methods for determining their solution. Methods developed in
this chapter are suitable for hand calculations involving small systems of equations. Techniques for solving very large systems of equations computationally are
discussed in Part II.
1.4.1 General Considerations
A linear equation is one in which only polynomial terms of first degree or less
appear. For example, a linear equation in $n$ variables, $x_1, x_2, \ldots, x_n$, is of the form
\[ A_1 x_1 + A_2 x_2 + \cdots + A_n x_n = c, \]
where $A_i$, $i = 1, \ldots, n$ and $c$ are constants. A system of $m$ coupled linear equations
for $n$ variables is of the form
\[
\begin{aligned}
A_{11} x_1 + A_{12} x_2 + \cdots + A_{1n} x_n &= c_1 \\
A_{21} x_1 + A_{22} x_2 + \cdots + A_{2n} x_n &= c_2 \\
&\;\;\vdots \\
A_{m1} x_1 + A_{m2} x_2 + \cdots + A_{mn} x_n &= c_m
\end{aligned}
\]
The solution vector $\mathbf{x}^T = [x_1 \;\; x_2 \;\; \cdots \;\; x_n]$ must satisfy all $m$ equations
simultaneously. This system may be written in matrix form $\mathbf{A}\mathbf{x} = \mathbf{c}$ as
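\[
\begin{bmatrix}
A_{11} & A_{12} & \cdots & A_{1n} \\
A_{21} & A_{22} & \cdots & A_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
A_{m1} & A_{m2} & \cdots & A_{mn}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} =
\begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_m \end{bmatrix} .
\]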
1.4.2 Determinants
The determinant is an important quantity that characterizes a matrix. We first
motivate the determinant and why it is useful, then we describe how to compute
it for large matrices, and finally describe some of its properties.
Consider the system of linear algebraic equations
\[
\begin{aligned}
A_{11} x_1 + A_{12} x_2 &= c_1 , \\
A_{21} x_1 + A_{22} x_2 &= c_2 ,
\end{aligned}
\]
where A11 , A12 , A21 , A22 , c1 and c2 are known constants, and x1 and x2 are the
variables to be found.
[Figure: the two lines $l_1$ and $l_2$ in the plane, with panels labeled "no solution" and "one solution".]
In order to solve for the unknown variables, multiply the
first equation by $A_{22}$, the second by $A_{12}$, and then subtract. This eliminates the
$x_2$ variable, and solving for $x_1$ gives
\[
x_1 = \frac{A_{22} c_1 - A_{12} c_2}{A_{11} A_{22} - A_{12} A_{21}} .
\]
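Similarly, to obtain $x_2$, multiply the first equation by $A_{21}$, the second by $A_{11}$, and
then subtract. This eliminates the $x_1$ variable, and solving for $x_2$ gives
\[
x_2 = \frac{A_{11} c_2 - A_{21} c_1}{A_{11} A_{22} - A_{12} A_{21}} .
\]
Observe that the denominators are the same in both cases. We call this denominator the determinant of the $2 \times 2$ coefficient matrix
\[
\mathbf{A} = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix},
\]
and denote it by
\[
|\mathbf{A}| = A_{11} A_{22} - A_{12} A_{21} . \tag{1.8}
\]
For a $3 \times 3$ matrix
\[
\mathbf{A} = \begin{bmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{bmatrix},
\]
the determinant is
\[
|\mathbf{A}| = A_{11} A_{22} A_{33} + A_{12} A_{23} A_{31} + A_{13} A_{21} A_{32} - A_{11} A_{23} A_{32} - A_{12} A_{21} A_{33} - A_{13} A_{22} A_{31} . \tag{1.9}
\]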
Remark: This diagonal multiplication method does not work for $\mathbf{A}$ larger than $3 \times 3$. Therefore, a cofactor expansion must be used.
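Cofactor Expansion:
For a square $n \times n$ matrix $\mathbf{A}_n$, $M_{ij}$ is the minor of $A_{ij}$, and $C_{ij}$ is the cofactor of $A_{ij}$.
The minor $M_{ij}$ is the determinant of the submatrix that remains after the $i$th
row and $j$th column are removed, and the cofactor of $A_{ij}$ is
\[ C_{ij} = (-1)^{i+j} M_{ij} . \]
We can then write the cofactor matrix, which is
\[
\mathbf{C} = \begin{bmatrix}
+M_{11} & -M_{12} & +M_{13} & -M_{14} & \cdots \\
-M_{21} & +M_{22} & -M_{23} & +M_{24} & \cdots \\
+M_{31} & -M_{32} & +M_{33} & -M_{34} & \cdots \\
-M_{41} & +M_{42} & -M_{43} & +M_{44} & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix} .
\]
The determinant is obtained by summing the products of the elements of any one row (or column) with their cofactors. For a $3 \times 3$ matrix, for example, expanding along the first row gives
\[
\begin{aligned}
|\mathbf{A}| &= A_{11} C_{11} + A_{12} C_{12} + A_{13} C_{13} \\
&= A_{11} (A_{22} A_{33} - A_{23} A_{32}) - A_{12} (A_{21} A_{33} - A_{23} A_{31}) + A_{13} (A_{21} A_{32} - A_{22} A_{31}),
\end{aligned}
\]
or
\[
|\mathbf{A}| = A_{11} A_{22} A_{33} + A_{12} A_{23} A_{31} + A_{13} A_{21} A_{32} - A_{11} A_{23} A_{32} - A_{12} A_{21} A_{33} - A_{13} A_{22} A_{31},
\]
which is the same as equation (1.9). We must use the cofactor expansion for matrices larger than $3 \times 3$.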
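The cofactor expansion is naturally recursive, since each minor is itself a determinant of a smaller matrix. A minimal Python sketch, expanding along the first row, is given below; the function names `minor` and `det` are illustrative choices. Note that the number of operations grows roughly like $n!$, so this approach is only sensible for small matrices.

```python
def minor(A, i, j):
    """Submatrix of A that remains after row i and column j are removed (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]


def det(A):
    """Determinant of a square matrix by cofactor expansion along the first row."""
    n = len(A)
    if any(len(row) != n for row in A):
        raise ValueError("the determinant is only defined for square matrices")
    if n == 1:
        return A[0][0]
    # With 0-based column index j, the cofactor sign (-1)^(i+j) for the first row is (-1)^j.
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(n))
```

For example, `det([[2, 0, 1], [0, 1, 2], [1, 2, 0]])` evaluates to $2(0 - 4) - 0 + 1(0 - 1) = -9$.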
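Properties of Determinants ($\mathbf{A}$ and $\mathbf{B}$ are $n \times n$):
1. If any row or column is all zeros, $|\mathbf{A}| = 0$.
2. If two rows (or columns) are interchanged, the sign of $|\mathbf{A}|$ changes.
3. If any row (or column) is a linear combination of the other rows (or columns),
then $|\mathbf{A}| = 0$.
4. $|\mathbf{A}^T| = |\mathbf{A}|$.
5. In general, $|\mathbf{A} + \mathbf{B}| \neq |\mathbf{A}| + |\mathbf{B}|$.
6. $|\mathbf{AB}| = |\mathbf{BA}| = |\mathbf{A}| |\mathbf{B}|$.
7. If $|\mathbf{A}| = 0$, then $\mathbf{A}$ is singular and not invertible. If $|\mathbf{A}| \neq 0$, then $\mathbf{A}$ is
nonsingular and invertible, and $|\mathbf{A}^{-1}| = 1 / |\mathbf{A}|$.
8. The determinant of a triangular (or diagonal) matrix is the product of the
diagonal elements.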
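Procedure:
1. Write the augmented matrix:
\[
[\mathbf{A} | \mathbf{c}] = \left[ \begin{array}{cccc|c}
A_{11} & A_{12} & \cdots & A_{1n} & c_1 \\
A_{21} & A_{22} & \cdots & A_{2n} & c_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
A_{m1} & A_{m2} & \cdots & A_{mn} & c_m
\end{array} \right] .
\]
2. Interchange rows as necessary so that rows with leading zeros are moved toward the bottom.
3. Divide the 1st row by $A_{11}$ so that its leading element is one:
\[
\left[ \begin{array}{cccc|c}
1 & A'_{12} & \cdots & A'_{1n} & c'_1 \\
A_{21} & A_{22} & \cdots & A_{2n} & c_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
A_{m1} & A_{m2} & \cdots & A_{mn} & c_m
\end{array} \right] .
\]
4. Multiply the 1st row by $A_{i1}$ and subtract it from the $i$th row, for $i = 2, \ldots, m$, to produce zeros below the leading one:
\[
\left[ \begin{array}{cccc|c}
1 & A'_{12} & \cdots & A'_{1n} & c'_1 \\
0 & A'_{22} & \cdots & A'_{2n} & c'_2 \\
0 & A'_{32} & \cdots & A'_{3n} & c'_3 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & A'_{m2} & \cdots & A'_{mn} & c'_m
\end{array} \right] .
\]
5. Repeat steps (2)-(4), called forward elimination, on the $(m-1) \times (n-1)$
submatrix, etc., until the matrix is in row-echelon form, in which case the
first nonzero element in each row is a one with zeros below it.
6. Obtain the solution by back substitution. First, solve each equation for its
leading variable (for example, $x_1 = \ldots$, $x_2 = \ldots$, $x_3 = \ldots$). Second, starting
at the bottom, substitute each equation into all of the equations above it, that
is, back substitution. Finally, the remaining $n - r$ variables are arbitrary, where
the rank $r$ is defined below.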
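The procedure above can be sketched in Python as follows. This is an illustrative implementation under stated assumptions, not a production solver: the function names `row_echelon` and `back_substitute` are hypothetical, exact rational arithmetic is used in place of floating point so that leading ones and zeros are exact, and the back-substitution routine handles only the case of a square system with a unique solution.

```python
from fractions import Fraction


def row_echelon(aug):
    """Reduce an augmented matrix [A|c] to row-echelon form with leading ones.

    Returns the reduced matrix and the number of pivots found in the columns of A.
    """
    M = [[Fraction(v) for v in row] for row in aug]
    m, n = len(M), len(M[0]) - 1       # the last column holds the right-hand side
    r = 0                              # index of the next pivot row
    for col in range(n):
        # Step 2: bring a row with a nonzero entry in this column up to position r.
        pivot = next((i for i in range(r, m) if M[i][col] != 0), None)
        if pivot is None:
            continue                   # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        # Step 3: divide the pivot row by its leading element.
        lead = M[r][col]
        M[r] = [v / lead for v in M[r]]
        # Step 4: subtract multiples of the pivot row from the rows below it.
        for i in range(r + 1, m):
            factor = M[i][col]
            M[i] = [vi - factor * vr for vi, vr in zip(M[i], M[r])]
        r += 1
        if r == m:
            break
    return M, r


def back_substitute(M):
    """Step 6 for a square system whose row-echelon form has a leading one in every row."""
    n = len(M)
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        x[i] = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
    return x
```

A possible usage, with an augmented matrix chosen for illustration: `M, r = row_echelon([[2, 0, 1, 3], [0, 1, 2, 3], [1, 2, 0, 3]])` followed by `back_substitute(M)` returns the solution vector with every component equal to one.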
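Example 1.5 Using Gaussian elimination, solve the system of linear algebraic
equations
\[
\begin{bmatrix} 2 & 0 & 1 \\ 0 & 1 & 2 \\ 1 & 2 & 0 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} =
\begin{bmatrix} 3 \\ 3 \\ 3 \end{bmatrix} .
\]
Solution: We write the augmented matrix
\[
\left[ \begin{array}{ccc|c} 2 & 0 & 1 & 3 \\ 0 & 1 & 2 & 3 \\ 1 & 2 & 0 & 3 \end{array} \right],
\]
and perform elementary row operations to produce the row-echelon form via
forward elimination. In order to have a leading one in the first column of the
first row, dividing through by two produces
\[
\left[ \begin{array}{ccc|c} 1 & 0 & \tfrac{1}{2} & \tfrac{3}{2} \\ 0 & 1 & 2 & 3 \\ 1 & 2 & 0 & 3 \end{array} \right] .
\]
We first seek to eliminate the elements below the leading one in the first column
of the first row. The first column of the second row already contains a zero, so
we focus on the third row. To eliminate the leading one, subtract the third
row from the first row and place the result in the third row to produce
\[
\left[ \begin{array}{ccc|c} 1 & 0 & \tfrac{1}{2} & \tfrac{3}{2} \\ 0 & 1 & 2 & 3 \\ 0 & -2 & \tfrac{1}{2} & -\tfrac{3}{2} \end{array} \right] .
\]
Now the submatrix that results from eliminating the first row and first column
is considered. There is already a leading one in the second column of the second
row, so we seek to eliminate the element directly beneath it. To do so, multiply
the second row by two and add to the third row to yield
\[
\left[ \begin{array}{ccc|c} 1 & 0 & \tfrac{1}{2} & \tfrac{3}{2} \\ 0 & 1 & 2 & 3 \\ 0 & 0 & \tfrac{9}{2} & \tfrac{9}{2} \end{array} \right] .
\]
To obtain row-echelon form, divide the third row by $9/2$ to give
\[
\left[ \begin{array}{ccc|c} 1 & 0 & \tfrac{1}{2} & \tfrac{3}{2} \\ 0 & 1 & 2 & 3 \\ 0 & 0 & 1 & 1 \end{array} \right] .
\]
Having obtained the row-echelon form, we now perform the back substitution
to obtain the solution for $\mathbf{x}$. Beginning with the third row, we see that
\[ x_3 = 1 . \]
Then from the second row
\[ x_2 + 2 x_3 = 3, \]
or substituting $x_3 = 1$, we have
\[ x_2 = 1 . \]
Similarly, from the first row, $x_1 + \tfrac{1}{2} x_3 = \tfrac{3}{2}$, so that
\[ x_1 = 1 . \]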
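In order to introduce some terminology, consider the following possible row-echelon form resulting from Gaussian elimination:
\[
[\mathbf{A}' | \mathbf{c}'] = \left[ \begin{array}{cccc|c}
1 & a & b & c & d \\
0 & 0 & 1 & e & f \\
0 & 0 & 0 & 1 & g \\
0 & 0 & 0 & 0 & h \\
0 & 0 & 0 & 0 & j
\end{array} \right],
\]
in which the first $r$ rows contain leading ones and the last two rows are the residual equations.
The order (size) of the largest square submatrix of $\mathbf{A}'$ with a nonzero determinant is the rank of $\mathbf{A}$, denoted by $r = \text{rank}(\mathbf{A})$. Equivalently, the rank of $\mathbf{A}$ is
the number of nonzero rows of $\mathbf{A}$ when reduced to row-echelon form $\mathbf{A}'$. In the
above example, the rank is $r = 3$.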
Remarks:
1. Possibilities, with $n$ being the number of unknowns (columns) of $\mathbf{A}$, and $m$
the number of equations (rows):
   - If the $A'_{ij}$'s for the $i$th row ($i > r$) are all zero, but $c'_i$ is nonzero, the system
   is inconsistent and no solution exists. In this case, the residual equations
   say that a nonzero number equals zero, which obviously cannot be the case.
   - If $r = n$ ($r$ remaining equations for $n$ unknowns), then there is a unique
   solution.
   - If $r < n$, then there is an $(n - r)$-parameter family of solutions with defect
   $n - r$. That is, $(n - r)$ of the variables are arbitrary. In this case, there are
   infinitely many solutions.
2. If the rank of $\mathbf{A}$ and the rank of the augmented matrix $[\mathbf{A} | \mathbf{c}]$ are equal, the
system is consistent and a solution(s) exists (cf. note 1).
3. $\text{rank}(\mathbf{A}) = \text{rank}(\mathbf{A}^T)$, that is, the dimension of the row space of $\mathbf{A}$ is the same as
that of the column space of $\mathbf{A}$ (see Appendix B).
4. Elementary row operations do not change the rank of a matrix.
5. If $\mathbf{A}$ is an $n \times n$ triangular matrix, then
\[ |\mathbf{A}| = A_{11} A_{22} \cdots A_{nn} , \]
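in which case the determinant is the product of the elements on the main diagonal. Therefore, we may row reduce the matrix $\mathbf{A}$ to triangular form in order to
simplify the determinant calculation. In doing so, however, we must take into
account the influence of each elementary row operation on the determinant as
follows: interchanging two rows changes the sign of the determinant, multiplying a row by a scalar $k$ multiplies the determinant by $k$, and adding a multiple of one row to another row leaves the determinant unchanged.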
\[
[\mathbf{A}_n | \mathbf{I}_n] \;\xrightarrow{\text{row operations}}\; [\mathbf{I}_n | \mathbf{A}_n^{-1}]
\]
Remarks:
1. Note that $\mathbf{A}$ is not invertible if the row operations do not produce $\mathbf{I}_n$.
2. Similar to the transpose, $(\mathbf{AB})^{-1} = \mathbf{B}^{-1} \mathbf{A}^{-1}$, where we again note the reverse
order.
Despite our use of the notation $\mathbf{A}^{-1}$ for the matrix inverse, there is no such thing as a matrix
reciprocal, that is, $\mathbf{A}^{-1} \neq 1/\mathbf{A}$.
Note that while the terms homogeneous and trivial both have to do with something being zero,
homogeneous refers to the right-hand side of an equation, while trivial refers to a solution.
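In matrix form $\mathbf{A}\mathbf{x} = \mathbf{c}$, this is
\[
\begin{bmatrix} 2 & 0 & 1 \\ 0 & 1 & 2 \\ 1 & 2 & 0 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} =
\begin{bmatrix} 3 \\ 3 \\ 3 \end{bmatrix} .
\]
Augmenting the coefficient matrix with the identity matrix gives
\[
\left[ \begin{array}{ccc|ccc} 2 & 0 & 1 & 1 & 0 & 0 \\ 0 & 1 & 2 & 0 & 1 & 0 \\ 1 & 2 & 0 & 0 & 0 & 1 \end{array} \right] .
\]
Performing the same Gaussian elimination steps as in the previous example to
obtain the row-echelon form produces
\[
\left[ \begin{array}{ccc|ccc} 1 & 0 & \tfrac{1}{2} & \tfrac{1}{2} & 0 & 0 \\ 0 & 1 & 2 & 0 & 1 & 0 \\ 0 & 0 & 1 & \tfrac{1}{9} & \tfrac{4}{9} & -\tfrac{2}{9} \end{array} \right] .
\]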
To obtain the inverse of $\mathbf{A}$ on the right, we must have the identity matrix on the
left, that is, we must have the left side in reduced row-echelon form. To eliminate
the two in the third column of the second row, multiply the third row by minus
two and add to the second row. Similarly, to eliminate the one-half in the third
column of the first row, multiply the third row by minus one-half and add to the
first row. This leads to
\[
\left[ \begin{array}{ccc|ccc} 1 & 0 & 0 & \tfrac{4}{9} & -\tfrac{2}{9} & \tfrac{1}{9} \\ 0 & 1 & 0 & -\tfrac{2}{9} & \tfrac{1}{9} & \tfrac{4}{9} \\ 0 & 0 & 1 & \tfrac{1}{9} & \tfrac{4}{9} & -\tfrac{2}{9} \end{array} \right] .
\]
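Therefore, the inverse of $\mathbf{A}$ is
\[
\mathbf{A}^{-1} = \frac{1}{9} \begin{bmatrix} 4 & -2 & 1 \\ -2 & 1 & 4 \\ 1 & 4 & -2 \end{bmatrix},
\]
and the solution vector is
\[
\mathbf{x} = \mathbf{A}^{-1} \mathbf{c} = \frac{1}{9} \begin{bmatrix} 4 & -2 & 1 \\ -2 & 1 & 4 \\ 1 & 4 & -2 \end{bmatrix} \begin{bmatrix} 3 \\ 3 \\ 3 \end{bmatrix} = \frac{1}{9} \begin{bmatrix} 9 \\ 9 \\ 9 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix},
\]
which is the same solution as obtained using Gaussian elimination directly in the
previous example.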
See Chapter 6 for additional methods for determining the inverse of a matrix
that are suitable for computer algorithms, including LU decomposition, Cholesky
decomposition, and partitioning.
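For small matrices, the row-reduction route to the inverse can be sketched directly in Python. The function name `inverse` is an illustrative choice, and exact rational arithmetic is assumed so that the result can be compared with hand calculations. Unlike the two-stage hand procedure (row-echelon form first, then elimination above the leading ones), this sketch clears the entries above and below each leading one in a single pass, which yields the same reduced row-echelon form.

```python
from fractions import Fraction


def inverse(A):
    """Invert a square matrix by row reducing the augmented matrix [A | I] to [I | A^-1]."""
    n = len(A)
    if any(len(row) != n for row in A):
        raise ValueError("only square matrices can be inverted")
    # Build [A | I] with exact rational entries.
    M = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((i for i in range(col, n) if M[i][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular: row operations cannot produce I")
        M[col], M[pivot] = M[pivot], M[col]
        lead = M[col][col]
        M[col] = [v / lead for v in M[col]]          # make the leading element one
        for i in range(n):
            if i != col:                             # clear the rest of the column
                factor = M[i][col]
                M[i] = [vi - factor * vc for vi, vc in zip(M[i], M[col])]
    return [row[n:] for row in M]
```

Applied to the matrix with rows $(2, 0, 1)$, $(0, 1, 2)$, and $(1, 2, 0)$, the function returns entries that are multiples of $1/9$.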
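1.4.5 Cramer's Rule
Cramer's rule is an alternative method for obtaining the unique solution to a
system $\mathbf{A}\mathbf{x} = \mathbf{c}$, where $\mathbf{A}$ is $n \times n$ and nonsingular ($|\mathbf{A}| \neq 0$). Recall that the
cofactor matrix for an $n \times n$ matrix $\mathbf{A}$ is
\[
\mathbf{C} = \begin{bmatrix}
C_{11} & C_{12} & \cdots & C_{1n} \\
C_{21} & C_{22} & \cdots & C_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
C_{n1} & C_{n2} & \cdots & C_{nn}
\end{bmatrix},
\]
where $C_{ij}$ is the cofactor of $A_{ij}$. We define the adjugate of $\mathbf{A}$ to be the transpose
of the cofactor matrix as follows
\[ \text{Adj}(\mathbf{A}) = \mathbf{C}^T . \]
The adjugate matrix is often referred to as the adjoint matrix; however, this can
lead to confusion with the term adjoint used in other settings.
Evaluating the product of matrix $\mathbf{A}$ with its adjugate, it can be shown using
the rules of cofactor expansions that
\[ \mathbf{A} \, \text{Adj}(\mathbf{A}) = |\mathbf{A}| \, \mathbf{I} . \]
Therefore, because $\mathbf{A}\mathbf{A}^{-1} = \mathbf{I}$ (if $\mathbf{A}$ is invertible), we can write
\[ \mathbf{A}^{-1} = \frac{1}{|\mathbf{A}|} \text{Adj}(\mathbf{A}) . \]
This leads, for example, to the familiar result for the inverse of a $2 \times 2$ matrix (see
Section 1.1). More generally, this result leads to Cramer's rule for the solution of
the system $\mathbf{A}\mathbf{x} = \mathbf{c}$ by recognizing that the $j$th element of the vector $\text{Adj}(\mathbf{A})\mathbf{c}$ is
$|\mathbf{A}_j|$, where $\mathbf{A}_j$, $j = 1, 2, \ldots, n$, is obtained by replacing the $j$th column of $\mathbf{A}$ by
the right-hand-side vector
\[
\mathbf{c} = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix} .
\]
Then the system has a unique solution, which is
\[
x_1 = \frac{|\mathbf{A}_1|}{|\mathbf{A}|}, \quad x_2 = \frac{|\mathbf{A}_2|}{|\mathbf{A}|}, \quad \ldots, \quad x_n = \frac{|\mathbf{A}_n|}{|\mathbf{A}|} .
\]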
Remarks:
1. If the system is homogeneous, in which case $\mathbf{c} = \mathbf{0}$, then $|\mathbf{A}_j| = 0$, and the
unique solution is $\mathbf{x} = \mathbf{0}$, that is, the trivial solution.
2. Although Cramer's rule applies for any size nonsingular matrix, it is efficient
only for small systems ($n \le 3$) owing to the ease of finding determinants of $3 \times 3$
and smaller systems. For large systems, using Gaussian elimination is typically
more practical.
Example 1.7 Let us once again consider the system of linear algebraic equations from the last two examples, which is
2x1 + x3 = 3,
x2 + 2x3 = 3,
x1 + 2x2 = 3.
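Solution: In matrix form $Ax = c$, the system is
\[
\begin{bmatrix} 2 & 0 & 1 \\ 0 & 1 & 2 \\ 1 & 2 & 0 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} 3 \\ 3 \\ 3 \end{bmatrix}.
\]
Evaluating the determinants gives $|A| = -9$ and $|A_1| = |A_2| = |A_3| = -9$, so that Cramer's rule yields
\[
x_1 = x_2 = x_3 = \frac{-9}{-9} = 1,
\]
which is the same as obtained using the inverse and Gaussian elimination in the
previous two examples.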
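Cramer's rule translates almost directly into code. The following is a minimal sketch, not a production solver: the helper names `det` and `cramer` are assumptions made for this illustration, the determinant is evaluated by cofactor expansion (practical only for small n, in keeping with the remark above), and exact rational arithmetic is used so that the result can be compared with the hand calculation.

```python
from fractions import Fraction

def det(m):
    """Determinant by cofactor expansion about the first row."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def cramer(a, c):
    """Solve a x = c by Cramer's rule; a must be square and nonsingular."""
    d = det(a)
    if d == 0:
        raise ValueError("matrix is singular; Cramer's rule does not apply")
    x = []
    for j in range(len(a)):
        # A_j: replace the j-th column of A by the right-hand side c
        a_j = [row[:j] + [c[i]] + row[j + 1:] for i, row in enumerate(a)]
        x.append(Fraction(det(a_j), d))
    return x

# System of Example 1.7
A = [[2, 0, 1], [0, 1, 2], [1, 2, 0]]
c = [3, 3, 3]
x = cramer(A, c)
print(det(A), x)
```

Cofactor expansion costs on the order of n! operations, which is why Gaussian elimination is preferred for anything beyond small systems.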
Problem Set # 1
\[
u + v = \begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix} + \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}
= \begin{bmatrix} u_1 + v_1 \\ u_2 + v_2 \\ u_3 + v_3 \end{bmatrix}.
\]
Observe that adding two vectors results in another vector called the resultant
vector. The resultant vector u + v extends from the tail of one vector to the tip
of the other as illustrated in Figure 1.8.
Multiplying a scalar times a vector simply scales the length of the vector accordingly without changing its direction. It is a special case of matrix scalar
multiplication. For example,
\[
ku = k\begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix} = \begin{bmatrix} ku_1 \\ ku_2 \\ ku_3 \end{bmatrix},
\]
which is a vector as shown in Figure 1.9.
Matrix-matrix multiplication applied to vectors is known as the inner product.3
The inner product of u and v is denoted by
\[
\langle u, v \rangle = u^T v = \begin{bmatrix} u_1 & u_2 & u_3 \end{bmatrix}\begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}
= u_1 v_1 + u_2 v_2 + u_3 v_3 = v^T u = \langle v, u \rangle,
\]
which is a scalar. If $\langle u, v \rangle = 0$, we say that the vectors u and v are orthogonal.
In two and three dimensions, this means they are geometrically perpendicular.
Different authors use various notation for the inner product, including
\[
\langle u, v \rangle = u \cdot v = u^T v = (u, v).
\]
The inner product can be used to determine the length of a vector, which is
\[
\|u\| = \langle u, u \rangle^{1/2} = \left( u_1^2 + u_2^2 + u_3^2 \right)^{1/2}.
\]
3 The inner product also goes by the terms dot product and scalar product.
A unit vector is one having length (norm) equal to one, that is, unity. The inner
product and norm can also be used to determine the angle between vectors. The
angle $\theta$ between two vectors, as illustrated in Figure 1.10, is such that
\[
\cos\theta = \frac{\langle u, v \rangle}{\|u\|\,\|v\|}.
\]
If $\langle u, v \rangle = 0$, then $\theta = \pi/2$, and the vectors are orthogonal. For more on vector
norms, see Section 1.7.
Given the term inner product, one may wonder if there is an outer product.
Indeed there is; however, we will not find much practical use for it. Whereas the
inner product of u and v is $u^T v$, which produces a scalar ($1 \times 1$ matrix), the outer
product is $u v^T$, which produces an $m \times n$ matrix, where u is $m \times 1$ and v is $n \times 1$
($m = n$ for there to be an inner product).
A vector operation that is unique to the two- and three-dimensional cases is
the cross product.4 Recall that the dot (inner or scalar) product of two vectors
produces a scalar; the cross product of two vectors produces a vector that is
perpendicular to the two vectors and obeys the right-hand rule.
In Cartesian coordinates, where the vectors $u = u_1 i + u_2 j + u_3 k$ and
$v = v_1 i + v_2 j + v_3 k$, the cross product is
\[
w = u \times v = \begin{vmatrix} i & j & k \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{vmatrix}
= (u_2 v_3 - u_3 v_2)\, i - (u_1 v_3 - u_3 v_1)\, j + (u_1 v_2 - u_2 v_1)\, k.
\]
Note how the cofactor expansion about the first row is used.
Remarks:
1. Observe that the order of the vectors matters when taking the cross product
according to the right-hand rule; that is, $v \times u = -(u \times v)$.
2. If either u or v is zero, or if u and v are parallel, then the cross product is the
zero vector $w = 0$.
3. If the two vectors u and v are taken to be the two adjacent sides of a parallelogram, then the length of the cross product $w = u \times v$ is the area of the
parallelogram.
4. The cross product is a measure of rotation of some sort. For example, it
is typically first encountered in physics when computing moments of forces
about an axis.
4 The cross product is sometimes referred to as a vector product because it produces a vector in
contrast to the inner product, which produces a scalar.
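The vector operations of this section are easy to check numerically. The sketch below is only illustrative: the function names `inner`, `norm`, `angle`, and `cross` and the two sample vectors are assumptions chosen for this example, not notation from the text.

```python
import math

def inner(u, v):
    """Inner (dot) product <u, v>."""
    return sum(ui * vi for ui, vi in zip(u, v))

def norm(u):
    """Length of u, that is, <u, u>^(1/2)."""
    return math.sqrt(inner(u, u))

def angle(u, v):
    """Angle between u and v from cos(theta) = <u, v> / (||u|| ||v||)."""
    return math.acos(inner(u, v) / (norm(u) * norm(v)))

def cross(u, v):
    """Cross product of two three-dimensional vectors (cofactor expansion)."""
    return [u[1] * v[2] - u[2] * v[1],
            -(u[0] * v[2] - u[2] * v[0]),
            u[0] * v[1] - u[1] * v[0]]

u = [1.0, 2.0, 2.0]
v = [2.0, 0.0, -1.0]
w = cross(u, v)

print(inner(u, v), norm(u), angle(u, v))
print(w, inner(w, u), inner(w, v))
```

Because these two sample vectors are orthogonal, the length of w equals the product of their lengths, which is the area of the rectangle they span.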
\[
c_1 u_1 + c_2 u_2 + \cdots + c_m u_m = 0. \qquad (1.10)
\]
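To obtain a criterion for the existence of the $c_i$'s, take the inner product of each
vector $u_i$ ($1 \le i \le m$) with (1.10):
\[
\begin{aligned}
c_1 \|u_1\|^2 + c_2 \langle u_1, u_2 \rangle + \cdots + c_m \langle u_1, u_m \rangle &= 0, \\
c_1 \langle u_2, u_1 \rangle + c_2 \|u_2\|^2 + \cdots + c_m \langle u_2, u_m \rangle &= 0, \\
&\ \ \vdots \\
c_1 \langle u_m, u_1 \rangle + c_2 \langle u_m, u_2 \rangle + \cdots + c_m \|u_m\|^2 &= 0,
\end{aligned}
\]
or in matrix form
\[
\underbrace{\begin{bmatrix}
\|u_1\|^2 & \langle u_1, u_2 \rangle & \cdots & \langle u_1, u_m \rangle \\
\langle u_2, u_1 \rangle & \|u_2\|^2 & \cdots & \langle u_2, u_m \rangle \\
\vdots & \vdots & \ddots & \vdots \\
\langle u_m, u_1 \rangle & \langle u_m, u_2 \rangle & \cdots & \|u_m\|^2
\end{bmatrix}}_{G'}
\begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_m \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}.
\]
Thus, if $G = |G'| \neq 0$, where $G$ is the Gram determinant, the only solution is
$c = 0$, that is, the trivial solution, and the vectors $u_1, \ldots, u_m$ are linearly independent. If $G = |G'| = 0$, many nontrivial solutions exist (nonunique solutions),
and the vectors $u_1, \ldots, u_m$ are linearly dependent.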
Remarks:
1. The matrix $G'$ is symmetric owing to the properties of inner products.
2. If the matrix $A$ is formed by placing the vectors $u_i$, $i = 1, \ldots, m$, as the
columns, then $G' = A^T A$.
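The Gram-determinant test can be sketched in a few lines. The helper names below (`gram_matrix`, `linearly_independent`) are assumptions introduced for this illustration; the first set of vectors is the one used in Example 1.9, and the second is an arbitrary mutually orthogonal set. Integer arithmetic keeps the determinant exact, so the comparison with zero is safe here; with floating-point data a tolerance would be needed.

```python
def inner(u, v):
    """Inner product <u, v> of two vectors given as lists."""
    return sum(ui * vi for ui, vi in zip(u, v))

def gram_matrix(vectors):
    """Matrix G' whose (i, j) entry is <u_i, u_j>."""
    return [[inner(ui, uj) for uj in vectors] for ui in vectors]

def det(m):
    """Determinant by cofactor expansion about the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def linearly_independent(vectors):
    """True when the Gram determinant G = |G'| is nonzero (exact for integer input)."""
    return det(gram_matrix(vectors)) != 0

dependent = [[1, 1, 2], [1, 0, 1], [2, 1, 3]]    # u3 = u1 + u2
orthogonal = [[1, 0, 0], [0, 2, 0], [0, 0, 3]]   # mutually orthogonal

print(gram_matrix(dependent), linearly_independent(dependent))
print(gram_matrix(orthogonal), linearly_independent(orthogonal))
```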
Example 1.8 Consider the case when u1 , , um are all mutually orthogonal,
and determine if the vectors are linearly independent.
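Solution: In this case, $\langle u_i, u_j \rangle = 0$ for $i \neq j$ and the vectors are nonzero, in which case
\[
\begin{bmatrix}
\|u_1\|^2 & 0 & \cdots & 0 \\
0 & \|u_2\|^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \|u_m\|^2
\end{bmatrix}
\begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_m \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}.
\]
Because $G = \|u_1\|^2 \|u_2\|^2 \cdots \|u_m\|^2 \neq 0$, the only solution is $c = 0$, and the vectors are linearly independent.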
such that
\[
v = c_1 u_1 + c_2 u_2 + \cdots + c_r u_r,
\]
where the $c_i$'s are arbitrary, form a vector space $V$ with dimension $r$. We say that
1) $V$ is a subspace of $n$-dimensional space, 2) the vectors $u_1, \ldots, u_r$ span $V$ and
form a basis for it, 3) if the $u_i$ are unit vectors, each $c_i$ is the component of $v$ along
$u_i$, 4) $r$ is the rank of $G'$ (defect $= n - r$), 5) $r$ is also the rank of the matrix
with the $u_i$'s as rows or columns, and 6) the range of a matrix $A$, denoted by
range($A$), is the vector space spanned by the columns of $A$.
Example 1.9
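\[
u_1 = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}, \quad
u_2 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \quad
u_3 = \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix}
\]
or
\[
\begin{aligned}
v_1 &= c_1 + c_2 + 2c_3, \\
v_2 &= c_1 + c_3, \\
v_3 &= 2c_1 + c_2 + 3c_3,
\end{aligned}
\]
or
\[
\underbrace{\begin{bmatrix} 1 & 1 & 2 \\ 1 & 0 & 1 \\ 2 & 1 & 3 \end{bmatrix}}_{A}
\begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix}
= \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}.
\]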
Note that the columns of A are u1 , u2 and u3 .
Is this linear system consistent for all v? To find out, evaluate the determinant
(only possible if A is square)
\[
|A| = 0 + 2 + 2 - 0 - 3 - 1 = 0.
\]
Because the determinant is zero, the system is not consistent for all v, and no unique
solution exists for c for any v. Hence, the vectors $u_1$, $u_2$, and $u_3$ do not span all
of three-dimensional space.
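\[
G' = \begin{bmatrix}
\|u_1\|^2 & \langle u_1, u_2 \rangle & \langle u_1, u_3 \rangle \\
\langle u_2, u_1 \rangle & \|u_2\|^2 & \langle u_2, u_3 \rangle \\
\langle u_3, u_1 \rangle & \langle u_3, u_2 \rangle & \|u_3\|^2
\end{bmatrix}
\]
\[
G' = \begin{bmatrix} 6 & 3 & 9 \\ 3 & 2 & 5 \\ 9 & 5 & 14 \end{bmatrix},
\]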
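\[
\begin{aligned}
i_2^T &= [\,0 \ \ 1 \ \ 0 \ \ \cdots \ \ 0\,], \\
&\ \ \vdots \\
i_n^T &= [\,0 \ \ 0 \ \ \cdots \ \ 0 \ \ 1\,].
\end{aligned}
\]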
Summarizing: If $A$ is an $n \times n$ matrix, the following statements are equivalent:
1. $|A| \neq 0$.
2. $A$ is nonsingular.
3. $A$ is invertible.
4. $A$ has rank $n$.
5. The row (or column) vectors of $A$ are linearly independent.
6. The row (or column) vectors of $A$ span $n$-dimensional space and form a basis
for it.
7. $Ax = 0$ has only the trivial solution because $x = A^{-1}0 = 0$.
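1. Normalize $u_1$ to obtain
\[
e_1 = \frac{u_1}{\|u_1\|}.
\]
2. Determine the component of $u_2$ that is orthogonal to $e_1$ by subtracting the component of $u_2$ in the $e_1$ direction, $\langle u_2, e_1 \rangle e_1$, from $u_2$:
\[
u_2' = u_2 - \langle u_2, e_1 \rangle e_1.
\]
Normalizing $u_2'$ produces
\[
e_2 = \frac{u_2'}{\|u_2'\|}.
\]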
3. Determine the component of $u_3$ that is orthogonal to $e_1$ and $e_2$. This is accomplished by subtracting the components of $u_3$ in the $e_1$ and $e_2$ directions
from $u_3$ as follows:
\[
u_3' = u_3 - \langle u_3, e_1 \rangle e_1 - \langle u_3, e_2 \rangle e_2.
\]
Normalizing $u_3'$ produces
\[
e_3 = \frac{u_3'}{\|u_3'\|}.
\]
4. Continue for $4 \le i \le s$:
\[
u_i' = u_i - \sum_{k=1}^{i-1} \langle u_i, e_k \rangle e_k, \qquad e_i = \frac{u_i'}{\|u_i'\|}.
\]
Remarks:
1. The Gram-Schmidt procedure results in an orthonormal (orthogonal and normalized) set of basis vectors $e_1, e_2, \ldots, e_s$.
2. This procedure always works if the $u_i$'s are linearly independent (the order of
the $u_i$'s affects the final orthonormal vectors).
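The steps above can be sketched as a short routine. This is a minimal illustration of the classical Gram-Schmidt procedure as written in step 4, under the assumption that the input vectors are linearly independent; the function name `gram_schmidt`, the tolerance `1e-12`, and the three sample vectors are choices made for this example only.

```python
import math

def inner(u, v):
    """Inner product <u, v>."""
    return sum(ui * vi for ui, vi in zip(u, v))

def gram_schmidt(vectors):
    """Return orthonormal vectors e_1, ..., e_s built from u_1, ..., u_s."""
    basis = []
    for u in vectors:
        # u'_i = u_i - sum_k <u_i, e_k> e_k over the e_k found so far
        u_prime = list(u)
        for e in basis:
            coeff = inner(u, e)
            u_prime = [a - coeff * b for a, b in zip(u_prime, e)]
        length = math.sqrt(inner(u_prime, u_prime))
        if length < 1e-12:
            raise ValueError("vectors are linearly dependent")
        # e_i = u'_i / ||u'_i||
        basis.append([a / length for a in u_prime])
    return basis

E = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
for i, ei in enumerate(E):
    print([round(inner(ei, ej), 12) for ej in E])
```

The printed rows are the inner products of each e_i with all the others, so an orthonormal result shows up as the rows of the identity matrix (to rounding). Changing the order of the input vectors changes the resulting basis, as noted in remark 2.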
\[
\underset{m \times n}{A}\ \underset{n \times 1}{x} = \underset{m \times 1}{y},
\]
where $A$ is specified and $x$ and $y$ are arbitrary vectors, may be viewed as a linear
transformation (mapping) from $n$-space (size of $x$) to $m$-space (size of $y$). In this
context, $A$ transforms vectors from an $n$-space domain to an $m$-space range as
shown in Figure 1.12.
Example 1.10
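[Figure 1.12: the transformation Ax maps the domain (n-space) to the range (m-space).]
[Figure 1.13: the point (x1, x2) is mapped to the point (y1, y2).]
\[
y = Ax = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix},
\qquad
\begin{bmatrix} y_1 \\ y_2 \end{bmatrix}
= \begin{bmatrix} x_1 \cos\theta - x_2 \sin\theta \\ x_1 \sin\theta + x_2 \cos\theta \end{bmatrix}.
\]
From Figure 1.13, we see that premultiplying by $A$ rotates $x$ through an angle
$\theta$.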
Remarks:
1. Because the transformations are linear, they can be superimposed. For example,
\[
Ax + Bx = y
\]
is equivalent to
\[
(A + B)x = y.
\]
2. Successively applying a series of transformations to $x$, such as
\[
x_1 = A_1 x, \quad x_2 = A_2 x_1, \quad \ldots, \quad x_k = A_k x_{k-1},
\]
is equivalent to applying the transformation
\[
x_k = Ax,
\]
where $A = A_k \cdots A_2 A_1$ (note the reverse order).
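3. If the transformation matrix $A$ is invertible, then $A^{-1}y$ transforms $y$ back
to $x$, which is the inverse transformation. For example, consider the rotation
transformation. Rotating $y$ through $-\theta$ should return $x$. Recalling that $\sin$
is odd and $\cos$ is even, we have
\[
x = A'y
= \begin{bmatrix} \cos(-\theta) & -\sin(-\theta) \\ \sin(-\theta) & \cos(-\theta) \end{bmatrix} y
= \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} y.
\]
Is $A' = A^{-1}$? Check if $A'A = I$:
\[
\begin{aligned}
A'A &= \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}
\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \\
&= \begin{bmatrix} \cos^2\theta + \sin^2\theta & -\cos\theta\sin\theta + \sin\theta\cos\theta \\
-\sin\theta\cos\theta + \cos\theta\sin\theta & \sin^2\theta + \cos^2\theta \end{bmatrix} \\
&= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I,
\end{aligned}
\]
in which case $A' = A^{-1}$.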
4. If $y = Ax$, then $y$ is a linear combination of the columns ($a_1, a_2, \ldots, a_n$) of $A$
in the form
\[
y = Ax = x_1 a_1 + x_2 a_2 + \cdots + x_n a_n.
\]
5. In the case when a matrix A transforms a nonzero vector x to the zero vector
0, that is, Ax = 0, the vector x is said to be in the nullspace of A. The
nullspace is a subspace of that defined by the row or column vectors of A.
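The rotation example and remarks 2 and 3 can be checked numerically. The sketch below uses only the standard library; the helper names `rotation`, `matmul`, and `matvec`, the angle of 30 degrees, and the test vector are assumptions made for this illustration.

```python
import math

def rotation(theta):
    """Matrix that rotates a two-dimensional vector through the angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matmul(a, b):
    """Matrix-matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matvec(a, x):
    """Matrix-vector product."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

theta = math.pi / 6
A = rotation(theta)           # forward transformation
A_back = rotation(-theta)     # candidate inverse (rotate through -theta)

x = [1.0, 0.0]
y = matvec(A, x)              # rotated vector
x_back = matvec(A_back, y)    # should return the original x
product = matmul(A_back, A)   # should be the identity matrix

# Remark 2: two successive rotations equal one rotation through the summed angle
combined = matmul(rotation(0.2), rotation(0.5))

print(y, x_back, product)
```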
Problem Set # 2
\[
\|u\|_1 = \sum_{i=1}^{n} |u_i|.
\]
where $a_j$ denotes the $j$th column of $A$. The $L_\infty$ norm is the largest $L_1$ norm of
the row vectors of $A$, that is,
\[
\|A\|_\infty = \max_{1 \le i \le m} \|a_i\|_1,
\]
where $a_i$ here denotes the $i$th row of $A$. The $L_2$ norm of a matrix is
sometimes referred to as the spectral norm. To more directly mimic the $L_2$ norm
for vectors, we also define the Frobenius norm of a matrix given by
\[
\|A\|_F = \left( \sum_{i=1}^{m} \sum_{j=1}^{n} |A_{ij}|^2 \right)^{1/2}.
\]
This is simply the square root of the sum of the squares of all the elements of the
matrix $A$.
Remarks:
1. When both vectors and matrices are present, we use the same norm for both
when performing operations.
2. Unless indicated otherwise, we will use the L2 norm for both vectors and
matrices.
3. Although the $L_2$ norm is most commonly used, the $L_1$ norm and $L_\infty$ norm are
typically more convenient to compute.
4. The Schwarz inequality for vectors is
\[
|\langle u, v \rangle| \le \|u\|_2 \|v\|_2.
\]
More generally, it can be proven that
\[
\|AB\| \le \|A\| \|B\|.
\]
These inequalities prove useful in determining bounds on various quantities of
interest involving vectors and matrices.
5. For more on the mathematical properties of norms, see Golub and Van Loan
(2013) and Horn and Johnson (2013).
6. In Chapter 3, we will extend the L2 norm to functions, and vector and matrix
norms will play a prominent role in defining the condition number in Chapter 6
that is so central to computational methods.
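The matrix norms that are convenient to compute by hand are equally simple in code. The following sketch assumes the definitions given above (largest column sum for the L1 norm, largest row sum for the L-infinity norm, and the Frobenius norm); the function names and the sample matrices are hypothetical choices for this example.

```python
import math

def norm_1(a):
    """L1 norm: largest L1 norm of the column vectors of a."""
    return max(sum(abs(row[j]) for row in a) for j in range(len(a[0])))

def norm_inf(a):
    """L-infinity norm: largest L1 norm of the row vectors of a."""
    return max(sum(abs(aij) for aij in row) for row in a)

def norm_fro(a):
    """Frobenius norm: square root of the sum of squares of all elements."""
    return math.sqrt(sum(abs(aij) ** 2 for row in a for aij in row))

def matmul(a, b):
    """Matrix-matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

A = [[1.0, -2.0], [3.0, 4.0]]
B = [[0.0, 1.0], [-1.0, 2.0]]

print(norm_1(A), norm_inf(A), norm_fro(A))
# The product inequality ||AB|| <= ||A|| ||B|| for the L1 norm
print(norm_1(matmul(A, B)), norm_1(A) * norm_1(B))
```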
2
The Eigenproblem and Its Applications
Recall from Section 1.6 that we may view a matrix $A$ as a linear transformation
from a vector $x$ to a vector $y$ in the form $Ax = y$. It is often of interest to know
for what characteristic values $\lambda$ a matrix transforms a vector $x$ into a constant
multiple of itself, such that $Ax = \lambda x$. Such values of $\lambda$ are called eigenvalues, and
the corresponding vectors $x$ are called eigenvectors. Some common applications
where eigenvalues and eigenvectors are encountered in mechanics are:
• The eigenvalues of the stress tensor, which is a $3 \times 3$ matrix, are
the principal stresses, with the eigenvectors defining the principal axes along
which they act (see Section 1.3.2).
• Similarly, the principal moments of inertia of an object are the eigenvalues of
the moment of inertia matrix, with the eigenvectors defining the principal axes
of inertia.
• The natural frequencies of dynamical (mechanical or electrical) systems correspond to eigenvalues (see Section 19.1).
• Stability of dynamical (mechanical or electrical) systems is determined from a
consideration of eigenvalues (see Section 19.3).
2.1 Eigenvalues and Eigenvectors
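For a given $n \times n$ matrix $A$, we form the eigenproblem
\[
Ax = \lambda x, \qquad (2.1)
\]
which may be written as $(A - \lambda I)x = 0$. A nontrivial solution exists only if
\[
|A - \lambda I| = \begin{vmatrix}
A_{11} - \lambda & A_{12} & \cdots & A_{1n} \\
A_{21} & A_{22} - \lambda & \cdots & A_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
A_{n1} & A_{n2} & \cdots & A_{nn} - \lambda
\end{vmatrix} = 0.
\]
Setting this determinant equal to zero results in a polynomial equation of degree $n$ for $\lambda$, which is called the characteristic equation. The $n$ solutions to the
characteristic equation are the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$, which may be real or
complex.
For each $\lambda = \lambda_i$, $i = 1, \ldots, n$, there is a nontrivial solution $u_i = x$, $i = 1, \ldots, n$;
these are the eigenvectors. For example, if $n = 3$, the characteristic equation is
of the form
\[
c_1 \lambda^3 + c_2 \lambda^2 + c_3 \lambda + c_4 = 0;
\]
therefore, there are three eigenvalues $\lambda_1$, $\lambda_2$, and $\lambda_3$ and three corresponding eigenvectors $u_1$, $u_2$, and $u_3$.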
Example 2.1 Find the principal axes and principal stresses for the two-dimensional
stress distribution given by the stress tensor
\[
A = \begin{bmatrix} \tau_{xx} & \tau_{xy} \\ \tau_{yx} & \tau_{yy} \end{bmatrix}
= \begin{bmatrix} 3 & -1 \\ -1 & 3 \end{bmatrix}.
\]
Solution: The normal stresses are $\tau_{xx}$ and $\tau_{yy}$, and $\tau_{xy}$ and $\tau_{yx}$ are the shear
stresses acting on the body. Stresses are defined such that the first subscript
indicates the outward normal to the surface, and the second subscript indicates
the direction of the stress on that face as shown in Figure 2.2. Also recall that
$\tau_{xy} = \tau_{yx}$; therefore, the stress tensor $A$ is always symmetric.
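The principal axes of the stress tensor $A$ correspond to the orientations of
the axes for which only normal stresses act on the body, that is, there are no
shear stresses. The principal axes are the eigenvectors of the stress tensor $A$. The
eigenvalues of the stress tensor are the principal (normal) stresses. To find the
eigenvalues and eigenvectors of $A$, we write $(A - \lambda I)x = 0$, which is
\[
\begin{bmatrix} 3 - \lambda & -1 \\ -1 & 3 - \lambda \end{bmatrix} x = 0. \qquad (2.2)
\]
For a nontrivial solution to exist
\[
\begin{aligned}
\begin{vmatrix} 3 - \lambda & -1 \\ -1 & 3 - \lambda \end{vmatrix} &= 0 \\
(3 - \lambda)(3 - \lambda) - (-1)(-1) &= 0 \\
\lambda^2 - 6\lambda + 8 &= 0 \\
(\lambda - 2)(\lambda - 4) &= 0 \\
\lambda_1 = 2, \quad \lambda_2 &= 4.
\end{aligned}
\]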
2. Observe that the principal axes (eigenvectors) are orthogonal, such that $\langle u_1, u_2 \rangle = 0$. As shown in Section 2.2.1, this is always true for real symmetric matrices
with distinct eigenvalues.
In the above example, the eigenvalues are distinct, that is, $\lambda_1 \neq \lambda_2$. In some
cases, however, eigenvalues may be repeated as in the next example.
Example 2.2
\[
A = \begin{bmatrix} 0 & 0 & 1 & 1 \\ -1 & 2 & 0 & 1 \\ -1 & 0 & 2 & 1 \\ 1 & 0 & -1 & 0 \end{bmatrix}.
\]
To evaluate the determinant of a $4 \times 4$ matrix requires forming a cofactor expansion (see Section 1.4.2). Observing that the second column of our determinant
has all zeros except for one element, we take the cofactor expansion about this
column, which is
\[
+(2 - \lambda) \begin{vmatrix} -\lambda & 1 & 1 \\ -1 & 2 - \lambda & 1 \\ 1 & -1 & -\lambda \end{vmatrix} = 0.
\]
\[
\begin{bmatrix} 0 & 0 & 1 & 1 \\ -1 & 2 & 0 & 1 \\ -1 & 0 & 2 & 1 \\ 1 & 0 & -1 & 0 \end{bmatrix} x = 0.
\]
Although it is not obvious by observation that we lose one equation, note that
the sum of the third and fourth equations equals the first equation. Therefore,
the system reduces to the final three equations for the four unknowns according
to
\[
\begin{aligned}
-x_1 + 2x_2 + x_4 &= 0, \\
-x_1 + 2x_3 + x_4 &= 0, \\
x_1 - x_3 &= 0.
\end{aligned}
\]
Because we have only three equations for four unknowns, let $x_1 = c_1$. Then back
substituting into the above equations leads to $x_3 = c_1$, $x_4 = -c_1$, and $x_2 = c_1$.
Therefore, the eigenvector corresponding to the eigenvalue $\lambda_1 = 0$ is
\[
u_1 = c_1 \begin{bmatrix} 1 \\ 1 \\ 1 \\ -1 \end{bmatrix}.
\]
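Similarly, for the eigenvalue $\lambda_4 = 2$, we have the system
\[
\begin{bmatrix} -2 & 0 & 1 & 1 \\ -1 & 0 & 0 & 1 \\ -1 & 0 & 0 & 1 \\ 1 & 0 & -1 & -2 \end{bmatrix} x = 0.
\]
In order to solve this system of three equations for four unknowns, we can use
Gaussian elimination to reduce the system to
\[
\begin{aligned}
x_1 - x_3 - 2x_4 &= 0, \\
x_3 + 3x_4 &= 0, \\
2x_4 &= 0.
\end{aligned}
\]
Back substitution gives $x_4 = 0$, $x_3 = 0$, and $x_1 = 0$; therefore,
\[
u_4 = c_4 \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix},
\]
where we note that $x_2$ is arbitrary as it does not appear in any of the equations.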
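For the repeated eigenvalue $\lambda_2 = \lambda_3 = 1$, we attempt to find two linearly independent eigenvectors. Setting $\lambda = 1$ in $(A - \lambda I)x = 0$ gives
\[
\begin{bmatrix} -1 & 0 & 1 & 1 \\ -1 & 1 & 0 & 1 \\ -1 & 0 & 1 & 1 \\ 1 & 0 & -1 & -1 \end{bmatrix} x = 0.
\]
The first, third, and fourth rows (equations) are the same, so $x_1$, $x_2$, $x_3$, and $x_4$
are determined from the two equations
\[
\begin{aligned}
-x_1 + x_3 + x_4 &= 0, \\
-x_1 + x_2 + x_4 &= 0.
\end{aligned}
\]
We have two equations for four unknowns requiring two arbitrary constants, so
let $x_1 = c_2$ and $x_2 = c_3$; thus,
\[
\begin{aligned}
x_1 &= c_2, \\
x_2 &= c_3, \\
x_4 &= c_2 - c_3, \\
x_3 &= c_2 - x_4 = c_2 - (c_2 - c_3) = c_3.
\end{aligned}
\]
Therefore, the eigenvectors corresponding to the repeated eigenvalue are
\[
u_{2,3} = \begin{bmatrix} c_2 \\ c_3 \\ c_3 \\ c_2 - c_3 \end{bmatrix}.
\]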
The parameters $c_2$ and $c_3$ are arbitrary, so we may choose any two unique pairs
of values that result in linearly independent eigenvectors. Choosing, for example,
$c_2 = 1$, $c_3 = 1$ for $u_2$ and $c_2 = 1$, $c_3 = 0$ for $u_3$ gives the additional two eigenvectors
(along with $u_1$ and $u_4$)
\[
u_2 = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 0 \end{bmatrix}, \quad
u_3 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \end{bmatrix}.
\]
Note that the arbitrary constant is always implied, even if it is not shown explicitly. In this case, the four eigenvectors are linearly independent; therefore, they
provide a basis for the 4-dimensional vector space associated with matrix $A$.
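A quick numerical check of this example is worthwhile because sign errors are easy to make by hand. The sketch below assumes the matrix entries and eigenpairs written out in the code (eigenvalues 0, 1, 1, 2 with the eigenvectors chosen above); it verifies Au = λu for each pair and confirms linear independence through a nonzero determinant of the matrix having the eigenvectors as columns. The helper names are hypothetical.

```python
# Assumed matrix and eigenpairs (lambda, u) for this check
A = [[0, 0, 1, 1],
     [-1, 2, 0, 1],
     [-1, 0, 2, 1],
     [1, 0, -1, 0]]
eigenpairs = [(0, [1, 1, 1, -1]),
              (1, [1, 1, 1, 0]),
              (1, [1, 0, 0, 1]),
              (2, [0, 1, 0, 0])]

def matvec(a, x):
    """Matrix-vector product."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def det(m):
    """Determinant by cofactor expansion about the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

# Residual A u - lambda u for each pair; every entry should be zero
residuals = [[axi - lam * ui for axi, ui in zip(matvec(A, u), u)]
             for lam, u in eigenpairs]

# Matrix with the eigenvectors as columns; nonzero determinant means independence
P = [[u[i] for _, u in eigenpairs] for i in range(4)]
det_P = det(P)

print(residuals, det_P)
```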
Remarks:
1.
2.
3.
4.
The
The
The
The
where tr(A) is the trace of A, which is the sum of the elements on the main
diagonal.
5. The eigenvectors corresponding to distinct eigenvalues are always linearly independent.
6. It can be shown that $|A| = \lambda_1 \lambda_2 \cdots \lambda_n$. Therefore, $A$ is a singular matrix if
and only if at least one of the eigenvalues is zero.
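7. The eigenvalues of an $n \times n$ diagonal matrix are on the main diagonal, that is,
\[
A = \begin{bmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_n
\end{bmatrix},
\]
and the corresponding eigenvectors are
\[
u_1 = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad
u_2 = \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}, \quad \ldots, \quad
u_n = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix},
\]
which are linearly independent (and mutually orthogonal). If $|A| = \lambda_1 \lambda_2 \cdots \lambda_n \neq 0$, then the inverse
of $A$ is
\[
A^{-1} = \begin{bmatrix}
\frac{1}{\lambda_1} & 0 & \cdots & 0 \\
0 & \frac{1}{\lambda_2} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \frac{1}{\lambda_n}
\end{bmatrix}.
\]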
n = 1, 2, 3, . . . .
That is, if $\lambda_i$ are the eigenvalues of $A$, then the eigenvalues of $A^n$ are $\lambda_i^n$, and
the eigenvectors of $A$ and $A^n$ are the same.
9. Cayley-Hamilton Theorem: If $P_n(\lambda) = 0$ is the $n$th-degree characteristic equation of an $n \times n$ matrix $A$, then $A$ satisfies
\[
P_n(A) = 0.
\]
That is, $A$ satisfies its own characteristic polynomial. See Section 2.7 for an
example of the use of the Cayley-Hamilton theorem.
10. In some cases with repeated eigenvalues, there are fewer eigenvectors than
eigenvalues. For example, an eigenvalue with multiplicity two may only have
one corresponding regular eigenvector. When an $n \times n$ matrix has fewer than
$n$ linearly independent eigenvectors, we say the matrix is defective. A procedure for obtaining generalized eigenvectors in such cases will be provided in
Section 2.4.2.
11. Computer algorithms for finding the eigenvalues and eigenvectors of large matrices are typically based on QR decomposition. Such algorithms are described
in Appendix C.
12. Some applications result in the so-called generalized eigenproblem
\[
Ax = \lambda Bx,
\]
such that the regular eigenproblem (2.1) corresponds to $B = I$. Both types of
eigenproblems can be treated using very similar techniques. Do not confuse the
generalized eigenproblem with generalized eigenvectors; they are not related.
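The Cayley-Hamilton theorem in remark 9 is easy to verify for a small case. For a 2 × 2 matrix, the characteristic polynomial is λ² − tr(A)λ + |A|, so the theorem states that A² − tr(A)A + |A|I is the zero matrix. The sketch below checks this for one illustrative symmetric matrix; the matrix entries and the helper name `matmul` are assumptions for this example.

```python
def matmul(a, b):
    """Matrix-matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

# Illustrative 2 x 2 matrix (assumed for this check)
A = [[3, -1], [-1, 3]]

trace = A[0][0] + A[1][1]                        # sum of the eigenvalues
determinant = A[0][0] * A[1][1] - A[0][1] * A[1][0]  # product of the eigenvalues

# P(A) = A^2 - tr(A) A + |A| I, which should vanish identically
A2 = matmul(A, A)
P_of_A = [[A2[i][j] - trace * A[i][j] + determinant * (1 if i == j else 0)
           for j in range(2)] for i in range(2)]

print(trace, determinant, P_of_A)
```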
52
Au2 = 2 u2 .
(2.3)
(2.4)
53
in order to obtain an orthonormal basis for $n$-space (cf. Gram-Schmidt orthogonalization procedure).
4. If an eigenvalue is repeated $s$ times, there are $s$ corresponding eigenvectors
that are linearly independent, but not necessarily orthogonal (the remaining
$n - s$ eigenvectors are mutually orthogonal).3 The eigenvectors form a basis for
the $n$-dimensional vector space; one can orthogonalize using Gram-Schmidt if
desired.
5. Recall that a Hermitian matrix is such that
\[
A = \bar{A}^T = \begin{bmatrix}
A_{11} & \bar{A}_{21} & \cdots & \bar{A}_{n1} \\
\bar{A}_{12} & A_{22} & \cdots & \bar{A}_{n2} \\
\vdots & \vdots & \ddots & \vdots \\
\bar{A}_{1n} & \bar{A}_{2n} & \cdots & A_{nn}
\end{bmatrix},
\]
where the main diagonal terms must be real. As with real symmetric matrices, the eigenvalues of a Hermitian matrix are real, and the eigenvectors
corresponding to distinct eigenvalues are mutually orthogonal.
The following sections consider two applications involving real symmetric matrices. For an additional application to stability of numerical algorithms, see Section 15.2.
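\[
x = a_1 e_1 + a_2 e_2 + \cdots + a_n e_n, \qquad (2.5)
\]
\[
c = b_1 e_1 + b_2 e_2 + \cdots + b_n e_n. \qquad (2.6)
\]
Premultiplying (2.6) by $e_1^T$ gives
\[
\langle e_1, c \rangle = e_1^T c = b_1 \underbrace{e_1^T e_1}_{1} + b_2 \underbrace{e_1^T e_2}_{0} + \cdots + b_n \underbrace{e_1^T e_n}_{0},
\]
but the eigenvectors are mutually orthogonal (and normalized) because the eigenvalues are distinct; therefore, $\langle e_1, e_i \rangle = 0$ for $i = 2, \ldots, n$, leaving
\[
b_1 = \langle e_1, c \rangle.
\]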
Generalizing, we have
\[
b_i = \langle e_i, c \rangle, \quad i = 1, 2, \ldots, n, \qquad (2.7)
\]
which can all be evaluated to give the constants in equation (2.6). Substituting
(2.5) and (2.6) into $Ax = c$ leads to
\[
\begin{aligned}
A(a_1 e_1 + a_2 e_2 + \cdots + a_n e_n) &= b_1 e_1 + b_2 e_2 + \cdots + b_n e_n \\
a_1 A e_1 + a_2 A e_2 + \cdots + a_n A e_n &= b_1 e_1 + b_2 e_2 + \cdots + b_n e_n \\
a_1 \lambda_1 e_1 + a_2 \lambda_2 e_2 + \cdots + a_n \lambda_n e_n &= b_1 e_1 + b_2 e_2 + \cdots + b_n e_n.
\end{aligned}
\]
Note that we could have used any linearly independent basis vectors in (2.5) and
(2.6), but using the orthonormal eigenvectors as basis vectors allows us to perform
the last step above.
Because the $e_i$'s are linearly independent, each of their coefficients must be
equal according to
\[
a_1 = \frac{b_1}{\lambda_1}, \quad a_2 = \frac{b_2}{\lambda_2}, \quad \ldots, \quad a_n = \frac{b_n}{\lambda_n},
\]
or from (2.7)
\[
a_i = \frac{\langle e_i, c \rangle}{\lambda_i}, \quad i = 1, \ldots, n.
\]
Then from (2.5), the solution vector $x$ for a system with a real symmetric coefficient matrix having distinct eigenvalues is
\[
x = \sum_{i=1}^{n} \frac{\langle e_i, c \rangle}{\lambda_i} e_i. \qquad (2.8)
\]
55
b1
b2
bn
, a2 =
, , an =
.
1
2
n
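\[
\begin{aligned}
2x_1 + x_3 &= 3, \\
x_2 + 2x_3 &= 3, \\
x_1 + 2x_2 &= 3.
\end{aligned}
\]
In matrix form $Ax = c$, the system is
\[
\begin{bmatrix} 2 & 0 & 1 \\ 0 & 1 & 2 \\ 1 & 2 & 0 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} 3 \\ 3 \\ 3 \end{bmatrix},
\]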
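Although there is an exact expression for determining the roots of a cubic polynomial (analogous to the quadratic formula), it is not widely known and not worth
memorizing. Instead, we can make use of mathematical software, such as Matlab
or Mathematica, to do the heavy lifting for us. Doing so results in the eigenvalues
\[
\lambda_1 = 3, \quad \lambda_2 = -\sqrt{3}, \quad \lambda_3 = \sqrt{3},
\]
and the corresponding normalized eigenvectors
\[
e_1 = \frac{1}{\sqrt{3}} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \quad
e_2 = \frac{1}{\sqrt{6(2 - \sqrt{3})}} \begin{bmatrix} -2 + \sqrt{3} \\ 1 - \sqrt{3} \\ 1 \end{bmatrix}, \quad
e_3 = \frac{1}{\sqrt{6(2 + \sqrt{3})}} \begin{bmatrix} -2 - \sqrt{3} \\ 1 + \sqrt{3} \\ 1 \end{bmatrix}.
\]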
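\[
x = \sum_{i=1}^{n} \frac{\langle e_i, c \rangle}{\lambda_i} e_i
= \frac{\langle e_1, c \rangle}{\lambda_1} e_1 + \frac{\langle e_2, c \rangle}{\lambda_2} e_2 + \frac{\langle e_3, c \rangle}{\lambda_3} e_3
\]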
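The eigenvector expansion (2.8) can be evaluated numerically for this system. In the sketch below the eigenpairs are hard-coded as assumptions (eigenvalues 3, −√3, and √3 with unnormalized eigenvectors) and are first checked against Ae = λe before being used; the helper names are hypothetical and only the standard library is needed.

```python
import math

A = [[2.0, 0.0, 1.0], [0.0, 1.0, 2.0], [1.0, 2.0, 0.0]]
c = [3.0, 3.0, 3.0]
s3 = math.sqrt(3.0)

# Assumed eigenpairs (lambda, unnormalized eigenvector) of the symmetric matrix A
eigenpairs = [(3.0, [1.0, 1.0, 1.0]),
              (-s3, [-2.0 + s3, 1.0 - s3, 1.0]),
              (s3, [-2.0 - s3, 1.0 + s3, 1.0])]

def inner(u, v):
    """Inner product <u, v>."""
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(a, x):
    """Matrix-vector product."""
    return [inner(row, x) for row in a]

x = [0.0, 0.0, 0.0]
max_residual = 0.0
for lam, u in eigenpairs:
    length = math.sqrt(inner(u, u))
    e = [ui / length for ui in u]                 # normalized eigenvector e_i
    # largest entry of A e - lambda e, to confirm the assumed eigenpair
    res = max(abs(ae - lam * ei) for ae, ei in zip(matvec(A, e), e))
    max_residual = max(max_residual, res)
    a_i = inner(e, c) / lam                       # a_i = <e_i, c> / lambda_i
    x = [xi + a_i * ei for xi, ei in zip(x, e)]   # accumulate the sum in (2.8)

print(x, max_residual)
```

Only the first term contributes here because c is parallel to e_1, so the inner products of c with the other two eigenvectors vanish to rounding error.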
n X
n
X
Aij xi xj
i=1 j=1
(2.9)
57
1
(xT Ax),
2 xi
i = 1, 2, . . . , n.
(2.10)
Example 2.4
defined by
(2.11)
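$A_{12} = -1$,
$A_{22} = 3$.
\[
\mathbf{A} = \begin{bmatrix} A_{11} & A_{12} \\ A_{12} & A_{22} \end{bmatrix}
= \begin{bmatrix} 3 & -1 \\ -1 & 3 \end{bmatrix}.
\]
Checking (2.11)
\[
\begin{aligned}
A &= \mathbf{x}^T \mathbf{A}\, \mathbf{x} \\
&= \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} 3 & -1 \\ -1 & 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \\
&= \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} 3x_1 - x_2 \\ -x_1 + 3x_2 \end{bmatrix} \\
&= x_1 (3x_1 - x_2) + x_2 (-x_1 + 3x_2) \\
&= 3x_1^2 - 2x_1 x_2 + 3x_2^2.
\end{aligned}
\]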
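where the transformation matrix $\mathbf{Q}$ rotates and/or translates the coordinate system, such that the quadratic is in canonical form with respect to $\mathbf{y}$. Substituting
into equation (2.11) gives
\[
\begin{aligned}
A &= \mathbf{x}^T \mathbf{A}\, \mathbf{x} \\
&= (\mathbf{Q}\mathbf{y})^T \mathbf{A}\, (\mathbf{Q}\mathbf{y}) \\
&= \mathbf{y}^T (\mathbf{Q}^T \mathbf{A} \mathbf{Q}) \mathbf{y} \\
A &= \mathbf{y}^T \mathbf{D}\, \mathbf{y},
\end{aligned}
\]
where $\mathbf{D} = \mathbf{Q}^T \mathbf{A} \mathbf{Q}$ must be a diagonal matrix in order to produce the canonical
form of the quadratic with respect to $\mathbf{y}$. We say that $\mathbf{Q}$ diagonalizes $\mathbf{A}$, such that
premultiplying $\mathbf{A}$ by $\mathbf{Q}^T$ and postmultiplying by $\mathbf{Q}$ gives a matrix $\mathbf{D}$ that must
be diagonal. We call $\mathbf{Q}$ the modal matrix.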
Procedure to find the modal matrix for real symmetric matrices with distinct
eigenvalues:
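1. Determine the eigenvalues, $\lambda_1, \lambda_2, \ldots, \lambda_n$, of the $n \times n$ real symmetric matrix
$\mathbf{A}$.
2. Determine the orthonormal eigenvectors, $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$. Note that they are
already orthogonal owing to the distinct eigenvalues, and they simply need to
be normalized. Then
\[
\mathbf{A}\mathbf{e}_1 = \lambda_1 \mathbf{e}_1, \quad \mathbf{A}\mathbf{e}_2 = \lambda_2 \mathbf{e}_2, \quad \ldots, \quad \mathbf{A}\mathbf{e}_n = \lambda_n \mathbf{e}_n. \qquad (2.12)
\]
3. Construct the orthonormal modal matrix $\mathbf{Q}$ with columns given by the orthonormal eigenvectors $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ as follows:
\[
\underset{n \times n}{\mathbf{Q}} = \begin{bmatrix}
\vdots & \vdots & & \vdots \\
\mathbf{e}_1 & \mathbf{e}_2 & \cdots & \mathbf{e}_n \\
\vdots & \vdots & & \vdots
\end{bmatrix}.
\]
Note: The order of the columns does not matter but corresponds to the order
in which the eigenvalues appear along the diagonal of $\mathbf{D}$.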
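\[
\mathbf{A}\mathbf{Q} = \mathbf{A} \begin{bmatrix}
\vdots & \vdots & & \vdots \\
\mathbf{e}_1 & \mathbf{e}_2 & \cdots & \mathbf{e}_n \\
\vdots & \vdots & & \vdots
\end{bmatrix}
= \begin{bmatrix}
\vdots & \vdots & & \vdots \\
\mathbf{A}\mathbf{e}_1 & \mathbf{A}\mathbf{e}_2 & \cdots & \mathbf{A}\mathbf{e}_n \\
\vdots & \vdots & & \vdots
\end{bmatrix}
= \begin{bmatrix}
\vdots & \vdots & & \vdots \\
\lambda_1 \mathbf{e}_1 & \lambda_2 \mathbf{e}_2 & \cdots & \lambda_n \mathbf{e}_n \\
\vdots & \vdots & & \vdots
\end{bmatrix}.
\]
Then premultiply by $\mathbf{Q}^T$
\[
\mathbf{D} = \mathbf{Q}^T \mathbf{A} \mathbf{Q}
= \begin{bmatrix} \cdots & \mathbf{e}_1 & \cdots \\ \cdots & \mathbf{e}_2 & \cdots \\ & \vdots & \\ \cdots & \mathbf{e}_n & \cdots \end{bmatrix}
\begin{bmatrix}
\vdots & \vdots & & \vdots \\
\lambda_1 \mathbf{e}_1 & \lambda_2 \mathbf{e}_2 & \cdots & \lambda_n \mathbf{e}_n \\
\vdots & \vdots & & \vdots
\end{bmatrix}
= \begin{bmatrix}
\lambda_1 \langle \mathbf{e}_1, \mathbf{e}_1 \rangle & \lambda_2 \langle \mathbf{e}_1, \mathbf{e}_2 \rangle & \cdots & \lambda_n \langle \mathbf{e}_1, \mathbf{e}_n \rangle \\
\lambda_1 \langle \mathbf{e}_2, \mathbf{e}_1 \rangle & \lambda_2 \langle \mathbf{e}_2, \mathbf{e}_2 \rangle & \cdots & \lambda_n \langle \mathbf{e}_2, \mathbf{e}_n \rangle \\
\vdots & \vdots & \ddots & \vdots \\
\lambda_1 \langle \mathbf{e}_n, \mathbf{e}_1 \rangle & \lambda_2 \langle \mathbf{e}_n, \mathbf{e}_2 \rangle & \cdots & \lambda_n \langle \mathbf{e}_n, \mathbf{e}_n \rangle
\end{bmatrix}.
\]
Because all vectors are of unit length and are mutually orthogonal, $\mathbf{D}$ is the
diagonal matrix
\[
\mathbf{D} = \begin{bmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_n
\end{bmatrix}.
\]
Thus,
\[
A = \mathbf{y}^T \mathbf{D} \mathbf{y} = \lambda_1 y_1^2 + \lambda_2 y_2^2 + \cdots + \lambda_n y_n^2,
\]
which is in canonical form with respect to the coordinate system $\mathbf{y}$.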
Remarks:
1. Not only is $\mathbf{D}$ diagonal, we know what it is once we have the eigenvalues. That
is, it is not actually necessary to evaluate $\mathbf{Q}^T \mathbf{A} \mathbf{Q}$.
2. The order in which the eigenvalues $\lambda_i$ appear in $\mathbf{D}$ corresponds to the order
in which the corresponding eigenvectors $\mathbf{e}_i$ are placed in the modal matrix $\mathbf{Q}$.
3. In the above case for quadratic forms, $\mathbf{Q}^T = \mathbf{Q}^{-1}$, and we have an orthogonal
modal matrix.
4. The columns (and rows) of an orthogonal matrix form an orthonormal set of
vectors.
5. If $\mathbf{A}$ is symmetric, but with repeated eigenvalues, then Gram-Schmidt orthogonalization cannot be used to produce orthonormal eigenvectors from linearly
independent ones to form $\mathbf{Q}$. This is because Gram-Schmidt orthogonalization
does not preserve eigenvectors.
6. Not only is transforming quadratic forms to canonical form an application
of diagonalization, it provides us with a geometric interpretation of what the
diagonalization procedure is designed to accomplish in other settings as well.
7. The diagonalization procedure introduced here is a special case (when $\mathbf{Q}$ is
orthogonal) of the general diagonalization procedure presented in Section 2.4
for symmetric $\mathbf{A}$ or nonsymmetric $\mathbf{A}$ with distinct eigenvalues.
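The diagonalization of a quadratic form can be confirmed with a small calculation. The sketch below assumes an illustrative 2 × 2 symmetric matrix with diagonal entries 3 and off-diagonal entries −1, whose orthonormal eigenvectors are taken to be (1, 1)/√2 for eigenvalue 2 and (−1, 1)/√2 for eigenvalue 4; these values and the helper names are assumptions of this example. It forms Q with those eigenvectors as columns, evaluates the product of the transpose of Q, A, and Q, and compares the quadratic form in the x and y coordinates.

```python
import math

def matmul(a, b):
    """Matrix-matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Transpose of a nested-list matrix."""
    return [list(col) for col in zip(*a)]

# Illustrative symmetric matrix (assumed for this sketch)
A = [[3.0, -1.0], [-1.0, 3.0]]

# Orthonormal modal matrix: columns are e1 = [1, 1]/sqrt(2), e2 = [-1, 1]/sqrt(2)
r = 1.0 / math.sqrt(2.0)
Q = [[r, -r], [r, r]]

D = matmul(transpose(Q), matmul(A, Q))   # expected to be diag(2, 4)

# Quadratic form in both coordinate systems for a sample y, with x = Q y
y = [1.0, 2.0]
x = [Q[0][0] * y[0] + Q[0][1] * y[1], Q[1][0] * y[0] + Q[1][1] * y[1]]
form_x = 3 * x[0] ** 2 - 2 * x[0] * x[1] + 3 * x[1] ** 2   # x^T A x
form_y = D[0][0] * y[0] ** 2 + D[1][1] * y[1] ** 2         # y^T D y

print(D, form_x, form_y)
```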
2.3 Normal Matrices
We focus our attention in Section 2.2 on real symmetric matrices because of
their prevalence in applications. The two primary results encountered for real
symmetric A are:
1. The eigenvectors of $A$ corresponding to distinct eigenvalues are mutually orthogonal.
2. The matrix $A$ is diagonalizable using the orthonormal modal matrix $Q$ having
the orthonormal eigenvectors of $A$ as its columns according to
\[
D = Q^T A Q,
\]
where $D$ is a diagonal matrix containing the eigenvalues of $A$. This is referred
to as a similarity transformation because $A$ and $D$ have the same eigenvalues.6
One may wonder whether a real symmetric matrix is the most general matrix for
which these two results are true. It turns out that it is not.
The most general matrix for which they hold is a normal matrix. A normal
matrix is such that it commutes with its conjugate transpose so that
\[
\bar{A}^T A = A \bar{A}^T,
\]
or
\[
A^T A = A A^T,
\]
if $A$ is real.
Clearly symmetric and Hermitian matrices are normal. In addition, all orthogonal, skew-symmetric, unitary, and skew-Hermitian matrices are normal. However,
not all normal matrices are one of these forms. For example, the matrix
\[
\begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix}
\]
is normal, but it is not orthogonal, symmetric, or skew-symmetric.
6
Two matrices are said to be similar if they have the same eigenvalues.
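\[
\begin{aligned}
A^T A - A A^T &= 0, \\
\begin{bmatrix} A_{11} & A_{21} \\ A_{12} & A_{22} \end{bmatrix}
\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
- \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
\begin{bmatrix} A_{11} & A_{21} \\ A_{12} & A_{22} \end{bmatrix} &= 0, \\
\begin{bmatrix} A_{11}^2 + A_{21}^2 & A_{11}A_{12} + A_{21}A_{22} \\ A_{11}A_{12} + A_{21}A_{22} & A_{12}^2 + A_{22}^2 \end{bmatrix}
- \begin{bmatrix} A_{11}^2 + A_{12}^2 & A_{11}A_{21} + A_{12}A_{22} \\ A_{11}A_{21} + A_{12}A_{22} & A_{21}^2 + A_{22}^2 \end{bmatrix} &= 0, \\
\begin{bmatrix} A_{21}^2 - A_{12}^2 & A_{11}A_{12} + A_{21}A_{22} - A_{11}A_{21} - A_{12}A_{22} \\
A_{11}A_{12} + A_{21}A_{22} - A_{11}A_{21} - A_{12}A_{22} & A_{12}^2 - A_{21}^2 \end{bmatrix} &= 0, \\
(A_{12} - A_{21}) \begin{bmatrix} -A_{12} - A_{21} & A_{11} - A_{22} \\ A_{11} - A_{22} & A_{12} + A_{21} \end{bmatrix} &= 0.
\end{aligned}
\]
$A_{11} = A_{22}$,
in which case
\[
A = \begin{bmatrix} a & b \\ -b & a \end{bmatrix}.
\]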
Remarks:
1. The result from Section 2.2.1 applies for $A$ normal. Specifically, the solution
to
\[
Ax = c,
\]
where $A$ is an $n \times n$ normal matrix, is
\[
x = \sum_{i=1}^{n} \frac{\langle e_i, c \rangle}{\lambda_i} e_i,
\]
2.4 Diagonalization
63
Table 2.1 Summary of the eigenvectors of a matrix based on the type of matrix and the nature
of the eigenvalues.
Symmetric A
Nonsymmetric A
Distinct Eigenvalues
Repeated Eigenvalues
LinearlyIndependent Eigenvectors
1.
2.
3.
4.
See Variational Methods with Applications in Science and Engineering, Section 6.5.
64
matrix with repeated eigenvalues, however, the eigenvectors are not linearly independent, in general. Therefore, it cannot be fully diagonalized; however, the
same basic procedure will produce the so-called Jordan canonical form, which is
nearly diagonalized.
2.4.1 Matrices with LinearlyIndependent Eigenvectors
The diagonalization procedure described in Section 2.2.3 for real symmetric matrices with distinct eigenvalues is a special case of the general procedure given
here. If the eigenvectors are linearly independent (including those that are also
orthogonal), then the modal matrix8 $P$, whose columns are the eigenvectors of
$A$, produces the similarity transformation
\[
D = P^{-1} A P = \begin{bmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_n
\end{bmatrix},
\]
where $D$ is diagonal with the eigenvalues along the diagonal as shown.
Remarks:
1. To prove that $A$ and $D$ are similar, consider the following:
\[
\begin{aligned}
|D - \lambda I| &= |P^{-1} A P - \lambda I| \\
&= |P^{-1} (A - \lambda I) P| \\
&= |P^{-1}|\, |A - \lambda I|\, |P| \\
&= \frac{1}{|P|}\, |A - \lambda I|\, |P| \\
|D - \lambda I| &= |A - \lambda I|;
\end{aligned}
\]
therefore, $D$ and $A$ have the same eigenvalues.
2. The general diagonalization procedure requires premultiplying by the inverse
of the modal matrix. When the modal matrix is orthogonal, as for symmetric
$A$ with distinct eigenvalues, then $P^{-1} = P^T = Q^T$. Note that whereas it is
necessary for the orthonormal modal matrix $Q$ to be formed from the normalized eigenvectors, it is not necessary to normalize the eigenvectors when
forming $P$; they simply need to be linearly independent.
3. An $n \times n$ matrix $A$ can be diagonalized if there are $n$ regular eigenvectors
that are linearly independent [see O'Neil (2012), for example, for a proof].
Consequently, all symmetric matrices can be diagonalized. A nonsymmetric matrix
is guaranteed to be diagonalizable if its eigenvalues are distinct; with repeated
eigenvalues, it is diagonalizable only if it nonetheless has $n$ linearly independent
eigenvectors (as in Example 2.2).
4. It is not necessary to evaluate $P^{-1} A P$, because we know the result $D$ if the
eigenvectors are linearly independent.
5. The term modal matrix arises from its application in dynamical systems in
which the diagonalization, or decoupling, procedure leads to isolation of the
natural modes of vibration of the system, which correspond to the eigenvalues.
The general motion of the system is then a superposition (linear combination)
of these modes (see Section 19.1).
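To see the general procedure at work for a nonsymmetric matrix, the following sketch diagonalizes a small example with exact rational arithmetic. The matrix, its eigenvectors (not normalized, as permitted by remark 2), and the helper names `inverse_2x2` and `matmul` are all assumptions introduced for this illustration.

```python
from fractions import Fraction

def matmul(a, b):
    """Matrix-matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inverse_2x2(p):
    """Inverse of a 2 x 2 matrix from its adjugate and determinant."""
    d = p[0][0] * p[1][1] - p[0][1] * p[1][0]
    if d == 0:
        raise ValueError("modal matrix is singular")
    return [[Fraction(p[1][1], d), Fraction(-p[0][1], d)],
            [Fraction(-p[1][0], d), Fraction(p[0][0], d)]]

# Illustrative nonsymmetric matrix with distinct eigenvalues 2 and 5
A = [[4, 1], [2, 3]]

# Modal matrix: columns are the eigenvectors [1, -2] (lambda = 2) and [1, 1] (lambda = 5)
P = [[1, 1], [-2, 1]]

D = matmul(inverse_2x2(P), matmul(A, P))
print(D)
```

The eigenvalues appear on the diagonal of D in the same order as their eigenvectors appear in the columns of P.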
2.4.2 Jordan Canonical Form
If an $n \times n$ matrix $A$ has fewer than $n$ regular eigenvectors, as may be the case
for a nonsymmetric matrix with repeated eigenvalues, it is called defective and
additional generalized eigenvectors may be obtained such that they are linearly
independent with the regular eigenvectors in order to form the modal matrix. In
this case, $P^{-1} A P$ results in the Jordan canonical form, which is not completely
diagonalized.
For example, if $A$ has two repeated eigenvalues $\lambda_1$ (multiplicity two) and three
repeated eigenvalues $\lambda_2$ (multiplicity three), then the Jordan canonical form is
\[
J = P^{-1} A P = \begin{bmatrix}
\lambda_1 & a_1 & 0 & 0 & 0 \\
0 & \lambda_1 & 0 & 0 & 0 \\
0 & 0 & \lambda_2 & a_2 & 0 \\
0 & 0 & 0 & \lambda_2 & a_3 \\
0 & 0 & 0 & 0 & \lambda_2
\end{bmatrix},
\]
where $a_1$, $a_2$, and $a_3$ above the repeated eigenvalues are 0 or 1.9 Unfortunately,
the only way to find the Jordan canonical matrix $J$ is to actually evaluate $P^{-1} A P$
just to determine the elements above the repeated eigenvalues.
In order to form the modal matrix $P$ when there are fewer than $n$ regular eigenvectors, we need a procedure for obtaining the necessary generalized eigenvectors
from the regular ones. Recall that regular eigenvectors $u_i$ satisfy the eigenproblem
\[
(A - \lambda_i I) u_i = 0.
\]
If for a given eigenvalue, say $\lambda_1$, with multiplicity $k$, we only obtain one regular eigenvector, then we need to obtain $k - 1$ generalized eigenvectors. These
generalized eigenvectors satisfy the sequence of eigenproblems
\[
(A - \lambda_1 I)^m u_m = 0, \quad m = 2, 3, \ldots, k.
\]
Note that the regular eigenvector $u_1$ results from the case with $m = 1$.
Rather than taking successive integer powers of $(A - \lambda_1 I)$ and obtaining the
corresponding eigenvectors, observe the following. If
\[
(A - \lambda_1 I) u_1 = 0 \qquad (2.13)
\]
produces the regular eigenvector $u_1$, then the generalized eigenvector $u_2$ satisfies
\[
(A - \lambda_1 I)^2 u_2 = 0. \qquad (2.14)
\]
Premultiplying $(A - \lambda_1 I) u_2 = u_1$ by $(A - \lambda_1 I)$ gives
\[
(A - \lambda_1 I)^2 u_2 = (A - \lambda_1 I) u_1,
\]
but from equation (2.13), the right-hand side is zero and we have equation (2.14).
Thus, the generalized eigenvectors can be obtained by successively solving the
systems of equations
\[
(A - \lambda_1 I) u_m = u_{m-1}, \quad m = 2, 3, \ldots, k. \qquad (2.15)
\]
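Example 2.6 Determine the regular and generalized eigenvectors for the nonsymmetric matrix
\[
A = \begin{bmatrix} 2 & -1 & 2 & 0 \\ 0 & 3 & -1 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 1 & -3 & 5 \end{bmatrix}.
\]
Then obtain the generalized modal matrix that reduces $A$ to the Jordan canonical
form, and determine the Jordan canonical form.
Solution: The eigenvalues are
\[
\lambda_1 = \lambda_2 = \lambda_3 = 2, \quad \lambda_4 = 5.
\]
The corresponding regular eigenvectors are
\[
u_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad
u_4 = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}.
\]
Therefore, we only have two regular eigenvectors.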
In order to determine the two generalized eigenvectors, $u_2$ and $u_3$, corresponding to $\lambda = 2$, we solve the system of equations
\[
(A - \lambda_1 I) u_2 = u_1 \qquad (2.16)
\]
to obtain $u_2$ from $u_1$, and then
\[
(A - \lambda_1 I) u_3 = u_2 \qquad (2.17)
\]
to obtain $u_3$ from $u_2$. For example, using Gaussian elimination to solve the two
systems of equations in succession, we obtain
\[
u_2 = \begin{bmatrix} c_1 \\ 1 \\ 1 \\ 2/3 \end{bmatrix}, \quad
u_3 = \begin{bmatrix} c_2 \\ c_1 + 2 \\ c_1 + 1 \\ 2c_1/3 + 5/9 \end{bmatrix},
\]
where $c_1$ and $c_2$ are arbitrary and arise because (2.16) and (2.17) do not have
unique solutions owing to the fact that $|A - \lambda_1 I| = 0$. Choosing $c_1 = c_2 = 0$, we
have the generalized modal matrix
\[
P = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 2 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 2/3 & 5/9 & 1 \end{bmatrix}.
\]
This procedure produces four linearly independent regular and generalized eigenvectors. Pseudo-diagonalizing then gives the Jordan canonical form
\[
J = P^{-1} A P = \begin{bmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 5 \end{bmatrix}.
\]
Note that this requires that we actually invert $P$ and evaluate $P^{-1} A P$, unlike
cases for which $D = P^{-1} A P$ is a diagonal matrix. Observe that the eigenvalues
are on the diagonal with 0 or 1 above the repeated eigenvalues as expected.
Note that the generalized eigenvector(s), and the regular eigenvector(s) from
which it is obtained, must be placed in the order in which they are obtained to
form the modal matrix $P$.
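The Jordan form of this example can be checked without inverting P by confirming that AP = PJ, which is equivalent to J being the product of the inverse of P, A, and P whenever P is invertible. The sketch below uses exact rational arithmetic; the entries of A, P, and J written in the code (including their signs) are assumptions of this check, as are the helper names.

```python
from fractions import Fraction as F

def matmul(a, b):
    """Matrix-matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

# Assumed matrix, generalized modal matrix (c1 = c2 = 0), and Jordan form
A = [[2, -1, 2, 0],
     [0, 3, -1, 0],
     [0, 1, 1, 0],
     [0, 1, -3, 5]]
P = [[1, 0, 0, 0],
     [0, 1, 2, 0],
     [0, 1, 1, 0],
     [0, F(2, 3), F(5, 9), 1]]
J = [[2, 1, 0, 0],
     [0, 2, 1, 0],
     [0, 0, 2, 0],
     [0, 0, 0, 5]]

AP = matmul(A, P)
PJ = matmul(P, J)

# Chain relations (2.16) and (2.17): (A - 2I) u2 = u1 and (A - 2I) u3 = u2
A_shift = [[A[i][j] - (2 if i == j else 0) for j in range(4)] for i in range(4)]
columns = [[row[j] for row in P] for j in range(4)]          # u1, u2, u3, u4
chain_2 = [sum(a * u for a, u in zip(row, columns[1])) for row in A_shift]
chain_3 = [sum(a * u for a, u in zip(row, columns[2])) for row in A_shift]

print(AP == PJ, chain_2 == columns[0], chain_3 == columns[1])
```

The ones above the diagonal of J record exactly these chain relations, which is why the generalized eigenvectors must be placed in P in the order in which they are obtained.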
68
Problem Set # 3
\[
\dot{x}(t) = \frac{dx}{dt} = a x(t),
\]
where $a$ is a constant, is
\[
x(t) = c e^{at},
\]
where $c$ is a constant of integration. A dot denotes differentiation with respect to
the independent variable $t$.10
Now consider a system of n coupled firstorder linear ordinary differential equations
x 1 (t) = A11 x1 (t) + A12 x2 (t) + + A1n xn (t) + f1 (t)
x 2 (t) = A21 x1 (t) + A22 x2 (t) + + A2n xn (t) + f2 (t)
,
..
.
x n (t) = An1 x1 (t) + An2 x2 (t) + + Ann xn (t) + fn (t)
where the Aij coefficients and fi (t) functions are known, and the functions x1 (t),
x2 (t), . . ., xn (t) are to be determined. If fi (t) = 0, i = 1, . . . , n, the system is
homogeneous. This system may be written in matrix form as
x(t)
= Ax(t) + f (t).
(2.18)
In order to solve this coupled system, we transform the solution vector x(t) to a
new vector of dependent variables y(t) for which the equations are easily solved.
In particular, we diagonalize the coefficient matrix A such that with respect to
the new coordinates y, the system is uncoupled. This is accomplished using
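\[ x(t) = P y(t), \tag{2.19} \]
where the columns of the modal matrix $P$ are the eigenvectors of $A$:
\[ P = \begin{bmatrix} \vdots & \vdots & & \vdots \\ u_1 & u_2 & \cdots & u_n \\ \vdots & \vdots & & \vdots \end{bmatrix}. \]
Because $P$ is constant, differentiating (2.19) gives
\[ \dot{x}(t) = P \dot{y}(t), \tag{2.20} \]
and substituting (2.19) and (2.20) into (2.18) yields
\[ P \dot{y}(t) = A P y(t) + f(t). \]
Premultiplying by $P^{-1}$ gives
\[ \dot{y}(t) = P^{-1} A P y(t) + P^{-1} f(t). \]
$^{10}$ We follow the convention of indicating ordinary differentiation with respect to time using dots and primes otherwise, for example, with respect to $x$.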
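Recall that unless $A$ is nonsymmetric with repeated eigenvalues, $D = P^{-1}AP$ is a diagonal matrix with the eigenvalues of $A$ on the main diagonal. For example, for a homogeneous system, having $f(t) = 0$, we have
\[
\begin{bmatrix} \dot{y}_1(t) \\ \dot{y}_2(t) \\ \vdots \\ \dot{y}_n(t) \end{bmatrix}
=
\begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}
\begin{bmatrix} y_1(t) \\ y_2(t) \\ \vdots \\ y_n(t) \end{bmatrix},
\]
or
\[
\begin{aligned}
\dot{y}_1(t) &= \lambda_1 y_1(t), \\
\dot{y}_2(t) &= \lambda_2 y_2(t), \\
&\;\;\vdots \\
\dot{y}_n(t) &= \lambda_n y_n(t).
\end{aligned}
\]
Therefore, the differential equations have been uncoupled, and the solutions in terms of the transformed variables are
\[ y_1(t) = c_1 e^{\lambda_1 t}, \quad y_2(t) = c_2 e^{\lambda_2 t}, \quad \ldots, \quad y_n(t) = c_n e^{\lambda_n t}, \]
or in vector form
\[ y(t) = \begin{bmatrix} c_1 e^{\lambda_1 t} \\ c_2 e^{\lambda_2 t} \\ \vdots \\ c_n e^{\lambda_n t} \end{bmatrix}. \]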
Now transform back to determine the solution in terms of the original variable
x(t) using
x(t) = Py(t).
The constants of integration, ci , i = 1, . . . , n, are determined using the initial
conditions for the differential equations.
Remarks:
1. Fully uncoupling the equations with this procedure requires $n$ linearly independent eigenvectors; therefore, it works if $A$ is symmetric or if $A$ is nonsymmetric with distinct eigenvalues.
2. If $A$ is nonsymmetric with repeated eigenvalues, that is, the eigenvectors are not all linearly independent and a Jordan canonical matrix results from the diagonalization procedure, then the system of equations in $y$ can often still be solved, although they are not completely uncoupled.
3. Note that if $A$ is symmetric with distinct eigenvalues, then $P^{-1} = P^T$ (if $P$ is comprised of orthonormal eigenvectors), and $P = Q$ is an orthogonal matrix.
4. If the system is not homogeneous, determine the homogeneous and particular
solutions of the uncoupled equations and sum to obtain the general solution.
5. Although we can write down the uncoupled solution in terms of y(t) having
only the eigenvalues, we need the modal matrix P to transform back to the
original variable x(t).
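6. Alternatively, reconsider the formulation for homogeneous systems
\[ \dot{x}(t) = A x(t). \tag{2.21} \]
The solutions are of the form
\[ x_i(t) = u_i e^{\lambda_i t}. \tag{2.22} \]
Then
\[ \dot{x}_i(t) = \lambda_i u_i e^{\lambda_i t}, \]
and substituting into equation (2.21) leads to
\[ \lambda_i u_i e^{\lambda_i t} = A u_i e^{\lambda_i t}; \]
therefore,
\[ A u_i = \lambda_i u_i. \]
Observe that we have an eigenproblem for the coefficient matrix $A$, with the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ and eigenvectors $u_1, u_2, \ldots, u_n$. After obtaining the eigenvalues and eigenvectors, the solutions (2.22) are
\[ x_1(t) = u_1 e^{\lambda_1 t}, \quad x_2(t) = u_2 e^{\lambda_2 t}, \quad \ldots, \quad x_n(t) = u_n e^{\lambda_n t}, \]
and the general solution is
\[ x(t) = c_1 u_1 e^{\lambda_1 t} + c_2 u_2 e^{\lambda_2 t} + \cdots + c_n u_n e^{\lambda_n t} \quad (= Py). \]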
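\[
\begin{aligned}
\frac{dx_1}{dt} &= x_2 + x_3, \\
\frac{dx_2}{dt} &= x_1 + x_3, \\
\frac{dx_3}{dt} &= x_1 + x_2.
\end{aligned}
\]
Solution: We write this system in matrix form $\dot{x} = Ax$, or
\[
\begin{bmatrix} \dot{x}_1(t) \\ \dot{x}_2(t) \\ \dot{x}_3(t) \end{bmatrix}
=
\begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}
\begin{bmatrix} x_1(t) \\ x_2(t) \\ x_3(t) \end{bmatrix}.
\]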
Note that the coefficient matrix is symmetric; therefore, the eigenvectors are linearly independent (even if the eigenvalues are not distinct). The characteristic equation for the coefficient matrix is
\[ \lambda^3 - 3\lambda - 2 = 0, \]
which gives the eigenvalues
\[ \lambda_1 = -1, \quad \lambda_2 = -1, \quad \lambda_3 = 2. \]
Consider the repeated eigenvalue $\lambda_1 = \lambda_2 = -1$. Substituting into $(A - \lambda I)x = 0$ gives the single equation
\[ x_1 + x_2 + x_3 = 0. \]
Let $x_1 = d_1$, $x_2 = d_2$, then $x_3 = -d_1 - d_2$ and we have
\[ u_{1,2} = \begin{bmatrix} d_1 \\ d_2 \\ -d_1 - d_2 \end{bmatrix}. \]
To obtain $u_1$, we choose $d_1 = 1$, $d_2 = 0$ resulting in
\[ u_1 = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \]
and for $u_2$ we choose $d_1 = 1$, $d_2 = -1$ to give
\[ u_2 = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}. \]
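Be sure to check and confirm that $u_1$ and $u_2$ are linearly independent. Substituting $\lambda_3 = 2$ gives the two equations (after some row reduction)
\[
\begin{aligned}
-2x_1 + x_2 + x_3 &= 0, \\
-x_2 + x_3 &= 0.
\end{aligned}
\]
Letting $x_1 = d_1$, these give $x_2 = x_3 = d_1$, so that
\[ u_3 = d_1 \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}. \]
Now we form the modal matrix (taking $d_1 = 1$)
\[ P = \begin{bmatrix} 1 & 1 & 1 \\ 0 & -1 & 1 \\ -1 & 0 & 1 \end{bmatrix}, \]
and the solution $x(t) = Py(t)$ is
\[
\begin{bmatrix} x_1(t) \\ x_2(t) \\ x_3(t) \end{bmatrix}
=
\begin{bmatrix} 1 & 1 & 1 \\ 0 & -1 & 1 \\ -1 & 0 & 1 \end{bmatrix}
\begin{bmatrix} c_1 e^{-t} \\ c_2 e^{-t} \\ c_3 e^{2t} \end{bmatrix}.
\]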
The coefficients c1 , c2 , and c3 would be determined using the three required initial
conditions. For example, the solution with initial conditions x1 (0) = 1, x2 (0) =
0, and x3 (0) = 2 is given in Figure 2.4.
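The diagonalization procedure for this example can be sketched numerically as follows. This is a minimal illustration, assuming NumPy is available; the function name `solve` is hypothetical. Because the coefficient matrix is symmetric, `numpy.linalg.eigh` is used, which returns orthonormal eigenvectors (a different but equivalent choice from the eigenvectors selected by hand above), so that $P^{-1} = P^T$.

```python
import numpy as np

# Coefficient matrix of the symmetric example system x' = A x.
A = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])
x0 = np.array([1.0, 0.0, 2.0])  # initial conditions x1(0), x2(0), x3(0)

# eigh is intended for symmetric matrices; the columns of P are orthonormal.
lam, P = np.linalg.eigh(A)

# The constants of integration follow from y(0) = P^{-1} x(0) = P^T x(0).
c = P.T @ x0

def solve(t):
    """Return x(t) = P y(t), where y_i(t) = c_i * exp(lam_i * t)."""
    return P @ (c * np.exp(lam * t))

x_half = solve(0.5)
```

For these initial conditions the result agrees with the closed form $x(t) = [1, 1, 1]^T e^{2t} + [0, -1, 1]^T e^{-t}$.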
In the next example, the system of equations that results from applying the
relevant physical principle is not of the usual form.
Example 2.8 Obtain the differential equations governing the parallel electrical
circuit shown in Figure 2.5.
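Solution: In circuit analysis, we define the current $I$, resistance $R$, voltage $V$, inductance $L$, and capacitance $C$, which are related as follows. Ohm's law gives the voltage drop across a resistor as
\[ V = IR, \]
the voltage drop across an inductor is
\[ V = L \frac{dI}{dt}, \]
and the current through a capacitor is
\[ I = C \frac{dV}{dt}, \]
in which case
\[ V = \frac{1}{C} \int I \, dt. \]
Summing the voltage drops around each loop of the circuit gives the following.
Loop 1:
\[ V_L + V_{R_1} = 0 \]
\[ L \frac{dI_1}{dt} + (I_1 - I_2) R_1 = 0. \]
Loop 2:
\[ V_C + V_{R_2} + V_{R_1} = 0 \]
\[ \frac{1}{C} \int I_2 \, dt + I_2 R_2 + (I_2 - I_1) R_1 = 0. \]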
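Differentiating the second equation with respect to $t$ (to remove the integral) leads to
\[ I_2 + C R_2 \frac{dI_2}{dt} + C R_1 \left( \frac{dI_2}{dt} - \frac{dI_1}{dt} \right) = 0. \]
Therefore, we have the system of ordinary differential equations for the currents $I_1(t)$ and $I_2(t)$ given by
\[
\begin{aligned}
L \frac{dI_1}{dt} &= -R_1 I_1 + R_1 I_2, \\
-C R_1 \frac{dI_1}{dt} + C(R_1 + R_2) \frac{dI_2}{dt} &= -I_2,
\end{aligned}
\]
or in matrix form
\[
\underbrace{\begin{bmatrix} L & 0 \\ -C R_1 & C(R_1 + R_2) \end{bmatrix}}_{A_1}
\begin{bmatrix} \dot{I}_1(t) \\ \dot{I}_2(t) \end{bmatrix}
=
\underbrace{\begin{bmatrix} -R_1 & R_1 \\ 0 & -1 \end{bmatrix}}_{A_2}
\begin{bmatrix} I_1(t) \\ I_2(t) \end{bmatrix},
\]
or
\[ A_1 \dot{x}(t) = A_2 x(t). \]
To obtain the usual form ($\dot{x} = Ax$), premultiply both sides by $A_1^{-1}$ to give
\[ \dot{x}(t) = A_1^{-1} A_2 x(t). \]
Therefore,
\[ A = A_1^{-1} A_2. \]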
We would then diagonalize A as usual in order to solve for I1 (t) and I2 (t).
A sample solution with R1 = 10, R2 = 3, L = 30, C = 5 and initial conditions
I1 (0) = 10, and I2 (0) = 5 is shown in Figure 2.6. Note that because there is no
voltage source, the current decays with time via a damped oscillation.
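The sample case can be checked with a short numerical sketch, assuming NumPy is available; the function name `currents` is hypothetical. The matrices below write the second loop equation with the derivative terms on the left, as $-CR_1 \dot{I}_1 + C(R_1 + R_2)\dot{I}_2 = -I_2$, which is one consistent sign convention for $A_1$ and $A_2$. The product $A_1^{-1}A_2$ is formed with `numpy.linalg.solve` rather than an explicit inverse.

```python
import numpy as np

R1, R2, L, C = 10.0, 3.0, 30.0, 5.0

A1 = np.array([[L, 0.0],
               [-C * R1, C * (R1 + R2)]])
A2 = np.array([[-R1, R1],
               [0.0, -1.0]])

# A = A1^{-1} A2, computed by solving A1 A = A2.
A = np.linalg.solve(A1, A2)

# A is nonsymmetric, so use the general eigensolver (complex output here).
lam, P = np.linalg.eig(A)

I0 = np.array([10.0, 5.0])       # I1(0), I2(0)
c = np.linalg.solve(P, I0)       # constants from y(0) = P^{-1} x(0)

def currents(t):
    """Return [I1(t), I2(t)]; imaginary parts cancel for real initial data."""
    return (P @ (c * np.exp(lam * t))).real
```

The eigenvalues form a complex conjugate pair with negative real part, which is consistent with the damped oscillation described above.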
Next we consider an example consisting of a nonhomogeneous system of equations having $f(t) \neq 0$.
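Example 2.9 Consider the nonhomogeneous system
\[
\begin{aligned}
\dot{x}_1(t) &= 2x_2 + \sin t, \\
\dot{x}_2(t) &= 2x_1 + t,
\end{aligned}
\]
or
\[ \dot{x}(t) = A x(t) + f(t), \]
where
\[ A = \begin{bmatrix} 0 & 2 \\ 2 & 0 \end{bmatrix}, \qquad f(t) = \begin{bmatrix} \sin t \\ t \end{bmatrix}. \]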
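\[ \dot{y}(t) = P^{-1} A P y(t) + P^{-1} f(t). \]
Recalling that $D = P^{-1}AP$ has the eigenvalues along its main diagonal and evaluating $P^{-1} f(t)$, this becomes
\[
\begin{bmatrix} \dot{y}_1(t) \\ \dot{y}_2(t) \end{bmatrix}
=
\begin{bmatrix} 2 & 0 \\ 0 & -2 \end{bmatrix}
\begin{bmatrix} y_1(t) \\ y_2(t) \end{bmatrix}
+ \frac{1}{2} \begin{bmatrix} \sin t + t \\ \sin t - t \end{bmatrix};
\]
therefore, we have the two uncoupled equations
\[
\begin{aligned}
\dot{y}_1(t) &= 2y_1(t) + \tfrac{1}{2} \sin t + \tfrac{1}{2} t, \\
\dot{y}_2(t) &= -2y_2(t) + \tfrac{1}{2} \sin t - \tfrac{1}{2} t.
\end{aligned}
\]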
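Because the equations are nonhomogeneous, the solutions are of the form
\[
\begin{aligned}
y_1(t) &= y_{1c}(t) + y_{1p}(t), \\
y_2(t) &= y_{2c}(t) + y_{2p}(t),
\end{aligned}
\]
where the subscripts $c$ and $p$ denote the complementary (homogeneous) and particular solutions, respectively. We determine the particular solutions using the method of undetermined coefficients, which works for right-hand sides involving polynomials, exponential functions, and trigonometric functions.
Consider the equation for $y_1(t)$. Based on its right-hand side, we seek a particular solution of the form
\[ y_{1p}(t) = A \sin t + B \cos t + Ct + D, \]
for which
\[ y_{1p}'(t) = A \cos t - B \sin t + C. \]
Substituting yields
\[ A \cos t - B \sin t + C = 2(A \sin t + B \cos t + Ct + D) + \tfrac{1}{2} \sin t + \tfrac{1}{2} t. \]
Equating like terms leads to four equations for the four unknown constants as follows
\[
\begin{aligned}
\cos t &: \quad A = 2B, \\
\sin t &: \quad -B = 2A + \tfrac{1}{2}, \\
t &: \quad 0 = 2C + \tfrac{1}{2}, \\
\text{const} &: \quad C = 2D.
\end{aligned}
\]
Substituting the first equation into the second gives $-B = 4B + \tfrac{1}{2}$; therefore,
\[ A = -\frac{1}{5}, \quad B = -\frac{1}{10}, \quad C = -\frac{1}{4}, \quad D = -\frac{1}{8}, \]
and the particular solution is
\[ y_{1p}(t) = -\frac{1}{5} \sin t - \frac{1}{10} \cos t - \frac{1}{4} t - \frac{1}{8}. \]
Adding the complementary solution $y_{1c}(t) = c_1 e^{2t}$ gives
\[ y_1(t) = c_1 e^{2t} - \frac{1}{5} \sin t - \frac{1}{10} \cos t - \frac{1}{4} t - \frac{1}{8}. \]
Following the same steps for the second equation, with $y_{2c}(t) = c_2 e^{-2t}$, gives
\[ y_2(t) = c_2 e^{-2t} + \frac{1}{5} \sin t - \frac{1}{10} \cos t - \frac{1}{4} t + \frac{1}{8}. \]
The solution in terms of the original variables then follows from $x(t) = Py(t)$.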
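\[
\begin{aligned}
x_1(t) &= x(t), \\
x_2(t) &= \dot{x}(t), \\
x_3(t) &= \ddot{x}(t), \\
&\;\;\vdots \\
x_{n-1}(t) &= x^{(n-2)}(t), \\
x_n(t) &= x^{(n-1)}(t).
\end{aligned}
\]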
Differentiating the above substitutions results in a system of first-order equations
\[
\begin{aligned}
\dot{x}_1(t) &= \dot{x}(t) = x_2(t), \\
\dot{x}_2(t) &= \ddot{x}(t) = x_3(t), \\
\dot{x}_3(t) &= \dddot{x}(t) = x_4(t), \\
&\;\;\vdots \\
\dot{x}_{n-1}(t) &= x^{(n-1)}(t) = x_n(t), \\
\dot{x}_n(t) &= x^{(n)}(t) = F(t, x_1, x_2, \ldots, x_n),
\end{aligned}
\]
with the last equation following from the original differential equation.
Remarks:
1. This approach can be used to convert any system of higher-order linear differential equations to a system of first-order linear equations. For example, three coupled second-order equations could be converted to six first-order equations.
2. For dynamical systems considered in the following section, this first-order form is called the state-space representation.
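Example 2.10 Consider the second-order differential equation
\[ \ddot{x}(t) + x(t) = 0, \tag{2.23} \]
subject to the initial conditions
\[ x(0) = 1, \qquad \left. \frac{dx}{dt} \right|_{t=0} = 2. \tag{2.24} \]
Solution: We make the substitutions
\[ x_1(t) = x(t), \qquad x_2(t) = \dot{x}(t). \tag{2.25} \]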
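Differentiating the substitutions and transforming to $x_1(t)$ and $x_2(t)$, we have the following system of two first-order equations
\[
\begin{aligned}
\dot{x}_1(t) &= \dot{x}(t) = x_2(t), \\
\dot{x}_2(t) &= \ddot{x}(t) = -x(t) = -x_1(t),
\end{aligned}
\]
for the two unknowns $x_1(t)$ and $x_2(t)$. Note that the original second-order equation (2.23) has been used in the final equation ($\ddot{x} = -x$). Written in matrix form, $\dot{x}(t) = Ax(t)$, we have
\[
\begin{bmatrix} \dot{x}_1(t) \\ \dot{x}_2(t) \end{bmatrix}
=
\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}
\begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix},
\]
where $A$ is not symmetric.
To obtain the eigenvalues, we evaluate $|A - \lambda I| = 0$, or
\[ \begin{vmatrix} -\lambda & 1 \\ -1 & -\lambda \end{vmatrix} = 0, \]
which yields the characteristic equation
\[ \lambda^2 + 1 = 0. \]
Factoring gives the eigenvalues
\[ \lambda_1 = -i, \qquad \lambda_2 = i, \]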
which form a complex conjugate pair but are distinct. Having complex eigenvalues requires a minor modification to the procedure outlined above, but for now we proceed as before. The corresponding eigenvectors are also complex and given by
\[ u_1 = \begin{bmatrix} i \\ 1 \end{bmatrix}, \qquad u_2 = \begin{bmatrix} -i \\ 1 \end{bmatrix}. \]
Consequently, forming the modal matrix, we have
\[ P = \begin{bmatrix} i & -i \\ 1 & 1 \end{bmatrix}. \]
Note that because $A$ is not symmetric but has distinct eigenvalues, we have linearly independent eigenvectors, and the system can be fully diagonalized. In terms of the transformed variables, the uncoupled solutions are
\[ y_1(t) = c_1 e^{-it}, \qquad y_2(t) = c_2 e^{it}. \]
Transforming back using $x(t) = Py(t)$ gives the solution to the system of first-order equations in terms of $x(t)$. From the substitutions (2.25), we obtain the solution with respect to the original variable as follows
\[ x(t) = x_1(t) = c_1 i e^{-it} - c_2 i e^{it}. \]
We would normally be finished at this point. Because the solution is complex,
however, we must do a bit more work to obtain the real solution. It can be
shown that for linear equations, both the real and imaginary parts of a complex
solution are by themselves solutions of the differential equations, and that a linear
combination of the real and imaginary parts, which are both real, is also a solution
of the linear equations. We can extract the real and imaginary parts by applying
Euler's formula, which is
\[ e^{iat} = \cos(at) + i \sin(at). \]
Applying the Euler formula to our solution yields
\[ x(t) = c_1 i (\cos t - i \sin t) - c_2 i (\cos t + i \sin t); \]
therefore, the real and imaginary parts are
\[ \mathrm{Re}(x) = (c_1 + c_2) \sin t, \qquad \mathrm{Im}(x) = (c_1 - c_2) \cos t. \]
To obtain the general solution for $x(t)$, we then superimpose the real and imaginary parts to obtain the general form of the solution to the original second-order differential equation
\[ x(t) = A \sin t + B \cos t. \tag{2.26} \]
The constants $A$ and $B$ are obtained by applying the initial conditions (2.24), which lead to $A = 2$ and $B = 1$; therefore, the final solution to (2.23) subject to the initial conditions (2.24) is
\[ x(t) = 2 \sin t + \cos t. \]
Remarks:
1. Observe that the imaginary eigenvalues correspond to an oscillatory solution.
2. The second-order differential equation (2.23) considered in this example governs the motion of an undamped oscillator. Dynamical systems are discussed
in more depth in Part III.
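The handling of complex eigenvalues in this example can be checked numerically. The sketch below, assuming NumPy is available and using the hypothetical function name `x_of_t`, diagonalizes the nonsymmetric matrix with `numpy.linalg.eig`, applies the initial conditions $x(0) = 1$, $\dot{x}(0) = 2$, and compares the real part of the result with $2 \sin t + \cos t$. Note that NumPy normalizes its eigenvectors differently from the hand calculation, which changes the constants $c_i$ but not the solution.

```python
import numpy as np

# State-space form of the undamped oscillator: x1' = x2, x2' = -x1.
A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])
x0 = np.array([1.0, 2.0])      # x(0) = 1, dx/dt(0) = 2

lam, P = np.linalg.eig(A)      # complex conjugate eigenvalues +/- i
c = np.linalg.solve(P, x0)     # complex constants from y(0) = P^{-1} x(0)

def x_of_t(t):
    """Return x(t) = x1(t); the imaginary part cancels for real data."""
    state = P @ (c * np.exp(lam * t))
    return state[0].real

t = np.linspace(0.0, 2.0 * np.pi, 9)
numeric = np.array([x_of_t(ti) for ti in t])
exact = 2.0 * np.sin(t) + np.cos(t)
```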
Remarks:
1. The approach used in the above example to handle complex eigenvalues and
eigenvectors holds for linear systems for which superposition of solutions is
valid.
2. For additional examples of systems of linear ordinary differential equations
and application to discrete dynamical systems, see Chapter 19.
3. For an illustration of the types of solutions possible when solving systems of
nonlinear equations, see the Mathematica notebook Lorenz.nb.
Problem Set # 4
(2.27)
(2.28)
or, for $A = VR$,
\[ AA^T = VRR^TV = V^2. \tag{2.29} \]
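Recall that the eigenvectors of $U$ and $U^2$ are the same, and the eigenvalues of $U^2$ are $\lambda_i^2$, where $\lambda_i$ are eigenvalues of $U$. Therefore, we determine the eigenvalues and eigenvectors of the symmetric matrix $U^2 = A^TA$, and form the orthonormal modal matrix $Q$. Then to diagonalize the symmetric matrices $U^2$ and $U$, we take
\[ Q^T U^2 Q = \begin{bmatrix} \lambda_1^2 & 0 & \cdots & 0 \\ 0 & \lambda_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n^2 \end{bmatrix}, \]
and
\[ Q^T U Q = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}, \]
so that
\[ U = Q \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} Q^T. \]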
(2.30)
\[ AA^T u = \sigma^2 u, \qquad A^TA v = \sigma^2 v, \tag{2.31} \]
where $\sigma^2$ are the eigenvalues, and $u$ and $v$ are the respective sets of eigenvectors. As suggested by equations (2.31), the eigenvalues of both $AA^T$ and $A^TA$ are the same (despite their size difference). More specifically, all of the nonzero eigenvalues are the same, and the remaining eigenvalues of the larger matrix are all zeros. Note that because $AA^T$ and $A^TA$ are at least positive semidefinite, thereby having all nonnegative eigenvalues, we can denote them as squared quantities.
If equations (2.31) hold, then it can be shown that the eigenvectors $u$ and $v$ are related through the relationships
\[ Av = \sigma u, \qquad A^T u = \sigma v, \tag{2.32} \]
where the values of $\sigma$ are called the singular values of the matrix $A$. These relationships can be confirmed by substitution into equations (2.31). For example, substituting the first of equations (2.32) into the second of equations (2.31) and canceling one of the $\sigma$'s from both sides yields the second of equations (2.32). Similarly, substituting the second of equations (2.32) into the first of equations (2.31) and canceling one of the $\sigma$'s from both sides yields the first of equations (2.32). Thus, there is a special relationship between the two sets of basis vectors $u$ and $v$.
Finally, from the relationship (2.30), we can obtain the singular value decomposition of the matrix $A$ by premultiplying both sides by $U$ and postmultiplying both sides by $V^T$ to yield
\[ A = U \Sigma V^T. \tag{2.33} \]
To further see the connection between the SVD and diagonalization, using this decomposition, observe that
\[ AA^T = U \Sigma V^T \left( U \Sigma V^T \right)^T = U \Sigma V^T V \Sigma^T U^T = U \Sigma^2 U^T, \]
where we have used the fact that $V^TV = I$ because $V$ is orthogonal, and $\Sigma \Sigma^T = \Sigma^2$ for the diagonal matrix $\Sigma$, where $\Sigma^2$ contains the squares of the singular values along the diagonal. Premultiplying both sides by $U^T$ and postmultiplying both sides by $U$ leads to
\[ \Sigma^2 = U^T AA^T U. \]
Therefore, $U$ is the orthogonal modal matrix that diagonalizes the symmetric matrix $AA^T$. Similarly,
\[ \Sigma^2 = V^T A^TA V, \]
such that $V$ is the orthogonal modal matrix that diagonalizes the symmetric matrix $A^TA$.
Method
Any $m \times n$ matrix $A$, even singular or nearly singular matrices, can be decomposed as follows
\[ A = U \Sigma V^T, \]
where $U$ is an $m \times m$ orthogonal matrix, $V$ is an $n \times n$ orthogonal matrix, and $\Sigma$ is an $m \times n$ diagonal matrix of the form
\[ \Sigma = \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \sigma_p & 0 & \cdots & 0 \end{bmatrix}, \]
where $p = \min(m, n)$ and $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_p \geq 0$. The $\sigma_i$'s are the square roots of the eigenvalues of $A^TA$ or $AA^T$, whichever is smaller, and are called the singular values of $A$.
The columns $u_i$ of matrix $U$ and $v_i$ of matrix $V$ satisfy
\[ A v_i = \sigma_i u_i, \qquad A^T u_i = \sigma_i v_i, \]
and $\|u_i\| = 1$, $\|v_i\| = 1$, that is, they are normalized. From the first equation
\[ A^TA v_i = \sigma_i A^T u_i = \sigma_i^2 v_i; \tag{2.34} \]
therefore, the $v_i$'s are the eigenvectors of $A^TA$, with the $\sigma_i^2$'s being the eigenvalues. From the second equation
\[ AA^T u_i = \sigma_i A v_i = \sigma_i^2 u_i; \tag{2.35} \]
therefore, the $u_i$'s are the eigenvectors of $AA^T$ with the $\sigma_i^2$'s being the eigenvalues.
We only need to solve one of the eigenproblems (2.34) or (2.35). Once we have $v_i$, for example, then we obtain $u_i$ from $A v_i = \sigma_i u_i$. Whereas, if we have $u_i$, then we obtain $v_i$ from $A^T u_i = \sigma_i v_i$. We compare $m$ and $n$ to determine which eigenproblem will be easier (smaller).
Example 2.11
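\[ A^T = \begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}; \]
therefore,
\[ A^TA = \begin{bmatrix} 2 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}, \qquad AA^T = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}. \]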
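Solving the smaller eigenproblem for $AA^T$ gives $\sigma_1^2 = 3$ and $\sigma_2^2 = 1$, with normalized eigenvectors $u_1 = \frac{1}{\sqrt{2}} [1 \;\; 1]^T$ and $u_2 = \frac{1}{\sqrt{2}} [-1 \;\; 1]^T$. Then
\[ v_1 = \frac{1}{\sigma_1} A^T u_1 = \frac{1}{\sqrt{3}} \begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix} \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \frac{1}{\sqrt{6}} \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix}, \]
which is normalized. Similarly,
\[ v_2 = \frac{1}{\sigma_2} A^T u_2 = \frac{1}{1} \begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix} \frac{1}{\sqrt{2}} \begin{bmatrix} -1 \\ 1 \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}. \]
Again, note that $v_1$ and $v_2$ are orthonormal. The third eigenvector, $v_3$, also must be orthogonal to $v_1$ and $v_2$; thus, by inspection (let $v_3 = [a \;\; b \;\; c]^T$ and evaluate $\langle v_3, v_1 \rangle = 0$ and $\langle v_3, v_2 \rangle = 0$ to determine two of the three constants, then normalize to determine the third)
\[ v_3 = \frac{1}{\sqrt{3}} \begin{bmatrix} 1 \\ -1 \\ -1 \end{bmatrix}. \]
Therefore,
\[ V = [v_1 \;\; v_2 \;\; v_3] = \begin{bmatrix} \frac{2}{\sqrt{6}} & 0 & \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{6}} & -\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{3}} \end{bmatrix}, \]
and
\[ U^T A V = \begin{bmatrix} \sqrt{3} & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} = \Sigma, \]
where the singular values of $A$ are $\sigma_1 = \sqrt{3}$ and $\sigma_2 = 1$. One can check the result of the decomposition by evaluating
\[ A = U \Sigma V^T. \]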
See the Matrices with Mathematica Demo.
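As a cross-check of this example, the following sketch (assuming NumPy is available; the variable names are illustrative) solves the smaller eigenproblem for $AA^T$, recovers the $v_i$ from $A^T u_i = \sigma_i v_i$, and compares the singular values with those returned by `numpy.linalg.svd`. Only the two vectors $v_1$ and $v_2$ associated with nonzero singular values are formed, which is sufficient to rebuild $A$; the signs of the computed vectors may differ from the hand calculation.

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])

# Solve the smaller (2 x 2) symmetric eigenproblem for A A^T.
evals, U = np.linalg.eigh(A @ A.T)

# eigh returns ascending eigenvalues; reorder so sigma_1 >= sigma_2.
order = np.argsort(evals)[::-1]
evals, U = evals[order], U[:, order]
sigma = np.sqrt(evals)

# v_i = A^T u_i / sigma_i, computed column by column via broadcasting.
V_r = (A.T @ U) / sigma

# Rebuild A from the two nonzero singular triplets.
A_rebuilt = U @ np.diag(sigma) @ V_r.T

# Reference singular values from NumPy's SVD routine.
sigma_ref = np.linalg.svd(A, compute_uv=False)
```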
Remarks:
1. Note that we have not made any assumptions about A; therefore, the SVD can
be obtained for any real or complex rectangular matrix even if it is singular. It
is rather remarkable, therefore, that any matrix can be decomposed into the
product of two orthogonal matrices and one diagonal matrix.
2. For an $m \times n$ matrix $A$, $r = \mathrm{rank}(A) = \mathrm{rank}(A^TA) = \mathrm{rank}(AA^T)$ is the number of nonzero singular values of $A$. If $r < p = \min(m, n)$, then $A$ is singular.
3. If $A$ is nonsingular and invertible (must be square), the singular values of $A^{-1}$ are the reciprocals of the singular values of $A$.
\[ f(x) = e^x, \qquad f(x) = x^2 + x + 1, \]
\[ f(A) = e^A, \qquad f(A) = A^2 + A + I. \]
Indeed, there is, and that is the subject of this section, and no, it is not simply
the sine, exponential, etc. of each element.
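Because we know how to take integer powers of matrices, we can express the matrix polynomial of degree $m$ in the form
\[ P_m(A) = \alpha_0 I + \alpha_1 A + \alpha_2 A^2 + \cdots + \alpha_m A^m, \]
where $I$ is the identity matrix of the same size as $A$. Therefore, we have a straightforward generalization of algebraic polynomials. Because we can express any integer power of a matrix, we can also imagine expressing functions for which a Taylor series exists. For example, the Taylor series of the exponential function is
\[ f(x) = e^x = 1 + x + \frac{x^2}{2!} + \cdots; \]
therefore, the matrix exponential is
\[ f(A) = e^A = I + A + \frac{A^2}{2!} + \cdots. \]
Similarly, from the Taylor series $\sin x = x - x^3/3! + x^5/5! - \cdots$ and the corresponding series for the cosine,
\[ \cos A = I - \frac{A^2}{2!} + \frac{A^4}{4!} - \cdots, \]
\[ \sin A = A - \frac{A^3}{3!} + \frac{A^5}{5!} - \cdots. \]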
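To illustrate that a matrix function is not the function applied to each element, the sketch below sums a truncated Taylor series for $e^A$ and compares it with the elementwise exponential. It assumes NumPy is available; the helper name `expm_taylor` and the choice of 30 terms are illustrative assumptions. For a symmetric matrix the series is also compared with $Q e^{D} Q^T$, which follows from applying the series to $A = QDQ^T$.

```python
import numpy as np

def expm_taylor(A, terms=30):
    """Approximate e^A by the partial sum I + A + A^2/2! + ... ."""
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k        # builds A^k / k! incrementally
        result = result + term
    return result

A = np.array([[0.0, 1.0],
              [1.0, 0.0]])

series = expm_taylor(A)

# Symmetric case: e^A = Q diag(e^{lambda_i}) Q^T.
lam, Q = np.linalg.eigh(A)
via_eig = Q @ np.diag(np.exp(lam)) @ Q.T

# Elementwise exponential, which is NOT the matrix exponential.
elementwise = np.exp(A)
```

For this matrix the series converges to $\begin{bmatrix} \cosh 1 & \sinh 1 \\ \sinh 1 & \cosh 1 \end{bmatrix}$, whereas the elementwise result has ones on the diagonal.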
See Variational Methods with Applications in Science and Engineering, Chapters 6 and 10.
The Cayley-Hamilton theorem is proven for symmetric matrices in Hildebrand, Section 1.22.
Remarks
1. The Cayley-Hamilton theorem may be used to obtain the inverse of a nonsingular matrix in terms of matrix multiplications. If a nonsingular $n \times n$ matrix $A$ has the characteristic polynomial
\[ P_n(\lambda) = \lambda^n + c_1 \lambda^{n-1} + \cdots + c_{n-1} \lambda + c_n = 0, \]
then from the Cayley-Hamilton theorem
\[ A^n + c_1 A^{n-1} + \cdots + c_{n-1} A + c_n I = 0. \]
Premultiplying by $A^{-1}$ and rearranging gives
\[ A^{-1} = -\frac{1}{c_n} \left( A^{n-1} + c_1 A^{n-2} + \cdots + c_{n-1} I \right), \]
which is an expression for the inverse of a matrix in terms of a linear combination of powers of the matrix.
2. Let us consider the case in which the matrix $A$ has repeated eigenvalues. If $P(\lambda)$ has $(\lambda - \lambda_r)^s$, where $s > 1$, as a factor, that is, $\lambda_r$ is a repeated eigenvalue with multiplicity $s$, a symmetric matrix $A$ also satisfies
\[ G(A) = 0, \]
where
\[ G(\lambda) = \frac{P(\lambda)}{(\lambda - \lambda_r)^{s-1}}. \]
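The inverse formula in the first remark can be tried out with a short sketch, assuming NumPy is available. The function name `inverse_cayley_hamilton` is hypothetical, and `numpy.poly` is used to obtain the characteristic polynomial coefficients $[1, c_1, \ldots, c_n]$ of a square matrix. This is meant only to illustrate the formula; it is not a recommended way to invert matrices in practice.

```python
import numpy as np

def inverse_cayley_hamilton(A):
    """Invert A using A^{-1} = -(A^{n-1} + c1 A^{n-2} + ... + c_{n-1} I) / c_n."""
    n = A.shape[0]
    coeffs = np.poly(A)            # [1, c1, ..., cn] of the characteristic polynomial
    c = coeffs[1:]
    if np.isclose(c[-1], 0.0):
        raise ValueError("matrix is singular (c_n = 0)")
    # Horner-style accumulation of A^{n-1} + c1 A^{n-2} + ... + c_{n-1} I.
    S = np.eye(n)
    for k in range(n - 1):
        S = S @ A + c[k] * np.eye(n)
    return -S / c[-1]

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
A_inv = inverse_cayley_hamilton(A)
```

Here the characteristic polynomial is $\lambda^2 - 7\lambda + 10$, so the formula reduces to $A^{-1} = (7I - A)/10$.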
\[ f(A) = \sum_{k=1}^{n} f(\lambda_k) \prod_{r \neq k} \frac{A - \lambda_r I}{\lambda_k - \lambda_r}, \]
recalling that $\prod_{r \neq k}$ represents the product of factors with $r = 1, 2, \ldots, n$, excluding $r = k$. The essential result is that any matrix function $f(A)$ can be expressed as a linear combination of $I, A, A^2, \ldots, A^{n-1}$, where $n$ is the size of $A$. As such, rather than expressing a function in the form of a Taylor series having infinite powers of $A$, one can express such functions as a polynomial in $A$ of degree $n - 1$. Note that Sylvester's formula does not apply for the triangular matrix in the above example because $\lambda_1 = \lambda_2 = 2$.
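Although Sylvester's formula tells us that $f(A)$ can be expressed as a polynomial, it is more straightforward to determine that polynomial using an alternative method, which applies for any $f(A)$, including with repeated eigenvalues. To illustrate the alternative method, let us determine $f(A) = e^A$ for a $2 \times 2$ matrix $A$, for which the polynomial is of the form
\[ e^A = a_0 I + a_1 A, \tag{2.36} \]
where we simply need to determine the constants $a_0$ and $a_1$. As a result of the Cayley-Hamilton theorem and Sylvester's formula, we do so by taking advantage of the fact that
\[ f(\lambda) = a_0 + a_1 \lambda. \]
Hence, the matrix equation becomes a scalar equation of the same form.
In this case, $f(\lambda) = e^{\lambda}$ with the two eigenvalues $\lambda_1$ and $\lambda_2$ of $A$. Given these two eigenvalues, we have the two linear algebraic equations
\[
\begin{aligned}
e^{\lambda_1} &= a_0 + a_1 \lambda_1, \\
e^{\lambda_2} &= a_0 + a_1 \lambda_2,
\end{aligned}
\]
with solution
\[ a_0 = \frac{\lambda_1 e^{\lambda_2} - \lambda_2 e^{\lambda_1}}{\lambda_1 - \lambda_2}, \qquad a_1 = \frac{e^{\lambda_1} - e^{\lambda_2}}{\lambda_1 - \lambda_2}. \]
Therefore,
\[ e^A = \frac{1}{\lambda_1 - \lambda_2} \left[ \left( \lambda_1 e^{\lambda_2} - \lambda_2 e^{\lambda_1} \right) I + \left( e^{\lambda_1} - e^{\lambda_2} \right) A \right]. \]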
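The two-term polynomial for $e^A$ can be evaluated directly, as in the following sketch. It assumes NumPy is available and that the $2 \times 2$ matrix has distinct eigenvalues; the function name `exp_2x2_distinct` is hypothetical.

```python
import numpy as np

def exp_2x2_distinct(A):
    """Evaluate e^A = a0 I + a1 A for a 2 x 2 matrix with distinct eigenvalues."""
    lam1, lam2 = np.linalg.eigvals(A)
    if np.isclose(lam1, lam2):
        raise ValueError("eigenvalues must be distinct for this formula")
    a0 = (lam1 * np.exp(lam2) - lam2 * np.exp(lam1)) / (lam1 - lam2)
    a1 = (np.exp(lam1) - np.exp(lam2)) / (lam1 - lam2)
    return a0 * np.eye(2) + a1 * A

# Upper triangular test matrix with eigenvalues 1 and 3.
A = np.array([[1.0, 2.0],
              [0.0, 3.0]])
E = exp_2x2_distinct(A)
```

For this matrix the formula gives $e^A = \begin{bmatrix} e & e^3 - e \\ 0 & e^3 \end{bmatrix}$, with the exponentials of the eigenvalues on the diagonal.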
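\[ A = \begin{bmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{bmatrix}, \]
for which
\[ A^2 = \begin{bmatrix} 4 & 4 & 1 \\ 0 & 4 & 4 \\ 0 & 0 & 4 \end{bmatrix}. \]
The polynomial is of the form
\[ f(A) = a_0 I + a_1 A + a_2 A^2. \]
Because the matrix $A$ is triangular, the eigenvalues are on the diagonal, and $\lambda = 2$ has multiplicity three. Hence, owing to the repeated eigenvalue, we have
\[
\begin{aligned}
f(\lambda) &= a_0 + a_1 \lambda + a_2 \lambda^2, \\
f'(\lambda) &= a_1 + 2a_2 \lambda, \\
f''(\lambda) &= 2a_2.
\end{aligned}
\]
In this case, with $f(A) = \cos A$, we have
\[
\begin{aligned}
\cos 2 = a_0 + 2a_1 + 4a_2 \quad &\Rightarrow \quad a_0 = -\cos 2 + 2 \sin 2, \\
-\sin 2 = a_1 + 4a_2 \quad &\Rightarrow \quad a_1 = 2 \cos 2 - \sin 2, \\
-\cos 2 = 2a_2 \quad &\Rightarrow \quad a_2 = -\tfrac{1}{2} \cos 2.
\end{aligned}
\]
Therefore,
\[
\begin{aligned}
\cos A &= (2 \sin 2 - \cos 2) I + (2 \cos 2 - \sin 2) A - \tfrac{1}{2} \cos 2 \, A^2 \\
&= (2 \sin 2 - \cos 2) \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} + (2 \cos 2 - \sin 2) \begin{bmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{bmatrix} - \frac{1}{2} \cos 2 \begin{bmatrix} 4 & 4 & 1 \\ 0 & 4 & 4 \\ 0 & 0 & 4 \end{bmatrix} \\
&= \begin{bmatrix} \cos 2 & -\sin 2 & -\tfrac{1}{2} \cos 2 \\ 0 & \cos 2 & -\sin 2 \\ 0 & 0 & \cos 2 \end{bmatrix}.
\end{aligned}
\]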
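The repeated-eigenvalue result for $\cos A$ can be checked against the Taylor series of the matrix cosine. The sketch below assumes NumPy is available; the helper name `cos_taylor` and the choice of 30 terms are illustrative assumptions.

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 1.0],
              [0.0, 0.0, 2.0]])
I = np.eye(3)
c2, s2 = np.cos(2.0), np.sin(2.0)

# Polynomial form with coefficients from f, f', f'' evaluated at lambda = 2.
cosA_poly = (2.0 * s2 - c2) * I + (2.0 * c2 - s2) * A - 0.5 * c2 * (A @ A)

def cos_taylor(A, terms=30):
    """Approximate cos A by the partial sum I - A^2/2! + A^4/4! - ... ."""
    n = A.shape[0]
    A2 = A @ A
    result = np.eye(n)
    term = np.eye(n)
    for k in range(1, terms):
        # term_k = (-1)^k A^{2k} / (2k)!, built from the previous term.
        term = -term @ A2 / ((2 * k - 1) * (2 * k))
        result = result + term
    return result

cosA_series = cos_taylor(A)
```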
3
Eigenfunction Solutions of Differential
Equations
Time was when all the parts of the subject were dissevered, when algebra, geometry,
and arithmetic either lived apart or kept up cold relations of acquaintance confined to
occasional calls upon one another; but that is now at an end; they are drawn together
and are constantly becoming more and more intimately related and connected by a
thousand fresh ties, and we may confidently look forward to a time when they shall
form but one body with one soul.
(James Joseph Sylvester)
Thus far, we have focused our attention on linear algebraic equations represented as vectors and matrices. In this chapter, we build on analogies with
vectors and matrices to develop methods for solving ordinary and partial differential equations using eigenfunction expansions, which are series expansions
of solutions in terms of eigenfunctions of differential operators. As such, vectors
become functions, and matrices become differential operators.
As engineers and scientists, we view matrices and differential operators as two
distinct entities, and we motivate similarities by analogy. Although we often
think that the formal mathematical approach replete with theorems and proofs
is unnecessarily cumbersome (even though we appreciate that someone has put
such methods on a firm mathematical foundation), this is an example where the
more mathematical approach is very valuable. Although from the applications
point of view, matrices and differential operators seem very different and occupy
different topical domains, the mathematician recognizes that matrices and certain
differential operators are both linear operators that have particular properties,
thus unifying these two constructs. Therefore, the mathematician is not surprised
by the relationship between operations that apply to both; in fact, it couldn't be
any other way!
We begin by defining a function space by analogy to vector spaces that allows us
to consider linear independence and orthogonality of functions. We then consider
eigenvalues and eigenfunctions of differential operators, which allow us to develop
powerful techniques for solving ordinary, and in fact partial, differential equations.
These analytical methods also provide the background for spectral
numerical methods.
$i \neq j$.
Function Space:
1. A set of functions $u_i(x)$, $i = 1, \ldots, m$, are linearly independent if their linear combination is zero, that is
\[ c_1 u_1(x) + c_2 u_2(x) + \cdots + c_m u_m(x) = 0, \]
only if the $c_i$ coefficients are all zero. Thus, no function $u_i(x)$ can be written as a linear combination of the others.
2. Any piecewise continuous function in an interval may be expressed as a linear combination of an infinite number of linearly independent basis functions. Although not necessary, we prefer mutually orthogonal basis functions.
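\[
W[u_1(x), u_2(x), \ldots, u_m(x)] =
\begin{vmatrix}
u_1(x) & u_2(x) & \cdots & u_m(x) \\
u_1'(x) & u_2'(x) & \cdots & u_m'(x) \\
\vdots & \vdots & \ddots & \vdots \\
u_1^{(m-1)}(x) & u_2^{(m-1)}(x) & \cdots & u_m^{(m-1)}(x)
\end{vmatrix}.
\]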
\[ f(x) = \sum_{n=1}^{\infty} c_n u_n(x), \]
where not all $c_n$'s are zero. Because this requires an infinite set of basis functions, we say that the function $f(x)$ is infinite dimensional.
Although not necessary, we prefer mutually orthogonal basis functions such that
\[ \langle u_i, u_j \rangle = \int_a^b r(x) u_i(x) u_j(x) \, dx = 0, \quad i \neq j, \]
that is, $u_i(x)$ and $u_j(x)$ are orthogonal over the interval $[a, b]$ with respect to the weight function $r(x)$. Note that $r(x) = 1$ unless stated otherwise.
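Analogous to norms of vectors, we define the norm of a function, which gives a measure of a function's size. Recall that the norm of a vector$^{1}$ is defined by
\[ \|u\| = \langle u, u \rangle^{1/2} = \left( \sum_{k=1}^{n} u_k^2 \right)^{1/2}. \]
Similarly, the norm of a function is defined by
\[ \|u(x)\| = \langle u, u \rangle^{1/2} = \left[ \int_a^b r(x) u^2(x) \, dx \right]^{1/2}, \]
where the discrete sum is replaced by a continuous integral in both the inner product and norm. For example, we can use the Gram-Schmidt orthogonalization procedure to orthonormalize the functions $1, x, x^2$ over the interval $-1 \leq x \leq 1$.
$^{1}$ Recall from Section ?? that we use the $L_2$ norm unless indicated otherwise.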
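Starting with $u_0(x) = 1$, the norm is
\[ \|u_0\| = \left[ \int_{-1}^{1} (1)^2 \, dx \right]^{1/2} = \left[ x \Big|_{-1}^{1} \right]^{1/2} = \sqrt{2}; \]
therefore, normalizing gives
\[ \hat{u}_0 = \frac{u_0}{\|u_0\|} = \frac{1}{\sqrt{2}}. \]
Consider $u_1(x) = x$. First, evaluate the inner product of $u_1(x)$ with the previous function $\hat{u}_0(x)$
\[ \langle u_1, \hat{u}_0 \rangle = \int_{-1}^{1} \frac{1}{\sqrt{2}} x \, dx = \frac{x^2}{2\sqrt{2}} \bigg|_{-1}^{1} = 0; \]
therefore, they are already orthogonal, and we simply need to normalize $u_1(x)$ as above:
\[ \|u_1\| = \left[ \int_{-1}^{1} x^2 \, dx \right]^{1/2} = \left[ \frac{x^3}{3} \bigg|_{-1}^{1} \right]^{1/2} = \sqrt{\frac{2}{3}}; \]
\[ \hat{u}_1(x) = \frac{u_1}{\|u_1\|} = \sqrt{\frac{3}{2}} \, x. \]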
Recall that even functions are such that $f(-x) = f(x)$, and odd functions are such that $f(-x) = -f(x)$. Therefore, all of the odd functions ($x, x^3, \ldots$) are orthogonal to all of the even functions ($x^0, x^2, x^4, \ldots$) in the interval $-1 \leq x \leq 1$.
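Now considering $u_2(x) = x^2$, the function that is mutually orthogonal to the previous ones is given by
\[ u_2' = u_2 - \langle u_2, \hat{u}_1 \rangle \hat{u}_1 - \langle u_2, \hat{u}_0 \rangle \hat{u}_0, \]
but $\langle u_2, \hat{u}_1 \rangle = 0$, and
\[ \langle u_2, \hat{u}_0 \rangle = \int_{-1}^{1} \frac{1}{\sqrt{2}} x^2 \, dx = \frac{x^3}{3\sqrt{2}} \bigg|_{-1}^{1} = \frac{2}{3\sqrt{2}}. \]
Thus,
\[ u_2' = x^2 - \frac{2}{3\sqrt{2}} \frac{1}{\sqrt{2}} = x^2 - \frac{1}{3}. \]
Normalizing gives
\[ \|u_2'\| = \left[ \int_{-1}^{1} \left( x^2 - \frac{1}{3} \right)^2 dx \right]^{1/2} = \sqrt{\frac{8}{45}}; \]
\[ \hat{u}_2(x) = \frac{u_2'}{\|u_2'\|} = \sqrt{\frac{45}{8}} \left( x^2 - \frac{1}{3} \right) = \sqrt{\frac{5}{2}} \, \frac{3x^2 - 1}{2}. \]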
If the square root factors are removed, this produces Legendre polynomials, which are denoted by $P_n(x)$ (see Section 3.3.3). Any continuous (or piecewise continuous) function $f(x)$ in the interval $-1 \leq x \leq 1$ can be written as a linear combination of Legendre polynomials according to
\[ f(x) = \sum_{n=0}^{\infty} a_n \hat{u}_n(x) = \sum_{n=0}^{\infty} \langle f(x), \hat{u}_n(x) \rangle \, \hat{u}_n(x), \]
where $a_n = \langle f(x), \hat{u}_n(x) \rangle$ because $\hat{u}_n(x)$ are orthonormal (cf. vectors).
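The Gram-Schmidt steps above can be reproduced with exact polynomial integration, as in the following sketch. It assumes NumPy is available and uses `numpy.polynomial.Polynomial`; the helper names `inner` and `gram_schmidt` are hypothetical, and the weight function is taken as $r(x) = 1$ on $[-1, 1]$.

```python
import math
from numpy.polynomial import Polynomial
import numpy as np

def inner(p, q):
    """Inner product <p, q> = integral of p(x) q(x) over [-1, 1] with r(x) = 1."""
    antiderivative = (p * q).integ()
    return float(antiderivative(1.0) - antiderivative(-1.0))

def gram_schmidt(polys):
    """Orthonormalize a list of polynomials in the order given."""
    ortho = []
    for u in polys:
        for e in ortho:
            u = u - inner(u, e) * e        # remove the component along e
        ortho.append(u / math.sqrt(inner(u, u)))
    return ortho

monomials = [Polynomial([1.0]),            # 1
             Polynomial([0.0, 1.0]),       # x
             Polynomial([0.0, 0.0, 1.0])]  # x^2
u0, u1, u2 = gram_schmidt(monomials)
```

The coefficient arrays `u0.coef`, `u1.coef`, and `u2.coef` are stored in increasing powers of $x$ and correspond to $1/\sqrt{2}$, $\sqrt{3/2}\,x$, and $\sqrt{5/2}\,(3x^2 - 1)/2$.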
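\[ \frac{1}{\sqrt{2\pi}}, \qquad \frac{\cos(nx)}{\sqrt{\pi}}, \; n = 1, 2, 3, \ldots, \qquad \frac{\sin(mx)}{\sqrt{\pi}}, \; m = 1, 2, 3, \ldots \]
according to
\[ f(x) = \frac{a_0}{\sqrt{2\pi}} + \sum_{n=1}^{\infty} a_n \frac{\cos(nx)}{\sqrt{\pi}} + \sum_{m=1}^{\infty} b_m \frac{\sin(mx)}{\sqrt{\pi}}, \]
where
\[ a_0 = \left\langle f(x), \frac{1}{\sqrt{2\pi}} \right\rangle = \frac{1}{\sqrt{2\pi}} \int_0^{2\pi} f(x) \, dx, \]
\[ a_n = \left\langle f(x), \frac{\cos(nx)}{\sqrt{\pi}} \right\rangle = \frac{1}{\sqrt{\pi}} \int_0^{2\pi} f(x) \cos(nx) \, dx, \]
\[ b_m = \left\langle f(x), \frac{\sin(mx)}{\sqrt{\pi}} \right\rangle = \frac{1}{\sqrt{\pi}} \int_0^{2\pi} f(x) \sin(mx) \, dx. \]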
This produces the Fourier series (see Section 3.4.2). Once again, any piecewise continuous function $f(x)$ in the interval $0 \leq x \leq 2\pi$ can be expressed as a Fourier series.
Remarks:
1. An infinite set of orthogonal functions $u_0(x), u_1(x), u_2(x), \ldots$ is said to be complete if any piecewise continuous function in the interval can be expanded in terms of $u_0(x), u_1(x), u_2(x), \ldots$. Note that whereas an $n$-dimensional vector space is spanned by $n$ mutually orthogonal vectors, an infinite-dimensional function space requires an infinite number of mutually orthogonal functions to span.
2. Analogous definitions and procedures can be developed for functions of more
than one variable; for example, two functions u(x, y) and v(x, y) are orthogonal
in the region A if
\[ \langle u(x, y), v(x, y) \rangle = \iint_A r(x, y) u(x, y) v(x, y) \, dx \, dy = 0. \]
3. The orthogonality of Legendre polynomials and Fourier series, along with other
sets of functions, will prove very useful and is why they are referred to as special
functions.
3.2.1 Definitions
Let us further build on the analogy between vector and function spaces.
Vector Space: Consider the linear transformation (cf. Section 1.6)
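\[ \underset{m \times n}{A} \; \underset{n \times 1}{x} = \underset{m \times 1}{y}, \]
Function Space: Consider the differential operator
\[ L = a_0 + a_1 \frac{d}{dx} + a_2 \frac{d^2}{dx^2} + \cdots + a_n \frac{d^n}{dx^n}, \]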
which is an nth order linear differential operator with constant coefficients. In the
differential equation
Lu(x) = f (x),
the differential operator L transforms u(x) into f (x) as illustrated in Figure 3.3.
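Just as an eigenproblem can be formulated for a matrix $A$ in the form
\[ Au = \lambda u, \]
an eigenproblem can be formulated for the differential operator $L$ in the form
\[ Lu(x) = \lambda u(x). \]
[Figure 3.3: Transformation from $u(x)$ to $f(x)$ via the linear transformation $L$.]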
\[ u(b) = 0, \qquad u'(b) = 0, \qquad c_3 u'(b) + c_4 u(b) = 0, \qquad u'(b) = 0. \]
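Example 3.3 Consider the ordinary differential equation
\[ \frac{d^2 u}{dx^2} = f(x), \tag{3.1} \]
over the range $0 \leq x \leq 1$ with the homogeneous boundary conditions
\[ u(0) = 0, \qquad u(1) = 0. \]
Solve for $u(x)$ in terms of the eigenfunctions of the differential operator in equation (3.1).
Solution: The differential operator is
\[ L = \frac{d^2}{dx^2}, \tag{3.2} \]
and the associated eigenproblem is
\[ Lu(x) = \lambda u(x), \tag{3.3} \]
subject to the same homogeneous boundary conditions $u(0) = 0$, $u(1) = 0$.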
Observe once again that the original differential equation is not an eigenproblem,
but that the eigenproblem is a differential equation.
We solve the eigenvalue problem using the techniques for solving linear ordinary
differential equations to determine the eigenvalues for which nontrivial solutions
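$u(x)$ exist. In general, $\lambda$ may be positive, negative, or zero. In order to solve (3.3) with (3.2), try letting $\lambda = +\mu^2 > 0$ giving
\[ u'' - \mu^2 u = 0. \]
The solution of constant coefficient, linear differential equations of this form is $u(x) = e^{rx}$, where $r$ is a constant. Upon substitution, this leads to the requirement that $r$ must satisfy
\[ r^2 - \mu^2 = 0, \]
or
\[ (r + \mu)(r - \mu) = 0. \]
Therefore, $r = \pm\mu$, and the solution is
\[ u(x) = c_1 e^{\mu x} + c_2 e^{-\mu x}, \tag{3.4} \]
or equivalently
\[ u(x) = c_1 \cosh(\mu x) + c_2 \sinh(\mu x). \]
Applying the boundary conditions to determine the constants $c_1$ and $c_2$
\[
\begin{aligned}
u(0) = 0 \quad &\Rightarrow \quad 0 = c_1 + c_2 \quad \Rightarrow \quad c_1 = -c_2, \\
u(1) = 0 \quad &\Rightarrow \quad 0 = c_1 e^{\mu} + c_2 e^{-\mu} \quad \Rightarrow \quad e^{\mu} = e^{-\mu}.
\end{aligned}
\]
The last condition is only true if $\mu = 0$; therefore, the only solution (3.4) that satisfies the boundary conditions is the trivial solution $u(x) = 0$. We are seeking nontrivial solutions, so let $\lambda = -\mu^2 < 0$, in which case we have
\[ u'' + \mu^2 u = 0. \]
Again, considering a solution of the form $u(x) = e^{rx}$, $r$ must satisfy
\[ r^2 + \mu^2 = 0, \]
or
\[ (r + i\mu)(r - i\mu) = 0. \]
The solution is
\[ u(x) = c_3 e^{i\mu x} + c_4 e^{-i\mu x}, \]
or from the Euler formula
\[ u(x) = c_3 \cos(\mu x) + c_4 \sin(\mu x). \tag{3.5} \]
Applying the boundary conditions gives
\[
\begin{aligned}
u(0) = 0 \quad &\Rightarrow \quad c_3 = 0, \\
u(1) = 0 \quad &\Rightarrow \quad c_4 \sin \mu = 0.
\end{aligned}
\]
For a nontrivial solution, $c_4 \neq 0$; therefore, $\sin \mu = 0$, which requires
\[ \mu = \pm n\pi, \quad n = 1, 2, \ldots. \]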
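Recall that we must consider $\lambda$ positive, negative, and zero. We have considered the cases with $\lambda < 0$ and $\lambda > 0$. From equation (3.3) with $\lambda = 0$, the eigenproblem is $u'' = 0$, which has the solution
\[ u(x) = c_5 x + c_6. \]
Applying the boundary conditions gives
\[
\begin{aligned}
u(0) = 0 \quad &\Rightarrow \quad c_6 = 0, \\
u(1) = 0 \quad &\Rightarrow \quad c_5 = 0.
\end{aligned}
\]
Therefore, we once again get the trivial solution. Thus, the roots that give nontrivial solutions are
\[ \mu = \mu_n = n\pi, \quad n = 1, 2, 3, \ldots, \]
the eigenvalues are
\[ \lambda_n = -\mu_n^2 = -n^2 \pi^2, \quad n = 1, 2, 3, \ldots, \]
and the corresponding eigenfunctions are
\[ u_n(x) = c_4 \sin(n\pi x), \quad n = 1, 2, 3, \ldots. \]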
Note that it is not necessary to consider negative $n$ because we obtain the same eigenfunctions as for positive $n$. For example,
\[ u_2(x) = c_4 \sin(2\pi x), \]
\[ u_{-2}(x) = c_4 \sin(-2\pi x) = -c_4 \sin(2\pi x), \]
and both represent the same eigenfunction given that $c_4$ is arbitrary.
The solution to the eigenproblem gives the eigenfunctions of the differential
operator. Although the constant c4 is arbitrary, it is often convenient to choose
c4 by normalizing the eigenfunctions, such that
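\[ \|u_n\| = 1, \]
or equivalently
\[ \|u_n\|^2 = 1. \]
Then
\[
\begin{aligned}
\int_0^1 u_n^2(x) \, dx &= 1, \\
c_4^2 \int_0^1 \sin^2(n\pi x) \, dx &= 1, \\
c_4^2 \, \frac{1}{2} &= 1, \\
c_4 &= \sqrt{2}.
\end{aligned}
\]
Finally, the normalized eigenfunctions of the differential operator (3.2) are
\[ u_n(x) = \sqrt{2} \sin(n\pi x), \quad n = 1, 2, 3, \ldots, \]
which is the Fourier sine series. Recall that the corresponding eigenvalues are
\[ \lambda_n = -\mu_n^2 = -n^2 \pi^2, \quad n = 1, 2, 3, \ldots. \]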
Observe that a solution of the differential eigenproblem (3.3) exists for any value of $\lambda$; however, not all of these solutions satisfy the boundary conditions. Those nontrivial solutions that do satisfy both equation (3.3) and the boundary conditions are the eigenfunctions, and the corresponding values of $\lambda$ are the eigenvalues, of the differential operator.
Remarks:
1. Changing the differential operator and/or the boundary conditions will change
the eigenfunctions.
2. Although unusual, if more than one case ($\lambda < 0$, $\lambda = 0$, $\lambda > 0$) produces nontrivial solutions, superimpose the corresponding eigenvalues and eigenfunctions.
3. In what follows, we will take advantage of the orthogonality of the eigenfunctions un (x). See Section 3.4 for why it is not necessary to explicitly check the
orthogonality of the eigenfunctions in this case.
Having obtained the eigenfunctions of the differential operator with associated
boundary conditions, we can now obtain a series solution to the original differential equation (3.1), which is repeated here
\[ \frac{d^2 u}{dx^2} = f(x), \quad 0 \leq x \leq 1, \qquad u(0) = 0, \quad u(1) = 0. \tag{3.6} \]
We follow the same steps as outlined in Section 2.2.2. Note that because the
eigenfunctions, which are a Fourier sine series, are all mutually orthogonal, they
provide a basis for the function space spanned by any piecewise continuous function $f(x)$ in the interval $[0, 1]$. Hence, the known right-hand side of the differential equation can be expressed as follows
\[ f(x) = \sum_{n=1}^{\infty} b_n u_n(x), \quad 0 \leq x \leq 1. \]
In order to obtain the coefficients $b_n$ for a given $f(x)$, take the inner product of the eigenfunctions $u_m(x)$ with both sides
\[ \langle f(x), u_m(x) \rangle = \sum_{n=1}^{\infty} b_n \langle u_n(x), u_m(x) \rangle. \]
Because the eigenfunctions are all orthonormal, the terms on the right-hand side vanish except for that corresponding to $m = n$, for which $\langle u_n(x), u_m(x) \rangle = 1$. Thus,
\[ b_n = \langle f(x), u_n(x) \rangle. \]
We may also expand the unknown solution itself in terms of the orthonormal eigenfunctions
\[ u(x) = \sum_{n=1}^{\infty} a_n u_n(x), \]
so that
\[ Lu = \frac{d^2 u}{dx^2} = \sum_{n=1}^{\infty} a_n u_n''(x) = \sum_{n=1}^{\infty} a_n \lambda_n u_n(x), \]
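where the last step is only possible because we have expanded $u(x)$ using the eigenfunctions of the differential operator. Substituting into the ordinary differential equation and equating like terms to determine the $a_n$ coefficients gives
\begin{align*}
\frac{d^2 u}{dx^2} &= f(x), \\
\sum_{n=1}^{\infty} a_n \lambda_n u_n(x) &= \sum_{n=1}^{\infty} b_n u_n(x), \\
\sum_{n=1}^{\infty} a_n \lambda_n u_n(x) &= \sum_{n=1}^{\infty} \langle f, u_n \rangle u_n(x), \\
\therefore \quad a_n &= \frac{\langle f, u_n \rangle}{\lambda_n}.
\end{align*}
Therefore, the solution is
\[ u(x) = \sum_{n=1}^{\infty} \frac{\langle f, u_n \rangle}{\lambda_n} u_n(x), \quad 0 \le x \le 1. \]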
Note the similarity to the solution obtained in Section 2.2.2 using an analogous procedure. Substituting for the eigenvalues and eigenfunctions, we have the Fourier sine series for the given differential equation and boundary conditions
\[ u(x) = -\sum_{n=1}^{\infty} \frac{\left\langle f(x), \sqrt{2}\sin(n\pi x) \right\rangle}{n^2\pi^2}\, \sqrt{2}\sin(n\pi x), \quad 0 \le x \le 1. \]
See the Mathematica notebook "Solutions to Differential Equations using Eigenfunction Expansions" for an illustration of the solution for $f(x) = x^5$.
Note: Solving $u'' = x^5$ exactly is straightforward simply by integrating twice, so what is the advantage of the eigenfunction solution approach?
1. In general, Lu = f (x) can only be solved exactly for certain f (x), whereas the
eigenfunction expansion approach may be applied for general (even piecewise
continuous) f (x).
2. Solutions for various $f(x)$ may be obtained with minimal effort once the eigenvalues and eigenfunctions of the differential operator are obtained (cf. changing the right-hand side vector c in Section 2.2.2).
3. Eigenfunction expansions, for example Fourier series, may be applied to discrete data, such as from experiments. A popular approach is called proper-orthogonal decomposition (POD), which provides an alternative means of expressing large experimental or numerical data sets.
4. The eigenfunction approach provides the basis for spectral numerical methods
(see Section 3.7).
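The expansion described above for $f(x) = x^5$ can be sketched numerically. The following is a minimal illustration, not the notebook referred to in the text: it assumes the orthonormal eigenfunctions $u_n(x) = \sqrt{2}\sin(n\pi x)$ with eigenvalues $\lambda_n = -n^2\pi^2$, evaluates the inner products with a simple trapezoid rule, and truncates the series at a finite number of terms. The function names, the number of terms, and the quadrature grid are hypothetical choices made for this sketch.

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoid rule for samples y on the grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def eigenfunction_solution(f, x_eval, n_terms=200, n_quad=20001):
    """Truncated eigenfunction expansion of u'' = f(x), u(0) = u(1) = 0."""
    xq = np.linspace(0.0, 1.0, n_quad)                   # quadrature grid (assumed resolution)
    u = np.zeros_like(x_eval, dtype=float)
    for n in range(1, n_terms + 1):
        lam_n = -(n * np.pi) ** 2                        # eigenvalue lambda_n = -n^2 pi^2
        u_n_q = np.sqrt(2.0) * np.sin(n * np.pi * xq)    # u_n on the quadrature grid
        b_n = trapezoid(f(xq) * u_n_q, xq)               # b_n = <f, u_n>
        a_n = b_n / lam_n                                # a_n = <f, u_n> / lambda_n
        u += a_n * np.sqrt(2.0) * np.sin(n * np.pi * x_eval)
    return u

x = np.linspace(0.0, 1.0, 11)
u_series = eigenfunction_solution(lambda s: s**5, x)
u_exact = (x**7 - x) / 42.0          # obtained by integrating u'' = x^5 twice
max_error = float(np.max(np.abs(u_series - u_exact)))
print(max_error)
```

Changing the right-hand side only requires passing a different `f`; the eigenvalues and eigenfunctions are reused unchanged, which is the point made in item 2 above.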
Example 3.4 Instead of equation (3.6), consider an ordinary differential equation of the form
\[ \frac{d^2 u}{dx^2} - \alpha u = f(x), \quad u(0) = 0, \quad u(1) = 0, \tag{3.7} \]
where $\alpha$ is some physical parameter.
Solution: The eigenproblem for the differential operator $L$ is the same as in the previous example; therefore, we use the same eigenfunctions, and following the same procedure as above gives the coefficients in the expansion for the solution as
\[ a_n = \frac{\langle f, u_n \rangle}{\lambda_n - \alpha}. \]
Therefore, the solution to equation (3.7) is
\[ u(x) = -\sum_{n=1}^{\infty} \frac{1}{\alpha + n^2\pi^2} \langle f(x), u_n(x) \rangle\, u_n(x), \quad 0 \le x \le 1, \]
where $u_n(x) = \sqrt{2}\sin(n\pi x)$, $n = 1, 2, 3, \ldots$.
If the parameter $\alpha$ equals one of the eigenvalues, that is, $\alpha = \lambda_n = -n^2\pi^2$ for a particular $n$, then no solution exists unless $\langle f, u_n \rangle = 0$, in which case $a_n$ is arbitrary. Again, recall the similarities to the cases considered in Section 2.2.2.
Alternatively, note that the differential operator in equation (3.7) could be regarded as $L = d^2/dx^2 - \alpha$; however, this would require obtaining different eigenvalues and eigenfunctions.
\[ \langle v, Au \rangle = \langle u, A^T v \rangle. \]
We can think of this as a means to define the transpose of a matrix $A^T$.
By analogy, a differential operator $L$ has an adjoint$^3$ operator $L^*$ that satisfies
\[ \langle v, Lu \rangle = \langle u, L^* v \rangle, \tag{3.8} \]
where $u(x)$ and $v(x)$ are arbitrary functions with homogeneous boundary conditions.
In order to illustrate the approach for determining the adjoint of a differential operator, consider the second-order linear differential equation with variable coefficients
\[ Lu = \frac{1}{r(x)}\left[ a_0(x)u'' + a_1(x)u' + a_2(x)u \right] = 0, \quad a \le x \le b, \tag{3.9} \]
where $r(x)$ is a weight function. To obtain the adjoint operator, consider an arbitrary function $v(x)$, and take the inner product with $Lu$ to obtain the left-hand side of (3.8) as follows:
\[ \langle v, Lu \rangle = \int_a^b r(x)v(x)\left\{ \frac{1}{r(x)}\left[ a_0(x)u'' + a_1(x)u' + a_2(x)u \right] \right\} dx, \tag{3.10} \]
where the inner product is taken with respect to the weight function $r(x)$. We want to switch the roles of $u(x)$ and $v(x)$ in the inner product, that is, interchange derivatives on $u(x)$ for derivatives on $v(x)$, in order to obtain the right-hand side
2
This material is from Variational Methods with Applications in Science and Engineering, Section
1.8.
In an unfortunate twist of terminology, the term adjoint has different meanings in the context of
matrices. Recall from Section 1.4.5 that the adjoint of a matrix is related to its cofactor matrix,
while from Section 1.1, it is also another term for taking the transpose of a matrix.
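where
\[ \int p\,dq = pq - \int q\,dp \]
with
\[ (1): \quad p = v a_1, \quad q = u, \quad dp = (v a_1)'\,dx, \quad dq = u'\,dx, \]
\[ (2): \quad p = v a_0, \quad q = u', \quad dp = (v a_0)'\,dx, \quad dq = u''\,dx, \]
\[ (2): \quad \hat{p} = (v a_0)', \quad \hat{q} = u, \quad d\hat{q} = u'\,dx. \]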
(3.11)
Note that the variable coefficients move inside the derivatives, and the odd-order derivatives change sign as compared to $Lu$. This is the case for higher-order derivatives as well.
4
See Variational Methods with Applications in Science and Engineering, Section 1.6, for a review of integration by parts.
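Example 3.5 Determine the adjoint of the differential operator
\[ L = \frac{d^2}{dx^2} + x\frac{d}{dx}, \quad 0 \le x \le 1, \]
with homogeneous boundary conditions.
Solution: From equation (3.9), we have
\[ a_0(x) = 1, \quad a_1(x) = x, \quad a_2(x) = 0, \quad r(x) = 1. \]
Then from equation (3.11), $L^* v = [v]'' - [x v]' = v'' - x v' - v$, so that the adjoint operator is
\[ L^* = \frac{d^2}{dx^2} - x\frac{d}{dx} - 1. \]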
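The defining property (3.8) can be checked directly for the operator of Example 3.5. The sketch below assumes $L = d^2/dx^2 + x\,d/dx$ and $L^* = d^2/dx^2 - x\,d/dx - 1$ on $0 \le x \le 1$ with weight $r(x) = 1$, and uses two arbitrarily chosen polynomial test functions that vanish at both ends (so that the boundary terms from integration by parts drop out). The test functions and helper names are hypothetical choices for illustration only.

```python
from numpy.polynomial import Polynomial

X = Polynomial([0.0, 1.0])                   # the polynomial p(x) = x

def inner(p, q):
    """Inner product <p, q> on [0, 1] with weight r(x) = 1."""
    antiderivative = (p * q).integ()
    return antiderivative(1.0) - antiderivative(0.0)

def L(u):
    """L u = u'' + x u'."""
    return u.deriv(2) + X * u.deriv(1)

def L_adjoint(v):
    """L* v = v'' - x v' - v."""
    return v.deriv(2) - X * v.deriv(1) - v

u = X * (1 - X)            # u(0) = u(1) = 0
v = X**2 * (1 - X)         # v(0) = v(1) = 0

lhs = inner(v, L(u))             # <v, L u>
rhs = inner(u, L_adjoint(v))     # <u, L* v>
print(lhs, rhs)
```

Both inner products evaluate to $-11/60$ for these test functions, in agreement with $\langle v, Lu \rangle = \langle u, L^* v \rangle$.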
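is, for arbitrary $a_0(x)$, $a_1(x)$, and $a_2(x)$ coefficients. Let us determine the subset of such equations that are self-adjoint.
Recall from equation (3.11) that the adjoint operator is given by
\[ L^* v = \frac{1}{r(x)}\left\{ [a_0 v]'' - [a_1 v]' + a_2 v \right\}. \]
Expanding the derivatives gives
\[ L^* v = \frac{1}{r(x)}\left\{ a_0 v'' + [2a_0' - a_1] v' + [a_0'' - a_1' + a_2] v \right\}. \tag{3.12} \]
Comparing with equation (3.9), $L^* = L$ requires $2a_0' - a_1 = a_1$, that is, $a_1 = a_0'$, in which case the coefficient of $v$ reduces to $a_2$. Equation (3.9) then becomes
\[ \frac{1}{r(x)}\left\{ a_0 u'' + a_0' u' + a_2 u \right\} = 0. \tag{3.13} \]
The differential operator in the above expression may be written in the form
\[ L = \frac{1}{r(x)}\left\{ \frac{d}{dx}\left[ a_0(x)\frac{d}{dx} \right] + a_2(x) \right\}, \]
which is called the Sturm-Liouville differential operator.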
Therefore, a second-order linear differential operator is self-adjoint if and only if it is of the Sturm-Liouville form
\[ L = \frac{1}{r(x)}\left\{ \frac{d}{dx}\left[ p(x)\frac{d}{dx} \right] + q(x) \right\}, \tag{3.14} \]
where $p(x) > 0$ and $r(x) > 0$ in $a \le x \le b$, and the boundary conditions are homogeneous. It follows that the corresponding eigenfunctions of the Sturm-Liouville differential operator are orthogonal with respect to the weight function $r(x)$.
Similarly, consider the fourth-order Sturm-Liouville differential operator
\[ L = \frac{1}{r(x)}\left\{ \frac{d^2}{dx^2}\left[ s(x)\frac{d^2}{dx^2} \right] + \frac{d}{dx}\left[ p(x)\frac{d}{dx} \right] + q(x) \right\}. \]
This operator can also be shown to be self-adjoint if the boundary conditions are homogeneous of the form
\[ u = 0, \quad u' = 0, \]
or
\[ u = 0, \quad s(x)u'' = 0, \]
or
\[ u' = 0, \quad [s(x)u'']' = 0. \]
u(b) + u0 (b) = 0,
where and are not both zero, and and are not both zero.
As shown in the previous section, for the Sturm-Liouville differential operator:
1. The eigenvalues are distinct and nonnegative.
2. The eigenfunctions $u_n(x)$ are orthogonal with respect to the weight function $r(x)$, such that
\[ \langle u_n, u_m \rangle = \int_a^b r(x)u_n(x)u_m(x)\,dx = 0, \quad m \neq n. \]
Recall that the norm of $u_n(x)$ is also defined with respect to the weight function:
\[ \|u_n\|^2 = \int_a^b r(x)u_n^2(x)\,dx. \]
Solutions to the eigenproblems associated with the Sturm-Liouville differential operator with various $p(x)$, $q(x)$, and $r(x)$ and having appropriate boundary conditions produce several common eigenfunctions, for example Fourier series, Legendre polynomials, Bessel functions, and Chebyshev polynomials (see, for example, Jeffrey 2002 and Asmar 2005).
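As an illustration of orthogonality with respect to a weight function, the sketch below checks the first few Legendre polynomials, for which $r(x) = 1$ on $-1 \le x \le 1$. It relies on NumPy's `numpy.polynomial.Legendre` class; the helper name `weighted_inner` is a hypothetical choice for this example. The diagonal entries give the squared norms $\|P_n\|^2 = 2/(2n+1)$, a standard result that is assumed here rather than taken from the text.

```python
from numpy.polynomial import Legendre

def weighted_inner(p, q, a=-1.0, b=1.0):
    """<p, q> = integral over [a, b] of r(x) p(x) q(x) with r(x) = 1."""
    antiderivative = (p * q).integ()
    return antiderivative(b) - antiderivative(a)

P = [Legendre.basis(n) for n in range(4)]        # P_0, P_1, P_2, P_3

# Gram matrix of inner products <P_m, P_n>
gram = [[weighted_inner(P[m], P[n]) for n in range(4)] for m in range(4)]
for row in gram:
    print(["%.3f" % value for value in row])
```

The off-diagonal entries vanish to rounding error, while the diagonal entries are the squared norms that would be needed to normalize the eigenfunctions.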
Fourier Series:
We have already found in Section 3.2 that for eigenproblems of the form
\[ \frac{d^2 u}{dx^2} + \lambda u = 0, \quad 0 \le x \le 1, \]
\[ u(0) = 0, \quad u(1) = 0, \]
the eigenfunctions of the differential operator produce a Fourier sine series
\[ u_n(x) = a_n \sin(n\pi x), \quad n = 1, 2, 3, \ldots. \]
n = 0, 1, 2, . . . .
q(x) =
2
,
x
r(x) = x,
= 2 .
Bessel functions are orthogonal over the interval $0 \le x \le 1$ with respect to the weight function $r(x) = x$, and the Bessel equation arises when solving partial differential equations involving the Laplacian operator $\nabla^2$ in cylindrical coordinates (see Section 3.6.2).
[Figure: Bessel functions $J_0$, $J_1$, $Y_0$, and $Y_1$.]
Legendre Polynomials:
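Legendre polynomials arise as eigenfunctions of the differential operator associated with the equation
\[ \frac{d}{dx}\left[ (1 - x^2)\frac{du}{dx} \right] + \nu(\nu + 1)u = 0, \quad -1 \le x \le 1, \]
where in the Sturm-Liouville equation
\[ p(x) = 1 - x^2, \quad q(x) = 0, \quad r(x) = 1, \quad \lambda = \nu(\nu + 1). \]
\[ P_1(x) = x, \quad P_2(x) = \tfrac{1}{2}(3x^2 - 1), \ldots \]
\[ q(x) = 0, \quad r(x) = (1 - x^2)^{-1/2}, \quad \lambda = \nu^2. \]
\[ T_1(x) = x, \quad T_2(x) = 2x^2 - 1, \ldots \]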
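\[ p(x)\frac{d^2 u}{dx^2} + \frac{dp}{dx}\frac{du}{dx} + [q(x) + \lambda r(x)]\,u = 0, \]
or
\[ \frac{d^2 u}{dx^2} + \frac{1}{p(x)}\frac{dp}{dx}\frac{du}{dx} + \left[ \frac{q(x)}{p(x)} + \lambda\frac{r(x)}{p(x)} \right] u = 0. \]
Matching the coefficient of $du/dx$ with that in equation (3.17) requires that
\[ \ln p = \int \frac{a_1}{a_0}\,dx; \]
thus,
\[ p(x) = \exp\left[ \int \frac{a_1(x)}{a_0(x)}\,dx \right], \tag{3.18} \]
and
\[ q(x) = \frac{a_2(x)}{a_0(x)}\,p(x), \quad r(x) = \frac{a_3(x)}{a_0(x)}\,p(x). \tag{3.19} \]
Note that $a_3(x) \neq 0$ for differential eigenproblems. Using equations (3.18) and (3.19), we can convert any second-order linear eigenproblem of the form (3.17) into the self-adjoint Sturm-Liouville form (3.15).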
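As a worked instance of equations (3.18) and (3.19), consider the Chebyshev equation. The block below assumes that the general form (3.17) reads $a_0 u'' + a_1 u' + [a_2 + \lambda a_3]u = 0$ (that equation is not reproduced in this passage, so its form is an assumption) and uses $\nu^2$ as an assumed name for the eigenvalue.

```latex
% Chebyshev equation: (1 - x^2) u'' - x u' + \nu^2 u = 0, so that
%   a_0 = 1 - x^2, \quad a_1 = -x, \quad a_2 = 0, \quad a_3 = 1, \quad \lambda = \nu^2.
\begin{align*}
p(x) &= \exp\left[ \int \frac{-x}{1 - x^2}\,dx \right]
      = \exp\left[ \tfrac{1}{2}\ln(1 - x^2) \right]
      = (1 - x^2)^{1/2}, \\
q(x) &= \frac{a_2}{a_0}\,p(x) = 0, \\
r(x) &= \frac{a_3}{a_0}\,p(x) = \frac{(1 - x^2)^{1/2}}{1 - x^2} = (1 - x^2)^{-1/2}.
\end{align*}
```

The resulting weight function $r(x) = (1 - x^2)^{-1/2}$ agrees with the one quoted for the Chebyshev polynomials.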
Non-Homogeneous Equations:
Just as in Section 3.2, solutions to nonhomogeneous forms of differential equations with the above operators are obtained by expanding both the solution and the right-hand side in terms of the eigenfunctions of the differential operators and determining the coefficients in the expansions.
We expand the right-hand side $f(x)$ in terms of eigenfunctions $u_n(x)$, $n = 0, 1, 2, \ldots$, according to
\[ f(x) = \sum_{n=0}^{\infty} b_n u_n(x). \]
To determine the coefficients $b_n$, evaluate the inner product $\langle u_m(x), f(x) \rangle$, which is equivalent to multiplying the above expression by $r(x)u_m(x)$ and integrating over the interval $a \le x \le b$. If the eigenfunctions are orthogonal, then all of the terms in the expansion with $n \neq m$ are zero, leaving that with $n = m$ and providing
\[ \int_a^b r(x)u_m(x)f(x)\,dx = b_m \int_a^b r(x)\,[u_m(x)]^2\,dx, \]
or
\[ \langle u_m(x), f(x) \rangle = b_m \|u_m(x)\|^2. \]
Thus, the coefficients in the expansion for $f(x)$ are
\[ b_n = \frac{\langle u_n(x), f(x) \rangle}{\|u_n(x)\|^2}. \]
Note that if the eigenfunctions are normalized with respect to the weight function $r(x)$, then $\|u_n(x)\|^2 = 1$. Once again, it is the orthogonality of the eigenfunctions that makes this approach possible.
Figure 3.6 Domain for Laplace equation with n indicating outward facing
normals to the boundary.
(3.20)
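with boundary conditions $u$ or $\partial u/\partial n$ specified on each boundary, where $n$ represents the normal to the boundary (see Figure 3.6).
It is supposed that the solution $u(x, y)$ can be written as the product of two functions, one a function of $x$ only and one a function of $y$ only, thereby separating the variables, as follows:
\[ u(x, y) = \phi(x)\psi(y). \tag{3.21} \]
Substituting into the Laplace equation gives
\[ \psi\frac{d^2\phi}{dx^2} + \phi\frac{d^2\psi}{dy^2} = 0, \]
or, upon separating the variables,
\[ \frac{1}{\phi(x)}\frac{d^2\phi}{dx^2} = -\frac{1}{\psi(y)}\frac{d^2\psi}{dy^2}. \]
Because the left-hand side is a function of $x$ only, and the right-hand side is a function of $y$ only, the equation must be equal to a constant, say $\lambda$, as $x$ and $y$ may be varied independently. Thus, we have two equations
\[ \frac{1}{\phi(x)}\frac{d^2\phi}{dx^2} = \lambda \quad \Rightarrow \quad \frac{d^2\phi}{dx^2} - \lambda\phi(x) = 0, \tag{3.22} \]
and
\[ -\frac{1}{\psi(y)}\frac{d^2\psi}{dy^2} = \lambda \quad \Rightarrow \quad \frac{d^2\psi}{dy^2} + \lambda\psi(y) = 0. \tag{3.23} \]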
Observe that when it applies, that is, when an equation is separable in the
sense shown above, the method of separation of variables allows one to convert a
partial differential equation into a set of ordinary differential equations. Only the
differential equation with two homogeneous boundary conditions is an eigenproblem. Solve this one first, and then solve the other ordinary differential equation
using the same values of the eigenvalues from the eigenproblem.5
Example 3.6 Consider the temperature distribution $u(x, y)$ due to conduction in the rectangular domain given in Figure 3.7. Heat conduction is governed by Laplace's equation (3.20), and the boundary conditions are
\[ u = 0 \ \text{ at } \ x = 0, \ x = a, \ y = 0, \qquad u = f(x) \ \text{ at } \ y = b. \tag{3.24} \]
That is, the temperature is zero on three sides and some specified distribution of $x$ on the fourth side.
Solution: Separating variables as in (3.21) leads to the ordinary differential equations (3.22) and (3.23). We first consider the eigenproblem, which is the equation having two homogeneous boundary conditions. Thus, consider equation (3.22)
\[ \frac{d^2\phi}{dx^2} - \lambda\phi = 0, \]
which for $\lambda = -\mu^2 < 0$ ($\lambda = 0$ and $\lambda = +\mu^2 > 0$ produce trivial solutions) has the solution
\[ \phi(x) = c_1\cos(\mu x) + c_2\sin(\mu x). \tag{3.25} \]
The method of separation of variables is also used in Section 2.2 of Complex Variables.
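Applying the homogeneous boundary conditions at $x = 0$ and $x = a$ requires that $c_1 = 0$ and
\[ \mu_n = \frac{n\pi}{a}, \quad n = 1, 2, \ldots. \tag{3.26} \]
The corresponding eigenfunctions are
\[ \phi_n(x) = \sin\left( \frac{n\pi}{a}x \right), \quad n = 1, 2, \ldots, \tag{3.27} \]
where $c_2$ is arbitrary and set equal to one for convenience. Now consider equation (3.23), recalling that $\lambda_n = -\mu_n^2 = -\left( \frac{n\pi}{a} \right)^2$,
\[ \frac{d^2\psi}{dy^2} - \mu_n^2\psi = 0, \]
which has the solution$^6$
\[ \psi_n(y) = c_3\cosh\left( \frac{n\pi}{a}y \right) + c_4\sinh\left( \frac{n\pi}{a}y \right). \tag{3.28} \]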
\[ u(x, y) = \sum_{n=1}^{\infty} u_n(x, y) = \sum_{n=1}^{\infty} \phi_n(x)\psi_n(y) = \sum_{n=1}^{\infty} c_n \sin\left( \frac{n\pi}{a}x \right)\sinh\left( \frac{n\pi}{a}y \right), \tag{3.29} \]
which is essentially an eigenfunction expansion with variable coefficients $\psi_n(y)$ (cf. $\phi(x) = \sum c_n\phi_n(x)$). We determine the $c_n$ coefficients by applying the remaining boundary condition (3.24) at $y = b$ as follows:
\[ u(x, b) = \sum_{n=1}^{\infty} \phi_n(x)\psi_n(b) = f(x). \]
Recognizing that the $\psi_n(b)$ are constants, taking the inner product of the eigenfunctions $\phi_m(x)$ with both sides gives
\[ \|\phi_n(x)\|^2\,\psi_n(b) = \langle f(x), \phi_n(x) \rangle, \]
where all the terms in the summation on the left-hand side vanish owing to orthogonality of eigenfunctions except when $m = n$. Then with $\|\phi_n\|^2 = a/2$, solving for the constants $\psi_n(b)$ yields
\[ \psi_n(b) = \frac{\langle f(x), \phi_n(x) \rangle}{\|\phi_n(x)\|^2} = \frac{2}{a}\int_0^a f(x)\sin\left( \frac{n\pi}{a}x \right) dx, \quad n = 1, 2, \ldots, \tag{3.30} \]
which are the Fourier sine coefficients of $f(x)$. As before, if we had chosen the $c_2$
6
We typically prefer trigonometric functions for finite domains and exponential functions for infinite
or semiinfinite domains.
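\[ c_n = \frac{\psi_n(b)}{\sinh\left( \frac{n\pi b}{a} \right)}; \]
therefore, the solution is
\[ u(x, y) = \sum_{n=1}^{\infty} u_n(x, y) = \sum_{n=1}^{\infty} \phi_n(x)\psi_n(y) = \sum_{n=1}^{\infty} \psi_n(b)\sin\left( \frac{n\pi}{a}x \right)\frac{\sinh\left( \frac{n\pi}{a}y \right)}{\sinh\left( \frac{n\pi b}{a} \right)}, \tag{3.31} \]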
where $\psi_n(b)$ are the Fourier sine coefficients of $f(x)$ obtained from equation (3.30).
For example, consider the case with $a = b = 1$ and $f(x) = 1$. Then equation (3.30) becomes
\[ \psi_n(1) = 2\int_0^1 \sin(n\pi x)\,dx = \frac{2}{n\pi}\left[ 1 - \cos(n\pi) \right], \quad n = 1, 2, \ldots, \]
which is zero for $n$ even, and $\frac{4}{n\pi}$ for $n$ odd. Therefore, let us define a new index according to $n = 2m + 1$, $m = 0, 1, 2, 3, \ldots$, in which case
\[ \psi_m(1) = \frac{4}{(2m + 1)\pi}. \]
A contour plot of this solution is shown in Figure 3.8, where the contours represent constant temperature isotherms in heat conduction.
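The series for the case $a = b = 1$, $f(x) = 1$ can be evaluated numerically as a check. The sketch below sums the odd-$n$ terms of the separated solution with coefficients $4/(n\pi)$; the function name and the number of terms are hypothetical choices, and the ratio of hyperbolic sines is rewritten with decaying exponentials (an implementation choice made here, not something taken from the text) so that large $n$ does not overflow.

```python
import numpy as np

def laplace_series(x, y, n_terms=50):
    """Partial sum of the series solution for a = b = 1 and f(x) = 1."""
    total = 0.0
    for m in range(n_terms):
        k = (2 * m + 1) * np.pi                  # k = n*pi with n = 2m + 1 (odd n only)
        coeff = 4.0 / k                          # Fourier sine coefficient 4 / (n*pi)
        # sinh(k*y) / sinh(k), written with decaying exponentials to avoid overflow
        ratio = np.exp(-k * (1.0 - y)) * (1.0 - np.exp(-2.0 * k * y)) / (1.0 - np.exp(-2.0 * k))
        total += coeff * np.sin(k * x) * ratio
    return total

u_center = laplace_series(0.5, 0.5)
print(u_center)
```

By symmetry, superposing the four problems with the unit temperature applied to each side in turn gives $u = 1$ everywhere, so the value at the center of the square must be $1/4$; the partial sum reproduces this.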
Remarks:
1. The above approach works when three of the four sides have homogeneous
boundary conditions. Because the equations are linear, more general cases
may be treated using superposition. For example, see Figure 3.9 for an example
having two nonhomogeneous boundary conditions.
2. When we obtain the eigenfunctions
\[ u_n(x, y) = \phi_n(x)\psi_n(y), \]
they each satisfy the Laplace equation individually for $n = 1, 2, 3, \ldots$. Because the Laplace equation is linear, we obtain the most general solution by superimposing these solutions according to
\[ u(x, y) = \sum u_n(x, y) = \sum \phi_n(x)\psi_n(y). \]
(3.32)
(3.33)
\[ u(\ell, t) = 0. \tag{3.34} \]
The unsteady diffusion equation may also be solved using the method of separation of variables in certain cases. As with the Laplace equation, we seek a solution composed of the product of two functions, one a function of space only and the other a function of time only, according to
\[ u(x, t) = \phi(x)\psi(t). \tag{3.35} \]
Substituting into equation (3.32) and separating the variables gives
\[ \frac{1}{\alpha\psi}\frac{d\psi}{dt} = \frac{1}{\phi}\frac{d^2\phi}{dx^2} = \lambda = -\mu^2, \]
where we note that $\lambda = 0$ and $\lambda > 0$ lead to trivial solutions. Consequently, the single partial differential equation (3.32) becomes the two ordinary differential equations
\[ \frac{d^2\phi}{dx^2} + \mu^2\phi = 0, \tag{3.36} \]
\[ \frac{d\psi}{dt} + \alpha\mu^2\psi = 0. \tag{3.37} \]
Equation (3.36) along with the homogeneous boundary conditions (3.34) represents the eigenproblem for $\phi(x)$. The general solution to equation (3.36) is
\[ \phi(x) = c_1\cos(\mu x) + c_2\sin(\mu x). \tag{3.38} \]
From the boundary condition $u(0, t) = 0$, we must have $\phi(0) = 0$, which requires that $c_1 = 0$. Then from $u(\ell, t) = 0$, we must have $\phi(\ell) = 0$, which gives the characteristic equation
\[ \sin(\mu_n \ell) = 0; \]
therefore,
\[ \mu_n = \frac{n\pi}{\ell}, \quad n = 1, 2, 3, \ldots, \tag{3.39} \]
from which we obtain the eigenvalues
\[ \lambda_n = -\mu_n^2 = -\left( \frac{n\pi}{\ell} \right)^2, \quad n = 1, 2, 3, \ldots. \tag{3.40} \]
If we let $c_2 = 1$ for convenience, the eigenfunctions are
\[ \phi_n(x) = \sin\left( \frac{n\pi}{\ell}x \right), \quad n = 1, 2, 3, \ldots. \tag{3.41} \]
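Let us now consider equation (3.37) with the result (3.39). The solution to this first-order ordinary differential equation is
\[ \psi_n(t) = c_n\exp\left( -\alpha\mu_n^2 t \right), \quad n = 1, 2, 3, \ldots. \tag{3.42} \]
Superimposing these solutions gives
\[ u(x, t) = \sum_{n=1}^{\infty} \phi_n(x)\psi_n(t) = \sum_{n=1}^{\infty} c_n\exp\left( -\frac{\alpha n^2\pi^2}{\ell^2}t \right)\sin\left( \frac{n\pi}{\ell}x \right). \tag{3.43} \]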
The $c_n$ coefficients are determined through application of the initial condition (3.33) applied at $t = 0$, which requires that
\[ u(x, 0) = \sum_{n=1}^{\infty} c_n\sin\left( \frac{n\pi}{\ell}x \right) = f(x). \]
Alternatively, we may write this as
\[ \sum_{n=1}^{\infty} c_n\phi_n(x) = f(x). \]
Taking the inner product of the eigenfunctions $\phi_m(x)$ with both sides, the only nonvanishing term occurs when $m = n$. Therefore,
\[ c_n\|\phi_n(x)\|^2 = \langle f(x), \phi_n(x) \rangle, \]
which gives the coefficients as
\[ c_n = \frac{\langle f(x), \phi_n(x) \rangle}{\|\phi_n(x)\|^2} = \frac{2}{\ell}\int_0^{\ell} f(x)\sin\left( \frac{n\pi}{\ell}x \right) dx, \quad n = 1, 2, 3, \ldots. \tag{3.44} \]
These are the Fourier sine coefficients of the initial condition $f(x)$. Thus, the eigenfunction solution is given by equation (3.43) with the coefficients (3.44).
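The eigenfunction solution can be sketched in a few lines. The code below assumes the diffusion equation has the form $\partial u/\partial t = \alpha\,\partial^2 u/\partial x^2$ with $u = 0$ at $x = 0$ and $x = \ell$ (the symbol `alpha` for the diffusivity is an assumption of this sketch), computes the Fourier sine coefficients of the initial condition with a trapezoid rule, and sums a truncated series. The function name, default parameters, and test initial condition are hypothetical.

```python
import numpy as np

def diffusion_series(f, x, t, alpha=1.0, ell=1.0, n_terms=50, n_quad=4001):
    """Truncated series solution with Fourier sine coefficients of the initial condition f."""
    xq = np.linspace(0.0, ell, n_quad)                     # quadrature grid
    u = np.zeros_like(x, dtype=float)
    for n in range(1, n_terms + 1):
        mu_n = n * np.pi / ell                             # mu_n = n*pi/ell
        g = f(xq) * np.sin(mu_n * xq)                      # integrand f(x) * phi_n(x)
        integral = float(np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(xq)))
        c_n = (2.0 / ell) * integral                       # Fourier sine coefficient
        u += c_n * np.exp(-alpha * mu_n**2 * t) * np.sin(mu_n * x)
    return u

x = np.linspace(0.0, 1.0, 11)
t = 0.1
u_series = diffusion_series(lambda s: np.sin(np.pi * s), x, t)
u_exact = np.exp(-np.pi**2 * t) * np.sin(np.pi * x)        # single-mode initial condition
print(float(np.max(np.abs(u_series - u_exact))))
```

For the single-mode initial condition $f(x) = \sin(\pi x)$, only $c_1$ is nonzero and the series reduces to one decaying exponential, which makes the result easy to verify.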
Remarks:
1. It is rather remarkable that we have been able to extend and generalize methods developed to solve algebraic systems of equations to solve ordinary differential equations and now partial differential equations. This is the remarkable
power of mathematics!
2. For much more on Sturm-Liouville and eigenfunction theory, see Asmar (2005). For example, see Section 11.1 for application to Schrödinger's equation of quantum mechanics, and Section 3.9 for application to nonhomogeneous partial differential equations, such as the Poisson equation.
3. For additional examples of the method of separation of variables for partial
differential equations and application to continuous systems governed by the
wave equation, see Section 20.1.
4. Observe that all the partial differential equations considered here have been
on bounded domains, in which case separation of variables with eigenfunction
expansions provides a solution in some cases. For partial differential equations
Problem Set # 5
4
Vector and Matrix Calculus
Figure 4.1 Tetrahedral element of a solid or fluid with the stresses on each
of the three orthogonal faces to counterbalance the force on the inclined face.
\[ \boldsymbol{\tau} = \begin{bmatrix} \tau_{xx} & \tau_{xy} & \tau_{xz} \\ \tau_{yx} & \tau_{yy} & \tau_{yz} \\ \tau_{zx} & \tau_{zy} & \tau_{zz} \end{bmatrix}. \]
The $\tau_{ii}$ components on the diagonal are the normal stresses and the $\tau_{ij}$ components are the shear stresses on an infinitesimally small tetrahedral element of a solid or fluid substance (see Figure 4.1). In general, such tensors have nine components; however, equilibrium requires that they be symmetric, in which case $\tau_{ij} = \tau_{ji}$ and there are only six unique components. A scalar is a rank zero tensor, a vector in three dimensions is a rank one tensor, and a tensor in three dimensions is a rank two tensor.
Before proceeding to the derivative operators, let us first consider further the
two most commonly used coordinate systems, namely Cartesian and cylindrical.
In the Cartesian coordinate system (see Figure 4.2), the unit vectors in the x, y,
and z coordinate directions are i, j, and k, respectively. Therefore, a vector in
threedimensional Cartesian coordinates is given by
a = ax i + ay j + az k,
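where, for example, $a_x$ is the component of the vector $\mathbf{a}$ in the $x$-direction, which is given by $\langle \mathbf{a}, \mathbf{i} \rangle = \mathbf{a} \cdot \mathbf{i}$. Note that in the Cartesian coordinate system, the unit vectors $\mathbf{i}$, $\mathbf{j}$, and $\mathbf{k}$ are independent of changes in $x$, $y$, and $z$. Therefore, derivatives of the unit vectors with respect to each of the coordinate directions vanish. For example,
\begin{align*}
\frac{\partial \mathbf{a}}{\partial x} &= \frac{\partial}{\partial x}\left( a_x\mathbf{i} + a_y\mathbf{j} + a_z\mathbf{k} \right) \\
&= \frac{\partial a_x}{\partial x}\mathbf{i} + a_x\underbrace{\frac{\partial \mathbf{i}}{\partial x}}_{0} + \frac{\partial a_y}{\partial x}\mathbf{j} + a_y\underbrace{\frac{\partial \mathbf{j}}{\partial x}}_{0} + \frac{\partial a_z}{\partial x}\mathbf{k} + a_z\underbrace{\frac{\partial \mathbf{k}}{\partial x}}_{0} \\
&= \frac{\partial a_x}{\partial x}\mathbf{i} + \frac{\partial a_y}{\partial x}\mathbf{j} + \frac{\partial a_z}{\partial x}\mathbf{k}.
\end{align*}
Note that this is the same as simply taking the partial derivative with respect to $x$ of each of the components of $\mathbf{a}$.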
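Cylindrical coordinates, which are shown in Figure 4.3, require a bit more care. A vector in three-dimensional cylindrical coordinates is given by
\[ \mathbf{a} = a_r\mathbf{e}_r + a_\theta\mathbf{e}_\theta + a_z\mathbf{e}_z. \]
In terms of Cartesian coordinates, the unit vectors are
\[ \mathbf{e}_r = \cos\theta\,\mathbf{i} + \sin\theta\,\mathbf{j}, \quad \mathbf{e}_\theta = -\sin\theta\,\mathbf{i} + \cos\theta\,\mathbf{j}, \quad \mathbf{e}_z = \mathbf{k}. \]
Note that the unit vector $\mathbf{e}_z$ is independent of $r$, $\theta$, and $z$, and the unit vectors $\mathbf{e}_r$ and $\mathbf{e}_\theta$ are independent of $r$ and $z$; however, they are dependent on changes in $\theta$. For example,
\[ \frac{\partial \mathbf{e}_r}{\partial \theta} = \frac{\partial}{\partial \theta}\left( \cos\theta\,\mathbf{i} + \sin\theta\,\mathbf{j} \right) = -\sin\theta\,\mathbf{i} + \cos\theta\,\mathbf{j} = \mathbf{e}_\theta. \]
Similarly,
\[ \frac{\partial \mathbf{e}_\theta}{\partial \theta} = \frac{\partial}{\partial \theta}\left( -\sin\theta\,\mathbf{i} + \cos\theta\,\mathbf{j} \right) = -\cos\theta\,\mathbf{i} - \sin\theta\,\mathbf{j} = -\mathbf{e}_r. \]
Collecting the derivatives of the unit vectors:
\begin{align*}
\frac{\partial \mathbf{e}_r}{\partial r} &= 0, & \frac{\partial \mathbf{e}_r}{\partial \theta} &= \mathbf{e}_\theta, & \frac{\partial \mathbf{e}_r}{\partial z} &= 0, \\
\frac{\partial \mathbf{e}_\theta}{\partial r} &= 0, & \frac{\partial \mathbf{e}_\theta}{\partial \theta} &= -\mathbf{e}_r, & \frac{\partial \mathbf{e}_\theta}{\partial z} &= 0, \\
\frac{\partial \mathbf{e}_z}{\partial r} &= 0, & \frac{\partial \mathbf{e}_z}{\partial \theta} &= 0, & \frac{\partial \mathbf{e}_z}{\partial z} &= 0.
\end{align*}
The key in cylindrical coordinates is to remember that one must first take the derivative before performing vector operations. For an example, see Example 4.1.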
\[ \nabla = \frac{\partial}{\partial x}\mathbf{i} + \frac{\partial}{\partial y}\mathbf{j} + \frac{\partial}{\partial z}\mathbf{k}. \tag{4.1} \]
It is a vector operator that operates on scalars, such as $\phi(x, y, z)$, and vectors, such as $\mathbf{f}(x, y, z)$, in three distinct ways. The first is denoted by $\nabla\phi$, and is called the gradient, which is the spatial rate of change of the scalar field $\phi(x, y, z)$. The second is denoted by $\nabla \cdot \mathbf{f}$ and is called the divergence. The third is denoted by $\nabla \times \mathbf{f}$ and is called the curl. The latter two are spatial rates of change of the vector field $\mathbf{f}(x, y, z)$. We first focus on these operations in Cartesian coordinates and then generalize to cylindrical and spherical coordinates.
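Gradient (grad $\phi$)
In Cartesian coordinates, the gradient of the scalar field $\phi(x, y, z)$ is
\[ \nabla\phi = \left( \frac{\partial}{\partial x}\mathbf{i} + \frac{\partial}{\partial y}\mathbf{j} + \frac{\partial}{\partial z}\mathbf{k} \right)\phi = \frac{\partial \phi}{\partial x}\mathbf{i} + \frac{\partial \phi}{\partial y}\mathbf{j} + \frac{\partial \phi}{\partial z}\mathbf{k}. \tag{4.2} \]
This is a vector field that represents the direction and magnitude of the greatest spatial rate of change of $\phi(x, y, z)$ at each point. That is, it indicates the steepest slope of the scalar field at each point as illustrated in Figure 4.4.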
1
It is important to realize that when writing vectors containing derivatives in this manner, we do not mean to imply that in the first term, for example, $\partial/\partial x$ is operating on the unit vector $\mathbf{i}$; it is merely the $x$-component of the vector. This is why some authors place the unit vector before the respective components to prevent any such confusion.
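Divergence (div $\mathbf{f}$)
In Cartesian coordinates, the divergence of the vector field $\mathbf{f}(x, y, z)$ is the inner (dot) product of the gradient and force vectors as follows:
\[ \nabla \cdot \mathbf{f} = \left( \frac{\partial}{\partial x}\mathbf{i} + \frac{\partial}{\partial y}\mathbf{j} + \frac{\partial}{\partial z}\mathbf{k} \right) \cdot \left( f_x\mathbf{i} + f_y\mathbf{j} + f_z\mathbf{k} \right) = \frac{\partial f_x}{\partial x} + \frac{\partial f_y}{\partial y} + \frac{\partial f_z}{\partial z}. \tag{4.3} \]
This is a scalar field that represents the net normal component of the vector field $\mathbf{f}$ passing out through the surface of an infinitesimal volume surrounding each point. It indicates the expansion of a vector field.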
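Curl (curl $\mathbf{f}$)
In Cartesian coordinates, the curl of the vector field $\mathbf{f}(x, y, z)$ is the cross product of the gradient and force vectors as follows:
\[ \nabla \times \mathbf{f} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ f_x & f_y & f_z \end{vmatrix} = \left( \frac{\partial f_z}{\partial y} - \frac{\partial f_y}{\partial z} \right)\mathbf{i} - \left( \frac{\partial f_z}{\partial x} - \frac{\partial f_x}{\partial z} \right)\mathbf{j} + \left( \frac{\partial f_y}{\partial x} - \frac{\partial f_x}{\partial y} \right)\mathbf{k}. \tag{4.4} \]
Recall that we use a cofactor expansion about the first row to evaluate the determinant. This is a vector field that represents the net tangential component of the vector field $\mathbf{f}$ along the surface of an infinitesimal volume surrounding each point. It indicates the rotational characteristics of a vector field. For example, the curl of a force field is the moment or torque, and the curl of a velocity field yields the rate of rotation of the particles.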
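Laplacian
In Cartesian coordinates, the Laplacian operator is
\[ \nabla^2 = \nabla \cdot \nabla = \left( \frac{\partial}{\partial x}\mathbf{i} + \frac{\partial}{\partial y}\mathbf{j} + \frac{\partial}{\partial z}\mathbf{k} \right) \cdot \left( \frac{\partial}{\partial x}\mathbf{i} + \frac{\partial}{\partial y}\mathbf{j} + \frac{\partial}{\partial z}\mathbf{k} \right) = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}. \tag{4.5} \]
The Laplacian can operate on both scalars and vectors to produce a scalar or vector expression, respectively. For example, $\nabla^2\phi = 0$ is the Laplace equation, which we have encountered several times already.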
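Gradient, Divergence, Curl, and Laplacian in Curvilinear Orthogonal Coordinates
If $\phi$ is a scalar function and $\mathbf{f} = f_1\mathbf{e}_1 + f_2\mathbf{e}_2 + f_3\mathbf{e}_3$ is a vector function of orthogonal curvilinear coordinates $u_1$, $u_2$, and $u_3$ with $\mathbf{e}_1$, $\mathbf{e}_2$, and $\mathbf{e}_3$ being unit vectors in the direction of increasing $u_1$, $u_2$, and $u_3$, respectively, then the gradient, divergence, curl, and Laplacian are defined as follows:
Gradient:
\[ \nabla\phi = \frac{1}{h_1}\frac{\partial \phi}{\partial u_1}\mathbf{e}_1 + \frac{1}{h_2}\frac{\partial \phi}{\partial u_2}\mathbf{e}_2 + \frac{1}{h_3}\frac{\partial \phi}{\partial u_3}\mathbf{e}_3. \tag{4.6} \]
Divergence:
\[ \nabla \cdot \mathbf{f} = \frac{1}{h_1 h_2 h_3}\left[ \frac{\partial}{\partial u_1}(h_2 h_3 f_1) + \frac{\partial}{\partial u_2}(h_3 h_1 f_2) + \frac{\partial}{\partial u_3}(h_1 h_2 f_3) \right]. \tag{4.7} \]
Curl:
\[ \nabla \times \mathbf{f} = \frac{1}{h_1 h_2 h_3}\begin{vmatrix} h_1\mathbf{e}_1 & h_2\mathbf{e}_2 & h_3\mathbf{e}_3 \\ \partial/\partial u_1 & \partial/\partial u_2 & \partial/\partial u_3 \\ h_1 f_1 & h_2 f_2 & h_3 f_3 \end{vmatrix}. \tag{4.8} \]
Laplacian:
\[ \nabla^2\phi = \frac{1}{h_1 h_2 h_3}\left[ \frac{\partial}{\partial u_1}\left( \frac{h_2 h_3}{h_1}\frac{\partial \phi}{\partial u_1} \right) + \frac{\partial}{\partial u_2}\left( \frac{h_3 h_1}{h_2}\frac{\partial \phi}{\partial u_2} \right) + \frac{\partial}{\partial u_3}\left( \frac{h_1 h_2}{h_3}\frac{\partial \phi}{\partial u_3} \right) \right]. \tag{4.9} \]
In the above expressions $h_1$, $h_2$, and $h_3$ are the scale factors, which are given by
\[ h_1 = \left| \frac{\partial \mathbf{r}}{\partial u_1} \right|, \quad h_2 = \left| \frac{\partial \mathbf{r}}{\partial u_2} \right|, \quad h_3 = \left| \frac{\partial \mathbf{r}}{\partial u_3} \right|, \]
where $\mathbf{r}$ represents the position vector of a point in space. In Cartesian coordinates, for example, $\mathbf{r} = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}$.
Below we provide the $u_i$, $h_i$, and $\mathbf{e}_i$ for Cartesian, cylindrical, and spherical coordinates for use in equations (4.6)-(4.9).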
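Cartesian Coordinates: $(x, y, z)$
$u_1 = x$    $h_1 = 1$    $\mathbf{e}_1 = \mathbf{i}$
$u_2 = y$    $h_2 = 1$    $\mathbf{e}_2 = \mathbf{j}$
$u_3 = z$    $h_3 = 1$    $\mathbf{e}_3 = \mathbf{k}$
Cylindrical Coordinates: $(r, \theta, z)$
$u_1 = r$    $h_1 = 1$    $\mathbf{e}_1 = \mathbf{e}_r$
$u_2 = \theta$    $h_2 = r$    $\mathbf{e}_2 = \mathbf{e}_\theta$
$u_3 = z$    $h_3 = 1$    $\mathbf{e}_3 = \mathbf{e}_z$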
Spherical Coordinates: (R, , )
u1 = R h1 = 1
e1 = eR
u2 = h2 = R
e2 = e
u3 = h3 = R sin e3 = e
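As a worked instance of the general Laplacian (4.9), the block below substitutes the cylindrical scale factors listed above, $h_1 = 1$, $h_2 = r$, $h_3 = 1$ with $(u_1, u_2, u_3) = (r, \theta, z)$. The symbols $\phi$ for the scalar field and $\theta$ for the angular coordinate are assumed names for this illustration.

```latex
\begin{align*}
\nabla^2\phi
&= \frac{1}{r}\left[
   \frac{\partial}{\partial r}\left( r\,\frac{\partial \phi}{\partial r} \right)
 + \frac{\partial}{\partial \theta}\left( \frac{1}{r}\frac{\partial \phi}{\partial \theta} \right)
 + \frac{\partial}{\partial z}\left( r\,\frac{\partial \phi}{\partial z} \right)
   \right] \\
&= \frac{1}{r}\frac{\partial}{\partial r}\left( r\,\frac{\partial \phi}{\partial r} \right)
 + \frac{1}{r^2}\frac{\partial^2 \phi}{\partial \theta^2}
 + \frac{\partial^2 \phi}{\partial z^2}.
\end{align*}
```

The second line follows because $r$ is independent of $\theta$ and $z$, so it can be moved outside those derivatives.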
The following properties can be established for the gradient, divergence, and curl of scalar and vector functions. The curl of the gradient of any scalar vanishes, that is,
\[ \nabla \times \nabla\phi = 0. \]
The divergence of the curl of any vector vanishes according to
\[ \nabla \cdot (\nabla \times \mathbf{f}) = 0. \]
Note also that the divergence and curl of the product of a scalar and vector obey the product rule. For example,
\[ \nabla \cdot (\phi\mathbf{f}) = \mathbf{f} \cdot \nabla\phi + \phi\,\nabla \cdot \mathbf{f}, \]
\[ \nabla \times (\phi\mathbf{f}) = \nabla\phi \times \mathbf{f} + \phi\,\nabla \times \mathbf{f}. \]
In these relationships, recall that the gradient of a scalar is a vector, the curl of a vector is a vector, and the divergence of a vector is a scalar.
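Example 4.1
\begin{align*}
\nabla \cdot \mathbf{f} &= \left( \mathbf{e}_r\frac{\partial}{\partial r} + \frac{\mathbf{e}_\theta}{r}\frac{\partial}{\partial \theta} + \mathbf{e}_z\frac{\partial}{\partial z} \right) \cdot \left( f_r\mathbf{e}_r \right) \\
&= \mathbf{e}_r \cdot \frac{\partial}{\partial r}\left( f_r\mathbf{e}_r \right) + \mathbf{e}_\theta \cdot \frac{1}{r}\frac{\partial}{\partial \theta}\left( f_r\mathbf{e}_r \right) + \mathbf{e}_z \cdot \frac{\partial}{\partial z}\left( f_r\mathbf{e}_r \right) \\
&= \mathbf{e}_r \cdot f_r\underbrace{\frac{\partial \mathbf{e}_r}{\partial r}}_{0} + \mathbf{e}_\theta \cdot \frac{f_r}{r}\frac{\partial \mathbf{e}_r}{\partial \theta} + \mathbf{e}_z \cdot f_r\underbrace{\frac{\partial \mathbf{e}_r}{\partial z}}_{0} \\
&= \frac{f_r}{r}\,\mathbf{e}_\theta \cdot \mathbf{e}_\theta \\
\nabla \cdot \mathbf{f} &= \frac{f_r}{r},
\end{align*}
where we have used the fact that $\partial \mathbf{e}_r/\partial \theta = \mathbf{e}_\theta$ obtained above. If we had first carried out the dot products before differentiating, we would have obtained $\nabla \cdot \mathbf{f} = \partial f_r/\partial r = 0$, which is incorrect.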
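The result of Example 4.1 can be checked numerically by working in Cartesian coordinates, where the unit vectors are constant. The sketch below assumes a purely radial field $\mathbf{f} = f_r\mathbf{e}_r$ with constant $f_r$ (consistent with $\partial f_r/\partial r = 0$ in the example) and estimates the divergence with central differences; the value $f_r = 2$, the evaluation point, and the step size are arbitrary illustrative choices.

```python
import numpy as np

F_R = 2.0   # constant radial component f_r (assumed value for illustration)

def field(x, y):
    """Cartesian components of f = f_r e_r, using e_r = (x/r) i + (y/r) j."""
    r = np.hypot(x, y)
    return F_R * x / r, F_R * y / r

def divergence(x, y, h=1e-5):
    """Central-difference estimate of d(f_x)/dx + d(f_y)/dy."""
    dfx_dx = (field(x + h, y)[0] - field(x - h, y)[0]) / (2.0 * h)
    dfy_dy = (field(x, y + h)[1] - field(x, y - h)[1]) / (2.0 * h)
    return dfx_dx + dfy_dy

x0, y0 = 1.2, 0.7
r0 = np.hypot(x0, y0)
div_numeric = divergence(x0, y0)
div_expected = F_R / r0          # f_r / r from Example 4.1
print(div_numeric, div_expected)
```

The numerical divergence agrees with $f_r/r$ rather than with zero, confirming that the $\theta$-dependence of $\mathbf{e}_r$ must be differentiated before the dot product is taken.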
Divergence Theorem
Let $\mathbf{f}$ be any continuously differentiable vector field in a volume $V$ surrounded by a surface $S$; then
\[ \oint_S \mathbf{f} \cdot d\mathbf{A} = \int_V \nabla \cdot \mathbf{f}\,dV, \tag{4.10} \]
f dA =
f dV .
(4.12)
Stokes' theorem transforms a line integral over the bounding curve to an area integral and relates the line integral of the tangential component of $\mathbf{f}$ over the closed curve $C$ to the integral of the normal component of the curl of $\mathbf{f}$ over the area $A$.
at x = x0 ,
This material is from Variational Methods with Applications in Science and Engineering, Section
1.4.
or equivalently
\[ df = \frac{df}{dx}\,dx = 0 \quad \text{at } x = x_0, \]
where $df$ is the total differential of $f(x)$. That is, the slope of $f(x)$ is zero at $x = x_0$. The following possibilities exist at a stationary point:
1. If $d^2 f/dx^2 < 0$ at $x_0$, the function $f(x)$ has a local maximum at $x = x_0$.
2. If $d^2 f/dx^2 > 0$ at $x_0$, the function $f(x)$ has a local minimum at $x = x_0$.
3. If $d^2 f/dx^2 = 0$, then $f(x)$ may still have a local minimum (for example, $f(x) = x^4$ at $x = 0$) or a local maximum (for example, $f(x) = -x^4$ at $x = 0$), or it may have neither (for example, $f(x) = x^3$ at $x = 0$).
The important point here is that the requirement that $f'(x_0) = 0$, in which case $x_0$ is a stationary point, provides a necessary condition for a local extremum, while possibilities (1) and (2) provide additional sufficient conditions for a local extremum at $x_0$.
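Now consider the two-dimensional function $f(x, y)$. For an extremum to occur at $(x_0, y_0)$, it is necessary (but not sufficient) that
\[ \frac{\partial f}{\partial x} = \frac{\partial f}{\partial y} = 0 \quad \text{at } x = x_0, \ y = y_0, \]
or equivalently
\[ df = \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy = 0 \quad \text{at } x = x_0, \ y = y_0, \]
where $df$ is the total differential of $f(x, y)$. The point $(x_0, y_0)$ is a stationary point of $f(x, y)$ if $df = 0$ at $(x_0, y_0)$, for which the rate of change of $f(x, y)$ at $(x_0, y_0)$ in all directions is zero. At a stationary point $(x_0, y_0)$, the following possibilities exist, where subscripts denote partial differentiation with respect to the indicated variable:
1. If $f_{xx}f_{yy} - f_{xy}^2 > 0$ and $f_{xx} < 0$, then $(x_0, y_0)$ is a local maximum.
2. If $f_{xx}f_{yy} - f_{xy}^2 > 0$ and $f_{xx} > 0$, then $(x_0, y_0)$ is a local minimum.
3. If $f_{xx}f_{yy} - f_{xy}^2 < 0$, then $(x_0, y_0)$ is a saddle point.
4. If $f_{xx}f_{yy} - f_{xy}^2 = 0$, then the test is inconclusive.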
\[ df = \frac{\partial f}{\partial x_1}\,dx_1 + \frac{\partial f}{\partial x_2}\,dx_2 + \cdots + \frac{\partial f}{\partial x_n}\,dx_n = 0. \]
The Hessian matrix is defined by
\[ H = \begin{bmatrix} f_{x_1 x_1} & f_{x_1 x_2} & \cdots & f_{x_1 x_n} \\ f_{x_2 x_1} & f_{x_2 x_2} & \cdots & f_{x_2 x_n} \\ \vdots & \vdots & \ddots & \vdots \\ f_{x_n x_1} & f_{x_n x_2} & \cdots & f_{x_n x_n} \end{bmatrix}. \]
If all of the derivatives of $f$ are continuous, then the Hessian matrix is symmetric. Using this Hessian matrix, the second-derivative test is used to determine what type of extremum exists at each stationary point as follows:
1. If the Hessian matrix is negative definite (all eigenvalues are negative), then the function $f(x_1, \ldots, x_n)$ has a local maximum.
2. If the Hessian matrix is positive definite (all eigenvalues are positive), then the function $f(x_1, \ldots, x_n)$ has a local minimum.
3. If the Hessian matrix has both positive and negative eigenvalues, then the function $f(x_1, \ldots, x_n)$ has a saddle point.
4. If the Hessian matrix is semidefinite (at least one eigenvalue is zero), then the second-derivative test is inconclusive.
Note that the cases with $f(x)$ and $f(x, y)$ considered above are special cases of this more general result.
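The four cases of the second-derivative test translate directly into a check on the signs of the Hessian eigenvalues. The following is a minimal sketch; the function name, the tolerance used to decide when an eigenvalue counts as zero, and the example Hessians are assumptions made for illustration.

```python
import numpy as np

def classify_stationary_point(hessian, tol=1e-10):
    """Classify a stationary point from the eigenvalues of its symmetric Hessian."""
    eigenvalues = np.linalg.eigvalsh(np.asarray(hessian, dtype=float))
    has_positive = bool(np.any(eigenvalues > tol))
    has_negative = bool(np.any(eigenvalues < -tol))
    has_zero = bool(np.any(np.abs(eigenvalues) <= tol))
    if has_positive and has_negative:
        return "saddle point"            # case 3: eigenvalues of both signs
    if has_zero:
        return "inconclusive"            # case 4: semidefinite Hessian
    return "local minimum" if has_positive else "local maximum"   # cases 2 and 1

print(classify_stationary_point([[2, 0], [0, 2]]))     # Hessian of x^2 + y^2 at the origin
print(classify_stationary_point([[-2, 0], [0, -2]]))   # Hessian of -(x^2 + y^2) at the origin
print(classify_stationary_point([[2, 0], [0, -2]]))    # Hessian of x^2 - y^2 at the origin
print(classify_stationary_point([[0, 0], [0, 0]]))     # Hessian of x^4 + y^4 at the origin
```

Using `eigvalsh` assumes the Hessian is symmetric, which holds when the second derivatives of the function are continuous.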
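Example 4.2 Obtain the location $(x_1, x_2, x_3)$ at which a minimum value of the function
\[ f(x_1, x_2, x_3) = \frac{f_1(x_1, x_2, x_3)}{f_2(x_1, x_2, x_3)} = \frac{x_1^2 + 6x_1 x_2 + 4x_2^2 + x_3^2}{x_1^2 + x_2^2 + x_3^2} \]
occurs.
Solution: At a point $(x_1, x_2, x_3)$ where $f$ has zero slope, we have
\[ \frac{\partial f}{\partial x_i} = 0, \quad i = 1, 2, 3. \]
With $f = f_1/f_2$, the quotient rule gives
\[ \frac{1}{f_2}\frac{\partial f_1}{\partial x_i} - \frac{f_1}{f_2^2}\frac{\partial f_2}{\partial x_i} = 0, \quad i = 1, 2, 3, \]
or letting $\lambda = f = f_1/f_2$
\[ \frac{\partial f_1}{\partial x_i} - \lambda\frac{\partial f_2}{\partial x_i} = 0, \quad i = 1, 2, 3. \]
Substituting
\[ f_1 = x_1^2 + 6x_1 x_2 + 4x_2^2 + x_3^2, \quad f_2 = x_1^2 + x_2^2 + x_3^2, \]
and evaluating the partial derivatives produces three equations for the three unknown coordinate values. In matrix form, these equations are
\[ \begin{bmatrix} 1 - \lambda & 3 & 0 \\ 3 & 4 - \lambda & 0 \\ 0 & 0 & 1 - \lambda \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}. \]
For a nontrivial solution, the required eigenvalues are
\[ \lambda_1 = 1, \quad \lambda_2 = \tfrac{1}{2}\left( 5 + 3\sqrt{5} \right), \quad \lambda_3 = \tfrac{1}{2}\left( 5 - 3\sqrt{5} \right), \]
and the corresponding eigenvectors are
\[ \mathbf{u}_1 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \quad \mathbf{u}_2 = \begin{bmatrix} -\tfrac{4}{3} + \tfrac{1}{6}\left( 5 + 3\sqrt{5} \right) \\ 1 \\ 0 \end{bmatrix}, \quad \mathbf{u}_3 = \begin{bmatrix} -\tfrac{4}{3} + \tfrac{1}{6}\left( 5 - 3\sqrt{5} \right) \\ 1 \\ 0 \end{bmatrix}. \]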
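The eigenvalues quoted in Example 4.2 can be confirmed numerically. The sketch below treats $f$ as the ratio $\mathbf{x}^T A\mathbf{x}/\mathbf{x}^T\mathbf{x}$ with the symmetric matrix $A$ read off from the matrix form of the equations, and uses the fact that $\lambda = f$ at a stationary point, so the smallest eigenvalue is the minimum value of $f$. The variable names are illustrative choices.

```python
import numpy as np

A = np.array([[1.0, 3.0, 0.0],
              [3.0, 4.0, 0.0],
              [0.0, 0.0, 1.0]])

def f(x):
    """f = f1 / f2 from Example 4.2."""
    x1, x2, x3 = x
    return (x1**2 + 6*x1*x2 + 4*x2**2 + x3**2) / (x1**2 + x2**2 + x3**2)

eigenvalues, eigenvectors = np.linalg.eigh(A)   # eigenvalues returned in ascending order
lam_min = eigenvalues[0]                        # smallest eigenvalue
x_min = eigenvectors[:, 0]                      # direction along which f is smallest

print(eigenvalues)
print(lam_min, f(x_min))
```

Because $f$ is unchanged when $\mathbf{x}$ is scaled by a nonzero constant, the minimum is attained along the whole line through the origin in the direction of `x_min`, not at a single point.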
(4.14)
Because dx, dy, and dz are arbitrary (x, y, and z are independent variables), this
requires that
fx = 0,
fy = 0,
fz = 0,
This material is from Variational Methods with Applications in Science and Engineering, Section
1.5.
(4.15)
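which is zero because $g$ is equal to a constant. Because both (4.14) and (4.15) equal zero at $(x_0, y_0, z_0)$, it follows that we can add them according to
\[ df + \Lambda\,dg = (f_x + \Lambda g_x)\,dx + (f_y + \Lambda g_y)\,dy + (f_z + \Lambda g_z)\,dz = 0, \tag{4.16} \]
Because the differentials are arbitrary, the coefficient of each must vanish, which together with the constraint gives
\[ f_x + \Lambda g_x = 0, \quad f_y + \Lambda g_y = 0, \quad f_z + \Lambda g_z = 0, \quad g = c. \]
Most authors use $\lambda$ to denote Lagrange multipliers. Throughout this text, however, we use $\lambda$ to denote eigenvalues (as is common) and $\Lambda$ for Lagrange multipliers.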
Example 4.3
by
where is the Lagrange multiplier and is multiplied by the constraint that the
because $A$ is real and symmetric with distinct eigenvalues, the eigenvectors are mutually orthogonal.
In order to determine which eigenvectors correspond to the semi-major and semi-minor axes, we recognize that a point on the ellipse must satisfy
\[ (x_1 + x_2)^2 + 2(x_1 - x_2)^2 = 8. \]
Considering $\mathbf{u}_1^T = [1 \ \ 1]$, let us set $x_1 = c_1$ and $x_2 = c_1$. Substituting into the equation for the ellipse yields
\[ 4c_1^2 + 0 = 8, \]
in which case $c_1 = \pm\sqrt{2}$. Therefore, $x_1 = \sqrt{2}$ and $x_2 = \sqrt{2}$ (or $x_1 = -\sqrt{2}$ and $x_2 = -\sqrt{2}$), and the length of the corresponding axis is $\sqrt{x_1^2 + x_2^2} = 2$. Similarly, considering $\mathbf{u}_2^T = [-1 \ \ 1]$, let us set $x_1 = -c_2$ and $x_2 = c_2$. Substituting into the equation for the ellipse yields
\[ 0 + 8c_2^2 = 8, \]
in which case $c_2 = \pm 1$. Therefore, $x_1 = -1$ and $x_2 = 1$ (or $x_1 = 1$ and $x_2 = -1$), and the length of the corresponding axis is $\sqrt{x_1^2 + x_2^2} = \sqrt{2}$. As a result, the eigenvector $\mathbf{u}_1$ corresponds to the semi-major axis, and $\mathbf{u}_2$ corresponds to the semi-minor axis.
(4.17)
(4.18)
where the minus sign is for convenience. A necessary condition for an extremum of a function is that its derivative with respect to each of the independent variables must be zero. This requires that
\[ \frac{\partial J}{\partial x_i} = 0, \quad i = 1, 2, \ldots, n. \tag{4.19} \]
Applying (4.19) to the augmented function (4.18) yields the result that
\[ A^T\boldsymbol{\Lambda} = \mathbf{c} \]
must be satisfied.$^5$ This is a system of linear algebraic equations for the Lagrange multipliers $\boldsymbol{\Lambda}$. However, we do not seek the Lagrange multipliers; we seek the stationary solution(s) $\mathbf{x}$ that minimize the objective function and satisfy the constraints.
Remarks:
1. The usual approach does not work for linear programming because we are dealing with straight lines (in two dimensions), planes (in three dimensions), and linear functions (in $n$ dimensions) that do not in general have finite maxima and minima.
2. Thankfully, the standard approach described in the previous sections does work for quadratic programming and the least-squares method, which are covered in the following sections.
3. Linear programming requires introduction of additional techniques that do not
simply follow from differentiating the algebraic functions and setting equal to
zero. These will be covered in Chapter 8.
We do not include the details here because the method does not work. The details for similar
problems (that do work) will be given in subsequent sections.
(4.20)
(4.21)
i = 1, 2, . . . , n.
(4.23)
A11 A12 A1n
x1
A12 A22 A2n x2
xT Ax = x1 x2 xn ..
..
.. ..
..
.
.
.
. .
A1n A2n Ann xn
xT Ax = A11 x21 + A22 x22 + . . . + Ann x2n
+2A12 x1 x2 + 2A13 x1 x3 + . . . + 2An1,n xn1 xn ,
where each term is quadratic in xi . Differentiating with respect to xi , i = 1, 2, . . . , n
6
Note that a minimum can be changed to a maximum or vice versa simply by changing the sign of
the objective function.
7
We use $\lambda$ for the Lagrange multiplier here because it turns out to be the eigenvalues of an
eigenproblem.
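yields
\[
\frac{\partial}{\partial x_i} \left( \mathbf{x}^T \mathbf{A} \mathbf{x} \right)
= 2 \left( A_{i1} x_1 + A_{i2} x_2 + \cdots + A_{in} x_n \right),
\]
which is simply 2Ax. For the augmented function (4.22), therefore, equation
(4.23) leads to
\[ 2 \left( \mathbf{A}\mathbf{x} - \lambda \mathbf{B}\mathbf{x} \right) = \mathbf{0}. \]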
Because of the constraint (4.21), we are seeking nontrivial solutions for x.
Such solutions correspond to solving the generalized eigenproblem
\[ \mathbf{A}\mathbf{x} = \lambda \mathbf{B}\mathbf{x}, \tag{4.24} \]
where the eigenvalues $\lambda$ are the values of the Lagrange multiplier, and the eigenvectors x are the candidate stationary points for which (4.20) is an extremum.
The resulting eigenvectors must be checked against the constraint (4.21); those
that satisfy it are the stationary points. Determining whether a stationary
point is a minimum or maximum requires checking the behavior of the second
derivatives. In the special case when B = I, we have the
regular eigenproblem
\[ \mathbf{A}\mathbf{x} = \lambda \mathbf{x}. \]
Example 4.4
(4.25)
(4.26)
1
.
x2 = c2
1
Observe that x1 does not satisfy the constraint for any c1 , and that c2 = 1
in order for x2 to satisfy the constraint. Therefore, the stationary points are
(x1 , x2 ) = (1, 1) and (x1 , x2 ) = (1, 1) corresponding to = 2. It can be
confirmed that both of these points are local minimums.
Let us reconsider Example 4.3 involving determining the major and minor axes
of an ellipse.
Example 4.5
(4.27)
(4.28)
(4.29)
which, for a given coefficient matrix A and right-hand side vector c, is essentially
a measure of how close the vector $\hat{\mathbf{x}}$ is to satisfying the system of equations.8
Hence, one could define the solution of an overdetermined system to be the
vector $\hat{\mathbf{x}}$ that comes closest to satisfying the system of equations, that is, the one
with the smallest residual. As before, we measure size of vectors using the norm.
Thus, we seek to minimize the square of the norm of the residual, which is the
scalar quantity
\[ J(\hat{\mathbf{x}}) = \|\mathbf{r}\|^2 = \|\mathbf{c} - \mathbf{A}\hat{\mathbf{x}}\|^2. \tag{4.30} \]
8
If the system did have an exact solution, then the residual of the exact solution would be zero.
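This is the objective function. Because the norm involves the sum of squares, and
we are seeking a minimum, this is known as the least-squares solution.
In order to minimize the objective function (4.30), we will need to differentiate
with respect to the solution $\hat{\mathbf{x}}$. In preparation, let us expand the square of the
norm of the residual as follows:
\[
\begin{aligned}
J(\hat{\mathbf{x}}) &= \|\mathbf{r}\|^2 \\
&= \|\mathbf{c} - \mathbf{A}\hat{\mathbf{x}}\|^2 \\
&= (\mathbf{c} - \mathbf{A}\hat{\mathbf{x}})^T (\mathbf{c} - \mathbf{A}\hat{\mathbf{x}}) \\
&= \left[ \mathbf{c}^T - (\mathbf{A}\hat{\mathbf{x}})^T \right] (\mathbf{c} - \mathbf{A}\hat{\mathbf{x}}) \\
&= (\mathbf{c}^T - \hat{\mathbf{x}}^T \mathbf{A}^T)(\mathbf{c} - \mathbf{A}\hat{\mathbf{x}}) \\
&= \mathbf{c}^T\mathbf{c} - \mathbf{c}^T\mathbf{A}\hat{\mathbf{x}} - \hat{\mathbf{x}}^T\mathbf{A}^T\mathbf{c} + \hat{\mathbf{x}}^T\mathbf{A}^T\mathbf{A}\hat{\mathbf{x}} \\
J(\hat{\mathbf{x}}) &= \mathbf{c}^T\mathbf{c} - \mathbf{c}^T\mathbf{A}\hat{\mathbf{x}} - (\mathbf{c}^T\mathbf{A}\hat{\mathbf{x}})^T + \hat{\mathbf{x}}^T\mathbf{A}^T\mathbf{A}\hat{\mathbf{x}}.
\end{aligned}
\]
Note that each of these four terms is a scalar; thus, the transpose can be removed
on the third term. This yields
\[ J(\hat{\mathbf{x}}) = \mathbf{c}^T\mathbf{c} - 2\mathbf{c}^T\mathbf{A}\hat{\mathbf{x}} + \hat{\mathbf{x}}^T\mathbf{A}^T\mathbf{A}\hat{\mathbf{x}}. \tag{4.31} \]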
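Because $\mathbf{A}^T\mathbf{A}$ is symmetric, the earlier result for the derivative of a quadratic form gives
\[ \frac{\partial}{\partial \hat{\mathbf{x}}} \left( \hat{\mathbf{x}}^T\mathbf{A}^T\mathbf{A}\hat{\mathbf{x}} \right) = 2\mathbf{A}^T\mathbf{A}\hat{\mathbf{x}}, \]
and we will show that
\[ \frac{\partial}{\partial \hat{\mathbf{x}}} \left( \mathbf{c}^T\mathbf{A}\hat{\mathbf{x}} \right) = \mathbf{A}^T\mathbf{c}. \tag{4.32} \]
Therefore, differentiating equation (4.31) with respect to $\hat{\mathbf{x}}$ yields
\[ \frac{\partial J}{\partial \hat{\mathbf{x}}} = -2\mathbf{A}^T\mathbf{c} + 2\mathbf{A}^T\mathbf{A}\hat{\mathbf{x}}. \]
Setting equal to zero gives
\[ \mathbf{A}^T\mathbf{A}\hat{\mathbf{x}} = \mathbf{A}^T\mathbf{c}. \tag{4.33} \]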
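\[
\begin{aligned}
\mathbf{c}^T\mathbf{A}\hat{\mathbf{x}} &=
\begin{bmatrix} c_1 & c_2 & \cdots & c_m \end{bmatrix}
\begin{bmatrix}
A_{11} & A_{12} & \cdots & A_{1n} \\
A_{21} & A_{22} & \cdots & A_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
A_{m1} & A_{m2} & \cdots & A_{mn}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \\
&= (c_1 A_{11} + c_2 A_{21} + \cdots + c_m A_{m1})\, x_1 \\
&\quad + (c_1 A_{12} + c_2 A_{22} + \cdots + c_m A_{m2})\, x_2 \\
&\quad\;\; \vdots \\
&\quad + (c_1 A_{1n} + c_2 A_{2n} + \cdots + c_m A_{mn})\, x_n .
\end{aligned}
\]
Differentiating with respect to each of the variables xi , i = 1, 2, . . . , n, we have
\[
\begin{aligned}
\frac{\partial}{\partial x_1} \left( \mathbf{c}^T\mathbf{A}\hat{\mathbf{x}} \right) &= c_1 A_{11} + c_2 A_{21} + \cdots + c_m A_{m1}, \\
\frac{\partial}{\partial x_2} \left( \mathbf{c}^T\mathbf{A}\hat{\mathbf{x}} \right) &= c_1 A_{12} + c_2 A_{22} + \cdots + c_m A_{m2}, \\
&\;\;\vdots \\
\frac{\partial}{\partial x_n} \left( \mathbf{c}^T\mathbf{A}\hat{\mathbf{x}} \right) &= c_1 A_{1n} + c_2 A_{2n} + \cdots + c_m A_{mn},
\end{aligned}
\]
that is,
\[ \frac{\partial}{\partial \hat{\mathbf{x}}} \left( \mathbf{c}^T\mathbf{A}\hat{\mathbf{x}} \right) = \mathbf{A}^T\mathbf{c}. \]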
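Example 4.6 Determine the least-squares solution of the overdetermined system of linear algebraic equations Ax = c, where
\[
\mathbf{A} = \begin{bmatrix} 2 & 0 \\ 1 & 1 \\ 0 & 2 \end{bmatrix}, \quad
\mathbf{c} = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}.
\]
Solution: To evaluate equation (4.34), observe that
\[
\mathbf{A}^T\mathbf{A} = \begin{bmatrix} 2 & 1 & 0 \\ 0 & 1 & 2 \end{bmatrix}
\begin{bmatrix} 2 & 0 \\ 1 & 1 \\ 0 & 2 \end{bmatrix}
= \begin{bmatrix} 5 & 1 \\ 1 & 5 \end{bmatrix},
\]
which is symmetric as expected. Then the inverse is
\[
\left( \mathbf{A}^T\mathbf{A} \right)^{-1} = \frac{1}{24} \begin{bmatrix} 5 & -1 \\ -1 & 5 \end{bmatrix}.
\]
Also
\[
\mathbf{A}^T\mathbf{c} = \begin{bmatrix} 2 & 1 & 0 \\ 0 & 1 & 2 \end{bmatrix}
\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}
= \begin{bmatrix} 2 \\ 2 \end{bmatrix}.
\]
Therefore,
\[
\hat{\mathbf{x}} = \left( \mathbf{A}^T\mathbf{A} \right)^{-1} \mathbf{A}^T\mathbf{c}
= \frac{1}{24} \begin{bmatrix} 5 & -1 \\ -1 & 5 \end{bmatrix} \begin{bmatrix} 2 \\ 2 \end{bmatrix}
= \frac{1}{24} \begin{bmatrix} 8 \\ 8 \end{bmatrix}
= \frac{1}{3} \begin{bmatrix} 1 \\ 1 \end{bmatrix}.
\]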
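As a numerical check of Example 4.6, the following Python sketch forms and solves the normal equations (4.33) using exact rational arithmetic. The helper names (transpose, matmul, solve, least_squares) are illustrative choices and do not come from the text; for large or ill-conditioned problems, one would normally use a library routine rather than forming the product of the transpose with A explicitly.

```python
from fractions import Fraction

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def solve(M, b):
    """Solve the square system M x = b by Gaussian elimination with row exchanges."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(M, b)]
    for k in range(n):
        # Exchange rows so that the pivot element is nonzero.
        pivot = next(r for r in range(k, n) if aug[r][k] != 0)
        aug[k], aug[pivot] = aug[pivot], aug[k]
        for r in range(k + 1, n):
            factor = aug[r][k] / aug[k][k]
            aug[r] = [u - factor * v for u, v in zip(aug[r], aug[k])]
    x = [Fraction(0)] * n
    for i in reversed(range(n)):          # backward substitution
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def least_squares(A, c):
    """Solve the normal equations (A^T A) x = A^T c."""
    At = transpose(A)
    AtA = matmul(At, A)
    Atc = [sum(a * ci for a, ci in zip(row, c)) for row in At]
    return solve(AtA, Atc)

A = [[2, 0], [1, 1], [0, 2]]
c = [1, 0, 1]
x_hat = least_squares(A, c)
print(x_hat)  # [Fraction(1, 3), Fraction(1, 3)]
```

The result agrees with the hand calculation, and the intermediate product of the transpose with A is the symmetric matrix found in the example.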
We will encounter the versatile least-squares method in several contexts throughout the remainder of the text. For example, ....
4.3.2 Underdetermined Systems (r < n)
Can we use a similar technique to obtain a unique solution to systems that have
an infinite number of solutions? Such systems are called underdetermined and
have more unknowns than linearly independent equations, that is, the rank is
less than the number of unknowns. We can use the least-squares method as in
the previous section, but with two key modifications.
Consider a system of equations Ax = c for which the rank of A is less than the
number of unknowns. Because this system has an infinite number of solutions, the
task now is to determine the unique solution that satisfies an additional criterion;
we seek the solution $\mathbf{x} = \hat{\mathbf{x}}$ that is closest to the origin. Again, we use the norm
to quantify this distance. Thus, we seek to minimize the square of the norm of $\hat{\mathbf{x}}$
\[ J(\hat{\mathbf{x}}) = \|\hat{\mathbf{x}}\|^2, \tag{4.35} \]
which is now our objective function. Of all the solutions x, we seek the one $\hat{\mathbf{x}}$
that satisfies the original system of equations Ax = c and minimizes the objective
function. Using the Lagrange multiplier method, we form the augmented function
\[ \tilde{J}(\hat{\mathbf{x}}) = \|\hat{\mathbf{x}}\|^2 + \boldsymbol{\Lambda}^T (\mathbf{c} - \mathbf{A}\hat{\mathbf{x}}), \tag{4.36} \]
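Differentiating with respect to $\hat{\mathbf{x}}$ gives
\[ \frac{\partial \tilde{J}}{\partial \hat{\mathbf{x}}} = 2\hat{\mathbf{x}} - \mathbf{A}^T \boldsymbol{\Lambda}. \]
Setting the derivative equal to zero, we can write
\[ \hat{\mathbf{x}} = \frac{1}{2} \mathbf{A}^T \boldsymbol{\Lambda}. \tag{4.37} \]
Substituting equation (4.37) into the original system of equations $\mathbf{c} = \mathbf{A}\hat{\mathbf{x}}$ gives
\[ \mathbf{c} = \frac{1}{2} \mathbf{A}\mathbf{A}^T \boldsymbol{\Lambda}. \]
If $\mathbf{A}\mathbf{A}^T$ is invertible, then
\[ \boldsymbol{\Lambda} = 2 \left( \mathbf{A}\mathbf{A}^T \right)^{-1} \mathbf{c}. \]
Substituting into equation (4.37) eliminates the Lagrange multiplier and provides
the least-squares solution
\[ \hat{\mathbf{x}} = \mathbf{A}^T \left( \mathbf{A}\mathbf{A}^T \right)^{-1} \mathbf{c}. \tag{4.38} \]
This is the $\hat{\mathbf{x}}$ that satisfies Ax = c and minimizes $J(\hat{\mathbf{x}}) = \|\hat{\mathbf{x}}\|^2$.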
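To make the minimum-norm solution (4.38) concrete, here is a minimal Python sketch for a small underdetermined system. The 2 x 3 matrix, the right-hand side, and the function names are hypothetical illustrations chosen so that the product of A with its transpose is invertible; exact rational arithmetic is used so that the result can be compared against another solution of the same system.

```python
from fractions import Fraction

def solve(M, b):
    """Solve the square system M y = b by Gaussian elimination with row exchanges."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(M, b)]
    for k in range(n):
        pivot = next(r for r in range(k, n) if aug[r][k] != 0)
        aug[k], aug[pivot] = aug[pivot], aug[k]
        for r in range(k + 1, n):
            factor = aug[r][k] / aug[k][k]
            aug[r] = [u - factor * v for u, v in zip(aug[r], aug[k])]
    y = [Fraction(0)] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * y[j] for j in range(i + 1, n))
        y[i] = (aug[i][n] - s) / aug[i][i]
    return y

def min_norm_solution(A, c):
    """Return x = A^T (A A^T)^{-1} c, assuming A A^T is invertible."""
    AAt = [[sum(a * b for a, b in zip(ri, rj)) for rj in A] for ri in A]
    y = solve(AAt, c)                                   # y = (A A^T)^{-1} c
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]

def apply(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[1, 1, 0], [0, 1, 1]]      # two equations, three unknowns
c = [1, 1]
x_hat = min_norm_solution(A, c)
x_other = [1, 0, 1]             # another exact solution of A x = c

norm_sq_hat = sum(v * v for v in x_hat)
norm_sq_other = sum(v * v for v in x_other)
print(x_hat, norm_sq_hat, norm_sq_other)
```

Both vectors satisfy the system exactly, but the solution from (4.38) has the smaller squared norm (2/3 compared with 2), as expected.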
Part II
Numerical Methods
5
Introduction to Numerical Methods
So long as a man remains a gregarious and sociable being, he cannot cut himself off
from the gratification of the instinct of imparting what he is learning, of propagating
through others the ideas and impressions seething in his own brain, without stunting
and atrophying his moral nature and drying up the surest sources of his future
intellectual replenishment.
(James Joseph Sylvester)
The advent of the digital computer in the middle of the twentieth century
ushered in a true revolution in scientific and engineering research and practice.
Progressively more sophisticated algorithms and software paired with ever more
powerful hardware continue to provide for increasingly realistic simulations of
physical, chemical, and biological systems. This revolution depends upon the
remarkable efficiency of the computer algorithms that have been developed to
solve larger and larger systems of equations, thereby building on the fundamental
mathematics of vectors and matrices covered in Chapters 1 and 2.
In fact, one of the most common and important uses of matrix methods in
modern science and engineering contexts is in the development and execution
of numerical methods. Although numerical methods draw significantly from the
mathematics of linear algebra, they have not typically been considered subsets
of matrix methods or linear algebra. However, this association is quite natural
both in terms of applications and the methods themselves. Therefore, the reader
will benefit from learning these numerical methods in association with the mathematics of vectors, matrices, and differential operators.
The subsequent chapters of Part II provide a comprehensive treatment of numerical methods spanning the solution of systems of linear algebraic equations
and the eigenproblem to solution of ordinary and partial differential equations,
which is so central to scientific and engineering research and practice.
Analog and digital computers: improvements in algorithms and software layered on top of improvements in hardware (Moore's law).
Technological milestones:
Hardware: vacuum tubes (1940s-1960s), transistors and integrated circuits
(1960s-present)
Software: FORTRAN (FORmula TRANslation) (developed 1954), Unix (1970)
Eras of calculations:
Analytical:
+
+
+
Computational:
+ Address more complex problems, including physics and geometries.
+ Provides detailed solutions from which a good understanding of the physics
can be discerned.
+ Can easily try different configurations, for example, geometry, boundary conditions, etc., which is important in design.
+ Computers are becoming faster and cheaper; therefore, the range of applicability of computational mechanics continues to expand.
+ More cost effective and faster than experimental prototyping.
- Requires accurate governing equations.
- Boundary conditions are sometimes difficult to implement.
- Difficult to do in certain parameter regimes, for example, highly nonlinear
physics.
Experimental:
+ Easier to get overall quantities for problem, for example, lift and drag on an
airfoil.
+ No modeling or assumptions necessary.
- Often requires intrusive measurement probes.
- Limited measurement accuracy.
- Some quantities are difficult to measure, for example, the stress in the interior
of a beam.
- Experimental equipment often expensive.
- Difficult and costly to test full-scale models.
As an example of how the rise of computational methods is altering engineering
practice, consider the typical design process. Figure 5.1 shows the traditional
approach used before computers became ubiquitous, where the heart of the design
process consists of iteration between the design and physical prototyping stages
until a satisfactory product or process is obtained. This approach can be both
time consuming and costly as it involves repeated prototype building and testing.
The modern approach to design is illustrated in Figure 5.2, where computational
modeling is inserted in two steps of the design process. Along with theoretical
modeling and experimental testing, computational modeling can be used in the
early stages of the design process in order to better understand the underlying
physics of the process or product before the initial design phase. Computational
prototyping can then be used to test various designs, thereby narrowing down
the possible designs before performing physical prototyping.
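[Figure 5.1 (traditional design approach). Flowchart boxes: Experimental Testing; Design; Physical Prototyping; Product. An "Iterate" loop connects Design and Physical Prototyping.]
[Figure 5.2 (modern design approach). Flowchart boxes: Computational Modeling; Experimental Testing; Design; Computational Prototyping; Physical Prototyping; Product. Two "Iterate" loops appear around the design and prototyping stages.]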
Using computational modeling, the modern design approach provides for more
knowledge and understanding going into the initial design before the first physical prototype is built. For example, whereas the traditional approach may have
required on the order of ten to fifteen wind tunnel tests to develop a wing design,
the modern approach incorporating computational modeling may only require on
the order of two to four wind tunnel tests. As a bonus, computational modeling
and prototyping are generally faster and cheaper than experimental testing
and physical prototyping. This reduces time-to-market and design costs, as fewer
physical prototypes and design iterations are required, but at the same time, it
holds the potential to result in a better final product.
6
Computational Linear Algebra
O(A) is read "is of the same order as A," or simply, "order of A," and indicates that the operation
count, in this case, is directly proportional to A for large n.
For numerical calculations, the important figure of merit is the FLOPS234 rating
of the CPU or its cores.
(6.1)
exactly on digital computers; in other words, all numbers, including rational numbers, are represented as decimals. This leads to round-off errors. More specifically,
the continuous number line, with an infinity of real numbers within any interval,
has smallest intervals of $2^{-51} = 4.4 \times 10^{-16}$ when using double precision6 (thus,
any number smaller than this is effectively zero). Every calculation, for example,
addition or multiplication, introduces a round-off error of roughly this magnitude.
Second, because of the large number of operations required to solve large systems
of algebraic equations, we must be concerned with whether these small round-off
errors can grow to pollute the final solution that is obtained; this is the subject
of numerical stability.
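The granularity of double-precision numbers can be probed directly. The following Python sketch (the function name spacing_above is a hypothetical helper, not from the text) halves a trial step until adding half of it no longer changes a given number. It suggests that the spacing depends on magnitude: roughly 2.2e-16 just above 1 and roughly 4.4e-16 just above 2, which is the order of magnitude of the round-off error discussed here.

```python
import sys

def spacing_above(x):
    """Smallest power of two that still changes x when added to it."""
    step = 1.0
    while x + step / 2 > x:
        step /= 2
    return step

# Spacing of representable numbers just above 1.0 and just above 2.0.
print(spacing_above(1.0), sys.float_info.epsilon)
print(spacing_above(2.0))

# A familiar consequence of round-off: decimal fractions are not exact in binary.
print(0.1 + 0.2 == 0.3)
```

Because 0.1, 0.2, and 0.3 are not exactly representable in binary, the final comparison is false even though the two sides differ only in the last bit or so.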
When solving small to moderate sized systems of equations by hand, the primary diagnostic required is the determinant, which tells us whether the matrix
A is singular and, thus, not invertible. This is the case if $|\mathbf{A}| = 0$. When dealing
with round-off errors in solving large systems of equations, however, we also must
be concerned when the matrix is nearly singular, that is, when the determinant
is close to zero. How close? That is determined by the condition number. The
condition number $\kappa(\mathbf{A})$ of the matrix A is defined by
\[ \kappa(\mathbf{A}) = \|\mathbf{A}\| \, \|\mathbf{A}^{-1}\|, \tag{6.2} \]
max
.
min
(6.3)
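The effect of a large condition number can be seen on a small example. The sketch below evaluates definition (6.2) for a nearly singular 2 x 2 matrix; it assumes the infinity norm (maximum absolute row sum) purely for convenience, which may differ from the norm intended in the text, and all function names and the test matrix are hypothetical. Exact rational arithmetic is used so that the sensitivity shown is a property of the system itself rather than of round-off.

```python
from fractions import Fraction

def inf_norm(M):
    """Infinity norm: maximum absolute row sum (an assumed choice of norm)."""
    return max(sum(abs(v) for v in row) for row in M)

def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def condition_number(M):
    return inf_norm(M) * inf_norm(inverse_2x2(M))

def solve_2x2(M, rhs):
    inv = inverse_2x2(M)
    return [sum(v * r for v, r in zip(row, rhs)) for row in inv]

eps = Fraction(1, 10000)
A = [[Fraction(1), Fraction(1)], [Fraction(1), 1 + eps]]   # nearly singular
identity = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]

kappa = condition_number(A)
x_a = solve_2x2(A, [Fraction(2), 2 + eps])        # right-hand side c
x_b = solve_2x2(A, [Fraction(2), 2 + 2 * eps])    # c perturbed by 1e-4 in one entry

print(float(kappa), float(condition_number(identity)))
print(x_a, x_b)
```

A change of 0.0001 in one entry of the right-hand side moves the solution from (1, 1) to (0, 2), consistent with a condition number of about 40,000, whereas the identity matrix has a condition number of one.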
bombing process in order to more accurately target the bombs. These devices were considered so
advanced that, if a bomber was shot down or to land in enemy territory, the bombardier was
instructed to first destroy the bombsight before seeking to save himself.
By default, most computers use single precision for calculations; however, double precision is
standard for numerical calculations of the sort we are interested in throughout this text.
(6.5)
Alternatively, one could perturb the righthand side vector c by the small amount c and evaluate
the resulting perturbation in the solution vector x. The result (6.5) is the same but with A
replaced with c.
holds for each row of the matrix. If the greater than sign applies, then the matrix
is said to be strictly diagonally dominant. If the equal sign applies, it is weakly
diagonally dominant. It can be shown that diagonally dominant matrices are well-conditioned. More specifically, it can be proven that for any strictly diagonally
dominant matrix A:
1. A is nonsingular; therefore, it is invertible.
2. Gaussian elimination does not require row interchanges for conditioning.
3. Computations are stable with respect to round-off errors.
In the future, we will check our algorithms for diagonal dominance in order to
ensure that we have well-conditioned matrices.
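A check for diagonal dominance is straightforward to automate. The sketch below assumes the usual row-wise criterion, namely that the magnitude of each diagonal element is at least the sum of the magnitudes of the other elements in its row; the function name and the three test matrices are hypothetical.

```python
def diagonal_dominance(A):
    """Classify A as 'strict', 'weak', or 'none' by the row-wise criterion."""
    strict = True
    for i, row in enumerate(A):
        off_diagonal = sum(abs(v) for j, v in enumerate(row) if j != i)
        if abs(row[i]) < off_diagonal:
            return "none"               # criterion violated in this row
        if abs(row[i]) == off_diagonal:
            strict = False              # equality holds in at least one row
    return "strict" if strict else "weak"

print(diagonal_dominance([[4, 1, 1], [1, 3, 1], [0, 1, 2]]))        # strict
print(diagonal_dominance([[2, -1, 0], [-1, 2, -1], [0, -1, 2]]))    # weak
print(diagonal_dominance([[1, 2], [3, 1]]))                         # none
```

The second matrix is the tridiagonal form that commonly arises from finite-difference approximations; equality holds in its interior row, so it is only weakly diagonally dominant.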
To see the influence of condition number and diagonal dominance on round-off
errors, see the Mathematica notebook "Ill-Conditioned Matrices and Round-Off
Error" (incorporate into text).
The condition number and diagonal dominance address how mathematically
amenable a particular system is to providing an accurate solution. In addition, we
also must be concerned with whether the algorithm that is used to actually obtain
this mathematical solution on a computer is numerically amenable to producing
an accurate solution. This is determined by the algorithm's numerical stability,
which will be discussed in Section 15.2. That is, conditioning is a mathematical
property of the algebraic system itself, and numerical stability is a property of the
numerical algorithm used to obtain its solution. See Trefethen and Bau (1997)
for much more on numerical stability of common operations and algorithms in
numerical linear algebra, including many of those treated in the remainder of this
chapter.
of linear algebraic equations. We begin with a series of direct methods that are
faster than Gaussian elimination. Two are decomposition methods similar to polar and singular-value decomposition discussed in Chapter 2. These methods are
appropriate for implementation in computer algorithms that determine the solution for large systems. We then discuss iterative, or relaxation, methods for
obtaining approximate solutions of systems of linear algebraic equations, including the Jacobi, Gauss-Seidel, successive over-relaxation, and conjugate-gradient
methods. The essential issue of whether these iterative methods converge toward
the exact solution is also addressed.
Finally, we discuss methods for systems of equations having sparse coefficient
matrices. A sparse matrix is one in which relatively few of the elements of the
matrix are nonzero. This is often the case for systems of equations that arise
from numerical methods, for example. In some cases, these sparse matrices also
have a particular structure. For example, we will encounter tridiagonal, block-tridiagonal, and other types of banded matrices.
6.2.1 LU Decomposition
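LU decomposition provides a general method for solving systems governed by any
nonsingular matrix A that has several advantages over Gaussian elimination. We
decompose (factor) A into the product of two matrices
\[ \mathbf{A} = \mathbf{L}\mathbf{U}, \]
where L is lower triangular, and U is upper triangular. For example, if A is $4 \times 4$,
then
\[
\mathbf{L} = \begin{bmatrix}
1 & 0 & 0 & 0 \\
L_{21} & 1 & 0 & 0 \\
L_{31} & L_{32} & 1 & 0 \\
L_{41} & L_{42} & L_{43} & 1
\end{bmatrix}, \quad
\mathbf{U} = \begin{bmatrix}
U_{11} & U_{12} & U_{13} & U_{14} \\
0 & U_{22} & U_{23} & U_{24} \\
0 & 0 & U_{33} & U_{34} \\
0 & 0 & 0 & U_{44}
\end{bmatrix}.
\]
From matrix multiplication, observe that
\[ A_{11} = U_{11}, \quad A_{12} = U_{12}, \quad A_{13} = U_{13}, \quad A_{14} = U_{14}, \]
\[ L_{21} = \frac{A_{21}}{U_{11}}, \]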
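\[
\begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
L_{21} & 1 & 0 & \cdots & 0 \\
L_{31} & L_{32} & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
L_{n1} & L_{n2} & L_{n3} & \cdots & 1
\end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_n \end{bmatrix}
=
\begin{bmatrix} c_1 \\ c_2 \\ c_3 \\ \vdots \\ c_n \end{bmatrix}.
\]
\[ y_i = c_i - \sum_{j=1}^{i-1} L_{ij} y_j, \quad i = 2, 3, \ldots, n, \]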
3. The above approach is called Doolittle's method and results in ones on the
main diagonal of L. Alternatively, Crout's method leads to ones on the main
diagonal of U.
4. This procedure is particularly efficient if A remains the same, while c is
changed (cf. the truss example in Section 1.3). In this case, the L and U
matrices only must be determined once.
5. Sometimes it is necessary to use pivoting, in which rows of A are exchanged,
in order to avoid division by zero (or small numbers). This is analogous to
exchanging rows (equations) in Gaussian elimination.
6. Recall that the determinant of a triangular matrix is the product of the elements along the main diagonal. Because the determinant of L is unity, the
determinant of A is simply
\[ |\mathbf{A}| = |\mathbf{U}| = \prod_{i=1}^{n} U_{ii}, \]
which is the product of the main diagonal elements of the triangular matrix
U.
7. For more details on the implementation of LU decomposition, see Numerical
Recipes.
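The remarks above can be illustrated with a short Python sketch of Doolittle's method without pivoting, followed by forward and backward substitution. The function names and the 3 x 3 test matrix are hypothetical, and exact rational arithmetic is used so that the factors can be inspected directly; a production code would include pivoting.

```python
from fractions import Fraction

def lu_doolittle(A):
    """Factor A = L U with ones on the diagonal of L (no pivoting)."""
    n = len(A)
    A = [[Fraction(v) for v in row] for row in A]
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    U = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):                       # row i of U
            U[i][j] = A[i][j] - sum(L[i][k] * U[k][j] for k in range(i))
        if U[i][i] == 0:
            raise ZeroDivisionError("zero pivot; row exchanges would be needed")
        for r in range(i + 1, n):                   # column i of L
            L[r][i] = (A[r][i] - sum(L[r][k] * U[k][i] for k in range(i))) / U[i][i]
    return L, U

def lu_solve(L, U, c):
    """Solve L y = c by forward substitution, then U x = y by backward substitution."""
    n = len(c)
    y = [Fraction(0)] * n
    for i in range(n):
        y[i] = c[i] - sum(L[i][j] * y[j] for j in range(i))
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

A = [[2, 1, 1], [4, 3, 3], [8, 7, 9]]
L, U = lu_doolittle(A)

# The same factors are reused for two different right-hand sides.
x1 = lu_solve(L, U, [4, 10, 24])
x2 = lu_solve(L, U, [1, 3, 9])

# Determinant of A as the product of the diagonal elements of U.
det = Fraction(1)
for i in range(len(U)):
    det *= U[i][i]
print(x1, x2, det)
```

Note that the factorization is computed once and each additional right-hand side costs only the two triangular solves, which is the efficiency gain described in remark 4.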
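Given the LU decomposition of matrix A, the inverse $\mathbf{A}^{-1}$ may be obtained as
follows. Recall that
\[ \mathbf{A}\mathbf{A}^{-1} = \mathbf{I}. \]
Let us consider the $3 \times 3$ case for illustration, for which, denoting the elements of $\mathbf{A}^{-1}$ by $B_{ij}$, we have
\[
\mathbf{A} \begin{bmatrix} B_{11} \\ B_{21} \\ B_{31} \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad
\mathbf{A} \begin{bmatrix} B_{12} \\ B_{22} \\ B_{32} \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad
\mathbf{A} \begin{bmatrix} B_{13} \\ B_{23} \\ B_{33} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.
\]
Because A does not change, these three systems of equations may be solved
efficiently using the LU decomposition of A.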
6.2.2 Cholesky Decomposition
LU decomposition applies for any nonsingular matrix. If A is also positive definite,
whereby it is real symmetric (or Hermitian more generally) with all positive
eigenvalues, a special case of LU decomposition can be devised that is even more
efficient to compute. This is known as Cholesky decomposition (factorization)
and is given by
\[ \mathbf{A} = \mathbf{U}^T \mathbf{U}, \]
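where
\[
\mathbf{U} = \begin{bmatrix}
U_{11} & U_{12} & \cdots & U_{1n} \\
0 & U_{22} & \cdots & U_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & U_{nn}
\end{bmatrix}.
\]
Then $\mathbf{A} = \mathbf{U}^T\mathbf{U}$ is
\[
\begin{bmatrix}
A_{11} & A_{12} & \cdots & A_{1n} \\
A_{21} & A_{22} & \cdots & A_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
A_{n1} & A_{n2} & \cdots & A_{nn}
\end{bmatrix}
=
\begin{bmatrix}
U_{11} & 0 & \cdots & 0 \\
U_{12} & U_{22} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
U_{1n} & U_{2n} & \cdots & U_{nn}
\end{bmatrix}
\begin{bmatrix}
U_{11} & U_{12} & \cdots & U_{1n} \\
0 & U_{22} & \cdots & U_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & U_{nn}
\end{bmatrix}.
\]
Multiplying the matrices on the right-hand side, we obtain
\[
\begin{aligned}
A_{11} &= U_{11}^2 & &\Rightarrow & U_{11} &= \sqrt{A_{11}}, \\
A_{12} &= U_{11} U_{12} & &\Rightarrow & U_{12} &= \frac{A_{12}}{U_{11}}, \\
&\;\;\vdots \\
A_{1n} &= U_{11} U_{1n} & &\Rightarrow & U_{1n} &= \frac{A_{1n}}{U_{11}},
\end{aligned}
\]
which is the first row of U. This procedure is continued to calculate all $U_{ij}$.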
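After completion of the Cholesky decomposition of A, the system Ax = c may
be solved using forward and backward substitution as in LU decomposition (see
previous section). If the inverse of A is desired, observe that
\[ \mathbf{A}^{-1} = \mathbf{U}^{-1} \left( \mathbf{U}^{-1} \right)^T, \]
where
\[
\mathbf{U}^{-1} = \mathbf{B} = \begin{bmatrix}
B_{11} & B_{12} & \cdots & B_{1n} \\
0 & B_{22} & \cdots & B_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & B_{nn}
\end{bmatrix},
\]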
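The row-by-row procedure for the Cholesky factor can be sketched in a few lines of Python. The function name and the test matrix are hypothetical; the matrix was chosen to be symmetric positive definite with an integer-valued factor so that the result is easy to verify by hand.

```python
import math

def cholesky_upper(A):
    """Return upper triangular U with A = U^T U for symmetric positive definite A."""
    n = len(A)
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        d = A[i][i] - sum(U[k][i] ** 2 for k in range(i))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        U[i][i] = math.sqrt(d)                       # diagonal element of row i
        for j in range(i + 1, n):                    # remainder of row i
            U[i][j] = (A[i][j] - sum(U[k][i] * U[k][j] for k in range(i))) / U[i][i]
    return U

A = [[4.0, 2.0, 2.0],
     [2.0, 5.0, 3.0],
     [2.0, 3.0, 6.0]]
U = cholesky_upper(A)

# Reassemble U^T U to confirm that it reproduces A.
n = len(A)
UtU = [[sum(U[k][i] * U[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
print(U)
```

Only the upper triangle of A is accessed, and one square root is required per row, which reflects why the method is cheaper than a general LU decomposition.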
6.2.3 Partitioning
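Partitioning allows for determination of the inverse of a large matrix by reducing
it down to finding the inverses of many small matrices. Let us partition matrix
A into four submatrices as follows:
\[
\mathbf{A} = \begin{bmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} \\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{bmatrix},
\]
where $\mathbf{A}_{11}$ and $\mathbf{A}_{22}$ are square, but $\mathbf{A}_{12}$ and $\mathbf{A}_{21}$ need not be square (but the
number of rows of $\mathbf{A}_{12}$ must equal the number of rows of $\mathbf{A}_{11}$, and its number of
columns must equal that of $\mathbf{A}_{22}$, for example). Now let
\[
\mathbf{B} = \mathbf{A}^{-1} = \begin{bmatrix} \mathbf{B}_{11} & \mathbf{B}_{12} \\ \mathbf{B}_{21} & \mathbf{B}_{22} \end{bmatrix},
\]
where each submatrix is the same size as its corresponding submatrix in A.
Then
\[
\mathbf{A}\mathbf{B} = \mathbf{I} = \begin{bmatrix} \mathbf{I}_1 & \mathbf{0} \\ \mathbf{0} & \mathbf{I}_2 \end{bmatrix}.
\]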
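Multiplying
\[
\begin{bmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} \\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{bmatrix}
\begin{bmatrix} \mathbf{B}_{11} & \mathbf{B}_{12} \\ \mathbf{B}_{21} & \mathbf{B}_{22} \end{bmatrix}
= \begin{bmatrix} \mathbf{I}_1 & \mathbf{0} \\ \mathbf{0} & \mathbf{I}_2 \end{bmatrix},
\]
we obtain
\[
\begin{aligned}
\mathbf{A}_{11}\mathbf{B}_{11} + \mathbf{A}_{12}\mathbf{B}_{21} &= \mathbf{I}_1, \\
\mathbf{A}_{21}\mathbf{B}_{11} + \mathbf{A}_{22}\mathbf{B}_{21} &= \mathbf{0}, \\
\mathbf{A}_{11}\mathbf{B}_{12} + \mathbf{A}_{12}\mathbf{B}_{22} &= \mathbf{0}, \\
\mathbf{A}_{21}\mathbf{B}_{12} + \mathbf{A}_{22}\mathbf{B}_{22} &= \mathbf{I}_2.
\end{aligned}
\tag{6.7}
\]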
(6.8)
1
In this manner, we obtain the partitions of $\mathbf{A}^{-1}$, for example, $\mathbf{B}_{11}$, by inverting portions of A and then substituting into equation (6.8) to obtain $\mathbf{B}_{21}$. Repeating
this procedure with the last two equations of (6.7) provides $\mathbf{B}_{12}$ and $\mathbf{B}_{22}$. Because smaller is better when inverting matrices, the partitioning procedure can
be implemented recursively.
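A recursive implementation of partitioning might look like the following Python sketch. It is one consistent way of solving the four block equations (6.7), not necessarily the exact sequence intended in the text: it assumes that the lower-right block and the combination A11 - A12 A22^{-1} A21 are both invertible at every level of the recursion, and it performs no pivoting. All function names and the test matrix are hypothetical.

```python
from fractions import Fraction

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def matsub(X, Y):
    return [[a - b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def negate(X):
    return [[-a for a in row] for row in X]

def partitioned_inverse(A):
    """Invert A by recursively partitioning it into four blocks."""
    n = len(A)
    if n == 1:
        return [[1 / Fraction(A[0][0])]]
    k = n // 2
    A11 = [row[:k] for row in A[:k]]
    A12 = [row[k:] for row in A[:k]]
    A21 = [row[:k] for row in A[k:]]
    A22 = [row[k:] for row in A[k:]]
    A22_inv = partitioned_inverse(A22)
    # First two equations of (6.7): eliminate B21 to isolate B11.
    S = matsub(A11, matmul(matmul(A12, A22_inv), A21))
    B11 = partitioned_inverse(S)
    B21 = negate(matmul(matmul(A22_inv, A21), B11))
    # Last two equations of (6.7) give B12 and B22.
    B12 = negate(matmul(matmul(B11, A12), A22_inv))
    B22 = matsub(A22_inv, matmul(matmul(A22_inv, A21), B12))
    top = [r1 + r2 for r1, r2 in zip(B11, B12)]
    bottom = [r1 + r2 for r1, r2 in zip(B21, B22)]
    return top + bottom

A = [[4, 1, 2], [1, 3, 0], [2, 0, 5]]
B = partitioned_inverse(A)
product = matmul(A, B)       # should reproduce the identity matrix
print(B)
```

The recursion bottoms out at 1 x 1 blocks, so only scalar reciprocals are ever computed directly.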
Note that Mathematica and Matlab are also capable of obtaining exact solutions using symbolic
arithmetic, which is the computer analogy to hand calculations in that no roundoff errors are
incurred.
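\[
\begin{aligned}
x_1 &= -\frac{A_{12}}{A_{11}} x_2 - \frac{A_{13}}{A_{11}} x_3 - \cdots - \frac{A_{1n}}{A_{11}} x_n + \frac{c_1}{A_{11}}, \\
x_2 &= -\frac{A_{21}}{A_{22}} x_1 - \frac{A_{23}}{A_{22}} x_3 - \cdots - \frac{A_{2n}}{A_{22}} x_n + \frac{c_2}{A_{22}}, \\
&\;\;\vdots \\
x_n &= -\frac{A_{n1}}{A_{nn}} x_1 - \frac{A_{n2}}{A_{nn}} x_2 - \frac{A_{n3}}{A_{nn}} x_3 - \cdots + \frac{c_n}{A_{nn}}.
\end{aligned}
\]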
The $x_i$, $i = 1, \ldots, n$, on the right-hand sides are taken from the previous iteration
in order to update the solution vector on the left-hand side. In matrix form, this
and other iterative numerical schemes may be expressed as
\[ \mathbf{x}^{(r+1)} = \mathbf{M}\mathbf{x}^{(r)} + \mathbf{c}, \tag{6.9} \]
(6.10)
r = 0, 1, 2, . . . ,
(6.12)
or
\[ \mathbf{x}^{(r+1)} = \mathbf{M}\mathbf{x}^{(r)} + \mathbf{M}_1^{-1}\mathbf{c}, \]
where $\mathbf{M} = \mathbf{M}_1^{-1}\mathbf{M}_2$ is the iteration matrix.
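Let
D = diagonal elements of A,
$-\mathbf{L}$ = lower triangular elements of A less main diagonal,
$-\mathbf{U}$ = upper triangular elements of A less main diagonal.
Therefore, Ax = c becomes
\[ (\mathbf{D} - \mathbf{L} - \mathbf{U})\mathbf{x} = \mathbf{c}. \]
Using this notation, the Jacobi iteration ($\mathbf{M}_1 = \mathbf{D}$, $\mathbf{M}_2 = \mathbf{L} + \mathbf{U}$) is of the form
\[ \mathbf{D}\mathbf{x}^{(r+1)} = (\mathbf{L} + \mathbf{U})\mathbf{x}^{(r)} + \mathbf{c}, \]
or
\[ \mathbf{x}^{(r+1)} = \mathbf{D}^{-1}(\mathbf{L} + \mathbf{U})\mathbf{x}^{(r)} + \mathbf{D}^{-1}\mathbf{c}, \]
such that the iteration matrix is
\[ \mathbf{M} = \mathbf{M}_1^{-1}\mathbf{M}_2 = \mathbf{D}^{-1}(\mathbf{L} + \mathbf{U}). \]
We then check to be sure that the spectral radius of M satisfies the requirement
that $\rho < 1$ to ensure convergence of the iterative scheme according to the general
result in the previous section.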
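A minimal Python sketch of the Jacobi iteration follows. The stopping test (largest change between successive iterates below a tolerance), the zero initial guess, and the tridiagonal test matrix with 2 on the diagonal and -1 on the adjacent diagonals are all assumptions made for illustration; the right-hand side is chosen so that the exact solution is a vector of ones.

```python
def jacobi(A, c, tol=1e-12, max_iter=10_000):
    """Jacobi iteration: every component is updated from the previous iterate."""
    n = len(A)
    x = [0.0] * n
    for r in range(1, max_iter + 1):
        x_new = [(c[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
                 for i in range(n)]
        if max(abs(a - b) for a, b in zip(x_new, x)) < tol:
            return x_new, r
        x = x_new
    raise RuntimeError("Jacobi iteration did not converge")

def model_matrix(n):
    """Tridiagonal test matrix: 2 on the diagonal, -1 on the adjacent diagonals."""
    return [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
            for i in range(n)]

n = 10
A = model_matrix(n)
c = [1.0] + [0.0] * (n - 2) + [1.0]      # exact solution is x = (1, 1, ..., 1)
x, iterations = jacobi(A, c)
print(iterations, max(abs(v - 1.0) for v in x))
```

Several hundred iterations are needed even for this small system, which is typical of a spectral radius close to one.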
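Beyond determining whether the iteration converges, the spectral radius controls how fast it converges: a smaller spectral radius
results in more rapid convergence, that is, in fewer iterations. For the Jacobi
method, the spectral radius of the iteration matrix M is
\[ \rho_{Jac}(n) = \cos\left( \frac{\pi}{n+1} \right). \]
If n is large, then from the Taylor series for cosine
\[ \rho_{Jac}(n) = 1 - \frac{1}{2} \left( \frac{\pi}{n+1} \right)^2 + \cdots. \tag{6.13} \]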
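\[
\rho_{GS}(n) = \rho_{Jac}^2(n) = \cos^2\left( \frac{\pi}{n+1} \right)
= \left[ 1 - \frac{1}{2} \left( \frac{\pi}{n+1} \right)^2 + \cdots \right]^2
= 1 - \left( \frac{\pi}{n+1} \right)^2 + \cdots.
\tag{6.14}
\]
Consequently, the rate of convergence is twice as fast as for the Jacobi method
for large n, that is, the Gauss-Seidel method requires only one-half the iterations
for the same level of accuracy.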
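The factor-of-two estimate can be examined numerically. The sketch below runs the Jacobi and Gauss-Seidel iterations on the same tridiagonal test matrix (2 on the diagonal, -1 on the adjacent diagonals), which is assumed here as a model problem; the function names, tolerance, and stopping test are illustrative choices. The only difference between the two loops is that Gauss-Seidel uses each updated component as soon as it is available.

```python
def jacobi(A, c, tol=1e-12, max_iter=10_000):
    n = len(A)
    x = [0.0] * n
    for r in range(1, max_iter + 1):
        x_new = [(c[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
                 for i in range(n)]
        if max(abs(a - b) for a, b in zip(x_new, x)) < tol:
            return x_new, r
        x = x_new
    raise RuntimeError("Jacobi iteration did not converge")

def gauss_seidel(A, c, tol=1e-12, max_iter=10_000):
    n = len(A)
    x = [0.0] * n
    for r in range(1, max_iter + 1):
        change = 0.0
        for i in range(n):
            # x[j] for j < i already holds values from the current iteration.
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            new = (c[i] - s) / A[i][i]
            change = max(change, abs(new - x[i]))
            x[i] = new
        if change < tol:
            return x, r
    raise RuntimeError("Gauss-Seidel iteration did not converge")

n = 10
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
c = [1.0] + [0.0] * (n - 2) + [1.0]      # exact solution is all ones

x_jac, iters_jac = jacobi(A, c)
x_gs, iters_gs = gauss_seidel(A, c)
print(iters_jac, iters_gs)
```

For this model problem, the Gauss-Seidel count should come out at roughly half the Jacobi count, in line with (6.14); the exact ratio depends on the tolerance and the initial guess.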
Remark:
1. It can be shown that strict diagonal dominance of A is a sufficient, but not
necessary, condition for convergence of the Jacobi and Gauss-Seidel iteration
methods. That is, the spectral radius is such that $\rho < 1$ for the iteration matrix
$\mathbf{M} = \mathbf{M}_1^{-1}\mathbf{M}_2$ (see Morton and Mayers, p. 205 for proof).
xi
(r)
= (1 )xi + xi ,
(6.15)
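where $\omega$ is the acceleration, or relaxation, parameter and $1 < \omega < 2$ for convergence (Morton & Mayers, p. 206). Note that $\omega = 1$ corresponds to the Gauss-Seidel method.
In matrix form, the SOR method is given by $\mathbf{M}_1 = \mathbf{D} - \omega\mathbf{L}$, $\mathbf{M}_2 = (1 - \omega)\mathbf{D} + \omega\mathbf{U}$. Then
\[ (\mathbf{D} - \omega\mathbf{L})\mathbf{x}^{(r+1)} = \left[ (1 - \omega)\mathbf{D} + \omega\mathbf{U} \right] \mathbf{x}^{(r)} + \omega\mathbf{c}, \]
or
\[ \mathbf{x}^{(r+1)} = (\mathbf{D} - \omega\mathbf{L})^{-1} \left[ (1 - \omega)\mathbf{D} + \omega\mathbf{U} \right] \mathbf{x}^{(r)} + \omega (\mathbf{D} - \omega\mathbf{L})^{-1}\mathbf{c}. \]
Therefore, the iteration matrix is
\[ \mathbf{M} = \mathbf{M}_1^{-1}\mathbf{M}_2 = (\mathbf{D} - \omega\mathbf{L})^{-1} \left[ (1 - \omega)\mathbf{D} + \omega\mathbf{U} \right]. \]
It can be shown that the optimal value of $\omega$ that minimizes the spectral radius,
and consequently the number of iterations, is (see, for example, Morton and
Mayers, p. 212, and Moin, p. 146)
\[ \omega_{opt} = \frac{2}{1 + \sqrt{1 - \rho_{Jac}^2}}, \tag{6.16} \]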
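where $\rho_{Jac}$ is the spectral radius of the Jacobi iteration matrix. The corresponding spectral radius of the SOR iteration matrix is
\[ \rho_{SOR} = \omega_{opt} - 1 = \left[ \frac{\rho_{Jac}}{1 + \sqrt{1 - \rho_{Jac}^2}} \right]^2. \tag{6.17} \]
For large n, substituting (6.13) into (6.16) gives
\[
\omega_{opt} = \frac{2}{1 + \sqrt{1 - \rho_{Jac}^2}}
= \frac{2}{1 + \sqrt{1 - \left[ 1 - \frac{1}{2} \left( \frac{\pi}{n+1} \right)^2 + \cdots \right]^2}}
= \frac{2}{1 + \sqrt{1 - \left[ 1 - \left( \frac{\pi}{n+1} \right)^2 + \cdots \right]}},
\]
so that
\[ \omega_{opt} \approx \frac{2}{1 + \frac{\pi}{n+1}}. \tag{6.18} \]
Similarly,
\[
\rho_{SOR} \approx \left[ \frac{1 - \frac{1}{2} \left( \frac{\pi}{n+1} \right)^2}{1 + \frac{\pi}{n+1}} \right]^2
\approx \left\{ \left[ 1 - \frac{1}{2} \left( \frac{\pi}{n+1} \right)^2 \right] \left[ 1 - \frac{\pi}{n+1} + \cdots \right] \right\}^2
\approx \left[ 1 - \frac{\pi}{n+1} + \cdots \right]^2,
\]
or
\[ \rho_{SOR} \approx 1 - \frac{2\pi}{n+1}. \tag{6.19} \]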
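The benefit of over-relaxation can be checked with a short Python sketch. It applies SOR componentwise, blending the previous value with the Gauss-Seidel update, and evaluates the optimal relaxation parameter from (6.16) using the Jacobi spectral radius cos(pi/(n+1)). The tridiagonal test matrix (2 on the diagonal, -1 on the adjacent diagonals), for which that spectral radius is assumed to apply, as well as the function name, tolerance, and stopping test, are illustrative assumptions.

```python
import math

def sor(A, c, omega, tol=1e-12, max_iter=10_000):
    """Successive over-relaxation; omega = 1 reduces to Gauss-Seidel."""
    n = len(A)
    x = [0.0] * n
    for r in range(1, max_iter + 1):
        change = 0.0
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            gs_value = (c[i] - s) / A[i][i]              # Gauss-Seidel update
            new = (1.0 - omega) * x[i] + omega * gs_value
            change = max(change, abs(new - x[i]))
            x[i] = new
        if change < tol:
            return x, r
    raise RuntimeError("SOR iteration did not converge")

n = 10
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
c = [1.0] + [0.0] * (n - 2) + [1.0]          # exact solution is all ones

rho_jac = math.cos(math.pi / (n + 1))
omega_opt = 2.0 / (1.0 + math.sqrt(1.0 - rho_jac ** 2))

x_gs, iters_gs = sor(A, c, omega=1.0)
x_sor, iters_sor = sor(A, c, omega=omega_opt)
print(omega_opt, iters_gs, iters_sor)
```

With the optimal parameter, the iteration count should drop substantially relative to Gauss-Seidel, and the gap widens as n grows because the SOR spectral radius approaches one only linearly in 1/(n+1) rather than quadratically.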
(6.20)
(6.21)
This is the approach used by the builtin Mathematica and Matlab functions Eigenvalues[]/
Eigenvectors[] and eig(), respectively.
(6.22)
(x = Qy) .
(6.23)
(6.24)
(6.25)
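Because $\mathbf{Q}_0$ is orthogonal, premultiplying equation (6.24) by $\mathbf{Q}_0^{-1} = \mathbf{Q}_0^T$ gives
\[ \mathbf{Q}_0^T \mathbf{A}_0 = \mathbf{Q}_0^T \mathbf{Q}_0 \mathbf{R}_0 = \mathbf{R}_0. \]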
10
Recall that in LU decomposition, the "L" refers to a lower triangular matrix, and "U" refers to an
upper triangular matrix, whereas here it is customary to use "L" for "left" and "R" for "right"
triangular matrices. More to the point, a left triangular matrix is lower triangular, and a
right triangular matrix is upper triangular.
(6.26)
k = 0, 1, 2, . . . ,
(6.27)
Ppq = s,
Pqp = s,
..
c 0 0 s
0
1
0
.
..
.
..
..
P=
.
0
1
0
s 0 0 c
..
1
Observe the effect of transforming an n-dimensional vector x according to the
transformation
\[ \mathbf{y} = \mathbf{P}\mathbf{x}, \]
(6.28)
(6.29)
yq = sxp + cxq .
(6.30)
(6.31)
(6.32)
In this manner, the QR decomposition (6.31) and (6.32) is obtained from a series
of plane (Givens or Householder) rotations. Givens transformations are most efficient for large, sparse, structured matrices, which can be configured to only zero
the elements that are not already zero. There is a fast Givens transformation,
for which the P matrices are not orthogonal, but the QR decompositions can be
obtained two times faster than in the standard Givens transformation illustrated
here. Convergence of the iterative QR method may be accelerated using shifting
(see, for example, Numerical Recipes, Section 11.3).
The operation counts for the QR method per iteration are as follows: $O(n^3)$
for a dense matrix, $O(n^2)$ for a Hessenberg matrix, and $O(n)$ for a tridiagonal
matrix. Thus, the most efficient procedure is as follows:
1. Transform A to a similar tridiagonal or Hessenberg form if A is symmetric or
nonsymmetric, respectively. This is done using a series of similarity transformations based on Householder rotations for dense matrices or Givens rotations
for sparse matrices.
2. Use the iterative QR method to obtain the eigenvalues of the tridiagonal or
Hessenberg matrix.
See the Mathematica notebook QRmethod.nb for an illustration of how QR
decomposition is used in an iterative algorithm to obtain the eigenvalues of a
matrix.
See Trefethen and Bau (1997) for the QR algorithm with shifting, which accelerates the convergence of the iterative method.
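As a rough illustration of the iterative QR method, the following Python sketch computes a QR decomposition using Givens rotations and then repeats the standard unshifted step of factoring the current matrix and multiplying the factors in reverse order. The sign convention chosen for the rotation, the function names, the fixed number of iterations, and the tridiagonal test matrix are assumptions made for this sketch and may differ from the convention used in the text; no shifting or preliminary reduction is included.

```python
import math

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def givens_qr(A):
    """Return (Q, R) with A = Q R, Q orthogonal and R upper (right) triangular."""
    n = len(A)
    R = [row[:] for row in A]
    Q = [[float(i == j) for j in range(n)] for i in range(n)]
    for col in range(n - 1):
        for row in range(col + 1, n):
            a, b = R[col][col], R[row][col]
            if b == 0.0:
                continue                      # element is already zero
            r = math.hypot(a, b)
            c, s = a / r, b / r               # cosine and sine of the rotation
            for j in range(n):                # rotate rows `col` and `row` of R
                u, v = R[col][j], R[row][j]
                R[col][j], R[row][j] = c * u + s * v, -s * u + c * v
            for i in range(n):                # accumulate the transposed rotation in Q
                u, v = Q[i][col], Q[i][row]
                Q[i][col], Q[i][row] = c * u + s * v, -s * u + c * v
    return Q, R

def qr_eigenvalues(A, iterations=100):
    """Unshifted QR iteration: factor A_k = Q_k R_k, then set A_{k+1} = R_k Q_k."""
    Ak = [row[:] for row in A]
    for _ in range(iterations):
        Q, R = givens_qr(Ak)
        Ak = matmul(R, Q)                     # similar to A_k, so eigenvalues are kept
    return [Ak[i][i] for i in range(len(Ak))]

A = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
Q, R = givens_qr(A)
eigs = sorted(qr_eigenvalues(A))
print(eigs)
```

For this symmetric tridiagonal matrix, the diagonal of the iterated matrix approaches the eigenvalues 2 - sqrt(2), 2, and 2 + sqrt(2). Because the matrix is tridiagonal, most rotations are skipped, which hints at why reducing to tridiagonal or Hessenberg form first is worthwhile.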
The iterative QR method is the workhorse of the vast majority of eigenproblem
solvers. Although we are interested in square matrices here, the QR decomposition
exists for any rectangular matrix as well. For A $m \times n$, Q is $m \times m$, and R is
$m \times n$. Carried to completion, the QR method provides approximations for the
full spectrum consisting of all eigenvalues and eigenvectors of a matrix.
Although primarily used to obtain eigenvalues and eigenvectors, the QR decomposition can also be used to determine the solution of the system of equations
Ax = c. Given the QR decomposition of A, the system of equations becomes
\[ \mathbf{Q}\mathbf{R}\mathbf{x} = \mathbf{c}, \]
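   i) Multiply $\mathbf{q}_i = \mathbf{A}\mathbf{q}_{i-1}$.
   ii) Orthonormalize $\mathbf{q}_i$ against $\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_{i-1}$.
   iii) Append $\mathbf{q}_i$ to Q.
   iv) Form the Hessenberg matrix $\mathbf{H} = \mathbf{Q}^T\mathbf{A}\mathbf{Q}$.
   v) Determine the eigenvalues of H.
5. End Do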
At each step $i = 2, \ldots, k$, an $n \times i$ orthonormal matrix Q is produced that forms
an orthonormal basis for the Krylov subspace $K_i(\mathbf{A}, \mathbf{q}_0)$. Using the projection
matrix Q, we transform A to produce an $i \times i$ Hessenberg matrix H (or tridiagonal for symmetric A), which is an orthogonal projection of A onto the Krylov
subspace $K_i$. The eigenvalues of H, sometimes called the Ritz eigenvalues, approximate the largest i eigenvalues of A. The approximations of the eigenvalues
improve as each step is incorporated, and we obtain the approximation of one
additional eigenvalue.
Remarks:
1. Because $k \ll n$, we only require the determination of eigenvalues of Hessenberg
matrices that are no larger than $k \times k$ as opposed to the original $n \times n$ matrix
A.
2. Although the outcome of each step depends upon the starting Arnoldi vector
q0 used, the procedure converges to the correct eigenvalues of matrix A for
any q0 .
3. The more sparse the matrix A is, the smaller k can be and still obtain a good
approximation of the largest k eigenvalues of A.
4. When applied to symmetric matrices, the Arnoldi method reduces to the Lanczos method.
5. A shift and invert approach can be incorporated to determine the k eigenvalues
close to a specified part of the spectrum rather than that with the largest
magnitude. For example, it can be designed to determine the k eigenvalues
with the largest real or imaginary part.
6. When seeking a set of eigenvalues in a particular portion of the full spectrum,
it is desirable that the starting Arnoldi vector q0 be in (or nearly in) the
subspace spanned by the eigenvectors corresponding to the sought after eigenvalues. As the Arnoldi method progresses, we get better approximations of the
desired eigenvectors that can then be used to form a more desirable starting
vector. This is known as the implicitly restarted Arnoldi method and is based
on the implicitly-shifted QR decomposition method. Restarting also reduces
storage requirements by keeping k small.
7. The Arnoldi method may also be adapted to solve linear systems of equations;
this is called the generalized minimal residual (GMRES) method.
8. The Arnoldi method can be designed to apply to the generalized eigenproblem
\[ \mathbf{A}\mathbf{x} = \lambda \mathbf{B}\mathbf{x}, \]
where it is required that B be positive definite, that is, have all positive eigenvalues. The generalized eigenproblem is encountered in structural design problems in which A is called the stiffness matrix, and B is called the mass matrix.
It also arises in hydrodynamic stability.
9. Many have standardized on the Arnoldi method as implemented in ARPACK
(http://www.caam.rice.edu/software/ARPACK/). ARPACK was developed
at Rice University in the mid 1990s, first as a Fortran 77 library of subroutines, and subsequently it has been implemented as ARPACK++ for C++.
It has been implemented in Matlab via the eigs() function, where the "s" denotes sparse. In addition, it has been implemented in Mathematica, where
one includes the option Method -> "Arnoldi" in the Eigenvalues[] function.
The Arnoldi method is illustrated in more detail in the Mathematica notebook
Arnoldi.nb.
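For readers without access to the notebook, the following Python sketch shows the core of the Arnoldi process. It is a simplified variant under stated assumptions: the orthonormalization uses modified Gram-Schmidt, the entries of the Hessenberg matrix are taken from the Gram-Schmidt coefficients (which in exact arithmetic equals the projection of A onto the Krylov basis), and the eigenvalues of the small projected matrix are evaluated in closed form for the symmetric 2 x 2 case only. The function names, the test matrix, and the starting vector are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def arnoldi(A, q0, k):
    """Return k orthonormal Krylov vectors and the k x k Hessenberg matrix H."""
    norm = math.sqrt(dot(q0, q0))
    Q = [[v / norm for v in q0]]              # basis vectors, stored as rows
    H = [[0.0] * k for _ in range(k)]
    for j in range(k):
        w = matvec(A, Q[j])                   # multiply by A
        for i in range(j + 1):                # orthogonalize against earlier vectors
            H[i][j] = dot(Q[i], w)
            w = [wv - H[i][j] * qv for wv, qv in zip(w, Q[i])]
        h = math.sqrt(dot(w, w))
        if j + 1 < k:
            if h < 1e-12:
                break                         # Krylov subspace is exhausted
            H[j + 1][j] = h
            Q.append([wv / h for wv in w])    # normalize and append
    return Q, H

def eigenvalues_sym_2x2(H):
    """Closed-form eigenvalues of a symmetric 2 x 2 matrix (larger one first)."""
    half_trace = (H[0][0] + H[1][1]) / 2.0
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    disc = math.sqrt(half_trace ** 2 - det)
    return half_trace + disc, half_trace - disc

A = [[2.0, -1.0, 0.0, 0.0],
     [-1.0, 2.0, -1.0, 0.0],
     [0.0, -1.0, 2.0, -1.0],
     [0.0, 0.0, -1.0, 2.0]]
Q, H = arnoldi(A, [1.0, 0.0, 0.0, 0.0], k=2)
ritz = eigenvalues_sym_2x2(H)
print(H, ritz)
```

Because the test matrix is symmetric, H is tridiagonal, as in the Lanczos method. After only two steps, the larger Ritz value already gives a rough estimate of the largest eigenvalue of A, and the estimate improves as k increases.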
References:
Arnoldi, W. (1951) Q. Appl. Math. 9, 17. (Did not originally apply to the
eigenproblem!)
Nayar, N. & Ortega, J. M. (1993) Computation of Selected Eigenvalues of
Generalized Eigenvalue Problems. J. Comput. Phys. 108, pp. 8-14.
Saad, Y. Iterative Methods for Sparse Linear Systems, SIAM, Philadelphia (2003).
Radke, R. A Matlab Implementation of the Implicitly Restarted Arnoldi Method
for Solving Large-Scale Eigenvalue Problems, MS Thesis, Rice University (1996).
6.4 Singular-Value Decomposition
7
Nonlinear Algebraic Equations - Root Finding
8
Optimization of Algebraic Systems
9
Curve Fitting and Interpolation
10
Numerical Integration
11
Finite-Difference Methods
One of the central goals of applied mathematics is to develop methods for solving differential equations as they form the governing equations for many topics
in the sciences and engineering. In Chapter 2, we addressed the solution of systems of linear first-order (and by extension higher-order) ordinary differential
equations by diagonalization. In Chapter 3, we used eigenfunction expansions
to develop methods for solving self-adjoint ordinary differential equations, with
extension to certain linear partial differential equations via the method of separation of variables. While these methods represent fundamental techniques in
applied mathematics, the scope of their application is very limited, primarily owing to their restriction to linear differential equations. As a complement to these
analytical techniques, therefore, numerical methods open up the full spectrum of
ordinary and partial differential equations for solution. Although these solutions
are approximate, the techniques are adaptable to ordinary differential equations
in the form of initial- and boundary-value problems as well as large classes of
partial differential equations.
Insofar as many topics in science and engineering involve solving differential equations, and insofar as very few practical problems are amenable to exact
closed-form solution, numerical methods form an essential tool in the researcher's
and practitioner's arsenal. In this chapter, we introduce finite-difference methods
and focus on aspects common to solving both ordinary and partial differential
equations. A simple initial-value problem and a boundary-value problem are used
to develop many of the ideas and provide a framework for thinking about numerical methods as applied to differential equations. The following two chapters then
address methods for ordinary and partial differential equations in turn.
[Figure: flowchart of the general numerical solution procedure. Physical System (i.e., reality) -> Mathematical Model (i.e., governing equations; odes or pdes) -> Analytical Solution, or Discretization -> Matrix Solver -> Numerical Solution.]
physical laws include, for example, conservation of mass, momentum, and energy, and models include any assumptions or idealizations applied in order to simplify the governing equations. When possible, analytical solutions of the mathematical model are sought. If this is not possible, we turn to numerical methods, which are the focus of Part II. The second step is to discretize the mathematical model, which involves approximation of the continuous differential equation(s) by a system of algebraic equations for the dependent variables at discrete locations in the independent variables (space and time). For example, see Figure 11.2. The discretization step leads to a system of linear algebraic equations, whose numerical solution comprises step three of the numerical solution procedure. The method of discretization often produces a large, sparse matrix problem with a particular structure. For example, we will see that second-order accurate, central differences
Figure 11.3 Schematic of the forced spring-mass system.
the weight, and $f_f = F\cos(\omega t)$ being the force owing to the forcing, Newton's second law leads to
$$ ma = f_d + f_s - W - f_f, $$
or
$$ ma = -cv - ku - mg - F\cos(\omega t). $$
Writing in terms of the mass position $u(t)$ only, the governing equation is
$$ \frac{d^2u}{dt^2} + \frac{c}{m}\frac{du}{dt} + \frac{k}{m}u = -g - \frac{F}{m}\cos(\omega t), \tag{11.1} $$
which is a second-order, linear, nonhomogeneous ordinary differential equation in the form of an initial-value problem.
Now let us consider the forced spring-mass system in the context of the general numerical solution procedure. Step 1 is application of the physical law, namely conservation of momentum in the form of Newton's second law, and models. In this case, we assume that the spring is linear elastic and that the moving mass is subject to low-speed Stokes drag. This results in the mathematical model given by equation (11.1) in the form of a second-order, linear ordinary differential equation. Although it is possible in this case to obtain an exact solution analytically, let us continue with the numerical solution procedure.
The second step of the numerical solution procedure involves discretizing the continuous governing equation and time domain. We approximate the continuous solution $u(t)$ and differential equation at discrete locations in time $t_i$, which are separated by the small time step $\Delta t$. In order to see how the derivatives in the governing equation (11.1) are discretized, recall the definition of the derivative with $\Delta t$ small, but finite:
$$ \frac{du}{dt} = u'(t) = \lim_{\Delta t \to 0}\frac{u(t+\Delta t) - u(t)}{\Delta t} \approx \frac{u(t+\Delta t) - u(t)}{\Delta t}. $$
As suggested by this definition, the derivative of $u(t)$ can be approximated by linear combinations of the values of $u$ at adjacent time steps (more on this later). Such finite differences, that is, differences of the dependent variable between adjacent finite time steps, allow for calculation of the position $u$ at the current time step in terms of values at previous time steps. In this way, the value of the position is calculated at successive time steps in turn. See Figure 11.4 for a sample solution.
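As a rough illustration of such time marching, the sketch below advances equation (11.1) with central differences for both time derivatives. The function name `march_spring_mass`, its argument list, and the Taylor-series start-up used for the first step are assumptions made for this example; they are not the scheme developed later in the text.

```python
import math

def march_spring_mass(m, c, k, F, omega, g, u0, v0, dt, n_steps):
    """Return [u_0, ..., u_n] for u'' + (c/m) u' + (k/m) u = -g - (F/m) cos(omega t)."""
    def rhs(t):
        return -g - (F / m) * math.cos(omega * t)

    # Start-up step (assumed): second-order Taylor expansion about t = 0.
    a0 = rhs(0.0) - (c / m) * v0 - (k / m) * u0
    u = [u0, u0 + dt * v0 + 0.5 * dt**2 * a0]

    # Coefficients multiplying u_{n+1} and u_{n-1} in the difference equation.
    lead = 1.0 / dt**2 + c / (2.0 * m * dt)
    lag = 1.0 / dt**2 - c / (2.0 * m * dt)
    for n in range(1, n_steps):
        t = n * dt
        # Central differences for u'' and u' at time t_n, solved for u_{n+1}.
        u_next = (rhs(t) - (k / m) * u[n] + 2.0 * u[n] / dt**2 - lag * u[n - 1]) / lead
        u.append(u_next)
    return u
```

With no damping, forcing, or gravity and $k/m = 1$, the exact solution for $u(0) = 1$, $u'(0) = 0$ is $\cos t$, which gives a simple check on the marching loop.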
11.1.2 Properties of a Numerical Solution
Each step of the numerical solution procedure produces its own source of error. Step one produces modeling errors, which are the differences between the actual physical system and the exact solution of the mathematical model. The difference between the exact solution of the governing equations and the exact solution of the system of algebraic equations is the discretization error. This error, which is produced by step two, comprises two contributing factors: 1) the inherent error of the method of discretization, that is, the truncation error, and 2) the error owing to the resolution of the computational grid used in the discretization. Finally, unless a direct method is used, there is the iterative convergence error, which is the difference between the iterative numerical solution and the exact solution of the algebraic equations. In both direct and iterative methods, round-off errors arise as discussed in Section 4.1.2.
[Figure: plot of y(t).]
As we discuss various numerical methods applied to a variety of types of problems and equations, there are several properties of successful numerical solution methods that must be considered. The first is consistency, which requires that the discretized equations formally become the governing equations as the grid size goes to zero; this is a property of the discretization method. Specifically, the truncation error, which is the difference between the solution to the discretized equations and the exact solution of the governing equations, must go to zero as the grid size goes to zero. For example, we will see that a finite-difference approximation to the first-order derivative of $u$ with respect to $x$ is given by
$$ \frac{du}{dx} = \frac{u_{i+1} - u_{i-1}}{2\Delta x} + O(\Delta x^2), $$
where $O(\Delta x^2)$ is the truncation error. Therefore, from the definition of the derivative, as $\Delta x \to 0$, $\frac{u_{i+1} - u_{i-1}}{2\Delta x} \to \frac{du}{dx}$ for consistency.
The second property is stability, which requires that a time-marching numerical procedure must not magnify round-off errors produced in the numerical solution such that the numerical solution diverges from the exact solution. This will be discussed in more detail in Sections (odes) and 15.2. Note the similarity between stability and conditioning discussed in Section 6.1.2 as they both involve examining the effect of disturbances. The distinction is that conditioning quantifies the effect of disturbances in the system of equations itself, whereas stability determines the effect of disturbances in the algorithm that is used to solve the system of equations.
The third property is convergence, whereby the numerical solution of the dis
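Forward Difference:
$$ \left.\frac{du}{dx}\right|_{x_i} \approx \frac{u_{i+1} - u_i}{\Delta x}, $$
Backward Difference:
$$ \left.\frac{du}{dx}\right|_{x_i} \approx \frac{u_i - u_{i-1}}{\Delta x}, $$
Central Difference:
$$ \left.\frac{du}{dx}\right|_{x_i} \approx \frac{u_{i+1} - u_{i-1}}{2\Delta x}. $$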
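Intuitively, we might expect that the central difference will provide a more accurate approximation than the forward and backward differences. Indeed, this is the case, as shown formally using Taylor series expansions.
Finite-difference approximations are based on truncated Taylor series expansions, which allow us to express the local behavior of a function in the vicinity of some point in terms of the value of the function and its derivatives at that point. Consider the Taylor series expansion of $u(x)$ in the vicinity of the point $x_i$
$$ u(x) = u(x_i) + (x - x_i)\left(\frac{du}{dx}\right)_i + \frac{(x - x_i)^2}{2!}\left(\frac{d^2u}{dx^2}\right)_i + \frac{(x - x_i)^3}{3!}\left(\frac{d^3u}{dx^3}\right)_i + \cdots + \frac{(x - x_i)^n}{n!}\left(\frac{d^nu}{dx^n}\right)_i + \cdots. \tag{11.2} $$
Applying this expansion at $x = x_{i+1}$, for which $x_{i+1} - x_i = \Delta x$, gives
$$ u_{i+1} = u_i + \Delta x\left(\frac{du}{dx}\right)_i + \frac{\Delta x^2}{2}\left(\frac{d^2u}{dx^2}\right)_i + \frac{\Delta x^3}{6}\left(\frac{d^3u}{dx^3}\right)_i + \cdots + \frac{\Delta x^n}{n!}\left(\frac{d^nu}{dx^n}\right)_i + \cdots. \tag{11.3} $$
Solving for $(du/dx)_i$ gives
$$ \left(\frac{du}{dx}\right)_i = \frac{u_{i+1} - u_i}{\Delta x} - \frac{\Delta x}{2}\left(\frac{d^2u}{dx^2}\right)_i - \cdots - \frac{\Delta x^{n-1}}{n!}\left(\frac{d^nu}{dx^n}\right)_i - \cdots. \tag{11.4} $$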
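Similarly, applying the Taylor series (11.2) at $x = x_{i-1}$, for which $x_{i-1} - x_i = -\Delta x$, gives
$$ u_{i-1} = u_i - \Delta x\left(\frac{du}{dx}\right)_i + \frac{\Delta x^2}{2}\left(\frac{d^2u}{dx^2}\right)_i - \frac{\Delta x^3}{6}\left(\frac{d^3u}{dx^3}\right)_i + \cdots + \frac{(-1)^n\Delta x^n}{n!}\left(\frac{d^nu}{dx^n}\right)_i + \cdots. \tag{11.5} $$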
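Solving again for $(du/dx)_i$ gives
$$ \left(\frac{du}{dx}\right)_i = \frac{u_i - u_{i-1}}{\Delta x} + \frac{\Delta x}{2}\left(\frac{d^2u}{dx^2}\right)_i - \frac{\Delta x^2}{6}\left(\frac{d^3u}{dx^3}\right)_i + \cdots + \frac{(-1)^n\Delta x^{n-1}}{n!}\left(\frac{d^nu}{dx^n}\right)_i + \cdots. \tag{11.6} $$
Alternatively, subtract equation (11.5) from (11.3) to obtain
$$ u_{i+1} - u_{i-1} = 2\Delta x\left(\frac{du}{dx}\right)_i + \frac{\Delta x^3}{3}\left(\frac{d^3u}{dx^3}\right)_i + \cdots + \frac{2\Delta x^{2n+1}}{(2n+1)!}\left(\frac{d^{2n+1}u}{dx^{2n+1}}\right)_i + \cdots, \tag{11.7} $$
and solve for $(du/dx)_i$ to obtain
$$ \left(\frac{du}{dx}\right)_i = \frac{u_{i+1} - u_{i-1}}{2\Delta x} - \frac{\Delta x^2}{6}\left(\frac{d^3u}{dx^3}\right)_i - \cdots - \frac{\Delta x^{2n}}{(2n+1)!}\left(\frac{d^{2n+1}u}{dx^{2n+1}}\right)_i - \cdots. \tag{11.8} $$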
If all of the terms are retained in the expansions, equations (11.4), (11.6), and (11.8) are exact expressions for the first derivative $(du/dx)_i$. Approximate finite-difference expressions for the first derivative may then be obtained by truncating the series after the first term:
Forward Difference:
$$ \left.\frac{du}{dx}\right|_{x_i} = \frac{u_{i+1} - u_i}{\Delta x} + O(\Delta x), $$
Backward Difference:
$$ \left.\frac{du}{dx}\right|_{x_i} = \frac{u_i - u_{i-1}}{\Delta x} + O(\Delta x), $$
Central Difference:
$$ \left.\frac{du}{dx}\right|_{x_i} = \frac{u_{i+1} - u_{i-1}}{2\Delta x} + O(\Delta x^2). $$
The $O(\Delta x)$ and $O(\Delta x^2)$ terms represent the truncation error of the corresponding approximation. For small $\Delta x$, successive terms in the Taylor series get smaller, and the order of the truncation error is given by the first truncated term. We say that the forward- and backward-difference approximations are first-order accurate, and the central-difference approximation is second-order accurate. Observe that the central-difference approximation is indeed more accurate than the forward and backward differences as expected. Note also that the truncation error arises because of our choice of algorithm, whereas the round-off error arises because of the way calculations are carried out on a computer. Therefore, truncation error would result even on a perfect computer using exact arithmetic.
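The orders of accuracy quoted above can be checked numerically. The sketch below estimates the observed order from the errors at step sizes $h$ and $h/2$; the helper name `observed_order` and the choice of test function are assumptions for illustration only.

```python
import math

def forward(u, x, h):
    return (u(x + h) - u(x)) / h

def backward(u, x, h):
    return (u(x) - u(x - h)) / h

def central(u, x, h):
    return (u(x + h) - u(x - h)) / (2.0 * h)

def observed_order(scheme, u, du_exact, x, h):
    """Estimate p in error ~ h**p by comparing the errors at h and h/2."""
    e_h = abs(scheme(u, x, h) - du_exact)
    e_half = abs(scheme(u, x, h / 2.0) - du_exact)
    return math.log2(e_h / e_half)
```

For $u = \sin x$ at $x = 1$ with $h = 0.01$, the forward and backward differences should return an order close to one and the central difference an order close to two.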
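Higher-order approximations and/or higher-order derivatives may be obtained by various manipulations of the Taylor series at additional points. For example, to obtain a second-order accurate forward-difference approximation to the first derivative, apply the Taylor series (11.2) at $x = x_{i+2}$, for which $x_{i+2} - x_i = 2\Delta x$:
$$ u_{i+2} = u_i + 2\Delta x\left(\frac{du}{dx}\right)_i + 2\Delta x^2\left(\frac{d^2u}{dx^2}\right)_i + \frac{4\Delta x^3}{3}\left(\frac{d^3u}{dx^3}\right)_i + \cdots. \tag{11.9} $$
The $(d^2u/dx^2)_i$ term can be eliminated by taking $4 \times (11.3) - (11.9)$ to obtain
$$ 4u_{i+1} - u_{i+2} = 3u_i + 2\Delta x\left(\frac{du}{dx}\right)_i - \frac{2\Delta x^3}{3}\left(\frac{d^3u}{dx^3}\right)_i + \cdots. $$
Solving for $(du/dx)_i$ gives
$$ \left(\frac{du}{dx}\right)_i = \frac{-3u_i + 4u_{i+1} - u_{i+2}}{2\Delta x} + \frac{\Delta x^2}{3}\left(\frac{d^3u}{dx^3}\right)_i + \cdots, \tag{11.10} $$
which is second-order accurate and involves the point of interest and the next two points to the right.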
For a second-order accurate central-difference approximation to the second derivative, add equations (11.3) and (11.5) for $u_{i+1}$ and $u_{i-1}$, respectively, to eliminate the $(du/dx)_i$ term. This gives
$$ u_{i+1} + u_{i-1} = 2u_i + \Delta x^2\left(\frac{d^2u}{dx^2}\right)_i + \frac{\Delta x^4}{12}\left(\frac{d^4u}{dx^4}\right)_i + \cdots. $$
Solving for $(d^2u/dx^2)_i$ leads to
$$ \left(\frac{d^2u}{dx^2}\right)_i = \frac{u_{i+1} - 2u_i + u_{i-1}}{\Delta x^2} - \frac{\Delta x^2}{12}\left(\frac{d^4u}{dx^4}\right)_i + \cdots, \tag{11.11} $$
which is second-order accurate and involves the point of interest and its nearest neighbors to the left and right.
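A short sketch of the two second-order formulas, (11.10) and (11.11), follows. The function names are assumptions for this example, and the test function $u = e^x$ is chosen only because all of its derivatives at $x = 0$ equal one.

```python
import math

def forward_second_order(u, x, h):
    """Second-order forward difference for du/dx, equation (11.10)."""
    return (-3.0 * u(x) + 4.0 * u(x + h) - u(x + 2.0 * h)) / (2.0 * h)

def central_second_derivative(u, x, h):
    """Second-order central difference for d2u/dx2, equation (11.11)."""
    return (u(x + h) - 2.0 * u(x) + u(x - h)) / h**2

def error_ratio(scheme, u, exact, x, h):
    """Ratio of the errors at h and h/2; about 4 for a second-order formula."""
    return abs(scheme(u, x, h) - exact) / abs(scheme(u, x, h / 2.0) - exact)
```

Halving the step size should reduce the error of either formula by a factor of roughly four, consistent with the leading $\Delta x^2$ truncation terms.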
We call finite-difference approximations that only involve the point of interest and its nearest neighbor on either side compact. Therefore, the second-order accurate central-difference approximations given above for both the first and second derivatives are compact, whereas the second-order accurate forward-difference approximation to the first derivative is not compact.
In comparing the first-order and second-order accurate forward-difference approximations, observe that increasing the order of accuracy of a finite-difference approximation requires including additional grid points. Thus, as more complex situations are encountered, for example, involving higher-order derivatives and/or approximations, determining how to combine Taylor series to produce such finite-difference formulae can become very difficult. Alternatively, it is sometimes easier to frame the question as follows: For a given set of adjacent grid points, called the finite-difference stencil, what is the highest-order finite-difference approximation possible? Or stated slightly differently: What is the best finite-difference approximation using a given pattern of grid points, in other words, the one with the smallest truncation error?
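We illustrate the procedure using an example. Equation (11.10) provides a second-order accurate, forward-difference approximation to the first derivative, and it involves three adjacent points at $x_i$, $x_{i+1}$, and $x_{i+2}$. Let us instead determine the most accurate approximation to the first derivative that involves the four points $x_i$, $x_{i+1}$, $x_{i+2}$, and $x_{i+3}$. This will be of the form
$$ u'_i + c_0u_i + c_1u_{i+1} + c_2u_{i+2} + c_3u_{i+3} = \text{T.E.}, \tag{11.12} $$
where primes denote derivatives of $u$ with respect to $x$, and T.E. is the truncation error. The objective is to determine the constants $c_0$, $c_1$, $c_2$, and $c_3$ that produce the highest-order truncation error. The Taylor series approximations for $u(x)$ at $x_{i+1}$, $x_{i+2}$, and $x_{i+3}$ about $x_i$ are
$$ u_{i+1} = u_i + \Delta x\,u'_i + \frac{\Delta x^2}{2!}u''_i + \frac{\Delta x^3}{3!}u'''_i + \frac{\Delta x^4}{4!}u''''_i + \cdots, $$
$$ u_{i+2} = u_i + 2\Delta x\,u'_i + \frac{(2\Delta x)^2}{2!}u''_i + \frac{(2\Delta x)^3}{3!}u'''_i + \frac{(2\Delta x)^4}{4!}u''''_i + \cdots, $$
$$ u_{i+3} = u_i + 3\Delta x\,u'_i + \frac{(3\Delta x)^2}{2!}u''_i + \frac{(3\Delta x)^3}{3!}u'''_i + \frac{(3\Delta x)^4}{4!}u''''_i + \cdots. $$
Substituting these expansions into equation (11.12) and collecting terms leads to
$$ (c_0 + c_1 + c_2 + c_3)u_i + \left[1 + \Delta x(c_1 + 2c_2 + 3c_3)\right]u'_i + \Delta x^2\left[\tfrac{1}{2}c_1 + 2c_2 + \tfrac{9}{2}c_3\right]u''_i + \Delta x^3\left[\tfrac{1}{6}c_1 + \tfrac{4}{3}c_2 + \tfrac{9}{2}c_3\right]u'''_i + \Delta x^4\left[\tfrac{1}{24}c_1 + \tfrac{2}{3}c_2 + \tfrac{27}{8}c_3\right]u''''_i + \cdots = \text{T.E.} \tag{11.13} $$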
The highest-order truncation error will occur when the maximum number of lower-order derivative terms are eliminated in this equation. Because we have four constants to determine in this case, we can eliminate the first four terms in the expansion. This requires the following four simultaneous linear algebraic equations to be solved for the coefficients:
$$ c_0 + c_1 + c_2 + c_3 = 0, $$
$$ c_1 + 2c_2 + 3c_3 = -\frac{1}{\Delta x}, $$
$$ \frac{1}{2}c_1 + 2c_2 + \frac{9}{2}c_3 = 0, $$
$$ \frac{1}{6}c_1 + \frac{4}{3}c_2 + \frac{9}{2}c_3 = 0. $$
The solution to this system of equations is
$$ c_0 = \frac{11}{6\Delta x}, \quad c_1 = -\frac{3}{\Delta x}, \quad c_2 = \frac{3}{2\Delta x}, \quad c_3 = -\frac{1}{3\Delta x}, $$
or with the least common denominator
$$ c_0 = \frac{11}{6\Delta x}, \quad c_1 = -\frac{18}{6\Delta x}, \quad c_2 = \frac{9}{6\Delta x}, \quad c_3 = -\frac{2}{6\Delta x}. $$
The remaining term that has not been zeroed provides the leading-order truncation error. Substituting the solutions for the coefficients just obtained, this term becomes
$$ \text{T.E.} = -\frac{1}{4}\Delta x^3u''''_i, $$
which indicates that the approximation is third-order accurate. Therefore, the approximation (11.12) that has the highest-order truncation error is
$$ \left(\frac{du}{dx}\right)_i = \frac{-11u_i + 18u_{i+1} - 9u_{i+2} + 2u_{i+3}}{6\Delta x} + O(\Delta x^3). \tag{11.14} $$
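The same procedure can be automated for any stencil. The sketch below sets up the Taylor-series conditions in exact rational arithmetic and solves them by Gauss-Jordan elimination; it returns the scaled coefficients $C_j = c_j\Delta x$ in the sign convention of equation (11.12). The function name `stencil_coefficients` and the use of integer grid offsets are assumptions for illustration.

```python
from fractions import Fraction
from math import factorial

def stencil_coefficients(offsets):
    """Return C_j = c_j * dx such that u'_i + sum_j c_j u_{i+offset_j} = T.E.

    Condition p (p = 0..n-1): sum_j C_j * offset_j**p / p! = -1 if p == 1 else 0.
    """
    n = len(offsets)
    # Augmented matrix [conditions | right-hand side].
    rows = [[Fraction(s**p, factorial(p)) for s in offsets]
            + [Fraction(-1 if p == 1 else 0)]
            for p in range(n)]
    # Gauss-Jordan elimination with row swaps, exact in rational arithmetic.
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [vr - factor * vc for vr, vc in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]
```

For the offsets `[0, 1, 2, 3]` this reproduces the coefficients above, and for `[-1, 0, 1]` it recovers the central difference. The coefficient of the first term that is not eliminated gives the leading-order truncation error.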
11.3 Extended-Fin Example
As a second example, let us now consider a boundary-value problem, the one-dimensional model of the heat conduction in an extended fin, an array of which may be used to cool an electronic device, for example. This example will assist in further solidifying our understanding of the general numerical solution procedure as well as introduce a number of the issues in finite-difference methods. After introducing this example, the Thomas algorithm will be devised to solve the resulting tridiagonal system of equations. In addition, there will be some discussion of handling different types of boundary conditions. Figure 11.6 is a schematic of the extended fin with heat conduction in the fin and convection between the fin and ambient air.
Step 1 of the numerical solution procedure consists of applying conservation of energy within the fin along with any simplifying assumptions. In this case, we assume that the heat transfer is one-dimensional, that is, only axially along the length of the fin. This is a good assumption for a fin with small cross section relative to its length and for which the cross-sectional area changes gradually along the length of the fin. It is further assumed that the convective heat transfer coefficient is a constant. Based on this, the heat transfer within the extended fin is governed by the one-dimensional ordinary differential equation (see, for example, Incropera & DeWitt)
$$ \frac{d^2T}{dx^2} + \frac{1}{A_c}\frac{dA_c}{dx}\frac{dT}{dx} - \frac{1}{A_c}\frac{h}{k}\frac{dA_s}{dx}(T - T_\infty) = 0, \tag{11.15} $$
where $T(x)$ is the temperature distribution along the length of the fin, $T_\infty$ is the ambient air temperature away from the fin, $A_c(x)$ is the cross-sectional area, $A_s(x)$ is the surface area from the base, $h$ is the convective heat transfer coefficient at the surface of the fin, and $k$ is the thermal conductivity of the fin material. Observe that equation (11.15) is a second-order ordinary differential equation with variable coefficients, and it is a boundary-value problem requiring boundary conditions at both ends of the domain.
Letting $u(x) = T(x) - T_\infty$, rewrite equation (11.15) as
$$ \frac{d^2u}{dx^2} + f(x)\frac{du}{dx} + g(x)u = 0, \tag{11.16} $$
where
$$ f(x) = \frac{1}{A_c}\frac{dA_c}{dx}, \qquad g(x) = -\frac{1}{A_c}\frac{h}{k}\frac{dA_s}{dx}. $$
For now, consider the case where we have a specified temperature at both the base and tip of the fin, such that
$$ u = u_b = T_b - T_\infty \ \text{at} \ x = 0, \qquad u = u_\ell = T_\ell - T_\infty \ \text{at} \ x = \ell. \tag{11.17} $$
These are called Dirichlet boundary conditions. Equation (11.16) with boundary conditions (11.17) represents the mathematical model (step 1 in the numerical solution procedure).
Step 2 consists of discretizing the domain and governing differential equation. Dividing the interval $0 \le x \le \ell$ into $I$ equal subintervals of length $\Delta x = \ell/I$ gives the uniform grid illustrated in Figure 11.7.[1] Here, $f_i = f(x_i)$ and $g_i = g(x_i)$ are known at each grid point, and the solution $u_i = u(x_i)$ is to be determined for all interior points $i = 2, \ldots, I$. Approximating the derivatives in equation (11.16) using second-order accurate finite differences gives
$$ \frac{u_{i+1} - 2u_i + u_{i-1}}{\Delta x^2} + f_i\frac{u_{i+1} - u_{i-1}}{2\Delta x} + g_iu_i = 0. $$
Multiplying by $\Delta x^2$ and collecting terms, this can be written as
$$ a_iu_{i-1} + b_iu_i + c_iu_{i+1} = d_i, \qquad i = 2, \ldots, I, \tag{11.18} $$
where the difference equation is applied at each interior point of the domain, and
$$ a_i = 1 - \frac{\Delta x}{2}f_i, \qquad b_i = -2 + \Delta x^2g_i, \qquad c_i = 1 + \frac{\Delta x}{2}f_i, \qquad d_i = 0. $$
[1] We will begin our grid indices at one in agreement with the typical notation used for vectors and matrices as in Chapters 1 and 2; that is, $i = 1$ corresponds to $x = 0$. This is despite the fact that many programming languages begin array indices at zero by default.
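Note that because we have discretized the differential equation at each interior grid point, we obtain a set of $(I - 1)$ algebraic equations for the $(I - 1)$ unknown values of the temperature $u_i$, $i = 2, \ldots, I$ at each interior grid point. The coefficient matrix for the difference equation (11.18) is tridiagonal
$$ \begin{bmatrix} b_2 & c_2 & 0 & 0 & \cdots & 0 & 0 & 0 \\ a_3 & b_3 & c_3 & 0 & \cdots & 0 & 0 & 0 \\ 0 & a_4 & b_4 & c_4 & \cdots & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & a_{I-1} & b_{I-1} & c_{I-1} \\ 0 & 0 & 0 & 0 & \cdots & 0 & a_I & b_I \end{bmatrix} \begin{bmatrix} u_2 \\ u_3 \\ u_4 \\ \vdots \\ u_{I-1} \\ u_I \end{bmatrix} = \begin{bmatrix} d_2 - a_2u_1 \\ d_3 \\ d_4 \\ \vdots \\ d_{I-1} \\ d_I - c_Iu_{I+1} \end{bmatrix}, $$
where we note that the right-hand-side coefficients have been adjusted to account for the known values of $u$ at the boundaries.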
Remarks:
1. As in equation (11.18), it is customary to write difference equations with the unknowns on the left-hand side and knowns on the right-hand side.
2. We multiply through by $(\Delta x)^2$ in equation (11.18) such that the resulting coefficients in the difference equation are $O(1)$.
3. To prevent ill-conditioning, the tridiagonal system of equations should be diagonally dominant, such that
$$ |b_i| \ge |a_i| + |c_i|. $$
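Consider the $N \times N$ tridiagonal matrix with constants $a$, $b$, and $c$ along its diagonals
$$ A = \begin{bmatrix} b & c & 0 & \cdots & 0 & 0 \\ a & b & c & \cdots & 0 & 0 \\ 0 & a & b & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & b & c \\ 0 & 0 & 0 & \cdots & a & b \end{bmatrix}. $$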
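It can be shown that the eigenvalues of such a tridiagonal matrix with constants along each diagonal are
$$ \lambda_j = b + 2\sqrt{ac}\cos\left(\frac{j\pi}{N+1}\right), \qquad j = 1, \ldots, N. \tag{11.19} $$
The eigenvalues with the largest and smallest magnitudes are (which is largest or smallest depends upon $a$, $b$, and $c$)
$$ |\lambda_1| = \left|b + 2\sqrt{ac}\cos\left(\frac{\pi}{N+1}\right)\right|, \qquad |\lambda_N| = \left|b + 2\sqrt{ac}\cos\left(\frac{N\pi}{N+1}\right)\right|. $$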
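Let us consider $N$ large. In this case, expanding cosine in a Taylor series gives [the first is expanded about $\pi/(N+1) \to 0$ and the second about $N\pi/(N+1) \to \pi$]
$$ \cos\left(\frac{\pi}{N+1}\right) = 1 - \frac{1}{2!}\left(\frac{\pi}{N+1}\right)^2 + \frac{1}{4!}\left(\frac{\pi}{N+1}\right)^4 - \cdots, $$
$$ \cos\left(\frac{N\pi}{N+1}\right) = -1 + \frac{1}{2!}\left(\frac{N\pi}{N+1} - \pi\right)^2 - \cdots = -1 + \frac{1}{2}\left(\frac{\pi}{N+1}\right)^2 - \cdots. $$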
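Consider the common case that may result from the use of central differences for a second-order derivative:
$$ a = 1, \qquad b = -2, \qquad c = 1, $$
which is weakly diagonally dominant. Then with the large $N$ expansions from above
$$ |\lambda_1| = \left|-2 + 2\sqrt{(1)(1)}\left[1 - \frac{1}{2}\left(\frac{\pi}{N+1}\right)^2 + \cdots\right]\right| = \left(\frac{\pi}{N+1}\right)^2 + \cdots, $$
$$ |\lambda_N| = \left|-2 + 2\sqrt{(1)(1)}\left[-1 + \frac{1}{2}\left(\frac{\pi}{N+1}\right)^2 + \cdots\right]\right| = 4 + \cdots. \tag{11.20} $$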
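$$ \text{cond}_2(A) \approx \frac{4}{\left(\pi/(N+1)\right)^2} = \frac{4(N+1)^2}{\pi^2}, \qquad \text{for large } N. $$
Consider instead the case
$$ a = 1, \qquad b = -4, \qquad c = 1, $$
which is strictly diagonally dominant. Then from equation (11.20), the condition number for large $N$ is approximately
$$ \text{cond}_2(A) \approx \frac{6}{2} = 3, \qquad \text{for large } N, $$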
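These two estimates can be checked directly from the eigenvalue formula (11.19). The sketch below assumes $a = c$, so that the matrix is symmetric and the 2-norm condition number equals the ratio of the largest to smallest eigenvalue magnitudes; the function names are illustrative only.

```python
import math

def toeplitz_eigenvalues(a, b, c, n):
    """Eigenvalues b + 2 sqrt(ac) cos(j pi / (n + 1)), j = 1..n (requires a*c >= 0)."""
    root = math.sqrt(a * c)
    return [b + 2.0 * root * math.cos(j * math.pi / (n + 1)) for j in range(1, n + 1)]

def cond2_symmetric(a, b, c, n):
    """Ratio of largest to smallest eigenvalue magnitude (assumes a == c)."""
    magnitudes = [abs(lam) for lam in toeplitz_eigenvalues(a, b, c, n)]
    return max(magnitudes) / min(magnitudes)
```

For $N = 100$, the weakly diagonally dominant case $(1, -2, 1)$ gives a condition number close to $4(N+1)^2/\pi^2 \approx 4134$, whereas the strictly diagonally dominant case $(1, -4, 1)$ stays near 3 regardless of $N$.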
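$$ \begin{bmatrix} b_2 & c_2 & 0 & 0 & \cdots & 0 & 0 & 0 \\ a_3 & b_3 & c_3 & 0 & \cdots & 0 & 0 & 0 \\ 0 & a_4 & b_4 & c_4 & \cdots & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & a_{I-1} & b_{I-1} & c_{I-1} \\ 0 & 0 & 0 & 0 & \cdots & 0 & a_I & b_I \end{bmatrix} \begin{bmatrix} u_2 \\ u_3 \\ u_4 \\ \vdots \\ u_{I-1} \\ u_I \end{bmatrix} = \begin{bmatrix} d_2 - a_2u_1 \\ d_3 \\ d_4 \\ \vdots \\ d_{I-1} \\ d_I - c_Iu_{I+1} \end{bmatrix}. $$
This tridiagonal form is typical of other finite-difference methods having compact stencils and Dirichlet boundary conditions.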
Tridiagonal systems may be solved directly, and efficiently, using the Thomas
algorithm, which is based on Gaussian elimination. Recall that Gaussian elimination consists of a forward elimination and back substitution step. First, consider
the forward elimination, which eliminates the ai coefficients along the lower diagonal. Let us begin by dividing the first equation through by b2 to give
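$$ \begin{bmatrix} 1 & F_2 & 0 & 0 & \cdots & 0 & 0 & 0 \\ a_3 & b_3 & c_3 & 0 & \cdots & 0 & 0 & 0 \\ 0 & a_4 & b_4 & c_4 & \cdots & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & a_{I-1} & b_{I-1} & c_{I-1} \\ 0 & 0 & 0 & 0 & \cdots & 0 & a_I & b_I \end{bmatrix} \begin{bmatrix} u_2 \\ u_3 \\ u_4 \\ \vdots \\ u_{I-1} \\ u_I \end{bmatrix} = \begin{bmatrix} \delta_2 \\ d_3 \\ d_4 \\ \vdots \\ d_{I-1} \\ d_I - c_Iu_{I+1} \end{bmatrix}, $$
where
$$ F_2 = \frac{c_2}{b_2}, \qquad \delta_2 = \frac{d_2 - a_2u_1}{b_2}. \tag{11.21} $$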
To eliminate a3 in the second equation, subtract a3 times the first equation from
the second equation to produce
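$$ \begin{bmatrix} 1 & F_2 & 0 & 0 & \cdots & 0 & 0 & 0 \\ 0 & b_3 - a_3F_2 & c_3 & 0 & \cdots & 0 & 0 & 0 \\ 0 & a_4 & b_4 & c_4 & \cdots & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & a_{I-1} & b_{I-1} & c_{I-1} \\ 0 & 0 & 0 & 0 & \cdots & 0 & a_I & b_I \end{bmatrix} \begin{bmatrix} u_2 \\ u_3 \\ u_4 \\ \vdots \\ u_{I-1} \\ u_I \end{bmatrix} = \begin{bmatrix} \delta_2 \\ d_3 - a_3\delta_2 \\ d_4 \\ \vdots \\ d_{I-1} \\ d_I - c_Iu_{I+1} \end{bmatrix}. $$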
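Dividing the second equation through by $b_3 - a_3F_2$ then leads to
$$ \begin{bmatrix} 1 & F_2 & 0 & 0 & \cdots & 0 & 0 & 0 \\ 0 & 1 & F_3 & 0 & \cdots & 0 & 0 & 0 \\ 0 & a_4 & b_4 & c_4 & \cdots & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & a_{I-1} & b_{I-1} & c_{I-1} \\ 0 & 0 & 0 & 0 & \cdots & 0 & a_I & b_I \end{bmatrix} \begin{bmatrix} u_2 \\ u_3 \\ u_4 \\ \vdots \\ u_{I-1} \\ u_I \end{bmatrix} = \begin{bmatrix} \delta_2 \\ \delta_3 \\ d_4 \\ \vdots \\ d_{I-1} \\ d_I - c_Iu_{I+1} \end{bmatrix}, $$
where
$$ F_3 = \frac{c_3}{b_3 - a_3F_2}, \qquad \delta_3 = \frac{d_3 - a_3\delta_2}{b_3 - a_3F_2}. \tag{11.22} $$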
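Similarly, to eliminate $a_4$ in the third equation, subtract $a_4$ times the second equation from the third equation to produce
$$ \begin{bmatrix} 1 & F_2 & 0 & 0 & \cdots & 0 & 0 & 0 \\ 0 & 1 & F_3 & 0 & \cdots & 0 & 0 & 0 \\ 0 & 0 & b_4 - a_4F_3 & c_4 & \cdots & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & a_{I-1} & b_{I-1} & c_{I-1} \\ 0 & 0 & 0 & 0 & \cdots & 0 & a_I & b_I \end{bmatrix} \begin{bmatrix} u_2 \\ u_3 \\ u_4 \\ \vdots \\ u_{I-1} \\ u_I \end{bmatrix} = \begin{bmatrix} \delta_2 \\ \delta_3 \\ d_4 - a_4\delta_3 \\ \vdots \\ d_{I-1} \\ d_I - c_Iu_{I+1} \end{bmatrix}. $$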
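Dividing the third equation through by $b_4 - a_4F_3$ then leads to
$$ \begin{bmatrix} 1 & F_2 & 0 & 0 & \cdots & 0 & 0 & 0 \\ 0 & 1 & F_3 & 0 & \cdots & 0 & 0 & 0 \\ 0 & 0 & 1 & F_4 & \cdots & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & a_{I-1} & b_{I-1} & c_{I-1} \\ 0 & 0 & 0 & 0 & \cdots & 0 & a_I & b_I \end{bmatrix} \begin{bmatrix} u_2 \\ u_3 \\ u_4 \\ \vdots \\ u_{I-1} \\ u_I \end{bmatrix} = \begin{bmatrix} \delta_2 \\ \delta_3 \\ \delta_4 \\ \vdots \\ d_{I-1} \\ d_I - c_Iu_{I+1} \end{bmatrix}, $$
where
$$ F_4 = \frac{c_4}{b_4 - a_4F_3}, \qquad \delta_4 = \frac{d_4 - a_4\delta_3}{b_4 - a_4F_3}. \tag{11.23} $$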
Comparing equations (11.21), (11.22), and (11.23), observe that we can define the following recursive coefficients to perform the forward elimination:
$$ F_1 = 0, \qquad \delta_1 = u_1, $$
$$ F_i = \frac{c_i}{b_i - a_iF_{i-1}}, \qquad \delta_i = \frac{d_i - a_i\delta_{i-1}}{b_i - a_iF_{i-1}}, \qquad i = 2, \ldots, I. $$
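Upon completion of the forward elimination step, we have the row-echelon form of the system of equations given by
$$ \begin{bmatrix} 1 & F_2 & 0 & 0 & \cdots & 0 & 0 & 0 \\ 0 & 1 & F_3 & 0 & \cdots & 0 & 0 & 0 \\ 0 & 0 & 1 & F_4 & \cdots & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 0 & 1 & F_{I-1} \\ 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} u_2 \\ u_3 \\ u_4 \\ \vdots \\ u_{I-1} \\ u_I \end{bmatrix} = \begin{bmatrix} \delta_2 \\ \delta_3 \\ \delta_4 \\ \vdots \\ \delta_{I-1} \\ \delta_I - F_Iu_{I+1} \end{bmatrix}. $$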
We then apply back substitution to obtain the solutions for $u_i$. Starting with the last equation, we have
$$ u_I = \delta_I - F_Iu_{I+1}, $$
where $u_{I+1}$ is known from the Dirichlet boundary condition at the tip. Using this result, the second to last equation then gives
$$ u_{I-1} = \delta_{I-1} - F_{I-1}u_I. $$
Generalizing yields
$$ u_i = \delta_i - F_iu_{i+1}, \qquad i = I, \ldots, 2, $$
where we note the order starting at the tip and ending at the base.
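Summarizing, the Thomas algorithm consists of the two recursive stages:
1. Forward elimination:
$$ F_1 = 0, \qquad \delta_1 = u_1 = u_b = \text{boundary condition}, $$
$$ F_i = \frac{c_i}{b_i - a_iF_{i-1}}, \qquad \delta_i = \frac{d_i - a_i\delta_{i-1}}{b_i - a_iF_{i-1}}, \qquad i = 2, \ldots, I. $$
2. Back substitution:
$$ u_{I+1} = u_\ell = \text{boundary condition}, $$
$$ u_i = \delta_i - F_iu_{i+1}, \qquad i = I, \ldots, 2. $$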
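The two stages translate almost line for line into code. The following is a minimal sketch under stated assumptions: lists are indexed directly by the grid index $i$ (entry 0 is unused padding so that the indices match the text), and the names `thomas` and `solve_fin` are hypothetical. The second function assembles the coefficients of equation (11.18) for given $f(x)$ and $g(x)$.

```python
import math

def thomas(a, b, c, d, u_first, u_last):
    """Solve a[i] u[i-1] + b[i] u[i] + c[i] u[i+1] = d[i], i = 2..I,
    with Dirichlet values u[1] = u_first and u[I+1] = u_last."""
    I = len(b) - 2                     # lists hold entries 0..I+1
    F = [0.0] * (I + 2)
    delta = [0.0] * (I + 2)
    delta[1] = u_first                 # F[1] = 0, delta[1] = u_1
    for i in range(2, I + 1):          # forward elimination
        denom = b[i] - a[i] * F[i - 1]
        F[i] = c[i] / denom
        delta[i] = (d[i] - a[i] * delta[i - 1]) / denom
    u = [0.0] * (I + 2)
    u[1], u[I + 1] = u_first, u_last
    for i in range(I, 1, -1):          # back substitution, tip to base
        u[i] = delta[i] - F[i] * u[i + 1]
    return u

def solve_fin(f, g, length, I, u_base, u_tip):
    """Assemble and solve the difference equation (11.18) on a uniform grid."""
    dx = length / I
    x = [0.0] + [(i - 1) * dx for i in range(1, I + 2)]   # x[i], i = 1..I+1
    a = [0.0] * (I + 2)
    b = [0.0] * (I + 2)
    c = [0.0] * (I + 2)
    d = [0.0] * (I + 2)
    for i in range(2, I + 1):
        a[i] = 1.0 - 0.5 * dx * f(x[i])
        b[i] = -2.0 + dx**2 * g(x[i])
        c[i] = 1.0 + 0.5 * dx * f(x[i])
    return x, thomas(a, b, c, d, u_base, u_tip)
```

As a check, a fin of constant cross section has $f = 0$ and constant $g = -m^2$, for which the exact solution with $u(0) = 1$ and $u(\ell) = 0$ is $\sinh[m(\ell - x)]/\sinh(m\ell)$.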
more complex situations, such as two- and three-dimensional partial differential equations, are often designed specifically to take advantage of the Thomas algorithm.
2. Observe that it is only necessary to store each of the three diagonals in a vector (one-dimensional array) and not the entire matrix owing to the structure of the matrix.
3. Similar algorithms are available for other banded matrices, such as pentadiagonal matrices.
4. Notice how round-off errors could accumulate in the $F_i$ and $\delta_i$ coefficients during the forward elimination step, and $u_i$ in the back substitution step.
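$$ pu + q\frac{du}{dx} = r \qquad \text{at} \ x = \ell, \tag{11.24} $$
$$ a_{I+1}u_I + b_{I+1}u_{I+1} + c_{I+1}u_{I+2} = d_{I+1}, \tag{11.25} $$
$$ pu_{I+1} + q\frac{u_{I+2} - u_I}{2\Delta x} = r, $$
which also contains the value $u_{I+2}$ outside the domain. Hence, solving for this gives
$$ u_{I+2} = u_I + \frac{2\Delta x}{q}\left(r - pu_{I+1}\right). $$
Substituting into equation (11.25) in order to eliminate the point outside the domain and collecting terms yields
$$ q\left(a_{I+1} + c_{I+1}\right)u_I + \left(qb_{I+1} - 2\Delta x\,p\,c_{I+1}\right)u_{I+1} = qd_{I+1} - 2\Delta x\,r\,c_{I+1}. $$
This equation is appended to the end of the tridiagonal system of equations to allow for determination of the additional unknown $u_{I+1}$.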
Finally, let us consider evaluation of the heat flux at the base of the fin according to Fourier's law
$$ q_b = -kA_c(0)\left.\frac{dT}{dx}\right|_{x=0} = -kA_c(0)\left.\frac{du}{dx}\right|_{x=0}. $$
In order to evaluate $du/dx$ at the base $x = 0$, we must use a forward difference. From equation (11.4) applied at $i = 1$, we have the first-order accurate forward-difference approximation
$$ \left.\frac{du}{dx}\right|_{x=0} = \frac{u_2 - u_1}{\Delta x} + O(\Delta x). $$
For a more accurate approximation, we may use the second-order accurate approximation from equation (11.10) applied at $i = 1$
$$ \left.\frac{du}{dx}\right|_{x=0} = \frac{-3u_1 + 4u_2 - u_3}{2\Delta x} + O(\Delta x^2). $$
Even higher-order finite-difference approximations may be formed. For example, the third-order, forward-difference approximation from equation (11.14) is
$$ \left.\frac{du}{dx}\right|_{x=0} = \frac{-11u_1 + 18u_2 - 9u_3 + 2u_4}{6\Delta x} + O(\Delta x^3). $$
Observe that each successive approximation requires one additional point in the interior of the domain.
12 Finite-Difference Methods for Ordinary Differential Equations
BVPs and IVPs
13 Classification of Second-Order Partial Differential Equations
Our focus in the remainder of Part II will be development of methods for solving partial differential equations. As we develop these numerical methods, it is essential that they be faithful to the physical behavior inherent within the various types of partial differential equations. This behavior is determined by the equation's classification, which depends upon the nature of its characteristics. These are the curves within the domain along which information propagates in space and/or time. The nature of the characteristics depends upon the coefficients of the highest-order derivatives. Owing to their prominence in applications, we focus on second-order partial differential equations.
(13.1)
where subscripts denote partial differentiation with respect to the indicated variable. The equation is linear if the coefficients $a$, $b$, $c$, $d$, $e$, and $f$ are only functions of $(x, y)$. If they are functions of $(x, y, u, u_x, u_y)$, then the equation is said to be quasi-linear. In this case, the equation is linear in the highest derivatives, and any nonlinearity is confined to the lower-order derivatives.
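Let us determine the criteria necessary for the existence of a smooth (differentiable) and unique (single-valued) solution along a characteristic curve $C$ as illustrated in Figure 13.1. Along $C$, we define the parametric functions
$$ \phi_1(\tau) = u_{xx}, \quad \phi_2(\tau) = u_{xy}, \quad \phi_3(\tau) = u_{yy}, \quad \psi_1(\tau) = u_x, \quad \psi_2(\tau) = u_y, \tag{13.2} $$
where $\tau$ is the parameter along $C$. In terms of these functions, equation (13.1) takes the form
$$ a\phi_1 + b\phi_2 + c\phi_3 = H, \tag{13.3} $$
where $H$ collects the remaining terms of equation (13.1), which involve no second derivatives. From the chain rule,
$$ \frac{d\psi_1}{d\tau} = \frac{d}{d\tau}u_x = u_{xx}\frac{dx}{d\tau} + u_{xy}\frac{dy}{d\tau} = \phi_1\frac{dx}{d\tau} + \phi_2\frac{dy}{d\tau}, \tag{13.4} $$
and
$$ \frac{d\psi_2}{d\tau} = \frac{d}{d\tau}u_y = u_{xy}\frac{dx}{d\tau} + u_{yy}\frac{dy}{d\tau} = \phi_2\frac{dx}{d\tau} + \phi_3\frac{dy}{d\tau}. \tag{13.5} $$
Equations (13.3)-(13.5) are three equations for three unknowns, namely the second-order derivatives $\phi_1$, $\phi_2$, and $\phi_3$. Written in matrix form, they are
$$ \begin{bmatrix} a & b & c \\ dx/d\tau & dy/d\tau & 0 \\ 0 & dx/d\tau & dy/d\tau \end{bmatrix} \begin{bmatrix} \phi_1 \\ \phi_2 \\ \phi_3 \end{bmatrix} = \begin{bmatrix} H \\ d\psi_1/d\tau \\ d\psi_2/d\tau \end{bmatrix}. $$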
Because the system is nonhomogeneous, if the determinant of the coefficient matrix is not equal to zero, a unique solution exists for the second derivatives along the curve $C$. It also can be shown that if the second-order derivatives exist, then derivatives of all orders exist along $C$ as well, in which case they are smooth. On the other hand, if the determinant of the coefficient matrix is equal to zero, then the solution is not unique, and the second derivatives are discontinuous along $C$. Setting the determinant equal to zero gives
$$ a\left(\frac{dy}{d\tau}\right)^2 - b\frac{dx}{d\tau}\frac{dy}{d\tau} + c\left(\frac{dx}{d\tau}\right)^2 = 0, $$
or multiplying by $(d\tau/dx)^2$ yields
$$ a\left(\frac{dy}{dx}\right)^2 - b\frac{dy}{dx} + c = 0. $$
This is a quadratic equation for $dy/dx$, which is the slope of the characteristic curve $C$. Consequently, from the quadratic formula, the slope is
$$ \frac{dy}{dx} = \frac{b \pm \sqrt{b^2 - 4ac}}{2a}. \tag{13.6} $$
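The characteristic curves $C$ of equation (13.1), for which $y(x)$ satisfies (13.6), are curves along which the second-order derivatives are discontinuous.
Because the characteristics must be real, their behavior is determined by the sign of $b^2 - 4ac$ as follows:
$$ b^2 - 4ac > 0 \ \Rightarrow \ \text{2 real roots} \ \Rightarrow \ \text{2 characteristics} \ \Rightarrow \ \text{Hyperbolic PDE}, $$
$$ b^2 - 4ac = 0 \ \Rightarrow \ \text{1 real root} \ \Rightarrow \ \text{1 characteristic} \ \Rightarrow \ \text{Parabolic PDE}, $$
$$ b^2 - 4ac < 0 \ \Rightarrow \ \text{no real roots} \ \Rightarrow \ \text{no characteristics} \ \Rightarrow \ \text{Elliptic PDE}. $$
$$ -4\det[A] = b^2 - 4ac \ \begin{cases} > 0, & \text{hyperbolic} \\ = 0, & \text{parabolic} \\ < 0, & \text{elliptic} \end{cases}. $$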
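The classification rule is simple enough to state as a small function. This is only an illustrative sketch; the name `classify_pde` and the optional tolerance `tol` (useful when the coefficients are floating-point values) are assumptions.

```python
def classify_pde(a, b, c, tol=0.0):
    """Classify a u_xx + b u_xy + c u_yy + ... by the sign of b^2 - 4ac."""
    discriminant = b * b - 4.0 * a * c
    if discriminant > tol:
        return "hyperbolic"
    if discriminant < -tol:
        return "elliptic"
    return "parabolic"
```

For instance, `classify_pde(1, 0, 1)` corresponds to $u_{xx} + u_{yy}$ and returns `"elliptic"`, while coefficients with $a$ and $c$ of opposite sign are always hyperbolic.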
us consider the case when they are constants, such that the characteristics are
straight lines. The conclusions drawn for each type of equation are then naturally
generalizable.
y = 2 x + x2 .
$$ \frac{\partial^2u}{\partial t^2} = \sigma^2\frac{\partial^2u}{\partial x^2}, $$
where $u(x, t)$ is the amplitude of the wave, and $\sigma$ is the wave speed within the medium. In this case, the independent variable $y$ becomes time $t$, and $a = \sigma^2$, $b = 0$, and $c = -1$. Comparing with equations (13.6) and (13.7), we have
$$ \lambda_1 = \frac{1}{\sigma}, \qquad \lambda_2 = -\frac{1}{\sigma}. $$
Therefore, the characteristics of the wave equation with $a$, $b$, and $c$ constant are straight lines with slopes $1/\sigma$ and $-1/\sigma$ as shown in Figure 13.2. Take particular notice of the domains of influence and dependence. The domain of influence of point $P$ indicates the region within $(x, t)$ space whose solution can influence that
ut (x, 0) = g(x).
The first is an initial condition on the amplitude, and the second is on its velocity.
Note that no boundary conditions are necessary unless there are boundaries at
finite x.
$$ \frac{dy}{dx} = \frac{b}{2a}. \tag{13.8} $$
Integrating yields
$$ y = \frac{b}{2a}x + \gamma_1, $$
which is a straight line. Therefore, the solution propagates along one linear characteristic direction (usually time).
For example, consider the one-dimensional, unsteady diffusion equation
$$ \frac{\partial u}{\partial t} = \alpha\frac{\partial^2u}{\partial x^2}, $$
which governs unsteady heat conduction, for example. Here, $u(x, t)$ is the quantity undergoing diffusion (for example, temperature), and $\alpha$ is the diffusivity of the medium. Again, the independent variable $y$ becomes time $t$, and $a = \alpha$ and $b = c = 0$. Because $b = 0$ in equation (13.8), the characteristics are lines of constant $t$, corresponding to a solution that marches forward in time as illustrated in Figure 13.3. Observe that the DoI is every position and time prior to the current time, and the DoD is every position and time subsequent to the current time.
Initial and boundary conditions are required, such as
u(x, 0) = u0 (x),
u(x1 , t) = f (t),
u(x2 , t) = g(t),
which can be interpreted as the initial temperature throughout the domain and
the temperatures at the boundaries in the context of unsteady heat conduction.
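Figure 13.5 Transonic fluid flow past an airfoil with distinct regions where the equation is elliptic, parabolic, and hyperbolic.
$$ (1 - M^2)\frac{\partial^2\phi}{\partial x^2} + \frac{\partial^2\phi}{\partial y^2} = 0, $$
where $\phi(s, n)$ is the velocity potential, and $M$ is the local Mach number. Here, $a = 1 - M^2$, $b = 0$, and $c = 1$. To determine the nature of the equation, observe that
$$ b^2 - 4ac = 0 - 4(1 - M^2)(1) = -4(1 - M^2); $$
therefore,
$$ M < 1 \ \text{(subsonic)} \ \Rightarrow \ b^2 - 4ac < 0 \ \Rightarrow \ \text{Elliptic}, $$
$$ M = 1 \ \text{(sonic)} \ \Rightarrow \ b^2 - 4ac = 0 \ \Rightarrow \ \text{Parabolic}, $$
$$ M > 1 \ \text{(supersonic)} \ \Rightarrow \ b^2 - 4ac > 0 \ \Rightarrow \ \text{Hyperbolic}. $$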
14 Finite-Difference Methods for Elliptic Partial Differential Equations
Recall from the previous chapter that the canonical second-order, elliptic partial differential equation is the Laplace equation, which in two-dimensional Cartesian coordinates is
$$ \frac{\partial^2\phi}{\partial x^2} + \frac{\partial^2\phi}{\partial y^2} = 0. \tag{14.1} $$
The nonhomogeneous version of the Laplace equation, that is,
$$ \frac{\partial^2\phi}{\partial x^2} + \frac{\partial^2\phi}{\partial y^2} = f(x, y), \tag{14.2} $$
is called the Poisson equation. Also recall that elliptic problems have no preferred direction of propagation; therefore, they require a global solution strategy and boundary conditions on a closed contour surrounding the entire domain as illustrated in Figure 14.1, that is, elliptic problems are essentially boundary-value problems (whereas parabolic and hyperbolic problems are initial-value problems). The types of boundary conditions include Dirichlet, in which the values of $\phi$ on the boundary are specified, Neumann, in which the normal derivative of $\phi$ is specified on the boundary, and Robin, or mixed, in which a linear combination of $\phi$ and its normal derivative is specified along the boundary. In the context of heat transfer, for example, a Dirichlet condition corresponds to an isothermal boundary condition, a Neumann condition corresponds to a specified heat flux, and a Robin condition arises from a convection condition at the boundary (see, for example, Section 11.5). Combinations of the above boundary conditions may be applied on different portions of the boundary as long as some boundary condition is applied at every point along the boundary contour.
Remarks:
1. Solutions to the Laplace and Poisson equations with Neumann boundary conditions on the entire boundary can only be determined relative to an unknown constant, that is, $\phi(x, y) + c$ is a solution. We will need to take this into account when devising numerical algorithms.
2. A linear elliptic problem consists of a linear equation and boundary conditions, whereas a nonlinear problem is such that either the equation and/or boundary conditions are nonlinear. For example, the Laplace or Poisson equation with Dirichlet, Neumann, or Robin boundary conditions is linear. An example of a nonlinear boundary condition is the radiation condition
$$ \frac{\partial\phi}{\partial n} = D\left(\phi^4 - \phi_{sur}^4\right), $$
where $n$ represents the outward facing normal to the boundary. The nonlinearity arises owing to the fourth power on the temperature.
Figure 14.3 and note that the domain has been discretized into a two-dimensional grid intersecting at $(I + 1) \times (J + 1)$ points.
Consider second-order accurate, central difference approximations to the derivatives in equation (14.2) at a typical point $(i, j)$; the five-point finite-difference stencil is shown in Figure 14.4. With $\phi_{i,j} = \phi(x_i, y_j)$, the second-order accurate, central difference approximations are given by
$$ \frac{\partial^2\phi}{\partial x^2} = \frac{\phi_{i+1,j} - 2\phi_{i,j} + \phi_{i-1,j}}{\Delta x^2} + O(\Delta x^2), $$
$$ \frac{\partial^2\phi}{\partial y^2} = \frac{\phi_{i,j+1} - 2\phi_{i,j} + \phi_{i,j-1}}{\Delta y^2} + O(\Delta y^2). $$
Substituting into (14.2), multiplying by $(\Delta x)^2$, and collecting terms gives the final form of the finite-difference equation
$$ \phi_{i+1,j} - 2\left[1 + \left(\frac{\Delta x}{\Delta y}\right)^2\right]\phi_{i,j} + \phi_{i-1,j} + \left(\frac{\Delta x}{\Delta y}\right)^2\left(\phi_{i,j+1} + \phi_{i,j-1}\right) = \Delta x^2f_{i,j}, \tag{14.3} $$
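Equation (14.3) can be solved for $\phi_{i,j}$ and applied point by point in repeated sweeps over the interior of the grid. The sketch below does this using the most recently updated neighbor values (Gauss-Seidel ordering). It is a minimal illustration under stated assumptions: the function name, the list-of-rows storage `phi[i][j]`, Dirichlet data stored in the boundary entries, and a fixed number of sweeps in place of a convergence test.

```python
def gauss_seidel_poisson(phi, f, dx, dy, sweeps):
    """Sweep equation (14.3) over the interior points; boundary entries of phi
    hold Dirichlet values and are never modified."""
    beta2 = (dx / dy) ** 2
    n_i, n_j = len(phi), len(phi[0])
    for _ in range(sweeps):
        for i in range(1, n_i - 1):
            for j in range(1, n_j - 1):
                # Equation (14.3) solved for phi[i][j].
                phi[i][j] = (phi[i + 1][j] + phi[i - 1][j]
                             + beta2 * (phi[i][j + 1] + phi[i][j - 1])
                             - dx**2 * f[i][j]) / (2.0 * (1.0 + beta2))
    return phi
```

Because the central differences are exact for quadratic functions, a convenient check is $\phi = x^2 + y^2$ with $f = 4$, for which the converged discrete solution should match the exact values at the grid points.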
results from our numerical discretizations. Instead, we will carry out the iteration process until an approximate solution is obtained having an acceptably small amount of iterative convergence error. As in Section 6.2, we will begin with a discussion of the Jacobi, Gauss-Seidel, and SOR methods. This will be followed by the alternating-direction implicit (ADI) method, which improves even more on scalability properties. Finally, the multigrid framework will be described, which accelerates the underlying iteration technique, such as Gauss-Seidel, and results in the fastest and most flexible algorithms currently available.
(14.4)
(14.5)
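\[
\begin{bmatrix}
D & I & 0 & \cdots & 0 & 0 \\
I & D & I & \cdots & 0 & 0 \\
0 & I & D & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & D & I \\
0 & 0 & 0 & \cdots & I & D
\end{bmatrix}
\begin{bmatrix}
\phi_1 \\ \phi_2 \\ \phi_3 \\ \vdots \\ \phi_{(I+1)(J+1)-1} \\ \phi_{(I+1)(J+1)}
\end{bmatrix}
=
\begin{bmatrix}
d_1 \\ d_2 \\ d_3 \\ \vdots \\ d_{(I+1)(J+1)-1} \\ d_{(I+1)(J+1)}
\end{bmatrix}
\]
\[
D = \begin{bmatrix}
-4 & 1 & 0 & \cdots & 0 & 0 \\
1 & -4 & 1 & \cdots & 0 & 0 \\
0 & 1 & -4 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & -4 & 1 \\
0 & 0 & 0 & \cdots & 1 & -4
\end{bmatrix}
\]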
where $\omega$ is the frequency, and $i$ is the imaginary number. The inverse transform is
\[ \phi(x) = \int_{-\infty}^{\infty} \hat{\phi}(\omega)\, e^{-2\pi i \omega x}\, d\omega. \]
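Consider the discrete form of the Fourier transform in which we have $K$ values of $\phi(x)$ at discrete points defined by
\[ \phi_k = \phi(x_k), \qquad x_k = k\Delta, \qquad k = 0, 1, 2, \ldots, K-1, \]
\[ \hat{\phi}(\omega_m) = \int \phi(x)\, e^{2\pi i \omega_m x}\, dx \approx \Delta \sum_{k=0}^{K-1} \phi_k\, e^{2\pi i \omega_m x_k} = \Delta \sum_{k=0}^{K-1} \phi_k\, e^{2\pi i k m/K}. \]
\[ \hat{\phi}_m = \sum_{k=0}^{K-1} \phi_k\, e^{2\pi i k m/K}. \tag{14.6} \]
\[ \phi_k = \frac{1}{K} \sum_{m=-K/2}^{K/2} \hat{\phi}_m\, e^{-2\pi i k m/K}. \tag{14.7} \]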
The above expressions are all for a one-dimensional function. For a two-dimensional spatial grid, we define the physical grid $(x, y)$
\[ k = 0, \ldots, K-1, \qquad l = 0, \ldots, L-1. \]
(m,n )
2 m,n ,
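\[ \hat{\phi}_{m,n} = \sum_{k=0}^{K-1} \sum_{l=0}^{L-1} \phi_{k,l}\, e^{2\pi i k m/K}\, e^{2\pi i l n/L}, \tag{14.8} \]
\[ \phi_{k,l} = \frac{1}{KL} \sum_{m=-K/2}^{K/2} \sum_{n=-L/2}^{L/2} \hat{\phi}_{m,n}\, e^{-2\pi i k m/K}\, e^{-2\pi i l n/L}. \tag{14.9} \]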
Now let us apply this to the discretized Poisson equation (14.4). For simplicity, set $\Delta = \Delta x = \Delta y$, giving (with $i \to k$, $j \to l$)
\[ \phi_{k+1,l} + \phi_{k-1,l} + \phi_{k,l+1} + \phi_{k,l-1} - 4\phi_{k,l} = \Delta^2 f_{k,l}. \tag{14.10} \]
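Substituting equation (14.9) into equation (14.10) leads to the following equation for each Fourier mode
\[
\hat{\phi}_{m,n}\left[ e^{-2\pi i (k+1) m/K} e^{-2\pi i l n/L} + e^{-2\pi i (k-1) m/K} e^{-2\pi i l n/L} + e^{-2\pi i k m/K} e^{-2\pi i (l+1) n/L} + e^{-2\pi i k m/K} e^{-2\pi i (l-1) n/L} - 4 e^{-2\pi i k m/K} e^{-2\pi i l n/L} \right] = \Delta^2 \hat{f}_{m,n}\, e^{-2\pi i k m/K} e^{-2\pi i l n/L},
\]
where $\hat{f}_{m,n}$ is the Fourier transform of the right-hand side $f_{k,l}$. Canceling the common factor $e^{-2\pi i k m/K} e^{-2\pi i l n/L}$, we have
\[ \hat{\phi}_{m,n}\left[ e^{2\pi i m/K} + e^{-2\pi i m/K} + e^{2\pi i n/L} + e^{-2\pi i n/L} - 4 \right] = \Delta^2 \hat{f}_{m,n}. \]
Recalling that $\cos(ax) = \frac{1}{2}\left(e^{iax} + e^{-iax}\right)$ gives
\[ \hat{\phi}_{m,n}\left[ 2\cos\left(\frac{2\pi m}{K}\right) + 2\cos\left(\frac{2\pi n}{L}\right) - 4 \right] = \Delta^2 \hat{f}_{m,n}. \]
Solving for the Fourier transform $\hat{\phi}_{m,n}$ leads to
\[ \hat{\phi}_{m,n} = \frac{\Delta^2 \hat{f}_{m,n}}{2\left[ \cos\left(\frac{2\pi m}{K}\right) + \cos\left(\frac{2\pi n}{L}\right) - 2 \right]}, \tag{14.11} \]
for $m = 1, \ldots, K-1$; $n = 1, \ldots, L-1$.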
Therefore, the procedure to solve the difference equation (14.10) using Fourier transform methods is as follows:
1) Compute the Fourier transform $\hat{f}_{m,n}$ of the right-hand side $f_{k,l}$ using
\[ \hat{f}_{m,n} = \sum_{k=0}^{K-1} \sum_{l=0}^{L-1} f_{k,l}\, e^{2\pi i k m/K}\, e^{2\pi i l n/L}. \tag{14.12} \]
1. The above procedure works for periodic boundary conditions, such that the solution satisfies
\[ \phi_{k,l} = \phi_{k+K,l} = \phi_{k,l+L}. \]
For Dirichlet boundary conditions, we use the Fourier sine transform, and for
Neumann boundary conditions, we use the Fourier cosine transform.
2. In practice, the Fourier (and inverse) transforms are computed using a Fast
Fourier Transform (FFT) technique (see, for example, Numerical Recipes).
3. Fourier transform methods can only be used for partial differential equations
with constant coefficients in the direction(s) for which the Fourier transform
is applied.
4. We use Fourier transforms to solve the difference equation, not the differential
equation; therefore, this is not a spectral method (see Section 17.3).
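The Fourier procedure for equation (14.10) with periodic boundary conditions can be sketched as follows. This is only an illustration under stated assumptions: the transforms are evaluated by naive double sums rather than an FFT (so it is suitable only for tiny grids), the inverse sum runs over $m = 0, \ldots, K-1$, which is equivalent by periodicity, and the $m = n = 0$ mode, for which the denominator of (14.11) vanishes, is set to zero, which selects the zero-mean solution. The function names are hypothetical.

```python
import cmath
import math


def dft2(a, sign):
    """Naive 2-D Fourier sum with kernel exp(sign * 2*pi*i * (k*m/K + l*n/L))."""
    K, L = len(a), len(a[0])
    out = [[0j] * L for _ in range(K)]
    for m in range(K):
        for n in range(L):
            total = 0j
            for k in range(K):
                for l in range(L):
                    angle = 2.0 * math.pi * (k * m / K + l * n / L)
                    total += a[k][l] * cmath.exp(sign * 1j * angle)
            out[m][n] = total
    return out


def periodic_rhs(phi, delta):
    """Right-hand side f of (14.10) generated from a periodic grid function phi."""
    K, L = len(phi), len(phi[0])
    return [[(phi[(k + 1) % K][l] + phi[(k - 1) % K][l] + phi[k][(l + 1) % L]
              + phi[k][(l - 1) % L] - 4.0 * phi[k][l]) / delta ** 2
             for l in range(L)] for k in range(K)]


def solve_periodic_poisson(f, delta):
    """Solve (14.10) with periodic boundaries: transform, divide as in (14.11), invert."""
    K, L = len(f), len(f[0])
    f_hat = dft2(f, +1)
    phi_hat = [[0j] * L for _ in range(K)]
    for m in range(K):
        for n in range(L):
            if m == 0 and n == 0:
                continue  # constant mode is undetermined; keep it at zero
            denom = 2.0 * (math.cos(2.0 * math.pi * m / K)
                           + math.cos(2.0 * math.pi * n / L) - 2.0)
            phi_hat[m][n] = delta ** 2 * f_hat[m][n] / denom
    phi = dft2(phi_hat, -1)
    return [[phi[k][l].real / (K * L) for l in range(L)] for k in range(K)]
```

In practice the two calls to `dft2` would be replaced by FFTs, as the remarks note; the division step is unchanged.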
i = 0, . . . , I,
(14.13)
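where
\[
u_i = \begin{bmatrix} \phi_{i,0} \\ \phi_{i,1} \\ \phi_{i,2} \\ \vdots \\ \phi_{i,J-1} \\ \phi_{i,J} \end{bmatrix}, \qquad
f_i = \begin{bmatrix} f_{i,0} \\ f_{i,1} \\ f_{i,2} \\ \vdots \\ f_{i,J-1} \\ f_{i,J} \end{bmatrix}, \qquad
B' = \begin{bmatrix}
-2 & 1 & 0 & \cdots & 0 & 0 \\
1 & -2 & 1 & \cdots & 0 & 0 \\
0 & 1 & -2 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & -2 & 1 \\
0 & 0 & 0 & \cdots & 1 & -2
\end{bmatrix}.
\]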
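The first three terms in equation (14.13) correspond to the central difference in the $x$-direction, and the fourth term corresponds to the central difference in the $y$-direction (see $B'$). Taking $B = -2I + B'$, where $I$ is the identity matrix, equation (14.13) becomes
\[ u_{i-1} + B u_i + u_{i+1} = \Delta^2 f_i, \qquad i = 0, \ldots, I, \tag{14.14} \]
where the $(J + 1) \times (J + 1)$ matrix $B$ is
\[
B = \begin{bmatrix}
-4 & 1 & 0 & \cdots & 0 & 0 \\
1 & -4 & 1 & \cdots & 0 & 0 \\
0 & 1 & -4 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & -4 & 1 \\
0 & 0 & 0 & \cdots & 1 & -4
\end{bmatrix}.
\]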
Note that equation (14.14) corresponds to the block-tridiagonal matrix for equation (14.5), where $B$ is the tridiagonal portion and the coefficients of the $u_{i-1}$ and $u_{i+1}$ terms are the fringes. Writing three successive equations of (14.14) for $i - 1$, $i$, and $i + 1$ gives
\[ u_{i-2} + B u_{i-1} + u_i = \Delta^2 f_{i-1}, \]
\[ u_{i-1} + B u_i + u_{i+1} = \Delta^2 f_i, \]
\[ u_i + B u_{i+1} + u_{i+2} = \Delta^2 f_{i+1}. \]
Multiplying the middle equation by $-B$ and adding all three gives
\[ u_{i-2} + B^* u_i + u_{i+2} = \Delta^2 f_i^*, \tag{14.15} \]
where
\[ B^* = 2I - B^2, \qquad f_i^* = f_{i-1} - B f_i + f_{i+1}. \]
This is an equation of the same form as (14.14); therefore, applying this procedure to all even-numbered $i$ equations in (14.14) reduces the number of equations by a factor of two. This cyclic reduction procedure can be repeated recursively until a single equation remains for the middle line of variables $u_{I/2}$, which is tridiagonal. This is why in the $x$-direction, $I = 2^m$ with integer $m$.
Using the solution for $u_{I/2}$, solutions for all other $i$ are obtained by successively solving the tridiagonal problems at each level in reverse, as illustrated in Figure 14.8. This results in a total of $I$ tridiagonal problems to obtain $u_i$, $i = 0, \ldots, I$.
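A single reduction step, that is, forming $2I - B^2$ and $f_{i-1} - B f_i + f_{i+1}$ from three successive block equations, can be illustrated with the small sketch below. It assumes dense nested-list matrices and right-hand sides that already include the factor $\Delta^2$; the helper names are hypothetical, and a practical implementation would exploit the banded structure rather than form $B^2$ densely.

```python
def matvec(B, u):
    """Product of a square nested-list matrix B with a vector u."""
    return [sum(B[r][c] * u[c] for c in range(len(u))) for r in range(len(u))]


def matmul(A, B):
    """Product of two square nested-list matrices."""
    n = len(A)
    return [[sum(A[r][k] * B[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


def cyclic_reduction_step(B, rhs_prev, rhs_mid, rhs_next):
    """Combine the equations for lines i-1, i, i+1 into one coupling i-2, i, i+2.

    Returns (B_star, rhs_star) with B_star = 2I - B^2 and
    rhs_star = rhs_prev - B rhs_mid + rhs_next.
    """
    n = len(B)
    B2 = matmul(B, B)
    B_star = [[(2 if r == c else 0) - B2[r][c] for c in range(n)]
              for r in range(n)]
    B_mid = matvec(B, rhs_mid)
    rhs_star = [rhs_prev[r] - B_mid[r] + rhs_next[r] for r in range(n)]
    return B_star, rhs_star
```

Applying this step to every even-numbered line halves the number of block equations, which is the recursion described above.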
Remarks:
1. The number of grid points in the direction for which cyclic reduction is applied,
x in the above case, must be an integer power of two.
2. The speeds of the FFT and cyclic reduction methods are comparable, with FFT being a bit faster.
3. Cyclic reduction may be applied to somewhat more general equations, such as those with variable coefficients.
4. One can accelerate the solution by taking the FFT in the direction with constant coefficients and using cyclic reduction in the other.
5. These so-called fast Poisson solvers require $O(N \log N)$ operations, where $N = I \times J$.
6. This algorithm is not (easily) implemented in parallel.
(14.16)
(14.17)
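\[ \phi^{n+1}_{i,j} = \frac{1}{2(1 + \bar{\Delta})}\left[ \phi^n_{i+1,j} + \phi^n_{i-1,j} + \bar{\Delta}\left( \phi^n_{i,j+1} + \phi^n_{i,j-1} \right) - \Delta x^2 f_{i,j} \right]. \tag{14.18} \]
Here $\bar{\Delta} = (\Delta x/\Delta y)^2$, as in equation (14.3).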
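A minimal sketch of Jacobi iteration based on (14.18) is given below, assuming Dirichlet values are already stored on the edges of a zero-based nested list `phi[i][j]` and are never modified. The function names, the stopping test on the largest change between iterates, and the default tolerance are assumptions of this example rather than part of the method itself.

```python
def jacobi_sweep(phi, f, dx, dy):
    """Return the next Jacobi iterate; every update uses only the old iterate."""
    I, J = len(phi) - 1, len(phi[0]) - 1
    ratio = (dx / dy) ** 2
    new = [row[:] for row in phi]
    for i in range(1, I):
        for j in range(1, J):
            new[i][j] = (phi[i + 1][j] + phi[i - 1][j]
                         + ratio * (phi[i][j + 1] + phi[i][j - 1])
                         - dx ** 2 * f[i][j]) / (2.0 * (1.0 + ratio))
    return new


def solve_jacobi(phi, f, dx, dy, tol=1e-10, max_iter=10000):
    """Iterate until the largest change between iterates falls below tol."""
    I, J = len(phi) - 1, len(phi[0]) - 1
    for n in range(1, max_iter + 1):
        new = jacobi_sweep(phi, f, dx, dy)
        change = max(abs(new[i][j] - phi[i][j])
                     for i in range(1, I) for j in range(1, J))
        phi = new
        if change < tol:
            return phi, n
    return phi, max_iter
```

Note that two arrays are required, because the new iterate must not overwrite values that are still needed from the old one.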
In addition to checking for iterative convergence, which requires that the spectral radius satisfy $\rho < 1$ for the iteration matrix $M$, a smaller spectral radius results in more rapid convergence, that is, in fewer iterations. Similar to Section 6.2.5, it can be shown that
\[ \rho_{Jac}(I, J) = \frac{1}{2}\left[ \cos\left(\frac{\pi}{I + 1}\right) + \cos\left(\frac{\pi}{J + 1}\right) \right]. \]
If $I = J$ and $I$ is large, then from the Taylor series for cosine
\[ \rho_{Jac}(I) = \cos\left(\frac{\pi}{I + 1}\right) = 1 - \frac{1}{2}\left(\frac{\pi}{I + 1}\right)^2 + \cdots; \tag{14.19} \]
² Note that we have switched to $n$ being the iteration number, rather than $r$, and the parentheses have been removed from the superscripted iteration number as there is no confusion with powers in the present context.
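\[ \phi^{n+1}_{i,j} = \frac{1}{2(1 + \bar{\Delta})}\left[ \phi^n_{i+1,j} + \phi^{n+1}_{i-1,j} + \bar{\Delta}\left( \phi^n_{i,j+1} + \phi^{n+1}_{i,j-1} \right) - \Delta x^2 f_{i,j} \right]. \tag{14.20} \]
The values of $\phi_{i,j}$ are all stored in the same array, and it is not necessary to distinguish between the $n$th and $(n + 1)$st iterates. We simply use the most recently updated information.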
\[ \rho_{GS}(I) = \rho_{Jac}^2(I) = \left[ 1 - \frac{1}{2}\left(\frac{\pi}{I + 1}\right)^2 + \cdots \right]^2 = 1 - \left(\frac{\pi}{I + 1}\right)^2 + \cdots. \tag{14.21} \]
Consequently, the rate of convergence is twice as fast as that for the Jacobi method for large $I$, that is, the Gauss-Seidel method requires one-half the iterations for the same level of accuracy. Recall from Section 6.2.6 that it can be shown that diagonal dominance of $A$ is a sufficient, but not necessary, condition for convergence of the Jacobi and Gauss-Seidel iteration methods.
14.3.3 Successive Over-Relaxation (SOR)
Recall from Section 6.2.7 that SOR accelerates Gauss-Seidel iteration by magnifying the change in the solution accomplished by each iteration. By taking a weighted average of the previous iterate $\phi^n_{i,j}$ and the Gauss-Seidel iterate $\phi^{n+1}_{i,j}$, the iteration process may be accelerated toward the exact solution.
If we denote the Gauss-Seidel iterate (14.20) by $\phi^*_{i,j}$, the new SOR iterate is given by
\[ \phi^{n+1}_{i,j} = (1 - \omega)\phi^n_{i,j} + \omega\phi^*_{i,j}, \tag{14.22} \]
where $\omega$ is the relaxation parameter and $0 < \omega < 2$ for convergence (Morton & Mayers, p. 206). We then have the following three possibilities:
$0 < \omega < 1$: Under-Relaxation
$1 < \omega < 2$: Over-Relaxation
$\omega = 1$: Gauss-Seidel
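The relation between Gauss-Seidel and SOR can be sketched in a few lines: each point is first given its Gauss-Seidel value from (14.20), which is then blended with the previous value as in (14.22). The sketch below assumes Dirichlet values stored on the edges of a zero-based nested list that is updated in place; the function names and the stopping test are assumptions of this example. Setting `omega = 1.0` recovers Gauss-Seidel.

```python
def sor_sweep(phi, f, dx, dy, omega):
    """One in-place SOR sweep; returns the largest change made at any point."""
    I, J = len(phi) - 1, len(phi[0]) - 1
    ratio = (dx / dy) ** 2
    largest = 0.0
    for i in range(1, I):
        for j in range(1, J):
            # Gauss-Seidel value: neighbours (i-1, j) and (i, j-1) are already updated.
            gs = (phi[i + 1][j] + phi[i - 1][j]
                  + ratio * (phi[i][j + 1] + phi[i][j - 1])
                  - dx ** 2 * f[i][j]) / (2.0 * (1.0 + ratio))
            new = (1.0 - omega) * phi[i][j] + omega * gs
            largest = max(largest, abs(new - phi[i][j]))
            phi[i][j] = new
    return largest


def sor_solve(phi, f, dx, dy, omega, tol=1e-10, max_iter=10000):
    """Sweep until the largest change falls below tol; returns the sweep count."""
    for n in range(1, max_iter + 1):
        if sor_sweep(phi, f, dx, dy, omega) < tol:
            return n
    return max_iter
```

Only one array is needed here, in contrast to Jacobi iteration, because the most recently updated values are used as soon as they are available.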
As determined in Section 6.2.7, the optimal value of $\omega$ that minimizes the spectral radius for large $I$ is
\[ \omega_{opt} \approx \frac{2}{1 + \frac{\pi}{I + 1}}, \]
2
.
I +1
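\[ \phi^{n+1}_{2,2} = \frac{1}{2(1 + \bar{\Delta})}\left[ \phi^n_{3,2} + \phi_{1,2} + \bar{\Delta}\left( \phi^n_{2,3} + \phi_{2,1} \right) - \Delta x^2 f_{2,2} \right]. \]
Hence, we simply apply the general finite-difference equation for Jacobi or Gauss-Seidel in the interior as usual, and the values on the boundary are picked up as necessary.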
\[ \frac{\partial \phi}{\partial x} = c \quad \text{at} \quad x = 0 \tag{14.23} \]
as shown in Figure 14.11. The simplest treatment would be to use the Jacobi (14.18), Gauss-Seidel (14.20), or SOR (14.22) equation to update $\phi_{i,j}$ in the interior for $i = 2, \ldots, I$, and then to approximate the boundary condition (14.23) by a forward difference applied at $i = 1$ according to
\[ \frac{\phi_{2,j} - \phi_{1,j}}{\Delta x} + O(\Delta x) = c. \]
This could then be used to update $\phi_{1,j}$, $j = 2, \ldots, J$, using
(14.24)
Figure 14.12 Boundary conditions in the top left corner of the domain.
A better alternative is to use the same method as in Section 11.5 for the Thomas algorithm with a Robin boundary condition. In this approach, the interior points are updated as before, but we now apply the difference equation at the boundary. For example, we could apply Jacobi (14.18) at $i = 1$ as follows:
\[ \phi^{n+1}_{1,j} = \frac{1}{2(1 + \bar{\Delta})}\left[ \phi^n_{2,j} + \phi^n_{0,j} + \bar{\Delta}\left( \phi^n_{1,j+1} + \phi^n_{1,j-1} \right) - \Delta x^2 f_{1,j} \right]. \tag{14.25} \]
However, this involves a value $\phi^n_{0,j}$ that is outside the domain. A second-order accurate, central-difference approximation for the boundary condition (14.23) is
\[ \frac{\phi^n_{2,j} - \phi^n_{0,j}}{2\Delta x} + O(\Delta x^2) = c, \tag{14.26} \]
which also involves the value $\phi^n_{0,j}$. Therefore, solving equation (14.26) for $\phi^n_{0,j}$ gives
\[ \phi^n_{0,j} = \phi^n_{2,j} - 2c\Delta x, \]
and substituting into the difference equation (14.25) to eliminate $\phi^n_{0,j}$ leads to
\[ \phi^{n+1}_{1,j} = \frac{1}{2(1 + \bar{\Delta})}\left[ 2\left( \phi^n_{2,j} - c\Delta x \right) + \bar{\Delta}\left( \phi^n_{1,j+1} + \phi^n_{1,j-1} \right) - \Delta x^2 f_{1,j} \right]. \tag{14.27} \]
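\[ \frac{\partial \phi}{\partial y} = d \quad \text{at} \quad y = b. \tag{14.28} \]
\[ \phi^{n+1}_{1,J+1} = \frac{1}{2(1 + \bar{\Delta})}\left[ 2\left( \phi^n_{2,J+1} - c\Delta x \right) + \bar{\Delta}\left( \phi^n_{1,J+2} + \phi^n_{1,J} \right) - \Delta x^2 f_{1,J+1} \right], \tag{14.29} \]
where $\phi^n_{1,J+2}$ is outside the domain. Approximating equation (14.28) using a central difference in the same manner as equation (14.26) gives
\[ \frac{\phi^n_{1,J+2} - \phi^n_{1,J}}{2\Delta y} = d, \]
which leads to
\[ \phi^n_{1,J+2} = \phi^n_{1,J} + 2d\Delta y. \]
Substituting into equation (14.29) to eliminate $\phi^n_{1,J+2}$ gives
\[ \phi^{n+1}_{1,J+1} = \frac{1}{2(1 + \bar{\Delta})}\left[ 2\left( \phi^n_{2,J+1} - c\Delta x \right) + 2\bar{\Delta}\left( \phi^n_{1,J} + d\Delta y \right) - \Delta x^2 f_{1,J+1} \right], \tag{14.30} \]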
Consider the $j$th line and assume that values along the $(j + 1)$st and $(j - 1)$st lines are taken from the previous iterate. Rewriting equation (14.30) as an implicit equation for the values of $\phi_{i,j}$ along the $j$th line gives
\[ \phi^{n+1}_{i+1,j} - 2(1 + \bar{\Delta})\phi^{n+1}_{i,j} + \phi^{n+1}_{i-1,j} = \Delta x^2 f_{i,j} - \bar{\Delta}\left( \phi^n_{i,j+1} + \phi^n_{i,j-1} \right), \qquad i = 2, \ldots, I. \tag{14.31} \]
Therefore, we have a tridiagonal problem for $\phi_{i,j}$ along the $j$th line, which can be solved using the Thomas algorithm.
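The line solve in (14.31) can be sketched as follows, assuming Dirichlet values on the edges of a zero-based nested list `phi[i][j]`. The Thomas routine and the function names are written only for this illustration; the sweep uses the most recently updated values on the neighbouring lines, so the line at $j - 1$ has already been updated when line $j$ is solved.

```python
def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system; lower[0] and upper[-1] are not used."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for k in range(1, n):
        denom = diag[k] - lower[k] * c[k - 1]
        c[k] = upper[k] / denom
        d[k] = (rhs[k] - lower[k] * d[k - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for k in range(n - 2, -1, -1):
        x[k] = d[k] - c[k] * x[k + 1]
    return x


def line_sweep(phi, f, dx, dy):
    """Solve (14.31) implicitly along each interior line of constant j, in place."""
    I, J = len(phi) - 1, len(phi[0]) - 1
    ratio = (dx / dy) ** 2
    n = I - 1  # number of unknowns on each line
    for j in range(1, J):
        rhs = [dx ** 2 * f[i][j] - ratio * (phi[i][j + 1] + phi[i][j - 1])
               for i in range(1, I)]
        rhs[0] -= phi[0][j]    # known boundary value moved to the right-hand side
        rhs[-1] -= phi[I][j]
        line = thomas([1.0] * n, [-2.0 * (1.0 + ratio)] * n, [1.0] * n, rhs)
        for i in range(1, I):
            phi[i][j] = line[i - 1]
```

Each sweep costs more than a point-by-point sweep, but information travels along an entire line in a single solve.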
Remarks:
1. If sweeping through $j$-lines, $j = 2, \ldots, J$, then $\phi^n_{i,j-1}$ becomes $\phi^{n+1}_{i,j-1}$ in equation (14.31), as it has already been updated. In other words, we might as well update the values as in Gauss-Seidel.
2. SOR can also be incorporated after each tridiagonal solve to accelerate iterative convergence.
3. This approach is more efficient at spreading information throughout the domain; therefore, it reduces the number of iterations required for convergence, but there is more computation per iteration.
4. This illustration provides the motivation for the alternating-direction implicit (ADI) method, which accomplishes the above in both the $x$- and $y$-directions.
14.5.2 ADI Method
In the ADI method, we sweep along lines but in alternating directions. Although
the order is arbitrary, let us sweep along lines of constant y first followed by lines
of constant x.
In the first half of the iteration, from $n$ to $n + 1/2$, we perform a sweep along constant $y$-lines by solving the series of tridiagonal problems for each of $j = 2, \ldots, J$ given by
n+1/2
n+1/2
n+1/2
i,j1
(14.32)
This results in a tridiagonal system of equations to solve for each constant-$y$ line, that is, for each $j$ and all $i$. The tridiagonal system corresponding to each value of $j$ is solved in succession as we sweep along each constant-$y$ line. This is why the $\phi^{n+1/2}_{i,j-1}$ term on the right-hand side has been updated from the previous line solved. Unlike in equation (14.31), differencing in the $x$- and $y$-directions is kept separate to mimic diffusion in each direction. This is why $\phi_{i,j}$ appears on both sides of the equation: one from the derivative in the $x$-direction and one from the derivative in the $y$-direction. This approach is called a splitting method, in which all of the terms associated with the derivatives in each direction are kept together on one side of the difference equation. Observe that we have added a term $\sigma\phi_{i,j}$ to each side of the equation. The numerical parameter $\sigma$ is an acceleration parameter to enhance diagonal dominance ($\sigma \ge 0$); $\sigma = 0$ corresponds to no acceleration. Note that the $\sigma\phi_{i,j}$ terms on each side of the equation cancel once the iteration has converged, so they do not alter the solution for $\phi$. Although it would appear to enhance diagonal dominance, larger $\sigma$ is not necessarily better.
In the second half of the iteration, from $n + 1/2$ to $n + 1$, we sweep along constant $x$-lines by solving the series of tridiagonal problems for $i = 2, \ldots, I$ given by
n+1
2
n+1
n+1
i,j
i1,j
where $\phi^{n+1}_{i-1,j}$ has been updated from the previous line. Once again, there is a tridiagonal system to solve for each constant-$x$ line, and splitting has been used.
Remarks:
1. Each ADI iteration involves $(I - 1) + (J - 1)$ tridiagonal solves (for Dirichlet boundary conditions, in which case it is not necessary to solve along the boundaries).
2. For $\Delta x = \Delta y$ ($\bar{\Delta} = 1$), it can be shown that for the Poisson (or Laplace) equation with Dirichlet boundary conditions, the acceleration parameter that gives the best speedup is
\[ \sigma = 2\cos(\pi/R), \]
3.
4.
5.
6.
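\[ T_x\phi = \frac{\partial^2 \phi}{\partial x^2} = \frac{\phi_{i+1,j} - 2\phi_{i,j} + \phi_{i-1,j}}{(\Delta x)^2}, \qquad T_y\phi = \frac{\partial^2 \phi}{\partial y^2} = \frac{\phi_{i,j+1} - 2\phi_{i,j} + \phi_{i,j-1}}{(\Delta y)^2}. \]
Then in the ADI method with splitting, along constant $y$ lines we solve
\[ (\Delta x)^2 T_x\phi = (\Delta x)^2\left( f - T_y\phi \right), \]
and along constant $x$ lines we solve
\[ (\Delta x)^2 T_y\phi = (\Delta x)^2\left( f - T_x\phi \right). \]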
Problem Set # 2
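\[ \delta_x^2\phi_{i,j} = \frac{\phi_{i+1,j} - 2\phi_{i,j} + \phi_{i-1,j}}{\Delta x^2}, \qquad \delta_y^2\phi_{i,j} = \frac{\phi_{i,j+1} - 2\phi_{i,j} + \phi_{i,j-1}}{\Delta y^2}. \]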
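Recall from equation (11.11) in Section 11.2 that from the Taylor series, we have
\[ \frac{\partial^2 \phi}{\partial x^2} = \delta_x^2\phi - \frac{\Delta x^2}{12}\frac{\partial^4 \phi}{\partial x^4} + O(\Delta x^4), \tag{14.34} \]
where the second term is the truncation error for the second-order accurate approximation, which we will now include in our approximation. Therefore,
\[ \delta_x^2\phi = \frac{\partial^2 \phi}{\partial x^2} + \frac{\Delta x^2}{12}\frac{\partial^4 \phi}{\partial x^4} + O(\Delta x^4) = \left[ 1 + \frac{\Delta x^2}{12}\frac{\partial^2}{\partial x^2} \right]\frac{\partial^2 \phi}{\partial x^2} + O(\Delta x^4). \tag{14.35} \]
But from equation (14.34), observe that
\[ \frac{\partial^2}{\partial x^2} = \delta_x^2 + O(\Delta x^2). \]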
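\[ \delta_x^2\phi = \left[ 1 + \frac{\Delta x^2}{12}\delta_x^2 \right]\frac{\partial^2 \phi}{\partial x^2} + O(\Delta x^4). \]
Solving for $\partial^2\phi/\partial x^2$ yields
\[ \frac{\partial^2 \phi}{\partial x^2} = \left[ 1 + \frac{\Delta x^2}{12}\delta_x^2 \right]^{-1}\delta_x^2\phi + \left[ 1 + \frac{\Delta x^2}{12}\delta_x^2 \right]^{-1} O(\Delta x^4). \tag{14.36} \]
From a binomial expansion (with $\Delta x$ sufficiently small) observe that
\[ \left[ 1 + \frac{\Delta x^2}{12}\delta_x^2 \right]^{-1} = 1 - \frac{\Delta x^2}{12}\delta_x^2 + O(\Delta x^4). \tag{14.37} \]
Because the last term in equation (14.36) is still $O(\Delta x^4)$, we can write equation (14.36) as
\[ \frac{\partial^2 \phi}{\partial x^2} = \left[ 1 + \frac{\Delta x^2}{12}\delta_x^2 \right]^{-1}\delta_x^2\phi + O(\Delta x^4). \tag{14.38} \]
Substituting the expression (14.37) into equation (14.38) leads to an $O(\Delta x^4)$-accurate central-difference approximation for the second derivative given by
\[ \frac{\partial^2 \phi}{\partial x^2} = \left[ 1 - \frac{\Delta x^2}{12}\delta_x^2 \right]\delta_x^2\phi + O(\Delta x^4). \]
Owing to the $\delta_x^2(\delta_x^2\phi)$ operator, however, this approximation involves the five points $\phi_{i-2}$, $\phi_{i-1}$, $\phi_i$, $\phi_{i+1}$, and $\phi_{i+2}$; therefore, it is not compact. In order to obtain a compact scheme, we also consider the derivative in the $y$-direction. Similar to equation (14.38), we have in the $y$-direction
\[ \frac{\partial^2 \phi}{\partial y^2} = \left[ 1 + \frac{\Delta y^2}{12}\delta_y^2 \right]^{-1}\delta_y^2\phi + O(\Delta y^4). \tag{14.39} \]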
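Now consider the Poisson equation
\[ \frac{\partial^2 \phi}{\partial x^2} + \frac{\partial^2 \phi}{\partial y^2} = f(x, y). \]
Substituting equations (14.38) and (14.39) into the Poisson equation leads to
\[ \left[ 1 + \frac{\Delta x^2}{12}\delta_x^2 \right]^{-1}\delta_x^2\phi + \left[ 1 + \frac{\Delta y^2}{12}\delta_y^2 \right]^{-1}\delta_y^2\phi + O(\Delta^4) = f(x, y), \]
where $\Delta = \max(\Delta x, \Delta y)$. Multiplying by $\left[ 1 + \frac{\Delta x^2}{12}\delta_x^2 \right]\left[ 1 + \frac{\Delta y^2}{12}\delta_y^2 \right]$ gives
\[ \left[ 1 + \frac{\Delta y^2}{12}\delta_y^2 \right]\delta_x^2\phi + \left[ 1 + \frac{\Delta x^2}{12}\delta_x^2 \right]\delta_y^2\phi + O(\Delta^4) = \left[ 1 + \frac{\Delta x^2}{12}\delta_x^2 + \frac{\Delta y^2}{12}\delta_y^2 + O(\Delta^4) \right] f(x, y), \tag{14.40} \]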
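\[
\begin{aligned}
\left[ 1 + \frac{\Delta y^2}{12}\delta_y^2 \right]\delta_x^2\phi
&= \frac{1}{\Delta x^2}\left( \phi_{i-1,j} - 2\phi_{i,j} + \phi_{i+1,j} \right) \\
&\quad + \frac{1}{12\Delta x^2}\left[ \left( \phi_{i-1,j-1} - 2\phi_{i-1,j} + \phi_{i-1,j+1} \right) - 2\left( \phi_{i,j-1} - 2\phi_{i,j} + \phi_{i,j+1} \right) + \left( \phi_{i+1,j-1} - 2\phi_{i+1,j} + \phi_{i+1,j+1} \right) \right] \\
&= \frac{1}{12\Delta x^2}\left[ -20\phi_{i,j} + 10\left( \phi_{i-1,j} + \phi_{i+1,j} \right) - 2\left( \phi_{i,j-1} + \phi_{i,j+1} \right) + \phi_{i-1,j-1} + \phi_{i-1,j+1} + \phi_{i+1,j-1} + \phi_{i+1,j+1} \right].
\end{aligned}
\]
Therefore, we have a nine-point stencil, but the approximation only requires three points in each direction, and thus it is compact.
Similarly, expanding the second term in equation (14.40) yields
\[ \left[ 1 + \frac{\Delta x^2}{12}\delta_x^2 \right]\delta_y^2\phi = \frac{1}{12\Delta y^2}\left[ -20\phi_{i,j} + 10\left( \phi_{i,j-1} + \phi_{i,j+1} \right) - 2\left( \phi_{i-1,j} + \phi_{i+1,j} \right) + \phi_{i-1,j-1} + \phi_{i-1,j+1} + \phi_{i+1,j-1} + \phi_{i+1,j+1} \right], \]
and the right-hand side of equation (14.40) is
\[
\begin{aligned}
\left[ 1 + \frac{\Delta x^2}{12}\delta_x^2 + \frac{\Delta y^2}{12}\delta_y^2 \right] f(x, y)
&= f_{i,j} + \frac{1}{12}\left[ f_{i-1,j} - 2f_{i,j} + f_{i+1,j} + f_{i,j-1} - 2f_{i,j} + f_{i,j+1} \right] \\
&= \frac{1}{12}\left[ 8f_{i,j} + f_{i-1,j} + f_{i+1,j} + f_{i,j-1} + f_{i,j+1} \right].
\end{aligned}
\]
Thus, the coefficients in the nine-point finite-difference stencils for $\phi(x, y)$ and $f(x, y)$ are illustrated graphically in Figures 14.13 and 14.14, respectively.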
Remarks:
1. Observe that in equation (14.40), the two-dimensionality of the equation has been exploited to obtain the compact finite-difference stencil. That is, the $\delta_x^2\delta_x^2$ and $\delta_y^2\delta_y^2$ operators have been converted to $\delta_x^2\delta_y^2$ and $\delta_y^2\delta_x^2$ difference operators.
2. Because the finite-difference stencil is compact, that is, only involving three points in each direction, application of the ADI method as in the previous section results in a set of tridiagonal problems to solve. In this manner, the fourth-order, compact finite-difference approach is no less efficient than that for the second-order scheme used in the previous section (there are simply additional terms on the right-hand side of the equations that must be evaluated).
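The fourth-order accuracy of the compact scheme can be checked numerically with a short sketch. For $\Delta x = \Delta y = h$, adding the two expanded terms on the left-hand side of (14.40) gives $\left[ -20\phi_{i,j} + 4(\text{edge neighbours}) + (\text{corner neighbours}) \right]/(6h^2)$. The test function $\phi = \sin x \sin y$ (so that $f = -2\sin x \sin y$), the evaluation point, and the function name are assumptions made only for this example.

```python
import math


def compact_truncation_error(h, x0=0.7, y0=0.4):
    """|LHS - RHS| of (14.40) at one point for phi = sin(x) sin(y), dx = dy = h."""
    def phi(x, y):
        return math.sin(x) * math.sin(y)

    def f(x, y):
        return -2.0 * math.sin(x) * math.sin(y)

    edges = (phi(x0 - h, y0) + phi(x0 + h, y0)
             + phi(x0, y0 - h) + phi(x0, y0 + h))
    corners = (phi(x0 - h, y0 - h) + phi(x0 - h, y0 + h)
               + phi(x0 + h, y0 - h) + phi(x0 + h, y0 + h))
    lhs = (-20.0 * phi(x0, y0) + 4.0 * edges + corners) / (6.0 * h * h)
    rhs = (8.0 * f(x0, y0) + f(x0 - h, y0) + f(x0 + h, y0)
           + f(x0, y0 - h) + f(x0, y0 + h)) / 12.0
    return abs(lhs - rhs)
```

Halving $h$ should reduce the returned error by a factor close to $2^4 = 16$, whereas the five-point scheme of equation (14.3) would give a factor close to 4.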
to nonlinear equations with minimal loss in efficiency. They are currently among
the best methods for solving partial differential equations numerically, both in
terms of efficiency and flexibility.
14.7.1 Motivation
If one were to carefully examine the iterative convergence history of typical iterative techniques, such as Gauss-Seidel or ADI, the following properties would be observed: high-frequency components of the error converge quickly, whereas low-frequency components of the error converge relatively slowly. As in Briggs et al. (2000), let us illustrate this behavior by considering the following simple one-dimensional problem
\[ \frac{d^2\phi}{dx^2} = 0, \qquad 0 \le x \le 1, \tag{14.41} \]
with $\phi(0) = \phi(1) = 0$, which has the exact solution $\phi(x) = 0$. Therefore, all plots of the numerical solution for $\phi(x)$ are also plots of the error. Discretizing equation (14.41) over $I$ equal subintervals ($I + 1$ points) using central differences gives³
\[ \phi_{i+1} - 2\phi_i + \phi_{i-1} = 0, \qquad i = 1, \ldots, I - 1, \qquad \phi_0 = \phi_I = 0. \]
³ Note that the first grid point at the left boundary $x = 0$ is designated as being at $i = 0$, rather than $i = 1$ as before. This is more natural given the way that the grids will be defined.
To show how the nature of the error affects convergence, consider an initial guess, that is, error, consisting of the Fourier mode $\phi(x) = \sin(k\pi x)$, where $k$ is the wavenumber and indicates the number of half sine waves on the interval $0 \le x \le 1$. In discretized form, with $x_i = i\Delta x = i/I$, this is
\[ \phi_i = \sin\left( \frac{k\pi i}{I} \right), \qquad i = 0, \ldots, I, \]
with the wavenumber $1 \le k \le I - 1$. Thus, the error for the initial guess is such that small wavenumber $k$ corresponds to long, smooth (low-frequency) waves, and large $k$ corresponds to highly oscillatory (high-frequency) waves in the initial condition. Figure 14.15 illustrates several different modes in the error. Applying Jacobi iteration with $I = 64$, the solution converges more rapidly for the higher frequency initial guess as illustrated in Figure 14.16. A more realistic situation is one in which the initial guess contains multiple modes, for example
\[ \phi_i = \frac{1}{3}\left[ \sin\left( \frac{\pi i}{I} \right) + \sin\left( \frac{6\pi i}{I} \right) + \sin\left( \frac{32\pi i}{I} \right) \right], \]
in which the $k = 1$, 6, and 32 terms represent low-, medium-, and high-frequency modes, respectively. Applying Jacobi iteration with $I = 64$, the error is reduced rapidly during the early iterations but more slowly thereafter as shown in Figure 14.17. Thus, there is rapid convergence of the overall solution until the high-frequency modes are smoothed out, followed by slow convergence when only lower frequency modes are present. To further illustrate this phenomenon, consider the following sequence using Jacobi iteration. Figure 14.18 shows the result of relaxation acting on a mode with $k = 3$ after one iteration (left) and ten iterations (right), and Figure 14.19 shows the result of relaxation acting on a mode with $k = 16$ after one iteration (left) and ten iterations (right). Figure 14.20 shows the result of relaxation acting on an error having modes with $k = 2$ and $k = 16$ after one iteration (left) and ten iterations (right). Again, we observe that the high-frequency error is reduced more rapidly as we iterate (relax) compared to the low-frequency error.
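The smoothing behavior described above is easy to reproduce. The sketch below applies Jacobi iteration to the discretized model problem with a single Fourier mode as the initial guess and reports the largest remaining error; the function names are assumptions of this example.

```python
import math


def jacobi_1d(phi):
    """One Jacobi sweep for phi_{i+1} - 2 phi_i + phi_{i-1} = 0 with fixed end values."""
    interior = [0.5 * (phi[i + 1] + phi[i - 1]) for i in range(1, len(phi) - 1)]
    return [phi[0]] + interior + [phi[-1]]


def error_after(k, I=64, sweeps=10):
    """Largest error left after relaxing the mode sin(k pi i / I)."""
    phi = [math.sin(k * math.pi * i / I) for i in range(I + 1)]
    for _ in range(sweeps):
        phi = jacobi_1d(phi)
    return max(abs(value) for value in phi)
```

Comparing `error_after(3)` with `error_after(16)` shows the $k = 16$ mode being damped far more strongly over ten sweeps, consistent with the behavior described for Figures 14.18 and 14.19.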
Multigrid methods take advantage of this property of relaxation techniques by recognizing that smooth components of the error become more oscillatory with respect to the grid size on a coarse grid. That is, there are fewer grid points per wavelength on a coarser grid as compared to a finer grid, and the error appears more oscillatory on the coarser grid. Thus, relaxation would be expected to be more effective on a coarse grid representation of the error. Note that it is also faster as there are fewer points to compute.
Figure 14.17 Convergence rate for error with modes having k = 1, 6, and 32 using Jacobi iteration.
Figure 14.18 Error of a mode with k = 3 after one iteration (left) and ten iterations (right).
Remarks:
1. Multigrid methods are not so much a specific set of techniques as they are a
framework for accelerating relaxation (iterative) methods.
2. Multigrid methods are comparable in speed with fast direct methods, such as
Fourier methods and cyclic reduction, but they can be used to solve general
elliptic equations with variable coefficients and even nonlinear equations.
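Figure 14.19 Error of a mode with k = 16 after one iteration (left) and ten iterations (right).
\[ A(x, y)\frac{\partial^2 \phi}{\partial x^2} + B(x, y)\frac{\partial \phi}{\partial x} + C(x, y)\frac{\partial^2 \phi}{\partial y^2} + D(x, y)\frac{\partial \phi}{\partial y} + E(x, y)\phi = F(x, y). \tag{14.42} \]
To be elliptic, $A(x, y)C(x, y) > 0$ for all $(x, y)$. Approximating this differential equation using second-order accurate central differences gives
\[ A_{i,j}\frac{\phi_{i+1,j} - 2\phi_{i,j} + \phi_{i-1,j}}{\Delta x^2} + B_{i,j}\frac{\phi_{i+1,j} - \phi_{i-1,j}}{2\Delta x} + C_{i,j}\frac{\phi_{i,j+1} - 2\phi_{i,j} + \phi_{i,j-1}}{\Delta y^2} + D_{i,j}\frac{\phi_{i,j+1} - \phi_{i,j-1}}{2\Delta y} + E_{i,j}\phi_{i,j} = F_{i,j}, \]
where $A_{i,j} = A(x_i, y_j)$, etc. We rewrite this difference equation in the form
\[ a_{i,j}\phi_{i+1,j} + b_{i,j}\phi_{i-1,j} + c_{i,j}\phi_{i,j+1} + d_{i,j}\phi_{i,j-1} + e_{i,j}\phi_{i,j} = F_{i,j}, \tag{14.43} \]
where
\[ a_{i,j} = \frac{A_{i,j}}{\Delta x^2} + \frac{B_{i,j}}{2\Delta x}, \qquad b_{i,j} = \frac{A_{i,j}}{\Delta x^2} - \frac{B_{i,j}}{2\Delta x}, \]
\[ c_{i,j} = \frac{C_{i,j}}{\Delta y^2} + \frac{D_{i,j}}{2\Delta y}, \qquad d_{i,j} = \frac{C_{i,j}}{\Delta y^2} - \frac{D_{i,j}}{2\Delta y}, \]
\[ e_{i,j} = E_{i,j} - \frac{2A_{i,j}}{\Delta x^2} - \frac{2C_{i,j}}{\Delta y^2}. \]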
Figure 14.21 The coarse and fine grids, with the coarse grid consisting of
every other point in the fine grid.
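Coarse-Grid Correction
For convenience, write (14.43) (or some other difference equation) as
\[ L\phi = f. \tag{14.44} \]
The error of an approximate solution, denoted here by $\tilde{\phi}$, is
\[ e = \phi - \tilde{\phi}, \tag{14.45} \]
and the residual is
\[ r = f - L\tilde{\phi}. \tag{14.46} \]
Observe from equation (14.44) that if $\tilde{\phi} = \phi$, then the residual is zero; therefore, the residual is a measure of how wrong the approximate solution is. Substituting (14.45) into equation (14.44) gives
\[ Le + L\tilde{\phi} = f, \]
or
\[ Le = f - L\tilde{\phi}, \]
which is the error equation
\[ Le = r. \tag{14.47} \]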
Our use of r to denote the residual here is not to be confused with the rank of a matrix in
Chapters 1 and 2.
In the numerical methods literature, h is often used to indicate the grid size, here given by x and
y.
operators, observe that the subscript indicates the grid from which information is
moved, and the superscript indicates the grid to which the information is moved.
From these definitions, we can devise a scheme with which to correct the solution on a fine grid by solving for the error on a coarse grid. This is known as coarse-grid correction (CGC) and consists of the following steps as illustrated in Figure 14.22:
1. Relax the original difference equation $L^h\phi^h = f^h$ on the fine grid $\Omega^h$ using Gauss-Seidel, ADI, etc. $\nu_1$ times with an initial guess $\phi^h$.
2. Compute the residual on the fine grid $\Omega^h$ and restrict it to the coarse grid $\Omega^{2h}$:
\[ r^{2h} = I_h^{2h} r^h = I_h^{2h}\left( f^h - L^h\phi^h \right). \]
3. Using the residual as the right-hand side, solve the error equation $L^{2h}e^{2h} = r^{2h}$ on the coarse grid $\Omega^{2h}$.
4. Interpolate the error to the fine grid and correct the fine-grid approximation according to
\[ \phi^h \leftarrow \phi^h + I_{2h}^h e^{2h}. \]
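Steps 1 through 4 can be sketched for the one-dimensional model problem $(\phi_{i+1} - 2\phi_i + \phi_{i-1})/h^2 = f_i$ with zero boundary values. This is only an illustration under stated assumptions: Gauss-Seidel is used for relaxation, restriction is the one-dimensional full-weighting average with weights 1/4, 1/2, 1/4, interpolation is linear, and the coarse-grid "solve" in Step 3 is approximated by a large number of relaxation sweeps. All function names are hypothetical.

```python
def relax(phi, f, h, sweeps):
    """Gauss-Seidel sweeps, in place, for (phi[i+1] - 2 phi[i] + phi[i-1]) / h^2 = f[i]."""
    for _ in range(sweeps):
        for i in range(1, len(phi) - 1):
            phi[i] = 0.5 * (phi[i + 1] + phi[i - 1] - h * h * f[i])


def residual(phi, f, h):
    """r = f - L phi at interior points (zero at the two boundary points)."""
    r = [0.0] * len(phi)
    for i in range(1, len(phi) - 1):
        r[i] = f[i] - (phi[i + 1] - 2.0 * phi[i] + phi[i - 1]) / (h * h)
    return r


def restrict(r):
    """Full weighting from a grid with 2n intervals to one with n intervals."""
    n = (len(r) - 1) // 2
    coarse = [0.0] * (n + 1)
    for i in range(1, n):
        coarse[i] = 0.25 * r[2 * i - 1] + 0.5 * r[2 * i] + 0.25 * r[2 * i + 1]
    return coarse


def interpolate(e):
    """Linear interpolation from the coarse grid to the fine grid."""
    fine = [0.0] * (2 * (len(e) - 1) + 1)
    for i in range(len(e)):
        fine[2 * i] = e[i]                      # copy common points
    for i in range(len(e) - 1):
        fine[2 * i + 1] = 0.5 * (e[i] + e[i + 1])
    return fine


def two_grid_correction(phi, f, h, nu1=3, coarse_sweeps=2000):
    """One coarse-grid correction (Steps 1-4), updating phi in place."""
    relax(phi, f, h, nu1)                       # Step 1: relax on the fine grid
    r_coarse = restrict(residual(phi, f, h))    # Step 2: restrict the residual
    e_coarse = [0.0] * len(r_coarse)
    relax(e_coarse, r_coarse, 2.0 * h, coarse_sweeps)  # Step 3: error equation
    e_fine = interpolate(e_coarse)              # Step 4: interpolate and correct
    for i in range(len(phi)):
        phi[i] += e_fine[i]
```

Replacing the many coarse-grid sweeps in Step 3 by a recursive call on a yet coarser grid is what turns this two-grid sketch into a multigrid cycle.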
Figure 14.24 Error from Step 1 after one relaxation sweep on the fine grid.
Figure 14.25 Error from Step 1 after three relaxation sweeps on the fine grid.
of solving the error equation on the fine grid, let us simply relax as on the fine grid. Figure 14.26 shows the error after one relaxation sweep on the coarse grid. An additional two relaxation sweeps on the coarse grid produce the error shown in Figure 14.27. Note that restriction to the coarse grid accelerates convergence of the low-frequency mode, which has a higher frequency relative to the coarser grid. In Step 4, the error is interpolated from the coarse to the fine grid and used to correct the approximate solution on the fine grid. Step 5 after three relaxation sweeps on the fine grid produces the error shown in Figure 14.28. As you can see, the low-frequency mode is nearly eliminated after the CGC sequence is completed.
Figure 14.26 Error from Step 3 after one relaxation sweep on the coarse grid.
Figure 14.27 Error from Step 3 after three relaxation sweeps on the coarse grid.
Figure 14.28
An obvious question at this point is, how do we obtain the coarse-grid solution for $e^{2h}$ in Step 3? Actually, we already know the answer: perform additional CGCs. That is, if the CGC works between two grid levels, it should be even more effective between three, four, or as many grid levels as we can accommodate. To implement this, then, we recursively replace Step 3 by additional CGCs on progressively coarser grids until it is no longer possible to further reduce the grid. This leads to the so-called V-cycle as illustrated in Figure 14.29. The V-cycles are then repeated until convergence; each V of the V-cycle is essentially a multigrid iteration. Although we simply denote the error at each grid level as $e(x, y)$, it is important to realize that on each successively coarser grid, what is being solved for is the error of the error on the next finer grid. Thus, on the finest grid, say $\Omega^h$, relaxation is carried out on the original equation for $\phi(x, y)$; on the next coarser grid $\Omega^{2h}$, relaxation is on the equation for the error on $\Omega^h$; on the next coarser grid $\Omega^{4h}$, relaxation is on the equation for the error on $\Omega^{2h}$, and so forth.
This simple V-cycle scheme is appropriate when a good initial guess is available to start the V-cycles. An example is a solution to equation (14.42) in the context of an unsteady calculation, in which case the solution for $\phi^h$ from the previous time step is a good initial guess for the current time step. If no good initial guess is available, then the full multigrid V-cycle (FMG) may be applied according to the following procedure, which utilizes the same components as in the coarse-grid correction sequence:
1.
2.
3.
4.
5.
6.
general grids may be obtained using the following grid definitions. The differential equation (14.42) is discretized on a uniform grid having $N_x \times N_y$ points, which are defined by
\[ N_x = m_x 2^{(n_x - 1)} + 1, \qquad N_y = m_y 2^{(n_y - 1)} + 1, \tag{14.48} \]
where $n_x$ and $n_y$ determine the number of grid levels, and $m_x$ and $m_y$ determine the size of the coarsest grid, which is $(m_x + 1) \times (m_y + 1)$.
In order to maximize the benefits of the multigrid methodology, we want to maximize the number of grid levels between which the algorithm will move. Therefore, for a given grid, $n_x$ and $n_y$ should be as large as possible, and $m_x$ and $m_y$ should be as small as possible for maximum efficiency. Typically, $m_x$ and $m_y$ are 2, 3, or 5. For example:
$N_x = 65$: $m_x = 2$, $n_x = 6$
$N_x = 129$: $m_x = 2$, $n_x = 7$
$N_x = 49$: $m_x = 3$, $n_x = 5$
$N_x = 81$: $m_x = 5$, $n_x = 5$
(14.49)
(14.50)
where $G(1)$ is the coarsest grid, $G(N)$ is the finest grid, and $L = 1, \ldots, N$. Each grid $G(L)$ has $M_x(L) \times M_y(L)$ grid points, where
\[ M_x(L) = m_x 2^{[\max(n_x + L - N,\, 1) - 1]} + 1, \qquad M_y(L) = m_y 2^{[\max(n_y + L - N,\, 1) - 1]} + 1. \tag{14.51} \]
For example, if
\[ N_x = 65, \qquad N_y = 49, \]
then
\[ m_x = 2,\ n_x = 6 \qquad \text{and} \qquad m_y = 3,\ n_y = 5. \]
$\vdots$
$G(3)$: $M_x(3) = 9$, $M_y(3) = 7$
$G(2)$: $M_x(2) = 5$, $M_y(2) = 4$
$G(1)$: $M_x(1) = 3$, $M_y(1) = 4$
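The grid-size formulas (14.48) and (14.51) are simple to tabulate. In the sketch below, the function names are hypothetical, and the total number of levels is taken to be $N = \max(n_x, n_y)$; that choice is an assumption of this example, made because it reproduces the sizes quoted above for $G(1)$ through $G(3)$.

```python
def finest_size(m, n):
    """Number of points on the finest grid in one direction, equation (14.48)."""
    return m * 2 ** (n - 1) + 1


def level_size(m, n, level, n_levels):
    """Number of points in one direction on grid G(level), equation (14.51)."""
    return m * 2 ** (max(n + level - n_levels, 1) - 1) + 1


def grid_hierarchy(mx, nx, my, ny):
    """List of (Mx, My) from the coarsest grid G(1) to the finest grid G(N)."""
    n_levels = max(nx, ny)  # assumed definition of N for this sketch
    return [(level_size(mx, nx, level, n_levels),
             level_size(my, ny, level, n_levels))
            for level in range(1, n_levels + 1)]
```

For the example above, `grid_hierarchy(2, 6, 3, 5)` lists the grids from coarsest to finest; the $y$-direction stops coarsening one level earlier than the $x$-direction, which is why $G(1)$ and $G(2)$ share the same $M_y$.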
Boundary Conditions
At each boundary, let us consider the general form of the boundary condition
\[ p\phi + q\frac{\partial \phi}{\partial n} = s, \tag{14.52} \]
where $n$ is the direction normal to the surface. A Dirichlet boundary condition has $q = 0$, and a Neumann boundary condition has $p = 0$. This boundary condition is applied directly on the finest grid $\Omega^h$, that is,
\[ p^h\phi^h + q^h\frac{\partial \phi^h}{\partial n} = s^h. \tag{14.53} \]
On the coarser grids, however, we need the boundary condition for the error. In order to obtain such a condition, consider the following. On the coarse grid $\Omega^{2h}$, equation (14.52) applies to the solution $\phi$; thus,
\[ p^{2h}\phi^{2h} + q^{2h}\frac{\partial \phi^{2h}}{\partial n} = s^{2h}. \]
Relaxation
At the heart of the multigrid method is an iterative (relaxation) scheme that is used to update the numerical approximation to the solution. Typically, red-black Gauss-Seidel iteration is used to relax the difference equation as illustrated in Figure 14.31. Performing the relaxation on all of the red and black grid points separately eliminates data dependencies, so the scheme is easily implemented on parallel computers (see Section 12). Note that when Gauss-Seidel is used, SOR should not be implemented because it destroys the high-frequency smoothing of the multigrid approach.
Although Gauss-Seidel is most commonly used owing to its ease of implementation, particularly in parallel, it is better to use alternating-direction implicit (ADI) relaxation for the same reason that ADI is better than Gauss-Seidel. When sweeping along lines of constant $y$, the following tridiagonal problem is solved for each $j = 1, \ldots, M_y(L)$ (see equation (14.43))
\[ a_{i,j}\phi_{i+1,j} + e_{i,j}\phi_{i,j} + b_{i,j}\phi_{i-1,j} = f_{i,j} - c_{i,j}\phi^*_{i,j+1} - d_{i,j}\phi^*_{i,j-1}, \tag{14.55} \]
for $i = 1, \ldots, M_x(L)$. Here, $\phi^*$ denotes the most recent approximation, which may be from the previous or current iteration depending upon the sweep direction. Similar to red-black Gauss-Seidel, we could sweep all lines with $j$ even and $j$ odd separately to eliminate data dependencies. We will refer to this as zebra relaxation. Then lines of constant $x$ are swept by solving the tridiagonal problem for each $i = 1, \ldots, M_x(L)$ given by
\[ c_{i,j}\phi_{i,j+1} + e_{i,j}\phi_{i,j} + d_{i,j}\phi_{i,j-1} = f_{i,j} - a_{i,j}\phi^*_{i+1,j} - b_{i,j}\phi^*_{i-1,j}, \tag{14.56} \]
for $j = 1, \ldots, M_y(L)$. Again we could sweep all lines with $i$ even and $i$ odd separately to accommodate parallelization.
Coarse grid $\Omega^{2h}$: $(i, j)$, $1 \le i \le N_x^{2h}$, $1 \le j \le N_y^{2h}$
Fine grid $\Omega^{h}$: $(i^*, j^*)$, $1 \le i^* \le N_x^{h}$, $1 \le j^* \le N_y^{h}$
\[ i^* = 2i - 1, \qquad j^* = 2j - 1. \]
in which we simply drop the points that are not common to both the coarser and finer grids. The matrix symbol for straight injection is [1].
A better restriction operator is full weighting, with the matrix symbol being given in Figure 14.32. In this case,
\[
\begin{aligned}
\phi^{2h}_{i,j} &= \frac{1}{16}\left[ \phi^h_{i^*-1,j^*-1} + \phi^h_{i^*-1,j^*+1} + \phi^h_{i^*+1,j^*-1} + \phi^h_{i^*+1,j^*+1} \right] \\
&\quad + \frac{1}{8}\left[ \phi^h_{i^*,j^*-1} + \phi^h_{i^*,j^*+1} + \phi^h_{i^*-1,j^*} + \phi^h_{i^*+1,j^*} \right] + \frac{1}{4}\phi^h_{i^*,j^*}.
\end{aligned}
\tag{14.57}
\]
This represents a weighted average of surrounding points in the fine mesh. We then use straight injection on the boundaries, such that $\phi^{2h}_{i,j} = \phi^h_{i^*,j^*}$ for $i = 1, N_x^{2h}$, $j = 1, \ldots, N_y^{2h}$ and for $j = 1, N_y^{2h}$, $i = 1, \ldots, N_x^{2h}$.
In general, the grids in the $x$ and $y$ directions may have different numbers of grid levels as discussed above. Therefore, restriction may be in both directions or only one direction. If, for example, restriction is applied only in the $x$-direction, then $N_y^{2h} = N_y^{h}$ and $j^* = j$ in equation (14.57).
Interpolation (Prolongation) Operator: $I_{2h}^h$
The interpolation operator is required for moving information from the coarser to the finer grid. The most commonly used interpolation operator is based on bilinear interpolation as illustrated in Figure 14.33. In bilinear interpolation, information at the four corners of a general grid cell on the finer and coarser grids is related by
\[
\begin{aligned}
\phi^h_{i^*,j^*} &= \phi^{2h}_{i,j} \quad \text{(copy common points)}, \\
\phi^h_{i^*+1,j^*} &= \tfrac{1}{2}\left( \phi^{2h}_{i,j} + \phi^{2h}_{i+1,j} \right), \\
\phi^h_{i^*,j^*+1} &= \tfrac{1}{2}\left( \phi^{2h}_{i,j} + \phi^{2h}_{i,j+1} \right), \\
\phi^h_{i^*+1,j^*+1} &= \tfrac{1}{4}\left( \phi^{2h}_{i,j} + \phi^{2h}_{i+1,j} + \phi^{2h}_{i,j+1} + \phi^{2h}_{i+1,j+1} \right).
\end{aligned}
\]
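Full-weighting restriction and bilinear interpolation can be sketched as two short functions. The sketch assumes zero-based nested lists, so that coarse point $(i, j)$ coincides with fine point $(2i, 2j)$ rather than $(2i - 1, 2j - 1)$ as in the one-based notation above, and it assumes coarsening in both directions with an odd number of fine-grid points in each. The function names are hypothetical.

```python
def restrict_full_weighting(fine):
    """Full weighting in the interior, straight injection on the boundaries."""
    nc_x = (len(fine) + 1) // 2
    nc_y = (len(fine[0]) + 1) // 2
    # Start from straight injection everywhere; the boundary values stay this way.
    coarse = [[fine[2 * i][2 * j] for j in range(nc_y)] for i in range(nc_x)]
    for i in range(1, nc_x - 1):
        for j in range(1, nc_y - 1):
            a, b = 2 * i, 2 * j
            corners = (fine[a - 1][b - 1] + fine[a - 1][b + 1]
                       + fine[a + 1][b - 1] + fine[a + 1][b + 1])
            edges = (fine[a][b - 1] + fine[a][b + 1]
                     + fine[a - 1][b] + fine[a + 1][b])
            coarse[i][j] = corners / 16.0 + edges / 8.0 + fine[a][b] / 4.0
    return coarse


def interpolate_bilinear(coarse):
    """Bilinear interpolation from the coarse grid to the next finer grid."""
    nf_x = 2 * len(coarse) - 1
    nf_y = 2 * len(coarse[0]) - 1
    fine = [[0.0] * nf_y for _ in range(nf_x)]
    for a in range(nf_x):
        for b in range(nf_y):
            i, j = a // 2, b // 2
            i1, j1 = i + a % 2, j + b % 2   # equal to (i, j) at common points
            fine[a][b] = 0.25 * (coarse[i][j] + coarse[i1][j]
                                 + coarse[i][j1] + coarse[i1][j1])
    return fine
```

A useful check is that a field varying linearly in $x$ and $y$ is reproduced exactly by interpolation and is returned unchanged by restriction.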
\[ A(x)\frac{\partial^2 \phi}{\partial x^2} + B(x)\frac{\partial \phi}{\partial x} + C(y)\frac{\partial^2 \phi}{\partial y^2} + D(y)\frac{\partial \phi}{\partial y} = F(x, y), \]
with Neumann boundary conditions. The following times are for an SGI Indy R5000 150 MHz. The grid is $N \times N$.
ADI:
            $\epsilon = 10^{-4}$            $\epsilon = 10^{-5}$
N       Iterations   Time (sec)     Iterations   Time (sec)
65         673         22.35           821         27.22
129      2,408        366.06         2,995        456.03
Note that in both cases, the total time required for the $N = 129$ case is approximately $16\times$ that with $N = 65$ ($4\times$ increase in points and $4\times$ increase in iterations).
Multigrid:
V-cycle with ADI relaxation (no FMG to get improved initial guess). Here the convergence criterion is evaluated between V-cycles.
            $\epsilon = 10^{-4}$            $\epsilon = 10^{-5}$
N       V-Cycles     Time (sec)     V-Cycles     Time (sec)
65          18          1.78            23          2.28
129         23         10.10            29         12.68
Remarks:
1. In both cases, the total time required for the $N = 129$ case is approximately $6\times$ that with $N = 65$ (the minimum is $4\times$).
The multigrid method scales to larger grid sizes more effectively than ADI alone; note the small increase in the number of V-cycles with increasing $N$.
2. The case with $N = 65$ is approximately $13\times$ faster than ADI, and the case with $N = 129$ is approximately $36\times$ faster!
3. Aside from the additional programming complexity, which is considerable, the only cost for the dramatic speed and scalability improvements is a doubling of memory requirements in order to store $\phi$ and the errors at each grid level. Hence, the multigrid method is essentially a tradeoff between computational time and memory requirements.
4. Because of their speed and generality, multigrid methods are currently the preferred framework for solving elliptic partial differential equations, including nonlinear equations. Nonlinear equations are solved using the Full Approximation Storage (FAS) method.
5. References:
Developed by Achi Brandt in the 1970s. See the original references on multilevel adaptive techniques.
Briggs, W.L., Henson, V.E. and McCormick, S.F., A Multigrid Tutorial (2nd Edition), SIAM (2000).
Thomas, J.L., Diskin, B. and Brandt, A.T., Textbook Multigrid Efficiency for Fluid Simulations, Annual Review of Fluid Mechanics (2003), 35, pp. 317-340.
throughout the domain, respectively. In applications that also involve convection, additional terms are required that are nonlinear. We now discuss how to
treat nonlinear convective terms using the Burgers equations.
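Consider the two-dimensional, steady Burgers equations
$$Re\left(u\frac{\partial u}{\partial x} + v\frac{\partial u}{\partial y}\right) = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}, \tag{14.58}$$
$$Re\left(u\frac{\partial v}{\partial x} + v\frac{\partial v}{\partial y}\right) = \frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 v}{\partial y^2}, \tag{14.59}$$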
which represent a simplified prototype of the Navier-Stokes equations, as there are no pressure terms. The velocity components are $u(x, y)$ and $v(x, y)$ in the $x$ and $y$ directions, respectively. The Reynolds number $Re$ is a nondimensional parameter representing the ratio of inertial (convective) to viscous forces in the flow; larger Reynolds numbers result in increased nonlinearity of the equations. The terms on the left-hand side are the convection terms, and those on the right-hand side are the viscous, or diffusion, terms. The Burgers equations are elliptic owing to the nature of the second-order viscous terms, but the convection terms make the equations nonlinear (actually quasilinear).
A simple approach to linearizing convective terms is known as Picard iteration, in which we take the coefficients of the nonlinear (first-derivative) terms to be known from the previous iteration, denoted by $u^*_{i,j}$ and $v^*_{i,j}$.
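Let us begin by approximating equation (14.58) using central differences for all derivatives as follows:
$$Re\left(u^*_{i,j}\frac{u_{i+1,j}-u_{i-1,j}}{2\Delta x} + v^*_{i,j}\frac{u_{i,j+1}-u_{i,j-1}}{2\Delta y}\right) = \frac{u_{i+1,j}-2u_{i,j}+u_{i-1,j}}{\Delta x^2} + \frac{u_{i,j+1}-2u_{i,j}+u_{i,j-1}}{\Delta y^2}.$$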
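Multiplying by $\Delta x^2$ and rearranging leads to the difference equation
$$\left(1-\tfrac{1}{2}Re\,\Delta x\,u^*_{i,j}\right)u_{i+1,j} + \left(1+\tfrac{1}{2}Re\,\Delta x\,u^*_{i,j}\right)u_{i-1,j} + \left(\bar{\Delta}-\tfrac{1}{2}Re\,\Delta x\,\bar{\Delta}^{1/2}v^*_{i,j}\right)u_{i,j+1} + \left(\bar{\Delta}+\tfrac{1}{2}Re\,\Delta x\,\bar{\Delta}^{1/2}v^*_{i,j}\right)u_{i,j-1} - 2(1+\bar{\Delta})u_{i,j} = 0, \tag{14.60}$$
where $\bar{\Delta} = (\Delta x/\Delta y)^2$, and we define
$$p = \tfrac{1}{2}Re\,\Delta x\,u^*_{i,j}, \qquad q = \tfrac{1}{2}Re\,\Delta x\,\bar{\Delta}^{1/2}v^*_{i,j}.$$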
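$$(p-1) + (1+p) + (q-\bar{\Delta}) + (q+\bar{\Delta}) \le 2(1+\bar{\Delta}),$$
or
$$2(p+q) \le 2(1+\bar{\Delta}),$$
but with $p > 1$ and $q > \bar{\Delta}$ this condition cannot be satisfied, and equation (14.60) is not diagonally dominant. The same result holds for $p < -1$ and $q < -\bar{\Delta}$. Therefore, we must have $|p| \le 1$ and $|q| \le \bar{\Delta}$, or
$$\left|\tfrac{1}{2}Re\,\Delta x\,u^*_{i,j}\right| \le 1, \quad\text{and}\quad \left|\tfrac{1}{2}Re\,\Delta x\,v^*_{i,j}\right| \le \bar{\Delta}^{1/2},$$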
which is a restriction on the mesh size for a given Reynolds number and velocity
field. There are two difficulties with this approach:
1. As the Reynolds number $Re$ increases, the grid sizes $\Delta x$ and $\Delta y$ must decrease.
$$u\frac{\partial u}{\partial x} = u^*_{i,j}\frac{u_{i,j}-u_{i-1,j}}{\Delta x} + O(\Delta x),$$
which gives a positive addition to the $u_{i,j}$ term to promote diagonal dominance (note the sign of the $u_{i,j}$ terms from the viscous terms on the right-hand side of the difference equation).
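2. If $u^*_{i,j} < 0$, then using a forward difference gives
$$u\frac{\partial u}{\partial x} = u^*_{i,j}\frac{u_{i+1,j}-u_{i,j}}{\Delta x} + O(\Delta x),$$
which again gives a positive addition to the $u_{i,j}$ term to promote diagonal dominance.
Similarly, for the $v\,\partial u/\partial y$ term:
1. If $v^*_{i,j} > 0$, then use a backward difference according to
$$v\frac{\partial u}{\partial y} = v^*_{i,j}\frac{u_{i,j}-u_{i,j-1}}{\Delta y} + O(\Delta y).$$
2. If $v^*_{i,j} < 0$, then use a forward difference according to
$$v\frac{\partial u}{\partial y} = v^*_{i,j}\frac{u_{i,j+1}-u_{i,j}}{\Delta y} + O(\Delta y).$$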
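The upwind-downwind selection rules can be summarized in a short Python sketch. It is illustrative only: the function name `upwind_convection` and the list-of-rows storage `u[i][j]` are assumptions, and `u_star`, `v_star` stand for the lagged (Picard) velocities at the point of interest.

```python
def upwind_convection(u, u_star, v_star, i, j, dx, dy):
    """First-order upwind estimate of u* du/dx + v* du/dy at grid point (i, j)."""
    if u_star > 0.0:
        dudx = (u[i][j] - u[i - 1][j]) / dx  # backward difference
    else:
        dudx = (u[i + 1][j] - u[i][j]) / dx  # forward difference
    if v_star > 0.0:
        dudy = (u[i][j] - u[i][j - 1]) / dy  # backward difference
    else:
        dudy = (u[i][j + 1] - u[i][j]) / dy  # forward difference
    return u_star * dudx + v_star * dudy


# Sample linear field u = 2x + 3y, for which du/dx = 2 and du/dy = 3 everywhere.
dx, dy = 0.1, 0.2
u = [[2.0 * i * dx + 3.0 * j * dy for j in range(3)] for i in range(3)]
value = upwind_convection(u, 1.0, -1.0, 1, 1, dx, dy)
```

For a linear field both one-sided differences are exact, so the sample call returns approximately 1(2) + (-1)(3) = -1 regardless of which branch is taken.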
;
x2
x
x2
x u
u
,
u
<
0
i+1,j
i,j
i,j
therefore, multiplying by $(\Delta x)^2$ to keep the coefficients $O(1)$ gives
$$Re\,u\frac{du}{dx} = \frac{d^2u}{dx^2}. \tag{14.61}$$
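Recall from Section 3.2 that, for example, the first-order, backward-difference approximation to the first derivative is
$$\left(\frac{du}{dx}\right)_i = \frac{u_i-u_{i-1}}{\Delta x} + \frac{\Delta x}{2}\left(\frac{d^2u}{dx^2}\right)_i + \cdots,$$
where we have included the truncation error. Substituting into (14.61) gives
$$Re\,u_i\left[\frac{u_i-u_{i-1}}{\Delta x} + \frac{\Delta x}{2}\left(\frac{d^2u}{dx^2}\right)_i + \cdots\right] = \left(\frac{d^2u}{dx^2}\right)_i,$$
or
$$Re\,u_i\frac{u_i-u_{i-1}}{\Delta x} = \left(1 - \frac{Re}{2}\Delta x\,u_i\right)\left(\frac{d^2u}{dx^2}\right)_i.$$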
Therefore, depending upon the values of $Re$, $\Delta x$, and $u$, the truncation error from the first-derivative terms, which is not included in the numerical solution, may be of the same order as, or even larger than, the physical diffusion term. This is often referred to as artificial, or numerical, diffusion, the effects of which increase with increasing Reynolds number. There are two potential remedies to the first-order approximations inherent in the upwind-downwind differencing approach as usually implemented:
1. Second-order, that is, $O(\Delta x^2, \Delta y^2)$, accuracy can be restored using deferred correction, in which we use the approximate solution for $u$ to evaluate the leading term of the truncation error, which is then added to the original discretized equation as a source term.
2. Alternatively, we could use second-order accurate forward and backward differences, but the resulting system of equations would no longer be tridiagonal. In other words, it is no longer compact.
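To get a feel for the size of the artificial diffusion, the factor $1 - \frac{Re}{2}\Delta x\,u_i$ multiplying the diffusion term can be evaluated for a few sample values. The sketch below is only an illustration; the function name and the sample values of $Re$, $\Delta x$, and $u$ are assumptions chosen for the example.

```python
def effective_diffusion_factor(re, dx, u):
    """Factor multiplying d2u/dx2 once the leading truncation error of the
    first-order backward difference is absorbed: 1 - (Re/2) * dx * u."""
    return 1.0 - 0.5 * re * dx * u


# Same mesh and velocity, increasing Reynolds number.
factors = {re: effective_diffusion_factor(re, dx=0.01, u=1.0)
           for re in (10.0, 100.0, 1000.0)}
```

With these sample values the factor falls from roughly 0.95 at $Re = 10$ to roughly 0.5 at $Re = 100$ and becomes negative at $Re = 1000$, which illustrates how the truncation error can rival or exceed the physical diffusion term as $Re$ grows on a fixed mesh.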
(14.65)
We then replace equation (14.62) with equation (14.65)/$(2\Delta x)$ and iterate until convergence as usual.
Remarks:
1. Newton's method exhibits a quadratic convergence rate if a good initial guess is used, that is, if $\Delta u$ is small.
2. It has problems with divergence if $\Delta u$ is too large.
15
Finite-Difference Methods for Parabolic Partial Differential Equations
Whereas the elliptic partial differential equations that were the subject of the previous chapter are boundary-value problems, parabolic partial differential equations are initial-value problems. Parabolic equations have one preferred direction of propagation of the solution, which is usually time. Time-dependent problems are often referred to as unsteady or transient. The canonical parabolic partial differential equation is the one-dimensional, unsteady diffusion equation
$$\frac{\partial\phi}{\partial t} = \alpha\frac{\partial^2\phi}{\partial x^2}, \tag{15.1}$$
where $\alpha$ is the diffusivity of the material through which diffusion is occurring. Because parabolic problems are initial-value problems, we need to develop numerical methods that march in the preferred direction (time) in a step-by-step manner. Unlike elliptic problems, for which the approximate numerical solution must be stored throughout the entire domain, in parabolic problems it is only necessary to store in memory the approximate solution at the current and previous time steps. Solutions obtained at earlier time steps can be saved to permanent storage, but do not need to be retained in memory.
Consider the general linear, one-dimensional, unsteady equation
$$\frac{\partial\phi}{\partial t} = a(x,t)\frac{\partial^2\phi}{\partial x^2} + b(x,t)\frac{\partial\phi}{\partial x} + c(x,t)\phi + d(x,t), \tag{15.2}$$
which is parabolic forward in time for $a(x,t) > 0$. For the one-dimensional, unsteady diffusion equation (15.1), $a = \alpha$ and $b = c = d = 0$. In various fields, the dependent variable $\phi(x,t)$ represents different quantities, for example, temperature in heat conduction, velocity in momentum diffusion, and concentration in mass diffusion. Techniques developed for the canonical equation (15.1) can be used to solve the more general equation (15.2).
There are two basic techniques for numerically solving parabolic problems:
1. Method of lines: Discretize the spatial derivatives to reduce the partial differential equation to a set of ordinary differential equations in time and solve using predictor-corrector, Runge-Kutta, or similar methods.
2. Marching methods: Discretize in both space and time.
i) Explicit methods obtain a single equation for $\phi(x,t)$ at each mesh point.
ii) Implicit methods obtain a set of algebraic equations for $\phi(x,t)$ at all mesh points for each time step.
Explicit methods are designed to have a single unknown when the difference equation is applied at each grid point, which allows for the solution to be updated explicitly at each point in terms of the approximate solution at surrounding points (cf. Gauss-Seidel). Implicit methods, on the other hand, result in multiple unknowns when applied at each grid point, which requires solution of a system of algebraic equations to obtain the solution at each time step (cf. ADI).
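$$\frac{\partial\phi}{\partial t} = \frac{\phi_i^{n+1}-\phi_i^n}{\Delta t} + O(\Delta t),$$
and the second-order accurate central difference for the spatial derivatives at the previous, that is, $n$th, time level, at which the approximate solution is known
$$\frac{\partial^2\phi}{\partial x^2} = \frac{\phi_{i+1}^n - 2\phi_i^n + \phi_{i-1}^n}{\Delta x^2} + O(\Delta x^2).$$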
(15.3)
where $s = \alpha\Delta t/\Delta x^2$.
Remarks:
1. Equation (15.3) is an explicit equation for $\phi_i^{n+1}$ at the $(n+1)$st time step in terms of $\phi_i^n$ at the $n$th time step.
2. It requires one sweep for $i = 1, \ldots, I+1$ at each time step $n+1$.
3. The method is second-order accurate in space and first-order accurate in time.
4. The time steps $\Delta t$ may be varied from step to step.
5. As will be shown in Section 15.2, there are restrictions on $\Delta t$ and $\Delta x$ for the first-order explicit method applied to the one-dimensional, unsteady diffusion equation to remain stable. We say that the method is conditionally stable, because for the numerical method to remain stable requires that
$$s = \frac{\alpha\Delta t}{\Delta x^2} \le \frac{1}{2},$$
which is very restrictive. If $s > \frac{1}{2}$, then the method is unstable, and errors in the solution grow to become unbounded as time proceeds.
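The conditional stability in remark 5 is easy to observe numerically. The sketch below assumes the standard first-order explicit update $\phi_i^{n+1} = s\phi_{i+1}^n + (1-2s)\phi_i^n + s\phi_{i-1}^n$ with fixed zero boundary values, which is consistent with the gain derived later in this chapter; the function names, the 21-point grid, and the small sawtooth perturbation added to the initial condition are all assumptions made for illustration.

```python
import math


def ftcs_step(phi, s):
    """Advance one time step with the first-order explicit update (ends held fixed)."""
    new = list(phi)
    for i in range(1, len(phi) - 1):
        new[i] = s * phi[i + 1] + (1.0 - 2.0 * s) * phi[i] + s * phi[i - 1]
    return new


def max_abs_after(s, steps, n_points=21):
    """Largest |phi| after marching a perturbed half-sine for the given number of steps."""
    dx = 1.0 / (n_points - 1)
    phi = [math.sin(math.pi * i * dx) for i in range(n_points)]
    for i in range(1, n_points - 1):
        phi[i] += 1.0e-6 * (-1) ** i  # small sawtooth error to seed high-frequency modes
    phi[0] = phi[-1] = 0.0
    for _ in range(steps):
        phi = ftcs_step(phi, s)
    return max(abs(p) for p in phi)


stable = max_abs_after(0.4, 200)    # s below the limit
unstable = max_abs_after(0.6, 200)  # s above the limit
```

With $s = 0.4$ the solution simply decays, whereas with $s = 0.6$ the tiny sawtooth error is amplified every step and eventually swamps the solution.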
i = 1, . . . , I + 1.
(15.4)
Remarks:
1. The Richardson method is second-order accurate in both space and time.
2. We must keep $\Delta t$ constant as we move from time step to time step, and a starting method is required as we need $\phi_i$ at the two previous time steps.
3. The method is unconditionally unstable for $s > 0$; therefore, it is not used.
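$$\phi_i^{n+1} = \phi_i^n + \Delta t\left(\frac{\partial\phi}{\partial t}\right)_i^n + \frac{\Delta t^2}{2}\left(\frac{\partial^2\phi}{\partial t^2}\right)_i^n + \cdots.$$
Similarly, consider the Taylor series approximation at $t_{n-1}$ about $t_n$
$$\phi_i^{n-1} = \phi_i^n - \Delta t\left(\frac{\partial\phi}{\partial t}\right)_i^n + \frac{\Delta t^2}{2}\left(\frac{\partial^2\phi}{\partial t^2}\right)_i^n + \cdots.$$
Adding these Taylor series together gives
$$\phi_i^{n+1} + \phi_i^{n-1} = 2\phi_i^n + \Delta t^2\left(\frac{\partial^2\phi}{\partial t^2}\right)_i^n + \cdots,$$
and solving for $\phi_i^n$ leads to
$$\phi_i^n = \frac{1}{2}\left(\phi_i^{n+1} + \phi_i^{n-1}\right) - \frac{1}{2}\Delta t^2\left(\frac{\partial^2\phi}{\partial t^2}\right)_i^n + \cdots. \tag{15.5}$$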
Therefore, averaging between time levels in this manner (with uniform time step $\Delta t$) is $O(\Delta t^2)$ accurate.
Substituting the time average (15.5) into the difference equation (15.4) from Richardson's method gives
$$\phi_i^{n+1} = \phi_i^{n-1} + 2s\left(\phi_{i+1}^n - \phi_i^{n+1} - \phi_i^{n-1} + \phi_{i-1}^n\right),$$
or
$$(1+2s)\phi_i^{n+1} = (1-2s)\phi_i^{n-1} + 2s\left(\phi_{i+1}^n + \phi_{i-1}^n\right), \quad i = 1, \ldots, I+1. \tag{15.6}$$
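$$\frac{\partial\phi}{\partial t} - \alpha\frac{\partial^2\phi}{\partial x^2} = \frac{\phi_i^{n+1}-\phi_i^{n-1}}{2\Delta t} - \frac{\Delta t^2}{6}\left(\frac{\partial^3\phi}{\partial t^3}\right)_i^n - \alpha\frac{\phi_{i+1}^n - 2\phi_i^n + \phi_{i-1}^n}{\Delta x^2} + \frac{\alpha\Delta x^2}{12}\left(\frac{\partial^4\phi}{\partial x^4}\right)_i^n + \cdots.$$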
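Substituting the time-averaging equation (15.5) for $\phi_i^n$ to implement the DuFort-Frankel method leads to
$$\frac{\partial\phi}{\partial t} - \alpha\frac{\partial^2\phi}{\partial x^2} = \frac{\phi_i^{n+1}-\phi_i^{n-1}}{2\Delta t} - \alpha\frac{\phi_{i+1}^n - \phi_i^{n+1} - \phi_i^{n-1} + \phi_{i-1}^n}{\Delta x^2} - \frac{\alpha\Delta t^2}{\Delta x^2}\left(\frac{\partial^2\phi}{\partial t^2}\right)_i^n - \frac{\Delta t^2}{6}\left(\frac{\partial^3\phi}{\partial t^3}\right)_i^n + \frac{\alpha\Delta x^2}{12}\left(\frac{\partial^4\phi}{\partial x^4}\right)_i^n + \cdots.$$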
For consistency, all of the truncation error terms must go to zero as $\Delta t \to 0$ and $\Delta x \to 0$. That is, the difference equation must reduce to the differential equation as $\Delta x, \Delta t \to 0$. The second and third truncation error terms do so; the first term, however, requires that $\Delta t \to 0$ faster than $\Delta x \to 0$, in which case $\Delta t \ll \Delta x$, for consistency. Because this is not the case in general, the DuFort-Frankel method is considered inconsistent.
Remarks:
1. The method is second-order accurate in both space and time.
2. We must keep $\Delta t$ constant, and a starting method is necessary because the finite-difference equation involves the two previous time levels.
3. The method is unconditionally stable for any $s = \alpha\Delta t/\Delta x^2$.
4. The method is inconsistent; therefore, it is not used.
15.2 Numerical Stability Analysis
Whereas for elliptic problems our concern is the iterative convergence rate, determined by the spectral radius of the iteration matrix, in parabolic problems the concern is numerical stability. In particular, the issue is how the numerical time-marching scheme handles the inevitable small errors, for example, round-off errors, that are inherent to all numerical calculations. These errors effectively act as disturbances in the solution, and the question is what happens to these disturbances as the solution progresses in time. If the small errors decay in time, that is, they are damped out, then the numerical solution is stable. If the small errors grow in time, that is, they are amplified, then the numerical solution is unstable.
There are two general techniques commonly used to test numerical methods for stability:
(15.8)
i = 2, . . . , I.
(15.9)
n = 0, 1, 2, . . . .
(15.10)
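Thus, we perform a matrix multiply to advance each time step (cf. the matrix form for iterative methods). The $(I-1)\times(I-1)$ matrix $A$ and the $(I-1)\times 1$ vector $e^n$ are
$$A = \begin{bmatrix} 1-2s & s & 0 & \cdots & 0 & 0 \\ s & 1-2s & s & \cdots & 0 & 0 \\ 0 & s & 1-2s & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1-2s & s \\ 0 & 0 & 0 & \cdots & s & 1-2s \end{bmatrix}, \quad e^n = \begin{bmatrix} e_2^n \\ e_3^n \\ e_4^n \\ \vdots \\ e_{I-1}^n \\ e_I^n \end{bmatrix}.$$
Note that if $\phi$ is specified at the boundaries, then the error is zero there, that is, $e_1^n = e_{I+1}^n = 0$.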
The numerical method is stable if the eigenvalues $\lambda_j$ of the matrix $A$ are such that
$$|\lambda_j| \le 1 \quad \text{for all } \lambda_j, \tag{15.11}$$
that is, the spectral radius is such that $\rho \le 1$, in which case the error will not grow. This is based on the same arguments as for convergence in Section 6.2.4. Because $A$ is tridiagonal with constant elements along each diagonal, the eigenvalues are (see Section 11.4.1)
$$\lambda_j = 1 - (4s)\sin^2\left(\frac{j\pi}{2I}\right), \quad j = 1, \ldots, I-1. \tag{15.12}$$
Note that there are $(I-1)$ eigenvalues of the $(I-1)\times(I-1)$ matrix $A$. For equation (15.11) to hold,
$$-1 \le 1 - (4s)\sin^2\left(\frac{j\pi}{2I}\right) \le 1.$$
The right inequality is true for all $j$ ($s > 0$ and $\sin^2(\cdot) > 0$). The left inequality is true if
$$1 - (4s)\sin^2\left(\frac{j\pi}{2I}\right) \ge -1,$$
$$(4s)\sin^2\left(\frac{j\pi}{2I}\right) \le 2,$$
$$s\,\sin^2\left(\frac{j\pi}{2I}\right) \le \frac{1}{2}.$$
Because $0 \le \sin^2\left(\frac{j\pi}{2I}\right) \le 1$, this is true for all $j$ if
$$s = \frac{\alpha\Delta t}{\Delta x^2} \le \frac{1}{2}.$$
Thus, the first-order explicit method with Dirichlet boundary conditions is stable for $s \le 1/2$.
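The eigenvalue formula (15.12) and the resulting stability limit can be checked numerically. The following sketch is illustrative only; the function names are hypothetical, and the eigenvector $v_k = \sin(jk\pi/I)$, $k = 1, \ldots, I-1$, used in the check is the standard one for a constant-coefficient tridiagonal matrix and is an assumption not stated in the text.

```python
import math


def explicit_eigenvalues(s, I):
    """Eigenvalues of the (I-1) x (I-1) matrix A from equation (15.12)."""
    return [1.0 - 4.0 * s * math.sin(j * math.pi / (2 * I)) ** 2 for j in range(1, I)]


def spectral_radius(s, I):
    return max(abs(lam) for lam in explicit_eigenvalues(s, I))


def apply_A(s, v):
    """Multiply the tridiagonal matrix A (s, 1-2s, s) by the vector v."""
    n = len(v)
    out = []
    for k in range(n):
        left = v[k - 1] if k > 0 else 0.0
        right = v[k + 1] if k < n - 1 else 0.0
        out.append(s * left + (1.0 - 2.0 * s) * v[k] + s * right)
    return out


# Check one eigenpair: A v should equal lambda_j v for v_k = sin(j k pi / I).
I, j, s = 10, 3, 0.3
v = [math.sin(j * k * math.pi / I) for k in range(1, I)]
lam = explicit_eigenvalues(s, I)[j - 1]
residual = max(abs(a - lam * b) for a, b in zip(apply_A(s, v), v))
```

Evaluating `spectral_radius` for values of `s` on either side of 1/2 shows the spectral radius crossing one at the stability limit.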
Remarks:
Whereas here we only needed eigenvalues for a tridiagonal matrix, in general the matrix method requires determination of the eigenvalues of an $(I+1)\times(I+1)$ matrix.
The effects of different boundary conditions are reflected in $A$ and, therefore, the resulting eigenvalues.
Because the eigenvalues depend on the number of grid points $I+1$, stability is influenced by the grid size $\Delta x$.
This is the same method used to obtain convergence properties of iterative methods for elliptic problems. Recall that the spectral radius $\rho(I, J)$ is the modulus of the largest eigenvalue of the iteration matrix, and $\rho < 1$ for an iterative method to converge. In iterative methods, however, we are concerned not only with whether they will converge or not, but also with the rate at which they converge, for example, the Gauss-Seidel method as compared to the Jacobi method. Consequently, we seek to devise algorithms that minimize the spectral radius for maximum convergence rate. For parabolic problems, however, we are only concerned with stability in an absolute sense. There is no such thing as a time-marching method that is more stable than another one; we only care about whether the spectral radius is less than or equal to one, not how much less than one. Because of this, it is often advantageous to use a time-marching (parabolic) scheme for solving steady (elliptic) problems owing to the less restrictive stability criterion. This is sometimes referred to as the pseudo-transient method. If we seek to solve the Laplace equation
$$\nabla^2\phi = 0,$$
for example, it may be more computationally efficient to solve its unsteady counterpart
$$\frac{\partial\phi}{\partial t} = \nabla^2\phi$$
until $\partial\phi/\partial t \to 0$, which corresponds to the steady solution.
15.2.2 von Neumann Method (Fourier Analysis)
If the difference equation is linear, then we can take advantage of superposition
of solutions using Fourier analysis. The error is regarded as a superposition of
Fourier modes, and the linearity allows us to evaluate stability of each mode
individually; if each mode is stable, then the linear superposition of the modes is
also stable.
We expand the error along grid lines at each time level as a Fourier series.
It is then determined if the individual Fourier modes decay or amplify in time.
Expanding the error at t = 0 (n = 0), we have
$$e(x,0) = \sum_{m=1}^{I-1} e_m(x,0) = \sum_{m=1}^{I-1} a_m(0)e^{i\theta_m x}, \tag{15.13}$$
where $0 \le x \le 1$, $a_m(0)$ are the amplitudes of the Fourier modes $\theta_m = m\pi$, and $i = \sqrt{-1}$. At a later time $t$, the error is expanded as follows:
$$e(x,t) = \sum_{m=1}^{I-1} e_m(x,t) = \sum_{m=1}^{I-1} a_m(t)e^{i\theta_m x}. \tag{15.14}$$
We want to determine how $a_m(t)$ behaves with time for each Fourier mode $m$. To do this, define the gain of mode $m$, denoted by $G_m(\Delta x, \Delta t)$, as
$$G_m(\Delta x, \Delta t) = \frac{e_m(x,t)}{e_m(x,t-\Delta t)} = \frac{a_m(t)}{a_m(t-\Delta t)},$$
which is the amplification factor for the $m$th mode during one time step. Hence, the error will not grow, that is, the method is stable, if $|G_m| \le 1$ for all $m$. If it takes $n$ time steps to get to time $t$, then the amplification after $n$ time steps is
$$(G_m)^n = \frac{a_m(t)}{a_m(t-\Delta t)}\,\frac{a_m(t-\Delta t)}{a_m(t-2\Delta t)}\cdots\frac{a_m(\Delta t)}{a_m(0)} = \frac{a_m(t)}{a_m(0)},$$
j = 2, . . . , I.
(15.15)
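This equation is linear; therefore, each mode $m$ must satisfy the equation independently. Thus, substituting equation (15.14) with $a_m(t) = (G_m)^n a_m(0)$ into equation (15.15) gives (canceling $a_m(0)$ in each term)
$$(G_m)^{n+1}e^{i\theta_m x} = (1-2s)(G_m)^n e^{i\theta_m x} + s(G_m)^n\left[e^{i\theta_m(x+\Delta x)} + e^{i\theta_m(x-\Delta x)}\right],$$
where we note that $j \to x$, $j+1 \to x+\Delta x$, and $j-1 \to x-\Delta x$. Dividing by $(G_m)^n e^{i\theta_m x}$, which is common to each term, we have
$$G_m = (1-2s) + s\left(e^{i\theta_m\Delta x} + e^{-i\theta_m\Delta x}\right),$$
$$G_m = 1 - 2s\left[1-\cos(\theta_m\Delta x)\right] = 1 - (4s)\sin^2\left(\frac{\theta_m\Delta x}{2}\right).$$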
for all $\theta_m = m\pi$.
3. von Neumann analysis applies to linear differential equations having constant coefficients, and the boundary conditions are not accounted for (actually, they are assumed to be periodic).
Collecting the unknowns on the left-hand side, we have the difference equation
$$s\phi_{j+1}^{n+1} - (1+2s)\phi_j^{n+1} + s\phi_{j-1}^{n+1} = -\phi_j^n, \quad j = 2, \ldots, I. \tag{15.16}$$
Let us consider a von Neumann stability analysis. The error satisfies the same finite-difference equation (15.16) as the solution, that is,
$$se_{j+1}^{n+1} - (1+2s)e_j^{n+1} + se_{j-1}^{n+1} = -e_j^n. \tag{15.17}$$
$$e(x,t) = \sum_{m=1}^{I-1}(G_m)^n a_m(0)e^{i\theta_m x}, \quad \theta_m = m\pi. \tag{15.18}$$
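Substituting into equation (15.17) for the error gives (canceling $a_m(0)$ in each term)
$$s(G_m)^{n+1}e^{i\theta_m(x+\Delta x)} - (1+2s)(G_m)^{n+1}e^{i\theta_m x} + s(G_m)^{n+1}e^{i\theta_m(x-\Delta x)} = -(G_m)^n e^{i\theta_m x},$$
$$\left[s\left(e^{i\theta_m\Delta x} + e^{-i\theta_m\Delta x}\right) - (1+2s)\right]G_m = -1,$$
$$\left[(2s)\cos(\theta_m\Delta x) - (1+2s)\right]G_m = -1,$$
$$\left\{1 + 2s\left[1-\cos(\theta_m\Delta x)\right]\right\}G_m = 1.$$
Thus, the gain of mode $m$ is
$$G_m = \left[1 + (4s)\sin^2\left(\frac{\theta_m\Delta x}{2}\right)\right]^{-1}.$$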
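The two gain expressions obtained from the von Neumann analysis, $1 - 4s\sin^2(\theta_m\Delta x/2)$ for the first-order explicit method and $[1 + 4s\sin^2(\theta_m\Delta x/2)]^{-1}$ for the first-order implicit method, can be compared directly. This is a minimal sketch; the function names and the choice to sample $\theta_m\Delta x$ uniformly over $(0, \pi]$ are assumptions made for illustration.

```python
import math


def gain_explicit(s, theta_dx):
    """Gain of the first-order explicit method for a mode with phase theta_m * dx."""
    return 1.0 - 4.0 * s * math.sin(0.5 * theta_dx) ** 2


def gain_implicit(s, theta_dx):
    """Gain of the first-order implicit method for the same mode."""
    return 1.0 / (1.0 + 4.0 * s * math.sin(0.5 * theta_dx) ** 2)


def max_gain(gain, s, n_modes=50):
    """Largest |G| over sampled phases theta_m * dx in (0, pi]."""
    return max(abs(gain(s, math.pi * m / n_modes)) for m in range(1, n_modes + 1))


explicit_ok = max_gain(gain_explicit, 0.5)     # at the stability limit
explicit_bad = max_gain(gain_explicit, 0.6)    # beyond the limit
implicit_any = max_gain(gain_implicit, 100.0)  # very large s
```

The implicit gain stays below one in magnitude for any positive $s$, while the explicit gain exceeds one as soon as $s > 1/2$, in agreement with the matrix method.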
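$$\frac{\phi_i^{n+1}-\phi_i^n}{\Delta t} = \frac{\alpha}{2}\left(\frac{\phi_{i+1}^{n+1} - 2\phi_i^{n+1} + \phi_{i-1}^{n+1}}{\Delta x^2} + \frac{\phi_{i+1}^n - 2\phi_i^n + \phi_{i-1}^n}{\Delta x^2}\right).$$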
Later we will show that averaging the second-order derivative terms across time levels in this manner is second-order accurate in time. Writing the difference equation in tridiagonal form, with the unknowns on the left-hand side, we have
$$s\phi_{i+1}^{n+1} - 2(1+s)\phi_i^{n+1} + s\phi_{i-1}^{n+1} = -s\phi_{i+1}^n - 2(1-s)\phi_i^n - s\phi_{i-1}^n, \quad i = 2, \ldots, I, \tag{15.19}$$
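One time step of the tridiagonal system (15.19) can be sketched as follows. This is an illustration under stated assumptions: the boundary values are Dirichlet and do not change in time, the tridiagonal solve uses the Thomas algorithm, and the function names `thomas` and `crank_nicolson_step` are hypothetical. The test problem (a half-sine initial condition with $\alpha = 1$) is also an assumption chosen because its exact solution decays as $e^{-\pi^2 t}$.

```python
import math


def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system; lower[0] and upper[-1] are not used."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for k in range(1, n):
        denom = diag[k] - lower[k] * c[k - 1]
        c[k] = upper[k] / denom
        d[k] = (rhs[k] - lower[k] * d[k - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for k in range(n - 2, -1, -1):
        x[k] = d[k] - c[k] * x[k + 1]
    return x


def crank_nicolson_step(phi, s):
    """Advance phi one step with equation (15.19); end values are held fixed."""
    n = len(phi) - 2  # number of interior unknowns
    rhs = [-s * phi[i + 1] - 2.0 * (1.0 - s) * phi[i] - s * phi[i - 1]
           for i in range(1, n + 1)]
    rhs[0] -= s * phi[0]    # known boundary value moved to the right-hand side
    rhs[-1] -= s * phi[-1]
    interior = thomas([s] * n, [-2.0 * (1.0 + s)] * n, [s] * n, rhs)
    return [phi[0]] + interior + [phi[-1]]


# Half-sine initial condition on 0 <= x <= 1 with dx = 0.05, s = 2 (dt = 0.005 for alpha = 1).
dx, s, steps = 0.05, 2.0, 20
phi = [math.sin(math.pi * i * dx) for i in range(21)]
phi[0] = phi[-1] = 0.0
for _ in range(steps):
    phi = crank_nicolson_step(phi, s)
exact_mid = math.exp(-math.pi ** 2 * steps * s * dx ** 2)
```

Note that $s = 2$ is four times the explicit stability limit, yet the midpoint value `phi[10]` remains close to the exact decay `exact_mid`.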
$$\phi_i^{n+1/2} = \frac{1}{2}\left(\phi_i^{n+1} + \phi_i^n\right) + \text{T.E.} \tag{15.20}$$
We seek an expression of the form $\phi_i^{n+1/2} = \bar{\phi} + \text{T.E.}$, where $\bar{\phi}$ is the exact value of $\phi_i^{n+1/2}$ midway between time levels. Let us expand each term in the expression (15.20) as a Taylor series about $(x_i, t_{n+1/2})$ as follows:
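$$\phi_i^{n+1} = \sum_{k=0}^{\infty}\frac{1}{k!}\left(\frac{\Delta t}{2}D_t\right)^k\bar{\phi},$$
$$\phi_i^n = \sum_{k=0}^{\infty}\frac{1}{k!}\left(-\frac{\Delta t}{2}D_t\right)^k\bar{\phi} = \sum_{k=0}^{\infty}\frac{(-1)^k}{k!}\left(\frac{\Delta t}{2}D_t\right)^k\bar{\phi}.$$
Substituting these expansions into the average gives
$$\frac{1}{2}\left(\phi_i^{n+1} + \phi_i^n\right) = \frac{1}{2}\sum_{k=0}^{\infty}\frac{1}{k!}\left[1 + (-1)^k\right]\left(\frac{\Delta t}{2}D_t\right)^k\bar{\phi}.$$
Noting that
$$1 + (-1)^k = \begin{cases} 0, & k = 1, 3, 5, \ldots \\ 2, & k = 0, 2, 4, \ldots \end{cases}$$
this becomes
$$\frac{1}{2}\left(\phi_i^{n+1} + \phi_i^n\right) = \sum_{m=0}^{\infty}\frac{1}{(2m)!}\left(\frac{\Delta t}{2}D_t\right)^{2m}\bar{\phi}.$$
The $m = 0$ term is $\bar{\phi}$ itself, and the remaining terms are $O(\Delta t^2)$; hence,
$$\phi_i^{n+1/2} = \frac{1}{2}\left(\phi_i^{n+1} + \phi_i^n\right) + O(\Delta t^2).$$
This confirms that averaging across time levels gives an $O(\Delta t^2)$ approximation of $\phi_i$ at the mid-time level $t_{n+1/2}$.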
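The $O(\Delta t^2)$ accuracy of averaging across time levels can be confirmed numerically: halving $\Delta t$ should reduce the error by a factor of about four. The sketch below uses $e^t$ as a stand-in for the time dependence of $\phi$; the function name and the sample step sizes are assumptions made for illustration.

```python
import math


def midpoint_average_error(f, t_mid, dt):
    """Error of approximating f(t_mid) by the average of f at t_mid +/- dt/2."""
    average = 0.5 * (f(t_mid + 0.5 * dt) + f(t_mid - 0.5 * dt))
    return abs(average - f(t_mid))


error_coarse = midpoint_average_error(math.exp, 0.0, 0.1)
error_fine = midpoint_average_error(math.exp, 0.0, 0.05)
ratio = error_coarse / error_fine  # close to 4 for a second-order approximation
```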
15.4 Nonlinear Convective Problems
As with elliptic equations, which correspond to steady problems, it is essential in fluid dynamics, heat transfer, and mass transfer to be able to handle nonlinear convective terms in unsteady, that is, parabolic, contexts. Consider the one-dimensional, unsteady Burgers equation, which is a one-dimensional, unsteady diffusion equation with a convection term, given by
$$\frac{\partial u}{\partial t} = \nu\frac{\partial^2 u}{\partial x^2} - u\frac{\partial u}{\partial x}, \tag{15.21}$$
where $\nu$ is the viscosity. Let us consider how the nonlinear convection term $u\,\partial u/\partial x$ is treated in the various schemes.
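$$\frac{u_i^{n+1}-u_i^n}{\Delta t} = \nu\frac{u_{i+1}^n - 2u_i^n + u_{i-1}^n}{\Delta x^2} - u_i^n\frac{u_{i+1}^n - u_{i-1}^n}{2\Delta x} + O(\Delta x^2, \Delta t),$$
where the $u_i^n$ in the convection term is known from the previous time level. Writing in explicit form leads to
$$u_i^{n+1} = \left(s - \tfrac{1}{2}C_i^n\right)u_{i+1}^n + (1-2s)u_i^n + \left(s + \tfrac{1}{2}C_i^n\right)u_{i-1}^n,$$
where $s = \nu\Delta t/\Delta x^2$ and $C_i^n = u_i^n\Delta t/\Delta x$.
Remark: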
1. For stability, this method requires that $2 \le Re_{\Delta x} \le 2/C_i$, where $Re_{\Delta x} = u_i^n\Delta x/\nu$ is the mesh Reynolds number. This is very restrictive.
15.4.2 Crank-Nicolson Method
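In the Crank-Nicolson method, all spatial derivatives are approximated at the mid-time level as shown in Figure 15.6. Therefore, the one-dimensional, unsteady Burgers equation becomes
$$\begin{aligned}\frac{u_i^{n+1}-u_i^n}{\Delta t} &= \frac{\nu}{2}\left(\frac{\partial^2 u^{n+1}}{\partial x^2} + \frac{\partial^2 u^n}{\partial x^2}\right) - \frac{1}{2}u_i^{n+1/2}\left(\frac{\partial u^{n+1}}{\partial x} + \frac{\partial u^n}{\partial x}\right) \\ &= \frac{\nu}{2\Delta x^2}\left(u_{i+1}^{n+1} - 2u_i^{n+1} + u_{i-1}^{n+1} + u_{i+1}^n - 2u_i^n + u_{i-1}^n\right) \\ &\quad - \frac{u_i^{n+1/2}}{4\Delta x}\left(u_{i+1}^{n+1} - u_{i-1}^{n+1} + u_{i+1}^n - u_{i-1}^n\right) + O(\Delta x^2, \Delta t^2),\end{aligned} \tag{15.22}$$
where we average across time levels to obtain the velocity according to
$$u_i^{n+1/2} = \frac{1}{2}\left(u_i^{n+1} + u_i^n\right) + O(\Delta t^2). \tag{15.23}$$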
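This results in the implicit finite-difference equation
$$-\left(s - \tfrac{1}{2}C_i^{n+1/2}\right)u_{i+1}^{n+1} + 2(1+s)u_i^{n+1} - \left(s + \tfrac{1}{2}C_i^{n+1/2}\right)u_{i-1}^{n+1} = \left(s - \tfrac{1}{2}C_i^{n+1/2}\right)u_{i+1}^n + 2(1-s)u_i^n + \left(s + \tfrac{1}{2}C_i^{n+1/2}\right)u_{i-1}^n, \tag{15.24}$$
where here $C_i^{n+1/2} = u_i^{n+1/2}\Delta t/\Delta x$, but we do not know $u_i^{n+1/2}$ yet. Therefore, this procedure requires iteration at each time step: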
1. Begin with $u_i^k = u_i^n$ ($k = 0$), that is, use the $u_i$ from the previous time step as an initial guess at the current time step.
2. Increment $k = k + 1$. Compute an update for $u_i^k = u_i^{n+1}$, $i = 1, \ldots, I+1$, using equation (15.24).
3. Update $u_i^{n+1/2} = \frac{1}{2}\left(u_i^{n+1} + u_i^n\right)$.
un+1/2
.
x
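$$\frac{\partial u}{\partial x} = \frac{1}{2}\left(\frac{u_i^{n+1}-u_{i-1}^{n+1}}{\Delta x} + \frac{u_{i+1}^n-u_i^n}{\Delta x}\right). \tag{15.25}$$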
i1/2
is given by
$$\frac{\partial u}{\partial x} = \frac{1}{2}\left(\frac{u_{i+1}^{n+1}-u_i^{n+1}}{\Delta x} + \frac{u_i^n-u_{i-1}^n}{\Delta x}\right). \tag{15.26}$$
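$$-s\,u_{i+1}^{n+1} + 2(1+s)u_i^{n+1} - s\,u_{i-1}^{n+1} + C_i^{n+1/2}\begin{Bmatrix} u_i^{n+1}-u_{i-1}^{n+1} \\ u_{i+1}^{n+1}-u_i^{n+1} \end{Bmatrix} = s\,u_{i+1}^n + 2(1-s)u_i^n + s\,u_{i-1}^n - C_i^{n+1/2}\begin{Bmatrix} u_{i+1}^n-u_i^n \\ u_i^n-u_{i-1}^n \end{Bmatrix}, \quad \begin{matrix} u_i^{n+1/2} > 0 \\ u_i^{n+1/2} < 0 \end{matrix}. \tag{15.27}$$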
Remarks:
1. Equation (15.27) is diagonally dominant for the onedimensional, unsteady
n+1/2
n+1/2
may be positive or
2. Iteration at each time step is required owing to the nonlinear term $u\,\partial u/\partial x$, which may require under-relaxation on $u_i$; therefore,
$$u_i^{k+1} = (1-\omega)u_i^k + \omega u_i^{k+1}, \quad k = 0, 1, 2, \ldots,$$
(15.28)
We would like to determine the order of accuracy, that is, the truncation error T.E., of this approximation. Here, $D_t = \partial/\partial t$, $D_x = \partial/\partial x$, and $\bar{u}$ is the exact value of $u_i^{n+1/2}$ midway between time levels as illustrated in Figure 15.9.
We seek an expression of the form
$$\frac{\partial u}{\partial x} = \frac{\partial\bar{u}}{\partial x} + \text{T.E.}$$
Expanding each term in equation (15.28) as a two-dimensional Taylor series about
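$$u_{i+1}^{n+1} = \sum_{k=0}^{\infty}\frac{1}{k!}\left(\frac{\Delta t}{2}D_t + \Delta x D_x\right)^k\bar{u},$$
$$u_i^{n+1} = \sum_{k=0}^{\infty}\frac{1}{k!}\left(\frac{\Delta t}{2}D_t\right)^k\bar{u},$$
$$u_i^n = \sum_{k=0}^{\infty}\frac{1}{k!}\left(-\frac{\Delta t}{2}D_t\right)^k\bar{u} = \sum_{k=0}^{\infty}\frac{(-1)^k}{k!}\left(\frac{\Delta t}{2}D_t\right)^k\bar{u},$$
$$u_{i-1}^n = \sum_{k=0}^{\infty}\frac{1}{k!}\left(-\frac{\Delta t}{2}D_t - \Delta x D_x\right)^k\bar{u} = \sum_{k=0}^{\infty}\frac{(-1)^k}{k!}\left(\frac{\Delta t}{2}D_t + \Delta x D_x\right)^k\bar{u}.$$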
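$$\frac{\partial u}{\partial x} = \frac{1}{2\Delta x}\sum_{k=0}^{\infty}\frac{1}{k!}\left\{\left[1-(-1)^k\right]\left(\frac{\Delta t}{2}D_t + \Delta x D_x\right)^k + \left[-1+(-1)^k\right]\left(\frac{\Delta t}{2}D_t\right)^k\right\}\bar{u},$$
$$\frac{\partial u}{\partial x} = \frac{1}{2\Delta x}\sum_{k=0}^{\infty}\frac{1+(-1)^{k+1}}{k!}\left[\left(\frac{\Delta t}{2}D_t + \Delta x D_x\right)^k - \left(\frac{\Delta t}{2}D_t\right)^k\right]\bar{u}.$$
Note that
$$1 + (-1)^{k+1} = \begin{cases} 0, & k = 0, 2, 4, \ldots \\ 2, & k = 1, 3, 5, \ldots \end{cases}$$
so that, with $k = 2l+1$,
$$\frac{\partial u}{\partial x} = \frac{1}{\Delta x}\sum_{l=0}^{\infty}\frac{1}{(2l+1)!}\left[\left(\frac{\Delta t}{2}D_t + \Delta x D_x\right)^{2l+1} - \left(\frac{\Delta t}{2}D_t\right)^{2l+1}\right]\bar{u}. \tag{15.29}$$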
In order to treat the term that arises from the two-dimensional Taylor series, recall the binomial theorem
$$(a+b)^k = \sum_{m=0}^{k}\binom{k}{m}a^{k-m}b^m,$$
where
$$\binom{k}{m} = \frac{k!}{m!\,(k-m)!} \quad (0! = 1).$$
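$$\begin{aligned}\frac{\partial u}{\partial x} &= \frac{1}{\Delta x}\sum_{l=0}^{\infty}\frac{1}{(2l+1)!}\left[\sum_{m=0}^{2l+1}\binom{2l+1}{m}\left(\frac{\Delta t}{2}D_t\right)^{2l+1-m}(\Delta x D_x)^m - \left(\frac{\Delta t}{2}D_t\right)^{2l+1}\right]\bar{u}, \\ &= \frac{1}{\Delta x}\sum_{l=0}^{\infty}\frac{1}{(2l+1)!}\left[\sum_{m=1}^{2l+1}\binom{2l+1}{m}\left(\frac{\Delta t}{2}D_t\right)^{2l-m+1}(\Delta x D_x)^m + \binom{2l+1}{0}\left(\frac{\Delta t}{2}D_t\right)^{2l+1} - \left(\frac{\Delta t}{2}D_t\right)^{2l+1}\right]\bar{u}, \quad \binom{2l+1}{0} = 1, \\ &= \sum_{l=0}^{\infty}\frac{1}{(2l+1)!}\sum_{m=1}^{2l+1}\frac{(2l+1)!}{m!\,(2l-m+1)!}\left(\frac{\Delta t}{2}D_t\right)^{2l-m+1}(\Delta x)^{m-1}(D_x)^m\bar{u},\end{aligned}$$
$$\frac{\partial u}{\partial x} = \frac{\partial\bar{u}}{\partial x} + \sum_{l=1}^{\infty}\sum_{m=1}^{2l+1}\frac{1}{m!\,(2l-m+1)!}\left(\frac{\Delta t}{2}D_t\right)^{2l-m+1}(\Delta x)^{m-1}(D_x)^m\bar{u},$$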
where the $\partial\bar{u}/\partial x$ term results from taking $l = 0$, $m = 1$ in the double summation. To obtain the truncation error, consider the $l = 1$ term, for which $m = 1, 2, 3$ produces
$$\frac{1}{1!\,2!}\left(\frac{\Delta t}{2}\right)^2 D_t^2 D_x\bar{u} + \frac{1}{2!\,1!}\left(\frac{\Delta t}{2}\right)\Delta x\,D_t D_x^2\bar{u} + \frac{1}{3!\,0!}\Delta x^2 D_x^3\bar{u}.$$
Therefore, the truncation error is
$$O(\Delta t^2, \Delta t\Delta x, \Delta x^2).$$
Consequently, if $\Delta t < \Delta x$, then the approximation is $O(\Delta x^2)$ accurate, and if $\Delta t > \Delta x$, then the approximation is $O(\Delta t^2)$ accurate. This is better than $O(\Delta x)$ or $O(\Delta t)$, but strictly speaking it is not $O(\Delta t^2, \Delta x^2)$. Note that the $O(\Delta t\Delta x)$ term arises owing to the diagonal averaging across time levels with the upwind-downwind differencing.
Remark:
1. The method is unconditionally stable.
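$$\frac{\partial\phi}{\partial t} = \alpha\left(\frac{\partial^2\phi}{\partial x^2} + \frac{\partial^2\phi}{\partial y^2}\right), \quad \phi = \phi(x, y, t), \tag{15.30}$$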
at
t = 0,
(15.31)
(15.32)
where $s_x = \alpha\Delta t/\Delta x^2$ and $s_y = \alpha\Delta t/\Delta y^2$. For numerical stability, a von Neumann stability analysis requires that
$$s_x + s_y \le \frac{1}{2}.$$
Thus, for example, if $\Delta x = \Delta y$, in which case $s_x = s_y = s$, we must have
$$s \le \frac{1}{4},$$
which is even more restrictive than for the one-dimensional, unsteady diffusion equation, which requires $s \le \frac{1}{2}$ for stability. The three-dimensional case becomes even more restrictive, with $s \le \frac{1}{6}$ for numerical stability.
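The stability condition $s_x + s_y \le 1/2$ translates directly into a largest admissible time step for a given grid. A minimal sketch follows; the function name and sample values are assumptions made for illustration.

```python
def max_stable_dt(alpha, dx, dy):
    """Largest dt satisfying s_x + s_y <= 1/2 for the two-dimensional explicit method."""
    return 0.5 / (alpha * (1.0 / dx ** 2 + 1.0 / dy ** 2))


dt_square = max_stable_dt(alpha=1.0, dx=0.1, dy=0.1)     # dx = dy, so s = 1/4
dt_stretched = max_stable_dt(alpha=1.0, dx=0.1, dy=0.01)  # limit set by the finer spacing
```

On a stretched grid the admissible time step is governed almost entirely by the smaller of the two grid spacings.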
(15.33)
Observe that this contains five unknowns and produces a banded matrix as illustrated in Figure 15.11, rather than a tridiagonal matrix as in the one-dimensional case.
Remarks:
1. For the two-dimensional, unsteady diffusion equation, the first-order implicit method is unconditionally stable for all $s_x$ and $s_y$.
2. The usual Crank-Nicolson method could be used to obtain second-order accuracy in time. It produces an implicit equation similar to that for the one-dimensional case, but with more terms on the right-hand side evaluated at the previous time step.
where a central difference is used for the time derivative, and averaging across time levels is used for the spatial derivatives. Consequently, putting the unknowns on the left-hand side and the knowns on the right-hand side yields
$$\tfrac{1}{2}s_x\phi_{i+1,j}^{n+1/2} - (1+s_x)\phi_{i,j}^{n+1/2} + \tfrac{1}{2}s_x\phi_{i-1,j}^{n+1/2} = -\tfrac{1}{2}s_y\phi_{i,j+1}^n - (1-s_y)\phi_{i,j}^n - \tfrac{1}{2}s_y\phi_{i,j-1}^n. \tag{15.35}$$
Taking $i = 1, \ldots, I+1$ leads to the tridiagonal problems (15.35) to be solved for $\phi_{i,j}^{n+1/2}$ at each $j = 1, \ldots, J+1$, at the intermediate time level.
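We then sweep along lines of constant $x$ during the second half time step using the difference equation in the form
$$\frac{\phi_{i,j}^{n+1}-\phi_{i,j}^{n+1/2}}{\Delta t/2} = \alpha\left(\frac{\phi_{i+1,j}^{n+1/2} - 2\phi_{i,j}^{n+1/2} + \phi_{i-1,j}^{n+1/2}}{\Delta x^2} + \frac{\phi_{i,j+1}^{n+1} - 2\phi_{i,j}^{n+1} + \phi_{i,j-1}^{n+1}}{\Delta y^2}\right), \tag{15.36}$$
which becomes
$$\tfrac{1}{2}s_y\phi_{i,j+1}^{n+1} - (1+s_y)\phi_{i,j}^{n+1} + \tfrac{1}{2}s_y\phi_{i,j-1}^{n+1} = -\tfrac{1}{2}s_x\phi_{i+1,j}^{n+1/2} - (1-s_x)\phi_{i,j}^{n+1/2} - \tfrac{1}{2}s_x\phi_{i-1,j}^{n+1/2}. \tag{15.37}$$
Taking $j = 1, \ldots, J+1$ leads to the tridiagonal problems (15.37) to be solved for $\phi_{i,j}^{n+1}$ at each $i = 1, \ldots, I+1$, at the current time level.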
Note that this approach requires boundary conditions at the intermediate time level $n + 1/2$ for equation (15.35). This is straightforward if the boundary condition does not change with time; however, a bit of care is required if this is not the case. For example, if the boundary condition at $x = 0$ is Dirichlet, but changing with time, as follows
$$\phi(0, y, t) = a(y, t),$$
then
$$\phi_{1,j}^n = a_j^n.$$
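Subtracting equation (15.36) from (15.34) gives
$$\frac{\phi_{i,j}^{n+1/2}-\phi_{i,j}^n}{\Delta t/2} - \frac{\phi_{i,j}^{n+1}-\phi_{i,j}^{n+1/2}}{\Delta t/2} = \alpha\left(\frac{\phi_{i,j+1}^n - 2\phi_{i,j}^n + \phi_{i,j-1}^n}{\Delta y^2} - \frac{\phi_{i,j+1}^{n+1} - 2\phi_{i,j}^{n+1} + \phi_{i,j-1}^{n+1}}{\Delta y^2}\right),$$
and solving for the unknown at the intermediate time level results in
$$\phi_{i,j}^{n+1/2} = \frac{1}{2}\left(\phi_{i,j}^n + \phi_{i,j}^{n+1}\right) + \frac{1}{4}s_y\left[\left(\phi_{i,j+1}^n - 2\phi_{i,j}^n + \phi_{i,j-1}^n\right) - \left(\phi_{i,j+1}^{n+1} - 2\phi_{i,j}^{n+1} + \phi_{i,j-1}^{n+1}\right)\right].$$
Applying this equation at the boundary $x = 0$ leads to
$$\phi_{1,j}^{n+1/2} = \frac{1}{2}\left(a_j^n + a_j^{n+1}\right) + \frac{1}{4}s_y\left[\left(a_{j+1}^n - 2a_j^n + a_{j-1}^n\right) - \left(a_{j+1}^{n+1} - 2a_j^{n+1} + a_{j-1}^{n+1}\right)\right].$$
This provides the boundary condition for $\phi_{1,j}$ at the intermediate $(n + 1/2)$ time level. Note that the first term on the right-hand side is the average of $a$ at the $n$ and $n+1$ time levels, the second term is proportional to $\partial^2 a^n/\partial y^2$, and the third term is proportional to $\partial^2 a^{n+1}/\partial y^2$. Thus, if the boundary condition $a(t)$ does not depend on $y$, then $\phi_{1,j}^{n+1/2}$ is simply the average of $a^n$ and $a^{n+1}$.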
Remarks:
1. The ADI method with time splitting is $O(\Delta t^2, \Delta x^2, \Delta y^2)$ accurate.
2. For stability, it is necessary to apply the von Neumann analysis at each half step and take the product of the resulting amplification factors, $G_1$ and $G_2$, to obtain $G$ for the full time step. Such an analysis shows that the method is unconditionally stable for all $s_x$ and $s_y$ for the two-dimensional, unsteady diffusion equation.
3. In three dimensions, we require three fractional steps ($\Delta t/3$) for each time step, and the method is only conditionally stable, where
$$s_x, s_y, s_z \le \frac{3}{2},$$
for stability ($s_z = \alpha\Delta t/\Delta z^2$).
15.5.4 Factored ADI Method
We can improve on the ADI method with time splitting using the factored ADI method. It provides a minor reduction in computational cost as well as improving on the stability properties for three-dimensional cases. In addition, it can be extended naturally to the nonlinear convection case. Let us once again reconsider the two-dimensional, unsteady diffusion equation
$$\frac{\partial\phi}{\partial t} = \alpha\left(\frac{\partial^2\phi}{\partial x^2} + \frac{\partial^2\phi}{\partial y^2}\right), \tag{15.38}$$
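and apply the Crank-Nicolson approximation
$$\frac{\phi_{i,j}^{n+1}-\phi_{i,j}^n}{\Delta t} = \frac{\alpha}{2}\left(\delta_x^2\phi_{i,j}^{n+1} + \delta_x^2\phi_{i,j}^n + \delta_y^2\phi_{i,j}^{n+1} + \delta_y^2\phi_{i,j}^n\right),$$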
y2 i,j =
(15.39)
(15.40)
where the first factor only involves the difference operator in the $x$-direction, and the second factor only involves the difference operator in the $y$-direction. Observe that the factored operator produces an extra term as compared to the unfactored operator,
$$\frac{1}{4}\alpha^2\Delta t^2\delta_x^2\delta_y^2 = O(\Delta t^2).$$
Therefore, the factorization (15.40) is consistent with the second-order accuracy in time of the Crank-Nicolson approximation.
The factored form of equation (15.39) is
$$\left(1 - \tfrac{1}{2}\alpha\Delta t\,\delta_x^2\right)\left(1 - \tfrac{1}{2}\alpha\Delta t\,\delta_y^2\right)\phi_{i,j}^{n+1} = \left[1 + \tfrac{1}{2}\alpha\Delta t\left(\delta_x^2 + \delta_y^2\right)\right]\phi_{i,j}^n,$$
which can be solved in two steps by defining the intermediate variable
$$\hat{\phi}_{i,j} = \left(1 - \tfrac{1}{2}\alpha\Delta t\,\delta_y^2\right)\phi_{i,j}^{n+1}. \tag{15.41}$$
Note that $\hat{\phi}_{i,j}$ is not the same as $\phi_{i,j}^{n+1/2}$, which is an intermediate approximation to $\phi_{i,j}$ at the half time step in ADI with time splitting. The two-stage solution process is:
(15.42)
$$\left(1 - \tfrac{1}{2}\alpha\Delta t\,\delta_y^2\right)\phi_{i,j}^{n+1} = \hat{\phi}_{i,j}, \tag{15.43}$$
which produces a tridiagonal problem at each $i$ for $\phi_{i,j}^{n+1}$, $j = 1, \ldots, J+1$, at the current time step. Note that the right-hand side of this equation is the intermediate variable solved for in equation (15.42).
Remarks:
1. The factored ADI method is similar to the ADI method with time splitting, but we have an intermediate variable $\hat{\phi}_{i,j}$ rather than the half-time-step value $\phi_{i,j}^{n+1/2}$. The factored ADI method is somewhat faster, as it only requires one evaluation of the spatial derivatives on the right-hand side per time step [for equation (15.42)] rather than two for the ADI method with time splitting [see equations (15.35) and (15.37)].
16
Finite-Difference Methods for Hyperbolic Partial Differential Equations
Like parabolic equations, hyperbolic equations represent initial-value problems. Therefore, methods for their numerical solution bear a strong resemblance. However, there are several unique issues that arise in the solution of hyperbolic problems that must be adequately accounted for in obtaining their numerical solution.
17
Additional Topics in Numerical Methods
equations, an outer loop is added to account for the time marching as shown in Figure 17.2. The inner loop is performed to obtain the solution of the coupled equations at the current time step. Generally, there is a trade-off between the time step and the number of iterations required at each time step. Specifically, reducing the time step $\Delta t$ reduces the number of inner loop iterations required for convergence. In practice, the time step should be small enough such that no more than ten to twenty iterations are required at each time step.
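$$\frac{\partial\phi}{\partial x} = \frac{\phi_{i+1}-\phi_{i-1}}{\Delta x_{i-1}+\Delta x_i} - \frac{\Delta x_i^2-\Delta x_{i-1}^2}{2(\Delta x_{i-1}+\Delta x_i)}\left(\frac{\partial^2\phi}{\partial x^2}\right)_i - \frac{\Delta x_i^3+\Delta x_{i-1}^3}{6(\Delta x_{i-1}+\Delta x_i)}\left(\frac{\partial^3\phi}{\partial x^3}\right)_i + \cdots.$$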
If the grid is uniform, that is, $\Delta x_{i-1} = \Delta x_i$, then the second term vanishes, and the approximation reduces to the usual $O(\Delta x^2)$-accurate central difference approximation for the first derivative. However, for a nonuniform grid, the truncation error is only $O(\Delta x)$. We could restore second-order accuracy by using an appropriate approximation to $(\partial^2\phi/\partial x^2)_i$ in the second term, which results in
$$\frac{\partial\phi}{\partial x} = \frac{\phi_{i+1}\Delta x_{i-1}^2 - \phi_{i-1}\Delta x_i^2 + \phi_i\left(\Delta x_i^2 - \Delta x_{i-1}^2\right)}{\Delta x_i\,\Delta x_{i-1}\left(\Delta x_{i-1}+\Delta x_i\right)} + O(\Delta x^2).$$
As one can imagine, this gets very complicated, and it is difficult to ensure consistent accuracy for all approximations.
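The difference between the two nonuniform-grid approximations is easy to see on a quadratic, for which the second-order formula should be exact. The sketch below assumes the convention $\Delta x_{i-1} = x_i - x_{i-1}$ and $\Delta x_i = x_{i+1} - x_i$; the function names and sample values are hypothetical.

```python
def ddx_simple(phi_m, phi_p, dx_m, dx_p):
    """Central difference over the two unequal intervals (first-order on a nonuniform grid)."""
    return (phi_p - phi_m) / (dx_m + dx_p)


def ddx_second_order(phi_m, phi_0, phi_p, dx_m, dx_p):
    """Second-order accurate first derivative on a nonuniform grid.

    dx_m = x_i - x_{i-1} and dx_p = x_{i+1} - x_i (assumed convention).
    """
    numerator = phi_p * dx_m ** 2 - phi_m * dx_p ** 2 + phi_0 * (dx_p ** 2 - dx_m ** 2)
    return numerator / (dx_p * dx_m * (dx_m + dx_p))


# phi = x^2 at x_i = 1 with unequal spacings; the exact derivative is 2.
x_i, dx_m, dx_p = 1.0, 0.1, 0.2
phi_m, phi_0, phi_p = (x_i - dx_m) ** 2, x_i ** 2, (x_i + dx_p) ** 2
simple = ddx_simple(phi_m, phi_p, dx_m, dx_p)
improved = ddx_second_order(phi_m, phi_0, phi_p, dx_m, dx_p)
```

For this example the simple formula is off by about $(\Delta x_i - \Delta x_{i-1})/2$ times the second derivative, that is, about 0.1, whereas the second-order formula recovers the derivative to round-off.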
An alternative approach for concentrating grid points in certain regions of a
domain employs grid transformations that map the physical domain to a computational domain that may have a simple overall shape and/or cluster grid points
in regions of the physical domain where the solution varies rapidly, that is, where
large gradients occur. The mapping is such that a uniform grid in the computa
See Variational Methods with Applications in Science and Engineering, Chapter 12, for more on
algebraic, elliptic, and variational grid generation.
eigenfunction expansions (see Chapter 3). Rather than approximating the differential equation itself, the unknown solution is approximated directly in the form of a truncated series expansion. The primary virtue of spectral methods is that they converge very rapidly toward the exact solution as the number of terms in the expansions is increased. At its best, this convergence is exponential, which is referred to as spectral convergence.
Along with finite-element methods, spectral methods utilize the method of weighted residuals. We begin by approximating the unknown solution to a differential equation $Lu = f$ using the expansion
$$\bar{u}(x, y, z, t) = \phi_0(x, y, z) + \sum_{n=1}^{N}c_n(t)\phi_n(x, y, z), \tag{17.1}$$
where $\bar{u}(x, y, z, t)$ is the approximate solution to the differential equation; $\phi_n(x, y, z)$ are the spatial basis, or trial, functions; $c_n(t)$ are the time-dependent coefficients; and $\phi_0(x, y, z)$ is chosen to satisfy the boundary conditions, such that $\phi_n = 0$, $n = 1, \ldots, N$, at the boundaries. Similarly, we expand the forcing function $f(x, y, z)$ in terms of the same basis functions.
Consider the exact solution u(x, y, z, t) of the differential equation
Lu = f.
(17.2)
Least squares:
$$w_i(x) = \frac{\partial r}{\partial c_i},$$
which results in $\int r^2\,dx$ being a minimum.
Galerkin:
$$w_i(x) = \phi_i(x);$$
that is, the weight (test) functions are the same as the basis (trial) functions.
In spectral methods, the trial functions are chosen such that they are mutually
orthogonal for reasons that will become apparent in the following example.
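As a one-dimensional example, let us consider the ordinary differential equation
$$Lu = \frac{d^2u}{dx^2} + u = 0, \quad 0 \le x \le 1, \tag{17.5}$$
with the boundary conditions
$$u = 0 \ \text{at}\ x = 0, \quad \text{and} \quad u = 1 \ \text{at}\ x = 1.$$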
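Let us use sines, which are mutually orthogonal over the specified domain, for the trial functions, such that
$$\phi_n(x) = \sin(n\pi x), \quad n = 1, \ldots, N.$$
With $\phi_0(x) = x$ satisfying the boundary conditions, the spectral solution is of the form
$$\bar{u}(x) = x + \sum_{n=1}^{N} c_n \sin(n\pi x). \tag{17.6}$$
Differentiating gives
$$\bar{u}'(x) = 1 + \sum_{n=1}^{N} c_n\, n\pi \cos(n\pi x),$$
and again
$$\bar{u}''(x) = -\sum_{n=1}^{N} c_n (n\pi)^2 \sin(n\pi x).$$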
Substituting into the differential equation (17.5) to obtain the residual yields
$$r(x) = f - L\bar{u} = \sum_{n=1}^{N} c_n (n\pi)^2 \sin(n\pi x) - x - \sum_{n=1}^{N} c_n \sin(n\pi x),$$
or
$$r(x) = -x - \sum_{n=1}^{N} c_n \left[1 - (n\pi)^2\right] \sin(n\pi x).$$
From equation (17.3) with $w_i(x) = \phi_i(x)$, that is, the Galerkin method, we have
$$\int_0^1 r(x)\,\phi_i(x)\,dx = 0, \quad i = 1, \ldots, N,$$
or
$$\int_0^1 x \sin(i\pi x)\,dx + \sum_{n=1}^{N} c_n \left[1 - (n\pi)^2\right] \int_0^1 \sin(i\pi x)\sin(n\pi x)\,dx = 0 \tag{17.7}$$
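for $i = 1, \ldots, N$. Owing to orthogonality of sines, that is, $\int_0^1 \sin(i\pi x)\sin(n\pi x)\,dx = 0$ for $n \ne i$, the only contribution to the summation in equation (17.7) arises when $n = i$. Let us evaluate the resulting integrals:
$$\int_0^1 x \sin(i\pi x)\,dx = \left[\frac{\sin(i\pi x)}{(i\pi)^2} - \frac{x\cos(i\pi x)}{i\pi}\right]_0^1 = -\frac{(-1)^i}{i\pi} = \begin{cases} \dfrac{1}{i\pi}, & i = 1, 3, 5, \ldots \\[2mm] -\dfrac{1}{i\pi}, & i = 2, 4, 6, \ldots \end{cases},$$
and
$$\int_0^1 \sin^2(i\pi x)\,dx = \left[\frac{x}{2} - \frac{\sin(2i\pi x)}{4i\pi}\right]_0^1 = \frac{1}{2}.$$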
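Thus, equation (17.7) becomes
$$-\frac{(-1)^i}{i\pi} + \frac{1}{2}\,c_i\left[1 - (i\pi)^2\right] = 0, \quad i = 1, \ldots, N.$$
Solving for the coefficients yields
$$c_i = \frac{2(-1)^i}{i\pi\left[1 - (i\pi)^2\right]}, \quad i = 1, \ldots, N, \tag{17.8}$$
and the spectral solution (17.6) is
$$\bar{u}(x) = x + \sum_{n=1}^{N} \frac{2(-1)^n}{n\pi\left[1 - (n\pi)^2\right]} \sin(n\pi x). \tag{17.9}$$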
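[Figure 17.4: plot over $0 \le x \le 1$; image not recovered.]
For $N = 1$, the spectral solution (17.9) is
$$\bar{u}(x) = x - \frac{2}{\pi\left[1 - \pi^2\right]} \sin(\pi x),$$
and the exact solution is
$$u(x) = \frac{\sin x}{\sin 1}.$$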
Observe from Figure 17.4 that, even with only one term, the spectral solution is nearly indistinguishable from the exact solution.
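The comparison between the truncated series (17.9) and the exact solution can be reproduced with a few lines of code. This is a minimal sketch; the function names and the sampling grid of 101 points are illustrative assumptions, and the coefficients are simply those of equation (17.8).

```python
import math

def galerkin_coefficient(n):
    """Coefficient c_n from equation (17.8)."""
    return 2.0 * (-1) ** n / (n * math.pi * (1.0 - (n * math.pi) ** 2))

def spectral_solution(x, N):
    """Truncated spectral solution (17.9) with N sine terms."""
    return x + sum(galerkin_coefficient(n) * math.sin(n * math.pi * x)
                   for n in range(1, N + 1))

def exact_solution(x):
    """Exact solution of u'' + u = 0 with u(0) = 0 and u(1) = 1."""
    return math.sin(x) / math.sin(1.0)

# Maximum pointwise error on a sampling grid for several truncation levels.
xs = [i / 100 for i in range(101)]
max_err = {N: max(abs(spectral_solution(x, N) - exact_solution(x)) for x in xs)
           for N in (1, 2, 4, 8)}
```

Inspecting `max_err` shows how quickly the error falls as terms are added for this particular problem.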
Remarks:
1. Only small N is required in the above example because the underlying solution
is a sine function. This, of course, is not normally the case.
2. Because each trial function spans the entire domain, spectral methods provide
global approximations.
3. Spectral methods give highly accurate approximate solutions when the underlying solution is smooth, and it is when the solution is smooth that exponential (spectral) convergence is achieved with increasing $N$. The number of terms required becomes very large when the solution contains large gradients, such as shocks. Finite-difference methods only experience algebraic convergence. For example, in a second-order accurate method, the error scales according to $1/N^2$, which is much slower than exponential decay.
4. For periodic problems, trigonometric functions are used for the trial functions, whereas for nonperiodic problems, Chebyshev or Legendre polynomials are typically used.
5. For steady problems, the cn coefficients are in general determined by solving a
system of N algebraic equations.
Observe the similarities between obtaining the weak form and the inverse problem
in variational methods.
The primary difference between finite-element and spectral methods is that $N$ is small, that is, $N = 1$ (linear) or 2 (quadratic) is typical, for finite-element methods. However, these shape functions are applied individually across many small elements that make up the entire domain. In other words, just like finite-difference methods, finite-element methods are local approximation methods. Recall that in spectral methods, each basis function is applied across the entire domain, which is why $N$ must in general be larger than in finite-element methods.
Remarks:
1. References:
For more details on the variational underpinnings of finite-element methods, see Chapter 3 of Variational Methods with Applications in Science and Engineering by Cassel (2013).
For additional details of finite-element methods and the method of weighted residuals, see Sections 6.6 and 6.7 of Fundamentals of Engineering Numerical Analysis by Moin (2010) and Computational Fluid Dynamics by Chung (2010).
2. Finite-element methods can be combined with spectral methods to obtain the spectral-element method, for which the shape functions across each element in the finite-element method are replaced by spectral expansion approximations. This typically allows for larger elements as compared to finite-element methods that have only linear or quadratic shape functions. One can then combine the flexibility of finite elements in representing complex domains with the spectral accuracy and convergence rate of spectral methods.
Part III
Applications
18
Static Structural and Electrical Systems
19
Discrete Dynamical Systems
The mathematical theory of discrete dynamical systems, known generally as linear systems theory, plays a prominent role in almost all areas of science and engineering, including mechanical systems, biological systems, electrical systems, communications, signal processing, structural systems, and many others. For a system to be linear, it simply means that the mathematical model governing the system's mechanics consists of linear algebraic and/or differential operators. Such systems have the property that when the input is increased or decreased, the output is increased or decreased by a proportional amount. The mathematics of linear systems is very well established and tightly integrated.
Linear systems theory is so powerful, and the theory so well developed, that
when dealing with a nonlinear system, it is common to first analyze the system by
linearizing the model in the vicinity of a characteristic (often equilibrium) state
or solution trajectory, thereby bringing to bear the full range of tools available for
analyzing linear systems. Although such an analysis only applies when the system state is close to that about which it has been linearized, this approach is often very fruitful and provides a wealth of knowledge about the system's behavior.
In discrete systems, the masses are rigid (nondeformable), whereas in continuous systems, the mass is distributed and deformable. Discrete dynamical systems consist of masses, pendulums, springs, etc., and are generally governed by systems of linear ordinary differential equations (see Chapter 2). Continuous dynamical systems consist of strings, membranes, beams, etc., and are generally governed by linear partial differential equations (see Chapter 3).
Linear systems theory is most closely associated with the dynamics of discrete systems, for which there are two formulations that are commonly used in the context of dynamical systems. Application of Newton's second law or Hamilton's principle (through the Euler-Lagrange equations) directly produces a system of $n$ second-order ordinary differential equations in time, where $n$ is the number of degrees of freedom of the system. Alternatively, the state-space form is often emphasized. This form follows from Hamilton's canonical equations, which produce a system of $2n$ first-order linear ordinary differential equations in time.1 Of course, these two formulations are mathematically equivalent, and we can easily transform one to the other as desired.
1 See Variational Methods with Applications in Science and Engineering by Cassel (2013) for more on Hamilton's principle, the Euler-Lagrange equations, and Hamilton's canonical equations.
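[Free-body diagrams of the two masses, with displacements $x_1(t)$ and $x_2(t)$; images not recovered.]
$$m\frac{d^2x_1}{dt^2} = k(x_2 - x_1) - 2kx_1 = -3kx_1 + kx_2, \tag{19.1}$$
$$2m\frac{d^2x_2}{dt^2} = -kx_2 - k(x_2 - x_1) = kx_1 - 2kx_2. \tag{19.2}$$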
Thus, we have a system of two second-order ordinary differential equations for $x_1(t)$ and $x_2(t)$.
It is common to write such systems of equations for dynamical systems in the matrix form
$$M\ddot{x} + C\dot{x} + Kx = f(t),$$
where $M$ is the mass matrix, $C$ is the damping matrix, $K$ is the stiffness matrix, $\ddot{x}(t)$ is the acceleration vector, $\dot{x}(t)$ is the velocity vector, $x(t)$ is the displacement vector, and $f(t)$ is the force vector. In the present case
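$$M = \begin{bmatrix} m & 0 \\ 0 & 2m \end{bmatrix}, \quad C = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}, \quad K = \begin{bmatrix} 3k & -k \\ -k & 2k \end{bmatrix}, \quad f = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}.$$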
Before seeking the general solution to this system of equations, let us first obtain the natural frequencies (normal modes) of the system.2 These will occur when the two masses oscillate with the same frequency; therefore, assume periodic motion of the form3
$$x_1(t) = A_1 e^{i\omega t}, \quad x_2(t) = A_2 e^{i\omega t},$$
where $\omega$ is the natural frequency, and $A_1$ and $A_2$ are the amplitudes of masses 1 and 2, respectively. It is understood that we are taking the real part of the final expressions for $x_1(t)$ and $x_2(t)$. For example,
$$x_1(t) = \mathrm{Re}\left[A_1 e^{i\omega t}\right] = \mathrm{Re}\left[A_1\cos(\omega t) + A_1 i \sin(\omega t)\right] = A_1\cos(\omega t).$$
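Evaluating the derivatives gives
$$\frac{d^2x_1}{dt^2} = -A_1\omega^2 e^{i\omega t}, \quad \frac{d^2x_2}{dt^2} = -A_2\omega^2 e^{i\omega t}.$$
Substituting into equations (19.1) and (19.2) and canceling $e^{i\omega t}$ gives the two linear algebraic equations
$$-m\omega^2 A_1 = -3kA_1 + kA_2, \quad -2m\omega^2 A_2 = kA_1 - 2kA_2,$$
or upon rearranging
$$3A_1 - A_2 = \lambda A_1, \quad -\tfrac{1}{2}A_1 + A_2 = \lambda A_2,$$
where $\lambda = m\omega^2/k$. Thus, we have two linear algebraic equations for the two unknown amplitudes $A_1$ and $A_2$. In matrix form, this is
$$\begin{bmatrix} 3 & -1 \\ -\frac{1}{2} & 1 \end{bmatrix} \begin{bmatrix} A_1 \\ A_2 \end{bmatrix} = \lambda \begin{bmatrix} A_1 \\ A_2 \end{bmatrix}.$$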
Consequently, we have an eigenproblem in which the eigenvalues $\lambda$ are related to the natural frequencies of the system, which is typical of dynamical systems. Solving, we find that the eigenvalues are
$$\lambda_1 = \frac{m\omega_1^2}{k} = 2 - \frac{\sqrt{6}}{2}, \quad \lambda_2 = \frac{m\omega_2^2}{k} = 2 + \frac{\sqrt{6}}{2}.$$
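The eigenvalues above can be confirmed numerically. The following is a minimal sketch using NumPy; the variable names are illustrative, and the frequencies are expressed in units of $\sqrt{k/m}$.

```python
import numpy as np

# Amplitude eigenproblem from the text: B a = lambda a, with lambda = m*omega^2/k.
B = np.array([[3.0, -1.0],
              [-0.5, 1.0]])

# Eigenvalues are real here; sort them so lam[0] < lam[1].
lam = np.sort(np.linalg.eigvals(B).real)

# Natural frequencies in units of sqrt(k/m).
omega = np.sqrt(lam)
```

The square roots of the two eigenvalues are the natural frequencies that reappear as the imaginary parts of the state-space eigenvalues.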
2 A normal mode is a solution in which all of the parts of the system oscillate with the same frequency, so the motion of the entire system is periodic.
3 This form is better than setting $x_1(t) = A_1\cos(\omega t)$ directly, for example, which does not work if we have odd-order derivatives.
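$$\tilde{x}_1(t) = x_1(t), \quad \tilde{x}_2(t) = x_2(t), \quad \tilde{x}_3(t) = \dot{x}_1(t), \quad \tilde{x}_4(t) = \dot{x}_2(t). \tag{19.3}$$
Recall from Section 2.5.2 that two second-order ordinary differential equations transform to four first-order equations. Differentiating the substitutions and transforming to $\tilde{x}_1(t), \ldots, \tilde{x}_4(t)$, we have the following system of four equations
$$\begin{aligned}
\dot{\tilde{x}}_1(t) &= \dot{x}_1(t) = \tilde{x}_3(t), \\
\dot{\tilde{x}}_2(t) &= \dot{x}_2(t) = \tilde{x}_4(t), \\
\dot{\tilde{x}}_3(t) &= \ddot{x}_1(t) = -3Kx_1(t) + Kx_2(t) = -3K\tilde{x}_1(t) + K\tilde{x}_2(t), \\
\dot{\tilde{x}}_4(t) &= \ddot{x}_2(t) = \tfrac{1}{2}Kx_1(t) - Kx_2(t) = \tfrac{1}{2}K\tilde{x}_1(t) - K\tilde{x}_2(t),
\end{aligned}$$
for the four unknowns $\tilde{x}_1(t)$, $\tilde{x}_2(t)$, $\tilde{x}_3(t)$, and $\tilde{x}_4(t)$, where $K = k/m$. Written in matrix form, $\dot{\tilde{x}}(t) = A\tilde{x}(t)$, we have
$$\begin{bmatrix} \dot{\tilde{x}}_1(t) \\ \dot{\tilde{x}}_2(t) \\ \dot{\tilde{x}}_3(t) \\ \dot{\tilde{x}}_4(t) \end{bmatrix} = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ -3K & K & 0 & 0 \\ \frac{1}{2}K & -K & 0 & 0 \end{bmatrix} \begin{bmatrix} \tilde{x}_1(t) \\ \tilde{x}_2(t) \\ \tilde{x}_3(t) \\ \tilde{x}_4(t) \end{bmatrix}.$$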
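For simplicity, let us take $K = k/m = 1$. Then the eigenvalues of $A$, which are obtained using Matlab or Mathematica, are
$$\lambda_1 = 1.7958i, \quad \lambda_2 = -1.7958i, \quad \lambda_3 = 0.8805i, \quad \lambda_4 = -0.8805i,$$
which are complex conjugate pairs. The corresponding eigenvectors are also complex:
$$u_1 = \begin{bmatrix} -0.4747i \\ 0.1067i \\ 0.8524 \\ -0.1916 \end{bmatrix}, \quad u_2 = \begin{bmatrix} 0.4747i \\ -0.1067i \\ 0.8524 \\ -0.1916 \end{bmatrix}, \quad u_3 = \begin{bmatrix} 0.3077 \\ 0.6846 \\ 0.2709i \\ 0.6027i \end{bmatrix}, \quad u_4 = \begin{bmatrix} 0.3077 \\ 0.6846 \\ -0.2709i \\ -0.6027i \end{bmatrix}.$$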
(19.4)
$$a_2 = A(c_3 + c_4), \quad a_3 = B(c_1 - c_2), \quad a_4 = B(c_3 - c_4).$$
Figure 19.2 Solution for x1 (t) (solid line) and x2 (t) (dashed line) for
Example ??.
Note that the general solution consists of a superposition of the two modes determined by the natural modes calculation above. Also observe that the imaginary eigenvalues correspond to oscillatory behavior as expected for the spring-mass system.
To obtain the four integration constants, we require four initial conditions. We will specify the positions and velocities of the two masses as follows:
$$x_1 = 0, \ \dot{x}_1 = 0 \ \text{at}\ t = 0, \qquad x_2 = 1, \ \dot{x}_2 = 0 \ \text{at}\ t = 0.$$
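$$\begin{bmatrix} 0 & 0.3077 & -0.4747 & 0 \\ 0.8525 & 0 & 0 & 0.2709 \\ 0 & 0.6846 & 0.1067 & 0 \\ -0.1916 & 0 & 0 & 0.6028 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}.$$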
Observe that the initial conditions appear in the right-hand-side vector. Solving this system gives
$$a_1 = 0, \quad a_2 = 1.3267, \quad a_3 = 0.8600, \quad a_4 = 0.$$
Substituting these constants into the general solutions (19.4) for x1 (t) (solid line)
and x2 (t) (dashed line) yields the solution shown in Figure 19.2.
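As a cross-check on the state-space calculation, the eigenvalues of $A$ and the response to the stated initial conditions can be computed directly. This is a minimal sketch with NumPy, assuming $K = 1$ and the state ordering $[x_1, x_2, \dot{x}_1, \dot{x}_2]$; the helper `state` is an illustrative name, and it evaluates the solution through the eigen-decomposition rather than through the constants $a_1, \ldots, a_4$.

```python
import numpy as np

K = 1.0  # K = k/m, taken as 1 as in the text

# State-space matrix for the state vector [x1, x2, dx1/dt, dx2/dt].
A = np.array([[0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [-3.0 * K, K, 0.0, 0.0],
              [0.5 * K, -K, 0.0, 0.0]])

mu, U = np.linalg.eig(A)  # eigenvalues and eigenvectors (columns of U)

# Initial conditions from the text: x1 = 0, x2 = 1, both velocities zero.
x0 = np.array([0.0, 1.0, 0.0, 0.0])

def state(t):
    """State at time t from the modal expansion x(t) = U exp(mu t) U^{-1} x0."""
    c = np.linalg.solve(U, x0)
    return (U @ (c * np.exp(mu * t))).real
```

Evaluating `state(t)` on a range of times gives curves that can be compared with Figure 19.2.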
Recall that all general solutions, including the one above, are linear combinations of the two natural modes. Thus, we would expect that a set of initial conditions can be found that excites only one of the natural frequencies. For example, the solution for initial conditions chosen such that only $\omega_1$ is excited is shown in Figure 19.3, and that for which only $\omega_2$ is excited is shown in Figure 19.4.
Figure 19.3 Solution for $x_1(t)$ (solid line) and $x_2(t)$ (dashed line) for Example ?? when only $\omega_1$ is excited.
Figure 19.4 Solution for $x_1(t)$ (solid line) and $x_2(t)$ (dashed line) for Example ?? when only $\omega_2$ is excited.
Remarks:
1. In this example, the matrix $A$ consists entirely of constants, that is, it does not depend on time. Such systems are called autonomous.
2. The diagonalization procedure is equivalent to finding an alternative set of coordinates, $y(t)$, sometimes called principal coordinates, with respect to which the motions of the masses in the system are uncoupled. Although this may seem physically counterintuitive for a coupled mechanical system, such as our spring-mass example, the mathematics tells us that it must be the case (except for systems that result in nonsymmetric matrices with repeated eigenvalues, which result in the Jordan canonical form).
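$$\begin{bmatrix} \dot{\tilde{x}}_1 \\ \dot{\tilde{x}}_2 \\ \dot{\tilde{x}}_3 \\ \dot{\tilde{x}}_4 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ -3K & K & 0 & 0 \\ \frac{1}{2}K & -K & 0 & 0 \end{bmatrix} \begin{bmatrix} \tilde{x}_1 \\ \tilde{x}_2 \\ \tilde{x}_3 \\ \tilde{x}_4 \end{bmatrix}. \tag{19.5}$$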
First, we obtain the equilibrium positions for which the velocities and accelerations of the masses are both zero. Thus,
$$\{\dot{x}_1, \dot{x}_2, \ddot{x}_1, \ddot{x}_2\} = \{\dot{\tilde{x}}_1, \dot{\tilde{x}}_2, \dot{\tilde{x}}_3, \dot{\tilde{x}}_4\} = \{0, 0, 0, 0\}.$$
This is equivalent to setting $\dot{\tilde{x}} = 0$ in equation (19.5) and solving $A\tilde{x} = 0$. Because $A$ is invertible, the only solution to this homogeneous system is the trivial solution
$$s = \tilde{x} = 0,$$
or
$$s = [\tilde{x}_1, \tilde{x}_2, \tilde{x}_3, \tilde{x}_4]^T = [0, 0, 0, 0]^T, \tag{19.6}$$
This material is from Variational Methods with Applications in Science and Engineering, Chapter 6.
system at which the masses could remain indefinitely. Note that s = 0 is the
neutral position of the masses for which the forces in the three springs are zero.
The second step is to consider the behavior, that is, stability, of the system about this equilibrium point subject to small disturbances
$$\tilde{x} = s + \epsilon u,$$
where $\epsilon \ll 1$. Given that $\dot{\tilde{x}} = \epsilon\dot{u}$ and $s = 0$ for this case, the linear system $\dot{\tilde{x}} = A\tilde{x}$ becomes
$$\dot{u} = Au. \tag{19.7}$$
$$\lambda_{1,2} = \pm 1.7958i, \quad \lambda_{3,4} = \pm 0.8805i,$$
which corresponds to harmonic motion, that is, linear combinations of $\sin(|\lambda_i| t)$ and $\cos(|\lambda_i| t)$, $i = 1, 2, 3, 4$. Harmonic motion remains bounded for all time; therefore, it is stable. We say that the equilibrium (stationary) point $s$ of the system is linearly stable in the form of a stable center. Note that the motion does not decay toward $s$ due to the lack of damping in the system.
Remarks:
1. Stability of a dynamical system is determined by the nature of its eigenvalues.
2. More degrees of freedom lead to larger systems of equations and additional natural frequencies.
Example 19.3 Now let us consider stability of a nonlinear system. As illustrated in Figure 19.5, a simple pendulum (with no damping) is a one-degree-of-freedom system with the angle $\theta(t)$ being the dependent variable. It is governed by
$$\ddot{\theta} + \frac{g}{\ell}\sin\theta = 0. \tag{19.8}$$
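Solution: To transform the nonlinear governing equation to a system of first-order equations, let
$$x_1(t) = \theta(t), \quad x_2(t) = \dot{\theta}(t),$$
such that $x_1(t)$ and $x_2(t)$ give the angular position and velocity, respectively. Differentiating and substituting the governing equation gives
$$\begin{aligned} \dot{x}_1 &= \dot{\theta} = x_2, \\ \dot{x}_2 &= \ddot{\theta} = -\frac{g}{\ell}\sin\theta = -\frac{g}{\ell}\sin x_1. \end{aligned} \tag{19.9}$$
Therefore, we have a system of first-order nonlinear equations owing to the sine function.
[Figure 19.5 residue: labels $y$, $0$, $g$, and $\theta(t)$; image not recovered.]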
Equilibrium positions occur where the angular velocity and angular acceleration are zero; thus,
$$\{\dot{\theta}, \ddot{\theta}\} = \{\dot{x}_1, \dot{x}_2\} = \{0, 0\}.$$
Thus, from the system (19.9)
$$x_2 = 0, \quad -\frac{g}{\ell}\sin x_1 = 0,$$
and the stationary points are given by
$$s_1 = x_1 = n\pi, \quad s_2 = x_2 = 0, \quad n = 0, 1, 2, \ldots.$$
Therefore, there are two equilibrium points corresponding to $n$ even and $n$ odd. The equilibrium point with $n$ even corresponds to when the pendulum is hanging vertically downward ($\theta = 0, 2\pi, \ldots$), and the equilibrium point with $n$ odd corresponds to when the pendulum is located vertically above the pivot point ($\theta = \pi, 3\pi, \ldots$).
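In order to evaluate stability of the equilibrium states, let us impose small disturbances about the equilibrium points according to
$$x_1(t) = s_1 + \epsilon u_1(t) = n\pi + \epsilon u_1(t), \quad x_2(t) = s_2 + \epsilon u_2(t) = \epsilon u_2(t).$$
Substituting into the system (19.9) gives
$$\epsilon\dot{u}_1 = \epsilon u_2, \quad \epsilon\dot{u}_2 = -\frac{g}{\ell}\sin(n\pi + \epsilon u_1). \tag{19.10}$$
Note that
$$\sin(n\pi + \epsilon u_1) = \begin{cases} \sin(\epsilon u_1), & n \ \text{even} \\ -\sin(\epsilon u_1), & n \ \text{odd} \end{cases}.$$
Because $\epsilon$ is small, we may expand the sine function in terms of a Taylor series
$$\sin(\epsilon u_1) = \epsilon u_1 - \frac{(\epsilon u_1)^3}{3!} + \cdots = \epsilon u_1 + O(\epsilon^3),$$
where we have neglected higher-order terms in $\epsilon$. Essentially, we have linearized the system about the equilibrium points by considering an infinitesimally small perturbation to the equilibrium (stationary) solutions of the system. Substituting into the system of equations (19.10) and canceling $\epsilon$ leads to the linear system of equations
$$\dot{u}_1 = u_2, \quad \dot{u}_2 = \mp\frac{g}{\ell}u_1, \tag{19.11}$$
where the minus sign corresponds to the equilibrium point with $n$ even, and the plus sign to $n$ odd. Let us consider each case in turn.
For $n$ even, the system (19.11) in matrix form $\dot{u} = Au$ is
$$\begin{bmatrix} \dot{u}_1 \\ \dot{u}_2 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ -\frac{g}{\ell} & 0 \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \end{bmatrix}.$$
In order to diagonalize the system, determine the eigenvalues of $A$. We write
$$(A - \lambda I)u = 0.$$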
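For a nontrivial solution, the determinant must be zero, such that
$$\begin{vmatrix} -\lambda & 1 \\ -\frac{g}{\ell} & -\lambda \end{vmatrix} = 0,$$
or
$$\lambda^2 + \frac{g}{\ell} = 0.$$
Factoring out $\sqrt{-1} = i$ gives the eigenvalues
$$\lambda_{1,2} = \pm\sqrt{-\frac{g}{\ell}} = \pm\sqrt{\frac{g}{\ell}}\,i \quad (\lambda \ \text{imaginary}).$$
Then in uncoupled variables $y$ ($u = Py$), where
$$\dot{y} = P^{-1}APy, \quad P^{-1}AP = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix},$$
the solution near the equilibrium point corresponding to $n$ even is of the form
$$u(t) = c_1 e^{\sqrt{g/\ell}\,it} + c_2 e^{-\sqrt{g/\ell}\,it},$$
or
$$u(t) = c_1\sin\left(\sqrt{\frac{g}{\ell}}\,t\right) + c_2\cos\left(\sqrt{\frac{g}{\ell}}\,t\right).$$
This oscillatory solution is linearly stable in the form of a stable center (see the next section) as the solution remains bounded for all time.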
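In the case of $n$ odd, the system (19.11) is
$$\begin{bmatrix} \dot{u}_1 \\ \dot{u}_2 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ \frac{g}{\ell} & 0 \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \end{bmatrix}.$$
Finding the eigenvalues produces
$$\begin{vmatrix} -\lambda & 1 \\ \frac{g}{\ell} & -\lambda \end{vmatrix} = 0, \quad \lambda^2 - \frac{g}{\ell} = 0, \quad \lambda_{1,2} = \pm\sqrt{\frac{g}{\ell}} \quad (\lambda \ \text{real}).$$
Thus, in uncoupled variables
$$y_1 = c_1 e^{\sqrt{g/\ell}\,t}, \quad y_2 = c_2 e^{-\sqrt{g/\ell}\,t}.$$
Then the solution near the equilibrium point corresponding to $n$ odd is of the form
$$u(t) = c_1 e^{\sqrt{g/\ell}\,t} + c_2 e^{-\sqrt{g/\ell}\,t}.$$
Observe that while the second term decays exponentially as $t \to \infty$, the first term grows exponentially, eventually becoming unbounded. Therefore, this equilibrium point is linearly unstable in the form of an unstable saddle (see the next section).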
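The two linearizations can be verified by evaluating the Jacobian of the first-order system (19.9) at each equilibrium point. This is a minimal sketch with NumPy, assuming $g/\ell = 1$ for illustration; the function name `pendulum_jacobian` is not from the text.

```python
import numpy as np

def pendulum_jacobian(n, g_over_l=1.0):
    """Jacobian of x1' = x2, x2' = -(g/l) sin(x1) at the equilibrium (n*pi, 0)."""
    return np.array([[0.0, 1.0],
                     [-g_over_l * np.cos(n * np.pi), 0.0]])

# n even: pendulum hanging down; n odd: pendulum inverted.
eig_down = np.linalg.eigvals(pendulum_jacobian(0))
eig_up = np.linalg.eigvals(pendulum_jacobian(1))
```

Purely imaginary eigenvalues for the hanging position and a real pair of opposite sign for the inverted position match the stable center and unstable saddle found analytically.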
[Figure residue: plots with axes $u_1(t)$ and $u_2(t)$; images not recovered.]
saddle point. This is the behavior exhibited by the simple pendulum with $n$ odd ($\theta = \pi$).
2. Unstable Node or Source: $\mathrm{tr}(A)^2 - 4|A| > 0$, $\mathrm{tr}(A) > 0$, $|A| > 0$
If the eigenvalues of $A$ are both positive real values, the equilibrium point is an unstable node as shown in Figure 19.7. In the case of an unstable node, all trajectories move away from the origin to become unbounded.
3. Stable Node or Sink: $\mathrm{tr}(A)^2 - 4|A| > 0$, $\mathrm{tr}(A) < 0$, $|A| > 0$
If the eigenvalues of $A$ are both negative real values, the equilibrium point is a stable node as shown in Figure 19.8. In the case of a stable node, all trajectories move toward the equilibrium point and remain there.
4. Unstable Spiral or Focus: $\mathrm{tr}(A)^2 - 4|A| < 0$, $\mathrm{Re}[\lambda_{1,2}] > 0$ ($\mathrm{tr}(A) \ne 0$)
If the eigenvalues of A are complex, but with positive real parts, the equilibrium point is an unstable focus. A plot of position versus time is shown in
Figure 19.9 Plot of position versus time for an unstable spiral or focus.
Figure 19.9 and the phase plane plot is shown in Figure 19.10. As can be seen
from both plots, the trajectory begins at the origin, that is, the equilibrium
point, and spirals away, becoming unbounded.
5. Stable Spiral or Focus: $\mathrm{tr}(A)^2 - 4|A| < 0$, $\mathrm{Re}[\lambda_{1,2}] < 0$ ($\mathrm{tr}(A) \ne 0$)
If the eigenvalues of A are complex, but with negative real parts, the equilibrium point is a stable focus. A plot of position versus time is shown in
Figure 19.11 and the phase plane plot is shown in Figure 19.12. In the case of
a stable focus, the trajectory spirals in toward the equilibrium point from any
initial condition.
6. Stable Center: $\mathrm{tr}(A)^2 - 4|A| < 0$, $\mathrm{tr}(A) = 0$
If the eigenvalues of A are purely imaginary, the equilibrium point is a stable
center. A plot of position versus time is shown in Figure 19.13 and the phase
plane plot is shown in Figure 19.14. A stable center is comprised of a periodic
Figure 19.11 Plot of position versus time for a stable spiral or focus.
limit cycle solution centered at the equilibrium point. Because the trajectory
remains bounded, the solution is stable. Recall that the simple pendulum with
$n$ even ($\theta = 0$) has a stable center.
There are situations for which such a modal analysis is not possible, as for
nonautonomous systems illustrated in the next section, or incomplete, as for
nonnormal systems illustrated in Section 19.6.
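The trace and determinant tests in the classification above lend themselves to a short routine. The following is a minimal sketch for $2 \times 2$ real matrices; the function name, the tolerance, and the saddle condition $|A| < 0$ (standard, but not visible in the surviving text of item 1) are assumptions.

```python
import numpy as np

def classify_equilibrium(A, tol=1e-12):
    """Classify the equilibrium of du/dt = A u for a 2x2 real matrix A."""
    tr = float(np.trace(A))
    det = float(np.linalg.det(A))
    disc = tr**2 - 4.0 * det  # discriminant of the characteristic polynomial
    if disc > tol:            # real, distinct eigenvalues
        if det < -tol:
            return "unstable saddle"   # assumed condition: eigenvalues of opposite sign
        if det > tol:
            return "unstable node" if tr > 0 else "stable node"
    elif disc < -tol:         # complex-conjugate eigenvalues
        if abs(tr) <= tol:
            return "stable center"
        return "unstable spiral" if tr > 0 else "stable spiral"
    return "degenerate (not covered by the six cases)"

# Linearized simple pendulum with g/l = 1: hanging (n even) and inverted (n odd).
hanging = classify_equilibrium(np.array([[0.0, 1.0], [-1.0, 0.0]]))
inverted = classify_equilibrium(np.array([[0.0, 1.0], [1.0, 0.0]]))
```

Borderline cases, such as a zero determinant or a zero discriminant, fall outside the six listed categories and are reported separately rather than forced into one of them.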
$$\ddot{\theta} + \left[\frac{g}{\ell} + \frac{A\Omega^2}{\ell}\cos(\Omega t)\right]\sin\theta = 0.$$
Observe that this reduces to the equation of motion (19.8) for the simple pendulum when the amplitude of forcing vanishes ($A = 0$). For convenience, let us nondimensionalize according to the forcing frequency by setting $\tau = \Omega t$, in which case $d^2/dt^2 = \Omega^2 d^2/d\tau^2$, and the equation of motion becomes
$$\ddot{\theta} + (\alpha + \beta\cos\tau)\sin\theta = 0, \tag{19.12}$$
where $\alpha = g/(\ell\Omega^2)$ and $\beta = A/\ell$. Note that the natural frequency of the unforced pendulum is $\sqrt{g/\ell}$; therefore, $\alpha$ is the square of the ratio of the natural frequency of the pendulum to the forcing frequency $\Omega$. Forcing at the natural frequency, $\Omega = \sqrt{g/\ell}$, corresponds to $\alpha = 1$.
[Figure 19.15 labels: pivot motion $A\cos(\Omega t)$, gravity $g$, angle $\theta(t)$, mass $m$; image not recovered.]
Figure 19.15 Schematic of the forced pendulum.
As for the simple pendulum, the equilibrium positions $\theta = s$, for which $\dot{q}_i = \ddot{q}_i = 0$, require that $\sin s = 0$; thus, $s = n\pi$, $n = 0, 1, 2, \ldots$. The cases for which $n$ is even correspond to $s = 0$, in which case the pendulum is hanging vertically down. Likewise, the cases for which $n$ is odd correspond to $s = \pi$, for which the pendulum is located vertically above the pivot point.
Now let us consider stability of the equilibrium position $s = 0$, which is stable for the simple pendulum ($A = 0$). Introducing small perturbations about the equilibrium position
$$\theta(\tau) = s + \epsilon u(\tau) = \epsilon u(\tau),$$
equation (19.12) becomes
$$\ddot{u} + (\alpha + \beta\cos\tau)u = 0, \tag{19.13}$$
where again we linearize using the Taylor series $\sin(\epsilon u) = \epsilon u - \frac{(\epsilon u)^3}{3!} + \cdots$. Equation (19.13) is known as Mathieu's equation. As with the simple pendulum, we can convert this to a system of first-order equations in matrix form using the transformation
$$u_1(\tau) = u(\tau), \quad u_2(\tau) = \dot{u}(\tau).$$
Differentiating and substituting Mathieu's equation leads to the system
$$\begin{bmatrix} \dot{u}_1 \\ \dot{u}_2 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ -(\alpha + \beta\cos\tau) & 0 \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \end{bmatrix}. \tag{19.14}$$
Observe that the matrix $A$ in (19.14) for Mathieu's equation is time-dependent owing to the forcing. Such systems are called nonautonomous, and we cannot evaluate stability by examining the eigenvalues. Instead, we directly solve Mathieu's equation numerically for the perturbations $u(\tau)$.
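A direct numerical solution of Mathieu's equation can be sketched with a classical fourth-order Runge-Kutta integrator. The code below is illustrative only: the function name, the step size, and the initial velocity $\dot{u}(0) = 0$ are assumptions, while the parameter values $\alpha = 0.24$ and $0.25$ with $\beta = 0.01$ follow the captions of Figure 19.16.

```python
import math

def mathieu_max_amplitude(alpha, beta, tau_end=500.0, dtau=0.01, u0=1.0, v0=0.0):
    """Integrate u'' + (alpha + beta*cos(tau)) u = 0 with RK4; return max |u|.

    v0 (the initial velocity) is an assumed value for illustration.
    """
    def rhs(tau, u, v):
        # First-order form: u1' = u2, u2' = -(alpha + beta*cos(tau)) u1.
        return v, -(alpha + beta * math.cos(tau)) * u

    u, v, tau = u0, v0, 0.0
    u_max = abs(u)
    for _ in range(int(round(tau_end / dtau))):
        k1u, k1v = rhs(tau, u, v)
        k2u, k2v = rhs(tau + 0.5 * dtau, u + 0.5 * dtau * k1u, v + 0.5 * dtau * k1v)
        k3u, k3v = rhs(tau + 0.5 * dtau, u + 0.5 * dtau * k2u, v + 0.5 * dtau * k2v)
        k4u, k4v = rhs(tau + dtau, u + dtau * k3u, v + dtau * k3v)
        u += dtau * (k1u + 2.0 * k2u + 2.0 * k3u + k4u) / 6.0
        v += dtau * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
        tau += dtau
        u_max = max(u_max, abs(u))
    return u_max

amp_a = mathieu_max_amplitude(0.24, 0.01)  # slightly off the resonant value
amp_b = mathieu_max_amplitude(0.25, 0.01)  # at alpha = 1/4
```

Comparing `amp_a` and `amp_b` shows how a small change in $\alpha$ near $1/4$ separates a bounded response from a growing one over the same time interval.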
Whereas the simple pendulum is stable near the equilibrium position corresponding to $s = 0$, surprisingly, the forced pendulum allows for regions of instability for $s = 0$ and certain values of the parameters $\alpha$ and $\beta$, which are related to the forcing frequency and amplitude, respectively. To illustrate this behavior, we will set the initial conditions to be $u(0) = 1$ and $\dot{u}(0)$
[Figure 19.16 panels: (a) $\alpha = 0.24$; (b) $\alpha = 0.25$; vertical axis $u(\tau)$; images not recovered.]
Figure 19.16 Oscillation amplitude $u(\tau)$ for a case with $\beta = 0.01$ and two forcing frequencies.
toward the equilibrium position, keeping the pendulum's motion stable. This unusual behavior is only observed for small magnitudes of negative $\alpha$, that is, large forcing frequencies.
$$AA^{*} = A^{*}A.$$
A real, symmetric matrix, for example, is normal, but not all normal matrices are symmetric.
For systems governed by nonnormal matrices, the asymptotic stability behavior for large times is still determined by the eigenvalues of the system's coefficient matrix. However, owing to exchange of energy between the nonorthogonal modes, it is possible that the system will exhibit a different behavior for $O(1)$ times as compared to the asymptotic behavior observed for large times. This is known as transient growth.7 For example, a system may be asymptotically stable, meaning that none of the eigenvalues lead to a growth of their corresponding modes (eigenvectors) as $t \to \infty$, but it displays a transient growth for finite times in which the amplitude of a perturbation grows for some period of time before yielding to the large-time asymptotic behavior.
Consider a second-order example given in Farrell and Ioannou (1996) in order to illustrate the effect of nonnormality of the system on the transient growth behavior for finite times before the solution approaches its asymptotically stable
7
This material is from Variational Methods with Applications in Science and Engineering, Chapter 6.
Figure 19.18 Solution of the normal system with $\theta = \pi/2$ for $u_1(t)$ (solid line) and $u_2(t)$ (dashed line).
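equilibrium point for large time. The system is governed by the following first-order ordinary differential equations
$$\begin{bmatrix} \dot{u}_1 \\ \dot{u}_2 \end{bmatrix} = \begin{bmatrix} -1 & -\cot\theta \\ 0 & -2 \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \end{bmatrix}. \tag{19.15}$$
The coefficient matrix $A$ is normal if $\theta = \pi/2$ and nonnormal for all other $\theta$. The system has an equilibrium point at $(u_1, u_2) = (0, 0)$.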
Let us first consider the behavior of the normal system with $\theta = \pi/2$. In this case, the eigenvalues are $\lambda_1 = -2$ and $\lambda_2 = -1$. Because the eigenvalues are negative and real, the system is asymptotically stable in the form of a stable node. The corresponding eigenvectors are given by
$$v_1 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \quad v_2 = \begin{bmatrix} 1 \\ 0 \end{bmatrix},$$
which are mutually orthogonal as expected for a normal system. The solution for the system (19.15) with $\theta = \pi/2$ is shown in Figures 19.18 and 19.19 for the initial conditions $u_1(0) = 1$ and $u_2(0) = 1$. For such a stable normal system, the perturbed solution decays exponentially toward the stable equilibrium point $(u_1, u_2) = (0, 0)$ according to the rates given by the eigenvalues. In phase space, the trajectory starts at the initial condition $(u_1(0), u_2(0)) = (1, 1)$ and moves progressively toward the equilibrium point $(0, 0)$.
Figure 19.19 Solution of the normal system with $\theta = \pi/2$ for $u_1(t)$ and $u_2(t)$ in phase space (the dot indicates the initial condition).
Now consider the system (19.15) with $\theta = \pi/100$, for which the system is no longer normal. While the eigenvalues $\lambda_1 = -2$ and $\lambda_2 = -1$ remain the same, the corresponding eigenvectors are now
$$v_1 = \begin{bmatrix} \cot\frac{\pi}{100} \\ 1 \end{bmatrix}, \quad v_2 = \begin{bmatrix} 1 \\ 0 \end{bmatrix},$$
which are no longer orthogonal, giving rise to the potential for transient growth behavior. The solution for the system (19.15) with $\theta = \pi/100$ is shown in Figures 19.20 and 19.21 for the same initial conditions $u_1(0) = 1$ and $u_2(0) = 1$ as before. Recall that the system is asymptotically stable as predicted by its eigenvalues, and indeed the perturbations do decay toward zero for large times. Whereas $u_2(t)$ again decays monotonically8 toward the equilibrium point, however, $u_1(t)$ first grows to become quite large before decaying toward the stable equilibrium point. In phase space, the trajectory starts at the initial condition $(u_1(0), u_2(0)) = (1, 1)$ but moves farther away from the equilibrium point at the origin before being attracted to it, essentially succumbing to the asymptotically stable large-$t$ behavior.
The transient growth behavior can occur for nonnormal systems because the
nonorthogonal eigenvectors lead to the possibility that the individual modes can
exchange energy leading to transient growth. In other words, even though each
eigenvalue leads to exponential decay of their respective modes, the corresponding
nonorthogonal eigenvectors may interact to induce a temporary growth of the
solution owing to the different rates of decay of each mode.
Note that for a nonlinear system, the linearized behavior represented by the
perturbation equations only governs in the vicinity of the equilibrium point.
Therefore, transient growth behavior could lead to the solution moving too far
from the equilibrium point for which the linearized behavior is valid, thereby
bringing in nonlinear effects and altering the stability properties before the eventual decay to the asymptotic stability state can be realized.
Not all nonnormal systems exhibit such a transient growth behavior. For example, try the system (19.15) with $\theta = \pi/4$. Also, observe that the coefficient matrices
are nonnormal for both the simple and forced pendulum examples considered
8
Figure 19.20 Solution of the nonnormal system with $\theta = \pi/100$ for $u_1(t)$ (solid line) and $u_2(t)$ (dashed line).
Figure 19.21 Solution of the nonnormal system with $\theta = \pi/100$ for $u_1(t)$ and $u_2(t)$ in phase space (the dot indicates the initial condition).
$$\dot{u} = Au, \tag{19.16}$$
for which the energy growth over some time interval $0 \le t \le t_f$ is a maximum. We define the energy of a disturbance $u(t)$ at time $t$ using the norm operator as follows:
$$E(t) = \|u(t)\|^2 = \langle u(t), u(t) \rangle,$$
where $\langle \cdot, \cdot \rangle$ is the inner product. We seek the initial perturbation $u(0)$ that produces the greatest relative energy gain during the time interval $0 \le t \le t_f$ defined by
$$G(t_f) = \max_{u(0)} \frac{E[u(t_f)]}{E[u(0)]}.$$
In other words, we seek the initial disturbance that maximizes the gain functional
$$G[u] = \frac{\|u(t_f)\|^2}{\|u(0)\|^2} = \frac{\langle u(t_f), u(t_f) \rangle}{\langle u(0), u(0) \rangle} = \frac{u^T(t_f)u(t_f)}{u^T(0)u(0)}, \tag{19.17}$$
$$\dot{\Lambda} = -A^T\Lambda, \tag{19.18}$$
and the initial conditions are
$$u(0) = \frac{\left[u^T(0)u(0)\right]^2}{2u^T(t_f)u(t_f)}\,\Lambda(0), \tag{19.19}$$
$$\Lambda(t_f) = \frac{2}{u^T(0)u(0)}\,u(t_f). \tag{19.20}$$
The differential operator in the adjoint equation (19.18) is the adjoint of that in the original governing equation $\dot{u} = Au$. Note that the negative sign requires equation (19.18) to be integrated backward in time from $t = t_f$ to $t = 0$, which is consistent with the initial condition (19.20).
In order to determine the optimal initial perturbation that maximizes the gain for a given terminal time $t_f$, we solve the governing equation (19.16) forward in time using the initial condition obtained from equation (19.19). The solution at the terminal time $u(t_f)$ is then used in equation (19.20) to determine the initial condition $\Lambda(t_f)$ for the backward-time integration of the adjoint equation (19.18). The resulting solution for the adjoint variable $\Lambda(t)$ provides $\Lambda(0)$ for use in equation (19.19), which is used to obtain the initial condition for the governing equation (19.16). This procedure is repeated iteratively until a converged solution for $u(t)$ and $\Lambda(t)$ is obtained, from which the optimal initial perturbation $u(0)$ is determined.
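The forward and backward sweeps of this iteration can be sketched for the two-dimensional example (19.15). The code below is a minimal sketch under stated assumptions: the off-diagonal entry of $A$ is taken as $-\cot\theta$ (the sign is a reading of the garbled source), the adjoint equation is taken as $\dot{\Lambda} = -A^T\Lambda$, and, because the system is linear with constant coefficients, both time integrations are replaced by the exact propagator $e^{At_f}$ built from the eigen-decomposition. The function names and the starting guess $(1, 1)$ are illustrative.

```python
import numpy as np

theta = np.pi / 100
# Coefficient matrix of (19.15); the sign of the off-diagonal entry is assumed.
A = np.array([[-1.0, -1.0 / np.tan(theta)],
              [0.0, -2.0]])

def propagator(A, t):
    """Matrix exponential exp(A t) via the eigen-decomposition of A."""
    lam, V = np.linalg.eig(A)
    return (V @ np.diag(np.exp(lam * t)) @ np.linalg.inv(V)).real

def optimal_perturbation(A, tf, iterations=50):
    """Forward/adjoint iteration for the gain-maximizing initial condition."""
    Phi = propagator(A, tf)
    u0 = np.array([1.0, 1.0])                 # assumed starting guess
    for _ in range(iterations):
        utf = Phi @ u0                        # forward solve of du/dt = A u
        adj_tf = 2.0 * utf / (u0 @ u0)        # terminal condition (19.20)
        adj_0 = Phi.T @ adj_tf                # backward solve of the adjoint equation
        u0 = (u0 @ u0) ** 2 / (2.0 * (utf @ utf)) * adj_0   # update (19.19)
    utf = Phi @ u0
    return u0, (utf @ utf) / (u0 @ u0)

u0_opt, gain = optimal_perturbation(A, 0.69)
```

Written this way, each pass applies $\Phi^T\Phi$ to the current guess with a rescaling, so the iteration behaves like a power method for the dominant right singular vector of the propagator $\Phi = e^{At_f}$.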
Let us return to the system governed by equation (19.15). Recall that the system is nonnormal for $\theta = \pi/100$ and leads to transient growth for finite times before succumbing to the asymptotically stable behavior for large times predicted by a modal analysis. After carrying out a series of optimal perturbation calculations as described above for a range of terminal times $t_f$, we can calculate the gain $G(t_f)$ as defined by equation (19.17) and shown in Figure 19.22 for a range of terminal times. The optimal initial perturbation $u(0)$ that produces these maximum gains for each terminal time $t_f$ is plotted in Figure 19.23. The maximum gain is found to be $G = 63.6$ for a terminal time of $t_f = 0.69$, which means that the transient growth can result in a perturbation energy growing to nearly 64 times its initial energy at $t = 0$. The initial perturbation that produces this maximum gain is $(u_1(0), u_2(0)) = (-0.134, 2.134)$. As shown by the plot of gain $G(t_f)$, the gain eventually decays to zero as the terminal time increases, consistent with the fact that the system is asymptotically stable for large times.
Figure 19.22 Gain $G(t_f)$ for a range of terminal times $t_f$ for the nonnormal system with $\theta = \pi/100$.
For the linear stability analysis of small perturbations about stationary equilibrium points considered here, the transient growth analysis for nonnormal systems
can be reduced to performing a singular-value decomposition (SVD) to obtain the
optimal disturbance. The primary virtue of the variational approach presented
here is that it can be extended to evaluate stability of nonlinear state equations
and/or stability of perturbations about time-evolving (unsteady) solutions. The
nonlinear case corresponds to analysis of nonlinear systems subject to perturbations that are not infinitesimally small. Stability of nonnormal, continuous fluid
dynamical systems will be encountered in Section 20.6.
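The SVD route mentioned above can be sketched as follows. For a linear system du/dt = Au, the state at time t is u(t) = exp(At) u(0), so the largest energy gain over all initial conditions is the square of the largest singular value of exp(At), and the optimal initial perturbation is the corresponding right singular vector. This is only a minimal sketch: the matrix A below is a hypothetical stable, nonnormal example and is not the system of equation (19.15), the gain is assumed to be the ratio of Euclidean energies, and the helper names are illustrative.

```python
import numpy as np

def propagator(A, t):
    """exp(A t) for a diagonalizable matrix A, via its eigen-decomposition."""
    lam, V = np.linalg.eig(A)
    return (V @ np.diag(np.exp(lam * t)) @ np.linalg.inv(V)).real

def max_gain(A, t):
    """Largest energy gain ||u(t)||^2 / ||u(0)||^2 and the initial state achieving it."""
    U, s, Vh = np.linalg.svd(propagator(A, t))
    return s[0] ** 2, Vh[0]

# Hypothetical stable but nonnormal matrix (eigenvalues -1 and -2); an assumption,
# not the matrix of equation (19.15).
A = np.array([[-1.0, 10.0],
              [0.0, -2.0]])

times = np.linspace(0.0, 5.0, 501)
gains = np.array([max_gain(A, t)[0] for t in times])
t_peak = times[np.argmax(gains)]  # terminal time with the largest gain
```

Although both eigenvalues are negative, the gain rises above one before decaying, which is the transient-growth behavior described in the text.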
19.7 Nonlinear Systems
[Figure 19.23: Initial perturbations u1 (0) (solid line) and u2 (0) (dashed line) that produce the maximum gain for a range of terminal times tf for the nonnormal system with = /100.]
20
Continuous Systems
Recall that for discrete systems there are a finite number of degrees of freedom,
and the equations of motion are systems of ordinary differential equations in time.
Furthermore, stability is governed by algebraic eigenproblems. For continuous
systems, in which the mass is distributed throughout the system, there are an
infinite number of degrees of freedom, and the equations of motion are partial
differential equations in space and time. Stability is governed by differential eigenproblems.
We begin our discussion of continuous systems by considering the wave equation, which is the canonical hyperbolic partial differential equation (see Section 13.2). Using the method of separation of variables described in Chapter 3,
we obtain solutions in terms of eigenfunction expansions for the one-dimensional
and two-dimensional wave equations.
20.1 Wave Equation
Consider a general partial differential equation, for example, the wave equation or the unsteady beam equation, of the form
\[ M\ddot{u} + Ku = 0, \tag{20.1} \]
where M and K are linear differential operators in space, dots denote differentiation in time, and u(x, y, z, t) is the unsteady displacement of the string,
membrane, beam, etc.
In order to convert the partial differential equation (20.1) into ordinary differential equations, we use the method of separation of variables. We write u(x, y, z, t)
as the product of two functions, one that accounts for the spatial dependence and
one for the temporal dependence, as follows
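\[ u(x, y, z, t) = \phi(x, y, z)\,\psi(t). \]
Substituting into the governing equation (20.1) gives
\[ \ddot{\psi}\,M\phi + \psi\,K\phi = 0, \]
or separating variables leads to
\[ \frac{K\phi(x, y, z)}{M\phi(x, y, z)} = -\frac{\ddot{\psi}(t)}{\psi(t)} = \lambda. \tag{20.2} \]
Because the left-hand side is a function of x, y, and z only, and the right-hand
side of t only, the equation must be equal to a constant, say $\lambda$. Then
\[ K\phi(x, y, z) = \lambda M\phi(x, y, z), \tag{20.3} \]
\[ \ddot{\psi}(t) = -\lambda\,\psi(t). \tag{20.4} \]
[Figures 20.1 and 20.2: String and rod under load P, with ends at x = 0 and x = \ell.]
Consider the one-dimensional wave equation
\[ \frac{\partial^2 u}{\partial t^2} = c^2\frac{\partial^2 u}{\partial x^2}, \tag{20.5} \]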
where c is the wave speed in the material, and u(x, t) is the displacement. The
wave equation$^1$ governs, for example:
1. Lateral vibration of a string, with $c^2 = P/\rho$, where P is the tension in the string and $\rho$ is the mass per unit length
of the string, as shown in Figure 20.1.
2. Longitudinal vibration of a rod, with $c^2 = E/\rho$, where E is Young's modulus of
the rod, and $\rho$ is the mass per unit volume of the rod, as shown in Figure 20.2.
Solve the wave equation using an eigenfunction expansion.
$^1$ See Variational Methods with Applications in Science and Engineering for a derivation.
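Solution: Writing the wave equation (20.5) in the form of equation (20.1), we
have the spatial differential operators
\[ M = \frac{1}{c^2}, \qquad K = -\frac{\partial^2}{\partial x^2}. \]
From equations (20.3) and (20.4), and letting $\lambda = \omega^2 > 0$, we have
\[ \frac{d^2\phi}{dx^2} + \frac{\omega^2}{c^2}\phi = 0, \qquad \frac{d^2\psi}{dt^2} + \omega^2\psi = 0, \]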
which are two ordinary differential equations in x and t, respectively, from one
partial differential equation. The solutions to these equations are (see the first
example in Section 3.2 with = /c and = , respectively),
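\[ \phi(x) = c_1\cos\left(\frac{\omega}{c}x\right) + c_2\sin\left(\frac{\omega}{c}x\right), \tag{20.6} \]
\[ \psi(t) = c_3\cos(\omega t) + c_4\sin(\omega t). \tag{20.7} \]
Recall that (20.6) are the spatial eigenfunctions $\phi(x)$, that is, the vibrational
modes, and $\omega^2/c^2$ are the eigenvalues.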
To obtain the four constants, we need boundary and initial conditions. For a
vibrating string, for example, consider the homogeneous boundary conditions
\[ u(0, t) = 0, \qquad u(\ell, t) = 0, \]
corresponding to zero displacement at both ends. Noting that the boundary conditions in x on $u(x, t) = \phi(x)\psi(t)$ must be satisfied by $\phi(x)$, we require
\[ u(0, t) = 0 \;\Rightarrow\; \phi(0) = 0 \;\Rightarrow\; c_1 = 0. \]
Similarly, from
\[ u(\ell, t) = 0 \;\Rightarrow\; \phi(\ell) = 0, \]
we have
\[ \sin\left(\frac{\omega\ell}{c}\right) = 0, \tag{20.8} \]
which is satisfied when $\omega_n\ell/c = n\pi$, $n = 1, 2, \ldots$; therefore,
\[ \omega_n = \frac{n\pi c}{\ell}, \quad n = 1, 2, \ldots, \tag{20.9} \]
where $\omega_n$ are the natural frequencies (rad/s). The eigenfunctions are
\[ \phi_n(x) = c_2\sin\left(\frac{n\pi}{\ell}x\right), \quad n = 1, 2, \ldots, \tag{20.10} \]
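and the corresponding temporal solutions from (20.7), along with their time derivatives, are
\[ \psi_n(t) = c_3\cos(\omega_n t) + c_4\sin(\omega_n t), \tag{20.11} \]
\[ \dot{\psi}_n(t) = -c_3\omega_n\sin(\omega_n t) + c_4\omega_n\cos(\omega_n t). \tag{20.12} \]
The initial conditions on the displacement and velocity require
\[ u(x, 0) = f(x) \;\Rightarrow\; \sum_n \phi_n(x)\psi_n(0) = f(x), \qquad \frac{\partial u}{\partial t}(x, 0) = g(x) \;\Rightarrow\; \sum_n \phi_n(x)\dot{\psi}_n(0) = g(x). \tag{20.13} \]
But from equations (20.11) and (20.12)
\[ \psi_n(0) = c_3, \qquad \dot{\psi}_n(0) = c_4\omega_n; \]
therefore,
\[ \psi_n(t) = \psi_n(0)\cos(\omega_n t) + \frac{1}{\omega_n}\dot{\psi}_n(0)\sin(\omega_n t). \tag{20.14} \]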
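Taking the inner product of $\phi_m(x)$ with both sides of the first equation in (20.13)
gives
\[ \|\phi_n(x)\|^2\,\psi_n(0) = \langle f(x), \phi_n(x)\rangle, \]
where all terms are zero due to orthogonality except when m = n. Then, with $c_2 = 1$ in (20.10),
\[ \psi_n(0) = \frac{\langle f(x), \phi_n(x)\rangle}{\|\phi_n(x)\|^2} = \frac{2}{\ell}\int_0^{\ell} f(x)\sin\left(\frac{n\pi}{\ell}x\right)dx. \]
Similarly, from the second equation in (20.13)
\[ \|\phi_n(x)\|^2\,\dot{\psi}_n(0) = \langle g(x), \phi_n(x)\rangle, \]
\[ \dot{\psi}_n(0) = \frac{\langle g(x), \phi_n(x)\rangle}{\|\phi_n(x)\|^2} = \frac{2}{\ell}\int_0^{\ell} g(x)\sin\left(\frac{n\pi}{\ell}x\right)dx. \]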
[Figure: plot of u(x, t) versus x.]
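Both $\psi_n(0)$ and $\dot{\psi}_n(0)$ are Fourier sine coefficients of f (x) and g(x), respectively.
Finally, the solution is
\[ u(x, t) = \sum_{n=1}^{\infty}\phi_n(x)\psi_n(t) = \sum_{n=1}^{\infty}\sin\left(\frac{n\pi}{\ell}x\right)\left[\psi_n(0)\cos(\omega_n t) + \frac{1}{\omega_n}\dot{\psi}_n(0)\sin(\omega_n t)\right], \tag{20.15} \]
where from (20.9)
\[ \omega_n = \frac{n\pi c}{\ell}, \quad n = 1, 2, \ldots. \]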
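The series solution can be evaluated numerically along the following lines. This is a minimal sketch rather than the book's implementation: the function names, the truncation at `n_modes` terms, and the trapezoid-rule quadrature for the Fourier sine coefficients are all assumptions made for illustration.

```python
import numpy as np

def sine_coefficient(h, n, ell, num=2001):
    """(2/ell) * integral_0^ell h(x) sin(n pi x / ell) dx by the trapezoid rule."""
    x = np.linspace(0.0, ell, num)
    y = h(x) * np.sin(n * np.pi * x / ell)
    dx = x[1] - x[0]
    return (2.0 / ell) * dx * (y.sum() - 0.5 * (y[0] + y[-1]))

def wave_solution(f, g, x, t, ell=1.0, c=1.0, n_modes=20):
    """Truncated eigenfunction expansion of the string displacement u(x, t)."""
    x = np.asarray(x, dtype=float)
    u = np.zeros_like(x)
    for n in range(1, n_modes + 1):
        omega_n = n * np.pi * c / ell           # natural frequency of mode n
        a_n = sine_coefficient(f, n, ell)       # initial modal displacement
        b_n = sine_coefficient(g, n, ell)       # initial modal velocity
        u += np.sin(n * np.pi * x / ell) * (
            a_n * np.cos(omega_n * t) + b_n / omega_n * np.sin(omega_n * t))
    return u

# Illustrative initial conditions: first mode shape, released from rest.
x_grid = np.linspace(0.0, 1.0, 11)
u_num = wave_solution(lambda x: np.sin(np.pi * x),
                      lambda x: np.zeros_like(x), x_grid, t=0.3)
u_exact = np.sin(np.pi * x_grid) * np.cos(0.3 * np.pi)
```

For this choice of initial conditions only the first mode is excited, so the truncated series should reproduce the standing wave in `u_exact`.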
For example, consider the initial conditions
2
u(x,
0) = g(x) = 0,
Thus far our examples have involved Fourier series for the eigenfunctions. In
the following example we encounter Bessel functions as the eigenfunctions.
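Example 20.2 As an additional application to vibration problems, consider
the vibration of a circular membrane of radius one.
Solution: The governing equation for the lateral displacement $u(r, \theta, t)$ is the
two-dimensional wave equation in cylindrical coordinates
\[ \frac{\partial^2 u}{\partial t^2} = c^2\left(\frac{\partial^2 u}{\partial r^2} + \frac{1}{r}\frac{\partial u}{\partial r} + \frac{1}{r^2}\frac{\partial^2 u}{\partial \theta^2}\right) = c^2\nabla^2 u, \tag{20.16} \]
where in equation (20.1)
\[ M = \frac{1}{c^2}, \qquad K = -\left(\frac{\partial^2}{\partial r^2} + \frac{1}{r}\frac{\partial}{\partial r} + \frac{1}{r^2}\frac{\partial^2}{\partial \theta^2}\right). \]
The spatial eigenproblem (20.3) is then
\[ \frac{\partial^2 \phi}{\partial r^2} + \frac{1}{r}\frac{\partial \phi}{\partial r} + \frac{1}{r^2}\frac{\partial^2 \phi}{\partial \theta^2} + \frac{\lambda}{c^2}\phi = 0. \tag{20.17} \]
Separating variables once more with $\phi(r, \theta) = G(r)H(\theta)$ leads to
\[ \frac{r^2}{G}\left(\frac{d^2 G}{dr^2} + \frac{1}{r}\frac{dG}{dr} + \frac{\lambda}{c^2}G\right) = -\frac{1}{H}\frac{d^2 H}{d\theta^2} = \sigma, \tag{20.18} \]
where $\sigma$ is the separation constant. Then we have the two ordinary differential
eigenproblems
\[ r^2\frac{d^2 G}{dr^2} + r\frac{dG}{dr} + \left(\frac{\lambda}{c^2}r^2 - \sigma\right)G = 0, \tag{20.19} \]
\[ \frac{d^2 H}{d\theta^2} + \sigma H = 0. \tag{20.20} \]
The general solution of equation (20.20) (with $\sigma > 0$) is of the form
\[ H(\theta) = c_1\cos(\sqrt{\sigma}\,\theta) + c_2\sin(\sqrt{\sigma}\,\theta). \tag{20.21} \]
To be single-valued in the circular domain, however, the solution must be $2\pi$-periodic in $\theta$, requiring that
\[ \phi(r, \theta + 2\pi) = \phi(r, \theta). \]
For the cosine term to be $2\pi$-periodic,
\[ \cos(2\pi\sqrt{\sigma}) = 1 \;\Rightarrow\; \sqrt{\sigma} = 0, 1, 2, 3, \ldots, \]
and for the sine term to be $2\pi$-periodic,
\[ \sin(2\pi\sqrt{\sigma}) = 0 \;\Rightarrow\; \sqrt{\sigma} = 0, \tfrac{1}{2}, 1, \tfrac{3}{2}, 2, \ldots. \]
Both conditions are satisfied when
\[ \sqrt{\sigma} = n = 0, 1, 2, 3, \ldots. \tag{20.22} \]
Then the solution (20.21) becomes
\[ H_n(\theta) = c_1\cos(n\theta) + c_2\sin(n\theta), \quad n = 0, 1, 2, 3, \ldots, \tag{20.23} \]
and equation (20.19) may be written in the form
\[ \frac{d}{dr}\left(r\frac{dG_n}{dr}\right) + \left(-\frac{n^2}{r} + \frac{\lambda}{c^2}r\right)G_n = 0, \tag{20.24} \]
which is Bessel's equation with the general solution
\[ G_n(r) = c_3 J_n\left(\frac{\sqrt{\lambda}}{c}r\right) + c_4 Y_n\left(\frac{\sqrt{\lambda}}{c}r\right), \tag{20.25} \]
where $J_n$ and $Y_n$ are Bessel functions of the first and second kind, respectively.
Recall that $Y_n$ are unbounded at r = 0; therefore, we must set $c_4 = 0$. Taking
$c_3 = 1$, we have
\[ \phi_n(r, \theta) = J_n\left(\frac{\sqrt{\lambda}}{c}r\right)\left[c_1\cos(n\theta) + c_2\sin(n\theta)\right], \quad n = 0, 1, 2, \ldots, \]
where in order to satisfy the boundary condition at r = 1, we have the characteristic equation $J_n(\sqrt{\lambda}/c) = 0$. That is, $\lambda_{m,n}$ is chosen such that $\sqrt{\lambda_{m,n}}/c$ are zeros
of the Bessel functions $J_n$ (there are an infinity of zeros, $m = 1, 2, 3, \ldots$, for each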
Bessel function Jn ). For example, four modes are shown in Figure 20.4. Observe
[Figure 20.4: Four modes of the circular membrane: (a) n = 0, m = 1 and n = 0, m = 2; (b) n = 1, m = 1 and n = 1, m = 2.]
\[ u(r, \theta, t) = \sum_{n=0}^{\infty}\phi_n(r, \theta)\,\psi_n(t). \]
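The zeros of the Bessel functions that fix the membrane frequencies can be located numerically. The sketch below is one possible approach under stated assumptions: it evaluates J_n from Bessel's integral representation with the trapezoid rule and finds sign changes by scanning and bisection, where a library routine would normally be used. The function names and the scan step are illustrative.

```python
import math

def bessel_j(n, x, num=400):
    """J_n(x) from Bessel's integral (1/pi) * int_0^pi cos(n*tau - x*sin(tau)) dtau."""
    h = math.pi / num
    total = 0.5 * (1.0 + math.cos(n * math.pi))  # half-weighted endpoint values
    for k in range(1, num):
        tau = k * h
        total += math.cos(n * tau - x * math.sin(tau))
    return total * h / math.pi

def bessel_zeros(n, count, step=0.1):
    """First `count` positive zeros of J_n, by a sign-change scan and bisection."""
    zeros, a = [], step
    fa = bessel_j(n, a)
    while len(zeros) < count:
        b = a + step
        fb = bessel_j(n, b)
        if fa * fb < 0.0:
            lo, hi, flo = a, b, fa
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                fmid = bessel_j(n, mid)
                if flo * fmid <= 0.0:
                    hi = mid
                else:
                    lo, flo = mid, fmid
            zeros.append(0.5 * (lo + hi))
        a, fa = b, fb
    return zeros

def membrane_frequencies(n, count, c=1.0):
    """Frequencies c * (m-th zero of J_n), m = 1..count, for a unit-radius membrane."""
    return [c * z for z in bessel_zeros(n, count)]

freqs_n0 = membrane_frequencies(0, 2)
freqs_n1 = membrane_frequencies(1, 2)
```

With c = 1, the two lists hold the frequencies of the four modes with n = 0, 1 and m = 1, 2.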
Remarks:
1. Whereas the eigenvalues and eigenfunctions have physical meaning in the vibrations context, where they are the natural frequencies and modes of vibration, respectively, in other contexts, such as heat conduction and electromagnetics, they do not and are merely a mathematical device by which to obtain
the solution via an eigenfunction expansion.
[Figure 20.5: Column under axial compressive force P, with ends at x = 0 and x = \ell.]
20.2 Electromagnetics
20.3 Schrödinger Equation
20.4 Stability of Continuous Systems: Beam-Column Buckling
Recall from Section 2.6.3 that stability of discrete systems, with n degrees of
freedom, results in an algebraic eigenproblem of the form
\[ Ax = \lambda x. \]
In contrast, stability of continuous systems, with infinite degrees of freedom,
results in a differential eigenproblem of the form
\[ Lu = \lambda u. \]
For example, let us consider the beam-column buckling problem.
Example 20.3 The buckling equation governing the lateral stability of a beam-column under axial load is the fourth-order differential equation$^2$
\[ \frac{d^2}{dx^2}\left(EI\frac{d^2 u}{dx^2}\right) + P\frac{d^2 u}{dx^2} = 0, \quad 0 \le x \le \ell. \]
Here, x is the axial coordinate along the column, u(x) is the lateral deflection
of the column, and P is the axial compressive force applied to the ends of the
column as shown in Figure 20.5. The properties of the column are the Young's
modulus E and the moment of inertia I of the cross-section. The product EI
represents the stiffness of the column.
If the end at x = 0 is hinged, in which case it cannot sustain a moment, and
the end at $x = \ell = 1$ is fixed, then the boundary conditions are
\[ u(0) = 0, \qquad u''(0) = 0, \qquad u(1) = 0, \qquad u'(1) = 0. \]
$^2$ See Variational Methods with Applications in Science and Engineering, Section 6.6, for a derivation.
Determine the buckling mode shapes and the corresponding critical buckling
loads.
Solution: For a constant cross-section column with uniform properties, E and I
are constant; therefore, we may define
\[ \lambda = \frac{P}{EI}, \]
and write the buckling equation as
\[ \frac{d^4 u}{dx^4} + \lambda\frac{d^2 u}{dx^2} = 0, \tag{20.26} \]
which may be regarded as a generalized eigenproblem of the form $L_1 u = \lambda L_2 u$.
Note that this may more readily be recognized as an eigenproblem of the usual
form if the substitution $w = d^2 u/dx^2$ is made, resulting in $w'' + \lambda w = 0$; however,
we will consider the fourth-order equation for the purposes of this example. Trivial
solutions with u(x) = 0, corresponding to no lateral deflection, will occur for most
$\lambda = P/EI$. However, certain values of the parameter $\lambda$ will produce nontrivial
solutions corresponding to buckling; these values of $\lambda$ are the eigenvalues.
The fourth-order linear ordinary differential equation with constant coefficients
(20.26) has solutions of the form $u(x) = e^{rx}$. Letting $\lambda = +\mu^2 > 0$ ($\lambda = 0$ and
$\lambda < 0$ produce only trivial solutions for the given boundary conditions), then r
must satisfy
\[ r^4 + \mu^2 r^2 = 0, \]
\[ r^2(r^2 + \mu^2) = 0, \]
\[ r^2(r + i\mu)(r - i\mu) = 0, \]
\[ r = 0, 0, \quad r = \pm i\mu. \]
Taking into account that r = 0 is a double root, the general solution to (20.26) is
\[ u(x) = c_1 e^{0x} + c_2 x e^{0x} + c_3 e^{i\mu x} + c_4 e^{-i\mu x}, \]
or
\[ u(x) = c_1 + c_2 x + c_3\sin(\mu x) + c_4\cos(\mu x). \tag{20.27} \]
Applying the boundary conditions at x = 0, we find that
\[ u(0) = 0 \;\Rightarrow\; 0 = c_1 + c_4 \;\Rightarrow\; c_1 = -c_4, \]
\[ u''(0) = 0 \;\Rightarrow\; 0 = -\mu^2 c_4 \;\Rightarrow\; c_4 = 0. \]
Thus, $c_1 = c_4 = 0$. Applying the boundary conditions at x = 1 leads to
\[ u(1) = 0 \;\Rightarrow\; 0 = c_2 + c_3\sin\mu, \qquad u'(1) = 0 \;\Rightarrow\; 0 = c_2 + c_3\mu\cos\mu. \tag{20.28} \]
For a nontrivial solution, eliminating $c_2$ from (20.28) gives the characteristic equation $\tan\mu = \mu$.
[Figure 20.6: Plots of $\tan\mu$ and $\mu$.]
Plotting $\tan\mu$ and $\mu$ in Figure 20.6, the points of intersection are the roots. The
roots of the characteristic equation are ($\mu_0 = 0$ gives the trivial solution)
\[ \mu_1 = 1.43\pi, \quad \mu_2 = 2.46\pi, \quad \mu_3 = 3.47\pi, \quad \ldots, \]
which have been obtained numerically. Thus, with $\lambda_n = \mu_n^2$, the eigenvalues are
\[ \lambda_1 = 2.05\pi^2, \quad \lambda_2 = 6.05\pi^2, \quad \lambda_3 = 12.05\pi^2, \quad \ldots. \]
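The numerical root-finding step can be reproduced with a short bisection search. This is a minimal sketch, assuming the characteristic equation is tan(mu) = mu; it is rewritten as sin(mu) - mu cos(mu) = 0 to avoid the poles of the tangent, and the function name and bracketing intervals are choices made here for illustration.

```python
import math

def buckling_roots(count, tol=1e-12):
    """Positive roots of tan(mu) = mu, found as zeros of sin(mu) - mu*cos(mu)."""
    f = lambda mu: math.sin(mu) - mu * math.cos(mu)
    roots = []
    for n in range(1, count + 1):
        # f changes sign between n*pi and (n + 1/2)*pi, so each interval holds one root.
        lo, hi = n * math.pi, (n + 0.5) * math.pi
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if f(lo) * f(mid) <= 0.0:
                hi = mid
            else:
                lo = mid
        roots.append(0.5 * (lo + hi))
    return roots

mus = buckling_roots(3)
ratios = [mu / math.pi for mu in mus]        # mu_n / pi
eigs = [(mu / math.pi) ** 2 for mu in mus]   # lambda_n / pi^2
```

Since the eigenvalue is P/EI, the smallest entry of `eigs`, multiplied by pi^2 EI, gives the critical buckling load for this hinged-fixed column of unit length.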
From the first relationship in (20.28), we see that
\[ c_3 = -\frac{c_2}{\sin\mu_n}; \]
therefore, from the solution (20.27), the eigenfunctions are
\[ u_n(x) = c_2\left[x - \frac{\sin(\mu_n x)}{\sin\mu_n}\right], \quad n = 1, 2, 3, \ldots. \tag{20.29} \]
As before, $c_2$ is an arbitrary constant that may be chosen such that the eigenfunctions are normalized if desired.
Any differential equation with the same differential operator and boundary conditions as in equation (20.26), including nonhomogeneous forms, can be solved
by expanding the solution (and the right-hand side) in terms of the eigenfunctions (20.29). A nonhomogeneous equation in this context would correspond to
application of lateral loads on the column.
For the homogeneous case considered here, however, solutions for different values of n correspond to various possible modes of the solution for the lateral
[Figure: plot of the first mode $u_1(x)$ versus x, where $u_1(x) = x - \sin(1.43\pi x)/\sin(1.43\pi)$.]
u(`, t) = 0,
(20.31)
(20.32)
(20.33)
n2 2
,
`2
n
x ,
`
n = 1, 2, 3, . . .
(20.34)
i = 2, 3, . . . , I,
2 1
0 0
0
2
2
1 2 1 0
3
0
0
4
4
1
2
0
0
..
.. = x2 .. ,
..
.. . .
..
..
.
. .
.
.
. .
.
0
I1
0
0 2 1 I1
I
0
0
0 1 2
I
or
A = .
(20.35)
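A finite-difference eigenproblem of this tridiagonal type can be checked against analytical eigenvalues in a few lines. The sketch below rests on assumptions that are not confirmed by the text: it takes the differential eigenproblem to be -phi'' = lambda phi with phi(0) = phi(ell) = 0, uses second-order central differences on the interior points only, and uses its own indexing and function name.

```python
import numpy as np

def fd_eigenvalues(num_interior, ell=1.0):
    """Eigenvalues of -phi'' = lambda*phi, phi(0) = phi(ell) = 0, by central differences."""
    dx = ell / (num_interior + 1)
    # Tridiagonal matrix with 2 on the diagonal and -1 on the off-diagonals.
    A = (2.0 * np.eye(num_interior)
         - np.eye(num_interior, k=1)
         - np.eye(num_interior, k=-1))
    # A phi = (dx^2 * lambda) phi, so divide the algebraic eigenvalues by dx^2.
    return np.sort(np.linalg.eigvalsh(A)) / dx**2

approx = fd_eigenvalues(200)
exact = np.array([(n * np.pi) ** 2 for n in (1, 2, 3)])  # n^2 pi^2 / ell^2 with ell = 1
```

The lowest algebraic eigenvalues approach n^2 pi^2 / ell^2 as the grid is refined, while the highest ones are poorly resolved on any fixed grid.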
This is a somewhat curious term to use for this subject as "hydro" normally refers to water
specifically, whereas fluid dynamics and hydrodynamic stability theory apply universally to all liquids
and gases that behave as a continuum.
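\[ \frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} = 0, \tag{20.36} \]
\[ \frac{\partial u}{\partial t} + u\frac{\partial u}{\partial x} + v\frac{\partial u}{\partial y} = -\frac{\partial p}{\partial x} + \frac{1}{Re}\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\right), \tag{20.37} \]
\[ \frac{\partial v}{\partial t} + u\frac{\partial v}{\partial x} + v\frac{\partial v}{\partial y} = -\frac{\partial p}{\partial y} + \frac{1}{Re}\left(\frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 v}{\partial y^2}\right). \tag{20.38} \]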
Here, x and y are the spatial Cartesian coordinates, and t is time. The velocity
components in the x and y directions are u(x, y, t) and v(x, y, t), respectively,
and p(x, y, t) is the pressure. The Reynolds number, Re, is a nondimensional
number that characterizes the relative importance of viscous diffusion and convection. Equation (20.36) enforces conservation of mass and is often referred to
as the continuity equation. Equations (20.37) and (20.38) enforce conservation of
momentum in the x and y directions, respectively.
We denote the solution to (20.36)-(20.38), which we call the base flow, by
u0 (x, y, t), v0 (x, y, t), and p0 (x, y, t), and seek the behavior of small disturbances
to this base flow. If the amplitude of the small disturbances grows with time or
space, the flow is hydrodynamically unstable. There are two possibilities that
may be considered: 1) a temporal analysis determines whether the amplitude of
a spatial disturbance, for example, a wavy wall, grows or decays with time, and
2) a spatial analysis determines whether the amplitude of a temporal disturbance,
for example, a vibrating ribbon, grows or decays in space. The former is known
as an absolute instability, and the latter is known as a convective instability.
For infinitesimally small disturbances ($\epsilon \ll 1$), the flow may be decomposed as
follows
\[ u(x, y, t) = u_0(x, y, t) + \epsilon\,\hat{u}(x, y, t), \]
\[ v(x, y, t) = v_0(x, y, t) + \epsilon\,\hat{v}(x, y, t), \]
\[ p(x, y, t) = p_0(x, y, t) + \epsilon\,\hat{p}(x, y, t). \]
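Substituting into the Navier-Stokes equations (20.36)-(20.38) gives
\[ \begin{aligned}
\frac{\partial u_0}{\partial x} + \epsilon\frac{\partial \hat{u}}{\partial x} + \frac{\partial v_0}{\partial y} + \epsilon\frac{\partial \hat{v}}{\partial y} &= 0, \\
\frac{\partial u_0}{\partial t} + \epsilon\frac{\partial \hat{u}}{\partial t} + (u_0 + \epsilon\hat{u})\left(\frac{\partial u_0}{\partial x} + \epsilon\frac{\partial \hat{u}}{\partial x}\right) + (v_0 + \epsilon\hat{v})\left(\frac{\partial u_0}{\partial y} + \epsilon\frac{\partial \hat{u}}{\partial y}\right) &= -\frac{\partial p_0}{\partial x} - \epsilon\frac{\partial \hat{p}}{\partial x} + \frac{1}{Re}\left(\frac{\partial^2 u_0}{\partial x^2} + \epsilon\frac{\partial^2 \hat{u}}{\partial x^2} + \frac{\partial^2 u_0}{\partial y^2} + \epsilon\frac{\partial^2 \hat{u}}{\partial y^2}\right), \\
\frac{\partial v_0}{\partial t} + \epsilon\frac{\partial \hat{v}}{\partial t} + (u_0 + \epsilon\hat{u})\left(\frac{\partial v_0}{\partial x} + \epsilon\frac{\partial \hat{v}}{\partial x}\right) + (v_0 + \epsilon\hat{v})\left(\frac{\partial v_0}{\partial y} + \epsilon\frac{\partial \hat{v}}{\partial y}\right) &= -\frac{\partial p_0}{\partial y} - \epsilon\frac{\partial \hat{p}}{\partial y} + \frac{1}{Re}\left(\frac{\partial^2 v_0}{\partial x^2} + \epsilon\frac{\partial^2 \hat{v}}{\partial x^2} + \frac{\partial^2 v_0}{\partial y^2} + \epsilon\frac{\partial^2 \hat{v}}{\partial y^2}\right).
\end{aligned} \tag{20.39} \]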
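As expected, the O(1) terms are simply the Navier-Stokes equations (20.36)-(20.38)
for the base flow u0 (x, y, t), v0 (x, y, t), and p0 (x, y, t). The $O(\epsilon)$ terms for
the disturbance flow are
\[ \frac{\partial \hat{u}}{\partial x} + \frac{\partial \hat{v}}{\partial y} = 0, \tag{20.40} \]
\[ \frac{\partial \hat{u}}{\partial t} + u_0\frac{\partial \hat{u}}{\partial x} + v_0\frac{\partial \hat{u}}{\partial y} + \frac{\partial u_0}{\partial x}\hat{u} + \frac{\partial u_0}{\partial y}\hat{v} = -\frac{\partial \hat{p}}{\partial x} + \frac{1}{Re}\left(\frac{\partial^2 \hat{u}}{\partial x^2} + \frac{\partial^2 \hat{u}}{\partial y^2}\right), \tag{20.41} \]
\[ \frac{\partial \hat{v}}{\partial t} + u_0\frac{\partial \hat{v}}{\partial x} + v_0\frac{\partial \hat{v}}{\partial y} + \frac{\partial v_0}{\partial x}\hat{u} + \frac{\partial v_0}{\partial y}\hat{v} = -\frac{\partial \hat{p}}{\partial y} + \frac{1}{Re}\left(\frac{\partial^2 \hat{v}}{\partial x^2} + \frac{\partial^2 \hat{v}}{\partial y^2}\right). \tag{20.42} \]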
Because $\epsilon$ is small, we neglect $O(\epsilon^2)$ terms. Thus, the evolution of the disturbances $\hat{u}(x, y, t)$, $\hat{v}(x, y, t)$, and $\hat{p}(x, y, t)$ is governed by the linearized Navier-Stokes (LNS) equations (20.40)-(20.42), where the base flow is known. Because
we assume that the disturbances are infinitesimally small, such that the nonlinear
Navier-Stokes equations are linearized, this is called linear stability theory.
In principle, we could impose a disturbance $\hat{u}$, $\hat{v}$, $\hat{p}$ at any time $t_i$ and track its
evolution in time and space to determine if the flow is stable to the imposed disturbance. To fully characterize the stability of the base flow, however, would require
many calculations of the LNS equations with different disturbance shapes imposed at different times. We can formulate a more manageable stability problem
by doing one or both of the following: 1) consider simplified base flows, and/or
2) impose well-behaved disturbances.
(20.43)
A parallel base flow is one in which the velocity components depend upon only
one coordinate direction. For example, pressure-driven flow through a horizontal
channel, called Poiseuille flow, is such that (see Figure 20.8)
\[ u_0 = u_0(y), \qquad v_0 = 0, \qquad p_0 = p_0(x). \tag{20.44} \]
\[ \hat{p}(x, y, t) = p_1(y)\,e^{i\alpha(x - ct)} \tag{20.45} \]
due to the parallel-flow assumption. In some cases, the parallel-flow assumption can be justified on formal grounds if the wavelength is such that $1/\alpha \ll L$,
where L is a typical streamwise length scale in the flow.
4. For the temporal analysis considered here, $\alpha$ is real and c is complex. For a
spatial analysis, $\alpha$ is complex and c is real in the perturbations (20.45).
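For steady, parallel base flow (20.43) and (20.44), the Navier-Stokes equations
(20.36)-(20.38) reduce to [from equation (20.37)]
\[ \frac{d^2 u_0}{dy^2} = Re\,\frac{dp_0}{dx}, \tag{20.46} \]
where $Re\,p_0'(x)$ is a constant for Poiseuille flow. The disturbance equations (20.40)-(20.42)
become
\[ \frac{\partial \hat{u}}{\partial x} + \frac{\partial \hat{v}}{\partial y} = 0, \tag{20.47} \]
\[ \frac{\partial \hat{u}}{\partial t} + u_0\frac{\partial \hat{u}}{\partial x} + \frac{du_0}{dy}\hat{v} = -\frac{\partial \hat{p}}{\partial x} + \frac{1}{Re}\left(\frac{\partial^2 \hat{u}}{\partial x^2} + \frac{\partial^2 \hat{u}}{\partial y^2}\right), \tag{20.48} \]
\[ \frac{\partial \hat{v}}{\partial t} + u_0\frac{\partial \hat{v}}{\partial x} = -\frac{\partial \hat{p}}{\partial y} + \frac{1}{Re}\left(\frac{\partial^2 \hat{v}}{\partial x^2} + \frac{\partial^2 \hat{v}}{\partial y^2}\right). \tag{20.49} \]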
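Substitution of the normal modes (20.45) into the disturbance equations (20.47)-(20.49)
leads to
\[ i\alpha u_1 + v_1' = 0, \tag{20.50} \]
\[ i\alpha(u_0 - c)u_1 + u_0' v_1 = -i\alpha p_1 + \frac{1}{Re}\left(u_1'' - \alpha^2 u_1\right), \tag{20.51} \]
\[ i\alpha(u_0 - c)v_1 = -p_1' + \frac{1}{Re}\left(v_1'' - \alpha^2 v_1\right), \tag{20.52} \]
where primes denote differentiation with respect to y. Solving equation (20.50) for $u_1$ and substituting into equation (20.51) gives
\[ (u_0 - c)v_1' - u_0' v_1 = i\alpha p_1 + \frac{1}{i\alpha}\frac{1}{Re}\left(v_1''' - \alpha^2 v_1'\right), \tag{20.53} \]
leaving equations (20.52) and (20.53) for $v_1(y)$ and $p_1(y)$. Solving equation (20.53)
for $p_1(y)$, differentiating, and substituting into equation (20.52) leads to
\[ (u_0 - c)\left(v_1'' - \alpha^2 v_1\right) - u_0'' v_1 = \frac{1}{i\alpha}\frac{1}{Re}\left(v_1'''' - 2\alpha^2 v_1'' + \alpha^4 v_1\right), \tag{20.54} \]
which is the Orr-Sommerfeld equation.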
1. For a given base flow velocity profile $u_0(y)$, Reynolds number Re, and wavenumber $\alpha$, the Orr-Sommerfeld equation is a differential eigenproblem of the form
\[ L_1 v_1 = c L_2 v_1, \]
where the wave speeds c are the (complex) eigenvalues, the disturbance velocities $v_1(y)$ are the eigenfunctions, and $L_1$ and $L_2$ are differential operators.
2. The Orr-Sommerfeld equation applies for steady, parallel, viscous flow perturbed by infinitesimally small normal modes, which is a local analysis.
3. For inviscid flow, which corresponds to $Re \to \infty$, the Orr-Sommerfeld equation
reduces to the Rayleigh equation
\[ (u_0 - c)\left(v_1'' - \alpha^2 v_1\right) - u_0'' v_1 = 0. \]
4. For nonparallel flows, a significantly more involved global stability analysis is
required in which the base flow is two- or three-dimensional, that is,
\[ u_0 = u_0(x, y), \qquad v_0 = v_0(x, y), \qquad p_0 = p_0(x, y), \tag{20.55} \]
2i
,
Re
3 i
2 u0 (yj ) u000 (yj ).
Re
6i
+ y 2 y 2 Qj 2Pj ,
Re
4i
=
+ y 2 Pj ,
Re
i
,
=
Re
= y 2 (2 + 2 y 2 ) ,
= y 2 .
Because the Orr-Sommerfeld equation is fourth order, we need two boundary conditions on the disturbance velocity at each boundary. For solid surfaces at y = a, b,
we set
\[ v = v' = 0 \quad \text{at} \quad y = a, b. \tag{20.58} \]
>
>
Cv0 + B2
v
+
A
v
+
B
v
+
Cv
=
c
B
v
+
Av
+
Bv
1
2 2
2 3
4
2
3 ,
1
and from $v' = 0$ at y = a (j = 1)
\[ \frac{v_2 - v_0}{2\Delta y} = 0 \;\Rightarrow\; v_0 = v_2. \tag{20.59} \]
Similarly, for j = J
\[ \left[C v_{J-2} + B_J v_{J-1} + (C + A_J)v_J\right] = c\left[B v_{J-1} + A v_J\right]. \tag{20.60} \]
349
C + A2 B 2 C
B3
A3 B3
C
B
A4
4
M(, Re) = ..
.
..
..
.
.
0
0
0
0
0
0
(20.55) are
0 0
C 0
B4 C
..
..
.
.
0 0
0 0
..
.
A B 0 0
A B
0
B
A B
0 B
N() = .. .. .. ..
.
. . .
0 0 0 0
0 0 0 0
..
.
0
0
0
..
.
0
0
0
..
.
0
0
0
..
.
0
0
0
..
.
0 0 0
0 0 0
0 0 0
,
.. .. ..
. . .
A B
0 B A
when computers were not capable of such large calculations. In addition, it allowed for use of well-developed algorithms for IVPs. However, using methods
for IVPs to solve BVPs is like using a hammer to drive a screw.
3. Solve the generalized eigenproblem
\[ Mv = cNv, \]
where M and N are large, sparse matrices. In addition to the fact that the
matrices are sparse, in stability contexts such as this, we only need the least
stable mode, not the entire spectrum of eigenvalues. Recall that the least stable
mode is that with the largest imaginary part. Currently, the state of the art
in such situations is the Arnoldi method (see Section 4.?).
Note that N must be positive definite for use in the Arnoldi method. That
is, it must have all positive eigenvalues. In our case, this requires us to take
the negatives of the matrices M and N as defined above.
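Selecting the least stable mode from a generalized eigenproblem can be illustrated on a small dense example. This sketch swaps in a dense eigenvalue solve in place of the Arnoldi method discussed above, so it is only suitable for small matrices; the 3x3 matrices are hypothetical stand-ins for the discretized operators, and the function name is illustrative.

```python
import numpy as np

def least_stable_mode(M, N):
    """Eigenpair of M v = c N v whose eigenvalue c has the largest imaginary part."""
    # Dense approach: convert to a standard eigenproblem; assumes N is nonsingular.
    c, V = np.linalg.eig(np.linalg.solve(N, M))
    k = np.argmax(c.imag)
    return c[k], V[:, k]

# Hypothetical matrices (assumptions, not the Orr-Sommerfeld discretization).
M = np.diag([0.3 - 0.2j, 0.5 + 0.01j, 0.9 - 0.5j])
N = np.eye(3, dtype=complex)

c_max, v_max = least_stable_mode(M, N)
unstable = c_max.imag > 0.0  # a positive imaginary part indicates growth
```

For large sparse matrices, an iterative method that targets a few eigenvalues avoids computing the whole spectrum, which is the motivation given in the text for the Arnoldi method.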
20.6.4 Example: Plane-Poiseuille Flow
As an illustration, let us consider stability of plane-Poiseuille flow, which is
pressure-driven flow in a channel, as illustrated in Figure 20.12. Such a flow
is parallel; therefore, the base flow is a solution of equation (20.46). The solution
is a parabolic velocity profile given by
\[ u_0(y) = y(2 - y), \quad 0 \le y \le 2. \tag{20.61} \]
Note that the base flow is independent of the Reynolds number. See the Mathematica notebook OS Psvll.nb for a solution of the Orr-Sommerfeld equation
using the approach outlined in the previous section. This notebook calculates
the complex wave speeds, that is, the eigenvalues, for a given wavenumber $\alpha$ and
Reynolds number Re in order to evaluate stability. Recall that a flow is unstable if the imaginary part of one of the discrete eigenvalues is positive, that is, if
$\max(c_i) > 0$. The growth rate of the instability is then $\max(c_i)$.
Here, the base-flow solution is known analytically. More typically, the base flow
is computed numerically. If this is the case, note that the accuracy requirements
for the base flow in order to perform stability calculations are typically much
greater than those required for the base-flow resolution alone. For example, 201
points have been used in the current calculations, which is much greater than
would be required to accurately represent the parabolic base flow.
By performing a large number of such calculations for a range of wavenumbers and Reynolds numbers, we can plot the marginal stability curve for plane-Poiseuille flow. This shows the curve in $\alpha$-Re parameter space for which $\max(c_i) = 0$,
thereby delineating the regions of parameter space in which the flow is stable and unstable to normal-mode disturbances. The marginal stability curve for
plane-Poiseuille flow is shown in Figure 20.13. Thus, the critical Reynolds number
is approximately $Re_c = 5{,}800$.
[Figure 20.13: Marginal stability curve for plane-Poiseuille flow, wavenumber $\alpha$ versus Reynolds number Re.]
20.6.5 Numerical Stability Revisited
In light of our discussion of hydrodynamic instability here and numerical instability in Section 6.3, let us revisit these issues in relation to one another. Recall that
in real flows, small disturbances, for example, imperfections and vibrations,
affect the base flow. In numerical solutions, on the other hand, small errors,
for example, round-off errors, act as disturbances in the flow. Consequently, the
issue is, what happens to small disturbances or errors as a real flow and/or
numerical solution evolves in time? If they decay, then it is stable, and the disturbances/errors are damped out. If they grow, then it is unstable, and the
disturbances/errors are amplified.
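A purely numerical instability can be demonstrated on a problem whose exact solution decays. The sketch below is an illustrative assumption rather than an example from the text: it applies the explicit (forward) Euler method to du/dt = -lam u, for which each step multiplies the solution by (1 - lam dt), so the computed solution grows whenever |1 - lam dt| > 1 even though the exact solution decays.

```python
def forward_euler_amplitude(lam, dt, steps, u0=1.0):
    """|u| after `steps` explicit Euler steps of du/dt = -lam * u."""
    u = u0
    for _ in range(steps):
        u = u + dt * (-lam * u)
    return abs(u)

lam = 10.0  # the exact solution decays like exp(-10 t)

# Amplification factor |1 - lam*dt| = 0.5: errors are damped out.
stable = forward_euler_amplitude(lam, dt=0.05, steps=100)

# Amplification factor |1 - lam*dt| = 1.5: errors are amplified, with sign changes
# at every step, which shows up as a growing oscillation.
unstable = forward_euler_amplitude(lam, dt=0.25, steps=100)
```

The growth in the second case comes entirely from the algorithm and step size, not from the underlying problem, so it is removed by reducing the time step or changing the method.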
In computational fluid dynamics (CFD), therefore, there are two possible sources
of instability: 1) hydrodynamic instability, in which the flow itself is inherently
unstable, and 2) numerical instability, in which the numerical algorithm magnifies
the small errors. It is important to realize that the former is real and physical,
whereas the latter is not physical and signals that a new numerical method is
needed. The difficulty in CFD is that both are manifest in similar ways, that is,
in the form of oscillatory solutions; therefore, it is often difficult to determine
21
Optimization and Control
22
Image or Signal Processing and Data Analysis
Until the early seventeenth century, our sight was limited to those things that
could be observed by the naked eye. The invention of the lens facilitated development of both the microscope and the telescope that allowed us to directly
observe the very small and the very large but distant. This allowed us to extend
our sight in both directions on the spatial scale from O(1). The discovery of the
electromagnetic spectrum and invention of devices to view and utilize it extended
our sight in both directions on the electromagnetic spectrum from the visible
range.
As I write this in July 2015, the New Horizons probe has just passed Pluto
after nine and one-half years traveling through the solar system. There are seven
instruments on the probe that are recording pictures and data that are then
beamed billions of miles back to earth for recording and processing. Three of the
instruments are recording data in the visible, infrared, and ultraviolet ranges of
the electromagnetic spectrum, and the data from all of the instruments is transmitted back to earth as radio waves. In addition to interplanetary exploration,
medical imaging, scientific investigation, and surveillance, much of modern life
relies on myriad signals transmitting our communication, television, GPS, internet, and encryption data. More and more of this data is digital, rather than
analog, and requires rapid processing of large quantities of images and signals
that each contain increasing amounts of detailed data. All of these developments
have served to dramatically increase the volume, type, and complexity of images
and signals requiring processing.
Signals and images are produced, harvested, and processed by all manner of
electronic devices that utilize the entire electromagnetic spectrum. The now traditional approach to image and signal processing is based on Fourier analysis,
which is the primary subject of the early part of this chapter. This accounts for
both discrete and continuous data. More recently, advances in variational image
processing have supplemented these techniques as well (see, for example, Cassel
2013).
As the volume of experimental and computational data rapidly proliferates,
there is a growing need for creative techniques to characterize and consolidate the
data such that the essential features can readily be extracted. This has given rise
to the development of the field of reduced-order modeling, of which the method of
proper-orthogonal decomposition (POD) (or principal-component analysis (PCA))
is the workhorse.
Appendix A
Row and Column Space of a Matrix
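Consider the $m \times n$ matrix
\[ A = \begin{bmatrix} A_{11} & A_{12} & \cdots & A_{1n} \\ A_{21} & A_{22} & \cdots & A_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ A_{m1} & A_{m2} & \cdots & A_{mn} \end{bmatrix}, \]
where $u_i$ are the column vectors of A, and $v_i$ are the row vectors of A. The row
representation of A is
\[ A = \begin{bmatrix} v_1^T \\ v_2^T \\ \vdots \\ v_m^T \end{bmatrix}, \quad \text{where} \quad v_1 = \begin{bmatrix} A_{11} \\ A_{12} \\ \vdots \\ A_{1n} \end{bmatrix}, \quad \text{etc.} \]
Similarly, the column representation of A is
\[ A = \begin{bmatrix} u_1 & u_2 & \cdots & u_n \end{bmatrix}, \quad \text{where} \quad u_1 = \begin{bmatrix} A_{11} \\ A_{21} \\ \vdots \\ A_{m1} \end{bmatrix}, \quad \text{etc.} \]
Then, for Ax = y, if c is a linear combination of the $u_i$ vectors, that is,
\[ x_1 u_1 + x_2 u_2 + \cdots + x_n u_n = c, \]
then c is in the column space of A.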
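Membership in the column space can be tested numerically by asking whether the system A x = c has a solution. The following is a minimal sketch, assuming a least-squares solve with a residual tolerance is an acceptable test; the function name, the tolerance, and the example matrix are illustrative choices.

```python
import numpy as np

def in_column_space(A, c, tol=1e-10):
    """Return (is_member, x), where x is the least-squares solution of A x = c."""
    x, *_ = np.linalg.lstsq(A, c, rcond=None)
    residual = np.linalg.norm(A @ x - c)
    return bool(residual <= tol * max(1.0, np.linalg.norm(c))), x

# Hypothetical 3 x 2 matrix with columns u1 = (1, 0, 1) and u2 = (0, 1, 1).
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

inside, x = in_column_space(A, np.array([2.0, 3.0, 5.0]))    # equals 2*u1 + 3*u2
outside, _ = in_column_space(A, np.array([1.0, 0.0, 0.0]))   # not a combination
```

When the test succeeds, the entries of `x` are the coefficients of the linear combination of the columns that reproduces c.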